On Tue, Apr 8, 2008 at 11:21 AM, Gregg Yows <gregg / yows.net> wrote:
> Code:
>
>  "<td align="left" ><div style="width: 165px; height: 175px;"><a
>  href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
>  something here Best</td>"
>
>
>  Pattern:
>
>  <td.*?>.*?<\/td\s*>
>
>
>  I'm trying to match this whole block and use it for further parsing.
>  This started from an example in Brian Marick's book "Everyday
>  Scripting..." that had to be modified because Amazon has changed their
>  presentation to tables instead of lists.
>
>  Anyway, the regex works fine on a single line. As soon as I introduce
>  this:
>
>  "<td align="left" ><div style="width: 165px; height: 175px;"><a
>  href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
>  something here
>
>  Best</td>"
>
>  it fails.
>
>  When I try this same expression in Perl using the //s mode, it works.
>  I understand Ruby uses //m (multiline mode) in nearly the same fashion,
>  letting . match newlines too, so it should work, right? Can anyone tell
>  me what I am doing wrong here? Why isn't "multiline" mode working?
>
>  Thanks!

<CODE>

s = '<td align="left" ><div style="width: 165px; height: 175px;"><a
href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
something here

Best</td>'

puts "######\ns:"
puts s

# m lets . match newlines (Perl's /s); r2 adds a capture group.
r1 = /<td.*?>.*?<\/td.*?>/m
r2 = /<td.*?>(.*?)<\/td.*?>/m

puts "######\nscan with r1:"
puts s.scan(r1)
puts
puts "######\nmatch with r1:"
puts s.match(r1)[0]
puts

s =~ r1
puts "######\n=~ and $1 with r1:"
puts $1    # $1 is nil here: r1 has no capture group

puts
puts
puts

puts "######\nscan with r2:"
puts s.scan(r2)
puts
puts "######\nmatch with r2:"
puts s.match(r2)[0]
puts

s =~ r2
puts "######\n=~ and $1 with r2:"
puts $1    # the text between <td ...> and </td>

</CODE>
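Stripped down to just the flag, the difference looks like this. Ruby's /m does the job of Perl's /s: it lets . match a newline (a trailing s on a Ruby regexp is an encoding option, not dotall). The html string below is a shortened, made-up stand-in for the Amazon snippet, so treat this as a sketch rather than the real page.

```ruby
# Sketch: the original pattern with and without Ruby's m flag.
# The html string is a shortened, made-up stand-in for the Amazon snippet.
html = "<td align=\"left\">testPit\nsomething here\n\nBest</td>"

without_m = /<td.*?>.*?<\/td\s*>/    # . does not match "\n"
with_m    = /<td.*?>.*?<\/td\s*>/m   # . matches "\n" too, like Perl's /s

p(html =~ without_m)   # => nil (the cell spans several lines)
p(html =~ with_m)      # => 0

# The same option when building a Regexp from a string at runtime.
built = Regexp.new('<td.*?>.*?</td\s*>', Regexp::MULTILINE)
p((built.options & Regexp::MULTILINE) != 0)   # => true
p(html[built] == html)                        # => true
```

So if the pattern still fails with /m, it's worth checking that the m actually made it onto the regexp literal you're running.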

Hmm, I'm not sure whether the regexp /<td[^>]*>.*?<\/td[^>]*>/m would be
more appropriate; [^>]* at least can't run past the > that closes a tag.
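That [^>]* variant can be checked the same way. A negated class like [^>]* matches newlines even without /m, but the .*? between the tags still needs it. This sketch adds a capture group to pull out each cell's contents; the two-cell input is made up, and it assumes cells are not nested.

```ruby
# Sketch: the [^>]* variant with a capture group around the cell body.
# Made-up input with two cells; assumes cells are not nested.
html = "<td align=\"left\"><div>testPit\nsomething here\n\nBest</td>\n" \
       "<td>second\ncell</td>"

cell      = /<td[^>]*>(.*?)<\/td[^>]*>/m
cell_no_m = /<td[^>]*>(.*?)<\/td[^>]*>/

# scan returns one array of captures per match; take group 1 from each.
bodies = html.scan(cell).map(&:first)
p bodies   # => ["<div>testPit\nsomething here\n\nBest", "second\ncell"]

# Without m, .*? cannot cross the newlines inside either cell.
p html.scan(cell_no_m)   # => []
```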

Todd