On Tue, Feb 10, 2009 at 3:19 AM, William James <w_a_x_man / yahoo.com> wrote:

> Joao Silva wrote:
>
> > how i can extract:
> >
> > <td>Traffic left:</td><td
> > align=right><b><script>document.write(setzeTT(""+Math.ceil(-123313/1000)));</script> MB</b></td>
> >
> > i need this nuber: 123313? I tried to match this in many ways but i
> > stil have problem with escape characters.
>
>
> list = DATA.read.scan( %r{<td.*?>\s*(.*?)\s*</td>}im ).flatten
>
> list.each_cons(2){|a,b|
>   if "Traffic left:" == a and b =~ /Math.ceil\((-?\d+)/
>     p $1
>   end
> }
>
>
> __END__
>
> <td>NOT TRAFFIC LEFT:</td><td
> align=right><b>
> <script>document.write(setzeTT(""+Math.ceil(-9999999/1000)));
> </script>
> MB</b></td>
>
> <td> Traffic left:
> </td><td
> align=right><b><script>
> document.write(setzeTT(""+Math.ceil(-123313/1000)));
> </script>
> MB</b></td>
>

As 7Stud pointed out, a toolbox with only regular expressions inside is
often a poor choice for dealing with XML/HTML.

Here's a rather verbose and commented program that combines hpricot with a
regular expression to do something like what I think you are looking for:

require 'rubygems'
require 'hpricot'

def get_traffic_left_numbers(string)
  doc = Hpricot(string)
  results = []
  # iterate over all of the td elements in the document
  traffic_lefts = doc.search("td").each do |td1|
    # check to see if the td contents is "Traffic left:"
    if td1.inner_text == "Traffic left:"
      # if yes, get the next sibling
      td2 = td1.next_sibling
      # and then for each script tag inside
      td2.search("script") do | script |
        # get the script_tag text
        script_text = script.inner_text
        # Use a regexp to capture the number
        number = /Math\.ceil\(-?(\d+)/.match(script_text)
        # add the number we found, if any, to the results array
        results << number[1] if number
      end
    end
  end
  results
end

# single quotes, so that the "" inside the HTML stays part of the string
p get_traffic_left_numbers('<td>Traffic left:</td><td align=right><b><script>document.write(setzeTT(""+Math.ceil(-123313/1000)));</script> MB</b></td>
<td>NOT TRAFFIC LEFT:</td><td align=right><b><script>document.write(setzeTT(""+Math.ceil(-9999999/1000)));</script> MB</b></td>')

When run, this outputs:

["123313"]

In other words, it produces an array of strings, one for each target number
found in a script tag inside a td tag that directly follows another td tag
whose inner text is "Traffic left:".

HTH

-- 
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
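
For comparison, the regexp-only approach quoted at the top of this message can be run without a DATA section. This is only a minimal sketch, assuming the HTML is already in a string. The method name traffic_left_numbers and the html variable are illustrative and do not come from the thread, and the heredoc form needs Ruby 2.3 or later.

```ruby
# Hypothetical helper: regexp-only variant of the quoted each_cons code.
def traffic_left_numbers(html)
  # Collect the whitespace-trimmed contents of every <td>...</td> pair, in order.
  cells = html.scan(%r{<td.*?>\s*(.*?)\s*</td>}im).flatten
  results = []
  # Walk overlapping pairs of cells: a label cell followed by its value cell.
  cells.each_cons(2) do |label, value|
    next unless label == "Traffic left:"
    match = /Math\.ceil\(-?(\d+)/.match(value)
    results << match[1] if match
  end
  results
end

# Sample input modelled on the data in the quoted message.
html = <<~HTML
  <td>NOT TRAFFIC LEFT:</td><td align=right><b>
  <script>document.write(setzeTT(""+Math.ceil(-9999999/1000)));
  </script>
  MB</b></td>

  <td> Traffic left:
  </td><td align=right><b><script>
  document.write(setzeTT(""+Math.ceil(-123313/1000)));
  </script>
  MB</b></td>
HTML

p traffic_left_numbers(html)  # prints ["123313"]
```

This works on the sample, but it depends on every td being closed and never nested, which is the kind of assumption a real parser such as hpricot does not need.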