On Tue, Feb 10, 2009 at 3:19 AM, William James <w_a_x_man / yahoo.com> wrote:

> Joao Silva wrote:
>
> > How can I extract:
> >
> > <td>Traffic left:</td><td
> > align=right><b><script>document.write(setzeTT(""+Math.ceil(-123313/1000)));</script> MB</b></td>
> >
> > I need this number: 123313. I tried to match it in many ways but I
> > still have problems with escape characters.
>
>
> list = DATA.read.scan( %r{<td.*?>\s*(.*?)\s*</td>}im ).flatten
>
> list.each_cons(2){|a,b|
>  if "Traffic left:" == a  and  b =~ /Math.ceil\((-?\d+)/
>    p $1
>  end
> }
>
>
> __END__
>
> <td>NOT TRAFFIC LEFT:</td><td
> align=right><b>
> <script>document.write(setzeTT(""+Math.ceil(-9999999/1000)));
> </script>
> MB</b></td>
>
> <td> Traffic left:
> </td><td
> align=right><b><script>
> document.write(setzeTT(""+Math.ceil(-123313/1000)));
> </script>
> MB</b></td>
>
>
As 7Stud pointed out, a toolbox with only regular expressions inside is
often a poor choice for dealing with XML/HTML.
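
For instance (a quick sketch; `naive_traffic_left` is a made-up helper, not
something from either post), a single regexp written against the compact
markup from your question finds the number there, but misses the very same
cell once it's merely re-wrapped, as in the second sample quoted above where
whitespace surrounds "Traffic left:":

```ruby
# Hypothetical helper: one regexp that assumes the exact compact layout
# "<td>Traffic left:</td><td ...>...Math.ceil(-NNN" with no extra whitespace.
def naive_traffic_left(html)
  m = html.match(%r{<td>Traffic left:</td><td[^>]*>.*?Math\.ceil\((-?\d+)}m)
  m && m[1]
end

# The markup exactly as it appeared in the question
compact = <<-HTML
<td>Traffic left:</td><td
align=right><b><script>document.write(setzeTT(""+Math.ceil(-123313/1000)));</script> MB</b></td>
HTML

# The same cell, just wrapped differently
rewrapped = <<-HTML
<td> Traffic left:
</td><td
align=right><b><script>
document.write(setzeTT(""+Math.ceil(-123313/1000)));
</script>
MB</b></td>
HTML

p naive_traffic_left(compact)    # => "-123313"
p naive_traffic_left(rewrapped)  # => nil, the extra whitespace breaks the match
```

A parser doesn't care how the markup happens to be wrapped; you look at the
cell's text instead of the raw characters around it.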

Here's a rather verbose, commented program that combines Hpricot and a
regular expression to do something like what I think you're looking for:

require 'rubygems'
require 'hpricot'

def get_traffic_left_numbers(string)
  doc = Hpricot(string)
  results = []
  # iterate over all of the td elements in the document
  doc.search("td").each do |td1|
    # check whether the td's text (ignoring surrounding whitespace) is "Traffic left:"
    if td1.inner_text.strip == "Traffic left:"
      # if yes, get the next sibling
      td2 = td1.next_sibling
      # and then for each script tag inside it
      td2.search("script").each do |script|
        # get the script tag's text
        script_text = script.inner_text
        # use a regexp to capture the digits (the minus sign is not captured)
        number = /Math\.ceil\(-?(\d+)/.match(script_text)
        # add the number we found, if any, to the results array
        results << number[1] if number
      end
    end
  end
  results
end

p get_traffic_left_numbers(<<-HTML)
<td>Traffic left:</td><td
align=right><b><script>document.write(setzeTT(""+Math.ceil(-123313/1000)));</script>
MB</b></td>
<td>NOT TRAFFIC LEFT:</td><td
align=right><b><script>document.write(setzeTT(""+Math.ceil(-9999999/1000)));</script>
MB</b></td>
HTML

When run, this outputs:

["123313"]

In other words, it produces an array of strings holding the digits (without
the minus sign) of the target numbers, taken from a script tag within a td
element that immediately follows another td whose inner text is
"Traffic left:".
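
If you'd rather have the signed value as an Integer, and perhaps the number
the page actually displays, something along these lines should do it. This
is only a sketch: `extract_signed` and `js_ceil_div` are made-up names, and
it assumes the script always has the `Math.ceil(<number>/<number>)` shape
shown in your sample. What `setzeTT` does with the result isn't visible in
the snippet.

```ruby
# Hypothetical helper: capture the sign with the digits, plus the divisor.
# Returns [numerator, denominator] as Integers, or nil if there is no match.
def extract_signed(script_text)
  m = /Math\.ceil\((-?\d+)\/(\d+)\)/.match(script_text)
  return nil unless m
  [Integer(m[1]), Integer(m[2])]
end

# Hypothetical helper mirroring JavaScript's Math.ceil(a/b):
# JS divides in floating point, so use fdiv rather than Integer#/.
def js_ceil_div(numerator, denominator)
  numerator.fdiv(denominator).ceil
end

script = 'document.write(setzeTT(""+Math.ceil(-123313/1000)));'
num, den = extract_signed(script)
p num                    # => -123313
p js_ceil_div(num, den)  # => -123
```

The `fdiv` matters: Ruby's Integer `/` floors, so `-123313 / 1000` would
give -124 instead of the -123 the browser computes.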

HTH


-- 
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
