Bil Kleb wrote:
> OK, so I haven't done this in years.
>
> What's the "modern" way of grabbing the data off
> a webpage, e.g.,
>
>   http://yorkcountyschools.org/mves/arlist/3-3.4.htm
>
> My initial attempt has been focused on Hpricot,
>
>   require 'rubygems'
>   require 'open-uri'
>   require 'hpricot'
>   doc = Hpricot(open('http://yorkcountyschools.org/mves/arlist/3-3.4.htm'))
>
> and I can find doc/"th" and doc/"tr", but what's
> the best way to cram them into an array of structs
> or something?
>
> Thanks,
> --
> Bil Kleb
> http://funit.rubyforge.org

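require 'net/http'

# Fetch the page body as a String. Current Ruby's Net::HTTP#get returns a
# single response object, so the old `resp, data = http.get(...)` idiom
# leaves `data` nil; Net::HTTP.get returns the body directly.
data = Net::HTTP.get( "yorkcountyschools.org", "/mves/arlist/3-3.4.htm" )

# Grab each <tr>...</tr>, pull out its <td> cells (attributes allowed,
# cells may span lines), and keep only the 5-column data rows.
table = data.scan( %r{<tr[^>]*>(.*?)</tr>}im ).flatten.
  map{|s| s.scan( %r{<td[^>]*>(.*?)</td>}im ).flatten }.
  select{|ary| ary.size == 5}

p table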
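If you really want the array of structs you asked about, the same scan can feed a Struct whose members come from the <th> header cells. Rough sketch below; the sample HTML and its column names (Title, Author, Level, Points, Quiz) are made up for illustration, since I haven't checked the real page's headers, and it assumes headers are in <th> and data rows use <td>.

```ruby
# Hypothetical sample markup standing in for the fetched page; the real
# page's column names are NOT known here.
html = <<~HTML
  <table>
    <tr><th>Title</th><th>Author</th><th>Level</th><th>Points</th><th>Quiz</th></tr>
    <tr><td>Book A</td><td>Smith</td><td>3.3</td><td>1.0</td><td>1234</td></tr>
    <tr><td>Book B</td><td>Jones</td><td>3.4</td><td>0.5</td><td>5678</td></tr>
  </table>
HTML

# Turn each header cell into a symbol usable as a Struct member name.
headers = html.scan( %r{<th[^>]*>(.*?)</th>}im ).flatten.
  map{|h| h.strip.downcase.gsub(/\W+/, '_').to_sym }
Row = Struct.new( *headers )

# Same <tr>/<td> scan as above, but each matching row becomes a Row.
rows = html.scan( %r{<tr[^>]*>(.*?)</tr>}im ).flatten.
  map{|tr| tr.scan( %r{<td[^>]*>(.*?)</td>}im ).flatten.map(&:strip) }.
  select{|cells| cells.size == headers.size }.
  map{|cells| Row.new( *cells ) }

rows.each{|r| puts "#{r.title} by #{r.author}" }
```

The header row itself yields no <td> cells, so the size check drops it; everything stays a String, so convert fields like `level` with `to_f` if you need numbers.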