Bil Kleb wrote:
> OK, so I haven't done this in years.
>
> What's the "modern" way of grabbing the data off
> a webpage, e.g.,
>
>   http://yorkcountyschools.org/mves/arlist/3-3.4.htm
>
> My initial attempt has been focused on Hpricot,
>
>   require 'rubygems'
>   require 'open-uri'
>   require 'hpricot'
>   doc = Hpricot(open('http://yorkcountyschools.org/mves/arlist/3-3.4.htm'))
>
> and I can find doc/"th" and doc/"tr", but what's
> the best way to cram them into an array of structs
> or something?
>
> Thanks,
> --
> Bil Kleb
> http://funit.rubyforge.org

require 'net/http'

http = Net::HTTP.new( "yorkcountyschools.org" )
# Take the page body from the response object.
resp = http.get( "/mves/arlist/3-3.4.htm" )
data = resp.body

# One array of <td> contents per <tr>; keep only the five-column rows.
table = data.scan( %r{<tr>(.*?)</tr>}im ).flatten.
  map{|s| s.scan( %r{<td>(.*?)</td>}i ).flatten }.
  reject{|ary| ary.size != 5}
p table
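
To get from nested arrays to the "array of structs" Bil asked about, one option is to splat each five-cell row into a Struct. The following is only a sketch built on the scan-based approach in the reply. The field names (title, author, level, points, quiz) and the sample markup are made up for illustration, since the thread never says what the five columns hold, and the regexes are loosened slightly to allow attributes on the tags.

```ruby
# Hypothetical column names: the thread does not say what the page's
# five columns are, so rename these to match its <th> cells.
Row = Struct.new(:title, :author, :level, :points, :quiz)

# Turn an HTML string into an array of Row structs, keeping only
# rows that have exactly as many <td> cells as Row has members.
def rows_from_html(html)
  html.scan(%r{<tr[^>]*>(.*?)</tr>}im).flatten.
    map { |tr| tr.scan(%r{<td[^>]*>(.*?)</td>}im).flatten.map { |cell| cell.strip } }.
    select { |cells| cells.size == Row.members.size }.
    map { |cells| Row.new(*cells) }
end

# Made-up sample markup standing in for the downloaded page.
sample = <<HTML
<table>
<tr><th>Title</th><th>Author</th><th>Level</th><th>Points</th><th>Quiz</th></tr>
<tr><td>Example Book</td><td>A. Writer</td><td>3.2</td><td>0.5</td><td>1234</td></tr>
<tr><td colspan="5">spacer row</td></tr>
</table>
HTML

books = rows_from_html(sample)
books.each { |b| puts "#{b.title} by #{b.author} (level #{b.level})" }
```

With the real page, pass the downloaded body to rows_from_html in place of the sample string. Every field stays a String; convert with to_f or to_i where numbers are needed.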