On Wed, Aug 29, 2007 at 09:45:04PM +0900, kazaam wrote:
> I'm trying to fetch all google results with hpricot. For the first page
> of results I wrote this here:
>
> #!/usr/bin/env ruby
> $Verbose=true
>
> require 'hpricot'
> require 'open-uri'
>
> google = Hpricot(open("http://www.google.com/search?name=f&hl=en&q=#{$*}"))
> (google/"h2.r/a").each {|line| puts line.to_s.gsub(/^.+href="/,'').gsub(/" .+$/,'')}
>
> So my first question is can I connect the both gsub statments above in
> just one gsub which should increase the speed? Or is there even a better
> way than using gsub for cleaning the results?
>
> And the next question is: how can I get all results not just from the
> first page?

Look into mechanize or scrubyt for this. They sit on top of hpricot, but
are much better suited to screen scraping applications than hpricot
alone.

> greets
> kazaam <kazaam / oleco.net>

--Greg
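
On the gsub question: one way to collapse the two substitutions is to capture the href value with a single regex instead of stripping text from both sides. The sketch below is only an illustration on a plain string, using core Ruby. The helper name extract_href and the sample markup are made up for the example, and it assumes the attribute is written with double quotes, as in the original script.

```ruby
# Hypothetical helper (not part of hpricot): return the value of the first
# double-quoted href attribute in a string of HTML, or nil if there is none.
def extract_href(html)
  # String#[] with a regex and a group index returns just that capture group.
  html[/href="([^"]*)"/, 1]
end

# Sample markup invented for illustration; real result markup may differ.
sample = '<a href="http://www.ruby-lang.org/" class="l">Ruby</a>'
puts extract_href(sample)
```

In the original loop this would read something like `puts extract_href(line.to_s)`. Hpricot elements should also allow attribute lookup directly, along the lines of `line['href']`, which would avoid the regex altogether; check that against the hpricot version in use.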