On Wed, Aug 29, 2007 at 09:45:04PM +0900, kazaam wrote:
> I'm trying to fetch all Google results with hpricot. For the first page
> of results I wrote this:
> 
> 	#!/usr/bin/env ruby
> 	$VERBOSE = true
> 
> 	require 'hpricot'
> 	require 'open-uri'
> 	require 'cgi'   # for CGI.escape
> 
> 	google = Hpricot(open("http://www.google.com/search?name=f&hl=en&q=#{CGI.escape($*.join(' '))}"))
> 	(google/"h2.r/a").each {|line| puts line.to_s.gsub(/^.+href="/,'').gsub(/" .+$/,'')}
> 
> So my first question is: can I combine the two gsub statements above
> into a single gsub, which should be faster? Or is there an even better
> way than gsub to clean up the results?
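
On merging the two gsub calls: a single regex with a capture group can
do the job. This is a minimal sketch, assuming each result link is
rendered as `<a href="..." ...>` with a double-quoted href; the helper
name `extract_href` and the sample HTML are made up for illustration.

```ruby
# Hypothetical helper: pull the href out of an anchor's HTML with one
# regex capture instead of two chained gsub calls.
def extract_href(anchor_html)
  # [^"]* stops at the closing quote, so any trailing attributes or link
  # text are ignored. Returns nil if there is no double-quoted href.
  anchor_html[/href="([^"]*)"/, 1]
end

# Made-up sample of what one result link might look like.
sample = '<a href="http://www.ruby-lang.org/" class="l">Ruby</a>'
puts extract_href(sample)
```

Hpricot elements also expose their attributes directly, e.g.
`line.attributes['href']`, which skips the string munging altogether.
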
> 
> And my next question: how can I get all the results, not just those
> from the first page?

Look into mechanize or scrubyt for this. They sit on top of hpricot, but
are much better suited to screen scraping applications than hpricot alone.
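
If you want to stay with plain hpricot, one option is to request each
results page yourself. A minimal sketch, assuming Google pages its
results through a zero-based `start` offset (0, 10, 20, ...) with 10
results per page; both the parameter and the page size are assumptions,
and `google_page_urls` is a hypothetical helper:

```ruby
require 'uri'

# Hypothetical helper: build one search URL per results page.
# Assumes a zero-based "start" offset and per_page results per page.
def google_page_urls(query, pages, per_page = 10)
  (0...pages).map do |page|
    # encode_www_form escapes the query (spaces become "+").
    params = URI.encode_www_form(hl: 'en', q: query, start: page * per_page)
    "http://www.google.com/search?#{params}"
  end
end

google_page_urls('ruby hpricot', 3).each { |url| puts url }
```

Each URL could then go through the same `Hpricot(open(...))` call as in
your script, stopping once a page yields no `h2.r/a` matches. With
mechanize you would instead follow the "Next" link on each page.
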

> greets
> kazaam <kazaam / oleco.net>
--Greg