On 2007-05-31 02:36:57 -0700, "Richard Conroy" <richard.conroy / gmail.com> said:

> On 5/31/07, Dick Davies <rasputnik / gmail.com> wrote:
>> Hpricot is a good starting point.
> 
> Yeah Hpricot is good, but in general the quality of the Ruby web scraping
> choices is pretty impressive. There are variants that are just built on top
> of Hpricot but provide an even simpler API.
> 
> However, your second problem, where you encounter alternate encodings,
> is a bit trickier. To do any kind of real work with multiple code
> pages, you want to be converting it to Unicode (UTF-8) at fetch time.
> 

I've had great success with this. Just make sure you're using a recent
Ruby (1.8.5 or later) that includes the NKF library and you should be
fine.
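
For what it's worth, here is a minimal sketch of the convert-at-fetch-time
idea using NKF. The helper name to_utf8 and its input_flag parameter are
made up for illustration, and NKF is mainly aimed at Japanese encodings,
so treat this as a starting point rather than a general solution.

```ruby
require 'nkf'

# Hypothetical helper: convert raw bytes (e.g. a fetched page body) to UTF-8.
# input_flag is optional and names the source encoding for NKF:
#   '-S' (Shift_JIS), '-E' (EUC-JP), '-J' (ISO-2022-JP).
# When it is nil, NKF guesses the input encoding itself.
def to_utf8(raw, input_flag = nil)
  # -w  : emit UTF-8
  # -m0 : do not try to decode MIME-encoded words
  opts = ['-w', '-m0', input_flag].compact.join(' ')
  NKF.nkf(opts, raw)
end

# "日本語" as Shift_JIS bytes, standing in for a freshly fetched body
sjis_body = "\x93\xFA\x96\x7B\x8C\xEA"
utf8_body = to_utf8(sjis_body, '-S')
```

Calling this on the body right after the fetch, before handing it to
Hpricot, means the parsing code only ever sees UTF-8. If the page
declares its charset, passing the matching flag is probably safer than
relying on NKF's guess.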