On 2007-05-31 02:36:57 -0700, "Richard Conroy" <richard.conroy / gmail.com> said:

> On 5/31/07, Dick Davies <rasputnik / gmail.com> wrote:
>> Hpricot is a good starting point.
>
> Yeah Hpricot is good, but in general the quality of the Ruby web scraping
> choices is pretty impressive. There are variants that are just built on top
> of Hpricot but provide an even simpler API.
>
> However your second problem is a bit trickier, where you encounter
> alternate encodings. To do any kind of real work with multiple code
> pages you want to be converting it to unicode (UTF-8) at fetch time.

I've had great success with this. Just make sure you're using Ruby 1.8.5 or later (which includes the NKF library) and you should be fine.
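Roughly what I mean by converting at fetch time, as a minimal sketch: run the raw page body through NKF once, right after you download it, so everything downstream (Hpricot included) only ever sees UTF-8. The `to_utf8` helper and its `input_flag` argument are my own hypothetical names, not part of any library. Keep in mind that NKF (Network Kanji Filter) is aimed at Japanese encodings; for other code pages you'd want something like Iconv instead.

```ruby
require 'nkf'

# Hypothetical helper: normalise a fetched page body to UTF-8 with NKF.
# input_flag is an NKF input-encoding option such as '-S' (Shift_JIS),
# '-E' (EUC-JP) or '-J' (ISO-2022-JP); when nil, NKF guesses the encoding.
def to_utf8(body, input_flag = nil)
  # '-w'  : emit UTF-8
  # '-m0' : turn off MIME decoding so "=?...?=" text in a page is left alone
  opts = ['-w', '-m0', input_flag].compact.join(' ')
  NKF.nkf(opts, body)
end
```

If the server's Content-Type header or a meta tag tells you the charset, pass the matching flag (e.g. `to_utf8(page_body, '-S')` for Shift_JIS); otherwise leave it off and let NKF guess, which works better on longer documents than on short snippets.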