Andrei Maxim wrote:
> Hi guys,
> 
> I'm starting to learn Ruby and I was thinking about a little app so I can
> get things started as quickly as possible. Since I'm an avid blog reader,
> the first thing that went through my mind was a small app that would extract
> the RSS or Atom feed from a web page, given the URL.
> 
> My first choice was regexps but I'm thinking that my little app may grow a
> little bit more in the not-so-distant future and I might be doing more than
> just extracting feeds.
> 
> I found:
> 
> * ymHTML at http://www.yoshidam.net/Ruby.html
> * RAA at http://raa.ruby-lang.org/project/html-parser-2/
> 
> but they don't look really standard and RAA doesn't look like it's currently
> maintained. I've also heard that there's a Rails HTML parser but I couldn't
> find more info (and pro'lly I'll ask on one of the Rails lists).
> 
> Is there a more "standard" way to parse HTML pages in Ruby?
The closest you'll find to a standard is REXML, an XML parser that ships 
in the stdlib.  REXML only accepts well-formed XML, though, so you'll want 
to run your HTML through Tidy first to turn it into XHTML - but that's an 
easy install.
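
As a rough sketch of the REXML route: this assumes the page has already 
been cleaned up into well-formed XHTML (by Tidy, say) and that it 
advertises its feeds with <link rel="alternate"> tags in the head.  The 
names feed_urls, FEED_TYPES and SAMPLE_PAGE are made up for illustration, 
and the sample markup leaves out the XHTML namespace to keep the XPath 
simple.

```ruby
require 'rexml/document'

# MIME types that usually mark a feed autodiscovery link (assumed list).
FEED_TYPES = ['application/rss+xml', 'application/atom+xml'].freeze

# Returns the href of every <link rel="alternate"> element whose type
# looks like an RSS or Atom feed. Expects well-formed XML as input.
def feed_urls(xhtml)
  doc = REXML::Document.new(xhtml)
  urls = []
  REXML::XPath.each(doc, '//link') do |link|
    rel  = link.attributes['rel'].to_s.downcase
    type = link.attributes['type'].to_s.downcase
    href = link.attributes['href']
    next unless href && rel.split.include?('alternate')
    urls << href if FEED_TYPES.include?(type)
  end
  urls
end

# Hypothetical, already-tidied page used only to exercise the method.
SAMPLE_PAGE = <<~XHTML
  <html>
    <head>
      <title>Some blog</title>
      <link rel="stylesheet" type="text/css" href="/style.css" />
      <link rel="alternate" type="application/rss+xml" href="http://example.com/feed.xml" />
      <link rel="alternate" type="application/atom+xml" href="http://example.com/atom.xml" />
    </head>
    <body><p>Hello</p></body>
  </html>
XHTML

puts feed_urls(SAMPLE_PAGE)
```

Feed raw, untidied HTML to this and REXML will most likely raise a parse 
error, which is why the Tidy step comes first.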

There are a couple of alternatives: Hpricot and html-parser spring 
instantly to mind.

If you're doing feed parsing, you probably also want to check out feedtools.

-- 
Alex