Kevin,

I settled on using Tidy to clean up the HTML, then parsing it into a tree with the HTML scanner that comes with Rails. Tidy does all the hard work of dealing with bad HTML and straightening it up. The HTML scanner is very lightweight and has a simple, clean API. You don't need to run Rails; just require the scanner library (look for html/document.rb).

It's two passes, but with Tidy written in C and the HTML scanner doing no cleanup, it's amazingly fast. I'm processing around 500Kb/s (on a mobile Core Duo at 1.8GHz).

You can walk the DOM, use XPath-like finders, or use my preferred method of looking up content: CSS selectors. If you're doing HTML scraping, this library will do all the hard work for you:

http://blog.labnotes.org/2006/07/11/scraping-with-style-scrapi-toolkit-for-ruby/

Assaf
http://labnotes.org

Kevin Weller wrote:
> Anybody have experience with a decent HTML parser for a Ruby
> application? I've looked around, and so far everything I've found is
> either unfinished, unstable, [relatively] undocumented, or just plain
> ugly in terms of API.
>
> I'd like a parser that can take a partial HTML file and return an
> easily-traversable data structure, in the same order that the elements
> appear in the file. I don't want or need a callback mechanism, only
> something I can iterate and tree-search. Though I don't hold much hope
> it will work, I will try using REXML on my text and see what it
> produces...results to be posted here. Thanks in advance!
>
> --
> Kevin Weller
> Information Technology Crucible
> http://www.itcrucible.com
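
For illustration only, here is a minimal Ruby sketch of the second pass discussed in this thread: iterating over a parsed document in order and searching the tree. It assumes the first pass has already turned the messy HTML into well-formed XHTML. It does not use Tidy or the Rails HTML scanner; REXML (distributed with Ruby) is swapped in so the example runs without extra libraries. The sample markup and the `element_names` helper are hypothetical.

```ruby
require 'rexml/document'

# Hypothetical sample: the kind of well-formed XHTML a cleanup pass
# might hand over. Real input would come from the first pass.
TIDIED = <<~XHTML
  <html>
    <body>
      <div class="post">
        <h1>First post</h1>
        <p>See <a href="http://labnotes.org">Labnotes</a>.</p>
      </div>
      <div class="post">
        <h1>Second post</h1>
      </div>
    </body>
  </html>
XHTML

doc = REXML::Document.new(TIDIED)

# Walk the tree depth-first, collecting element names in the same
# order the elements appear in the file (no callbacks involved).
def element_names(element, names = [])
  names << element.name
  element.each_element { |child| element_names(child, names) }
  names
end

names = element_names(doc.root)

# Tree search with XPath: every post heading, and every link target.
headings = REXML::XPath.match(doc, "//div[@class='post']/h1").map(&:text)
links = REXML::XPath.match(doc, '//a').map { |a| a.attributes['href'] }

puts names.join(' ')
puts headings.inspect
puts links.inspect
```

REXML only accepts well-formed input, which is why a cleanup pass has to come first. The CSS-selector lookups mentioned in the reply are not shown here, since REXML offers XPath rather than CSS selectors.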