Yan-Fa Li wrote:
> Yeah, I had the same problem recently.  I think that since HTML allows lax
> closing of elements, REXML will just barf.  In the end I used regular
> expressions to catch the lines I was interested in and to capture the
> fields I wanted.  It works really well.  There's also an HTML parser
> class based on the Python one, but it was so badly documented and seemed
> so poorly supported that I chose not to use it.
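
For anyone reading along, the regex approach described above might look
roughly like the sketch below. The sample markup, the variable names, and
the pattern are all made up for illustration; they are not taken from the
original poster's code, and a real page would need its own pattern.

```ruby
# Hypothetical sample input: table rows where some </td> tags are left out,
# which lax HTML permits but a strict XML parser rejects.
html = <<~HTML
  <table>
  <tr><td>alice<td>30
  <tr><td>bob</td><td>25</td>
  </table>
HTML

# Capture the text of the first two cells on a line; the closing </td>
# between them is optional.
row_pattern = /<td>([^<]*)(?:<\/td>)?\s*<td>([^<]*)/i

# Keep only the lines that match, and pull out the two captured fields.
rows = html.each_line.map do |line|
  m = row_pattern.match(line)
  [m[1].strip, m[2].strip] if m
end.compact

rows.each { |name, age| puts "#{name}: #{age}" }
```

This only holds up while each record sits on one line and the markup stays
fairly regular, which is the usual trade-off against a tolerant tokenizer.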

You might have more luck with mine:

http://rubyforge.org/projects/htmltokenizer/

It is more forgiving, and pretty easy to use.

Ben