SpringFlowers AutumnMoon wrote:
> Would a good HTML parser be Hpricot? 

It definitely is.

> I wonder if anyone knows an easy
> way for it to get all text of an HTML file?   (removing all formatting
> tags).

It looks like #inner_text removes all tags and leaves only the plain
text content. Note that it won't convert <br> and <p> tags to newlines;
it really just strips tags. If you want more sophisticated text output,
you should iterate over the elements yourself and implement your own
handling for the specific ones you care about.
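
Here's a rough sketch of what I mean, not a definitive implementation.
The helper name text_with_breaks and the BREAK_TAGS list are my own
inventions for illustration. I'm assuming Hpricot nodes respond to
#text?, #elem?, #name, #children and #to_s the way I remember, so check
that against the version you have installed.

```ruby
# Hpricot is an external gem (gem install hpricot). The demo at the
# bottom only runs when the gem is actually available.
begin
  require 'hpricot'
rescue LoadError
  # Gem not installed: the helper below can still be loaded.
end

# Tags after which a newline is emitted (an assumption; adjust to taste).
BREAK_TAGS = %w[br p div li].freeze

# Hypothetical helper: recursively collects the text of a node and adds
# a newline after each element whose tag name is listed in BREAK_TAGS.
def text_with_breaks(node)
  return node.to_s if node.text?
  # Leaf nodes such as comments have no children; contribute nothing.
  return '' unless node.respond_to?(:children)

  # Empty elements (e.g. <br>) may report nil instead of an empty list.
  body = (node.children || []).map { |child| text_with_breaks(child) }.join
  if node.elem? && BREAK_TAGS.include?(node.name.downcase)
    body + "\n"
  else
    body
  end
end

if defined?(Hpricot)
  doc = Hpricot("<p>Hello <b>world</b></p><p>Second<br>line</p>")
  puts doc.inner_text         # tags stripped, no newlines added
  puts text_with_breaks(doc)  # newline after each <p> and <br>
end
```

The walker only depends on those few node methods, so you can extend
BREAK_TAGS or add special cases (say, for links or list items) without
touching the rest.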

mortee