On Sat, 2006-11-25 at 09:00 +0900, _why wrote:
> On Sat, Nov 25, 2006 at 04:50:07AM +0900, Ross Bamford wrote:
> > In terms of preserving the original HTML, I found the libxml2 and Hpricot
> > parsers to be fairly even, with both doing a pretty good job of fixing up
> > broken HTML.
> 
> Thanks, Ross, that was great.  Libxml2 has HTML fixup stuff?  That's
> sensational.  Are the bindings pretty stable?

Surely does: http://xmlsoft.org/html/libxml-HTMLparser.html . It's a new
addition to the bindings (still in CVS), but it's really 'just another
parser' and uses the same (reasonably well-tested) parser context / tree
bindings as the regular XML parsers.

-- 
Ross Bamford - rosco / roscopeco.REMOVE.co.uk