On Sat, 2006-11-25 at 09:00 +0900, _why wrote:
> On Sat, Nov 25, 2006 at 04:50:07AM +0900, Ross Bamford wrote:
> > In terms of preserving the original HTML, I found the libxml2 and Hpricot
> > parsers to be fairly even, with both doing a pretty good job of fixing up
> > broken HTML.
>
> Thanks, Ross, that was great. Libxml2 has HTML fixup stuff? That's
> sensational. Are the bindings pretty stable?

Surely does:

http://xmlsoft.org/html/libxml-HTMLparser.html

It's a new addition to the bindings (still in CVS), but it's really 'just
another parser' and uses the same (reasonably well tested) parser context
and tree bindings as the regular XML parsers.

-- 
Ross Bamford - rosco / roscopeco.REMOVE.co.uk