On 21-Nov-06, at 5:27 PM, Wes Gamble wrote:

> Has anyone done a head to head comparison of Hpricot and Rubyful Soup
> (both HTML parsers)?
>
> If so, would you be willing to comment on which one a) is faster for an
> average sized HTML page and b) preserves the original HTML better.

I switched from Rubyful Soup to Hpricot a while ago. The reason was
performance on 1000-2000 character HTML chunks -- I didn't run a
benchmark because there was no need to: Hpricot is *a lot* faster.
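
For anyone who does want numbers, here is a minimal sketch of a timing
harness built on Ruby's standard Benchmark module. The method name
compare_parsers, the sample markup, and the two stand-in "parsers" are
hypothetical and only there so the sketch runs without either library
installed; to compare the real thing you would swap in callables that
wrap whichever parsers you have available.

```ruby
require 'benchmark'

# Times each callable over the same HTML chunk and returns
# { label => elapsed seconds }. `parsers` maps a label to any object
# that responds to #call(html). (Hypothetical helper, not from either library.)
def compare_parsers(parsers, html, iterations = 100)
  parsers.each_with_object({}) do |(label, parser), timings|
    timings[label] = Benchmark.realtime do
      # Hand each run a fresh copy so a parser that mutates its input
      # cannot affect later iterations.
      iterations.times { parser.call(html.dup) }
    end
  end
end

# Sample chunk in the 1000-2000 character range (made-up markup).
sample = '<p><a href="/a">a</a> <img src="/b.png"></p>' * 30

# Stand-in "parsers" (assumptions for illustration only). Replace them
# with lambdas that call the real libraries when those are installed.
timings = compare_parsers(
  {
    'regex scan'   => ->(h) { h.scan(/<a\s[^>]*>/) },
    'string split' => ->(h) { h.split('<') }
  },
  sample
)

timings.each { |label, secs| puts format('%-12s %.4fs', label, secs) }
```

The absolute times will vary by machine and Ruby version; only the
ratio between the two entries is worth reading.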

I have no idea which preserves HTML better; I only use them to find
specific bits of the HTML (e.g. links, images, a few other things). I
do not use either to transform the input HTML, and I *always* keep the
input as it was. In all cases I have the HTML in a string that I give
to the parser. I do know that with Rubyful Soup it was absolutely
necessary to dup the string first, or the parser was liable to make
changes to the input string.
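
The dup precaution can be sketched as below. Note that mutating_parse
is a hypothetical stand-in for any parser that edits its argument in
place; it is not Rubyful Soup's actual code, and the sample markup is
made up.

```ruby
# Hypothetical stand-in for a parser that modifies its argument in place.
def mutating_parse(html)
  html.gsub!(/\s+/, ' ')                 # in-place whitespace normalisation
  html.scan(/href="([^"]*)"/).flatten    # return the link targets found
end

original = "<a href=\"/one\">1</a>\n\n<a href=\"/two\">2</a>"

# Safe: the parser only ever sees a copy, so `original` is untouched.
links = mutating_parse(original.dup)

# Unsafe, for contrast: the string handed over is changed by the parser.
unprotected = original.dup
mutating_parse(unprotected)

puts links.inspect
puts original == unprotected   # the unprotected string no longer matches
```

String#dup makes a shallow copy, which is enough here because the
parser only needs its own string object to scribble on.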

Cheers,
Bob

>
> Thanks,
> Wes

----
Bob Hutchison                  -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc.          -- <http://www.recursive.ca/>
Raconteur                      -- <http://www.raconteur.info/>
xampl for Ruby                 -- <http://rubyforge.org/projects/xampl/>