Henry Maddocks wrote:

> Sorry, try again...
> 
> Not sure where to send this, sorry if it's not the right place...
> 
> The html in the attached file renders 'correctly' in the 3 browsers I
> have tried but it tricks hpricot because of the second malformed
> comment. When I say correctly I mean I get to see 'Some text'. I
> guess it could be argued that this is incorrect. For my application
> it would be nice if hpricot behaved like a browser.

You have created a new thread, and you have not attached any prior text.
This requires us to start over.

Tell us what you hoped would happen, what happened instead, and how they
differ.

If your goal is to filter particular content from HTML pages, just say so,
and be specific about what you want and don't want. Given this information,
I will show you how to extract the desired content with a few lines of
Ruby, no fuss, no undue complexity, no Hpricot.

IIRC, you had asked for help using Hpricot to extract the text between <p>
and </p> tags, with the added requirement that a paragraph qualifies only
if it contains an IMG tag. Is this still the goal? If so, how did my
previously posted, simple solution work out for you?
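
For reference, here is a minimal sketch of that kind of extraction in plain
Ruby, assuming the requirement is as I have restated it above. It is not
necessarily the code posted earlier: the method name
paragraphs_with_images and the sample HTML are made up for illustration,
and a regular expression like this assumes paragraphs are not nested and
are properly closed.

```ruby
# Hypothetical helper: return the text of each <p> ... </p> block that
# contains at least one <img> tag, with all markup stripped.
def paragraphs_with_images(html)
  html.scan(%r{<p\b[^>]*>(.*?)</p>}mi)       # capture each paragraph body
      .map(&:first)                          # scan yields one-element arrays
      .select { |body| body =~ /<img\b/i }   # keep bodies that contain an IMG tag
      .map { |body| body.gsub(/<[^>]*>/, " ").gsub(/\s+/, " ").strip }
end

# Made-up sample input, for illustration only.
sample = <<~HTML
  <p>No picture here.</p>
  <p>Some text <img src="a.png" alt=""> and more.</p>
  <P class="x"><IMG SRC="b.png">Second match</P>
HTML

puts paragraphs_with_images(sample)
# prints:
#   Some text and more.
#   Second match
```

If the real pages have unclosed or nested paragraphs, the pattern will need
adjusting, which is exactly why I am asking for specifics.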

This is a scene in a much larger play, one in which someone says, "Wow, I
had no idea there was such a powerful library, so carefully designed, so
complete. But, notwithstanding its extraordinary features, notwithstanding
the hundreds of man-hours expended creating it ... I can't get it to do
what I want."

This is a very common refrain. I think I can solve your problem with a few
lines of Ruby code, code that you can easily understand and adapt to
specific and evolving requirements. And if I cannot do this, I will say so.

-- 
Paul Lutus
http://www.arachnoid.com