Dhanasekaran Vivekanandhan wrote:

> yes, I want the text of the first <p> because it
> has an image. and reject if <p> has no image.

Hpricot might be able to do this, but you can also do it yourself with a
regular expression, and then you will know why the solution works.

---------------------------------------

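#!/usr/bin/ruby -w

# Read the whole file into one string.
data = File.read("test.html")

# Capture the text of each <p> whose text is followed directly by <img .../>.
# [^<]+? cannot cross a tag, so a <p> with no image never matches.
array = data.scan(%r{<p>([^<]+?)<img .*?/></p>})

p array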

---------------------------------------

Input text (test.html):

<p>don't want this text</p>

<p>want this text<img src=""/></p>
 
<p>don't want this text either</p>

<p>want this text too<img src=""/></p>

Output:

[["want this text"], ["want this text too"]]
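
If your real pages are less tidy than the test file (attributes on the <p>,
upper-case tags, an <img> without the trailing slash, text wrapped over
several lines), the same idea can be loosened a little. What follows is only
a sketch: the sample markup and the names html, pattern and texts are made up
for illustration, and it still assumes the image is the first tag inside the
paragraph.

```ruby
#!/usr/bin/ruby -w

# Hypothetical sample input, inlined so the sketch runs without test.html.
html = <<~HTML
  <p>don't want this text</p>
  <p class="intro">want this text<img src="a.png"></p>
  <P>want this
  text too<IMG src="b.png" /></P>
HTML

# Flags: x = extended syntax (whitespace and comments ignored), i = ignore case.
pattern = %r{
  <p\b[^>]*>      # opening p tag, with or without attributes
  ([^<]+?)        # capture: the text up to the next tag (newlines included)
  <img\b[^>]*>    # that next tag must be an img, self-closed or not
  \s*</p>         # then the paragraph closes
}xi

# scan returns one array per match; flatten it and tidy the whitespace.
texts = html.scan(pattern).flatten.map { |t| t.strip.gsub(/\s+/, " ") }

p texts
```

The key is still [^<]+?, which stops at the first tag, so the match only
succeeds when that first tag is an image. Using [^>]* inside the tags keeps
each piece of the pattern from running past the end of its own tag.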

-- 
Paul Lutus
http://www.arachnoid.com