Phlip wrote:
> SpringFlowers AutumnMoon wrote:
>> Would a good HTML parser be Hpricot?
>>   
> It's extremely good; try it and see!
>>   I wonder if anyone knows an easy
>> way for it to get all text of an HTML file?   (removing all formatting
>> tags).
>>   
> 
> .each_element( './/text()' ){}.join() might do it.

Does anyone know where to go from this:

require 'hpricot'
doc = Hpricot("<b>hello <i>world</i></b>")


What can I do to get "hello world"?
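
Following Phlip's text() suggestion, one approach is to collect only the text nodes and join them. Here is a minimal sketch that uses Ruby's REXML library instead of Hpricot, so the input has to be well-formed XML (the example fragment is). The helper name text_of is hypothetical:

```ruby
require 'rexml/document'

# Hypothetical helper: parse a well-formed fragment with REXML,
# select every text node via XPath, and join their values.
def text_of(markup)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, '//text()').map(&:value).join
end

puts text_of("<b>hello <i>world</i></b>")  # prints "hello world"
```

REXML will typically reject sloppy real-world HTML (unclosed tags, bare ampersands), and that is exactly where an HTML-tolerant parser like Hpricot is more useful.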

The wiki page at
http://code.whytheluckystiff.net/hpricot/wiki/HpricotChallenge#StripallHTMLtags
says to just use

str=doc.to_s
print str.gsub(/<\/?[^>]*>/, "")

But can't the < and > characters be nested in some HTML? If they are, the
regex above won't work, it seems.
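
The worry seems justified, though the usual culprit is not a tag nested inside another tag. It is a > inside a quoted attribute value or a comment: the regex ends the match at the first >, so the rest of the tag leaks into the output. Below is a minimal sketch of a quote-aware stripper built on Ruby's strscan library. The names strip_tags and TAG are my own assumptions, not part of Hpricot. It is not a full HTML parser either: it does not decode entities like &amp; or special-case <script> and <style> contents.

```ruby
require 'strscan'

# The regex from the wiki page: a tag ends at the first '>'.
NAIVE = /<\/?[^>]*>/

# Hypothetical tag pattern: '<', optional '/', a letter, then any mix of
# unquoted characters and quoted attribute values, up to the closing '>'.
TAG = /<\/?[A-Za-z][^>"']*(?:(?:"[^"]*"|'[^']*')[^>"']*)*>/

# Strip comments and tags, keeping the text between them.
def strip_tags(html)
  s = StringScanner.new(html)
  out = +""
  until s.eos?
    if s.scan(/<!--.*?-->/m) || s.scan(TAG)
      next                    # drop the comment or tag
    elsif (text = s.scan(/[^<]+/))
      out << text             # plain text up to the next '<'
    else
      out << s.getch          # a lone '<' that does not start a tag
    end
  end
  out
end

html = %q{<p>x <a title="a > b">link</a><!-- c > d --></p>}
puts html.gsub(NAIVE, "")     # leaves attribute and comment debris behind
puts strip_tags(html)         # prints "x link"
```

Checking for quotes is what lets the scanner skip past a > inside an attribute value. For messy pages, a real parser is still the safer route.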


-- 
Posted via http://www.ruby-forum.com/.