On Mon, Oct 02, 2006 at 11:38:05PM +0900, HH wrote:
} I've been messing with Hpricot and I'm trying to do a few things that
} apparently aren't documented or available as part of Hpricot.  Can
} someone verify the following...
} 
} 1)  Is there a simple way to determine the element's current path /
} location?  For example, if I find a text node, is there a simple way to
} determine the path of that text node so I can find it again later using
} that path / location as a parameter to the search method?  I assume I
} can use the parent method to find the parent and recurse through until
} I get to the root node...is there an easier way?

I have been using the recursive (well, iterative, actually) way: follow
parent from the node up to the root, recording each step. I suspect that
is the intended approach, since the tree structure is deliberately simple
and designed to let you move nodes around arbitrarily. Maintaining a
node's path independently of its structural location would be inefficient
at best and impossible at worst.
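
Roughly, the iterative walk looks like the sketch below. It is only an
illustration: Node is a hypothetical stand-in Struct rather than an Hpricot
class, node_path is a helper name I made up, and the code assumes each node
responds to name, parent and children. Real text nodes may need special
handling for the name step.

```ruby
# Hypothetical stand-in for a parsed tree node (NOT an Hpricot class).
Node = Struct.new(:name, :parent, :children) do
  # Append a new child with the given tag name and return it.
  def add(name)
    child = Node.new(name, self, [])
    children << child
    child
  end
end

# Walk up from the node to the root via #parent, recording each node's
# name and its 1-based position among same-named siblings.
def node_path(node)
  steps = []
  while node.parent
    same = node.parent.children.select { |c| c.name == node.name }
    # Compare by identity so two look-alike siblings are told apart.
    index = same.index { |c| c.equal?(node) } + 1
    steps.unshift("#{node.name}[#{index}]")
    node = node.parent
  end
  '/' + steps.join('/')
end

root = Node.new(nil, nil, [])   # plays the role of the document
html = root.add('html')
body = html.add('body')
body.add('p')
second_p = body.add('p')
puts node_path(second_p)        # prints /html[1]/body[1]/p[2]
```

The string is just a record of where the node sat when you looked. I have
not checked whether search accepts positional steps like p[2], so you may
have to resolve the steps by hand, and the path goes stale as soon as nodes
are moved.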

} 2)  Is there a simple way to find all elements with non-empty text
} nodes?  It appears that Hpricot is focused on providing methods for
} finding something if you know the element tag / attributes / classes /
} etc.  I've been using traverse_text which requires going through every
} text node and filtering out the ones that are empty / whitespace.  Is
} there an easier way to find all elements with non-empty text nodes?

nodes = []
# Keep the parent of every text node that has at least one non-whitespace character.
doc.traverse_text { |t| nodes << t.parent if t.content =~ /\S/ }

} This is in reference to parsing HTML pages which may or may not be
} well-formed.

I've found Hpricot to be remarkably resilient in parsing questionable HTML.

} All in all - I really like Hpricot.  I was using REXML and tidy before,
} but this is a lot simpler and faster!
} 
} Thanks to _why the lucky stiff for a great little HTML parser...

I'll second that.
--Greg