The post at http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/207625
answers most of my requirements, except one.

How can I make traverse_text selective, so that it skips the text inside
specific tags?
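One possible approach, sketched here under assumptions rather than taken from Hpricot itself: instead of filtering each text node after the fact, walk the tree yourself and refuse to descend into any element whose tag is on a skip list, so text at any depth inside <java> is never visited. The Doc, Elem and Text Structs and the helper traverse_text_except below are hypothetical stand-ins for a parsed document, not Hpricot's classes or API.

```ruby
# Hypothetical minimal tree model standing in for a parsed HTML document.
Doc  = Struct.new(:children)
Elem = Struct.new(:name, :children)
Text = Struct.new(:content)

# Yield every Text node under +node+, but never descend into an element
# whose tag name appears in +skip+ (its whole subtree is pruned).
def traverse_text_except(node, skip, &block)
  case node
  when Text
    yield node
  when Elem
    return if skip.include?(node.name)
    node.children.each { |child| traverse_text_except(child, skip, &block) }
  when Doc
    node.children.each { |child| traverse_text_except(child, skip, &block) }
  end
end

# Part of the sample HTML, including a <java> nested inside an <li>
# and a whitespace-only text node directly under the document.
tree = Doc.new([
  Elem.new("h4",   [Text.new("Abcd")]),
  Elem.new("java", [Text.new("this is in java1")]),
  Elem.new("ul", [
    Elem.new("li", [Text.new("aabbcc")]),
    Elem.new("li", [Elem.new("java", [Text.new("this is in java2")])])
  ]),
  Text.new("\n")
])

seen = []
traverse_text_except(tree, ["java"]) do |text|
  t = text.content.strip
  seen << t unless t.empty?
end
p seen
```

Because pruning happens on the way down, the block never has to look at a node's parent at all.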

One option I considered was to check parent.name while traversing the
text nodes. Here is the code I tried:

require 'hpricot'
# Add a setter so the traverse_text block can rewrite a text node in place
class Hpricot::Text
	def set(string)
		@content = string
		self.raw_string = string
	end
end

s = <<HTML
<html>
	<body>
		<h4>Abcd</h4>
		<java>this is in java1</java>
		<ul>
			<li>aabbcc</li>
			<li>mmnnoo</li>
			<li><java>this is in java2</java></li>
		</ul>
		<java>this is in java3</java>
	</body>
</html>
HTML

index = Hpricot.parse(s)
index.traverse_text { |text|
	t = text.to_s.strip
	if text.parent and text.parent.name and text.parent.name != 'java' and not t.empty?
		t = "=#{t}="
		text.set(t)
		puts "Modified text to:#{t}"
	end
}
puts index


I get the following output and error:

Modified text to:=Abcd=
Modified text to:=aabbcc=
Modified text to:=mmnnoo=
hpricot-test1.rb:30: undefined method `name' for #<Hpricot::Doc:0x2e49c18> (NoMethodError)
        from c:/ruby/lib/ruby/gems/1.8/gems/hpricot-0.4-mswin32/lib/hpricot/traverse.rb:377:in `traverse_text_internal'
        from c:/ruby/lib/ruby/gems/1.8/gems/hpricot-0.4-mswin32/lib/hpricot/traverse.rb:366:in `traverse_text_internal'
        from c:/ruby/lib/ruby/gems/1.8/gems/hpricot-0.4-mswin32/lib/hpricot/traverse.rb:146:in `each'
        from c:/ruby/lib/ruby/gems/1.8/gems/hpricot-0.4-mswin32/lib/hpricot/traverse.rb:146:in `each_child'
        from c:/ruby/lib/ruby/gems/1.8/gems/hpricot-0.4-mswin32/lib/hpricot/traverse.rb:366:in `traverse_text_internal'
        from c:/ruby/lib/ruby/gems/1.8/gems/hpricot-0.4-mswin32/lib/hpricot/traverse.rb:358:in `traverse_text'
        from hpricot-test1.rb:28


Am I making a mistake?

I am new to Ruby and Hpricot, so please bear with me.

- Siddharth