Here is the scenario.

I am trying to publish my blog in two languages: English and my native
language, Marathi. The blog posts will be written in plain text, and I am
using BlueCloth to generate the required HTML markup.
I have hacked BlueCloth to emit <english>...</english> in the required
places,

e.g.
### <E title E>

will generate
<h3><english>title</english></h3>
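
For illustration only, the marker rewrite could be sketched as a plain
string substitution run before the text reaches BlueCloth. This is not
necessarily how my hack works; the helper name mark_english is made up for
this example, and turning '###' into <h3> is still left to BlueCloth.

```ruby
# Hypothetical helper (not part of BlueCloth): rewrites <E ... E> markers
# into <english>...</english> tags in the plain-text source.
def mark_english(text)
  # Lazy match so that several markers on one line are handled separately.
  text.gsub(/<E\s+(.*?)\s+E>/, '<english>\1</english>')
end

puts mark_english('### <E title E>')
```

This prints '### <english>title</english>'.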

Once I have this kind of HTML, I would like to skip all the text
inside 'english' tags and convert all the remaining text to my language,
Marathi (UTF-8 encoded). I am using Hpricot for this.

After that, I plan to remove all the 'english' tags while keeping
the markup surrounding them.
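
To make the goal concrete, here is a rough sketch of both steps in plain
Ruby, without Hpricot. It assumes the HTML is well formed and that every
<english> tag is properly closed. The to_marathi method is a hypothetical
stand-in for the real conversion (it only upcases, so the effect is
visible), and localize is a name made up for this example.

```ruby
# Hypothetical placeholder for the real English-to-Marathi conversion.
def to_marathi(text)
  text.upcase
end

# Split the markup into tags and text runs, convert only the text that is
# outside <english>...</english>, and drop the <english> tags themselves.
def localize(html)
  depth = 0 # how many <english> elements we are currently inside
  html.scan(/<[^>]*>|[^<]+/).map do |token|
    case token
    when /\A<english>\z/i
      depth += 1
      ''                      # drop the opening tag
    when /\A<\/english>\z/i
      depth -= 1
      ''                      # drop the closing tag
    when /\A</
      token                   # any other tag is kept unchanged
    else
      depth > 0 ? token : to_marathi(token)
    end
  end.join
end

puts localize('<h3><english>title</english></h3><p>namaskar <english>Ruby</english></p>')
```

With the placeholder conversion this prints
'<h3>title</h3><p>NAMASKAR Ruby</p>': the text inside 'english' is left
alone, everything else is converted, and the surrounding markup survives.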

- Siddharth

On Dec 13, 3:34 pm, Paul Lutus <nos... / nosite.zzz> wrote:
> Siddharth Karandikar wrote:
> > http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/207625
> > is an answer to most of my requirements, except one.
>
> > How can I do a selective traverse_text so that I can skip text of
> > specific tags?
>
> / ... snip lengthy listing of Hpricot error messages
>
> > Am I making any mistake?
>
> Rather than describe the problems you are having trying to make Hpricot
> deliver a particular result, why not say what you are trying to accomplish
> and we can discuss that instead?
>
> Parsing and extracting particular text from syntactically correct HTML pages
> is relatively easy. It only requires a few lines of Ruby code. You can
> choose which tags to extract text from, and leave all the others behind.
>
> In some cases, it is simpler to write your own extraction code than to try
> to get a library to do this for you. But this approach requires that the
> HTML pages be reasonably error-free -- it doesn't work very well if there
> are errors in the syntax of the source pages.
>
> If the pages you have to parse are reasonably error-free, you may have a
> much easier time getting what you are after than you may think at this
> point.
> 
> --
> Paul Lutus
> http://www.arachnoid.com