Siddharth Karandikar wrote:

> Here is the scenario,
> 
> I am trying to have my blog in 2 languages. English and my native
> language 'marathi'. The blog posts will be written in plain text. Using
> bluecloth, I am generating required html markup.
> I have hacked bluecloth to spit <english>...</english> in required
> places,
> 
> e.g.
> ### <E title E>
> 
> will generate
> <h3><english>title</english></h3>
> 
> Now when I get this kind of html, I would like to skip all the text
> under 'english' tag and convert all the remaining text to my language
> 'marathi' (utf8 codes). Using Hpricot for this.

Okay, that sounds a great deal more complex than a typical text extraction
task from an HTML page. I assume you mean to preserve some parts unchanged,
while translating other parts, and reassemble the page at the end of the
process.

This could be done using your own custom code, but only if a much more
specific, detailed description were offered. The same thing could be said
of an Hpricot-based approach, by the way.
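
For what it is worth, here is a minimal sketch of one possible reading of the task: leave markup and <english> spans untouched, convert every other run of text, and reassemble the page. The method names to_marathi and convert_outside_english are hypothetical, and to_marathi is only a placeholder for whatever conversion you actually use. It assumes <english> tags are not nested and that text never contains a bare "<".

```ruby
# Hypothetical stand-in for the real English-to-Marathi conversion step.
def to_marathi(text)
  "[mr:#{text}]"
end

# Matches either a whole <english>...</english> span or any single tag.
# The capturing group makes String#split keep the matched pieces.
TOKEN = %r{(<english>.*?</english>|<[^>]*>)}im

def convert_outside_english(html)
  html.split(TOKEN).map { |chunk|
    if chunk.start_with?("<") || chunk.strip.empty?
      chunk              # markup, <english> span, or whitespace: keep as-is
    else
      to_marathi(chunk)  # plain text outside <english>: convert
    end
  }.join
end

puts convert_outside_english("<h3><english>title</english></h3><p>hello</p>")
```

A real page with comments, scripts, or attributes containing ">" would need a proper parser rather than this regex.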

> After that I am thinking of removing all the 'english' tags but keeping
> the markup surrounding them.

Okay, that part is easy:

data.gsub!(%r{<english>.*?</english>}im,"")
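
Note that "removing the tags" can be read two ways, and the line above implements only the first. A short sketch of both, using the sample markup from your message (the variable names are just for illustration):

```ruby
data = "<h3><english>title</english></h3>"

# Reading 1: drop each <english> element together with the text inside it.
without_elements = data.gsub(%r{<english>.*?</english>}im, "")

# Reading 2: drop only the <english> and </english> tags, keep the text.
without_tags = data.gsub(%r{</?english>}i, "")

puts without_elements  # <h3></h3>
puts without_tags      # <h3>title</h3>
```

In both cases the surrounding <h3> markup is preserved. Which one you want depends on whether the English text should survive in the final page.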

Most tasks in this class are easy to accomplish, as long as the description
is clear and detailed enough.

-- 
Paul Lutus
http://www.arachnoid.com