On Sun, Jul 17, 2011 at 12:34 PM, Rousan Malik <rousanmalik / gmail.com> wrote:
> I will try to elaborate on this. All I want is to parse an XML file
> which contains special characters. But my parser fails because it cannot
> open the file correctly (because of the special characters). So I
> decided to write the file contents to a new file by removing the special
> characters and then parse it. My sample input file (input.xml) with
> special characters is as follows:
>
> <configuration><database>
> <step type="general" datetime="Thu Jul 07 23:44:48 -0600 2011">
> <message>Validating element: %F0wt^b%99%90%94%D4N%8D%FA%8A%EE%81_
> g%9B@I%E3%F6%FCp%AFX%BD%80%91%B5pEK%C9!j%D3%F3S Y%C3%F6B~%C8%FC
> ^%87%C4%F2]! %B9%DF=%E7Y%B9</message>
> </step>
> </database></configuration>

This sample is not like your original example, which wasn't even
valid XML. However, if you're working with XML you shouldn't be
wasting time with any regex-based approach. Use nokogiri, which
can parse the above example just fine, and with which you can
easily accomplish your goal.

> <message>TEXT REMOVED</message>

-- 
Hassan Schroeder ------------------------ hassan.schroeder / gmail.com
http://about.me/hassanschroeder
twitter: @hassan