On Dec 5, 6:13 pm, "Michael W. Ryder" <_mwry... / worldnet.att.net>
wrote:
> I am trying to process an XML file that includes various codes.  The
> problem I am running into is that some of these codes are inserted into
> the middle of an encrypted string.  If I display the file using a
> browser these codes do not show up and copying and pasting the string
> works fine.  The problem occurs when I try to strip out the string in a
> program and these "extraneous" XML codes are included.  This of course
> makes the decryption routine crash.
> What I am looking for is a simple way to read through the file and
> remove all the XML codes leaving just plain text.  I could probably
> write a series of regular expressions to remove each code that I can
> find in my text but am afraid I might miss some and it will come back to
> haunt me at a later time.

str.gsub(%r{</?[^>]+>}, '')

This will only be a problem if your XML file is legal and has a CDATA
section that contains a literal < character (not &lt;), like:

   for ( var i=0, len=a.length; i<len; ++i )

In that case you likely want to use a proper XML parser (like REXML)
instead of a regex.
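
Here is a rough sketch of the parser route, in case it helps. The
plain_text helper and the sample document are made up for illustration,
and it assumes REXML is available (it ships with Ruby, but as a bundled
gem on 3.0 and later). It walks the tree and keeps only the text nodes,
so CDATA content survives where the regex throws it away:

```ruby
require 'rexml/document'

# Hypothetical helper: recursively collect the text of every text node.
# REXML::CData is a subclass of REXML::Text, so CDATA sections are included.
def plain_text(node)
  node.children.map do |child|
    case child
    when REXML::Text    then child.value        # entity references are unescaped
    when REXML::Element then plain_text(child)  # descend into nested elements
    else ''                                     # skip comments, PIs, etc.
    end
  end.join
end

# Made-up sample with a CDATA section containing a literal <
xml = '<msg>abc<b>def</b><![CDATA[ i<len ]]>ghi</msg>'

doc = REXML::Document.new(xml)
puts plain_text(doc.root)             # prints: abcdef i<len ghi

# For comparison, the regex drops the whole CDATA section:
puts xml.gsub(%r{</?[^>]+>}, '')      # prints: abcdefghi
```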

Do you really want to remove the XML, or would it suffice to just:

  str.gsub! '&', '&amp;'
  str.gsub! '<', '&lt;'
  str.gsub! '>', '&gt;'
(and maybe even)
  str.gsub! '"', '&quot;'
  str.gsub! "'", '&apos;'

to make your string valid and escaped for use in an HTML context?
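
If escaping is all you need, the standard library can do it in one
call, which also avoids getting the order wrong (the & replacement has
to run first, or it re-escapes the entities produced by the other
substitutions). A small sketch, where escape_for_html is just a name I
picked for the wrapper; note that CGI.escapeHTML emits &#39; rather than
&apos; for the single quote:

```ruby
require 'cgi'

# Hypothetical wrapper around the stdlib escaper.
# Escapes & < > " and ' in a single pass and returns a new string.
def escape_for_html(str)
  CGI.escapeHTML(str)
end

puts escape_for_html(%q{a < b && "c" > 'd'})
# prints: a &lt; b &amp;&amp; &quot;c&quot; &gt; &#39;d&#39;
```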