On Dec 5, 6:13 pm, "Michael W. Ryder" <_mwry... / worldnet.att.net> wrote:
> I am trying to process an XML file that includes various codes. The
> problem I am running into is that some of these codes are inserted into
> the middle of an encrypted string. If I display the file using a
> browser these codes do not show up and copying and pasting the string
> work fine. The problem occurs when I try to strip out the string in a
> program and these "extraneous" XML codes are included. This of course
> makes the decryption routine crash.
> What I am looking for is a simple way to read through the file and
> remove all the XML codes leaving just plain text. I could probably
> write a series of regular expressions to remove each code that I can
> find in my text but am afraid I might miss some and it will come back to
> haunt me at a later time.

  str.gsub(/<\/?[^>]+>/, '')

This will only be a problem if your XML file is legal and has a CDATA
section which contains a literal < character (not &lt;), like:

  for ( var i=0, len=a.length; i<len; ++i )

In that case you likely want to use a proper XML parser (like REXML).

Do you really want to remove the XML, or would it suffice to just:

  str.gsub! '&', '&amp;'
  str.gsub! '<', '&lt;'
  str.gsub! '>', '&gt;'

(and maybe even)

  str.gsub! '"', '&quot;'
  str.gsub! "'", '&#39;'

to make your string valid and escaped for use in an HTML context?
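
For reference, here is a minimal sketch of both approaches side by side. The names strip_tags, escape_markup and ESCAPES are made up for illustration and do not come from any library. The escaping variant assumes that a single pass over the string is acceptable; it uses a lookup table so the order of the replacements stops mattering (with chained gsub! calls, '&' has to be replaced first).

```ruby
# Hypothetical helper names, for illustration only.

# Replacement table for the five characters escaped in the post above.
ESCAPES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
  "'" => '&#39;'
}.freeze

# Remove anything that looks like an opening or closing tag.
# Caveat: a CDATA section is treated as one big "tag" up to the first ">",
# so its text content is dropped as well.
def strip_tags(str)
  str.gsub(/<\/?[^>]+>/, '')
end

# Escape markup characters in one pass instead of stripping them.
def escape_markup(str)
  str.gsub(/[&<>"']/, ESCAPES)
end

puts strip_tags('<code>AB<b>CD</b>EF</code>')
puts strip_tags('<s><![CDATA[i<len]]></s>').inspect
puts escape_markup(%q{a < b && "c"})
```

The second call shows the CDATA caveat: the regex consumes the whole section and nothing of its text survives, which is the case where a real XML parser is the safer choice.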