On Sep 5, 2005, at 8:25 AM, Zach Dennis wrote:

> On Mon, 2005-09-05 at 22:34 +0900, Damphyr wrote:
>
>> OK, I am officially frustrated/lost/bewildered (take your pick) with all
>> this encoding/decoding of character sets.
>> I'm trying to grab some book data from a web service using ISBN numbers.
>> I'm using a simple GET HTTP request and on a query the service returns
>> the following:
>>
>> <?xml version="1.0" encoding="utf-8"?>
>> <string xmlns="http://www.webserviceX.NET">
>> &lt;ISBNORG&gt;
>> &lt;RECORD&gt;
>> &lt;ISBN&gt;0764558315&lt;/ISBN&gt;
>> &lt;AUTHOR&gt;Rod Johnson, with Juergen Hoeller.&lt;/AUTHOR&gt;
>> &lt;FULLTITLE&gt;Expert one-on-one J2EE development without EJB / Rod
>> Johnson, with Juergen Hoeller.&lt;/FULLTITLE&gt;
>>
>> &lt;SHORTTITLE&gt;Expert one-on-one J2EE development without EJB
>> /&lt;/SHORTTITLE&gt;
>> &lt;EDITION&gt;&lt;/EDITION&gt;
>> &lt;PUBLISHER&gt;Wiley Pub./Wrox,&lt;/PUBLISHER&gt;
>> &lt;DATE&gt;c2004.&lt;/DATE&gt;
>> &lt;SUBJECT&gt;Java (Computer program language)&lt;/SUBJECT&gt;
>>
>> &lt;/RECORD&gt;
>> &lt;/ISBNORG&gt;
>> </string>
>>
>> which I can't parse with REXML :(. If all the &lt; &gt; were < and >
>> then no prob, everything checks out fine. Same code with the above
>> snippet refuses to extract the data. Obviously I'm missing something.
>> Is there a way to parse this string so that all the escaped stuff goes
>> back to normal? Can REXML understand the ampersand thingies?
>> Any help will be appreciated,
>> Cheers,
>> V.-
>> P.S. I'd have used Pickaxe 2.ed for the example if only the book was in
>> their database :)
>>
>
> You are seeing already escaped characters. You need to unescape them.
>
> require 'cgi'
> require 'rexml/document'
> str = CGI.unescapeHTML( string )   # string is the raw response body
> doc = REXML::Document.new( str )

Another way of looking at this: you're getting one XML document embedded
in another:

require 'rexml/document'
# str is the raw response body from the service, not the unescaped string
enclosing_doc = REXML::Document.new(str)
real_doc = REXML::Document.new(enclosing_doc.elements["/string"].text)
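
For anyone who wants to try this end to end, here is a minimal sketch of the same two-step parse. The helper name extract_embedded_doc and the SAMPLE_RESPONSE constant are made up for illustration (the sample is an abbreviated copy of the response quoted above), and it reaches the wrapper element through root rather than an XPath, which is just one way to get at it:

```ruby
require 'rexml/document'

# Abbreviated copy of the service response quoted above (illustrative only).
SAMPLE_RESPONSE = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <string xmlns="http://www.webserviceX.NET">
  &lt;ISBNORG&gt;
  &lt;RECORD&gt;
  &lt;ISBN&gt;0764558315&lt;/ISBN&gt;
  &lt;PUBLISHER&gt;Wiley Pub./Wrox,&lt;/PUBLISHER&gt;
  &lt;/RECORD&gt;
  &lt;/ISBNORG&gt;
  </string>
XML

# Hypothetical helper: parse the outer document, then parse the text of its
# root element as a second document.
def extract_embedded_doc(response)
  outer = REXML::Document.new(response)
  # Element#text hands back the text with &lt; and &gt; already expanded,
  # so no separate unescape step is needed here.
  inner_xml = outer.root.text.strip
  REXML::Document.new(inner_xml)
end

book = extract_embedded_doc(SAMPLE_RESPONSE)
puts book.root.elements['RECORD/ISBN'].text
puts book.root.elements['RECORD/PUBLISHER'].text
```

Because the XML parser does the unescaping, any entities inside the inner document are only decoded once, which is the main reason to prefer this over unescaping the whole response by hand.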

Josh