On Sep 5, 2005, at 8:25 AM, Zach Dennis wrote:

> On Mon, 2005-09-05 at 22:34 +0900, Damphyr wrote:
>
>> OK, I am officially frustrated/lost/bewildered (take your pick) with all
>> this encoding/decoding of character sets.
>> I'm trying to grab some book data from a web service using ISBN numbers.
>> I'm using a simple GET HTTP request and on a query the service returns
>> the following:
>>
>> <?xml version="1.0" encoding="utf-8"?>
>> <string xmlns="http://www.webserviceX.NET">
>> &lt;ISBNORG&gt;
>> &lt;RECORD&gt;
>> &lt;ISBN&gt;0764558315&lt;/ISBN&gt;
>> &lt;AUTHOR&gt;Rod Johnson, with Juergen Hoeller.&lt;/AUTHOR&gt;
>> &lt;FULLTITLE&gt;Expert one-on-one J2EE development without EJB / Rod
>> Johnson, with Juergen Hoeller.&lt;/FULLTITLE&gt;
>>
>> &lt;SHORTTITLE&gt;Expert one-on-one J2EE development without EJB
>> /&lt;/SHORTTITLE&gt;
>> &lt;EDITION&gt;&lt;/EDITION&gt;
>> &lt;PUBLISHER&gt;Wiley Pub./Wrox,&lt;/PUBLISHER&gt;
>> &lt;DATE&gt;c2004.&lt;/DATE&gt;
>> &lt;SUBJECT&gt;Java (Computer program language)&lt;/SUBJECT&gt;
>>
>> &lt;/RECORD&gt;
>> &lt;/ISBNORG&gt;
>> </string>
>>
>> which I can't parse with REXML :(. If all the &lt; &gt; were < and >
>> then no prob, everything checks out fine. Same code with the above
>> snippet refuses to extract the data. Obviously I'm missing something.
>> Is there a way to parse this string so that all the escaped stuff goes
>> back to normal? Can REXML understand the ampersand thingies?
>> Any help will be appreciated,
>> Cheers,
>> V.-
>> P.S. I'd have used Pickaxe 2.ed for the example if only the book was in
>> their database :)
>
> You are seeing already escaped characters. You need to unescape them.
>
>     str = CGI.unescapeHTML( string )
>     REXML::Document.new( str )

Another way of looking at this: you're getting one XML document embedded
in another, as escaped text inside the <string> element. Parse the outer
document, then parse the text of its root element as a second document:

    require 'rexml/document'

    # str holds the raw response body from the service
    enclosing_doc = REXML::Document.new(str)
    real_doc = REXML::Document.new(enclosing_doc.elements["/string"].text)

Josh
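
P.S. For what it's worth, here is a minimal sketch of the two-step parse from start to finish. The response string is a shortened, made-up stand-in for what the service returns, and the names response, inner_xml and record are only there for illustration. It reads the wrapper's text via root.text instead of an XPath lookup, which should amount to the same thing here, and strips the surrounding whitespace before the second parse.

```ruby
require 'rexml/document'

# Hypothetical, shortened stand-in for the web service response: the outer
# <string> element is real XML, the inner document is entity-escaped text.
response = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <string xmlns="http://www.webserviceX.NET">
  &lt;ISBNORG&gt;
  &lt;RECORD&gt;
  &lt;ISBN&gt;0764558315&lt;/ISBN&gt;
  &lt;PUBLISHER&gt;Wiley Pub./Wrox,&lt;/PUBLISHER&gt;
  &lt;/RECORD&gt;
  &lt;/ISBNORG&gt;
  </string>
XML

# Step 1: parse the wrapper. REXML turns &lt; and &gt; back into < and >
# when it hands back the element's text.
enclosing_doc = REXML::Document.new(response)
inner_xml = enclosing_doc.root.text.strip

# Step 2: parse the recovered text as a document in its own right.
real_doc = REXML::Document.new(inner_xml)
record = real_doc.root.elements["RECORD"]

isbn = record.elements["ISBN"].text
publisher = record.elements["PUBLISHER"].text
puts "#{isbn}: #{publisher}"
```

The nice part of doing it this way is that only the embedded document gets unescaped, so any entities that belong to the inner document itself are left for the second parse to handle.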