Peter Higgins wrote:
> I've written a small script to parse an XML doc with SaxParser, and
> everything goes well until the parser encounters a Unicode character.
> For example, in the following snippet:
>
> <key>Name</key><string>90’s Music</string>
>
> In case it doesn't come through correctly, the "’" character above is
> an apostrophe (U+2019, RIGHT SINGLE QUOTATION MARK), represented as
> <E2><80><99> when I view the XML with less.
>
> When the on_characters method is called for the string "90’s Music",
> the buffer only contains "90", and no error or warning is raised.
> After this is encountered, parsing continues normally; the first I saw
> of the bug was when I noticed some of my strings being truncated. Is
> there some setting of libxml or Ruby that I've overlooked that causes
> this behavior?

As part of researching the problem, I wrote a small test script with REXML that looks for that particular string, and it returned the correct, full string: "90’s Music". So it looks like this is a bug in libxml, and I'll post on their mailing list.

--
Posted via http://www.ruby-forum.com/.
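For anyone checking the same thing: the three bytes that less shows are the UTF-8 encoding of a single character, U+2019, so the truncated buffer stops exactly where the first multi-byte character begins. A minimal sketch in plain Ruby (assuming Ruby 2.0 or later, for `String#b`) that confirms this, independent of libxml:

```ruby
# The bytes shown by less for "90's Music", held as a binary (ASCII-8BIT)
# string, i.e. raw bytes with no character interpretation.
raw = "90\xE2\x80\x99s Music".b

# Reinterpret the same bytes as UTF-8 text.
text = raw.dup.force_encoding(Encoding::UTF_8)

puts text.valid_encoding?           # the bytes form valid UTF-8
puts raw.bytesize                   # 12 bytes on disk...
puts text.length                    # ...but only 10 characters
puts format("U+%04X", text[2].ord)  # the third character is U+2019
```

Note that `text[0, 2]` is "90", which is exactly what ends up in the on_characters buffer.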