Hi,

On 2010-08-24, at 9:49 AM, Michel Demazure wrote:

> Michel Demazure wrote:
>> Michel Demazure wrote:
> 
>> 2. but when parsing "<foo>deuxième</foo>", I get "ème" (this was the 
>> initial bug I discovered in my app).
>> 
>> This is not the first time I see the "grave accented e" giving trouble 
>> when scanning or parsing in ruby, whatever tool is used...
>> 
> Sorry for posting again. Actually, in this last example, 'characters' is 
> called twice, the first call giving "deuxi", the second one "ème". Strange feature, still a bug (?), but one can live with it...

Actually, this is allowed behaviour for SAX-style parsers, annoying as it is: a parser may deliver the text of a single element in several 'characters' calls. Many parsers do this when they encounter an entity (e.g. &apos;) in the input stream, so you get three strings: the text before, the entity's character, and the text after. Some XML parsers have an option that tells them to join adjacent strings together and report a single string. I don't know if Nokogiri provides this functionality, but it might be worth a quick peek.
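
If no such option turns up, the usual workaround is to do the joining yourself: collect every 'characters' chunk in a buffer and only look at the text when the element closes. Here is a minimal sketch of that idea. The class name TextJoiningHandler and the texts accessor are made up for illustration. The callback names follow Nokogiri::XML::SAX::Document, but the class deliberately does not inherit from it so that the sketch runs without the gem, and the parser's calls are simulated by hand at the bottom.

```ruby
# Hypothetical handler: joins adjacent text chunks before reporting them.
# Assumption: with Nokogiri you would write
#   class TextJoiningHandler < Nokogiri::XML::SAX::Document
# and pass an instance to Nokogiri::XML::SAX::Parser.new.
class TextJoiningHandler
  attr_reader :texts

  def initialize
    @texts  = []   # one [element_name, joined_text] pair per closed element
    @buffer = nil  # nil while we are outside any element
  end

  # Start collecting text for the element that just opened.
  def start_element(name, attrs = [])
    @buffer = +""
  end

  # May be called several times for one run of text; just append.
  def characters(string)
    @buffer << string if @buffer
  end

  # The element is complete, so the buffer now holds the whole text.
  def end_element(name)
    @texts << [name, @buffer] if @buffer
    @buffer = nil
  end
end

# Simulate the calls reported in this thread for <foo>deuxième</foo>.
handler = TextJoiningHandler.new
handler.start_element("foo")
handler.characters("deuxi")
handler.characters("ème")
handler.end_element("foo")
p handler.texts
```

Note that this simple version only handles leaf elements: a child element resets the buffer, so mixed content would need a stack of buffers instead of a single one.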

Cheers,
Bob

> 
> _md
> 
> 

----
Bob Hutchison
Recursive Design Inc.
http://www.recursive.ca/
weblog: http://xampl.com/so