Thanks, I am going to try with HTML entities.

However, I rechecked with my browsers and found this:

When the accented character comes from the YAML file, it is correctly HTML-encoded, but when it comes from the dBase file, it still goes through a double translation:

é gets translated to \202 that then gets translated to ‚
è gets translated to \212 that then gets translated to Š
à gets translated to \205 that then gets translated to …
ç gets translated to \207 that then gets translated to ‡
â gets translated to \203 that then gets translated to ƒ
ê gets translated to \210 that then gets translated to ˆ
î gets translated to \214 that then gets translated to Œ
ô gets translated to \223 that then gets translated to “
û gets translated to \226 that then gets translated to –
ä gets translated to \204 that then gets translated to „
ë gets translated to \211 that then gets translated to ‰
ï gets translated to \213 that then gets translated to ‹
ö gets translated to \224 that then gets translated to ”
ù gets translated to \227 that then gets translated to —
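The chain above can be reproduced in a few lines of Ruby. This is a minimal sketch that assumes Ruby 1.9 or later, where strings carry an encoding and String#encode exists (the thread itself only mentions iconv), and assumes that whatever displays the XML is decoding the CP437 bytes as Windows-1252. The helper name mojibake_chain is hypothetical.

```ruby
# Hypothetical helper: follow one letter through the two translations.
# Step 1: encode the letter in CP437 (the dBase code page), e.g. "é" -> byte 0x82.
# Step 2: misread that byte as Windows-1252, e.g. 0x82 -> "‚".
def mojibake_chain(letter)
  byte = letter.encode(Encoding::IBM437)
  misread = byte.dup.force_encoding(Encoding::Windows_1252).encode(Encoding::UTF_8)
  octal = format("\\%03o", byte.ord) # octal escape, as written in the list above
  [octal, misread]
end

%w[é è à ç â ê î ô û ä ë ï ö ù].each do |letter|
  octal, misread = mojibake_chain(letter)
  puts "#{letter} -> #{octal} -> #{misread}"
end
```

Each printed line matches the corresponding line of the list, which is consistent with the dBase data being CP437 bytes that are never converted to UTF-8.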

This looks like the behavior described on this page, although that page is about Python:
http://www.reportlab.com/i18n/python_unicode_tutorial.html

For example:

[dBase > XML] é gets translated to \202, which then gets translated to ‚ (single low-9 quotation mark)
[YAML > XML] é gets correctly encoded as an HTML entity
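One way to make the dBase path behave like the YAML path is to convert each dBase field from CP437 to UTF-8 before it reaches Builder. A minimal sketch, assuming Ruby 1.9 or later and assuming the .dbf fields arrive as raw CP437 bytes; dbase_field_to_utf8 and the sample value are hypothetical:

```ruby
# Hypothetical helper (not from the thread): reinterpret a raw dBase field
# value as CP437 and transcode it to UTF-8 before handing it to Builder.
def dbase_field_to_utf8(raw)
  raw.dup.force_encoding(Encoding::IBM437).encode(Encoding::UTF_8)
end

from_dbase = "caf\x82".b # hypothetical raw bytes read from the .dbf file
from_yaml  = "café"      # the YAML data is already UTF-8

puts dbase_field_to_utf8(from_dbase) == from_yaml # => true
```

Converting individual field values, rather than the whole .dbf file, avoids running the file's binary header through a character converter.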

Thanks for your help!

Jamal

Jimmy Kofler wrote:
>> Jamal Bengeloun wrote:
>> Thanks a lot for your help. I thought I would go mad with this. I
>> thought it had something to do with Ruby being C-based (I saw something
>> on the internet about the difference between Python and JPython, where the
>> accented characters were encoded in UTF-8 and not HTML-escaped).
>> 
>> What if the end rendering engine is not a browser (I checked and you're 
>> absolutely right, it does work in a browser)? How to get true UTF-8 
>> encoded characters instead of HTML escaped ones? I am using builder to 
>> generate XML files from the data I get.
>> 
>> Thanks a lot for your explanation (it really did enlighten me) and your 
>> help.
>> 
>> Jamal
> 
> 
> It should be possible to convert CP437 - 
> http://en.wikipedia.org/wiki/Code_page_437 - to UTF-8 using iconv.
> 
> iconv -l | grep -i CP437 # => 437 CP437 IBM437 CSPC8CODEPAGE437
> 
> "How to get true UTF-8 encoded characters instead of HTML escaped ones?"
> 
> This should be doable with http://htmlentities.rubyforge.org .
> 
> (For a Ruby & UTF-8 snippet btw see 
> http://snippets.dzone.com/posts/show/4527 ).
> 
> Cheers,
> 
> j. k.
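On the earlier question of getting real UTF-8 characters instead of escaped ones: the htmlentities library suggested above is one option. As a stdlib-only stand-in (a different technique from that library, not its API), numeric character references for non-ASCII code points can be turned back into UTF-8 with a regular expression. This sketch assumes the escaped output uses numeric references such as &#233;; decode_non_ascii_refs is a hypothetical name.

```ruby
# Stdlib-only stand-in for entity decoding (not the htmlentities API):
# turn numeric character references for non-ASCII code points back into
# UTF-8 characters, leaving ASCII ones such as &#60; (<) escaped.
NUMERIC_REF = /&#(?:x(\h+)|(\d+));/

def decode_non_ascii_refs(xml)
  xml.gsub(NUMERIC_REF) do
    code_point = $1 ? $1.hex : $2.to_i # hex form &#xE9; or decimal form &#233;
    code_point >= 0x80 ? [code_point].pack("U") : $&
  end
end

puts decode_non_ascii_refs("<name>caf&#233; &#60;3</name>")
# => <name>café &#60;3</name>
```

ASCII references like &#60; are deliberately left alone, since unescaping them could produce markup characters that break the XML.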

-- 
Posted via http://www.ruby-forum.com/.