The odds are that your text is not UTF-8 encoded, but in CP1252 or similar.
In that case, indeed, split won't work right with $KCODE = 'u'.
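
To illustrate, here is a minimal sketch of how I would check that and work around it. It assumes a newer Ruby (1.9 or later), where strings carry an encoding and $KCODE no longer has any effect. The method name and the sample bytes are mine, made up for illustration, and I am only guessing that your input is CP1252:

```ruby
# Hypothetical helper: treat the given bytes as CP1252 (Windows-1252)
# and transcode them to UTF-8. The source encoding is an assumption.
def cp1252_to_utf8(bytes)
  bytes.dup.force_encoding("Windows-1252").encode("UTF-8")
end

# Sample bytes as a CP1252 file would store them:
# \xE8 is "è" and \xE9 is "é" in that code page.
raw = "Ils sont tr\xE8s \xE9nerv\xE9 les regexps.".b

# Read as UTF-8, these bytes do not form valid characters,
# which is why UTF-8-aware regexps misbehave on them.
puts raw.dup.force_encoding("UTF-8").valid_encoding?  # false

# After transcoding, splitting on whitespace works as expected.
parts = cp1252_to_utf8(raw).split(/\s/)
p parts  # ["Ils", "sont", "très", "énervé", "les", "regexps."]
```

On Ruby 1.8 the same conversion would go through the Iconv library instead; either way the point is to make the data's real encoding and the one Ruby assumes agree before splitting.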

2006/1/31, Nick Snels <nick.snels / gmail.com>:
> Hi Axel,
>
> thanks for the reply. If I try your code, my characters with accents
> don't get translated to numbers, unfortunately. Do you know where these
> numbers come from? I looked on the net, but \352 is not the octal,
> hexadecimal or UTF-8 representation of ê. Could you split the following
> sentence for me and let me know what the result is:
>
> a="Ils sont très énervé les regexps."
> splitted_text=a.split(/\s/)
>
> Not my best French. But if I try this, 'très énervé les' is still one
> part, even though I split on the spaces. Maybe it is different for
> you and then I have to look deeper. Thanks for your help. If anybody is
> able to split it like 'très', 'énervé', 'les', please let me know!
>
> Kind regards,
>
> Nick
>
> --
> Posted via http://www.ruby-forum.com/.