I write Ruby plugins for Google Sketchup.

Sketchup uses UTF-8 strings and passes them to Ruby (1.8), which
handles Strings as a simple series of bytes. This caused problems when I
passed on a String I got from Sketchup that contained a file path
with some Norwegian letters (æøåÆØÅ): Ruby raised an error
saying the file/path didn't exist.

This was because æøåÆØÅ lie outside the ASCII character set, so each
letter was returned as a two-byte sequence in UTF-8.


Searching the net I found some hacks that converted UTF-8 into single
byte characters: str_utf8.unpack('U*').pack('C*')

The Norwegian characters lie outside the ASCII range, but they still
get packed into single-byte characters that the File class can
handle.
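
To see what the hack actually produces, here is a small sketch that looks
at the intermediate values. As far as I can tell, unpack('U*') decodes the
UTF-8 bytes into Unicode code points and pack('C*') writes each code point
as one byte. The first 256 Unicode code points are the same as ISO-8859-1,
so the result should be ISO-8859-1 bytes no matter what the system settings
are. Whether the File class then accepts those bytes is a separate matter
that I assume depends on the system code page. The variable names are just
for illustration.

```ruby
# UTF-8 bytes for the six Norwegian letters, written as escapes so the
# example does not depend on the encoding of the source file.
utf8 = "\xC3\xA6\xC3\xB8\xC3\xA5\xC3\x86\xC3\x98\xC3\x85"  # æøåÆØÅ

# unpack('U*') decodes UTF-8 into Unicode code points (integers).
code_points = utf8.unpack('U*')
# => [230, 248, 229, 198, 216, 197]

# pack('C*') writes each integer as one unsigned byte.
single_byte = code_points.pack('C*')

# Look at the resulting bytes as integers again.
bytes = single_byte.unpack('C*')
# => [230, 248, 229, 198, 216, 197]
```

All six values are between 192 and 255, where ISO-8859-1 and Windows-1252
agree, so for these letters the two encodings give the same bytes.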


Example:
'æøåÆØÅ'.length # <- all these characters cause the File class to fail
> 12

'æøåÆØÅ'.unpack('U*').pack('C*').length # <- the File class can now handle this
> 6

So it seems that the File class doesn't just handle ASCII, but maybe
ANSI (Windows-1252) or ISO-8859-1. Or does this depend on some system
setting?

My tests have been on a Norwegian Windows XP system with a Norwegian
locale. The default language for applications that don't support Unicode
is also set to Norwegian.


To sum up, what I'm trying to work out is how UTF-8 characters above
the ASCII range (0-127) are mapped to the 128-255 range. Does the 128-255
range refer to ANSI (1252) or ISO-8859-1, and does this depend on system
settings?
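
One way to narrow the question down: ISO-8859-1 and Windows-1252 only
disagree for bytes 0x80-0x9F, so a path is only ambiguous if the conversion
produces a byte in that range. Below is a rough sketch of a checked version
of the hack. Both method names are made up for this post and are not part
of Ruby or Sketchup. It refuses code points above 255, which cannot fit in
a single byte, and reports whether any byte falls in the range where the
two encodings differ.

```ruby
# Hypothetical helper: convert UTF-8 to one byte per character,
# refusing anything that does not fit in a single byte.
def utf8_to_single_byte(str)
  code_points = str.unpack('U*')
  too_big = code_points.select { |cp| cp > 255 }
  unless too_big.empty?
    raise ArgumentError, "code points above 255: #{too_big.inspect}"
  end
  code_points.pack('C*')
end

# Hypothetical helper: true if any byte falls in 0x80-0x9F, the only
# range where ISO-8859-1 and Windows-1252 assign different characters.
def ambiguous_bytes?(bytes)
  bytes.unpack('C*').any? { |b| b >= 0x80 && b <= 0x9F }
end

path = utf8_to_single_byte("\xC3\xA6\xC3\xB8\xC3\xA5")  # æøå
ambiguous = ambiguous_bytes?(path)                       # false for these letters
```

If ambiguous_bytes? returns false, it should not matter which of the two
encodings the File class assumes.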
-- 
Posted via http://www.ruby-forum.com/.