Yukihiro Matsumoto wrote:
> You said Tcl has Unicode support that works well with you.  So that I
> think treating all of them in UTF-8 is OK for you.

It's not really about treating everything as UTF-8. Tcl simply unifies
its strings so that a single string can hold any variety of characters.

> Then how can it
> determine which should be in the current code page, or in Unicode?
> Or using Win32 API ending with W could allow you living in the
> Unicode?

Well, currently (I just downloaded the latest CVS sources) Ruby uses the
ANSI versions of the CreateFile and FindFirstFile/FindNextFile APIs. So
even if I set, for example, KCODE to UTF-8 (I'm not sure how you can
currently make Ruby work with UTF-8), the ANSI versions of the APIs are
still called, and that means that:

  1) If a filename contains characters outside the range of the current
codepage, I receive '?' in their place when I enumerate directory
contents.
  2) I receive filenames in the current codepage, not in UTF-8.
  3) There is no way for me to open a file with such characters in its
name using the standard Ruby classes.
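
To make points 1 and 2 concrete, here is a minimal sketch of the lossy
step. It is only a simulation: it assumes a later Ruby that has
String#encode (the Ruby discussed above does not), it assumes
Windows-1251 as the "current codepage", and ansi_view is a hypothetical
helper name, not anything in Ruby or the Win32 API.

```ruby
# Hypothetical stand-in for what an ANSI directory listing does to a name.
# Windows-1251 plays the role of the "current codepage" (an assumption).
def ansi_view(name, codepage = "Windows-1251")
  # Characters the codepage cannot represent are replaced with '?'.
  name.encode(codepage, undef: :replace, replace: "?")
end

original = "файл-αβγ.txt"        # Cyrillic and Greek in one filename
seen     = ansi_view(original)

puts seen.encoding               # the name arrives in the codepage, not UTF-8
puts seen.encode("UTF-8")        # the Greek letters are gone for good
```

Once the Greek letters have been replaced, the string no longer names
the file on disk, which is why point 3 follows from point 1.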

The same goes for the win32ole extension. I can see a lot of
ole_wc2mb/ole_mb2wc conversions there, which breaks things horribly when
interoperating with, for example, Excel and trying to work with Russian,
Greek, Japanese and other languages all on the same sheet: after I
process the sheet, modifying all of the cells, every language except
Russian is stripped from it.

On *nix systems you can just change your locale to *.UTF-8 and you're
fine, because everything you receive when enumerating a directory is
UTF-8, and File.open expects UTF-8. Unfortunately, that is not possible
on Windows: MS already provides 'wide' versions of the APIs for those
who need them, and there is no UTF-8 ANSI codepage you can set as the
default (the UTF-8 codepage in Windows is somewhat 'virtual', for
conversion purposes only).

In Tcl all of your strings are in UTF-8, and when Tcl interoperates
with the rest of the world, it converts strings appropriately. For
example, on Win9x there are mostly no 'wide' APIs, so it converts
strings to the current codepage and uses the ANSI APIs, but on WinNT it
converts them to Unicode and uses the 'wide' APIs.

What I was thinking of is a way to set a "current codepage" for Ruby on
win32 (including the possibility of setting it to UTF-8), so that when
Ruby talks to the outside world it would use the 'wide' APIs when
possible, converting to and from this codepage. Unlike Tcl, where the
encoding is hard-coded to UTF-8, there would be a possibility to choose.
There is no other way for a user to do this on Windows, because a user
can't set the current codepage to UTF-8.
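
A rough sketch of the conversion layer I have in mind, again assuming a
later Ruby with String#encode. WideBridge, internal, to_wide and
from_wide are all hypothetical names for illustration, and the actual
calls to the 'wide' APIs are left out; this only shows the string
conversion on each side of such a call.

```ruby
# Hypothetical helper: not part of Ruby or its win32 port.
module WideBridge
  # The "current codepage" the script has chosen for its own strings.
  @internal = Encoding::UTF_8

  class << self
    attr_accessor :internal

    # Script string -> NUL-terminated UTF-16LE buffer for a 'wide' API.
    def to_wide(str)
      wide_nul = "\0".encode(Encoding::UTF_16LE)
      str.dup.force_encoding(internal).encode(Encoding::UTF_16LE) + wide_nul
    end

    # UTF-16LE buffer handed back by a 'wide' API -> script string.
    def from_wide(buf)
      buf.dup.force_encoding(Encoding::UTF_16LE).encode(internal).chomp("\0")
    end
  end
end

# The default choice is UTF-8, so mixed-language names survive a round trip.
name = "файл-αβγ.txt"
wide = WideBridge.to_wide(name)
puts WideBridge.from_wide(wide) == name
```

Setting WideBridge.internal to some other encoding (for example
Encoding.find("Windows-1251")) would give scripts that expect the
current codepage the same strings as before, while the 'wide' APIs are
still used underneath.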

-- 
Posted via http://www.ruby-forum.com/.