On Wed, 12 Jan 2005 23:12:14 +0900, Florian Groß <florgro / gmail.com> wrote:
> Yukihiro Matsumoto wrote:
>> Besides, characters in _my_ encoding would not fit in the range
>> of 0..255 anyway.
> Depends on whether you're handling them as raw bytes or not. :)
> 
> (But I think that the unused bit in the Symbol namespace could be
> used for representing full UCS2 range, but that would mean
> enforcing a specific encoding for those immediate one-character
> Strings. And I think you would need full UCS4 range anyway. I'm
> not sure.)

UTF-16 is better than UCS2 -- UTF-16 is UCS2 plus surrogate pairs,
which cover the code points beyond U+FFFF -- and you don't really
gain anything by using UCS4/UTF-32 (in fact, you lose the space
savings of a variable-width encoding). Most Western languages can
be represented efficiently in UTF-8; most Eastern languages are
represented more efficiently in UTF-16. (I've done quite a bit of
research on this.)
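The size trade-off is easy to demonstrate with modern Ruby's
String#encode (which, to be fair, postdates this thread -- it arrived
with Ruby 1.9); the sample strings here are my own, just for
illustration:

```ruby
# Byte counts for the same text in UTF-8 vs UTF-16.
# Mostly-ASCII Western text is smaller in UTF-8; CJK text is
# smaller in UTF-16 (3 bytes/char in UTF-8 vs 2 in UTF-16).
western = "héllo wörld"     # 11 chars, mostly ASCII
eastern = "こんにちは世界"   # 7 Japanese chars

[western, eastern].each do |s|
  utf8  = s.encode("UTF-8").bytesize
  utf16 = s.encode("UTF-16LE").bytesize
  puts "#{s.inspect}: UTF-8 = #{utf8} bytes, UTF-16 = #{utf16} bytes"
end
# western: UTF-8 = 13 bytes, UTF-16 = 22 bytes
# eastern: UTF-8 = 21 bytes, UTF-16 = 14 bytes
```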

Both UCS2 and UCS4 are deprecated encodings; the only reason
they're still in use at this point is that some filesystems
(NTFS) use UCS2 as their base encoding and therefore cannot
support full UTF-16.
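The difference between UCS2 and UTF-16 is exactly the surrogate
mechanism: a code point above U+FFFF is split into two 16-bit units
that plain UCS2 has no way to express. A sketch of the arithmetic
(cross-checked against modern Ruby's encoder, which again postdates
this thread):

```ruby
# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the Basic
# Multilingual Plane, so UCS2 cannot represent it at all.
cp = 0x1D11E
v  = cp - 0x10000               # 20 bits to split across two units
hi = 0xD800 + (v >> 10)         # high (lead) surrogate: 0xD834
lo = 0xDC00 + (v & 0x3FF)       # low (trail) surrogate: 0xDD1E
printf("U+%04X => %04X %04X\n", cp, hi, lo)

# Cross-check: reassemble the UTF-16BE code units Ruby produces.
units = "\u{1D11E}".encode("UTF-16BE").bytes
                   .each_slice(2).map { |a, b| (a << 8) | b }
raise "mismatch" unless units == [hi, lo]
```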

I do believe there was discussion about having a ByteVector sort
of class, too -- providing the necessary speed optimisations
without impacting normal String use.

-austin
-- 
Austin Ziegler * halostatue / gmail.com
               * Alternate: austin / halostatue.ca