Hi,
In message "Re: The face of Unicode support in the future"
on Thu, 20 Jan 2005 02:57:58 +0900, Florian Groß <florgro / gmail.com> writes:
|Maybe that could be equivalent to String.new("\244\336...", "euc-jp")?
Probably.
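
For illustration only: in later Ruby releases (2.0 and up, an assumption
outside this thread) a similar effect can be had by tagging raw bytes with
an encoding. This is a sketch using String#b and String#force_encoding, not
the String.new(bytes, encoding) form suggested above. The sample uses only
the two bytes that are visible in the quoted literal.

```ruby
# Sketch only: relies on String#b / String#force_encoding from later
# Ruby releases, not on the String.new(bytes, "euc-jp") form quoted above.
raw = "\244\336".b                  # two raw bytes, tagged as binary
euc = raw.force_encoding("EUC-JP")  # reinterpret the same bytes as EUC-JP

puts euc.encoding          # the encoding tag now attached to the string
puts euc.valid_encoding?   # whether the bytes are well-formed EUC-JP
puts euc.bytesize          # bytes are unchanged by force_encoding
puts euc.length            # characters, counted per the encoding tag
```

force_encoding only changes the tag; it does not transcode the bytes.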
|(And I think it would somehow need to work for all possible script
|encodings, but I'm not sure if this is possible when all string literals
|automatically use the script encoding. This might be a problem.)
We might need something to denote raw strings (r"" for example).
|Does this mean that #size and #length would do different things or am I
|just misunderstanding?
In the current prototype, they work differently (length gives the number of
code points in the string; size gives the length of the byte sequence).
But I now think they should behave the same.
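
To make the code point vs. byte distinction concrete, here is a small
sketch. It assumes the String API of later Ruby releases (1.9 and up), where
size and length agree and the byte count is available as bytesize; it
does not show the prototype's behaviour.

```ruby
# Sketch only: later-Ruby String API, not the 2005 prototype.
utf8 = "\u3042\u3044"            # two hiragana code points, UTF-8
eucjp = utf8.encode("EUC-JP")    # same text, transcoded to EUC-JP

chars      = utf8.length         # 2 characters
same       = utf8.size           # size is an alias of length here
utf8_bytes = utf8.bytesize       # 6 bytes (3 per character in UTF-8)
euc_bytes  = eucjp.bytesize      # 4 bytes (2 per character in EUC-JP)

puts [chars, same, utf8_bytes, euc_bytes].inspect
```

The character count is the same in both encodings; only the byte count
depends on the encoding.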
|> * restrict symbols to 7bit ascii
|
|Hm, what about international method and variable names? (These are
|possible with -Ku right now.)
In that case, we would give those up.
|> * embed encoding info in Symbols
|
|Does this mean that Symbols would not be immediate in all cases? (And
|any guesses as to how that would affect performance?)
They will remain immediate. I will need to come up with some tricks.
|> * symbols just use byte sequence
|
|Hm, I think that would work in most cases. Maybe it should not be
|possible to .intern Strings that are not fully compatible (it should
|still be possible to do utf8_str.intern in an ascii script if it only
|contains 0...127) to the script's encoding.
The prototype works this way.
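
The rule described in the quoted text could be sketched roughly as below.
checked_intern is a hypothetical helper name, not part of Ruby or of the
prototype, and the check relies on String#ascii_only? and String#encoding
from later Ruby releases (1.9 and up).

```ruby
# Hypothetical helper: intern a string only if it is compatible with
# the script's encoding. Not the prototype's actual implementation.
def checked_intern(str, script_encoding)
  # Pure 7-bit ASCII content is acceptable under any script encoding;
  # otherwise the string's encoding must match the script's encoding.
  if str.ascii_only? || str.encoding == script_encoding
    str.to_sym
  else
    raise ArgumentError,
          "cannot intern #{str.encoding} string in #{script_encoding} script"
  end
end

# A UTF-8 string holding only bytes 0...127 interns fine in an ASCII script.
ok = checked_intern("name".encode("UTF-8"), Encoding::US_ASCII)

# A UTF-8 string with non-ASCII characters is rejected.
rejected = begin
  checked_intern("\u3042", Encoding::US_ASCII)
  false
rescue ArgumentError
  true
end
```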
|> * something else I don't think of now.
|
|It's a difficult problem for sure.
Indeed.
matz.