Yukihiro Matsumoto wrote:
> Hi,
>
> In message "Re: \u escapes in string literals: proof of concept implementation"
>     on Tue, 23 Oct 2007 16:53:57 +0900, David Flanagan <david / davidflanagan.com> writes:
>
> |I like the \Uxxxxxx escape instead of \u{}.  Would you consider this, Matz?
>
> Actually I hate counting digits.  When I am forced to put sufficient
> number of preceding zeros to specify non-BMP character, I'd go mad.
> Is there any reason \U<8digits> is better than \u{}?  If it's
> sufficient reason, it's OK to allow \U as well.
>
>                                                       matz.

I've been meaning to ask you about the 8 digits. Unicode currently uses
at most 6 hex digits: the highest allowed codepoint is 10FFFF. So even if
Unicode grew to roughly 15 times its current number of codepoints, 6 hex
digits would still be enough.

What I was proposing was \U with exactly 6 digits after it. You'd only
use it for those rare codepoints that need 5 or 6 digits. Without the
curly braces it is shorter.

I don't actually feel strongly about \u{} versus \U, however. And
reducing the number of special characters after the backslash is probably
a good thing.

Unless I'm missing the point, however, I don't think there is any reason
to allow 4-byte codepoints. I read somewhere that although the UTF-8
encoding scheme can be extended to encode 32 bits in 6 bytes, this is
actually forbidden by the UTF-8 spec. (I haven't verified that, but I
think I saw it on Wikipedia.) So if Ruby allows \u{xxxxxxxx} (8 hex
digits), it will generate invalid codepoints in an illegal extension of
UTF-8.

        David
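
To make the digit-count argument above concrete, here is a minimal Ruby
sketch. It is only an illustration, not part of the proposed patch: the
constant MAX_CODEPOINT and the helpers hex_digits and utf8_bytes are
hypothetical names made up for this example. It assumes a Ruby where
Array#pack("U") and String#bytesize are available.

```ruby
# Highest codepoint Unicode currently allows.
MAX_CODEPOINT = 0x10FFFF

# Hypothetical helper: number of hex digits needed to write a codepoint.
def hex_digits(codepoint)
  codepoint.to_s(16).length
end

# Hypothetical helper: number of bytes UTF-8 uses for a valid codepoint.
def utf8_bytes(codepoint)
  [codepoint].pack("U").bytesize
end

# A few sample codepoints: ASCII, last BMP codepoint, first non-BMP
# codepoint, and the highest codepoint Unicode allows.
[0x41, 0xFFFF, 0x10000, MAX_CODEPOINT].each do |cp|
  printf("U+%06X  hex digits: %d  UTF-8 bytes: %d\n",
         cp, hex_digits(cp), utf8_bytes(cp))
end
```

The last line of output shows that the highest codepoint needs 6 hex
digits and 4 UTF-8 bytes, so neither 8 hex digits nor 5- and 6-byte
sequences are needed for any valid codepoint.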