Yukihiro Matsumoto wrote:
> Hi,
> 
> In message "Re: \u escapes in string literals: proof of concept   implementation"
>     on Tue, 23 Oct 2007 16:53:57 +0900, David Flanagan <david / davidflanagan.com> writes:
> 
> |I like the \Uxxxxxx escape instead of \u{}.  Would you consider this, Matz?
> 
> Actually I hate counting digits.  When I am forced to put a
> sufficient number of leading zeros to specify a non-BMP character,
> I'd go mad.  Is there any reason \U<8 digits> is better than \u{}?
> If there's a sufficient reason, it's OK to allow \U as well.
> 
> 							matz.

I've been meaning to ask you about the 8 digits.  Unicode currently
needs only 6 digits: the highest allowed codepoint is 10FFFF.  So even
if Unicode grew to roughly fifteen times its current number of
codepoints, 6 hex digits would still be enough.  What I was proposing
was \U followed by exactly 6 digits, and you'd only use it for those
rare codepoints that need 5 or 6 digits.  Without the curly braces it
is shorter.  I don't actually feel strongly about \u{} versus \U,
however, and reducing the number of special characters after the
backslash is probably a good thing.
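
For what it's worth, here's a minimal sketch of the arithmetic in
Ruby, assuming the proposed \u{} syntax is available (it works this
way in Ruby 1.9 and later, which adopted that form):

   # The highest allowed codepoint fits in six hex digits.
   printf("%06X\n", 0x10FFFF)        #=> 10FFFF

   # Six hex digits cover 0x1000000 values; Unicode currently defines
   # 0x110000 codepoints, so there is roughly 15x headroom.
   puts 0x1000000 / 0x110000         #=> 15

   # BMP characters stay short; only the rare supplementary ones need
   # five or six digits.
   puts "\u{3042}"                               # U+3042 HIRAGANA LETTER A
   puts "\u{10FFFF}".codepoints.first.to_s(16)   #=> 10ffff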

Unless I'm missing the point, however, I don't think there is any
reason to allow 32-bit codepoints (8 hex digits).  The original UTF-8
design could encode up to 31 bits using sequences of up to 6 bytes,
but the current UTF-8 spec (RFC 3629) forbids those longer sequences
and caps valid codepoints at U+10FFFF.  So if Ruby allowed
\u{xxxxxxxx} (8 hex digits) it would generate codepoints that no
legal UTF-8 stream can represent.
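
A quick sketch of that limit in Ruby with the new encoding support,
using Integer#chr with an encoding argument (the exact exception
message may differ between versions):

   # The highest valid codepoint encodes to exactly 4 UTF-8 bytes...
   puts 0x10FFFF.chr(Encoding::UTF_8).bytesize   #=> 4

   # ...and anything above it is rejected outright, rather than being
   # encoded as a 5- or 6-byte sequence:
   begin
     0x110000.chr(Encoding::UTF_8)
   rescue RangeError => e
     puts e.message    # e.g. "invalid codepoint 0x110000 in UTF-8"
   end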


	David