At 01:24 08/10/21, David Flanagan wrote:
>Tim Bray wrote:

>> However, in Unicode, it's not ambiguous whether a character has the upper-case or lower-case property.  What's ambiguous and locale-dependent and not even one-to-one is the mapping between the cases.  If Ruby program text were defined as Unicode, I suppose you could allow anything with the "Lu" property.
>> Since Ruby is not limited to Unicode, and we don't know if Unicode and other character sets agree on the semantics of upper-case (I suspect not), it seems to me that the only safe/portable definition of "upper-case" is [A-Z].

Excluding the somewhat complicated issue of the Georgian script
(which in its modern form is basically caseless), I currently cannot
imagine any case where upper/lower case semantics would differ in
legacy encodings. With character issues, it's always difficult to say
"no, such a thing doesn't exist", but the majority of scripts don't
have casing distinctions, and case issues have always been treated
as very important when encoding characters in Unicode. So I think
this is one of the safer, if not the safest, areas of Unicode.

The fact that only a few scripts have upper case also means that
Ruby class names could be written in only a few scripts. Unless, e.g.
for Japanese, we want to come up with a convention such as
Katakana for constants and Hiragana for variables :-).
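As a rough illustration of that point, here is a small Python sketch (not Ruby, and not how any Ruby version actually decides) of a hypothetical Unicode-aware rule that treats a name as a constant when its first character has the general category "Lu". The function name is made up for this example. Under such a rule Greek or Cyrillic names could be constants, while Japanese kana and CJK ideographs, which are category "Lo" (other letter), never could.

```python
import unicodedata


def is_constant_unicode_rule(name: str) -> bool:
    """Hypothetical rule: a name is a constant if its first character
    has the Unicode general category Lu (uppercase letter)."""
    if not name:
        return False
    return unicodedata.category(name[0]) == "Lu"


# Print each name's first-character category and the resulting verdict.
for name in ["Foo", "foo", "Δέλτα", "Жук", "カタカナ", "ひらがな", "漢字"]:
    print(name, unicodedata.category(name[0]), is_constant_unicode_rule(name))
```

Katakana, Hiragana and Kanji all come out as non-constants here, which is why a Lu-based rule alone would not help Japanese programmers name classes.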

>The consensus seems to be to leave the current rules as they are, and I agree.
>
>I do want to point out, however, that it isn't just case that is ignored outside of the ASCII range.  Any character outside of ASCII is considered a letter for the rules of identifier formation, for example.  I don't think it would make sense to start paying attention to letter case for constant names without also paying attention more generally to whether a character is in fact a letter for identifier names.

Good point. But rather than stating that all characters outside ASCII
count as lower case, and that all code points outside ASCII (I guess
not just symbols, but even unassigned code points) count as letters,
I think it would be much better to recognize this as an imperfect
intermediate state, with the documentation saying: "Don't count on
this staying that way; it may get fixed in the future."
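For concreteness, the current behaviour as described in this thread can be modelled roughly as below. This is a Python sketch under that assumption, not Ruby's actual lexer, and the `classify` function and its return labels are hypothetical. Only ASCII A-Z can start a constant, and every non-ASCII code point (letters, symbols, even unassigned code points) is accepted as an identifier character and treated like a lower-case letter.

```python
def classify(name: str) -> str:
    """Classify an identifier under the ASCII-only rule described in this
    thread. This is a hypothetical model, not Ruby's actual lexer."""

    def is_ident_char(ch: str) -> bool:
        if ord(ch) > 0x7F:
            # Any non-ASCII code point counts as a letter: symbols,
            # unassigned code points, everything.
            return True
        return ch == "_" or ch.isalnum()

    # An identifier may not be empty or start with an ASCII digit.
    if not name or "0" <= name[0] <= "9":
        return "invalid"
    if not all(is_ident_char(ch) for ch in name):
        return "invalid"
    # Only ASCII A-Z make a constant; non-ASCII counts as lower case.
    if "A" <= name[0] <= "Z":
        return "constant"
    return "non-constant"


for name in ["Foo", "foo", "Δέλτα", "x\u2192y", "1abc"]:
    print(repr(name), classify(name))
```

Note that "Δέλτα" is classified as a non-constant even though Δ is an upper-case letter, and "x→y" is accepted even though the arrow is not a letter at all; these are exactly the two quirks that a future fix would have to address.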

Regards, Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst / it.aoyama.ac.jp