On Oct 18, 2008, at 9:43 AM, Yukihiro Matsumoto wrote:

> Some languages allow non-ASCII case conversion of Unicode strings
> (perhaps ignoring Turkish case).  I myself sometimes want this
> functionality to normalize full-width alphabets used in Japanese.
> European people would have bigger needs for it.
>
> As far as I know, the issues are:
>
>  * some case conversions do not map one to one (German eszett)
>  * some case conversions do not round trip (German eszett)
>  * some case conversions rely on locale (Turkish i)
>
> Are there any other issues?  How big are they?  Can they be ignored?
> How do other languages treat them?

I think Java is generally considered to get this reasonably correct.
You can see the algorithms at
http://www.docjar.com/html/api/java/lang/String.java.html and
http://www.docjar.com/html/api/java/lang/Character.java.html.
It's nontrivial but not terrible.  Um, these days I guess it's OK for
Ruby designers to look at Java source code?
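
For illustration, here is a minimal sketch (assuming Java 8 or later) of how
Java's String.toUpperCase(Locale) and toLowerCase(Locale) behave on the three
issues above, plus full-width letters.  The class name CaseDemo and the helper
codePoints() are hypothetical, made up for this example; only the
java.lang/java.util calls are real API.

```java
import java.util.Locale;
import java.util.stream.Collectors;

public class CaseDemo {
    // Hypothetical helper: format each code point as U+XXXX so non-ASCII
    // results print unambiguously regardless of console encoding.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        Locale root = Locale.ROOT;                    // locale-neutral rules
        Locale turkish = Locale.forLanguageTag("tr"); // Turkish dotted/dotless i rules

        // 1. Not one-to-one: U+00DF (eszett) uppercases to two characters "SS".
        String strasse = "stra\u00DFe";
        String upper = strasse.toUpperCase(root);
        System.out.println(upper + " " + strasse.length() + "->" + upper.length()); // STRASSE 6->7

        // 2. No round trip: lowercasing the result does not restore the eszett.
        System.out.println(upper.toLowerCase(root).equals(strasse)); // false

        // 3. Locale-dependent: "i" uppercases differently under Turkish rules.
        System.out.println(codePoints("i".toUpperCase(root)));    // U+0049
        System.out.println(codePoints("i".toUpperCase(turkish))); // U+0130

        // Full-width Latin letters (U+FF21 onward) also have case mappings.
        System.out.println(codePoints("\uFF21\uFF22".toLowerCase(root))); // U+FF41 U+FF42
    }
}
```

Passing an explicit Locale matters: the no-argument toUpperCase() uses the
JVM's default locale, so the same code can give different answers on a
machine configured for Turkish.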

Unfortunately, I can state from personal experience that the
performance of String.toUpperCase() and toLowerCase() is terrible:
very slow.  Not sure if that's inherent or just the quality of the
implementation.   -Tim