On 10/22/06, Wilson Bilkovich <wilsonb / gmail.com> wrote:
> The problem is that proper upcasing and downcasing of characters is
> locale-dependent, not just encoding or language-dependent.
>
> As examples, he mentioned that the uppercase version of accented
> characters varies from area to area in France.

No, it doesn't depend on jurisdiction within France. In French French, one would capitalize être as Etre. In Canadian French, one would capitalize it as Être.

> Also, in Turkish, there are four different cases of 'i', not just two.. and which is
> correct depends on the jurisdiction.

Not quite. There are two different 'i' letters: one with a dot, one without. The dotted i is capitalized with a dot (İ), and the dotless ı is capitalized without one (I).

Also, the German eszett (ß, as in Schloß) would be capitalized as SCHLOSS, but downcasing that would give schloss, not necessarily schloß. (Actually, and the Germans here will correct me on this, I'm sure, I think it would always be Schloss or Schloß, because the leading S would not be lowercased in proper German. Looking at some German webpages suggests so.)

> Determining the locale in a correct way is really, really hard. Tim
> Bray says it's basically impossible. Also, all of these rules make
> any decent upcase/downcase function ruinously slow.

Not impossible, just fraught with errors and performance issues. One would need not only the locale lookup stuff but also statistical analysis to do better than "mostly wrong" with anything but English. ;)

-austin
-- 
Austin Ziegler * halostatue / gmail.com * http://www.halostatue.ca/
               * austin / halostatue.ca * http://www.halostatue.ca/feed/
               * austin / zieglers.ca
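The casing behaviors discussed in this thread can be sketched in a few lines. This is a minimal illustration in Python, not Ruby and not any library's locale API. Python's `str.upper()` and `str.lower()` apply Unicode's default, locale-independent mappings, so any Turkish or French-France rule has to be layered on by hand. The locale codes (`"tr"`, `"az"`, `"fr_FR"`, ...) and the helpers `upcase`, `turkish_upper`, `turkish_lower`, and `strip_accents_upper` are hypothetical. The `fr_FR` rule simply assumes the accent-dropping convention described above.

```python
import unicodedata


def turkish_upper(s: str) -> str:
    """Hypothetical helper: upcase with Turkish/Azeri tailoring.

    The default mapping sends both dotted 'i' and dotless 'ı' to 'I'.
    Turkish needs dotted 'i' -> dotted 'İ' (U+0130), so substitute it first.
    """
    return s.replace("i", "\u0130").upper()


def turkish_lower(s: str) -> str:
    """Hypothetical helper: downcase with Turkish/Azeri tailoring.

    The default mapping sends 'I' -> 'i' and 'İ' -> 'i' + U+0307 (combining
    dot). Turkish needs 'I' -> dotless 'ı' (U+0131) and 'İ' -> plain 'i'.
    """
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()


def strip_accents_upper(s: str) -> str:
    """Hypothetical helper: upcase, then drop combining accents.

    Assumes the 'être' -> 'ETRE' convention from the post. NFD splits 'Ê'
    into 'E' + U+0302, and the combining mark is then filtered out.
    """
    decomposed = unicodedata.normalize("NFD", s.upper())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def upcase(s: str, locale: str) -> str:
    """Hypothetical locale-aware upcase; the locale codes are assumptions."""
    if locale in ("tr", "az"):
        return turkish_upper(s)
    if locale == "fr_FR":
        return strip_accents_upper(s)
    # Everything else (e.g. "fr_CA", "en") uses Unicode's default mapping.
    return s.upper()


print(upcase("être", "fr_FR"))     # ETRE
print(upcase("être", "fr_CA"))     # ÊTRE
print(upcase("istanbul", "tr"))    # İSTANBUL
print(upcase("istanbul", "en"))    # ISTANBUL

# The eszett does not round-trip under the default mappings:
print("Schloß".upper())            # SCHLOSS
print("SCHLOSS".lower())           # schloss
```

Each rule here is a special case bolted onto the default mapping, and choosing among them still requires knowing the locale. That is exactly the hard part the thread is arguing about.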