On Dec 8, 8:29 am, Axel Etzold <AEtz... / gmx.de> wrote:
> -------- Original Message --------
> > Date: Sat, 8 Dec 2007 22:15:00 +0900
> > From: MonkeeSage <MonkeeS... / gmail.com>
> > To: ruby-t... / ruby-lang.org
> > Subject: Re: sorting Array of accentuated Strings
>
> > On Dec 8, 1:09 am, unbewusst.s... / weltanschauung.com.invalid (Une
> > Bévue) wrote:
> > > cruiserdan <d... / zeraweb.com> wrote:
> > >
> > > > You might be interested in using some of the emerging Unicode support
> > > > in Ruby. Ruby 2 will have it built-in and there are several libraries
> > > > out there, although I don't have any experience using them.
> > >
> > > Right, thanks. I've only written a workaround until Ruby 2 arrives...
> > > --
> > > Une Bévue
> >
> > Hmmm. Maybe I'm mistaken, but this seems to have nothing to do with
> > Unicode support. An ASCII char is always going to sort before a
> > non-ASCII UTF-8 char, since String#<=> compares bytes: every ASCII
> > byte is below 0x80, while every multibyte UTF-8 character starts
> > with a byte of 0xC2 or higher.
> >
> > Fenêtre <=> Être ->
> > F (\x46) <=> Ê (\xc3\x8a) ->
> > -1
> >
> > To get the right behavior I think you have to transliterate the UTF-8
> > characters to ASCII. You can try something like:
> >
> > require 'iconv'
> > class String
> >   def translit
> >     Iconv.iconv('ascii//translit', 'utf-8', self)[0]
> >   end
> > end
> > a.sort { |i, j| i.translit <=> j.translit }
> >
> > But some people have seen strange results from #iconv (e.g., a recent
> > thread [1]).
> >
> > Regards,
> > Jordan
>
> Besides that, the problem of sorting accented strings seems to be
> somewhat unsolvable, as different natural languages using the
> same accents have different conventions.
>
> I'd claim the highest degree of inconsistency in this issue
> for the German language (other proposals invited):
>
> - German phone books sort words containing Ä, Ö, Ü as if they
>   were spelled with "AE", "OE", "UE" instead of Ä etc.,
> - otherwise, the diacritics are quite often just ignored,
> - in Austria, including in phone books, diacritics come after "z"
>   (just like in Swedish, where Ä and Ö are also used, but consistently),
> - French and Spanish use the diaeresis on some letters to mark that
>   they have to be pronounced separately (Citroën, Camagüey).
>
> (see: http://en.wikipedia.org/wiki/Collation)
>
> How can one establish a single standard for all (natural) languages
> amid such confusion?

Just to emphasize the point... Greek η (eta) can be transliterated as
e, h or i (yet another level of indirection!). ;)

> I'd recommend using a couple of gsub calls, much like Xavier Noria
> proposed in his post
>
> http://groups.google.de/group/comp.lang.ruby/browse_thread/thread/9fb...
>
> and adapting them to the situation at hand to pre-process the strings
> to sort.
>
> Best regards,
>
> Axel

Regards,
Jordan
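For newer Rubies, here's a minimal sketch of the gsub pre-processing Axel suggests, assuming Ruby 2.2 or later (for String#unicode_normalize). The helper names (phonebook_key, stripped_key) and the sample words are made up for illustration. It covers only two of the conventions above, the German phone-book one and "diacritics ignored", not the Austrian/Swedish "after z" rule. It also swaps Iconv's //translit for Unicode NFD decomposition, since Iconv left the standard library in Ruby 2.0.

```ruby
# encoding: utf-8
# Sketch: sort keys for accented strings (assumes Ruby >= 2.2).
# The helper names are hypothetical, not from any library.

# German phone-book convention: umlauts sort as if spelled "ae", "oe", "ue".
UMLAUTS = { "ä" => "ae", "ö" => "oe", "ü" => "ue",
            "Ä" => "Ae", "Ö" => "Oe", "Ü" => "Ue" }.freeze

def phonebook_key(str)
  # String#gsub with a Hash replaces each match with its hash value.
  str.gsub(/[äöüÄÖÜ]/, UMLAUTS).downcase
end

# "Diacritics ignored" convention: decompose to NFD, then drop the
# combining marks (Unicode category Mn), e.g. "Ä" -> "A" + U+0308 -> "A".
def stripped_key(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "").downcase
end

WORDS = ["Zebra", "Äpfel", "Apfel", "Adler", "Ofen", "Öl"].freeze

# Plain String#<=> compares bytes, so words starting with a
# multibyte UTF-8 character land after "Zebra".
p WORDS.sort

# Phone-book order.
p WORDS.sort_by { |w| phonebook_key(w) }

# Accents ignored; sort_by is not stable, so the raw string is
# added as a tie-breaker for pairs like "Apfel"/"Äpfel".
p WORDS.sort_by { |w| [stripped_key(w), w] }
```

With these six words, the plain sort gives Adler, Apfel, Ofen, Zebra, Äpfel, Öl; the phone-book keys give Adler, Äpfel, Apfel, Öl, Ofen, Zebra; and the stripped keys give Adler, Apfel, Äpfel, Ofen, Öl, Zebra. Which key is "right" depends entirely on the convention you need, which is Axel's point.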