On Dec 8, 8:29 am, Axel Etzold <AEtz... / gmx.de> wrote:
> -------- Original Message --------
> > Date: Sat, 8 Dec 2007 22:15:00 +0900
> > From: MonkeeSage <MonkeeS... / gmail.com>
> > To: ruby-t... / ruby-lang.org
> > Subject: Re: sorting Array of accentuated Strings
> > On Dec 8, 1:09 am, unbewusst.s... / weltanschauung.com.invalid (Une
> > Bévue) wrote:
> > > cruiserdan <d... / zeraweb.com> wrote:
>
> > > > You might be interested in using some of the emerging Unicode support
> > > > in Ruby. Ruby 2 will have it built-in and there are several libraries
> > > > out there, although I don't have any experience using them.
>
> > > Right, thanks. I've only written a workaround while waiting for Ruby 2...
> > > --
> > > Une Bévue
>
> > Hmmm. Maybe I'm mistaken, but this seems to have nothing to do with
> > Unicode support. An ASCII character always compares lower than an
> > accented UTF-8 character: UTF-8 is a superset of ASCII, and every
> > non-ASCII character is encoded with bytes above \x7f.
>
> > Fenêtre <=> Être ->
>
> > F (\x46) <=> Ê (\xc3\x8a) ->
>
> > -1
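
The comparison above can be checked directly. This is only a minimal sketch, assuming Ruby 2.0 or later (where source files default to UTF-8); the variable names are made up for illustration.

```ruby
# Compare the two words from the example, looking at their leading bytes.
a = "Fenêtre"
b = "Être"

first_a = a.bytes.first      # 0x46, the single byte that encodes "F"
first_b = b.bytes.first(2)   # [0xC3, 0x8A], the two bytes that encode "Ê"

# String#<=> compares the underlying bytes, so 0x46 < 0xC3 decides the result.
result = a <=> b
puts format("%s <=> %s => %d", a, b, result)
```

So "Fenêtre" sorts before "Être" simply because the first byte of "F" is smaller than the first byte of "Ê".
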
>
> > To get the right behavior I think you have to transliterate the UTF-8
> > characters to ASCII. You can try something like:
>
> > require 'iconv'
> > class String
> >   # Transliterate UTF-8 to the closest ASCII equivalent.
> >   def translit
> >     Iconv.iconv('ascii//translit', 'utf-8', self)[0]
> >   end
> > end
> > # a is the array of strings to sort
> > a.sort { |i, j| i.translit <=> j.translit }
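
On newer Rubies the iconv library is no longer part of the standard library (it was removed in Ruby 2.0). A rough alternative, assuming Ruby 2.2 or later where String#unicode_normalize is available, is to decompose each string and drop the combining marks. The helper name strip_accents is hypothetical, and this is not a full replacement for //translit: characters without a decomposition (ß, æ and so on) pass through unchanged.

```ruby
# Hypothetical helper: decompose to NFD, then remove combining marks (\p{Mn}),
# so "Ê" becomes "E" followed by nothing.
def strip_accents(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

words  = ["Fenêtre", "Être", "Eau"]

# Sort on the accent-free key; the original strings are returned untouched.
sorted = words.sort_by { |w| strip_accents(w) }
p sorted
```

Using sort_by rather than a sort block means each key is computed once per string instead of once per comparison.
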
>
> > But some people have had strange effects from #iconv (e.g., a recent
> > thread [1]).
>
> > Regards,
> > Jordan
>
> Besides that, the problem of sorting accented strings seems to be
> somewhat unsolvable, as different natural languages using the
> same accents have different conventions.
> I'd claim the highest degree of inconsistency in this issue
> for the German language (other proposals invited):
>
> - German phone books sort words containing Ä, Ö, Ü as if they were
>   spelled with "AE", "OE", "UE" instead,
> - otherwise, the diacritics are quite often just ignored,
> - in Austria, including in phone books, letters with diacritics come
>   after "z" (just like in Swedish, where Ä and Ö are also used, but
>   consistently),
> - French and Spanish use a diaeresis on some letters to mark that
>   they have to be pronounced separately (Citroën, Camagüey).
>
> (see: http://en.wikipedia.org/wiki/Collation)
>
> How can one establish a single standard for all (natural) languages
> amid such confusion?

Just to emphasize the point... Greek η (eta) can be transliterated as
e, ē (yet another level of indirection!), h or i. ;)

> I'd recommend using a couple of gsub calls, much like Xavier Noria
> proposed in his post
>
> http://groups.google.de/group/comp.lang.ruby/browse_thread/thread/9fb...
>
> and to adapt them to the situation at hand to pre-process the strings
> to sort.
>
> Best regards,
>
> Axel
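
A gsub-based pre-processing step along those lines might look like the sketch below. It is not Xavier's code; the table names and collation_key are hypothetical, and only the German umlauts are covered. It assumes Ruby 1.9 or later, where gsub accepts a hash as the replacement.

```ruby
# Two of the German conventions mentioned above, as substitution tables.
PHONEBOOK  = { 'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
               'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue' }.freeze
DICTIONARY = { 'ä' => 'a', 'ö' => 'o', 'ü' => 'u',
               'Ä' => 'A', 'Ö' => 'O', 'Ü' => 'U' }.freeze

# Hypothetical helper: build an ASCII-only, lowercase sort key for one word.
def collation_key(str, table)
  str.gsub(/[äöüÄÖÜ]/, table).downcase
end

words = ["Oz", "Öl", "Ofen"]

phonebook_order  = words.sort_by { |w| collation_key(w, PHONEBOOK) }
dictionary_order = words.sort_by { |w| collation_key(w, DICTIONARY) }

p phonebook_order
p dictionary_order
```

The same three words come out in a different order depending on the table, which is the whole problem: the table has to be picked per language and per convention.
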

Regards,
Jordan