-------- Original Message --------
> Date: Sat, 8 Dec 2007 22:15:00 +0900
> From: MonkeeSage <MonkeeSage / gmail.com>
> To: ruby-talk / ruby-lang.org
> Subject: Re: sorting Array of accentuated Strings

> On Dec 8, 1:09 am, unbewusst.s... / weltanschauung.com.invalid (Une
> Bue) wrote:
> > cruiserdan <d... / zeraweb.com> wrote:
> >
> > > You might be interested in using some of the emerging Unicode support
> > > in Ruby. Ruby 2 will have it built-in and there are several libraries
> > > out there, although I don't have any experience using them.
> >
> > right, thanks, i've only wrote a workaround before getting Ruby 2...
> > --
> > Une Bue
> 
> Hmmm. Maybe I'm mistaken, but this seems to have nothing to do with
> unicode. An ascii char is always going to be less than a utf-8 char,
> since utf-8 is a superset of ascii.
> 
> Fenêtre <=> Être ->
> 
> F (\x46) <=> Ê (\xc3\x8a) ->
> 
> -1
> 
> To get the right behavior I think you have to translate the utf-8
> characters to ascii. You can try something like:
> 
> require 'iconv'
> class String
>   def translit
>     Iconv.iconv('ascii//translit', 'utf-8', self)[0]
>   end
> end
> a.sort { | i, j | i.translit <=> j.translit }
> 
> But some people have had strange effects from #iconv (e.g., a recent
> thread [1]).
> 
> Regards,
> Jordan
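
The byte-wise comparison quoted above can be reproduced with a minimal
sketch like the following. It assumes a UTF-8 source file and UTF-8
strings; the sample words are made up for illustration and are not the
original poster's data.

```ruby
# encoding: utf-8
# Plain String#<=> compares strings byte by byte. In UTF-8 every
# accented letter starts with a byte >= 0xC2, so it sorts after
# every ASCII letter, whatever its base letter is.
words = ["Zèbre", "Être", "Fenêtre"]   # hypothetical sample data

# First byte of each word: "Z" is 0x5A, "Ê" starts with 0xC3, "F" is 0x46.
first_bytes = words.map { |w| w.bytes.first }

# Default sort, i.e. byte order.
bytewise = words.sort

puts first_bytes.map { |b| format("0x%02X", b) }.join(" ")
puts bytewise.join(", ")
```

With this input, "Être" ends up last, behind "Zèbre", which is the
effect the original poster complained about.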

Besides that, the problem of sorting accented strings seems to have
no general solution, since different natural languages that use the
same accents follow different conventions.
I'd nominate German as the most inconsistent case
(other nominations welcome):

- German phone books sort words containing Ä, Ö, Ü as if they were
spelled with "AE", "OE", "UE" instead,
- otherwise, the diacritics are quite often simply ignored,
- in Austria, including in phone books, letters with diacritics sort
after "z", just as in Swedish, where Ä and Ö are also used (but
consistently),
- French and Spanish put a diaeresis on some vowels to mark that
they have to be pronounced separately (Citroën, Camagüey).

(see: http://en.wikipedia.org/wiki/Collation)

How can one establish a single standard for all (natural) languages
amid such confusion?

I'd recommend using a couple of gsub calls, much as Xavier Noria
proposed in his post

http://groups.google.de/group/comp.lang.ruby/browse_thread/thread/9fbb85fa49dd700f/eed0350375a53abe

and adapting them to the situation at hand to pre-process the strings
before sorting.
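
As a rough sketch of what I mean (this is not Xavier Noria's code; the
tables PHONE_BOOK and IGNORE_DIACRITICS and the helpers sort_key and
sort_with are hypothetical names of my own, and the tables cover only
a few German letters), one could build a sort key with gsub and choose
the replacement table per convention:

```ruby
# encoding: utf-8
# Hypothetical replacement tables, one per collation convention.
PHONE_BOOK = {
  "Ä" => "AE", "Ö" => "OE", "Ü" => "UE",
  "ä" => "ae", "ö" => "oe", "ü" => "ue", "ß" => "ss"
}
IGNORE_DIACRITICS = {
  "Ä" => "A", "Ö" => "O", "Ü" => "U",
  "ä" => "a", "ö" => "o", "ü" => "u", "ß" => "ss"
}

# Replace every character listed in the table, then fold case so that
# the comparison runs on plain lowercase ASCII.
def sort_key(str, table)
  pattern = Regexp.union(table.keys)
  str.gsub(pattern) { |ch| table[ch] }.downcase
end

# Sort by the pre-processed key; the original string breaks ties so
# the result does not depend on the sort being stable.
def sort_with(words, table)
  words.sort_by { |w| [sort_key(w, table), w] }
end

names = ["Muller", "Müller", "Mueller", "Mufti"]   # made-up sample data

puts sort_with(names, PHONE_BOOK).join(", ")
puts sort_with(names, IGNORE_DIACRITICS).join(", ")
```

The same input comes out in a different order for each table: with
PHONE_BOOK, "Müller" sorts next to "Mueller" and before "Mufti"; with
IGNORE_DIACRITICS it sorts next to "Muller" and after "Mufti". So the
table has to be picked to match the convention you actually need.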

Best regards,

Axel 
