Bill Kelly wrote:
> Brian Candler wrote:
>   
>> I got as far as recording 200 behaviours of String in ruby 1.9 before I 
>> gave up:
>> http://github.com/candlerb/string19/blob/master/string19.rb
>>
>> The solution I use is simple: stick to ruby 1.8.x. When that branch 
>> dies, perhaps reia will be ready. If not I'll move to something else.
>>
>> IMO, both python 3 and erlang have got the right idea when it comes to 
>> handling UTF8.
>>     
>
> Could you summarize what you feel the key difference of
> the python 3 / erlang approach is, compared to ruby19 ?
>   

Taking a UTF-8-only approach is easier to implement because you require
all strings to be UTF-8 and ignore the cases where this doesn't work.
Kind of like saying everything will be ASCII or converted to it ;)

> I'm a relative newbie in dealing with character encodings,
> but I do recall a few lengthy discussions on this list when
> ruby19's M17N was being developed, where the "UTF-8 only"
> approaches of some other languages were deemed insufficient
> for various reasons.
>   

Not everything maps one-to-one to UTF-8.

> However, my understanding is that one is supposed to be
> able to effectively make ruby behave as a "UTF-8 only"
> language if one makes sure external data is transcoded to
> UTF-8 at I/O boundaries.
>   

That is pretty much it.  The problem is that a lot of libraries still
don't handle encodings.  This results in some spurious errors when a
function requiring compatible encodings operates on the strings those
libraries return[1].  The solution is to add encoding support to the
libraries.
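
To make that concrete, here is a minimal sketch of both halves: transcoding
once at the I/O boundary, and the kind of error you get when a string comes
back tagged as binary.  The strings and variable names are made up for
illustration, and the "library" is only simulated by tagging a string as
BINARY by hand.

```ruby
# encoding: utf-8

# Raw bytes as they might arrive from an ISO-8859-1 source ("cafe" with e-acute).
raw = "caf\xE9".dup.force_encoding(Encoding::BINARY)

# Transcode once, at the boundary, so the rest of the program only sees UTF-8.
utf8 = raw.encode(Encoding::UTF_8, Encoding::ISO_8859_1)

# Simulate a library that ignores encodings: the bytes are valid UTF-8,
# but the string is tagged as BINARY (ASCII-8BIT).
untagged = "caf\xC3\xA9".dup.force_encoding(Encoding::BINARY)

greeting = "na\u00EFve "     # UTF-8 string containing a non-ASCII character

ok = greeting + utf8         # both operands are UTF-8, so this works

error = begin
  greeting + untagged        # UTF-8 + BINARY, both with non-ASCII bytes
  nil
rescue Encoding::CompatibilityError => e
  e
end

# The fix belongs in the library: tag (or transcode) what it returns.
fixed = greeting + untagged.dup.force_encoding(Encoding::UTF_8)
```

Here `error` ends up holding an Encoding::CompatibilityError, while `ok`
and `fixed` are ordinary UTF-8 strings.  If either operand were pure ASCII
the concatenation would succeed, which is why these errors tend to show up
only once non-ASCII data arrives.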

Edward

1. As opposed to ruby 1.8, which would silently ignore actual errors
caused by the use of incompatible encodings.