I would like to chime in here and point out that sometimes you really 
want to ignore the errors caused by mismatched encodings. That was the 
case in my script, where I just wanted to match filenames ending in *.mpg 
and really didn't care whether the characters before that had any 
funkiness going on.

1.8 had this kind of behavior by default, and I'm assuming python3 and 
erlang do too based on the descriptions given in this thread.

As Matz pointed out, you can force ruby1.9 to have this behavior simply 
by using the ASCII-8BIT (binary) encoding rather than the default 7-bit 
US-ASCII encoding.  This basically causes the regular expression engine 
to look at the string as a series of bytes again, like it used to, 
rather than freaking out when it sees a byte with the high bit set that 
it doesn't expect.
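
For what it's worth, here is a rough sketch of what I mean, written 
from my own filename case.  The helper name mpg_file? and the sample 
filename are made up for illustration; the only real API used is 
String#force_encoding, which relabels the bytes without converting them.

```ruby
# Hypothetical helper (name is an assumption, not from any library):
# true if the name ends in ".mpg", whatever bytes come before it.
def mpg_file?(name)
  # dup so the caller's string keeps its own encoding tag;
  # force_encoding only relabels the bytes, it does not transcode.
  bytes = name.dup.force_encoding(Encoding::ASCII_8BIT)
  # An ASCII-only pattern is compatible with a binary string, so the
  # regexp engine just compares bytes here.
  !!(bytes =~ /\.mpg\z/)
end

# Made-up filename with a stray Latin-1 byte (0xE9) in it.
name = "caf\xE9 video.mpg"

puts mpg_file?(name)        # => true
puts mpg_file?("notes.txt") # => false
```

The trade-off is the one discussed in this thread: treating the name as 
raw bytes is permissive, so it won't tell you when the data really is 
in the wrong encoding.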

I'm by no means knowledgeable about encodings, so take what I'm about to 
say with a grain of salt.  It seems like the old way of handling 
encodings was permissive but imprecise, and the new way is precise but 
not always permissive.  I like the ability to be precise, because before 
that simply wasn't an option.  However, since a lot of people seem to be 
confused by the default behavior, why not make the default behavior 
permissive and set it up so that IF YOU WANT to be precise you can 
enable the proper encodings that ensure that behavior?  To me this 
seems to fall in line with the principle of least surprise.  (Sorry for 
quoting it, I know it's over-quoted.)

What do people think?

         Regards
           Gary


Edward Middleton wrote:
> Bill Kelly wrote:
>>> handling UTF8.
>>>     
>>
>> Could you summarize what you feel the key difference of
>> the python 3 / erlang approach is, compared to ruby19 ?
>>   
> 
> Taking a UTF-8 approach is easier to implement because you enforce all
> strings to be UTF-8 and ignore when this doesn't work.  Kind of like
> saying everything will be ASCII or converted to it ;)
> 
>> I'm a relative newbie in dealing with character encodings,
>> but I do recall a few lengthy discussions on this list when
>> ruby19's M17N was being developed, where the "UTF-8 only"
>> approaches of some other languages were deemed insufficient
>> for various reasons.
>>   
> 
> Not everything maps one-to-one to UTF-8.
> 
>> However, my understanding is that one is supposed to be
>> able to effectively make ruby behave as a "UTF-8 only"
>> language if one makes sure external data is transcoded to
>> UTF-8 at I/O boundaries.
>>   
> 
> That is pretty much it.  The problem is that a lot of libraries still
> don't handle encodings.  This results in some spurious errors when a
> function requiring compatible encoding operates on them[1].  The
> solution is to add support for handling encodings.
> 
> Edward
> 
> 1. As opposed to ruby 1.8, which would silently ignore actual errors
> caused by the use of incompatible encodings.
