Seebs wrote:
> On 2010-02-08, Brian Candler <b.candler / pobox.com> wrote:
>> The scariest bit for me is that a simple expression like
>>
>>     a = b + c
>>
>> (where b and c are both Strings) can raise exceptions. Writing your 
>> program so that you can be *sure* it won't raise an exception is hard. 
> 
> I'd rather get an exception than silently get incoherent output, though.

Likewise.

>> I don't want to have to expend effort working around artefacts of the 
>> language, especially when dealing with binary data.

It seems to me encodings are less artifacts of *the* language and more
artifacts of *language*.

> To some extent, I agree, but I was under the impression that you could
> address this by specifying a desired encoding.

Indeed, one can force_encoding ASCII-8BIT, if one wants "a = b + c" to
simply concatenate bytes without complaining that one may be jamming two
incompatible encodings together.
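
A minimal sketch of what I mean, assuming two strings that each contain
non-ASCII characters in different encodings. The helper names
concat_raises? and concat_bytes are made up for illustration; only
String#encode, String#force_encoding and Encoding::CompatibilityError
come from Ruby itself.

```ruby
# Returns true if "a + b" raises because the encodings are incompatible.
def concat_raises?(a, b)
  a + b
  false
rescue Encoding::CompatibilityError
  true
end

# Concatenates the raw bytes of both strings. The inputs are dup'ed so
# that the caller's strings keep their original encoding tags.
def concat_bytes(a, b)
  a.dup.force_encoding(Encoding::ASCII_8BIT) +
    b.dup.force_encoding(Encoding::ASCII_8BIT)
end

utf8   = "caf\u00E9"                        # UTF-8, 5 bytes
latin1 = utf8.encode(Encoding::ISO_8859_1)  # ISO-8859-1, 4 bytes

puts concat_raises?(utf8, latin1)           # both non-ASCII, so this raises
puts concat_bytes(utf8, latin1).bytesize    # 5 + 4 bytes, simply joined
```

Note that force_encoding only changes the tag on the string; it does not
touch the bytes.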

Also, reading a file opened in "rb" mode returns strings with encoding
already set to ASCII-8BIT.
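
For instance, something along these lines (the read_binary helper and
the temp file are just scaffolding for the example, not anything from
the original discussion):

```ruby
require 'tempfile'

# Reads the whole file in binary mode; the result is tagged ASCII-8BIT.
def read_binary(path)
  File.open(path, "rb") { |f| f.read }
end

Tempfile.create("rb_demo") do |tmp|
  tmp.binmode
  # Arbitrary bytes, including some that are not valid UTF-8.
  tmp.write([0xFF, 0x00, 0x63, 0x61, 0x66, 0xC3, 0xA9].pack("C*"))
  tmp.flush

  data = read_binary(tmp.path)
  puts data.encoding   # the binary encoding, regardless of locale
  puts data.bytesize   # 7
end
```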

So it's still possible to treat strings as binary in 1.9.


If it were really true that, at any given point in my program, I couldn't
be sure whether string 'b' had some random encoding incompatible with
that of string 'c', then I think I'd agree with Brian that string
handling in 1.9 has become unreasonably complex.

But in practice, so far it has worked well for me to transcode to UTF-8
at I/O boundaries.  (Or, to use "rb" or force ASCII-8BIT if I know I'm
specifically dealing with binary data.)
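
As a rough sketch of "transcode at the I/O boundary", assuming the
file's real encoding is known up front: the "r:EXTERNAL:INTERNAL" mode
string asks Ruby to convert while reading. The read_as_utf8 helper is a
name I made up for the example.

```ruby
require 'tempfile'

# Reads a file stored in the given external encoding and hands back
# UTF-8 strings, so the rest of the program only ever sees UTF-8.
def read_as_utf8(path, external)
  File.open(path, "r:#{external}:UTF-8") { |f| f.read }
end

Tempfile.create("latin1_demo") do |tmp|
  tmp.binmode
  tmp.write([0x63, 0x61, 0x66, 0xE9].pack("C*"))  # "cafe" with e-acute in ISO-8859-1
  tmp.flush

  text = read_as_utf8(tmp.path, "ISO-8859-1")
  puts text.encoding          # UTF-8
  puts text == "caf\u00E9"    # true
end
```

For a string that is already in memory, String#encode with a target and
a source encoding does the same conversion.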

So far, I'm just not experiencing much pain in dealing with encodings in
1.9.  And the places where I have encountered exceptions have been
occasions when I really would have been jamming incompatible encodings
together, and I was glad to know about it rather than produce bogus data.

(In this case I was reading lines via popen() from a program ostensibly
outputting ISO_8859_1, but which under some circumstances, for some
fields, could output UTF-8 or MACROMAN.  So yes, I had to do some extra
work at the I/O boundary to handle such cases as well as possible, but
that is hardly Ruby's fault.)
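
The extra work can look roughly like the following. This is only a
sketch of one possible heuristic, not the code I actually used: take the
raw bytes, keep them as UTF-8 if they happen to be valid UTF-8, and
otherwise assume a single-byte fallback encoding. The to_utf8 name and
its fallback parameter are hypothetical.

```ruby
# Best-effort conversion of raw bytes to UTF-8.
# Assumption: anything that is not valid UTF-8 is in the fallback encoding.
def to_utf8(raw, fallback = Encoding::ISO_8859_1)
  as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return as_utf8 if as_utf8.valid_encoding?

  # Retag as the fallback encoding, then transcode the bytes to UTF-8.
  raw.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

utf8_bytes   = [0x63, 0x61, 0x66, 0xC3, 0xA9].pack("C*")
latin1_bytes = [0x63, 0x61, 0x66, 0xE9].pack("C*")

puts to_utf8(utf8_bytes)   == "caf\u00E9"   # true
puts to_utf8(latin1_bytes) == "caf\u00E9"   # true
```

Each line read from the pipe in binary mode would go through such a
helper.  Validity alone cannot tell ISO-8859-1 from MacRoman, since every
byte sequence is valid in both, so picking the fallback for a given field
still takes knowledge of the program producing the output.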


Regards,

Bill