On 20.03.2011 14:19, Brian Candler wrote:
> Robert K. wrote in post #988404:
>> --------------------------------------------------- IO#external_encoding
>>        io.external_encoding   =>  encoding
>>
>>        From Ruby 1.9.1
>> ------------------------------------------------------------------------
>>        Returns the Encoding object that represents the encoding of the
>>        file. If io is write mode and no encoding is specified, returns
>>        +nil+.
>>
>> I'd say it means that the default encoding is used.
>
> No, it doesn't.

So, which encoding is used then?  An encoding *has* to be used because 
you cannot write to a file without a particular encoding.  There needs 
to be a defined mapping between character data and bytes in the file.
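As far as I can tell (a minimal sketch, assuming a UTF-8 string and
locale), with no external encoding given the string's own bytes go to
the file unchanged, i.e. the string's encoding provides that mapping:

s = "\u00E9"                          # "e" with acute accent, UTF-8
File.open("x","w") {|io| io.write s}  # external_encoding is nil here
File.open("x","rb") {|io| io.read}.bytes.to_a  # => [195, 169]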

>> Apparently the file *is* encoded in UTF-8 because I can read it without
>> errors
>
> ruby 1.9 does not give errors if you read a file which is not UTF-8
> encoded when the external encoding is UTF-8. You will just get strings
> with valid_encoding? false.
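That behavior is easy to reproduce (a sketch; the single 0xE9 byte is
legal ISO-8859-1 but not valid UTF-8):

File.open("y","wb") {|io| io.write "\xE9"}    # one non-UTF-8 byte
s = File.open("y","r:UTF-8") {|io| io.read}   # reads without error
s.valid_encoding?                             # => false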

In my case, though, I could see in the console that the file was read
properly.  Also:

irb(main):001:0> File.open("x","w"){|io| p io.external_encoding; io.puts "a"}
nil
=> nil
irb(main):002:0> s = File.open("x","r:UTF-8"){|io| p io.external_encoding; io.read}
#<Encoding:UTF-8>
=> "a\n"
irb(main):003:0> s.valid_encoding?
=> true
irb(main):004:0>

> It will give errors if you attempt UTF-8 regexp matches on the data
> though.
>
> The rules for which methods give errors and which don't are pretty odd.
> For example, string[n] doesn't give an exception, even if the string is
> invalid.
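Indeed, the asymmetry is easy to see (a quick sketch with a
hand-built invalid UTF-8 string):

s = "\xE9".force_encoding("UTF-8")
s.valid_encoding?   # => false
s[0]                # => "\xE9", no exception
s =~ /a/            # raises ArgumentError: invalid byte sequence in UTF-8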

I would concede that encodings in Ruby are pretty complex.  Things
look simpler in Java, where a String never carries a particular
encoding and encodings only come into play when reading and writing.
However, as I have learned on this list, Java's Strings were not
capable of handling all Asian symbols, i.e. characters outside the
Basic Multilingual Plane.  Since 1.5 the full range of Unicode code
points can be covered via surrogate pairs - at the cost of making
String handling a mess:

http://download.oracle.com/javase/6/docs/api/java/lang/String.html#codePointAt%28int%29

Now String.length() no longer returns the length in real characters
(code points) but the length in UTF-16 code units, so a supplementary
character counts as two.  I figure Ruby's solution might not be so bad
after all.
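For comparison, a quick sketch of the Ruby 1.9 side (U+1D11E,
MUSICAL SYMBOL G CLEF, lies outside the BMP):

s = "\u{1D11E}"   # one character beyond the BMP
s.length          # => 1, counted in code points
s.bytesize        # => 4, the UTF-8 bytes

In Java the equivalent String reports length() == 2.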

Kind regards

	robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/