On 26.6.2006, at 20:37, Michal Suchanek wrote:

>> However, whether you use an encoding or not, you still get a String
>> back. Consider:
>>
>>   s1 = File.open("file.txt", "rb") { |f| f.read }
>>   s2 = File.open("file.txt", "rb", encoding: :utf8) { |f| f.read }
>>
>>   s1.class == s2.class # true
>>   s1.encoding == s2.encoding # false
>>
>> But that doesn't mean I have to keep treating s1 as a raw data byte
>> array -- or even convert it.
>>
>>   s1.encoding = :utf8
>>   s1.encoding == s2.encoding # true
>>
>> I think that the fundamental difference here is whether you view
>> encoded strings as fundamentally different objects, or whether you
>> view the encodings as *lenses* on how to interpret the object data.
>> I prefer the latter view.
>
> If you consider
>   s3 = File.open('legacy.txt','rb',:iso885915) { |f| f.read }
> without autoconversion you would have to immediately do
>   s3.recode :utf8
> otherwise s1 + s3 would not work.

Yes. This shows that if there is no autoconversion, the programmer will
always need to recode to a common application encoding if the
application is to work without problems. And if we always need to
recode strings that we receive from third-party classes/libraries,
either encoding handling will consume half of the program lines, or
people won't do it and programs will be full of errors. As experience
with other languages (and with Ruby) shows, the second option will
prevail, and we will be in a mess not much better than today's.

Therefore m17n without autoconversion (as in Matz's current proposal)
gains us almost nothing. If we have no autoconversion, my vote goes to
a Unicode internal encoding (because it implicitly handles the
autoconversion problems).
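To make the s1 + s3 point concrete, here is a minimal sketch. It assumes the String API that later shipped in Ruby (String#b, force_encoding, encode, Encoding::CompatibilityError), not the `encoding=` / `recode` names used in this thread, and `mix_encodings` is a hypothetical helper name chosen only for illustration.

```ruby
# Sketch only: uses the String API from later Ruby releases, not the
# API proposed in this thread. mix_encodings is a hypothetical helper.
def mix_encodings(utf8_bytes, latin9_bytes)
  # "Lens" view: retag the same bytes without changing them.
  s1 = utf8_bytes.dup.force_encoding(Encoding::UTF_8)
  s3 = latin9_bytes.dup.force_encoding(Encoding::ISO_8859_15)

  # Without autoconversion, concatenating non-ASCII strings that carry
  # different encodings raises instead of silently mixing bytes.
  mixed_ok =
    begin
      s1 + s3
      true
    rescue Encoding::CompatibilityError
      false
    end

  # Explicit recode to the common application encoding, then concatenate.
  combined = s1 + s3.encode(Encoding::UTF_8)

  { mixed_ok: mixed_ok, combined: combined, s1_bytes: s1.bytes }
end

# "e with acute accent" as raw bytes in UTF-8 and in ISO-8859-15.
result = mix_encodings("\xC3\xA9".b, "\xE9".b)
puts result[:mixed_ok]
puts result[:combined].encoding
```

In this sketch the retagging step leaves the bytes untouched, while the recode step is the per-string boilerplate that every call site would need when there is no autoconversion.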

On the topic of ByteArray: my concern is that the distinction between
bytes and characters will not be clear, and that we will therefore need
ByteArray to separate bytes from characters and to keep code like

  result = File.open("file") { |f| f.read 1000 }

reliable and predictable (now tell me what 'result' is?). If there are
clear and simple rules, such as "IO always returns binary strings if
not given an encoding parameter", then this distinction will not need
to be additionally enforced by separate classes. One String class will
do. On the other hand, if there are all kinds of automatic encoding
tagging for the convenience of simple-script writers, then we need
ByteArray to prevent error-prone code with undefined results.

izidor