On 26.6.2006, at 21:46, Austin Ziegler wrote:

> On 6/26/06, Izidor Jerebic <ij.rubylist / gmail.com> wrote:
>> On 26.6.2006, at 20:37, Michal Suchanek wrote:
>>> If you consider s3 = File.open('legacy.txt','rb',:iso885915) { |f|
>>> f.read } without autoconversion you would have to immediately do
>>> s3.recode :utf8 otherwise s1 + s3 would not work.
>> Yes. This shows that if there is no autoconversion, the programmer will
>> always need to recode to a common app encoding if the application is
>> to work without problems. And if we always need to recode strings
>> which we receive from third-party classes/libraries, encoding handling
>> will either consume half of the program's lines or people won't do it
>> and programs will be full of errors. As can be seen from the experience
>> of other languages (and Ruby), the second option will prevail and we
>> will be in a mess not much better than today.
>
> I doubt this is in the least bit true.
> I'm saying that your cure is far worse than disease.

Basically, I am just advocating getting autoconversion into the  
"official" proposal. I am not proposing Unicode. But if there is no  
autoconversion, Unicode is better. That claim is meant to drum up  
support for autoconversion :-)

BTW, you may have no problems at all. We, on the other hand, have  
lots of problems (in Ruby and other languages) which can be traced to  
exactly this hope that "all programmers will do lots of manual work  
to make things safe for others". You are deluded.

In environments which already have this cure (internal Unicode),  
there are no problems as enormous as those we experience in  
environments without it. So the successes and failures I describe are  
based on real experience, unlike your claims, which are just opinions.

I am not saying that Unicode encoding is the ideal solution. But it  
has turned out to be quite a good one, and certainly much better than  
manual checking/changing of encodings.

>
>> Therefore m17n without autoconversion (as is Matz's current proposal)
>> gains us almost nothing. If we have no autoconversion, my vote  
>> goes to
>> Unicode internal encoding (because it implicitly handles
>> autoconversion problems).
>
> So does the coersion proposal that I've made without locking ourselves
> into Unicode.

But that is your proposal (and mine, and several others'), not  
Matz's. The current "official" proposal will make a mess.

>
> I couldn't make sense of your last paragraph.

Well, tell me exactly what I get when this code executes:

result = File.open("file") { |f| f.read(1000) }

What is 'result'? A binary string under all circumstances? Or do I  
sometimes get an encoded String and sometimes a binary String? Which  
one, under what circumstances?

This is called error-prone code with undefined results.

We have two equally good options:
1. If we change the API so that IO returns a ByteArray, we have no  
confusion.
2. If we have clear and simple rules about IO returning Strings, we  
also have no confusion.
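To make option 1 concrete: with purely byte-oriented IO the answer to "what do I get?" is always the same — raw bytes. A minimal sketch, using the semantics where a binary-mode read is tagged as plain bytes (ASCII-8BIT); this is only an illustration of the rule, not part of Matz's proposal:

```ruby
# Write four ISO-8859-15 bytes ("café", with 0xE9 for 'é'), then read
# them back in binary mode without declaring any expected encoding.
File.open("legacy.txt", "wb") { |f| f.write("caf\xE9".b) }

result = File.open("legacy.txt", "rb") { |f| f.read(1000) }

# With byte-oriented reads the result is predictable: always raw bytes,
# never a string that is sometimes tagged and sometimes not.
p result.bytesize   # => 4
p result.encoding   # binary (ASCII-8BIT), i.e. "no encoding, just bytes"
```

The point is not the particular API, only that the caller can state what `result` is without knowing anything about locales or defaults.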

Therefore, if there is going to be complex auto-magic String tagging  
with encodings, I prefer introducing a ByteArray, because it will  
prevent errors.


izidor