On Thu, 19 Feb 2009 21:00:52 +1100, Tanaka Akira <akr / fsij.org> wrote:

> It seems the number, 40, is a number for "big enough for
> names".
>
> Why don't you use a 40-byte data format, both with Ruby 1.8
> and 1.9?
>
> Do you think that 40 bytes is not big enough for names in
> some countries?
>
> If the data format uses 40 bytes, instead of 40 chars,
> it is easy to read it in Ruby 1.8, even if it contains UTF-8
> chars.

Sorry, I think you have missed the point. I have input forms which accept
40 characters, and I don't want to change that. I want them to accept
UTF-8 characters, and storing those in 40 bytes would mean possible
truncation, which I don't want. Not only would this be bad for the user,
but I'd then have to truncate on a whole-character boundary so that the
string is 40 bytes or less. That might in turn require space padding to
bring the total length to exactly 40 bytes. Very messy, very bad.
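To make the "messy" part concrete, here is a minimal sketch of what fitting a 40-character form value into a 40-byte field would involve. It assumes Ruby 1.9 or later and a correctly tagged UTF-8 string; `to_fixed_bytes` is a hypothetical helper name, not part of Ruby.

```ruby
# Hypothetical helper (not part of Ruby): fit a string into a fixed-size
# byte field without splitting a multibyte character, then space-pad it.
# Assumes Ruby 1.9+ and a correctly tagged (e.g. UTF-8) string.
def to_fixed_bytes(str, width = 40)
  used = 0
  # Keep whole characters while their cumulative byte size still fits.
  kept = str.each_char.take_while do |ch|
    used += ch.bytesize
    used <= width
  end
  fitted = kept.join
  # Pad with single-byte spaces so the field is exactly `width` bytes.
  fitted + " " * (width - fitted.bytesize)
end

field = to_fixed_bytes("é" * 30)
field.bytesize  # => 40
field.length    # => 20 (characters): ten of the user's characters were dropped
```

Note that the result is 40 bytes but only 20 characters wide, which is exactly the kind of field that breaks reports laid out in characters.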

Also, there are reports that read the data and expect each field to be
40 characters wide. If a field were not 40 characters, the report
formatting could break.

>> Also I think there are other cases where applications that used
>> IO#read to read a fixed-length ASCII string will need to be changed to
>> read the same fixed length in chars instead. Currently the only way to
>> do this in Ruby is to use a loop, I believe.
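A minimal sketch of the loop referred to here, assuming Ruby 1.9 or later and an IO whose external encoding matches the data (for a file, something like `File.open(path, "r:UTF-8")`). `read_chars` is a hypothetical helper, not a core method; StringIO is used only so the example is self-contained.

```ruby
require "stringio"

# Hypothetical helper (not a core method): read up to n *characters*
# from an IO by looping over IO#getc. Assumes Ruby 1.9+ and that the
# IO's external encoding matches the data (e.g. "r:UTF-8" for a file).
def read_chars(io, n)
  chars = []
  while chars.size < n
    ch = io.getc
    break if ch.nil?   # EOF reached before n characters
    chars << ch
  end
  chars.join           # "" if nothing could be read
end

# Two fixed-width records of 10 characters each (String#ljust pads by characters).
io = StringIO.new("Müller".ljust(10) + "Zoë".ljust(10))
read_chars(io, 10)  # => "Müller    "
read_chars(io, 10)  # => "Zoë       "
```

By contrast, `io.read(10)` counts bytes, so on the same data it would stop one character short of the first record because "ü" takes two bytes in UTF-8.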
>
> I'd like to hear the actual example.

How many examples are enough?


>> Also it seems to me that the current meaning of the "limit" parameter
>> of IO#gets in 1.9 is not intuitive. It is "maximum number of bytes, but
>> don't split a character", and I think it should be changed to mean
>> "maximum number of chars". That would be much more obvious, more
>> useful (IMHO), and still be backward compatible with 1.8.
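A rough sketch of the proposed semantics, written as a hypothetical helper `gets_chars` rather than as a change to IO#gets itself. It assumes Ruby 1.9 or later, a single-character separator such as "\n", and a positive limit. With the current 1.9 meaning, the limit counts bytes (stretched only enough not to split a character), so how many characters come back depends on the encoding of the data; here the limit is simply a character count.

```ruby
require "stringio"

# Hypothetical helper sketching the proposed semantics for a gets limit:
# read one line, but return at most max_chars *characters*.
# Assumes Ruby 1.9+, max_chars > 0, and a single-character separator.
def gets_chars(io, max_chars, sep = "\n")
  line = []
  while line.size < max_chars && (ch = io.getc)
    line << ch
    break if ch == sep             # end of line reached
  end
  line.empty? ? nil : line.join    # nil at EOF, like IO#gets
end

io = StringIO.new("ééééé\nabc\n")
gets_chars(io, 3)   # => "ééé"
gets_chars(io, 10)  # => "éé\n"
```

A character limit never splits a character by construction, so the "don't split" guarantee comes for free.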
>
> It is introduced for security reasons. Bytes are more stable
> than characters.

I understand that it is there to prevent multibyte characters from being
split. However, I think my suggestion is much, much better: it not only
provides the "security" of not splitting characters, but is also easier
to understand and more useful, so it should be considered.

Cheers
Mike