On Thu, 19 Feb 2009 02:21:21 +1100, Michal Suchanek <hramrach / centrum.cz> wrote:

> 2009/2/18 Tanaka Akira <akr / fsij.org>:
>> In article
>> <a5d587fb0902160252u56b50cfdv8e0fd36bb4f0b1b3 / mail.gmail.com>,
>> Michal Suchanek <hramrach / centrum.cz> writes:
>>
>>>> What is represented by the N chars?
>>>
>>> I don't understand the question. N chars are N chars, they do not
>>> represent anything else.
>>
>> I expect something like person's name, zip code, etc.
>>
>> However, person's name is variable length.
>>
>> The zip code (in Japan) is fixed length but multibyte
>> encoding is not useful because it uses only digits.
>
> As was explained by the original poster there are file formats similar
> to CSV that use fixed field length instead of separators. I have
> myself used such files, and they were in 8-bit fixed width encoding.
>
> However, if you want to "upgrade" your code that uses such files to
> multibyte for international support you need reading N characters.
>
> Of course, the alternative is to change your code to use a different
> format. This might make exports to and imports from legacy applications
> hard, however.
>
> Sure, the export can never be perfect if the files really contain
> internationalized data because recoding to the legacy format and
> encoding loses some information then.

Yes, this is very close to the situation I was trying to explain. In
more detail:

I have a legacy system that uses fixed length fields. Yes, a name is
variable length, but some old systems use a fixed length field, say 40
chars, which is space filled on the right (or truncated).

In my case, the data input is by a form, and each field is fixed width.
I am changing the system so that the SAME forms can be used, but
extended to use UTF-8 not just ASCII. So this means that the number of
characters is still fixed, but the number of bytes is no longer fixed.
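
To make the chars-versus-bytes point concrete, here is a minimal sketch
of reading a fixed number of characters by looping over IO#getc. It
assumes Ruby 1.9 or later and UTF-8 data; `read_chars` is a hypothetical
helper name, not part of Ruby's IO API, and the 10-char field width is
just an example value.

```ruby
require 'stringio'

# Hypothetical helper (not a built-in IO method): read up to n
# characters, not bytes, from io. Returns fewer than n at end of input.
def read_chars(io, n)
  chars = []
  # IO#getc returns one whole character (possibly several bytes),
  # or nil at end of input.
  while chars.size < n && (ch = io.getc)
    chars << ch
  end
  chars.join
end

# Two space-padded fields of 10 chars each. String#ljust pads by
# character count, so the byte length varies with the content.
record = "Jos\u00e9".ljust(10) + "Zo\u00eb".ljust(10)
io = StringIO.new(record)

first  = read_chars(io, 10)   # 10 chars, 11 bytes
second = read_chars(io, 10)   # 10 chars, 11 bytes
```

With pure ASCII data each field is 10 bytes as before, so the same
record layout stays readable by the legacy code.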

I do *not* want to change the format of the file (it probably should be
changed, but that would be a lot more work), because I want the
application to remain backward compatible when using ASCII data. I hope
you can now understand.

Also, I think there are other cases where applications that used to call
IO#read to read a fixed length ASCII string will need to be changed to
read the same fixed length in chars instead. Currently, I believe the
only way to do this in Ruby is to use a loop.

It also seems to me that the current usage of the "limit" parameter of
IO#gets is not intuitive in 1.9. It is "maximum number of bytes, but
don't split a character", and I think it should be changed to mean
"maximum number of chars". That would be much more obvious, more useful
(IMHO), and still be backward compatible with 1.8.

Cheers
Mike