At 10:20 08/09/17, Michael Selig wrote:
>Hi,
>
>You might at first glance think that this post should go to ruby-dev, but
>please read to the end!

If it's in English, it should be ruby-core, not ruby-dev,
as far as I understand.

>I have been pulling my hair out trying to convert a relatively simple app
>to support m17n under Ruby 1.9 to see what is involved. I need to support
>all common locales worldwide, and data can also be stored in UTF-8 or
>UTF-16. I was hoping that Ruby 1.9 was going to take the hard work out of
>this for me. It has to a certain extent, but UTF-16 is the problem - it
>breaks so many things, due to its "ASCII incompatibility" (using Ruby's
>definition). I can't even do simple things like pull out fields and
>substitute into another string without testing "encoding compatibility".
>Something as simple as:
>
>    puts "The value is #{val}"
>
>fails if val is UTF-16 data.

I think in this case, the reason why you see the problem only
for UTF-16 is that your string, other than the interpolated data,
is currently all US-ASCII. But imagine that sooner or later you
(or somebody else) will localize your application. Then the string
might be in any encoding, and you'll get many more "encoding
compatibility" exceptions.

>At one stage I got so frustrated that I was even thinking about going back
>to Python :-(
>So I have ended up transcoding any UTF-16 data to UTF-8, and now things
>are going much better.
>
>Maybe I am doing something wrong - if so please suggest something I can do
>other than transcode the UTF-16.

I think your problem is more general, and you should transcode
other encodings to UTF-8, too, if you're not sure you'll be in
a situation with a single encoding.

>But this has led me to look back at the issues with UTF-16 I have hit,
>and to think about all the internal code in Ruby to handle "ASCII
>incompatible" encodings, and the overhead involved with supporting it.
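
To make the compatibility point above concrete, here is a minimal
sketch. The helper names (describe, incompatible?) and the sample
strings are made up for illustration and are not from your
application; the results in the comments are what I would expect
from a current Ruby, so please check them on your own version.

```ruby
# Hypothetical helper: the interpolation from the quoted post.
def describe(val)
  "The value is #{val}"
end

# Hypothetical helper: true if the block raises Encoding::CompatibilityError.
def incompatible?
  yield
  false
rescue Encoding::CompatibilityError
  true
end

# Sample data (assumed, not from the post); each contains a non-ASCII character.
utf16  = "h\u00e9llo".encode("UTF-16LE")
latin1 = "caf\u00e9".encode("ISO-8859-1")
latin2 = "\u0105".encode("ISO-8859-2")

# UTF-16 is not ASCII-compatible, so mixing it with the literal fails.
puts(incompatible? { describe(utf16) })    # true

# Two ASCII-compatible encodings also clash once both hold non-ASCII data.
puts(incompatible? { latin1 + latin2 })    # true

# This works only because the literal part is ASCII-only.
puts(incompatible? { describe(latin1) })   # false

# Transcoding to one internal encoding first avoids the problem.
puts describe(utf16.encode("UTF-8"))
```

The third case is the one that hides the problem today: as soon as
the literal itself contains non-ASCII text in some other encoding,
it behaves like the second case.
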
>
>And I think that other Ruby programmers may end up doing what I have done
>- avoid using UTF-16 internally because it is too hard.

I agree that all non-ASCII encodings should come with a sticker
with a big warning on it, at least.

>So my radical suggestion is this:
>
>Remove internal support for non-ASCII encodings completely, and when
>reading/writing UTF-16 (and UTF-32) files automatically transcode to/from
>UTF-8.

I can understand the former part. Providing something half-baked
can have advantages and disadvantages.

>My reasons:
>
>- String & Regexp operations should just "work" without the programmer
>worrying about encoding compatibility (I think!)

See below.

>- The programmer only has to think about character encodings at the
>"interfaces" (files, network interfaces) not throughout the program logic

This is desirable/good architecture. Ruby 1.9 will force you to
do that, or come up with some other architecture, but won't
handle things automatically for you.

>- To my knowledge UTF-16 & UTF-32 are the only "non-ASCII compatible" as
>Ruby defines it

No, there are others, such as iso-2022-jp. But they are not
really the main issue. You can get an encoding incompatibility
error for any two ASCII-compatible encodings. E.g. iso-8859-1
and iso-8859-2, or any two others. The reason that you currently
don't is that one of your strings (or a regexp) always is
ASCII-only, even if it's labeled as something else.

>- To my knowledge no one actually uses UTF-16 or UTF-32 as a locale

True.

>- I would avoid having to use ugly modes to open a file like
>"r:UTF-16LE:UTF-8" (very minor)

Telling Ruby what encoding you expect from the outside is kind
of unavoidable. But it would indeed help if it would suffice
to tell a Ruby application only once that you want to handle
everything internally in a certain encoding.

>- Ruby's internal code would be simpler & cleaner and therefore probably
>faster and easier to maintain

If everything is done in UTF-8 all the time, yes.
But I don't think we will go there soon (I wouldn't mind).
Speed isn't too much of an issue, but of course the code
would be quite a bit simpler.

Regards,    Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst / it.aoyama.ac.jp