On 6/21/06, Yukihiro Matsumoto <matz / ruby-lang.org> wrote:
>
> Hi,
>
> In message "Re: Unicode roadmap?"
>     on Thu, 22 Jun 2006 00:41:02 +0900, Julian 'Julik' Tarkhanov <listbox / julik.nl> writes:
>
> |Matz, this would be a disaster (if in such a situation a library
> |throws). It's gonna be like python.
> |Because it means that 99 percent of the libraries will throw.
>
> Can you elaborate?  I don't want to see disaster whatever it is.
>
>                                                         matz.
>

Single scripts and small self-contained applications are almost always written in a single codepage. Text data processing is usually done in that same codepage too, which simplifies life a lot, even with the current String as a byte vector. Recoding is just overhead here: external data is recoded only on input/output, in a relatively small number of well-defined places, using a known subset of source and target encodings. When you know what to expect from your file/network IO, things are OK. It is also OK when part of a script is extracted and evolves into a library, as long as you use it in the same environment.

But consider the case where several third-party libraries are used, all returning strings in different encodings. gettext for libraries won't solve everything, as even externalized strings will have some particular encoding; e.g. localization libraries can't fit into ASCII alone. Now every method call behaves like a kind of IO with respect to the encoding of the parameters passed. The number of I/O points grows drastically.

How can this be solved in a consistent and reliable manner?

a) Simply declare in the documentation: "Methods in these classes *require* strings to be in UTF16, you've been warned!!!" Users of that code will then have to remember those constraints and enforce the encoding of their data before calling those methods.
With the dynamic nature of Ruby, things will break in unexpected places. No, I dislike the idea of writing:

    str.enforce_encoding!(BooClass::INTERNAL_ENCODING)
    b = BooClass.new(str)

b) Take care in the called methods to enforce the encoding:

    def process_formatting(str)
      str.enforce_encoding!(MY_INTERNAL_ENCODING)
      # now it is compatible with rest of my code
      # and i can do something with it
    end

This is also too error-prone :(

And what about processing the results of calls? Should the caller code take care of that?

    res_str = SomeUtil.fancy_format( str )
    res_str.enforce_encoding!(MY_INTERNAL_ENCODING)

With input parameters and returned results that are complex structures with some String fields, things get even worse. Who will ever cope with these issues?

Probably this is what Julik meant by "disaster"?
Things shouldn't be that complicated.
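
For illustration, option b) can be sketched in runnable form. This is only a sketch under assumptions: enforce_encoding! is the hypothetical method named in this message, emulated here with String#encode! (which exists in Ruby 1.9 and later), and MY_INTERNAL_ENCODING is assumed to be UTF-8. It shows an unguarded mix of two encodings raising, and the callee-side enforcement avoiding that.

```ruby
# Hypothetical stand-in for the enforce_encoding! discussed in this message.
# Emulated with String#encode!, which transcodes the receiver in place.
class String
  def enforce_encoding!(target)
    encode!(target)
  end
end

MY_INTERNAL_ENCODING = Encoding::UTF_8 # assumed internal encoding

# Option b): the called method normalizes its argument before using it.
def process_formatting(str)
  str = str.dup # work on a copy so the caller's string is not modified
  str.enforce_encoding!(MY_INTERNAL_ENCODING)
  "[" + str + "]" # now safe to combine with the rest of the code
end

# "café" as ISO-8859-1 bytes, as a third-party library might return it.
latin1 = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1)
utf8   = "na\u00EFve" # a UTF-8 string with a non-ASCII character

# Without enforcement, combining the two raises.
mix_error = nil
begin
  utf8 + latin1
rescue Encoding::CompatibilityError => e
  mix_error = e.class
end

# With enforcement inside the callee, the call goes through.
result = process_formatting(latin1)
```

The sketch copies the string before transcoding, so the caller's data is left alone; the in-place form shown in the message would change the caller's string as a side effect, which is one more thing to remember at every call site.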