On 6/21/06, Yukihiro Matsumoto <matz / ruby-lang.org> wrote:
> Hi,
>
> In message "Re: Unicode roadmap?"
>     on Wed, 21 Jun 2006 20:45:38 +0900, "Michal Suchanek" <hramrach / centrum.cz> writes:
>
> |> If you choose to convert all input text data into Unicode (and convert
> |> them back at output), there's no need for unreliable automatic
> |> conversion.
> |
> |Well, it's actually you who chose the conversion on input for me.
> |Since the strings aren't automatically converted I have to ensure that
> |I have always strings encoded using the same encoding. And the only
> |reasonable way I can think of is to convert any string that enters my
> |application (or class) to an arbitrary encoding I choose in advance.
>
> Agreed. It is me. Perhaps you don't know how terrible code
> conversion can be. In the ideal world, lazy conversion seems
> attractive, but reality bites. Conversions fail so easily.
> Characters lost, text broken. Failures can not be avoided for various
> reasons, mostly historical reasons we can't fix anymore. When error
> happens (often) it's good to detect errors as early as possible,
> i.e. on input/output. So I encourage universal character set model as
> far as it is applicable. You may use UTF-8 or ISO8859-1 for universal
> character set. I may use EUC-JP for it.

I do not see how converting the strings on input will make the
situation better than converting them later. The exact place where the
text is garbled because it is converted incorrectly does not change the
fact it is no longer usable, does it?

Well, it may be possible to detect characters that are invalid for
certain encoding either by scanning the string or by attempting a
conversion. But I would rather like optional checks that can be added
when something breaks or is likely to break rather than forced
conversion.

Or to put it another way: If I get a string from somewhere where the
encoding is marked incorrectly it is wrong and it should be expected to
fail.
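The two kinds of optional check mentioned above (scanning the string, or attempting a conversion) could look roughly like the sketch below. It assumes a Ruby with encoding-aware strings (1.9 or later, which postdates this thread); `String#valid_encoding?`, `String#encode` and `EncodingError` are real, but the helper names `check_encoding` and `convertible?` are hypothetical and not part of any proposal made in the discussion.

```ruby
# Hypothetical helper: scan the string and fail early if its bytes are
# not valid for the encoding it is tagged with. No conversion happens.
def check_encoding(str)
  return str if str.valid_encoding?
  raise ArgumentError, "invalid byte sequence for #{str.encoding}"
end

# Hypothetical helper: detect trouble by attempting a conversion.
# Note: encoding to the string's own encoding is a no-op and validates
# nothing, so this is only meaningful for a different target encoding.
def convertible?(str, target)
  str.encode(target)
  true
rescue EncodingError
  # Covers invalid byte sequences and characters that do not exist
  # in the target encoding.
  false
end
```

A caller who distrusts a data source can wrap its strings in `check_encoding`, while strings from trusted sources pass through untouched.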
And I can do some checks if I think my source of data is not reliable
in this respect. But if I get a string that is marked correctly and it
fails because I did not manually convert it, it is frustrating. And
needlessly so.

> For only rare case, there might be need to handle multiple encoding in
> an application. I do want to allow it. But I am not sure how we can
> help that kind of applications, since they are fundamentally complex.
> And we don't have enough experience to design a framework for such
> applications.

I do not think it is that rare. Most people want new web (or any other)
stuff in UTF-8, but there is a need to interface with legacy databases
or applications. Sometimes converting the data to fit the new
application is not practical. For one, the legacy application may still
be in use as well.

Anyway, Ruby being as dynamic as it is, I should be able to add support
for automatic recoding myself quite easily. The problem is I would not
be able to use it in libraries (should I ever write some) without
risking a clash with a similar feature added by somebody else.

Thanks

Michal