Hi,
In message "Re: Unicode roadmap?"
on Wed, 21 Jun 2006 20:45:38 +0900, "Michal Suchanek" <hramrach / centrum.cz> writes:
|> If you choose to convert all input text data into Unicode (and convert
|> them back at output), there's no need for unreliable automatic
|> conversion.
|
|Well, it's actually you who chose the conversion on input for me.
|Since the strings aren't automatically converted I have to ensure that
|I have always strings encoded using the same encoding. And the only
|reasonable way I can think of is to convert any string that enters my
|application (or class) to an arbitrary encoding I choose in advance.
Agreed. It is me. Perhaps you don't know how terrible code
conversion can be. In the ideal world, lazy conversion seems
attractive, but reality bites. Conversions fail so easily.
Characters get lost, text gets broken. Failures cannot be avoided, for
various reasons, mostly historical ones we can't fix anymore. When
errors happen (and they often do), it's good to detect them as early as
possible, i.e. on input/output. So I encourage the universal character
set model as far as it is applicable. You may use UTF-8 or ISO-8859-1
as your universal character set. I may use EUC-JP for mine.
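
A minimal sketch of the convert-on-input/output idea described above. It
assumes String#encode and the Encoding class from later Ruby versions
(1.9 and up), which are not part of the Ruby discussed in this thread.
The names INTERNAL, import_text and export_text are hypothetical, chosen
only for illustration.

```ruby
# Hypothetical choice: the single encoding used everywhere inside the program.
INTERNAL = Encoding::UTF_8

# Input boundary: tag the raw bytes with their external encoding,
# validate them, and convert to the internal encoding.
def import_text(bytes, external)
  text = bytes.dup.force_encoding(external)
  # Check explicitly: String#encode does not validate anything when the
  # source and destination encodings are the same.
  raise ArgumentError, "invalid #{external} input" unless text.valid_encoding?
  # Raises Encoding::UndefinedConversionError if a character has no
  # mapping in INTERNAL.
  text.encode(INTERNAL)
end

# Output boundary: convert back, failing loudly if a character cannot be
# represented in the external encoding.
def export_text(text, external)
  text.encode(external)
end
```

With this shape, a broken byte sequence or an unmappable character raises
at the point where text enters or leaves the program, not somewhere deep
inside it.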
Only in rare cases might there be a need to handle multiple encodings
in one application. I do want to allow it. But I am not sure how we can
help that kind of application, since it is fundamentally complex.
And we don't have enough experience to design a framework for such
applications.
matz.