On 6/21/06, Yukihiro Matsumoto <matz / ruby-lang.org> wrote:
> Hi,
>
> In message "Re: Unicode roadmap?"
>     on Wed, 21 Jun 2006 20:45:38 +0900, "Michal Suchanek" <hramrach / centrum.cz> writes:
>
> |> If you choose to convert all input text data into Unicode (and convert
> |> them back at output), there's no need for unreliable automatic
> |> conversion.
> |
> |Well, it's actually you who chose the conversion on input for me.
> |Since the strings aren't automatically converted I have to ensure that
> |I have always strings encoded using the same encoding. And the only
> |reasonable way I can think of is to convert any string that enters my
> |application (or class) to an arbitrary encoding I choose in advance.
>
> Agreed.  It is me.  Perhaps you don't know how terrible code
> conversion can be.  In the ideal world, lazy conversion seems
> attractive, but reality bites.  Conversions fail so easily.
> Characters lost, text broken.  Failures can not be avoided for various
> reasons, mostly historical reasons we can't fix anymore.  When error
> happens (often) it's good to detect errors as early as possible,
> i.e. on input/output.  So I encourage universal character set model as
> far as it is applicable.  You may use UTF-8 or ISO8859-1 for universal
> character set.  I may use EUC-JP for it.

I do not see how converting the strings on input makes the situation
better than converting them later. The exact place where the text is
garbled by an incorrect conversion does not change the fact that it is
no longer usable, does it?
Well, it may be possible to detect characters that are invalid for a
certain encoding, either by scanning the string or by attempting a
conversion. But I would rather have optional checks that can be added
when something breaks, or is likely to break, than forced conversion.
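A scanning check like that could be sketched with the validity test that later Ruby versions (1.9 and up) expose; the helper name here is hypothetical:

```ruby
# Hypothetical optional check: decide whether a string's bytes are
# actually valid for its declared encoding. String#valid_encoding?
# exists from Ruby 1.9 on, not in the 1.8 of this discussion.
def validly_encoded?(str)
  str.valid_encoding?
end

utf8_ok  = "r\xC3\xA9sum\xC3\xA9".force_encoding("UTF-8") # well-formed UTF-8
utf8_bad = "\xC3(".force_encoding("UTF-8")                # truncated multibyte sequence
```

The point is that the check stays optional: the caller invokes it only where the data source is suspect, instead of every string paying for a forced conversion.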

Or to put it another way: if I get a string whose encoding is marked
incorrectly, it is wrong and should be expected to fail. And I can add
some checks if I think my source of data is not reliable in this
respect. But if I get a string that is marked correctly and it fails
because I did not manually convert it, that is frustrating. And
needlessly so.
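Such a check for an unreliable source could also simply attempt the conversion and report failure. A minimal sketch, assuming the encoding API that later Ruby versions grew and a hypothetical helper name:

```ruby
# Hypothetical helper: attempt conversion to UTF-8 and treat failure
# as the signal that the source mislabeled its data. Returns nil on
# failure so the caller decides how to handle the bad input.
def ensure_utf8(str)
  str.encode("UTF-8")
rescue Encoding::InvalidByteSequenceError, Encoding::UndefinedConversionError
  nil
end
```

This is the early-detection Matz argues for, but applied selectively at the boundaries the programmer distrusts rather than everywhere.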

>
> For only rare case, there might be need to handle multiple encoding in
> an application.  I do want to allow it.  But I am not sure how we can
> help that kind of applications, since they are fundamentally complex.
> And we don't have enough experience to design a framework for such
> applications.

I do not think it is that rare. Most people want new web (or other)
applications in UTF-8, but there is a need to interface with legacy
databases or applications. Sometimes converting the data to fit the
new application is not practical; for one, the legacy application may
still be in use as well.
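The boundary between such a legacy store and a new application might look like the following with the transcoding API of later Ruby versions; the EUC-JP legacy side and the helper names are hypothetical:

```ruby
# Hypothetical boundary: the legacy side stores EUC-JP bytes, the new
# application works purely in UTF-8, and conversion happens only at
# this edge rather than being forced on every string.
def from_legacy(bytes)
  bytes.dup.force_encoding("EUC-JP").encode("UTF-8")
end

def to_legacy(text)
  text.encode("EUC-JP")
end
```

Keeping both directions in one place means the rest of the application never sees mixed encodings, while the legacy application keeps reading and writing its own bytes untouched.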

Anyway, Ruby being as dynamic as it is, I should be able to add
support for automatic recoding myself quite easily. The problem is
that I would not be able to use it in libraries (should I ever write
any) without risking a clash with a similar feature added by somebody
else.
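One clash-free shape for such support would be a plain namespaced module rather than reopening String, so two libraries adding a similar feature cannot collide on method names. Everything here is a hypothetical sketch, again using the later (1.9+) encoding API:

```ruby
# Hypothetical namespaced recoding helper: nothing is added to String
# itself, so another library's String#recode (or similar) cannot clash
# with this one.
module AutoRecode
  module_function

  # Convert str to the target encoding, defaulting to UTF-8.
  def recode(str, to = "UTF-8")
    str.encode(to)
  end
end
```

A library would call `AutoRecode.recode(legacy_string)` and keep the feature private to its own namespace; later Ruby versions also added refinements for cases where extending String itself is really wanted.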

Thanks

Michal