
On 6/21/06, Yukihiro Matsumoto <matz / ruby-lang.org> wrote:
>
> Hi,
>
> In message "Re: Unicode roadmap?"
>     on Thu, 22 Jun 2006 00:41:02 +0900, Julian 'Julik' Tarkhanov <
> listbox / julik.nl> writes:
>
> |Matz, this would be a disaster (if in such a situation a library
> |throws). It's gonna be like python.
> |Because it means that 99 percent of the libraries will throw.
>
> Can you elaborate?  I don't want to see disaster whatever it is.
>
>                                                         matz.
>
>

Single scripts and small self-contained applications are almost always
written in a single codepage. Text data processing is usually done in
that same codepage, which simplifies life a lot even with the current
String as a byte vector. Recoding is pure overhead here, so external data
is only recoded on input/output, in a relatively small number of
well-defined places, using a known subset of source and target encodings.
As long as you know what to expect from your file/network IO, things
are OK.

It is also OK when part of a script is extracted and evolves into a library,
as long as you use it in the same environment.

But consider the case where several third-party libraries are used, all
returning strings in different encodings. gettext for libraries won't solve
everything, since even externalized strings have some particular encoding.
E.g. localization libraries can't get by with ASCII alone.

Now every method call behaves like a kind of IO with respect to the
encoding of the parameters passed.
The number of I/O points grows drastically.
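
To make this concrete, here is a small sketch, assuming a Ruby in which
each String carries its own encoding (1.9 and later behave this way). The
two strings stand in for values returned by two hypothetical third-party
libraries, and join_labels is an illustrative name, not anyone's API.

```ruby
# Join two labels that may come from different libraries.
# Concatenating non-ASCII strings in incompatible encodings raises
# Encoding::CompatibilityError, so an ordinary method call fails
# the way an IO boundary would.
def join_labels(a, b)
  a + " / " + b
rescue Encoding::CompatibilityError => e
  "failed: #{e.class}"
end

from_lib_a = "caf\u00e9"                       # UTF-8
from_lib_b = "caf\u00e9".encode("ISO-8859-1")  # same text, other encoding

puts join_labels(from_lib_a, from_lib_b)
puts join_labels(from_lib_a, from_lib_b.encode("UTF-8"))
```

The first call reports the error, the second succeeds only because the
caller recoded by hand before the call.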

How can this be solved in a consistent and reliable manner?
a) simply declare in the documentation: "Methods in these classes *require*
   strings to be in UTF16, you've been warned!!!"

  So users of that code will have to remember those constraints and enforce
  the encoding of their data before calling those methods. With the dynamic
  nature of Ruby, things will break in unexpected places. No, I dislike the
  idea of writing:

     str.enforce_encoding!(BooClass::INTERNAL_ENCODING)
     b = BooClass.new(str)

b) enforce the encoding inside the called methods:

     def process_formatting(str)
       str.enforce_encoding!(MY_INTERNAL_ENCODING)
       # now it is compatible with the rest of my code
       # and I can do something with it
     end

 This is also too error-prone :(
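
For reference, a minimal sketch of what (b) amounts to, assuming a Ruby
where each String carries its own encoding (1.9 and later). The
enforce_encoding! above is a made-up name; this sketch uses String#encode
instead. MY_INTERNAL_ENCODING and process_formatting are placeholders.

```ruby
# Assumed internal encoding for "my" code; a placeholder choice.
MY_INTERNAL_ENCODING = Encoding::UTF_8

# Transcode the argument at the method boundary, then work with it.
# String#encode returns a new string and leaves the caller's copy alone.
def process_formatting(str)
  unless str.encoding == MY_INTERNAL_ENCODING
    str = str.encode(MY_INTERNAL_ENCODING)
  end
  "[" + str + "]"
end

latin1 = "caf\u00e9".encode("ISO-8859-1")
formatted = process_formatting(latin1)
puts formatted.encoding
```

Every public method that accepts a String needs the same guard, which is
exactly the bookkeeping that is easy to forget.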

And what about processing the results of calls? Should the caller take care
of that as well?
       res_str = SomeUtil.fancy_format(str)
       res_str.enforce_encoding!(MY_INTERNAL_ENCODING)

With input parameters and returned results that are complex structures
with some String fields, things get even worse.
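
A rough sketch of what that would take, again assuming per-String
encodings as in Ruby 1.9 and later. deep_encode is a hypothetical helper,
not part of any library, and it only knows about Array and Hash; strings
held inside other objects are left untouched.

```ruby
# Recursively transcode every String found in nested Arrays and Hashes.
# Anything that is not a String, Array or Hash is returned unchanged.
def deep_encode(obj, enc = Encoding::UTF_8)
  case obj
  when String
    obj.encode(enc)
  when Array
    obj.map { |item| deep_encode(item, enc) }
  when Hash
    obj.each_with_object({}) do |(key, value), out|
      out[deep_encode(key, enc)] = deep_encode(value, enc)
    end
  else
    obj
  end
end

record = { "name" => "caf\u00e9".encode("ISO-8859-1"), "tags" => ["a", "b"], "size" => 3 }
clean = deep_encode(record)
puts clean["name"].encoding
```

The caller still has to remember to run this on every structure that
crosses a library boundary, in both directions.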

Who will ever cope with these issues?
Probably this is what Julik meant by "disaster"?

Things shouldn't be that complicated.
