At 17:20 08/09/17, Robert Klemme wrote:
>Disclaimer: I haven't used 1.9 encoding stuff so far. Nevertheless my 0.02EUR:
>
>2008/9/17 Michael Selig <michael.selig / fs.com.au>:
>> <soapbox>
>>
>> Using Ruby SHOULD be making our lives easier, not harder. Other languages
>> like Python have taken an easier route to m17n - represent all strings
>> internally as unicode codepoints.
>
>Which is also what Java does.  I have always found Java's approach to
>encodings very clean and workable.  But if I remember correctly Matz
>once said that Unicode does not cover all Asian symbols so it might
>not be a too good choice for internal representation.

In some sense this is true, but it is equally true for any other encoding
(in particular all those used in Asia). So that's not really an
argument (apart from the fact that if you really need to, you can always
use the huge private-use areas provided by Unicode; not that I would
suggest that myself, though).
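Just to make the private-use point concrete, here is a small sketch of how a PUA codepoint looks from Ruby 1.9 (Unicode reserves U+E000..U+F8FF in the BMP, plus planes 15 and 16, for application-defined characters; Ruby treats them like any other character):

```ruby
# A codepoint from the BMP Private Use Area; Ruby handles it
# like any other Unicode character.
pua_char = "\u{E000}"

puts pua_char.encoding          # UTF-8
puts pua_char.ord.to_s(16)      # "e000"
puts pua_char.length            # 1 character...
puts pua_char.bytesize          # ...encoded as 3 bytes in UTF-8
```

What those characters *mean* is of course entirely up to the application, which is exactly why I wouldn't suggest relying on them.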


>I believe that one reason for the difficulties we encounter now is the
>fact that String is historically used for binary and text data.  So
>there is no clear separation between the two and this bears potential
>for confusion and bugs.

That's a part of the problem, but not too big a part.
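For readers following along, the separation Ruby 1.9 actually chose is one String class for both uses, distinguished only by the attached encoding; a minimal sketch:

```ruby
# Ruby 1.9: one String class for text and binary data,
# distinguished only by the associated Encoding object.
text = "caf\u00E9"                               # text data, UTF-8
data = "\xFF\xFE".force_encoding("ASCII-8BIT")   # raw bytes

puts text.encoding          # UTF-8
puts data.encoding          # ASCII-8BIT
puts text.valid_encoding?   # true
```

Since both are plain Strings, nothing stops code from mixing them up until an operation raises an encoding-compatibility error at runtime, which is the confusion potential mentioned above.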


>A clean solution would probably involve having a character type which
>is capable of representing *all* possible symbols and model String as
>sequence of those characters.

Well, yes, that could be done, e.g. by modeling a character as the union
of Unicode codepoints and arbitrary other objects. Users could then
define their own objects for their own characters, e.g. with lots
of metadata, font information, or whatnot. Such ideas have been
around for a long time, but in contrast to Ruby's current model,
which in some ways is on the edge but still workable, such a model
quickly becomes far more complex and hopelessly slow.
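To illustrate the complexity (all class and method names below are invented for the sketch, not a real proposal): such a model needs one heap object per character, instead of the packed byte arrays Ruby strings use today.

```ruby
# Hypothetical sketch only: a character as "Unicode codepoint OR
# arbitrary user-defined object with metadata".
class Char
  attr_reader :codepoint, :metadata

  def initialize(codepoint: nil, metadata: {})
    @codepoint = codepoint   # Integer for Unicode characters, nil for custom ones
    @metadata  = metadata    # font information, private semantics, etc.
  end
end

class CharString
  # One object per character: already far heavier than a packed byte string.
  def initialize(chars)
    @chars = chars
  end

  def length
    @chars.length
  end

  def to_s
    # Characters without a codepoint have no standard serialization.
    @chars.map { |c| c.codepoint ? [c.codepoint].pack("U") : "?" }.join
  end
end

s = CharString.new("ab".each_codepoint.map { |cp| Char.new(codepoint: cp) })
puts s.to_s    # ab
```

Every string operation (comparison, hashing, I/O) then has to cope with characters that have no codepoint at all, which is where the hopeless slowness comes in.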


>Encoding would then be done during
>input and output only.  Questions I see
>
>1. Is this feasible, i.e. is there something similar to Unicode
>without its limitations?

Unicode is as good as it gets, and it keeps getting better
(not all historic or minority scripts are encoded yet, but
that work is ongoing). The conclusion is: if you want something
better than Unicode (in particular, something covering more scripts
and characters), the best thing to do is to contribute to
Unicode.

Regards,   Martin.



#-#-#  Martin J. Dürst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst / it.aoyama.ac.jp