Hi,

In message "Unicode in Ruby now?"
    on 02/07/31, Tobias Peters <tpeters / invalid.uni-oldenburg.de> writes:

|When I export a string to a UTF-8 encoded stream, how can I possibly know
|its current encoding? Strings do not have an "encoding" tag. Will they
|have one in the future?

Yes.

|Wouldn't it be a better solution to store strings in memory in a canonical
|format (be it UTF-8 for space savings, UCS-4 for O(1) indexing operations,
|whatever), let string sources and sinks have an "encoding" property,
|and do the transformation on the fly?

No.  Considering the existence of "big character sets" like Mojikyo
(a character set developed in Japan that is larger than Unicode), there
cannot be any ideal canonical format.  In addition, by my
estimation, the cost of converting text to and from a
canonical character set is intolerable if one mainly processes
non-ASCII, non-Unicode text data, as we do in Japan.
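A minimal sketch of the encoding-tagged design described above, assuming Ruby 1.9 or later, where `String#encoding`, `String#force_encoding` and `String#encode` exist (none of these are in the Ruby this message was written against). Each string keeps its own encoding and is processed in that encoding. Conversion to UTF-8 happens only at the point where the text leaves the program. The Shift_JIS sample bytes are an illustrative choice.

```ruby
# Sketch assuming Ruby 1.9+; these String methods postdate this message.
# Build a string from raw Shift_JIS bytes (0x82 0xA0 is HIRAGANA LETTER A).
sjis = [0x82, 0xA0].pack("C*").force_encoding("Shift_JIS")

p sjis.encoding        # => #<Encoding:Shift_JIS>
p sjis.length          # => 1  (counted in Shift_JIS, no conversion needed)

# Convert only at the sink, e.g. just before writing to a UTF-8 stream.
utf8 = sjis.encode("UTF-8")
p utf8.encoding        # => #<Encoding:UTF-8>
p utf8.bytes           # => [227, 129, 130]
```

Text that stays in its native encoding pays no conversion cost. Only the final `encode` call converts anything.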

							matz.