I've read the thread "Unicode in Ruby's Future?" [ruby-talk: 40016], but
it remains a bit vague.

Of course, you can already translate strings between multiple encodings
with one of the existing character encoding libraries. Here is a problem
that I am currently facing:

When I export a string to a UTF-8 encoded stream, how can I possibly know
its current encoding? Strings do not have an "encoding" tag. Will they
have one in the future?

Wouldn't it be a better solution to store strings in memory in a canonical
format (be it UTF-8 for space savings, UCS-4 for O(1) indexing, or
whatever else) and give string sources and sinks an "encoding" property,
so that the transformation is done on the fly?
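A minimal sketch of that idea, assuming a Ruby in which every string
carries an encoding tag (String#encoding and String#encode, available from
Ruby 1.9 on): strings are kept in UTF-8 in memory, and a sink converts them
to its own encoding only at the moment they are written. EncodedSink and
CANONICAL are hypothetical names made up for illustration.

```ruby
require "stringio"

# Canonical in-memory encoding assumed for this sketch.
CANONICAL = Encoding::UTF_8

# Hypothetical sink: wraps an IO and carries its own "encoding" property.
class EncodedSink
  attr_reader :encoding

  def initialize(io, encoding)
    @io = io
    @encoding = Encoding.find(encoding)
  end

  # Accepts only canonical (UTF-8) strings and transcodes them to the
  # sink's encoding on the fly. String#encode raises
  # Encoding::UndefinedConversionError if a character has no equivalent
  # in the sink's encoding.
  def write(str)
    unless str.encoding == CANONICAL
      raise ArgumentError, "expected #{CANONICAL}, got #{str.encoding}"
    end
    @io.write(str.encode(@encoding))
  end
end

# Usage: a binary StringIO stands in for a Latin-1 encoded stream.
out = StringIO.new(String.new(encoding: Encoding::BINARY))
sink = EncodedSink.new(out, "ISO-8859-1")
sink.write("Grüße")
out.string.bytesize # => 5 (the same text needs 7 bytes in UTF-8)
```

The buffer is binary only so that StringIO stores the transcoded bytes
untouched; with a real stream one would wrap a File opened in binary mode.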

We would need to identify the possible sources and sinks of character
strings, and how to determine their encoding, in any case. Anyone
interested? Perhaps I'll create a wiki page on RubyGarden for that.

Examples: stdin and stderr would be influenced by the user's locale.
Literal strings in Ruby source code are a string source; there should be a
mechanism to declare the character encoding of Ruby source files, with a
reasonable default (but which one?). File names returned by Dir objects
have a character encoding too. How do we determine it?
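Sources could be handled the same way in reverse. A rough sketch, again
assuming Ruby 1.9+ string encodings: the raw bytes from a source are
tagged with the source's declared encoding the moment they are read,
checked, and converted to canonical UTF-8. EncodedSource is a hypothetical
name; Encoding.find itself does accept the special names "locale" and
"filesystem" for the user's locale encoding and the file name encoding.

```ruby
require "stringio"

CANONICAL = Encoding::UTF_8

# Hypothetical source: wraps an IO whose bytes are in a known encoding.
class EncodedSource
  def initialize(io, encoding)
    @io = io
    @encoding = Encoding.find(encoding)
  end

  # Reads one line, tags the raw bytes with the source's encoding,
  # rejects input that is invalid in that encoding, and returns the
  # line converted to the canonical encoding.
  def gets
    raw = @io.gets
    return nil if raw.nil?
    tagged = raw.dup.force_encoding(@encoding)
    unless tagged.valid_encoding?
      raise EncodingError, "invalid #{@encoding} input: #{raw.inspect}"
    end
    tagged.encode(CANONICAL)
  end
end

# Usage: Latin-1 bytes arriving from some stream.
input = StringIO.new("Gr\xFC\xDFe\n".b)
line = EncodedSource.new(input, "ISO-8859-1").gets
line.encoding # => #<Encoding:UTF-8>
```

For stdin, EncodedSource.new($stdin, "locale") would be the obvious
choice under this scheme.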

You get the picture. If you serialize data to formats that restrict the
character encoding (be it XML or YAML), you have to know the encoding of
the strings in memory. It would be helpful if Ruby determined the character
encoding right when a string is created. Later on, there is no way to do
that (except for error-prone heuristics).
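To illustrate why the tag matters at serialization time, here is a
hypothetical helper (the name is made up) that a serializer for a
UTF-8-only format might call, assuming Ruby 1.9+ encoding tags. With a
trustworthy tag it can convert safely; with untagged binary data it can
only refuse or guess.

```ruby
# Hypothetical helper for a serializer whose output format must be UTF-8.
def to_utf8_for_serialization(str)
  # Untagged raw bytes: the original encoding is unknown, so refuse
  # rather than fall back to heuristics.
  if str.encoding == Encoding::BINARY
    raise EncodingError, "string carries no encoding tag"
  end
  # The tag is only useful if the bytes actually match it; String#encode
  # does not check this when source and target encodings are the same.
  unless str.valid_encoding?
    raise EncodingError, "invalid #{str.encoding} byte sequence"
  end
  str.encode(Encoding::UTF_8)
end

latin1 = "Gr\xFC\xDFe".b.force_encoding(Encoding::ISO_8859_1)
to_utf8_for_serialization(latin1) # => "Grüße"
```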

  Tobias