On 6/14/06, Austin Ziegler <halostatue / gmail.com> wrote:
> On 6/14/06, Michal Suchanek <hramrach / centrum.cz> wrote:
> > What I want is all methods working seamlessly with unicode strings so
> > that I do not have to think about the encoding.
>
> That will *never* happen. Even with Unicode, you have to think about
> the encoding, because UTF-32 (the closest representation to the
> Platonic ideal "Unicode" you'll ever find) is unlikely to be supported
> in the general case. Matz's idea of m17n strings is the right one: you
> have a "byte stream" and an attribute which indicates how the byte
> stream is encoded. This will sort of be like $KCODE but on an
> individual string level so that you could meaningfully have Unicode
> (probably UTF-8) and ShiftJIS strings in the same data and still
> meaningfully call #length on them.
>
> You will *always* have to care about the encoding. As well as,
> ultimately, your locale.

No. Since I have a locale, stdin can be marked with the proper encoding,
so that all strings originating there carry the proper encoding
information.

The string methods should not just blindly operate on bytes but use
the encoding information to operate on characters rather than bytes.
Sure, something like byte_length is needed when the string is stored
somewhere outside Ruby, but standard string methods should work with
character offsets and characters, not byte offsets or bytes.

Since my stdout can also be marked with the correct encoding, the
strings that are output there can be converted to that encoding, even
if they originate from a source file that happens to be in a different
encoding.

Hmm, perhaps it will be necessary to mark source files with encoding
tags as well. It could be quite tedious to assign the tag manually to
every string in a source file.

When strings are compared, concatenated, etc., the encoding is known,
so the methods should do the right thing.

I do not have to care about encoding.
You may make a string implementation that forces me to care (such as
the current one). But I do not have to.

I can always turn to Perl if I get really desperate.

Thanks

Michal
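
A rough sketch of the character-versus-byte behaviour argued for above. It assumes a Ruby with per-string encoding tags (String#encoding, #bytesize and #encode, which exist from Ruby 1.9 onward, not in the Ruby current when this message was written). The helper names describe and same_text? are hypothetical, made up for this illustration.

```ruby
# Sketch only: needs Ruby 1.9 or later, where each String carries an encoding tag.

# Hypothetical helper: report the encoding tag, character count and byte count.
def describe(str)
  [str.encoding.name, str.length, str.bytesize]
end

# Hypothetical helper: compare two strings by content, converting both to UTF-8 first.
def same_text?(a, b)
  a.encode("UTF-8") == b.encode("UTF-8")
end

utf8 = "\u65E5\u672C\u8A9E"      # three characters, tagged UTF-8
sjis = utf8.encode("Shift_JIS")  # the same text, tagged Shift_JIS

p describe(utf8)  # => ["UTF-8", 3, 9]
p describe(sjis)  # => ["Shift_JIS", 3, 6]

# Indexing uses character offsets, not byte offsets.
p utf8[1] == "\u672C"  # => true

# The two strings hold the same text once both are converted to one encoding.
p same_text?(utf8, sjis)  # => true
```

Here #length counts characters in both encodings, while #bytesize plays the role of the byte_length mentioned above. Note that such a Ruby still does not convert automatically in every case: concatenating the two strings above raises Encoding::CompatibilityError, so one of them has to be converted with #encode first.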