>|Like many others, I would be happy to devote a large
>|amount of time to Ruby. In my particular case it
>|would be to i18n, since I can't use Ruby without it.
>|But in practice, I have no way to find out whether
>|someone in Japan is already making an i18n effort, or
>|whether any changes I made would be accepted, or
>|whether matz has decided what i18n should consist of,
>|so it doesn't really make sense for me to do anything
>|at all.
>
>You can tell me what you like to see in the future,
>although I cannot promise you anything (yet). I mean
>I'd like to hear about the spec, not about the
>implementation.

Well, since you asked for my Christmas list, here it is! My
wishes for the spec are very similar to those you stated years
ago:

>>> There's only one I18N policy for Ruby.
>>> It should not cause me trouble handling Japanese.
>>> ([ruby-talk:02587])

I would simply like to amend it slightly, thus:

>>> It should not cause me trouble handling text.

To meet this spec, I think the following features would be
needed.

* Files in text mode should be read in as a stream of
  characters, not bytes. It will sometimes be necessary to
  specify the encoding explicitly, but the most common ones can
  be guessed. Ruby should NOT stop reading a file when it comes
  to a 0x1a character! *splutter*

* Files in text mode should appear in Ruby as a stream of
  characters, and be written out to disk as bytes in the
  specified encoding.

* Consoles and other IO devices are like files in this respect.
  To my Ruby program, it looks like I am just sending and
  getting 'characters'. In the Ruby engine code, it will be
  necessary to translate them to and from whatever encoding is
  specified for the console/port/whatever.

* Strings should be sequences of characters. length() should
  return the length in characters, each() should iterate by
  characters, and [4] should get me the character at index 4.
  Bytes and encodings are an implementation detail, and I do
  not want to have to think about them when I think of a
  'string'.
* Regular expressions should work, even if I am searching for a
  Hangul character followed by an accent-independent 'e' in a
  Chinese document. They should operate on characters, not
  bytes.

* All characters that exist in Unicode plane 0 should be
  specifiable, handled identically, handled fast, and handled
  in constant time in Ruby. Other characters, such as those
  outside plane 0 (which need surrogate pairs in UTF-16) and
  TRON characters, are not essential; they may require special
  syntax and slower processing, or may be unsupported entirely.

* Source string literals should be able to contain any Unicode
  character. There is no need for source files to be writable
  in any arbitrary encoding, though; UTF-8 would probably be
  good.

* Finally, although I generally want to think of a string as
  just characters, sometimes I need to deal with software that
  thinks in terms of bytes and INSISTS on EUC-KR or ASMO-708 or
  some other strange encoding. For these cases, it would be
  necessary to translate a string into a particular encoding,
  like so:

      a = "my string".get_encoded_bytes("EUCKR")
      # a is now an array of bytes...

*pauses for breath*

I would of course be willing to work on any of these things if
there were a plan.

>For your information, you can get and
>see my experimental M17N implementation from the CVS
>ruby_m17n branch.

I know, but I figured something must have changed since then,
even if there is no physical expression of it in CVS.

Speaking of things in CVS, though, I should congratulate
Kosako-san on providing a non-GNU regular expression library
and thus removing a painful licensing issue. Ah, how wonderful
Oniguruma is! How much more wonderful it could be if it worked
on wide characters!

Benjamin x