Firstly, I apologise if I am going over old ground here - I haven't been  
on the mailing list for several months.


I have started to investigate using 1.9, and I spent quite a bit of time  
playing with the new string & character encoding features over the  
weekend, and I have a few comments and suggestions.

1) Maybe I am blind, but I cannot find anything like String#each_code that
returns an Enumerator of the Unicode codepoints as Fixnums. Is there such a
beast? If not, I think there should be, considering that there is a
String#each_byte. (Yes, you can use String#each_char and then call String#ord
on each character.) I also think there should be equivalents of
String#setbyte & getbyte for Unicode codepoints (String#setcode & getcode?).
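
To make the workaround in point 1 concrete, here is a rough sketch. The
helper names codepoints_of, getcode and setcode are hypothetical; they are
plain methods standing in for the proposed String methods, not anything that
exists in core, and the sketch assumes a UTF-8 string.

```ruby
# Workaround from point 1: walk the characters and call String#ord on each.
# codepoints_of is a hypothetical helper, not a core method.
def codepoints_of(str)
  str.each_char.map { |ch| ch.ord }
end

# Hypothetical stand-in for the proposed String#getcode:
# the codepoint of the character at a character (not byte) index.
def getcode(str, index)
  str[index].ord
end

# Hypothetical stand-in for the proposed String#setcode:
# replace the character at a character index with the given codepoint.
def setcode(str, index, code)
  str[index] = code.chr(str.encoding)
  code
end

s = "h\u00e9llo".dup      # U+00E9 takes two bytes in UTF-8
p codepoints_of(s)        # five codepoints
p s.bytes.to_a            # six bytes
p getcode(s, 1)           # codepoint of the second character
setcode(s, 1, 0x65)       # overwrite it with "e"
p s
```

Note that both helpers index by character, so on a multi-byte string they
cannot be implemented on top of getbyte/setbyte directly.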

2) If the new "code" methods mentioned above are added, are the methods
String#getbyte, setbyte and each_byte really necessary? You can always call
force_encoding("BINARY") if you really want to do byte stuffing, and then
each_code, setcode & getcode would do the same as the current "byte"
methods.
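
A small sketch of the idea in point 2: once a copy of the string is tagged
as BINARY, the character-level methods already behave like the byte-level
ones. The helper name byte_view is hypothetical and only used for
illustration.

```ruby
# Hypothetical helper: a copy of the string tagged as BINARY (ASCII-8BIT).
# force_encoding only changes the encoding tag; the bytes are untouched.
def byte_view(str)
  str.dup.force_encoding("BINARY")
end

s = "h\u00e9llo"
b = byte_view(s)

p s.length                          # counts characters
p b.length                          # counts bytes
p b.encoding == Encoding::BINARY    # BINARY is an alias of ASCII-8BIT

# Character iteration on the binary view yields one byte at a time,
# so each_char + ord gives the same numbers as each_byte.
p b.each_char.map { |ch| ch.ord } == s.bytes.to_a
```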

3) Suggestion: when opening a file with mode "b" (binary), I think the
encoding should be automatically set to 8-bit ASCII (ASCII-8BIT), overriding
the default locale. I think this should happen on Linux & Unix as well. That
way IO#readchar and similar methods will only do byte-by-byte processing
(I hope!).
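
For comparison, here is a sketch of the behaviour point 3 asks for, obtained
by naming the encoding explicitly in the mode string. The helper read_chars
and the temporary file are made up for the example.

```ruby
require "tempfile"

# Hypothetical helper: read a whole file one IO#readchar call at a time.
def read_chars(path, mode)
  File.open(path, mode) do |f|
    chars = []
    chars << f.readchar until f.eof?
    chars
  end
end

# Write three bytes: "h" followed by the two-byte UTF-8 form of U+00E9.
tmp = Tempfile.new("encoding-demo")
tmp.binmode
tmp.write("h\u00e9")
tmp.close

utf8_chars   = read_chars(tmp.path, "r:UTF-8")         # character by character
binary_chars = read_chars(tmp.path, "rb:ASCII-8BIT")   # byte by byte

p utf8_chars.map { |c| c.bytesize }
p binary_chars.map { |c| c.bytesize }
p binary_chars.first.encoding

tmp.unlink
```

The suggestion is that plain "rb" should behave like the second call without
the encoding having to be spelled out.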

4) I notice that some methods like String#toutf8 no longer exist, but are  
still in the doc.

I'd like to say how amazing the character encoding implementation is. I  
don't know of any other language that has attempted to support all  
encodings internally, as you guys have. You have also done a really good  
job at optimizing UTF-8 string processing performance when all data is  
ASCII. However, I imagine that using UTF-8 internally for strings of  
multi-byte characters (or any other variable-length encoding) is going to  
be slow. I am also concerned that supporting so many character encodings
internally makes Ruby's C code (e.g. string.c) hard to optimize for a
particular class of encoding and, once it is optimized, messy and difficult
to maintain. It would be nicer if the internal implementation of, say, "String"
could be done in a more OO approach, based on encoding. Probably easier  
said than done, though!

Thanks,
Mike