On Sat, 29 Jun 2002, Frank Mitchell wrote:

> Java programmers will tell you that converting Unicode to a native
> encoding takes up a surprisingly large amount of time.

Well, I'm a Java programmer, and I tend to disagree with that.
Especially if you're using Latin-1, the conversion is very, very
cheap. (Because it does almost nothing!)

> Reading a string
> from a file, doing a trivial substitution, and writing it to another
> file does an unnecessary amount of work.

It certainly does! But that's because you programmed it poorly for
Java's model. Something like this, if you want it to be efficient,
should never use a java.lang.String class. Almost every time I've
seen poor performance in String handling in Java, it's been because
the programmer is using strings badly and forcing a lot of data
copies.

> Maybe this has been suggested already but, since Ruby is
> object-oriented, I'd vote for two (or more) virtually indistinguishable
> String classes, one for Unicode strings, one for single-byte strings.

Now this, I agree with, and I sure wish Java had it. We need,
essentially, "character strings" (which are Unicode) and "byte
strings" (which are a set of arbitrary bytes).

> Perhaps byte strings could have an "encoding" attribute (a Symbol) to
> make converting from one representation to another automatic.

That could be handy, yes.

> Maybe
> you'd also need a distinction between getting the Nth byte, and getting
> the Nth character (always converted to a Unicode character.)

Err...I'd say provide just "get the Nth byte," and leave it to
character strings to get the Nth character; should the programmer
need it he can use ByteString.getCharacterString or whatever.

cjs
-- 
Curt Sampson  <cjs / cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
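The byte-string/character-string split discussed above can be sketched in Java. This is a minimal, hypothetical illustration, not an existing API: `ByteString`, `of`, `byteAt` and `replaceAll` are names invented for the example, `getCharacterString` is borrowed from the post (which itself says "or whatever"), and a `java.nio.charset.Charset` stands in for the proposed Symbol-valued encoding attribute. It indexes by byte only, decodes to a character string (`java.lang.String`) only on request, and does a trivial substitution directly on the raw bytes without ever building a `java.lang.String`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical "byte string": arbitrary bytes plus an encoding tag.
// None of these names come from a real library.
public final class ByteString {
    private final byte[] bytes;
    private final Charset encoding; // stands in for the proposed "encoding" attribute

    // Public factory copies once, so callers can't mutate our bytes later.
    public static ByteString of(byte[] bytes, Charset encoding) {
        return new ByteString(bytes.clone(), encoding);
    }

    // Private constructor takes ownership of the array without copying.
    private ByteString(byte[] bytes, Charset encoding) {
        this.bytes = bytes;
        this.encoding = encoding;
    }

    public int length() { return bytes.length; }

    public Charset encoding() { return encoding; }

    // "Get the Nth byte": the only indexing a byte string offers.
    public byte byteAt(int n) { return bytes[n]; }

    // Decode into a character string only when the caller asks for one.
    public String getCharacterString() { return new String(bytes, encoding); }

    // Trivial substitution on the raw bytes, one copy, no decoding.
    // Safe for single-byte encodings such as Latin-1.
    public ByteString replaceAll(byte from, byte to) {
        byte[] out = bytes.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] == from) {
                out[i] = to;
            }
        }
        return new ByteString(out, encoding);
    }

    public static void main(String[] args) {
        ByteString s = ByteString.of(
                "caf\u00e9 tea".getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.ISO_8859_1);
        ByteString t = s.replaceAll((byte) ' ', (byte) '_');
        System.out.println(t.length());          // 8: one byte per character in Latin-1
        System.out.println(t.byteAt(3) & 0xFF);  // 233 (0xE9, e-acute in Latin-1)
        System.out.println(t.getCharacterString());
    }
}
```

The byte-level `replaceAll` is only guaranteed correct for single-byte encodings such as Latin-1. In a multi-byte encoding such as Shift_JIS, a byte value like 0x5C can also occur as the second byte of a two-byte character, which is one argument for leaving "Nth character" operations to the character-string side.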