> Here you contradict yourself. Regexes are string (character)
> operations, and you want them on byte arrays. So the concepts aren't

> Similarly, when you read part of a file, and use it
> to determine what kind of file it was you do not want to convert that
> part into another class or re-read it because somebody decided String
> and ByteVector are separate.

Why not? When I read CGI params I get them as strings, but if I want to
add them together I need to convert them to integers, because someone
decided that "1" != 1. This is a good thing, so you don't get
"5 purple elephants" + "3 monkeys" = 8, like you do in PHP.

Likewise, when you read from a file/socket/whatever you might not be
getting a real string, you might be getting a byte array. They are
fundamentally different things: a byte array may happen to contain text
at some point, but some time later it may be just a stream of data.
Conversely, a String _always_ contains human-readable text in whatever
encoding you want.

As someone who has to work with Unicode in PHP, I'd say it's important
to separate the types. If you want to display something to a user you
have to know what it is, but when you're reading a file you don't care,
unless you know what's in it.

A Unicode String could be a subclass of the byte array with some
niceties for dealing with multibyte characters. Just a thought.

-- 
Phillip Hutchings
http://www.sitharus.com/
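
To make the distinction concrete, here is a rough sketch in Python, chosen only because it happens to keep bytes and str as separate types. The names (params, add_params, try_add, looks_like_png) are made up for illustration and are not from any library discussed in this thread. It shows explicit string-to-integer conversion for CGI-style parameters, the difference between byte length and character length, and sniffing a file type from raw bytes without decoding them.

```python
# Hypothetical CGI parameters: they arrive as text, not numbers.
params = {"a": "1", "b": "2"}


def add_params(x: str, y: str) -> int:
    """Convert explicitly; int() raises ValueError on junk input."""
    return int(x) + int(y)


def try_add(x: str, y: str):
    """Return the sum, or None if either value is not a clean integer."""
    try:
        return add_params(x, y)
    except ValueError:
        return None


total = add_params(params["a"], params["b"])
junk = try_add("5 purple elephants", "3 monkeys")  # refused, not silently summed

# Raw bytes (as read from a file or socket) versus decoded text.
raw = "naïve".encode("utf-8")   # bytes
text = raw.decode("utf-8")      # str, once we know the encoding
byte_len = len(raw)             # counts bytes
char_len = len(text)            # counts characters

# Sniffing a file type works on the raw bytes; no decoding needed.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def looks_like_png(header: bytes) -> bool:
    """Check the first bytes of a file against the PNG signature."""
    return header.startswith(PNG_MAGIC)
```

The point of the sketch is only that the conversion step is explicit: you decode when you know the bytes are text, and you leave them as bytes when you don't.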