On 6/25/06, Phillip Hutchings <sitharus / sitharus.com> wrote:
>> Here you contradict yourself. Regexes are string (character)
>> operations, and you want them on byte arrays. So the concepts aren't
>> Similarly, when you read part of a file, and use it to determine
>> what kind of file it was you do not want to convert that part into
>> another class or re-read it because somebody decided String and
>> ByteVector are separate.
> Why not? When I read CGI params I get them as strings, but if I want
> to add them together I need to convert them to integers, because
> someone decided that "1" != 1. This is a good thing, so you don't get
> "5 purple elephants"+"3 monkeys" = 7, like you do in PHP.

Sorry, but "reading" CGI params is a red herring. You may get it as one
thing and then convert it to something else.

> Likewise, when you read from a file/socket/whatever you might not be
> getting a real string, you might be getting a byte array. They are
> fundamentally different things, a byte array may happen to contain
> text at some point, but some time later it may be just a stream of
> data. Conversely a String _always_ contains human-readable text in
> whatever encoding you want.

Okay. What class should I get here?

  data = File.open("file.txt", "rb") { |f| f.read }

Under the proposal from the people who want separate ByteVector and
String classes, I'll need *two* APIs:

  st = File.open("file.txt", "rb") { |f| f.read_string }
  bv = File.open("file.txt", "rb") { |f| f.read_bytes }

Stupid, stupid, stupid, stupid. If I have guessed wrong about the
contents of file.txt, I have to rewind and read it again. Better to
*always* read as bytes and then say, "this is actually UTF-8". The
two-method design would be just as stupid in C++, Java, or C#:

  class File
  {
    bool read(string& st);       // fill a string via an out-parameter
    bool read(byte_vector& bv);  // fill a byte vector via an out-parameter
  };

Yes, I can't actually read into the item, but have to call an accessor.
Moronic design, mostly because I can't do:

  class File
  {
    string read(void);
    byte_vector read(void);  // illegal: overloads differ only in return type
  };

Overloading on return type would help in static languages, but they
can't do that -- and Ruby can't do it either, since variables are just
labels and carry no type to dispatch on.
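
For illustration only, here is a rough sketch of the "always read as
bytes, then say what it is" approach argued for above. It assumes a Ruby
that has String#force_encoding and String#valid_encoding? (1.9 and
later, so not the Ruby current when this was written). The helper name
read_and_label is hypothetical, not an existing API.

```ruby
require "tempfile"

# Hypothetical helper: read the file once as raw bytes, then label the
# bytes with a guessed encoding. Nothing is re-read or transcoded; if the
# guess does not fit, the caller simply keeps the untouched byte string.
def read_and_label(path, guess = Encoding::UTF_8)
  data = File.open(path, "rb") { |f| f.read }  # binary (ASCII-8BIT) string
  labeled = data.dup.force_encoding(guess)     # same bytes, new label
  labeled.valid_encoding? ? labeled : data
end

# Usage: write five bytes that happen to be valid UTF-8, then read them back.
sample = Tempfile.new("sample")
sample.binmode
sample.write("caf\xC3\xA9".b)
sample.flush

text = read_and_label(sample.path)
puts text.encoding
puts text.length
```

The point of the sketch is that a wrong guess costs nothing: the same
object that came back from the single read is still there, with no
rewind and no second API.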

> As someone who has to work with Unicode in PHP, I'd say it's important
> to separate the types. If you want to display something to a user you
> have to know what it is, but when you're reading a file you don't
> care, unless you know what's in it.

The problem here is not unification. The problem here is that PHP is
stupid. It is generally recognised that Ruby's API decisions are much
smarter than those of most other languages, and this is a good example
of where that would hold.

> A Unicode String could be a subclass of the byte array with some
> niceties for dealing with multibyte characters. Just a thought.

Unnecessary and overcomplex.

-austin
-- 
Austin Ziegler * halostatue / gmail.com * http://www.halostatue.ca/
               * austin / halostatue.ca * http://www.halostatue.ca/feed/
               * austin / zieglers.ca