On 28-jun-2006, at 20:36, Austin Ziegler wrote:

> Except that @top is guaranteed to not have an encoding -- at least it
> damned well better not -- and @top.bytes is redundant in this case. I
> see no reason to access #bytes unless I know I'm dealing with a
> multibyte String.
You never know whether you are; that's the problem. And no, it's NOT
redundant. You should just get used to the fact that _all_ strings
might become multibyte.

> Worse, why would "Not PNG." be treated as Unicode
> under your scheme but "\x89PNG\x0d\x0a\x1a\x0a" not be? I don't think
> you're thinking this through.
>
> @top[0, 8] is sufficient when you can guarantee that sizeof(char) ==
> sizeof(byte).

You can NEVER guarantee that. N e v e r. More people, writing in more
languages, use multibyte characters by default than there are ASCII
users.
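
To make the point concrete, here is a minimal sketch of the PNG check written against bytes rather than characters. It assumes the String#bytes method found in Ruby 1.9 and later, which is newer than this thread; PNG_SIGNATURE and png? are hypothetical names used only for illustration.

```ruby
# Sketch only: String#bytes is Ruby 1.9+ API, newer than this thread.
# PNG_SIGNATURE and png? are hypothetical names for illustration.
PNG_SIGNATURE = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A].freeze

def png?(data)
  # Compare the first eight raw bytes, so the result does not depend
  # on whatever encoding the string happens to carry.
  data.bytes.first(8) == PNG_SIGNATURE
end
```

Written this way, the check gives the same answer whether the string is treated as "raw" or as multibyte text.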

It's a pity, but you still approach multibyte strings as something
"special".

> On "raw" strings, this is always the case.

The only way to distinguish "raw" strings from multibyte strings is
to subclass (which sucks for you as a byte user and for me as a string
user).

> On all
> strings, @top[0, 8] would return the appropriate number of characters
> -- not the number of bytes. It just so happens on binary strings that
> the number of characters and bytes is exactly the same.

This is a very leaky abstraction - you can never predict what you will
get. What's the problem with having bytes as an accessor?
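
As a rough illustration of why the two views differ, the sketch below slices the same string once by characters and once by bytes. It assumes the String#bytesize and String#byteslice methods of Ruby 1.9.3 and later (not available when this was written), and the sample string is made up for the example.

```ruby
# Sketch only: String#bytesize and String#byteslice are Ruby 1.9.3+ API,
# newer than this thread. The sample string is an arbitrary example.
s = "\u041F\u0440\u0438\u0432\u0435\u0442" # six Cyrillic letters, UTF-8

by_chars = s[0, 4]           # first four characters
by_bytes = s.byteslice(0, 4) # first four bytes

puts s.length         # number of characters in s
puts s.bytesize       # number of bytes in s
puts by_chars.length  # characters in the character slice
puts by_bytes.length  # characters in the byte slice
```

Each letter here takes two bytes in UTF-8, so the character slice and the byte slice cover different amounts of text. With an explicit byte accessor, the caller states which of the two is meant.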

>
> What I'm arguing is that while the pragma may work for the less-common
> encodings, both binary (non-)encoding and Unicode (probably UTF-8) are
> going to be common enough that specific literal constructors are
> probably a very good idea.

Python proved that to be wrong - both the subclassing part and the
literals part.
Having to designate Unicode strings with literals is a bad decision,
and I can only suspect that it has to do with compiler intolerance
and the need to do preprocessing.

-- 
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl