On 28-jun-2006, at 20:36, Austin Ziegler wrote:

> Except that @top is guaranteed to not have an encoding -- at least it
> damned well better not -- and @top.bytes is redundant in this case. I
> see no reason to access #bytes unless I know I'm dealing with a
> multibyte String.

You never know if you are, that's the problem. And no, it's NOT redundant. You should just get used to the fact that _all_ strings might become multibyte.

> Worse, why would "Not PNG." be treated as Unicode
> under your scheme but "\x89PNG\x0d\x0a\x1a\x0a" not be? I don't think
> you're thinking this through.
>
> @top[0, 8] is sufficient when you can guarantee that sizeof(char) ==
> sizeof(byte).

You can NEVER guarantee that. N e v e r. More languages and more people use multibyte characters by default than all ASCII users combined. It is a great pity, but you still approach multibyte strings as something "special".

> On "raw" strings, this is always the case.

The only way to distinguish "raw" strings from multibyte strings is to subclass (which sucks for you as a byte user and for me as a string user).

> On all
> strings, @top[0, 8] would return the appropriate number of characters
> -- not the number of bytes. It just so happens on binary strings that
> the number of characters and bytes is exactly the same.

This is a very leaky abstraction - you can never know what you will get. What's the problem with having bytes as an accessor?

>
> What I'm arguing is that while the pragma may work for the less-common
> encodings, both binary (non-)encoding and Unicode (probably UTF-8) are
> going to be common enough that specific literal constructors are
> probably a very good idea.

Python proved that to be wrong - both the subclassing part and the literals part. The fact that you have to designate Unicode strings with literals is a bad decision, and I can only suspect that it has to do with compiler intolerance and the need to do preprocessing.
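
To make the character/byte distinction concrete, here is a minimal sketch. It assumes a Ruby whose String is encoding-aware and offers #bytesize, #byteslice and #b (as later Ruby versions do); it is not the API proposed in this thread, and the helper name png? is made up for illustration.

```ruby
# Assumption: an encoding-aware String with #bytesize, #byteslice and #b.
# The 8-byte PNG signature, held as a binary (raw) string.
PNG_SIGNATURE = "\x89PNG\x0d\x0a\x1a\x0a".b

# Hypothetical helper: compare the first 8 *bytes*, whatever the
# encoding of the incoming string happens to be.
def png?(data)
  data.byteslice(0, 8).bytes == PNG_SIGNATURE.bytes
end

# A UTF-8 string written with \u escapes: 6 Cyrillic characters,
# each of which takes 2 bytes in UTF-8.
text = "\u041f\u0440\u0438\u0432\u0435\u0442"

char_count = text.length           # counts characters
byte_count = text.bytesize         # counts bytes
by_chars   = text[0, 8]            # up to 8 characters
by_bytes   = text.byteslice(0, 8)  # exactly the first 8 bytes
```

With explicit byte accessors the PNG check does not depend on whether the caller handed in a "raw" string or a multibyte one, which is the whole point: text[0, 8] and text.byteslice(0, 8) answer two different questions.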
--
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl