On Wednesday, November 24, 2010 08:40:22 pm Jörg W Mittag wrote:
> David Masover wrote:
> > Java at least did this sanely -- UTF16 is at least a fixed width. If
> > you're going to force a single encoding, why wouldn't you use
> > fixed-width strings?
>
> Actually, it's not.

Whoops, my mistake. I guess now I'm confused as to why they went with UTF-16
-- I always assumed it simply truncated things which can't be represented in
16 bits.
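
A quick sanity check -- a minimal sketch, nothing beyond the standard
java.lang API: a code point above U+FFFF comes back as a two-char surrogate
pair, not a truncated 16-bit value.

    // Code points above U+FFFF occupy two UTF-16 code units
    // (a surrogate pair); nothing gets truncated.
    public class SurrogateDemo {
        public static void main(String[] args) {
            // U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP
            String s = new String(Character.toChars(0x1D11E));
            System.out.println(s.length());                      // 2 code units
            System.out.println(s.codePointCount(0, s.length())); // 1 character
        }
    }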

> You can produce corrupt strings and slice into a half-character in
> Java just as you can in Ruby 1.8.

Wait, how?

I mean, yes, you can deliberately build strings out of corrupt data, but if
you actually work with complete strings and string concatenation, and you
aren't doing crazy JNI stuff, and you aren't digging into the actual bits of
the string, I don't see how you can create a truncated string.
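
The nearest thing I can come up with myself is plain index arithmetic --
substring() counts UTF-16 code units, not characters, so a naive split can
land inside a surrogate pair. A minimal sketch, standard library only:

    // substring() indexes by UTF-16 code unit, so slicing at an
    // arbitrary index can cut a surrogate pair in half.
    public class HalfCharDemo {
        public static void main(String[] args) throws Exception {
            // U+10400 DESERET CAPITAL LETTER LONG I, two code units
            String s = new String(Character.toChars(0x10400));
            String half = s.substring(0, 1); // keeps only the high surrogate
            System.out.println(Character.isHighSurrogate(half.charAt(0))); // true
            // The fragment is no longer valid text; Java's UTF-8 encoder
            // substitutes '?' for the unpaired surrogate.
            System.out.println(half.getBytes("UTF-8").length); // 1
        }
    }

Though arguably that's still digging into the representation, just by char
index instead of by bit.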

> > The whole point of having multiple encodings in the first place is that
> > other encodings make much more sense when you're not in the US.
>=20
> There's also a lot of legacy data, even within the US. On IBM systems,
> the standard encoding, even for greenfield systems that are being
> written right now, is still pretty much EBCDIC all the way.

I'm really curious why anyone would go with an IBM mainframe for a
greenfield system, let alone pick EBCDIC when ASCII is fully supported.
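
(Tangentially: the incompatibility with ASCII is total, which I assume is
why all that legacy data can't just be waved away. A minimal sketch --
IBM037 lives in the JDK's extended charsets, so it may be missing from a
stripped-down JRE:)

    // The same letters map to entirely different bytes:
    // 'A'..'C' are 0x41..0x43 in ASCII but 0xC1..0xC3 in EBCDIC (IBM037).
    import java.util.Arrays;

    public class EbcdicDemo {
        public static void main(String[] args) throws Exception {
            System.out.println(Arrays.toString("ABC".getBytes("US-ASCII"))); // [65, 66, 67]
            System.out.println(Arrays.toString("ABC".getBytes("IBM037")));   // [-63, -62, -61]
        }
    }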

> And now there's a push for a One Encoding To Rule Them All in Ruby 2.
> That's *literally* insane! (One definition of insanity is repeating
> behavior and expecting a different outcome.)

Wait, what?

I've been out of the loop for a while, so it's likely that I missed this,
but where are these plans?