On Jun 19, 2006, at 4:16 AM, Christian Neukirchen wrote:

>>> Does that mean that binary.to_unicode.to_binary != binary is
>>> possible?
>>> That could turn out pretty bad, no?
>>
>> Yes, but having "máÔ" != "máÔ" is pretty bad too; the alternative is
>> normalizing at comparison time, which would really hurt for example
>> in a big sort, so you'd need to cache the normalized form, which
>> would be a lot more code.
>>
>> binary.to_unicode looks a little weird to me... can you do that
>> without knowing what the binary is? If it's text in a known
>> encoding, no breakage should occur. If it's unknown bit patterns,
>> you can't really expect anything sensible to happen... or am I
>> missing an obvious scenario? -Tim
>
> Those were just fictive method calls. But let's say I read from
> a pipe and I know it contains UTF-16 with BOM, then .to_unicode
> would make perfect sense, no?

Yep. And yes, calling to_unicode on it might in fact change the bit
patterns if you adopted Early Uniform Normalization (which would be a
good thing to do). -Tim
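
To make the round-trip point concrete, here is a rough sketch of what it could look like. The to_unicode and to_binary helpers are hypothetical stand-ins for the fictive calls discussed above, not a real Ruby API, and the sketch assumes that "early normalization" means normalizing to NFC at decode time and that the input is UTF-16 with a BOM. The input is "e" followed by a combining acute accent; after decoding and normalizing, it comes back as the single precomposed character, so the bytes differ.

```ruby
# Hypothetical helpers standing in for the fictive to_unicode / to_binary
# calls from the thread; they are not part of any real Ruby API.

# Byte-order marks for the two UTF-16 byte orders.
BOMS = {
  "\xFF\xFE".b => Encoding::UTF_16LE,
  "\xFE\xFF".b => Encoding::UTF_16BE
}.freeze

# Decode BOM-prefixed UTF-16 bytes to UTF-8 and normalize early.
# Assumption: NFC is the chosen normalization form.
def to_unicode(bytes)
  bytes = bytes.b
  encoding = BOMS.fetch(bytes.byteslice(0, 2)) do
    raise ArgumentError, "no UTF-16 BOM found"
  end
  body = bytes.byteslice(2, bytes.bytesize - 2)
  body.force_encoding(encoding).encode(Encoding::UTF_8).unicode_normalize(:nfc)
end

# Encode text back to BOM-prefixed UTF-16 bytes (little-endian by default).
def to_binary(text, encoding = Encoding::UTF_16LE)
  BOMS.key(encoding) + text.encode(encoding).b
end

# "e" + U+0301 COMBINING ACUTE ACCENT, as UTF-16LE with a BOM.
original   = "\xFF\xFE".b + "e\u0301".encode(Encoding::UTF_16LE).b
round_trip = to_binary(to_unicode(original))

puts original.unpack("H*").first    # fffe65000103
puts round_trip.unpack("H*").first  # fffee900
puts round_trip == original         # false
```

The bytes change, but both byte strings decode to the same normalized text, which is the trade-off described above: string comparison becomes reliable at the cost of a byte-exact round trip.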