On Jun 19, 2006, at 4:16 AM, Christian Neukirchen wrote:

>>> Does that mean that  binary.to_unicode.to_binary != binary  is
>>> possible?
>>> That could turn out pretty bad, no?
>>
>> Yes, but having "m" != "m" is pretty bad too; the alternative is
>> normalizing at comparison time, which would really hurt for example
>> in a big sort, so you'd need to cache the normalized form, which
>> would be a lot more code.
>>
>> binary.to_unicode looks a little weird to me... can you do that
>> without knowing what the binary is?  If it's text in a known
>> encoding, no breakage should occur.  If it's unknown bit patterns,
>> you can't really expect anything sensible to happen... or am I
>> missing an obvious scenario?  -Tim
>
> Those were just fictive method calls.  But let's say I read from
> a pipe and I know it contains UTF-16 with BOM, then .to_unicode
> would make perfect sense, no?

Yep.  And yes, calling to_unicode on it might in fact change the bit
patterns if you adopted Early Uniform Normalization (which would be a
good thing to do).  -Tim
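
To make the round-trip point concrete, here is a minimal sketch in present-day Ruby. The names to_unicode and to_binary are the same fictive calls used in this thread, written here as hypothetical helper functions and not as any real String API. The sketch assumes the input is UTF-16 with a BOM, that "normalization" means NFC, and that output is always written back as big-endian UTF-16 with a BOM; String#unicode_normalize needs Ruby 2.2 or later.

```ruby
# Hypothetical helper: decode UTF-16-with-BOM bytes and normalize early (NFC).
def to_unicode(bytes)
  data = bytes.b
  encoding =
    if data.start_with?("\xFE\xFF".b)
      Encoding::UTF_16BE
    elsif data.start_with?("\xFF\xFE".b)
      Encoding::UTF_16LE
    else
      raise ArgumentError, "no UTF-16 BOM found"
    end
  # Drop the two BOM bytes, then reinterpret the rest in the detected encoding.
  text = data.byteslice(2..-1).force_encoding(encoding)
  text.encode(Encoding::UTF_8).unicode_normalize(:nfc)
end

# Hypothetical helper: write text back out as big-endian UTF-16 with a BOM.
def to_binary(text)
  ("\uFEFF" + text).encode(Encoding::UTF_16BE).b
end

# "e" followed by U+0301 COMBINING ACUTE ACCENT, as UTF-16BE with BOM.
decomposed = "\xFE\xFF\x00\x65\x03\x01".b
# Plain "m", which is already in normalized form.
plain = "\xFE\xFF\x00\x6D".b

puts to_binary(to_unicode(decomposed)) == decomposed  # bytes changed by NFC
puts to_binary(to_unicode(plain)) == plain            # bytes unchanged
```

With these assumptions the first comparison is false, because NFC folds the two code points into the single precomposed U+00E9, and the second is true. That is the trade being discussed: text that was canonically equivalent compares equal after decoding, at the cost of not always getting the original bytes back.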