> A general hint for debugging encoding troubles: the UTF-8 encoding
> *guarantees* that every Unicode codepoint is *either* encoded into a
> *single* octet with its most significant bit cleared to 0 (i.e. a
> decimal value between 0 and 127) *or* into a *sequence* of 2 to 6
> octets, *all* of which have their MSB set to 1 (i.e. a decimal value
> between 128 and 255).

Question: is the sequence of 2 to 6 octets always of even length, i.e. 2, 4, or 6 octets, but never 3 or 5?

--
Posted via http://www.ruby-forum.com/.
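
One way to check the question empirically is to let Ruby encode a few code points and count the resulting octets. The snippet below is only a sketch: the sample code points and the helper name `utf8_octets` are chosen here for illustration and do not come from the thread. It relies on `Array#pack("U")`, which produces the UTF-8 encoding of a code point. Note also that the current UTF-8 definition (RFC 3629) limits sequences to at most 4 octets; the 5- and 6-octet forms mentioned in the quote belong to the older definition.

```ruby
# Illustrative code points (an assumption, not taken from the thread):
# U+0041 "A", U+00E9 "e with acute", U+20AC euro sign, U+1F600 an emoji.
SAMPLE_CODEPOINTS = [0x41, 0xE9, 0x20AC, 0x1F600]

# Hypothetical helper: returns the UTF-8 octets of one code point
# as an array of integers.
def utf8_octets(codepoint)
  [codepoint].pack("U").bytes.to_a
end

SAMPLE_CODEPOINTS.each do |cp|
  octets = utf8_octets(cp)
  hex = octets.map { |b| format("%02X", b) }.join(" ")
  puts format("U+%04X -> %d octet(s): %s", cp, octets.length, hex)
end
# U+0041 -> 1 octet(s): 41
# U+00E9 -> 2 octet(s): C3 A9
# U+20AC -> 3 octet(s): E2 82 AC
# U+1F600 -> 4 octet(s): F0 9F 98 80
```

The euro sign already shows a 3-octet sequence, so the length is not restricted to even numbers. In every multi-octet case above, each octet is 0x80 or greater, which matches the MSB property described in the quote.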