On Thu, Nov 25, 2010 at 12:56 PM, Robert Klemme
<shortcutter / googlemail.com> wrote:
>
>> Since UTF-8 is a subset of UTF-16, which in turn is a subset of
>> UTF-32,
>
> I tried to find a more precise statement about this but did not really
> succeed. But I thought all UTF-* were just different encoding forms of
> the same universe of code points.

It's an implicit feature, rather than an explicit one:
Western languages get the first 8 bits for encoding. Glyphs going
beyond the Latin alphabet get the next 8 bits. If that isn't enough, an
additional 16 bits are used for encoding purposes.

Thus, UTF-8 is a subset of UTF-16, which is a subset of UTF-32. That
also gives you future-proofing, in case even more glyphs are needed.
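For anyone who wants to check the byte counts, here is a quick sketch in Python (my pick for illustration; the sample characters are arbitrary) that prints how many bytes a single character takes in each encoding form. It uses the `-le` codecs so that no byte-order mark is counted.

```python
# Sketch: bytes needed per character in each UTF encoding form.
# The "-le" (little-endian) codecs are used so no byte-order mark (BOM) is counted.
SAMPLES = {
    "A": "A",                       # U+0041, ASCII
    "e-acute": "\u00e9",            # U+00E9, Latin-1 Supplement
    "euro sign": "\u20ac",          # U+20AC, elsewhere in the BMP
    "grinning face": "\U0001F600",  # U+1F600, outside the BMP
}

def byte_lengths(ch: str) -> dict:
    """Return the encoded length in bytes for UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }

for name, ch in SAMPLES.items():
    print(f"U+{ord(ch):04X} {name:>13}: {byte_lengths(ch)}")
```

Running it prints 1/2/4 bytes (UTF-8/UTF-16/UTF-32) for U+0041, 2/2/4 for U+00E9, 3/2/4 for U+20AC and 4/4/4 for U+1F600.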


>> (at least, ISO learned from the
>> mess created in the 1950s to 1960s) so that new glyphs won't ever
>> collide with existing glyphs, my point still stands. ;)
>
> Well, I support your point anyway. But that was just meant as a caveat so
> people are watchful (and test rather than believe). :-) But as I
> think about it, it more likely was a statement about Java's
> implementation (because a char has only 16 bits, which is not
> sufficient for all Unicode code points).

Of course, test your assumptions. But first, you need an assumption to
start from. ;)
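In that spirit, a quick test of the Java angle. The sketch below is Python, for illustration only, and `to_surrogates` is a made-up helper: it splits a code point above U+FFFF into the pair of 16-bit UTF-16 code units (surrogates) it needs, which is why such a character takes two `char`s in a Java String.

```python
# Sketch: why a single 16-bit code unit (like Java's char) can't hold every code point.
# Code points above U+FFFF are split into a UTF-16 surrogate pair.

def to_surrogates(cp: int) -> tuple:
    """Split a supplementary code point (U+10000..U+10FFFF) into high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits go into the low surrogate
    return high, low

cp = 0x1F600
high, low = to_surrogates(cp)
print(f"U+{cp:X} -> {high:04X} {low:04X}")

# Cross-check against Python's own UTF-16 (big-endian, no BOM) encoder.
encoded = chr(cp).encode("utf-16-be")
assert encoded == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```

For U+1F600 it prints `U+1F600 -> D83D DE00`.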

-- 
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.