On Thu, Aug 01, 2002 at 09:55:48PM +0900, Curt Sampson wrote:
> On Thu, 1 Aug 2002, Alexander Bokovoy wrote:
> 
> > Unicode 3.1 is 32-bit wide.
> 
> I have just looked at my 3.0 standard and the 3.1 and 3.2 updates on the
> web site, and I do not see any evidence of this. Did I miss something?
> See the message I just posted for the details as I know them.
http://www.unicode.org/unicode/reports/tr19/tr19-9.html :

--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--

3 Relation to ISO/IEC 10646 and UCS-4

ISO/IEC 10646 defines a 4-byte encoding form called UCS-4. Since UTF-32 is
simply a subset of UCS-4 characters, it is conformant to ISO/IEC 10646 as
well as to the Unicode Standard.

As of the recent publication of the second edition of ISO/IEC 10646-1,
UCS-4 still assigns private use codepoints (0xE00000..0xFFFFFF and
0x60000000..0x7FFFFFFF) that are not in the range of valid Unicode
codepoints. To promote interoperability among the Unicode encoding forms
JTC1/SC2/WG2 has approved a motion removing those private use assignments:

Resolution M38.6 (Restriction of encoding space) [adopted unanimously]

"WG2 accepts the proposal in document N2175 towards removing the provision
for Private Use Groups and Planes beyond Plane 16 in ISO/IEC 10646, to
ensure internal consistency in the standard between UCS-4, UTF-8 and
UTF-16 encoding formats, and instructs its project editor [to] prepare
suitable text for processing as a future Technical Corrigendum or an
Amendment to 10646-1:2000."

While this resolution must still be approved as an Amendment to
10646-1:2000, the Unicode Technical Committee has every expectation that
once the text for that Amendment completes its formal balloting it will
proceed smoothly to publication as part of that standard.

Until the formal balloting is concluded, the term UTF-32 can be used to
refer to the subset of UCS-4 characters that are in the range of valid
Unicode code points. After it passes, UTF-32 will then simply be an alias
for UCS-4 (with the extra requirement that Unicode semantics are observed).

--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--8<--
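To make the ranges in the quoted text concrete, here is a minimal sketch (the helper names are hypothetical, not from TR19) contrasting the original 31-bit UCS-4 space with the UTF-32 subset, and showing why UTF-16 surrogate pairs stop at plane 16. It assumes the current Unicode definition of a scalar value, which also excludes the surrogate range D800..DFFF.

```python
# Hypothetical helpers illustrating the code-space limits discussed in TR19.
UCS4_MAX = 0x7FFFFFFF      # UCS-4 as originally defined: a 31-bit code space
UNICODE_MAX = 0x10FFFF     # last code point reachable through UTF-16 surrogates


def is_ucs4(value: int) -> bool:
    """True if value fits the original 31-bit UCS-4 code space."""
    return 0 <= value <= UCS4_MAX


def is_utf32(value: int) -> bool:
    """True if value is a Unicode scalar value, i.e. valid in UTF-32.

    Assumption: surrogate code points (D800..DFFF) are excluded, since
    they only have meaning as UTF-16 code units.
    """
    return 0 <= value <= UNICODE_MAX and not (0xD800 <= value <= 0xDFFF)


def utf16_units(cp: int) -> list[int]:
    """Encode one scalar value as a list of UTF-16 code units."""
    if not is_utf32(cp):
        raise ValueError(f"U+{cp:X} cannot be represented in UTF-16")
    if cp < 0x10000:
        return [cp]                      # BMP: a single 16-bit unit
    v = cp - 0x10000                     # 20 bits remain for planes 1..16
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
```

For example, `is_ucs4(0xE00000)` is `True` while `is_utf32(0xE00000)` is `False`, and `utf16_units(0x10FFFF)` yields `[0xDBFF, 0xDFFF]`, the largest surrogate pair. That is the consistency problem the WG2 resolution addresses: a surrogate pair carries only 20 bits above the BMP, so nothing past plane 16 can round-trip through UTF-16.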


> > I do not see a reason for projects like
> > Mojikyo to exist when this can perfectly well be done in 32-bit Unicode.
> 
> Mojikyo is doing things like setting code points for characters that
> will never exist in Unicode, because those characters are combined due
> to the character combining rules. Mojikyo has a different purpose from
> Unicode: Unicode wants to make doing standard, day-to-day work easy;
> Mojikyo wants to give maximum flexibility in the display of Chinese
> characters. Given the number and complexity of kanji, these two aims are
> basically incompatible.
I still don't see why both goals should be incompatible a priori. But this
is possibly off-topic here. :)


-- 
/ Alexander Bokovoy