On 6/16/06, Juergen Strobel <strobel / secure.at> wrote:
> On Fri, Jun 16, 2006 at 03:39:00AM +0900, Austin Ziegler wrote:
> > On 6/15/06, Juergen Strobel <strobel / secure.at> wrote:
> > [ snip essentially accurate information ]
> >
> > >UTF-8 encodes every Unicode code point as a variable length sequence
> > >of 1 to 4 (I think) bytes.
> >
> > It could be up to six bytes at one point. However, I think that there
> > is still support for surrogate characters, meaning that a single glyph
> > *might* take as many as eight bytes to represent in the 1-4 byte
> > representation. Even with that, though, those are rare and usually
> > user-defined (private) ranges IIRC. This also doesn't deal with
> > (de)composed glyphs/combining glyphs.
>
> No. According to Wikipedia, it is up to 4 bytes for plain UTF-8 for
> all characters. Only Java may need more than that, because of its use
> of UTF-16 surrogates and special \0 handling in an intermediary step. See
>
Please, do not use Wikipedia as an argument. It can contain useful
information, but it can just as easily contain utter nonsense. I could
go there right now and change that 4 to 32. Maybe somebody would notice
and correct it, maybe not. You never know.

When reading anything on Wikipedia you should verify it against other
sources. That applies to other websites as well, but with Wikipedia you
have no clue who wrote it.

If you want a better idea of the quality of some Wikipedia articles,
search for Wikipedia and Seigenthaler in your favorite search
engine (preferably a non-Google one :).

One of the many results returned:
http://www.usatoday.com/news/opinion/editorials/2005-11-29-wikipedia-edit_x.htm
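
Incidentally, the 4-byte limit quoted above is the kind of claim you
can verify directly instead of trusting any single web page. A quick
illustrative check (here in Python 3, purely as an example; any
language with a UTF-8 encoder would do):

# Encode a few code points and print how many UTF-8 bytes each takes.
# Even the highest code point, U+10FFFF, still fits in 4 bytes.
samples = [
    ("U+0041 'A'",          "\u0041"),
    ("U+00E9 e-acute",      "\u00e9"),
    ("U+20AC euro sign",    "\u20ac"),
    ("U+10FFFF (maximum)",  "\U0010FFFF"),
]
for name, ch in samples:
    print("%-22s -> %d byte(s)" % (name, len(ch.encode("utf-8"))))

which prints 1, 2, 3 and 4 bytes respectively.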

Thanks

Michal