On 6/16/06, Juergen Strobel <strobel / secure.at> wrote:
> On Fri, Jun 16, 2006 at 03:39:00AM +0900, Austin Ziegler wrote:
> > On 6/15/06, Juergen Strobel <strobel / secure.at> wrote:
> > [ snip essentially accurate information ]
> >
> > >UTF-8 encodes every Unicode code point as a variable length sequence
> > >of 1 to 4 (I think) bytes.
> >
> > It could be up to six bytes at one point. However, I think that there
> > is still support for surrogate characters meaning that a single glyph
> > *might* take as many as eight bytes to represent in the 1-4 byte
> > representation. Even with that, though, those are rare and usually
> > user-defined (private) ranges IIRC. This also doesn't deal with
> > (de)composed glyphs/combining glyphs.
>
> No. According to wikipedia, it is up to 4 bytes for plain UTF8 for
> all characters. Only Java may need more than that because of their use
> of UTF16 surrogates and special \0 handling in an intermediary step. See
>

Please do not use Wikipedia as an argument. It can contain useful
information, but it may just as well contain utter nonsense. I could go
there right now and change that 4 to 32. Maybe somebody would notice and
correct it, maybe not. You never know.

Anything you read on Wikipedia should be verified against other sources.
The same applies to other websites, but with Wikipedia you have no clue
who wrote it.

To get a better idea of the quality of some Wikipedia articles, search
for "wikipedia" and "Seigenthaler" in your favorite search engine
(preferably non-Google :). One of the many results returned:

http://www.usatoday.com/news/opinion/editorials/2005-11-29-wikipedia-edit_x.htm

Thanks

Michal
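
For what it's worth, the 1-to-4-byte figure quoted above does not have to
rest on any single website; it can be checked directly. A minimal sketch,
assuming a Ruby that has String#bytesize (1.8.7 or later). The helper name
utf8_length and the sample code points are hypothetical, chosen only for
illustration:

```ruby
# Byte length of the UTF-8 encoding of a single Unicode code point.
# Array#pack with the "U" directive emits the code point as UTF-8.
# (utf8_length is an illustrative helper, not part of any library.)
def utf8_length(code_point)
  [code_point].pack("U").bytesize
end

# Sample code points, one from each UTF-8 length class.
samples = [
  [0x41,     "LATIN CAPITAL LETTER A"],
  [0xE9,     "LATIN SMALL LETTER E WITH ACUTE"],
  [0x20AC,   "EURO SIGN"],
  [0x10FFFF, "highest Unicode code point"],
]

samples.each do |cp, name|
  printf("U+%04X  %d byte(s)  %s\n", cp, utf8_length(cp), name)
end
```

This only measures single code points. It says nothing about Java's
intermediate encoding or about combining sequences, where one visible
glyph is made of several code points.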