--
Wolfgang Nádasi-Donner
wonado / donnerweb.de
"Nikolai Weibull" <mailing-lists.ruby-talk / rawuncut.elitemail.org> wrote
in news post news:20050320142500.GB6070 / puritan.pcp.ath.cx...
> * Wolfgang Nádasi-Donner (Mar 15, 2005 19:10):
> > > I'm working with Japanese character sets in Windows. I can save my
> > > *.rb files with notepad using UTF-8 but I can't run them with Ruby.
>
> > Windows Notepad always writes a "Byte Order Mark" (BOM) at the
> > beginning of UTF-8/16LE/16BE encoded files. In this case a UTF-8 encoded
> > file begins with "EF BB BF" (hex). These bytes are not part of the text
> > and should usually be ignored (for more information see
> > http://www.unicode.org/).
>
> Why does it write a BOM for UTF-8 encoded files?  It's utterly
> meaningless to discuss byte order for UTF-8 encoded text,
>         nikolai
>
> --
> ::: name: Nikolai Weibull    :: aliases: pcp / lone-star / aka :::
> ::: born: Chicago, IL USA    :: loc atm: Gothenburg, Sweden    :::
> ::: page: minimalistic.org   :: fun atm: gf,lps,ruby,lisp,war3 :::
> main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
>
>

Simply put, because the Unicode Standard allows it.

I assume Microsoft uses it so that Notepad can determine which
encoding an existing file uses. This means that one cannot edit UTF-8
encoded data using Notepad if there is no appropriate BOM.
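
For the original problem (Ruby refusing a script that Notepad saved as
UTF-8), one workaround is to strip the three BOM bytes before the file is
used. The following is only a minimal sketch, not a tested recipe for every
setup: the helper name strip_utf8_bom and the constant UTF8_BOM are my own
invention, and it assumes a Ruby recent enough to have String#b (2.0 or
later).

```ruby
# The UTF-8 encoded BOM, "EF BB BF" in hex, as a binary string.
# UTF8_BOM and strip_utf8_bom are hypothetical names for this sketch.
UTF8_BOM = "\xEF\xBB\xBF".b

# Returns a binary copy of +data+ without a leading UTF-8 BOM.
# Data that does not start with the BOM is returned unchanged
# (as a binary copy).
def strip_utf8_bom(data)
  bytes = data.b # work on raw bytes so the comparison is byte-wise
  if bytes.start_with?(UTF8_BOM)
    bytes.byteslice(UTF8_BOM.bytesize, bytes.bytesize - UTF8_BOM.bytesize)
  else
    bytes
  end
end
```

To clean a file in place, one could read it in binary mode with
File.binread(path), pass the result through strip_utf8_bom, and write it
back with File.binwrite(path, cleaned). Keep in mind that, if my assumption
above is right, Notepad may then no longer recognize the file as UTF-8.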