Hi,

In message "Re: Hopefully Complete List of Possible Encoding Specificationsxisting Ones"
    on Thu, 25 Oct 2007 04:53:58 +0900, Wolfgang Nádasi-Donner <ed.odanow / wonado.de> writes:

|The following command line options exist for the specification of
|program source encoding.
|
|- "-Kn" for Ascii-Encoding
|- "-Ka" for Ascii-Encoding
|- "-Ku" for Unicode utf-8 encoding
|- "-Ks" for Shift JIS encoding
|- "-Ke" for Extended UNIX Coding for Japanese
|- "-E ascii" (space is optional) for Ascii-Encoding
|- "-E ascii-8bit" (space is optional) for Ascii-Encoding
|- "-E us-ascii" (space is optional) for Ascii-Encoding
|- "-E binary" (space is optional) for Ascii-Encoding
|- "-E utf-8" (space is optional) for Unicode utf-8 encoding
|- "-E shift_jis" (space is optional) for Shift JIS encoding
|- "-E sjis" (space is optional) for Shift JIS encoding
|- "-E euc-jp" (space is optional) for Extended UNIX Coding for Japanese
|- "--encoding=ascii" (equal sign or space) for Ascii-Encoding
|- "--encoding=ascii-8bit" (equal sign or space) for Ascii-Encoding
|- "--encoding=us-ascii" (equal sign or space) for Ascii-Encoding
|- "--encoding=binary" (equal sign or space) for Ascii-Encoding
|- "--encoding=utf-8" (equal sign or space) for Unicode utf-8 encoding
|- "--encoding=shift_jis" (equal sign or space) for Shift JIS encoding
|- "--encoding=sjis" (equal sign or space) for Shift JIS encoding
|- "--encoding=euc-jp" (equal sign or space) for Extended UNIX Coding
|                       for Japanese

The set of encodings accepted by -E (and --encoding) is extensible by
C extensions (or by Ruby in the future).  So you can expect support
for other encodings in the near future.

|File Specific Methods
|---------------------
|
|There is only one file specific encoding identification. If a file
|starts with the Byte sequence 0xEF 0xBB 0xBF it will be identified
|as encoded in utf-8 and the bytes will be ignored.

Currently, I am not going to add magic BOM support.  Not for 1.9.1 at
least.  I hate BOMs.  They are one of the two abominable things from
Unicode.  The other is UTF-16, especially as external encoding.
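
If BOM handling is left to the caller, the check described in the
quoted text (a leading 0xEF 0xBB 0xBF) can be done by hand.  A minimal
sketch, assuming a Ruby that has String#b (2.0 or later, so not 1.9.1);
UTF8_BOM and strip_utf8_bom are hypothetical names, not part of Ruby:

```ruby
# Hypothetical helper, not part of Ruby: drop a leading UTF-8 BOM, if any.
UTF8_BOM = "\xEF\xBB\xBF".b   # the three bytes 0xEF 0xBB 0xBF, as binary

def strip_utf8_bom(bytes)
  data = bytes.b              # binary copy, so the comparison is byte-wise
  # Skip the first three bytes only when they are exactly the BOM.
  data.start_with?(UTF8_BOM) ? data.byteslice(3..-1) : data
end
```

The result is a binary string; the caller would still have to tag it
with force_encoding("UTF-8") if it wants to treat the rest as text.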

							matz.