Dear Ruby 1.9 architects, developers, and testers!
I am collecting all the information about encoding specifications for Ruby 1.9
(the existing ones!) from your posts for my testing activities. Before I
generate tests (Windows 'bat' files, programs, and data), I want to make sure
I have a complete and correct list. Please glance at it and report any errors.
>>>>>>>>>> List of Encoding Specifications for Ruby 1.9 >>>>>>>>>
Methods for Specification of Encoding
=====================================
Ruby Command Line Options
-------------------------
The following command line options exist for the specification of
program source encoding.
- "-Kn" for Ascii-Encoding
- "-Ka" for Ascii-Encoding
- "-Ku" for Unicode utf-8 encoding
- "-Ks" for Shift JIS encoding
- "-Ke" for Extended UNIX Coding for Japanese
- "-E ascii" (space is optional) for Ascii-Encoding
- "-E ascii-8bit" (space is optional) for Ascii-Encoding
- "-E us-ascii" (space is optional) for Ascii-Encoding
- "-E binary" (space is optional) for Ascii-Encoding
- "-E utf-8" (space is optional) for Unicode utf-8 encoding
- "-E shift_jis" (space is optional) for Shift JIS encoding
- "-E sjis" (space is optional) for Shift JIS encoding
- "-E euc-jp" (space is optional) for Extended UNIX Coding for Japanese
- "--encoding=ascii" (equal sign or space) for Ascii-Encoding
- "--encoding=ascii-8bit" (equal sign or space) for Ascii-Encoding
- "--encoding=us-ascii" (equal sign or space) for Ascii-Encoding
- "--encoding=binary" (equal sign or space) for Ascii-Encoding
- "--encoding=utf-8" (equal sign or space) for Unicode utf-8 encoding
- "--encoding=shift_jis" (equal sign or space) for Shift JIS encoding
- "--encoding=sjis" (equal sign or space) for Shift JIS encoding
- "--encoding=euc-jp" (equal sign or space) for Extended UNIX Coding
for Japanese
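Since I want to generate the tests mechanically, the option spellings listed
above could be produced from the encoding names roughly as follows. This is
only a sketch: the method names and the file name "test.rb" are my own
placeholders, and whether the interpreter accepts each spelling is exactly
what the tests are supposed to find out.

```ruby
# Encoding names exactly as listed in this post.
def encoding_names
  %w[ascii ascii-8bit us-ascii binary utf-8 shift_jis sjis euc-jp]
end

# The listed spellings for one encoding name: -E with and without the
# optional space, --encoding with an equal sign or a space.
def option_spellings(name)
  ["-E #{name}", "-E#{name}", "--encoding=#{name}", "--encoding #{name}"]
end

# All options from the list: the five -K options plus every
# -E / --encoding spelling.
def all_options
  %w[-Kn -Ka -Ku -Ks -Ke] + encoding_names.flat_map { |n| option_spellings(n) }
end

# Print one command line per option; "test.rb" is a placeholder file name.
all_options.each { |opt| puts "ruby #{opt} test.rb" }
```

Each printed line could become one line of a 'bat' file.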
File Specific Methods
---------------------
There is only one file-specific encoding identification. If a file
starts with the byte sequence 0xEF 0xBB 0xBF, it is identified
as encoded in utf-8 and these bytes are ignored.
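To prepare test data for this case, the check can be sketched in Ruby as
follows. The method names utf8_bom and split_bom are hypothetical helpers of
mine, not part of Ruby, and the code only mirrors the rule as stated above; it
is not how the interpreter itself does it.

```ruby
# The three bytes named above, as a binary (ASCII-8BIT) string.
def utf8_bom
  [0xEF, 0xBB, 0xBF].pack("C*")
end

# Returns [has_bom, rest]: whether the raw source starts with the
# byte sequence, and the source with those bytes dropped if it does.
def split_bom(raw_source)
  # Work on a binary copy so that indexing counts bytes, not characters.
  bytes = raw_source.dup.force_encoding("ASCII-8BIT")
  if bytes.start_with?(utf8_bom)
    [true, bytes[3..-1]]
  else
    [false, bytes]
  end
end

has_bom, rest = split_bom(utf8_bom + "puts 1\n")
puts has_bom       # prints true
puts rest.inspect  # prints "puts 1\n"
```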
Magic Coding Comments
---------------------
These comments influence encoding recognition only if they are present in
the first line of the source, or in the second line if the first is a special
information comment for the underlying OS (e.g. "#!...").
The encodings can be specified by the following keywords.
- Ascii-Encoding with local character allocation for the range 0x80..0xFF
(which are invalid outside comments and string constants) is specified
by "ascii", "ascii-8bit", "us-ascii", and "binary".
- Unicode utf-8 encoding is specified by "utf-8".
- Shift JIS encoding is specified by "shift_jis" or "sjis".
- Extended UNIX Coding for Japanese is specified by "euc-jp".
This results in the following possible magic coding comments
(the comments are case-insensitive):
- "# -*- coding: ascii -*-"
- "# -*- coding: ascii-8bit -*-"
- "# -*- coding: us-ascii -*-"
- "# -*- coding: binary -*-"
- "# -*- coding: utf-8 -*-"
- "# -*- coding: shift_jis -*-"
- "# -*- coding: sjis -*-"
- "# -*- coding: euc-jp -*-"
- "# vi: set fileencoding=ascii"
- "# vi: set fileencoding=ascii-8bit"
- "# vi: set fileencoding=us-ascii"
- "# vi: set fileencoding=binary"
- "# vi: set fileencoding=utf-8"
- "# vi: set fileencoding=shift_jis"
- "# vi: set fileencoding=sjis"
- "# vi: set fileencoding=euc-jp"
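The sixteen comments above follow from the eight keywords and the two comment
styles, so for test generation they could be produced like this. Again only a
sketch; coding_keywords and magic_comments are hypothetical method names of
mine.

```ruby
# Keywords exactly as listed in this post.
def coding_keywords
  %w[ascii ascii-8bit us-ascii binary utf-8 shift_jis sjis euc-jp]
end

# The two comment styles listed above for one keyword.
def magic_comments(keyword)
  ["# -*- coding: #{keyword} -*-", "# vi: set fileencoding=#{keyword}"]
end

# Print every comment; each one could become the first line of a test file.
coding_keywords.each do |kw|
  magic_comments(kw).each { |line| puts line }
end
```

Upper-case variants (e.g. via String#upcase) could be added in the same loop to
test the claimed case insensitivity.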
In addition, it is possible to use a free-form comment instead that contains
the word "coding" followed by an equal sign and one of the possible
encoding specifications; e.g. "# coding=utf-8" is a valid magic coding comment.
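As a cross-check of my understanding, the recognition rules described in this
section can be sketched in Ruby as below. The method name magic_coding_name and
the regular expression are my own assumptions, derived only from the
description above (first line, or second line after "#!", the word "coding"
followed by ":" or "=", case-insensitive). They are not taken from the
interpreter source.

```ruby
# Returns the encoding name from a magic coding comment, or nil.
# Only line 1 is considered, or line 2 if line 1 starts with "#!".
def magic_coding_name(source)
  lines = source.lines.first(2)
  candidate = lines[0].to_s.start_with?("#!") ? lines[1] : lines[0]
  # The candidate line must exist and must be a comment.
  return nil unless candidate && candidate.start_with?("#")
  # "coding" followed by ":" or "=", then the encoding name; this also
  # matches inside "fileencoding=". The /i flag ignores case.
  match = candidate.match(/coding\s*[:=]\s*([\w.-]+)/i)
  match && match[1].downcase
end

p magic_coding_name("# coding=utf-8\nputs 1\n")                  # => "utf-8"
p magic_coding_name("#!/usr/bin/ruby\n# -*- coding: sjis -*-\n") # => "sjis"
p magic_coding_name("puts 1\n# coding=utf-8\n")                  # => nil
```

If the real behaviour differs from this sketch in any of the cases, that is the
kind of fault I would like to hear about.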
Thank you in advance, Wolfgang Nádasi-Donner