On Wed, Aug 13, 2003 at 08:56:34AM +0900, nobu.nokada / softhome.net wrote:
> Hi,
>
> At Wed, 13 Aug 2003 01:34:41 +0900,
> Nikolai Weibull wrote:
> > hm...I messed up.  I was trying to do "hispañola".gsub(/\xf1/, 'n'), but
> > should have been doing "hispañola".gsub(/ñ/, 'n')
> Although "\xf1" looks like ISO-8859-1 rather than UTF-8, you have to
> use the -Ku option on the command line or in the shebang line to
> write literals in UTF-8.

Right.  In a UTF-8 file, the string "hispañola" doesn't
contain the byte \xf1.  It contains the UTF-8 encoding of the character
U+00F1, which is \xc3 \xb1.  Ruby can examine the language settings of its
runtime environment, but has no way of doing so for the environment
in which the script was written.  So it has no way of knowing what
character encoding a program file itself uses.  The -Ku option tells Ruby
that the program file is written in UTF-8.
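To see the byte difference concretely, here is a small sketch.  It assumes
Ruby 1.9 or later (where a "\u00f1" escape always yields a UTF-8 string and
String#encode exists), not the 1.8-era -Ku setup discussed in this thread,
so treat it as an illustration of the two encodings rather than of -Ku
itself.  The helper name hex is made up for the example.

```ruby
# Assumes Ruby 1.9+: "\u00f1" yields a UTF-8 string and String#encode exists.

# Hypothetical helper: render a byte array as "\xNN \xNN ...".
def hex(bytes)
  bytes.map { |b| format("\\x%02x", b) }.join(" ")
end

n_tilde      = "\u00f1"                             # U+00F1, LATIN SMALL LETTER N WITH TILDE
utf8_bytes   = n_tilde.bytes                        # bytes of the UTF-8 encoding
latin1_bytes = n_tilde.encode("ISO-8859-1").bytes   # bytes of the ISO-8859-1 encoding

word   = "hispa\u00f1ola"                           # a UTF-8 string
has_f1 = word.bytes.include?(0xF1)                  # does it contain the lone byte \xf1?

puts hex(utf8_bytes)               # prints \xc3 \xb1
puts hex(latin1_bytes)             # prints \xf1
puts has_f1                        # prints false
puts word.gsub("\u00f1", "n")      # prints hispanola
```

The last line shows the substitution succeeding once the pattern is the
character itself rather than the single byte \xf1.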

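I believe it assumes ISO-8859-1 otherwise.  I think it would also honor
a Unicode Byte Order Mark at the top of the file, but a BOM gets in the
way of the #!, because the kernel only recognizes an interpreter line
when #! are the very first two bytes of the file.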
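The BOM problem can be illustrated the same way.  This is a sketch, again
assuming Ruby 1.9 or later; the helper shebang_recognized? and the
interpreter path /usr/bin/ruby are hypothetical, chosen only for the
example.  It mimics the kernel's check by comparing the first two raw
bytes of the file contents against "#!".

```ruby
# Assumes Ruby 1.9+.  shebang_recognized? is a hypothetical helper that
# imitates the kernel's test: are the first two raw bytes exactly "#!"?
def shebang_recognized?(file_contents)
  file_contents.bytes.first(2) == "#!".bytes   # compare bytes, not characters
end

shebang = "#!/usr/bin/ruby -Ku\n"   # hypothetical interpreter path
bom     = "\uFEFF"                  # U+FEFF, the byte order mark, encoded as UTF-8

plain_script = shebang
bom_script   = bom + shebang        # BOM placed before the #! line

puts bom.bytes.map { |b| format("%02x", b) }.join(" ")   # prints ef bb bf
puts shebang_recognized?(plain_script)                    # prints true
puts shebang_recognized?(bom_script)                      # prints false
```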

-Mark