Making a few extremely simple tests, I discovered some conflicting
and confusing behavior around $KCODE and # -*- coding: in 1.9
(today's checkout).

The following two-liner produces an error
("日本語" will reach you as iso-2022-jp, but it's pure utf-8 here):

$KCODE = 'utf-8'
puts "日本語".scan(/./u).length

The error is `scan': character encodings differ (ArgumentError).
It turns out that "日本語" is taken to be US-ASCII, and the
regular expression is taken as UTF-8. On the other hand,
the following two-liner (removing the 'u') works:

$KCODE = 'utf-8'
puts "日本語".scan(/./).length

The result is 3, which means that character (utf-8) semantics
are applied. However, "日本語".encoding is still "US-ASCII",
and therefore the regular expression is also "US-ASCII",
although it doesn't have a .encoding method.

So what the regular expression does (UTF-8) and what it says
(US-ASCII) don't match at all.
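
For comparison, here is a minimal sketch of how the "says" side can be
inspected. It assumes a released Ruby 1.9 or later, where Regexp does
respond to encoding and fixed_encoding? (unlike the checkout described
above). The helper name describe_regexp is made up for illustration,
and the magic comment only takes effect as the first line of the file.

```ruby
# -*- coding: utf-8 -*-
# Hypothetical helper (not part of Ruby or of the scripts above):
# reports the encoding a regexp claims, and whether it is pinned to it.
def describe_regexp(re)
  { encoding: re.encoding.name, fixed: re.fixed_encoding? }
end

text = "日本語"
puts text.encoding.name      # encoding the string literal reports
p describe_regexp(/./u)      # regexp with the 'u' flag
p describe_regexp(/./)       # regexp without any encoding flag
```

On such a Ruby, the 'u' regexp should report a fixed UTF-8 encoding,
while the flagless one should report US-ASCII without being pinned to
it, so it can adapt to the string it is matched against.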

Replacing the first line in the above scripts by
# -*- coding: utf-8 -*-
makes both cases work.
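
Spelled out, the working variant looks roughly like this. It is only a
sketch: char_count is a made-up helper name, and the magic comment has
to be the first line of the file (or the second, after a shebang line)
to have any effect.

```ruby
# -*- coding: utf-8 -*-
# Hypothetical helper: counts characters via scan, as in the scripts above.
def char_count(str, pattern)
  str.scan(pattern).length
end

text = "日本語"
puts text.encoding.name       # the literal now takes the source encoding
puts char_count(text, /./u)   # with the 'u' flag
puts char_count(text, /./)    # without the 'u' flag
```

With the source encoding declared, the string and the /./u regexp agree
on UTF-8, so no encoding mismatch arises in either call.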

Is the above an oversight, a secret plan to get people to
abandon $KCODE (which I understand will be phased out anyway),
or something else?

Regards,    Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst / it.aoyama.ac.jp