Making a few extremely simple tests, I discovered some conflicting
and confusing behavior around $KCODE and # -*- coding: in 1.9
(today's checkout).
The following two-liner produces an error
("日本語" will reach you as iso-2022-jp, but it's pure utf-8 here):
$KCODE = 'utf-8'
puts "日本語".scan(/./u).length
The error is `scan': character encodings differ (ArgumentError).
It turns out that "日本語" is taken to be US-ASCII, and the
regular expression is taken as UTF-8. On the other hand,
the following two-liner (removing the 'u') works:
$KCODE = 'utf-8'
puts "日本語".scan(/./).length
The result is 3, which means that character (utf-8) semantics
are applied. However, "日本語".encoding is still "US-ASCII",
and therefore the regular expression is also "US-ASCII",
although it has no .encoding method to confirm this.
So what the regular expression does (UTF-8) and what it says
(US-ASCII) don't match at all.
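To make the comparison concrete, here is a minimal sketch for
checking what a string and a regular expression each report. It
assumes a later Ruby release in which Regexp#encoding exists and
$KCODE is no longer used, so it does not reproduce the checkout
described above. The helper names reported_encodings and scan_error
are hypothetical, and the EUC-JP string is only an illustration of
a real encoding mismatch.

```ruby
# -*- coding: utf-8 -*-
# Assumption: a Ruby release where Regexp#encoding is available.

# Hypothetical helper: the encodings a string and a regexp report.
def reported_encodings(str, re)
  { :string => str.encoding, :regexp => re.encoding }
end

# Hypothetical helper: the exception class raised by scan, or nil
# if the scan succeeds.
def scan_error(str, re)
  str.scan(re)
  nil
rescue EncodingError, ArgumentError => e
  e.class
end

p reported_encodings("日本語", /./)    # regexp without the 'u' flag
p reported_encodings("日本語", /./u)   # regexp with the 'u' flag

# A UTF-8 string against a /u regexp: no mismatch expected.
p scan_error("日本語", /./u)

# The same text transcoded to EUC-JP against a /u regexp: the
# encodings genuinely differ, so scan is expected to raise.
p scan_error("日本語".encode("EUC-JP"), /./u)
```

Running this on the interpreter in question shows directly whether
what the regular expression reports agrees with how it matches.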
Replacing the first line in each of the above scripts with
# -*- coding: utf-8 -*-
makes both cases work.
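For reference, the magic-comment variant can be written as one
script covering both cases. This is only a sketch; char_count is a
hypothetical helper introduced here to avoid repeating the scan call.

```ruby
# -*- coding: utf-8 -*-

# Hypothetical helper: number of matches of re in str.
def char_count(str, re)
  str.scan(re).length
end

puts char_count("日本語", /./u)   # with the 'u' flag
puts char_count("日本語", /./)    # without the 'u' flag
```

With the source encoding declared as UTF-8, both calls should count
three characters.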
Is the above an oversight, a secret plan to get people to
abandon $KCODE (which I understand will be phased out anyway),
or something else?
Regards, Martin.
#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst / it.aoyama.ac.jp