Bug #3202: potential regression? \w in regex doesn't match umlauts anymore.
http://redmine.ruby-lang.org/issues/show/3202

Author: Andreas Fuchs
Status: Open, Priority: Normal
ruby -v: ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-darwin10.2.0]

I'm trying to match umlauts using \w in regular expressions. In 1.9.1-p243, this works:

$ cat bar.rb
# encoding: utf-8
puts "ä".encoding
puts /\w/u.encoding
puts ("ä" =~ /\w/u).inspect
$ ruby bar.rb
UTF-8
UTF-8
0
$ ruby --version
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin10.2.0]

With p378, the same script no longer matches "ä":

$ ruby bar.rb
UTF-8
UTF-8
nil
$ ruby --version
ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-darwin10.2.0]

I'm seeing the same result in 1.9.2dev (2010-04-26 trunk 27503).

This is OS X 10.6, with the following locale settings:
$ locale
LANG="C"
LC_COLLATE="C"
LC_CTYPE="de_AT.UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

Changing LC_CTYPE, LANG, or LC_ALL has no effect on the p378 result.
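
To help rule out the locale, here is a minimal diagnostic sketch that prints the locale-related environment variables next to the encodings Ruby reports. The variable names (locale_vars, report) are illustrative only. It assumes, as far as I understand, that the encoding of a string literal comes from the magic comment and that of the regexp from the /u flag, so neither should depend on the locale.

```ruby
# encoding: utf-8
# Diagnostic sketch: show locale settings alongside the encodings in play.

# Environment variables that can influence Ruby's default external encoding.
locale_vars = %w[LANG LC_ALL LC_CTYPE]
locale_vars.each { |name| puts "#{name}=#{ENV[name].inspect}" }

report = {
  "default_external" => Encoding.default_external, # usually derived from the locale
  "locale_charmap"   => Encoding.locale_charmap,   # charmap name the locale reports
  "literal"          => "ä".encoding,              # set by the magic comment above
  "regexp"           => /\w/u.encoding             # forced to UTF-8 by the /u flag
}
report.each { |label, value| puts "#{label}: #{value}" }
```

If "literal" and "regexp" both print UTF-8 under every locale, that would be consistent with the locale not being involved in the match result.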

Because the same script gives different results on p243 and p378, I believe something changed for the worse between these two releases.
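
For comparison, here is a minimal sketch that tests the same character against \w and against some explicitly Unicode-aware character classes. It assumes that \p{Word}, \p{L}, and [[:alpha:]] are available in the Ruby build under test. It is meant only to narrow down whether \w alone is affected, not as a fix.

```ruby
# encoding: utf-8
# Comparison sketch: \w versus explicitly Unicode-aware classes.
str = "ä"

# Each entry maps a pattern label to the match index (or nil when there is no match).
results = {
  '\w'          => (str =~ /\w/u),
  '\p{Word}'    => (str =~ /\p{Word}/u),
  '\p{L}'       => (str =~ /\p{L}/u),
  '[[:alpha:]]' => (str =~ /[[:alpha:]]/u)
}

results.each { |pattern, index| puts "#{pattern}: #{index.inspect}" }
```

If only the \w line prints nil, the change is limited to how \w is defined rather than to Unicode handling in general.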

