Issue #7501 has been updated by shyouhei (Shyouhei Urabe).

Status changed from Open to Rejected

If I remember correctly, this is intentional.  As the Unicode standard grows, the definition of what is a word character and what is not changes from time to time, and it is hard for us to keep up with that.
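
For readers who need Unicode-aware matching, here is a minimal sketch of the usual alternatives to \w. It is not part of the original reply. The sample strings and the variable names are illustrative assumptions. It relies on the documented behaviour that \w is limited to [a-zA-Z0-9_], while [[:word:]], \p{Word} and \p{L} follow Unicode properties.

```ruby
# encoding: utf-8
# Illustrative sample strings, written as escapes so the file encoding
# cannot corrupt them.
czech  = "\u00E1\u00E9\u00ED"   # a-acute, e-acute, i-acute
german = "\u00FC\u00E4\u00F6"   # u-umlaut, a-umlaut, o-umlaut

ascii_word   = /\w+/            # ASCII only: [a-zA-Z0-9_]
posix_word   = /[[:word:]]+/    # Unicode-aware POSIX bracket expression
unicode_word = /\p{Word}+/      # Unicode property: letters, marks, numbers, connector punctuation
letters_only = /\p{L}+/         # Unicode letters only

[czech, german].each do |s|
  puts ascii_word.match(s).inspect    # nil, because \w does not match these characters
  puts posix_word.match(s).to_s       # the whole string
  puts unicode_word.match(s).to_s     # the whole string
  puts letters_only.match(s).to_s     # the whole string
end
```

Which class to pick depends on whether digits and underscores should count as part of a word: \p{L} excludes them, the other two include them.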
----------------------------------------
Bug #7501: \w in a regular expression doesn't match international characters
https://bugs.ruby-lang.org/issues/7501#change-34380

Author: eltomito (Tomas Partl)
Status: Rejected
Priority: Normal
Assignee: 
Category: core
Target version: 
ruby -v: ruby 1.9.3p0 (2011-10-30 revision 33570) [i686-linux]


When using regexp matching, \w doesn't match characters which are not in the English alphabet.
For example, the characters "žš?????焦・??aáéíóůúý" should all be matched by \w but aren't.

This program demonstrates the bug:

--------------------------------------------------------
# encoding: utf-8
match = /\w+/.match( "abcdefghijklmnopqrstuvwxyz" )
puts match.to_s
	
match = /\w+/.match( "áéíóůúýžš?????焦・??" ) #some Czech characters
puts match.to_s

match = /\w+/.match( "üäö" )	#some German characters
puts match.to_s
----------------------------------------------------------

Expected output:
----------------------------------------------------------
abcdefghijklmnopqrstuvwxyz
áéíóůúýžš?????焦・??
üäö
----------------------------------------------------------
Actual output:
----------------------------------------------------------
abcdefghijklmnopqrstuvwxyz


----------------------------------------------------------



-- 
http://bugs.ruby-lang.org/