Issue #7501 has been updated by shyouhei (Shyouhei Urabe).

Status changed from Open to Rejected

If I remember correctly, this is an intentional design. As the Unicode version grows, the definition of what is and is not a word character changes from time to time, and it is hard for us to follow that.

----------------------------------------
Bug #7501: \w in a regular expression doesn't match international characters
https://bugs.ruby-lang.org/issues/7501#change-34380

Author: eltomito (Tomas Partl)
Status: Rejected
Priority: Normal
Assignee:
Category: core
Target version:
ruby -v: ruby 1.9.3p0 (2011-10-30 revision 33570) [i686-linux]


When using regexp matching, \w doesn't match characters which are not in the English alphabet.
For example, the characters "žšáéíóůúý" should all be matched by \w but aren't.

This program demonstrates the bug:
--------------------------------------------------------
# encoding: utf-8

match = /\w+/.match( "abcdefghijklmnopqrstuvwxyz" )
puts match.to_s

match = /\w+/.match( "áéíóůúýžš" ) #some Czech characters
puts match.to_s

match = /\w+/.match( "üäö" ) #some German characters
puts match.to_s
----------------------------------------------------------

Expected output:
----------------------------------------------------------
abcdefghijklmnopqrstuvwxyz
áéíóůúýžš
üäö
----------------------------------------------------------

Actual output:
----------------------------------------------------------
abcdefghijklmnopqrstuvwxyz
----------------------------------------------------------

--
http://bugs.ruby-lang.org/
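
For anyone who needs Unicode-aware matching despite the rejection, here is a minimal sketch of alternatives to \w that Ruby's regexp engine offers. It assumes a UTF-8 source file and Ruby 1.9 or later. The helper `first_match` and the constants `CZECH` and `GERMAN` are illustrative names that do not appear in the report above.

```ruby
# encoding: utf-8
# Sketch only: Unicode-aware alternatives to \w, which is limited to
# [a-zA-Z0-9_] in Ruby 1.9+ even for UTF-8 strings.

CZECH  = "áéíóůúý"   # illustrative sample, not from the report
GERMAN = "üäö"       # illustrative sample

# Hypothetical helper: return the first match as a String, or nil.
def first_match(pattern, text)
  m = pattern.match(text)
  m ? m.to_s : nil
end

# \w sees no ASCII word characters here, so there is no match (nil).
puts first_match(/\w+/, CZECH).inspect

# \p{Word} covers Unicode letters, marks, numbers and connector punctuation.
puts first_match(/\p{Word}+/, CZECH)

# POSIX bracket expressions are Unicode-aware in Ruby.
puts first_match(/[[:alpha:]]+/, GERMAN)

# \p{L} matches any Unicode letter.
puts first_match(/\p{L}+/, GERMAN)
```

Which class fits depends on what should count as a word character: `\p{Word}` is the closest Unicode counterpart of \w, while `\p{L}` and `[[:alpha:]]` exclude digits and the underscore.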