Issue #7501 has been updated by charliesome (Charlie Somerville).


/[[:alpha:]]+/ should behave as you expect
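
A minimal sketch of that suggestion, assuming UTF-8 source and string encoding. The sample strings and variable names are only illustrative and are not taken from the reporter's script. It contrasts the POSIX bracket expression with \w, which on Ruby 1.9 matches only [a-zA-Z0-9_]:

```ruby
# encoding: utf-8
# Illustrative sample strings (assumed, not the reporter's exact input).
czech  = "áéíóůúýžš"
german = "üäö"

# POSIX bracket expressions are Unicode-aware on UTF-8 strings.
posix_czech  = czech[/[[:alpha:]]+/]
posix_german = german[/[[:alpha:]]+/]

# \w is ASCII-only here, so nothing matches and String#[] returns nil.
ascii_only = german[/\w+/]

puts posix_czech
puts posix_german
p ascii_only
```

If digits and underscores should match as well, [[:word:]] is probably the closer Unicode-aware counterpart to \w.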
----------------------------------------
Bug #7501: \w in a regular expression doesn't match international characters
https://bugs.ruby-lang.org/issues/7501#change-34360

Author: eltomito (Tomas Partl)
Status: Open
Priority: Normal
Assignee: 
Category: core
Target version: 
ruby -v: ruby 1.9.3p0 (2011-10-30 revision 33570) [i686-linux]


When using regexp matching, \w doesn't match characters which are not in the English alphabet.
For example, the characters "žš?????焦・??aáéíóůúý" should all be matched by \w but aren't.

This program demonstrates the bug:

--------------------------------------------------------
# encoding: utf-8
match = /\w+/.match( "abcdefghijklmnopqrstuvwxyz" )
puts match.to_s
	
match = /\w+/.match( "áéíóůúýžš?????焦・??" ) #some Czech characters
puts match.to_s

match = /\w+/.match( "üäö" ) #some German characters
puts match.to_s
----------------------------------------------------------

Expected output:
----------------------------------------------------------
abcdefghijklmnopqrstuvwxyz
áéíóůúýžš?????焦・??
üäö
----------------------------------------------------------
Actual output:
----------------------------------------------------------
abcdefghijklmnopqrstuvwxyz


----------------------------------------------------------



-- 
http://bugs.ruby-lang.org/