Caleb Clausen wrote:
> On 4/27/10, Nikolay Khodyunya <nickolayho / gmail.com> wrote:
>> #coding: utf-8
>> str2 = "asdf妙我抗我技忘批扼"
>> p str2.encoding #<Encoding:UTF-8>
>> p str2.scan /\p{Cyrillic}/ #found all cyrillic characters
>> str2.gsub!(/\w/u,'') #removes only latin characters
>> puts str2
>>
>> The question is: why does /\w/ ignore Cyrillic characters?
> 
> I think that \w (and similar shortcuts) is supposed to match ASCII
> characters only... thus it's equivalent to [a-zA-Z0-9_].
> 
> Isn't there some kind of unicode character class you can use?

Actually, "asdf妙我抗我技忘批扼".gsub!(/\w/u,'') returns "" on Linux, so the
problem seems to come from the Windows package.

You can use "asdf妙我抗我技忘批扼".gsub!(/\p{L}/,'') to remove letters, though.
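
For comparison, here is a small sketch of how the three patterns mentioned in
this thread behave side by side. It assumes Ruby 1.9 or later and a UTF-8
source file; the sample string and the variable names are made up for
illustration and are not the ones from the original post.

```ruby
# coding: utf-8
# Illustrative sample string (an assumption, not the poster's data):
# four Latin letters followed by six Cyrillic letters.
str = "asdfпривет"

# \w without special options is ASCII-only on Ruby 1.9+ ([a-zA-Z0-9_]),
# so only the Latin letters are removed.
ascii_stripped = str.gsub(/\w/, '')

# \p{L} matches any Unicode letter, so Latin and Cyrillic are both removed.
letters_stripped = str.gsub(/\p{L}/, '')

# \p{Cyrillic} matches only characters in the Cyrillic script.
cyrillic_stripped = str.gsub(/\p{Cyrillic}/, '')

puts ascii_stripped            # the Cyrillic part remains
puts letters_stripped.inspect  # empty string
puts cyrillic_stripped         # the Latin part remains
```

The sketch uses gsub rather than gsub! because gsub! returns nil when nothing
was replaced, which is easy to trip over when chaining calls.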
-- 
Posted via http://www.ruby-forum.com/.