Caleb Clausen wrote:
> On 4/27/10, Nikolay Khodyunya <nickolayho / gmail.com> wrote:
>> #coding: utf-8
>> str2 = "asdf妙我抗我技忘批扼"
>> p str2.encoding #<Encoding:UTF-8>
>> p str2.scan /\p{Cyrillic}/ #found all cyrillic characters
>> str2.gsub!(/\w/u,'') #removes only latin characters
>> puts str2
>>
>> The question is why /\w/ ignores cyrillic characters?
>
> I think that \w (and similar shortcuts) is supposed to match ascii
> characters only... thus it's equivalent to [a-zA-Z].
>
> Isn't there some kind of unicode character class you can use?

Actually, "asdf妙我抗我技忘批扼".gsub!(/\w/u,'') returns "" on Linux, so the
problem seems to come from the Windows package.

You can use "asdf妙我抗我技忘批扼".gsub!(/\p{L}/,'') to remove letters, though.
--
Posted via http://www.ruby-forum.com/.
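
For comparison, here is a small sketch of how the different character classes behave side by side. The sample string and the variable names are made up for illustration, and it assumes Ruby 1.9.2 or later, where \w on a UTF-8 string matches only [a-zA-Z0-9_]; older builds may behave differently, as noted above.

```ruby
# coding: utf-8
# Sketch only: "sample" is a hypothetical string (Latin followed by Cyrillic).
sample = "asdfабвг"

# \w is ASCII-only here, so only the Latin letters are removed.
ascii_stripped = sample.gsub(/\w/, '')

# \p{L} matches any Unicode letter, so everything is removed.
letter_stripped = sample.gsub(/\p{L}/, '')

# The POSIX bracket [[:word:]] is Unicode-aware as well.
word_stripped = sample.gsub(/[[:word:]]/, '')

# \p{Cyrillic} picks out just the Cyrillic script.
cyrillic_only = sample.scan(/\p{Cyrillic}/).join

p ascii_stripped   # => "абвг"
p letter_stripped  # => ""
p word_stripped    # => ""
p cyrillic_only    # => "абвг"
```

If the goal is to strip "word characters" in any script, [[:word:]] or \p{L} is probably the safer choice than \w.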