The meaning of \w can change if you alter the global $KCODE variable. It's best to specify exactly what you mean if you know exactly what you want (e.g., follow Robert's advice). Specifying \w says that you want "wordful," non-breaking characters; once $KCODE is set to "u", this includes non-English characters, even CJK:

irb(main):001:0> s = "שבת שלום"
=> "\327\251\327\221\327\252 \327\251\327\234\327\225\327\235"
irb(main):002:0> s =~ /\w/ ? "match" : "no match"
=> "no match"
irb(main):003:0> $KCODE = "u"
=> "u"
irb(main):004:0> s =~ /\w/ ? "match" : "no match"
=> "match"

On May 10, 11:10 pm, Ehud <ehud... / gmail.com> wrote:
> Hi everyone...
> I'm looking for a way to only allow English characters through a
> simple regex.
> It seems that \w (although the documentation states it is equivalent to
> [a-zA-Z0-9]) still allows non-English characters (in my case Hebrew).
>
> Has anyone come up with a solution other than specifying [abcdef...]?
>
> Thanks!
> Ehud
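P.S. For the archives, here is a minimal sketch of the "say exactly what you mean" approach, assuming Ruby 1.9 or later (the \u escapes in the test strings need it). A range like [a-zA-Z0-9] saves spelling out [abcdef...], and because it lists only ASCII characters it does not depend on $KCODE. ENGLISH_ONLY, english_only? and strip_non_english are hypothetical names made up for this example. Add _ (which \w also matches) or a space to the class if you need them.

```ruby
# Sketch: an explicit ASCII character class instead of \w, so the meaning
# does not change with $KCODE. ENGLISH_ONLY, english_only? and
# strip_non_english are hypothetical names for illustration only.
ENGLISH_ONLY = /\A[a-zA-Z0-9]+\z/

# True when the whole string (one or more chars) is ASCII letters/digits.
# \A and \z anchor to the start and end of the string, not of a line.
def english_only?(str)
  !!(str =~ ENGLISH_ONLY)
end

# Remove every character outside the explicit class.
def strip_non_english(str)
  str.gsub(/[^a-zA-Z0-9]/, "")
end

# english_only?("Shalom123")         # => true
# english_only?("שלום")              # => false
# strip_non_english("Shalom שלום!")  # => "Shalom"
```

Validating with english_only? rejects the input outright, while strip_non_english silently drops anything outside the class; which one fits depends on whether you want to refuse or clean up the input.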