Issue #5685 has been updated by Yui NARUSE. Status changed from Open to Rejected \p{Katakana} refers Script=Katakana, not Block=Katakana. So current behavior is correct. See also http://unicode.org/reports/tr18/ http://www.unicode.org/Public/UNIDATA/Scripts.txt http://perldoc.perl.org/perluniprops.html ---------------------------------------- Bug #5685: Oniguruma does not recognize U+30FC as Katakana http://redmine.ruby-lang.org/issues/5685 Author: Jani Patokallio Status: Rejected Priority: Normal Assignee: Category: Target version: 1.9.3 ruby -v: ruby 1.9.3dev (2011-09-23 revision 33323) [x86_64-darwin10.8.0] The character U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK (Japanese choonpu) belongs to the Unicode Katakana block (U+30A0-30FF), but it is not matched by /\p{Katakana}/. Demonstration: "ç§???®ã???????¼ã?¯ã?©ã???????¯é°»??§ã????£ã?±ã????§ã??".gsub(/(\p{Katakana}|\p{Hiragana}|\p{Han})+/, 'X') => "X???X" In other words, all kana and kanji in that string except U+30FC are matched. And it really is 30FC/12540: "ç§???®ã???????¼ã?¯ã?©ã???????¯é°»??§ã????£ã?±ã????§ã??".gsub(/(\p{Katakana}|\p{Hiragana}|\p{Han})+/, '').unpack("U*") => [12540] Also occurs in Ruby 1.8 with the Oniguruma library. -- http://redmine.ruby-lang.org
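
The Script-versus-Block distinction behind the rejection can be sketched in a few lines of Ruby. This is only an illustration, not part of the original report: the constant names are hypothetical, the sample word (U+30B3 U+30FC U+30D2 U+30FC) was chosen here, and the explicit range U+30A0-30FF stands in for the Katakana block. Adding U+30FC to a character class alongside \p{Katakana} is one possible workaround, not something the thread itself recommends.

```ruby
# encoding: utf-8

# Hypothetical names, used only for this illustration.
CHOONPU         = "\u30FC"                  # KATAKANA-HIRAGANA PROLONGED SOUND MARK
COFFEE          = "\u30B3\u30FC\u30D2\u30FC" # sample katakana word containing U+30FC twice

KATAKANA_SCRIPT = /\p{Katakana}/            # Script=Katakana (what \p{Katakana} means)
KATAKANA_BLOCK  = /[\u30A0-\u30FF]/         # explicit code point range of the Katakana block
KATAKANA_WORD   = /[\p{Katakana}\u30FC]+/   # workaround: script property plus U+30FC

# U+30FC lies inside the block range but is not matched by the script property.
p CHOONPU =~ KATAKANA_SCRIPT   # => nil
p CHOONPU =~ KATAKANA_BLOCK    # => 0

# The script property alone splits the word at each prolonged sound mark;
# the widened character class matches the whole word as one run.
p COFFEE.gsub(/\p{Katakana}+/, 'X') == "X\u30FCX\u30FC"   # => true
p COFFEE.gsub(KATAKANA_WORD, 'X')                         # => "X"
```

If the goal is to match by block rather than by script, the explicit range is the more direct choice; the character-class form keeps the script semantics and adds only the one extra code point.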