Issue #5685 has been reported by Jani Patokallio.

----------------------------------------
Bug #5685: Oniguruma does not recognize U+30FC as Katakana
http://redmine.ruby-lang.org/issues/5685

Author: Jani Patokallio
Status: Open
Priority: Normal
Assignee: 
Category: 
Target version: 1.9.3
ruby -v: ruby 1.9.3dev (2011-09-23 revision 33323) [x86_64-darwin10.8.0]


The character U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK (the Japanese choonpu, ー) lies in the Unicode Katakana block (U+30A0..U+30FF), but /\p{Katakana}/ does not match it. Demonstration:

"私のホバークラフトは鰻でいっぱいです".gsub(/(\p{Katakana}|\p{Hiragana}|\p{Han})+/, 'X')
 => "XーX"

In other words, every kana and kanji in that string is matched except U+30FC. Deleting the matches instead of replacing them confirms that the leftover character really is U+30FC (decimal 12540):

"私のホバークラフトは鰻でいっぱいです".gsub(/(\p{Katakana}|\p{Hiragana}|\p{Han})+/, '').unpack("U*")
 => [12540] 
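
The same check can be made one character at a time, which rules out any interaction with the + quantifier or the alternation. This is only a minimal sketch: the variable names are illustrative, and it assumes a UTF-8 source file.

```ruby
# encoding: utf-8
# Per-character check of the string from the report (names are illustrative).
text = "私のホバークラフトは鰻でいっぱいです"
cjk  = /\p{Katakana}|\p{Hiragana}|\p{Han}/

# Collect every character that none of the three properties matches.
unmatched = text.each_char.reject { |ch| ch =~ cjk }

# Print each leftover character with its code point.
unmatched.each do |ch|
  printf("U+%04X %s\n", ch.ord, ch)
end
```

On the affected builds the only line printed should be the one for U+30FC, consistent with the gsub results above.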

The same behavior also occurs in Ruby 1.8 when used with the Oniguruma library.
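
Until this is resolved, one possible workaround is to list U+30FC explicitly next to the script properties. This is a sketch under the assumption that treating the prolonged sound mark as part of a kana run is what the caller wants; the pattern name is hypothetical and not part of any library.

```ruby
# encoding: utf-8
text = "私のホバークラフトは鰻でいっぱいです"

# Hypothetical workaround: add U+30FC to the character class by hand,
# since \p{Katakana} alone does not cover it.
japanese = /[\p{Katakana}\p{Hiragana}\p{Han}\u30FC]+/

result = text.gsub(japanese, 'X')
puts result
```

With the explicit \u30FC the whole string should collapse into a single run instead of being split around the choonpu.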



-- 
http://redmine.ruby-lang.org