Issue #4044 has been updated by Martin Dürst.

Status changed from Rejected to Open

In reply to my analysis at https://bugs.ruby-lang.org/issues/5871#note-7, Yui Naruse suggested at https://bugs.ruby-lang.org/issues/5871#note-8 that I reopen this issue (#4044) rather than #5871, which I am doing herewith.

Yui also suggested that I propose a concrete plan. My current proposal is that we analyse which casing data is used in which places when /i (case-insensitive matching) is applied in regular expressions, and that we then fix any inconsistencies. If we don't make progress, I'll also write to the Unicode mailing list in the hope of collecting input from other implementers.

By the way, can somebody explain the following difference:

$ ruby -e "puts /[\W]|\u1234/i.match('k').inspect"
#<MatchData "k">

$ ruby -e "puts /\W|\u1234/i.match('k').inspect"
nil

(|\u1234 is there just to force the regexp to be in UTF-8.)
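
To see whether the difference is limited to 'k', the two patterns can be run over the whole ASCII alphabet. This is only a minimal sketch: the constant names and the helper compare_letters are made up for illustration, and the result depends on the Ruby version in use, so no output is shown.

```ruby
# Hypothetical helper, not part of Ruby: compares the two patterns from above.
# "|\u1234" is kept only to force both regexps to be UTF-8.
BRACKETED = /[\W]|\u1234/i  # \W inside a character class
BARE      = /\W|\u1234/i    # \W on its own

# Returns a Hash mapping each letter on which the two patterns disagree
# to the pair [bracketed_matched, bare_matched].
def compare_letters(letters = ("a".."z"))
  letters.each_with_object({}) do |c, diff|
    bracketed = BRACKETED.match?(c)
    bare      = BARE.match?(c)
    diff[c] = [bracketed, bare] if bracketed != bare
  end
end

if __FILE__ == $PROGRAM_NAME
  # An empty Hash means both patterns behave the same on a..z.
  p compare_letters
end
```

String#match? needs Ruby 2.4 or later; on older versions, !pattern.match(c).nil? gives the same boolean.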
----------------------------------------
Bug #4044: Regex matching errors when using \W character class and /i option
https://bugs.ruby-lang.org/issues/4044

Author: Ben Hoskings
Status: Open
Priority: Normal
Assignee: Yui NARUSE
Category: core
Target version: 1.9.2
ruby -v: ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.4.0]


=begin
 Hi all,
 
 Josh Bassett and I just discovered an issue with regex matches on ruby-1.9.2p0. (We reduced it while we were hacking on gemcutter.)
 
 The case-insensitive (/i) option together with the non-word character class (\W) matches inconsistently against the alphabet. Specifically, the regex doesn't match properly against the letters 'k' and 's'.
 
 The following expression demonstrates the problem in irb:
 
     puts ('a'..'z').to_a.map {|c| [c, c.ord, c[/[^\W]/i] ].inspect }
 
 As a reference, the following two expressions are working properly:
 
     puts ('a'..'z').to_a.map {|c| [c, c.ord, c[/[^\W]/] ].inspect }
     puts ('a'..'z').to_a.map {|c| [c, c.ord, c[/[\w]/i] ].inspect }
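
One plausible reason why exactly 'k' and 's' stand out (an assumption, not something confirmed in this report) is that they are the ASCII letters with non-ASCII case-folding partners in Unicode: U+212A KELVIN SIGN folds to 'k' and U+017F LATIN SMALL LETTER LONG S folds to 's'. The sketch below searches the Basic Multilingual Plane for such partners. The helper name non_ascii_fold_partners is hypothetical, and String#downcase(:fold) requires Ruby 2.4 or later, so this does not run on 1.9.2.

```ruby
# Hypothetical helper: collects non-ASCII BMP code points whose Unicode
# case folding is a single ASCII lowercase letter.
def non_ascii_fold_partners
  partners = Hash.new { |hash, key| hash[key] = [] }
  (0x80..0xFFFF).each do |cp|
    next if (0xD800..0xDFFF).cover?(cp)  # skip surrogates
    ch = [cp].pack("U")                  # one-character UTF-8 string
    next unless ch.valid_encoding?
    folded = ch.downcase(:fold)          # Unicode case folding (Ruby >= 2.4)
    partners[folded] << format("U+%04X", cp) if folded.match?(/\A[a-z]\z/)
  end
  partners
end

if __FILE__ == $PROGRAM_NAME
  non_ascii_fold_partners.each do |letter, code_points|
    puts "#{letter}: #{code_points.join(', ')}"
  end
end
```

If \W is treated as "everything except ASCII word characters", both partner characters fall inside \W, which would pull 'k' and 's' in under /i.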
 
 Cheers
 Ben Hoskings & Josh Bassett
=end



-- 
http://bugs.ruby-lang.org/