Issue #18012 has been reported by jirkamarsik (Jirka Marsik).

----------------------------------------
Bug #18012: Case-insensitive character classes can only match multiple code points when top-level character class is not negated
https://bugs.ruby-lang.org/issues/18012

* Author: jirkamarsik (Jirka Marsik)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
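Some Unicode characters case-fold to strings of multiple code points. For example, the ligature `\ufb00` (ﬀ) folds to the two-character string `ff`, so a case-insensitive character class containing the ligature can match `ff`.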

```
irb(main):001:0> /\A[\ufb00]\z/i.match("\ufb00")
=> #<MatchData "ﬀ">
irb(main):002:0> /\A[\ufb00]\z/i.match("ff")
=> #<MatchData "ff">
```
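The fold itself can be inspected outside the regexp engine. The following is a minimal sketch, assuming that `String#downcase(:fold)` applies the same full Unicode case folding that the case-insensitive match relies on; the variable names are only illustrative.

```ruby
# U+FB00 LATIN SMALL LIGATURE FF is a single code point.
ligature = "\ufb00"

# Full Unicode case folding (assumed here to mirror what /i uses)
# expands the ligature into two separate code points.
folded = ligature.downcase(:fold)

puts ligature.length   # code points in the ligature
puts folded.length     # code points after folding
puts folded == "ff"    # whether the fold equals the plain string "ff"
```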

As expected, when we negate this character class, we can no longer match either the ligature character `\ufb00` or the string `ff`.

```
irb(main):003:0> /\A[^\ufb00]\z/i.match("\ufb00")
=> nil
irb(main):004:0> /\A[^\ufb00]\z/i.match("ff")
=> nil
```

Then, when we add a second negation, the `\ufb00` ligature reappears in the character set, but the string `ff`, which the original class `[\ufb00]` matched, is still rejected.

```
irb(main):005:0> /\A[^[^\ufb00]]\z/i.match("\ufb00")
=> #<MatchData "ﬀ">
irb(main):006:0> /\A[^[^\ufb00]]\z/i.match("ff")
=> nil
```

This shows that negation disables multi-code-point matches in character classes. However, the implementation only checks whether the topmost character class is negated, so wrapping the character class in another set of brackets changes the semantics.

```
irb(main):007:0> /\A[[^[^\ufb00]]]\z/i.match("\ufb00")
=> #<MatchData "ﬀ">
irb(main):008:0> /\A[[^[^\ufb00]]]\z/i.match("ff")
=> #<MatchData "ff">
```

The cause behind this discrepancy (the fact that `[^[^\ufb00]]` and `[[^[^\ufb00]]]` match different strings) is the extra `IS_NCCLASS_NOT` check in `i_apply_case_fold` (https://github.com/ruby/ruby/blob/9eae8cdefba61e9e51feb30a4b98525593169666/regparse.c#L5568).
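To compare all four patterns from this report side by side, a small script along these lines may help. It is only a reproduction sketch: the names `PATTERNS`, `INPUTS` and `match_table` are made up for illustration, and the `ff` column depends on the Ruby version and on whether this behaviour has since changed.

```ruby
# The four character classes from the report, each anchored with \A...\z
# and compiled case-insensitively.
PATTERNS = {
  "[\\ufb00]"       => /\A[\ufb00]\z/i,
  "[^\\ufb00]"      => /\A[^\ufb00]\z/i,
  "[^[^\\ufb00]]"   => /\A[^[^\ufb00]]\z/i,
  "[[^[^\\ufb00]]]" => /\A[[^[^\ufb00]]]\z/i,
}.freeze

# The two inputs tried against every pattern.
INPUTS = { "ligature" => "\ufb00", "ff" => "ff" }.freeze

# Returns { pattern label => { input label => true/false } }.
def match_table
  PATTERNS.transform_values do |re|
    INPUTS.transform_values { |s| re.match?(s) }
  end
end

match_table.each do |label, row|
  puts format("%-18s ligature=%-5s ff=%s", label, row["ligature"], row["ff"])
end
```

On the Ruby version from the report, the last two rows differ in the `ff` column even though the patterns differ only by an outer pair of brackets.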



