Issue #16158 has been updated by duerst (Martin Dürst).


Some more information: The Onigmo documentation says (https://github.com/k-takata/Onigmo/blob/master/doc/RE#L270):
```
                     Subexp of look-behind must be fixed-width.
                     But top-level alternatives can be of various lengths.
                     ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.
```
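
The quoted rule can be probed from Ruby itself. The following is only a sketch: `compiles?` is a hypothetical helper (not part of Ruby or Onigmo) that builds each pattern with `Regexp.new`, so that a rejected pattern raises a rescuable `RegexpError` instead of a parse-time `SyntaxError`. The result for the nested alternation may depend on the Ruby/Onigmo version in use, so it is printed rather than assumed.

```ruby
# Hypothetical helper: true if the pattern source compiles, false if the
# regex engine rejects it.
def compiles?(source, options = 0)
  Regexp.new(source, options)
  true
rescue RegexpError
  false
end

# Top-level alternatives of different lengths (documented as OK).
puts "(?<=a|bc)x        -> #{compiles?('(?<=a|bc)x')}"

# Variable-length alternation nested inside the look-behind
# (documented as not allowed).
puts "(?<=aaa(?:b|cd))x -> #{compiles?('(?<=aaa(?:b|cd))x')}"
```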

What Onigmo apparently does internally is treat the st ligature (U+FB06) as case-equivalent to upper-case ST, which in turn is case-equivalent to lower-case st. You can see that as follows:

```
$ ruby -ve 'puts(/\uFB06/i =~ "most")'
ruby 2.7.0dev (2019-07-06T03:43:38Z trunk f296c260ef) [x86_64-cygwin]
2
```

The st ligature is a single character, so its length is 1, but the length of ST and st is 2. So with the //i option, st no longer seems to be fixed-width, and that's why Onigmo refuses to deal with it and produces an error. So in some way, this is as per spec, although it's surprising and annoying.
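
The length mismatch described here can be checked directly. This is a small illustration, assuming a UTF-8 source file; the variable name `ligature` is just for this sketch. The case-insensitive match is printed without a stated result because it depends on the engine's case-folding tables.

```ruby
ligature = "\uFB06"   # LATIN SMALL LIGATURE ST, a single code point

puts ligature.length  # => 1
puts "st".length      # => 2

# Same check as the one-liner in this message: does the ligature match the
# two-letter sequence "st" when case is ignored?
p(/\uFB06/i =~ "most")
```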



----------------------------------------
Bug #16158: "st" Character Sequence In Regex Look-Behind Causes Illegal Pattern Error When Combined With POSIX Bracket Expressions And Case Insensitivity Flag
https://bugs.ruby-lang.org/issues/16158#change-81561

* Author: michaeltomko (Michael Tomko)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.7.0dev (2019-09-11 master 146677a1e7) [x86_64-openbsd6.5]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
*This is my first Ruby bug submission. Please let me know if there is anything else that I can provide that would be helpful. Thanks for your time!*

I've tried just about as many combinations as I can think of and I have been able to narrow down the issue to the following components being present in a regular expression.

* The character sequence "st" either preceded by any characters OR being a part of a top-level alternation inside of a look-behind. The issue occurs with both positive and negative look-behinds. ex: `(?<!Costa)` or `(?<!Bob|Sally|Stan)` or `(?<= st)`
* Case insensitivity either being set globally or inside of the regex with `(?i)` preceding the look-behind.
* Any curly-style POSIX bracket expression included anywhere in the regex. ex: `\p{Space}` or `\p{L}`

Here are some examples of the error. I have tested this on 2.5.0 locally and [on 2.5.3 with Rubular](https://rubular.com/r/jnr98E9JfAZJIQ).

```
2.5.0 :044 > pat = /(?<!a st)\p{Space}/i
Traceback (most recent call last):
SyntaxError ((irb):44: invalid pattern in look-behind: /(?<!a st)\p{Space}/i)

2.5.0 :047 > pat = /(?i)(?<!a st)\p{Space}/
Traceback (most recent call last):
SyntaxError ((irb):47: invalid pattern in look-behind: /(?i)(?<!a st)\p{Space}/)

2.5.0 :016 > pat = /(?<!Costa)Mesa(\p{Space}|\p{Punct})+(AZ|Arizona)/i
Traceback (most recent call last):
SyntaxError ((irb):16: invalid pattern in look-behind: /(?<=Costa)Mesa(\p{Space}|\p{Punct})+(AZ|Arizona)/i)
```

My expectation would be that this regular expression would compile as written, as it does in JRuby and in MacOS regex testing apps like Patterns or Reggy.

It does compile as expected if the case insensitivity flag is removed or set after the look-behind, if the "st" character sequence comes first in the look-behind and is not a part of an alternation, or if different types of operators are substituted for the POSIX bracket expressions.


```
2.5.0 :007 > pat = /((?<!Cosa)Mesa|Arlington(?=([:space:]|[:punct:])+(AZ|Arizona)))/
 => /((?<!Cosa)Mesa|Arlington(?=(\p{Space}|\p{Punct})+(AZ|Arizona)))/

2.5.0 :008 > pat = /((?<!Cosa)Mesa|Arlington(?=([:space:]|[:punct:])+(AZ|Arizona)))/i
 => /((?<!Cosa)Mesa|Arlington(?=([:space:]|[:punct:])+(AZ|Arizona)))/i

2.5.0 :009 > pat = /((?<!Cosa)Mesa|Arlington(?=([:space:]|[:punct:])+(AZ|Arizona)))/i
 => /((?<!Cosa)Mesa|Arlington(?=(\s|\W)+(AZ|Arizona)))/i

2.5.0 :056 > pat = /(?<!a st)(?i)(?<!juice)\p{Space}/
 => /(?<!a st)(?i)(?<!juice)\p{Space}/

2.5.0 :058 > pat = /(?<!a st)(?i)(?<!stark)\p{Space}/
 => /(?<!a st)(?i)(?<!stark)\p{Space}/
```
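
For anyone who wants to compare the failing and working variants in a script rather than in irb, here is a rough sketch. A regex literal that the engine rejects aborts parsing of the whole file with a `SyntaxError`, so the sketch compiles the pattern sources with `Regexp.new` and rescues `RegexpError` instead. `try_pattern` is a hypothetical helper name; the patterns are taken from the examples in this report, and the outcomes may differ between Ruby versions.

```ruby
# Hypothetical helper: compile a pattern source and report the outcome
# without raising.
def try_pattern(source, options = 0)
  Regexp.new(source, options)
  :ok
rescue RegexpError => e
  "error: #{e.message}"
end

cases = {
  "reported as failing: /i flag, 'a st' in look-behind, \\p{Space}" =>
    ['(?<!a st)\p{Space}', Regexp::IGNORECASE],
  "reported as failing: inline (?i) before the look-behind" =>
    ['(?i)(?<!a st)\p{Space}', 0],
  "reported as working: (?i) placed after the look-behind" =>
    ['(?<!a st)(?i)(?<!juice)\p{Space}', 0],
  "reported as working: no case-insensitivity flag" =>
    ['(?<!a st)\p{Space}', 0],
}

cases.each do |label, (source, options)|
  puts "#{label}: #{try_pattern(source, options)}"
end
```

Single-quoted strings are used for the pattern sources so that `\p` reaches the regex engine unchanged.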


