Issue #16158 has been updated by duerst (Martin Dürst).


I've had a hunch, and have now been able to confirm this hunch:

The problem must be related to the fact that there is a 'st' ligature (U+FB06) in Unicode. The problem occurs for all the other Latin ligatures just before U+FB06, i.e. for 'ff', 'fi', 'fl', 'ffi', 'ffl', long s with t, and st. It also occurs for the components of the Armenian ligatures just following, e.g.
```
$ ruby -ve 'pat = /(?<!a \u0574\u0576)\p{Space}/i'
ruby 2.7.0dev (2019-07-06T03:43:38Z trunk f296c260ef) [x86_64-cygwin]
-e:1: invalid pattern in look-behind: /(?<!a \u0574\u0576)\p{Space}/i
```
It doesn't occur for Hebrew ligatures:
```
$ ruby -ve 'pat = /(?<!a \u05D9\u05B4)\p{Space}/i'
ruby 2.7.0dev (2019-07-06T03:43:38Z trunk f296c260ef) [x86_64-cygwin]

```

My guess is that this is because Latin and Armenian have case conversion, but Hebrew doesn't. This would match with the fact that the error is only produced when matching is case-insensitive.
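
The case-conversion difference between the three scripts can be checked directly. The following is only a sketch: the `samples` table and the `cased?` helper are made up for illustration, and it assumes Ruby 2.4 or later, where `String#upcase` and `String#downcase` handle non-ASCII characters. It says nothing about what the regex engine does internally.

```ruby
# Code points are taken from the examples above; the labels are illustrative.
samples = {
  'Latin st ligature (U+FB06)' => "\uFB06",
  'Armenian men + now'         => "\u0574\u0576",
  'Hebrew yod + hiriq'         => "\u05D9\u05B4",
}

# Hypothetical helper: a string counts as "cased" here if
# upcasing or downcasing changes it.
def cased?(str)
  str.upcase != str || str.downcase != str
end

samples.each do |label, str|
  puts format('%-30s cased: %s', label, cased?(str))
end
```

If the guess is right, the strings reported as cased are the ones whose look-behind fails to compile under `/i`.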


----------------------------------------
Bug #16158: "st" Character Sequence In Regex Look-Behind Causes Illegal Pattern Error When Combined With POSIX Bracket Expressions And Case Insensitivity Flag
https://bugs.ruby-lang.org/issues/16158#change-81560

* Author: michaeltomko (Michael Tomko)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.7.0dev (2019-09-11 master 146677a1e7) [x86_64-openbsd6.5]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
*This is my first Ruby bug submission. Please let me know if there is anything else that I can provide that would be helpful. Thanks for your time!*

I've tried about as many combinations as I can think of, and I have been able to narrow the issue down to the following components all being present in a regular expression.

* The character sequence "st" either preceded by any characters OR being a part of a top-level alternation inside of a look-behind. The issue occurs with both positive and negative look-behinds. ex: `(?<!Costa)` or `(?<!Bob|Sally|Stan)` or `(?<= st)`
* Case insensitivity either being set globally or inside of the regex with `(?i)` preceding the look-behind.
* Any curly-style POSIX bracket expression included anywhere in the regex. ex: `\p{Space}` or `\p{L}`
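
To probe these three conditions without aborting on the first failure, the patterns can be built at runtime with `Regexp.new`, which raises a rescuable `RegexpError` where a regex literal would produce a `SyntaxError` at parse time. This is a rough sketch: the `compiles?` helper and the `candidates` table are made up for illustration, and the results will depend on the Ruby version being tested.

```ruby
# Hypothetical helper: true if the pattern source compiles, false otherwise.
def compiles?(source, options = 0)
  Regexp.new(source, options)
  true
rescue RegexpError
  false
end

# Each entry varies one of the conditions listed above.
candidates = {
  'look-behind with "a st", /i'      => ['(?<!a st)\p{Space}', Regexp::IGNORECASE],
  'same pattern, case-sensitive'     => ['(?<!a st)\p{Space}', 0],
  '(?i) before the look-behind'      => ['(?i)(?<!a st)\p{Space}', 0],
  '(?i) after the look-behind'       => ['(?<!a st)(?i)\p{Space}', 0],
  'alternation in look-behind, /i'   => ['(?<!Bob|Sally|Stan)\p{L}', Regexp::IGNORECASE],
  'no \p{...} property, /i'          => ['(?<!a st)\s', Regexp::IGNORECASE],
}

candidates.each do |label, (source, options)|
  status = compiles?(source, options) ? 'compiles' : 'RegexpError'
  puts format('%-34s %s', label, status)
end
```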

Here are some examples of the error. I have tested this on 2.5.0 locally and [on 2.5.3 with Rubular](https://rubular.com/r/jnr98E9JfAZJIQ).

```
2.5.0 :044 > pat = /(?<!a st)\p{Space}/i
Traceback (most recent call last):
SyntaxError ((irb):44: invalid pattern in look-behind: /(?<!a st)\p{Space}/i)

2.5.0 :047 > pat = /(?i)(?<!a st)\p{Space}/
Traceback (most recent call last):
SyntaxError ((irb):47: invalid pattern in look-behind: /(?i)(?<!a st)\p{Space}/)

2.5.0 :016 > pat = /(?<!Costa)Mesa(\p{Space}|\p{Punct})+(AZ|Arizona)/i
Traceback (most recent call last):
SyntaxError ((irb):16: invalid pattern in look-behind: /(?<=Costa)Mesa(\p{Space}|\p{Punct})+(AZ|Arizona)/i)
```

My expectation would be that this regular expression would compile as written, as it does in JRuby and in macOS regex testing apps like Patterns or Reggy.

It does compile as expected if the case insensitivity flag is removed or set after the look-behind, if the "st" character sequence is first in the look-behind and not part of an alternation, or if different types of operators are substituted for the POSIX bracket expressions.


```
2.5.0 :007 > pat = /((?<!Cosa)Mesa|Arlington(?=([:space:]|[:punct:])+(AZ|Arizona)))/
 => /((?<!Cosa)Mesa|Arlington(?=(\p{Space}|\p{Punct})+(AZ|Arizona)))/

2.5.0 :008 > pat = /((?<!Cosa)Mesa|Arlington(?=([:space:]|[:punct:])+(AZ|Arizona)))/i
 => /((?<!Cosa)Mesa|Arlington(?=([:space:]|[:punct:])+(AZ|Arizona)))/i

2.5.0 :009 > pat = /((?<!Cosa)Mesa|Arlington(?=([:space:]|[:punct:])+(AZ|Arizona)))/i
 => /((?<!Cosa)Mesa|Arlington(?=(\s|\W)+(AZ|Arizona)))/i

2.5.0 :056 > pat = /(?<!a st)(?i)(?<!juice)\p{Space}/
 => /(?<!a st)(?i)(?<!juice)\p{Space}/

2.5.0 :058 > pat = /(?<!a st)(?i)(?<!stark)\p{Space}/
 => /(?<!a st)(?i)(?<!stark)\p{Space}/
```


