Issue #16158 has been updated by michaeltomko (Michael Tomko).


*This is my first Ruby bug submission. Please let me know if there is anything else that I can provide that would be helpful. Thanks for your time!*

After trying as many combinations as I could think of, I have narrowed the issue down to the following components all being present in a regular expression:

* The character sequence "st" either preceded by any characters OR being a part of a top-level alternation inside of a look-behind. The issue occurs with both positive and negative look-behinds. ex: `(?<!Costa)` or `(?<!Bob|Sally|Stan)` or `(?<= st)`
* Case insensitivity either being set globally or inside of the regex with `(?i)` preceding the look-behind.
* Any curly-style POSIX bracket expression included anywhere in the regex. ex: `\p{Space}` or `\p{L}`
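
To make the three ingredients above easier to check on other Ruby versions, here is a minimal sketch that compiles every combination and prints whether it is accepted. The helper `compiles?` and the constants `LOOK_BEHINDS`, `TAILS`, and `FLAGS` are names made up for this illustration, and `(?<!a xy)` is just an arbitrary look-behind without "st". No expected results are listed, since they may differ between versions.

```ruby
# Hypothetical helper (not part of the original report): returns true when
# the pattern source compiles, false when the regex engine rejects it.
# Regexp.new raises RegexpError, whereas a regex literal raises SyntaxError.
def compiles?(source, options = 0)
  Regexp.new(source, options)
  true
rescue RegexpError
  false
end

# One variant with and one without each suspected ingredient.
LOOK_BEHINDS = { "with st" => '(?<!a st)', "without st" => '(?<!a xy)' }
TAILS        = { "property" => '\p{Space}', "shorthand" => '\s' }
FLAGS        = { "ignorecase" => Regexp::IGNORECASE, "no flags" => 0 }

# Returns one row per combination: three labels plus the compile result.
def ingredient_matrix
  LOOK_BEHINDS.flat_map do |lb_name, lb|
    TAILS.flat_map do |tail_name, tail|
      FLAGS.map do |flag_name, flag|
        [lb_name, tail_name, flag_name, compiles?(lb + tail, flag)]
      end
    end
  end
end

ingredient_matrix.each { |row| puts row.join("\t") }
```

If the narrowing above is right, only the row that has all three ingredients should report `false` on an affected version.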

Here are some examples of the error. I have tested this on 2.5.0 locally and [on 2.5.3 with Rubular](https://rubular.com/r/jnr98E9JfAZJIQ).

```
2.5.0 :044 > pat = /(?<!a st)\p{Space}/i
Traceback (most recent call last):
SyntaxError ((irb):44: invalid pattern in look-behind: /(?<!a st)\p{Space}/i)

2.5.0 :047 > pat = /(?i)(?<!a st)\p{Space}/
Traceback (most recent call last):
SyntaxError ((irb):47: invalid pattern in look-behind: /(?i)(?<!a st)\p{Space}/)

2.5.0 :016 > pat = /(?<!Costa)Mesa(\p{Space}|\p{Punct})+(AZ|Arizona)/i
Traceback (most recent call last):
SyntaxError ((irb):16: invalid pattern in look-behind: /(?<!Costa)Mesa(\p{Space}|\p{Punct})+(AZ|Arizona)/i)
```

I would expect these regular expressions to compile as written, as they do in JRuby and in macOS regex testing apps like Patterns or Reggy.

The pattern does compile as expected if the case-insensitivity flag is removed or set after the look-behind, if the "st" character sequence comes first in the look-behind and is not part of an alternation, or if different types of operators are substituted for the POSIX bracket expressions.


```
2.5.0 :007 > pat = /((?<!Costa)Mesa|Arlington(?=(\p{Space}|\p{Punct})+(AZ|Arizona)))/
 => /((?<!Costa)Mesa|Arlington(?=(\p{Space}|\p{Punct})+(AZ|Arizona)))/

2.5.0 :008 > pat = /((?<!Costa)Mesa|Arlington(?=([:space:]|[:punct:])+(AZ|Arizona)))/i
 => /((?<!Costa)Mesa|Arlington(?=([:space:]|[:punct:])+(AZ|Arizona)))/i

2.5.0 :009 > pat = /((?<!Costa)Mesa|Arlington(?=(\s|\W)+(AZ|Arizona)))/i
 => /((?<!Costa)Mesa|Arlington(?=(\s|\W)+(AZ|Arizona)))/i

2.5.0 :056 > pat = /(?<!a st)(?i)(?<!juice)\p{Space}/
 => /(?<!a st)(?i)(?<!juice)\p{Space}/

2.5.0 :058 > pat = /(?<!a st)(?i)(?<!stark)\p{Space}/
 => /(?<!a st)(?i)(?<!stark)\p{Space}/
```
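
A caveat on the substitutions in the transcript above: the replacement classes do not all match the same characters, so swapping one for another can change what the pattern accepts. The sketch below compares them on a few sample strings. `SPACE_CLASSES` and `which_match` are names invented for this illustration. It relies on two documented Ruby behaviours: `\s` matches ASCII whitespace only, and `[:space:]` without the outer brackets is an ordinary character class over the characters `:`, `s`, `p`, `a`, `c` and `e` (the POSIX bracket form is `[[:space:]]`).

```ruby
# Candidate whitespace classes, built with Regexp.new so that each source
# string is explicit. The '[:space:]' entry may print a duplicated-range
# warning when warnings are enabled, because ':' appears twice in the class.
SPACE_CLASSES = {
  '\p{Space}'   => Regexp.new('\p{Space}'),
  '[[:space:]]' => Regexp.new('[[:space:]]'),
  '\s'          => Regexp.new('\s'),
  '[:space:]'   => Regexp.new('[:space:]'),
}

# Returns the source strings of the classes that match somewhere in text.
def which_match(text)
  SPACE_CLASSES.select { |_source, re| re.match?(text) }.keys
end

# Samples: a no-break space (U+00A0), the letter "s", and an ASCII space.
["\u00A0", "s", " "].each do |sample|
  puts "#{sample.inspect}: #{which_match(sample).join(', ')}"
end
```

In particular, the no-break space is matched by `\p{Space}` but not by `\s`, and the letter "s" is matched only by the unbracketed `[:space:]`.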

----------------------------------------
Bug #16158: "st" Character Sequence In Regex Look-Behind Causes Illegal Pattern Error When Combined With POSIX Bracket Expressions And Case Insensitivity Flag
https://bugs.ruby-lang.org/issues/16158#change-81482

* Author: michaeltomko (Michael Tomko)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 2.5.0, 2.5.3
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
*This is my first Ruby bug submission. Please let me know if there is anything else that I can provide that would be helpful. Thanks for your time!*

After trying as many combinations as I could think of, I have narrowed the issue down to the following components all being present in a regular expression:

* The character sequence "st" either preceded by any characters OR being a part of a top-level alternation inside of a look-behind. The issue occurs with both positive and negative look-behinds. ex: `(?<!Costa)` or `(?<!Bob|Sally|Stan)` or `(?<= st)`
* Case insensitivity either being set globally or inside of the regex with `(?i)` preceding the look-behind.
* Any curly-style POSIX bracket expression included anywhere in the regex. ex: `\p{Space}` or `\p{L}`

Here are some examples of the error. I have tested this on 2.5.0 locally and [on 2.5.3 with Rubular](https://rubular.com/r/jnr98E9JfAZJIQ).

```
2.5.0 :044 > pat = /(?<!a st)\p{Space}/i
Traceback (most recent call last):
SyntaxError ((irb):44: invalid pattern in look-behind: /(?<!a st)\p{Space}/i)

2.5.0 :047 > pat = /(?i)(?<!a st)\p{Space}/
Traceback (most recent call last):
SyntaxError ((irb):47: invalid pattern in look-behind: /(?i)(?<!a st)\p{Space}/)

2.5.0 :016 > pat = /(?<!Costa)Mesa(\p{Space}|\p{Punct})+(AZ|Arizona)/i
Traceback (most recent call last):
SyntaxError ((irb):16: invalid pattern in look-behind: /(?<!Costa)Mesa(\p{Space}|\p{Punct})+(AZ|Arizona)/i)
```

I would expect these regular expressions to compile as written, as they do in JRuby and in macOS regex testing apps like Patterns or Reggy.

The pattern does compile as expected if the case-insensitivity flag is removed or set after the look-behind, if the "st" character sequence comes first in the look-behind and is not part of an alternation, or if different types of operators are substituted for the POSIX bracket expressions.


```
2.5.0 :007 > pat = /((?<!Costa)Mesa|Arlington(?=(\p{Space}|\p{Punct})+(AZ|Arizona)))/
 => /((?<!Costa)Mesa|Arlington(?=(\p{Space}|\p{Punct})+(AZ|Arizona)))/

2.5.0 :008 > pat = /((?<!Costa)Mesa|Arlington(?=([:space:]|[:punct:])+(AZ|Arizona)))/i
 => /((?<!Costa)Mesa|Arlington(?=([:space:]|[:punct:])+(AZ|Arizona)))/i

2.5.0 :009 > pat = /((?<!Costa)Mesa|Arlington(?=(\s|\W)+(AZ|Arizona)))/i
 => /((?<!Costa)Mesa|Arlington(?=(\s|\W)+(AZ|Arizona)))/i

2.5.0 :056 > pat = /(?<!a st)(?i)(?<!juice)\p{Space}/
 => /(?<!a st)(?i)(?<!juice)\p{Space}/

2.5.0 :058 > pat = /(?<!a st)(?i)(?<!stark)\p{Space}/
 => /(?<!a st)(?i)(?<!stark)\p{Space}/
```
