Issue #16158 has been updated by duerst (Martin Dürst).

I've had a hunch, and have now been able to confirm this hunch: The problem must be related to the fact that there is a 'st' ligature (U+FB06) in Unicode. The problem occurs for all the other Latin ligatures just before U+FB06, i.e. for 'ff', 'fi', 'fl', 'ffi', 'ffl', long s with t, and st. It also occurs for the components of the Armenian ligatures just following, e.g.

```
$ ruby -ve 'pat = /(?<!a \u0574\u0576)\p{Space}/i'
ruby 2.7.0dev (2019-07-06T03:43:38Z trunk f296c260ef) [x86_64-cygwin]
-e:1: invalid pattern in look-behind: /(?<!a \u0574\u0576)\p{Space}/i
```

It doesn't occur for Hebrew ligatures:

```
$ ruby -ve 'pat = /(?<!a \u05D9\u05B4)\p{Space}/i'
ruby 2.7.0dev (2019-07-06T03:43:38Z trunk f296c260ef) [x86_64-cygwin]
```

My guess is that this is because Latin and Armenian have case conversion, but Hebrew doesn't. This would match with the fact that the error is only produced when matching is case-insensitive.

----------------------------------------
Bug #16158: "st" Character Sequence In Regex Look-Behind Causes Illegal Pattern Error When Combined With POSIX Bracket Expressions And Case Insensitivity Flag
https://bugs.ruby-lang.org/issues/16158#change-81560

* Author: michaeltomko (Michael Tomko)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.7.0dev (2019-09-11 master 146677a1e7) [x86_64-openbsd6.5]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
*This is my first Ruby bug submission. Please let me know if there is anything else that I can provide that would be helpful. Thanks for your time!*

I've tried just about as many combinations as I can think of and I have been able to narrow down the issue to the following components being present in a regular expression.

* The character sequence "st" either preceded by any characters OR being a part of a top-level alternation inside of a look-behind.
  The issue occurs with both positive and negative look-behinds. ex: `(?<!Costa)` or `(?<!Bob|Sally|Stan)` or `(?<= st)`
* Case insensitivity either being set globally or inside of the regex with `(?i)` preceding the look-behind.
* Any curly-style POSIX bracket expression included anywhere in the regex. ex: `\p{Space}` or `\p{L}`

Here are some examples of the error. I have tested this on 2.5.0 locally and [on 2.5.3 with Rubular](https://rubular.com/r/jnr98E9JfAZJIQ).

```
2.5.0 :044 > pat = /(?<!a st)\p{Space}/i
Traceback (most recent call last):
SyntaxError ((irb):44: invalid pattern in look-behind: /(?<!a st)\p{Space}/i)
2.5.0 :047 > pat = /(?i)(?<!a st)\p{Space}/
Traceback (most recent call last):
SyntaxError ((irb):47: invalid pattern in look-behind: /(?i)(?<!a st)\p{Space}/)
2.5.0 :016 > pat = /(?<!Costa)Mesa(\p{Space}|\p{Punct})+(AZ|Arizona)/i
Traceback (most recent call last):
SyntaxError ((irb):16: invalid pattern in look-behind: /(?<=Costa)Mesa(\p{Space}|\p{Punct})+(AZ|Arizona)/i)
```

My expectation would be that this regular expression would compile as written, as it does in JRuby and in macOS regex testing apps like Patterns or Reggy.

It does compile as expected if the case insensitivity flag is removed or set after the look-behind, if the "st" character sequence comes first in the look-behind and is not a part of an alternation, or if different types of operators are substituted for the POSIX bracket expressions.
```
2.5.0 :007 > pat = /((?<!Cosa)Mesa|Arlington(?=([:space:]|[:punct:])+(AZ|Arizona)))/
 => /((?<!Cosa)Mesa|Arlington(?=(\p{Space}|\p{Punct})+(AZ|Arizona)))/
2.5.0 :008 > pat = /((?<!Cosa)Mesa|Arlington(?=([:space:]|[:punct:])+(AZ|Arizona)))/i
 => /((?<!Cosa)Mesa|Arlington(?=([:space:]|[:punct:])+(AZ|Arizona)))/i
2.5.0 :009 > pat = /((?<!Cosa)Mesa|Arlington(?=([:space:]|[:punct:])+(AZ|Arizona)))/i
 => /((?<!Cosa)Mesa|Arlington(?=(\s|\W)+(AZ|Arizona)))/i
2.5.0 :056 > pat = /(?<!a st)(?i)(?<!juice)\p{Space}/
 => /(?<!a st)(?i)(?<!juice)\p{Space}/
2.5.0 :058 > pat = /(?<!a st)(?i)(?<!stark)\p{Space}/
 => /(?<!a st)(?i)(?<!stark)\p{Space}/
```
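
For anyone retesting on another Ruby build, the cases above could be checked in one run with something like the following. This is only a rough sketch: `compiles?` and `CASES` are hypothetical names that are not part of the report, and it relies on `Regexp.new` raising `RegexpError` at run time (a regex literal would fail with `SyntaxError` at parse time instead). The comments record what was reported on 2.5.0; other versions may behave differently.

```ruby
# Hypothetical helper: true if `source` compiles with the given options,
# false if the regex engine rejects the pattern.
def compiles?(source, options = 0)
  Regexp.new(source, options)
  true
rescue RegexpError
  false
end

# Patterns taken from the report, with the behavior reported on 2.5.0.
CASES = [
  ['(?<!a st)\p{Space}',               Regexp::IGNORECASE], # reported: invalid pattern
  ['(?i)(?<!a st)\p{Space}',           0],                  # reported: invalid pattern
  ['(?<!a st)\p{Space}',               0],                  # reported: compiles
  ['(?<!a st)(?i)(?<!juice)\p{Space}', 0],                  # reported: compiles
]

CASES.each do |source, options|
  status = compiles?(source, options) ? 'compiles' : 'invalid pattern'
  puts format('%-36s options=%d  %s', source, options, status)
end
```

Building the patterns from single-quoted strings keeps the file loadable even on a Ruby where the literal form would be rejected at parse time.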