Issue #14458 has been reported by dannyfallon (Danny Fallon).

----------------------------------------
Bug #14458: RubyVM::InstructionSequence compilation loses Regexp encoding
https://bugs.ruby-lang.org/issues/14458

* Author: dannyfallon (Danny Fallon)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.4.3p205 (2017-12-14 revision 61247) [x86_64-darwin16]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
We appear to be losing encoding information for a Regexp object when we pass it through the compiler:

~~~ ruby
irb(main):001:0> "Test".encoding
=> #<Encoding:UTF-8>
irb(main):002:0> RubyVM::InstructionSequence.compile("'Test'.encoding").eval
=> #<Encoding:UTF-8>
irb(main):003:0> /\p{Alnum}/.encoding
=> #<Encoding:UTF-8>
irb(main):004:0> RubyVM::InstructionSequence.compile("/\p{Alnum}/.encoding").eval
=> #<Encoding:US-ASCII>
~~~

I think the encoding should be retained, much like it is for strings. Adding the /u flag to the Regexp
literal does retain the encoding, but that feels like a burden we shouldn't have to bear.

~~~ ruby
irb(main):005:0> RubyVM::InstructionSequence.compile("/\p{Alnum}/u.encoding").eval
=> #<Encoding:UTF-8>
~~~
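
One thing that may be worth ruling out is the quoting of the source string itself. This is a minimal sketch, not a diagnosis: it assumes MRI (RubyVM is MRI-specific) and a UTF-8 default source encoding, and the variable names are only for illustration. It prints the text that actually reaches the compiler for a double-quoted and a single-quoted source string, then the encoding each one evaluates to.

~~~ ruby
# Sketch only: compare what the compiler receives under each quoting style.
# Assumes MRI and a UTF-8 source encoding; variable names are hypothetical.

# In a double-quoted string "\p" is not a recognised escape sequence,
# so the backslash is dropped and the source text contains "p{Alnum}".
double_quoted = "/\p{Alnum}/.encoding"

# In a single-quoted string the backslash is kept as written.
single_quoted = '/\p{Alnum}/.encoding'

puts double_quoted  # the text handed to the compiler in the double-quoted case
puts single_quoted  # the text handed to the compiler in the single-quoted case

# Compile and evaluate both variants, then show the resulting encodings.
double_result = RubyVM::InstructionSequence.compile(double_quoted).eval
single_result = RubyVM::InstructionSequence.compile(single_quoted).eval

puts double_result
puts single_result
~~~

If the two results differ, the difference may come from string escaping before compilation rather than from the compiler dropping the encoding.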



