Issue #14458 has been updated by dannyfallon (Danny Fallon).


znz (Kazuhiro NISHIYAMA) wrote:
> I think `/p{Alnum}/` is US-ASCII only, so encoding is US-ASCII.
> 
> ```
> % irb -r irb/completion --simple-prompt
> >> puts "/\p{Alnum}/.encoding"
> /p{Alnum}/.encoding
> => nil
> >> eval "/\p{Alnum}/.encoding"
> => #<Encoding:US-ASCII>
> ```
> 
> You can use `"/\\p{Alnum}/.encoding"` or `'/\p{Alnum}/.encoding'`.

Well, colour me embarrassed. Thank you for pointing out that subtlety in my example. I'm pretty sure we can close this one as invalid. Sorry for wasting your time.

----------------------------------------
Bug #14458: RubyVM::InstructionSequence compilation loses Regexp encoding
https://bugs.ruby-lang.org/issues/14458#change-70301

* Author: dannyfallon (Danny Fallon)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.4.3p205 (2017-12-14 revision 61247) [x86_64-darwin16]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
We appear to be losing encoding information for a Regexp object when we pass it through the compiler:

~~~ ruby
irb(main):001:0> "Test".encoding
=> #<Encoding:UTF-8>
irb(main):002:0> RubyVM::InstructionSequence.compile("'Test'.encoding").eval
=> #<Encoding:UTF-8>
irb(main):003:0> /\p{Alnum}/.encoding
=> #<Encoding:UTF-8>
irb(main):004:0> RubyVM::InstructionSequence.compile("/\p{Alnum}/.encoding").eval
=> #<Encoding:US-ASCII>
~~~

I think the encoding should be retained, much like it is for strings. Adding /u to the Regexp object does retain the encoding, but that feels like a burden we shouldn't have to bear?

~~~
irb(main):005:0> RubyVM::InstructionSequence.compile("/\p{Alnum}/u.encoding").eval
=> #<Encoding:UTF-8>
~~~
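The quoting pitfall znz points out can be reproduced outside irb. Below is a minimal sketch, assuming MRI (CRuby), where `RubyVM::InstructionSequence` is available; the helper name `eval_compiled` is hypothetical and introduced only for illustration. It prints the source text each quoting style actually produces, then compiles and evaluates each one to show the resulting Regexp encoding.

```ruby
# Sketch, assuming MRI (CRuby): RubyVM::InstructionSequence is MRI-specific.

# Hypothetical helper: compile and evaluate a snippet of Ruby source.
# The snippet is transcoded to UTF-8 first so the result does not depend on
# the source encoding of the file (or -e argument) this code runs from.
def eval_compiled(src)
  RubyVM::InstructionSequence.compile(src.encode(Encoding::UTF_8)).eval
end

# In a double-quoted string "\p" is not a recognised escape, so the backslash
# is dropped and this snippet actually contains /p{Alnum}/ (ASCII-only).
lossy = "/\p{Alnum}/.encoding"

# Either of these keeps the backslash, so the snippet contains /\p{Alnum}/.
escaped = "/\\p{Alnum}/.encoding" # doubled backslash
single  = '/\p{Alnum}/.encoding'  # single quotes leave \p untouched

puts lossy   # prints /p{Alnum}/.encoding
puts escaped # prints /\p{Alnum}/.encoding
puts single  # prints /\p{Alnum}/.encoding

# Show which Regexp encoding each compiled snippet ends up with.
[lossy, escaped, single].each do |src|
  puts "#{src} => #{eval_compiled(src)}"
end
```

With the backslash preserved, the compiled pattern really contains the `\p{Alnum}` property escape, so, as in the reporter's `irb(main):003:0>` line, it should come back as UTF-8. The lossy version compiles the plain pattern `/p{Alnum}/`, which is why it comes back as US-ASCII.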