Hi,

In message "UTF8 and Regexp"
    on 02/05/14, Bob Hutchison <hutch / recursive.ca> writes:

|I understand that Ruby's regular expressions handle utf8. I'm having trouble
|specifying utf8 in the pattern. I've tried /ab\x123cd/, and I've tried
|encoding the character values > 0x7F as utf8 and constructing a string
|from which a regexp is compiled ("ab" << encode(0x123) << "cd" kind of
|thing). Ruby tells me that the regular expression is invalid in both cases.
|
|Is it possible to specify character codes > 0x7F in the patterns of Ruby's
|regular expressions? Any suggestions are more than welcome.

Just embed the characters directly in the pattern (as UTF-8 bytes), or

  r = Regexp.compile("ab\304\243cd", 0, "UTF-8")

or even

  r = Regexp.compile("ab#{[0x123].pack('U')}cd", 0, "UTF-8")
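Both forms above hand Regexp.compile the same two bytes, \304\243, which are the UTF-8 encoding of U+0123. Below is a minimal sketch, assuming a modern (1.9 or later) Ruby where strings carry their own encoding. It derives those bytes by hand and then checks them against pack('U'). The helper name utf8_bytes is hypothetical and stands in for the encode() mentioned in the question. The sketch omits the third Regexp argument, because in modern Ruby the pattern's encoding is taken from the string itself.

```ruby
# Hypothetical helper standing in for the encode() from the question:
# encode one code point below 0x10000 as a list of UTF-8 byte values.
# (Surrogates 0xD800..0xDFFF are not rejected in this sketch.)
def utf8_bytes(cp)
  if cp < 0x80
    [cp]                                  # 0xxxxxxx
  elsif cp < 0x800
    [0xC0 | (cp >> 6),                    # 110xxxxx
     0x80 | (cp & 0x3F)]                  # 10xxxxxx
  elsif cp < 0x10000
    [0xE0 | (cp >> 12),                   # 1110xxxx
     0x80 | ((cp >> 6) & 0x3F),           # 10xxxxxx
     0x80 | (cp & 0x3F)]                  # 10xxxxxx
  else
    raise ArgumentError, "code points above 0xFFFF are not handled here"
  end
end

bytes = utf8_bytes(0x123)
p bytes                                   # => [196, 163]
p bytes.map { |b| format("%o", b) }       # => ["304", "243"]

# Array#pack('U') produces the same bytes, tagged as a UTF-8 string.
char = [0x123].pack('U')
p char.bytes == bytes                     # => true

# Assumes Ruby 1.9+: Regexp.new takes its encoding from the string.
r = Regexp.new("ab#{char}cd")
p r.encoding                              # => #<Encoding:UTF-8>
p "xxab#{char}cdxx" =~ r                  # => 2
```

The hand-written encoder only covers the one-, two- and three-byte cases, which is enough for U+0123. In practice pack('U') does this work for you.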

Sorry for the inconvenience.  This will be much better after the M17N
(multilingualization) enhancement: an expression like \x{123} will be
allowed in regular expressions.
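The M17N work later shipped with Ruby 1.9. The escape that Ruby 1.9 and later accept in regexp literals is \u{...} (or \u followed by exactly four hex digits). That is the spelling this sketch uses, not the \x{123} form mentioned above. It assumes Ruby 1.9 or later.

```ruby
# Assumes Ruby 1.9 or later: \u{...} names a code point directly inside
# a regexp literal, and the literal's encoding is then fixed to UTF-8.
r = /ab\u{123}cd/

p r.encoding               # => #<Encoding:UTF-8>
p r =~ "ab\u0123cd"        # => 0
```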

							matz.