James Gray wrote:
> I'm trying to document the Encoding Regexp objects receive for the m17n 
> series on my blog.  This is how I think it works:
> 
> * A / literal is given a US-ASCII Encoding if it contains only 7-bit 
> characters
> * A / literal receives the current source Encoding when it contains 
> 8-bit characters
> * The old /u and /n style modifiers still work to force a UTF-8 or 
> US-ASCII Encoding
There are also /e (EUC-JP) and /s (Windows-31J).
These set Regexp::FIXEDENCODING on the regexp,
which makes matching raise an exception on strings with other encodings,
even if the regexp contains only 7-bit characters.
The constant Regexp::FIXEDENCODING is defined in 1.9.2,
but the value is also used in 1.9.1.
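A minimal sketch of this behavior (Ruby 1.9.2 or later; the pattern and
string contents are arbitrary examples):

```ruby
# /e forces EUC-JP and sets the FIXEDENCODING flag, even though
# the pattern itself contains only 7-bit characters.
re = /abc/e
p re.encoding         # => #<Encoding:EUC-JP>
p re.fixed_encoding?  # => true

# Matching a string in another encoding that contains non-ASCII
# characters raises, despite the 7-bit pattern.
begin
  re =~ "résumé"      # a UTF-8 string
rescue Encoding::CompatibilityError => e
  puts e.message      # incompatible encoding regexp match ...
end
```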

> * A / literal that would be US-ASCII due to the source Encoding or /n 
> will be upgraded to ASCII-8BIT by hex, octal, control, meta, or 
> control-meta byte escapes (as discussed in [ruby-core:23184])
Similar to the above, /n raises warnings on strings whose encoding is other than ASCII-8BIT.
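For instance (a sketch against current MRI; the pattern and string are
arbitrary examples, and the warning text may vary by version):

```ruby
# A 7-bit /n pattern gets US-ASCII encoding and is not fixed...
re = /abc/n
p re.encoding         # => #<Encoding:US-ASCII>
p re.fixed_encoding?  # => false

# ...but matching it against a non-ASCII-8BIT string containing
# non-ASCII characters emits a warning instead of raising:
#   warning: historical binary regexp match /.../n against UTF-8 string
re =~ "café abc"
```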

> * A / literal will receive a UTF-8 Encoding if it includes \u escapes
> * Regexp objects constructed with Regexp::new() receive the Encoding of 
> the String passed containing the regular expression
> Am I right so far?  Am I missing any variations?
> 
> Am I right that Regexps favor US-ASCII because it maximizes their 
> compatibility?  It makes it so you can use them on any ASCII compatible 
> String instead of just a String in the source Encoding, right?

Yes, and if you set Regexp::FIXEDENCODING the regexp will match only
strings in the same encoding.
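A sketch, assuming 1.9.2 or later where the constant is public (the
EUC-JP pattern here is an arbitrary example):

```ruby
# Regexp.new takes the Encoding of the pattern String; passing
# Regexp::FIXEDENCODING pins the regexp to that encoding.
pat = "abc".encode("EUC-JP")
re  = Regexp.new(pat, Regexp::FIXEDENCODING)
p re.encoding         # => #<Encoding:EUC-JP>
p re.fixed_encoding?  # => true

# Strings in other encodings are now rejected when they contain
# non-ASCII characters.
begin
  re =~ "résumé"      # a UTF-8 string
rescue Encoding::CompatibilityError
  puts "matches only EUC-JP strings"
end
```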

P.S.
If you write about regexps, the difference in /i and character-class
behavior between Unicode and non-Unicode encodings may be a topic.
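For example (a sketch of how the differences show up in current MRI; the
sample characters are arbitrary):

```ruby
# /i does Unicode case folding when the regexp encoding is Unicode:
p(/é/i =~ "É")          # => 0

# \w stays ASCII-only even in UTF-8, while the POSIX class (and
# \p{Word}) is Unicode-aware:
p("é" =~ /\w/)          # => nil
p("é" =~ /[[:word:]]/)  # => 0
```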

-- 
NARUSE, Yui  <naruse / airemix.jp>