Hi,

James Gray wrote:
> On Sep 15, 2008, at 3:49 AM, Michael Selig wrote:
> 
>> On Mon, 15 Sep 2008 18:08:14 +1000, Tanaka Akira <akr / fsij.org> wrote:
>>
>>> In article <48cddb5533ad_8725cd9524342 / redmine.ruby-lang.org>,
>>>  Michael Selig <redmine / ruby-lang.org> writes:
>>>
>>>> UTF-16 & UTF-32 (and maybe other non-ascii compatible encodings) 
>>>> don't seem to be work as Regexp patterns.
>>>>
>>>> Regexp.new("abc".encode("UTF-16BE"))
>>>> ==> EncodingCompatibilityError: incompatible character encodings: 
>>>> US-ASCII and UTF-16BE
>>>
>>> % ruby -ve 'p Regexp.new("abc".encode("UTF-16BE")) =~ 
>>> "abc".encode("UTF-16BE")'
>>> ruby 1.9.0 (2008-09-15 revision 19356) [i686-linux]
>>> 0
>>
>> I see, I have diagnosed the problem wrongly. I was using irb.
>>
>> ruby -ve 'p Regexp.new("abc".encode("UTF-16BE"))'
>> ruby 1.9.0 (2008-09-03 revision 19073) [i686-linux]
>> -e:1:in `p': incompatible character encodings: UTF-16BE and ASCII-8BIT 
>> (EncodingCompatibilityError)
>>     from -e:1:in `<main>'
>>
>> This is the error I was getting in irb, and I mistakenly assumed it 
>> was from the Regexp::new.
>> It is a different problem - not as bad as I thought!
> 
> So it's inspect() that has the issues, right?

Yes, Regexp#inspect is the cause of this problem.
A patch follows.

--- re.c        (revision 19371)
+++ re.c        (working copy)
@@ -381,7 +381,7 @@ rb_reg_desc(const char *s, long len, VAL
  {
      VALUE str = rb_str_buf_new2("/");

-    rb_enc_copy(str, re);
+    rb_enc_associate(str, rb_usascii_encoding());
      rb_reg_expr_str(str, s, len);
      rb_str_buf_cat2(str, "/");
      if (re) {

The result of Regexp#inspect is only for viewing the content of a regexp while
debugging, so there may be no reason to keep the original encoding.
# Of course, Regexp#source must keep it.
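
To illustrate the intent, here is a small sketch of the behavior I would expect
once a fix like this is applied. It assumes a Ruby where Regexp#inspect returns
an ASCII-compatible string for a UTF-16BE regexp; the variable names are only
for illustration.

```ruby
# Sketch only: assumes a Ruby that already contains a fix like the patch above.
pattern = "abc".encode("UTF-16BE")
re = Regexp.new(pattern)

# Regexp#source must keep the encoding of the original pattern.
source_encoding = re.source.encoding

# Regexp#inspect is only a debugging aid. If its result is ASCII compatible,
# Kernel#p can print it without an encoding compatibility error.
inspect_encoding = re.inspect.encoding

puts "source:  #{source_encoding}"
puts "inspect: #{inspect_encoding} (ascii compatible: #{inspect_encoding.ascii_compatible?})"
```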


Anyway, Regexp#to_s is currently close to Regexp#source (the source wrapped in an option group).
But Regexp#inspect is more readable.
How about making Regexp#to_s an alias of Regexp#inspect?

  *      r1 = /ab+c/ix           #=> /ab+c/ix
  *      s1 = r1.to_s            #=> "(?ix-m:ab+c)"
  *      r2 = Regexp.new(s1)     #=> /(?ix-m:ab+c)/
  *      r1 == r2                #=> false
  *      r1.source               #=> "ab+c"
  *      r2.source               #=> "(?ix-m:ab+c)"
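
For comparison, the three string forms of the same regexp can be put side by
side. This is only a sketch of the current behavior, not of the proposed
change.

```ruby
r1 = /ab+c/ix

puts r1.source   # ab+c
puts r1.to_s     # (?ix-m:ab+c)
puts r1.inspect  # /ab+c/ix

# The to_s form carries the option flags, so a regexp rebuilt from it
# still matches case-insensitively even though it is not == to r1.
r2 = Regexp.new(r1.to_s)
puts r1 == r2       # false
puts r2 =~ "ABBC"   # 0
```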

-- 
NARUSE, Yui  <naruse / airemix.jp>