On Sep 13, 2008, at 5:39 PM, James Gray wrote:

> * What is the proper way to build a regular expression in some  
> encoding I have in a variable?

According to the new Pickaxe, Regexp is supposed to pick up the  
encoding of a passed-in String.  I'm not finding that to be totally  
accurate though, because this code:

   ascii_str = <<-END_PARSER
   \\G(?:\\A|,)     # anchor the match
   (?: "( (?>[^"]*) # find quoted fields
          (?> ""
          [^"]* )* )"
       |            # ... or ...
       ([^",]*)     # unquoted fields
       )
   (?=,|\\z)        # ensure we are at field's end
   END_PARSER
   p ascii_str.encoding
   ascii_re = Regexp.new(ascii_str)
   p ascii_re.encoding

   sjis_str = ascii_str.encode("SJIS")
   p sjis_str.encoding
   sjis_re = Regexp.new(sjis_str)
   p sjis_re.encoding

prints:

   #<Encoding:US-ASCII>
   #<Encoding:US-ASCII>
   #<Encoding:Shift_JIS>
   #<Encoding:US-ASCII>
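
One possible explanation, sketched below, is that a pattern containing only ASCII characters is left encoding-flexible rather than pinned to the String's encoding. The sketch assumes a later Ruby where `Regexp::FIXEDENCODING` and `Regexp#fixed_encoding?` are available; the short pattern and the variable names are just for illustration and are not part of the script above.

```ruby
# ASCII-only pattern, transcoded to Shift_JIS like sjis_str above.
pattern = 'a(?:,|\z)'.encode("Shift_JIS")

# Default: the Regexp does not commit to the String's encoding.
flexible = Regexp.new(pattern)

# Assumption: passing Regexp::FIXEDENCODING asks Regexp.new to keep
# the encoding of the String it was built from.
fixed = Regexp.new(pattern, Regexp::FIXEDENCODING)

p flexible.encoding
p flexible.fixed_encoding?
p fixed.encoding
p fixed.fixed_encoding?

# The flexible Regexp can still be matched against a Shift_JIS String.
p flexible =~ "a,b".encode("Shift_JIS")
```

If that holds, the US-ASCII result is less a bug than a sign that the Regexp can be applied to any ASCII-compatible String.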

I tried to test with UTF-16 as well, since I think that's a good edge  
case.  However, we don't seem to have a converter for that:

   $ ruby_dev ~/Desktop/regexp_encoding.rb
   #<Encoding:US-ASCII>
   #<Encoding:US-ASCII>
   /Users/james/Desktop/regexp_encoding.rb:15:in `encode': code converter not found (US-ASCII to UTF-16) (Encoding::NoConverter)
   	from /Users/james/Desktop/regexp_encoding.rb:15:in `<main>'

I guess that means I need to be using Iconv anyway, to increase the  
number of encodings I can support.  Right?

James Edward Gray II