As part of my ongoing process to make CSV m17n savvy, I need an
encoding-safe Regexp.escape().  I'm currently using basically this code:

   @re_esc   =   "\\".encode(@encoding)
   @re_chars =   %w[ \\ . [ ] ^ $ ?
                     *  + { } ( ) | ].map { |s| s.encode(@encoding) }

   # ...

   def escape_re(str)
     str.chars.map { |c| @re_chars.include?(c) ? @re_esc + c : c }.join
   end

First question:  are there logic flaws there, or is escaping a string
for safe use in a Regexp really that simple?
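
To make the question concrete, here is a minimal, self-contained sketch of the same escaping logic run against two encodings.  The ReEscaper class name and the sample input are hypothetical, made up for this illustration; only the instance variables and escape_re() come from the code above.

```ruby
# Hypothetical wrapper class, used only for this illustration.
class ReEscaper
  def initialize(encoding)
    @encoding = encoding
    # The escape character and the special characters, transcoded once.
    @re_esc   = "\\".encode(@encoding)
    @re_chars = %w[ \\ . [ ] ^ $ ? * + { } ( ) | ].map { |s| s.encode(@encoding) }
  end

  # Prefix every regular expression metacharacter with the escape character.
  def escape_re(str)
    str.chars.map { |c| @re_chars.include?(c) ? @re_esc + c : c }.join
  end
end

# Try one ASCII-compatible encoding and one that is not.
%w[UTF-8 UTF-16LE].each do |name|
  escaper = ReEscaper.new(Encoding.find(name))
  escaped = escaper.escape_re("a.b|c".encode(name))
  # Transcode back to UTF-8 only so the result can be printed readably.
  puts "#{name}: #{escaped.encode('UTF-8')} (#{escaped.encoding})"
end
```

The comparison in include?() only works because the input and the stored characters share one encoding, so this sketch assumes the caller passes a string already in @encoding.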

I decided to check how well this does on Ruby's built-in
encodings.  I wrote and ran this simple script:

$ cat re_chars.rb
#!/usr/bin/env ruby -w
# encoding: UTF-8

Encoding.list.each do |encoding|
   begin
     esc = "\\".encode(encoding)
     re_chars = %w[ \\ . [ ] ^ $ ? * + { } ( ) | ].map { |s| s.encode(encoding) }
   rescue
     puts "Cannot convert to #{encoding}"
   end
end
$ ruby_dev re_chars.rb
Cannot convert to ISO-2022-JP-2
Cannot convert to UTF-7

It looks like I have at least two problem encodings.  I need to come
up with a strategy for these.  Some options I am considering:

* Allow these to throw encoding errors and document that they are not
supported.  I must admit that I've never encountered these encodings
in the wild, and I suspect they are not very common.  That leads
me to question the effort spent on them when it seems I am
successfully supporting 81 other encodings.  Of course, it would be
even cooler to claim I support all of Ruby's encodings...

* Transcode incoming data for those two encodings to UTF-8.  I haven't
checked if this is even possible, but I hope it would be for at least
UTF-7.  This feels like giving up though.  I've come so far without
resorting to this...

* Disable regular expression escaping for these encodings and hope for
the best.  I doubt this will work though.  If I can't convert my ASCII
regular expressions to the encoding, CSV is going to choke on them anyway.
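
For the transcoding option, one way to find out whether it is even possible is to ask Ruby if it can build a converter to UTF-8.  This is only a sketch: the transcodable_to_utf8? helper is a hypothetical name, and I am assuming Encoding::Converter.new raises Encoding::ConverterNotFoundError when no conversion path exists.

```ruby
# Hypothetical helper: true if Ruby can build a converter from the
# named encoding to UTF-8, false if no conversion path is available.
def transcodable_to_utf8?(name)
  Encoding::Converter.new(name, "UTF-8")
  true
rescue Encoding::ConverterNotFoundError
  false
end

# The two encodings the test script reported as problems.
%w[ISO-2022-JP-2 UTF-7].each do |name|
  verdict = transcodable_to_utf8?(name) ? "can" : "cannot"
  puts "#{name} #{verdict} be transcoded to UTF-8"
end
```

The answer may differ between Ruby builds, so the check is worth running on the interpreter that will actually be used.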

Yeah, as you can see, I'm out of good ideas.  Please help.
Suggestions very welcome.

One last question:  any chance Ruby will eventually provide an
encoding-safe Regexp.escape() and allow me to skip this step?  :)

James Edward Gray II