On Mon, 15 Sep 2008 04:51:55 +1000, James Gray <james / grayproductions.net> wrote:
>
>> Do you really need to convert encodings in CSV? I would have thought
>> that as long as your separator characters are in a compatible encoding
>> to the CSV data, everything should work without having to worry about
>> the encodings.
>
> I believe a conversion is required because:
>
> * I have to incorporate whatever separators they give me into my ASCII
> regular expressions. I'm not sure how I would do that without
> conversions if they gave me UTF-16 separators, for example.
> * I couldn't reasonably provide defaults without any transcoding. For
> example, a comma and quote are useless for UTF-16.
>

I think I understand what you are saying. You are right, you will need transcoding in some cases.

I suggest: say your regular expression (as a string) is "r". Use Encoding.compatible? to test whether the encoding of "r" is compatible with the input's encoding (the file's encoding if you are about to read from a file, or the string's encoding if the input is a string). If it is not, encode "r" to the input's encoding. This encoding may fail if, for some odd reason, the separator is a multi-byte UTF-16 character and the default encoding is ASCII, but then this is probably an error anyhow.

You will probably need to do something similar when building the regexp string in the first place, if it includes separators that the user can specify in any encoding.

By the way, this transcoding shouldn't be needed in many cases, as many character encodings are ASCII-compatible.

Hope this makes sense.

Mike.
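A minimal sketch of the check described above, assuming the regexp source is held in a String and the input is a String (for a file you would compare against the file's external encoding instead). The method name `pattern_for` and its parameters are hypothetical; `Encoding.compatible?` and `String#encode` are the real core methods.

```ruby
# Hypothetical helper: return the regexp source in an encoding usable
# against `input`, transcoding only when the two are not compatible.
def pattern_for(pattern, input)
  # Encoding.compatible? returns the common encoding, or nil if none exists.
  # An ASCII-only pattern is compatible with any ASCII-compatible input,
  # so no transcoding happens in that (common) case.
  if Encoding.compatible?(pattern, input)
    pattern
  else
    # E.g. a UTF-8 "," against UTF-16LE data. This can raise
    # Encoding::UndefinedConversionError when the pattern contains
    # characters the input's encoding cannot represent.
    pattern.encode(input.encoding)
  end
end

data = "a;b".encode("UTF-16LE")
sep  = pattern_for(";", data)
sep.encoding # => #<Encoding:UTF-16LE>
```

Trying to push a UTF-16 "é" separator onto US-ASCII input raises an error here, which matches the point above that such a combination is probably a mistake anyway.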