Hi,

Mikel Lindsaar wrote:
> I don't really want to set the regexp to UTF-8 or something and then
> transliterate the match strings as that just isn't going to scale I
> think when you are talking about emails which can have almost anything
> in them, and making a regexp for every encoding type also isn't the
> solution.

You should set the regexp's encoding to US-ASCII or ASCII-8BIT.

> The only other solution I can think of is going through TMail and
> making all encodings internal to TMail one type (say UTF-8) and then
> transliterating all input and output to match.  But I am not totally
> sure what I will run into on that, as while I understand some of the
> issues of encodings and charactersets, I am by no means an expert on
> the subject.

Yes, you should make all encodings internal to TMail one type: ASCII-8BIT.

When you do
  str.gsub(/\n|\r\n|\r/) { "\r\n" }
you may think you are working with a character string, but that is
wrong: you are working with a byte string.

As you said, the alternative is a regexp for every encoding type.
Having to handle every encoding type means you are working at the level
beneath characters: bytes.

So set the encoding to ASCII-8BIT before you work with bytes, and set
the suitable encoding before you work with characters.

  require 'nkf'

  str = NKF.nkf("-j", "\u{3042 3044 3046}")  # convert to ISO-2022-JP
  enc = str.encoding                         # remember the real encoding
  str.force_encoding(Encoding::ASCII_8BIT)   # treat the string as bytes
  true if /\A\e/ =~ str                      # byte-level match on ESC
  str.force_encoding(enc)                    # back to characters

-- 
NARUSE, Yui  <naruse / airemix.jp>
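
The same switch-to-bytes-and-back pattern can be wrapped in small helpers. This is only a sketch: normalize_newlines and starts_with_escape? are hypothetical names, not part of TMail, and the sketch assumes an encoding in which the CR and LF bytes occur only as line breaks (true for ASCII-compatible encodings and ISO-2022-JP, not for UTF-16 or UTF-32).

```ruby
# Hypothetical helpers (not TMail API): do byte-level work on a binary
# view of the string, then restore the original encoding label.

# Normalize every line break to CRLF, working on bytes.
# Assumes CR/LF bytes only ever mean line breaks in str's encoding.
def normalize_newlines(str)
  enc = str.encoding
  bytes = str.dup.force_encoding(Encoding::ASCII_8BIT)  # bytes, not characters
  bytes.gsub!(/\r\n|\r|\n/, "\r\n")                     # nil if nothing matched; bytes is still valid
  bytes.force_encoding(enc)                             # relabel as characters
end

# Check whether the first byte is ESC, regardless of the string's encoding.
def starts_with_escape?(str)
  bytes = str.dup.force_encoding(Encoding::ASCII_8BIT)
  /\A\e/.match?(bytes)
end
```

Both helpers work on a copy, so the caller's string keeps its encoding even if the byte-level step raises.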