I was playing around with the RMail package and missed RFC 2047
support. I came across the "module Rfc2047" in
<20031204151316.GC849@jupp%gmx.de>
but noticed the following:

In the regex to discover encoded words:

|   WORD = %r{=\?([!#$%&'*+-/0-9A-Z\\^\`a-z{|}~]+)\?([BbQq])\?([!->@-~]+)\?=} # :nodoc:

I had to change % to \% to get it to run. Maybe that's just Cygwin.
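
A possible explanation, stated as an assumption and not verified on
that Cygwin build: inside a %r{} literal the sequence #$ can be read as
the start of a global-variable interpolation, as in "#$0". If so, the
character to escape is # rather than %. Note also that $\ is itself a
global (the output record separator), so #$\% may end up interpolating
it instead of matching # and $ literally. A minimal sketch, with
TOKEN_CHARS as a hypothetical name for just the charset part of WORD:

```ruby
# Hypothetical constant: only the charset token class of WORD, with the
# '#' escaped so that "#$" cannot be taken as interpolation.
TOKEN_CHARS = %r{[!\#$%&'*+-/0-9A-Z\\^\`a-z{|}~]+}

# Each of these should be matched as literal characters.
['#', '$', '%', 'ISO-8859-1'].each do |s|
  puts "#{s.inspect} => #{TOKEN_CHARS.match(s) ? 'match' : 'no match'}"
end
```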

The second thing is that the module doesn't correctly handle the
"encoded-word linear-white-space encoded-word" sequence, where all
the white space between the two words should be deleted (RFC 2047,
section 6.2).

So I added a regex to delete this whitespace before further processing:

> module Rfc2047
> 
>   WORD = %r{=\?([!#$\%&'*+-/0-9A-Z\\^\`a-z{|}~]+)\?([BbQq])\?([!->@-~]+)\?=} # :nodoc:
>|   WORDSEQ = %r{(=\?[!#$\%&'*+-/0-9A-Z\\^\`a-z{|}~]+\?[BbQq]\?[!->@-~]+\?=)\s*(=\?[!#$\%&'*+-/0-9A-Z\\^\`a-z{|}~]+\?[BbQq]\?[!->@-~]+\?=)}

[Comment skipped]

>   def Rfc2047.decode_to(target, from)
>|     from.gsub!(WORDSEQ, '\1\2')
> 
>     out = from.gsub(WORD) do
>       |word|
>       charset, encoding, text = $1, $2, $3

It works so far, but I wonder whether '\s*' is the correct expression 
and whether there is a more efficient way to do this.
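
On both questions, a sketch under stated assumptions rather than a
tested patch to the module. Because WORDSEQ consumes both encoded words
and gsub matches cannot overlap, a run of three words loses only every
other gap in one pass ("A B C" becomes "AB C"). A lookahead for the
second word avoids that, and a non-bang gsub leaves the caller's string
unmodified. \s+ is enough, since a zero-width gap needs no rewriting;
\s is slightly broader than the linear-white-space of RFC 2047 (it also
takes form feed and vertical tab), which should be harmless here.
ENCODED_WORD, WORD_GAP and strip_encoded_word_gaps are hypothetical
names, and ENCODED_WORD is a simplified stand-in for WORD.

```ruby
# Simplified stand-in for WORD, without capture groups (an assumption,
# looser than the character classes used in the module).
ENCODED_WORD = /=\?[^\s?]+\?[BbQq]\?[^\s?]+\?=/

# The second word is only looked at, not consumed, so it can serve as
# the first word of the next match.
WORD_GAP = /(#{ENCODED_WORD})\s+(?=#{ENCODED_WORD})/

# Returns a copy with the white space between adjacent encoded words
# removed; the argument itself is left untouched.
def strip_encoded_word_gaps(from)
  from.gsub(WORD_GAP, '\1')
end

header = "=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=\r\n\t=?ISO-8859-1?Q?c?= plain"
puts strip_encoded_word_gaps(header)
```

In decode_to this would replace the gsub! line with something like
from = strip_encoded_word_gaps(from), assuming nothing else relies on
the argument being modified in place.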


I also observed that decoding non-Western character sets (anything
from Win-1251 to Big5) to UTF-8 didn't work. Does anybody already
suspect why, or do I have to track down the error further?
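
One way to narrow this down, assuming a Ruby that has String#encode
(1.9 or later), which is not necessarily what the module uses: decode
a single B-encoded payload by hand and check whether the charset
conversion alone succeeds. If it does, the problem is more likely in
how the module passes the charset name or the bytes along.
decode_b_word_to_utf8 is a hypothetical helper, not part of Rfc2047.

```ruby
# Hypothetical helper, not part of the Rfc2047 module. Assumes Ruby >= 1.9.
def decode_b_word_to_utf8(charset, text)
  raw = text.unpack('m').first                 # Base64 text -> raw bytes
  raw.force_encoding(charset).encode('UTF-8')  # reinterpret, then transcode
end

# Round trip with a Windows-1251 sample built on the spot.
sample  = "\u041F\u0440\u0438\u0432\u0435\u0442"
payload = [sample.encode('Windows-1251')].pack('m0')
puts decode_b_word_to_utf8('Windows-1251', payload) == sample
```

String#encode raises an exception when the bytes are not valid in the
named charset, which makes a wrong charset label easier to spot than
silently garbled output.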
-- 
Oliver Cromm