--5mCyUwZo2JvN/JJP
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

There were a few posts about this, so perhaps somebody will find it
useful.

RFC 2047 is the MIME standard that describes how to use non-ASCII
character sets in internet mail. This library takes the approach that if
you give it a string that might have RFC 2047 encoded words in it, and
tell it what character set you'd like the string to be in, it will
convert it (using iconv).
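
As a rough illustration of that approach (not the attached code itself),
here is a minimal sketch. It assumes Ruby's built-in String#encode in place
of iconv, and the module name Rfc2047Sketch and its simplified pattern are
hypothetical, chosen only for this example.

```ruby
# Hypothetical sketch of the decode-then-convert idea, assuming
# String#encode (Ruby 1.9 or later) stands in for iconv.
module Rfc2047Sketch
  # Simplified pattern for an encoded word: =?charset?encoding?text?=
  WORD = /=\?([^?\s]+)\?([BbQq])\?([^?\s]+)\?=/

  def self.decode_to(target, from)
    from.gsub(WORD) do |word|
      charset, encoding, text = $1, $2, $3

      raw = if encoding.upcase == 'B'
              # Base64 decode.
              text.unpack('m*')[0]
            else
              # In the RFC 2047 "Q" encoding an '_' stands for a space.
              text.tr('_', ' ').unpack('M*')[0]
            end

      begin
        # Label the decoded bytes with their charset, then convert.
        raw.force_encoding(charset).encode(target)
      rescue ArgumentError, EncodingError
        # Unknown charset or unconvertible text: keep the encoded word.
        word
      end
    end
  end
end

puts Rfc2047Sketch.decode_to('utf-8', '=?ISO-8859-1?Q?Andr=E9?= Pirard')
```

As in the attached module, a word that cannot be converted to the target
character set is left in its encoded form.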

Sam

p.s. Matt, I think this would be a useful addition to rmail, please
steal it! If you do, I'll implement encoding.


--5mCyUwZo2JvN/JJP
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="rfc2047.rb"

# $Id: rfc2047.rb,v 1.2 2003/04/14 00:17:18 sam Exp $
#
# An implementation of RFC 2047 decoding.
#
# This module depends on the iconv library by Nobuyoshi Nakada, which I've 
# heard may be distributed as a standard part of Ruby 1.8.
#
# Copyright (c) Sam Roberts <sroberts / uniserve.com> 2003
#
# This file is distributed under the same terms as Ruby.

require 'iconv'

module Rfc2047

  WORD = %r{=\?([!#$%&'*+-/0-9A-Z\\^\`a-z{|}~]+)\?([BbQq])\?([!->@-~]+)\?=} # :nodoc:

  # Decodes a string, +from+, containing RFC 2047 encoded words into a target
  # character set, +target+. See iconv_open(3) for information on the
  # supported target encodings. If one of the encoded words cannot be
  # converted to the target encoding, it is left in its encoded form.
  def Rfc2047.decode_to(target, from)
    out = from.gsub(WORD) do
      |word|
      charset, encoding, text = $1, $2, $3
      
      # B64 or QP decode, as necessary:
      case encoding
        when 'b', 'B'
          text = text.unpack('m*')[0]

        when 'q', 'Q'
          # RFC 2047 has a variant of quoted printable where a ' ' character
          # can be represented as an '_', rather than =20, so convert
          # any of these that we find before doing the QP decoding.
          text = text.tr("_", " ")
          text = text.unpack('M*')[0]

        # Don't need an else, because no other values can be matched in a
        # WORD.
      end

      # Convert
      #
      # Remember: Iconv.open(to, from)
      begin
        text = Iconv.open(target, charset) {|i| i.iconv(text)}
      rescue Errno::EINVAL, Iconv::IllegalSequence
        # Replace with the entire matched encoded word, a NOOP.
        text = word
      end
    end
  end
end


--5mCyUwZo2JvN/JJP
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="test.rb"
Content-Transfer-Encoding: quoted-printable

#!/usr/bin/ruby -w

require 'rfc2047'
require 'test/unit'

=begin

From RFC 2047:

8. Examples

   The following are examples of message headers containing 'encoded-
   word's:

   =?US-ASCII?Q?Keith_Moore?= <moore / cs.utk.edu>

   =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld / dkuug.dk>

   =?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD / vm1.ulg.ac.be>

   =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?= =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=

   =?ISO-8859-1?Q?Olle_J=E4rnefors?= <ojarnef / admin.kth.se>

   =?ISO-8859-1?Q?Patrik_F=E4ltstr=F6m?= <paf / nada.kth.se>

   Nathaniel Borenstein <nsb / thumper.bellcore.com> (=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?=)

=end

class TestVcard < Test::Unit::TestCase

  def test_cases

    # Test Cases:
    #
    # Hash {
    #   encoded-string =>
    #      Hash {
    #         conversion_to_do => result_string
    #      }
    # }
    cases = {
      '=?US-ASCII?Q?Keith_Moore?= <moore / cs.utk.edu>' => {
        'utf-8' => 'Keith Moore <moore / cs.utk.edu>',
        'ascii' => 'Keith Moore <moore / cs.utk.edu>',
        'us-ascii' => 'Keith Moore <moore / cs.utk.edu>',
      },

      '=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld / dkuug.dk>' => {
        'iso-8859-1' => "Keld J\xF8rn Simonsen <keld / dkuug.dk>",
        'utf-8' => "Keld J\303\270rn Simonsen <keld / dkuug.dk>",
        'ascii' => "=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld / dkuug.dk>",
      },

      '=?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD / vm1.ulg.ac.be>' => {
        'iso-8859-1' => "Andr\xe9 Pirard <PIRARD / vm1.ulg.ac.be>",
      },

      '=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?= =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=' => {
        'iso-8859-1' => 'If you can read this yo u understand the example.',
      },

      '=?ISO-8859-1?Q?Olle_J=E4rnefors?= <ojarnef / admin.kth.se>' => {
        'iso-8859-1' => "Olle J\xE4rnefors <ojarnef / admin.kth.se>",
      },

      '=?ISO-8859-1?Q?Patrik_F=E4ltstr=F6m?= <paf / nada.kth.se>' => {
        'iso-8859-1' => "Patrik F\xE4ltstr\xF6m <paf / nada.kth.se>",
      },

      'Nathaniel Borenstein <nsb / thumper.bellcore.com> (=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?=)' => {
        'iso-8859-8' => "Nathaniel Borenstein <nsb / thumper.bellcore.com> (\355\345\354\371 \357\341 \351\354\350\364\360)",
        'utf-8'      => "Nathaniel Borenstein <nsb / thumper.bellcore.com> (\327\235\327\225\327\234\327\251 \327\237\327\221 \327\231\327\234\327\230\327\244\327\240)",
      },

    }

    cases.each do
      |src, conversions|

      conversions.each do
        |toset, expected|

        dst = Rfc2047.decode_to(toset, src)

        puts "#{src} -- (#{toset}) --> #{dst.inspect}"

        assert_equal(expected, dst)
      end
    end
  end
end


--5mCyUwZo2JvN/JJP--