Hi,

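Currently, REXML prefers the iconv module for converting encodings.
Iconv is useful for general purposes but, to be frank, I think it is
only half-done.  In many cases, a dedicated conversion engine would
be much preferable when one is available, so this patch makes REXML
try rexml/encodings/<ENCODING>.rb first and fall back to ICONV only
if that fails.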
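The lookup order described above can be sketched outside REXML: try a dedicated converter for the requested name first, fall back to the generic ICONV one, and report the failure only when every candidate has failed. This is an illustration, not REXML's code: `LOADERS` and `find_converter` are hypothetical stand-ins for `require File.join("rexml", "encodings", "#{enc}.rb")` plus `Encoding.apply`, and the sketch rescues only `LoadError`, whereas the patch rescues `LoadError, Exception`.

```ruby
# Hypothetical sketch of the lookup order: a dedicated converter first,
# the generic ICONV one second. LOADERS stands in for the files under
# rexml/encodings/; raising LoadError models a missing file or library.
LOADERS = {
  "EUC-JP" => -> { :eucjp_converter },
  "ICONV"  => -> { raise LoadError, "cannot load such file -- iconv" },
}

def find_converter(enc)
  # Same whitelist as the patch (minus its /n flag): word chars and hyphens.
  raise ArgumentError, "Bad encoding name #{enc}" unless /\A[\w-]+\z/ =~ enc
  name = enc.upcase
  err = nil
  [name, "ICONV"].each do |candidate|
    begin
      loader = LOADERS.fetch(candidate) do
        raise LoadError, "cannot load such file -- #{candidate}"
      end
      return loader.call # the first candidate that loads wins
    rescue LoadError => e
      err = e # remember the failure and try the next candidate
    end
  end
  raise ArgumentError, "No decoder found for encoding #{name} (#{err.message})"
end

p find_converter("euc-jp") # prints :eucjp_converter
```

Because the name is checked before any candidate is tried, a value such as "../etc/passwd" is rejected up front instead of being joined into a require path.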

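Also, the nkf module, a bundled library, can already handle Japanese
characters in UTF-8 as well as in other encodings, so I'd like REXML
to use nkf instead of depending on uconv, which is not bundled.  With
this patch, EUC-JP.rb and SHIFT-JIS.rb still use uconv if it can be
loaded, and fall back to nkf otherwise.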
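For reference, the nkf option strings in the patch read as follows: an upper-case letter names the input encoding (E = EUC-JP, S = Shift_JIS, W = UTF-8), a lower-case letter names the output encoding, and -m0 turns off MIME decoding. A small round trip, assuming `require 'nkf'` works in your Ruby and using `String#encode` (Ruby 1.9 and later) only as an independent cross-check:

```ruby
require 'nkf'

# nkf option strings used by the patch: a capital letter names the input
# encoding (E = EUC-JP, S = Shift_JIS, W = UTF-8), a lower-case letter the
# output encoding, and -m0 disables MIME decoding.
EUCTOU8  = '-Ewm0'
U8TOEUC  = '-Wem0'
SJISTOU8 = '-Swm0'
U8TOSJIS = '-Wsm0'

text = "日本語のテキスト" # UTF-8 input

euc  = NKF.nkf(U8TOEUC, text)  # UTF-8 -> EUC-JP
sjis = NKF.nkf(U8TOSJIS, text) # UTF-8 -> Shift_JIS

# Cross-check the bytes against Ruby's own transcoder.
p euc.bytes == text.encode("EUC-JP").bytes
p sjis.bytes == text.encode("Shift_JIS").bytes

# Round trip back to UTF-8.
p NKF.nkf(EUCTOU8, euc).bytes == text.bytes
p NKF.nkf(SJISTOU8, sjis).bytes == text.bytes
```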


Index: ruby-ruby_1_8/lib/rexml/encoding.rb
===================================================================
RCS file: /cvs/ruby/src/ruby/lib/rexml/encoding.rb,v
retrieving revision 1.5.2.1
diff -U2 -p -r1.5.2.1 encoding.rb
--- ruby-ruby_1_8/lib/rexml/encoding.rb	19 May 2005 03:51:52 -0000	1.5.2.1
+++ ruby-ruby_1_8/lib/rexml/encoding.rb	18 Oct 2005 02:10:51 -0000
@@ -26,26 +26,20 @@ module REXML
         $VERBOSE = false
         return if defined? @encoding and enc == @encoding
-        if enc and enc != UTF_8
-          @encoding = enc.upcase
+        if enc
+          raise ArgumentError, "Bad encoding name #{enc}" unless /\A[\w-]+\z/n =~ enc
+          @encoding = enc.upcase.untaint
+        else
+          @encoding = UTF_8
+        end
+        err = nil
+        [@encoding, "ICONV"].each do |enc|
           begin
-            require 'rexml/encodings/ICONV.rb'
-            Encoding.apply(self, "ICONV")
+            require File.join("rexml", "encodings", "#{enc}.rb")
+            return Encoding.apply(self, enc)
           rescue LoadError, Exception => err
-            raise ArgumentError, "Bad encoding name #@encoding" unless @encoding =~ /^[\w-]+$/
-            @encoding.untaint
-            enc_file = File.join( "rexml", "encodings", "#@encoding.rb" )
-            begin
-              require enc_file
-              Encoding.apply(self, @encoding)
-            rescue LoadError
-              puts $!.message
-              raise ArgumentError, "No decoder found for encoding #@encoding. Please install iconv."
             end
           end
-        else
-          @encoding = UTF_8
-          require 'rexml/encodings/UTF-8.rb'
-          Encoding.apply(self, @encoding)
-        end
+        puts err.message
+        raise ArgumentError, "No decoder found for encoding #@encoding. Please install iconv."
       ensure
         $VERBOSE = old_verbosity
Index: ruby-ruby_1_8/lib/rexml/encodings/EUC-JP.rb
===================================================================
RCS file: /cvs/ruby/src/ruby/lib/rexml/encodings/EUC-JP.rb,v
retrieving revision 1.6.2.1
diff -U2 -p -r1.6.2.1 EUC-JP.rb
--- ruby-ruby_1_8/lib/rexml/encodings/EUC-JP.rb	19 May 2005 03:51:53 -0000	1.6.2.1
+++ ruby-ruby_1_8/lib/rexml/encodings/EUC-JP.rb	31 Oct 2005 04:31:52 -0000
@@ -1,12 +1,27 @@
-require 'uconv'
-
 module REXML
   module Encoding
-    def decode_eucjp(str)
-      Uconv::euctou8(str)
-    end
+    begin
+      require 'uconv'
+
+      def decode_eucjp(str)
+        Uconv::euctou8(str)
+      end
+
+      def encode_eucjp content
+        Uconv::u8toeuc(content)
+      end
+    rescue LoadError
+      require 'nkf'
+
+      EUCTOU8 = '-Ewm0'
+      U8TOEUC = '-Wem0'
 
-    def encode_eucjp content
-      Uconv::u8toeuc(content)
+      def decode_eucjp(str)
+        NKF.nkf(EUCTOU8, str)
+      end
+
+      def encode_eucjp content
+        NKF.nkf(U8TOEUC, content)
+      end
     end
 
Index: ruby-ruby_1_8/lib/rexml/encodings/SHIFT-JIS.rb
===================================================================
RCS file: /cvs/ruby/src/ruby/lib/rexml/encodings/SHIFT-JIS.rb,v
retrieving revision 1.2.2.3
diff -U2 -p -r1.2.2.3 SHIFT-JIS.rb
--- ruby-ruby_1_8/lib/rexml/encodings/SHIFT-JIS.rb	19 May 2005 10:08:11 -0000	1.2.2.3
+++ ruby-ruby_1_8/lib/rexml/encodings/SHIFT-JIS.rb	31 Oct 2005 04:31:52 -0000
@@ -1,12 +1,27 @@
-require 'uconv'
-
 module REXML
   module Encoding
-    def decode_sjis content
-      Uconv::sjistou8(content)
-    end
+    begin
+      require 'uconv'
+
+      def decode_sjis content
+        Uconv::sjistou8(content)
+      end
+
+      def encode_sjis(str)
+        Uconv::u8tosjis(str)
+      end
+    rescue LoadError
+      require 'nkf'
+
+      SJISTOU8 = '-Swm0'
+      U8TOSJIS = '-Wsm0'
 
-    def encode_sjis(str)
-      Uconv::u8tosjis(str)
+      def decode_sjis(str)
+        NKF.nkf(SJISTOU8, str)
+      end
+
+      def encode_sjis content
+        NKF.nkf(U8TOSJIS, content)
+      end
     end
 
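The begin/rescue LoadError block that EUC-JP.rb and SHIFT-JIS.rb now wrap around their method definitions can be tried on its own. A minimal sketch, assuming nkf can be required: the module name `JapaneseCodec` and the `BACKEND` constant are hypothetical, and the sketch defines module functions rather than the instance methods that REXML::Encoding mixes into its objects.

```ruby
# Hypothetical module illustrating the patch's pattern: define the
# converters with uconv when it loads, otherwise with the bundled nkf.
module JapaneseCodec
  begin
    require 'uconv' # optional extension; usually not installed
    BACKEND = :uconv

    def self.decode_eucjp(str)
      Uconv::euctou8(str)
    end

    def self.encode_eucjp(str)
      Uconv::u8toeuc(str)
    end
  rescue LoadError
    require 'nkf'
    BACKEND = :nkf

    def self.decode_eucjp(str)
      NKF.nkf('-Ewm0', str) # EUC-JP in, UTF-8 out, no MIME decoding
    end

    def self.encode_eucjp(str)
      NKF.nkf('-Wem0', str) # UTF-8 in, EUC-JP out, no MIME decoding
    end
  end
end

puts JapaneseCodec::BACKEND
euc = JapaneseCodec.encode_eucjp("日本語")
puts JapaneseCodec.decode_eucjp(euc).bytes == "日本語".bytes
```

Because the methods are defined inside the begin/rescue at load time, callers never need to know which backend was chosen; only the file that defines them does.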
-- Nobu Nakada