On Thu, 2009-04-23 at 11:37 +0900, James Gray wrote:
> On Apr 22, 2009, at 8:10 PM, Daly wrote:
> 
> > I fixed it by reading about encoding
> > and trial and error, so I'm left with a working solution, but not
> > knowing why it works on the Mac but not in Linux. Could someone please
> > explain?
> 
> I've been trying to cover character encodings with a heavy Ruby
> slant on my blog for just this reason:
> 
> http://blog.grayproductions.net/articles/understanding_m17n
> 
> My coverage is about 98% complete now, in case you want to browse a bit.

Hi James,

I was wondering if you wanted to add a note on how to deal with
potentially unknown character encodings.  This was one of the more
annoying problems that I hit when trying to use Iconv directly in Ruby
1.8.  In the past, I'd used the Mozilla character detection library (in
Java) when doing processing of XML in order to ensure that the file
encoding matched the declaration in the <?xml > header.

Fortunately, I recently found the rchardet gem, which is a port of this
library to Ruby, and it has helped me give Iconv more appropriate
encoding information.

Usage goes something like this:

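    require 'rubygems'   # needed on Ruby 1.8 to load gems
    require 'rchardet'
    require 'iconv'

    cd = CharDet.detect(text)   # text holds the raw input
    encoding = cd['encoding']
    puts "Reading detected encoding '#{encoding}' text with confidence: %.2f%%" % [cd['confidence'] * 100]
    iconv = Iconv.new("UTF-8", encoding)
    utf8_text = iconv.iconv(text)   # raises if the conversion fails
    puts "Conversion to UTF-8 successful."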
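
One thing the snippet above glosses over is what to do when the detector
cannot name an encoding, or names one with low confidence.  Here is a
rough sketch of a fallback, not something I'd call definitive.  It
assumes Ruby 1.9 or later and swaps Iconv for String#encode; the helper
name to_utf8, the ISO-8859-1 fallback and the 0.5 threshold are all
made up for illustration.  The detection hash is passed in so that the
sketch runs without the gem; in practice it would come from
CharDet.detect(text).

```ruby
# Hypothetical helper: convert text to UTF-8 given a detection result shaped
# like the hash CharDet.detect returns ('encoding' and 'confidence' keys).
# Assumes Ruby 1.9+ (String#encode); the fallback and threshold are arbitrary.
def to_utf8(text, detection, fallback = "ISO-8859-1", min_confidence = 0.5)
  name = detection['encoding']
  confidence = detection['confidence'].to_f
  # Fall back when the detector returned nothing or was not confident enough.
  name = fallback if name.nil? || confidence < min_confidence
  begin
    source = Encoding.find(name)
  rescue ArgumentError
    # The detector named an encoding this Ruby does not know about.
    source = Encoding.find(fallback)
  end
  # Relabel a copy of the raw bytes as the source encoding, then transcode.
  text.dup.force_encoding(source).encode("UTF-8", :invalid => :replace, :undef => :replace)
end

raw = "caf\xE9".b   # "cafe" with an e-acute, as ISO-8859-1 bytes
puts to_utf8(raw, { 'encoding' => 'ISO-8859-1', 'confidence' => 0.9 })
puts to_utf8(raw, { 'encoding' => nil, 'confidence' => 0.0 })
```

Both calls should yield the same UTF-8 string here, the second one only
because the fallback happens to match the input.  A wrong fallback would
still convert without error and quietly produce the wrong characters,
so the default is worth choosing to suit the data you expect.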

This time, I needed this sort of thing when trying to ensure I could
load arbitrary text files from unknown sources into GTK+ widgets.

I've actually rarely known the encoding of the input I was trying to
deal with unless it was the same as the system default, but maybe
that's just me... ;)

The referenced blog post looks really good.  Thanks for your efforts.

Cheers,

ast
-- 
Andrew S. Townley <ast / atownley.org>
http://atownley.org