[Tobes <tobin / tobinharris.com>, 2007-01-04 16.55 CET]
> Thanks for the links and the advice Carlos.
>
> I'm actually using Ruby FPDF (http://zeropluszero.com/software/fpdf/),
> and couldn't see a dependency on PDF::Writer. However, using iconv to
> convert to UTF-16 gives a different result
> http://www.tobinharris.com/media/mtq38_utf16.jpg.
>
> Do you know of any tools that will let me reliably inspect the data in
> the database to see what encoding the information is being stored in?
> MySQL was set up to store UTF-8, and since the text data is sent from a
> UTF-8 formatted web page, I assumed this would be the case. However,
> I'm thinking that it wasn't UTF-8 at all, and so need to know what the
> original encoding is?
>
> I'm also definitely lacking some knowledge in this area, so any
> pointers to resources/tools would be appreciated.

Hi.

I assumed you were using "railspdfplugin"
http://rubyforge.org/projects/railspdfplugin/
which is the first Google result for RPDF, and depends on PDF::Writer.

I can't access the Ruby FPDF page right now ("502 Bad Gateway" error
message), but if it is based on PHP's FPDF, then you just have to follow
the steps here:
http://www.fpdf.org/en/tutorial/tuto7.htm
(extrapolated to Ruby's FPDF, of course).

WRT the screenshot of your other message, there are two possibilities:

1. That application, the MySQL query tool, is not UTF-8 aware. So it
   interprets the 2 bytes of "ł" (197, 130) as 2 characters in some
   single-byte encoding (probably latin-1), which gives "Å" and an
   unprintable character. Your test line wasn't UTF-8 encoded at all.

2. The application is UTF-8 aware and the test line is in UTF-8, but the
   data from your web pages was already in UTF-8, you thought it wasn't,
   and you encoded it to UTF-8 a second time.

To test if a string is encoded in UTF-8, just examine its bytes

  p str.unpack("C*")

and see if the diacritic letters are encoded with 2 or more bytes
(UTF-8), or only one (iso-8859-*, cp*, etc.).
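A minimal sketch of that byte check, assuming Ruby 1.9 or later (for the
"\u" escape and String#encode). The helper name byte_values and the choice
of ISO-8859-2 as the single-byte encoding are made up for illustration;
they are not from your code or from FPDF:

```ruby
# Hypothetical helper: raw byte values of a string,
# the same thing as `p str.unpack("C*")`.
def byte_values(str)
  str.unpack("C*")
end

once   = "\u0142"                    # the letter "ł" as a UTF-8 string
single = once.encode("ISO-8859-2")   # same letter in a single-byte encoding

# Simulate double encoding: treat each UTF-8 byte as a latin-1 character
# (code point == byte value) and encode those characters as UTF-8 again.
twice = byte_values(once).pack("U*")

p byte_values(single)  # one byte per diacritic letter
p byte_values(once)    # two bytes: [197, 130]
p byte_values(twice)   # four bytes
```

Running the same helper on a string read straight from the database
should show which of the cases applies.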
(If you see *four*, then you encoded them twice :).

HTH. Good luck.