On Oct 23, 2011, at 6:12 AM, Perry Smith wrote:
> And there is a third problem (which is probably a set of problems).
> In my application, all the data actually starts off as various EBCDIC
> code pages. (http://bit.ly/rtTO8F).  Using ICU
> (http://site.icu-project.org/), I convert these to UTF-8 strings.  I
> store these in a PostgreSQL database (9.0.4) that is set up with UTF-8
> encoding.  But STILL, frequently, something creates strings that are not
> UTF-8 strings.  As previously stated, I've set all my files to UTF-8
> coding as well as set -KU but there are still ways for things to get
> botched.

This is the class of errors I want to know about.

In the course of my RDoc work I was doing similar things (some files in
7-bit ASCII, some in UTF-8, some in another encoding) and outputting
everything to UTF-8.

I found that for some combinations of input encodings and arguments the
output encoding of the string was not what I expected (not UTF-8).  I
filed some bugs and these got fixed for Ruby 1.9.3, but I still see a
few reports of issues against RDoc that I can't reproduce.
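
One way to catch this kind of thing early is to check both the encoding
tag and the byte validity wherever strings enter or leave your code.
Here is a minimal sketch; assert_utf8! is a hypothetical helper name of
my own, not something in Ruby or RDoc:

```ruby
# Hypothetical helper: fail loudly if a string is not tagged UTF-8
# or is tagged UTF-8 but contains invalid byte sequences.
def assert_utf8!(str, label = "string")
  unless str.encoding == Encoding::UTF_8
    raise ArgumentError, "#{label} is tagged #{str.encoding}, expected UTF-8"
  end
  unless str.valid_encoding?
    raise ArgumentError, "#{label} is tagged UTF-8 but contains invalid bytes"
  end
  str
end

assert_utf8!("caf\u00e9") # valid UTF-8, returns the string

begin
  # A lone Latin-1 byte mislabeled as UTF-8 (made-up input).
  assert_utf8!("caf\xE9".dup.force_encoding("UTF-8"), "db value")
rescue ArgumentError => e
  puts e.message
end
```

The two checks are separate on purpose: a string can carry the UTF-8 tag
and still hold bytes that are not valid UTF-8.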

If you come across issues like this it would be super helpful if you
could create a test case and file an issue on Redmine.

I filed this issue; you can use it as an example:

http://redmine.ruby-lang.org/issues/4380
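
A test case does not need to be big.  As a rough sketch (the inputs here
are made up for illustration and are not taken from that issue), a
script that prints the Ruby version plus the encoding, validity, and
bytes of each input and of the result is usually enough to reproduce
from:

```ruby
# Reproduction sketch: report encoding, validity, and raw bytes.
# describe is a hypothetical helper used only for this report.
def describe(str)
  hex = str.bytes.map { |b| "%02X" % b }.join(" ")
  "#{str.encoding} valid=#{str.valid_encoding?} bytes=#{hex}"
end

ascii  = "abc".dup.force_encoding("US-ASCII")
latin1 = "caf\xE9".dup.force_encoding("ISO-8859-1")

# The operation under test: combine the inputs, then transcode to UTF-8.
result = (ascii + latin1).encode("UTF-8")

puts "ruby #{RUBY_VERSION}"
puts "input 1: #{describe(ascii)}"
puts "input 2: #{describe(latin1)}"
puts "result:  #{describe(result)}"
```

Including the raw bytes matters because two strings can print the same
and still differ in encoding tag or byte content.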

> And my whole point here is that what Ruby has ended up doing is making
> simple libraries damn near impossible to write if you really really
> really really want to do things properly.  Any library that concatenate
> any strings is open to mistakes.

And where it's a bug like #4380, if we can have small test cases we
should be able to fix them.
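
For the concatenation case that is not a bug, here is a small sketch of
what goes wrong and one defensive way around it.  concat_utf8 is a
hypothetical helper name, and it assumes every part is correctly tagged
with its real encoding:

```ruby
utf8   = "caf\u00e9"                                # UTF-8
latin1 = "caf\xE9".dup.force_encoding("ISO-8859-1") # same text, Latin-1 bytes

# Concatenating non-ASCII strings in different encodings raises.
error = begin
  utf8 + latin1
  nil
rescue Encoding::CompatibilityError => e
  e
end
puts error.class

# Hypothetical helper: transcode every part to UTF-8 before joining.
# This only helps if each part's encoding tag is accurate.
def concat_utf8(*parts)
  parts.map { |s| s.encode("UTF-8") }.join
end

combined = concat_utf8(utf8, latin1)
puts combined.encoding
```

Note that when one side contains only ASCII characters, Ruby does not
raise, which is why this sort of problem tends to show up only with
real-world data.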