On Oct 23, 2011, at 6:12 AM, Perry Smith wrote:
> And there is a third problem (which is probably a set of problems).  In my application, all the data actually starts off as various EBCDIC code pages. (http://bit.ly/rtTO8F).  Using ICU (http://site.icu-project.org/), I convert these to UTF-8 strings.  I store these in a PostgreSQL database (9.0.4) that is set up with UTF-8 encoding.  But STILL, frequently, something creates strings that are not UTF-8 strings.  As previously stated, I've set all my files to UTF-8 coding as well as set -KU but there are still ways for things to get botched.

This is the class of errors I want to know about.

In the course of my RDoc work I was doing similar things (some files in 7-bit ASCII, some in UTF-8, some in another encoding) and outputting everything to UTF-8.
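For illustration, here is a minimal sketch of that kind of normalization, assuming the input encoding of each chunk is already known. The helper name to_utf8 and the sample bytes are hypothetical, not taken from RDoc:

```ruby
# Hypothetical helper (not RDoc's code): tag raw bytes with the encoding
# they are known to be in, then transcode them to UTF-8.
def to_utf8(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
end

# "caf" followed by byte 0xE9, which is e-acute in ISO-8859-1.
latin1_bytes = "caf\xE9".dup.force_encoding(Encoding::ASCII_8BIT)

converted = to_utf8(latin1_bytes, Encoding::ISO_8859_1)

p converted.encoding         # the tag on the result
p converted.valid_encoding?  # whether the bytes match that tag
```

force_encoding only relabels the bytes, while encode actually transcodes them, so picking the wrong source encoding here produces either an exception or silently wrong text.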

I found that for some combinations of input encodings and arguments, the output encoding of the string was not what I expected (not UTF-8).  I filed some bugs and these got fixed for Ruby 1.9.3, but I still see a few reports of issues against RDoc that I can't reproduce.
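As one example of how a result can quietly stop being UTF-8, here is a small sketch of string concatenation with mixed encodings. This is not one of the bugs I filed, just the general behavior; the helper encoding_report and the variable names are hypothetical:

```ruby
# Hypothetical helper: summarize a string's encoding state.
def encoding_report(str)
  { :encoding => str.encoding, :valid => str.valid_encoding? }
end

ascii_utf8 = "abc".dup.force_encoding(Encoding::UTF_8)        # ASCII-only, tagged UTF-8
accented   = "caf\u00e9"                                      # non-ASCII UTF-8
raw_byte   = "\xE9".dup.force_encoding(Encoding::ASCII_8BIT)  # one high byte, tagged binary

# ASCII-only left side: concatenation succeeds, but the result takes
# the binary encoding of the right side instead of UTF-8.
surprise = ascii_utf8 + raw_byte
p encoding_report(surprise)

# Non-ASCII left side: the same operation raises instead.
mixed_error = nil
begin
  accented + raw_byte
rescue Encoding::CompatibilityError => e
  mixed_error = e
end
p mixed_error.class
```

Whether you get an exception or a differently tagged string depends on the data and not just on the code, which is why these problems can be hard to reproduce.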

If you come across issues like this, it would be super helpful if you could write a test case and file an issue on Redmine.

I filed this issue; you can use it as an example:

http://redmine.ruby-lang.org/issues/4380

> And my whole point here is that what Ruby has ended up doing is making simple libraries damn near impossible to write if you really really really really want to do things properly.  Any library that concatenate any strings is open to mistakes.