On 9/17/2008 3:39 PM, NARUSE, Yui wrote:
> We can convert "all Shift_JIS characters" to Unicode now.
> But current problem is, there are some mappings Shift_JIS and Unicode
> conversion.
> Once you convert data from Shift_JIS to Unicode, true meaning of some
> characters may be lost forever. (e.g. YEN SIGN Problem)
> 
> If we develop "a better" conversion, this problem will be more complex.

Is there a complete characterization of this whole problem? It seems
to be the main reason for sticking to non-UTF-8 character sets in
Ruby these days, and judging from what I have read about it, a
solution could be the addition of the missing characters/code points
to Unicode. Why does no one consider going that way, instead of
building a complicated stack of conversion functions on top?
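
As far as I understand the YEN SIGN problem mentioned above, it can be
sketched as follows. This is only an illustration, not how Ruby's
transcoder is implemented: the two one-entry tables and the helpers
decode_bytes and encode_codepoints are hypothetical names made up for
this example. The underlying fact is that JIS X 0201 assigns byte 0x5C
to YEN SIGN while ASCII assigns it to REVERSE SOLIDUS, so a converter
has to pick one reading for every occurrence of that byte.

```ruby
# Hypothetical one-entry tables: two conventions for the same byte 0x5C.
JIS_X_0201_VIEW = { 0x5C => 0x00A5 }.freeze # 0x5C read as YEN SIGN
ASCII_VIEW      = { 0x5C => 0x005C }.freeze # 0x5C read as REVERSE SOLIDUS

# Decode raw bytes to a UTF-8 string. Bytes missing from the table map to
# the code point of the same value, which is only valid for the ASCII
# bytes used in this sketch.
def decode_bytes(bytes, table)
  bytes.map { |b| table.fetch(b, b) }.pack("U*")
end

# Encode a string back to bytes using the inverse of the table.
def encode_codepoints(text, table)
  inverse = table.invert
  text.unpack("U*").map { |cp| inverse.fetch(cp, cp) }
end

price = [0x5C, 0x31, 0x30, 0x30]             # writer meant "yen 100"
path  = [0x43, 0x3A, 0x5C, 0x74, 0x6D, 0x70] # writer meant a path: C:\tmp

# Whichever table is chosen, one of the two inputs comes out wrong.
[JIS_X_0201_VIEW, ASCII_VIEW].each do |table|
  puts decode_bytes(price, table)
  puts decode_bytes(path, table)
end
```

With the first table the path turns into "C:" followed by a yen sign,
with the second the price loses its currency sign. Going back is no
better: under the first table both U+00A5 and U+005C encode to 0x5C,
so the distinction cannot be recovered from the bytes.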

To some extent, it looks like some people insist on the status quo
because it makes them feel special, swimming up the Unicode
waterfall and holding on to regional locales instead of solving
the issue. I am explicitly not referring to Ruby or its developers;
they just accommodate these special needs more than other language
designers, who have less sympathy for this anomaly.

Nevertheless, a lasting fix is needed, and I think writing more
and more crutches for encoding conversion is the wrong way to go.
They might still be needed for legacy file support, but day-to-day
work should not have to deal with this issue so prominently.

cheers,
- Matthias