On 6/19/06, Tim Bray <tbray / textuality.com> wrote:
> On Jun 19, 2006, at 6:31 AM, Austin Ziegler wrote:
>> This entire discussion is centered around a proposal to do exactly
>> that. There are many *very good* reasons to avoid doing this. Unicode
>> Is Not Always The Answer.
>>
>> It's *usually* the answer, but there are times when it's just easier
>> to work with data in an established code page.
> To enlighten the ignorant, could you describe one or two scenarios
> where a Unicode-based String class would get in the way?  To use your
> words, make things less easy?  I would probably not agree that there
> are "*many good*" reasons to avoid this, but probably that's just
> because I've been fortunate enough to not encounter the problem
> scenarios.  This material would have application in a far larger
> domain than just Ruby, obviously.  -Tim

I've found that a Unicode-based string class gets in the way when it
forces you to work around it. For most text-processing purposes, it
*isn't* an issue. But when you've got text whose origin encoding you
don't *know* (and you're probably working in a different code page), a
Unicode-based string class usually guesses wrong.
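
A rough sketch of the guessing problem. It assumes a Ruby whose String
carries an encoding tag (String#b, String#force_encoding and
String#encode), and the byte value and the two code pages are picked
purely for illustration:

```ruby
# One byte of unknown origin. Nothing in the data says which code page
# produced it.
raw = "\xE9".b

# Guess 1: assume the byte came from ISO-8859-1.
as_latin1 = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

# Guess 2: assume the byte came from the DOS code page IBM437.
as_cp437 = raw.dup.force_encoding("IBM437").encode("UTF-8")

# Both conversions succeed without complaint, yet they produce
# different characters. Only one of them can match what the author
# of the data meant.
puts as_latin1 == as_cp437
```

Neither conversion raises an error, so a wrong guess goes unnoticed
until someone reads the output.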

Transparent Unicode conversion only works when it is guaranteed that the
starting code page and the ending code page are identical. It's
*definitely* a legacy data issue, and doesn't affect most people, but it
has affected me in dealing with (in a non-Ruby context) NetWare.
Additionally, the overhead of converting to Unicode if your entire data
set is in ISO-8859-1 is unnecessary; again, this is a specialized case.
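
To make the round-trip point concrete, here is a minimal sketch under
the same assumption (an encoding-aware String with force_encoding and
encode). The code pages are illustrative, not the ones from the NetWare
case:

```ruby
# A single byte as it sits on disk.
raw = "\xE9".b

# Transparent conversion: decode into Unicode using an assumed
# starting code page.
text = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

# Writing back out in the same code page reproduces the original byte.
out_same = text.encode("ISO-8859-1")

# Writing back out in a different code page changes the byte, so any
# consumer that still expects the original byte now sees something else.
out_diff = text.encode("IBM437")

puts out_same.bytes == raw.bytes
puts out_diff.bytes == raw.bytes
```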

More problematic, from the Ruby perspective, is that a Unicode-based
string class would require a wholly separate byte vector class; I am
not sure that is necessary or wise. The first time I read a JPG into a
String, I was delighted -- the interface was so clean and nice, as
opposed to mucking around in languages that force multiple interfaces
because they split text and bytes into separate classes.
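
Something like the following is what I mean by reading binary data into
a String. This is only a sketch: it assumes a Ruby that provides
File.binread and String#b, and it writes a four-byte stand-in file
rather than using a real JPG:

```ruby
require "tempfile"

# Stand-in for a real JPG: just the first four bytes of a JFIF file.
jpeg_magic = [0xFF, 0xD8, 0xFF, 0xE0].pack("C*")

file = Tempfile.new(["sample", ".jpg"])
file.binmode
file.write(jpeg_magic)
file.close

# The raw bytes come back as an ordinary String; no separate byte
# vector class is involved.
data = File.binread(file.path)

# The usual String methods work directly on the bytes.
first_two = data.unpack("C2")                # first two bytes as integers
is_jpeg   = data.start_with?("\xFF\xD8".b)   # check the JPEG start marker

puts data.bytesize
puts first_two.inspect
puts is_jpeg

file.unlink
```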

Like I said, I'm not anti-Unicode, and I want Ruby's Unicode support to
be the best, bar none. I'm not willing to compromise on API or
flexibility to gain that, though.

-austin
-- 
Austin Ziegler * halostatue / gmail.com * http://www.halostatue.ca/
               * austin / halostatue.ca * http://www.halostatue.ca/feed/
               * austin / zieglers.ca