On Sep 10, 2008, at 9:55 AM, Tanaka Akira wrote:

>> Yes, there are lots of others.  For example, a full-text indexing
>> system dealing with a word like Québec, which needs to index it the
>> same whether the é appears as one codepoint or two.
>
> "é" is a character, even if it is represented as two
> codepoints.
>
> So ruby should treat it as a character.

Yes, but that gets really complicated really fast [1]. And that's not
even considering locale dependent features.
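
To make the one-codepoint-versus-two case concrete, here is a rough
sketch. It assumes a Ruby that provides String#unicode_normalize and
String#grapheme_clusters, which is an assumption on my part and not
something this thread takes for granted. COMPOSED, DECOMPOSED and
index_key are hypothetical names used only for illustration.

```ruby
# Two spellings of the same word: precomposed é (U+00E9) versus
# "e" followed by a combining acute accent (U+0301).
COMPOSED   = "Qu\u00E9bec"
DECOMPOSED = "Que\u0301bec"

# Hypothetical helper: the key a full-text indexer might store.
# Normalizing to NFC makes both spellings produce the same key.
def index_key(word)
  word.unicode_normalize(:nfc)
end

# Low-level view: the raw strings differ in codepoints.
puts COMPOSED == DECOMPOSED                        # false
puts COMPOSED.codepoints.length                    # 6
puts DECOMPOSED.codepoints.length                  # 7

# Normalized view: both index under the same key.
puts index_key(COMPOSED) == index_key(DECOMPOSED)  # true

# Character view: the two-codepoint é counts as a single
# grapheme cluster, in the sense of the report cited as [1].
puts DECOMPOSED.grapheme_clusters.length           # 6
```

This only covers canonical equivalence and default grapheme
boundaries; it does nothing about locale-specific rules.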

The easiest solution is to implement something in the core library
that works for most people and give low-level access to people who
need to implement more complicated text processing.

Manfred

[1] http://unicode.org/reports/tr29/tr29-13.html