Hi,

In message "Re: The face of Unicode support in the future"
    on Wed, 12 Jan 2005 00:54:01 +0900, Paul Brannan <pbrannan / atdesk.com> writes:

|On Mon, Jan 10, 2005 at 11:53:48PM +0900, Yukihiro Matsumoto wrote:
|> The "right" definition of characters differs application to
|> application.  That's the reason I don't add a Character class.  I want
|> to leave it to the user.
|
|I don't understand what you mean here.  How is having "abc"[0] return a
|String a better solution than having "abc"[0] return a Character?  Is it
|less restrictive in some way?

It's kind of hard for me to express in detail in English.  The
definition of "character" has, from time to time, caused difficult and
deep discussions (sometimes flames) among people who care about
characters and encodings.  I just want to stay away from that.

|Anyway, some questions:
|
|1. Will this be true?
|
|  ?a == "a"

Yes.

|It would allow code like this to be forward-compatible:
|
|  line = gets
|  if line[0] == ?A then
|    ...
|  end

Yes.
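
A minimal sketch of the two answers above, assuming a Ruby in which
?a evaluates to a one-character String (this is the case in Ruby 1.9
and later).  The method name starts_with_capital_a? is made up for
illustration only.

```ruby
# Hypothetical helper: true when the first character of the line is "A".
# Relies on String#[] with an integer index returning a one-character String.
def starts_with_capital_a?(line)
  line[0] == ?A
end

puts(?a == "a")                           # character literal vs. String
puts starts_with_capital_a?("Apple\n")
puts starts_with_capital_a?("apple\n")
```

Because both sides of the comparison are Strings, no separate
Character class is needed for this kind of check.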

|2. What will the encoding be of the character following the ? mark?  Can
|   I write:
|
|  if line[0] == ?<some utf-8 character> then

The encoding of the script file, which can be specified in a manner
similar to Python's PEP 263, e.g.

 #!/usr/bin/ruby
 # -*- coding: <encoding name> -*-

The encoding of the script file is determined by the following, in
order:

  * PEP 263-like pragma shown above
  * command line option (-K or better name)
  * compile time configuration
  * the default (probably "utf-8")
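
The lookup order above can be sketched roughly as follows.  This is
only an illustration, not the interpreter's implementation: the method
names, keyword arguments and the regular expression (modelled on the
one in PEP 263) are all assumptions.

```ruby
# Hypothetical helper: pulls the encoding name out of a PEP 263 style
# comment line, or returns nil when the line carries no such pragma.
def pragma_encoding(line)
  match = line.match(/coding[:=]\s*([\w.-]+)/)
  match && match[1]
end

# Hypothetical helper: returns the first encoding name that is set,
# checking the sources in the order listed above.  The final fallback
# is "utf-8" only because the list says the default is probably that.
def resolve_script_encoding(pragma: nil, command_line: nil, compile_time: nil)
  pragma || command_line || compile_time || "utf-8"
end

pragma = pragma_encoding("# -*- coding: euc-jp -*-")
puts resolve_script_encoding(pragma: pragma, command_line: "sjis")
puts resolve_script_encoding(command_line: "sjis")
puts resolve_script_encoding
```

A source that is not set is passed as nil, so the next one in the list
is consulted.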

|3. Can I compare two strings that have two different encodings?

Yes, and the comparison is always false unless both of the following
hold:

  * the encodings of the two strings are both ASCII compatible
  * they have the same (7-bit) ASCII character sequence
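
The rule above can be modelled with a small sketch.  The helper
same_across_encodings? is hypothetical and only restates the rule; it
is not how String#== is implemented.  It assumes a Ruby that provides
String#encode, String#ascii_only? and Encoding#ascii_compatible?
(Ruby 1.9 and later).

```ruby
# Hypothetical helper restating the rule for strings whose encodings differ.
def same_across_encodings?(a, b)
  # Same encoding: ordinary comparison applies.
  return a == b if a.encoding == b.encoding
  # Both encodings must be ASCII compatible.
  return false unless a.encoding.ascii_compatible? && b.encoding.ascii_compatible?
  # Both strings must consist of 7-bit ASCII characters only.
  return false unless a.ascii_only? && b.ascii_only?
  a.bytes == b.bytes
end

utf8          = "abc".encode("UTF-8")
latin1        = "abc".encode("ISO-8859-1")
utf16         = "abc".encode("UTF-16LE")       # not ASCII compatible
accent_utf8   = "caf\u00e9"                    # contains a non-ASCII character
accent_latin1 = accent_utf8.encode("ISO-8859-1")

puts same_across_encodings?(utf8, latin1)
puts same_across_encodings?(utf8, utf16)
puts same_across_encodings?(accent_utf8, accent_latin1)
```

Only the first pair satisfies both conditions; the other two fail on
ASCII compatibility and on the 7-bit requirement respectively.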

|4. Will $KCODE change to allow more encodings or will it be going away?

The use of $KCODE will not be encouraged.  It might remain to show the
default encoding (of the script file).

|5. Can there be user-defined encodings (e.g. if some user wants to
|   provide utf-16)?

Yes, but only with a C extension.  Besides that, UTF-16 (both BE and
LE) will be supported without any work by the user.

|6. Should String#encoding return a String or a Symbol?

A String.

Any other comments or questions?

							matz.