On Fri, 7 Jan 2005 20:28:51 +0900, Yukihiro Matsumoto
<matz / ruby-lang.org> wrote:
> Hi,
> 
> In message "Re: The face of Unicode support in the future"
>     on Fri, 7 Jan 2005 15:26:47 +0900, Mathieu Bouchard <matju / sympatico.ca> writes:
> 
> |Yes, I'd like those two:
> |
> | * characters are represented by integers
> | * "abc"[0] returns 97 instead of "a".
> |
> |Do I need to submit a RCR for those, or should I go for two of them ?
> 
> I think a fixnum is not enough since a character may not be
> represented by a single codepoint, e.g. character composition, or
> surrogation.  Besides that, a character is represented by combination
> of codepoint(s) and a character set that defines codepoint.
> 
> From my observation, characters (codepoint numbers under the current
> implementation) are not commonly used among Ruby users, so making them
> strings would not hinder overall performance.

It would be useful to have a handy way of getting the codepoint of the
character at a given position:

  "a".codepoint_at(0) returns 97
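
As a rough sketch of what I mean, and only that: codepoint_at is a
hypothetical name, not an existing String method, and this version
assumes the string holds UTF-8 data so that unpack("U*") can decode it.
It also decodes the whole string on every call, so it is illustrative
rather than efficient.

```ruby
# Hypothetical helper: codepoint_at is NOT part of Ruby's String API.
# Assumes the receiver contains valid UTF-8 data.
class String
  # Return the integer codepoint of the character at the given
  # character index, or nil if the index is out of range.
  def codepoint_at(index)
    unpack("U*")[index]
  end
end

puts "a".codepoint_at(0)   # prints 97
```

Note that this still returns a single codepoint, so it does not address
the composed-character case raised above; it only covers the common
"give me the number for this character" use.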

Similarly, there are times when I simply want a raw string of bytes -- I
use Strings from time to time to hold image data; can we ensure that
there's a "raw" encoding where:

  x = "abc"
  x.encoding = "raw"
  puts x[0] # 97

This would allow people to keep the same semantics that they have now
as long as they use "raw" strings.
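
Until something like that exists, the behaviour can be approximated
with a small wrapper. This is only a sketch under my own assumptions:
RawString is a made-up class name, not a proposal for the actual API,
and it reads bytes with unpack("C*") so that indexing always yields an
integer regardless of how the underlying String treats characters.

```ruby
# Hypothetical wrapper: RawString is NOT an existing Ruby class.
# It treats the wrapped String purely as a sequence of bytes.
class RawString
  def initialize(data)
    @data = data
  end

  # Return the byte at the given index as an integer (0..255),
  # or nil if the index is out of range.
  def [](index)
    @data.unpack("C*")[index]
  end

  # Number of bytes, not characters.
  def length
    @data.unpack("C*").length
  end
end

x = RawString.new("abc")
puts x[0]   # prints 97
```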

-austin
-- 
Austin Ziegler * halostatue / gmail.com
               * Alternate: austin / halostatue.ca