Nikolai Weibull wrote:

> Also, how often is it actually necessary to convert strings to their
> ordinal value in their encoding table?  

It comes up when you're working on binary data and want to read bytes 
straight from the raw string instead of unpacking it into an array of 
Fixnums.  I don't know how common this is in practice.  I was using a 
string as a compact sequence of bytes to represent a Sudoku grid, which 
is what made me bring this up.
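
To make the use case concrete, here is a rough sketch of the kind of 
thing I mean.  The grid layout (one byte per cell, 0 for empty) and the 
variable names are just assumptions for illustration, and String#getbyte 
assumes Ruby 1.9 or later; it is not how my actual Sudoku code looks.

```ruby
# Hypothetical 9x9 grid packed one cell per byte (0 means "empty").
cells = Array.new(81, 0)
cells[0]  = 5
cells[80] = 9

# Pack the cell values into an 81-byte binary string.
grid = cells.pack("C*")

# Read individual byte values directly from the string.
first = grid.getbyte(0)
last  = grid.getbyte(80)

# For comparison: unpack the whole string into an array of integers.
unpacked = grid.unpack("C*")
```

The point is only that indexing into the string for a byte value is the 
common operation here, and unpacking everything first is the workaround.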

You say that characters-as-strings makes perfect sense:

> Perhaps, but this is a tradeoff of keeping "characters" and "strings"
> in the same class.  As already mentioned,  "characters" will currently
> be represented by one-character-long Strings in 1.9/2.0.  To me, this
> makes perfect sense, considering that one of the main design goals for
> Strings in 1.9/2.0 is that they should be able to handle most any
> encoding scheme (as I've understood it).
> 

But then you muse about a new type of Fixnum to represent characters!

> Anyway, while we're on the topic, what exactly should String#ord
> return?  I'd argue that a subclass of Fixnum would make sense, which
> would have methods like #alpha?, #digit?, and so on, according to what
> information is provided by the encoding scheme.  This can easily get a
> bit too Unicode-centric, but I prefer writing

I agree with the need for methods like this, but if that's going to 
happen, I'd say the class should just be called a Character, and there 
should be a way to get Character objects directly from strings without 
having to stick the ord method in the middle.  Personally, I'd suggest 
that String#[] with one argument (s[x]) should return a Character object, 
and with two arguments (s[x,1]) should return a String of length one.
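
Roughly what I have in mind, as a sketch only.  Character and char_at 
are hypothetical names that exist nowhere in Ruby; char_at stands in for 
the one-argument [] so that nothing in String is redefined, and the code 
assumes Ruby 1.9 or later for String#ord and String#encoding.

```ruby
# Hypothetical Character class; not part of Ruby's core library.
class Character
  attr_reader :ord

  def initialize(str)
    raise ArgumentError, "expected exactly one character" unless str.length == 1
    @str = str
    @ord = str.ord
  end

  # True if the character is alphabetic according to the regexp engine.
  def alpha?
    !!(@str =~ /\A[[:alpha:]]\z/)
  end

  # True if the character is a decimal digit.
  def digit?
    !!(@str =~ /\A[[:digit:]]\z/)
  end

  def encoding
    @str.encoding
  end

  def to_s
    @str
  end
end

# Hypothetical helper standing in for the proposed one-argument String#[].
def char_at(string, index)
  Character.new(string[index, 1])
end

c = char_at("a1", 0)
c.alpha?                  # alphabetic check on the Character itself
char_at("a1", 1).digit?   # no ord call needed in the middle
```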

My own musings along these lines make characters a subclass of Symbol 
rather than of Fixnum.  So ?A would be an object much like :A, but would 
have additional character-specific methods, such as #encoding, #alpha?, etc.

>  "a".ord.alpha?
> 
> to
> 
>  Codepoint.alpha?("a".ord)
> 
> or something similar.  I guess a good name for this subclass would be
> Codepoint, but then perhaps #ord isn't a very good name and #codepoint
> would make more sense.
> 
> Finally, perhaps the type of methods I've described above, i.e.,
> #alpha?, #digit?, ..., should be methods of String for strings of
> length one character, like #ord.
> 
> Let's try it out:
> 
>  "a".alpha?
> 
> yes, yes I like that.  Still, String may be getting a bit overloaded by 
> then.

I think it is asking too much to have the String class represent byte 
strings, multi-byte character strings, and individual characters.

>> I hope I'm not coming across as argumentative in this thread.
> 

> 
> http://redhanded.hobix.com/inspect/futurismUnicodeInRuby.html
> 

Thanks!

Let me also respond to a couple of things from other messages:

> Like the fact that #ordAt isn't a very Rubyish name. 

My bad.  That was a typo based on my background in Java and JavaScript.  
I don't actually like the idea of a separate method, but if one were 
needed, ord_at would obviously be a better name than ordAt.

David Black wrote:

> It's not going to be backward compatible in any case, since [] will
> have changed.  I think the reasoning is that people use [].chr more
> than they're likely to use [].ord, so offloading the less simple
> behavior onto the ord case will save method calls in the long run. 

I would have thought that people would use s[x,1] instead of s[x].chr, 
avoiding the extra method call.
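
For reference, a small sketch of how the forms under discussion behave, 
assuming the 1.9 semantics described in this thread (one-argument [] 
returns a one-character String, where 1.8 returned a Fixnum):

```ruby
s = "hello"

one_arg  = s[0]      # one-character String under 1.9 semantics
two_args = s[0, 1]   # one-character String under both 1.8 and 1.9
code     = s[0].ord  # integer value of the first character
```

The two-argument form gives the same one-character String under either 
set of rules, which is why I would expect it to be the common idiom.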

	David Flanagan