Nikolai Weibull wrote:
>
> Also, how often is it actually necessary to convert strings to their
> ordinal value in their encoding table?

If you're working on binary data and want to read the raw byte string
instead of unpacking it into an array of Fixnums? I don't know how
common this is in practice. I was using a string as a compact sequence
of bytes to represent a Sudoku grid, which is what made me bring this
up.

You say that characters-as-strings makes perfect sense:

> Perhaps, but this is a tradeoff of keeping "characters" and "strings"
> in the same class. As already mentioned, "characters" will currently
> be represented by one-character-long Strings in 1.9/2.0. To me, this
> makes perfect sense, considering that one of the main design goals for
> Strings in 1.9/2.0 is that they should be able to handle most any
> encoding scheme (as I've understood it).

But then you muse about a new type of Fixnum to represent characters!

> Anyway, while we're on the topic, what exactly should String#ord
> return? I'd argue that a subclass of Fixnum would make sense, which
> would have methods like #alpha?, #digit?, and so on, according to what
> information is provided by the encoding scheme. This can easily get a
> bit too Unicode-centric, but I prefer writing
>
>   "a".ord.alpha?
>
> to
>
>   Codepoint.alpha?("a".ord)
>
> or something similar. I guess a good name for this subclass would be
> Codepoint, but then perhaps #ord isn't a very good name and #codepoint
> would make more sense.

I agree with the need for methods like this, but if that's going to
happen, I'd say the class should just be called a Character, and there
should be a way to get Character objects directly from strings without
having to stick the ord method in the middle. Personally, I'd suggest
that String#[x] with one argument should return a Character object, and
String#[x,1] should return a String of length one.

My own musings along these lines make characters a subclass of Symbol
rather than of Fixnum. So ?A would be an object much like :A, but would
have additional character-specific methods, such as #encoding, #alpha?,
etc.

> Finally, perhaps the type of methods I've described above, i.e.,
> #alpha?, #digit?, ..., should be methods of String for strings of
> length one character, like #ord.
>
> Let's try it out:
>
>   "a".alpha?
>
> yes, yes I like that. Still, String may be getting a bit overloaded by
> then.

I think it is asking too much to have the String class represent byte
strings, multi-byte character strings, and individual characters.

>> I hope I'm not coming across as argumentative in this thread.
>
> http://redhanded.hobix.com/inspect/futurismUnicodeInRuby.html

Thanks!

Let me also respond to a couple of things from other messages:

> Like the fact that #ordAt isn't a very Rubyish name.

My bad. That was a typo based on my background in Java and JavaScript.
I don't actually like the idea of a separate method, but if one were
needed, ord_at would obviously be a better name than ordAt.

David Black wrote:
> It's not going to be backward compatible in any case, since [] will
> have changed. I think the reasoning is that people use [].chr more
> than they're likely to use [].ord, so offloading the less simple
> behavior onto the ord case will save method calls in the long run.

I would have thought that people would use s[x,1] instead of s[x].ord,
avoiding the extra method call.

David Flanagan
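
P.S. To make the Character idea a little more concrete, here is a rough
sketch of what I have in mind. Everything in it is hypothetical: the
Character class and the char_at helper are names made up for
illustration (char_at stands in for a one-argument String#[], which I
don't want to redefine here), and leaning on the regexp engine's POSIX
bracket classes for #alpha? and #digit? is just an assumption about how
the encoding-specific information might be obtained. It also assumes a
Ruby in which s[x,1] yields a one-character String and String#ord
exists, i.e. 1.9 or later.

```ruby
# Hypothetical sketch only: Ruby has no Character class.
# A Character wraps exactly one character and answers questions about it.
class Character
  def initialize(str)
    raise ArgumentError, "expected exactly one character" unless str.length == 1
    @str = str.dup.freeze
  end

  # The one-character String this object wraps.
  def to_s
    @str
  end

  # Integer code of the character in its encoding.
  def ord
    @str.ord
  end

  def encoding
    @str.encoding
  end

  # Assumption: delegate classification to POSIX bracket expressions.
  def alpha?
    !!(@str =~ /\A[[:alpha:]]\z/)
  end

  def digit?
    !!(@str =~ /\A[[:digit:]]\z/)
  end

  def ==(other)
    other.is_a?(Character) && other.to_s == to_s
  end
end

# Hypothetical stand-in for String#[] called with a single index.
# Returns nil when the index is out of range.
def char_at(str, index)
  one = str[index, 1]
  (one.nil? || one.empty?) ? nil : Character.new(one)
end

c = char_at("a1", 0)
c.alpha?                 # => true
c.ord                    # => 97
char_at("a1", 1).digit?  # => true
"a1"[0, 1]               # => "a" (a String of length one, not a Character)
```

The point of the sketch is only the split of responsibilities: String
keeps handling sequences, while questions about a single character go
to a separate object rather than to String or Fixnum.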