On 2/6/07, David Flanagan <david / davidflanagan.com> wrote:
> Daniel Berger wrote:

> > I'm afraid I don't see such a method as being general enough to warrant
> > inclusion as part of the core class.
>
> This is a backward compatibility issue.  It seems to me that there will
> be a general need for a simple way to achieve the 1.8 behavior.
> I think that warrants the addition of a method.

There is no 1.8 behavior to maintain, no backward compatibility to
achieve.  Sure, String#[] used to return a Fixnum holding the value of
byte N when passed a single Integer argument N, but that won't
be the case anymore.  It will return a one-character-long String
(containing perhaps many bytes), which can in turn be converted to a
Fixnum using String#ord.  I think this makes perfect sense considering
the way strings will be represented in 1.9/2.0.
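
A rough sketch of the difference, assuming a Ruby release that
implements the 1.9 semantics described above.  The sample string and
variable names are mine, and String#getbyte is the method released
1.9+ versions provide for the old byte-indexed lookup:

```ruby
s = "\u00e9t\u00e9"        # "été": 3 characters, 5 bytes in UTF-8

first       = s[0]         # a one-character String, not an Integer
first_bytes = first.bytes  # the bytes behind that one character
code        = first.ord    # the character's ordinal value: 233
old_style   = s.getbyte(0) # the value of byte 0 (1.8-style answer): 195

puts first_bytes.inspect   # [195, 169]
puts code
puts old_style
```

So the 1.8-style number is still reachable; it just isn't what
String#[] hands back.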

Also, how often is it actually necessary to convert strings to their
ordinal value in their encoding table?  That's mostly for scanners and
parsers using lookup tables, but I'd argue that you'll need to
optimize those in other ways than being able to turn a string into its
ordinal value, and besides, you'll usually be splitting the input into
strings of length one before invoking #ord on them anyway.
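
To illustrate the scanner case, here is a minimal sketch of a lookup
table keyed by ordinal value.  CHAR_CLASS and classify are
hypothetical names made up for this example, and the table only covers
ASCII digits and lowercase letters:

```ruby
# Hypothetical lookup table: ordinal value => character class.
CHAR_CLASS = Hash.new(:other)
("0".."9").each { |c| CHAR_CLASS[c.ord] = :digit }
("a".."z").each { |c| CHAR_CLASS[c.ord] = :letter }

# Split the input into one-character strings, then look each one up
# by its ordinal value.
def classify(input)
  input.each_char.map { |ch| CHAR_CLASS[ch.ord] }
end

classes = classify("a1 ")
puts classes.inspect
```

Note that the input is already split into one-character strings
before #ord is ever called on anything.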

> While I'm posting on this again, let me add another response to Matz's
> last post.  Matz wrote:
>
> > I just followed Python convention here.
>
> There's a small difference here.  In Python ord() is a function, not a
> method of the String class.  I have an easier time accepting a global
> function that places a 1-character restriction on its string argument,
> than I do a String method that only functions on 1 character strings.
> If one-character strings have different behavior than other strings,
> then shouldn't they be members of a different class?  If I can only call
> ord on some strings, shouldn't I be able to use respond_to? :ord on
> those strings?

Perhaps, but this is a tradeoff of keeping "characters" and "strings"
in the same class.  As already mentioned, "characters" will
be represented by one-character-long Strings in 1.9/2.0.  To me, this
makes perfect sense, considering that one of the main design goals for
Strings in 1.9/2.0 is that they should be able to handle most any
encoding scheme (as I've understood it).
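
The respond_to? point is easy to check.  The sketch below reflects how
Ruby releases from 1.9 onward behave, which is not necessarily what
was being proposed in this thread: every String responds to #ord, a
longer string yields the ordinal of its first character, and an empty
string raises ArgumentError.  The variable names are only for
illustration:

```ruby
one, many, empty = "a", "abc", ""

# respond_to? cannot tell these strings apart; they share one class.
responds = [one, many, empty].map { |s| s.respond_to?(:ord) }

# For a longer string, #ord looks only at the first character.
first_only = many.ord

# For an empty string there is no character to look at.
empty_error = begin
  empty.ord
  nil
rescue ArgumentError => e
  e.class
end

puts responds.inspect
puts first_only
puts empty_error.inspect
```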

Anyway, while we're on the topic, what exactly should String#ord
return?  I'd argue that a subclass of Fixnum would make sense, which
would have methods like #alpha?, #digit?, and so on, according to what
information is provided by the encoding scheme.  This can easily get a
bit too Unicode-centric, but I prefer writing

  "a".ord.alpha?

to

  Codepoint.alpha?("a".ord)

or something similar.  I guess a good name for this subclass would be
Codepoint, but then perhaps #ord isn't a very good name and #codepoint
would make more sense.

Finally, perhaps the kind of methods I've described above, i.e.,
#alpha?, #digit?, ..., should be methods of String that, like #ord,
apply only to strings one character long.

Let's try it out:

  "a".alpha?

Yes, yes, I like that.  Still, String may be getting a bit overloaded by then.
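
For comparison, here is a rough sketch of both spellings.  Nothing
here exists in Ruby: the Codepoint class, the String predicates, and
the single_char! helper are all hypothetical.  It assumes UTF-8
strings, leans on the regexp engine's POSIX bracket classes for the
actual classification, and needs Ruby 2.4+ for String#match?.  Raising
ArgumentError for longer strings is my own assumption about what the
restriction would look like:

```ruby
# Hypothetical Codepoint class wrapping the Integer from String#ord.
class Codepoint
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Rebuild the one-character UTF-8 String for this code point.
  def to_s
    [value].pack("U")
  end

  def alpha?
    to_s.match?(/\A[[:alpha:]]\z/)
  end

  def digit?
    to_s.match?(/\A[[:digit:]]\z/)
  end
end

# Hypothetical String predicates, limited to one-character strings.
class String
  def alpha?
    single_char!
    match?(/\A[[:alpha:]]\z/)
  end

  def digit?
    single_char!
    match?(/\A[[:digit:]]\z/)
  end

  private

  # Assumption: anything but a one-character string is an error.
  def single_char!
    raise ArgumentError, "expected a one-character string" unless length == 1
  end
end

puts Codepoint.new("a".ord).alpha?
puts "a".alpha?
```

The String version reads better, but it also shows the overloading
problem: two more methods that only make sense for some strings.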

> I hope I'm not coming across as argumentative in this thread.

Of course you are, which is a good thing.  We're trying to come to an
agreement over how to deal with Strings in 1.9/2.0 and we can only get
there by reasoning about how to continue.

> I'm just
> having a hard time grokking ord as it stands now. Perhaps this is due to
> ignorance: has anything been written (in English) explaining the whole
> scope of the text-processing changes in 1.9? I mean not just that
> characters are now single-character strings, but how multi-byte
> characters are going to be handled by the String class?

Considering that I'm writing (well, sadly, it's on hold at the
moment...life and all that) a library for implementing the scheme as
I've understood it (specifically for the UTF-8 encoding), I'd be very
interested in such a write-up.  The only description I've had to go on
is

http://redhanded.hobix.com/inspect/futurismUnicodeInRuby.html

which leaves a lot to be desired.

  nikolai