Quoting matz / ruby-lang.org, on Thu, Oct 19, 2006 at 02:49:30PM +0900:
> Hi,
> 
> In message "Re: Symbol < String in Ruby > 1.8"
>     on Thu, 19 Oct 2006 14:08:07 +0900, Sam Roberts <sroberts / uniserve.com> writes:
> 
> |Been working with lua recently (lua.org), it only has immutable strings,
> |so all strings are effectively interned at creation, comparison between
> |them is "cheap" after that, and memory is collapsed because there is
> |never more than one instance of the same string.
> 
> Interesting.  All strings?  Is the result of string substitution, for
> example, also interned?

Yes, all:

  Like earlier interpreted languages, such as Snobol [11] and Icon [10],
  Lua internalizes strings using a hash table: it keeps a single copy of
  each string with no duplications. Moreover, strings are immutable: once
  internalized, a string cannot be changed. Hash values for strings are
  computed by a simple expression that mixes bitwise and arithmetic
  operations, thus shuffling all bits involved. Hash values are saved when
  the string is internalized to allow fast string comparison and table
  indexing. The hash function does not look at all bytes of the string if
  the string is too long. This allows fast hashing of long strings.
  Avoiding loss of performance when handling long strings is important
  because they are common in Lua. For instance, it is usual to process
  files in Lua by reading them completely into memory into a single long
  string.

  - http://www.tecgraf.puc-rio.br/~lhf/ftp/doc/jucs05.pdf

The basic structure of the language is based around creative uses of
hybrid hash-tables/arrays. Since strings are used so much as keys into
these tables, I think that always having a hash for strings must be
particularly nice.
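
To make the idea concrete, here is a toy intern table sketched in Ruby.
It is only an illustration of "one copy per distinct string", not how
Lua implements it; the InternPool class and its method names are made up
for this example.

```ruby
# Hypothetical illustration: a pool that keeps one frozen copy of each
# distinct string, so equal contents map to the same object.
class InternPool
  def initialize
    @table = {}
  end

  # Return the pooled copy of str, adding a frozen copy on first sight.
  def intern(str)
    @table[str] ||= str.dup.freeze
  end

  def size
    @table.size
  end
end

pool = InternPool.new
a = pool.intern("hello" + " world")   # built at run time
b = pool.intern("hello world")        # same contents, different source

same_object = a.equal?(b)             # identity check, not content check
```

Once everything goes through the pool, equality can be decided by
identity (equal?) instead of comparing bytes.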

Anyhow, it works for Lua, they say :-), but it's a very different language.

> Module can not rely on the internal structure of object, so that all
> methods defined in the module (Textual?) should be based on some
> primitive methods.  For example, methods in Enumerable are based on
> #each (and several others).  I am not sure we can define such
> primitives for Textual methods, without hindering the performance.
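
A rough sketch of what that would look like, assuming to_str is chosen
as the single primitive. The Textual module, its methods, and the Label
class are all hypothetical; this only shows the shape of the approach,
and every call goes through the primitive, which is where the
performance worry comes from.

```ruby
# Hypothetical mixin: every method is written against one primitive,
# #to_str, the way Enumerable is written against #each.
module Textual
  def textual_length
    to_str.length
  end

  def starts_with?(prefix)
    head = prefix.to_str
    to_str[0, head.length] == head
  end
end

# Any class that supplies the primitive gets the rest for free.
class Label
  include Textual

  def initialize(text)
    @text = text.dup.freeze
  end

  def to_str
    @text
  end
end

label = Label.new("ruby-talk")
length = label.textual_length
prefixed = label.starts_with?("ruby")
```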

> Besides that, making two independent classes, String and Symbol (or
> InternedString or whatever) is rather trivial.  I can do it in 10
> minutes or so.  The point is the rationale, and trade-off.

It struck me as surprising that Symbol derives from String, since it
does less. But you describe being frozen and interned as an additional
feature, not the removal of the feature of mutability. I can see that
point of view, too. Surprise is a matter of experience; I will probably
get used to it.  If it worked for Smalltalk, it must have some points in
favor.

I still don't understand why there is a Symbol class at all in Ruby,
though. It seems that it could be entirely replaced by String, UNLESS it
does something faster/better than just String#freeze. I know that equal
symbols share a single object_id, but using Symbol instead of String
doesn't seem to make comparisons fast, at least in Ruby code (maybe it
is faster in C?).
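
For reference, a small sketch of the identity difference I mean. The
variable names are just for illustration, and the strings are dup'ed so
that no literal deduplication gets in the way.

```ruby
# Two uses of the same symbol literal refer to one object.
y1 = :name
y2 = :name

# Two strings with the same contents are separate objects.
s1 = "name".dup
s2 = "name".dup

same_symbol   = y1.equal?(y2)                   # true: one shared object
same_string   = s1.equal?(s2)                   # false: two objects
equal_content = (s1 == s2)                      # true: compared by content

# Freezing prevents mutation but does not merge the two objects.
same_frozen   = s1.freeze.equal?(s2.freeze)     # false

# Converting to symbols does collapse them to one object.
same_interned = s1.to_sym.equal?(s2.to_sym)     # true
```

So String#freeze gives immutability only; the single shared instance
comes from interning.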

Maybe things have changed. Last year I benchmarked comparisons between
short strings and symbols in Ruby 1.8 to see if I could speed up a
library that the profiler said was spending most of its time in
String#==, and switching to symbols didn't seem to make a difference.
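
I don't have the original script any more, so the following is only a
sketch of that kind of measurement using the standard benchmark library.
The key name and iteration count are arbitrary, and the numbers will
depend on the Ruby version and machine.

```ruby
require 'benchmark'

iterations = 200_000

# Short strings with equal contents, held in two separate objects.
s1 = "some_key".dup
s2 = "some_key".dup

# The same symbol, referenced twice.
y1 = :some_key
y2 = :some_key

# Time the same number of == calls for each kind of operand.
string_time = Benchmark.realtime { iterations.times { s1 == s2 } }
symbol_time = Benchmark.realtime { iterations.times { y1 == y2 } }

puts "String#==: %.4fs" % string_time
puts "Symbol#==: %.4fs" % symbol_time
```

With strings this short, the cost of the method call and the loop may
well dominate the comparison itself.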

Sam