On Tue, May 15, 2007 at 06:42:04PM +0900, enduro (Sven Suska) wrote:
> >The programs for which it makes sense to convert strings (received from some
> >external source, e.g. a database) to symbols for optimisation purposes, i.e.
> >where the benefits are measurable, will be pretty few.
> >
> Yes, I agree.
> (That's what I tried to address by the two lines after the quote above,
> perhaps I should have put a smiley in there :-) )
> 
> >And you also open yourself to a symbol exhaustion denial-of-service.
> > 
> >
> Yes, of course.
> But my point is: Let the system take care of that.
> I want a Ruby that just works - crystal-clear, transparently, reliably.
> :-)
> And it already does in most cases. And there is a lot that can be improved.
> And one such improvement could be garbage collection for symbols. (I think.)

But then what you want are not symbols, but true immutable strings. By that
I mean: some object where I can write 10MB of binary dump. If I want to add
one character to the end of it, then I create another object containing
10MB+1byte of binary dump, and the old 10MB object is garbage-collected.
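To make that concrete, here is a rough sketch. It uses a frozen String as a stand-in for a true immutable string (Ruby has no separate class for that), and a few bytes stand in for the 10MB dump; the variable names are made up for illustration:

```ruby
# A frozen String plays the part of an "immutable string" here.
dump = ("\x00\xFF" * 4).b.freeze   # pretend this is the 10MB binary dump

# Appending in place is refused on a frozen object...
refused = false
begin
  dump << "x"
rescue RuntimeError                # FrozenError is a RuntimeError in recent Rubies
  refused = true
end

# ...so "adding one character" means building a whole new object.
# The old one becomes garbage once nothing refers to it any more.
bigger = dump + "x"
```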

Now, there have been arguments that *all* strings in Ruby should have been
immutable in the first place, and I can sympathise with them. After all,
numbers are immutable, and so are certain other classes. But pragmatically,
there are cases where it is just so *useful* to append to a string. Besides,
maintaining the singleton property is hard for large binary objects - i.e.
when I create another 10MB binary dump, I have to check whether it's the
same as any other object which already exists.

(And of course, very large numbers are Bignums, which are not singletons.)
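A small sketch of what "singleton property" means in practice, using nothing but `equal?` (object identity) versus `==` (same contents). The variable names are only for illustration:

```ruby
# Symbols: the same name always gives you the very same object.
sym_same = :foo.equal?("foo".to_sym)

# Strings: equal contents, but two separate objects.
a = String.new("foo")
b = String.new("foo")
str_equal = (a == b)
str_same  = a.equal?(b)

# Small integers behave like symbols in this respect.
int_same = 1.equal?(1)

# Very large integers are ordinary objects; two equal ones compare
# equal with ==, but need not be the same object.
big_a = 2**100
big_b = 2**100
big_equal = (big_a == big_b)
```

Here `sym_same`, `str_equal`, `int_same` and `big_equal` come out true, while `str_same` is false: nobody went looking for an existing "foo" when the second String was created.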

> >That is, as far as I know, the symbol table is never garbage collected. Once
> >a symbol, always a symbol.
> >
> I'm not a core programmer, maybe I am asking too much,
> but I think it should be possible without slowing anything down.
> One very simple idea I can think of is the following:
> Set a limit on the number of symbols, and if it is reached
> the GC will be invoked in a special symbol-mode, marking all symbols that are
> still in use and completely regenerating the symbol table from scratch.

Yes, but why??? In real-world programs, only a few hundred unique method
names are used. So let them be symbols.

If you are going to create a million different symbols, or symbols which are
millions of bytes long, then use a String. That's what they are there for!

"Doctor, it hurts when I do this" -- "Then don't do that!"
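A minimal sketch of what "use a String" looks like for data coming from outside. The `incoming` array is a made-up stand-in for whatever a database or socket hands you:

```ruby
# Hypothetical input: names arriving from some external source.
incoming = ["alice", "bob", "alice", "x" * 1000]

# Count occurrences using the Strings themselves as hash keys.
# Nothing is converted with to_sym, so no symbols are created,
# however many distinct (or however long) the names turn out to be.
counts = Hash.new(0)
incoming.each { |name| counts[name] += 1 }

alice_count = counts["alice"]
```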

What you seem to be saying is "I don't want there to be two different types
of object, one for method names and one for holding blobs of data", but I
don't understand this. Symbols work, are fast, and personally I find them
aesthetically pleasing: one is a sort of tag for method names, and one is a
holder of blobs of data which may come from the outside world or from my own
computations.

> Yes, I really must admit, I also like the cleanness of current Symbols.
> But then, my experience is that this clearness is not worth a lot,
> because the border towards "dirty" strings must be crossed often.
> (That's why I called sticking to the clearness "temping" in my last post.)

I don't think so. The examples I've seen so far are:

(1) Method names which are created algorithmically. That is, you know you
have a method called "foo" and you want to call another method called
"foo=". It works, where's the problem?

    send("#{mname}=")

Yes, you've made a conversion to a string, and back again. Big deal. The
only way to improve this would be to have symbol algebra, e.g.
    (:foo + :=) == :foo=

But internally it would almost certainly be implemented the same way,
because you'd have to look up the symbol ID to convert it into its character
representation, manipulate the characters, and then look the result up
again to get back to a symbol.
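For what it's worth, here is that round trip spelled out as a small sketch. `Widget` is a hypothetical class invented for the example; `send`, `attr_accessor` and `to_sym` are the ordinary core methods:

```ruby
class Widget              # hypothetical class, just for illustration
  attr_accessor :foo      # defines both foo and foo=
end

w = Widget.new
mname = :foo

# Symbol -> String, then straight into send, which accepts either
# a String or a Symbol as the method name.
w.send("#{mname}=", 42)

# The same round trip made explicit: Symbol -> String -> Symbol.
setter = "#{mname}=".to_sym
w.send(setter, 43)
```

After this `w.foo` is 43, and `setter` is the symbol `:foo=`, which is about as close to "symbol algebra" as you need to get.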

Or, you'd have to drop symbols entirely and make *every* method call use a
string of characters as the method name - which would be very expensive.

Or, you'd have to make all Strings immutable, so that the string ID
could be used as a method call tag. See above for reasons why that is
undesirable.

(2) Rails, which allows you to be inconsistent between :foo=>:bar and
:foo=>"bar" and "foo"=>:bar and "foo"=>"bar" (at least sometimes - not
always). IMO it would have been better if Rails had stuck to one or the
other, but that's too late to undo.
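The reason the inconsistency matters is that plain Ruby treats the two as entirely different keys and values. A sketch with an ordinary Hash (no Rails involved, and `opts` is just an example name):

```ruby
opts = { :foo => :bar }

by_symbol = opts[:foo]     # the value stored under the Symbol key
by_string = opts["foo"]    # a String key is a different key altogether

same_value = (:bar == "bar")   # a Symbol never equals a String

# Sticking to one form throughout avoids the problem, e.g. all Strings:
normalised = opts.map { |k, v| [k.to_s, v.to_s] }.to_h
```

Here `by_symbol` is `:bar`, `by_string` is `nil`, and `same_value` is false. Any library that accepts both forms has to paper over this somewhere.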

Rails has introduced its own bast^H^H^H^Hextensions to the language anyway.

> Ruby is not yet good in many other aspects:
> speed, threads, documentation.

There is really *excellent* documentation for Ruby. You have to pay for it,
but the books I am thinking of are well worth the money.

You may not like the idea that the language designer and contributors are
not getting any money directly for their work, whilst book publishers are. I
can live with that.

I find that speed is good enough, and threads are better than most (have you
tried writing threaded programs in Perl?)

> The language is the crystal. It must be good in the beginning,
> it becomes more solid with every project written in that language.

Many people don't seem to realise that Ruby is, what, 15 years old now?

Regards,

Brian.