Issue #7791 has been updated by kstephens (Kurt Stephens).


Student (Nathan Zook) wrote:
> Questions: 
> 1) How certain are you that this covers all of the cases?

I'm relying on the unit and functional tests that already exist.  What cases do you have in mind?

> 2) In order to actually recover the memory, the symbol table has to be walked each time a symbol is created.  What are the implications of this?

The symbol table only has to be walked during the sweep phase, and that walk can be done incrementally.
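
To illustrate what an incremental walk could look like, here is a toy sketch.  It is not MRI code: the fixed-size table, the toy_* names and the per-step budget are all assumptions made for the example.  Each call sweeps at most a few slots and remembers where to resume, so the cost is spread over several GC steps instead of being paid on every symbol creation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: none of these names exist in MRI. */
enum { TOY_TABLE_SIZE = 8 };

struct toy_symbol {
    char *name;   /* heap-allocated copy of the symbol's name */
    int   marked; /* set by a (not shown) mark phase when the symbol is reachable */
};

struct toy_symtab {
    struct toy_symbol *slots[TOY_TABLE_SIZE];
    size_t cursor; /* slot at which the next sweep step resumes */
};

/* Store a symbol in the first free slot; returns NULL if the table is full. */
static struct toy_symbol *toy_add(struct toy_symtab *t, const char *name, int marked)
{
    assert(t != NULL && name != NULL);
    for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
        if (t->slots[i] != NULL) continue;
        struct toy_symbol *s = malloc(sizeof *s);
        if (s == NULL) return NULL;
        s->name = malloc(strlen(name) + 1);
        if (s->name == NULL) { free(s); return NULL; }
        strcpy(s->name, name);
        s->marked = marked;
        t->slots[i] = s;
        return s;
    }
    return NULL;
}

/* Sweep at most `budget` slots, starting at the saved cursor.
 * Unmarked symbols are freed; marked ones survive and have their mark
 * cleared for the next cycle.  Returns 1 once the whole table has been
 * covered (and rewinds the cursor), 0 if more steps are still needed. */
static int toy_sweep_step(struct toy_symtab *t, size_t budget)
{
    assert(t != NULL);
    for (size_t n = 0; n < budget && t->cursor < TOY_TABLE_SIZE; n++, t->cursor++) {
        struct toy_symbol *s = t->slots[t->cursor];
        if (s == NULL) continue;
        if (s->marked) {
            s->marked = 0;
        } else {
            free(s->name);
            free(s);
            t->slots[t->cursor] = NULL;
        }
    }
    if (t->cursor == TOY_TABLE_SIZE) {
        t->cursor = 0;
        return 1;
    }
    return 0;
}

/* Number of symbols currently stored in the table. */
static size_t toy_live(const struct toy_symtab *t)
{
    size_t n = 0;
    for (size_t i = 0; i < TOY_TABLE_SIZE; i++)
        if (t->slots[i] != NULL) n++;
    return n;
}
```

A caller would invoke toy_sweep_step() with a small budget from each GC sweep slice until it returns 1.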

Progress:

I have a branch that alters the global_symbols st_tables to map id->symbol_entry and string->symbol_entry.  This works, but it complicates reuse of collected IDs, which matters because IDs are finite.  The cost of tracking which IDs have been reclaimed makes this design inefficient and convoluted.  So...

I'm considering creating a first-class struct RSymbol and making ID synonymous with VALUE, such that all IDs are VALUEs pointing to RSymbols; ID2SYM() and SYM2ID() become identity functions.

This may break the C ABI, because sizeof(ID) may not equal sizeof(VALUE) on some platforms, and it requires minor changes to parse.y, but it will make everything else much simpler.  Consequently, all IDs held in internal structures must be rb_gc_mark()ed.

Assume that ID rb_intern(const char *x) means "pin the symbol with the name x", while VALUE rb_intern_str_collectible(VALUE x) means "the symbol with the name x, pinned or not".  This should reduce the changes required in C API extensions, which will not rb_gc_mark() all of their IDs until they adopt the new contract.
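
The intended contract can be modelled with a toy table.  Everything below is hypothetical (the toy_* functions only mimic the meaning given above for rb_intern() and rb_intern_str_collectible(); they are not the real implementations): the pinning entry point marks the symbol as never collectable, while the collectible one returns whatever symbol exists, pinned or not, and leaves its survival to marking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: none of these names exist in MRI. */
enum { TOY_MAX_SYMBOLS = 16 };

struct toy_symbol {
    char *name;
    int   pinned; /* once set, the symbol is never collected */
    int   marked; /* set by a holder that marks its reference */
};

static struct toy_symbol *toy_table[TOY_MAX_SYMBOLS];

/* Return the existing symbol named `name`, or create an unpinned one. */
static struct toy_symbol *toy_lookup_or_create(const char *name)
{
    size_t free_slot = TOY_MAX_SYMBOLS;
    assert(name != NULL);
    for (size_t i = 0; i < TOY_MAX_SYMBOLS; i++) {
        if (toy_table[i] == NULL) {
            if (free_slot == TOY_MAX_SYMBOLS) free_slot = i;
        } else if (strcmp(toy_table[i]->name, name) == 0) {
            return toy_table[i];
        }
    }
    if (free_slot == TOY_MAX_SYMBOLS) return NULL; /* table full */
    struct toy_symbol *s = calloc(1, sizeof *s);
    if (s == NULL) return NULL;
    s->name = malloc(strlen(name) + 1);
    if (s->name == NULL) { free(s); return NULL; }
    strcpy(s->name, name);
    toy_table[free_slot] = s;
    return s;
}

/* Models the "pin the symbol with the name x" reading of rb_intern(). */
static struct toy_symbol *toy_intern(const char *name)
{
    struct toy_symbol *s = toy_lookup_or_create(name);
    if (s != NULL) s->pinned = 1;
    return s;
}

/* Models "the symbol with the name x, pinned or not": never pins. */
static struct toy_symbol *toy_intern_collectible(const char *name)
{
    return toy_lookup_or_create(name);
}

/* Full sweep: frees symbols that are neither pinned nor marked, clears
 * marks on survivors, and returns the number of symbols freed. */
static size_t toy_sweep(void)
{
    size_t freed = 0;
    for (size_t i = 0; i < TOY_MAX_SYMBOLS; i++) {
        struct toy_symbol *s = toy_table[i];
        if (s == NULL) continue;
        if (s->pinned || s->marked) {
            s->marked = 0;
            continue;
        }
        free(s->name);
        free(s);
        toy_table[i] = NULL;
        freed++;
    }
    return freed;
}
```

Under this model an extension that only ever calls the pinning function keeps working without marking anything, at the price of its symbols never being reclaimed.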


----------------------------------------
Feature #7791: Let symbols be garbage collected
https://bugs.ruby-lang.org/issues/7791#change-37627

Author: rosenfeld (Rodrigo Rosenfeld Rosas)
Status: Feedback
Priority: Normal
Assignee: matz (Yukihiro Matsumoto)
Category: core
Target version: next minor


Many Denial-of-Service vulnerabilities exploited in Ruby programs rely on symbols not being collected by the garbage collector.

Ideally I'd prefer symbols and strings to behave exactly the same, being just alternate ways of writing strings, but I'll leave that for another ticket.

This one simply asks for symbols to be eligible for garbage collection when memory is low. Maybe one could set an upper limit on the memory dedicated to storing symbols. That way, the most accessed symbols would remain in that memory region, and the least used ones would be reclaimed when that region is full and a new symbol is created.

Or you could just allow symbols to be garbage collected at any time. Are there any reasons why this would be a bad idea? Is there any performance benchmark demonstrating that using symbols instead of strings makes real-world software perform much better?

Currently I only see symbols slowing down processing, because people don't want to worry about the distinction and will often use something like ActiveSupport Hash#with_indifferent_access or some other method to convert strings to symbols or vice versa...


-- 
http://bugs.ruby-lang.org/