Issue #7791 has been updated by marcandre (Marc-Andre Lafortune).


kstephens (Kurt  Stephens) wrote:
> Mark and sweep the symbol table like any other GC heap.
> 
> However there are some issues in the C API.
> 
> IDs in the C API are distinct from other values, and Ruby extensions expect them to be pinned (never GC'd).
> Therefore, the C API must either support registering ID variables with the GC or mark all Symbols requested through the C API as pinned.
> The latter could be achieved by having a "pinned" flag in the String->Symbol table entries.
> Symbols that are created and reachable by usual means (i.e.: String#to_sym) do not initially set this "pinned" flag.
> 
> When the GC is marking Symbols, it sets a "marked" flag in the corresponding symbol table entry.
> The GC sweep phase scans the symbol table, reclaims entries where !(e.pinned || e.marked), and clears e.marked.

+1, exactly what I was thinking.
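The proposed sweep over the symbol table can be sketched in a few lines. This is an illustrative model only, not actual MRI internals: `SymEntry`, `sym_table_sweep`, and the `live` field are hypothetical names, and the real implementation would operate on the C-level String->Symbol table.

```ruby
# Hypothetical model of a String->Symbol table entry:
#   pinned - requested through the C API, never reclaimed
#   marked - set by the GC mark phase when the Symbol is reachable
#   live   - entry is still occupied
SymEntry = Struct.new(:name, :pinned, :marked, :live)

# Sweep phase: reclaim entries that are neither pinned nor marked,
# and clear the mark flag for the next GC cycle.
def sym_table_sweep(table)
  table.each do |e|
    next unless e.live
    e.live = false unless e.pinned || e.marked
    e.marked = false
  end
end

table = [
  SymEntry.new("pinned_by_c_api", true,  false, true),  # survives: pinned
  SymEntry.new("reachable",       false, true,  true),  # survives: marked
  SymEntry.new("garbage",         false, false, true),  # reclaimed
]
sym_table_sweep(table)
table.each { |e| puts "#{e.name}: #{e.live ? 'live' : 'reclaimed'}" }
```

Note that pinned entries survive even when unmarked, which is why the reclaim test must be `!(pinned || marked)` rather than a condition that could drop a pinned-but-unmarked entry.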

> My employer will sponsor me to do this work.

Awesome news!

My personal thanks to your employer (Enova Financials, right?)


----------------------------------------
Feature #7791: Let symbols be garbage collected
https://bugs.ruby-lang.org/issues/7791#change-37606

Author: rosenfeld (Rodrigo Rosenfeld Rosas)
Status: Feedback
Priority: Normal
Assignee: matz (Yukihiro Matsumoto)
Category: core
Target version: next minor


Lots of Denial-of-Service security vulnerabilities exploited in Ruby programs rely on symbols not being collected by the garbage collector.

Ideally I'd prefer symbols and strings to behave exactly the same, being just alternate ways of writing strings, but I'll leave that to another ticket.

This one simply asks that symbols be allowed to be garbage collected when memory is low. Maybe one could set an upper limit on the memory dedicated to storing symbols. That way, the most frequently accessed symbols would remain in that memory region, and the least used ones would be reclaimed when the symbol memory is exhausted and a new symbol is created.

Or you could just allow symbols to be garbage collected at any time. Are there any reasons why this would be a bad idea? Is there any performance benchmark demonstrating that using symbols instead of strings makes real-world software perform much better?

Currently I only see symbols slowing down processing, because people don't want to worry about the distinction and will often use something like ActiveSupport's Hash#with_indifferent_access or some other method to convert strings to symbols or vice versa...
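The accumulation the description refers to is easy to observe: every unique string interned via String#to_sym adds an entry to the symbol table, and before symbol GC those entries were never reclaimed. A minimal demonstration (GC is disabled here only to keep the count deterministic; on an interpreter without symbol GC the entries would persist regardless):

```ruby
# Each unique string interned via String#to_sym adds a symbol table
# entry; if attacker-controlled input is interned, the table grows
# without bound — the DoS vector this ticket is about.
GC.disable  # keep the count deterministic for this demonstration

before = Symbol.all_symbols.size
1_000.times { |i| "user_supplied_key_#{i}".to_sym }
after = Symbol.all_symbols.size

puts after - before  # => 1000 new table entries
```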


-- 
http://bugs.ruby-lang.org/