Issue #9614 has been updated by Eric Wong.


 Hi, thank you for the comments!
 
 ko1: adding a flag for ordering might complicate the st code even more.
 I think st should only be for implementing Hash.
 
 My primary goals for ihash are to reduce pointer chasing and malloc use.
 ihash uses container_of, like ccan/list, to embed the hash node directly
 in the data structure (a minimal sketch follows the diagrams below):
 
   st (unpacked entries):
     st_table -> st_table->bins -> st_table_entry -> data structure
 
   st (packed entries):
     st_table -> st_table->packed.entries -> data structure
 
   ihash:
     rb_ihash_table -> data structure (via container_of)
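 To make the container_of step concrete, here is a minimal, self-contained
 sketch of the pattern (rb_ihash_node and method_entry below are
 illustrative stand-ins, not the exact types from the attached patches):

    #include <stddef.h> /* offsetof */
    #include <stdio.h>

    /* recover the enclosing struct from a pointer to one of its members */
    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    struct rb_ihash_node { struct rb_ihash_node *next; }; /* one pointer */

    struct method_entry {
        long id;                   /* key */
        struct rb_ihash_node node; /* embedded node: no separate entry malloc */
    };

    int main(void)
    {
        struct method_entry me = { 42, { NULL } };
        struct rb_ihash_node *n = &me.node; /* what the table bins point to */
        struct method_entry *found = container_of(n, struct method_entry, node);
        printf("%ld\n", found->id); /* prints 42 */
        return 0;
    }

 Because the node lives inside the data structure itself, a lookup goes
 straight from the bins to the user struct with no intermediate
 st_table_entry allocation.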
 
 Removing the ordering requirement lets us avoid writing and maintaining
 more (new) code, and also saves two pointers per entry (fore/back).
 
 nobu: reduction depends on what is stored
 
 The best cases so far are the method table and the symbol table:
 
     For each method entry, we save:
 
         sizeof(st_table_entry) - sizeof(rb_ihash_node) + malloc_overhead
 
     On 64-bit with glibc malloc: 48 - 8 + 16 = 56 bytes saved.
 
     AFAIK, jemalloc has less overhead for small allocations, but even
     without malloc overhead, saving 40 bytes per method entry is great.
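      Where the 48 comes from: the unpacked st_table_entry in st.c at the
      time is roughly the layout below (field names may vary slightly by
      version), i.e. six words on 64-bit, and the embedded rb_ihash_node
      replaces both that allocation and its malloc header:

          /* approximate st.c layout; 6 x 8 bytes = 48 bytes on 64-bit */
          struct st_table_entry {
              st_index_t hash;
              st_data_t key;
              st_data_t record;
              struct st_table_entry *next;
              struct st_table_entry *fore, *back; /* insertion-order links */
          };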
 
 The symbol table is a big win, too: a single struct has membership in
 both the sym_id and id_str tables (a small lookup sketch follows the
 figures below):
 
     struct rb_idsym {
         struct rb_ihash_node id_str_node; /* node for id_str table, sizeof(void *) */
         ID id;                            /* the symbol's ID */
         struct rb_ihash_node sym_id_node; /* node for sym_id table */
         VALUE symstr;                     /* the symbol's string representation */
         st_index_t hashval;               /* cached hash value */
     };
 
     On 64-bit:
     Before: 48 + 48 = 96  (+ 32 bytes on glibc malloc)
     After:  40            (+ 16 bytes on glibc malloc)
 
     If we add ordering to Symbol.all_symbols, we only need one pair
     of list pointers and not two.  But I prefer to not need any :)
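      To spell out the dual membership: a lookup in either table yields a
      pointer to the corresponding embedded node, and container_of on the
      matching member recovers the same rb_idsym.  A hypothetical pair of
      helpers (not from the patches) would look like:

          static struct rb_idsym *
          idsym_of_id_str(struct rb_ihash_node *node)
          {
              return container_of(node, struct rb_idsym, id_str_node);
          }

          static struct rb_idsym *
          idsym_of_sym_id(struct rb_ihash_node *node)
          {
              return container_of(node, struct rb_idsym, sym_id_node);
          }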
 
 The worst case is saving only two pointers (the ivar table), but I may
 look at funny-falcon's sparse array for the ivar table.  I have not
 looked at the local variable table, but it is probably in the same class
 as ivar tables and may be better served by a sparse array.
 
 
 P.S.: I am also considering truncating rb_idsym.hashval to use as the ID.
 This would:
 a) reduce per-symbol overhead by sizeof(st_index_t)
 b) (hopefully) improve distribution for the global method cache
    Measuring the effect of b) may be hard, though.

----------------------------------------
Feature #9614: ordering of non-Hash items which use st_ internally
https://bugs.ruby-lang.org/issues/9614#change-46721

* Author: Eric Wong
* Status: Open
* Priority: Normal
* Assignee: Yukihiro Matsumoto
* Category: core
* Target version: current: 2.2.0
----------------------------------------
Hi matz, I would like your permission to remove the order preservation from
any or all of the following currently implemented using `st_table`:

* method tables
* global symbols (`Symbol.all_symbols`)
* constant tables
* instance variable tables
* `global_variables` method
* `Thread#keys`
* anything besides the `Hash` class

I am currently working on a patch series to reduce internal memory usage;
so far I have converted three pieces:

1. method tables (~200K reduction)
2. symbol table (`global_symbols.`{`id_str`,`sym_id`}) (~200K)
3. `frozen_strings` (~100K)

n.b. `frozen_strings` ordering is never exposed to users, so I expect
it to be OK.

Memory reduction is just based on "`ruby -e exit`" (which loads RubyGems);
bigger programs with more methods/symbols will save more memory.

Work-in-progress patches are attached (0002 describes the implementation details).


---Files--------------------------------
0001-adjust-tests-to-account-for-unsorted-methods.patch (1.87 KB)
0002-ihash-initial-implementation-method-table-conversion.patch (32.8 KB)
0004-ihash-implement-rb_ihash_update-to-support-rb_fstrin.patch (7.84 KB)
0003-parse.y-switch-to-ihash-saves-200K-out-of-the-box.patch (9.51 KB)


-- 
https://bugs.ruby-lang.org/