Issue #12142 has been updated by Yura Sokolov.


Good day, everyone.

I'm adding alternative patch version for st_table.
It is my compromise with Vladimir and his proposals.
I really count and mention Vladimir Makarov as co-author of this version.

- It uses open-addressing with mixing all hash bits at start (but a bit simpler then LCG).
  Although I still think chaining is more suitable, code for openaddressing looks clearer.
- While I still have compilation option for "huge" hash, it affects only `sizeof(st_table)`,
  layout of entries array is similar both for "maximum 32bit" and "almost unlimited" versions.
- I almost copy "use weak hash and fallback to strong on attack" suggestion (with a bit simpler check).

Pros of my version:

- less code (and I think, it is more readable, but it is just my opinion).
  My version of st_table algorithm has ~561 meaningful lines,
  Vladimir's version is ~803 lines.
- allocation size changes more frequently then `x2`, and tries to satisfy allocator pattern (jemalloc)
- smaller st_table
 - due to trick with one pointer for bins and entries,
 - no additional pointer for "current hash function"
 - version with 32bit index
- lazier entries compation and rehashing on table growth
- "attack" detection is not by full hash equality,
  but with "collision chain is statistically too long" (simpler to implement).
- improvements to `st_hash*` functions and siphash (it could be applied to Vladimir's patch also).

Cons of my version:

- It doesn't mark deleted entries in bins array.
  It allows simplify code significantly, and usually doesn't matter.
  But may lead to a bit more cache misses on insertion into hash with deleted entries.
- Single pointer to bins and entries needs single allocation. It leads to more coarse allocations, and could be disadvantage.
  And it is a _trick_, that could be considered disgusting.
- I do not make separate "reserving" search for `st_insert`, cause `st_insert` is not used in `Hash`.
  (`st_update` of Vladimir's version also do not use reservation).
  It simplifies code, but could be slower in some other places.
  It is not hard to implement it, if considered neccessary.
- "statistically too long collision chain" may happen occasionally, though, it is unlikely.
- no much comments.

https://github.com/funny-falcon/ruby/tree/st_table_with_array3
https://bugs.ruby-lang.org/attachments/download/6159/hash_improvements_and_st_array_with_open_addressing.mbox

I didn't run benchmark.
I've measured by making documentation for ruby, and my version were on par with Vladimir's.
(and yes, it was really hard to make it close to).
(hope benchmark will not be shameful :-) )

----------------------------------------
Feature #12142: Hash tables with open addressing
https://bugs.ruby-lang.org/issues/12142#change-60669

* Author: Vladimir Makarov
* Status: Open
* Priority: Normal
* Assignee: 
----------------------------------------
~~~
 Hello, the following patch contains a new implementation of hash
tables (major files st.c and include/ruby/st.h).

  Modern processors have several levels of cache.  Usually,the CPU
reads one or a few lines of the cache from memory (or another level of
cache).  So CPU is much faster at reading data stored close to each
other.  The current implementation of Ruby hash tables does not fit
well to modern processor cache organization, which requires better
data locality for faster program speed.

The new hash table implementation achieves a better data locality
mainly by

  o switching to open addressing hash tables for access by keys.
    Removing hash collision lists lets us avoid *pointer chasing*, a
    common problem that produces bad data locality.  I see a tendency
    to move from chaining hash tables to open addressing hash tables
    due to their better fit to modern CPU memory organizations.
    CPython recently made such switch
    (https://hg.python.org/cpython/file/ff1938d12240/Objects/dictobject.c).
    PHP did this a bit earlier
    https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html.
    GCC has widely-used such hash tables
    (https://gcc.gnu.org/svn/gcc/trunk/libiberty/hashtab.c) internally
    for more than 15 years.

  o removing doubly linked lists and putting the elements into an array
    for accessing to elements by their inclusion order.  That also
    removes pointer chaising on the doubly linked lists used for
    traversing elements by their inclusion order.

A more detailed description of the proposed implementation can be
found in the top comment of the file st.c.

The new implementation was benchmarked on 21 MRI hash table benchmarks
for two most widely used targets x86-64 (Intel 4.2GHz i7-4790K) and ARM
(Exynos 5410 - 1.6GHz Cortex-A15):

make benchmark-each ITEM=bm_hash OPTS='-r 3 -v' COMPARE_RUBY='<trunk ruby>'

Here the results for x86-64:

hash_aref_dsym       1.094
hash_aref_dsym_long          1.383
hash_aref_fix        1.048
hash_aref_flo        1.860
hash_aref_miss       1.107
hash_aref_str        1.107
hash_aref_sym        1.191
hash_aref_sym_long           1.113
hash_flatten         1.258
hash_ident_flo       1.627
hash_ident_num       1.045
hash_ident_obj       1.143
hash_ident_str       1.127
hash_ident_sym       1.152
hash_keys            2.714
hash_shift           2.209
hash_shift_u16       1.442
hash_shift_u24       1.413
hash_shift_u32       1.396
hash_to_proc         2.831
hash_values          2.701

The average performance improvement is more 50%.  ARM results are
analogous -- no any benchmark performance degradation and about the
same average improvement.

The patch can be seen as

https://github.com/vnmakarov/ruby/compare/trunk...hash_tables_with_open_addressing.patch

or in a less convenient way as pull request changes

https://github.com/ruby/ruby/pull/1264/files


This is my first patch for MRI and may be my proposal and
implementation have pitfalls.  But I am keen to learn and work on
inclusion of this code into MRI.

~~~

---Files--------------------------------
0001-st.c-change-st_table-implementation.patch (59.4 KB)
st-march31.patch (114 KB)
base.patch (93.8 KB)
hash.patch (4.48 KB)
strong_hash.patch (8.08 KB)
city.patch (19.4 KB)
new-hash-table-benchmarks.patch (1.34 KB)
hash_improvements_and_st_implementation_changes.mbox (101 KB)
hash_improvements_and_st_array_with_open_addressing.mbox (108 KB)


-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>