Issue #12142 has been updated by Yura Sokolov.


Good day, everyone.

I've updated my version of st.c and hashing patch.
Main difference in st implementation from Vladimir Makarov's patch is
usage of closed addressing instead of open addressing.
Code size approximately:
- my version of st.c is near 1000 lines 
- Makarov's one is near 1250 lines
(both without huge comments and tables of sizes, Makarov's without Quadratic probing also).

Also there are:
- fixes for Murmur (it had issues with st_hash_uint and st_hash_end)
  (i've also replaced Murmur1/2 to Murmur3).
- make Murmur seeded, and use it for 'identity' hashes.
- suggestion to use SipHash13 instead SipHash24 ( Rust already accepted it)
- suggestion to use 32bit SipHash's cousine for 32bit architecture

Overall performance is on par with Vladimir's patch.
(note: Vladimir's patch removes seeding for fixnum/double/symbol hashing,
so it could be faster for keys of such types)
(I've tested against Eric's Wong mailbox applied on trunk, so no CityHash)

Uploaded mailbox patch:
https://bugs.ruby-lang.org/attachments/download/6061/hash_improvements_and_st_implementation_changes.mbox

Also could be seen on github:
https://github.com/funny-falcon/ruby/compare/trunk...funny-falcon:st_table_with_array2
https://github.com/funny-falcon/ruby/compare/trunk...funny-falcon:st_table_with_array2.patch


Performance of Redmine (software that runs bugs.ruby-lang.org):

````
ab -n 1000 -c 10 http://localhost:3000/projects/general/issues
trunk
Requests per second:    27.51 [#/sec] (mean)
Requests per second:    27.54 [#/sec] (mean)
Requests per second:    27.69 [#/sec] (mean)
makarov
Requests per second:    28.29 [#/sec] (mean)
Requests per second:    28.12 [#/sec] (mean)
Requests per second:    28.30 [#/sec] (mean)
falcon
Requests per second:    28.31 [#/sec] (mean)
Requests per second:    28.16 [#/sec] (mean)
Requests per second:    28.26 [#/sec] (mean)

ab -n 1000 -c 10 http://localhost:3000/issues/1
trunk
Requests per second:    26.37 [#/sec] (mean)
Requests per second:    26.51 [#/sec] (mean)
Requests per second:    26.48 [#/sec] (mean)
makarov
Requests per second:    27.39 [#/sec] (mean)
Requests per second:    27.41 [#/sec] (mean)
Requests per second:    27.75 [#/sec] (mean)
falcon
Requests per second:    27.54 [#/sec] (mean)
Requests per second:    27.23 [#/sec] (mean)
Requests per second:    27.47 [#/sec] (mean)
````

Tests using benchmark/driver.rb (tested miniruby):

````
Speedup ratio: compare with the result of `trunk' (greater is better)
64bit + jemalloc
name                makarov  falcon
bighash               1.320   1.447
hash_aref_dsym        0.976   1.009
hash_aref_dsym_long   1.474   1.459
hash_aref_fix         1.042   1.023
hash_aref_flo         1.671   1.595
hash_aref_miss        1.014   1.154
hash_aref_str         1.023   1.123
hash_aref_sym         0.997   1.016
hash_aref_sym_long    1.014   1.005
hash_flatten          1.179   1.097
hash_ident_flo        0.975   1.008
hash_ident_num        0.960   0.991
hash_ident_obj        0.939   0.967
hash_ident_str        0.949   0.958
hash_ident_sym        0.965   0.969
hash_keys             1.956   1.998
hash_long             1.004   1.203
hash_shift            1.280   1.324
hash_shift_u16        1.270   1.315
hash_shift_u24        1.263   1.304
hash_shift_u32        1.256   1.298
hash_small2           1.065   1.063
hash_small4           1.061   1.055
hash_small8           1.818   1.608
hash_to_proc          1.006   0.976
hash_values           1.749   1.775
loop_whileloop2       1.003   0.997
vm2_bighash*          3.465   3.101

64bit no jemalloc
name                makarov  falcon
bighash               1.741   1.840
hash_aref_dsym        0.934   0.970
hash_aref_dsym_long   1.546   1.519
hash_aref_fix         1.045   1.041
hash_aref_flo         1.734   1.654
hash_aref_miss        0.980   1.022
hash_aref_str         1.016   1.068
hash_aref_sym         1.000   1.015
hash_aref_sym_long    1.017   1.004
hash_flatten          1.172   1.179
hash_ident_flo        0.955   0.987
hash_ident_num        0.960   0.955
hash_ident_obj        0.952   0.963
hash_ident_str        0.947   0.956
hash_ident_sym        0.990   1.005
hash_keys             2.820   2.870
hash_long             0.995   1.206
hash_shift            1.437   1.461
hash_shift_u16        1.397   1.426
hash_shift_u24        1.389   1.426
hash_shift_u32        1.385   1.412
hash_small2           1.059   1.015
hash_small4           1.068   1.020
hash_small8           2.275   1.886
hash_to_proc          1.037   1.014
hash_values           2.811   2.850
loop_whileloop2       1.002   1.007
vm2_bighash*          3.429   3.372

32bit no jemalloc
name                makarov  falcon
bighash               1.103   1.267
hash_aref_dsym        0.991   1.014
hash_aref_dsym_long   1.356   1.444
hash_aref_fix         0.992   1.017
hash_aref_flo         1.175   2.051
hash_aref_miss        1.017   1.468
hash_aref_str         1.010   1.530
hash_aref_sym         1.021   1.051
hash_aref_sym_long    1.032   1.052
hash_flatten          1.073   1.122
hash_ident_flo        1.124   1.114
hash_ident_num        0.900   0.914
hash_ident_obj        0.921   0.947
hash_ident_str        0.908   0.945
hash_ident_sym        0.910   0.949
hash_keys             2.083   2.095
hash_long             0.997   2.146
hash_shift            1.437   1.442
hash_shift_u16        1.316   1.383
hash_shift_u24        1.319   1.348
hash_shift_u32        1.446   1.752
hash_small2           1.041   0.955
hash_small4           1.044   0.961
hash_small8           1.570   1.378
hash_to_proc          1.007   0.989
hash_values           2.215   2.252
loop_whileloop2       1.002   1.000
vm2_bighash*          3.268   2.969
````

----------------------------------------
Feature #12142: Hash tables with open addressing
https://bugs.ruby-lang.org/issues/12142#change-59612

* Author: Vladimir Makarov
* Status: Open
* Priority: Normal
* Assignee: 
----------------------------------------
~~~
 Hello, the following patch contains a new implementation of hash
tables (major files st.c and include/ruby/st.h).

  Modern processors have several levels of cache.  Usually,the CPU
reads one or a few lines of the cache from memory (or another level of
cache).  So CPU is much faster at reading data stored close to each
other.  The current implementation of Ruby hash tables does not fit
well to modern processor cache organization, which requires better
data locality for faster program speed.

The new hash table implementation achieves a better data locality
mainly by

  o switching to open addressing hash tables for access by keys.
    Removing hash collision lists lets us avoid *pointer chasing*, a
    common problem that produces bad data locality.  I see a tendency
    to move from chaining hash tables to open addressing hash tables
    due to their better fit to modern CPU memory organizations.
    CPython recently made such switch
    (https://hg.python.org/cpython/file/ff1938d12240/Objects/dictobject.c).
    PHP did this a bit earlier
    https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html.
    GCC has widely-used such hash tables
    (https://gcc.gnu.org/svn/gcc/trunk/libiberty/hashtab.c) internally
    for more than 15 years.

  o removing doubly linked lists and putting the elements into an array
    for accessing to elements by their inclusion order.  That also
    removes pointer chaising on the doubly linked lists used for
    traversing elements by their inclusion order.

A more detailed description of the proposed implementation can be
found in the top comment of the file st.c.

The new implementation was benchmarked on 21 MRI hash table benchmarks
for two most widely used targets x86-64 (Intel 4.2GHz i7-4790K) and ARM
(Exynos 5410 - 1.6GHz Cortex-A15):

make benchmark-each ITEM=bm_hash OPTS='-r 3 -v' COMPARE_RUBY='<trunk ruby>'

Here the results for x86-64:

hash_aref_dsym       1.094
hash_aref_dsym_long          1.383
hash_aref_fix        1.048
hash_aref_flo        1.860
hash_aref_miss       1.107
hash_aref_str        1.107
hash_aref_sym        1.191
hash_aref_sym_long           1.113
hash_flatten         1.258
hash_ident_flo       1.627
hash_ident_num       1.045
hash_ident_obj       1.143
hash_ident_str       1.127
hash_ident_sym       1.152
hash_keys            2.714
hash_shift           2.209
hash_shift_u16       1.442
hash_shift_u24       1.413
hash_shift_u32       1.396
hash_to_proc         2.831
hash_values          2.701

The average performance improvement is more 50%.  ARM results are
analogous -- no any benchmark performance degradation and about the
same average improvement.

The patch can be seen as

https://github.com/vnmakarov/ruby/compare/trunk...hash_tables_with_open_addressing.patch

or in a less convenient way as pull request changes

https://github.com/ruby/ruby/pull/1264/files


This is my first patch for MRI and may be my proposal and
implementation have pitfalls.  But I am keen to learn and work on
inclusion of this code into MRI.

~~~

---Files--------------------------------
0001-st.c-change-st_table-implementation.patch (59.4 KB)
st-march31.patch (114 KB)
base.patch (93.8 KB)
hash.patch (4.48 KB)
strong_hash.patch (8.08 KB)
city.patch (19.4 KB)
new-hash-table-benchmarks.patch (1.34 KB)
hash_improvements_and_st_implementation_changes.mbox (101 KB)


-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>