Issue #12142 has been updated by Vladimir Makarov.


Koichi Sasada wrote:
> Sorry for my lazy-ness, I evaluated your two implementations (ffalcon has two versions, HUGEHASH or not).
> 
> Now, I depicted all of my evaluations in charts, so I introduce this report: <https://gist.github.com/ko1/dda8243134bc40f7fc05e293abc5f4b2#file-report-md>
> I'll add explanation of this evaluation soon.

Koichi, thank you for your detail investigation of the two tables.  Looking
at this report, I got an impression that you spent more time on this
report than we spent time on our implementations.

For the record, here is my comparison the same tables on different
tests (MRI hash table benchmarks) as a speed up ratio vs trunk:


```
                Yura    Yura64    My
bighash         1.657   1.751     1.612
hash_aref_dsym  0.929   0.927     0.967
hash_aref_dsym_long     1.314   1.444     1.427
hash_aref_fix   0.981   0.974     1.051
hash_aref_flo   1.662   1.656     1.908
hash_aref_miss  1.213   1.195     1.209
hash_aref_str   1.217   1.250     1.338
hash_aref_sym   0.924   0.932     0.992
hash_aref_sym_long      1.001   1.003     1.042
hash_flatten    1.082   1.091     1.184
hash_ident_flo  0.912   0.919     0.970
hash_ident_num  0.924   0.918     0.971
hash_ident_obj  0.878   0.876     0.963
hash_ident_str  0.902   0.896     0.960
hash_ident_sym  0.952   0.950     0.978
hash_keys       2.730   2.806     2.813
hash_long       1.439   1.463     1.564
hash_shift      1.297   1.283     1.403
hash_shift_u16  1.229   1.325     1.387
hash_shift_u24  1.224   1.201     1.378
hash_shift_u32  1.220   1.201     1.376
hash_small2     1.081   0.985     1.040
hash_small4     1.055   0.980     1.050
hash_small8     2.006   2.009     2.358
hash_to_proc    0.981   0.973     1.027
hash_values     2.817   2.790     2.823
vm2_bighash*    2.423   2.180     3.264
                1.33519 1.33252 1.44648
```

I used the following script on 4.2GHz i7-4790K for this:


```
ruby ../ruby/benchmark/driver.rb -p hash -r 3 -e trunk::trunk/miniruby -e yura::yura/miniruby -e yura::yura64/miniruby -e current::./miniruby 2>/dev/n\
ull|awk 'NF==4 && /hash/ {s1+=$2;s2+=$3;s3+=$4;n++;print} END{print s1/n, s2/n, s3/n}'
```

You wrote **"I agree to determine the implementation with coin toss"**
in your report.  I think it is better not do this.

My implementation is the original one.  Yura just repeated my major
ideas: putting elements in array, open hash addressing (although he
fiercely arguing with me a lot that the buckets are better), using
weaker hash function when there is no supension of a collision attack.
I did not use any his ideas.

In fact I stopped to improve my tables in April, Yura produced his
latest variant in September (btw it would be interesting to see also
comparison of his other two variants with buckets he proposed in July.
I put some such comparison on
https://github.com/vnmakarov/ruby/blob/hash_tables_with_open_addressing/README.md).

I believed the competition was unhealthy and it seems I was right as
Yura spent his time on his implementation and you had to spend your time on your thorough
investigation and still you need "a coin toss" to decide.

I'd like to propose a plan which is to use my code as the base and
Yura can add his own original code later:

o fixing Murmur hash in MRI
o different hash tables growth rate (it needs performance evaluation)
o smaller hash table header (usually it is accessed randomly and my
  header is just one cache line, further size decreasing hardly improves
  performance, imho)
o statistical collision attack recognition (i doubt it will improves
  performance as the surrounded code is memory bound)

Again this competition was unhealthy at least for me.  I could have
spent more time on more serious MRI performance improvement.

I'll respect any your decision, it will be enough for me that all my
major ideas about MRI hash tables is used.



----------------------------------------
Feature #12142: Hash tables with open addressing
https://bugs.ruby-lang.org/issues/12142#change-61178

* Author: Vladimir Makarov
* Status: Open
* Priority: Normal
* Assignee: 
----------------------------------------
~~~
 Hello, the following patch contains a new implementation of hash
tables (major files st.c and include/ruby/st.h).

  Modern processors have several levels of cache.  Usually,the CPU
reads one or a few lines of the cache from memory (or another level of
cache).  So CPU is much faster at reading data stored close to each
other.  The current implementation of Ruby hash tables does not fit
well to modern processor cache organization, which requires better
data locality for faster program speed.

The new hash table implementation achieves a better data locality
mainly by

  o switching to open addressing hash tables for access by keys.
    Removing hash collision lists lets us avoid *pointer chasing*, a
    common problem that produces bad data locality.  I see a tendency
    to move from chaining hash tables to open addressing hash tables
    due to their better fit to modern CPU memory organizations.
    CPython recently made such switch
    (https://hg.python.org/cpython/file/ff1938d12240/Objects/dictobject.c).
    PHP did this a bit earlier
    https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html.
    GCC has widely-used such hash tables
    (https://gcc.gnu.org/svn/gcc/trunk/libiberty/hashtab.c) internally
    for more than 15 years.

  o removing doubly linked lists and putting the elements into an array
    for accessing to elements by their inclusion order.  That also
    removes pointer chaising on the doubly linked lists used for
    traversing elements by their inclusion order.

A more detailed description of the proposed implementation can be
found in the top comment of the file st.c.

The new implementation was benchmarked on 21 MRI hash table benchmarks
for two most widely used targets x86-64 (Intel 4.2GHz i7-4790K) and ARM
(Exynos 5410 - 1.6GHz Cortex-A15):

make benchmark-each ITEM=bm_hash OPTS='-r 3 -v' COMPARE_RUBY='<trunk ruby>'

Here the results for x86-64:

hash_aref_dsym       1.094
hash_aref_dsym_long          1.383
hash_aref_fix        1.048
hash_aref_flo        1.860
hash_aref_miss       1.107
hash_aref_str        1.107
hash_aref_sym        1.191
hash_aref_sym_long           1.113
hash_flatten         1.258
hash_ident_flo       1.627
hash_ident_num       1.045
hash_ident_obj       1.143
hash_ident_str       1.127
hash_ident_sym       1.152
hash_keys            2.714
hash_shift           2.209
hash_shift_u16       1.442
hash_shift_u24       1.413
hash_shift_u32       1.396
hash_to_proc         2.831
hash_values          2.701

The average performance improvement is more 50%.  ARM results are
analogous -- no any benchmark performance degradation and about the
same average improvement.

The patch can be seen as

https://github.com/vnmakarov/ruby/compare/trunk...hash_tables_with_open_addressing.patch

or in a less convenient way as pull request changes

https://github.com/ruby/ruby/pull/1264/files


This is my first patch for MRI and may be my proposal and
implementation have pitfalls.  But I am keen to learn and work on
inclusion of this code into MRI.

~~~

---Files--------------------------------
0001-st.c-change-st_table-implementation.patch (59.4 KB)
st-march31.patch (114 KB)
base.patch (93.8 KB)
hash.patch (4.48 KB)
strong_hash.patch (8.08 KB)
city.patch (19.4 KB)
new-hash-table-benchmarks.patch (1.34 KB)
hash_improvements_and_st_implementation_changes.mbox (101 KB)
hash_improvements_and_st_array_with_open_addressing.mbox (108 KB)


-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>