Issue #16851 has been updated by ana06 (Ana Maria Martinez Gomez).


@byroot

> I'd suggest running the hash related benchmarks included in ruby's repo: =
https://github.com/ruby/ruby/tree/master/benchmark

For most of the benchmarks, this changes doesn't make much of a different b=
ecause they are either too small hashes or use strings or symbols as keys a=
nd this is only implemented for the general keys (objects, integers, etc.).=
 Another important detail is that in the way that it is implemented right n=
ow the filling of the random matrix may be being benchmaked (so I think tha=
t the result of a finished implementation may be even better).

It does make a difference in the `bighash` benchmark. I have increased the =
number of elements inserted in the hash a bit to make the difference cleare=
r. I have also added an extra benchmark, equivalent to this one but with ob=
jects instead of numbers as keys: https://github.com/ruby/ruby/commit/99e0c=
33655dec45a9b36850b6fe2cb36ef5f4d42

Those are the results for those two benchmarks with Simple tabulation and Q=
uadratic Probing [ https://github.com/Ana06/ruby/tree/tabulation ]: =


```
compare-ruby: ruby 2.8.0dev (2020-05-07T16:22:38Z master 7ded8fd29a) [x86_6=
4-linux]
built-ruby: https://github.com/Ana06/ruby/tree/tabulation

Calculating -------------------------------------=B7
                     compare-ruby  built-ruby =

         bighash_obj        0.079       0.099 i/s -       1.000 times in 12=
.591126s 10.078022s
             bighash        0.394       0.454 i/s -       1.000 times in 2.=
541103s 2.204265s

Comparison:
                      bighash_obj
          built-ruby:         0.1 i/s =

        compare-ruby:         0.1 i/s - 1.25x  slower

                          bighash
          built-ruby:         0.5 i/s =

        compare-ruby:         0.4 i/s - 1.15x  slower
```

Those are the results for those two benchmarks with Simple tabulation and t=
he current hash implementation (without Quadratic or Linear Probing) [ http=
s://github.com/Ana06/ruby/tree/tabulation without [95f367a](https://github.=
com/ruby/ruby/commit/95f367a02bdc6b03eaf5ab9ac6c13821e802dbff) ]:

```
compare-ruby: ruby 2.8.0dev (2020-05-07T16:22:38Z master 7ded8fd29a) [x86_6=
4-linux]
built-ruby: https://github.com/Ana06/ruby/tree/tabulation without 95f367a =


Calculating -------------------------------------
                     compare-ruby  built-ruby =

         bighash_obj        0.081       0.094 i/s -       1.000 times in 12=
.271814s 10.585687s
             bighash        0.369       0.409 i/s -       1.000 times in 2.=
711338s 2.445082s

Comparison:
                      bighash_obj
          built-ruby:         0.1 i/s =

        compare-ruby:         0.1 i/s - 1.16x  slower

                          bighash
          built-ruby:         0.4 i/s =

        compare-ruby:         0.4 i/s - 1.11x  slower
```

> @matz
ana06 (Ana Maria Martinez Gomez) Do you need any help?

To finish it so that it can be integrated in production I do need some help=
. I am not sure where the good place to put this code is. Concretely:
* The filling of the random matrix shouldn't be done the first time it is n=
eeded, but it should be part of the "initialization" of Ruby (or the Hash c=
lass). But I don't know which is the best way to do this. Actually, this do=
esn't necessarily need to be done as I have implemented it. Simple Tabulati=
on needs 64KB of random data. Ideally it could even be a shared memory sect=
ion with other programs (I don't think we have to do something like this, b=
ut just pointing out that there may be an smarter solution).
* I have removed some functionality out by placing the Simple Tabulation co=
de in the `any_hash` function directly. With that it is not allowed to over=
write the hash method for the hashing implementation (which maybe is ok, it=
 is done for other objects already, related: https://bugs.ruby-lang.org/iss=
ues/16850). Using arrays as keys is also broken (at least not consistent wi=
th the previous behavior because the array id is now used). I would need so=
me direction how to address these things (probably there is a better place =
to integrate this code).

There would still be some work to do, such as for example implement it for =
32 bites (currently only implemented for 64). But this concretely shouldn't=
 be a big deal.

----------------------------------------
Feature #16851: Ruby hashing algorithm could be improved using Tabulation H=
ashing
https://bugs.ruby-lang.org/issues/16851#change-85631

* Author: ana06 (Ana Maria Martinez Gomez)
* Status: Open
* Priority: Normal
----------------------------------------
I have implemented Linear Probing and Simple tabulation in Ruby: https://gi=
thub.com/Ana06/ruby/compare/master...Ana06:tabulation

I tested it using the following code:
https://github.com/Ana06/ruby-tabulation/blob/master/benchmark_tabulation.rb
It benchmarks the insertion of 600000 elements (by hashing their 64 bits id=
s) in an empty hash 100 times. Below are the times (in seconds) I got execu=
ting this code for different versions of the Ruby code:

- master (without Simple Tabulation): 29.68
- master with Linear Probing (without Simple Tabulation): 99.76
- master with Quadratic Probing (without Simple Tabulation): 29.97
- master with Simple Tabulation: 15.76
- master with Linear Probing and Simple Tabulation: 24.23
- **master with Quadratic Probing and Simple Tabulation: 13.73**

`master` refers to ruby 2.8.0dev:
(2020-05-07T16:22:38Z master 7ded8fd) [x86_64-linux].
I tried with 8 x Intel i5-8265U.

Although this is just a proof of concept and not something that could be al=
ready use in production, I think it proves the potential of Simple Tabulati=
on to improve current Ruby implementation. It may be worthwhile to explore =
this idea further. What do you think?

I did this as part of a project for my [Advanced Algorithms course](http://=
www.cs.columbia.edu/~andoni/advancedS20/index.html). For more details check=
 the project report: =

https://github.com/Ana06/ruby-tabulation/blob/master/latex/RubyTabulation_P=
roject.pdf



-- =

https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=3Dunsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>