Issue #11396 has been updated by Bruno Escherl.


Eric Wong wrote:
> Possible fix is to memoize hashval inside struct RSymbol:
>  
>  http://80x24.org/spew/m/1437992270-20549-1-git-send-email-e / 80x24.org.txt
>  
>  Much better than before, but still slower than 2.1, I think.
>  
>  Only lightly-tested, and on hardware which doesn't benefit from
>  power-of-two sizing anyways.  Sorry busy and don't have access to
>  better HW for benchmarking for a few days.

Hi Eric, I compiled the ruby_2_2 branch with your patch and got the following results

ruby 2.1.6p336 (2015-04-13 revision 50298) [x86_64-darwin14.0]

              string    144.345  ( 3.5%) i/s -    728.000 
              symbol    506.609  ( 2.4%) i/s -      2.550k

ruby 2.2.3p147 (2015-07-04 revision 51143) [x86_64-darwin14] without patch

              string    138.830  ( 6.5%) i/s -    700.000 
              symbol     75.236  ( 4.0%) i/s -    378.000 

ruby 2.2.3p147 (2015-07-04 revision 51143) [x86_64-darwin14] with patch

              string    147.566  ( 4.7%) i/s -    742.000 
              symbol    495.675  ( 6.9%) i/s -      2.494k

The patch is working and getting quite close to 2.1.6 :-)

For more realistic hashes I also used the script with 2000 keys and 100 lookups:

ruby 2.1.6p336 (2015-04-13 revision 50298) [x86_64-darwin14.0]

              string     43.020k ( 1.1%) i/s -    217.406k
              symbol     72.882k ( 0.9%) i/s -    367.565k

ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-darwin14]

              string     43.348k ( 2.8%) i/s -    219.402k
              symbol     22.336k ( 3.6%) i/s -    113.049k

ruby 2.2.3p147 (2015-07-04 revision 51143) [x86_64-darwin14] without patch

              string     44.412k ( 3.9%) i/s -    224.773k
              symbol     41.240k ( 3.3%) i/s -    209.721k

ruby 2.2.3p147 (2015-07-04 revision 51143) [x86_64-darwin14] with patch

              string     44.537k ( 2.3%) i/s -    224.561k
              symbol     85.511k ( 1.7%) i/s -    427.952k

So performance-wise this looks great! Can't judge of course, if there couldbe other side effects of this change.


----------------------------------------
Bug #11396: Bad performance in ruby >= 2.2 for Hash with many symbol keys
https://bugs.ruby-lang.org/issues/11396#change-53566

* Author: Bruno Escherl
* Status: Open
* Priority: Normal
* Assignee: 
* ruby -v: 
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
This started out as an issue on stackoverflow, where I found strange performance anomalies when comparing Set.include? and Array.include? in differentruby versions: http://stackoverflow.com/questions/31631284/performance-anomaly-in-ruby-set-include-with-symbols-2-2-2-vs-2-1-6

In the end it came down to problems with lookup of Hash keys. While for smaller Hashes the performance issues went away using ruby_2_2 branch, they staid for bigger Hashes. I'll attach a benchmark script (hash_bench_3.rb) I used that creates a Hash with 200000 keys and does a lookup of 10000 of them. 

Here my results:

ruby 2.1.6p336 (2015-04-13 revision 50298) [x86_64-darwin14.0]

              string    142.818  ( 2.8%) i/s -    714.000 
              symbol    505.831  ( 3.0%) i/s -      2.550k

ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-darwin14]

              string    143.404  ( 3.5%) i/s -    728.000 
              symbol     76.945  ( 6.5%) i/s -    385.000 

ruby 2.2.3p147 (2015-07-04 revision 51143) [x86_64-darwin14] self-compiled

              string    138.349  ( 2.2%) i/s -    702.000 
              symbol     77.495  ( 3.9%) i/s -    392.000 

As you can see 2.2 is much slower than 2.1.6 for symbol keys. I was recommended to disable Garbage Collection for Symbols for testing and did so on the ruby_2_2 branch

ruby 2.2.3p147 (2015-07-04 revision 51143) [x86_64-darwin14] self-compiled,USE_SYMBOL_GC=0

              string    145.179  ( 3.4%) i/s -    728.000 
              symbol    602.008  ( 7.6%) i/s -      3.050k

I would have expected that symbol GC may have some performance impact, but this looks like it is too big. I can't say exactly at which point Garbage Collection really hurts, but the bigger the Hash and the bigger the number of include? calls, the slower it gets.

---Files--------------------------------
hash_bench_3.rb (605 Bytes)


-- 
https://bugs.ruby-lang.org/