Issue #9425 has been updated by Eric Wong. normalperson / yhbt.net wrote: > OK, I was right about compare_by_identity being worse with power-of-two, > but I fixed it by tweaking numhash: > > http://bogomips.org/ruby.git/patch?id=1579e9d0d82789 I tweaked that further, adding +3 instead of +1 to RUBY_SPECIAL_SHIFT in r45384. Also updated NEWS in case some extensions need tweaking for performance. Haswell Xeon E3-1230 v3 numbers: hash_aref_miss 1.166 hash_aref_str 1.167 hash_aref_sym 1.224 hash_aref_sym_long 1.270 hash_flatten 1.656 hash_ident_num 1.142 hash_ident_obj 1.193 hash_ident_str 1.194 hash_ident_sym 1.171 hash_keys 1.002 hash_shift 1.122 hash_values 1.006 loop_whileloop2 1.001 vm2_bighash* 1.233 ---------------------------------------- Feature #9425: [PATCH] st: use power-of-two sizes to avoid slow modulo ops https://bugs.ruby-lang.org/issues/9425#change-45904 * Author: Eric Wong * Status: Closed * Priority: Low * Assignee: * Category: core * Target version: current: 2.2.0 ---------------------------------------- Prime number-sized hash tables are only needed to compensate for bad hash functions. Ruby has good hash functions nowadays, so reduce our code size with power-of-two-sized hash tables which allows us to avoid the slow modulo operation. I expected numhash performance to be worse, but it seems most of those hashes are either too-small-to-matter or well-distributed anyways. If we find problems with some existing numhashes we should start using a proper hash function (Murmur via st_hash_uint) on those. This consistently speeds up the bm_hash_flatten and bm_vm2_bighash. target 0: trunk (ruby 2.2.0dev (2014-01-17 trunk 44631) [x86_64-linux]) at "/home/ew/rrrr/i/bin/ruby --disable=gems" target 1: st-noprime (ruby 2.2.0dev (2014-01-17 trunk 44631) [x86_64-linux]) at "/home/ew/ruby/i/bin/ruby --disable=gems" Benchmarks on a Xeon E3-1230 v3 CPU: minimum results in each 10 measurements. Execution time (sec) name trunk st-noprime hash_flatten 0.500 0.345 hash_keys 0.191 0.192 hash_shift 0.019 0.018 hash_values 0.201 0.200 loop_whileloop2 0.090 0.090 vm2_bighash* 4.457 3.578 Speedup ratio: compare with the result of `trunk' (greater is better) name st-noprime hash_flatten 1.451 hash_keys 0.998 hash_shift 1.046 hash_values 1.003 loop_whileloop2 1.000 vm2_bighash* 1.246 Somewhat less impressive on an AMD FX 8320: minimum results in each 10 measurements. Execution time (sec) name trunk st-noprime hash_flatten 0.633 0.596 hash_keys 0.236 0.232 hash_shift 0.031 0.032 hash_values 0.234 0.238 loop_whileloop2 0.135 0.135 vm2_bighash* 8.198 6.982 Speedup ratio: compare with the result of `trunk' (greater is better) name st-noprime hash_flatten 1.063 hash_keys 1.020 hash_shift 0.976 hash_values 0.982 loop_whileloop2 1.000 vm2_bighash* 1.174 ---------------------------------------------------------------- The following changes since commit 2c3522c3e403dfdadaaf6095564bde364cc4bddf: test_thread.rb: stop at once (2014-01-16 06:34:47 +0000) are available in the git repository at: git://80x24.org/ruby.git st-noprime for you to fetch changes up to ed4f4103f4f407ed99dd6cd25b6c35d3aa9f3479: st: use power-of-two sizes to avoid slow modulo ops (2014-01-18 04:05:21 +0000) ---------------------------------------------------------------- Eric Wong (1): st: use power-of-two sizes to avoid slow modulo ops st.c | 100 +++++++++++++++++-------------------------------------------------- 1 file changed, 25 insertions(+), 75 deletions(-) ---Files-------------------------------- 0001-st-use-power-of-two-sizes-to-avoid-slow-modulo-ops.patch (10.9 KB) -- https://bugs.ruby-lang.org/