Issue #10333 has been updated by Eric Wong.


 Eric Wong <normalperson / yhbt.net> wrote:
 > Maybe start moving existing iseq_compile_each optimizations to the
 > peephole optimizer (work-in-progress):
 > http://80x24.org/spew/m/ee49aae645e0953fc16fc1557dce6a09b4de4324.txt
 
 Part #2, generic putstring_for instruction:
 http://80x24.org/spew/m/5a77be4e211c81a509573e3e1ca3bc3ca2383e68.txt
 
 A new putstring_for instruction may replace all current uses of:
 
 * opt_str_freeze
 * opt_aref_with
 * opt_aset_with
 
 This new instruction should also be usable to implement new
 optimizations to avoid rb_str_resurrect.
 
 Optimizations for literal hash["literal"] (aref/lookup) and
 "literal".freeze are easily moved to the peephole optimizer.
 
 However, it seems easier to optimize `hash["literal"] = val'
 in iseq_compile_each right now.
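
One way to check which instruction a given build emits for these three
patterns is to disassemble them with RubyVM::InstructionSequence
(MRI-specific).  This is only a sketch: the helper name disasm_for and
the SNIPPETS table are made up for illustration, and the instruction
names printed depend on the Ruby version and on whether this patch is
applied.

~~~ruby
# Hypothetical helper (not part of the patch): compile a snippet and
# return its human-readable disassembly.
def disasm_for(source)
  RubyVM::InstructionSequence.compile(source).disasm
end

# The three patterns discussed above.
SNIPPETS = {
  aref:   'h = {}; h["literal"]',
  aset:   'h = {}; h["literal"] = 1',
  freeze: '"literal".freeze',
}

DISASM = SNIPPETS.map { |name, src| [name, disasm_for(src)] }.to_h

# Print each listing; look for the instruction that carries "literal".
DISASM.each { |name, text| puts "== #{name}", text }
~~~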
 
 This is slightly slower than the old opt_aref_with and
 opt_aset_with instructions, but it is more elegant because it
 avoids special cases.  We may decide to resurrect opt_aref_with
 and opt_aset_with if we want to recover the small performance loss
 and can accept a bigger VM loop.
 
 "".freeze performance is probably not interesting to anyone :)
 
 benchmark results:
 minimum results in each 5 measurements.
 Execution time (sec)
 name                    2.1.3   trunk   built
 loop_whileloop2         0.106   0.106   0.106
 vm2_hash_aref_lit*      0.503   0.162   0.192
 vm2_hash_aset_lit*      0.587   0.214   0.241
 
 Speedup ratio: compare with the result of `2.1.3' (greater is better)
 name                    trunk   built
 loop_whileloop2         1.000   0.998
 vm2_hash_aref_lit*      3.099   2.621
 vm2_hash_aset_lit*      2.741   2.435

----------------------------------------
Feature #10333: [PATCH 3/1] optimize: "yoda literal" == string
https://bugs.ruby-lang.org/issues/10333#change-49308

* Author: Eric Wong
* Status: Open
* Priority: Normal
* Assignee: 
* Category: core
* Target version: current: 2.2.0
----------------------------------------
This is a follow-up to:

1) [Feature #10326] optimize: recv << "literal string"
2) [Feature #10329] optimize: foo == "literal string"

This can be slightly faster than (string == "literal") because
we can guarantee at compile time that the "yoda literal" receiver
is already a string.
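
The difference in what is known about the receiver can be seen in plain
Ruby.  The sketch below is for illustration only: the class AlwaysEqual
is hypothetical and is not part of the patch.  It shows that the
receiver of `foo == "literal"` may be any object, while the receiver of
`"literal" == foo` is always a String.

~~~ruby
# Hypothetical class for illustration: it defines its own ==.
class AlwaysEqual
  def ==(other)
    true
  end
end

foo = AlwaysEqual.new

# The receiver is foo, so AlwaysEqual#== is dispatched.  The class of
# the receiver is only known at run time.
RECV_ON_LEFT = (foo == "literal")

# The receiver is the string literal, so String#== is dispatched.
# foo is not a String and does not respond to to_str, so this is false.
LITERAL_ON_LEFT = ("literal" == foo)

puts RECV_ON_LEFT     # true
puts LITERAL_ON_LEFT  # false
~~~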

Updated benchmarks from Xeon E3-1230 v3 @ 3.30GHz:

target 0: trunk (ruby 2.2.0dev (2014-10-06 trunk 47822) [x86_64-linux]) at "/home/ew/rrrr/b/i/bin/ruby"
target 1: built (ruby 2.2.0dev (2014-10-06 trunk 47822) [x86_64-linux]) at "/home/ew/ruby/b/i/bin/ruby"

-----------------------------------------------------------
loop_whileloop2

~~~ruby
i = 0
while i< 6_000_000 # benchmark loop 2
  i += 1
end
~~~
~~~
trunk	0.10712811909615993
trunk	0.10693809622898698
trunk	0.10645449301227927
trunk	0.10646287119016051
built	0.10612367931753397
built	0.10581812914460897
built	0.10592922195792198
built	0.10595094738528132
~~~
-----------------------------------------------------------
vm2_streq1

~~~ruby
i = 0
foo = "literal"
while i<6_000_000 # benchmark loop 2
  i += 1
  foo == "literal"
end
~~~
~~~
trunk	0.47250875690951943
trunk	0.47325073881074786
trunk	0.4726782930083573
trunk	0.4727754699997604
built	0.185972370672971
built	0.1850820742547512
built	0.18558283289894462
built	0.18452610215172172
~~~
-----------------------------------------------------------
vm2_streq2

~~~ruby
i = 0
foo = "literal"
while i<6_000_000 # benchmark loop 2
  i += 1
  "literal" == foo
end
~~~
~~~
trunk	0.4719057851471007
trunk	0.4715963830240071
trunk	0.47177061904221773
trunk	0.4724834677763283
built	0.18247668212279677
built	0.18143231887370348
built	0.18060296680778265
built	0.17929687118157744
~~~
-----------------------------------------------------------
raw data:

~~~
[["loop_whileloop2",
  [[0.10712811909615993,
    0.10693809622898698,
    0.10645449301227927,
    0.10646287119016051],
   [0.10612367931753397,
    0.10581812914460897,
    0.10592922195792198,
    0.10595094738528132]]],
 ["vm2_streq1",
  [[0.47250875690951943,
    0.47325073881074786,
    0.4726782930083573,
    0.4727754699997604],
   [0.185972370672971,
    0.1850820742547512,
    0.18558283289894462,
    0.18452610215172172]]],
 ["vm2_streq2",
  [[0.4719057851471007,
    0.4715963830240071,
    0.47177061904221773,
    0.4724834677763283],
   [0.18247668212279677,
    0.18143231887370348,
    0.18060296680778265,
    0.17929687118157744]]]]

Elapsed time: 6.097474559 (sec)
~~~
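The summary figures reported for this run appear to be derived from the raw data above by taking the minimum of each set of measurements and, for the starred benchmarks, subtracting the minimum of loop_whileloop2 for the same target. That subtraction is an assumption inferred from the numbers, not something stated here. A minimal sketch of the arithmetic, with hypothetical names (RAW, adjusted_min, SUMMARY):

~~~ruby
# Raw timings copied from the data above, keyed by benchmark and target.
RAW = {
  "loop_whileloop2" => {
    trunk: [0.10712811909615993, 0.10693809622898698,
            0.10645449301227927, 0.10646287119016051],
    built: [0.10612367931753397, 0.10581812914460897,
            0.10592922195792198, 0.10595094738528132],
  },
  "vm2_streq1" => {
    trunk: [0.47250875690951943, 0.47325073881074786,
            0.4726782930083573, 0.4727754699997604],
    built: [0.185972370672971, 0.1850820742547512,
            0.18558283289894462, 0.18452610215172172],
  },
  "vm2_streq2" => {
    trunk: [0.4719057851471007, 0.4715963830240071,
            0.47177061904221773, 0.4724834677763283],
    built: [0.18247668212279677, 0.18143231887370348,
            0.18060296680778265, 0.17929687118157744],
  },
}

LOOP = "loop_whileloop2"

# Minimum time for a target; for non-loop benchmarks the empty-loop
# minimum is subtracted (assumed meaning of the "*" marker).
def adjusted_min(raw, name, target)
  best = raw[name][target].min
  name == LOOP ? best : best - raw[LOOP][target].min
end

# Rows of [name, trunk seconds, built seconds, speedup ratio].
SUMMARY = RAW.keys.map do |name|
  trunk = adjusted_min(RAW, name, :trunk)
  built = adjusted_min(RAW, name, :built)
  [name, trunk.round(3), built.round(3), (trunk / built).round(3)]
end

SUMMARY.each { |row| puts row.join("\t") }
~~~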
-----------------------------------------------------------
benchmark results:
minimum results in each 4 measurements.
Execution time (sec)

name	|trunk	|built
--------+-------+-------
loop_whileloop2	|0.106	|0.106
vm2_streq1*	|0.366	|0.079
vm2_streq2*	|0.365	|0.073

Speedup ratio: compare with the result of `trunk' (greater is better)

name	|built
--------+-------
loop_whileloop2	|1.006
vm2_streq1*	|4.651
vm2_streq2*	|4.969
---
~~~
 benchmark/bm_vm2_streq2.rb |  6 ++++++
 compile.c                  | 20 +++++++++++++++++++-
 insns.def                  | 20 ++++++++++++++++++++
 test/ruby/test_string.rb   | 12 ++++++++----
 4 files changed, 53 insertions(+), 5 deletions(-)
 create mode 100644 benchmark/bm_vm2_streq2.rb
~~~

---Files--------------------------------
0001-optimize-yoda-literal-string.patch (6.23 KB)


-- 
https://bugs.ruby-lang.org/