Issue #10423 has been updated by Eric Wong.


 ko1 / atdot.net wrote:
 > Eric Wong wrote:
 > >  > My comments:
 > >  > 
 > >  > 1. (negative) Incompatibility
 > >  
 > >  > To solve this incompatibility, I have several ideas.
 > >  > 
 > >  > (1-1) Make new instruction to replace with receiver.
 > >  
 > >  I added a new opt_str_lit_yoda instruction.  It only handles one receiver
 > >  location for now, I'm not sure if other cases (e.g. gsub) can trigger
 > >  the incompatibility.
 > >  
 > >  patch: http://80x24.org/spew/m/38d41def78f16bca14fbc7fccd160ef0a8d06479.txt
 > >  
 > >  Right now, I do not optimize `str.gsub("bar", method_call)', yet,
 > >  only optimizing: `str.gsub(anything, "bar")'
 > >  
 > >  	"foo" == bar
 > >  
 > >  Compiles to:
 > >  
 > >    [:putobject, "foo"],
 > >    [:putself],
 > >    [:opt_send_without_block,
 > >     {:mid=>:bar, :flag=>280, :orig_argc=>0, :blockptr=>nil}],
 > >    [:opt_str_lit_yoda, ["foo", 17, 1]],
 > >    [:opt_eq, {:mid=>:==, :flag=>256, :orig_argc=>1, :blockptr=>nil}],
 > 
 > I raed the implementation of "opt_str_lit_yoda". Is this a not bitmap version (single string) of "opt_stringliteral"?
 > You don't use bitmap, because of implementation reason?
 
 Right, it only affected methods with one string literal arg.
 It was a little easier to implement without bitmap.
 
 > BTW, why the name "yoda"?
 
 Named after the syntax of the Star Wars character:
 https://en.wikipedia.org/wiki/Yoda_Conditions
 
 > >  > (1-3) Ignore such incompatibility because nobody write such code
 > >  
 > >  However, I hope we can ignore the strange case and new insn.
 > 
 > Me too.
 > (but also I think such restriction is challenging)
 > 
 > >  > 2. (negative) Live patching
 > >  
 > >  I've removed live patching for now:
 > >    http://80x24.org/spew/m/86739be08df5b4128e84bce15a2853cd7947a6fa.txt
 > >  It was a good learning experience, at least.
 > 
 > Thank you.
 > 
 > Please do not think that I stick to avoid live patching.
 > If we can have huge performance improvement, we need to consider about it.
 
 OK, we can also make a compile-time option.
 
 > >  > Have you measure performance improvement on non-microbenchmarks?
 > >  
 > >  "make rdoc" performance improved from 66.3 => 63.3 sec in original test
 > >  
 > >  However, the difference seems is smaller now: 66.1 => 64.9 sec
 > 
 > What does it means? what is "now"?
 
 I was not able to replicate the original results, probably because other
 changes in trunk affected performance.
 
 I'll check other issues tomorrow.
 
 About the discourse benchmark: it seems discourse and libraries already
 use .freeze frequently; so the performance may not be noticeable in
 application which is already optimized by Ruby experts.
 
 In our stdlib, I hope to make parts of r47073, r47072 unnecessary
 and revert the "literal".freeze calls.
 
 For instance, rack.git has commit dc53a8c26dc55d21240233b3d83d36efdef6e924
 ("Less allocated objects on each request"), which is ugly, but
 apparently important for performance.
 
 My goal is to make optimization transparent to non-experts
 (and prettier code for all)

----------------------------------------
Feature #10423: [PATCH] opt_str_lit*: avoid literal string allocations
https://bugs.ruby-lang.org/issues/10423#change-49897

* Author: Eric Wong
* Status: Open
* Priority: Normal
* Assignee: Koichi Sasada
* Category: core
* Target version: current: 2.2.0
----------------------------------------
Patch also downloadable at: http://80x24.org/spew/m/opt_str_lit-v4%40m.txt
Broken-out commits in the "opt_str_lit-v4" branch of
git://bogomips.org/ruby.git (and http://bogomips.org/ruby.git )

This obsoletes Features #10326, #10329, and #10333

Changes since -v3

* cleanup tests to be more DRY

* optimize allocations for Hash#fetch, String#r?partition

* split opt_str_lit into 3 instructions to reduce insn complexity
  (opt_str_freeze, opt_aset_with, opt_aref_with) are still gone.

  While having one instruction is appealing in some ways,
  hiding compile-time-resolvable branches behind it is
  misleading and it should be easier-to-follow when divided
  into three methods.

  1. opt_str_lit_recv: for "literal string" receivers
     This no longer optimizes the method dispatch away
     for String#freeze, the method call now always happens
     (but allocation is avoided)

  2. opt_str_lit_tmask: the most heavily-used
     This optimizes allocations away for "literal string" arguments

  3. opt_str_lit_data: currently only used for Time#strftime
     We may remove this (and the strftime optimization)
     if it is too rare to be useful.

Benchmarks:

  "make rdoc" performance improved from 66.3 => 63.3 sec

vm2_* benchmarks:

We take a speed hit from losing opt_aref_with/opt_aset_with
but I think the flexibility of the new instructions is worth it.

Speedup ratio: compare with the result of `trunk' (greater is better)

~~~
name    built
loop_whileloop2         1.000
vm2_array*              0.997
vm2_array_delete_lit*   2.204
vm2_array_include_lit*  2.372
vm2_bigarray*           1.002
vm2_bighash*            1.004
vm2_case*               1.030
vm2_defined_method*     1.013
vm2_dstr*               0.977
vm2_eval*               1.019
vm2_hash_aref_lit*      0.767
vm2_hash_aset_lit*      0.828
vm2_hash_delete_lit*    2.254
vm2_method*             1.010
vm2_method_missing*     1.010
vm2_method_with_block*  0.996
vm2_mutex*              0.954
vm2_newlambda*          1.010
vm2_poly_method*        0.974
vm2_poly_method_ov*     0.990
vm2_proc*               1.003
vm2_raise1*             0.992
vm2_raise2*             0.993
vm2_regexp*             1.053
vm2_send*               1.000
vm2_set_include_lit*    1.051
vm2_str_delete*         1.365
vm2_str_eq1*            2.282
vm2_str_eq2*            2.700
vm2_str_eqq1*           2.062
vm2_str_eqq2*           2.421
vm2_str_fmt*            1.271
vm2_str_gsub_bang_lit*  1.650
vm2_str_gsub_bang_re*   1.153
vm2_str_gsub_re*        1.110
vm2_str_plus1*          1.496
vm2_str_plus2*          1.618
vm2_str_tr_bang*        1.527
vm2_strcat*             1.406
vm2_super*              1.005
vm2_unif1*              0.985
vm2_zsuper*             0.992

-----------------------------------------------------------
raw data:

[["loop_whileloop2",
  [[0.09168246667832136,
    0.09153273981064558,
    0.09135112632066011,
    0.09144351910799742,
    0.0913569862022996],
   [0.09150420501828194,
    0.09135987050831318,
    0.09148290101438761,
    0.09135456290096045,
    0.09141282178461552]]],
 ["vm2_array",
  [[0.6448190519586205,
    0.6377620408311486,
    0.6456073289737105,
    0.636782786808908,
    0.64611473120749],
   [0.6434593833982944,
    0.6479324344545603,
    0.6385641889646649,
    0.6491997549310327,
    0.6610294701531529]]],
 ["vm2_array_delete_lit",
  [[0.44069126434624195,
    0.43327025789767504,
    0.44154794327914715,
    0.43351241294294596,
    0.4343419009819627],
   [0.24649471696466208,
    0.24813515041023493,
    0.2801106022670865,
    0.24796569626778364,
    0.25444996636360884]]],
 ["vm2_array_include_lit",
  [[0.42663843277841806,
    0.42771619465202093,
    0.4342082254588604,
    0.4192065382376313,
    0.4339462127536535],
   [0.24434014037251472,
    0.23243559524416924,
    0.23569373413920403,
    0.2295856410637498,
    0.23521045502275229]]],
 ["vm2_bigarray",
  [[5.9676302736625075,
    5.921381871215999,
    5.893002421595156,
    5.898605763912201,
    5.92879247572273],
   [5.918988090008497,
    5.953728860244155,
    5.910121874883771,
    5.894348439760506,
    5.882020344957709]]],
 ["vm2_bighash",
  [[3.5214368999004364,
    3.5914944410324097,
    3.5528544737026095,
    3.6002828497439623,
    3.595806434750557],
   [3.6454197568818927,
    3.5987599324434996,
    3.631806828081608,
    3.5076274275779724,
    3.5060981484130025]]],
 ["vm2_case",
  [[0.16305183991789818,
    0.16213963273912668,
    0.1626761993393302,
    0.1703133536502719,
    0.16371900774538517],
   [0.1630518063902855,
    0.16097174491733313,
    0.16129930969327688,
    0.16106242593377829,
    0.16007240489125252]]],
 ["vm2_defined_method",
  [[2.4309032559394836,
    2.4551019677892327,
    2.4333217879757285,
    2.4307468980550766,
    2.4270543046295643],
   [2.3963797464966774,
    2.4110122229903936,
    2.592901307158172,
    2.7106671230867505,
    2.6869526272639632]]],
 ["vm2_dstr",
  [[1.206045768223703,
    1.1181026576086879,
    1.29782950039953,
    1.074333599768579,
    1.060404953546822],
   [1.0911998134106398,
    1.3560991054400802,
    1.096764899790287,
    1.0906088771298528,
    1.0831655990332365]]],
 ["vm2_eval",
  [[12.45692038256675,
    12.536506623961031,
    12.868694677948952,
    12.425183075480163,
    12.422813648357987],
   [12.490455374121666,
    12.190652490593493,
    12.194032781757414,
    12.581901006400585,
    13.230092607438564]]],
 ["vm2_hash_aref_lit",
  [[0.2547442754730582,
    0.25785687286406755,
    0.2552897445857525,
    0.2547551356256008,
    0.2546374099329114],
   [0.3131725639104843,
    0.3109410647302866,
    0.30428329203277826,
    0.30984809901565313,
    0.30932857654988766]]],
 ["vm2_hash_aset_lit",
  [[0.3067243071272969,
    0.307466727681458,
    0.3059097733348608,
    0.32101333420723677,
    0.3114894386380911],
   [0.3522613048553467,
    0.3512485157698393,
    0.3521018158644438,
    0.35033643897622824,
    0.35163887683302164]]],
 ["vm2_hash_delete_lit",
  [[0.4271302009001374,
    0.5321928421035409,
    0.42596039827913046,
    0.4215333154425025,
    0.4221466425806284],
   [0.2427450241521001,
    0.23863658774644136,
    0.23870530910789967,
    0.23786015156656504,
    0.24112990126013756]]],
 ["vm2_method",
  [[1.2295677298679948,
    1.2423510188236833,
    1.2246184339746833,
    1.213058383204043,
    1.3788639837875962],
   [1.2468778286129236,
    1.2022472834214568,
    1.319407593458891,
    1.3409598469734192,
    1.246872222982347]]],
 ["vm2_method_missing",
  [[1.8783203130587935,
    2.0230442117899656,
    1.7176201958209276,
    1.790073310956359,
    1.7432779222726822],
   [1.710430753417313,
    1.8266426706686616,
    1.7299889298155904,
    1.7529844669625163,
    1.7007915144786239]]],
 ["vm2_method_with_block",
  [[1.3849838990718126,
    1.345238379202783,
    1.3416069420054555,
    1.3473590090870857,
    1.3418097402900457],
   [1.3468122174963355,
    2.0074896160513163,
    1.3560442496091127,
    1.353834005072713,
    1.3525155214592814]]],
 ["vm2_mutex",
  [[0.6452304208651185,
    0.7461022492498159,
    0.7083938084542751,
    0.7519227871671319,
    0.7953951777890325],
   [0.6722144978120923,
    0.6991892093792558,
    0.699041954241693,
    0.8469044668599963,
    0.7956293905153871]]],
 ["vm2_newlambda",
  [[0.7562470361590385,
    0.7670294912531972,
    0.7572819162160158,
    0.8686427380889654,
    0.772026221267879],
   [0.7586023258045316,
    0.7559101320803165,
    0.7497995179146528,
    0.7534453617408872,
    0.7510695895180106]]],
 ["vm2_poly_method",
  [[2.1563803972676396,
    1.9838355090469122,
    2.0223182002082467,
    1.9927900172770023,
    2.055130822584033],
   [2.03470276389271,
    2.187690651975572,
    2.162980039604008,
    2.143593772314489,
    2.04829403758049]]],
 ["vm2_poly_method_ov",
  [[0.22944753244519234,
    0.2304667690768838,
    0.22847569175064564,
    0.22843772917985916,
    0.22901444043964148],
   [0.2298243623226881,
    0.2334523554891348,
    0.23789969086647034,
    0.23442872054874897,
    0.23049725405871868]]],
 ["vm2_proc",
  [[0.46109406277537346,
    0.4476210633292794,
    0.44980251509696245,
    0.4451689962297678,
    0.4452860997989774],
   [0.4545932151377201,
    0.4440324120223522,
    0.44979235529899597,
    0.4695265833288431,
    0.4635683596134186]]],
 ["vm2_raise1",
  [[4.989030849188566,
    4.951415436342359,
    4.755166156217456,
    4.824490828439593,
    4.795467809773982],
   [4.797340838238597,
    4.799810292199254,
    4.792353850789368,
    4.966345896013081,
    5.099304006434977]]],
 ["vm2_raise2",
  [[7.062344457022846,
    7.118377918377519,
    7.290325260721147,
    7.1321266973391175,
    7.341989707201719],
   [7.456813280470669,
    7.130581725388765,
    7.1143358228728175,
    7.1227226899936795,
    7.2687620194628835]]],
 ["vm2_regexp",
  [[1.0923050865530968,
    1.0876065697520971,
    1.0792832653969526,
    1.0728685557842255,
    1.098634515888989],
   [1.172722346149385,
    1.0436540711671114,
    1.0638451064005494,
    1.0391646642237902,
    1.023901374079287]]],
 ["vm2_send",
  [[0.3253856906667352,
    0.31978365778923035,
    0.32439478673040867,
    0.3290995704010129,
    0.32464261166751385],
   [0.31982277520000935,
    0.32479251362383366,
    0.31968420650810003,
    0.3205655198544264,
    0.32221281714737415]]],
 ["vm2_set_include_lit",
  [[0.7017893251031637,
    0.7203339897096157,
    0.714060970582068,
    0.7558876499533653,
    0.7028816910460591],
   [0.6870902189984918,
    0.6755420127883554,
    0.6719080805778503,
    0.6900417571887374,
    1.1408466212451458]]],
 ["vm2_str_delete",
  [[0.671600878238678,
    0.6570039531216025,
    0.6776775699108839,
    0.6644531870260835,
    0.97628088388592],
   [0.508894819766283,
    0.5154125597327948,
    0.5064034061506391,
    0.5130562232807279,
    0.5056716529652476]]],
 ["vm2_str_eq1",
  [[0.42520802468061447,
    0.42302330397069454,
    0.42416366562247276,
    0.4216942973434925,
    0.4239108320325613],
   [0.23870286531746387,
    0.23699089046567678,
    0.2361262608319521,
    0.31979569140821695,
    0.26054865028709173]]],
 ["vm2_str_eq2",
  [[0.42534991446882486,
    0.4238837053999305,
    0.4327298905700445,
    0.42395070381462574,
    0.42501260805875063],
   [0.21500772424042225,
    0.21449210867285728,
    0.2152256891131401,
    0.21711387671530247,
    0.2152888299897313]]],
 ["vm2_str_eqq1",
  [[0.45151620265096426,
    0.4597599729895592,
    0.45280721783638,
    0.4582480965182185,
    0.45380780193954706],
   [0.26729187835007906,
    0.26626693457365036,
    0.2660442851483822,
    0.26674115750938654,
    0.2721241358667612]]],
 ["vm2_str_eqq2",
  [[0.45260279811918736,
    0.44771482422947884,
    0.4570333072915673,
    0.5335573730990291,
    0.4592890217900276],
   [0.2385527715086937,
    0.24223692156374454,
    0.24510123394429684,
    0.2744274791330099,
    0.2447566892951727]]],
 ["vm2_str_fmt",
  [[2.614012835547328,
    2.6062369514256716,
    2.5333361653611064,
    2.50343619287014,
    2.641179056838155],
   [1.9900542199611664,
    2.0983974151313305,
    1.9898552373051643,
    2.001521130092442,
    1.9963106652721763]]],
 ["vm2_str_gsub_bang_lit",
  [[1.1346126468852162,
    1.134159336797893,
    1.1467241132631898,
    1.170719631947577,
    1.149439207278192],
   [0.737271074205637,
    0.7308303005993366,
    0.730314377695322,
    0.7260715514421463,
    0.7231733025982976]]],
 ["vm2_str_gsub_bang_re",
  [[1.4583028750494123,
    1.4592840988188982,
    1.4590299287810922,
    1.4751051738858223,
    1.492074583657086],
   [1.3006651625037193,
    1.3510901927947998,
    1.2767879888415337,
    1.3884455161169171,
    1.277703725732863]]],
 ["vm2_str_gsub_re",
  [[1.7273316802456975,
    1.7103399503976107,
    1.7015334982424974,
    1.690703290514648,
    1.694624281488359],
   [1.5322920577600598,
    1.5391994584351778,
    1.5388264516368508,
    1.5372542077675462,
    1.5352015374228358]]],
 ["vm2_str_plus1",
  [[0.653569158166647,
    0.6917095249518752,
    0.6521412674337626,
    0.6697740163654089,
    0.7296328162774444],
   [0.46872936747968197,
    0.48235206957906485,
    0.46762560587376356,
    0.4766495209187269,
    0.4661587802693248]]],
 ["vm2_str_plus2",
  [[0.6671840893104672,
    0.6488652648404241,
    0.6509246686473489,
    0.6638444662094116,
    0.6557880509644747],
   [0.43602617271244526,
    0.4365514535456896,
    0.45153723005205393,
    0.437131704762578,
    0.44073084741830826]]],
 ["vm2_str_tr_bang",
  [[2.6861952347680926,
    2.816385295242071,
    2.919709806330502,
    2.712329135276377,
    2.6914397440850735],
   [1.79120559617877,
    1.8011440439149737,
    1.7935738116502762,
    1.7961191991344094,
    1.794896787032485]]],
 ["vm2_strcat",
  [[0.7171547841280699,
    0.7207952421158552,
    0.7209635498002172,
    0.7520132083445787,
    0.7179927034303546],
   [0.5424934774637222,
    0.5365820089355111,
    0.5409346511587501,
    0.5369464093819261,
    0.5396620389074087]]],
 ["vm2_super",
  [[0.43017072789371014,
    0.4671447016298771,
    0.43235956504940987,
    0.43128165043890476,
    0.4320727000012994],
   [0.42843823600560427,
    0.4359660716727376,
    0.46027328819036484,
    0.43320534005761147,
    0.4283680962398648]]],
 ["vm2_unif1",
  [[0.22890623100101948,
    0.23932419251650572,
    0.2282293662428856,
    0.2281231889501214,
    0.22829711250960827],
   [0.23160281032323837,
    0.23210297524929047,
    0.23051423579454422,
    0.23075581435114145,
    0.2301806192845106]]],
 ["vm2_zsuper",
  [[0.44561540707945824,
    0.4586751004680991,
    0.468279717490077,
    0.44244454242289066,
    0.4582256320863962],
   [0.5099742524325848,
    0.44541092310100794,
    0.4480971107259393,
    0.4709290647879243,
    0.5650665555149317]]]]

Elapsed time: 646.078768307 (sec)
-----------------------------------------------------------
benchmark results:
minimum results in each 5 measurements.
Execution time (sec)
name	trunk	built
loop_whileloop2	0.091	0.091
vm2_array*	0.545	0.547
vm2_array_delete_lit*	0.342	0.155
vm2_array_include_lit*	0.328	0.138
vm2_bigarray*	5.802	5.791
vm2_bighash*	3.430	3.415
vm2_case*	0.071	0.069
vm2_defined_method*	2.336	2.305
vm2_dstr*	0.969	0.992
vm2_eval*	12.331	12.099
vm2_hash_aref_lit*	0.163	0.213
vm2_hash_aset_lit*	0.215	0.259
vm2_hash_delete_lit*	0.330	0.147
vm2_method*	1.122	1.111
vm2_method_missing*	1.626	1.609
vm2_method_with_block*	1.250	1.255
vm2_mutex*	0.554	0.581
vm2_newlambda*	0.665	0.658
vm2_poly_method*	1.892	1.943
vm2_poly_method_ov*	0.137	0.138
vm2_proc*	0.354	0.353
vm2_raise1*	4.664	4.701
vm2_raise2*	6.971	7.023
vm2_regexp*	0.982	0.933
vm2_send*	0.228	0.228
vm2_set_include_lit*	0.610	0.581
vm2_str_delete*	0.566	0.414
vm2_str_eq1*	0.330	0.145
vm2_str_eq2*	0.333	0.123
vm2_str_eqq1*	0.360	0.175
vm2_str_eqq2*	0.356	0.147
vm2_str_fmt*	2.412	1.899
vm2_str_gsub_bang_lit*	1.043	0.632
vm2_str_gsub_bang_re*	1.367	1.185
vm2_str_gsub_re*	1.599	1.441
vm2_str_plus1*	0.561	0.375
vm2_str_plus2*	0.558	0.345
vm2_str_tr_bang*	2.595	1.700
vm2_strcat*	0.626	0.445
vm2_super*	0.339	0.337
vm2_unif1*	0.137	0.139
vm2_zsuper*	0.351	0.354

---
 benchmark/bm_vm2_array_delete_lit.rb  |   6 +
 benchmark/bm_vm2_array_include_lit.rb |   6 +
 benchmark/bm_vm2_hash_aref_lit.rb     |   6 +
 benchmark/bm_vm2_hash_aset_lit.rb     |   6 +
 benchmark/bm_vm2_hash_delete_lit.rb   |   6 +
 benchmark/bm_vm2_set_include_lit.rb   |   7 +
 benchmark/bm_vm2_str_delete.rb        |   6 +
 benchmark/bm_vm2_str_eq1.rb           |   6 +
 benchmark/bm_vm2_str_eq2.rb           |   6 +
 benchmark/bm_vm2_str_eqq1.rb          |   6 +
 benchmark/bm_vm2_str_eqq2.rb          |   6 +
 benchmark/bm_vm2_str_fmt.rb           |   5 +
 benchmark/bm_vm2_str_gsub_bang_lit.rb |   6 +
 benchmark/bm_vm2_str_gsub_bang_re.rb  |   6 +
 benchmark/bm_vm2_str_gsub_re.rb       |   6 +
 benchmark/bm_vm2_str_plus1.rb         |   6 +
 benchmark/bm_vm2_str_plus2.rb         |   6 +
 benchmark/bm_vm2_str_tr_bang.rb       |   7 +
 benchmark/bm_vm2_strcat.rb            |   7 +
 common.mk                             |  18 +-
 compile.c                             | 330 ++++++++++++++++++++++++++++++----
 defs/id.def                           |  38 ++++
 defs/opt_method.def                   |  90 ++++++++++
 insns.def                             | 275 ++++++++++++++++------------
 template/opt_method.h.tmpl            | 111 ++++++++++++
 template/opt_method.inc.tmpl          |  42 +++++
 test/-ext-/symbol/test_type.rb        |   1 +
 test/objspace/test_objspace.rb        |   1 +
 test/ruby/envutil.rb                  |  20 +++
 test/ruby/test_hash.rb                |   2 +
 test/ruby/test_iseq.rb                |   1 +
 test/ruby/test_optimization.rb        | 162 ++++++++++++++++-
 vm.c                                  |  67 +------
 vm_core.h                             |  44 +----
 vm_insnhelper.c                       |   8 +-
 vm_insnhelper.h                       |  27 +++
 36 files changed, 1089 insertions(+), 264 deletions(-)
~~~


---Files--------------------------------
opt_str_lit-v4.patch (76.7 KB)
opt_str_lit-v5.patch (78.2 KB)


-- 
https://bugs.ruby-lang.org/