Issue #16258 has been updated by ko1 (Koichi Sasada).

Assignee set to ko1 (Koichi Sasada)

Thank you for your patch.

Conclusion: OK.

Points:

* The current implementation separates ci and cc for CoW friendliness (ci is immutable data and cc is mutable data). However, there are no measurements of how this affects CoW friendliness. Because ci is immutable data, we could pre-compile it, which would improve startup time. However, there is no implementation of that.
* For Guild, I will rewrite the inline cache (cc) for atomicity. However, Ruby 2.7 will not have that change, so this patch is accepted for Ruby 2.7 only.


----------------------------------------
Misc #16258: [PATCH] Combine call info and cache to speed up method invocation
https://bugs.ruby-lang.org/issues/16258#change-82104

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
----------------------------------------
Proposed change: https://github.com/ruby/ruby/pull/2564

To perform a regular method call, the VM needs two structs, `rb_call_info`
and `rb_call_cache`. At the moment, we allocate these two structures in
separate buffers. In the worst case, the CPU needs to read 4 cache lines to
complete a method call. Putting the two structures together reduces the
maximum number of cache line reads to 2. 
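
The cache line arithmetic above can be checked with a minimal C sketch. It assumes 64-byte cache lines and uses placeholder sizes (16 and 32 bytes) rather than the real sizes of `rb_call_info` and `rb_call_cache`; `cache_lines_touched` and `worst_case_lines` are hypothetical helpers, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u /* assumption: typical x86-64 cache line size */

/* Hypothetical helper: number of cache lines covered by `size` bytes
 * starting at address `addr`. */
size_t
cache_lines_touched(uintptr_t addr, size_t size)
{
    assert(size > 0);
    uintptr_t first = addr / CACHE_LINE_SIZE;
    uintptr_t last = (addr + size - 1) / CACHE_LINE_SIZE;
    return (size_t)(last - first + 1);
}

/* Hypothetical helper: worst case over every starting offset in a line. */
size_t
worst_case_lines(size_t size)
{
    size_t worst = 0;
    for (uintptr_t off = 0; off < CACHE_LINE_SIZE; off++) {
        size_t n = cache_lines_touched(off, size);
        if (n > worst) worst = n;
    }
    return worst;
}
```

With these placeholder sizes, two separately allocated structures can each straddle a line boundary (2 + 2 = 4 lines), while a single 48-byte combined structure touches at most 2 lines.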

Combining the structures also saves 8 bytes per call site as the current
layout uses separate pointers for the call info and the call cache. This
change saves about 2 MiB on Discourse.
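
As a rough sketch of where the per-call-site saving comes from, the C fragment below compares two pointers per call site against one. All struct names here are hypothetical placeholders rather than the definitions in the Ruby source, and the 8-byte figure assumes 64-bit pointers.

```c
#include <stddef.h>

/* Opaque placeholders standing in for the VM structures. */
struct call_info_sketch;
struct call_cache_sketch;
struct call_data_sketch; /* hypothetical combined info + cache */

/* Before: each call site references the two structures separately. */
struct operands_before {
    struct call_info_sketch *ci;
    struct call_cache_sketch *cc;
};

/* After: each call site references one combined structure. */
struct operands_after {
    struct call_data_sketch *cd;
};

/* Bytes saved per call site: one pointer (8 bytes on a 64-bit build). */
size_t
pointer_bytes_saved_per_call_site(void)
{
    return sizeof(struct operands_before) - sizeof(struct operands_after);
}

/* Total saving for a given number of call sites. */
size_t
total_bytes_saved(size_t call_sites)
{
    return call_sites * pointer_bytes_saved_per_call_site();
}
```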

The Optcarrot benchmark receives a performance improvement from this patch. I
collected the following results using `make install` binaries compiled with
`-DRUBY_NDEBUG`, with a sample size of 50 for each category:

|       | master-a5245c | after patch | speed-up ratio |
|-------|---------------|-------------|----------------|
| plain | 42.39         | 50.17       | 18.35%         |
| jit   | 71.72         | 72.73       | 1.41%          |


These are median FPS from the benchmark output. For raw benchmark results and
basic stats, see
https://gist.github.com/XrXr/ce5cb7cf2c3c4d29e58c919fa5c86b33. I took these
results with an i7-8750H CPU @ 2.20GHz on a 2018 MacBook Pro. I also ran the
benchmark with an AMD 2400G running Arch Linux and observed a 3% improvement
without the JIT.
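
The speed-up ratios in the table follow from the FPS figures as `after / before - 1`. A small C sketch of that arithmetic (`speedup_percent` is a hypothetical helper name):

```c
/* Relative improvement of `after_fps` over `before_fps`, in percent. */
double
speedup_percent(double before_fps, double after_fps)
{
    return (after_fps / before_fps - 1.0) * 100.0;
}
```

For example, `speedup_percent(42.39, 50.17)` evaluates to roughly 18.35 and `speedup_percent(71.72, 72.73)` to roughly 1.41, matching the table.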

## Complications
 - A new instruction attribute `comptime_sp_inc` is introduced to calculate
   the SP increase at compile time without using call caches. At compile time, a
   `TS_CALLDATA` operand points to a call info struct, but at runtime, the
   same operand points to a call data struct. Instructions that explicitly
   define `sp_inc` also need to define `comptime_sp_inc`.
 - MJIT code for copying the call cache becomes slightly more complicated.
 - This changes the bytecode format, which might break existing tools.
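
To illustrate the first point, here is a minimal C sketch of how a stack pointer delta for a send-like instruction could be derived from call info alone, with no call cache involved. It is illustrative only: the struct, the flag constant and its value, and the function name are hypothetical placeholders, not the definitions used in the patch. The sketch assumes a send pops the receiver, the arguments, and an optional block argument, then pushes one return value.

```c
/* Placeholder flag value; the real VM flag constants may differ. */
#define CALL_FLAG_BLOCKARG_SKETCH 0x02u

/* Hypothetical stand-in for the immutable call info. */
struct call_info_sketch {
    unsigned int flag;
    int orig_argc;
};

/* SP change of a send-like instruction, computed from call info only. */
int
sp_inc_of_send_sketch(const struct call_info_sketch *ci)
{
    int argb = (ci->flag & CALL_FLAG_BLOCKARG_SKETCH) ? 1 : 0;
    int popped = ci->orig_argc + argb + 1; /* arguments, block arg, receiver */
    int pushed = 1;                        /* return value */
    return pushed - popped;
}
```

Under these assumptions, a call with two arguments and no block argument has an SP change of -2.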


I think this patch offers a good general performance boost for a manageable amount
of code change.



