Issue #16258 has been reported by alanwu (Alan Wu).

----------------------------------------
Misc #16258: [PATCH] Combine call info and cache to speed up method invocation
https://bugs.ruby-lang.org/issues/16258

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee: 
----------------------------------------
Proposed change: https://github.com/ruby/ruby/pull/2564

To perform a regular method call, the VM needs two structs, `rb_call_info`
and `rb_call_cache`. At the moment, we allocate these two structures in
separate buffers. In the worst case, the CPU needs to read 4 cache lines to
complete a method call. Putting the two structures together reduces the
maximum number of cache line reads to 2. 

Combining the structures also saves 8 bytes per call site as the current
layout uses separate pointers for the call info and the call cache. This
change saves about 2 MiB on Discourse.
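
As a rough illustration of the two points above, here is a minimal sketch of the layout change. The struct fields, the `call_site_*` names, and the 64-byte cache line size are simplified assumptions for illustration, not the definitions used in the VM or in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CACHE_LINE_SIZE = 64 }; /* assumption: 64-byte cache lines */

/* Hypothetical, simplified stand-ins for the VM's structs. */
struct call_info  { uintptr_t mid; unsigned int flag; int orig_argc; };
struct call_cache { uintptr_t method_state; uintptr_t class_serial; const void *me; };

/* Before: each call site holds two pointers into two separate buffers. */
struct call_site_split { struct call_info *ci; struct call_cache *cc; };

/* After: one combined record, reached through a single pointer. */
struct call_data { struct call_cache cc; struct call_info ci; };
struct call_site_combined { struct call_data *cd; };

/* Number of cache lines covered by `size` bytes starting at byte offset `start`. */
static inline size_t lines_touched(size_t start, size_t size)
{
    assert(size > 0);
    return (start + size - 1) / CACHE_LINE_SIZE - start / CACHE_LINE_SIZE + 1;
}

/* Upper bound over all placements: the object begins on the last byte of a
 * line. Real allocations are aligned, so this can only overestimate. */
static inline size_t worst_case_lines(size_t size)
{
    return lines_touched(CACHE_LINE_SIZE - 1, size);
}
```

With these stand-in sizes, each separate record can straddle a line boundary, for up to 2 + 2 = 4 lines, while the combined record straddles at most one boundary, for 2 lines. The call site itself shrinks by `sizeof(void *)`, which is 8 bytes on a typical 64-bit platform; at that rate, a saving of about 2 MiB would correspond to roughly 262,000 call sites.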

The Optcarrot benchmark receives a performance improvement from this patch. I
collected the following results using `make install` binaries compiled with
`-DRUBY_NDEBUG`, with a sample size of 50 for each category:

|       | master-a5245c | after patch | speed-up ratio |
|-------|---------------|-------------|----------------|
| plain | 42.39         | 50.17       | 18.35%         |
| jit   | 71.72         | 72.73       | 1.41%          |


These are median FPS from the benchmark output. For raw benchmark results and
basic stats, see
https://gist.github.com/XrXr/ce5cb7cf2c3c4d29e58c919fa5c86b33. I took these
results with an i7-8750H CPU @ 2.20GHz on a 2018 MacBook Pro. I also ran the
benchmark on an AMD 2400G running Arch Linux and observed a 3% improvement
without the JIT.
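
The speed-up ratio column in the table is consistent with the relative change in FPS, `after / before - 1`. A small sketch of that arithmetic follows; the function name is made up for illustration.

```c
#include <assert.h>

/* Relative improvement of `after_fps` over `before_fps`, in percent. */
static inline double speedup_percent(double before_fps, double after_fps)
{
    assert(before_fps > 0.0);
    return (after_fps / before_fps - 1.0) * 100.0;
}
```

Applied to the table rows, `speedup_percent(42.39, 50.17)` and `speedup_percent(71.72, 72.73)` round to the 18.35% and 1.41% figures reported there.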

## Complications
 - A new instruction attribute, `comptime_sp_inc`, is introduced to calculate
   the SP increase at compile time without using call caches. At compile time, a
   `TS_CALLDATA` operand points to a call info struct, but at runtime, the
   same operand points to a call data struct. Instructions that explicitly
   define `sp_inc` also need to define `comptime_sp_inc`.
 - The MJIT code for copying call caches becomes slightly more complicated.
 - This changes the bytecode format, which might break existing tools.
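
To make the first point more concrete, here is a minimal sketch of why two attributes are needed when the same operand means different things at compile time and at runtime. The structs, the flag value, and all function names below are simplified, hypothetical stand-ins; they are not the definitions from the patch.

```c
#include <assert.h>

/* Hypothetical flag; the real VM defines its own call flags. */
enum { CALL_ARGS_BLOCKARG = 1 << 1 };

/* Hypothetical, simplified stand-ins for the VM's structs. */
struct call_info  { unsigned int flag; int orig_argc; };
struct call_cache { const void *method_entry; };
struct call_data  { struct call_cache cc; struct call_info ci; };

/* A send-like instruction pops the receiver, the arguments, and an optional
 * block argument, then pushes one return value. */
static inline int sp_inc_of_send(const struct call_info *ci)
{
    int block_arg = (ci->flag & CALL_ARGS_BLOCKARG) ? 1 : 0;
    assert(ci->orig_argc >= 0);
    return 1 - (1 + ci->orig_argc + block_arg);
}

/* Compile time: the operand points at a bare call info struct. */
static inline int comptime_sp_inc_send(const void *operand)
{
    return sp_inc_of_send((const struct call_info *)operand);
}

/* Runtime: the same operand slot points at a combined call data struct,
 * so the call info has to be reached through it. */
static inline int runtime_sp_inc_send(const void *operand)
{
    return sp_inc_of_send(&((const struct call_data *)operand)->ci);
}
```

Both variants compute the same stack effect and never read the call cache; they differ only in how they interpret the operand. For a call with two arguments and no block argument, either one yields -2.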


I think this patch offers a good general performance boost for a manageable amount
of code change.