Interesting challenge.

I suspect that this improvement comes only from extending the embed
area, not from making RVALUE cache-line friendly.

Could you try the same measurement
https://github.com/ruby/ruby/pull/495#issuecomment-31580604
with only dummy padding added to RVALUE (without extending the embed
area), if it is easy to try?
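
To make the three cases I want to compare concrete, here is a rough
sketch in Python. It assumes a 64-bit build (8-byte VALUE), a 2-word
header (flags and klass), and that the patch grows a slot from 5 words
to 8 words. The layout names and the 6-word embed area are my own
illustrative guesses, not taken from the patch.

```python
from dataclasses import dataclass

WORD = 8          # assumption: 64-bit build, sizeof(VALUE) == 8
HEADER_WORDS = 2  # assumption: flags + klass


@dataclass(frozen=True)
class SlotLayout:
    name: str
    embed_words: int    # words usable for embedded data
    padding_words: int  # unused words that only change the slot size

    @property
    def slot_bytes(self) -> int:
        return (HEADER_WORDS + self.embed_words + self.padding_words) * WORD

    @property
    def embed_bytes(self) -> int:
        return self.embed_words * WORD


# Hypothetical variants to benchmark against each other.
LAYOUTS = [
    SlotLayout("trunk", embed_words=3, padding_words=0),
    SlotLayout("padding_only", embed_words=3, padding_words=3),
    SlotLayout("extended_embed", embed_words=6, padding_words=0),
]

for layout in LAYOUTS:
    print(f"{layout.name:15s} slot={layout.slot_bytes}B embed={layout.embed_bytes}B")
```

If "padding_only" is as fast as "extended_embed", the slot size is what
matters. If it is no faster than "trunk", the gain comes from the larger
embed area.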

If your assumption:

> The problem is, 5 is a prime number. So cache mechanisms of any size
> cannot store this struct efficiently. Most notably, CPUs have been
> equipped with data caches since their mid age; Ruby's objects do not
> suit there. That does not always mean a breakage but significant
> slowdown is happening.

is true, the performance should improve even without extending the
embed area. At least, I guess the improvement in vm3_gc comes mainly
from lightweight Hash allocation.
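
For reference, the cache line argument itself can be checked with a
small calculation. This is only a sketch under assumptions: 64-byte
cache lines, slots packed back to back, and a heap page that starts on
a cache line boundary. The function name is hypothetical.

```python
from math import gcd


def straddling_slots(slot_bytes: int, line_bytes: int = 64):
    """Count slots that span two cache lines within one repeating period.

    Assumes slot 0 starts exactly on a cache line boundary.
    Returns (straddling, total) for one period of lcm(slot, line) bytes.
    """
    period = slot_bytes * line_bytes // gcd(slot_bytes, line_bytes)
    total = period // slot_bytes
    straddling = 0
    for i in range(total):
        first_line = (i * slot_bytes) // line_bytes
        last_line = (i * slot_bytes + slot_bytes - 1) // line_bytes
        if first_line != last_line:
            straddling += 1
    return straddling, total


print(straddling_slots(40))  # 5-word slot on a 64-bit build
print(straddling_slots(64))  # 8-word slot on a 64-bit build
```

Under these assumptions, 4 of every 8 consecutive 40-byte slots cross a
cache line boundary, and none of the 64-byte slots do. Whether that
difference is what shows up in the benchmark is exactly what the
padding-only measurement would tell us.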

If the assumption "only allocation overhead is the issue" is true, we
can discuss lightweight memory allocation techniques (which include
increasing the RVALUE size and expanding the embed area). If cache line
mismatch is the issue, as you said, we can also consider cache lines in
other areas.


(2014/01/04 7:15), shyouhei (Shyouhei Urabe) wrote:
> 
> Issue #9362 has been reported by shyouhei (Shyouhei Urabe).
> 
> ----------------------------------------
> Feature #9362: Minimize cache misshit to gain optimal speed
> https://bugs.ruby-lang.org/issues/9362
> 
> Author: shyouhei (Shyouhei Urabe)
> Status: Assigned
> Priority: Normal
> Assignee: matz (Yukihiro Matsumoto)
> Category: core
> Target version: current: 2.2.0
> 
> 
> Main features:
> 
>   - Applies cleanly onto trunk,
>   - Passes tests,
>   - RUNS FASTER.
> 
> Detailed concepts, the patches, and benchmark results can be
> obtained from: https://github.com/ruby/ruby/pull/495
> 
> 


-- 
// SASADA Koichi at atdot dot net