Issue #12599 has been updated by Noah Gibbs.


Based on diffs of long profiling runs of optcarrot, I think the following functions aren't inlined by default, and (I suspect) should be. Working on code changes for that now.

rb_get_alloc_func, rb_ary_rotate, rb_ary_modify.

I'm also seeing big changes in the other direction (they take *more* time with the higher threshold) in rb_ary_cmp and rb_yield, which suggests that something they call, rather than those functions themselves, is getting inlined into them and making a significant difference.

I don't think inlining just those three functions will be most of the 5%-7% difference, though. I'll keep looking for big differences. Otherwise it may be a lot of small differences, which will be harder to track down :-/
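
One way to hunt for those differences is to compare each function's share of total time between the two profiles and rank by the change. This is only a minimal sketch: it assumes each profile has already been reduced to a mapping of function name to sample count, the helper name is made up, and the numbers are placeholders rather than measurements from these runs.

```python
def rank_profile_deltas(baseline, inlined, min_share=0.001):
    """Rank functions by absolute change in their share of total time.

    baseline, inlined: dicts mapping function name -> time (any unit).
    Returns (name, baseline_share, inlined_share, delta) tuples,
    largest absolute delta first. A function missing from one profile
    (for example, because it was fully inlined) counts as 0 there.
    """
    base_total = sum(baseline.values())
    inl_total = sum(inlined.values())
    rows = []
    for name in set(baseline) | set(inlined):
        b = baseline.get(name, 0.0) / base_total
        i = inlined.get(name, 0.0) / inl_total
        if max(b, i) >= min_share:
            rows.append((name, b, i, i - b))
    rows.sort(key=lambda r: (-abs(r[3]), r[0]))
    return rows


# Hypothetical sample counts for illustration, NOT real optcarrot data.
baseline = {"rb_ary_rotate": 40, "rb_ary_cmp": 10, "rb_yield": 950}
inlined = {"rb_ary_cmp": 30, "rb_yield": 970}

for name, b, i, delta in rank_profile_deltas(baseline, inlined):
    print(f"{name:16s} {b:7.2%} -> {i:7.2%} ({delta:+.2%})")
```

A function that disappears from the second profile has usually had its time folded into its callers, so a caller that appears to get slower may just be carrying an inlined callee.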


----------------------------------------
Bug #12599: For CLang, increase inline-threshold to get 7%-10% speedup of optcarrot
https://bugs.ruby-lang.org/issues/12599#change-60285

* Author: Noah Gibbs
* Status: Open
* Priority: Normal
* Assignee: 
* ruby -v: 2.4.0dev
* Backport: 2.1: UNKNOWN, 2.2: UNKNOWN, 2.3: UNKNOWN
----------------------------------------
Here's a patch to set -inline-threshold where it's supported. The option is Clang-only, so I think this mostly affects Mac OS builds.
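
The patch is attached rather than quoted, so the following is not its contents. It is a rough sketch of the "where it's supported" idea: try compiling a trivial file with the flag and only use the flag if that succeeds. The helper name is hypothetical, and the `-mllvm -inline-threshold=N` spelling is an assumption about how the option is passed to Clang.

```python
import os
import shutil
import subprocess
import tempfile


def compiler_accepts(compiler, flags):
    """Return True if `compiler` can compile a trivial C file with `flags`."""
    if shutil.which(compiler) is None:
        return False  # compiler not installed, so nothing to probe
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        obj = os.path.join(tmp, "probe.o")
        with open(src, "w") as f:
            f.write("int main(void) { return 0; }\n")
        result = subprocess.run(
            [compiler, *flags, "-c", src, "-o", obj],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0


# Assumed spelling: Clang forwards LLVM options through -mllvm.
flags = ["-mllvm", "-inline-threshold=5000"]
cflags = flags if compiler_accepts("clang", flags) else []
print("extra CFLAGS:", " ".join(cflags) or "(none)")
```

Probing by compiling, rather than checking the compiler's name, keeps the check working for compilers that report themselves under a different name.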

Clang's default inline threshold complexity is 225 (see "https://groups.google.com/forum/#!topic/llvm-dev/GpU79q9JzJI"). Turning it up to 5000 grows the Ruby binary from about 3MB to 6MB, but gives an overall speedup of about 7% on the optcarrot benchmark.

Here are roughly the speedups I found, using 500+ runs of the optcarrot benchmark for each check:

    Threshold:   Binary size:   Speedup on optcarrot:
    5000         6MB            7%
    2500         5.5MB          6%
    1800         4.8MB          5%
    1000         4.4MB          5% (hard to measure diff between 1000 and 1800)

There doesn't seem to be any increase in dynamic memory use. This only inlines the C code compiled by Clang/LLVM and doesn't change any Ruby data structures at runtime, so the memory cost seems to be paid only once.

For a desktop Mac in particular, it seems like using 3MB extra for a 7% speedup is a really good deal.
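
To make the trade-off easier to compare across thresholds, here is a small sketch that turns the table above into extra megabytes per percent of speedup. It assumes the roughly 3MB default build as the baseline and treats the table's approximate figures as exact; the function name is made up for illustration.

```python
BASELINE_MB = 3.0  # approximate size of the default build, per the text above

# (threshold, binary size in MB, speedup in percent), copied from the table.
RESULTS = [
    (5000, 6.0, 7),
    (2500, 5.5, 6),
    (1800, 4.8, 5),
    (1000, 4.4, 5),
]


def extra_mb_per_percent(size_mb, speedup_pct, baseline_mb=BASELINE_MB):
    """Extra binary size paid for each 1% of speedup."""
    return (size_mb - baseline_mb) / speedup_pct


for threshold, size_mb, speedup in RESULTS:
    cost = extra_mb_per_percent(size_mb, speedup)
    extra = size_mb - BASELINE_MB
    print(f"{threshold:5d}: +{extra:.1f} MB, {cost:.2f} MB per 1% speedup")
```

On these rough numbers the lowest threshold buys its speedup most cheaply per megabyte, though the measured differences between rows are small enough that the ordering should not be taken too seriously.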


---Files--------------------------------
inline-threshold.patch (1.03 KB)

