Issue #12599 has been updated by Noah Gibbs.


My initial run was with Ruby 2.4.0dev on Mac OS X, using the following Clang:

Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

With CentOS, Clang 3.4.2 (see below) and Ruby 2.2.3, I get a 14% speedup rather than 7%! I'm not sure whether that's a CentOS difference, a Clang version difference, or a Mac vs Linux difference. That figure is relative to Clang without the inlining setting. I'm measuring against normal GCC on CentOS now, so 14% may be optimistic. Either way, it still seems like a good idea to include the setting in the config file so that Clang builds aren't slow.
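
To compare figures like the 14% and 7% above, it helps to be explicit about the baseline each percentage is relative to. Here is a minimal sketch in Python, assuming each benchmark run reports a throughput value such as frames per second. The function name and the sample values are made up for illustration and are not measurements from this issue:

```python
from math import sqrt
from statistics import mean, stdev


def speedup_percent(baseline_runs, candidate_runs):
    """Return (percent speedup, rough standard error in percentage points).

    Both arguments are lists of per-run throughput values (higher is faster).
    """
    base = mean(baseline_runs)
    cand = mean(candidate_runs)
    pct = (cand / base - 1.0) * 100.0
    # Standard error of each mean, propagated to the ratio
    # with a first-order approximation.
    se_base = stdev(baseline_runs) / sqrt(len(baseline_runs))
    se_cand = stdev(candidate_runs) / sqrt(len(candidate_runs))
    rel_err = sqrt((se_base / base) ** 2 + (se_cand / cand) ** 2)
    return pct, (cand / base) * rel_err * 100.0


# Placeholder numbers only, NOT results from this issue.
without_inlining = [30.0, 30.2, 29.8, 30.0]
with_inlining = [32.1, 32.0, 32.2, 32.1]

pct, err = speedup_percent(without_inlining, with_inlining)
print(f"speedup: {pct:.1f}% +/- {err:.1f} points")
```

Swapping the baseline list (for example, a GCC build instead of a Clang build without the setting) changes the percentage even when the candidate runs stay the same, which is why the two figures are not directly comparable.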

The inlining settings on CentOS also more than double the size of the Ruby binary (from about 8MB to about 19MB), similar to what I saw on Mac OS X.

clang version 3.4.2 (tags/RELEASE_34/dot2-final)
Target: x86_64-redhat-linux-gnu
Thread model: posix
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/4.4.4
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/4.4.7
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.4.4
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/4.4.7
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/4.4.7


----------------------------------------
Bug #12599: For CLang, increase inline-threshold to get 7% speedup of optcarrot
https://bugs.ruby-lang.org/issues/12599#change-59815

* Author: Noah Gibbs
* Status: Open
* Priority: Normal
* Assignee: 
* ruby -v: 2.4.0dev
* Backport: 2.1: UNKNOWN, 2.2: UNKNOWN, 2.3: UNKNOWN
----------------------------------------
Here's a patch to set -inline-threshold where it's supported. The option is Clang-only, so I think it will mostly matter on Mac OS X.

Clang's default inline threshold is 225 (see "https://groups.google.com/forum/#!topic/llvm-dev/GpU79q9JzJI"). Turning it up to 5000 grows the Ruby binary from about 3MB to about 6MB, but speeds up the optcarrot benchmark by about 7% overall.

Here are roughly the speedups I found, using 500+ runs of the optcarrot benchmark for each check:

    Threshold:   Binary size:   Speedup on optcarrot:
    5000         6MB            7%
    2500         5.5MB          6%
    1800         4.8MB          5%
    1000         4.4MB          5% (hard to measure diff between 1000 and 1800)
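
One way to read the table is as extra binary size paid per percentage point of speedup. This is a rough sketch in Python using only the approximate figures above and the "about 3MB" baseline from the description. The helper name is made up for illustration, and the 5% rows are hard to tell apart, so treat the output as indicative only:

```python
BASELINE_MB = 3.0  # "about 3MB" at the default threshold, per the description

# (threshold, binary size in MB, speedup in percent), copied from the table
ROWS = [
    (5000, 6.0, 7),
    (2500, 5.5, 6),
    (1800, 4.8, 5),
    (1000, 4.4, 5),
]


def extra_mb_per_point(size_mb, speedup_pct, baseline_mb=BASELINE_MB):
    """Extra megabytes of binary per percentage point of speedup."""
    return (size_mb - baseline_mb) / speedup_pct


for threshold, size_mb, speedup in ROWS:
    extra = size_mb - BASELINE_MB
    cost = extra_mb_per_point(size_mb, speedup)
    print(f"{threshold:>5}: +{extra:.1f}MB, {cost:.2f}MB per 1% speedup")
```

On these numbers the lower thresholds buy each percentage point more cheaply, while the highest threshold gives the largest total speedup.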

There doesn't seem to be any increase in dynamic memory use. The change only inlines the C code compiled by Clang/LLVM and doesn't change any Ruby data structures at runtime, so the memory cost seems to be paid only once.

For a desktop Mac in particular, it seems like using 3MB extra for a 7% speedup is a really good deal.


---Files--------------------------------
inline-threshold.patch (1.03 KB)

