Issue #14370 has been updated by tenderlovemaking (Aaron Patterson).


I've been investigating the performance of this patch further.  Using the same Rails application I used for the initial graphs, I walked the heap for all instruction sequences and output the size of each `iseq_encoded` along with the number of markable objects found in the encoded iseq.  The code I used to walk the heap is here:

  https://gist.github.com/tenderlove/63ae5ec669a1e797b611aeaa0b72073e

The data I gathered is here:

  https://gist.github.com/tenderlove/d7ea6bb52004f8f5bf10af5b0fcfea4c

Using the numbers above, I estimate it would take between roughly 15ms and 26ms to mark all ISeqs in the Rails application:

~~~
> sum(iseqs$size) / 65491636
[1] 0.01509967
> sum(iseqs$size) / 38439253
[1] 0.02572636
~~~
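
A minimal Ruby sketch of the same arithmetic, for readers who don't use R. The two divisors are copied from the session above; treating them as a walk rate (units of `iseq_encoded` per second) is an assumption, and the sizes below are made-up stand-ins for `iseqs$size`, not the gathered data. `estimated_mark_seconds` is a hypothetical helper name.

```ruby
# Hypothetical helper: seconds needed to walk `total_size` units of
# iseq_encoded at a rate of `units_per_second`.
def estimated_mark_seconds(total_size, units_per_second)
  total_size.to_f / units_per_second
end

# Divisors from the R session; interpreting them as rates is an assumption.
FAST_RATE = 65_491_636
SLOW_RATE = 38_439_253

# Made-up sizes standing in for iseqs$size.
sizes = [120, 48, 300, 16]
total = sizes.sum

fast = estimated_mark_seconds(total, FAST_RATE)
slow = estimated_mark_seconds(total, SLOW_RATE)
puts format("between %.9f s and %.9f s", fast, slow)
```

The lower and upper bounds come from dividing the same total by the faster and the slower rate respectively.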

I made a histogram of Instruction Sequences bucketed by the number of markable objects that are in the iseq:

![instruction sequences bucketed by markable length](https://user-images.githubusercontent.com/3124/35252871-58bc9700-ff97-11e7-90f3-ec6bc3d8710e.png)

Approximately 60% of the instruction sequences have 0 markable objects.  Those 60% account for 35% of the total `iseq_encoded` that needs to be walked:

~~~
> no_markables <- subset(iseqs, markables == 0)
> length(no_markables$markables) / length(iseqs$markables)
[1] 0.5950202
> sum(no_markables$size) / sum(iseqs$size)
[1] 0.3508831
~~~
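
The two ratios above can be sketched in Ruby as follows. `IseqRow` and `no_markable_shares` are hypothetical names that mirror the `size` and `markables` columns of the data frame, and the sample rows are invented for illustration.

```ruby
# Hypothetical record mirroring one row of the gathered data.
IseqRow = Struct.new(:size, :markables)

# Returns the share of iseqs with zero markable objects, both by count
# and by total iseq_encoded size.
def no_markable_shares(rows)
  none = rows.select { |row| row.markables.zero? }
  {
    count_share: none.length.to_f / rows.length,
    size_share: none.sum(&:size).to_f / rows.sum(&:size)
  }
end

# Invented sample: three small iseqs without markables, two larger ones with.
sample = [
  IseqRow.new(10, 0), IseqRow.new(20, 0), IseqRow.new(30, 0),
  IseqRow.new(60, 2), IseqRow.new(80, 5)
]
shares = no_markable_shares(sample)
puts shares
```

In this toy sample the iseqs without markables are the majority by count but a minority by size, which is the same shape as the real numbers.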

If we set a flag on the iseq at compile time indicating that the iseq "has objects to mark", we could reduce mark time to between roughly 10ms and 17ms for all iseq objects:

~~~
> has_markables <- subset(iseqs, markables > 0)
> sum(has_markables$size) / 65491636
[1] 0.00980145
> sum(has_markables$size) / 38439253
[1] 0.01669941
~~~
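
A toy Ruby model of the flag idea, only to illustrate why it reduces the amount of `iseq_encoded` that gets walked. This is not the patch: `ToyIseq`, `compile_toy`, and `mark_all` are hypothetical names, and markable operands are modeled as Symbols inside a plain Array.

```ruby
# Toy iseq: an encoded operand list plus a flag computed once at "compile" time.
ToyIseq = Struct.new(:encoded, :has_markables)

# Markable operands are modeled as Symbols; everything else is an Integer.
def compile_toy(encoded)
  ToyIseq.new(encoded, encoded.any? { |word| word.is_a?(Symbol) })
end

# Walks every iseq and returns [marked objects, number of words walked].
# With use_flag: true, iseqs flagged as having nothing to mark are skipped.
def mark_all(iseqs, use_flag:)
  marked = []
  words_walked = 0
  iseqs.each do |iseq|
    next if use_flag && !iseq.has_markables
    iseq.encoded.each do |word|
      words_walked += 1
      marked << word if word.is_a?(Symbol)
    end
  end
  [marked, words_walked]
end

iseqs = [compile_toy([1, 2, 3]), compile_toy([1, :a, 2, :b]), compile_toy([4, 5])]
p mark_all(iseqs, use_flag: false)
p mark_all(iseqs, use_flag: true)
```

Both calls mark the same objects; the flagged walk just visits fewer words because two of the three toy iseqs are skipped entirely.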

What do you think?

----------------------------------------
Feature #14370: Directly mark instruction operands and avoid mark_ary usage on rb_iseq_constant_body
https://bugs.ruby-lang.org/issues/14370#change-69680

* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version: 
----------------------------------------
Hi,

I've attached a patch that changes rb_iseq_mark to directly mark instruction operands rather than adding them to a mark array.  I observed a ~3% memory reduction by directly marking operands, and I didn't observe any difference in GC time.  To test memory usage, I used a basic Rails application, logged all malloc / free calls to a file, then wrote a script that would sum the live memory at each sample (each sample being a call to malloc).  I graphed these totals so that I could see the memory usage as malloc calls were made:

![memory usage graph](https://user-images.githubusercontent.com/3124/35020270-1b0ded20-fae0-11e7-9cbd-1d028a6c9484.png)

The red line is trunk, the blue line is trunk + the patch I've attached.  Since the X axis is sample number (not time), the blue line is not as long as the red line because the blue line calls `malloc` fewer times.  The Y axis in the graph is the total number of "live" bytes that have been allocated (all allocations minus their corresponding frees).  You can see from the graph that memory savings start adding up as more code gets loaded.
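
For reference, the "live bytes per sample" calculation can be sketched like this. The event format is an assumption (the actual log format and script are not shown here), and `live_bytes_per_sample` is a hypothetical name.

```ruby
# Assumed log format: [:malloc, address, bytes] or [:free, address].
# Returns the running total of live bytes, sampled at every malloc call.
def live_bytes_per_sample(events)
  sizes = {}    # address => bytes for allocations that are still live
  live = 0
  samples = []
  events.each do |kind, address, bytes|
    case kind
    when :malloc
      sizes[address] = bytes
      live += bytes
      samples << live              # one sample per malloc call
    when :free
      live -= sizes.delete(address) || 0   # ignore frees of unknown addresses
    end
  end
  samples
end

events = [
  [:malloc, 0x10, 100],
  [:malloc, 0x20, 50],
  [:free,   0x10],
  [:malloc, 0x30, 25]
]
p live_bytes_per_sample(events)
```

Because a sample is taken only on `malloc`, a run that allocates less often produces a shorter series, which is why the two lines in the graph have different lengths.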

I was concerned that this patch might impact GC time, but `make gcbench-rdoc` didn't seem to show any significant difference in GC time between trunk and this patch.  If it turns out there is a performance impact, I think I could improve the time while still keeping memory usage low by generating a bitmap during iseq compilation.
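
One possible reading of the bitmap idea, as a toy Ruby model rather than anything from the attached patch: at compile time, record one bit per position in the encoded sequence that holds a markable operand, then mark by visiting only the set bits. `build_mark_bitmap` and `mark_with_bitmap` are hypothetical names, and markable operands are modeled as Symbols.

```ruby
# "Compile time": set bit i when position i holds a markable operand
# (modeled here as a Symbol).
def build_mark_bitmap(encoded)
  bitmap = 0
  encoded.each_with_index do |word, index|
    bitmap |= (1 << index) if word.is_a?(Symbol)
  end
  bitmap
end

# "Mark time": visit only positions whose bit is set, and stop as soon as
# no set bits remain.
def mark_with_bitmap(encoded, bitmap)
  marked = []
  index = 0
  while bitmap != 0
    marked << encoded[index] if bitmap.odd?
    bitmap >>= 1
    index += 1
  end
  marked
end

encoded = [1, :a, 2, 3, :b]
bitmap = build_mark_bitmap(encoded)
p bitmap.to_s(2)
p mark_with_bitmap(encoded, bitmap)
```

An iseq with no markable operands gets a bitmap of zero, so the mark loop exits immediately without inspecting any operand.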

There is a bit more information in the pull request where I've been working, but I think I've summarized everything here.

  https://github.com/github/ruby/pull/39


---Files--------------------------------
iseq_mark.diff (6.28 KB)
iseq_mark.diff (6.28 KB)
iseq_mark.diff (7.26 KB)
benchmark_methods.diff (1.23 KB)
bench.rb (3.01 KB)


-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>