I had accidentally added GC.verify_internal_consistency method for each
test case (in after_tear_down.

After remove it,

trunk         : 798.756056s
before removal: 1981.145346s
after removal : 845.399831s

Not so bad.


(2014/08/17 11:55), shibata.hiroshi / gmail.com wrote:
> Issue #10137 has been updated by Hiroshi SHIBATA.
> 
> 
> I evaluated running time for test-all.
> 
> ref. https://twitter.com/k_tsj/status/500523453028388864
> 
> rincgc branch seems 2 times slower than trunk when sequential running.
> I tested 3 times. it's same results.
> 
> ### trunk
> 
> ruby -v: ruby 2.2.0dev (2014-08-17 trunk 47206) [x86_64-darwin14]
> make test-all TESTS='-j4'  291.72s user 65.89s system 241% cpu 2:27.94 total
> 
> ruby -v: ruby 2.2.0dev (2014-08-17 trunk 47206) [x86_64-darwin14]
> make test-all  336.31s user 62.33s system 83% cpu 7:55.63 total
> 
> ### rincgc branch
> 
> ruby -v: ruby 2.2.0dev (2014-08-15 trunk 47192) [x86_64-darwin14]
> make test-all TESTS='-j4'  359.57s user 74.27s system 213% cpu 3:23.33 total
> 
> ruby -v: ruby 2.2.0dev (2014-08-15 trunk 47192) [x86_64-darwin14]
> make test-all  878.35s user 65.17s system 90% cpu 17:22.43 total
> 
> 
> 
> ----------------------------------------
> Feature #10137: Introducing Incremental GC algorithm
> https://bugs.ruby-lang.org/issues/10137#change-48380
> 
> * Author: Koichi Sasada
> * Status: Open
> * Priority: Normal
> * Assignee: Koichi Sasada
> * Category: core
> * Target version: current: 2.2.0
> ----------------------------------------
> # Abstract
> 
> Introduce *incremental GC* algorithm to reduce pause time of major/full 
> GC. This ticket includes design and implementation note and a working 
> patch.
> 
> # Background and problem
> 
> Ruby 2.1 uses generational GC algorithm (named RGenGC) to improve GC 
> throughput. Genrational GC algorithm separates existing objects into 
> young generation objects and old generation objects. At the most of GC 
> timing, GC only marks young generation objects (called minor GC). If 
> there are no enough memory, then mark all of objects (called major GC or 
> full GC). Minor GC is dramatically fast than major GC. So that total 
> throughput of application improves (10% improvement in my RDoc benchmark, 
> [ruby-list:49896] reported that GC intensive application is 6 times 
> faster!).
> 
> However, major GC is needed periodically and it pauses same time as GC 
> on Ruby 2.0 and before. This problem hits response time intensive 
> application such as web application.
> 
> # Proposal
> 
> Introduce *Incremental GC* algorithm for major GC.
> 
> Incremental GC algorithm is well-known GC algorithm to reduce GC pause 
> time. Scatter GC (marking) process in several phases and run processes 
> interleaving with Ruby's execution process. This is similar to current 
> lazy sweep algorithm (in fact, lazy sweep is a half part of incremental 
> GC algorithm).
> 
> Running ruby process with marking phase, it is possible to introduce 
> critical bug because marked objects can points un-marked objects (on the 
> incremental GC terminology, marked objects are called "Black" objects 
> and un-marked objects are called "White" objects). Such un-marked 
> objects can be left in un-marking and be swept. 
> 
> ```
> # if `ary' is already marked, and obj is not marked
> ary[0] = obj
> obj = nil # no-body except `ary' refers `obj'
> ```
> 
> To prevent such destructive bug, we can use write barriers to detect 
> such "marked objects" to "un-marked objects". We can care about such 
> case.
> 
> Yes, MRI/CRuby has "WB-unprotected" objects such objects does not have 
> write barriers because of compatibility or implementation issues. To 
> care about such WB-unprotected objects, we need to traverse all of 
> living WB-unprotected objects at a time in the last of incremental 
> marking. This is new extending idea against traditional incremental GC 
> algorithm (at least I surveyed).
> 
> Deisgn and implementation details are here:
> http://www.atdot.net/fp_store/f.p61can/file.data-incremental-gc.pdf
> 
> Maybe a diagram at page 10 will help you to understand the flow of all 
> GC process.
> 
> Code is here:
> https://github.com/ko1/ruby/tree/rincgc
> 
> Compare with trunk:
> https://github.com/ko1/ruby/compare/rincgc
> 
> # Implementation note
> 
> ## WB-unprotected bitmap
> 
> As I said, we need to check all of living WB-unprotected objects at the 
> last of incremental marking phase. To do it lightweight, introduce 
> WB-unprotected bitmap intead of specific bit in RBasic::flags.
> 
> We can get all living WB-unprotected objects with the following pseudo 
> code:
> 
> ```
> bits = mark_bits[i] & wb_unprotected_bits[i];
> while (bits) {
>   if (bits & 1) do something.
>   bits >>= 1;
> }
> ```
> 
> ## 4 age promotion
> 
> Because we don't need to use WB-protected bit in RBasic::flags, we have 
> another 1 bit in RBasic::flags. To utilize this bit, we add *age* of an 
> object with exsiting promoted bit. Rename FL_WB_PROTECTED to 
> FL_PROMOTED0 and FL_PROMOTED to FL_PROMOTED1.
> 
> These two bits represent object's age (0 to 3) and 3 means OLD objects.
> 
> ## Write barriers
> 
> We extend write barriers to detect such [marked objects -> un-marked 
> objects] reference in incremental GC. It introduces some overhead.
> 
> # Evaluation
> 
> Benchmark results on real Linux machine (*1)
> http://www.atdot.net/sp/raw/yekban
> *1: Intel(R) Xeon(R) CPU E5335 @ 2.00GHz, 4GB memory, 2.6.32-5-amd64 (Debian squeeze)
> 
> In most of case, there are only a few (~5%) performance down. 
> Incremental GC introduces some overhead.  But I think they are 
> acceptable speed-down.
> 
> Discouse benchmark (only on Virtualbox VM, so accuracy is not good)
> http://www.atdot.net/sp/raw/g9uban
> 
> We can recognize reducing worst case seconds.
> 
> # TODO
> 
> (1) Clean up codes
> 
> Now, we can not disable incremental GC codes and generational GC codes. 
> We need to add ability to enable/disable features with macros.
> 
> (2) Tuning parameters
> 
> Now the parameters are fixed in codes. mruby already have tuning 
> parameters for incremental GC (matz said they are from Lua), 
> "GC.interval_ratio" and "GC.step_ratio". We can import these functions 
> (or making another interface to tell).
> 
> (3) Enter GC/ Exit GC internal events
> 
> This patch also includes function "gc_enter()" and "gc_exit()" which set 
> and reset a "doing GC" flag.
> 
> If we introduce internal event to hook these functions, we can measure 
> exact GC pause time (and mutators time).
> 
> # Summary
> 
> This feature proposal is to introduce incremental GC algorithm with working code.
> Incremental GC algorithm reduce application's pause time of major GC.
> 
> Any feedback are welcome!
> 
> Thanks,
> Koichi
> 
> 
> 
> 


-- 
// SASADA Koichi at atdot dot net