Issue #14723 has been updated by sam.saffron (Sam Saffron).


From my testing on the Discourse bench ... the difference is pretty much not measurable.

Before patch:

```
Unicorn: (workers: 3)
Include env: false
Iterations: 200, Best of: 1
Concurrency: 1

---
categories:
  50: 58
  75: 65
  90: 73
  99: 123
home:
  50: 62
  75: 70
  90: 86
  99: 139
topic:
  50: 60
  75: 65
  90: 72
  99: 117
categories_admin:
  50: 101
  75: 106
  90: 115
  99: 210
home_admin:
  50: 107
  75: 114
  90: 132
  99: 211
topic_admin:
  50: 115
  75: 123
  90: 134
  99: 201
timings:
  load_rails: 5444
ruby-version: 2.6.0-p-1
rss_kb: 196444
pss_kb: 139514
memorysize: 7.79 GB
virtual: vmware
architecture: amd64
operatingsystem: Ubuntu
processor0: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
physicalprocessorcount: 2
kernelversion: 4.15.0
rss_kb_23779: 309984
pss_kb_23779: 249785
rss_kb_23817: 307056
pss_kb_23817: 246738
rss_kb_23948: 304732
pss_kb_23948: 244364
```

After patch:

```
Iterations: 200, Best of: 1
Concurrency: 1

---
categories:
  50: 56
  75: 61
  90: 70
  99: 116
home:
  50: 63
  75: 70
  90: 77
  99: 170
topic:
  50: 61
  75: 68
  90: 77
  99: 96
categories_admin:
  50: 102
  75: 111
  90: 121
  99: 182
home_admin:
  50: 96
  75: 102
  90: 108
  99: 205
topic_admin:
  50: 109
  75: 118
  90: 130
  99: 192
timings:
  load_rails: 4987
ruby-version: 2.6.0-p-1
rss_kb: 196004
pss_kb: 137541
memorysize: 7.79 GB
virtual: vmware
architecture: amd64
operatingsystem: Ubuntu
processor0: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
physicalprocessorcount: 2
kernelversion: 4.15.0
rss_kb_16393: 306312
pss_kb_16393: 244353
rss_kb_16438: 307052
pss_kb_16438: 244942
rss_kb_16555: 305092
pss_kb_16555: 242997
```

Nothing really sticks out as an across-the-board improvement: some of the benches are a bit faster, and memory is almost unaffected. It is no worse than head, but it is also hard to measure how much better it is; we may need to repeat the run with significantly more iterations to remove the noise.

I do want to review Discourse carefully to ensure we are using async_exec everywhere; I will do so later today.
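
For context on why async_exec matters here, a minimal sketch of my own (not from the patch), assuming the `pg` gem and a reachable local Postgres (`dbname: 'postgres'` is just a placeholder). The async variant waits for the server's reply on the connection socket through Ruby's fd-wait path, which is exactly where this patch can run GC steps; depending on the pg version, plain #exec may sleep inside libpq instead, so Ruby never sees that idle time.

```
require 'pg'

conn = PG.connect(dbname: 'postgres')  # placeholder connection params

# Synchronous variant: depending on the pg version, this can sleep inside
# libpq, outside Ruby's fd-wait path, so the idle time is invisible to the VM.
conn.exec('SELECT pg_sleep(0.1)')

# Async variant: the query is sent, then Ruby waits on the connection socket
# (rb_wait_for_single_fd on Linux), which is where sleepy GC could kick in.
conn.async_exec('SELECT pg_sleep(0.1)')

conn.close
```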

Eric, if you feel like trying out the bench, clone https://github.com/discourse/discourse.git and run `ruby script/bench.rb`.


I also have some allocator benches you can play with at: https://github.com/SamSaffron/allocator_bench.git





----------------------------------------
Feature #14723: [WIP] sleepy GC
https://bugs.ruby-lang.org/issues/14723#change-71823

* Author: normalperson (Eric Wong)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
The idea is to use "idle time", when the process is otherwise sleeping
and using no CPU time, to perform GC.  It makes sense because real-world
traffic sees idle time due to network latency and waiting
for user input.
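
As a rough illustration of the kind of idle time meant here (my own sketch, not part of the patch): a process that allocates garbage and then blocks on a single fd is doing no useful work while it waits, and that wait is where GC steps could run.

```
require 'io/wait'

r, w = IO.pipe

100_000.times { "x" * 64 }   # generate garbage
r.wait_readable(0.05)        # idle: blocked waiting on one fd, no CPU use
p GC.stat(:heap_live_slots)  # peek at heap state after the idle period
```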

Right now, it's Linux-only.  Future patches will affect other sleeping
functions:

  IO.select, Kernel#sleep, Thread#join, Process.waitpid, etc...

I don't know if this patch can be implemented for win32; right
now it's just dummy functions, and that will be somebody else's
job.  But all pthreads platforms should eventually benefit.


Before this patch, the entropy-dependent script below takes 95MB
consistently on my system.  Now, depending on the amount of
entropy on my system, it takes anywhere from 43MB to 75MB.

I'm using /dev/urandom to simulate real-world network latency
variations.  There is no improvement when using /dev/zero
because the process is never idle.

  require 'socket'
  require 'net/http'
  require 'digest/md5'
  Thread.abort_on_exception = true
  s = TCPServer.new('127.0.0.1', 0)
  len = 1024 * 1024 * 1024
  th = Thread.new do
    c = s.accept
    c.readpartial(16384)
    c.write("HTTP/1.0 200 OK\r\nContent-Length: #{len}\r\n\r\n")
    IO.copy_stream('/dev/urandom', c, len)
    c.close
  end

  addr = s.addr
  Net::HTTP.start(addr[3], addr[1]) do |http|
    http.request_get('/') do |res|
      dig = Digest::MD5.new
      res.read_body { |buf|
        dig.update(buf)
      }
      puts dig.hexdigest
    end
  end

The above script also depends on net/protocol using
read_nonblock.  Ordinary IO objects will need IO#nonblock=true
to see benefits (because otherwise they never hit rb_wait_for_single_fd).
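
A minimal sketch (mine, not from the patch) of what that opt-in looks like for a plain pipe; IO#nonblock= comes from the io/nonblock stdlib extension.

```
require 'io/nonblock'

r, w = IO.pipe
r.nonblock = true     # opt this IO into the nonblocking read path

Thread.new { sleep 0.1; w.write("hello"); w.close }

# With nonblock set, the EAGAIN retry loop inside IO#read waits on the fd
# via rb_wait_for_single_fd, which is the hook point for this patch.
p r.read
```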

* gc.c (rb_gc_inprogress): new function
  (rb_gc_step): ditto
* internal.h: declare prototypes for new gc.c functions
* thread_pthread.c (gvl_contended_p): new function
* thread_win32.c (gvl_contended_p): ditto (dummy)
* thread.c (rb_wait_for_single_fd w/ ppoll):
  use new functions to perform GC while GVL is uncontended
  and GC is lazy sweeping or incremental marking
  [ruby-core:86265]

Two-part patch, broken out:
https://80x24.org/spew/20180429035007.6499-2-e / 80x24.org/raw
https://80x24.org/spew/20180429035007.6499-3-e / 80x24.org/raw

Also on my "sleepy-gc" git branch @ git://80x24.org/ruby.git


---Files--------------------------------
sleepy-gc-wip-v1.diff (5.37 KB)


-- 
https://bugs.ruby-lang.org/
