Issue #17795 has been updated by byroot (Jean Boussier).


> Afaik the proper way to do this is to close the connection after the fork.

No, before. Otherwise the connection is "shared", and closing it in the children causes issues for the connections in the parent.
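A minimal sketch of the "close before forking" pattern, assuming a hypothetical `Pool` class that opens its connection lazily; `IO.pipe` stands in for a real database socket. The parent closes its connection before forking, so the child never inherits a copy of a socket the parent is still using, and each process reconnects on first use.

```ruby
# Hypothetical lazily-connecting pool; IO.pipe stands in for a real socket.
class Pool
  attr_reader :connects

  def initialize
    @conn = nil
    @connects = 0 # how many connections this process has opened
  end

  def with
    @conn ||= open_connection
    yield @conn
  end

  # Call in the parent *before* forking so the child inherits no open socket.
  def disconnect!
    @conn&.each(&:close)
    @conn = nil
  end

  private

  def open_connection
    @connects += 1
    IO.pipe
  end
end

pool = Pool.new
pool.with { |_conn| } # the parent uses a connection
pool.disconnect!      # ...and closes it before forking
pid = fork do
  pool.with { |_conn| } # the child lazily opens its own connection
  exit!(pool.connects == 2 ? 0 : 1)
end
Process.wait(pid)
```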

> I'd like to know where the idea that it's slow is coming from.

Maybe your glibc is quite old? https://sourceware.org/glibc/wiki/Release/2.25#pid_cache_removal

```ruby
require 'benchmark/ips'

module Foo
  class << self
    attr_accessor :bar
  end
  @bar = 42
end

puts "#{RUBY_VERSION} #{RUBY_PLATFORM}"
Benchmark.ips do |x|
  x.report('Process.pid') { Process.pid }
  x.report('Module.attr') { Foo.bar }
  x.compare!
end
```

```
3.0.1 x86_64-darwin20
Warming up --------------------------------------
         Process.pid     1.914M i/100ms
         Module.attr     1.775M i/100ms
Calculating -------------------------------------
         Process.pid     19.144M (± 0.7%) i/s -     97.626M in   5.099666s
         Module.attr     17.820M (± 0.4%) i/s -     90.530M in   5.080332s

Comparison:
         Process.pid: 19144498.7 i/s
         Module.attr: 17820085.3 i/s - 1.07x  (± 0.00) slower
```

```
3.0.1 x86_64-linux
Warming up --------------------------------------
         Process.pid   698.792k i/100ms
         Module.attr     1.886M i/100ms
Calculating -------------------------------------
         Process.pid      6.862M (± 1.5%) i/s -     34.940M in   5.092832s
         Module.attr     19.184M (± 1.0%) i/s -     96.197M in   5.014902s

Comparison:
         Module.attr: 19183904.4 i/s
         Process.pid:  6862219.4 i/s - 2.80x  (± 0.00) slower
```

So it is fast enough for code that is called infrequently, but slow enough that I see it sitting at `1-2%` of CPU profiles in real production workloads.

> So it seems to me this is not such a nice API.

It's not really intended as an actual API, but as a smaller change that would be more easily accepted by the core team.

----------------------------------------
Feature #17795: `before_fork` and `after_fork` callback API
https://bugs.ruby-lang.org/issues/17795#change-91497

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
Replaces: https://bugs.ruby-lang.org/issues/5446

### Context

Ruby code in production is very often running in a forking setup (puma, unicorn, etc.), and it is common for some types of libraries to need to know when the Ruby process was forked. For instance:

  - Most database clients, ORMs or other libraries keeping a connection pool might need to close connections before the fork happens.
  - Libraries relying on some kind of dispatcher thread might need to restart the thread in the forked children, and clear any internal buffer (e.g. statsd clients, newrelic_rpm).

**This need only concerns forking the whole Ruby process: extensions doing a `fork(2)` + `exec(2)` combo etc. are not a concern. This aims only at catching `Kernel.fork`, `Process.fork` and maybe `Process.daemon`.**
The use case is for forks that end up executing Ruby code.

### Current solutions

Right now this use case is handled in several ways.

#### Rely on the integrating code to call a `before_fork` or `after_fork` callback

Some libraries simply rely on documentation and require the user to use the hooks provided by their forking server.
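A minimal sketch of this approach from the library side, assuming a hypothetical `Metrics` module that buffers data in memory. The library only exposes an `after_fork` method and documents that users must call it in every child, for example from Puma's `on_worker_boot` or Unicorn's `after_fork` hook; nothing happens automatically if they forget.

```ruby
# Hypothetical metrics client that relies on the user wiring up `after_fork`.
module Metrics
  @buffer = []

  class << self
    attr_reader :buffer

    def record(value)
      @buffer << value
    end

    # Documented as "call this in every forked child", e.g. from the
    # forking server's worker-boot hook.
    def after_fork
      @buffer = [] # drop metrics inherited from the parent
    end
  end
end

Metrics.record(1)
pid = fork do
  Metrics.after_fork # what the server's hook would do in the child
  exit!(Metrics.buffer.empty? ? 0 : 1)
end
Process.wait(pid)
```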

Examples:

  - Sequel: http://sequel.jeremyevans.net/rdoc/files/doc/fork_safety_rdoc.html
  - Rails's Active Record: https://devcenter.heroku.com/articles/concurrency-and-database-connections#multi-process-servers
  - ScoutAPM (it tries to detect popular forking setups and register itself): https://github.com/scoutapp/scout_apm_ruby/blob/fa83793b9e8d2f9a32c920f59b57d7f198f466b8/lib/scout_apm/environment.rb#L142-L146
  - NewRelic RPM (similarly tries to register with popular forking setups): https://www.rubydoc.info/github/newrelic/rpm/NewRelic%2FAgent:after_fork


#### Continuously check `Process.pid`

Some libraries instead choose to keep the process PID in a variable and regularly compare it to `Process.pid` to detect forked children.
Unfortunately, `Process.pid` is relatively slow on Linux (glibc 2.25 and later no longer cache the PID), and these checks tend to be in tight loops, so it's not uncommon when using these libraries to spend 1 or 2% of runtime in `Process.pid`.
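A minimal sketch of the PID-comparison pattern, assuming a hypothetical `ForkAwareClient` class. The check runs on every request, which is where the `Process.pid` cost accumulates; the `resets` counter stands in for whatever the library would actually do on fork (dropping connections, clearing buffers).

```ruby
# Hypothetical client that detects forks by comparing a cached PID.
class ForkAwareClient
  attr_reader :resets

  def initialize
    @pid = Process.pid
    @resets = 0
  end

  def request
    check_fork! # paid on every single call
    :ok
  end

  private

  def check_fork!
    return if @pid == Process.pid

    @pid = Process.pid
    @resets += 1 # e.g. drop inherited connections or buffers here
  end
end

client = ForkAwareClient.new
client.request
pid = fork do
  client.request # the cached PID is the parent's, so the fork is detected
  exit!(client.resets == 1 ? 0 : 1)
end
Process.wait(pid)
```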

Examples:

  - Rails's Active Record used to check `Process.pid` (https://github.com/Shopify/rails/blob/411ccbdab2608c62aabdb320d52cb02d446bb39c/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L946); it still does, but less often: https://github.com/rails/rails/pull/41850
  - The `Typhoeus` HTTP client: https://github.com/typhoeus/typhoeus/blob/a345545e5e4ac0522b883fe0cf19e5e2e807b4b0/lib/typhoeus/pool.rb#L34-L42
  - Redis client: https://github.com/redis/redis-rb/blob/6542934f01b9c390ee450bd372209a04bc3a239b/lib/redis/client.rb#L384
  - Some older versions of NewRelic RPM: https://github.com/opendna/scoreranking-api/blob/8fba96d23b4d3e6b64f625079c184f3a292bbc12/vendor/gems/ruby/1.9.1/gems/newrelic_rpm-3.7.3.204/lib/new_relic/agent/harvester.rb#L39-L41

#### Continuously check `Thread#alive?`

Similar to checking `Process.pid`, but for the background thread use case: `Thread#alive?` is regularly checked, and if the thread is dead, it is assumed that the process was forked.
It's much cheaper than calling `Process.pid`, but also a bit less reliable, as the thread could have died for other reasons. It also delays re-creating the thread until the next check rather than doing it immediately upon forking.
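A minimal sketch of the `Thread#alive?` pattern, assuming a hypothetical `BatchedSink` class with one dispatcher thread (`deliver` is a placeholder for the actual network send). Only the thread that calls `fork` survives in the child, so the dispatcher is seen as dead on the next write and restarted there; as noted, a crashed thread would trigger the same restart.

```ruby
# Hypothetical batching sink with one background dispatcher thread.
class BatchedSink
  attr_reader :restarts

  def initialize
    @queue = Queue.new
    @restarts = 0
    start_dispatcher
  end

  def <<(event)
    # A dead dispatcher is taken to mean "we were forked"
    # (it may also simply have crashed).
    unless @dispatcher.alive?
      @restarts += 1
      start_dispatcher
    end
    @queue << event
    self
  end

  private

  def start_dispatcher
    @dispatcher = Thread.new do
      loop { deliver(@queue.pop) }
    end
  end

  def deliver(event)
    # Placeholder for sending the event, e.g. over UDP.
  end
end

sink = BatchedSink.new
sink << :parent_event
pid = fork do
  sink << :child_event # the dispatcher is dead here, so it is restarted
  exit!(sink.restarts == 1 ? 0 : 1)
end
Process.wait(pid)
```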

Examples:

  - `statsd-instrument`: https://github.com/Shopify/statsd-instrument/blob/0445cca46e29aa48e9f1efec7c72352aff7ec931/lib/statsd/instrument/batched_udp_sink.rb#L63

#### Decorate `Kernel.fork` and `Process.fork`

Another solution is to prepend a module to `Process` and `Kernel` to decorate the `fork` method and implement your own callback. It works well, but it is made difficult by `Kernel.fork`, which does not delegate to `Process.fork`, so each entry point has to be decorated separately.
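A simplified sketch of the decorating approach, loosely following the pattern of Active Support's fork tracker but not its actual code; `ForkHooks` and `CoreExt` are hypothetical names. Because `Process.fork` and `Kernel.fork` are separate singleton methods, the module is prepended onto both. The bare `fork` call (the private `Kernel#fork` instance method) would need a third, private patch, which is left out here.

```ruby
# Hypothetical fork tracker: runs registered callbacks in the child.
module ForkHooks
  @callbacks = []

  class << self
    def after_fork(&block)
      @callbacks << block
    end

    def run_callbacks
      @callbacks.each(&:call)
    end
  end

  module CoreExt
    def fork(*)
      if block_given?
        # Wrap the caller's block so callbacks run first in the child.
        super do
          ForkHooks.run_callbacks
          yield
        end
      else
        pid = super
        ForkHooks.run_callbacks if pid.nil? # nil means we are the child
        pid
      end
    end
  end
end

# Two separate entry points, so two prepends.
Process.singleton_class.prepend(ForkHooks::CoreExt)
Kernel.singleton_class.prepend(ForkHooks::CoreExt)

reader, writer = IO.pipe
ForkHooks.after_fork { writer.write("reset") }
[Process, Kernel].each do |mod|
  pid = mod.fork do
    writer.close # flushes what the callback wrote
    exit!(0)
  end
  Process.wait(pid)
end
writer.close
output = reader.read
```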


Examples:

  - Active Support: https://github.com/rails/rails/blob/9aed3dcdfea6b64c18035f8e2622c474ba499846/activesupport/lib/active_support/fork_tracker.rb
  - `dd-trace-rb`: https://github.com/DataDog/dd-trace-rb/blob/793946146b4709289cfd459f3b68e8227a9f5fa7/lib/ddtrace/profiling/ext/forking.rb
  - To some extent, `nakayoshi_fork` decorates the `fork` method: https://github.com/ko1/nakayoshi_fork/blob/19ef5efc51e0ae51d7f5f37a0b785309bf16e97f/lib/nakayoshi_fork.rb

### Proposals

I see two possible features to improve this situation:

#### `Process.before_fork` and `Process.after_fork` callbacks

One solution would be for Ruby to expose a callback API for these two events, similar to `Kernel.at_exit`.
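To illustrate the intended semantics, here is a userland emulation using a hypothetical `ForkCallbacks` module; these are not real Ruby methods, and the explicit `ForkCallbacks.fork` wrapper only stands in for hooks that Ruby itself would run. Blocks are registered once, like `at_exit`; `before_fork` blocks run in the parent just before forking and `after_fork` blocks run in the child just after. Only the block form of `fork` is handled.

```ruby
# Hypothetical emulation of the proposed API (not part of Ruby).
module ForkCallbacks
  @before = []
  @after = []

  class << self
    # Runs in the parent, right before forking (e.g. close connections).
    def before_fork(&block)
      @before << block
    end

    # Runs in the child, right after forking (e.g. restart threads).
    def after_fork(&block)
      @after << block
    end

    # Stand-in for a hooked `Process.fork`; block form only.
    def fork(&block)
      @before.each(&:call)
      Process.fork do
        @after.each(&:call)
        block.call
      end
    end
  end
end

events = []
ForkCallbacks.before_fork { events << :before_fork }
reader, writer = IO.pipe
ForkCallbacks.after_fork { writer.write("after_fork") }
pid = ForkCallbacks.fork do
  writer.close
  exit!(0)
end
writer.close
Process.wait(pid)
child_output = reader.read
```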

#### Make `Kernel.fork` a delegator

A simpler change would be to just make `Kernel.fork` a delegator to `Process.fork`. This would make it much easier for each library to implement its own callback by prepending a module on `Process`.

Proposed patch: https://github.com/ruby/ruby/pull/4361



--

https://bugs.ruby-lang.org/
