Issue #5446 has been updated by tenderlovemaking (Aaron Patterson).


normalperson (Eric Wong) wrote:
> eregontp / gmail.com wrote:
>  > normalperson (Eric Wong) wrote:
>  > >  It's been a known problem for decades, now (at least since the
>  > >  days of mod_perl + DBI on Apache 1.x); and AFAIK there's no data
>  > >  leaks from it.  Anybody who makes that mistake would likely
>  > >  raise/crash/burn long before seeing, much less leaking sensitive data.
>  > 
>  > Yes, it's not a new problem.
>  > I disagree about no production leaks, because it happened to me on a website running for a national programming contest.
>  > For most of the contest it was fine as one process was able to handle the load,
>  > but at some point the webserver decided to spawn another process by forking,
>  > people started seeing each other's solutions, the scores were corrupted and everyone was puzzled as to what happened.
>  > We had to stop the contest due to this.

I've experienced a data leak similar to this.  Saying the impact was "catastrophic" would be an understatement. :(

>  fork is full of caveats; using atfork hook to work around one
>  caveat out of many is not a solution.  The solution is knowing
>  the caveats of the tools you use.
>  
>  In your case, it seemed like you were not paying attention to
>  the server setup at all and would not have known to use atfork
>  hook regardless if it was in the webserver or core.
>  
>  > I want to help protect future programmers from such bugs, if at all possible.
>  > And I believe it's possible.
>  > 
>  > >  I agree with Jeremy on this; it will likely cause new problems
>  > >  and surprises if libraries use it.
>  > 
>  > Let's design it so it doesn't.
>  > What's the harm/surprise to reconnect in at_fork(:after_child) for instance?
>  
>  It's a waste of time and resources when the child has no need for
>  a connection at all.  Simply put, library authors have no clue
>  how an application will use/manage processes and would
>  (rightfully) not bother using such hooks.

I think library authors can make things easier though.  Web frameworks, like Rails for example, are expected to handle this situation for the user.  In addition, say a library, like Sequel, provided no such feature: how would a user know they need to call `DB.disconnect` after a fork?  Are they responsible for completely understanding the implementation of the library they are using?  Even if an end user called `DB.disconnect` in an after fork hook, what if that wasn't enough?  How would an end user know what else needs to be called?

>  Also, consider that pthread_atfork has been around for many
>  years, it's not adopted by library authors (of C/C++ libraries)
>  because of problems surrounding it; and POSIX is even
>  considering deprecating pthread_atfork[1].
>  
>  How about an alternate proposal?
>  
>  	Introduce a new object_id-like identifier which changes
>  	across fork: Thread.current.thread_id
>  
>  It doesn't penalize platforms without fork, and can work well
>  with existing thread-aware code.

I think this is a good idea, but I'm not sure it addresses the communication issue I brought up.  IMO it would be great to have some sort of hook so that library authors can dictate what "the right thing to do" is after a fork (maybe there are other resources or caches that need to be cleaned, and maybe that changes from version to version).
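
To make the "identifier which changes across fork" idea concrete, here is a rough sketch of how a library could approximate it today by comparing Process.pid.  `ForkAwareResource` and its methods are hypothetical names for illustration only; `Thread.current.thread_id` is just a proposal in this thread, so the sketch uses the PID as a stand-in.

```ruby
# Hypothetical wrapper: ForkAwareResource is an illustrative name,
# not part of Ruby or of any real library.
class ForkAwareResource
  # The block opens and returns a fresh handle (socket, DB connection, ...).
  def initialize(&opener)
    @opener = opener
    @owner_pid = nil
    @handle = nil
  end

  # Returns the cached handle, or opens a new one when the current
  # process is not the one that opened the cached handle.
  def handle
    if @owner_pid != Process.pid
      # The inherited handle is simply dropped here, not closed, so the
      # process that still owns it is not disturbed.
      @handle = @opener.call
      @owner_pid = Process.pid
    end
    @handle
  end
end
```

This costs a PID comparison on every access and only reopens lazily, so a child that never touches the resource pays nothing.  It does not help with the communication issue above: the library still has to know, by itself, everything that needs resetting.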

>  IMHO, Thread.current.object_id being stable in forked child
>  isn't good; but I expect compatibility problems if we change it
>  at this point.  At least some usages of monitor.rb would
>  break.
>  
>  > The current hooks are webserver-specific and so migrating
>  > between unicorn/puma/passenger/etc means it's quite easy to
>  > forget to adapt to the new webserver hook, which would trigger
>  > this bug.
>  
>  I hate the amount of vendor lock-in each webserver has.
>  But making hooks which library authors can fire unpredictably
>  on application authors is worse, especially if there's no
>  "opt-out".

I think requiring users to specify a db disconnect after fork causes even more "vendor lock-in".  Let's say I did add the after fork code to deal with Sequel, but now I want to switch to a threaded webserver.  Now I have to do more work to figure out what's required (if anything) in a threaded environment.  It puts the onus on the app dev to figure out what's right for a particular environment, and that means it's harder to change: locking you in by making more work.

Additionally, forking servers all have to provide this type of hook anyway (Unicorn, Resque, Puma, to name a few) but today they have to specify their own API.  I think it would be great if we had a "Rack for fork hooks", if that makes sense.  :)


----------------------------------------
Feature #5446: at_fork callback API
https://bugs.ruby-lang.org/issues/5446#change-73086

* Author: normalperson (Eric Wong)
* Status: Assigned
* Priority: Normal
* Assignee: kosaki (Motohiro KOSAKI)
* Target version: 
----------------------------------------
It would be good if Ruby provided an API for registering fork() handlers.

This allows libraries to automatically and agnostically reinitialize resources
such as open IO objects in child processes whenever fork() is called by a user
application.  Use of this API by library authors will reduce false/improper
sharing of objects across processes when interacting with other
libraries/applications that may fork.

This Ruby API should function similarly to pthread_atfork() which allows
(at least) three different callbacks to be registered:

1) prepare - called before fork() in the original process
2) parent - called after fork() in the original process
3) child - called after fork() in the child process

It should be possible to register multiple callbacks for each action
(like at_exit and pthread_atfork(3)).
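
As a rough illustration of the shape such an API could take, here is a minimal pure-Ruby sketch.  `AtForkSketch`, `at_fork`, and the phase names are hypothetical and chosen only for this example; no API has been decided.  It follows pthread_atfork ordering: prepare handlers run in reverse registration order, parent and child handlers in registration order.

```ruby
# Hypothetical sketch only: AtForkSketch is not an existing Ruby API.
module AtForkSketch
  HOOKS = { prepare: [], parent: [], child: [] }

  # Register a callback for one of :prepare, :parent, or :child.
  # Multiple callbacks per phase are allowed.
  def self.at_fork(phase, &block)
    raise ArgumentError, "unknown phase: #{phase.inspect}" unless HOOKS.key?(phase)
    HOOKS[phase] << block
    block
  end

  # Wrapper around Process.fork that fires the registered callbacks.
  def self.fork(&block)
    # prepare: before fork(), in the original process, newest first
    HOOKS[:prepare].reverse_each(&:call)
    pid = Process.fork
    if pid.nil?
      # child: after fork(), in the child process
      HOOKS[:child].each(&:call)
      return nil unless block
      block.call
      exit
    end
    # parent: after fork(), in the original process
    HOOKS[:parent].each(&:call)
    pid
  end
end
```

A library could then register something like `AtForkSketch.at_fork(:child) { pool.clear }` (with `pool` being whatever the library owns).  Note that a wrapper like this only sees forks that go through it, which is why the callbacks would need to live in core to cover the cases listed below.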

These callbacks should be called whenever fork() is used:

- Kernel#fork
- IO.popen
- ``
- Kernel#system

... And any other APIs I've forgotten about

I also want to consider handlers that only need to be called for plain
fork() use (without immediate exec() afterwards, like with `` and system()).

Ruby already has the internal support for most of this to manage mutexes,
Thread structures, and RNG seed.  Currently, no external API is exposed.  I can
prepare a patch if an API is decided upon.



