Samuel Williams <space.ship.traveller / gmail.com> wrote:
> Eric, thanks so much for the detailed reply and understanding the
> intent of my original message so well.

No problem :)

> As you've been so generous to me with your reply, I'm going to try to
> do the same for you.

> > Cool.  Thanks for sharing this; even if there's stuff below
> > I completely disagree with :)
> 
> It wouldn't be a good discussion if everyone agreed with each other :)
> 
> > Threads actually perform great for high throughput situations;
> > but yes, they're too big for dealing with network latency.
> 
> I hear what you are saying. From my point of view, the problem with
> the GIL/Threads is that you essentially get all the problems of
> Threads with none of the benefits. It's simply impossible for two pure
> ruby functions to execute at the same time in MRI. The only point is
> for IO multiplexing and it's really not a great solution, with large
> numbers of inflight requests being the main concern.

Of course Ruby is not only for C100K clients/servers.
Yes, I find Threads currently have useful cases (see below).

> >> - IO objects expose a lot of behaviour which is irrelevant to most
> >> use-cases (io/console, io/nonblock which doesn't seem to work at all).
> >> This makes it hard to provide a clean high-level interface.
> >
> > I'm not sure what you mean by "doesn't seem to work at all"
> 
> [1] pry(main)> require 'io/nonblock'
> [2] pry(main)> i, o = IO.pipe
> => [#<IO:fd 11>, #<IO:fd 12>]
> [3] pry(main)> i.nonblock?
> => false
> [4] pry(main)> i.nonblock = true
> => true
> [5] pry(main)> i.nonblock?
> => true
> [6] pry(main)> i.read
> asdf
> ^CInterrupt:
> from (pry):6:in `read'
> [7] pry(main)> i.read(1024)
> ^CInterrupt:
> from (pry):7:in `read'
> [8] pry(main)> i.read_nonblock(1024)
> IO::EAGAINWaitReadable: Resource temporarily unavailable - read would block
> from <internal:prelude>:77:in `__read_nonblock'
> 
> I would have assumed line 6 should behave the same as line 8, but
> perhaps I just don't understand how that API works. The documentation
> is very sparse.

I suppose we can improve documentation (can you provide a patch? :)

I think this IO#read behavior was inherited from Ruby 1.8; where
all sockets/pipes were internally non-blocking for green
Threads.  Anyways, I think exposing synchronous behavior by
default is easier for end users.
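
A quick sketch of that behavior: even with O_NONBLOCK set via
io/nonblock, IO#read retries internally and blocks until data
arrives, while read_nonblock surfaces the would-block condition
(this is my illustration, not code from the thread):

```ruby
require 'io/nonblock'
require 'io/wait'

r, w = IO.pipe
r.nonblock = true

# Only read_nonblock exposes EAGAIN; with exception: false
# (Ruby 2.3+) it returns a symbol instead of raising
# IO::EAGAINWaitReadable.
empty = r.read_nonblock(1024, exception: false)
p empty  # => :wait_readable

w.write("hello")
r.wait_readable
filled = r.read_nonblock(1024, exception: false)
p filled  # => "hello"
```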

> >> - All IO operations should be non-block with a super fast/simple API.
> >> APIs which take complex lists of arguments, in the hot path, should be
> >> avoided (exceptions: true for example). A separate function for
> >> blocking and non-blocking IO is a huge cop-out.
> >
> > NAK.  I find value in using blocking accept/accept4 syscalls
> > (not emulating blocking with green threads/fibers + epoll/kqueue;
> > not even with EPOLLEXCLUSIVE)
> >
> > TL; DR: I have studied the Linux kernel a bit and know
> > how to take advantage of it ---
> >
> > This is because some blocking syscalls can take advantage of
> > "wake one" behavior in the Linux kernel to avoid thundering
> > herds.  EPOLLEXCLUSIVE was added a few years ago to Linux to
> > appease some epoll users; but it's still worse for load
> > distribution at high accept rates.  I'd rather embrace the fact
> > that epoll (and kqueue) themselves are (and must be) MT-friendly.
> >
> > Similarly to accept, UNIXSocket#recv_io has the same behavior
> > with blocking recvmsg when the receiving socket is shared
> > between multiple processes.
> 
> Yes, I looked at this.
> 
> I'm not convinced it's the right way to write a high performance server.
> 
> Using SO_REUSEPORT, you can simply spin up as many processes as you
> like, each listening on the same socket. The OS determines which
> process the request goes to.
> 
> It's currently broken on macOS, but works beautifully and scales
> magnificently on Linux. In theory it also works on BSD.

I still support Linux 2.6.18 and 2.6.32 in cmogstored,
and SO_REUSEPORT only exists in 3.9+
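
For kernels that do support it, the setup is simple enough; this
is an illustrative sketch (not code from the original mail), with
each worker process creating its own listener on the same port:

```ruby
require 'socket'

# Requires Linux 3.9+ for useful load distribution (the constant
# also exists on *BSD/macOS, with different semantics).  The
# kernel distributes incoming connections among the listeners.
def reuseport_listener(host, port)
  s = Socket.new(:INET, :STREAM)
  s.setsockopt(:SOCKET, :REUSEPORT, 1)
  s.bind(Addrinfo.tcp(host, port))
  s.listen(Socket::SOMAXCONN)
  s
end

a = reuseport_listener('127.0.0.1', 0)
port = a.local_address.ip_port
b = reuseport_listener('127.0.0.1', port)  # second listener, same port
```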

How does SO_REUSEPORT handle process shutdown these days?

I remember there were problems in earlier implementations losing
connections if a process closed/exited a listener which had a
socket queued up for it, but haven't followed up on that.  I
think I saw a bit on haproxy being successful with it, though.

Implementation-wise, having a dedicated acceptor thread simplifies
the main event loop's epoll_ctl use: EPOLL_CTL_ADD is called
only once per client, from the dedicated accept thread, so the
main worker threads never have to check anything and can always
call EPOLL_CTL_MOD without caring about EPOLL_CTL_ADD.

One thread per listener is negligible overhead when I have
dozens/hundreds of disks and need >=1 threads per disk.
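
A rough Ruby analogue of that design (illustrative only, not
cmogstored code): one thread blocks in accept, so the kernel's
wake-one behavior applies, and hands clients to worker threads
via a Queue; the workers never touch the listener at all:

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)
clients = Queue.new

# Dedicated acceptor: blocking accept, wake-one friendly.
Thread.new do
  loop { clients << server.accept }
end

# Workers only ever see already-accepted clients.
4.times do
  Thread.new do
    while (c = clients.pop)
      c.write("hi\n")  # stand-in for real per-client work
      c.close
    end
  end
end
```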

> > Furthermore, non-blocking I/O on regular files and directories
> > does not exist in any portable or complete way on *nix
> > platforms.  Threads (and processes)[2] are the only reasonable
> > ways to handle regular files and directories; even on NFS and other
> > network filesystems.
> >
> > [2] inside Linux, they're both "tasks" with different levels of
> >     sharing; the clone(2) manpage might be helpful to understand
> >     this.
> >
> 
> Yes, it's an interesting conundrum - avoiding blocking may simply be
> an impossible goal. Actually, with pre-emptive multi-tasking, that's
> basically a given.
> 
> However, we can avoid it for most common operations, which is a good
> start. In practice, thread pools (e.g. as used in libuv for blocking
> operations like getaddrinfo) might solve the majority of problems.

getaddrinfo in a thread pool is wasteful, and thread pools can
easily suffer from head-of-line blocking.

The same applies to AIO, which uses thread pools; see the footnote in:
  http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/81643
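
A toy illustration of head-of-line blocking (my example, not from
the mail): with a one-thread "pool", a single slow job delays the
unrelated fast job queued behind it:

```ruby
jobs = Queue.new
order = Queue.new

# One-thread "pool" draining a FIFO work queue.
pool = Thread.new do
  while (job = jobs.pop) != :stop
    job.call
  end
end

jobs << -> { sleep 0.2; order << :slow }  # e.g. a stuck getaddrinfo
jobs << -> { order << :fast }             # unrelated, instant work
jobs << :stop
pool.join

# The fast job finishes last, purely because of queue position.
p order.pop  # => :slow
p order.pop  # => :fast
```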

> If your app is going to be slow due to resolving addresses, reading
> directories, and so on - it doesn't matter if the operation is
> blocking or not - latency is going to be affected. It's just that it
> also affects multiplexed operations.

Right.  The bigger problem is head-of-line blocking for unrelated
events/clients accessing different resources.

In a webserver, some clients will be accessing contended
resources and will encounter latency.  However, that latency
should not affect other clients accessing fast resources at the
same time.

That's why I want Ruby to continue to have access to native
threads; they give folks aware of these limitations the ability
to engineer solutions around them.

<snip>

> >> - Fibers are fast, but I think they need to be *the* first class
> >> concurrency construct in Ruby and made as fast as possible. I heard
> >> that calling resume on a fiber does a syscall (if this is the case it
> >> should be removed if possible).
> >
> > We're working on auto-scheduling Fibers for 2.5:
> >
> >         https://bugs.ruby-lang.org/issues/13618
> >         (but API design is hard and not my department)

<snip>

> Auto-scheduling Fibers seems like an interesting idea. Making core
> Ruby heavy seems like a mistake though.
> 
> Why not just a gem, and provide the necessary hooks? Async does
> exactly what is proposed in this issue but with no modifications to
> core Ruby, building on well-established C libraries where possible.

See my response on that ticket
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/81643

> >
> >> - Threads as they are currently implemented should be removed from
> >> Ruby 3.0 - they actually make for a very poor concurrency concept,
> >> considering the GIL. They make all other operations more complex with
> >> no real benefit given how they are currently implemented. Reasoning
> >> about threads is bloody hard. It's even worse that the GIL hides a lot
> >> of broken behaviour. What are threads useful for? IO concurrency? yes
> >> but it's poor performing. Computational concurrency? not unless you
> >> use JRuby, Rubinius, and even then, my experience with those platforms
> >> has generally been sub-par.
> >
> > Again, native threads are useful for filesystem I/O, despite the GVL.
> >
> > I wish threads could be more useful by releasing GVL for readdir and
> > stat operations; but releasing+acquring GVL is expensive :<
> > Short term, I might complete my attempts to make GVL faster for 2.5.
> 
> In my testing the GVL is a significant source of latency and
> contention in threaded servers.
> 
> I should make a comparison for you with real numbers. 8 threads vs 8 processes.

Of course, GVL hurts performance, even in single-thread cases.

Maybe this work-in-progress patch implementing the GVL with a
futex will help Linux users with contention:

  https://80x24.org/spew/20170509062022.4413-1-e / 80x24.org/raw

But I'm not satisfied with the single-core regression and
will try to fix it as time allows.

<snip>

> >> I think that Ruby 3.0 should
> >> - either remove the GIL, or remove Thread.
> >
> > The former would be nice :)  As has been mentioned by others;
> > doing it without hurting single-thread performance is the hard
> > part.
> 
> How does it hurt single threaded performance?

AFAIK one experimental GIL removal used fine-grained locks
everywhere, which meant memory synchronization overhead; it's the
same problem you'll see releasing+reacquiring the GVL for fast
ops in a single thread.

This is why readdir, stat, unlink wrappers in Ruby still hold
GVL: for the hot cache case.  Sadly, that means the entire
Ruby process will stall when NFS goes out to lunch,
instead of just a single thread stalling.

> People have already made implementations of Ruby without the GVL.

I think it's possible without regressions; it just takes time:

> > I'm still hopeful we can take advantage of liburcu and steal
> > more ideas from the Linux kernel (unfortunately, Ruby did not go
> > to GPL-2+ back in the day), but liburcu is LGPL-2.1+ and we
> > already use libgmp optionally.

But neither matz nor ko1 is big on promoting the existing
Thread API; they would rather develop new/better actor APIs
which would be safer.  *shrug*

> >> - simplify IO classes and allow permanent non-blocking mode (e.g.
> >> io.nonblocking = true; io.read gives data or :wait_readable).
> >
> > That's backwards-incompatible and I'd rather we keep using
> > *_nonblock.  In Ruby, 2.5 *_nonblock will take advantage of
> > MSG_DONTWAIT and avoid unnecessary fcntl for sockets under
> > Linux: https://bugs.ruby-lang.org/issues/13362
> 
> That's a good idea.
> 
> The problem is, the behaviour of the underlying IO is leaking out
> through the function name, which I find ugly. It means that every
> function has two versions, and in addition to that, the terrible idea
> to use exceptions, then compounded by the "fix" to use a keyword
> argument on every function call, which only works on some versions of
> Ruby, and you end up with this:
> 
> https://github.com/socketry/async-io/blob/97d46edfbe849df608b79eefc81773548d24cb9d/lib/async/io/generic.rb#L105-L124

Sorry, I wasn't around when the exceptions were added in
1.8/1.9; and sorry, too, for not getting "exception: false"
added sooner.

> It would be better if Ruby just implemented the core read/write and
> nonblocking semantics as one might expect, and then let library
> authors take care of the rest. Instead, I feel like the current IO
> situation in Ruby is over-engineered and facing an identity crisis.
> Even for something as simple as read into a string buffer, has a huge
> performance and cognitive overhead.

I try to stay away from API design; but I prefer
non-blocking/blocking semantics to be per-call rather than
stateful on the object.

It makes it easier to figure out what the caller expects when
reading someone else's code.

At least for Linux + Ruby 2.5, we can avoid fcntl syscalls, too.
And maybe one day the proposed API in
<https://cr.yp.to/unix/nonblock.html> can become available.
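
For example, the per-call style lets two call sites use the same
socket differently without mutating its state (an illustrative
sketch of mine, not code from the thread):

```ruby
require 'socket'

# The socket pair stays in its default blocking mode throughout.
r, w = UNIXSocket.pair

# This caller refuses to block; with exception: false (Ruby 2.3+)
# it gets :wait_readable back instead of an exception.
nothing = r.recv_nonblock(512, exception: false)
p nothing  # => :wait_readable

w.write("ping")
pong = r.recv(512)  # a different call site opts into blocking
p pong  # => "ping"
```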

> There is almost no case where one would want both blocking and
> non-blocking semantics on the same socket.

I have :)   http://mid.gmane.org/20150513023712.GA4206 / dcvr.yhbt.net

Unsubscribe: <mailto:ruby-talk-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-talk>