Issue #14739 has been updated by ioquatix (Samuel Williams).


I tested async-http, a web server that has a basic performance spec using `wrk` as the client.

I ran it several times and report the best result for each configuration below. It's difficult to make a judgement. I'd like to say performance improved, but if so, by less than 5%. However, this benchmark tests an entire web server stack, and context switching only happens a few times per request. If I had to guess, no more than 4 times (accept, read request, write response). In many cases, we only context switch if the operation would block.
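To illustrate "we only context switch if the operation would block", here is a minimal sketch in plain Ruby. The helper `read_or_yield` is hypothetical and is not async-http's API; a real reactor would wait for readiness (e.g. with `IO.select`) before resuming the fiber, whereas this sketch drives the fiber by hand.

```ruby
# Hypothetical helper (not part of async-http): read from `io` inside a
# fiber, yielding to the caller only when the read would block.
def read_or_yield(io, maxlen)
  loop do
    result = io.read_nonblock(maxlen, exception: false)
    return result unless result == :wait_readable
    # Nothing is buffered yet: this is the only place a context switch happens.
    Fiber.yield(:wait_readable)
  end
end

reader, writer = IO.pipe

# Case 1: no data yet, so the fiber has to yield once.
fiber = Fiber.new { read_or_yield(reader, 1024) }
state = fiber.resume        # fiber yields because the read would block
writer.write("hello")
data = fiber.resume         # data is now available, the read completes

# Case 2: data is already buffered, so the fiber never yields.
writer.write("world")
ready = Fiber.new { read_or_yield(reader, 1024) }.resume

puts state.inspect, data.inspect, ready.inspect
```

In the second case the request is served with no yield/resume pair at all, which is why a faster context switch has little room to show up in this kind of benchmark.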

```
# Without libcoro-fiber
Async::HTTP::Server
  simple response
Running 2m test @ http://127.0.0.1:9292/
  8 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   110.06us  647.25us  67.72ms   99.33%
    Req/Sec    12.58k     3.07k   26.94k    70.77%
  12021990 requests in 2.00m, 401.28MB read
Requests/sec: 100100.72
Transfer/sec:      3.34MB

# With libcoro-fiber
Async::HTTP::Server
  simple response
Running 2m test @ http://127.0.0.1:9292/
  8 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   106.47us  834.32us  99.45ms   99.46%
    Req/Sec    12.66k     2.95k   17.61k    71.12%
  12093398 requests in 2.00m, 403.66MB read
Requests/sec: 100694.76
Transfer/sec:      3.36MB
```
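For reference, the relative difference between the two runs can be worked out directly from the `wrk` figures above. This is just arithmetic on the reported numbers; the variable names are mine.

```ruby
# Figures copied from the two wrk runs above.
rps_without     = 100_100.72   # Requests/sec without libcoro-fiber
rps_with        = 100_694.76   # Requests/sec with libcoro-fiber
latency_without = 110.06       # average latency in microseconds
latency_with    = 106.47

rps_change     = (rps_with - rps_without) / rps_without * 100.0
latency_change = (latency_without - latency_with) / latency_without * 100.0

puts format("throughput: +%.2f%%", rps_change)       # about +0.59%
puts format("avg latency: -%.2f%%", latency_change)  # about -3.26%
```

Both differences are small compared to the reported standard deviations, so they should be read as "no measurable regression" rather than as a confirmed speed-up.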

This result surprised me a little, but now that I think about it, it makes sense: the cost of network I/O (read/write) and processing (parsing, generating the response, buffers, GC) far outweighs the cost of fiber yield/resume, which is already minimised. In real-world situations, the results should lean more in favour of libcoro.
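To see the yield/resume cost in isolation, without the network and parsing overhead, a micro-benchmark along these lines could be used. This is a rough sketch, not the benchmark used for the numbers in this message; the iteration count is arbitrary and the result will vary by machine and Ruby build.

```ruby
require "benchmark"

ITERATIONS = 200_000

# A fiber that does nothing except yield straight back to its caller.
fiber = Fiber.new do
  loop { Fiber.yield }
end

elapsed = Benchmark.realtime do
  ITERATIONS.times { fiber.resume }
end

# Each iteration is one resume plus one yield, i.e. two context switches.
per_switch_ns = elapsed / (ITERATIONS * 2) * 1e9
puts format("%.0f ns per context switch", per_switch_ns)
```

Comparing this figure between builds, next to the roughly 100us average request latency reported by `wrk`, gives a sense of how small a share of each request the switch itself accounts for.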

Just for interest, I also collected system call stats.

```
# Without libcoro
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.76    4.635066           2   2095278           sendto
 32.47    3.288691           1   4191323           rt_sigprocmask
 20.90    2.117062           1   2095611       324 recvfrom
  0.67    0.068189        9741         7           poll
  0.07    0.006821           1      6256      5313 openat
  0.03    0.003404           1      4034         5 lstat
  0.01    0.001072           1      1158           read
  0.01    0.001049           1       987           close
  0.01    0.000805           1       901       421 stat
  0.01    0.000627          25        25           clone
  0.01    0.000624           1       793           fstat
  0.01    0.000521           4       124           mmap
  0.00    0.000475           1       798       246 fcntl
  0.00    0.000475           2       297         1 epoll_wait
  0.00    0.000402           3       140           mremap
  0.00    0.000386           1       346       322 epoll_ctl
  0.00    0.000331           1       557       552 ioctl
  0.00    0.000323          16        20           futex
  0.00    0.000321           3        94           mprotect
  0.00    0.000307           1       213           brk
  0.00    0.000255           4        62           getdents
  0.00    0.000183           1       291           getuid
  0.00    0.000180           1       292           geteuid
  0.00    0.000177           1       292           getegid
  0.00    0.000172           1       291           getgid
  0.00    0.000096           3        36           pipe2
  0.00    0.000074           6        12           munmap
  0.00    0.000066          11         6         2 execve
  0.00    0.000052           2        23        14 accept4
  0.00    0.000047           3        18           prctl
  0.00    0.000047           2        27           set_robust_list
  0.00    0.000045           2        19           getpid
  0.00    0.000040           0        81         2 rt_sigaction
  0.00    0.000028           2        16         8 access
  0.00    0.000017           1        15           getcwd
  0.00    0.000016           1        14           readlink
  0.00    0.000016           0       241       238 newfstatat
  0.00    0.000014           0        96           lseek
  0.00    0.000013           1        10           chdir
  0.00    0.000013           3         4           arch_prctl
  0.00    0.000012           0        25           setsockopt
  0.00    0.000009           0        25           getsockname
  0.00    0.000007           2         4           prlimit64
  0.00    0.000006           0        17           getsockopt
  0.00    0.000006           3         2           getrandom
  0.00    0.000004           2         2           sched_getaffinity
  0.00    0.000004           4         1           clock_gettime
  0.00    0.000003           2         2           write
  0.00    0.000003           3         1           sigaltstack
  0.00    0.000003           2         2           set_tid_address
  0.00    0.000002           2         1           vfork
  0.00    0.000001           1         1           wait4
  0.00    0.000001           1         1           getresgid
  0.00    0.000000           0         8           pipe
  0.00    0.000000           0         1           dup2
  0.00    0.000000           0         8           socket
  0.00    0.000000           0         8           bind
  0.00    0.000000           0         8           listen
  0.00    0.000000           0         1           sysinfo
  0.00    0.000000           0         1           getresuid
  0.00    0.000000           0         8           epoll_create1
------ ----------- ----------- --------- --------- ----------------
100.00   10.128563               8400935      7448 total

# With libcoro
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.83    5.263501           2   2708883           sendto
 32.87    2.628193           1   2709155       263 recvfrom
  1.06    0.084583       16917         5           poll
  0.09    0.006915           1      6232      5313 openat
  0.06    0.004405           1      4034         5 lstat
  0.02    0.001276           1      1123           read
  0.02    0.001207           1       833       379 stat
  0.01    0.000996           1       963           close
  0.01    0.000510           1       785           fstat
  0.01    0.000492           1       533       528 ioctl
  0.00    0.000330           2       162         1 epoll_wait
  0.00    0.000327           0       797       246 fcntl
  0.00    0.000285          11        25           clone
  0.00    0.000253           1       232           brk
  0.00    0.000253           1       284       260 epoll_ctl
  0.00    0.000239           2       123           mmap
  0.00    0.000207           2        95           mprotect
  0.00    0.000168           8        20           futex
  0.00    0.000163           3        62           getdents
  0.00    0.000142           0       291           getuid
  0.00    0.000139           1       238       235 newfstatat
  0.00    0.000133           0       292           geteuid
  0.00    0.000131           0       291           getgid
  0.00    0.000129           0       292           getegid
  0.00    0.000080           7        12           munmap
  0.00    0.000058           2        32           rt_sigprocmask
  0.00    0.000057           1        88           lseek
  0.00    0.000057           2        36           pipe2
  0.00    0.000044           1        81         2 rt_sigaction
  0.00    0.000043           3        14           readlink
  0.00    0.000039           2        16         8 access
  0.00    0.000036           2        22        13 accept4
  0.00    0.000035           1        27           set_robust_list
  0.00    0.000033           2        18           prctl
  0.00    0.000028           1        19           getpid
  0.00    0.000026           2        15           getcwd
  0.00    0.000020           2        10           chdir
  0.00    0.000013          13         1           wait4
  0.00    0.000009           5         2           getrandom
  0.00    0.000008           0        25           setsockopt
  0.00    0.000006           3         2           write
  0.00    0.000006           0        25           getsockname
  0.00    0.000003           3         1           vfork
  0.00    0.000003           1         6         2 execve
  0.00    0.000003           1         4           arch_prctl
  0.00    0.000003           2         2           set_tid_address
  0.00    0.000003           1         4           prlimit64
  0.00    0.000002           0        17           getsockopt
  0.00    0.000002           2         1           sigaltstack
  0.00    0.000001           1         1           getresuid
  0.00    0.000001           1         1           getresgid
  0.00    0.000001           1         2           sched_getaffinity
  0.00    0.000000           0         8           pipe
  0.00    0.000000           0         1           dup2
  0.00    0.000000           0         8           socket
  0.00    0.000000           0         8           bind
  0.00    0.000000           0         8           listen
  0.00    0.000000           0         1           sysinfo
  0.00    0.000000           0         1           clock_gettime
  0.00    0.000000           0         8           epoll_create1
------ ----------- ----------- --------- --------- ----------------
```

`rt_sigprocmask` is almost gone (32 calls instead of 4191323) because libcoro does not invoke it unless it is using `swapcontext`.
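The size of that saving can be estimated from the "Without libcoro" table above. The following is only arithmetic on the reported counts; using `sendto` calls as a proxy for the number of responses is my assumption.

```ruby
# Counts and times copied from the "Without libcoro" strace summary.
sendto_calls        = 2_095_278
sigprocmask_calls   = 4_191_323
sigprocmask_seconds = 3.288691
total_seconds       = 10.128563

# Assumption: one sendto per response, so this is "calls per response".
calls_per_response = sigprocmask_calls.to_f / sendto_calls
share_of_time      = sigprocmask_seconds / total_seconds * 100.0

puts format("rt_sigprocmask calls per response: %.2f", calls_per_response)  # about 2.00
puts format("share of traced syscall time: %.2f%%", share_of_time)          # about 32.47%
```

So without libcoro there are about two `rt_sigprocmask` calls per response, accounting for roughly a third of the traced system call time. With libcoro the same row shows only 32 calls.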

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71883

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
I am interested in improving Fiber yield/resume performance.

I've used this library before: http://software.schmorp.de/pkg/libcoro.html and handled millions of HTTP requests using it.

I'd suggest using that library.

As this is used in many places in Ruby (e.g. `Enumerator`), it could be a big performance win across the board.
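As an example of where fibers show up outside of explicit `Fiber` usage: external enumeration with `Enumerator#next` is implemented with a fiber in CRuby, so every call to `next` involves a resume/yield pair. The hand-written fiber below is only a rough equivalent for illustration, not the actual implementation.

```ruby
# External enumeration: each call to #next switches into the enumerator's
# fiber and back.
enum   = [1, 2, 3].each
first  = enum.next
second = enum.next

# Rough hand-written equivalent using an explicit Fiber.
fiber = Fiber.new do
  [1, 2, 3].each { |x| Fiber.yield(x) }
  nil
end
a = fiber.resume
b = fiber.resume

puts [first, second, a, b].inspect
```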

Here is a nice summary of what was done for RethinkDB: https://rethinkdb.com/blog/making-coroutines-fast/

Does Ruby currently reuse stacks? This is also a big performance win if it's not being done already.


