Issue #14739 has been updated by ioquatix (Samuel Williams).

I tested async-http, a web server; it has a basic performance spec that uses `wrk` as the client. I ran it several times and report the best result for each configuration below. It's difficult to make a firm judgement: I'd like to say performance improved, but if so, by less than 5%. However, this benchmark is testing an entire web server stack, and context switching only happens a few times per request; if I had to guess, no more than 4 times (accept, read the request, write the response). In many cases, we only context switch if the operation would block.
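
To make that concrete, the pattern looks roughly like this. This is only a minimal sketch, not async-http's actual implementation; `reactor` is a hypothetical object standing in for the event loop, which registers interest in the socket and resumes the waiting fiber later:

```ruby
# Minimal sketch of the "only switch when the operation would block" pattern.
# Illustrative only; not async-http's actual code. `reactor` is a hypothetical
# object that registers interest with the event loop and resumes this fiber
# once the socket becomes readable.
def read_request(socket, reactor, chunk_size = 1024)
  buffer = String.new

  loop do
    case chunk = socket.read_nonblock(chunk_size, exception: false)
    when :wait_readable
      # The read would block: park this fiber and hand control back to the
      # event loop. On the fast path (data already buffered) we never get
      # here, so no context switch happens at all.
      reactor.wait_readable(socket)
      Fiber.yield
    when nil
      break # peer closed the connection
    else
      buffer << chunk
      break if buffer.include?("\r\n\r\n") # end of the request headers
    end
  end

  buffer
end
```

When the data is already buffered, the fiber never yields, so the switch cost disappears from the hot path entirely; that is part of why the end-to-end difference below is so small.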

```
# Without libcoro-fiber

Async::HTTP::Server simple response
Running 2m test @ http://127.0.0.1:9292/
  8 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   110.06us  647.25us  67.72ms   99.33%
    Req/Sec    12.58k     3.07k   26.94k    70.77%
  12021990 requests in 2.00m, 401.28MB read
Requests/sec: 100100.72
Transfer/sec:      3.34MB

# With libcoro-fiber

Async::HTTP::Server simple response
Running 2m test @ http://127.0.0.1:9292/
  8 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   106.47us  834.32us  99.45ms   99.46%
    Req/Sec    12.66k     2.95k   17.61k    71.12%
  12093398 requests in 2.00m, 403.66MB read
Requests/sec: 100694.76
Transfer/sec:      3.36MB
```

This result surprised me a little, but on reflection it makes sense: the cost of the network I/O (read/write) and processing (parsing, generating the response, buffer management, GC) far outweighs the cost of fiber yield/resume, which is already minimised. In real-world situations the results should lean further in favour of libcoro.

Just for interest, I also collected system call stats.

```
# Without libcoro

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.76    4.635066           2   2095278           sendto
 32.47    3.288691           1   4191323           rt_sigprocmask
 20.90    2.117062           1   2095611       324 recvfrom
  0.67    0.068189        9741         7           poll
  0.07    0.006821           1      6256      5313 openat
  0.03    0.003404           1      4034         5 lstat
  0.01    0.001072           1      1158           read
  0.01    0.001049           1       987           close
  0.01    0.000805           1       901       421 stat
  0.01    0.000627          25        25           clone
  0.01    0.000624           1       793           fstat
  0.01    0.000521           4       124           mmap
  0.00    0.000475           1       798       246 fcntl
  0.00    0.000475           2       297         1 epoll_wait
  0.00    0.000402           3       140           mremap
  0.00    0.000386           1       346       322 epoll_ctl
  0.00    0.000331           1       557       552 ioctl
  0.00    0.000323          16        20           futex
  0.00    0.000321           3        94           mprotect
  0.00    0.000307           1       213           brk
  0.00    0.000255           4        62           getdents
  0.00    0.000183           1       291           getuid
  0.00    0.000180           1       292           geteuid
  0.00    0.000177           1       292           getegid
  0.00    0.000172           1       291           getgid
  0.00    0.000096           3        36           pipe2
  0.00    0.000074           6        12           munmap
  0.00    0.000066          11         6         2 execve
  0.00    0.000052           2        23        14 accept4
  0.00    0.000047           3        18           prctl
  0.00    0.000047           2        27           set_robust_list
  0.00    0.000045           2        19           getpid
  0.00    0.000040           0        81         2 rt_sigaction
  0.00    0.000028           2        16         8 access
  0.00    0.000017           1        15           getcwd
  0.00    0.000016           1        14           readlink
  0.00    0.000016           0       241       238 newfstatat
  0.00    0.000014           0        96           lseek
  0.00    0.000013           1        10           chdir
  0.00    0.000013           3         4           arch_prctl
  0.00    0.000012           0        25           setsockopt
  0.00    0.000009           0        25           getsockname
  0.00    0.000007           2         4           prlimit64
  0.00    0.000006           0        17           getsockopt
  0.00    0.000006           3         2           getrandom
  0.00    0.000004           2         2           sched_getaffinity
  0.00    0.000004           4         1           clock_gettime
  0.00    0.000003           2         2           write
  0.00    0.000003           3         1           sigaltstack
  0.00    0.000003           2         2           set_tid_address
  0.00    0.000002           2         1           vfork
  0.00    0.000001           1         1           wait4
  0.00    0.000001           1         1           getresgid
  0.00    0.000000           0         8           pipe
  0.00    0.000000           0         1           dup2
  0.00    0.000000           0         8           socket
  0.00    0.000000           0         8           bind
  0.00    0.000000           0         8           listen
  0.00    0.000000           0         1           sysinfo
  0.00    0.000000           0         1           getresuid
  0.00    0.000000           0         8           epoll_create1
------ ----------- ----------- --------- --------- ----------------
100.00   10.128563               8400935      7448 total

# With libcoro

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.83    5.263501           2   2708883           sendto
 32.87    2.628193           1   2709155       263 recvfrom
  1.06    0.084583       16917         5           poll
  0.09    0.006915           1      6232      5313 openat
  0.06    0.004405           1      4034         5 lstat
  0.02    0.001276           1      1123           read
  0.02    0.001207           1       833       379 stat
  0.01    0.000996           1       963           close
  0.01    0.000510           1       785           fstat
  0.01    0.000492           1       533       528 ioctl
  0.00    0.000330           2       162         1 epoll_wait
  0.00    0.000327           0       797       246 fcntl
  0.00    0.000285          11        25           clone
  0.00    0.000253           1       232           brk
  0.00    0.000253           1       284       260 epoll_ctl
  0.00    0.000239           2       123           mmap
  0.00    0.000207           2        95           mprotect
  0.00    0.000168           8        20           futex
  0.00    0.000163           3        62           getdents
  0.00    0.000142           0       291           getuid
  0.00    0.000139           1       238       235 newfstatat
  0.00    0.000133           0       292           geteuid
  0.00    0.000131           0       291           getgid
  0.00    0.000129           0       292           getegid
  0.00    0.000080           7        12           munmap
  0.00    0.000058           2        32           rt_sigprocmask
  0.00    0.000057           1        88           lseek
  0.00    0.000057           2        36           pipe2
  0.00    0.000044           1        81         2 rt_sigaction
  0.00    0.000043           3        14           readlink
  0.00    0.000039           2        16         8 access
  0.00    0.000036           2        22        13 accept4
  0.00    0.000035           1        27           set_robust_list
  0.00    0.000033           2        18           prctl
  0.00    0.000028           1        19           getpid
  0.00    0.000026           2        15           getcwd
  0.00    0.000020           2        10           chdir
  0.00    0.000013          13         1           wait4
  0.00    0.000009           5         2           getrandom
  0.00    0.000008           0        25           setsockopt
  0.00    0.000006           3         2           write
  0.00    0.000006           0        25           getsockname
  0.00    0.000003           3         1           vfork
  0.00    0.000003           1         6         2 execve
  0.00    0.000003           1         4           arch_prctl
  0.00    0.000003           2         2           set_tid_address
  0.00    0.000003           1         4           prlimit64
  0.00    0.000002           0        17           getsockopt
  0.00    0.000002           2         1           sigaltstack
  0.00    0.000001           1         1           getresuid
rake aborted!
  0.00    0.000001           1         1           getresgid
  0.00    0.000001           1         2           sched_getaffinity
  0.00    0.000000           0         8           pipe
  0.00    0.000000           0         1           dup2
  0.00    0.000000           0         8           socket
  0.00    0.000000           0         8           bind
  0.00    0.000000           0         8           listen
Interrupt:
  0.00    0.000000           0         1           sysinfo
  0.00    0.000000           0         1           clock_gettime
  0.00    0.000000           0         8           epoll_create1
------ ----------- ----------- --------- --------- ----------------
```

`rt_sigprocmask` is essentially gone because libcoro only invokes it when the `swapcontext` backend is used; `swapcontext` saves and restores the signal mask on every switch, which is where those calls came from.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71883

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
I am interested in improving Fiber yield/resume performance. I've used this library before: http://software.schmorp.de/pkg/libcoro.html, and have handled millions of HTTP requests using it. I'd suggest using that library.

As fibers are used in many places in Ruby (e.g. `Enumerator`), this could be a big performance win across the board.

Here is a nice summary of what was done for RethinkDB: https://rethinkdb.com/blog/making-coroutines-fast/

Does Ruby currently reuse stacks? That would also be a big performance win if it's not being done already.
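
For anyone who wants to quantify the two costs discussed here, the raw yield/resume switch and fiber creation (which is where stack allocation and reuse show up), a micro-benchmark along these lines is enough. This is an illustrative sketch only, not one of the measurements reported above, and absolute numbers will vary with machine and Ruby version:

```ruby
# Illustrative micro-benchmark: isolates the raw Fiber switch cost and the
# Fiber creation cost (the latter is dominated by stack allocation when
# stacks are not reused). Not part of the measurements reported above.
require 'benchmark'

SWITCHES = 1_000_000
CREATES  = 100_000

fiber = Fiber.new do
  loop { Fiber.yield }
end

elapsed = Benchmark.realtime do
  SWITCHES.times { fiber.resume }
end
# Each iteration is one resume plus one yield, i.e. two context switches.
puts format("switch:        %.0f ns", elapsed / (SWITCHES * 2) * 1e9)

elapsed = Benchmark.realtime do
  CREATES.times { Fiber.new {}.resume }
end
puts format("create+resume: %.0f ns", elapsed / CREATES * 1e9)
```

Comparing the two numbers against the per-request cost implied by the `wrk` results above makes it clear why the switch itself is only a small fraction of each request, and why stack reuse matters most for workloads that create many short-lived fibers.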