Issue #13887 has been updated by wanabe (_ wanabe).


normalperson (Eric Wong) wrote:
> s.wanabe / gmail.com wrote:
>  > I guess `cont->machine.stack_src` should not contain the under thread_start_func_1() stack.
>  > Because this area will be used and overwritten by pthread, especially stack_list_{add,del} in nptl/allocatestack.c.
>  > If cont_restore_1() copies `cont->machine.stack` to `cont->machine.stack_src`, it may be reverted above changes and corrupted stack_cache list structures.
>  
>  I think this makes your patch necessary for callcc use
>  regardless of FIBER_USE_NATIVE value.
>  
>  Shall I commit [ruby-core:82925]?

Thank you for your advice. I didn't care about callcc.
Your refined patch [ruby-core:82925] looks good for me, but I want to hear the opinion of Evaluator Maintainer just in case.

Sasada-san, how do you think about the issue?

----------------------------------------
Bug #13887: test/ruby/test_io.rb may get stuck with FIBER_USE_NATIVE=0 on Linux
https://bugs.ruby-lang.org/issues/13887#change-66852

* Author: wanabe (_ wanabe)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 
* Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
Eric Wong wrote in https://bugs.ruby-lang.org/issues/13875#note-5 [ruby-core:82708]
> However, test/ruby/test_io.rb seems stuck when FIBER_USE_NATIVE is 0
> on my system...

I can reproduce with `make optflags="-DFIBER_USE_NATIVE=0 -O0" test-all TESTOPTS="$(git rev-parse --show-cdup)/test/ruby/test_io.rb"` on my Ubuntu 17.10, glibc 2.24-12.

Here is a reduced code.
This code don't stop with SIGINT (CTRL+C) so I had to use SIGKILL.

```
1000.times do |i|
  p i
  r, w = IO.pipe
  w.puts "foo"
  w.close
  rt = Thread.new do
    r.each_char.next
    r.close
  end
  Thread.new {}.join # <= stuck
  rt.join
end
```

GDB showed ALLOCATE_STACK() had entered in infinity loop at get_cached_stack(), static function of glibc.
I guess `stack_cache` local variable of allocatestack.c of glibc got corrupted.

```
$ gdb --args ./miniruby -v a.rb
(snip)
(gdb) r
Starting program: /home/takira/work/prog/ruby/ruby/tmp/.out.tmp/miniruby -v a.rb
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7f2865b5e700 (LWP 5568)]
ruby 2.5.0dev (2017-09-10 trunk 59457) [x86_64-linux]
0
(snip)
[Thread 0x7f8fe2ba4700 (LWP 27769) exited]
51
[Thread 0x7f8fe2da6700 (LWP 27768) exited]
^C
Thread 1 "miniruby" received signal SIGINT, Interrupt.
0x00007f8fe4014bc3 in get_cached_stack (memp=<synthetic pointer>, sizep=<synthetic pointer>) at allocatestack.c:194
warning: Source file is more recent than executable.
194	      if (FREE_P (curr) && curr->stackblock_size >= size)
(gdb) bt 5
#0  0x00007f8fe4014bc3 in get_cached_stack (memp=<synthetic pointer>, sizep=<synthetic pointer>)
    at allocatestack.c:194
#1  allocate_stack (stack=<synthetic pointer>, pdp=<synthetic pointer>, attr=0x7fffffffb350) at allocatestack.c:496
#2  __pthread_create_2_1 (newthread=0x555555d3f488, attr=0x7fffffffb350, 
    start_routine=0x555555713e6d <thread_start_func_1>, arg=0x555555d3f3c0) at pthread_create.c:663
#3  0x0000555555714013 in native_thread_create (th=0x555555d3f3c0) at ../../thread_pthread.c:1031
#4  0x000055555571827a in thread_create_core (thval=93824997656680, args=93824997656600, fn=0x0)
    at ../../thread.c:740
(More stack frames follow...)
(gdb) f 0
#0  0x00007f8fe4014bc3 in get_cached_stack (memp=<synthetic pointer>, sizep=<synthetic pointer>)
    at allocatestack.c:194
194	      if (FREE_P (curr) && curr->stackblock_size >= size)
(gdb) p (&stack_cache)->next
$1 = (struct list_head *) 0x7f8fe2da69c0
(gdb) p (&stack_cache)->next->next->next->next->next
$2 = (struct list_head *) 0x7f8fe2da69c0
```

I've reproduced it with ruby_2_4, ruby_2_3 and ruby_2_2.



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>