Issue #14841 has been updated by hirura (Hiroyuki URANISHI).

File gdb_trace_normal.txt added
File gdb_trace_reproduced.txt added
ruby -v changed from ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux] to ruby 2.6.0dev (2018-06-15 trunk 63671) [x86_64-linux]

Hi,

I tried getting gdb backtrace of a reproduced, means stopped in this case, process.
Attached files are the outputs of reproduced case and normal case.

Could anyone help on this ?


hirura (Hiroyuki URANISHI) wrote:
> Hi,
> 
> 
> I'm now writing a multi thread application, which uses IO.pipe, IO#close and IO#readpartial.
> During running a code many times, I found that, even though calling IO#close on a write side object instantiated by IO.pipe, IO#readpartial of the read side object very rarely does not raise EOFError and keeps blocking.
> 
> The below code is working fine in most cases.
> 1. Thread t2 generates '0' and '1' values and writes them on w object.
> 2. Thread t1 read from r object with IO#readpartial
> 3. After thread t2 exited, close w object to finish thread t1
> 
> ~~~ ruby
> require 'logger'
> 
> logger = Logger.new 'logger.log', 1, 10000000
> logger.level = Logger::DEBUG
> 
> r, w = IO.pipe
> 
> t0 = Thread.new{
>   # This is a dummy thread to avoid being killed by Ruby interpreter automatically due to deadlock
>   loop do
>     sleep 1
>     Thread.pass
>   end
> }
> 
> t1 = Thread.new{
>   loop do
>     begin
>       logger.debug { "r.readpartial(10240)" }
>       data_read = r.readpartial(10240)
>       logger.debug { "data_read: #{data_read}" }
>     rescue EOFError
>       logger.info { "EOFError" }
>       break
>     rescue => e
>       logger.error { e.message }
>     end
>   end
>   logger.info { "exiting t1" }
> }
> 
> t2 = Thread.new{
>   2.times{ |i|
>     sleep 0.1
>     w.write i.to_s
>   }
>   logger.info { "exiting t2" }
> }
> 
> begin
>   t2.join
>   logger.info { "t2 exited" }
>   logger.info { "close w" }
>   w.close
>   logger.info { "w closed" }
> rescue IOError
>   logger.info { "IOError when closing w" }
> end
> 
> t1.join
> logger.info { "t1 exited" }
> 
> begin
>   r.close
> rescue IOError
>   logger.info { "IOError when closing r" }
> end
> ~~~
> 
> And I run the above code many times like:
> 
> ~~~
> $ i=0; while :; do echo $i; ruby readpartial_and_eof_test.rb; i=$(expr $i + 1); done
> 0
> 1
> (Snip)
> 3892
> 3893
> 3894
> (Stopped here)
> ~~~
> 
> As seen above, during 3894th iteration, the code is not finished.
> 
> The output of logger is the below. Closing w object is done, but EOFError looks not being raised.
> 
> ~~~
> $ tail -n 32 logger.log 
> I, [2018-06-10T22:09:58.738087 #19893]  INFO -- : t1 exited
> D, [2018-06-10T22:09:58.871906 #19916] DEBUG -- : r.readpartial(10240)
> D, [2018-06-10T22:09:58.972198 #19916] DEBUG -- : data_read: 0
> D, [2018-06-10T22:09:58.972327 #19916] DEBUG -- : r.readpartial(10240)
> I, [2018-06-10T22:09:59.072294 #19916]  INFO -- : exiting t2
> D, [2018-06-10T22:09:59.072441 #19916] DEBUG -- : data_read: 1
> D, [2018-06-10T22:09:59.072758 #19916] DEBUG -- : r.readpartial(10240)
> I, [2018-06-10T22:09:59.072667 #19916]  INFO -- : t2 exited
> I, [2018-06-10T22:09:59.072915 #19916]  INFO -- : close w
> I, [2018-06-10T22:09:59.073011 #19916]  INFO -- : w closed
> I, [2018-06-10T22:09:59.073101 #19916]  INFO -- : EOFError
> I, [2018-06-10T22:09:59.073166 #19916]  INFO -- : exiting t1
> I, [2018-06-10T22:09:59.073268 #19916]  INFO -- : t1 exited
> D, [2018-06-10T22:09:59.165037 #19944] DEBUG -- : r.readpartial(10240)
> D, [2018-06-10T22:09:59.265229 #19944] DEBUG -- : data_read: 0
> D, [2018-06-10T22:09:59.265279 #19944] DEBUG -- : r.readpartial(10240)
> D, [2018-06-10T22:09:59.365360 #19944] DEBUG -- : data_read: 1
> D, [2018-06-10T22:09:59.365469 #19944] DEBUG -- : r.readpartial(10240)
> I, [2018-06-10T22:09:59.365252 #19944]  INFO -- : exiting t2
> I, [2018-06-10T22:09:59.365640 #19944]  INFO -- : t2 exited
> I, [2018-06-10T22:09:59.365709 #19944]  INFO -- : close w
> I, [2018-06-10T22:09:59.365754 #19944]  INFO -- : w closed
> I, [2018-06-10T22:09:59.365819 #19944]  INFO -- : EOFError
> I, [2018-06-10T22:09:59.365959 #19944]  INFO -- : exiting t1
> I, [2018-06-10T22:09:59.366078 #19944]  INFO -- : t1 exited
> D, [2018-06-10T22:09:59.472484 #19967] DEBUG -- : r.readpartial(10240)
> D, [2018-06-10T22:09:59.572743 #19967] DEBUG -- : data_read: 0
> D, [2018-06-10T22:09:59.572891 #19967] DEBUG -- : r.readpartial(10240)
> I, [2018-06-10T22:09:59.672976 #19967]  INFO -- : exiting t2
> I, [2018-06-10T22:09:59.673165 #19967]  INFO -- : t2 exited
> I, [2018-06-10T22:09:59.673199 #19967]  INFO -- : close w
> I, [2018-06-10T22:09:59.673220 #19967]  INFO -- : w closed
> $
> ~~~
> 
> 
> This issue occurred in CentOS 7.2.1511 and Ubuntu 16.04.4 LTS.
> 
> * CentOS
> ~~~
> $ cat /etc/redhat-release 
> CentOS Linux release 7.2.1511 (Core) 
> $ uname -a
> Linux mfz-remote-gw-01 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
> $ ruby -v
> ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]
> $ rbenv -v
> rbenv 1.1.1-30-gc8ba27f
> $ gcc -v
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
> Target: x86_64-redhat-linux
> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
> Thread model: posix
> gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) 
> ~~~
> 
> * Ubuntu
> ~~~
> $ cat /etc/os-release 
> NAME="Ubuntu"
> VERSION="16.04.4 LTS (Xenial Xerus)"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu 16.04.4 LTS"
> VERSION_ID="16.04"
> HOME_URL="http://www.ubuntu.com/"
> SUPPORT_URL="http://help.ubuntu.com/"
> BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
> VERSION_CODENAME=xenial
> UBUNTU_CODENAME=xenial
> $ uname -a
> Linux epsilon 4.4.0-122-generic #146-Ubuntu SMP Mon Apr 23 15:34:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> $ ruby -v
> ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]
> $ rbenv -v
> rbenv 1.1.1-37-g1c772d5
> $ gcc -v
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
> Target: x86_64-linux-gnu
> Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.9' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --
 with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
> Thread model: posix
> gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9) 
> ~~~



----------------------------------------
Bug #14841: Very rarely IO#readpartial does not raise EOFError
https://bugs.ruby-lang.org/issues/14841#change-72518

* Author: hirura (Hiroyuki URANISHI)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.0dev (2018-06-15 trunk 63671) [x86_64-linux]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
Hi,


I'm now writing a multi thread application, which uses IO.pipe, IO#close and IO#readpartial.
During running a code many times, I found that, even though calling IO#close on a write side object instantiated by IO.pipe, IO#readpartial of the read side object very rarely does not raise EOFError and keeps blocking.

The below code is working fine in most cases.
1. Thread t2 generates '0' and '1' values and writes them on w object.
2. Thread t1 read from r object with IO#readpartial
3. After thread t2 exited, close w object to finish thread t1

~~~ ruby
require 'logger'

logger = Logger.new 'logger.log', 1, 10000000
logger.level = Logger::DEBUG

r, w = IO.pipe

t0 = Thread.new{
  # This is a dummy thread to avoid being killed by Ruby interpreter automatically due to deadlock
  loop do
    sleep 1
    Thread.pass
  end
}

t1 = Thread.new{
  loop do
    begin
      logger.debug { "r.readpartial(10240)" }
      data_read = r.readpartial(10240)
      logger.debug { "data_read: #{data_read}" }
    rescue EOFError
      logger.info { "EOFError" }
      break
    rescue => e
      logger.error { e.message }
    end
  end
  logger.info { "exiting t1" }
}

t2 = Thread.new{
  2.times{ |i|
    sleep 0.1
    w.write i.to_s
  }
  logger.info { "exiting t2" }
}

begin
  t2.join
  logger.info { "t2 exited" }
  logger.info { "close w" }
  w.close
  logger.info { "w closed" }
rescue IOError
  logger.info { "IOError when closing w" }
end

t1.join
logger.info { "t1 exited" }

begin
  r.close
rescue IOError
  logger.info { "IOError when closing r" }
end
~~~

And I run the above code many times like:

~~~
$ i=0; while :; do echo $i; ruby readpartial_and_eof_test.rb; i=$(expr $i + 1); done
0
1
(Snip)
3892
3893
3894
(Stopped here)
~~~

As seen above, during 3894th iteration, the code is not finished.

The output of logger is the below. Closing w object is done, but EOFError looks not being raised.

~~~
$ tail -n 32 logger.log 
I, [2018-06-10T22:09:58.738087 #19893]  INFO -- : t1 exited
D, [2018-06-10T22:09:58.871906 #19916] DEBUG -- : r.readpartial(10240)
D, [2018-06-10T22:09:58.972198 #19916] DEBUG -- : data_read: 0
D, [2018-06-10T22:09:58.972327 #19916] DEBUG -- : r.readpartial(10240)
I, [2018-06-10T22:09:59.072294 #19916]  INFO -- : exiting t2
D, [2018-06-10T22:09:59.072441 #19916] DEBUG -- : data_read: 1
D, [2018-06-10T22:09:59.072758 #19916] DEBUG -- : r.readpartial(10240)
I, [2018-06-10T22:09:59.072667 #19916]  INFO -- : t2 exited
I, [2018-06-10T22:09:59.072915 #19916]  INFO -- : close w
I, [2018-06-10T22:09:59.073011 #19916]  INFO -- : w closed
I, [2018-06-10T22:09:59.073101 #19916]  INFO -- : EOFError
I, [2018-06-10T22:09:59.073166 #19916]  INFO -- : exiting t1
I, [2018-06-10T22:09:59.073268 #19916]  INFO -- : t1 exited
D, [2018-06-10T22:09:59.165037 #19944] DEBUG -- : r.readpartial(10240)
D, [2018-06-10T22:09:59.265229 #19944] DEBUG -- : data_read: 0
D, [2018-06-10T22:09:59.265279 #19944] DEBUG -- : r.readpartial(10240)
D, [2018-06-10T22:09:59.365360 #19944] DEBUG -- : data_read: 1
D, [2018-06-10T22:09:59.365469 #19944] DEBUG -- : r.readpartial(10240)
I, [2018-06-10T22:09:59.365252 #19944]  INFO -- : exiting t2
I, [2018-06-10T22:09:59.365640 #19944]  INFO -- : t2 exited
I, [2018-06-10T22:09:59.365709 #19944]  INFO -- : close w
I, [2018-06-10T22:09:59.365754 #19944]  INFO -- : w closed
I, [2018-06-10T22:09:59.365819 #19944]  INFO -- : EOFError
I, [2018-06-10T22:09:59.365959 #19944]  INFO -- : exiting t1
I, [2018-06-10T22:09:59.366078 #19944]  INFO -- : t1 exited
D, [2018-06-10T22:09:59.472484 #19967] DEBUG -- : r.readpartial(10240)
D, [2018-06-10T22:09:59.572743 #19967] DEBUG -- : data_read: 0
D, [2018-06-10T22:09:59.572891 #19967] DEBUG -- : r.readpartial(10240)
I, [2018-06-10T22:09:59.672976 #19967]  INFO -- : exiting t2
I, [2018-06-10T22:09:59.673165 #19967]  INFO -- : t2 exited
I, [2018-06-10T22:09:59.673199 #19967]  INFO -- : close w
I, [2018-06-10T22:09:59.673220 #19967]  INFO -- : w closed
$
~~~


This issue occurred in CentOS 7.2.1511 and Ubuntu 16.04.4 LTS.

* CentOS
~~~
$ cat /etc/redhat-release 
CentOS Linux release 7.2.1511 (Core) 
$ uname -a
Linux mfz-remote-gw-01 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ ruby -v
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]
$ rbenv -v
rbenv 1.1.1-30-gc8ba27f
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) 
~~~

* Ubuntu
~~~
$ cat /etc/os-release 
NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.4 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
$ uname -a
Linux epsilon 4.4.0-122-generic #146-Ubuntu SMP Mon Apr 23 15:34:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ ruby -v
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]
$ rbenv -v
rbenv 1.1.1-37-g1c772d5
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.9' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --wi
 th-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9) 
~~~

---Files--------------------------------
gdb_trace_normal.txt (15.6 KB)
gdb_trace_reproduced.txt (14.2 KB)


-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>