Issue #10776 has been updated by Alex Coomans.


Awesome, thank you for looking at this so quickly!

----------------------------------------
Bug #10776: Ruby Chooses Incorrect Load Path For rubygems.rb
https://bugs.ruby-lang.org/issues/10776#change-51225

* Author: Alex Coomans
* Status: Closed
* Priority: Low
* Assignee: 
* ruby -v: 2.0.0p598
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
### Problem

I believe this problem affects versions 1.9.3 and up, based on a git blame, but I haven't actually tested those versions.

The following conditions need to all be met:

1. Ruby must be compiled without `--enable-shared` 
2. argv[0] to ruby must simply be `ruby`

And either one of the following needs to be met:

1. Your PATH must include a directory that contains a subdirectory named `ruby`, and that entry must come before the directory where the ruby binary is located
2. The ruby binary is located somewhere under a directory named `ruby` (at any depth of subdirectories). E.g.: `/test/ruby/bin/ruby`

When you then try to execute ruby, it detects the wrong ruby install directory and fails to load `rubygems.rb`. An example strace:

~~~
open("/home/vagrant/compiled/lib/ruby/site_ruby/2.0.0/rubygems.rb", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/home/vagrant/compiled/lib/ruby/site_ruby/2.0.0/x86_64-linux/rubygems.rb", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/home/vagrant/compiled/lib/ruby/site_ruby/rubygems.rb", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/home/vagrant/compiled/lib/ruby/vendor_ruby/2.0.0/rubygems.rb", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/home/vagrant/compiled/lib/ruby/vendor_ruby/2.0.0/x86_64-linux/rubygems.rb", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/home/vagrant/compiled/lib/ruby/vendor_ruby/rubygems.rb", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/home/vagrant/compiled/lib/ruby/2.0.0/rubygems.rb", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/home/vagrant/compiled/lib/ruby/2.0.0/x86_64-linux/rubygems.rb", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
write(2, "<internal:gem_prelude>:1:in `req"..., 37<internal:gem_prelude>:1:in `require') = 37
write(2, ": ", 2: )                       = 2
write(2, "cannot load such file -- rubygem"..., 36cannot load such file -- rubygems.rb) = 36
~~~

(Notice it is looking in `/home/vagrant/compiled/lib` instead of `/home/vagrant/compiled/ruby/lib`)

### Reproduction

~~~
$ LDFLAGS='-Wl,-rpath=\$$ORIGIN/../lib' ./configure --with-out-ext=tk --with-out-ext=tcl --disable-pthread --enable-load-relative --disable-install-doc --prefix=/home/vagrant/compiled/ruby
$ make
$ make test
$ make install

$ cd /home/vagrant/compiled
$ PATH=":/home/vagrant/appinstall/ruby/bin" ruby
<internal:gem_prelude>:1:in `require': cannot load such file -- rubygems.rb (LoadError)
	from <internal:gem_prelude>:1:in `<compiled>'
~~~

FYI, this bug cannot be reproduced in GDB by pointing it directly at `/home/vagrant/appinstall/ruby/bin/ruby`, because GDB forces `argv[0]` to be the full path. To debug with GDB, you'll need to either patch GDB or use the C program in the `Ongoing Problem` section.

### Patch

The following patch fixes the 99% case:

~~~
diff --git a/dln_find.c b/dln_find.c
index 56a1981..74beddd 100644
--- a/dln_find.c
+++ b/dln_find.c
@@ -278,11 +278,10 @@ dln_find_1(const char *fname, const char *path, char *fbuf, size_t size,
        }
 #endif /* _WIN32 or __EMX__ */

-       if (stat(fbuf, &st) == 0) {
+       if (stat(fbuf, &st) == 0 && !S_ISDIR(st.st_mode)) {
            if (exe_flag == 0) return fbuf;
            /* looking for executable */
-           if (!S_ISDIR(st.st_mode) && eaccess(fbuf, X_OK) == 0)
-               return fbuf;
+           if (eaccess(fbuf, X_OK) == 0) return fbuf;
        }
       next:
        /* if not, and no other alternatives, life is bleak */
~~~

How? `dln_find_file_r` is called by `ruby_init_loadpath_safe` to find where the ruby binary itself is located. However, `dln_find_file_r` calls `dln_find_1`, which in turn can return a directory. This patch changes `dln_find_1` to only ever return a file, never a directory. I couldn't find a case of `dln_find_file_r` being expected to return a directory.
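To make the failure mode concrete, here is a minimal standalone sketch of a PATH search that applies the same rule as the patch: a candidate is rejected if it is a directory. This is not the code from `dln_find.c`; `is_executable_file` and `find_in_path` are hypothetical names, and the sketch uses POSIX `access` where the patch uses `eaccess`. It assumes, as a POSIX shell does, that an empty PATH entry means the current directory, which is why `PATH=":..."` run from `/home/vagrant/compiled` picks up the `ruby` directory there.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if path names an existing, executable non-directory. */
static int is_executable_file(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0) return 0;   /* does not exist */
    if (S_ISDIR(st.st_mode)) return 0;    /* a directory named "ruby" is not the binary */
    return access(path, X_OK) == 0;       /* assumption: access() instead of eaccess() */
}

/* Hypothetical helper: search a colon-separated path list for name.
 * Writes the first match into buf and returns it, or returns NULL. */
static char *find_in_path(const char *name, const char *path,
                          char *buf, size_t size)
{
    const char *p = path;
    assert(buf != NULL && size > 0);
    for (;;) {
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        int n;
        if (len == 0)
            n = snprintf(buf, size, "./%s", name);  /* empty entry: current directory */
        else
            n = snprintf(buf, size, "%.*s/%s", (int)len, p, name);
        if (n > 0 && (size_t)n < size && is_executable_file(buf))
            return buf;
        if (!end) return NULL;                      /* no more entries */
        p = end + 1;
    }
}
```

Without the `S_ISDIR` check in `is_executable_file`, a call such as `find_in_path("ruby", ":/some/bin", buf, sizeof buf)` made from a directory containing a `ruby` subdirectory would return `./ruby`, which matches the wrong prefix seen in the strace.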

### Ongoing Problem

As I mentioned, the patch only fixes the 99% case; the code is inherently broken because it relies on PATH. Take, for example, this C program:

~~~
#include <unistd.h>

int main() {
  char *const argv[] = {"ruby", NULL};

  execve("/home/vagrant/compiled/ruby/bin/ruby", argv, NULL);
  return 0; // not reached
}
~~~

Compile and run it inside of `/home/vagrant/compiled` to reproduce. 

It would probably be better to look at `/proc/self/exe` when possible, but that would be a more serious change.
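As a rough illustration of that idea, the sketch below resolves the running binary through `/proc/self/exe` with `readlink`, independent of `argv[0]` and PATH. It is Linux-specific, `self_exe_path` is a hypothetical helper name, and nothing here is taken from the Ruby source; a real change would need a fallback for platforms without `/proc`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (Linux only): store the absolute path of the
 * running executable in buf. Returns buf on success, or NULL if the
 * link cannot be read or the result may have been truncated. */
static char *self_exe_path(char *buf, size_t size)
{
    ssize_t n;
    assert(buf != NULL && size > 1);
    n = readlink("/proc/self/exe", buf, size - 1);
    if (n < 0) return NULL;                    /* no /proc, or no permission */
    if ((size_t)n >= size - 1) return NULL;    /* readlink truncates silently */
    buf[n] = '\0';                             /* readlink does not NUL-terminate */
    return buf;
}
```

With something along these lines, the C program above would still report the real binary location even though it passes a bare `ruby` as `argv[0]`.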



