Issue #16848 has been updated by byroot (Jean Boussier).


To try to move this forward, here's a code snippet showing what I think the interface could look like:

```ruby
class LoaderInterface
  def find_feature(relative_path)
    raise NotImplementedError
  end

  def load_feature(relative_path, absolute_path)
    # This should likely be the default behavior.
    # But defining `load_feature` also allows a loader to generate source code and call `eval`.
    ::Kernel.load(absolute_path)
  end
end

class CacheLoader < LoaderInterface
  def initialize(cache)
    @cache = cache
  end

  def find_feature(relative_path)
    @cache[relative_path] # absolute_path or nil
  end
end

class ZipLoader < LoaderInterface
  def initialize(zip_path)
    @zip_path = zip_path
    @zip = Zip.open(zip_path)
  end

  def find_feature(relative_path)
    if @zip.file?(relative_path)
      "#{@zip_path}:#{relative_path}"
    end
  end

  def load_feature(relative_path, absolute_path)
    # `eval` takes (source, binding, filename), so evaluate at the top level.
    ::Kernel.eval(@zip.read(relative_path), TOPLEVEL_BINDING, absolute_path)
  end
end

LOAD_PATH = [
  '/path/to/lib',
  CacheLoader.new('delegate.rb' => '/opt/rubies/2.7.1/lib/ruby/2.7.0/delegate.rb'),
  ZipLoader.new('/path/to/file.zip'),
]

LOADED_FEATURES = []

def require_file(relative_path)
  LOAD_PATH.each do |path|
    case path
    when String
      # old behavior
      feature_path = File.join(path, relative_path)
      if File.exist?(feature_path)
        ::Kernel.load(feature_path)
        LOADED_FEATURES << feature_path
        return true
      end
    else
      if feature_path = path.find_feature(relative_path)
        path.load_feature(relative_path, feature_path)
        # The loader doesn't have to care about LOADED_FEATURES management
        # if the feature was already loaded it simply won't be called.
        LOADED_FEATURES << feature_path
        return true
      end
    end
  end
  false
end

def require(feature)
  # Insert LOADED_FEATURES check here.
  if feature.end_with?('.rb')
    require_file(feature)
  else
    require_file("#{feature}.rb") || require_file("#{feature}.so")
  end
end
```

In short:

  - In the context of `require "foo/bar"`
  - First we call `find_feature("foo/bar.rb")`. Loaders should return `nil` on a miss, or, on a hit, a `String` that will be inserted into `$LOADED_FEATURES`, e.g. `/absolute/path/to/foo/bar.rb`.
  - If a loader hits, we then call `load_feature("foo/bar.rb", "/absolute/path/to/foo/bar.rb")`.
  - MAYBE: if the loader doesn't respond to `load_feature`, we can fall back to loading the absolute path that was returned.
  - If none of the loaders hit, we do a second pass with `find_feature("foo/bar.so")` (or the platform equivalent).
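
The optional fallback from the "MAYBE" item could be sketched roughly as follows. `dispatch_load` and `FindOnlyLoader` are hypothetical names, and the `fallback:` parameter is only there so the sketch can run without touching the filesystem; none of them are part of the proposal.

```ruby
# Hypothetical helper: use the loader's own `load_feature` when it defines one,
# otherwise fall back to loading the path returned by `find_feature`.
def dispatch_load(loader, relative_path, absolute_path, fallback: ::Kernel.method(:load))
  if loader.respond_to?(:load_feature)
    loader.load_feature(relative_path, absolute_path)
  else
    fallback.call(absolute_path)
  end
end

# A loader that only resolves paths and relies on the fallback for loading.
class FindOnlyLoader
  def initialize(map)
    @map = map
  end

  def find_feature(relative_path)
    @map[relative_path] # absolute path or nil
  end
end

loaded = []
loader = FindOnlyLoader.new('foo/bar.rb' => '/absolute/path/to/foo/bar.rb')
path = loader.find_feature('foo/bar.rb')

# Record the path instead of really loading it, to keep the sketch side-effect free.
dispatch_load(loader, 'foo/bar.rb', path, fallback: ->(absolute_path) { loaded << absolute_path })
```

With this shape, a pure path-mapping loader such as a cache only needs to implement `find_feature`.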


Important parts:

  - I think the `$LOADED_FEATURES` management should remain a responsibility of `Kernel#require`. Loaders shouldn't have to worry about it.
  - The double pass is a bit unfortunate, but required to maintain backward compatibility with the current behavior on `$LOAD_PATH` precedence.
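
To illustrate why the double pass matters for precedence, here is a small in-memory sketch. The entry names and the two resolver functions are hypothetical; the point is only that asking every entry for `.rb` first can give a different answer than asking each entry for both extensions in turn.

```ruby
# Hypothetical load path entries, each listing the files it can provide.
entries = [
  { name: 'first',  files: ['foo.so'] },
  { name: 'second', files: ['foo.rb'] },
]

# One pass over every entry for a single file name.
def find_in_entries(entries, relative_path)
  entry = entries.find { |e| e[:files].include?(relative_path) }
  entry && "#{entry[:name]}/#{relative_path}"
end

# Double pass: every entry is asked for `.rb` before any entry is asked for `.so`.
def resolve_double_pass(entries, feature)
  find_in_entries(entries, "#{feature}.rb") || find_in_entries(entries, "#{feature}.so")
end

# Single pass: each entry is asked for both extensions before moving to the next one.
def resolve_single_pass(entries, feature)
  entries.each do |e|
    %w[.rb .so].each do |ext|
      file = "#{feature}#{ext}"
      return "#{e[:name]}/#{file}" if e[:files].include?(file)
    end
  end
  nil
end

double = resolve_double_pass(entries, 'foo') # resolves in the second entry
single = resolve_single_pass(entries, 'foo') # resolves in the first entry
```

The two strategies disagree as soon as an earlier entry has the `.so` and a later one has the `.rb`, which is why a single pass would change which file gets loaded.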

Any thoughts?

----------------------------------------
Feature #16848: Allow callables in $LOAD_PATH
https://bugs.ruby-lang.org/issues/16848#change-86096

* Author: byroot (Jean Boussier)
* Status: Feedback
* Priority: Normal
----------------------------------------
Make it easier to implement `$LOAD_PATH` caching, and speed up application boot time.

I benchmarked it on Redmine's master using bootsnap with only the optimization enabled:

```ruby
if ENV['CACHE_LOAD_PATH']
  require 'bootsnap'
  Bootsnap.setup(
    cache_dir:            'tmp/cache',
    development_mode:     false,
    load_path_cache:      true,
    autoload_paths_cache: true,
    disable_trace:        false,
    compile_cache_iseq:   true,
    compile_cache_yaml:   false,
  )
end
```

```
$ RAILS_ENV=production time bin/rails runner 'p 1'
        2.66 real         1.99 user         0.66 sys
$ RAILS_ENV=production time bin/rails runner 'p 1'
        2.71 real         1.97 user         0.66 sys
$ CACHE_LOAD_PATH=1 RAILS_ENV=production time bin/rails runner 'p 1'
        1.41 real         1.12 user         0.28 sys
$ CACHE_LOAD_PATH=1 RAILS_ENV=production time bin/rails runner 'p 1'
        1.41 real         1.12 user         0.28 sys
```

That's roughly twice as fast for a relatively small application. And the improvement is not linear; the larger the application, the larger the improvement.

### How it works

`require` has `O($LOAD_PATH.size)` performance. The more gems you add to your `Gemfile`, the larger `$LOAD_PATH` becomes. `require "foo.rb"` will try to open the file in each of the `$LOAD_PATH` entries until it finds it. And since more gems usually also means more `require` calls, the total cost of loading Ruby code can grow up to quadratically with the number of gems.
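
A toy model can make the quadratic growth concrete. This sketch assumes one directory and one file per gem, held in memory, and counts how many entries are probed; it is an illustration, not a measurement of Ruby's actual `require`.

```ruby
require 'set'

# Hypothetical in-memory load path: one directory per gem, one file per gem.
def build_load_path(gem_count)
  (1..gem_count).map { |i| ["/gems/gem#{i}/lib", Set["gem#{i}.rb"]] }
end

# Walk the entries in order, counting every probe until the file is found.
def probes_for(load_path, relative_path)
  probes = 0
  load_path.each do |_dir, files|
    probes += 1
    break if files.include?(relative_path)
  end
  probes
end

# Require one file from each gem and sum the probes.
def total_probes(gem_count)
  load_path = build_load_path(gem_count)
  (1..gem_count).sum { |i| probes_for(load_path, "gem#{i}.rb") }
end

small = total_probes(10)  # 1 + 2 + ... + 10  = 55
large = total_probes(100) # 1 + 2 + ... + 100 = 5050
```

In this model, ten times more gems means roughly a hundred times more probes.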

To improve this, Bootsnap pre-computes a map of all the files in your `$LOAD_PATH`, and uses it to convert relative paths into absolute paths so that Ruby skips the `$LOAD_PATH` traversal.

```ruby
$LOAD_PATH = %w(/gems/foo/lib /gems/bar/lib)

BOOTSNAP_CACHE = {
  "bar.rb" => "/gems/bar/lib/bar.rb",
}
```

This replaces the file lookup with a single hash lookup, and reduces boot cost from roughly `O($LOAD_PATH.size * number_of_files_to_require)` to `O(number_of_files_to_require)`.
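
As a rough sketch of how such a map could be pre-computed (this is not Bootsnap's actual implementation; `build_load_path_cache` is a hypothetical helper and the directories are created in a temporary folder just for the demonstration):

```ruby
require 'tmpdir'
require 'fileutils'

# Map every relative `.rb` path to the first load path entry that provides it.
def build_load_path_cache(load_path)
  cache = {}
  load_path.each do |dir|
    Dir.glob('**/*.rb', base: dir).sort.each do |relative_path|
      # Earlier entries win, matching the order a `$LOAD_PATH` scan would use.
      cache[relative_path] ||= File.join(dir, relative_path)
    end
  end
  cache
end

root = Dir.mktmpdir
foo_lib = File.join(root, 'gems/foo/lib')
bar_lib = File.join(root, 'gems/bar/lib')
FileUtils.mkdir_p([foo_lib, bar_lib])
FileUtils.touch(File.join(foo_lib, 'foo.rb'))
FileUtils.touch(File.join(bar_lib, 'bar.rb'))
FileUtils.touch(File.join(bar_lib, 'foo.rb')) # shadowed by gems/foo/lib

cache = build_load_path_cache([foo_lib, bar_lib])

# The cache only holds strings, so the temporary files can be removed now.
FileUtils.remove_entry(root)
```

Each later `require` then costs one `Hash#[]` call instead of a scan over every directory.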

This optimization is also used in [Gel](https://github.com/gel-rb/gel), a Rubygems/Bundler replacement.

### Trade offs

Every time `$LOAD_PATH` is modified, the cache must be invalidated. While this is complex to do for Bootsnap, it would be fairly easy if it were implemented inside Ruby.

More importantly, you have to invalidate the cache whenever you add a file to, or delete a file from, one of the `$LOAD_PATH` members; otherwise, if you shadow or unshadow another file farther down the `$LOAD_PATH`, Bootsnap will load the wrong file. For instance, if `require "foo.rb"` initially resolves to `/some/gem/foo.rb`, and you create `lib/foo.rb`, you'll need to flush the Bootsnap cache.
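
The shadowing problem can be reproduced with a small in-memory model. The hash standing in for the load path and the `resolve` lambda are assumptions made for the sketch; only the paths come from the example above.

```ruby
# Hypothetical in-memory load path: directory => files it contains, in lookup order.
load_path = {
  'lib'       => [],
  '/some/gem' => ['foo.rb'],
}

# Uncached lookup: the first directory containing the file wins.
resolve = lambda do |relative_path|
  dir, _files = load_path.find { |_dir, files| files.include?(relative_path) }
  dir && File.join(dir, relative_path)
end

# Snapshot taken once, before the application changes.
cache = { 'foo.rb' => resolve.call('foo.rb') }

# A new file now shadows the gem's copy.
load_path['lib'] << 'foo.rb'

fresh = resolve.call('foo.rb') # what an uncached lookup finds now
stale = cache['foo.rb']        # what the un-flushed cache still returns
```

Until the cache is rebuilt, `stale` keeps pointing at the gem's copy even though `lib/foo.rb` should now take precedence.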

The latter is trickier. Bootsnap has decided that it is too rare to cause actual problems, and so far that holds. But that is not a trade-off Ruby can make.

However, that's probably a trade-off Rubygems/Bundler can make. While it's common to edit your gems to debug something, it's really uncommon to add or remove files inside them. So in theory Rubygems/Bundler could compute a map of all requirable files in a gem right after installing it. Then, when a gem is activated, its map is merged with those of the other activated gems.
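
A minimal sketch of that idea, assuming per-gem maps already exist. The gem names, paths, and the `activate` lambda are all hypothetical, and keeping the first mapping on conflict is an assumption about the desired precedence, not existing Rubygems/Bundler behavior.

```ruby
# Hypothetical per-gem maps, as they might be computed once at install time.
gem_file_maps = {
  'foo' => {
    'foo.rb'         => '/gems/foo/lib/foo.rb',
    'foo/version.rb' => '/gems/foo/lib/foo/version.rb',
  },
  'bar' => {
    'bar.rb' => '/gems/bar/lib/bar.rb',
  },
}

# Combined map of every file that activated gems can provide.
activated = {}

# Merge a gem's map on activation; on conflict, keep the mapping that was there first.
activate = lambda do |gem_name|
  activated.merge!(gem_file_maps.fetch(gem_name)) { |_path, existing, _incoming| existing }
end

activate.call('foo')
activate.call('bar')
```

Activation then costs one hash merge per gem, and no directory has to be scanned at boot.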

### Proposal

This could be reasonably easy to implement if `$LOAD_PATH` accepted callables in addition to paths. Something like this:

```ruby
$LOAD_PATH = [
  'my_app/lib',
  BundlerOrRubygems.method(:lookup),
]
```

The contract would be that `BundlerOrRubygems.lookup("some_relative/path.rb")` would return either an absolute path or `nil`. With such an API, it would be easy to cache absolute paths only for gems and the stdlib, and preserve the current cache-less behavior for the application-specific load paths, which are usually much less numerous. It would also allow frameworks such as Rails to implement the same caching for application paths when running in an environment where the source files are immutable (typically production).
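
For illustration, here is how a resolver could treat a mix of directory strings and callables. `resolve_feature`, `gem_index`, and the gem path are hypothetical; this only models the lookup contract described above, not how it would be implemented inside Ruby.

```ruby
# Resolve a relative path against a list mixing directory strings and callables.
def resolve_feature(load_path, relative_path)
  load_path.each do |entry|
    if entry.respond_to?(:call)
      # Callable contract: return an absolute path on a hit, or nil on a miss.
      found = entry.call(relative_path)
      return found if found
    else
      # Plain directory: keep the current, cache-less behavior.
      candidate = File.expand_path(relative_path, entry)
      return candidate if File.file?(candidate)
    end
  end
  nil
end

# Hypothetical pre-computed index standing in for `BundlerOrRubygems.lookup`.
gem_index = { 'some_relative/path.rb' => '/gems/some_gem/lib/some_relative/path.rb' }
lookup = ->(relative_path) { gem_index[relative_path] }

load_path = ['my_app/lib', lookup]
resolved = resolve_feature(load_path, 'some_relative/path.rb')
missing  = resolve_feature(load_path, 'does/not/exist.rb')
```

The application directory is still scanned on every lookup, while everything behind the callable is answered from the index.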


