Issue #3924 has been updated by Xavier Shay.


Progress update time!

tl;dr - I've made the performance near-linear, but still need to do a bit more clean up.

To fix this problem, I have started replacing the current loop over $LOADED_FEATURES with a hash lookup (using st.h). You can try it now by using `require_2` rather than `require` on my fork (or by changing the mapping of require at the bottom of `load.c`):
https://github.com/xaviershay/ruby/compare/require-performance-fix
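
As a rough illustration of the idea only (this is not the C code in `load.c`, and the real feature matching also has to deal with load paths and file extensions), here is a minimal Ruby sketch. The names `features`, `index`, `loaded_by_scan?` and `loaded_by_index?` are made up for the example:

```ruby
require "set"

# Hypothetical stand-in for $LOADED_FEATURES: a plain array of loaded paths.
features = (1..5).map { |i| "/lib/feature_#{i}.rb" }

# Simplified view of the old approach: every check walks the whole array,
# so N requires cost on the order of N * N comparisons in total.
def loaded_by_scan?(features, path)
  features.any? { |f| f == path }
end

# Simplified view of the new approach: a hash-backed index kept next to the
# array, so each check is one lookup however many features are loaded.
index = Set.new(features)

def loaded_by_index?(index, path)
  index.include?(path)
end

puts loaded_by_scan?(features, "/lib/feature_3.rb")
puts loaded_by_index?(index, "/lib/feature_3.rb")
```

Both checks give the same answer; only the cost per check differs as the list grows.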

This fork passes all ruby tests and rubyspec require tests. It can also load a rails stack. That doesn't mean it is correct (the tests are not comprehensive), but it's a start.

It displays near-linear performance. Here are some graphs:
versus 1.9 - https://skitch.com/xaviershay/r9pwb/tc-load-time
versus 1.8 - https://skitch.com/xaviershay/r9pwn/tc-load-time

This is a pretty big patch, and also my first on MRI so there are a few potential issues:
* This is pretty much a rewrite of load.c, since I couldn't get my head around `rb_feature_p`. That means it's risky. It is likely possible to add a hash-table lookup to the existing architecture, but someone else would have to volunteer to do this. On the up side, it is far easier to follow the algorithm in the new code.
* It doesn't use the file search functions in `file.c`, so I suspect there could be some safe level issues not covered yet.
* I have made public one or two functions in `file.c` and `array.c`. Pretty sure I could rewrite what I have to avoid doing this.
* `rb_provide` no longer makes sense. It was being used in `enumerator.c` but I found another method of providing backwards compatibility. Since it is in `intern.h`, it can maybe just be removed?
* I use a proxy class for $LOADED_FEATURES to keep the loaded features hash up to date. Bit of a hack but I couldn't come up with a nicer way of doing it.
* Is it OK to add a new member to the VM struct? I added new_loaded_features (to be renamed shortly).
* Not tested on Windows yet.
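
To give a feel for the proxy idea mentioned in the list above, here is a minimal Ruby sketch. It assumes an `Array` subclass that mirrors writes into a hash; the class name `FeatureList` and the method `loaded?` are hypothetical, and the actual patch does this in C and may work quite differently:

```ruby
# Hypothetical proxy: behaves like an Array but keeps a Hash index in sync.
# Only <<, push and delete are intercepted here; other mutators (concat,
# clear, replace, ...) would need the same treatment in a complete version.
class FeatureList < Array
  def initialize(*args)
    super
    @index = {}
    each { |feature| @index[feature] = true }
  end

  def <<(feature)
    @index[feature] = true
    super
  end

  def push(*features)
    features.each { |feature| @index[feature] = true }
    super
  end

  def delete(feature)
    @index.delete(feature)
    super
  end

  # Constant-time membership check backed by the hash index.
  def loaded?(feature)
    @index.key?(feature)
  end
end

list = FeatureList.new
list << "a.rb"
list.push("b.rb", "c.rb")
list.delete("b.rb")
puts list.loaded?("a.rb")
puts list.loaded?("b.rb")
```

The awkward part, and why it feels like a hack, is that every mutating method has to be covered or the array and the hash can drift apart.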

Outstanding tasks still on my list (hopefully will be able to do in the next week):
* Autoload still uses the old `rb_feature_provided`.
* It is much faster on synthetic benchmarks, but still slower in many real world cases. So far I have only optimized the general algorithm, and I am still using a lot of `rb_funcall`s that are not necessary.
* I haven't marked all my methods as static yet.
* Remove old require code once I'm satisfied the new code works.
* Clean up the vm->new_loaded_features stuff.

Re the original ticket description, I saw the `lstat` calls in profiling too, but I believe they are a symptom, not a cause.
----------------------------------------
Bug #3924: Performance bug (in require?)
http://redmine.ruby-lang.org/issues/3924

Author: Carsten Bormann
Status: Open
Priority: Normal
Assignee: 
Category: core
Target version: 1.9.3
ruby -v: ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10] 


=begin
 Running irb < /dev/null in 1.9.2 causes 3016 calls to lstat64.
 
 For instance, there is a sequence of 28 repetitions each of lstat calls to all 6 non-empty path prefixes of /opt/local/lib/ruby1.9/1.9.1/irb.rb -- a total of 170 lstats apparently just to load this file; another set of lstats then occurs later for another 18 (times 6) times.  Clearly, something is running amok in the calling sequence rb_require_safe -> realpath_rec -> lstat.
 
 Another example: Running a simple test with the baretest gem causes 17008 calls to lstat.  According to perftools.rb, 80 % of the 1.2 seconds of CPU is used in Kernel#gem_original_require (and another 12 in GC, some of which may be caused by this).
=end
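
The prefix count in the description above can be reproduced with a short Ruby sketch. It assumes one `lstat` per non-empty path prefix each time the path is resolved; the variable names are made up for the example:

```ruby
path = "/opt/local/lib/ruby1.9/1.9.1/irb.rb"

# Split into components and rebuild every non-empty prefix of the path.
parts = path.split("/").reject(&:empty?)
prefixes = (1..parts.size).map { |n| "/" + parts.first(n).join("/") }

# Assumption: each resolution of the path stats every prefix once.
repetitions = 28
total = repetitions * prefixes.size

puts prefixes
puts total
```

With 6 prefixes and 28 repetitions this gives 168, in line with the roughly 170 calls reported.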



-- 
http://redmine.ruby-lang.org