Issue #6836 has been updated by h.shirosaki (Hiroshi Shirosaki).


usa (Usaku NAKAMURA) wrote:
>  If the performance problem is in 'require' and 'load', change only them
>  and be stayed File.expand_path the same behavior.
>  Can't do so?
>  

I think that might be possible, though changing that properly seems not easy. 
I created an experimental patch for that. This patch passed webrick test and became faster.

https://gist.github.com/3293339


Changed functions:

load.c:
rb_get_expanded_load_path
rb_feature_provided
rb_feature_p

file.c:
rb_find_file_ext_safe
rb_find_file_safe
----------------------------------------
Bug #6836: Improve File.expand_path performance in Windows
https://bugs.ruby-lang.org/issues/6836#change-28723

Author: luislavena (Luis Lavena)
Status: Assigned
Priority: Normal
Assignee: nobu (Nobuyoshi Nakada)
Category: core
Target version: 2.0.0
ruby -v: ruby 2.0.0dev (2012-08-04 trunk 36616) [i386-mingw32]


=begin
(original write up in markdown here: https://gist.github.com/3242245)

== Background

While investigating the performance issues of (({File.expand_path})) on Windows,
Usaku Nakamura and  Nobuyoshi Nakada on [ruby-core:39504] pointed out that
due security concerns, accessing files on Windows required normalized paths.

This was covered in the security update of March 2008, WEBrick file-access
vulnerability [1].

After closer inspection of WEBrick code (mentioned by the security update),
I noticed it uses (({File.expand_path})) to perform the heavy lifting of path
normalization in the request.

The code around this can be inspected in (({prevent_directory_traversal}))[2]
and (({shift_path_info}))[3] methods.

This approach performs a hit into the filesystem, contrary to its
implementation in any other operating system.

(({File.expand_path})) is heavily used by (({require})), which result in slow
application startup, depending on the application size or number of gems it
depends on.

Stepping back for a second, we can see that the security issue is around
WEBrick and the way it determines (({path_info})) absoluteness.

It is also clear that to solve WEBrick security issue, a tax has been applied
to the entire Ruby ecosystem, penalizing startup performance.

With Hiroshi Shirosaki's help, we worked on a patch that:

* Limit filesystem hit only to WEBrick, using Windows' GetLongPathName [4].
* Use a Windows-specific API to normalize paths
* Improve encoding support.

What started as an experiment named Fenix [5] has shown great results on a
variety of systems.

This patch has been integrated in TheCodeShop [6] releases of Ruby 1.9.3 and
tested by Ruby-Core developers Hiroshi Shirosaki and myself.

== Performance

To demonstrate the benefits of this patch, I've used measurements [7] project
and both (({core_require_empty})) and (({core_require_nested})) workloads,
obtaining the following results:

 ruby 2.0.0dev (2012-08-03 trunk 36611) [i386-mingw32]
 Rehearsal ------------------------------------------------------
 core_require_empty   1.186000   3.385000   4.571000 (  4.676267)
 --------------------------------------------- total: 4.571000sec

                          user     system      total        real
 core_require_empty   1.217000   3.385000   4.602000 (  4.643266)

 Rehearsal -------------------------------------------------------
 core_require_nested   1.514000   3.760000   5.274000 (  5.305303)
 ---------------------------------------------- total: 5.274000sec

                           user     system      total        real
 core_require_nested   1.466000   3.713000   5.179000 (  5.233300)


And with patch applied:

 ruby 2.0.0dev (2012-08-03 trunk 36611) [i386-mingw32]
 Rehearsal ------------------------------------------------------
 core_require_empty   0.765000   1.077000   1.842000 (  1.887603)
 --------------------------------------------- total: 1.842000sec

                          user     system      total        real
 core_require_empty   0.717000   1.123000   1.840000 (  1.887603)

 Rehearsal -------------------------------------------------------
 core_require_nested   0.717000   1.670000   2.387000 (  2.480405)
 ---------------------------------------------- total: 2.387000sec

                           user     system      total        real
 core_require_nested   0.890000   1.528000   2.418000 (  2.496004)


Benchmarks were performed all on the same hardware and OS:

* CPU: Core 2 Duo T7500 @ 2.20Ghz
* RAM: 4GB
* HDD: 1.5GB RAMdisk (ImDisk)
* OS: Windows 7 Ultimate x64

All tests associated (both File and WEBrick ones) pass.

Additional tests that exercise specific aspects of new function were added.

Patch has been tested also against:

* Visual Studio build of Ruby
* Ubuntu 12.04
* Mac OSX

And the patch didn't affect either build or tests of it.

=== Real life impact: Rails

The biggest Ruby project affected by this is Rails applications.

An empty Rails application on startup requires more than 700 files from
different gems:

 V:\emptyapp>ruby script\rails runner -e production "p $LOADED_FEATURES.size"
 772

When benchmark startup using w32time [8]:

 V:\emptyapp>timer ruby script\rails runner -e production "0"
 real    7.285
 system  4.539
 user    2.683

And patched Ruby:

 V:\emptyapp>timer ruby script\rails runner -e production "0"
 real    2.620
 system  0.873
 user    1.700

(best result taken from several warm ups).

Now, a mid-sized application like Enki [9] which loads 1146 files in
production mode, result in:

 V:\enki>timer ruby script\rails runner -e production "p $LOADED_FEATURES.size"
 1146
 real    22.620
 system  11.497
 user    11.076

Almost ((*23 seconds*)), compared to patched version:

 V:\enki>timer ruby script\rails runner -e production "0"
 real    11.013
 system  1.981
 user    8.938

This change also improves performance of (({rake})) inside Rails, from:

 V:\enki>timer rake -T
 ...
 real    8.689
 system  0.015
 user    0.000

To:

 V:\enki>timer rake -T
 ...
 real    3.307
 system  0.000
 user    0.031

Making normal operations more accessible.

Looking forward for your thoughts on these changes.

Thank you.

[1] http://www.ruby-lang.org/en/news/2008/03/03/webrick-file-access-vulnerability/
[2] https://github.com/ruby/ruby/blob/trunk/lib/webrick/httpservlet/filehandler.rb#L242-263
[3] https://github.com/ruby/ruby/blob/trunk/lib/webrick/httpservlet/filehandler.rb#L330-337
[4] http://msdn.microsoft.com/en-us/library/windows/desktop/aa364980.aspx
[5] https://github.com/luislavena/fenix
[6] http://thecodeshop.github.com/
[7] https://github.com/jonforums/measurements
[8] https://github.com/thecodeshop/w32time
[9] https://github.com/xaviershay/enki
=end



-- 
http://bugs.ruby-lang.org/