Issue #6836 has been updated by luislavena (Luis Lavena).


usa (Usaku NAKAMURA) wrote:
> I found that this includes a patch to WEBrick.
> So, I guess that this means there is an incompatibility in File.expand_path, doesn't it?
> What is the incompatibility?

WEBrick relies on File.expand_path to resolve the traversal (by expanding) but also by expanding possible short names into long names.

Since this patch no longer hits the filesystem to determine if the path is a real file and expand the shortname into longname, we moved it to WEBrick.

From what I understand from WEBrick test, I'm still not 100% sure about that particular patch, specially since other tools like Rack handle path traversal security without touching the filesystem.

Rack doesn't rely on File.expand_path at all for traversal checks:

https://github.com/rack/rack/pull/373#issuecomment-4684709

And the code:

https://github.com/rack/rack/blob/master/lib/rack/file.rb#L40-67

Our decision to maintain WEBrick tests was under the assumption that the test was doing something other than what Rack is doing here.

Because of that, we decided to only expand the shortnames in WEBrick.

IMO think Rack approach is better, as it doesn't rely on File.expand_path at all, but I could be wrong.
----------------------------------------
Bug #6836: Improve File.expand_path performance in Windows
https://bugs.ruby-lang.org/issues/6836#change-28679

Author: luislavena (Luis Lavena)
Status: Assigned
Priority: Normal
Assignee: nobu (Nobuyoshi Nakada)
Category: core
Target version: 2.0.0
ruby -v: ruby 2.0.0dev (2012-08-04 trunk 36616) [i386-mingw32]


=begin
(original write up in markdown here: https://gist.github.com/3242245)

== Background

While investigating the performance issues of (({File.expand_path})) on Windows,
Usaku Nakamura and  Nobuyoshi Nakada on [ruby-core:39504] pointed out that
due security concerns, accessing files on Windows required normalized paths.

This was covered in the security update of March 2008, WEBrick file-access
vulnerability [1].

After closer inspection of WEBrick code (mentioned by the security update),
I noticed it uses (({File.expand_path})) to perform the heavy lifting of path
normalization in the request.

The code around this can be inspected in (({prevent_directory_traversal}))[2]
and (({shift_path_info}))[3] methods.

This approach performs a hit into the filesystem, contrary to its
implementation in any other operating system.

(({File.expand_path})) is heavily used by (({require})), which result in slow
application startup, depending on the application size or number of gems it
depends on.

Stepping back for a second, we can see that the security issue is around
WEBrick and the way it determines (({path_info})) absoluteness.

It is also clear that to solve WEBrick security issue, a tax has been applied
to the entire Ruby ecosystem, penalizing startup performance.

With Hiroshi Shirosaki's help, we worked on a patch that:

* Limit filesystem hit only to WEBrick, using Windows' GetLongPathName [4].
* Use a Windows-specific API to normalize paths
* Improve encoding support.

What started as an experiment named Fenix [5] has shown great results on a
variety of systems.

This patch has been integrated in TheCodeShop [6] releases of Ruby 1.9.3 and
tested by Ruby-Core developers Hiroshi Shirosaki and myself.

== Performance

To demonstrate the benefits of this patch, I've used measurements [7] project
and both (({core_require_empty})) and (({core_require_nested})) workloads,
obtaining the following results:

 ruby 2.0.0dev (2012-08-03 trunk 36611) [i386-mingw32]
 Rehearsal ------------------------------------------------------
 core_require_empty   1.186000   3.385000   4.571000 (  4.676267)
 --------------------------------------------- total: 4.571000sec

                          user     system      total        real
 core_require_empty   1.217000   3.385000   4.602000 (  4.643266)

 Rehearsal -------------------------------------------------------
 core_require_nested   1.514000   3.760000   5.274000 (  5.305303)
 ---------------------------------------------- total: 5.274000sec

                           user     system      total        real
 core_require_nested   1.466000   3.713000   5.179000 (  5.233300)


And with patch applied:

 ruby 2.0.0dev (2012-08-03 trunk 36611) [i386-mingw32]
 Rehearsal ------------------------------------------------------
 core_require_empty   0.765000   1.077000   1.842000 (  1.887603)
 --------------------------------------------- total: 1.842000sec

                          user     system      total        real
 core_require_empty   0.717000   1.123000   1.840000 (  1.887603)

 Rehearsal -------------------------------------------------------
 core_require_nested   0.717000   1.670000   2.387000 (  2.480405)
 ---------------------------------------------- total: 2.387000sec

                           user     system      total        real
 core_require_nested   0.890000   1.528000   2.418000 (  2.496004)


Benchmarks were performed all on the same hardware and OS:

* CPU: Core 2 Duo T7500 @ 2.20Ghz
* RAM: 4GB
* HDD: 1.5GB RAMdisk (ImDisk)
* OS: Windows 7 Ultimate x64

All tests associated (both File and WEBrick ones) pass.

Additional tests that exercise specific aspects of new function were added.

Patch has been tested also against:

* Visual Studio build of Ruby
* Ubuntu 12.04
* Mac OSX

And the patch didn't affect either build or tests of it.

=== Real life impact: Rails

The biggest Ruby project affected by this is Rails applications.

An empty Rails application on startup requires more than 700 files from
different gems:

 V:\emptyapp>ruby script\rails runner -e production "p $LOADED_FEATURES.size"
 772

When benchmark startup using w32time [8]:

 V:\emptyapp>timer ruby script\rails runner -e production "0"
 real    7.285
 system  4.539
 user    2.683

And patched Ruby:

 V:\emptyapp>timer ruby script\rails runner -e production "0"
 real    2.620
 system  0.873
 user    1.700

(best result taken from several warm ups).

Now, a mid-sized application like Enki [9] which loads 1146 files in
production mode, result in:

 V:\enki>timer ruby script\rails runner -e production "p $LOADED_FEATURES.size"
 1146
 real    22.620
 system  11.497
 user    11.076

Almost ((*23 seconds*)), compared to patched version:

 V:\enki>timer ruby script\rails runner -e production "0"
 real    11.013
 system  1.981
 user    8.938

This change also improves performance of (({rake})) inside Rails, from:

 V:\enki>timer rake -T
 ...
 real    8.689
 system  0.015
 user    0.000

To:

 V:\enki>timer rake -T
 ...
 real    3.307
 system  0.000
 user    0.031

Making normal operations more accessible.

Looking forward for your thoughts on these changes.

Thank you.

[1] http://www.ruby-lang.org/en/news/2008/03/03/webrick-file-access-vulnerability/
[2] https://github.com/ruby/ruby/blob/trunk/lib/webrick/httpservlet/filehandler.rb#L242-263
[3] https://github.com/ruby/ruby/blob/trunk/lib/webrick/httpservlet/filehandler.rb#L330-337
[4] http://msdn.microsoft.com/en-us/library/windows/desktop/aa364980.aspx
[5] https://github.com/luislavena/fenix
[6] http://thecodeshop.github.com/
[7] https://github.com/jonforums/measurements
[8] https://github.com/thecodeshop/w32time
[9] https://github.com/xaviershay/enki
=end



-- 
http://bugs.ruby-lang.org/