------=_Part_47954_20604676.1229970745928
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

2008/12/22 Charles Oliver Nutter <charles.nutter / sun.com>:
> Robert Klemme wrote:

>> I don't think there should be locking involved for "load" as that
>> loads a file unconditionally. When using "load" one must be aware of
>> the fact that the file can be loaded multiple times - including
>> concurrently.
>
> I tend to agree. Load explicitly bypasses things like $" population so you
> can load the same file repeatedly if you want to. So I think load should be
> the lock-free low-level "always execute target file" version. And in truth,
> require is largely implemented in terms of load.

That's always how I looked at it: "load" is the low level
functionality of reading a file of ruby code and executing it.
"require" adds the one off check plus the file name, $: and shared lib
magic.

>> For "require" the story looks a bit different.  If I am not mistaken
>> you would need a reentrant lock per (case insensitive? - probably
>> depending on file system) string after "require" as the smallest
>> granularity.  Hm...  The longer I think about it the more I believe
>> that a single reentrant lock for "require" is probably the most robust
>> solution...
>
> I think it has to be a single lock or nothing, due to the deadlock
> possibilities of a lock-per-resources.

Unfortunately there is also a deadlock possibility even with a single
lock.  Assuming you have a large application and want to parallelize
initialization because that might be IO intensive and your app will
start faster by doing this.  If you do something like this:

$ cat large_app.rb

%w{a b}.map do |fn|
  Thread.new(fn){|file| require "large_app/#{file}"}
end.each do |th|
  th.join
end

you have created a deadlock.  And you would not get a deadlock with a
lock by resource implementation.  Granted, the deadlock for the lock
by resource approach is more likely as this case is probably a bit
esoteric.  The bad news is that it does work with the current 1.8.7
interpreter, i.e. you do not get a deadlock.  Output from the test
attached:

robert@fussel /tmp
$ ruby -r large_app -e 'puts 1'
>large_app
>a
<a
>b
<b
<large_app
1

robert@fussel /tmp
$

> So I'll summarize my position here:
>
> - requires cannot safely be done concurrently because they will cause
> resources (classes, modules) to be accessed before they are completely
> loaded and initialized.
> - The simplest solution for this is to not require resources concurrently,
> which is certainly a valid solution which requires no locks and no
> behavioral changes.

But has the drawback of a missing safety net.

> - A lock per resources or per name is broken, due to deadlock potential, and
> so is not the right solution
> - A single global lock would fix require concurrency (and almost be enough
> to fix autoload concurrency) but would introduce the behavioral change that
> only one require can execute at a time.

No, it would mean, only one thread can execute require at a time.  The
lock would of course have to be reentrant so your required files can
require more.  Otherwise it would be useless.

Plus it would also have deadlock potential.  Even worse, this type of
deadlock is probably harder to (automatically) detect because it
involves locks of two different types. This is a bit similar to
deadlocks we once had in an application which could be seen at the
database level but the DB did not recognize them because application
level locks were also involved creating a deadlock which was not
resolved by the DB's deadlock detection.  Nasty debugging session that
was...

I see these additional solutions

- Disallow thread creation from a thread that is currently holding the
require lock.

- Mark threads created from a thread currently holding the require
lock and disallow further require executions from there (at least as
long as the parent thread holds the require lock).

Both these would at least avoid the deadlock I have constructed above
as far as I can see.  The first one might be simpler to implement, the
last one probably allows for more freedom in required code. It could
even be essential for some applications to start a thread during
requiring (just think about a pool cleanout thread or something
similar).

> - I'm not strongly leaning toward either "no fix" or a global require lock.
> It doesn't bother me terrible that require isn't thread-safe, since the
> simple fix of "don't do that" solve the problem without complication.
>
>> While we're at it: what happens to "require" in light of changes to $:
>> and current directory?
>
> I don't think it would make any difference; both just change how it searches
> for files, and should not affect concurrency characteristics.

Well, but what about

th1: $:unshift "bar"
th1: require a # loads "bar/a" as intended
th2: $:unshift "foo"
th1: require b # now reads "foo/b" instead of "bar/b"

Again, practically this might not be a problem but any time someone
has the idea to parallelize app initialization (for example, because
he has native threads now and 4 CPU's at hand) and common names are
used the effect of allowing concurrent changes to $: and Dir.pwd might
be dramatic - and hard to debug.

I'm all for pragmatic solutions but with such a fundamental feature
like the language's library loading I have a bad feeling that this
will haunt us one way or the other if it is not made robust.

Thoughts?

robert

-- 
remember.guy do |as, often| as.you_can - without end

------=_Part_47954_20604676.1229970745928
Content-Type: application/x-gzip; name=large_app.tgz
Content-Transfer-Encoding: base64
X-Attachment-Id: f_fp1fiq7k0
Content-Disposition: attachment; filename=large_app.tgz

H4sIAIBwb0UAA+3VwW6CMBwGcM59in9YlmxZwihQubi9wK7elzKrsGhhtYaD+O4rRMWLWbIE5uL3
iwmGkraxfJ8raZbqXVZVYDJvIKEzSZL2ylORdlf36e63Ip56nPMoTgUXEzfuvoYTj8KhNnRuu7HS
EHmmzJSxl597U4UeYz8jY9XWbsh/XR3fA5+x+3onKdsHa1nRvKRmoRtGNMuNkvNAq/phoR93zaJY
qYaM+toWRpF/muD5btcO7f09U3oeKPmRd7PYvJ3F5sFnWeh2iB3Wnp6t/dc/x83pj224NdqMp0Jc
zL9zyn8SRy7/7rHQIzHclno3nv/+/OVg/wA/9T8Pk/783bvg+l+EMfp/DMf+lz6z9EKzYq0CXdb0
RDGrc9fj/a0pWUbcjURdex/KW6K0/7M+/9l15L/r/0hwjvyP4Zj/7Lf5z5B/AAAAAAAAAAAAAAAA
gGvxDWJopKIAKAAA
------=_Part_47954_20604676.1229970745928--