On 1/5/10, Benjie Chen <benjie / lablife.org> wrote:
> Caleb,
>
> Thanks for continuing with this.
>
>> Try rewriting those 3 lines as:
>>
>>    RToken *token = ALLOC(RToken);
>>    volatile VALUE text = rb_str_new2(tk->text);
>>    token->text = text;
>>    return Data_Wrap_Struct(cToken, &frt_token_mark, &frt_token_free,
>>                            token);
>
> I tried this, but it did not fix the problem. I get the same error,
> "scan" method called on terminated object, and I was calling scan on
> token->text in Ruby land.

Damn. I had my hopes up.

> I believe your analysis is mostly right, leading to this suggestion,
> so I think it's possible that having the volatile VALUE text in the
> stack is right too (although, since Ruby, at least the version I use,
> has cooperative threads, I don't think it would ever interrupt in the
> middle of this particular C extension to run GC; in this C extension
> there's no point where the thread blocks and gives up control of the
> processor).

Ruby's GC doesn't run in a separate thread. AFAIK, it's invoked
whenever the memory manager runs out of room in its current heap and
is about to ask the system for more memory. So any time some kind of
Ruby VALUE gets allocated (by Data_Wrap_Struct, rb_str_new2, ...), the
GC could potentially run, whether or not the thread ever blocks.
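
To see that allocation alone is enough to trigger a collection, here
is a rough sketch in plain Ruby. It assumes a Ruby that has GC.count
(1.9 or later, I believe); the helper name gc_runs_during and the loop
size are just made up for illustration.

```ruby
# Count how many times the GC runs while a block executes.
# There is no explicit GC.start anywhere; any collections that
# happen are triggered purely by object allocation.
def gc_runs_during
  before = GC.count
  yield
  GC.count - before
end

runs = gc_runs_during do
  # Allocate lots of short-lived strings, much like a tokenizer would.
  500_000.times { "token text " + "x" * 50 }
end

puts "GC ran #{runs} time(s) during the loop"
```

No thread ever blocks or yields in that loop, yet the count goes up.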

> However, if GC is called after the procedure returns, therefore the
> stack "VALUE text" is destroyed, and before the result of the
> Data_Wrap_Struct is assigned to anything, then it's possible that GC
> may not know about token yet, and token gets removed. I am not clear,
> w/o looking at the GC code, exactly how the GC works. I can't think of
> a scenario why this is the case though, since Data_Wrap_Struct's
> result, when returned, should be in the caller's stack.

You could try invoking the GC manually at the point where you think
it's causing the problem. Right after the call to rb_str_new2 is where
I would try first. That won't fix a thing, but it may make the problem
easier to reproduce... so you don't have to run a loop 25 million
times.
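
Another way to get a similar effect without touching the C code is
GC.stress, which asks the interpreter to collect at every opportunity.
A minimal sketch, assuming your Ruby supports GC.stress; with_gc_stress
is a hypothetical helper, and the loop body is only a stand-in for
your real tokenizing code:

```ruby
# Run a block with GC.stress enabled, then restore the old setting.
# Under GC.stress the collector runs at (nearly) every allocation,
# so an unprotected VALUE tends to get reaped almost immediately.
def with_gc_stress
  previous = GC.stress
  GC.stress = true
  yield
ensure
  GC.stress = previous
end

# Stand-in workload: replace this with the code that walks the tokens
# and calls scan on each token's text.
texts = with_gc_stress do
  Array.new(50) { |i| "token #{i}".scan(/\w+/) }
end

puts texts.length
```

It is very slow, so keep the loop small, but a few dozen iterations
may be enough where it used to take millions.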

> Right now the only fix I have is to do something like
>
>   VALUE v = Data_Wrap_Struct(...);
>   rb_ivar_set (..., &v);
>   return v;

At least you do have a fix that works...

> This really suggests a couple of things: 1) it's token that gets
> destroyed, and since I always use token->text first, that's why it
> seems like token->text is at fault; 2) after the return
> Data_Wrap_Struct in the original code, GC snuck in and reaped the
> returned value...

This shouldn't be the case, because there should still be a reference
to the result of Data_Wrap_Struct somewhere on the C call stack, and
that would prevent it from being freed.
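
The Ruby-level analogue of that rule is easy to check. This is only an
illustration of "a live reference keeps the object alive", using the
standard weakref library; it says nothing about what your extension's
stack actually looks like at the time of the crash.

```ruby
require 'weakref'

token = Object.new          # strong reference held in a local variable
weak  = WeakRef.new(token)  # a weak reference does not keep it alive

GC.start                    # force a full collection

# The object survives because `token` still refers to it.
puts weak.weakref_alive?
```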