Hi

I am working my way through the Ferret (Ruby port of Lucene) code to
fix a bug. Ferret is mainly a C extension to Ruby. I am running into
some issues with the garbage collector. I managed to fix them, but I
don't completely understand my fix =) I am hoping someone with deeper
knowledge of Ruby and C extensions (this is my 3rd day with Ruby) can
elaborate. Thanks.

Here is the situation:

Somewhere in the Ferret C code, I am returning a "Token" to Ruby land.
The code looks like:

static VALUE get_token (...)
{
  ...
  RToken *token = ALLOC(RToken);
  token->text = rb_str_new2("some text");
  return Data_Wrap_Struct(..., &frt_token_mark, &frt_token_free, token);
}

frt_token_mark calls rb_gc_mark(token->text), and frt_token_free
just frees the token with free(token).

In Ruby, this code corresponds to the following:

  token = @input.next

Basically, @input is set to some object; calling the next method on it
triggers the get_token C call, which returns a token object.

In Ruby land, I then do something like   w = token.text.scan('\w+')
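
One thing that may be worth double-checking in that call: as far as I
know, String#scan treats a String pattern literally, so '\w+' searches
for the three characters \w+ rather than for runs of word characters.
A small sketch of the difference, using a made-up sample string (not
taken from Ferret):

```ruby
# Sample text is an assumption for illustration only.
text = "some text"

# String pattern: matched literally, i.e. backslash, "w", "+".
literal = text.scan('\w+')

# Regexp pattern: matches runs of word characters.
words = text.scan(/\w+/)

# A string that really contains the characters \w+ does match the
# String pattern.
literal_hit = 'a\w+b'.scan('\w+')
```

This is unrelated to the crash itself, since the GC problem shows up
whichever pattern is passed.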

When I run this code inside a while 1 loop (to isolate my problem), at
some point (roughly when my Ruby process's memory footprint reaches
256MB, probably some GC threshold), Ruby dies with errors like

  scan method called on terminated object

Or it just dumps core. My guess is that token.text was garbage collected.
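
A minimal sketch of that kind of loop is below. Token and FakeInput
are hypothetical pure-Ruby stand-ins for the objects that the Ferret C
extension provides, so this version will not crash; it only shows the
shape of the test. Setting GC.stress = true is an assumption on top of
what is described above: it asks Ruby to run the GC at every
opportunity, which might surface this kind of problem much sooner than
waiting for the footprint to grow. The loop is bounded here so the
sketch terminates.

```ruby
# Hypothetical stand-in for the Token object returned by the C code.
Token = Struct.new(:text)

# Hypothetical stand-in for whatever @input refers to.
class FakeInput
  def initialize(texts)
    @texts = texts
    @pos = 0
  end

  # Mimics @input.next: returns a fresh Token on every call,
  # cycling through the given texts.
  def next
    tok = Token.new(@texts[@pos % @texts.size].dup)
    @pos += 1
    tok
  end
end

# Pulls tokens in a loop while the GC is forced to run as often as
# possible, collecting the scanned words.
def stress_tokens(input, iterations)
  words = []
  old_stress = GC.stress
  GC.stress = true
  begin
    iterations.times do
      token = input.next
      words.concat(token.text.scan(/\w+/))
    end
  ensure
    GC.stress = old_stress   # restore the previous setting
  end
  words
end

input = FakeInput.new(["some text", "more text"])
result = stress_tokens(input, 4)
```

With the real extension loaded, the same loop would call the actual
@input.next in place of FakeInput#next.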

I don't know enough about Ruby C extensions to know what happens with
objects returned by Data_Wrap_Struct. It seems to me that the
assignment in Ruby land, token =, should create a reference to it.

My "work-around"/"fix" is to create a Ruby instance variable in the
object referred to by @input and store the token text in there, to
get an extra reference to it. So the C code looks like:

    RToken *token = ALLOC(RToken);
    token->text = rb_str_new2(tk->text);
    /* added code: prevent garbage collection */
    rb_ivar_set(input, id_curtoken, token->text);
    return Data_Wrap_Struct(cToken, &frt_token_mark, &frt_token_free, token);

So now I've created a "curtoken" instance variable on the input
object and saved a reference to the text there... I've taken care to
remove this reference in the free callback of the class for @input.

With this code, it works in that I no longer get the terminated object error.

The fix seems to make sense to me -- it keeps an extra ref in curtoken
to the token.text string so an instance of token.text won't be removed
until the next time @input.next is called (at which time a different
token.text replaces the old value in curtoken).

My question is: why did it not work before? Shouldn't
Data_Wrap_Struct return an object that, when assigned in Ruby land,
holds a valid reference and is not removed by Ruby?

Thanks,
Benjie