On 1/4/10, Benjie Chen <benjie / lablife.org> wrote:
> Somewhere in Ferret's C code, I am returning a "Token" to Ruby land.
> The code looks like
>
> static VALUE get_token (...)
> {
>   ...
>   RToken *token = ALLOC(RToken);
>   token->text = rb_str_new2("some text");
>   return Data_Wrap_Struct(..., &frt_token_mark, &frt_token_free, token);
> }
>
> frt_token_mark calls rb_gc_mark(token->text), and frt_token_free
> just frees the token with free(token).
>
> In Ruby, this code corresponds to the following:
>
>   token = @input.next
>
> Basically, @input is set to some object; calling the next method on it
> triggers the get_token C call, which returns a token object.
>
> In Ruby land, I then do something like  w = token.text.scan(/\w+/)

What if you change this to:
  text = token.text
  w = text.scan(/\w+/)

It's that text object that your error message is complaining about;
what if we make an explicit reference to it on the ruby side?

Or maybe token.text is being freed between when the token is
created and when it's assigned to a ruby variable? During that brief
window there are no ruby-land references to the text or to the token
that refers to it, so if the garbage collector happens to run then...
but presumably there is a reference to the token somewhere on the C
call stack, so that shouldn't be an issue.

> When I run this code inside a while 1 loop (to isolate my problem), at
> some point (roughly when my ruby process mem footprint goes to 256MB,
> probably some GC threshold), Ruby dies with errors like
>
>   scan method called on terminated object
>
> Or just core dumps. My guess was that token.text was garbage collected.

Weird. At first glance, the code you cite seems to be doing
everything right.

> I don't know enough about Ruby C extensions to know what happens with
> Data_Wrap_Struct returned objects. It seems to me that the assignment
> in Ruby land, token =, should create a reference to it.

To the token, yes... but the warning was specifically about the text
field in token.

> My "work-around"/"fix" is to create a Ruby instance variable in the
> object referred to by @input, and store the token text in there, to
> get an extra reference to it. So the C code looks like

That shouldn't be necessary; the mark routine you showed should be
enough to keep ruby's gc informed about the liveness of your
token.texts. This is a hack rather than a proper fix; it may well be a
good interim measure, but something deeper is wrong and needs to be
understood.

> My question is: why did it not work before? Shouldn't
> Data_Wrap_Struct return an object that, when assigned in Ruby land,
> has a valid reference and is not collected by Ruby?

Maybe this never really worked... at least not 100% of the time.
Ferret has its bugs.