Hi,

I don't know whether any of us has encountered this kind of problem before
when developing Ruby C extensions.  The problem is the interaction between
memory allocation, the garbage collector (gc), and the mark
functions.  I think it can occur only when a struct/class contains
another struct/class.

This problem shows up only rarely.  In my case, however, code that had
been running for a while suddenly gave a segmentation fault, which gave me
a great headache.

The problem is like this.  I have an outer structure, such as

    typedef struct
    {
        void  *data1;
        VALUE data2;
    } sOuter;

which is wrapped in an object of an outer class, such as

    obj = Data_Make_Struct (cOuter, sOuter, mark_Outer, free_Outer, ptr);

After creating the outer object, I create an inner object:

    ....
    ptr->data2 = Data_Make_Struct (cInner, sInner, mark_Inner, free, p);
    ....

which corresponds to an inner structure, such as

    typedef struct
    {
        int   data3;
        VALUE data4;
    } sInner;

But this is exactly the problem!  After debugging my code, my conclusion
is that because Data_Make_Struct implicitly calls Ruby's ALLOC, which may
result in the invocation of the gc, in the lines above, THE MARK FUNCTIONS
MAY BE CALLED BEFORE THE STRUCT/OBJECT ITSELF IS SET UP PROPERLY, resulting
in a segmentation fault.  I had been rather careful, checking in the mark
functions whether the pointers are NULL (and initializing them to NULL),
but I don't think that is a foolproof way.

In the problem above, it just happened that after thousands of
iterations, the gc was invoked when Data_Make_Struct (cInner...) was
called.  The gc then called rb_gc_mark on data2, which at that point
still did not hold a valid object.

Right now, my solution is simply to add the Qnil assignment before the call:

    ....
    ptr->data2 = Qnil;
    ptr->data2 = Data_Make_Struct (cInner...);
    ....
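
The race and the Qnil fix can be modelled in plain C without Ruby at all.
The sketch below is only a toy: every toy_* name is made up for
illustration and is not part of Ruby's C API, integer handles stand in for
VALUEs, and TOY_GARBAGE stands in for whatever a slot holds before it is
assigned.  It assumes, as described above, that an allocating call may run
the gc before the caller has filled in the slot.

```c
#include <assert.h>

/* Toy model only: none of these names belong to Ruby's C API. */
#define TOY_NIL     (-1)      /* stands in for Qnil: always safe to mark */
#define TOY_GARBAGE (-12345)  /* stands in for an unassigned slot */
#define TOY_MAX     8

typedef struct {
    int child;                /* handle of a child object, like data2 */
} toy_obj;

static toy_obj toy_heap[TOY_MAX];
static int toy_count;         /* number of objects known to the toy gc */
static int toy_bad_marks;     /* times the gc saw an invalid handle */

/* Mark phase: visit every known object and inspect its child slot. */
static void toy_gc(void)
{
    for (int i = 0; i < toy_count; i++) {
        int c = toy_heap[i].child;
        if (c == TOY_NIL)
            continue;                 /* nothing to mark */
        if (c < 0 || c >= toy_count)
            toy_bad_marks++;          /* real code could crash here */
    }
}

/* Allocate an object; optionally run the gc first, before the new
 * object exists and before the caller can store its handle anywhere. */
static int toy_new(int run_gc)
{
    assert(toy_count < TOY_MAX);
    if (run_gc)
        toy_gc();
    int h = toy_count++;
    toy_heap[h].child = TOY_GARBAGE;  /* caller has not set the slot yet */
    return h;
}

/* Build an outer object, then an inner one.  If init_slot_first is
 * nonzero, the outer slot is set to TOY_NIL before the second
 * allocation.  Returns the number of bad marks the gc observed. */
int toy_build(int init_slot_first)
{
    toy_count = 0;
    toy_bad_marks = 0;

    int outer = toy_new(0);
    if (init_slot_first)
        toy_heap[outer].child = TOY_NIL;

    int inner = toy_new(1);           /* gc runs while outer's slot is pending */
    toy_heap[inner].child = TOY_NIL;
    toy_heap[outer].child = inner;
    return toy_bad_marks;
}
```

In this model, toy_build (0) reports one bad mark, which corresponds to the
crash described above, and toy_build (1) reports none, which is the effect
of the Qnil line.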

I don't know whether this is really all there is to it.  At least, to
people who are not aware of the problem, the code may look funny, because
the Qnil assignment seems to be just a waste.  To me, rather than dealing
with these intricacies in a much more complex data structure, it is
probably better just to follow these simple principles:

    1) NEVER use Data_Make_Struct.  Use Data_Wrap_Struct instead.
    2) NEVER use Ruby's ALLOC.  Use, at least, C's malloc instead.

Regarding point 1), because, as far as I can tell, Data_Wrap_Struct will
not invoke the gc, and the struct can be set up completely before it is
wrapped, it is safer.  Regarding point 2), Ruby's ALLOC may have the same
problem as Data_Make_Struct, i.e., we never know whether the gc will be
invoked at that point or not.  Because the corresponding free function is
usually C's standard free () function anyway (there is no Ruby FREE
macro), it is probably more consistent to use malloc instead of
ALLOC.  When we use malloc, we know that the gc will not be invoked and we
can proceed in C as usual, without worrying about gc triggers and
object states.

Finally, probably it is a good idea to remove Data_Make_Struct and ALLOC
from the Ruby C API.  We will not lose any functionality while making the
C extensions safer.

Regards,

Bill