[Synopsis: a C library I work with uses opaque integer handles to refer
to internal toolkit objects. It also requires correct deallocation
order for some of the objects. I can write a wrapper layer for the C
implementation of Python to have it do the correct automatic garbage
collection, but can't figure out how to use Ruby for the same task,
because finalization order isn't guaranteed and because it assumes C
extension types are always accessed through pointers.]

Hello,

I posted this on c.l.py but Matz asked that I repost it here. This is
the more appropriate group, but I'm a long time Python developer and
c.l.py people nearly always do a good job of describing the pros and
cons of different languages. Plus, posting here meant I needed to
reread the FAQ and the back newsgroup postings.

The topic was on C/C++ integration. From my admittedly poor
understanding of Ruby, I don't follow how I could use it for a system
I worked on called PyDaylight.

The Daylight toolkit is a library for chemical informatics. It
contains data types like "molecule", "atom", "bond", "pattern" and
"reaction." It is written in C but exposes a consistent API for both
C and Fortran programmers. This API uses opaque object handles to
refer to internal objects. These are represented as integers -
starting with 1 - because Fortran doesn't have a pointer data type.
The internal data model is object oriented, but it is hidden behind
that API.

For example, using the SWIG'ged Python interface to the C code (this
is from memory, commentary on the right):

    >>> from dayswig_python import *
    >>> dt_smilin("CO")         # Create a molecule
    1                           # the molecule handle is 1
    >>> dt_typename(1)          # Get the toolkit's name for this type
    'molecule'
    >>> dt_stream(1, TYP_ATOM)  # Create an iterator over the atoms
    2                           # this is a new object
    >>> dt_next(2)              # Get the first object in the iterator
    3                           # another new object
    >>> dt_typename(3)          # What is it?
    'atom'                      # an atom
    >>> dt_symbol(3)            # What kind of atom is it?
    'C'                         # carbon
    >>> dt_next(2)              # Next atom
    4
    >>> dt_symbol(4)
    'O'                         # is an oxygen
    >>> dt_next(2)              # Next atom?
    0                           # nope, finished with the iteration
    >>> dt_dealloc(2)           # Remove the iterator
    1
    >>> dt_dealloc(1)           # Remove the molecule
    1
    >>>

The interface layer I wrote in Python hides this low-level interface
to allow the following:

    >>> from daylight import Smiles
    >>> for atom in Smiles.smilin("CO"):
    ...     print atom.typename, atom.symbol
    ...
    atom C
    atom O
    >>>

It does this by:

 - wrapping the integer handle inside of a class instance, as in

       class dayobject:
           def __init__(self, handle):
               self.handle = handle
           def __int__(self):
               return int(self.handle)
           ...

       class Atom(dayobject):
           ...

       atom = Atom(3)  # where 3 is an atom handle

   The __int__ method allows a dayobject instance to be coerced into
   the value expected by the SWIG interface. The int(self.handle) is
   needed for reasons discussed below.

 - converting attribute lookup to function calls via a getattr hook,
   which lets me do 'atom.symbol' instead of 'dt_symbol(atom.handle)'.
   (In Python, the getattr hook lets the instance define how to
   resolve attribute lookups if the attribute isn't otherwise found.
   Ruby has a similar method, if my memory serves me correctly.)

 - converting the toolkit iterator model to a Python one, either by
   direct conversion to a list (eg, bond.atoms returns a list with
   two Atom instances) or through a lazy interface (eg, an iterator
   through all the compounds in a database). I know Ruby does
   iterators well.

 - doing the appropriate garbage collection.

I'm not sure how well Ruby handles this last part. Let me explain in
even more detail.

The lifetime of a toolkit object may be dependent on another object
(its parent). For example, the lifetime of an atom is dependent on
the molecule. If the molecule is deallocated, then all of its atoms
are deallocated. (If an atom is deallocated, it is deleted from the
molecule, but the molecule persists.)

The lifetime doesn't even always depend on the object type.
For example, the molecule may be created on its own, or may be part
of a "reaction" data type. (In the reaction "[OH-] + [H+] -> [H2O]"
there are three molecules.)

The only place which knows the lifetime of the object is the function
used to create it. By reading the documentation and experimenting I
found which functions create objects that the caller must deallocate.
Each of these returns an integer handle which I wrap with a new
object, something like:

    class smart_ptr:
        def __init__(self, handle):
            self.handle = handle
        def __int__(self):
            return self.handle
        def __del__(self):
            dayswig_python.dt_dealloc(self.handle)

Here's where you see why dayobject's __int__ calls int(self.handle) -
if it's a smart_ptr, it still needs to be converted into an integer.

Consider the molecule. If I create a molecule from scratch then I
return a Molecule wrapping a smart_ptr wrapping the handle:

    def smilin(smiles):
        mol_handle = dayswig_python.dt_smilin(smiles)
        return Molecule(smart_ptr(mol_handle))

If instead I return a molecule which is a component of a reaction I
use:

    mol_handle = ...  # code not shown because it's too complicated
                      # and irrelevant to this discussion
    return Molecule(mol_handle)

This approach works because of the __del__ method, which is how
Python does finalization. In the C implementation, it is called when
the reference count goes to 0. (It is not called when the garbage
collector finds and removes non-accessible cycles.) This lets me use
Python's garbage collector for all toolkit objects.

Things get trickier. Some objects manage their own lifetime but are
also dependent on another object. One such is the 'MatchObject' used
for substructure searches. If the molecule is deleted, the toolkit
invalidates all of the handles used in any MatchObject related to the
molecule. Consider

    class MatchObject(daylight.dayobject):
        def __init__(self, mol, smarts, match_handle, flags):
            daylight.dayobject.__init__(self, match_handle)
            self.mol = mol
            ...
        def __del__(self):
            del self.handle
            del self.mol

This saves both the handle for the MatchObject ('match_handle') *and*
a reference to the molecule ('mol'). Because it keeps a reference to
the molecule, that molecule will not be garbage collected until all
of the MatchObjects are also removed. From tests, this is the
expected behaviour.

But notice that the finalizer is careful about the order in which
objects are deleted. The match object is removed before deleting the
molecule. If the order were reversed, the ref count for the molecule
would go to 0 first, so the molecule would get dt_dealloc'ed by its
smart_ptr's __del__. The toolkit would then invalidate all of the
match objects associated with the molecule. Python doesn't know the
match object handles are invalid. When the 'del self.handle' occurs,
the smart_ptr for the match object calls its __del__, which tries to
deallocate the invalidated match object. This is not allowed by the
toolkit, although thankfully it returns an error message rather than
core dumping.

By deleting the objects in the correct order I ensure the library
calls are done as needed by the toolkit.

I understand that Ruby also has a way to do finalization for an
instance, but I'm concerned about several things.

1. The size of a toolkit object can be large - up to 64K atoms, which
   is a couple of MB. Because Ruby doesn't know anything about that
   memory, how does it know when to do garbage collection? Under the
   C implementation of Python, this object is gc'ed when it is no
   longer needed - when the ref count hits 0. Ruby can't be subtle
   and search memory for pointer-like values because the toolkit
   stores a table mapping the integer handles to the internal
   pointer.

2. The Ruby way to do finalization seems to be with the
   'ObjectSpace.define_finalizer' method, which associates a
   finalizer with each object instance. Does that mean each instance
   I create needs to be registered? (As compared to Python where I
   define the finalization in the class definition - not with each
   instance.)

3. That lets me implement the 'smart_ptr' behaviour, but I still
   don't understand how to define finalization relationships between
   two Ruby objects, so that I can guarantee one object is removed
   before the other. In pure Ruby code this isn't needed, but I need
   to match the semantics expected by the C library. In Python I did
   it by defining the order in the __del__ (specifically in
   MatchObject.__del__).

I cannot find a pure Ruby solution to this problem, which means
working at the C level. I know just how complicated the Python code
was to ensure the correct dependencies - I'm glad I could code
everything up in Python. Is a pure Ruby solution possible?

I read in http://www.rubycentral.com/book/ext_ruby.html how to manage
a C pointer with Data_Wrap_Struct. This interface allows you to tell
the gc which associated objects should be marked as "in use." In some
sense this is what I'm looking for excepting two things:

 a. I still don't know how to tell it which objects are to be deleted
    first. Are those mark relationships stored so the acyclic
    components are removed in the correct order?

 b. I don't have a C pointer. All I have are integers. I guess I
    could cast them to a pointer value, but there's always the chance
    it could collide either with a real pointer or with another
    library which also uses integer handles.

If my observations are correct, then there is a category of C
libraries which do not work well under Ruby but do work well under
the C implementation of Python.

(I say C implementation of Python because the __del__ semantics are
implementation dependent. The Daylight toolkit is also available in
Java via JNI. My code can talk to it using Jython. But Jython doesn't
run the __del__ methods, instead leaving gc up to the Java runtime.
But the JVM doesn't know the internals of the toolkit, so I end up
leaking memory all over the place.)
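For anyone without the toolkit, the ordering problem can be reproduced
with a small stand-in. What follows is only a sketch under stated
assumptions, not PyDaylight or Daylight code: FakeToolkit, SmartPtr,
Match, right_order and wrong_order are hypothetical names made up for
illustration, and the "parent invalidates children" rule is a
simplified model of the behaviour described above. The right_order
case also relies on the C implementation of Python finalizing an
object as soon as its reference count hits 0.

```python
class FakeToolkit:
    """Hypothetical stand-in for a handle-based C library."""

    def __init__(self):
        self._next = 1
        self.live = {}   # handle -> parent handle (or None)
        self.log = []    # ("ok" | "error", handle) for each dealloc call

    def alloc(self, parent=None):
        handle = self._next
        self._next += 1
        self.live[handle] = parent
        return handle

    def dealloc(self, handle):
        if handle not in self.live:
            # Handle was already invalidated: report an error.
            self.log.append(("error", handle))
            return 0
        # Deallocating a parent invalidates every dependent handle.
        for child in [h for h, p in self.live.items() if p == handle]:
            del self.live[child]
        del self.live[handle]
        self.log.append(("ok", handle))
        return 1


class SmartPtr:
    """Owns one handle and deallocates it when finalized."""

    def __init__(self, toolkit, handle):
        self.toolkit = toolkit
        self.handle = handle

    def __int__(self):
        return self.handle

    def __del__(self):
        self.toolkit.dealloc(self.handle)


class Match:
    """Keeps its parent alive and releases its own handle first."""

    def __init__(self, mol, match_ptr):
        self.handle = match_ptr
        self.mol = mol

    def __del__(self):
        del self.handle   # dependent object goes first
        del self.mol      # then the parent it depends on


def right_order():
    tk = FakeToolkit()
    mol = SmartPtr(tk, tk.alloc())
    match = Match(mol, SmartPtr(tk, tk.alloc(parent=int(mol))))
    del mol     # still referenced by match, so nothing is deallocated
    del match   # Match.__del__ releases the match, then the molecule
    return tk.log


def wrong_order():
    tk = FakeToolkit()
    mol = tk.alloc()
    match = tk.alloc(parent=mol)
    tk.dealloc(mol)     # also invalidates the match handle
    tk.dealloc(match)   # too late: the toolkit reports an error
    return tk.log
```

In this model right_order() logs two successful deallocations, match
first, while wrong_order() logs an error for the match handle. The
question for Ruby is how to get the first ordering when the collector,
not the program, decides when finalizers run.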
BTW, if you wish, the source code for this Python package is
available at

    http://starship.python.net/crew/dalke/PyDaylight-0.7.tar.gz

and a description of some of the implementation details is available
in the Jan. 2001 Dr. Dobbs.

Sincerely,

    Andrew
    dalke / acm.org

P.S.
Any errors in interpretation of Ruby or Python are purely my fault.
My background is in physics and my interest these days is software
applications development for computational chemistry and biology.
That means I may not use the right computer science words for certain
topics and that I do not have the experience to readily understand
how Ruby works.