Mauricio Fernández <batsman.geo / yahoo.com> wrote:
---------------------------------------------------------
> Even without safe_malloc, you may have more than 8MB of garbage at a given
> time, if you have allocated n MB and they suddenly become garbage.

> Anyway, using your solution, the total garbage will be < 16MB
> if your objects are created and destroyed more or less sequentially (ie
> there aren't a lot of them alive at the same time), which I think should
> happen in the network simulation you're doing.
>  
>  The code above seems to indicate that the threshold will grow when
> needed, but then again this is OK IMO as that's what I think Ruby should
> be doing :-) You're in fact implementing the dynamic GC_MALLOC_LIMIT
> thing...
---------------------------------------------------------

Yes, I have considered the case where, at some point, the threshold is M
megabytes and the actual memory usage is M - 7 megabytes.  If the majority
of this (M - 7) megabytes suddenly becomes garbage (but neither the C nor
the Ruby threshold is reached), the garbage can indeed be larger than 8
megabytes.  However, fixing that would be a further memory (not execution)
optimization.  The fact is that at some point there was a valid memory
usage of (M - 7) megabytes, and the computer was able to handle it.  If
much of this memory is no longer needed later, the only consequence is
that other processes will not get it.  But if the user is only running a
network simulation on the computer, no execution performance degradation
will be observed.

If we do want to return the memory to other processes, then we need an
additional counter for this memory optimization:

    void *
    safe_malloc (size_t size)
    {
        ++counter;              /* allocations since the last rb_gc() */
        total += size;          /* bytes requested since the last rb_gc() */
        if (counter > FREQ || total > threshold)
        {
            counter = 0;
            rb_gc ();
            ....

The idea here is to set FREQ to a certain number such that rb_gc() is not
called as often as when we use ALLOC().
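
For completeness, here is a minimal sketch of how the whole function
might look.  The value of FREQ, the 8 MB starting threshold, and the
decision to reset total after rb_gc() are assumptions on my part; the
part elided above could also grow the threshold dynamically, as in the
quoted suggestion.

    #include <stdlib.h>
    #include "ruby.h"                 /* for rb_gc() */

    #define FREQ 10000                /* assumed value, to be tuned */

    static unsigned long counter;     /* allocations since the last rb_gc() */
    static unsigned long total;       /* bytes requested since the last rb_gc() */
    static unsigned long threshold = 8 * 1024 * 1024;   /* assumed 8 MB */

    void *
    safe_malloc (size_t size)
    {
        ++counter;
        total += size;
        if (counter > FREQ || total > threshold)
        {
            counter = 0;
            total = 0;
            rb_gc ();
        }
        return malloc (size);
    }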

---------------------------------------------------------
> Suppose the amount of reachable data is R, and H is the total heap size
> (ie, R + garbage). In one run of the GC, you can free up to (H - R),
> that is, the garbage at that point.

> The cost of garbage collection involves 2 things:
>  * marking: O(R)  => more or less O(number of objects)
>  * sweeping: O(H) 

> We can thus write it as 
> 	c * R + d * H

> The amount of garbage thrown away is H - R, thus, the cost per amount of
> garbage reclaimed is
> 		 c * R + d * H
> 		---------------
> 		    H  -  R
> 		    
> The cost is fairly high if H is close to R (this is the case when the
> gc is triggered when there's little garbage compared to the total data).

> To lower it, given some R, we'll want H to be greater (and so will be
> H-R), that is *to have more garbage collected per run*.

> This is why IMO the gc should be less and less frequent (where "time"
> corresponds to ALLOCation) as the heap size grows.
---------------------------------------------------------

The idea of the amortized cost of garbage collection is very interesting,
but our metric of interest for execution speed is probably just the total
cost itself:

     cost = c * R + d * H  =  O(R)

(One question: why is sweeping O(H) and not O(R)?  Is the assumption that
checking the mark bit is much cheaper than freeing the memory itself?)
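
Just to make the quoted per-garbage cost concrete for myself, here is a
tiny illustration; the constants c, d, R and H below are made-up numbers,
not measurements:

    #include <stdio.h>

    /* Cost per unit of reclaimed garbage, as in the quoted analysis:
       (c * R + d * H) / (H - R)                                        */
    static double
    cost_per_garbage (double c, double d, double R, double H)
    {
        return (c * R + d * H) / (H - R);
    }

    int
    main (void)
    {
        double c = 1.0, d = 0.1, R = 100.0;   /* arbitrary units */

        /* H close to R: little garbage per run, high cost per unit.    */
        printf ("H = 110: %.2f\n", cost_per_garbage (c, d, R, 110.0));  /* 11.10 */

        /* H much larger than R: more garbage reclaimed per run, so the
           cost per reclaimed unit drops.                               */
        printf ("H = 200: %.2f\n", cost_per_garbage (c, d, R, 200.0));  /*  1.20 */
        return 0;
    }

This matches the quoted point: triggering the GC when H is close to R
makes each reclaimed byte expensive.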

When I simply use ALLOC, the problem is that my software does not scale
well.  The efficiency (in terms of events/sec) decreases when I have more
Ruby objects, even when they are static.

---------------------------------------------------------
> You can also change the current mem/speed tradeoff by setting
> GC_MALLOC_LIMIT to a higher value (this way the amortized cost of gc
> will be lower _but_ there will be more garbage and each run will be
> slower).
---------------------------------------------------------

Yes, in fact one of the objectives of the software architecture is to be
able to run it with GC.disable, which is equivalent to setting
GC_MALLOC_LIMIT to infinity.  Thanks a lot for your thoughts.
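
For reference, a minimal sketch of what I mean at the C-extension level;
the extension name Init_netsim is hypothetical, and as far as I know
GC.disable boils down to the same rb_gc_disable() call:

    #include "ruby.h"

    /* Hypothetical extension init; Init_netsim is a made-up name. */
    void
    Init_netsim (void)
    {
        /* With this call (or GC.disable in the script), no garbage is
           ever collected, i.e. GC_MALLOC_LIMIT is effectively infinite. */
        rb_gc_disable ();
    }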

Regards,

Bill