On Wed, Aug 21, 2002 at 10:52:10PM +0900, William Djaja Tjokroaminata wrote:
> I have been thinking about this problem last night, especially the good
> (counter) example given by Reimer.  Because of my philosophical "Ruby-C
> separation" (let Ruby and C manage their own memories) for execution
> performance reason, I came up with this solution.  In my malloc wrapper
> function, I will also do simple counting on the memory allocated on the
> C side so far:
> 
>     safe_malloc (int size)
>     {
>         total += size;
>         if (total > threshold)
>         {
>             rb_gc ();
>             threshold += GC_MALLOC_LIMIT;
>         }
>         if (ptr = malloc (size))
>             ....
>         else
>             .....
> 
>     safe_free (void *ptr)
>     {
>         total -= .....
>         free (....)
> 
> I think with this malloc wrapper function, the total garbage can be
> limited to 2 * GC_MALLOC_LIMIT = 16 Mbytes.  
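If it helps, here is one way the elided parts could be filled in. This is
only a sketch under my own assumptions: the size header stored in front of
each block, the abort-on-failure error handling, and the stubbed rb_gc()
(which in the real thing is Ruby's collector) are mine, not your code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define GC_MALLOC_LIMIT (8 * 1024 * 1024)   /* 8 MB, as in the discussion */

static size_t total;                        /* bytes currently allocated on the C side */
static size_t threshold = GC_MALLOC_LIMIT;
static int gc_runs;                         /* for demonstration only */

/* Stub standing in for Ruby's rb_gc(); here it just counts invocations. */
static void rb_gc (void)
{
    gc_runs++;
}

static void *safe_malloc (size_t size)
{
    void *ptr;

    total += size;
    if (total > threshold)
    {
        rb_gc ();
        threshold += GC_MALLOC_LIMIT;
    }
    /* Allocate room for a size header so safe_free() can subtract it again. */
    if (!(ptr = malloc (size + sizeof (size_t))))
    {
        fprintf (stderr, "safe_malloc: out of memory\n");
        exit (1);
    }
    *(size_t *) ptr = size;
    return (size_t *) ptr + 1;
}

static void safe_free (void *ptr)
{
    size_t *base = (size_t *) ptr - 1;

    total -= *base;      /* size stored by safe_malloc() */
    free (base);
}
```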
 
Even without safe_malloc, you may have more than 8MB of garbage at a given
time: if you have allocated n MB, they can all suddenly become garbage at once.

Anyway, using your solution, the total garbage will be < 16MB
if your objects are created and destroyed more or less sequentially (i.e.
there aren't a lot of them alive at the same time), which I think should
be the case in the network simulation you're doing.
 
 The code above seems to indicate that the threshold will keep growing
when needed, but then again this is OK IMO, as that's what I think Ruby
itself should be doing :-) You're in fact implementing the dynamic
GC_MALLOC_LIMIT thing...
 
Example:
 *	do some safe_malloc() until total > (GC_MALLOC_LIMIT = 8MB);
 		rb_gc is called, but imagine there's no garbage yet
 *	then threshold is increased and is now 16 MB
 *	allocate, say, 5MB more with safe_malloc() (13MB+ are now allocated)
 *	safe_free() all the data
 *	allocate with safe_malloc(); this time rb_gc won't be called until
		total > 16MB
 *	all those bytes may become garbage at a later time
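That trace can be replayed with just the bookkeeping half of the wrapper;
a minimal sketch with rb_gc() stubbed to a counter and the sizes made up
(no Ruby objects actually become garbage during the run):

```c
#include <assert.h>
#include <stddef.h>

#define GC_MALLOC_LIMIT (8 * 1024 * 1024)
#define MB (1024 * 1024)

static size_t total;
static size_t threshold = GC_MALLOC_LIMIT;
static int gc_runs;

/* Stub: the real rb_gc() would mark and sweep; here we only count calls. */
static void rb_gc (void) { gc_runs++; }

/* The accounting part of safe_malloc(), without the actual allocation. */
static void account_alloc (size_t size)
{
    total += size;
    if (total > threshold)
    {
        rb_gc ();
        threshold += GC_MALLOC_LIMIT;
    }
}

static void account_free (size_t size) { total -= size; }
```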
 
> Also, when the C memory fluctuates, say, between 7 and 9 Mbytes, the
> Ruby gc will not be called unnecessarily. This solution will solve the
> problem, won't it? Probably there are other counter-examples that will
> defeat this solution?

I think it should do the trick.

> The idea of dynamic GC_MALLOC_LIMIT is very interesting.  I don't know
> whether there has been some research on the algorithms for mark-and-sweep
> gc.  For now, I think using a reasonable constant GC_MALLOC_LIMIT value is
> adequate.

Suppose the amount of reachable data is R, and H is the total heap size
(ie, R + garbage). In one run of the GC, you can free up to (H - R),
that is, the garbage at that point.

The cost of one garbage collection run involves two things:
 * marking: O(R)  => more or less O(number of objects)
 * sweeping: O(H) 

We can thus write the cost as
	c * R + d * H
for some constants c and d.

The amount of garbage thrown away is H - R, thus, the cost per amount of
garbage reclaimed is
		 c * R + d * H
		---------------
		    H  -  R
		    
The cost is fairly high if H is close to R (this is the case when the
gc is triggered when there's little garbage compared to the total data).

To lower it, given some R, we'll want H to be greater (and so will be
H-R), that is *to have more garbage collected per run*.
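To see the shape of this, plug in toy numbers; c = d = 1 here is an
arbitrary choice for comparison only, not a claim about real timings:

```c
#include <assert.h>

/* Cost per unit of garbage reclaimed: (c*R + d*H) / (H - R),
   where R is reachable data and H is total heap size.
   c and d are machine-dependent constants. */
static double cost_per_garbage (double c, double d, double R, double H)
{
    return (c * R + d * H) / (H - R);
}
```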

This is why IMO the gc should be less and less frequent (where "time"
corresponds to ALLOCation) as the heap size grows.

You can also change the current mem/speed tradeoff by setting
GC_MALLOC_LIMIT to a higher value (this way the amortized cost of gc
will be lower _but_ there will be more garbage and each run will be
slower).

-- 
 _           _                             
| |__   __ _| |_ ___ _ __ ___   __ _ _ __  
| '_ \ / _` | __/ __| '_ ` _ \ / _` | '_ \ 
| |_) | (_| | |_\__ \ | | | | | (_| | | | |
|_.__/ \__,_|\__|___/_| |_| |_|\__,_|_| |_|
	Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com
  
<|ryan|> I don't use deb
<netgod> u poor man
<Disconnect> netgod: heh
<Kingsqueak> apt-get install task-p0rn