On 8/26/10 11:51 AM, Asher wrote:
> Right - so how does a pointer ever get off the stack?
>
When a C function returns, the C stack pointer register (usually called 
"SP") is reset to the frame pointer (sometimes this register is called 
"FP").  The FP points to the current function arguments.  The area 
between the SP and the FP +- the space for arguments (and the other 
machine registers) represent the local variables, temporaries and 
arguments of the current function call (sometimes called an "activation 
record").

Load any C program under a debugger and you can see the assembly code.

The MRI GC knows where "top" (SP) and the bottom of the stack is because 
of mostly portable conventions on how C compilers generate code that 
manipulate SP and FP and how the operating system lays out the process' 
memory.  The stack, the machine registers and some global variables are 
part of what is sometimes called the "root set".

The MRI GC scans the root set for values that "look like they point to 
Ruby objects" and "marks" those objects recursively as "in use".  Any 
unmarked objects ("not in use") are definitely not referenced by 
anything else and can be deallocated ("sweeped").  The GC must 
"stop-the-world" while it does this "marking" and "sweeping" -- nothing 
else can happen till this finishes.   If the GC couldn't sweep anything, 
it allocates more memory from the OS (by calling malloc(), which calls 
something at a much lower level (sbrk() or mmap() or something else).

> For instance, in my example, where the variable with reference to the object has been assigned nil - the same thing occurs if the variable goes out of scope.
>
> So in both of those cases, the object "should" be garbage collected; I understand that it's possible, due to conservative GC, that it might mistake a number on the stack (a long), etc. as a valid pointer, but generally when GC runs it should decide that the var (which has no valid ruby references) is no longer live and should be GC'd. Or am I missing something?
>
> So we have a var with no references in Ruby that is being marked as live by the GC because the pointer has not yet been deallocated. So how does it ever get deallocated in order to not be marked as live?
>
> If what I am seeing is the case (and I assume it cannot be and that I am missing something) then the object would never be garbage collected.
>
> So how does GC actually occur?

Collection occurs in MRI when a new object is needed and there are no 
unused objects left around and/or there was a certain number of 
allocations since the last GC.

> What causes the pointer to be deallocated?
>
"Pointers" are never allocated or deallocated as in malloc()/free(). 
Only objects that have no references to them are deallocated.

The C compiler generates code that simply increments or decrements the 
SP or changes the FP -- Stacks are FIFOs.

The MRI GC is a very simple "stop-the-world", "mark-and-sweep" 
"conservative" collector.  Conservative meaning "treat anything that 
looks like a pointer to an object as a pointer to an object".  This can 
cause conservative collectors to keep some objects around longer than 
they should.  This is also be cause most C compilers leave garbage (old 
pointers) on the stack.

The Rubinius GC is different.  The MRI Enterprise Edition uses 
additional techniques on top of the standard MRI GC to improve 
performance in web servers and long-running processes.


> Asher
>
> On Aug 26, 2010, at 12:32 PM, Roger Pack wrote:
>
>> It walks the stack, and (for sake of ease of understanding) marks all
>> pointers still on the stack as "live"
>> then it marks all of their children as "live"
>> then all of their grandchildren, etc.
>>
>> Then it traverses the entire heap, looking for objects that haven't
>> been marked as live, and "frees" them.
>>
>> NB that these two stages are separate, so they don't conflict.
>> HTH.
>> -r
>
>

Yea, what Roger said.  :)

More here:

http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29

A widely-ported and long-used GC can be downloaded here:

http://www.hpl.hp.com/personal/Hans_Boehm/gc/

ACM has a long rich history of GC research -- there's even a yearly 
symposium on the subject.  Appel's book is a great reference.
This book (sadly out of print) is a good introduction:

http://www.amazon.com/Topics-Advanced-Language-Implementation-Peter/dp/0262121514

There are far more complex GC algorithms that perform better in most 
cases.  Conservative mark-and-sweep collectors are far easier to 
interface with C code than other approaches -- most others require 
considerable cooperation between the code and the collector.

HTH^2,
-- KAS