Hmm, interesting. I was looking at it from the single-threaded perspective, so I obviously missed some subtle implications. If I understand correctly, the problems are:

1) If a thread with a large stack is "full of garbage", that garbage will be copied into the stack of a thread with a small stack after a context switch, if the small stack grows.

2) If a single thread creates a very "dirty" stack and then goes deep into nested calls [e.g. going to sleep forever within a very nested call], it will not free the invalid references until it comes back out of that deep stack later.

I suppose we can operate under the assumption that when the program starts, the extent of the stack is "clear" of bad references.

A few tricks up our sleeve, all of which amount to cleaning the stack around the time of a context switch. We can clear the difference in size between the two stacks after each context switch. We could clear that difference PLUS re-clear the "cleared once" area below the stack, after each context switch. Or perhaps do the "clear at most once" trick only if rb_thread_alone, though I think the above would already do that.

So anyway, we could basically reset the "already cleared" markers once per context switch, instead of once per GC, and re-clear that stack's damage. Would that help? In reality I'm not sure whether any of this would be necessary. How can we tell how much is necessary?

Old notes:

Let's keep two values per thread: one being the top of a "clean section", the other the bottom of that "clean section" [the already swept section]. Make this "clean section" grow where possible [check it every CHECK_INT: if you're above it, grow it; if you're below it, reset it to start below you; etc.]. That way we keep track of a growing cleaned area per thread. Now when you context switch from a large stack to a shorter stack, clean the difference, plus the "dirty but clean now" section, which gets cleaned again. Then reset the pointers.

I guess just try it out :) Or I might get around to it eventually.
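To make the old notes a bit more concrete, here is a rough sketch in C of the bookkeeping I have in mind. It is only an illustration under assumptions: it simulates the machine stack as a plain array with one clean region, rather than touching the interpreter's real stack or keeping one region per thread, and every name in it (machine_stack, clean_region, clean_region_check, clean_on_switch, count_ghosts) is made up for the example. In this toy model slot 0 is the stack base and "deeper" means a larger index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_SLOTS 1024   /* size of the simulated machine stack */

/* Simulated machine stack. A thread at depth d owns slots [0, d);
 * everything at or beyond d is dead space that may hold ghost references. */
static uintptr_t machine_stack[STACK_SLOTS];

/* Hypothetical bookkeeping: dead slots [lo, hi) are known to be zeroed. */
typedef struct {
    size_t lo;   /* start of the clean section (just past the live stack) */
    size_t hi;   /* end of the clean section */
} clean_region;

static void zero_slots(size_t from, size_t to)
{
    assert(from <= to && to <= STACK_SLOTS);
    memset(&machine_stack[from], 0, (to - from) * sizeof machine_stack[0]);
}

/* Periodic check, the role CHECK_INT would play in the notes above.
 * Only sees the depth at check time, so it is an approximation. */
static void clean_region_check(clean_region *r, size_t depth)
{
    if (depth < r->lo) {
        /* Stack shrank: zero the newly dead slots and grow the section. */
        zero_slots(depth, r->lo);
        r->lo = depth;
    } else if (depth > r->lo) {
        /* Stack grew into the section: those slots are live (dirty) now. */
        r->lo = depth;
        if (r->hi < depth)
            r->hi = depth;
    }
}

/* After switching from a stack of old_depth to one of new_depth:
 * clear the size difference and re-clear the old clean section in one
 * contiguous pass, then reset the markers. */
static void clean_on_switch(clean_region *r, size_t old_depth, size_t new_depth)
{
    size_t end = old_depth > r->hi ? old_depth : r->hi;
    if (end > new_depth) {
        zero_slots(new_depth, end);
        r->lo = new_depth;
        r->hi = end;
    } else {
        r->lo = r->hi = new_depth;   /* nothing known clean past new_depth */
    }
}

/* Count nonzero dead slots past the live stack, for experimenting. */
static size_t count_ghosts(size_t depth)
{
    size_t n = 0;
    for (size_t i = depth; i < STACK_SLOTS; i++)
        if (machine_stack[i] != 0)
            n++;
    return n;
}
```

The cost per switch is one memset over the size difference plus the old clean section, which is presumably where any slowdown would come from in a benchmark that thrashes between deep and shallow stacks.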
Comments inline:

> My bogus2 benchmark switches between one thread having a very deep stack and
> another with a shallow stack. It's the worst conceivable case of stack
> thrashing. It runs about 15% faster if I disable only the clearing of the
> stack.

I wonder if that's what causes the micro-benchmark slowdowns [what are they, like 5%?]. What about disabling the depth checker, too? What's its impact?

> When whatever transient ghost references remain, change value, GC will
> eventually collect the objects to which they referred. Correct?

Yeah.

> 2) GC is not triggered by any thread's particular activities. It may be
> that a given thread, whose stack has become full of ghost references due to
> deferred stack clearing, stops running for long periods of time. Or, that a
> such a thread just never happens to be running when a GC is triggered.

True. If a thread doesn't run at all between GCs, then it won't clear its stack until... it runs again at some point :) A thread basically gets a window of one GC in which to create as much trash as it wants and, if it then stops running, it retains that much trash.

-=r