Monday, February 13, 2006

On-the-fly cycle collection

I did some Javascript programming recently and was surprised to learn that its garbage collector did not collect cycles. Brendan Eich's post today discussed the same problem for XPCom. Python had this problem but we (well, Neil mostly) fixed it years ago. It's basically impossible to require programmers to avoid cycles. Collecting cycles is the right thing.

The Python garbage collection algorithm adds a separate cycle collection phase in addition to reference counting. Periodically the garbage collector does a pass over the objects that subtracts references from tracked objects; the objects left with non-zero refcounts have external roots and a regularly mark-and-sweep phase proceeds from there. I believe the basic algorithm was introduced by Lins, but I can't recall the reference.

Eich's post reminded me of an alternate scheme by David Bacon and others. See the April 2005 paper by Paz et al. The key idea is that garbage cycles are only created when an object has its reference count decremented to a value greater than zero. If you have a cycle, it only becomes collectible when the last reference outside the cycle goes away; thus, a decref is necessary to collect the cycle. The key idea is to limit the garbage collection to a set of candidate objects that have recently been decrefed. It seems like a fairly straightfoward addition to Python.

1 comment:

Anonymous said...

Hi Jeremy,

I don't know much about the GC and cycle collection in CPython. Just enough to apply them to asynchronous peer networking:

http://laurentszyster/blog/finalization/

So, this is actually more a request for a comment than a comment ;-)

Is there anything you can think of that would prevent this hack of finalizations to work in CPython 3.0?

Regards,