A Generational, Thread-Aware GC for Parrot
Abstract:
Parrot, the VM for Perl6 (and other languages), currently uses a
plain Mark-and-Sweep GC. As it will be used on a wide range of systems, from
clusters to embedded, this scheme isn't really optimal. It also does not
consider multi-threading, which should be included later in Parrot.
The project is to implement a new GC that fixes all (or most of) these issues,
by using a generational, copying scheme.
Benefits:
This new GC should be more efficient and faster than the one that is currently
being used. It won't need to be rewritten when multi-threading comes to parrot.
It also provides a simple and cheap solution to the problem of timely
destruction, which is needed for python and backward compatibility with perl5.
Some parts should also be tunable (at least at build-time), which is always
useful when running on different platforms with different needs.
Project Detail:
This is a copy of the mail that was sent to perl6-internals@perl.org, where it
is currently being discussed
(http://www.nntp.perl.org/group/perl.perl6.internals/29994).
In order to allow more powerful GC schemes, we need to be able to copy objects
from a location of memory to another. There are problems with this, though, as
C structures can hold pointers to objects we are moving, and we have no way
(and don't want to) to make them update this info.
So we add a level of indirection: objects consist of a header and the actual
data. Headers are allocated once and for all at object creation and do not
move. They only hold a pointer to the 'real' object, thus making it possible
to move objects anywhere in the memory. Outside functions will see the header
address as the object address and some internal macros will handle the extra
indirection.
The big disadvantage of this approach is that we use one or two words (if
objects need to know where their header is, which seems reasonable) per object.
We then use a generational, incremental mark-and-sweep algorithm, such as
described by http://citeseer.csail.mit.edu/armstrong95one.html
We have different generations (in fact queues) of objects, from 1 to n-1
(the value of n could be tuned dynamically). We also have two special
generations, n, where objects do not grow old (to take care of timely
destruction) and 0, where objects are very constant and long-lived (mainly
used during parrot init). Objects in a generation are also sorted according
to their age.
The idea is to have the following invariant: all pointers go from new objects
to old objects. Thus, we can mark all objects in only one pass, starting from
the newest objects, generation n (using a Eratosthene-sieve-like method : if
an object is not marked, leave it, else mark all of its pointers). When an
entire generation has been processed, all non-marked objects are destroyed
and the others are compacted. Generation n becomes generation n-1, with the
exceptions of generation 0 and n.
Comparing the age of two objects then becomes very easy : if they are in
different generations, just compare their numbers. If not, the one with the
lower address is the youngest.
We can of course use a generational approach by limiting the number of
non-marked objects.
One problem is that our invariant can be broken. The solution is to track
down these inter-generational pointers (IGP) and simply add them to the root
set.
In order to reduce the number of IGP, we allocate an aggregate as being
artificially younger than scalars it has pointers to.
Thread issues still need to be discussed and are not covered here.
Keywords: generational, mark-and-collect, copying
Comments (0)
You don't have permission to comment on this page.