Introduction
Garbage Collection refers to the process of recovering the system memory held by a program that no longer needs this memory. Other programs can use the freed memory. Hence, the garbage collector identifies the objects in the programs that will not be required in the future and frees their space. The most popular algorithm used to implement garbage collectors is Mark and Sweep.
Parallel and Concurrent Garbage Collection
A Garbage Collector is said to be parallel if it uses multiple threads, whereas it is said to be concurrent if it works simultaneously with the mutator.
Let us first see the steps performed by an incremental analysis:-
- First of all, it finds the root set. This step is performed with all the mutators stopped.
- Then it interleaves the tracing of the reachable objects with the mutator's execution (s). During this period, whenever a mutator points from a Scanned object to an unreached object, the reference is remembered. We have few options regarding the granularity through which these references are remembered. For now, we will assume the card-based scheme, where the heap is divided into sections called "cards," and a bit map is maintained indicating which cards are dirty (have had one or more references within them rewritten).
- Again the mutator(s) are stopped to rescan all the cards that might hold references to the unreached objects.
The set of objects reached by the root set in the large multithreaded applications can be very large. Hence, it is not possible to take time and space to visit all such objects while all mutations cease. It is also possible that because of the large heap and a large number of mutation threads, we need to rescan many cards again after all the objects are scanned once. Hence, we should scan the cards in parallel, and the mutators should execute concurrently.
We can implement step (2) tracing in parallel by using the multiple garbage collecting threads concurrently with the mutator threads. This would trace most of the reachable objects.
We can implement step (2) by stopping the mutators and using the parallel threads to find all the reachable objects.
Concurrent garbage-collection algorithm
- First, the root set is scanned for each mutator thread. All the objects directly reachable from that thread are put into the Unscanned state. The easiest incremental approach to this step is to wait till a mutator thread calls up the memory manager and scan its own root set if it has not already been done. If any mutator threads have not called up the memory allocation function, but all the tracing has been done, then the current thread must be interrupted to get its root set scanned.
- Now the objects in the Unscanned state are scanned. We use a work queue of fixed-size work packets to support the parallel computation. Each of these holds several Unscanned objects. As soon as the Unscanned objects are discovered, they are placed in work packets. These work packets will be dequeued by the Threads looking for work and then trace the Unscanned objects therein. This strategy will spread the work evenly among the workers in this tracing process. If the system runs out of space and the space is not found to create such work packets, we simply mark the cards holding the objects to force them to be scanned. It is always possible because the bit array holding the marks for the cards has already been allocated.
- Then the objects in dirty cards are scanned. When there are no Unscanned objects left in the work queue and all the threads' root sets have been scanned, the cards are scanned again for the reachable objects. Dirty cards continue to be produced as long as the mutators execute. Hence, we need to stop the tracing process using some criteria. We can allow the cards to be Scanned again only once or a fixed number of times, or when the number of cards to be scanned is reduced to some threshold. As a result, the parallel and concurrent step mostly terminates before completing the trace, which is finished by the final step below.
- The final step guarantees to mark all the reachable objects as reached. The root sets of all the threads can be found quickly using all the processors in the system after stopping all the mutators. A small number of objects have to be placed in the Unscanned state because the reachability of most objects has been traced. All the threads then start tracing the rest of the reachable objects and rescanning all the cards.