Baker's mark-and-sweep algorithm
Since there is no easy method to locate unreachable objects without examining the entire heap, the sweeping portion of the mark-and-sweep approach is expensive.
We optimize this by retaining a list of allocated objects and then taking the difference between the allocated and reached objects to identify the set of unreachable objects.
Input: Set of objects, a heap, and a free list with all unallocated chunks of the heap.
Output: A modified free list after all garbage has been removed.
Algorithm
//Marking
1. Scanned = empty set;
2. Unscanned = empty set;
3. move objects referenced by root set from Unreached to Unscanned.
4. while( empty set != Unscanned ){
5. move some object obj from Unscanned to Scanned;
6. for(each object obj' referenced in obj){
7. if( reached-bit is 0 (obj' is unreached) ){
8. move obj' from Unreached to Unscanned;
}
}
}
//Sweeping
8. Free = Free U(union) Unreached;
10. Unreached = Scanned;
Here, We used four lists, Free, Unreached, Unscanned, and Scanned, each holding objects in a state of the same name. We initialized Scanned to an empty list and Unscanned to contain objects reached from the root set.
Next, we implemented a basic marking algorithm. We used a for-loop to examine references on one unscanned object obj if any obj' s references have not yet been reached. Add obj' to the Unscanned state in that case. Then, we took objects in the Unreached list and deallocated their chunks by moving them into the Free list.
It is beneficial to combine adjacent free chunks into larger chunks since every time we return a chunk to the free list, we evaluate the chunks (left and right) and merge if one is free.
Mark-and-compact algorithm
- We relocate collectors to shift reachable things around the heap to avoid memory fragmentation.
- Because the area occupied by a reachable object is usually smaller than the freed space, we can relocate all reachable objects to one end of the heap instead of freeing them individually after discovering holes, leaving the entire heap as one free chunk.
- The garbage collector has already evaluated every reference within the reachable objects at this point, so changing them to point to new locations isn't too difficult.
- This decreases fragmentation and allows huge objects to be stored in memory; also, because data only takes up a few cache lines and pages, relocation enhances a program's spatial and temporal locality by allocating newly created objects to the same cache line and page.
- For nearby chunks, prefetching gives an extra benefit. Instead of a free list, we only require a pointer free at the start of the one free block, which simplifies the data structure for keeping free space.
The algorithm is divided into three sections.
- Marking is similar to mark-and-sweep.
- In the second phase, the algorithm scans the allocated section of the heap and computes a new address for each of the reachable objects assigned from the low end of the head, ensuring there are no gaps between them. A NewLocation structure is used to store these addresses.
- The references to copying objects are modified to point to their respective new locations, whose addresses are contained in the NewLocation structure.
Input: Set of objects, a heap, and a free list with all unallocated chunks of the heap.
Output: A modified free list after all garbage has been removed.
Algorithm
Data structures used:
- UnscannedList.
- Reached bits in all objects.
- A free pointer marks the beginning of an unallocated region in the heap.
- The table NewLocation (hash table, search tree) is used to implement setting new location(NewLocation(obj)) and getting value of NewLocation(obj) given an object obj both at O(1) time.
// marking.
1. Unscanned: A set of objects referenced by root set;
2. while( empty set != Unscanned){
3. remove object obj from Unscanned;
4. for(each object obj' referenced in o){
5. if(obj' is unreached){
6. mark obj' as reached;
7. put obj' on list Unscanned;
}
}
}
// compute the new locations.
8. free: starting location of heap storage.
9. for(each chunk c in heap, from the low end){
10. if(i is reached){
11. newLocation(c) = free;
12. free = free + sizeof(c);
}
}
// Redirect references and relocate reached objects.
13. for(each chunk c in heap, from the low end){
14. if(c is reached){
15. for(each reference c.r in c)
16. c.r = NewLocation(c.r);
17. copy c to NewLocation(c);
}
}
18. for(each reference r in root set)
19. r = NewLocation(r);
Line by Line understanding of the algorithm.
- Line(1 - 7) is similar to the first basic algorithm.
- Line(8 -12) represents the second phase of the algorithm. It visits every chunk, and new addresses are assigned to these chunks in the same order as their old addresses.
- Line(13 - 17) represents the third phase of the algorithm where we visit reached objects and replace internal pointers of a reached object obj by the new values using the NewLocation table(Line(15 - 16)).
- Line(17), we move object obj with the new internal references to its new location.
- Line(18) and (19) retarget pointers in the root set elements, which are not heaped objects.
Conclusion
Objects are compacted in place using a mark-and-compact collector. The copying collector copies things from one region to another, freeing up space for relocation. This enables the movement of reachable objects as they are discovered.
- The complexity of the basic mark-and-sweep algorithm is proportional to the number of heap chunks.
- The complexity of Baker's mark-and-sweep algorithm is proportional to the number of items reached.
- The complexity of the basic mark-and-compact approach is proportional to the number of chunks in a heap plus the total size of objects reached.
Also check out - Phases of Compiler
Frequently Asked Questions
Which algorithm is used for garbage collection?
The mark-and-sweep algorithm is a tracing garbage collector because it traces out the whole collection of objects that are directly or indirectly accessible to the program. The algorithm finds unreachable objects and adds them to a list of available spaces.
Conclusion
In this article, we learned about Trace-Based-garbage collection. We looked into the mark-and-sweep algorithm that is a tracing garbage collector because it traces out the collection of objects that are directly or indirectly accessible to the program.
We looked at the three variations of this algorithm.
- Basic mark-and-sweep algorithm.
- Baker's mark-and-sweep algorithm.
- Basic mark-and-compact algorithm.
We learned that the complexity of the basic mark-and-sweep algorithm is proportional to the number of heap chunks.
The complexity of Baker's mark-and-sweep algorithm is proportional to the number of items reached.
The complexity of the basic mark-and-compact approach is proportional to the number of chunks in a heap plus the total size of objects reached.
Recommended Reading:
Do check out The Interview guide for Product Based Companies as well as some of the Popular Interview Problems from Top companies like Amazon, Adobe, Google, Uber, Microsoft, etc. on Coding Ninjas Studio.
Also check out some of the Guided Paths on topics such as Data Structure and Algorithms, Competitive Programming, Operating Systems, Computer Networks, DBMS, System Design, etc. as well as some Contests, Test Series, Interview Bundles, and some Interview Experiences curated by top Industry Experts only on Coding Ninjas Studio.
Cheers!