on garbage collection

part 1: why do garbage collection

Garbage collection: a job that has always had negative connotations. In real life, garbage collectors are people who collect unused waste and bring it to landfills or proper recycling facilities. In the software world, garbage collection is simply a form of automatic memory management: you can request memory (allocate) and not worry about giving it back (deallocating) because the GC will handle that for you.

In this series of blog posts, we will talk about garbage collection: why to do it, what it is, and how to implement a simple one.

But why garbage collection?

Before we can understand garbage collectors, we must first understand memory allocation.

Your computer has memory, which is used to store application data: images, text, even executables (i.e. the applications themselves). Let's imagine memory as sheets of paper stored in a room, the OS (whether it be Linux, macOS, or Windows) as a person inside the room, and a process as another person outside the room.

|      |      |      |      |      |      |      |      |      |
|      |      |      |      |      |      |      |      |      |
|      |      |      |      |      |      |      |      |      |
             +------> \o/  os  ---------------------+
             |                                      |
------------ | ------------------------------------ | ----------
             |                                      |
             +-------------  process  \o/ <---------+

Memory allocation is the OS giving the process a sheet of paper, which the process can use. When the process no longer has any use for it, memory deallocation typically occurs and the piece of paper is returned to the OS.

So you've just been given a page! What can you do with it? The OS only gives out paper in fixed sizes (literal 4 KB pages, which most OSes use for architectural reasons), so it would be a waste to use up a whole sheet for a byte-sized char.

To solve that problem, we have another, process-level memory allocation system on top of the OS-level one. It's common for the C runtime library to provide malloc/free functions for allocating and freeing smaller objects.

But why can't we just hoard memory, allocating memory and not freeing it?

Well, we could; in fact some applications actually do this for better performance. Deallocation is expensive, and with complex or large data structures it incurs a large overhead. If the program is short-lived anyway, the OS reclaims all of its memory at exit, so freeing it yourself is wasted effort.

However, in the real world many programs are long-lived, and thus they must free memory or the user is gonna be sad and terminate them. If the user doesn't kill them, they will hog the system's memory and maybe even hang your OS.

The stack and the heap

Typically your runtime gives you two regions of memory to play with.

a stack: a contiguous region where you can allocate memory by reserving some amount of space at the end, and free it by releasing that amount of space from the end.

Imagine the stack being a literal stack of paper.

-------- <--
|      | <--
|      | <-- unused stack memory
|      | <-- the process may move the
-------- <-- end of the stack pointer up
|      | <-- to use it
|      | <--
|      | <--
-------- <-- end of the stack
|      |
| USED |
|      |
|      |
| USED |
|      |
-------- <-- start of the stack

The stack is small: you typically get no more than a few megabytes per process on modern operating systems (8 MB by default on Linux). Your CPU keeps track of the memory addresses where the stack begins and ends in registers. The stack is used for small, short-lived objects, mostly local variables.

a heap: a large, dynamic region of memory. The OS doesn't automatically allocate it for you, so you'll have to request it yourself. Continuing our analogy, the process can ask the OS for paper, use those pieces of paper directly, and return them when done; the heap doesn't have to be contiguous either.

How to manage memory?

There are two styles of memory management most systems programming languages provide, each with its own pros and cons.

  1. Manual memory management

    The language provides you malloc/free functions and you have to do it yourself.

    pros: You do it yourself: complete control over however you want to optimize memory/CPU usage. Where to free objects is up to you, or you could just not free them.

    cons: You do it yourself. Many programming bugs come from manual memory management: use-after-free, double free, and memory leaks (technically not a bug, but harmful for long-lived programs).

  2. Automatic memory management

    The language provides you with constructs that help you with deallocation.

    pros: The computer does it for you; you no longer have to worry about most memory errors.

    cons:

    • The computer does it for you. Some programs need a runtime that tracks and frees memory for you, which can incur significant overhead.
    • Some languages, like Rust, track who owns what at compile time. In non-trivial cases you have to fight the compiler to get things right.

    Automatic memory management comes primarily in 3 flavors: garbage collection, reference counting and escape analysis.

We'll only be discussing automatic memory management here.


Before we dive into automatic memory management, we must first know what ownership is. Simply put, an object A owns an object B when A is responsible for B's lifetime, including freeing it. Say we have an owner object A, which owns the object B; we can draw it as a graph which looks like this:

 A          B
\o/ -----> \o/

Ownership also comes with borrowing: temporarily lending the object A to another function or another object. The object A must be returned to its owning scope; once it returns, it lives as normal until its lifetime ends and it dies. We won't be talking much about borrowing, as that will be for another time.

We can categorize styles of management by who owns what and who frees what.

Styles of management

Escape analysis: one owner, owner does deallocation

This is the simplest style of automatic memory management: one owner, one free. It's implemented through escape analysis in the compiler, or through scope-based constructs like RAII.

In this model, there is one owner for each child object. That owner frees its child once the child's lifetime ends (when it goes out of scope, or some other value overwrites it).

Many modern systems languages provide this as their primary model of memory management (Box in Rust, std::unique_ptr in C++)

Let's say you have a program like this, in C++:

#include <cstdio>
#include <cstdlib>
#include <memory>

struct B {
    int num;
    B() : num(rand()) {}
};

struct A {
    std::unique_ptr<B> b;
};

void C() {
    A a;                         // birth of "a"
    a.b = std::make_unique<B>(); // |
    printf("%d", a.b->num);      // |
    // "death" of "a" <-------------+
}
The "a" object is a local variable, meaning it will last as long as the scope lasts. Whenever the scope of the "C" function ends (when it returns), the lifetime of the "a" object ends, which means all of its children gets deallocated.

Of course, you can transfer ownership, moving local variables outside of the scope by returning it:

A D() {
    A a;                         // birth of "a"
    a.b = std::make_unique<B>(); // |
    return a;                    // | (ownership moves out of D)
}                                // |
                                 // |
void C() {                       // |
    A a = D(); // <-----------------+ "a" lives here now
    printf("%d", a.b->num);      // |
    // "death" of "a" <-------------+
}

Reference counting: multiple owners, last owner does deallocation

The one-ownership model provides many benefits for typical linear structures and is provided by many modern systems programming languages. However, it's impossible to model a single piece of data shared by multiple owners. For example, how would you model this with one owner?

\o/ -------> [box] <------- \o/
i own this          i own this

The solution is to record the number of owners who own the box. Each time an owner dies, the number is decremented; when it reaches zero, the box is finally deallocated. Modern systems languages typically provide this model of memory management too (Rc in Rust, std::shared_ptr in C++).

Something like this:

\o/ -------> [box of 2 owners] <------- \o/
i share this                     i share this

* when the left owner dies:
\o/          [box of 1 owner]  <------- \o/
i'm dying                        ok i own this

* when the right owner dies:
             [box of 1 owner]  <------- \o/
                                oh noes i die too
                                better take this box with me
-> the box gets deallocated

And thus everyone gets deallocated.

But what if you have a cyclic data structure, say a doubly linked list?

+-------+       +------+                    +------+
| LIST  |       |      | ---- i own u ----> |      |
|       | ----> | NODE |                    | NODE |
| START |       |      | <-- i own u too -- |      |
+-------+       +------+                    +------+
                2 owners                    1 owner

Say we have a doubly-linked list of 2 nodes. The list owns the first node. The first node owns the second node. But the second node owns the first: a cyclic structure.

What if we free the linked list? Well, the list owns the first node, so we'll lower the number of owners of the first node to 1.

 +------+                    +------+
 |      | ---- i own u ----> |      |
 | NODE |                    | NODE |
 |      | <-- i own u too -- |      |
 +------+                    +------+
 1 owner                     1 owner

Oops! You've got a reference cycle! The first node extends the lifetime of the second node, and the second node extends the lifetime of the first. The two nodes will basically live forever, at least until the program ends, causing a memory leak.

We could alleviate this by introducing another type of reference: the weak reference. A weak reference holds a non-owning reference to a reference-counted object. The object it points to may still be alive, or it may already have been freed, so a weak reference must be checked before use.

How this is implemented varies between standard libraries. For example, Rust's standard library stores the object inside a reference-counting wrapper: when the strong reference count reaches zero, the object is freed; when the weak reference count also reaches zero, the wrapper itself is deallocated.

The cyclic structure would look like this:

+-------+       +------+                    +------+
| LIST  |       |      | ---- i own u ----> |      |
|       | ----> | NODE |                    | NODE |
| START |       |      | <--i have a weak-- |      |
+-------+       +------+     reference      +------+
                1 owner                     1 owner
                1 weak ref

Once the list is deallocated, this would happen:

+-------+       +------+                    +------+
| FREED |       |      |                    |      |
|       |       | NODE |                    | NODE |
| LIST  |       |      | <--i have a weak-- |      |
+-------+       +------+     reference      +------+
                0 owner                     0 owner
                1 weak ref

Then everything gets deallocated, because each node's strong reference count reached zero.

This is all fine and well, but what if we have something a lot more complicated, like forests or directed acyclic graphs? It would be time-consuming, and maybe even impossible, to place weak refs correctly for them all.

And that is where garbage collection comes in, which I'll talk about in the next post.