C#: Garbage Collector I

09 October 2020 • ☕️ 3 min read
#GC#garbage collector#CSharp

What is Garbage Collection (GC)?

In C#, when you run an application, the CLR (Common Language Runtime) reserves space for the managed heap. The CLR manages this memory at runtime and allocates memory for each object when the object is created. It keeps a pointer to the address where the next object will be allocated. As more objects are created, less free space remains in the heap. When the heap is full, GC is performed. It automatically reclaims the memory of unreachable objects, so you do not have to free it manually.
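To make "unreachable" a bit more concrete, here is a minimal sketch that watches an object through a WeakReference, which does not keep its target alive. The class and method names are made up for this post. Whether the dropped object is reclaimed at exactly this point depends on the runtime and build configuration, so treat the second method's result as typical rather than guaranteed.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical demo class; the names are illustrative only.
public static class ReachabilityDemo
{
    // Allocates an object and returns only a weak reference to it.
    // NoInlining keeps the local 'data' confined to this method's frame,
    // so no strong reference remains once the method returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference AllocateAndDrop()
    {
        var data = new byte[1024];
        return new WeakReference(data);
    }

    // The object is still referenced by a local variable during the GC,
    // so it must survive.
    public static bool SurvivesWhileReferenced()
    {
        var data = new byte[1024];
        var weak = new WeakReference(data);
        GC.Collect();
        bool alive = weak.IsAlive;
        GC.KeepAlive(data); // 'data' counts as reachable up to this call
        return alive;
    }

    // Nothing references the object any more, so GC is free to reclaim it.
    // This usually returns false, but the timing is up to the runtime.
    public static bool SurvivesAfterDropped()
    {
        WeakReference weak = AllocateAndDrop();
        GC.Collect();
        return weak.IsAlive;
    }
}
```

Calling GC.Collect() by hand is only for demonstration here; in a real application the CLR decides when to collect.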


Mark, Sweep, Compact

GC uses a mark-and-compact algorithm. At the beginning of a GC cycle, it treats every object as 'garbage'. It then goes through the list of roots in the application.

The roots consist of:

  • Stack reference
    • Local objects. They exist only while the method executes.
  • Static reference
    • Static objects. They exist for the lifetime of the application domain.
  • Handles
    • References used in the interaction between unmanaged code and managed code. They exist as long as the unmanaged code requires the managed object.
  • Finalizer reference
    • References to objects that are waiting for their finalizer to run. They exist until the finalizer has executed.
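
As a rough illustration of the first three kinds of root, the sketch below creates a static reference, a local (stack) reference, and a GC handle. RootsDemo and its members are hypothetical names; GCHandle is the real type from System.Runtime.InteropServices.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical demo class; the names are illustrative only.
public static class RootsDemo
{
    // Static reference: reachable for as long as the application domain lives.
    private static readonly object StaticRoot = new object();

    public static bool Run()
    {
        // Stack reference: 'local' is a root only while this method executes.
        var local = new object();

        // Handle: keeps 'buffer' alive (and pinned in place) until Free() is
        // called, which is what unmanaged code needs while it uses the memory.
        var buffer = new byte[16];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            GC.Collect();
            // All three objects are rooted, so they are still here after the GC.
            return ReferenceEquals(handle.Target, buffer)
                && local != null
                && StaticRoot != null;
        }
        finally
        {
            handle.Free(); // release the root so the buffer can be collected later
        }
    }
}
```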

After GC traverses the roots, the objects reachable from them form a graph like the one below:


Root graph


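Starting from the roots, GC follows the references held by each object and marks every object it reaches. When it has found all references of one object, it moves on to the next. This is called the Mark phase. In the Sweep phase, it reclaims the memory of objects that are not 'marked'. In the Compact phase, it moves the surviving objects next to each other in the heap, overwriting the unused memory. This leaves the small object heap (SOH) as one contiguous run of live objects, and GC updates the pointers to refer to the new object locations.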
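
The three phases can be sketched with a toy heap. This is a simplified model written for this post, not how the CLR is implemented: ToyObject, ToyHeap, and the slot-index "addresses" are all assumptions made for illustration, and sweep and compact are folded into one pass for brevity.

```csharp
using System;
using System.Collections.Generic;

// Toy model only. Every name here is hypothetical.
public sealed class ToyObject
{
    public string Name { get; }
    public int Address { get; set; }   // slot index in the toy heap
    public bool Marked { get; set; }
    public List<ToyObject> References { get; } = new List<ToyObject>();
    public ToyObject(string name) { Name = name; }
}

public sealed class ToyHeap
{
    private readonly ToyObject[] _slots;

    // "Pointer" to the address where the next object will be allocated.
    public int NextFree { get; private set; }

    public ToyHeap(int size) { _slots = new ToyObject[size]; }

    public ToyObject Allocate(string name)
    {
        if (NextFree == _slots.Length)
            throw new InvalidOperationException("Toy heap is full.");
        var obj = new ToyObject(name) { Address = NextFree };
        _slots[NextFree++] = obj;
        return obj;
    }

    public void Collect(IEnumerable<ToyObject> roots)
    {
        // Mark: follow references from every root and flag what is reachable.
        var pending = new Stack<ToyObject>(roots);
        while (pending.Count > 0)
        {
            ToyObject current = pending.Pop();
            if (current.Marked) continue;
            current.Marked = true;
            foreach (ToyObject child in current.References) pending.Push(child);
        }

        // Sweep and compact: skip unmarked objects, slide survivors down.
        int target = 0;
        for (int source = 0; source < NextFree; source++)
        {
            ToyObject obj = _slots[source];
            if (!obj.Marked) continue;   // unmarked: its slot is reclaimed
            obj.Marked = false;          // reset for the next cycle
            obj.Address = target;        // record the new location
            _slots[target++] = obj;
        }
        Array.Clear(_slots, target, NextFree - target);
        NextFree = target;               // allocation resumes after the survivors
    }
}
```

For example, if you allocate a, b, c, and d, make a reference c, and collect with a as the only root, then b and d are swept, c slides down next to a, and NextFree points just past c.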


SOH vs LOH

GC manages two types of heap: the small object heap (SOH) and the large object heap (LOH). In Java, the heap is split into two spaces - old generation and young generation. In C#, the SOH has three generations, called Gen0, Gen1, and Gen2. A newly created object is placed in Gen0. After a GC, the survivors in Gen0 move to Gen1. When there is another GC, the survivors from Gen1 move to Gen2. A full GC takes place when Gen2 is full. A full GC collects Gen0 and Gen1 as well, so its overhead is high. This cost can be reduced by GC tuning. We will cover this further in the next post.
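
You can watch an object being promoted with GC.GetGeneration. The sketch below is only a demonstration (GenerationDemo is a made-up name), and forcing collections with GC.Collect() is not something you would normally do in production code. On a typical runtime the object is reported in Gen0, then Gen1, then Gen2, but the exact values are up to the GC.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class GenerationDemo
{
    // Returns the generation of one object: right after allocation,
    // after one forced GC, and after a second forced GC.
    public static int[] Run()
    {
        var survivor = new object();
        var generations = new int[3];

        generations[0] = GC.GetGeneration(survivor); // freshly allocated
        GC.Collect();                                 // collects all generations
        generations[1] = GC.GetGeneration(survivor);
        GC.Collect();
        generations[2] = GC.GetGeneration(survivor);

        GC.KeepAlive(survivor); // keep the object reachable across both GCs
        return generations;
    }
}
```

GC.MaxGeneration tells you the highest generation the runtime supports, and GC.CollectionCount(n) reports how many times generation n has been collected so far.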

Before a full GC starts, the CLR pauses the application so that it can reorganise the memory safely. This pause is called stop the world. Once all running threads are stopped, the CLR runs the GC thread to perform the full GC.

If an object is 85,000 bytes or larger, it is allocated on the LOH. By default, GC does not compact the LOH, because the overhead of copying large objects is too high. Allocating large objects often can trigger frequent GCs and leave the LOH fragmented, which wastes memory. So it is better to avoid creating large objects frequently.
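
A quick way to see the two heaps is to compare the generation reported for a small array and a large one. This is a minimal sketch with a made-up class name; it assumes the standard runtime behaviour of reporting LOH objects as belonging to the highest generation.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class LohDemo
{
    public static (int Small, int Large) Run()
    {
        var small = new byte[1_000];    // well under the threshold: goes to the SOH
        var large = new byte[100_000];  // over the threshold: goes to the LOH

        // A fresh SOH object normally starts in Gen0, while an LOH object
        // is reported as the highest generation from the start.
        return (GC.GetGeneration(small), GC.GetGeneration(large));
    }
}
```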


Finalize function

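GC calls an object's finalizer before it reclaims the object's memory, which gives the object a chance to clean up. In C#, you declare a finalizer by writing ~ in front of the class name, for example ~MyClass(). At compile time, it is converted into an override of the Finalize() method.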
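
Here is a minimal sketch of the syntax. Tracked and FinalizerDemo are hypothetical names, and the counter exists only so the effect can be observed. When a finalizer actually runs is decided by the runtime, so the count returned by Run() is usually 1 but is not guaranteed.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical demo types; the names are illustrative only.
public sealed class Tracked
{
    private static int _finalized;

    public static int FinalizedCount => Volatile.Read(ref _finalized);

    // Finalizer syntax: '~' followed by the class name.
    // The compiler emits this as an override of Object.Finalize().
    ~Tracked()
    {
        Interlocked.Increment(ref _finalized);
    }
}

public static class FinalizerDemo
{
    // NoInlining keeps the temporary object out of the caller's frame,
    // so it is unreachable as soon as this method returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void CreateAndDrop()
    {
        new Tracked();
    }

    public static int Run()
    {
        CreateAndDrop();
        GC.Collect();                  // finds the object and queues its finalizer
        GC.WaitForPendingFinalizers(); // waits for the finalizer thread to catch up
        return Tracked.FinalizedCount;
    }
}
```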


IDisposable

IDisposable lets you release resources yourself instead of waiting for GC. By implementing the IDisposable interface, you can free objects in the unmanaged heap as soon as you are done with them.
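
A minimal sketch of the idea, assuming a made-up NativeBuffer class that owns a block of unmanaged memory. Marshal.AllocHGlobal and Marshal.FreeHGlobal are the real APIs; everything else is illustrative.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around a block of unmanaged memory.
public sealed class NativeBuffer : IDisposable
{
    private IntPtr _memory;

    public bool IsDisposed => _memory == IntPtr.Zero;

    public NativeBuffer(int bytes)
    {
        _memory = Marshal.AllocHGlobal(bytes); // allocated outside the managed heap
    }

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this); // cleanup is done, so the finalizer can be skipped
    }

    // Safety net in case Dispose() is never called.
    ~NativeBuffer()
    {
        Release();
    }

    private void Release()
    {
        if (_memory == IntPtr.Zero) return; // already released
        Marshal.FreeHGlobal(_memory);
        _memory = IntPtr.Zero;
    }
}

public static class DisposeDemo
{
    public static bool Run()
    {
        var buffer = new NativeBuffer(256);
        using (buffer)
        {
            // Work with the buffer here.
        }
        // Dispose() ran when the using block ended, without waiting for GC.
        return buffer.IsDisposed;
    }
}
```

The using statement calls Dispose() even if the block throws, which is why it is usually preferred over calling Dispose() by hand.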


More information