Garbage Collection in Python

In the previous article, we explored how objects are stored in memory and how they are referenced by variables in Python. However, an important question arises: what happens when objects are no longer needed? Keeping them in memory for an extended period can lead to unnecessary memory consumption. This is where the garbage collector comes into play.

Before delving into the details of the garbage collector, let's clarify some fundamental concepts:

Reference Counting

Every object we create occupies memory. CPython, the reference implementation of Python, stores a reference count alongside each object: the count goes up whenever a new reference to the object is created, for example when another variable is bound to it or it is placed in a container, and goes down whenever a reference is removed.

Consider the following example.

a = dict()
b = a

In this example, the dictionary object is referenced by both a and b.

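To check the reference count, we can use the ctypes module from the standard library. In CPython, id() returns the object's memory address, and on standard builds the reference count is stored at the start of the object, so reading an integer from that address gives the current count. This relies on an implementation detail and is meant for exploration only: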

import ctypes

print(ctypes.c_long.from_address(id(a)).value)
# 2

To remove a reference, we can simply assign the value None to the variable a, and the reference count should decrease:

addr_a = id(a)
a = None

print(ctypes.c_long.from_address(addr_a).value)
# 1

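Here, we store the object's address in another variable before removing the reference, because once a is rebound, id(a) would return the address of None instead. If we also assigned None to b, the count would drop to 0 and CPython would free the dictionary immediately; reading addr_a after that point is unsafe, because the memory may already have been reused.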
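For everyday inspection, sys.getrefcount from the standard library is a safer alternative to reading memory through ctypes. The sketch below is illustrative only: the function name refcount_changes is made up for this example, and it compares counts rather than relying on absolute values, because the number getrefcount reports can include a temporary reference to its own argument and differs between CPython versions.

```python
import sys


def refcount_changes():
    """Return how the reference count moves as a second name is bound and unbound."""
    obj = dict()
    baseline = sys.getrefcount(obj)

    alias = obj                        # a second reference to the same dict
    with_alias = sys.getrefcount(obj)

    alias = None                       # drop the second reference again
    without_alias = sys.getrefcount(obj)

    return with_alias - baseline, without_alias - baseline


gained, net = refcount_changes()
print(gained, net)
```

The first number is the change caused by binding the extra name, and the second is the net change after that name is rebound to None.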

Circular Reference

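Sometimes, two objects refer to each other, creating a circular reference (also called a reference cycle). Reference counting alone cannot reclaim such objects: each one keeps the other's count above zero, so without an additional mechanism they would stay in memory indefinitely. Consider the following example: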

class Company:
    def __init__(self, name):
        self.name = name
        self.clients = []

    def add_client(self, client):
        self.clients.append(client)

company_a = Company('A')
company_b = Company('B')

company_a.add_client(company_b)
company_b.add_client(company_a)

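In this example, company_a and company_b reference each other through their clients lists. Even if both variables are deleted, each object is still referenced by the other's list, so neither reference count reaches 0.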
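To see the effect of such a cycle, here is a minimal sketch that assumes CPython and uses the standard gc and weakref modules. The automatic cycle collector is paused so that only reference counting is at work, and a weak reference (the name probe is illustrative) is used to observe whether the object still exists without keeping it alive.

```python
import gc
import weakref


class Company:
    def __init__(self, name):
        self.name = name
        self.clients = []

    def add_client(self, client):
        self.clients.append(client)


gc.disable()  # pause the automatic cycle collector for the demonstration
try:
    company_a = Company('A')
    company_b = Company('B')
    company_a.add_client(company_b)
    company_b.add_client(company_a)

    # A weak reference lets us observe the object without keeping it alive.
    probe = weakref.ref(company_a)

    del company_a, company_b
    alive_after_del = probe() is not None      # the cycle keeps both objects alive

    gc.collect()                               # run the cycle collector by hand
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_after_del, alive_after_collect)
```

Deleting both names is not enough to free the pair; the objects only disappear once the cycle collector has run.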

Garbage Collection

Python's garbage collection is responsible for reclaiming objects that the program can no longer reach. In CPython, most of this work is done by reference counting, which is not periodic: as soon as an object's reference count drops to 0, the object is deallocated. Reference counting cannot handle circular references, however, so CPython also has a separate cyclic garbage collector that runs from time to time, triggered by allocation thresholds.
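The reference-counting path can be observed on its own. In this sketch, which assumes CPython, the cycle collector is switched off and weakref.finalize registers a callback that runs when the object is reclaimed. The class name Resource and the list events are placeholders chosen for the example.

```python
import gc
import weakref


class Resource:
    """Placeholder class; any object that supports weak references would do."""


events = []

gc.disable()  # rule out the cycle collector
try:
    res = Resource()
    weakref.finalize(res, events.append, "freed")  # callback fires on reclamation
    freed_before = list(events)

    res = None                   # last reference gone: the count drops to 0
    freed_after = list(events)
finally:
    gc.enable()

print(freed_before, freed_after)
```

The callback fires as soon as the last reference is dropped, even though the cycle collector never ran.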

Python employs several mechanisms for garbage collection:

  1. Reference Counting: As mentioned earlier, Python keeps track of each object's reference count, and when it reaches 0, the object is removed from memory immediately, without waiting for a collector run.

  2. Cyclic Garbage Collection: This mechanism finds groups of objects that reference one another but can no longer be reached by the program, and removes them from memory. It only examines container objects, such as lists, dictionaries, and class instances, because only objects that can hold references can form a cycle.

    • In CPython, the collector does not start from the program's root objects. Instead, for each container object it is examining, it takes a working copy of the reference count.

    • It then visits the references held by each of those objects and subtracts one from the working count of every object they point to. Whatever remains is the number of references coming from outside the examined set, for example from global or local variables.

    • Objects whose working count is still greater than zero are reachable, as is everything that can be reached from them. The remaining objects are kept alive only by references among themselves, so the collector breaks the cycles and frees them.

  3. Generational Garbage Collection: The cyclic collector groups the objects it tracks into generations. CPython has traditionally used three (Generation 0, Generation 1, and Generation 2), though the details vary between versions. New objects start in Generation 0, and objects that survive a collection are promoted to the next generation. Younger generations are collected more often, since most objects are short-lived, which improves efficiency.

  4. Explicit Garbage Collection: While Python's garbage collector typically manages memory automatically, developers can manually trigger garbage collection using the gc module's functions, such as gc.collect().
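The gc module exposes some of the machinery described in the list above. The following sketch, which assumes CPython, reads the collection thresholds and counters and then triggers a collection by hand. The variable names are illustrative, and the exact numbers printed depend on the interpreter version and on what else the program has allocated.

```python
import gc

# Allocation thresholds that trigger automatic collections (a 3-tuple).
thresholds = gc.get_threshold()

# Current collection counters, one entry per threshold slot.
counts = gc.get_count()

gc.collect()                 # start from a clean slate

# Build a self-referencing list, then drop the only external name for it.
loop = []
loop.append(loop)
del loop

# A full collection; the return value is the number of unreachable objects found.
found = gc.collect()

print(thresholds, counts, found)
```

Calling gc.collect() explicitly is rarely needed in application code, but it can be useful in tests or right after releasing a large, cycle-heavy data structure.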

Understanding the garbage collector is crucial when working with Python, even though you may not need to interact with it directly in most cases. Knowing what happens under the hood can help you write more efficient and memory-conscious Python code.