N+1 Problem in Django and Solution for it.

In the previous article, I mentioned some common mistakes by developers who are new to the Django framework. I have talked to many Django developers and the most common problem they were unaware of was n+1 queries.

Problem

Django translates ORM queries into SQL queries and fetches the data from the database. If your models have relational fields like ForeignKey, OneToOneField or ManyToManyField, and you read data through those relations, chances are you are running into this issue.

Django QuerySets are lazy, so they fetch data only when it is actually needed. Whenever we fetch data for a model, Django retrieves all of its fields (unless you restrict them explicitly, for example with only() or values()). But if you need additional data from a related table, a new query is executed to retrieve it.

To understand this, let’s imagine you’re working on an app that displays products and their categories, like an online shopping site. Each product belongs to a category. Refer to the following models:

from django.db import models

class Category(models.Model):
    title = models.CharField(max_length=100)

class Product(models.Model):
    name = models.CharField(max_length=100)
    category = models.ForeignKey(Category, on_delete=models.CASCADE)

Now if you want to show both the product name and its category title, your code may look like this:

products = Product.objects.all()[:5] # fetching only 5 records
for product in products:
    print(product.name)
    print(product.category.title)

Django makes it convenient to fetch data from the database, but there’s a catch. When you retrieve a bunch of products using a query, Django doesn’t immediately fetch the related data, like category information, along with it. Instead, it’s a bit lazy; it only gets the main product data initially.

Here’s where the n+1 query issue comes into play. Let’s say you want to display details of 5 products. Django fetches those 5 products from the database using a single query. But when you access the category title for each product, Django runs an additional query for that product to load its category. So, if you have 5 products, you’ll end up with 6 queries in total (1 for products and 5 for categories). In general, n products cost n + 1 queries, hence the name.
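If you want to see the pattern without setting up a Django project, here is a rough sketch using Python’s built-in sqlite3 module. It is not what Django runs internally, only an approximation of the queries the loop above triggers. The table names, the sample rows and the run() helper are made up for illustration.

```python
import sqlite3

# Hypothetical tables standing in for the Category and Product models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        name TEXT,
        category_id INTEGER REFERENCES category(id)
    );
""")
conn.executemany("INSERT INTO category VALUES (?, ?)", [(1, "Books"), (2, "Toys")])
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(i, f"Product {i}", 1 + i % 2) for i in range(1, 6)],
)

executed = []  # every SELECT we send to the database


def run(sql, params=()):
    """Record the statement, then execute it and return all rows."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# 1 query: the five products
products = run("SELECT id, name, category_id FROM product ORDER BY id LIMIT 5")

# n queries: one category lookup per product
for _id, name, category_id in products:
    (title,) = run("SELECT title FROM category WHERE id = ?", (category_id,))[0]
    print(name, title)

print(len(executed))  # 1 + 5 = 6
```

The count grows with the number of products: 100 products would mean 101 queries.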

This can be a problem because it’s like going to a grocery store to buy a few items, but instead of picking up everything you need in one trip, you’re going back and forth for each item. This wastes time and energy, right? Similarly, in your app, running extra database queries for each product not only takes more time but also uses up extra resources on the server.

This can eventually slow down your app’s performance. Imagine if many users are using your app at the same time — it could struggle to keep up because it’s spending more time talking to the database than actually serving useful information to users.

To observe the number of SQL queries, I use django-debug-toolbar. Take a look at the following screenshot for the above example:

Solution

Django’s QuerySet provides two methods to address this problem: select_related and prefetch_related. To understand where to use them, you first need to understand forward and backward relations.

In Django models, when you establish a connection between two models through fields like ForeignKey, OneToOneField, or ManyToManyField, you create a relationship. Understanding this relationship is crucial to efficiently fetch related data.

1. Forward Relation

When a field is added to a model that refers to another model, it’s known as a forward relation. In our example, the category field in the Product model is a forward relation. This means each product “points” to a category.

2. Backward Relation

When a related model has an implicit reference to the model it’s connected to, it’s called a backward relation. In our case, since Product has a ForeignKey to Category, the Category model is backward-related to the Product model. This allows you to access all products that belong to a specific category, by default through category.product_set.

Using select_related and prefetch_related

1. select_related
The select_related method is used to optimize single-valued relations, that is, ForeignKey and OneToOneField. It tells Django to fetch the related data along with the main data in a single query by using a SQL JOIN, avoiding the n+1 problem. This is effective in a situation like our example, where you want to display products and their categories.

products = Product.objects.select_related('category')[:5]
for product in products:
    print(product.name)
    print(product.category.title)  # No additional query, as category data is already fetched

Refer to the following screenshot to see the database queries. You can see the number of queries is reduced to just 1.
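Roughly speaking, select_related turns the per-product lookups into one SQL JOIN. The sketch below mimics that with plain sqlite3 rather than Django itself. The tables, the sample rows and the run() helper are assumptions for illustration, and the exact SQL Django generates will differ.

```python
import sqlite3

# Hypothetical tables standing in for the Category and Product models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        name TEXT,
        category_id INTEGER REFERENCES category(id)
    );
""")
conn.executemany("INSERT INTO category VALUES (?, ?)", [(1, "Books"), (2, "Toys")])
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(i, f"Product {i}", 1 + i % 2) for i in range(1, 6)],
)

executed = []  # every SELECT we send to the database


def run(sql, params=()):
    """Record the statement, then execute it and return all rows."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# One query: each product row arrives with its category title attached.
rows = run(
    "SELECT product.name, category.title "
    "FROM product INNER JOIN category ON product.category_id = category.id "
    "ORDER BY product.id LIMIT 5"
)
for name, title in rows:
    print(name, title)

print(len(executed))  # 1
```

The trade-off is that every row now carries the category columns too, which is usually a small price compared to one round trip per product.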

2. prefetch_related
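The prefetch_related method is used for multi-valued relations: backward ForeignKey relations and ManyToManyField. It fetches the related objects in one additional query per relation and matches them to the main objects in Python, which is helpful when you want to access reverse relationships, like getting all products for each category.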

categories = Category.objects.prefetch_related('product_set')
for category in categories:
    print(category.title)
    for product in category.product_set.all():
        print(product.name) # No additional query, as product data is prefetched

In the following screenshot, we can see the number of queries reduced to 2: one for the categories and one for their products. With a single prefetched relation, it stays at 2 regardless of the number of records that need to be fetched.
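The idea behind prefetch_related is one query for the main objects, one more for the related ones, and then matching them up in Python. Here is a minimal sqlite3 sketch of that idea. It is not Django’s actual implementation, and the tables, the sample rows and the run() helper are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical tables standing in for the Category and Product models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        name TEXT,
        category_id INTEGER REFERENCES category(id)
    );
""")
conn.executemany("INSERT INTO category VALUES (?, ?)", [(1, "Books"), (2, "Toys")])
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(i, f"Product {i}", 1 + i % 2) for i in range(1, 6)],
)

executed = []  # every SELECT we send to the database


def run(sql, params=()):
    """Record the statement, then execute it and return all rows."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# Query 1: the categories themselves.
categories = run("SELECT id, title FROM category ORDER BY id")

# Query 2: all products of those categories in one go, using IN (...).
ids = [category_id for category_id, _title in categories]
placeholders = ", ".join("?" for _ in ids)
products = run(
    "SELECT name, category_id FROM product "
    f"WHERE category_id IN ({placeholders}) ORDER BY id",
    ids,
)

# Match products to their category in Python instead of in SQL.
by_category = defaultdict(list)
for name, category_id in products:
    by_category[category_id].append(name)

for category_id, title in categories:
    print(title, by_category[category_id])

print(len(executed))  # 2
```

Adding more categories or products changes the size of the two result sets, not the number of queries.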

By utilizing select_related and prefetch_related effectively, you can minimize the number of database queries and significantly enhance your application’s performance. It’s all about optimizing the way you fetch related data to ensure your app runs efficiently, regardless of how complex its relationships might be.

Please refer to the official Django documentation for more details.

I hope you enjoyed this article. Hit that follow button for more such articles. Also, follow me on Twitter for more shitposting and updates ;)