The N+1 Problem in Django and How to Solve It
In the previous article, I mentioned some common mistakes made by developers who are new to the Django framework. I have talked to many Django developers, and the most common problem they were unaware of was the N+1 query problem.
Django translates ORM queries to SQL queries and fetches data from the database. If your models have relational fields like ForeignKey, OneToOneField, or ManyToManyField and you use the related data in any operation, chances are you are facing this issue.
Django QuerySets are lazy, so they fetch data only when it is required. When we fetch data for a model, Django retrieves all of that model’s fields (unless the required fields are explicitly specified). But if you need additional data from a related table, a new query is executed to retrieve it.
To understand this, let’s imagine you’re working on an app that displays products and their categories, like an online shopping site. Each product belongs to a category. Refer to the following models:
from django.db import models

class Category(models.Model):
    title = models.CharField(max_length=100)

class Product(models.Model):
    name = models.CharField(max_length=100)
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
Now if you want to show both the product name and its category title, your code may look like this:
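products = Product.objects.all()[:5]  # fetching only 5 records
for product in products:
    # accessing product.category runs a separate query for each product
    print(product.name, product.category.title)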
Django makes it convenient to fetch data from the database, but there’s a trick here. When you retrieve a bunch of products using a query, Django doesn’t immediately fetch all the related data, like category information, along with it. Instead, it’s a bit lazy; it only gets the main product data initially.
Here’s where the n+1 query issue comes into play. Let’s say you want to display details of 5 products. Django fetches those 5 products in one go from the database using a single query. But when you want to show the category title for each product, Django might end up running an additional query for each product just to get its category title. So, if you have 5 products, you’ll end up with 6 queries in total (1 for products and 5 for categories).
This can be a problem because it’s like going to a grocery store to buy a few items, but instead of picking up everything you need in one trip, you’re going back and forth for each item. This wastes time and energy, right? Similarly, in your app, running extra database queries for each product not only takes more time but also uses up extra resources on the server.
This can eventually slow down your app’s performance. Imagine if many users are using your app at the same time — it could struggle to keep up because it’s spending more time talking to the database than actually serving useful information to users.
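To make the 1 + 5 arithmetic concrete, here is a minimal sketch that reproduces the same access pattern with Python’s built-in sqlite3 module instead of the Django ORM, so it runs without a project. The table names, the sample rows, and the `run_query` helper are assumptions for illustration, and the SQL only approximates what Django would generate.

```python
import sqlite3

# In-memory database with tables that mirror the Category and Product models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT,
                          category_id INTEGER REFERENCES category(id));
    INSERT INTO category (id, title) VALUES (1, 'Books'), (2, 'Toys');
    INSERT INTO product (name, category_id) VALUES
        ('Novel', 1), ('Atlas', 1), ('Robot', 2), ('Puzzle', 2), ('Comic', 1);
""")

query_count = 0

def run_query(sql, params=()):
    """Hypothetical helper: run one statement and count it."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()

# 1 query for the products ...
products = run_query("SELECT name, category_id FROM product ORDER BY id LIMIT 5")

titles = []
for name, category_id in products:
    # ... plus 1 query per product to look up its category
    rows = run_query("SELECT title FROM category WHERE id = ?", (category_id,))
    titles.append((name, rows[0][0]))

print(query_count)  # 6
```

With 500 products instead of 5, the same loop would issue 501 queries, which is why the pattern gets worse as the data grows.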
To observe the number of SQL queries, I use django-debug-toolbar. Take a look at the following screenshot for the above example:
Django’s QuerySet provides two methods to address this problem: select_related and prefetch_related. To understand where to use them, you first need to understand forward and backward relations.
In Django models, when you establish a connection between two models through fields like ForeignKey, OneToOneField, or ManyToManyField, you create a relationship. Understanding this relationship is crucial to efficiently fetch related data.
1. Forward Relation
When a field is added to a model that refers to another model, it’s known as a forward relation. In your example, the category field in the Product model is a forward relation. This means each product “points” to a category.
2. Backward Relation
When a related model has an implicit reference to the model it’s connected to, it’s called a backward relation. In your case, since a ForeignKey is used, the Category model is backwardly related to the Product model. This allows you to access all products that belong to a specific category.
The select_related method is used to optimize forward relations (ForeignKey and OneToOneField). It tells Django to fetch the related data along with the main data in a single query by using a SQL JOIN, avoiding the N+1 problem. This is effective in a situation like your example, where you want to display products and their categories.
products = Product.objects.select_related('category').all()[:5]
for product in products:
    print(product.category.title)  # No additional query, as category data is already fetched
Refer to the following screenshot to see the database queries. The number of queries is reduced to just 1.
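As a rough illustration of why a single query is enough, the sketch below uses Python’s built-in sqlite3 module to run the kind of JOIN that select_related relies on. The table names, the sample rows, and the `run_query` helper are assumptions for illustration, and the exact SQL Django generates will differ.

```python
import sqlite3

# In-memory database with tables that mirror the Category and Product models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT,
                          category_id INTEGER REFERENCES category(id));
    INSERT INTO category (id, title) VALUES (1, 'Books'), (2, 'Toys');
    INSERT INTO product (name, category_id) VALUES
        ('Novel', 1), ('Atlas', 1), ('Robot', 2), ('Puzzle', 2), ('Comic', 1);
""")

query_count = 0

def run_query(sql, params=()):
    """Hypothetical helper: run one statement and count it."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()

# One JOIN returns each product together with its category title,
# so the loop below never has to go back to the database.
rows = run_query("""
    SELECT product.name, category.title
    FROM product
    JOIN category ON category.id = product.category_id
    ORDER BY product.id
    LIMIT 5
""")

for name, title in rows:
    print(name, title)

print(query_count)  # 1
```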
The prefetch_related method is used for backward relations (and for ManyToManyField). It fetches all related objects in one additional query and matches them to their parent objects in Python, which is helpful when you want to access reverse relationships, like getting all products for each category.
categories = Category.objects.prefetch_related('product_set').all()
for category in categories:
    for product in category.product_set.all():
        print(product.name)  # No additional query, as product data is prefetched
In the following screenshot, we can see the number of queries reduced to 2. For a backward relation like this one, it stays at 2 regardless of the number of records fetched: one query for the categories and one for all of their products.
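The two-query pattern can be sketched without Django as well. The example below uses Python’s built-in sqlite3 module: one query loads the categories, a second loads every related product with an IN clause, and the grouping happens in Python. The table names, the sample rows, and the `run_query` helper are assumptions for illustration, not Django’s actual implementation.

```python
import sqlite3
from collections import defaultdict

# In-memory database with tables that mirror the Category and Product models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT,
                          category_id INTEGER REFERENCES category(id));
    INSERT INTO category (id, title) VALUES (1, 'Books'), (2, 'Toys');
    INSERT INTO product (name, category_id) VALUES
        ('Novel', 1), ('Atlas', 1), ('Robot', 2), ('Puzzle', 2), ('Comic', 1);
""")

query_count = 0

def run_query(sql, params=()):
    """Hypothetical helper: run one statement and count it."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()

# Query 1: the parent rows.
categories = run_query("SELECT id, title FROM category ORDER BY id")

# Query 2: all children of those parents at once, via an IN clause.
category_ids = [category_id for category_id, _ in categories]
placeholders = ", ".join("?" for _ in category_ids)
products = run_query(
    f"SELECT name, category_id FROM product "
    f"WHERE category_id IN ({placeholders}) ORDER BY id",
    category_ids,
)

# Match children to parents in Python; no further queries are needed.
products_by_category = defaultdict(list)
for name, category_id in products:
    products_by_category[category_id].append(name)

result = {title: products_by_category[category_id]
          for category_id, title in categories}

print(result)
print(query_count)  # 2
```

Adding more categories or products changes the size of the IN list and of the result, but not the number of queries.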
By using select_related and prefetch_related effectively, you can minimize the number of database queries and significantly enhance your application’s performance. It’s all about optimizing the way you fetch related data to ensure your app runs efficiently, regardless of how complex its relationships might be.
Please refer to the official Django documentation for more details.
I hope you enjoyed this article. Hit that follow button for more such articles. Also, follow me on Twitter for more shitposting and updates ;)