Models
Orjinal Makale https://markusholtermann.eu/2019/03/less-obvious-things-to-do-with-djangos-orm/
Less Obvious Things To Do With Django’s ORM
Django’s ORM is powerful. And it has been for ages. It only became even more powerful over the last years. Django’s ORM can be intimidating because it can do so many things. And it can also be confusing when trying to find out how to do things.
Our Models
Before we begin with the examples we need to define our database layout. Throughout this post I will be using these three database models:
from django.db import models
class Author(models.Model):
name = models.CharField(max_length=200)
def __str__(self):
return self.name
class Book(models.Model):
title = models.CharField(max_length=200)
author = models.ForeignKey(
Author, on_delete=models.CASCADE, related_name="books"
)
votes = models.PositiveIntegerField()
genres = models.ManyToManyField("Genre", related_name="books")
def __str__(self):
return self.title
class Genre(models.Model):
name = models.CharField(max_length=200, unique=True)
def __str__(self):
return self.nameYou can input whatever data you want. My talk-orm repository on GitHub contains the full Django project as well as some sample data I sourced from Goodreads.
Counting objects
Let’s start with something basic. We’ll count how many records we have in the database:
We access the attribute objects on a database model class. objects is a “Model Manager”. Django uses them to connect database model classes and QuerySets. We then access the count() method on a manager which uses the underlying, auto-generated queryset to create the SQL that returns the number of records for that model.
Fetching single objects
Great, let’s fetch some random author from the database:
Well, that obviously didn’t go as planned. We got an exception that multiple objects got returned. That’s what get() is all about. It returns exactly one object. Not zero, not two or more. One!
Alternatively, we could use first():
The difference here, first() may return None in case there is no record to return. We can see that when we try to select me as an author:
While first() returned None, get() raised a DoesNotExist exception.
Advanced filtering
We’ve just seen how we can filter on the exact value of a model field when querying the database. But there’s more. For example, we can select all authors whose name starts with "Lisa":
Similarly, when we don’t care about upper and lower case and only care if the character sequence is part of a record, we can use __icontains:
All these Field Lookups are fully documented.
Following related objects
Let’s say we want to print a list of all books with their corresponding author. The first approach we will make may very well look like this:
That works, we end up with a lot of database queries. Specifically, we end up with 1 + $number_of_books queries. Why is that?
First, we’re selecting all the books. That’s one query. Then, in the for loop, we make one query per book. In case you’re wondering: that is bad! It may be “just fine” for two to five books, but when you have more, you will end up with performance problems.
Sidebar: Inspecting database queries
When you’re developing your Django project or app, it can be helpful to check the recent database queries quickly. For that, Django tracks them on the database connection:
Following one-to-one and many-to-one relationships
Getting back to where we left off, we need to find a way to optimize our 1 + $number_of_books database queries. And Django has just the right tool for that: select_related(). This queryset method tells Django to fetch forward relationships when making the initial query:
Now we have only 1 query. Exactly what we wanted.
I wrote above that select_related() is for forward relationships. That means, it only ever works when there is at most one object on the other end of the relationships. “At most,” because that related object could also be None, e.g. when you have a ForeignKey with null=True. In other words, you can use select_related() when the current model has a ForeignKey or OneToOneField, or if the current model is the opposite end of an OneToOneField. It will not work for ManyToManyFields or the reverse of a ForeignKey.
Following one-to-many and many-to-many relationships
When there are one-to-one and many-to-one relationships, there are probably one-to-many and many-to-many as well. And indeed, there are. You use them when you have ManyToManyFields or when you follow a ForeignKey backward.
Given our database schema, we have that scenario when we want to list all the books for each author. The naïve approach will look a bit like this:
As you might imagine, this has similar problems as the example I had above. We now have 1 + $number_of_authors queries: one for the list of authors, and one for each author to get the books. We can optimize this to exactly two database queries:
Django will construct a prefetch query under the hood which pretty much equals to:
The filter on author_id will automatically be populated by Django and limit the books to the set of authors selected in the first query.
If you want to limit the books queryset further, you can do so using Prefetch objects:
Aggregating data
All the things above are somewhat basic and something everybody using Django will come across eventually. The next topic about data aggregation and annotating database objects with additional information is something that’s still common, but it may take some time for this to come up in a project.
Let’s COUNT again
Let’s start by counting the number of books per author:
And at this point, it’s interesting to start to look at the SQL Django generated:
The critical puzzle piece in this SQL statement is the JOIN between the author and book tables with the COUNT in the SELECT clause. Django shifts the entire work to calculate the sum of books per author to the database.
Let’s SUM it up
What we have is already great. Now, let’s look into finding the top five authors with the most votes across all books:
Instead of a simple count, we sum up all the votes for all books an author has written. We then sort the authors by the sum of votes in decreasing order and limit the amount of records returned from the database to five.
The SQL for this query is not too different to the previous one. Instead of COUNT there is a SUM function. There is now a ORDER BY column with a LIMIT. All as expected.
Annotating “arbitrary” data
The annotations shown above are not the only thing Django can do. There’s a lot more:
This query will count the books per author, but will also attach the “first name” to each model instance.
If we add a values() queryset method after the first annotate() method we effectively group on the first name for the counting:
I’m well aware that this is a lot to digest and understand. The Django documentation has a whole chapter on aggregations that I can highly recommend to read through and have a look at whenever you need to deal with aggregations and annotations, because I haven’t even covered half of it.
Top-k selects
The last thing I want to cover is something that’s generally not easy to express in SQL and also computational wise rather heavy. It’s about selecting the top-k elements for something else.
The typical approach to this problem, across all databases, is the use of subqueries. A subquery is a full SQL query that will run as part of a “main” database query. Some database have additional capabilities, though, that make top-k selection relatively easy. On PostgreSQL, that would be a LATERAL JOIN.
Let’s start by first selecting the top three books by the number of votes per author, and then the top three books by votes per genre.
Top three by author
When we discussed the prefetch_related method before, we already looked into the Prefetch object. We will leverage that here. Let’s build this query piece by piece.
We want authors and a set of books that belong to each author. For now, the books_qs won’t be doing much:
With this, we will prefetch all books that belong to an author. As a next step, let’s sort the list of books by votes in descending order:
The last step is to limit the number of books to k. The approach everybody takes will be this:
However, this will cause Django to raise an AssertionError:
If one thinks about that, Django will take the book_qs and apply a filter() call on the author_id to limit the books to the list of authors selected before. So, we need another approach. There’s already a feature request ticket on the Django bug tracker.
Instead, we need to look into Subquery and OuterRef.
First, we’ll select the primary key of the top k books while filtering on an outer reference to an author_id. This queryset will not work on its own. It will only ever work in the context of a subquery that knows about an author_id.
We then put that book_sub_qs into a subquery. With that, the inner query “knows” about the author_id. If we were to iterate over books_qs, we’d get a list of books, the first book having the most votes, and not more than 3 books per author:
With that, we can now go back to our Prefetch() object and combine authors and books:
Top three by genre
We can use the very same pattern we have above for top-k by an author when we want to select the top-k by genre.
The notable difference between the Book-Author and Book-Genre relationship is that one of them is a many-to-one (Book-Author) and the other one is many-to-many (Book-Genre).
Due to the relationship being a many-to-many one, we need to make one change to remove duplicate books — the book_qs gains a distinct() call.
Last updated