Save your Django models using update_fields for better performance

The Django ORM is designed to turn the rows of your database tables into objects that conform to object-oriented principles. This makes it very easy to create, update, and delete entries from your database.

However, raw queries retain certain advantages over an ORM. For instance, when you update a row in a table, you might want to update only a subset of its columns rather than all of them.

Saving a Django model object updates all of its columns every single time you call the save() method. To prevent this from happening, you must be explicit.

## What save() does internally

Consider a Django model called Record which has the following fields:

from django.db import models

class Record(models.Model):
    # id will be created automatically
    name = models.CharField(max_length=255)
    created_at = models.DateTimeField(auto_now_add=True)
    is_deleted = models.BooleanField(default=False)

If you would like to update the name of a record, you might do something like this:

>>> record = Record.objects.get(id=1)
>>> record.name = "new record name"
>>> record.save()

If you turn on query logging in the underlying database (Postgres, in my case), the query that actually runs is this:

UPDATE "record"
SET "name" = 'new record name',
"created_at" = '2021-06-12T15:09:05.019020+00:00' :: timestamptz,
"is_deleted" = FALSE
WHERE ""id" = 1

This may not seem like a big deal, but what if your model had 20 fields and you ran a save() on it very frequently?

At a certain scale, a query that updates every column each time you call save() can start to cause unnecessary overhead.

Why is the overhead unnecessary? Because it can be prevented with a simple tweak.

## Use update_fields in save()

If you would like to explicitly mention only the columns that you want updated, you can do so using the update_fields parameter when calling the save() method.

>>> record = Record.objects.get(id=1)
>>> record.name = "new record name"
>>> record.save(update_fields=['name'])

The underlying query now becomes:

UPDATE "record"
SET "name" = 'new record name'
WHERE "record"."id" = 1

You can also update multiple columns at once by passing more field names in the update_fields list, as shown below.
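For example, to update both the name and the soft-delete flag of the Record model above in a single query:

>>> record = Record.objects.get(id=1)
>>> record.name = "new record name"
>>> record.is_deleted = True
>>> record.save(update_fields=['name', 'is_deleted'])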

This is clearly a more efficient way to run your queries and will save you some database overhead.

## TL;DR

If you use the save() method with the intention of updating specific columns in your database row, explicitly mention those fields using the update_fields parameter when calling save(), like this:

**obj.save(update_fields=['field_1', 'field_2'])** as opposed to just **obj.save()**

This will save you some database overhead by making the underlying query more efficient.

[Originally posted on my blog](https://www.sankalpjonna.com/learn-django/save-your-django-models-using-update-fields-for-better-performance)

## 9 thoughts on “Save your Django models using update_fields for better performance”

  1. great post, the ORM abstracts a lot of things. I think everyone should learn SQL even if you don’t end up using it. It will make you a better developer and you’ll have a better idea of what’s going on under the hood. And it’s really not hard

  2. Great post mentioning something I wasn’t aware of! The only thing missing is some benchmarks, so we could get a rough idea of how much of a speedup this could produce.

  3. This is good to know about, but for most cases the efficiency boost doesn’t outweigh the maintenance cost and fragility introduced. The farther the call to `save` is from the modification of the fields, the easier it is to forget to add all the fields to `update_fields`. Anytime you change the code, you have to change it in two places. If you forget to add a field to `update_fields`, data will be lost, and if you forget to remove an unused field, you’ll be generating wasteful SQL. IMO, any performance gain is not worth the risk, overhead, and vigilance necessary to maintain this as a best practice.

    an alternative would be to use `update` on a queryset instead:

    In [11]: p = Project.objects.get(id=2)
    In [12]: p.name = "xxx"
    In [13]: p.save(update_fields=['name'])
    UPDATE "base_project"
    SET "name" = 'xxx'
    WHERE "base_project"."id" = 2 [10.32ms]

    In [14]: Project.objects.filter(id=2).update(name="xxx")
    UPDATE "base_project"
    SET "name" = 'xxx'
    WHERE "base_project"."id" = 2 [10.42ms]

    That way you have everything in one place and update only the fields you need. The tradeoff here is that `save` doesn’t get called, so you’ll have to pick your poison carefully. I’d recommend sticking with a plain `save()` until you need to address any performance issues.

    Edit: for those unaware, `debug_toolbar` provides a management command `debugsqlshell` that displays the SQL generated when a queryset is executed.

  4. While using Django’s save() with `update_fields`, e.g. save(update_fields=['name']), also add your updated_at field (or the relevant modified-date field), or else Django will not update fields declared with `auto_now=True`:

    `obj.save(update_fields=['name', 'updated_at'])`

    Example:

    In [13]: rec = create_record(name="Uprise")

    In [14]: rec.updated_at
    Out[14]: datetime.datetime(2021, 6, 13, 12, 56, 32, 945175)

    In [17]: rec.save(update_fields=['name'])

    In [18]: rec.updated_at
    Out[18]: datetime.datetime(2021, 6, 13, 12, 56, 32, 945175)

    In [19]: rec.name = "Uprise - Organize"

    In [20]: rec.save(update_fields=['name', 'updated_at'])

    In [21]: rec.updated_at
    Out[21]: datetime.datetime(2021, 6, 13, 12, 57, 55, 561264)

    In [22]: rec.name
    Out[22]: 'Uprise - Organize'

    Suggestion: you can create a custom mixin that overrides save() and adds any fields declared with `auto_now=True` to `update_fields` whenever it is not an empty list, as sketched below.
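    A minimal sketch of such a mixin (illustrative only; it assumes your model’s auto-updating fields are declared with `auto_now=True`):

    class AutoNowSaveMixin:
        # Mix in ahead of models.Model, e.g.
        #   class Record(AutoNowSaveMixin, models.Model): ...
        def save(self, *args, **kwargs):
            update_fields = kwargs.get('update_fields')
            if update_fields:
                # collect every field declared with auto_now=True
                auto_now_fields = {
                    f.name for f in self._meta.fields
                    if getattr(f, 'auto_now', False)
                }
                kwargs['update_fields'] = list(set(update_fields) | auto_now_fields)
            super().save(*args, **kwargs)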

  5. It’s easy to overstate how much this helps with database performance. The database is much better at I/O than people sometimes think, but the way the database actually performs I/O is also different from what you might expect. Most databases, including Postgres in your case, do block-based I/O: the database reads a block containing many fragments of data into memory, modifies it, and, when a write becomes necessary, writes the whole block back down to disk. If your row fits entirely within a block, you are (eventually) rewriting the entire block regardless of how much you changed within it. You _can_ get some benefit if you don’t write larger objects back to the database every time, especially those large enough to comprise multiple blocks for a single column value, but even in this case:

    1) many of the database column types for such objects (file, image, blob, etc.) actually only store an inline pointer, and often have some hashing semantic that avoids overwriting the unchanged body
    2) even when this is not the case, row chaining or row migration can push the database to do more block I/O than you expect even if you change a relatively trivial column

    For large column values there’s also network performance to consider. Sending a few extra bytes per column will not noticeably impact your performance, but sending an unnecessary 5 MB file certainly may.

    The real, reliable benefits you can get from this apply to both django and the DB in similar ways:

    1) You may avoid firing a trigger with an `OF` clause if you do not include the field in the SQL statement. Likewise, these fields are passed to the `update_fields` kwarg of any `pre_save` or `post_save` signal in django, and you can avoid doing potentially costly work if the field has not changed.
    2) You may skip foreign key or check constraint validation in the DB, and related field validation within django, potentially requiring additional I/O or database calls.
    3) You may skip additional block I/O in the database due to secondary index reads if the index which includes the column was not used to select the containing block. The database will not modify the index for an unchanged value in any case.
    4) You can avoid the updating of `auto_now` fields in your model, assuming you don’t include them in `update_fields`
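    To illustrate point 1 on the Django side: a signal receiver can consult `update_fields` and bail out early. A minimal sketch (`expensive_reindex` is a hypothetical function; `Record` is the model from the post):

    from django.db.models.signals import post_save
    from django.dispatch import receiver

    @receiver(post_save, sender=Record)
    def reindex_on_name_change(sender, instance, update_fields=None, **kwargs):
        # update_fields is None for a plain save(); otherwise it is a
        # frozenset of the names passed to save(update_fields=...)
        if update_fields is not None and 'name' not in update_fields:
            return  # the name column was untouched; skip the costly work
        expensive_reindex(instance)  # hypothetical expensive operation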

  6. A better reason to do this is that it’s impossible to drop columns without downtime if you don’t. At least not without some extreme SeparateDatabaseAndState fuckery. Anything that touches all columns of a table (which is most queries in Django) will explode on removed columns during an upgrade :-/

  7. I am not sure how much `update_fields` would help performance; every time I’ve used it, it was for logic.

    I had two processes writing to the same row and clobbering each other’s writes. Since they were writing to different fields, it was fairly easy to use `update_fields` to keep them out of each other’s hair.
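    For example, with the Record model from the post (an illustrative sketch of the pattern):

    # process A touches only the name column
    record = Record.objects.get(id=1)
    record.name = "renamed by A"
    record.save(update_fields=['name'])

    # process B touches only the is_deleted column, so it cannot
    # clobber the name written by process A
    record = Record.objects.get(id=1)
    record.is_deleted = True
    record.save(update_fields=['is_deleted'])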

