
Two Scoops of Django 3x - Compress 8

This document discusses ways to identify and reduce bottlenecks in Django projects, noting that premature optimization should be avoided. It recommends using django-debug-toolbar to find excessive queries, caching to speed up repeated queries, and indexing database fields to improve query performance. Optimization efforts should focus on query-heavy pages and databases access.

Uploaded by

Can İsildar

25.2 Use MkDocs or Sphinx with Myst to Generate Documentation From Markdown
MkDocs and Sphinx with Myst are static site generators which render nice-looking docs
from your .md files. Output formats include HTML, LaTeX, manual pages, and plain text.

MkDocs should just work per the usage instructions at mkdocs.org/#getting-started.

For Sphinx, you’ll need to follow the Myst instructions to generate markdown docs:
https://myst-parser.readthedocs.io/en/latest/using/intro.html.

TIP: Build Your Documentation at Least Weekly


You never know when bad cross-references or invalid formatting can break the doc-
umentation build. Rather than discover that the documentation is unbuildable at an
awkward moment, just make a habit of creating it on a regular basis. Our preference
is to make it part of our Continuous Integration processes, as covered in Chapter 34:
Continuous Integration.

PACKAGE TIP: Other Documentation Generators


We list the two main Python-based documentation site generators, but if one is
willing to explore non-Python options, there are numerous other open source and
commercial documentation alternatives. Here is a quick list of some excellent ones
we’ve used in the past few years:
• docusaurus.io
• docsify.js.org
• bookdown.org

25.3 What Docs Should Django Projects Contain?


Developer-facing documentation refers to notes and guides that developers need in order
to set up and maintain a project. This includes notes on installation, deployment, architec-
ture, how to run tests or submit pull requests, and more. We’ve found that it really helps to
place this documentation in all our projects, private or public.

Here we provide a table that describes what we consider the absolute minimum documen-
tation:
Chapter 25: Documentation: Be Obsessed

Filename or Directory / Reason / Notes

README.md
    Reason: Every coding project you begin regardless of framework or language should
    have a README.md file in the repository root.
    Notes: Provide at least a short paragraph describing what the project does. Also,
    link to the installation instructions in the docs/ directory.

docs/
    Reason: Your project documentation should go in one, consistent location. This is
    the Python community standard.
    Notes: A simple directory.

docs/deployment.md
    Reason: This file lets you take a day off.
    Notes: A point-by-point set of instructions on how to install/update the project
    into production. Even if it’s done via something powered by various devops tools
    that in theory make it “simple”, document it here.

docs/installation.md
    Reason: This is really nice for new people coming into a project or when you get a
    new laptop and need to set up the project.
    Notes: A point-by-point set of instructions on how to onboard yourself or another
    developer with the software setup for a project.

docs/architecture.md
    Reason: A guide for understanding what things evolved from as a project ages and
    grows in scope.
    Notes: This is how you imagine a project to be in simple text and it can be as long
    or short as you want. Good for keeping focused at the beginning of an effort.

Table 25.1: Documentation Django Projects Should Contain


Figure 25.1: Even ice cream could benefit from documentation.

25.4 Additional Markdown Documentation Resources


• python.org/dev/peps/pep-0257 Official specification on docstrings.
• readthedocs.io Read the Docs is a free service that can host your Sphinx or MkDocs
documentation.
• pythonhosted.org Python Hosted is another free service for documentation hosting.
• en.wikipedia.org/wiki/Markdown
• documentup.com will host README documents written in Markdown format.

25.5 The ReStructuredText Alternative


ReStructuredText is a plain text formatting syntax not too dissimilar to Markdown. It has
a lot more built-in features than Markdown but is harder to learn and slower to write. It is
used by core tools such as Django, Python and many older third-party libraries.

25.5.1 ReStructuredText Resources


• docutils.sourceforge.net/docs/ref/rst/restructuredtext.html is the formal
specification for reStructuredText.
• docs.readthedocs.io/en/stable/intro/getting-started-with-sphinx.html - Read the
Docs’ starting instructions for Sphinx.
• sphinx-doc.org/ is the Sphinx project home page. While usable with Markdown,
Sphinx really shines when used with ReStructuredText.

25.6 When Documentation Needs to Be Converted to/from Markdown or ReStructuredText
Pandoc is a command-line tool that allows us to convert files from one markup format
into another. We can write in one format, and quickly convert it to another. Here’s how to
convert files using this tool:

Example 25.2: Using Pandoc to convert between formats

$ # To convert a ReStructuredText document to GitHub-Flavored Markdown
$ pandoc -t gfm README.rst -o README.md
$ # To convert a Markdown document to ReStructuredText
$ pandoc -f gfm README.md -o README.rst

Pandoc’s documentation is found at pandoc.org.

25.7 Wikis and Other Documentation Methods


For whatever reason, if you can’t place developer-facing documentation in the project itself,
you should have other options. While wikis, online document stores, and word processing
documents don’t have the feature of being placed in version control, they are better than no
documentation.

Please consider creating documents within these other methods with the same names as the
ones we suggested in the table on the previous page.

25.8 Ensuring that Code is Documented


Even with unambiguous naming patterns and type hints, it can be hard to determine what
a particular class, method, or function does. This is where docstrings have saved the day
for so many over the years. To make enforcing this level of documentation easier, use the
Interrogate library.

interrogate.readthedocs.io/
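
To give a feel for what this kind of enforcement checks, here is a minimal sketch. It is not Interrogate itself and does not use its API; it is a stand-in built on Python’s standard ast module, and the names SAMPLE and missing_docstrings are hypothetical, chosen only for this illustration.

```python
import ast

# Hypothetical module source, used only for this illustration.
SAMPLE = '''
"""Flavor utilities."""


class Cone:
    """A cone that holds scoops."""

    def add_scoop(self, flavor):
        self.flavor = flavor


def melt(cone):
    """Melt the given cone."""
    return None
'''


def missing_docstrings(source):
    """Return the names of classes and functions that lack a docstring."""
    tree = ast.parse(source)
    checked = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, checked) and ast.get_docstring(node) is None
    ]


print(missing_docstrings(SAMPLE))  # ['add_scoop']
```

A real tool adds the parts that matter in practice, such as walking a whole package, reporting a coverage percentage, and failing a build when coverage drops below a threshold.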

25.9 Summary
In this chapter we went over the following:

• The use of Markdown to write documentation in plaintext format.
• The use of static site generators to render your documentation in HTML. Some tools
enable rendering docs as epub, mobi, or PDF.
• Advice on the documentation requirements for any Django project.
• Using ReStructuredText as a documentation alternative.
• Pandoc as a tool to convert between formats.
• Enforcing documentation with Interrogate.

Next, we’ll take a look at common bottlenecks in Django projects and ways to deal with
them.
26 | Finding and Reducing Bottlenecks

WARNING: This Chapter is in Progress


We are in the midst of working on this chapter and will expand on it in the days to
come. We are open to suggestions on topics and items to cover; please submit them
to github.com/feldroy/two-scoops-of-django-3.x/issues

This chapter covers a few basic strategies for identifying bottlenecks and speeding up your
Django projects.

26.1 Should You Even Care?


Remember, premature optimization is bad. If your site is small- or medium-sized and the
pages are loading fine, then it’s okay to skip this chapter.

On the other hand, if your site’s user base is growing steadily or you’re about to land a
strategic partnership with a popular brand, then read on.

26.2 Speed Up Query-Heavy Pages


This section describes how to reduce bottlenecks caused by having too many queries, as well
as those caused by queries that aren’t as snappy as they could be.

We also urge you to read up on database access optimization in the official Django docs:
docs.djangoproject.com/en/3.2/topics/db/optimization/

26.2.1 Find Excessive Queries With Django Debug Toolbar


You can use django-debug-toolbar to help you determine where most of your queries are
coming from. You’ll find bottlenecks such as:

• Duplicate queries in a page.
• ORM calls that resolve to many more queries than you expected.
• Slow queries.

You probably have a rough idea of some of the URLs to start with. For example, which
pages don’t feel snappy when they load?

Install django-debug-toolbar locally if you don’t have it yet. Look at your project in a web
browser, and expand the SQL panel. It’ll show you how many queries the current page
contains.

PACKAGE TIP: Packages for Profiling and Performance Analysis


django-debug-toolbar is a critical development tool and an invaluable aid in page-by-page
analysis. We also recommend adding django-cache-panel to your project, but configured to
run only when the settings/local.py module is in use. This will increase visibility into
what your cache is doing.

django-extensions comes with a tool called RunProfileServer that starts Django’s
runserver command with hotshot/profiling tools enabled.

silk (github.com/mtford90/silk) is a live profiling Django app that intercepts and
stores HTTP requests and database queries before presenting them in a user interface for
further inspection.

26.2.2 Reduce the Number of Queries


Once you know which pages contain an undesirable number of queries, figure out ways to
reduce that number. Some of the things you can attempt:

• Try using select_related() in your ORM calls to combine queries. It follows
ForeignKey relations and combines more data into a larger query. If using CBVs,
django-braces makes doing this trivial with the SelectRelatedMixin. Keep queries from
getting too large by explicitly passing the related field names you are interested in;
only the specified relations will be followed. Combine that with careful testing!
• For many-to-many and many-to-one relationships that can’t be optimized with
select_related(), explore using prefetch_related() instead.
• If the same query is being generated more than once per template, move the query
into the Python view, add it to the context as a variable, and point the template ORM
calls at this new context variable.
• Implement caching using a key/value store such as Memcached or Redis. Then
write tests to assert the number of queries run in a view. See
docs.djangoproject.com/en/3.2/topics/testing/tools/#django.test.TransactionTestCase.assertNumQueries
for instructions.
• Use the django.utils.functional.cached_property decorator to cache in
memory the result of a method call for the life of an object instance. This is
incredibly useful, so please see Section 31.3.5: django.utils.functional.cached_property
in chapter 31.
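
To make the first item concrete, here is a sketch of the difference between one query per row and a single combined query. It deliberately uses the standard library’s sqlite3 module instead of the Django ORM so that it runs without a project. The shop and flavor tables and all function names are hypothetical. The first function shows the pattern that django-debug-toolbar tends to surface as a pile of near-identical queries; the second fetches the same data with one JOIN, which is roughly the kind of SQL that select_related() asks the ORM to generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shop (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE flavor (
        id INTEGER PRIMARY KEY,
        name TEXT,
        shop_id INTEGER REFERENCES shop (id)
    );
    INSERT INTO shop VALUES (1, 'Scoops'), (2, 'Cones');
    INSERT INTO flavor VALUES (1, 'Vanilla', 1), (2, 'Mint', 1), (3, 'Mango', 2);
    """
)

queries = []  # every SQL statement we send, kept so we can count them


def run(sql, params=()):
    """Execute a statement, remember it, and return all rows."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


def flavors_one_query_per_row():
    """One query for the flavors, plus one query per flavor for its shop."""
    result = []
    for name, shop_id in run("SELECT name, shop_id FROM flavor ORDER BY id"):
        shop_name = run("SELECT name FROM shop WHERE id = ?", (shop_id,))[0][0]
        result.append((name, shop_name))
    return result


def flavors_joined():
    """The same data in a single query, using a JOIN."""
    return run(
        "SELECT flavor.name, shop.name FROM flavor "
        "JOIN shop ON shop.id = flavor.shop_id ORDER BY flavor.id"
    )


slow = flavors_one_query_per_row()
slow_count = len(queries)
queries.clear()
fast = flavors_joined()
fast_count = len(queries)

print(slow_count, fast_count)  # 4 1
```

With three flavors the saving is small, but the first pattern grows by one query for every extra row while the second stays at one.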

26.2.3 Speed Up Common Queries


The length of time it takes for individual queries can also be a bottleneck. Here are some
tips, but consider them just starting points:

• Make sure your indexes are helping speed up your most common slow queries. Look
at the raw SQL generated by those queries, and index on the fields that you filter/sort
on most frequently. Look at the generated WHERE and ORDER BY clauses.
• Understand what your indexes are actually doing in production. Development
machines will never perfectly replicate what happens in production, so learn how to
analyze and understand what’s really happening with your database.
• Look at the query plans generated by common queries.
• Turn on your database’s slow query logging feature and see if any slow queries occur
frequently.
• Use django-debug-toolbar in development to identify potentially-slow queries
defensively, before they hit production.
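
As a small illustration of reading a query plan before and after adding an index, here is a sketch that uses SQLite through the standard library, purely because it needs no server. The flavor table, the index name, and the plan() helper are hypothetical. The commands and the plan output differ on PostgreSQL and MySQL, so treat this as a demonstration of the workflow rather than of any production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flavor (id INTEGER PRIMARY KEY, name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO flavor (name, price) VALUES (?, ?)",
    [(f"flavor-{i}", i % 7) for i in range(1000)],
)

# A query that filters on a column with no index yet.
QUERY = "SELECT id, price FROM flavor WHERE name = ?"


def plan(sql, params):
    """Return the human-readable steps of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


before = plan(QUERY, ("flavor-42",))
conn.execute("CREATE INDEX flavor_name_idx ON flavor (name)")
after = plan(QUERY, ("flavor-42",))

print(before)  # describes a scan of the whole table
print(after)   # describes a search that uses flavor_name_idx
```

The same habit applies to larger databases: capture the plan, change one thing, and capture the plan again.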

Once you have good indexes, and once you’ve done enough analysis to know which queries
to rewrite, here are some starting tips on how to go about rewriting them:

1 Rewrite your logic to return smaller result sets when possible.


2 Re-model your data in a way that allows indexes to work more effectively.
3 Drop down to raw SQL in places where it would be more efficient than the generated
query.

TIP: Use EXPLAIN ANALYZE / EXPLAIN


If you’re using PostgreSQL, you can use EXPLAIN ANALYZE to get an extremely
detailed query plan and analysis of any raw SQL query. For more information, see:
• revsys.com/writings/postgresql-performance.html
• craigkerstiens.com/2013/01/10/more-on-postgres-performance/

The MySQL equivalent is the EXPLAIN command, which isn’t as detailed but is still
helpful. For more information, see:
• dev.mysql.com/doc/refman/5.7/en/explain.html

A nice feature of django-debug-toolbar is that the SQL pane has an EXPLAIN fea-
ture.

26.2.4 Switch ATOMIC_REQUESTS to False


The clear, vast majority of Django projects will run just fine with ATOMIC_REQUESTS
set to True. Generally, the penalty of running all database queries in a transaction
isn’t noticeable. However, if your bottleneck analysis points to transactions causing
too much delay, it’s time to switch the project to run with ATOMIC_REQUESTS set to
False. See Section 7.7.2: Explicit Transaction Declaration for guidelines on this setting.
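
For orientation, ATOMIC_REQUESTS is set per database inside the DATABASES setting. The fragment below is a sketch only: the engine and the database name are placeholders, not a recommendation.

```python
# Settings fragment. The engine and database name are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "icecream",
        # Views are no longer wrapped in a transaction automatically.
        "ATOMIC_REQUESTS": False,
    }
}
```

With this in place, any code that still needs all-or-nothing behavior has to ask for it explicitly, for example with django.db.transaction.atomic().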

26.3 Get the Most Out of Your Database


You can go a bit deeper beyond optimizing database access. Optimize the database itself!
Much of this is database-specific and already covered in other books, so we won’t go into
too much detail here.

26.3.1 Know What Doesn’t Belong in the Database


Frank Wiles of Revolution Systems taught us that there are two things that should never
go into any large site’s relational database:

Logs. Don’t add logs to the database. Logs may seem OK on the surface, especially in
development. Yet adding this many writes to a production database will slow its
performance. When the ability to easily perform complex queries against your logs is
necessary, we recommend third-party services such as Splunk or Loggly, or use of
document-based NoSQL databases.

Ephemeral data. Don’t store ephemeral data in the database. What this means is data that
requires constant rewrites is not ideal for use in relational databases. This includes examples
such as django.contrib.sessions, django.contrib.messages, and metrics. Instead, move this
data to things like Memcached, Redis, and other non-relational stores.
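
As one possible way to act on this for sessions and messages, the settings fragment below moves sessions into the cache and keeps messages in cookies. It is a sketch that assumes a cache named "default" is already configured in CACHES. Keep in mind that cache-only sessions can disappear if the cache evicts them or restarts.

```python
# Settings fragment. Assumes CACHES already defines a "default" cache.
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
SESSION_CACHE_ALIAS = "default"

# Store one-time messages in a cookie rather than in the session.
MESSAGE_STORAGE = "django.contrib.messages.storage.cookie.CookieStorage"
```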

TIP: Frank Wiles on Binary Data in Databases


Actually, Frank says that there are three things to never store in a database, the
third item being binary data. Storage of binary data in databases is addressed by
django.db.models.FileField, which does the work of storing files on file
servers like AWS CloudFront or S3 for you. Exceptions to this are detailed in
Section 6.4.5: When to Use BinaryField.

26.3.2 Getting the Most Out of PostgreSQL


If using PostgreSQL, be certain that it is set up correctly in production. As this is outside
the scope of the book, we recommend the following articles:

• wiki.postgresql.org/wiki/Detailed_installation_guides
• wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
• revsys.com/writings/postgresql-performance.html
• craigkerstiens.com/2012/10/01/understanding-postgres-performance
• craigkerstiens.com/2013/01/10/more-on-postgres-performance

26.3.3 Getting the Most Out of MySQL


It’s easy to get MySQL running, but optimizing production installations requires experience
and understanding. As this is outside the scope of this book, we recommend the following
links to help you:

• TODO - list good resources

26.4 Cache Queries With Memcached or Redis


You can get a lot of mileage out of simply setting up Django’s built-in caching system with
Memcached or Redis. You will have to install one of these tools, install a package that
provides Python bindings for them, and configure your project.

You can easily set up the per-site cache, or you can cache the output of individual views or
template fragments. You can also use Django’s low-level cache API to cache Python objects.
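
As a sketch of the configuration step, here is a settings fragment that points Django’s cache framework at Redis through the django-redis package. The host, port, and database number in LOCATION are placeholders for a local development setup.

```python
# Settings fragment. LOCATION is a placeholder for a local Redis server.
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
    }
}
```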

Reference material:

• docs.djangoproject.com/en/3.2/topics/cache/
• github.com/niwinz/django-redis

26.5 Identify Specific Places to Cache


Deciding where to cache is like being first in a long line of impatient customers at Ben and
Jerry’s on free scoop day. You are under pressure to make a quick decision without being
able to see what any of the flavors actually look like.

Here are things to think about:

• Which views/templates contain the most queries?
• Which URLs are being requested the most?
• When should a cache for a page be invalidated?

Let’s go over the tools that will help you with these scenarios.

26.6 Consider Third-Party Caching Packages


Third-party packages will give you additional features such as:

• Caching of QuerySets.
• Cache invalidation settings/mechanisms.
• Different caching backends.
• Alternative or experimental approaches to caching.

A few of the popular Django packages for caching are:

• django-cacheops
• django-cachalot

See djangopackages.org/grids/g/caching/ for more options.

WARNING: Third-Party Caching Libraries Aren’t Always the Answer

Having tried many of the third-party Django cache libraries, we have to ask our
readers to test them very carefully and be prepared to drop them. They are cheap,
quick wins, but can lead to some hair-raising debugging efforts at the worst possible
times.

Cache invalidation is hard, and in our experience, magical cache libraries are better
for projects with more static content. By-hand caching is a lot more work, but leads
to better performance in the long run and doesn’t risk those terrifying moments.
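
To show what by-hand caching with explicit invalidation can look like, here is a self-contained sketch. TinyCache is a hypothetical in-memory stand-in that only mimics the get/set/delete shape of a key/value cache so the example runs without Memcached or Redis. The FLAVORS dictionary plays the role of the database, and every function name is an assumption made for this illustration.

```python
import time


class TinyCache:
    """In-memory stand-in with get/set/delete, like a key/value cache."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        value, expires_at = self._data.get(key, (default, None))
        if expires_at is not None and expires_at <= time.monotonic():
            del self._data[key]  # the entry has timed out
            return default
        return value

    def set(self, key, value, timeout=300):
        self._data[key] = (value, time.monotonic() + timeout)

    def delete(self, key):
        self._data.pop(key, None)


cache = TinyCache()
FLAVORS = {1: "Vanilla"}  # pretend this dictionary is the database
db_reads = []  # records each simulated database read


def load_flavor(pk):
    """Simulate the slow query we want to avoid repeating."""
    db_reads.append(pk)
    return FLAVORS[pk]


def get_flavor(pk):
    """Read through the cache, falling back to the database on a miss."""
    key = f"flavor:{pk}"
    name = cache.get(key)
    if name is None:
        name = load_flavor(pk)
        cache.set(key, name, timeout=60)
    return name


def rename_flavor(pk, name):
    """Write to the database and invalidate the cached copy."""
    FLAVORS[pk] = name
    cache.delete(f"flavor:{pk}")


first = get_flavor(1)   # miss: reads the "database"
second = get_flavor(1)  # hit: served from the cache
rename_flavor(1, "Chocolate")
third = get_flavor(1)   # miss again, because the key was deleted
```

The extra work is in rename_flavor(): every code path that changes the data has to remember to delete the key, which is exactly the bookkeeping the magical libraries try to automate.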

26.7 Compression and Minification of HTML, CSS, and JavaScript

When a browser renders a web page, it usually has to load HTML, CSS, JavaScript, and im-
age files. Each of these files consumes the user’s bandwidth, slowing down page loads. One
way to reduce bandwidth consumption is via compression and minification. Django even
provides tools for you: GZipMiddleware and the {% spaceless %} template tag. Through
the at-large Python community, we can even use WSGI middleware that performs the
same task.

The problem with making Django and Python do the work is that compression and minifica-
tion take up system resources, which can create bottlenecks of their own. A better approach
is to use Nginx or Apache web servers configured to compress the outgoing content. If you
are maintaining your own web servers, this is absolutely the way to go.
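
To get a feel for what is being traded, here is a sketch of the compression step on its own, using Python’s standard gzip module. The HTML body is a made-up, highly repetitive example, so the ratio it prints is more flattering than a typical page would be.

```python
import gzip

# Hypothetical response body. Repetitive markup compresses very well.
html = ("<li class='flavor'>Vanilla</li>\n" * 200).encode("utf-8")

compressed = gzip.compress(html)
restored = gzip.decompress(compressed)

print(len(html), len(compressed))  # original size vs. compressed size
```

The bytes saved on the wire are paid for with CPU time on whichever machine runs the compression, which is why moving that work to the web server or doing it ahead of time is attractive.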

A common approach is to use a third-party compression module or Django library to
compress and minify the HTML, CSS, and JavaScript in advance. Our preference is
django-pipeline which comes recommended by Django core developer Jannis Leidel.

For CSS and JavaScript, most people use JavaScript-powered tools for minification. Tools
like django-webpack-loader manage the JavaScript libraries within the Django context.
The advantage of this approach is the greater mindshare of tools and solved problems in
this domain space.

Tools and libraries to reference:

• Apache and Nginx compression modules
• django-webpack-loader
• django-pipeline
• django-compressor
• django-htmlmin
• Django’s built-in spaceless tag:
docs.djangoproject.com/en/3.2/ref/templates/builtins/spaceless
• djangopackages.org/grids/g/asset-managers/

26.8 Use Upstream Caching or a Content Delivery Network

Upstream caches such as Varnish are very useful. They run in front of your web server and
speed up web page or content serving significantly. See varnish-cache.org.

Content Delivery Networks (CDNs) like Fastly, Akamai, and Amazon CloudFront serve
static media such as images, video, CSS, and JavaScript files. They usually have servers all
over the world, which serve out your static content from the nearest location. Using a CDN
rather than serving static content from your application servers can speed up your projects.

26.9 Other Resources


Advanced techniques on scaling, performance, tuning, and optimization are beyond the
scope of this book, but here are some starting points.

On general best practices for web performance:

• TODO - add useful links

On scaling large Django sites:

• “The Temple of Django Database Performance” is a book that dives deep into
optimizing Django projects for speed and scalability. It’s a delightful book full of
fantasy and RPG references and worth every penny.
spellbookpress.com/books/temple-of-django-database-performance/
• Written with a focus on scaling Django, the book “High Performance Django” espouses
many good practices. Full of tricks and tips, as well as questions in each section that
force you to think about what you are doing. Dated in places, but still full of useful
information. highperformancedjango.com
• Watch videos of presentations from past DjangoCons and PyCons about different
developers’ experiences. Scaling practices vary from year to year and from company to
company: https://www.youtube.com/results?search_query=django+scaling

Figure 26.1: With your site running smoothly, you’ll be feeling as cool as a cone.

TIP: For Sites With High Volume: High Performance Django


We want to reiterate that “High Performance Django” is worth getting if your site
has enough traffic to cause issues. While it’s getting old, Peter Baumgartner and
Yann Malet wrote the book more at the conceptual level, making it a volume that
you should consider purchasing.
• highperformancedjango.com
• amazon.com/High-Performance-Django/dp/1508748128

26.10 Summary
In this chapter, we explored a number of bottleneck reduction strategies including:

• Whether you should even care about bottlenecks in the first place.
• Profiling your pages and queries.
• Optimizing queries.
• Using your database wisely.
• Caching queries.
• Identifying what needs to be cached.
• Compression of HTML, CSS, and JavaScript.
• Exploring other resources.

In the next chapter, we’ll cover various practices involving asynchronous task queues, which
may resolve our bottleneck problems.
27 | Asynchronous Task Queues

WARNING: This Chapter is in Progress


We are in the midst of working on this chapter and will expand on it in the days to
come. We are open to suggestions on topics and items to cover; please submit them
to github.com/feldroy/two-scoops-of-django-3.x/issues

An asynchronous task queue is one where tasks are executed at a different time from when
they are created, and possibly not in the same order they were created. Here is an example
of a human-powered asynchronous task queue:

1 In their spare time, Audrey and Daniel make ice cream cakes, taking orders from
friends and family. They use an issue tracker to track their tasks for scooping, spread-
ing, and decorating each cake.
2 Every so often, when they have spare time, they review the list of tasks and pick
one to do. Audrey prefers scooping and decorating, always doing those tasks first.
Daniel prefers scooping and spreading, finishing those before decorating. The result
is asynchronous completion of cake-making tasks.
3 As a cake-making task is completed and delivered, they mark the issue as closed.

TIP: Task Queue vs Asynchronous Task Queue


In the Django world, both terms are used to describe an asynchronous task queue.
When someone writes task queue in the context of Django, they usually mean an
asynchronous task queue.

Before we get into best practices, let’s go over some definitions:

Broker: The storage for the tasks themselves. This can be implemented using any sort of
persistence tool, although in Django the most common ones in use are RabbitMQ
and Redis. In the human-powered example, the storage is an online issue tracker.
Producer: The code that adds tasks to the queue to be executed later. This is application
code, the stuff that makes up a Django project. In the human-powered example, this
would be Audrey and Daniel, plus anyone they can get to pitch in to help.
Worker: The code that takes tasks from the broker and performs them. Usually there is more
than one worker. Most commonly each worker runs as a daemon under supervision.
In the human-powered example, this is Audrey and Daniel.
Serverless: Usually provided by services such as AWS Lambda, this is, to paraphrase
Martin Fowler, “where some amount of server-side logic is written by us but unlike
traditional architectures is run in stateless compute containers that are event-triggered,
ephemeral (only last for one invocation), and fully managed by a 3rd party.”
Serverless takes over the role of Broker and Worker. In the human-powered example, it’s
as if Daniel and Audrey use a third-party service to take the orders and then follow
their precise instructions on doing the work.
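
The first three roles can be sketched in a few lines of standard-library Python. This is a toy model, not how a production task queue is built: an in-process queue.Queue stands in for the broker, threads stand in for worker daemons, and the names produce, worker, and finished are hypothetical.

```python
import queue
import threading

broker = queue.Queue()  # stands in for RabbitMQ or Redis
finished = []           # results recorded by the workers
finished_lock = threading.Lock()


def produce(cake_id, step):
    """Producer: put a task on the broker and return immediately."""
    broker.put((cake_id, step))


def worker():
    """Worker: take tasks off the broker until a None stop signal arrives."""
    while True:
        task = broker.get()
        if task is None:
            break
        cake_id, step = task
        with finished_lock:
            finished.append(f"cake {cake_id}: {step}")


workers = [threading.Thread(target=worker) for _ in range(2)]
for thread in workers:
    thread.start()

for cake_id in (1, 2):
    for step in ("scoop", "spread", "decorate"):
        produce(cake_id, step)

for _ in workers:
    broker.put(None)  # one stop signal per worker
for thread in workers:
    thread.join()

print(sorted(finished))
```

The order in which the two workers finish tasks is not guaranteed, which is why the result is sorted before printing. That is the asynchronous part: the producer does not wait, and completion order can differ from creation order.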

27.1 Do We Need a Task Queue?


It depends. They add complexity but can improve user experience. Arguably it comes down
to whether a particular piece of code causes a bottleneck and can be delayed for later when
more free CPU cycles are available.

Here is a useful rule of thumb for determining if a task queue should be used:

Results take time to process: Task queue should probably be used.


Users can and should see results immediately: Task queue should not be used.

Let’s go over some possible use cases:

Issue                                                             Use Task Queue?

Sending bulk email                                                Yes
Modifying files (including images)                                Yes
Fetching large amounts of data from third-party Ice Cream APIs    Yes
Inserting or updating a lot of records into a table               Yes
Updating a user profile                                           No
Adding a blog or CMS entry                                        No
Performing time-intensive calculations                            Yes
Sending or receiving of webhooks                                  Yes

Table 27.1: Should a Project Have a Task Queue?

Please keep in mind there are site-traffic driven exceptions to all of these use cases:

• Sites with small-to-medium amounts of traffic may never need a task queue for any
of these actions.
• Sites with larger amounts of traffic may discover that nearly every user action requires
use of a task queue.

Determining whether or not a site or action needs a task queue is a bit of an art. There is no
easy answer we can provide. However, knowing how to use them is a really powerful tool in
any developer’s toolkit.
28 | Security Best Practices

When it comes to security, Django has a pretty good record. This is due to security tools
provided by Django, solid documentation on the subject of security, and a thoughtful team
of core developers who are extremely responsive to security issues. However, it’s up to indi-
vidual Django developers such as ourselves to understand how to properly secure Django-
powered applications.

This chapter contains a list of things helpful for securing your Django application. This list
is by no means complete. Consider it a starting point.

TIP: What to Do if You Have a Security Breach


If you’re in the midst of a security crisis, please go to Chapter 37: Appendix G:
Handling Security Failures.

28.1 Reference Security Sections in Other Chapters


A number of other chapters in this book contain dedicated security sections, or touch on
security matters. These are found at the following locations:

• Section 5.3: Separate Configuration From Code
• Section 13.3: Always Use CSRF Protection With HTTP Forms That Modify Data
• ??: ??
• Section 28.28: Never Display Sequential Primary Keys
• Section 21.8: Secure the Django Admin
• Chapter 37: Appendix G: Handling Security Failures

28.2 Harden Your Servers


Search online for instructions and checklists for server hardening. Server hardening mea-
sures include but are not limited to things like setting up firewalls (help.ubuntu.com/
community/UFW), changing your SSH port, and disabling/removing unnecessary services.

28.3 Know Django’s Security Features


Django’s security features include:

• Cross-site scripting (XSS) protection.
• Cross-site request forgery (CSRF) protection.
• SQL injection protection.
• Clickjacking protection.
• Support for TLS/HTTPS/HSTS, including secure cookies.
• Automatic HTML escaping.
• An expat parser hardened against XML bomb attacks.
• Hardened JSON, YAML, and XML serialization/deserialization tools.
• Secure password storage, using the PBKDF2 algorithm with a SHA256 hash by
default. That said, please upgrade per Section 28.29: Upgrade Password Hasher to
Argon2.

Most of Django’s security features “just work” out of the box without additional configura-
tion, but there are certain things that you’ll need to configure. We’ve highlighted some of
these details in this chapter, but please make sure that you read the official Django documen-
tation on security as well: docs.djangoproject.com/en/3.2/topics/security/

28.4 Turn Off DEBUG Mode in Production


Your production site should not be running in DEBUG mode. Attackers can find out
more than they need to know about your production setup from a helpful DEBUG mode
stack trace page. For more information, see docs.djangoproject.com/en/3.2/ref/
settings/#debug.

Keep in mind that when you turn off DEBUG mode, you will need to set ALLOWED_HOSTS
or risk raising a SuspiciousOperation error, which generates a 400 BAD REQUEST error
that can be hard to debug. For more information on setting/debugging ALLOWED_HOSTS
see:

• Section 28.7: Use Allowed Hosts Validation
• ??: ??
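
In settings terms, the two values travel together. The fragment below is a sketch for a production settings module; the host names are placeholders.

```python
# Production settings fragment. The host names are placeholders.
DEBUG = False

# With DEBUG off, requests for any host not listed here are rejected.
ALLOWED_HOSTS = ["example.com", "www.example.com"]
```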

28.5 Keep Your Secret Keys Secret


If the SECRET_KEY setting is not secret, depending on project setup, we risk an at-
tacker gaining control of other people’s sessions, resetting passwords, and more. Our API
keys and other secrets should be carefully guarded as well. These keys should not even be
kept in version control.

We cover the mechanics of how to keep your SECRET_KEY out of version control
in Chapter 5: Settings and Requirements Files, Section 5.3: Separate Configuration From
Code, and Section 5.4: When You Can’t Use Environment Variables.
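
As a reminder of the general shape, here is a minimal sketch of reading the key from an environment variable and failing loudly when it is missing. The helper name get_env_secret is hypothetical, and it raises a plain RuntimeError only to stay within the standard library. The setdefault() line exists solely so the example runs on its own; a real settings module would not contain it.

```python
import os


def get_env_secret(name):
    """Return a required setting from the environment, or fail loudly."""
    try:
        return os.environ[name]
    except KeyError:
        raise RuntimeError(f"Set the {name} environment variable") from None


# For demonstration only: a real deployment sets this outside the code.
os.environ.setdefault("SECRET_KEY", "not-a-real-key")

SECRET_KEY = get_env_secret("SECRET_KEY")
```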

28.6 HTTPS Everywhere


Sites must always be deployed behind HTTPS. Not having HTTPS means that malicious
network users will sniff authentication credentials between your site and end users. In fact,
all data sent between your site and end users is up for grabs.

There is also no guarantee that any of your users are seeing what you expect them to see: an
attacker could manipulate anything in the request or the response. So HTTPS makes sense
even if all your information is public, but you do care about the integrity of the information.

Your entire site must be behind HTTPS. Your site’s static resources must also be served via
HTTPS, otherwise visitors will get warnings about “insecure resources” which will rightly
scare them away from your site. For reference, these warnings exist because they are a po-
tential man-in-the-middle vector.

TIP: Jacob Kaplan-Moss on HTTPS vs HTTP


Django co-leader Jacob Kaplan-Moss says, “Your whole site should only be available
via HTTPS, not HTTP at all. This prevents getting “firesheeped” (having a session
cookie stolen when served over HTTP). The cost is usually minimal.”

If visitors try to access your site via HTTP, they must be redirected to HTTPS. This
can be done either through configuration of your web server or with Django middleware.
Performance-wise, it’s better to do this at the web server level, but if you don’t have control
over your web server settings for this, then redirecting via Django middleware is fine.

In the modern era obtaining SSL certificates is cheap and easy. Depending on your
platform, it’s even provided for free. To set it up, follow the instructions for your particular
web server or platform-as-a-service. Our preferred services are the very reputable (and free)
letsencrypt.org and cloudflare.com. These services make it trivial to secure sites
with proper SSL certificates.

Large cloud providers such as AWS, Google, and Microsoft also provide options.

WARNING: The Most Secure Option is to Set Up Your Own SSL


While services like CloudFlare, AWS, Google Cloud, and Azure make it easy to
set up SSL, they aren’t as secure as letsencrypt.org. We can best explain this
through the Principle of Least Privilege (POLP). This is the practice of limiting
access rights for users to the bare minimum permissions they need to perform their
work.
While it is unlikely that a penetration will occur on these systems, either from an
outsider or an employee, it is not outside the realm of possibility.
For many systems the time benefit of using someone else’s SSL is worth it. However,
sites containing critical privacy information (HIPAA comes to mind) should set up
their own SSL.

WARNING: Not having HTTPS is Inexcusable


At the publication of this book it is 2020 and getting HTTPS/SSL into place is
relatively easy. Furthermore, browsers give users bold unavoidable warnings about
lack of SSL. Users of all ages and technical acumen are becoming knowledgeable of
what these warnings mean. Search engines harshly penalize sites without it.
Going forward there is no excuse for having a production site not on HTTPS. Don’t
do it, even for demos or MVPs.

TIP: Use django.middleware.security.SecurityMiddleware


The tool of choice for projects on Django for enforcing HTTPS/SSL across an
entire site through middleware is built right in. To activate this middleware just
follow these steps:
1 Add django.middleware.security.SecurityMiddleware to the
settings.MIDDLEWARE definition.
2 Set settings.SECURE_SSL_REDIRECT to True.
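
Put together, the two steps might look like the following settings fragment. This is a sketch rather than a complete configuration: the middleware list is shortened, and a real project will have more entries.

```python
# settings.py (fragment): a shortened middleware list for illustration.
MIDDLEWARE = [
    # Listed first so the redirect happens before other middleware runs.
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
]

# Redirect every plain-HTTP request to its HTTPS equivalent.
SECURE_SSL_REDIRECT = True
```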

WARNING: django.middleware.security.SecurityMiddleware
Does Not Include static/media
Even if all Django requests are served over HTTPS, omitting HTTPS for resources
like JavaScript would still allow attackers to compromise your site.
As JavaScript, CSS, images, and other static assets are typically served directly by
the web server (nginx, Apache), make certain that serving of such content is done
via HTTPS. Providers of static assets such as Amazon S3 now do this by default.

28.6.1 Use Secure Cookies


Your site should inform the target browser to never send cookies unless via HTTPS. You’ll
need to set the following in your settings:

Example 28.1: Securing Cookies

SESSION_COOKIE_SECURE = True
CSRF_COOKIE_SECURE = True

Read docs.djangoproject.com/en/3.2/topics/security/#ssl-https for more
details.

28.6.2 Use HTTP Strict Transport Security (HSTS)


HSTS can be configured at the web server level. Follow the instructions for your web server,
platform-as-a-service, and Django itself (via settings.SECURE_HSTS_SECONDS ).

If you have set up your own web servers, Wikipedia has sample HSTS configuration snippets
that you can use: en.wikipedia.org/wiki/HTTP_Strict_Transport_Security

When you enable HSTS, your site’s web pages include an HTTP header that tells HSTS-
compliant browsers to only connect to the site via secure connections:

ä HSTS-compliant browsers will redirect HTTP links to HTTPS.


ä If a secure connection isn’t possible (e.g. the certificate is self-signed or expired), an
error message will be shown and access will be disallowed.

To give you a better idea of how this works, here’s an example of what an HTTP Strict
Transport Security response header might look like:

Example 28.2: HSTS Response Header

Strict-Transport-Security: max-age=31536000; includeSubDomains

Some HSTS configuration advice:

1 Set max-age to a small value like 300 (5 minutes) during initial deployment of a
secured site to make sure you haven’t screwed something up or forgotten to make
some portion of the site available via HTTPS. We suggest this small value because
once you set max-age, we can’t unset it for users; their browsers control expiration,
not us.
2 Once it’s been confirmed the site is properly secured, set max-age to incrementally
larger values. Go to 1 hour (3600), 1 week (604800), 1 month (2592000), and finally
1 year (31536000). This allows us to correct issues that may not turn up in initial
checks.
3 Once the project’s max-age is set to 1 year or greater (31536000), submit the project
for HSTS preloading, which can be done at hstspreload.org/. This aids in block-
ing active attackers from intercepting the first request from a user to a site.
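
The staged max-age values above are plain arithmetic on seconds, and the header itself is a short string. The sketch below spells both out; the names HSTS_STAGES and hsts_header_value are ours, not part of Django, which builds the header for us from SECURE_HSTS_SECONDS, SECURE_HSTS_INCLUDE_SUBDOMAINS, and SECURE_HSTS_PRELOAD.

```python
# Staged max-age values from the advice above, in seconds.
HSTS_STAGES = {
    "5 minutes": 5 * 60,
    "1 hour": 60 * 60,
    "1 week": 7 * 24 * 60 * 60,
    "1 month": 30 * 24 * 60 * 60,
    "1 year": 365 * 24 * 60 * 60,
}


def hsts_header_value(max_age, include_subdomains=False, preload=False):
    """Build the value of a Strict-Transport-Security header."""
    value = f"max-age={max_age}"
    if include_subdomains:
        value += "; includeSubDomains"
    if preload:
        value += "; preload"
    return value
```

For example, hsts_header_value(HSTS_STAGES["1 year"], include_subdomains=True) reproduces the header value shown in Example 28.2.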

WARNING: Choose Your HSTS Policy Duration Carefully


Remember that HSTS is a one-way switch. It’s a declaration that for the next N
seconds, your site will be HTTPS-only. Don’t set an HSTS policy with a max-age
longer than you are able to maintain. Browsers do not offer an easy way to unset it.

Note that HSTS should be enabled in addition to redirecting all pages to HTTPS as de-
scribed earlier.

WARNING: Additional Warning for includeSubDomains


We recommend that everyone use HSTS with a long duration and with
includeSubDomains. However, especially in projects with lots of legacy
components, the combination requires great care when configuring.

Example: Imagine we create a new Django website called example.com. The site
is HTTPS with HSTS enabled. We test the HSTS settings, which work fine,
and then increase the duration to a year. Alas, after a month, someone realises
legacy.example.com is still a production service and does not support HTTPS. We
remove includeSubDomains from the header, but by now it’s already too late: all
clients inside the company have the old HSTS header remembered.

In short, before even considering includeSubDomains, one should be entirely aware
of what might be hosted under the domain that HSTS is configured on.

28.6.3 HTTPS Configuration Tools


Mozilla provides an SSL configuration generator at mozilla.github.io/
server-side-tls/ssl-config-generator/, which can provide a starting point for
your own configuration. While not perfect, it expedites setting up HTTPS. As our security
reviewers say, “In general, any HTTPS is better than plain HTTP.”

Once you have a server set up (preferably a test server), use the Qualys SSL Labs server test
at ssllabs.com/ssltest/index.html to see how well you did. A fun security game is
trying to score an A+. Especially as the official Two Scoops of Django reward for getting
that good of a grade is a trip to the local favorite ice cream saloon.

28.7 Use Allowed Hosts Validation


In production, you must set ALLOWED_HOSTS in your settings to a list of allowed
host/domain names in order to avoid raising SuspiciousOperation exceptions. This is a
security measure to prevent the use of fake HTTP host headers to submit requests.

We recommend that you avoid setting wildcard values here. For more information, read the
Django documentation on ALLOWED_HOSTS and the get_host() method:

ä docs.djangoproject.com/en/3.2/ref/settings/#allowed-hosts
ä docs.djangoproject.com/en/3.2/ref/request-response/#django.
http.HttpRequest.get_host
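
One way to keep the host list explicit without hard-coding it is to read it from the environment. The sketch below assumes a comma-separated variable named DJANGO_ALLOWED_HOSTS and a helper called parse_allowed_hosts; both names are ours.

```python
import os


def parse_allowed_hosts(raw):
    """Split a comma-separated string into a list of host names."""
    return [host.strip() for host in raw.split(",") if host.strip()]


# settings.py (fragment)
DEBUG = False
# Assumed variable, e.g. DJANGO_ALLOWED_HOSTS="example.com,www.example.com"
ALLOWED_HOSTS = parse_allowed_hosts(os.environ.get("DJANGO_ALLOWED_HOSTS", ""))
```

An unset variable yields an empty list, so with DEBUG off Django rejects every request until real host names are supplied, which fails safe.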

28.8 Always Use CSRF Protection With HTTP Forms That Modify Data

Django comes with easy-to-use cross-site request forgery protection (CSRF) built in, and
by default it is activated across the site via the use of middleware. Make sure that your
data-changing forms use the POST method, which by default will activate CSRF protection on
the server side. The only valid exception is search forms, which may use the GET method because
it is useful for the user to have the argument visible in the URL. We have some strong
recommendations discussed in Section 13.3: Always Use CSRF Protection With HTTP
Forms That Modify Data.

28.9 Prevent Against Cross-Site Scripting (XSS) Attacks


XSS attacks usually occur when users enter malignant JavaScript that is then rendered into
a template directly. This isn’t the only method, but it is the most common. Fortunately for
us, Django by default escapes <, >, ', ", and &, which is all that is needed for proper HTML
escaping.

The following are recommended by the Django security team:

28.9.1 Use format_html Over mark_safe


Django gives developers the ability to mark content strings as safe, meaning that Django’s
own safeguards are taken away. A better alternative is django.utils.html.format_html,
which is like Python’s str.format() method, except designed for building up HTML
fragments. All args and kwargs are escaped before being passed to str.format() which
then combines the elements.

Reference: docs.djangoproject.com/en/3.2/ref/utils/#django.utils.html.
format_html
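
To illustrate why escaping every argument matters, here is a stdlib-only sketch of the same idea using html.escape. The helper name build_link is hypothetical and this is not Django’s implementation; in a Django project we would call format_html itself, as the closing comment shows.

```python
from html import escape


def build_link(url, label):
    """Escape each argument before interpolating it into the HTML template."""
    return '<a href="{}">{}</a>'.format(escape(url), escape(label))


# Django equivalent:
# from django.utils.html import format_html
# format_html('<a href="{}">{}</a>', url, label)
```

Only the template string written by the developer is trusted; anything passed in as an argument is treated as untrusted text.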

28.9.2 Don’t Allow Users to Set Individual HTML Tag Attributes


If you allow users to set individual attributes of HTML tags, that gives them an avenue for
injecting malignant JavaScript.

28.9.3 Use JSON Encoding for Data Consumed by JavaScript


Rely on JSON encoding rather than finding ways to dump Python structures directly to
templates. It’s not just easier to integrate into client-side JavaScript, it’s safer. To do this,
always use the json_script filter.

Reference: docs.djangoproject.com/en/3.2/ref/templates/builtins/
#json-script
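
In a template the filter is used as {{ flavors|json_script:"flavor-data" }}, where flavors and flavor-data are placeholder names. The core idea is to serialize to JSON and then neutralize the characters that could close or confuse a script element. The stdlib sketch below mimics that idea under our own function name; it is an illustration, not Django’s code.

```python
import json

# Characters that could terminate or confuse a <script> block.
_SCRIPT_ESCAPES = {
    ord("<"): "\\u003C",
    ord(">"): "\\u003E",
    ord("&"): "\\u0026",
}


def json_for_script(value):
    """Serialize to JSON, then neutralize <, > and & for safe embedding."""
    return json.dumps(value).translate(_SCRIPT_ESCAPES)
```

Because the replacements are valid JSON unicode escapes, JSON.parse on the client side recovers the original data unchanged.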

28.9.4 Beware Unusual JavaScript


Due to JavaScript’s weird semantics, it’s possible to construct syntactically-valid, executable
programs from a very tiny subset of characters. Per feld.to/unusual-javascript, it’s
possible to transform regular-looking JavaScript into an alphabet of only six characters (plus
sign, exclamation mark, open/close bracket and open/close parenthesis).

WARNING: NSFW: feld.to/unusual-javascript not safe for work


feld.to/unusual-javascript redirects to a page whose own URL may violate
educational or corporate viewing policies.

28.9.5 Add Content Security Policy Headers


Also known as CSP, Content Security Policy provides a standard method to declare ap-
proved origins of content that browsers should be allowed to load on a website. Covered
types are JavaScript, CSS, HTML frames, web workers, fonts, images, embeddable objects
such as Java applets, ActiveX, audio and video files, and other HTML5 features. Option-
ally, CSP can be used with a violation-report URL collected by a project or managed by a
third-party service.

ä en.wikipedia.org/wiki/Content_Security_Policy
ä github.com/mozilla/django-csp
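
A CSP header is simply a list of directives, each followed by its approved sources. The sketch below assembles one from a dictionary; the function name, the policy contents, and the CDN host are placeholders, and a package such as django-csp would normally manage this for us.

```python
def build_csp_header(policy):
    """Join directives into a Content-Security-Policy header value."""
    return "; ".join(
        f"{directive} {' '.join(sources)}" for directive, sources in policy.items()
    )


# Example policy: the CDN host is a placeholder.
POLICY = {
    "default-src": ["'self'"],
    "img-src": ["'self'", "https://cdn.example.com"],
    "script-src": ["'self'"],
}
```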

28.9.6 Additional Reading


There are other avenues of attack that can occur, so educating yourself is important.

ä docs.djangoproject.com/en/3.2/ref/templates/builtins/#escape
ä en.wikipedia.org/wiki/Cross-site_scripting

28.10 Defend Against Python Code Injection Attacks


We once were hired to help with a project that had some security issues. The requests coming
into the site were being converted from django.http.HttpRequest objects directly into
strings via creative use of the str() function, then saved to a database table. Periodically,
these archived Django requests would be taken from the database and converted into Python
dicts via the eval() function. This meant that arbitrary Python code could be run on the
site at any time.

Needless to say, upon discovery the critical security flaw was quickly removed. This just
goes to show that no matter how secure Python and Django might be, we always need to
be aware that certain practices are incredibly dangerous.

28.10.1 Python Built-Ins That Execute Code


Beware of the eval() and exec() built-ins (and execfile() in legacy Python 2 code). If
your project allows arbitrary strings or files to be passed into any of these functions, you
are leaving your system open to attack.

For more information, read “Eval Really Is Dangerous” by Ned Batchelder:


nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
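
For the archived-request scenario described above, there are safer ways to turn stored text back into a dict. A minimal sketch, with a function name of our own choosing: ast.literal_eval accepts only Python literals, and JSON avoids Python syntax altogether.

```python
import ast
import json


def load_archived_dict(text):
    """Parse a stored dict literal without executing any code."""
    return ast.literal_eval(text)


# Better still, store JSON in the first place.
archived = json.dumps({"path": "/flavors/", "method": "GET"})
restored = json.loads(archived)
```

Passing a string such as "__import__('os').getcwd()" to load_archived_dict raises ValueError instead of running it.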

28.10.2 Python Standard Library Modules That Can Execute Code


“Never unpickle data that could have come from an untrusted source, or that could
have been tampered with.”

– docs.python.org/3/library/pickle.html

You should not use the Python standard library’s pickle module to deserialize anything
which could have been modified by the user. As a general rule, avoid accepting pickled values
from users for any reason. Specific warnings about pickle and security are listed below:

ä lincolnloop.com/blog/playing-pickle-security/
ä blog.nelhage.com/2011/03/exploiting-pickle/
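
To see why unpickling is dangerous, consider this deliberately harmless sketch. The class name Exploit is ours; its __reduce__ method tells pickle which callable to invoke at load time, and here that callable is eval.

```python
import pickle


class Exploit:
    """A harmless stand-in for a malicious payload."""

    def __reduce__(self):
        # pickle will call eval("1 + 1") while loading; an attacker
        # would substitute something far more destructive.
        return (eval, ("1 + 1",))


payload = pickle.dumps(Exploit())
result = pickle.loads(payload)  # runs eval; it does not rebuild an Exploit
```

Nothing in the loading code asks for eval to run; the bytes themselves decide, which is why the source of those bytes must be trusted.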

28.10.3 Third-Party Libraries That Can Execute Code

When using PyYAML, only use safe_load() . While the use of YAML in the Python
and Django communities is rare outside of continuous integration, it’s not uncommon to
receive this format from other services. Therefore, if you are accepting YAML documents,
only load them with the yaml.safe_load() method.

For reference, the yaml.load() method will let you create Python objects,
which is really bad. As Ned Batchelder says, yaml.load() should be renamed to
yaml.dangerous_load() : nedbatchelder.com/blog/201302/war_is_peace.
html

28.10.4 Be Careful With Cookie-Based Sessions

Typically most Django sites use either database- or cache-based sessions. These function
by storing a hashed random value in a cookie which is used as a key to the real session
value, which is stored in the database or cache. The advantage of this is that only the key
to the session data is sent to the client, making it very challenging for malignant coders to
penetrate Django’s session mechanism.

However, Django sites can also be built using cookie-based sessions, which place the session
data entirely on the client’s machine. While this means slightly less storage needs for the
server, it comes with security issues that justify caution. Specifically:

1 It is possible for users to read the contents of cookie-based sessions.


2 If an attacker gains access to a project’s SECRET_KEY and your session serializer
is JSON-based, they gain the ability to falsify session data and therefore, if authenti-
cation is used, impersonate any user.
3 If an attacker gains access to a project’s SECRET_KEY and your session serializer
is pickle-based, they gain the ability to not only falsify session data, but also to execute
arbitrary code. In other words, not only can they assume new rights and privileges,
they can also upload working Python code. If you are using pickle-based sessions or
are considering using them, please read the tip below.
4 Another disadvantage of this configuration is that sessions can’t be invalidated in a
guaranteed way (except when they expire): you can try to override the cookie in the
browser with a new value, but you can’t force an attacker to use it: if they continue
sending requests with the old cookie, the session backend won’t know the difference.

TIP: Use JSON for Cookie-Based Sessions


The default cookie serializer for Django is JSON-based, meaning that even if an
attacker discovers a project’s SECRET_KEY , they can’t execute arbitrary code.
If you decide to write your own cookie serializer, stick to using JSON as the format.
Never, ever use the optional pickle serializer.

Resources on the subject:


ä docs.djangoproject.com/en/3.2/topics/http/sessions/
#session-serialization
ä docs.djangoproject.com/en/3.2/ref/settings/
#session-serializer
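
As a settings sketch, a project that wants server-side sessions with the JSON serializer could state both choices explicitly. Both values shown are Django’s defaults, so this fragment documents intent rather than changing behavior.

```python
# settings.py (fragment)

# Keep session data on the server; only the session key goes in the cookie.
SESSION_ENGINE = "django.contrib.sessions.backends.db"

# JSON is Django's default serializer; stated explicitly here for clarity.
SESSION_SERIALIZER = "django.contrib.sessions.serializers.JSONSerializer"
```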

Another thing to consider is that cookie-based sessions are a potential client-side perfor-
mance bottleneck. Transmitting the session data server-to-client is generally not an issue,
but client-to-server transmissions are much, much slower. This is literally the difference
between download and upload speeds all internet users encounter.

In general, we try to avoid cookie-based sessions.

Additional reading:

ä docs.djangoproject.com/en/3.2/topics/http/sessions/
#using-cookie-based-sessions
ä http://bit.ly/2plfHqU Threatpost.com article on cookies
ä yuiblog.com/blog/2007/03/01/performance-research-part-3/

28.11 Validate All Incoming Data With Django Forms


Django forms should be used to validate all data being brought into your project, including
from non-web sources. Doing so protects the integrity of our data and is part of securing
your application. We cover this in Section 13.1: Validate All Incoming Data With Django
Forms.

TIP: Using DRF Serializers Instead of Django Forms


Django REST Framework’s validation is as well constructed and secure as Django’s
form libraries. If you are more familiar with DRF, then using serializers to validate
all incoming data is perfectly okay.

28.12 Disable the Autocomplete on Payment Fields


You should disable the HTML field autocomplete browser feature on fields that are gate-
ways to payment. This includes credit card numbers, CVVs, PINs, credit card dates, etc.
The reason is that a lot of people use public computers or their personal computers in public
venues.

For reference, Django forms make this easy:

Example 28.3: Disabling Autocomplete in Form Fields

from django import forms

class SpecialForm(forms.Form):
    my_secret = forms.CharField(
        widget=forms.TextInput(attrs={'autocomplete': 'off'}))

For any site that might be used in a public area (an airport for example), consider changing
the form field itself to PasswordInput:

Example 28.4: Changing Public Widget to PasswordInput

from django import forms

class SecretInPublicForm(forms.Form):

    my_secret = forms.CharField(widget=forms.PasswordInput())

28.13 Handle User-Uploaded Files Carefully


The only way to completely safely serve user-provided content is from a completely sepa-
rate domain. For better or worse, there are an infinite number of ways to bypass file type
validators. This is why security experts recommend the use of content delivery networks
(CDNs): they serve as a place to store potentially dangerous files.

If you must allow upload and download of arbitrary file types, make sure that the server
uses the “Content-Disposition: attachment” header so that browsers won’t display
the content inline.
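
A sketch of the response headers involved, using a hypothetical helper name. The filename is reduced to a conservative character set so that it cannot break out of the quoted header value. In Django itself, FileResponse accepts as_attachment=True and a filename argument, which sets this header for us.

```python
import re


def attachment_headers(filename):
    """Headers that ask the browser to download rather than render a file."""
    # Keep only a conservative set of characters in the suggested name.
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", filename)
    return {
        "Content-Type": "application/octet-stream",
        "Content-Disposition": f'attachment; filename="{safe_name}"',
    }
```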

28.13.1 When a CDN Is Not an Option


When this occurs, uploaded files must be saved to a directory that does not allow them to
be executed. In addition, at the very least make sure the HTTP server is configured to serve
images with image content type headers, and that uploads are restricted to a whitelisted
subset of file extensions.

Take extra care with your web server’s configuration here, because a malicious user can try to
attack your site by uploading an executable file like a CGI or PHP script and then accessing
the URL. This won’t solve every problem, but it’s better than the defaults.

Consult your web server’s documentation for instructions on how to configure this, or con-
sult the documentation for your platform-as-a-service for details about how static assets and
user-uploaded files should be stored.

28.13.2 Django and User-Uploaded Files


Django has two model fields that allow for user uploads: FileField and ImageField.
They come with some built-in validation, but the Django docs also strongly advise you to
“pay close attention to where you’re uploading them and what type of files they are, to avoid
security holes.”

If you are only accepting uploads of certain file types, do whatever you can do to ensure that
the user is only uploading files of those types. For example, you can:

ä Use the python-magic library to check the uploaded file’s headers:
github.com/ahupp/python-magic
ä Validate the file with a Python library that specifically works with that file type. Unfor-
tunately this isn’t documented, but if you dig through Django’s ImageField source
code, you can see how Django uses PIL to validate that uploaded image files are in
fact images.
ä Use defusedxml instead of native Python XML libraries or lxml. See Section 28.21:
Guard Against XML Bombing With defusedxml.
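
As a rough stdlib-only illustration of header checking, the sketch below combines an extension allow-list with a comparison of the file’s leading bytes. The names are ours, the signature table covers only three image formats, and this is a first filter rather than a replacement for python-magic or a format-specific library.

```python
import os

# Leading "magic" bytes for a few image formats.
SIGNATURES = {
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".gif": (b"GIF87a", b"GIF89a"),
}


def looks_like_allowed_image(filename, head):
    """Check the extension allow-list and that the first bytes match it."""
    ext = os.path.splitext(filename)[1].lower()
    signatures = SIGNATURES.get(ext)
    if signatures is None:
        return False
    return head.startswith(signatures)
```

Here head would be the first few bytes read from the uploaded file. A PHP script renamed to cone.png fails the byte check, and a genuine PNG named shell.php fails the extension check.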

WARNING: Custom Validators Aren’t the Answer Here


Don’t just write a custom validator and expect it to validate your uploaded files be-
fore dangerous things happen. Custom validators are run against field content after
they’ve already been coerced to Python by the field’s to_python() method.

If the contents of an uploaded file are malicious, any validation happening after
to_python() is executed may be too late.

Further research:

ä docs.djangoproject.com/en/3.2/ref/models/fields/#filefield
ä youtube.com/watch?v=HS8KQbswZkU Tom Eastman’s PyCon AU talk
“The dangerous, exquisite art of safely handing user-uploaded files” is required
watching if building a site handling file uploads.

28.14 Don’t Use ModelForms.Meta.exclude


When using ModelForms, always use Meta.fields. Never use Meta.exclude. The use
of Meta.exclude is considered a grave security risk, specifically a Mass Assignment Vul-
nerability. We can’t stress this strongly enough. Don’t do it.

One common reason we want to avoid the Meta.exclude attribute is that its behavior
implicitly allows all model fields to be changed except for those that we specify. When using
the exclude attribute, if the model changes after the form is written, we have to remember
to change the form. If we forget to change the form to match the model changes, we risk
catastrophe.

Let’s use an example to show how this mistake could be made. We’ll start with a simple ice
cream store model:

Example 28.5: Sample Store Model

# stores/models.py
from django.conf import settings
from django.db import models

class Store(models.Model):
    title = models.CharField(max_length=255)
    slug = models.SlugField()
    # on_delete is a required argument since Django 2.0
    owner = models.ForeignKey(
        settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    # Assume 10 more fields that cover address and contact info.

Here is the wrong way to define the ModelForm fields for this model:

Example 28.6: Implicit Definition of Form Fields

# DON'T DO THIS!
from django import forms

from .models import Store

class StoreForm(forms.ModelForm):

    class Meta:
        model = Store
        # DON'T DO THIS: Implicit definition of fields.
        # Too easy to make mistakes!
        exclude = ("pk", "slug", "modified", "created", "owner")

In contrast, this is the right way to define the same ModelForm’s fields:

Example 28.7: Explicit Definition of Form Fields

from django import forms

from .models import Store

class StoreForm(forms.ModelForm):

    class Meta:
        model = Store
        # Explicitly specifying the fields we want
        fields = (
            "title", "address_1", "address_2", "email",
            "usstate", "postal_code", "city",
        )

The first code example, as it involves less typing, appears to be the better choice. It’s not, as
when you add a new model field you now need to track the field in multiple locations
(one model and one or more forms).

Let’s demonstrate this in action. Perhaps after launch we decide we need to have a way of
tracking store co-owners, who have all the same rights as the owner. They can access account
information, change passwords, place orders, and specify banking information. The store
model receives a new field as shown below:

Example 28.8: Added Co-Owners Field

# stores/models.py
from django.conf import settings
from django.db import models

class Store(models.Model):
    title = models.CharField(max_length=255)
    slug = models.SlugField()
    owner = models.ForeignKey(
        settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    # related_name avoids a reverse-accessor clash with owner
    co_owners = models.ManyToManyField(
        settings.AUTH_USER_MODEL, related_name="co_owned_stores")
    # Assume 10 more fields that cover address and contact info.

The first form code example which we warned against using relies on us remembering to
alter it to include the new co_owners field. If we forget, then anyone accessing that store’s
HTML form can add or remove co-owners. While we might remember a single form,
what if we have more than one ModelForm for a model? In complex applications this is not
uncommon.

On the other hand, in the second example, where we used Meta.fields we know exactly
what fields each form is designed to handle. Changing the model doesn’t alter what the form
exposes, and we can sleep soundly knowing that our ice cream store data is more secure.

28.14.1 Mass Assignment Vulnerabilities


The problem we describe in this section is a Mass Assignment Vulnerability.

These occur when patterns such as Active Record, designed to empower developers,
create security risks for web applications. The solution is the approach we advocate in this
section, which is explicit definition of fields that can be modified.
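
The difference between the two approaches can be shown without Django at all. In this sketch (all names and field lists are illustrative), a deny-list silently exposes a field added after launch, while an allow-list does not:

```python
MODEL_FIELDS_V1 = ["title", "slug", "owner", "email"]
MODEL_FIELDS_V2 = MODEL_FIELDS_V1 + ["co_owners"]  # field added after launch

EXCLUDE = {"slug", "owner"}  # deny-list, like Meta.exclude
FIELDS = {"title", "email"}  # allow-list, like Meta.fields


def editable_with_exclude(model_fields):
    """Everything not explicitly excluded is editable."""
    return [f for f in model_fields if f not in EXCLUDE]


def editable_with_fields(model_fields):
    """Only explicitly listed fields are editable."""
    return [f for f in model_fields if f in FIELDS]
```

With the second version of the model, editable_with_exclude returns a list that includes co_owners, whereas editable_with_fields still returns only title and email.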

See en.wikipedia.org/wiki/Mass_assignment_vulnerability for more detail.

28.15 Don’t Use ModelForms.Meta.fields = "__all__"


This includes every model field in your model form. It’s a shortcut, and a dangerous one. It’s
very similar to what we describe in Section 28.14: Don’t Use ModelForms.Meta.exclude,
and even with custom validation code, exposes projects to form-based Mass Assignment
Vulnerabilities. We advocate avoiding this technique as much as possible, as we feel that it’s
simply impossible to catch all variations of input.

28.16 Beware of SQL Injection Attacks


The Django ORM generates properly-escaped SQL which will protect your site from users
attempting to execute malignant, arbitrary SQL code.

Django allows you to bypass its ORM and access the database more directly through raw
SQL. When using this feature, be especially careful to escape your SQL code properly. This
is of concern in these specific components of Django:

ä The .raw() ORM method.
ä The .extra() ORM method.
ä Directly accessing the database cursor.

Reference:

ä docs.djangoproject.com/en/3.2/topics/security/
#sql-injection-protection
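
The key habit with raw SQL is to pass user input as bound parameters rather than formatting it into the query string. The sketch below uses the standard library’s sqlite3 module so that it runs on its own; the table and input are made up. Django’s .raw() and cursor.execute() accept a params argument in the same spirit, with %s as the placeholder instead of sqlite3’s question mark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stores (title TEXT)")
conn.execute("INSERT INTO stores (title) VALUES (?)", ("Scoops",))

user_input = "nothing' OR '1'='1"

# Unsafe: string formatting lets the input rewrite the query.
unsafe_sql = "SELECT title FROM stores WHERE title = '%s'" % user_input
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the driver binds the value, so it is treated purely as data.
safe_rows = conn.execute(
    "SELECT title FROM stores WHERE title = ?", (user_input,)
).fetchall()
```

The formatted query matches every row because the input closes the quote and appends its own condition; the parameterized query matches none.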

28.17 Don’t Store Unnecessary Data


There is data we should avoid storing for financial and legal reasons.

28.17.1 Never Store Credit Card Data


Unless you have a strong understanding of the PCI-DSS security standards
(pcisecuritystandards.org) and adequate time/resources/funds to validate your
PCI compliance, storing credit card data is too much of a liability and should be avoided.

Instead, we recommend using third-party services like Stripe, Braintree, Adyen, PayPal,
and others that handle storing this information for you, and allow you to reference the data
via special tokens. Most of these services have great tutorials, are very Python and Django
friendly, and are well worth the time and effort to incorporate into your project.

TIP: Educate Yourself on PCI Compliance


Ken Cochrane has written an excellent blog post on PCI
compliance. Please read kencochrane.net/blog/2012/01/
developers-guide-to-pci-compliant-web-applications/

TIP: Read the Source Code of Open Source E-Commerce Solutions

If you are planning to use any of the existing open source Django e-commerce so-
lutions, examine how the solution handles payments. If credit card data is being
stored in the database, even encrypted, then please use another solution.

28.17.2 Don’t Store PII or PHI Unless Required (By Law)


PII is the abbreviation for Personally Identifying Information and PHI is the abbreviation
for Protected Health Information. This data is a key part of our online identity and can
be exploited to gain access to various accounts or worse. Using the United States as an
example, storing Social Security Numbers (SSN) and state ID numbers should be avoided
unless required by law. Even if required by law, if it is possible to store the data in one-way
hashes, then by all means do so.
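
A minimal sketch of one-way hashing for such identifiers, with names of our own choosing. Because identifiers like SSNs come from a small space of possible values, a plain hash can be brute-forced; this sketch therefore assumes a secret key kept outside the database and uses a keyed HMAC. Whether that is sufficient for a given legal requirement is something to confirm with counsel.

```python
import hashlib
import hmac


def hash_identifier(value, key):
    """Return a keyed SHA-256 digest of an identifier such as an SSN."""
    normalized = value.replace("-", "").strip()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
```

The digest can still be used to check whether a submitted identifier matches a stored one, without the original value ever being saved.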

If PII is discovered within a project that doesn’t have a legal obligation to store it, raise it
as an issue immediately with the project owners. Having this data stolen through either
technical or social methods can be disastrous legally and financially for an organization.
While payment cards can be easily cancelled, the same cannot be said for PII.

In the case of PHI, which is medical data, our warnings are more dire. Specifically, ev-
eryone involved is at risk for civil and criminal prosecution under Title II of HIPAA. See
en.wikipedia.org/wiki/HIPAA#Security_Rule. While that rule is for US-based
projects, many countries have similar regulations. For example, the authors know someone
called to trial in Kenya over a PHI leak.

28.18 Monitor Your Sites


Check your web servers’ access and error logs regularly. Install monitoring tools and check
on them frequently. Keep an eye out for suspicious activity.

28.19 Keep Your Dependencies Up-to-Date


You should always update your projects to work with the latest stable release of Django and
third-party dependencies. This is particularly important when a release includes security
fixes. For that, we recommend pyup.io, which automatically checks requirements files
against the latest versions that PyPI provides.

‘I’ve set up (these kinds of services) for distinct actions: it mails me once a week
for each project with any outdated dependencies, and if it finds an insecure ver-
sion it automatically creates a pull request in GitHub, so tests run automatically
and I can deploy quickly.’

– Sasha Romijn, Django core dev and security reviewer for Two Scoops of
Django 1.8

Useful links for updates to Django itself:

ä The official Django weblog at djangoproject.com/weblog/


ä The official django-announce mailing list at groups.google.com/forum/#!
forum/django-announce

28.20 Prevent Clickjacking


Clickjacking is when a malicious site tricks users into clicking on a concealed element of another
site that they have loaded in a hidden frame or iframe. An example is a site with a false
social media ‘login’ button that is really a purchase button on another site.

Django has instructions and components to prevent this from happening:

ä docs.djangoproject.com/en/3.2/ref/clickjacking/
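
In settings terms, the built-in protection amounts to one middleware entry and one option. This fragment is a sketch with a shortened middleware list:

```python
# settings.py (fragment): a shortened middleware list for illustration.
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "django.middleware.clickjacking.XFrameOptionsMiddleware",
]

# Refuse to be rendered inside any frame.
X_FRAME_OPTIONS = "DENY"
```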

28.21 Guard Against XML Bombing With defusedxml


Attacks against XML libraries are nothing new. For example, the amusingly titled but
devastating ‘Billion Laughs’ attack (http://en.wikipedia.org/wiki/Billion_laughs)
was discovered in 2003.

Unfortunately, Python, like many other programming languages, doesn’t account for this or
other avenues of attack via XML. Furthermore, third-party Python libraries such as lxml are
vulnerable to at least 4 well-known XML-based attacks. For a list of Python and Python
library vulnerabilities see https://feld.to/2KKwuMq.

Fortunately for us, Christian Heimes created defusedxml, a Python library designed to
patch Python’s core XML libraries and some of the third-party libraries (including lxml).

For more information, please read:

ä https://pypi.org/project/defusedxml

28.22 Explore Two-Factor Authentication


Two-factor authentication (2FA) requires users to authenticate by combining two separate
means of identification.

For modern web applications, what that usually means is you enter your password as well
as a value sent to your mobile device. Another option for One-Time Passwords (OTP) is
to register them in a password manager.

The advantage of 2FA is that it adds another component, one that changes frequently, to
the authentication process. That makes it a great fit for any site involving personal
identity, finance, or medical requirements.

The downside is that the user needs a charged mobile device with access to a network in
order to log in to your site, which makes 2FA a poor fit for users who cannot count on
either.

• en.wikipedia.org/wiki/Two_factor_authentication
• https://pypi.org/project/django-two-factor-auth
Chapter 28: Security Best Practices

PACKAGE TIP: Look for TOTP in 2FA Products and Packages


TOTP is short for Time-based One-Time Password Algorithm
(en.wikipedia.org/wiki/Time-based_One-time_Password_Algorithm), an open standard
used by Google Authenticator and many other services. TOTP does not require network
access, which is useful for building certain kinds of Django projects. However, SMS
implementations require cellular network access or third-party services such as twilio.com.
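
To make the "no network access" point concrete, here is a minimal standard-library sketch of how a TOTP code is derived, following RFC 6238 (which builds on the HOTP algorithm of RFC 4226). The function names and the defaults (30-second step, 6 digits, SHA-1) are our own illustrative choices. A real project should rely on a maintained 2FA package rather than this code.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    # HMAC-SHA1 over the counter encoded as an 8-byte big-endian integer.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks an offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, for_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    if for_time is None:
        for_time = time.time()
    return hotp(secret, int(for_time // step), digits)
```

Both the server and the user's device compute the same value from the shared secret and the current time, so nothing has to be transmitted to the device at login.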

WARNING: The Issue of 2FA Recovery


An important issue is how people recover from loss of their 2FA token or phone
number. Passwords are generally recovered by sending an e-mail with a secret link.
However, if the 2FA token can also be reset by e-mail, access to the user’s e-mail
has basically become the single factor of authentication. Common methods include
offering TOTP authentication with SMS as a fallback, or offering a number of
recovery codes that need to be kept by the user. In some cases, organisations will
only reset these tokens after receiving a scan of an identity card belonging to the
account holder. In any case, a recovery process will be needed, so think of this in
advance.
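
One of the recovery methods mentioned above, recovery codes, can be sketched with the standard library alone. Everything here is an assumption made for illustration: the function names, the code format, the number of codes, and the use of an unsalted SHA-256 hash. A real project might store the codes with Django's password hashers instead.

```python
import hashlib
import secrets
from typing import List, Set


def generate_recovery_codes(count: int = 10) -> List[str]:
    """Return one-time recovery codes to show the user exactly once."""
    codes = []
    for _ in range(count):
        raw = secrets.token_hex(8)  # 64 random bits as 16 hex characters
        # Group as xxxx-xxxx-xxxx-xxxx so the code is easier to copy down.
        codes.append("-".join(raw[i:i + 4] for i in range(0, 16, 4)))
    return codes


def hash_code(code: str) -> str:
    """Hash a code for storage so a database leak does not expose it."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


def redeem(code: str, stored_hashes: Set[str]) -> bool:
    """Consume a code: valid only if its hash is stored, and only once."""
    digest = hash_code(code)
    if digest in stored_hashes:
        stored_hashes.remove(digest)
        return True
    return False
```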

28.23 Embrace SecurityMiddleware


We’ve mentioned Django’s built-in django.middleware.security.SecurityMiddleware
several times already in this chapter. We owe it to ourselves and our users to embrace and
use this feature of Django.
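
As a hedged sketch, a production settings fragment that puts SecurityMiddleware to work might look like the following. The setting names are Django's own, but the values are assumptions for a site served only over HTTPS and should be adjusted per project.

```python
# settings.py (fragment for an HTTPS-only production site)

MIDDLEWARE = [
    # Keep SecurityMiddleware at or near the top of the list.
    "django.middleware.security.SecurityMiddleware",
    # ... the rest of the project's middleware ...
]

SECURE_SSL_REDIRECT = True             # redirect HTTP requests to HTTPS
SECURE_HSTS_SECONDS = 3600             # start small, raise once verified
SECURE_HSTS_INCLUDE_SUBDOMAINS = True
SECURE_CONTENT_TYPE_NOSNIFF = True     # X-Content-Type-Options: nosniff
SECURE_REFERRER_POLICY = "same-origin"
```

Running python manage.py check --deploy against the production settings reports which of these are still missing.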

28.24 Force the Use of Strong Passwords


A strong password is one that is more than just a list of characters. It is long and preferably
complex, including punctuation, digits, and both character cases. Let’s pledge to protect our
users by enforcing the use of such passwords.

So what makes the best password?

Our opinion is that at this point in time, length is more important than complexity. An
8-character password mixing cases, numbers, and special characters is easier to break by
several orders of magnitude than a 50-character sentence of just lowercase letters. What’s
even better is a 30-50 character sentence that includes numbers, mixed cases, and special
characters.

Reference: xkcd.com/936/
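
The length-versus-complexity claim can be checked with a little arithmetic. The sketch below compares the brute-force search space of the two passwords described above, assuming (our simplification) that every character is picked uniformly at random from its alphabet: 94 printable ASCII symbols for the short password, 26 lowercase letters for the long one. Real sentences are less random than that, so the figure for the long password is an upper bound.

```python
import math


def search_space_bits(alphabet_size: int, length: int) -> float:
    """Bits needed to enumerate every string of this length and alphabet."""
    return length * math.log2(alphabet_size)


short_complex = search_space_bits(94, 8)   # mixed case, digits, punctuation
long_simple = search_space_bits(26, 50)    # lowercase letters only

print(f"8 chars, mixed symbols: {short_complex:.1f} bits")
print(f"50 chars, lowercase:    {long_simple:.1f} bits")
```

That works out to roughly 52 bits against roughly 235 bits, a gap far larger than "several orders of magnitude" under these assumptions.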

Quality   Password Specification

Bad       6-10 characters of just the alphabet
Okay      Minimum 8 characters, mixed case + numeric + special characters
Better    Minimum 30 characters of just the alphabet
Best      Minimum 30 characters, mixed case + numeric + special characters

Table 28.1: Password Strength: Length vs Complexity
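
One way to enforce a length requirement is through Django's built-in password validators. This is a sketch and not a recommendation for every site: the validator classes ship with Django, while the min_length of 30 is our assumption, taken from the "Better" and "Best" rows of Table 28.1.

```python
# settings.py (fragment)

AUTH_PASSWORD_VALIDATORS = [
    {
        "NAME": "django.contrib.auth.password_validation.MinimumLengthValidator",
        # Django's own default minimum is 8 characters.
        "OPTIONS": {"min_length": 30},
    },
    {
        "NAME": "django.contrib.auth.password_validation.CommonPasswordValidator",
    },
    {
        "NAME": "django.contrib.auth.password_validation.UserAttributeSimilarityValidator",
    },
]
```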

28.25 Don’t Prevent Copy/Pasting of Password


There are sites that force users to enter passwords manually, rather than allow for
copy/pasting. This is a terrible anti-pattern, one that encourages users to rely on
easy-to-remember or repeatedly used passwords rather than the strong, unique, per-site
passwords generated by secure password managers such as 1Password or LastPass.

28.26 Give Your Site a Security Checkup


There are a number of services that provide automated checkups for sites. They aren’t security
audits, but they are great, free ways to make certain that your production deployment doesn’t
have any gaping security holes.

pyup.io’s Safety library (github.com/pyupio/safety) checks your installed dependencies
for known security vulnerabilities. By default it uses the open Python vulnerability
database Safety DB, but can be upgraded to use pyup.io’s Safety API using the --key
option.

Mozilla also provides a similar, but non-Django-specific, service called Observatory
(observatory.mozilla.org).

28.27 Put Up a Vulnerability Reporting Page


It’s a good idea to publish information on your site about how users can report security
vulnerabilities to you.

GitHub’s “Responsible Disclosure of Security Vulnerabilities” page is a good example of
this and rewards reporters of issues by publishing their names:
help.github.com/articles/responsible-disclosure-of-security-vulnerabilities/

28.28 Never Display Sequential Primary Keys


Displaying sequential primary keys is to be avoided because:

• They inform potential rivals or hackers of your volume
• They make it trivial to exploit insecure direct object references
• They provide targets for XSS attacks

Here are some patterns for looking up records without revealing sequential identifiers:

28.28.1 Lookup by Slug


In the Django world, this is incredibly common. There are literally hundreds of examples
available on how to do it. This is the go-to method for many Django projects. However, it
becomes a little challenging when you have issues with duplicate slugs. In that case, one
of the other methods applies.
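
The duplicate-slug problem can be illustrated with a small pure-Python sketch. Both helpers are hypothetical: slugify_basic stands in for Django's django.utils.text.slugify, and the existing set stands in for a database query against the slug column.

```python
import re


def slugify_basic(text: str) -> str:
    """Lowercase, keep ASCII letters and digits, join words with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def unique_slug(title: str, existing: set) -> str:
    """Return a slug not in `existing`, adding -2, -3, ... on collision."""
    base = slugify_basic(title)
    slug, n = base, 2
    while slug in existing:
        slug = f"{base}-{n}"
        n += 1
    return slug
```

With "vanilla" already taken, unique_slug("Vanilla", {"vanilla"}) yields "vanilla-2". Those numeric suffixes are the kind of awkward result that pushes projects toward the other approaches.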

28.28.2 UUIDs
Django comes with the very useful models.UUIDField. While a use case exists for using
UUIDs as primary keys in large distributed systems, they also serve nicely for public
lookups. Here is a sample model:

Example 28.9: Using UUID for Public Lookups

import uuid

from django.db import models


class IceCreamPayment(models.Model):
    # Public identifier; the sequential integer pk stays internal.
    uuid = models.UUIDField(
        unique=True,
        default=uuid.uuid4,
        editable=False)

    def __str__(self):
        return str(self.pk)

And here is how we call that model:

Example 28.10: Looking Up Payment By UUID

>>> from payments.models import IceCreamPayment
>>> payment = IceCreamPayment.objects.create()
>>> IceCreamPayment.objects.get(id=payment.id)
<IceCreamPayment: 1>
>>> payment.uuid
UUID('0b0fb68e-5b06-44af-845a-01b6df5e0967')
>>> IceCreamPayment.objects.get(uuid=payment.uuid)
<IceCreamPayment: 1>

TIP: Using UUIDs as the Primary Key


Some people like to use UUIDs as the primary key. In fact, we’ve done so on several
projects over the years. Here are observations on doing so by us and members of the
Django security team:
Using UUIDs as primary keys simplifies the security of data. Instead of two fields (id
and uuid), there is just “id”.
On the other hand, humans can remember sequential IDs, but not UUIDs. If the only
access to a model is through UUIDs, that makes working with data a bit harder. To grab
a particular record one typically needs the entire UUID string.
Also, there are performance considerations to be aware of when using UUIDs for
lookups. Speed is being traded for security:
percona.com/blog/2019/11/22/uuids-are-popular-but-bad-for-performance-lets-discuss/.
What all of this tells us is that using UUIDs for primary keys (and lookups) is something
that should be considered carefully. If a project is expected to handle billions of
records, an alternative approach (including sequential IDs) should be considered.

WARNING: The Dangers of Obfuscating Sequential IDs


Slugs and UUIDs both have their disadvantages. The slug-based approach runs
into collisions quickly, causing things like “vanilla”, “vanilla-2”, “vanilla-3” to occur.
UUIDs, to put it simply, are long and not memorizable by most humans. What can
we do?
You can obfuscate the sequential ID. But we don’t recommend it. Why not?
The short answer: Obfuscating is not an effective way to hide sequential IDs.
The long answer: There are any number of methods for obfuscating numbers, ranging
from base64 encoding to using the hashids library. These approaches work by
converting a number to an alphanumeric code and back again. They not only hide
the number, they also shorten it. Sounds great, right?
The problem is that every method of obfuscating sequential IDs is fundamentally
insecure. Base64 encoding is trivial to undo. Libraries like hashids can be broken
with brute-force attacks or by anyone with a good understanding of cryptography
(carnage.github.io/2015/08/cryptanalysis-of-hashids).
To summarize: If you want to hide your sequential identifiers, don’t rely on obfuscation.
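
To show just how trivial base64 is to undo, here is a short standard-library sketch. The obfuscate and reveal helpers are hypothetical names of our own. The point is that reversing the encoding needs no key or secret.

```python
import base64


def obfuscate(pk: int) -> str:
    """'Hide' a primary key by base64-encoding its decimal digits."""
    return base64.urlsafe_b64encode(str(pk).encode("ascii")).decode("ascii")


def reveal(token: str) -> int:
    """Undo obfuscate(); requires no key or secret of any kind."""
    return int(base64.urlsafe_b64decode(token).decode("ascii"))


token = obfuscate(1042)
print(token)               # MTA0Mg==
print(reveal(token) + 1)   # 1043, the ID of the neighbouring record
```

Anyone who recognizes the encoding can decode one token, add one, and re-encode to walk through every record.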

28.29 Upgrade Password Hasher to Argon2


...Django’s default password hasher is PBKDF2. While this is still acceptable as a
lowest-common-denominator option, there are and for many years have been better
options available.

– James Bennett, Django Core Developer and security reviewer of Two Scoops
of Django.

PBKDF2 is Django’s default because it’s supported in the Python standard library and thus
requires no third-party packages. Argon2 is the best option that is natively supported by
Django. Installation instructions and references:

• https://docs.djangoproject.com/en/3.2/topics/auth/passwords/#using-argon2-with-django
  - Instructions on installing and using Argon2 with Django
• https://docs.djangoproject.com/en/3.2/topics/auth/passwords/#argon2
  - Instructions on customizing Argon2 arguments (advanced and not necessary for most
  projects)
• latacora.micro.blog/2018/04/03/cryptographic-right-answers.html - Easy-to-understand
  answers to the cryptography questions that non-security-focused developers get asked
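
As a sketch of what the first link describes, switching to Argon2 comes down to installing the argon2 extra (python -m pip install django[argon2]) and listing the Argon2 hasher first. The order of the remaining entries below is an assumption. They stay in the list so that existing hashes can still be verified and are upgraded when each user next logs in.

```python
# settings.py (fragment)

PASSWORD_HASHERS = [
    # The first entry is used to hash new and changed passwords.
    "django.contrib.auth.hashers.Argon2PasswordHasher",
    # The rest are only used to verify (and then upgrade) older hashes.
    "django.contrib.auth.hashers.PBKDF2PasswordHasher",
    "django.contrib.auth.hashers.PBKDF2SHA1PasswordHasher",
    "django.contrib.auth.hashers.BCryptSHA256PasswordHasher",
]
```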

28.30 Use SRI When Loading Static Assets From External Sources

It is not uncommon for projects to load libraries from external sources. For example, jQuery,
Google Fonts, Bootstrap, Tailwind CSS, and even React and Vue are often loaded from CDNs
like CDNJS, unpkg, Google Hosted Libraries, and others. Unfortunately, without the use
of SRI (Subresource Integrity) to confirm that each asset is the file we expect, attackers
who compromise the source can replace or modify assets used by users. In other words,
don’t do this:

Example 28.11: Naively Loading Bootstrap Assets

<!-- DON'T DO THIS - loading static assets without SRI -->
<link rel="stylesheet"
  href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css">
<script src="https://code.jquery.com/jquery-3.3.1.slim.min.js"></script>
<script
  src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.7/umd/popper.min.js"></script>
<script
  src="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js"></script>

This is why we recommend the use of subresource integrity hashes when including CSS,
JavaScript, or web fonts from external projects. This is provided as a feature by CDNJS, the
jQuery project, the Bootstrap project, and a shockingly small number of other projects.

From the official Bootstrap site, this is what using SRI looks like:

Example 28.12: Using SRI When Loading Bootstrap Assets

<!-- Loading Static Assets with SRI -->
<link rel="stylesheet"
  href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css"
  integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T"
  crossorigin="anonymous">
<script src="https://code.jquery.com/jquery-3.3.1.slim.min.js"
  integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo"
  crossorigin="anonymous"></script>
<script
  src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.7/umd/popper.min.js"
  integrity="sha384-UO2eT0CpHqdSJQ6hJty5KVphtPhzWj9WO1clHTMGa3JDZwrnQq4sF86dIHNDz0W1"
  crossorigin="anonymous"></script>
<script
  src="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js"
  integrity="sha384-JjSmVgyd0p3pXB1rRibZUAYoIIy6OrQ6VrjIEaFf/nJGzIxFDsf4x0xIM+B07jRM"
  crossorigin="anonymous"></script>

If a project doesn’t include SRI hashes with its installation instructions, we suggest you
load its assets using CDNJS, which includes them as a feature in its “copy” dropdown. Just
choose the “copy link tag” option.

Recommended tools and useful references:

• cdnjs.com - CDN that provides SRI hashes for all libraries they host
• developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity - Mozilla’s
  documentation of SRI
• w3.org/TR/SRI/ - Official specification
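
For assets we host or vendor ourselves, an integrity value can also be computed by hand. The helper below is a minimal sketch with a name of our own choosing. It follows the format in the SRI specification: the hash algorithm name, a hyphen, then the base64-encoded digest of the exact bytes being served.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI integrity value such as 'sha384-<base64 digest>'."""
    if algorithm not in {"sha256", "sha384", "sha512"}:
        raise ValueError("SRI only allows sha256, sha384, or sha512")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"
```

Passing the bytes of a downloaded file (for example, open(path, "rb").read()) gives the string to paste into the integrity attribute. If the CDN later serves different bytes, the browser refuses to load the asset.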

28.31 Reference Our Security Settings Appendix


Keeping track of everything that relates to security and Django is challenging. This chapter
alone is nearly 30 pages long, and at the beginning we make it very clear that this is not an
absolute reference.

In order to add clarity, we’ve created Appendix G: Security Settings Reference. This is where
we put important and useful information on how to better configure the security settings of
a Django project.

28.32 Review the List of Security Packages


In the security section of Appendix A: Packages, we list over ten security-related packages
that can make a difference to your site. While some are listed in this chapter, others are
unique to that section of this book.

28.33 Keep Up-to-Date on General Security Practices


We end this chapter with some common-sense advice.

First, keep in mind that security practices are constantly evolving, both in the Django
community and beyond. Subscribe to groups.google.com/forum/#!forum/django-announce
and check Twitter, Hacker News, and various security blogs regularly.

Second, remember that security best practices extend well beyond those practices specific to
Django. You should research the security issues of every part of your web application stack,
and you should follow the corresponding sources to stay up to date.

TIP: Good Books and Articles on Security


Paul McMillan, Django core developer, security expert, and Two Scoops reviewer,
recommends the following books:
• “The Tangled Web: A Guide to Securing Modern Web Applications”:
  amzn.to/1hXAAyx
• “The Web Application Hacker’s Handbook”:
  amzn.to/1dZ7xEY

In addition, we recommend the following reference site:

• wiki.mozilla.org/WebAppSec/Secure_Coding_Guidelines

28.34 Summary
Please use this chapter as a starting point for Django security, not the ultimate reference
guide. See the Django documentation’s list for additional security topics:
docs.djangoproject.com/en/3.2/topics/security/#additional-security-topics

Django comes with a good security record due to the diligence of its community and
attention to detail. Security is one of those areas where it’s a particularly good idea to ask
for help. If you find yourself confused about anything, ask questions and turn to others in
the Django community for help.
29 | Logging: What’s It For, Anyway?

Logging is like rocky road ice cream. Either you can’t live without it, or you forget about it
and wonder once in a while why it exists.

Anyone who’s ever worked on a large production project with intense demands understands
the importance of using the different log levels appropriately, creating module-specific
loggers, meticulously logging information about important events, and including extra
detail about the application’s state when those events are logged.
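
A minimal sketch of those habits using only the standard library follows. The record_scoop_order function and its arguments are hypothetical. The parts that matter are the module-level logger named after `__name__` and the extra dictionary, whose keys become attributes on the log record.

```python
import logging

# One logger per module, named after the module's import path.
logger = logging.getLogger(__name__)


def record_scoop_order(order_id: int, flavor: str, scoops: int) -> None:
    """Hypothetical event handler that logs an order with context."""
    context = {"flavor": flavor, "scoops": scoops}
    if scoops > 3:
        # Unusual but not broken: worth a WARNING with the full context.
        logger.warning("Unusually large order %s", order_id, extra=context)
    else:
        logger.info("Order %s accepted", order_id, extra=context)
```

Because the logger carries the module's name, its level and handlers can later be tuned per module without touching this code.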

While logging might not seem glamorous, remember that it is one of the secrets to building
extremely stable, robust web applications that scale and handle unusual loads gracefully.
Logging can be used not only to debug application errors, but also to track interesting
performance metrics.

Logging unusual activity and checking logs regularly is also important for ensuring the
security of your server. In the previous chapter, we covered the importance of checking your
server access and error logs regularly. Keep in mind that application logs can be used in
similar ways, whether to track failed login attempts or unusual application-level activity.

29.1 Application Logs vs. Other Logs


This chapter focuses on application logs. Any log file containing data logged from your
Python web application is considered an application log.

In addition to your application logs, you should be aware that there are other types of logs,
and that using and checking all of your server logs is necessary. Your server logs, database
logs, network logs, etc. all provide vital insight into your production system, so consider
them all equally important.

29.2 Why Bother With Logging?


Logging is your go-to tool in situations where a stack trace and existing debugging tools
aren’t enough. Whenever you have different moving parts interacting with each other or the
possibility of unpredictable situations, logging gives you insight into what’s going on.

The different log levels available to you are DEBUG, INFO, WARNING, ERROR, and CRITICAL.
Let’s now explore when it’s appropriate to use each logging level.

29.3 When to Use Each Log Level


In places other than your production environment, you might as well use all the log
levels. Log levels are controlled in your project’s settings modules, so we can fine-tune this
recommendation as needed to account for load testing and large-scale user tests.

In your production environment, we recommend using every log level except for DEBUG.
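
As a sketch of how that recommendation could be expressed, the dictionary below uses the standard dictConfig format that Django's LOGGING setting accepts. The handler choice and the "flavors.views" logger name are assumptions made for illustration.

```python
import logging
import logging.config

# In a Django project this dict would live in the production settings
# module under the name LOGGING.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "root": {
        "handlers": ["console"],
        "level": "INFO",  # production: every level except DEBUG
    },
}

# Django applies settings.LOGGING for us; calling dictConfig directly
# here only makes the sketch runnable on its own.
logging.config.dictConfig(LOGGING)

logger = logging.getLogger("flavors.views")  # hypothetical module name
logger.debug("dropped in production")
logger.info("kept in production")
```

A development settings module could reuse the same dictionary with the root level set to "DEBUG".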

Figure 29.1: Appropriate usage of CRITICAL/ERROR/WARNING/INFO logging in ice cream.

Since the same CRITICAL, ERROR, WARNING, and INFO logs are captured whether in
production or development, introspection of buggy code requires less modification of code.
This is important to remember, as debug code added by developers working to fix one
problem can create new ones.

The rest of this section covers how each log level is used.

29.3.1 Log Catastrophes With CRITICAL


Use the CRITICAL log level only when something catastrophic occurs that requires urgent
attention.

For example, if your code relies on an internal web service being available, and if that web
service is part of your site’s core functionality, then you might log at the CRITICAL level
anytime that the web service is inaccessible.
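
The scenario above might be sketched as follows. The client object, its get_inventory() method, and the FlavorServiceUnavailable exception are all hypothetical stand-ins for whatever the internal web service really exposes.

```python
import logging

logger = logging.getLogger(__name__)


class FlavorServiceUnavailable(Exception):
    """Hypothetical error raised when the internal service is down."""


def fetch_flavor_inventory(client):
    """Return inventory from `client`, logging CRITICAL if unreachable."""
    try:
        return client.get_inventory()
    except ConnectionError as exc:
        # Core functionality is down: this needs urgent human attention.
        logger.critical("Flavor inventory service unreachable: %s", exc)
        raise FlavorServiceUnavailable("inventory service is down") from exc
```

Re-raising after logging keeps the failure visible to the caller, while the CRITICAL record gives alerting tools something to page on.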
