Crypto 101
lvh
Copyright 2013-2017, Laurens Van Houtven (lvh)
This work is available under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You can find the full text of the license at https://creativecommons.org/licenses/by-nc/4.0/.
Pomidorkowi
Contents

I Foreword
1 About this book
2 Advanced sections
3 Development
4 Acknowledgments

II Building blocks
5 Exclusive or
  5.1 Description
  5.2 A few properties of XOR
  5.3 Bitwise XOR
  5.4 One-time pads
  5.5 Attacks on “one-time pads”
  5.6 Remaining problems
6 Block ciphers
  6.1 Description
  6.2 AES
  6.3 DES and 3DES
  6.4 Remaining problems
7 Stream ciphers
  7.1 Description
  7.2 A naive attempt with block ciphers
  7.3 Block cipher modes of operation
  7.4 CBC mode
  7.5 Attacks on CBC mode with predictable IVs
  7.6 Attacks on CBC mode with the key as the IV
  7.7 CBC bit flipping attacks
  7.8 Padding
  7.9 CBC padding attacks
  7.10 Native stream ciphers
  7.11 RC4
  7.12 Salsa20
  7.13 Native stream ciphers versus modes of operation
  7.14 CTR mode
  7.15 Stream cipher bit flipping attacks
  7.16 Authenticating modes of operation
  7.17 Remaining problems
8 Key exchange
  8.1 Description
  8.2 Abstract Diffie-Hellman
  8.3 Diffie-Hellman with discrete logarithms
  8.4 Diffie-Hellman with elliptic curves
  8.5 Remaining problems
9 Public-key encryption
  9.1 Description
  9.2 Why not use public-key encryption for everything?
  9.3 RSA
  9.4 Elliptic curve cryptography
  9.5 Remaining problem: unauthenticated encryption
10 Hash functions
  10.1 Description
  10.2 MD5
  10.3 SHA-1
  10.4 SHA-2
  10.5 Keccak and SHA-3
  10.6 Password storage
  10.7 Length extension attacks
  10.8 Hash trees
  10.9 Remaining issues
13.6 HKDF

IV Appendices
V Glossary
Index
VI References
Bibliography
Part I
Foreword
1
About this book

2
Advanced sections
3
Development
4
Acknowledgments
This book would not have been possible without the support
and contributions of many people, even before the first pub-
lic release. Some people reviewed the text, some people pro-
vided technical review, and some people helped with the
original talk. In no particular order:
• My wife, Ewa
• Brian Warner
• Oskar Żabik
• Ian Cordasco
• Zooko Wilcox-O’Hearn
Part II
Building blocks
5
Exclusive or
5.1 Description
Exclusive or, often called “XOR”, is a Boolean binary operator that is true when either the first input or the second input, but not both, are true.
Another way to think of XOR is as something called a
“programmable inverter”: one input bit decides whether
to invert the other input bit, or to just pass it through un-
changed. “Inverting” bits is colloquially called “flipping”
bits, a term we’ll use often throughout the book.
0 ⊕ 0 = 0    1 ⊕ 0 = 1
0 ⊕ 1 = 1    1 ⊕ 1 = 0
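As a quick sanity check, here is a small Python snippet (not from the original text) that prints the same truth table and shows the “programmable inverter” view: XOR with 1 flips a bit, XOR with 0 passes it through unchanged.

for a in (0, 1):
    for b in (0, 1):
        print(a, "XOR", b, "=", a ^ b)

bit = 1
assert bit ^ 0 == bit        # XOR with 0: pass the bit through
assert bit ^ 1 == 1 - bit    # XOR with 1: flip the bit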
Crib-dragging
A classic approach to break multi-time pad systems is “crib-
dragging.” Crib-dragging uses small sequences expected to
occur with high probability. Those sequences are “cribs”.
The name crib-dragging originates from the fact that these
small “cribs” are dragged from left to right across each ci-
phertext, and from top to bottom across the ciphertexts, in
the hope of finding a match. The matches form the sites of
the start, or “crib”, if you will, of further decryption.
The idea is fairly simple. Suppose we have several encrypted messages Ci encrypted with the same “one-time” pad K.⁶ If we could correctly guess the plaintext for one of the messages, say Cj, we'd know K:
Cj ⊕ Pj = (Pj ⊕ K) ⊕ Pj
        = K ⊕ Pj ⊕ Pj
        = K ⊕ 0
        = K
⁶ We use capital letters when referring to an entire message, as opposed to just bits of a message.
Pi = Ci ⊕ K for all i
• ␣ of ␣ and variants
• ␣ to ␣ and variants
• ␣ a ␣ and variants
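To make this concrete, here is a small illustrative sketch (the messages, pad and crib are all made up): XORing two ciphertexts that share a pad cancels the pad entirely, and a crib such as “ the ” can then be dragged across the result.

import os

p1 = b"attack the castle at dawn"
p2 = b"defend the castle at dusk"
pad = os.urandom(len(p1))

c1 = bytes(a ^ k for a, k in zip(p1, pad))
c2 = bytes(a ^ k for a, k in zip(p2, pad))

x = bytes(a ^ b for a, b in zip(c1, c2))        # the pad cancels out: x = P1 xor P2
crib = b" the "
for i in range(len(x) - len(crib) + 1):
    guess = bytes(a ^ b for a, b in zip(x[i:i + len(crib)], crib))
    if all(32 <= ch < 127 for ch in guess):     # printable, so a plausible match
        print(i, guess)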
6
Block ciphers
6.1 Description
A block cipher is an algorithm that encrypts blocks of a
fixed length. The encryption function E transforms plain-
text blocks P into ciphertext blocks C by using a secret key
k:
C = E(k, P )
P = D(k, C)
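As a minimal sketch of C = E(k, P) and P = D(k, C), here is a single-block AES encryption and decryption using the third-party cryptography package; the key and plaintext values are made up for illustration.

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

k = bytes(16)                        # a 128-bit key (all zeroes, for illustration only)
P = b"16 byte message!"              # exactly one AES block

encryptor = Cipher(algorithms.AES(k), modes.ECB()).encryptor()
C = encryptor.update(P) + encryptor.finalize()

decryptor = Cipher(algorithms.AES(k), modes.ECB()).decryptor()
assert decryptor.update(C) + decryptor.finalize() == P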
n! = 1 · 2 · 3 · . . . · (n − 1) · n
but that’s okay: that tiny fraction is still nowhere near small
enough for an attacker to just try them all.
Of course, a block cipher should be as easy to compute
as possible, as long as it doesn’t sacrifice any of the above
properties.
6.2 AES

The most common block cipher in current use is AES. Unlike its predecessor DES (which we'll look at in more detail in the next section), AES was selected through an open, public competition.
Key schedule

AES requires a separate key for each round in the steps described below. The key schedule is the process that AES uses to derive 128-bit round keys from one master key.
First, the key is separated into 4 byte columns. The key
is rotated and then each byte is run through an S-box (sub-
stitution box) that maps it to something else. Each column
is then XORed with a round constant. The last step is to XOR
the result with the previous round key.
The other columns are then XORed with the previous
round key to produce the remaining columns.
SubBytes
SubBytes is the step that applies the S-box (substitution box)
in AES. The S-box itself substitutes a byte with another byte,
and this S-box is applied to each byte in the AES state.
It works by taking the multiplicative inverse over the Ga-
lois field, and then applying an affine transformation so that
there are no values x so that x⊕S(x) = 0 or x⊕S(x) = 0xff.
To rephrase: there are no values of x that the substitution box maps to x itself, or to x with all bits flipped.
ShiftRows
After having applied the SubBytes step to the 16 bytes of the block, AES shifts the rows in the 4 × 4 array.
³ In its defense, linear attacks were not publicly known back when DES was designed.
MixColumns
MixColumns multiplies each column of the state with a fixed
polynomial.
ShiftRows and MixColumns represent the diffusion prop-
erties of AES.
AddRoundKey
As the name implies, the AddRoundKey step adds (XORs) the bytes of the round key produced by the key schedule into the state of the cipher.
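In other words, AddRoundKey is just a bytewise XOR of the 16-byte state with the 16-byte round key. A minimal sketch, with made-up state and round key values:

def add_round_key(state, round_key):
    return bytes(s ^ k for s, k in zip(state, round_key))

state = bytes(range(16))
round_key = bytes(range(16, 32))
assert add_round_key(add_round_key(state, round_key), round_key) == state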
7
Stream ciphers
7.1 Description
A stream cipher is a symmetric-key encryption algorithm
that encrypts a stream of bits. Ideally, that stream could be
as long as we’d like; real-world stream ciphers have limits,
but they are normally sufficiently large that they don’t pose
a practical problem.
all but the most extreme block sizes. Furthermore, all but
the smallest of these block sizes are unrealistically large. For
an uncompressed bitmap with three color channels of 8 bit
depth, each pixel takes 24 bits to store. Since the block size
of AES is only 128 bits, that would equate to 128/24, or just over 5 pixels per block. That's significantly fewer pixels per block
than the larger block sizes in the example. But AES is the
workhorse of modern block ciphers—it can’t be at fault, cer-
tainly not because of an insufficient block size.
Notice that an idealized encryption scheme looks like
random noise. “Looking like random noise” does not mean
something is properly encrypted: it just means that we can-
not inspect it using trivial methods.
C = ECB(Ek , A∥S)
text block matches the ciphertext block CR1 that was remem-
bered earlier.
p · p · … · p = p^b
(one factor of p for each of the b positions)
For a typical block size of 16 bytes (or 128 bits), brute forcing means trying 256^16 combinations. The number of tries amounts to a huge, 39-digit number. It is so large that trying
all combinations is impossible. An ECB encryption oracle
allows an attacker to decrypt in at most 256 · 16 = 4096 tries,
which is a far more manageable number.
Conclusion
In the real world, block ciphers are used in systems that en-
crypt large amounts of data all the time. We have seen that when using ECB mode, an attacker can both analyze ciphertexts to recognize repeating patterns, and even decrypt messages outright when given access to an encryption oracle.
Even when we use idealized block ciphers with unrealis-
tic properties, such as block sizes of more than a thousand
bits, an attacker can decrypt the ciphertexts. Real world
block ciphers have more limitations than our idealized ex-
amples, for example, having much smaller block sizes.
We are not yet even considering potential weaknesses in
the block cipher. It is neither AES nor our test block ciphers that cause the problem; it is the ECB construction. Clearly, something better is needed.
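Here is a small sketch of the pattern-leaking problem (key and message are made up, using the third-party cryptography package): identical plaintext blocks produce identical ciphertext blocks under ECB.

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = bytes(16)
msg = b"YELLOW SUBMARINE" * 3        # three identical 16-byte blocks

encryptor = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
ct = encryptor.update(msg) + encryptor.finalize()

blocks = [ct[i:i + 16] for i in range(0, len(ct), 16)]
print(blocks[0] == blocks[1] == blocks[2])       # True: the repetition leaks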
PM = IVM ⊕ IVA ⊕ G

CM = E(k, IVM ⊕ PM)
   = E(k, IVM ⊕ (IVM ⊕ IVA ⊕ G))
   = E(k, IVA ⊕ G)
P1′ = D(k, C1) ⊕ IV
    = D(k, C1) ⊕ k
    = P1

P2′ = D(k, Z) ⊕ C1
    = R

P3′ = D(k, C1) ⊕ Z
    = D(k, C1)
    = P1 ⊕ IV
Remember how CBC decryption works: the output of the block ci-
pher is XORed with the previous ciphertext block to produce
the plaintext block. Now that the input ciphertext block Ci
has been modified, the output of the block cipher will be
some random unrelated block, and, statistically speaking,
nonsense. After being XORed with that previous ciphertext
block, it will still be nonsense. As a result, the produced
plaintext block is still just nonsense. In the illustration, this
unintelligible plaintext block is Pi′ .
However, in the block after that, the bits we flipped in the
ciphertext will be flipped in the plaintext as well! This is be-
cause, in CBC decryption, ciphertext blocks are decrypted
by the block cipher, and the result is XORed with the previ-
ous ciphertext block. But since we modified the previous ci-
phertext block by XORing it with X, the plaintext block Pi+1
will also be XORed with X. As a result, the attacker com-
pletely controls that plaintext block Pi+1 , since they can just
flip the bits that aren’t the value they want them to be.
TODO: add previous illustration, but mark the path X
takes to influence P prime {i + 1} in red or something
This may not sound like a huge deal at first. If you don’t
know the plaintext bytes of that next block, you have no idea
which bits to flip in order to get the plaintext you want.
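The following sketch (all keys and values made up, using the third-party cryptography package) shows the effect: flipping a bit in ciphertext block i garbles plaintext block i, but flips exactly that bit in plaintext block i+1.

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, iv = bytes(16), bytes(16)
pt = b"block number one" + b"userid=23;admin=" + b"0;pad-padding---"

encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
ct = bytearray(encryptor.update(pt) + encryptor.finalize())

x = bytes([0x01] + [0] * 15)         # X: flip the lowest bit of the first byte
for j in range(16):
    ct[16 + j] ^= x[j]               # tamper with the second ciphertext block

decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
pt2 = decryptor.update(bytes(ct)) + decryptor.finalize()
print(pt2[16:32])                    # nonsense: the garbled block
print(pt2[32:48])                    # the third block, with "0" flipped to "1"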
To illustrate how attackers can turn this into a practical
attack, let’s consider a website using cookies. When you reg-
ister, your chosen user name is put into a cookie. The web-
site encrypts the cookie and sends it to your browser. The
next time your browser visits the website, it will provide the
encrypted cookie; the website decrypts it and knows who
you are.
An attacker can often control at least part of the plaintext
being encrypted. In this example, the user name is part of
the plaintext of the cookie. Of course, the website just lets
you provide whatever value for the user name you want at
registration, so the attacker can just add a very long string
of Z bytes to their user name. The server will happily en-
crypt such a cookie, giving the attacker an encrypted ciphertext to tamper with.
7.8 Padding
So far, we’ve conveniently assumed that all messages just
happened to fit exactly in our system of block ciphers, be
it CBC or ECB. That means that all messages happen to be
a multiple of the block size, which, in a typical block cipher
such as AES, is 16 bytes. Of course, real messages can be of
arbitrary length. We need some scheme to make them fit.
That process is called padding.
PKCS#5/PKCS#7 padding
A better, and much more popular scheme, is PKCS#5/PKCS#7
padding.
PKCS#5, PKCS#7 and later CMS padding are all more or less the same idea. Take the number of padding bytes required, and append that many bytes, each with that number as its value. For example, if the block size is 8 bytes, and the last block contains the three bytes 12 34 45, the block becomes 12 34 45 05 05 05 05 05 after padding.
If the plaintext happened to be exactly a multiple of the
block size, an entire block of padding is used. Otherwise, the
recipient would look at the last byte of the plaintext, treat it
as a padding length, and almost certainly conclude the mes-
sage was improperly padded.
This scheme is described in [Hou].
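A minimal pad/unpad sketch of that scheme, for illustration only (real code should use a vetted implementation and constant-time checks):

def pad(data, block_size=16):
    n = block_size - (len(data) % block_size)
    return data + bytes([n]) * n

def unpad(data):
    n = data[-1]
    if n == 0 or n > len(data) or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return data[:-n]

assert pad(b"\x12\x34\x45", 8) == b"\x12\x34\x45\x05\x05\x05\x05\x05"
assert unpad(pad(b"hello")) == b"hello"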
7.9 CBC padding attacks

Suppose the recipient reveals, for example through an error message, that the padding of a received message was invalid. We can use that tiny bit of information about the padding of the plaintext to iteratively decrypt the entire message.
The attacker will do this, one ciphertext block at a time,
by trying to get an entire plaintext block worth of valid
padding. We’ll see that this tells them the decryption of their
target ciphertext block, under the block cipher. We’ll also
see that you can do this efficiently and iteratively, just from
that little leak of information about the padding being valid
or not.
It may be helpful to keep in mind that a CBC padding at-
tack does not actually attack the padding for a given mes-
sage; instead the attacker will be constructing paddings to
decrypt a message.
To mount this attack, an attacker only needs two things: a target ciphertext to decrypt, and a padding oracle, i.e. something that tells them whether or not the padding of a tampered message is valid. When the attacker tampers with the block before the target block, the last plaintext block will only have valid padding if it happens to end in one of the valid padding patterns:
• 01
• 02 02
• 03 03 03
• …
The first option (01) is much more likely than the oth-
ers, since it only requires one byte to have a particular value.
The attacker is modifying that byte to take every possible
value, so it is quite likely that they happened to stumble upon
01. All of the other valid padding options not only require
that byte to have some particular value, but also one or more
other bytes. For an attacker to be guaranteed a message with
a valid 01 padding, they just have to try every possible byte.
For an attacker to end up with a message with a valid 02 02
padding, they have to try every possible byte and happen to
have picked a combination of C and R that causes the plain-
text to have a 02 in that second-to-last position. (To rephrase:
the second-to-last byte of the decryption of the ciphertext
block, XORed with the second-to-last byte of R, is 02.)
In order to successfully decrypt the message, we still
need to figure out which one of those options is the actual
value of the padding. To do that, we try to discover the length
of the padding by modifying bytes starting at the left-hand
p0 p1 p2 p3 p4 03 03 03
p0′ p1 p2 p3 p4 03 03 03
As you can see, this doesn’t affect the validity of the padding.
It also does not affect p1, p2, p3 or p4. However, when we continue modifying subsequent bytes, we will eventually hit a byte that is part of the padding. For example, let's say we turn that first 03 into 02 by modifying R. Pi now looks like this:

p0′ p1 p2 p3 p4 02 03 03

That is no longer valid padding, so the oracle's answer changes, revealing where the padding starts.
D(Ci )[b] ⊕ rb = 01
D(Ci )[b] = 01 ⊕ rb
The attacker has now tricked the receiver into revealing the
value of the last byte of the block cipher decryption of Ci .
D(Ci )[b] ⊕ rb = 01
Now, we’d like to get that byte to say 02, to produce an al-
most valid padding: the last byte would be correct for a 2-
byte PKCS#5 padding (02 02), but that second-to-last byte
probably isn’t 02 yet. To do that, we XOR with 01 to cancel
the 01 that’s already there (since two XORs with the same
value cancel each other out), and then we XOR with 02 to
get 02:
D(Ci )[b] ⊕ rb ⊕ 01 ⊕ 02 = 01 ⊕ 01 ⊕ 02
= 02
rb′ = rb ⊕ 01 ⊕ 02
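As a sketch of what that last-byte recovery looks like in code: the oracle function below is a stand-in for whatever channel tells the attacker that the padding was valid, and all names here are hypothetical.

def recover_last_byte(target_block, oracle, block_size=16):
    """Recover D(Ci)[b], the last byte of the block cipher decryption
    of the target ciphertext block, using only the padding oracle."""
    for rb in range(256):
        r = bytes(block_size - 1) + bytes([rb])      # the attacker-controlled block R
        if oracle(r + target_block):
            # The plaintext now (almost certainly) ends in 01, so:
            # D(Ci)[b] XOR rb == 0x01, hence D(Ci)[b] == 0x01 XOR rb.
            # The rare 02 02 (or longer) case can be ruled out by also
            # modifying earlier bytes of R, as described above.
            return 0x01 ^ rb
    raise ValueError("the oracle never reported valid padding")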
7.11 RC4
By far the most common native stream cipher in use on desktop and mobile devices is RC4.
RC4 is sometimes also called ARCFOUR or ARC4, which
stands for alleged RC4. While its source code has been
leaked and its implementation is now well-known, RSA Se-
curity (the company that authored RC4 and still holds the
RC4 trademark) has never acknowledged that it is the real
algorithm.
It quickly became popular because it’s very simple and
very fast. It’s not just extremely simple to implement, it’s
also extremely simple to apply. Being a synchronous stream cipher, it needs nothing but the key: there is no nonce or IV to manage.
Then, the key is mixed into the state. This is done by let-
ting index i iterate over every element of the state. The j
index is found by adding the current value of j (starting at 0)
with the next byte of the key, and the current state element:
from itertools import cycle

def key_schedule(key):
    # Mixes the key (a byte string) into the 256-byte state.
    s = list(range(256))
    key_bytes = cycle(key)
    j = 0
    for i in range(256):
        j = (j + s[i] + next(key_bytes)) % 256
        s[i], s[j] = s[j], s[i]
    return s
def pseudorandom_generator(s):
    # Yields the RC4 keystream, one byte at a time.
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]
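With those two pieces, encrypting (and decrypting, which is the same operation) is just XOR with the keystream. A small usage sketch with a made-up key; RC4 should not be used in new designs:

def rc4(key, data):
    keystream = pseudorandom_generator(key_schedule(key))
    return bytes(b ^ next(keystream) for b in data)

ciphertext = rc4(b"Secret", b"Attack at dawn")
assert rc4(b"Secret", ciphertext) == b"Attack at dawn"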
Attacks
Over the years, many flaws have been found in RC4. A few examples of such flaws:
• The first three bytes of the key are correlated with the
first byte of the keystream.
• The first few bytes of the state are related to the key
with a simple (linear) relation.
7.12 Salsa20
Salsa20 is a newer stream cipher designed by Dan Bernstein.
Bernstein is well-known for writing a lot of open source
(public domain) software, most of which is either directly se-
curity related or built with information security very much
in mind.
There are two minor variants of Salsa20, called
Salsa20/12 and Salsa20/8, which are simply the same
algorithm except with 12 and 8 rounds7 respectively, down
from the original 20. ChaCha is another, orthogonal tweak
of the Salsa20 cipher, which tries to increase the amount
of diffusion per round while maintaining or improving
performance. ChaCha doesn’t have a “20” after it; spe-
cific algorithms do have a number after them (ChaCha8,
ChaCha12, ChaCha20), which refers to the number of
rounds.
⁷ Rounds are repetitions of an internal function. Typically a number of rounds are required to make an algorithm work effectively; attacks often start on reduced-round versions of an algorithm.
x ← x ⊕ ((y ⊞ z) ≪ n)
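In Python, that basic operation could be sketched like this: addition modulo 2^32, followed by a left rotation of the 32-bit word, XORed into x.

def rotl32(v, n):
    return ((v << n) | (v >> (32 - n))) & 0xFFFFFFFF

def salsa20_op(x, y, z, n):
    # x = x XOR ((y + z mod 2**32) rotated left by n bits)
    return x ^ rotl32((y + z) & 0xFFFFFFFF, n)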
8
Key exchange
8.1 Description
Key exchange protocols attempt to solve a problem that, at
first glance, seems impossible. Alice and Bob, who’ve never
met before, have to agree on a secret value. The chan-
nel they use to communicate is insecure: we’re assuming
that everything they send across the channel is being eaves-
dropped on.
We’ll demonstrate such a protocol here. Alice and Bob
will end up having a shared secret, only communicating over
the insecure channel. Despite Eve having literally all of the
information Alice and Bob send to each other, she can’t use
any of that information to figure out their shared secret.
That protocol is called Diffie-Hellman, named after Whit-
field Diffie and Martin Hellman, the two cryptographic pio-
neers who discovered it. They suggested calling the protocol
Diffie-Hellman-Merkle key exchange, to honor the contribu-
tions of Ralph Merkle. While his contributions certainly de-
serve honoring, that term hasn’t really caught on. For the
benefit of the reader we’ll use the more common term.
Alice and Bob both pick a random color, and they mix it
with the base color.
At the end of this step, Alice and Bob know their respec-
tive secret color, the mix of the secret color and the base
color, and the base color itself. Everyone, including Eve,
knows the base color.
Then, Alice and Bob both send their mixed colors over
the network. Eve sees both mixed colors, but she can’t fig-
ure out what either of Alice and Bob’s secret colors are. Even
though she knows the base, she can’t “un-mix” the colors
sent over the network.1
At the end of this step, Alice and Bob know the base, their
respective secrets, their respective mixed colors, and each
other’s mixed colors. Eve knows the base color and both
mixed colors.
Once Alice and Bob receive each other’s mixed color,
they add their own secret color to it. Since the order of the
mixing doesn’t matter, they’ll both end up with the same se-
cret.
¹ While this might seem like an easy operation with black-and-white approximations of color mixing, keep in mind that this is just a failure of the illustration: our assumption was that this was hard.
y ≡ g^x (mod p)

mA ≡ g^rA (mod p)
mB ≡ g^rB (mod p)
These numbers are sent across the network where Eve can
see them. The premise of the discrete logarithm problem
is that it is okay to do so, because figuring out r in m ≡ g^r (mod p) is supposedly very hard.
Once Alice and Bob have each other's mixed numbers, they raise it to the power of their own secret number. For example, Bob
would compute:
s ≡ (g^rA)^rB (mod p)
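Here is a toy version of that exchange in Python, with a deliberately small prime so the numbers stay readable; real deployments use large, standardized parameters.

import secrets

p = 4294967291                       # a toy prime, far too small for real use
g = 5

r_a = secrets.randbelow(p - 2) + 1   # Alice's secret number
r_b = secrets.randbelow(p - 2) + 1   # Bob's secret number

m_a = pow(g, r_a, p)                 # the mixed numbers, sent in the clear
m_b = pow(g, r_b, p)

s_alice = pow(m_b, r_a, p)           # (g^rB)^rA mod p
s_bob = pow(m_a, r_b, p)             # (g^rA)^rB mod p
assert s_alice == s_bob              # the same shared secret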
9
Public-key encryption
9.1 Description
So far, we have only done secret-key encryption. Suppose that you could have a cryptosystem that didn't involve a sin-
gle secret key, but instead had a key pair: one public key,
which you freely distribute, and a private one, which you
keep to yourself.
People can encrypt information intended for you by us-
ing your public key. The information is then impossible to
decipher without your private key. This is called public-key
encryption.
For a long time, people thought this was impossible.
However, starting in the 1970s, such algorithms started ap-
pearing. The first publicly available encryption scheme was
produced by three cryptographers from MIT: Ron Rivest,
Adi Shamir and Leonard Adleman. The algorithm they pub-
lished is still the most common one today, and carries the
first letters of their last names: RSA.
Public-key algorithms aren't limited to encryption. In fact, you've already seen a public-key algorithm in this book: Diffie-Hellman key exchange.
9.3 RSA
As we already mentioned, RSA is one of the first practical
public-key encryption schemes. It remains the most com-
mon one to this day.
C ≡ M^e (mod N)

M ≡ C^d (mod N)
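A toy example with tiny, insecure numbers, just to see both equations in action (real RSA needs large primes and proper padding; pow(e, -1, …) requires Python 3.8 or later):

p, q = 61, 53
N = p * q                            # 3233
e = 17                               # the public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # the private exponent, 2753

M = 65
C = pow(M, e, N)                     # C = M^e mod N
assert pow(C, d, N) == M             # M = C^d mod N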
Breaking RSA
Like many cryptosystems, RSA relies on the presumed dif-
ficulty of a particular mathematical problem. For RSA, this
is the RSA problem, specifically: to find the plaintext mes-
sage M , given a ciphertext C, and public key (N, e) in the
equation:
C ≡ M^e (mod N)
Implementation pitfalls
Right now, there are no known practical complete breaks
against RSA. That’s not to say that systems employing RSA
aren’t routinely broken. Like with most broken cryptosys-
tems, there are plenty of cases where sound components,
improperly applied, result in a useless system. For a more
complete overview of the things that can go wrong with RSA
implementations, please refer to [Bon99] and [AV96]. In this
book, we’ll just highlight a few interesting ones.
PKCS#1 v1.5 padding
Salt
Salt1 is a provisioning system written in Python. It has
one major flaw: it has a module named crypt. Instead of
reusing existing complete cryptosystems, it implements its
own, using RSA and AES provided by a third party package.
For a long time, Salt used a public exponent (e) of 1, which meant the encryption phase didn't actually do anything: C ≡ M^1 ≡ M (mod N).
¹ So, there's Salt the provisioning system, salts the things used in broken password stores, NaCl pronounced “salt” the cryptography library, and NaCl which runs native code in some browsers, and probably a bunch I'm forgetting. Can we stop naming things after it?
OAEP
OAEP, short for optimal asymmetric encryption padding, is
the state of the art in RSA padding. It was introduced by Mi-
hir Bellare and Phillip Rogaway in 1995. [BR95]. Its structure
looks like this:
M ∥000 . . . = X ⊕ G(R)
G(R) = G(H(X) ⊕ Y )
10
Hash functions
10.1 Description
Hash functions are functions that take an input of indeter-
minate length and produce a fixed-length value, also known
as a “digest”.
Simple hash functions have many applications. Hash ta-
bles, a common data structure, rely on them. These sim-
ple hash functions really only guarantee one thing: for two
identical inputs, they’ll produce an identical output. Impor-
tantly, there’s no guarantee that two identical outputs imply
that the inputs were the same. That would be impossible: there's only a finite number of digests, since they're fixed size, but there's an infinite number of inputs. A good hash
function is also quick to compute.
Since this is a book on cryptography, we’re particularly
interested in cryptographic hash functions. Cryptographic
hash functions can be used to build secure (symmetric) mes-
sage authentication algorithms, (asymmetric) signature al-
gorithms, and various other tools such as random number
generators. We’ll see some of these systems in detail in fu-
ture chapters.
10.2 MD5
MD5 is a hash function designed by Ronald Rivest in 1991
as an extension of MD4. This hash function outputs 128-
bit digests. Over the course of the years, the cryptographic
community has repeatedly uncovered MD5’s weaknesses. In
1993, Bert den Boer and Antoon Bosselaers published a pa-
per demonstrating “pseudo-collisions” for the compression
function of MD5. [dBB93] Dobbertin expanded upon this re-
search and was able to produce collisions for the compres-
sion function. In 2004, based on Dobbertin’s work, Xiaoyun
Wang, Dengguo Feng, Xuejia Lai and Hongbo Yu showed that
MD5 is vulnerable to real collision attacks. [LWdW05] The
last straw came when Xiaoyun Wang et al. managed to gen-
erate colliding X.509 certificates and then presented a distin-
guishing attack on HMAC-MD5. [LWdW05] [WYW+09]
Nowadays, it is not recommended to use MD5 for gen-
erating digital signatures, but it is important to note that
HMAC-MD5 is still a secure form of message authentication;
however, it probably shouldn’t be implemented in new cryp-
tosystems.
Five steps are required to compute an MD5 message di-
gest:
1. Add padding. First, a single 1 bit is appended to the message, and then 0 bits are added to the end until the length is congruent to 448 (mod 512).

2. Fill up the remaining 64 bits with the length of the original message modulo 2^64, so that the entire message is a multiple of 512 bits.
4. Process the input in 512 bit blocks; for each block, run
four “rounds” consisting of 16 similar operations each.
The operations all consist of shifts, modular addition,
and a specific nonlinear function, different for each
round.
import hashlib
hashlib.md5(b"crypto101").hexdigest()
10.3 SHA-1
SHA-1 is another hash function from the MD4 family de-
signed by the NSA, which produces a 160-bit digest. Just like
MD5, SHA-1 is no longer considered secure for digital signa-
tures. Many software companies and browsers, including
Google Chrome, have started to retire support of the signa-
ture algorithm of SHA-1. On February 23, 2017 researchers
from CWI Amsterdam and Google managed to produce a col-
lision on the full SHA-1 function. [SBK+] In the past methods
to cause collisions on reduced versions of SHA-1 have been
published, including one by Xiaoyun Wang. “The SHAppen-
ing” demonstrated freestart collisions for SHA-1. A freestart
collision allows one to pick the initial value known as the
initialization vector at the start of the compression function.
[SKP15]
import hashlib
hashlib.sha1(b"crypto101").hexdigest()
10.4 SHA-2
SHA-2 is a family of hash functions including SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224 and SHA-512/256, with digest sizes of 224, 256, 384, 512, 224 and 256 bits respectively.
These hash functions are based on the Merkle–Damgård
construction and can be used for digital signatures, message
authentication and random number generators. SHA-2 not
only performs better than SHA-1, it also provides better se-
curity, because of its increase in collision resistance.
SHA-224 and SHA-256 were designed for 32-bit proces-
sor registers, while SHA-384 and SHA-512 for 64-bit registers.
The 32-bit register variants will therefore run faster on a 32-
bit CPU and the 64-bit variants will perform better on a 64-bit
CPU. SHA-512/224 and SHA-512/256 are truncated versions
of SHA-512 allowing use of 64-bit words with an output size
equivalent to the 32-bit register variants (i.e., 224 and 256 di-
gest sizes and better performance on a 64-bit CPU).
The following table gives an overview of the SHA-2 family:

Hash function   Digest size   Word size   Block size
SHA-224         224 bits      32 bits     512 bits
SHA-256         256 bits      32 bits     512 bits
SHA-384         384 bits      64 bits     1024 bits
SHA-512         512 bits      64 bits     1024 bits
SHA-512/224     224 bits      64 bits     1024 bits
SHA-512/256     256 bits      64 bits     1024 bits
Attacks on SHA-2
Several (pseudo-)collision and preimage attacks have been demonstrated against SHA-256 and SHA-512 with a reduced number of rounds. It is important to note that these reduced-round attacks do not translate into attacks on the full algorithm. For instance,
Somitra Kumar Sanadhya and Palash Sarkar were able to
cause collisions with SHA-256 using 24 of 64 rounds (remov-
ing the last 40 rounds). [SS08]
10.5 Keccak and SHA-3

import hashlib
hashlib.sha3_224(b"crypto101").hexdigest()
hashlib.sha3_256(b"crypto101").hexdigest()
hashlib.sha3_384(b"crypto101").hexdigest()
hashlib.sha3_512(b"crypto101").hexdigest()
Rainbow tables
It turns out that this reasoning is flawed. The number of passwords that people actually use is very limited. Even with very good password practices, they're strings somewhere between 10 and 20 characters, consisting mostly of
things that you can type on common keyboards. In practice
though, people use even worse passwords: things based on
real words (password, swordfish), consisting of few sym-
bols and few symbol types (1234), or with predictable mod-
ifications of the above (passw0rd).
To make matters worse, hash functions are the same ev-
erywhere. If a user re-uses the same password on two sites,
and both of them hash the password using MD5, the values
in the password database will be the same. It doesn’t even
have to be per-user: many passwords are extremely com-
mon (password), so many users will use the same one.
Keep in mind that a hash function is easy to evaluate.
What if we simply try many of those passwords, creating
huge tables mapping passwords to their hash values?
That’s exactly what some people did, and the tables were
just as effective as you’d expect them to be, completely break-
ing any vulnerable password store. Such tables are called
rainbow tables. This is because they’re essentially sorted
lists of hash function outputs. Those outputs will be more or
less randomly distributed. When written down in hexadec-
imal formats, this reminded some people of color specifica-
tions like the ones used in HTML, e.g. #52f211, which is
lime green.
Salts
The reason rainbow tables were so incredibly effective was
because everyone was using one of a handful of hash func-
tions. The same password would result in the same hash ev-
erywhere.
This problem was generally solved by using salts. By mix-
ing (appending or prepending1 ) the password with some ran-
dom value before hashing it, you could produce completely
different hash values out of the same hash function. It ef-
fectively turns a hash function into a whole family of related
hash functions, with virtually identical security and perfor-
mance properties, except with completely different output
values.
The salt value is stored next to the password hash in the
database. When the user authenticates using the password,
you just combine the salt with the password, hash it, and
compare it against the stored hash.
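To illustrate the mechanism (not as an endorsement of plain salted hashes for password storage), the same password hashed with two different salts yields completely different digests:

import hashlib
import os

password = b"correct horse battery staple"
salt1, salt2 = os.urandom(32), os.urandom(32)

print(hashlib.sha256(salt1 + password).hexdigest())
print(hashlib.sha256(salt2 + password).hexdigest())    # entirely different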
If you pick a sufficiently large (say, 160 bits, or 20 bytes), cryptographically random salt, you've completely defeated
ahead-of-time attacks like rainbow tables. In order to suc-
cessfully mount a rainbow table attack, an attacker would
have to have a separate table for each of those salt values.
Since even a single table was usually quite large, storing a
large amount of them would be impossible. Even if an at-
tacker would be able to store all that data, they’d still have
to compute it first. Computing a single table takes a decent
amount of time; computing 2^160 different tables is impossible.
Many systems used a single salt for all users. While
that prevented an ahead-of-time rainbow table attack, it still
allowed attackers to attack all passwords simultaneously,
once they knew the value of the salt. An attacker would
simply compute a single rainbow table for that salt, and
compare the results with the hashed passwords from the
database. While this would have been prevented by using a
different salt for each user, systems that use a cryptographic
hash with a per-user salt are still considered fundamentally
¹ While you could also do this with XOR, it's needlessly more error-prone, and doesn't provide better results. Unless you zero-pad both the password and the salt, you might be truncating either one.
broken today; they are just harder to crack, but not at all se-
cure.
Perhaps the biggest problem with salts is that many
programmers were suddenly convinced they were doing
the right thing. They’d heard of broken password storage
schemes, and they knew what to do instead, so they ignored
all talk about how a password database could be compro-
mised. They weren’t the ones storing passwords in plain-
text, or forgetting to salt their hashes, or re-using salts for
different users. It was all of those other people that didn’t
know what they were doing that had those problems. Un-
fortunately, that’s not true. Perhaps that’s why broken pass-
word storage schemes are still the norm.
² Directed graphs, where each node except the root has exactly one ancestor.
³ Each non-leaf node has no more than two children.
11
Message authentication codes
11.1 Description
A MAC is a small bit of information that can be used to check
the authenticity and the integrity of a message. These codes
are often called “tags”. A MAC algorithm takes a message
of arbitrary length and a secret key of fixed length, and pro-
duces the tag. The MAC algorithm also comes with a verifi-
cation algorithm that takes a message, the key and a tag, and
tells you if the tag was valid or not. (It is not always sufficient
to just recompute a tag and check if they are the same; many
secure MAC algorithms are randomized, and will produce
different tags every time you apply them.)
Note that we say “message” here instead of “plaintext”
or “ciphertext”. This ambiguity is intentional. In this book
we’re mostly interested in MACs as a way to achieve authenti-
cated encryption, so the message will always be a ciphertext.
That said, there’s nothing wrong with a MAC being applied
to a plaintext message. In fact, we will be seeing examples of that later on.
Secure MACs
We haven’t quite defined yet exactly which properties we
want from a secure MAC.
We will be defending against an active attacker. The at-
tacker will be performing a chosen message attack. That
means that an attacker will ask us the tag for any number
of messages mi , and we’ll answer truthfully with the appro-
priate tag ti .
An attacker will then attempt to produce an existential
forgery, a fancy way of saying that they will produce some
new valid combination of (m, t). The obvious target for the
attacker is the ability to produce valid tags t′ for new mes-
sages m′ of their choosing. We will also consider the MAC
insecure if an attacker can compute a new, different valid
tag t′ for a message mi that we previously gave them a valid
tag for.
Authenticate-then-encrypt
Authenticate-then-encrypt is a poor choice, but it’s a subtle
poor choice. It can still be provably secure, but only under
certain conditions. [Kra01]
At first sight, this scheme appears to work. Sure, you
have to decrypt before you can do anything, but to many
cryptographers, including the designers of TLS, this did not
appear to pose a problem.
In fact, prior to rigorous comparative study of differ-
ent composition mechanisms, many preferred this setup.
Authenticate-and-encrypt
Authenticate-and-encrypt has some serious problems.
Since the tag authenticates the plaintext and that tag is
part of the transmitted message, an attacker will be able
to recognize two plaintext messages are the same because
their tags will also be the same. This essentially leads to the
same problem we saw with ECB mode, where an attacker
can identify identical blocks. That’s a serious problem, even
if they can’t decrypt those blocks.
TODO: Explain how this works in SSH (see Moxie’s Doom
article)
t = H(k∥m)
Breaking prefix-MAC
Despite being quite common, this MAC is actually com-
pletely insecure for most (cryptographically secure!) hash
functions H, including SHA-2.
As we saw in the chapter on hash functions, many hash
functions, such as MD5, SHA-0, SHA-1 and SHA-2, pad the
message with a predictable padding before producing the
output digest. The output digest is the same thing as the
internal state of the hash function. That’s a problem: the
attacker can use those properties to forge messages.
First, they use the digest as the internal state of the hash
function. That state matches the state you get when you hash
k∥m∥p, where k is the secret key, m is the message, and p is
def parse(s):
    # Parses a string like "user=lvh&role=admin" into a dictionary.
    pairs = s.split("&")
    parsed = {}
    for pair in pairs:
        key, value = pair.split("=")
        parsed[key] = value
    return parsed
¹ I realize there are briefer ways to write that function. I am trying to make it comprehensible to most programmers, not pleasing to advanced Pythonistas.
Variants
Issues with prefix-MAC have tempted people to come up with all sorts of clever variations. For example, why not add
the key to the end instead of the beginning (t = H(m∥k),
or “suffix-MAC”, if you will)? Or maybe we should append
the key to both ends for good measure (t = H(k∥m∥k),
“sandwich-MAC” perhaps?)?
For what it’s worth, both of these are at least better than
prefix-MAC, but both of these have serious issues. For exam-
ple, a suffix-MAC system is more vulnerable to weaknesses in the underlying hash function: a collision in H directly yields two different messages with the same tag.
11.4 HMAC
HMAC is a standard to produce a MAC with a cryptographic
hash function as a parameter. It was introduced in 1996 in a
paper by Bellare, Canetti and Krawczyk. Many protocols at
the time implemented their own attempt at message authen-
tication using hash functions. Most of these attempts failed.
The goal of that paper specifically was to produce a provably
secure MAC that didn’t require anything beyond a secret key
and a hash function.
One of the nice features of HMAC is that it has a fairly
strong security proof. As long as the underlying hash func-
tion is a pseudorandom function, HMAC itself is also a pseu-
dorandom function. The underlying hash function doesn’t
even have to be collision resistant for HMAC to be a secure
MAC. [Bel06] This proof was introduced after HMAC itself,
and matched real-world observations: even though MD5
and to a lesser extent SHA-0 had serious collision attacks,
HMAC constructions built from those hash functions still ap-
peared to be entirely secure.
The biggest difference between HMAC and prefix-MAC
or its variants is that the message passes through a hash func-
tion twice, and is combined with the key before each pass.
Visually, HMAC looks like this: HMAC(k, m) = H((k ⊕ pouter) ∥ H((k ⊕ pinner) ∥ m)), where the key is first padded out to the hash function's block size.
The only surprising thing here perhaps are the two con-
stants pinner (the inner padding, one hash function’s block
length worth of 0x36 bytes) and pouter (the outer padding,
one block length worth of 0x5c bytes). These are necessary
for the security proof of HMAC to work; their particular values are otherwise not that important.
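In practice you do not implement HMAC yourself; Python's standard library already provides it. A quick usage sketch with a made-up key and message:

import hashlib
import hmac

tag = hmac.new(b"secret key", b"a message", hashlib.sha256).hexdigest()
print(tag)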
t ≡ m · a + b (mod p)
P (M, a) ≡ a · (a · (a · (· · · ) + m2 ) + m1 ) + b (mod p)
Re-using a and b
We’ll illustrate that our example MAC is insecure if it is
used to authenticate two messages m1 , m2 with the same key
(a, b):
t1 ≡ m1 · a + b (mod p)
t2 ≡ m2 · a + b (mod p)
Subtracting the two tag equations eliminates b, so an attacker who sees both tags can first recover a:

a ≡ (t1 − t2) · (m1 − m2)^−1 (mod p)

With a known, b follows by reordering the first equation:

t1 ≡ m1 · a + b (mod p)
⇓ (reorder terms)
b ≡ t1 − m1 · a (mod p)
As you can see, as with one-time pads, re-using the key even
once leads to a complete failure of the cryptosystem to pre-
serve privacy or integrity, as the case may be. As a result,
one-time MACs are a bit dangerous to use directly. For-
tunately, this weakness can be solved with a construction
called a Carter-Wegman MAC, which we’ll see in the next
section.
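A toy numeric check of that key recovery (all values are made up and far too small for real use; pow(x, -1, p) needs Python 3.8 or later):

p = 2**61 - 1                        # a prime modulus
a, b = 1234567, 7654321              # the one-time key, incorrectly used twice
m1, m2 = 42, 1337

t1 = (m1 * a + b) % p
t2 = (m2 * a + b) % p

a_recovered = ((t1 - t2) * pow(m1 - m2, -1, p)) % p
b_recovered = (t1 - m1 * a_recovered) % p
assert (a_recovered, b_recovered) == (a, b)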
AEAD
AEAD is a feature of certain modes of authenticated encryp-
tion. Such modes of operation are called AEAD modes. It
starts with the premise that many messages actually consist
of two parts:
Usually, you will want to use a much higher-level cryptosystem, such as OpenPGP, NaCl or TLS.
OCB mode is an AEAD mode of operation. It is one of the
earliest developed AEAD modes.
Usually, you will want to use a much higher-level cryptosystem, such as OpenPGP, NaCl or TLS.
GCM mode is an AEAD mode with an unfortunate case
of RAS (redundant acronym syndrome) syndrome: GCM it-
self stands for “Galois Counter Mode”. It is formalized in a
NIST Special Publication [gcm07] and roughly boils down to
a combination of classical CTR mode with a Carter-Wegman
MAC. That MAC can be used by itself as well, which is called
GMAC.
Authentication
GCM mode (and by extension GMAC)
12
Signature algorithms
12.1 Description
A signature algorithm is the public-key equivalent of a message authentication code. It consists of three parts: a key generation algorithm, a signing algorithm, and a verification algorithm.
PSS
TODO (see #49)
12.3 DSA
The Digital Signature Algorithm (DSA) is a US Federal Gov-
ernment standard for digital signatures. It was first pro-
posed by the National Institute of Standards and Technology
(NIST) in 1991, to be used in the Digital Signature Standard
(DSS). The algorithm is attributed to David W. Kravitz, a for-
mer technical advisor at the NSA.
DSA key generation happens in two steps. The first step
is a choice of parameters, which can be shared between
users. The second step is the generation of public and pri-
vate keys for a single user.
Parameter generation
We start by picking an approved cryptographic hash func-
tion H. We also pick a key length L and a prime length N .
While the original DSS specified that L be between 512 and
1024, NIST now recommends a length of 3072 for keys with a
security lifetime beyond 2030. As L increases, so should N .
Next we choose a prime q of length N bits; N must be
less than or equal to the length of the hash output. We also
pick an L-bit prime p such that p − 1 is a multiple of q.
Key generation
Armed with parameters, it’s time to compute public and pri-
vate keys for an individual user. First, select a random x with 0 < x < q. Next, calculate y where y ≡ g^x (mod p). This delivers a public key (p, q, g, y), and private key x.
Signing a message
In order to sign a message, the signer picks a random k be-
tween 0 and q. Picking that k turns out to be a fairly sensitive
and involved process; but we’ll go into more detail on that
later. With k chosen, they then compute the two parts of the signature r, s of the message m:

r ≡ (g^k mod p) (mod q)
s ≡ k^−1 · (H(m) + x·r) (mod q)
Verifying a signature
Verifying the signature is a lot more complex. Given the mes-
sage m and signature (r, s):
w ≡ s^−1 (mod q)
u1 ≡ w · H(m) (mod q)
u2 ≡ w · r (mod q)
v ≡ (g^u1 · y^u2 mod p) (mod q)

The signature is valid if v = r.
The random number k used during signing has to meet three requirements:

• It has to be unique.
• It has to be unpredictable.
• It has to be secret.
If the same k is used for two signatures, both signatures share the same r value:

ri ≡ (g^k mod p) (mod q)

Since the same k was used, an attacker can subtract the two s values: s1 − s2 ≡ k^−1 · (H(m1) − H(m2)) (mod q), and solve for k:

k ≡ (H(m1) − H(m2)) · (s1 − s2)^−1 (mod q)

The two values s1 and s2 are part of the signatures the attacker saw. So, the attacker can compute k. That doesn't give him the private key x yet, though, or the ability to forge signatures.
Let’s write the equation for s down again, but this time
thinking of k as something we know, and x as the variable
we’re trying to solve for:
s·k ≡ H(m) + x·r (mod q)
s·k − H(m) ≡ x·r (mod q)
x ≡ r^−1 · (s·k − H(m)) (mod q)
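Here is a toy demonstration of that recovery, with deliberately tiny parameters and a stand-in “hash”, just to check the algebra; none of these numbers are secure, and the modular inverses via pow need Python 3.8 or later.

q = 101                              # a toy prime
p, g = 607, 64                       # 607 is prime, 606 = 6 * 101, and 64 has order q mod p
x = 57                               # the private key
y = pow(g, x, p)

def H(m):                            # a toy stand-in for a real hash function
    return m % q

def sign(m, k):
    r = pow(g, k, p) % q
    s = (pow(k, -1, q) * (H(m) + x * r)) % q
    return r, s

k = 33                               # the same nonce, reused for two messages
(r1, s1), (r2, s2) = sign(42, k), sign(77, k)

k_rec = ((H(42) - H(77)) * pow(s1 - s2, -1, q)) % q
x_rec = ((s1 * k_rec - H(42)) * pow(r1, -1, q)) % q
assert (k_rec, x_rec) == (k, x)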
12.4 ECDSA
TODO: explain (see #53)
As with regular DSA, the choice of k is extremely critical.
There are attacks that manage to recover the signing key us-
ing a few thousand signatures when only a few bits of the
nonce leak. [MHMP13]
13
Key derivation functions

13.1 Description
A key derivation function is a function that derives one or
more secret values (the keys) from one secret value.
Many key derivation functions can also take a (usually op-
tional) salt parameter. This parameter causes the key deriva-
tion function to not always return the same output keys for
the same input secret. As with other cryptosystems, salts
are fundamentally different from the secret input: salts gen-
erally do not have to be secret, and can be re-used.
Key derivation functions can be useful, for example,
when a cryptographic protocol starts with a single secret
value, such as a shared password or a secret derived using
Diffie-Hellman key exchange, but requires multiple secret
values to operate, such as encryption and MAC keys. An-
other use case of key derivation functions is in cryptograph-
ically secure random number generators, which we’ll see in
more detail in a following chapter, where they are used to
extract randomness with high entropy density from many
sources that each have low entropy density.
13.3 PBKDF2
13.4 bcrypt
13.5 scrypt
13.6 HKDF
The HKDF, defined in RFC 5869 [KE] and explained in detail
in a related paper [Kra10], is a key derivation function de-
signed for high entropy inputs, such as shared secrets from a
Diffie-Hellman key exchange. It is specifically not designed
to be secure for low-entropy inputs such as passwords.
HKDF exists to give people an appropriate, off-the-shelf
key derivation function. Previously, key derivation was of-
ten something that was done ad hoc for a particular standard.
Usually these ad hoc solutions did not have the extra provi-
sions HKDF does, such as salts or the optional info parame-
ter (which we’ll discuss later in this section); and that’s only
in the best case scenario where the KDF wasn’t fundamen-
tally broken to begin with.
HKDF is based on HMAC. Like HMAC, it is a generic con-
struction that uses hash functions, and can be built using
any cryptographically secure hash function you want.
return "".join(outputs)[:desired_length]
Like the salt in the extraction phase, the “info” parame-
ter is entirely optional, but can actually greatly increase the
security of the application. The “info” parameter is intended
to contain some application-specific context in which the
key derivation function is being used. Like the salt, it will
cause the key derivation function to produce different val-
ues in different contexts, further increasing its security. For
example, the info parameter may contain information about
the user being dealt with, the part of the protocol the key
derivation function is being executed for or the like. [KE]
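For completeness, here is what using an off-the-shelf HKDF looks like with the third-party cryptography package; the secret, salt and info labels are made up for the example.

import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

shared_secret = os.urandom(32)       # e.g. the output of a Diffie-Hellman exchange
salt = os.urandom(16)

def derive(info):
    return HKDF(algorithm=hashes.SHA256(), length=32,
                salt=salt, info=info).derive(shared_secret)

encryption_key = derive(b"example-app encryption key")
mac_key = derive(b"example-app MAC key")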
14
Random number generators
14.1 Introduction
Many cryptographic systems require random numbers. So
far, we’ve just assumed that they’re available. In this chapter,
we’ll go more in depth about the importance and mechanics
of random numbers in cryptographic systems.
Producing random numbers is a fairly intricate process.
Like with so many other things in cryptography, it’s quite
easy to get it completely wrong but have everything look
completely fine to the untrained eye.
There are three categories of random number generation that we'll consider separately:

• True random number generators
• Cryptographically secure pseudorandom number generators
• Pseudorandom number generators that are not cryptographically secure
True random number generators get their randomness by measuring some unpredictable physical process, for example:

• Thermal processes
• Oscillator drift
• Timing events
Keep in mind that not all of these options necessarily gen-
erate high-quality, truly random numbers. We’ll elaborate
further on how they can be applied successfully anyway.
Radioactive decay
One example of a quantum physical process used to pro-
duce random numbers is radioactive decay. We know that
radioactive substances will slowly decay over time. It’s im-
possible to know when the next atom will decay; that pro-
cess is entirely random. Detecting when such a decay has
occurred, however, is fairly easy. By measuring the time be-
tween individual decays, we can produce random numbers.
Shot noise
Shot noise is another quantum physical process used to pro-
duce random numbers. Shot noise is based on the fact that
light and electricity are caused by the movement of indivisi-
ble little packets: photons in the case of light, and electrons
in the case of electricity.
Nyquist noise
An example of a thermal process used to produce random
numbers is Nyquist noise. Nyquist noise is the noise that
occurs from charge carriers (typically electrons) traveling
through a medium with a certain resistance. That causes
a tiny current to flow through the resistor (or, alternatively
put, causes a tiny voltage difference across the resistor).
i = √(4·kB·T·Δf / R)

v = √(4·kB·T·R·Δf)
These formulas may seem a little scary to those who haven’t
seen the physics behind them before, but don’t worry too
much: understanding them isn’t really necessary to go along
with the reasoning. These formulas are for the root mean
square. If you’ve never heard that term before, you can
roughly pretend that means “average”. ∆f is the bandwidth, T is the temperature (in kelvin), R is the resistance, and kB is Boltzmann's constant.
14.4 Yarrow
14.6 Dual_EC_DRBG
Background
For a long time, the official standards produced by NIST
lacked good, modern cryptographically secure pseudoran-
dom number generators. It had a meager choice, and the
ones that had been standardized had several serious flaws.
NIST hoped to address this issue with a new publication
called SP 800-90, that contained several new cryptographi-
cally secure pseudorandom number generators. This docu-
ment specified a number of algorithms, based on different
cryptographic primitives:
1. Hash functions
2. HMAC
3. Block ciphers
4. Elliptic curves
Right off the bat, that last one jumps out. Using elliptic
curves for random number generation was unusual. Stan-
dards like these are expected to be state-of-the-art, while still
staying conservative. Elliptic curves had been considered
before in an academic context, but that was a far cry from
being suggested as a standard for common use.
There is a second reason elliptic curves seem strange.
HMAC and block ciphers are obviously symmetric algo-
rithms. Hash functions have their applications in asymmet-
ric algorithms such as digital signatures, but aren’t them-
selves asymmetric. Elliptic curves, on the other hand, are
exclusively used for asymmetric algorithms: signatures, key
exchange, encryption.
That said, the choice didn’t come entirely out of the blue.
A choice for a cryptographically secure pseudorandom num-
ber generator with a strong number-theoretical basis isn’t
unheard of: Blum Blum Shub is a perfect example. Those
generators are typically much slower than the alternatives.
Dual_EC_DRBG, for example, is three orders of magnitude slower than its alternatives.
r = ϕ(sP )
That value, r, is used both for producing the output bits and
updating the internal state of the generator. In order to pro-
duce the output bits, a different elliptic curve point, Q, is
used. The output bits are produced by multiplying r with Q,
and running the result through a transformation θ:
o = θ(ϕ(rQ))
s = ϕ(rP )
Because ϕ only returns the x coordinate of a point, it's quite easy for an attacker who sees the output value of ϕ to find points that could have produced that value.
In itself, that’s not necessarily a big deal; but, as we’ll see, it’s
one factor that contributes to the possibility of a backdoor.
Another flaw was shown where points were turned into
pseudorandom bits. The θ function simply discards the 16
most significant bits. Previous designs discarded signifi-
cantly more: for 256-bit curves such as these, they discarded somewhere between 120 and 175 bits.
Failing to discard sufficient bits gave the generator a
small bias. The next-bit property was violated, giving attack-
ers a better than 50% chance of guessing the next bit cor-
rectly. Granted, that chance was only about one in a thou-
sand better than 50%; but that’s still unacceptable for what’s
supposed to be the state-of-the-art in cryptographically se-
cure pseudorandom number generators.
Discarding only those 16 bits has another consequence.
Because only 16 bits were discarded, we only have to guess 2^16 possibilities to find possible values of ϕ(rQ) that pro-
duced the output. That is a very small number: we can sim-
ply enumerate all of them. Those values are the outputs of
ϕ, which as we saw just returns the x coordinate of a point.
Since we know it came from a point on the curve, we just
have to check if our guess is a solution for the curve equa-
tion:
y^2 ≡ x^3 + ax + b (mod p)
That alone does not give the attacker the internal state s. They still need to solve the elliptic curve discrete log problem to find r from rQ, given Q. We're assuming that problem is hard.
Keep in mind that elliptic curves are primitives used for
asymmetric encryption. That problem is expected to be
hard to solve in general, but what if we have some extra in-
formation? What if there’s a secret value e so that eQ = P ?
Let's put ourselves in the shoes of an attacker knowing e. We repeat our math from earlier. One of those points A we just found is the rQ we're looking for. We can compute:

eA = e(rQ) = r(eQ) = rP

…and ϕ(rP) is exactly s, the generator's next internal state. From that point on, the attacker can predict every future output.
Aftermath
TODO: Talk about RSA guy’s comments + snowden leaks
def uint32(n):
    return 0xFFFFFFFF & n

def initialize_state(seed):
    # The multiplier and shift below come from the MT19937 specification.
    state = [seed]
    for i in range(1, 624):
        prev = state[-1]
        state.append(uint32(0x6c078965 * (prev ^ (prev >> 30)) + i))
    return state
For those of you who haven't worked with Python or its bitwise operators: & is bitwise AND, ^ is bitwise XOR (the operation this book writes as ⊕), << and >> shift the bits of a number left and right, and % takes the remainder after division.
def regenerate(s):
    # Regenerates all 624 words of the state, in place.
    for i in range(624):
        y = s[i] & 0x80000000
        y += s[(i + 1) % 624] & 0x7fffffff
        z = s[(i + 397) % 624]
        s[i] = z ^ (y >> 1)
        if y % 2:
            s[i] ^= 0x9908b0df
The % in an expression like s[(i + n) % 624] means
that a next element of the state is looked at, wrapping around
to the start of the state array if there is no next element.
The values 0x80000000 and 0x7fffffff have a spe-
cific meaning when interpreted as sequences of 32 bits.
0x80000000 has only the first bit set; 0x7fffffff has ev-
ery bit except the first bit set. Because these are bitwise
AND’ed together (&), this effectively means that after the first
two lines in the loop, y consists of the first bit of the cur-
rent state element and all the subsequent bits of the next el-
ement.
_TEMPER_MASK_1 = 0x9d2c5680
_TEMPER_MASK_2 = 0xefc60000

def temper(y):
    y ^= uint32(y >> 11)
    y ^= uint32((y << 7) & _TEMPER_MASK_1)
    y ^= uint32((y << 15) & _TEMPER_MASK_2)
    y ^= uint32(y >> 18)
    return y
def untemper(y):
    y ^= y >> 18
    y ^= (y << 15) & _TEMPER_MASK_2
    y = _undo_shift_2(y)
    y = _undo_shift_1(y)
    return y
def _undo_shift_2(y):
    t = y
    for _ in range(5):
        t <<= 7
        t = y ^ (t & _TEMPER_MASK_1)
    return t
def _undo_shift_1(y):
    t = y
    for _ in range(2):
        t >>= 11
        t ^= y
    return t
Cryptographic security
Remember that for cryptographic security, it has to be im-
possible to predict future outputs or recover past outputs
given present outputs. The Mersenne Twister doesn’t have
that property.
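To make that concrete, here is a sketch that clones Python's own Mersenne Twister (the random module) from 624 consecutive 32-bit outputs, using the untemper function above. It assumes the outputs start right at a regeneration boundary, which holds in a fresh interpreter; a real attack has to handle arbitrary alignment.

import random

outputs = [random.getrandbits(32) for _ in range(624)]
cloned_state = [untemper(y) for y in outputs]

clone = random.Random()
clone.setstate((3, tuple(cloned_state + [624]), None))

# The clone now predicts the "real" generator's next outputs.
assert clone.getrandbits(32) == random.getrandbits(32)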
It’s clear that pseudorandom number generators, both
those cryptographically secure and those that aren’t, are en-
tirely defined by their internal state. After all, they are deter-
ministic algorithms: they’re just trying very hard to pretend
not to be. Therefore, you could say that the principal differ-
ence between cryptographically secure and ordinary pseu-
Part III
Complete cryptosystems
15
SSL and TLS

15.1 Description
SSL, short for Secure Socket Layer, is a cryptographic proto-
col originally introduced by Netscape Communications1 for
securing traffic on the Web. The standard is now superseded
by TLS (Transport Layer Security), a standard publicized in
RFCs by the IETF. The term SSL is still commonly used, even
when the speaker actually means a TLS connection. From
now on, this book will only use the term TLS, unless we re-
ally mean the old SSL standard.
Its first and foremost goal is to transport bytes securely, over the Internet or any other insecure medium. [DR] It’s a hybrid cryptosystem: it uses both symmetric and asymmetric algorithms in unison. For example, asymmetric algorithms such as signature algorithms can be used to authenticate peers, while public key encryption algorithms or Diffie-Hellman exchanges can be used to negotiate shared secrets and authenticate certificates. On the symmetric side, stream ciphers (either native ones or block ciphers in a mode of operation) are used to encrypt the actual data being transmitted, and MAC algorithms are used to authenticate it.

1 For those too young to remember, Netscape is a company that used to make browsers.
15.2 Handshakes
TODO: explain a modern TLS handshake
Downgrade attacks
SSL 2.0 made the mistake of not authenticating handshakes.
This made it easy to mount downgrade attacks. A down-
grade attack is a man-in-the-middle attack where an attacker
modifies the handshake messages that negotiate which ci-
phersuite is being used. That way, he can force the clients
to set up the connection using an insecure block cipher, for
example.
Due to cryptographic export restrictions at the time, many ciphers only used 40- or 56-bit keys. Even if the attacker couldn’t break the best encryption both client and server supported, he could probably break the weakest, which is all that is necessary for a downgrade attack to succeed.
This is one of the many reasons that there is an explicit
RFC [TP] prohibiting new TLS implementations from having
SSL v2.0 support.
2 In case I haven’t driven this point home yet: it only goes to show that designing cryptosystems is hard, and you probably shouldn’t do it yourself.
15.7 Attacks

As with most attacks, attacks on TLS can usually be grouped into two distinct categories:

1. Attacks on the protocol itself, such as subverting the CA mechanism;

2. Attacks on a particular implementation.
<input type="hidden"
       name="csrf-token"
       value="TOKEN_VALUE_HERE">
… they can prefix the guess with the known part of that.
In this case, it’s a CSRF token: a random token selected by
the server and given to the client. This token is intended to
prevent malicious third party websites from using the ambi-
ent authority present in the browser (such as session cook-
ies) to make authenticated requests. Without a CSRF token,
a third party website might just make a request to the vulner-
able website; the web browser will provide the stored cookie,
and the vulnerable website will mistake that for an authenticated request.

5 Within limits; specifically within a sliding window, usually 32 kB big. Otherwise, the pointers would grow bigger than the sequences they’re meant to compress.

6 An iframe is a web page embedded within another web page.

7 The key-value pairs in a URL after the question mark, e.g. the x=1&y=2 in http://example.test/path?x=1&y=2.
The attacker makes guesses at the value of the token,
starting with the first byte, and moving on one byte at a
time.8 When they guess a byte correctly, the ciphertext will be just a little shorter: the compression algorithm will notice that it has seen this pattern before, and will be able to compress the plaintext a little better before encrypting. The plaintext, and hence the compressed ciphertext, will therefore be smaller.
They can do this directly when the connection is using a
stream cipher or a similar construction such as CTR mode,
since they produce ciphertexts that are exactly as long as the
plaintexts. If the connection is using a block-oriented mode
such as CBC mode, the difference might get lost in the block
padding. The attacker can solve that by simply controlling
the prefix so that the difference in ciphertext size will be an
entire block.
Once they’ve guessed one byte correctly, they can move
on to the next byte, until they recover the entire token.
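The core idea is easy to demonstrate outside of TLS with any compressor, such as zlib in Python. In this toy version (the SECRET value and the function names are made up for illustration), a guess that matches the next byte of the secret tends to compress into a slightly shorter output:

import zlib

SECRET = b"csrf-token=s3cr3tvalu3"

def response_length(attacker_data: bytes) -> int:
    # Stand-in for the observable size of the compressed, encrypted response;
    # the attacker-controlled data shares a page with the secret.
    return len(zlib.compress(attacker_data + SECRET, 9))

def guess_next_byte(known_prefix: bytes) -> int:
    # The candidate byte that compresses best is the most likely next byte.
    return min(range(0x20, 0x7f),
               key=lambda c: response_length(known_prefix + bytes([c])))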
This attack is particularly interesting for a number of
reasons. Not only is it a completely new class of attack,
widely applicable to many cryptosystems, but compressing
the plaintext prior to encryption was actively recommended
by existing cryptographic literature. It doesn’t require any
particularly advanced tools: you only need to convince the
user to make requests to a vulnerable website, and you only
need to be able to measure the size of the responses. It’s also
extremely effective: the researchers that published BREACH
report being able to extract secrets, such as CSRF tokens,
within one minute.
In order to defend against CRIME, disable TLS compres-
sion. This is generally done in most systems by default. In
order to defend against BREACH, there are a number of pos-
sible options:
• Don’t allow the user to inject arbitrary data into the request.

8 They may be able to move more quickly than just one byte at a time, but this is the simplest way to reason about it.
15.8 HSTS

HSTS, short for HTTP Strict Transport Security, is a way for web servers to communicate that what they’re saying should only ever be transferred over a secure transport. In practice, the only secure transport that is ever used for HTTP is TLS.
Using HSTS is quite simple; the web server just adds
an extra Strict-Transport-Security header to the
response. The header value contains a maximum age
(max-age), which determines how long into the future the
browser can trust that this website will be HSTS-enabled.
This is typically a large value, such as a year. Browsers
successfully remembering that a particular host is HSTS-
enabled is very important to the effectiveness of the scheme,
as we’ll see in a bit. Optionally, the HSTS header can include
the includeSubDomains directive, which details the scope
of the HSTS policy. [HJB]
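For example, a response header enabling HSTS for a year, for the host and all of its subdomains, might look like this (the max-age value of one year in seconds is just a typical choice):

Strict-Transport-Security: max-age=31536000; includeSubDomains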
16

OpenPGP and GPG
16.1 Description
OpenPGP is an open standard that describes a method for encrypting and signing messages. GPG is the most popular implementation of that standard1, available under a free software license.
Unlike TLS, which focuses on data in motion, OpenPGP
focuses on data at rest. A TLS session is active: bytes fly
back and forth as the peers set up the secure channel. An
OpenPGP interaction is, by comparison, static: the sender
computes the entire message up front using information
shared ahead of time. In fact, OpenPGP doesn’t insist that
anything is sent at all: for example, it can be used to sign
software releases.
Like TLS, OpenPGP is a hybrid cryptosystem. Users have key pairs consisting of a public key and a private key. Public key algorithms are used both for signing and encryption. Symmetric key algorithms are used to encrypt the message itself.
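OpenPGP’s actual packet format and algorithm negotiation are more involved, but the hybrid structure is easy to sketch. Here is one arbitrary instantiation (RSA-OAEP wrapping a fresh session key, AES-GCM encrypting the body, via the Python cryptography package); OpenPGP itself makes different formatting and algorithm choices:

import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def hybrid_encrypt(message: bytes, recipient_public_key):
    # Symmetric part: a fresh session key encrypts the (possibly large) message.
    session_key = os.urandom(32)
    nonce = os.urandom(12)
    ciphertext = AESGCM(session_key).encrypt(nonce, message, None)
    # Asymmetric part: only the small session key is public-key encrypted.
    wrapped_key = recipient_public_key.encrypt(
        session_key,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )
    return wrapped_key, nonce, ciphertext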
1 GPG 2 also implements S/MIME, which is unrelated to the OpenPGP standard. This chapter only discusses OpenPGP.
17

Off-The-Record Messaging (OTR)
17.1 Description
OTR messaging is a protocol for securing instant messaging
communication between people [BGB04]. It intends to be
the online equivalent of a private, real-life conversation. It
encrypts messages, preventing eavesdroppers from reading
them. It also authenticates peers to each other, so they know
who they’re talking to. Despite authenticating peers, it is de-
signed to be deniable: participants can later deny to third
parties anything they said to each other. It is also designed
to have perfect forward secrecy: even a compromise of a
long-term public key pair doesn’t compromise any previous
conversations.
The deniability and perfect forward secrecy prop-
erties are very different from those of other systems
such as OpenPGP. OpenPGP intentionally guarantees non-
repudiability. It’s a great property if you’re signing software
packages, talking on mailing lists or signing business invoices, but the authors of OTR argue that those aren’t desir-
able properties for the online equivalent of one-on-one con-
versations. Furthermore, OpenPGP’s static model of com-
munication makes the constant key renegotiation to facili-
tate OTR’s perfect forward secrecy impossible.
OTR is typically configured opportunistically, which
means that it will attempt to secure any communication be-
tween two peers, if both understand the protocol, without
interfering with communication where the other peer does
not. The protocol is supported in many different instant
messaging clients either directly, or with a plugin. Because
it works over instant messages, it can be used across many
different instant messaging protocols.
A peer can signal that they would like to speak OTR with
an explicit message, called the OTR Query message. If the
peer is just willing to speak OTR but doesn’t require it, they
can optionally invisibly add that information to a plaintext
message. That happens with a clever system of whitespace
tags: a bunch of whitespace such as spaces and tab charac-
ters are used to encode that information. An OTR-capable
client can interpret that tag and start an OTR conversation; a client that isn’t OTR-capable just displays some extra whitespace.
OTR uses many of the primitives we’ve seen so far: symmetric encryption, message authentication codes, Diffie-Hellman key exchange, and DSA signatures for the peers’ long-term keys.
Commit message
Initially Alice and Bob are in a protocol state where they wait
for the peer to initiate an OTR connection, and advertise
their own capability of speaking OTR.
Let’s suppose that Bob chooses to initiate an OTR conversation with Alice. His client sends an OTR Commit Message, and then transitions to a state where he waits for a reply from Alice’s client.
To send a commit message, a client picks a random 128-bit value r and a random 320-bit (or larger) Diffie-Hellman secret x. It then sends E(r, g^x) and H(g^x) to the peer.
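As a rough sketch of the values involved (not the exact OTR wire format; the OTR version 3 specification uses AES-128 in CTR mode for E and SHA-256 for H, which is what we assume here):

import os
import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def commit_message(g: int, p: int):
    r = os.urandom(16)                          # random 128-bit value r
    x = int.from_bytes(os.urandom(40), "big")   # random 320-bit DH secret x
    gx = pow(g, x, p)                           # g^x in the Diffie-Hellman group
    gx_bytes = gx.to_bytes((gx.bit_length() + 7) // 8, "big")
    # E(r, g^x): g^x encrypted under the key r (AES-CTR, zero initial counter).
    enc = Cipher(algorithms.AES(r), modes.CTR(b"\x00" * 16)).encryptor()
    encrypted_gx = enc.update(gx_bytes) + enc.finalize()
    hashed_gx = hashlib.sha256(gx_bytes).digest()   # H(g^x)
    return encrypted_gx, hashed_gx, r, x            # r and x stay secret for now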
Key message
Alice’s client has received Bob’s client’s advertisement to
start an OTR session. Her client replies with a key message,
which involves creating a new Diffie-Hellman key pair. She
picks a 320-bit (or larger) Diffie-Hellman secret y and sends g^y to Bob.
Reveal signature message

Bob can now compute the shared secret s = (g^y)^x and use it to derive the keys c, c′, m_1, m_1′, m_2, m_2′. Using his long-term DSA public key p_B and its key id i_B, he computes:

M_B = M_{m1}(g^x, g^y, p_B, i_B)

X_B = (p_B, i_B, S(p_B, M_B))

He sends Alice r, E_c(X_B), M_{m2}(E_c(X_B)).
Signature message

Alice can now confirm she’s talking to Bob directly, because Bob signed the authenticator for the exchange M_B with his long-term DSA key.
Alice can now also compute the shared secret: Bob has sent her r, which was previously used to encrypt Bob’s Diffie-Hellman public key. She then computes H(g^x) herself, to compare it against what Bob sent. By completing her side of the Diffie-Hellman exchange (s = (g^x)^y), she derives the same keys: c, c′, m_1, m_1′, m_2, m_2′. Using m_2, she can verify M_{m2}(E_c(X_B)). Once that message is verified, she can safely decrypt it using her computed c.
Authenticating Alice

Now Bob can also authenticate Alice, again by mirroring steps. First, he verifies M_{m2′}(E_c(X_B)). This allows him to check that Alice saw the same X_B he sent.

Once he decrypts E_{c′}(X_A), he has access to X_A, which is Alice’s long-term public key information. He can then compute M_A = M_{m1′}(g^y, g^x, p_A, i_A) to compare it with the version Alice sent. Finally, he verifies S(p_A, M_A) with Alice’s public key.
Part IV

Appendices

A

Modular arithmetic
(10 + 4) mod 12 = 2

(2 − 5) mod 12 = 9

In these equations, mod is an operator, giving the remainder after division. When we are dealing with modular arithmetic, we usually write the same fact as a congruence instead, with the modulus in parentheses at the end:

10 + 4 ≡ 2 (mod 12)
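In Python, for instance, the % operator computes exactly this remainder, and for a positive modulus it always returns a value between 0 and that modulus:

assert (10 + 4) % 12 == 2
assert (2 - 5) % 12 == 9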
30 = 2 · 3 · 5

360 = 2^3 · 3^2 · 5
A.3 Multiplication

You might remember you were first taught multiplication as repeated addition:

n · x = x + x + … + x  (n times)

Modular multiplication is the same idea: you perform repeated addition, taking the modulus whenever the sum gets larger than the modulus. You can also just do regular multiplication, and then take the modulus at the end.
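A quick check of both approaches in Python, using 7 · 11 (mod 12) as an example:

total = 0
for _ in range(7):
    total = (total + 11) % 12   # repeated addition, reducing as we go

assert total == (7 * 11) % 12 == 5  # same as multiplying first, reducing once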
a^ϕ(n) ≡ 1 (mod n)
A.5 Exponentiation

Like multiplication is taught as repeated addition, exponentiation can be thought of as repeated multiplication:

a^n = a · a · … · a  (n times)

Computing a^n that way takes n − 1 multiplications, but we can do much better by squaring. For example:

2^20 = (2^10)^2

More generally, we can write the exponent out in binary. For example:

209 = 1 · 2^7 + 1 · 2^6 + 0 · 2^5 + 1 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 1 · 2^0
    = 1 · 128 + 1 · 64 + 0 · 32 + 1 · 16 + 0 · 8 + 0 · 4 + 0 · 2 + 1 · 1
    = 128 + 64 + 16 + 1

3^209 = 3^(128 + 64 + 16 + 1)
      = 3^128 · 3^64 · 3^16 · 3^1
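That strategy translates directly into code. Here is a small sketch of modular exponentiation by squaring, walking over the bits of the exponent (Python’s built-in pow(a, k, n) already does this for you):

def modexp(a, k, n):
    result = 1
    square = a % n
    while k:
        if k & 1:                       # this bit of the exponent is set
            result = (result * square) % n
        square = (square * square) % n  # a^1, a^2, a^4, a^8, ... modulo n
        k >>= 1
    return result

assert modexp(3, 209, 1000) == pow(3, 209, 1000)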
k = Σ_{i=0}^{t−1} 2^i · k_i

That definition might look scary, but all you’re really doing here is defining k_i as the bit of k at position i. The sum goes over all the bits: if k is t bits long, and we start indexing at 0, the index of the highest bit is t − 1, and the index of the lowest bit is 0. For example, the binary expansion of the number 6 is 0b110. That number is three bits long, so t = 3. So:

6 = Σ_{i=0}^{t−1} 2^i · k_i
  = Σ_{i=0}^{2} 2^i · k_i
  = k_2 · 2^2 + k_1 · 2^1 + k_0 · 2^0
  = 1 · 2^2 + 1 · 2^1 + 0 · 2^0
L_j = Σ_{i=j}^{t−1} 2^(i−j) · k_i

In other words, L_j is the number formed by the bits of k from position j upwards: it is k shifted right by j positions. For our example k = 6 = 0b110:

L_1 = Σ_{i=1}^{2} 2^(i−1) · k_i
    = 2^1 · k_2 + 2^0 · k_1
    = 2 · 1 + 1 · 1
    = 3
Each L_j can be computed from the next one, L_{j+1}, with a simple recurrence:

L_j = 2 · L_{j+1} + k_j

For example, with k = 0b110010111:

L_j = L_2 = 0b1100101
L_{j+1} = L_3 = 0b110010
2 · L_{j+1} = 2 · L_3 = 0b1100100

Adding the bit k_2 = 1 to that last value indeed gives L_2 back.
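Since L_j is just k with its lowest j bits dropped, you can check these values, and the recurrence, directly in Python:

k = 0b110010111

assert k >> 2 == 0b1100101                       # L_2
assert k >> 3 == 0b110010                        # L_3
assert 2 * (k >> 3) + ((k >> 2) & 1) == k >> 2   # L_2 = 2 * L_3 + k_2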
Now define H_j to be L_j plus one:

H_j = L_j + 1 ⇐⇒ L_j = H_j − 1

Substituting that into the recurrence, one L_{j+1} at a time:

L_j = 2 · L_{j+1} + k_j
⇓ (L_{j+1} = H_{j+1} − 1)
L_j = L_{j+1} + k_j + H_{j+1} − 1
⇓ (L_{j+1} = H_{j+1} − 1)
L_j = 2 · H_{j+1} + k_j − 2
B

Elliptic curves

y^2 = x^3 + ax + b

x^2 + y^2 = 1 + d · x^2 · y^2
C

Side-channel attacks
Part V
Glossary
IV initialization vector
MITM man-in-the-middle
mode of operation
OTR off-the-record
Index
A
AEAD, 207
AEAD mode, 207
AES, 207
AKE, 207
ARX, 207
asymmetric-key algorithm, 207
asymmetric-key encryption, 207
B
BEAST, 207
block cipher, 207
C
Carter-Wegman MAC, 207
CBC, 207
CBC mode, 207
CDN, 207
cross-site request forgery, 207
CSPRNG, 208
CSRF, 208
CTR mode, 208
D
DES, 208
E
ECB mode, 208
encryption oracle, 208
F
FIPS, 208
G
GCM, 208
GCM mode, 208
GMAC, 208
H
HKDF, 208
HMAC, 208
HSTS, 208
I
initialization vector, 208
IV, 208
K
KDF, 209
key agreement, 209
key exchange, 209
keyspace, 209
M
MAC, 209
message authentication code, 209
MITM, 209
mode of operation, 209
modes of operation, 209
N
nonce, 209
O
OCB, 209
OCB mode, 209
one-time MAC, 209
oracle, 209
OTR, 209
OTR messaging, 210
P
PRF, 210
PRNG, 210
PRP, 210
public-key algorithm, 210
public-key encryption, 210
R
RSA, 210
S
salt, 210
secret-key encryption, 210
SMP, 210
stream cipher, 210
substitution-permutation network, 211
symmetric-key encryption, 211
Part VI
References
Bibliography