Lecture 1
Intro to Crypto and Cryptocurrencies
This lecture
Crypto background
hash functions
digital signatures
… and applications
Intro to cryptocurrencies
basic digital cash
Lecture 1.1:
Cryptographic Hash Functions
Hash function:
takes any string as input
fixed-size output (we’ll use 256 bits)
Security properties:
Property 1: collision-free
Property 2: pre-image resistance (hiding)
Property 3: second pre-image resistance (puzzle-friendly)
Hash property 1: Collision-free
Nobody can find x and y such that
x != y and H(x)=H(y)
Collisions do exist ...
[Figure: the possible inputs are mapped onto a smaller set of possible outputs; two inputs x and y land on the same output, H(x) = H(y)]
… but can anyone find them?
Collision-Resistance
● Means it should be hard to find two different inputs, of any length, that
result in the same hash. A hash function with this property is also called
collision-free.
○ for a hash function h, it is hard to find any two different inputs x and y such that h(x) = h(y).
● Since a hash function compresses inputs of any length into a fixed-length
hash, it cannot avoid having collisions. Collision resistance only guarantees
that these collisions are hard to find.
● This property makes it very difficult for an attacker to find two input
values with the same hash.
Birthday Paradox – Finding collisions
● if you gather about 23 people in one room, the odds that two of them
share the exact same birthday are far higher than intuition suggests:
there is roughly a 50-50 chance (with 30 people it is already about 70%)!
○ assuming that all days of the year are equally likely birthdays, the
chance that one particular other person shares your birthday is only
1/365, which is about 0.27%.
How to find the collisions?
● Suppose an event has N different, equally likely outcomes. Then you
need on the order of √N random samples (more precisely about 1.18·√N)
for a 50% chance that two of them coincide
○ So applying this to birthdays, with N = 365 possible birthdays,
√365 ≈ 19.1 and 1.18 × 19.1 ≈ 22.5, so about 23 randomly chosen people
give a 50% chance of two people sharing a birthday.
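The √N rule of thumb can be checked numerically. A minimal Python sketch (the function names are our own, for illustration) that computes the exact probability of at least one collision among k uniform draws from N values, and the smallest k that reaches 50%:

```python
import math

def collision_probability(k: int, n: int) -> float:
    """Probability that at least two of k uniform draws from n values coincide."""
    p_all_distinct = 1.0
    for i in range(k):
        p_all_distinct *= (n - i) / n  # the (i+1)-th draw avoids the i earlier ones
    return 1.0 - p_all_distinct

def draws_for_half(n: int) -> int:
    """Smallest k whose collision probability is at least 50%."""
    k = 1
    while collision_probability(k, n) < 0.5:
        k += 1
    return k

print(draws_for_half(365))                          # 23
print(round(collision_probability(23, 365), 3))     # 0.507
print(round(math.sqrt(2 * math.log(2) * 365), 1))   # 22.5, the ~1.18·√N estimate
```

The closed-form estimate √(2·ln 2·N) ≈ 1.18·√N comes from approximating the product by e^(−k²/2N).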
How to find a collision?
Usually, a collision appears after about √N tries, where
N is the total number of possible outputs
For example, for a 256-bit output, N = 2^256:
try 2^130 randomly chosen inputs
more than 99.9% chance that two of them will collide
This works no matter what H is …
… but it takes too long to matter
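To see that the generic attack "works no matter what H is", here is a sketch that runs it against SHA-256 deliberately truncated to its first 32 bits (the truncation is our assumption, to make the search finish quickly). By the √N rule it needs roughly 2^16 tries; against the full 256-bit output the same loop would need about 2^128.

```python
import hashlib

def truncated_hash(data: bytes, bits: int) -> int:
    """First `bits` bits of SHA-256(data): a deliberately weakened hash."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def find_collision(bits: int) -> tuple[bytes, bytes]:
    """Generic birthday attack: hash distinct inputs until two outputs match."""
    seen: dict[int, bytes] = {}
    counter = 0
    while True:
        x = counter.to_bytes(8, "big")  # distinct input on every iteration
        h = truncated_hash(x, bits)
        if h in seen:
            return seen[h], x
        seen[h] = x
        counter += 1

x, y = find_collision(32)
print(x != y, truncated_hash(x, 32) == truncated_hash(y, 32))  # True True
```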
Is there a faster way to find collisions?
For some hash functions, yes.
For others, we don't know of one.
Finding two inputs with the same hash is infeasible, but not impossible.
No hash function has been proven collision-free.
Application: Hash as message digest
If we know H(x) = H(y),
it’s safe to assume that x = y.
To recognize a file that we saw before,
just remember its hash.
Useful because the hash is small.
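A minimal sketch of "remember the hash, not the file", using Python's hashlib (the helper names are hypothetical). Only the 32-byte SHA-256 digest of each file is stored:

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

seen_digests: set[str] = set()

def seen_before(data: bytes) -> bool:
    """Check whether we have seen these exact contents, keeping only their digest."""
    d = digest(data)
    if d in seen_digests:
        return True
    seen_digests.add(d)
    return False

report = b"quarterly report, final version"
print(seen_before(report))         # False: first time we see it
print(seen_before(report))         # True: same contents, same digest
print(seen_before(report + b"!"))  # False: any change gives a different digest
```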
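Hash property 2: Pre-image resistance (Hiding)
We want something like this:
Given H(x), it is infeasible to find x.
But if x comes from a small set, this fails: given H(“heads”) or H(“tails”),
it is easy to find x by hashing both candidates and comparing!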
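The heads/tails problem can be shown directly: when the input space is tiny, an attacker simply hashes every candidate. This Python sketch (function names are our own) recovers x from H(x); it is exactly what the random, high min-entropy r in H(r | x) is meant to rule out.

```python
import hashlib
from typing import Optional

def H(x: str) -> str:
    return hashlib.sha256(x.encode()).hexdigest()

def invert_small_space(target: str, candidates: list[str]) -> Optional[str]:
    """Recover x from H(x) by trying every candidate; works only for tiny input spaces."""
    for x in candidates:
        if H(x) == target:
            return x
    return None

observed = H("tails")
print(invert_small_space(observed, ["heads", "tails"]))  # tails
```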
Hash property 2: Pre-image resistance (Hiding)
Hiding property:
If r is chosen from a probability distribution that has high
min-entropy, then given H(r | x), it is infeasible to find x.
(Here | denotes concatenation.)
High min-entropy means that the distribution is “very
spread out”, so that no particular value is chosen with more
than negligible probability.
Pre-image resistance
● Means that it should be computationally hard
to reverse a hash function.
○ if a hash function h produced a hash value
z, then it should be a difficult process to
find any input value x that hashes to z.
● This property protects against an attacker who
only has a hash value and is trying to find the
input.
Application: Commitment
Want to “seal a value in an envelope”, and
“open the envelope” later.
Commit to a value, reveal it later.
Commitment API
(com, key) := commit(msg)
match := verify(com, key, msg)
To seal msg in envelope:
(com, key) := commit(msg) -- then publish com
To open envelope:
publish key, msg
anyone can use verify() to check validity
Commitment API
(com, key) := commit(msg)
match := verify(com, key, msg)
Security properties:
Hiding: Given com, infeasible to find msg.
Binding: Infeasible to find msg != msg’ such that
verify(com, key, msg’) == true, where (com, key) := commit(msg)
Commitment API
commit(msg) := ( H(key | msg), key )
where key is a random 256-bit value
verify(com, key, msg) := ( H(key | msg) == com )
Security properties:
Hiding: Given H(key | msg), infeasible to find msg.
Binding: Infeasible to find msg != msg’ such that
H(key | msg) == H(key | msg’)
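The commitment scheme above maps almost line for line onto Python's standard library. A minimal sketch, assuming SHA-256 for H, `secrets.token_bytes(32)` for the random 256-bit key, and plain byte concatenation for `|` (unambiguous here because the key has a fixed length):

```python
import hashlib
import hmac
import secrets

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def commit(msg: bytes) -> tuple[bytes, bytes]:
    """Return (com, key) with com = H(key | msg) and a fresh random 256-bit key."""
    key = secrets.token_bytes(32)
    return H(key + msg), key

def verify(com: bytes, key: bytes, msg: bytes) -> bool:
    """Recompute H(key | msg) and compare it with the published commitment."""
    return hmac.compare_digest(H(key + msg), com)

com, key = commit(b"heads")        # seal: publish com, keep key and msg
print(verify(com, key, b"heads"))  # True: the opened envelope checks out
print(verify(com, key, b"tails"))  # False: cannot open to a different msg
```

Because the key is fresh for every commitment, committing to the same message twice yields unrelated-looking values of com, which is what the hiding property needs.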
Hash property 3: Second pre-image resistance (Puzzle-friendly)
Puzzle-friendly:
For every possible output value y,
if k is chosen from a distribution with high min-entropy,
then it is infeasible to find x such that H(k | x) = y.
Second Pre-image resistance
● Means given an input and its hash, it should be hard to
find a different input with the same hash.
○ if a hash function h for an input x produces hash value
h(x), then it should be difficult to find any other input
value y such that h(y) = h(x).
● This property of a hash function protects against an attacker
who has an input value and its hash, and wants to pass off a
different value as legitimate in place of the original input value.
Application: Search puzzle
Given a “puzzle ID” id (from high min-entropy distrib.),
and a target set Y:
Try to find a “solution” x such that
H(id | x) ∈ Y.
Puzzle-friendly property implies that no solving strategy is
much better than trying random values of x.
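A sketch of a search puzzle in Python. As an assumption for illustration, the target set Y is "all 256-bit outputs below 2^(256 − zero_bits)", i.e. hashes that start with `zero_bits` zero bits, which is similar in spirit to Bitcoin mining but not its exact format. The solver is the brute-force strategy the puzzle-friendly property says cannot be beaten by much.

```python
import hashlib
import secrets

def H(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def solve_puzzle(puzzle_id: bytes, zero_bits: int) -> int:
    """Try x = 0, 1, 2, ... until H(id | x) falls in the target set Y."""
    target = 1 << (256 - zero_bits)  # Y = all outputs below this bound
    x = 0
    while H(puzzle_id + x.to_bytes(8, "big")) >= target:
        x += 1
    return x

def check_solution(puzzle_id: bytes, x: int, zero_bits: int) -> bool:
    """Verifying a solution costs a single hash."""
    return H(puzzle_id + x.to_bytes(8, "big")) < (1 << (256 - zero_bits))

puzzle_id = secrets.token_bytes(32)       # high min-entropy puzzle ID
x = solve_puzzle(puzzle_id, 16)           # about 2^16 tries expected
print(check_solution(puzzle_id, x, 16))   # True
```

Note the asymmetry: solving takes about 2^zero_bits hashes on average, while checking takes one.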
Pictorial representations of properties of
Hash Function
Examples of cryptographic hash functions
● MD5:
○ It produces a 128-bit hash. Collision resistance was broken
after ~2^21 hashes.
● SHA-1:
○ Produces a 160-bit hash. Collision resistance was broken after
~2^61 hashes.
● SHA-256:
○ Produces a 256-bit hash. It is currently used by
Bitcoin.
● Keccak-256:
○ Produces a 256-bit hash and is currently used by Ethereum.
SHA-256 hash function
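[Figure: SHA-256 as a chain of compression functions c. The message is split into 512-bit blocks (block 1, block 2, …, block n), with padding (10* | length) at the end. Starting from a 256-bit IV, each c takes the previous 256-bit output and the next block; the final 256-bit output is the hash.]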
Theorem: If c is collision-free, then SHA-256 is collision-free.
SHA-256 Operation
● Takes the message you're hashing and breaks it up into 512-bit
blocks. The message is first padded so that its length is a multiple
of 512 bits: a 1 bit, then as many 0 bits as needed, then a 64-bit
encoding of the original message length
● Start with a 256-bit value called the IV, specified in the standards
document, and the first block. This 768-bit string goes through a
special function c (the compression function) that outputs a 256-bit
string
● Then the compression function is applied to the concatenation of the
first output and the second block (this chaining is the Merkle-Damgård
transform)
● The process is repeated until the end of the blocks; the hash is the
final 256-bit output
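To make the Merkle-Damgård structure concrete, here is a compact, unoptimized Python sketch of SHA-256 written for teaching, following the FIPS 180-4 definition (in practice, use `hashlib.sha256`). The function names `pad`, `compress` and `sha256` are our own. The IV and round constants are computed from primes with exact integer arithmetic instead of being typed in. Only `compress` is specific to SHA-256; the loop in `sha256` is the generic Merkle-Damgård chaining.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # all arithmetic is on 32-bit words

def _icbrt(n: int) -> int:
    """Floor of the cube root of n, by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def _first_primes(count: int) -> list[int]:
    primes: list[int] = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

_PRIMES = _first_primes(64)
# IV: first 32 fractional bits of the square roots of the first 8 primes.
IV = [math.isqrt(p << 64) & MASK for p in _PRIMES[:8]]
# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [_icbrt(p << 96) & MASK for p in _PRIMES]

def _rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK

def pad(message: bytes) -> bytes:
    """Append a 1 bit, then 0 bits, then the 64-bit length, to reach a multiple of 512 bits."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack(">Q", 8 * len(message))

def compress(state: list[int], block: bytes) -> list[int]:
    """Compression function c: 256-bit state + 512-bit block -> new 256-bit state."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        t1 = (h + (_rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((_rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    # Feed-forward: add the input state to the result, word by word.
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def sha256(message: bytes) -> bytes:
    """Merkle-Damgård: start from the IV and chain c over the padded 512-bit blocks."""
    state = IV
    padded = pad(message)
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)

print(sha256(b"abc").hex() == hashlib.sha256(b"abc").hexdigest())  # True
```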
Last 5 rounds of SHA-256 computation
Application of SHA-256 in bitcoin
Lecture 1.2:
Hash Pointers and Data Structures
Pointers and Linked Lists
● Pointers
○ Pointers are variables in programming that store
the address of another variable.
● Linked Lists
○ a sequence of blocks (nodes), each containing data and a
pointer variable that holds the address of the next node;
these pointers are what connect the nodes
○ In a blockchain, the first block is called the “genesis block”
Linked List
Hash Pointer
● hash pointer is:
○ a pointer to where some info is stored, together with the
(cryptographic) hash of that info.
● if we have a hash pointer, we can
○ get the info back, and
○ verify that it hasn’t changed
Hash Pointer
We will draw hash pointers like this: H( ) → (data)
key idea:
build data structures with hash pointers
Blockchain
● Blockchain is linked list with hash pointers
○ A series of blocks, each block has data as well as a hash
pointer to the previous block in the list
■ Benefit: we get the value of the previous block plus a digest of
that value, which lets us verify that the value hasn’t changed
○ Achieves the tamper-evident property because of the
hash pointers
■ Suppose the adversary changes the data of some block k. Since the
data has changed, the hash stored in block k + 1, which is a
hash of the entire block k, will no longer match, thanks to the
collision-resistance property
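linked list with hash pointers = “block chain”
[Figure: blocks each holding data and a field prev: H( ) pointing to the previous block; we keep one hash pointer H( ) to the most recent block]
use case: tamper-evident log
detecting tampering
[Figure: the same chain, showing that tampering with a block's data is detected through the hash pointers that follow it]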
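A tamper-evident log can be sketched in a few lines of Python. The names (`Block`, `append`, `verify`) are hypothetical, and the block encoding (prev hash and data joined with "|") is an assumption made for illustration. We only need to remember the hash of the latest block:

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

@dataclass
class Block:
    data: str
    prev_hash: Optional[str]  # hash pointer to the previous block (None for the first)

def block_hash(block: Block) -> str:
    """Hash of the entire block: its data and its own prev pointer."""
    return hashlib.sha256(f"{block.prev_hash}|{block.data}".encode()).hexdigest()

def append(chain: list[Block], data: str) -> str:
    """Add a block pointing at the current head; return the new head hash to remember."""
    prev = block_hash(chain[-1]) if chain else None
    chain.append(Block(data, prev))
    return block_hash(chain[-1])

def verify(chain: list[Block], head_hash: str) -> bool:
    """Every stored hash pointer must match the block it points to."""
    if not chain or block_hash(chain[-1]) != head_hash:
        return False
    return all(chain[i].prev_hash == block_hash(chain[i - 1]) for i in range(1, len(chain)))

log: list[Block] = []
for entry in ["alice pays bob 1", "bob pays carol 1", "carol pays dave 1"]:
    head = append(log, entry)
print(verify(log, head))             # True
log[0].data = "alice pays bob 100"   # tamper with an early block
print(verify(log, head))             # False: block 1's hash pointer no longer matches
```

If the adversary also rewrites the prev pointer of the next block, that block's own hash changes instead, and the mismatch moves forward until it reaches the head hash we kept.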
Merkle Tree
● Binary tree with hash pointers = “Merkle tree”
○ In a Merkle tree, data blocks are grouped in pairs and the hash of
each of these blocks is stored in a parent node.
○ The parent nodes are in turn grouped in pairs and their hashes
stored one level up the tree.
○ This continues all the way up the tree until we reach the root node.
■ if an adversary tampers with some data block at the bottom of
the tree,
■ that will cause the hash pointer that’s one level up to not
match, and
■ even if he keeps tampering with the nodes above to make the
hashes match, the change will eventually propagate to the top of
the tree, where he won’t be able to tamper with the hash pointer
that we’ve stored.
binary tree with hash pointers = “Merkle tree”
[Figure: eight (data) leaves at the bottom; each node above holds two hash pointers H( ) H( ) to its children, up to a single root]
proving membership in a Merkle tree:
show O(log n) items, where n is the total number of leaf nodes
[Figure: the path from one (data) leaf up to the root, showing only the hash pointers H( ) along that path]
Advantages of Merkle trees
● Tree holds many items but just need to remember the root hash
● Can verify membership in O(log n) time/space
● Variant: sorted Merkle tree where the blocks are ordered at the
bottom can verify non-membership in O(log n) (show items
before, after the missing one)
● Proof of non-membership: show the path to the item just before
where the item in question would be, and the path to the item just
after where it would be
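A minimal Merkle tree sketch in Python with membership proofs. Assumptions: single SHA-256 (Bitcoin actually uses double SHA-256), no domain separation between leaves and internal nodes, and an odd node at any level is paired with itself. All function names are our own.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Hash the leaves, then hash pairs level by level until one root is left."""
    levels = [[H(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # assumption: an odd node is paired with itself
        levels.append([H(level[i] + level[i + 1]) for i in range(0, len(level), 2)])
    return levels

def prove(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes on the path from leaf `index` to the root: O(log n) items."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        sibling_hash = level[sibling] if sibling < len(level) else level[index]
        proof.append((sibling_hash, index % 2 == 0))  # True: our node is the left child
        index //= 2
    return proof

def verify_membership(root: bytes, leaf: bytes, proof: list[tuple[bytes, bool]]) -> bool:
    """Recompute the path upward; only the root hash needs to be trusted."""
    node = H(leaf)
    for sibling_hash, node_is_left in proof:
        node = H(node + sibling_hash) if node_is_left else H(sibling_hash + node)
    return node == root

leaves = [f"tx{i}".encode() for i in range(8)]
levels = build_levels(leaves)
root = levels[-1][0]                            # the only thing we must remember
proof = prove(levels, 5)
print(len(proof))                               # 3, i.e. log2(8) sibling hashes
print(verify_membership(root, b"tx5", proof))   # True
print(verify_membership(root, b"tx9", proof))   # False
```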
More generally ...
can use hash pointers in any pointer-based
data structure that has no cycles
Lecture 1.3:
Digital Signatures
What we want from signatures
Only you can sign, but anyone can verify
Signature is tied to a particular document
can’t be cut-and-pasted to another doc
Requirements for signatures
“valid signatures verify”
verify(pk, message, sign(sk, message)) == true
“can’t forge signatures”
adversary who:
knows pk
gets to see signatures on messages of his choice
can’t produce a verifiable signature on another message
Unforgeability game
● Unforgeability game
○ Participants: an adversary who claims that he can forge signatures
and a challenger that will test this claim
○ Run generateKeys: the secret key is given to the
challenger and the public key to the adversary
○ Allow the attacker to get signatures on documents of his
choice, as many as he wants, as long as the number of queries is
plausible
○ After that, the attacker picks some message M whose signature he
has never seen, and attempts to forge a signature on it
○ The challenger runs the verify algorithm to determine if the
signature produced by the attacker is a valid signature on M
■ If it successfully verifies, the attacker wins the game
Practical stuff...
algorithms are randomized
need good source of randomness
limit on message size
fix: use Hash(message) rather than message
fun trick: sign a hash pointer
signature “covers” the whole structure
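Python's standard library has no ECDSA, so to illustrate the generateKeys / sign / verify API and the "sign Hash(message)" trick with only hashlib, this sketch swaps in a different scheme: a Lamport one-time signature. It is one-time only: signing two different messages with the same key reveals enough secrets to allow forgeries, so unlike ECDSA it does not survive the unforgeability game where the adversary sees many signatures. Function names are our own.

```python
import hashlib
import secrets

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def generate_keys() -> tuple[list, list]:
    """256 pairs of random secrets; the public key is the hash of each secret."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(H(s0), H(s1)) for s0, s1 in sk]
    return sk, pk

def _bits(message: bytes) -> list[int]:
    """The 256 bits of Hash(message): we sign the digest, not the raw message."""
    d = int.from_bytes(H(message), "big")
    return [(d >> (255 - i)) & 1 for i in range(256)]

def sign(sk: list, message: bytes) -> list[bytes]:
    """Reveal one of the two secrets for each bit of Hash(message)."""
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]

def verify(pk: list, message: bytes, sig: list[bytes]) -> bool:
    return len(sig) == 256 and all(
        H(s) == pk[i][bit] for i, (s, bit) in enumerate(zip(sig, _bits(message)))
    )

sk, pk = generate_keys()
sig = sign(sk, b"Pay 1 coin to Bob")
print(verify(pk, b"Pay 1 coin to Bob", sig))    # True: valid signatures verify
print(verify(pk, b"Pay 1 coin to Chuck", sig))  # False: signature is tied to the message
```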
Digital Signature with Hash
Bitcoin uses ECDSA standard
Elliptic Curve Digital Signature Algorithm
relies on hairy math
will skip the details here --- look it up if you care
good randomness is essential
foul this up in generateKeys() or sign() ?
probably leaked your private key
● ECDSA
○ a cryptographic algorithm used by Bitcoin to ensure that funds can
only be spent by their rightful owners
○ private key (256 bits or 32 bytes):
■ A secret number, known only to the person that generated it.
■ someone with the private key that corresponds to funds on the
block chain can spend the funds
○ public key
■ A number that corresponds to a private key, but does not need
to be kept secret
■ used to determine if a signature is genuine
■ Compressed – 33 bytes
● prefix 0x02 (if y is even) or 0x03 (if y is odd), followed by the 256-bit integer x
■ Uncompressed – 65 bytes
● constant prefix (0x04), followed by two 256-bit integers called x and y (2 * 32 bytes)
○ signature: a value (in ECDSA, a pair of numbers r and s) that proves
that a signing operation took place
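The two public-key encodings can be sketched in Python using Bitcoin's curve secp256k1 (y² = x³ + 7 mod p). For the example we use the standard fact that the public key for private key 1 is the generator point G itself; key generation and signing are not shown. Decompression works because p % 4 == 3, so a square root mod p is rhs^((p+1)/4). Function names are our own.

```python
# secp256k1 field prime and generator point G
P = 2**256 - 2**32 - 977
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

def encode_uncompressed(x: int, y: int) -> bytes:
    """65 bytes: 0x04, then x and y as 32-byte big-endian integers."""
    return b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")

def encode_compressed(x: int, y: int) -> bytes:
    """33 bytes: 0x02 if y is even, 0x03 if y is odd, then x."""
    return (b"\x02" if y % 2 == 0 else b"\x03") + x.to_bytes(32, "big")

def decode_compressed(data: bytes) -> tuple[int, int]:
    """Recover y from x and the parity prefix, using y^2 = x^3 + 7 (mod P)."""
    prefix, x = data[0], int.from_bytes(data[1:], "big")
    rhs = (pow(x, 3, P) + 7) % P
    y = pow(rhs, (P + 1) // 4, P)       # a square root, valid because P % 4 == 3
    if (y * y) % P != rhs:
        raise ValueError("x is not on the curve")
    if y % 2 != prefix - 2:             # pick the root whose parity matches the prefix
        y = P - y
    return x, y

assert (GY * GY - GX ** 3 - 7) % P == 0  # G lies on the curve
print(len(encode_uncompressed(GX, GY)), len(encode_compressed(GX, GY)))  # 65 33
print(encode_compressed(GX, GY).hex()[:10])                # 0279be667e
print(decode_compressed(encode_compressed(GX, GY)) == (GX, GY))  # True
```

Dropping y is safe because, for a given x, the only two candidates are y and p − y, and exactly one of them is even.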
● ECDSA
○ An ellipse is a special case of the general second-degree equation
ax² + bxy + cy² + dx + ey + f = 0.
■ Depending on the values of the parameters a to f, the resulting
graph could instead be a circle, hyperbola, or parabola.
■ Elliptic curve cryptography uses third-degree equations instead
(for example, Bitcoin's curve secp256k1 is y² = x³ + 7).
■ The Digital Signature Standard defines two kinds of elliptic curves
for use with ECC
● pseudo-random curves
○ whose coefficients are generated from the output of a
seeded cryptographic hash function;
● Special curves
○ whose coefficients and underlying field have been
selected to optimize the efficiency of the elliptic curve
operations
Lecture 1.4:
Public Keys as Identities
Useful trick: public key == an identity
if you see sig such that verify(pk, msg, sig)==true,
think of it as
pk says, “[msg]”.
to “speak for” pk, you must know matching secret key sk
How to make a new identity
create a new, random key-pair (sk, pk)
pk is the public “name” you can use
[usually better to use Hash(pk)]
sk lets you “speak for” the identity
you control the identity, because only you know sk
if pk “looks random”, nobody needs to know who you are
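A sketch of "make a new identity": generate a fresh key pair and publish Hash(pk) as the name. As an assumption, to stay within Python's standard library, the key pair is Lamport-style (hash-based, only key generation is shown); Bitcoin itself uses ECDSA key pairs and a different address encoding. `new_identity` is a hypothetical helper.

```python
import hashlib
import secrets

def new_identity() -> tuple[list[bytes], bytes, str]:
    """Create a fresh key pair and derive a short public name Hash(pk) from it."""
    sk = [secrets.token_bytes(32) for _ in range(512)]     # secret key: 512 random values
    pk = b"".join(hashlib.sha256(s).digest() for s in sk)  # public key: their hashes
    name = hashlib.sha256(pk).hexdigest()                  # public "name": Hash(pk)
    return sk, pk, name

sk1, pk1, alice = new_identity()
sk2, pk2, alice_again = new_identity()      # anyone can make as many as they want
print(len(pk1), len(bytes.fromhex(alice)))  # 16384 32: Hash(pk) is far shorter than pk
print(alice != alice_again)                 # True: unrelated, random-looking names
```

No one had to be asked to create either identity, and nothing in the names links them to each other or to their owner.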
Decentralized identity management
anybody can make a new identity at any time
make as many as you want!
no central point of coordination
These identities are called “addresses” in Bitcoin.
Privacy
Addresses not directly connected to real-world identity.
But observer can link together an address’s activity over
time, make inferences.
Later: a whole lecture on privacy in Bitcoin ...
Lecture 1.5:
Simple Cryptocurrencies
GoofyCoin
Goofy can create new coins
[CreateCoin [uniqueCoinID], signed by pkGoofy]   “New coins belong to me.”
A coin’s owner can spend it.
[Pay to pkAlice : H( ), signed by pkGoofy] → [CreateCoin [uniqueCoinID], signed by pkGoofy]   “Alice owns it now.”
The recipient can pass on the coin again.
[Pay to pkBob : H( ), signed by pkAlice] → [Pay to pkAlice : H( ), signed by pkGoofy] → [CreateCoin [uniqueCoinID], signed by pkGoofy]   “Bob owns it now.”
double-spending attack
[Pay to pkBob : H( ), signed by pkAlice] and [Pay to pkChuck : H( ), signed by pkAlice] both → [Pay to pkAlice : H( ), signed by pkGoofy] → [CreateCoin [uniqueCoinID], signed by pkGoofy]
double-spending attack: the main design challenge in digital currency
ScroogeCoin
Scrooge publishes a history of all transactions
(a block chain, signed by Scrooge)
[Figure: a chain of blocks, each with prev: H( ) pointing to the previous block and holding one transaction (transID: 71, 72, 73); the latest hash pointer H( ) is signed by Scrooge]
optimization: put multiple transactions in the same block
CreateCoins transaction creates new coins
transID: 73 type:CreateCoins   (Scrooge: “Valid, because I said so.”)
coins created
num  value  recipient  coinID
0    3.2    0x...      73(0)
1    1.4    0x...      73(1)
2    7.1    0x...      73(2)
PayCoins transaction consumes (and destroys) some coins,
and creates new coins of the same total value
transID: 73 type:PayCoins
consumed coinIDs:
68(1), 42(0), 72(3)
coins created
num  value  recipient
0    3.2    0x...
1    1.4    0x...
2    7.1    0x...
signatures
Valid if:
-- consumed coins valid,
-- not already consumed,
-- total value out = total value in, and
-- signed by owners of all consumed coins
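Scrooge's validity rules for PayCoins can be sketched as a small Python ledger. Assumptions: amounts are integers (to avoid floating-point rounding), recipients are plain names standing in for public keys, and "signed by owners" is modeled abstractly as a set of signer names rather than real signatures. All names are hypothetical.

```python
from dataclasses import dataclass, field

CoinID = tuple[int, int]  # (transID, num), written 73(0), 73(1), ... on the slides

@dataclass
class Coin:
    value: int       # integer amounts (assumption)
    recipient: str   # stands in for the recipient's public key

@dataclass
class ScroogeLedger:
    coins: dict[CoinID, Coin] = field(default_factory=dict)
    consumed: set[CoinID] = field(default_factory=set)
    next_trans_id: int = 0

    def _record(self, outputs: list[Coin]) -> int:
        trans_id = self.next_trans_id
        self.next_trans_id += 1
        for num, coin in enumerate(outputs):
            self.coins[(trans_id, num)] = coin
        return trans_id

    def create_coins(self, outputs: list[Coin]) -> int:
        """CreateCoins: valid because Scrooge says so."""
        return self._record(outputs)

    def pay_coins(self, consumed_ids: list[CoinID], outputs: list[Coin], signers: set[str]) -> int:
        """PayCoins: check the rules, then destroy the inputs and create the outputs.
        `signers` abstracts 'signed by owners'; no real signatures are checked."""
        if len(set(consumed_ids)) != len(consumed_ids):
            raise ValueError("a coin is listed twice")
        for cid in consumed_ids:
            if cid not in self.coins:
                raise ValueError(f"coin {cid} is not valid")
            if cid in self.consumed:
                raise ValueError(f"coin {cid} already consumed (double spend)")
            if self.coins[cid].recipient not in signers:
                raise ValueError(f"coin {cid} not signed by its owner")
        if sum(self.coins[c].value for c in consumed_ids) != sum(o.value for o in outputs):
            raise ValueError("total value out != total value in")
        self.consumed.update(consumed_ids)
        return self._record(outputs)

ledger = ScroogeLedger()
t0 = ledger.create_coins([Coin(32, "alice")])
# Subdivide: consume coin 0(0), pay 20 to Bob and 12 back to Alice.
t1 = ledger.pay_coins([(t0, 0)], [Coin(20, "bob"), Coin(12, "alice")], {"alice"})
try:
    ledger.pay_coins([(t0, 0)], [Coin(32, "chuck")], {"alice"})  # spend 0(0) again
except ValueError as err:
    print(err)  # coin (0, 0) already consumed (double spend)
```

The second payment in the example is the "pay out two new coins to yourself" pattern: coins are never modified, only consumed and replaced.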
Immutable coins
Coins can’t be transferred, subdivided, or combined.
But: you can get the same effect by using transactions
to subdivide: create new trans
consume your coin
pay out two new coins to yourself
Don’t worry, I’m honest.
Crucial question:
Can we descroogify the currency,
and operate without any central,
trusted party?