Hashing For Message Authentication by Avi Kak
March 4, 2010
© 2010 Avinash Kak, Purdue University
Goals:
• Simple hashing
15.1: What is a Hash Function?
• For example, the SHA-512 hash function takes as input messages
of length up to $2^{128}$ bits and produces as output a 512-bit
message digest (MD). (SHA stands for Secure Hash Algorithm.)
[Note: A series of SHA algorithms has been developed by
the National Institute of Standards and Technology and published as
Federal Information Processing Standards (FIPS).]
• Since a message digest depends on all the bits in the input mes-
sage, any alteration of the input message during transmission
would cause its message digest to not match with its original mes-
sage digest. This can be used to check for forgeries, unauthorized
alterations, etc. To see the change in the hash code produced by
the most innocuous of changes in the input message:
Input message: "A hungry brown fox jumped over a lazy dog"
SHA1 hash code: a8e7038cf5042232ce4a2f582640f2aa5caf12d2
Input message: "A hungry  brown fox jumped over a lazy dog"
SHA1 hash code: d617ba80a8bc883c1c3870af12a516c4a30f8fda
The only difference between the two messages is the extra space
between the words “hungry” and “brown” in the second message.
Notice how completely different the hash code looks. SHA-1 pro-
duces a 160 bit hash code. It takes 40 hex characters to show the
code in hex. [The hash codes shown were produced by the following Perl script:
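#!/usr/bin/perl -w
use Digest::SHA1;
my $hasher = Digest::SHA1->new();
# First message: one space between "hungry" and "brown"
$hasher->add( "A hungry brown fox jumped over a lazy dog" );
print $hasher->hexdigest;    # hexdigest also resets $hasher for reuse
print "\n";
# Second message: two spaces between "hungry" and "brown"
$hasher->add( "A hungry  brown fox jumped over a lazy dog" );
print $hasher->hexdigest;
print "\n";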
As the script shows, this uses the SHA-1 algorithm for creating the message digest. Perl’s Digest
module can be used to invoke any of over fifteen hashing algorithms. The module can output the
hash code in binary format, in hex format, or as a base64-encoded string. In Python, you can use
the sha module (superseded by the hashlib module since Python 2.5). Both the Digest module for Perl
and the sha module for Python come with the standard distribution of the languages. ]
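The same experiment can be sketched in Python. This is only an illustration, assuming the standard hashlib module is available; the helper name sha1_hex and the bit-counting step are additions for this sketch and are not part of the original script.

```python
import hashlib

def sha1_hex(message: str) -> str:
    """Return the SHA-1 digest of a text message as 40 hex characters."""
    return hashlib.sha1(message.encode("utf-8")).hexdigest()

# The two messages differ only by one extra space after "hungry"
one_space = "A hungry brown fox jumped over a lazy dog"
two_spaces = "A hungry  brown fox jumped over a lazy dog"

digest_a = sha1_hex(one_space)
digest_b = sha1_hex(two_spaces)

# Count how many of the 160 digest bits differ between the two hash codes
differing_bits = bin(int(digest_a, 16) ^ int(digest_b, 16)).count("1")

print(digest_a)
print(digest_b)
print("bits that differ:", differing_bits)
```

Counting the differing bits gives a rough feel for how thoroughly a one-character change scrambles the hash code.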
15.2: Different Ways to Use Hashing for Message
Authentication
Figures 1 and 2 show six different ways in which you could incorpo-
rate message hashing in a communication network. These constitute
different approaches to protect the hash value of a message. No
authentication at the receiving end could possibly be achieved if both
the message and its hash value are accessible to an adversary wanting
to tamper with the message. To explain each scheme separately:
• Finally, the scheme in Figure 2(c) shows an extension of the
scheme of Figure 2(b) where we have added symmetric-key based
confidentiality to the transmission between the sender and the
receiver.
[Figure 1, panels (a)-(c). Each panel shows Party A sending MESSAGE to
Party B, with both parties calculating the hash and B comparing the two
hash values. Labels recoverable from the original: (a) key K at both
ends; (b) the hash is encrypted with K, the encrypted hash is
concatenated with the message, and B decrypts with K; (c) an encrypted
hash is concatenated with the message, and B uses A's public key.]
Figure 1: This figure is from Lecture 15 of “Computer and Net-
work Security” by Avi Kak
[Figure 2, panels (a)-(c), again between Party A and Party B. Labels
recoverable from the original: (a) calculate hash, A's public key, key K
at both ends, compare; (b) the message is concatenated with a shared
secret at both ends, the hash is calculated and compared, and only the
hash accompanies the message; (c) concatenate, calculate, compare.]
• Hash functions that are not collision resistant can fall prey
to the birthday attack. More on that later.
• If you use n bits to represent the hash code, there are only $2^n$
distinct hash code values. If we place no constraints whatsoever on
the messages, then obviously there will exist multiple messages
giving rise to the same hash code. But then considering mes-
sages with no constraints whatsoever does not represent reality
because messages are not noise — they must possess consider-
able structure in order to be intelligible to humans. Collision
resistance refers to the likelihood that two different
messages possessing certain basic structure so as to
be meaningful will result in the same hash code.
• Ideally (if authentication is the only issue and we are not con-
cerned about confidentiality), to ward off message alteration by
en-route ill-intentioned agents, we would like to send unencrypted
plaintext messages with encrypted hash codes. (This elimi-
nates the computational overhead of encryption and
decryption for the main message content and yet al-
lows for authentication.) But this only works when collision
resistance is perfect. If a hashing approach has poor collision re-
sistance, all that an adversary has to do is to compute the hash
code of the message content and replace it with some other con-
tent that has the same hash code value. The fact that the
hash code value is encrypted does not do us any good
here.
15.4: Simple Hash Functions
• With this algorithm, every bit of the hash code represents the
parity at that bit position if we look across all of the b-bit blocks.
For that reason, the hash code produced is also known as the
longitudinal parity check.
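A minimal sketch of such a parity-based hash in Python follows. The function name xor_hash, the default block size of 64 bits, and the zero-padding of a short final block are assumptions made for this illustration only.

```python
def xor_hash(message: bytes, block_bytes: int = 8) -> bytes:
    """Longitudinal parity check: bitwise XOR of all b-bit blocks.

    Here b = 8 * block_bytes. Zero-padding the last block is an
    assumption of this sketch, not something the notes prescribe.
    """
    if len(message) % block_bytes:
        message += b"\x00" * (block_bytes - len(message) % block_bytes)
    digest = 0
    for start in range(0, len(message), block_bytes):
        # XOR-ing in a block updates the parity at every bit position
        digest ^= int.from_bytes(message[start:start + block_bytes], "big")
    return digest.to_bytes(block_bytes, "big")

print(xor_hash(b"AAAAAAAABBBBBBBB").hex())
```

Because XOR is its own inverse, feeding the same block twice cancels it out, which already hints at how weak this hash is.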
• That the collision resistance of ROXR is also poor is obvious from
the fact that we can take a message M1 along with its hash code
value h1; replace M1 by a message M2 of hash code value h2;
append a block of gibberish at the end of M2 to force the hash code
value of the composite to be h1. So even if M1 was transmitted
with an encrypted h1, it does not do us much good from the
standpoint of authentication. We will see later how secure
hash algorithms make this ploy impossible by includ-
ing the length of the message in what gets hashed.
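The ploy can be sketched in Python. To keep the example short it uses the plain XOR hash rather than ROXR, so it is a stand-in for the argument above and not a reproduction of it; the names xor_hash and forge, the 64-bit block size, and the sample messages are all hypothetical.

```python
def xor_hash(message: bytes, block_bytes: int = 8) -> bytes:
    """Plain XOR of all blocks (stand-in for ROXR in this sketch)."""
    if len(message) % block_bytes:
        message += b"\x00" * (block_bytes - len(message) % block_bytes)
    digest = 0
    for start in range(0, len(message), block_bytes):
        digest ^= int.from_bytes(message[start:start + block_bytes], "big")
    return digest.to_bytes(block_bytes, "big")

def forge(m2: bytes, target_hash: bytes, block_bytes: int = 8) -> bytes:
    """Append one block to m2 so that the result hashes to target_hash."""
    padded_len = -(-len(m2) // block_bytes) * block_bytes  # round up
    m2_padded = m2.ljust(padded_len, b"\x00")
    h2 = xor_hash(m2_padded, block_bytes)
    # The extra block is exactly the XOR difference between h2 and the target
    gibberish = bytes(x ^ y for x, y in zip(h2, target_hash))
    return m2_padded + gibberish

m1 = b"Pay Bob 100 dollars"
m2 = b"Pay Eve 9999 dollars"
h1 = xor_hash(m1)
forged = forge(m2, h1)
print(xor_hash(forged) == h1)
```

The forged message is one block longer than M2, which is the kind of change that hashing the message length is meant to expose.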
Message = "abc"
length L = 24 bits
15.5: What does Probability Theory Have to Say
about a Randomly Produced Message Having a
Particular Hash Code Value?
– Since h(y) either equals h(x) or does not equal h(x), the
probability that h(y) does not equal h(x) is $1 - \frac{1}{N}$.
$$1 - \left(1 - \frac{1}{N}\right)^k \qquad (1)$$
$(1 + a)^n \approx 1 + an$. Therefore, the probability expression we
derived can be approximated by
$$\approx 1 - \left(1 - \frac{k}{N}\right) = \frac{k}{N} \qquad (2)$$
• Consider the case when we use 64 bit hash codes. In this case,
$N = 2^{64}$. We will have to construct a pool of $2^{63}$ messages so that
the pool contains at least one message whose hash code equals
h(x) with a probability of 0.5.
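As a numerical check, the exact expression $1 - (1 - 1/N)^k$ and the linear estimate $k/N$ can be compared in Python. This is a sketch; the function names are made up for the illustration, and expm1/log1p are used only because $1/N$ is far too small for ordinary floating-point subtraction.

```python
import math

def prob_exact(k: int, N: int) -> float:
    """1 - (1 - 1/N)^k, evaluated stably even when N is as large as 2^64."""
    return -math.expm1(k * math.log1p(-1.0 / N))

def prob_linear(k: int, N: int) -> float:
    """First-order estimate k/N."""
    return k / N

N = 2 ** 64
for k in (2 ** 40, 2 ** 60, 2 ** 63):
    print(k, prob_exact(k, N), prob_linear(k, N))
```

The two values agree closely while k is much smaller than N. At $k = 2^{63}$ the linear estimate gives 0.5 while the exact expression comes out near 0.39, so the first-order figure is best read as a rough guide at that scale.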
15.6: What is the Probability that a Pair of Messages
will Have the Same Hash Code Value?
$$1 - \frac{N!}{(N - k)! \, N^k} \qquad (3)$$
– The first message in the pool can be chosen arbitrarily.
Since there are only N different messages with distinct
hash codes, there are N ways to choose the first entry for
the pool. Stated differently, there is a choice of N different
candidates for the first entry in the pool.
$$M_1 = N \times (N - 1) \times \ldots \times (N - k + 1) = \frac{N!}{(N - k)!} \qquad (4)$$
– Let’s now try to figure out the total number of ways, $M_2$, in
which we can construct a pool of k messages without worrying
at all about duplicate hash codes. Reasoning as before, there
are N ways to choose the first message. For selecting the
second message, we pay no attention to the hash code value
of the first message. There are still N ways to select the second
message; and so on. Therefore, the total number of ways we
can construct a pool of k messages without worrying about
hash code duplication is
$$M_2 = N \times N \times \ldots \times N = N^k \qquad (5)$$
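and that is the same as
$$1 - \left[\left(1 - \frac{1}{N}\right) \times \left(1 - \frac{2}{N}\right) \times \ldots \times \left(1 - \frac{k-1}{N}\right)\right] \qquad (10)$$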
• We can use the above formula to estimate the size k of the pool
so that the pool contains at least one pair of messages with equal
hash codes with a probability of 0.5. We need to solve
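$$1 - e^{-\frac{k(k-1)}{2N}} = \frac{1}{2}$$
Simplifying, we get
$$e^{\frac{k(k-1)}{2N}} = 2$$
Therefore,
$$\frac{k(k-1)}{2N} = \ln 2$$
which gives us
$$k(k-1) = (2 \ln 2) N$$
$$k^2 \approx (2 \ln 2) N \qquad (13)$$
implying
$$k \approx \sqrt{(2 \ln 2) N} \approx 1.18 \sqrt{N} \approx \sqrt{N}$$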
• So if we use an n-bit hash code, we have $N = 2^n$. In this case,
a message pool of $2^{n/2}$ randomly generated messages will contain
at least one pair of messages with the same hash code with a
probability of 0.5.
• Let’s again consider the case of 64 bit hash codes. Now $N = 2^{64}$.
So a pool of $2^{32}$ randomly generated messages will have at least
one pair with identical hash codes with a probability of 0.5.
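The pair-collision probability and the pool-size estimate can be checked numerically. The sketch below uses the classic case N = 365 as a small stand-in, since the exact product is too slow to evaluate for $N = 2^{64}$; all function names are assumptions of this illustration.

```python
import math

def prob_pair_exact(k: int, N: int) -> float:
    """1 - (1 - 1/N)(1 - 2/N)...(1 - (k-1)/N), as a running product."""
    p_all_distinct = 1.0
    for i in range(1, k):
        p_all_distinct *= 1.0 - i / N
    return 1.0 - p_all_distinct

def prob_pair_approx(k: int, N: int) -> float:
    """Exponential approximation 1 - exp(-k(k-1)/(2N))."""
    return 1.0 - math.exp(-k * (k - 1) / (2.0 * N))

def pool_size_for_half(N: int) -> float:
    """k = sqrt((2 ln 2) N), the pool size for a 0.5 collision probability."""
    return math.sqrt(2.0 * math.log(2.0) * N)

print(prob_pair_exact(23, 365), prob_pair_approx(23, 365))
print(pool_size_for_half(365))
print(pool_size_for_half(2 ** 64) / 2 ** 32)
```

For N = 365 the estimate lands between 22 and 23, matching the familiar birthday result; for 64-bit hash codes the ratio printed on the last line is the factor of about 1.18 in front of $2^{32}$.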
15.7: The Birthday Attack
• Now the question is: “What is the probability that the two sets
of contracts will have at least one contract each with the same
hash code?”
• Let the set of variations on the correct form of the contract be
denoted $\{c_1, c_2, \ldots, c_k\}$ and the set of variations on the
fraudulent contract by $\{f_1, f_2, \ldots, f_k\}$. We need to figure out the
probability that there exists at least one pair $(c_i, f_j)$
so that $h(c_i) = h(f_j)$.
• The probability that the same holds conjunctively for all members
of the set $\{c_1, c_2, \ldots, c_k\}$ would therefore be
$$\left(1 - \frac{1}{N}\right)^{k^2}$$
This is the probability that there will NOT exist any
hash code matches between the two sets of contracts
$\{c_1, c_2, \ldots, c_k\}$ and $\{f_1, f_2, \ldots, f_k\}$.
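• Since $1 - \frac{1}{N}$ is always less than $e^{-\frac{1}{N}}$, the probability
of at least one match, $1 - \left(1 - \frac{1}{N}\right)^{k^2}$, will always be greater than
$$1 - \left(e^{-\frac{1}{N}}\right)^{k^2}$$
Setting this lower bound equal to 0.5 gives us
$$k = \sqrt{(\ln 2) N} = 0.83 \sqrt{N} \approx \sqrt{N}$$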
So if B is willing to generate $\sqrt{N}$ versions of both the correct
contract and the fraudulent contract, there is better than an even
chance that B will find a fraudulent version to replace the correct
version.
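The attack can be tried out on a deliberately weakened hash. The sketch below truncates SHA-256 to 16 bits so that $N = 2^{16}$, and varies each contract by appending a pattern of spaces and tabs. The truncation, the variation scheme, the contract texts, and every function name are assumptions made for this illustration. With k = 2048 variants per contract, $k^2$ is far larger than N, so a match is all but certain.

```python
import hashlib

def tiny_hash(text: str) -> int:
    """First 16 bits of SHA-256: a toy hash with N = 2^16 values."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")

def variants(base: str, count: int):
    """Yield `count` versions of `base` that differ only in trailing
    whitespace: each bit of the index selects a space or a tab."""
    bits = max(1, (count - 1).bit_length())
    for i in range(count):
        suffix = "".join(" " if (i >> b) & 1 else "\t" for b in range(bits))
        yield base + suffix

def find_cross_collision(correct: str, fraudulent: str, k: int):
    """Return (c_i, f_j) with equal tiny_hash values, or None."""
    seen = {tiny_hash(c): c for c in variants(correct, k)}
    for f in variants(fraudulent, k):
        match = seen.get(tiny_hash(f))
        if match is not None:
            return match, f
    return None

pair = find_cross_collision("I owe B 100 dollars.", "I owe B 100000 dollars.", 2048)
print(pair is not None)
```

Storing the hash codes of one set in a dictionary keeps the search at roughly 2k hash computations instead of comparing all $k^2$ pairs directly.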
15.8: Structure of Cryptographically Secure Hash
Functions
• The final block also includes the total length of the message whose
hash is to be computed. This step enhances the security
of the hash function since it places an additional constraint on
the counterfeit messages.
• For the n-bit input, the first stage is supplied with a special n-bit
pattern called the Initialization Vector (IV).
• The function f that processes the two inputs, one n bits long and
the other b bits long, to produce an n bit output is usually called
the compression function. That is because, usually, b > n,
so the output of the f function is shorter than the length of the
input message segment.
• The precise nature of f depends on what hash algorithm is being
implemented, as we will see in the rest of this lecture.
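The iterated structure described in this section can be sketched in Python with a stand-in compression function. The sizes (n = 64, b = 128), the all-zero IV, the padding layout, and the use of truncated SHA-256 for f are assumptions chosen to keep the example small; no real hash algorithm is being reproduced here.

```python
import hashlib

N_BYTES = 8           # n = 64-bit chaining value (toy size)
B_BYTES = 16          # b = 128-bit message blocks, so b > n
IV = bytes(N_BYTES)   # arbitrary all-zero Initialization Vector

def f(chaining: bytes, block: bytes) -> bytes:
    """Stand-in compression function: (n bits, b bits) -> n bits."""
    return hashlib.sha256(chaining + block).digest()[:N_BYTES]

def pad(message: bytes) -> bytes:
    """Append a 1-bit, zero bits, and the message length in bits so that
    the result is a whole number of b-bit blocks."""
    length_field = (8 * len(message)).to_bytes(8, "big")
    padded = message + b"\x80"
    while (len(padded) + len(length_field)) % B_BYTES:
        padded += b"\x00"
    return padded + length_field

def toy_hash(message: bytes) -> bytes:
    """Chain f over the padded blocks, starting from the IV."""
    state = IV
    data = pad(message)
    for start in range(0, len(data), B_BYTES):
        state = f(state, data[start:start + B_BYTES])
    return state

print(toy_hash(b"abc").hex())
```

Each stage consumes n + b bits and emits n bits, which is the sense in which f compresses.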
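[Figure: Message Block 1, Message Block 2, and a final Length + Padding
block are fed to successive stages of f. The Initialization Vector
enters the first stage, each stage passes n bits to the next, and the
last stage outputs the n-bit Hash.]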
15.9: The SHA Family of Hash Functions
• The most commonly used hash function from the SHA family
is SHA-1. It is used in many applications and protocols that
require secure and authenticated communications. SHA-1 is used
in SSL/TLS, PGP, SSH, S/MIME, and IPSec. (These standards
will be briefly reviewed in Lecture 20.)
Here is what the different columns of the above table stand for:
– The column Message Size shows the upper bound on the size
of the message that an algorithm can handle.
– The column heading Block Size is the size of each bit block
that the message is divided into. Recall from Section 15.8 that
an input message is divided into a sequence of b-bit blocks.
Block size for an algorithm tells us the value of b in the figure
on Slide 27.
• Also note that SHA-1 is a successor to MD5, which used to be a
widely used hash function.
15.10: The SHA-512 Secure Hash Algorithm
• The last 128 bits of what gets hashed are reserved for the
message length value.
• Leaving aside the trailing 128 bit positions, the padding con-
sists of a single 1-bit followed by the required number of 0-bits.
• The length value in the trailing 128 bit positions is an unsigned
integer with its most significant byte first.
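The padding rules above can be sketched in Python for byte-oriented messages. The function name sha512_pad is hypothetical, and the sketch assumes the message is a whole number of bytes and that the padded result must fill 1024-bit (128-byte) blocks.

```python
def sha512_pad(message: bytes) -> bytes:
    """Pad a message as described above: a 1-bit, then 0-bits, then the
    message length as a 128-bit big-endian unsigned integer."""
    length_bits = 8 * len(message)
    padded = message + b"\x80"   # the single 1-bit followed by seven 0-bits
    # Add zero bytes until exactly 16 bytes remain in the last 128-byte block
    padded += b"\x00" * ((112 - len(padded)) % 128)
    return padded + length_bits.to_bytes(16, "big")

block = sha512_pad(b"abc")
print(len(block) * 8, block[-16:].hex())
```

A message that already fills all but the last 128 bits of a block still receives padding, so in that case a whole extra block is added.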
• The ith round is fed the 64-bit message schedule word $W_i$ and
a special constant $K_i$.
• How the contents of the hash buffer are processed along with
the inputs $W_i$ and $K_i$ is referred to as implementing the
round function.
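$a = T_1 +_{64} T_2$
$b = a$
$c = b$
$d = c$
$e = d +_{64} T_1$
$f = e$
$g = f$
$h = g$
where every right-hand side uses the register values from before the
round, and
$T_1 = h +_{64} Ch(e, f, g) +_{64} \sum e +_{64} W_i +_{64} K_i$
$T_2 = \sum a +_{64} Maj(a, b, c)$
$Ch(e, f, g) = (e \; AND \; f) \oplus (NOT \; e \; AND \; g)$
$Maj(a, b, c) = (a \; AND \; b) \oplus (a \; AND \; c) \oplus (b \; AND \; c)$
$\sum a = ROTR^{28}(a) \oplus ROTR^{34}(a) \oplus ROTR^{39}(a)$
$\sum e = ROTR^{14}(e) \oplus ROTR^{18}(e) \oplus ROTR^{41}(e)$
$+_{64}$ = addition modulo $2^{64}$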
Note that, when considered on a bit-by-bit basis, the function
Maj() is true, that is equal to the bit 1, only when a majority
of its arguments (meaning two out of three) are true. Also,
the function Ch() implements at the bit level the conditional
statement “if arg1 then arg2 else arg3”.
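One round can be sketched in Python as follows. The function names are made up for this illustration, the rotation amounts (28, 34, 39 and 14, 18, 41) are the ones given in FIPS 180 for SHA-512, and the inputs in the usage lines are placeholders rather than real message schedule words or round constants.

```python
MASK64 = (1 << 64) - 1   # keeps every result within 64 bits

def rotr(x: int, n: int) -> int:
    """Rotate a 64-bit word right by n positions."""
    return ((x >> n) | (x << (64 - n))) & MASK64

def ch(e: int, f: int, g: int) -> int:
    """Bitwise 'if e then f else g'."""
    return (e & f) ^ (~e & g & MASK64)

def maj(a: int, b: int, c: int) -> int:
    """Bitwise majority of the three arguments."""
    return (a & b) ^ (a & c) ^ (b & c)

def sigma_a(a: int) -> int:
    return rotr(a, 28) ^ rotr(a, 34) ^ rotr(a, 39)

def sigma_e(e: int) -> int:
    return rotr(e, 14) ^ rotr(e, 18) ^ rotr(e, 41)

def round_function(state, w_i: int, k_i: int):
    """Apply one round to the registers (a, b, c, d, e, f, g, h)."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + ch(e, f, g) + sigma_e(e) + w_i + k_i) & MASK64
    t2 = (sigma_a(a) + maj(a, b, c)) & MASK64
    # All new values are computed from the old registers simultaneously
    return ((t1 + t2) & MASK64, a, b, c, (d + t1) & MASK64, e, f, g)

old = (1, 2, 3, 4, 5, 6, 7, 8)            # placeholder register contents
new = round_function(old, w_i=0, k_i=0)   # placeholder W_i and K_i
print(new[1:4], new[5:8])
```

Six of the eight registers are simply shifted along by one position in each round; only a and e receive freshly mixed values.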
Finally, ....: After all the N message blocks have been processed
(see Figure 4), the content of the hash buffer is the message digest.
[Figure 4: The augmented message (message plus padding plus length) is
a multiple of 1024-bit blocks $M_1, M_2, \ldots, M_N$. Starting from the
512-bit Initialization Vector $H_0$, each stage f takes a block $M_i$ and
produces the 512-bit value $H_i$; the final value $H_N$ is the hash.]
[Figure: The compression function f, with inputs $M_i$ and $H_{i-1}$. A
message schedule supplies the words $W_0, \ldots, W_{79}$. The eight 64-bit
registers a, b, c, d, e, f, g, h of the 512-bit hash buffer pass through
Round 0 (with $W_0$ and $K_0$) to Round 79 (with $W_{79}$ and $K_{79}$),
followed by a register-by-register addition modulo $2^{64}$ that yields $H_i$.]
• Another way to generate a MAC would be to compress the mes-
sage into a fixed-size signature and to then encrypt the signature
with an algorithm like DES. The output of the encryption algo-
rithm becomes the MAC value and the encryption key the secret
that must be shared between the sender and the receiver of a
message.
– We now define
C(K, M) = E(K, ∆(M))
former by repeating the 00110110 sequence b/8 times, and the
latter by repeating 01011100 also b/8 times.
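The role of the two repeated bit patterns can be sketched in Python with SHA-1 as the underlying hash, for which b = 512 bits. The function name hmac_sha1 and the sample key and message are assumptions of this illustration; the comparison against the standard hmac module is included only as a sanity check.

```python
import hashlib
import hmac

BLOCK_BYTES = 64   # b/8 for SHA-1, whose block size is b = 512 bits

def hmac_sha1(key: bytes, message: bytes) -> bytes:
    """HMAC built from two nested hash calls with ipad and opad."""
    if len(key) > BLOCK_BYTES:
        key = hashlib.sha1(key).digest()      # long keys are hashed first
    k_plus = key.ljust(BLOCK_BYTES, b"\x00")  # key padded with zeros to b bits
    ipad = bytes([0b00110110]) * BLOCK_BYTES  # 00110110 repeated b/8 times
    opad = bytes([0b01011100]) * BLOCK_BYTES  # 01011100 repeated b/8 times
    inner_key = bytes(x ^ y for x, y in zip(k_plus, ipad))
    outer_key = bytes(x ^ y for x, y in zip(k_plus, opad))
    inner_hash = hashlib.sha1(inner_key + message).digest()
    return hashlib.sha1(outer_key + inner_hash).digest()

tag = hmac_sha1(b"shared secret", b"hello world")
print(tag.hex())
print(tag == hmac.new(b"shared secret", b"hello world", hashlib.sha1).digest())
```

In practice one would call the hmac module directly; spelling out the steps is only meant to show where the two patterns enter the computation.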
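[Figure: HMAC. $K^+$ XORed with ipad (b bits) feeds the first HASH, which
produces an n-bit hash. That hash is padded to b bits and, together with
$K^+$ XORed with opad (b bits), feeds the second HASH, whose n-bit output
is the HMAC.]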
5. The very first step in the SHA1 algorithm is to pad the message
so that it is a multiple of 512 bits. This padding occurs as follows
(from NIST FIPS 180-2): Suppose the length of the message M
is L bits. Append bit 1 to the end of the message, followed by K
zero bits where K is the smallest non-negative solution to
$$L + 1 + K \equiv 448 \pmod{512}$$
Next append a 64-bit block that is a binary representation of the
length integer L. For example,
Message = "abc"
length L = 24 bits
01100001 01100010 01100011 1 00...00     00...011000
    a        b        c      <---423---> <---64---->
6. The fact that only the last 64 bits of the padded message are
used for representing the length of the message implies that SHA1
should NOT be used for messages that are longer than what?
10. Right or wrong: When you create a new password, only the hash
code for the password is stored. The text you entered for the
password is immediately discarded.
Acknowledgement