
Hashing For Message Authentication by Avi Kak

This document discusses hashing for message authentication. It defines a hash function as taking a variable input message and producing a fixed-size output called the hash code or message digest. It then describes several ways hashing can be used for message authentication, including encrypting the message and hash together, encrypting just the hash, using public/private key encryption on the hash, and appending a secret string to the message before hashing. The goal is to ensure the integrity of the message by preventing unauthorized changes.

Uploaded by

Kumara Prathipati


Lecture 15: Hashing for Message Authentication

Lecture Notes on “Computer and Network Security”

by Avi Kak

March 4, 2010

© 2010 Avinash Kak, Purdue University

Goals:

• What is a hash function?

• Different ways to use hashing for message authentication

• The one-way and collision-resistance properties of secure hash functions

• Simple hashing

• The birthday paradox and the birthday attack

• Structure of cryptographically secure hash functions

• SHA Series of Hash Functions

• Message Authentication Codes

15.1: What is a Hash Function?

• A hash function takes a variable-sized input message and produces a fixed-sized output. The output is usually referred to as the hash code, the hash value, or the message digest.

• For example, the SHA-512 hash function takes as input messages of length less than $2^{128}$ bits and produces as output a 512-bit message digest (MD). (SHA stands for Secure Hash Algorithm.) [Note: A series of SHA algorithms has been developed by the National Institute of Standards and Technology and published as Federal Information Processing Standards (FIPS).]

• We can think of the hash code as a fixed-sized fingerprint of a variable-sized message.

• Message digests produced by the most commonly used hash functions range in length from 160 to 512 bits depending on the algorithm used.

• Since a message digest depends on all the bits in the input message, any alteration of the input message during transmission would cause its message digest to no longer match the original message digest. This can be used to check for forgeries, unauthorized alterations, etc. To see how much the hash code changes in response to the most innocuous of changes in the input message:

Input message: "A hungry brown fox jumped over a lazy dog"
SHA1 hash code: a8e7038cf5042232ce4a2f582640f2aa5caf12d2

Input message: "A hungry  brown fox jumped over a lazy dog"
SHA1 hash code: d617ba80a8bc883c1c3870af12a516c4a30f8fda

The only difference between the two messages is the extra space between the words “hungry” and “brown” in the second message. Notice how completely different the hash code looks. SHA-1 produces a 160-bit hash code, which takes 40 hex characters to display. [The hash codes shown were produced by the following Perl script:
#!/usr/bin/perl -w
use strict;
use Digest::SHA1;
my $hasher = Digest::SHA1->new();
$hasher->add( "A hungry brown fox jumped over a lazy dog" );
print $hasher->hexdigest;      # hexdigest() also resets $hasher for reuse
print "\n";
$hasher->add( "A hungry  brown fox jumped over a lazy dog" );
print $hasher->hexdigest;
print "\n";

As the script shows, this uses the SHA-1 algorithm for creating the message digest. Perl’s Digest module can be used to invoke any of over fifteen hashing algorithms. The module can output the hash code in binary format, in hex format, or as a base64-encoded string. In Python, you can use the hashlib module (the older sha module was deprecated in favor of hashlib and no longer exists in Python 3). The Digest framework for Perl and the hashlib module for Python come with the standard distributions of the languages; Digest::SHA1 itself is a separate CPAN module, and the core module Digest::SHA provides a similar interface. ]
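The same experiment can be reproduced in Python with the standard hashlib module. This sketch assumes the strings are encoded as ASCII. It prints the two digests instead of hard-coding them, and it counts how many of the 160 bits differ. For a well-behaved hash, one would expect roughly half of them to differ.

```python
import hashlib

# The two messages from the text; the second has an extra space after "hungry".
m1 = "A hungry brown fox jumped over a lazy dog"
m2 = "A hungry  brown fox jumped over a lazy dog"

h1 = hashlib.sha1(m1.encode("ascii")).hexdigest()
h2 = hashlib.sha1(m2.encode("ascii")).hexdigest()

print(h1)
print(h2)

# A 160-bit SHA-1 digest takes 40 hex characters.
print(len(h1), len(h2))

# Count the bit positions at which the two 160-bit digests differ.
diff_bits = bin(int(h1, 16) ^ int(h2, 16)).count("1")
print("differing bits:", diff_bits)
```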

15.2: Different Ways to Use Hashing for Message Authentication

Figures 1 and 2 show six different ways in which you could incorporate message hashing in a communication network. These constitute different approaches to protecting the hash value of a message. No authentication at the receiving end could possibly be achieved if both the message and its hash value are accessible to an adversary wanting to tamper with the message. To explain each scheme separately:

• In the symmetric-key encryption based scheme shown in Figure 1(a), the message and its hash code are concatenated together to form a composite message that is then encrypted and placed on the wire. The receiver decrypts the message and separates out its hash code, which is then compared with the hash code calculated from the received message. The hash code provides authentication and the encryption provides confidentiality.

• The scheme shown in Figure 1(b) is a variation on Figure 1(a) in the sense that only the hash code is encrypted. This scheme is efficient to use when confidentiality is not the issue but message authentication is critical. Only a receiver with access to the secret key can recover the real hash code for the message, so the receiver can verify whether or not the message is authentic.
• The scheme in Figure 1(c) is a public-key encryption version of the scheme shown in Figure 1(b). The hash code of the message is encrypted with the sender’s private key. The receiver can recover the hash code with the sender’s public key and authenticate the message as indeed coming from the alleged sender. Confidentiality again is not the issue here. The sender encrypting with his/her private key the hash code of his/her message constitutes the basic idea of digital signatures.

• If we want to add symmetric-key based confidentiality to the scheme of Figure 1(c), we can use the scheme shown in Figure 2(a). This is a commonly used approach when both confidentiality and authentication are needed.

• A very different approach to the use of hashing for authentication is shown in Figure 2(b). In this scheme, nothing is encrypted. Instead, the sender appends a secret string S, which the receiver also knows, to the message before computing its hash code. Before checking the hash code of the received message for its authentication, the receiver appends the same secret string S to the message. Since S never goes on the wire, an adversary who does not know S cannot compute the correct hash code for an altered message, even with access to both the original message and its hash code.

• Finally, the scheme in Figure 2(c) shows an extension of the
scheme of Figure 2(b) where we have added symmetric-key based
confidentiality to the transmission between the sender and the
receiver.
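Here is a minimal sketch of the Figure 2(b) scheme. It assumes SHA-256 as the hash function and uses a hypothetical shared secret S. Practical systems usually rely on the standardized HMAC construction (Python's hmac module) rather than plain hashing of the message concatenated with S. The code below only illustrates the idea in the figure.

```python
import hashlib
import hmac

def hash_with_secret(message: bytes, secret: bytes) -> str:
    # Figure 2(b): concatenate the message with the shared secret S, then hash.
    return hashlib.sha256(message + secret).hexdigest()

def verify(message: bytes, received_hash: str, secret: bytes) -> bool:
    # B appends the same S to the received message and compares hash codes.
    expected = hash_with_secret(message, secret)
    # compare_digest is designed to resist timing attacks on the comparison.
    return hmac.compare_digest(expected, received_hash)

S = b"hypothetical secret shared by A and B"

# A puts only MESSAGE and HASH on the wire; S itself is never sent.
message = b"Pay Bob 100 dollars"
sent_hash = hash_with_secret(message, S)

print(verify(message, sent_hash, S))                  # B accepts the genuine message
print(verify(b"Pay Bob 900 dollars", sent_hash, S))   # B rejects an altered one
```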

[Figure 1 (diagram not reproduced). In each panel Party A sends a MESSAGE and its HASH to Party B, who recalculates the hash and compares: (a) MESSAGE and HASH are concatenated and encrypted with the shared key K; (b) only the HASH is encrypted with K; (c) only the HASH is encrypted with A’s Private Key and decrypted with A’s Public Key.]

Figure 1: This figure is from Lecture 15 of “Computer and Network Security” by Avi Kak
[Figure 2 (diagram not reproduced): (a) the scheme of Figure 1(c), with the MESSAGE and the encrypted HASH additionally encrypted and decrypted with K; (b) A concatenates the MESSAGE with a Shared Secret and hashes the result, sending MESSAGE and HASH in the clear, and B recalculates the hash over the MESSAGE and the Shared Secret and compares; (c) the scheme of (b) with the transmission encrypted and decrypted with K.]

Figure 2: This figure is from Lecture 15 of “Computer and Network Security” by Avi Kak
15.3: When is a Hash Function Secure?

• A hash function is called secure if the following two conditions are satisfied:

– It is computationally infeasible to find a message that corresponds to a given hash code. This is sometimes referred to as the one-way property of a hash function.

– It is computationally infeasible to find two different messages that hash to the same hash code value. This is also referred to as the strong collision resistance property of a hash function.

• A weaker form of the strong collision resistance property is that, for a given message, it should be computationally infeasible to find another message with the same hash code. This is referred to as the weak collision resistance property.

• Hash functions that are not collision resistant can fall prey to the birthday attack. More on that later.

• If you use n bits to represent the hash code, there are only $2^n$ distinct hash code values. If we place no constraints whatsoever on the messages, then obviously there will exist multiple messages giving rise to the same hash code. But considering messages with no constraints whatsoever does not represent reality, because messages are not noise: they must possess considerable structure in order to be intelligible to humans. Collision resistance refers to the likelihood that two different messages possessing certain basic structure so as to be meaningful will result in the same hash code.

• Ideally (if authentication is the only issue and we are not concerned about confidentiality), to ward off message alteration by ill-intentioned agents en route, we would like to send unencrypted plaintext messages with encrypted hash codes. (This eliminates the computational overhead of encryption and decryption for the main message content and yet allows for authentication.) But this only works when collision resistance is perfect. If a hashing approach has poor collision resistance, all that an adversary has to do is compute the hash code of the message content and replace that content with some other content that has the same hash code value. The fact that the hash code value is encrypted does not do us any good here.
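The one-way property depends on the size of the hash code, which a small experiment makes concrete. This sketch is an illustration, not from the notes. It brute-forces a different message whose hash code matches that of a given message when the hash is artificially truncated to 16 bits. Truncating SHA-256 is an assumption made only so that the search finishes quickly. The expected number of tries grows as $2^n$, so the same search becomes computationally infeasible for full-length hash codes.

```python
import hashlib
from itertools import count

def truncated_hash(message: bytes, nbits: int) -> int:
    # Keep only the top nbits of SHA-256: a deliberately weak n-bit hash code.
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return digest >> (256 - nbits)

def find_matching_message(target: int, nbits: int):
    # Brute force: try "msg-1", "msg-2", ... until one hashes to the target value.
    for tries in count(1):
        candidate = f"msg-{tries}".encode()
        if truncated_hash(candidate, nbits) == target:
            return candidate, tries

NBITS = 16
target = truncated_hash(b"the original message", NBITS)
forgery, tries = find_matching_message(target, NBITS)
print(forgery, tries)   # tries is typically on the order of 2**16
```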

15.4: Simple Hash Functions

• Practically all algorithms for computing the hash code of a message view the message as a sequence of n-bit blocks.

• The message is processed one block at a time in an iterative fashion to produce an n-bit hash code.

• Perhaps the simplest hash function consists of starting with the first n-bit block, XORing it bit-by-bit with the second n-bit block, XORing the result with the next n-bit block, and so on. We will refer to this as the XOR hash algorithm.

• With this algorithm, every bit of the hash code represents the parity at that bit position if we look across all of the n-bit blocks. For that reason, the hash code produced is also known as a longitudinal parity check.

• The hash code generated by the XOR algorithm can be useful as a data integrity check in the presence of completely random transmission errors. But in the presence of an adversary trying to deliberately tamper with the message content, the XOR algorithm is useless for message authentication. An adversary can modify the main message and append a suitably chosen n-bit block to it so that the final hash code remains unchanged.

• Another problem with this simple algorithm is its reduced collision resistance for structured documents. Ideally, one would hope that, with an n-bit hash code, any particular message would result in a given hash code value with a probability of $1/2^n$. But now consider the case when the characters in a text message are represented by their ASCII codes. Since the highest bit in each byte for each character will always be 0, some of the n bits in the hash code will predictably be 0 with the simple XOR algorithm. This obviously reduces the number of unique hash code values available to us, and thus increases the probability of collisions.
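The following sketch shows the XOR hash and both weaknesses described above. It assumes 32-bit blocks and zero-fill for a short final block, because the notes do not fix n or the padding. The forged message differs from the original but has the same XOR hash. The last line shows the predictable zero bits for ASCII text.

```python
from functools import reduce

BLOCK_BYTES = 4          # n = 32 bits per block (an arbitrary choice)

def pad(message: bytes) -> bytes:
    # Zero-fill to a whole number of n-bit blocks.
    return message + b"\x00" * (-len(message) % BLOCK_BYTES)

def xor_hash(message: bytes) -> int:
    # Longitudinal parity check: XOR all n-bit blocks together.
    data = pad(message)
    blocks = (int.from_bytes(data[i:i + BLOCK_BYTES], "big")
              for i in range(0, len(data), BLOCK_BYTES))
    return reduce(lambda acc, block: acc ^ block, blocks, 0)

original = b"Pay Bob 100 dollars."
h = xor_hash(original)

# Forgery: change the message, then append one block that cancels the change.
altered = pad(b"Pay Bob 900 dollars, and Eve 900 more.")
fix_block = (xor_hash(altered) ^ h).to_bytes(BLOCK_BYTES, "big")
forged = altered + fix_block
print(hex(h), hex(xor_hash(forged)))

# For 7-bit ASCII text the top bit of every byte is 0, so the top bit of each
# byte of the XOR hash is 0 too: 4 of the 32 hash bits carry no information.
print(hex(xor_hash(b"Any plain ASCII text at all") & 0x80808080))
```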

• To increase the space of distinct hash code values available for the different messages, a variation on the basic XOR algorithm consists of performing a one-bit circular shift of the partial hash code obtained after each n-bit block of the message is processed. This algorithm is known as the rotated-XOR algorithm (ROXR).

• That the collision resistance of ROXR is also poor is obvious from the fact that we can take a message M1 along with its hash code value h1; replace M1 by a message M2 of hash code value h2; and append a block of gibberish at the end of M2 to force the hash code value of the composite to be h1. So even if M1 was transmitted with an encrypted h1, it does not do us much good from the standpoint of authentication. We will see later how secure hash algorithms make this ploy impossible by including the length of the message in what gets hashed.
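Here is a sketch of ROXR and of the forgery just described. The notes do not specify whether the rotation comes before or after each block is XORed in. This sketch assumes rotate-then-XOR with 32-bit blocks. Because that final step is invertible, one appended block of gibberish can force any target hash value.

```python
N_BITS = 32
BLOCK_BYTES = N_BITS // 8
MASK = (1 << N_BITS) - 1

def rotl1(x: int) -> int:
    # One-bit circular left shift of an N_BITS-bit value.
    return ((x << 1) | (x >> (N_BITS - 1))) & MASK

def pad(message: bytes) -> bytes:
    # Zero-fill to a whole number of blocks.
    return message + b"\x00" * (-len(message) % BLOCK_BYTES)

def roxr_hash(message: bytes) -> int:
    data, h = pad(message), 0
    for i in range(0, len(data), BLOCK_BYTES):
        block = int.from_bytes(data[i:i + BLOCK_BYTES], "big")
        # Assumed ordering: rotate the partial hash by one bit, then XOR in the block.
        h = rotl1(h) ^ block
    return h

m1 = b"Pay Bob 100 dollars."
h1 = roxr_hash(m1)

m2 = pad(b"Pay Eve 9000 dollars.")          # the substituted message M2
h2 = roxr_hash(m2)
gibberish = (h1 ^ rotl1(h2)).to_bytes(BLOCK_BYTES, "big")
forged = m2 + gibberish                     # M2 followed by one block of gibberish
print(hex(h1), hex(roxr_hash(forged)))
```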

• As a quick example of including the length of the message in what gets hashed, here is how the very popular SHA-1 algorithm pads the message before it is hashed. The very first step in the SHA-1 algorithm is to pad the message so that its length is a multiple of 512 bits. This padding occurs as follows (from NIST FIPS 180-2):

Suppose the length of the message M is L bits.

Append bit 1 to the end of the message, followed by K zero bits, where K is the smallest nonnegative solution to

$$L + 1 + K \equiv 448 \pmod{512}$$

Next append a 64-bit block that is a binary representation of the length integer L.

Consider the following example:

Message = "abc"
length L = 24 bits

This is what the padded bit pattern would look like:

01100001 01100010 01100011 1 00......000 00...011000
   a        b        c         <--423-->   <---64--->
<------------------------ 512 ------------------------>
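The padding rule can be written as a small function. This sketch assumes the message is a whole number of bytes, so the single 1-bit and the first seven zero bits combine into the byte 0x80. For "abc" it yields exactly one 512-bit block with K = 423.

```python
def sha1_pad(message: bytes) -> bytes:
    L = 8 * len(message)                  # message length in bits
    K = (448 - (L + 1)) % 512             # smallest K >= 0 with L + 1 + K = 448 (mod 512)
    # Byte-aligned input means K = 7 (mod 8): the 1-bit plus 7 zero bits form 0x80.
    zero_bytes = (K - 7) // 8
    return message + b"\x80" + b"\x00" * zero_bytes + L.to_bytes(8, "big")

padded = sha1_pad(b"abc")
print(len(padded) * 8)                    # total padded length in bits
print(padded.hex())
```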

15.5: What does Probability Theory Have to Say about a Randomly Produced Message Having a Particular Hash Code Value?

• Assume that we have a random message generator and that we calculate the hash code for each message.

• Let’s say we have in our possession a message x whose hash code is h(x).

• Let’s consider a pool of k messages produced randomly by this generator. Since we are not placing any constraints on messages, there is an infinite number of different messages that the generator can produce. So the probability that any of the k messages in the pool is the same as x is practically 0.

• Now we pose the following question: What is the value of k such that, with a probability of 0.5, the pool contains at least one message y for which h(y) is equal to h(x)?

• To find k, we reason as follows:
– Let’s say that the hash code can take on N different values. If the message generator is truly random in its construction of messages, all hash code values will be equally probable.

– Say we pick message y at random from the pool of messages. The probability that h(y) has any particular value is $1/N$. Since h(x) is given, the probability that h(y) equals h(x) is $1/N$.

– Since h(y) either equals h(x) or does not equal h(x), the probability that h(y) does not equal h(x) is $1 - 1/N$.

– It follows that the probability that none of the messages in a pool of k messages has its hash code equal to h(x) is $(1 - 1/N)^k$.

– Therefore, the probability that at least one of the k messages has its hash code equal to h(x) is

$$1 - \left(1 - \frac{1}{N}\right)^{k} \tag{1}$$

– The probability expression shown above can be considerably simplified by recognizing that as a approaches 0, we can write $(1 + a)^n \approx 1 + an$. Therefore, the probability expression we derived can be approximated by

$$1 - \left(1 - \frac{1}{N}\right)^{k} \approx 1 - \left(1 - \frac{k}{N}\right) = \frac{k}{N} \tag{2}$$

• So the upshot is that, given a pool of k randomly produced messages, the probability that there will exist at least one message in this pool whose hash code equals the given value h(x) is approximately $k/N$.

• Let’s now go back to the original question: How large should k be so that the pool of messages contains at least one message whose hash code equals the given value h(x) with a probability of 0.5? We obtain the value of k from the equation $k/N = 0.5$. That is, $k = 0.5N$. (Equation (2) assumes that k/N is small, so this is only a rough estimate. Solving $(1 - 1/N)^k = 0.5$ directly gives $k \approx N \ln 2 \approx 0.69N$, which is of the same order.)

• Consider the case when we use 64-bit hash codes. In this case, $N = 2^{64}$. We will have to construct a pool of roughly $2^{63}$ messages so that the pool contains at least one message whose hash code equals h(x) with a probability of 0.5.
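You can check the approximation in Equation (2) by evaluating Equation (1) exactly. This sketch uses math.log1p and math.expm1 so that the tiny quantity 1/N is not lost in floating point. It shows that k/N is accurate for small k but overestimates the probability near k = N/2. It also shows that the exact solution of $(1 - 1/N)^k = 1/2$ is close to $N \ln 2$.

```python
import math

def p_match_exact(k: float, N: float) -> float:
    # Equation (1): 1 - (1 - 1/N)**k, evaluated via log1p/expm1 so 1/N is not lost.
    return -math.expm1(k * math.log1p(-1.0 / N))

def p_match_approx(k: float, N: float) -> float:
    # Equation (2): first-order approximation k/N.
    return k / N

N = 2 ** 64
for k in (2 ** 40, 2 ** 56, 2 ** 63):
    print(f"k = 2^{k.bit_length() - 1}: exact {p_match_exact(k, N):.6g}, "
          f"approx {p_match_approx(k, N):.6g}")

# Exact solution of (1 - 1/N)**k = 1/2.
k_half = math.log(2) / -math.log1p(-1.0 / N)
print(k_half / N)    # close to ln 2
```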

15.6: What is the Probability that a Pair of Messages will Have the Same Hash Code Value?

• Given a pool of k messages, the question “What is the probability that any message in the pool has its hash code equal to a particular value?” is very different from the question “What is the probability that any pair of messages in the pool will have the same hash code?”

• The question “What is the probability that, in a class of 20 students, someone else has the same birthday as yours?” is very different from the question “What is the probability that there exists at least one pair of students in a class of 20 students with the same birthday?” The probability of the former is approximately 19/365, and the probability of the latter is roughly the much larger value (20 × 19/2)/365 = 190/365, since 20 students form 190 distinct pairs. (Counting pairs slightly overestimates this probability; its exact value is about 0.41.) This is referred to as the birthday paradox, a paradox only in the sense that it seems counterintuitive.

• Given a pool of k messages, each of which has a hash code value from N possible such values, the probability that the pool will contain at least one pair with identical hash code values is given by

$$1 - \frac{N!}{(N-k)!\,N^{k}} \tag{3}$$

• The following reasoning establishes the above result. The reasoning consists of figuring out the total number of ways ($M_1$) in which we can construct a pool of k messages with no duplicate hash codes and the total number of ways ($M_2$) we can do the same while allowing for duplicates. The ratio $M_1/M_2$ then gives us the probability of constructing a pool of k messages with no duplicates. Subtracting this from 1 yields the probability that the pool of k messages will have at least one duplicate hash code.

– Let’s consider in how many different ways we can construct a pool of k messages so that we are guaranteed to have no duplicate hash codes in the pool. For the sake of this mental experiment, let’s assume that we have available to us a very large set of randomly generated message/hash-code pairs, that is, randomly generated pairs {x, h(x)}, where x is a message and h(x) its hash code.

– For the first message in the pool, we can choose any pair arbitrarily. Since there are only N distinct hash code values, there are N ways to choose the first entry for the pool. Stated differently, there is a choice of N different candidates for the first entry in the pool.

– Having used up one hash code, we can select a message corresponding to one of the other $N-1$ still available hash codes for the second entry for the pool.

– Having used up two distinct hash code values, we can select a message corresponding to one of the other $N-2$ still available hash codes for the third entry for the pool; and so on.

– Therefore, the total number of ways, $M_1$, in which we can construct a pool of k messages with no duplications in hash code values is

$$M_1 = N \times (N-1) \times \cdots \times (N-k+1) = \frac{N!}{(N-k)!} \tag{4}$$

– Let’s now try to figure out the total number of ways, $M_2$, in which we can construct a pool of k messages without worrying at all about duplicate hash codes. Reasoning as before, there are N ways to choose the first message. For selecting the second message, we pay no attention to the hash code value of the first message. There are still N ways to select the second message; and so on. Therefore, the total number of ways we can construct a pool of k messages without worrying about hash code duplication is

$$M_2 = N \times N \times \cdots \times N = N^{k} \tag{5}$$

– Therefore, the probability of constructing a pool of k messages with no duplications in hash codes is

$$\frac{M_1}{M_2} = \frac{N!}{(N-k)!\,N^{k}} \tag{6}$$

– Therefore, the probability of constructing a pool of k messages with at least one duplication in the hash code values is

$$1 - \frac{N!}{(N-k)!\,N^{k}} \tag{7}$$

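• The probability expression in Equation (3) (or Equation (7) above) can be simplified by rewriting it in the following form:

$$1 - \frac{N \times (N-1) \times \cdots \times (N-k+1)}{N^{k}} \tag{8}$$

which is the same as

$$1 - \left[\frac{N}{N} \times \frac{N-1}{N} \times \cdots \times \frac{N-k+1}{N}\right] \tag{9}$$

and that is the same as

$$1 - \left[\left(1 - \frac{1}{N}\right) \times \left(1 - \frac{2}{N}\right) \times \cdots \times \left(1 - \frac{k-1}{N}\right)\right] \tag{10}$$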

• We will now use the inequality $(1 - x) \le e^{-x}$, which holds for all $x \ge 0$, to make the claim that the above probability is lower-bounded by

$$1 - \left[e^{-\frac{1}{N}} \times e^{-\frac{2}{N}} \times \cdots \times e^{-\frac{k-1}{N}}\right] \tag{11}$$

• Since $1 + 2 + 3 + \cdots + (k-1)$ is equal to $\frac{k(k-1)}{2}$, we can write the following expression for the lower bound on the probability:

$$1 - e^{-\frac{k(k-1)}{2N}} \tag{12}$$

So the probability that a pool of k messages will have at least one pair with identical hash codes is always at least the value given by the above formula.
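The exact expression of Equation (10) and the lower bound of Equation (12) can be compared numerically. The sketch below uses the birthday setting N = 365 as an illustration. For every k, the exact probability is at or above the bound, and at k = 23 both are close to 0.5.

```python
import math

def p_collision_exact(k: int, N: int) -> float:
    # Equation (10): 1 - (1 - 1/N)(1 - 2/N)...(1 - (k-1)/N)
    p_no_duplicate = 1.0
    for i in range(1, k):
        p_no_duplicate *= 1 - i / N
    return 1 - p_no_duplicate

def p_collision_bound(k: int, N: int) -> float:
    # Equation (12): lower bound 1 - exp(-k(k-1)/(2N))
    return 1 - math.exp(-k * (k - 1) / (2 * N))

for k in (10, 20, 23, 40):
    exact, bound = p_collision_exact(k, 365), p_collision_bound(k, 365)
    print(f"k = {k:2d}: exact {exact:.4f}, lower bound {bound:.4f}")
```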

• We can use the above formula to estimate the size k of the pool so that the pool contains at least one pair of messages with equal hash codes with a probability of 0.5. We need to solve

$$1 - e^{-\frac{k(k-1)}{2N}} = \frac{1}{2}$$

Simplifying, we get

$$e^{\frac{k(k-1)}{2N}} = 2$$

Therefore,

$$\frac{k(k-1)}{2N} = \ln 2$$

which gives us

$$k(k-1) = (2\ln 2)N$$

• Assuming k to be large, the above equation gives us

$$k^2 \approx (2\ln 2)N \tag{13}$$

implying

$$k \approx \sqrt{(2\ln 2)N} \approx 1.18\sqrt{N} \approx \sqrt{N}$$

• So our final result is that if the hash code can take on a total of N different values, a pool of roughly $\sqrt{N}$ messages will contain at least one pair of messages with the same hash code with a probability of about 0.5.

• So if we use an n-bit hash code, we have $N = 2^n$. In this case, a message pool of $2^{n/2}$ randomly generated messages will contain at least one pair of messages with the same hash code with a probability of about 0.5.

• Let’s again consider the case of 64-bit hash codes. Now $N = 2^{64}$. So a pool of $2^{32}$ randomly generated messages will have at least one pair with identical hash codes with a probability of about 0.5.
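An empirical check of the $\sqrt{N}$ result follows as a sketch. The hash here is a deliberately weak 24-bit code formed from the first 3 bytes of SHA-256. That truncation is an assumption chosen only so that the experiment runs quickly. With $N = 2^{24}$, a collision should appear after a pool of a few thousand messages, on the order of $2^{12}$.

```python
import hashlib
from itertools import count

def h24(message: bytes) -> bytes:
    # Deliberately weak 24-bit hash code: the first 3 bytes of SHA-256 (N = 2**24).
    return hashlib.sha256(message).digest()[:3]

def find_collision():
    seen = {}                              # hash code -> message that produced it
    for i in count():
        m = f"message number {i}".encode()
        code = h24(m)
        if code in seen:
            return seen[code], m, i + 1    # i + 1 = size of the pool generated so far
        seen[code] = m

m_a, m_b, pool_size = find_collision()
print(m_a, m_b, pool_size)                 # pool_size is typically on the order of 2**12
```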

15.7: The Birthday Attack

• This attack applies to the following scenario: Say A has a dishonest assistant B preparing contracts for A’s digital signature.

• B prepares the legal contract for a transaction. B then proceeds to create a large number of variations of the legal contract without altering the legal content of the contract and computes the hash code for each. These variations may be constructed by mostly innocuous changes such as the insertion of additional white space between some of the words, or contraction of the same; insertion or deletion of some of the punctuation; slight reformatting of the document; etc.

• B prepares a fraudulent version of the contract. As with the correct version, B prepares a large number of variations of this contract, using the same tactics as with the correct version.

• Now the question is: “What is the probability that the two sets
of contracts will have at least one contract each with the same
hash code?”

• Let the set of variations on the correct form of the contract be denoted $\{c_1, c_2, \ldots, c_k\}$ and the set of variations on the fraudulent contract by $\{f_1, f_2, \ldots, f_k\}$. We need to figure out the probability that there exists at least one pair $(c_i, f_j)$ so that $h(c_i) = h(f_j)$.

• If we assume (a very questionable assumption indeed) that all the fraudulent contracts are truly random vis-a-vis the correct versions of the contract, then the probability of $f_1$’s hash code being any one of N permissible values is $1/N$. Therefore, the probability that the hash code $h(c_1)$ matches the hash code $h(f_1)$ is $1/N$. Hence the probability that the hash code $h(c_1)$ does not match the hash code $h(f_1)$ is $1 - 1/N$.

• Extending the above reasoning to joint events, the probability that $h(c_1)$ does not match $h(f_1)$ and $h(f_2)$ and ... and $h(f_k)$ is

$$\left(1 - \frac{1}{N}\right)^{k}$$

• The probability that the same holds conjunctively for all members of the set $\{c_1, c_2, \ldots, c_k\}$ would therefore be

$$\left(1 - \frac{1}{N}\right)^{k^2}$$

This is the probability that there will NOT exist any hash code matches between the two sets of contracts $\{c_1, c_2, \ldots, c_k\}$ and $\{f_1, f_2, \ldots, f_k\}$.

• Therefore the probability that there will exist at least one match in hash code values between the set of correct contracts and the set of fraudulent contracts is

$$1 - \left(1 - \frac{1}{N}\right)^{k^2}$$

• Since $1 - 1/N$ is always less than $e^{-1/N}$, the above probability will always be greater than

$$1 - e^{-\frac{k^2}{N}}$$

• Now let’s pose the question: “What is the least value of k so that the above probability is 0.5?” We obtain this value of k by solving

$$1 - e^{-\frac{k^2}{N}} = \frac{1}{2}$$

which simplifies to

$$e^{\frac{k^2}{N}} = 2$$

which gives us

$$k = \sqrt{(\ln 2)N} = 0.83\sqrt{N} \approx \sqrt{N}$$
So if B is willing to generate $\sqrt{N}$ versions of both the correct contract and the fraudulent contract, there is a better than even chance that B will find a fraudulent version to replace the correct version.

• If n bits are used for the hash code, $N = 2^n$. In this case, $k = 2^{n/2}$.

• The birthday attack consists, as you’d expect, of B getting A to digitally sign a correct version of the contract and then replacing the contract by its fraudulent version that has the same hash code value. The fact that A would encrypt the hash code with his/her private key is of no consequence.

• This attack is called the birthday attack because the combinatorial issues involved are the same as in the birthday paradox presented earlier. Also note that for n-bit hash codes, the approximate value we obtained for k is the same in both cases, that is, $2^{n/2}$.
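The birthday attack can be simulated end to end on a toy scale. In this sketch the two contracts are hypothetical. The hash is a weak 24-bit code made from the first 3 bytes of SHA-256, so $N = 2^{24}$. The variations come only from using one or two spaces in each of the first 15 word gaps, which gives $k = 2^{15}$ versions of each contract. Since $k^2/N = 64$, the analysis above predicts that a match is almost certain.

```python
import hashlib

def h24(text: str) -> bytes:
    # Deliberately weak 24-bit hash code: first 3 bytes of SHA-256 (N = 2**24).
    return hashlib.sha256(text.encode()).digest()[:3]

def variants(text: str, bits: int):
    # Yield 2**bits versions of `text` that differ only in white space: each of the
    # first `bits` gaps between words holds one or two spaces.
    words = text.split()
    for mask in range(2 ** bits):
        pieces = [words[0]]
        for i, word in enumerate(words[1:]):
            pieces.append("  " if i < bits and (mask >> i) & 1 else " ")
            pieces.append(word)
        yield "".join(pieces)

# Hypothetical contracts with 19 words (18 gaps) each.
correct = "I agree to pay Bob the sum of one hundred dollars for the car by the end of March"
fraud = "I agree to pay Bob the sum of ten thousand dollars for the car by the end of March"

def birthday_attack(bits: int):
    table = {h24(c): c for c in variants(correct, bits)}   # hash code -> correct version
    for f in variants(fraud, bits):
        if h24(f) in table:
            return table[h24(f)], f
    return None

pair = birthday_attack(15)    # k = 2**15 versions of each contract
if pair is not None:
    c_i, f_j = pair
    print(repr(c_i))
    print(repr(f_j))
```

The legal content of each version is unchanged (only white space differs), which is exactly what makes the substitution hard for A to notice.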

15.8: Structure of Cryptographically Secure Hash Functions

• A hash function is cryptographically secure if it is computationally infeasible to find collisions, that is, if it is computationally infeasible to construct meaningful messages whose hash code would equal a specified value. Said another way, a hash function should be strictly one-way, in the sense that it lets us compute the hash code for a message, but does not let us figure out a message for a given hash code.

• Most secure hash functions are based on the structure proposed by Merkle. This structure forms the basis of the SHA series of hash functions and also the Whirlpool hash function.

• The input message is partitioned into L blocks, each of size b bits. If necessary, the final block is padded suitably so that it is of the same length as the others.

• The final block also includes the total length of the message whose hash code is to be computed. This step enhances the security of the hash function since it places an additional constraint on the counterfeit messages.

• Merkle’s structure, shown in Figure 3, consists of L stages of processing, each stage processing one of the b-bit blocks of the input message.

• Each stage of the structure in Figure 3 takes two inputs, the b-bit block of the input message meant for that stage and the n-bit output of the previous stage.

• For the n-bit input, the first stage is supplied with a special n-bit pattern called the Initialization Vector (IV).

• The function f that processes the two inputs, one n bits long and
the other b bits long, to produce an n bit output is usually called
the compression function. That is because, usually, b > n,
so the output of the f function is shorter than the length of the
input message segment.

• The function f itself may involve multiple rounds of processing of the two inputs to produce an output.

• The precise nature of f depends on what hash algorithm is being
implemented, as we will see in the rest of this lecture.

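[Figure 3 (diagram not reproduced): Message Block 1, Message Block 2, ..., and a final block holding Message Length + Padding, each b bits, are fed in sequence to the compression function f. The first f also receives the n-bit Initialization Vector, each later f receives the n-bit output of the previous stage, and the n-bit output of the last stage is the Hash.]

Figure 3: This figure is from Lecture 15 of “Computer and Network Security” by Avi Kak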
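Here is a minimal sketch of Merkle's iterated structure. The block size (b = 512 bits), the chaining size (n = 128 bits), and the all-zero IV are assumptions made for illustration. The compression function f is also an assumption: it is built here by truncating SHA-256, which is not how any real hash function defines its compression step. As described above, the final block carries the message length.

```python
import hashlib

B_BYTES = 64           # b = 512-bit message blocks (assumed)
N_BYTES = 16           # n = 128-bit chaining value (assumed)
IV = bytes(N_BYTES)    # hypothetical all-zero Initialization Vector

def f(chain: bytes, block: bytes) -> bytes:
    # Stand-in compression function: n + b bits in, n bits out.
    # Built from SHA-256 purely for illustration; not a real design.
    return hashlib.sha256(chain + block).digest()[:N_BYTES]

def pad(message: bytes) -> bytes:
    # 0x80, then zeros, then the 8-byte big-endian bit length in the final block.
    bit_len = 8 * len(message)
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % B_BYTES)
    return padded + bit_len.to_bytes(8, "big")

def merkle_hash(message: bytes) -> bytes:
    chain = IV
    data = pad(message)
    for i in range(0, len(data), B_BYTES):
        chain = f(chain, data[i:i + B_BYTES])   # one stage per b-bit block
    return chain                                # output of the last stage is the hash

print(merkle_hash(b"abc").hex())
```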

15.9: The SHA Family of Hash Functions

• SHA (Secure Hash Algorithm) refers to a family of NIST-approved cryptographic hash functions.

• The most commonly used hash function from the SHA family
is SHA-1. It is used in many applications and protocols that
require secure and authenticated communications. SHA-1 is used
in SSL/TLS, PGP, SSH, S/MIME, and IPSec. (These standards
will be briefly reviewed in Lecture 20.)

• The following table shows the various parameters of the different SHA hashing functions.

Algorithm   Message Size   Block Size   Word Size   Message Digest Size   Security
            (bits)         (bits)       (bits)      (bits)                (bits)
SHA-1       < 2^64          512         32          160                    80
SHA-256     < 2^64          512         32          256                   128
SHA-384     < 2^128        1024         64          384                   192
SHA-512     < 2^128        1024         64          512                   256

Here is what the different columns of the above table stand for:
– The column Message Size shows the upper bound on the size
of the message that an algorithm can handle.

– The column heading Block Size is the size of each bit block that the message is divided into. Recall from Section 15.8 that an input message is divided into a sequence of b-bit blocks. Block size for an algorithm tells us the value of b in Figure 3.

– The Word Size is used during the processing of the input blocks, as will be explained later. Message Digest Size refers to the size of the hash code produced.

– Finally, the Security column refers to how many messages would have to be generated before two can be found with the same hash code with a probability of 0.5. (This is the Birthday Attack presented in Section 15.7.) As shown previously, in general, for a secure hash algorithm producing n-bit hash codes, one would need to come up with $2^{n/2}$ messages in order to discover a collision with a probability of 0.5. The column lists the exponent n/2, which is why each entry in the last column is half the corresponding entry in the Message Digest Size column.

• The algorithms SHA-256, SHA-384, and SHA-512 are collectively referred to as SHA-2.

• Also note that SHA-1 is a successor to MD5, which used to be a widely used hash function.

• SHA-1 was cracked in 2005 by two different research groups. In particular, Wang, Yin, and Yu demonstrated that it is possible to come up with a collision for SHA-1 with only $2^{69}$ operations, far fewer than the security level of $2^{80}$ that is associated with this hash function.

• NIST will withdraw its approval of SHA-1 by 2010.
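Python's hashlib reports the block and digest sizes of these algorithms, which gives a quick way to cross-check the table. Here the security figure is computed simply as half the digest size, following the birthday-attack argument.

```python
import hashlib

params = {}
for name in ("sha1", "sha256", "sha384", "sha512"):
    h = hashlib.new(name)
    # hashlib reports sizes in bytes; multiply by 8 to get bits as in the table.
    block_bits, digest_bits = 8 * h.block_size, 8 * h.digest_size
    params[name] = (block_bits, digest_bits, digest_bits // 2)
    print(f"{name:7s} block={block_bits:5d} digest={digest_bits:4d} "
          f"security={digest_bits // 2:4d}")
```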

15.10: The SHA-512 Secure Hash Algorithm

Figure 4 shows the overall processing steps of SHA-512. To describe them in detail:

Append Padding Bits and Length Value: This step makes the input message an exact multiple of 1024 bits:

• The length of the overall message to be hashed must be a multiple of 1024 bits.

• The last 128 bits of what gets hashed are reserved for the
message length value.

• This implies that even if the original message were by chance to be an exact multiple of 1024 bits, you’d still need to append another 1024-bit block at the end to make room for the 128-bit message length integer.

• Leaving aside the trailing 128 bit positions, the padding con-
sists of a single 1-bit followed by the required number of 0-bits.

• The length value in the trailing 128 bit positions is an unsigned
integer with its most significant byte first.

• The padded message is now an exact multiple of 1024 bits. We represent it by the sequence $\{M_1, M_2, \ldots, M_N\}$, where $M_i$ is the i-th 1024-bit message block.
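The SHA-512 padding step can be sketched as follows, assuming a byte-aligned message. A message whose length is already a multiple of 1024 bits still gains a full extra block, as described above.

```python
def sha512_pad(message: bytes) -> bytes:
    # Assumes a byte-aligned message, so the single 1-bit becomes the byte 0x80.
    bit_len = 8 * len(message)
    padded = message + b"\x80"
    # Zero bytes until we are 16 bytes (128 bits) short of a multiple of 128 bytes (1024 bits).
    padded += b"\x00" * (-(len(padded) + 16) % 128)
    # 128-bit unsigned length, most significant byte first.
    return padded + bit_len.to_bytes(16, "big")

def split_blocks(padded: bytes) -> list:
    # M_1, ..., M_N: consecutive 1024-bit (128-byte) blocks.
    return [padded[i:i + 128] for i in range(0, len(padded), 128)]

for size in (3, 111, 112, 128):
    print(size, "bytes ->", len(split_blocks(sha512_pad(bytes(size)))), "block(s)")
```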

Initialize Hash Buffer with Initialization Vector: You’ll recall from Figure 3 that before we can process the first message block, we need to initialize the hash buffer with IV, the Initialization Vector:

• We represent the hash buffer by eight 64-bit registers.

• For explaining the working of the algorithm, these registers are labeled (a, b, c, d, e, f, g, h).

• The registers are initialized by the first 64 bits of the fractional parts of the square roots of the first eight prime numbers.
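FIPS 180 specifies the SHA-512 initial hash values as the first 64 bits of the fractional parts of the square roots of the first eight primes. This sketch recomputes them with exact integer arithmetic using math.isqrt, which requires Python 3.8 or later. The value for register a should come out as 6a09e667f3bcc908.

```python
from math import isqrt

def first_primes(count: int) -> list:
    # Trial division by the primes found so far.
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def frac_sqrt_bits(p: int) -> int:
    # isqrt(p * 2**128) = floor(sqrt(p) * 2**64); masking keeps the low 64 bits,
    # i.e. the first 64 bits of the fractional part of sqrt(p).
    return isqrt(p << 128) & ((1 << 64) - 1)

IV = [frac_sqrt_bits(p) for p in first_primes(8)]
for register, value in zip("abcdefgh", IV):
    print(register, f"{value:016x}")
```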

Process Each 1024-bit Message Block $M_i$: Each message block is taken through 80 rounds of processing. All of this processing is represented by the module labeled f in Figure 4.
• The 80 rounds of processing for each 1024-bit message block are depicted in Figure 5. In this figure, the labels a, b, c, . . . , h are for the eight 64-bit registers of the hash buffer. Figure 5 shows the inside of each module labeled f in the overall processing diagram of Figure 4.

• In keeping with the overall processing architecture shown in Figure 3, the module f for processing the message block Mi has two inputs: the current contents of the 512-bit hash buffer and the 1024-bit message block. These are fed as inputs to the first of the 80 rounds of processing depicted in Figure 5.

• The round-based processing requires a message schedule that consists of 80 64-bit words labeled {W0, W1, . . . , W79}. The first sixteen of these, W0 through W15, are the sixteen 64-bit words in the 1024-bit message block Mi. The rest of the words in the message schedule, for 16 ≤ i ≤ 79, are obtained by

$$W_i = W_{i-16} +_{64} \sigma_0(W_{i-15}) +_{64} W_{i-7} +_{64} \sigma_1(W_{i-2})$$

where

$$\sigma_0(x) = \mathrm{ROTR}^{1}(x) \oplus \mathrm{ROTR}^{8}(x) \oplus \mathrm{SHR}^{7}(x)$$
$$\sigma_1(x) = \mathrm{ROTR}^{19}(x) \oplus \mathrm{ROTR}^{61}(x) \oplus \mathrm{SHR}^{6}(x)$$

$\mathrm{ROTR}^{n}(x)$ = circular right shift of the 64-bit argument by n bits
$\mathrm{SHR}^{n}(x)$ = right shift of the 64-bit argument by n bits, with padding by zeros on the left
$+_{64}$ = addition modulo $2^{64}$
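The message schedule can be sketched as follows, assuming the 128-byte block is read as sixteen big-endian 64-bit words (the same most-significant-byte-first convention used for the length value). Python integers are unbounded, so every sum is masked back to 64 bits explicitly; names such as `message_schedule` are illustrative.

```python
MASK64 = (1 << 64) - 1


def rotr(x: int, n: int) -> int:
    """ROTR^n: circular right shift of a 64-bit word by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def shr(x: int, n: int) -> int:
    """SHR^n: right shift of a 64-bit word by n bits; zeros enter on the left."""
    return x >> n


def sigma0(x: int) -> int:
    return rotr(x, 1) ^ rotr(x, 8) ^ shr(x, 7)


def sigma1(x: int) -> int:
    return rotr(x, 19) ^ rotr(x, 61) ^ shr(x, 6)


def message_schedule(block: bytes) -> list[int]:
    """Expand one 1024-bit (128-byte) block into the 80 words W0..W79."""
    if len(block) != 128:
        raise ValueError("SHA-512 message blocks are 128 bytes long")
    # W0..W15: the block itself, read as sixteen big-endian 64-bit words.
    w = [int.from_bytes(block[8 * i:8 * i + 8], "big") for i in range(16)]
    for i in range(16, 80):
        # W_i = W_{i-16} + sigma0(W_{i-15}) + W_{i-7} + sigma1(W_{i-2})  (mod 2**64)
        w.append((w[i - 16] + sigma0(w[i - 15]) + w[i - 7] + sigma1(w[i - 2])) & MASK64)
    return w
```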

• The ith round is fed the 64-bit message schedule word Wi and
a special constant Ki.

• The constants Ki represent the first 64 bits of the fractional parts of the cube roots of the first eighty prime numbers. Basically, these constants are meant to be random bit patterns to break up any regularities in the message blocks.

• How the contents of the hash buffer are processed along with
the inputs Wi and Ki is referred to as implementing the
round function.

• The round function consists of a sequence of transpositions and substitutions, all designed to diffuse to the maximum extent possible the content of the input message block. The relationship between the contents of the eight registers of the hash buffer at the input to the ith round and the output from this round is given by

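$$
\begin{aligned}
a &= T_1 +_{64} T_2 \\
b &= a \\
c &= b \\
d &= c \\
e &= d +_{64} T_1 \\
f &= e \\
g &= f \\
h &= g
\end{aligned}
$$

where every right-hand side uses the register values at the input to the round, and

$$
\begin{aligned}
T_1 &= h +_{64} \mathrm{Ch}(e, f, g) +_{64} \Sigma_e +_{64} W_i +_{64} K_i \\
T_2 &= \Sigma_a +_{64} \mathrm{Maj}(a, b, c) \\
\mathrm{Ch}(e, f, g) &= (e \mathbin{\mathrm{AND}} f) \oplus ((\mathrm{NOT}\ e) \mathbin{\mathrm{AND}} g) \\
\mathrm{Maj}(a, b, c) &= (a \mathbin{\mathrm{AND}} b) \oplus (a \mathbin{\mathrm{AND}} c) \oplus (b \mathbin{\mathrm{AND}} c) \\
\Sigma_a &= \mathrm{ROTR}^{28}(a) \oplus \mathrm{ROTR}^{34}(a) \oplus \mathrm{ROTR}^{39}(a) \\
\Sigma_e &= \mathrm{ROTR}^{14}(e) \oplus \mathrm{ROTR}^{18}(e) \oplus \mathrm{ROTR}^{41}(e)
\end{aligned}
$$

$+_{64}$ = addition modulo $2^{64}$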
Note that, when considered on a bit-by-bit basis, the function Maj() is true, that is, equal to the bit 1, only when a majority of its arguments (meaning two out of three) are true. Also, the function Ch() implements at the bit level the conditional statement “if arg1 then arg2 else arg3”.
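The following sketch puts the round function into Python, together with one way to derive the eighty constants Ki described earlier: the integer cube root of each prime scaled by 2^192, keeping the low 64 bits. It is meant only to make the equations concrete; the function names are illustrative and no attempt is made at speed.

```python
MASK64 = (1 << 64) - 1


def rotr(x: int, n: int) -> int:
    """ROTR^n: circular right shift of a 64-bit word by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def ch(e: int, f: int, g: int) -> int:
    """Bitwise 'if e then f else g'."""
    return (e & f) ^ (~e & g)


def maj(a: int, b: int, c: int) -> int:
    """Bitwise majority of three words."""
    return (a & b) ^ (a & c) ^ (b & c)


def big_sigma_a(a: int) -> int:
    return rotr(a, 28) ^ rotr(a, 34) ^ rotr(a, 39)


def big_sigma_e(e: int) -> int:
    return rotr(e, 14) ^ rotr(e, 18) ^ rotr(e, 41)


def first_primes(count: int) -> list[int]:
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(n: int) -> int:
    """Largest r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# K_i: first 64 bits of the fractional part of the cube root of the ith prime.
K = [icbrt(p << 192) & MASK64 for p in first_primes(80)]


def sha512_round(regs: tuple, w_i: int, k_i: int) -> tuple:
    """One round: new (a, ..., h) from the old registers, W_i and K_i."""
    a, b, c, d, e, f, g, h = regs
    t1 = (h + ch(e, f, g) + big_sigma_e(e) + w_i + k_i) & MASK64
    t2 = (big_sigma_a(a) + maj(a, b, c)) & MASK64
    # All right-hand sides use the register values from before this round.
    return ((t1 + t2) & MASK64, a, b, c, (d + t1) & MASK64, e, f, g)
```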

• The output of the 80th round is added to the content of the hash buffer at the beginning of the round-based processing. This addition is carried out separately for each of the eight 64-bit registers of the hash buffer, modulo $2^{64}$.

Final Output: After all the N message blocks have been processed (see Figure 4), the content of the 512-bit hash buffer is the message digest.
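Putting the steps together, here is a compact, unoptimized pure-Python sketch of the whole SHA-512 computation: padding, the initialization vector, the message schedule, the 80 rounds, and the word-by-word addition modulo 2^64 after the 80th round. It is written only to check our understanding of the description above against Python's `hashlib.sha512`; real code should simply call `hashlib`. The helper names are illustrative.

```python
import hashlib
import math

MASK64 = (1 << 64) - 1


def _first_primes(count):
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def _icbrt(n):
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


_PRIMES = _first_primes(80)
IV = [math.isqrt(p << 128) & MASK64 for p in _PRIMES[:8]]  # square roots
K = [_icbrt(p << 192) & MASK64 for p in _PRIMES]  # cube roots


def _rotr(x, n):
    return ((x >> n) | (x << (64 - n))) & MASK64


def _f(h_prev, block):
    """The module f: process one 1024-bit block together with H_{i-1}."""
    w = [int.from_bytes(block[8 * i:8 * i + 8], "big") for i in range(16)]
    for i in range(16, 80):
        s0 = _rotr(w[i - 15], 1) ^ _rotr(w[i - 15], 8) ^ (w[i - 15] >> 7)
        s1 = _rotr(w[i - 2], 19) ^ _rotr(w[i - 2], 61) ^ (w[i - 2] >> 6)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK64)
    a, b, c, d, e, f, g, h = h_prev
    for i in range(80):
        t1 = (h + ((e & f) ^ (~e & g))
              + (_rotr(e, 14) ^ _rotr(e, 18) ^ _rotr(e, 41)) + w[i] + K[i]) & MASK64
        t2 = ((_rotr(a, 28) ^ _rotr(a, 34) ^ _rotr(a, 39))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK64
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK64, a, b, c, (d + t1) & MASK64, e, f, g
    # Add the 80th round's output to the buffer contents from before round 0.
    return [(x + y) & MASK64 for x, y in zip(h_prev, (a, b, c, d, e, f, g, h))]


def sha512(message: bytes) -> bytes:
    bit_len = 8 * len(message)
    padded = (message + b"\x80" + b"\x00" * ((111 - len(message)) % 128)
              + bit_len.to_bytes(16, "big"))
    h = list(IV)
    for i in range(0, len(padded), 128):
        h = _f(h, padded[i:i + 128])
    return b"".join(word.to_bytes(8, "big") for word in h)
```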
Figure 4: Overall processing of SHA-512. The actual message of L bits is augmented with padding and the length value to a multiple of 1024-bit blocks M1, M2, . . . , MN. Starting from the 512-bit initialization vector H0, each block Mi and the previous 512-bit hash value Hi−1 enter the module f, which outputs Hi; the final HN is the hash. (This figure is from Lecture 15 of “Computer and Network Security” by Avi Kak.)

Figure 5: The compression function f for one message block Mi. The eight 64-bit registers a, b, c, d, e, f, g, h of the 512-bit hash buffer, loaded from Hi−1, pass through Round 0, Round 1, . . . , Round 79, each round being fed its message-schedule word and constant (W0 and K0 for Round 0, through W79 and K79 for Round 79). The output of Round 79 is added register by register, modulo 2^64, to Hi−1 to produce Hi. (This figure is from Lecture 15 of “Computer and Network Security” by Avi Kak.)
15.11: Hash Functions for Computing Message Authentication Codes

• Just as a hash code is a fixed-size fingerprint of a variable-sized message, so is a message authentication code (MAC).

• A MAC is also known as a cryptographic checksum and as an authentication tag.

• A MAC can be produced by appending a secret key to the message and then hashing the composite message. The resulting hash code is the MAC. [A MAC produced with a hash function is known as a hash-based MAC; HMAC, described later in this section, is the standard construction of this type. A MAC can also be based on a block cipher or a stream cipher. The block-cipher-based DES-CBC MAC is widely used in various standards.]

• More sophisticated ways of producing a MAC may involve an iterative procedure in which a pattern derived from the key is added to the message, the composite hashed, another pattern derived from the key added to the hash code, the new composite hashed again, and so on.

• Another way to generate a MAC would be to compress the message into a fixed-size signature and to then encrypt the signature with an algorithm like DES. The output of the encryption algorithm becomes the MAC value, and the encryption key becomes the secret that must be shared between the sender and the receiver of a message.

• Assuming a collision-resistant hash function, the original message and its MAC can be safely transmitted over a network without worrying that the integrity of the data may get compromised. A recipient with access to the key used for calculating the MAC can verify the integrity of the message by recomputing its MAC and comparing it with the value received.
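As a minimal sketch of the append-a-key MAC mentioned earlier and of the recipient's check, the code below uses SHA-512 (an arbitrary choice here) and Python's `hmac.compare_digest` for the comparison, which is designed to resist timing analysis. The names `naive_mac` and `verify` are hypothetical. For real use, HMAC, described later in this section, is the construction to prefer.

```python
import hashlib
import hmac


def naive_mac(key: bytes, message: bytes) -> bytes:
    """MAC = hash(message || key): append the secret key, then hash."""
    return hashlib.sha512(message + key).digest()


def verify(key: bytes, message: bytes, received_mac: bytes) -> bool:
    """Recipient's check: recompute the MAC and compare it with the received one."""
    # compare_digest avoids content-dependent early exits during comparison.
    return hmac.compare_digest(naive_mac(key, message), received_mac)


key = b"shared secret"
tag = naive_mac(key, b"transfer 100 to Bob")
```

With this tag, `verify(key, b"transfer 100 to Bob", tag)` is `True`, while an altered message or a different key makes it `False`.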

• Let’s denote the function that generates the MAC of a message M using a secret key K by C(K, M). That is, MAC = C(K, M).

• Here is a MAC function that is positively not safe:

– Let {X1, X2, . . . , Xm} be the 64-bit blocks of a message M. That is, M = (X1||X2|| . . . ||Xm). (The operator ’||’ means concatenation.) Let

$$\Delta(M) = X_1 \oplus X_2 \oplus \cdots \oplus X_m$$

– We now define

$$C(K, M) = E(K, \Delta(M))$$

where the encryption algorithm, E(), is assumed to be DES in the electronic codebook mode. (That is why we assumed 64 bits for the block length. We will also assume the key length to be 56 bits.) Let’s say that an adversary can observe {M, C(K, M)}.

– An adversary can easily create a forgery of the message by replacing X1 through Xm−1 with any desired Y1 through Ym−1 and then replacing Xm with Ym given by

$$Y_m = Y_1 \oplus Y_2 \oplus \cdots \oplus Y_{m-1} \oplus \Delta(M)$$

It is easy to show that when the new message $M_{\text{forged}} = (Y_1 \| Y_2 \| \cdots \| Y_m)$ is sent along with the original C(K, M) = E(K, ∆(M)), the recipient would not suspect any foul play. Since $\Delta(M_{\text{forged}}) = \Delta(M)$, when the recipient calculates the MAC of the received message using his/her secret key K, the calculated MAC agrees with the received MAC.
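The forgery is easy to reproduce. Python's standard library has no DES, so the sketch below substitutes a hypothetical stand-in for E(K, ·), namely a truncated SHA-256 of key and block; the attack never looks inside E, so any deterministic keyed function would serve just as well. Messages are represented as lists of 8-byte (64-bit) blocks, and all names are illustrative.

```python
import hashlib

BLOCK_BYTES = 8  # 64-bit blocks, as in the DES-based example


def xor_blocks(blocks):
    """Delta(): XOR together a sequence of 8-byte blocks."""
    acc = bytes(BLOCK_BYTES)
    for blk in blocks:
        acc = bytes(x ^ y for x, y in zip(acc, blk))
    return acc


def E(key, block):
    """Hypothetical stand-in for DES encryption of one 64-bit block."""
    return hashlib.sha256(key + block).digest()[:BLOCK_BYTES]


def C(key, blocks):
    """The unsafe MAC: C(K, M) = E(K, Delta(M))."""
    return E(key, xor_blocks(blocks))


def forge(original_blocks, chosen_blocks):
    """Keep any chosen Y1..Y(m-1) and append Ym = Y1 ^ ... ^ Y(m-1) ^ Delta(M)."""
    y_m = xor_blocks(list(chosen_blocks) + [xor_blocks(original_blocks)])
    return list(chosen_blocks) + [y_m]


key = b"7-bytes"  # 56-bit key, matching the DES assumption
original = [b"PAY $100", b"TO ALICE", b"FROM BOB"]
forged = forge(original, [b"PAY $999", b"TO MALLY"])
```

The adversary never uses `key` in `forge`, yet `C(key, forged)` equals `C(key, original)`.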

• The lesson to be learned from the unsafe MAC algorithm is that although a brute-force attack to figure out the secret key K would be very expensive (requiring around $2^{56}$ encryptions of the message), it is nonetheless ridiculously easy to replace a legitimate message with a fraudulent one.
• A commonly-used and cryptographically-secure approach for computing MACs is known as HMAC. It is used in the IPSec protocol (for packet-level security in computer networks), in SSL (for transport-level security), and a host of other applications.

• The size of the MAC produced by HMAC is the same as the size of the hash code produced by the underlying hash function (which is typically SHA-1).

• The operation of the HMAC algorithm is shown in Figure 6. This figure assumes that you want an n-bit MAC and that you will be processing the input message M one block at a time, with each block consisting of b bits.

– The message is segmented into b-bit blocks Y1, Y2, . . ..

– K is the secret key to be used for producing the MAC.

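– K+ is the secret key K padded with zeros on the right (that is, zeros appended to the end of K) so that the result is b bits long. Recall, b is the length of each message block Yi. [RFC 2104, which defines HMAC, also specifies that a key longer than b bits is first hashed.]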

– The algorithm constructs two sequences ipad and opad, the former by repeating the 00110110 sequence b/8 times, and the latter by repeating 01011100 also b/8 times.

– The operation of HMAC is described by:

$$\mathrm{HMAC}_K(M) = h\big( (K^{+} \oplus \mathrm{opad}) \,\|\, h\big( (K^{+} \oplus \mathrm{ipad}) \,\|\, M \big) \big)$$

where h() is the underlying iterated hash function of the sort we have covered in this lecture.
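The HMAC formula can be checked directly against Python's `hmac` module. The sketch below assumes SHA-512 as the underlying hash h() to tie in with Section 15.10 (the text notes SHA-1 as the typical choice at the time); for SHA-512 the block size b is 1024 bits, i.e., 128 bytes. Following RFC 2104, K+ is formed by appending zero bytes to K, and a key longer than b is hashed first. The function name `hmac_sha512` is hypothetical.

```python
import hashlib
import hmac


def hmac_sha512(key: bytes, message: bytes) -> bytes:
    """HMAC_K(M) = h((K+ xor opad) || h((K+ xor ipad) || M)) with h = SHA-512."""
    b = hashlib.sha512().block_size  # 128 bytes, i.e. b = 1024 bits
    if len(key) > b:
        key = hashlib.sha512(key).digest()  # RFC 2104: hash an over-long key first
    k_plus = key.ljust(b, b"\x00")  # K+: zeros appended until it is b bytes long
    ipad = b"\x36" * b  # 00110110 repeated b times (b counted in bytes here)
    opad = b"\x5c" * b  # 01011100 repeated b times
    inner = hashlib.sha512(bytes(k ^ p for k, p in zip(k_plus, ipad)) + message).digest()
    return hashlib.sha512(bytes(k ^ p for k, p in zip(k_plus, opad)) + inner).digest()
```

For any key and message, the result should match `hmac.new(key, message, hashlib.sha512).digest()`.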

• The security of HMAC depends on the security of the underlying hash function, and, of course, on the size and the quality of the key.

• For further information on HMAC, see Chapter 12 of “Cryptography and Network Security” by William Stallings, the source of the information presented here.

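Figure 6: The HMAC algorithm. K+ ⊕ ipad (b bits) is placed ahead of the b-bit message blocks Y0, Y1, . . . , YL−1, and the result is hashed into an n-bit hash. That hash, padded to b bits, is placed after K+ ⊕ opad (b bits), and this composite is hashed again to produce the n-bit HMAC. (This figure is from “Computer and Network Security” by Avi Kak.)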
HOMEWORK PROBLEMS

1. What is a hash code?

2. If you had only one minute to write a program that calculates the 8-bit hash code of the contents of a disk file, how might you do it?

3. Why is it a foolish exercise to calculate an 8-bit hash by XORing all the bytes in a file?

4. Even though its support will soon be withdrawn by the government, what is probably the most frequently used hash algorithm today? What is the size of the hash code produced by this algorithm?

5. The very first step in the SHA1 algorithm is to pad the message so that its length is a multiple of 512 bits. This padding occurs as follows (from NIST FIPS 180-2): Suppose the length of the message M is L bits. Append bit 1 to the end of the message, followed by K zero bits where K is the smallest non-negative solution to

$$L + 1 + K \equiv 448 \pmod{512}$$

Next append a 64-bit block that is a binary representation of the length integer L. For example,

    Message = "abc"
    length L = 24 bits

    01100001 01100010 01100011 1 00......000 00...011000
       a        b        c       <---423---> <---64---->
    <------------------------ 512 ------------------------>

Now here is the question: Why do we include the length of the message in the calculation of the hash code?
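The padding arithmetic in the "abc" example can be checked with a short sketch. It assumes the message is a whole number of bytes, in which case K is always 7 more than a multiple of 8, so the appended 1-bit and the first seven 0-bits form the byte 0x80. The helper names are hypothetical.

```python
def sha1_zero_bits(length_bits: int) -> int:
    """Smallest K >= 0 with L + 1 + K congruent to 448 modulo 512."""
    return (448 - (length_bits + 1)) % 512


def sha1_pad(message: bytes) -> bytes:
    """Pad a whole-byte message: the 1 bit, K zero bits, then L as 64 bits."""
    length_bits = 8 * len(message)
    k = sha1_zero_bits(length_bits)
    # The 1 bit plus seven 0 bits make 0x80; the remaining K - 7 zero bits
    # are whole zero bytes.
    return (message + b"\x80" + b"\x00" * ((k - 7) // 8)
            + length_bits.to_bytes(8, "big"))
```

For "abc", `sha1_zero_bits(24)` is 423 and the padded result is exactly one 512-bit block ending in the byte 0x18 (decimal 24).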

6. The fact that only the last 64 bits of the padded message are
used for representing the length of the message implies that SHA1
should NOT be used for messages that are longer than what?

7. SHA1 scans through a document by processing 512-bit blocks. Each block is hashed into a 160-bit hash code that is then used as the initialization vector for the next block of 512 bits. This obviously requires a 160-bit initialization vector for the first 512-bit block. Here is the vector (each value is a 32-bit word in hex):
H_0 = 67452301
H_1 = efcdab89
H_2 = 98badcfe
H_3 = 10325476
H_4 = c3d2e1f0
How are these numbers selected?

8. Why can a hash function not be used for encryption?

9. What is meant by the strong collision resistance property of a hash function?

10. Right or wrong: When you create a new password, only the hash
code for the password is stored. The text you entered for the
password is immediately discarded.

11. What is the relationship between “hash” as in “hash code” or “hashing function” and “hash” as in a “hash table”?

12. Programming Assignment:

To gain further insights into hashing, the goal of this homework is to implement in Perl or Python a very simple hash function (that is meant more for play than for any serious production work). Write a function that creates a 32-bit hash of a file through the following steps: (1) Initialize the hash to all zeros; (2) Scan the file one byte at a time; (3) Before a new byte is read from the file, circularly shift the bit pattern in the hash to the left by four positions; (4) Now XOR the new byte read from the file with the least significant byte of the hash. Now scan your directory (a very simple thing to do in both Perl and Python, as shown in Chapters 2 and 3 of the SWO book) and compute the hash of all your files. Dump the hash values in some output file. Now write another two-line script to check if your hashing function is exhibiting any collisions. Even though we have a trivial hash function, it is very likely that you will not see any collisions even if your directory is large. Subsequently, by using a couple of files (containing random text) created specially for this demonstration, show how you can make their hash codes come out the same if you alter one of the files by appending to it a stream of bytes that would be the XOR of the original hash values for the files (after you have circularly rotated the hash value for the first file by 4 bits to the left). NOTE: This homework is easy to implement in Python if you use your instructor’s BitVector class.

Acknowledgement

Prateek Singhal caught a couple of typographical errors in the equations on slide 26. Thanks Prateek.
