
BCT Unit III Cryptography and Hashing New

A Message Authentication Code (MAC) is a cryptographic tool used to verify the authenticity and integrity of a message by ensuring it comes from the correct sender and has not been altered. The MD5 algorithm is a widely used hash function that produces a 128-bit hash value, but is now considered insecure for cryptographic applications due to vulnerabilities. Secure Hash Algorithms (SHA) are a family of cryptographic functions designed to secure data, with SHA-1, SHA-2, and SHA-3 providing increasing levels of security against attacks.

What Is a Message Authentication Code?

A Message Authentication Code (MAC), also referred to as a tag, is used to authenticate the origin and integrity of a
message. MACs use symmetric-key cryptography to verify the legitimacy of data sent through a network or
transferred from one person to another.

In other words, a MAC lets the receiver check that the message is coming from the correct sender and has not been
changed, whether the data is transferred over a network or stored in or outside a system.
The secret keys used to compute MACs can be stored on a hardware security module, a device used to manage sensitive digital keys.

How Does a Message Authentication Code Work?

The first step in the MAC process is for the sender and the receiver to agree on a shared secret key, usually over a
secure channel. To authenticate a message, the sender runs a MAC algorithm that takes the symmetric key and the plain
text message being sent as input. The MAC algorithm then generates an authentication tag of a fixed length by
processing the message. The resulting computation is the message's MAC.

This MAC is then appended to the message and transmitted to the receiver. The receiver recomputes the MAC over the
received message using the same algorithm and the same key. If the resulting MAC the receiver arrives at equals the
one sent by the sender, the message is verified as authentic, legitimate, and not tampered with.

In effect, a MAC uses a secret key known only to the sender and the recipient. Without this key, a third party cannot
produce a valid tag for a message. Note that a MAC does not hide the message itself; it only protects its authenticity
and integrity. If the data is altered between the time the sender initiates the transfer and when the recipient
receives it, the MAC computed by the recipient will also be affected.

Therefore, when the recipient attempts to verify the authenticity of altered data, the recomputed tag will not match
the one sent by the sender. When this kind of discrepancy is detected, the data packet can be discarded,
protecting the recipient's system.
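
The tag-and-verify flow described above can be sketched in Python. This is a minimal illustration, assuming HMAC-SHA256 as the MAC algorithm (the notes do not name a specific one); the key, the messages, and the helper names make_tag and verify_tag are hypothetical.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Sender side: compute a fixed-length authentication tag."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = b"shared-secret-key"          # hypothetical key known to both parties
message = b"transfer 100 to Bob"    # hypothetical message
tag = make_tag(key, message)

print(verify_tag(key, message, tag))                 # True
print(verify_tag(key, b"transfer 900 to Bob", tag))  # False
```

The message travels in the clear next to its tag; only a party holding the key can produce a tag that verifies.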

What is the MD5 hash function (md5 message-digest)?


MD5 is a widely used hash function that produces a message digest (or hash value) of 128 bits in length. It was
initially designed as a cryptographic hash function, but vulnerabilities were later found, and it is therefore
no longer considered suitable for cryptographic applications.

There are two main ways of building hash functions:

• Based on compression. A compression function combines the current state with the next message block into an output of a smaller size.
  o MD5, SHA-1, and SHA-2
• Based on permutations. A fixed permutation is applied whose output has the same size as its input.
  o SHA-3 (Keccak)

MD5 is based on the Merkle-Damgård construction.
What is the MD5 Algorithm?


MD5 is a cryptographic hash function that takes a message of any length as input and
converts it into a fixed-length digest of 16 bytes. MD stands for message digest.
MD5 was developed in 1991 by Ronald Rivest as an improvement of MD4, with stronger security in mind. The
output of MD5 (digest size) is always 128 bits.
Use Of MD5 Algorithm:
• It is used for file authentication, i.e. checking that a file has not been changed.
• In web applications, it was historically used for security purposes, e.g. storing the passwords of users as hashes. This is no longer recommended.
• Using this algorithm, a password can be stored as a 128-bit digest instead of plain text.

MD5 Algorithm

Working of the MD5 Algorithm:

MD5 algorithm follows the following steps


1. Append Padding Bits: In the first step, we add padding bits (a single 1 bit followed by 0 bits) to the original
message in such a way that the total length of the message is 64 bits less than an exact multiple of 512.
Suppose we are given a message of 1000 bits. Now we have to add padding bits to the original
message. Here we will add 472 padding bits to the original message. After adding the padding bits
the size of the output of the first step will be 1472 bits, i.e. 64 bits less than an exact
multiple of 512 (i.e. 512*3 = 1536).
Length(original message + padding bits) = 512 * n – 64 where n = 1,2,3 . . .
2. Append Length Bits: In this step, we add the length bits to the output of the first step in such a
way that the total number of bits is an exact multiple of 512. Simply, here we add the length of the
original message as a 64-bit value to the output of the first step.
i.e. output of first step = 512 * n – 64
length bits = 64.
After adding both we will get 512 * n i.e. the exact multiple of 512.
3. Initialize MD buffer: Here, we use 4 buffers, i.e. J, K, L, and M. The size of each buffer is 32
bits.
- J = 0x67452301
- K = 0xEFCDAB89
- L = 0x98BADCFE
- M = 0x10325476
4. Process Each 512-bit Block: This is the most important step of the MD5 algorithm. Here, a
total of 64 operations are performed in 4 rounds of 16 operations each. We apply a different
function in each round, i.e. the F function in the 1st round, the G function in the 2nd, the H
function in the 3rd, and the I function in the 4th.
We perform OR, AND, XOR, and NOT (basically these are logic gates) for calculating the functions.
Each function takes 3 buffers as input, i.e. K, L, M.
- F(K,L,M) = (K AND L) OR (NOT K AND M)
- G(K,L,M) = (K AND M) OR (L AND NOT M)
- H(K,L,M) = K XOR L XOR M
- I(K,L,M) = L XOR (K OR NOT M)
After applying the function, we perform an operation on each block. For performing operations
we need:
• + : addition modulo 2^32.
• M[i] : a 32-bit word of the message block.
• K[i] : a 32-bit constant.
• <<<n : left rotation (circular shift) by n bits.
Now take the initialized MD buffers J, K, L, and M as input. K will be fed into L, L will be fed into M,
and M will be fed into J. After doing this we perform some operations to find the new output from J.
• In the first step, K, L, and M are taken and the function F is applied to them. The output of
this is added to J modulo 2^32.
• In the second step, we add the 32-bit message word M[i] to the output of the first step.
• Then we add the 32-bit constant K[i] to the output of the second step.
• At last, we rotate the result left by n bits (n is a fixed shift amount defined for each operation) and add the
buffer K to it modulo 2^32.
After all steps, the result will be fed into K. The same steps are used for the functions G, H,
and I. After performing all 64 operations, the results are added to the buffer values from before the block, and
after the last block we get our message digest.
Output:
After all rounds have been performed, the buffers J, K, L, and M contain the MD5 output, starting
with the low-order byte of J and ending with the high-order byte of M.
Output

Hash of the input string:


922547e866c89b8f677312df0ccec8ee
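
To connect the padding arithmetic and the 128-bit digest size to something runnable, here is a small Python sketch. It uses the standard hashlib module rather than re-implementing the rounds; the helper name md5_padding_bits is hypothetical, and the input b"abc" is an arbitrary example, not the input behind the hash printed above.

```python
import hashlib


def md5_padding_bits(message_bits: int) -> int:
    """Padding bits needed so the length is 64 bits short of a multiple of 512.

    At least one padding bit is always added, so the result is in 1..512.
    """
    pad = (448 - message_bits) % 512
    return pad if pad != 0 else 512


# The 1000-bit example from step 1: 1000 + 472 = 1472 = 3 * 512 - 64.
print(md5_padding_bits(1000))  # 472

digest = hashlib.md5(b"abc").hexdigest()
print(digest)           # 900150983cd24fb0d6963f7d28e17f72
print(len(digest) * 4)  # 128 (32 hex characters, 4 bits each)
```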

Application Of MD5 Algorithm:

• We use the message digest to verify the integrity of files, i.e. to detect that a file has changed.
• MD5 was used for data security purposes. Note that it is a hash function and does not encrypt data.
• It is used to digest messages of any size and was also used for password verification.
• For Game Boards and Graphics.

Advantages of MD5 Algorithm:

• MD5 is fast and simple to understand.
• The MD5 algorithm generates a compact digest in 16-byte format. Many developers, e.g. web
developers, used the MD5 algorithm to store the passwords of users, although this is no longer considered safe.
• To integrate the MD5 algorithm, relatively low memory is necessary.
• It is very easy and fast to generate a digest of the original message.

Disadvantages of MD5 Algorithm:

• MD5 can generate the same hash value for different inputs (a collision), and such inputs can be found in practice.
• MD5 provides weaker security than SHA-1.
• MD5 is considered an insecure algorithm, so SHA-256 is now used instead of MD5.
• MD5 is neither a symmetric nor an asymmetric encryption algorithm; it uses no key and its output cannot be decrypted.

Is MD5 Secure?
MD5 is not suitable for cryptographic applications.
As stated in RFC 6151 “MD5 is no longer acceptable where collision resistance is required such as digital
signatures.”

Secure Hash Algorithms

Secure Hash Algorithms, also known as SHA, are a family of cryptographic functions designed to keep data
secured. They work by transforming the data using a hash function: an algorithm that consists of bitwise
operations, modular additions, and compression functions.

The hash function then produces a fixed-size string that looks nothing like the original.

These algorithms are designed to be one-way functions, meaning that once they’re transformed into their respective
hash values, it’s virtually impossible to transform them back into the original data.

A few algorithms of interest are SHA-1, SHA-2, and SHA-3, each of which was successively designed with
increasingly stronger security in response to newly discovered attacks.

SHA-0, for instance, is now obsolete due to the widely exposed vulnerabilities.

A common application of SHA is hashing passwords, as the server side only needs to keep track of a specific
user's hash value, rather than the actual password.

This is helpful in case an attacker hacks the database, as they will only find the hashed values and not the actual
passwords, so if they were to input the hashed value as a password, the hash function would convert it into another
string and subsequently deny access.

Additionally, SHAs exhibit the avalanche effect, where the modification of very few characters of the input causes
a big change in the output; conversely, similar-looking inputs do not produce similar hash values.

This effect causes hash values to not give any information regarding the input string, such as its original length.

In addition, SHAs are also used to detect the tampering of data by attackers, where if a text file is slightly changed
and barely noticeable, the modified file’s hash value will be different than the original file’s hash value, and the
tampering will be rather noticeable.

A small tweak in the original data produces a drastically different hash output. This is called the avalanche effect [1] .
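
The avalanche effect can be observed directly. The following Python sketch, assuming SHA-256 from the standard hashlib module and two arbitrary example strings that differ in one character, counts how many of the 256 output bits differ; the helper name bit_difference is hypothetical. For a well-behaved hash one would expect roughly half of the bits to flip.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(x).count("1")


d1 = hashlib.sha256(b"hello world").digest()
d2 = hashlib.sha256(b"hello worle").digest()  # last character changed

print(d1.hex())
print(d2.hex())
print(bit_difference(d1, d2), "of 256 bits differ")
```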
SHA Characteristics
Cryptographic hash functions are utilized in order to keep data secured by providing three fundamental safety
characteristics: pre-image resistance, second pre-image resistance, and collision resistance.

The cornerstone of cryptographic security lies in the provision of pre-image resistance, which makes it hard and
time-consuming for an attacker to find an original message, m, given the respective hash value, H(m).

This security is provided by the nature of one-way functions, which is a key component of SHA. Pre-image
resistance is necessary to ward off brute force attacks from powerful machines.

One-way Function

Alice and Bob are pen pals who share their thoughts via mail. When Alice visited Bob, she gave him a phone book
of her city.

In order to keep their messages safe from intruders, Alice tells Bob that she will encrypt the message. She tells Bob
that he will find a bunch of numbers on every letter, and each sequence of numbers represents a phone number.

Bob’s job is to find the phone number in the book and write down the first letter of the person’s last name. With this
function, Bob is to decrypt the entire message.

To decrypt the message, Bob has to read the entire phone book to find all the numbers on the letter, whereas Alice
can quickly find the letters and their respective phone numbers in order to encrypt her message.

For this reason, before Bob is able to decrypt the message by hand, Alice can re-hash the message and keep the data
secure. This makes Alice’s algorithm a one-way function[2].
The second safety characteristic is called second pre-image resistance, granted by SHA when a message m1 is
known, yet it's hard to find another message m2 that hashes to the same value: H(m1) = H(m2).

Without this characteristic, an attacker could find a second password that yields the same hash value as a known
one, deeming the original password unnecessary in order to access secured data.

The last safety characteristic is collision resistance, which is provided by algorithms that make it extremely hard
for an attacker to find any two different messages that hash to the same hash value: H(m1) = H(m2). Because a hash
function has more possible inputs than possible outputs, collisions must exist by the pigeonhole principle;
collision resistance only requires that finding one is computationally infeasible.

Collision resistance therefore implies that finding two inputs that hash to the same hash
value is extremely difficult. Without collision resistance, digital signatures can be compromised: if one person is
able to produce two different documents with the same hash value, a signature obtained on one document is equally
valid for the other.

Recent cryptographic functions have stronger security characteristics to block off techniques
such as length extension attacks, where given a hash value, hash(m), and the length of the original message, m, an
attacker can choose a message, m', and calculate the hash value of the concatenation of the original message and the
new message, hash(m ∣∣ m'), without knowing m itself.
As a general guideline, a hash function should be as seemingly random as possible while still
being deterministic and fast to compute.

SHA-1
Secure Hash Algorithm 1, or SHA-1, was published in 1995 by the U.S. government's standards agency National
Institute of Standards and Technology (NIST), as a revision of SHA-0 from 1993. It has been widely used in security
applications and protocols, including TLS, SSL, PGP, SSH, IPsec, and S/MIME.

SHA-1 works by taking a message as a bit string of length less than 2^64 bits, and producing a 160-bit hash
value known as a message digest. Note that the message below is represented in hexadecimal notation for
compactness.

There are two methods to compute a digest using SHA-1. Although one of the methods saves the storage of
sixty-four 32-bit words, it is more complex and time-consuming to execute, so the simple method is shown in the
example below. The algorithm processes the message in blocks of 16 words, where each word is made up
of 32 bits, for a total of 512 bits per block, and at the end of the execution it outputs five 32-bit words, for
a total of 160 bits.

Pseudocode

Suppose the message ‘abc’ were to be encoded using SHA-1, with the message ‘abc’ in binary being

01100001 01100010 01100011

and that in hex being

616263
1) The first step is to initialize five fixed constants that will serve as the initial hash value
(shown in hex):

H0 = 67452301
H1 = EFCDAB89
H2 = 98BADCFE
H3 = 10325476
H4 = C3D2E1F0.

2) The message is then padded by appending a 1, followed by enough 0s until the message length is 448 bits modulo
512. The length of the message represented by 64 bits is then added to the end, producing a message whose length
is a multiple of 512 bits (here, exactly 512 bits):

Padding of string "abc" in bits, finalized by the length of the string, which is 24 bits.
3) The padded input obtained above, M, is then divided into 512-bit chunks, and each chunk is further divided into
sixteen 32-bit words, W0…W15. In the case of ‘abc’, there’s only one chunk, as the message is less than 512-bits
total.

4) For each chunk, begin the 80 iterations, i, necessary for hashing (80 is the determined number for SHA-1), and
execute the following steps on each chunk, Mn:

For iterations 16 through 79, where 16 ≤ i ≤ 79, perform the following operation:

W(i) = S^1(W(i−3) ⊕ W(i−8) ⊕ W(i−14) ⊕ W(i−16)),

where XOR, or ⊕, is represented by the following comparison of inputs x and y:

x   y   Output
0   0   0
1   0   1
0   1   1
1   1   0

For example, when i is 16, the words chosen are W(13), W(8), W(2), W(0), and the output is a new
word, W(16), so performing the XOR, or ⊕, operation on those words will give this (before the circular shift):

W(0)   01100001 01100010 01100011 10000000
W(2)   00000000 00000000 00000000 00000000
W(8)   00000000 00000000 00000000 00000000
W(13)  00000000 00000000 00000000 00000000
XOR    01100001 01100010 01100011 10000000

Circular Shift Operation

Now, the circular shift operation S^n(X) on the word X by n bits, n being an integer with 0 ≤ n < 32, is defined by

S^n(X) = (X << n) OR (X >> (32−n)),

where X << n is the left-shift operation, obtained by discarding the leftmost n bits of X and padding the result
with n zeroes on the right.

X >> (32−n) is the right-shift operation, obtained by discarding the rightmost 32−n bits of X and padding the result
with 32−n zeroes on the left.
Thus S^n(X) is equivalent to a circular shift of X by n positions, and in this case the circular left-shift is used. [3]

So, a circular left shift by one position of the 5-bit string 10010 would produce 00101, as the leftmost bit 1 is
moved to the right end of the string. Therefore, W(16) would end up being

11000010 11000100 11000111 00000000.

5) Now, store the hash values defined in step 1 in the following variables:

A=H0
B=H1
C=H2
D=H3
E=H4.

6) For 80 iterations, where 0 ≤ i ≤ 79, compute

TEMP = S^5(A) + f(i;B,C,D) + E + W(i) + K(i),

where + is addition modulo 2^32. See below for details on the logical function, f, and on the values of K(i).
Reassign the following variables:
E = D
D = C
C = S^30(B)
B = A
A = TEMP.

7) Add the result of the chunk's hash to the overall hash value of all chunks, as shown below (additions are
modulo 2^32), and proceed to execute the next chunk:

H0 = H0 + A
H1 = H1 + B
H2 = H2 + C
H3 = H3 + D
H4 = H4 + E.

8) As a final step, when all the chunks have been processed, the message digest is represented as the 160-bit string
comprised of the OR logical operator, ∨, of the 5 hashed values, each shifted left to its position within the
160-bit string:

HH = S^128(H0) ∨ S^96(H1) ∨ S^64(H2) ∨ S^32(H3) ∨ H4.

This is the same as concatenating H0, H1, H2, H3, and H4.

So, the string ‘abc’ is represented by the hash value a9993e364706816aba3e25717850c26c9cd0d89d.
If the string changed to ‘abcd’, for instance, the hashed value would be drastically different so attackers cannot tell
that it is similar to the original message.

The hash value for 'abcd' is 81fe8bfe87576c3ecb22426f8e57847382917acf.

Functions used in the algorithm

A sequence of logical functions is used in SHA-1, depending on the value of i, where 0 ≤ i ≤ 79, and on three 32-bit
words B, C, and D, in order to produce a 32-bit output. The following equations describe the logical functions,
where ¬ is the logical NOT, ∨ is the logical OR, ∧ is the logical AND, and ⊕ is the logical XOR:

f(i;B,C,D) = (B ∧ C) ∨ ((¬B) ∧ D) for 0 ≤ i ≤ 19
f(i;B,C,D) = B ⊕ C ⊕ D for 20 ≤ i ≤ 39
f(i;B,C,D) = (B ∧ C) ∨ (B ∧ D) ∨ (C ∧ D) for 40 ≤ i ≤ 59
f(i;B,C,D) = B ⊕ C ⊕ D for 60 ≤ i ≤ 79.

Additionally, a sequence of constant words, shown in hex below, is used in the formulas:

K(i) = 5A827999, where 0 ≤ i ≤ 19
K(i) = 6ED9EBA1, where 20 ≤ i ≤ 39
K(i) = 8F1BBCDC, where 40 ≤ i ≤ 59
K(i) = CA62C1D6, where 60 ≤ i ≤ 79.
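
Steps 1 through 8 can be put together in a compact Python sketch. It is meant for study only, not as a replacement for a vetted library: the function names rotl and sha1_sketch are hypothetical, the initial values H0..H4 are the standard SHA-1 constants from the published specification, and the result is checked against Python's hashlib.

```python
import hashlib
import struct


def rotl(x: int, n: int) -> int:
    """Circular left shift S^n(X) on a 32-bit word."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def sha1_sketch(message: bytes) -> str:
    # Step 1: standard initial hash values.
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    # Step 2: append a 1 bit, zeros up to 448 mod 512, then the 64-bit length.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", len(message) * 8)
    # Step 3: process each 512-bit (64-byte) chunk as sixteen 32-bit words.
    for start in range(0, len(padded), 64):
        w = list(struct.unpack(">16I", padded[start:start + 64]))
        # Step 4: extend the 16 words to 80.
        for i in range(16, 80):
            w.append(rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
        # Step 5: working variables.
        a, b, c, d, e = h
        # Step 6: 80 iterations with the round function f and constant K.
        for i in range(80):
            if i < 20:
                fn, k = (b & c) | (~b & d), 0x5A827999
            elif i < 40:
                fn, k = b ^ c ^ d, 0x6ED9EBA1
            elif i < 60:
                fn, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                fn, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl(a, 5) + fn + e + w[i] + k) & 0xFFFFFFFF
            e, d, c, b, a = d, c, rotl(b, 30), a, temp
        # Step 7: add this chunk's result to the running hash (mod 2^32).
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]
    # Step 8: concatenate H0..H4 into the 160-bit digest.
    return "".join(f"{x:08x}" for x in h)


print(sha1_sketch(b"abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
print(sha1_sketch(b"abc") == hashlib.sha1(b"abc").hexdigest())  # True
```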
SHA-2
Due to the exposed vulnerabilities of SHA-1, cryptographers modified the algorithm to produce SHA-2, which
consists of not one but two hash functions known as SHA-256 and SHA-512, using 32- and 64-bit words,
respectively. There are additional truncated versions of these hash functions: SHA-224 is derived from SHA-256,
while SHA-384, SHA-512/224, and SHA-512/256 are derived from SHA-512.

SHA-1 and SHA-2 differ in several ways; mainly, SHA-2 produces digests of 224, 256, 384, or 512 bits, whereas SHA-1
produces a 160-bit digest; SHA-2 can also have block sizes that contain 1024 bits (SHA-512 and its variants), or
512 bits, like SHA-1.

Brute force attacks on SHA-2 are not as effective as they are against SHA-1. A brute force search for a
message that corresponds to a given digest of length L would require about 2^L evaluations, which makes
SHA-2, with its longer digests, a lot safer against these kinds of attacks.

Secure Hashing Algorithm – 512


Given a string S, the task is to find the SHA-512 Hash Value of the given string S.

Approach: Follow the steps below to solve the problem:

• Convert the given string into binary form; let N be its length in bits.
• Append ‘1’ to the string and then append ‘0’ until the length of the string is congruent to 896 modulo 1024, i.e. (1024 – 128).
• Append the 128-bit binary representation of N to the string S.
• Find the number of chunks of size 1024 and store it in a variable, say chunks, as (length of the padded string)/1024.
• Divide each 1024-bit chunk into 16 words of 64 bits, Message[0..15].
• Extend the number of words to 80 by performing the following operations:
  • Iterate over the range [16, 79] with index g and find 4 values, say WordA, WordB, WordC, WordD, as:
    • WordA = rotate_right(Message[g – 2], 19) ^ rotate_right(Message[g – 2], 61) ^
      shift_right(Message[g – 2], 6).
    • WordB = Message[g – 7].
    • WordC = rotate_right(Message[g – 15], 1) ^ rotate_right(Message[g – 15], 8) ^
      shift_right(Message[g – 15], 7).
    • WordD = Message[g – 16].
  • Update the value of Message[g] as (WordA + WordB + WordC + WordD) modulo 2^64.
• Initialize 8 variables, say A, B, C, D, E, F, G, H, of type 64-bit to store the final hash value of the given
string S.
• Traverse the chunks and perform the following steps for each chunk:
  • Update the value of A, B, C, D, E, F, G, H using the hash function for 80 iterations, rotating the variables
    one by one.
  • Now, update the value of A, B, C, D, E, F, G, H by the summation of the previous values
    of A, B, C, D, E, F, G, H and the newly updated values of A, B, C, D, E, F, G, H.
• After completing the above steps, print the hexadecimal values of A, B, C, D, E, F, G, H to get the Hash
Value of the given string.
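
The padding and word-extension steps above can be sketched in Python. This is a partial illustration only: it stops before the 80-iteration compression loop, which needs the 80 round constants that these notes do not list. The names pad_sha512 and message_schedule are hypothetical; rotate_right and shift_right follow the names used in the steps.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF  # keep every result within 64 bits


def rotate_right(x: int, n: int) -> int:
    """Circular right rotation of a 64-bit word."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def shift_right(x: int, n: int) -> int:
    """Plain right shift; vacated bits are filled with zeros."""
    return x >> n


def pad_sha512(message: bytes) -> bytes:
    """Append a 1 bit, zeros up to 896 mod 1024 bits, then the 128-bit length."""
    padded = message + b"\x80"
    padded += b"\x00" * ((112 - len(padded) % 128) % 128)
    return padded + (len(message) * 8).to_bytes(16, "big")


def message_schedule(chunk: bytes) -> list:
    """Split one 1024-bit chunk into 16 words and extend them to 80."""
    words = list(struct.unpack(">16Q", chunk))
    for g in range(16, 80):
        word_a = (rotate_right(words[g - 2], 19) ^ rotate_right(words[g - 2], 61)
                  ^ shift_right(words[g - 2], 6))
        word_b = words[g - 7]
        word_c = (rotate_right(words[g - 15], 1) ^ rotate_right(words[g - 15], 8)
                  ^ shift_right(words[g - 15], 7))
        word_d = words[g - 16]
        words.append((word_a + word_b + word_c + word_d) & MASK64)
    return words


padded = pad_sha512(b"abc")
schedule = message_schedule(padded[:128])
print(len(padded) * 8)          # 1024
print(len(schedule))            # 80
print(f"{schedule[0]:016x}")    # 6162638000000000
```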
Common Attacks
Cryptography evolves in response to attacks.

One of the most common attacks is known as the preimage attack, where an attacker searches for an input that
produces a given hash value; in practice, pre-computed tables of solutions are often used in a brute-force manner
to crack passwords.

The solution against these kinds of attacks is to compose a hash function that would take an attacker a very high
amount of resources, such as millions of dollars or decades of work, to find a message corresponding to a given
hash value.

Most attacks penetrating SHA-1 are collision attacks, where two different messages, often nonsensical ones, are
constructed so that they produce the same hash value.

Generally, a generic collision search takes time proportional to 2^(n/2) to complete, where n is the length of the
message digest in bits. This is the reason the message digests have increased in length from 160-bit digests in
SHA-1 to 224- or 256-bit digests (and longer) in SHA-2.

Other attacks exist that attempt to exploit mathematical properties in order to crack hash functions. Amongst these
is the birthday attack, which exploits the birthday paradox: among many randomly chosen inputs, the likelihood that
some two of them collide grows much faster than the likelihood of hitting one specific hash value.
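
The 2^(n/2) cost of a collision search can be made tangible on a deliberately weakened hash. The sketch below, assuming SHA-256 truncated to a small number of bits (an artificial construction chosen only so the search finishes quickly), looks for two different inputs with the same truncated value; the names truncated_hash and find_collision are hypothetical.

```python
import hashlib


def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the top `bits` bits of SHA-256 to get a weak toy hash."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits: int):
    """Hash the inputs b"0", b"1", b"2", ... until two share a truncated hash."""
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode()
        value = truncated_hash(message, bits)
        if value in seen:
            return seen[value], message, counter + 1
        seen[value] = message
        counter += 1


first, second, tries = find_collision(bits=24)
print(first, second, tries)
# With a 24-bit output one expects on the order of 2^12 attempts,
# far fewer than the 2^24 needed to hit one specific value.
```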

What Is a Distributed Hash Table?

A Distributed Hash Table is a decentralized data store that looks up data based on key-value pairs . Every
node in a distributed hash table is responsible for a set of keys and their associated values. The key is a unique
identifier for its associated data value, created by running the value through a hashing function. The data
values can be any form of data.

Distributed hash tables are decentralized, so all nodes form the collective system without any centralized
coordination. They are generally fault-tolerant because data is replicated across multiple nodes. Distributed
hash tables can scale for large volumes of data across many nodes.

A Distributed Hash Table is a decentralized data store that holds data in key-value pairs.
Why Is a Distributed Hash Table Used?

Distributed hash tables provide an easy way to find information in a large collection of data because all keys
are in a consistent format, and the entire set of keys can be partitioned in a way that allows fast identification
on where the key/value pair resides.

The nodes participating in a distributed hash table act as peers to find specific data values, as each node stores
the key partitioning scheme so that if it receives a request to access a given key, it can quickly map the key to
the node that stores the data. It then sends the request to that node.

Also, nodes in a distributed hash table can be easily added or removed without forcing a significant amount of
re-balancing of the data in the cluster.

Cluster rebalancing, especially for large data sets, can often be a time-consuming task that also impacts
performance.

Having a quick and easy means for growing or shrinking a cluster ensures that changes in data size do not
disrupt the operation of the applications that access data in the distributed hash table.
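
One partitioning scheme with this low-rebalancing property is consistent hashing. The notes do not say which scheme they have in mind, so the Python sketch below is an assumption: nodes and keys are hashed onto the same ring with SHA-1, and each key belongs to the first node at or after its position. The class name HashRing and the node names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hashing ring: maps each key to the next node clockwise."""

    def __init__(self, nodes=()):
        self._positions = []  # sorted ring positions of the nodes
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _position(name: str) -> int:
        return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

    def add_node(self, node: str) -> None:
        pos = self._position(node)
        bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def remove_node(self, node: str) -> None:
        pos = self._position(node)
        self._positions.remove(pos)
        del self._owners[pos]

    def node_for(self, key: str) -> str:
        idx = bisect.bisect_left(self._positions, self._position(key))
        # Wrap around to the first node when the key hashes past the last one.
        return self._owners[self._positions[idx % len(self._positions)]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key-{i}" for i in range(100)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(len(moved), "of", len(keys), "keys moved, all to node-d")
```

Only the keys that now fall to the new node change owner; every other key stays where it was, which is why the cluster does not need a full rebalance.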

What is Hashing in Data Structure?

Hashing in the data structure is a technique of mapping a large chunk of data into small tables using a hashing
function. It is also known as the message digest function. It is a technique that uniquely identifies a specific item
from a collection of similar items.

It uses hash tables to store the data in an array format. Each value in the array has
been assigned a unique index number.

Hash tables use a technique to generate these unique index numbers for each value stored in an array format.
This technique is called the hash technique.

You only need to find the index of the desired item, rather than searching through the data. With indexing, you can
go directly to the item you wish to retrieve without scanning the entire list. Indexing also helps in inserting
operations when you need to insert data at a specific location. No matter how big or small the table is, you can
update and retrieve data in roughly constant time on average.

The hash table is basically an array of elements, and the hash technique of search is performed on a part of the
item, i.e. the key. Each key is mapped to a number in the range from 0 to table size − 1.

Hashing in data structure is a two-step process.

1. The hash function converts the item into a small integer or hash value. This integer is used as an index to
store the original data.
2. It stores the data in a hash table. You can use a hash key to locate data quickly.
Need for Hash data structure
Every day, the data on the internet is increasing multifold and it is always a struggle to store this data efficiently.
In day-to-day programming, this amount of data might not be that big, but still, it needs to be stored, accessed,
and processed easily and efficiently.
A very common data structure that is used for such a purpose is the Array data structure.
Now the question arises if Array was already there, what was the need for a new data structure!
The answer to this is in the word “efficiency“.
Though storing in Array takes O(1) time, searching in it takes at least O(log n) time.
This time appears to be small, but for a large data set, it can cause a lot of problems and this, in turn, makes the
Array data structure inefficient.
So now we are looking for a data structure that can store the data and search in it in constant time, i.e. in O(1)
time. This is how Hashing data structure came into play. With the introduction of the Hash data structure, it is
now possible to easily store data in constant time and retrieve them in constant time as well.
Components of Hashing
There are majorly three components of hashing:
1. Key: A key can be anything, such as a string or an integer, which is fed as input to the hash function, the
technique that determines an index or location for storage of an item in a data structure.
2. Hash Function: The hash function receives the input key and returns the index of an element in an array
called a hash table. The index is known as the hash index.
3. Hash Table: A hash table is a data structure that maps keys to values using a special function called a hash
function. It stores the data in an associative manner in an array where each data value has its own unique
index.

Components of Hashing

How does Hashing work?


Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to store it in a table.
Our main objective here is to search or update the values stored in the table quickly in O(1) time and we are not
concerned about the ordering of strings in the table. So each string in the given set acts as a key, and the string
itself also acts as the value. But how do we store the value corresponding to the key?
• Step 1: We know that hash functions (which are some mathematical formula) are used to calculate the hash
value, which acts as the index of the data structure where the value will be stored.
• Step 2: So, let’s assign
  • “a” = 1,
  • “b” = 2, .. etc, to all alphabetical characters.
• Step 3: Therefore, the numerical value by summation of all characters of the string:
  • “ab” = 1 + 2 = 3,
  • “cd” = 3 + 4 = 7,
  • “efg” = 5 + 6 + 7 = 18
• Step 4: Now, assume that we have a table of size 7 to store these strings. The hash function that is used here is
the sum of the characters in the key mod the table size. We can compute the location of the string in the array by
taking sum(string) mod 7.
• Step 5: So we will then store
  • “ab” in 3 mod 7 = 3,
  • “cd” in 7 mod 7 = 0, and
  • “efg” in 18 mod 7 = 4.

Mapping key with indices of array

The above technique enables us to calculate the location of a given string by using a simple hash function and
rapidly find the value that is stored in that location. Therefore the idea of hashing seems like a great way to store
(key, value) pairs of data in a table.
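
The five steps above translate into a few lines of Python. This sketch assumes lowercase keys made only of the letters a to z, as in the example; the function name letter_sum_hash is hypothetical.

```python
def letter_sum_hash(key: str, table_size: int = 7) -> int:
    """Sum the letter values (a = 1, b = 2, ...) and reduce mod the table size."""
    total = sum(ord(ch) - ord("a") + 1 for ch in key)
    return total % table_size


table = [None] * 7
for string in ("ab", "cd", "efg"):
    table[letter_sum_hash(string)] = string

print(letter_sum_hash("ab"), letter_sum_hash("cd"), letter_sum_hash("efg"))  # 3 0 4
print(table)  # ['cd', None, None, 'ab', 'efg', None, None]
```

Note that this simple function sends any two strings with the same letter sum, such as "ab" and "ba", to the same slot, which is exactly the collision problem a real hash table has to handle.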

Examples of Hashing in Data Structure

The following are real-life examples of hashing in the data structure –

• In schools, the teacher assigns a unique roll number to each student. Later, the teacher uses that roll number
to retrieve information about that student.
• A library has a very large number of books. The librarian assigns a unique number to each book. This unique
number helps in identifying the position of the books on the bookshelf.
The hash function in a data structure maps data of arbitrary size to fixed-size data. The values it returns are
known as hash values, hash codes, or hash sums, and are usually small integers. A hash value is typically turned
into an array index as follows:

hash = hashfunc(key)

index = hash % array_size

The hash function must satisfy the following requirements:

• A good hash function is easy to compute.
• A good hash function avoids clustering and distributes keys evenly across the hash table.
• A good hash function minimizes collisions, which occur when two elements or items get assigned to the same hash value.

One application of hash functions is data integrity: with a good hash function, a single
change in a message will create a different hash.

The three characteristics of the hash function in the data structure are:

1. Collision free
2. Hiding (the input cannot be worked out from the hash value)
3. Puzzle friendly

Hash Table

Hashing in data structure uses hash tables to store the key-value pairs. The hash table then uses the hash
function to generate an index. Hashing uses this unique index to perform insert, update, and search operations.

It can be defined as a bucket where the data are stored in an array format. These data have their own index
value. If the index values are known then the process of accessing the data is quicker.
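
A hash table along these lines can be sketched in Python. The notes do not say how two keys landing on the same index are handled, so this sketch assumes separate chaining (each array slot holds a small list of pairs); the class name ChainedHashTable and the roll-number data are hypothetical, and Python's built-in hash() stands in for hashfunc.

```python
class ChainedHashTable:
    """Toy hash table: an array of buckets, each a list of (key, value) pairs."""

    def __init__(self, size: int = 7):
        self._buckets = [[] for _ in range(size)]

    def _index(self, key) -> int:
        # index = hash % array_size, as in the formula above
        return hash(key) % len(self._buckets)

    def put(self, key, value) -> None:
        bucket = self._buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # update an existing key
                return
        bucket.append((key, value))       # insert a new key

    def get(self, key, default=None):
        for existing_key, value in self._buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


students = ChainedHashTable()
students.put(17, "Asha")
students.put(24, "Ravi")   # 17 and 24 share index 3 in a table of size 7
students.put(17, "Asha K.")
print(students.get(17))    # Asha K.
print(students.get(24))    # Ravi
print(students.get(99))    # None
```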
