BCT Unit III Cryptography and Hashing New
A Message Authentication Code (MAC), also referred to as a tag, is used to authenticate the origin and integrity of a
message. MACs use symmetric-key cryptography to verify the legitimacy of data sent through a network or
transferred from one person to another.
In other words, a MAC confirms that the message came from a sender who holds the shared key and that it has not
been changed while in transit over a network or while stored in or outside a system.
MAC keys can be stored on a hardware security module (HSM), a device used to manage sensitive digital keys.
Before any MAC is computed, the sender and the receiver must share a secret symmetric key, for example by exchanging
it over a secure channel. To authenticate a message, the sender runs a MAC algorithm that takes the shared key and the
plain-text message as input. The algorithm processes the message and generates an authentication tag of fixed
length. This tag is the message's MAC.
The MAC is then appended to the message and transmitted to the receiver. The receiver computes the MAC of the
received message using the same algorithm and the same shared key. If the MAC the receiver computes equals the one
sent by the sender, the message is verified as authentic and not tampered with.
In effect, a MAC relies on a secret key known only to the sender and the recipient. Without this key, an attacker
cannot produce a valid MAC for a forged or modified message. A MAC does not hide the contents of the message; it
proves their origin and integrity. If the data is altered between the time the sender initiates the transfer and the
time the recipient receives it, the MAC the recipient computes will no longer match the one sent by the sender.
When this kind of discrepancy is detected, the data packet can be discarded, protecting the recipient's system.
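The sender/receiver exchange described above can be sketched with Python's standard-library `hmac` module. This is a minimal sketch, assuming HMAC-SHA256 as the MAC algorithm (the notes do not name a specific one) and a hypothetical pre-shared key; the receiver recomputes the tag and compares it in constant time.

```python
import hashlib
import hmac

def make_tag(key: bytes, message: bytes) -> bytes:
    """Sender: compute a fixed-length (32-byte) HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver: recompute the tag with the shared key and compare in constant time."""
    expected = make_tag(key, message)
    return hmac.compare_digest(expected, tag)

# Hypothetical key, assumed to have been shared in advance over a secure channel.
shared_key = b"example-shared-key"
message = b"Pay Bob 100"
tag = make_tag(shared_key, message)  # sent along with the message

print(verify_tag(shared_key, message, tag))         # True: message is authentic
print(verify_tag(shared_key, b"Pay Bob 900", tag))  # False: message was altered
print(verify_tag(b"wrong-key", message, tag))       # False: tag made with a different key
```

Note that the message itself travels in the clear; only its authenticity is checked.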
MD5 is based on the Merkle-Damgård construction. Find below an illustration of this construction.
What is the MD5 Algorithm?
MD5 is a cryptographic hash function that takes a message of any length as input and converts it into a
fixed-length digest of 16 bytes. MD5 stands for Message-Digest algorithm 5. It was developed in 1991 by
Ronald Rivest as an improvement on MD4, intended to be more secure than its predecessor. Regardless of the input
length, the output of MD5 (digest size) is always 128 bits.
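The fixed 128-bit output can be checked with Python's `hashlib.md5`. This short sketch hashes inputs of very different lengths and shows that the digest length never changes; the `md5_hex` helper is just a name chosen for the illustration.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the 128-bit MD5 digest of data as 32 hex characters."""
    return hashlib.md5(data).hexdigest()

for data in [b"", b"abc", b"a much longer input message " * 100]:
    digest = md5_hex(data)
    # Input length varies, but the digest is always 32 hex chars = 128 bits.
    print(len(data), len(digest) * 4, digest)
```

For example, `md5_hex(b"abc")` gives `900150983cd24fb0d6963f7d28e17f72`, one of the test values published with the MD5 specification (RFC 1321).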
Use Of MD5 Algorithm:
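It is used for file integrity checks: comparing a file's MD5 checksum with a published value shows whether the file
was corrupted, for example during download.
In web applications, it has been used for security purposes, e.g. storing users' passwords.
Using this algorithm, a password can be stored as a 128-bit digest instead of in plain text.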
MD5 Algorithm
Is MD5 Secure?
MD5 is no longer suitable for cryptographic applications, because practical collision attacks against it are known.
As stated in RFC 6151 “MD5 is no longer acceptable where collision resistance is required such as digital
signatures.”
Secure Hash Algorithms, also known as SHA, are a family of cryptographic hash functions designed to keep data
secure. They work by transforming the data using a hash function: an algorithm that consists of bitwise
operations, modular additions, and compression functions.
The hash function then produces a fixed-size string that looks nothing like the original.
These algorithms are designed to be one-way functions: once data has been transformed into its hash value, it is
computationally infeasible to transform the hash value back into the original data.
A few algorithms of interest are SHA-1, SHA-2, and SHA-3, each of which was designed to be stronger than its
predecessor in response to attacks.
SHA-0, for instance, is now obsolete because of its widely known vulnerabilities.
A common application of SHA is hashing passwords, as the server side only needs to store a specific user's hash
value, rather than the actual password.
This is helpful if an attacker breaches the database, as they will find only the hash values and not the actual
passwords. If they were to enter a stolen hash value as a password, the server would hash it again, producing a
different string, and deny access.
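The login check described above can be sketched as follows. This is a minimal illustration with a hypothetical in-memory `user_db`; it uses plain SHA-256 only to mirror the text, whereas real password storage normally adds a per-user salt and a deliberately slow function such as `hashlib.pbkdf2_hmac`.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Hash a password with SHA-256 (illustration only, unsalted)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical user store: the server keeps only the hash, never the password.
user_db = {"alice": hash_password("correct horse battery")}

def login(username: str, attempt: str) -> bool:
    stored = user_db.get(username)
    if stored is None:
        return False
    # Hash the attempt and compare it with the stored hash in constant time.
    return hmac.compare_digest(stored, hash_password(attempt))

print(login("alice", "correct horse battery"))  # True
# Submitting the stolen hash as the password fails, because the server hashes it again.
print(login("alice", user_db["alice"]))         # False
```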
Additionally, SHAs exhibit the avalanche effect: changing even a single character of the input causes a large
change in the output, so similar inputs do not produce similar hash values.
Because of this effect, a hash value gives no useful information about the input string, such as its original length.
SHAs are also used to detect tampering by attackers: if a text file is changed only slightly, in a way that is
barely noticeable, the modified file's hash value will differ from the original file's hash value, so the tampering
is easily detected.
A small tweak in the original data produces a drastically different hash output. This is called the avalanche effect [1].
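Both the avalanche effect and tamper detection can be observed directly with `hashlib.sha256`. This sketch changes one letter of a sample sentence and counts how many of the 256 output bits flip; the helper names are illustrative.

```python
import hashlib

def sha256_bits(data: bytes) -> int:
    """Return the SHA-256 digest of data as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between the two hashes."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")

original = b"The quick brown fox jumps over the lazy dog"
tampered = b"The quick brown fox jumps over the lazy cog"  # one letter changed

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(tampered).hexdigest())
# For a one-character change, typically around half of the 256 bits flip.
print(differing_bits(original, tampered), "of 256 bits differ")
```

Comparing the stored hash of a file with a freshly computed one is exactly the tamper check described above: any mismatch means the content changed.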
SHA Characteristics
Cryptographic hash functions keep data secure by providing three fundamental safety characteristics: pre-image
resistance, second pre-image resistance, and collision resistance.
The cornerstone of cryptographic security is pre-image resistance, which makes it hard and time-consuming for an
attacker to find an original message, $m$, given its hash value, $h(m)$.
This security is provided by the nature of one-way functions, which are a key component of SHA. Pre-image
resistance is necessary to ward off brute-force attacks from powerful machines.
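To see why pre-image resistance depends on the length of the hash output, the sketch below brute-forces a deliberately weakened hash. The `truncated_hash` helper is a hypothetical toy that keeps only the first 16 bits of SHA-256, so a pre-image turns up after roughly 2^16 guesses; each extra output bit doubles the expected work, which is why the full 256-bit output is out of reach.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the leading `bits` bits of SHA-256 (a deliberately weakened toy hash)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def brute_force_preimage(target: int, bits: int) -> tuple[bytes, int]:
    """Try candidate messages b"1", b"2", ... until one hashes to target."""
    for attempts in count(1):
        candidate = str(attempts).encode()
        if truncated_hash(candidate, bits) == target:
            return candidate, attempts

BITS = 16
target = truncated_hash(b"secret message", BITS)
found, attempts = brute_force_preimage(target, BITS)
# `found` is usually not the original message, but any input with the target
# hash counts as a pre-image. Expect on the order of 2**16 attempts.
print(found, attempts)
```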
One-way Function
Alice and Bob are pen pals who share their thoughts via mail. When Alice visited Bob, she gave him a phone book
of her city.
In order to keep their messages safe from intruders, Alice tells Bob that she will encrypt the message. She tells Bob
that he will find a bunch of numbers on every letter, and each sequence of numbers represents a phone number.
Bob's job is to find the phone number in the book and write down the first letter of the person's last name. Using
this procedure, Bob can decrypt the entire message.
To decrypt the message, Bob has to search the entire phone book for each number on the letter, whereas Alice can
quickly find the letters and their respective phone numbers in order to encrypt her message.
For this reason, Alice can re-encode the message long before Bob is able to decode it by hand, which keeps the data
secure. This asymmetry, easy in one direction and hard in the other, makes Alice's algorithm a one-way function [2].
The second safety characteristic is called second pre-image resistance: given a known message, $m_1$, it is hard to
find a different message, $m_2 \neq m_1$, that hashes to the same value, $h(m_1) = h(m_2)$.
Without this characteristic, an attacker could find a different password that yields the same hash value as the
real one, making the original password unnecessary in order to access secured data.
The last safety characteristic is collision resistance, which is provided by algorithms that make it extremely hard
for an attacker to find any two different messages, $m_1 \neq m_2$, that hash to the same value: $h(m_1) = h(m_2)$.
Because a hash function maps an unlimited number of possible inputs to a fixed number of possible outputs, the
pigeonhole principle guarantees that collisions exist. Collision resistance therefore does not mean that collisions
are absent, only that they are computationally infeasible to find.
Collision resistance is necessary for digital signatures. A signature is usually computed over a document's hash
value, so an attacker who finds two documents with the same hash value can get one of them signed and then present
the signature as valid for the other, making it appear that the signer approved a document they never saw.
Recent cryptographic hash functions have stronger security characteristics to block recently developed techniques
such as length extension attacks: given a hash value, $hash(m)$, and the length of the original message, $m$, an
attacker can choose a new message, $m'$, and calculate $hash(m \| m')$ (strictly, the hash of $m$ followed by its
padding and then $m'$) without knowing $m$ itself.
As a general guideline, a hash function should be as seemingly random as possible while still
being deterministic and fast to compute.
SHA-1
Secure Hash Algorithm 1, or SHA-1, was designed by the U.S. National Security Agency and published in 1995 by the
U.S. government's standards agency, the National Institute of Standards and Technology (NIST), as a revision of the
1993 SHA-0. It has been widely used in security applications and protocols, including TLS, SSL, PGP, SSH, IPsec,
and S/MIME.
SHA-1 takes as input a message given as a bit string of length less than $2^{64}$ bits and produces a 160-bit hash
value known as a message digest. Note that the message below is represented in hexadecimal notation for
compactness.
There are two methods of computing a SHA-1 hash. One of them saves the storage of sixty-four 32-bit words, but it is
more complex and can take longer to execute, so the simpler method is shown in the example below. The algorithm
processes the message in 512-bit blocks of sixteen 32-bit words and, at the end of the execution, outputs five 32-bit
words, for a total of 160 bits.
Pseudocode
Suppose the message 'abc' is to be hashed using SHA-1. In hexadecimal, the message 'abc' is
61 62 63 (in binary, 01100001 01100010 01100011).
1) The first step is to initialize five 32-bit words that serve as the starting hash value. These are fixed
constants defined by the standard (shown in hex):
H0 = 67452301
H1 = EFCDAB89
H2 = 98BADCFE
H3 = 10325476
H4 = C3D2E1F0.
2) The message is then padded by appending a 1 bit, followed by enough 0 bits that the length is 448 bits (more
generally, 448 modulo 512). The length of the original message, represented as a 64-bit integer, is then appended,
producing a message that is 512 bits long.
Padding of the string "abc" in bits, finalized by the length of the string, which is 24 bits: in hex, the padded
block is 61626380, followed by 52 zero bytes, followed by the 64-bit length 0000000000000018.
3) The padded input obtained above, M, is then divided into 512-bit chunks, and each chunk is further divided into
sixteen 32-bit words, W0…W15. In the case of 'abc', there is only one chunk, as the padded message is exactly
512 bits.
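Steps 2 and 3 can be sketched in Python as below. This assumes the input is a whole number of bytes (so the appended 1 bit plus seven 0 bits form the byte 0x80); the function names are illustrative.

```python
def sha1_pad(message: bytes) -> bytes:
    """Step 2: append a 1 bit, 0 bits up to 448 mod 512, then the 64-bit length."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                           # 1 bit followed by seven 0 bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)   # zero bytes until length is 56 mod 64 bytes
    padded += bit_length.to_bytes(8, "big")              # original length in bits, 64-bit big-endian
    return padded

def split_into_words(padded: bytes) -> list[list[int]]:
    """Step 3: split into 512-bit chunks, each as sixteen 32-bit words W0..W15."""
    chunks = []
    for start in range(0, len(padded), 64):
        chunk = padded[start:start + 64]
        chunks.append([int.from_bytes(chunk[i:i + 4], "big") for i in range(0, 64, 4)])
    return chunks

padded = sha1_pad(b"abc")
words = split_into_words(padded)
print(len(padded) * 8)       # 512: a single chunk
print(f"{words[0][0]:08x}")  # 61626380: 'a' 'b' 'c' then the 0x80 padding byte
print(words[0][15])          # 24: the message length in bits
```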
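4) For each chunk, Mn, the 80 iterations, i, necessary for hashing are performed (80 is the number fixed for SHA-1).
First, the sixteen words W(0)…W(15) of the chunk are extended to eighty words: for 16 ≤ i ≤ 79,
$W(i) = S^1(W(i-3) \oplus W(i-8) \oplus W(i-14) \oplus W(i-16))$,
where $S^1$ is the circular shift defined below and ⊕ is the bitwise XOR operation, with the truth table:
X  Y  X⊕Y
0  0  0
1  0  1
0  1  1
1  1  0
For example, when i is 16, the words chosen are W(13), W(8), W(2), W(0), and the output is a new word, W(16), so
performing the XOR, or ⊕, operation on those words and rotating the result gives:
$W(16) = S^1(W(13) \oplus W(8) \oplus W(2) \oplus W(0))$.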
The circular shift operation $S^n(X)$ on a 32-bit word X by n bits, n being an integer with 0 ≤ n < 32, is defined by
$S^n(X) = (X \ll n) \lor (X \gg (32 - n))$,
where X << n is the left-shift operation, obtained by discarding the leftmost n bits of X and padding the result
with n zeroes on the right, and
X >> (32 − n) is the right-shift operation, obtained by discarding the rightmost 32 − n bits of X and padding the
result with 32 − n zeroes on the left.
Thus $S^n(X)$ is equivalent to a circular shift of X by n positions; in this case, the circular left shift is used. [3]
So, a circular left shift by one position, $S^1$, applied to the 5-bit string 10010 would produce 00101, as the
leftmost bit 1 wraps around to the right end of the string. For 'abc', W(13), W(8), and W(2) are all zero and
W(0) = 61626380, so W(16) ends up being C2C4C700 (in hex).
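The circular shift and the word expansion of step 4 can be sketched as follows, applied to the words of the padded 'abc' block. The function names `rotl` and `expand_schedule` are illustrative.

```python
def rotl(x: int, n: int) -> int:
    """Circular left shift S^n(X) of a 32-bit word."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def expand_schedule(w16: list[int]) -> list[int]:
    """Extend W0..W15 to W0..W79 with W(i) = S^1(W(i-3) ^ W(i-8) ^ W(i-14) ^ W(i-16))."""
    w = list(w16)
    for i in range(16, 80):
        w.append(rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
    return w

# Words W0..W15 of the padded 'abc' block: 61626380, fourteen zero words, then the length 24.
abc_words = [0x61626380] + [0] * 14 + [0x00000018]
schedule = expand_schedule(abc_words)
print(f"{schedule[16]:08x}")  # c2c4c700
```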
5) Now, store the hash values defined in step 1 in the following working variables:
A = H0
B = H1
C = H2
D = H3
E = H4.
6) Then, for i = 0 to 79, compute
TEMP = $S^5(A) + f(i; B, C, D) + E + W(i) + K(i)$,
where + denotes addition modulo $2^{32}$. See below for details on the logical function, f, and on the values of
K(i). Reassign the following variables:
E = D
D = C
C = $S^{30}(B)$
B = A
A = TEMP.
7) Add the result of the chunk's hash to the overall hash value of all chunks (again modulo $2^{32}$), as shown
below, and proceed to execute the next chunk:
H0 = H0 + A
H1 = H1 + B
H2 = H2 + C
H3 = H3 + D
H4 = H4 + E.
After the final chunk, the digest is the concatenation H0 H1 H2 H3 H4, so the string 'abc' is represented by the
hash value a9993e364706816aba3e25717850c26c9cd0d89d.
If the string is changed to 'abcd', for instance, the hash value is drastically different, so attackers cannot tell
that it is similar to the original message.
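A sequence of logical functions is used in SHA-1, depending on the value of i, where 0 ≤ i ≤ 79, and on three 32-bit
words B, C, and D, in order to produce a 32-bit output. The following equations describe the logical functions,
where ¬ is the logical NOT, ∨ is the logical OR, ∧ is the logical AND, and ⊕ is the logical XOR:
$$
f(i; B, C, D) =
\begin{cases}
(B \land C) \lor (\lnot B \land D) & 0 \le i \le 19 \\
B \oplus C \oplus D & 20 \le i \le 39 \\
(B \land C) \lor (B \land D) \lor (C \land D) & 40 \le i \le 59 \\
B \oplus C \oplus D & 60 \le i \le 79
\end{cases}
$$
Additionally, a sequence of constant words, shown in hex below, is used in the formulas:
$$
K(i) =
\begin{cases}
\texttt{5A827999} & 0 \le i \le 19 \\
\texttt{6ED9EBA1} & 20 \le i \le 39 \\
\texttt{8F1BBCDC} & 40 \le i \le 59 \\
\texttt{CA62C1D6} & 60 \le i \le 79
\end{cases}
$$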
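Putting steps 1 to 7 together with f(i; B, C, D) and K(i), a compact educational SHA-1 can be written in Python. This is a minimal sketch for study, assuming whole-byte input, and checked against the standard library's `hashlib.sha1`; it is not meant for production use.

```python
import hashlib
import struct

def _rotl(x: int, n: int) -> int:
    """Circular left shift S^n(X) on a 32-bit word."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1(message: bytes) -> str:
    """Educational SHA-1 following steps 1-7 above."""
    # Step 1: fixed initial hash values from the standard.
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    # Step 2: pad with a 1 bit, zeros up to 448 mod 512, then the 64-bit length.
    bit_len = len(message) * 8
    message += b"\x80"
    message += b"\x00" * ((56 - len(message) % 64) % 64)
    message += struct.pack(">Q", bit_len)

    # Step 3: process each 512-bit chunk as sixteen 32-bit big-endian words.
    for start in range(0, len(message), 64):
        w = list(struct.unpack(">16I", message[start:start + 64]))
        # Step 4: extend to 80 words.
        for i in range(16, 80):
            w.append(_rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))

        # Step 5: initialise the working variables.
        a, b, c, d, e = h

        # Step 6: 80 rounds using f(i; B, C, D) and K(i).
        for i in range(80):
            if i < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif i < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif i < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (_rotl(a, 5) + f + e + w[i] + k) & 0xFFFFFFFF
            e, d, c, b, a = d, c, _rotl(b, 30), a, temp

        # Step 7: add this chunk's result into the running hash (mod 2**32).
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]

    # Digest = H0 || H1 || H2 || H3 || H4 in hex.
    return "".join(f"{x:08x}" for x in h)

print(sha1(b"abc"))                                      # a9993e364706816aba3e25717850c26c9cd0d89d
print(sha1(b"abc") == hashlib.sha1(b"abc").hexdigest())  # True
```

In Python, `~b & d` is safe here because `d` is a non-negative 32-bit value, so the result keeps only the low 32 bits of NOT B.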
SHA-1 and SHA-2 differ in several ways; mainly, SHA-2 produces 224-, 256-, 384-, or 512-bit digests, whereas SHA-1
produces a 160-bit digest. SHA-2 variants also use either 512-bit blocks, like SHA-1 (SHA-224 and SHA-256), or
1024-bit blocks (SHA-384 and SHA-512).
Brute-force attacks on SHA-2 are not as effective as they are against SHA-1. A brute-force search for a message that
corresponds to a given digest of length L would require about $2^L$ evaluations, which makes SHA-2, with its longer
digests, a lot safer against these kinds of attacks.
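The digest and block sizes mentioned above can be read from Python's `hashlib`, whose hash objects expose `digest_size` and `block_size` in bytes. This sketch prints them in bits alongside the rough brute-force cost of $2^L$.

```python
import hashlib

def sizes_in_bits(name: str) -> tuple[int, int]:
    """Return (digest size, block size) in bits for a hashlib algorithm name."""
    h = hashlib.new(name)
    return h.digest_size * 8, h.block_size * 8

for name in ["sha1", "sha224", "sha256", "sha384", "sha512"]:
    digest_bits, block_bits = sizes_in_bits(name)
    # A brute-force preimage search needs on the order of 2**digest_bits evaluations.
    print(f"{name}: digest {digest_bits} bits, block {block_bits} bits, "
          f"brute force ~2^{digest_bits} evaluations")
```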
One of the most common attacks is the preimage attack, which tries to find an input that produces a given hash value.
For passwords, attackers often speed this up by using pre-computed tables of hash values in a brute-force manner.
The defense against these kinds of attacks is to design a hash function that would take an attacker a very high
amount of resources, such as millions of dollars or decades of work, to find a message corresponding to a given
hash value.
Most attacks against SHA-1 are collision attacks, which find two different messages, often meaningless ones, that
produce the same hash value.
Generally, a generic collision search takes time proportional to $2^{n/2}$, where n is the length of the hash digest
in bits. This is the reason message digests have increased in length from 160 bits in SHA-1 to 224 bits or more in
SHA-2.
Other attacks exist that attempt to exploit mathematical properties in order to break hash functions. Among these
is the birthday attack, based on the birthday paradox: when inputs are hashed at random, a collision becomes likely
after only about $2^{n/2}$ attempts, far fewer than the $2^n$ attempts needed to match one specific hash value.
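The birthday bound can be seen on a deliberately weakened hash. In this sketch, the hypothetical `truncated_hash` keeps only 32 bits of SHA-256, and distinct messages are hashed until two share a value; a collision typically appears after on the order of 2^16 attempts rather than 2^32.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the leading `bits` bits of SHA-256 (a deliberately weakened toy hash)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") >> (256 - bits)

def birthday_collision(bits: int) -> tuple[bytes, bytes, int]:
    """Hash distinct messages until two share a truncated hash value."""
    seen: dict[int, bytes] = {}
    for i in count():
        msg = f"message-{i}".encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg
    # By the pigeonhole principle the loop must stop within 2**bits + 1 messages.

BITS = 32
m1, m2, tries = birthday_collision(BITS)
# Typically on the order of 2**16 tries, far fewer than the ~2**32
# needed to find a preimage of one specific 32-bit value.
print(m1, m2, tries)
```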
A Distributed Hash Table (DHT) is a decentralized data store that looks up data based on key-value pairs. Every
node in a distributed hash table is responsible for a set of keys and their associated values. The key is a unique
identifier for its associated data value, created by running the value through a hashing function. The data
values can be any form of data.
Distributed hash tables are decentralized, so all nodes form the collective system without any centralized
coordination. They are generally fault-tolerant because data is replicated across multiple nodes. Distributed
hash tables can scale for large volumes of data across many nodes.
Why Is a Distributed Hash Table Used?
Distributed hash tables provide an easy way to find information in a large collection of data because all keys
are in a consistent format, and the entire set of keys can be partitioned in a way that allows fast identification
of where each key/value pair resides.
The nodes participating in a distributed hash table act as peers to find specific data values, as each node stores
the key partitioning scheme so that if it receives a request to access a given key, it can quickly map the key to
the node that stores the data. It then sends the request to that node.
Also, nodes in a distributed hash table can be easily added or removed without forcing a significant amount of
re-balancing of the data in the cluster.
Cluster rebalancing, especially for large data sets, can often be a time-consuming task that also impacts
performance.
Having a quick and easy means of growing or shrinking a cluster ensures that changes in data size do not
disrupt the operation of the applications that access data in the distributed hash table.
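The notes do not specify how keys are partitioned across nodes. One widely used scheme is consistent hashing (used, for example, by the Chord DHT), where nodes and keys are placed on a ring by hashing them and each key belongs to the first node clockwise from it. The sketch below assumes SHA-1 ring positions and hypothetical node names; it shows that adding a node moves only the keys that the new node takes over.

```python
import bisect
import hashlib

def ring_position(name: str) -> int:
    """Place a node name or key on a 160-bit ring using SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

class HashRing:
    """Minimal consistent-hashing ring: each key belongs to the first node clockwise from it."""

    def __init__(self, nodes: list[str]) -> None:
        self._ring = sorted((ring_position(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        bisect.insort(self._ring, (ring_position(node), node))

    def remove_node(self, node: str) -> None:
        self._ring.remove((ring_position(node), node))

    def node_for(self, key: str) -> str:
        positions = [pos for pos, _ in self._ring]
        # Wrap around to the first node if the key lies past the last node.
        idx = bisect.bisect_right(positions, ring_position(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node-A", "node-B", "node-C"])  # hypothetical node names
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-D")
moved = [k for k in keys if ring.node_for(k) != before[k]]
# Only keys now owned by node-D change owner; every other key stays put.
print(len(moved), "of", len(keys), "keys moved")
```

Removing `node-D` again hands its keys back to their previous owners, which is the "add or remove nodes without significant re-balancing" property described above.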
Hashing in data structures is a technique of mapping keys from a large set of data into a small table using a hash
function, also known as a message-digest function. It is a technique for identifying a specific item in a collection
of similar items.
It uses hash tables to store the data in an array format, where each value in the array is assigned an index number.
Hash tables use a technique to generate these index numbers for each value stored in the array.
This technique is called the hash technique.
You only need to compute the index of the desired item rather than search through the data. With indexing, you can
go directly to the item you want instead of scanning the entire list. Indexing also helps with insert operations when
you need to insert data at a specific location. No matter how big or small the table is, you can update and retrieve
data in constant time on average.
The hash table is basically an array of elements, and hash-based search is performed on a part of the item, i.e. the
key. Each key is mapped to a number in the range 0 to (table size − 1).
1. The hash function converts the item into a small integer or hash value. This integer is used as an index to
store the original data.
2. It stores the data in a hash table. You can use a hash key to locate data quickly.
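The two steps above can be sketched as follows. The polynomial string hash (base 31), the table size of 10, and the sample record are illustrative assumptions, not a method prescribed by the notes; collisions (two keys mapping to the same slot) are ignored here.

```python
def hash_index(key: str, table_size: int) -> int:
    """Map a string key to an index in 0..table_size-1 using a simple polynomial hash."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % table_size  # keep the running value within the table range
    return h

TABLE_SIZE = 10
table: list = [None] * TABLE_SIZE

# Step 1: convert the key into a small integer; step 2: store the data at that index.
index = hash_index("roll-42", TABLE_SIZE)
table[index] = ("roll-42", "Asha")  # hypothetical student record

# Lookup recomputes the same index, so no scan of the array is needed.
print(index, table[hash_index("roll-42", TABLE_SIZE)])
```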
Need for Hash data structure
Every day, the data on the internet is increasing multifold and it is always a struggle to store this data efficiently.
In day-to-day programming, this amount of data might not be that big, but still, it needs to be stored, accessed,
and processed easily and efficiently.
A very common data structure that is used for such a purpose is the Array data structure.
Now the question arises: if the array was already there, what was the need for a new data structure?
The answer to this is in the word “efficiency”.
Though storing in an array takes O(1) time, searching in it takes O(n) time with a linear scan, or O(log n) time
with binary search if the array is kept sorted.
This time appears to be small, but for a large data set, it can cause a lot of problems and this, in turn, makes the
Array data structure inefficient.
So now we are looking for a data structure that can store data and search it in constant time, i.e. in O(1)
time. This is how the hash data structure came into play. With the hash data structure, it is possible to store
data and retrieve it in constant time on average.
Components of Hashing
There are three main components of hashing:
1. Key: A key can be anything, such as a string or an integer, that is fed as input to the hash function, the
technique that determines an index or location for storing an item in a data structure.
2. Hash Function: The hash function receives the input key and returns the index of an element in an array
called a hash table. The index is known as the hash index.
3. Hash Table: Hash table is a data structure that maps keys to values using a special function called a hash
function. Hash stores the data in an associative manner in an array where each data value has its own unique
index.
The above technique enables us to calculate the location of a given string by using a simple hash function and
rapidly find the value that is stored in that location. Therefore, hashing is a good way to store (key, value)
pairs.
In schools, the teacher assigns a unique roll number to each student. Later, the teacher uses that roll number
to retrieve information about that student.
A library has a very large number of books. The librarian assigns a unique number to each book. This unique
number helps in identifying the position of the books on the bookshelf.
The hash function in a data structure maps data of arbitrary size to data of a fixed size. The value it returns,
usually a small integer, is called a hash value, hash code, or hash sum. In general form:
hash = hashfunc(key)
Hash functions are also used for data integrity: a single change in a message produces a different hash.
The three characteristics of a cryptographic hash function are:
1. Collision resistance (collision free)
2. Hiding
3. Puzzle friendliness
Hash Table
Hashing in data structures uses hash tables to store key-value pairs. The hash table uses the hash function to
generate an index, and hashing uses this index to perform insert, update, and search operations.
A hash table can be viewed as an array of buckets in which the data are stored. Each item has its own index value,
and when the index values are known, accessing the data is quicker.
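The insert, update, and search operations can be sketched as a small hash table. The notes do not cover what happens when two keys land in the same bucket; this sketch assumes separate chaining (each bucket holds a list of pairs) and uses Python's built-in `hash()` as the hash function. The roll numbers and names are hypothetical.

```python
class HashTable:
    """Hash table with separate chaining: each bucket holds a list of (key, value) pairs."""

    def __init__(self, size: int = 8) -> None:
        self.buckets: list[list[tuple]] = [[] for _ in range(size)]

    def _index(self, key) -> int:
        # Reduce Python's built-in hash() to a valid index 0..size-1.
        return hash(key) % len(self.buckets)

    def put(self, key, value) -> None:
        """Insert a new pair, or update the value if the key already exists."""
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        """Search only the one bucket the key hashes to."""
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default

table = HashTable()
table.put(101, "Ravi")     # hypothetical roll numbers and names
table.put(102, "Meera")
table.put(101, "Ravi K.")  # update an existing key
print(table.get(101), table.get(102), table.get(999))  # Ravi K. Meera None
```

As long as the buckets stay short, each operation touches only one small list, which is where the average O(1) behaviour comes from.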