Hash Function

Uploaded by

Harib Khan
Copyright
© © All Rights Reserved

Hash Function

An algorithm that converts a message into a hash value

What is a Hash Function?

A hash function is a mathematical function or algorithm that simply takes a variable
number of characters (called a “message”) and converts it into a string with a fixed number
of characters (called a hash value or simply, a hash).

• A hash function is a mathematical function that converts any digital data into an output
string with a fixed number of characters. Hashing is the one-way act of converting the
data (called a message) into the output (called the hash).
• Hashing is useful to ensure the authenticity of a piece of data and that it has not been
tampered with since even a small change in the message will create an entirely different
hash.
• Hash functions are the basic tools of modern cryptography that are used in information
security to authenticate transactions, messages, and digital signatures.

The act of hashing is, therefore, running an input through a formula that converts it into an
output of fixed length. No matter how many characters long the input is, a given hash
function always produces the same number of hexadecimal characters (the digits 0-9 and
the letters a-f).

Hashing is useful to ensure the authenticity of a piece of data, as any small change to the
message will result in a completely different hash value.
Hash functions are the basic tools of modern cryptography that are used in information
security to authenticate transactions, messages, and digital signatures.

Hashing is generally a one-way function, which means that it is easy to convert a message
into a hash but very difficult to “reverse hash” a hash value back to its original message
as it requires a massive amount of computing power.

This difficulty is what cryptocurrencies like Bitcoin, which use proof-of-work systems,
depend on to ensure the integrity of their blockchains.

Why Do We Need Hash Functions?


Standard Length

When you hash a message, the hash function takes your file or message of any size, runs it
through a mathematical algorithm, and produces an output of a fixed length.

Table 1: Different Hash Functions

In Table 1 above, I have converted the same input message (the letters CFI) into hash
values using three different hash functions (MD5, SHA-1, and SHA-256). Each of those
hash functions produces an output hash with a fixed number of hexadecimal characters:
32 characters for MD5, 40 characters for SHA-1, and 64 characters for SHA-256.
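
The fixed lengths described above can be checked with a minimal sketch using Python's standard `hashlib` module. The helper name `hex_lengths` is made up for this illustration, and the input is assumed to be the ASCII bytes of "CFI".

```python
import hashlib

message = b"CFI"  # the example input from the text, encoded as bytes


def hex_lengths(data: bytes) -> dict:
    """Map each algorithm name to the length of its hexadecimal digest."""
    return {
        name: len(hashlib.new(name, data).hexdigest())
        for name in ("md5", "sha1", "sha256")
    }


lengths = hex_lengths(message)
print(lengths)  # {'md5': 32, 'sha1': 40, 'sha256': 64}
```

Each hexadecimal character encodes 4 bits, so these lengths correspond to 128-bit, 160-bit, and 256-bit outputs.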

Table 2: Different Inputs Using the Same Hash Function (SHA-1)


It doesn’t matter what we put in as an input: the same hash function will always produce
a hash value that has the same number of characters. In Table 2 above, we change the
message each time, but because we use the same hash function (SHA-1 in this case), the
output is always 40 hexadecimal characters long.
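
As a quick check of this property, the sketch below hashes a few arbitrary inputs of very different sizes with SHA-1. The sample inputs are chosen only for illustration and are not the ones from Table 2.

```python
import hashlib

# Arbitrary sample inputs: empty, one byte, three bytes, one million bytes.
inputs = [b"", b"a", b"CFI", b"x" * 1_000_000]

sha1_lengths = [len(hashlib.sha1(m).hexdigest()) for m in inputs]
print(sha1_lengths)  # [40, 40, 40, 40]
```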

Ensure data integrity

Let’s think of an example where you want to send a digital message or document to
someone, and you want to make sure that it hasn’t been tampered with along the way.
You could send it multiple times and have the recipient verify each copy is the same, but
that would not be feasible if the file or message was very large.

It would be much easier if there was a way of having a shorter and set number of
characters for the sender and receiver to check. And that’s essentially what a hash function
allows two computers to do.

Rather than compare the data in its original (and larger) form, by comparing the two
hashes of the data, computers can quickly confirm that the data has not been tampered
with and changed.

Hash functions, therefore, serve as a checksum, or a way for someone to identify whether
digital data has been tampered with after it’s been created.
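
A minimal sketch of this kind of integrity check, assuming the sender publishes the SHA-256 hash of a file and the receiver recomputes it locally. The function names `sha256_of_file` and `is_unmodified` are hypothetical helpers, and the file is read in chunks so that large files do not need to fit in memory.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file piece by piece and return the hexadecimal digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path: str, expected_hex: str) -> bool:
    """Compare the recomputed hash with the hash supplied by the sender."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex)
```

The receiver only has to compare two 64-character strings, no matter how large the file is.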

Verify authenticity

For example, if you send out an email, it can be intercepted easily (especially if it is sent
over an unsecured WiFi network). If someone alters the contents of the email along the
way, which is called a “Man-in-the-Middle” (MitM) attack, the recipient has no way of
knowing.

However, if the sender hashes the email contents and signs that hash with their private
key, producing a digital signature, the receiver can check that the email contents have not
been modified after being digitally signed.

To do this, the receiver “re-generates” the hash of the received email using the same hash
function as the sender, and then uses the signer’s public key to check that the signature
matches that hash.

If it matches, no one has altered the message. If it does not match, the receiver knows
that the contents of the email are not authentic, because even a small change to the
message produces a completely different hash.
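
A true digital signature needs a public-key library, which Python's standard library does not include. As a stand-in, the sketch below swaps in a different technique, HMAC, where sender and receiver share a secret key instead of using a private/public key pair. It illustrates the same "recompute and compare" step, but it is not how PGP or S/MIME sign email. The key and message values are made up.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> str:
    """Sender side: compute a keyed hash (HMAC-SHA256) of the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, message: bytes, tag: str) -> bool:
    """Receiver side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


shared_key = b"hypothetical shared secret"
email_body = b"Please pay invoice 1001."
tag = make_tag(shared_key, email_body)

print(verify_tag(shared_key, email_body, tag))                   # True
print(verify_tag(shared_key, b"Please pay invoice 9999.", tag))  # False
```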

Tools for email signing and hashing:

• PGP (Pretty Good Privacy): PGP allows you to sign and encrypt emails. There are
various software tools and plugins for PGP that can be integrated into email clients like
Thunderbird or Gmail.
• S/MIME (Secure/Multipurpose Internet Mail Extensions): Another widely used
protocol that provides encryption and digital signatures for emails. Most email clients like
Outlook support S/MIME.

Example of using PGP for sending a signed email:

1. Install a PGP client (e.g., GPG Suite for Mac or Gpg4win for Windows).
2. Generate your key pair (public and private keys).
3. Sign your email using your private key before sending it.
4. The recipient uses your public key to verify the signature.

How Does a Hash Function Work?

The details depend on the algorithm, but generally, to produce a hash value of a set
length, a hash function first divides the input data into fixed-size blocks, which are
called data blocks.

This is because a hash function takes in data at a fixed length. The size of the data block
is different from one algorithm to another.
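
The block size of several common algorithms can be read from `hashlib`, which exposes it in bytes through the `block_size` attribute. The sketch below converts it to bits.

```python
import hashlib

# block_size is reported in bytes, so multiply by 8 to get bits.
block_sizes = {
    name: hashlib.new(name).block_size * 8
    for name in ("md5", "sha1", "sha256", "sha512")
}
print(block_sizes)  # {'md5': 512, 'sha1': 512, 'sha256': 512, 'sha512': 1024}
```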

If the final block is not full, the algorithm adds padding to fill it out. However, regardless
of what method of hashing you use, the output, or hash value, is always the same fixed
length.

The core step of the hash function is then repeated once for each data block.

The “Avalanche Effect”

The data blocks are processed one at a time. The output of the first data block is fed as
input along with the second data block. The output of the second is then fed along with
the third block, and so on.

The final output is therefore the combined value of all the blocks. If you change one bit
anywhere in the message, the hash value changes completely (on average, about half of
its bits flip). This is called “the avalanche effect.”
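
A small sketch of the avalanche effect, assuming SHA-256 and two inputs that differ in exactly one bit ("o" is 01101111 and "n" is 01101110). The helper name `differing_bits` is made up for this example.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between two SHA-256 hashes."""
    ha = int.from_bytes(hashlib.sha256(a).digest(), "big")
    hb = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(ha ^ hb).count("1")


# "hello" and "helln" differ only in the lowest bit of the last character.
changed = differing_bits(b"hello", b"helln")
print(f"{changed} of 256 output bits changed")
```

For a well-behaved hash function the count should land somewhere near 128, which is half of the output bits.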

Uniqueness and Determinism

Hash functions must be deterministic, meaning that every time you put in the same
input, it will always create the same output.

The output, or hash value, should also be effectively unique to the exact input. Because
inputs can be any length while the output has a fixed length, two different messages with
the same hash must exist in principle, but it should be computationally infeasible to find
them. If two different pieces of data are found to produce the same output, it is known
as a “hash collision,” and an algorithm for which collisions can be found in practice is
considered broken.
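
To make the idea of a collision concrete, the sketch below deliberately weakens SHA-256 by keeping only the first 4 hexadecimal characters (16 bits) of its output. With so few possible outputs, a collision is found almost immediately. The truncation and the helper names are assumptions for illustration only. No collision is known for the full 256-bit output.

```python
import hashlib


def truncated_hash(data: bytes, hex_chars: int = 4) -> str:
    """A deliberately weak hash: only the first few hex characters of SHA-256."""
    return hashlib.sha256(data).hexdigest()[:hex_chars]


def find_collision(hex_chars: int = 4):
    """Hash the strings "0", "1", "2", ... until two share a truncated hash."""
    seen = {}
    counter = 0
    while True:
        msg = str(counter).encode()
        h = truncated_hash(msg, hex_chars)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg
        counter += 1


first, second, shared = find_collision()
print(first, second, shared)
```

Every extra output bit doubles the number of possible hash values, which is why the search stops being practical long before 256 bits.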

Irreversibility

Ideally, hash functions should be irreversible. While it is quick and easy to compute the
hash if you know the input message, it is very difficult to go through the process in
reverse and compute the input message if you only know the hash value.
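
In practice, "reversing" a hash means guessing inputs until one matches. The sketch below assumes the hidden input is a 4-digit PIN, so only 10,000 guesses are needed. The function names are hypothetical. For inputs drawn from a large space, the same search becomes infeasible.

```python
import hashlib


def hash_pin(pin: str) -> str:
    return hashlib.sha256(pin.encode()).hexdigest()


def brute_force_pin(target_hex: str):
    """Try every 4-digit PIN and return the one whose hash matches, if any."""
    for n in range(10_000):
        guess = f"{n:04d}"
        if hash_pin(guess) == target_hex:
            return guess
    return None


print(brute_force_pin(hash_pin("4821")))  # 4821
```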

Hash Functions in Cryptography

The most famous cryptocurrency, Bitcoin, uses hash functions in its blockchain. Powerful
computers, called miners, race each other in brute force searches to try to solve hashes in
order to earn the mining rewards of new Bitcoins, as well as processing fees that users
pay to record their transactions on the blockchain.

Solving a hash involves computing a proof-of-work, called a NONCE, or “number used
once”, that, when added to the block, causes the block’s hash to begin with a certain
number of zeroes. Once a valid proof-of-work is discovered, the block is considered valid
and can be added to the blockchain.

Since each block’s hash is created by a cryptographic algorithm (Bitcoin uses the SHA-256
algorithm), the only way to find a valid proof-of-work is to run guesses through the
algorithm until the right number is found that creates a hash that starts with the right
number of zeroes. This is what Bitcoin miners are doing, running numbers through a
cryptographic algorithm until they guess the valid NONCE.
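
The guessing loop can be sketched as a toy proof-of-work. This is a simplified illustration rather than Bitcoin's actual block format: the block data, the way the nonce is appended, and the difficulty of three leading hexadecimal zeroes are all assumptions.

```python
import hashlib


def mine(block_data: bytes, zero_hex_digits: int = 3):
    """Try nonces 0, 1, 2, ... until the hash starts with enough zeroes."""
    prefix = "0" * zero_hex_digits
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


nonce, digest = mine(b"example block")
print(nonce, digest)
```

Finding the nonce takes thousands of hash computations at this difficulty, but anyone can verify it with a single one.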

What are Examples of Common Cryptocurrency Hash Functions?


The “SHA” in SHA-256, the function that Bitcoin uses, is short for “Secure Hash Algorithm.”
The SHA family includes SHA-1 and SHA-2 (a family within a family that includes SHA-224,
SHA-256, SHA-384, and SHA-512), both designed by the United States National Security
Agency (NSA), as well as SHA-3 (SHA3-224, SHA3-256, SHA3-384, and SHA3-512), which
was selected through a public competition run by NIST.

Other examples of common hashing algorithms include:

• Message Digest (MD) algorithms: MD2, MD4, MD5, and MD6. MD5 was long
considered a go-to hashing algorithm, but it’s now considered broken because of
hash collisions.
• Windows NTHash: also known as a Unicode hash or NTLM hash, this hash is
commonly used by Windows systems.
• RACE Integrity Primitives Evaluation Message Digest (RIPEMD)
• Whirlpool

RSA is sometimes listed alongside these, but it is a public-key cryptosystem used for
encryption and digital signatures, not a hash function.

Generally speaking, the most popular hashing algorithms or functions have a hash length
ranging from 160 to 512 bits.

Hashing vs Encryption

Encryption is the practice of taking data and creating a scrambled message in a way that
only someone with the corresponding key can unscramble and decode it. (The algorithm
that does the scrambling is called a cipher.) Encryption is a two-way function, designed to
be reversible by anyone who holds the key. So when someone encrypts something, it is
done with the intention of decrypting it later.

Hashing is using a formula that converts data of any size to a fixed length. The computing
power required to “un-hash” something makes reversal very difficult, so whereas
encryption is a two-way function, hashing is generally a one-way function.

Encryption is meant to protect data in transit, while hashing is meant to verify that a file
or piece of data hasn’t been altered, that is, that it is authentic. So you might liken
encryption to putting a piece of data in a safe that opens when the recipient knows the
combination; hashing is more like a security tamper seal that indicates if the contents of
the data have been altered.
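
The two-way versus one-way distinction can be sketched with the standard library alone. The "encryption" here is a toy XOR cipher chosen only because it is short and reversible. It is not secure and is not how real encryption works, and the key and message are made up.

```python
import hashlib
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR each byte with the repeating key. Applying it twice undoes it."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


message = b"meet at noon"
key = b"hypothetical key"

ciphertext = xor_cipher(message, key)
recovered = xor_cipher(ciphertext, key)            # two-way: the key reverses it
fingerprint = hashlib.sha256(message).hexdigest()  # one-way: no key, no reverse step

print(recovered == message)  # True
print(len(fingerprint))      # 64
```

The ciphertext is as long as the message and can be turned back into it, whereas the hash is always 64 hexadecimal characters and can only be compared against another hash.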

Step-by-Step Breakdown of SHA-256 on the Input "hello"

The input string is "hello".

Step 1: Convert the Input to Binary

First, we convert the input string into its binary representation using ASCII encoding. Each
character in "hello" is represented by an 8-bit binary value:

• h = 104 = 01101000
• e = 101 = 01100101
• l = 108 = 01101100
• l = 108 = 01101100
• o = 111 = 01101111

Concatenating these binary values gives:

01101000 01100101 01101100 01101100 01101111

This is a 40-bit binary string:

01101000 01100101 01101100 01101100 01101111 (40 bits)
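
Step 1 can be reproduced with a one-line conversion. The helper name `to_binary` is made up for this sketch.

```python
def to_binary(text: str) -> str:
    """Render each ASCII character as an 8-bit binary group."""
    return " ".join(f"{byte:08b}" for byte in text.encode("ascii"))


bits = to_binary("hello")
print(bits)  # 01101000 01100101 01101100 01101100 01101111
```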

Step 2: Padding the Message

SHA-256 operates on blocks of 512 bits, so the message is always padded until its total length
is a multiple of 512 bits. Padding is done in the following way:

1. Append a single '1' bit to the end of the message.

After appending '1', we get:

01101000 01100101 01101100 01101100 01101111 1 (41 bits)

2. Append '0' bits until the message length is 448 bits (the remaining 64 bits are reserved for
the length of the original message).

After padding with zeros:

01101000 01100101 01101100 01101100 01101111 10000000... (448 bits in total)

3. Append the original message length in binary as a 64-bit integer. Since "hello" is 40 bits
long, we append:

00000000 00000000 00000000 00000000 00000000 00000000 00000000 00101000 (64 bits)

Combining all parts gives us a 512-bit block:

01101000 01100101 01101100 01101100 01101111 10000000 00000000...00000000 00101000 (512 bits)
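
The three padding steps can be sketched at the byte level. Because the message is a whole number of bytes, the single '1' bit followed by seven '0' bits is the byte 0x80. The function name `pad_message` is hypothetical.

```python
def pad_message(message: bytes) -> bytes:
    """Pad a message so its length is a multiple of 64 bytes (512 bits)."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                      # step 1: a '1' bit, then zeros
    padded += b"\x00" * ((56 - len(padded)) % 64)   # step 2: zeros up to 448 mod 512 bits
    padded += bit_length.to_bytes(8, "big")         # step 3: 64-bit original length
    return padded


block = pad_message(b"hello")
print(len(block) * 8)   # 512
print(block[-1])        # 40
```

A message of 56 bytes or more no longer fits in one block alongside the length field, so the same function produces two blocks for it.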

Step 3: Initialize Hash Values

SHA-256 has eight hash values (H0 to H7), initialized as follows:

H0 = 6a09e667
H1 = bb67ae85
H2 = 3c6ef372
H3 = a54ff53a
H4 = 510e527f
H5 = 9b05688c
H6 = 1f83d9ab
H7 = 5be0cd19

These values are the first 32 bits of the fractional parts of the square roots of the first 8 prime
numbers.
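
The derivation of these constants can be checked with integer arithmetic. The sketch below relies on the fact that `isqrt(p << 64)` equals the square root of p scaled by 2^32 and rounded down, so its low 32 bits are the start of the fractional part. The function name is made up.

```python
from math import isqrt

FIRST_EIGHT_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]


def initial_hash_values() -> list:
    """First 32 bits of the fractional part of sqrt(p) for each prime p."""
    return [isqrt(p << 64) & 0xFFFFFFFF for p in FIRST_EIGHT_PRIMES]


for i, value in enumerate(initial_hash_values()):
    print(f"H{i} = {value:08x}")
```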

Step 4: Prepare the Message Schedule (W0 to W63)

The message schedule consists of 64 32-bit words (W0 to W63):

• W0 to W15 are the 512-bit message block split into 16 32-bit words.
• W16 to W63 are computed using the following formula, with addition taken modulo 2^32:

W[i] = σ1(W[i-2]) + W[i-7] + σ0(W[i-15]) + W[i-16]

The σ0 and σ1 functions involve rotations and shifts, where ROTR^n is a right rotation by n
bits and SHR^n is a right shift by n bits:

σ0(x) = ROTR^7(x) XOR ROTR^18(x) XOR SHR^3(x)
σ1(x) = ROTR^17(x) XOR ROTR^19(x) XOR SHR^10(x)

For the message "hello", after padding and splitting, the first few words are as follows (in
hexadecimal):

W0 = 68656c6c (01101000 01100101 01101100 01101100)
W1 = 6f800000 (01101111 10000000 00000000 00000000)
W2 = 00000000
W3 = 00000000
...

The remaining words (W16 to W63) are calculated using the above σ0 and σ1 functions.
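
The message schedule for the "hello" block can be sketched as follows. The padded block is written out by hand here (5 message bytes, the 0x80 byte, 50 zero bytes, and the 8-byte length) so that the example stands on its own, and the helper names are assumptions.

```python
import struct

MASK = 0xFFFFFFFF  # keeps every result within 32 bits


def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def small_sigma0(x: int) -> int:
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x: int) -> int:
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def message_schedule(block: bytes) -> list:
    """Expand one 64-byte block into the 64 words W0..W63."""
    w = list(struct.unpack(">16I", block))  # W0..W15: big-endian 32-bit words
    for i in range(16, 64):
        w.append((small_sigma1(w[i - 2]) + w[i - 7]
                  + small_sigma0(w[i - 15]) + w[i - 16]) & MASK)
    return w


block = b"hello" + b"\x80" + b"\x00" * 50 + (40).to_bytes(8, "big")
W = message_schedule(block)
print(f"{W[0]:08x} {W[1]:08x} {W[15]:08x}")  # 68656c6c 6f800000 00000028
```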

Step 5: Compression Function (64 Rounds)

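The main computation loop of SHA-256 runs for 64 rounds. The working variables (A, B, C, D,
E, F, G, H) start out as copies of the current hash values (H0 to H7), and each round updates
them based on the message schedule word W[i] and a round constant K[i]. The 64 constants
K[0] to K[63] are the first 32 bits of the fractional parts of the cube roots of the first 64 prime
numbers.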

Key Functions:

• Ch (Choice):

Ch(x, y, z) = (x AND y) XOR ((NOT x) AND z)

For each bit position, this function chooses the bit from y when the bit of x is 1, and from z
when the bit of x is 0.

• Maj (Majority):

Maj(x, y, z) = (x AND y) XOR (x AND z) XOR (y AND z)

For each bit position, this function selects the majority value of the three inputs x, y, and z.

• Σ0 and Σ1 (capital sigma): These functions involve bitwise rotations:

Σ0(x) = ROTR^2(x) XOR ROTR^13(x) XOR ROTR^22(x)
Σ1(x) = ROTR^6(x) XOR ROTR^11(x) XOR ROTR^25(x)
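
A small sketch of Ch and Maj on 4-bit example values makes the "choose" and "majority" behaviour visible bit by bit. The input values are arbitrary and chosen only for illustration.

```python
def ch(x: int, y: int, z: int) -> int:
    """Per bit: take y where x is 1, take z where x is 0."""
    return (x & y) ^ (~x & z)


def maj(x: int, y: int, z: int) -> int:
    """Per bit: the value held by at least two of the three inputs."""
    return (x & y) ^ (x & z) ^ (y & z)


x, y, z = 0b1100, 0b1010, 0b0110
print(f"{ch(x, y, z):04b}")   # 1010
print(f"{maj(x, y, z):04b}")  # 1110
```

Reading the Ch result from left to right: the first two bits come from y (because x is 1 there) and the last two come from z.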

The 64 rounds use the following operations:

1. Compute temporary variables T1 and T2:

T1 = H + Σ1(E) + Ch(E, F, G) + K[i] + W[i]
T2 = Σ0(A) + Maj(A, B, C)

2. Update the working variables:

H = G
G = F
F = E
E = D + T1
D = C
C = B
B = A
A = T1 + T2

After all 64 rounds, the intermediate hash values are updated:

H0 = H0 + A
H1 = H1 + B
H2 = H2 + C
H3 = H3 + D
H4 = H4 + E
H5 = H5 + F
H6 = H6 + G
H7 = H7 + H

Step 6: Final Hash

After processing all blocks (in this case, just one), the final hash is the concatenation of H0 to H7.
This gives the 256-bit output in hexadecimal.

For the input "hello", the final hash is:

2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

This is the SHA-256 hash of "hello".
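
Steps 2 to 6 can be put together in a compact from-scratch sketch and compared against Python's built-in `hashlib`. It is meant for study rather than production use. The helper names are made up, and both the initial hash values and the round constants K are derived from prime roots with integer arithmetic instead of being typed in.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def first_primes(n: int) -> list:
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def iroot(n: int, k: int) -> int:
    """Largest integer r with r**k <= n, found by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // k + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK


def sha256(message: bytes) -> str:
    primes = first_primes(64)
    # First 32 fractional bits of square roots (H) and cube roots (K).
    H = [iroot(p << 64, 2) & MASK for p in primes[:8]]
    K = [iroot(p << 96, 3) & MASK for p in primes]

    # Step 2: padding.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += (len(message) * 8).to_bytes(8, "big")

    for offset in range(0, len(padded), 64):
        # Step 4: message schedule.
        W = list(struct.unpack(">16I", padded[offset:offset + 64]))
        for i in range(16, 64):
            s0 = rotr(W[i - 15], 7) ^ rotr(W[i - 15], 18) ^ (W[i - 15] >> 3)
            s1 = rotr(W[i - 2], 17) ^ rotr(W[i - 2], 19) ^ (W[i - 2] >> 10)
            W.append((s1 + W[i - 7] + s0 + W[i - 16]) & MASK)

        # Step 5: 64 rounds on the working variables.
        a, b, c, d, e, f, g, h = H
        for i in range(64):
            big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            ch = (e & f) ^ (~e & g)
            t1 = (h + big_s1 + ch + K[i] + W[i]) & MASK
            big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            maj = (a & b) ^ (a & c) ^ (b & c)
            t2 = (big_s0 + maj) & MASK
            h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK

        H = [(x + y) & MASK for x, y in zip(H, (a, b, c, d, e, f, g, h))]

    # Step 6: concatenate H0..H7 as hexadecimal.
    return "".join(f"{x:08x}" for x in H)


print(sha256(b"hello"))
print(sha256(b"hello") == hashlib.sha256(b"hello").hexdigest())  # True
```

In everyday code, `hashlib.sha256` is the better choice. The sketch exists only to mirror the steps described above.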

Summary:

1. Input Message: "hello"
2. Padded to 512 bits and split into 16 32-bit words.
3. 64 rounds of processing with bitwise operations, rotations, and modular additions.
4. Final Hash:

2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
