Hash Function
A hash function is a mathematical function that converts any digital data into an output
string with a fixed number of characters. Hashing is the one-way act of converting the
data (called a message) into the output (called the hash).
Hashing is useful for verifying the integrity of a piece of data, that is, confirming that it
has not been tampered with, since even a small change in the message produces an entirely
different hash.
Hash functions are the basic tools of modern cryptography that are used in information
security to authenticate transactions, messages, and digital signatures.
The act of hashing is, therefore, running an input through a formula that converts it into an
output of fixed length. No matter how many characters long the input is, the output of a
given hash function always has the same number of hexadecimal characters (the digits 0-9
and the letters a-f).
Hashing is generally a one-way function, which means that it is easy to convert a message
into a hash but very difficult to “reverse hash” a hash value back to its original message
as it requires a massive amount of computing power.
This difficulty is what cryptocurrencies like Bitcoin, which use a proof-of-work system,
depend on to ensure the integrity of their blockchain.
When you hash a message, the hash function takes your file or message of any size, runs it
through a mathematical algorithm, and produces an output of a fixed length.
In Table 1 above, I have converted the same input message (the letters CFI) into hash
values using three different hash functions (MD5, SHA-1, and SHA-256). Each of those
hash functions produces an output hash with its own fixed length in hexadecimal
characters: 32 characters for MD5, 40 characters for SHA-1, and 64 characters for
SHA-256.
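The fixed output lengths can be checked with a short sketch using Python's standard hashlib module. The helper name hex_length is only illustrative, and the sketch assumes MD5 and SHA-1 are enabled in the local Python build.

```python
import hashlib

def hex_length(name: str, data: bytes) -> int:
    """Return the number of hexadecimal characters in the digest of data."""
    return len(hashlib.new(name, data).hexdigest())

message = b"CFI"  # the input used in the text

for name in ("md5", "sha1", "sha256"):
    print(f"{name:>6}: {hex_length(name, message)} hex characters")

# The length does not depend on the input size.
print(hex_length("sha256", b"x" * 1_000_000))  # 64
```

Each hexadecimal character encodes 4 bits, so 32, 40, and 64 characters correspond to 128-bit, 160-bit, and 256-bit hashes.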
Let’s think of an example where you want to send a digital message or document to
someone, and you want to make sure that it hasn’t been tampered with along the way.
You could send it multiple times and have the recipient verify each copy is the same, but
that would not be feasible if the file or message was very large.
It would be much easier if there was a way of having a shorter and set number of
characters for the sender and receiver to check. And that’s essentially what a hash function
allows two computers to do.
Rather than compare the data in its original (and larger) form, by comparing the two
hashes of the data, computers can quickly confirm that the data has not been tampered
with and changed.
Hash functions, therefore, serve as a checksum, or a way for someone to identify whether
digital data has been tampered with after it has been created.
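A minimal sketch of this checksum comparison, assuming the sender publishes the SHA-256 hash of the data over a channel the receiver trusts. The function names and sample messages are hypothetical.

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Hash the data and return the digest as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

def is_unmodified(received: bytes, expected_hex: str) -> bool:
    """Compare the hash of the received data with the published hash."""
    # compare_digest compares in constant time, a common precaution.
    return hmac.compare_digest(sha256_hex(received), expected_hex)

original = b"Quarterly report, version 1"    # hypothetical message
published_hash = sha256_hex(original)        # sent alongside the data

print(is_unmodified(original, published_hash))
print(is_unmodified(b"Quarterly report, version 2", published_hash))
```

Only the short hash has to be compared, regardless of how large the data is. Note that a bare hash detects tampering only if an attacker cannot also replace the published hash.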
Verify authenticity
For example, if you send out an email, it can be intercepted easily (especially if it is sent
over an unsecured WiFi network). The recipient of the email has no way of knowing
whether someone has altered the contents of the email along the way. This kind of
interception is called a “Man-in-the-Middle” (MitM) attack.
However, if the sender digitally signs the email, that is, hashes the email contents and
signs the resulting hash with their private key, the receiver can use that hash to ensure
that the email contents have not been modified after being signed.
To do this, the receiver uses the signer’s public key to recover the hash value from the
signature, and compares it to a hash value they “re-generate” themselves from the
received email using the same hash function as the sender.
If the two hashes match, no one has altered the message. If the hashes are different,
the receiver knows that the contents of the email are not authentic, because even a
small change in the message produces a completely different hash.
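The hash-then-sign check can be sketched with a toy "textbook RSA" key, shown here only to make the flow concrete. Everything in it is an assumption for illustration: the key is built from two small, well-known Mersenne primes, there is no padding scheme, and the function names are hypothetical. It is not secure and is not how PGP or S/MIME implement signatures internally.

```python
import hashlib

# Toy key pair (assumption: illustration only, far too weak for real use).
P = 2**61 - 1                       # a known Mersenne prime
Q = 2**89 - 1                       # another known Mersenne prime
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)

def digest_as_int(message: bytes) -> int:
    """SHA-256 hash of the message, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Sender: hash the message, then sign the hash with the private key."""
    return pow(digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Receiver: recover the hash with the public key and compare it with
    a hash re-generated from the received message."""
    return pow(signature, E, N) == digest_as_int(message)

email = b"Please pay invoice 42"     # hypothetical email body
signature = sign(email)
print(verify(email, signature))
print(verify(b"Please pay invoice 43", signature))
```

In practice this work is left to a vetted library or tool, which also applies a proper padding scheme and uses keys of at least 2048 bits.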
Two widely used standards for signing email are:
PGP (Pretty Good Privacy): PGP allows you to sign and encrypt emails. There are
various software tools and plugins for PGP that can be integrated into email clients like
Thunderbird or Gmail.
S/MIME (Secure/Multipurpose Internet Mail Extensions): S/MIME is another widely used
protocol that provides encryption and digital signatures for emails. Most email clients,
such as Outlook, support S/MIME.
To sign an email with PGP:
1. Install a PGP client (e.g., GPG Suite for Mac or Gpg4win for Windows).
2. Generate your key pair (public and private keys).
3. Sign your email using your private key before sending it.
4. The recipient uses your public key to verify the signature.
The details depend on the algorithm, but in general, to produce a hash value of a set
length, a hash function first divides the input data into fixed-size blocks, which are
called data blocks.
This is because a hash function processes data a fixed amount at a time. The size of the
data block differs from one algorithm to another.
If the input does not fill a whole number of blocks, padding is added to fill out the last
block. Regardless of the input, however, the output, or hash value, of a given hash
function always has the same fixed length.
The hash function is then repeated as many times as there are data blocks.
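A toy sketch of this block-by-block processing, in which each step combines the previous step's output with the next data block. The 8-byte block size, the zero padding, and the use of SHA-256 as the per-block step are all assumptions made for illustration; real algorithms use their own compression function and a stricter padding rule.

```python
import hashlib

BLOCK_SIZE = 8  # bytes; hypothetical toy value

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Pad with zero bytes and cut the data into fixed-size blocks."""
    # Round the length up to a multiple of block_size (at least one block).
    padded_len = max(-(-len(data) // block_size), 1) * block_size
    padded = data.ljust(padded_len, b"\x00")
    return [padded[i:i + block_size] for i in range(0, padded_len, block_size)]

def toy_chained_hash(data: bytes) -> str:
    """Run one step per block, feeding each output into the next step."""
    state = b"\x00" * 32                  # fixed starting value
    for block in split_into_blocks(data):
        state = hashlib.sha256(state + block).digest()
    return state.hex()

blocks = split_into_blocks(b"twenty bytes of data")
print(len(blocks), [len(b) for b in blocks])
print(toy_chained_hash(b"twenty bytes of data"))
```

Plain zero padding cannot tell b"abc" from b"abc\x00", which is one reason real padding schemes also encode the message length.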
The “Avalanche Effect”
The data blocks are processed one at a time. The output of the first data block is fed as
input along with the second data block. Consequently, the output of the second is fed
along with the third block, and so on.
The final output is thus the combined value of all the blocks. If you change one bit
anywhere in the message, the entire hash value changes. This is called “the avalanche
effect.”
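The avalanche effect can be observed by flipping a single input bit and counting how many of the 256 output bits of SHA-256 change. The helper name differing_bits is illustrative; "hello" and "helln" differ only in the lowest bit of the last character.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which the SHA-256 hashes of a and b differ."""
    hash_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    hash_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(hash_a ^ hash_b).count("1")

# 'o' is 01101111 and 'n' is 01101110, so the inputs differ in one bit.
changed = differing_bits(b"hello", b"helln")
print(f"{changed} of 256 output bits changed")
```

For a well-designed hash function, roughly half of the output bits are expected to change.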
The output, or hash value, should in practice be unique to the exact input. Because there
are more possible inputs than fixed-length outputs, two different inputs with the same
hash must exist in theory, but it should be computationally infeasible to find them.
When two different pieces of data produce the same output, it is known as a “hash
collision,” and an algorithm for which collisions can actually be found is considered
broken.
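To see why output length matters, the sketch below searches for a collision on a deliberately weakened hash: only the first 4 hexadecimal characters (16 bits) of SHA-256. The truncation and the message format are assumptions for illustration. With so few possible outputs a collision turns up almost immediately, while the full 256-bit hashes of the same two messages still differ.

```python
import hashlib
from itertools import count

def short_hash(data: bytes, hex_chars: int = 4) -> str:
    """A deliberately weak hash: a short prefix of the SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()[:hex_chars]

def find_collision(hex_chars: int = 4):
    """Try messages until two of them share the same short hash."""
    seen = {}  # short hash -> first message that produced it
    for i in count():
        message = f"message-{i}".encode()
        h = short_hash(message, hex_chars)
        if h in seen:
            return seen[h], message, h
        seen[h] = message

first, second, shared = find_collision()
print(first, second, "both hash to", shared)
```

With 16 bits there are only 65,536 possible outputs, so a repeat is guaranteed within 65,537 messages and typically appears after a few hundred.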
Irreversibility
Ideally, hash functions should be irreversible. This means that while it is quick and easy
to compute the hash if you know the input message, it is very difficult to go through the
process in reverse and compute the input message if you only know the hash value.
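Since there is no formula for running a hash backwards, the practical way to "reverse" one is to guess inputs and hash each guess. A minimal sketch, assuming the unknown input is known to be a few lowercase letters (the function name and the search space are hypothetical):

```python
import hashlib
from itertools import product
from string import ascii_lowercase

def brute_force_preimage(target_hex: str, length: int):
    """Hash every lowercase string of the given length until one matches."""
    for letters in product(ascii_lowercase, repeat=length):
        candidate = "".join(letters).encode()
        if hashlib.sha256(candidate).hexdigest() == target_hex:
            return candidate
    return None  # no input of this length produces the target hash

target = hashlib.sha256(b"cfi").hexdigest()
print(brute_force_preimage(target, 3))
```

Three letters need at most 26^3 = 17,576 guesses. Every additional letter multiplies the work by 26, which is why long, unpredictable inputs are infeasible to recover this way.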
The most famous cryptocurrency, Bitcoin, uses hash functions in its blockchain. Powerful
computers, called miners, race each other in brute force searches to try to solve hashes in
order to earn the mining rewards of new Bitcoins, as well as processing fees that users
pay to record their transactions on the blockchain.
Since each block’s hash is created by a cryptographic algorithm (Bitcoin uses the SHA-256
algorithm, applied twice), the only way to find a valid proof-of-work is to run guesses
through the algorithm until a number is found that creates a hash starting with the
required number of zeroes. This is what Bitcoin miners are doing: running numbers
through a cryptographic algorithm until they guess a valid nonce.
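A simplified sketch of this guessing loop. It is not Bitcoin's actual block format: the header bytes, the 8-byte nonce, and the "leading zero hex digits" difficulty rule are assumptions for illustration (Bitcoin hashes an 80-byte header with a 4-byte nonce and compares the result against a numeric target). The double SHA-256 does match what Bitcoin uses.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header: bytes, zero_hex_digits: int = 4):
    """Try nonces 0, 1, 2, ... until the hash starts with enough zeroes."""
    prefix = "0" * zero_hex_digits
    nonce = 0
    while True:
        digest = double_sha256(header + nonce.to_bytes(8, "little")).hex()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

nonce, digest = mine(b"toy block header")   # hypothetical header
print(nonce, digest)
```

Each extra required zero hex digit makes the search about 16 times longer on average, while checking a claimed nonce still takes a single hash computation.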
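Commonly cited hashing algorithms include:
Message Digest (MD) Algorithm: MD2, MD4, MD5, and MD6. MD5 was long
considered a go-to hashing algorithm, but it is now considered broken because of
hash collisions.
Windows NTHash: Also known as a Unicode hash or NTLM, this hash is
commonly used by Windows systems.
RACE Integrity Primitives Evaluation Message Digest (RIPEMD)
Whirlpool
RSA (strictly a public-key encryption and signature algorithm rather than a hash
function; it is used together with a hash function in digital signatures)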
Generally speaking, the most popular hashing algorithms or functions have a hash length
ranging from 160 to 512 bits.
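The hash lengths of a few algorithms can be read directly from Python's hashlib. The helper name digest_bits is illustrative, and the list is limited to algorithms that hashlib normally guarantees.

```python
import hashlib

def digest_bits(name: str) -> int:
    """Return the output length of a hashlib algorithm in bits."""
    return hashlib.new(name).digest_size * 8

for name in ("md5", "sha1", "sha256", "sha512"):
    print(f"{name:>6}: {digest_bits(name)} bits")
```

SHA-1, SHA-256, and SHA-512 fall within the 160 to 512 bit range; the older MD5 is shorter, at 128 bits.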
Hashing vs Encryption
Encryption is the practice of taking data and scrambling it, using an algorithm called a
cipher, in a way that only someone with the corresponding key can unscramble and
decode it. Encryption is a two-way function, designed to be reversible by anyone who
holds the key. So when someone encrypts something, it is done with the intention of
decrypting it later.
Hashing uses a formula that converts data of any size to a fixed length. The computing
power required to “un-hash” something makes reversal very difficult, so whereas
encryption is a two-way function, hashing is generally a one-way function.
Encryption is meant to protect data in transit, whereas hashing is meant to verify that a
file or piece of data hasn’t been altered, that is, that it is authentic. You might liken
encryption to putting a piece of data in a safe that opens only when the recipient knows
the combination; hashing is more like a tamper-evident security seal that indicates
whether the contents of the data have been altered.
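The two-way versus one-way distinction can be sketched as follows. The XOR "cipher" here is a toy stand-in chosen only because it needs no third-party library; it is not secure, and the key and message are hypothetical.

```python
import hashlib
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy reversible cipher: XOR each byte with the repeating key.
    Applying it twice with the same key restores the original data."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

message = b"wire $100 to Alice"
key = b"hypothetical-key"

ciphertext = xor_cipher(message, key)      # encrypt
recovered = xor_cipher(ciphertext, key)    # decrypt with the same key
print(recovered == message)

digest = hashlib.sha256(message).hexdigest()
print(digest)  # fixed length; there is no key that turns it back into the message
```

The ciphertext is as long as the message and can be recovered with the key, whereas the hash is always 64 hexadecimal characters and has no corresponding "decrypt" operation.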
The following example walks through how SHA-256 hashes the message "hello".
First, we convert the input string into its binary representation using ASCII encoding. Each
character in "hello" is represented by an 8-bit binary value:
h = 104 = 01101000
e = 101 = 01100101
l = 108 = 01101100
l = 108 = 01101100
o = 111 = 01101111
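The same conversion can be reproduced with a few lines of Python (a small sketch; the variable names are illustrative):

```python
message = "hello"

# Print each character with its ASCII code and 8-bit binary form.
for ch in message:
    code = ord(ch)
    print(f"{ch} = {code} = {code:08b}")

# The whole message as one bit string: 5 characters x 8 bits = 40 bits.
bits = "".join(f"{byte:08b}" for byte in message.encode("ascii"))
print(len(bits), bits)
```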
SHA-256 operates on blocks of 512 bits, so the message is always padded to make its total
length a multiple of 512 bits. Padding is done in the following way:
1. Append a single '1' bit to the end of the message.
2. Append '0' bits until the message length is 448 bits modulo 512 (the remaining 64 bits of
the block are reserved for the length of the original message).
3. Append the original message length in bits as a 64-bit big-endian integer. Since "hello" is
40 bits long, we append the 64-bit representation of 40 (0000000000000028 in hexadecimal).
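The padding steps can be sketched in Python as follows. The function name sha256_pad is illustrative, and the sketch works on whole bytes, so the '1' bit and the seven '0' bits after it are appended together as the byte 0x80.

```python
def sha256_pad(message: bytes) -> bytes:
    """Pad a message to a multiple of 64 bytes (512 bits)."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                      # '1' bit, then seven '0' bits
    padded += b"\x00" * ((56 - len(padded)) % 64)   # zeros up to 448 bits mod 512
    padded += bit_length.to_bytes(8, "big")         # 64-bit original length
    return padded

block = sha256_pad(b"hello")
print(len(block) * 8)   # 512
print(block.hex())
```

A message of 56 bytes or more no longer leaves room for the length field in its last block, so the padding spills into an extra block.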
Next, the eight 32-bit hash values are initialized to the following constants (in
hexadecimal):
H0 = 6a09e667
H1 = bb67ae85
H2 = 3c6ef372
H3 = a54ff53a
H4 = 510e527f
H5 = 9b05688c
H6 = 1f83d9ab
H7 = 5be0cd19
These values are the first 32 bits of the fractional parts of the square roots of the first 8
prime numbers.
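This derivation can be checked with exact integer arithmetic. The sketch below is one way to do it, not the only one: the integer square root of p * 2^64 equals floor(sqrt(p) * 2^32), and keeping the low 32 bits discards the integer part of the square root. The function name is illustrative, and math.isqrt needs Python 3.8 or later.

```python
from math import isqrt

FIRST_8_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19)

def initial_hash_value(p: int) -> int:
    """First 32 bits of the fractional part of sqrt(p)."""
    # isqrt(p << 64) is floor(sqrt(p) * 2**32); the mask keeps the
    # 32 fractional bits and drops the integer part.
    return isqrt(p << 64) & 0xFFFFFFFF

for i, p in enumerate(FIRST_8_PRIMES):
    print(f"H{i} = {initial_hash_value(p):08x}")
```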
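W0 to W15 are the 512-bit message block split into sixteen 32-bit words.
W16 to W63 are computed using the following formula (all additions are modulo 2^32):
W[t] = σ1(W[t-2]) + W[t-7] + σ0(W[t-15]) + W[t-16]
σ0(x) = ROTR(x, 7) XOR ROTR(x, 18) XOR SHR(x, 3)
σ1(x) = ROTR(x, 17) XOR ROTR(x, 19) XOR SHR(x, 10)
Here ROTR(x, n) rotates the 32-bit word x right by n bits, and SHR(x, n) shifts it right by n
bits.
For the message "hello", after splitting and padding, the first sixteen words are as follows (in
hexadecimal):
W0 = 68656c6c
W1 = 6f800000
W2 to W14 = 00000000
W15 = 00000028
The remaining words (W16 to W63) are calculated using the above σ0 and σ1 functions.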
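A sketch of the message schedule in Python, following the SHA-256 specification: W0 to W15 are read from the block, and each later word combines four earlier words using σ0 and σ1. The function names are illustrative, and the padded block for "hello" is written out by hand so the example stands alone.

```python
MASK = 0xFFFFFFFF  # keeps results within 32 bits

def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def small_sigma0(x: int) -> int:
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def small_sigma1(x: int) -> int:
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def message_schedule(block: bytes) -> list:
    """Expand one 64-byte block into the 64 words W0..W63."""
    assert len(block) == 64
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 64):
        w.append((small_sigma1(w[t - 2]) + w[t - 7]
                  + small_sigma0(w[t - 15]) + w[t - 16]) & MASK)
    return w

# "hello", the 0x80 marker byte, 50 zero bytes, and the 64-bit length (40).
block = b"hello" + b"\x80" + b"\x00" * 50 + (40).to_bytes(8, "big")
w = message_schedule(block)
for t in (0, 1, 15, 16):
    print(f"W{t} = {w[t]:08x}")
```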
The main computation loop of SHA-256 runs for 64 rounds. In each round, we update the working
variables (A, B, C, D, E, F, G, H) based on the message schedule, the constants, and the current
hash values.
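Key Functions:
Ch (Choice): Ch(x, y, z) = (x AND y) XOR ((NOT x) AND z)
For each bit position, this function uses x to choose between the bits of y and z.
Maj (Majority): Maj(x, y, z) = (x AND y) XOR (x AND z) XOR (y AND z)
For each bit position, this function selects the majority value of the three inputs x, y, and z.
Σ0 and Σ1 (capital sigma): These functions combine bitwise rotations:
Σ0(x) = ROTR(x, 2) XOR ROTR(x, 13) XOR ROTR(x, 22)
Σ1(x) = ROTR(x, 6) XOR ROTR(x, 11) XOR ROTR(x, 25)
In round t, with round constant K[t] and message schedule word W[t], two temporary values
are computed (all additions are modulo 2^32):
T1 = H + Σ1(E) + Ch(E, F, G) + K[t] + W[t]
T2 = Σ0(A) + Maj(A, B, C)
The working variables are then updated as follows: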
H = G
G = F
F = E
E = D + T1
D = C
C = B
B = A
A = T1 + T2
After the 64 rounds, the working variables are added to the current hash values (modulo
2^32):
H0 = H0 + A
H1 = H1 + B
H2 = H2 + C
H3 = H3 + D
H4 = H4 + E
H5 = H5 + F
H6 = H6 + G
H7 = H7 + H
After processing all blocks (in this case, just one), the final hash is the concatenation of H0 to H7.
This gives the 256-bit output in hexadecimal.
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
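The whole walkthrough can be put together in a compact, unoptimized Python sketch and compared with the standard library. Two choices here are assumptions of the sketch rather than part of the text above: the 64 round constants K are derived from the cube roots of the first 64 primes (as the SHA-256 specification defines them) instead of being typed in, and the helper names are illustrative. Real code should simply call hashlib.sha256.

```python
import hashlib
from math import isqrt

MASK = 0xFFFFFFFF

def first_primes(n: int) -> list:
    found, candidate = [], 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def icbrt(n: int) -> int:
    """Integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
H_INIT = [isqrt(p << 64) & MASK for p in PRIMES[:8]]  # square-root fractions
K = [icbrt(p << 96) & MASK for p in PRIMES]           # cube-root fractions

def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK

def sha256_hex(message: bytes) -> str:
    # Padding: '1' bit, zeros, then the 64-bit message length.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += (len(message) * 8).to_bytes(8, "big")

    h = list(H_INIT)
    for start in range(0, len(padded), 64):
        block = padded[start:start + 64]
        # Message schedule W0..W63.
        w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
        for t in range(16, 64):
            s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
            s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
            w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK)

        # 64 rounds on the working variables (hh stands for H).
        a, b, c, d, e, f, g, hh = h
        for t in range(64):
            ch = (e & f) ^ (~e & g)
            maj = (a & b) ^ (a & c) ^ (b & c)
            big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            t1 = (hh + big_s1 + ch + K[t] + w[t]) & MASK
            t2 = (big_s0 + maj) & MASK
            hh, g, f, e, d, c, b, a = (g, f, e, (d + t1) & MASK,
                                       c, b, a, (t1 + t2) & MASK)

        # Add the working variables back into the hash values.
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]

    return "".join(f"{x:08x}" for x in h)

print(sha256_hex(b"hello"))
print(sha256_hex(b"hello") == hashlib.sha256(b"hello").hexdigest())
```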
Summary: the SHA-256 hash of the message "hello" is
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824