Lec 11 - Hashing
Hashing in Cryptography
Hashing
Use of Hashing
Basic Properties of Hash Functions
Properties of Cryptographic Hash Function
Requirement of Cryptographic Hash Function
Hashing vs Encryption
Data Integrity & Data Confidentiality
Hashing in Python
Type of Hashing Function
Hashing
Hashing is a cryptographic technique that converts a
message into a hash value.
The message can be any form of data (string, image, file).
A hash value is a fixed-length value, usually shown as an
alphanumeric (hexadecimal) string, that identifies the message.
Because collisions are hard to find, it acts as a practically
unique fingerprint of the message.
It is typically much smaller than the message.
A hash function is an algorithm that transforms data of
arbitrary size into a fixed size output.
Picture source: Dennis Byrne (2021). Full Stack Python Security: Cryptography, TLS, and attack resistance. Manning.
Use of Hashing
Hashing can be used to validate files, documents and other
types of data to ensure content has not been altered.
Hashing is used in digital signatures to authenticate the
identity of the sender of a message.
Further reading (digital signatures):
https://www.globalsign.com/en/blog/how-do-digital-signatures-work
https://www.youtube.com/watch?v=OKg4PqD01Z0
It is used in computer forensics to verify the integrity
of recovered evidence and to compare two files to
determine whether they are duplicates.
Basic Properties of Hash Functions
Fixed-length hash values
A hash function takes a message of any length and produces
a fixed-length hash value.
The length of the message does not affect the length of the
hash value.
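A minimal sketch of the fixed-length property using Python's standard hashlib module with SHA-256 (the sample messages are arbitrary choices for illustration): inputs of 0, 2 and 22,000 bytes all produce a 64-character hex digest.

```python
import hashlib


def digest_lengths(messages):
    """Return (message length, hex digest length) pairs using SHA-256."""
    return [(len(m), len(hashlib.sha256(m).hexdigest())) for m in messages]


# Sample inputs of very different sizes (chosen only for illustration).
samples = [b"", b"hi", b"a much longer message " * 1000]

for msg_len, hex_len in digest_lengths(samples):
    # SHA-256 always produces 256 bits = 32 bytes = 64 hex characters.
    print(f"message: {msg_len:>6} bytes -> digest: {hex_len} hex characters")
```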
Properties of Cryptographic Hash Function
One-way function property
Given a hash value h, it is computationally infeasible to find any x such that H(x) = h.
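One-way means there is no shortcut for inverting H: the only generic approach is to guess inputs and hash them. The sketch below (hypothetical helper, SHA-256 from hashlib) succeeds only because the assumed input space is tiny, 26^3 = 17,576 three-letter strings; for a random 32-byte input the same search would have 2^256 candidates.

```python
import hashlib
import itertools
import string


def brute_force_preimage(target_hex, length=3, alphabet=string.ascii_lowercase):
    """Search every string of `length` letters for one whose SHA-256 equals target_hex."""
    for chars in itertools.product(alphabet, repeat=length):
        candidate = "".join(chars).encode()
        if hashlib.sha256(candidate).hexdigest() == target_hex:
            return candidate
    return None  # no preimage in this (deliberately small) search space


target = hashlib.sha256(b"cat").hexdigest()
print(brute_force_preimage(target))  # b'cat'
```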
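Collision resistance
A collision occurs when two different messages produce the same
hash value under the hash function.
Because a hash function maps an unlimited number of possible messages
to a fixed number of hash values, collisions must exist; hash functions
are designed to make them extremely hard to find.
A hash function is judged on how well it avoids collisions.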
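Why collisions must exist can be shown with the pigeonhole principle. The sketch below uses a hypothetical 16-bit "toy hash" (the first 2 bytes of SHA-256) so the output space is small enough to exhaust: 2^16 + 1 distinct messages cannot all get distinct values out of only 2^16 possibilities. Real hash functions have the same property, but with 2^256 or more outputs the collisions are infeasible to locate.

```python
import hashlib


def toy_hash(data: bytes) -> int:
    """Hypothetical 16-bit toy hash: the first 2 bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")


# 2**16 + 1 distinct messages, but only 2**16 possible hash values.
messages = [str(i).encode() for i in range(2**16 + 1)]
distinct_hashes = {toy_hash(m) for m in messages}

# By the pigeonhole principle, at least two messages must share a hash value.
print(len(messages), "messages,", len(distinct_hashes), "distinct hash values")
```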
Properties of Cryptographic Hash Function
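Strong collision resistance
Strong collision resistance means it is computationally infeasible
to find any two different messages that hash to the same value.
That is, it is hard to find any x1 ≠ x2 such that H(x1) = H(x2).
This differs from weak collision resistance (second-preimage
resistance), where x1 is given and the attacker must find x2 ≠ x1
with H(x1) = H(x2). With strong collision resistance, the attacker
gets to choose both x1 and x2, not just x2.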
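The difference between the two attacks can be seen on a deliberately weak, hypothetical 16-bit toy hash (first 2 bytes of SHA-256). Finding a second preimage for a fixed x1 typically takes on the order of 2^16 guesses, while a collision search, where the attacker chooses both messages, typically succeeds after only a few hundred (the birthday effect). The candidate message format is an arbitrary choice for this sketch.

```python
import hashlib
import itertools


def toy_hash(data: bytes) -> bytes:
    """Hypothetical 16-bit toy hash (first 2 bytes of SHA-256), weak on purpose."""
    return hashlib.sha256(data).digest()[:2]


def second_preimage(x1: bytes):
    """Weak collision resistance: x1 is given; search for some x2 != x1."""
    target = toy_hash(x1)
    for attempts in itertools.count(1):
        x2 = b"candidate-" + str(attempts).encode()
        if x2 != x1 and toy_hash(x2) == target:
            return x2, attempts


def collision():
    """Strong collision resistance: the attacker may choose both x1 and x2."""
    seen = {}
    for attempts in itertools.count(1):
        x = b"candidate-" + str(attempts).encode()
        h = toy_hash(x)
        if h in seen:
            return seen[h], x, attempts
        seen[h] = x


x2, preimage_tries = second_preimage(b"hello")
a, b, collision_tries = collision()
print("second preimage after", preimage_tries, "attempts")  # typically tens of thousands
print("collision after", collision_tries, "attempts")       # typically a few hundred
```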
http://www.parkjonghyuk.net/lecture/2015-1st-lecture/networksecurity/chap11.pdf
Hashing vs Encryption
Encryption | Hashing
Two-way function. | One-way function.
The original message can be retrieved using a decryption key. | The original message cannot be retrieved.
The resulting encrypted string has variable length. | The resulting hash has fixed length.
The length of the encrypted string depends on the length of the input string. | The length of the hash is fixed and does not depend on the size of the input string.
The purpose of encryption is to ensure data confidentiality. | The purpose of hashing is to ensure data integrity.
Examples of encryption algorithms include AES, DES, RSA, ECC, etc. | Examples of hashing algorithms include SHA-1, SHA-2, MD5, CRC, etc. (MD5 and SHA-1 are no longer considered secure; CRC is an error-detecting checksum, not a cryptographic hash.)
Data Integrity & Data Confidentiality
Data Integrity
• User A logs on to public Wi-Fi to send user B an email.
• User A writes the message, digitally signs it using A's
  digital certificate, and sends it.
• A man-in-the-middle attack occurs: someone intercepts the
  message (because public Wi-Fi is insecure) and modifies it.
• Because the email content was modified after it was digitally
  signed, its hash value changes and no longer matches the signed
  hash, so user B can detect the tampering.
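The receiver's check in this scenario boils down to recomputing the hash and comparing it. The sketch below assumes the original digest reaches B intact, which in the scenario is what the digital signature guarantees (the signing step itself is not shown); a bare hash sent alongside the message would not be enough, because the attacker could simply recompute it. The helper names and messages are hypothetical.

```python
import hashlib
import hmac


def digest(message: bytes) -> str:
    """SHA-256 digest of a message as a hex string."""
    return hashlib.sha256(message).hexdigest()


def is_unmodified(received: bytes, expected_digest: str) -> bool:
    """Recompute the hash and compare it with the expected one in constant time."""
    return hmac.compare_digest(digest(received), expected_digest)


original = b"Meet me at 10:00."
signed_digest = digest(original)   # protected by A's signature in the scenario
tampered = b"Meet me at 11:00."    # modified by the man-in-the-middle

print(is_unmodified(original, signed_digest))  # True
print(is_unmodified(tampered, signed_digest))  # False
```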
Type of Hashing Function
Safe and Secure Hash Function
SHA-2
SHA-3
BLAKE2
Type of Hashing Function
• More bits in the hash output are generally expected to give stronger security and higher collision
resistance (with some exceptions).
• Thus SHA-512 is expected to be stronger than SHA-256: finding a collision is expected to be
practically less likely for SHA-512 than for SHA-256.
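A rough way to quantify this, using the standard generic (birthday) bound rather than anything specific to these slides: for an ideal n-bit hash, a generic collision search needs about 2^{n/2} hash evaluations, while a generic preimage search needs about 2^n. The "exceptions" are functions with structural weaknesses, such as SHA-1, where attacks beat these generic bounds.

```latex
\begin{aligned}
W_{\text{collision}}(n) &\approx 2^{n/2}, \qquad W_{\text{preimage}}(n) \approx 2^{n} \\
W_{\text{collision}}(256) &\approx 2^{128} \ \text{(SHA-256)}, \qquad
W_{\text{collision}}(512) \approx 2^{256} \ \text{(SHA-512)}
\end{aligned}
```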
William Smith (2017). Statistical Analysis of the SHA-1 and SHA-2 Hash Functions
Type of Hashing Function
BLAKE2
BLAKE2 is not as popular as SHA-2 or SHA-3.
It exploits modern CPU architectures to hash at very high
speeds.
BLAKE2 should be considered if large amounts of data need to
be hashed.
BLAKE2 comes in two flavors: BLAKE2b and BLAKE2s.
BLAKE2b is optimized for 64-bit platforms.
BLAKE2s is optimized for 8- to 32-bit platforms.
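Both flavors are available in Python's hashlib as blake2b and blake2s. A short sketch (the input data is a placeholder): the default digest sizes are 64 and 32 bytes, the digest size is configurable, and large inputs can be fed incrementally with update(), which is how large amounts of data would normally be hashed.

```python
import hashlib

data = b"imagine a large file being hashed here"

# BLAKE2b: optimized for 64-bit platforms, digest up to 64 bytes (default 64).
b2b = hashlib.blake2b(data)
# BLAKE2s: optimized for 8- to 32-bit platforms, digest up to 32 bytes (default 32).
b2s = hashlib.blake2s(data)
print(b2b.digest_size, b2s.digest_size)  # 64 32

# The digest size is a parameter, e.g. a 32-byte BLAKE2b digest.
short = hashlib.blake2b(data, digest_size=32)
print(len(short.hexdigest()))  # 64

# Large inputs can be fed incrementally; the result equals one-shot hashing.
incremental = hashlib.blake2b()
for chunk in (data[:10], data[10:]):
    incremental.update(chunk)
print(incremental.hexdigest() == b2b.hexdigest())  # True
```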
Type of Hashing Function
SHA-1
SHA-1 is an obsolete 160-bit hash function developed by the
NSA in the mid-1990s.
Like MD5, this hash function was popular at one time but it is
no longer considered secure.
The first practical collision for SHA-1 was announced in 2017 by
a collaboration between Google and Centrum Wiskunde &
Informatica (CWI), a research institute in the Netherlands.
In theoretical terms, this effort stripped SHA-1 of strong
collision resistance, not weak collision resistance.
Hashing in Python
I. Built-In Hashing
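For str and bytes objects, the value returned by the built-in hash()
is different for each new Python invocation, because hash
randomization is enabled by default (since Python 3.3) unless the
PYTHONHASHSEED environment variable fixes the seed.
Python has never guaranteed that hash() is deterministic across
invocations, so it is not suitable for checksums or cryptography.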
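A sketch that makes the randomization visible by running hash() in fresh interpreters. By default each invocation picks a random seed; here the hypothetical helper sets PYTHONHASHSEED explicitly only so the effect is reproducible: the same seed gives the same value, a different seed (like a new invocation) gives a different one.

```python
import os
import subprocess
import sys


def builtin_hash_in_new_process(text: str, seed: str) -> str:
    """Run hash(text) in a fresh interpreter with the given PYTHONHASHSEED."""
    env = {**os.environ, "PYTHONHASHSEED": seed}
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        capture_output=True, text=True, env=env, check=True,
    )
    return result.stdout.strip()


# Within one process, hash() is consistent ...
print(hash("hello") == hash("hello"))  # True
# ... but different seeds (i.e. different invocations) give different values.
print(builtin_hash_in_new_process("hello", "1"))
print(builtin_hash_in_new_process("hello", "2"))
```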
Hashing in Python
II. Hashing from hashlib module
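A minimal sketch of the hashlib module, selecting algorithms by name with hashlib.new (the helper name and the choice of algorithms are illustrative). Unlike hash(), these digests are the same in every run; SHA-256 of b"abc" is the standard test vector shown in the comment.

```python
import hashlib


def digests(message: bytes) -> dict:
    """Hex digests of one message under a few hashlib algorithms."""
    names = ["sha256", "sha512", "sha3_256", "blake2b"]
    return {name: hashlib.new(name, message).hexdigest() for name in names}


for name, value in digests(b"Hashing in Python").items():
    print(f"{name:>8}: {value}")

# hashlib is deterministic across runs, unlike the built-in hash() for str.
print(digests(b"abc")["sha256"])
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```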
Hashing in Python
III. Checksums
A checksum is a small piece of data (a generated hash value) used to
verify that the original data is as it should be.
It can be used to verify the integrity of a file.
Checksum verification computes the hash value of a file and
compares it against the expected checksum value (usually provided
by the sender of the file).
Checksums are deterministic: the same data always produces the
same result.
This lets the receiver confirm that the data has not been altered
during transmission.
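The verification steps above can be sketched with hashlib: read the file in chunks so large files do not need to fit in memory, then compare the result with the published checksum. The function names, chunk size, and the temporary "downloaded" file are assumptions for illustration.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large files never have to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path, expected_checksum, algorithm="sha256"):
    """Compare the computed checksum with the one published by the sender."""
    return hmac.compare_digest(file_checksum(path, algorithm), expected_checksum.lower())


# Usage: simulate a downloaded file and the checksum published by its sender.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "download.bin"
    path.write_bytes(b"file contents")
    published = hashlib.sha256(b"file contents").hexdigest()
    ok = verify_file(path, published)
    path.write_bytes(b"file contents, altered in transit")
    tampered_ok = verify_file(path, published)

print(ok, tampered_ok)  # True False
```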
Hashing in Python
III. Checksums
[Figure: checksum verification flowchart (labels include "Receive hash value" and "Yes")]
Hashing in Python
IV. Near-Duplicate Detection
For message digests, it is a desirable property of a hash function
that even a small change to the data produces a significantly
different hash.
If the objective is instead to find data with similar content,
near-duplicate detection is used, for example to reduce
the amount of data stored.
Some use cases, such as plagiarism detection, require identifying
subtle differences between data.
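The first property (a small change gives a very different digest) can be measured directly. This sketch, with arbitrary sample strings, counts how many of SHA-256's 256 output bits change when one character of the input changes; typically about half do, which is exactly why ordinary cryptographic hashes cannot be used to spot similar content.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count differing bits between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")


# The inputs differ in a single character ...
diff = bit_difference(b"The quick brown fox", b"The quick brown fax")
# ... yet typically around half of the 256 output bits change.
print(diff, "of 256 bits differ")
```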
Hashing in Python
IV. Near-Duplicate Detection
The Simhash Python library is needed.
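To show the idea behind SimHash without relying on the Simhash library's API, here is a minimal pure-Python sketch (not the library's implementation). Assumptions: words are the features, each word is hashed to 64 bits with BLAKE2b, and each fingerprint bit is a majority vote across words. Similar texts then get fingerprints with a small Hamming distance, unlike the avalanche behaviour of ordinary digests.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Minimal SimHash fingerprint: a bitwise majority vote over word hashes."""
    votes = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        # Hash each word to `bits` bits (BLAKE2b is an arbitrary choice here).
        digest = hashlib.blake2b(token.encode(), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    # Bit i of the fingerprint is 1 when most words have bit i set.
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


original = ("Hashing is a cryptography technique that converts a message "
            "of any length into a fixed length hash value")
edited = ("Hashing is a cryptography technique that transforms a message "
          "of any length into a fixed length hash value")
unrelated = "Checksums verify downloaded files were not altered in transit"

print(hamming_distance(simhash(original), simhash(edited)))     # typically small
print(hamming_distance(simhash(original), simhash(unrelated)))  # typically much larger
```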
Hashing in Python
V. Perceptual Hashing
This hashing method is used to detect differences in
images and video.
It can be used to avoid storing duplicate content in a
video, or to determine whether an image is close enough to
another to be considered a duplicate, saving space.
The ImageHash Python library is needed.
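ImageHash provides several perceptual hashes, including average_hash, which works on a PIL image. To show what such a hash computes without depending on Pillow, this sketch implements the average-hash idea directly, assuming the image has already been shrunk to an 8x8 grayscale grid given as lists of 0-255 values. The three test images are hypothetical: a brightened copy keeps the same hash, while a structurally different image differs in many bits.

```python
def average_hash(pixels):
    """Average hash (aHash) of a grayscale image given as rows of 0-255 values.

    Assumes the image was already shrunk to a small grid (e.g. 8x8); each bit
    is 1 when the pixel is brighter than the mean brightness of the grid.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits; 0 means the hashes are identical."""
    return bin(a ^ b).count("1")


# Hypothetical 8x8 test images.
gradient = [[4 * (8 * r + c) for c in range(8)] for r in range(8)]
brighter = [[p + 3 for p in row] for row in gradient]  # near-duplicate
checkerboard = [[255 if (r + c) % 2 == 0 else 0 for c in range(8)] for r in range(8)]

print(hamming_distance(average_hash(gradient), average_hash(brighter)))      # 0
print(hamming_distance(average_hash(gradient), average_hash(checkerboard)))  # 32
```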
Hashing in Python
V. Perceptual Hashing
[Figure: two near-duplicate images and how close their perceptual hashes are]