
Lec 11 - Hashing

Chapter 11

Hashing in Cryptography

 Hashing
 Use of Hashing
 Basic Properties of Hash Functions
 Properties of Cryptographic Hash Function
 Requirement of Cryptographic Hash Function
 Hashing vs Encryption
 Data Integrity & Data Confidentiality
 Hashing in Python
 Type of Hashing Function

Hashing
 Hashing is a cryptographic technique that converts a
message into a hash value.
 The message can be any form of data (string, image, file).
 A hash value is an alphanumeric string of fixed length
that acts as a fingerprint of the message.
 It is usually much smaller than the message.
 A hash function is an algorithm that transforms data of
arbitrary size into a fixed-size output.

Picture source: Dennis Byrne(2021). Full Stack Python Security: Cryptography, TLS, and attack resistance. Manning.

Use of Hashing
 Hashing can be used to validate files, documents and other
types of data to ensure content has not been altered.
 Hashing is used in digital signatures to authenticate the
source of a message.
 Further reading (digital signatures):
 https://www.globalsign.com/en/blog/how-do-digital-signatures-work
 https://www.youtube.com/watch?v=OKg4PqD01Z0
 It is used in computer forensics to verify the integrity
of recovered evidence and to compare two files to
determine whether they are duplicates.

Basic Properties of Hash Function


 Deterministic behavior
 For a given message, a hash function always produces the
same hash value.
 Hash function behavior is repeatable, not random.

 Fixed-length hash values
 A hash function takes a message of any length and produces
a fixed-length hash value.
 The length of the message does not affect the length of the
hash value.
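
The deterministic and fixed-length properties can be checked with a short sketch. It assumes SHA-256 from Python's standard hashlib module as the hash function; the helper name sha256_hex is made up for this illustration.

```python
import hashlib


def sha256_hex(message: bytes) -> str:
    """Return the SHA-256 hash value of a message as a hex string."""
    return hashlib.sha256(message).hexdigest()


# Two messages of very different length (1 byte and 1,000,000 bytes).
short = sha256_hex(b"a")
long = sha256_hex(b"a" * 1_000_000)

# Deterministic: hashing the same message again gives the same hash value.
print(sha256_hex(b"a") == short)

# Fixed length: both hash values are 64 hex characters (256 bits).
print(len(short), len(long))
```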



 Avalanche effect
 Any change made to a message, no matter how small, will
result in a massive change in the hash value.
 Small differences between messages result in large
differences between hash values.
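
A minimal sketch of the avalanche effect, assuming SHA-256 from hashlib. The helper differing_bits is hypothetical; it counts how many of the 256 output bits change when the last character of the message changes.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where the SHA-256 hash values of a and b differ."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")


# The two messages differ only in their last character.
changed = differing_bits(b"message", b"messagf")
print(f"{changed} of 256 bits differ")
```

For a well-behaved hash function, roughly half of the output bits are expected to flip.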

Properties of Cryptographic Hash Function
 One-way function
 Given h, it is hard to find x such that H(x) = h.
 A function is one-way if it is easy to compute and difficult to
reverse.
 If an attacker obtains a hash value, it is difficult for them to figure
out what the message was.

 Collision resistance
 A collision occurs when two different messages produce the same
hash value under the same hash function.
 Hash functions are designed to make collisions hard to find.
 A hash function is judged on how well it resists collisions.



 Weak collision resistance (also called second-preimage resistance)
 Given a message x1, it is difficult to find another message
x2 ≠ x1 such that H(x1) = H(x2).
 If an attacker has one input, it must be infeasible to identify
another input capable of producing the same output.
 Example: person A sends a message x through a channel. An
attacker should not be able to replace it with a different
message y such that H(y) = H(x).

 Strong collision resistance
 Strong collision resistance means it is difficult to find any two
different messages that hash to the same value.
 That is, it is hard to find any x1 ≠ x2 such that H(x1) = H(x2).
 Unlike weak collision resistance, the attacker gets to choose
both x1 and x2, not just x2.
 Weak collision resistance is bound to a particular given
message; strong collision resistance applies to any pair of
messages.
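
To see why output length matters for strong collision resistance, the sketch below deliberately weakens SHA-256 by keeping only its first 2 bytes (16 bits) and then searches for any two messages that collide. The truncation, the counter-based messages and the function names are assumptions made for this illustration; this is not an attack on full SHA-256.

```python
import hashlib


def truncated_hash(message: bytes, n_bytes: int = 2) -> bytes:
    """A deliberately weak hash: the first n_bytes of SHA-256."""
    return hashlib.sha256(message).digest()[:n_bytes]


def find_collision(n_bytes: int = 2):
    """Hash the messages b"0", b"1", ... until two share a truncated hash."""
    seen = {}  # truncated hash -> first message that produced it
    counter = 0
    while True:
        message = str(counter).encode()
        h = truncated_hash(message, n_bytes)
        if h in seen:
            return seen[h], message, counter + 1
        seen[h] = message
        counter += 1


x1, x2, attempts = find_collision()
print(x1, x2, attempts)
```

With only 16 bits of output a collision is guaranteed within 65,537 attempts and typically appears after a few hundred. The same generic search against a full 256-bit output is computationally infeasible.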

Requirement of Cryptographic Hash Function

Source: http://www.parkjonghyuk.net/lecture/2015-1st-lecture/networksecurity/chap11.pdf

Hashing vs Encryption
| Encryption | Hashing |
| --- | --- |
| Two-way function | One-way function |
| The operation is reversible. | The operation is irreversible. |
| The original message can be retrieved using a decryption key. | The original message cannot be retrieved. |
| The resultant encrypted string is of variable length. | The resultant hash is of fixed length. |
| The length of the encrypted string depends on the length of the input string. | The length of the hash is fixed and does not depend on the size of the input string. |
| The purpose of encryption is to ensure data confidentiality. | The purpose of hashing is to ensure data integrity. |
| Examples of encryption algorithms include AES, DES, RSA, ECC etc. | Examples of hashing algorithms include SHA-1, SHA-2, MD5, CRC etc. (CRC is a non-cryptographic checksum.) |

Data Integrity & Data Confidentiality


Data Integrity
 It refers to the accuracy and consistency (validity) of data over its
lifecycle.
 Hashing can be used to determine the integrity of the data.
 Data is hashed at a certain time and the hash value is stored.
 At a later time, the data can be hashed again and compared to
the stored hash value.
 If the hash values match, the data has not been altered.
 If the hash values do not match, the data has been altered or corrupted.
 Sensitive data should be accompanied by a cryptographic hash so that
its integrity can be verified.
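
The hash-now, compare-later procedure above can be sketched as follows. SHA-256 and the helper names fingerprint and is_unaltered are assumptions for this illustration; hmac.compare_digest is the standard-library function for comparing digests in constant time.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Hash the data; the result is what gets stored for later comparison."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, stored_hash: str) -> bool:
    """Hash the data again and compare it with the stored hash value."""
    return hmac.compare_digest(fingerprint(data), stored_hash)


original = b"Transfer 100 to account 12345"
stored = fingerprint(original)  # hashed at a certain time and stored

print(is_unaltered(original, stored))                          # same data
print(is_unaltered(b"Transfer 900 to account 12345", stored))  # altered data
```

Note that this only detects changes if the stored hash value itself cannot be modified by the attacker.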

Data Integrity
• User A logs on to public Wi-Fi to send User B an email.
• User A writes the message, signs it using their digital
certificate, and sends it.
• A man-in-the-middle attack occurs: someone intercepts the
message (public Wi-Fi is insecure) and modifies it.
• The hash value of the email changes because its content was
modified after it was digitally signed.
• User B receives the email and uses the data provided by
User A's digital signature (along with the algorithm) to
generate the hash value.
• User B compares the generated hash value with the received
hash value.
• If they match, the email is intact and unchanged.
• If they do not match, the email has been modified by someone.
Picture source: https://www.thesslstore.com/blog/what-is-a-hash-function-in-cryptography-a-beginners-guide/



Data Confidentiality
 It ensures that data exchanged is not accessible to unauthorized
users (applications, processes, other systems and/or humans).
 The more sensitive the data, the higher the level of confidentiality required.
Therefore, access to all sensitive data should always be controlled and
monitored.
 Cryptography is excellent for protecting the confidentiality of stored
and transmitted data. However, it imposes computational complexity
and increases latency, so it should be used with caution in time-
sensitive systems.

Type of Hashing Function
Safe and Secure Hash Function
 SHA-2
 SHA-3
 BLAKE2

Unsafe Hash Function


 SHA-1
 MD5


SHA-2 (Secure Hash Algorithm-2)
 The SHA-2 hash function family (SHA-224, SHA-256, SHA-384,
SHA-512) was designed by the NSA and first published by NIST in 2001.
 SHA-2 is implemented in commonly used security protocols,
such as Transport Layer Security (TLS) and Secure Sockets
Layer (SSL).
 SHA-256 is used to verify the transactions and calculate proof
of work in Bitcoin and other cryptocurrencies.
 SHA-2 is secure, safe, well-supported, and widely used today.


• A longer hash output is generally expected to give stronger security and higher collision
resistance (with some exceptions).
• Thus, SHA-512 is expected to be stronger than SHA-256: finding a collision for SHA-512 is
expected to be even less practical than for SHA-256.
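
As a rough illustration, the sketch below prints the output length of each SHA-2 variant together with the generic birthday bound: for an n-bit hash, about 2^(n/2) hash evaluations are expected before a brute-force search finds a collision. The bound describes generic search only and says nothing about attacks on a specific design; the helper output_bits is hypothetical.

```python
import hashlib


def output_bits(name: str) -> int:
    """Return the output length in bits of a hashlib algorithm."""
    return hashlib.new(name).digest_size * 8


for name in ("sha224", "sha256", "sha384", "sha512"):
    bits = output_bits(name)
    # Generic birthday bound: a collision is expected after ~2^(bits/2) hashes.
    print(f"{name}: {bits}-bit output, generic collision search ~2^{bits // 2} hashes")
```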

William Smith (2017). Statistical Analysis of the SHA-1 and SHA-2 Hash Functions



SHA-3
 It is the third generation of the SHA series of hash algorithms.
 SHA-3 is based on the Keccak family of hash algorithms, which
NIST selected in 2012 and standardized as SHA-3 in 2015.
 SHA-3 is ideal for securing embedded subsystems, sensors,
consumer electronic devices, and other systems that use
symmetric key-based message authentication codes (MACs).
 SHA-3 (SHA3-224, SHA3-256, SHA3-384, SHA3-512) is often
considered more secure than SHA-2 (SHA-224, SHA-256, SHA-384,
SHA-512) for the same hash length.
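
In Python, the SHA-3 functions are available in hashlib alongside SHA-2. A minimal sketch comparing the two for the same hash length (the message is an arbitrary example):

```python
import hashlib

message = b"message"

sha3_digest = hashlib.sha3_256(message).hexdigest()
sha2_digest = hashlib.sha256(message).hexdigest()

# Same output length (256 bits = 64 hex characters) ...
print(len(sha3_digest), len(sha2_digest))
# ... but the algorithms are unrelated, so the hash values differ.
print(sha3_digest == sha2_digest)
```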
BLAKE2
 BLAKE2 is not as popular as SHA-2 or SHA-3.
 It leverages modern CPU architecture to hash at extreme
speeds.
 BLAKE2 should be considered if large amounts of data need to
be hashed.
 BLAKE2 comes in two flavors: BLAKE2b and BLAKE2s.
 BLAKE2b is optimized for 64-bit platforms.
 BLAKE2s is optimized for 8- to 32-bit platforms.
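
Both flavors are exposed by Python's hashlib. The sketch below hashes an arbitrary 1 MB buffer; the digest_size argument is a hashlib parameter for requesting a shorter output, and the variable names are only for illustration.

```python
import hashlib

data = b"x" * 1_000_000  # stand-in for a large amount of data

b2b = hashlib.blake2b(data)  # 64-bit platforms; 64-byte digest by default
b2s = hashlib.blake2s(data)  # 8- to 32-bit platforms; 32-byte digest by default

# BLAKE2 can also produce a shorter digest on request.
b2b_short = hashlib.blake2b(data, digest_size=16)

print(b2b.digest_size, b2s.digest_size, b2b_short.digest_size)
print(b2b_short.hexdigest())
```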


MD5 (Message-Digest Algorithm 5)
 MD5 is an obsolete 128-bit hash function developed in the
early 1990s.
 This is one of the most used hash functions of all time.
 Unfortunately, MD5 is still in use even though researchers have
demonstrated MD5 collisions as far back as 2004.
 Today cryptanalysts can generate MD5 collisions on commodity
hardware in less than an hour.

SHA-1
 SHA-1 is an obsolete 160-bit hash function developed by the
NSA in the mid-1990s.
 Like MD5, this hash function was popular at one time but it is
no longer considered secure.
 The first SHA-1 collision was announced in 2017 by a
collaboration between Google and Centrum Wiskunde &
Informatica, a research institute in the Netherlands.
 In theoretical terms, this effort stripped SHA-1 of strong
collision resistance, not weak collision resistance.

Hashing Module in Python


 Python supports cryptographic hashing natively.
 The hashlib module exposes everything most programmers need for
cryptographic hashing.
 The algorithms_guaranteed set contains every hash function that is
guaranteed to be available on all platforms.
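
A quick way to inspect this set. The sketch also uses hashlib.algorithms_available, which is not mentioned above: it lists what the running interpreter actually provides and may include extra algorithms from the underlying OpenSSL build.

```python
import hashlib

# Hash functions guaranteed to exist on every platform.
print(sorted(hashlib.algorithms_guaranteed))
print("sha256" in hashlib.algorithms_guaranteed)

# Algorithms that happen to be available here but are not guaranteed elsewhere.
extras = hashlib.algorithms_available - hashlib.algorithms_guaranteed
print(sorted(extras))
```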
Hashing in Python
I. Built-In Hashing

 The built-in hash() function returns a different result for the same
string in each new Python invocation.
 Python has never guaranteed that hash() is deterministic across
invocations, so it is not suitable for cryptographic hashing.
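
This behavior can be reproduced by starting fresh interpreters from a script. The sketch below is one possible way to do it: it relies on the PYTHONHASHSEED environment variable, which controls hash randomization for strings, and the helper hash_in_new_interpreter is hypothetical.

```python
import os
import subprocess
import sys


def hash_in_new_interpreter(text: str, seed: str) -> str:
    """Start a fresh Python process and return hash(text) as printed by it."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# With randomization on, each invocation almost certainly prints a different value.
print(hash_in_new_interpreter("message", "random"))
print(hash_in_new_interpreter("message", "random"))

# Only a fixed seed makes the built-in hash() repeatable across invocations.
print(hash_in_new_interpreter("message", "1") == hash_in_new_interpreter("message", "1"))
```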

II. Hashing from hashlib module
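
A minimal sketch of hashing with hashlib, assuming SHA-256 and an arbitrary example message. It shows both the one-shot form and the incremental update() form, which is useful when the message arrives in pieces.

```python
import hashlib

# One-shot: pass the whole message (as bytes) to the constructor.
one_shot = hashlib.sha256(b"message").hexdigest()

# Incremental: feed the message in pieces with update().
hash_function = hashlib.sha256()
hash_function.update(b"mess")
hash_function.update(b"age")
incremental = hash_function.hexdigest()

print(one_shot)
print(incremental == one_shot)  # both forms hash the same bytes
```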

III. Checksums
 A checksum is a small piece of data (a generated hash value) used to
verify that the original data is as it should be.
 It can be used to verify the integrity of a file.
 Checksum verification computes the hash value of a file and
compares it against the expected checksum (usually provided
by the sender of the file).
 Checksums are deterministic: the same data returns the
same result each time.
 They help ensure the data has not been altered during transmission.
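
A sketch of checksum verification for a file, assuming SHA-256 as the checksum algorithm. The function name file_checksum, the chunk size and the temporary file are made up for this illustration; in practice the expected value would come from the sender of the file.

```python
import hashlib
import os
import tempfile


def file_checksum(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Stand-in for a received file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"file contents")
    path = tmp.name

# Stand-in for the checksum published by the sender.
expected = hashlib.sha256(b"file contents").hexdigest()

matches = file_checksum(path) == expected
print("checksum match:", matches)
os.remove(path)
```

Reading in chunks keeps memory use constant, so the same function works for very large files.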

Figure (flowchart): receive the file and the hash value, verify the data by computing its checksum, then check whether the checksums match (Yes / No).

IV. Near-Duplicate Detection
 It is a good property of a hash function to generate
significantly different hashes with small changes to the
data, especially for message digests.
 If the objective is to find data with similar content, a
near-duplicate detection hash is used instead; it can reduce
the amount of data stored.
 Some use cases, such as plagiarism detection, require
identifying subtle differences between data.
 The Simhash Python library is needed.

pip install simhash
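
The idea behind SimHash can be sketched with the standard library alone. The code below is a simplified illustration, not the implementation used by the simhash package: it splits text into words, hashes each word, and lets the words vote on every bit of a 64-bit fingerprint, so similar texts tend to get fingerprints that differ in few bits. The function names, the word-level features and the use of SHA-256 for the per-word hash are all assumptions.

```python
import hashlib


def simhash(text: str, bits: int = 64) -> int:
    """Simplified SimHash: every word votes +1 or -1 on each fingerprint bit."""
    weights = [0] * bits
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode()).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when more words voted 1 than 0.
    return sum(1 << i for i, w in enumerate(weights) if w > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


doc1 = simhash("the quick brown fox jumps over the lazy dog")
doc2 = simhash("the quick brown fox jumped over the lazy dog")
doc3 = simhash("hash functions map data of arbitrary size to fixed size values")

print(hamming_distance(doc1, doc2))  # near-duplicates: usually a small distance
print(hamming_distance(doc1, doc3))  # unrelated text: usually a larger distance
```

A small Hamming distance between two fingerprints flags the texts as near-duplicates; the threshold is application-specific.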

V. Perceptual Hashing
 This hashing method is used to detect differences in
images and video.
 It can be used to avoid storing duplicate content in a
video, or to determine whether an image is close enough to
another to be considered a duplicate, saving space.
 The ImageHash Python library is needed.

pip install ImageHash
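
One of the simplest perceptual hashes is the average hash. The sketch below is a pure-Python illustration of the idea, not the ImageHash library's code: it assumes the image has already been reduced to a small grid of grayscale values (a real library also resizes and converts the image first), and the function names are hypothetical.

```python
def average_hash(pixels):
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    total = sum(flat)
    # Compare p * n with the total to avoid floating-point division.
    return [1 if p * len(flat) > total else 0 for p in flat]


def hamming_distance(h1, h2):
    """Number of positions where two hashes differ."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))


# An 8x8 grayscale gradient stands in for a downscaled image.
image = [[row * 8 + col for col in range(8)] for row in range(8)]
brighter = [[p + 10 for p in row] for row in image]  # same picture, brighter
inverted = [[63 - p for p in row] for row in image]  # a very different picture

print(hamming_distance(average_hash(image), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(image), average_hash(inverted)))  # 64
```

Unlike a cryptographic hash, a small change to the image (here, uniform brightening) leaves the perceptual hash unchanged, which is what makes near-duplicate images detectable.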


Figure: two near-duplicate images and how close their perceptual hashes are.
