Chapter 10: Hash Tables
Main Points
Introduction
Hash Table, Hash Function, Collisions
Handling Collisions
Exercise
Let’s Look at this
Hash Tables: An introduction
Goal: Is it possible to design a search of O(1), i.e. a constant search time no matter where the element is located in the list?
Example: Consider the list of employees in a small company. Each of the 100 employees has an ID number in the range 0-99. If we store the elements (employee records) in an array, then each employee's ID number is an index to the array element where the record is stored.
There is a one-to-one correspondence between the element key and the array index.
What if the company wants to use a 5-digit ID number as the primary key?
Compare the size of the array with the number of employees.
One solution to the previous problem
What if we keep the array size down to the size that we will actually be using (100 elements) and use just the last two digits of the key to identify each employee?
For example, the employee with key 45678 will be stored in the array element with index 78, and the employee with key 23456 will be stored at array index 56.
We are looking for a way to convert a 5-digit number into a two-digit array index: some function needs to do the transformation: a hash function that lets us use a hash table (an array).
Hash Table, Hash Function, Collisions
A Hash Table is a data structure in which keys are mapped to array positions by a hash function.
A Hash Function is a function which, when applied to the key, produces an integer which can be used as an address (index) in the hash table.
For our previous example:
int HashFunction(int id_number) {
    return id_number % table_size;
}
Hash Table, Hash Function, Collisions: continued…
Suppose that the hash table contains the records of two employees with IDs 45678 and 23456, respectively.
We need to add another employee whose ID is 34878!
The array element with index 78 already holds a value.
When more than one element tries to occupy the same array position, we have a collision.
A collision is the condition resulting when two or more keys produce the same hash location.
Issues surrounding Hashing
The big challenge: which HF to use? It should be:
Easy to compute. If the hash algorithm is too inefficient, it will overshadow the advantages of the technique.
Able to distribute entries uniformly through the HT slots.
Able to minimize collisions.
Other examples
Implementing a Dictionary:
HF: Sum the ASCII codes of the letters, then take mod n (n is the HT size); see the sketch after this list.
raw and war would then have the same hash value!
How to implement a spelling checker?
Create a HT for all the words in a dictionary.
When you encounter a word whose spelling you want to check, just hash it and see if it exists in the table.
If it does, then you've spelled it correctly.
If not, then you haven't.
This allows you to look up a word in O(1) time rather than O(n) time, which, in a dictionary on the order of 800,000 words, is a big time saver.
Cryptography: use a hash function to protect your passwords (store the hash rather than the password itself).
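A rough C++ sketch of the ASCII-sum hash above (the table size 101 and the name asciiSumHash are illustrative, not from the slides); it shows why anagrams such as raw and war collide:

#include <iostream>
#include <string>

// Illustrative table size; any n gives the same anagram collision.
const int TABLE_SIZE = 101;

// Sum the ASCII codes of the letters, then take the result mod n.
int asciiSumHash(const std::string& word) {
    int sum = 0;
    for (char c : word) sum += static_cast<int>(c);
    return sum % TABLE_SIZE;
}

int main() {
    // "raw" and "war" contain the same letters, so their ASCII sums
    // are equal and they hash to the same slot: a collision.
    std::cout << asciiSumHash("raw") << "\n";
    std::cout << asciiSumHash("war") << "\n";   // same value as above
}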
Hash Function: Examples
HFs generally take records whose key values come from a large range and store those records in a HT with a relatively small number of slots; how well this works depends a lot on the key set.
Some HF examples:
Division:
F(x) = x mod m; the best value for m is a prime number.
Mid-Square:
Take the middle K digits of x².
Folding (sketched below):
Given a key x1x2…xr, split into two-digit pieces:
F1(x1x2…xr) = x1x2 + x3x4 + … + xr-1xr
F2(x1x2…xr) = x2x1 + x4x3 + … + xrxr-1
Example: x = 251367
F1(x) = 25 + 13 + 67 = 105
F2(x) = 52 + 31 + 76 = 159
Truncating:
F(x) = the last K digits of x, or the first K digits.
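A minimal C++ sketch of the folding scheme F1 above, assuming the key is split into two-digit pieces from the right (the name foldHash is illustrative):

#include <iostream>

// Folding: split the decimal key into two-digit pieces and add them up.
// With x = 251367 the pieces are 25, 13 and 67.
int foldHash(int x) {
    int sum = 0;
    while (x > 0) {
        sum += x % 100;   // take the last two digits
        x /= 100;         // drop them
    }
    return sum;
}

int main() {
    std::cout << foldHash(251367) << "\n";  // prints 105 (= 25 + 13 + 67)
}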
Summarizing what we have learnt so far…
To add or retrieve an element from the hash table:
Algorithm to add:
Add(key, value) {
    index = hash(key);
    hashTable[index] = value;
}
Algorithm to get a value:
dataT getValue(key) {
    index = hash(key);
    return hashTable[index];
}
Will these algorithms always work? We can make them work if we know all possible search keys, an appropriate table size, and a perfect hash function.
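A minimal C++ sketch of these two algorithms, assuming 5-digit keys, a 100-slot table, and a key set whose last two digits never collide (the name hashKey and the stored string values are illustrative):

#include <iostream>
#include <string>

const int TABLE_SIZE = 100;
std::string hashTable[TABLE_SIZE];

int hashKey(int key) { return key % TABLE_SIZE; }

void add(int key, const std::string& value) {
    hashTable[hashKey(key)] = value;       // safe only because no two keys collide
}

std::string getValue(int key) {
    return hashTable[hashKey(key)];        // O(1): one hash, one array access
}

int main() {
    add(45678, "Alice");
    add(23456, "Bob");
    std::cout << getValue(45678) << "\n";  // prints Alice
}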
Collision, Handling Collisions
The two ways of dealing with collisions are:
Chaining: use linked lists.
Open Addressing: linear probing, quadratic probing, and double hashing.
Problem
Consider a hash table with 10 slots (indices 0-9) and the following hash function:
H(x) = x² % 10
What we want is to insert the keys 1 through 9.
Solution 1: Use Separate Chaining
Colliding records are chained together in separate linked lists.
HT slots don't hold data; rather, each slot stores a pointer to its synonyms' linked list.
If a collision happens, insert into the corresponding linked list: O(1) (always insert at the head).
Search/Delete?
Drawback: use of another data structure, and a linear search through the linked lists.
Advantages of Separate Chaining
Simple collision handling.
No overflow: we can store more elements than the hash table size.
Deletion is done from the linked list.
Example
Separate Chaining: an illustration
Assume that we want to add a list of students to a hash table using their IDs. The following program shows how collisions are resolved using separate chaining (a sketch is given below).
We are using a table of 10 cells.
We are using ID % Size as the hash function.
Implement this code.
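A minimal C++ sketch of the illustration described above, assuming a student record holds only an ID and a name (the sample IDs and names are made up):

#include <iostream>
#include <string>

const int SIZE = 10;

struct Student {
    int id;
    std::string name;
    Student* next;          // link to the next synonym in the chain
};

Student* table[SIZE] = {nullptr};   // each slot heads a linked list

int hashId(int id) { return id % SIZE; }   // the hash function ID % Size

// Insert at the head of the chain: O(1); collisions simply lengthen the list.
void insert(int id, const std::string& name) {
    int slot = hashId(id);
    table[slot] = new Student{id, name, table[slot]};
}

// Search walks the chain of the hashed slot.
Student* search(int id) {
    for (Student* p = table[hashId(id)]; p != nullptr; p = p->next)
        if (p->id == id) return p;
    return nullptr;
}

int main() {
    insert(23456, "Amal");
    insert(45676, "Badr");             // 45676 % 10 == 23456 % 10: a collision
    Student* s = search(45676);
    if (s) std::cout << s->name << " found in slot " << hashId(s->id) << "\n";
}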
Solution 2: Open addressing
Linear Probing
In this method, fi(k) is linear: fi(k) = i, so the i-th probe examines slot (h(k) + i) mod m.
Linear probing insert algorithm (a C++ sketch follows this slide):
If the table is full: error
probe = h(k)
While table[probe] is occupied:
    probe = (probe + 1) mod m
table[probe] = k
Search(k) algorithm:
Compute h(k) and look at HT[h(k)]:
If empty: the element does not exist.
If full: compare to k; if equal, return it, else:
    Loop (a 'circular' linear search) through successive slots.
    If found, return it.
    If an empty slot is found, the element does not exist.
Drawback: clustering
Elements tend to cluster around occupied slots, resulting in very long probe sequences. A solution: quadratic probing.
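A minimal C++ sketch of the insert and search algorithms above, assuming integer keys, a 10-slot table, and -1 as the empty marker (all illustrative choices):

#include <iostream>

const int M = 10;
const int EMPTY = -1;
int table[M] = {EMPTY, EMPTY, EMPTY, EMPTY, EMPTY,
                EMPTY, EMPTY, EMPTY, EMPTY, EMPTY};

int h(int k) { return k % M; }

// Probe h(k), h(k)+1, h(k)+2, ... (mod m) until a free slot is found.
bool insert(int k) {
    int probe = h(k);
    for (int i = 0; i < M; ++i) {               // at most m probes, else the table is full
        if (table[probe] == EMPTY) { table[probe] = k; return true; }
        probe = (probe + 1) % M;
    }
    return false;                               // table is full
}

// Circular search: stop at the key, at an empty slot, or after m probes.
int search(int k) {
    int probe = h(k);
    for (int i = 0; i < M; ++i) {
        if (table[probe] == EMPTY) return -1;   // key cannot be in the table
        if (table[probe] == k) return probe;    // found: return its slot
        probe = (probe + 1) % M;
    }
    return -1;
}

int main() {
    insert(18); insert(58);           // both hash to 8; 58 lands in slot 9
    insert(19);                       // hashes to 9 (taken), wraps to slot 0
    std::cout << search(19) << "\n";  // prints 0
}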
Consider the following example:
H(x) = x % 10
F(i) = i
Find 58 (# tries?)
Insert 19
Find 19 (# tries?)
Linear probing: drawbacks
As long as the table is big enough, an empty cell can always be found, but the time to do so can get quite large.
Moreover, even if the table is relatively empty, blocks of occupied cells start forming:
primary clustering.
Linear Probing: Search Analysis
We want to compute the average number of probes for a successful and an unsuccessful search in this hash table, where H(x) = x mod 11 and the keys 20, 30, 2, 13, 25, 24, 10, 9 were inserted with linear probing:
slot 0: 9, slot 1: empty, slot 2: 2, slot 3: 13, slot 4: 25, slot 5: 24, slot 6: empty, slot 7: empty, slot 8: 30, slot 9: 20, slot 10: 10
Case 1: Successful search for 20, 30, 2, 13, 25, 24, 10, 9
Avg = (1 + 1 + 1 + 2 + 2 + 4 + 1 + 3) / 8 = 15/8
(fewer than two probes per search)
Case 2: Unsuccessful search: we are searching for the keys 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Avg = (2 + 1 + 1 + 4 + 3 + 2 + 1 + 1 + 5 + 3 + 1) / 11 = 24/11
Solution 3: Quadratic Probing
Eliminates primary clustering by probing more widely separated slots.
fi(k) = i² (the probe increment is quadratic rather than linear).
If a collision happens at HT[h(k)], look successively at h(k)+1², h(k)+2², … (mod m) till an empty cell is found.
Example
H(x) = x mod 10
Insert: 3, 5, 13, 24, 33, 45, 54 (a sketch of this insertion sequence follows).
Where does 54 go?
Resulting table before placing 54: slot 3: 3, slot 4: 13, slot 5: 5, slot 6: 45, slot 7: 33, slot 8: 24 (slots 0, 1, 2 and 9 are empty).
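A minimal C++ sketch of quadratic probing for this example (the empty marker -1 and the fixed limit of m probes are illustrative simplifications):

#include <iostream>

const int M = 10;
const int EMPTY = -1;
int table[M];

// Probe h(k), h(k)+1^2, h(k)+2^2, ... (mod m) until a free slot is found.
bool insert(int k) {
    int home = k % M;
    for (int i = 0; i < M; ++i) {
        int probe = (home + i * i) % M;
        if (table[probe] == EMPTY) { table[probe] = k; return true; }
    }
    return false;   // gave up after m probes
}

int main() {
    for (int i = 0; i < M; ++i) table[i] = EMPTY;
    int keys[] = {3, 5, 13, 24, 33, 45, 54};
    for (int k : keys) insert(k);
    for (int i = 0; i < M; ++i)
        std::cout << i << ": " << table[i] << "\n";   // 54 ends up in slot 0
}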
Solution 4: Double Hashing
Avoids both primary and secondary clustering.
Idea:
The probe step should depend on the key instead of being the same for all keys.
Use another hash function: the increment is defined by a second function.
The second HF should:
Depend on the key.
Be different from the first! Why?
Never return zero.
Double Hashing Insert Algorithm
If the table is full: error
probe = h1(k), offset = h2(k)
While table[probe] is occupied:
    probe = (probe + offset) mod m
table[probe] = k
The probe sequence goes to probe, probe + offset, probe + 2*offset, probe + 3*offset, …
Double Hashing (cont.)
Ideal second functions are of this form:
h2(key) = Const - (key % Const)
where Const is a prime number less than the HT size.
Example (Const = 5) is sketched below.
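A minimal C++ sketch of double hashing with h1(x) = x % 10 and h2(x) = 5 - (x % 5) (Const = 5 as above; the 10-slot table and empty marker are illustrative):

#include <iostream>

const int M = 10;        // in practice m itself should be prime so all slots can be reached
const int EMPTY = -1;
int table[M];

int h1(int k) { return k % M; }
int h2(int k) { return 5 - (k % 5); }   // never returns zero

// Probe h1(k), h1(k)+h2(k), h1(k)+2*h2(k), ... (mod m).
bool insert(int k) {
    int probe = h1(k), offset = h2(k);
    for (int i = 0; i < M; ++i) {
        if (table[probe] == EMPTY) { table[probe] = k; return true; }
        probe = (probe + offset) % M;
    }
    return false;   // no free slot reached
}

int main() {
    for (int i = 0; i < M; ++i) table[i] = EMPTY;
    insert(15); insert(25);   // both hash to slot 5; 25 then probes (5 + 5) % 10 = 0
    std::cout << table[5] << " " << table[0] << "\n";   // prints 15 25
}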
Illustration of linear probing and double hashing
Load Factor and Hash Tables
Analysis of Separate Chaining
Load factor λ definition:
the ratio of the number of elements (N) in a hash table to the hash table size,
i.e. λ = N / TableSize.
The average length of a chain is also λ.
For chaining, λ is not bounded by 1; it can be > 1 (e.g. the hash table size is 10 but N = 100, so λ = 10).
So, to search for or delete an element: the time to compute the hash function plus the length of the chain, i.e. λ, so search and delete are O(1 + λ) = O(λ).
Separate Chaining Performance
Search cost is proportional to the length of the chain.
Worst case: all keys hash to the same chain.
When the hash table is too large:
many empty slots.
When the hash table is too small:
you end up having long chains.
Linear Probing Performance
Insert and search costs depend on the length of the cluster.
The average length of a cluster is λ.
Worst case: all keys hash to the same cluster.
When the size of the array is too large:
many HT entries will be empty.
When the size is too small:
clusters!
Typical choice: size = 2 * N_elements.
Analysis of double hashing and quadratic probing
Remember, the load factor λ = (number of elements in the HT) / (HT size).
This means 1 - λ represents the fraction of locations in the HT that are empty.
So the expected number of probes to find an empty location (i.e. an unsuccessful search) is 1 / (1 - λ); for example, at λ = 0.5 that is 2 probes, and at λ = 0.9 it is 10 probes.
Even though double hashing avoids the clustering of linear probing and quadratic probing, its estimated efficiency was proved to be the same as that of quadratic probing.
Hash Tables: A Summary
A hash table is based on an array
The range of key values is usually greater than the
size of the array
A key value is hashed to an array index by a hash
function
The hashing of a key to an already filled cell is called a
collision
Collisions can be handled using open addressing or separate chaining.
In open addressing, data items that hash to a full array cell are placed in another cell in the array.
In separate chaining, each array element consists of a linked list.
Hash Tables: A Summary
In linear probing the step size is always 1
The number of tries required to find an item is
called the probe length
In linear probing, contiguous sequences of filled cells appear: primary clustering.
Quadratic probing eliminates primary
clustering but suffers from less severe
secondary clustering
In double hashing the step depends on the
key and is obtained from a second hash
function