Chapter - 5 (Graphs and Hashing)
topic: graphs
a graph is a non-linear data structure, written G(V, E), where
V: set of vertices
E: set of edges
example (figure): four vertices A, B, C, D connected by edges e1 through e5
V: {A, B, C, D}
E: {e1, e2, e3, e4, e5}
graphs can be dense (many edges) or sparse (few edges).
characteristics (adjacency matrix):
(i) space complexity:
- O(n²) irrespective of the number of edges.
- takes a lot of space for sparse graphs (graphs with fewer edges).
a bidirectional edge represents a connection between two vertices in both directions. in an undirected graph,
the edge is symmetric.
characteristics:
(i) no direction: edge u → v implies both u → v and v → u.
example:
graph (figure): vertices 1, 2, 3, 4 with edges 1-2, 1-3, 1-4, 2-4, 3-4
is there any edge from 1 to 1? no — do not count self loops, so the diagonal stays 0.
adjacency matrix:
        1  2  3  4
    1   0  1  1  1
A = 2   1  0  0  1
    3   1  0  0  1
    4   1  1  1  0
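the matrix above can be reproduced in code. a minimal sketch in Python, with the edge list read off the figure:

```python
# build the 4x4 adjacency matrix for the undirected graph above
# (edges read off the figure: 1-2, 1-3, 1-4, 2-4, 3-4)
n = 4
edges = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]

A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u - 1][v - 1] = 1   # undirected: set both directions
    A[v - 1][u - 1] = 1

# "is there any edge from 1 to 1?" -> no, self loops are not counted
print(A[0][0])            # 0
print(A[0][1])            # 1 (edge 1-2 exists)
```

note that the matrix comes out symmetric, which is exactly the "no direction" characteristic.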
a directional edge (or directed edge) represents a connection between two vertices in a specific direction. in a
directed graph, edges have an orientation, meaning they go from one vertex to another.
characteristics:
(i) direction matters: edge u → v does not imply v → u.
example:
graph (figure): vertices 1, 2, 3, 4 with edges 1→3, 1→4, 2→1, 2→4, 4→2, 4→3
is there any edge from 1 to 2? no — A[1][2] = 0 (and do not count self loops).
adjacency matrix:
        1  2  3  4
    1   0  0  1  1
A = 2   1  0  0  1
    3   0  0  0  0   ← row 3 is all zeros because there is no edge going out from 3
    4   0  1  1  0
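the directed case only differs in that each edge sets a single matrix entry. a minimal sketch, with the edge list read off the matrix above:

```python
# adjacency matrix for the directed graph above
# (edges read off the matrix: 1->3, 1->4, 2->1, 2->4, 4->2, 4->3)
n = 4
edges = [(1, 3), (1, 4), (2, 1), (2, 4), (4, 2), (4, 3)]

A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u - 1][v - 1] = 1   # directed: only u -> v, not v -> u

# "is there any edge from 1 to 2?" -> no
print(A[0][1])            # 0
# row 3 is all zeros because no edge goes out from 3
print(sum(A[2]))          # 0
```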
characteristics (adjacency list):
(i) space complexity:
- O(V + E) where V is the number of vertices and E is the number of edges.
- much more space-efficient for sparse graphs.
example:
adjacency list for the same undirected graph (vertices 1, 2, 3, 4):
1 → 2, 4, 3
2 → 1, 4
3 → 4, 1
4 → 3, 1, 2
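the list representation can be sketched with a dictionary of lists (the `edges` list is the same one assumed for the matrix example):

```python
from collections import defaultdict

# adjacency list for the same undirected graph (edges 1-2, 1-3, 1-4, 2-4, 3-4)
edges = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]

adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)      # undirected: record the edge at both endpoints
    adj[v].append(u)

print(dict(adj))          # {1: [2, 3, 4], 2: [1, 4], 3: [1, 4], 4: [1, 2, 3]}
```

the total storage is one list entry per edge endpoint plus one slot per vertex, which is the O(V + E) bound stated above.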
(i) visit
visiting a node means that you have reached the node during traversal and marked it as discovered.
- purpose: visiting ensures that the node is acknowledged in the traversal sequence, and any associated
action (like printing, marking, or processing) is performed.
characteristics:
- occurs when you first encounter a vertex.
- a node is often marked as "visited" to prevent revisiting.
example: in bfs, when you dequeue a node from the queue, it’s visited.
(ii) explore
exploring a node means examining its neighbors or edges to determine further traversal paths.
- purpose: exploring ensures that all the connected components or adjacent vertices of a node are considered for
traversal.
characteristics:
- involves traversing edges from the current vertex to its neighbors.
- typically, when a vertex is explored, its neighbors may be marked for a visit.
example: in dfs, when you recursively call dfs for a neighbor, you are exploring the vertex.
use case: ensures all paths and connections are fully traversed.
applications of bfs
- shortest path: finding the shortest path in an unweighted graph.
- connected components: identifying all connected components in a graph.
- network flow: bfs is used in algorithms like edmonds-karp.
- ai and games: solving puzzles like mazes and shortest path problems.
example: tree
in a tree, hierarchy is maintained, allowing us to reach any node by traversing through the left subtree or
the right subtree.
tree (figure): source 1 at the root, children 2 and 3, leaves 4, 5, 6, 7 — bfs visits it level by level.
example: queue
graph (figure): source 1 with neighbours 2, 3, 4; node 5 adjacent to 4; nodes 6, 7, 8 adjacent to 5.
in graphs, there may be cycles or multiple edges leading to the same node.
revisiting nodes can lead to infinite loops or redundant work. the visited set
keeps track of nodes that have already been processed, ensuring each node is
visited only once.
trace (visited array indexed 1..8):
- start: queue = 1, visited = 0 0 0 0 0 0 0 0
- dequeue 1 and enqueue its neighbours: queue = 1 2 3 4, visited = 1 1 1 1 0 0 0 0
- dequeue 2, 3, then 4 (which enqueues 5): queue = 1 2 3 4 5, visited = 1 1 1 1 1 0 0 0
- dequeue 5 (which enqueues 6, 7, 8): queue = 1 2 3 4 5 6 7 8, visited = 1 1 1 1 1 1 1 1
level order: 1 → {2, 3, 4} → {5} → {6, 7, 8}
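the trace above can be sketched in Python; the adjacency lists below are assumptions read off the figure:

```python
from collections import deque

# bfs over the 8-node graph traced above
# (edges assumed from the figure: 1-2, 1-3, 1-4, 4-5, 5-6, 5-7, 5-8)
adj = {1: [2, 3, 4], 2: [1], 3: [1], 4: [1, 5],
       5: [4, 6, 7, 8], 6: [5], 7: [5], 8: [5]}

def bfs(source):
    visited = {source}            # the visited set prevents revisiting
    queue = deque([source])
    order = []
    while queue:
        node = queue.popleft()    # dequeuing = visiting the node
        order.append(node)
        for nb in adj[node]:      # exploring = scanning the neighbours
            if nb not in visited:
                visited.add(nb)
                queue.append(nb)
    return order

print(bfs(1))                     # [1, 2, 3, 4, 5, 6, 7, 8]
```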
key points:
(i) explores deeper into the graph:
dfs starts at the root (or any arbitrary node) and explores as deep as possible along a branch before
backtracking.
(ii) stack-based exploration:
dfs uses a stack to remember which nodes to visit next. this stack can be implicit with recursion or explicit
using an actual stack data structure.
- space complexity:
O(V), for storing visited nodes and the stack.
key points:
- dfs explores deeply into the graph, making it suitable for tasks like finding connected components,
topological sorting, and cycle detection.
- recursion is a natural fit for dfs because the call stack automatically manages the traversal process.
example:
graph (figure): the same 8-node graph — source 1 with neighbours 2, 3, 4; node 5 adjacent to 4; nodes 6, 7, 8 adjacent to 5.
(i) start at the source 1 and visit it. order so far: 1
(ii) during the exploration of 1 we found 2, so stop exploring 1 and visit the unvisited node. order: 1 2
(iii) during the exploration of 2 we found nothing, so stop exploring 2 and return back to 1.
(iv) during the exploration of 1 we found 3, so stop exploring 1 and visit the unvisited node. order: 1 2 3
(v) during the exploration of 3 we found nothing, so stop exploring 3 and return back to 1.
(vi) during the exploration of 1 we found 4, so stop exploring 1 and visit the unvisited node. order: 1 2 3 4
(vii) during the exploration of 4 we found 5, so stop exploring 4 and visit the unvisited node. order: 1 2 3 4 5
(viii) during the exploration of 5 we found 8, so stop exploring 5 and visit the unvisited node. order: 1 2 3 4 5 8
(ix) during the exploration of 8 we found nothing, so stop exploring 8 and return back to 5.
(x) during the exploration of 5 we found 7, so stop exploring 5 and visit the unvisited node. order: 1 2 3 4 5 8 7
(xi) during the exploration of 7 we found nothing, so stop exploring 7 and return back to 5.
(xii) during the exploration of 5 we found 6, so stop exploring 5 and visit the unvisited node. order: 1 2 3 4 5 8 7 6
(xiii) during the exploration of 6 we found nothing, so stop exploring 6 and return back to 5.
(xiv) during the exploration of 5 we found nothing more, so stop exploring 5 and return back to 4.
(xv) during the exploration of 4 we found nothing more, so stop exploring 4 and return back to 1.
dfs order: 1 2 3 4 5 8 7 6
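the trace above can be reproduced with a short recursive dfs. a minimal sketch, where the adjacency lists are assumptions read off the figure (the neighbour order at node 5 is taken as 8, 7, 6 to match the visiting order in the trace):

```python
# recursive dfs over the same 8-node graph
adj = {1: [2, 3, 4], 2: [1], 3: [1], 4: [1, 5],
       5: [4, 8, 7, 6], 6: [5], 7: [5], 8: [5]}

def dfs(node, visited, order):
    visited.add(node)                 # visit the node
    order.append(node)
    for nb in adj[node]:              # explore its neighbours
        if nb not in visited:
            dfs(nb, visited, order)   # go deeper before backtracking
    return order

print(dfs(1, set(), []))              # [1, 2, 3, 4, 5, 8, 7, 6]
```

the call stack plays the role of the explicit stack: each return from a recursive call is one of the "return back to" steps in the trace.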
topic: hashing
hashing is a technique used to map data (keys) to fixed-size values, typically integers, through a hash
function. it is commonly used for data retrieval and allows for efficient lookups, insertions, and deletions.
(i) hash function:
- a function that maps a key to an index in the table.
(ii) hash table:
- a data structure that uses a hash function to store data in an array-like structure.
- the index at which a key-value pair is stored is determined by the hash value.
(iii) collision:
- occurs when two different keys hash to the same index in the hash table.
- collision resolution techniques are used to handle this, such as chaining or open addressing.
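chaining can be sketched as follows; the bucket-of-pairs layout and the `insert` helper here are illustrative choices, not the only way to do it:

```python
# separate chaining: each slot holds a list of (key, value) pairs
m = 10
table = [[] for _ in range(m)]

def insert(key, value):
    bucket = table[key % m]           # assumed hash function: key mod m
    for i, (k, _) in enumerate(bucket):
        if k == key:                  # update an existing key in place
            bucket[i] = (key, value)
            return
    bucket.append((key, value))       # colliding keys share the chain

insert(12, "a")
insert(42, "b")                       # 42 mod 10 == 2, chains with 12
print(table[2])                       # [(12, 'a'), (42, 'b')]
```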
types of hashing:
(i) direct hashing:
simple hash function that directly maps keys to an index (e.g., the key itself is used as the index).
applications of hashing:
(i)searching and retrieval:
provides constant time complexity O(1) for average-case search, insertion, and deletion.
(ii) caching:
used to quickly retrieve frequently accessed data from a cache.
time complexity:
(i) search, insert, and delete:
- average case: O(1), assuming the hash function distributes keys uniformly and minimizes collisions.
- worst case: O(n), if all keys hash to the same index, resulting in a chain or a full probe sequence.
example:
keys: 12, 18, 15, 14, 13, 29, 31, 57
m = 10
h(12) = 12 mod 10 = 2
h(18) = 18 mod 10 = 8
h(15) = 15 mod 10 = 5
h(14) = 14 mod 10 = 4
h(13) = 13 mod 10 = 3
h(29) = 29 mod 10 = 9
h(31) = 31 mod 10 = 1
h(57) = 57 mod 10 = 7
resulting table:
0: —    1: 31   2: 12   3: 13   4: 14
5: 15   6: —    7: 57   8: 18   9: 29
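the table above can be filled in a few lines of code; for these particular keys every slot is distinct, so no collision handling is needed yet:

```python
# reproducing the table above with h(k) = k mod 10
keys = [12, 18, 15, 14, 13, 29, 31, 57]
m = 10

table = [None] * m
for k in keys:
    table[k % m] = k      # no two keys share an index here

print(table)              # [None, 31, 12, 13, 14, 15, None, 57, 18, 29]
```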
example:
h(12) = 12 mod 10 = 2
h(23) = 23 mod 10 = 3
h(42) = 42 mod 10 = 2
12 and 42 map to the same location, which causes a "collision".
linear probing
steps:
- when a key hashes to an index i and that index is already occupied, the algorithm checks the next index
(i + 1) % table_size.
- this process continues until an empty slot is found or the table is full.
pros:
- simple and easy to implement.
- good for small-sized hash tables or low load factors.
cons:
primary clustering: consecutive keys may cause primary clustering, where groups of consecutive occupied
slots form, making future lookups slower.
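the steps above can be sketched as follows (the table size and keys are borrowed from the worked example further down; the `insert` helper is illustrative):

```python
# linear probing: on a collision at index i, try (i + 1) % m, (i + 2) % m, ...
m = 12
table = [None] * m

def insert(key):
    i = key % m
    for step in range(m):             # at most m probes
        j = (i + step) % m            # wrap around circularly
        if table[j] is None:
            table[j] = key
            return j
    raise RuntimeError("table is full")

for k in [31, 26, 43, 46, 58]:
    insert(k)

print(table.index(43))    # 8: 43 mod 12 == 7 collides with 31 and probes to 8
```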
why mod n?
m = 10, indices 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
let h(k1) = 4 → collision
h(k, 1) = h(k1) + 1 = 4 + 1 = 5 (already filled)
h(k, 2) = h(k1) + 2 = 4 + 2 = 6 (already filled)
h(k, 3) = h(k1) + 3 = 4 + 3 = 7 (already filled)
h(k, 4) = h(k1) + 4 = 4 + 4 = 8 (already filled)
h(k, 5) = h(k1) + 5 = 4 + 5 = 9 (already filled)
h(k, 6) = h(k1) + 6 = 4 + 6 = 10 (doesn't exist)
cells 0 through 3 were not filled, but we never checked them. to check them we need to move circularly
through the table, and that is why we take the probe index mod n.
example:
keys: 31, 26, 43, 27, 34, 46, 14, 58, 13
m = 12
method: use the simple hash function h(k) = k mod n; if a collision occurs, use the collision
function h(k, i) = (h(k) + i) mod n.
(i) insert 31: h(31) = 31 mod 12 = 7
(ii) insert 26: h(26) = 26 mod 12 = 2
(iii) insert 43: h(43) = 43 mod 12 = 7 → collision
    h(43, 1) = (7 + 1) mod 12 = 8
(iv) insert 27: h(27) = 27 mod 12 = 3
(v) insert 34: h(34) = 34 mod 12 = 10
(vi) insert 46: h(46) = 46 mod 12 = 10 → collision
    h(46, 1) = (10 + 1) mod 12 = 11
(vii) insert 14: h(14) = 14 mod 12 = 2 → collision
    h(14, 1) = (2 + 1) mod 12 = 3 → still occupied
    h(14, 2) = (2 + 2) mod 12 = 4
(viii) insert 58: h(58) = 58 mod 12 = 10 → collision
    h(58, 1) = (10 + 1) mod 12 = 11 → still occupied
    h(58, 2) = (10 + 2) mod 12 = 12 mod 12 = 0
(ix) insert 13: h(13) = 13 mod 12 = 1
resulting table:
0: 58   1: 13   2: 26   3: 27   4: 14   5: —
6: —    7: 31   8: 43   9: —    10: 34  11: 46
summary (key → home index → final index after probing):
31 → 7; 26 → 2; 43 → 7 → 8; 27 → 3; 34 → 10; 46 → 10 → 11; 14 → 2 → 3 → 4; 58 → 10 → 11 → 0; 13 → 1
problem:
primary clustering: consecutive keys may cause primary clustering, where groups of consecutive occupied
slots form, making future lookups slower.
quadratic probing: on a collision, probe (h(k) + i²) mod n for i = 1, 2, 3, ...
pros:
- reduces primary clustering compared to linear probing.
- more evenly distributes keys in the table.
cons:
- still may suffer from secondary clustering (keys with the same initial hash follow the same probe sequence).
- requires careful handling of table size to avoid infinite loops (if the table is nearly full).
example:
keys: 24, 17, 32, 2, 13
m = 11
(i) insert 24: h(24) = 24 mod 11 = 2
(ii) insert 17: h(17) = 17 mod 11 = 6
(iii) insert 32: h(32) = 32 mod 11 = 10
(iv) insert 2: h(2) = 2 mod 11 = 2 → collision
    collision function: h(k, i) = (h(k) + i²) mod n
    h(2, 1) = (2 + 1²) mod 11 = 3 mod 11 = 3
(v) insert 13: h(13) = 13 mod 11 = 2 → collision
    h(13, 1) = (2 + 1²) mod 11 = 3 → still occupied
    h(13, 2) = (2 + 2²) mod 11 = 6 → still occupied
    h(13, 3) = (2 + 3²) mod 11 = 11 mod 11 = 0
resulting table:
0: 13   1: —    2: 24   3: 2    4: —    5: —
6: 17   7: —    8: —    9: —    10: 32
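a minimal quadratic-probing sketch for these keys; note that 13 has home slot 13 mod 11 = 2, so its probes land on 3 and 6 (both taken) before (2 + 3²) mod 11 = 0:

```python
# quadratic probing: probe (h(k) + i*i) % m for i = 0, 1, 2, 3, ...
m = 11
table = [None] * m

def insert(key):
    h = key % m
    for i in range(m):
        j = (h + i * i) % m           # i = 0 is the home slot
        if table[j] is None:
            table[j] = key
            return j
    raise RuntimeError("no free slot found")

for k in [24, 17, 32, 2, 13]:
    insert(k)

print(table.index(13))    # 0: slots 2, 3 and 6 were already occupied
```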
keys: 24, 2, 13
m = 11
(i) insert 24: h(24) = 24 mod 11 = 2
(ii) insert 2: h(2) = 2 mod 11 = 2 → collision
(iii) insert 13: h(13) = 13 mod 11 = 2 → collision
keys that are hashed to the same location follow the same resolution path, because of which we are not able to
utilize the table efficiently.
in spite of almost 50% available space, we may fail to insert a new element: with m = 11 the probe sequence
(h(k) + i²) mod 11 only ever reaches 6 of the 11 slots.
collision occurred
collision function (double hashing):
h(k, i) = (h(k) + i · h'(k)) mod n
h'(k) = 7 - (k mod 7)
- h'(k) never generates 0, so the probe sequence always advances.
problem:
overhead, because of two hash functions.
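a minimal double-hashing sketch using the two functions above; with m = 11, the keys 24, 2 and 13 all hash to 2, but their second hashes differ, so they take different probe paths:

```python
# double hashing: probe (h(k) + i * h2(k)) % m, with h2(k) = 7 - (k % 7)
m = 11
table = [None] * m

def h2(key):
    return 7 - (key % 7)              # never 0, so the probe always moves

def insert(key):
    h = key % m
    for i in range(m):
        j = (h + i * h2(key)) % m
        if table[j] is None:
            table[j] = key
            return j
    raise RuntimeError("no free slot found")

for k in [24, 2, 13]:
    insert(k)

print([table.index(k) for k in [24, 2, 13]])   # [2, 7, 3]
```

unlike quadratic probing, colliding keys no longer share one resolution path, which is exactly what removes the secondary clustering shown in the previous example.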