Hashing and Graphs
In data structures, there are several searching techniques, such as linear search,
binary search and search trees.
In these techniques, the time taken to search for a particular element depends on the
total number of elements.
Example-
As the number of elements increases, the time taken to perform the search also
increases (linearly for linear search, logarithmically for binary search).
This becomes problematic when the total number of elements becomes very large.
Advantage-
Unlike other searching techniques, hashing aims to make the search time independent
of the total number of elements.
An array data structure called a hash table is used to store the data items.
Data items are inserted into the hash table at the position given by their hash key value.
A hash table is defined as follows.
A hash table is an array into which each key (data item) is mapped with the help of
a hash function, so that insertion, deletion and search operations are performed in
constant time, i.e. O(1), on average.
The basic concept of hashing and the hash table is shown in the following figure.
1. Division method
In this method the hash function is the remainder of a division. For
example, suppose the records 52, 68, 99, 84 are to be placed in a hash table
of size 10.
Then:
h(key) = record % table size
h(52) = 52 % 10 = 2
h(68) = 68 % 10 = 8
h(99) = 99 % 10 = 9
h(84) = 84 % 10 = 4
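The division method above can be sketched in C as follows. This is only a minimal illustration: the function name division_hash is a hypothetical name chosen for this example, and keys are assumed to be non-negative integers.

```c
#include <assert.h>

/* Division-method hash: the slot is the remainder of key / table_size.
   division_hash is an illustrative name, not taken from the notes. */
unsigned division_hash(unsigned key, unsigned table_size) {
    assert(table_size > 0);   /* a table needs at least one slot */
    return key % table_size;
}
```

Calling division_hash(52, 10) returns 2 and division_hash(99, 10) returns 9, matching the values worked out by hand.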
H(key) = 124 + 655 + 12 = 791
Collision
A collision is a situation in which the hash function returns the same hash key
for more than one record. Resolving a collision may in turn lead to an
overflow condition, and a hash function that causes frequent collisions and
overflows is a poor hash function.
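Collision resolution technique
A collision occurs when the hash value of a new key maps to an occupied bucket of
the hash table.
Collision resolution techniques are classified as separate chaining and open addressing.
Separate Chaining-
The idea is to make each cell of the hash table point to a linked list of records that
have the same hash function value.
Let us consider a simple hash function “key mod 7” and the sequence of keys
50, 700, 76, 85, 92, 73, 101.
Keys 50, 85 and 92 all hash to slot 1 and keys 73 and 101 hash to slot 3, so each of
those slots holds a chain; 700 goes to slot 0 and 76 goes to slot 6.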
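The chaining idea for the "key mod 7" example can be sketched as below. The names struct node, chain_insert and chain_search are hypothetical, chosen only for this illustration. The sketch assumes non-negative integer keys and appends each new key at the tail of its chain.

```c
#include <assert.h>
#include <stdlib.h>

#define BUCKETS 7   /* table size for the "key mod 7" hash function */

/* One record in a bucket's linked list. */
struct node {
    int key;
    struct node *next;
};

/* Append key to the chain of bucket key % BUCKETS.
   Returns 0 on success, -1 if memory could not be allocated. */
int chain_insert(struct node *table[BUCKETS], int key) {
    assert(key >= 0);                     /* assumption: keys are non-negative */
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = key;
    n->next = NULL;
    struct node **p = &table[key % BUCKETS];
    while (*p != NULL)                    /* walk to the end of the chain */
        p = &(*p)->next;
    *p = n;
    return 0;
}

/* Return 1 if key is stored in the table, 0 otherwise. */
int chain_search(struct node *table[BUCKETS], int key) {
    assert(key >= 0);
    for (struct node *n = table[key % BUCKETS]; n != NULL; n = n->next)
        if (n->key == key)
            return 1;
    return 0;
}
```

Starting from a table whose seven pointers are all NULL and inserting 50, 700, 76, 85, 92, 73, 101 in that order leaves the chain 50 -> 85 -> 92 in bucket 1 and 73 -> 101 in bucket 3. A search only walks the one chain selected by the hash value.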
Open Addressing-
In open addressing, unlike separate chaining, all the keys are stored inside the
hash table itself. No key is stored outside the hash table.
Techniques used for open addressing are-
Linear Probing
Quadratic Probing
Double Hashing
1) Linear probing
Let hash(x) be the slot index computed using the hash function and S be the table
size.
If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
and so on, until an empty slot is found.
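The probe sequence above can be sketched as a small insert routine. The name linear_probe_insert and the EMPTY marker are assumptions made for this illustration. The caller passes in the already computed value hash(x), and keys are assumed to be non-negative so that -1 can mark a free slot.

```c
#include <assert.h>

#define EMPTY (-1)   /* marker for an unused slot; assumes keys are non-negative */

/* Insert key into table[0..size-1] using linear probing.
   hash is the value hash(x) computed by the hash function.
   Returns the slot that was used, or -1 if the table is full. */
int linear_probe_insert(int table[], int size, int hash, int key) {
    assert(size > 0 && hash >= 0);
    for (int i = 0; i < size; i++) {
        int slot = (hash + i) % size;   /* hash, hash + 1, hash + 2, ... */
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;                          /* every slot was probed and occupied */
}
```

For example, with S = 10 and hash(x) = x % 10, inserting 52, 68 and then 62 places 62 in slot 3, because its home slot 2 is already taken by 52.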
2) Quadratic probing
In quadratic probing the slot tried on probe number x (x = 0, 1, 2, ...) is
(key + x*x) % S, where S is the table size.
For example, insert the keys 67, 90, 55, 17, 49 into a table of size 10.
The keys 67, 90 and 55 are inserted directly at indices 7, 0 and 5. For 17,
the first probe gives (17 + 0*0) % 10 = 7, and index 7 is already occupied
by 67. Incrementing x to 1 gives (17 + 1*1) % 10 = 8; bucket 8 is empty, hence
we place 17 at index 8. Finally, 49 hashes to index 9, which is empty.
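The example above can be checked with a short sketch. The name quadratic_probe_insert and the EMPTY marker are hypothetical, and the sketch assumes small non-negative keys with the hash function key % size. It gives up after size probes, because quadratic probing is not guaranteed to reach every slot.

```c
#include <assert.h>

#define EMPTY (-1)   /* marker for an unused slot; assumes keys are non-negative */

/* Insert key using quadratic probing: probe x tries (key + x*x) % size.
   Returns the slot that was used, or -1 if no free slot was found
   within size probes (which can happen even when the table is not full). */
int quadratic_probe_insert(int table[], int size, int key) {
    assert(size > 0 && key >= 0);
    for (int x = 0; x < size; x++) {
        int slot = (key + x * x) % size;
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}
```

Inserting 67, 90, 55, 17, 49 into an empty table of size 10 returns the slots 7, 0, 5, 8 and 9 respectively, which agrees with the hand calculation.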
3) Double hashing
It is a technique in which two hash functions are used when a collision
occurs. The first hash function is the same as in the division method.
The second hash function must satisfy two important rules:
It must never evaluate to zero.
It must make sure that all cells can be probed.
A commonly used second hash function is hash2(x) = p - (x % p),
where p is a prime number smaller than the size of the hash table.
We use the second hash function hash2(x) and probe slot (hash(x) + i*hash2(x)) % S
in the i-th iteration.
Let hash(x) be the slot index computed using the first hash function and S be the
table size.
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
and so on, until an empty slot is found.
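The double hashing probe sequence can be sketched as follows. Everything specific here is an assumption made for illustration: the second hash function is taken to be hash2(x) = p - (x % p) with p a prime smaller than the table size (a common textbook choice), the first hash is x % size, and the names double_hash_insert and EMPTY are hypothetical.

```c
#include <assert.h>

#define EMPTY (-1)   /* marker for an unused slot; assumes keys are non-negative */

/* Assumed second hash function: result is in 1..p, so it is never zero. */
int hash2(int key, int p) {
    return p - (key % p);
}

/* Insert key using double hashing: probe i tries (h1 + i*hash2) % size.
   Returns the slot that was used, or -1 if no free slot was found
   within size probes. */
int double_hash_insert(int table[], int size, int p, int key) {
    assert(size > 0 && p > 0 && p < size && key >= 0);
    int h1 = key % size;          /* first hash: division method */
    int step = hash2(key, p);     /* second hash: distance between probes */
    for (int i = 0; i < size; i++) {
        int slot = (h1 + i * step) % size;
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}
```

With a table of size 10 and p = 7, inserting 67 uses slot 7. Inserting 17 then collides at slot 7, and with hash2(17) = 7 - 3 = 4 the next probe is (7 + 4) % 10 = 1, so 17 goes to slot 1. If the table size is itself prime, every step value visits all slots; with a size such as 10 some step values do not.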
MCQs
1) A hash table of length 10 uses open addressing with hash function h(k) = k mod 10, and
linear probing. After inserting 6 values into an empty hash table, the table is as shown
below.
Which one of the following choices gives a possible order in which the key values could have
been inserted in the table?
How many different insertion sequences of the key values using the hash function h(k) =
k mod 10 and linear probing will result in the hash table shown below?
A 10
B 20
C 30
D 40
Answer: C
Explanation:
In a valid insertion sequence, the elements 42, 23 and 34 must appear before 52
and 33, and 46 must appear before 33. Total number of different sequences = 3!
x 5 = 30. In this expression, 3! is for elements 42, 23 and 34, as they can
appear in any order, and 5 is for element 46, as it can appear at 5 different places.
4) Consider a hash table of size seven, with starting index zero, and a hash function
(3x + 4) mod 7. Assuming the hash table is initially empty, which of the following is the
contents of the table when the sequence 1, 3, 8, 10 is inserted into the table using closed
hashing? Note that ‘_’ denotes an empty location in the table.
A 8, _, _, _, _, _, 10
B 1, 8, 10, _, _, _, 3
C 1, _, _, _, _, _, 3
D 1, 10, 8, _, _, _, 3
Answer: B
Explanation
Let us put values 1, 3, 8, 10 in the hash of size 7. Initially, hash table is empty
- - - - - - -
0 1 2 3 4 5 6
The value of function (3x + 4)mod 7 for 1 is 0, so let us put the value at 0
1 - - - - - -
0 1 2 3 4 5 6
The value of function (3x + 4)mod 7 for 3 is 6, so let us put the value at 6
1 - - - - - 3
0 1 2 3 4 5 6
The value of function (3x + 4)mod 7 for 8 is 0, but 0 is already occupied, let us put the
value(8) at next available space(1)
1 8 - - - - 3
0 1 2 3 4 5 6
The value of function (3x + 4) mod 7 for 10 is 6, but 6 is already occupied. Probing wraps
around to 0 and then 1, which are also occupied, so let us put the value (10) at the next
available space (2)
1 8 10 - - - 3
0 1 2 3 4 5 6
5) Consider a hash table with 100 slots. Collisions are resolved using chaining.
Assuming simple uniform hashing, what is the probability that the first 3 slots are
unfilled after the first 3 insertions?
A (97 × 97 × 97)/100^3
B (99 × 98 × 97)/100^3
C (97 × 96 × 95)/100^3
D (97 × 96 × 95)/(3! × 100^3)
Answer: A
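The answer can be justified in one line. Under simple uniform hashing each key is equally likely to land in any of the 100 slots, independently of the other keys, and with chaining a slot can hold any number of keys. Each of the 3 insertions therefore avoids the first 3 slots with probability 97/100:

```latex
P(\text{first 3 slots unfilled after 3 insertions})
  = \frac{97}{100} \cdot \frac{97}{100} \cdot \frac{97}{100}
  = \frac{97 \times 97 \times 97}{100^{3}}
```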
6) Which one of the following hash functions on integers will distribute keys most uniformly
over 10 buckets numbered 0 to 9 for i ranging from 0 to 2020?
A h(i) = i^2 mod 10
B h(i) = i^3 mod 10
C h(i) = (11 ∗ i^2) mod 10
D h(i) = (12 ∗ i) mod 10
Answer: B
Explanation:
Since mod 10 is used, only the last digit matters. If you cube all numbers from 0 to 9, you get
the following:
Number  Cube  Last Digit in Cube
0       0     0
1       1     1
2       8     8
3       27    7
4       64    4
5       125   5
6       216   6
7       343   3
8       512   2
9       729   9
Every last digit from 0 to 9 appears exactly once, so the numbers from 0 to 2020 are divided
almost equally among the 10 buckets (2021 values cannot be split exactly evenly). If we make a
table for squares, we don't get an equal distribution. In the following table, 1, 4, 6 and 9 are
repeated, so these buckets would have more entries and buckets 2, 3, 7 and 8 would be empty.
Number  Square  Last Digit in Square
0       0       0
1       1       1
2       4       4
3       9       9
4       16      6
5       25      5
6       36      6
7       49      9
8       64      4
9       81      1
Alternative approach, using the repeating cycle of last digits:
(a) (0,1,4,9,6,5,6,9,4,1) repeated
(b) (0,1,8,7,4,5,6,3,2,9) repeated
(c) (0,1,4,9,6,5,6,9,4,1) repeated
(d) (0,2,4,6,8) repeated
So, only h(i) = i^3 mod 10 covers all the digits from 0 to 9. Option (B) is correct.
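The distribution argument can also be checked by brute force. The sketch below tallies how many values of i from 0 to max_i fall into each bucket for h(i) = i^power mod 10; the function name bucket_counts is hypothetical. Only the last digit of i is multiplied, which gives the same bucket as computing the full power and avoids integer overflow.

```c
#include <assert.h>

/* Count how many i in [0, max_i] fall into each of the 10 buckets
   under h(i) = i^power mod 10. */
void bucket_counts(int max_i, int power, int counts[10]) {
    assert(max_i >= 0 && power >= 1);
    for (int b = 0; b < 10; b++)
        counts[b] = 0;
    for (int i = 0; i <= max_i; i++) {
        int h = 1;
        for (int k = 0; k < power; k++)
            h = (h * (i % 10)) % 10;   /* last digit of i^power */
        counts[h]++;
    }
}
```

For max_i = 2020 and power = 3, bucket 0 receives 203 keys and every other bucket receives 202. For power = 2, buckets 2, 3, 7 and 8 receive no keys at all, while buckets 1, 4, 6 and 9 receive 404 keys each.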
7) Given a hash table T with 25 slots that stores 2000 elements, the load factor α for T is
__________
A 80
B 0.0125
C 8000
D 1.25
Answer: A
Explanation:
load factor = (no. of elements) / (no. of table slots) = 2000/25 = 80
9) An advantage of a chained hash table (external hashing) over the open addressing scheme is
A Worst case complexity of search operations is less
B Space used is less
C Deletion is easier
D None of the above
Answer: C
10) Insert the characters of the string K R P C S N Y T J M into a hash table of size 10. Use
the hash function
h(x) = ( ord(x) - ord("a") + 1 ) mod 10
If linear probing is used to resolve collisions, then the following insertion causes collision
A Y
B C
C M
D P
Answer: C
Explanation:
(a) The hash table with size 10 has indices from 0 to 9. The letters are upper case, so the
explanation takes ord(x) - ord("A") + 1, the position of the letter in the alphabet, before
applying mod 10. So for the string K R P C S N Y T J M:
K at index: (11-1+1) mod 10 = 1
R at index: (18-1+1) mod 10 = 8
P at index: (16-1+1) mod 10 = 6
C at index: (3-1+1) mod 10 = 3
S at index: (19-1+1) mod 10 = 9
N at index: (14-1+1) mod 10 = 4
Y at index: (25-1+1) mod 10 = 5
T at index: (20-1+1) mod 10 = 0
J at index: (10-1+1) mod 10 = 0 // first collision: 0 and 1 are occupied, so J goes to 2
M at index: (13-1+1) mod 10 = 3 // second collision: 3 to 6 are occupied, so M goes to 7
Only J and M cause collisions, and of the given options only M does.
(b) The final hash table will be:
0 T
1 K
2 J
3 C
4 N
5 Y
6 P
7 M
8 R
9 S
In the graph,
V = {0, 1, 2, 3}
E = {(0,1), (0,2), (0,3), (1,2)}
G = {V, E}
Graph Terminology
Path
A path is a sequence of nodes that are followed in order to reach some
terminal node V from the initial node U.
Closed Path
A path is called a closed path if the initial node is the same as the terminal node,
that is, if V0 = VN.
Cycle
A cycle is a closed path which has no repeated edges or vertices except the
first and last vertices.
Connected Graph
A connected graph is one in which some path exists between every two vertices (u,
v) in V. There are no isolated nodes in a connected graph.
Complete Graph
A complete graph is one in which every node is connected to all other nodes. A
complete graph contains n(n-1)/2 edges, where n is the number of nodes in the graph.
Weighted Graph
In a weighted graph, each edge is assigned some data such as a length or weight.
The weight of an edge e is written w(e); it is usually a positive value
indicating the cost of traversing the edge.
Digraph
A digraph is a directed graph: each edge of the graph has a direction, and the
edge can be traversed only in that direction.
Loop
An edge whose two end points are the same node is called a loop.
Adjacent Nodes
If two nodes u and v are connected by an edge e, then the nodes u and v are called
neighbours or adjacent nodes.
Graph Representation
By graph representation, we simply mean the technique used to store a graph in
the computer's memory.
An undirected graph and its adjacency matrix representation are shown in the following
figure.
A directed graph and its adjacency matrix representation are shown in the following
figure.
A weighted directed graph and its adjacency matrix representation are shown in
the following figure.
Pros: The representation is easy to implement and follow. Removing an edge takes O(1)
time. Queries such as whether there is an edge from vertex ‘u’ to vertex ‘v’ are efficient and
can be answered in O(1) time.
Cons: It consumes O(V^2) space. Even if the graph is sparse (contains few
edges), it consumes the same space. Adding a vertex takes O(V^2) time.
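A minimal adjacency matrix sketch for an undirected graph is given below, sized for the four-vertex example graph with V = {0, 1, 2, 3}. The names NUM_VERTICES, add_edge, remove_edge and has_edge are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

#define NUM_VERTICES 4   /* vertices 0..3 of the example graph */

/* Add the undirected edge u-v by setting both symmetric entries. O(1). */
void add_edge(int adj[NUM_VERTICES][NUM_VERTICES], int u, int v) {
    assert(u >= 0 && u < NUM_VERTICES && v >= 0 && v < NUM_VERTICES);
    adj[u][v] = 1;
    adj[v][u] = 1;
}

/* Remove the undirected edge u-v. O(1). */
void remove_edge(int adj[NUM_VERTICES][NUM_VERTICES], int u, int v) {
    assert(u >= 0 && u < NUM_VERTICES && v >= 0 && v < NUM_VERTICES);
    adj[u][v] = 0;
    adj[v][u] = 0;
}

/* Return 1 if there is an edge between u and v, 0 otherwise. O(1). */
int has_edge(int adj[NUM_VERTICES][NUM_VERTICES], int u, int v) {
    assert(u >= 0 && u < NUM_VERTICES && v >= 0 && v < NUM_VERTICES);
    return adj[u][v];
}
```

Starting from an all-zero matrix and adding the edges (0,1), (0,2), (0,3) and (1,2), has_edge reports an edge between 1 and 0 but none between 1 and 3. The matrix always occupies NUM_VERTICES * NUM_VERTICES cells, however few edges the graph has.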
Consider the directed graph shown in the following figure and check the adjacency list
representation of the graph. In a directed graph, the sum of the lengths of all the
adjacency lists is equal to the number of edges present in the graph.
In the case of a weighted directed graph, each list node contains an extra field that
holds the weight of the corresponding edge. The adjacency list representation of a
weighted directed graph is shown in the following figure.
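An adjacency list for a weighted directed graph can be sketched as an array of linked lists, one list per vertex. The names struct edge_node, add_directed_edge and total_list_length are hypothetical, and new edges are inserted at the head of the list for simplicity.

```c
#include <assert.h>
#include <stdlib.h>

/* One entry in a vertex's adjacency list: an outgoing edge. */
struct edge_node {
    int to;                   /* destination vertex of the edge */
    int weight;               /* extra field holding the edge weight */
    struct edge_node *next;
};

/* Add the directed edge from -> to with the given weight at the head of
   the list of `from`. Returns 0 on success, -1 if allocation failed. */
int add_directed_edge(struct edge_node *adj[], int n, int from, int to, int weight) {
    assert(from >= 0 && from < n && to >= 0 && to < n);
    struct edge_node *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->to = to;
    e->weight = weight;
    e->next = adj[from];
    adj[from] = e;
    return 0;
}

/* Sum of the lengths of all adjacency lists. */
int total_list_length(struct edge_node *adj[], int n) {
    int total = 0;
    for (int v = 0; v < n; v++)
        for (struct edge_node *e = adj[v]; e != NULL; e = e->next)
            total++;
    return total;
}
```

After adding four directed edges to an empty four-vertex graph, total_list_length returns 4, which illustrates that in a directed graph the list lengths add up to the number of edges.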
Breadth first search
Breadth First Search is a level-wise vertex traversal process. Unlike a tree, a
graph may contain cycles, so BFS keeps a list of visited vertices to avoid
processing the same vertex more than once.
Algorithm:
1. Start by putting any one of the graph's vertices at the back of a queue.
2. Take the front item of the queue and add it to the visited list.
3. Create a list of that vertex's adjacent nodes. Add the ones which aren't in
the visited list to the back of the queue.
4. Keep repeating steps 2 and 3 until the queue is empty.
BFS example
Let's see how the Breadth First Search algorithm works with an example. We use
an undirected graph with 5 vertices.
We start from vertex 0: the BFS algorithm puts it in the Visited list and
puts all its adjacent vertices in the queue.
Next, we visit the element at the front of queue i.e. 1 and go to its adjacent
nodes. Since 0 has already been visited, we visit 2 instead.
Vertex 2 has an unvisited adjacent vertex in 4, so we add that to the back of the
queue and visit 3, which is at the front of the queue.
Only 4 remains in the queue since the only adjacent node of 3 i.e. 0 is already
visited. We visit it.
Since the queue is empty, we have completed the Breadth First Traversal of the
graph.
C Program
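#include <stdio.h>

#define MAX 20

/* a: adjacency matrix (vertices numbered 1..n), q: queue, f/r: front and rear */
int a[MAX + 1][MAX + 1], q[MAX], visited[MAX + 1], n, f = 0, r = -1;

/* Breadth first traversal starting at vertex v. */
void bfs(int v) {
    int i, u;
    visited[v] = 1;              /* mark when enqueued, so no vertex is queued twice */
    q[++r] = v;
    while (f <= r) {
        u = q[f++];              /* take the front item of the queue */
        for (i = 1; i <= n; i++)
            if (a[u][i] && !visited[i]) {
                visited[i] = 1;
                q[++r] = i;      /* add unvisited neighbours to the back */
            }
    }
}

int main(void) {
    int i, j, v;
    printf("\n Enter the number of vertices:");
    if (scanf("%d", &n) != 1 || n < 1 || n > MAX)
        return 1;
    printf("\n Enter graph data in matrix form:\n");
    for (i = 1; i <= n; i++)
        for (j = 1; j <= n; j++)
            if (scanf("%d", &a[i][j]) != 1)
                return 1;
    printf("\n Enter the starting vertex:");
    if (scanf("%d", &v) != 1 || v < 1 || v > n)
        return 1;
    bfs(v);
    printf("\n The nodes which are reachable, in BFS order, are:\n");
    for (i = 0; i <= r; i++)
        printf("%d\t", q[i]);    /* the queue array holds the visiting order */
    printf("\n");
    return 0;
}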
DFS algorithm
Depth First Search is a depth-wise vertex traversal process. Unlike a tree, a
graph may contain cycles, so DFS keeps a list of visited vertices to avoid
processing the same vertex more than once.
To implement DFS we use a stack together with an array that records the visited vertices.
DFS example
Let's see how the Depth First Search algorithm works with an example. We use
an undirected graph with 5 vertices.
We start from vertex 0, the DFS algorithm starts by putting it in the Visited list and
putting all its adjacent vertices in the stack.
Next, we visit the element at the top of stack i.e. 1 and go to its adjacent nodes.
Since 0 has already been visited, we visit 2 instead.
Vertex 2 has an unvisited adjacent vertex in 4, so we add that to the top of the
stack and visit it.
After we visit the last element 3, it doesn't have any unvisited adjacent nodes, so
we have completed the Depth First Traversal of the graph.
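The walkthrough above can be sketched with an explicit stack. The graph used here is an assumption: the figure is not available, so the edges 0-1, 0-2, 0-3, 1-2 and 2-4 are reconstructed from the steps described in the example. The function name dfs and the constant N are hypothetical.

```c
#include <assert.h>

#define N 5   /* number of vertices in the example graph */

/* Iterative depth first traversal from start over adjacency matrix adj.
   Writes the visiting order into order[] and returns how many vertices
   were reached. */
int dfs(int adj[N][N], int start, int order[N]) {
    assert(start >= 0 && start < N);
    int visited[N] = {0};
    int stack[N * N + 1];   /* a vertex can be pushed once per incident edge */
    int top = 0, count = 0;

    stack[top++] = start;
    while (top > 0) {
        int u = stack[--top];           /* take the vertex on top of the stack */
        if (visited[u])
            continue;                   /* it was pushed more than once; skip */
        visited[u] = 1;
        order[count++] = u;
        /* push neighbours in reverse so the lowest-numbered one ends on top */
        for (int w = N - 1; w >= 0; w--)
            if (adj[u][w] && !visited[w])
                stack[top++] = w;
    }
    return count;
}
```

With the reconstructed edges stored symmetrically in adj, calling dfs from vertex 0 visits the vertices in the order 0, 1, 2, 4, 3, the same order as in the walkthrough.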