UNIT V
HASH TABLES:
The best comparison-based search method, binary search, still requires a number of key comparisons and has a search time of O(log2 n). Another approach is to compute the location of the desired record directly. The nature of this computation depends on the key set and the memory-space requirements of the desired record.
This key-to-address transformation problem is defined as a mapping or hashing
function H, which maps the key space (K) into an address space (A).
The hash table is partitioned into b buckets. Each bucket is capable of holding s
records. Thus, a bucket is said to consist of s slots, each slot being large enough to
hold 1 record.
As an example, consider the hash table HT with b = 26 buckets, each bucket having exactly two slots, i.e. s = 2. The hash function f must map each of the possible identifiers into one of the numbers 1-26. Here A-Z correspond to the numbers 1-26 respectively, and the function f is defined by f(X) = the first character of X. Thus identifiers beginning with A, B, C, ..., Z will be hashed into buckets 1, 2, 3, ..., 26 respectively.
The identifiers A, A1, A2 are synonyms. For example, if A and A1 are already stored in the bucket and A2 is then to be stored, an overflow occurs, since s = 2.
HASHING FUNCTIONS:
If X is an identifier chosen at random from the identifier space, then we want the probability that f(X) = i to be 1/b for every bucket i. Then a random X has an equal chance of hashing into any of the b buckets. A hash function satisfying this property is termed a uniform hash function.
Several kinds of uniform hash functions are in use.
Digit/bit extraction: a selection of bits or digits of the key is made based on the table size; the selected bits should fit into one computer word of memory.
Division: the key is divided by the table size M and the remainder is taken as the address. Great care should be taken while choosing the value of M, and it should not be an even number; by making M a large prime number the keys are spread out evenly.
Folding: the key is partitioned into a number of parts, each of which has the same length as the required address. The parts are then added together, ignoring the final carry, to form the address. For example, the key 356942781 can be transformed into a three-digit address as 356 + 942 + 781 = 2079, giving the address 079 once the carry is ignored.
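As an illustration, the division and folding transformations might be coded as below; the table size 997 and the three-digit grouping are assumptions chosen for the example, not values fixed by these notes.

#include <stdio.h>

#define M 997   /* assumed table size: a large prime spreads the keys out evenly */

/* Division method: the remainder of the key divided by M is the address. */
unsigned int hash_division(unsigned long key)
{
    return (unsigned int)(key % M);
}

/* Folding method: split the decimal key into three-digit parts and add
   them, keeping only the last three digits (the final carry is ignored). */
unsigned int hash_folding(unsigned long key)
{
    unsigned long sum = 0;
    while (key > 0) {
        sum += key % 1000;      /* next three-digit part */
        key /= 1000;
    }
    return (unsigned int)(sum % 1000);
}

int main(void)
{
    /* 356942781 -> 356 + 942 + 781 = 2079 -> three-digit address 79 */
    printf("folding:  %u\n", hash_folding(356942781UL));
    printf("division: %u\n", hash_division(356942781UL));
    return 0;
}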
Two techniques, open addressing and chaining, are used to handle collisions and overflows.
OPEN ADDRESSING:
Here, collisions are resolved by computing a sequence of alternative hash slots to examine. There are two such techniques:
1) Linear Probing
2) Quadratic Probing
1. Linear Probing:
Here the increment function f is defined as f(i) = i, so the i-th probe examines slot (h(X) + i) mod (table size). It indicates that whenever we encounter a collision, the cells are searched sequentially for the next available one and the data element is placed there.
The following figure shows a hash table with seven locations (buckets) numbered from 0 to 6. Here the divisor we use is 7. Initially, we insert 23 and its position is 2, as 23 % 7 = 2.
Then we insert 30 and its position is also 2 (30 % 7 = 2), but bucket number 2 is already occupied by 23, so a collision has occurred. Therefore the value 30 gets the next available cell, which is 3. The arrangement is as follows.
Similarly, when we wish to insert 38 (38 % 7 = 3), we face a collision again. Now it is placed at the next available cell, which is 4. The arrangement of data elements is as follows.
Overhead in this technique is the time taken for finding the next available cell.
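A minimal C sketch of the insertions just described, assuming a table of seven slots, 7 as the divisor, and -1 as the empty-slot marker:

#include <stdio.h>

#define SIZE  7
#define EMPTY (-1)

int table[SIZE] = { EMPTY, EMPTY, EMPTY, EMPTY, EMPTY, EMPTY, EMPTY };

/* Linear probing: on a collision at h, examine h+1, h+2, ... (mod SIZE)
   until an empty cell is found; return the slot used, or -1 if full.    */
int insert_linear(int key)
{
    int h = key % SIZE;
    int i;
    for (i = 0; i < SIZE; i++) {
        int slot = (h + i) % SIZE;
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}

int main(void)
{
    printf("%d\n", insert_linear(23));   /* 23 % 7 = 2 -> slot 2            */
    printf("%d\n", insert_linear(30));   /* 30 % 7 = 2, collision -> slot 3 */
    printf("%d\n", insert_linear(38));   /* 38 % 7 = 3, collision -> slot 4 */
    return 0;
}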
2. Quadratic Probing:
In this case, when a collision occurs at hash address h, the method searches the table at locations h+1, h+4, h+9, ..., i.e. at h + i^2 (taken mod the table size). The hash function for the i-th probe is thus defined as h_i(X) = (h(X) + i^2) mod (table size).
Now let us consider a table of size 10, indexed from 0 to 9. Initially the table looks as follows.
When we wish to insert 23, we can easily insert at location 3 as shown below.
Now we want to insert 93, and as position 3 is already occupied, a collision takes place. So the cell at distance one is checked, and since it is free the new data element is placed there, as shown below.
Now we wish to insert 113. In this case positions 3 and 4 are already occupied, so the cell at distance 4 is checked; it is found empty and the new value is placed at location 7.
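The same idea in C with quadratic probing, assuming a table of size 10; the i-th probe examines slot (h + i*i) mod 10:

#include <stdio.h>

#define SIZE  10
#define EMPTY (-1)

int table[SIZE] = { EMPTY, EMPTY, EMPTY, EMPTY, EMPTY,
                    EMPTY, EMPTY, EMPTY, EMPTY, EMPTY };

/* Quadratic probing: on a collision at h, examine h+1, h+4, h+9, ...
   i.e. (h + i*i) mod SIZE, until an empty cell is found.             */
int insert_quadratic(int key)
{
    int h = key % SIZE;
    int i;
    for (i = 0; i < SIZE; i++) {
        int slot = (h + i * i) % SIZE;
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;                      /* no free slot found */
}

int main(void)
{
    printf("%d\n", insert_quadratic(23));    /* 23 % 10 = 3 -> slot 3             */
    printf("%d\n", insert_quadratic(93));    /* slot 3 taken, 3 + 1 -> slot 4     */
    printf("%d\n", insert_quadratic(113));   /* slots 3, 4 taken, 3 + 4 -> slot 7 */
    return 0;
}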
CHAINING:
One of the reasons open addressing and its variations perform poorly is that searching for an identifier involves comparisons with identifiers that have different hash values. In chaining, all identifiers with the same hash value are instead kept on one linked list (chain), so only synonyms are ever compared.
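A minimal sketch of chaining in C, assuming integer keys and 26 buckets; each bucket holds the head of a linked list, so a search only ever compares keys that share the same hash value:

#include <stdlib.h>

#define BUCKETS 26

struct node {
    int key;
    struct node *next;
};

struct node *bucket[BUCKETS];          /* head of the chain for each bucket */

/* Insert a key at the head of its bucket's chain. */
void insert_chain(int key)
{
    int h = key % BUCKETS;
    struct node *p = malloc(sizeof *p);
    p->key = key;
    p->next = bucket[h];
    bucket[h] = p;
}

/* Search for a key: only the chain for its own hash value is scanned. */
int search_chain(int key)
{
    struct node *p;
    for (p = bucket[key % BUCKETS]; p != NULL; p = p->next)
        if (p->key == key)
            return 1;                  /* found */
    return 0;                          /* not found */
}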
GRAPH
DEFINING GRAPH:
A graph G consists of two sets: a set of vertices V(G) and a set of edges E(G), each edge being a pair of vertices.
FIG: Graph G
We have numbered the vertices of the graph as 1, 2, 3, 4. Therefore V(G) = {1, 2, 3, 4} and E(G) = {(1,2), (1,3), (1,4), (2,3), (2,4)}.
UNDIRECTED GRAPH:
An undirected graph is one in which the pair of vertices representing an edge is unordered.
DIRECTED GRAPH:
A directed graph is one in which each edge is an ordered pair of vertices, i.e. each edge is represented by a directed pair. It is also referred to as a digraph.
FIG: A directed graph.
COMPLETE GRAPH:
An n-vertex undirected graph with exactly n(n-1)/2 edges is said to be a complete graph.
SUBGRAPH:
A sub-graph of G is a graph G' such that V(G') ⊆ V(G) and E(G') ⊆ E(G). Some of the sub-graphs of G are shown below.
FIG: Sub-graphs (a), (b), (c) and (d) of G.
ADJACENT VERTICES:
A vertex v1 is said to be adjacent to a vertex v2 if there exists an edge (v1,v2) or (v2,v1).
PATH:
A path from vertex V to vertex W is a sequence of vertices, each adjacent to the next. The length of the path is the number of edges in it.
CONNECTED GRAPH:
A graph is said to be connected if there exists a path from any vertex to any other vertex.
UNCONNECTED GRAPH:
A graph is said to be unconnected if it consists of two or more connected components, i.e. there is some pair of vertices with no path between them.
FIG: An unconnected graph consisting of two connected components.
CYCLE
A cycle is a path in which the first and the last vertices are the same.
Eg. 1,3,4,7,2,1
DEGREE:
The number of edges incident on a vertex determines its degree. In a directed graph there are two types of degree: in-degree and out-degree.
The IN-DEGREE of a vertex V is the number of edges for which V is the head, and the OUT-DEGREE is the number of edges for which V is the tail.
ADJACENCY MATRIX:
As an example, the adjacency matrix of the complete graph on four vertices is:
0 1 1 1
1 0 1 1
1 1 0 1
1 1 1 0
The space required to represent a graph using its adjacency matrix is n x n bits. About half this space can be saved in the case of an undirected graph by storing only the upper or lower triangle of the matrix. From the adjacency matrix, one may readily determine whether there is an edge connecting any two vertices i and j. For an undirected graph the degree of any vertex i is its row sum:
degree(i) = sum of A(i, j) for j = 1 to n.
For a directed graph the row sum is the out-degree and the column sum is the in-degree.
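The row-sum computation can be written directly from the matrix; a small C sketch, assuming the n x n matrix is stored row by row in a one-dimensional array:

/* Degree of vertex i in an undirected graph whose n x n adjacency matrix
   is stored row by row in a[]: the sum of row i. For a digraph the same
   row sum is the out-degree; summing column i instead gives the in-degree. */
int degree(const int *a, int n, int i)
{
    int j, d = 0;
    for (j = 0; j < n; j++)
        d += a[i * n + j];
    return d;
}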
ADJACENCY LISTS:
The same graph represented by adjacency lists (0 marks the end of a list):
Vertex 1: 2 -> 3 -> 4 -> 0
Vertex 2: 1 -> 3 -> 4 -> 0
Vertex 3: 1 -> 2 -> 4 -> 0
Vertex 4: 1 -> 2 -> 3 -> 0
The degree of a vertex in an undirected graph may be determined by just counting the number of nodes in its adjacency list. The total number of edges in G may therefore be determined in time O(n+e). In the case of a digraph the number of list nodes is only e; the out-degree of any vertex can be determined by counting the number of nodes in its adjacency list, so the total number of edges in G can again be determined in O(n+e).
ADJACENCY MULTILISTS:
For each edge there will be exactly one node, but this node will appear on two lists, i.e. the adjacency list of each of the two vertices it is incident to. The node structure now becomes
M | vertex 1 | vertex 2 | link 1 | link 2
where M is a one-bit mark field that may be used to indicate whether or not the edge has been examined.
FIG: Adjacency multilists for the graph G above. The head nodes V1-V4 point to the first edge node incident to the corresponding vertex; 0 marks the end of a list.
N1: vertices 1, 2   links N2, N4   (edge (1,2))
N2: vertices 1, 3   links N3, N4   (edge (1,3))
N3: vertices 1, 4   links 0,  N5   (edge (1,4))
N4: vertices 2, 3   links N5, N6   (edge (2,3))
N5: vertices 2, 4   links 0,  N6   (edge (2,4))
N6: vertices 3, 4   links 0,  0    (edge (3,4))
3.2.4 TRAVERSAL:
Given an undirected graph G = (V, E) and a vertex v in V(G), we are interested in visiting all vertices in G that are reachable from v (that is, all vertices connected to v). We have two ways to do the traversal: depth first search (DFS) and breadth first search (BFS).
DEPTH FIRST SEARCH (DFS):
We start with, say, vertex v. An adjacent vertex is selected and a depth first search is initiated from it, i.e. let V1, V2, ..., Vk be the vertices adjacent to v; we may select any vertex from this list, say V1. Now all the vertices adjacent to V1 are identified and visited in the same manner; next V2 is selected and all its adjacent vertices visited, and so on. This process continues till all the vertices are visited. It is very much possible that we reach an already traversed vertex a second time; therefore we have to set a flag to check whether a vertex has already been visited.
FIG: Graph with vertices V1-V8 used to illustrate the traversals.
Since all the adjacent vertices of V7 are already visited, we back track and find that we have visited all the vertices.
Therefore the sequence of traversal is
V1, V2, V4, V5, V6, V3, V7.
This is not a unique or the only sequence possible using this traversal method.
We may implement the depth first search by using a stack, pushing all unvisited vertices adjacent to the one just visited and popping the stack to find the next vertex to visit.
Procedure DFS(v)
// Given an undirected graph G = (V, E) with n vertices and an array visited(n) initially set to zero, this algorithm visits all vertices reachable from v. G and visited are global. //
void dfs(vertex v)
{
    visited[v] = TRUE;
    for each vertex w adjacent to v
        if (!visited[w])
            dfs(w);
}
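The recursion can equally be driven by the explicit stack mentioned above: push the unvisited vertices adjacent to the one just visited and pop to find the next vertex. A sketch in C, assuming an adjacency-matrix representation G and at most MAX vertices:

#define MAX 100

int G[MAX][MAX];       /* adjacency matrix of the graph    */
int visited[MAX];      /* 0 = not yet visited, 1 = visited */
int n;                 /* number of vertices               */

/* Depth first search from v using an explicit stack instead of recursion. */
void dfs_iterative(int v)
{
    int stack[MAX], top = 0, w;

    stack[top++] = v;
    while (top > 0) {
        v = stack[--top];               /* pop the next vertex           */
        if (visited[v])
            continue;
        visited[v] = 1;                 /* visit v                       */
        for (w = n - 1; w >= 0; w--)    /* push its unvisited neighbours */
            if (G[v][w] && !visited[w])
                stack[top++] = w;
    }
}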
COMPUTING TIME:
If G is represented by its adjacency lists, DFS examines each list node at most once and takes O(n+e) time; if an adjacency matrix is used, the time is O(n^2).
BREADTH FIRST SEARCH (BFS):
In DFS we pick one of the adjacent vertices, visit all of its adjacent vertices and back track to visit the remaining unvisited adjacent vertices. In BFS, we first visit all the adjacent vertices of the start vertex and then visit all the unvisited vertices adjacent to these, and so on.
Let us consider the same example, given in the figure. We start, say, with V1. Its adjacent vertices are V2, V8 and V3; we visit all of them one by one. We pick one of these, say V2. The unvisited vertices adjacent to V2 are V4 and V5; we visit both. We go back to the remaining visited vertices of V1 and pick one of them, say V3. The unvisited vertices adjacent to V3 are V6 and V7. There are no more unvisited vertices adjacent to V8, V4, V5, V6 and V7.
FIG: Stages (a)-(d) of the breadth first search starting at V1.
Thus the sequence so generated is V1, V2, V8, V3, V4, V5, V6, V7. Here we need a queue instead of a stack to implement it: we add the unvisited vertices adjacent to the one just visited at the rear of the queue and remove from the front to find the next vertex to visit.
Algorithm BFS gives the details.
Procedure BFS(v)
// A breadth first search of G is carried out beginning at vertex v. All vertices visited are marked as VISITED(i) = 1. The graph G and array VISITED are global and VISITED is initialised to 0. //
VISITED(v) <- 1
initialise Q to be empty                       // Q is a queue //
loop
  for all vertices w adjacent to v do
    if VISITED(w) = 0                          // add w to queue //
    then [call ADDQ(w, Q); VISITED(w) <- 1]    // mark w as VISITED //
  end
  if Q is empty then return
  call DELETEQ(v, Q)
forever
end BFS
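The same procedure as a C sketch, assuming an adjacency-matrix graph and a simple array-based queue; ADDQ and DELETEQ of the pseudocode become an enqueue at the rear and a dequeue at the front:

#define MAX 100

int G[MAX][MAX];       /* adjacency matrix           */
int VISITED[MAX];      /* 0 = unvisited, 1 = visited */
int n;                 /* number of vertices         */

/* Breadth first search beginning at vertex v. */
void bfs(int v)
{
    int queue[MAX], front = 0, rear = 0, w;

    VISITED[v] = 1;
    queue[rear++] = v;                   /* ADDQ(v, Q)                     */
    while (front < rear) {               /* until the queue is empty       */
        v = queue[front++];              /* DELETEQ(v, Q)                  */
        for (w = 0; w < n; w++)
            if (G[v][w] && VISITED[w] == 0) {
                VISITED[w] = 1;          /* mark w as visited              */
                queue[rear++] = w;       /* add w at the rear of the queue */
            }
    }
}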
COMPUTING TIME:
Each vertex visited gets into the queue exactly once, so the loop ... forever is iterated at most n times. If an adjacency matrix is used, the for loop takes O(n) time for each vertex visited, and the total time is therefore O(n^2). In case adjacency lists are used, the for loop has a total cost of d1 + d2 + ... + dn = O(e), where di = degree(vi). Again, all vertices visited, together with all edges incident to them, form a connected component of G.
The time to do this is O(n^2) if adjacency matrices are used and O(e) if adjacency lists are used. All the connected components of a graph are obtained by making repeated calls to either DFS(v) or BFS(v), with v a vertex not yet visited.
ALGORITHM:
Procedure COMP(G, n)
// Determine the connected components of G. G has n >= 1 vertices. VISITED is now a local array. //
for i <- 1 to n do
  VISITED(i) <- 0                              // initialise all vertices as unvisited //
end
for i <- 1 to n do
  if VISITED(i) = 0 then [call DFS(i);         // find a component //
    output all newly visited vertices together with all edges incident to them]
end
end COMP
SPANNING TREE:
A tree T is a spanning tree of a connected graph G(V, E) such that
1. every vertex of G belongs to an edge in T, and
2. the edges in T form a tree.
Let us see how we construct a spanning tree for a given graph.
Take any vertex v as the initial partial tree and add edges one by one so that each edge adds a new vertex to the partial tree.
In general, if there are n vertices in the graph we shall construct the spanning tree in (n-1) steps, i.e. (n-1) edges need to be added.
The following figure shows a complete graph and three of its spanning trees.
When either DFS or BFS are used the edges of T form a spanning tree.
The spanning tree resulting from a call to DFS is known as a depth first spanning tree.
When BFS is used the resulting spanning tree is called a breadth first spanning tree.
Consider the graph
FIG: The graph on vertices V1-V8 considered earlier for the traversals.
The following figure shows the spanning trees resulting from a depth first and a breadth first search starting at vertex V1 in the above graph.
FIG: Depth first spanning tree and breadth first spanning tree, both rooted at V1.
The edges will have weights associated with them, e.g. the cost of communication. Given the weighted graph, one has to construct a set of communication links that would connect all the cities with minimum total cost.
Since the links selected will form a tree, we are going to find the spanning tree of the given graph with minimum cost. The cost of a spanning tree is the sum of the costs of the edges in the tree.
KRUSKAL'S ALGORITHM FOR CREATING A SPANNING TREE:
One approach to determining a minimum cost spanning tree of a graph has been given by Kruskal. In this approach we need to select (n-1) edges in G, edge by edge, such that these form an MST of G. We begin with the least cost edge, then select the next least cost edge, and so on. An edge is included in the tree only if it does not form a cycle with the edges already in T.
ALGORITHM:
void kruskal(graph G)
{
    Sets s(num_vertex);
    Binary_heap<edge> H(num_edge);
    Vertex U, V;
    Set_type U_set, V_set;
    edge E;
    int edges_accepted = 0;

    read_graph_into_heap_array(G, H);
    H.build_heap();

    while (edges_accepted < num_vertex - 1)
    {
        E = H.delete_min();                  // E = (U, V)
        U_set = s.find_and_compress(U);
        V_set = s.find_and_compress(V);
        if (U_set != V_set)
        {
            // U and V lie in different components: accept the edge
            edges_accepted++;
            s.set_union(U_set, V_set);       // merge the two sets (method name assumed)
        }
    }
}
In case they are in the same set, the edge (V,W) is to be discarded; if they are not, then (V,W) is to be added to T.
One possible grouping is to place all vertices belonging to the same connected component of T into one set. Then two vertices V and W are connected in T if and only if they are in the same set.
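That grouping is exactly what the find_and_compress and union operations used in the algorithm above provide. A minimal C sketch of the disjoint-set idea (the array name and function names are illustrative):

#define MAX_V 100

int parent[MAX_V];     /* parent[i] == i means i is the root of its set */

void init_sets(int n)
{
    int i;
    for (i = 0; i < n; i++)
        parent[i] = i;                  /* every vertex starts in its own set */
}

/* Find the root of v's set, compressing the path along the way. */
int find_set(int v)
{
    if (parent[v] != v)
        parent[v] = find_set(parent[v]);
    return parent[v];
}

/* Merge the sets containing u and v; returns 0 if they were already in the
   same set, in which case the edge (u, v) would form a cycle and is discarded. */
int union_sets(int u, int v)
{
    int ru = find_set(u), rv = find_set(v);
    if (ru == rv)
        return 0;
    parent[ru] = rv;
    return 1;
}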
We know that graph traversal can travel through edges of the graph and the weight
associated with each edge may reflect distance, time or cost of travel.
In many applications, we are often required to find a shortest path, (ie) a path having
the minimum weight between the vertices.
SHORTEST PATH :
We know that a path is a sequence of vertices such that there is an edge that we can follow between each consecutive pair of vertices. The length of the path is the sum of the weights of the edges on that path.
The starting vertex of the path is called the source vertex and the last vertex of the
path is called the destination vertex.
Shortest path from vertex V to vertex W is a path for which the sum of the weights of
the arcs or edges on the path is minimum.
A path that may look longer in terms of the number of edges and vertices visited may, at times, be shorter cost-wise.
In this problem we are given a graph G=(V,E),a weighting function W(e) for the
edges of G and a source vertex Vo. The problem is to determine the shortest paths
from Vo to all the remaining vertices of G.
We may further look for a path with length shorter than 13, if one exists. For graphs with a small number of vertices and edges, we may write out all the possible combinations to find the shortest path.
Further, we have to find shortest path from A to all the remaining vertices. Shortest
path from A to all other vertices (ie) B,C,D,E,F&G are given below:
PATH LENGTH
AB 2
ABC 4
ABD 4
AFE 4
AFG 6
AF 1
ABCH 5
Before designing the algorithm, let us make some observations that can be used to determine the shortest paths. Let S denote the set of vertices, including Vo, to which the
shortest paths have already been found. For W not in S, let DIST(W) be the length of
the shortest path starting from Vo going through only those vertices which are in S
and ending at W.
OBSERVATIONS:
1. If the next shortest path is to vertex U, then the path begins at Vo, ends at U, and all other intermediate vertices on the path are in S.
2. The destination of the next path generated must be that vertex U which has the minimum distance, DIST(U), among all vertices not in S.
3. Having selected a vertex U and generated the shortest Vo to U path, vertex U becomes a member of S. At this point the length of the shortest path starting at Vo, going through vertices only in S and ending at a vertex W not in S may decrease. This is due to a shorter path starting at Vo, going to U and then to W. The intermediate vertices on the Vo to U path and on the U to W path must all be in S.
By the assumption that shortest paths are generated in nondecreasing order of length, the Vo to U path must be the shortest such path, and the U to W path can be chosen so as to contain no intermediate vertices.
Therefore, the Vo to U part is itself a shortest path and the path from U to W is simply the edge (U,W).
Therefore, the length of the path from Vo to W is DIST(W) = DIST(U) + length(U,W).
The first shortest path algorithm, given by Dijkstra, makes use of these observations to determine the cost of the shortest paths from Vo to all other vertices in G.
It is assumed that the n vertices of G are numbered 1 through n. The set S is maintained as a bit array with S(i) = 0 if vertex i is not in S and S(i) = 1 if it is.
It is assumed that the graph itself is represented by its cost adjacency matrix, with COST(i,j) being the weight of the edge <i,j>. COST(i,j) will be set to some large number in case the edge <i,j> is not in G. For i = j, COST(i,j) may be set to any non-negative number without affecting the outcome of the algorithm.
ALGORITHM:
Procedure SHORTEST_PATH(v, COST, DIST, n)
// DIST(j), 1 <= j <= n, is set to the length of the shortest path from vertex v to vertex j in a digraph G with n vertices; DIST(v) is set to zero. G is represented by its cost adjacency matrix COST(n, n), and S(1:n) is a bit array. //
1.  for i <- 1 to n do                       // initialise set S to empty //
2.    S(i) <- 0; DIST(i) <- COST(v, i)
3.  end
4.  S(v) <- 1; DIST(v) <- 0; num <- 2        // put vertex v in S //
5.  while num < n do                         // determine n-1 paths from vertex v //
6.    choose u such that DIST(u) = min{DIST(w) : S(w) = 0}
7.    S(u) <- 1; num <- num + 1              // put vertex u in set S //
8.    for all w with S(w) = 0 do             // update distances //
9.      DIST(w) <- min{DIST(w), DIST(u) + COST(u, w)}
10.   end
11. end
12. end SHORTEST_PATH
Here, the while loop is executed n-2 times. Each execution of this loop requires O(n) time at line 6 to select the next vertex and again at lines 8-10 to update DIST. So the total time for the while loop is O(n^2).
Any shortest path algorithm must examine each edge in the graph at least once, since any of the edges could be in a shortest path. Since the cost adjacency matrix was used to represent the graph, it takes O(n^2) time just to determine which edges are in G, and so any shortest path algorithm using this representation must take O(n^2) time.
The all pairs shortest path problem calls for finding the shortest paths between all pairs of vertices Vi, Vj, i != j.
The graph G is represented by its cost adjacency matrix with COST(i,i) = 0 and COST(i,j) = +∞ in case edge <i,j>, i != j, is not in G.
Ak(i,j) is defined to be the length of the shortest path from i to j going through no intermediate vertex of index greater than k, with A0(i,j) = COST(i,j). The basic idea is to successively generate the matrices A0, A1, A2, ..., An.
If we have already generated Ak-1, then we can generate Ak by realizing that for any pair of vertices i, j either
(i) the shortest path from i to j going through no vertex with index greater than k does not go through the vertex with index k, and so its length is Ak-1(i,j); or
(ii) the shortest such path goes through vertex k. Such a path consists of a path from i to k followed by one from k to j. These paths must be the shortest paths from i to k and from k to j going through no vertex with index greater than k-1, and so their costs are Ak-1(i,k) and Ak-1(k,j). This is true only if G has no cycle with negative length containing vertex k.
If this is not true, then the shortest i to j path going through no vertices of index greater than k may make several cycles through k and thus have a length less than Ak-1(i,k) + Ak-1(k,j). So the value of Ak(i,j) is
Ak(i,j) = min{Ak-1(i,j), Ak-1(i,k) + Ak-1(k,j)}, k >= 1, and
A0(i,j) = COST(i,j).
ALGORITHM:
The algorithm ALL_COSTS computes An(i,j). The computation is done in place, so the superscript on A is dropped. The reason the computation can be carried out in place is that Ak(i,k) = Ak-1(i,k) and Ak(k,j) = Ak-1(k,j), and so the in-place computation does not alter the outcome.
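The listing of ALL_COSTS is not reproduced in these notes; the in-place computation it describes can be sketched in C as follows, assuming the costs are held in an n x n array A initialised to COST(i,j), with edges not in G given a large value that does not overflow when two of them are added:

#define MAXV 20

/* All pairs shortest paths, computed in place: A starts as the cost matrix
   A0; after the pass for k it holds Ak(i,j), so on completion A[i][j] is
   the length of a shortest i-to-j path.                                    */
void all_costs(int A[][MAXV], int n)
{
    int i, j, k;
    for (k = 0; k < n; k++)                 /* allow k as an intermediate vertex */
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                if (A[i][k] + A[k][j] < A[i][j])
                    A[i][j] = A[i][k] + A[k][j];
}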
ANALYSIS:
The looping is independent of the data in matrix A and so we can easily say that the
computing time of this algorithm is O(n^3).
EXAMPLE:
Let us consider the following digraph G:
The initial A matrix, A0 plus its value after 3 iterations A(1),A(2),A(3) is given below:
TRANSITIVE CLOSURE:
In a simpler version of the all pairs shortest path problem, it is only necessary to determine, for every pair of vertices i, j, the existence of a path from i to j.
If A is the adjacency matrix of G, then the matrix A+ having the property A+(i,j) = 1 if there is a path of length > 0 from i to j and 0 otherwise, is called the transitive closure matrix of G.
The matrix A* with the property A*(i,j) = 1 if there is a path of length >= 0 from i to j and 0 otherwise, is the reflexive transitive closure matrix of G.
EXAMPLE :
Consider the following digraph G
fig (a): the digraph G; its edges are given by the adjacency matrix in fig (b)
    1 2 3 4 5
A: 1 0 1 0 0 0
   2 0 0 1 0 0
   3 0 0 0 1 0
   4 0 0 0 0 1
   5 0 0 1 0 0
fig (b)
1 2 3 4 5
A+: 1 0 1 1 1 1
2 0 0 1 1 1
3 0 0 1 1 1
4 0 0 1 1 1
5 0 0 1 1 1
fig (c)
1 2 3 4 5
A*: 1 1 1 1 1 1
2 0 1 1 1 1
3 0 0 1 1 1
4 0 0 1 1 1
5 0 0 1 1 1
fig (d)
If we use algorithm ALL_COSTS with COST(i,j) = 1 if <i,j> is an edge in G and COST(i,j) = +∞ if <i,j> is not in G, then we can easily obtain A+ from the final matrix A by letting A+(i,j) = 1 iff A(i,j) < +∞.
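The same triple loop carried out on 0/1 values, with OR/AND in place of min and +, yields A+ directly; a sketch under the same assumptions:

#define MAXV 20

/* Transitive closure: a[][] starts as the adjacency matrix and ends with
   a[i][j] = 1 iff there is a path of length > 0 from i to j (that is, A+).
   Setting a[i][i] = 1 for every i afterwards gives the reflexive closure A*. */
void transitive_closure(int a[][MAXV], int n)
{
    int i, j, k;
    for (k = 0; k < n; k++)
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                if (a[i][k] && a[k][j])
                    a[i][j] = 1;
}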
A directed graph G in which the vertices represent tasks/events and the edges represent activities to move from one event to another is known as an activity-on-vertex network (AOV-network) or PERT graph.
FIG: An AOV-network and the stages of generating a topological order; the vertices deleted at steps (b)-(g) are V1, V4, V3, V6, V2 and V5 respectively.
Here at step (b) we have 3 vertices, V2, V3 and V4, which have no predecessors, and any of these can be the next vertex in topological order.
This can be done efficiently if, for each vertex, a count of the number of its immediate predecessors is kept.
This can be easily implemented if the network is represented by its adjacency lists.
ALGORITHM:
Procedure TOPOLOGICAL_ORDER(COUNT, VERTEX, LINK, n)
// The n vertices of an AOV-network are listed in topological order. The network is represented as a set of adjacency lists with COUNT(i) = the in-degree of vertex i. //
top <- 0                                         // initialise stack //
for i <- 1 to n do                               // create a linked stack of vertices with no predecessors //
  if COUNT(i) = 0 then [COUNT(i) <- top; top <- i]
end
for i <- 1 to n do                               // print the vertices in topological order //
  if top = 0 then [print ('network has a cycle'); stop]
  j <- top; top <- COUNT(top); print(j)          // unstack a vertex //
  ptr <- LINK(j)
  while ptr <> 0 do
    // decrease the count of successor vertices of j //
    k <- VERTEX(ptr)
    COUNT(k) <- COUNT(k) - 1
    if COUNT(k) = 0 then [COUNT(k) <- top; top <- k]
    ptr <- LINK(ptr)
  end
end
end TOPOLOGICAL_ORDER
The head nodes of these lists contain two fields: COUNT and LINK. The COUNT field contains the in-degree of that vertex and LINK is a pointer to the first node on its adjacency list. Each list node has two fields: VERTEX and LINK.
The COUNT fields can be set up at the time of input. When an edge <i,j> is input, the count of vertex j is incremented by 1. The list of vertices with zero count is maintained as a stack.
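Setting up the COUNT fields during input and collecting the zero-count vertices might look like the following C sketch (array and function names are illustrative):

#define MAX_V 100

int COUNT[MAX_V];      /* in-degree of each vertex, accumulated during input */

/* Called once for every edge <i, j> read in: the head j gains a predecessor.
   (The edge would also be appended to vertex i's adjacency list here.)      */
void input_edge(int i, int j)
{
    COUNT[j]++;
}

/* After all edges have been read, collect every vertex with no predecessors;
   these are the candidates for the first positions of the topological order. */
int collect_zero_count(int n, int stack[])
{
    int v, top = 0;
    for (v = 0; v < n; v++)
        if (COUNT[v] == 0)
            stack[top++] = v;
    return top;                        /* number of ready vertices */
}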