
DATA STRUCTURES II IT

UNIT V

Hashing: Introduction - Hash functions - methods - Hash table implementation - Rehashing.


Graph: Directed and undirected graphs - representation of graphs - graph traversals: Depth first search - Breadth first search - transitive closure - spanning trees - applications.

HASH TABLES :

The best comparison-based search method, the binary search technique, involves a number of comparisons and has a search time of O(log2 n). Another approach is to compute the location of
the desired record directly. The nature of this computation depends on the key set and the
memory-space requirements of the desired record.
This key-to-address transformation problem is defined as a mapping or hashing
function H, which maps the key space (K) into an address space (A).

The hash table is partitioned into b buckets. Each bucket is capable of holding s
records. Thus, a bucket is said to consist of s slots, each slot being large enough to
hold 1 record.
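As an illustration of these terms, here is a minimal C sketch of a bucketed hash table; the constants B, S and KEYLEN and the function name bucket_insert are illustrative assumptions, not part of the text above.

```c
#include <string.h>

#define B 26       /* number of buckets (illustrative)  */
#define S 2        /* slots per bucket (illustrative)   */
#define KEYLEN 16

/* Each bucket holds up to S records; 'count' tracks the used slots. */
struct bucket {
    char slot[S][KEYLEN];
    int  count;
};

struct hash_table {
    struct bucket b[B];
};

/* Returns 0 on success, -1 when the target bucket is full (overflow). */
static int bucket_insert(struct hash_table *ht, int addr, const char *key)
{
    struct bucket *bk = &ht->b[addr];
    if (bk->count == S)
        return -1;                       /* overflow: all S slots occupied */
    strncpy(bk->slot[bk->count], key, KEYLEN - 1);
    bk->slot[bk->count][KEYLEN - 1] = '\0';
    bk->count++;
    return 0;
}
```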

An overflow is said to occur when a new identifier I is mapped or hashed by the function f into a full bucket.
Since, the key space is usually much larger than the address space, many keys will be
matched to the same address. Such a many to one mapping results in collisions
between records.

As an example, consider the hash table HT with b = 26 buckets, each bucket having
exactly two slots, i.e. s = 2. The hash function f must map each of the possible
identifiers into one of the numbers 1-26. Here, A-Z correspond to the numbers 1-26
respectively, and the function f is defined by: f(X) = the first character of X. The
identifiers A, B, C, ..., Z will therefore be hashed into buckets 1, 2, 3, ..., 26 respectively.
The identifiers A, A1, A2 are synonyms. For example, if A and A1 are already stored
in the bucket and A2 is to be stored, then an overflow occurs, since s = 2.

HASHING FUNCTIONS:

If X is an identifier chosen at random from the identifier space, then we want the
probability that f(X) = i to be 1/b for all buckets i. Then a random X has an equal
chance of hashing into any of the b buckets. A hash function satisfying this property
is termed a uniform hash function.
Several kinds of uniform hash functions are in use.

(i) MID-SQUARE METHOD:

A key is multiplied by itself and the address is obtained by choosing an
appropriate number of bits or digits from the middle of the square. The
selection of bits or digits depends on the table size, and they should also fit
into one computer word of memory.

E.g. consider the key 56789; when it is squared we get 3224990521. If a
three-digit address is needed, then positions 5 to 7 may be chosen, giving the
address 990.

(ii) DIVISION METHOD:

In this method, the integer x is divided by M and the remainder is used as the
hash address. The hash function is
H(x) = x mod M

Great care should be taken while choosing a value for M; preferably it
should not be an even number. By making M a large prime number, the keys
are spread out evenly.
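A minimal C sketch of the division method, assuming an unsigned integer key and a caller-supplied table size M (the function name is illustrative):

```c
/* Division method: h(x) = x mod M, with M preferably a prime table size. */
unsigned int hash_division(unsigned int x, unsigned int M)
{
    return x % M;
}
```

For example, calling it with a prime M such as 97 spreads the keys over a 97-slot table.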

(iii) FOLDING METHOD:

A key is partitioned into a number of parts, each of which has the same
length as the required address. The parts are then added together, ignoring
the final carry, to form an address. For example, the key 356942781 is to be
transformed into a three-digit address (a code sketch follows the two variations below).
Two types:

1. Fold-shifting: 356, 942 and 781 are added to yield 079.


2. Fold-boundary method: 653, 942 and 187 are added together,
yielding 782.
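A small C sketch of the fold-shifting variant for decimal keys, assuming 3-digit parts as in the example above (the function name is illustrative):

```c
/* Fold-shifting: split the decimal key into 3-digit parts, add them,
 * and keep only the last three digits (ignoring the final carry).   */
unsigned int hash_fold_shift(unsigned long key)
{
    unsigned long sum = 0;
    while (key > 0) {
        sum += key % 1000;              /* take the next 3-digit part */
        key /= 1000;
    }
    return (unsigned int)(sum % 1000);  /* drop the carry beyond 3 digits */
}
```

For the key 356942781 the parts 781, 942 and 356 sum to 2079, and dropping the carry gives the address 079, matching the example.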

(iv) DIGIT ANALYSIS METHOD :

A hashing function referred to as digit analysis forms addresses by selecting


and shifting digits or bits of the original key. For eg., a key 7546123 is
transformed to the address 2164 by selecting digits in positions 3 to 6 and
reversing their order. Digit positions having the most uniform distributions
are selected. This hashing transformation technique has been used in
conjunction with static key sets.

OVERFLOW HANDLING : (or)Collision-Resolution Technique:

Two techniques, open addressing and chaining, are used to handle collisions and
overflows.

The general objective of a collision-resolution technique is to attempt to place


colliding records elsewhere in the table. This requires the investigation of a series of
table positions until an empty one is found to accommodate a colliding record.

OPEN ADDRESSING:

Here, collisions are simply resolved by computing a sequence of hash slots. Two
types of techniques :
1) Linear Probing
2) Quadratic Probing
1. Linear Probing:
Here, the probing function is f(i) = i. It indicates that whenever we
encounter a collision, the next available cell is searched sequentially and the data
element is placed there.
The following figure shows a hash table with seven locations(buckets)
numbered from 0 to 6. Here the divisor we use is 7. Initially, we insert 23 and
its position is 2 as 23 % 7 = 2.

Next, we insert 50 and its position is 1 and the arrangement is as follows.

Then we insert 30 and its position is 2, but the bucket number 2 is already
occupied by 23. So, collision has occurred. Therefore, the value 30 gets next
available cell, which is 3. The orientation is as follows.

Similarly, when we wish to insert 38, we face a collision again. Now it is
placed at the next available cell, which is 4. The arrangement of data elements
is as follows.

Overhead in this technique is the time taken for finding the next available cell.
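A minimal C sketch of insertion with linear probing, assuming a table of size 7 whose cells are initialised to an EMPTY marker, as in the example above (all names are illustrative):

```c
#define TABLE_SIZE 7
#define EMPTY (-1)

int table[TABLE_SIZE];       /* assume every cell is initialised to EMPTY */

/* Insert 'key' using linear probing; returns the index used, or -1 if full. */
int insert_linear(int key)
{
    int start = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (start + i) % TABLE_SIZE;   /* probe the next cell */
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;                                /* table is full */
}
```

Inserting 23, 50, 30 and 38 in that order places them at positions 2, 1, 3 and 4, matching the arrangement described above.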

2. Quadratic Probing:

In this case, when a collision occurs at hash address h, this method searches
the table at locations h+1, h+4, h+9, and so on. The hash function is now defined as

(h(x) + i^2) % hash_size, for i = 1, 2, 3, ...

Now let us consider a table of size 10 and index numbered from 0 to 9. Initially the
table looks as follows.

When we wish to insert 23, we can easily insert at location 3 as shown below.

Next we want to insert 81 and it is easily placed at location 1.

Now we want to insert 93 and as the position 3 is already occupied, collision takes
place. So, the cell with distance one apart is checked and if it is free then the new data
element is placed, which is as shown below.

Now we wish to insert 113. In this case positions 3 and 4 are already occupied, so
the cell at distance 4 is checked; it is found empty, so the new value is placed at
location 7.
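A corresponding C sketch for quadratic probing, assuming a table of size 10 initialised to EMPTY (the names are illustrative):

```c
#define HSIZE 10
#define EMPTY (-1)

int htable[HSIZE];           /* assume every cell is initialised to EMPTY */

/* Insert 'key' using quadratic probing: try (h + i*i) % HSIZE for i = 0, 1, 2, ... */
int insert_quadratic(int key)
{
    int h = key % HSIZE;
    for (int i = 0; i < HSIZE; i++) {
        int pos = (h + i * i) % HSIZE;
        if (htable[pos] == EMPTY) {
            htable[pos] = key;
            return pos;
        }
    }
    return -1;               /* no free probe position found */
}
```

With this sketch, inserting 23, 81, 93 and 113 yields positions 3, 1, 4 and 7, as in the example.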

CHAINING:
One of the reasons Open addressing and its variations perform poorly is that
searching for an identifier involves comparison of identifiers with different hash
values.

Many of the comparisons being made could be saved if we maintained lists of


identifiers, one list per bucket, each list containing all the synonyms for that bucket. If
this were done, a search would then involve computing the hash address f(X) and
examining only those identifiers in the list for f(X).
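A minimal C sketch of such chaining, assuming integer keys and a caller-supplied hash function f (all names are illustrative):

```c
#include <stdlib.h>

#define NBUCKETS 26

struct node {
    int          key;
    struct node *next;
};

struct node *chain[NBUCKETS];   /* one synonym list per bucket, initially NULL */

/* Insert 'key' at the front of the list for its bucket f(key). */
void chain_insert(int key, int (*f)(int))
{
    int b = f(key);
    struct node *n = malloc(sizeof *n);
    n->key  = key;
    n->next = chain[b];          /* link in front of the old list */
    chain[b] = n;
}

/* Search only the synonyms in bucket f(key). Returns 1 if found, 0 otherwise. */
int chain_search(int key, int (*f)(int))
{
    for (struct node *p = chain[f(key)]; p != NULL; p = p->next)
        if (p->key == key)
            return 1;
    return 0;
}
```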

GRAPH

DEFINING GRAPH:

A graph G consists of a set V of vertices (nodes) and a set E of edges (arcs). We
write G = (V,E). V is a finite and non-empty set of vertices. E is a set of pairs of
vertices; these pairs are called edges. Therefore,
V(G), read as V of G, is a set of vertices and E(G), read as E of G, is a set of edges.

An edge e = (v, w) is a pair of vertices v and w, and is said to be incident with v and w.


A graph can be pictorially represented as follows,

FIG: Graph G
We have numbered the vertices of the graph as 1, 2, 3, 4. Therefore, V(G) = {1,2,3,4} and
E(G) = {(1,2),(1,3),(1,4),(2,3),(2,4)}.

3.2.2 BASIC TERMINOLOGIES OF GRAPH:

UNDIRECTED GRAPH:
An undirected graph is one in which the pair of vertices representing an edge is
unordered.

DIRECTED GRAPH:
A directed graph is one in which each edge is an ordered pair of vertices, i.e.
each edge is represented by a directed pair. It is also referred to as a digraph.

DIRECTED GRAPH


COMPLETE GRAPH:
An n-vertex undirected graph with exactly n(n-1)/2 edges is said to be a complete
graph.

The graph G is said to be a complete graph.

SUBGRAPH:
A subgraph of G is a graph G' such that V(G') is a subset of V(G) and E(G') is a subset of E(G). Some
of the subgraphs of G are shown below.
FIG: Graph G and some of its subgraphs (a)-(d)

ADJACENT VERTICES:
A vertex v1 is said to be adjacent to a vertex v2 if there exists an edge (v1,v2) or (v2,v1).

PATH:
A path from vertex V to vertex W is a sequence of vertices, each adjacent to the
next. The length of the path is the number of edges in it.

CONNECTED GRAPH:
A graph is said to be connected if there exists a path from any vertex to any other vertex.

UNCONNECTED GRAPH:
A graph is said to be an unconnected graph if it consists of two or more unconnected components.
FIG: Unconnected Graph



STRONGLY CONNECTED GRAPH:


A digraph is said to be strongly connected if there is a directed path from any vertex
to any other vertex.
FIG: Strongly Connected Digraph

WEAKLY CONNECTED GRAPH:


If there does not exist a directed path from every vertex to every other vertex, then it is said
to be a weakly connected graph.
FIG: Weakly Connected Digraph

CYCLE
A cycle is a path in which the first and the last vertices are the same.
Eg. 1,3,4,7,2,1

DEGREE:
The number of edges incident on a vertex determines its degree. For a digraph there are two types
of degrees: in-degree and out-degree.

The IN-DEGREE of a vertex V is the number of edges for which vertex V is a head, and
the OUT-DEGREE is the number of edges for which vertex V is a tail.

A GRAPH IS SAID TO BE A TREE IF IT SATISFIES TWO PROPERTIES:
(1) it is connected
(2) there are no cycles in the graph.

3.2.3 GRAPH REPRESENTATION:

A graph can be represented by the following three methods:


1. Adjacency matrix.
2. Adjacency list.
3. Adjacency multi-list.

3.2.3.1 ADJACENCY MATRIX:

The adjacency matrix A for a graph G = (V,E) with n vertices is an n x n matrix of
bits, such that
A(i,j) = 1 if there is an edge from vi to vj, and
A(i,j) = 0 if there is no such edge.
The adjacency matrix for the graph G is,

0 1 1 1
1 0 1 1
1 1 0 1
1 1 1 0

The space required to represent a graph using its adjacency matrix is n x n bits. About
half this space can be saved in the case of an undirected graph by storing only the
upper or lower triangle of the matrix. From the adjacency matrix, one may readily
determine whether there is an edge connecting any two vertices i and j. For an undirected
graph the degree of any vertex i is its row sum, A(i,1) + A(i,2) + ... + A(i,n). For a directed graph the
row sum is the out-degree and the column sum is the in-degree.
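As a small illustration, the following C sketch computes a vertex degree as a row sum of the adjacency matrix; the matrix size N and the function name are assumptions for the example.

```c
#define N 4

/* For an undirected graph, the degree of vertex i is its row sum.
 * For a digraph, the row sum gives the out-degree and the column sum the in-degree. */
int degree(int A[N][N], int i)
{
    int d = 0;
    for (int j = 0; j < N; j++)
        d += A[i][j];
    return d;
}
```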

The adjacency matrix is a simple way to represent a graph, but it has two
disadvantages:
1. it takes O(n^2) space to represent a graph with n vertices, even for sparse graphs, and
2. it takes O(n^2) time to solve most graph problems, since almost all of the n^2 entries must be examined.

3.2.3.2. ADJACENCY LIST:

In this representation the n rows of the adjacency matrix are represented
as n linked lists. There is one list for each vertex in G. The nodes in list i represent the
vertices that are adjacent to vertex i. Each node has at least two fields: VERTEX
and LINK. The VERTEX field contains the index of a vertex adjacent to vertex
i. In the case of an undirected graph with n vertices and e edges, this representation
requires n head nodes and 2e list nodes.
Vertex 1: 2 -> 3 -> 4 -> 0
Vertex 2: 1 -> 3 -> 4 -> 0
Vertex 3: 1 -> 2 -> 4 -> 0
Vertex 4: 1 -> 2 -> 3 -> 0
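A minimal C sketch of this representation, assuming integer vertex indices; the structure and function names are illustrative.

```c
#include <stdlib.h>

#define N 4                       /* number of vertices (illustrative) */

struct listnode {
    int              vertex;      /* VERTEX field: index of an adjacent vertex */
    struct listnode *link;        /* LINK field: next node in the list         */
};

struct listnode *head[N];         /* one head pointer per vertex, initially NULL */

/* Add an undirected edge (u, v): one list node on each list, 2e nodes in total. */
void add_edge(int u, int v)
{
    struct listnode *a = malloc(sizeof *a);
    a->vertex = v; a->link = head[u]; head[u] = a;

    struct listnode *b = malloc(sizeof *b);
    b->vertex = u; b->link = head[v]; head[v] = b;
}
```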

The degree of a vertex in an undirected graph may be determined by just counting
the number of nodes in its adjacency list. The total number of edges in G may,
therefore, be determined in time O(n+e). In the case of a digraph the number of list
nodes is only e. The out-degree of any vertex can be determined by counting the
number of nodes in its adjacency list. The total number of edges in G can, therefore,
be determined in O(n+e).

3.2.3.3 ADJACENCY MULTILIST:

In the adjacency list representation of an undirected graph each edge (vi, vj) is
represented by two entries, one on the list for vi and the other on the list for vj.

In the adjacency multilist representation, for each edge there is exactly one node, but this node is on two lists, i.e.
the adjacency lists of the two vertices it is incident to. The node structure now
becomes

M V1 V2 Link for V1 Link for V2

Where M is a one bit mark field that may be used to indicate whether or not
the edge has been examined.

The adjacency multilist diagram is as follows. Each edge node has the form [vi | vj | link for vi | link for vj]:

N1: [1 | 2 | N2 | N4]   edge (1,2)
N2: [1 | 3 | N3 | N4]   edge (1,3)
N3: [1 | 4 | 0  | N5]   edge (1,4)
N4: [2 | 3 | N5 | N6]   edge (2,3)
N5: [2 | 4 | 0  | N6]   edge (2,4)
N6: [3 | 4 | 0  | 0 ]   edge (3,4)

The lists are:
V1: N1 -> N2 -> N3
V2: N1 -> N4 -> N5
V3: N2 -> N4 -> N6
V4: N3 -> N5 -> N6

FIG: Adjacency Multilists for G



3.2.4 TRAVERSAL:
Given an undirected graph G = (V,E) and a vertex v in V(G), we are interested in
visiting all vertices in G that are reachable from v (that is, all vertices connected to v).
We have two ways to do the traversal. They are

1. DEPTH FIRST SEARCH

2. BREADTH FIRST SEARCH.

3.2.4.1 DEPTH FIRST SEARCH :


In graphs, we do not have any start vertex or any special vertex singled out to start
traversal from. Therefore the traversal may start from any arbitrary vertex.

We start with, say, vertex v. An adjacent vertex is selected and a Depth First Search is
initiated from it, i.e. if V1, V2, ..., Vk are the vertices adjacent to vertex v, we may select
any vertex from this list. Say we select V1. Now all the vertices adjacent to V1 are
identified and visited; next V2 is selected and all its adjacent vertices
visited, and so on. This process continues till all the vertices are visited. It is very
much possible that we reach an already traversed vertex a second time. Therefore we have to set
a flag to check whether a vertex has already been visited.

Let us see an example, consider the following graph.

FIG: Graph with vertices V1 to V8

Let us start with V1.


Its adjacent vertices are V2, V8, and V3. Let us pick on V2.
Its adjacent vertices are V1, V4, V5. V1 is already visited; let us pick on V4.
Its adjacent vertices are V2, V8.
V2 is already visited; let us visit V8.
Its adjacent vertices are V4, V5, V1, V6, V7.
V4 and V1 are visited. Let us traverse V5.
Its adjacent vertices are V2, V8. Both are already visited; therefore, we back
track.
We had V6 and V7 unvisited in the list of V8. We may visit either; we visit V6.

Its adjacent vertices are V8 and V3. Obviously the choice is V3.


Its adjacent vertices are V1, V7 . We visit V7.

All the adjacent vertices of V7 are already visited, we back track and find that we
have visited all the vertices.
Therefore the sequence of traversal is
V1, V2, V4, V5, V6, V3, V7.
This is not a unique or the only sequence possible using this traversal method.

We may implement the Depth First Search by using a stack, pushing all unvisited
vertices adjacent to the one just visited and popping the stack to find the next vertex to visit.

This procedure is best described recursively as in

Procedure DFS(v)
// Given an undirected graph G = (V,E) with n vertices and an array VISITED(n)
initially set to zero. This algorithm visits all vertices reachable from v. G and
VISITED are global. //

void dfs(vertex v)
{
    visited[v] = TRUE;
    for each vertex w adjacent to v
        if (!visited[w])
            dfs(w);
}

COMPUTING TIME:

In case G is represented by adjacency lists, the vertices w adjacent to v can be
determined by following a chain of links. Since the algorithm DFS examines
each node in the adjacency lists at most once and there are 2e list nodes, the time to
complete the search is O(e). If G is represented by its adjacency matrix, then the
time to determine all vertices adjacent to v is O(n). Since at most n vertices are visited,
the total time is O(n^2).

3.2.4.2 BREADTH FIRST SEARCH:

In DFS we pick on one of the adjacent vertices; visit all of its adjacent vertices and
back track to visit the unvisited adjacent vertices. In BFS , we first visit all the
adjacent vertices of the start vertex and then visit all the unvisited vertices adjacent to
these and so on.

Let us consider the same example, given in the figure. We start, say, with V1. Its adjacent
vertices are V2, V8, V3. We visit all one by one. We pick on one of these, say V2. The
unvisited vertices adjacent to V2 are V4, V5. We visit both. We go back to the
remaining visited vertices of V1 and pick on one of these, say V3. The unvisited
adjacent vertices to V3 are V6, V7. There are no more unvisited adjacent vertices of V8,
V4, V5, V6 and V7.

FIG: Breadth First Search stages (a)-(d)

Thus the sequence so generated is V1, V2, V8, V3, V4, V5, V6, V7. Here we need a queue
instead of a stack to implement it. We add unvisited vertices adjacent to the one just
visited at the rear and remove from the front to find the next vertex to visit.
Algorithm BFS gives the details.

Procedure BFS(v)

//A breadth first search of G is carried out beginning at vertex v. All vertices visited
are marked as VISITED(I) = 1. The graph G and array VISITED are global and
VISITED is initialised to 0.//
VISITED(v) <- 1
Initialise Q to be empty //Q is a queue//
loop
for all vertices w adjacent to v do
if VISITED(w) = 0 //add w to queue//
then [call ADDQ(w, Q); VISITED(w) <- 1] //mark w as VISITED//
end

if Q is empty then return


call DELETEQ(v,Q)
forever
end BFS

COMPUTING TIME :

Each vertex visited gets into the queue exactly once, so the outer "loop ... forever" is iterated at
most n times. If an adjacency matrix is used, then the for loop takes O(n) time for
each vertex visited. The total time is, therefore, O(n^2). In case adjacency lists are used,
the for loop has a total cost of d1 + d2 + ... + dn = O(e), where di = degree(vi). Again, all
vertices visited, together with all edges incident to them, form a connected component of
G.

3.2.5 APPLICATIONS OF GRAPH TRAVERSAL:

Two simple applications of graph traversal are:


1. finding the connected components of a graph, and
2. finding a spanning tree of a connected graph.

3.2.5.1 CONNECTED COMPONENTS:

If G is an undirected graph, we can determine whether or not it is connected by making a call to either DFS or BFS and then determining if there is any
unvisited vertex.

The time to do this is O(n^2) if adjacency matrices are used and O(e) if adjacency lists
are used. All the connected components of a graph are obtained by making repeated
calls to either DFS(v) or BFS(v), with v a vertex not yet visited.

The following algorithm determines all the connected components of G.


The algorithm uses DFS . Instead BFS may be used. The computing time is not
affected.

ALGORITHM:

Procedure COMP(G, n)
// determine the connected components of G. G has n >= 1 vertices. VISITED is now a
local array //
for i <- 1 to n do
  VISITED(i) <- 0 // initialise all vertices as unvisited //
end
for i <- 1 to n do
  if VISITED(i) = 0 then [call DFS(i); // find a component //
    output all newly visited vertices together with all edges incident to them]
end
end COMP


3.2.5.2 SPANNING TREES AND MINIMUM COST SPANNING TREES:

SPANNING TREE:
A tree T is a spanning tree of a connected graph G(V,E) if
1. every vertex of G belongs to an edge in T, and
2. the edges in T form a tree.

CONSTRUCTING A SPANNING TREE:

Let us see how we construct a spanning tree for a given graph.
Take any vertex v as an initial partial tree and add edges one by one so that each edge
adds a new vertex to the partial tree.
In general, if there are n vertices in the graph, we shall construct a spanning tree in (n-1)
steps, i.e. (n-1) edges need to be added.
The following figure shows a complete graph and three of its spanning trees.

When either DFS or BFS is used, the edges of T form a spanning tree.
The spanning tree resulting from a call to DFS is known as a depth first spanning tree.

When BFS is used the resulting spanning tree is called a breadth first spanning tree.
Consider the graph

FIG: Graph with vertices V1 to V8

The following figure shows the spanning trees resulting from depth first and breadth
first search starting at vertex V1 in the above graph.

FIG: DFS(V1) spanning tree and BFS(V1) spanning tree

MINIMUM COST SPANNING TREES (APPLICATIONS OF SPANNING


TREE)

BUILDING A MINIMUM SPANNING TREE:


If the nodes of G represent cities and the edges represent possible communication
links connecting two cities, then the minimum number of links needed to connect the
n cities is n-1. The spanning trees of G represent all feasible choices.
This application of spanning trees arises from the property that a spanning tree is a
minimal subgraph G' of G such that V(G') = V(G) and G' is connected. A
minimal subgraph is one with the fewest number of edges.

The edges will have weights associated with them, i.e. the cost of communication. Given the
weighted graph, one has to construct a set of communication links that would connect all
the cities with minimum total cost.

Since the links selected will form a tree, we are going to find the spanning tree of the
given graph with minimum cost. The cost of a spanning tree is the sum of the costs of
the edges in the tree.

KRUSKAL'S METHOD FOR CREATING A SPANNING TREE:

One approach to determining a minimum cost spanning tree of a graph has been given by
Kruskal. In this approach we need to select (n-1) edges of G, edge by edge, such that
these form a minimum cost spanning tree of G. We select the least cost edge first, then the next
least cost edge, and so on. An edge is included in the tree only if it does not form a cycle with the edges already in T.

ALGORITHM:

1. void
2. kruskal(graph G)
3. {
4.     Sets s(num_vertex);
5.     Binary_heap<edge> H(num_edge);
6.     Vertex U, V;
7.     Set_type U_set, V_set;
8.     edge E;
9.     int edges_accepted = 0;
10.    read_graph_into_heap_array(G, H);
11.    H.build_heap();
12.    while (edges_accepted < num_vertex - 1)
13.    {
14.        E = H.delete_min();              // E = (U, V)
15.        U_set = s.find_and_compress(U);
16.        V_set = s.find_and_compress(V);
17.        if (U_set != V_set)
18.        {
19.            // accept the edge
20.            edges_accepted++;
21.            s.union_by_height(U_set, V_set);
22.        }
23.    }
24. }
Initially, E is the set of all edges in G. From this set we repeatedly
(i) determine an edge with minimum cost, and
(ii) delete that edge from the set.
This can be done efficiently if the edges in E are maintained as a sorted sequential list or, as above, a min-heap.

In order to perform the two find operations (lines 15 and 16 above) efficiently, the vertices in G should be
grouped together in such a way that one may easily determine whether the vertices V and W
are already connected by the earlier selection of edges.

In case they are, then the edge (V,W) is to be discarded. If they are not, then (V,W) is to
be added to T.

One possible grouping is to place all vertices in the same connected component of T
into one set. Then two vertices V, W are connected in T if and only if they are in the same set.
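A minimal C sketch of this grouping using a union-find (disjoint set) structure with path compression and union by height; the array and function names are illustrative assumptions.

```c
#define MAXV 100

int parent[MAXV];                 /* parent[i] = i when i is a set root */
int setheight[MAXV];

void make_sets(int n)
{
    for (int i = 0; i < n; i++) { parent[i] = i; setheight[i] = 0; }
}

/* Find the root of v's set, compressing the path along the way. */
int find_set(int v)
{
    if (parent[v] != v)
        parent[v] = find_set(parent[v]);
    return parent[v];
}

/* Union by height; returns 0 if v and w were already connected (edge rejected). */
int union_sets(int v, int w)
{
    int rv = find_set(v), rw = find_set(w);
    if (rv == rw)
        return 0;                               /* (v, w) would form a cycle */
    if (setheight[rv] < setheight[rw])       parent[rv] = rw;
    else if (setheight[rv] > setheight[rw])  parent[rw] = rv;
    else { parent[rw] = rv; setheight[rv]++; }
    return 1;                                   /* edge accepted into T */
}
```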

3.2.6 SHORTEST PATHS AND TRANSITIVE CLOSURE

We know that graph traversal travels through the edges of the graph, and the weight
associated with each edge may reflect distance, time or cost of travel.
In many applications, we are required to find a shortest path, i.e. a path having
the minimum weight, between two vertices.

SHORTEST PATH :

We know that a path is a sequence of vertices such that there is an edge that we can
follow between each consecutive pair of vertices. The length of the path is the sum of the
weights of the edges on that path.

The starting vertex of the path is called the source vertex and the last vertex of the
path is called the destination vertex.

Shortest path from vertex V to vertex W is a path for which the sum of the weights of
the arcs or edges on the path is minimum.

Here, a path that may look longer in the number of edges and vertices visited may at times
be shorter cost-wise.

FINDING SHORTEST PATH :

Two kinds of problem in finding shortest path are:


1. Single source shortest path.
2. All pairs shortest path.

3.2.6.1 SINGLE SOURCE SHORTEST PATH :


Here we have a single source vertex, and we have to find the shortest path from the source
vertex V to every other vertex of the graph.

In this problem we are given a graph G=(V,E),a weighting function W(e) for the
edges of G and a source vertex Vo. The problem is to determine the shortest paths
from Vo to all the remaining vertices of G.

Consider the following weighted graph with 8 vertices.

There are many paths from A to H. Length of path AFDEH=1+3+4+6=14.


Another path from A to H is ABCEH. Its length is 2+2+3+6=13.

We may further look for a path with length shorter than 13,if exists. For graphs with a
small number of vertices and edges, we may write all the possible combinations to
find shortest path.

Further, we have to find shortest path from A to all the remaining vertices. Shortest
path from A to all other vertices (ie) B,C,D,E,F&G are given below:

PATH LENGTH
AB 2
ABC 4
ABD 4
AFE 4
AFG 6

AF 1
ABCH 5

ALGORITHM TO GENERATE SHORTEST PATH :

Before designing algorithm, let us make some observations that can be used to
determine shortest path. Let S denote the set of vertices including Vo to which the
shortest paths have already been found. For W not in S, let DIST(W) be the length of
the shortest path starting from Vo going through only those vertices which are in S
and ending at W.
OBSERVATIONS:
1. If the next shortest path is to vertex U, then the path begins at Vo, ends at U
and other intermediate vertices in path are in S.
2. The destination of the next path generated must be that vertex U which has
the minimum distance, DIST(U), among all vertices not in S.
3. Having selected a vertex U, and generated the shortest Vo to U path, vertex
U becomes a member of S. At this point the length of the shortest paths starting at Vo,
going through vertices only in S and ending at a vertex W not in S may decrease.

This is due to a shorter path starting at Vo, going to U and then to W. The intermediate
vertices on the Vo to U path and the U to W path must all be in S.

By the assumption that shortest paths are generated in non-descending order of length, the Vo
to U path must be the shortest such path, and the U to W path can be chosen so as not to
contain any intermediate vertices.

Therefore, the Vo to U path is the shortest such path and the path from U to W is the edge (U,W).
Therefore, the length of the path Vo to W is DIST(W) = DIST(U) + length(U,W).

The first shortest path algorithm, given by Dijkstra, makes use of these
observations to determine the cost of the shortest paths from Vo to all other vertices in
G.

It is assumed that the n vertices of G are numbered 1 through n. The set S is maintained
as a bit array with S(i) = 0 if vertex i is not in S and S(i) = 1 if it is.

It is assumed that the graph itself is represented by its cost adjacency matrix, with
COST(i,j) being the weight of the edge <i,j>. COST(i,j) will be set to some large
number in case the edge <i,j> is not in G; this can be any suitably large
non-negative number without affecting the outcome of the algorithm.

ALGORITHM:

Procedure SHORTEST_PATH(v, COST, DIST, n)
// DIST(j), 1 <= j <= n, is set to the length of the shortest path from vertex v to vertex j in a
digraph G with n vertices. DIST(v) is set to zero. G is represented by its cost
adjacency matrix COST(n,n). S(1:n) is a bit array. //
1. for i <- 1 to n do // initialise set S to empty //
2.   S(i) <- 0; DIST(i) <- COST(v,i)
3. end
4. S(v) <- 1; DIST(v) <- 0; num <- 2 // put vertex v in S //
5. while num < n do // determine n-1 paths from vertex v //
6.   choose u such that DIST(u) = min{DIST(w) : S(w) = 0}
7.   S(u) <- 1; num <- num + 1 // put vertex u in set S //
8.   for all w with S(w) = 0 do // update distances //
9.     DIST(w) <- min{DIST(w), DIST(u) + COST(u,w)}
10.  end
11. end
12. end SHORTEST_PATH

ANALYSIS OF ALGORITHM SHORTEST PATH:


The time taken by the algorithm on a graph with n vertices is O(n^2).

Here, the while loop is executed n-2 times. Each execution of this loop requires O(n)
time at line 6 to select the next vertex and again at lines 8-10 to update DIST. So the
total time for the while loop is O(n^2).

Any shortest path algorithm must examine each edge in the graph at least once, since
any of the edges could be in a shortest path. Since a cost adjacency matrix was used to
represent the graph, it takes O(n^2) time just to determine which edges are in G, and
so any shortest path algorithm using this representation must take O(n^2) time.

3.2.6.2 ALL PAIRS SHORTEST PATHS:

The all pairs shortest path problem calls for finding the shortest paths between all
pairs of vertices Vi, Vj, i!=j.

One possible solution is to apply the algorithm SHORTEST_PATH n times,
once with each vertex in V(G) as the source vertex.

The graph G is represented by its cost adjacency matrix with COST(i,i) = 0 and
COST(i,j) = +infinity in case edge <i,j>, i != j, is not in G.

Ak(i,j) is defined to be the length of the shortest path from i to j going through no
intermediate vertex of index greater than k, with A0(i,j) = COST(i,j). The idea is to successively generate the
matrices A0, A1, A2, ..., An.
If we have already generated Ak-1, then we can generate Ak by
realizing that for any pair of vertices i, j either
(i) the shortest path from i to j going through no vertex with index greater than
k does not go through the vertex with index k, and so its length is Ak-1(i,j); or

(ii) the shortest such path goes through vertex k. Such a path consists of a path
from i to k and another one from k to j. These paths must
be the shortest paths from i to k and from k to j going through no vertex with index
greater than k-1, and so their costs are Ak-1(i,k) and Ak-1(k,j). This is true only if G
has no cycle of negative length containing vertex k.

If this is not true, then the shortest i to j path going through no vertex of index
greater than k may make several cycles through k and thus have a length less than
Ak-1(i,k) + Ak-1(k,j). So the value of Ak(i,j) is
Ak(i,j) = min{Ak-1(i,j), Ak-1(i,k) + Ak-1(k,j)}, k >= 1, and
A0(i,j) = COST(i,j).
EXAMPLE:

The following fig. Shows a digraph with its matrix A0

For this graph A2(1,3) is not equal to min{A1(1,3), A1(1,2) + A1(2,3)} = 2. Instead we see that

A2(1,3) = -infinity, as the length of the path from 1 to 3 can be made arbitrarily small. This
is so because of the cycle 1 2 1, which has a length of -1.

ALGORITHM:
The algorithm ALL_COSTS computes An(i,j). The computation is done in place, so
the superscript on A is dropped. The reason the computation can be carried out in
place is that Ak(i,k) = Ak-1(i,k) and Ak(k,j) = Ak-1(k,j), and so the in-place
computation does not alter the outcome.

PROCEDURE ALL_COSTS (COST,A,N)


// COST(n,n) is the cost adjacency matrix of a graph with n vertices; A(i,j) is
the cost of the shortest path between vertices vi and vj.
COST(i,i) = 0, 1 <= i <= n //
1. for i <- 1 to n do
2.   for j <- 1 to n do
3.     A(i,j) <- COST(i,j) // copy COST into A //
4.   end
5. end
6. for k <- 1 to n do // for a path with highest vertex index k //
7.   for i <- 1 to n do // for all possible pairs of vertices //
8.     for j <- 1 to n do
9.       A(i,j) <- min{A(i,j), A(i,k) + A(k,j)}
10.    end
11.  end
12. end
13. end ALL_COSTS

ANALYSIS:

The looping is independent of the data in matrix A and so we can easily say that the
computing time of this algorithm is O(n^3).

EXAMPLE:
Let us consider the following digraph G:

The cost matrix of G is

The initial A matrix, A0 plus its value after 3 iterations A(1),A(2),A(3) is given below:

3.2.6.3 TRANSITIVE CLOSURE :

As in the all pairs shortest path problem, it is necessary to determine for every pair of
vertices i, j the existence of a path from i to j.

Two cases of interest here are


I. When all path lengths (i.e. the number of edges on the path) are required to
be positive
II. And the other when path lengths are to be non negative.

If A is the adjacency matrix of G, then the matrix A+, having the property A+(i,j) = 1
if there is a path of length > 0 from i to j and 0 otherwise, is called the transitive
closure matrix of G.

The matrix A*, with the property A*(i,j) = 1 if there is a path of length >= 0
from i to j and 0 otherwise, is the reflexive transitive closure matrix of G.

EXAMPLE :
Consider the following digraph G

fig (a): digraph G with vertices 1, 2, 3, 4, 5

ADJACENCY MATRIX A for G is

1 0 1 0 0 0

2 0 0 1 0 0

3 0 0 0 1 0

4 0 0 0 0 1

5 0 0 1 0 0

fig (b)

1 2 3 4 5
A+: 1 0 1 1 1 1

2 0 0 1 1 1

3 0 0 1 1 1

4 0 0 1 1 1

5 0 0 1 1 1

fig (c)

1 2 3 4 5
A*: 1 1 1 1 1 1

2 0 1 1 1 1

3 0 0 1 1 1

4 0 0 1 1 1

5 0 0 1 1 1

fig (d)

The only difference between A* and A+ is in the terms on the diagonal.

A+(i,i) = 1 only if there is a cycle of length >= 1 containing vertex i, while A*(i,i)
is always 1, as there is a path of length 0 from i to i.

If we use algorithm ALL_COSTS with COST(i,j) = 1 if <i,j> is an edge in G and
COST(i,j) = +infinity if <i,j> is not in G, then we can easily obtain A+ from the final
matrix A by letting A+(i,j) = 1 if A(i,j) < +infinity.

A* can then be obtained by setting all diagonal elements equal to 1.

In this case the algorithm ALL_COSTS can be simplified by changing line 9 to

A(i,j) <- A(i,j) or (A(i,k) and A(k,j))

where COST(i,j) is now just the adjacency matrix of G.

With this modification, the resultant matrix A will be A+.
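A minimal C sketch of this boolean (Warshall-style) computation of A+ and A*, assuming a 5-vertex graph as in the example above; the array and function names are illustrative.

```c
#define N 5

/* aplus[i][j] becomes 1 iff there is a path of length > 0 from i to j.
 * 'adj' is the ordinary adjacency matrix of G.                        */
void transitive_closure(int adj[N][N], int aplus[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            aplus[i][j] = adj[i][j];                /* A0 = adjacency matrix */

    for (int k = 0; k < N; k++)                     /* simplified line 9 of ALL_COSTS */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                aplus[i][j] = aplus[i][j] || (aplus[i][k] && aplus[k][j]);
}

/* A* differs from A+ only on the diagonal: set every A*(i,i) to 1. */
void reflexive_closure(int aplus[N][N], int astar[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            astar[i][j] = (i == j) ? 1 : aplus[i][j];
}
```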



3.2.7 TOPOLOGICAL SORT

A directed graph G in which the vertices represent tasks/events and the edges
represent activities to move from one event to another is known as an
activity-on-vertex network (AOV-network) or PERT graph.

A PERT graph is made to analyze the interrelated activities of complex projects. The
purpose of a topological sort is to order the events in a linear manner, with the
restriction that an event cannot precede other events that must first take place.

A topological sort defines a linear ordering on the nodes of a directed graph
with the following property: if node P is a predecessor of node Q, then Q cannot
be a predecessor of node P. A topological sort cannot order the nodes in a cycle.

ALGORITHM: To sort the tasks into topological order:

Let n be the number of vertices.
1. for i <- 1 to n do // output the vertices //
2.   if every vertex has a predecessor then (the network has a cycle; stop)
3.   pick a vertex V which has no predecessor
4.   output V
5.   delete V and all edges leading out of V from the network
6. end

Applying this to the given graph, we get:


a) Initial

b) V1

c) V4

d) V3

e) V6

f) V2

g) V5

Topological order generated is:


V1,V4,V3,V6,V2,V5.

Here at step (b) we have 3 vertices V2, V3 and V4 which have no predecessors, and any of
these can be the next vertex in topological order.

In this problem, the functions are:


i) decide whether a vertex has any predecessor, and
ii) delete a vertex together with all its incident edges.

- This can be efficiently done if, for each vertex, a count of the number of its immediate
predecessors is kept.
- This can be easily implemented if the network is represented by its adjacency lists.

The following algorithm assumes that the network is represented by adjacency


lists.

ALGORITHM:
Procedure TOPOLOGICAL_ORDER(COUNT, VERTEX, LINK, n)
// the n vertices of an AOV-network are listed in topological order.
The network is represented as a set of adjacency lists with
COUNT(i) = the in-degree of vertex i //
1) top <- 0 // initialise stack //
2) for i <- 1 to n do // create a linked stack of vertices with no predecessors //
3)   if COUNT(i) = 0 then [COUNT(i) <- top; top <- i]
4) end
5) for i <- 1 to n do // print the vertices in topological order //
6)   if top = 0 then [print('network has a cycle'); stop]
7)   j <- top; top <- COUNT(top); print(j) // unstack a vertex //
8)   ptr <- LINK(j)
9)   while ptr <> 0 do
       // decrease the count of successor vertices of j //

10)    K <- VERTEX(ptr) // K is a successor of j //

11)    COUNT(K) <- COUNT(K) - 1 // decrease count //
12)    if COUNT(K) = 0 // add vertex K to stack //
13)    then [COUNT(K) <- top; top <- K]
14)    ptr <- LINK(ptr)
15)  end
16) end
17) end TOPOLOGICAL_ORDER

Head node: [ COUNT | LINK ]        List node: [ VERTEX | LINK ]

The head nodes of these lists contain two fields : COUNT & LINK. The COUNT
field contains the in-degree of that vertex and LINK is a pointer to the first node on
the adjacency list. Each list node has 2 fields:VERTEX & LINK.

The COUNT fields can be set at the time of input. When edge <i,j> is input, the count
of vertex j is incremented by 1. The list of vertices with zero count is maintained as a
stack.
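A minimal C sketch of these node structures and of processing an input edge <i, j>; the struct and function names are illustrative assumptions.

```c
#include <stdlib.h>

struct listnode {
    int              vertex;     /* VERTEX: index of a successor vertex */
    struct listnode *link;       /* LINK: next successor in the list    */
};

struct headnode {
    int              count;      /* COUNT: in-degree of this vertex        */
    struct listnode *link;       /* LINK: first node on the adjacency list */
};

/* When edge <i, j> is read in, link j onto i's successor list and
 * increment the in-degree count of j.                              */
void input_edge(struct headnode head[], int i, int j)
{
    struct listnode *n = malloc(sizeof *n);
    n->vertex = j;
    n->link   = head[i].link;
    head[i].link = n;
    head[j].count++;
}
```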
