Unit-IV Notes Data Structure.docx
Graph Data Structure is a non-linear data structure consisting of vertices and edges. It is
useful in fields such as social network analysis, recommendation systems, and computer
networks. In the field of sports data science, graph data structure can be used to analyze and
understand the dynamics of team performance and player interactions on the field.
The above figure shows the mapping among the vertices (A, B, C, D, E); this mapping is
represented using an adjacency matrix.
There exist different adjacency matrices for the directed and undirected graph. In a directed
graph, an entry Aij will be 1 only when there is an edge directed from Vi to Vj.
Adjacency matrix for a directed graph
In a directed graph, edges represent a specific path from one vertex to another vertex.
Suppose a path exists from vertex A to another vertex B; it means that node A is the initial
node, while node B is the terminal node.
Consider the directed graph below and try to construct its adjacency matrix.
In the above graph, we can see there is no self-loop, so the diagonal entries of the adjacency
matrix are 0.
Adjacency matrix for a weighted directed graph
It is similar to an adjacency matrix representation of a directed graph except that instead of
using the '1' for the existence of a path, here we have to use the weight associated with the
edge. The weights on the graph edges will be represented as the entries of the adjacency
matrix. We can understand it with the help of an example. Consider the below graph and its
adjacency matrix representation. In the representation, we can see that the weight associated
with the edges is represented as the entries in the adjacency matrix.
In the above image, we can see that the adjacency matrix representation of the weighted
directed graph is different from other representations. It is because, in this representation, the
non-zero values are replaced by the actual weight assigned to the edges.
An adjacency matrix is easy to implement and follow. It is a good choice when the graph is
dense, that is, when the number of edges is large.
Although an adjacency matrix is advantageous in these ways, it consumes more space: a graph
with V vertices always needs V x V entries, so even if the graph is sparse, the matrix still
consumes the same space.
Linked list representation
An adjacency list is used in the linked representation to store the Graph in the computer's
memory. It is efficient in terms of storage as we only have to store the values for edges.
Let's see the adjacency list representation of an undirected graph.
In the above figure, we can see that there is a linked list (adjacency list) for every node of
the graph. From vertex A, there are edges to vertex B and vertex D, so these nodes are linked
to node A in the given adjacency list.
An adjacency list is maintained for each node present in the graph; each list entry stores a
node value and a pointer to the next node adjacent to the respective node. Once all the
adjacent nodes have been added, NULL is stored in the pointer field of the last node of the list.
The sum of the lengths of adjacency lists is equal to twice the number of edges present in an
undirected graph.
Now, consider the directed graph, and let's see the adjacency list representation of that graph.
For a directed graph, the sum of the lengths of adjacency lists is equal to the number of edges
present in the graph.
Now, consider the weighted directed graph, and let's see the adjacency list representation of
that graph.
In the case of a weighted directed graph, each list node contains an extra field that stores
the weight of the corresponding edge.
In an adjacency list, it is easy to add a vertex. Because of using the linked list, it also saves
space.
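The adjacency list idea above can be sketched in C++ as follows. This is only a minimal sketch: it uses std::vector in place of hand-written linked lists with NULL terminators, and the names AdjList, buildAdjList and totalListLength are made up for this illustration. It also lets us check the two length properties stated above (2E for an undirected graph, E for a directed graph).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One list of neighbours per vertex (vectors stand in for linked lists here).
using AdjList = std::vector<std::vector<int>>;

// Build an adjacency list for n vertices from (u, v) edge pairs.
// For an undirected graph each edge is stored in both endpoint lists.
AdjList buildAdjList(int n, const std::vector<std::pair<int, int>>& edges,
                     bool directed) {
    AdjList adj(n);
    for (const auto& e : edges) {
        adj[e.first].push_back(e.second);
        if (!directed) adj[e.second].push_back(e.first);
    }
    return adj;
}

// Sum of the lengths of all adjacency lists.
std::size_t totalListLength(const AdjList& adj) {
    std::size_t total = 0;
    for (const auto& neighbours : adj) total += neighbours.size();
    return total;
}
```

For a hypothetical graph with vertices A..D numbered 0..3 and edges {0,1}, {0,3}, {1,2}, {2,3}, the undirected list for vertex 0 holds 1 and 3, and the total list length is 8 (twice the 4 edges); built as a directed graph, the total is 4. For a weighted graph, each entry would become a (neighbour, weight) pair.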
Here are the two most common ways to represent a graph. For simplicity, we are going to
consider only unweighted graphs here.
1. Adjacency Matrix
2. Adjacency List
Adjacency Matrix Representation
An adjacency matrix is a way of representing a graph as a matrix of booleans (0’s and 1’s).
Let’s assume there are n vertices in the graph. So, create a 2D matrix adjMat[n][n] having
dimension n x n.
● If there is an edge from vertex i to j, mark adjMat[i][j] as 1.
● If there is no edge from vertex i to j, mark adjMat[i][j] as 0.
Representation of Undirected Graph as Adjacency Matrix:
The below figure shows an undirected graph. Initially, the entire matrix is initialized to 0. If
there is an edge from source to destination, we insert 1 in both cells
(adjMat[source][destination] and adjMat[destination][source]) because we can go either way.
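The construction just described can be sketched as below. The function name buildAdjMatrix and the edge-pair input format are assumptions made for this illustration; only adjMat comes from the text.

```cpp
#include <utility>
#include <vector>

// Build an n x n adjacency matrix for an undirected, unweighted graph.
// `edges` holds (source, destination) pairs with 0-based vertex indices.
std::vector<std::vector<int>> buildAdjMatrix(
        int n, const std::vector<std::pair<int, int>>& edges) {
    // Initially the entire matrix is 0 (no edges).
    std::vector<std::vector<int>> adjMat(n, std::vector<int>(n, 0));
    for (const auto& e : edges) {
        int source = e.first, destination = e.second;
        adjMat[source][destination] = 1;
        adjMat[destination][source] = 1;  // undirected: we can go either way
    }
    return adjMat;
}
```

For a directed graph, the mirrored assignment would be dropped; for a weighted graph, the edge weight would be stored instead of 1.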
Output: 0 2 4 3 1
Explanation: DFS Steps:
● Start at 0: Mark as visited. Output: 0
● Move to 2: Mark as visited. Output: 2
● Move to 4: Mark as visited. Output: 4 (backtrack to 2, then backtrack to 0)
● Move to 3: Mark as visited. Output: 3 (backtrack to 0)
● Move to 1: Mark as visited. Output: 1
Note that there can be multiple DFS traversals of a graph according to the order in
which we pick adjacent vertices. Here we pick vertices as per the insertion order.
DFS from a Given Source of Undirected Graph:
The algorithm starts from a given source and explores all vertices reachable from
that source. It is similar to Preorder Tree Traversal, where we visit the root and then
recur for its children. Unlike a tree, a graph may contain cycles, so we use an extra
visited array to make sure that we do not process a vertex again.
Let us understand the working of Depth First Search with the help of the following
illustration, with 0 as the source.
Time complexity: O(V + E), where V is the number of vertices and E is the number
of edges in the graph.
Auxiliary Space: O(V), since an extra visited array of size V is required and the
recursion stack for calls to the DFSRec function can grow to at most V frames.
(Counting the adjacency list itself, the total space is O(V + E).)
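A minimal sketch of DFS from a given source is shown below. It keeps the DFSRec name used in these notes but, as an assumption of this sketch, collects the visit order in a vector instead of printing it; the wrapper name dfs is also made up here.

```cpp
#include <vector>

// Recursive helper: record s, then recurse into each unvisited neighbour
// in the order the neighbours were inserted.
void DFSRec(const std::vector<std::vector<int>>& adj,
            std::vector<bool>& visited, int s, std::vector<int>& order) {
    visited[s] = true;
    order.push_back(s);
    for (int i : adj[s])
        if (!visited[i]) DFSRec(adj, visited, i, order);
}

// DFS from a single source; returns vertices in the order they are visited.
std::vector<int> dfs(const std::vector<std::vector<int>>& adj, int source) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    DFSRec(adj, visited, source, order);
    return order;
}
```

With a hypothetical adjacency list chosen to be consistent with the trace above (adj[0] = {2, 3, 1}, adj[1] = {0}, adj[2] = {0, 4}, adj[3] = {0}, adj[4] = {2}), dfs(adj, 0) visits the vertices in the order 0 2 4 3 1.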
DFS for Complete Traversal of Disconnected Undirected Graph
The above implementation takes a source as an input and prints only those vertices
that are reachable from the source, so it would not print all vertices of a
disconnected graph. Let us now look at the algorithm that prints all vertices
without being given a source, even when the graph may be disconnected.
The idea is simple: instead of calling DFS for a single vertex, we call the above
implemented DFS for all non-visited vertices one by one.
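// Recursive DFS helper (parameter types are assumed from the call below).
void DFSRec(vector<vector<int>> &adj, vector<bool> &visited, int s) {
    visited[s] = true;        // mark s so it is not processed again
    cout << s << " ";         // print the vertex when it is first visited
    for (int i : adj[s])
        if (visited[i] == false)
            DFSRec(adj, visited, i);
}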
Time complexity: O(V + E). The time complexity is the same here because we visit every
vertex at most once, and every edge is traversed at most once in a directed graph and at most
twice in an undirected graph.
Auxiliary Space: O(V), since an extra visited array of size V is required and the recursion
stack for calls to the DFSRec function can grow to at most V frames. (Counting the adjacency
list itself, the total space is O(V + E).)
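The complete traversal described above can be sketched as follows. As before, this is an illustrative variant rather than the original listing: DFSRec records the order in a vector instead of printing, and the name dfsAll is an assumption.

```cpp
#include <vector>

// Recursive helper: record s, then recurse into each unvisited neighbour.
void DFSRec(const std::vector<std::vector<int>>& adj,
            std::vector<bool>& visited, int s, std::vector<int>& order) {
    visited[s] = true;
    order.push_back(s);
    for (int i : adj[s])
        if (!visited[i]) DFSRec(adj, visited, i, order);
}

// Complete traversal: start a new DFS from every vertex that has not been
// visited yet, so every connected component is covered.
std::vector<int> dfsAll(const std::vector<std::vector<int>>& adj) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    for (int s = 0; s < static_cast<int>(adj.size()); ++s)
        if (!visited[s]) DFSRec(adj, visited, s, order);
    return order;
}
```

For a hypothetical disconnected graph with components {0, 2}, {1, 3} and the isolated vertex 4, the traversal order is 0 2 1 3 4: the outer loop restarts DFS at vertex 1 and again at vertex 4.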
Spanning Tree
A spanning tree of a connected graph G is a subgraph of G that connects all the vertices
using the minimum possible number of edges. Hence, a spanning tree does not have cycles, and
a graph may have more than one spanning tree.
Properties of a Spanning Tree:
● A Spanning tree does not exist for a disconnected graph.
● For a connected graph having N vertices, the number of edges in any spanning
tree of that graph is N-1.
● A Spanning tree does not have any cycle.
● We can construct a spanning tree for a complete graph by removing E-N+1 edges,
where E is the number of Edges and N is the number of vertices.
● Cayley’s Formula: It states that the number of spanning trees in a complete graph
with N vertices is N^(N-2).
Kruskal’s Algorithm:
Here we will discuss Kruskal’s algorithm to find the MST (minimum spanning tree) of a given
weighted graph.
In Kruskal’s algorithm, we first sort all edges of the given graph in increasing order of
weight. The algorithm then keeps adding edges (and their nodes) to the MST as long as the
newly added edge does not form a cycle. It picks the minimum weighted edge first and the
maximum weighted edge last. Thus it makes a locally optimal choice in each step in order to
find the optimal solution. Hence this is a Greedy Algorithm.
How to find MST using Kruskal’s algorithm?
Below are the steps for finding MST using Kruskal’s algorithm:
1. Sort all the edges in non-decreasing order of their weight.
2. Pick the smallest edge. Check if it forms a cycle with the spanning tree formed so far.
If the cycle is not formed, include this edge. Else, discard it.
3. Repeat step 2 until there are (V-1) edges in the spanning tree.
Step 2 uses the Union-Find algorithm to detect cycles.
So we recommend reading the following posts as a prerequisite.
● Union-Find Algorithm | Set 1 (Detect Cycle in a Graph)
● Union-Find Algorithm | Set 2 (Union By Rank and Path Compression)
Kruskal’s algorithm to find the minimum cost spanning tree uses the greedy approach. The
Greedy Choice is to pick the smallest weight edge that does not cause a cycle in the MST
constructed so far. Let us understand it with an example:
Illustration:
Below is the illustration of the above approach:
Input Graph:
The graph contains 9 vertices and 14 edges. So, the minimum spanning tree formed will have
(9 – 1) = 8 edges.
After sorting:
Weight    Source    Destination
1         7         6
2         8         2
2         6         5
4         0         1
4         2         5
6         8         6
7         2         3
7         7         8
8         0         7
8         1         2
9         3         4
10        5         4
11        1         7
14        3         5
Now pick all edges one by one from the sorted list of edges:
Step 1: Pick edge 7-6. No cycle is formed, include it.
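The steps of Kruskal’s algorithm can be sketched in C++ as below, run on the 14 edges from the sorted table above. This is a minimal sketch under stated assumptions: the names Edge, DisjointSet, kruskalMST and exampleEdges are made up for this illustration, and the Union-Find structure uses path compression and union by rank.

```cpp
#include <algorithm>
#include <numeric>
#include <utility>
#include <vector>

struct Edge { int weight, src, dest; };

// Minimal Union-Find (disjoint set) with path compression and union by rank.
struct DisjointSet {
    std::vector<int> parent, rank;
    explicit DisjointSet(int n) : parent(n), rank(n, 0) {
        std::iota(parent.begin(), parent.end(), 0);  // each vertex is its own set
    }
    int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }
    // Returns false when a and b are already connected,
    // i.e. when the edge a-b would form a cycle.
    bool unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        if (rank[a] < rank[b]) std::swap(a, b);
        parent[b] = a;
        if (rank[a] == rank[b]) ++rank[a];
        return true;
    }
};

// Returns the edges of an MST of a connected graph with V vertices.
std::vector<Edge> kruskalMST(int V, std::vector<Edge> edges) {
    // Step 1: sort all edges in non-decreasing order of weight.
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.weight < b.weight; });
    DisjointSet ds(V);
    std::vector<Edge> mst;
    for (const Edge& e : edges) {
        if (static_cast<int>(mst.size()) == V - 1) break;  // Step 3: V-1 edges
        if (ds.unite(e.src, e.dest)) mst.push_back(e);     // Step 2: no cycle
    }
    return mst;
}

// The 14 edges (weight, source, destination) from the table above.
std::vector<Edge> exampleEdges() {
    return {{1, 7, 6}, {2, 8, 2}, {2, 6, 5}, {4, 0, 1}, {4, 2, 5},
            {6, 8, 6}, {7, 2, 3}, {7, 7, 8}, {8, 0, 7}, {8, 1, 2},
            {9, 3, 4}, {10, 5, 4}, {11, 1, 7}, {14, 3, 5}};
}
```

Calling kruskalMST(9, exampleEdges()) picks edge 7-6 first, discards edges 8-6, 7-8 and 1-2 because they would form cycles, and returns 8 edges with a total weight of 37.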
Prim’s Algorithm:
Example of a graph
Step 1: First, we select an arbitrary vertex to act as the starting vertex of the Minimum
Spanning Tree. Here we have selected vertex 0 as the starting vertex.
0 is selected as starting vertex
Step 2: All the edges connecting the incomplete MST and other vertices are the edges {0, 1}
and {0, 7}. Between these two the edge with minimum weight is {0, 1}. So include the edge
and vertex 1 in the MST.
Structure of the alternate MST if we had selected edge {1, 2} in the MST
How to implement Prim’s Algorithm?
Follow the given steps to use Prim’s Algorithm, described above, to find the MST of
a graph:
● Create a set mstSet that keeps track of vertices already included in MST.
● Assign a key value to all vertices in the input graph. Initialize all key values as
INFINITE. Assign the key value as 0 for the first vertex so that it is picked first.
● While mstSet doesn’t include all vertices
o Pick a vertex u that is not there in mstSet and has a minimum key value.
o Include u in the mstSet.
o Update the key value of all adjacent vertices of u. To update the key values,
iterate through all adjacent vertices.
o For every adjacent vertex v, if the weight of edge u-v is less than the
previous key value of v, update the key value as the weight of u-v.
The idea of using key values is to pick the minimum weight edge from the cut. The key
values are used only for vertices that are not yet included in MST, the key value for these
vertices indicates the minimum weight edges connecting them to the set of vertices included
in MST.
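The steps above can be sketched as the following O(V^2) implementation on an adjacency matrix. The names key and mstSet follow the text; primMST, parent and exampleGraph are assumptions of this sketch, and the example matrix is built from the 14 weighted edges tabulated in the Kruskal illustration, on the assumption that the same 9-vertex graph is used here.

```cpp
#include <climits>
#include <vector>

// Prim's algorithm on an adjacency matrix (0 means "no edge").
// Returns parent[], where parent[v]-v is the MST edge that attached v;
// the parent of the start vertex 0 is -1. Assumes a connected graph.
std::vector<int> primMST(const std::vector<std::vector<int>>& graph) {
    int V = static_cast<int>(graph.size());
    std::vector<int> key(V, INT_MAX);   // all key values start as INFINITE
    std::vector<int> parent(V, -1);
    std::vector<bool> mstSet(V, false);
    key[0] = 0;                         // so that vertex 0 is picked first

    for (int count = 0; count < V; ++count) {
        // Pick the vertex u not in mstSet that has the minimum key value.
        int u = -1;
        for (int v = 0; v < V; ++v)
            if (!mstSet[v] && (u == -1 || key[v] < key[u])) u = v;
        mstSet[u] = true;

        // Update the key values of the adjacent vertices of u.
        for (int v = 0; v < V; ++v)
            if (graph[u][v] != 0 && !mstSet[v] && graph[u][v] < key[v]) {
                key[v] = graph[u][v];
                parent[v] = u;
            }
    }
    return parent;
}

// Adjacency matrix built from the (weight, source, destination) edge table.
std::vector<std::vector<int>> exampleGraph() {
    const int edges[14][3] = {{1, 7, 6}, {2, 8, 2}, {2, 6, 5}, {4, 0, 1},
                              {4, 2, 5}, {6, 8, 6}, {7, 2, 3}, {7, 7, 8},
                              {8, 0, 7}, {8, 1, 2}, {9, 3, 4}, {10, 5, 4},
                              {11, 1, 7}, {14, 3, 5}};
    std::vector<std::vector<int>> graph(9, std::vector<int>(9, 0));
    for (const auto& e : edges) {
        graph[e[1]][e[2]] = e[0];
        graph[e[2]][e[1]] = e[0];
    }
    return graph;
}
```

On this graph the first edge added is {0, 1}, as in Step 2 above, and the sum of the weights of the edges parent[v]-v is 37, the same total that Kruskal’s algorithm finds.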
Time Complexity: O(V^2). If the input graph is represented using an adjacency list, the
time complexity of Prim’s algorithm can be reduced to O(E * logV) with the help of a binary
heap. In this implementation, the spanning tree is always considered to start from the
root of the graph.
Auxiliary Space: O(V)
Time Complexity: O(E*log(E)), where E is the number of edges
Auxiliary Space: O(V^2), where V is the number of vertices
Prim’s algorithm for finding the minimum spanning tree (MST):
Advantages:
1. Prim’s algorithm is guaranteed to find the MST in a connected, weighted graph.
2. It has a time complexity of O(E log V) using a binary heap, or O(E + V log V) using a
Fibonacci heap, where E is the number of edges and V is the number of vertices.
3. It is a relatively simple algorithm to understand and implement compared to some
other MST algorithms.
Disadvantages:
1. Like Kruskal’s algorithm, Prim’s algorithm can be slow on dense graphs with many
edges, as it requires iterating over all edges at least once.
2. The heap-based version of Prim’s algorithm relies on a priority queue, which takes up
extra memory and can slow down the algorithm on very large graphs.
3. When a graph has several MSTs (for example, because of equal edge weights), the choice
of starting node can affect which one is output, although the total weight is the same.
This may not be desirable in some applications.
Other Implementations of Prim’s Algorithm:
Prim’s Algorithm can also be implemented in the following ways:
● Adjacency Matrix Representation: Prim’s Algorithm implemented for a graph
represented by an adjacency matrix.
● Adjacency List Representation: Prim’s Algorithm implemented for a graph
represented by an adjacency list.
● Priority Queue: a time-efficient approach to implementing Prim’s algorithm.
Shortest Paths using Dijkstra’s Algorithm
Given a weighted graph and a source vertex in the graph, find the shortest paths from the
source to all the other vertices in the given graph.
Note: The given graph does not contain any negative edge.
Examples:
Input: src = 0, the graph is shown below.
Output: 0 4 12 19 21 11 9 8 14
Explanation: The distance from 0 to 1 = 4.
The minimum distance from 0 to 2 = 12. 0->1->2
The minimum distance from 0 to 3 = 19. 0->1->2->3
The minimum distance from 0 to 4 = 21. 0->7->6->5->4
The minimum distance from 0 to 5 = 11. 0->7->6->5
The minimum distance from 0 to 6 = 9. 0->7->6
The minimum distance from 0 to 7 = 8. 0->7
The minimum distance from 0 to 8 = 14. 0->1->2->8
Step 1:
● The set sptSet is initially empty and distances assigned to the 9 vertices are {0, INF,
INF, INF, INF, INF, INF, INF, INF} where INF indicates infinite.
● Now pick the vertex with a minimum distance value. The vertex 0 is picked, include it
in sptSet . So sptSet becomes {0}. After including 0 to sptSet , update distance values
of its adjacent vertices.
● Adjacent vertices of 0 are 1 and 7. The distance values of 1 and 7 are updated as 4
and 8.
The following subgraph shows vertices and their distance values, only the vertices with finite
distance values are shown. The vertices included in SPT are shown in green colour.
Step 2:
● Pick the vertex with minimum distance value and not already included in SPT (not in
sptSet). The vertex 1 is picked and added to sptSet.
● So sptSet now becomes {0, 1}. Update the distance values of adjacent vertices of 1.
● The distance value of vertex 2 becomes 12 .
Step 3:
● Pick the vertex with minimum distance value and not already included in SPT (not in
sptSet). Vertex 7 is picked. So sptSet now becomes {0, 1, 7}.
● Update the distance values of adjacent vertices of 7. The distance values of vertices 6
and 8 become finite (9 and 15 respectively, since vertex 7 is at distance 8 and the edges
7-6 and 7-8 have weights 1 and 7).
Step 4:
● Pick the vertex with minimum distance value and not already included in SPT (not in
sptSet). Vertex 6 is picked. So sptSet now becomes {0, 1, 7, 6}.
● Update the distance values of adjacent vertices of 6. The distance values of vertices 5
and 8 are updated.
We repeat the above steps until sptSet includes all vertices of the given graph. Finally, we get
the following Shortest Path Tree (SPT).
Dijkstra’s algorithm is one of the most important graph algorithms in DSA, but many
students find it difficult to understand.
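The walkthrough above can be sketched as the following O(V^2) implementation on an adjacency matrix. The names dist and sptSet follow the text; dijkstra and exampleGraph are assumptions of this sketch. The example matrix is built from the 14 weighted edges tabulated in the Kruskal illustration, on the assumption that the example graph here is the same 9-vertex graph (its shortest distances match the output listed above).

```cpp
#include <climits>
#include <vector>

// Dijkstra's algorithm on an adjacency matrix (0 means "no edge").
// Returns the shortest distance from src to every vertex
// (INT_MAX for unreachable vertices). Assumes no negative edge weights.
std::vector<int> dijkstra(const std::vector<std::vector<int>>& graph, int src) {
    int V = static_cast<int>(graph.size());
    std::vector<int> dist(V, INT_MAX);   // INF for every vertex...
    std::vector<bool> sptSet(V, false);
    dist[src] = 0;                       // ...except the source

    for (int count = 0; count < V; ++count) {
        // Pick the vertex with minimum distance value that is not in sptSet.
        int u = -1;
        for (int v = 0; v < V; ++v)
            if (!sptSet[v] && (u == -1 || dist[v] < dist[u])) u = v;
        if (dist[u] == INT_MAX) break;   // the remaining vertices are unreachable
        sptSet[u] = true;

        // Update the distance values of the adjacent vertices of u.
        for (int v = 0; v < V; ++v)
            if (graph[u][v] != 0 && !sptSet[v] &&
                dist[u] + graph[u][v] < dist[v])
                dist[v] = dist[u] + graph[u][v];
    }
    return dist;
}

// Adjacency matrix built from the (weight, source, destination) edge table.
std::vector<std::vector<int>> exampleGraph() {
    const int edges[14][3] = {{1, 7, 6}, {2, 8, 2}, {2, 6, 5}, {4, 0, 1},
                              {4, 2, 5}, {6, 8, 6}, {7, 2, 3}, {7, 7, 8},
                              {8, 0, 7}, {8, 1, 2}, {9, 3, 4}, {10, 5, 4},
                              {11, 1, 7}, {14, 3, 5}};
    std::vector<std::vector<int>> graph(9, std::vector<int>(9, 0));
    for (const auto& e : edges) {
        graph[e[1]][e[2]] = e[0];
        graph[e[2]][e[1]] = e[0];
    }
    return graph;
}
```

Calling dijkstra(exampleGraph(), 0) returns the distances 0 4 12 19 21 11 9 8 14, matching the example output.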
Time Complexity: O(V^2)
Auxiliary Space: O(V)
Notes:
● The code calculates the shortest distance but doesn’t calculate the path information.
Create a parent array, update the parent array when distance is updated and use it to
show the shortest path from source to different vertices.
● The time complexity of the implementation is O(V^2). If the input graph is
represented using an adjacency list, it can be reduced to O(E * log V) with the help of a
binary heap. Please see Dijkstra’s Algorithm for Adjacency List Representation for
more details.
● Dijkstra’s algorithm is not guaranteed to work for graphs with negative weight edges.
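The parent array mentioned in the first note can be used as sketched below. The idea is to set parent[v] = u at the point where the distance of v is updated through u, and to leave the parent of the source as -1; the helper name buildPath is an assumption of this sketch.

```cpp
#include <algorithm>
#include <vector>

// Walk parent[] back from target to the source (whose parent is -1),
// then reverse the result to get the path in source-to-target order.
std::vector<int> buildPath(const std::vector<int>& parent, int target) {
    std::vector<int> path;
    for (int v = target; v != -1; v = parent[v]) path.push_back(v);
    std::reverse(path.begin(), path.end());
    return path;
}
```

Using the shortest paths listed in the example, the parent array for source 0 is {-1, 0, 1, 2, 5, 6, 7, 0, 2}, so buildPath(parent, 4) gives 0 7 6 5 4 and buildPath(parent, 8) gives 0 1 2 8.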
Why Dijkstra’s Algorithm Fails for Graphs Having Negative Edges?
The problem with negative weights arises from the fact that Dijkstra’s algorithm assumes that
once a node is added to the set of visited nodes, its distance is finalized and will not change.
However, in the presence of negative weights, this assumption can lead to incorrect results.
Consider the following graph for the example:
In the above graph, A is the source node. Among the edges A to B and A to C, A to B has the
smaller weight, so Dijkstra’s algorithm finalizes the shortest distance of B as 2. However,
because of the negative edge from C to B, the actual shortest distance of B is 1 (via C), which
Dijkstra’s algorithm fails to detect.
Note: We use the Bellman-Ford shortest path algorithm when the graph has negative edges.
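The failure can be reproduced with the small sketch below, which compares a Dijkstra variant that finalizes distances against Bellman-Ford. The figure’s exact weights are not available here, so the weights A to C = 3 and C to B = -2 are hypothetical values chosen only to be consistent with the text (A to B = 2, true shortest distance of B = 1); the function and type names are also assumptions.

```cpp
#include <climits>
#include <vector>

struct Edge { int from, to, weight; };

// Bellman-Ford: relax every edge V-1 times. Assumes that no negative-weight
// cycle is reachable from src.
std::vector<int> bellmanFord(int V, const std::vector<Edge>& edges, int src) {
    std::vector<int> dist(V, INT_MAX);
    dist[src] = 0;
    for (int pass = 0; pass < V - 1; ++pass)
        for (const Edge& e : edges)
            if (dist[e.from] != INT_MAX && dist[e.from] + e.weight < dist[e.to])
                dist[e.to] = dist[e.from] + e.weight;
    return dist;
}

// Dijkstra as described above: once a vertex is picked, its distance is
// treated as final and is never changed again.
std::vector<int> dijkstraFinalizing(int V, const std::vector<Edge>& edges,
                                    int src) {
    std::vector<int> dist(V, INT_MAX);
    std::vector<bool> done(V, false);
    dist[src] = 0;
    for (int count = 0; count < V; ++count) {
        int u = -1;
        for (int v = 0; v < V; ++v)
            if (!done[v] && (u == -1 || dist[v] < dist[u])) u = v;
        if (dist[u] == INT_MAX) break;  // remaining vertices are unreachable
        done[u] = true;
        for (const Edge& e : edges)
            if (e.from == u && !done[e.to] && dist[u] + e.weight < dist[e.to])
                dist[e.to] = dist[u] + e.weight;
    }
    return dist;
}
```

With A = 0, B = 1, C = 2 and the edges {0, 1, 2}, {0, 2, 3}, {2, 1, -2}, dijkstraFinalizing reports distance 2 for B because B is finalized before C is processed, while bellmanFord reports the correct distance 1.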
Dijkstra’s Algorithm using Adjacency List in O(E logV):
For Dijkstra’s algorithm, it is recommended to use a heap (or priority queue), as the
required operations (extract minimum and decrease key) match the speciality of the heap
(or priority queue). However, the problem is that the C++ priority_queue doesn’t support the
decrease key operation. To resolve this problem, do not update a key, but insert one more
copy of it. So we allow multiple instances of the same vertex in the priority queue. This
approach doesn’t require the decrease key operation and has the following important
properties.
● Whenever the distance of a vertex is reduced, we add one more instance of the vertex to
the priority_queue. Even if there are multiple instances, we only consider the instance
with minimum distance and ignore the other instances.
● The time complexity remains O(E * LogV), as there will be at most O(E) entries in
the priority queue, and O(logE) is the same as O(logV) because E is at most V^2, so
logE is at most 2 * logV.
Time Complexity: O(E * logV), Where E is the number of edges and V is the number of
vertices.
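Auxiliary Space: O(V + E), since the priority queue may hold up to O(E) entries in addition
to the distance array of size V.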
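A minimal sketch of this approach is given below. It uses std::priority_queue as a min-heap and skips stale copies when they are popped; the names dijkstraPQ and exampleAdj are assumptions, and the example adjacency list is built from the 14 weighted edges tabulated in the Kruskal illustration, assuming the same 9-vertex graph.

```cpp
#include <climits>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// adj[u] holds (neighbour, weight) pairs. Returns shortest distances from src
// (INT_MAX for unreachable vertices). Assumes no negative edge weights.
std::vector<int> dijkstraPQ(
        const std::vector<std::vector<std::pair<int, int>>>& adj, int src) {
    using Entry = std::pair<int, int>;  // (distance, vertex)
    std::vector<int> dist(adj.size(), INT_MAX);
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        int d = pq.top().first;
        int u = pq.top().second;
        pq.pop();
        if (d > dist[u]) continue;  // stale copy: a shorter distance is known
        for (const auto& edge : adj[u]) {
            int v = edge.first, w = edge.second;
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});  // insert a new copy, no decrease key
            }
        }
    }
    return dist;
}

// Adjacency list built from the (weight, source, destination) edge table.
std::vector<std::vector<std::pair<int, int>>> exampleAdj() {
    const int edges[14][3] = {{1, 7, 6}, {2, 8, 2}, {2, 6, 5}, {4, 0, 1},
                              {4, 2, 5}, {6, 8, 6}, {7, 2, 3}, {7, 7, 8},
                              {8, 0, 7}, {8, 1, 2}, {9, 3, 4}, {10, 5, 4},
                              {11, 1, 7}, {14, 3, 5}};
    std::vector<std::vector<std::pair<int, int>>> adj(9);
    for (const auto& e : edges) {
        adj[e[1]].push_back({e[2], e[0]});
        adj[e[2]].push_back({e[1], e[0]});
    }
    return adj;
}
```

Calling dijkstraPQ(exampleAdj(), 0) returns the same distances as the matrix version, 0 4 12 19 21 11 9 8 14. The check d > dist[u] is what lets the queue hold several copies of a vertex while only the copy with the minimum distance is processed.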
Applications of Dijkstra’s Algorithm:
● Mapping applications such as Google Maps use Dijkstra-style shortest path algorithms to
show the shortest route between a source and a destination.
● In computer networking, Dijkstra’s algorithm forms the basis for various routing
protocols, such as OSPF (Open Shortest Path First) and IS-IS (Intermediate System to
Intermediate System).
● Transportation and traffic management systems use Dijkstra’s algorithm to optimize
traffic flow, minimize congestion, and plan the most efficient routes for vehicles.
● Airlines use Dijkstra’s algorithm to plan flight paths that minimize fuel consumption
and reduce travel time.
● Dijkstra’s algorithm is applied in electronic design automation for routing connections
on integrated circuits and very-large-scale integration (VLSI) chips.