Unit - 5
Graphs: ADT, data structure for graphs, graph traversal, Transitive closure,
directed acyclic graph, shortest paths [weighted graphs, Dijkstra’s algorithm],
minimum spanning trees [Prim’s, Kruskal’s, disjoint partitions, union-find
structures].
A graph data structure is a collection of nodes that have data and are connected to
other nodes.
For example, on Facebook everything that has data (a user, photo, group, page, and so on) is a node. Every relationship is an edge from one node to another: whenever you post a photo, join a group, or like a page, a new edge is created for that relationship.
All of Facebook is then a collection of these nodes and edges, because Facebook uses a graph data structure to store its data.
Graph Terminology
Adjacency: A vertex is said to be adjacent to another vertex if there is an edge
connecting them. Vertices 2 and 3 are not adjacent because there is no edge
between them.
Path: A sequence of edges that allows you to go from vertex A to vertex B is called a path. For example, the edges 0-1, 1-2 form a path from vertex 0 to vertex 2, and the single edge 0-2 is another path between them.
Directed Graph: A graph in which an edge (u,v) doesn't necessarily mean that
there is an edge (v, u) as well. The edges in such a graph are represented by arrows
to show the direction of the edge.
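A minimal sketch of how a directed graph might be stored, assuming a dictionary of lists (the helper name `add_directed_edge` is hypothetical): adding (u, v) records only u -> v, so (v, u) has to be added separately if it exists.

```python
from collections import defaultdict

def add_directed_edge(adj, u, v):
    """Record the edge u -> v only; the reverse edge v -> u is not added."""
    adj[u].append(v)

adj = defaultdict(list)      # vertex -> list of successors
add_directed_edge(adj, 0, 1)
add_directed_edge(adj, 1, 2)

print(1 in adj[0])  # True: (0, 1) is an edge
print(0 in adj[1])  # False: (1, 0) was never added
```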
Graph Representation
Graphs are commonly represented in two ways:
1. Adjacency Matrix
An adjacency matrix is a two-dimensional V x V array, where V is the number of vertices. Each row and each column represents a vertex.
If the value of any element a[i][j] is 1, it represents that there is an edge connecting vertex i and vertex j.
Since the example graph is undirected, for edge (0,2) we also need to mark edge (2,0), making the adjacency matrix symmetric about the diagonal.
Edge lookup (checking whether an edge exists between vertex A and vertex B) is extremely fast in the adjacency matrix representation, but we have to reserve space for every possible link between all vertices (V x V), so it requires more space.
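To illustrate, here is a small Python sketch that builds the matrix from an edge list. The 4-vertex edge list is an assumption chosen to agree with the examples above (vertices 2 and 3 not adjacent; 0-1-2 and 0-2 both paths), and the function name is hypothetical.

```python
def build_adjacency_matrix(num_vertices, edges):
    """Build a V x V 0/1 matrix for an undirected graph."""
    matrix = [[0] * num_vertices for _ in range(num_vertices)]
    for i, j in edges:
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: mirror every edge across the diagonal
    return matrix

# Hypothetical example graph with 4 vertices
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
a = build_adjacency_matrix(4, edges)

for row in a:
    print(row)
print(a[0][2] == 1)  # edge lookup is a single index operation
print(a[2][3] == 0)  # no edge between 2 and 3
```

Every lookup is one indexing step, but the matrix always holds V * V entries regardless of how many edges exist.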
2. Adjacency List
An adjacency list represents a graph as an array of linked lists.
The index of the array represents a vertex and each element in its linked list
represents the other vertices that form an edge with the vertex.
The adjacency list for the graph we made in the first example is as follows:
An adjacency list is efficient in terms of storage because we only need to store the
values for the edges. For a graph with millions of vertices, this can mean a lot of
saved space.
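For comparison, a sketch of the same hypothetical 4-vertex graph as an adjacency list. Python lists stand in for the linked lists here, which is a simplification of this sketch rather than the structure described above.

```python
def build_adjacency_list(num_vertices, edges):
    """Return a list where index v holds the neighbors of vertex v."""
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)  # undirected: store the edge at both endpoints
    return adj

edges = [(0, 1), (0, 2), (0, 3), (1, 2)]  # hypothetical example graph
adj = build_adjacency_list(4, edges)
for vertex, neighbors in enumerate(adj):
    print(vertex, "->", neighbors)

# Storage grows with V + 2E entries instead of V * V.
print(sum(len(n) for n in adj))  # 2 * len(edges) = 8
```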
Graph Operations
The most common graph operations are:
● Check if an element is present in the graph
● Graph traversal
● Add elements (vertices, edges) to the graph
● Find the path from one vertex to another
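The operations above can be sketched as a small class. `SimpleGraph` and its method names are hypothetical, neighbors are kept in sets, and `find_path` uses breadth-first search to return one path with the fewest edges.

```python
from collections import deque

class SimpleGraph:
    """Hypothetical undirected graph illustrating the common operations."""

    def __init__(self):
        self.adj = {}  # vertex -> set of neighbors

    def add_vertex(self, v):
        self.adj.setdefault(v, set())

    def add_edge(self, u, v):
        self.add_vertex(u)
        self.add_vertex(v)
        self.adj[u].add(v)
        self.adj[v].add(u)

    def contains(self, v):
        return v in self.adj

    def find_path(self, start, goal):
        """Return one path from start to goal with the fewest edges, or None."""
        if start not in self.adj or goal not in self.adj:
            return None
        parent = {start: None}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            if u == goal:
                path = []
                while u is not None:  # walk parent links back to start
                    path.append(u)
                    u = parent[u]
                return path[::-1]
            for w in self.adj[u]:
                if w not in parent:
                    parent[w] = u
                    queue.append(w)
        return None

g = SimpleGraph()
for u, v in [(0, 1), (0, 2), (0, 3), (1, 2)]:
    g.add_edge(u, v)
print(g.contains(3), g.contains(7))  # True False
print(g.find_path(3, 1))             # [3, 0, 1]
```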
Graph Traversal
There are two types of graph traversal
1. Depth First Search (DFS)
2. Breadth First Search (BFS)
DFS Algorithm
Traversal means visiting all the nodes of a graph. Depth first traversal, or depth first search (DFS), is a recursive algorithm for searching all the vertices of a graph or tree data structure.
DFS algorithm
A standard DFS implementation puts each vertex of the graph into one of two
categories:
1. Visited
2. Not Visited
The purpose of the algorithm is to mark each vertex as visited while avoiding
cycles.
DFS example
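Let's see how the Depth First Search algorithm works with an example. We use an undirected graph with 5 vertices.
We start from vertex 0: the DFS algorithm starts by putting it in the Visited list and putting all its adjacent vertices on the stack.
Next, we visit the element at the top of the stack, i.e. 1, and go to its adjacent nodes. Since 0 has already been visited, we visit 2 instead.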
Vertex 2 has an unvisited adjacent vertex in 4, so we add that to the top of the
stack and visit it.
After we visit the last element 3, it doesn't have any unvisited adjacent nodes, so
we have completed the Depth First Traversal of the graph.
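The stack-based walkthrough can be sketched iteratively as follows. The graph is an assumption reconstructed from the steps above (edges 0-1, 0-2, 0-3, 1-2 and 2-4), and neighbors are pushed in reverse so the smallest-numbered one is visited first.

```python
def dfs_iterative(graph, start):
    """Depth-first traversal with an explicit stack; returns the visit order."""
    order, visited = [], set()
    stack = [start]
    while stack:
        vertex = stack.pop()              # take the element at the top of the stack
        if vertex in visited:
            continue                      # a vertex can be pushed more than once
        visited.add(vertex)
        order.append(vertex)
        # push unvisited neighbors in reverse so the smallest ends up on top
        for neighbor in reversed(graph[vertex]):
            if neighbor not in visited:
                stack.append(neighbor)
    return order

# Example graph reconstructed from the walkthrough (an assumption)
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 4], 3: [0], 4: [2]}
print(dfs_iterative(graph, 0))  # [0, 1, 2, 4, 3]
```

The visit order 0, 1, 2, 4, 3 matches the steps described in the example.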
DFS pseudocode (recursive implementation)
DFS(G, u)
    u.visited = true
    for each v ∈ G.Adj[u]
        if v.visited == false
            DFS(G, v)
init() {
    For each u ∈ G
        u.visited = false
    For each u ∈ G
        if u.visited == false
            DFS(G, u)
}
The code for the Depth First Search Algorithm with an example is shown below.
The code has been simplified so that we can focus on the algorithm rather than
other details.
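def dfs(graph, start, visited=None):
    visited = set() if visited is None else visited
    visited.add(start)
    print(start)
    for nxt in graph[start]:
        if nxt not in visited:  # re-check: nxt may have been reached deeper in the recursion
            dfs(graph, nxt, visited)
    return visited
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 4], 3: [0], 4: [2]}  # the 5-vertex example graph
dfs(graph, 0)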
BFS algorithm
A standard BFS implementation puts each vertex of the graph into one of two
categories:
1. Visited
2. Not Visited
The purpose of the algorithm is to mark each vertex as visited while avoiding
cycles.
The graph might have two disconnected parts, so to make sure that we cover every vertex, we can also run the BFS algorithm from every vertex that has not been visited yet.
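A minimal sketch of that idea, assuming a dictionary-of-lists graph: the outer loop starts a fresh BFS from every vertex not yet visited, so each disconnected part is covered. `bfs_all` and the two-component example graph are hypothetical.

```python
from collections import deque

def bfs_all(graph):
    """Run BFS from every still-unvisited vertex so disconnected parts are covered."""
    visited = set()
    order = []
    for root in graph:
        if root in visited:
            continue
        visited.add(root)
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in graph[u]:
                if v not in visited:
                    visited.add(v)  # mark when enqueued so no vertex is queued twice
                    queue.append(v)
    return order

# Hypothetical graph with two disconnected parts: {0, 1, 2} and {3, 4}
graph = {0: [1, 2], 1: [0], 2: [0], 3: [4], 4: [3]}
print(bfs_all(graph))  # [0, 1, 2, 3, 4]
```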
BFS example
Let's see how the Breadth First Search algorithm works with an example. We use
an undirected graph with 5 vertices.
We start from vertex 0: the BFS algorithm starts by putting it in the Visited list and putting all its adjacent vertices in the queue.
Next, we visit the element at the front of the queue, i.e. 1, and go to its adjacent nodes. Since 0 has already been visited, we visit 2 instead.
Vertex 2 has an unvisited adjacent vertex in 4, so we add it to the back of the queue and visit 3, which is at the front of the queue.
Only 4 remains in the queue, since the only adjacent node of 3, i.e. 0, is already visited. We visit it.
We visit the last remaining item in the queue to check whether it has unvisited neighbors.
Since the queue is empty, we have completed the Breadth First Traversal of the
graph.
BFS pseudocode
create a queue Q
mark v as visited and put v into Q
while Q is non-empty
    remove the head u of Q
    mark and enqueue all (unvisited) neighbours of u
The code for the Breadth First Search Algorithm with an example is shown below.
The code has been simplified so that we can focus on the algorithm rather than
other details.
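from collections import deque
def bfs(graph, root):
    visited, queue = {root}, deque([root])  # mark the root and enqueue it
    while queue:
        vertex = queue.popleft()  # dequeue the vertex at the front
        print(vertex, end=" ")
        for neighbour in graph[vertex]:
            if neighbour not in visited:  # mark and enqueue unvisited neighbours
                visited.add(neighbour)
                queue.append(neighbour)
if __name__ == '__main__':
    graph = {0: [1, 2], 1: [2], 2: [3], 3: [1, 2]}
    print("Following is Breadth First Traversal: ")
    bfs(graph, 0)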
An undirected graph is a graph in which the edges do not point in any direction (i.e. the edges are bidirectional).
[Figure: undirected graph]
A connected graph is a graph in which there is always a path from a vertex to any other vertex.
[Figure: connected graph]
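Connectivity can be checked with a single traversal: start anywhere and see whether every vertex is reached. This sketch assumes an undirected graph stored as a dictionary of neighbor lists; `is_connected` is a hypothetical name.

```python
def is_connected(graph):
    """True if every vertex can be reached from an arbitrary start vertex."""
    if not graph:
        return True
    start = next(iter(graph))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(graph)

print(is_connected({0: [1], 1: [0, 2], 2: [1]}))  # True
print(is_connected({0: [1], 1: [0], 2: []}))      # False: 2 is isolated
```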
Spanning tree
A spanning tree is a sub-graph of an undirected connected graph, which includes
all the vertices of the graph with a minimum possible number of edges. If a vertex
is missed, then it is not a spanning tree.
The total number of spanning trees that can be created from a complete graph with n vertices is equal to $n^{n-2}$.
If we have n = 4, the maximum number of possible spanning trees is equal to $4^{4-2} = 16$. Thus, 16 spanning trees can be formed from a complete graph with 4 vertices.
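The count $n^{n-2}$ (Cayley's formula) can be sanity-checked by brute force for small n: try every set of n - 1 edges of the complete graph and keep those without a cycle. This is an illustrative sketch only; it is exponential in n, and the function names are hypothetical.

```python
from itertools import combinations

def find_root(parent, x):
    """Follow parent links until reaching the representative of x's set."""
    while parent[x] != x:
        x = parent[x]
    return x

def count_spanning_trees(n):
    """Brute force: count (n-1)-edge subsets of the complete graph K_n with no cycle."""
    all_edges = list(combinations(range(n), 2))
    count = 0
    for subset in combinations(all_edges, n - 1):
        parent = list(range(n))
        has_cycle = False
        for u, v in subset:
            ru, rv = find_root(parent, u), find_root(parent, v)
            if ru == rv:          # u and v already connected: this edge closes a cycle
                has_cycle = True
                break
            parent[ru] = rv
        if not has_cycle:         # n - 1 edges and no cycle on n vertices => spanning tree
            count += 1
    return count

for n in range(2, 6):
    print(n, count_spanning_trees(n), n ** (n - 2))  # for n = 4 both columns are 16
```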
[Figure: example graph]
Some of the possible spanning trees that can be created from the above graph are:
[Figure: six example spanning trees of the graph above]
Minimum Spanning Tree
A minimum spanning tree is a spanning tree in which the sum of the edge weights is as small as possible.
[Figure: weighted graph]
The possible spanning trees from the above graph are:
[Figure: possible spanning trees of the weighted graph]
The minimum spanning tree from the above spanning trees is:
[Figure: minimum spanning tree]
The minimum spanning tree from a graph is found using the following
algorithms:
● Prim's Algorithm
● Kruskal's Algorithm
Adjacency Matrix
An adjacency matrix is a way of representing a graph G = {V, E} as a matrix of
booleans.
In the case of undirected graphs, the matrix is symmetric about the diagonal because for every edge (i, j), there is also an edge (j, i).
Pros of an adjacency matrix
The basic operations like adding an edge, removing an edge, and checking
whether there is an edge from vertex i to vertex j are extremely time-efficient,
constant-time operations.
If the graph is dense and the number of edges is large, the adjacency matrix should be the first choice. Even if the graph and the adjacency matrix are sparse, we can represent it using data structures for sparse matrices.
The biggest advantage, however, comes from the use of matrices. Recent advances in hardware enable us to perform even expensive matrix operations on the GPU.
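Cons of an adjacency matrix
The V x V space requirement of the adjacency matrix makes it a memory hog. Graphs out in the wild usually don't have too many connections, and this is the major reason why adjacency lists are the better choice for most tasks.
While basic operations are easy, operations like inEdges and outEdges are expensive when using the adjacency matrix representation, because listing the edges into or out of a vertex means scanning a full column or row of V entries.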
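To see why, here is a sketch of out-edge and in-edge queries on a 0/1 matrix (the function names and the 3-vertex directed example are hypothetical): both scan V entries no matter how few neighbors a vertex actually has.

```python
def out_edges(matrix, v):
    """Vertices w with an edge v -> w: scans the whole row (V entries)."""
    return [w for w, bit in enumerate(matrix[v]) if bit]

def in_edges(matrix, v):
    """Vertices u with an edge u -> v: scans the whole column (V entries)."""
    return [u for u in range(len(matrix)) if matrix[u][v]]

# Hypothetical directed graph with edges 0->1, 0->2, 2->1
m = [[0, 1, 1],
     [0, 0, 0],
     [0, 1, 0]]
print(out_edges(m, 0))  # [1, 2]
print(in_edges(m, 1))   # [0, 2]
```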
If you know how to create two-dimensional arrays, you also know how to create
an adjacency matrix.
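# Adjacency Matrix representation in Python
class Graph(object):
    # Initialize a size x size matrix of zeros (no edges yet)
    def __init__(self, size):
        self.adjMatrix = [[0] * size for _ in range(size)]
        self.size = size
    # Add edges
    def add_edge(self, v1, v2):
        if v1 == v2:
            print("Same vertex %d and %d" % (v1, v2))  # warns; the self-loop is still stored
        self.adjMatrix[v1][v2] = 1
        self.adjMatrix[v2][v1] = 1  # undirected: keep the matrix symmetric
    # Remove edges
    def remove_edge(self, v1, v2):
        if self.adjMatrix[v1][v2] == 0:
            print("No edge between %d and %d" % (v1, v2))
            return
        self.adjMatrix[v1][v2] = 0
        self.adjMatrix[v2][v1] = 0
    def __len__(self):
        return self.size
    # Print the matrix, one row per line
    def print_matrix(self):
        for row in self.adjMatrix:
            print(" ".join("{:4}".format(val) for val in row))
def main():
    g = Graph(5)
    g.add_edge(0, 1)
    g.add_edge(0, 2)
    g.add_edge(1, 2)
    g.add_edge(2, 0)
    g.add_edge(2, 3)
    g.print_matrix()
if __name__ == '__main__':
    main()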
Adjacency List
In an adjacency list, the index of the array represents a vertex and each element in its linked list represents the other vertices that form an edge with the vertex.
An adjacency list is efficient in terms of storage because we only need to store the
values for the edges. For a sparse graph with millions of vertices and edges, this
can mean a lot of saved space.
The simplest adjacency list needs a node data structure to store a vertex and a
graph data structure to organize the nodes.
We stay close to the basic definition of a graph - a collection of vertices and edges
{V, E}. For simplicity, we use an unlabeled graph as opposed to a labeled one i.e. the
vertices are identified by their indices 0,1,2,3.
struct node
{
  int vertex;
  struct node* next;
};
struct Graph
{
  int numVertices;
  struct node** adjLists;  /* array of list heads, one per vertex */
};
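The adjLists member is a pointer to struct node*, i.e. a dynamically allocated array of linked-list heads. We use a pointer because we don't know how many vertices the graph will have, so we cannot create an array of linked lists at compile time. In C++, each vertex's list can be an STL list<int>: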
#include <list>
class Graph
{
  int numVertices;
  std::list<int> *adjLists;  // array of numVertices lists, one per vertex
public:
  Graph(int V);
  void addEdge(int src, int dest);
};
The element type of each linked list is determined by what data you want to store in it. For a labeled graph, you could store a dictionary instead of an integer. In Python, each list node can be a small class:
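class AdjNode:
    def __init__(self, value):
        self.vertex = value
        self.next = None  # next neighbor in this vertex's linked list
class Graph:
    def __init__(self, num):
        self.V = num
        self.graph = [None] * self.V  # head of each vertex's linked list
    # Add edges
    def add_edge(self, s, d):
        # Undirected: insert d at the head of s's list...
        node = AdjNode(d)
        node.next = self.graph[s]
        self.graph[s] = node
        # ...and s at the head of d's list
        node = AdjNode(s)
        node.next = self.graph[d]
        self.graph[d] = node
    # Print the graph
    def print_agraph(self):
        for i in range(self.V):
            print("Vertex " + str(i) + ":", end="")
            temp = self.graph[i]
            while temp:
                print(" -> {}".format(temp.vertex), end="")
                temp = temp.next
            print()
if __name__ == "__main__":
    graph = Graph(4)
    graph.add_edge(0, 1)
    graph.add_edge(0, 2)
    graph.add_edge(0, 3)
    graph.add_edge(1, 2)
    graph.print_agraph()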
Dijkstra's Algorithm
Dijkstra's algorithm allows us to find the shortest path between any two vertices of a graph whose edge weights are non-negative.
It differs from the minimum spanning tree because the shortest distance between two vertices might not include all the vertices of the graph.
The algorithm relies on the property that any subpath B -> D of a shortest path A -> D between vertices A and D is also a shortest path between vertices B and D.
Dijkstra used this property in the opposite direction, i.e. we overestimate the distance of each vertex from the starting vertex. Then we visit each node and its neighbors to find the shortest subpath to those neighbors.
The algorithm uses a greedy approach in the sense that we find the next best solution hoping that the end result is the best solution for the whole problem.
Example of Dijkstra's algorithm
It is easier to start with an example and then think about the algorithm.
Choose a starting vertex and assign infinity path values to all other vertices (the starting vertex gets 0).
If the path length of the adjacent vertex is less than the new path length, don't update it.
Avoid updating path lengths of already visited vertices.
After each iteration, we pick the unvisited vertex with the least path length. So we
choose 5 before 7
Notice how the rightmost vertex has its path length updated twice
Once the algorithm is over, we can backtrack from the destination vertex to the
source vertex to find the path.
A minimum priority queue can be used to efficiently retrieve the vertex with the least path distance.
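function dijkstra(G, S)
    for each vertex V in G
        distance[V] <- infinite
        previous[V] <- NULL
        add V to priority queue Q
    distance[S] <- 0
    while Q is not empty
        U <- extract MIN from Q
        for each unvisited neighbour V of U
            tempDistance <- distance[U] + edge_weight(U, V)
            if tempDistance < distance[V]
                distance[V] <- tempDistance
                previous[V] <- U
    return distance[], previous[]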
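A runnable sketch of the procedure, using Python's `heapq` module as the minimum priority queue. Instead of a decrease-key operation it pushes a new entry whenever a distance improves and skips stale entries when they are popped (a common "lazy deletion" variant). It assumes non-negative weights and a dictionary mapping each vertex to `(neighbor, weight)` pairs; the example graph and the helper `path_to`, which backtracks from the destination through `previous`, are hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source; graph maps u -> list of (v, weight), weights >= 0."""
    distance = {v: float("inf") for v in graph}
    previous = {v: None for v in graph}
    distance[source] = 0
    heap = [(0, source)]          # min-priority queue of (tentative distance, vertex)
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue              # stale entry: u was already finalised
        visited.add(u)
        for v, w in graph[u]:
            new_distance = d + w
            if new_distance < distance[v]:
                distance[v] = new_distance
                previous[v] = u
                heapq.heappush(heap, (new_distance, v))
    return distance, previous

def path_to(previous, target):
    """Backtrack from target to the source using the previous links."""
    path = []
    while target is not None:
        path.append(target)
        target = previous[target]
    return path[::-1]

# Hypothetical weighted, undirected graph (each edge listed in both directions)
graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("A", 4), ("C", 2), ("D", 5)],
    "C": [("A", 1), ("B", 2), ("D", 8)],
    "D": [("B", 5), ("C", 8)],
}
dist, prev = dijkstra(graph, "A")
print(dist)                # {'A': 0, 'B': 3, 'C': 1, 'D': 8}
print(path_to(prev, "D"))  # ['A', 'C', 'B', 'D']
```

Note how B's distance is first set to 4 and later lowered to 3 via C, the same kind of double update described in the example.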
Kruskal's Algorithm
Kruskal's algorithm is a minimum spanning tree algorithm. It falls under a class of algorithms called greedy algorithms that find the local optimum in the hope of finding a global optimum.
We start from the edges with the lowest weight and keep adding edges until we
reach our goal.
The steps for implementing Kruskal's algorithm are as follows:
● Sort all the edges from low weight to high.
● Take the edge with the lowest weight and add it to the spanning tree. If adding the edge creates a cycle, then reject this edge.
● Keep adding edges until the tree includes all vertices (a spanning tree on V vertices has V - 1 edges).
Choose the edge with the least weight; if there is more than one, choose any one.
Choose the next shortest edge that doesn't create a cycle and add it.
The most common way to find out whether adding an edge creates a cycle is an algorithm called Union-Find. The Union-Find algorithm divides the vertices into clusters and allows us to check whether two vertices belong to the same cluster, and hence decide whether adding an edge creates a cycle.
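A sketch of such a union-find (disjoint-set) structure, with two common optimisations: path compression and union by rank. The class name `DisjointSet` is hypothetical; `union` returns False when both vertices are already in the same cluster, which is exactly the case where an edge would create a cycle.

```python
class DisjointSet:
    """Union-find over vertices 0..n-1 with path compression and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))   # each vertex starts in its own cluster
        self.rank = [0] * n

    def find(self, x):
        # Follow parent links to the cluster's representative, flattening the path.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        """Merge the clusters of x and y; return False if already in the same cluster."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False               # adding edge (x, y) would create a cycle
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx           # attach the shallower tree under the deeper one
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

ds = DisjointSet(4)
print(ds.union(0, 1))  # True: 0 and 1 were in different clusters
print(ds.union(1, 2))  # True
print(ds.union(0, 2))  # False: edge 0-2 would close the cycle 0-1-2
```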
KRUSKAL(G):
    A = ∅
    For each vertex v ∈ G.V:
        MAKE-SET(v)
    For each edge (u, v) ∈ G.E in increasing order of weight(u, v):
        if FIND-SET(u) ≠ FIND-SET(v):
            A = A ∪ {(u, v)}
            UNION(u, v)
    return A
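# Kruskal's algorithm with a union-find (disjoint set) structure
class Graph:
    def __init__(self, vertices):
        self.V = vertices
        self.graph = []  # edge list of [u, v, w]
    def add_edge(self, u, v, w):
        self.graph.append([u, v, w])
    # Search function: follow parent links to the root of i's set (with path compression)
    def find(self, parent, i):
        if parent[i] != i:
            parent[i] = self.find(parent, parent[i])
        return parent[i]
    # Union by rank: attach the shallower tree under the deeper one
    def apply_union(self, parent, rank, x, y):
        xroot, yroot = self.find(parent, x), self.find(parent, y)
        if rank[xroot] < rank[yroot]:
            xroot, yroot = yroot, xroot
        parent[yroot] = xroot
        if rank[xroot] == rank[yroot]:
            rank[xroot] += 1
    def kruskal_algo(self):
        parent, rank, result = list(range(self.V)), [0] * self.V, []
        for u, v, w in sorted(self.graph, key=lambda edge: edge[2]):  # lightest edge first
            if self.find(parent, u) != self.find(parent, v):  # different sets, so no cycle
                result.append((u, v, w))
                self.apply_union(parent, rank, u, v)
        for u, v, w in result:
            print("%d - %d: %d" % (u, v, w))
        return result
g = Graph(6)
for u, v, w in [(0, 1, 4), (0, 2, 4), (1, 2, 2), (1, 0, 4), (2, 0, 4), (2, 1, 2), (2, 3, 3), (2, 5, 2),
                (2, 4, 4), (3, 2, 3), (3, 4, 3), (4, 2, 4), (4, 3, 3), (5, 2, 2), (5, 4, 3)]:
    g.add_edge(u, v, w)
g.kruskal_algo()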
Prim's Algorithm
Prim's algorithm is a minimum spanning tree algorithm that takes a graph as
input and finds the subset of the edges of that graph which
● form a tree that includes every vertex
● have the minimum sum of weights among all the trees that can be formed from the graph
We start from one vertex and keep adding edges with the lowest weight until we
reach our goal.
Choose a vertex
Choose the nearest edge not yet in the solution; if there are multiple choices, choose one at random.
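The steps above can be sketched with `heapq` holding the candidate edges that leave the growing tree. This is a minimal sketch assuming a connected, undirected graph given as `vertex -> [(neighbor, weight)]`; the example reuses the distinct weighted edges from the Kruskal example above, and the function name `prim` is hypothetical.

```python
import heapq

def prim(graph, start):
    """Return (mst_edges, total_weight) for a connected, undirected weighted graph."""
    in_tree = {start}
    # candidate edges (weight, from, to) leaving the tree, lightest on top
    heap = [(w, start, v) for v, w in graph[start]]
    heapq.heapify(heap)
    mst, total = [], 0
    while heap and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue                      # both endpoints already in the tree
        in_tree.add(v)
        mst.append((u, v, w))
        total += w
        for nxt, nw in graph[v]:          # new candidate edges from the added vertex
            if nxt not in in_tree:
                heapq.heappush(heap, (nw, v, nxt))
    return mst, total

edges = [(0, 1, 4), (0, 2, 4), (1, 2, 2), (2, 3, 3), (2, 5, 2), (2, 4, 4), (3, 4, 3), (5, 4, 3)]
graph = {v: [] for v in range(6)}
for u, v, w in edges:
    graph[u].append((v, w))
    graph[v].append((u, w))

mst, total = prim(graph, 0)
print(mst)    # [(0, 1, 4), (1, 2, 2), (2, 5, 2), (2, 3, 3), (3, 4, 3)]
print(total)  # 14
```

Any correct minimum spanning tree of these edges has total weight 14; when weights tie, Prim's and Kruskal's algorithms may pick different edges of equal cost.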