Graph Data Structures
5 is adjacent to 7
7 is adjacent from 5
Graph terminology (cont.)
• What is the number of edges in a complete undirected graph with N vertices?
N * (N-1) / 2, which is O(N^2)
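One quick way to check the formula is to count the unordered vertex pairs directly. The sketch below compares a brute-force pair count with N * (N-1) / 2; the class and method names are illustrative only.

```java
public class CompleteGraphEdges {
    // Brute force: one edge for every unordered pair {i, j} of distinct vertices (i < j).
    static long countEdges(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                count++;
            }
        }
        return count;
    }

    // Closed form: each of the N vertices is joined to the N-1 others,
    // and that counts every edge twice (once from each end).
    static long formula(int n) {
        return (long) n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.println(n + " vertices: " + countEdges(n) + " edges (formula: " + formula(n) + ")");
        }
    }
}
```

For N = 1..6 both columns agree: 0, 1, 3, 6, 10 and 15 edges, a quadratic growth that matches the O(N^2) bound.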
Graph terminology (cont.)
• Weighted graph: a graph in which each edge
carries a value
Graph implementation
• Array-based implementation (Adjacency Matrix)
– A 1D array is used to represent the vertices
– A 2D array (adjacency matrix) is used to represent the edges
Pros: The representation is easy to implement and follow. Removing an edge takes O(1) time. Queries such as whether there is an edge from vertex 'u' to vertex 'v' are efficient and can be done in O(1).
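A minimal sketch of the array-based representation, assuming an unweighted undirected graph with vertices numbered 0..N-1 (the class name is hypothetical). For a weighted graph, the boolean cells would hold edge weights instead.

```java
public class AdjacencyMatrixGraph {
    private final boolean[][] adj; // adj[u][v] is true iff there is an edge u-v

    AdjacencyMatrixGraph(int numVertices) {
        // |V| x |V| cells are allocated up front, however many edges are added later.
        adj = new boolean[numVertices][numVertices];
    }

    // Undirected edge: set both symmetric cells. O(1).
    void addEdge(int u, int v) {
        adj[u][v] = true;
        adj[v][u] = true;
    }

    // Removing an edge is also O(1).
    void removeEdge(int u, int v) {
        adj[u][v] = false;
        adj[v][u] = false;
    }

    // "Is there an edge from u to v?" is a single array lookup. O(1).
    boolean hasEdge(int u, int v) {
        return adj[u][v];
    }

    public static void main(String[] args) {
        AdjacencyMatrixGraph g = new AdjacencyMatrixGraph(4);
        g.addEdge(0, 1);
        g.addEdge(1, 2);
        System.out.println(g.hasEdge(1, 0)); // true
        g.removeEdge(0, 1);
        System.out.println(g.hasEdge(0, 1)); // false
    }
}
```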
• Adjacency matrix
– Good for dense graphs: |E| ~ O(|V|^2)
– Memory requirements: O(|V| + |V|^2) = O(|V|^2), since the matrix has a cell for every pair of vertices regardless of |E|
– Connectivity between two vertices can be tested quickly
• Adjacency list
– Good for sparse graphs: |E| ~ O(|V|)
– Memory requirements: O(|V| + |E|), which is O(|V|) for sparse graphs
– Vertices adjacent to another vertex can be found quickly
We use dynamic arrays (vector in C++/ArrayList in Java) to represent adjacency lists instead of linked lists. The vector implementation has
the advantage of cache friendliness.
During the traversal of a graph, each node is in one of three states, as
follows.
Status 1 (White color): The initial state of any node (Ready state)
Status 2 (Gray color): The node is in the queue or stack i.e. waiting status of the node. (Waiting State)
Status 3 (Black color): The node has been processed, removed from stack/queue. (Processed state)
DFS Algorithm:
1. Initialize all nodes to the ready state, i.e. color all nodes white.
2. Push the starting node A onto the STACK and change its status to the waiting state, i.e. color it gray, and set its forward time stamp to 1.
3. Repeat step 4 and step 5 until the STACK is empty.
4. POP the top node N of the STACK. Process N and change its status to the processed state, i.e. black.
5. PUSH onto the STACK all the adjacent nodes of N that are in the ready state (white) and change their status to the waiting state (gray). If no new visit is possible (no adjacent node is white), backtracking is required: go to step 4.
6. End.
BFS Algorithm:
1. Initialize all nodes to the ready state (white color).
2. Put the starting node in the Queue, change its status to the waiting state (gray color) and set its weight to 0.
3. Repeat step 4 and step 5 until the Queue is empty.
4. Remove the front node N of the Queue. Process N and change its status to the processed state (black color).
5. Add to the Queue all the adjacent nodes of N that are in the ready state. Change their status to the waiting state and set their weight to (weight of N) + 1.
6. End.
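The two step lists above can be sketched in Java as follows, assuming vertices numbered 0..N-1 stored in adjacency lists. The class, method and constant names are illustrative, the example graph is a small hypothetical one, and the DFS forward time stamp is omitted. Because nodes are colored gray when they are pushed (step 5), the stack version's processing order can differ from that of a recursive DFS.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class ColorTraversal {
    // Node states: ready, waiting, processed.
    static final int WHITE = 1, GRAY = 2, BLACK = 3;

    // BFS steps 1-6: weight[v] ends up as the number of edges on a shortest
    // path from start to v (-1 if v is never reached).
    static int[] bfsWeights(List<List<Integer>> graph, int start) {
        int n = graph.size();
        int[] color = new int[n];
        int[] weight = new int[n];
        Arrays.fill(color, WHITE);                 // step 1
        Arrays.fill(weight, -1);
        Deque<Integer> queue = new ArrayDeque<>();
        queue.addLast(start);                      // step 2
        color[start] = GRAY;
        weight[start] = 0;
        while (!queue.isEmpty()) {                 // step 3
            int node = queue.removeFirst();        // step 4: remove the front node
            color[node] = BLACK;
            for (int nbr : graph.get(node)) {      // step 5: enqueue ready neighbours
                if (color[nbr] == WHITE) {
                    color[nbr] = GRAY;
                    weight[nbr] = weight[node] + 1;
                    queue.addLast(nbr);
                }
            }
        }
        return weight;
    }

    // DFS steps 1-6 with an explicit stack; returns nodes in the order they are processed.
    static List<Integer> dfsOrder(List<List<Integer>> graph, int start) {
        int n = graph.size();
        int[] color = new int[n];
        Arrays.fill(color, WHITE);                 // step 1
        List<Integer> order = new ArrayList<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(start);                         // step 2
        color[start] = GRAY;
        while (!stack.isEmpty()) {                 // step 3
            int node = stack.pop();                // step 4: pop the top node
            color[node] = BLACK;
            order.add(node);
            for (int nbr : graph.get(node)) {      // step 5: push ready neighbours
                if (color[nbr] == WHITE) {
                    color[nbr] = GRAY;
                    stack.push(nbr);
                }
            }
        }
        return order;
    }

    // Builds an undirected adjacency list for vertices 0..n-1.
    static List<List<Integer>> undirected(int n, int[][] edges) {
        List<List<Integer>> g = new ArrayList<>();
        for (int i = 0; i < n; i++) g.add(new ArrayList<>());
        for (int[] e : edges) {
            g.get(e[0]).add(e[1]);
            g.get(e[1]).add(e[0]);
        }
        return g;
    }

    public static void main(String[] args) {
        List<List<Integer>> g = undirected(6, new int[][]{{0, 1}, {0, 2}, {1, 2}, {1, 3}, {3, 4}, {4, 5}});
        System.out.println(Arrays.toString(bfsWeights(g, 0))); // [0, 1, 1, 2, 3, 4]
        System.out.println(dfsOrder(g, 0));                   // [0, 2, 1, 3, 4, 5]
    }
}
```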
Breadth First Traversal of a Graph
The Breadth First Traversal or BFS traversal of a graph is similar to the Level Order Traversal of a tree.
The BFS traversal of a graph also traverses the graph in levels. It starts the traversal with a given vertex, visits all of the vertices
adjacent to that vertex and pushes them all into a queue in the order of visiting. Then it pops an element from the front
of the queue, visits all of its neighbours, pushes the neighbours that are not already visited into the queue, and repeats the
process until the queue is empty or all of the vertices are visited.
The BFS traversal uses an auxiliary boolean array, say visited[], which keeps track of the visited vertices: if visited[i] is
true, the i-th vertex has already been visited.
Complete Algorithm:
• Create a boolean array say visited[] of size V+1 where V is the number of vertices in the graph.
• Create a Queue, mark the source vertex visited as visited[s] = true and push it into the queue.
• Until the Queue is empty, repeat the below steps:
– Pop an element from the queue and print the popped element.
– Traverse all of the vertices adjacent to the popped vertex.
– If an adjacent vertex is not already visited, mark it visited and push it into the queue.
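A sketch of the complete algorithm above, assuming vertices numbered 0..V-1, so visited[] needs only V entries (V+1 suits vertices numbered 1..V). The names are illustrative, and popped vertices are collected in a list rather than printed so the order is easy to check.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class Bfs {
    // Returns the vertices in the order they are popped from the queue.
    static List<Integer> bfs(List<List<Integer>> graph, int s) {
        boolean[] visited = new boolean[graph.size()];
        List<Integer> order = new ArrayList<>();
        Queue<Integer> queue = new ArrayDeque<>();
        visited[s] = true;                 // mark the source, then enqueue it
        queue.add(s);
        while (!queue.isEmpty()) {
            int u = queue.poll();          // pop the front element
            order.add(u);
            for (int v : graph.get(u)) {   // traverse all vertices adjacent to u
                if (!visited[v]) {         // marking on enqueue: no vertex is queued twice
                    visited[v] = true;
                    queue.add(v);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        int n = 6;
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {1, 3}, {3, 4}, {4, 5}};
        List<List<Integer>> graph = new ArrayList<>();
        for (int i = 0; i < n; i++) graph.add(new ArrayList<>());
        for (int[] e : edges) {            // undirected edges
            graph.get(e[0]).add(e[1]);
            graph.get(e[1]).add(e[0]);
        }
        System.out.println(bfs(graph, 0)); // [0, 1, 2, 3, 4, 5]
    }
}
```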
Depth First Traversal of a Graph
The Depth-First Traversal or DFS traversal of a graph traverses the graph depth-wise. That is, in this traversal method, we start
traversing the graph from a node and keep going in the same direction as far as possible. When no nodes are left to be traversed along
the current path, we backtrack to find a new possible path and repeat this process until all of the nodes reachable from the start are visited.
We can implement the DFS traversal algorithm using a recursive approach. Since the graph may contain a cycle, the same node could be
visited again; to avoid this, we keep track of the visited vertices in an auxiliary visited[] array. On each step of the recursion, mark
the current vertex as visited and call the recursive function again for all adjacent vertices that are not yet visited.
Output:
Following is Depth First Traversal
(starting from vertex 2)
2 0 1 3
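The recursive approach can be sketched as below. The original example graph is not shown, so the directed edges used here (0->1, 0->2, 1->2, 2->0, 2->3, 3->3) are an assumption, chosen because they contain a cycle and a self-loop and produce the output above when the traversal starts from vertex 2. Class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class Dfs {
    private final List<List<Integer>> adj = new ArrayList<>();

    Dfs(int n) {
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
    }

    // Directed edge u -> v.
    void addEdge(int u, int v) {
        adj.get(u).add(v);
    }

    // Mark u visited, record it, then recurse into every unvisited neighbour.
    private void dfsUtil(int u, boolean[] visited, List<Integer> order) {
        visited[u] = true;
        order.add(u);
        for (int v : adj.get(u)) {
            if (!visited[v]) dfsUtil(v, visited, order);
        }
    }

    List<Integer> dfs(int start) {
        List<Integer> order = new ArrayList<>();
        dfsUtil(start, new boolean[adj.size()], order);
        return order;
    }

    public static void main(String[] args) {
        // Assumed example graph; the cycle 0 -> 2 -> 0 and the self-loop 3 -> 3
        // are why the visited[] array is needed.
        Dfs g = new Dfs(4);
        g.addEdge(0, 1);
        g.addEdge(0, 2);
        g.addEdge(1, 2);
        g.addEdge(2, 0);
        g.addEdge(2, 3);
        g.addEdge(3, 3);
        System.out.println("Following is Depth First Traversal (starting from vertex 2)");
        for (int v : g.dfs(2)) System.out.print(v + " ");
        System.out.println();
    }
}
```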
Applications, Advantages and Disadvantages of DFS
Depth First Search is a widely used algorithm for traversing a graph. Here we have discussed some applications, advantages, and
disadvantages of the algorithm.
Applications of Depth First Search:
1. Detecting cycle in a graph: A graph has a cycle if and only if we see a back edge during DFS. So we can run DFS for the graph and
check for back edges.
2. Path Finding: We can specialize the DFS algorithm to find a path between two given vertices u and z.
Call DFS(G, u) with u as the start vertex.
Use a stack S to keep track of the path between the start vertex and the current vertex.
As soon as destination vertex z is encountered, return the path as the contents of the stack
3. Topological Sorting: Topological Sorting is mainly used for scheduling jobs from the given dependencies among jobs. In computer
science, applications of this type arise in instruction scheduling, ordering of formula cell evaluation when recomputing formula values in
spreadsheets, logic synthesis, determining the order of compilation tasks to perform in makefiles, data serialization, and resolving
symbol dependencies in linkers.
4. To test if a graph is bipartite: We can augment either BFS or DFS: when we first discover a new vertex, color it opposite to its parent,
and for each other edge, check that it doesn't link two vertices of the same color. The first vertex in any connected component can be red or
black.
5. Finding Strongly Connected Components of a Graph: A directed graph is called strongly connected if there is a path from each vertex
in the graph to every other vertex. DFS-based algorithms such as Kosaraju's and Tarjan's find the strongly connected components.
6. Solving puzzles with only one solution, such as mazes. (DFS can be adapted to find all solutions to a maze by only including nodes on
the current path in the visited set.)
7. Web crawlers: Depth-first search can be used in the implementation of web crawlers to explore the links on a website.
8. Maze generation: Depth-first search can be used to generate random mazes.
9. Model checking: Depth-first search can be used in model checking, which is the process of checking that a model of a system meets a
certain set of properties.
10. Backtracking: Depth-first search can be used in backtracking algorithms.
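Three of the applications above (cycle detection, path finding and topological sorting) can be sketched with the same recursive DFS. This assumes a directed graph stored as adjacency lists with vertices 0..N-1; the method names and the small job-dependency graph are hypothetical. The cycle test uses the white/gray/black states: an edge to a gray vertex (one still on the recursion stack) is a back edge. For an undirected graph, the edge back to the parent would have to be ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DfsApplications {
    static final int WHITE = 0, GRAY = 1, BLACK = 2;

    // Application 1: a directed graph has a cycle iff DFS finds a back edge.
    static boolean hasCycle(List<List<Integer>> g) {
        int[] color = new int[g.size()];   // all WHITE initially
        for (int u = 0; u < g.size(); u++) {
            if (color[u] == WHITE && backEdgeFrom(g, u, color)) return true;
        }
        return false;
    }

    private static boolean backEdgeFrom(List<List<Integer>> g, int u, int[] color) {
        color[u] = GRAY;
        for (int v : g.get(u)) {
            if (color[v] == GRAY) return true;  // back edge
            if (color[v] == WHITE && backEdgeFrom(g, v, color)) return true;
        }
        color[u] = BLACK;
        return false;
    }

    // Application 2: the list acts as the stack S holding the path from u to the
    // current vertex. Returns it as soon as z is reached, or null if z is unreachable.
    static List<Integer> findPath(List<List<Integer>> g, int u, int z) {
        List<Integer> path = new ArrayList<>();
        return pathDfs(g, u, z, new boolean[g.size()], path) ? path : null;
    }

    private static boolean pathDfs(List<List<Integer>> g, int cur, int z,
                                   boolean[] visited, List<Integer> path) {
        visited[cur] = true;
        path.add(cur);                      // push
        if (cur == z) return true;
        for (int v : g.get(cur)) {
            if (!visited[v] && pathDfs(g, v, z, visited, path)) return true;
        }
        path.remove(path.size() - 1);       // pop: dead end, backtrack
        return false;
    }

    // Application 3: topological sort of a DAG. A vertex is appended only after every
    // vertex reachable from it, so the reversed order puts u before v for each edge u -> v.
    static List<Integer> topoSort(List<List<Integer>> g) {
        boolean[] visited = new boolean[g.size()];
        List<Integer> finished = new ArrayList<>();
        for (int u = 0; u < g.size(); u++) {
            if (!visited[u]) finish(g, u, visited, finished);
        }
        Collections.reverse(finished);
        return finished;
    }

    private static void finish(List<List<Integer>> g, int u, boolean[] visited, List<Integer> out) {
        visited[u] = true;
        for (int v : g.get(u)) {
            if (!visited[v]) finish(g, v, visited, out);
        }
        out.add(u);                         // post-order
    }

    // Builds a directed adjacency list from an edge array.
    static List<List<Integer>> directed(int n, int[][] edges) {
        List<List<Integer>> g = new ArrayList<>();
        for (int i = 0; i < n; i++) g.add(new ArrayList<>());
        for (int[] e : edges) g.get(e[0]).add(e[1]);
        return g;
    }

    public static void main(String[] args) {
        // Hypothetical job dependencies: an edge a -> b means job a must run before job b.
        List<List<Integer>> dag = directed(4, new int[][]{{0, 1}, {0, 2}, {1, 3}, {2, 3}});
        System.out.println(hasCycle(dag));        // false
        System.out.println(topoSort(dag));        // [0, 2, 1, 3]
        System.out.println(findPath(dag, 0, 3));  // [0, 1, 3]
        List<List<Integer>> cyclic = directed(3, new int[][]{{0, 1}, {1, 2}, {2, 0}});
        System.out.println(hasCycle(cyclic));     // true
    }
}
```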
Advantages of Depth First Search:
• Memory requirement is only linear with respect to the search graph. This is in contrast with breadth-first search which
requires more space. The reason is that the algorithm only needs to store a stack of nodes on the path from the root
to the current node.
• The time complexity of a depth-first search to depth d with branching factor b (the number of children at each node,
the outdegree) is O(b^d), since it generates the same set of nodes as a breadth-first search, but simply in a different
order. Thus, in practice, depth-first search is time-limited rather than space-limited.
• If the depth-first search finds a solution without exploring much of a path, the time and space it takes will be
very small.
• DFS requires less memory since only the nodes on the current path are stored. By chance, DFS may find a solution
without examining much of the search space at all.
Disadvantages of Depth First Search:
• There is a possibility that depth-first search may go down the left-most path forever. Even a
finite graph can generate an infinite search tree if nodes can be revisited. One solution to this problem is to impose a cutoff depth on the search.
The ideal cutoff is the solution depth d, but this value is rarely known in advance of actually solving the
problem. If the chosen cutoff depth is less than d, the algorithm will fail to find a solution, whereas if the cutoff depth
is greater than d, a large price is paid in execution time, and the first solution found may not be an optimal one.
• Depth-First Search is not guaranteed to find a solution.
• There is no guarantee of finding a minimal solution if more than one solution exists.
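A cutoff depth can be added to DFS with one extra parameter. The sketch below is a depth-limited search under these assumptions: a directed graph stored as adjacency lists, and only the vertices on the current path are remembered (onPath), which keeps memory linear in the depth. The method name and the example chain are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DepthLimitedSearch {
    // True if goal is reachable from node using at most `limit` more edges.
    static boolean dls(List<List<Integer>> g, int node, int goal, int limit, boolean[] onPath) {
        if (node == goal) return true;
        if (limit == 0) return false;         // cutoff reached: do not go deeper
        onPath[node] = true;
        for (int next : g.get(node)) {
            if (!onPath[next] && dls(g, next, goal, limit - 1, onPath)) {
                onPath[node] = false;
                return true;
            }
        }
        onPath[node] = false;                 // leaving the current path
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical chain 0 -> 1 -> 2 -> 3: the goal 3 is at depth d = 3.
        List<List<Integer>> g = new ArrayList<>();
        for (int i = 0; i < 4; i++) g.add(new ArrayList<>());
        g.get(0).add(1);
        g.get(1).add(2);
        g.get(2).add(3);
        System.out.println(dls(g, 0, 3, 2, new boolean[4])); // false: cutoff < d
        System.out.println(dls(g, 0, 3, 3, new boolean[4])); // true: cutoff >= d
    }
}
```

The two calls illustrate the trade-off described above: a cutoff below d misses the solution, while any cutoff at or above d finds it.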
Introduction to Kruskal’s Algorithm
In Kruskal’s algorithm, we sort all edges of the given graph in increasing order of weight. Then it keeps
adding new edges and nodes to the MST as long as the newly added edge does not form a
cycle. It picks the minimum weighted edge first and the maximum weighted edge
last. Thus it makes a locally optimal choice at each step in order to find
the optimal solution. Hence this is a Greedy Algorithm.
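A minimal Kruskal sketch, assuming an undirected graph given as {u, v, weight} triples. It uses a disjoint-set (union-find) structure to detect whether an edge would form a cycle; that choice of cycle test is ours, as the text does not specify one. The 9-vertex example edge list is an assumption, reconstructed to be consistent with the vertices and key values in the walkthrough below.

```java
import java.util.Arrays;

public class Kruskal {
    // Union-find "find" with path halving (a form of path compression).
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // edges[i] = {u, v, weight}. Returns the total weight of a minimum spanning tree
    // (of a minimum spanning forest if the graph is disconnected).
    static int mstWeight(int n, int[][] edges) {
        int[][] sorted = edges.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[2], b[2])); // increasing weight
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        int total = 0, used = 0;
        for (int[] e : sorted) {
            int ru = find(parent, e[0]), rv = find(parent, e[1]);
            if (ru == rv) continue;          // same component: this edge would form a cycle
            parent[ru] = rv;                 // merge the two components
            total += e[2];
            if (++used == n - 1) break;      // a spanning tree has n - 1 edges
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] edges = {
            {0, 1, 4}, {0, 7, 8}, {1, 2, 8}, {1, 7, 11}, {2, 3, 7}, {2, 8, 2}, {2, 5, 4},
            {3, 4, 9}, {3, 5, 14}, {4, 5, 10}, {5, 6, 2}, {6, 7, 1}, {6, 8, 6}, {7, 8, 7}
        };
        System.out.println(mstWeight(9, edges)); // 37
    }
}
```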
Prim's Algorithm (another greedy MST algorithm):
1. Create a set mstSet that keeps track of vertices already included in MST.
2. Assign a key value to all vertices in the input graph. Initialize all key values as INFINITE. Assign key
value as 0 for the first vertex so that it is picked first.
3. While mstSet doesn't include all vertices:
• Pick a vertex u which is not there in mstSet and has minimum key value.
• Include u in mstSet.
• Update key value of all adjacent vertices of u. To update the key values, iterate through all
adjacent vertices. For every adjacent vertex v, if weight of edge u-v is less than the previous key
value of v, update the key value as weight of u-v.
The idea of using key values is to pick the minimum weight edge from the cut.
The key values are used only for vertices that are not yet included in the MST; the key value of such a vertex
indicates the minimum weight of an edge connecting it to the set of vertices already included in the MST.
Pick the vertex with the minimum key value that is not already included in the MST (not in mstSet).
The vertex 1 is picked and added to mstSet. So mstSet now becomes {0, 1}. Update the
key values of adjacent vertices of 1. The key value of vertex 2 becomes 8.
Pick the vertex with the minimum key value that is not already included in the MST (not in mstSet).
We can pick either vertex 7 or vertex 2; let vertex 7 be picked. So mstSet now becomes {0,
1, 7}. Update the key values of adjacent vertices of 7. The key values of vertices 6 and 8
become finite (1 and 7 respectively).
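The key-value steps above can be sketched with an adjacency matrix and a linear scan for the minimum key, which gives O(V^2) time. The 9-vertex edge list is an assumption reconstructed to match the walkthrough (0-1 of weight 4, 0-7 and 1-2 of weight 8, 7-6 of weight 1, 7-8 of weight 7). Ties are broken by the lower vertex number, so at the tie this sketch picks vertex 2 rather than 7; that does not change the total weight.

```java
import java.util.Arrays;

public class Prim {
    // graph[u][v] = weight of edge u-v, or 0 if there is no edge.
    // Returns parent[], where edge parent[v]-v belongs to the MST (parent[0] = -1).
    static int[] primMst(int[][] graph) {
        int n = graph.length;
        int[] key = new int[n];
        int[] parent = new int[n];
        boolean[] mstSet = new boolean[n];
        Arrays.fill(key, Integer.MAX_VALUE);   // all keys INFINITE ...
        key[0] = 0;                            // ... except the first vertex, so it is picked first
        parent[0] = -1;
        for (int count = 0; count < n; count++) {
            // Pick the vertex u not in mstSet with the minimum key value.
            int u = -1;
            for (int v = 0; v < n; v++) {
                if (!mstSet[v] && (u == -1 || key[v] < key[u])) u = v;
            }
            mstSet[u] = true;
            // Update the keys of adjacent vertices that are still outside the MST.
            for (int v = 0; v < n; v++) {
                if (graph[u][v] != 0 && !mstSet[v] && graph[u][v] < key[v]) {
                    key[v] = graph[u][v];
                    parent[v] = u;
                }
            }
        }
        return parent;
    }

    public static void main(String[] args) {
        int n = 9;
        int[][] edges = {
            {0, 1, 4}, {0, 7, 8}, {1, 2, 8}, {1, 7, 11}, {2, 3, 7}, {2, 8, 2}, {2, 5, 4},
            {3, 4, 9}, {3, 5, 14}, {4, 5, 10}, {5, 6, 2}, {6, 7, 1}, {6, 8, 6}, {7, 8, 7}
        };
        int[][] graph = new int[n][n];
        for (int[] e : edges) {
            graph[e[0]][e[1]] = e[2];
            graph[e[1]][e[0]] = e[2];
        }
        int[] parent = primMst(graph);
        int total = 0;
        for (int v = 1; v < n; v++) {
            System.out.println(parent[v] + " - " + v + "  weight " + graph[parent[v]][v]);
            total += graph[parent[v]][v];
        }
        System.out.println("Total weight: " + total); // 37
    }
}
```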
Dijkstra's Algorithm for Shortest Path in a Weighted Graph
Given a graph and a source vertex in the graph, find the shortest paths from the source to all
vertices in the given graph.
Dijkstra's algorithm is a variation of the BFS algorithm. In Dijkstra's Algorithm, a shortest path tree (SPT) is generated with the given
source as root. Each node in this SPT stores the length of the shortest path from the source vertex to that vertex. We
maintain two sets: one contains the vertices included in the shortest path tree, the other contains the vertices not yet included in the shortest
path tree. At every step of the algorithm, we find a vertex that is in the other set (not yet included) and has the minimum
distance from the source.
Below are the detailed steps used in Dijkstra's algorithm to find the shortest path from a single source vertex to all other vertices in
the given weighted graph.
Algorithm:
1. Create a set sptSet (shortest path tree set) that keeps track of vertices included in shortest path tree, i.e., whose minimum
distance from source is calculated and finalized. Initially, this set is empty.
2. Assign a distance value to all vertices in the input graph. Initialize all distance values as INFINITE. Assign distance value as 0 for
the source vertex so that it is picked first.
3. While sptSet doesn't include all vertices:
• Pick a vertex u which is not there in sptSet and has minimum distance value.
• Include u in sptSet.
• Update the distance values of all adjacent vertices of u. To update the distance values, iterate through all adjacent vertices. For
every adjacent vertex v, if the sum of the distance value of u (from the source) and the weight of edge u-v is less than the distance value
of v, then update the distance value of v.
Let us understand the above algorithm with the help of an example.
Consider the below given graph:
The set sptSet is initially empty and distances assigned to vertices are {0, INF,
INF, INF, INF, INF, INF, INF} where INF indicates infinite. Now pick the vertex
with minimum distance value.
The vertex 0 is picked; include it in sptSet. So sptSet becomes {0}. After including 0 in sptSet, update the distance values of its adjacent
vertices. The adjacent vertices of 0 are 1 and 7. The distance values of 1 and 7 are updated to 4 and 8. The following subgraph shows vertices
and their distance values; only the vertices with finite distance values are shown. The vertices included in the SPT are shown in green
colour.
Implementation:
Since at every step we need to find the vertex with the minimum distance from the source among the vertices currently
not added to the SPT, we can use a min heap for an easier and more efficient implementation. Below is the complete algorithm using a
priority_queue (min heap) to implement Dijkstra's Algorithm:
2) Create an empty priority_queue pq. Every item of pq is a pair (weight, vertex). Weight (or distance) is used as the first item of the
pair because, by default, the first item is used to compare two pairs.
Note: Dijkstra's Algorithm doesn't work when the graph
has negative edge weights.
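A sketch of the priority-queue version, assuming non-negative weights and adjacency lists of {vertex, weight} pairs. Java's PriorityQueue has no decrease-key operation, so this sketch pushes a new {distance, vertex} pair on every improvement and skips stale pairs when they are polled. The 9-vertex example graph is an assumption, consistent with the distances 4 and 8 for vertices 1 and 7 in the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class Dijkstra {
    // adj.get(u) holds {v, weight} pairs; all weights must be non-negative.
    static int[] shortestPaths(List<List<int[]>> adj, int src) {
        int n = adj.size();
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);   // INFINITE
        dist[src] = 0;
        // Items are {distance, vertex}; the heap orders them by distance (the first item).
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        pq.add(new int[]{0, src});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int d = top[0], u = top[1];
            if (d > dist[u]) continue;          // stale entry: u was already finalized
            for (int[] edge : adj.get(u)) {
                int v = edge[0], w = edge[1];
                if (dist[u] + w < dist[v]) {    // relax edge u-v
                    dist[v] = dist[u] + w;
                    pq.add(new int[]{dist[v], v});
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        int n = 9;
        int[][] edges = {
            {0, 1, 4}, {0, 7, 8}, {1, 2, 8}, {1, 7, 11}, {2, 3, 7}, {2, 8, 2}, {2, 5, 4},
            {3, 4, 9}, {3, 5, 14}, {4, 5, 10}, {5, 6, 2}, {6, 7, 1}, {6, 8, 6}, {7, 8, 7}
        };
        List<List<int[]>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {                 // undirected graph: store both directions
            adj.get(e[0]).add(new int[]{e[1], e[2]});
            adj.get(e[1]).add(new int[]{e[0], e[2]});
        }
        System.out.println(Arrays.toString(shortestPaths(adj, 0)));
        // [0, 4, 12, 19, 21, 11, 9, 8, 14]
    }
}
```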
Bellman-Ford Algorithm for Shortest Path
Problem: Given a graph and a source vertex src in graph, find shortest paths from src to all vertices in the given graph. The graph
may contain negative weight edges.
We have discussed Dijkstra's algorithm for this problem. Dijkstra's algorithm is a Greedy algorithm, and its time complexity is
O(E + V log V) with the use of a Fibonacci heap. Dijkstra doesn't work for graphs with negative weight edges; Bellman-Ford works for
such graphs. Bellman-Ford is also simpler than Dijkstra and suits distributed systems well. But the time complexity of Bellman-Ford
is O(VE), which is more than Dijkstra's.
Algorithm:
Input: Graph and a source vertex src.
Output: Shortest distance to all vertices from src. If there is a negative weight cycle, then shortest distances are not calculated;
the negative weight cycle is reported instead.
1. This step initializes distances from the source to all vertices as infinite and the distance to the source itself as 0. Create an array dist[] of
size |V| with all values infinite except dist[src] = 0, where src is the source vertex.
2. This step calculates shortest distances. Do the following |V|-1 times, where |V| is the number of vertices in the given graph. Do the
following for each edge u-v:
If dist[v] > dist[u] + weight of edge uv, then update dist[v] as: dist[v] = dist[u] + weight of edge uv.
3. This step reports if there is a negative weight cycle in the graph. Do the following for each edge u-v: if dist[v] > dist[u] + weight of
edge uv, then report "Graph contains negative weight cycle".
The idea of step 3 is that step 2 guarantees the shortest distances if the graph doesn't contain a negative weight cycle. If we iterate
through all edges one more time and get a shorter path for any vertex, then there is a negative weight
cycle.
How does this work? Like other Dynamic Programming problems, the algorithm calculates shortest paths in a bottom-up manner. It first
calculates the shortest distances for paths with at most one edge. Then, it calculates the shortest paths with at most 2 edges,
and so on. After the i-th iteration of the outer loop, the shortest paths with at most i edges are calculated. A simple path has at most |V| - 1
edges, which is why the outer loop runs |V| - 1 times. The idea is that, assuming there is no negative weight cycle, if
we have calculated shortest paths with at most i edges, then one more iteration over all edges guarantees the shortest paths with at most
(i+1) edges.
Example:
Let us understand the algorithm with the following example graph. The images are taken from this source.
Let the given source vertex be 0. Initialize all distances as infinite, except the distance to the source itself. The total number of vertices in
the graph is 5, so all edges must be processed 4 times.
Let all edges be processed in the following order: (B,E), (D,B), (B,D), (A,B), (A,C), (D,C), (B,C), (E,D). We get the following distances
when all edges are processed the first time. The first row shows the initial distances. The second row shows distances when
edges (B,E), (D,B), (B,D) and (A,B) are processed. The third row shows distances when (A,C) is processed. The fourth row
shows distances when (D,C), (B,C) and (E,D) are processed.
The first iteration guarantees to give all shortest paths which are at most 1 edge long. We get the following distances when
all edges are processed a second time (The last row shows final values).
The second iteration guarantees to give all shortest paths which are at most 2 edges long. The algorithm processes all edges
2 more times. The distances are minimized after the second iteration, so third and fourth iterations don't update the
distances.
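The three steps can be sketched as follows, processing the edges in the order listed in the example, with vertices A..E numbered 0..4. The figure with the edge weights is not available, so the weights in main are an assumption; with them, no distance changes after the second iteration, as described above. Returning null to signal a negative weight cycle is a design choice of this sketch.

```java
import java.util.Arrays;

public class BellmanFord {
    static final int INF = Integer.MAX_VALUE;

    // edges[i] = {u, v, weight} for a directed edge u -> v.
    // Returns dist[], or null if a negative weight cycle is reachable from src.
    static int[] shortestPaths(int n, int[][] edges, int src) {
        int[] dist = new int[n];
        Arrays.fill(dist, INF);                       // step 1
        dist[src] = 0;
        for (int i = 1; i <= n - 1; i++) {            // step 2: |V| - 1 rounds
            for (int[] e : edges) {
                int u = e[0], v = e[1], w = e[2];
                if (dist[u] != INF && dist[u] + w < dist[v]) dist[v] = dist[u] + w;
            }
        }
        for (int[] e : edges) {                       // step 3: one more round
            int u = e[0], v = e[1], w = e[2];
            if (dist[u] != INF && dist[u] + w < dist[v]) return null; // negative weight cycle
        }
        return dist;
    }

    public static void main(String[] args) {
        // A..E = 0..4, in the processing order of the example; weights are assumed.
        int[][] edges = {
            {1, 4, 2},   // (B,E)
            {3, 1, 1},   // (D,B)
            {1, 3, 2},   // (B,D)
            {0, 1, -1},  // (A,B)
            {0, 2, 4},   // (A,C)
            {3, 2, 5},   // (D,C)
            {1, 2, 3},   // (B,C)
            {4, 3, -3}   // (E,D)
        };
        System.out.println(Arrays.toString(shortestPaths(5, edges, 0))); // [0, -1, 2, -2, 1]
    }
}
```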
Important Notes:
The Floyd-Warshall algorithm is named after its creators Robert Floyd and Stephen
Warshall. It is used to find the shortest paths between all pairs of nodes in a weighted
graph. It runs in O(V^3) time and can handle graphs with both positive and
negative edge weights, making it a versatile tool for solving a wide range of network and
connectivity problems.
The Floyd-Warshall Algorithm is an all-pairs shortest path algorithm, unlike Dijkstra and
Bellman-Ford, which are single-source shortest path algorithms. This algorithm works for
both directed and undirected weighted graphs. However, it does not work for graphs
with negative cycles (where the sum of the edges in a cycle is negative). It follows a
dynamic programming approach, checking every possible path going via every possible
intermediate node to calculate the shortest distance between every pair of nodes.
Floyd-Warshall Algorithm:
• Initialize the solution matrix to be the same as the input graph matrix as a first step.
• Then update the solution matrix by considering all vertices as intermediate vertices.
• The idea is to pick all vertices one by one and update all shortest paths that include the picked
vertex as an intermediate vertex in the shortest path.
• When we pick vertex number k as an intermediate vertex, we have already considered vertices {0, 1,
2, .. k-1} as intermediate vertices.
• For every pair (i, j) of source and destination vertices respectively, there are two possible cases.
– k is not an intermediate vertex in the shortest path from i to j. We keep the value of dist[i][j] as it is.
– k is an intermediate vertex in the shortest path from i to j. We update the value of dist[i][j] to dist[i][k] + dist[k][j], if dist[i][j] > dist[i][k] + dist[k][j].
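A direct sketch of the update rule, assuming the graph is given as a V x V matrix with 0 on the diagonal and a large INF value where there is no edge. The INF constant and the example matrix are hypothetical. Pairs involving INF are skipped, so INF is never combined with a negative weight.

```java
public class FloydWarshall {
    static final int INF = Integer.MAX_VALUE / 2;   // "no edge"

    // Returns the all-pairs shortest distance matrix; the input matrix is not modified.
    static int[][] allPairs(int[][] graph) {
        int n = graph.length;
        int[][] dist = new int[n][];
        for (int i = 0; i < n; i++) dist[i] = graph[i].clone();  // start from the input matrix
        for (int k = 0; k < n; k++) {                 // allow k as an intermediate vertex
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    if (dist[i][k] == INF || dist[k][j] == INF) continue; // no path via k
                    if (dist[i][k] + dist[k][j] < dist[i][j]) {
                        dist[i][j] = dist[i][k] + dist[k][j];
                    }
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        int[][] graph = {
            {0, 5, INF, 10},
            {INF, 0, 3, INF},
            {INF, INF, 0, 1},
            {INF, INF, INF, 0}
        };
        for (int[] row : allPairs(graph)) {
            StringBuilder sb = new StringBuilder();
            for (int d : row) sb.append(d == INF ? "INF" : String.valueOf(d)).append(' ');
            System.out.println(sb.toString().trim());
        }
    }
}
```

For this matrix the sketch prints "0 5 8 9", "INF 0 3 4", "INF INF 0 1" and "INF INF INF 0": for example, the direct 0 -> 3 edge of weight 10 is replaced by the path 0 -> 1 -> 2 -> 3 of weight 9.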
From your computer's file system to your bank's transaction history, trees
are widely used to represent hierarchical data.
There are many types of trees, each with its own strengths and
weaknesses. Selecting the right one can make a big difference in the
efficiency and speed of your algorithm.
3 Think Recursively
The recursive nature of trees makes them powerful tools for solving
complex problems. By thinking in terms of parent-child relationships, you
can unlock the full potential of tree structures.
import java.util.ArrayList;
import java.util.List;

public class Main
{
    // Builds an undirected graph as an adjacency list: graph.get(i) holds the neighbours of vertex i.
    public List<List<Integer>> buildGraph(int[][] edges, int n){
        List<List<Integer>> graph = new ArrayList<>();
        for(int i=0;i<n;i++){
            graph.add(new ArrayList<>());      // empty neighbour list for each vertex 0..n-1
        }
        for(int[] edge:edges){
            // An undirected edge is stored in both endpoints' lists.
            graph.get(edge[0]).add(edge[1]);
            graph.get(edge[1]).add(edge[0]);
        }
        return graph;
    }

    public static void main(String[] args) {
        int [][] edges= new int[][]{{0,1},{0,2},{1,2},{1,3},{3,4},{4,5}};
        int n=6;
        Main adjlist=new Main();
        List<List<Integer>> graph =adjlist.buildGraph(edges,n);
        // Print each vertex followed by its neighbours, e.g. "1 : 0 2 3".
        for(int i=0;i<n;i++){
            System.out.print(i+" : ");
            for(int nbr:graph.get(i)){
                System.out.print(nbr+" ");
            }
            System.out.println();
        }
    }
}