Cycle Detection in Directed Graph
IIT2013180
Shubham Gupta
Note: This is a solo assignment that I've done apart from the group assignment Cycle-Detection-In-UndirectedGraph, which I did in a team with SP-Harish (IIT2013134). I've done this extra assignment to compensate for
the IPCO Quiz-plus-assignment (Parallel Matrix-Multiplication) given in the month of February, which I had
missed.
Cycles are detected in directed graphs using back edges. A back edge is an edge from a node to either
itself or one of its ancestors, where an ancestor is a node that lies on the current depth first search
path leading to the current node. This can be achieved using DFS, but it's slightly more complicated
than it appears at first sight.
IPCO-630 Assignment
IIT2013180
Shubham Gupta
Undirected Graph
In the undirected graph shown above, there's a cycle. Starting from node 1, the order of DFS traversal
would be:
1 -> 2 -> 3 -> 4 -> 1 (cycle detected)
This cycle can easily be detected by keeping a visited vector to track the nodes that have
already been traversed. Once we encounter a node that is already marked visited, we have found a
cycle. To accomplish this recursively, we need not only the current node but also its parent in
the order of traversal established by DFS, so that the edge leading back to the parent isn't mistaken for a cycle.
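As an illustration, the visited-plus-parent check described above could look roughly like the following Rust sketch. The function names, the 0-based node numbering and the adjacency-list representation are assumptions made for this example (they are not taken from the group assignment), and the sketch assumes a simple graph without parallel edges.

```rust
/// DFS over an undirected graph stored as adjacency lists.
/// Returns true as soon as an already visited node other than the
/// parent is reached. (Hypothetical helper, for illustration only.)
fn dfs_undirected(
    adj: &[Vec<usize>],
    node: usize,
    parent: Option<usize>,
    visited: &mut [bool],
) -> bool {
    visited[node] = true;
    for &v in &adj[node] {
        // Skip the edge we arrived on; it is not a cycle.
        if Some(v) == parent {
            continue;
        }
        if visited[v] || dfs_undirected(adj, v, Some(node), visited) {
            return true;
        }
    }
    false
}

/// Runs the DFS from every unvisited node so that all components are covered.
fn has_cycle_undirected(adj: &[Vec<usize>]) -> bool {
    let mut visited = vec![false; adj.len()];
    for start in 0..adj.len() {
        if !visited[start] && dfs_undirected(adj, start, None, &mut visited) {
            return true;
        }
    }
    false
}
```

With the four-node square from the figure renumbered as 0..3, has_cycle_undirected reports a cycle, while a simple path of three nodes does not.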
Directed Graph
In the directed graph above, which looks identical to its undirected counterpart, there's no cycle.
This is because, as per the definition of a cycle, no walk starting from any node in this graph leads back
to the same node. So although there are two separate paths leading to node 3 from node 1:
1 -> 2 -> 3
1 -> 4 -> 3
there isn't a cycle in this graph. Evidently, the approach for undirected graphs will fail for directed
graphs, since node 3, which is already visited, will definitely be reached twice as a result of the two separate
paths. But this same flaw also gives us an insight into the remedy.
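To see the failure concretely, the sketch below applies a visited-only check to the directed graph above (nodes renumbered 0..3, so the edges are 0 -> 1, 1 -> 2, 0 -> 3, 3 -> 2). The function name and the adjacency-list representation are assumptions made for this example only.

```rust
/// Naive check: treats ANY edge to an already visited node as a cycle.
/// This is deliberately the flawed approach, shown to expose the false positive.
fn naive_visited_dfs(adj: &[Vec<usize>], node: usize, visited: &mut [bool]) -> bool {
    visited[node] = true;
    for &v in &adj[node] {
        if visited[v] || naive_visited_dfs(adj, v, visited) {
            return true;
        }
    }
    false
}
```

Calling it from node 0 on this graph returns true even though the graph is acyclic: the second path reaches the node that the first path has already finished with.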
Overall, the DFS on this Directed Graph will look like:
1 -> 2 -> 3 -> (backtrack) 2 -> (backtrack) 1 -> 4 -> 3
If we look at the nodes in our current recursion stack, it goes like the following (first table):

Current Node    Recursion Stack
1               1
2               1, 2
3               1, 2, 3
2               1, 2
1               1
4               1, 4
3               1, 4, 3

From the recursion stack, we can see that in the two instances where we are at node 3, the recursion
stack doesn't contain node 3 twice but rather only once. This is a direct consequence of the fact that
node 3 is reachable from node 1 using two distinct paths.

Current Node    Recursion Stack
1               1
2               1, 2
3               1, 2, 3
4               1, 2, 3, 4
1               1, 2, 3, 4, 1

By contrast, in the traversal 1 -> 2 -> 3 -> 4 -> 1 (second table), the cycle is
detected when we get back to node 1 after starting from 1. This is clearly indicated by the recursion
stack, which contains 1 twice at the moment when the cycle is detected.
Therefore, in order to detect cycles in directed graphs, we should be able to check whether a node is
already present in the current recursion stack. If a node in the stack is repeated, then a cycle is confirmed.
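A very direct, if inefficient, way to express this idea is to carry the recursion stack around explicitly and scan it before every recursive call. The sketch below does exactly that; the function name and the adjacency-list representation are assumptions for illustration, and it is not the method used in the rest of this report.

```rust
/// Keeps the current DFS path in `stack` and reports a cycle when an
/// edge leads to a node that is already on that path.
/// On success the stack is left holding the path that closes the cycle.
fn dfs_on_stack(adj: &[Vec<usize>], node: usize, stack: &mut Vec<usize>) -> bool {
    stack.push(node);
    for &v in &adj[node] {
        // Linear scan of the path: simple, but costs O(V) per edge.
        if stack.contains(&v) || dfs_on_stack(adj, v, stack) {
            return true;
        }
    }
    stack.pop(); // backtrack: node leaves the recursion stack
    false
}
```

Since nothing remembers which nodes are already finished, this version may re-explore the same node many times, which is one reason a cheaper bookkeeping scheme is preferable.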
To achieve this, we extend our methodology for detecting cycles in undirected graphs taking two
vectors instead of one: visiting and visited. The use of two vectors is as follows:
Visiting
This vector marks the nodes that are still present in the recursion stack. In other words, the
DFS calls on these nodes haven't returned and their children are yet to be fully explored. An
edge to a node marked as visiting indicates a cycle.
Visited
This vector marks the nodes that have already been pushed onto and popped off the
recursion stack. These nodes need not be explored any further, and an edge to a node marked
as visited does not indicate a cycle; rather, it only tells us that we need not call DFS on that
node.
With this, we can write the following pseudo code, which is naturally recursive. It must be
noted that the following pseudo code only detects cycles in one connected component. If the
graph has several disjoint connected components, then the method has to be invoked on each
one of those separately.
DFS (node)
    // mark current node as visiting
    visiting [node] = true
    // recurse over each adjacent node reachable from current node
    for all vertices v adjacent to node
        if cycle_found
            return
        end-if
        if visiting [v] = true
            cycle_found = true
            print (cycle)
            return
        else if visited [v] = false
            DFS (v)
        end-if
    end-loop
    // mark current node as not visiting and visited
    visiting [node] = false
    visited [node] = true
end
SPACE COMPLEXITY: O (V)
TIME COMPLEXITY: O (V + E)
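For reference, a serial Rust rendering of the visiting/visited scheme might look like the sketch below. It is not the implementation described later in this report: the function names are made up for this example, it uses adjacency lists rather than an adjacency matrix, and it returns a bool instead of setting a global cycle_found flag. The outer loop covers the case of several disjoint components.

```rust
/// DFS with two marker vectors: `visiting` (on the recursion stack)
/// and `visited` (fully explored). Returns true if a back edge is found.
fn dfs_directed(
    adj: &[Vec<usize>],
    node: usize,
    visiting: &mut [bool],
    visited: &mut [bool],
) -> bool {
    visiting[node] = true;
    for &v in &adj[node] {
        if visiting[v] {
            return true; // edge to a node on the recursion stack: cycle
        }
        if !visited[v] && dfs_directed(adj, v, visiting, visited) {
            return true;
        }
    }
    // All children explored: node leaves the stack and is finished.
    visiting[node] = false;
    visited[node] = true;
    false
}

/// Invokes the DFS from every node that has not been finished yet.
fn has_cycle_directed(adj: &[Vec<usize>]) -> bool {
    let n = adj.len();
    let mut visiting = vec![false; n];
    let mut visited = vec![false; n];
    for start in 0..n {
        if !visited[start] && dfs_directed(adj, start, &mut visiting, &mut visited) {
            return true;
        }
    }
    false
}
```

On the acyclic four-node graph discussed earlier this returns false, while a self-loop, a four-node ring, or a cycle hidden in a component not reachable from node 0 all return true.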
Shown above is a glimpse of a scenario where cycle detection is being run on a graph resembling a binary
tree. Suppose that cycle detection was invoked from the main method at the root node (blue). The
root node invoked cycle detection on its two adjacent nodes in threads T1 and T2 respectively. The
yellow node was marked visiting by thread T1 and the pink node by thread T2. The white node, which is
adjacent to the yellow node, is still unmarked. If thread T1 next invokes DFS on the white node (by spawning
a new thread), then we're good to go. However, if the main thread, which returned to the main method
after invoking recursive calls to DFS on the yellow and pink nodes in threads T1 and T2 respectively, now
makes a call to DFS on the white node inside the for loop before thread T1 could, then there will be a
problem. This situation is depicted in the following figure.
At this point, when thread T1 checks its adjacent nodes (excluding the parent blue node marked
as 1), it will see that node 2 (coloured in blue) is already visited. So it would think that a cycle has been
found while there is none. Note that under this scenario, node 1 would already have been marked as
not visiting and visited. This could further cause us to miss a cycle that loops back to the
starting node 1, as node 1 has already been marked as not visiting but visited.
The above argument makes it clear why we need to join the newly spawned threads before exiting
the DFS method. So we can now understand that, as per the above algorithm, DFS invoked on a
node will not return until all vertices reachable from it have been marked visited. When this
happens, an entire connected component will have been marked visited, and we'll then move on by
invoking DFS on some other unvisited node in the main method.
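The join-before-return rule can be sketched in Rust with scoped threads, which are joined automatically when the scope ends. This is only a rough illustration under several assumptions: the struct Shared, the function names and the use of atomics and adjacency lists are invented for this example and are not the report's actual code, and cnt here simply counts spawn attempts. The sketch also does nothing about two threads reaching the same node through different paths, so it should be read as a demonstration of the joining pattern rather than a complete parallel algorithm.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering::SeqCst};
use std::thread;

const MAX_THREADS: usize = 10; // cap on spawned threads (assumed value)

/// Shared state, standing in for the global variables (hypothetical layout).
struct Shared {
    adj: Vec<Vec<usize>>,
    visiting: Vec<AtomicBool>,
    visited: Vec<AtomicBool>,
    cycle_found: AtomicBool,
    cnt: AtomicUsize,
}

fn par_dfs(s: &Shared, node: usize) {
    s.visiting[node].store(true, SeqCst);
    thread::scope(|scope| {
        for &v in &s.adj[node] {
            if s.cycle_found.load(SeqCst) {
                break;
            }
            if s.visiting[v].load(SeqCst) {
                s.cycle_found.store(true, SeqCst);
                break;
            }
            if !s.visited[v].load(SeqCst) {
                if s.cnt.fetch_add(1, SeqCst) < MAX_THREADS {
                    // Explore this child in a new thread.
                    scope.spawn(move || par_dfs(s, v));
                } else {
                    // Thread budget used up: explore in the current thread.
                    par_dfs(s, v);
                }
            }
        }
    }); // every thread spawned above is joined here, before we go on
    // Only now, with all descendants finished, is the node marked done.
    s.visiting[node].store(false, SeqCst);
    s.visited[node].store(true, SeqCst);
}

fn has_cycle_parallel(adj: Vec<Vec<usize>>) -> bool {
    let n = adj.len();
    let s = Shared {
        adj,
        visiting: (0..n).map(|_| AtomicBool::new(false)).collect(),
        visited: (0..n).map(|_| AtomicBool::new(false)).collect(),
        cycle_found: AtomicBool::new(false),
        cnt: AtomicUsize::new(0),
    };
    for start in 0..n {
        if s.cycle_found.load(SeqCst) {
            break;
        }
        if !s.visited[start].load(SeqCst) {
            par_dfs(&s, start);
        }
    }
    s.cycle_found.load(SeqCst)
}
```

Because the scope closes before the node is marked as not visiting and visited, a parent can never be finished while one of its children is still being explored, which is the property argued for above.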
Rust Implementation
In our Rust implementation, we use an adjacency matrix to store graphs. Boolean vectors are used to hold
the visiting and visited information of nodes. Overall, we have the following global variables:
IDENTIFIER         Type      Usage
MAX_THREADS        i32       constant value of 10 denoting the maximum number of threads to use
cnt                i32       keeps count of the number of threads spawned so far
n                  i32       actual number of nodes in the graph
adj [500] [500]    i32       adjacency matrix to store the graph
cycle_found        boolean   flag variable to indicate that a cycle has been found
visited [500]      boolean   vector to mark nodes whose DFS call has completed
visiting [500]     boolean   vector to store the status of nodes whose DFS call is ongoing
Speedup Analysis
To test our code against its serial counterpart, the following inputs were used:
- A graph having only one big cycle comprising all nodes and no other edges except those making up the cycle
- A graph having the structure of a complete binary tree
- A graph having random edges with count equal to 1/4th of the maximum possible number of edges
- A graph having random edges with count equal to 1/2 of the maximum number of edges
- A graph having random edges with count equal to 3/4th of the maximum number of edges
The fully-connected graph was deliberately skipped because cycle detection in such a graph would
involve only 3 iterations at the maximum, and comparing the results of serial and parallel codes for such
a low number of iterations wouldn't be appropriate.
The graphs had 500 nodes each, and the following table lists the average speedup ratio of the codes with
different limits on the maximum number of threads (T).
Input Type       Filename                                  T=2    T=5    T=10
Cycle            inp_directed_350x350_cycle.txt            1.02   0.89   0.85
Tree             inp_directed_350x350_tree.txt             1.57   1.63   1.69
Random Sparse    inp_directed_350x350_random_sparse.txt    1.48   1.43   1.52
Random Medium    inp_directed_350x350_random_medium.txt    1.34   1.29   1.33
Random Dense     inp_directed_350x350_random_dense.txt     0.93   0.91   0.68
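The report does not spell out how the speedup ratio was computed. Assuming the usual definition (serial running time divided by parallel running time, averaged over several runs), a measurement helper could be sketched as follows. All function names here are hypothetical, and this is not the harness that produced the numbers above.

```rust
use std::time::{Duration, Instant};

/// Measures the wall-clock time taken by one call of `f`.
fn time_it<F: FnOnce()>(f: F) -> Duration {
    let start = Instant::now();
    f();
    start.elapsed()
}

/// Speedup ratio under the assumed definition: serial time / parallel time.
/// A value above 1 means the parallel code was faster.
fn speedup(serial: Duration, parallel: Duration) -> f64 {
    serial.as_secs_f64() / parallel.as_secs_f64()
}

/// Average of the per-run speedup ratios over (serial, parallel) pairs.
fn average_speedup(runs: &[(Duration, Duration)]) -> f64 {
    let total: f64 = runs.iter().map(|&(s, p)| speedup(s, p)).sum();
    total / runs.len() as f64
}
```

Under this definition, the entries below 1 in the table (for example the dense random graph) correspond to cases where the parallel code was slower than the serial one.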
Library Integration
In the library, the method that performs cycle detection in a directed graph is named
cyc_det_directed_y2k. It takes two arguments: matrix of type &mut [[i32; 350]; 350] and
n_tmp of type i32. The matrix holds the adjacency matrix of the graph, while n_tmp holds the
number of nodes in the graph.
The above method invokes depth first search through the method dfs_second, which takes the
current node as its argument. This DFS method spawns new threads, and the limit on the maximum number
of threads spawned is set by the global variable MAX_THREADS_second. Following is the list of
global variables used: