Graph Algorithms
BFS, DFS
Greedy Strategies Applied to Graph problems:
We first review some notations and terms about graphs. A
graph consists of vertices (nodes) and edges (arcs, links),
in which each edge connects two vertices (not
necessarily distinct). More formally, a graph G = (V, E),
where V and E denote the sets of vertices and edges,
respectively.
[Figure: an example undirected graph on vertices 1, 2, 3, 4 with labeled edges.]
In this example, V = {1, 2, 3, 4},
E = {a, b, c, d, e}. Edges c and d
are parallel edges; edge e is a
self-loop. A path is a sequence
of adjacent edges, e.g., path
abeb, path acdab.
Directed graphs vs. undirected graphs:
If every edge has an orientation, e.g., an edge starting from
node x terminating at node y, the graph is called a directed
graph, or digraph for short. If all edges have no orientation,
the graph is called an undirected graph, or simply, a graph.
When there are no parallel edges (two edges that have
identical end points), we could identify an edge with its two
end points, such as edge (1,2), or edge (3,3). In an undirected
graph, edge (1,2) is the same as edge (2,1). We will assume
no parallel edges unless otherwise stated.
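As a minimal sketch of identifying an edge with its two end points (assuming no parallel edges), the Python snippet below stores an undirected edge as an unordered `frozenset` and a directed edge as an ordered tuple. The helper names `undirected_edge` and `directed_edge` and the edge sets are illustrative assumptions, not part of the text.

```python
def undirected_edge(x, y):
    """Unordered end points: edge (x, y) is the same as edge (y, x)."""
    return frozenset((x, y))


def directed_edge(x, y):
    """Ordered end points: the edge starts at x and terminates at y."""
    return (x, y)


# Hypothetical edge sets, chosen only for illustration.
undirected_edges = {undirected_edge(1, 2), undirected_edge(3, 3)}
directed_edges = {directed_edge(1, 2), directed_edge(3, 3)}

print(undirected_edge(2, 1) in undirected_edges)  # True
print(directed_edge(2, 1) in directed_edges)      # False
```

A self-loop such as (3,3) collapses to a one-element set in the undirected encoding, which is still a valid set member.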
[Figure: a directed graph on vertices 1, 2, 3, 4 with labeled edges.]
A directed graph. Edges c and d
are parallel (directed) edges. Some
directed paths are ad, ebac.
Both directed and undirected graphs appear often and naturally
in many scientific (call graphs in program analysis), business
(query trees, entity-relation diagrams in databases), and
engineering (CAD design) applications. The simplest data
structure for representing graphs and digraphs is a 2-dimensional
array. Suppose G = (V, E) and |V| = n. Declare
an array T[1..n][1..n] so that T[i][j] = 1 if there is an edge (i, j)
∈ E, and 0 otherwise. (Note that in an undirected graph, edges (i, j)
and (j, i) refer to the same edge, so T is symmetric.)
[Figure: the 2-dimensional array T for the digraph, with rows indexed by i = 1..4 and columns by j = 1..4, called the adjacency matrix.]
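The adjacency matrix construction can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `adjacency_matrix`, the `directed` flag, and the example edge list are hypothetical, and the edge list is not claimed to match the digraph in the figure. Row 0 and column 0 are left unused so that nodes can be indexed 1..n as in T[1..n][1..n].

```python
def adjacency_matrix(n, edges, directed=True):
    """Return an (n+1) x (n+1) table T with T[i][j] = 1 iff (i, j) is an edge.

    Index 0 is unused padding so that nodes are numbered 1..n.
    """
    T = [[0] * (n + 1) for _ in range(n + 1)]
    for i, j in edges:
        T[i][j] = 1
        if not directed:
            T[j][i] = 1  # (i, j) and (j, i) are the same undirected edge
    return T


# Hypothetical 4-node digraph, chosen only for illustration.
example_edges = [(1, 2), (2, 4), (3, 3), (4, 2)]
T = adjacency_matrix(4, example_edges)
for i in range(1, 5):
    print(T[i][1:])  # row i, columns 1..4
```

The table always takes n x n entries of space regardless of how many edges exist, which is the cost that motivates sparser representations.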
Sometimes, edges of a graph or digraph are given a
positive weight or cost value. In that case, the adjacency
matrix can easily be modified so that T[i][j] = the weight of
edge (i, j), or 0 if there is no edge (i, j). Since the adjacency
matrix may contain many zeros (when the graph has few
edges, known as sparse), a more space-efficient representation
uses linked lists to store the edges, known as the
adjacency list representation.
[Figure: the adjacency lists for the digraph, one list per node 1..4.]
The adjacency lists can store edge weights by adding another
field in the list nodes.
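A minimal sketch of weighted adjacency lists in Python is shown below. It swaps the linked lists described above for Python's built-in lists, and stores the weight as a second field in each entry. The function name `adjacency_lists` and the example edges and weights are assumptions made for illustration.

```python
def adjacency_lists(n, weighted_edges):
    """Map each node 1..n to a list of (neighbor, weight) entries."""
    adj = {v: [] for v in range(1, n + 1)}
    for i, j, w in weighted_edges:
        adj[i].append((j, w))  # directed edge i -> j with weight w
    return adj


# Hypothetical weighted digraph, chosen only for illustration.
example = [(1, 2, 5), (2, 4, 1), (3, 3, 2), (4, 2, 7)]
adj = adjacency_lists(4, example)
for v in sorted(adj):
    print(v, adj[v])
```

The total storage is proportional to n + e (one list head per node plus one entry per edge), compared with n x n for the adjacency matrix.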
Graph (and Digraph) Traversal techniques:
Given a (directed) graph G = (V, E), determine all nodes
that are reachable from a given node v via a (directed) path.
There are essentially two graph traversal algorithms, known
as breadth-first search (BFS) and depth-first search (DFS),
both of which can be implemented efficiently.
BFS: From node v, visit each of its neighboring nodes in
sequence, then visit their neighbors, etc., while avoiding
repeated visits.
DFS: From node v, visit its first neighboring node and all
its neighbors using recursion, then visit node v's second
neighbor applying the same procedure, until all of v's
neighbors are visited, while avoiding repeated visits.
Breadth-First Search (BFS):
BFS(v) // visit all nodes reachable from node v
(1) create an empty FIFO queue Q, add node v to Q
(2) create a boolean array visited[1..n], initialize all values
    to false except for visited[v], which is set to true
(3) while Q is not empty
    (3.1) delete the front node w from Q
    (3.2) for each node z adjacent from node w
              if visited[z] is false then
                  add node z to Q and set visited[z] to true
The time complexity is O(n+e) for a graph
with n nodes and e edges, if
adjacency lists are used. This is
because in the worst case, each
node is added to the queue once
(the O(n) part), and each node's
adjacency list is scanned once when
the node is deleted from the queue,
so every edge is considered a constant
number of times (the O(e) part).
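The BFS steps above can be sketched in Python as follows. This is an illustration, not a definitive implementation: it assumes the graph is given as a dictionary of adjacency lists, uses a set in place of the boolean array visited[1..n], and uses `collections.deque` as the FIFO queue. The function name `bfs` and the example graph are hypothetical.

```python
from collections import deque


def bfs(adj, v):
    """Return the nodes reachable from v, in the order BFS removes them from Q."""
    visited = {v}          # plays the role of visited[1..n]
    order = []
    queue = deque([v])     # FIFO queue Q, initially holding v
    while queue:
        w = queue.popleft()            # step (3.1): delete the front node
        order.append(w)
        for z in adj.get(w, []):       # step (3.2): nodes adjacent from w
            if z not in visited:
                visited.add(z)
                queue.append(z)
    return order


# Hypothetical digraph; nodes 5 and 6 are not reachable from node 1.
example_adj = {1: [2, 3], 2: [4], 3: [4], 4: [], 5: [6], 6: []}
print(bfs(example_adj, 1))  # [1, 2, 3, 4]
```

Marking a node as visited when it is added to the queue, rather than when it is removed, is what guarantees each node enters the queue at most once.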
[Figure: BFS node search order starting with node 1, including two nodes not reached; labels shown: 1, 2, 4, 5.]
Depth-First Search (DFS):
(1) create a boolean array visited[1..n], initialize all values
    to false except for visited[v], which is set to true
(2) call DFS(v) to visit all nodes reachable via a path
DFS(v)
    for each neighboring node w of v do
        if visited[w] is false then
            set visited[w] to true; call DFS(w) // recursive call
The algorithm's time
complexity is also O(n+e)
using the same reasoning as
in the BFS algorithm.
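The DFS procedure above can be sketched in Python as follows, under the same assumptions as before: the graph is a dictionary of adjacency lists and a set stands in for the boolean array visited[1..n]. The names `dfs_from` and `dfs` and the example graph are hypothetical.

```python
def dfs_from(adj, v):
    """Return the nodes reachable from v, in the order DFS first visits them."""
    visited = {v}   # step (1): only v is marked at the start
    order = [v]

    def dfs(u):
        for w in adj.get(u, []):       # each neighboring node w of u
            if w not in visited:
                visited.add(w)
                order.append(w)
                dfs(w)                 # recursive call

    dfs(v)          # step (2)
    return order


# Hypothetical digraph; nodes 5 and 6 are not reachable from node 1.
example_adj = {1: [2, 3], 2: [4], 3: [4], 4: [], 5: [6], 6: []}
print(dfs_from(example_adj, 1))  # [1, 2, 4, 3]
```

Because the sketch is recursive, a very long path in the graph can exceed Python's default recursion limit; an explicit stack is a common alternative in that case.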
[Figure: DFS node search order starting with node 1, including two nodes not reached; labels shown: 1, 2, 5, 4.]