Chapter 9 - Graph Traversal
Graph Traversal:
Graphs are one of the unifying themes of computer science – an abstract representation which
describes the organization of transportation systems, electrical circuits, human interactions, and
telecommunication networks. That so many different structures can be modeled using a single
formalism is a source of great power to the educated programmer.
In this chapter, we focus on problems which require only an elementary knowledge of graph
algorithms, specifically the appropriate use of graph data structures and traversal algorithms. In
Chapter 10, we will present problems relying on more advanced graph algorithms that find
minimum spanning trees, shortest paths, and network flows.
Flavors of Graphs:
A graph G = (V, E) is defined by a set of vertices V, and a set of edges E consisting of ordered or
unordered pairs of vertices from V. In modeling a road network, the vertices may represent the cities
or junctions, certain pairs of which are connected by roads/edges. In analyzing the source code of a
computer program, the vertices may represent lines of code, with an edge connecting lines x and y if
y can be the next statement executed after x. In analyzing human interactions, the vertices typically
represent people, with edges connecting pairs of related souls.
There are several fundamental properties of graphs which impact the choice of data structures used
to represent them and algorithms available to analyze them. The first step in any graph problem is
determining which flavor of graph you are dealing with:
• Undirected vs. Directed — A graph G = (V, E) is undirected if edge (x, y) ∈ E implies that
(y, x) is also in E. If not, we say that the graph is directed. Road networks between cities are
typically undirected, since any large road has lanes going in both directions. Street networks
within cities are almost always directed, because there are typically at least a few one-way
streets lurking about. Program-flow graphs are typically directed, because the execution
flows from one line into the next and changes direction only at branches. Most graphs of
graph-theoretic interest are undirected.
• Weighted vs. Unweighted — In weighted graphs, each edge (or vertex) of G is assigned a
numerical value, or weight. Typical application-specific edge weights for road networks
might be the distance, travel time, or maximum capacity between x and y. In unweighted
graphs, there is no cost distinction between various edges and vertices.
The difference between weighted and unweighted graphs becomes particularly apparent in
finding the shortest path between two vertices. For unweighted graphs, the shortest path must
have the fewest number of edges, and can be found using the breadth-first search algorithm
discussed in this chapter. Finding shortest paths in weighted graphs requires more sophisticated
algorithms.
• Cyclic vs. Acyclic — An acyclic graph does not contain any cycles. Trees are connected
acyclic undirected graphs. Trees are the simplest interesting graphs, and inherently recursive
structures since cutting any edge leaves two smaller trees.
Directed acyclic graphs are called DAGs. They arise naturally in scheduling problems, where
a directed edge (x, y) indicates that x must occur before y. An operation called topological
sorting orders the vertices of a DAG so as to respect these precedence constraints. Topological
sorting is typically the first step of any algorithm on a DAG.
• Simple vs. Non-simple — Certain types of edges complicate the task of working with graphs.
A self-loop is an edge (x, x) involving only one vertex. An edge (x, y) is a multi-edge if it
occurs more than once in the graph.
Both of these structures require special care in implementing graph algorithms. Hence any
graph which avoids them is called simple.
• Embedded vs. Topological — A graph is embedded if the vertices and edges have been
assigned geometric positions. Thus any drawing of a graph is an embedding, which may or
may not have algorithmic significance.
Occasionally, the structure of a graph is completely defined by the geometry of its embedding.
For example, if we are given a collection of points in the plane, and seek the minimum cost
tour visiting all of them (i.e., the traveling salesman problem), the underlying topology is the
complete graph connecting each pair of vertices. The weights are typically defined by the
Euclidean distance between each pair of points.
Another example of topology from geometry arises in grids of points. Many problems on an n
× m grid involve walking between neighboring points, so the edges are implicitly defined
from the geometry.
• Implicit vs. Explicit — Many graphs are not explicitly constructed and then traversed, but built
as we use them. A good example is in backtrack search. The vertices of this implicit search
graph are the states of the search vector, while edges link pairs of states which can be directly
generated from each other. It is often easier to work with an implicit graph than explicitly
constructing it before analysis.
• Labeled vs. Unlabeled — In labeled graphs, each vertex is assigned a unique name or
identifier to distinguish it from all other vertices. In unlabeled graphs, no such distinctions
have been made.
Most graphs arising in applications are naturally and meaningfully labeled, such as city names
in a transportation network. A common problem arising on graphs is that of isomorphism
testing, determining whether the topological structure of two graphs is in fact identical if we
ignore any labels. Such problems are typically solved using backtracking, by trying to assign
each vertex in each graph a label such that the structures are identical.
• Adjacency Lists in Matrices — Adjacency lists can also be embedded in matrices, thus
eliminating the need for pointers. We can represent a list in an array (or equivalently, a row of
a matrix) by keeping a count k of how many elements there are, and packing them into the
first k elements of the array. Now we can visit successive elements from the first to the last
just like a list, but by incrementing an index in a loop instead of cruising through pointers.
This data structure looks like it combines the worst properties of adjacency matrices (large
space) with the worst properties of adjacency lists (the need to search for edges). However,
there is a method to its madness. First, it is the simplest data structure to program, particularly
for static graphs which do not change after they are built. Second, the space problem can in
principle be eliminated by allocating the rows for each vertex dynamically, and making them
exactly the right size.
To prove our point, we will use this representation in all our examples below.
• Table of Edges — An even simpler data structure is just to maintain an array or linked list of
the edges. This is not as flexible as the other data structures at answering “who is adjacent to
vertex x?” but it works just fine for certain simple procedures like Kruskal’s minimum
spanning tree algorithm.
As stated above, we will use adjacency lists in matrices as our basic data structure to represent
graphs. It is not complicated to convert these routines to honest pointer-based adjacency lists.
Sample code for adjacency lists and matrices can be found in many books.
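The table-of-edges idea mentioned above can be sketched as follows. This is only an illustration: the names Edge and neighbors_of are hypothetical and do not come from the chapter. It shows why the structure is weak at answering "who is adjacent to vertex x?": every query has to scan the whole table.

```cpp
#include <vector>

// Hypothetical names: Edge and neighbors_of are illustrative only.
// An undirected, unweighted graph stored as a plain list of vertex pairs.
struct Edge {
    int x;
    int y;
};

// Answering "who is adjacent to v?" requires a scan over every edge,
// so each query costs time proportional to the number of edges.
std::vector<int> neighbors_of(const std::vector<Edge> &edges, int v) {
    std::vector<int> result;
    for (const Edge &e : edges) {
        if (e.x == v) {
            result.push_back(e.y);
        } else if (e.y == v) {
            result.push_back(e.x);
        }
    }
    return result;
}
```

For the edge table {(1, 2), (1, 3), (2, 4)}, calling neighbors_of(edges, 1) collects 2 and 3 in the order the edges were stored. An algorithm that only needs to process the edges one after another does not pay this scanning cost.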
We represent a graph using the following data type. For each graph, we keep count of the number of
vertices, and assign each vertex a unique number from 1 to nVertices. We represent the edges in an
MAXV × MAXDEGREE array, so each vertex can be adjacent to MAXDEGREE others. By
defining MAXDEGREE to be MAXV, we can represent any simple graph, but this is wasteful of
space for low-degree graphs:
struct graph
{
int edges[MAXV+1][MAXDEGREE]; // adjacency info
int degree[MAXV+1]; // out-degree of each vertex
int nVertices; // number of vertices in graph
int nEdges; // number of edges in graph
};
int main()
{
graph *p, u;
bool directed;
p = &u;
cout << "If the graph is directed enter 1, otherwise enter 0: ";
cin >> directed;
read_graph(p, directed);
print_graph(p);
return 0;
}
We represent a directed edge (x, y) by the integer y in x’s adjacency list, which is located in the
subarray graph->edges[x]. The degree field counts the number of meaningful entries for the given
vertex. An undirected edge (x, y) appears twice in any adjacency-based graph structure, once as y in
x’s list, and once as x in y’s list.
To demonstrate the use of this data structure, we show how to read in a graph from a file. A typical
graph format consists of an initial line featuring the number of vertices and edges in the graph,
followed by a listing of the edges at one vertex pair per line.
initialize_graph(g);
void initialize_graph(graph *g)
{
int i; // counter
g -> nVertices = 0;
g -> nEdges = 0;
for (i=1; i<=MAXV; i++)
g->degree[i] = 0;
}
The critical routine is insert_edge. We parameterize it with a Boolean flag directed to identify
whether we need to insert two copies of each edge or only one. Note the use of recursion to solve
the problem:
void insert_edge(graph *g, int x, int y, bool directed)
{
g->edges[x][g->degree[x]] = y;
g->degree[x] ++;
if (directed == false)
insert_edge(g,y,x,true);
else
g->nEdges ++;
}
Sample input / output:
Breadth-First Search:
You are given the maze figure below and asked to find a way to the exit with a minimum number of
decisions to make (a decision is required whenever there is at least one available direction to go).
We have pointed out these critical positions and given them numbers.
On the basis of the above diagram we will draw a graph with the following rules:
Also the circles colored in cyan are the start (1) and the finish (10).
It is easy to see now that the minimum length path is 1, 3, 14, 15, 10 with 4 decisions to make (the
number of edges connecting the vertices). This is because we have an overview of the maze and
know every detail about it in advance. But what if we do not? We would never be aware of the
consequences of our current decision until we make it. What would our strategy be then?
One possibility is to gather an infinite number of people (this is for instructional purposes only) and
put them in the start position of the maze.
Then every time we are to make a decision, leave one single person in the current position and split
the group into N smaller ones, where N is the number of the current possible decisions to make.
These groups each go a different way and so on, until someone reaches the finish. It is clear that the
path found by this group of people is of minimum length, since every group has to make only one
decision at a time (the length of each step is the same for everyone) and this group made it first to
the finish. Note that passing through a point that already has one person standing there is not
permitted.
To reconstruct the minimum length path, let us assume that every person that is left in a certain
point knows exactly the position where the group came from. For example the person in point 2
knows that the group came from the 1st position before it left him there. Now if we start with the
last position of the winning group (the finish) we may go backwards as we know its previous
position, and so on until we reach the start. We have now constructed the exact minimum path (in
terms of decisions to make) to arrive at the finish of the maze, but in reversed order. The last
operation is to reverse the path in order to find the correct one.
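The backward walk just described can be sketched in C++ as follows. This is a minimal illustration rather than the chapter's own listing: it uses std::queue and std::vector, and the helper name shortest_path and its parameters are hypothetical. Each vertex remembers the vertex it was reached from, and the path is rebuilt from the target back to the start and then reversed.

```cpp
#include <algorithm>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: returns the vertices of a minimum-edge path from
// start to target (labels 1..n), or an empty vector if none exists.
std::vector<int> shortest_path(int n,
                               const std::vector<std::pair<int, int>> &edges,
                               int start, int target) {
    std::vector<std::vector<int>> adj(n + 1);
    for (const auto &e : edges) {  // undirected: store both directions
        adj[e.first].push_back(e.second);
        adj[e.second].push_back(e.first);
    }
    std::vector<int> prev(n + 1, 0);  // 0 means "no predecessor"
    std::vector<bool> visited(n + 1, false);
    std::queue<int> q;
    visited[start] = true;
    q.push(start);
    while (!q.empty()) {
        int k = q.front();
        q.pop();
        if (k == target) break;
        for (int next : adj[k]) {
            if (!visited[next]) {
                visited[next] = true;
                prev[next] = k;  // remember where the group came from
                q.push(next);
            }
        }
    }
    if (!visited[target]) return {};
    std::vector<int> path;
    for (int v = target; v != 0; v = prev[v]) path.push_back(v);  // walk back
    std::reverse(path.begin(), path.end());
    return path;
}
```

Called with n = 15, the seventeen edges passed to addEdge in the Maze() routine of this chapter's listing, start 1 and target 10, the sketch returns 1, 3, 14, 15, 10, which agrees with the path read off the maze by eye.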
This is a version of Lee's algorithm for finding the shortest path between two particular vertices in
an undirected graph. But let us now concentrate on the method of traversal, which in terms of graph
theory is known as Breadth First Search (BFS for short). I assume you already know the algorithm;
it is all in the way the groups of people move from one point to another: at each step the groups
separate into smaller ones, then go to the next positions and also leave a person behind. Remember
it is not allowed to pass through a point with one person already standing there. This separation
process continues until there are no more available positions to visit.
For a better understanding let us try and traverse the next graph starting from vertex 3 using this
method. The steps are already described graphically below but let us make a few comments.
First there is a group of 5 people (which is enough for this situation) positioned in vertex 3. Then
one person goes to vertex 1, one to vertex 5, two persons go to vertex 2 and one stays in 3. At the
next phase of the process we clearly see that the person in vertex 1 has nowhere to go, since both its
next possible positions (2 and 3) are occupied. Because 2 is smaller than 5 (considering the integer
numbers order) we will search first for its available positions. The single one is vertex 4, so we
leave one person in 2 and send one to 4. This seems like our last move - no person can move from
its current position anymore since all are occupied.
The BFS method shows the vertices that are visited through each step of the traversal process.
3, 1, 2, 5, 4
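The walk-through above can be checked with a short sketch. It is an illustration using std::queue, not the chapter's own Queue class, and the helper name bfs_order is hypothetical. The adjacency lists in the usage note correspond to the edges that the Traversal() routine later in this chapter adds with addEdge, with each list kept in increasing order so that smaller-numbered neighbours are explored first.

```cpp
#include <queue>
#include <vector>

// Hypothetical helper: returns the order in which BFS visits the vertices
// reachable from start. adj[v] lists the neighbours of v; index 0 is unused.
std::vector<int> bfs_order(const std::vector<std::vector<int>> &adj, int start) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    std::queue<int> q;
    visited[start] = true;  // a person is left standing on the start vertex
    q.push(start);
    while (!q.empty()) {
        int k = q.front();
        q.pop();
        order.push_back(k);
        for (int next : adj[k]) {
            if (!visited[next]) {  // occupied vertices may not be entered again
                visited[next] = true;
                q.push(next);
            }
        }
    }
    return order;
}
```

With the adjacency lists 1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 5}, 4: {2, 5}, 5: {3, 4}, calling bfs_order(adj, 3) yields 3, 1, 2, 5, 4, the same order as in the walk-through.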
To implement this method in C++ we will use a queue to store the current position of each group of
people and search for their available directions to go. Also we will use an additional Boolean array
to store information about each vertex (whether it is occupied or not).
The Breadth First Search pseudocode looks like this (assuming x is the first node to start the
traversal):
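visited [ x ] = true
ADD x to Queue
while (Queue is not empty)
    k = GET the first element from Queue
    for each vertex i
        if (visited[ i ] == false AND there is an edge between k and i) then
            ADD i to Queue
            visited [ i ] = true
        end if
end while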
The minimum length path algorithm is very similar to this. The only difference is that we need to
store the position from which every person came in the traversal process in order to reconstruct the
path. We will achieve this using an array and displaying it in reverse order at the end of the
traversal. Below you will find the listing that implements both BFS and the minimum path
algorithm by encapsulating them inside the Graph class. We have also implemented a Queue class
since both use this type of container.
#include <iostream>
using namespace std;
struct node
{
int info;
node *next;
};
class Queue
{
public:
Queue();
~Queue();
bool isEmpty();
void add(int);
int get();
private:
node *first, *last;
};
class Graph
{
public:
Graph(int size = 2);
~Graph();
bool isConnected(int, int);
// adds the (x, y) pair to the edge set
void addEdge(int x, int y);
// performs a Breadth First Search starting with node x
void BFS(int x);
// searches for the minimum length path
// between the start and target vertices
void minPath(int start, int target);
private :
int n;
int **A;
};
Queue::Queue()
{
first = new node;
first->next = NULL;
last = first;
}
Queue::~Queue()
{
// release any nodes still queued, then the sentinel head node
while (!isEmpty())
get();
delete first;
}
bool Queue::isEmpty()
{
return (first->next == NULL);
}
void Queue::add(int x)
{
node *aux = new node;
aux->info = x;
aux->next = NULL;
last->next = aux;
last = aux;
}
int Queue::get()
{
node *aux = first->next;
int value = aux->info;
first->next = aux->next;
if (last == aux) last = first;
delete aux;
return value;
}
Graph::Graph(int size)
{
int i, j;
if (size < 2)
n = 2;
else
n = size;
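A = new int*[n];
for (i = 0; i < n; ++i)
A[i] = new int[n];
// start with an empty edge set
for (i = 0; i < n; ++i)
for (j = 0; j < n; ++j)
A[i][j] = 0;
}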
Graph::~Graph()
{
for (int i = 0; i < n; ++i)
delete [] A[i];
delete [] A;
}
void Graph::BFS(int x)
{
Queue Q;
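bool *visited = new bool[n+1];
int i;
// mark every vertex as unvisited before the search starts
for (i = 0; i <= n; ++i)
visited[i] = false;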
Q.add(x);
visited[x] = true;
cout << "Breadth First Search starting from vertex ";
cout << x << " : " << endl;
while (!Q.isEmpty())
{
int k = Q.get();
cout << k << " ";
for (i = 1; i <= n; ++i)
if (isConnected(k, i) && !visited[i])
{
Q.add(i);
visited[i] = true;
}
}
Q.add(start);
visited[start] = true;
found = false;
p = q = 0;
X[0].current = start;
X[0].prev = 0;
if (i == target)
found = true;
}
++p;
}
cout << "The minimum length path from " << start;
cout << " to " << target << " is : " << endl;
p = 0;
while (q)
{
Y[p] = X[q].current;
q = X[q].prev;
++p;
}
Y[p] = X[0].current;
for (q = 0; q <= p/2; ++q)
{
int temp = Y[q];
Y[q] = Y[p-q];
Y[p-q] = temp;
}
delete [] visited;
delete [] X;
delete [] Y;
}
void Traversal()
{
Graph g(5);
g.addEdge(1, 2); g.addEdge(1, 3); g.addEdge(2, 4);
g.addEdge(3, 5); g.addEdge(4, 5); g.addEdge(2, 3);
g.BFS(3);
}
void Maze()
{
Graph f(15);
f.addEdge(1, 2); f.addEdge(1, 3); f.addEdge(2, 4);
f.addEdge(3, 14); f.addEdge(4, 5); f.addEdge(4, 6);
f.addEdge(5, 7); f.addEdge(6, 13); f.addEdge(7, 8);
f.addEdge(7, 9); f.addEdge(8, 11); f.addEdge(9, 10);
f.addEdge(10, 12); f.addEdge(10, 15); f.addEdge(11, 12);
f.addEdge(13, 14); f.addEdge(14, 15);
f.minPath(1, 10);
}
int main()
{
Traversal();
cout << endl;
Maze();
return 0;
}
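The Graph class in the listing declares isConnected and addEdge, but their bodies do not appear in it. One plausible shape is sketched below, assuming vertex labels run from 1 to n while the n × n matrix is indexed from 0, and assuming edges are undirected. The class name MatrixGraph is hypothetical, and std::vector replaces the raw int** purely to keep the sketch short.

```cpp
#include <vector>

// Hypothetical stand-in for the adjacency-matrix part of the Graph class.
class MatrixGraph {
public:
    // Mirrors the listing's rule that a graph has at least two vertices.
    explicit MatrixGraph(int size = 2)
        : n(size < 2 ? 2 : size), A(n, std::vector<int>(n, 0)) {}

    // Assumption: labels run from 1 to n, so shift by one when indexing.
    bool isConnected(int x, int y) const { return A[x - 1][y - 1] == 1; }

    // Assumption: edges are undirected, so both symmetric entries are set.
    void addEdge(int x, int y) { A[x - 1][y - 1] = A[y - 1][x - 1] = 1; }

private:
    int n;                            // number of vertices
    std::vector<std::vector<int>> A;  // adjacency matrix, all zeros at first
};
```

After MatrixGraph g(5) and g.addEdge(1, 2), both g.isConnected(1, 2) and g.isConnected(2, 1) hold, while g.isConnected(1, 3) does not.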
Sample input / output:
Depth-First Search:
Order in which the nodes are expanded
A depth-first search starting at A, assuming that the left edges in the shown graph are chosen before
right edges, and assuming the search remembers previously visited nodes and will not repeat them
(since this is a small graph), will visit the nodes in the following order: A, B, D, F, E, C, G.
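The visiting order above can be reproduced with a short recursive sketch. The figure is not reproduced here, so the edge set in example_graph is an assumption, chosen to be consistent with the stated order: undirected edges A-B, A-C, A-E, B-D, B-F, C-G and E-F, with each neighbour list written left to right. The names Adjacency, dfs_visit, dfs_order and example_graph are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using Adjacency = std::map<char, std::vector<char>>;

// Recursive DFS: record the node, then explore each unvisited neighbour
// in the order it is listed ("left" edges first).
void dfs_visit(const Adjacency &adj, char node,
               std::set<char> &visited, std::string &order) {
    visited.insert(node);
    order.push_back(node);
    auto it = adj.find(node);
    if (it == adj.end()) return;
    for (char next : it->second) {
        if (visited.count(next) == 0) dfs_visit(adj, next, visited, order);
    }
}

std::string dfs_order(const Adjacency &adj, char start) {
    std::set<char> visited;  // remembers previously visited nodes
    std::string order;
    dfs_visit(adj, start, visited, order);
    return order;
}

// Assumed example graph (see the note above about the missing figure).
Adjacency example_graph() {
    return {
        {'A', {'B', 'C', 'E'}},
        {'B', {'A', 'D', 'F'}},
        {'C', {'A', 'G'}},
        {'D', {'B'}},
        {'E', {'A', 'F'}},
        {'F', {'B', 'E'}},
        {'G', {'C'}},
    };
}
```

Under that assumed edge set, dfs_order(example_graph(), 'A') returns the string "ABDFECG". The search reaches E through F before it ever tries the direct edge from A, which is why E precedes C in the order.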
#include<iostream>
using namespace std;
class graph
{
private:
int n;
graph* next;
public:
graph* read_graph(graph*);
void dfs(int); //dfs for a single node
void dfs(); //dfs of the entire graph
void ftraverse(graph*);
};
graph *g[100];
int visit[100];
int dfs_span_tree[100][100];
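graph* graph::read_graph(graph*head)
{
int x;
graph *last, *NEW;
head=last=NULL;
// Assumption: the neighbours of a node are entered one at a time and the
// list is terminated by 0; the original input loop is not shown.
cout<<"Enter an adjacent node (0 to stop): ";
cin>>x;
while(x!=0)
{
NEW=new graph;
NEW->n=x;
NEW->next=NULL;
if(head==NULL)
head=NEW;
else
last->next=NEW;
last=NEW;
cout<<"Enter an adjacent node (0 to stop): ";
cin>>x;
}
return head;
}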
void graph::ftraverse(graph*h)
{
while(h!=NULL)
{
cout<<h->n<<"->";
h=h->next;
}
cout<<"NULL"<<endl;
}
void graph::dfs(int x)
{
cout<<"node "<<x<<" is visited\n";
visit[x]=1;
graph *p;
p=g[x];
while(p!=NULL)
{
int x1=p->n;
if(visit[x1]==0)
{
cout<<"from node "<<x<<' ';
//Add the edge to the dfs spanning tree
dfs_span_tree[x][x1]=1;
dfs(x1);
}
p=p->next;
}
}
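void graph::dfs()
{
int i;
cout<<"*****************************************************\n";
cout<<"This program is to implement dfs for an unweighted graph \n";
cout<<"*****************************************************\n";
// Assumption: the node count is read here; the original prompt is not shown.
cout<<"Enter the number of nodes (at most 99): ";
cin>>n;
for(i=1;i<=n;i++)
{
cout<<"Enter the adjacent nodes to node no. "<<i<<endl;
cout<<"***************************************\n";
g[i]=read_graph(g[i]);
}
for(i=1;i<=n;i++)
visit[i]=0; //mark all nodes as unvisited
for(i=1;i<=n;i++)
for(int j=1;j<=n;j++)
dfs_span_tree[i][j]=0;
// Assumption: the search is restarted from every unvisited node so that
// disconnected graphs are covered; the original call is not shown.
for(i=1;i<=n;i++)
if(visit[i]==0)
dfs(i);
//print the adjacency matrix of the dfs spanning tree
for(i=1;i<=n;i++)
{
for(int j=1;j<=n;j++)
cout<<dfs_span_tree[i][j]<<' ';
cout<<endl;
}
}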
int main()
{
graph obj;
obj.dfs();
return 0;
}
Sample input / output:
Topological Sorting:
In graph theory, a topological sort or topological ordering of a directed acyclic graph (DAG) is a
linear ordering of its nodes in which each node comes before all nodes to which it has outbound
edges. Every DAG has one or more topological sorts.
More formally, define the reachability relation R over the nodes of the DAG such that xRy if and
only if there is a directed path from x to y. Then, R is a partial order, and a topological sort is a
linear extension of this partial order, that is, a total order compatible with the partial order.
The canonical application of topological sorting (topological order) is in scheduling a sequence of
jobs or tasks. The jobs are represented by vertices, and there is an edge from x to y if job x must be
completed before job y can be started (for example, when washing clothes, the washing machine
must finish before we put the clothes to dry). Then, a topological sort gives an order in which to
perform the jobs.
In computer science, applications of this type arise in instruction scheduling, ordering of formula
cell evaluation when recomputing formula values in spreadsheets, logic synthesis, determining the
order of compilation tasks to perform in makefiles, and resolving symbol dependencies in linkers.
Example:
• 7, 5, 3, 11, 8, 2, 9, 10 (visual left-to-right, top-to-bottom)
• 3, 5, 7, 8, 11, 2, 9, 10 (smallest-numbered available vertex first)
• 3, 7, 8, 5, 11, 10, 2, 9
• 5, 7, 3, 8, 11, 10, 9, 2 (least number of edges first)
• 7, 5, 11, 3, 10, 8, 9, 2 (largest-numbered available vertex first)
• 7, 5, 11, 2, 3, 8, 9, 10
#include<iostream>
using namespace std;
class graph
{
private:
int n;
int data[20];
int gptr[20][20];
public:
void create();
void topological();
};
void graph::create()
{
int i, j;
cout<<"*******************************************************\n"
<<"This program sorts the given numbers in increasing order\n"
<<"\t\t using topological sorting\n"
<<"***********************************************************\n";
cout<<"Enter the no. of nodes in the directed unweighted graph ::";
cin>>n;
for(i=1;i<=n;i++)
{
cout<<"enter data for the node "<<i<<" ::";
cin>>data[i];
}
for(i=1;i<=n;i++)
for(j=1;j<=n;j++)
cin>>gptr[i][j];
}
void graph::topological()
{
int flag;
int i,j;
int poset[20],included[20];
for(i=1;i<=n;i++)
{
poset[i]=0;
included[i]=false;
}
int k=1;
flag=true;
int zeroindegree;
int c=1;
while(flag==1)
{
for(i=1;i<=n;i++)
{
if(!included[i])
{
zeroindegree=true;
for(j=1;j<=n;j++)
if(gptr[j][i]>0)
{
zeroindegree=false;
break;
}
if(zeroindegree)
{
included[i]=true;
poset[k]=data[i];
k=k+1;
for(j=1;j<=n;j++)
{
gptr[i][j]=-1;
gptr[j][i]=-1;
}
break;
}
}
}
if(i==n+1)
{
if(zeroindegree==false)
{
cout<<"Graph is not acyclic\n";
return;
}
else
{
// every vertex has already been placed, so the ordering is complete
flag=false;
}
}
}
cout<<"After topological sorting the numbers are :\n";
for(i=1;i<=n;i++)
cout<<poset[i]<<"\t";
cout<<endl<<endl;
}
int main()
{
graph obj;
obj.create();
obj.topological();
return 0;
}
Problems
Bicoloring
The four-color theorem states that every planar map can be colored using only four colors in such a
way that no region is colored using the same color as a neighbor. After being open for over 100
years, the theorem was proven in 1976 with the assistance of a computer.
Here you are asked to solve a simpler problem. Decide whether a given connected graph can be
bicolored, i.e., can the vertices be painted red and black such that no two adjacent vertices have the
same color.
To simplify the problem, you can assume the graph will be connected, undirected, and not contain
self-loops (i.e., edges from a vertex to itself).
Input
The input consists of several test cases. Each test case starts with a line containing the number of
vertices n, where 1 < n < 200. Each vertex is labeled by a number from 0 to n−1. The second line
contains the number of edges l. After this, l lines follow, each containing two vertex numbers
specifying an edge.
An input with n = 0 marks the end of the input and is not to be processed.
Output
Decide whether the input graph can be bicolored, and print the result as shown below.
Slash Maze
By filling a rectangle with slashes (/) and backslashes (\), you can generate nice little mazes. Here is
an example:
As you can see, paths in the maze cannot branch, so the whole maze contains only (1) cyclic paths
and (2) paths entering somewhere and leaving somewhere else. We are only interested in the cycles.
There are exactly two of them in our example.
Your task is to write a program that counts the cycles and finds the length of the longest one. The
length is defined as the number of small squares the cycle consists of (the ones bordered by gray
lines in the picture). In this example, the long cycle has length 16 and the short one length 4.
Input
The input contains several maze descriptions. Each description begins with one line containing two
integers w and h (1 ≤ w, h ≤ 75), representing the width and the height of the maze. The next h lines
describe the maze itself and contain w characters each; all of these characters will be either “/” or
“\”.
The input is terminated by a test case beginning with w = h = 0. This case should not be processed.
Output
For each maze, first output the line “Maze #n:”, where n is the number of the maze. Then, output
the line “k Cycles; the longest has length l.”, where k is the number of cycles in the maze and l the
length of the longest of the cycles. If the maze is acyclic, output “There are no cycles.”
Output a blank line after each test case.
Edit Step Ladders
An edit step is a transformation from one word x to another word y such that x and y are words in
the dictionary, and x can be transformed to y by adding, deleting, or changing one letter. The
transformations from dig to dog and from dog to do are both edit steps. An edit step ladder is a
lexicographically ordered sequence of words w_1, w_2, ..., w_n such that the transformation from w_i to
w_{i+1} is an edit step for all i from 1 to n − 1.
For a given dictionary, you are to compute the length of the longest edit step ladder.
Input
The input to your program consists of the dictionary: a set of lowercase words in lexicographic
order at one word per line. No word exceeds 16 letters and there are at most 25,000 words in the
dictionary.
Output
The output consists of a single integer, the number of words in the longest edit step ladder.
Sample Input
cat
dig
dog
fig
fin
fine
fog
log
wine
Sample Output
5
Hanoi Tower Troubles Again!
There are many interesting variations on the Tower of Hanoi problem. This version consists of N
pegs and one ball containing each number from 1, 2, 3, . . . , ∞. Whenever the sum of the numbers
on two balls is not a perfect square (i.e., c^2 for some integer c), they will repel each other with such
force that they can never touch each other.
The player must place balls on the pegs one by one, in order of increasing ball number (i.e., first
ball 1, then ball 2, then ball 3, ...). The game ends when there is no non-repelling move.
The goal is to place as many balls on the pegs as possible. The figure above gives a best possible
result for 4 pegs.
Input
The first line of the input contains a single integer T indicating the number of test cases (1 ≤ T ≤ 50).
Each test case contains a single integer N (1 ≤ N ≤ 50) indicating the number of pegs available.
Output
For each test case, print a line containing an integer indicating the maximum number of balls that
can be placed. Print “-1” if an infinite number of balls can be placed.