Advanced Data Structures Lecture 2
Introductory remarks
What is a graph? A data structure consisting of a finite set of nodes and edges between nodes. Edges can be directed (in directed graphs) or not (in undirected graphs).
What can be represented as a graph?
1. physical networks: electrical circuits, roadways, organic molecules, etc.
2. interactions in ecosystems, social relationships
3. databases
4. program flow of control
etc.
Graphs: Applications
1. Can we walk down all streets (or bridges) of a city without going down a street (or bridge) twice? Classical problem: the Seven Bridges of Königsberg. Approach: Euler circuits.
2. Can a circuit be implemented on a planar board? (Connections should not intersect.) Approach: planar graphs.
3. (For chemists) Distinguish molecules with the same formula but different structure.
Data Structures for Graphs
Types of graphs
Directed graph (digraph)
G = (V, E) where
  V = finite set of vertices (or nodes)
  E = set of arcs (or directed edges) between nodes.
Every arc is from a source (or tail) node to a destination (or head) node.
A digraph can be
  simple: there is at most one arc from one node to another. An arc from a node u to a node v is represented as $u \to v$ and drawn as an arrow from u to v. If $u \to v$ is an arc then we say v is adjacent to u.
  multiple: there can be more than one arc from one node to another. Distinct arcs with the same source and destination can be distinguished by labeling them. An arc with label e from u to v is represented as $u \xrightarrow{e} v$ and drawn as an arrow from u to v labeled e.
path = sequence of nodes $\pi = v_1, \ldots, v_n$ such that $v_1 \to v_2, \ldots, v_{n-1} \to v_n$ are arcs. The length of $\pi$ is $n - 1$. $\pi$ is a path from $v_1$ to $v_n$.
simple path = path with distinct nodes, except possibly the first and last.
simple cycle = simple path of length at least 1 that begins and ends at the same node.
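The definitions of path and path length can be checked mechanically. The sketch below is only an illustration: the names Arc, ArcSet, isPath and pathLength are hypothetical, and storing the arcs of a simple digraph in a std::set is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// A simple digraph stored as a set of arcs (u, v).
// This representation is an assumption made for the example only.
using Arc = std::pair<int, int>;
using ArcSet = std::set<Arc>;

// True if every pair of consecutive nodes in seq is joined by an arc,
// i.e. seq is a path in the digraph.
bool isPath(const ArcSet& arcs, const std::vector<int>& seq) {
    assert(!seq.empty());  // a path has at least one node
    for (std::size_t k = 0; k + 1 < seq.size(); ++k) {
        if (arcs.count(Arc(seq[k], seq[k + 1])) == 0) return false;
    }
    return true;
}

// Length of a path = number of arcs = number of nodes minus one.
std::size_t pathLength(const std::vector<int>& seq) {
    assert(!seq.empty());
    return seq.size() - 1;
}
```

For a digraph with arcs (1,2), (1,3), (2,4), (3,2), (4,3), the sequence 1, 2, 4 passes the check and has length 2, while 1, 4 is rejected because there is no arc from 1 to 4.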
Digraphs (continued.)
Example of simple digraph: communication network with one-way phone lines.
Example of multiple digraph: communication network with multiple one-way phone lines.
Example of simple digraph:
[Figure: simple digraph on nodes 1, 2, 3, 4]
1, 2, 4 is a path of length 2 from node 1 to node 4.
3, 2, 4, 3 is a cycle of length 3.
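Representations of digraphs
Adjacency matrix
G = (V, E) where V = {1, ..., n}
Adjacency matrix = matrix A of size $n \times n$ where
  A[i][j] = 1 if $i \to j \in E$, and 0 (or blank) otherwise.
For labeled digraph:
  A[i][j] = e if $i \xrightarrow{e} j \in E$, and blank otherwise.
For multiple labeled digraph: A[i][j] can be the list of labels of arcs from node i to node j.
[Figure: labeled digraph on nodes 1, 2, 3, 4 with arc labels a and b, and its adjacency matrix]
[Figure: simple digraph on nodes 1, 2, 3, 4 and its adjacency matrix]
Advantages: O(1) time to check if there is an arc $i \to j$.
Disadvantages: $\Omega(n^2)$ storage even if the digraph has $\ll n^2$ arcs.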
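A minimal sketch of the adjacency matrix representation for a simple digraph. The class name AdjacencyMatrix and its member functions are hypothetical; row and column 0 are left unused so that the code can keep the 1-based node numbering used in these notes.

```cpp
#include <cassert>
#include <vector>

// Adjacency matrix of a simple digraph on nodes 1..n.
// Index 0 is unused so that nodes keep their 1-based numbers.
class AdjacencyMatrix {
public:
    explicit AdjacencyMatrix(int n)
        : n_(n), a_(n + 1, std::vector<int>(n + 1, 0)) {}

    // Record the arc i -> j.
    void addArc(int i, int j) {
        assert(valid(i) && valid(j));
        a_[i][j] = 1;
    }

    // O(1) test for the arc i -> j: a single table lookup.
    bool hasArc(int i, int j) const {
        assert(valid(i) && valid(j));
        return a_[i][j] == 1;
    }

private:
    bool valid(int v) const { return 1 <= v && v <= n_; }

    int n_;
    std::vector<std::vector<int>> a_;  // storage grows as n^2, whatever |E| is
};
```

The constant-time lookup is paid for with quadratic storage: the table is allocated in full even when only a handful of arcs are added.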
Representations of digraphs
Adjacency list
G = (V, E) where V = {1, ..., n}
Adjacency list = array of pointers Head[1..n] where Head[i] = pointer to the list of nodes adjacent to i.
Alternative representation: two arrays of integers Head[1..n] and Adj[1..m] such that, for every $1 \le i \le n$: Adj[Head[i]], Adj[Head[i] + 1], ... contain the nodes adjacent to i, up to the point where we first encounter 0, which marks the end of the list of nodes adjacent to i.
Example: [Figure: digraph on nodes 1, 2, 3, 4 and its adjacency list]
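The two-array variant can be sketched as follows. The names CompactAdjList, buildCompact and neighbours are hypothetical, and building the arrays from a vector of arcs is an assumption made for the example; position 0 of both arrays is left unused to match the 1-based indices Head[1..n] and Adj[1..m].

```cpp
#include <cassert>
#include <utility>
#include <vector>

// head[i] is the position in adj where the 0-terminated list of nodes
// adjacent to i starts. Position 0 of both arrays is unused.
struct CompactAdjList {
    std::vector<int> head;  // head[1..n]
    std::vector<int> adj;   // adj[1..m], m = number of arcs + n terminators
};

// Build the two arrays from a list of arcs (first = source, second = target).
// The nested scan costs O(n * |E|); it is kept simple on purpose.
CompactAdjList buildCompact(int n, const std::vector<std::pair<int, int>>& arcs) {
    CompactAdjList g;
    g.head.assign(n + 1, 0);
    g.adj.assign(1, 0);  // placeholder for the unused position 0
    for (int i = 1; i <= n; ++i) {
        g.head[i] = static_cast<int>(g.adj.size());
        for (const auto& arc : arcs) {
            assert(1 <= arc.second && arc.second <= n);  // 0 is reserved as marker
            if (arc.first == i) g.adj.push_back(arc.second);
        }
        g.adj.push_back(0);  // end-of-list marker for node i
    }
    return g;
}

// Collect the nodes adjacent to i by scanning until the 0 marker.
std::vector<int> neighbours(const CompactAdjList& g, int i) {
    std::vector<int> out;
    for (int p = g.head[i]; g.adj[p] != 0; ++p) out.push_back(g.adj[p]);
    return out;
}
```

Compared with an array of pointers to linked lists, this layout keeps all neighbours in one contiguous array, at the cost of making later insertions of arcs awkward.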
Edge list
If G = (V, E) is a simple digraph with a small number of arcs, then G can be represented by the list of pairs (v, w) such that $v \to w$ is an arc in E. V can be retrieved by traversing the list of edges (provided every node is the endpoint of at least one arc).
Example: [Figure: digraph on nodes 1, 2, 3, 4]
EdgeList = {(1, 2), (1, 3), (2, 4), (3, 2), (4, 3)}.
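A small sketch of the edge list representation, using the EdgeList above as test data. The helper names hasArc, addArc and nodesOf are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Each pair (v, w) stands for the arc v -> w.
using EdgeList = std::vector<std::pair<int, int>>;

// Linear scan: testing for the arc v -> w costs O(|E|) time.
bool hasArc(const EdgeList& edges, int v, int w) {
    for (const auto& e : edges) {
        if (e.first == v && e.second == w) return true;
    }
    return false;
}

// Append the arc v -> w; a simple digraph holds each arc at most once.
void addArc(EdgeList& edges, int v, int w) {
    assert(!hasArc(edges, v, w));
    edges.emplace_back(v, w);
}

// Recover V by collecting both endpoints of every arc.
// Nodes that are not an endpoint of any arc cannot be recovered this way.
std::set<int> nodesOf(const EdgeList& edges) {
    std::set<int> nodes;
    for (const auto& e : edges) {
        nodes.insert(e.first);
        nodes.insert(e.second);
    }
    return nodes;
}
```

Storage is proportional to the number of arcs, but the arc test is no longer O(1) as it is with an adjacency matrix.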
Connectivity in digraphs
Assumption: G = (V, E) is a digraph with V = {1, 2, ..., n}
Given: two nodes $i, j \in V$
Determine: whether there exists a path from i to j.
The transitive closure of G is the graph $G^* = (V, E^*)$ such that $(v, v') \in E^*$ if and only if there exists a path from v to v' in G.
Let A and $A^*$ be the adjacency matrices of G and $G^*$. A and $A^*$ are matrices of size $n \times n$ whose elements are 0 and 1. Such matrices are called Boolean matrices (0 = false, 1 = true).
There is a path from i to j if and only if $A^*[i][j] = 1$ $\Rightarrow$ it is desirable to compute $A^*$.
How can we compute $A^*$ if we know A?
Transitive Closure
How to compute $A^*$?
If A, B are Boolean matrices of size $n \times n$, we can define:
  $A \oplus B = C$ if $C[i][j] = A[i][j] \oplus B[i][j] = \max(A[i][j], B[i][j])$ ($\oplus$ is Boolean addition)
  $A \odot B = C$ if $C[i][j] = (A[i][1] \odot B[1][j]) \oplus \ldots \oplus (A[i][n] \odot B[n][j])$ ($\odot$ is the Boolean product defined by $b_1 \odot b_2 := \min(b_1, b_2)$)
There exists a path of length p from i to j iff the element at position (i, j) of the matrix $A^p = \underbrace{A \odot \ldots \odot A}_{p \text{ times}}$ is 1, where A is the adjacency matrix of G.
There exists a path of length $\le p$ from i to j iff the element at position (i, j) of the matrix $Id \oplus A \oplus \ldots \oplus A^p$ is 1, where A is the adjacency matrix of G, and Id is the identity matrix.
If there is a path from i to j in G then there is a path of length $\le n - 1$ from i to j.
The transitive closure of adjacency matrix A is the matrix $A^*$ where $A^*[i][j] = 1$ iff there exists a path from i to j:
  $A^* = Id \oplus A \oplus \ldots \oplus A^n$   (Why?)
The Id term accounts for the paths of length 0 from each node to itself; the remaining terms $A \oplus \ldots \oplus A^n$ mark exactly the pairs (i, j) joined by a path of length $\ge 1$.
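The Boolean sum, the Boolean product and the closure formula with Id and the powers of A up to the n-th can be sketched directly. The names BoolMatrix, boolSum, boolProduct and closureByPowers are hypothetical, and the matrices are 0-based here, so node i of the notes is stored at index i - 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using BoolMatrix = std::vector<std::vector<int>>;  // square, entries 0 or 1

// Boolean sum: C[i][j] = max(A[i][j], B[i][j]).
BoolMatrix boolSum(const BoolMatrix& a, const BoolMatrix& b) {
    assert(a.size() == b.size());
    BoolMatrix c = a;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < a.size(); ++j)
            if (b[i][j] == 1) c[i][j] = 1;
    return c;
}

// Boolean product: C[i][j] = 1 iff A[i][k] = B[k][j] = 1 for some k.
BoolMatrix boolProduct(const BoolMatrix& a, const BoolMatrix& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    BoolMatrix c(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                if (a[i][k] == 1 && b[k][j] == 1) {
                    c[i][j] = 1;
                    break;
                }
    return c;
}

// Id + A + A^2 + ... + A^n with Boolean operations.
// n products of cost O(n^3) each give O(n^4) time overall.
BoolMatrix closureByPowers(const BoolMatrix& a) {
    const std::size_t n = a.size();
    BoolMatrix result(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i) result[i][i] = 1;  // Id
    BoolMatrix power = result;                              // A^0 = Id
    for (std::size_t p = 1; p <= n; ++p) {
        power = boolProduct(power, a);    // power is now A^p
        result = boolSum(result, power);  // accumulate the partial sum
    }
    return result;
}
```

The loop performs n Boolean products, which is where the fourth power of n in the running time comes from.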
Transitive Closure
Warshall's algorithm
The previous approach to compute $A^*$ has time complexity $O(n^4)$. Too expensive!
Warshall discovered a much better approach:
For all $i, j \in V = \{1, 2, \ldots, n\}$ consider all paths of length $\ge 1$ from i to j whose intermediate nodes are from $\{1, \ldots, k\}$. Let $C[i][j]^{(k)}$ be 1 if there exists such a path, and 0 otherwise.
Warshall observed that $C[i][j]^{(k)}$ can be computed by recursion on k:
$$C[i][j]^{(k)} = \begin{cases} A[i][j] & \text{if } k = 0, \\ C[i][j]^{(k-1)} \oplus \left( C[i][k]^{(k-1)} \odot C[k][j]^{(k-1)} \right) & \text{if } k \ge 1. \end{cases}$$
(Why?)
procedure Warshall
input:  int A[1..n][1..n]   // an n x n Boolean matrix
output: int C[1..n][1..n]   // the transitive closure of A
  for (int i = 1; i <= n; i++)
    for (int j = 1; j <= n; j++)
      C[i][j] = A[i][j];
  for (int k = 1; k <= n; k++)
    for (int i = 1; i <= n; i++)
      for (int j = 1; j <= n; j++)
        if (C[i][j] == 0)
          C[i][j] = min(C[i][k], C[k][j]);
Time complexity: $O(n^3)$. (Why?)
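A runnable sketch of Warshall's procedure in C++. The names BoolMatrix and warshall are hypothetical, and the matrix is 0-based (node i of the notes is stored at index i - 1). Since the computation starts from A itself, the result marks the pairs joined by a path of length at least 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using BoolMatrix = std::vector<std::vector<int>>;  // square, entries 0 or 1

// Warshall's algorithm. After the iteration for a given k, c[i][j] == 1 iff
// there is a path of length >= 1 from i to j whose intermediate nodes all
// have index <= k.
BoolMatrix warshall(const BoolMatrix& a) {
    const std::size_t n = a.size();
    for (const auto& row : a) assert(row.size() == n);  // matrix must be square
    BoolMatrix c = a;
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                if (c[i][j] == 0 && c[i][k] == 1 && c[k][j] == 1)
                    c[i][j] = 1;  // go from i to k, then from k to j
    return c;
}
```

Three nested loops over n values each give the cubic running time. Updating the matrix in place is safe because row k and column k do not change during iteration k.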
Breadth-first traversal
[Figure: breadth-first traversal, starting from node s, of a graph with nodes r, s, t, u, v, w, x, y; the nodes are discovered level by level (Level 1, Level 2, Level 3)]
Properties of the breadth-first tree:
1. It contains all vertices reachable from s.
2. The path from s to v in the tree corresponds to a shortest path from s to v in the graph.
Q: Why is it named "breadth-first tree"?
A: It expands the frontier between discovered and undiscovered nodes uniformly across the breadth of the frontier.
The breadth-first search distinguishes 2 kinds of nodes:
1. undiscovered (white)
2. discovered (black)
[Figure: step-by-step construction of the breadth-first tree rooted at s, showing the queue of leaves after each step (for example leaves=[t,x], later leaves=[y], finally leaves=[])]
Nodes in the order of first visit: s, r, w, v, t, x, u, y
Nodes in the order of last visit: s, r, v, w, t, u, x, y
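// headers needed by the code in this lecture
#include <iostream>
#include <list>
#include <map>
#include <queue>
#include <stack>
#include <string>
#include <vector>
using namespace std;

// useful type declarations
typedef string node;
typedef list<node> adjList;
typedef map<node,int> rootDistance;
typedef map<node,node> pType;
typedef map<node,adjList> graphType;

// auxiliary method -- compute the vector of nodes in the graph
// (the caller owns the returned vector and must delete it)
vector<node> *getNodes(graphType &graph) {
  vector<node> *V = new vector<node>();
  for (graphType::iterator it = graph.begin(); it != graph.end(); it++) {
    V->push_back(it->first);
  }
  return V;
}

// node colors
enum Color {WHITE, BLACK};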
void computeBFTables(graphType &graph, node s, map<node,int> &distance,
                     pType &parent, map<node,Color> &nodeColor) {
  queue<node> Q;
  Q.push(s);
  while (Q.size() != 0) {
    node u = Q.front();
    Q.pop();
    adjList children = graph[u];  // look up the nodes reachable from node u
    for (adjList::iterator it = children.begin(); it != children.end(); it++) {
      node v = *it;
      if (nodeColor[v] == WHITE) {
        // mark node v as newly discovered
        nodeColor[v] = BLACK;
        // record the parent and distance of v from the root
        distance[v] = distance[u] + 1;
        parent[v] = u;
        // push v in the queue of leaf nodes
        Q.push(v);
      }
    }
  }
}
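For experimenting without the surrounding initialization code, the same queue-based loop can be packed into one self-contained function. This is only a sketch under stated assumptions: the name bfsDistances is hypothetical, and a node counts as discovered (black) as soon as it has an entry in the returned map, which replaces the explicit color table.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <queue>
#include <string>

typedef std::string node;
typedef std::list<node> adjList;
typedef std::map<node, adjList> graphType;

// Breadth-first distances from s. Nodes that are not reachable from s
// get no entry in the result.
std::map<node, int> bfsDistances(const graphType& graph, const node& s) {
    assert(graph.count(s) == 1);  // s must be a node of the graph
    std::map<node, int> distance;
    std::queue<node> frontier;    // discovered nodes waiting to be expanded
    distance[s] = 0;
    frontier.push(s);
    while (!frontier.empty()) {
        node u = frontier.front();
        frontier.pop();
        graphType::const_iterator it = graph.find(u);
        if (it == graph.end()) continue;  // u has no adjacency list
        for (const node& v : it->second) {
            if (distance.count(v) == 0) {  // v is still undiscovered
                distance[v] = distance[u] + 1;
                frontier.push(v);
            }
        }
    }
    return distance;
}
```

Because nodes leave the queue in order of increasing distance, the first time a node is discovered already gives its shortest distance from s.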
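// forward declaration: printPath calls printPathAux before its definition
void printPathAux(graphType &G, node s, node v, pType &parent);

void printPath(graphType &G, node s, node v) {
  map<node,int> distance;
  pType parent;
  // BFS computes the distance and parent tables (same interface as DFS)
  BFS(G, s, distance, parent);
  printPathAux(G, s, v, parent);
  cout << endl;
}

void printPathAux(graphType &G, node s, node v, pType &parent) {
  if (v == s) {
    cout << s;
  } else if (parent[v] == "") {
    cout << "no path from " << s << " to " << v << " exists.";
  } else {
    printPathAux(G, s, parent[v], parent);
    cout << " " << v;
  }
}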
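When the path is needed as data rather than as printed output, the parent table can be followed iteratively. A minimal sketch, assuming the parent table maps each discovered node other than s to its parent in the search tree; the name pathTo is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

typedef std::string node;
typedef std::map<node, node> pType;

// Follow the parent links from v back to s and return the path s, ..., v.
// Returns an empty vector when the chain of parents never reaches s.
std::vector<node> pathTo(const pType& parent, const node& s, const node& v) {
    std::vector<node> path;
    node cur = v;
    while (cur != s) {
        pType::const_iterator it = parent.find(cur);
        if (it == parent.end()) return std::vector<node>();  // v was not reached
        assert(path.size() < parent.size());  // parent links of a tree never cycle
        path.push_back(cur);
        cur = it->second;
    }
    path.push_back(s);
    std::reverse(path.begin(), path.end());  // collected from v back to s
    return path;
}
```

The loop version avoids the recursion depth of printPathAux on long paths and leaves the formatting of the result to the caller.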
Depth-first traversal
Solves the same problem as breadth-first traversal, but in a different way:
Given a graph G = (V, E) and a node $s \in V$
Find all nodes reachable by a path from s, by searching "deeper" in the graph whenever possible.
Edges are explored out of the most recently discovered vertex v that still has unexplored edges leaving it.
When all of v's edges have been explored, the algorithm "backtracks" to its parent node, to explore other leaving edges.
[Figure: depth-first traversal, starting from node s, of the graph with nodes r, s, t, u, v, w, x, y]
Nodes in the order of first visit: s, r, v, w, t, u, x, y
Nodes in the order of last visit: v, r, u, y, x, t, w, s
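Both visiting orders can be recorded by a short recursive traversal. This is a sketch under assumptions: "first visit" is taken to mean the moment a node is discovered and "last visit" the moment its exploration finishes; the names Graph, VisitOrders, visit and depthFirstOrders are hypothetical, and nodes are integers 1..n with index 0 unused.

```cpp
#include <cassert>
#include <vector>

// adj[u] lists the nodes adjacent to u; index 0 is unused.
using Graph = std::vector<std::vector<int>>;

struct VisitOrders {
    std::vector<int> first;  // nodes in the order they are discovered
    std::vector<int> last;   // nodes in the order their exploration finishes
};

// Explore u: record it on discovery, go deeper, record it again on the way back.
void visit(const Graph& adj, int u, std::vector<bool>& discovered, VisitOrders& out) {
    discovered[u] = true;
    out.first.push_back(u);
    for (int v : adj[u]) {
        if (!discovered[v]) visit(adj, v, discovered, out);
    }
    out.last.push_back(u);  // all arcs leaving u have been explored
}

VisitOrders depthFirstOrders(const Graph& adj, int s) {
    assert(s >= 1 && s < static_cast<int>(adj.size()));
    std::vector<bool> discovered(adj.size(), false);
    VisitOrders out;
    visit(adj, s, discovered, out);
    return out;
}
```

On the digraph with arcs (1,2), (1,3), (2,4), (3,2), (4,3) and start node 1, the traversal discovers 1, 2, 4, 3 and finishes the nodes in the order 3, 4, 2, 1.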
(2) Expand the most recently generated leaf of the depth-first tree; if it cannot be expanded, remove it, and try alternative expansions of the parent node. Therefore, keep the leaves of the tree in a stack (it can be a stack<node> container in C++).
(3) Other desirable features: same as for breadth-first search
  Compute the depths of nodes in the depth-first search tree.
  Compute the parent of each node in the depth-first search tree.
Same data structures and initialization step as for breadth-first traversal.
// Depth-first search
// DFS(graph,s,distance,parent) takes inputs graph and s
// and instantiates distance and parent such that:
// - distance[v] returns the depth of v in the depth-first tree rooted at s
// - parent[v] returns the parent of node v.
void DFS(map<node, adjList> &graph, node s, rootDistance &distance, pType &parent) {
  vector<node> *V = getNodes(graph);
  map<node,Color> nodeColor;
  initTables(*V, s, distance, parent, nodeColor);
  computeDFTables(graph, s, distance, parent, nodeColor);
  delete V;  // getNodes allocates the vector with new
}
// iterative version: the stack holds the current branch of the depth-first tree
void computeDFTables(map<node, adjList> &graph, node s, map<node,int> &distance,
                     pType &parent, map<node,Color> &nodeColor) {
  stack<node> S;
  S.push(s);
  while (S.size() != 0) {
    node u = S.top();  // std::stack provides top(), not peek()
    adjList children = graph[u];
    bool found = false;
    for (adjList::iterator it = children.begin(); it != children.end(); it++) {
      node v = *it;
      if (nodeColor[v] == WHITE) {
        nodeColor[v] = BLACK;
        distance[v] = distance[u] + 1;
        parent[v] = u;
        S.push(v);
        found = true;
        break;
      }
    }
    if (!found) S.pop();  // u cannot be expanded any further: backtrack
  }
}
// recursive version: the call stack plays the role of the explicit stack
void computeDFTables(map<node, adjList> &graph, node s, map<node,int> &distance,
                     pType &parent, map<node,Color> &nodeColor) {
  nodeColor[s] = BLACK;
  adjList children = graph[s];
  for (adjList::iterator it = children.begin(); it != children.end(); it++) {
    node v = *it;
    if (nodeColor[v] == WHITE) {
      parent[v] = s;
      distance[v] = distance[s] + 1;
      computeDFTables(graph, v, distance, parent, nodeColor);
    }
  }
}
The implementation differs from breadth-first search by using a stack container for the nodes that are waiting to be expanded, instead of a queue.
The behavior is different:
  In general, the length of a branch from the root of the tree to a node in the depth-first tree is different from the length of the shortest path from the root to that node.
  It usually consumes less memory: the stack holds only the nodes on the current branch.
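The difference between depth in the depth-first tree and shortest-path distance shows up already on a four-node digraph. The sketch below runs a queue-based and a stack-based traversal side by side; the names Graph, bfsDepth and dfsDepth are hypothetical, nodes are integers 1..n with index 0 unused, and -1 marks nodes that were not reached.

```cpp
#include <cassert>
#include <queue>
#include <stack>
#include <vector>

// adj[u] lists the nodes adjacent to u; index 0 is unused.
using Graph = std::vector<std::vector<int>>;

// Depth of every node in the breadth-first tree rooted at s (-1 if unreached).
std::vector<int> bfsDepth(const Graph& adj, int s) {
    assert(s >= 1 && s < static_cast<int>(adj.size()));
    std::vector<int> depth(adj.size(), -1);
    std::queue<int> q;
    depth[s] = 0;
    q.push(s);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int v : adj[u]) {
            if (depth[v] == -1) {
                depth[v] = depth[u] + 1;
                q.push(v);
            }
        }
    }
    return depth;
}

// Depth of every node in the depth-first tree rooted at s (-1 if unreached).
std::vector<int> dfsDepth(const Graph& adj, int s) {
    assert(s >= 1 && s < static_cast<int>(adj.size()));
    std::vector<int> depth(adj.size(), -1);
    std::stack<int> st;  // holds the current branch of the tree
    depth[s] = 0;
    st.push(s);
    while (!st.empty()) {
        int u = st.top();
        bool found = false;
        for (int v : adj[u]) {
            if (depth[v] == -1) {  // expand the first undiscovered neighbour
                depth[v] = depth[u] + 1;
                st.push(v);
                found = true;
                break;
            }
        }
        if (!found) st.pop();  // nothing left to expand: backtrack
    }
    return depth;
}
```

With arcs (1,2), (1,3), (2,4), (3,2), (4,3) and root 1, node 3 sits at depth 1 in the breadth-first tree but at depth 3 in the depth-first tree, because the depth-first traversal reaches it through the branch 1, 2, 4, 3.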