Graph Theory Penn State
Notes
Version 1.4.3
Christopher Griffin
© 2011-2017
Licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License
With Contributions By:
Suraj Shekhar
Contents
List of Figures
Using These Notes
Chapter 1. Preface and Introduction to Graph Theory
1. Some History of Graph Theory and Its Branches
2. A Little Note on Network Science
Chapter 2. Some Definitions and Theorems
1. Graphs, Multi-Graphs, Simple Graphs
2. Directed Graphs
3. Elementary Graph Properties: Degrees and Degree Sequences
4. Subgraphs
5. Graph Complement, Cliques and Independent Sets
Chapter 3. More Definitions and Theorems
1. Paths, Walks, and Cycles
2. More Graph Properties: Diameter, Radius, Circumference, Girth
3. More on Trails and Cycles
4. Graph Components
5. Introduction to Centrality
6. Bipartite Graphs
7. Acyclic Graphs and Trees
Chapter 4. Some Algebraic Graph Theory
1. Isomorphism and Automorphism
2. Fields and Matrices
3. Special Matrices and Vectors
4. Matrix Representations of Graphs
5. Determinants, Eigenvalues and Eigenvectors
6. Properties of the Eigenvalues of the Adjacency Matrix
Chapter 5. Applications of Algebraic Graph Theory: Eigenvector Centrality and Page-Rank
1. Basis of ℝⁿ
2. Eigenvector Centrality
3. Markov Chains and Random Walks
4. Page Rank
Chapter 6. Trees, Algorithms and Matroids
1. Two Tree Search Algorithms
2. Prim’s Spanning Tree Algorithm
3. Computational Complexity of Prim’s Algorithm
4. Kruskal’s Algorithm
5. Shortest Path Problem in a Positively Weighted Graph
6. Greedy Algorithms and Matroids
Chapter 7. A Brief Introduction to Linear Programming
1. Linear Programming: Notation
2. Intuitive Solutions of Linear Programming Problems
3. Some Basic Facts about Linear Programming Problems
4. Solving Linear Programming Problems with a Computer
5. Karush-Kuhn-Tucker (KKT) Conditions
6. Duality
Chapter 8. An Introduction to Network Flows and Combinatorial Optimization
1. The Maximum Flow Problem
2. The Dual of the Flow Maximization Problem
3. The Max-Flow / Min-Cut Theorem
4. An Algorithm for Finding Optimal Flow
5. Applications of the Max Flow / Min Cut Theorem
6. More Applications of the Max Flow / Min Cut Theorem
Chapter 9. A Short Introduction to Random Graphs
1. Bernoulli Random Graphs
2. First Order Graph Language and 0-1 Properties
3. Erdős-Rényi Random Graphs
Chapter 10. Coloring
1. Vertex Coloring of Graphs
2. Some Elementary Logic
3. NP-Completeness of k-Coloring
4. Graph Sizes and k-Colorability
Chapter 11. Some More Algebraic Graph Theory
1. Vector Spaces and Linear Transformation
2. Linear Span and Basis
3. Vector Spaces of a Graph
4. Cycle Space
5. Cut Space
6. The Relation of Cycle Space to Cut Space
Bibliography
List of Figures
6.1 The breadth first walk of a tree explores the tree in an ever widening pattern.
6.2 The depth first walk of a tree explores the tree in an ever deepening pattern.
6.3 The construction of a breadth first spanning tree is a straightforward way to construct a spanning tree of a graph or check to see if it is connected.
6.4 The construction of a depth first spanning tree is a straightforward way to construct a spanning tree of a graph or check to see if it is connected. Unlike the breadth first approach, this method can be implemented with a recursive function call. Notice this algorithm yields a different spanning tree from the BFS.
6.5 A weighted graph is simply a graph with a real number (the weight) assigned to each edge.
6.6 In the minimum spanning tree problem, we attempt to find a spanning subgraph of a graph G that is a tree and has minimal weight (among all spanning trees).
6.7 Prim’s algorithm constructs a minimum spanning tree by successively adding edges to an acyclic subgraph until every vertex is inside the spanning tree. Edges with minimal weight are added at each iteration.
6.8 When we remove an edge (e′) from a spanning tree we disconnect the tree into two components. By adding a new edge (e) that connects vertices in these two distinct components, we reconnect the tree and it is still a spanning tree.
6.9 Kruskal’s algorithm constructs a minimum spanning tree by successively adding edges and maintaining an acyclic disconnected subgraph containing every vertex until that subgraph contains n − 1 edges, at which point we are sure it is a tree. Edges with minimal weight are added at each iteration.
6.10 Dijkstra’s Algorithm iteratively builds a tree of shortest paths from a given vertex v0 in a graph. Dijkstra’s algorithm can correct itself, as we see from Iteration 2 and Iteration 3.
7.1 Feasible Region and Level Curves of the Objective Function: The shaded region in the plot is the feasible region and represents the intersection of the five inequalities constraining the values of x1 and x2. On the right, we see the optimal solution is the “last” point in the feasible region that intersects a level set as we move in the direction of increasing profit.
7.2 An example of infinitely many alternative optimal solutions in a linear programming problem. The level curves for z(x1, x2) = 18x1 + 6x2 are parallel to one face of the polygon boundary of the feasible region. Moreover, this side contains the points of greatest value for z(x1, x2) inside the feasible region. Any combination of (x1, x2) on the line 3x1 + x2 = 120 for x1 ∈ [16, 35] will provide the largest possible value z(x1, x2) can take in the feasible region S.
7.3 Matlab input for solving the diet problem. Note that we are solving a minimization problem. Matlab assumes all problems are minimization problems, so we don’t need to multiply the objective by −1 like we would if we started with a maximization problem.
7.4 The Gradient Cone: At optimality, the cost vector c is obtuse with respect to the directions formed by the binding constraints. It is also contained inside the cone of the gradients of the binding constraints, which we will discuss at length later.
7.5 In this problem, it costs a certain amount to ship a commodity along each edge and each edge has a capacity. The objective is to find an allocation of capacity to each edge so that the total cost of shipping three units of this commodity from Vertex 1 to Vertex 4 is minimized.
8.1 A cut is defined as follows: in each directed path from v1 to vm, we choose an edge at capacity so that the collection of chosen edges has minimum capacity (and flow). If this set of edges is not an edge cut of the underlying graph, we add edges that are directed from vm to v1 in a simple path from v1 to vm in the underlying graph of G.
8.2 Two flows with augmenting paths and one with no augmenting paths are illustrated.
8.3 The result of augmenting the flows shown in Figure 8.2.
8.4 The Edmonds-Karp algorithm iteratively augments flow on a graph until no augmenting paths can be found. An initial zero-feasible flow is used to start the algorithm. Notice that the capacity of the minimum cut is equal to the total flow leaving Vertex 1 and flowing to Vertex 4.
8.5 Illustration of the impact of an augmenting path on the flow from v1 to vm.
8.6 Games to be played flow from an initial vertex s (playing the role of v1). From here, they flow into the actual game events illustrated by vertices (e.g., NY-BOS for New York vs. Boston). Wins and losses occur and these wins flow across the infinite capacity edges to team vertices. From here, the games all flow to the final vertex t (playing the role of vm).
8.7 Optimal flow was computed using the Edmonds-Karp algorithm. Notice a minimum capacity cut consists of the edges entering t and not all edges leaving s are saturated. Detroit cannot make the playoffs.
8.8 A maximal matching and a perfect matching. Note no other edges can be added to the maximal matching and the graph on the left cannot have a perfect matching.
8.9 In general, the cardinality of a maximal matching is not the same as the cardinality of a minimal vertex covering, though the inequality that the cardinality of the maximal matching is at most the cardinality of the minimal covering does hold.
9.1 Three random graphs in the same random graph family G(10, 1/2). The first two graphs, which have 21 edges, have probability 0.5^{21} × 0.5^{24}. The third graph, which has 24 edges, has probability 0.5^{24} × 0.5^{21}.
9.2 A path graph with 4 vertices has exactly 4!/2 = 12 isomorphic graphs obtained by rearranging the order in which the vertices are connected.
9.3 There are 4 graphs in the isomorphism class of S3, one for each possible center of the star.
9.4 The 4 isomorphism types in the random graph family G(5, 3). We show that there are 60 graphs isomorphic to this first graph (a) inside G(5, 3), 20 graphs isomorphic to the second graph (b) inside G(5, 3), 10 graphs isomorphic to the third graph (c) inside G(5, 3) and 30 graphs isomorphic to the fourth graph (d) inside G(5, 3).
10.1 A graph coloring. We need three colors to color this graph.
10.2 At the first step of constructing G, we add three vertices {T, F, B} that form a complete subgraph.
10.3 At the second step of constructing G, we add two vertices vi and vi′ to G and an edge {vi, vi′}.
10.4 At the third step of constructing G, we add a “gadget” that is built specifically for term φj.
10.5 When φj evaluates to false, the graph G is not 3-colorable as illustrated in subfigure (a). When φj evaluates to true, the resulting graph is colorable. By the label TFT, we mean v(xj1) = v(xj3) = TRUE and v(xj2) = FALSE.
11.1 The cycle space of a graph can be thought of as all the cycles contained in that graph along with the subgraphs consisting of cycles that only share vertices but no edges. This is illustrated in this figure.
11.2 A fundamental cycle of a graph G (with respect to a spanning forest F) is a cycle created from adding an edge from the original edge set of G (not in F) to F.
11.3 The cut space of a graph can be thought of as all the minimal cuts contained in that graph along with the subgraphs consisting of minimal cuts that only share vertices but no edges. This is illustrated in this figure.
11.4 A fundamental edge cut of a graph G (with respect to a spanning forest F) is a partition cut created from partitioning the vertices based on a cut in a spanning tree and then constructing the resulting partition cut.
Using These Notes
Stop! This is a set of lecture notes. It is not a book. Go away and come back when you
have a real textbook on Graph Theory. Okay, do you have a book? Alright, let’s move on
then. This is a set of lecture notes for Math 485–Penn State’s undergraduate Graph Theory
course. Since I use these notes while I teach, there may be typographical errors that I noticed
in class, but did not fix in the notes. If you see a typo, send me an e-mail and I’ll add an
acknowledgement. There may be many typos; that’s why you should have a real textbook.
The lecture notes are loosely based on Gross and Yellen’s Graph Theory and Its Applications, Bollobás’ Graph Theory, Diestel’s Graph Theory, Wolsey and Nemhauser’s Integer and Combinatorial Optimization, Korte and Vygen’s Combinatorial Optimization and several other books that are cited in these notes. All of the books mentioned are good books (some great). The problem is, they are either too complex for an introductory undergraduate course, have odd notation, do not focus enough on applications or focus too much on applications.
This set of notes corrects some of the problems I mention by presenting the material in a format that can be used easily in an undergraduate mathematics class. Many of the proofs in this set of notes are adapted from the textbooks with some minor additions. One thing that is included in these notes is a treatment of graph duality theorems from the perspective of linear optimization. This is not covered in most graph theory books, while graph theoretic principles are not covered in many linear or combinatorial optimization books. I should note, Bondy and Murty discuss Linear Programming in their book Graph Theory, but it is clear they are not experts in optimization and their treatment is somewhat non sequitur, which is a shame. The best book on the topic of combinatorial optimization is by far Korte and Vygen’s, who do cover linear programming in their latest edition. Note: Penn State has an expert in graph coloring problems, so there is no section on coloring in these notes, because I invited a guest lecturer who was the expert. I may add a section on graph coloring eventually.
In order to use these notes successfully, you should have taken a course in combinatorial
proof (Math 311W at Penn State) and ideally matrix algebra (Math 220 at Penn State),
though courses in Linear Programming (Math 484 at Penn State) wouldn’t hurt. I review
a substantial amount of the material you will need, but it’s always good to have covered
prerequisites before you get to a class. That being said, I hope you enjoy using these notes!
CHAPTER 1
CHAPTER 2
[Figure: a graph drawn on Vertices 1, 2, 3, 4]
are constructed by representing each vertex as a point (or square, circle, triangle etc.) and
each edge as a line connecting the vertex representations that make up the edge. That is, let
v1 , v2 ∈ V . Then there is a line connecting the points for v1 and v2 if and only if {v1 , v2 } ∈ E.
In this example, the neighborhood of Vertex 1 is Vertices 2 and 4 and Vertex 1 is adjacent
to these vertices.
Definition 2.9 (Degree). Let G = (V, E) be a graph and let v ∈ V. The degree of v, written deg(v), is the number of non-self-loop edges adjacent to v plus two times the number of self-loops defined at v. More formally:
deg(v) = |{e ∈ E : ∃u ∈ V, u ≠ v, e = {u, v}}| + 2 |{e ∈ E : e = {v}}|
Here if S is a set, then |S| is the cardinality of that set.
Remark 2.10. Note that each vertex in the graph in Figure 2.1 has degree 2.
Example 2.11. If we replace the edge set in Example 2.8 with:
E = {{1, 2}, {2, 3}, {3, 4}, {4, 1}, {1}}
then the visual representation of the graph includes a loop that starts and ends at Vertex 1.
This is illustrated in Figure 2.2. In this example the degree of Vertex 1 is now 4. We obtain
this by counting the number of non-self-loop edges adjacent to Vertex 1 (there are 2) and
adding two times the number of self-loops at Vertex 1 (there is 1) to obtain 2 + 2 × 1 = 4.
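A minimal Python sketch of Definition 2.9 that reproduces the count in Example 2.11. It assumes a representation chosen here only for illustration: each edge is a `frozenset`, with two-element sets for ordinary edges and one-element sets for self-loops, and the edge collection is a list.

```python
def degree(v, edges):
    """Degree of v per Definition 2.9: each non-self-loop edge at v counts
    once and each self-loop at v counts twice."""
    ordinary = sum(1 for e in edges if v in e and len(e) == 2)
    loops = sum(1 for e in edges if e == frozenset({v}))
    return ordinary + 2 * loops


# Edge collection of Example 2.11: the 4-cycle of Example 2.8 plus a
# self-loop at Vertex 1. A list (not a set) so repeated edges would survive.
E = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4}),
     frozenset({4, 1}), frozenset({1})]

for v in [1, 2, 3, 4]:
    print(v, degree(v, E))
```

Vertex 1 gets 2 + 2 × 1 = 4 and every other vertex gets 2, matching the example.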
Example 2.12. The city of Königsberg exists as a collection of islands connected by bridges as shown in Figure 2.3. The problem Euler wanted to analyze was: Is it possible to go from island to island traversing each bridge only once? This assumes there is no trickery, such as using a boat. Euler analyzed the problem by simplifying the representation to a graph. Assume that we treat each island as a vertex and each bridge as an edge. The resulting graph is illustrated in Figure 2.4.
Figure 2.2. A self-loop is an edge in a graph G that contains exactly one vertex. That is, an edge that is a one element subset of the vertex set. Self-loops are illustrated by loops at the vertex in question.
Figure 2.3. The city of Königsberg is built on a river and consists of four islands, which can be reached by means of seven bridges. The question Euler was interested in answering is: Is it possible to go from island to island traversing each bridge only once? (Picture courtesy of Wikipedia and Wikimedia Commons: http://en.wikipedia.org/wiki/File:Konigsberg_bridges.png)
Figure 2.4. Representing each island as a dot and each bridge as a line or curve connecting the dots simplifies the visual representation of the seven Königsberg Bridges.
Note this representation dramatically simplifies the analysis of the problem insofar as we can now focus only on the structural properties of this graph. It’s easy to see (from Figure 2.4) that each vertex has an odd degree. More importantly, since we are trying to traverse islands without ever recrossing the same bridge (edge), when we enter an island (say C) we will use one of its three edges. Unless this is our final destination, we must use another edge to leave C. Since the third bridge touching C must also be crossed, we must eventually use it to return to C a final time, after which no unused bridge remains to leave by. Alternatively, we could start at Island C, return once, leave again and never come back. Put simply, our trip around the bridges of Königsberg had better start or end at Island C. But Islands (vertices) B and D also have this property. We can’t start and end our travels over the bridges on Islands C, B and D simultaneously; therefore, no walk around the islands in which we cross each bridge precisely once is possible.
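The parity argument above can be checked mechanically. The following Python sketch takes the edge collection of Example 2.15 (listed later in this section) as its input, computes each island’s degree and collects the odd ones. As the argument shows, every odd-degree island must be where the walk starts or ends, so more than two of them rules the walk out.

```python
from collections import Counter

# Edge collection of the Königsberg multigraph from Example 2.15; the
# repeated pairs are parallel bridges.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

# Each bridge adds one to the degree of both of its islands.
deg = Counter()
for u, v in bridges:
    deg[u] += 1
    deg[v] += 1

# Every odd-degree island must be where the walk starts or ends.
odd = sorted(v for v, d in deg.items() if d % 2 == 1)
no_euler_walk = len(odd) > 2

print(dict(deg))
print("odd-degree islands:", odd)
print("walk ruled out:", no_euler_walk)
```

Island A has degree 5 and Islands B, C and D have degree 3, so all four are odd and no such walk exists.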
Exercise 1. Since Euler’s work two of the seven bridges in Königsberg have been destroyed (during World War II). Another two were replaced by major highways, but they are still (for all intents and purposes) bridges. The remaining three are still intact. (See Figure 2.5.) Construct a graph representation of the new bridges of Königsberg and determine whether it is possible to visit the bridges traversing each bridge exactly once. If so, find such a sequence of edges. [Hint: It might help to label the edges in your graph. You do not have to begin and end on the same island.]
Figure 2.5. During World War II two of the seven original Königsberg bridges were destroyed. Later two more were made into modern highways (but they are still bridges). Is it now possible to go from island to island traversing each bridge only once? (Picture courtesy of Wikipedia and Wikimedia Commons: http://en.wikipedia.org/wiki/File:Konigsberg_bridges_presentstatus.png)
Definition 2.13 (MultiGraph). A graph G = (V, E) is a multigraph if there are two
edges e1 and e2 in E so that e1 and e2 are equal as sets. That is, there are two vertices v1
and v2 in V so that e1 = e2 = {v1 , v2 }.
Remark 2.14. Note in the definition of graph (Definition 2.1) we were very careful to
specify that E is a collection of one and two element subsets of V rather than to say that E
was, itself, a set. This allows us to have duplicate edges in the edge set and thus to define
multigraphs. In Computer Science a set that may have duplicate entries is sometimes called
a multiset. A multigraph is a graph in which E is a multiset.
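A minimal Python sketch of Definition 2.13 and Remark 2.14. It assumes the edge collection is given as a list of endpoint pairs; a Python list, unlike a Python `set`, can hold duplicates and so plays the role of a multiset. `collections.Counter` tallies each edge as a `frozenset`, so (A, B) and (B, A) count as the same edge, just as in the definition.

```python
from collections import Counter


def is_multigraph(edges):
    """True if some edge, viewed as an unordered set of endpoints,
    occurs more than once in the edge collection."""
    counts = Counter(frozenset(e) for e in edges)
    return any(c > 1 for c in counts.values())


print(is_multigraph([("A", "B"), ("A", "B"), ("A", "C")]))
print(is_multigraph([(1, 2), (2, 3), (3, 4), (4, 1)]))
```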
Example 2.15. Consider the graph associated with the Bridges of Königsberg Problem
(see Figure 2.6). The vertex set is V = {A, B, C, D}. The edge collection is:
E = {{A, B}, {A, B}, {A, C}, {A, C}, {A, D}, {B, D}, {C, D}}
This multigraph occurs because there are two bridges connecting island A with island B
and two bridges connecting island A with island C. If two vertices are connected by two (or
more) edges, then the edges are simply represented as parallel lines (or arcs) connecting the
vertices.
Figure 2.6. A multigraph is a graph in which a pair of nodes can have more than one edge connecting them. When this occurs, for a graph G = (V, E), the edge collection E is a multiset rather than a set. This is because there are duplicate elements (edges) in the structure.
Remark 2.16. Let G = (V, E) be a graph. There are two degree values that are of
interest in graph theory: the largest and smallest vertex degrees usually denoted ∆(G) and
δ(G). That is:
$$\Delta(G) = \max_{v \in V} \deg(v) \tag{2.2}$$
$$\delta(G) = \min_{v \in V} \deg(v) \tag{2.3}$$
Remark 2.17. Despite our use of The Bridges of Königsberg Problem as a starting point for our investigation of graph theory, most of graph theory is not concerned with graphs containing self-loops or with multigraphs.
Definition 2.18 (Simple Graph). A graph G = (V, E) is a simple graph if G has no edges that are self-loops and if E is a set of two-element subsets of V; i.e., G is not a multi-graph.
Remark 2.19. In light of Remark 2.17, we will assume that every graph we discuss in
these notes is a simple graph and we will use the term graph to mean simple graph. When
a particular result holds in a more general setting, we will state it explicitly.
Exercise 2. Consider the new Bridges of Königsberg Problem from Exercise 1. Is the
graph representation of this problem a simple graph? Could a self-loop exist in a graph
derived from a Bridges of Königsburg type problem? If so, what would it mean? If not,
why?
Exercise 3. Prove that for simple graphs the degree of a vertex is simply the cardinality
of its (open) neighborhood.
2. Directed Graphs
Definition 2.20 (Directed Graph). A directed graph (digraph) is a tuple G = (V, E)
where V is a (finite) set of vertices and E is a collection of elements contained in V × V .
That is, E is a collection of ordered pairs of vertices. The edges in E are called directed
edges to distinguish them from those edges in Definition 2.1.
Definition 2.21 (Source / Destination). Let G = (V, E) be a directed graph. The
source (or tail ) of the (directed) edge e = (v1 , v2 ) is v1 while the destination (or sink or
head ) of the edge is v2 .
Remark 2.22. A directed graph (digraph) differs from a graph only insofar as we replace the concept of an edge as a set with the idea that an edge is an ordered pair in which the
ordering gives some notion of direction of flow. In the context of a digraph, a self-loop is an
ordered pair with form (v, v). We can define a multi-digraph if we allow the set E to be a
true collection (rather than a set) that contains multiple copies of an ordered pair.
Remark 2.23. It is worth noting that the ordered pair (v1 , v2 ) is distinct from the pair
(v2 , v1 ). Thus if a digraph G = (V, E) has both (v1 , v2 ) and (v2 , v1 ) in its edge set, it is not
a multi-digraph.
Example 2.24. We can modify the graph in Example 2.8 to make it directed. Suppose we have the directed graph with vertex set V = {1, 2, 3, 4} and edge set:
E = {(1, 2), (2, 3), (3, 4), (4, 1)}
This digraph is visualized in Figure 2.7(a). In drawing a digraph, we simply append arrowheads to the destination associated with a directed edge.
We can likewise modify our self-loop example to make it directed. In this case, our edge
set becomes:
E = {(1, 2), (2, 3), (3, 4), (4, 1), (1, 1)}
This is shown in Figure 2.7(b).
Figure 2.7. (a) A directed graph. (b) A directed graph with a self-loop. In a directed graph, edges are directed; that is, they are ordered pairs of elements drawn from the vertex set. The ordering of the pair gives the direction of the edge.
Example 2.25. Consider the (simple) graph from Example 2.8. Suppose that the vertices represent islands (just as they did in the Bridges of Königsberg Problem) and the edges represent bridges. It is very easy to see that a tour of these islands exists in which we cross each bridge exactly once. (Such a tour might start at Island 1, then go to Island 2, then 3, then 4 and finally back to Island 1.)
Definition 2.26 (Underlying Graph). If G = (V, E) is a digraph, then the underlying
graph of G is the (multi) graph (with self-loops) that results when each directed edge (v1 , v2 )
is replaced by the set {v1 , v2 } thus making the edge non-directional. Naturally if the directed
edge is a directed self-loop (v, v) then it is replaced by the singleton set {v}.
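A small Python sketch of Definition 2.26, using the directed edge set with a self-loop from Example 2.24. Edges are stored as `frozenset`s, an illustrative choice: `frozenset((v, v))` collapses to the singleton {v}, exactly as the definition requires. Returning a list keeps both copies when a digraph contains (v1, v2) and (v2, v1), while converting the result to a Python `set` merges them; these are the two conventions discussed in Remark 2.28.

```python
def underlying_graph(directed_edges):
    """Underlying graph per Definition 2.26: each ordered pair (u, v) becomes
    the set {u, v}, and a directed self-loop (v, v) becomes the singleton {v}.
    The result is a list, so (u, v) and (v, u) yield two copies of {u, v}."""
    return [frozenset((u, v)) for u, v in directed_edges]


# Directed edge set with a self-loop from Example 2.24.
E = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 1)]
print(underlying_graph(E))

# Both orientations of one edge give a multigraph unless duplicates are merged.
both = underlying_graph([(1, 2), (2, 1)])
print(both, set(both))
```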
Remark 2.27. Notions like edge and vertex adjacency and neighborhood can be extended
to digraphs by simply defining them with respect to the underlying graph of a digraph. Thus
the neighborhood of a vertex v in a digraph G is N (v) computed in the underlying graph.
Remark 2.28. Whether the underlying graph of a digraph is a multi-graph or not usually
has no bearing on relevant properties. In general, an author will state whether two directed
edges (v1 , v2 ) and (v2 , v1 ) are combined into a single set {v1 , v2 } or two sets in a multiset. As
a rule-of-thumb, multi-digraphs will have underlying multigraphs, while digraphs generally
have underlying graphs that are not multi-graphs.
Remark 2.29. It is possible to mix (undirected) edges and directed edges together into
a very general definition of a graph with both undirected and directed edges. Situations
requiring such a model almost never occur in modeling and when they do, the undirected
edges with form {v1 , v2 } are usually replaced with a pair of directed edges (v1 , v2 ) and (v2 , v1 ).
Thus, for the remainder of these notes, unless otherwise stated:
(1) When we say graph we will mean simple graph as in Remark 2.19. If we intend the
result to apply to any graph we’ll say a general graph.
(2) When we say digraph we will mean a directed graph G = (V, E) in which every edge
is a directed edge and the component E is a set and in which there are no self-loops.
Exercise 4. Suppose in the New Bridges of Königsberg (from Exercise 1) some of the
bridges are to become one way. Find a way of replacing the edges in the graph you obtained
in solving Exercise 1 with directed edges so that the graph becomes a digraph but so that it
is still possible to tour all the islands without crossing the same bridge twice. Is it possible
to directionalize the edges so that a tour in which each bridge is crossed once is not possible
but it is still possible to enter and exit each island? If so, do it. If not, prove it is not
possible. [Hint: In this case, enumeration is not that hard and it’s the most straightforward.
You can use symmetry to shorten your argument substantially.]
Figure 2.8. The graph above has a degree sequence d = (4, 3, 2, 2, 1). These are the degrees of the vertices in the graph arranged in non-increasing order.
Proof. The fact that d is graphic means there is at least one graph whose degree sequence is equal to d. From among all those graphs, choose G = (V, E) to maximize
$$r = \left|N(v_1) \cap \{v_2, \ldots, v_{d_1+1}\}\right| \tag{2.5}$$
Recall that N(v1) is the neighborhood of v1. Thus maximizing Expression 2.5 implies we are attempting to make sure that as many vertices in the set $\{v_2, \ldots, v_{d_1+1}\}$ are adjacent to v1 as possible.
If r = d1, then the theorem is proved since v1 is adjacent to $v_2, \ldots, v_{d_1+1}$. Therefore we’ll proceed by contradiction and assume r < d1. We know the following things:
(1) Since deg(v1) = d1, there must be a vertex vt with t > d1 + 1 so that vt is adjacent to v1.
(2) Moreover, there is a vertex vs with 2 ≤ s ≤ d1 + 1 that is not adjacent to v1.
(3) By the ordering of V, deg(vs) ≥ deg(vt); that is, ds ≥ dt.
(4) Therefore, there is some vertex vk ∈ V, with vk ≠ vt, so that vs is adjacent to vk but vt is not. This is because vt is adjacent to v1 and vs is not, while the degree of vs is at least as large as the degree of vt.
Let us create a new graph G′ = (V, E′). The edge set E′ is constructed from E by:
(1) Removing edge {v1, vt}.
(2) Removing edge {vs, vk}.
(3) Adding edge {v1, vs}.
(4) Adding edge {vt, vk}.
This is illustrated in Figure 2.9. In this construction, the degrees of v1, vt, vs and vk are
Figure 2.9. We construct a new graph G′ from G that has a larger value r (see Expression 2.5) than our original graph G did. This contradicts our assumption that G was chosen to maximize r.
Exercise 8 (Independent Project). There are several proofs of Theorem 2.45, some
short. Investigate them and reconstruct an annotated proof of the result. In addition
investigate Berge’s approach using flows [Ber73].
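The statement of Theorem 2.45 is not reproduced in this excerpt. Assuming it is the Havel-Hakimi characterization that the proof above supports (a non-increasing sequence d = (d1, ..., dn) is graphic exactly when the sequence obtained by deleting d1 and subtracting 1 from each of the next d1 entries is graphic), it yields a simple recursive test, sketched here in Python. The function name `is_graphic` is chosen for illustration.

```python
def is_graphic(seq):
    """Havel-Hakimi test: repeatedly remove the largest entry d1 and
    subtract 1 from the next d1 largest entries."""
    d = sorted(seq, reverse=True)
    while d and d[0] > 0:
        d1 = d.pop(0)
        if d1 > len(d):
            return False          # not enough other vertices to join v1 to
        for i in range(d1):
            d[i] -= 1
            if d[i] < 0:
                return False      # some vertex would need negative degree
        d.sort(reverse=True)
    # Whatever remains must be a collection of isolated vertices.
    return all(x == 0 for x in d)


print(is_graphic([4, 3, 2, 2, 1]))   # the sequence of Figure 2.8
print(is_graphic([3, 3, 3, 1]))      # even sum, yet not graphic
```

Each pass of the loop mirrors the construction in the proof: v1 is joined to the d1 vertices of largest remaining degree.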
Remark 2.46. There has been a lot of interest recently in degree sequences of graphs,
particularly as a result of the work in Network Science on so-called scale-free networks. This
has led to a great deal of investigation into properties of graphs with specific kinds of degree
sequences. For the brave, it is worth looking at [MR95], [ACL01], [BR03] and [Lu01] for
interesting mathematical results in this case. To find out why all this investigation started,
see [BAJB00].
3.1. Types of Graphs from Degree Sequences.
Definition 2.47 (Complete Graph). Let G = (V, E) be a graph with |V | = n with
n ≥ 1. If the degree sequence of G is (n − 1, n − 1, . . . , n − 1) then G is called a complete
graph on n vertices and is denoted Kn . In a complete graph on n vertices each vertex is
connected to every other vertex by an edge.
²Thanks to an anonymous comment from the Internet that detected a small typo in Equation 2.6 in versions before 1.4.1.
Lemma 2.48. Let Kn = (V, E) be the complete graph on n vertices. Then:
$$|E| = \frac{n(n-1)}{2}$$
Corollary 2.49. Let G = (V, E) be a graph and let |V | = n. Then:
$$0 \le |E| \le \binom{n}{2}$$
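This is not a proof (Exercise 9 asks for one), only a quick numerical check in Python: enumerating every 2-element subset of an n-vertex set with `itertools.combinations` builds the edge set of Kn, and its size can be compared with n(n − 1)/2 and with the binomial coefficient from `math.comb`. Since every simple graph on n vertices uses a subset of these pairs, this is also the upper bound in Corollary 2.49. Vertices are labeled 0, ..., n − 1 for convenience.

```python
from itertools import combinations
from math import comb


def complete_graph_edges(n):
    """Edge set of K_n on vertices 0, ..., n-1: every 2-element subset."""
    return {frozenset(p) for p in combinations(range(n), 2)}


for n in range(1, 7):
    size = len(complete_graph_edges(n))
    # Compare with n(n-1)/2 from Lemma 2.48, which equals C(n, 2).
    print(n, size, n * (n - 1) // 2, comb(n, 2))
```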
Exercise 9. Prove Lemma 2.48 and Corollary 2.49. [Hint: Use Equation 2.4.]
Definition 2.50 (Regular Graph). Let G = (V, E) be a graph with |V | = n. If the
degree sequence of G is (k, k, . . . , k) with k ≤ n − 1 then G is called a k-regular graph on n
vertices.
Example 2.51. We illustrate one complete graph and two (non-complete) regular graphs in Figure 2.10. Obviously every complete graph is a regular graph. Every Platonic solid is also a regular graph, but not every regular graph is a Platonic solid. In Figure 2.10(c) we show a flattened dodecahedron, one of the five Platonic solids from classical geometry. The Petersen Graph (Figure 2.10(b)) is a 3-regular graph that is used in many graph theoretic examples.
Figure 2.10. The complete graph, the “Petersen Graph” and the Dodecahedron. All Platonic solids are three-dimensional representations of regular graphs, but not all regular graphs are Platonic solids. These figures were generated with Maple.
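As an illustration of Definition 2.50, the Python sketch below builds the Petersen graph from one standard construction (an outer 5-cycle, an inner pentagram and five spokes) and confirms that every vertex has degree 3. The labels 0 through 9 are chosen here for illustration and need not match the labeling used in Figure 2.10.

```python
from collections import Counter


def petersen_edges():
    """One standard construction with illustrative labels 0-9: outer 5-cycle
    on 0..4, spokes i -- i+5, and an inner pentagram joining i+5 to
    (i+2 mod 5)+5."""
    edges = set()
    for i in range(5):
        edges.add(frozenset((i, (i + 1) % 5)))            # outer cycle
        edges.add(frozenset((i, i + 5)))                  # spoke
        edges.add(frozenset((i + 5, (i + 2) % 5 + 5)))    # inner pentagram
    return edges


E = petersen_edges()
deg = Counter(v for e in E for v in e)
print(len(E), sorted(set(deg.values())))
```

A graph is k-regular exactly when the set of its vertex degrees is {k}; here that set is {3} and there are 15 edges.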
3.2. Digraphs.
Definition 2.52 (In-Degree, Out-Degree). Let G = (V, E) be a digraph. The in-degree
of a vertex v in G is the total number of edges in E with destination v. The out-degree of
v is the total number of edges in E with source v. We will denote the in-degree of v by
degin (v) and the out-degree by degout (v).
Theorem 2.53. Let G = (V, E) be a digraph. Then the following holds:
$$|E| = \sum_{v \in V} \deg_{in}(v) = \sum_{v \in V} \deg_{out}(v) \tag{2.7}$$
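A minimal Python sketch of Definition 2.52 and Equation (2.7), using the directed 4-cycle from Example 2.24. Each directed edge (u, v) is counted once toward the out-degree of its source u and once toward the in-degree of its destination v, which is why both sums equal |E|.

```python
from collections import Counter

# Directed edges of Example 2.24; (u, v) has source u and destination v.
E = [(1, 2), (2, 3), (3, 4), (4, 1)]

deg_out = Counter(u for u, _ in E)   # number of edges with source v
deg_in = Counter(v for _, v in E)    # number of edges with destination v

# Equation (2.7): each directed edge contributes exactly one to an in-degree
# and exactly one to an out-degree.
print(sum(deg_in.values()), sum(deg_out.values()), len(E))
```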
4. Subgraphs
Definition 2.54 (Subgraph). Let G = (V, E). A graph H = (V′, E′) is a subgraph of G if V′ ⊆ V and E′ ⊆ E. The subgraph H is proper if V′ ⊊ V or E′ ⊊ E.
Example 2.55. We illustrate the notion of a sub-graph in Figure 2.11. Here we illustrate
a sub-graph of the Petersen Graph. The sub-graph contains vertices 6, 7, 8, 9 and 10 and
the edges connecting them.
Figure 2.11. The Petersen Graph is shown (a) with a sub-graph highlighted (b)
and that sub-graph displayed on its own (c). A sub-graph of a graph is another
graph whose vertices and edges are sub-collections of those of the original graph.
Figure 2.12. The subgraph (a) is induced by the vertex subset V′ = {6, 7, 8, 9, 10}. The subgraph shown in (b) is a spanning sub-graph and is induced by edge subset E′ = {{1, 6}, {2, 9}, {3, 7}, {4, 10}, {5, 8}, {6, 7}, {6, 10}, {7, 8}, {8, 9}, {9, 10}}.
Figure 2.13. A clique is a set of vertices in a graph that induce a complete graph
as a subgraph and so that no larger set of vertices has this property. The graph in
this figure has 3 cliques.
Example 2.67. In Figure 2.14, the graph from Figure 2.13 is illustrated (in a different
spatial configuration) with its cliques. The complement of the graph is also illustrated.
Notice that in the complement, every clique is now an independent set.
Figure 2.14. A graph and its complement, with cliques illustrated in one and independent sets illustrated in the other.
Definition 2.68 (Relative Complement). If G = (V, E) is a graph and H = (V, E′) is a spanning sub-graph, then the relative complement of H in G is the graph H′ = (V, E″) with:
$$e = \{v_1, v_2\} \in E'' \iff \{v_1, v_2\} \in E \text{ and } \{v_1, v_2\} \notin E'$$
Theorem 2.69. Let G = (V, E) be a graph and let H = (V, E′) be its complement. A set S is a clique in G if and only if S is a maximal independent set in H.
Exercise 12. Prove Theorem 2.69. [Hint: Use the definition of graph complement and the fact that if an edge is present in a graph G it must be absent in its complement.]
Definition 2.70 (Vertex Cover). Let G = (V, E) be a graph. A vertex cover is a set of vertices S ⊆ V so that for all e ∈ E at least one element of e is in S; i.e., every edge in E is incident to at least one vertex in S.
Example 2.71. A covering is illustrated in Figure 2.15.
Figure 2.15. A covering is a set of vertices so that every edge has at least one endpoint inside the covering set.
Exercise 13. Illustrate by exhaustion that removing any vertex from the proposed
covering in Figure 2.15 destroys the covering property.
Theorem 2.72. A set I is an independent set in a graph G = (V, E) if and only if the
set V \ I is a covering in G.
Proof. (⇒) Suppose that I is an independent set and choose e = {v, v′} ∈ E. If v ∈ I, then v′ ∉ I because I is independent, so v′ ∈ V \ I. By symmetry, if v′ ∈ I, then v ∈ V \ I. It is possible that neither v nor v′ is in I, but then both lie in V \ I. In every case at least one endpoint of e is in V \ I, so V \ I is a cover.
(⇐) Now suppose that V \ I is a vertex covering. Choose any two vertices v and v′ in I. If {v, v′} were an edge in E, it would have no endpoint in V \ I, contradicting our assumption that V \ I is a covering. Thus, I is an independent set since no two vertices in I are connected by an edge in E. This completes the proof.
Remark 2.73. Theorem 2.72 shows that the problem of identifying a largest independent set is equivalent to the problem of identifying a minimum (size) vertex covering: I is a largest independent set exactly when V \ I is a smallest covering. As it turns out, both of these problems are closely related to yet a third problem, the matching problem, which we will discuss later (for bipartite graphs the problems are in fact equivalent). Coverings (and matchings) are useful in practice. As one example of their utility, imagine that a network of outposts is to be established in an area (like a combat theatre). We want to deliver a certain type of supply (antibiotics, for example) to the outposts so that every link (edge) has at least one end at an outpost where the supply is available; in particular, no outpost with a link is more than one link away from a supplied outpost. The resulting problem is a vertex covering problem, and finding a minimum vertex covering answers the question: what is the minimum number of outposts that must be given antibiotics?
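For very small graphs, Theorem 2.72 and Remark 2.73 can be explored by exhaustive search. The Python sketch below is only illustrative (it runs in exponential time); the function names and the 5-cycle example are hypothetical, and edges are assumed to be given as pairs (u, v).

```python
from itertools import combinations

def is_independent(S, edges):
    # No edge may have both of its end points in S.
    return all(not (u in S and v in S) for u, v in edges)

def is_vertex_cover(S, edges):
    # Every edge must have at least one end point in S.
    return all(u in S or v in S for u, v in edges)

def max_independent_set(vertices, edges):
    # Exhaustive search from the largest size downward (exponential time).
    for k in range(len(vertices), -1, -1):
        for S in combinations(vertices, k):
            if is_independent(set(S), edges):
                return set(S)

def min_vertex_cover(vertices, edges):
    # Exhaustive search from the smallest size upward (exponential time).
    for k in range(len(vertices) + 1):
        for S in combinations(vertices, k):
            if is_vertex_cover(set(S), edges):
                return set(S)

# Hypothetical example: the 5-cycle 1-2-3-4-5-1.
V = [1, 2, 3, 4, 5]
E = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
I = max_independent_set(V, E)
C = min_vertex_cover(V, E)
print(len(I), len(C))  # 2 3
```

The two sizes add up to |V| = 5, as Theorem 2.72 predicts: the complement of a largest independent set is a smallest covering.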
CHAPTER 3
Figure 3.1. A walk (a), cycle (b), Eulerian trail (c) and Hamiltonian path (d) are illustrated.
Remark 3.14. For the most part, the terminology on paths, cycles, tours, etc. is standardized. However, not every author adheres to these same terms. It is always wise to identify exactly what words an author is using for walks, paths, cycles, etc.
Remark 3.15. Walks, cycles, paths and tours can all be extended to the case of digraphs. In this case, the walk, path, cycle or tour must respect the edge directionality. Thus, if $w = (\ldots, v_i, e_i, v_{i+1}, \ldots)$ is a directed walk, then $e_i = (v_i, v_{i+1})$ as an ordered pair.
Exercise 16. Formally define directed walks, directed cycles, directed paths and directed
tours for directed graphs. [Hint: Begin with Definition 3.1 and make appropriate changes.
Then do this for cycles, tours etc.]
Exercise 17. Show that the diameter of a graph is in fact the maximum eccentricity of
any vertex in the graph.
Definition 3.20 (Radius). Let G = (V, E). The radius of G is the minimum eccentricity of any vertex in V. That is:
$$\mathrm{rad}(G) = \min_{v_1 \in V} \mathrm{ecc}(v_1) = \min_{v_1 \in V} \max_{v_2 \in V} d_G(v_1, v_2) \qquad (3.3)$$
Figure 3.3. The diameter of this graph is 2 and the radius is 1. Its girth is 3 and its circumference is 4.
Exercise 18. Compute the diameter, radius, girth and circumference of the Petersen
Graph.
3. More on Trails and Cycles
Remark 3.24. Suppose that
$$w = (v_1, e_1, v_2, \ldots, v_n, e_n, v_{n+1})$$
and that for some m ∈ {1, . . . , n} and some positive integer k with m + k ≤ n + 1 we have $v_m = v_{m+k}$. Then
$$w' = (v_m, e_m, \ldots, e_{m+k-1}, v_{m+k})$$
is a closed sub-walk of w. The walk w′ can be deleted from the walk w to obtain a new walk:
$$w'' = (v_1, e_1, \ldots, v_{m-1}, e_{m-1}, v_{m+k}, e_{m+k}, v_{m+k+1}, \ldots, v_n, e_n, v_{n+1})$$
that is shorter than the original walk. This is illustrated in Figure 3.4. [GY05] calls this a walk reduction, though this terminology is not standard.
Figure 3.4. We can create a new walk from an existing walk by removing closed sub-walks from the walk. (Shown: the walk w, the closed sub-walk w′ and the reduced walk w′′.)
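The reduction of Remark 3.24 can be applied repeatedly until no vertex repeats, which turns any walk into a path with the same end points. The Python sketch below is one way to do this, under the assumption that the graph is simple so a walk can be recorded by its vertex sequence alone; the name `reduce_walk` is hypothetical. It deletes closed sub-walks in a single left-to-right pass, which is just one valid order of reductions.

```python
def reduce_walk(walk):
    """Repeatedly delete closed sub-walks (Remark 3.24) from a walk given as
    a vertex sequence; the result is a path with the same end points."""
    path = []
    position = {}  # vertex -> its index in the current path
    for v in walk:
        if v in position:
            # v already appears: drop the closed sub-walk that returns to v.
            cut = position[v]
            for u in path[cut + 1:]:
                del position[u]
            path = path[:cut + 1]
        else:
            position[v] = len(path)
            path.append(v)
    return path

print(reduce_walk([1, 2, 3, 4, 2, 5]))  # [1, 2, 5]
```

After each cut the current vertex is still v, and the next vertex of the walk is adjacent to v, so consecutive vertices in the result remain adjacent in the graph.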
Lemma 3.25. Let G = (V, E) be a graph and suppose that t is a non-trivial tour (closed
trail) in G. Then t contains a cycle.
Proof. The fact that t is closed implies that its first and last vertices coincide, so t contains at least one pair of repeated vertices. Therefore t has at least one non-trivial closed sub-walk (t itself, for instance). Let c be a non-trivial closed sub-walk of t of minimal length. We will show that c must be a cycle. By way of contradiction, suppose that c is not a cycle. Then, since it is closed, it must contain a repeated vertex other than its first (and last) vertex. Applying Remark 3.24 to c, we could produce a shorter closed walk c′, contradicting our assumption that c was minimal. Thus c must be a cycle. This completes the proof.
Theorem 3.26. Let G = (V, E) be a graph and suppose that t is a non-trivial tour (closed
trail). Then t is composed of edge disjoint cycles.
Proof. We will proceed by induction on the length of t. In the base case, assume that t is a closed tour with one edge; then G is a non-simple graph that contains a self-loop, this self-loop is the single edge in t, and thus t is a (non-simple) cycle.² Now suppose the theorem holds for all tours of length N or less. We will show the result holds for a tour t of length N + 1. Applying Lemma 3.25, we know there is at least one cycle c in t. If t = c we are done. Otherwise, reduce the tour t by c (as in Remark 3.24) to obtain t′; then t′ is still a non-trivial tour and has length at most N. We can now apply the induction hypothesis to see that t′ is composed of edge-disjoint cycles. Since c shares no edge with t′, it is clear that t is composed of edge-disjoint cycles: those of t′ together with c. The theorem is illustrated in Figure 3.5. This completes the proof.
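The induction in the proof of Theorem 3.26 is effectively an algorithm: repeatedly find a minimal closed sub-walk (a cycle), record it and delete it. The Python sketch below does this in one pass with a stack. It assumes a simple graph, with the tour given as a vertex sequence whose first and last vertices agree; the name `tour_to_cycles` and the sample tour are hypothetical.

```python
def tour_to_cycles(tour):
    """Split a closed trail, given as a vertex sequence whose first and last
    vertices agree, into edge-disjoint cycles (Theorem 3.26)."""
    if tour[0] != tour[-1]:
        raise ValueError("a tour must be closed")
    cycles = []
    stack = []     # the current walk with closed sub-walks already removed
    position = {}  # vertex -> its index in the stack
    for v in tour:
        if v in position:
            cut = position[v]
            # stack[cut:] followed by v is a closed sub-walk with no other
            # repeated vertex, i.e. a cycle (Lemma 3.25).
            cycles.append(stack[cut:] + [v])
            for u in stack[cut + 1:]:
                del position[u]
            del stack[cut + 1:]
        else:
            position[v] = len(stack)
            stack.append(v)
    return cycles

print(tour_to_cycles([1, 2, 3, 4, 2, 5, 1]))  # [[2, 3, 4, 2], [1, 2, 5, 1]]
```

Because a trail never reuses an edge, each extracted cycle uses edges that no other extracted cycle uses, so the cycles are edge-disjoint.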
4. Graph Components
Definition 3.27 (Reachability). Let G = (V, E) and let v1 and v2 be two vertices in
V . Then v2 is reachable from v1 if there is a walk w beginning at v1 and ending at v2
(alternatively, the distance from v1 to v2 is not +∞). If G is a digraph, we assume that the
walk is directed.
Definition 3.28 (Connectedness). A graph G is connected if for every pair of vertices v1
and v2 in V , v2 is reachable from v1 . If G is a digraph, then G is connected if its underlying
graph is connected. A graph that is not connected is called disconnected.
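Reachability (Definition 3.27) is usually computed with breadth-first search, and connectedness (Definition 3.28) then amounts to checking that every vertex is reachable from one arbitrary start. The Python sketch below assumes adjacency lists stored in a dictionary (out-neighbours for a digraph); the helper names are hypothetical.

```python
from collections import deque

def reachable(adj, start):
    """Set of vertices reachable from start by (directed) walks, via BFS."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def underlying(adj):
    """Underlying undirected graph of a digraph given by out-neighbour lists."""
    und = {v: set() for v in adj}
    for u, outs in adj.items():
        for w in outs:
            und[u].add(w)
            und[w].add(u)
    return und

def is_connected(adj):
    """Definition 3.28 for an undirected graph given by adjacency lists."""
    if not adj:
        return True  # convention assumed here for the empty graph
    start = next(iter(adj))
    return reachable(adj, start) == set(adj)

digraph = {1: [2], 2: [3], 3: []}
print(reachable(digraph, 3), is_connected(underlying(digraph)))
```

For the small digraph above, only vertex 3 is reachable from 3, yet the underlying graph is connected, so the digraph is connected in the sense of Definition 3.28.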
² If we assume that G is simple, then the base case begins with t having length 3. In this case t is a 3-cycle.
Figure 3.5. We show how to decompose an (Eulerian) tour into an edge-disjoint set of cycles, thus illustrating Theorem 3.26.
Figure 3.6. A connected graph (a), a disconnected graph (b) and a connected digraph that is not strongly connected (c).
components than graph G. If V′ = {v} is a vertex cut, then v is called a cut vertex.
Definition 3.39 (Edge Cut and Cut-Edge). Let G = (V, E) be a graph. A set E′ ⊂ E is an edge cut if the graph G′ resulting from deleting the edges E′ from G has more components than G. If E′ = {e} is an edge cut, then e is called a cut-edge.
Definition 3.40 (Minimal Edge Cut). Let G = (V, E). An edge cut E′ of G is minimal if, whenever we remove any edge from E′ to form E′′, the new set E′′ is no longer an edge cut.
Example 3.41. In Figure 3.7 we illustrate a vertex cut and a cut vertex (a singleton vertex cut) and an edge cut and a cut edge (a singleton edge cut). Note that the edge cut in Figure 3.7(a) is minimal and cut-edges are always minimal. A cut-edge is sometimes called a bridge because it is the only connection between two parts of a graph that become distinct components once it is removed. Bridges (and small edge cuts) are a very important part of social network analysis [KY08, CSW05] because they represent connections between different communities. To see this, suppose that (for
Figure 3.7. We illustrate a vertex cut and a cut vertex (a singleton vertex cut) and an edge cut and a cut edge (a singleton edge cut): (a) Edge Cut and Cut Vertex; (b) Vertex Cut and Cut Edge. Cuts are sets of vertices or edges whose removal from a graph creates a new graph with more components than the original graph.
Figure 3.8. If e lies on a cycle, then we can repair path w by going the long way around the cycle to reach $v_{n+1}$ from $v_1$.
(⇒) Suppose $G' = G - \{e\}$ is connected, and let $e = \{v_1, v_{n+1}\}$. Since G′ is connected, there is a walk from $v_1$ to $v_{n+1}$ in G′. Applying Remark 3.24, we can reduce this walk to a path p with:
$$p = (v_1, e_1, \ldots, v_n, e_n, v_{n+1})$$
Since p is a path in G′, there are no repeated vertices in p and p does not use e. We can construct a cycle c containing e in G as:
$$c = (v_1, e_1, \ldots, v_n, e_n, v_{n+1}, e, v_1)$$
since $e = \{v_1, v_{n+1}\} = \{v_{n+1}, v_1\}$. Thus, e lies on a cycle in G. This completes the proof.
Corollary 3.43. Let G = (V, E) be a connected graph and let e ∈ E. The edge e is a
cut edge if and only if e does not lie on a cycle in G.
Exercise 20. Prove Corollary 3.43.
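Definition 3.39 suggests a direct test for a cut-edge: delete it and compare component counts. The Python sketch below counts components with a small union-find structure (a standard technique not described in these notes). It assumes a simple graph with edges given as pairs; the function names are hypothetical.

```python
def count_components(vertices, edges):
    """Number of connected components, computed with union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)  # merge the two components
    return len({find(v) for v in vertices})

def is_cut_edge(vertices, edges, e):
    """Definition 3.39: deleting e increases the number of components.
    Edges are compared as unordered pairs (simple graphs assumed)."""
    rest = [f for f in edges if set(f) != set(e)]
    return count_components(vertices, rest) > count_components(vertices, edges)

# Hypothetical example: a triangle 1-2-3 with a pendant edge {3, 4}.
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 1), (3, 4)]
print([e for e in E if is_cut_edge(V, E, e)])  # [(3, 4)]
```

Consistent with Corollary 3.43, the three triangle edges lie on a cycle and are not cut-edges, while the pendant edge lies on no cycle and is one.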
Remark 3.44. The next result is taken from Extremal Graph Theory, the study of
extremes or bounds in properties of graphs. There are a number of results in Extremal
Graph Theory that are of interest. See [Bol04] for a complete introduction.
Theorem 3.45. If G = (V, E) is a graph with n vertices and k components, then:
$$|E| \le \frac{(n - k + 1)(n - k)}{2}$$
Proof. Assume that component i of G has $n_i$ vertices, with $\sum_{i=1}^{k} n_i = n$. Applying Lemma 2.48 we know that component i has at most $n_i(n_i - 1)/2$ edges, and this maximum is attained when component i is a complete graph on $n_i$ vertices. Thus, when looking for the largest possible number of edges, we may assume that every component is a complete graph.
Consider the case where k − 1 of the components have exactly 1 vertex and the remaining component has n − (k − 1) vertices. Then the total number of edges in this case is:
$$\frac{(n - (k - 1))(n - (k - 1) - 1)}{2} = \frac{(n - k + 1)(n - k)}{2}$$
It now suffices to show that this case has the greatest number of edges among all cases where the k components are each complete graphs.
Consider the case when component i is $K_r$ and component j is $K_s$ with r, s ≥ 2 and suppose r ≥ s. Then the total number of edges in these two components is:
$$\frac{r(r - 1) + s(s - 1)}{2} = \frac{r^2 + s^2 - r - s}{2}$$
Now, suppose we move one vertex in component j to component i. Then component i is now $K_{r+1}$ and component j is now $K_{s-1}$. Applying Lemma 2.48, the number of edges in this case is:
$$\frac{(r + 1)r + (s - 1)(s - 2)}{2} = \frac{r^2 + r + s^2 - 3s + 2}{2}$$
Observe that since r ≥ s, replacing the linear term r by s gives:
$$r^2 + r + s^2 - 3s + 2 \ge r^2 + s^2 - 2s + 2$$
By a similar argument (since −r − s ≤ −2s):
$$r^2 + s^2 - 2s \ge r^2 + s^2 - r - s$$
Thus we conclude that:
$$\frac{r^2 + r + s^2 - 3s + 2}{2} \ge \frac{r^2 + s^2 - 2s + 2}{2} \ge \frac{r^2 + s^2 - 2s}{2} \ge \frac{r^2 + s^2 - r - s}{2}$$
Repeating this argument (moving vertices from smaller complete components into the largest one until every other component has a single vertex) shows that in a k-component graph with n vertices, the largest number of edges must occur in the case when there is one component with n − (k − 1) vertices and k − 1 components with exactly 1 vertex. This completes the proof.
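Since the proof of Theorem 3.45 works by rearranging vertices among complete components, an exhaustive check on tiny graphs is a useful sanity test. The Python sketch below enumerates every simple graph on n labelled vertices (2^(n(n−1)/2) graphs, so only very small n are practical), records the most edges seen for each number of components k, and compares with the bound. The function names are hypothetical.

```python
from itertools import combinations

def components(n, edges):
    """Number of components of a graph on vertices 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in range(n)})

def max_edges_by_components(n):
    """Largest edge count over all simple graphs on n labelled vertices,
    grouped by the number of components k (exhaustive search)."""
    all_pairs = list(combinations(range(n), 2))
    best = {}
    for mask in range(1 << len(all_pairs)):
        edges = [p for i, p in enumerate(all_pairs) if mask >> i & 1]
        k = components(n, edges)
        best[k] = max(best.get(k, 0), len(edges))
    return best

def bound(n, k):
    # Right-hand side of Theorem 3.45; (n - k + 1)(n - k) is always even.
    return (n - k + 1) * (n - k) // 2

for n in range(1, 6):
    for k, m in sorted(max_edges_by_components(n).items()):
        assert m == bound(n, k), (n, k, m)
```

The assertions check equality, not just the inequality, because the bound is attained by $K_{n-k+1}$ together with k − 1 isolated vertices.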
Corollary 3.46. Any graph with n vertices and more than (n − 1)(n − 2)/2 edges is
connected.
Exercise 21. Prove Corollary 3.46.
5. Introduction to Centrality
Remark 3.47. There are many situations in which we’d like to measure the importance
of a vertex in a graph. The problem of measuring this quantity is usually called determining
a vertex’s centrality.
Definition 3.48 (Degree Centrality). Let G = (V, E) be a graph. The degree centrality of a vertex is just its degree. To obtain a centrality in the interval [0, 1], we may instead define the degree centrality of vertex $v_i$ as $\deg(v_i)/(2|E|)$.
Exercise 22. Show that if we require the degree centralities of a graph to be in the
interval [0, 1], then the sum of the centralities equals 1.
Remark 3.49. Degree centrality is only the simplest measure of centrality. There are many other measures of this quantity; we discuss one more here and then continue our discussion of this topic in Chapter 5.
Definition 3.50 (Geodesic Centrality). Let G = (V, E) be a graph. The geodesic centrality (sometimes called the betweenness) of a vertex v ∈ V is the fraction of shortest paths between pairs of other vertices s, t ∈ V on which v lies, summed over all such pairs. Put more formally, let $\sigma_{st}$ be the total number of shortest paths connecting vertex s with vertex t, and let $\sigma_{st}(v)$ be the number of these shortest paths containing v. The geodesic centrality of v is:
$$C_B(v) = \sum_{s \ne t \ne v} \frac{\sigma_{st}(v)}{\sigma_{st}} \qquad (3.4)$$
These values can be normalized so that they fall within [0, 1] by dividing each $C_B(v)$ by the sum of all the values $C_B(u)$, u ∈ V.
Example 3.51. Consider the graph with 4 vertices shown below. The degrees of the graph are (2, 3, 3, 2), which is the unnormalized degree centrality. The normalized degree centrality of the vertices is:
(1) v1: 1/5
(2) v2: 3/10
(3) v3: 3/10
(4) v4: 1/5
[Figure: the example graph on vertices 1, 2, 3, 4.]
To compute the normalized geodesic centrality, we must compute the fraction of times a vertex appears in a shortest path. This is shown in Table 1. For the vertex pair (1, 2) there is exactly one shortest path connecting 1 to 2. Since 1 and 2 are the end points, they are not counted. Vertices 3 and 4 do not appear in this shortest path, so they each receive a zero. For (1, 4) there are two shortest paths (one through 2 and the other through 3); therefore 1/2 of the shortest paths contain vertex 2 and 1/2 of the shortest paths contain vertex 3. The remainder of the table is filled out in exactly the same way.

Vertex Pair   1    2     3     4
(1,2)         -    -     0     0
(1,3)         -    0     -     0
(1,4)         -    1/2   1/2   -
(2,3)         0    -     -     0
(2,4)         0    -     0     -
(3,4)         0    0     -     -
SUM           0    1/2   1/2   0
Table 1. A table showing the intermediate computations for geodesic centrality.

The normalized geodesic centrality is:
(1) v1: 0
(2) v2: 1/2
(3) v3: 1/2
(4) v4: 0
In this case, we see that the centrality measures are similar in their ordering, but different
in their values.
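Both measures in Example 3.51 can be recomputed mechanically. The figure is not reproduced here, so the Python sketch below assumes the edge set {1,2}, {1,3}, {2,3}, {2,4}, {3,4}, which we infer from the stated degrees (2, 3, 3, 2) and from Table 1. Shortest paths are counted with breadth-first search, using the standard identity $\sigma_{st}(v) = \sigma_{sv}\,\sigma_{vt}$ whenever $d(s,v) + d(v,t) = d(s,t)$; exact fractions avoid rounding. All function names are hypothetical.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

def degree_centrality(adj):
    """Normalized degree centrality deg(v) / (2|E|) from Definition 3.48.
    Assumes the graph has at least one edge (otherwise 2|E| = 0)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # handshake: equals 2|E|
    return {v: Fraction(len(adj[v]), two_m) for v in adj}

def bfs_counts(adj, s):
    """Distances from s and the number of shortest paths from s to each vertex."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def geodesic_centrality(adj):
    """Unnormalized C_B(v) from (3.4), summed over unordered pairs {s, t}."""
    info = {v: bfs_counts(adj, v) for v in adj}
    cb = {v: Fraction(0) for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected; the pair contributes nothing
        for v in adj:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            # v lies on a shortest s-t path exactly when d(s,v) + d(v,t) = d(s,t),
            # and then sigma_st(v) = sigma_sv * sigma_vt.
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                cb[v] += Fraction(sigma_s[v] * sigma_v[t], sigma_s[t])
    return cb

def normalize(values):
    """Divide each value by the total, as suggested after (3.4)."""
    total = sum(values.values())
    return {v: x / total if total else x for v, x in values.items()}

# Edge set assumed for Example 3.51: {1,2}, {1,3}, {2,3}, {2,4}, {3,4}.
example = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3]}
print(degree_centrality(example))
print(normalize(geodesic_centrality(example)))
```

Under this assumed edge set the sketch gives 1/5, 3/10, 3/10, 1/5 for degree centrality and 0, 1/2, 1/2, 0 for normalized geodesic centrality, matching the example.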
Exercise 23. Compute the geodesic centrality and the degree centrality for the graph
shown in Figure 3.10. Compare your results.
Figure 3.10. The graph for which you will compute centralities.
Remark 3.52. It’s clear from this analysis that cut vertices should have high geodesic
centrality if they connect two large components of a graph. Thus, by some measures, cut
vertices are very important elements of graphs.
6. Bipartite Graphs
Definition 3.53. A graph G = (V, E) is bipartite if V = V1 ∪ V2 with V1 ∩ V2 = ∅ and every edge e ∈ E has the form e = {v1, v2} with v1 ∈ V1 and v2 ∈ V2. This definition is valid for non-simple graphs as well.
Remark 3.54. In a bipartite graph, we can think of the vertices as belonging to one
of two classes (either V1 or V2 ) and edges only exist between elements of the two classes,
not between elements in the same class. We can also define n-partite graphs in which the
vertices are in any of n classes and there are only edges between classes, not within classes.
Example 3.55. Figure 3.11 shows a bipartite graph in which V1 = {1, 2, 3} and V2 =
{4, 5, 6, 7}. Notice that there are only edges connecting vertices in V1 and vertices in V2 .
There are no edges connecting elements in V1 to other elements in V1 or elements in V2 to other elements in V2.
Figure 3.11. A bipartite graph has two classes of vertices, and edges in the graph only exist between elements of different classes.
Definition 3.56 (Complete Bipartite Graph). The graph $K_{m,n}$ is the complete bipartite graph consisting of the vertex set V = V1 ∪ V2, where $V_1 = \{v_{11}, \ldots, v_{1m}\}$ and $V_2 = \{v_{21}, \ldots, v_{2n}\}$, and having an edge connecting every element of V1 to every element of V2.
Definition 3.57 (Path Concatenation). Let $p_1 = (v_1, e_1, v_2, \ldots, v_n, e_n, v_{n+1})$ and let $p_2 = (v_{n+1}, e_{n+1}, v_{n+2}, \ldots, v_{n+m}, e_{n+m}, v_{n+m+1})$. Then the concatenation of path $p_1$ with path $p_2$ is the path:
$$p = (v_1, e_1, v_2, \ldots, v_n, e_n, v_{n+1}, e_{n+1}, v_{n+2}, \ldots, v_{n+m}, e_{n+m}, v_{n+m+1})$$
Remark 3.58. Path concatenation is illustrated in the proof of Theorem 3.59.
Theorem 3.59. A graph G = (V, E) is bipartite if and only if every cycle in G has even
length.
Proof. (⇒) Suppose G is bipartite. Every cycle begins and ends at the same vertex and therefore in the same partition, either V1 or V2. Since every edge joins a vertex of V1 to a vertex of V2, each step along a walk switches classes, so a walk starting at a vertex v1 ∈ V1 can return to V1 only after an even number of steps. The same is true if we start at a vertex in V2. Thus every cycle must contain an even number of edges in order to return to either V1 or V2.
(⇐) Suppose that every cycle in G has even length. Without loss of generality, assume G is connected (the general case is treated at the end of the proof). We will create a partition of V so that V = V1 ∪ V2 and V1 ∩ V2 = ∅ and there is no edge between vertices in the same class.
Choose an arbitrary vertex v ∈ V and define:
$$V_1 = \{v' \in V : d_G(v, v') \equiv 0 \bmod 2\} \qquad (3.5)$$
$$V_2 = \{v' \in V : d_G(v, v') \equiv 1 \bmod 2\} \qquad (3.6)$$
Clearly V1 and V2 constitute a partition of V. Choose u1, u2 ∈ V1 and suppose e = {u1, u2} ∈ E. The distance from v to u1 is even, so there is a path p1 with an even number of edges beginning at v and ending at u1. Likewise the distance from v to u2 is even, so there is a path p2 beginning at u2 and ending at v with an even number of edges. If we concatenate path p1, the length 1 path q = (u1, {u1, u2}, u2) and path p2, we obtain a closed walk in G of odd length. A closed walk of odd length always contains a cycle of odd length (if it is not itself a cycle, split it at a repeated vertex into two shorter closed walks, one of which must have odd length, and repeat), contradicting our assumption. Therefore, there can be no edge connecting two vertices in V1.
Choose u1, u2 ∈ V2 and suppose that e = {u1, u2} ∈ E. Using the same argument, there is a path p1 of odd length from v to u1 and a path p2 of odd length from u2 to v. If we concatenate path p1, the length 1 path q = (u1, {u1, u2}, u2) and path p2, we again obtain a closed walk of odd length, and hence a cycle in G of odd length. Therefore, there can be no edge connecting two vertices in V2. These arguments are illustrated in Figure 3.12.
Figure 3.12. Illustration of the main argument in the proof that a graph is bipartite if and only if all cycles have even length. (The paths from v to u1 and from v to u2 are both of even length or both of odd length.)
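In the case when G has more than one component, execute the process described above for each component to obtain partitions $V_1, V_2, V_3, V_4, \ldots, V_{2n}$, where $V_{2k-1}$ and $V_{2k}$ are the two classes constructed in the k-th component. Create a bipartition U1 and U2 of V with:
$$U_1 = \bigcup_{k=1}^{n} V_{2k-1} \qquad (3.7)$$
$$U_2 = \bigcup_{k=1}^{n} V_{2k} \qquad (3.8)$$
Every edge lies within a single component, and within each component no edge joins two vertices of $V_{2k-1}$ or two vertices of $V_{2k}$. Therefore there can be no edge connecting two vertices of U1 or two vertices of U2, so G is bipartite. This completes the proof.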
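The (⇐) direction of the proof of Theorem 3.59 is constructive: sort vertices by the parity of their distance from a root in each component. Breadth-first search computes exactly these distances, so the Python sketch below labels each vertex with its BFS parity and reports failure when an edge joins two vertices of the same parity (which, by the theorem, means an odd cycle exists). Adjacency lists in a dictionary are assumed, and the name `bipartition` is hypothetical.

```python
from collections import deque

def bipartition(adj):
    """Return (U1, U2) as in (3.5)-(3.8), or None if some edge joins two
    vertices whose distances from their component's root have equal parity."""
    parity = {}
    for root in adj:                 # one BFS per component
        if root in parity:
            continue
        parity[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in parity:
                    parity[w] = 1 - parity[u]   # d(root, w) = d(root, u) + 1
                    queue.append(w)
                elif parity[w] == parity[u]:
                    return None                 # an odd cycle exists
    U1 = {v for v, p in parity.items() if p == 0}
    U2 = {v for v, p in parity.items() if p == 1}
    return U1, U2

square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
print(bipartition(square))  # ({1, 3}, {2, 4})
```

A self-loop makes a vertex its own neighbour with equal parity, so non-simple graphs with loops are correctly rejected as well.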
7. Acyclic Graphs and Trees
Definition 3.60 (Acyclic Graph). A graph that contains no cycles is called acyclic.
Definition 3.61 (Forests and Trees). Let G = (V, E) be an acyclic graph. If G has
more than one component, then G is called a forest. If G has one component, then G is
called a tree.
Example 3.62. A randomly generated tree with 10 vertices is shown in Figure 3.13.
Note that a tree (if drawn upside down) can be made to look exactly like a real tree growing
up from the ground.
Remark 3.63. We can define directed trees and directed forests as acyclic directed
graphs. Generally speaking, we require the underlying graphs to be acyclic rather than
Figure 3.13. A tree is shown. Imagining the tree upside down illustrates the tree-like nature of the graph structure.
connected and acyclic. Therefore it is a spanning tree of G. The theorem then follows by
induction.
Corollary 3.67. Every graph G = (V, E) has a spanning forest F = (V, E 0 ).
Exercise 24. Prove Corollary 3.67.
Definition 3.68 (Leaf). Let T = (V, E) be a tree. If v ∈ V and deg(v) = 1, then v is called a leaf of T.
Lemma 3.69. Every tree with at least one edge has at least two leaves.
Proof. Let:
$$w = (v_1, e_1, v_2, \ldots, v_n, e_n, v_{n+1})$$
be a path of maximal length in T. Since T has at least one edge, n ≥ 1 and so $v_1 \ne v_{n+1}$. Consider vertex $v_{n+1}$. If $\deg(v_{n+1}) > 1$, then there is an edge $e_{n+1} \ne e_n$ joining $v_{n+1}$ to some vertex $v_{n+2}$, and there are two possibilities: (i) $v_{n+2}$ is not in the sequence w. In this case, we can extend w to w′ defined as:
$$w' = (v_1, e_1, v_2, \ldots, v_n, e_n, v_{n+1}, e_{n+1}, v_{n+2})$$
which contradicts our assumption that w was maximal in length. (ii) $v_{n+2} = v_k$ for some k ∈ {1, . . . , n}; i.e., $v_{n+2}$ is in the sequence w. In this case, there is a closed sub-walk:
$$c = (v_k, e_k, v_{k+1}, \ldots, v_{n+1}, e_{n+1}, v_{n+2})$$
Since w is a path, there are no other repeated vertices in the sequence c and thus c is a cycle in T, contradicting our assumption that T was a tree. Hence $\deg(v_{n+1}) = 1$. The same reasoning applies to vertex $v_1$, so the two (distinct) end points of every maximal path in a tree must be leaves. This completes the proof.
Corollary 3.70. Let G = (V, E) be a graph. If each vertex in V has degree at least 2,
then G contains a cycle.
Exercise 25. Prove Corollary 3.70.
Lemma 3.71. Let T = (V, E) be a tree with |V | = n. Then |E| = n − 1.
Proof. We will proceed by induction. For the case when n = 1, the statement holds because a tree with one vertex has no edges (any edge would be a self-loop, which is a cycle). Now, suppose that the statement is true whenever |V| ≤ n. We will show that when |V| = n + 1, then |E| = n, assuming that T = (V, E) is a tree. Since T is connected and has at least two vertices, it has at least one edge, so by Lemma 3.69 it has at least two leaves. Therefore, choose a vertex v ∈ V that is a leaf in T. There is exactly one edge e = {v′, v} ∈ E incident to v. Consider the graph T′ = (V′, E′) with V′ = V \ {v} and E′ = E \ {e}. This new graph T′ must:
(1) have one component, since v was connected to only one other vertex v′ ∈ V and T had only one component, and
(2) be acyclic, since T itself was acyclic and we have not introduced new edges to create a cycle.
Therefore T′ is a tree with n vertices and by the induction hypothesis it must contain n − 1 edges. Since we removed exactly one edge (and one vertex) to construct T′ from T, it follows that T had exactly n edges and our originally assumed n + 1 vertices. The required result follows immediately from induction.
Corollary 3.72. If G = (V, E) is a forest with n vertices, then G has n − c(G) edges. (Recall that c(G) is the number of components in G.)
Exercise 26. Prove Corollary 3.72.
Theorem 3.73. A graph G = (V, E) is connected if and only if it has a spanning tree.
Exercise 27. Prove Theorem 3.73.
Theorem 3.74. Let T = (V, E) be a graph with |V | = n. Then the following are
equivalent:
(1) T is a tree.
(2) T is acyclic and has exactly n − 1 edges.
(3) T is connected and has exactly n − 1 edges.
(4) T is connected and every edge is a cut-edge.
(5) Any two vertices of T are connected by exactly one path.
(6) T is acyclic and the addition of any new edge creates exactly one cycle in the resulting
graph.
Proof. (1 ⟹ 2) Assume T is a tree. Then by definition T is acyclic and the fact that it has n − 1 edges follows from Lemma 3.71.
(2 ⟹ 3) Since T is acyclic, it must be a forest and by Corollary 3.72 |E| = n − c(T). Since we assumed that T has n − 1 edges, we must have n − c(T) = n − 1; thus the number of components of T is 1 and T must be connected.
(3 ⟹ 4) The fact that T is connected is assumed in (3). Choose any edge e ∈ E and consider the graph T′ = (V, E′) where E′ = E \ {e}. Then T′ has n vertices and n − 2 edges. Starting from the n isolated vertices of V and adding edges one at a time, each edge reduces the number of components by at most one; hence any graph with n vertices and m edges has at least n − m components. Therefore c(T′) ≥ n − (n − 2) = 2, so T′ is disconnected and e was a cut-edge.
(4 ⟹ 5) Choose two vertices v and v′ in V. The fact that there is a path between v and v′ is guaranteed by our assumption that T is connected. By way of contradiction, suppose that there are at least two paths from v to v′ in T. These two paths must diverge at some vertex w ∈ V and recombine at some other vertex w′. (See Figure 3.15.) We can construct a cycle in T by beginning at vertex w, following the first path to w′, and then following the second path back to w from w′.
[Figure 3.15: two paths from v to v′ that diverge at w and recombine at w′.]
By Theorem 3.42, removing any edge in this cycle cannot result in a disconnected graph. Thus, no edge in the constructed cycle is a cut-edge, contradicting our assumption on T. Thus, two paths connecting v and v′ cannot exist.
(5 ⟹ 6) The fact that any pair of vertices is connected in T implies T is connected (i.e., has one component). Now suppose that T has a cycle (like the one illustrated in Figure 3.15). Then it is easy to see there are (at least) two paths connecting w and w′, contradicting our assumption. Therefore, T is acyclic. The fact that adding an edge creates exactly one cycle can be seen in the following way: Consider two vertices v and v′ and suppose the edge {v, v′} is not in E. We know there is a path:
$$(v, \{v, u_1\}, u_1, \ldots, u_n, \{u_n, v'\}, v')$$
in T connecting v and v′ and it is unique. Adding the edge {v, v′} creates the cycle:
$$c_1 = (v, \{v, u_1\}, u_1, \ldots, u_n, \{u_n, v'\}, v', \{v, v'\}, v)$$
so at least one cycle is created. To see that this cycle is unique, note that since T itself is acyclic, any other cycle must contain the edge {v, v′}. Suppose that this cycle is:
$$c_2 = (v, \{v, w_1\}, w_1, \ldots, w_m, \{w_m, v'\}, v', \{v, v'\}, v)$$
where there is at least one vertex $w_i$ not present in the set $\{u_1, \ldots, u_n\}$ (otherwise the two cycles are identical). We now see there must be two distinct paths connecting v and v′, namely:
$$(v, \{v, u_1\}, u_1, \ldots, u_n, \{u_n, v'\}, v')$$
and
$$(v, \{v, w_1\}, w_1, \ldots, w_m, \{w_m, v'\}, v')$$
This contradicts our assumption on T. Thus the created cycle is unique.
(6 ⟹ 1) It suffices to show that T has a single component. Suppose not; then there are at least two components of T. Choose two vertices v and v′ in V so that these two vertices are not in the same component. Then the edge e = {v, v′} is not in E, and adding it to E cannot create a cycle. To see why, note that any cycle in the resulting graph T′ would have to contain e (since T is acyclic), and removing e from such a cycle would leave a path from v to v′ in T, which is impossible because v and v′ lie in different components of T. Thus the addition of this edge does not create a cycle, contradicting our assumption on T. Thus, T must have a single component. Since it is acyclic and connected, T is a tree. This completes the proof.
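Several of the equivalent conditions in Theorem 3.74 are easy to test by computer, which allows an exhaustive check on small vertex sets. The Python sketch below tests acyclicity and connectedness with union-find (a standard technique, not one described in these notes) on vertices 0, ..., n − 1, then verifies over all 64 graphs on 4 labelled vertices that conditions (1), (2) and (3) agree. The helper names are hypothetical.

```python
from itertools import combinations

def find(parent, v):
    while parent[v] != v:
        parent[v] = parent[parent[v]]  # path halving
        v = parent[v]
    return v

def is_acyclic(n, edges):
    # An edge whose end points are already joined would close a cycle.
    parent = list(range(n))
    for u, v in edges:
        ru, rv = find(parent, u), find(parent, v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

def is_connected(n, edges):
    parent = list(range(n))
    for u, v in edges:
        parent[find(parent, u)] = find(parent, v)
    return len({find(parent, v) for v in range(n)}) <= 1

def is_tree(n, edges):
    # Condition (3) of Theorem 3.74: connected with exactly n - 1 edges.
    return is_connected(n, edges) and len(edges) == n - 1

n = 4
pairs = list(combinations(range(n), 2))
trees = 0
for mask in range(1 << len(pairs)):
    E = [p for i, p in enumerate(pairs) if mask >> i & 1]
    c1 = is_connected(n, E) and is_acyclic(n, E)   # (1): T is a tree
    c2 = is_acyclic(n, E) and len(E) == n - 1      # (2)
    c3 = is_tree(n, E)                             # (3)
    assert c1 == c2 == c3
    trees += c1
print(trees)  # 16
```

The count of 16 labelled trees on 4 vertices is only a sanity check on the enumeration; the assertions inside the loop are what confirm the three conditions coincide on these examples.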
Definition 3.75 (Tree-Graphic Sequence). Recall from Definition 2.39 a tuple d =
(d1 , . . . , dn ) is graphic if there exists a graph G with degree sequence d. The tuple d is
tree-graphic if it is both graphic and there exists a tree with degree sequence d.
Theorem 3.76. A degree sequence d = (d1 , . . . , dn ) is tree-graphic if and only if
(1) n = 1 and
$$\sum_{i=1}^{n} d_i = 2n - 2 \qquad (3.9)$$
We assume that the degrees are ordered (largest first) and positive. Therefore, $d_1 \ge 2$ (because otherwise $d_1 + \cdots + d_{n+1} \le n + 1$) and $d_1 \le n$ (note that in the case $d_1 = n$ we must have $d_2 = d_3 = \cdots = d_{n+1} = 1$). Moreover, $d_n = d_{n+1} = 1$: if instead $d_n \ge 2$, then, since the degrees are ordered, $d_1, \ldots, d_n \ge 2$ and $d_{n+1} \ge 1$, so $d_1 + d_2 + \cdots + d_{n+1} \ge 2n + 1$, contradicting $d_1 + d_2 + \cdots + d_{n+1} = 2n$. Consider the sequence of degrees:
$$d' = (d_1 - 1, d_2, \ldots, d_n)$$
Since $d_{n+1} = 1$, we can see that $(d_1 - 1) + d_2 + \cdots + d_n = 2n - 2$, and every entry of d′ is positive because $d_1 \ge 2$. Thus, a permutation of d′ that restores the ordering is a tree-graphic sequence by the induction hypothesis. Let T′ be a tree with this degree sequence and let $v_1$ be the vertex with degree $d_1 - 1$ in T′. Then by adding a new vertex $v_{n+1}$ to T′ along with the edge $\{v_1, v_{n+1}\}$ we construct a tree T with the original degree sequence d. Indeed, this new graph T is connected, since T′ was connected and we have joined $v_{n+1}$ to a vertex of T′, and it is acyclic, since $v_{n+1}$ has degree 1 and so cannot lie on a cycle. The result follows by induction.
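The sufficiency proof of Theorem 3.76 can be unrolled into a loop: repeatedly join a vertex of remaining degree 1 (the leaf $v_{n+1}$) to a vertex of largest remaining degree ($v_1$), then lower that degree by one. The Python sketch below follows this idea, assuming, as in the proof, that n ≥ 2 and every entry is positive; it also accepts (0), the degree sequence of the one-vertex tree. Vertices are labelled 0, ..., n − 1 in the order of d, and the name `tree_from_degrees` is hypothetical.

```python
def tree_from_degrees(d):
    """Build the edge list of a tree with degree sequence d, following the
    inductive proof of Theorem 3.76. Raises ValueError when the entries are
    not all positive or do not sum to 2n - 2."""
    n = len(d)
    if n == 1 and d[0] == 0:
        return []  # a single vertex is a (trivial) tree
    if n < 2 or any(x < 1 for x in d) or sum(d) != 2 * n - 2:
        raise ValueError("not tree-graphic")
    remaining = list(d)     # degrees still to be realized
    alive = list(range(n))  # vertices not yet attached as leaves
    edges = []
    while len(alive) > 2:
        alive.sort(key=lambda v: remaining[v], reverse=True)
        hub, leaf = alive[0], alive[-1]  # largest remaining degree; a degree-1 vertex
        edges.append((hub, leaf))
        remaining[hub] -= 1              # the sequence d' from the proof
        alive.pop()                      # remove the leaf
    u, v = alive                         # two vertices of remaining degree 1
    edges.append((u, v))
    return edges

print(tree_from_degrees([3, 2, 2, 1, 1, 1]))
```

As in the proof, at each step the smallest remaining degree is 1 and the largest is at least 2, so the reduced sequence stays positive with sum 2(n − 1) − 2.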
Exercise 28. Prove the necessity part of Theorem 3.76. [Hint: Use Theorem 2.37.]
Remark 3.77. This final theorem characterizes Eulerian graphs and will be useful later.
We use results derived from our study of trees to prove the following theorem.
Theorem 3.78. Let G = (V, E) be a non-empty, non-trivial connected graph. Then the following are equivalent³:
(1) G is Eulerian.
(2) The degree of every vertex in G is even.
(3) The set E is the union of the edge sets of a collection of edge-disjoint cycles in G.
Figure 3.16. We illustrate an Eulerian graph and note that each vertex has even degree. We also show how to decompose this Eulerian graph’s edge set into the union of edge-disjoint cycles, thus illustrating Theorem 3.78. Following the tour construction procedure (starting at Vertex 5) will give the illustrated Eulerian tour.
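The tour construction procedure mentioned in the caption is not spelled out in this excerpt. As one concrete possibility, and not necessarily the procedure the notes have in mind, the Python sketch below uses Hierholzer's algorithm, which splices together edge-disjoint closed trails in the spirit of Theorem 3.78. It assumes a connected multigraph given as a list of edges (u, v), where self-loops and parallel edges are allowed; the name `euler_tour` is hypothetical.

```python
from collections import defaultdict

def euler_tour(edges, start):
    """Hierholzer's algorithm: return an Eulerian tour (as a vertex sequence)
    of a connected multigraph in which every vertex has even degree."""
    adj = defaultdict(list)
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))  # each edge is stored at both ends, tagged by index
        adj[v].append((u, i))
    if any(len(nbrs) % 2 for nbrs in adj.values()):
        raise ValueError("some vertex has odd degree, so no Eulerian tour exists")
    used = [False] * len(edges)
    stack, tour = [start], []
    while stack:
        u = stack[-1]
        while adj[u] and used[adj[u][-1][1]]:
            adj[u].pop()  # discard edges already traversed from the other end
        if adj[u]:
            v, i = adj[u].pop()
            used[i] = True
            stack.append(v)         # extend the current closed trail
        else:
            tour.append(stack.pop())  # dead end: emit vertex, back up
    if len(tour) != len(edges) + 1:
        raise ValueError("the edges do not form a single connected graph")
    return tour[::-1]

# Hypothetical example: two triangles sharing vertex 1 (a "bow tie").
bow_tie = [(1, 2), (2, 3), (3, 1), (1, 4), (4, 5), (5, 1)]
print(euler_tour(bow_tie, 1))
```

The returned tour starts and ends at the chosen vertex and uses every edge exactly once; the odd-degree check corresponds to condition (2) of Theorem 3.78.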
Exercise 29. Show by example that Theorem 3.78 does not necessarily hold if we are
only interested in Eulerian trails.
CHAPTER 4
Figure 4.1. Two graphs G and G′ that have identical degree sequences but are not isomorphic (G ≇ G′).
but these graphs cannot be isomorphic, since they have different numbers of components.
The same is true with the other graph properties. The equality between a property of G and that same property for G′ is a necessary criterion for the isomorphism of G and G′, but not sufficient. We will not encounter any property of a graph that provides such a necessary and sufficient condition (see Remark 4.12).
Theorem 4.9. Suppose G = (V, E) and G′ = (V′, E′) are graphs with G ≅ G′, with f : V → V′ the graph isomorphism between the graphs. If H is a subgraph of G, then H′ = f(H) is a subgraph of G′. (Here f(H) is the image of the subgraph H under the isomorphism f.)
Exercise 31. Prove Theorem 4.9. [Hint: The proof does not have to be extensive
in detail. Simply write enough to convince yourself that the isomorphisms preserve the
subgraph property.]
Definition 4.10 (Graph Isomorphism Problem). Given two graphs G = (V, E) and G′ = (V′, E′), the graph isomorphism problem is to determine whether or not G and G′ are isomorphic.
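The simplest (and slowest) approach to the graph isomorphism problem is to try every bijection between the vertex sets, which takes on the order of n! steps. The Python sketch below does exactly that for simple graphs given as vertex lists and edge lists; the name `find_isomorphism` is hypothetical, and it is meant only to make Definition 4.10 concrete, not as a practical algorithm.

```python
from itertools import permutations

def find_isomorphism(V1, E1, V2, E2):
    """Exhaustive search for a bijection f: V1 -> V2 with
    {u, v} in E1 <=> {f(u), f(v)} in E2 (simple graphs assumed).
    Returns f as a dict, or None if the graphs are not isomorphic."""
    if len(V1) != len(V2) or len(E1) != len(E2):
        return None  # quick necessary tests fail
    S1 = {frozenset(e) for e in E1}
    S2 = {frozenset(e) for e in E2}
    for image in permutations(V2):
        f = dict(zip(V1, image))
        # f is a bijection and |E1| = |E2|, so equal edge images suffice.
        if {frozenset(f[x] for x in e) for e in S1} == S2:
            return f
    return None

# Hypothetical examples: a relabelled 4-cycle, and a path versus a star.
print(find_isomorphism([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)],
                       ['a', 'b', 'c', 'd'], [('a', 'c'), ('c', 'b'), ('b', 'd'), ('d', 'a')]))
print(find_isomorphism([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)],
                       [1, 2, 3, 4], [(1, 2), (1, 3), (1, 4)]))
```

The path and the star in the second call agree on vertex and edge counts, yet no bijection works, which illustrates why such counts are necessary but not sufficient tests.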
Definition 4.11 (Subgraph Isomorphism). Given two graphs G = (V, E) and H = (V′, E′), the subgraph isomorphism problem is to determine whether G contains a subgraph that is isomorphic to H.
Remark 4.12. In general, the subgraph isomorphism problem is very (very) hard. In fact, subgraph isomorphism is a so-called NP-complete problem. (Here “NP” stands for nondeterministic polynomial time: NP is the class of problems solvable in polynomial time by a nondeterministic Turing machine.) NP-complete problems are among the hardest problems in this class, and many hard practical problems are NP-complete. Interested readers might consider looking at [CLRS01] for more details.
The graph isomorphism problem (interestingly enough) is a bit of an enigma. We do not know exactly how hard this problem is to solve. It lies in NP, but it is not known to be NP-complete, and it is widely believed to be easier than the subgraph isomorphism problem. It is worthwhile noting, however, that there is a linear time algorithm for determining the isomorphism of two trees. (See Page 84 of [AHU74].)
Exercise 32. List some ways to determine that two graphs are not isomorphic. That is, what are some tests one might do to see whether two graphs are not isomorphic?
Definition 4.13 (Automorphism). Let G = (V, E) be a graph. An automorphism is
an isomorphism from G to itself. That is, a bijection f : V → V so that for all v1 , v2 ∈ V ,
{v1 , v2 } ∈ E ⇐⇒ {f (v1 ), f (v2 )} ∈ E.
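For small graphs, the automorphisms of Definition 4.13 can be listed by testing every permutation of V, which also lets us observe the closure properties proved below (Lemmas 4.15 and 4.16). The Python sketch assumes a simple graph given as a vertex list and an edge list, and represents each automorphism as a dict v → f(v); the name `automorphisms` is hypothetical.

```python
from itertools import permutations

def automorphisms(V, E):
    """All automorphisms of a simple graph (Definition 4.13), found by
    testing every permutation of V; each is a dict v -> f(v)."""
    S = {frozenset(e) for e in E}
    autos = []
    for image in permutations(V):
        f = dict(zip(V, image))
        # A permutation maps edges to distinct pairs, so set equality gives <=>.
        if {frozenset(f[x] for x in e) for e in S} == S:
            autos.append(f)
    return autos

# Hypothetical example: the 4-cycle 1-2-3-4-1.
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 4), (4, 1)]
A = automorphisms(V, E)
print(len(A))  # 8
```

The 8 automorphisms of the 4-cycle are its rotations and reflections. Composing any two of them, or inverting one, again yields an element of the list.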
Remark 4.14 (Inverse Automorphism). Recall that an isomorphism (and hence an au-
tomorphism) is a bijective function and hence it has a well defined inverse. That is, if
G = (V, E) is a graph and f : V → V is an automorphism, then if f (v1 ) = f (v2 ), we know
that v1 = v2 (because f is injective). Further, we know that for every v2 ∈ V there is a
(unique) v1 ∈ V so that f(v1) = v2 (because f is surjective). Thus, if v2 ∈ V we can define $f^{-1}(v_2)$ to be the unique v1 so that f(v1) = v2.
Lemma 4.15. Let G = (V, E) be a graph. Suppose that f : V → V is an automorphism. Then $f^{-1} : V \to V$ is also an automorphism.
Proof. The fact that f is a bijection implies that $f^{-1}$ is itself a bijection. We know for all v1 and v2 in V that:
$$\{v_1, v_2\} \in E \iff \{f(v_1), f(v_2)\} \in E$$
For every vertex pair u1 and u2 in V there are unique vertices v1 and v2 in V so that u1 = f(v1) and u2 = f(v2), namely $v_1 = f^{-1}(u_1)$ and $v_2 = f^{-1}(u_2)$. Furthermore, by the previous observation:
$$\{u_1, u_2\} \in E \iff \{v_1, v_2\} \in E$$
But this means that for all u1 and u2 in V we have:
$$\{f^{-1}(u_1), f^{-1}(u_2)\} \in E \iff \{u_1, u_2\} \in E \qquad (4.2)$$
Thus $f^{-1}$ is a bijection that preserves the edge relation. This completes the proof.
Exercise 33. Prove carefully that if f is a bijection then so is f −1 . [Hint: Most of the
proof is in Remark 4.14.]
Lemma 4.16 (Composition). Let G = (V, E) be a graph. Suppose that f : V → V and
g : V → V are automorphisms. Then f ◦ g is also an automorphism.
Exercise 34. Prove Lemma 4.16
Definition 4.17 (Group). A group is a pair (S, ◦) where S is a set and ◦ : S × S → S
is a binary operation so that:
(1) The binary operation ◦ is associative; that is, if s1 , s2 and s3 are in S, then (s1 ◦
s2 ) ◦ s3 = s1 ◦ (s2 ◦ s3 ).
(2) There is a unique identity element e ∈ S so that for all s ∈ S, e ◦ s = s ◦ e = s.
(3) For every element s ∈ S there is an inverse element s−1 ∈ S so that s ◦ s−1 =
s−1 ◦ s = e.
If ◦ is commutative, that is for all s1 , s2 ∈ S we have s1 ◦ s2 = s2 ◦ s1 , then (S, ◦) is called a
commutative group (or abelian group).
Example 4.18. This course is not about group theory. If you’re interested in groups in
the more abstract sense, it’s worth considering taking Math 435, which is all about abstract
algebra. One of the simplest examples of a group is the set of integers Z under the binary
operation of addition.
Definition 4.19 (Sub-Group). Let (S, ◦) be a group. A subgroup of (S, ◦) is a group
(T, ◦) so that T ⊆ S. The subgroup (T, ◦) shares the identity of the group (S, ◦).
Example 4.20. Consider the group (Z, +). If 2Z is the set of even integers, then (2Z, +)
is a subgroup of (Z, +) because the even integers contain 0 and are closed under addition and negation.
Theorem 4.21. Let G = (V, E) be a graph. Let Aut(G) be the set of all automorphisms
on G. Then (Aut(G), ◦) is a group.
Proof. By Lemma 4.16, we can see that functional composition is a binary operation
◦ : Aut(G) × Aut(G) → Aut(G). Associativity is a property of functional composition, since if f :
V → V and g : V → V and h : V → V it is easy to see that for all v ∈ V :
(4.3) ((f ◦ g) ◦ h)(v) = (f ◦ g)(h(v)) = f(g(h(v))) = f((g ◦ h)(v)) = (f ◦ (g ◦ h))(v)
The identity function e : V → V defined by e(v) = v for all v ∈ V is an automorphism
of G. Finally, by Lemma 4.15, each element of Aut(G) has an inverse in Aut(G). This completes the
proof.
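Theorem 4.21 can be checked by brute force on small graphs. The sketch below (function names are our own) enumerates Aut(G) by testing all n! bijections against Definition 4.13, assuming a simple graph given as a vertex list and an edge list, and then confirms that the result contains the identity and is closed under composition and inverses for K3. Associativity is inherited from functional composition, as in Equation 4.3. Because it tries every permutation, this approach is only practical for very small graphs.

```python
from itertools import combinations, permutations


def automorphisms(vertices, edges):
    """Brute force: try every bijection V -> V and keep those preserving adjacency."""
    E = {frozenset(e) for e in edges}
    result = []
    for image in permutations(vertices):
        f = dict(zip(vertices, image))
        # {u, v} in E  <=>  {f(u), f(v)} in E, for every pair of distinct vertices
        if all((frozenset((u, v)) in E) == (frozenset((f[u], f[v])) in E)
               for u, v in combinations(vertices, 2)):
            result.append(f)
    return result


def compose(f, g):
    """(f o g)(v) = f(g(v))."""
    return {v: f[g[v]] for v in g}


def inverse(f):
    """Inverse of a bijection stored as a dictionary."""
    return {w: v for v, w in f.items()}


V = [1, 2, 3]
K3 = [(1, 2), (1, 3), (2, 3)]
aut = automorphisms(V, K3)
identity = {v: v for v in V}

print(len(aut))                                              # 6
print(identity in aut)                                       # True
print(all(compose(f, g) in aut for f in aut for g in aut))   # True: closed under composition
print(all(inverse(f) in aut for f in aut))                   # True: closed under inverses
```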
Definition 4.22 (Permutation / Permutation Group). A permutation on a set V =
{1, . . . , n} of n elements is a bijective mapping f from V to itself. A permutation group on
a set V is a set of permutations of V that forms a group under functional composition.
Example 4.23. Consider the set V = {1, 2, 3, 4}. A permutation on this set that maps
1 to 2 and 2 to 3 and 3 to 1 can be written as: (1, 2, 3)(4) indicating the cyclic behavior
that 1 → 2 → 3 → 1 and 4 is fixed. In general, we write (1, 2, 3) instead of (1, 2, 3)(4) and
suppress any elements that do not move under the permutation.
For the permutation taking 1 to 3 and 3 to 1 and 2 to 4 and 4 to 2 we write (1, 3)(2, 4)
and say that this is the product of (1, 3) and (2, 4). When determining the impact of a
permutation on a number, we read the permutation from right to left. Thus, if we want to
determine the impact on 2, we read from right to left and see that 2 goes to 4. By contrast,
if we had the permutation: (1, 3)(1, 2) then this permutation would take 2 to 1 first and then
1 to 3 thus 2 would be mapped to 3. The number 1 would be first mapped to 2 and then
stop. The number 3 would be mapped to 1. Thus we can see that (1, 3)(1, 2) has the same
action as the permutation (1, 2, 3).
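The right-to-left convention is easy to get backwards, so here is a small sketch that composes permutations written in cycle notation. It assumes the underlying set is {1, . . . , n} and stores a permutation as a dictionary from each element to its image; the helper names are our own.

```python
from typing import Dict, Sequence, Tuple

Perm = Dict[int, int]


def cycle_to_map(cycle: Tuple[int, ...], n: int) -> Perm:
    """Turn a cycle such as (1, 2, 3) into a map on {1, ..., n}; unlisted points are fixed."""
    perm = {i: i for i in range(1, n + 1)}
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        perm[a] = b
    return perm


def product(cycles: Sequence[Tuple[int, ...]], n: int) -> Perm:
    """Product of cycles read right to left: the rightmost cycle is applied first."""
    result = {i: i for i in range(1, n + 1)}
    for cycle in reversed(cycles):
        step = cycle_to_map(cycle, n)
        result = {i: step[result[i]] for i in result}
    return result


print(product([(1, 3), (2, 4)], 4)[2])                             # 4
print(product([(1, 3), (1, 2)], 3) == cycle_to_map((1, 2, 3), 3))  # True
```

Both printed lines reproduce the two computations in Example 4.23.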
Definition 4.24 (Symmetric Group). Consider a set V with n elements in it. The
permutation group Sn contains every possible permutation of the set with n elements.
Example 4.25. Consider the set V = {1, 2, 3}. The symmetric group on V is the set S3
and it contains the permutations:
(1) The identity: (1)(2)(3)
(2) (12)(3)
(3) (13)(2)
(4) (23)(1)
(5) (123)
(6) (132)
Proposition 4.26. For each n, |Sn | = n!.
Exercise 35. Prove Proposition 4.26
Definition 4.27 (Transposition). A permutation of the form (a1 , a2 ) is called a trans-
position.
Theorem 4.28. Every permutation can be expressed as the product of transpositions.
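Proof. Every permutation can be written as a product of disjoint cycles, so it suffices to show that a single cycle is a product of transpositions. Consider the permutation (a1, a2, . . . , an). We may write:
(4.4) $(a_1, a_2, \ldots, a_n) = (a_1, a_n)(a_1, a_{n-1}) \cdots (a_1, a_2)$
Observe the effect of these transpositions on ai. For i ≠ 1 and i ≠ n, read from right to left (the order in which the transpositions are applied): the transpositions $(a_1, a_2), \ldots, (a_1, a_{i-1})$ leave ai fixed, then $(a_1, a_i)$ maps ai to a1, and $(a_1, a_{i+1})$ maps a1 to $a_{i+1}$, which no remaining transposition moves. Thus ai maps to $a_{i+1}$, as we expect. If i = 1, then $(a_1, a_2)$ maps a1 to a2 and no further transposition moves a2. Finally, if i = n, then reading right to left, the only transposition containing an is the leftmost one, $(a_1, a_n)$, which maps an to a1. Thus Equation 4.4 holds. This completes the
proof.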
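Equation 4.4 can be checked mechanically. The sketch below (helper names are our own) builds the list of transpositions for a cycle, applies them right to left, and confirms that each ai is sent to $a_{i+1}$ and an is sent back to a1. Note that an n-cycle uses n − 1 transpositions.

```python
from typing import List, Sequence, Tuple


def cycle_as_transpositions(cycle: Sequence[int]) -> List[Tuple[int, int]]:
    """Equation 4.4: (a1, ..., an) = (a1, an)(a1, an-1) ... (a1, a2), listed left to right."""
    a1 = cycle[0]
    return [(a1, a) for a in reversed(cycle[1:])]


def apply_product(transpositions: Sequence[Tuple[int, int]], x: int) -> int:
    """Image of x under a product of transpositions, applied right to left."""
    for a, b in reversed(transpositions):
        if x == a:
            x = b
        elif x == b:
            x = a
    return x


cycle = (1, 2, 3, 4, 5)
ts = cycle_as_transpositions(cycle)
print(ts)  # [(1, 5), (1, 4), (1, 3), (1, 2)]
# Each a_i should be sent to a_{i+1}, and a_n back to a_1.
print(all(apply_product(ts, a) == cycle[(i + 1) % len(cycle)]
          for i, a in enumerate(cycle)))  # True
```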
Remark 4.29. The following theorem is useful for our work on matrices in the second
part of this chapter, but its proof is outside the scope of these notes. The interested reader
can see Chapter 2.2 of [Fra99].
Theorem 4.30. No permutation can be expressed as both a product of an even and an
odd number of transpositions.
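Definition 4.31 (Even/Odd Permutation). Let σ ∈ Sn be a permutation. If σ can be
expressed as a product of an even number of transpositions, then it is even; otherwise σ is odd. (By Theorem 4.30, this is well defined.) The
signature of the permutation is:
(4.5) $\operatorname{sgn}(\sigma) = \begin{cases} -1 & \sigma \text{ is odd} \\ 1 & \sigma \text{ is even} \end{cases}$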
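Combining the proof of Theorem 4.28 (a cycle of length L is a product of L − 1 transpositions) with Theorem 4.30 gives a practical way to compute sgn(σ): decompose σ into disjoint cycles and add up L − 1 over the cycles. A minimal sketch, assuming a permutation is stored as a dictionary from each element to its image:

```python
from itertools import permutations
from typing import Dict, Hashable


def sgn(sigma: Dict[Hashable, Hashable]) -> int:
    """Signature of a permutation via its disjoint cycle decomposition.

    A cycle of length L is a product of L - 1 transpositions (Equation 4.4),
    so sigma is even exactly when the total of (L - 1) over its cycles is even.
    """
    seen = set()
    transpositions = 0
    for start in sigma:
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:     # walk around the cycle containing start
            seen.add(x)
            x = sigma[x]
            length += 1
        transpositions += length - 1
    return -1 if transpositions % 2 else 1


# The six elements of S3 (Example 4.25): three are even, three are odd.
S3 = [dict(zip((1, 2, 3), image)) for image in permutations((1, 2, 3))]
print(sorted(sgn(s) for s in S3))  # [-1, -1, -1, 1, 1, 1]
```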
Remark 4.32. Let G = (V, E) be a graph. If f ∈ Aut(G), then f is a permutation on
the vertices of G. Thus the graph automorphism group is just a permutation group that
respects vertex adjacency.
Example 4.33. Consider the graph K3, the complete graph on 3 vertices (see Figure
4.2(a)). The graph K3 has six automorphisms, one for each element of S3, the set of all
permutations on 3 objects. These automorphisms are (i) the identity automorphism that
maps all vertices to themselves, which is the permutation (1)(2)(3); (ii) the automorphism
that exchanges vertex 1 and 2, which is the permutation (1, 2)(3); (iii) the automorphism
that exchanges vertex 1 and 3, which is the permutation (1, 3)(2); (iv) the automorphism
that exchanges vertex 2 and 3, which is the permutation (1)(2, 3); (v) the automorphism
that sends vertex 1 to 2 and 2 to 3 and 3 to 1, which is the permutation (1, 2, 3); and (vi)
the automorphism that sends vertex 1 to 3 and 3 to 2 and 2 to 1, which is the permutation
(1, 3, 2).
Figure 4.2. The graph K3 has six automorphisms, one for each element in S3,
the set of all permutations on 3 objects. These automorphisms are (i) the identity
automorphism that maps all vertices to themselves; (ii) the automorphism that
exchanges vertex 1 and 2; (iii) the automorphism that exchanges vertex 1 and 3;
(iv) the automorphism that exchanges vertex 2 and 3; (v) the automorphism that
sends vertex 1 to 2 and 2 to 3 and 3 to 1; and (vi) the automorphism that sends
vertex 1 to 3 and 3 to 2 and 2 to 1. Panels: (a) K3, with vertices 1, 2 and 3;
(b) Symmetries: the counter-clockwise rotation (1, 2, 3), the clockwise rotation
(1, 3, 2), and the flips (1, 2)(3), (1, 3)(2) and (2, 3)(1).
Notice that each of these automorphisms is illustrated by a symmetry in the graphical
representation of K3 . The permutations (1, 2)(3), (1, 3)(2), and (2, 3)(1) are flips about
an axis of symmetry, while the permutations (1, 2, 3) and (1, 3, 2) are rotations. This is
illustrated in Figure 4.2(b).
It should be noted that this method of drawing a graph to find its automorphism group
does not work in general, but for some graphs (like complete graphs or cycle graphs) this
can be useful.
Exercise 36. Characterize the automorphism group of the cycle graph C4 .
Lemma 4.34. The automorphism group of Kn is Sn , thus |Aut(Kn )| = n!.
Exercise 37. Prove Lemma 4.34
Definition 4.35 (Star Graph). A star graph on n + 1 vertices (unfortunately denoted
Sn ) is a graph with vertex set V = {v0 , . . . , vn } and edge set E so that:
e ∈ E ⇐⇒ e = {v0, vi} for some i ∈ {1, . . . , n}
Thus the graph Sn has n + 1 vertices and n edges.
Remark 4.36. It is unfortunate that the symmetric group on n items and the star graph
with n + 1 vertices have the same notation. We will differentiate between the two
explicitly to prevent confusion. It is also worth noting that some references define the star
graph Sn to have n vertices and n − 1 edges.
Example 4.37. The star graph S3 with 4 vertices and 3 edges is shown in Figure 4.3 as
is the graph S9 .
Figure 4.3. (a) The star graph S3; (b) the star graph S9.
Exercise 38. Show that the automorphism group of the star graph S3 is also identical
to the symmetric permutation group S3 . As a result, show that two non-isomorphic graphs
can share an automorphism group. (Remember Aut(K3 ) is also the symmetric permutation
group on 3 elements.)
Exercise 39 (Project). Study the problem of graph automorphism in detail. Explore
the computational complexity of determining the automorphism group of a graph or family
of graphs. Explore any automorphism groups for specific types of graphs like cycle graphs,
star graphs, hypercubes etc.
Definition 4.46. Two vectors x and y are orthogonal if x · y = 0. (Here 0 is the zero
in the field over which the vectors are defined.)
Definition 4.47 (Matrix Multiplication). If A ∈ Rm×n and B ∈ Rn×p , then C = AB
is the matrix product of A and B and
(4.7) Cij = Ai· · B·j
Note, Ai· ∈ R1×n (an n-dimensional vector) and B·j ∈ Rn×1 (another n-dimensional vector),
thus making the dot product meaningful.
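Example 4.48.
(4.8) $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 1(5) + 2(7) & 1(6) + 2(8) \\ 3(5) + 4(7) & 3(6) + 4(8) \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}$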
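Equation 4.7 translates directly into code. The sketch below is a plain-Python illustration (matrices as lists of rows, function name our own) that computes each entry as the dot product of a row of A with a column of B and reproduces Example 4.48; in practice one would use a numerical library instead.

```python
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """C = AB with C[i][j] = (row i of A) . (column j of B), as in Equation 4.7."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("A must have as many columns as B has rows")
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```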
Definition 4.49 (Matrix Transpose). If A ∈ Rm×n is an m × n matrix, then the transpose
of A, denoted AT, is an n × m matrix defined as:
(4.9) ATij = Aji
Example 4.50.
(4.10) $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}^T = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$
The matrix transpose is a particularly useful operation and makes it easy to transform
column vectors into row vectors, which enables multiplication. For example, suppose x is
an n × 1 column vector (i.e., x is a vector in Rn ) and suppose y is an n × 1 column vector.
Then:
(4.11) x · y = xT y
Exercise 40. Let A, B ∈ Rn×n. Prove by example that, in general, AB ≠ BA; that is, matrix
multiplication is not commutative. [Hint: Almost any pair of matrices you pick (that can be
multiplied) will not commute.]
Exercise 41. Let A ∈ Rm×n and let, B ∈ Rn×p . Use the definitions of matrix multipli-
cation and transpose to prove that:
(4.12) (AB)T = BT AT
[Hint: Note that Cij = Ai· · B·j , which moves to the (j, i) position. Now figure out what is
in the (j, i) position of BT AT .]
3. Special Matrices and Vectors
Definition 4.51 (Identity Matrix). The n × n identity matrix is:
(4.13) $I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$
Definition 4.52 (Zero Matrix). The n × n zero matrix is an n × n matrix consisting entirely of
0s.
Exercise 42. Let A ∈ Rn×n. Show that AIn = InA = A. Hence, In is an identity for
the matrix multiplication operation on square matrices. [Hint: Do the multiplication out
long hand.]
Definition 4.53 (Symmetric Matrix). Let M ∈ Rn×n be a matrix. The matrix M is
symmetric if M = MT .
Definition 4.54 (Invertible Matrix). Let A ∈ Rn×n be a square matrix. If there is a
matrix A−1 such that
(4.14) AA−1 = A−1 A = In
then matrix A is said to be invertible (or nonsingular ) and A−1 is called its inverse. If A is
not invertible, it is called a singular matrix.
4. Matrix Representations of Graphs
Definition 4.55 (Adjacency Matrix). Let G = (V, E) be a graph and assume that
V = {v1 , . . . , vn }. The adjacency matrix of G is an n × n matrix M defined as:
$M_{ij} = \begin{cases} 1 & \{v_i, v_j\} \in E \\ 0 & \text{else} \end{cases}$
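Building the adjacency matrix from an edge list is a one-line loop. A NumPy sketch, assuming vertices are numbered 1, . . . , n and edges are given as pairs of indices (the function name is our own); the five edges used here are read off the matrix in Example 4.58.

```python
import numpy as np


def adjacency_matrix(n, edges):
    """Adjacency matrix of a simple graph on vertices v1, ..., vn (Definition 4.55).

    Edges are given as pairs (i, j) of 1-based vertex indices.
    """
    M = np.zeros((n, n), dtype=int)
    for i, j in edges:
        M[i - 1, j - 1] = 1
        M[j - 1, i - 1] = 1   # {vi, vj} is unordered, so M is symmetric
    return M


M = adjacency_matrix(4, [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)])
print(M)
print((M == M.T).all(), M.sum(axis=1))  # symmetric; row sums 3, 2, 2, 3 are the degrees
```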
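Let:
(4.17) $M^k_{i\cdot} = \begin{bmatrix} r_1 & \cdots & r_n \end{bmatrix}$
where rl, (l = 1, . . . , n), is the number of walks of length k from vi to vl by the induction
hypothesis. Let:
$M_{\cdot j} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$
where bl = 1 if {vl, vj} ∈ E and bl = 0 otherwise. Then:
$M^{k+1}_{ij} = M^k_{i\cdot} \cdot M_{\cdot j} = r_1 b_1 + \cdots + r_n b_n$
This is the total number of walks of length k leading to a vertex vl, (l = 1, . . . , n), from
vertex vi such that there is also an edge connecting vl to vj. Thus $M^{k+1}_{ij}$ is the number of
walks of length k + 1 from vi to vj. The result follows by induction.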
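Example 4.58. Consider the graph in Figure 4.4. The adjacency matrix for this graph
is:
(4.20) $M = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}$
Consider M2:
(4.21) $M^2 = \begin{bmatrix} 3 & 1 & 1 & 2 \\ 1 & 2 & 2 & 1 \\ 1 & 2 & 2 & 1 \\ 2 & 1 & 1 & 3 \end{bmatrix}$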
This tells us that there are three distinct walks of length 2 from vertex v1 to itself. These
walks are obvious:
(1) (v1, {v1, v2}, v2, {v1, v2}, v1)
(2) (v1, {v1, v3}, v3, {v1, v3}, v1)
(3) (v1, {v1, v4}, v4, {v1, v4}, v1)
We also see there is 1 walk of length 2 from v1 to v2: (v1, {v1, v4}, v4, {v2, v4}, v2). We can
verify each of the other numbers of walks in M2 in the same way.
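Theorem 4.57 can also be checked by brute force for small k. The sketch below (our own helper, using NumPy and 0-based vertex indices) counts walks by trying every vertex sequence, which is valid for simple graphs because a walk is then determined by its vertices, and compares the counts with the entries of $M^k$ for the matrix in Equation 4.20.

```python
from itertools import product

import numpy as np

M = np.array([[0, 1, 1, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 1, 1, 0]])


def count_walks(M, i, j, k):
    """Count walks of length k from vertex i to vertex j (0-based) by brute force.

    Try every vertex sequence i, u1, ..., u_{k-1}, j and keep those whose
    consecutive vertices are adjacent.
    """
    n = M.shape[0]
    total = 0
    for middle in product(range(n), repeat=k - 1):
        seq = (i, *middle, j)
        if all(M[a, b] == 1 for a, b in zip(seq, seq[1:])):
            total += 1
    return total


M2 = np.linalg.matrix_power(M, 2)
print(M2[0, 0], M2[0, 1])  # 3 1, as in Example 4.58
for k in range(1, 5):
    Mk = np.linalg.matrix_power(M, k)
    assert all(Mk[i, j] == count_walks(M, i, j, k) for i in range(4) for j in range(4))
```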
Exercise 44. Devise an inefficient test for isomorphism between two graphs G and G′
using their adjacency matrix representations. Assume it takes 1 time unit to test whether
two n × n matrices are equal. What is the maximum amount of time your algorithm takes to
determine that G ≇ G′? [Hint: Continue to re-order the vertices of G′ and test the adjacency
matrices for equality.]
Definition 4.59 (Directed Adjacency Matrix). Let G = (V, E) be a directed graph and
assume that V = {v1 , . . . , vn }. The adjacency matrix of G is an n × n matrix M defined as:
$M_{ij} = \begin{cases} 1 & (v_i, v_j) \in E \\ 0 & \text{else} \end{cases}$
Theorem 4.60. Let G = (V, E) be a digraph with V = {v1 , . . . , vn } and let M be its
adjacency matrix. For k ≥ 0, the (i, j) entry of Mk is the number of directed walks of length
k from vi to vj .
Exercise 45. Prove Theorem 4.60. [Hint: Use the approach in the proof of Theorem
4.57.]
Definition 4.61 (Incidence Matrix). Let G = (V, E) be a graph with V = {v1 , . . . , vm }
and E = {e1 , . . . , en }. Then the incidence matrix of G is an m × n matrix A with:
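(4.22) $A_{ij} = \begin{cases} 0 & \text{if } v_i \text{ is not in } e_j \\ 1 & \text{if } v_i \text{ is in } e_j \text{ and } e_j \text{ is not a self-loop} \\ 2 & \text{if } v_i \text{ is in } e_j \text{ and } e_j \text{ is a self-loop} \end{cases}$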
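A NumPy sketch of Definition 4.61, assuming 1-based vertex indices and edges listed as pairs, with (a, a) denoting a self-loop (the function name is our own). Every column sums to 2, and each row sums to the degree of its vertex under the usual convention that a self-loop contributes 2 to the degree.

```python
import numpy as np


def incidence_matrix(m, edges):
    """Incidence matrix of a graph on v1, ..., vm (Definition 4.61).

    edges[j] = (a, b) with 1-based endpoints; a == b denotes a self-loop.
    """
    A = np.zeros((m, len(edges)), dtype=int)
    for j, (a, b) in enumerate(edges):
        if a == b:
            A[a - 1, j] = 2          # self-loop
        else:
            A[a - 1, j] = 1
            A[b - 1, j] = 1
    return A


A = incidence_matrix(3, [(1, 2), (2, 3), (3, 3)])
print(A)
print(A.sum(axis=0))  # every column sums to 2
print(A.sum(axis=1))  # row sums 1, 2, 3 are the vertex degrees
```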
Remark 4.64. The incidence matrices of simple directed graphs (those with no self-loops)
have very useful properties, which we will come to when we study network flows. In
particular, these matrices have the property that every square sub-matrix has a determinant
that is either 1, −1 or 0. This property is called total unimodularity and it is particularly
important in the analysis of network flows.
Here σ ∈ Sn represents a permutation over the set {1, . . . , n} and σ(i) represents the value
to which i is mapped under σ.
Example 4.66. Consider an arbitrary 2 × 2 matrix:
$M = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$
There are only two permutations in the set S2: the identity permutation (which is even) and
the transposition (1, 2), which is odd. Thus, we have:
$\det(M) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = M_{11}M_{22} - M_{12}M_{21} = ad - bc$
This is the formula that one would expect from a course in matrices (like Math 220).
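Example 4.66 applies the permutation expansion of the determinant, $\det(M) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} M_{i\sigma(i)}$. The sketch below implements that sum directly (0-based indices, helper names our own) and compares it with NumPy's numpy.linalg.det on a 4 × 4 matrix chosen arbitrarily. To compute the sign it uses the standard fact that the parity of σ equals the parity of its number of inversions. With n! terms, this is for illustration only.

```python
from itertools import permutations
from math import prod

import numpy as np


def sign(sigma):
    """sgn of a permutation given as a tuple of images of 0, ..., n-1.

    The parity of the number of inversions (pairs i < j with
    sigma[i] > sigma[j]) equals the parity of sigma.
    """
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1


def leibniz_det(M):
    """Sum over sigma in S_n of sgn(sigma) * M[0][sigma(0)] * ... * M[n-1][sigma(n-1)]."""
    n = len(M)
    return sum(sign(sigma) * prod(M[i][sigma[i]] for i in range(n))
               for sigma in permutations(range(n)))


print(leibniz_det([[1, 2], [3, 4]]))  # -2, i.e. ad - bc = 1*4 - 2*3
B = [[2, 0, 1, 3], [1, 1, 0, 2], [0, 3, 1, 1], [4, 1, 2, 0]]
print(np.isclose(leibniz_det(B), np.linalg.det(np.array(B))))  # True
```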
Definition 4.67 (Eigenvalue and (Right) Eigenvector). Let M ∈ Rn×n. An eigenvalue,
eigenvector pair (λ, x) is a scalar and a nonzero n × 1 vector such that:
(4.25) Mx = λx
Remark 4.68. A left eigenvector is defined analogously with xT M = λxT , when x is
considered a column vector. We will deal exclusively with right eigenvectors and hence when
we say “eigenvector” we mean a right eigenvector.
Definition 4.69 (Characteristic Polynomial). If M ∈ Rn×n then its characteristic poly-
nomial is:
(4.26) det (λIn − M)
Remark 4.70. The following theorem is useful for computing eigenvalues of small matrices
and explains the role of the characteristic polynomial. Its proof is outside the scope
of these notes, but would occur in a Math 436 class. (See Chapter 8.2 of [Lan87].)
Theorem 4.71. A value λ is an eigenvalue for M ∈ Rn×n if and only if it satisfies the
characteristic equation:
det (λIn − M) = 0
Furthermore, M and MT share eigenvalues.
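Example 4.72. Consider the matrix:
$M = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$
The characteristic polynomial is computed as:
$\det(\lambda I_2 - M) = \begin{vmatrix} \lambda - 1 & 0 \\ 0 & \lambda - 2 \end{vmatrix} = (\lambda - 1)(\lambda - 2) - 0$
Thus the characteristic polynomial for this matrix is:
(4.27) $\lambda^2 - 3\lambda + 2$
The roots of this polynomial are λ1 = 1 and λ2 = 2. Using these eigenvalues, we can compute
eigenvectors:
(4.28) $x_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$
(4.29) $x_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$
and observe that:
(4.30) $Mx_1 = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = 1\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \lambda_1 x_1$
and
(4.31) $Mx_2 = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = 2\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \lambda_2 x_2$
as required. Computation of eigenvalues and eigenvectors is usually accomplished by computer,
and several algorithms have been developed. Interested readers should consult
(e.g.) [Dat95].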
Example 4.73. You can use Matlab to compute the eigenvalues of a matrix using the
eig command. The same command can also return the eigenvalues (as the diagonals of
a matrix) and the corresponding eigenvectors in a second matrix. An example is shown
in Figure 4.5. This command will return the eigenvalues when used as: d = eig(A) and
the eigenvalues and eigenvectors when used as [V D] = eig(A). The eigenvectors are the
columns of the matrix V.
Figure 4.5. Computing the eigenvalues and eigenvectors of a matrix in Matlab can
be accomplished with the eig command. This command will return the eigenvalues
when used as: d = eig(A) and the eigenvalues and eigenvectors when used as [V
D] = eig(A). The eigenvectors are the columns of the matrix V.
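For readers working in Python rather than Matlab, NumPy's numpy.linalg.eig plays a similar role. One difference is that it returns the eigenvalues as a one-dimensional array rather than as the diagonal of a matrix, so we build D ourselves. A sketch using the matrix of Example 4.72; numpy.poly, applied to a square array, returns the coefficients of its characteristic polynomial.

```python
import numpy as np

M = np.array([[1.0, 0.0],
              [0.0, 2.0]])

d = np.linalg.eigvals(M)        # eigenvalues only, analogous to d = eig(A)
w, V = np.linalg.eig(M)         # eigenvalues w and eigenvectors (the columns of V)
D = np.diag(w)                  # put w on a diagonal to mimic Matlab's D

print(sorted(d))                    # eigenvalues 1 and 2, as in Example 4.72
print(np.allclose(M @ V, V @ D))    # True: M V = V D, i.e. M x = lambda x column by column
print(np.poly(M))                   # coefficients 1, -3, 2 of the polynomial (4.27)
```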
Remark 4.74. It is important to remember that eigenvectors are only determined up to scale.
That is, if M is a square matrix and (λ, x) is an eigenvalue eigenvector pair for M, then so
is (λ, αx) for α ≠ 0. This is because:
(4.32) Mx = λx =⇒ M(αx) = λ(αx)
Definition 4.75 (Degenerate Eigenvalue). An eigenvalue is degenerate if it is a multiple
root of the characteristic polynomial. The multiplicity of the root is the multiplicity of the
eigenvalue.
Example 4.76. Consider the identity matrix I2. It has characteristic polynomial (λ − 1)²,
which has one multiple root 1. Thus λ = 1 is a degenerate eigenvalue for this matrix.
However, this matrix does have two linearly independent eigenvectors [1 0]T and [0 1]T.
Remark 4.77. The theory of eigenvalues and eigenvectors of matrices is deep and well
understood. A substantial part of this theory should be covered in Math 436, for those
interested. We will use only a few results in our study of graphs. The following results are
proved in Chapter 8 of [GR01]. Unfortunately, the proofs are well outside the scope of the
class.
Theorem 4.78 (Spectral Theorem for Real Symmetric Matrices). Let M ∈ Rn×n be a
symmetric matrix. Then the eigenvalues of M are all real.
Remark 4.80. The following theorem follows from the Spectral Theorem for Real Sym-
metric Matrices and the Rational Root Theorem.
Exercise 47 (Project). Prove the Spectral Theorem for Real Symmetric Matrices and
then use it to obtain Part 1 of Theorem 4.81. Then prove and apply Lemma 4.79 to prove
Part 2 of Theorem 4.81. You should discuss the proof of the Spectral Theorem for Hermitian
Matrices. [Hint: All these proofs are available in references or online, expand on these sources
in your own words.]
Remark 4.82. Two graphs that are not isomorphic can have the same set of eigenvalues.
This can be illustrated through an example that can be found in Chapter 8 of [GR01]. The
graphs are shown in Figure 4.6. We can see the two graphs are not isomorphic since there
is no vertex in Graph G1 that has a degree of 6 unlike Vertex 7 of graph G2. The adjacency
Figure 4.6. Two graphs with the same eigenvalues that are not isomorphic are
illustrated.
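Remark 4.82 is easy to check numerically. The pair in Figure 4.6 is not reproduced here; instead, the sketch below uses a smaller pair of our own choosing: the star graph S4 (in the notation of Definition 4.35, so 5 vertices and 4 edges) and the cycle C4 together with one isolated vertex. Both have eigenvalues −2, 0, 0, 0, 2, yet they are not isomorphic, since their degree sequences differ. We use numpy.linalg.eigvalsh because adjacency matrices are symmetric, so by Theorem 4.78 their eigenvalues are real; it returns them in ascending order.

```python
import numpy as np


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix on vertices 0, ..., n-1."""
    M = np.zeros((n, n))
    for i, j in edges:
        M[i, j] = M[j, i] = 1
    return M


star_S4 = adjacency(5, [(0, 1), (0, 2), (0, 3), (0, 4)])       # center 0
c4_plus_k1 = adjacency(5, [(0, 1), (1, 2), (2, 3), (3, 0)])    # vertex 4 isolated

spec_star = np.linalg.eigvalsh(star_S4)
spec_other = np.linalg.eigvalsh(c4_plus_k1)
print(np.round(spec_star, 6), np.round(spec_other, 6))
print(np.allclose(spec_star, spec_other))                          # True: same spectrum
print(sorted(star_S4.sum(axis=1)), sorted(c4_plus_k1.sum(axis=1)))  # degree sequences differ
```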
CHAPTER 5
Applications of Algebraic Graph Theory: Eigenvector Centrality and Page-Rank
Remark 5.1. In this chapter, we’re going to explore two applications of Algebraic Graph
Theory: Eigenvector Centrality and Page-Rank. The goal is to devise ways for ranking the
vertices of a graph. This topic is actually very important. Google uses Page-Rank to rank
the search results they return. Social scientists have used eigenvector centrality as a way of
determining leaders in organizations. We’ll first review a key set of definitions from Linear
Algebra and then discuss eigenvector centrality. We then move on to Markov chains and
Page-Rank.
1. Basis of Rn
Remark 5.2. We will be considering the field R and vectors defined over it. By this, we
mean n × 1 matrices, which are just column vectors. Therefore, by a vector in Rn we really
mean a matrix x ∈ Rn×1 .
Definition 5.3. Let x1, . . . , xm be vectors in Rn and let α1, . . . , αm ∈ R be scalars.
Then
(5.1) α1 x1 + · · · + αm xm
is a linear combination of the vectors x1 , . . . , xm .
Definition 5.4 (Span). Let X = {x1, . . . , xm} be a set of vectors in Rn; then the
span of X is the set:
(5.2) span(X ) = {y ∈ Rn |y is a linear combination of vectors in X }
Definition 5.5 (Linear Independence). Let x1, . . . , xm be vectors in Rn. The vectors
x1, . . . , xm are linearly dependent if there exist α1, . . . , αm ∈ R, not all zero, such that
(5.3) α1x1 + · · · + αmxm = 0
If the set of vectors x1, . . . , xm is not linearly dependent, then they are linearly independent
and Equation 5.3 holds just in case αi = 0 for all i = 1, . . . , m.
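Example 5.6. In R3, consider the vectors:
$x_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad x_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$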
We can show these vectors are linearly independent: Suppose there are values α1 , α2 , α3 ∈ R
such that
α1 x1 + α2 x2 + α3 x3 = 0
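Then:
$\begin{bmatrix} \alpha_1 \\ \alpha_1 \\ 0 \end{bmatrix} + \begin{bmatrix} \alpha_2 \\ 0 \\ \alpha_2 \end{bmatrix} + \begin{bmatrix} 0 \\ \alpha_3 \\ \alpha_3 \end{bmatrix} = \begin{bmatrix} \alpha_1 + \alpha_2 \\ \alpha_1 + \alpha_3 \\ \alpha_2 + \alpha_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$
Thus we have the system of linear equations:
α1 + α2 = 0
α1 + α3 = 0
α2 + α3 = 0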
Solving this problem yields the unique solution: α1 = α2 = α3 = 0. Thus these vectors are
linearly independent.
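Equivalently, x1, x2, x3 are linearly independent exactly when the 3 × 3 matrix having them as columns has rank 3 (equivalently, a nonzero determinant), so that the homogeneous system above has only the trivial solution. A short NumPy check:

```python
import numpy as np

# Columns are x1, x2, x3 from Example 5.6.
X = np.column_stack([[1, 1, 0], [1, 0, 1], [0, 1, 1]]).astype(float)

print(np.linalg.matrix_rank(X))         # 3: the columns are linearly independent
print(np.linalg.det(X))                 # approximately -2, which is nonzero
alpha = np.linalg.solve(X, np.zeros(3))
print(alpha)                            # the only solution of X alpha = 0 is alpha = 0
```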
Definition 5.7 (Basis). Let X = {x1 , . . . , xm } be a set of vectors in Rn . The set X is
called a basis of Rn if X is a linearly independent set of vectors and every vector in Rn is
in the span of X . That is, for any vector w ∈ Rn we can find scalar values α1 , . . . , αm such
that
(5.4) $w = \sum_{i=1}^{m} \alpha_i x_i$
This leads to n equations, one for each vertex in V (or each row of M). Written as a matrix
expression we have:
(5.6) $x = \frac{1}{\lambda} M x \implies \lambda x = M x$
Thus x is an eigenvector of M and λ is its eigenvalue.
Clearly, there may be several eigenvectors and eigenvalues for M. The question is, which
eigenvalue / eigenvector pair should be chosen? The answer is to choose the eigenvector with
all positive entries corresponding to the largest eigenvalue. We know such an eigenvalue /
eigenvector pair exists and is unique as a result of the Perron-Frobenius Theorem and Lemma
4.84.
Theorem 5.14. Let G = (V, E) be a connected graph with adjacency matrix M ∈ Rn×n.
Suppose that λ0 is the largest real eigenvalue of M and has corresponding eigenvector v0. If
x ∈ Rn×1 is a column vector so that x · v0 ≠ 0, then
(5.7) $\lim_{k \to \infty} \frac{M^k x}{\lambda_0^k} = \alpha_0 v_0$
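Proof. Applying Theorem 5.12 we see that the eigenvectors of M must form a basis for
Rn. Thus, we can express:
(5.8) $x = \alpha_0 v_0 + \alpha_1 v_1 + \cdots + \alpha_{n-1} v_{n-1}$
Multiplying both sides by Mk yields:
(5.9) $M^k x = \alpha_0 M^k v_0 + \alpha_1 M^k v_1 + \cdots + \alpha_{n-1} M^k v_{n-1} = \alpha_0 \lambda_0^k v_0 + \alpha_1 \lambda_1^k v_1 + \cdots + \alpha_{n-1} \lambda_{n-1}^k v_{n-1}$
because $M^k v_i = \lambda_i^k v_i$ for any eigenvector vi. Dividing by $\lambda_0^k$ yields:
(5.10) $\frac{M^k x}{\lambda_0^k} = \alpha_0 v_0 + \alpha_1 \frac{\lambda_1^k}{\lambda_0^k} v_1 + \cdots + \alpha_{n-1} \frac{\lambda_{n-1}^k}{\lambda_0^k} v_{n-1}$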
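Applying the Perron-Frobenius Theorem (and Lemma 4.84) we see that λ0 is greater than
the absolute value of any other eigenvalue (this strict inequality additionally requires that G
is not bipartite: for a connected bipartite graph, −λ0 is also an eigenvalue and the limit need
not exist), and thus we have:
(5.11) $\lim_{k \to \infty} \frac{\lambda_i^k}{\lambda_0^k} = 0$
for i ≠ 0. Thus:
(5.12) $\lim_{k \to \infty} \frac{M^k x}{\lambda_0^k} = \alpha_0 v_0$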
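Theorem 5.14 suggests the power method: multiply repeatedly by M and rescale. The sketch below rescales so that the entries sum to 1 (the normalization of Equation 5.15, valid here because the iterates stay nonnegative) instead of dividing by $\lambda_0^k$, since λ0 is not known in advance. It uses the graph of Example 4.58, which is connected and, because it contains the triangle v1, v2, v4, not bipartite. The function name and the iteration count are arbitrary choices of ours.

```python
import numpy as np

M = np.array([[0, 1, 1, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 1, 1, 0]], dtype=float)


def power_iteration(M, x0, iterations=200):
    """Approximate the Perron eigenpair (lambda0, v0) of a nonnegative matrix M.

    Starts from x0 (which must satisfy x0 . v0 != 0) and rescales so the
    entries of the iterate sum to 1 after every multiplication.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iterations):
        x = M @ x
        x = x / x.sum()
    lam = (M @ x).sum() / x.sum()   # since M x is approximately lambda0 * x
    return lam, x


lam, x = power_iteration(M, [1, 0, 0, 0])   # start at vertex v1
print(lam)  # approximately 2.5616, i.e. (1 + sqrt(17)) / 2
print(x)    # approximately [0.2808 0.2192 0.2192 0.2808]
```

As the symmetry of the graph suggests, v1 and v4 receive the same score, as do v2 and v3.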
Remark 5.15. We can use Theorem 5.14 to justify our definition of eigenvector centrality
as the eigenvector corresponding to the largest eigenvalue. Let x be a vector with a 1 at
index i and 0 everywhere else. This vector corresponds to beginning at vertex vi in graph
G with n vertices. If M is the adjacency matrix, then Mx is the ith column of M, whose
jth entry tells us the number of walks of length 1 leading from vertex vj to vertex vi and, by
symmetry, the number of walks leading from vertex vi to vertex vj. We can repeat this logic
to see that Mkx gives us a vector whose jth element is the number of walks of length k
from vi to vj. Note that for the remainder of this discussion, we will exploit the symmetry that
the (i, j) element of Mk is both the number of walks from i to j and the number of walks
from j to i.
From Theorem 5.14 we know that, no matter which vertex we choose in creating x:
(5.13) $\lim_{k \to \infty} \frac{M^k x}{\lambda_0^k} = \alpha_0 v_0$
The probability of ending up at vertex j then is the j th element of the vector T1 Mk x. Put
more intuitively, if we start at a vertex and begin wandering along the edges of the graph at
random, the probability of ending up at vertex j after k steps is the j th element of T1 Mk x.
If we let |x|1 be the sum of the components of a vector x, then from Equation 5.13 we
may deduce:
(5.15) $\lim_{k \to \infty} \frac{M^k x}{|M^k x|_1} = \frac{v_0}{|v_0|_1}$
Thus, eigenvector centrality tells us a kind of probability of ending up at a specific vertex
j assuming we are allowed to wander along the edges of a graph (from any starting vertex)
for an infinite amount of time. Thus, the more central a vertex is, the more likely we will
arrive at it as we move through the edges of the graph. More importantly, we see the
self-referential nature of eigenvector centrality: the more likely we are to arrive at a given
vertex after walking along the edges of a graph, the more likely we are to arrive at one of its
neighbors. Thus, eigenvector centrality is a legitimate measure of vertex importance if one
wishes to measure the chances of ending up at a vertex when wandering around a graph.
We will discuss this type of model further when we investigate random walks on graphs.
Example 5.16. Consider the graph shown in Figure 5.1. Recall from Example 4.58 this
Figure 5.1. A graph with 4 vertices and 5 edges. Intuitively, vertices 1 and 4
should have the same eigenvector centrality score, and so should vertices 2 and 3.
for all v ∈ V. Here, No(v) is the neighborhood reachable by out-edge from v. If there is no
edge (v, v′) ∈ E then p(v, v′) = 0.
Remark 5.19. There are continuous time Markov chains, but these are not in the scope
of these notes. When we say Markov chain, we mean discrete time Markov chain.
Example 5.20. A simple Markov chain is shown in Figure 5.2. We can think of a
Markov chain as governing the evolution of state as follows. Think of the states as cities
with airports. If there is an out-edge connecting the current city to another city, then we
can fly from our current city to this next city and we do so with some probability. When
we do fly (or perhaps don’t fly and remain at the current location) our state updates to the
next city. In this case, time is treated discretely.
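Figure 5.2. A Markov chain is a directed graph to which we assign edge proba-
bilities so that the sum of the probabilities of the out-edges at any vertex is always
1. (The chain shown has two states: from state 1 we stay with probability 1/2 and
move to state 2 with probability 1/2; from state 2 we move to state 1 with probability
1/7 and stay with probability 6/7.)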
A walk along the vertices of a Markov chain governed by the probability function is called
a random walk.
Definition 5.21 (Stochastic Matrix). Let M = (G, p) be a Markov chain. Then the
stochastic matrix (or probability transition matrix) of M is:
(5.17) Mij = p(vi , vj )
Example 5.22. The stochastic matrix for the Markov chain in Figure 5.2 is:
$M = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{7} & \frac{6}{7} \end{bmatrix}$
Thus a stochastic matrix is very much like an adjacency matrix where the 0’s and 1’s indi-
cating the presence or absence of an edge are replaced by the probabilities associated to the
edges in the Markov chain.
Definition 5.23 (State Probability Vector). If M = (G, p) is a Markov chain with n
states (vertices) then a state probability vector is a vector x ∈ Rn×1 such that x1 + x2 + · · · +
xn = 1 and xi ≥ 0 for i = 1, . . . , n and xi represents the probability that we are in state i
(at vertex i).
Remark 5.24. The next theorem can be proved in exactly the same way that Theorem
4.57 is proved.
Theorem 5.25. Let M = (G, p) be a Markov chain with n states (vertices). Let x(0) ∈
Rn×1 be an (initial) state probability vector. Then assuming we take a random walk of length
k in M using initial state probability vector x(0), the final state probability vector is:
(5.18) $x^{(k)} = \left(M^T\right)^k x^{(0)}$
Remark 5.26. If you prefer to remove the transpose, you can write x(0) ∈ R1×n ; that is,
x(0) is a row vector. Then:
(5.19) x(k) = x(0) Mk
with x(k) ∈ R1×n .
Exercise 49. Prove Theorem 5.25. [Hint: Use the same inductive argument from the
proof of Theorem 4.57.]
Example 5.27. Consider the Markov chain in Figure 5.2. The state vector:
$x^{(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$
states that we will start in State 1 with probability 1. From Example 5.22 we know what
M is. Then it is easy to see that:
$x^{(1)} = M^T x^{(0)} = \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \end{bmatrix}$
This is precisely the state probability vector we would expect after a random walk of length
1 in M.
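Theorem 5.25 is easy to try out on the two-state chain of Figure 5.2. The NumPy sketch below (floating-point probabilities, variable names our own) rebuilds M from Example 5.22, checks the row sums stated in Lemma 5.30, reproduces $x^{(1)}$ from Example 5.27, and evolves the state probability vector for a few longer walks.

```python
import numpy as np

# Stochastic matrix of Example 5.22: row i holds the out-edge probabilities of state i.
M = np.array([[1/2, 1/2],
              [1/7, 6/7]])

print(M.sum(axis=1))            # every row sums to 1 (Lemma 5.30)

x0 = np.array([1.0, 0.0])       # start in State 1 with probability 1
x1 = M.T @ x0                   # Equation 5.18 with k = 1
print(x1)                       # [0.5 0.5], as in Example 5.27

# Longer walks: x^(k) = (M^T)^k x^(0); each x^(k) is again a probability vector.
for k in (2, 5, 20):
    xk = np.linalg.matrix_power(M.T, k) @ x0
    print(k, xk, xk.sum())
```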
Definition 5.28 (Stationary Vector). Let M = (G, p) be a Markov chain. Then a
vector x∗ is stationary for M if
(5.20) x∗ = MT x∗
Remark 5.29. Expression 5.20 should look familiar. It says that MT has an eigenvalue
of 1 and a corresponding eigenvector whose entries are all non-negative (so that the vector
can be scaled so its components sum to 1). Furthermore, this looks very similar to the
equation we used for eigenvector centrality.
Lemma 5.30. Let M = (G, p) be a Markov chain with n states and with stochastic matrix
M. Then:
(5.21) $\sum_{j} M_{ij} = 1$
for all i = 1, . . . , n.
Exercise 50. Prove Lemma 5.30.
Lemma 5.31. Let M = (G, p) be a Markov chain with n states and with stochastic matrix
M. If G is strongly connected, then M and MT are irreducible.
Proof. If G is strongly connected, then there is a directed walk from any vertex vi to
any other vertex vj in V , the vertex set of G. Consider any length k walk connecting vi to
vj (such a walk exists for some k). Let ei be the vector with 1 in its ith component and 0
everywhere else. Then (MT )k ei is the final state probability vector associated with a walk
of length k starting at vertex vi . Since there is a walk of length k from vi to vj , we know
that the j th element of this vector must be non-zero. That is:
eTj (MT )k ei > 0
where ej is defined just as ei is but with the 1 at the jth position. Thus, for every (i, j) pair
there is some k with $\left((M^T)^k\right)_{ji} > 0$, and thus MT is irreducible. The fact that M is irreducible follows
immediately from the fact that $(M^T)^k = (M^k)^T$. This completes the proof.
Theorem 5.32 (Perron-Frobenius Theorem Redux). If M is an irreducible (nonnegative) matrix, then
M has an eigenvalue λ0 with the following properties:
(1) The eigenvalue λ0 is positive and if λ is an alternative eigenvalue of M, then λ0 ≥
|λ|,
(2) The matrix M has an eigenvector v0 corresponding to λ0 with only positive entries,
(3) The eigenvalue λ0 is a simple root of the characteristic equation for M and therefore
has a unique (up to scale) eigenvector v0.
(4) The eigenvector v0 is the only eigenvector of M that can have all positive entries
when properly scaled.
(5) The following inequalities hold:
$\min_i \sum_j M_{ij} \le \lambda_0 \le \max_i \sum_j M_{ij}$
Therefore, by the squeezing lemma λ0 = 1. The fact that MT has exactly one strictly
positive eigenvector v0 corresponding to λ0 = 1 means that:
(5.22) MT v0 = v0
Thus v0 is the unique stationary state probability vector for M = (G, p). This completes
the proof.
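Numerically, the stationary vector of Equation 5.20 can be found as an eigenvector of MT for the eigenvalue 1, rescaled so its entries sum to 1. A sketch for the two-state chain of Figure 5.2, which is strongly connected since each state has an edge to the other, using numpy.linalg.eig; picking the computed eigenvalue closest to 1 is our way of coping with rounding error.

```python
import numpy as np

M = np.array([[1/2, 1/2],
              [1/7, 6/7]])   # the chain of Example 5.22

# A stationary vector solves M^T x* = x* (Equation 5.20): an eigenvector of M^T for eigenvalue 1.
w, V = np.linalg.eig(M.T)
idx = np.argmin(np.abs(w - 1.0))        # locate the eigenvalue (numerically) equal to 1
v0 = np.real(V[:, idx])
v0 = v0 / v0.sum()                      # rescale so the entries sum to 1

print(v0)                               # approximately [0.2222 0.7778], i.e. [2/9, 7/9]
print(np.allclose(M.T @ v0, v0))        # True
```

For this chain, applying a high power of MT to any state probability vector gives approximately the same vector.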
4. Page Rank
Definition 5.34 (Induced Markov Chain). Let G = (V, E) be a graph. Then the induced
Markov chain from G is the one obtained by defining a new directed graph G′ = (V, E′) with
each edge {v, v′} ∈ E replaced by two directional edges (v, v′) and (v′, v) in E′ and defining
the probability function p so that:
(5.23) $p(v, v') = \frac{1}{\deg^{out}_{G'}(v)}$
Example 5.35. An induced Markov chain is shown in Figure 5.3. The Markov chain in
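Figure 5.3. A graph on vertices 1, 2, 3 and 4 and its induced Markov chain. The out-edges
of vertex 1 have probability 1/3, the out-edges of vertices 2 and 3 have probability 1/2, and
the single out-edge of vertex 4 has probability 1.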
and we would be correct. When this convergence happens quickly (where we leave quickly
poorly defined) the graph is said to have a fast mixing property.
If we used the stationary probability of a vertex in the induced Markov chain as a measure
of importance, then clearly vertex 1 would be most important followed by vertices 2 and 3
and lastly vertex 4. We can compare this with the eigenvector centrality measure, which
assigns a rank vector of:
$x^{+} = \begin{bmatrix} 0.3154488065 \\ 0.2695944375 \\ 0.2695944375 \\ 0.1453623195 \end{bmatrix}$
Thus eigenvector centrality gives the same ordinal ranking as using the stationary state
probability vector, but there are subtle differences in the values produced by these two
ranking schemes. This leads us to PageRank [BP98].
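The comparison above can be reproduced numerically. The sketch below assumes the graph of Figure 5.3 has edges {1, 2}, {1, 3}, {1, 4} and {2, 3}, read off the edge probabilities in the figure (vertex 1 has degree 3, vertices 2 and 3 have degree 2, vertex 4 has degree 1). It builds the induced Markov chain of Definition 5.34, whose stochastic matrix divides each row of the adjacency matrix by the degree of that vertex, and computes its stationary vector alongside the eigenvector centrality vector. For this chain the stationary probabilities come out proportional to the vertex degrees, a standard fact about random walks on connected undirected graphs.

```python
import numpy as np

# Assumed edge list for the graph of Figure 5.3 (vertices 1..4 stored as 0..3).
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1

deg = A.sum(axis=1)
P = A / deg[:, None]      # induced Markov chain: p(v, v') = 1 / deg(v), Equation 5.23

# Stationary vector: eigenvector of P^T for eigenvalue 1, scaled to sum to 1.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()

# Eigenvector centrality: eigenvector of A for its largest eigenvalue, scaled to sum to 1.
vals, vecs = np.linalg.eigh(A)          # eigh returns eigenvalues in ascending order
x_plus = vecs[:, -1] / vecs[:, -1].sum()

print(pi)       # approximately [0.375 0.25 0.25 0.125], i.e. deg / (2 |E|)
print(x_plus)   # approximately [0.3154 0.2696 0.2696 0.1454], matching x+ above
```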
Remark 5.36. Consider a collection of web pages each with links. We can construct a
directed graph G with the vertex set V consisting of the web pages and E consisting of
the directed links among the pages. Imagine a random web surfer who will click among these
web pages following links until a dead-end is reached (a page with no outbound links). In
this case, the web surfer will type a new URL in (chosen from the set of web pages available)
and the process will continue.
From this model, we can induce a Markov chain in which we define a new graph G' with edge set E' so that if v ∈ V has out-degree 0, then we create an edge in E' to every other vertex in V and we then define:
\[ p(v, v') = \frac{1}{\deg_{\mathrm{out}}^{G'}(v)} \tag{5.24} \]
exactly as before. In the absence of any further insight, the PageRank algorithm simply
assigns to each web page a score equal to the stationary probability of its state in the
induced Markov chain. For the remainder of this remark, let M be the stochastic matrix of
the induced Markov chain.
In general, however, PageRank assumes that surfers will get bored after some number of clicks (or new URLs) and will stop (and move to a new page) with probability 1 − d, where d ∈ [0, 1] is called the damping factor. This factor is usually estimated. Assuming there are n web pages, let $r \in \mathbb{R}^{n \times 1}$ be the vector of PageRank scores. Taking boredom into account leads to a new expression for rank (similar to Equation 5.5 for Eigenvector centrality):
\[ r_i = \frac{1-d}{n} + d \left( \sum_{j=1}^{n} M_{ji} r_j \right) \quad \text{for } i = 1, \ldots, n \tag{5.25} \]
Here the d term acts like a damping factor on walks through the Markov chain. In essence,
it stalls people as they walk, making it less likely a searcher will keep walking forever. The
original System of Equations 5.25 can be written in matrix form as:
\[ r = \frac{1-d}{n} \mathbf{1} + d M^T r \tag{5.26} \]
where 1 is an n × 1 vector consisting of all 1’s. It is easy to see that when d = 1, r is precisely the stationary state probability vector for the induced Markov chain. When d ≠ 1, r is usually computed iteratively by starting with an initial value of $r_i^{(0)} = 1/n$ for all i = 1, . . . , n and computing:
\[ r^{(k)} = \frac{1-d}{n} \mathbf{1} + d M^T r^{(k-1)} \]
The reason is that for large n, the analytic solution:
\[ r = \frac{1-d}{n} \left( I_n - d M^T \right)^{-1} \mathbf{1} \tag{5.27} \]
is not computationally tractable.¹
¹Note that $(I_n - dM^T)^{-1}$ is a matrix inverse, which we reviewed briefly in Chapter 4. For a stochastic matrix M and 0 ≤ d < 1, this inverse is guaranteed to exist. For those interested, please consult any of [Dat95, Lan87, Mey01].
Example 5.37. Consider the induced Markov chain in Figure 5.3 and suppose we wish
to compute PageRank on these vertices with d = 0.85 (which is a common assumption). We
might begin with:
\[ r^{(0)} = \begin{bmatrix} 1/4 \\ 1/4 \\ 1/4 \\ 1/4 \end{bmatrix} \]
We would then compute:
\[ r^{(1)} = \frac{1-d}{n} \mathbf{1} + d M^T r^{(0)} = \begin{bmatrix} 0.462499999999999967 \\ 0.214583333333333320 \\ 0.214583333333333320 \\ 0.108333333333333337 \end{bmatrix} \]
We would repeat this again to obtain:
\[ r^{(2)} = \frac{1-d}{n} \mathbf{1} + d M^T r^{(1)} = \begin{bmatrix} 0.311979166666666641 \\ 0.259739583333333302 \\ 0.259739583333333302 \\ 0.168541666666666673 \end{bmatrix} \]
This would continue until the difference between the values of $r^{(k)}$ and $r^{(k-1)}$ was small. The final solution would be close to the exact solution:
\[ r^* = \begin{bmatrix} 0.366735867135100591 \\ 0.245927818588310476 \\ 0.245927818588310393 \\ 0.141408495688278513 \end{bmatrix} \]
Note this is (again) very close to the stationary probabilities and the eigenvector centralities
we observed earlier. This vector is normalized so that all the entries sum to 1.
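The iteration in Example 5.37 is short to code. The sketch below is a minimal illustration under the same assumption as before, namely that the graph of Figure 5.3 has edges {1, 2}, {1, 3}, {1, 4} and {2, 3}. It applies the update $r^{(k)} = \frac{1-d}{n}\mathbf{1} + d M^T r^{(k-1)}$ until successive iterates differ by less than a tolerance and, for comparison, solves the linear system $(I_n - dM^T) r = \frac{1-d}{n}\mathbf{1}$ behind Equation 5.27 by Gaussian elimination instead of forming the inverse. The function names and the tolerance are our own choices.

```python
# Assumed edge list for the graph of Figure 5.3 (a reconstruction).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3)]
N = 4
D = 0.85  # damping factor used in Example 5.37


def induced_chain(edges, n):
    """Row-stochastic M with M[v][u] = 1 / deg(v) for each neighbor u (Definition 5.34)."""
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u - 1].append(v - 1)
        nbrs[v - 1].append(u - 1)
    return [[(1.0 / len(nbrs[v]) if u in nbrs[v] else 0.0) for u in range(n)]
            for v in range(n)]


def pagerank_step(M, r, d):
    """One application of r <- (1 - d)/n * 1 + d * M^T r."""
    n = len(r)
    return [(1 - d) / n + d * sum(M[j][i] * r[j] for j in range(n)) for i in range(n)]


def pagerank(M, d, tol=1e-12, max_iter=10_000):
    """Iterate from r^(0) = (1/n, ..., 1/n); return the last iterate and all iterates."""
    n = len(M)
    history = [[1.0 / n] * n]
    for _ in range(max_iter):
        r_next = pagerank_step(M, history[-1], d)
        history.append(r_next)
        if max(abs(a - b) for a, b in zip(r_next, history[-2])) < tol:
            break
    return history[-1], history


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[i][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][c] * x[c] for c in range(i + 1, n))) / aug[i][i]
    return x


M = induced_chain(EDGES, N)
r_star, history = pagerank(M, D)
# Direct solve of (I - d M^T) r = (1 - d)/n * 1, the system behind Equation 5.27.
I_minus_dMT = [[(1.0 if i == j else 0.0) - D * M[j][i] for j in range(N)] for i in range(N)]
r_direct = solve_linear(I_minus_dMT, [(1 - D) / N] * N)

print(history[1])   # r^(1)
print(history[2])   # r^(2)
print(r_star)
print(r_direct)
```

For a handful of vertices either route is instant; the iterative form matters for the large n discussed above, where each step costs one sparse matrix-vector product.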
Exercise 51. Consider the Markov chain shown below:
[Figure: a Markov chain on the vertices 1, 2, 3, 4; the directed edges are labeled with transition probabilities 1/3 and 1/2.]
Suppose this is the induced Markov chain from 4 web pages. Compute the page-rank of
these web pages using d = 0.85.
Exercise 52. Find an expression for r(2) in terms of r(0) . Explain how the damping
factor occurs and how it decreases the chance of taking long walks through the induced
Markov chain. Can you generalize your expression for r(2) to an expression for r(k) in terms
of r(0) ?
CHAPTER 6
Example 6.2. Figure 6.1 shows the order the vertices are added to w during a breadth
first search of the tree.
Proposition 6.3. A breadth first search of a tree T = (V, E) enumerates all vertices in
w.
Proof. We proceed by induction. If T has one vertex, then clearly v0 in the algorithm is that vertex. The vertex is added to w in the first iteration of the while loop at Line 1 and Fnext is the empty set, thus the algorithm terminates. Now suppose that the statement is true for all trees with at most n vertices. We will show the statement is true for a tree with n + 1 vertices. To see this, construct a new tree T' by removing a leaf vertex v' ≠ v0 from T (such a leaf exists because a tree with at least two vertices has at least two leaves). By the induction hypothesis, the algorithm must enumerate every vertex in T', and therefore there is a point at which we reach Line 3 with some vertex v that is adjacent to v' in T. At this point, v' would be added to Fnext, and it would be added to w in the next execution of the while loop since F ≠ ∅ the next time. Thus, every vertex of T must be enumerated in w. This completes the proof.

[Figure 6.1: The breadth first walk of a tree explores the tree in an ever widening pattern. The vertices a, b, c, d, e are visited in the order 1, 2, 3, 4, 5.]
Remark 6.4. Breadth First Search can be modified for directed trees in the obvious way. Necessarily, every other vertex must be reachable from v0 by a directed path in order to ensure that BFS enumerates every possible vertex.
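Algorithm 1 is not reproduced in this section, so the following is only a sketch of breadth first search written to match the description in the proof of Proposition 6.3: F is the current frontier, F_next collects the vertices discovered from it, and w records the enumeration order. The example tree (edges {a, b}, {a, c}, {b, d}, {b, e}) is an assumption inferred from the visit orders in Figures 6.1 and 6.2; the function name is hypothetical.

```python
def breadth_first_search(adj, v0):
    """Enumerate the vertices of a tree in breadth first order starting at v0.

    adj maps each vertex to the list of its neighbors (for a directed tree,
    pass out-neighbors instead, as in Remark 6.4).
    """
    w = []                 # the output sequence
    seen = {v0}
    F = [v0]               # current frontier
    while F:               # stop once the frontier is empty
        F_next = []
        for v in F:
            w.append(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    F_next.append(u)
        F = F_next
    return w


# Hypothetical tree consistent with Figure 6.1.
TREE = {"a": ["b", "c"], "b": ["a", "d", "e"], "c": ["a"], "d": ["b"], "e": ["b"]}
order = breadth_first_search(TREE, "a")
print(order)
```

The `seen` set blocks each vertex's parent in a tree; it also makes the same function terminate on graphs with cycles, in line with Remark 6.8.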
Remark 6.5. Another algorithm for enumerating the vertices of a tree is the depth first
search algorithm. This algorithm works by descending into the tree as deeply as possible
(until a leaf is identified) and then working back up. We present Depth First Search as a
recursive algorithm.
Recurse
Input: T = (V, E) a tree, vnow current vertex, w the sequence
(1) for each v ∈ V do
(2) if {vnow, v} ∈ E and v ∉ w then
(3) Append v to w
(4) Recurse(T, v, w)
(5) end if
(6) end for
Algorithm 2. Depth First Search
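A minimal Python rendering of Algorithm 2 follows. It iterates over the neighbors of vnow rather than over all of V (equivalent to the test {vnow, v} ∈ E, but cheaper) and keeps a set alongside w so that the test v ∉ w takes constant time; both are our own implementation choices, and the function name is hypothetical. The tree is the same assumed one used for Figure 6.1.

```python
def depth_first_search(adj, v0):
    """Recursive depth first search in the style of Algorithm 2; returns w."""
    w = [v0]
    in_w = {v0}                      # mirrors membership in w, for fast lookup

    def recurse(v_now):
        for v in adj[v_now]:         # only vertices with {v_now, v} in E
            if v not in in_w:        # step (2): v is not yet in w
                w.append(v)          # step (3)
                in_w.add(v)
                recurse(v)           # step (4)

    recurse(v0)
    return w


# Hypothetical tree consistent with Figures 6.1 and 6.2.
TREE = {"a": ["b", "c"], "b": ["a", "d", "e"], "c": ["a"], "d": ["b"], "e": ["b"]}
order = depth_first_search(TREE, "a")
print(order)
```

The order a, b, d, e, c matches the labels 1 through 5 in Figure 6.2. For very deep inputs, Python's default recursion limit (reported by `sys.getrecursionlimit()`) is the kind of restriction described in Remark 6.9.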
[Figure 6.2: The depth first walk of a tree explores the tree in an ever deepening pattern. Visit order: 1 at the root; 2 and 5 at its children; 3 and 4 at the children of the vertex labeled 2.]
Example 6.6. Figure 6.2 shows the order the vertices are added to w during a depth
first search of the tree.
Proposition 6.7. A depth first search of a tree T = (V, E) enumerates all vertices in
w.
Exercise 53. Prove Proposition 6.7. [Hint: The proof is almost identical to the proof for Breadth First Search.]
Remark 6.8. We note that breadth and depth first search can be trivially modified to
search through connected graph structures and construct spanning trees for these graphs.
We also note that BFS and DFS can be modified to function on directed trees (and graphs)
and that all vertices will be enumerated provided that every vertex is reachable by a directed
path from v0 .
Remark 6.9. In terms of implementation, we note that the recursive implementation of Depth First Search works on most computing systems provided the graph you are searching has a longest path of at most some specified length. This is because most programming environments limit the depth of recursion, that is, the number of nested recursive calls that can be active at once.
Remark 6.10. We can also build a spanning tree using a breadth first (or depth first) search on a graph. These algorithms are shown in Algorithms 3 and 4. Notice that instead of just appending vertices to w we also grow a tree that will eventually span the input graph G (provided G is connected).
Example 6.11. We illustrate a breadth first spanning tree construction in Figure 6.3. We also illustrate a depth first spanning tree in Figure 6.4.
Exercise 54. Show that a breadth first spanning tree returns a tree with the property
that the walk from v0 to any other vertex has the smallest length.
Recurse
Input: G = (V, E) a graph, T = (V, E') a tree, vnow current vertex, w the sequence
(1) for each v ∈ V do
(2) if {vnow, v} ∈ E and v ∉ w then
(3) Append v to w
(4) Add {vnow, v} to E'
(5) Recurse(G, T, v, w)
(6) end if
(7) end for
Algorithm 4. Depth First Search Spanning Tree
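Algorithm 3 is not shown in this section, so the breadth first version below is our own queue-based sketch; the depth first version follows the recursion of Algorithm 4. Both return the enumeration order w and the tree edges E'. The function names and the graph (the cycle 1, 2, 3, 4 with the chord {1, 3}) are hypothetical.

```python
from collections import deque


def bfs_spanning_tree(adj, v0):
    """Breadth first spanning tree of the component containing v0."""
    w, tree_edges = [v0], []
    seen = {v0}
    queue = deque([v0])
    while queue:
        v_now = queue.popleft()
        for v in adj[v_now]:
            if v not in seen:
                seen.add(v)
                w.append(v)
                tree_edges.append((v_now, v))   # add {v_now, v} to E'
                queue.append(v)
    return w, tree_edges


def dfs_spanning_tree(adj, v0):
    """Depth first spanning tree, mirroring the recursion in Algorithm 4."""
    w, tree_edges = [v0], []
    seen = {v0}

    def recurse(v_now):
        for v in adj[v_now]:
            if v not in seen:
                seen.add(v)
                w.append(v)
                tree_edges.append((v_now, v))   # add {v_now, v} to E'
                recurse(v)

    recurse(v0)
    return w, tree_edges


# Hypothetical graph: the cycle 1-2-3-4-1 plus the chord {1, 3}.
G = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [1, 3]}
w_bfs, t_bfs = bfs_spanning_tree(G, 1)
w_dfs, t_dfs = dfs_spanning_tree(G, 1)
print(t_bfs)                  # tree edges from breadth first search
print(t_dfs)                  # tree edges from depth first search
print(len(w_bfs) == len(G))   # True exactly when G is connected
```

The two searches return different spanning trees ({1, 2}, {1, 3}, {1, 4} versus {1, 2}, {2, 3}, {3, 4}), echoing the observation in the caption of Figure 6.4.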
Example 6.14. Consider the graph shown in Figure 6.5. A weighted graph is simply
a graph with a real number (the weight) assigned to each edge. Weighted graphs arise in
several instances, such as travel planning, communications and military planning.
[Figure 6.4 (panels: Step 1, Step 2, Step 3, Output): The construction of a depth first spanning tree is a straightforward way to construct a spanning tree of a graph or check to see if it is connected. This method can be implemented with a recursive function call. Notice this algorithm yields a different spanning tree from the BFS.]
Remark 6.15. Any graph can be thought of as a weighted graph in which we assign the
weight of 1 to each edge. The distance between two vertices in a graph can then easily be
generalized in a weighted graph. If $p = (v_1, e_1, v_2, \ldots, v_n, e_n, v_{n+1})$ is a path, then the weight
[Figure 6.5: A weighted graph is simply a graph with a real number (the weight) assigned to each edge. The edge weights shown are 3, 4, −1, 6, 5, 3 and 7.]
Thus in a weighted graph, the distance between two vertices v1 and v2 is the weight of the least weight path connecting v1 and v2. We study the problem of finding this distance in Section 5.
Definition 6.16 ((Sub)Graph Weight). Let (G, w) be a weighted graph with G = (V, E). If H = (V', E') is a subgraph of G, then the weight of H is:
\[ w(H) = \sum_{e \in E'} w(e) \]
Definition 6.17 (Minimum Spanning Forest Problem). Let (G, w) be a weighted graph with G = (V, E). The minimum spanning forest problem for G is to find a forest F = (V', E') that is a spanning subgraph of G that has the smallest possible weight.
Remark 6.18. If (G, w) is a weighted graph and G is connected, then the minimum
spanning forest problem becomes the minimum spanning tree problem.
Example 6.19. A minimum spanning tree for the weighted graph shown in Figure 6.5 is shown in Figure 6.6. In the minimum spanning tree problem, we attempt to find a spanning subgraph of a graph G that is a tree and has minimal weight (among all spanning trees).

[Figure 6.6: In the minimum spanning tree problem, we attempt to find a spanning subgraph of a graph G that is a tree and has minimal weight (among all spanning trees).]
We will verify that the proposed spanning tree is minimal when we derive algorithms for
constructing a minimum spanning forest.
Remark 6.20. The next algorithm, commonly called Prim’s Algorithm [Pri57], will construct a minimum spanning tree for a connected graph.
Prim’s Algorithm
Input: (G, w) a weighted connected graph with G = (V, E), v0 a starting vertex
Initialize: E' = ∅ {The edge set of the spanning tree.}
Initialize: V' = {v0} {New vertices added to the spanning tree.}
(1) while V' ≠ V
(2) Set X := V \ V'
(3) Choose edge e = {v, v'} so (i) v ∈ V'; (ii) v' ∈ X and:
w(e) = min_{u ∈ V', u' ∈ X} w({u, u'})
Example 6.21. We illustrate the successive steps of Prim’s Algorithm in Figure 6.7. At the start, we initialize our set V' = {1} and the edge set is empty. At each successive iteration, we add an edge of minimum weight that connects a vertex in V' with a vertex not in V'. Note that at Iteration 2, we could have chosen to add either edge {1, 3} or edge {4, 6}; the order doesn’t matter, so any tie-breaking rule will suffice. We continue adding edges until all vertices in the original graph are in the spanning tree.
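The pseudocode above chooses a minimum weight edge leaving V' at each step. A common way to implement that choice, and the one sketched below, is a binary heap of candidate edges (Python's `heapq`) that discards entries whose far endpoint has already joined V'. The weighted graph is hypothetical; it contains a negative weight, which Prim’s algorithm handles without change. Ties are broken by comparing (weight, u, v) tuples, one arbitrary rule of the kind Example 6.21 allows. Function names are our own.

```python
import heapq

# Hypothetical weighted graph as (u, v, weight) triples.
EDGES = [(1, 4, -1), (1, 3, 3), (4, 6, 3), (1, 2, 4), (4, 5, 5), (3, 4, 6), (5, 6, 7)]


def build_adj(edges):
    """Map each vertex to a list of (neighbor, weight) pairs."""
    adj = {}
    for u, v, wt in edges:
        adj.setdefault(u, []).append((v, wt))
        adj.setdefault(v, []).append((u, wt))
    return adj


def prim(adj, v0):
    """Return the edges (u, v, weight) of a minimum spanning tree, in the order added."""
    in_tree = {v0}                                  # V'
    tree_edges = []                                 # E'
    heap = [(wt, v0, u) for u, wt in adj[v0]]       # candidate edges leaving V'
    heapq.heapify(heap)
    while heap and len(in_tree) < len(adj):
        wt, u, v = heapq.heappop(heap)
        if v in in_tree:                            # stale entry: edge no longer leaves V'
            continue
        in_tree.add(v)
        tree_edges.append((u, v, wt))
        for x, wx in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (wx, v, x))
    return tree_edges


mst = prim(build_adj(EDGES), 1)
print(mst, sum(wt for _, _, wt in mst))
```

With a binary heap this sketch runs in O(|E| log |V|) time; the O(|E| + |V| log |V|) bound quoted in Exercise 59 relies on a more elaborate priority queue (see [CLRS01]).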
Theorem 6.22. Let (G, w) be a weighted connected graph. Then Prim’s algorithm returns
a minimum spanning tree.
Proof. We will show by induction that at each iteration of Prim’s algorithm, the tree (V', E') is a subtree of a minimum spanning tree T of (G, w). If this is the case, then at the termination of the algorithm, (V', E') is a spanning tree contained in T and so must be equal to the minimum spanning tree T.
To establish the base case, note that at the first iteration V' = {v0} and E' = ∅ and therefore (V', E') must be a subtree of T, a minimum spanning tree of (G, w). Now, suppose that the statement is true for all iterations up to and including k and let $T_k = (V', E')$ at iteration k. Suppose at iteration k + 1 we add edge e = {v, v'} to $T_k$ to obtain $T_{k+1} = (U, F)$ with U = V' ∪ {v'} and F = E' ∪ {e}. Suppose that $T_{k+1}$ is not a subtree of T; then e is not an edge in T and thus adding e to T creates a cycle. Since v ∈ V' and v' ∉ V', this cycle must contain some other edge e' = {u, u'} with u ∈ V' and u' ∉ V'. At iteration k + 1, we must have considered adding this edge to E', but by the selection of e, we know that w(e) ≤ w(e'). Thus if we construct T' from T by removing e' and adding e, we know that T' must span G (this is illustrated in Figure 6.8) and w(T') ≤ w(T). Since T is a minimum spanning tree, w(T') = w(T); thus T' is a minimum spanning tree of G and $T_{k+1}$ is a subtree of T'. The result follows by induction.
[Figure 6.7: The successive steps of Prim’s Algorithm starting from v0 = 1. Initialization: V' = {1}, E' = {}. Iteration 1: V' = {1, 4}, E' = {{1, 4}}. Iteration 2: V' = {1, 3, 4}, E' = {{1, 4}, {1, 3}}. Iteration 3: V' = {1, 3, 4, 6}, E' = {{1, 4}, {1, 3}, {4, 6}}. Iteration 4: V' = {1, 2, 3, 4, 6}. Iteration 5: V' = {1, 2, 3, 4, 5, 6}.]
[Figure 6.8 (panels: T and T'): When we remove an edge (e') from a spanning tree we disconnect the tree into two components. By adding a new edge (e) that connects vertices in these two distinct components, we reconnect the tree and it is still a spanning tree.]
Exercise 55. Use Prim’s algorithm to find a minimum spanning tree for the graph
shown below:
Exercise 56. Modify Algorithm 5 so that it returns a minimum spanning forest when
G is not connected. Prove your algorithm works.
Kruskal’s Algorithm
Input: (G, w) a weighted connected graph with G = (V, E) and n = |V|
Initialize: Q = E
Initialize: V' = V
Initialize: E' = ∅
Initialize: For all v ∈ V define C(v) := {v} {C(v) is the set of vertices connected to v at each iteration.}
(1) while E' has fewer than n − 1 edges
(2) Choose the edge e = {v, v'} in Q with minimum weight.
(3) if C(v) ≠ C(v')
(4) for each u ∈ C(v): C(u) := C(u) ∪ C(v')
(5) for each u ∈ C(v'): C(u) := C(u) ∪ C(v)
(6) E' := E' ∪ {e}
(7) Q := Q \ {e}
(8) else
(9) Q := Q \ {e}
(10) GOTO 2
(11) end if
(12) end while
Output: T = (V', E') {T is a minimum spanning tree.}
Algorithm 7. Kruskal’s Algorithm
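The sketch below follows Algorithm 7 but swaps the explicit sets C(v) for a union-find (disjoint set) structure, a standard replacement that answers the test C(v) ≠ C(v') by comparing component representatives. Sorting the edges once plays the role of repeatedly choosing the minimum weight edge in Q. The weighted graph and the function names are hypothetical.

```python
# Hypothetical weighted graph as (u, v, weight) triples.
EDGES = [(1, 4, -1), (1, 3, 3), (4, 6, 3), (1, 2, 4), (4, 5, 5), (3, 4, 6), (5, 6, 7)]
VERTICES = [1, 2, 3, 4, 5, 6]


def kruskal(vertices, edges):
    """Return the edges (u, v, weight) of a minimum spanning tree, in the order added."""
    parent = {v: v for v in vertices}      # union-find forest; roots name components

    def find(v):
        """Representative of v's component, with path halving."""
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree_edges = []
    for u, v, wt in sorted(edges, key=lambda e: e[2]):   # Q by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                       # C(u) != C(v): adding e creates no cycle
            parent[ru] = rv                # merge the two components
            tree_edges.append((u, v, wt))
            if len(tree_edges) == len(vertices) - 1:
                break                      # E' has n - 1 edges
    return tree_edges


mst = kruskal(VERTICES, EDGES)
print(mst, sum(wt for _, _, wt in mst))
```

The dominant cost in this sketch is the initial sort, O(|E| log |E|) = O(|E| log |V|), consistent with the running time quoted in Theorem 6.38.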
Example 6.34. We illustrate Kruskal’s Algorithm in Figure 6.9. The spanning subgraph starts with each vertex in the graph and no edges. In each iteration, we add the edge with the lowest edge weight provided that it does not cause a cycle to emerge in the existing subgraph. In this example, there is never an edge chosen that causes a cycle to appear (because of the way the weights are chosen). In this example, the construction of the spanning tree occurs in exactly the same set of steps as Prim’s algorithm. This is not always the case.

[Figure 6.9: Kruskal’s Algorithm. Panels: Initialization, Iteration 1, Iteration 2, Iteration 3, Iteration 4, Iteration 5.]
Exercise 57. Use Kruskal’s Algorithm to determine a minimum spanning tree for the
graph from Exercise 55.
Exercise 58. In the graph from Example 6.21, choose a starting vertex other than 1
and execute Prim’s algorithm to show that Prim’s Algorithm and Kruskal’s algorithm do
not always add edges in the same order.
Remark 6.35. We will prove the following theorem in the last section of this chapter
using a very generalized method. It can be shown by induction, just as we did in Theorem
6.22.
Theorem 6.36. Let (G, w) be a weighted connected graph. Then Kruskal’s algorithm
returns a minimum spanning tree.
Remark 6.37. The proof of the following theorem is beyond the scope of this course; however, it is useful to know the computational complexity of Kruskal’s algorithm. See [CLRS01] for a proof.
Theorem 6.38. There is an implementation of Kruskal’s algorithm whose running time is O(|E| log |V|).
Exercise 59. Compare the running time of an implementation of Kruskal’s Algorithm, O(|E| log |V|), to the best running time of an implementation of Prim’s algorithm, O(|E| + |V| log |V|). Under what circumstances might you use each algorithm? [Hint: Suppose that G has n vertices. Think about what happens when |E| is big (say n(n − 1)/2) vs. when |E| is small (say 0). Try plotting the two cases for various sizes of n.]
5. Shortest Path Problem in a Positively Weighted Graph
Remark 6.39. The shortest path problem in a weighted graph is the problem of finding the least weight path connecting a given vertex v to a given vertex v'. Dijkstra’s Algorithm [Dij59] answers this question by growing a spanning tree starting at v so that the unique path from v to any other vertex v' in the tree is the shortest. The algorithm is shown in Algorithm 8. It is worth noting that this algorithm only works when the weights in the graph are all positive. We will discuss Floyd’s Algorithm [Flo62], which is used when some weights may be negative, when we discuss Network Programming in a later chapter.
any vertex v' added to X prior to the kth iteration, d(v0, v') is correct and the unique path in $T_k$ from v0 to v' defined by the function p(v) is the minimum distance path from v0 to v' in (G, w).
Before proceeding, note that for any vertex v' added to X at iteration k, p(v') is fixed permanently after that iteration. Thus, the path from v0 to v' in $T_k$ is the same as the path from v0 to v' in $T_{k+1}$. Hence, if d(v0, v') and p(v') are correct at iteration k, they remain correct at all future iterations and are therefore correct for (G, w).
Suppose vertex v is added to set X (removed from Q) at iteration k + 1 but the shortest path from v0 to v is not the unique path from v0 to v in the tree $T_{k+1}$ constructed from the vertices in X and the function p(v). Since G is connected, there is a shortest path and we now have two possibilities: (i) the shortest path connecting v0 to v passes through a vertex not in X or (ii) the shortest path connecting v0 to v passes through only vertices in X.
In the first case, if the true shortest path connecting v0 to v passes through a vertex u not in X, then we have two new possibilities: (i) d(v0, u) = ∞ or (ii) d(v0, u) = r < ∞. We may dismiss the first case as infeasible, and thus we have d(v0, u) = r < ∞. In order for the distance from v0 to v to be less along the path containing u, we know that d(v0, u) < d(v0, v).
[Figure 6.10: Dijkstra’s Algorithm iteratively builds a tree of shortest paths from a given vertex v0 in a graph. Dijkstra’s algorithm can correct itself, as we see from Iteration 2 and Iteration 3. Panels: Iteration 1 (Q = {1, 2, 3, 4, 5, 6}), Iteration 2 (Q = {2, 3, 4, 5, 6}), Iteration 3 (Q = {2, 4, 5, 6}), Iteration 4 (Q = {4, 5, 6}), Iteration 5 (Q = {5, 6}), Iteration 6 (Q = {5}), then Q = {} and the algorithm stops.]
But if that’s true, then in Step 2 of Algorithm 8, we should have evaluated the neighborhood
of u well before evaluating the neighborhood of v in the for-loop starting at Line 4 and thus
u must be an element of X (i.e., not in Q). This leads to the second case.
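Suppose now that the true shortest path from v0 to v leads to a vertex v'' before reaching v while the path recorded in $T_{k+1}$ reaches v' before reaching v as illustrated below.
[Diagram: two paths from v0 to v, one reaching v through v' with final edge weight w1 and one reaching v through v'' with final edge weight w2; a third path through a vertex u is labeled "Ignore this path, it contains vertices not in X".]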
Let w1 = w(v', v) and w2 = w(v'', v). Then it follows that d(v0, v') + w1 > d(v0, v'') + w2. By the induction hypothesis, d(v0, v') and d(v0, v'') are both correct, as are their paths in $T_{k+1}$. However, since both v' and v'' are in X, we know that the neighborhoods of both these vertices were evaluated in the for-loop at Line 4. If p(v) = v'' when N(v') was evaluated, then p(v) = v'' afterwards, since Line 6 specifically forbids changes to p(v) unless d(v0, v') + w1 < d(v0, v'') + w2. On the other hand, if p(v) = v' when N(v'') was evaluated, then it is clear at once that p(v) = v'' at the end of the evaluation of the if-statement at Line 6. In either case, d(v0, v) could not be incorrect in $T_{k+1}$. The correctness of Dijkstra’s algorithm follows from induction.
Remark 6.42. The following theorem has a proof that is outside the scope of the course.
See [CLRS01] for details.
Theorem 6.43. There is an implementation of Dijkstra’s Algorithm that has running time in O(|E| + |V| log |V|).
Remark 6.44. Dijkstra’s Algorithm is an example of a Dynamic Programming [Bel57] approach to finding the shortest path in a graph. Dynamic programming is a sub-discipline of Mathematical Programming (or Optimization), which we will encounter in the coming chapters.
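Algorithm 8 is not reproduced in this section, so the following is a generic heap-based sketch of Dijkstra’s algorithm rather than a line-by-line transcription. It keeps the distances d(v0, v), the predecessor function p(v) and the finished set X used in the correctness proof, and it skips stale heap entries instead of decreasing keys. It assumes all weights are nonnegative; the graph and the function names are hypothetical.

```python
import heapq

# Hypothetical weighted graph with nonnegative weights, as (u, v, weight) triples.
EDGES = [(1, 4, 10), (1, 3, 3), (4, 6, 3), (1, 2, 4), (4, 5, 5), (3, 4, 6), (5, 6, 7)]


def build_adj(edges):
    """Map each vertex to a list of (neighbor, weight) pairs."""
    adj = {}
    for u, v, wt in edges:
        adj.setdefault(u, []).append((v, wt))
        adj.setdefault(v, []).append((u, wt))
    return adj


def dijkstra(adj, v0):
    """Return (dist, p): shortest distances from v0 and the predecessor function p."""
    dist = {v0: 0}
    p = {v0: None}
    X = set()                                  # vertices whose distance is final
    heap = [(0, v0)]
    while heap:
        d_v, v = heapq.heappop(heap)
        if v in X:                             # stale entry for a finished vertex
            continue
        X.add(v)
        for u, wt in adj[v]:
            if u not in X and d_v + wt < dist.get(u, float("inf")):
                dist[u] = d_v + wt             # shorter path to u found through v
                p[u] = v
                heapq.heappush(heap, (dist[u], u))
    return dist, p


def path_to(p, v):
    """Follow the predecessor function back to v0 to recover the tree path."""
    path = []
    while v is not None:
        path.append(v)
        v = p[v]
    return path[::-1]


dist, p = dijkstra(build_adj(EDGES), 1)
print(dist)
print(path_to(p, 5))
```

Here the tentative distance to vertex 4 first comes from the direct edge of weight 10 and is later lowered to 9 through vertex 3, a small instance of the self-correction mentioned in the caption of Figure 6.10.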
Exercise 60. Use Dijkstra’s algorithm to grow a Dijkstra tree for the graph in Exercise
55 starting at vertex D. Find the distance from D to each vertex in the graph.
Exercise 61 (Project). The A∗ heuristic is a variation on Dijkstra’s Algorithm, which
in the worst case defaults to Dijkstra’s Algorithm. It is fundamental to the study of Artificial
Intelligence. Investigate the A∗ heuristic, describe how it operates and compare it to Dijkstra’s Algorithm. Create two examples for the use of the A∗ heuristic, one that out-performs
Dijkstra’s Algorithm and the other that defaults to Dijkstra’s algorithm. You do not have
to code the algorithms, but you can.
Exercise 62 (Project). Using [CLRS01] (or some other book on algorithms) implement
Breadth and Depth First Search (for generating a spanning tree), Prim’s, Kruskal’s and
Dijkstra’s algorithm in the language of your choice. Write code to generate a connected
graph with an arbitrarily large number of vertices and edges. Empirically test the running
time of your three algorithms to see how well the predicted running times match the actual
running times as a function of the number of vertices and edges. Using these empirical
results, decide whether your answer to Exercise 59 was correct or incorrect.
Remark 6.53. Let (G, w) be a weighted graph and consider the weighted hereditary system (E, I) with I the collection of edge subsets of E that induce acyclic graphs and w the edge weighting. Kruskal’s Algorithm is exactly a greedy algorithm. We begin with the complete set of edges as candidates and, in order of increasing weight, add them to the forest (acyclic subgraph of a given weighted graph (G, w)), each time checking to make sure that the added edge does not induce a cycle (that is, that we still have an element of I). We will use this fact to prove Theorem 6.36.
Greedy Algorithm
Input: (E, I, w) a weighted hereditary system
Initialize: E' = ∅
Initialize: A = E
(1) while A ≠ ∅
(2) Choose e ∈ A to minimize w(e)
(3) A := A \ {e}
(4) if E' ∪ {e} ∈ I
(5) E' := E' ∪ {e}
(6) end if
(7) end while
Output: E'
Algorithm 9. Greedy Algorithm (Minimization)
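Algorithm 9 only needs a way to test membership in I. The sketch below takes that test as a function (an independence oracle), processes the elements in order of increasing weight (equivalent to repeatedly choosing the minimum weight element of A), and applies it twice: once with an acyclicity test on the edges of a hypothetical weighted graph, where it behaves like Kruskal’s Algorithm as Remark 6.53 describes, and once with a small hypothetical hereditary system on {a, b, c} that lacks the augmentation property, where it returns a maximal independent set that is not of minimum weight. Function names are our own.

```python
def greedy_min(elements, is_independent, weight):
    """Algorithm 9: keep e whenever E' together with e is still independent."""
    chosen = []                                    # E'
    for e in sorted(elements, key=weight):         # minimum weight element of A first
        if is_independent(chosen + [e]):
            chosen.append(e)
    return chosen


def is_acyclic(edge_set):
    """True when the edges (u, v, weight) contain no cycle (union-find check)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v, _ in edge_set:
        ru, rv = find(u), find(v)
        if ru == rv:                               # u and v already connected
            return False
        parent[ru] = rv
    return True


# Acyclic edge sets of a hypothetical weighted graph: greedy behaves like Kruskal.
EDGES = [(1, 4, -1), (1, 3, 3), (4, 6, 3), (1, 2, 4), (4, 5, 5), (3, 4, 6), (5, 6, 7)]
forest = greedy_min(EDGES, is_acyclic, weight=lambda e: e[2])

# A hereditary system on {a, b, c} without the augmentation property:
# I = {a} and J = {b, c} are independent, but neither {a, b} nor {a, c} is.
FAMILY = {frozenset(), frozenset("a"), frozenset("b"), frozenset("c"), frozenset("bc")}
W = {"a": -3, "b": -2, "c": -2}
picked = greedy_min("abc", lambda s: frozenset(s) in FAMILY, weight=W.get)

print(forest)
print(picked, sum(W[x] for x in picked), "versus", W["b"] + W["c"], "for {b, c}")
```

The second case is a small instance of the failure constructed in the (⇐) part of the proof below: greedy commits to the single cheapest element a and can never recover the lighter set {b, c}.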
then w(Ik ) ≤ w(Jk ) for all k = 1, . . . , n. We proceed by induction. Since the Greedy
Algorithm selects the element e with smallest weight first, it is clear that w(I1 ) ≤ w(J1 ),
thus we have established the base case. Now assume that the statement is true up through
some arbitrary k < n. By definition, we know that |Jk+1 | > |Ik | and therefore by the
augmentation property there is some e ∈ Jk+1 with e ∉ Ik so that Ik ∪ {e} is an element of I.
It follows that w(ek+1 ) ≤ w(e) because otherwise, the Greedy Algorithm would have chosen
e instead of ek+1 . Furthermore, w(e) ≤ w(fk+1 ) since the elements of I and J are listed in
ascending order and e ∈ Jk+1 . Thus, w(ek+1 ) ≤ w(e) ≤ w(fk+1 ) and therefore we conclude
that w(Ik+1 ) ≤ w(Jk+1 ). The result follows by induction at once.
(⇐) We will proceed by contrapositive to prove that M is a matroid. Suppose that the augmentation property is not satisfied and consider I and J in I with |I| < |J| so that there is no element e ∈ J with e ∉ I so that I ∪ {e} is in I. Without loss of generality, assume that |J| = |I| + 1 (since the system is hereditary, we may discard elements of J until this holds). Let |I| = p and consider the following weight function:
\[ w(e) = \begin{cases} -p - 2 & \text{if } e \in I \\ -p - 1 & \text{if } e \in J \setminus I \\ 0 & \text{else} \end{cases} \]
After the Greedy algorithm chooses all the elements of I, it cannot decrease the weight of the independent set because only elements that are not in J (which have weight 0) will be added. Thus, the total weight will be −p(p + 2) = −p² − 2p. However, the set J has weight at most −(p + 1)(p + 1) = −p² − 2p − 1. Thus any independent set containing J has weight at most −p² − 2p − 1. Thus, the Greedy Algorithm cannot identify a maximal independent set with minimum weight when the augmentation property is not satisfied. Thus by contrapositive we have shown that if the Greedy Algorithm identifies a maximal independent set with minimal weight, then M must be a matroid. This completes the proof.
Theorem 6.57. Let G = (V, E) be a graph. Then the hereditary system M (G) = (E, I)
where I is the collection of subsets that induce acyclic graphs is a matroid.
Proof. From Proposition 6.48, we know that (E, I) is a hereditary system; we must simply show that it has the augmentation property. To see this, let I and J be two elements of I with |I| < |J|. Let H be the subgraph of G induced from the edge set I ∪ J. Let F be a spanning forest of this subgraph H that contains I. We know from Corollary 3.67 that H has a spanning forest and we know that we can construct such a graph using the technique from the proof of Theorem 3.66.
Since J is acyclic and F is a spanning forest of H, F has at least as many edges as J (any acyclic subgraph of H has at most |V(H)| − c(H) edges, where c(H) is the number of components of H, and a spanning forest of H has exactly that many). Since |F| ≥ |J| > |I|, there exists at least one edge e that is in the forest F but that does not occur in the set I and furthermore, it must be an element of J (by construction of H). Since e is an edge in F, it follows that the subgraph induced by the set I ∪ {e} is acyclic and therefore, I ∪ {e} is an element of I. Thus M(G) has the augmentation property and is a matroid.
Corollary 6.58 (Theorem 6.36). Let (G, w) be a weighted graph. Then Kruskal’s algorithm returns a minimum spanning tree when G is connected.
Exercise 65. Prove Theorem 6.36.
Remark 6.59. Matroid Theory is a very active and deep area of research in combinatorics
and combinatorial optimization theory. A complete study of this field is well outside the scope
of this course. The interested reader should consider a text on Matroid Theory like [Oxl92].
CHAPTER 7
Remark 7.1. It turns out that many graph theoretic problems can be expressed as linear
optimization problems. Furthermore, the proofs of some of the most fundamental theorems
of graph theory are greatly simplified by the use of a linear optimization formulation.
Even though it seems like we’re going to go far off topic, we use this chapter to introduce
Linear Optimization and its fundamental results. We will then use these results to prove key
results in Graph Theory and thereby illustrate the link between the theory of optimization
and the theory of graphs.
Remark 7.3. You will recall from your matrices class (Math 220) that matrices can be used as a shorthand way to represent linear equations. Consider the following system of equations:
\[ \begin{aligned} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2 \\ &\;\,\vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m \end{aligned} \tag{7.2} \]
\[ Ax \le b \tag{7.4} \]
Using this representation, we can write our general linear programming problem using
matrix and vector notation. Expression 7.1 can be written as:
\[ \begin{aligned} \max\ & z(x) = c^T x \\ \text{s.t.}\ & Ax \le b \\ & Hx = r \end{aligned} \tag{7.5} \]
Definition 7.4. In Problem 7.5, if we restrict some of the decision variables (the xi ’s)
to have only integer (or discrete) values, then the problem becomes a mixed integer linear
programming problem. If all of the variables are restricted to integer values, the problem is
an integer programming problem and if every variable can only take on the values 0 or 1, the
program is called a 0-1 integer programming problem. [WN99] is an excellent reference
for Integer Programming.
The company can spend no more than 120 hours per week making toys and since a plane takes 3 hours to make and a boat takes 1 hour to make we have:
\[ 3x_1 + x_2 \le 120 \]
Likewise, the company can spend no more than 160 hours per week finishing toys and since it takes 1 hour to finish a plane and 2 hours to finish a boat we have:
\[ x_1 + 2x_2 \le 160 \]
Remark 7.6. Strictly speaking, the linear programming problem in Example 7.5 is not a
true linear programming problem because we don’t want to manufacture a fractional number
of boats or planes and therefore x1 and x2 must really be drawn from the integers and not
the real numbers (a requirement for a linear programming problem). This type of problem
is generally called an integer programming problem. However, we will ignore this fact and
assume that we can indeed manufacture a fractional number of boats and planes. If you’re
interested in this distinction, you might consider taking Math 484, where we discuss this
issue in depth.
Linear Programs (LP’s) with two variables can be solved graphically by plotting the
feasible region along with the level curves of the objective function. We will show that we
can find a point in the feasible region that maximizes the objective function using the level
curves of the objective function. We illustrate the method first using the problem from
Example 7.5.
Example 7.7 (Continuation of Example 7.5). Let’s continue the example of the Toy Maker begun in Example 7.5. To solve the linear programming problem graphically, begin by drawing the feasible region. This is shown in the blue shaded region of Figure 7.1.
After plotting the feasible region, the next step is to plot the level curves of the objective
function. In our problem, the level sets will have the form:
\[ 7x_1 + 6x_2 = c \implies x_2 = -\frac{7}{6} x_1 + \frac{c}{6} \]
This is a set of parallel lines with slope −7/6 and intercept c/6 where c can be varied as
needed. The level curves for various values of c are parallel lines. In Figure 7.1 they are
shown in colors ranging from red to yellow depending upon the value of c. Larger values of
c are more yellow.
To solve the linear programming problem, follow the level sets along the gradient (shown
as the black arrow) until the last level set (line) intersects the feasible region. If you are
doing this by hand, you can draw a single line of the form 7x1 + 6x2 = c and then simply
draw parallel lines in the direction of the gradient (7, 6). At some point, these lines will fail
to intersect the feasible region. The last line to intersect the feasible region will do so at a
point that maximizes the profit. In this case, the point that maximizes z(x1, x2) = 7x1 + 6x2, subject to the constraints given, is (x1*, x2*) = (16, 72).
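For a problem this small, the answer can be checked by brute force. The sketch below is neither the graphical method nor the Simplex Algorithm: it intersects every pair of constraint boundary lines, keeps the intersection points that satisfy all constraints (the corners of the feasible region), and evaluates the objective at each. It assumes the five constraints of Example 7.5 are 3x1 + x2 ≤ 120, x1 + 2x2 ≤ 160, x1 ≤ 35, x1 ≥ 0 and x2 ≥ 0 (the bound x1 ≤ 35 is taken from the feasible region S given later in this section), and that the feasible region is nonempty and bounded. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint (a1, a2, b) means a1*x1 + a2*x2 <= b (assumed constraint set).
CONSTRAINTS = [
    (3, 1, 120),   # 3x1 + x2 <= 120
    (1, 2, 160),   # x1 + 2x2 <= 160
    (1, 0, 35),    # x1 <= 35
    (-1, 0, 0),    # x1 >= 0
    (0, -1, 0),    # x2 >= 0
]


def corners(constraints):
    """Feasible intersection points of pairs of constraint boundary lines."""
    points = set()
    for (a1, a2, b), (c1, c2, e) in combinations(constraints, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:                      # parallel boundary lines never meet
            continue
        # Cramer's rule for a1 x1 + a2 x2 = b and c1 x1 + c2 x2 = e.
        x1 = Fraction(b * c2 - a2 * e, det)
        x2 = Fraction(a1 * e - b * c1, det)
        if all(p * x1 + q * x2 <= r for p, q, r in constraints):
            points.add((x1, x2))
    return points


def best_corner(c, constraints):
    """Corner maximizing c1*x1 + c2*x2 (assumes a bounded, nonempty feasible region)."""
    return max(corners(constraints), key=lambda x: c[0] * x[0] + c[1] * x[1])


x_star = best_corner((7, 6), CONSTRAINTS)
print(x_star, 7 * x_star[0] + 6 * x_star[1])
```

Exact fractions avoid rounding in the feasibility test. Checking every pair of constraints is only practical for tiny problems, which is one reason general-purpose solvers are used in practice.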
Note the point of optimality (x1*, x2*) = (16, 72) is at a corner of the feasible region. This corner is formed by the intersection of the two lines: 3x1 + x2 = 120 and x1 + 2x2 = 160. In
[Figure 7.1: Feasible Region and Level Curves of the Objective Function: The shaded region in the plot is the feasible region and represents the intersection of the five inequalities constraining the values of x1 and x2. On the right, we see the optimal solution is the “last” point in the feasible region that intersects a level set as we move in the direction of increasing profit. The boundary lines 3x1 + x2 = 120 and x1 + 2x2 = 160 are labeled.]
Applying our graphical method for finding optimal solutions to linear programming problems yields the plot shown in Figure 7.2. The level curves for the function z(x1, x2) = 18x1 + 6x2 are parallel to one face of the polygon boundary of the feasible region. Hence, as we move further up and to the right in the direction of the gradient (corresponding to larger and larger values of z(x1, x2)) we see that there is not one point on the boundary of the feasible region that intersects that level set with greatest value, but instead a side of the polygon boundary described by the line 3x1 + x2 = 120 where x1 ∈ [16, 35]. Let:
\[ S = \{ (x_1, x_2) \mid 3x_1 + x_2 \le 120,\ x_1 + 2x_2 \le 160,\ x_1 \le 35,\ x_1, x_2 \ge 0 \} \]
that is, S is the feasible region of the problem. Then for any value of x1* ∈ [16, 35] and any value x2* so that 3x1* + x2* = 120, we will have z(x1*, x2*) ≥ z(x1, x2) for all (x1, x2) ∈ S. Since there are infinitely many values that x1 and x2 may take on, we see this problem has an infinite number of alternative optimal solutions.
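The tie can also be seen algebraically. The objective is a multiple of the left-hand side of the constraint 3x1 + x2 ≤ 120, so a one-line bound (using only the definition of S above) gives:

```latex
z(x_1, x_2) = 18x_1 + 6x_2 = 6\,(3x_1 + x_2) \le 6 \cdot 120 = 720
  \qquad \text{for all } (x_1, x_2) \in S,
```

with equality exactly when 3x1 + x2 = 120. For example, z(16, 72) = 288 + 432 = 720 and z(35, 15) = 630 + 90 = 720, the two ends of the optimal side of the feasible region.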
This linear programming problem can be put into standard form by using both a slack and
surplus variable.
\[ \begin{aligned} \max\ & z(x_1, x_2) = 2x_1 - x_2 \\ \text{s.t.}\ & x_1 - x_2 + s_1 = 1 \\ & 2x_1 + x_2 - s_2 = 6 \\ & x_1, x_2, s_1, s_2 \ge 0 \end{aligned} \]
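Definition 7.16 (Row Rank). Let $A \in \mathbb{R}^{m \times n}$. The row rank of $A$ is the size of the largest set of rows (row vectors) of $A$ that is linearly independent.
Example 7.17. The row rank of the matrix
$$A = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9 \end{bmatrix}$$
is 2. To see this, note that:
$$\begin{bmatrix} 7 & 8 & 9 \end{bmatrix} = -\begin{bmatrix} 1 & 2 & 3 \end{bmatrix} + 2\begin{bmatrix} 4 & 5 & 6 \end{bmatrix}$$
so the three rows are linearly dependent. It is also clear that $[1\ 2\ 3]$ and $[4\ 5\ 6]$ are linearly independent, since neither is a scalar multiple of the other. Thus the row rank of $A$ is 2.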
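The row-rank calculation can also be done mechanically by Gaussian elimination. Below is a minimal sketch that uses exact rational arithmetic from Python's `fractions` module. The function name `rank` is our own. Applying the same function to the transpose illustrates, for this one matrix, the equality of row and column rank stated in Theorem 7.19 below.

```python
from fractions import Fraction

def rank(rows):
    """Row rank by Gaussian elimination over the rationals (exact, no round-off)."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0                                   # number of pivot rows found so far
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):      # eliminate entries below the pivot
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
A_T = [list(col) for col in zip(*A)]        # columns of A become rows
print(rank(A), rank(A_T))                   # row rank and column rank agree
```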
Remark 7.18. The column rank of a matrix $A \in \mathbb{R}^{m \times n}$ is defined analogously on columns rather than rows. The following theorem relates the row and column rank. Its proof is outside the scope of the course.
Theorem 7.19. If $A \in \mathbb{R}^{m \times n}$ is a matrix, then the row rank of $A$ is equal to the column rank of $A$. Further, $\operatorname{rank}(A) \le \min\{m, n\}$.
Definition 7.20. Suppose that $A \in \mathbb{R}^{m \times n}$ and let $m \le n$. Then $A$ has full row rank if $\operatorname{rank}(A) = m$.
Remark 7.21. When dealing with Linear Programming Problems in standard or canonical form, we will assume that the matrix $A$ has full row rank; if it does not, we will adjust it so that this is true. The following theorem tells us what can happen in a Linear Programming Problem.
Theorem 7.22. Consider any linear programming problem:
$$P\quad \begin{aligned} \max\ & z(x) = c^T x\\ \text{s.t.}\ & Ax \le b\\ & x \ge 0 \end{aligned}$$
Then there are exactly four possibilities:
(1) There is a unique solution to problem $P$, denoted $x^*$.
(2) There are an infinite number of alternative optimal solutions to $P$.
(3) There is no solution to $P$ because there is no $x$ that satisfies $Ax \le b$ and $x \ge 0$.
(4) There is no solution to $P$ because the problem is unbounded. That is, for any $x$ such that $Ax \le b$ and $x \ge 0$ there is another $x' \neq x$ with $Ax' \le b$, $x' \ge 0$ and $c^T x < c^T x'$.
4. Solving Linear Programming Problems with a Computer
Remark 7.23. There are a few ways to solve Linear Programming problems. The most
common approach is called the Simplex Algorithm. Unfortunately, we will not have time to
cover the Simplex Algorithm in this class. This is covered in IE 405 and Math 484, for those
interested [Gri11].
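Remark 7.24. We’ll show how to solve Linear Programs using Matlab. Matlab assumes that all linear programs are input in the following form:
$$\begin{aligned} \min\ & z(x) = c^T x\\ \text{s.t.}\ & Ax \le b\\ & Hx = r\\ & x \ge l\\ & x \le u \end{aligned} \tag{7.16}$$
Here $c \in \mathbb{R}^{n \times 1}$, so there are $n$ variables in the vector $x$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^{m \times 1}$, $H \in \mathbb{R}^{l \times n}$ and $r \in \mathbb{R}^{l \times 1}$. The vectors $l$ and $u$ are lower and upper bounds respectively on the decision variables in the vector $x$.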
The Matlab command for solving linear programs is linprog and it takes the parameters:
(1) c,
(2) A,
(3) b,
(4) H,
(5) r,
(6) l,
(7) u
If there are no inequality constraints, then we set A = [] and b = [] in Matlab; i.e., A and
b are set as the empty matrices. A similar requirement holds on H and r if there are no
equality constraints. If some decision variables have lower bounds and others don’t, the term
-inf can be used to set a lower bound at −∞ (in l). Similarly, the term inf can be used if
the upper bound on a variable (in u) is infinity. The easiest way to understand how to use
Matlab is to use it on an example.
Example 7.25. Suppose I wish to design a diet consisting of ramen noodles and ice cream. I’m interested in spending as little money as possible, but I want to ensure that I eat at least 1200 calories per day and that I get at least 20 grams of protein per day. Assume that each serving of ramen costs $1 and contains 100 calories and 2 grams of protein. Assume that each serving of ice cream costs $1.50 and contains 200 calories and 3 grams of protein.
We can construct a linear programming problem out of this scenario. Let $x_1$ be the number of servings of ramen we consume and let $x_2$ be the number of servings of ice cream we consume. Our objective function is our cost:
$$x_1 + 1.5x_2 \tag{7.17}$$
Our constraints describe our protein requirements:
$$2x_1 + 3x_2 \ge 20 \tag{7.18}$$
and our calorie requirements (expressed in terms of 100’s of calories):
$$x_1 + 2x_2 \ge 12 \tag{7.19}$$
This leads to the following linear programming problem:
$$\begin{aligned} \min\ & x_1 + 1.5x_2\\ \text{s.t.}\ & 2x_1 + 3x_2 \ge 20\\ & x_1 + 2x_2 \ge 12\\ & x_1, x_2 \ge 0 \end{aligned} \tag{7.20}$$
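Let’s use Matlab to solve this problem. Our original problem is:
$$\begin{aligned} \min\ & x_1 + 1.5x_2\\ \text{s.t.}\ & 2x_1 + 3x_2 \ge 20\\ & x_1 + 2x_2 \ge 12\\ & x_1, x_2 \ge 0 \end{aligned}$$
This is not in the form Matlab expects (Expression 7.16), so we multiply each greater-than-or-equal-to constraint by $-1$ to obtain:
$$\begin{aligned} \min\ & x_1 + 1.5x_2\\ \text{s.t.}\ & -2x_1 - 3x_2 \le -20\\ & -x_1 - 2x_2 \le -12\\ & x_1, x_2 \ge 0 \end{aligned}$$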
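Then we have:
$$c = \begin{bmatrix} 1\\ 1.5 \end{bmatrix}, \quad A = \begin{bmatrix} -2 & -3\\ -1 & -2 \end{bmatrix}, \quad b = \begin{bmatrix} -20\\ -12 \end{bmatrix}, \quad H = r = [\,], \quad l = \begin{bmatrix} 0\\ 0 \end{bmatrix}, \quad u = [\,]$$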
The Matlab code to solve this problem is shown in Figure 7.3. The solution Matlab returns in the x variable is $x_1 = 3.7184$ and $x_2 = 4.1877$. It turns out there are actually an infinite number of alternative optimal solutions to this problem. You could draw a picture of this scenario to see if you can figure out why.
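Below is a small Python check of the reported solution. It evaluates feasibility and cost at Matlab's point and at the two ends of the segment of the line $2x_1 + 3x_2 = 20$ that stays feasible. We computed those endpoints, $(4, 4)$ and $(0, 20/3)$, by hand by intersecting that line with $x_1 + 2x_2 = 12$ and with $x_1 = 0$. The names `cost`, `feasible` and the tolerance `TOL` are our own assumptions. The tolerance is only there because Matlab's printed values are rounded.

```python
# Matlab's reported values are rounded to four decimals, so allow a small tolerance.
TOL = 1e-3

def cost(x1, x2):
    """Objective (7.17): dollars spent on x1 servings of ramen and x2 of ice cream."""
    return x1 + 1.5 * x2

def feasible(x1, x2, tol=TOL):
    """Protein (7.18), calories in 100s (7.19), and nonnegativity."""
    return (2 * x1 + 3 * x2 >= 20 - tol
            and x1 + 2 * x2 >= 12 - tol
            and x1 >= -tol and x2 >= -tol)

# cost(x1, x2) is exactly half of the protein total 2x1 + 3x2, so every feasible
# point on the line 2x1 + 3x2 = 20 has the same cost.
for point in [(3.7184, 4.1877), (4.0, 4.0), (0.0, 20 / 3)]:
    print(point, feasible(*point), round(cost(*point), 4))
```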
Exercise 67. In the previous example, you could also have just used the problem in standard form with the surplus variables and had A = b = [] and defined H and r instead. Use Matlab to solve the diet problem in standard form. Compare your results to Example 7.25.
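%% Solve the Diet Linear Programming Problem
% Objective coefficients: cost per serving of ramen (x1) and ice cream (x2)
c = [1 1.5]';
% Inequality constraints A*x <= b (the >= constraints multiplied by -1)
A = [[-2 -3];...
     [-1 -2]];
b = [-20 -12]';
% No equality constraints
H = [];
r = [];
% Lower bounds x1, x2 >= 0; no upper bounds
l = [0 0]';
u = [];
[x, obj] = linprog(c,A,b,H,r,l,u);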
Figure 7.3. Matlab input for solving the diet problem. Note that we are solving a minimization problem. Matlab assumes all problems are minimization problems, so we don’t need to multiply the objective by −1 like we would if we started with a maximization problem.
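with $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$ and (row vector) $c \in \mathbb{R}^n$. Then $x^* \in \mathbb{R}^n$ is an optimal solution if and only if there exist (row) vectors $w^* \in \mathbb{R}^m$ and $v^* \in \mathbb{R}^n$ and a slack variable vector $s^* \in \mathbb{R}^m$ so that:
$$\text{Primal Feasibility}\quad \begin{cases} Ax^* + s^* = b\\ x^* \ge 0\\ s^* \ge 0 \end{cases} \tag{7.22}$$
$$\text{Dual Feasibility}\quad \begin{cases} w^* A - v^* = c\\ w^* \ge 0\\ v^* \ge 0 \end{cases} \tag{7.23}$$
$$\text{Complementary Slackness}\quad \begin{cases} w^* (Ax^* - b) = 0\\ v^* x^* = 0 \end{cases} \tag{7.24}$$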
Remark 7.28. The vectors $w^*$ and $v^*$ are sometimes called dual variables for reasons that will be clear in the next section. They are also sometimes called Lagrange Multipliers. You may have encountered Lagrange Multipliers in your Math 230 or Math 231 class. These are the same kind of variables except applied to linear optimization problems. There is one element in the dual variable vector $w^*$ for each constraint of the form $Ax \le b$ and one element in the dual variable vector $v^*$ for each constraint of the form $x \ge 0$.
Example 7.29. Consider the Toy Maker Problem (Equation 7.9) with Dual Variables (Lagrange Multipliers) listed next to their corresponding constraints:
$$\begin{aligned} \max\ & z(x_1, x_2) = 7x_1 + 6x_2 && \text{Dual Variable}\\ \text{s.t.}\ & 3x_1 + x_2 \le 120 && (w_1)\\ & x_1 + 2x_2 \le 160 && (w_2)\\ & x_1 \le 35 && (w_3)\\ & x_1 \ge 0 && (v_1)\\ & x_2 \ge 0 && (v_2) \end{aligned}$$
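In this problem we have:
$$A = \begin{bmatrix} 3 & 1\\ 1 & 2\\ 1 & 0 \end{bmatrix}, \quad b = \begin{bmatrix} 120\\ 160\\ 35 \end{bmatrix}, \quad c = \begin{bmatrix} 7 & 6 \end{bmatrix}$$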
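Then the KKT conditions can be written as:
Primal Feasibility:
$$\begin{bmatrix} 3 & 1\\ 1 & 2\\ 1 & 0 \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} \le \begin{bmatrix} 120\\ 160\\ 35 \end{bmatrix}, \qquad \begin{bmatrix} x_1\\ x_2 \end{bmatrix} \ge \begin{bmatrix} 0\\ 0 \end{bmatrix}$$
Dual Feasibility:
$$\begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix} \begin{bmatrix} 3 & 1\\ 1 & 2\\ 1 & 0 \end{bmatrix} - \begin{bmatrix} v_1 & v_2 \end{bmatrix} = \begin{bmatrix} 7 & 6 \end{bmatrix}, \qquad \begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix} \ge \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}, \qquad \begin{bmatrix} v_1 & v_2 \end{bmatrix} \ge \begin{bmatrix} 0 & 0 \end{bmatrix}$$
Complementary Slackness:
$$\begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix} \left( \begin{bmatrix} 3 & 1\\ 1 & 2\\ 1 & 0 \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} - \begin{bmatrix} 120\\ 160\\ 35 \end{bmatrix} \right) = 0, \qquad \begin{bmatrix} v_1 & v_2 \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \end{bmatrix} = 0$$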
Note that we are suppressing the slack variables $s$ in the primal feasibility expression. Recall that at optimality, we had $x_1 = 16$ and $x_2 = 72$. The binding constraints in this case were
$$\begin{aligned} 3x_1 + x_2 &\le 120\\ x_1 + 2x_2 &\le 160 \end{aligned}$$
To see this, note that $3(16) + 72 = 120$ and $16 + 2(72) = 160$. Then we should be able to express $c = [7\ 6]$ (the vector of coefficients of the objective function) as a positive combination of the gradients of the binding constraints:
$$\begin{aligned} \nabla(7x_1 + 6x_2) &= \begin{bmatrix} 7 & 6 \end{bmatrix}\\ \nabla(3x_1 + x_2) &= \begin{bmatrix} 3 & 1 \end{bmatrix}\\ \nabla(x_1 + 2x_2) &= \begin{bmatrix} 1 & 2 \end{bmatrix} \end{aligned}$$
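That is, we wish to solve the linear equation:
$$\begin{bmatrix} w_1 & w_2 \end{bmatrix} \begin{bmatrix} 3 & 1\\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 7 & 6 \end{bmatrix} \tag{7.25}$$
The result is the system of equations:
$$\begin{aligned} 3w_1 + w_2 &= 7\\ w_1 + 2w_2 &= 6 \end{aligned}$$
A solution to this system is $w_1 = \frac{8}{5}$ and $w_2 = \frac{11}{5}$. This fact is illustrated in Figure 7.4.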
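The multipliers can be confirmed with exact arithmetic. Below is a minimal sketch that applies Cramer's rule to the $2 \times 2$ system above. The helper name `solve_2x2` is hypothetical.

```python
from fractions import Fraction

def solve_2x2(a11, a12, a21, a22, r1, r2):
    """Solve a11*w1 + a12*w2 = r1 and a21*w1 + a22*w2 = r2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system")
    w1 = Fraction(r1 * a22 - a12 * r2, det)
    w2 = Fraction(a11 * r2 - r1 * a21, det)
    return w1, w2

# 3w1 + w2 = 7 and w1 + 2w2 = 6, from Equation 7.25
w1, w2 = solve_2x2(3, 1, 1, 2, 7, 6)
print(w1, w2)  # 8/5 11/5
```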
Figure 7.4 shows the gradient cone formed by the binding constraints at the optimal point for the toy maker problem. Since $x_1, x_2 > 0$ and $v_1, v_2 \ge 0$, the complementary slackness condition $v_1 x_1 + v_2 x_2 = 0$ forces $v_1 = v_2 = 0$.
[Figure 7.4: the feasible region bounded by 3x1 + x2 ≤ 120, x1 + 2x2 ≤ 160, x1 ≤ 35, x1 ≥ 0 and x2 ≥ 0, with the gradient cone of the binding constraints drawn at the optimal point.]
Figure 7.4. The Gradient Cone: At optimality, the cost vector c is obtuse with respect to the directions formed by the binding constraints. It is also contained inside the cone of the gradients of the binding constraints, which we will discuss at length later.
Moreover, since $x_1 < 35$, we know that $x_1 \le 35$ is not a binding constraint and thus its dual variable $w_3$ is also zero. This leads to the conclusion:
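$$\begin{bmatrix} x_1^*\\ x_2^* \end{bmatrix} = \begin{bmatrix} 16\\ 72 \end{bmatrix}, \qquad \begin{bmatrix} w_1^* & w_2^* & w_3^* \end{bmatrix} = \begin{bmatrix} 8/5 & 11/5 & 0 \end{bmatrix}, \qquad \begin{bmatrix} v_1^* & v_2^* \end{bmatrix} = \begin{bmatrix} 0 & 0 \end{bmatrix}$$
and the KKT conditions are satisfied.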
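All three groups of KKT conditions for this example can be checked directly. Below is a minimal Python sketch that uses exact fractions. The function name `kkt_holds` is our own. As in the example, the slack variables are suppressed, so primal feasibility is checked as $Ax \le b$.

```python
from fractions import Fraction as F

A = [[3, 1], [1, 2], [1, 0]]
b = [120, 160, 35]
c = [7, 6]
x = [16, 72]
w = [F(8, 5), F(11, 5), 0]
v = [0, 0]

def kkt_holds(A, b, c, x, w, v):
    """Return (primal feasible, dual feasible, complementary slackness)."""
    m, n = len(A), len(c)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    wA = [sum(w[i] * A[i][j] for i in range(m)) for j in range(n)]
    primal = all(Ax[i] <= b[i] for i in range(m)) and all(xj >= 0 for xj in x)
    dual = (all(wA[j] - v[j] == c[j] for j in range(n))
            and all(wi >= 0 for wi in w) and all(vj >= 0 for vj in v))
    comp = (sum(w[i] * (Ax[i] - b[i]) for i in range(m)) == 0
            and sum(v[j] * x[j] for j in range(n)) == 0)
    return primal, dual, comp

print(kkt_holds(A, b, c, x, w, v))  # (True, True, True)
```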
Exercise 68. Consider the problem:
$$\begin{aligned} \max\ & x_1 + x_2\\ \text{s.t.}\ & 2x_1 + x_2 \le 4\\ & x_1 + 2x_2 \le 6\\ & x_1, x_2 \ge 0 \end{aligned}$$
Write the KKT conditions for an optimal point for this problem. (You will have a vector $w = [w_1\ w_2]$ and a vector $v = [v_1\ v_2]$.)
Draw the feasible region of the problem and use Matlab to solve the problem. At the
point of optimality, identify the binding constraints and draw their gradients. Show that the
KKT conditions hold. (Specifically find w and v.)
Exercise 69. Find the KKT conditions for the problem:
$$\begin{aligned} \min\ & cx\\ \text{s.t.}\ & Ax \ge b\\ & x \ge 0 \end{aligned} \tag{7.26}$$
6. Duality
Remark 7.30. In this section, we show that to each linear programming problem (the
primal problem) we may associate another linear programming problem (the dual linear
programming problem). These two problems are closely related to each other and an analysis
of the dual problem can provide deep insight into the primal problem.
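Consider the linear programming problem
$$P\quad \begin{aligned} \max\ & c^T x\\ \text{s.t.}\ & Ax \le b\\ & x \ge 0 \end{aligned} \tag{7.27}$$
Then the dual problem for Problem P is:
$$D\quad \begin{aligned} \min\ & wb\\ \text{s.t.}\ & wA \ge c\\ & w \ge 0 \end{aligned} \tag{7.28}$$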
Remark 7.31. Let $v$ be a vector of surplus variables. Then we can transform Problem D into standard form as:
$$DS\quad \begin{aligned} \min\ & wb\\ \text{s.t.}\ & wA - v = c\\ & w \ge 0\\ & v \ge 0 \end{aligned} \tag{7.29}$$
Thus we already see an intimate relationship between duality and the KKT conditions. The feasible region of the dual problem (in standard form) is precisely the set defined by the dual feasibility constraints of the KKT conditions for the primal problem.
In this formulation, we see that we have assigned a dual variable wi (i = 1, . . . , m) to
each constraint in the system of equations Ax ≤ b of the primal problem. Likewise dual
variables v can be thought of as corresponding to the constraints in x ≥ 0.
Remark 7.32. The proof of the following lemma is outside the scope of the class, but it
establishes an important fact about duality.
Lemma 7.33. The dual of the dual problem is the primal problem.
Remark 7.34. Lemma 7.33 shows that the notion of dual and primal can be exchanged
and that it is simply a matter of perspective which problem is the dual problem and which is
the primal problem. Likewise, by transforming problems into canonical form, we can develop
dual problems for any linear programming problem.
The process of developing these formulations can be exceptionally tedious, as it requires
enumeration of all the possible combinations of various linear and variable constraints. The
following table summarizes the process of converting an arbitrary primal problem into its
dual. This table can be found in Chapter 6 of [BJS04].
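Table 1.
MINIMIZATION PROBLEM | MAXIMIZATION PROBLEM
CONSTRAINTS ≥ | VARIABLES ≥ 0
CONSTRAINTS ≤ | VARIABLES ≤ 0
CONSTRAINTS = | VARIABLES UNRESTRICTED
VARIABLES ≥ 0 | CONSTRAINTS ≤
VARIABLES ≤ 0 | CONSTRAINTS ≥
VARIABLES UNRESTRICTED | CONSTRAINTS =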
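Table 1 can be read mechanically. Below is a minimal Python sketch that encodes the table from the maximization side as a lookup table. The dictionary `MAX_TO_MIN` and the helper `dual_of_max` are hypothetical names. The string labels such as `">= 0"` are our own encoding of the table entries.

```python
# Table 1 read from the maximization side: primal (max) item -> dual (min) item.
MAX_TO_MIN = {
    ("constraint", "<="): ("variable", ">= 0"),
    ("constraint", ">="): ("variable", "<= 0"),
    ("constraint", "="): ("variable", "unrestricted"),
    ("variable", ">= 0"): ("constraint", ">="),
    ("variable", "<= 0"): ("constraint", "<="),
    ("variable", "unrestricted"): ("constraint", "="),
}
# The same table read from the minimization side.
MIN_TO_MAX = {v: k for k, v in MAX_TO_MIN.items()}

def dual_of_max(constraint_senses, variable_signs):
    """Given the senses of a max problem's constraints and the signs of its
    variables, return (signs of the dual variables, senses of the dual constraints)."""
    dual_vars = [MAX_TO_MIN[("constraint", s)][1] for s in constraint_senses]
    dual_cons = [MAX_TO_MIN[("variable", s)][1] for s in variable_signs]
    return dual_vars, dual_cons

# Canonical form (7.27): three <= constraints, two nonnegative variables.
print(dual_of_max(["<="] * 3, [">= 0"] * 2))
```

The canonical call reproduces Problem D (7.28): nonnegative dual variables and $\ge$ dual constraints.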
Example 7.35. Consider the problem of finding the dual problem for the Toy Maker Problem (Example 7.5) in standard form. The primal problem is:
$$\begin{aligned} \max\ & 7x_1 + 6x_2\\ \text{s.t.}\ & 3x_1 + x_2 + s_1 = 120 && (w_1)\\ & x_1 + 2x_2 + s_2 = 160 && (w_2)\\ & x_1 + s_3 = 35 && (w_3)\\ & x_1, x_2, s_1, s_2, s_3 \ge 0 \end{aligned}$$
Here we have placed dual variable names (w1 , w2 and w3 ) next to the constraints to which
they correspond.
The primal problem variables in this case are all nonnegative, so using Table 1 we know that the constraints of the dual problem will be greater-than-or-equal-to constraints. Likewise, we know that the dual variables will be unrestricted in sign since the primal problem constraints are all equality constraints.
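The coefficient matrix is:
$$A = \begin{bmatrix} 3 & 1 & 1 & 0 & 0\\ 1 & 2 & 0 & 1 & 0\\ 1 & 0 & 0 & 0 & 1 \end{bmatrix}$$
Clearly we have:
$$c = \begin{bmatrix} 7 & 6 & 0 & 0 & 0 \end{bmatrix}, \qquad b = \begin{bmatrix} 120\\ 160\\ 35 \end{bmatrix}$$
The vector $c$ supplies the right-hand sides of the dual constraints $wA \ge c$, while $b$ supplies the coefficients of the dual objective $wb$. Remember, in this case, all variables in the primal problem are greater-than-or-equal-to zero. Thus we see that the constraints of the dual problem are: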
$$\begin{aligned} 3w_1 + w_2 + w_3 &\ge 7\\ w_1 + 2w_2 &\ge 6\\ w_1 &\ge 0\\ w_2 &\ge 0\\ w_3 &\ge 0 \end{aligned}$$
We also have the redundant set of constraints that tell us w is unrestricted because the
primal problem had equality constraints. This will always happen in cases when you’ve
introduced slack variables into a problem to put it in standard form. This should be clear
from the definition of the dual problem for a maximization problem in canonical form.
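Thus the whole dual problem becomes:
$$\begin{aligned} \min\ & 120w_1 + 160w_2 + 35w_3\\ \text{s.t.}\ & 3w_1 + w_2 + w_3 \ge 7\\ & w_1 + 2w_2 \ge 6\\ & w_1 \ge 0\\ & w_2 \ge 0\\ & w_3 \ge 0\\ & w \text{ unrestricted} \end{aligned} \tag{7.30}$$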
Again, note that in reality, the constraints we derived from the wA ≥ c part of the dual
problem make the constraints “w unrestricted” redundant, for in fact w ≥ 0 just as we
would expect it to be if we’d found the dual of the Toy Maker problem given in canonical
form.
Exercise 70. Identify the dual problem for:
$$\begin{aligned} \max\ & x_1 + x_2\\ \text{s.t.}\ & 2x_1 + x_2 \ge 4\\ & x_1 + 2x_2 \le 6\\ & x_1, x_2 \ge 0 \end{aligned}$$
Exercise 71. Use the table or the definition of duality to determine the dual for the problem:
$$\begin{aligned} \min\ & cx\\ \text{s.t.}\ & Ax \ge b\\ & x \ge 0 \end{aligned} \tag{7.31}$$
Remark 7.36. The following theorems are outside the scope of this course, but they can
be useful to know and will help cement your understanding of the true nature of duality.
Theorem 7.37 (Strong Duality Theorem). Consider Problem P and Problem D. Then
(Weak Duality): If $x$ is feasible for Problem P and $w$ is feasible for Problem D, then $cx \le wb$. Thus every feasible solution to the primal problem provides a lower bound for the dual and every feasible solution to the dual problem provides an upper bound for the primal problem.
Furthermore, exactly one of the following statements is true:
(1) Both Problem P and Problem D possess optimal solutions $x^*$ and $w^*$ respectively and $cx^* = w^* b$.
(2) Problem P is unbounded and Problem D is infeasible.
(3) Problem D is unbounded and Problem P is infeasible.
(4) Both problems are infeasible.
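Both parts of Theorem 7.37 can be seen numerically on the Toy Maker Problem in canonical form. Below is a minimal Python sketch. The second primal point $(35, 15)$ is a feasible corner found earlier, and the second dual point $w = (7, 6, 0)$ is an arbitrary feasible choice of our own. The helper names are hypothetical.

```python
from fractions import Fraction as F

A = [[3, 1], [1, 2], [1, 0]]
b = [120, 160, 35]
c = [7, 6]

def primal_feasible(x):
    return (all(sum(a * xj for a, xj in zip(row, x)) <= bi for row, bi in zip(A, b))
            and all(xj >= 0 for xj in x))

def dual_feasible(w):
    wA = [sum(w[i] * A[i][j] for i in range(len(A))) for j in range(len(c))]
    return all(lhs >= cj for lhs, cj in zip(wA, c)) and all(wi >= 0 for wi in w)

def primal_value(x):
    return sum(cj * xj for cj, xj in zip(c, x))

def dual_value(w):
    return sum(wi * bi for wi, bi in zip(w, b))

x_star, w_star = [16, 72], [F(8, 5), F(11, 5), 0]
x_other, w_other = [35, 15], [7, 6, 0]
for x in (x_star, x_other):
    for w in (w_star, w_other):
        assert primal_feasible(x) and dual_feasible(w)
        assert primal_value(x) <= dual_value(w)      # weak duality
print(primal_value(x_star), dual_value(w_star))      # 544 544: strong duality
```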
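Theorem 7.38. Problem D has an optimal solution $w^* \in \mathbb{R}^m$ if and only if there exist vectors $x^* \in \mathbb{R}^n$ and $s^* \in \mathbb{R}^m$ and a vector of surplus variables $v^* \in \mathbb{R}^n$ such that:
$$\text{Primal Feasibility}\quad \begin{cases} w^* A \ge c\\ w^* \ge 0 \end{cases} \tag{7.32}$$
$$\text{Dual Feasibility}\quad \begin{cases} Ax^* + s^* = b\\ x^* \ge 0\\ s^* \ge 0 \end{cases} \tag{7.33}$$
$$\text{Complementary Slackness}\quad \begin{cases} (w^* A - c)\, x^* = 0\\ w^* s^* = 0 \end{cases} \tag{7.34}$$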
Furthermore, these KKT conditions are equivalent to the KKT conditions for the primal
problem.
Remark 7.39. The final theorem illustrates the true nature of duality. Two linear pro-
gramming problems are dual if they share KKT conditions. That is, if they share conditions
for optimality.
[Figure 7.5: a directed graph on Vertices 1 to 4 with edge flows x1, ..., x5. Each edge is labeled "cost per flow unit / capacity" (labels $4 / 3, $3 / 1, $2 / 2, $1 / 2 and $5 / 2). Vertex 1 produces 3 units and Vertex 4 consumes 3 units.]
Figure 7.5. In this problem, it costs a certain amount to ship a commodity along each edge and each edge has a capacity. The objective is to find an allocation of capacity to each edge so that the total cost of shipping three units of this commodity from Vertex 1 to Vertex 4 is minimized.
Exercise 72. Consider the directed graph shown in Figure 7.5. The amount of flow along each edge is given by the variables $x_1, \dots, x_5$. The total cost of shipping flow from Vertex 1 to Vertex 4 is
$$\sum_{i=1}^{5} c_i x_i \tag{7.35}$$
where $c_i$ is the cost associated to the flow in each edge. For each edge, we know that $x_i \ge 0$ and $x_i \le u_i$ where $u_i$ is the capacity on each edge. Finally, we must be able to assert that commodities are neither created nor destroyed (except at Vertex 1, where 3 units of commodity are created, and at Vertex 4, where 3 units of commodity are consumed). Thus we have constraints of the form:
$$\begin{aligned} x_1 + x_2 &= 3\\ x_1 &= x_4 + x_5\\ x_2 + x_5 &= x_3\\ x_3 + x_4 &= 3 \end{aligned}$$
(1) Put all these constraints together to form a linear programming problem whose
solution yields a minimal cost assignment of flow to the edges.
(2) Use Matlab to find an optimal flow.
(3) Notice that each equation in the equality constraints represents the balance of flow into and out of a vertex. Rewrite each equation so that it has the form
flow-out − flow-in = flow produced at vertex − flow consumed at vertex
(4) Compute the A matrix that results from the equations you just constructed and
compare it to the incidence matrix of the directed graph. [Hint: They should be
the same.]
CHAPTER 8
Remark 8.1. For the remainder of this chapter, we will consider directed graphs with no
isolated vertices and no self-loops. That is, we will only consider those graphs whose incidence
matrices do not have any zero rows. These graphs will be connected and furthermore will
have two special vertices v1 and vm and we will assume that there is at least one directed
path from v1 to vm .
Remark 8.5. Equation 8.1 states that the total flow out of vertex vi minus the total
flow into vi must be equal to the total flow produced at vi . Or put more simply, excess flow
is neither created nor destroyed.
Definition 8.6 (Edge Capacity). Let $G = (V, E)$ be a digraph with no self-loops and suppose $V = \{v_1, \dots, v_m\}$ and $E = \{e_1, \dots, e_n\}$. If $e_k \in E$ then its capacity is a positive real value $u_k$ that determines the maximum amount of flow the edge may be assigned. We can think of $(G, u)$ as being a weighted graph where the weights are the edge capacities.
Proposition 8.7. Let $G = (V, E)$ be a digraph with no self-loops and suppose $V = \{v_1, \dots, v_m\}$ and let $A$ be the incidence matrix of $G$ (see Definition 4.63). Then Equation 8.1 can be written as:
$$A_{i\cdot}\, x = b_i \tag{8.2}$$
where x is a vector of variables of the form xk taken in the order the edges are represented
in A.
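Proof. From Definition 4.63 we know that:
$$A_{ik} = \begin{cases} 0 & \text{if } v_i \text{ is not in } e_k\\ 1 & \text{if } v_i \text{ is the source of } e_k\\ -1 & \text{if } v_i \text{ is the destination of } e_k \end{cases} \tag{8.3}$$
Hence $A_{i\cdot}\, x$ is the total flow on edges leaving $v_i$ minus the total flow on edges entering $v_i$, and the equivalence between Equation 8.2 and Equation 8.1 follows at once.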
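Equation 8.3 translates directly into code. Below is a minimal Python sketch that builds the incidence matrix from an edge list and computes $Ax$ (net outflow at each vertex). The function names and the three-vertex example graph are hypothetical, and vertices are numbered from 0 rather than 1.

```python
def incidence_matrix(num_vertices, edges):
    """Incidence matrix as in Equation 8.3: A[i][k] = 1 if v_i is the source
    of e_k, -1 if it is the destination, and 0 otherwise (no self-loops)."""
    A = [[0] * len(edges) for _ in range(num_vertices)]
    for k, (src, dst) in enumerate(edges):
        A[src][k] = 1
        A[dst][k] = -1
    return A

def net_outflow(A, x):
    """Row i of A x: flow out of v_i minus flow into v_i (Equation 8.2)."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

# Hypothetical example: e1 = (v1, v2), e2 = (v2, v3), e3 = (v1, v3), 0-based.
A = incidence_matrix(3, [(0, 1), (1, 2), (0, 2)])
x = [2, 2, 1]                  # 2 units routed via v2, 1 unit sent directly
print(net_outflow(A, x))       # [3, 0, -3]: v1 produces 3, v3 consumes 3
```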
Remark 8.8. For the remainder of this chapter, let $e_1 \in \mathbb{R}^{m \times 1}$ be the vector with a 1 at position 1 and 0 everywhere else. Define $e_m$ similarly.
Definition 8.9 (Maximum Flow Problem). Let $G = (V, E)$ be a digraph with no self-loops and suppose that $V = \{v_1, \dots, v_m\}$. Without loss of generality, suppose that there is no edge connecting $v_m$ to $v_1$. The maximum flow problem for $G$ is the linear programming problem:
$$\begin{aligned} \max\ & f\\ \text{s.t.}\ & (e_m - e_1) f + Ax = 0\\ & x \le u\\ & x \ge 0\\ & f \text{ unrestricted} \end{aligned} \tag{8.4}$$
Here $u$ is a vector of edge flow capacity values.
Remark 8.10. The constraints (em − e1 ) f + Ax = 0 are flow conservation constraints
when we assume that there is an (imaginary) flow backwards from vm to v1 along an edge
(vm , v1 ) and that no flow is produced in the graph. That is, we assume all flow is circulating
within the graph. The value f determines the amount of flow that circulates back to vertex
v1 from vm under this assumption. Since all flows are circulating and excess flow is neither
created nor destroyed, the value of f is then the total flow that flows from v1 to vm . By
maximizing f , Problem 8.4 is exactly computing the maximum amount of flow that can go
from vertex v1 to vm under the assumptions that flows are constrained by edge capacities
(x ≤ u), flows are non-negative (x ≥ 0) and flows are neither created nor destroyed in the
graph.
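The constraints of Problem 8.4 can be checked for a candidate pair $(x, f)$ without building $A$ explicitly. Below is a minimal Python sketch. It assumes vertices are numbered $0, \dots, m-1$ (so $v_1$ is 0 and $v_m$ is $m-1$) and edges are given as (source, destination, capacity) triples. The function name and the four-vertex network are hypothetical.

```python
def is_feasible_max_flow(num_vertices, edges, x, f):
    """Check (e_m - e_1) f + A x = 0 and 0 <= x <= u for a candidate (x, f)."""
    m = num_vertices
    balance = [0] * m                  # will hold row i of A x
    for (src, dst, cap), xk in zip(edges, x):
        if not 0 <= xk <= cap:
            return False               # violates x >= 0 or x <= u
        balance[src] += xk             # flow leaving src
        balance[dst] -= xk             # flow entering dst
    balance[0] -= f                    # the (e_m - e_1) f term, i.e. the
    balance[m - 1] += f                # imaginary edge (v_m, v_1) carrying f
    return all(value == 0 for value in balance)

edges = [(0, 1, 2), (0, 2, 2), (1, 2, 1), (1, 3, 1), (2, 3, 3)]  # hypothetical
print(is_feasible_max_flow(4, edges, [2, 2, 1, 1, 3], 4))   # True
print(is_feasible_max_flow(4, edges, [2, 2, 1, 1, 3], 3))   # False: f too small
```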
Lemma 8.14. Let G = (V, E) be a directed graph and suppose V = {v1 , . . . , vm } and
E = {e1 , . . . , en }. The solution to the maximum flow problem is bounded above by the
minimal cut capacity.
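Proof. Let $(V_1, V_2)$ be the cut with minimal capacity. Consider the following solution to the dual problem:
$$w_i^* = \begin{cases} 0 & v_i \in V_1\\ 1 & v_i \in V_2 \end{cases} \tag{8.13}$$
and
$$h_k^* = \begin{cases} 1 & e_k = (v_i, v_j) \text{ with } v_i \in V_1 \text{ and } v_j \in V_2\\ 0 & \text{else} \end{cases} \tag{8.14}$$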
It is clear this represents a feasible solution to the dual problem. Thus by the weak duality part of the strong duality theorem (Theorem 7.37) the objective function value:
$$\sum_k u_k h_k^* \tag{8.15}$$
is an upper bound for the primal problem. But this is just the capacity of the cut with the smallest capacity. This completes the proof.
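With $h^*$ from (8.14), the dual objective (8.15) is just the total capacity of the edges running from $V_1$ to $V_2$. Below is a minimal Python sketch that computes this quantity and finds the minimum cut by trying every $V_1$ that contains the source but not the sink. The brute-force search is our own illustration and is exponential, so it only suits tiny graphs. The function names and the network are hypothetical.

```python
from itertools import combinations

def cut_capacity(edges, V1):
    """Sum of u_k over edges e_k directed from V1 to its complement V2,
    i.e. the dual objective (8.15) with h* chosen as in (8.14)."""
    return sum(cap for src, dst, cap in edges if src in V1 and dst not in V1)

def min_cut_brute_force(num_vertices, edges, s, t):
    """Try every V1 containing s but not t (exponential; tiny graphs only)."""
    others = [v for v in range(num_vertices) if v not in (s, t)]
    best = None
    for r in range(len(others) + 1):
        for chosen in combinations(others, r):
            V1 = {s, *chosen}
            cap = cut_capacity(edges, V1)
            if best is None or cap < best[0]:
                best = (cap, V1)
    return best

edges = [(0, 1, 2), (0, 2, 2), (1, 2, 1), (1, 3, 1), (2, 3, 3)]  # hypothetical
print(min_cut_brute_force(4, edges, 0, 3))
```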
Lemma 8.15. In any optimal solution to Problem 8.4 every directed path from v1 to vm
must have at least one edge at capacity.
Proof. Note first that Problem 8.4 is bounded above by the capacity of the minimal cut, as shown in Lemma 8.14. Since the zero flow is a feasible solution, we know from Theorem 7.22 that there is at least one optimal solution to Problem 8.4, because the problem can be neither unbounded nor infeasible.
Consider any optimal solution to Problem 8.5. Then it corresponds to some optimal so-
lution to the primal problem and these solutions satisfy the Karush-Kuhn-Tucker conditions.
We show that in this primal solution, along each path from v1 to vm in G at least one edge
must have flow equal to its capacity. To see this note that for any edge that does not carry
its capacity (that is xk < uk ) we must have hk = 0 (to ensure complementary slackness).
Suppose this path has vertices $(u_1, u_2, \dots, u_s)$ with $v_1 = u_1$ and $v_m = u_s$. If there is some path from $v_1$ to $v_m$ that does not carry its capacity, then we have the following requirements:
$$\begin{aligned} w_s &> w_1\\ w_1 &\ge w_2\\ &\ \ \vdots\\ w_{s-1} &\ge w_s \end{aligned}$$
Theorem 8.16. Let G = (V, E) be a directed graph and suppose V = {v1 , . . . , vm } and
E = {e1 , . . . , en }. There is at least one cut (V1 , V2 ) so that the flow from v1 to vm is equal to
the capacity of the cut (V1 , V2 ).
Proof. Denote one such solution to Problem 8.4 as (x∗ , f ∗ ); we now know such a solution
exists. By Lemma 8.15, we know that in this solution every directed path from v1 to vm
must have at least one edge at capacity. From each path from v1 to vm select an edge that
is at capacity in such a way that we minimize the total sum of the capacities of the chosen
edges. Denote this set of edges $E'$.
If $E'$ is not yet an edge cut in the underlying graph of $G$, then there are some paths from $v_1$ to $v_m$ in the underlying graph of $G$ that are not directed paths from $v_1$ to $v_m$. In each such path, there is at least one edge directed from $v_m$ toward $v_1$. Choose one edge from each of these paths directed from $v_m$ to $v_1$ to minimize the total cardinality of edges chosen and add these edges to $E'$. (See Figure 8.1.)
Define V1 and V2 as follows:
$$\begin{aligned} V_1 &= \{v : \text{there is a simple path from } v_1 \text{ to } v \text{ in the underlying graph of } G - E'\}\\ V_2 &= \{v : \text{there is a path from } v \text{ to } v_m \text{ in the underlying graph of } G - E'\} \end{aligned}$$
This construction is illustrated in Figure 8.1.
[Figure 8.1: the vertex sets V1 and V2, the edges at capacity, vertices 1 and m, and the imaginary arc carrying flow f.]
Claim 1. Every vertex is either in $V_1$ or $V_2$ using the definition of $E'$, and thus the set $E' = (V_1, V_2)$ is an edge cut in the underlying graph of $G$.
Proof. See Exercise 73.
Suppose $E' = \{e_{s_1}, \dots, e_{s_l}\}$.
Claim 2. If there is some edge ek with source in V2 and destination in V1 , then xk = 0.
Proof. If $x_k \neq 0$, we could reduce this flow to zero and increase the net flow from $v_1$ to $v_m$ by adding this flow to $f$. If flow cannot reach $v_1$ along $e_k$ (illustrated by the middle path in Figure 8.1), then flow conservation ensures it must be equal to zero.
Claim 3. The total flow from $v_1$ to $v_m$ must be equal to the capacity of the edges in $E'$ that have source in $V_1$ and destination in $V_2$.
Proof. We’ve established that if there is some edge $e_k$ with source in $V_2$ and destination in $V_1$, then $x_k = 0$. Thus, the flow from $v_1$ to $v_m$ must all traverse edges leaving $V_1$ and entering $V_2$. Thus, the flow from $v_1$ to $v_m$ must be equal to the capacity of the cut $E' = (V_1, V_2)$.
Claim 3 establishes that the flow f ∗ must be equal to the capacity of a cut (V1 , V2 ). This
completes the proof of the theorem.
Corollary 8.17 (Max Flow / Min Cut Theorem). Let G = (V, E) be a directed graph
and suppose V = {v1 , . . . , vm } and E = {e1 , . . . , en }. Then the maximum flow from v1 to vm
is equal to the capacity of the minimum cut separating v1 from vm .
Proof. By Theorem 8.16, if (x∗ , f ∗ ) is a maximum flow in G from v1 to vm , then there
is a cut (V1 , V2 ) so that the capacity of this cut is equal to f ∗ . Since f ∗ is bounded above by
the capacity of the minimal cut separating v1 from vm , the cut constructed in the proof of
Theorem 8.16 must be a minimal capacity cut. Thus, the maximum flow from v1 to vm is
equal to the capacity of the minimum cut separating v1 from vm .
Exercise 73. Prove Claim 1 in the proof of the Theorem 8.16.
4. An Algorithm for Finding Optimal Flow
Remark 8.18. The proof of the Max Flow / Min Cut theorem we presented uses a very non-standard proof technique. Most techniques are constructive; that is, they specify an algorithm for generating a maximum flow and then show that this maximum flow must be equal to the capacity of the minimal cut. In this section, we’ll develop this algorithm and show that it generates a maximum flow; then, as a result of the Max Flow / Min Cut theorem, this maximum flow must be equal to the capacity of the minimum cut.
Definition 8.19 (Augment). Let $G = (V, E)$ be a directed graph and suppose $V = \{v_1, \dots, v_m\}$ and $E = \{e_1, \dots, e_n\}$ and let $x$ be a feasible flow in $G$. Consider a simple path $p = (v_1, e_1, \dots, e_l, v_m)$ in the underlying graph of $G$ from $v_1$ to $v_m$. The augment of $p$ is the quantity:
$$\min_{k \in \{1, \dots, l\}} \begin{cases} u_k - x_k & \text{if the edge } e_k \text{ is directed toward } v_m\\ x_k & \text{else} \end{cases} \tag{8.16}$$
Definition 8.20 (Augmenting Path). Let G = (V, E) be a directed graph and suppose
V = {v1 , . . . , vm } and E = {e1 , . . . , en } and let x be a feasible flow in G. A simple path p
in the underlying graph of G from v1 to vm is an augmenting path if its augment is non-zero.
In this case we say that flow x has an augmenting path.
Example 8.21. An example of augmenting paths is shown in Figure 8.2.
[Figure 8.2: three flows on a four-vertex network, with each edge labeled by its flow x_k and capacity u_k.]
Figure 8.2. Two flows with augmenting paths and one with no augmenting paths are illustrated.
An augmenting path is simply an indicator that more flow can be pushed from vertex $v_1$ to vertex $v_m$. For example, in the flow on the bottom left of Figure 8.2 we could add an additional unit of
flow on the edge $(v_1, v_3)$. This one unit could flow along edge $(v_3, v_2)$ and then along edge $(v_2, v_4)$. Augmenting paths that are augmenting solely because of positive flow on a backward edge (an edge directed from $v_m$ toward $v_1$ along the path) can also be used to increase the net flow from $v_1$ to $v_m$ by removing flow along the backward edge.
Definition 8.22 (Path Augment). If $p$ is an augmenting path in $G$ with augment $\Delta$, then by augmenting $p$ by $\Delta$ we mean adding $\Delta$ to the flow in each edge directed from $v_1$ toward $v_m$ and subtracting $\Delta$ from the flow in each edge directed from $v_m$ toward $v_1$.
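Definitions 8.19 and 8.22 can be sketched in a few lines of Python. Here a path is assumed to be given as a list of hypothetical `(u_k, x_k, forward)` triples, where `forward` is True when $e_k$ points toward $v_m$ along the path. The helper names `augment` and `augment_path` are our own.

```python
def augment(path):
    """Equation 8.16: the smallest spare capacity on forward edges and
    the smallest flow on backward edges along the path."""
    return min(u - x if forward else x for u, x, forward in path)

def augment_path(path, delta):
    """Definition 8.22: add delta on forward edges, subtract it on backward ones."""
    return [(u, x + delta if forward else x - delta, forward)
            for u, x, forward in path]

# Hypothetical path: forward edge at 1/2, backward edge carrying 1 unit, forward edge at 2/3.
path = [(2, 1, True), (3, 1, False), (3, 2, True)]
delta = augment(path)            # min(2 - 1, 1, 3 - 2) = 1
print(delta, augment_path(path, delta))
```

After augmenting, the two forward edges are at capacity and the backward edge is empty, so the new augment along this path is 0.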
Example 8.23. If we augment the augmenting paths illustrated in Example 8.21, the
resulting flows are illustrated in Figure 8.3.
[Figure 8.3: the flows of Figure 8.2 after augmentation, with each edge labeled by flow / capacity.]
Figure 8.3. The result of augmenting the flows shown in Figure 8.2.
Remark 8.24. The next algorithm, sometimes called the Edmonds-Karp Algorithm, will find a maximum flow in a network by discovering and removing all augmenting paths.
Maximum Flow Algorithm
Input: (G, u) a weighted directed graph with G = (V, E), V = {v1 , . . . , vm }, E = {e1 , . . . , en }
Initialize: x = 0 {Initialize all flow variables to zero.}
(1) Find the shortest augmenting path p in G using the current flow x.
(2) if no augmenting path exists then STOP
(3) else augment the flow along path p to produce a new flow x
(4) end if
(5) GOTO (1)
Output: x∗
Algorithm 10. Maximum Flow Algorithm
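Below is a Python sketch of Algorithm 10. It is a minimal illustration, not the implementation analyzed in [KV08]. It assumes vertices numbered $0, \dots, m-1$ and edges given as hypothetical (source, destination, capacity) triples. As Remark 8.33 describes, it finds a shortest augmenting path by breadth-first search over the underlying graph. Forward edges can be crossed when they have spare capacity, and backward edges can be crossed when they carry positive flow.

```python
from collections import deque

def max_flow(num_vertices, edges, s, t):
    """Shortest-augmenting-path maximum flow; returns (flow value, edge flows)."""
    x = [0] * len(edges)                      # Initialize: x = 0
    # In the underlying graph, e_k is walked forward from its source (+1)
    # or backward from its destination (-1).
    adj = [[] for _ in range(num_vertices)]
    for k, (src, dst, _) in enumerate(edges):
        adj[src].append((k, +1))
        adj[dst].append((k, -1))

    while True:
        # Step (1): breadth-first search for a shortest augmenting path.
        parent = [None] * num_vertices
        parent[s] = (None, 0)                 # marks s as visited
        queue = deque([s])
        while queue and parent[t] is None:
            v = queue.popleft()
            for k, d in adj[v]:
                src, dst, cap = edges[k]
                w = dst if d == +1 else src
                slack = cap - x[k] if d == +1 else x[k]
                if parent[w] is None and slack > 0:
                    parent[w] = (k, d)
                    queue.append(w)
        # Step (2): no augmenting path, so the flow is maximal (Lemma 8.27).
        if parent[t] is None:
            break
        # Step (3): trace the path back from t, compute its augment, apply it.
        path, w = [], t
        while w != s:
            k, d = parent[w]
            path.append((k, d))
            src, dst, _ = edges[k]
            w = src if d == +1 else dst
        delta = min(edges[k][2] - x[k] if d == +1 else x[k] for k, d in path)
        for k, d in path:                     # path augment (Definition 8.22)
            x[k] += delta if d == +1 else -delta

    out_of_s = sum(xk for (src, _, _), xk in zip(edges, x) if src == s)
    into_s = sum(xk for (_, dst, _), xk in zip(edges, x) if dst == s)
    return out_of_s - into_s, x

# Hypothetical network: vertex 0 plays the role of v1 and vertex 3 of vm.
edges = [(0, 1, 2), (0, 2, 2), (1, 2, 1), (1, 3, 1), (2, 3, 3)]
print(max_flow(4, edges, 0, 3))
```

With integer capacities every augment is an integer, so the returned flows are integral, in line with Corollary 8.32.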
Lemma 8.27. Let $G = (V, E)$ be a directed graph and suppose $V = \{v_1, \dots, v_m\}$ and $E = \{e_1, \dots, e_n\}$, and let $x^*$ be a feasible flow in $G$. Then $x^*$ is optimal if and only if $x^*$ does not have an augmenting path.
Proof. Our proof is by abstract example. Without loss of generality, consider Figure 8.5.
[Figure 8.5: an augmenting path from vertex 1 to vertex m, with edge flows f1, ..., f6.]
Figure 8.5. Illustration of the impact of an augmenting path on the flow from $v_1$ to $v_m$.
Suppose there is an augmenting path (as illustrated in Figure 8.5). If the flow $f_1$ is below capacity $c_1$, and this is the augment, then we can increase the total flow along this path by increasing the flow on each edge in the direction of $v_m$ (from $v_1$) by $\Delta = c_1 - f_1$ and decreasing the flow on each edge in the direction of $v_1$ (from $v_m$) by $\Delta$. Flow conservation
is preserved since we see that:
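$$f_1 + f_2 - f_5 = 0 \implies (f_1 + \Delta) + (f_2 - \Delta) - f_5 = 0 \tag{8.17}$$
$$f_3 + f_2 - f_4 = 0 \implies (f_3 + \Delta) + (f_2 - \Delta) - f_4 = 0 \tag{8.18}$$
and:
$$f = f_5 + f_6 + f_3 \implies f + \Delta = f_5 + f_6 + (f_3 + \Delta) \tag{8.19}$$
$$f = f_4 + f_6 + f_1 \implies f + \Delta = f_4 + f_6 + (f_1 + \Delta) \tag{8.20}$$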
The same is true if $f_2 > 0$ and $f_2$ is the augment. In this case, we can increase the total flow by decreasing the flow on each edge in the direction of $v_1$ (from $v_m$) by $\Delta = f_2$ and increasing the flow on each edge in the direction of $v_m$ (from $v_1$) by $\Delta$. Thus if an augmenting path exists, the flow cannot be maximal.
Conversely, suppose we do not have a maximal flow. Then by the Max Flow / Min Cut
theorem, the flow across the minimal edge cut is not equal to its capacity. Thus there is
some edge in the minimal edge cut whose flow can be increased. Thus, there must be an
augmenting path. This completes the proof.
Remark 8.28. The proof of Lemma 8.27 also illustrates that Algorithm 10 maintains
flow feasibility as it is executed.
Remark 8.29. The proof of the completeness of Algorithm 10 is a bit complicated.
Therefore, we simply state it without offering formal proof. The interested reader should
consult [KV08].
Theorem 8.30. Algorithm 10 terminates in $O(mn^2)$ time.
Theorem 8.31. At the completion of Algorithm 10, there are no augmenting paths and
the flow x∗ is feasible.
Proof. To see that x∗ is feasible, note that we never increase the flow along any path
by more than the maximum amount possible to ensure feasibility in all flows and a flow is
never decreased beyond zero. This is ensured in our definition of augment.
To prove optimality, suppose at the completion of Algorithm 10 there was an augmenting
path p. If we execute Line 1 of the algorithm, we will detect that augmenting path. Thus,
no augmenting path exists at the conclusion of Algorithm 10 and by Lemma 8.27 x∗ is
optimal.
Corollary 8.32 (Integral Flow Theorem). If the capacities of a network are all integers,
then there exists an integral maximum flow.
Remark 8.33. It is worth noting that the original form of Algorithm 10 did not specify which augmenting path to find. This leads to a pathological condition in which the algorithm, when some capacities are irrational, may fail to terminate. This is detailed in Ford and Fulkerson’s original paper and more recently in [Zwi95]. The shortest augmenting path can be found using a breadth-first search on the underlying graph. This breadth-first search is what leads to the proof of Theorem 8.30.
Exercise 74. Prove the Integral Flow Theorem.
Remark 8.34. The problem of finding a maximum flow (or minimum cut) is still very
much an area of interest for researchers with new results being published as recently as the
late 90’s. See [KV08] for details.
[Figure 8.6: a flow network with initial vertex s, game vertices for pairings of NY, BLT, BOS and TOR (e.g., NY-BOS), team vertices NY, BLT, BOS and TOR, and final vertex t. The team-to-t edges carry capacities 76 − 75 = 1, 76 − 71 = 5, 76 − 69 = 7 and 76 − 63 = 13.]
Figure 8.6. Games to be played flow from an initial vertex s (playing the role of v1). From here, they flow into the actual game events illustrated by vertices (e.g., NY-BOS for New York vs. Boston). Wins and losses occur and these wins flow across the infinite capacity edges to team vertices. From here, the games all flow to the final vertex t (playing the role of vm).
left to be played between the two teams in the game vertex. This makes sense: we cannot assign more games to that edge than can be played. Edges crossing from the game vertices to the team vertices have unbounded capacity; the values we assign them will be bounded anyway by the number of games the team plays in the game vertices. Edges going from the team vertices to the final vertex t have capacity equal to the number of games Detroit can win minus the number of games already won by the team whose vertex the edge leaves. This tells us that, for Detroit to come out on top (or with more wins than any other team), the number of wins assigned to a team cannot be greater than the number of wins Detroit can amass (at best). Clearly, if the maximum flow in this graph fully saturates the edges leaving s, then there is an assignment of games so that Detroit can still finish first. On the other hand, if the edges connecting the team vertices to t form the minimum cut and the edges leaving s are not saturated, then there is no way to assign the remaining games so that Detroit wins more games than any other team (or at best ties). The maximum flow in this example is shown in Figure 8.7. From this figure, we see that Detroit cannot make the playoffs: there is no way to assign all remaining games so that Detroit has the most wins of any team (or at least ties). This is evident since the edges leaving s are not saturated.
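The construction above can be sketched in Python. This is a minimal sketch under our own assumptions, not code from these notes: `edmonds_karp` computes a maximum flow by breadth-first augmenting paths (the Edmonds-Karp method named in Figure 8.7), and `can_still_win` is a hypothetical helper that builds the game/team network for one team. The team names and standings in the usage lines are invented for illustration and are not the data of Figure 8.6.

```python
from collections import deque

INF = float("inf")

def edmonds_karp(capacity, s, t):
    """Return the maximum s-t flow value.

    capacity maps each vertex u to a dict {v: capacity of (u, v)}; INF is allowed.
    """
    # Residual capacities, including reverse arcs that start at 0.
    residual = {s: {}, t: {}}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search finds a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Walk back from t to collect the path, then push the bottleneck amount.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

def can_still_win(team, wins, left, games):
    """True if `team` can still finish with at least as many wins as every rival.

    wins[x]: games x has won; left[x]: games x still has to play;
    games[(a, b)]: games remaining between a and b.
    """
    best = wins[team] + left[team]          # team wins all its remaining games
    rivals = [x for x in wins if x != team]
    if any(wins[x] > best for x in rivals):
        return False                        # trivially eliminated
    capacity = {"s": {}}
    to_play = 0
    for (a, b), g in games.items():
        if team in (a, b) or g == 0:
            continue                        # team's own games are assumed won
        capacity["s"][(a, b)] = g           # s -> game vertex
        capacity[(a, b)] = {a: INF, b: INF} # game vertex -> team vertices
        to_play += g
    for x in rivals:
        capacity.setdefault(x, {})["t"] = best - wins[x]   # team vertex -> t
    # The team survives exactly when the flow saturates every edge leaving s.
    return edmonds_karp(capacity, "s", "t") == to_play

wins = {"W": 8, "X": 10, "Y": 10, "Z": 10}
left = {"W": 2, "X": 1, "Y": 1, "Z": 2}
games = {("X", "Y"): 1, ("W", "Z"): 2}
print(can_still_win("W", wins, left, games))  # False
print(can_still_win("X", wins, left, games))  # True
```

Team W is eliminated even though no rival currently has more than W's best possible 10 wins: the single X-vs-Y game must give one of them an 11th win, and the flow network detects this because the edge out of s cannot be saturated.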
Remark 8.37. Consider a score table for a team sport with n teams and with playoff rules like those discussed in Remark 8.35. We will refer to P(k) as the maximum flow problem constructed for team k (k = 1, . . . , n) as in Example 8.36.
Figure 8.7. Optimal flow was computed using the Edmonds-Karp algorithm. Notice that a minimum capacity cut consists of the edges entering t and not all edges leaving s are saturated. Detroit cannot make the playoffs.
Note that each arc of the form (v′, v″) corresponds to a vertex v in G. Thus edge disjoint paths in the constructed graph correspond to vertex disjoint paths in the original graph. The result follows from Menger’s First Theorem.
Definition 8.42 (Matching). A matching in a graph G = (V, E) is a subset M of E such that no two edges in M share a vertex in common. A matching is maximal if there is no other matching in G properly containing it. A matching has maximum cardinality if there is no other matching of G with more edges. A maximal matching is perfect if every vertex is incident to an edge in the matching.
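The distinction between maximal and maximum matchings can be checked on a small example. The following Python sketch is illustrative only; the helper names `is_matching` and `greedy_maximal_matching` are our own. Scanning the edges greedily always yields a maximal matching, but whether it is maximum depends on the order in which the edges are scanned.

```python
def is_matching(edges):
    """True if no two edges in `edges` share an endpoint."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

def greedy_maximal_matching(edges):
    """Add each edge whose endpoints are both still unmatched.

    The result is maximal (no further edge can be added) but need not be maximum.
    """
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# The path 1-2-3-4, with its edges listed in two different orders.
bad_order = [(2, 3), (1, 2), (3, 4)]
good_order = [(1, 2), (2, 3), (3, 4)]
print(greedy_maximal_matching(bad_order))   # [(2, 3)]: maximal, not maximum
print(greedy_maximal_matching(good_order))  # [(1, 2), (3, 4)]: perfect
```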
Example 8.43. We illustrate a maximal matching and a perfect matching in Figure 8.8.
Figure 8.8. A maximal matching and a perfect matching. Note no other edges can
be added to the maximal matching and the graph on the left cannot have a perfect
matching.
In the remainder of the proof, s will be our source (v1) and t will be our sink (vm). Consider a maximum (in cardinality) set P of vertex disjoint paths from s to t (we may think of G as being directed from vertices in V1 toward vertices in V2). Each path p ∈ P has the form (s, e1, u1, e2, u2, e3, t) with u1 ∈ V1 and u2 ∈ V2. It is easy to see that we can construct a matching M(P) from P: for each path p we introduce the edge e2 = {u1, u2} into our matching M(P). The fact that the paths in P are vertex disjoint implies there is a one-to-one correspondence between elements of M(P) and elements of P. Thus, |P| ≤ |M*| since we assumed that M* was a maximum cardinality matching.
Now consider a smallest set J ⊂ V whose deletion destroys all paths from s to t in N. By way of contradiction, suppose that |J| < |C*|. Since we assumed that C* was a minimum vertex cover, it follows that J is not itself a vertex cover of G and thus G − J contains at least one edge of G. But this edge must connect a vertex in V1 to a vertex in V2 because G is bipartite. Thus, N − J has a path from s to t, which is a contradiction. Thus, |C*| ≤ |J|. Moreover, |M*| ≤ |C*|, because every vertex cover must contain at least one endpoint of each edge of M* and no two edges of M* share an endpoint. Thus we have the inequalities |P| ≤ |M*| ≤ |C*| ≤ |J|. But by Menger’s Second Theorem, minimizing |J| and maximizing |P| implies that |J| = |P| and thus |M*| = |C*|. This completes the proof.
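König’s Theorem can be checked numerically on a small bipartite graph. The sketch below is an illustration under our own assumptions, not part of the proof: `max_bipartite_matching` grows a matching by augmenting paths (a standard technique often attributed to Kuhn), and `min_vertex_cover_size` finds a minimum cover by brute force, which is only practical for tiny graphs. The example graph is hypothetical.

```python
from itertools import combinations

def max_bipartite_matching(left, adj):
    """Maximum matching in a bipartite graph by repeated augmenting-path search.

    left: vertices of V1; adj[u]: neighbours (in V2) of u in V1.
    Returns the matching as a set of (u, v) pairs with u in V1.
    """
    match_of = {}  # vertex of V2 -> its partner in V1

    def augment(u, visited):
        for v in adj.get(u, []):
            if v in visited:
                continue
            visited.add(v)
            # v is free, or v's current partner can be re-matched elsewhere.
            if v not in match_of or augment(match_of[v], visited):
                match_of[v] = u
                return True
        return False

    for u in left:
        augment(u, set())
    return {(u, v) for v, u in match_of.items()}

def min_vertex_cover_size(vertices, edges):
    """Size of a smallest vertex cover, by trying subsets in increasing size."""
    vertices = list(vertices)
    for k in range(len(vertices) + 1):
        for cover in combinations(vertices, k):
            chosen = set(cover)
            if all(u in chosen or v in chosen for u, v in edges):
                return k

adj = {"a": [1], "b": [1], "c": [1, 2]}
edges = [(u, v) for u in adj for v in adj[u]]
M = max_bipartite_matching(["a", "b", "c"], adj)
print(len(M), min_vertex_cover_size(["a", "b", "c", 1, 2], edges))  # 2 2
```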
¹ Thanks to S. Shekhar for pointing out a weakness in the original proof I used.
[Figure 8.9 panels: Minimal Covering (Cardinality = 2) and Maximal Matching (Cardinality = 1).]
Figure 8.9. In general, the cardinality of a maximum matching is not the same as the cardinality of a minimum vertex covering, though the inequality that the cardinality of the maximum matching is at most the cardinality of the minimum covering does hold.
Remark 8.49. König’s Theorem does not hold for graphs that are not bipartite. To see this, consider K3 (see Figure 8.9). In this case, the general inequality that the cardinality of the maximum matching is at most the cardinality of the minimum covering does hold (and this will always hold), but we do not have equality.
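Remark 8.50. Let G = (V, E) be a (bipartite) graph with V = {v1, . . . , vm} and E = {e1, . . . , en}. The minimal vertex covering problem for G can be written as the integer programming problem:

$$
\begin{aligned}
\min\ & x_1 + \cdots + x_m \\
\text{s.t.}\ & x_i + x_j \geq 1 \quad \forall \{v_i, v_j\} \in E \\
& x_i \in \{0, 1\} \quad \forall i = 1, \ldots, m
\end{aligned}
\tag{8.21}
$$

If A is the incidence matrix for G, then this problem can be written in matrix notation as:

$$
\begin{aligned}
\min\ & \mathbf{1}^T \mathbf{x} \\
\text{s.t.}\ & A^T \mathbf{x} \geq \mathbf{1} \\
& \mathbf{x} \in \{0, 1\}^m
\end{aligned}
\tag{8.22}
$$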
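For a graph as small as K3, Problem 8.21 can be solved by enumerating every 0-1 vector. The Python sketch below does this, alongside the usual 0-1 formulation of maximum matching (choose as many edges as possible so that each vertex meets at most one chosen edge). It is an illustration we add here, not a solution to the exercises that follow; it also checks that the fractional point x = (1/2, 1/2, 1/2) is feasible once the 0-1 requirement is dropped.

```python
from fractions import Fraction
from itertools import product

def min_cover_01(n, edges):
    """Optimal value of Problem 8.21 by brute force over x in {0,1}^n."""
    return min(sum(x) for x in product((0, 1), repeat=n)
               if all(x[i] + x[j] >= 1 for i, j in edges))

def max_matching_01(n, edges):
    """Max number of chosen edges such that every vertex meets at most one of them."""
    best = 0
    for y in product((0, 1), repeat=len(edges)):
        load = [0] * n
        for chosen, (i, j) in zip(y, edges):
            load[i] += chosen
            load[j] += chosen
        if max(load) <= 1:
            best = max(best, sum(y))
    return best

K3 = [(0, 1), (0, 2), (1, 2)]
print(min_cover_01(3, K3), max_matching_01(3, K3))             # 2 1
half = [Fraction(1, 2)] * 3
print(all(half[i] + half[j] >= 1 for i, j in K3), sum(half))  # True 3/2
```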
Exercise 78. Compute the dual problem for Problem 8.21. [Hint: Use slack and surplus variables to construct a problem with only equality constraints (except for x ≥ 0). Compute the dual problem. You will obtain an objective function that looks like max w1 + · · · + wn + u1 + u2 + · · · + um and some constraints.]
Exercise 79. Use the dual problem you constructed in Exercise 78, along with a restriction to 0-1 variables, to show that for K3 the cardinality of the maximum matching cannot be greater than 1 while the cardinality of the minimum covering is 2. Thus, illustrate that strong duality does not hold for integer programming problems.
Exercise 80. When G is a bipartite graph, argue that there is an integer solution of the dual problem you found in Exercise 78 (even if you don’t force it to have 0-1 variables). Finally, prove that under the assumptions we’ve given in this chapter (i.e., that no isolated vertex exists in a bipartite graph G) the cardinality of a minimum vertex covering is equal to the cardinality of a maximum matching. [Hint: Argue an optimal integer solution exists to the relaxation of both problems. The result follows from strong duality.]
Remark 8.51. There are many other applications of Linear Programming (and Integer
Programming) to the study of graphs. Please consult [BJS04, KV08, PS98, WN99] for
details.
CHAPTER 9
The study of random graphs presupposes a little knowledge of probability theory. For
those readers not familiar with probability at all, Chapter 1 of [Gri14] contains a small
introduction to discrete probability spaces, which should provide sufficient background for the
material on Random Graphs. The material can be obtained from http://www.personal.psu.edu/cxg286/Math486.pdf. We provide one definition, not contained in Chapter 1 of [Gri14], that is critical for the rest of this chapter.
Definition 9.1 (Bernoulli Random Variable). A Bernoulli Random Variable is a random
variable underlying a probability space with exactly 2 outcomes 0 and 1 and for which the
probability of 1 occurring is p.
Remark 9.2. It is easy to see that if X is a Bernoulli random variable, then the expected value of X is E(X) = p = Pr(X = 1). Furthermore, given a set of n Bernoulli random variables, each with probability p of being 1, the expected number of these variables that will be 1 is np (by linearity of expectation).
Remark 9.3. Random graphs are misnamed, as it makes one think of a graph that is
somehow random. In reality, the term random graph usually refers to a family of graphs,
which serves as a discrete probability space on which we derive a probability measure that
assigns to each graph some probability that it occurs.
1. Bernoulli Random Graphs
Definition 9.4 (Bernoulli Random Graph). Let n ∈ Z+ and let p ∈ (0, 1). Then G(n, p) is the Bernoulli Family of Random Graphs, which is the discrete probability space so that:
(1) The sample space is the set of all graphs with n vertices.
(2) The probability that any two vertices chosen at random have an edge between them is p and this probability is independent of all other edges.
(3) Therefore, for any given graph G = (V, E) in G(n, p) with |E| = m, the probability assigned to G is:

$$p^m (1 - p)^{\binom{n}{2} - m}$$
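Definition 9.4 translates directly into a sampler: flip an independent p-biased coin for each of the $\binom{n}{2}$ vertex pairs. The Python sketch below (function names are our own) samples one graph and evaluates the probability in item (3). For n = 10 and p = 0.5 every graph receives probability $0.5^{45}$, which matches Example 9.6.

```python
import random
from itertools import combinations

def sample_gnp(n, p, rng):
    """One graph from G(n, p): each pair {u, v} is an edge independently with prob. p."""
    return {pair for pair in combinations(range(n), 2) if rng.random() < p}

def gnp_probability(n, p, edges):
    """Probability p^m (1 - p)^(C(n,2) - m) of a graph with m = len(edges) edges."""
    m = len(edges)
    total_pairs = n * (n - 1) // 2
    return p ** m * (1 - p) ** (total_pairs - m)

rng = random.Random(0)          # fixed seed so the sample is reproducible
G = sample_gnp(10, 0.5, rng)
print(len(G), gnp_probability(10, 0.5, G) == 0.5 ** 45)   # edge count, then True
```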
Remark 9.5. Even though they are called Bernoulli Random Graphs, they were invented by E. N. Gilbert [Gil59]. Gilbert went on to study other classes of random graphs that are more useful for modeling realistic phenomena in communications [Gil61].
Example 9.6. We illustrate three graphs generated randomly where n = 10 and p = 0.5. That means that any two vertices have a 50% chance of having an edge between them. The first two graphs each have 21 edges and therefore the probability of each of these graphs is $0.5^{21} \times 0.5^{24}$, while the probability of the third graph is $0.5^{24} \times 0.5^{21}$ because it has 24 edges. Of course these two values are identical: both equal $0.5^{45}$, since there are $\binom{10}{2} = 45$ possible edges.
Figure 9.1. Three random graphs (a), (b) and (c) in the same random graph family G(10, 1/2). The first two graphs, which have 21 edges, have probability $0.5^{21} \times 0.5^{24}$. The third graph, which has 24 edges, has probability $0.5^{24} \times 0.5^{21}$.
Theorem 9.7. Let 2 ≤ k ≤ n. Then the probability that a graph G ∈ G(n, p) has a set of k independent vertices is at most:

$$\binom{n}{k} (1 - p)^{\binom{k}{2}} \tag{9.1}$$

Proof. For any set of k vertices, the probability that they are independent is simply the probability that none of the pairs of vertices are connected by an edge. There are $\binom{k}{2}$ such pairs and each has a probability of (1 − p) of not being connected by an edge; since edges are independent, the probability that the set is independent is $(1 - p)^{\binom{k}{2}}$. There are $\binom{n}{k}$ subsets of the n vertices each containing k elements. Therefore, the probability that at least one of these sets is an independent set is at most the sum of their individual probabilities:

$$\binom{n}{k} (1 - p)^{\binom{k}{2}}$$

Thus we have proved:

$$\Pr(\alpha(G) \geq k) \leq \binom{n}{k} (1 - p)^{\binom{k}{2}} \tag{9.2}$$

where α is the independence number of the graph G.
Remark 9.8. Observe that Equation 9.2 is a weak bound in that if we choose k much smaller than n and fix p, then the bound exceeds 1. This is because, while it is true that the edges within any one set U of k vertices considered in the proof are independent, the events “U is independent” for the $\binom{n}{k}$ possible sets U are not independent of one another (the sets may share vertices, and hence potential edges). Thus we are overestimating the probability when we sum them in the proof.
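The weakness described in Remark 9.8 is easy to see numerically. This small Python sketch (the function name is our own) evaluates the right-hand side of Equation 9.2 for n = 10 and p = 0.5. For k = 2 the bound is 22.5, which tells us nothing, while for k = 5 it drops to about 0.246.

```python
from math import comb

def independent_set_bound(n, p, k):
    """Right-hand side of Equation 9.2: C(n, k) * (1 - p)^C(k, 2)."""
    return comb(n, k) * (1 - p) ** comb(k, 2)

for k in range(2, 7):
    print(k, independent_set_bound(10, 0.5, k))
```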
Exercise 81. Find a limiting expression for the probability that G ∈ G(n, p) has a clique
of size k with 2 ≤ k ≤ n. [Hint: Remember, a clique is the exact opposite of an independent
set.]
Definition 9.9 (Almost Sure Properties). Let P be a statement about a graph G. (For example, P might be the statement “G is a tree.”) A property P is said to hold almost surely (abbreviated a.s.) for graphs in the family G(n, p) if:

$$\lim_{n \to \infty} \Pr\left(P \text{ holds for an arbitrary graph } G \in G(n, p)\right) = 1$$
Remark 9.10. This notion of almost surely is a funny one, but there’s an easy way to get used to it. As n grows large, we are really absorbing more and more of the graphs that are possible (that is, structures G = (V, E) with |V| some finite integer). If a property holds with probability 1 as n goes to infinity, it means that for almost every graph that property must hold because no matter how large an n we choose, there are always more graphs with more than n vertices than there are with fewer than n vertices. Thus, almost surely should be interpreted over the set of all graphs with a finite (but unbounded) number of vertices.
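Lemma 9.11. Let p ∈ (0, 1) and let G be a graph in G(n, p). Then almost surely every pair of vertices v, u in G is connected by a path of length 2.
Proof. Let G = (V, E), fix two vertices v and u, and consider w ∈ V \ {v, u}. The probability that both edges {v, w} and {u, w} are in G is $p^2$. Thus the probability that at least one of these edges is absent is $1 - p^2$. Over all n − 2 possible choices for w (which involve disjoint pairs of edges and are therefore independent), the probability that this occurs each time is:

$$\left(1 - p^2\right)^{n-2}$$

There are $\binom{n}{2}$ pairs {v, u}, so the probability that some pair is not connected by a path of length 2 is at most $\binom{n}{2}\left(1 - p^2\right)^{n-2}$. This quantity approaches 0 as n approaches infinity (the exponential factor decays faster than the polynomial factor grows) and thus, a.s., every pair of vertices u and v is connected by a path of length 2.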
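The union bound used in the proof can be evaluated, and the lemma checked on one sample, with the Python sketch below. The function names and the choice n = 200, p = 0.5, seed 0 are our own assumptions for illustration. For small n (for example n = 10) the bound exceeds 1 and says nothing; it only becomes informative as n grows.

```python
import random
from itertools import combinations
from math import comb

def pair_failure_bound(n, p):
    """Upper bound C(n,2) (1 - p^2)^(n-2) on Pr(some pair has no common neighbour)."""
    return comb(n, 2) * (1 - p * p) ** (n - 2)

def all_pairs_have_common_neighbour(n, p, seed=0):
    """Sample G(n, p) and test whether every pair u, v has a path u-w-v of length 2."""
    rng = random.Random(seed)
    nbrs = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            nbrs[u].add(v)
            nbrs[v].add(u)
    return all(nbrs[u] & nbrs[v] for u, v in combinations(range(n), 2))

print(pair_failure_bound(10, 0.5), pair_failure_bound(200, 0.5))
print(all_pairs_have_common_neighbour(200, 0.5))
```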
Theorem 9.12. A graph G ∈ G(n, p) is almost surely connected.
Proof. This proof is immediate from the previous lemma: if every pair of vertices is joined by a path of length 2, then G is connected.
Theorem 9.13. Let p ∈ (0, 1), let G be a graph in G(n, p) and let H be an arbitrary fixed graph. Then the property “G has a subgraph isomorphic to H” holds a.s.
Proof. Suppose that H contains m vertices. Then partition the n vertices available in graphs in G(n, p) into m sets each with size k = ⌊n/m⌋. If there are vertices remaining, they may be added to an (m + 1)st partition and ignored. Suppose that H contains s edges. We’ll order the vertices in each partition and choose the ith element from each of these m partitions to be the vertices of H. (We will see that i will run from 1 to k.) With s edges, the probability that the edges of the graph H are present is exactly $p^s$ and therefore the probability that these edges are not all present is $1 - p^s$. Each of the k vertex m-tuples is independent of all the others because (by ordering the partitions) we do not repeat any pair of vertices that might form an edge. Thus, the probability that H is not spanned by any of these k m-tuples of vertices is:

$$\left(1 - p^s\right)^k = \left(1 - p^s\right)^{\lfloor n/m \rfloor}$$

Since p ∈ (0, 1) we have:

$$\lim_{n \to \infty} \left(1 - p^s\right)^{\lfloor n/m \rfloor} = 0 \tag{9.3}$$
Remark 9.23. Erdős-Rényi graphs were developed by P. Erdős and A. Rényi in 1959 while studying the interactions of probability theory and discrete mathematics [ER60, ER59]. Before giving an example of an Erdős-Rényi Family of Random Graphs, we require three lemmas, which will be useful.
Lemma 9.24. Consider the Path Graph Pn (n ≥ 2); that is, the graph on n vertices that is itself a path. The isomorphism type of Pn contains exactly n!/2 distinct graphs.
Proof. Consider, without loss of generality, the example graph shown in Figure 9.2. For n vertices, we could arrange these vertices in any of n! ways. However, (by way of example) the graph in which the vertices are arranged in order 1 to n with vertex i adjacent to vertex i − 1 and vertex i + 1 (save for vertices 1 and n, which are only adjacent to vertices 2 and n − 1 respectively) has an identical edge set to the graph with the vertices arranged in the order n to 1. Thus, to obtain the size of the isomorphism type of Pn, we must divide n! by 2 to remove duplicates. Thus the size of the isomorphism type of Pn is n!/2. This completes the proof.
Figure 9.2. A path graph with 4 vertices has exactly 4!/2 = 12 isomorphic graphs obtained by rearranging the order in which the vertices are connected.
Lemma 9.25. Let n ≥ 2. There are exactly n + 1 distinct graphs in the isomorphism type of Sn (the star graph with n leaves, which has n + 1 vertices).
Proof. By way of example, consider the graph S3 shown in Figure 9.3. We can choose any of its 4 vertices to be the center of the star and obtain a different edge relation. For Sn, which contains n + 1 vertices, there are n + 1 possible vertices that could act as the center of the star. Since n ≥ 2, the center is the unique vertex of degree n, so different choices of center give different edge sets. Thus there are n + 1 distinct graphs in the isomorphism class. This completes the proof.
Remark 9.26. In the previous two lemmas, we were interested in the number of distinct graphs in the isomorphism class. Here distinct means that the edge sets are different. In the case of the star graph shown in Figure 9.3 this means that the graph in which vertex 0 is the center is distinct from the star graph with vertex 3 at the center because the edge sets in these two instances would be:

E = {{0, 1}, {0, 2}, {0, 3}}    E′ = {{3, 0}, {3, 1}, {3, 2}}

Note that this is different from the number of automorphisms, which might be quite a bit higher because two distinct automorphisms might create the same edge set. (For example, in the figure, if we swap vertices 1 and 3 and leave all other vertices unchanged, this is an automorphism, but the edge structure has not changed.)
Figure 9.3. There are 4 graphs in the isomorphism class of S3, one for each possible center of the star.
Lemma 9.27. The number of distinct graphs in the isomorphism class of Kn is 1.
Exercise 83. Prove Lemma 9.27.
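Lemmas 9.24, 9.25 and 9.27 can be checked by brute force for small n: relabel the vertices in every possible way and count the distinct edge sets that result. The helper `distinct_labelings` below is a name of our own choosing; the approach costs n! steps, so it is only a sanity check, not a proof.

```python
from itertools import permutations

def distinct_labelings(n, edges):
    """Number of distinct edge sets obtained by permuting the labels 0..n-1 of a graph."""
    seen = set()
    for perm in permutations(range(n)):
        seen.add(frozenset(frozenset((perm[u], perm[v])) for u, v in edges))
    return len(seen)

path4 = [(0, 1), (1, 2), (2, 3)]                          # P4
star3 = [(0, 1), (0, 2), (0, 3)]                          # S3 (4 vertices)
k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]  # K4
print(distinct_labelings(4, path4), distinct_labelings(4, star3), distinct_labelings(4, k4))  # 12 4 1
```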
Example 9.28. This example appears in Chapter 11.3 of [GY05]. We just explain it a little more completely. Consider the random graph family G(5, 3). This family of graphs contains

$$\binom{\binom{5}{2}}{3} = \binom{10}{3} = 120$$

graphs.
Some of these graphs are isomorphic, however. In fact there are exactly 4 isomorphism classes contained in G(5, 3), which we illustrate through exhaustive enumeration. Consider the graph shown in Figure 9.4(a): this graph consists of an isolated vertex (in the figure, Vertex 5) and a copy of P4. There are 5 ways that the isolated vertex can be chosen and from Lemma 9.24 we know there are 4!/2 = 12 elements in the isomorphism class of P4. Thus there are 60 graphs isomorphic to the one in Figure 9.4(a) in G(5, 3).
Another type of graph we might consider is shown in Figure 9.4(b). This graph has an isolated vertex and a copy of the star graph S3. Again, there are 5 distinct ways to choose a vertex as the isolated vertex and by Lemma 9.25 there are 4 distinct graphs in the S3 isomorphism class. Thus there are 20 distinct graphs isomorphic to the one in Figure 9.4(b) in G(5, 3).
Figure 9.4. The 4 isomorphism types in the random graph family G(5, 3). We
show that there are 60 graphs isomorphic to this first graph (a) inside G(5, 3), 20
graphs isomorphic to the second graph (b) inside G(5, 3), 10 graphs isomorphic to
the third graph (c) inside G(5, 3) and 30 graphs isomorphic to the fourth graph (d)
inside G(5, 3).
The third graph type in G(5, 3) is shown in Figure 9.4(c). Here there are two isolated vertices and one copy of K3. By Lemma 9.27 we know there is only one distinct element in the isomorphism class of K3, but there are $\binom{5}{2} = 10$ ways to choose the two isolated vertices. Thus there are 10 distinct graphs isomorphic to the one in Figure 9.4(c) in G(5, 3).
In Figure 9.4(d) we have the final graph type that appears in G(5, 3). It consists of a single copy of K2, which has only one distinct element in its isomorphism class by Lemma 9.27, and a copy of S2, which by Lemma 9.25 has 3 elements in its isomorphism class. There are $\binom{5}{2} = 10$ ways to pick the two vertices that form K2, thus there are 30 distinct graphs isomorphic to the one in Figure 9.4(d) in G(5, 3).
Since 60 + 20 + 10 + 30 = 120 we know we have enumerated all the distinct graphs in G(5, 3). This yields some interesting properties. For example, if we let X be the random variable that returns the number of isolated vertices assuming we draw a graph (sample) at random from the family G(5, 3), then we see that:

$$E(X) = \frac{60}{120}(1) + \frac{20}{120}(1) + \frac{10}{120}(2) + \frac{30}{120}(0) = \frac{100}{120} = \frac{5}{6} \tag{9.5}$$
We might instead define Y to be the random variable that is 1 if and only if there is a copy of K3 (as a subgraph) in a graph chosen at random from G(5, 3) and 0 else. Then Y is a Bernoulli Random Variable (see Remark 9.2). In this case, it’s easy to see that:

$$\Pr(Y = 1) = \frac{10}{120} = \frac{1}{12} \tag{9.6}$$

since only the graphs isomorphic to the graph in Figure 9.4(c) contain a copy of K3.
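Because G(5, 3) has only 120 members, Equations 9.5 and 9.6 can be confirmed by exhaustive enumeration. The Python sketch below is our own check (helper names are ours) and uses exact fractions so the results match the hand computation digit for digit.

```python
from fractions import Fraction
from itertools import combinations

def graphs_with_m_edges(n, m):
    """Every edge set with exactly m edges on vertices 0..n-1, i.e. the family G(n, m)."""
    return combinations(list(combinations(range(n), 2)), m)

def isolated_vertices(n, edges):
    touched = {v for e in edges for v in e}
    return n - len(touched)

def has_triangle(n, edges):
    present = set(edges)
    return any((a, b) in present and (a, c) in present and (b, c) in present
               for a, b, c in combinations(range(n), 3))

n, m = 5, 3
family = list(graphs_with_m_edges(n, m))
N = len(family)
expected_isolated = Fraction(sum(isolated_vertices(n, g) for g in family), N)
pr_triangle = Fraction(sum(has_triangle(n, g) for g in family), N)
print(N, expected_isolated, pr_triangle)  # 120 5/6 1/12
```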
Remark 9.29. Given a graph G = (V, E) (possibly drawn from a random graph family) we say that G has a copy of Kn if there is a subgraph H of G that is isomorphic to Kn. The graph in Figure 9.4(c) obviously contains a copy of K3 and every graph with at least one edge contains a copy of K2.
Theorem 9.30. The expected number of copies of Ks in a graph chosen at random from G(n, m) is:

$$\binom{n}{s}\binom{m}{\binom{s}{2}}\binom{\binom{n}{2}}{\binom{s}{2}}^{-1} \tag{9.7}$$
Proof. Define a random variable Y to be 1 just in case the vertex set {v1, . . . , vs} induces a complete sub-graph in a randomly chosen graph from G(n, m) and 0 otherwise. We now ask the question: what is the probability that Y = 1 (or equivalently what is E(Y))? If we let this probability be p then, from Remark 9.2 (applied to one such indicator variable for each s-element vertex set), the expected number of copies of Ks is simply:

$$\binom{n}{s} p$$

since there are $\binom{n}{s}$ ways to pick these s vertices and by Lemma 9.27 the isomorphism class of Ks contains only one element. To compute p, consider the following. Out of all the graphs in G(n, m), we must choose one in which the $k = \binom{s}{2}$ edges linking the elements of {v1, . . . , vs} are prescribed. We may then select the remaining m − k edges from a possible collection of $\binom{n}{2} - k$ edges. Thus, there are:

$$\binom{\binom{n}{2} - k}{m - k}$$

ways this can be done. This means the probability of choosing one of these graphs is:

$$p = \binom{\binom{n}{2} - k}{m - k}\binom{\binom{n}{2}}{m}^{-1} \tag{9.8}$$

since there are:

$$\binom{\binom{n}{2}}{m}$$

graphs in the family G(n, m) from Definition 9.22. Simplifying Equation 9.8 we obtain:

$$
\begin{aligned}
p &= \frac{\left(\binom{n}{2} - k\right)!}{\left(\binom{n}{2} - k - (m - k)\right)!\,(m - k)!}\cdot\frac{m!\left(\binom{n}{2} - m\right)!}{\binom{n}{2}!}
 = \frac{\left(\binom{n}{2} - k\right)!\, m!}{(m - k)!\,\binom{n}{2}!} \\
&= \frac{m!}{k!\,(m - k)!}\cdot\frac{k!\left(\binom{n}{2} - k\right)!}{\binom{n}{2}!}
 = \binom{m}{k}\binom{\binom{n}{2}}{k}^{-1}
\end{aligned}
\tag{9.9}
$$

Thus we see that the expected number of copies of Ks in a graph chosen at random from G(n, m) is:

$$\binom{n}{s}\binom{m}{k}\binom{\binom{n}{2}}{k}^{-1} = \binom{n}{s}\binom{m}{\binom{s}{2}}\binom{\binom{n}{2}}{\binom{s}{2}}^{-1} \tag{9.10}$$

This completes the proof.
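Equation 9.10 can be compared against brute-force counting on small families. In this Python sketch (names are our own), `expected_ks_formula` evaluates Equation 9.10 exactly and `expected_ks_bruteforce` averages the number of s-cliques over every graph in G(n, m). For G(5, 3) and s = 3 the formula gives 1/12, in agreement with Equation 9.6 (a 3-edge graph contains at most one triangle).

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def expected_ks_formula(n, m, s):
    """Equation 9.10: C(n,s) C(m,k) / C(C(n,2), k) with k = C(s,2)."""
    k = comb(s, 2)
    return Fraction(comb(n, s) * comb(m, k), comb(comb(n, 2), k))

def expected_ks_bruteforce(n, m, s):
    """Average number of copies of K_s over all graphs in G(n, m)."""
    pairs = list(combinations(range(n), 2))
    total = 0
    for edges in combinations(pairs, m):
        present = set(edges)
        for S in combinations(range(n), s):
            if all(e in present for e in combinations(S, 2)):
                total += 1
    return Fraction(total, comb(len(pairs), m))

print(expected_ks_formula(5, 3, 3))                                     # 1/12
print(expected_ks_formula(6, 5, 3) == expected_ks_bruteforce(6, 5, 3))  # True
```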
Lemma 9.31. There are exactly (n − 1)!/2 distinct graphs in the isomorphism type of Cn (the cycle graph on n ≥ 3 vertices).
Proof. We will build a cycle iteratively using the n vertices allotted. Choose a vertex at random from the n vertices and denote it v1. This vertex will be joined by an edge to exactly one next vertex v2. There are n − 1 possible choices for v2. Now, v2 will be joined by an edge to exactly one (new) vertex v3 for which we have n − 2 possible choices. Continuing in this way (and finally joining vn back to v1), we see that the number of ways of making such a cycle is (n − 1)(n − 2) · · · (1) = (n − 1)!. However, the cycle that first creates an edge from v1 to v2 and then v2 to v3 and so on is precisely the same as the cycle that first creates an edge from v1 to vn and then from vn to vn−1 and so on. Thus, the value (n − 1)! counts each cycle exactly twice. Therefore, the total number of distinct graphs in the isomorphism class of Cn is (n − 1)!/2.
Theorem 9.32. The expected number of copies of Ck in a random graph chosen from the family G(n, m) is:

$$\frac{(k-1)!}{2}\binom{n}{k}\binom{m}{k}\binom{\binom{n}{2}}{k}^{-1} \tag{9.11}$$
Proof. We follow the proof of Theorem 9.30. Fix a set of k vertices {v1, . . . , vk} and one of the cycles Ck on these vertices. Define a random variable Y to be 1 just in case this cycle is contained in a randomly chosen graph from G(n, m) and 0 otherwise. We now ask the question: what is the probability that Y = 1 (or equivalently what is E(Y))? If we let this probability be p then, from Remark 9.2 and Lemma 9.31, the expected number of distinct copies of Ck is simply:

$$\frac{(k-1)!}{2}\binom{n}{k} p$$

since there are $\binom{n}{k}$ ways to choose the k vertices in Ck and (k − 1)!/2 distinct isomorphic copies of the cycle Ck on each set of k vertices. Further, since Ck has k edges, we have already derived the value of p in the proof of Theorem 9.30 (with k prescribed edges). We have:

$$p = \binom{m}{k}\binom{\binom{n}{2}}{k}^{-1}$$

Thus we conclude that the expected number of copies of Ck in a random graph chosen from the family G(n, m) is

$$\frac{(k-1)!}{2}\binom{n}{k}\binom{m}{k}\binom{\binom{n}{2}}{k}^{-1}$$

This completes the proof.
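Theorem 9.32 can be checked the same way. The Python sketch below (names are our own) counts k-cycles in a graph by fixing the smallest vertex of each k-set and skipping reversed traversals, so each cycle is counted once; there are (k − 1)!/2 candidate cycles per k-set, as in Lemma 9.31. The exact formula is then compared with an average over all of G(5, 4).

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, factorial

def count_k_cycles(n, edges, k):
    """Number of distinct k-cycles (k >= 3) in the graph on 0..n-1 with the given edges."""
    present = {frozenset(e) for e in edges}
    total = 0
    for subset in combinations(range(n), k):
        first, rest = subset[0], subset[1:]
        for order in permutations(rest):
            if order[0] > order[-1]:
                continue  # same cycle traversed in the opposite direction
            cycle = (first,) + order
            if all(frozenset((cycle[i], cycle[(i + 1) % k])) in present for i in range(k)):
                total += 1
    return total

def expected_ck_formula(n, m, k):
    """Equation 9.11: ((k-1)!/2) C(n,k) C(m,k) / C(C(n,2), k)."""
    return Fraction(factorial(k - 1), 2) * comb(n, k) * Fraction(comb(m, k), comb(comb(n, 2), k))

def expected_ck_bruteforce(n, m, k):
    """Average number of k-cycles over all graphs in G(n, m)."""
    pairs = list(combinations(range(n), 2))
    total = sum(count_k_cycles(n, g, k) for g in combinations(pairs, m))
    return Fraction(total, comb(len(pairs), m))

K4 = list(combinations(range(4), 2))
print(count_k_cycles(4, K4, 4))                                       # 3
print(expected_ck_formula(5, 4, 4), expected_ck_bruteforce(5, 4, 4))  # 1/14 1/14
```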
Exercise 84. Let H be a graph on k vertices whose isomorphism class contains exactly
s distinct elements. State and prove a theorem on the expected number of copies of H in a
random graph chosen from the family G(n, m). [Hint: Theorem 9.32 is an example of such
a general theorem. Extrapolate from this.]
Remark 9.33. The spaces G(n, 1/2) and G(n, m) are closely related to each other. Consider G(n, 1/2) restricted to only those graphs with exactly m edges. The probability of any one of these graphs in G(n, 1/2) is:

$$\left(\frac{1}{2}\right)^m \left(\frac{1}{2}\right)^{\binom{n}{2} - m} \tag{9.12}$$

That is, they all have equal probability in G(n, 1/2). But, if we were to compute the conditional probability of any of these graphs in G(n, 1/2) given that we require a graph to have m edges, then their probabilities all reduce to precisely the probability one expects in the model G(n, m), namely $\binom{\binom{n}{2}}{m}^{-1}$, by the properties of conditional probability.
CHAPTER 10
Coloring
Figure 10.1. A graph coloring. We need three colors to color this graph.
Figure 10.2. At the first step of constructing G, we add three vertices {T, F, B} that form a complete subgraph.
For each propositional atom xi in the logical language L we are considering, add two vertices vi and vi′ to G. Add an edge {vi, vi′} to G as well as edges {vi, B} and {vi′, B}. This ensures that (i) vi and vi′ cannot have the same color and (ii) neither vi nor vi′ can have the same color as vertex B. Thus, one must be colored green and the other red. That means either xi is true (corresponding to vi colored green) or ¬xi is true (corresponding to vi′ colored green). This is illustrated in Figure 10.3.
Figure 10.3. At the second step of constructing G, we add two vertices vi and vi′ to G and an edge {vi, vi′}.
Figure 10.4. At the third step of constructing G, we add a “gadget” that is built
specifically for term φj .
of generality, we will show the construction for the case when φj = xj1 ∨ xj2 ∨ xj3 . All other
cases follow by an identical argument with a modified graph structure. For the remainder
of this proof, let ν be a valuation function.
Claim 1. If ν(xj1) = ν(xj2) = ν(xj3) = FALSE, then G is not 3-colorable.
Proof. To see this, observe that either tj1 or tj2 must be colored blue and the other green, since vj1, vj2 and vj3 are colored red. Thus tj4 must itself be colored red. Further, since vj3 is colored red, it follows that tj3 must be colored blue. But then tj5 is adjacent to a green vertex (T), a red vertex (tj4) and a blue vertex (tj3). Thus, we require a fourth color. This is illustrated in Figure 10.5(a).
Claim 2. If ν(xj1 ) = TRUE or ν(xj2 ) = TRUE or ν(xj3 ) = TRUE, then G is 3-colorable.
Proof. The proof of the claim is illustrated in Figure 10.5(b) - (h).
Our two claims show that by our construction of G, G is 3-colorable if and only if there is a single assignment of TRUE or FALSE to the atomic propositions that satisfies every formula of S. (It should be clear that variations of Claims 1 and 2 are true by symmetry arguments for any other possible value of φj; e.g., φj = xj1 ∨ ¬xj2 ∨ xj3.) If we have n formulas in S and m atomic propositions, then G has 5n + 2m + 3 vertices and 3m + 10n + 3 edges and thus G can be constructed in a polynomial amount of time from S. It follows at once that, since 3-SAT is NP-complete, so is the question of whether an arbitrary graph is 3-colorable.
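No polynomial-time algorithm for deciding 3-colorability is known, but small instances can be decided by exhaustive backtracking. The Python sketch below is our own illustration, exponential in the worst case: it tries colors 0, . . . , k − 1 vertex by vertex and undoes a choice when it leads to a conflict.

```python
def is_k_colorable(n, edges, k):
    """Decide by backtracking whether the graph on 0..n-1 has a proper k-coloring."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    color = [None] * n

    def assign(v):
        if v == n:
            return True                      # every vertex colored without conflict
        for c in range(k):
            if all(color[w] != c for w in adj[v]):
                color[v] = c
                if assign(v + 1):
                    return True
        color[v] = None                      # undo and backtrack
        return False

    return assign(0)

triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
print(is_k_colorable(3, triangle, 2), is_k_colorable(3, triangle, 3))  # False True
print(is_k_colorable(4, k4, 3))                                        # False
```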
Figure 10.5. When φj evaluates to false, the graph G is not 3-colorable as illustrated in subfigure (a). When φj evaluates to true, the resulting graph is 3-colorable. By the label TFT, we mean ν(xj1) = ν(xj3) = TRUE and ν(xj2) = FALSE.
Corollary 10.37. For an arbitrary k, deciding whether a graph is k-colorable is NP-
complete.
CHAPTER 11
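Example 11.10. Consider the vector spaces R2 and R3 and the matrix:

$$\mathbf{M} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$$

Then the function f : R2 → R3 with f(x) = Mx is a linear transformation. To see this, consider

$$\mathbf{x}_1 = \begin{bmatrix} x_{11} \\ x_{12} \end{bmatrix} \qquad \mathbf{x}_2 = \begin{bmatrix} x_{21} \\ x_{22} \end{bmatrix}$$

Then we have:

$$
\begin{aligned}
\mathbf{M}(\mathbf{x}_1 + \mathbf{x}_2) &= \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}\left(\begin{bmatrix} x_{11} \\ x_{12} \end{bmatrix} + \begin{bmatrix} x_{21} \\ x_{22} \end{bmatrix}\right) = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}\begin{bmatrix} x_{11} + x_{21} \\ x_{12} + x_{22} \end{bmatrix} \\
&= \begin{bmatrix} (x_{11} + x_{21}) + 2(x_{12} + x_{22}) \\ 3(x_{11} + x_{21}) + 4(x_{12} + x_{22}) \\ 5(x_{11} + x_{21}) + 6(x_{12} + x_{22}) \end{bmatrix} = \begin{bmatrix} (x_{11} + 2x_{12}) + (x_{21} + 2x_{22}) \\ (3x_{11} + 4x_{12}) + (3x_{21} + 4x_{22}) \\ (5x_{11} + 6x_{12}) + (5x_{21} + 6x_{22}) \end{bmatrix} \\
&= \begin{bmatrix} x_{11} + 2x_{12} \\ 3x_{11} + 4x_{12} \\ 5x_{11} + 6x_{12} \end{bmatrix} + \begin{bmatrix} x_{21} + 2x_{22} \\ 3x_{21} + 4x_{22} \\ 5x_{21} + 6x_{22} \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}\begin{bmatrix} x_{11} \\ x_{12} \end{bmatrix} + \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}\begin{bmatrix} x_{21} \\ x_{22} \end{bmatrix} = \mathbf{M}\mathbf{x}_1 + \mathbf{M}\mathbf{x}_2
\end{aligned}
$$

A similar argument will show that M(cx) = cMx for all vectors x ∈ R2 and all scalars c ∈ R.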
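The identity can also be spot-checked numerically. This short Python sketch is our own; the test vectors and the scalar are arbitrary choices. It multiplies by the matrix M of Example 11.10 and compares both sides of M(x1 + x2) = Mx1 + Mx2 and M(cx) = cMx.

```python
def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

M = [[1, 2], [3, 4], [5, 6]]
x1, x2, c = [1, -2], [3, 5], 7

lhs = matvec(M, [a + b for a, b in zip(x1, x2)])
rhs = [a + b for a, b in zip(matvec(M, x1), matvec(M, x2))]
print(lhs, rhs)                                                          # [10, 24, 38] [10, 24, 38]
print(matvec(M, [c * a for a in x1]) == [c * a for a in matvec(M, x1)])  # True
```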
Remark 11.11. It is (relatively) easy to generalize the previous example to see that (left) multiplication of a matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$ by a vector $\mathbf{x} \in \mathbb{R}^{n \times 1}$ constitutes a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$.
Remark 11.18. The following statements on the size of bases in vector spaces are outside the scope of this course. Proofs can be found in [Lan87].
Theorem 11.19. Every basis of a vector space V has precisely the same cardinality.
Definition 11.20 (Dimension). The cardinality of any basis of a vector space V is called
the dimension of the vector space.
Remark 11.21. Theorem 11.19 ensures that the dimension of a vector space is uniquely
specified.
3. Vector Spaces of a Graph
Definition 11.22 (Galois Field with 2 Elements). The Galois Field with 2 elements
(denoted GF2) is the field ({0, 1}, +, ·, 0, 1) where:
(1) 0 + 0 = 0,
(2) 0 + 1 = 1,
(3) 1 + 1 = 0,
(4) 0 · 0 = 0,
(5) 0 · 1 = 0, and
(6) 1 · 1 = 1
Remark 11.23. The field GF2 is the first of an infinite number of finite fields that have
several practical applications. The interested reader should consult [LP97] for an excellent
introduction to applied abstract algebra.
Definition 11.24 (Symmetric Difference). Let S1 and S2 be any two sets. The symmetric difference of S1 and S2 is:

$$S_1 + S_2 = (S_1 \setminus S_2) \cup (S_2 \setminus S_1) \tag{11.5}$$

That is:

$$S_1 + S_2 = \{s \in S_1 \cup S_2 : s \text{ is in exactly one of } S_1 \text{ or } S_2\}$$
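Python’s built-in set types implement Definition 11.24 directly through the `^` operator. The sketch below uses example sets of our own choosing. It checks the operator against Equation 11.5 and shows that every set is its own additive inverse, mirroring 1 + 1 = 0 in GF2, whose addition coincides with bitwise XOR on {0, 1}.

```python
S1 = frozenset({"a", "b", "c"})
S2 = frozenset({"b", "c", "d"})

sym = S1 ^ S2                          # symmetric difference
print(sorted(sym))                     # ['a', 'd']
print(sym == (S1 - S2) | (S2 - S1))    # True: agrees with Equation 11.5
print(S1 ^ S1 == frozenset())          # True: S + S is the empty set
# Addition in GF2 is XOR on {0, 1}, so 1 + 1 = 0.
print(all((a + b) % 2 == (a ^ b) for a in (0, 1) for b in (0, 1)))  # True
```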
Definition 11.25 (Power Set). Let S be a set; then $2^S$ is the power set of S; that is, $2^S$ is the set of all subsets of S.
Definition 11.26 (Edge Space). Let G = (V, E) be a graph. Then the edge space of G is the vector space with field GF2 and set of vectors $2^E$, where vector addition is symmetric difference and scalar vector multiplication is defined as:
(1) If S ⊆ E (i.e., S ∈ $2^E$), then 0 · S = ∅ and
(2) If S ⊆ E, then 1 · S = S.
The edge space of G is generally denoted E.
Theorem 11.27. The tuple (GF2, $2^E$, +, ·) is a vector space.
Exercise 92. Prove Theorem 11.27.
Definition 11.28 (Vertex Space). Let G = (V, E) be a graph. Then the vertex space of G is the vector space with field GF2 and set of vectors $2^V$, where vector addition is symmetric difference and scalar vector multiplication is defined as:
(1) If S ⊆ V (i.e., S ∈ $2^V$), then 0 · S = ∅ and
(2) If S ⊆ V, then 1 · S = S.
The vertex space of G is generally denoted V.
Theorem 11.29. The tuple (GF2, $2^V$, +, ·) is a vector space.
Remark 11.30. Vector spaces generated in this way are more abstract than the vector spaces we have discussed thus far. These vector spaces are connected to the theory of current and voltage in Kirchhoff’s loop laws from Electricity and Magnetism [Pur84] and to graph coloring.
Theorem 11.31. Let G = (V, E) be a graph. The set of singleton sets in $2^E$ forms a basis for E. Therefore, E has dimension |E|.
Proof. Let E = {e1, . . . , em} and let S ⊆ E. Then we have:

$$S = \alpha_1\{e_1\} + \alpha_2\{e_2\} + \cdots + \alpha_m\{e_m\} \tag{11.6}$$

with:

$$\alpha_i = \begin{cases} 1 & e_i \in S \\ 0 & \text{else} \end{cases}$$

Thus B = {{e1}, . . . , {em}} spans E. To see that B is linearly independent, note that:

$$\emptyset = \alpha_1\{e_1\} + \alpha_2\{e_2\} + \cdots + \alpha_m\{e_m\}$$

is only possible if α1 = α2 = · · · = αm = 0 since the elements of B are mutually disjoint. The fact that the dimension of E is |E| follows at once from Definition 11.20.
Exercise 93. State and prove a similar Theorem to Theorem 11.31 for V.
Definition 11.32 (Characteristic Vector). Let G = (V, E) with E = {e1, . . . , em} and let S be a vector in E. The characteristic vector of S is the m-dimensional vector (α1, . . . , αm) of values drawn from GF2 so that:

$$S = \alpha_1\{e_1\} + \alpha_2\{e_2\} + \cdots + \alpha_m\{e_m\}$$
Remark 11.33. Obviously for any subset of V we may likewise define a characteristic vector based on Exercise 93. This means that E can really be considered to be the vector space $GF2^n$, the set of n-tuples of elements of GF2, for a graph G = (V, E) with |E| = n. Likewise, if |V| = m, then V is equivalent to the vector space $GF2^m$. The following theorem is now straightforward to prove.
Theorem 11.34. Let G = (V, E) and let M be the incidence matrix of G. Then M is
a linear transformation from E to V when elements of E are treated as their characteristic
vectors and elements of V are treated as their characteristic vectors.
Remark 11.35. Before proving Theorem 11.34 it is important to note that all operations
are performed over GF2 and not the real numbers.
Proof of Theorem 11.34. The fact that multiplication by M is a linear transform
from E to V is a result of the fact that matrix multiplication is always a linear transform
from one vector space over a field to another vector space over the same field (see Remark
11.11).
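Theorem 11.34 can be illustrated by multiplying an incidence matrix by characteristic vectors with all arithmetic done mod 2. The Python sketch below uses a hypothetical four-vertex graph of our own choosing: a triangle on vertices 0, 1, 2 plus the edge {2, 3}. The triangle’s characteristic vector maps to the zero vector of V, and linearity over GF2 is checked on two edge sets.

```python
def incidence_matrix(n, edges):
    """|V| x |E| matrix whose (v, e) entry is 1 when vertex v lies on edge e."""
    return [[1 if v in e else 0 for e in edges] for v in range(n)]

def gf2_matvec(M, x):
    """Matrix-vector product with all arithmetic done in GF2 (mod 2)."""
    return [sum(a * b for a, b in zip(row, x)) % 2 for row in M]

def gf2_add(x, y):
    """Vector addition in GF2."""
    return [(a + b) % 2 for a, b in zip(x, y)]

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
M = incidence_matrix(4, edges)
triangle = [1, 1, 1, 0]        # characteristic vector of {{0,1}, {1,2}, {0,2}}
other = [0, 1, 0, 1]           # characteristic vector of {{1,2}, {2,3}}
print(gf2_matvec(M, triangle))                                  # [0, 0, 0, 0]
print(gf2_matvec(M, gf2_add(triangle, other)) ==
      gf2_add(gf2_matvec(M, triangle), gf2_matvec(M, other)))   # True
```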
Exercise 94. Consider a graph G = (V, E) and its incidence matrix M. Let x be the
characteristic vector for a standard basis vector in E (a vector corresponding to the one
element edge sets of E). What is the result (in V) of the transformation Mx?
4. Cycle Space
Definition 11.36 (Cycle Space). Let G = (V, E) be a graph. The cycle space of G is a subset of $2^E$ denoted C and is the smallest set (of sets) containing ∅, all cycles in G (where each cycle is treated as a set of edges) and all unions of edge disjoint cycles in G.
Example 11.37. We show an example of the cycle space. The cycle space of a graph can
be thought of as all the cycles contained in that graph along with the subgraphs consisting
of cycles that only share vertices but no edges.
Figure 11.1. The cycle space of a graph can be thought of as all the cycles con-
tained in that graph along with the subgraphs consisting of cycles that only share
vertices but no edges. This is illustrated in this figure.
Theorem 11.42. Let G = (V, E) be a connected graph and let T = (V, E′) be a spanning tree of G. Then the fundamental system of cycles of G and T forms a basis for C.
Proof. Recall first that by Theorem 3.66 such a spanning tree T exists when G is connected. Consider any fundamental cycle C. This cycle is constructed by adding exactly one edge to T and finding the cycle that results. Thus no fundamental cycle can be expressed as the sum of other fundamental cycles: the non-tree edge used to create it appears in no other fundamental cycle, so any sum of other fundamental cycles is missing (at least) that edge. As a result, the fundamental system of cycles must be linearly independent.
Choose any element C of the cycle space C and let {e1, . . . , er} be the edges of C that do
not appear in E′. Further define the fundamental cycle Ci to be the one that arises as a result
of adding ei to T. Since every element of a vector space over GF2 is its own additive inverse,
C + C1 + · · · + Cr = ∅ if and only if C = C1 + · · · + Cr.
No edge in the set {e1, . . . , er} appears in C + C1 + · · · + Cr: each ei appears once in C
and, among the fundamental cycles, only in Ci, so the two occurrences cancel. A non-tree edge
outside {e1, . . . , er} appears neither in C nor in any Ci. Hence every edge in
C + C1 + · · · + Cr is an edge of T. More specifically, the subgraph induced by the edges in
C + C1 + · · · + Cr is a subgraph of T and thus necessarily acyclic. But C + C1 + · · · + Cr is
itself an element of C, and every element of C except ∅ induces a subgraph of G that has at
least one cycle. Thus, C + C1 + · · · + Cr = ∅.
Since our choice of C was arbitrary, for any C we have:
$$C = \alpha_1 C_1 + \cdots + \alpha_k C_k$$
where {C1, . . . , Ck} is the fundamental set of cycles of G with respect to T and
$$\alpha_i = \begin{cases} 1 & \text{if the non-tree edge of } C_i \text{ is found in } C \\ 0 & \text{otherwise.} \end{cases}$$
Thus, the fundamental set of cycles is linearly independent and spans C and so it is a basis.
This completes the proof.
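The construction in the proof of Theorem 11.42 can be sketched in Python as follows. This is a minimal sketch, assuming a connected graph with numbered vertices and edges, a breadth-first spanning tree (any spanning tree would do), and edge sets stored as `frozenset`s, so that `^` (symmetric difference) plays the role of addition over GF2. The function names are hypothetical.

```python
from collections import deque

def bfs_tree(n, edges, root=0):
    """BFS spanning tree of a connected graph on vertices 0..n-1.
    parent_edge[v] is the index of the tree edge joining v to parent[v]."""
    adj = [[] for _ in range(n)]
    for j, (u, v) in enumerate(edges):
        adj[u].append((v, j))
        adj[v].append((u, j))
    parent, parent_edge = [None] * n, [None] * n
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w, j in adj[u]:
            if w not in seen:
                seen.add(w)
                parent[w], parent_edge[w] = u, j
                queue.append(w)
    return parent, parent_edge

def root_path(v, parent, parent_edge):
    """Edge indices on the tree path from v up to the root."""
    path = set()
    while parent[v] is not None:
        path.add(parent_edge[v])
        v = parent[v]
    return path

def fundamental_cycles(n, edges):
    """Return (tree edge indices, {non-tree edge index: its fundamental cycle})."""
    parent, parent_edge = bfs_tree(n, edges)
    tree = {j for j in parent_edge if j is not None}
    cycles = {}
    for j, (u, v) in enumerate(edges):
        if j not in tree:
            # Shared root-path edges cancel, leaving the tree path from u to v.
            tree_path = root_path(u, parent, parent_edge) ^ root_path(v, parent, parent_edge)
            cycles[j] = frozenset(tree_path | {j})
    return tree, cycles

def decompose(C, cycles):
    """Write C (an element of the cycle space) as a GF(2) sum of fundamental cycles."""
    used = [j for j in cycles if j in C]  # alpha_i = 1 iff C contains the non-tree edge of C_i
    total = frozenset()
    for j in used:
        total ^= cycles[j]                # symmetric difference is addition over GF(2)
    return used, total

# K4 on vertices 0..3.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
tree, cycles = fundamental_cycles(4, edges)
C = frozenset({0, 3, 5, 2})               # the Hamiltonian cycle 0-1-2-3-0
used, total = decompose(C, cycles)
print(sorted(tree), len(cycles), used, total == C)  # [0, 1, 2] 3 [3, 5] True
```

The coefficient rule in `decompose` is the αi from the proof: only fundamental cycles whose non-tree edge lies in C are added. For K4 there are |E| − |V| + 1 = 3 fundamental cycles.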
Remark 11.43. The dimension of the space C for a graph G = (V, E) is called the
cyclomatic number of G and is, of course, equal to the size of (any) set of fundamental cycles
generated by a spanning tree of G.
Exercise 96. Theorem 11.42 is stated and (more or less) proved following [GY05]. Diestel
[Die10] defines the basis in a different way that does not require G to be connected. Note
that fundamental systems of cycles were defined for arbitrary graphs (rather than just
connected ones). Is there any reason we couldn't just replace the spanning tree T with a
spanning forest (invoking Corollary 3.67) and state and prove a more general result?
5. Cut Space
Definition 11.44 (Cut Space). Let G = (V, E) be a graph. The cut space of G, denoted C∗,
is the smallest subset of 2^E (that is, the smallest set of edge sets) containing ∅, all minimal
edge cuts in G, and all unions of edge-disjoint minimal edge cuts in G.
Example 11.45. We show an example of the cut space. The cut space of a graph can
be thought of as all the minimal cuts contained in that graph along with the subgraphs
consisting of minimal cuts that only share vertices but no edges.
Figure 11.3. The cut space of a graph can be thought of as all the minimal cuts
contained in that graph along with the subgraphs consisting of minimal cuts that
only share vertices but no edges. This is illustrated in this figure.
Definition 11.52 (Fundamental System of Edge Cuts). Let G = (V, E) be a graph and let
F = (V, E′) be a spanning forest of G. The fundamental system of edge cuts with respect to
G and F is the set of all fundamental edge cuts of G with respect to F.
Exercise 99. Find the fundamental system of edge cuts for the graph shown in
Figure 11.4.
Lemma 11.53. Let G = (V, E) be a connected graph and let T = (V, E′) be a spanning
tree of G. Every minimal edge cut of G contains at least one edge of T.
Proof. Let C be any minimal edge cut of G. We first show that the graph G − C has
exactly two components. If G − C has one component, then C is not an edge cut, contradicting
our assumption. Suppose G − C has three or more components. Because G is connected, some
edge e ∈ C joins two distinct components of G − C. Restoring e merges only those two
components, so the resulting graph still has at least two components (since we assumed there
were more than two). Thus C − {e} is still an edge cut, and C is not minimal, a contradiction.
Let V1 and V2 be the vertex sets of the two components that result from the removal of C
from G. Since T is a spanning tree, it contains a path from a vertex in V1 to a vertex in V2,
and this path must use at least one edge e ∈ E′ with one end in V1 and the other in V2. Every
such edge of G lies in C, since otherwise V1 would be connected to V2 in G − C and C would not
be a cut. Hence e ∈ C. This completes the proof.
Theorem 11.54. Let G = (V, E) be a connected graph. A set E′ ⊆ E is an edge cut if
and only if every spanning tree of G contains at least one edge in E′.
Exercise 100. Prove Theorem 11.54. [Hint: Use Theorem 3.73 or Lemma 11.53 or
both.]
Theorem 11.55. Let G = (V, E) be a connected graph and let T = (V, E′) be a spanning
tree of G. Then the fundamental system of edge cuts of G with respect to T forms a basis for C∗.
Proof. To see that the fundamental system of edge cuts is linearly independent, note
that each fundamental edge cut is uniquely defined by the removal of one edge from the
spanning tree T, and this edge is an element of that cut. If V1 and V2 are the vertex sets of
the two resulting components of T, then the cut ⟨V1, V2⟩ contains exactly one edge of the
spanning tree (otherwise there would be more than one path in T connecting two vertices), and
thus this cut is uniquely determined by that edge. Hence no fundamental edge cut can be
expressed as the sum of other fundamental edge cuts, because none of them contains its tree
edge. As a result, the fundamental system of edge cuts must be linearly independent.
Choose any element C of the cut space C∗ and let {e1, . . . , er} be the edges of C that appear
in E′. Further define the fundamental edge cut Ci to be the one that arises as a result of
removing ei from T.
No edge in the set {e1, . . . , er} appears in C + C1 + · · · + Cr: each ei appears once in C
and, among the fundamental edge cuts, only in Ci, so the two occurrences cancel. A tree edge
outside {e1, . . . , er} appears neither in C nor in any Ci. Hence every edge in
C + C1 + · · · + Cr is an edge that does not appear in T. Now C + C1 + · · · + Cr is itself an
element of C∗, and every element of C∗ except ∅ is a union of edge-disjoint minimal edge cuts.
By Lemma 11.53, each minimal edge cut contains an edge of T, so the only way
C + C1 + · · · + Cr can avoid every edge of T is if it is ∅. Thus C = C1 + · · · + Cr.
Since our choice of C was arbitrary, for any C we have:
$$C = \alpha_1 C_1 + \cdots + \alpha_k C_k$$
where {C1, . . . , Ck} is the fundamental set of edge cuts of G with respect to T and
$$\alpha_i = \begin{cases} 1 & \text{if the tree edge of } C_i \text{ is found in } C \\ 0 & \text{otherwise.} \end{cases}$$
Thus, the fundamental set of edge cuts is linearly independent and spans C ∗ and so it is a
basis. This completes the proof.
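A similar sketch for Theorem 11.55, under the same kind of assumptions (numbered vertices and edges, `frozenset` edge sets added with `^`). Here the spanning tree is supplied by hand rather than computed, and the helpers `cut`, `tree_component`, `fundamental_cuts` and `decompose` are hypothetical names. Deleting a tree edge e splits T into two pieces; the fundamental cut is the set of edges of G with exactly one end in the piece containing one endpoint of e.

```python
def cut(edges, S):
    """Edge cut <S, V - S>: indices of edges with exactly one endpoint in S."""
    return frozenset(j for j, (u, v) in enumerate(edges) if (u in S) != (v in S))

def tree_component(edges, tree_edges, start):
    """Vertices reachable from `start` using only edges whose indices are in `tree_edges`."""
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for j in tree_edges:
            u, v = edges[j]
            for a, b in ((u, v), (v, u)):
                if a == x and b not in seen:
                    seen.add(b)
                    stack.append(b)
    return seen

def fundamental_cuts(edges, tree):
    """Map each tree edge e to the fundamental edge cut obtained by deleting e from T."""
    cuts = {}
    for e in sorted(tree):
        V1 = tree_component(edges, tree - {e}, edges[e][0])  # one side of T - e
        cuts[e] = cut(edges, V1)
    return cuts

def decompose(C, cuts):
    """Write C (an element of C*) as the GF(2) sum of fundamental cuts whose tree edge is in C."""
    used = [e for e in cuts if e in C]
    total = frozenset()
    for e in used:
        total ^= cuts[e]   # symmetric difference is addition over GF(2)
    return used, total

# K4 on vertices 0..3; the star at vertex 0 (edges 0, 1, 2) is a spanning tree T.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
tree = {0, 1, 2}
cuts = fundamental_cuts(edges, tree)
C = cut(edges, {0, 1})              # the minimal cut <{0, 1}, {2, 3}>
used, total = decompose(C, cuts)
print(sorted(C), used, total == C)  # [1, 2, 3, 4] [1, 2] True
```

Each fundamental cut contains exactly one tree edge, which is what makes the coefficients αi in the proof easy to read off.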
Definition 11.56 (Cycle Rank / Edge-Cut Rank). Let G = (V, E) be a graph. The
edge-cut rank of G is the number of edges in a spanning forest of G. The cycle rank (or
Betti number) of G, denoted β(G), is the number of edges in the relative complement of
that spanning forest in G (see Definition 2.68).
Proposition 11.57. Let G = (V, E) be a graph. The Betti number of G is
$$\beta(G) = |E| - |V| + c(G),$$
and the edge-cut rank of G is |V| − c(G).
Exercise 101. Prove Proposition 11.57. [Hint: Use Corollary 3.72.]
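Definition 11.56 can be checked directly on examples: build a spanning forest greedily with a union-find structure, count its edges to get the edge-cut rank, and count the remaining edges to get β(G). Since a spanning forest has |V| − c(G) edges, the remaining edges number |E| − |V| + c(G). The sketch below assumes vertices 0, ..., n − 1 and an edge list; `forest_and_components` is a hypothetical helper.

```python
def forest_and_components(n, edges):
    """Greedily build a spanning forest with union-find; return (forest edge indices, c(G))."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for j, (u, v) in enumerate(edges):
        ru, rv = find(u), find(v)
        if ru != rv:            # edge joins two different trees, so keep it
            parent[ru] = rv
            forest.append(j)
    c = len({find(x) for x in range(n)})
    return forest, c

# K4 on 0..3, a triangle on 4..6, and an isolated vertex 7.
n = 8
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 5), (5, 6), (4, 6)]
forest, c = forest_and_components(n, edges)
edge_cut_rank = len(forest)           # edges in a spanning forest
betti = len(edges) - len(forest)      # edges outside that forest
print(c, edge_cut_rank, betti)        # 3 5 4
```

The counts agree with |V| − c(G) = 8 − 3 and |E| − |V| + c(G) = 9 − 8 + 3; a loop (u, u) would never enter the forest and so would be counted in β(G).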
Corollary 11.58. Let G = (V, E) be a graph. The dimension of C is β(G), while the
dimension of the edge cut space is the edge cut rank, which is |V | − c(G).
Proof. This follows immediately from the proofs of Theorems 11.42 and 11.55.
Thus:
$$\dim(\mathcal{C} \oplus \mathcal{C}^*) = |E| \quad \text{if and only if} \quad \dim(\mathcal{C} \cap \mathcal{C}^*) = 0.$$
The latter can only happen when C ∩ C∗ = {∅}, the trivial subspace. In this case, by necessity,
C ⊕ C∗ = E and C and C∗ are orthogonal complements. This completes the proof.
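Whether dim(C ∩ C∗) = 0 holds for a particular graph can be tested with ranks over GF2, since dim(C ∩ C∗) = dim C + dim C∗ − dim(C + C∗). This sketch relies on two assumptions: the caller passes edge sets that span C (for example, a fundamental system of cycles), and the vertex cuts ⟨{v}, V − {v}⟩ span C∗, which is the standard fact that every cut ⟨S, V − S⟩ is the GF2 sum of the vertex cuts of the vertices in S. Edge sets are encoded as integer bitmasks; all function names are hypothetical.

```python
def gf2_rank(vectors):
    """Rank over GF(2) of vectors stored as integer bitmasks (bit j stands for edge j)."""
    basis = []
    for x in vectors:
        for b in basis:
            x = min(x, x ^ b)   # xors in b exactly when x has b's leading bit set
        if x:
            basis.append(x)     # x now has a leading bit shared by no basis element
    return len(basis)

def star(edges, v):
    """Bitmask of the cut <{v}, V - {v}>; a loop at v never crosses a cut, so it is skipped."""
    return sum(1 << j for j, (a, b) in enumerate(edges) if (a == v) != (b == v))

def intersection_dim(n, edges, cycle_generators):
    """dim(C ∩ C*) = dim C + dim C* - dim(C + C*), computed with GF(2) ranks."""
    cyc = [sum(1 << j for j in cycle) for cycle in cycle_generators]
    cut = [star(edges, v) for v in range(n)]
    return gf2_rank(cyc) + gf2_rank(cut) - gf2_rank(cyc + cut)

triangle = [(0, 1), (1, 2), (0, 2)]
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(intersection_dim(3, triangle, [[0, 1, 2]]))    # 0
print(intersection_dim(4, square, [[0, 1, 2, 3]]))   # 1
```

For the triangle the intersection is trivial. For the 4-cycle it is not: its single cycle is also the cut ⟨{0, 2}, {1, 3}⟩, so that edge set lies in both C and C∗.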
Bibliography
[AB00] R. Albert and A. Barabási, Topology of evolving networks: Local events and universality, Phys.
Rev. Lett. 85 (2000), no. 24, 5234–5237.
[AB02] R. Albert and A. Barabási, Statistical mechanics of complex networks, Reviews of modern
physics 74 (2002), no. 1, 47–97.
[ACL01] William Aiello, Fan Chung, and Linyuan Lu, A random graph model for power law graphs,
Experiment. Math. 10 (2001), no. 1, 53–66.
[AHU74] A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The design and analysis of computer algorithms,
Addison-Wesley, 1974.
[Ald08] D. L. Alderson, Catching the “Network Science” Bug: Insight and Opportunity for the Operations
Researcher, Oper. Res. 56 (2008), no. 5, 1047–1065.
[BAJ99] A. Barabási, R. Albert, and H. Jeong, Mean-field theory for scale-free random graphs, Physica A
272 (1999), no. 1/2, 173–187.
[BAJB00] A. Barabási, R. Albert, H. Jeong, and G. Bianconi, Power-law distribution of the world wide web,
Science 287 (2000), no. 5461, 2115.
[Bel57] R. Bellman, Dynamic programming, Princeton University Press, 1957.
[Ber73] C. Berge, Graphs and hypergraphs, North-Holland, 1973.
[BJS04] Mokhtar S. Bazaraa, John J. Jarvis, and Hanif D. Sherali, Linear programming and network flows,
Wiley-Interscience, 2004.
[BM08] A. Bondy and U. S. R. Murty, Graph theory, 3 ed., Springer, Graduate Texts in Mathematics,
2008.
[Bol00] Béla Bollobás, Modern Graph Theory, Springer, 2000.
[Bol01] B. Bollobás, Random Graphs, Cambridge University Press, 2001.
[Bol04] B. Bollobás, Extremal graph theory, Dover Press, 2004.
[BP98] S. Brin and L. Page, The anatomy of a large-scale hypertextual web search engine, Seventh
International World-Wide Web Conference (WWW 1998), 1998.
[BR03] Béla Bollobás and Oliver Riordan, Robustness and vulnerability of scale-free random graphs,
Internet Mathematics 1 (2003), 1–35.
[Cha84] G. Chartrand, Introductory graph theory, Dover, 1984.
[CK68] G. Chartrand and H. Kronk, Randomly traceable graphs, SIAM J. Applied Math. 16 (1968), no. 4,
696–700.
[CLRS01] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to algorithms, 2 ed., The MIT
Press, 2001.
[CSW05] P. J. Carrington, J. Scott, and S. Wasserman, Models and Methods in Social Network Analysis,
Cambridge University Press, 2005.
[Dat95] B. N. Datta, Numerical linear algebra, Brooks/Cole, 1995.
[Die10] R. Diestel, Graph theory, 4 ed., Graduate Texts in Mathematics, Springer, 2010.
[Dij59] E. W. Dijkstra, A note on two problems in connexion with graphs, Numerische Mathematik 1
(1959), 269–271.
[ER59] P. Erdös and A. Rényi, On random graphs, Publ. Math. Debrecen 6 (1959), 290–297.
[ER60] P. Erdös and A. Rényi, On the evolution of random graphs, Publ. Math. Inst. Hungar. Acad. Sci. 5 (1960), 17–61.
[FCF11] A. Friggeri, G. Chelius, and E. Fleury, Fellows: Crowd-sourcing the evaluation of an overlapping
community model based on the cohesion measure, Proc. Interdisciplinary Workshop on Information
and Decision in Social Networks (Massachusetts Institute of Technology), Laboratory for
Information and Decision Systems, May 31 - Jun 1 2011.
[Flo62] R. W. Floyd, Algorithm 97: Shortest path, Comm. ACM 5 (1962), no. 6, 345.
[Fra99] J. B. Fraleigh, A First Course in Abstract Algebra, 6 ed., Addison-Wesley, 1999.
[GB06] Christopher Griffin and Richard R. Brooks, A note on the spread of worms in scale-free networks,
IEEE Transactions on Systems, Man and Cybernetics, Part B 36 (2006), no. 1, 198–202.
[Gil59] E. N. Gilbert, Random Graphs, Ann. Math. Statist. 30 (1959), no. 4, 1141–1144.
[Gil61] E. N. Gilbert, Random plane networks, J. Soc. Indus. Appl. Math. 9 (1961), no. 4, 533–543.
[GR01] C. Godsil and G. Royle, Algebraic graph theory, Springer, 2001.
[Gri11] C. Griffin, Linear programming: Penn state math 484 lecture notes (v 1.8),
http://www.personal.psu.edu/cxg286/Math484_V1.pdf, 2010-2011.
[Gri14] C. Griffin, Game theory: Penn state math 486 lecture notes (v 1.1.1),
http://www.personal.psu.edu/cxg286/Math486.pdf, 2014.
[GY05] J. Gross and J. Yellen, Graph theory and its applications, 2 ed., CRC Press, Boca Raton, FL,
USA, 2005.
[HU79] J. Hopcroft and J. D. Ullman, Introduction to automata theory, languages and computation,
Addison-Wesley, Reading, MA, 1979.
[Kru56] J. B. Kruskal, On the shortest spanning subtree of a graph and the traveling salesman problem,
Proc. AMS 7 (1956), no. 1.
[KV08] B. Korte and J. Vygen, Combinatorial Optimization, Springer-Verlag, 2008.
[KY08] D. Knoke and S. Yang, Social Network Analysis, Quantitative Applications in the Social Sciences,
no. 154, SAGE Publications, 2008.
[Lan87] S. Lang, Linear Algebra, Springer-Verlag, 1987.
[LP97] R. Lidl and G. Pilz, Applied Abstract Algebra, Springer, 1997.
[Lu01] Linyuan Lu, The diameter of random massive graphs, Proceedings of the twelfth annual
ACM-SIAM symposium on Discrete algorithms, 2001, pp. 912–921.
[Mar00] D. Marker, Model Theory: An Introduction, 1 ed., Springer-Verlag, 2000.
[Mey01] C. D. Meyer, Matrix analysis and applied linear algebra, SIAM Publishing, 2001.
[MR95] Michael Molloy and Bruce Reed, A critical point for random graphs with a given degree sequence,
Random Structures Algorithms 6 (1995), 161–179.
[NBW06] M. E. J. Newman, A. Barabási, and D. J. Watts, The Structure and Dynamics of Networks,
Princeton University Press, 2006.
[New03] M. E. J. Newman, The structure and function of complex networks, SIAM Review 45 (2003),
no. 2, 167–256.
[Oxl92] J. G. Oxley, Matroid theory, Oxford University Press, 1992.
[Pri57] R. C. Prim, Shortest connection networks and some generalizations, Bell System Technical Journal
36 (1957).
[PS98] C. H. Papadimitriou and K. Steiglitz, Combinatorial optimization: Algorithms and complexity,
Dover Press, 1998.
[PSV01] Romualdo Pastor-Satorras and Alessandro Vespignani, Epidemic dynamics and endemic states in
complex networks, Phys. Rev. E (3) 63 (2001), no. 66117.
[Pur84] E. M. Purcell, Electricity and magnetism, 2 ed., McGraw-Hill, 1984.
[Sim55] H. Simon, On a class of skew distribution functions, Biometrika 42 (1955), no. 3/4, 425–440.
[Sim05] S. Simpson, Mathematical logic, http://www.math.psu.edu/simpson/courses/math557/logic.pdf,
December 2005.
[Spi11] L. Spizzirri, Justification and application of eigenvector centrality,
http://www.math.washington.edu/~morrow/336_11/papers/leo.pdf, March 6 2011 (Last Checked: July 20, 2011).
[Tru94] R. J. Trudeau, Introduction to graph theory, 2 ed., Dover, 1994.
[WN99] L. A. Wolsey and G. L. Nemhauser, Integer and combinatorial optimization, Wiley-Interscience,
1999.
[Zwi95] U. Zwick, The smallest networks on which the ford-fulkerson maximum flow procedure may fail
to terminate, Theoretical Computer Science 148 (1995), no. 1, 165–170.