This document discusses algorithms for finding the minimum spanning tree of a connected, undirected, weighted graph. It describes Boruvka's algorithm, Prim's (Jarnik's) algorithm, and Kruskal's algorithm, explaining their implementations; each runs in O(E log V) time.

Minimum Spanning Tree

References:
1. Algorithms, Jeff Erickson, Chapter 7
2. The Algorithm Design Manual, Steven Skiena, Chapter 6

Intro

Suppose we have a connected, undirected, weighted graph

G = (V, E), w(e) ∈ R

where every edge e ∈ E has a weight w(e). The weight can be positive, negative, or zero.
This lecture describes several algorithms to find the minimum spanning tree of G, that is, the spanning tree T that minimizes the function

w(T) = ∑_{e ∈ T} w(e)

Figure. A weighted graph and its minimum spanning tree.
Uniqueness of MST

It would greatly simplify our discussion if the MST were unique in a given graph G. Fortunately,

if all edges in G have distinct weights, then G has a unique MST.

Sketch proof: by contradiction. Goal: if G has two distinct MSTs, then G must have two edges with the same weight. The proof is like the exchange argument for greedy algorithms.
1. Suppose we have two distinct MSTs T, T′.
2. Let e, e′ be the minimum-weight edges in T \ T′ and T′ \ T respectively. W.l.o.g., assume w(e) ≤ w(e′).
3. T′ ∪ {e} contains exactly one cycle C. Let e″ be any edge in C that is not in T (e″ may or may not be e′). Because e″ ∉ T and e″ ≠ e, we have e″ ∈ T′ \ T. So w(e″) ≥ w(e′) ≥ w(e).
4. Consider the new spanning tree T″ = T′ + e − e″ (which might be equal to T). We have w(T″) = w(T′) + w(e) − w(e″) ≤ w(T′). But T′ is a minimum spanning tree! We must have equality instead of ≤, which means w(e) = w(e″). Since e ∈ T \ T′ and e″ ∈ T′ \ T, these are two distinct edges with the same weight.

To simplify the discussion, we make all edge weights distinct by imposing a tie-breaking rule on equal-weight edges (the details do not matter; any systematic tie-breaking rule is good enough).
From now on, we can assume weights are distinct, and therefore the MST is unique.
We can talk about the MST.
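For concreteness, here is a minimal sketch of one such rule (the rule itself is a hypothetical choice; any consistent rule works): break ties by comparing endpoint labels, so no two distinct edges ever compare equal.

```python
# Hypothetical tie-breaking rule: order edges by (weight, smaller endpoint,
# larger endpoint). Distinct edges always get distinct keys, even with
# equal weights, so "the minimum edge" is always well defined.
def edge_key(u, v, w):
    return (w, min(u, v), max(u, v))

edges = [(0, 1, 5), (1, 2, 5), (0, 2, 3)]      # two edges share weight 5
ranked = sorted(edges, key=lambda e: edge_key(*e))
```

With this key, the two weight-5 edges are ordered deterministically by their endpoints.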
The only MST algorithm

Just like Whatever-First-Search, there is one generic MST algorithm that can be considered the mother of the three popular MST algorithms.
The generic MST algorithm maintains an intermediate spanning forest F at all times, such that

F is a subgraph of the minimum spanning tree of G.

Initially, F consists of all the nodes and zero edges (a forest of one-node trees). The generic algorithm connects trees in F by adding certain edges between them. When the algorithm halts, the forest becomes a single spanning tree (which is the MST).
Which edges to add?
At any point in time, the intermediate spanning forest F induces two special types of edges in the rest of the graph:

 An edge is useless if it is not an edge of F, but both of its endpoints are in the same component of F.
 An edge is safe if it is the minimum-weight edge with exactly one endpoint in some component of F. (An edge could be safe for two different components.)
 Otherwise, the edge is undecided.

All MST algorithms are based on two observations:

Lemma 1. (Prim) The minimum spanning tree of G contains every safe edge.

(Yes, every safe edge at any stage of the generic algorithm.)

We can actually prove a stronger statement:
For any subset S ⊆ V, the MST contains the minimum-weight edge with exactly one endpoint in S.
The proof is similar to the greedy exchange argument:
Say e is the lightest such edge, and suppose a spanning tree T does not include e; we prove that T is definitely not the MST.
Since T is connected, it contains a path from one endpoint of e to the other. Since this path starts at a vertex in S and ends at a vertex not in S, some edge on the path has exactly one endpoint in S. Let e′ be such an edge. Replacing e′ with e, we get another spanning tree with smaller cost! (because e is lightest: w(e) < w(e′))
Figure 1. Black vertices are the subset S. e is the lightest edge with exactly one endpoint in S.

Lemma 2. The MST contains no useless edge.

Proof: adding any useless edge to F would introduce a cycle (by definition, both endpoints of a useless edge are in the same component). ∎
Generic MST Algorithm:
Repeatedly add safe edges to the evolving forest F. If F is not yet connected, there must be at least one safe edge.
The generic MST algorithm therefore eventually connects F. By induction, Lemma 1 implies that the resulting tree is the MST.
When we add an edge to F, some undecided edges become useless, and some undecided edges become safe.
To fully specify an algorithm, we must describe which safe edges to add in each iteration, and how to find such edges.
Boruvka's Algorithm

The simplest MST algorithm: (Boruvka, 1926)

Add ALL safe edges and recurse.

Figure 2. Red edges are in F; dashed lines are useless. Arrows are safe edges.
In more detail, Boruvka's algorithm works like this:
1. It relies on the CountAndLabel() subroutine to count the number of connected components in a graph, and to label each node with its component number.
2. How do we find all safe edges of F? Suppose F has more than one component. The following subroutine computes an array safe[1..V] of safe edges, where safe[i] is the minimum-weight edge with exactly one endpoint in the i-th component of F. We brute-force through all edges to find the safe edges. For an edge uv, if u and v belong to the same component, then the edge is either useless or already an edge of F. Otherwise, compare the weight of uv to safe[component(u)] and safe[component(v)] and update the entries if needed.
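The scheme above can be sketched in Python as follows (the vertex labels 0..n−1 and the edge-list representation are assumptions; the inner function count_and_label plays the role of CountAndLabel()):

```python
from collections import deque

def boruvka(n, edges):
    """Boruvka's MST for vertices 0..n-1 and edges (u, v, w).
    Assumes the graph is connected and all weights are distinct."""
    mst = []

    def count_and_label(forest_edges):
        # BFS over the forest F: count components, label every vertex.
        adj = [[] for _ in range(n)]
        for u, v, _ in forest_edges:
            adj[u].append(v)
            adj[v].append(u)
        comp, c = [-1] * n, 0
        for s in range(n):
            if comp[s] == -1:
                comp[s] = c
                q = deque([s])
                while q:
                    x = q.popleft()
                    for y in adj[x]:
                        if comp[y] == -1:
                            comp[y] = c
                            q.append(y)
                c += 1
        return c, comp

    while True:
        num, comp = count_and_label(mst)
        if num == 1:
            return mst
        # safe[i] = minimum-weight edge with exactly one endpoint in component i
        safe = [None] * num
        for u, v, w in edges:
            if comp[u] != comp[v]:            # same component: useless or in F
                for c in (comp[u], comp[v]):
                    if safe[c] is None or w < safe[c][2]:
                        safe[c] = (u, v, w)
        mst.extend(set(safe))  # dedupe: an edge may be safe for two components
```

Note the `set(safe)` step: since one edge can be the safe edge of both its components, it may appear twice in the array but must be added only once.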
What's the time complexity?
 Each CountAndLabel() costs O(V + E), because it is basically a graph traversal of the forest F. Since F is a forest, E < V. Thus O(V + E) = O(V).
 Each AddAllSafeEdges() costs O(V + E) (again, a graph traversal, with constant-time work per vertex/edge). Since the graph G is connected, we have V ≤ E + 1. Thus it costs O(V + E) = O(E).
 How many iterations? Each iteration reduces the number of components of F by at least half (worst case: components coalesce in pairs; best case: every component coalesces into one).
Thus the number of iterations is O(log V).
 In total, Boruvka's algorithm costs O(E log V).
Features of Boruvka's algorithm for MST:
 It's fast; the worst case is O(E log V), but for many graphs it is much faster than that.
 It allows significant parallelism (each component can be processed in parallel).
 Recent very fast MST algorithms are generalizations of Boruvka's.
 This should be the default choice when implementing MST.
Jarnik's (Prim's) Algorithm

In Jarnik's (Prim's) algorithm, the intermediate forest F has only one non-trivial component T; all other components are isolated single vertices.

The algorithm repeats the following step until T spans the whole graph:

Jarnik: repeatedly add T's safe edge to T.

To implement Jarnik's algorithm, we keep all the edges adjacent to T in a priority queue. When we take an edge e from the priority queue, we check whether both endpoints are in T. If not, we add the vertex at the other end of e to T, and insert all of its edges into the priority queue. This is best-first search!
Time complexity: O(E log E) = O(E log V).
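A sketch of this implementation in Python, using the standard-library heapq as the priority queue (the adjacency-list representation adj[u] = list of (neighbor, weight) pairs is an assumption):

```python
import heapq

def prim(adj, source=0):
    """Jarnik's (Prim's) MST. adj[u] is a list of (v, w) pairs; assumes
    the graph is connected and undirected. Returns the tree edges."""
    n = len(adj)
    in_tree = [False] * n
    in_tree[source] = True
    # Priority queue of candidate edges (w, u, v) with u already in T.
    pq = [(w, source, v) for v, w in adj[source]]
    heapq.heapify(pq)
    mst = []
    while pq and len(mst) < n - 1:
        w, u, v = heapq.heappop(pq)
        if in_tree[v]:
            continue               # both endpoints in T: the edge is useless
        mst.append((u, v, w))      # minimum edge leaving T: safe, so add it
        in_tree[v] = True
        for x, wx in adj[v]:       # new candidate edges out of T
            if not in_tree[x]:
                heapq.heappush(pq, (wx, v, x))
    return mst
```

Stale edges (both endpoints already in T by the time they are popped) are simply skipped, which is why the queue may hold up to E entries and each operation costs O(log E) = O(log V).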
Kruskal's Algorithm

Kruskal: Scan all edges in increasing order of weight; if an edge is safe, add it to F.

We can first sort the edges in increasing order, which costs O(E log E) = O(E log V). [It turns out this dominates all other costs.]
A little detour:
In Kruskal's algorithm we need a set-partition data structure that can efficiently do:
1. MakeSet(v): make a set out of a single node.
2. Find(v): return the set that v belongs to.
3. Union(u,v): merge the sets that u and v belong to.
Naive data structure candidates:
1. Attach a set label to every node. MakeSet(v) and Find(v) take constant time; however, Union(u,v) takes linear time.
2. A graph, where each connected component is a set. How to Find and Union? We need a full graph traversal to identify connected components.
This is actually a quite common requirement in many applications. A particularly efficient and simple data structure called union-find fits the bill.

Union-find partitions elements into disjoint sets; each element is in exactly one set. Union-find represents each set as a backward tree, in which each node holds one element and a pointer to its parent. The root is the representative of the set: Find(u) returns the root of the tree that u belongs to.
Now, MakeSet(v) is pretty simple: just set the parent of v to 0 (v is its own root).
For Find(v), we just trace back using the parent pointers.
For Union(u,v): link one of u, v to the other (e.g., u adopting v as a child).
How efficient are these operations? Find(v) is proportional to the height of the tree, so we would like short trees. So when we Union(u,v), to keep the result as short as possible, we let the taller tree adopt the shorter tree.
Analysis: with this rule, we can easily prove that the height of any tree is at most O(log n). So the most time-consuming operation is Find(v), which costs O(log n).
In fact, this can be made even faster by collapsing the tree during Find(v): just relink the parent of every node on the path from v to its root so that it points directly to the root. The tree becomes very flat.
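Putting the pieces together, here is a sketch of union-find and Kruskal's algorithm in Python. It uses union by size rather than the union-by-height rule described above (either rule gives the O(log n) height bound, and size is slightly simpler to maintain alongside path compression):

```python
class UnionFind:
    """Disjoint sets as backward trees, with union by size and path compression."""
    def __init__(self, n):
        self.parent = list(range(n))   # MakeSet: every node is its own root
        self.size = [1] * n

    def find(self, v):
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[v] != root:  # collapse: relink the path to the root
            self.parent[v], v = root, self.parent[v]
        return root

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False               # already in the same set
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru            # the larger tree adopts the smaller
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]
        return True

def kruskal(n, edges):
    """Kruskal's MST: scan edges by increasing weight; an edge whose
    endpoints lie in different components is safe, otherwise useless."""
    uf = UnionFind(n)
    return [(u, v, w)
            for u, v, w in sorted(edges, key=lambda e: e[2])
            if uf.union(u, v)]
```

Note that union(u, v) doubles as the safety test: it returns False exactly for useless edges, so the list comprehension keeps precisely the safe edges in sorted order.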
