
Matroid

The document discusses UMAP Module 781, titled 'Matroids: The Theory and Practice of Greed,' authored by Christian Jones and Ran Libeskind-Hadas. It introduces matroids as mathematical structures that generalize linear independence and illustrates their application in developing efficient greedy algorithms for discrete optimization problems. The module is targeted at students familiar with linear algebra and graph theory, providing insights into both solvable and intractable optimization problems.


UMAP Module 781

Modules in Undergraduate Mathematics and Its Applications

Matroids: The Theory and Practice of Greed
Christian Jones

Published in cooperation with The Society for Industrial and Applied Mathematics, The Mathematical Association of America, The National Council of Teachers of Mathematics, The American Mathematical Association of Two-Year Colleges, The Institute for Operations Research and the Management Sciences, and The American Statistical Association.

Applications of Computer Science, Discrete Optimization

COMAP, Inc., Suite 210, 57 Bedford Street, Lexington, MA 02420 (781) 862–7878
180 The UMAP Journal 21.2 (2000)

INTERMODULAR DESCRIPTION SHEET: UMAP Unit 781

TITLE: Matroids: The Theory and Practice of Greed

AUTHOR: Christian Jones
Dept. of Mathematics
University of Florida
Gainesville, FL 32611
[email protected]

Ran Libeskind-Hadas
Dept. of Computer Science
Harvey Mudd College
Claremont, CA 91711
[email protected]

MATHEMATICAL FIELD: Linear algebra, graph theory, computer science

APPLICATION FIELD: Computer science, discrete optimization

TARGET AUDIENCE: Students in a course in linear algebra, discrete mathematics, graph theory, or algorithms.

ABSTRACT: A matroid is a mathematical structure that generalizes the notion of linear independence. Remarkably, this simple and elegant mathematical structure can be used systematically to develop efficient and simple “greedy” algorithms for a variety of discrete optimization problems. Moreover, matroids provide some insight into why other discrete optimization problems are apparently computationally intractable. This Module introduces matroids and demonstrates their application to several discrete optimization problems.

PREREQUISITES: The reader is assumed to be familiar with elementary concepts in linear algebra (definition and properties of linear independence) and in graph theory (definition of a graph, bipartite graph, and path).

The UMAP Journal 21 (2) (2000) 179–201. Copyright © 2000 by COMAP, Inc. All rights reserved.

Permission to make digital or hard copies of part or all of this work for personal or classroom use
is granted without fee provided that copies are not made or distributed for profit or commercial
advantage and that copies bear this notice. Abstracting with credit is permitted, but copyrights
for components of this work owned by others than COMAP must be honored. To copy otherwise,
to republish, to post on servers, or to redistribute to lists requires prior permission from COMAP.

COMAP, Inc., Suite 210, 57 Bedford Street, Lexington, MA 02420
(800) 77-COMAP = (800) 772-6627, or (781) 862-7878; http://www.comap.com
Matroids: The Theory and Practice of Greed 181

Matroids: The Theory and Practice of Greed
Christian Jones
Dept. of Mathematics
University of Florida
Gainesville, FL 32611
[email protected]
Ran Libeskind-Hadas
Dept. of Computer Science
Harvey Mudd College
Claremont, CA 91711
[email protected]

Table of Contents
1. INTRODUCTION
2. MATROIDS
3. FROM MATROIDS TO GREEDY ALGORITHMS
4. A SCHEDULING PROBLEM
5. A TASK ASSIGNMENT PROBLEM
6. CONCLUSION
7. EXERCISES
8. SOLUTIONS TO THE EXERCISES
9. REFERENCES
ABOUT THE AUTHORS

MODULES AND MONOGRAPHS IN UNDERGRADUATE MATHEMATICS AND ITS APPLICATIONS (UMAP) PROJECT

The goal of UMAP is to develop, through a community of users and developers, a system of instructional modules in undergraduate mathematics and its applications, to be used to supplement existing courses and from which complete courses may eventually be built.
The Project was guided by a National Advisory Board of mathematicians, scientists, and educators. UMAP was funded by a grant from the National Science Foundation and now is supported by the Consortium for Mathematics and Its Applications (COMAP), Inc., a nonprofit corporation engaged in research and development in mathematics education.

Paul J. Campbell, Editor
Solomon Garfunkel, Executive Director, COMAP

1. Introduction
This paper shows how some natural generalizations of concepts from linear
algebra can be used to find simple and efficient algorithms for many discrete
optimization problems. Specifically, we describe a mathematical structure called
a matroid. We then show how matroids can be used to construct so-called greedy
algorithms for a variety of discrete optimization problems.
We begin by considering the case of a new long-distance telephone company.
The company plans to offer service between n cities but to lease the actual phone
lines from existing companies. A direct phone line exists between certain pairs
of cities, and there is a positive cost associated with leasing each line. The
new company would like to lease a subset of lines such that the company can
provide a path between any two cities using only the leased lines. A solution
is any subset of lines with this property. An optimal solution is a solution of
minimum total cost. For example, consider the weighted graph in Figure 1a,
where vertices correspond to cities, edges correspond to existing phone lines,
and the edge weights correspond to the leasing costs of the phone lines. One
possible solution is shown in Figure 1b, with a total cost of 18. Another solution
is shown in Figure 1c, with a total cost of only 11; it is possible to verify that
this is an optimal solution.
Note that any solution must span all of the vertices in the graph and an
optimal solution must be a tree: a connected graph with no cycles. If a solution
contains a cycle, then any edge on the cycle can be removed without destroying
connectivity while decreasing the cost of the solution. Thus, given a connected
graph, an optimal solution is a spanning tree of minimum total cost, also known
as a minimum spanning tree.

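[Figure 1, panels (a), (b), (c): a weighted graph on the vertices a, b, c, d, e with edge weights 1, 2, 3, 4, 5, 5, 6; the individual edge labels are not recoverable from the extracted text.]

Figure 1. (a) A weighted graph with five vertices. (b) A solution with cost 18. (c) An optimal solution with cost 11.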

The minimum spanning tree problem was studied as early as 1926 by the
Czech mathematician Otakar Borůvka in connection with minimizing the cost
of electric networks [1926]. The problem was later studied by Prim [1957] and
Kruskal [1956] among others.
Kruskal showed that the following simple algorithm can be used for finding
a minimum spanning tree in a connected graph: Let S be an initially empty set.
Sort the edges of the graph in order of nondecreasing edge costs. Consider each
edge in the sorted list, beginning with an edge of least cost, and add the edge
to S if and only if it does not create a cycle with the edges already in S. Kruskal
showed that after all edges have been considered, S is a minimum spanning
tree for the graph. For example, in the graph in Figure 1, Kruskal’s algorithm
sorts the edges in the order 1, 2, 3, 4, 5, 5, 6. It begins by selecting the edge of
weight 1 and adding it to S. The edge of weight 2 is added next, followed by
the edge of weight 3. The edge of weight 4 is considered next but cannot be
added to S because it creates a cycle with existing edges in S. Next, one of the
two edges of weight 5 is considered and added to S. The next edge of weight 5
cannot be added to S because it creates a cycle with existing edges. Similarly,
the edge of weight 6 cannot be added. At this point, the algorithm terminates
and S is a minimum spanning tree.
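Kruskal's procedure, as just described, can be sketched in a few lines of Python. This is our own illustration, not Kruskal's or the authors' formulation: the union-find bookkeeping used to detect cycles and the small four-vertex graph in the usage lines are assumptions, since the edge list of Figure 1 cannot be recovered from the text.

```python
def kruskal(vertices, edges):
    """Return a minimum spanning tree as a list of (cost, u, v) triples.

    vertices: iterable of vertex labels; edges: list of (cost, u, v) triples.
    The graph is assumed to be connected.
    """
    parent = {v: v for v in vertices}   # union-find forest over the vertices

    def find(v):
        # Walk to the root of v's tree, halving the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for cost, u, v in sorted(edges):    # nondecreasing edge costs
        ru, rv = find(u), find(v)
        if ru != rv:                    # different trees, so no cycle is created
            parent[ru] = rv             # merge the two trees
            tree.append((cost, u, v))
    return tree


# Hypothetical graph, not Figure 1.
t = kruskal("abcd", [(1, "a", "b"), (2, "b", "c"), (3, "a", "c"), (4, "c", "d"), (5, "b", "d")])
print(t)   # [(1, 'a', 'b'), (2, 'b', 'c'), (4, 'c', 'd')]
```

The edges of cost 3 and 5 are rejected because each would close a cycle (a, b, c and b, c, d respectively), exactly as the edges of weight 4, 5, and 6 are rejected in the walkthrough above.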
Kruskal’s algorithm is said to be “greedy” because at each step it simply
chooses the cheapest remaining edge that does not introduce a cycle. In general,
a “greedy” algorithm is one that makes a sequence of locally optimal decisions.
In the case of the minimum spanning tree problem, this sequence of locally
optimal decisions leads to a globally optimal solution. Unfortunately, for other
problems, greedy algorithms do not always find optimal solutions.
Consider the famous and seemingly similar traveling salesperson problem.
In this problem, we are given a graph with n vertices representing n cities.
There is an edge between every pair of vertices with an associated positive
real number representing the cost of a direct flight between the corresponding
cities. A salesperson wishes to start at her home city, visit each city at least once,
and return to her home city. However, since she is very busy, the salesperson
stipulates that she will not fly to any city more than once. Thus, the salesperson
wishes to find a cycle that visits each city exactly once: a traveling salesperson
tour or Hamiltonian cycle in the graph. Among all tours, the salesperson wants
to find one that minimizes the sum of the edge costs. For example, consider the
weighted graph with five vertices, representing five cities, in Figure 2a. One
tour is shown in Figure 2b and has a total cost of 12. Another tour is shown in
Figure 2c and has a cost of 7; it can be shown that this tour is an optimal tour.

[Figure 2, panels (a), (b), (c): a weighted complete graph on the five cities a, b, c, d, e; the individual edge labels are not recoverable from the extracted text.]

Figure 2. (a) A weighted graph with five vertices. (b) A tour with cost 12. (c) An optimal tour with cost 7.

It seems intuitively natural to try to solve the traveling salesperson problem
using a greedy algorithm. For example, starting at the home city, we might
“greedily” select the least expensive flight leaving that city. Assume that this
flight brings us to city v. We could now “greedily” select the cheapest flight
from city v that brings us to a city that we have not yet visited. The process
can be repeated until we reach a city such that all other cities have been visited.
At this point, we are forced to fly directly to the starting city and the tour is
complete.
Surprisingly, this greedy approach for the traveling salesperson problem
does not always find optimal solutions. For example, consider the graph with
four vertices in Figure 3a. If vertex a represents the start city, the greedy
algorithm selects vertex b as the next city, followed by vertex c, then vertex d,
and finally returns to vertex a. The cost of this tour, shown in Figure 3b, is
1,000,003. On the other hand, the tour in Figure 3c has a cost of only 6.

[Figure 3, panels (a), (b), (c): a weighted graph on the vertices a, b, c, d in which one edge has weight 1,000,000 and the remaining edges have weights 1 and 2.]

Figure 3. (a) A weighted graph with four vertices. (b) A solution with cost 1,000,003. (c) An optimal solution with cost 6.
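The failure of the nearest-neighbor heuristic can be reproduced with a short Python sketch. The edge costs below are our reading of Figure 3 (ab = bc = cd = 1, ac = bd = 2, ad = 1,000,000), which is an assumption because the figure did not survive extraction; they do reproduce both tour costs quoted in the text. The exhaustive search is included only as a check and takes time proportional to (n − 1)!.

```python
from itertools import permutations

# Assumed edge costs for Figure 3 (symmetric).
COST = {
    frozenset("ab"): 1, frozenset("bc"): 1, frozenset("cd"): 1,
    frozenset("ad"): 1_000_000, frozenset("ac"): 2, frozenset("bd"): 2,
}

def cost(u, v):
    return COST[frozenset((u, v))]

def tour_cost(tour):
    # Sum every leg of the closed tour, including the flight back home.
    return sum(cost(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))

def nearest_neighbor_tour(cities, start):
    """Greedy heuristic: always fly to the cheapest city not yet visited."""
    tour = [start]
    unvisited = set(cities) - {start}
    while unvisited:
        here = tour[-1]
        nxt = min(unvisited, key=lambda c: cost(here, c))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def optimal_tour(cities, start):
    """Exhaustive search over all tours; feasible only for tiny inputs."""
    others = [c for c in cities if c != start]
    return min(([start, *p] for p in permutations(others)), key=tour_cost)


greedy = nearest_neighbor_tour("abcd", "a")
print(greedy, tour_cost(greedy))              # ['a', 'b', 'c', 'd'] 1000003
print(tour_cost(optimal_tour("abcd", "a")))   # 6
```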

Not only does this particular greedy algorithm fail for the traveling salesperson
problem, but no efficient algorithm is known for this problem.¹ In fact, the
traveling salesperson problem is an NP-complete problem. This means, roughly,
that not only is no efficient algorithm known for the problem but that the dis-
covery of an efficient algorithm would immediately imply efficient algorithms
for a multitude of other apparently intractable computational problems.
While many discrete optimization problems can be solved by greedy algorithms,
many others seem not to be amenable to greedy methods, or indeed to any efficient
algorithm at all. This article describes how an elegant mathematical structure,
called a matroid, can be used to construct and establish the correctness of
greedy algorithms for a variety of discrete optimization problems. Moreover,
matroids also help us understand why certain other problems are not amenable
to efficient solution.
In the following sections of this paper, we begin by defining the concept of
a matroid. Next, we show how matroids are directly related to greedy algorithms.
We then use matroids to find greedy algorithms for several optimization
problems. We conclude with a discussion of the relationship between matroids
and intractable problems (such as the traveling salesperson problem) and give
a brief overview of some other mathematical structures related to matroids and
their applications.
¹ An algorithm is said to be efficient if its running time is polynomial in the size of the problem
instance. The size of the problem instance is the number of digits required to encode it in binary.
See Garey and Johnson [1979] or Cormen et al. [1990] for more on this topic.


2. Matroids
A matroid is a mathematical structure, introduced by Whitney [1935], that
generalizes the notion of linear independence. Recall that in any vector space,
an independent set of vectors has the property that each of its subsets is also
an independent set. In addition, if X and Y are two independent sets such that
|X| > |Y|, then there exists some element e ∈ X − Y such that Y + e is also an
independent set.²
Remarkably, there are many sets, other than vector spaces, with their own
associated definitions of “independence” that satisfy the two properties above.
A matroid is any structure that satisfies these properties.

Definition 1 A matroid is an ordered pair M = (E, I) where E is a finite set and I
is a set of subsets of E satisfying the following two properties:

Heredity Property: The empty set is in I, and for any set X ∈ I, all subsets of X are
also elements of I.
Exchange Property: If X, Y ∈ I and |X| > |Y|, then there exists some e ∈
X − Y such that Y + e ∈ I.
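Because Definition 1 involves only finite sets, both properties can be checked by brute force on small examples. The function name is_matroid is our own, and the sketch is exponential in |E|, so it is meant only for experimenting with toy families.

```python
from itertools import combinations

def is_matroid(ground, independent):
    """Brute-force check of Definition 1 on a small finite example.

    ground: iterable of elements (the set E)
    independent: collection of subsets of E (the family I)
    """
    ground = frozenset(ground)
    fam = {frozenset(s) for s in independent}
    if not all(s <= ground for s in fam):
        return False
    # Heredity: the empty set is in I, and every subset of a member is a member.
    if frozenset() not in fam:
        return False
    for s in fam:
        for r in range(len(s)):
            if any(frozenset(sub) not in fam for sub in combinations(s, r)):
                return False
    # Exchange: whenever |X| > |Y|, some e in X - Y keeps Y + e in I.
    for x in fam:
        for y in fam:
            if len(x) > len(y) and not any(y | {e} in fam for e in x - y):
                return False
    return True


uniform = [s for k in range(3) for s in combinations(range(4), k)]
print(is_matroid(range(4), uniform))                             # True
print(is_matroid({1, 2, 3}, [(), (1,), (2,), (3,), (1, 2)]))     # False
```

The first family, all subsets of {0, 1, 2, 3} with at most two elements, passes both tests. The second satisfies heredity but fails the exchange property: with X = {1, 2} and Y = {3}, neither Y + 1 nor Y + 2 is in the family.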

The elements of set I are called, not surprisingly, the independent sets of M.
As an example of a matroid, consider any matrix whose elements are real
numbers. Let E be the set of rows of the matrix and let I be the set of all linearly
independent subsets of E. Now, M = (E, I) is easily verified to be a matroid.
In fact, the name “matroid” comes from this relationship with matrices.
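For the matrix example, independence of a set of rows can be tested by computing a rank. The sketch below is our own: it uses exact rational arithmetic (Python's fractions module) to avoid floating-point round-off, and it identifies rows by their positions, an assumption that lets two equal rows count as distinct elements of E.

```python
from fractions import Fraction
from itertools import combinations

def rank(rows):
    """Rank of a list of rows, by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Look for a nonzero pivot in column c at or below row r.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Clear column c in every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def independent_row_sets(matrix):
    """All sets of row indices whose rows are linearly independent."""
    n = len(matrix)
    return [set(s) for k in range(n + 1) for s in combinations(range(n), k)
            if rank([matrix[i] for i in s]) == k]


sets = independent_row_sets([[1, 0], [0, 1], [1, 1]])
print(len(sets), {0, 1, 2} in sets)   # 7 False
```

For the rows (1, 0), (0, 1), (1, 1), every set of at most two rows is independent but the set of all three is not, so 7 of the 8 subsets are independent.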
A more interesting example of a matroid is one induced from a graph,
known as the graphic matroid. Consider a graph $G = (V, E)$, where $V$ is the set
of vertices and $E$ is the set of edges. If $E' \subseteq E$ and $F = (V, E')$ contains no
cycles, then $F$ is said to be a spanning forest of $G$. Each connected component
in the spanning forest is called a tree. For any graph $G = (V, E)$, let $I$ be the
set of all $E' \subseteq E$ such that $(V, E')$ is a spanning forest of $G$. We claim that
$M_G = (E, I)$ is a matroid.
The heredity property of matroids is easily seen to hold for $M_G$: Since $(V, \emptyset)$
is a spanning forest of $G$, $\emptyset \in I$. In addition, if $E' \in I$, then $(V, E')$ is a spanning
forest, and thus $(V, E'')$ is also a spanning forest for any $E'' \subseteq E'$, because deleting
edges cannot create a cycle.
The exchange property requires slightly more work to verify. Assume that
graph $G$ has $n$ vertices and let $F = (V, E')$ be a spanning forest in $G$. Note
that if $E' = \emptyset$, then $F$ contains no edges and thus comprises $n$ distinct trees,
each of which is a single vertex. If $|E'| = 1$, then $F$ comprises $n - 1$ trees: One
tree is two vertices connected by an edge, and the remaining $n - 2$ trees are
single vertices. In general, if an edge $e \notin E'$ does not create a cycle when
added to $E'$, then the edge must connect two distinct trees in $F$. Therefore
$F + e$ comprises one fewer tree than does $F$. Thus, in general, a forest with
$k < n$ edges comprises exactly $n - k$ trees.

² Throughout this paper, the notation |A| denotes the cardinality of set A, A − B denotes the set
of elements in set A that are not in set B, and A + e denotes the set formed by adding element e
to set A.
In relation to the graphic matroid, let $X$ and $Y$ be two edge sets in $I$ such that
$|X| > |Y|$. Thus, $F_X = (V, X)$ and $F_Y = (V, Y)$ are both spanning forests of $G$.
By the above observation, $F_X$ comprises fewer trees than does $F_Y$. Therefore,
there is some tree in $F_X$ whose vertices are in more than one tree of $F_Y$. Since
that tree of $F_X$ is connected, some edge $e \in X$ on it has its endpoints in different
trees of $F_Y$; in particular $e \notin Y$, and $F_Y + e$ does not contain a cycle and hence
is a spanning forest. Thus, $M_G$ is indeed a matroid. For example, for the graph in
Figure 1 some of the independent sets are {ab}, {ab, bc, de}, {ae, ce, de, bc}, and {ab, bd, cd, de}.
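The counting claim above (a forest with k edges on n vertices has exactly n − k trees) and the independence test for the graphic matroid can both be illustrated with a small union-find sketch. The helper names are our own, edges are written as two-letter strings as in the text, and the triangle {ab, bc, ca} used as a negative example is hypothetical.

```python
def count_trees(vertices, edges):
    """Number of trees in the forest (V, edges), or None if the edges contain a cycle.

    Edges are two-character strings such as "ab", following the text's notation.
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = len(parent)          # with no edges, every vertex is its own tree
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return None               # u and v already connected: uv closes a cycle
        parent[ru] = rv               # uv joins two distinct trees ...
        components -= 1               # ... so the forest has one fewer tree
    return components

def is_independent(vertices, edges):
    """Graphic-matroid independence test: is (V, edges) a spanning forest?"""
    return count_trees(vertices, edges) is not None


V = "abcde"
for S in (["ab"], ["ab", "bc", "de"], ["ae", "ce", "de", "bc"], ["ab", "bd", "cd", "de"]):
    print(S, count_trees(V, S))       # 4, 2, 1, and 1 trees in turn
print(is_independent(V, ["ab", "bc", "ca"]))   # False
```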
Before turning to applications of matroids to discrete optimization problems,
we note that the analogy between matroids and vector spaces does not
end with independent sets. A related analogy, which is used extensively in the
next section, is that of a basis. In a vector space, a basis is a maximal independent
set: an independent set such that the addition of any other vector to this set
results in a set that is no longer independent. Similarly, a basis in a matroid
M = (E, I) is defined to be a maximal independent set: an element B ∈ I such
that B + e ∉ I for all e ∈ E − B. A fundamental result of vector spaces is that
all bases have the same size. Analogously, we can directly apply the definition
of a matroid to prove the following lemma.
Lemma 1 All bases of a matroid have the same size.
The proof of this lemma and other analogies between vector spaces and ma-
troids are explored in the exercises.

3. From Matroids to Greedy Algorithms


We are now ready to establish the connection between matroids and greedy
algorithms. In a discrete optimization problem, each element typically has
some associated cost or weight and we wish to find a solution of minimum or
maximum total weight. In the minimum spanning tree problem, for example, a
weight is associated with each edge in the graph and we wish to find a spanning
tree of minimum total weight.
For a given matroid $M = (E, I)$, let $w$ be a weight function that assigns a
real number $w(x)$ to each $x \in E$. We can easily extend the definition of weight
to apply to sets of elements: For any set $S \subseteq E$, we define the weight of the set
$S$ to be

$$w(S) = \sum_{x \in S} w(x).$$

To see how this extension of the weight function is useful, we revisit the
minimum spanning tree problem. For a given instance, we can construct
a corresponding graphic matroid $M_G = (E, I)$, where $E$ is the set of edges
in the graph and $I$ is the set of (edge sets of) all spanning forests in the graph. Let $w$ be a
weight function that assigns a positive weight to each edge in the graph. Then
we claim that the objective of the minimum spanning tree problem is to find
a basis $X$ of $M_G$ such that $w(X)$ is minimized. To see this, recall that a basis is
a maximal independent set: an element in I such that no proper superset of
this element is in I. In this case, elements in I are spanning forests, and thus a
maximal element is a forest such that no edge can be added without creating a
cycle. Such a forest is a spanning tree. Thus, a basis of minimum weight in the
graphic matroid is a minimum spanning tree in the graph.
A remarkable property of matroids is that a basis of maximum weight can
be found using a simple greedy algorithm. (The case of finding a basis of
minimum weight will be shown to follow easily from this.) Given a matroid
M = (E, I) and a weight function w : E → R, the matroid greedy algorithm
performs the following steps:
    Sort the n elements of E into a list e_1, e_2, ..., e_n such that
        w(e_1) ≥ w(e_2) ≥ ··· ≥ w(e_n)
    Let X = ∅
    for i = 1 to n
        if X + e_i ∈ I
            then let X = X + e_i
    return X
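The pseudocode translates directly into Python if the family I is supplied as a membership test, often called an independence oracle. The oracle interface and the example weights below are our assumptions, not part of the text.

```python
def matroid_greedy(elements, weight, is_independent):
    """A sketch of the matroid greedy algorithm.

    elements: the ground set E
    weight: function giving w(e) for each element e
    is_independent: oracle answering whether a frozenset is in I
    """
    X = frozenset()
    # Consider the elements in order of nonincreasing weight.
    for e in sorted(elements, key=weight, reverse=True):
        if is_independent(X | {e}):
            X = X | {e}
    return X


w = {"p": 6, "q": 7, "r": 5, "s": 1}
# Oracle 1: a set is independent if it has at most two elements.
print(sorted(matroid_greedy(w, w.get, lambda S: len(S) <= 2)))        # ['p', 'q']
# Oracle 2: independent if no two elements share a (hypothetical) color.
color = {"p": "red", "q": "red", "r": "blue", "s": "blue"}
print(sorted(matroid_greedy(w, w.get,
                            lambda S: len({color[e] for e in S}) == len(S))))   # ['q', 'r']
```

Both oracles define matroids, so by the theorem that follows, each answer is a basis of maximum weight for its matroid. With the color constraint, the greedy pass skips p because it shares a color with the heavier q.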

Theorem 1 For any matroid $M = (E, I)$ and weight function $w : E \to \mathbb{R}$, the
matroid greedy algorithm returns a basis of maximum weight.

Proof: First, we observe that the set $X$ returned by the algorithm is a basis of
$M$. If not, then there exists $e_i \in E - X$ such that $X + e_i \in I$. At step $i$ of the for
loop the algorithm held some subset $X' \subseteq X$, and by the heredity property
$X' + e_i \subseteq X + e_i$ is in $I$; thus $e_i$ would have been added to the set $X$ in step $i$,
a contradiction.
Let $X = \{x_1, \ldots, x_k\}$, where $w(x_1) \ge w(x_2) \ge \cdots \ge w(x_k)$. Let $Y$ be a basis
of $M$ of maximum weight. By Lemma 1, $|Y| = k$. Let $Y = \{y_1, \ldots, y_k\}$, where
$w(y_1) \ge w(y_2) \ge \cdots \ge w(y_k)$. If $w(x_i) \ge w(y_i)$ for all $i$, $1 \le i \le k$, then
$w(X) \ge w(Y)$, so $X$ is also a basis of maximum weight. Assume therefore that this
is not the case, and let $\ell$ be the least value such that $w(x_\ell) < w(y_\ell)$. Consider the sets

$$X_{\ell-1} = \{x_1, \ldots, x_{\ell-1}\} \quad\text{and}\quad Y_\ell = \{y_1, \ldots, y_\ell\}.$$

By the heredity property, these sets are independent. Since $|Y_\ell| > |X_{\ell-1}|$, the exchange
property implies that there exists some $y_i \in Y_\ell - X_{\ell-1}$ such that $X_{\ell-1} + y_i \in I$. Since
$w(y_i) \ge w(y_\ell) > w(x_\ell)$, the greedy algorithm considers $y_i$ before $x_\ell$, at a point when
the set it holds is a subset of $X_{\ell-1}$. By the heredity property, that subset plus $y_i$ is in $I$,
and thus the algorithm would have included $y_i$ in $X$ ahead of $x_\ell$, that is, as one of
$x_1, \ldots, x_{\ell-1}$. This contradicts $y_i \notin X_{\ell-1}$. □
As an example, we again revisit the minimum spanning tree problem, which
we noted earlier is exactly that of finding a basis of minimum weight in the
graphic matroid $M_G = (E, I)$. Assume for a moment that we actually wanted
to find a spanning tree of maximum weight, a maximum spanning tree. In this
case, Theorem 1 tells us that we can simply apply the matroid greedy algorithm
to the corresponding graphic matroid. In other words, we begin by sorting the
edges in E in order of nonincreasing weights. Beginning with an initially
empty set X, we consider the sorted edges in E one at a time. If an edge under
consideration does not create a cycle with respect to the edges already in X,
we add the edge to X. This results in a basis of maximum weight: a maximum
spanning tree. Now observe that a minimum spanning tree can be found by
replacing each w(e) in the graph by its negative, −w(e), and finding a maximum
spanning tree in this reweighted graph. This procedure is exactly equivalent
to the aforementioned algorithm due to Kruskal.
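The reduction from minimum to maximum weight can also be checked in code. The sketch restates the greedy loop so that it runs on its own, supplies a cycle test as the graphic-matroid oracle, and negates the costs; the four-vertex graph is hypothetical and assumed connected. The comparison against exhaustive enumeration of three-edge forests is a sanity check for small graphs only.

```python
from itertools import combinations

def matroid_greedy(elements, weight, is_independent):
    # Consider the elements in order of nonincreasing weight.
    X = frozenset()
    for e in sorted(elements, key=weight, reverse=True):
        if is_independent(X | {e}):
            X = X | {e}
    return X

def acyclic(edge_set):
    """Graphic-matroid oracle: True if the edges (vertex pairs) form a forest."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v in edge_set:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False          # this edge would close a cycle
        parent[ru] = rv
    return True

def minimum_spanning_tree(cost):
    """cost: dict from edge (u, v) to its cost. Greedy on the negated weights."""
    return matroid_greedy(cost, lambda e: -cost[e], acyclic)


cost = {("a", "b"): 4, ("b", "c"): 1, ("a", "c"): 3, ("c", "d"): 2, ("b", "d"): 5}
T = minimum_spanning_tree(cost)
best = min(sum(cost[e] for e in S) for S in combinations(cost, 3) if acyclic(S))
print(sum(cost[e] for e in T), best)   # 6 6
```

Sorting by nonincreasing −w(e) is the same as sorting by nondecreasing w(e), so this loop accepts and rejects edges in exactly the order Kruskal's algorithm does.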
In the next two sections we explore two more discrete optimization problems.
For each of these problems, we find a corresponding matroid. We can
then apply the matroid greedy algorithm to solve each of these problems.

4. A Scheduling Problem
Consider a set S = {1, . . . , n} of n jobs that must be performed by a single
machine. Each job takes one unit of time to complete; once a job is started
on the machine, it must be completed before the next job can be started. For
each job, there is a deadline d(i) such that 1 ≤ d(i) ≤ n, 1 ≤ i ≤ n. For
each job, there is also an associated positive real reward or weight, w(i), that is
obtained if the job is completed no later than its deadline. The objective is to
determine a schedule, an ordering of the n jobs, that maximizes the total reward.
In practice, these jobs might be mechanical tasks performed on a machine or
programs run on a computer, and the rewards might represent profits earned by
completing the jobs by their deadlines. We solve this problem by constructing
a corresponding matroid, showing that an optimal solution to the scheduling
problem corresponds to a basis of maximum weight in the matroid, and then
finding such a basis using the matroid greedy algorithm.
Given a schedule for the n jobs, we say that a job is on time if it is completed
on or before its deadline; otherwise the job is said to be late. Let S be a set of
jobs with associated deadlines and weights. A feasible schedule for X ⊆ S is a
schedule in which all of the jobs in X are on time. A subset X of S is said to
be feasible if there exists at least one feasible schedule for X. Notice that the
problem of finding an optimal schedule for S can be reduced to that of finding
a feasible subset X ⊆ S of maximum total weight. We would then like to find a
feasible schedule for X. The elements in S − X will all be late and can therefore
be scheduled in any order after the jobs in X. Of course, we need some way of
determining whether a subset X ⊆ S is feasible, and if it is, we need to find a
feasible schedule for it.
Imagine for a moment that we are given a set X ⊆ S and told that X is
feasible. We know that a feasible schedule exists for X, but it would seem that
we might need to test all of the permutations of the elements of X, one by one,
until we find a feasible schedule. This, of course, would be a prohibitively slow
process. Surprisingly, if X is known to be feasible, it is very easy to find a feasible
schedule: we simply sort the jobs in order of nondecreasing deadlines. Perhaps
even more remarkable is the fact that to determine whether or not a set X is
feasible in the first place, we need only sort the jobs in order of nondecreasing
deadlines and check to see if every job is on time in this schedule. We now
formalize these claims in the following lemma.

Lemma 2 A set X is feasible if and only if the schedule formed by sorting the elements
of X in order of nondecreasing deadlines results in each job being on time.

Proof: Assume that X is feasible. Then there exists some schedule that completes
the jobs in X on time. If this schedule has some pair of jobs i and j
both of which are completed on time but such that i is completed before j
and d(i) > d(j), then we swap the two jobs. Now, job j is completed even
earlier and is therefore still on time. Moreover, job i is now completed when
j was completed in the original schedule and is therefore completed no later
than time d(j). Since d(i) > d(j), job i is still on time in this new schedule.
Therefore, given any schedule that completes all of the jobs in X on time, we
can repeatedly swap pairs of on-time jobs until they are completed in order of
nondecreasing deadlines with all jobs still being completed on time.
Conversely, assume that for a given set X, the schedule formed by sorting
the elements of X in order of nondecreasing deadlines results in each job being
on time. Since this is a feasible schedule for X, X is feasible by definition. □
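Lemma 2 turns the feasibility question into a sort followed by a linear scan. The sketch below (function names our own) represents a set of jobs by the list of their deadlines and compares the lemma's test with an exhaustive search over all orderings on a few small cases.

```python
from itertools import permutations

def is_feasible(deadlines):
    """Lemma 2 test: in nondecreasing-deadline order, the job in
    position t (counting from 1) finishes at time t and needs deadline >= t."""
    return all(d >= t for t, d in enumerate(sorted(deadlines), start=1))

def is_feasible_brute_force(deadlines):
    """Try every ordering; exponential, so only for checking small cases."""
    return any(all(d >= t for t, d in enumerate(order, start=1))
               for order in permutations(deadlines))


for ds in ([2, 1, 3], [1, 1], [1, 2, 2], [4, 1, 3, 2]):
    print(ds, is_feasible(ds), is_feasible_brute_force(ds))
```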

Let us define $I$ to be the set of all feasible subsets of $S$. We next show
that $M_S = (S, I)$ is a matroid. Since all weights are positive in this problem,
a feasible set of maximum weight is precisely a basis of maximum weight in
the matroid. Thus, the matroid greedy algorithm can be applied to solve this
scheduling problem.

Lemma 3 Given a set of jobs $S$, let $I$ denote the set of all feasible subsets of $S$. Then
$M_S = (S, I)$ is a matroid.

Proof: The heredity property is satisfied because ∅ is trivially feasible and every
subset of a feasible set is clearly also feasible.
To show that the exchange property is satisfied, let $X$ and $Y$ be two elements
of $I$ such that $|X| > |Y|$. Without loss of generality, assume that $|X| = |Y| + 1$.
(If this is not the case, we simply remove elements from $X$ arbitrarily until this
assumption is true; by heredity the smaller set is still feasible, and any of its
elements outside $Y$ also lies in $X - Y$.) Let $|Y| = n$ and let $x_1, \ldots, x_{n+1}$ and
$y_1, \ldots, y_n$ denote the elements of $X$ and $Y$ in order of nondecreasing deadlines.
That is, $d(x_i) \le d(x_{i+1})$, $1 \le i \le n$, and $d(y_j) \le d(y_{j+1})$, $1 \le j \le n - 1$. If
$x_{n+1} \notin Y$, then $x_{n+1} \in X - Y$, and the schedule $y_1, \ldots, y_n, x_{n+1}$ is a feasible
schedule for $Y + x_{n+1}$ because $x_{n+1}$ completes at time $n + 1$ in the schedule for $X$
and thus $d(x_{n+1}) \ge n + 1$.
Assume, therefore, that $x_{n+1} \in Y$. Since $|X| > |Y|$, there exists an element of
$X$ that is not in $Y$. Let $k$ be the largest value of $i$ such that $x_i \notin Y$. Then $1 \le k \le n$
and $x_k \notin Y$ but $x_j \in Y$ for $k < j \le n + 1$. Since $x_{k+1}, \ldots, x_{n+1} \in Y$ and the
elements $x_1, \ldots, x_{n+1}$ and $y_1, \ldots, y_n$ both appear in order of nondecreasing
deadlines, it must be the case that $d(y_n) \ge d(x_{n+1})$ and, in general,
$d(y_i) \ge d(x_{i+1})$, $k \le i \le n$: the $n - i + 1$ elements $x_{i+1}, \ldots, x_{n+1}$ all lie in $Y$,
so at least $n - i + 1$ elements of $Y$ have deadline at least $d(x_{i+1})$. Also, by Lemma 2,
$d(x_i) \ge i$, $1 \le i \le n + 1$. Thus, $d(y_i) \ge d(x_{i+1}) \ge i + 1$, $k \le i \le n$. This implies
that in the schedule $y_1, \ldots, y_n$, we can “shift” the elements $y_k, \ldots, y_n$ so that they
now complete at times $k + 1, \ldots, n + 1$ and are all still on time. This leaves a
“gap” at time $k$. Since $x_k \notin Y$ and $d(x_k) \ge k$, we move $x_k$ into this gap. We now
have the set $Y + x_k$ with feasible schedule $y_1, \ldots, y_{k-1}, x_k, y_k, \ldots, y_n$. Therefore,
$Y + x_k$ is a feasible set and the exchange property is satisfied. □
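Lemma 3 can also be spot-checked by enumerating every feasible subset of a small job set and testing the exchange property directly. The deadlines in the usage lines are hypothetical, and such a check only illustrates the lemma; it does not prove it.

```python
from itertools import combinations

def feasible(jobs, deadline):
    # Lemma 2: sort by deadline and check that each job finishes on time.
    return all(deadline[j] >= t
               for t, j in enumerate(sorted(jobs, key=deadline.get), start=1))

def exchange_holds(deadline):
    """Check the exchange property over all pairs of feasible subsets."""
    jobs = list(deadline)
    fam = [frozenset(s) for k in range(len(jobs) + 1)
           for s in combinations(jobs, k) if feasible(s, deadline)]
    return all(any(feasible(Y | {e}, deadline) for e in X - Y)
               for X in fam for Y in fam if len(X) > len(Y))


deadline = {1: 2, 2: 1, 3: 2, 4: 3, 5: 1}   # hypothetical job deadlines
print(exchange_holds(deadline))              # True, as Lemma 3 predicts
```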
Since we have shown that $M_S$ is a matroid and that the scheduling problem
can be formulated as that of finding a basis of maximum weight in this
matroid, the matroid greedy algorithm can be applied to solve this problem.
Specifically, the matroid greedy algorithm begins by sorting the jobs in order of
nonincreasing weights. The set $X$ is initially empty. Each job $e_i$ is considered
according to the sorted order and is added to $X$ if and only if $X + e_i$ is independent.
To test if $X + e_i$ is independent, all the jobs in $X + e_i$ are sorted in
order of nondecreasing deadlines. Each job in this sorted order is then checked
to determine if it is completed by its deadline. If all the jobs are completed by
their deadlines, then $X + e_i$ is independent and $e_i$ is added to $X$. Otherwise,
the algorithm does not add $e_i$ to $X$. When all jobs have been considered, the
set X is a feasible set of maximum total weight. To find an optimal schedule,
we simply sort the elements in X in order of nondecreasing deadlines. We then
append the elements in S − X in arbitrary order to the end of this schedule.
Exercise 8 provides a small example on which the algorithm may be performed.
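Putting the pieces together, a sketch of the whole scheduling procedure might look like the following. Jobs are given as a dictionary from a job name to a (deadline, weight) pair, a representation we chose for illustration; feasibility is retested from scratch for each candidate, which is simple but not the fastest possible implementation.

```python
def schedule(jobs):
    """jobs: dict job -> (deadline, weight). Returns (order, on_time_set)."""
    def feasible(subset):
        # Lemma 2: nondecreasing deadlines; the job in position t finishes at time t.
        ds = sorted(jobs[j][0] for j in subset)
        return all(d >= t for t, d in enumerate(ds, start=1))

    X = set()
    # Matroid greedy: consider jobs in order of nonincreasing weight.
    for j in sorted(jobs, key=lambda j: jobs[j][1], reverse=True):
        if feasible(X | {j}):
            X.add(j)
    on_time = sorted(X, key=lambda j: jobs[j][0])   # nondecreasing deadlines
    late = [j for j in jobs if j not in X]          # any order will do
    return on_time + late, X


order, X = schedule({"A": (2, 10), "B": (1, 8), "C": (2, 6), "D": (1, 3)})
print(order, sorted(X))   # ['B', 'A', 'C', 'D'] ['A', 'B']
```

On this hypothetical instance the greedy pass accepts A and B, rejects C and D (each would make some job late), and the on-time jobs earn a total reward of 18.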

5. A Task Assignment Problem


A company has $m$ employees $E = \{e_1, \ldots, e_m\}$ and $n$ tasks $T = \{t_1, \ldots, t_n\}$
that must be completed. Each employee is qualified to perform a certain subset
of these tasks but has time to perform at most one task. There is a positive real
weight $w(t_i)$ associated with each task, representing the value or priority of
that task. The objective is to find an assignment of tasks to employees, where
each task is assigned to at most one employee and each employee is assigned
to at most one task, such that the value of the completed tasks is maximized.
This optimization problem can be modeled with a bipartite graph in which
there are two vertex sets $E = \{e_1, \ldots, e_m\}$ and $T = \{t_1, \ldots, t_n\}$. There is an
edge from a vertex $e_i$ to a vertex $t_j$ if employee $e_i$ is qualified to perform task
$t_j$. A matching in the graph is a subset of edges such that no two edges share
a common vertex. For a given matching, a vertex is said to be matched if some
edge in the matching is incident on it. Our objective is to find a matching in
the graph that maximizes the sum of the weights of the matched vertices in T .
This problem, too, can be solved with a greedy algorithm. To do so, we begin
by defining a matroid known in the literature as a transversal matroid. Given a
bipartite graph with vertex sets $E$ and $T$, we define $X \subseteq T$ to be
matchable if there exists some matching in the graph that matches every vertex
in $X$ to some vertex in $E$.³ Let $\mathcal{I}$ denote the set of all matchable
subsets of $T$. We claim that $M_T = (T, \mathcal{I})$ is a matroid. Notice that
solving our problem is exactly the problem of finding an independent set of
maximum weight in this matroid. Since all tasks in $T$ have positive weight, an
independent set of maximum weight is a basis of maximum weight. Thus, by
showing that $M_T$ is a matroid, we can solve the optimization problem by using
the matroid greedy algorithm.

192 The UMAP Journal 21.2 (2000)

Lemma 4 Given a bipartite graph with vertices $E \cup T$, let $\mathcal{I}$ denote the set
of all matchable subsets of $T$. Then $M_T = (T, \mathcal{I})$ is a matroid.

Proof: The heredity property is satisfied because $\emptyset$ is trivially
matchable and any subset of a matchable set is also matchable: a matching that
matches every vertex of $X$ also matches every vertex of each subset of $X$.
To show that the exchange property is satisfied, let $X$ and $Y$ be two elements
of $\mathcal{I}$ such that $|X| > |Y|$. Since $X$ and $Y$ are matchable, let $M_X$
and $M_Y$ be matchings that match the vertices in $X$ and $Y$, respectively, with
vertices in $E$. By discarding unneeded edges, we may assume that every edge of
$M_X$ is incident on a vertex of $X$ and that every edge of $M_Y$ is incident on a
vertex of $Y$. Color the edges of $M_X - M_Y$ black, the edges of $M_Y - M_X$
white, and the edges of $M_X \cap M_Y$ gray. Notice that every edge in
$M_X \cup M_Y$ is colored black, white, or gray. Since the number of edges in
$M_X$ is exactly equal to the number of vertices in $X$, and similarly for $M_Y$
and $Y$, we have $|M_X| > |M_Y|$. Because the gray edges belong to both
matchings, this implies that there are more black edges than white edges.
Next, consider the subgraph $M$ induced by the black and white edges.⁴ Each
vertex in $M$ is incident on at most two edges in $M$, because it is incident on
at most one edge in $M_X$ and at most one edge in $M_Y$. Therefore, the vertices
of $M$ have degree one or two, which implies that $M$ consists of vertex-disjoint
cycles and paths. Because no vertex of $M$ can be incident on two edges of the
same color, the edges on these cycles and paths alternate between black and
white; these are called alternating cycles and paths. Each alternating cycle
has an equal number of black and white edges. Since there are more black edges
than white edges, some alternating path must have more black edges than white
edges.
Let $P$ be an alternating path with more black edges than white edges, and let
$v_1, v_2, \ldots, v_k$ denote the vertices on $P$ from one endpoint to the other.
Since the colors on $P$ alternate and black edges outnumber white ones, the
first and last edges on this path must be black. Therefore, the path has an odd
number of edges and thus an even number of vertices. Since the graph is
bipartite, the vertices alternate between being in $E$ and $T$, so one endpoint
of $P$ is in $E$ and the other is in $T$. Without loss of generality, assume that
$v_1 \in T$.

³ The technical name for such a subset is a partial transversal.
⁴ The subgraph induced by the black and white edges is the graph formed by
considering only those edges and the vertices incident to them.

We now claim that $v_1 \in X - Y$. First, $v_1 \in X$ because $v_1 \in T$ is
incident on an edge in $M_X$, and every edge of $M_X$ is incident on a vertex of
$X$. Assume, by way of contradiction, that $v_1 \in Y$. Then $M_Y$ must contain an
edge incident on $v_1$. Such an edge is either white or gray. The black edge in
$P$ incident on $v_1$ is, by definition, from matching $M_X$. Since there cannot
be a second edge of $M_X$ incident on $v_1$, no gray edge is incident on $v_1$. If
there is a white edge incident on $v_1$, then this edge is part of $P$,
contradicting the assumption that $v_1$ is an endpoint of $P$. Thus,
$v_1 \in X - Y$.
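Since $v_1 \in X - Y$, we now consider the set $Y + v_1$. To demonstrate that
$Y + v_1$ is matchable, modify the matching $M_Y$ by removing the white edges in
$P$ and adding the black edges in $P$. Every vertex of $P$ is incident on exactly
one black edge of $P$, and no edge of $M_Y$ outside $P$ touches a vertex of $P$:
such an edge would either be white, and hence part of $P$, or gray, and hence a
second edge of $M_X$ at a vertex already incident on a black edge. So the
resulting set is still a matching and, in addition to matching every vertex in
$Y$, it matches $v_1$ as well. Thus $Y + v_1$ is matchable and is therefore an
element of $\mathcal{I}$. □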
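Because the exchange argument in the proof of Lemma 4 is constructive, it can be traced in code. The Python sketch below is an illustration under stated assumptions, not part of the original module: an edge is a pair `(employee, task)`, employee and task names are distinct strings, and `MX` and `MY` are valid matchings in which every edge is incident on a vertex of $X$ (respectively $Y$). The function name `exchange_step` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]  # (employee, task); names assumed distinct across sides


def exchange_step(MX: Set[Edge], MY: Set[Edge]) -> Tuple[str, Set[Edge]]:
    """Mirror the proof of Lemma 4: find an alternating path with more black
    than white edges and swap its edges into MY. Returns (v1, new matching)."""
    black, white = MX - MY, MY - MX
    adj: Dict[str, List[Edge]] = defaultdict(list)
    for edge in black | white:  # gray edges (in both MX and MY) are left out
        emp, task = edge
        adj[emp].append(edge)
        adj[task].append(edge)

    for start in list(adj):
        if len(adj[start]) != 1:  # only path endpoints have degree one
            continue
        path: List[Edge] = []
        v: str = start
        prev: Optional[Edge] = None
        while True:  # walk to the other endpoint of this path
            nxt = [edge for edge in adj[v] if edge != prev]
            if not nxt:
                break
            prev = nxt[0]
            path.append(prev)
            emp, task = prev
            v = emp if v == task else task
        n_black = sum(1 for edge in path if edge in black)
        if n_black > len(path) - n_black:
            # Odd-length path: exactly one endpoint is a task, and it is v1.
            v1 = start if adj[start][0][1] == start else v
            flipped = (MY - set(path)) | {edge for edge in path if edge in black}
            return v1, flipped
    raise ValueError("no alternating path with more black edges; need |MX| > |MY|")


if __name__ == "__main__":
    MX = {("e1", "t4"), ("e2", "t2"), ("e3", "t1")}  # matches X = {t1, t2, t4}
    MY = {("e2", "t1")}                              # matches Y = {t1}
    v1, M = exchange_step(MX, MY)
    print(v1, sorted(M))
```

In this example two paths qualify (t2, e2, t1, e3 and e1, t4), and which one is found first depends on set iteration order, so `v1` may be either `t2` or `t4`. Both lie in $X - Y$, and in both cases the returned set is a matching that covers $Y + v_1$.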
Now that we have a matroid corresponding to this matching problem, we can use
the matroid greedy algorithm to determine an optimal solution. For any graph,
we consider the vertices in $T$ in nonincreasing order of weight and add each
vertex to the current set whenever the resulting set remains independent. The
problem of determining whether a set is independent in this context (that is,
whether a subset of $T$ is matchable) can be solved using the bipartite matching
algorithm often covered in a graph theory course [Bondy and Murty 1976] or the
network flow algorithms generally covered in an algorithms course [Cormen et
al. 1990]. Exercise 9 provides an example on which the greedy algorithm may be
performed. This example is sufficiently small that testing for independence
can easily be performed by inspection.
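Here is a minimal Python sketch of this procedure. It tests independence with a simple augmenting-path search, which is one standard bipartite matching method chosen here for brevity rather than speed, and it rebuilds the matching from scratch for each candidate set. The function names `find_matching` and `greedy_assignment` are hypothetical, and so is the representation of qualifications as a map from each task to the set of employees qualified to perform it.

```python
from typing import Dict, List, Optional, Set, Tuple


def find_matching(tasks: List[str],
                  qualified: Dict[str, Set[str]]) -> Optional[Dict[str, str]]:
    """Return a task -> employee matching that covers every task in `tasks`,
    or None if no such matching exists (the set is not matchable)."""
    task_of: Dict[str, str] = {}  # employee -> task currently assigned

    def augment(task: str, visited: Set[str]) -> bool:
        # Try to place `task`, re-routing previously placed tasks if needed.
        for emp in sorted(qualified.get(task, set())):
            if emp in visited:
                continue
            visited.add(emp)
            if emp not in task_of or augment(task_of[emp], visited):
                task_of[emp] = task
                return True
        return False

    for task in tasks:
        if not augment(task, set()):
            return None
    return {t: e for e, t in task_of.items()}


def greedy_assignment(weight: Dict[str, int], qualified: Dict[str, Set[str]]
                      ) -> Tuple[List[str], Dict[str, str]]:
    """Matroid greedy algorithm on the transversal matroid M_T."""
    X: List[str] = []
    for task in sorted(weight, key=lambda t: -weight[t]):
        if find_matching(X + [task], qualified) is not None:
            X.append(task)
    matching = find_matching(X, qualified)
    assert matching is not None  # X is matchable by construction
    return X, matching


if __name__ == "__main__":
    # Exercise 9: task -> employees qualified to perform it.
    qualified = {"t1": {"e2", "e3"}, "t2": {"e2"}, "t3": {"e3"}, "t4": {"e1"}}
    weight = {"t1": 5, "t2": 4, "t3": 2, "t4": 1}
    X, M = greedy_assignment(weight, qualified)
    print(X, sorted(M.items()), sum(weight[t] for t in X))
    # ['t1', 't2', 't4'] [('t1', 'e3'), ('t2', 'e2'), ('t4', 'e1')] 10
```

Recomputing the matching for every candidate keeps the sketch short. An incremental version would instead search for a single augmenting path from the new task, starting from the matching already found for $X$.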

6. Conclusion
We have seen that matroids are a powerful tool for constructing greedy
algorithms and showing their correctness. Matroids can also be used to develop
efficient algorithms for even more difficult optimization problems where simple
greedy algorithms fail. For example, consider a variant of the bipartite
matching problem that arose in the task assignment problem. In that problem,
weights were associated with a subset of the vertices in the graph, and the
objective was to find a matching that maximized the sum of these weights. Now
consider the situation in which there is a positive weight associated with each
edge, rather than with some vertices, and we wish to find a matching that
maximizes the sum of the weights on the matched edges. Although the greedy
algorithm does not always find an optimal solution for this problem, the
problem can be solved using the notion of matroid intersection.
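To see concretely why the simple greedy rule can fail once the weights sit on edges, consider a hypothetical three-edge instance. It has left vertices $a, b$, right vertices $x, y$, and weights $w(ax) = 3$, $w(ay) = 2$, $w(bx) = 2$. The Python sketch below is our illustration, and its function names are hypothetical. It applies the greedy rule to the edges directly.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (left vertex, right vertex)


def is_matching(edges: List[Edge]) -> bool:
    """True if no two edges share an endpoint."""
    ends = [v for edge in edges for v in edge]
    return len(ends) == len(set(ends))


def greedy_matching(weight: Dict[Edge, int]) -> List[Edge]:
    """Scan edges by nonincreasing weight, keeping each one that leaves the
    chosen set a matching."""
    chosen: List[Edge] = []
    for edge in sorted(weight, key=lambda e: -weight[e]):
        if is_matching(chosen + [edge]):
            chosen.append(edge)
    return chosen


if __name__ == "__main__":
    w = {("a", "x"): 3, ("a", "y"): 2, ("b", "x"): 2}
    greedy = greedy_matching(w)
    better = [("a", "y"), ("b", "x")]
    assert is_matching(better)
    print(greedy, sum(w[e] for e in greedy))  # [('a', 'x')] 3
    print(better, sum(w[e] for e in better))  # [('a', 'y'), ('b', 'x')] 4
```

The greedy rule stops at weight 3 because its first choice blocks both remaining edges, whereas the matching $\{ay, bx\}$ has weight 4. In matroid terms, the edge sets that are matchings fail the exchange property here: $\{ay, bx\}$ is larger than $\{ax\}$, yet neither of its edges can be added to $\{ax\}$.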
Given two matroids $M_1 = (E, \mathcal{I}_1)$ and $M_2 = (E, \mathcal{I}_2)$ over the
same set $E$ and a weight function $w : E \to \mathbb{R}$, the matroid intersection
problem for two matroids is that of finding an element
$X \in \mathcal{I}_1 \cap \mathcal{I}_2$ of maximum weight. Efficient algorithms are
known for solving the matroid intersection problem for two matroids. Many
discrete optimization problems, including the aforementioned bipartite matching
problem with edge weights, can be formulated as a matroid intersection problem
for two matroids.
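For the edge-weighted bipartite problem, a standard formulation (stated here without proof) takes $E$ to be the edge set and uses two partition matroids. $\mathcal{I}_1$ contains the edge sets with at most one edge at each left vertex, and $\mathcal{I}_2$ those with at most one edge at each right vertex, so a set lies in $\mathcal{I}_1 \cap \mathcal{I}_2$ exactly when it is a matching. The Python sketch below checks this formulation by brute force over all subsets. It is exponential and only illustrates the problem statement, not the efficient intersection algorithms mentioned above. The function names and the example instance are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple

Edge = Tuple[str, str]  # (left vertex, right vertex)


def in_partition_matroid(edges: Sequence[Edge], side: int) -> bool:
    """Partition matroid on edges with blocks grouped by the endpoint on
    `side` (0 = left, 1 = right): independent iff no block is used twice."""
    owners = [edge[side] for edge in edges]
    return len(owners) == len(set(owners))


def max_weight_intersection(weight: Dict[Edge, int]) -> Tuple[Set[Edge], int]:
    """Brute-force maximum-weight set independent in both partition
    matroids, that is, a maximum-weight bipartite matching."""
    ground = list(weight)
    best: Tuple[Edge, ...] = ()
    best_w = 0  # the empty set is always a common independent set
    for r in range(len(ground) + 1):
        for X in combinations(ground, r):
            if in_partition_matroid(X, 0) and in_partition_matroid(X, 1):
                total = sum(weight[e] for e in X)
                if total > best_w:
                    best, best_w = X, total
    return set(best), best_w


if __name__ == "__main__":
    w = {("a", "x"): 3, ("a", "y"): 2, ("b", "x"): 2}
    best, total = max_weight_intersection(w)
    print(sorted(best), total)  # [('a', 'y'), ('b', 'x')] 4
```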
Some optimization problems that cannot be solved by greedy algorithms, or even
by using the intersection of two matroids, can be formulated as the
intersection of three or more matroids. The traveling salesperson problem, for
example, can be formulated as that of finding a set of maximum weight that is
independent in each of three matroids. Unfortunately, the matroid intersection
problem for three or more matroids is NP-complete.
Several mathematical structures related to matroids have also been studied.
For example, an antimatroid is a structure with a weaker version of the
heredity property but a stronger version of the exchange property. A common
framework for matroids and antimatroids is a structure called a greedoid
[Korte et al. 1991], which uses the weaker versions of both the heredity and
the exchange properties. Antimatroids and greedoids also have applications in
the development of greedy algorithms for optimization problems.
Finally, while we have investigated applications of matroids to discrete
optimization, matroids and their related structures have a variety of other
applications in mathematics. Active areas of current research include
applications of such structures in algebra, geometry, and topology.
We refer the interested reader to several excellent books. Oxley's text
provides a comprehensive introduction to matroid theory [Oxley 1992]. Texts by
Lawler [1976] and by Papadimitriou and Steiglitz [1982] discuss the
applications of matroids to discrete optimization.

7. Exercises
1. Prove Lemma 1.
2. Let $M = (E, \mathcal{I})$ be a matroid. By analogy with the definition in a
vector space, the rank of a set $X \subseteq E$, denoted $r(X)$, is the size of a
largest independent subset of $X$. Let $B$ be a basis of $M$. Show that
$r(B) = r(E) = |B|$.
3. Recall that if $V$ is a vector space and $X \subseteq V$, the span of $X$, denoted
$\mathrm{span}(X)$, is the subspace of all vectors that can be expressed as a
linear combination of vectors in $X$. One property of vector spaces is that for
any basis $B$ in vector space $V$, $\mathrm{span}(B) = V$.
If $M = (E, \mathcal{I})$ is a matroid and $X \subseteq E$, the span of $X$, denoted
$\mathrm{span}(X)$, is defined to be a maximal superset $Y$ of $X$ such that
$r(Y) = r(X)$. Let $M = (E, \mathcal{I})$ be a matroid and let $B$ be a basis in the
matroid. Show that $\mathrm{span}(B) = E$.
4. In the Introduction, we described a problem involving a weighted graph in
which each vertex corresponds to a city, each edge corresponds to an existing
phone line, and the weight on each edge corresponds to the cost of leasing
that phone line. We assumed that the weight on each edge is a positive real
number. We then argued that the problem of finding the least expensive
subset of edges that permits us to find a path between any two vertices is
exactly that of finding a minimum spanning tree in the graph. Show that if
the edge weights are not necessarily all positive, then a minimum spanning
tree does not necessarily give an optimal solution to this problem.
5. Does the matroid greedy algorithm find a minimum spanning tree if edges
can have arbitrary real weights?
6. In the scheduling problem we assumed that all jobs had positive weights.
Can the matroid greedy algorithm be used to solve this problem if jobs can
have arbitrary real weights?

7. Consider the weighted graph shown in Figure 4. Use the greedy algorithm
described earlier to find a minimum spanning tree in this graph.

Figure 4. Weighted graph for Exercise 7.

8. Consider the scheduling problem for five jobs with deadlines d(1) = 2,
d(2) = 3, d(3) = 1, d(4) = 2, d(5) = 4 and weights w(1) = 3, w(2) = 1,
w(3) = 2, w(4) = 5, w(5) = 1.
a) Use the greedy algorithm to find an optimal schedule.
b) What is the sum of the weights of the on-time jobs in an optimal schedule
in this case?
c) How does this compare to the sum of the weights of the on-time jobs in
the schedule 1, 2, 3, 4, 5?
9. Consider the task assignment problem for three employees $\{e_1, e_2, e_3\}$
and four tasks $\{t_1, t_2, t_3, t_4\}$. Employee $e_1$ is qualified to perform
task $t_4$, employee $e_2$ is qualified to perform tasks $t_1$ and $t_2$, and
employee $e_3$ is qualified to perform tasks $t_1$ and $t_3$. The weights on the
tasks are $w(t_1) = 5$, $w(t_2) = 4$, $w(t_3) = 2$, and $w(t_4) = 1$.
a) Use the greedy algorithm to find an optimal solution for this instance of
the task assignment problem.
b) What is the total weight of the completed tasks in this solution?

10. Let $E$ be a set, and let $P_1, P_2, \ldots, P_k$ be a partition of $E$. That is,
$E = P_1 \cup P_2 \cup \cdots \cup P_k$ and $P_i \cap P_j = \emptyset$ for all $i, j$
with $1 \le i < j \le k$. A set $S$ is said to partially represent
$P_1, \ldots, P_k$ if $S$ contains at most one member of each of $P_1, \ldots, P_k$.
Let $\mathcal{I}$ be the collection of subsets of $E$ that partially represent
$P_1, \ldots, P_k$. In this exercise, we verify that $M_P = (E, \mathcal{I})$ is a
matroid.
a) Show that $M_P$ satisfies the heredity property.
b) Show that $M_P$ satisfies the exchange property.
11. We investigate another discrete optimization problem and its corresponding
matroid. Assume that there are n computers in a network. Ideally, every
computer in the network is paired up with exactly one other computer so
that the two computers can exchange files periodically for backup purposes.
For compatibility reasons, only certain pairs of computers can be matched,
and it therefore may not be possible to find a match for every computer. Each
computer administrator has stipulated a fee that they are willing to pay to
have their computer matched. The company that provides the matching
service would like to find a matching that maximizes the total fees that it
can collect.
This optimization problem can be modeled as a graph in which vertices
correspond to computers, edges join pairs of computers that can potentially
be matched, and the weight on each vertex represents the fee to be paid if
that vertex is matched. Recall that a matching in a graph is a set of edges
such that no two edges share a common vertex. Our objective is to find a
matching that maximizes the sum of the weights on the matched vertices.
Notice that this graph matching problem is slightly different from the
one that arose in the task assignment problem. First, the graph that models
this problem is not necessarily bipartite. In addition, in this problem every
vertex has an associated weight.
To solve this problem, we consider a new matroid called a matching
matroid. Let $G = (V, E)$ be a graph. Define $X \subseteq V$ to be matchable if there
exists some matching $M \subseteq E$ such that every vertex in $X$ is incident on
some edge in $M$. For convenience, we say that an edge in $M$ covers its
endpoints, or that $M$ covers the set $X$. Unlike the case of the bipartite
matchings in the task assignment problem, an edge in matching $M$ may be
incident on zero, one, or two vertices in $X$. Let $\mathcal{I}$ be the set of all
matchable subsets of $V$. Assume that we already have an algorithm that
determines whether or not a set is matchable and, if it is matchable,
constructs a matching for it.
a) If $M_M = (V, \mathcal{I})$ is a matroid and $w$ is a weight function from the
vertices to the positive reals, show that a basis of maximum weight in $M_M$
corresponds to an optimal solution to this optimization problem.
b) We now show that $M_M$ is a matroid in a sequence of steps. Begin by
showing that $M_M$ satisfies the heredity property.
c) To show that the exchange property is satisfied, begin by considering
$X, Y \in \mathcal{I}$ such that $|X| > |Y|$, and consider matchings $M_X$ and $M_Y$
that cover $X$ and $Y$. Argue that if $M_Y$ covers a vertex in $X - Y$, then the
exchange property is satisfied.
d) Now consider the case that $M_Y$ covers no vertex in $X - Y$. As in the
proof of Lemma 4, color the edges in $M_X - M_Y$ black, color the edges in
$M_Y - M_X$ white, and color the edges in $M_X \cap M_Y$ gray. Show that the
gray edges cover at least as many vertices in $Y$ as in $X$.
e) Recall that an alternating path (cycle) is a path (cycle) that alternates
between black and white edges. Show that the edges on the alternating cycles
cover at least as many vertices in $Y$ as in $X$.
f) Show that the alternating paths contain more vertices of $X$ than of $Y$.
g) Show that an endpoint of an alternating path cannot be in both $X$ and $Y$.
h) Show that there exists an alternating path $v_1, \ldots, v_k$ such that
$v_1 \in X - Y$ and $v_k \notin Y$.
i) Show that $Y + v_1 \in \mathcal{I}$ by showing that a matching exists that covers
this set. Conclude that $M_M$ is a matroid.
12. As an example of the optimization problem described in Exercise 11,
consider a network comprising five computers, $a, b, c, d, e$, represented by
the graph in Figure 5. An edge between two vertices indicates that the
corresponding computers are compatible and may be matched to one another.
The number next to each vertex represents the fee (or weight) associated
with that computer. Use the greedy algorithm to find an optimal solution
for this instance of the optimization problem.

Figure 5. Graph for Exercise 12.

8. Solutions to the Exercises


1. Let $M = (E, \mathcal{I})$ be a matroid and let $X$ and $Y$ be two bases of $M$. If
$|X| > |Y|$, then by the exchange property there exists some $e \in X - Y$ such
that $Y + e \in \mathcal{I}$. This contradicts the assumption that $Y$ is a basis. By
symmetry, it cannot be the case that $|X| < |Y|$. Thus $|X| = |Y|$, proving the
lemma.
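2. Since $B$ is independent, it is its own largest independent subset, so
$r(B) = |B|$. Further, a largest independent subset $C$ of $E$ cannot be extended
to a larger independent set, so $C$ is a basis and $r(E) = |C|$. By Lemma 1
(Exercise 1), $B$ and $C$ have the same cardinality, so $r(E) = |B| = r(B)$.
3. From the previous problem, $r(B) = r(E)$. Thus $E$ is a superset of $B$ with
the same rank as $B$, and since no subset of $E$ properly contains $E$, it is a
maximal such superset. Hence $\mathrm{span}(B) = E$.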

4. If some edge weights are negative, it may be advantageous to select a
subset of edges that contains a cycle. For example, consider a graph with
three vertices $a, b, c$ and edges $ab, bc, ca$ with weights $-1, -2, -3$. The
minimum spanning tree $\{bc, ca\}$ has total weight $-5$, but selecting all three
edges, in spite of the fact that they form a cycle, still connects every pair
of vertices and has the smaller total weight $-6$.

5. Yes. We wish to find a basis of minimum weight (equivalently, a basis of
maximum weight with respect to $-w$) regardless of whether the weights are
positive or negative. The matroid greedy algorithm was shown to work for any
matroid $M = (E, \mathcal{I})$ with weight function $w : E \to \mathbb{R}$.

6. No. We showed that an optimal solution to this problem corresponds to a
basis of maximum weight in the matroid assuming that the weights are
positive. If the weights are not all positive, this claim is not true. In
particular, a basis does not necessarily correspond to an optimal solution to
this problem when some jobs have negative weights. Therefore, the matroid
greedy algorithm cannot be used for this problem if negative weights are
permitted.

7. The algorithm selects the edges with weights 1, 2, 4, and 6 to be in the
spanning tree.
8. a) The greedy algorithm finds schedule 4, 1, 2, 5, 3 or 1, 4, 2, 5, 3,
depending on how the tie between jobs 1 and 4, which have the same deadline,
is broken.
b) The sum of the weights of the on-time jobs is 10 (jobs 1, 2, 4, and 5 are
on time: 3 + 1 + 5 + 1 = 10).
c) The sum of the weights of the on-time jobs in schedule 1, 2, 3, 4, 5 is 4
(only jobs 1 and 2 are on time: 3 + 1 = 4).
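9. a) The algorithm selects tasks $t_1$, $t_2$, and $t_4$. Task $t_3$ is rejected
because $t_1$, $t_2$, and $t_3$ can be performed only by the two employees $e_2$ and
$e_3$. The selected tasks can be assigned to employees $e_3$, $e_2$, and $e_1$,
respectively.
b) The total weight of this solution is 5 + 4 + 1 = 10.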

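10. a) The empty set has no elements from any $P_i$, $1 \le i \le k$, and is
therefore independent. Let $X \in \mathcal{I}$ and $Y \subseteq X$. Since $X$ has at
most one element from each $P_i$, $1 \le i \le k$, $Y$ does as well. Hence, $Y$ is
independent.
b) Let $X, Y \in \mathcal{I}$ with $|X| > |Y|$. Since each set has at most one
element in each block, $X$ meets exactly $|X|$ of the blocks $P_i$ and $Y$ meets
exactly $|Y| < |X|$ of them. Hence there must exist some $i$, $1 \le i \le k$, such
that $X$ contains an element $e$ of $P_i$ but $Y$ contains no element of $P_i$.
Therefore, $e \in X - Y$ and $Y + e \in \mathcal{I}$.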
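11. a) Each set in $\mathcal{I}$ corresponds to a feasible solution to the problem,
whose value is its weight. Since the weight function is positive, adding an
element to an independent set increases its weight, so an optimal solution is
a set in $\mathcal{I}$ such that the addition of any other element of $V$ results
in a set not in $\mathcal{I}$. Such a set is a basis of $M_M$, and an optimal
solution is a basis of maximum weight.
b) The heredity property is satisfied since $\emptyset \in \mathcal{I}$, and if
$X \in \mathcal{I}$ is covered by some matching $M$, then $M$ also covers any subset
$Y \subseteq X$, and thus $Y \in \mathcal{I}$.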
c) If $M_Y$ covers some vertex $v \in X - Y$, then $M_Y$ covers $Y + v$; thus
$Y + v \in \mathcal{I}$ and the exchange property is satisfied.
d) Assume that $M_Y$ covers no vertex in $X - Y$. The gray edges are those in
$M_X \cap M_Y$ and thus cannot cover a vertex in $X - Y$. Thus, if a gray edge
covers a vertex in $X$, that vertex must be in $Y$ as well. Therefore, the gray
edges cover at least as many vertices in $Y$ as in $X$.
e) Every vertex on an alternating cycle is covered by an edge in $M_Y$. Thus,
any vertex in $X$ on an alternating cycle must also be in $Y$. Therefore, the
alternating cycles cover at least as many vertices in $Y$ as in $X$.
f) Every vertex in $X \cup Y$ is covered by a gray, black, or white edge. From
the previous two results, we know that the gray edges and the alternating
cycles contain at least as many vertices in $Y$ as in $X$. Since $|X| > |Y|$, the
alternating paths must contain more vertices of $X$ than of $Y$.
g) Let $v_1, \ldots, v_k$ be an alternating path and assume that $v_1 \in X \cap Y$.
Then both $M_X$ and $M_Y$ contain edges incident on $v_1$. Since edge $(v_1, v_2)$
is from either $M_X$ or $M_Y$ but not both, there must exist another edge from
$M_X$ or $M_Y$ incident on $v_1$. Thus, $v_1$ is either incident on two edges from
the same matching or is not an endpoint of the alternating path, both of
which are contradictions.
h) From the above observations, there must exist an alternating path $P$
with more vertices in $X$ than in $Y$. Let $v_1, \ldots, v_k$ be the vertices of $P$
from one endpoint to the other. At least one endpoint must be in $X - Y$,
since otherwise the path has at least as many vertices in $Y$ as in $X$. Without
loss of generality, assume $v_1 \in X - Y$. If $v_k \in Y$, then from the above
observation $v_k$ cannot be in $X$. In this case, the path contains at least as
many vertices in $Y$ as in $X$, a contradiction.
i) Let us remove the white edges in $P$ from $M_Y$ and add the black edges
of $P$ to $M_Y$. This is still a matching, and it is incident on all vertices in
$Y + v_1$. This matching confirms that $Y + v_1$ is matchable and that $M_M$ is
a matroid.

12. An optimal solution comprises vertices $e, b, d, c$. The total weight of this
solution is 14, and a matching for this set of vertices matches $e$ with $c$ and
matches $b$ with $d$.
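The solutions to Exercises 10 and 11 can also be sanity-checked on small instances by brute force. Checking a few examples is no substitute for the proofs. The Python sketch below is ours, not part of the module, and its function names are hypothetical. It enumerates every subset of a small ground set and tests the heredity property (removing one element at a time suffices, by induction) and the exchange property. It applies these checks to three cases:

- a partition matroid;
- the matching matroid of a small non-bipartite graph: a triangle $a, b, c$ with pendant vertices $d$ and $e$ attached to $c$;
- for contrast, the edge sets of matchings in the path a-b-c-d, which do not form a matroid.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, List, Sequence, Set, Tuple

Oracle = Callable[[FrozenSet[Hashable]], bool]


def all_subsets(ground: Sequence[Hashable]) -> List[FrozenSet[Hashable]]:
    return [frozenset(c) for r in range(len(ground) + 1)
            for c in combinations(ground, r)]


def is_matroid(ground: Sequence[Hashable], independent: Oracle) -> bool:
    family = {S for S in all_subsets(ground) if independent(S)}
    if frozenset() not in family:
        return False
    # Heredity: closure under single-element removal implies closure under subsets.
    if any(S - {e} not in family for S in family for e in S):
        return False
    # Exchange: |X| > |Y| implies Y + e is independent for some e in X - Y.
    return all(any(Y | {e} in family for e in X - Y)
               for X in family for Y in family if len(X) > len(Y))


def partition_oracle(blocks: Sequence[Set[Hashable]]) -> Oracle:
    """Exercise 10: independent iff at most one element from each block."""
    block_of = {e: i for i, block in enumerate(blocks) for e in block}
    return lambda S: len({block_of[e] for e in S}) == len(S)


def matchable_oracle(edges: Sequence[Tuple[str, str]]) -> Oracle:
    """Exercise 11: X is matchable iff some matching covers every vertex of X.
    Brute force over all edge subsets; suitable only for tiny graphs."""
    covers = []
    for r in range(len(edges) + 1):
        for M in combinations(edges, r):
            ends = [v for edge in M for v in edge]
            if len(ends) == len(set(ends)):  # M is a matching
                covers.append(frozenset(ends))
    return lambda S: any(S <= C for C in covers)


def is_matching(S: FrozenSet[Tuple[str, str]]) -> bool:
    ends = [v for edge in S for v in edge]
    return len(ends) == len(set(ends))


if __name__ == "__main__":
    print(is_matroid([1, 2, 3, 4, 5], partition_oracle([{1, 2}, {3}, {4, 5}])))  # True
    graph = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("c", "e")]
    print(is_matroid(list("abcde"), matchable_oracle(graph)))  # True
    path = [("a", "b"), ("b", "c"), ("c", "d")]
    print(is_matroid(path, is_matching))  # False
```

In the path example, $\{ab, cd\}$ is larger than $\{bc\}$, but neither $ab$ nor $cd$ can be added to $\{bc\}$, so the exchange check fails.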

9. References
Bondy, J.A., and U.S.R. Murty. 1976. Graph Theory with Applications. London: Macmillan.
Borůvka, O. 1926. On a certain minimal problem. Práce Moravské Přírodovědecké Společnosti 3: 37–58.
Cormen, T.H., C.E. Leiserson, and R.L. Rivest. 1990. Introduction to Algorithms. New York: McGraw-Hill.
Edmonds, J., and D.R. Fulkerson. 1965. Transversals and matroid partition. Journal of Research of the National Bureau of Standards, Section B 69: 147–153.
Garey, M.R., and D.S. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. New York: W.H. Freeman.
Korte, B., L. Lovász, and R. Schrader. 1991. Greedoids. New York: Springer-Verlag.
Kruskal, J.B. 1956. On the shortest spanning subtree of a graph and the travelling salesman problem. Proceedings of the American Mathematical Society 7: 48–50.
Lawler, E. 1976. Combinatorial Optimization: Networks and Matroids. New York: Holt, Rinehart, and Winston.
Oxley, J.G. 1992. Matroid Theory. New York: Oxford University Press.
Papadimitriou, C., and K. Steiglitz. 1982. Combinatorial Optimization: Algorithms and Complexity. Englewood Cliffs, NJ: Prentice-Hall.
Prim, R.C. 1957. Shortest connection networks and some generalizations. Bell System Technical Journal 36: 1389–1401.
Whitney, H. 1935. On the abstract properties of linear dependence. American Journal of Mathematics 57: 509–533.

About the Authors


Christian Jones completed a B.S. in mathematics at Harvey Mudd College and is
currently pursuing a Ph.D. in mathematics at the University of Florida. His
research interests are in combinatorics, especially algorithms in graph
theory, matroid theory, and number theory.

Ran Libeskind-Hadas completed a B.S. in applied mathematics at Harvard
University and an M.S. and Ph.D. in computer science at the University of
Illinois at Urbana–Champaign. He is an associate professor of computer science
at Harvey Mudd College. His research interests are in algorithms and parallel
computing, and he teaches courses in discrete mathematics, algorithms, and
other areas of computer science.
