Greedy Matching Algorithm
Given a graph G = (V, E) with nodes V and edges E, a matching M is a subset of edges M ⊆ E such that each node has degree at most 1 in M. A node v is matched in M if it has an incident edge in M. A matching M is a perfect matching if all nodes are matched. Both of the recommended textbooks discuss the matching problem and its applications, so we won't repeat this here. In this note we discuss two topics. First we cover the greedy algorithm for maximum weight matching, and then the Hopcroft-Karp O(√|V| |E|) algorithm for finding a maximum matching (with no weights) in bipartite graphs.
Greedy Algorithm
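Given a graph and weights w_e ≥ 0 for the edges, the goal is to find a matching of large weight. The greedy algorithm starts by sorting the edges by weight in decreasing order, and then adds edges to the matching in this order as long as the set remains a matching. A bit more formally:
• Sort the edges so that w_{e_1} ≥ w_{e_2} ≥ · · · ≥ w_{e_m}, and set M = ∅.
• For i = 1, . . . , m: if M ∪ {e_i} is a matching, add e_i to M.
• Return M.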
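The greedy rule above can be sketched in Python as follows. This is a minimal illustration rather than code from these notes; the function name `greedy_matching` and the edge format `(u, v, weight)` are assumptions made for the example. Checking whether M ∪ {e} is a matching reduces to checking that neither end of e is already matched.

```python
def greedy_matching(edges):
    """Greedy max-weight matching (sketch).

    `edges` is an iterable of (u, v, weight) triples with weight >= 0.
    Edges are scanned in decreasing order of weight, and an edge is kept
    whenever both of its end-nodes are still unmatched, i.e. whenever
    adding it keeps the set a matching.
    """
    matched = set()      # nodes covered by the current matching M
    matching = []        # the edges of M, in the order they were added
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if u != v and u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
    return matching


# The bad example from the text: a path with weights 1, 1 + eps, 1.
eps = 0.1
path = [("a", "b", 1.0), ("b", "c", 1.0 + eps), ("c", "d", 1.0)]
print(greedy_matching(path))   # [('b', 'c', 1.1)]
```

Sorting dominates the running time, so this sketch takes O(|E| log |E|) time.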
The greedy algorithm does not always find the optimal solution. For an example, consider a path of length 3 with two outer edges of weight 1 and a middle edge of weight 1 + ε. The greedy algorithm results in a single-edge matching of weight 1 + ε, while the optimum is the two-edge matching of weight 2, so greedy is off by a factor of almost 2. We claim that this example is the worst possible.
Theorem 1. The weight of the matching M returned by the greedy algorithm is at least half of the weight of any matching M*.
Proof. Let M* be a matching of maximum weight, and let M be the matching returned by the greedy algorithm. For any edge e ∈ M* \ M there is a reason e didn't get into the greedy matching M: a previously considered edge, call it f(e), that is already in M, has weight at least w_e, and shares an end-node with e. If there are two such edges, let f(e) be either of them. For an edge e ∈ M* ∩ M, let f(e) = e.
• For every edge e ∈ M*, we have w_e ≤ w_{f(e)}.
• For an edge f ∈ M there can be f = f(e) for at most two edges e ∈ M*, conflicting with f at its two different ends (since M* is a matching, each end of f meets at most one edge of M*).
Putting these two facts together, we get the following inequalities:
$$\sum_{e \in M^*} w_e \le \sum_{e \in M^*} w_{f(e)} \le 2 \sum_{f \in M} w_f,$$
so the weight of M is at least half the weight of M*.
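Theorem 1 can also be checked empirically on small instances. The sketch below compares the greedy weight with the optimum found by exhaustive search. The random graphs on 6 nodes with integer weights 0 to 10 are our own choice, not the notes'. A passing check is no substitute for the proof above; it only makes the factor-2 guarantee concrete.

```python
import itertools
import random


def greedy_weight(edges):
    """Total weight of the greedy matching (edges are (u, v, w) triples)."""
    matched, total = set(), 0
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if u not in matched and v not in matched:
            matched.update((u, v))
            total += w
    return total


def optimal_weight(edges):
    """Maximum matching weight by exhaustive search (tiny graphs only)."""
    def search(i, used):
        if i == len(edges):
            return 0
        best = search(i + 1, used)                  # leave edge i out
        u, v, w = edges[i]
        if u not in used and v not in used:         # or take it, if it fits
            best = max(best, w + search(i + 1, used | {u, v}))
        return best
    return search(0, frozenset())


rng = random.Random(1)
for _ in range(200):
    # Random graph on 6 nodes: each pair is an edge with probability 1/2.
    edges = [(u, v, rng.randint(0, 10))
             for u, v in itertools.combinations(range(6), 2)
             if rng.random() < 0.5]
    assert 2 * greedy_weight(edges) >= optimal_weight(edges)
print("greedy was within a factor 2 on every instance")
```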
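An alternating path for a matching M is a path whose edges alternate between edges not in M and edges in M. An augmenting path is an alternating path that starts and ends at nodes unmatched by M.
Lemma 2. A matching M has maximum size if and only if there is no augmenting path for M.
Proof. For an augmenting path P, the set M′ = M △ P is a matching with one more edge than M.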
To see the opposite, let M* be a matching of larger size than M, and consider M* △ M. In this set every node has degree at most 2 (at most one edge from M and one from M*), so it consists of disjoint paths and cycles. Further, these paths and cycles alternate between edges of M and M*, so they are alternating paths and cycles for matching M. An alternating cycle has the same number of M and M* edges. Since M* is larger, some path P must have more M* edges than M edges. The only way a path can have more M* edges is if it starts and ends with an M* edge. Its end nodes are unmatched by M: if an end node were matched by M, its M edge would also lie in M* △ M and extend the path. So P is an augmenting path.
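The augmentation step M′ = M △ P from the proof maps directly onto Python's set symmetric difference. In this sketch a matching is a set of `frozenset` edges and a path is a list of nodes. Both representations and the helper names `covered`, `is_matching`, `is_augmenting_path` and `augment` are hypothetical choices made for illustration.

```python
def covered(matching):
    """Nodes matched in M."""
    return {v for edge in matching for v in edge}


def is_matching(edges):
    """True if every node has degree at most 1 in `edges`."""
    ends = [v for edge in edges for v in edge]
    return len(ends) == len(set(ends))


def is_augmenting_path(matching, path):
    """A simple path whose end-nodes are unmatched and whose edges
    alternate: not in M, in M, not in M, ..., not in M."""
    if len(path) < 2 or len(set(path)) != len(path):
        return False
    matched = covered(matching)
    if path[0] in matched or path[-1] in matched:
        return False
    edges = [frozenset(pair) for pair in zip(path, path[1:])]
    # Edge number i (counting from 0) must lie in M exactly when i is odd.
    return all((e in matching) == (i % 2 == 1) for i, e in enumerate(edges))


def augment(matching, path):
    """Return M' = M symmetric-difference P, which has one more edge than M."""
    if not is_augmenting_path(matching, path):
        raise ValueError("not an augmenting path for this matching")
    return matching ^ {frozenset(pair) for pair in zip(path, path[1:])}


M = {frozenset(("b", "c"))}
M2 = augment(M, ["a", "b", "c", "d"])
print(sorted(sorted(e) for e in M2))   # [['a', 'b'], ['c', 'd']]
```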
For later use, we also state a stronger version of this lemma.
Lemma 3. Let M be a matching such that there is a matching M* with k more edges (that is, |M*| ≥ |M| + k) for some k > 0. Then M has an augmenting path of length at most |V|/k − 1 (using at most |V|/k nodes).
Proof. Recall the proof above, which considers M* △ M. Each path or cycle in M* △ M has at most one more M* edge than M edges, and summed over all of them the difference is |M*| − |M| ≥ k. So the symmetric difference contains at least k paths with more M* edges than M edges. These are k node-disjoint augmenting paths, so one of them must have at most |V|/k nodes.
Maximum Matching Algorithm
By Lemma 2, we can find a maximum matching if we can find augmenting paths: augment repeatedly until no augmenting path is left. Finding augmenting paths is more involved in general graphs, but it is easier in bipartite graphs, where the following directed graph construction works well.
For a bipartite graph G = (V, E) and a matching M in G, we consider the following residual graph R_M with nodes V and two additional nodes s and t. We use A and B to denote the two sides of the bipartite graph, and add the following edges.
• For each unmatched node a ∈ A, add the directed edge (s, a) to R_M.
• For each unmatched node b ∈ B, add the directed edge (b, t) to R_M.
• For each edge (a, b) ∈ M, add the directed edge (b, a) to R_M.
• For each edge (a, b) ∉ M, add the directed edge (a, b) to R_M.
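The four rules above translate almost line by line into code. This is a sketch under our own conventions, not the notes':
- `A` and `B` are lists of node names.
- `edges` is a list of pairs `(a, b)` with `a` in `A`.
- The matching is a set of such pairs.
- The strings "s" and "t" are assumed not to be names of nodes of G.

```python
from collections import defaultdict

S, T = "s", "t"   # assumption: G has no nodes named "s" or "t"


def residual_graph(A, B, edges, matching):
    """Build R_M as a dict mapping each node to its list of out-neighbours."""
    matched = {v for edge in matching for v in edge}
    R = defaultdict(list)
    for a in A:
        if a not in matched:
            R[S].append(a)            # s -> a for unmatched a in A
    for b in B:
        if b not in matched:
            R[b].append(T)            # b -> t for unmatched b in B
    for a, b in edges:
        if (a, b) in matching:
            R[b].append(a)            # matching edges point from B to A
        else:
            R[a].append(b)            # other edges point from A to B
    return dict(R)


R = residual_graph(["a1", "a2"], ["b1", "b2"],
                   [("a1", "b1"), ("a1", "b2"), ("a2", "b1")], {("a1", "b1")})
print(R)   # {'s': ['a2'], 'b2': ['t'], 'b1': ['a1'], 'a1': ['b2'], 'a2': ['b1']}
```

In this small example, R_M contains the path s, a2, b1, a1, b2, t of length 5. It corresponds to the augmenting path a2, b1, a1, b2 of length 3 in G.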
So edges in the matching are directed from B to A, while edges not in the matching are directed from A to B. Any path in R_M must alternate between A and B. A path from s to t of length 3 is just a single edge (a, b) between two unmatched nodes (with s and t added). A path from s to t of length 5 consists of s, an augmenting path of length 3, and then node t. More generally, we get the following lemma.
Lemma 4. Paths in R_M from s to t are in one-to-one correspondence with augmenting paths in G for matching M.
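Using a simple DFS or BFS to find an s-t path in R_M, we get an O(|V||E|) maximum matching algorithm as follows. Start with M = ∅, and as long as R_M contains an s-t path, augment M along the corresponding augmenting path. Each search takes O(|E| + |V|) time, and since every augmentation adds an edge to M, there are at most |V|/2 of them.
The Hopcroft-Karp algorithm saves time by augmenting along many shortest paths at once. Its outer while loop runs as long as R_M contains an s-t path. In each iteration, the inner loop finds a maximal set P of shortest s-t paths in R_M that are disjoint (except for s and t). Then M is augmented along all the paths in P.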
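A sketch of the simple augmenting-path algorithm follows. It rests on two assumptions of ours: the bipartite graph is given as a dict `adj` mapping each node of A to its neighbours in B, and A and B use distinct node names. The function name `max_matching_simple` is hypothetical. Instead of materializing R_M, the BFS follows its edges implicitly:
- From an A-node it goes along non-matching edges.
- From a B-node it goes along the matching edge.
- Reaching an unmatched B-node plays the role of the edge into t.

```python
from collections import deque


def max_matching_simple(A, B, adj):
    """Maximum bipartite matching by repeated BFS for augmenting paths."""
    mate = {}                      # mate[v] = partner of v in the matching
    while True:
        # BFS in R_M starting from all unmatched a (the edges s -> a).
        parent = {a: None for a in A if a not in mate}
        queue = deque(parent)
        end = None                 # unmatched b reached (edge b -> t)
        while queue and end is None:
            a = queue.popleft()
            for b in adj.get(a, []):
                if b in parent or mate.get(a) == b:
                    continue       # already visited, or a matching edge
                parent[b] = a
                if b not in mate:
                    end = b
                    break
                a2 = mate[b]       # follow the matching edge b -> a2
                if a2 not in parent:
                    parent[a2] = b
                    queue.append(a2)
        if end is None:            # no s-t path: M is maximum (Lemma 2)
            return {(a, mate[a]) for a in A if a in mate}
        # Walk back from `end`, flipping edges along the path (M := M △ P).
        b = end
        while b is not None:
            a = parent[b]
            old = mate.get(a)      # a's previous partner, None at the start
            mate[a], mate[b] = b, a
            b = old
```

Each round costs one BFS, O(|V| + |E|), and adds one edge to the matching.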
First focus on the inner loop, which aims to find a maximal set of disjoint shortest paths in a graph efficiently. In the very first iteration, when M = ∅, every shortest s-t path in R_M has length 3 and corresponds to a single edge of G. So the inner loop is exactly the greedy algorithm discussed above. As we have no weights (or all weights are 1), any ordering of the edges will do. By Theorem 1, the matching we find has at least half the size of a maximum matching. It is not hard to see that this greedy algorithm (without sorting the edges) can be implemented in linear time, O(|E| + |V|).
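Next we show that the inner loop runs in linear time in every later iteration as well. Keep only the edges of R_M that go from a node at some distance i from s to a node at distance i + 1. Then every s-t path in the remaining graph is a shortest path, so the inner loop doesn't need to focus on finding short paths, only on finding disjoint paths. We suggest implementing this via depth-first search (DFS). DFS finds a path from s to t. After it has found some number of paths, we start again from s, looking for additional paths. We do not have to search again the edges that we searched previously, for two reasons:
• Nodes on a path already found cannot be reused.
• A node from which the search failed to reach t cannot reach t later either, as we only remove nodes from the graph.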
Lemma 6. In any directed graph G = (V, E) with nodes s and t, we can find in O(|E| + |V|) time a set P of paths from s to t in G that are disjoint (except for s and t), such that no further s-t path in G is disjoint from all of the paths in P.
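The DFS idea behind Lemma 6 can be sketched as follows. The sketch assumes the directed graph is given as a dict `adj` from each node to its list of out-neighbours; that representation and the name `disjoint_paths` are our own. Every node other than t is entered at most once and every edge is examined at most once, which gives the linear running time. After each path is found, the search restarts from s, and the nodes of that path stay marked, so later paths avoid them.

```python
def disjoint_paths(adj, s, t):
    """Find s-t paths that share no nodes except s and t, such that no
    further s-t path avoids all of them (a maximal set, as in Lemma 6).

    `adj` maps each node to a list of its out-neighbours.
    """
    visited = {s}          # t is never marked, so many paths may end there
    next_edge = {}         # index of the next unexamined out-edge per node
    paths = []
    stack = [s]            # the current DFS path from s
    while stack:
        v = stack[-1]
        i = next_edge.get(v, 0)
        out = adj.get(v, [])
        if i == len(out):
            stack.pop()    # all edges of v examined: v is a dead end
            continue
        next_edge[v] = i + 1
        w = out[i]
        if w == t:
            paths.append(stack + [t])
            stack = [s]    # restart from s; nodes on the path stay marked
        elif w not in visited:
            visited.add(w)
            stack.append(w)
    return paths


adj = {"s": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": ["t"], "d": ["t"]}
print(disjoint_paths(adj, "s", "t"))   # [['s', 'a', 'c', 't'], ['s', 'b', 'd', 't']]
```

The result is maximal, not necessarily maximum. With `{"s": ["a", "b"], "a": ["c", "d"], "b": ["c"], "c": ["t"], "d": ["t"]}`, the sketch returns only s, a, c, t, although two disjoint paths exist. Lemma 6 only asks for maximality, and Lemma 7 shows that this is enough.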
This implements the inner while loop in linear time. Next we have to bound the number of iterations of the outer while loop. This comes in two steps. First we show that the length of the shortest s-t path increases with every iteration.
Lemma 7. Let M be a matching in a bipartite graph G = (V, E), let d be the length of the shortest s-t path in R_M, and let P be a maximal set of disjoint shortest s-t paths in R_M. Now consider the matching M′ obtained by augmenting M along all the augmenting paths in P. The shortest s-t path in R_M′ has length at least d + 2.
Proof. In the directed graph R_M, let d(v) denote the distance from s to v, so d(t) = d. Note that any shortest path only uses edges that go from a node v at some distance d(v) to a node at distance d(v) + 1. Now consider how R_M and R_M′ differ. To get R_M′, we delete some edges from R_M: the edges leaving s and the edges entering t that lie on the paths in P. Every other edge along the paths in P changes its direction, as it switches between being in the matching and outside the matching. If an edge along a path went from a node v at distance d(v) to a node w at distance d(v) + 1, the reversed edge goes from distance d(v) + 1 to d(v) (according to the distances in R_M). So no edges are added that can make distances shorter, and any s-t path in R_M′ that uses a reversed edge has length at least d + 2.
An s-t path of length d in R_M′ would therefore use only edges of R_M, so it would be a shortest path in R_M. It would also avoid all nodes of the paths in P, since entering an A-node or leaving a B-node of these paths requires a reversed edge. Since P is a maximal set of disjoint shortest paths, no such path exists. This shows that the new distance d′ > d. To see that it is at least 2 bigger, note that because the graph is bipartite, every s-t path in either residual graph has odd length.
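To show that the number of iterations is low, we put together Lemmas 7 and 3. By Lemma 7, it takes at most √|V|/2 iterations of the outer loop to reach a matching M where the shortest s-t path in R_M has length at least √|V|. We claim that at this point the size of the matching is almost optimal: more concretely, |M*| − |M| = O(√|V|) for a maximum matching M*. To see why, assume that |M*| − |M| = k. By Lemma 3, the shortest augmenting path now has length at most |V|/k − 1, so the shortest s-t path in R_M has length d ≤ |V|/k + 1. So d ≥ √|V| implies that k ≤ |V|/(√|V| − 1), which is roughly √|V|. Each iteration of the outer loop adds at least one edge to the matching, so at most that many further iterations are needed.
In total, the outer loop runs O(√|V|) times, and each iteration takes O(|E| + |V|) time. This gives our main result: the Hopcroft-Karp algorithm finds a maximum matching in a bipartite graph in O(√|V| (|E| + |V|)) time.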
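Putting the pieces together gives the following sketch of the Hopcroft-Karp algorithm. It uses the same assumptions as the earlier sketches: `adj` maps each node of A to its neighbours in B, and A and B use distinct names. The function name `hopcroft_karp` is hypothetical. Each outer iteration runs a BFS to compute the distances d(a) in R_M for the A-nodes and the length d of a shortest s-t path. It then runs a DFS that only follows edges going one layer deeper. As a common implementation choice, it augments along each path as soon as the path is found instead of collecting P first. The function also returns the values of d it saw, so one can observe Lemma 7: they are odd and grow by at least 2 per iteration. The DFS is recursive, so very long augmenting paths would require raising Python's recursion limit.

```python
from collections import deque


def hopcroft_karp(A, B, adj):
    """Maximum bipartite matching (sketch).

    Returns (matching, lengths): the matching as a set of (a, b) pairs, and
    the length d of a shortest s-t path in R_M at the start of each outer
    iteration.
    """
    mate = {}       # mate[v] = partner of v in the current matching M
    lengths = []
    while True:
        # BFS from s in R_M; dist[a] is the distance d(a) of an A-node.
        free = [a for a in A if a not in mate]
        dist = {a: 1 for a in free}               # edges s -> a
        queue = deque(free)
        d = None                                  # shortest s-t length
        while queue:
            a = queue.popleft()
            for b in adj.get(a, []):
                if mate.get(a) == b:
                    continue                      # matched edges point B -> A
                if b not in mate:
                    if d is None:
                        d = dist[a] + 2           # s ... a -> b -> t
                elif mate[b] not in dist:
                    dist[mate[b]] = dist[a] + 2   # a -> b -> mate[b]
                    queue.append(mate[b])
        if d is None:
            break                                 # no augmenting path left
        lengths.append(d)

        visited = set()

        def dfs(a):
            """Look for a layered path from A-node a to t; flip it on success."""
            visited.add(a)
            for b in adj.get(a, []):
                if mate.get(a) == b:
                    continue
                if b not in mate:
                    found = dist[a] + 2 == d      # b -> t ends a shortest path
                else:
                    nxt = mate[b]
                    found = (nxt not in visited
                             and dist.get(nxt) == dist[a] + 2
                             and dfs(nxt))
                if found:
                    mate[a], mate[b] = b, a       # (a, b) joins the matching
                    return True
            return False

        for a in free:          # inner loop: disjoint shortest paths
            if a not in visited:
                dfs(a)
    return {(a, mate[a]) for a in A if a in mate}, lengths


M, L = hopcroft_karp(["a1", "a2"], ["b1", "b2"], {"a1": ["b1", "b2"], "a2": ["b1"]})
print(sorted(M), L)   # [('a1', 'b2'), ('a2', 'b1')] [3, 5]
```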