Greedy
Introduction
Analysis of algorithms
In this lecture we begin the actual “analysis of algorithms” by examining greedy algorithms, which
are considered among the easiest algorithms to describe and implement. Before doing so, we describe
the different aspects of an algorithm that require analysis.
Correctness It must be established that the algorithm performs as advertised. In other words, for
each legal input, called a problem instance, the algorithm produces the desired output.
Complexity Bounds must be provided on either the amount of time required to execute the algo-
rithm, or the amount of space (e.g., memory) that is consumed by the algorithm as a function
of the size of an input instance.
Implementation Appropriate data structures must be identified that allow for the algorithm to
achieve a desired time or space complexity.
The correctness of an algorithm is sometimes immediate from its description, while the correctness of
others may require clever mathematical proofs.
As for complexity, in this course we are primarily concerned with the big-O growth of the worst-case
running time of an algorithm. Of course, the running time T will be a function of the size of an
input to the algorithm, and will have a big-O growth that is directly proportional to the number of
algorithm steps that are required to process the given input. Here we define the size of a problem
instance as the minimum number of bits needed to represent the instance. The convention is to
represent the size of an instance using one or more size parameters that increase in proportion to
the problem size. For example, if the size of an input is represented using the single size parameter
n, then T (n) will represent the worst-case running time of the algorithm. By “worst case” we mean
that T (n) represents the running time of the size-n input that requires the most time to run.
Example 1. Consider an algorithm that sorts an array of integers. Suppose each integer can be
represented using k bits. Then the size of a problem instance is equal to k times the number of
integers to be sorted. Let n be a parameter that represents the number of integers in the array. Then
the problem size is equal to nk, and the running time function can be written as T (n, k). Now, if
the algorithm is independent of the number of bits used to represent each integer, then we may drop
the k and simply let n denote the problem size. In this case T is written as T (n). For example, the
Quicksort algorithm makes use of the swap operation, which is independent of the number of bits
representing each integer (assuming that each integer is stored in a single register or memory word).
Thus T (n) seems more appropriate than T (n, k). On the other hand, Radix sort works directly on the bits
of each integer. In this case T (n, k) seems more appropriate.
Example 2. Recall that a graph is a pair of sets G = (V, E), where V is the vertex set, and E is
the edge set whose elements have the form (u, v), where u, v ∈ V . A graph algorithm takes as
input a graph, and computes some property of the graph. In this case the worst-case running time is
expressed as T (m, n), using size parameters n = |V |, called the order of G, and m = |E|, called the
size of G. Notice that the size parameters for representing the size of a single vertex or edge are not
included, since the algorithm steps are usually independent of the nature of each vertex and edge.
In other words, regardless of whether each vertex is an integer, or an Employee data structure, the
algorithm steps, and hence the big-O growth of the running time, remain the same.
Example 3. Consider an algorithm that takes as input a positive integer n, and determines whether
or not n is prime. Since a positive integer n can be represented using ⌊log n⌋ + 1 bits, we use size
parameter log n to represent the problem size. Thus, the worst-case running time is written as
T (log n).
Note that we use the growth terminology from the big-O lecture to describe the running time of an
algorithm. For example, if an algorithm has running time T (n) = O(n), then the algorithm is said
to have a linear running time.
Another time complexity measure of interest is the average-case running time Tave , which is
obtained by taking the average of the running times for inputs of a given size.
Finally, the running time of an algorithm is dependent on its implementation. For example, a graph
algorithm may have running time T (m, n) = O(m2 ) using one implementation, and T (m, n) =
O(m log n) using another. For this reason complexity analysis is inseparable from implementation
analysis, and it often requires the use of both basic and advanced data structures for achieving a
desired running time. On the other hand, correctness analysis is usually independent of implemen-
tation.
Greedy Algorithms
A greedy algorithm is characterized by the following two properties:
1. the algorithm works in stages, and during each stage a choice is made which is locally optimal;
2. the sum totality of all the locally optimal choices produces a globally optimal solution.
Note that if a greedy procedure does not always lead to a globally optimal solution, then we will refer
to it as a heuristic, or a greedy heuristic. Heuristics often provide a “short cut” to a solution,
but not necessarily to an optimal solution. Henceforth, we will use the term “algorithm” for a method
that always produces a correct/optimal solution, and “heuristic” to describe a procedure that may
not always produce the correct or optimal solution.
The following are some problems that can be solved using a greedy algorithm. Their algorithms
can be found in this lecture, and in the exercises.
Minimum Spanning Tree finding a spanning tree for a graph whose weight edges sum to a mini-
mum value
Fractional Knapsack selecting a subset of items to load in a container in order to maximize profit
Task Selection finding a maximum set of non-overlapping tasks (each with a fixed start and finish
time) that can be completed by a single processor
Huffman Coding finding a code for a set of items that minimizes the expected code-length
Unit Task Scheduling with Deadlines finding a completion schedule of unit-length tasks on a
single processor that maximizes the total earned profit, where each task has a deadline and earns its profit only if it completes by that deadline
Single source distances in a graph finding the distance from a source vertex in a graph to every
other vertex in the graph
Like all families of algorithms, greedy algorithms tend to follow a similar analysis pattern.
Greedy Correctness Correctness is usually proved through some form of induction. For example,
assume there is an optimal solution that agrees with the first k choices of the algorithm. Then
show that there is an optimal solution that agrees with the first k + 1 choices.
Greedy Complexity The running time of a greedy algorithm is determined by the ease in main-
taining an ordering of the candidate choices in each round. This is usually accomplished via a
static or dynamic sorting of the candidate choices.
Greedy Implementation Greedy algorithms are usually implemented with the help of a static
sorting algorithm, such as Quicksort, or with a dynamic sorting structure, such as a heap.
Additional data structures may be needed to efficiently update the candidate choices during
each round.
Minimum Spanning Tree Algorithms
Let G = (V, E) be a simple connected graph. Then a spanning tree T = (V, E′) of G is a subgraph
of G which is also a tree. Notice that T must include all the vertices of G. Thus, a spanning tree of G
represents a minimal set of edges that are needed by G in order to maintain connectivity. Moreover,
if G is weighted, then a minimum spanning tree (mst) of G is a spanning tree whose edge weights
sum to a minimum value.
Example 4. Consider a problem in which roads are to be built that connect all four cities a, b, c,
and d to one another. In other words, after the roads are built, it will be possible to drive from any
one city to another. The cost (in millions) of building a road between any two cities is provided in the following table.

cities   a    b    c    d
a        0   30   20   50
b       30    0   50   10
c       20   50    0   75
d       50   10   75    0
Using this table, find a set of roads of minimum cost that will connect the cities.
In this section we present two greedy algorithms for finding an MST in a simple weighted connected
graph G = (V, E).
Kruskal’s Algorithm
Kruskal’s algorithm builds a minimum spanning tree in greedy stages. Assume that V = {v1 , . . . , vn },
for some n ≥ 1. Define forest F that has n trees T1 , . . . , Tn , where Ti consists of the single vertex
vi . Sort the edges of G in order of increasing weight. Now, following this sorted order, for each edge
e = (u, v), if u and v are in the same tree T , then continue to the next edge, since adding e will create
a cycle in T . Otherwise, letting Tu and Tv be the respective trees to which u and v belong,
replace Tu and Tv in F with the single tree Tu+v that consists of the union of trees Tu and Tv , along
with edge e. In other words,

Tu+v = Tu ∪ Tv + e,

and

F ← F − Tu − Tv + Tu+v .
The algorithm terminates when F consists of a single (minimum spanning) tree.
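To make the stages concrete, below is a minimal Python sketch of this procedure, under the assumption that the weighted edges are provided as (u, v, w) triples; the tree-membership bookkeeping anticipates the id-array implementation described in the proof of Theorem 3 below.

    def kruskal(vertices, edges):
        # Sketch of Kruskal's algorithm; edges is a list of (u, v, w) triples.
        tree_id = {v: i for i, v in enumerate(vertices)}    # tree containing each vertex
        members = {i: [v] for i, v in enumerate(vertices)}  # vertices of each tree
        mst = []
        for u, v, w in sorted(edges, key=lambda e: e[2]):   # increasing weight
            iu, iv = tree_id[u], tree_id[v]
            if iu == iv:
                continue          # u and v are in the same tree: e would create a cycle
            mst.append((u, v, w))
            if len(members[iu]) < len(members[iv]):         # merge smaller into larger
                iu, iv = iv, iu
            for x in members[iv]:
                tree_id[x] = iu
            members[iu].extend(members.pop(iv))
        return mst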
Example 5. Use Kruskal’s algorithm to find an mst for the graph G = (V, E), where the weighted
edges are given by
E = {(a, b, 1), (a, c, 3), (b, c, 3), (c, d, 6), (b, e, 4), (c, e, 5), (d, f, 4), (d, g, 4),
Theorem 2. When Kruskal’s algorithm terminates, then F consists of a single minimum spanning
tree.
Proof of Theorem 2. Let T = (V, E′) be the tree returned by Kruskal’s algorithm on input
G = (V, E), and let e1 , e2 , . . . , en−1 be the edges of E 0 in the order that they were added. Let Topt
be an mst of G that contains edges e1 , . . . , ek−1 , but does not contain ek , for some 1 ≤ k ≤ n − 1.
We show how to transform Topt into an MST Topt2 that contains e1 , . . . , ek .
Consider the result of adding ek to Topt to yield the new graph Tc . Then, since Tc is connected and
has n edges, Tc has a cycle C containing ek .
Claim. There must be some edge e in C that i) is different from each of e1 , . . . , ek , and ii) occurs
after ek in the sorted list of edges created by Kruskal’s algorithm. Hence, we ≥ wek .
Proof of Claim. First, i) must be true, since otherwise cycle C would consist only of edges from
e1 , . . . , ek , which contradicts the fact that these edges all belong to (acyclic) tree T . Second, suppose
ii) were false. Then every edge e in C − ek is positioned before ek in the sorted Kruskal list. But
since C − ek is acyclic (as it is a subgraph of Topt ), it follows that every edge in C − ek belongs to
T , since the algorithm would accept each such edge e, as e does not create a cycle with the already-accepted edges. In other words, C − ek = {e1 , . . . , ek−1 }. But this implies C is a cycle of T , which
is a contradiction. Thus, there must be some edge e ∈ C that occurs after ek in the edge-sorted list,
which implies we ≥ wek .
Finally, let Topt2 = Topt − e + ek . Then i) Topt2 is connected, since any path in Topt that traverses
e can now instead traverse C − e; and ii) Topt2 is acyclic since it is connected and has n − 1 edges
(see exercises). Therefore, Topt2 is an mst that agrees with T up to the first k edges e1 , . . . , ek that
are added by the algorithm. Continuing the above argument for increasing values of k, eventually
we will arrive at an mst that agrees with T in all of its edges, implying that T is an mst.
Theorem 3. Kruskal’s algorithm can be implemented so as to have worst-case running time Θ(m log m), where m = |E|.
Proof of Theorem 3. Given connected simple graph G = (V, E), sort the edges of E by increasing
order of weight using Mergesort. This requires Θ(m log m) steps. Next, forest F is initialized with
trees T1 , . . . , Tn , where Ti consists of the single vertex vi , i = 1, . . . , n. Create an integer array of size
n called id, where id[i] gives the id of the tree that contains vi . Thus, id[i] is initialized with i, for
all i = 1, . . . , n. Finally, use a tree data structure to create membership trees M1 , . . . , Mn , where Mi
keeps track of the vertices in tree Ti , for all i = 1, . . . , n. The Tree data structure used to create an
M tree consists of a root attribute that references a vertex of V , along with a children attribute
that lists all the child Trees of root. It should be emphasized that the structure of Mi bears no
relation to that of Ti . The sole purpose of Mi is to store all of the vertices of Ti into a structure.
Now for each edge e = (vi , vj ) in the ordering, if id[i] = id[j], then vi and vj are already in the same
tree. So e is skipped. Otherwise, add e to the list of mst edges. Furthermore, assume without loss of
generality that the size of Tj does not exceed that of Ti . Then traverse Mj and, for each vk ∈ Mj ,
set id[k] = i. In other words, every vertex in Tj is now a member of Ti . Finally, add Mj to the list
of children of Tree Mi .
Thus, in addition to sorting and iterating through the edges, the only significant additional work is
to maintain the id array, which requires that one of the M trees be traversed each time a tree merger
occurs. Now, if we initialize a variable count to zero and increment it each time a value of id is updated, then count bounds, up to a constant factor, the total number of steps required to maintain the id array.
Claim. Upon termination of the algorithm, count ≤ n log n.
Proof of Claim. We use mathematical induction on n. For the basis step, if n = 1, then no tree
mergers are performed, and count = 0 = 1 log 1.
Now consider n ≥ 2, and assume that, when the implementation is performed on any graph of order
k < n, the count variable never exceeds k log k. Now let G be a connected simple graph of
order n and, when performing the implementation on G, consider the situation where there are two
trees, say T1 and T2 , remaining. Let counti , i = 1, 2, denote the number of times a value of id was
updated for a vertex that belongs in Ti . Then count = count1 + count2 , and, letting x denote the
size of T1 , count1 ≤ x log x, and count2 ≤ (n − x) log(n − x). Without loss of generality, assume
x ≤ n/2. Then the final value of count is bounded by
x log x + (n − x) log(n − x) + x,
since the ids of vertices in T1 will be updated one final time, and contribute an additional count of
x. Hence, the problem reduces to showing that

x log x + (n − x) log(n − x) + x ≤ n log n

whenever 1 ≤ x ≤ n/2. Now let f (x) = x log x + (n − x) log(n − x) + x be a real-valued function over
the interval [1, n/2]. Then

f ′(x) = 1/ ln 2 + log x − 1/ ln 2 − log(n − x) + 1 = log(x/(n − x)) + 1.
This function has a critical point at x = n/3 which is a local minimum, since f 00 (n/3) > 0. Thus, f is
maximized at either x = 1 or x = n/2. It is an exercise to show that f (1) = (n − 1) log(n − 1) + 1 ≤
n log n for all n ≥ 2, while f (n/2) = n log n − n/2 ≤ n log n. Therefore, count ≤ n log n.
Returning to the proof of Theorem 3, we see that the running time T (m, n) = Θ(m log m) + O(n log n).
But since n − 1 ≤ m, we have T (m, n) = Θ(m log m).
Example 6. Show the M trees during each stage of Kruskal’s algorithm applied to the Example-5
problem instance.
Prim’s Algorithm
Prim’s algorithm builds a single tree in stages, where a single edge/vertex is added to the current
tree at each stage. Given connected and weighted simple graph G = (V, E), the algorithm starts by
initializing a tree T1 = ({v}, ∅), where v ∈ V is an arbitrarily chosen vertex that is used to start the tree.
Now suppose tree Ti having i vertices has been constructed, for some 1 ≤ i ≤ n. If i = n, then the
algorithm terminates, and Tn is the desired spanning tree. Otherwise, let Ti+1 be the result of adding
to Ti a single edge/vertex e = (u, w) that satisfies the following.
1. u ∈ Ti and w ∉ Ti .
2. The weight of e is a minimum among all edges satisfying the first condition.
Theorem 4. Prim’s algorithm returns a minimum spanning tree for input G = (V, E).
The proof of correctness of Prim’s algorithm is very similar to that of Kruskal’s algorithm, and is
left as an exercise. Like all exercises in these lectures, the reader should make an honest attempt to
construct a proof before viewing the one provided in the solutions.
Prim’s algorithm can be efficiently implemented with the help of a binary min-heap. The first step
is to build a binary min-heap whose elements are the n vertices. A vertex is in the heap iff it has
yet to be added to the tree under construction. Moreover, the priority of a vertex v in the heap is
defined as the least weight of any edge e = (u, v), where u is a vertex in the tree. In this case, u is
called the parent of v, and is denoted as p(v). The current parent of each vertex can be stored in
an array. Since the tree is initially empty, the priority of each vertex is initialized to ∞ and the parent
of each vertex is undefined.
Now repeat the following until the heap is empty. Pop the heap to obtain the vertex u that has a
minimum priority. Add u to the tree. Moreover, if p(u) is defined, then add edge (p(u), u) to the
tree. Finally, for each vertex v still in the heap for which e = (u, v) is an edge of G, if we is less than
the current priority of v, then set the priority of v to we and set p(v) to u.
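The following Python sketch follows this outline, with one substitution: since the standard heapq module does not support reducing a priority in place, each reduction pushes a new heap entry, and stale entries are discarded when popped (a lazy-deletion stand-in for the decrease-key operation described above). The adjacency-map input format is an assumption.

    import heapq

    def prim(adj, start):
        # Sketch of Prim's algorithm; adj maps each vertex to a dict of
        # neighbor -> edge weight. Returns the list of mst edges.
        priority = {v: float('inf') for v in adj}  # least edge weight into the tree
        parent = {v: None for v in adj}
        priority[start] = 0
        heap = [(0, start)]
        in_tree = set()
        mst = []
        while heap:
            p, u = heapq.heappop(heap)
            if u in in_tree or p > priority[u]:
                continue                           # stale entry: discard
            in_tree.add(u)
            if parent[u] is not None:
                mst.append((parent[u], u))
            for v, w in adj[u].items():            # attempt priority reductions
                if v not in in_tree and w < priority[v]:
                    priority[v] = w
                    parent[v] = u
                    heapq.heappush(heap, (w, v))
        return mst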
The running time of the above implementation is determined by the following facts about binary
heaps.
1. The heap can be built using O(n) steps.
2. Each pop operation requires O(log n) steps, and n pops are performed in total.
3. When the priority of a vertex is reduced, the heap can be adjusted in O(log n) steps.
4. The number of vertex-priority reductions is bounded by the number m = |E|, since each
reduction is caused by an edge, and each edge e = (u, v) can contribute to at most one reduction
(namely, that of v’s priority) when u is popped from the heap.
Putting the above facts together, we see that Prim’s algorithm has a running time of O(n + n log n +
m log n) = O(m log n).
Example 8. Demonstrate the heap implementation of Prim’s algorithm with the graph from Ex-
ample 4.
Dijkstra’s Algorithm
Let G = (V, E) be a weighted graph whose edge weights are all nonnegative. Then the cost of a
path P in G, denoted cost(P ), is defined as the sum of the weights of all edges in P . Moreover, given
u, v ∈ V , the distance from u to v in G, denoted d(u, v), is defined as the minimum cost of a path
from u to v. In case there is no path from u to v in G, then d(u, v) = ∞.
Dijkstra’s algorithm is used to find the distances from a single source vertex s ∈ V to every other
vertex in V . The description of the algorithm is almost identical to that of Prim’s algorithm. Like
Prim’s algorithm, the algorithm builds a single tree in stages, where a single edge/vertex is added
to the current tree at each stage. Given weighted (and possibly directed) graph G = (V, E), the
algorithm starts by initializing a tree T1 = ({s}, ∅), where s ∈ V is the source vertex from which the
distances are to be calculated.
Now suppose tree Ti having i vertices has been constructed, for some 1 ≤ i ≤ n = |V |. If i = n
or there is no edge that is incident with both a vertex in Ti and one not in Ti , then the algorithm
terminates, and Ti is the desired shortest-path tree. Otherwise, let Ti+1 be the result of adding to Ti a
single edge/vertex e = (u, v) that satisfies the following.
1. u ∈ Ti and v ∉ Ti .
2. d(s, u) + we is a minimum among all edges satisfying the first condition.
Example 7. Demonstrate Dijkstra’s algorithm on the directed weighted graph with the following
edges.
(a, b, 3), (a, c, 1), (a, e, 7), (a, f, 6), (b, f, 4), (b, g, 3), (c, b, 1), (c, e, 7), (c, d, 5), (c, g, 10), (d, g, 1),
The heap implementation of Prim’s algorithm can also be used for Dijkstra’s algorithm, except now
the priority of a vertex v is the minimum of d(s, u) + we , where e = (u, v) is an edge that is incident
with a vertex u in the tree. Also, the priority of s is initialized to zero.
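Concretely, adapting the earlier Prim sketch (again with heapq and lazy deletions, and an assumed adjacency-map input) gives the following.

    import heapq

    def dijkstra(adj, s):
        # Sketch of Dijkstra's algorithm; returns d(s, v) for every vertex v,
        # with unreachable vertices mapped to infinity.
        priority = {v: float('inf') for v in adj}
        priority[s] = 0
        dist = {}
        heap = [(0, s)]
        while heap:
            p, u = heapq.heappop(heap)
            if u in dist:
                continue                    # stale entry: u was already finalized
            dist[u] = p                     # priority of u when popped equals d(s, u)
            for v, w in adj[u].items():     # reduce priorities along edges leaving u
                if v not in dist and p + w < priority[v]:
                    priority[v] = p + w
                    heapq.heappush(heap, (p + w, v))
        return {v: dist.get(v, float('inf')) for v in adj}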
Although we are able to copy the implementation of Prim’s algorithm, and apply it to Dijkstra’s
algorithm, we cannot do the same with Prim’s correctness proof, since finding an mst is inherently
different from finding distances in a graph.
Theorem 5. Dijkstra’s algorithm returns a tree T which has the following property. For each vertex
v ∈ T , the cost of the unique path from root s to v in T is i) equal to the priority of v when v is
popped from the heap, and ii) is equal to d(s, v). Moreover, for any vertex v not in the tree, it is the
case that d(s, v) = ∞.
Proof of Theorem 5. We use induction to prove that, for each vertex v added to T , the cost of the
unique path from root s to v in T is i) equal to the priority of v when v is popped from the heap, and
ii) is equal to d(s, v). We do this by considering the order in which the vertices are added. We let Ti
denote the tree that results after the first i vertices have been popped from the heap and added to
the tree.
We first show that the statement is true for the first added vertex. Since this vertex is s, and the
priority of s is initialized to zero, we have d(s, s) = 0 which equals the priority of s when it is popped
from the heap. Moreover, 0 is the cost of the unique path from s to s in T1 .
Now assume that the statement is true for the first k − 1 vertices that are popped from the heap, for
some k ≥ 2. Consider the k th vertex v that is popped from the heap and whose priority is finite. By
definition, the priority of v is equal to the minimum of d(s, u) + we , for any edge e = (u, v) for which
u is a vertex in Tk−1 . Moreover, since, by the inductive assumption, d(s, u) equals the cost of the
unique path P from s to u in Tk−1 , and, after v and e are added to the tree to form Tk , the unique
path from s to v is P2 = P, v, we see that the priority of v is equal to the cost of the unique path
from s to v in Tk . It remains to show that this cost is equal to d(s, v). Suppose otherwise. Assume
d(s, v) < d(s, u) + we . Then there is a path Popt from s to v whose cost is less than d(s, u) + we .
Moreover, since s ∈ Tk−1 and v 6∈ Tk−1 , Popt has a unique edge e2 = (u2 , v2 ) for which u2 ∈ Tk−1 and
v2 6∈ Tk−1 . Thus, by the inductive assumption, the cost of Popt is at least d(s, u2 ) + we2 . But since
v2 is in the heap (before v was popped), its priority is no greater than d(s, u2 ) + we2 . Finally, since v
(and not v2 ) was the k th vertex popped from the heap, the priority of v is no greater than that of
the priority of v2 . Putting this all together, we have

cost(Popt ) ≥ d(s, u2 ) + we2 ≥ priority of v2 ≥ priority of v = d(s, u) + we ,

which contradicts the assumption d(s, v) < d(s, u) + we . Therefore, d(s, v) = d(s, u) + we .
Finally, if v is not in the final tree, then v must have had an infinite priority when the algorithm
terminated. But by definition of priority, this means that there is no edge that is incident with some
vertex of the final tree and connects to v. Therefore, there is no path from s to v.
Exercises
1. A positive integer is perfect iff it is the sum of its proper divisors. For example, 6 is
perfect since 6 = 3 + 2 + 1. If an algorithm takes as input a positive integer n and returns true
iff n is perfect, then provide an appropriate parameter for representing the size of the problem
input.
2. If the algorithm from the previous exercise requires O(n^2 ) steps to execute, then use the big-O growth
terminology to describe the algorithm’s running time.
3. Suppose an algorithm takes as input a permutation of the integers 1, . . . , n, together with a positive
integer t. Provide appropriate parameters for representing the size of the problem input.
4. If the algorithm from the previous exercise requires n log t steps to execute, then use the big-O
growth terminology to describe the algorithm’s running time.
5. Prove that a tree (i.e. a connected, undirected, and acyclic graph) of order two or more must always have a
degree-one vertex. Hint: use a proof by contradiction.
6. Prove that a tree of order n has exactly n−1 edges. Hint: use the previous exercise and induction.
7. Prove that if a graph of order n is connected and has n − 1 edges, then it must be acyclic (and
hence is a tree). Hint: argue that the graph must have a degree-1 vertex, and use induction.
8. Draw the weighted graph whose vertices are a-e, and whose edge weights are given by
{(a, b, 2), (a, c, 6), (a, e, 5), (a, d, 1), (b, c, 9), (b, d, 3), (b, e, 7), (c, d, 5),
11. Repeat Exercise 8 using Prim’s algorithm. Assume that vertex e is the first vertex added to
the mst. Annotate each edge with the order in which it is added to the mst.
12. For the previous exercise, show the state of the binary heap just before the next vertex is
popped. Label each node with the vertex it represents and its priority. Let the initial heap have
e as its root.
13. Do Prim’s and Kruskal’s algorithms work if negative weights are allowed? Explain.
14. Explain how Prim’s and/or Kruskal’s algorithm can be modified to find a maximum spanning
tree.
15. Draw the weighted directed graph whose vertices are a-g, and whose edge weights are given
by
{(a, b, 2), (b, g, 1), (g, e, 1), (b, e, 3), (b, c, 2), (a, c, 5), (c, e, 2), (c, d, 7), (e, d, 3),
(e, f, 8), (d, f, 1)}.
Perform Dijkstra’s algorithm to determine the reachability tree that is rooted at a. Annotate
each tree edge with the order that it was added to the tree. Annotate each vertex with its
current priority and parent.
16. Let G be a graph with vertices 0, 1, . . . , n−1, and let parent be an array, where parent[i] denotes
the parent of i for some shortest path from vertex 0 to vertex i. Assume parent[0] = −1,
meaning that 0 has no parent. Provide a recursive implementation of the function
print_optimal_path(i, parent) that prints from left to right the optimal path from vertex 0 to vertex i. You may assume
access to a print() function that is able to print strings, integers, characters, etc. For example:
print i
print "Hello"
print ','
17. The Fuel Reloading Problem is the problem of traveling in a vehicle from one point to
another, with the goal of minimizing the number of times needed to re-fuel. It is assumed that
travel starts at point 0 (the origin) of a number line, and proceeds right to some final point
F > 0. The input includes F , a list of stations 0 < s1 < s2 < · · · < sn < F , and a distance d
that the vehicle can travel on a full tank of fuel before having to re-fuel. Consider the greedy
algorithm which first checks if F is within d units of the current location (either the start or
the current station where the vehicle has just re-fueled). If F is within d units of this location,
then no more stations are needed. Otherwise it chooses the next station on the trip as the
furthest one that is within d units of the current location. Apply this algorithm to the problem
instance F = 25, d = 6, and
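The following Python sketch realizes this greedy choice, assuming the stations are provided as a sorted list.

    def fuel_stops(F, d, stations):
        # Sketch of the fuel-reloading greedy; stations is the sorted list
        # s1 < s2 < ... < sn. Returns the chosen stops, or None if F is unreachable.
        location, stops = 0, []
        while F - location > d:               # F cannot be reached on the current tank
            reachable = [s for s in stations if location < s <= location + d]
            if not reachable:
                return None                   # no station within d units: stuck
            location = max(reachable)         # furthest station within d units
            stops.append(location)
        return stops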
18. Given a finite set T of tasks, where each task t is endowed with a start time s(t) and finish
time f (t), the goal is to find a subset Topt of T of maximum size whose tasks are pairwise non-
overlapping, meaning that no two tasks in Topt share a common time in which both are being
executed. This way a single processor can complete each task in Topt without any conflicts.
Consider the following greedy algorithm, called the Task Selection Algorithm (TSA), for
finding Topt . Assume all tasks start at or after time 0. Initialize Topt to the empty set, and
initialize variable last finish to 0. Repeat the following step. If no task in T has a start time
equal to or exceeding last finish, then terminate the algorithm and return Topt . Otherwise
add to Topt the task t ∈ T for which s(t) ≥ last finish and whose finish time f (t) is a
minimum amongst all such tasks. Set last finish to f (t).
Apply TSA to the following set of tasks.
Task ID Start time Finish Time
1 2 4
2 1 4
3 2 7
4 4 8
5 4 9
6 6 8
7 5 10
8 7 9
9 7 10
10 8 11
19. Describe an efficient implementation of the Task Selection algorithm, and provide the algorithm’s
running time under this implementation.
20. Consider the following alternative greedy procedure for finding a maximum set of non-overlapping
tasks for the Task Selection problem. Sort the tasks in order of increasing duration. Initialize
S = ∅ to be the set of selected non-overlapping tasks. At each round, consider the task t of
least duration that has yet to be considered in a previous round. If t does not overlap with
any activity in S, then add t to S. Otherwise, continue to the next task. Prove or disprove
that this procedure will always return a set (namely S) that consists of a maximum set of
non-overlapping tasks.
21. In one or more paragraphs, describe how to efficiently implement the procedure described in
the previous exercise. Provide the worst-case running time for your implementation.
22. The Fractional Knapsack problem takes as input a set of goods G that are to be loaded into a
container (i.e. knapsack). When good g is loaded into the knapsack, it contributes a weight of
w(g) and induces a profit of p(g). However, it is possible to place only a fraction α of a good
into the knapsack. In doing so, the good contributes a weight of αw(g), and induces a profit
of αp(g). Assuming the knapsack has a weight capacity M ≥ 0, determine the fraction f (g) of
each good that should be loaded onto the knapsack in order to maximize the total container
profit.
The Fractional Knapsack greedy algorithm (FKA) solves this problem by computing the profit
density d(g) = p(g)/w(g) for each good g ∈ G. Thus, d(g) represents the profit per unit weight
of g. FKA then sorts the goods in decreasing order of profit density, and initializes variable RC
to M , and variable TP to 0. Here, RC stands for “remaining capacity”, while TP stands for “total
profit”. Then for each good g in the ordering, if w(g) ≤ RC, then the entirety of g is placed
into the knapsack, RC is decremented by w(g), and TP is incremented by p(g). Otherwise, let
α = RC/w(g). Then αw(g) = RC weight units of g is added to the knapsack, TP is incremented
by αp(g), and the algorithm terminates.
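A minimal Python sketch of FKA follows, assuming each good is given as a (weight, profit) pair. On the problem instance below it places goods 4, 1, and 5 in full and two fifths of good 2, earning a total profit of 144.

    def fractional_knapsack(goods, M):
        # Sketch of FKA; goods is a list of (weight, profit) pairs, M the capacity.
        # Returns the total profit TP and the fraction placed of each good.
        order = sorted(range(len(goods)),
                       key=lambda g: goods[g][1] / goods[g][0], reverse=True)
        RC, TP = M, 0.0                       # remaining capacity, total profit
        placed = [0.0] * len(goods)
        for g in order:                       # decreasing profit density d(g)
            w, p = goods[g]
            if w <= RC:                       # the entire good fits
                placed[g], RC, TP = 1.0, RC - w, TP + p
            else:                             # fill the remaining capacity and stop
                alpha = RC / w
                placed[g], TP = alpha, TP + alpha * p
                break
        return TP, placed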
For the following instance of the FK problem, determine the amount of each good that is placed
in the knapsack by FKA, and provide the total container profit. Assume M = 10.
good weight profit
1 3 40
2 5 60
3 5 50
4 1 30
5 4 50
23. Describe an efficient implementationn of the FK algorithm, and provide the algorithm running
time under this implementation.
24. The 0-1 Knapsack problem is similar to Fractional Knapsack, except now, for each good g ∈ G,
either all of g or none of g is placed in the knapsack. Consider the following modification of
the Fractional Knapsack greedy algorithm. If the weight of the current good g exceeds the
remaining capacity RC, then g is skipped and the algorithm continues to the next good in
the ordering. Otherwise, it adds all of g to the knapsack and decrements RC by w(g), while
incrementing TP by p(g). Verify that this modified algorithm does not produce an optimal
knapsack for the problem instance of Exercise 22.
25. Scheduling with Deadlines. The input for this problem is a set of n tasks a1 , . . . , an . The
tasks are to be executed by a single processor starting at time t = 0. Each task ai requires
one unit of processing time, and has an integer deadline di . Moreover, if the processor finishes
executing ai at time t, where t ≤ di , then a profit pi is earned. For example, if task a1 has a
deadline of 3 and a profit of 10, then it must be either the first, second, or third task executed
in order to earn the profit of 10. Consider the following greedy algorithm for maximizing the
total profit earned. Sort the tasks in decreasing order of profit. Then for each task ai in the
ordering, schedule ai at time t ≤ di , where t is the latest time that does not exceed di and at
which no other task has been scheduled. If no such t exists, then skip ai
and proceed to the next task in the ordering. Apply this algorithm to the following problem
instance. If two tasks have the same profit, then ties are broken by alphabetical order. For
example, Task b precedes Task e in the ordering.
Task a b c d e f g h i j k
Deadline 4 3 1 4 3 1 4 6 8 2 7
Profit 40 50 20 30 50 30 40 10 60 20 50
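A Python sketch of this greedy procedure is given below, assuming each task is given as a (deadline, profit, name) triple; ties in profit are broken by name, matching the stated convention.

    def schedule_with_deadlines(tasks):
        # Sketch of the unit-task scheduling greedy; tasks is a list of
        # (deadline, profit, name) triples. Returns a map: time slot -> task name.
        slots = {}
        # Decreasing profit, with ties broken alphabetically by name.
        for d, p, name in sorted(tasks, key=lambda t: (-t[1], t[2])):
            t = d
            while t >= 1 and t in slots:      # latest free slot not exceeding d
                t -= 1
            if t >= 1:
                slots[t] = name               # schedule the task; otherwise skip it
        return slots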
26. Provide the worst-case running time T (n) of the Task-Scheduling greedy algorithm, where n
denotes the number of tasks in the input.
27. Let X = {x1 , . . . , xn } be a set of elements that are to be binary encoded, meaning that each
element xi is to be assigned a binary string s(xi ) that represents the element. Here, s(xi ) is
called the codeword assigned to xi . Assigned to each element xi is a probability pi , where
p1 + · · · + pn = 1. For example, the elements of X could be the tokens that comprise a text file, and
pi is proportional to how often token xi appears in the file. Now, the binary encoding must
represent a prefix code, meaning that, for all i ≠ j, s(xi ) is not a prefix of s(xj ). For example,
00 is a prefix of 0010 since the former is equal to the first two bits of the latter. The Optimal
Binary Code problem is the problem of assigning a prefix code to X in such a way that the
expected codeword length

E[s(X )] = p1 |s(x1 )| + · · · + pn |s(xn )|
is minimized, where |s(xi )| denotes the length of the codeword assigned to xi . Given X =
{1, 2, 3, 4}, with p(1) = 0.5, p(2) = 0.25, and p(3) = p(4) = 0.125, provide an optimal binary
prefix code for X . Give an informal argument for why you believe the encoding is optimal.
28. Prove that there is an optimal prefix encoding of X = {x1 , . . . , xn }, with n ≥ 2, for which the
two least probable elements are assigned codewords w1 and w2 which differ only in the last
bit. In other words w1 = w0 and w2 = w1, for some binary string w. Hint: think of binary
codewords as being nodes of a binary tree.
29. The Huffman Coding Algorithm is a recursive greedy algorithm for assigning an optimal
prefix code s(X ) to a set X = {x1 , . . . , xn }.
Base Case If X = {x1 } consists of a single element, then s(x1 ) = λ, the empty string.
Recursive Case Assume X = {x1 , . . . , xn }, with n ≥ 2. Without loss of generality, assume
xn−1 and xn are the two least probable elements. Merge these two elements into a new
element y with probability equal to pn−1 + pn . Then apply the algorithm to the input
X 0 = {x1 , . . . , xn−2 , y} to obtain the prefix code s0 (X 0 ). Finally, define s(X ) by s(xi ) =
s0 (xi ), for all i = 1, . . . , n − 2, s(xn−1 ) = s0 (y)0, and s(xn ) = s0 (y)1. In words, we use the
same prefix code that is returned by the recursive call, while assigning xn−1 and xn the
codeword assigned to y with either a 0 or 1 appended to obtain the respective codewords
for each of xn−1 and xn . Apply Huffman’s algorithm to X = {1, 2, 3, 4}, with p(1) = 0.5,
p(2) = 0.25, and p(3) = p(4) = 0.125.
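A Python sketch of this recursion is shown below, assuming the input is a dictionary mapping elements to probabilities; a pair (a, b) stands in for the merged element y.

    def huffman(probs):
        # Sketch of the Huffman recursion; probs maps elements to probabilities.
        # Returns a dict mapping each element to its codeword.
        elems = list(probs)
        if len(elems) == 1:
            return {elems[0]: ''}             # base case: the empty string
        a, b = sorted(elems, key=lambda x: probs[x])[:2]   # two least probable
        merged = {x: p for x, p in probs.items() if x != a and x != b}
        y = (a, b)                            # new element replacing a and b
        merged[y] = probs[a] + probs[b]
        code = huffman(merged)                # prefix code s' for X'
        w = code.pop(y)                       # s'(y)
        code[a], code[b] = w + '0', w + '1'
        return code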
30. Consider the set of keys 1, . . . , n, where key i has weight wi , i = 1, . . . , n. The weight of a key
reflects how often the key is accessed, and thus heavy keys should be higher in the tree. The
Optimal Binary Search Tree problem is to construct a binary-search tree for these keys, in such
a way that
wac(T ) = w1 d1 + · · · + wn dn
is minimized, where di is the depth of key i in the tree (note: here we assume the root has
a depth equal to one). This sum is called the weighted access cost. Consider the greedy
heuristic for Optimal Binary Search Tree: for keys 1, . . . , n, choose as root the node having the
maximum weight. Then repeat this for both the resulting left and right subtrees. Apply this
heuristic to keys 1-5 with respective weights 50,40,20,30,40. Show that the resulting tree does
not yield the minimum weighted access cost.
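A short Python sketch of this heuristic, assuming keys is a sorted list and weights maps each key to its weight:

    def greedy_bst(keys, weights):
        # Sketch of the greedy heuristic: the heaviest key becomes the root, and
        # the keys to its left and right are arranged recursively.
        if not keys:
            return None
        i = max(range(len(keys)), key=lambda j: weights[keys[j]])
        return (keys[i],
                greedy_bst(keys[:i], weights),
                greedy_bst(keys[i + 1:], weights))

On the above instance (with ties broken toward the leftmost key) this produces the degenerate chain shown in the solutions.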
31. Given a simple graph G = (V, E), a vertex cover for G is a subset C ⊆ V of vertices for which
every edge e ∈ E is incident with at least one vertex of C. Consider the greedy heuristic for
finding a vertex cover of minimum size. The heuristic chooses the next vertex to add to C as
the one that has the highest degree. It then removes this vertex (and all edges incident with
it) from G to form a new graph G′. The process repeats until the resulting graph has no more
edges. Give an example that shows that this heuristic does not always find a minimum cover.
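A Python sketch of the heuristic, assuming the graph is given as an adjacency map from each vertex to the set of its neighbors:

    def greedy_vertex_cover(adj):
        # Sketch of the max-degree heuristic; returns the constructed cover C.
        adj = {v: set(nbrs) for v, nbrs in adj.items()}   # local mutable copy
        C = set()
        while any(adj.values()):                          # some edge remains
            v = max(adj, key=lambda u: len(adj[u]))       # highest-degree vertex
            C.add(v)
            for u in adj.pop(v):                          # remove v and its edges
                adj[u].discard(v)
        return C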
Exercise Solutions
1. log n since it requires ⌊log n⌋ + 1 bits to represent n.
2. The algorithm has exponential running time since n^2 = (2^(log n))^2 = 4^(log n), which grows
exponentially with respect to input parameter log n.
3. n and log t since it requires ⌊log t⌋ + 1 bits to represent t, and each permutation has a length
of n.
4. The algorithm has polynomial (more specifically, quadratic) running time since n log t is the
product of input parameters n and log t.
5. Consider the longest simple path P = v0 , v1 , . . . , vk in the tree. Then both v0 and vk are
degree-1 vertices. For suppose there were another vertex u adjacent to v0 , other than
v1 . If u ∉ P , then P ′ = u, P is a longer simple path than P , which contradicts the fact
that P is a longest simple path. On the other hand, if u ∈ P , say u = vi for some i > 1, then
P ′ = u, v0 , v1 , . . . , vi = u is a path of length at least three that begins and ends at u. In other
words, P ′ is a cycle, which contradicts the fact that the underlying graph is a tree, and hence
acyclic.
6. Use the previous problem and mathematical induction. For the inductive step, assume trees
of size n have n − 1 edges. Let T be a tree of size n + 1. Show that T has n edges. By the
previous problem, one of its vertices has degree 1. Remove this vertex and the edge incident
with it to obtain a tree of size n. By the inductive assumption, the modified tree has n − 1
edges. Hence T must have n edges.
7. Use induction on the order n. The basis step n = 1 is immediate. For the inductive step, assume
that every connected graph of order n − 1 having n − 2 edges is acyclic, and let G be a connected
graph of order n ≥ 2 having n − 1 edges. Since the degrees of G sum to 2(n − 1) < 2n, and since
connectivity implies that every vertex has degree at least one, some vertex u must have degree
exactly one. Thus, removing u from V and removing the edge incident with u from E yields a connected
graph G′ of order n−1 and size n−2. By the inductive assumption, G′ is acyclic. Therefore,
since no cycle can include vertex u, G is also acyclic.
8. Edges added: (a, d, 1), (a, b, 2), (c, e, 4), (c, d, 5) for a total cost of 12.
11. Edges added: (c, e, 4), (c, d, 5), (a, d, 1), (a, b, 2) for a total cost of 12.
12. The heap states are shown below. Note: the next heap is obtained from the previous heap by
i) popping the top vertex u from the heap, followed by ii) performing a succession of priority
reductions for each vertex v in the heap for which the edge (u, v, c) has a cost c that is less than
the current priority of v. In the case that two or more vertices have their priorities reduced,
assume the reductions (followed by a percolate-up operation) are performed in alphabetical
order.
Initial heap: e/∞ (root); a/∞, b/∞, c/∞, d/∞.
After popping e: c/4 (root); b/7, a/5, d/8.
After popping c: d/5 (root); b/7, a/5.
After popping d: a/1 (root); b/3.
After popping a: b/2.
13. Add a sufficiently large integer J to each edge weight so that the weights will be all nonnegative.
Then perform the algorithm, and subtract J from each mst edge weight.
14. For Kruskal’s algorithm, sort the edges by decreasing edge weight. For Prim’s algorithm, use a
max-heap instead of a min-heap. Verify that these changes can be successfully adopted in each
of the correctness proofs.
15. Edges added in the following order: (a, b, 2), (b, g, 1), (b, c, 2), (g, e, 1), (e, d, 3), (d, f, 1). d(a, a) =
0, d(a, b) = 2, d(a, g) = 3, d(a, c) = 4, d(a, e) = 4, d(a, d) = 7, d(a, f ) = 8.
16.
print_optimal_path(i, parent) {
    if (parent[i] != -1) {  // vertex 0, the source, has no parent
        print_optimal_path(parent[i], parent);
        print ' ';
    }
    print i;
}
19. It is sufficient to represent the problem size by the number n of input tasks. Sort the tasks in
order of increasing start times. Now the algorithm can be completed in the following loop.
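One loop consistent with the analysis below is the following sketch, which assumes each task is a (start, finish) pair.

    def tsa(tasks):
        # Sketch of a linear-scan TSA; tasks is a list of (start, finish) pairs.
        T_opt, last_finish = [], 0
        t0 = None                              # current candidate task
        for s, f in sorted(tasks):             # increasing start time
            if t0 is None:
                if s >= last_finish:
                    t0 = (s, f)
            elif s >= t0[1]:                   # the if: t starts after t0 finishes
                T_opt.append(t0)               # commit the candidate
                last_finish = t0[1]
                t0 = (s, f)
            elif f < t0[1]:                    # the else-if: t finishes earlier,
                t0 = (s, f)                    # so t replaces the candidate
            # otherwise t is skipped
        if t0 is not None:
            T_opt.append(t0)
        return T_opt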
The above code appears to be a correct implementation of TSA. The only possible concern
is for a task t that neither satisfies the if nor the else-if condition. Such tasks never get
added to the final set of non-overlapping tasks. To see that this is justified, suppose in the if
statement t is being compared with the current candidate t0 . Then we have

s(t0 ) ≤ s(t) < f (t0 ) ≤ f (t),

where the first inequality is from the fact that the tasks are sorted by start times, the second
from the fact that t does not satisfy the if condition, and the third from the fact that t does
not satisfy the else-if condition. Hence, it
follows that t and t0 overlap, so, if t0 is added to the optimal set, then t should not be added.
Moreover, the only way in which t0 is not added is if there exists a task t00 that follows t in
terms of start time, but has a finish time that is less than that of t0 ’s. In this case we have
s(t) ≤ s(t00 ) and f (t) ≥ f (t0 ) ≥ f (t00 ) and so t overlaps with t00 . And once again t should not
be added to the final set.
Based on the above code and analysis, it follows that TSA can be implemented with an initial
sorting of the tasks, followed by a linear scan of the sorted tasks. Therefore, T (n) = Θ(n log n).
20. Hint: consider the case where there are three tasks t1 , t2 , and t3 , where there is overlap between
t1 and t2 , and t2 and t3 .
21. The most efficient implementation has running time Θ(n log n). Hint: your implementation
should make use of a balanced (e.g. AVL) binary search tree.
22. The table below shows the order of each good in terms of profit density, how much of each good
was placed in the knapsack, and the profit earned from the placement. The total profit earned
is 144.
good weight profit density placed profit earned
4 1 30 30 1 30
1 3 40 13.3 3 40
5 4 50 12.5 4 50
2 5 60 12 2 24
3 5 50 10 0 0
23. The parameters n and log M can be used to represent the problem size, where n is the number
of goods. Notice how log M is used instead of M , since log M bits are needed to represent
capacity M . Furthermore, assume each good weight does not exceed M , and the good profits
use a constant number of bits. Then the sorting of the goods requires Θ(n log n) steps, while
the profit density calculations and updates of variables RC and TP require O(n log M + n) total
steps. Therefore, the running time of FKA is T (n) = O(n log n + n log M ).
24. The table below shows the order of each good in terms of profit density, how much of each good
was placed in the knapsack by modified FKA, and the profit earned from the placement. The
total profit earned is 120. However, placing goods 2, 4, and 5 into the knapsack earns a profit
of 140 > 120. An alternative algorithm for 0-1 Knapsack will be presented in the Dynamic
Programming lecture.
good weight profit density placed profit earned
4 1 30 30 1 30
1 3 40 13.3 3 40
5 4 50 12.5 4 50
2 5 60 12 0 0
3 5 50 10 0 0
25. The optimal schedule earns a total profit of 300, and is shown below.
Time   1  2  3  4  5  6  7  8
Task   g  e  b  a  –  h  k  i
Profit 40 50 50 40 –  10 50 60
26. The worst case occurs when each of the n tasks has a deadline of n and all have the same profit.
In this case task 1 is scheduled at time n, task 2 at time n − 1, etc. Notice that, when scheduling task i,
the array that holds the scheduled tasks must be queried i − 1 times before finding the available
time n − i + 1. This yields a total of 0 + 1 + · · · + n − 1 = Θ(n2 ) queries. Thus, the algorithm
has a running time of T (n) = O(n2 ). Can you find a way to improve this algorithm’s worst
case running time? Can you find a different algorithm with a better worst case running time?
27. s(1) = 0, s(2) = 10, s(3) = 110, s(4) = 111. The code is optimal since a prefix code with four
codewords must, at a minimum, have codeword lengths of 1, 2, 3, and 3. Given these lengths, the
elements with higher probability are assigned the shorter codewords.
28. Consider an optimal prefix encoding of X = {x1 , . . . , xn }, with n ≥ 2. Without loss of gener-
ality, assume x1 and x2 are the two least probable elements, with w1 and w2 their respective
codewords. Suppose that w1 and w2 differ in more than just the last bit.
First assume that |w2 | ≠ |w1 |. Without loss of generality, assume |w2 | > |w1 |. Then, since
the code is optimal, we must have p2 < p1 . Furthermore, |w2 | ≥ 2. Without loss of generality,
assume w2 ends with 0. Then w2 = w0 for some binary string w with |w| ≥ 1. Now since w
was not assigned to x2 , it must be the case that w is a prefix of another codeword v 6= w2 .
But since w2 is the longest codeword (as it encodes the least probable element), it follows that
|v| = |w2 |, and hence v = w1. But then v would encode the second least probable element,
which implies v = w1 , which contradicts the assumption that |w2 | ≠ |w1 |. Hence, we must have
|w2 | = |w1 |.
Now, since we are assuming that w1 and w2 differ in more than just the last bit, it follows that
|w2 | = |w1 | ≥ 2. Again, assume without loss of generality that x2 is the least probable element,
and that w2 = w0 for some binary string w with |w| ≥ 1. Once again, since w was not assigned
to x2 , it must be the case that w is a prefix of another codeword v 6= w2 . But since w2 is
the longest codeword (as it encodes the least probable element), it follows that |v| = |w2 |, and
hence v = w1. Now, assuming v encodes some other element x ∈ X , we swap codewords with
x1 and x, and arrive at a code in which the two least probable elements are assigned codewords
that differ in only the last bit.
29. The first recursive call is for X 0 = {1, 2, (3, 4)}. The next recursive call is for X 00 = {1, (2, (3, 4))}.
The final recursive call is for X 000 = {(1, (2, (3, 4)))}. Since this set has one element, (1, (2, (3, 4)))
is assigned λ, then 1 is assigned 0 and (2, (3, 4)) is assigned 1. Then 2 is assigned 10 and (3, 4) is
assigned 11. Then 3 is assigned 110 and 4 is assigned 111. The final code is s(1) = 0, s(2) = 10,
s(3) = 110, s(4) = 111.
30. The heuristic produces a degenerate chain: 1/50 is the root, with 2/40, 5/40, 4/30, and 3/20 at
depths 2, 3, 4, and 5, respectively, giving a weighted access cost of 50(1) + 40(2) + 40(3) +
30(4) + 20(5) = 470. However, the tree with root 2/40, children 1/50 and 4/30, and grandchildren
3/20 and 5/40 (under 4/30) has a weighted access cost of 40(1) + 50(2) + 30(2) + 20(3) + 40(3) =
380 < 470.
31. In the graph below, the heuristic will first choose vertex a, followed by four additional vertices
(either b, d, f , h, or c, e, g, i), to yield a cover of size five. However, the optimal cover {c, e, g, i}
has a size of four.
(Figure: vertex a is adjacent to c, e, g, and i, and vertices b, c, d, e, f, g, h, i form a cycle in that order.)