Unit 4
Structure
4.1 Introduction
    Objectives
4.2 Greedy Method Strategy
4.3 Optimal Storage on Tapes
4.4 Knapsack Problem
4.5 Job Sequencing with Deadlines
4.6 Optimal Merge Pattern
4.7 Single Source Shortest Paths
4.8 Summary
4.9 Terminal Questions
4.10 Answers
Greedy Method
4.1 Introduction
In the previous unit, we discussed different techniques for designing algorithms. Now we will study a well-known method called the greedy method. The greedy method is the most straightforward design technique, and it can be applied to a wide variety of problems. Most of these problems have n inputs and require us to obtain a subset that satisfies some constraints. Any subset that satisfies these constraints is called a feasible solution. We need to find a feasible solution that either maximizes or minimizes a given objective function. A feasible solution that does this is called an optimal solution. There is usually a simple way to determine a feasible solution, but not necessarily an optimal solution.

Objectives
After studying this unit, you should be able to:
- describe the greedy method approach
- apply the greedy method to optimal storage on tapes and the knapsack problem
- define the job sequencing problem using the greedy method
Fundamentals of Algorithms
4.2 Greedy Method Strategy
The greedy method suggests that one can devise an algorithm that works in stages, considering one input at a time. At each stage, a decision is made regarding whether a particular input is in an optimal solution. This is done by considering the inputs in an order determined by some selection procedure. If the inclusion of the next input into the partially constructed optimal solution will result in an infeasible solution, then this input is not added to the partial solution; otherwise, it is added. The selection procedure itself is based on some optimization measure. This measure may be the objective function. This version of the greedy technique is called the subset paradigm. We can describe the subset paradigm abstractly, but more precisely than above, by considering the control abstraction in Algorithm 4.1.

Algorithm 4.1
1. Algorithm Greedy(a, n)
2. // a[1:n] contains the n inputs
3. {
4.     solution := Ø; // initialize the solution
5.     for i := 1 to n do
6.     {
7.         x := Select(a);
8.         if Feasible(solution, x) then
9.             solution := Union(solution, x);
10.    }
11.    return solution;
12. }

The function Select selects an input from a[ ] and removes it. The selected input's value is assigned to x. Feasible is a Boolean-valued function that determines whether x can be included in the solution vector. The function Union combines x with the solution and updates the objective function. The function Greedy describes the essential way that a greedy algorithm will look, once a particular problem is chosen and the functions Select, Feasible and Union are properly implemented.

For problems that do not call for the selection of an optimal subset, the greedy method makes decisions by considering the inputs in some order. Each decision is made using an optimization criterion that can be computed using the decisions already made. This version of the greedy method is called the ordering paradigm.
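The control abstraction of Algorithm 4.1 can be sketched in Python. This is an illustrative skeleton, not part of the original text: Select is realized by iterating over the inputs sorted on the optimization measure, and the instance at the bottom (a few weights and a weight limit of 10) is hypothetical, chosen only to exercise the loop.

```python
def greedy(inputs, feasible, key):
    """Generic greedy skeleton in the style of Algorithm 4.1: consider
    inputs in the order given by the selection measure `key`, and add
    each one only if the partial solution stays feasible."""
    solution = []
    # Select: consider inputs in order of the optimization measure.
    for x in sorted(inputs, key=key):
        # Feasible: can x be added without violating the constraints?
        if feasible(solution, x):
            # Union: extend the partial solution with x.
            solution.append(x)
    return solution

# Hypothetical instance: pick items of smallest weight first,
# subject to a total-weight limit of 10.
items = [7, 3, 5, 2, 4]
limit = 10
picked = greedy(items,
                feasible=lambda sol, x: sum(sol) + x <= limit,
                key=lambda x: x)
print(picked)  # → [2, 3, 4]
```

The skeleton makes the three problem-specific hooks of the paradigm explicit: the selection order, the feasibility test, and how a chosen input joins the partial solution.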
Sikkim Manipal University Page No. 98
Self Assessment Question
1. _______ is a Boolean-valued function that determines whether x can be included in the solution vector.
4.3 Optimal Storage on Tapes
There are n programs that are to be stored on a computer tape. Associated with each program i is a length l_i, 1 ≤ i ≤ n. We assume that whenever a program is to be retrieved, the tape is initially positioned at the front. Hence, if the programs are stored in the order I = i1, i2, ..., in, the time t_j needed to retrieve program i_j is proportional to Σ_{1≤k≤j} l_ik. If all programs are retrieved equally often, the expected or mean retrieval time (MRT) is (1/n) Σ_{1≤j≤n} t_j. In the optimal storage on tapes problem, we are required to find a permutation of the n programs so that when they are stored on the tape in this order the MRT is minimized. This problem fits the ordering paradigm. Minimizing the MRT is equivalent to minimizing

    d(I) = Σ_{1≤j≤n} Σ_{1≤k≤j} l_ik

A greedy approach to building the required permutation would choose the next program on the basis of some optimization measure. One possible measure is the d value of the permutation constructed so far. The next program to be stored on the tape would be one that minimizes the increase in d. If we have already constructed the permutation i1, i2, ..., ir, then appending program j gives the permutation i1, i2, ..., ir, i_{r+1} = j. This increases the d value by Σ_{1≤k≤r} l_ik + l_j. Since Σ_{1≤k≤r} l_ik is fixed and independent of j, we trivially observe that the increase in d is minimized if the next program chosen is the one with the least length from among the remaining programs.
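The d value, and a brute-force check that the least-length-first rule really minimizes it, can be sketched in Python. The three program lengths below are hypothetical, chosen only for illustration.

```python
from itertools import permutations

def d(order, lengths):
    """d(I): sum over j of the time to retrieve program i_j, which is
    the sum of the lengths of all programs stored up to and including
    i_j (Section 4.3)."""
    total, prefix = 0, 0
    for i in order:
        prefix += lengths[i]   # tape passes over every earlier program
        total += prefix
    return total

# Hypothetical program lengths, indexed 0..2.
lengths = [5, 10, 3]

# Brute force over all 3! permutations: the minimizer stores programs
# in nondecreasing order of length (indices 2, 0, 1).
best = min(permutations(range(3)), key=lambda p: d(p, lengths))
print(best, d(best, lengths))  # → (2, 0, 1) 29
```

For n = 3 the exhaustive check is instant; the point is that the greedy (sorted) order achieves the same minimum without enumerating permutations.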
The greedy method simply requires us to store the programs in nondecreasing order of their lengths. This ordering can be carried out in O(n log n) time.

Theorem: If l1 ≤ l2 ≤ ... ≤ ln, then the ordering i_j = j, 1 ≤ j ≤ n, minimizes

    d(I) = Σ_{k=1}^{n} Σ_{j=1}^{k} l_{i_j}

over all possible permutations of the i_j.

Proof: Let I = i1, i2, ..., in be any permutation of the index set {1, 2, ..., n}. Then

    d(I) = Σ_{k=1}^{n} Σ_{j=1}^{k} l_{i_j} = Σ_{k=1}^{n} (n − k + 1) l_{i_k}

If there exist a and b such that a < b and l_{i_a} > l_{i_b}, then interchanging i_a and i_b results in a permutation I' with

    d(I') = [ Σ_{k≠a, k≠b} (n − k + 1) l_{i_k} ] + (n − a + 1) l_{i_b} + (n − b + 1) l_{i_a}

Subtracting d(I') from d(I), we obtain

    d(I) − d(I') = (n − a + 1)(l_{i_a} − l_{i_b}) + (n − b + 1)(l_{i_b} − l_{i_a})
                 = (b − a)(l_{i_a} − l_{i_b})
                 > 0

Hence, no permutation that is not in nondecreasing order of the l_i's can have minimum d. It is easy to see that all permutations in nondecreasing order of the l_i's have the same d value. Hence, the ordering defined by i_j = j, 1 ≤ j ≤ n, minimizes the d value.

The tape storage problem can be extended to several tapes. If there are m > 1 tapes, T0, ..., T_{m−1}, then the programs are to be distributed over these tapes. For each tape a storage permutation is to be provided. If I_j is the storage permutation for the subset of programs on tape j, then d(I_j) is as defined earlier. The total retrieval time TD is

    TD = Σ_{0≤j≤m−1} d(I_j)

The objective is to store the programs in such a way as to minimize TD.
The obvious generalization of the solution for the one-tape case is to consider the programs in nondecreasing order of the l_i's. The program currently being considered is placed on the tape that results in the minimum increase in TD. This tape will be the one with the least amount of tape used so far. If there is more than one tape with this property, then the one with the smallest index can be used. If the jobs are initially ordered so that l1 ≤ l2 ≤ ... ≤ ln, then the first m programs are assigned to tapes T0, ..., T_{m−1} respectively, the next m programs to tapes T0, ..., T_{m−1} respectively, and so on. The general rule is that program i is stored on tape T_{i mod m}. On any given tape the programs are stored in nondecreasing order of their lengths. Algorithm 4.2 presents this rule in pseudocode. It assumes that the programs are ordered as above. It has a computing time of Θ(n) and does not need to know the program lengths.

Algorithm 4.2
1. Algorithm Store(n, m)
2. // n is the number of programs and m the number
3. // of tapes
4. {
5.     j := 0; // next tape to store on
6.     for i := 1 to n do
7.     {
8.         write ("append program", i,
9.                "to permutation for tape", j);
10.        j := (j + 1) mod m;
11.    }
12. }

Self Assessment Questions
2. A greedy approach to building the required permutation would choose the next program on the basis of some _______.
3. The ordering defined by i_j = j, 1 ≤ j ≤ n, _______ the d value.
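The rule of Algorithm 4.2 (after sorting by length, program i goes on tape i mod m) can be sketched in Python. The six program lengths below are hypothetical, used only to show the round-robin assignment.

```python
def store(lengths, m):
    """Distribute programs over m tapes following Algorithm 4.2:
    sort the programs by nondecreasing length, then assign program i
    (in that order) to tape i mod m."""
    tapes = [[] for _ in range(m)]
    for i, l in enumerate(sorted(lengths)):
        tapes[i % m].append(l)   # round-robin over the tapes
    return tapes

# Hypothetical program lengths, m = 3 tapes.
print(store([12, 5, 8, 32, 7, 5], 3))  # → [[5, 8], [5, 12], [7, 32]]
```

Each tape ends up with its programs in nondecreasing order of length, as the text requires, and the loop itself is Θ(n) once the sort is done.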
4.4 Knapsack Problem
We are given n objects and a knapsack of capacity m. Object i has a weight w_i and an associated profit p_i. If a fraction x_i, 0 ≤ x_i ≤ 1, of object i is placed into the knapsack, then a profit of p_i x_i is earned. The objective is to obtain a filling of the knapsack that maximizes the total profit earned. Formally, the problem can be stated as

    maximize   Σ_{1≤i≤n} p_i x_i                 (1)
    subject to Σ_{1≤i≤n} w_i x_i ≤ m             (2)
    and        0 ≤ x_i ≤ 1, 1 ≤ i ≤ n            (3)

The profits and weights are positive numbers. A feasible solution (or filling) is any set (x1, ..., xn) satisfying (2) and (3) above. An optimal solution is a feasible solution for which (1) is maximized. The Greedy knapsack algorithm requires O(n) time and is given below.

Algorithm 4.3
Algorithm GreedyKnapsack(m, n)
// p[1:n] and w[1:n] contain the profits and weights respectively
// of the n objects ordered such that p[i]/w[i] ≥ p[i+1]/w[i+1].
// m is the knapsack size and x[1:n] is the solution vector.
{
    for i := 1 to n do x[i] := 0.0; // initialize x
    u := m;
    for i := 1 to n do
    {
        if (w[i] > u) then break;
        x[i] := 1.0;
        u := u − w[i];
    }
    if (i ≤ n) then x[i] := u/w[i];
}

Self Assessment Question
4. The profits and weights are _______ numbers.
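A Python sketch of Algorithm 4.3 follows. The three-object instance (m = 20) is a standard illustration, not taken from this text, and unlike the pseudocode the function sorts the objects by profit-to-weight ratio itself rather than assuming they arrive presorted.

```python
def greedy_knapsack(profits, weights, m):
    """Fractional knapsack in the spirit of Algorithm 4.3: take objects
    in nonincreasing order of profit/weight; the first object that no
    longer fits is taken fractionally."""
    order = sorted(range(len(profits)),
                   key=lambda i: profits[i] / weights[i], reverse=True)
    x = [0.0] * len(profits)
    u = m                          # remaining knapsack capacity
    for i in order:
        if weights[i] > u:
            x[i] = u / weights[i]  # fractional fill, then stop
            break
        x[i] = 1.0
        u -= weights[i]
    return x

# Illustrative instance: profits (25, 24, 15), weights (18, 15, 10), m = 20.
x = greedy_knapsack([25, 24, 15], [18, 15, 10], 20)
print(x)  # → [0.0, 1.0, 0.5], total profit 24 + 7.5 = 31.5
```

Object 2 (ratio 1.6) is taken whole, half of object 3 (ratio 1.5) fills the rest, and object 1 (ratio ≈ 1.39) is skipped, matching the ordering the theorem below proves optimal.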
4.5 Job Sequencing with Deadlines
We are given a set of n jobs. Associated with job i is an integer deadline d_i ≥ 0 and a profit p_i > 0. For any job i the profit p_i is earned if and only if the job is completed by its deadline. To complete a job, one has to process the job on a machine for one unit of time, and only one machine is available. A feasible solution for this problem is a subset J of jobs such that each job in the subset can be completed by its deadline. The value of a feasible solution J is the sum of the profits of the jobs in J, Σ_{i∈J} p_i. An optimal solution is a feasible solution with maximum value. Since the problem involves the identification of a subset, it fits the subset paradigm. Algorithm 4.4 demonstrates the greedy algorithm for sequencing unit time jobs with deadlines and profits.

Algorithm 4.4
1. Algorithm JS(d, J, n)
2. // d[i] ≥ 1, 1 ≤ i ≤ n are the deadlines, n ≥ 1. The jobs
3. // are ordered such that p[1] ≥ p[2] ≥ ... ≥ p[n].
4. // J[i] is the ith job in the optimal solution, 1 ≤ i ≤ k.
5. // Also, at termination d[J[i]] ≤ d[J[i+1]], 1 ≤ i < k.
6. {
7.     d[0] := J[0] := 0; // initialize
8.     J[1] := 1; // include job 1
9.     k := 1;
10.    for i := 2 to n do
11.    {
12.        // Consider jobs in nonincreasing order of p[i].
13.        // Find position for i and check feasibility of insertion.
14.        r := k;
15.        while ((d[J[r]] > d[i]) and (d[J[r]] ≠ r)) do r := r − 1;
16.        if ((d[J[r]] ≤ d[i]) and (d[i] > r)) then
17.        {
18.            // Insert i into J[ ].
19.            for q := k to (r + 1) step −1 do J[q+1] := J[q];
20.            J[r+1] := i; k := k + 1;
21.        }
22.    }
23.    return k;
24. }
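The same greedy rule can be realized with an explicit array of time slots rather than the sorted array J of Algorithm 4.4. The Python sketch below is that slot-based variant, not a line-for-line transcription of JS; the four-job instance is a standard illustration, not from this text.

```python
def job_sequencing(profits, deadlines):
    """Greedy job sequencing with deadlines: consider jobs in
    nonincreasing profit order and schedule each in the latest free
    unit-time slot at or before its deadline; reject it if none exists."""
    n = len(profits)
    order = sorted(range(n), key=lambda i: profits[i], reverse=True)
    max_d = max(deadlines)
    slot = [None] * (max_d + 1)          # slot[t] = job run in slot t
    for i in order:
        t = min(deadlines[i], max_d)
        while t >= 1 and slot[t] is not None:
            t -= 1                       # try an earlier free slot
        if t >= 1:
            slot[t] = i                  # job i fits; schedule it
    return [j for j in slot[1:] if j is not None]

# Illustrative instance: profits (100, 10, 15, 27), deadlines (2, 1, 2, 1).
J = job_sequencing([100, 10, 15, 27], [2, 1, 2, 1])
print(J)  # → [3, 0]: job 3 in slot 1, job 0 in slot 2, profit 127
```

Both formulations implement the same greedy choice; the slot array simply makes the feasibility test (is there a free slot by the deadline?) explicit.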
Theorem: If p1/w1 ≥ p2/w2 ≥ ... ≥ pn/wn, then GreedyKnapsack generates an optimal solution to the given instance of the knapsack problem.

Proof: Let x = (x1, ..., xn) be the solution generated by GreedyKnapsack. If all the x_i equal one, then clearly the solution is optimal. So, let j be the least index such that x_j ≠ 1. From the algorithm it follows that x_i = 1 for 1 ≤ i < j, x_i = 0 for j < i ≤ n, and 0 ≤ x_j < 1. Let y = (y1, ..., yn) be an optimal solution. We can assume that Σ w_i y_i = m. Let k be the least index such that y_k ≠ x_k. Clearly, such a k must exist. It also follows that y_k < x_k. To see this, consider the three possibilities k < j, k = j, and k > j:
1. If k < j, then x_k = 1. But y_k ≠ x_k, and so y_k < x_k.
2. If k = j, then since Σ w_i x_i = m and y_i = x_i for 1 ≤ i < j, it follows that either y_k < x_k or Σ w_i y_i > m.
3. If k > j, then Σ w_i y_i > m, and this is not possible.

Now, suppose we increase y_k to x_k and decrease as many of (y_{k+1}, ..., y_n) as necessary so that the total capacity used is still m. This results in a new solution z = (z1, ..., zn) with z_i = x_i, 1 ≤ i ≤ k, and Σ_{k<i≤n} w_i (y_i − z_i) = w_k (z_k − y_k). Then, for z we have

    Σ_{1≤i≤n} p_i z_i = Σ_{1≤i≤n} p_i y_i + (z_k − y_k) w_k (p_k/w_k) − Σ_{k<i≤n} (y_i − z_i) w_i (p_i/w_i)
                      ≥ Σ_{1≤i≤n} p_i y_i + [ (z_k − y_k) w_k − Σ_{k<i≤n} (y_i − z_i) w_i ] (p_k/w_k)
                      = Σ_{1≤i≤n} p_i y_i

If Σ p_i z_i > Σ p_i y_i, then y could not have been an optimal solution. If the sums are equal, then either z = x and x is optimal, or z ≠ x. In the latter case, repeated use of the above argument will either show that y is not optimal or transform y into x and thus show that x too is optimal.

Greedy algorithm for sequencing unit time jobs with deadlines and profits
For JS, there are two possible parameters in terms of which its complexity can be measured: n, the number of jobs, and s, the number of jobs included in the solution J. The while loop of line 15 in Algorithm 4.4 is iterated at most k times. Each iteration takes Θ(1) time. If the condition of line 16 is true, then lines 19 and 20 are executed. These lines require Θ(k − r)
time to insert job i. Hence, the total time for each iteration of the for loop of line 10 is Θ(k). This loop is iterated n − 1 times. If s is the final value of k, that is, s is the number of jobs in the final solution, then the total time needed by algorithm JS is Θ(sn). Since s ≤ n, the worst-case time, as a function of n alone, is Θ(n²). If we consider the job set with p_i = d_i = n − i + 1, 1 ≤ i ≤ n, then algorithm JS takes Θ(n²) time to determine J. Hence, the worst-case computing time for JS is Θ(n²). In addition to the space needed for d, JS needs Θ(s) amount of space for J.

Self Assessment Question
5. If _______, then the Greedy knapsack algorithm generates an optimal solution to the given instance of the knapsack problem.
4.6 Optimal Merge Pattern
Two sorted files containing n and m records respectively can be merged together to obtain one sorted file in time O(n + m). When more than two sorted files are to be merged, the merge can be accomplished by repeatedly merging files in pairs; different pairings give rise to different total merge times, and the greedy rule is to merge, at each step, the two smallest files. A merge pattern such as the one just described is referred to as a two-way merge pattern (each merge step involves the merging of two files). Two-way merge patterns can be represented by binary merge trees. Fig. 4.1 shows a binary merge tree representing the optimal merge pattern obtained for five files x1, ..., x5.

[Fig. 4.1: Binary merge tree representing the optimal merge pattern for five files]

The leaf nodes are drawn as squares and represent the given five files. These nodes are called external nodes. The remaining nodes are drawn as circles and are called internal nodes. Each internal node has exactly two children, and it represents the file obtained by merging the files represented by its two children. The number in each node is the length (number of records) of the file represented by that node. The external node x4 is at a distance of 3 from the root z4 (a node at level i is at a distance of i − 1 from the root). Hence, the records of file x4 are moved 3 times: once to get z1, once again to get z2, and finally one more time to get z4. If d_i is the distance from the root to the external node for file x_i, and q_i is the length of x_i, then the total number of record moves for this binary merge tree is

    Σ_{i=1}^{n} d_i q_i

This sum is called the weighted external path length of the tree. A tree node can be declared as

    treenode = record {
        treenode *lchild;
        treenode *rchild;
        integer weight;
    };
In the algorithm below we discuss how to generate a two-way merge tree.

Algorithm 4.5
1. Algorithm Tree(n)
2. // list is a global list of n single-node
3. // binary trees as described above.
4. {
5.     for i := 1 to n − 1 do
6.     {
7.         pt := new treenode; // get a new tree node
8.         (pt → lchild) := Least(list); // merge two trees with
9.         (pt → rchild) := Least(list); // smallest lengths
10.        (pt → weight) := ((pt → lchild) → weight)
11.                        + ((pt → rchild) → weight);
12.        Insert(list, pt);
13.    }
14.    return Least(list); // tree left in list is the merge tree
15. }

An optimal two-way merge pattern corresponds to a binary merge tree with minimum weighted external path length. The function Tree of Algorithm 4.5 uses the greedy rule stated earlier to obtain a two-way merge tree for n files. The algorithm has as input a list of n trees. Each node in a tree has three fields: lchild, rchild and weight. Initially, each tree in list has exactly one node. This node is an external node and has lchild and rchild fields zero, whereas weight is the length of one of the n files to be merged. During the course of the algorithm, for any tree in list with root node t, t → weight is the length of the merged file it represents (t → weight equals the sum of the lengths of the external nodes in tree t). Function Tree uses two functions, Least(list) and Insert(list, t). Least(list) finds a tree in list whose root has least weight and returns a pointer to this tree; the tree is removed from list. Insert(list, t) inserts the tree with root t into list. The main for loop in Algorithm 4.5 is executed n − 1 times. If list is kept in nondecreasing order according to the weight value in the roots, then Least(list) requires only O(1) time and Insert(list, t) can be done in O(n) time. Hence, the total time taken is O(n²).
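The greedy rule used by Tree can be sketched in Python with a min-heap standing in for Least and Insert, which is what gives the O(n log n) bound mentioned below. The five file lengths are an illustrative instance, not taken from this text.

```python
import heapq

def optimal_merge_cost(lengths):
    """Total record moves of the optimal two-way merge pattern:
    repeatedly merge the two smallest files (greedy rule of
    Algorithm 4.5), using a min-heap for Least/Insert."""
    heap = list(lengths)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)      # Least(list)
        b = heapq.heappop(heap)      # Least(list)
        total += a + b               # record moves for this merge step
        heapq.heappush(heap, a + b)  # Insert(list, merged file)
    return total

# Illustrative instance: five files of lengths 20, 30, 10, 5, 30.
print(optimal_merge_cost([20, 30, 10, 5, 30]))  # → 205
```

Each pop/push costs O(log n), and the loop runs n − 1 times, so the whole computation is O(n log n); the returned total equals the weighted external path length of the corresponding binary merge tree.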
If list is instead represented as a minheap, then Least(list) and Insert(list, t) can each be done in O(log n) time, and the computing time for Tree is O(n log n). Some speedup may be obtained by combining the Insert of line 12 with the Least of line 9.

Self Assessment Question
6. Each node in a tree has three fields: _______, _______ and _______.
4.7 Single Source Shortest Paths
In the single source shortest paths problem we are given a directed graph G = (V, E), a weighting function cost for the edges of G, and a source vertex v0. We must determine the shortest paths from v0 to all the remaining vertices of G. The greedy method generates these paths in nondecreasing order of path length.

Fig. 4.2: Graph and shortest paths from vertex 1 to all destinations

For the graph of Fig. 4.2 the nearest vertex to v0 = 1 is 4 (cost[1, 4] = 10). The path 1, 4 is the first path generated. The second nearest vertex to node 1 is
5 and the distance between 1 and 5 is 25. The path 1, 4, 5 is the next path generated. In order to generate the shortest paths in this order, we need to be able to determine 1) the next vertex to which a shortest path must be generated and 2) a shortest path to this vertex. Let S denote the set of vertices (including v0) to which the shortest paths have already been generated. For w not in S, let dist[w] be the length of the shortest path starting from v0, going through only those vertices that are in S, and ending at w. We observe that:
1. If the next shortest path is to vertex u, then the path begins at v0, ends at u, and goes through only those vertices that are in S. To prove this, we must show that all the intermediate vertices on the shortest path to u are in S. Assume there is a vertex w on this path that is not in S. Then, the v0 to u path also contains a path from v0 to w that is of length less than that of the v0 to u path. By assumption, the shortest paths are being generated in nondecreasing order of path length, and so the shorter path from v0 to w must already have been generated. Hence, there can be no intermediate vertex that is not in S.
2. The destination of the next path generated must be the vertex u that has the minimum distance, dist[u], among all vertices not in S. This follows from the definition of dist and observation 1. In case there are several vertices not in S with the same dist, any of these may be selected.
3. Having selected a vertex u as in observation 2 and generated the shortest v0 to u path, vertex u becomes a member of S. At this point the length of the shortest path starting at v0, going through vertices only in S, and ending at a vertex w not in S may decrease; that is, the value of dist[w] may change. If it does change, then it must be due to a shorter path starting at v0, going to u, and then to w. The intermediate vertices on the v0 to u path and the u to w path must all be in S.
Further, the v0 to u path must be the shortest such path; otherwise dist[w] is not defined properly. Also, the u to w path can be chosen so as not to contain any intermediate vertices. Therefore, we can conclude that if dist[w] is to change, then it is because of a path from v0 to u to w, where the path
from v0 to u is the shortest path and the path from u to w is the edge (u, w). The length of this path is dist[u] + cost[u, w]. The above observations lead to the simple Algorithm 4.6 for the single source shortest path problem. This algorithm (known as Dijkstra's algorithm) only determines the lengths of the shortest paths from v0 to all other vertices in G.

Algorithm 4.6
1. Algorithm ShortestPaths(v, cost, dist, n)
2. // dist[j], 1 ≤ j ≤ n, is set to the length of the shortest
3. // path from vertex v to vertex j in a digraph G with n
4. // vertices; dist[v] is set to zero. G is represented by its
5. // cost adjacency matrix cost[1:n, 1:n].
6. {
7.     for i := 1 to n do
8.     { // initialize S
9.         S[i] := false; dist[i] := cost[v, i];
10.    }
11.    S[v] := true; dist[v] := 0.0; // put v in S
12.    for num := 2 to n − 1 do
13.    {
14.        // Determine n − 1 paths from v.
15.        choose u from among those vertices not
16.        in S such that dist[u] is minimum;
17.        S[u] := true; // put u in S
18.        for (each w adjacent to u with S[w] = false) do
19.            // update distances
20.            if (dist[w] > (dist[u] + cost[u, w])) then
21.                dist[w] := dist[u] + cost[u, w];
22.    }
23. }

The above algorithm is also known as the greedy algorithm to generate shortest paths. In the function ShortestPaths (Algorithm 4.6) it is assumed that the n vertices of G are numbered 1 through n. The set S is maintained as a bit
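Algorithm 4.6 can be sketched in Python (0-indexed, with ∞ marking an absent edge). The four-vertex cost matrix below is a hypothetical digraph, not the graph of Fig. 4.2.

```python
INF = float('inf')

def shortest_paths(v, cost):
    """Dijkstra's algorithm in the O(n^2) form of Algorithm 4.6, using a
    cost adjacency matrix; returns dist, the shortest-path lengths from
    source v to every vertex (0-indexed)."""
    n = len(cost)
    S = [False] * n
    dist = cost[v][:]            # initialize dist from the source's row
    S[v] = True
    dist[v] = 0
    for _ in range(n - 1):
        # choose the closest vertex u not yet in S
        u = min((i for i in range(n) if not S[i]), key=lambda i: dist[i])
        S[u] = True
        for w in range(n):       # relax every edge (u, w)
            if not S[w] and dist[u] + cost[u][w] < dist[w]:
                dist[w] = dist[u] + cost[u][w]
    return dist

# Hypothetical 4-vertex digraph; INF marks edges not in E(G).
cost = [[0,   10,  INF, 5],
        [INF, 0,   1,   2],
        [INF, INF, 0,   INF],
        [INF, 3,   9,   0]]
print(shortest_paths(0, cost))  # → [0, 8, 9, 5]
```

Selecting u scans O(n) vertices and relaxing its edges scans another O(n), so the n − 1 iterations give the O(n²) bound derived below.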
array with S[i] = 0 if vertex i is not in S and S[i] = 1 if it is. It is assumed that the graph itself is represented by its cost adjacency matrix, with cost[i, j] being the weight of the edge <i, j>. The weight cost[i, j] is set to some large number, ∞, in case the edge <i, j> is not in E(G). For i = j, cost[i, j] can be set to any nonnegative number without affecting the outcome of the algorithm. The time taken by the algorithm on a graph with n vertices is O(n²). To see this, note that the for loop of line 7 in Algorithm 4.6 takes Θ(n) time. The for loop of line 12 is executed n − 2 times. Each execution of this loop requires O(n) time at lines 15 and 16 to select the next vertex and again at the for loop of line 18 to update dist. So the total time for this loop is O(n²). In case a list t of vertices currently not in S is maintained, then the number of nodes on this list would at any time be n − num. This would speed up lines 15 and 16 and the for loop of line 18, but the asymptotic time would remain O(n²).

Self Assessment Questions
7. The shortest path between v0 and some other node v is an _______ among a subset of the edges.
8. The time taken by the algorithm on a graph with n vertices is _______.
4.8 Summary
In this unit we studied the greedy method. The greedy algorithm for optimal storage on tapes takes Θ(n) computing time once the programs are sorted by length. The greedy algorithm for the knapsack problem determines the solution in O(n) time given the objects ordered by profit-to-weight ratio, while job sequencing with deadlines takes Θ(n²) time in the worst case. The greedy method for the optimal merge pattern has an O(n log n) computing time for Tree. The greedy method for the single source shortest paths problem has an O(n²) asymptotic time.
4.9 Terminal Questions
5. Do you agree that the single source shortest paths problem fits the ordering paradigm? Justify your answer.
6. Write an algorithm for the single source shortest path problem.
7. Write the Greedy-Knapsack algorithm.
4.10 Answers
Self Assessment Questions
1. Feasible
2. optimization measure
3. minimizes
4. positive
5. p1/w1 ≥ p2/w2 ≥ ... ≥ pn/wn
6. lchild, rchild and weight
7. ordering
8. O(n²)

Terminal Questions
1. The greedy method suggests that one can devise an algorithm that works in stages, considering one input at a time. At each stage, a decision is made regarding whether a particular input is in an optimal solution. This is done by considering the inputs in an order determined by some selection procedure. (Refer Section 4.2)
2. We assume that whenever a program is to be retrieved from the tape, the tape is initially positioned at the front. Hence, if the programs are stored in the order I = i1, i2, ..., in, the time t_j needed to retrieve program i_j is proportional to Σ_{1≤k≤j} l_ik, and the expected or mean retrieval time (MRT) is (1/n) Σ_{1≤j≤n} t_j. (Refer Section 4.3)
3. Refer Section 4.4
4. Refer Section 4.6
5. Refer Section 4.7
6. Refer Section 4.7
7. Refer Section 4.4