DAA Unit-IV

This document discusses greedy algorithms and dynamic programming. It provides examples of the activity selection problem and how the greedy algorithm of selecting activities by earliest finish time works. It then discusses Huffman coding and how it uses a greedy approach to assign variable-length codes to characters based on frequency. The steps of building a Huffman tree from character frequencies and assigning codes by traversing the tree are described. Finally, it provides a brief introduction to dynamic programming, noting that it solves problems by combining solutions to sub-problems in an optimal way by storing results of sub-problems to avoid recomputing.


Greedy and Dynamic Programming UNIT-4

1. GREEDY APPLICATION – Activity Selection:


Input: A list of intervals I = {I1, I2, . . . , In}, where each interval Ii = (si, fi) is defined by two integers: a starting time si and a finishing time fi, with si < fi. Two intervals Ii, Ij are compatible, i.e. disjoint, if they do not intersect (fi ≤ sj or fj ≤ si).
Output: A maximum subset of pairwise compatible (disjoint) intervals in I. A number of greedy heuristics fail quickly and miserably: the Greedy Early Start Time algorithm (processing intervals by non-decreasing start time s1 ≤ s2 ≤ . . . ≤ sn), the Greedy by Duration algorithm (processing intervals by non-decreasing duration (f1 − s1) ≤ (f2 − s2) ≤ . . . ≤ (fn − sn)), and the Greedy by Fewest Conflicts algorithm (for each job j, count the number of conflicting jobs cj and schedule in ascending order of cj). The Greedy Early Finish Time algorithm (EFT), which processes intervals by non-decreasing finish time f1 ≤ f2 ≤ . . . ≤ fn, does work, and it can be proved optimal.
Example: Suppose we have N = 8 jobs {A, B, C, D, E, F, G, H} with the following start and finish times (and, for the weighted variant discussed below, weights {w1, w2, ..., wN}).

Fig: Interval scheduling problem with 8 jobs


Pseudo code:
1) Sort the activities according to their finishing time.
2) Select the first activity from the sorted array and print it.
3) Do the following for the remaining activities in the sorted array:
a) If the start time of this activity is greater than or equal to the finish time of the previously selected activity, then select this activity and print it.
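A Python sketch of the pseudo code above (the job list is illustrative; it is not the exact data of the figure):

```python
def select_activities(intervals):
    """EFT greedy: intervals is a list of (name, start, finish);
    returns a maximum set of pairwise compatible activities."""
    chosen = []
    last_finish = float("-inf")
    # Step 1: sort the activities by finishing time.
    for name, s, f in sorted(intervals, key=lambda t: t[2]):
        # Step 3a: compatible if it starts no earlier than the last finish.
        if s >= last_finish:
            chosen.append(name)
            last_finish = f
    return chosen

jobs = [("A", 1, 4), ("B", 3, 5), ("C", 0, 6), ("D", 5, 7),
        ("E", 3, 9), ("F", 5, 9), ("G", 6, 10), ("H", 8, 11)]
print(select_activities(jobs))  # → ['A', 'D', 'H']
```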

Time Complexity is O(n log n), dominated by the initial sort.

For the weighted variant, sort and label the jobs by finishing time: f1 ≤ f2 ≤ . . . ≤ fN. To maximize profit we can either include the current job j in the optimal solution or exclude it. How do we find the profit when including the current job? The idea is to find the latest job before the current job (in the sorted array) that does not conflict with it. Once we find such a job, we recurse on all jobs up to that job and add the profit of the current job to the result.
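The include/exclude recursion described above is usually implemented bottom-up, with a binary search for the latest non-conflicting job. A sketch (the job data is made up; jobs touching at endpoints are treated as compatible):

```python
from bisect import bisect_right

def max_profit(jobs):
    """Weighted interval scheduling: jobs is a list of (start, finish, weight).
    best[k] is the maximum profit using only the first k jobs (by finish time)."""
    jobs = sorted(jobs, key=lambda j: j[1])      # sort by finishing time
    finishes = [f for _, f, _ in jobs]
    best = [0] * (len(jobs) + 1)
    for k, (s, f, w) in enumerate(jobs, start=1):
        # Latest job among the first k-1 that finishes no later than s.
        p = bisect_right(finishes, s, 0, k - 1)
        best[k] = max(best[k - 1],               # exclude job k
                      best[p] + w)               # include job k
    return best[-1]

print(max_profit([(1, 3, 5), (2, 5, 6), (4, 6, 5),
                  (6, 7, 4), (5, 8, 11), (7, 9, 2)]))  # → 17
```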

Blog: anilkumarprathipati.wordpress.com Page 1



Fig: Interval scheduling solution with 3 compatible jobs


Now, jobs B, E, H are completed without overlapping. The maximum number of jobs is three (observe the colour code of each job).

2. Huffman Coding:

Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length
codes to input characters, lengths of the assigned codes are based on the frequencies of corresponding
characters. The most frequent character gets the smallest code, and the least frequent character gets the
largest code.
The variable-length codes assigned to input characters are prefix codes, meaning the codes (bit sequences) are assigned in such a way that the code assigned to one character is never a prefix of the code assigned to any other character. This is how Huffman coding ensures that there is no ambiguity when decoding the generated bitstream.
Let us understand prefix codes with a counterexample. Let there be four characters a, b, c and d, with corresponding variable-length codes 00, 01, 0 and 1. This coding leads to ambiguity because the code assigned to c is a prefix of the codes assigned to a and b. If the compressed bit stream is 0001, the decompressed output may be "cccd", "ccb", "acd" or "ab", among others.
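The ambiguity can be checked mechanically with a brute-force decoder; note that it also finds "cad", another valid reading of 0001 under this (non-prefix) code table:

```python
def decodings(bits, codes, prefix=""):
    """Return every way to decode `bits` using the given code table."""
    if not bits:
        return [prefix]
    results = []
    for ch, code in codes.items():
        if bits.startswith(code):
            # Consume this codeword and decode the remainder.
            results.extend(decodings(bits[len(code):], codes, prefix + ch))
    return results

codes = {"a": "00", "b": "01", "c": "0", "d": "1"}  # c is a prefix of a and b
print(sorted(decodings("0001", codes)))
# → ['ab', 'acd', 'cad', 'ccb', 'cccd']
```

A proper prefix code would make this list a singleton for every bit stream.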
There are mainly two major parts in Huffman Coding
1. Build a Huffman Tree from input characters.
2. Traverse the Huffman Tree and assign codes to characters.

Steps to build Huffman Tree


Input is an array of unique characters along with their frequency of occurrences and output is Huffman
Tree.
1. Create a leaf node for each unique character and build a min heap of all leaf nodes (Min Heap
is used as a priority queue. The value of frequency field is used to compare two nodes in min
heap. Initially, the least frequent character is at root)
2. Extract the two nodes with the minimum frequency from the min heap.
3. Create a new internal node with a frequency equal to the sum of the two nodes' frequencies. Make the first extracted node its left child and the other extracted node its right child. Add this node to the min heap.
4. Repeat steps#2 and #3 until the heap contains only one node. The remaining node is the root
node and the tree is complete.


Example:
Character Frequency
a 5
b 9
c 12
d 13
e 16
f 45
Step 1: Build a min heap that contains 6 nodes, where each node represents the root of a tree with a single node.
Step 2: Extract the two minimum frequency nodes from the min heap. Add a new internal node with frequency 5 + 9 = 14.

Illustration of step 2
Now the min heap contains 5 nodes, where 4 nodes are roots of trees with a single element each, and one heap node is the root of a tree with 3 elements.
character Frequency
c 12
d 13
Internal Node 14
e 16
f 45
Step 3: Extract two minimum frequency nodes from heap. Add a new internal node with frequency 12
+ 13 = 25

Illustration of step 3
Now the min heap contains 4 nodes, where 2 nodes are roots of trees with a single element each, and two heap nodes are roots of trees with more than one node.
character Frequency
Internal Node 14
e 16
Internal Node 25
f 45


Step 4: Extract two minimum frequency nodes. Add a new internal node with frequency 14 + 16 = 30

Illustration of step 4
Now min heap contains 3 nodes.
character Frequency
Internal Node 25
Internal Node 30
f 45
Step 5: Extract two minimum frequency nodes. Add a new internal node with frequency 25 + 30 = 55

Illustration of step 5
Now min heap contains 2 nodes.
character Frequency
f 45
Internal Node 55
Step 6: Extract two minimum frequency nodes. Add a new internal node with frequency 45 + 55 = 100

Illustration of step 6


Now min heap contains only one node.


character Frequency
Internal Node 100
Since the heap contains only one node, the algorithm stops here.
Steps to print codes from Huffman Tree:
Traverse the tree formed starting from the root. Maintain an auxiliary array. While moving to the left
child, write 0 to the array. While moving to the right child, write 1 to the array. Print the array when a
leaf node is encountered.

Fig: Steps to print codes from the Huffman tree


The codes are as follows:
character code-word
f 0
c 100
d 101
a 1100
b 1101
e 111
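The whole construction can be reproduced with a short `heapq`-based sketch. Ties in the heap are broken here by insertion order; a different tie-breaking rule would give a different (but equally optimal) code table:

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman tree from a {char: frequency} dict and return the
    {char: code} table. Heap entries are (freq, tiebreak, char, left, right);
    internal nodes carry char None, so nodes themselves are never compared."""
    tick = count()
    heap = [(f, next(tick), ch, None, None) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)            # first minimum  -> left child
        hi = heapq.heappop(heap)            # second minimum -> right child
        heapq.heappush(heap, (lo[0] + hi[0], next(tick), None, lo, hi))
    codes = {}
    def walk(node, path):
        _, _, ch, left, right = node
        if ch is not None:                  # leaf: record accumulated bits
            codes[ch] = path or "0"
        else:
            walk(left, path + "0")          # left edge  = 0
            walk(right, path + "1")         # right edge = 1
    walk(heap[0], "")
    return codes

print(huffman_codes({"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}))
# → {'f': '0', 'c': '100', 'd': '101', 'a': '1100', 'b': '1101', 'e': '111'}
```

With the frequencies of the worked example this reproduces the code table above exactly.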
Time complexity: O(n log n), where n is the number of unique characters. If there are n nodes, extractMin() is called 2*(n – 1) times. extractMin() takes O(log n) time as it calls minHeapify(). So the overall complexity is O(n log n).
If the input array is already sorted by frequency, there exists a linear-time algorithm.
Applications of Huffman Coding:
1. They are used for transmitting fax and text.
2. They are used by conventional compression formats like PKZIP, GZIP, etc.
3. Multimedia codecs like JPEG, PNG, and MP3 use Huffman encoding (to be more precise
the prefix codes).
It is useful in cases where there is a series of frequently occurring characters.


3. Dynamic Programming:
Dynamic programming, like the divide-and-conquer method, solves problems by combining the
solutions to sub-problems. Dynamic programming is applicable when the sub-problems are not
independent, that is, when sub-problems share sub sub-problems. A dynamic-programming algorithm
solves every sub sub-problem just once and then saves its answer in a table, thereby avoiding the work of
re-computing the answer every time the sub sub-problem is encountered. Dynamic programming is
typically applied to optimization problems. The development of a dynamic-programming algorithm can
be broken into a sequence of four steps.
1. Characterize the structure of an optimal solution.
2. Recursively define the value of an optimal solution.
3. Compute the value of an optimal solution in a bottom-up fashion.
4. Construct an optimal solution from computed information.
The dynamic programming technique was developed by Bellman, based upon the principle known as the principle of optimality. Dynamic programming uses optimal substructure in a bottom-up fashion: we first find optimal solutions to sub-problems and, having solved the sub-problems, we find an optimal solution to the overall problem.

4. Application - Matrix chain multiplication:


Matrix chain multiplication is an example of dynamic programming. We are given a sequence (chain) A1, A2, ..., An of n matrices to be multiplied, and we wish to compute the product A1 x A2 x A3 x ... x An. We can evaluate the expression using the standard algorithm for multiplying pairs of matrices as a subroutine once we have parenthesized it to resolve all ambiguities in how the matrices are multiplied together. A product of matrices is fully parenthesized if it is either a single matrix or the product of two fully parenthesized matrix products, surrounded by parentheses. Matrix multiplication is associative, so all parenthesizations yield the same product. For example, if the chain of matrices is A1, A2, A3, A4, the product A1 x A2 x A3 x A4 can be fully parenthesized in five distinct ways:
(A1 (A2 (A3 A4))) ,
(A1 ((A2 A3) A4)) ,
((A1 A2) (A3 A4)) ,
((A1 (A2 A3)) A4) ,
(((A1 A2) A3) A4).
The way we parenthesize a chain of matrices can have a dramatic impact on the cost of evaluating the product. Consider first the cost of multiplying two matrices. We can multiply two matrices A and B only if they are compatible: the number of columns of A must equal the number of rows of B. If A is an m × n matrix and B is an n × q matrix, the resulting matrix C is an m × q matrix. The standard algorithm is given below.

Algorithm Matrix_Mul(A, B)
// A is an m × n matrix, B is an n × q matrix
{
if (number of columns of A ≠ number of rows of B) then
error "incompatible dimensions"
else
for i ← 1 to m do
for j ← 1 to q do
{
C[i, j] ← 0
for k ← 1 to n do
C[i, j] ← C[i, j] + A[i, k] * B[k, j]
}
return C
}


The time to compute C is dominated by the number of scalar multiplications, which is mnq.


Example 1: To illustrate the different costs incurred by different parenthesizations of a matrix product, consider the problem of computing the product A1 * A2 * A3 of three matrices. Suppose that the dimensions of the matrices are 10 × 100, 100 × 5, and 5 × 50, respectively.
((A1 A2) A3) = 10 * 100 * 5 + 10 * 5 * 50 = 5,000 + 2,500 = 7,500
(A1 (A2 A3)) = 100 * 5 * 50 + 10 * 100 * 50 = 25,000 + 50,000 = 75,000
Thus, computing the product according to the first parenthesization is 10 times faster.
Definition: The matrix-chain multiplication problem can be stated as follows: given a chain A1, A2, ..., An of n matrices, where for i = 1, 2, ..., n, matrix Ai has dimension Pi-1 × Pi, fully parenthesize the product A1 A2 ... An in a way that minimizes the number of scalar multiplications.
Note: In the matrix-chain multiplication problem, we are not actually multiplying matrices. Our goal is only to determine an order for multiplying the matrices that has the lowest cost.

Solving the matrix-chain multiplication problem by dynamic programming


Step 1: The structure of an optimal parenthesization. Our first step in the dynamic-programming paradigm is to find the optimal substructure and then use it to construct an optimal solution to the problem from optimal solutions to sub-problems. For the matrix-chain multiplication problem, we can perform this step as follows. Any parenthesization of the product Ai Ai+1 ... Aj must split the product between Ak and Ak+1 for some integer k in the range i ≤ k < j. That is, for some value of k, we first compute the matrices Ai..k and Ak+1..j and then multiply them together to produce the final product Ai..j. The cost of this parenthesization is thus the cost of computing the matrix Ai..k, plus the cost of computing Ak+1..j, plus the cost of multiplying them together.
Step 2: A recursive solution. Next, we define the cost of an optimal solution recursively in terms of the optimal solutions to sub-problems. For the matrix-chain multiplication problem, let Mij be the minimum number of scalar multiplications needed to compute Ai..j. If i = j, the chain Ai..j = Ai consists of a single matrix, so no scalar multiplications are necessary. We can define Mij recursively as follows:
Mij = 0 for i = j
Mij = min { Mi,k + Mk+1,j + Pi-1 Pk Pj : i ≤ k < j } for i < j
Step 3: Computing the optimal costs. We perform the third step of the dynamic-programming paradigm and compute the optimal cost using a tabular, bottom-up approach.

Step 4: Constructing an optimal solution. In the first level we compare M12 and M23. When M12 < M23 we parenthesize A1A2 in the product A1A2A3, i.e. (A1A2)A3; when M12 > M23 we parenthesize A2A3, i.e. A1(A2A3). This process is repeated until the whole product is parenthesized. The top entry in the table, i.e. M13, gives the optimum cost of the matrix chain multiplication.

Example:-Find an optimal parenthesization of a matrix-chain product whose dimensions are given in the
table below.


Matrix   Dimension
P        5 x 4
Q        4 x 6
R        6 x 2
T        2 x 7
Solution: Given
P0 = 5, P1 = 4, P2 = 6, P3 = 2, P4 = 7
The bottom level of the table is initialized first:
Mi,j = 0 where i = j

To compute Mij when i < j:

Mij = min { Mi,k + Mk+1,j + Pi-1 Pk Pj : i ≤ k < j }
Thus M12 = min{M11 + M22 + P0P1P2} = 0 + 0 + 5 * 4 * 6 = 120
M23 = min{M22 + M33 + P1P2P3} = 0 + 0 + 4 * 6 * 2 = 48
M34 = min{M33 + M44 + P2P3P4} = 0 + 0 + 6 * 2 * 7 = 84
M13 = min{M11 + M23 + P0P1P3, M12 + M33 + P0P2P3}
    = min{0 + 48 + 5 * 4 * 2, 120 + 0 + 5 * 6 * 2} = min{88, 180} = 88
M24 = min{M22 + M34 + P1P2P4, M23 + M44 + P1P3P4}
    = min{0 + 84 + 4 * 6 * 7, 48 + 0 + 4 * 2 * 7} = min{252, 104} = 104
M14 = min{M11 + M24 + P0P1P4, M12 + M34 + P0P2P4, M13 + M44 + P0P3P4}
    = min{0 + 104 + 5 * 4 * 7, 120 + 84 + 5 * 6 * 7, 88 + 0 + 5 * 2 * 7}
    = min{244, 414, 158} = 158

In the first level we compare M12, M23 and M34. As M23 = 48 is the minimum among the three, we parenthesize QR in the product PQRT, i.e. P(QR)T. In the second level we compare M13 and M24. As M13 = 88 is the minimum of the two, we parenthesize P and QR in the product PQRT, i.e. (P(QR))T. Finally we parenthesize the whole product, i.e. ((P(QR))T). The top entry in the table, i.e. M14, gives the optimum cost of ((P(QR))T).


Verification: The chain of matrices is P, Q, R, T; the product P x Q x R x T can be fully parenthesized in five distinct ways:
1. (P(Q(RT)))  2. (P((QR)T))  3. ((PQ)(RT))  4. ((P(QR))T)  5. (((PQ)R)T)
Cost of (P(Q(RT))) = 5*4*7 + 4*6*7 + 6*2*7 = 392
Cost of (P((QR)T)) = 5*4*7 + 4*6*2 + 4*2*7 = 244
Cost of ((PQ)(RT)) = 5*4*6 + 6*2*7 + 5*6*7 = 414
Cost of ((P(QR))T) = 5*4*2 + 4*6*2 + 5*2*7 = 158
Cost of (((PQ) R)T) = 5*4*6 + 5*6*2 + 5*2*7 = 250
The manual method also finds the optimal cost 158 and the order of matrix multiplication ((P(QR))T).

Algorithm Matrix_Chain_Mul(P)
{
for i ← 1 to n do
M[i, i] ← 0
for len ← 2 to n do
{
for i ← 1 to n - len + 1 do
{
j ← i + len - 1
M[i, j] ← ∞
for k ← i to j - 1 do
{
q ← M[i, k] + M[k + 1, j] + Pi-1 Pk Pj
if q < M[i, j] then
M[i, j] ← q
}
}
}
return M
}
Time Complexity: Matrix_Chain_Mul uses the first for loop to initialize M[i, i], which takes O(n). Each M[i, j] value is computed using the three nested for loops, which take O(n^3) in total. Thus the overall time complexity of Matrix_Chain_Mul is O(n^3).
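A Python sketch of Matrix_Chain_Mul, extended with a split table S (an addition to the pseudo code above) so that the optimal parenthesization can be reconstructed:

```python
def matrix_chain(P):
    """P[0..n] holds the dimensions: matrix Ai is P[i-1] x P[i].
    Returns (minimum scalar multiplications, optimal parenthesization)."""
    n = len(P) - 1
    M = [[0] * (n + 1) for _ in range(n + 1)]   # M[i][j], 1-based indices
    S = [[0] * (n + 1) for _ in range(n + 1)]   # S[i][j] = best split point k
    for length in range(2, n + 1):              # chain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            M[i][j] = float("inf")
            for k in range(i, j):
                q = M[i][k] + M[k + 1][j] + P[i - 1] * P[k] * P[j]
                if q < M[i][j]:
                    M[i][j], S[i][j] = q, k

    def paren(i, j):                            # rebuild parenthesization from S
        if i == j:
            return "A%d" % i
        k = S[i][j]
        return "(%s%s)" % (paren(i, k), paren(k + 1, j))

    return M[1][n], paren(1, n)

print(matrix_chain([5, 4, 6, 2, 7]))  # → (158, '((A1(A2A3))A4)')
```

With P = [5, 4, 6, 2, 7] (the P, Q, R, T example above) this reports cost 158 and the order ((P(QR))T), written here as ((A1(A2A3))A4).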

5. Longest Common Subsequence:

Here longest means the subsequence should be as long as possible. Common means the characters are shared between the two strings. A subsequence is formed by taking characters from a string in increasing order of position (not necessarily contiguously).

Application: comparison of two strings


Ex: X = {A B C B D A B}, Y = {B D C A B A}
A longest common subsequence is BCBA (length 4), taken in order from both strings:
X = A B C B D A B
Y = B D C A B A
A brute-force algorithm would compare each subsequence of X with the symbols in Y.
• If |X| = m, |Y| = n, then there are 2^m subsequences of X; we must compare each with Y (n comparisons each).
• So the running time of the brute-force algorithm is O(n 2^m).


• Notice that the LCS problem has optimal substructure: solutions of subproblems are parts of the
final solution.
• Subproblems: “find LCS of pairs of prefixes of X and Y”
• First, we’ll find the length of LCS. Later we’ll modify the algorithm to find LCS itself.
• Define Xi, Yj to be the prefixes of X and Y of length i and j respectively
• Define c[i,j] to be the length of LCS of Xi and Yj
• Then the length of LCS of X and Y will be c[m,n]
c[i  1, j  1]  1 if x[i ]  y[ j ],
c[i, j ]  
 max(c[i, j  1], c[i  1, j ]) otherwise
• We start with i = j = 0 (empty substrings of x and y)
• Since X0 and Y0 are empty strings, their LCS is always empty (i.e. c[0,0] = 0)
• LCS of empty string and any other string is empty, so for every i and j: c[0, j] = c[i,0] = 0
• When we calculate c[i,j], we consider two cases:
• First case: x[i] = y[j]: one more symbol in strings X and Y matches, so the length of LCS of Xi and Yj equals the length of LCS of the smaller strings Xi-1 and Yj-1, plus 1.
• Second case: x[i] ≠ y[j]: as the symbols don't match, our solution is not improved, and the length of LCS(Xi, Yj) is the same as before (i.e. the maximum of LCS(Xi, Yj-1) and LCS(Xi-1, Yj)).

LCS Algorithm:

LCS-Length(X, Y)
1. m = length(X) // get the # of symbols in X
2. n = length(Y) // get the # of symbols in Y
3. for i = 1 to m c[i,0] = 0 // special case: Y0
4. for j = 1 to n c[0,j] = 0 // special case: X0
5. for i = 1 to m // for all Xi
6. for j = 1 to n // for all Yj
7. if ( x[i] == y[j] )
8. c[i,j] = c[i-1,j-1] + 1
9. else c[i,j] = max( c[i-1,j], c[i,j-1] )
10. return c

Example:
X = ABCB, Y = BDCAB
Initially, the table as shown below


After filling the cells, finally the table as

LCS Algorithm Running Time


• LCS algorithm calculates the values of each entry of the array c[m,n]
• So what is the running time?
• O(m*n)
• since each c[i,j] is calculated in constant time, and there are m*n elements in the array
How to find actual LCS
• So far, we have just found the length of LCS, but not LCS itself.
• We want to modify this algorithm to make it output Longest Common Subsequence of X and Y
Each c[i,j] depends on c[i-1,j] and c[i,j-1], or on c[i-1,j-1].
For each c[i,j] we can say how it was acquired.
For example, here c[i,j] = c[i-1,j-1] + 1 = 2 + 1 = 3.

 So we can start from c[m,n] and go backwards


 Whenever c[i,j] = c[i-1, j-1]+1, remember x[i] (because x[i] is a part of LCS)
 When i=0 or j=0 (i.e. we reached the beginning), output remembered letters in reverse order.
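The table-filling and traceback steps above can be sketched together as:

```python
def lcs(X, Y):
    """Fill the c[i][j] length table bottom-up, then trace back from
    c[m][n] to recover one longest common subsequence."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    # Traceback: whenever the symbols match, that symbol is part of the LCS.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1])
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1                      # move up: LCS(Xi-1, Yj) is as good
        else:
            j -= 1                      # move left
    return c[m][n], "".join(reversed(out))

print(lcs("ABCB", "BDCAB"))           # the example above → (3, 'BCB')
print(lcs("ABCBDAB", "BDCABA")[0])    # → 4
```

The recovered string depends on how ties are broken in the traceback; other longest common subsequences of the same length may exist.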
