Unit 2 - Analysis and Design of Algorithm - WWW - Rgpvnotes.in
Tech
Subject Name: Analysis and Design of Algorithm
Subject Code: IT-403
Semester: 4th
Downloaded from be.rgpvnotes.in
Introduction:
The greedy method is perhaps the most straightforward design technique. It is used to determine a
feasible solution that may or may not be optimal.
The method:
Applicable to optimization problems ONLY
Constructs a solution through a sequence of steps
Each step expands a partially constructed solution so far, until a complete solution to the
problem is reached.
Locally optimal: each choice has to be the best local choice among all feasible choices.
Feasible solution: Most problems have n inputs, and a solution contains a subset of the inputs that
satisfies a given constraint (condition). Any subset that satisfies the constraint is called a feasible
solution.
Optimal solution: A feasible solution that either maximizes or minimizes a given objective
function is called an optimal solution.
The greedy method suggests that an algorithm works in stages, considering one input at a time. At
each stage, a decision is made regarding whether a particular input is in an optimal solution.
Greedy algorithms neither postpone nor revise their decisions (i.e., no backtracking).
Example: Kruskal's minimal spanning tree. Select an edge from a sorted list, check, decide, and never
visit it again.
Selection function that selects an input from a[] and removes it. The selected input's value is
assigned to x.
Feasible Boolean-valued function that determines whether x can be included into the solution
vector.
Union function that combines x with solution and updates the objective function.
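The three functions above can be combined into one control loop. The following is a minimal Python sketch, not the notes' own algorithm: the names greedy, select_largest, fits and add are hypothetical, and the toy problem (pick numbers, largest first, while the sum stays at most 10) is only an illustration of the select / feasible / union pattern.

```python
def greedy(inputs, select, feasible, union):
    """Generic greedy control loop: select, check feasibility, combine."""
    remaining = list(inputs)   # plays the role of a[]; copied so the caller's list is untouched
    solution = []
    while remaining:
        x = select(remaining)          # picks one input and removes it from remaining
        if feasible(solution, x):      # can x be added without breaking the constraint?
            solution = union(solution, x)
        # a rejected x is never reconsidered (no backtracking)
    return solution


# Hypothetical toy problem: choose numbers, largest first, keeping the sum at most 10.
def select_largest(a):
    x = max(a)
    a.remove(x)
    return x

def fits(solution, x):
    return sum(solution) + x <= 10

def add(solution, x):
    return solution + [x]

result = greedy([5, 3, 8, 1], select_largest, fits, add)
print(result)  # [8, 1]
```

The result is feasible, but as the introduction notes, nothing in the loop itself guarantees that it is optimal; that has to be argued separately for each problem.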
Since different pairings require different amounts of time, we want to determine an
optimal way of merging many files together. In the greedy strategy, at each step the two shortest sequences are merged.
Example
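Let us consider the given files f1, f2, f3, f4 and f5 with 20, 30, 10, 5 and 30 elements
respectively.
Merging the files one after another in the given order costs 50 + 60 + 65 + 95 = 270.
Merging them one after another in ascending order of size (f4, f3, f1, f2, f5) costs 15 + 35 + 65 + 95 = 210.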
In this context, we are now going to solve the problem using this algorithm.
[Figures: Initial Set, Step-1, Step-2, Step-3, Step-4]
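The rule "always merge the two shortest sequences" can be sketched in Python with a min heap. This is an illustrative sketch; the function name optimal_merge_cost is hypothetical, and the cost of one merge is assumed to be the sum of the two file sizes, as in the sums above.

```python
import heapq

def optimal_merge_cost(sizes):
    """Total cost of repeatedly merging the two smallest files."""
    heap = list(sizes)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)      # smallest remaining file
        b = heapq.heappop(heap)      # second smallest
        total += a + b               # cost of this merge
        heapq.heappush(heap, a + b)  # the merged file competes with the rest
    return total

print(optimal_merge_cost([20, 30, 10, 5, 30]))  # 205
```

For the five example files the merges cost 15 + 35 + 60 + 95 = 205, which is lower than both 270 and 210. The difference from the 210 case is that the merged file of size 35 is put back among the candidates, so the two files of size 30 are merged with each other first.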
Huffman Coding:
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to
input characters, where the length of each assigned code depends on the frequency of the corresponding
character. The most frequent character gets the shortest code and the least frequent character gets
the longest code.
The variable-length codes assigned to input characters are prefix codes: the codes (bit
sequences) are assigned in such a way that the code assigned to one character is never a prefix of the code
assigned to any other character. This is how Huffman coding makes sure that there is no ambiguity
when decoding the generated bit stream.
Example: Let there be four characters a, b, c and d, and let their corresponding variable-length codes be
00, 01, 0 and 1. This coding leads to ambiguity because the code assigned to c is a prefix of the codes assigned
to a and b.
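The prefix property can be checked mechanically. The sketch below is illustrative only; is_prefix_free is a hypothetical helper, and the second code table (fixed two-bit codes) is an assumed example that is not taken from the notes.

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of a different code word."""
    words = list(codes.values())
    for i, a in enumerate(words):
        for j, b in enumerate(words):
            if i != j and b.startswith(a):
                return False
    return True

ambiguous = {"a": "00", "b": "01", "c": "0", "d": "1"}   # codes from the example above
fixed = {"a": "00", "b": "01", "c": "10", "d": "11"}     # assumed two-bit codes

print(is_prefix_free(ambiguous))  # False
print(is_prefix_free(fixed))      # True
```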
The input is an array of unique characters along with their frequencies of occurrence, and the output is the Huffman
tree.
1. Create a leaf node for each unique character and build a min heap of all leaf nodes. (The min heap is
used as a priority queue. The value of the frequency field is used to compare two nodes in the min heap.
Initially, the least frequent character is at the root.)
2. Extract the two nodes with the minimum frequency from the min heap.
3. Create a new internal node with frequency equal to the sum of the two nodes' frequencies. Make
the first extracted node its left child and the other extracted node its right child. Add this node to
the min heap.
4. Repeat steps 2 and 3 until the heap contains only one node. The remaining node is the root node
and the tree is complete.
character Frequency
a 5
b 9
c 12
d 13
e 16
f 45
Step 1: Build a min heap that contains 6 nodes, where each node represents the root of a tree with a single
node.
Step 2: Extract the two minimum-frequency nodes from the min heap. Add a new internal node with frequency
5 + 9 = 14.
Now the min heap contains 5 nodes, where 4 nodes are roots of trees with a single element each, and one
heap node is the root of a tree with 3 elements.
character Frequency
c 12
d 13
Internal Node 14
e 16
f 45
Step 3: Extract the two minimum-frequency nodes from the heap. Add a new internal node with frequency 12
+ 13 = 25.
Now the min heap contains 4 nodes, where 2 nodes are roots of trees with a single element each, and two
heap nodes are roots of trees with more than one node.
character Frequency
Internal Node 14
e 16
Internal Node 25
f 45
Step 4: Extract two minimum frequency nodes. Add a new internal node with frequency 14 + 16 = 30
Step 5: Extract two minimum frequency nodes. Add a new internal node with frequency 25 + 30 = 55
Step 6: Extract two minimum frequency nodes. Add a new internal node with frequency 45 + 55 = 100
Since the heap contains only one node, the algorithm stops here.
character code-word
f 0
c 100
d 101
a 1100
b 1101
e 111
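The construction steps and the code-word table can be reproduced with a short Python sketch. It is a minimal illustration under stated assumptions: huffman_codes is a hypothetical name, leaves are plain characters, internal nodes are (left, right) tuples, the left branch is labelled 0 and the right branch 1, and a running counter breaks ties between equal frequencies.

```python
import heapq
from itertools import count

def huffman_codes(freq):
    """Build a Huffman tree from {char: frequency} and return {char: code}."""
    tie = count()  # tie-breaker so heap entries never compare the nodes themselves
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # first extracted node becomes the left child
        f2, _, right = heapq.heappop(heap)   # second extracted node becomes the right child
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: recurse into both children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: record the accumulated bit string
            codes[node] = prefix or "0"      # a lone character still gets one bit
    walk(root, "")
    return codes

freq = {"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}
print(huffman_codes(freq))
# {'f': '0', 'c': '100', 'd': '101', 'a': '1100', 'b': '1101', 'e': '111'}
```

The internal nodes are created with frequencies 14, 25, 30, 55 and 100, in the same order as Steps 2 to 6 above.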
The greedy method suggests that a minimum-cost spanning tree can be obtained by constructing the
tree edge by edge. The next edge to be included in the tree is the edge that results in a minimum
increase in the sum of the costs of the edges included so far.
There are two basic algorithms for finding minimum-cost spanning trees, and both are greedy
algorithms:
Prim's Algorithm
Kruskal's Algorithm
Prim’s Algorithm: Start with any one node in the spanning tree, and repeatedly add the cheapest
edge, and the node it leads to, for which the node is not already in the spanning tree.
Fig1:Graph
PRIM'S ALGORITHM:
i) Select an edge with minimum cost and include it in the spanning tree.
ii) Among all the edges adjacent to the tree built so far (one endpoint in the tree, the other
outside it), select the one with minimum cost.
iii) Repeat step ii until all n vertices and (n-1) edges have been included. The subgraph
obtained does not contain any cycles.
Note: At every stage a decision is made about an edge of minimum cost to be included in the
spanning tree, chosen from the edges adjacent to the tree built so far, so that at
every stage the subgraph obtained is a tree.
Fig3: Graph
near(j) := 0;
for k := 1 to n do // update near()
if ((near(k) ≠ 0) and (cost(k, near(k)) > cost(k, j)))
then near(k) := j;
}
return mincost;
}
The algorithm takes four arguments: E, the set of edges; cost, an n x n adjacency matrix in which cost(i,j) is a positive
integer if an edge exists between i and j, and infinity otherwise; n, the number of vertices; and t, an (n-1) x 2 matrix
which holds the edges of the spanning tree.
E = { (1,2), (1,6), (2,3), (3,4), (4,5), (4,7), (5,6), (5,7), (2,7) }
Vertices: {1,2,3,4,5,6,7} (n = 7)
Fig4: Example
1. The algorithm starts with a tree that includes only a minimum-cost edge of G. Then edges
are added to this tree one by one.
2. The next edge (i,j) to be added is such that i is a vertex which is already included in the
tree, j is a vertex not yet included in the tree, and the cost of (i,j) is minimum among all
such edges.
3. With each vertex j not yet included in the tree, we assign a value near(j). The value near(j)
represents a vertex in the tree such that cost(j, near(j)) is minimum among all choices
for near(j).
4. We define near(j) := 0 for all the vertices j that are already in the tree.
5. The next edge to include is defined by the vertex j such that near(j) ≠ 0 and cost(j,
near(j)) is minimum.
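The five points above can be sketched in Python using the near() array. This is an illustrative sketch, not the notes' full pseudocode: prim_mst is a hypothetical name, vertices are numbered 1..n, and the edge costs in the usage example are assumed values chosen for illustration (only the edge set E comes from the notes).

```python
import math

def prim_mst(n, edges):
    """Prim's algorithm with a near[] array; edges are (u, v, cost), vertices 1..n."""
    INF = math.inf
    cost = [[INF] * (n + 1) for _ in range(n + 1)]
    for u, v, w in edges:
        cost[u][v] = cost[v][u] = w

    k, l, w = min(edges, key=lambda e: e[2])   # point 1: start with a minimum-cost edge
    tree, mincost = [(k, l)], w

    # point 3: near[j] is the tree vertex closest to j; point 4: 0 marks tree vertices
    near = [0] * (n + 1)
    for j in range(1, n + 1):
        near[j] = k if cost[j][k] < cost[j][l] else l
    near[k] = near[l] = 0

    for _ in range(n - 2):                     # n - 2 more edges are needed
        # point 5: pick j outside the tree with minimum cost(j, near(j))
        j = min((v for v in range(1, n + 1) if near[v] != 0),
                key=lambda v: cost[v][near[v]])
        if cost[j][near[j]] == INF:
            raise ValueError("graph is not connected")
        tree.append((j, near[j]))
        mincost += cost[j][near[j]]
        near[j] = 0
        for v in range(1, n + 1):              # update near() for the remaining vertices
            if near[v] != 0 and cost[v][near[v]] > cost[v][j]:
                near[v] = j
    return mincost, tree

# Edge set E from the notes; the costs are assumed for illustration.
edges = [(1, 2, 28), (1, 6, 10), (2, 3, 16), (3, 4, 12), (4, 5, 22),
         (4, 7, 18), (5, 6, 25), (5, 7, 24), (2, 7, 14)]
mincost, tree = prim_mst(7, edges)
print(mincost)  # 99
```

Each of the n - 2 iterations scans all n vertices twice, which is where the quadratic running time of this formulation comes from.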
Analysis:
The time required by Prim's algorithm depends on the number of vertices. If a graph G
has n vertices, the near()-array formulation over a cost matrix takes O(n^2) time.
Kruskal’s Algorithm: Start with no nodes or edges in the spanning tree, and repeatedly add the
cheapest edge that does not create a cycle.
In Kruskal's algorithm for determining the spanning tree, we arrange the edges in increasing order
of cost.
i) The edges are considered one by one in that order. Each edge is deleted from the graph and is
included in the spanning tree if it does not form a cycle.
ii) At every stage an edge is included; the subgraph at a stage need not be a tree. In fact, it is a
forest.
iii) At the end, if we have included n vertices and n-1 edges without forming cycles, then we get a
single connected component without any cycles, i.e. a tree with minimum cost.
At every stage, as we include an edge in the spanning tree, we get disconnected trees represented
by various sets. While including an edge in the spanning tree we need to check that it does not form a
cycle. Inclusion of an edge (i,j) will form a cycle if i and j are both in the same set. Otherwise the edge can be
included in the spanning tree.
Kruskal minimum spanning tree algorithm
Algorithm Kruskal (E, cost, n, t)
// E is the set of edges in G. G has n vertices.
// cost(u,v) is the cost of edge (u,v). t is the set
// of edges in the minimum cost spanning tree.
// The final cost is returned.
{ construct a heap out of the edge costs using heapify;
for i := 1 to n do parent(i) := -1; // place in different sets
// each vertex is in a different set {1} {2} {3} ...
i := 0; mincost := 0.0;
while ((i < n-1) and (heap not empty)) do
{
delete a minimum cost edge (u,v) from the heap and reheapify using adjust;
j := find(u); k := find(v);
if (j ≠ k) then
{ i := i+1;
t(i,1) := u; t(i,2) := v;
mincost := mincost + cost(u,v);
union(j,k);
}
}
if (i ≠ n-1) then write ("No spanning tree");
else return mincost;
}
Fig5: Graph
Consider the above graph. Using Kruskal's method, the edges of this graph are considered for
inclusion in the minimum cost spanning tree in the order (1, 2), (3, 6), (4, 6), (2, 6), (1, 4), (3, 5), (2, 5),
(1, 5), (2, 3), and (5, 6). This corresponds to the cost sequence 10, 15, 20, 25, 30, 35, 40, 45, 50, 55.
The first four edges are included in T. The next edge to be considered is (1, 4). This edge connects two
vertices already connected in T and so it is rejected. Next, the edge (3, 5) is selected and that
completes the spanning tree.
Analysis: If the number of edges in the graph is |E|, then the time for Kruskal's algorithm is
O(|E| log |E|).
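The worked example can be checked with a short Python sketch of Kruskal's method. It is an illustration under stated assumptions: kruskal is a hypothetical name, Python's heapq module stands in for the heapify/adjust routines, and the find/union structure uses parent[x] = x for roots (with path halving) rather than the -1 convention of the pseudocode.

```python
import heapq

def kruskal(n, edges):
    """Kruskal's algorithm; edges are (u, v, cost), vertices are numbered 1..n."""
    heap = [(w, u, v) for u, v, w in edges]
    heapq.heapify(heap)
    parent = list(range(n + 1))       # every vertex starts in its own set

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps the trees shallow
            x = parent[x]
        return x

    tree, mincost = [], 0
    while heap and len(tree) < n - 1:
        w, u, v = heapq.heappop(heap)  # cheapest remaining edge
        ru, rv = find(u), find(v)
        if ru != rv:                   # different sets: the edge creates no cycle
            parent[ru] = rv            # union of the two sets
            tree.append((u, v))
            mincost += w
    if len(tree) != n - 1:
        return None                    # no spanning tree exists
    return mincost, tree

# Edges and costs in the order listed in the example above.
edges = [(1, 2, 10), (3, 6, 15), (4, 6, 20), (2, 6, 25), (1, 4, 30),
         (3, 5, 35), (2, 5, 40), (1, 5, 45), (2, 3, 50), (5, 6, 55)]
print(kruskal(6, edges))
# (105, [(1, 2), (3, 6), (4, 6), (2, 6), (3, 5)])
```

Edge (1, 4) is popped fifth and rejected because vertices 1 and 4 are already in the same set, which matches the description above.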
Knapsack problem
The knapsack problem or rucksack (bag) problem is a problem in combinatorial optimization: given a
set of items, each with a weight and a value, determine the number of each item to include in a
collection so that the total weight is less than or equal to a given limit and the total value is as large as
possible.
Dynamic Programming: Solve each subproblem once and store the solutions in an
array.
Fractional Knapsack
In this version of the knapsack problem, items can be broken into smaller pieces, so the thief may take
only a fraction $x_i$ of the $i$-th item, where $0 \le x_i \le 1$. The $i$-th item contributes the weight $x_i w_i$ to the total weight in the
knapsack and the profit $x_i p_i$ to the total profit. Hence, the objective of this algorithm is to
maximize $\sum_{i=1}^{n} x_i p_i$ subject to $\sum_{i=1}^{n} x_i w_i \le W$.
It is clear that an optimal solution must fill the knapsack exactly (provided the total weight of all items is at least W); otherwise we could add a fraction of
one of the remaining items and increase the overall profit. Thus, an optimal solution satisfies
$\sum_{i=1}^{n} x_i w_i = W$. In this context, we first need to sort the items according to the value of $p_i / w_i$, so
that $p_{i+1} / w_{i+1} \le p_i / w_i$. Here, x is an array that stores the fraction taken of each item.
Analysis
If the provided items are already sorted in decreasing order of $p_i / w_i$, then the while loop takes
time in O(n). Therefore, the total time including the sort is in O(n log n).
Example
Let us consider that the capacity of the knapsack is W = 60 and the list of provided items is shown in
the following table.
Item                A     B     C     D
Profit              280   100   120   120
Weight              40    10    20    24
Ratio ($p_i/w_i$)   7     10    6     5
The provided items are not sorted by $p_i/w_i$. After sorting in decreasing order of the ratio, the items are as shown in the following table.
Item                B     A     C     D
Profit              100   280   120   120
Weight              10    40    20    24
Ratio ($p_i/w_i$)   10    7     6     5
Solution
First, all of B is chosen, as the weight of B is less than the capacity of the knapsack. Next, item A is chosen,
as the available capacity of the knapsack (50) is greater than the weight of A. Now, C is chosen as the next
item. However, the whole item cannot be chosen, as the remaining capacity of the knapsack (10) is less
than the weight of C, so only the fraction 10/20 of C is taken.
Now the total weight of the selected items equals the capacity of the knapsack. Hence, no more items can be
selected.
The total profit is 100 + 280 + 120 * (10/20) = 380 + 60 = 440.
This is the optimal solution. We cannot gain more profit by selecting any different combination of items.
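The ratio-based strategy can be written as a short Python sketch. It is illustrative only: fractional_knapsack is a hypothetical name and items are assumed to be (name, profit, weight) tuples with positive weights.

```python
def fractional_knapsack(capacity, items):
    """Greedy fractional knapsack; items are (name, profit, weight) tuples."""
    # Consider items in decreasing order of profit/weight ratio.
    order = sorted(items, key=lambda it: it[1] / it[2], reverse=True)
    remaining = capacity
    total = 0.0
    fractions = {}                      # x_i for every item that is (partly) taken
    for name, profit, weight in order:
        if remaining <= 0:
            break                       # knapsack is full
        take = min(weight, remaining)   # whole item, or whatever still fits
        fractions[name] = take / weight
        total += profit * take / weight
        remaining -= take
    return total, fractions

items = [("A", 280, 40), ("B", 100, 10), ("C", 120, 20), ("D", 120, 24)]
print(fractional_knapsack(60, items))
# (440.0, {'B': 1.0, 'A': 1.0, 'C': 0.5})
```

Item D never enters the solution because the knapsack is already full after half of C has been taken.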
Consider a knapsack of capacity 20. Determine the optimum strategy for placing the objects into the
knapsack. The problem can be solved by the greedy approach, in which the inputs are arranged
according to a selection criterion (the greedy strategy) and the problem is solved in stages. The various greedy
strategies for the problem could be as follows.
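(x1, x2, x3)      Total weight                              Total profit
(0, 2/3, 1)       2/3 x 15 + 1 x 10 = 20                    2/3 x 24 + 1 x 15 = 31
(0, 1, 1/2)       1 x 15 + 1/2 x 10 = 20                    1 x 24 + 1/2 x 15 = 31.5
(1/2, 1/3, 1/4)   1/2 x 18 + 1/3 x 15 + 1/4 x 10 = 16.5     1/2 x 25 + 1/3 x 24 + 1/4 x 15 = 12.5 + 8 + 3.75 = 24.25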
Analysis: If we do not count the time required for sorting the inputs, then each of the three greedy
strategies has complexity O(n).
In the job sequencing problem, the objective is to find a sequence of jobs that are completed within their
deadlines and give maximum profit. Let us consider a set of n given jobs, each associated with a
deadline; a profit is earned if a job is completed by its deadline. These jobs need to be ordered in
such a way that the profit is maximum. It may happen that not all of the given jobs can be
completed within their deadlines. Assume the deadline of the ith job Ji is di and the profit received from this
job is pi. Hence, the optimal solution of this problem is a feasible solution with maximum profit.
Method:
There is a set of n jobs. For any job i, there is an integer deadline di and a profit Pi > 0; the profit Pi is earned iff
the job is completed by its deadline.
To complete a job, one has to process the job on a machine for one unit of time. Only one machine is
available for processing jobs.
A feasible solution for this problem is a subset J of jobs such that each job in this subset can be
completed by its deadline.
The value of a feasible solution J is the sum of the profits of the jobs in J, i.e., $\sum_{i \in J} P_i$.
The problem involves identifying a subset of jobs that can be completed by their deadlines.
Therefore the problem suits the subset paradigm and can be solved by the greedy method.
Example
Let us consider a set of given jobs as shown in the following table. We have to find a sequence of jobs
which will be completed within their deadlines and will give maximum profit. Each job is associated
with a deadline and a profit.
Job J1 J2 J3 J4 J5
Deadline 2 1 3 2 1
Profit 60 100 20 40 20
Solution
To solve this problem, the given jobs are sorted according to their profit in a descending order. Hence,
after sorting, the jobs are ordered as shown in the following table.
Job J2 J1 J4 J3 J5
Deadline 1 2 2 3 1
Profit 100 60 40 20 20
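From this set of jobs, first we select J2, as it can be completed within its deadline and contributes
maximum profit.
Next, J1 is selected, as it gives more profit than J4 and can still be completed by its deadline of 2.
In the next clock, J4 cannot be selected as its deadline is over, hence J3 is selected as it executes
within its deadline. J5 is discarded, as it cannot be executed within its deadline.
Thus, the solution is the sequence of jobs (J2, J1, J3), which are executed within their deadlines
and give the maximum profit, 100 + 60 + 20 = 180.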
Algorithm JS(d, j, n)
// d: deadlines, j: subset of jobs, n: total number of jobs
// d[i] ≥ 1, 1 ≤ i ≤ n are the deadlines,
// the jobs are ordered such that p[1] ≥ p[2] ≥ ... ≥ p[n]
// j[i] is the ith job in the optimal solution, 1 ≤ i ≤ k
{
d[0] := j[0] := 0;
j[1] := 1;
k := 1;
for i := 2 to n do {
r := k;
while ((d[j[r]] > d[i]) and (d[j[r]] ≠ r)) do
r := r-1;
if ((d[j[r]] ≤ d[i]) and (d[i] > r)) then
{
for q := k to (r+1) step -1 do j[q+1] := j[q];
j[r+1] := i;
k := k+1;
}
}
return k;
}
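A runnable Python sketch of greedy job sequencing follows. Note that it is a different, slot-based variant and not a transcription of the insertion-based pseudocode: jobs are taken in decreasing order of profit and each is placed in the latest free time slot at or before its deadline. The name job_sequencing and the (name, deadline, profit) tuple layout are assumptions made for illustration.

```python
def job_sequencing(jobs):
    """Greedy job sequencing; jobs are (name, deadline, profit), one time unit each."""
    max_deadline = max(d for _, d, _ in jobs)
    slots = [None] * (max_deadline + 1)      # slots[t] holds the job run in time slot t (1-based)
    total = 0
    # Consider jobs in decreasing order of profit (ties keep their input order).
    for name, deadline, profit in sorted(jobs, key=lambda j: j[2], reverse=True):
        for t in range(deadline, 0, -1):     # latest free slot not after the deadline
            if slots[t] is None:
                slots[t] = name
                total += profit
                break                        # job scheduled; otherwise it is discarded
    schedule = [s for s in slots[1:] if s is not None]
    return schedule, total

jobs = [("J1", 2, 60), ("J2", 1, 100), ("J3", 3, 20), ("J4", 2, 40), ("J5", 1, 20)]
print(job_sequencing(jobs))
# (['J2', 'J1', 'J3'], 180)
```

For the example table, J4 and J5 are discarded because every slot up to their deadlines is already taken when they are considered.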
Graphs can be used to represent the highway structure of a state or country with vertices
representing cities and edges representing sections of highway.
The edges have assigned weights which may be either the distance between the 2 cities
connected by the edge or the average time to drive along that section of highway.
For example, if a motorist wishes to drive from city A to city B, then we must answer the following
questions:
o Is there a path from A to B?
o If there is more than one path from A to B, which is the shortest path?
The length of a path is defined to be the sum of the weights of the edges on that path.
Given a directed graph G(V,E) with edge weights w(u,v), we have to find a shortest path from the source
vertex S∈V to every other vertex v∈V − {S}.
To find single-source shortest paths (SSSP) in a directed graph G(V,E) there are two different algorithms:
Bellman-Ford Algorithm
Dijkstra's algorithm
1. Bellman-Ford Algorithm: allows negative-weight edges in the input graph. This algorithm either finds
shortest paths from the source vertex S∈V to every other vertex v∈V, or detects a negative-weight cycle in G
that is reachable from S, in which case there is no solution. If no negative-weight cycle is reachable from the source vertex S∈V,
shortest paths to every other reachable vertex v∈V are well defined.
2. Dijkstra's algorithm: allows only non-negative edge weights in the input graph and finds a shortest path
from the source vertex S∈V to every other vertex v∈V.
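The behaviour of Bellman-Ford described in point 1 can be sketched as follows. This is a minimal illustration: bellman_ford is a hypothetical name, edges are assumed to be (u, v, weight) triples, and both small graphs in the usage example are made up for demonstration.

```python
import math

def bellman_ford(vertices, edges, source):
    """Return {vertex: distance}, or None if a negative cycle is reachable from source."""
    dist = {v: math.inf for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):       # at most |V| - 1 rounds of relaxation
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:        # relax edge (u, v)
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break                            # distances are already final
    for u, v, w in edges:                    # one more pass detects a negative cycle
        if dist[u] + w < dist[v]:
            return None
    return dist

# Hypothetical graph with one negative edge but no negative cycle.
print(bellman_ford([1, 2, 3, 4], [(1, 2, 4), (1, 3, 5), (3, 2, -3), (2, 4, 2)], 1))
# {1: 0, 2: 2, 3: 5, 4: 4}

# Hypothetical graph where the cycle 2 -> 3 -> 2 has total weight -1.
print(bellman_ford([1, 2, 3], [(1, 2, 1), (2, 3, -2), (3, 2, 1)], 1))
# None
```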
Consider the above directed graph: if node 1 is the source vertex, then the shortest path from 1 to
2 is 1,4,5,2. Its length is 10+15+20=45.
To formulate a greedy algorithm to generate the shortest paths, we must conceive of a
multistage solution to the problem and also of an optimization measure.
This is possible by building the shortest paths one by one.
As an optimization measure we can use the sum of the lengths of all paths generated so far.
If we have already constructed i shortest paths, then using this optimization measure, the
next path to be constructed should be the next shortest minimum-length path.
The greedy way to generate the shortest paths from v0 to the remaining vertices is to generate
these paths in non-decreasing order of path length.
For this, first a shortest path to the nearest vertex is generated. Then a shortest path to the second
nearest vertex is generated, and so on.
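Generating the paths in non-decreasing order of length is what Dijkstra's algorithm does. The sketch below is illustrative: dijkstra is a hypothetical name, the graph is an adjacency dictionary, and the small graph is assumed for demonstration. It contains the edges 1→4 (10), 4→5 (15) and 5→2 (20) from the path quoted above, plus an assumed direct edge 1→2 of weight 50; the full graph of the figure is not reproduced here.

```python
import heapq
import math

def dijkstra(graph, source):
    """graph: {u: [(v, weight), ...]} with non-negative weights.
    Returns (distances, order in which vertices were finalized)."""
    dist = {v: math.inf for v in graph}
    dist[source] = 0
    done, order = set(), []
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)        # nearest vertex not yet finalized
        if u in done:
            continue                      # stale heap entry, skip it
        done.add(u)
        order.append(u)
        for v, w in graph[u]:
            if d + w < dist[v]:           # shorter path to v found through u
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist, order

# Assumed graph for illustration (see the lead-in above).
graph = {1: [(2, 50), (4, 10)], 2: [], 4: [(5, 15)], 5: [(2, 20)]}
dist, order = dijkstra(graph, 1)
print(dist)   # {1: 0, 2: 45, 4: 10, 5: 25}
print(order)  # [1, 4, 5, 2]
```

The vertices are finalized in the order 1, 4, 5, 2, with path lengths 0, 10, 25 and 45, that is, in non-decreasing order of path length.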