DAA Module 3
The greedy method is a straightforward design technique applicable to a wide variety of applications.
The greedy approach suggests constructing a solution through a sequence of steps, each expanding a partially constructed solution obtained so far, until a complete solution to the problem is reached. On each step the choice made must be:
feasible, i.e., it has to satisfy the problem’s constraints
locally optimal, i.e., it has to be the best local choice among all feasible choices available on that step
irrevocable, i.e., once made, it cannot be changed on subsequent steps of the algorithm
As a rule, greedy algorithms are both intuitively appealing and simple. Given an optimization problem, it is usually easy to figure out how to proceed in a greedy manner, possibly after considering a few small instances of the problem. What is usually more difficult is to prove that a greedy algorithm yields an optimal solution (when it does).
Coin Change Problem
Problem Statement: Given coins of several denominations, find a way to give a customer an amount with the fewest number of coins.
Example: If the denominations are 1, 5, 10, 25 and 100 and the change required is 30, some feasible solutions are:
Amount: 30
Solutions: 3 x 10 (3 coins)
6 x 5 (6 coins)
1 x 25 + 5 x 1 (6 coins)
1 x 25 + 1 x 5 (2 coins)
The last solution is the optimal one, as it gives the change with only 2 coins.
The greedy solution for the coin change problem is very intuitive and is called the cashier’s algorithm. Its basic principle is: at every step, take the largest coin that fits into the amount remaining to be changed at that particular time. For denomination systems like the one above this ends with an optimal solution, although for arbitrary denominations the greedy choice can fail to be optimal.
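A minimal Python sketch of the cashier’s algorithm under this principle (the function name and instances are illustrative, not from the text):

def cashiers_change(amount, denominations):
    """Greedy change-making: repeatedly take the largest coin that
    still fits into the amount remaining to be changed."""
    coins = []
    for d in sorted(denominations, reverse=True):
        while amount >= d:        # take coin d as long as it fits
            amount -= d
            coins.append(d)
    return coins

# The example above: denominations 1, 5, 10, 25, 100 and amount 30
print(cashiers_change(30, [1, 5, 10, 25, 100]))   # -> [25, 5], i.e., 2 coins
# Caution: for non-canonical denominations such as {1, 3, 4} and amount 6,
# the greedy choice gives [4, 1, 1] although [3, 3] is optimal.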
Knapsack Problem
There are several greedy methods for obtaining feasible solutions to the (fractional) knapsack problem.
a) At each step, fill the knapsack with the object of largest profit. If the object under consideration does not fit, then a fraction of it is included to fill the knapsack. This method does not always result in an optimal solution. As per this method, the solution to the above problem is as follows:
Select Item-1 with profit p1 = 25; here w1 = 18, x1 = 1. Remaining capacity = 20 − 18 = 2
Select Item-2 with profit p2 = 24; here w2 = 15, x2 = 2/15. Remaining capacity = 0
Total profit earned = 25 + 24 × (2/15) = 28.2. This is the 2nd solution in example 4.1.
Algorithm: The algorithm given below assumes that the objects are sorted in non-increasing order of their profit/weight ratio.
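A hedged Python sketch of this greedy procedure; unlike the statement above, the sketch sorts the objects by profit/weight ratio itself rather than assuming them pre-sorted. The three-item instance p = (25, 24, 15), w = (18, 15, 10), capacity 20 is an assumption, chosen to be consistent with the partial numbers quoted from example 4.1:

def fractional_knapsack(profits, weights, capacity):
    """Greedy fractional knapsack: consider objects in non-increasing
    profit/weight ratio; take each whole object if it fits, otherwise
    take the fraction that fills the remaining capacity."""
    n = len(profits)
    order = sorted(range(n), key=lambda i: profits[i] / weights[i],
                   reverse=True)
    x = [0.0] * n                 # x[i] = fraction of object i taken
    total = 0.0
    for i in order:
        if capacity <= 0:
            break
        take = min(weights[i], capacity)
        x[i] = take / weights[i]
        total += profits[i] * x[i]
        capacity -= take
    return x, total

# Assumed instance consistent with the numbers quoted above:
print(fractional_knapsack([25, 24, 15], [18, 15, 10], 20))
# -> ([0.0, 1.0, 0.5], 31.5), better than the 28.2 of strategy (a)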
Analysis:
Disregarding the time to initially sort the objects, each of the above strategies uses O(n) time.
Note: The greedy approach to solve this problem does not necessarily yield an optimal solution.
Job Sequencing with Deadlines
The greedy strategy for the job sequencing problem is: at each step, select the job that satisfies the constraints and gives the maximum profit, i.e., consider the jobs in non-increasing order of the pi’s.
By following this procedure, we get the 3rd solution in example 4.3. It can be proved that this greedy strategy always results in an optimal solution.
Algorithm/Program 4.6: Greedy algorithm for sequencing unit time jobs with deadlines and
profits
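A hedged Python sketch of this strategy for unit-time jobs given as profit and deadline arrays (names and the sample instance are illustrative, not the text’s Program 4.6):

def job_sequencing(profits, deadlines):
    """Greedy job sequencing: consider jobs in non-increasing order of
    profit; run each job in the latest still-free slot at or before its
    deadline, rejecting it if no such slot exists."""
    order = sorted(range(len(profits)), key=lambda j: profits[j],
                   reverse=True)
    slot = [None] * (max(deadlines) + 1)    # slot[t] = job run in period t
    total = 0
    for j in order:
        t = deadlines[j]
        while t >= 1 and slot[t] is not None:
            t -= 1                          # look for an earlier free slot
        if t >= 1:
            slot[t] = j                     # job j is feasible: schedule it
            total += profits[j]
    return [j for j in slot[1:] if j is not None], total

# A standard instance: profits (100, 10, 15, 27), deadlines (2, 1, 2, 1)
print(job_sequencing([100, 10, 15, 27], [2, 1, 2, 1]))
# -> ([3, 0], 127): jobs 4 and 1 (0-indexed as 3 and 0), total profit 127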
Analysis: In a straightforward implementation, each of the n jobs may be checked against up to n previously filled slots, so the algorithm runs in O(n²) time in the worst case.
Prim’s Algorithm
Prim’s algorithm constructs a minimum spanning tree through a sequence of expanding subtrees. The initial subtree in such a sequence consists of a single vertex selected arbitrarily from the set V of the graph’s vertices. On each iteration, it expands the current tree in the greedy manner by simply attaching to it the nearest vertex not in that tree. (By the nearest vertex, we mean a vertex not in the tree connected to a vertex in the tree by an edge of the smallest weight. Ties can be broken arbitrarily.) The algorithm stops after all the graph’s vertices have been included in the tree being constructed. Since the algorithm expands a tree by exactly one vertex on each of its iterations, the total number of such iterations is n − 1, where n is the number of vertices in the graph. The tree generated by the algorithm is obtained as the set of edges used for the tree’s expansions.
Correctness
Prim’s algorithm always yields a minimum spanning tree.
Analysis of Efficiency
The efficiency of Prim’s algorithm depends on the data structures chosen for the graph itself and for the priority queue of the set V − VT, whose vertex priorities are the distances to the nearest tree vertices.
1. If a graph is represented by its weight matrix and the priority queue is implemented as an unordered array, the algorithm’s running time will be in Θ(|V|²). Indeed, on each of the |V| − 1 iterations, the array implementing the priority queue is traversed to find and delete the minimum and then to update, if necessary, the priorities of the remaining vertices.
We can implement the priority queue as a min-heap. (A min-heap is a complete binary tree in which every element is less than or equal to its children.) Deletion of the smallest element from and insertion of a new element into a min-heap of size n are O(log n) operations.
2. If a graph is represented by its adjacency lists and the priority queue is implemented as a min-heap, the running time of the algorithm is in O(|E| log |V|).
This is because the algorithm performs |V| − 1 deletions of the smallest element and makes |E| verifications and, possibly, changes of an element’s priority in a min-heap of size not exceeding |V|. Each of these operations, as noted earlier, is an O(log |V|) operation. Hence, the running time of this implementation of Prim’s algorithm is in
(|V| − 1 + |E|) O(log |V|) = O(|E| log |V|)
because, in a connected graph, |V| − 1 ≤ |E|.
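A minimal Python sketch of the adjacency-list/min-heap variant just analyzed, using the standard heapq module. Since heapq provides no decrease-key operation, this sketch uses lazy deletion (outdated heap entries are simply skipped), which preserves the O(|E| log |V|) bound; names and the sample graph are illustrative:

import heapq

def prim_mst(adj, start=0):
    """adj: {u: [(v, weight), ...]} for a connected undirected graph.
    Returns the MST as a list of (u, v, weight) edges."""
    in_tree = {start}
    mst = []
    heap = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)   # lightest edge leaving the tree
        if v in in_tree:
            continue                    # outdated entry: skip it
        in_tree.add(v)
        mst.append((u, v, w))
        for x, wx in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (wx, v, x))
    return mst

g = {0: [(1, 2), (2, 3)], 1: [(0, 2), (2, 1)], 2: [(0, 3), (1, 1)]}
print(prim_mst(g))                      # -> [(0, 1, 2), (1, 2, 1)]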
Kruskal’s Algorithm
Kruskal’s algorithm, named after Joseph Kruskal, is another greedy algorithm for the minimum spanning tree problem that also always yields an optimal solution. It looks at a minimum spanning tree of a weighted connected graph G = (V, E) as an acyclic subgraph with |V| − 1 edges for which the sum of the edge weights is the smallest. Consequently, the algorithm constructs a minimum spanning tree as an expanding sequence of subgraphs, which are always acyclic but are not necessarily connected on the intermediate stages of the algorithm.
Working
The algorithm begins by sorting the graph’s edges in nondecreasing order of their weights. Then, starting with the empty subgraph, it scans this sorted list, adding the next edge on the list to the current subgraph if such an inclusion does not create a cycle and simply skipping the edge otherwise.
Note that ET, the set of edges composing a minimum spanning tree of graph G, is actually a tree in Prim’s algorithm but is generally just an acyclic subgraph in Kruskal’s algorithm. Because of this, Kruskal’s algorithm is not simpler: it has to check whether the addition of the next edge to the edges already selected would create a cycle.
We can consider the algorithm’s operations as a progression through a series of forests containing all the vertices of a given graph and some of its edges. The initial forest consists of |V| trivial trees, each comprising a single vertex of the graph. The final forest consists of a single tree, which is a minimum spanning tree of the graph. On each iteration, the algorithm takes the next edge (u, v) from the sorted list of the graph’s edges, finds the trees containing the vertices u and v, and, if these trees are not the same, unites them in a larger tree by adding the edge (u, v).
Analysis of Efficiency
The crucial check of whether two vertices belong to the same tree can be carried out using union-find algorithms.
The efficiency of Kruskal’s algorithm is dominated by the time needed for sorting the edge weights of the given graph. Hence, with an efficient sorting algorithm, the time efficiency of Kruskal’s algorithm is in O(|E| log |E|).
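A minimal Python sketch of Kruskal’s algorithm with a simple union-find (union by size, path compression); names and the sample graph are illustrative:

def kruskal_mst(n, edges):
    """n vertices numbered 0..n-1; edges given as (weight, u, v) triples.
    Returns the MST as a list of (u, v, weight) edges."""
    parent = list(range(n))
    size = [1] * n

    def find(x):                        # root of x's tree, compressing paths
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):       # nondecreasing order of weight
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                    # same tree: edge would close a cycle
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru                 # unite the two trees
        size[ru] += size[rv]
        mst.append((u, v, w))
    return mst

print(kruskal_mst(3, [(2, 0, 1), (3, 0, 2), (1, 1, 2)]))
# -> [(1, 2, 1), (0, 1, 2)]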
Illustration
An example of Kruskal’s algorithm is shown below. The selected edges are shown in bold.
Dijkstra’s Algorithm
Dijkstra’s algorithm is the best-known algorithm for the single-source shortest-paths problem. This algorithm is applicable to undirected and directed graphs with nonnegative weights only.
Working - Dijkstra’s algorithm finds the shortest paths to a graph’s vertices in order of their distance from a given source.
First, it finds the shortest path from the source to a vertex nearest to it, then to a second nearest, and so on. In general, before its ith iteration, the algorithm has already identified the shortest paths to i − 1 other vertices; these vertices, the source, and the edges of the shortest paths to them form a subtree Ti of the given graph.
Since all the edge weights are nonnegative, the next vertex nearest to the source can be found among the vertices adjacent to the vertices of Ti. The set of vertices adjacent to the vertices in Ti can be referred to as “fringe vertices”; they are the candidates from which Dijkstra’s algorithm selects the next vertex nearest to the source.
To identify the ith nearest vertex, the algorithm computes, for every fringe vertex u, the sum of the distance to the nearest tree vertex v (given by the weight of the edge (v, u)) and the length dv of the shortest path from the source to v (previously determined by the algorithm), and then selects the vertex with the smallest such sum. The fact that it suffices to compare the lengths of such special paths is the central insight of Dijkstra’s algorithm.
To facilitate the algorithm’s operations, we label each vertex with two labels.
o The numeric label d indicates the length of the shortest path from the source to this vertex found by the algorithm so far; when a vertex is added to the tree, d indicates the length of the shortest path from the source to that vertex.
o The other label indicates the name of the next-to-last vertex on such a path, i.e., the parent of the vertex in the tree being constructed. (It can be left unspecified for the source s and vertices that are adjacent to none of the current tree vertices.)
With such labeling, finding the next nearest vertex u* becomes a simple task of finding a fringe vertex with the smallest d value. Ties can be broken arbitrarily.
After we have identified a vertex u* to be added to the tree, we need to perform two operations:
o Move u* from the fringe to the set of tree vertices.
o For each remaining fringe vertex u that is connected to u* by an edge of weight w(u*, u) such that du* + w(u*, u) < du, update the labels of u by u* and du* + w(u*, u), respectively.
The shortest paths are identified by following the nonnumeric labels backward from a destination vertex to the source; their lengths are given by the numeric labels of the tree vertices.
The pseudocode of Dijkstra’s algorithm is given below. Note that in the following pseudocode, VT contains a given source vertex and the fringe contains the vertices adjacent to it after iteration 0 is completed.
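A Python sketch of this procedure, using heapq as the min-heap (as in the Prim sketch above, lazy deletion replaces the decrease-key operation; names and the sample graph are illustrative):

import heapq

def dijkstra(adj, source):
    """adj: {u: [(v, weight), ...]} with nonnegative weights.
    Returns (dist, parent): the numeric d labels and the
    next-to-last-vertex labels described above."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    finalized = set()
    while heap:
        d, u = heapq.heappop(heap)      # next nearest vertex u*
        if u in finalized:
            continue                    # outdated entry: skip it
        finalized.add(u)
        for v, w in adj[u]:
            if v not in dist or d + w < dist[v]:   # du* + w(u*, v) < dv
                dist[v] = d + w         # update both labels of v
                parent[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, parent

g = {'a': [('b', 3), ('d', 7)], 'b': [('a', 3), ('c', 4), ('d', 2)],
     'c': [('b', 4), ('d', 5)], 'd': [('a', 7), ('b', 2), ('c', 5)]}
print(dijkstra(g, 'a')[0])              # -> {'a': 0, 'b': 3, 'd': 5, 'c': 7}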
Analysis:
The time efficiency of Dijkstra’s algorithm depends on the data structures used for implementing the priority queue and for representing the input graph itself. For graphs represented by their adjacency lists and the priority queue implemented as a min-heap, it is in O(|E| log |V|).
Applications
Suppose we have to encode a text that comprises characters from some n-character alphabet by assigning to each of the text’s characters some sequence of bits called the codeword. There are two types of encoding: fixed-length encoding and variable-length encoding.
Fixed-length encoding: This method assigns to each character a bit string of the same length m (m ≥ log2 n). This is exactly what the standard ASCII code does.
Variable-length encoding: One way of getting a coding scheme that yields a shorter bit string on average is based on the old idea of assigning shorter codewords to more frequent characters and longer codewords to less frequent characters.
If we want to create a binary prefix code for some alphabet, it is natural to associate the alphabet’s characters with leaves of a binary tree in which all the left edges are labelled by 0 and all the right edges are labelled by 1 (or vice versa). The codeword of a character can then be obtained by recording the labels on the simple path from the root to the character’s leaf. Since there is no simple path to a leaf that continues to another leaf, no codeword can be a prefix of another codeword; hence, any such tree yields a prefix code.
Among the many trees that can be constructed in this manner for a given alphabet with known frequencies of the character occurrences, a tree that assigns shorter bit strings to high-frequency characters and longer ones to low-frequency characters can be constructed by the following greedy algorithm, invented by David Huffman.
4.1 Huffman Trees and Codes
Huffman's Algorithm
Step 1: Initialize n one-node trees and label them with the characters of the alphabet. Record the frequency of each character in its tree’s root to indicate the tree’s weight. (More generally, the weight of a tree will be equal to the sum of the frequencies in the tree’s leaves.)
Step 2: Repeat the following operation until a single tree is obtained. Find two trees with the smallest weights. Make them the left and right subtrees of a new tree and record the sum of their weights in the root of the new tree as its weight.
A tree constructed by the above algorithm is called a Huffman tree. It defines, in the manner described above, a Huffman code.
Example: Consider the five-symbol alphabet {A, B, C, D, _} with the following occurrence frequencies in a text made up of these symbols (the frequencies are recovered from the average-bits computation below):

symbol       A      B      C      D      _
frequency    0.35   0.1    0.2    0.2    0.15

The Huffman tree construction for the above problem is shown below. It yields codewords of lengths 2, 3, 2, 2, and 3 for A, B, C, D, and _, respectively.
With the occurrence frequencies given and the codeword lengths obtained, the average number of bits per symbol in this code is
2 × 0.35 + 3 × 0.1 + 2 × 0.2 + 2 × 0.2 + 3 × 0.15 = 2.25.
Had we used a fixed-length encoding for the same alphabet, we would have to use at least 3 bits per symbol. Thus, for this example, Huffman’s code achieves a compression ratio (a standard measure of a compression algorithm’s effectiveness) of (3 − 2.25)/3 × 100% = 25%. In other words, Huffman’s encoding of the above text will use 25% less memory than its fixed-length encoding.
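A hedged Python sketch of Huffman’s algorithm for this example, using heapq to find the two lightest trees at each step. Tie-breaking, and hence the exact codewords, may differ from any particular figure, but the codeword lengths and the 2.25 average agree:

import heapq

def huffman_codes(freq):
    """freq: {symbol: frequency}. Repeatedly merges the two trees of
    smallest weight; returns {symbol: codeword}."""
    # heap entries: (weight, tiebreak, tree); a tree is a symbol (leaf)
    # or a (left, right) pair; the tiebreak keeps comparisons on numbers
    heap = [(w, i, s) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # two smallest-weight trees
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, code):
        if isinstance(tree, tuple):       # internal node: 0 left, 1 right
            walk(tree[0], code + "0")
            walk(tree[1], code + "1")
        else:
            codes[tree] = code or "0"     # degenerate one-symbol alphabet
    walk(heap[0][2], "")
    return codes

freq = {'A': 0.35, 'B': 0.1, 'C': 0.2, 'D': 0.2, '_': 0.15}
codes = huffman_codes(freq)
print(codes)  # e.g. {'C': '00', 'D': '01', 'B': '100', '_': '101', 'A': '11'}
print(sum(len(codes[s]) * f for s, f in freq.items()))
# -> 2.25 bits per symbol (up to float rounding)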
Heaps
A heap is a partially ordered data structure that is especially suitable for implementing priority queues. A priority queue is a multiset of items with an orderable characteristic called an item’s priority, with the following operations:
finding an item with the highest (i.e., largest) priority
deleting an item with the highest priority
adding a new item to the multiset
Notion of the Heap
Definition:
A heap can be defined as a binary tree with keys assigned to its nodes, one key per node, provided the following two conditions are met:
1. The shape property: the binary tree is essentially complete (or simply complete), i.e., all its levels are full except possibly the last level, where only some rightmost leaves may be missing.
2. The parental dominance or heap property: the key in each node is greater than or equal to the keys in its children.
Illustration:
The illustration of the definition of a heap is shown below: only the leftmost tree is a heap. The second one is not a heap, because the tree’s shape property is violated: a left child is missing while its sibling is present, so the missing leaf is not among the rightmost leaves of the last level. The third one is not a heap, because the parental dominance fails for the node with key 5.
Properties of Heap
1. There exists exactly one essentially complete binary tree with n nodes. Its height is equal to ⌊log2 n⌋.
2. The root of a heap always contains its largest element.
Thus, we could also define a heap as an array H[1..n] in which every element in position i in the first half of the array is greater than or equal to the elements in positions 2i and 2i + 1, i.e.,
H[i] ≥ max{H[2i], H[2i + 1]} for i = 1, ..., ⌊n/2⌋.
The bottom-up heap construction algorithm is illustrated below. It initializes the essentially complete binary tree with n nodes by placing keys in the order given and then “heapifies” the tree as follows.
Starting with the last parental node, the algorithm checks whether the parental dominance holds for the key in this node. If it does not, the algorithm exchanges the node’s key K with the larger key of its children and checks whether the parental dominance holds for K in its new position. This process continues until the parental dominance for K is satisfied. (Eventually, it has to, because it holds automatically for any key in a leaf.)
After completing the “heapification” of the subtree rooted at the current parental node, the algorithm proceeds to do the same for the node’s immediate predecessor.
The algorithm stops after this is done for the root of the tree.
Illustration
Bottom-up construction of a heap for the list 2, 9, 7, 6, 5, 8. The double-headed arrows show key comparisons verifying the parental dominance.
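A minimal Python sketch of this bottom-up construction on a 0-indexed array (the text’s positions 2i and 2i + 1 become 2i + 1 and 2i + 2):

def heapify_bottom_up(h):
    """Bottom-up max-heap construction: heapify every parental node,
    from the last one back to the root (0-indexed array)."""
    n = len(h)
    for i in range(n // 2 - 1, -1, -1):
        k, v = i, h[i]
        while 2 * k + 1 < n:
            j = 2 * k + 1               # left child
            if j + 1 < n and h[j + 1] > h[j]:
                j += 1                  # the larger of the two children
            if v >= h[j]:
                break                   # parental dominance holds for v
            h[k] = h[j]                 # move the larger child up
            k = j
        h[k] = v                        # final position of the sifted key

h = [2, 9, 7, 6, 5, 8]                  # the list from the illustration
heapify_bottom_up(h)
print(h)                                # -> [9, 6, 8, 2, 5, 7]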
Assume, for simplicity, that n = 2^k − 1, so that a heap’s tree is full, i.e., the largest possible number of nodes occurs on each level. Let h be the height of the tree. According to the first property of heaps in the list at the beginning of the section, h = ⌊log2 n⌋, or just h = k − 1 for the specific values of n we are considering.
Each key on level i of the tree will travel to the leaf level h in the worst case of the heap construction algorithm. Since moving to the next level down requires two comparisons (one to find the larger child and the other to determine whether the exchange is required), the total number of key comparisons involving a key on level i will be 2(h − i). Therefore, the total number of key comparisons in the worst case will be
C_worst(n) = Σ_{i=0}^{h−1} 2(h − i) 2^i = 2(n − log2(n + 1)),
where the validity of the last equality can be proved either by using the closed-form formula for the sum Σ_{i=1}^{h} i2^i or by mathematical induction on h.
Thus, with this bottom-up algorithm, a heap of size n can be constructed with fewer than 2n comparisons.
Insert an item into a heap: To insert a new key K into a heap, first attach a new node with key K after the last leaf of the existing heap; then sift K up to its appropriate place: compare K with its parent’s key and, if K is larger, swap the two and repeat. Obviously, this insertion operation cannot require more key comparisons than the heap’s height. Since the height of a heap with n nodes is about log2 n, the time efficiency of insertion is in O(log n).
Illustration of inserting a new key: Inserting a new key (10) into the heap constructed earlier is shown below. The new key is sifted up via swaps with its parent until it is not larger than its parent (or reaches the root).
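A short sketch of this sift-up insertion, again on a 0-indexed array:

def heap_insert(h, key):
    """Attach key after the last leaf, then sift it up past any
    smaller parents (0-indexed max-heap)."""
    h.append(key)
    i = len(h) - 1
    while i > 0 and h[(i - 1) // 2] < h[i]:
        h[(i - 1) // 2], h[i] = h[i], h[(i - 1) // 2]   # swap with parent
        i = (i - 1) // 2

h = [9, 6, 8, 2, 5, 7]       # the heap constructed above
heap_insert(h, 10)
print(h)                     # -> [10, 6, 9, 2, 5, 7, 8]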
Delete an item from a heap: Deleting the root’s key from a heap can be done with the following algorithm:
Maximum Key Deletion from a heap
1. Exchange the root’s key with the last key K of the heap.
2. Decrease the heap’s size by 1.
3. “Heapify” the smaller tree by sifting K down the tree exactly in the same way we did it in the bottom-up heap construction algorithm. That is, verify the parental dominance for K: if it holds, we are done; if not, swap K with the larger of its children and repeat this operation until the parental dominance condition holds for K in its new position.
Illustration
The efficiency of deletion is determined by the number of key comparisons needed to “heapify” the tree after the swap has been made and the size of the tree is decreased by 1. Since this cannot require more key comparisons than twice the heap’s height, the time efficiency of deletion is in O(log n) as well.
Heapsort
Heapsort is a two-stage algorithm that works as follows.
Stage 1 (heap construction): Construct a heap for a given array.
Stage 2 (maximum deletions): Apply the root-deletion operation n − 1 times to the remaining heap.
As a result, the array elements are eliminated in decreasing order. But since under the array implementation of heaps an element being deleted is placed last, the resulting array will be exactly the original array sorted in increasing order.
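A compact Python sketch combining both stages; the sift-down helper is the same procedure used in bottom-up construction and in maximum-key deletion:

def heapsort(a):
    """Stage 1: bottom-up heap construction. Stage 2: n - 1 maximum
    deletions; each deleted root lands in its final array position."""
    def sift_down(h, k, n):             # restore dominance for h[k] in h[:n]
        v = h[k]
        while 2 * k + 1 < n:
            j = 2 * k + 1
            if j + 1 < n and h[j + 1] > h[j]:
                j += 1
            if v >= h[j]:
                break
            h[k] = h[j]
            k = j
        h[k] = v

    n = len(a)
    for i in range(n // 2 - 1, -1, -1):  # stage 1: heapify, O(n)
        sift_down(a, i, n)
    for end in range(n - 1, 0, -1):      # stage 2: maximum deletions
        a[0], a[end] = a[end], a[0]      # move current max to its final slot
        sift_down(a, 0, end)

a = [2, 9, 7, 6, 5, 8]
heapsort(a)
print(a)                                 # -> [2, 5, 6, 7, 8, 9]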
Analysis of efficiency:
Since we already know that the heap construction stage of the algorithm is in O(n), we have to investigate just the time efficiency of the second stage. For the number of key comparisons, C(n), needed for eliminating the root keys from the heaps of diminishing sizes from n to 2, we get the following inequality:
C(n) ≤ 2⌊log2(n − 1)⌋ + 2⌊log2(n − 2)⌋ + ... + 2⌊log2 1⌋ ≤ 2(n − 1) log2(n − 1) ≤ 2n log2 n.
This means that C(n) ∈ O(n log n) for the second stage of heapsort.
For both stages, we get O(n) + O(n log n) = O(n log n).
A more detailed analysis shows that the time efficiency of heapsort is, in fact, in Θ(n log n) in both the worst and average cases. Thus, heapsort’s time efficiency falls in the same class as that of mergesort.
Unlike the latter, heapsort is in-place, i.e., it does not require any extra storage. Timing experiments on random files show that heapsort runs more slowly than quicksort but can be competitive with mergesort.
*****