Chapter 1-3 algorithm analysis
Chapter One
✓ This definition implies that the largest element is at the root of the heap.
✓ A heap permits one to insert elements into a set and also to find the largest
element efficiently.
✓ If the elements are distinct, then the root contains the largest item.
✓ In a min-heap, each parent node contains a value smaller than or equal to its
children; in this case the root contains the smallest element.
✓ In a max-heap, the larger values are closer to the root.
✓ A data structure which provides for these two operations is called a
priority queue.
[Figure: a max-heap with root 19, children 12 and 16, and leaves 1, 4, 7, stored as array A = [19, 12, 16, 1, 4, 7]]
• A heap can be stored as an array A.
– Root of tree is A[i], where i = 1
– Left child of A[i] = A[2i]
– Right child of A[i] = A[2i + 1]
– Parent of A[i] = A[⌊i/2⌋]
– Heapsize[A] ≤ length[A]
• The root of the tree is A[1]; given the index i of a node, the indices of its parent,
left child and right child can be computed:
PARENT (i)
return floor(i/2)
LEFT (i)
return 2i
RIGHT (i)
return 2i + 1
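The index arithmetic above translates directly into code; a minimal sketch using the same 1-based convention (function names in lowercase are ours):

```python
# 1-based heap index arithmetic, matching PARENT/LEFT/RIGHT above.
def parent(i):
    return i // 2      # floor(i/2)

def left(i):
    return 2 * i

def right(i):
    return 2 * i + 1
```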
Heap order property
• For every node v, other than the root, the key stored in v is greater than or equal
(for a min-heap; smaller than or equal for a max-heap) to the key stored in the parent of v.
• For a max-heap, the maximum value is stored in the root.
Definition
• Max Heap
– Keeps larger values closer to the root
– Has property of
A[Parent(i)] ≥ A[i]
• Min Heap
– Keeps smaller values closer to the root
– Has property of
A[Parent(i)] ≤ A[i]
[Figure: a max-heap with root 19; array A = [19, 12, 16, 1, 4, 7]]
[Figure: a min-heap with root 1; array A = [1, 4, 16, 7, 12, 19]]
Insertion
• Algorithm
1. Add the new element to the next available position at the lowest level
2. Restore the max-heap property if violated
• General strategy is percolate up (or bubble up): if the parent of the
element is smaller than the element, then interchange the
parent and child.
[Figure: inserting 17 into the max-heap [19, 12, 16, 1, 4, 7]: 17 is placed in the next available position and swapped with its parent 16, giving [19, 12, 17, 1, 4, 7, 16]]
[Figure: percolate-up on [16, 4, 7, 1, 12, 19]: 19 is swapped with its parent 7, then with the root 16, restoring the max-heap property]
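The insertion algorithm above can be sketched in Python; the array layout (1-based, with an unused slot at index 0) follows the slides, while the helper name heap_insert is ours:

```python
def heap_insert(heap, key):
    """Insert key into a 1-based max-heap (heap[0] is an unused slot)
    and percolate it up until the max-heap property is restored."""
    heap.append(key)                          # step 1: next available position
    i = len(heap) - 1
    while i > 1 and heap[i // 2] < heap[i]:   # step 2: bubble up
        heap[i // 2], heap[i] = heap[i], heap[i // 2]
        i //= 2

A = [None, 19, 12, 16, 1, 4, 7]   # the heap from the figure
heap_insert(A, 17)
# A[1:] is now [19, 12, 17, 1, 4, 7, 16]
```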
Heap Sort
• The heapsort algorithm consists of two phases:
- build a heap from an arbitrary array
- use the heap to sort the data
[Figure: heapsort on A = [19, 12, 16, 1, 4, 7]: repeatedly take out the biggest element (the root), move the last element to the root, and restore the heap; the sorted result is A = [1, 4, 7, 12, 16, 19]]
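The two phases of heapsort can be sketched as follows (1-based array as before; function names are ours):

```python
def max_heapify(a, i, n):
    """Sift a[i] down within a 1-based max-heap of size n."""
    while True:
        l, r, largest = 2 * i, 2 * i + 1, i
        if l <= n and a[l] > a[largest]:
            largest = l
        if r <= n and a[r] > a[largest]:
            largest = r
        if largest == i:
            break
        a[i], a[largest] = a[largest], a[i]
        i = largest

def heap_sort(a):
    """In-place heapsort on a 1-based array (a[0] unused)."""
    n = len(a) - 1
    for i in range(n // 2, 0, -1):   # phase 1: build a max-heap
        max_heapify(a, i, n)
    for n in range(n, 1, -1):        # phase 2: take out biggest,
        a[1], a[n] = a[n], a[1]      # move the last element to the root,
        max_heapify(a, 1, n - 1)     # and restore the heap

A = [None, 19, 12, 16, 1, 4, 7]
heap_sort(A)
# A[1:] is now [1, 4, 7, 12, 16, 19]
```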
Hashing Algorithms.
• Hashing algorithms are a certain type of search procedure.
• We assume that we are given a set of records, where each record R is uniquely
identified by its key K.
• Besides K the record R contains some unspecified useful information in the
field INFO.
• We wish to organize our records in such a way that (1) we can quickly find the
record having a given key K (if such a record exists), and (2) we can easily add
additional records to our collection.
• A straightforward way to implement this organization is to maintain our
records in a table.
• A table entry is either empty, or it contains one of our records, in which case it
is full.
• Similarly, a new record can be inserted into the table by searching for an
empty position.
• Suppose we want to design a system for storing employee records
keyed using phone numbers. And we want following queries to be
performed efficiently:
➢Insert a phone number and corresponding information.
➢Search a phone number and fetch the information.
➢Delete a phone number and related information.
• We can think of using the following data structures to maintain
information about different phone numbers.
➢Array of phone numbers and records.
➢Linked List of phone numbers and records.
➢Balanced binary search tree with phone numbers as keys.
➢Direct Access Table.
• A direct access table uses a big array with phone numbers as
indices into the array.
• An entry in the array is NULL if the phone number is not present; otherwise
the array entry stores a pointer to the record corresponding to that phone
number.
• To insert a phone number, we create a record with details of
given phone number, use phone number as index and store the
pointer to the created record in table.
• This solution has many practical limitations.
➢First problem with this solution is extra space required is
huge.
➢Another problem is that an integer type in a programming language
may not be able to store an n-digit phone number.
• Due to above limitations Direct Access Table cannot always be
used.
• Hashing is the solution that can be used in almost all such situations
and performs extremely well compared to above data structures like
Array, Linked List, Balanced BST in practice.
• With hashing we get O(1) search time on average (under reasonable
assumptions) and O(n) in worst case.
• Hashing is an improvement over Direct Access Table.
• The idea is to use hash function that converts a given phone number
or any other key to a smaller number and uses the small number as
index in a table called hash table.
Hash Function
• A function that converts a given big phone number to a small practical integer value.
• The mapped integer value is used as an index in hash table.
• In simple terms, a hash function maps a big number or string to a small integer that
can be used as index in hash table.
• A good hash function should have the following properties:
1) Efficiently computable.
2) Should uniformly distribute the keys.
• Hash Table: An array that stores pointers to records corresponding to a given phone
number.
• An entry in hash table is NIL if no existing phone number has hash function value
equal to the index for the entry.
• Collision Handling: since a hash function maps a large key space to a small
range of indices, two keys may hash to the same value.
• E.g. keys with h(k2) = h(k7) = h(k5), h(k1) = h(k4) and h(k6) = h(k8).
• Following are the ways to handle collisions:
✓ Chaining: The idea is to make each cell of hash table point to a linked list of
records that have same hash function value. Chaining is simple, but requires
additional memory outside the table.
✓ Open Addressing: In open addressing, all elements are stored in the hash table
itself. Each table entry contains either a record or NIL.
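A minimal sketch of the chaining scheme, assuming Python's built-in hash as the hash function and a made-up table size, class name and sample records:

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds a list of
    (key, record) pairs whose keys hash to that slot."""
    def __init__(self, size=101):
        self.slots = [[] for _ in range(size)]

    def _index(self, key):            # the hash function: key -> small index
        return hash(key) % len(self.slots)

    def insert(self, key, record):
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: update record
                chain[i] = (key, record)
                return
        chain.append((key, record))

    def search(self, key):
        for k, rec in self.slots[self._index(key)]:
            if k == key:
                return rec
        return None                   # NIL: no record with this key

    def delete(self, key):
        i = self._index(key)
        self.slots[i] = [(k, r) for k, r in self.slots[i] if k != key]

t = ChainedHashTable()
t.insert("555-0100", {"name": "Abebe"})   # hypothetical phone numbers
t.insert("555-0134", {"name": "Sara"})
t.delete("555-0134")
```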
Sets Representation
• A set is defined as a collection of distinct objects of the same type or
class of objects like numbers, alphabets, names, etc.
• Sets are represented in two forms:-
✓ a) Roster or tabular form: List all the elements of the set within braces { }
and separate them by commas.
➢Example: If A = the set of all odd numbers less than 10,
then in the roster form it can be expressed as
A = {1, 3, 5, 7, 9}.
✓ b) Set Builder form: List the properties fulfilled by all the elements of the
set.
➢ We write {x : x satisfies property P}, read as 'the set of all
x such that each x has property P.'
➢ Example: If B = {2, 4, 8, 16, 32}, then the set builder representation
will be: B = {x : x = 2^n, where n ∈ N and 1 ≤ n ≤ 5}
• Disjoint Sets: if Si and Sj are two disjoint
sets, then their union Si ∪ Sj = {all elements x such that x is in Si or
Sj}.
Example:
➢ If we have sets S1 = {1, 7, 8, 9}, S2 = {2, 5, 10} and S3 = {3, 4, 6}, then
S1 ∪ S2 = {1, 7, 8, 9, 2, 5, 10}.
▪ Find(i) ... find the set containing element i. Thus, 4 is in set S3 and 9
is in set S1.
▪ The sets will be represented by trees.
▪ [Figure: tree representations of the disjoint sets S1, S2 and S3]
Union of Sets
• Union of Sets A and B is defined to be the set of all those elements which belong to
A or B or both and is denoted by A∪B.
• The nodes are linked on the parent relationship, i.e. each node
other than the root is linked to its parent.
• Example: the union S1 ∪ S2 can be represented by making the root of one
tree the parent of the root of the other.
• In presenting the UNION and FIND algorithms we shall identify sets by the index
of the roots of the trees.
• The operation of FIND(i) now becomes: determine the root of
the tree containing element i.
• UNION(i, j) requires two trees with roots i and j to be joined.
• Each node needs only one field, the PARENT field to link to its parent.
• Root nodes have a PARENT field of zero.
Simple union and find algorithms
procedure U(i, j)
//replace the disjoint sets with roots i and j, i ≠ j, by their union//
integer i, j
PARENT(i) ← j
end U
procedure F(i)
integer i, j //find the root of the tree containing element i//
j ← i
while PARENT(j) > 0 do //PARENT(j) = 0 if this node is a root//
j ← PARENT(j)
repeat
return (j)
end F
• For instance, if we start off with n elements each in a set of its own, i.e. Si= {i}, 1 ≤
i ≤ n, then the initial configuration consists of a forest with n nodes and PARENT(i)
= 0, 1 ≤ i ≤ n.
• Now imagine that we process the following sequences of UNION-FIND
operations:
• U(1, 2), F(1), U(2, 3), F(1), U(3, 4), F(1), U(4, 5), ..., F(1), U(n – 1, n)
Weighting Rule for UNION(i, j)
• If the number of nodes in tree i is less than the number in tree j, then make j the
parent of i; otherwise make i the parent of j.
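The PARENT-array representation with the weighting rule can be sketched as follows; unlike the slides, a root here stores minus its tree's size (rather than 0), so the weighting rule has the sizes it needs:

```python
# PARENT[i] <= 0 marks a root; -PARENT[i] is the tree's node count.
def make_sets(n):
    return [0] + [-1] * n          # 1-based; each element in its own set

def find(parent, i):
    while parent[i] > 0:           # parent[i] <= 0 only at a root
        i = parent[i]
    return i

def weighted_union(parent, i, j):
    """Union by the weighting rule: the root of the smaller tree
    is linked below the root of the larger tree."""
    i, j = find(parent, i), find(parent, j)
    if i == j:
        return
    if -parent[i] < -parent[j]:    # tree i has fewer nodes: make j the parent
        parent[j] += parent[i]
        parent[i] = j
    else:                          # otherwise make i the parent
        parent[i] += parent[j]
        parent[j] = i

p = make_sets(5)
weighted_union(p, 1, 2)
weighted_union(p, 3, 4)
weighted_union(p, 1, 3)
# elements 1, 2, 3, 4 now share one root; element 5 is still alone
```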
Chapter-2 Divide and conquer
• Outlines:
✓The General Method,
✓binary search,
✓finding maximum and minimum
✓Merge sort
✓Quick sort
✓selection sort
Divide & Conquer
• Divide & Conquer is a design technique that solves a problem by breaking
it into smaller subproblems, solving them, and combining their solutions.
• It is a design strategy which is well known for breaking down
efficiency barriers.
• The time complexity of merge sort in the best case, worst case and
average case is O(n log n) and the number of comparisons used is
nearly optimal.
Divide and Conquer Strategy
• Using the Divide and Conquer technique, we divide a problem into
subproblems.
• When the solution to each subproblem is ready, we 'combine' the results
from the subproblems to solve the main problem.
• Suppose we had to sort an array A.
• A subproblem would be to sort a sub-section of this array starting at
index p and ending at index r, denoted as A[p..r].
Divide
• If q is the half-way point between p and r, then we can split the subarray
A[p..r] into two arrays A[p..q] and A[q+1..r].
Conquer
• In the conquer step, we try to sort both the subarrays A[p..q] and A[q+1..r].
• If we haven't yet reached the base case, we again divide both these
subarrays and try to sort them.
Combine
• When the conquer step reaches the base case and we get two sorted subarrays A[p..q] and
A[q+1..r] for array A[p..r], we combine the results by creating a sorted array A[p..r] from
the two sorted subarrays A[p..q] and A[q+1..r].
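The divide, conquer and combine steps above correspond directly to merge sort; a sketch with the same p, q, r indices (function names are ours):

```python
def merge(A, p, q, r):
    """Combine: build sorted A[p..r] from sorted A[p..q] and A[q+1..r]."""
    left, right = A[p:q + 1], A[q + 1:r + 1]
    i = j = 0
    for k in range(p, r + 1):      # repeatedly take the smaller front element
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            A[k] = left[i]; i += 1
        else:
            A[k] = right[j]; j += 1

def merge_sort(A, p, r):
    """Sort A[p..r] in place (inclusive indices)."""
    if p < r:
        q = (p + r) // 2           # divide at the half-way point
        merge_sort(A, p, q)        # conquer the left half
        merge_sort(A, q + 1, r)    # conquer the right half
        merge(A, p, q, r)          # combine the two sorted halves

data = [5, 2, 4, 7, 1, 3]
merge_sort(data, 0, len(data) - 1)
# data is now [1, 2, 3, 4, 5, 7]
```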
Greedy Method
• If the inclusion of the next input into the partially constructed optimal
solution would result in an infeasible solution, then this input is not added to the
partial solution.
• Some problems like Knapsack, Job sequencing with deadlines and minimum
cost spanning trees are based on the subset paradigm.
Algorithm Greedy (a, n)
// a(1 : n) contains the ‘n’ inputs
{
solution := ∅; // initialize the solution to empty
for i:=1 to n do
{
x := select (a);
if feasible (solution, x) then
solution := Union (Solution, x);
}
return solution;
}
➢ In procedure Greedy, the function Select selects an input from 'a', removes it and assigns
its value to 'x'.
➢ Feasible is a Boolean valued function, which determines if ‘x’ can be included into
the solution vector.
➢ The function Union combines ‘x’ with solution and updates the objective function.
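The Greedy template above can be sketched as ordinary code; the toy instantiation below (largest-value items under a weight capacity) is an assumed example, not from the slides:

```python
def greedy(a, select, feasible, union):
    """Direct transliteration of Algorithm Greedy: repeatedly select an
    input, keep it only if the partial solution stays feasible."""
    solution = []
    a = list(a)                    # work on a copy; select removes items
    for _ in range(len(a)):
        x = select(a)              # pick and remove the next candidate
        if feasible(solution, x):
            solution = union(solution, x)
    return solution

# Assumed instantiation: choose items of largest value whose total
# weight stays within a capacity of 10.
items = [(6, 5), (5, 4), (4, 6), (3, 1)]      # (value, weight) pairs
pick = lambda a: a.pop(a.index(max(a)))       # Select: best remaining value
ok = lambda sol, x: sum(w for _, w in sol) + x[1] <= 10   # Feasible

sol = greedy(items, pick, ok, lambda s, x: s + [x])
# sol == [(6, 5), (5, 4), (3, 1)]: item (4, 6) would exceed the capacity
```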
Job Sequencing With Deadlines
• The sequencing of jobs on a single processor with deadline constraints is called as
Job Sequencing with Deadlines.
Here:
➢ You are given a set of jobs.
➢ Each job has a defined deadline and some profit associated with it.
➢ The profit of a job is given only when that job is completed within its
deadline.
➢ Only one processor is available for processing all the jobs.
➢ Processor takes one unit of time to complete a job.
The problem states-
• “How can the total profit be maximized if only one job can be completed at a
time?”
Approach to Solution-
• A feasible solution would be a subset of jobs where each job of the subset
gets completed within its deadline.
• Value of the feasible solution would be the sum of profit of all the jobs
contained in the subset.
• An optimal solution of the problem would be a feasible solution which
gives the maximum profit.
Greedy Algorithm-
• Greedy Algorithm is adopted to determine how the next job is selected for
an optimal solution.
Step-01:
✓ Sort all the given jobs in decreasing order of their profit.
Step-02:
✓ Check the value of the maximum deadline.
✓ Draw a Gantt chart where the maximum time on the Gantt chart is the value of
the maximum deadline.
Step-03:
✓ Pick up the jobs one by one.
✓ Put the job on Gantt chart as far as possible from 0 ensuring that the job
gets completed before its deadline.
Practice problem based on job sequencing with deadlines-
• Problem- Given the jobs, their deadlines and associated profits as shown-
Jobs       J1   J2   J3   J4   J5   J6
Deadlines   5    3    3    2    4    2
Profits   200  180  190  300  120  100
Step-01:
✓ Sort the jobs in decreasing order of profit: J4, J1, J3, J2, J5, J6.
Step-02:
✓ Value of maximum deadline = 5. So, draw a Gantt chart with
maximum time on the Gantt chart = 5 units as shown.
✓ Now, we take each job one by one in the order they appear in Step-01.
✓ We place each job on the Gantt chart as far as possible from 0.
Step-03:
✓ We take job J4. Since its deadline is 2, so we place it in the first empty cell
before deadline 2 as-
Step-04:
✓ We take job J1. Since its deadline is 5, so we place it in the first empty cell
before deadline 5 as-
Step-05:
✓ We take job J3. Since its deadline is 3, so we place it in the first empty cell
before deadline 3 as-
Step-06:
• We take job J2. Since its deadline is 3, so we place it in the first empty cell
before deadline 3.
• Since the second and third cells are already filled, so we place job J2 in the
first cell as-
Step-07:
✓ Now, we take job J5. Since its deadline is 4, so we place it in the first empty cell
before deadline 4 as-
• Now, the only job left is job J6, whose deadline is 2. All the slots before deadline
2 are already occupied.
• Thus, job J6 cannot be completed. The resulting schedule is J2, J4, J3, J5, J1,
with maximum total profit 180 + 300 + 190 + 120 + 200 = 990.
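The whole procedure can be sketched in Python; the jobs, deadlines and profits are taken from the table above, and the function name is ours:

```python
def job_sequencing(jobs):
    """Greedy job sequencing: sort by decreasing profit, then place each
    job in the latest free slot before its deadline (as on the Gantt chart)."""
    max_d = max(d for _, d, _ in jobs)
    slots = [None] * (max_d + 1)           # slots[1..max_d]; slots[0] unused
    for name, deadline, profit in sorted(jobs, key=lambda j: -j[2]):
        for t in range(deadline, 0, -1):   # as far as possible from 0
            if slots[t] is None:
                slots[t] = (name, profit)
                break                      # job placed; else it is dropped
    schedule = [s[0] for s in slots[1:] if s]
    total = sum(s[1] for s in slots[1:] if s)
    return schedule, total

jobs = [("J1", 5, 200), ("J2", 3, 180), ("J3", 3, 190),
        ("J4", 2, 300), ("J5", 4, 120), ("J6", 2, 100)]
schedule, total = job_sequencing(jobs)
# schedule == ["J2", "J4", "J3", "J5", "J1"], total == 990; J6 is dropped
```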
Optimal Merge Patterns
• When several sorted files are merged pairwise into a single sorted file, there
are many ways in which the pairwise merges can be done.
Algorithm to Generate Two-way Merge Tree:
struct treenode
{
treenode * lchild;
treenode * rchild;
int weight;
};
Algorithm TREE (n)
// list is a global list of n single-node binary trees
{
for i := 1 to n – 1 do
{
pt := new treenode;
(pt → lchild) := least (list); // merge the two trees with smallest weights
(pt → rchild) := least (list);
(pt → weight) := ((pt → lchild) → weight) + ((pt → rchild) → weight);
insert (list, pt);
}
return least (list); // the tree left in list is the merge tree
}
Example:
• Given a set of unsorted files: 5, 3, 2, 7, 9, 13
• Now, arrange these elements in ascending order: 2, 3, 5, 7, 9, 13
• After this, pick the two smallest numbers, replace them by their sum, and repeat
until only one tree is left:
Step 1: merge 2 and 3 (weight 5)
Step 2: merge 5 and 5 (weight 10)
Step 3: merge 7 and 9 (weight 16)
Step 4: merge 10 and 13 (weight 23)
Step 5: merge 16 and 23 (weight 39)
• Total merge cost = 5 + 10 + 16 + 23 + 39 = 93.
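The merge-cost computation can be sketched with a min-heap standing in for least(list) and insert(list, pt):

```python
import heapq

def optimal_merge_cost(lengths):
    """Two-way merge pattern: repeatedly merge the two smallest trees
    (file lengths); the sum of all merge weights is the total cost."""
    heap = list(lengths)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)    # least(list), twice
        b = heapq.heappop(heap)
        cost += a + b              # weight of the new internal node
        heapq.heappush(heap, a + b)
    return cost

# optimal_merge_cost([5, 3, 2, 7, 9, 13]) == 93, matching the example
```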
Minimum Spanning Trees
• Spanning Tree
– A tree (i.e., connected, acyclic graph) which contains all the
vertices of the graph
• Minimum Spanning Tree
– Spanning tree with the minimum sum of weights
[Figure: a weighted graph on vertices a–i with its minimum spanning tree highlighted]
• Spanning forest
– If a graph is not connected, then there is a spanning tree for each connected
component of the graph
Prim's algorithm
• Repeat until all vertices have been chosen:
– Choose the vertex u not in V such that the edge weight from u to a
vertex in V is minimal (greedy!)
• Trace on the example graph, starting from vertex 1:
V = {1,3}, E' = {(1,3)}
V = {1,3,4}, E' = {(1,3),(3,4)}
V = {1,3,4,5}, E' = {(1,3),(3,4),(4,5)}
...
V = {1,3,4,5,2,6}, E' = {(1,3),(3,4),(4,5),(5,2),(2,6)}
• Final cost: 1 + 3 + 4 + 1 + 1 = 10
[Figure: the six-vertex example graph, redrawn at each step with the chosen edges highlighted]
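Prim's algorithm can be sketched as follows; the edge weights are only partly recoverable from the trace, so the edge list below is an assumption chosen to be consistent with the traced tree and its cost of 10:

```python
import heapq

def prim(adj, start):
    """Prim's algorithm: grow the tree from `start`, always adding the
    cheapest edge that reaches a vertex not yet in the tree."""
    in_tree = {start}
    edges = [(w, start, u) for u, w in adj[start]]
    heapq.heapify(edges)
    mst, cost = [], 0
    while edges:
        w, v, u = heapq.heappop(edges)
        if u in in_tree:
            continue                     # would not add a new vertex
        in_tree.add(u)
        mst.append((v, u))
        cost += w
        for nxt, w2 in adj[u]:
            if nxt not in in_tree:
                heapq.heappush(edges, (w2, u, nxt))
    return mst, cost

# Edge list partly reconstructed from the trace above (an assumption).
edge_list = [(1, 3, 1), (3, 4, 3), (4, 5, 4), (2, 5, 1),
             (2, 6, 1), (5, 6, 2), (1, 2, 10), (1, 4, 8)]
adj = {v: [] for v in range(1, 7)}
for a, b, w in edge_list:
    adj[a].append((b, w))
    adj[b].append((a, w))

mst, cost = prim(adj, 1)
# cost == 10, with MST edges (1,3), (3,4), (4,5), (5,2), (2,6)
```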
Kruskal’s Algorithm
• Select edges in order of increasing cost
• Accept an edge to expand tree or forest only if it does not cause a cycle
• Implementation using adjacency list, priority queues and disjoint sets
• Its algorithm is:
Initialize a forest of trees, each tree being a single node
Build a priority queue of edges with priority being lowest cost
Repeat until |V| -1 edges have been accepted {
Deletemin edge from priority queue
If it forms a cycle then discard it
else accept the edge – It will join 2 existing trees yielding a larger tree
and reducing the forest by one tree
}
The accepted edges form the minimum spanning tree
• Vertices in different trees are disjoint
– True at initialization and Union won’t modify the fact for remaining trees
• Trees form equivalence classes under the relation “is connected to”
– u connected to u (reflexivity)
– u connected to v implies v connected to u (symmetry)
– u connected to v and v connected to w implies a path from u to w so u
connected to w (transitivity)
[Figure: the six-vertex example graph used for the Kruskal trace]
Initially, forest of 6 trees: F = {{1},{2},{3},{4},{5},{6}}
• Select edge with lowest cost (2,5): Find(2) = 2, Find(5) = 5, Union(2,5)
F = {{1},{2,5},{3},{4},{6}}; 1 edge accepted
• Select edge with lowest cost (2,6): Find(2) = 2, Find(6) = 6, Union(2,6)
F = {{1},{2,5,6},{3},{4}}; 2 edges accepted
• Select edge with lowest cost (1,3): Find(1) = 1, Find(3) = 3, Union(1,3)
F = {{1,3},{2,5,6},{4}}; 3 edges accepted
• Select edge with lowest cost (5,6): Find(5) = 2, Find(6) = 2, do nothing (cycle)
F = {{1,3},{2,5,6},{4}}; 3 edges accepted
• Select edge with lowest cost (3,4): Find(3) = 1, Find(4) = 4, Union(1,4)
F = {{1,3,4},{2,5,6}}; 4 edges accepted
• Select edge with lowest cost (4,5): Find(4) = 1, Find(5) = 2, Union(1,2)
F = {{1,2,3,4,5,6}}; 5 edges accepted: end
• Total cost = 10
• Although there is a unique minimum spanning tree in this example, this is not
generally the case.
[Figure: the forest after each step, with the accepted edges drawn]
• Recall that m = |E| = O(V²) = O(n²)
• Prim's runs in O((n + m) log n)
• Kruskal runs in O(m log m) = O(m log n)
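Kruskal's algorithm with a simple Find/Union can be sketched as follows; the edge list is only partly recoverable from the trace, so treat it as an assumption consistent with the traced forest states:

```python
def kruskal(n, edge_list):
    """Kruskal's algorithm: take edges in increasing cost order and
    accept each one unless Find shows it would form a cycle."""
    parent = list(range(n + 1))            # 1-based disjoint-set forest

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    mst, cost = [], 0
    for a, b, w in sorted(edge_list, key=lambda e: e[2]):
        ra, rb = find(a), find(b)
        if ra != rb:                       # accepting joins two trees
            parent[rb] = ra
            mst.append((a, b))
            cost += w
        # else: the edge forms a cycle and is discarded
    return mst, cost

# Edge list partly reconstructed from the trace above (an assumption).
edges = [(2, 5, 1), (2, 6, 1), (1, 3, 1), (5, 6, 2),
         (3, 4, 3), (4, 5, 4), (1, 4, 8), (1, 2, 10)]
mst, cost = kruskal(6, edges)
# cost == 10 and 5 edges are accepted; (5,6) is rejected as a cycle
```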
Dijkstra's Algorithm
• Graphically, we will denote visited vertices with check boxes next to
each of the vertices (initially unchecked).
[Figure: vertices 1–9, each with an unchecked box]
• We will work bottom up.
– Note that if the starting vertex has any adjacent edges, then there
will be one vertex that is the shortest distance from the starting
vertex. This is the shortest reachable vertex of the graph.
[Figure: directed graph on vertices A, B, C, D, E with weighted edges, redrawn at each step of the trace below]
Initialize: d(A) = 0, all other distances = ∞; Q = {A, B, C, D, E}; S = {}
“A” ← EXTRACT-MIN(Q); S = {A}
Relax all edges leaving A: d(B) = 10, d(C) = 3
“C” ← EXTRACT-MIN(Q); S = {A, C}
Relax all edges leaving C: d(B) = 7, d(D) = 11, d(E) = 5
“E” ← EXTRACT-MIN(Q); S = {A, C, E}
Relax all edges leaving E: no distance improves (d(D) remains 11)
“B” ← EXTRACT-MIN(Q); S = {A, C, E, B}
Relax all edges leaving B: d(D) = 9
“D” ← EXTRACT-MIN(Q); S = {A, C, E, B, D}
Final shortest distances from A: A = 0, B = 7, C = 3, D = 9, E = 5
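The trace above can be reproduced with a priority-queue implementation of Dijkstra's algorithm; the edge weights are partly reconstructed from the relax steps, so treat them as assumptions:

```python
import heapq

def dijkstra(adj, source):
    """Dijkstra's algorithm: EXTRACT-MIN the closest unfinished vertex,
    then relax all edges leaving it."""
    dist = {v: float("inf") for v in adj}
    dist[source] = 0
    pq, done = [(0, source)], set()
    while pq:
        d, u = heapq.heappop(pq)       # EXTRACT-MIN(Q)
        if u in done:
            continue                   # stale queue entry
        done.add(u)                    # S <- S U {u}
        for v, w in adj[u]:            # relax all edges leaving u
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

# Edge weights partly reconstructed from the trace above (an assumption).
adj = {"A": [("B", 10), ("C", 3)],
       "B": [("D", 2)],
       "C": [("B", 4), ("D", 8), ("E", 2)],
       "D": [],
       "E": [("D", 9)]}
result = dijkstra(adj, "A")
# result == {"A": 0, "B": 7, "C": 3, "D": 9, "E": 5}
```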