CSC315 - Lecture - Notes - Updated 2023
CSC315
ALGORITHMS & COMPLEXITY ANALYSIS
[Orhionkpaiyo, B. C]
Course Aim:
Algorithms (along with data structures) are the fundamental “building blocks”
from which programs are constructed. Only by fully understanding them is it
possible to write very effective programs. An algorithmic solution to a
computational problem will usually involve designing an algorithm and then
analysing its performance. Algorithmics is the branch of computer science
concerned with designing and analyzing computer algorithms. This course
therefore introduces the basics of the analysis of the time and space complexity
of some basic algorithms and the notations for describing them. The goal is to
give students fundamental knowledge of common algorithms and the tools to
deal with a wide variety of computational problems.
Objectives
Course Outline
o Algorithm Complexity
o Methods of estimating the runtime of algorithms
o Asymptotic notations
o Runtime classification of algorithms
o Time & Space trade-offs in Algorithm Analysis
3. Non-recursive Algorithms
o Linear search
o Binary Search
o Bubble sort
o Selection sort
o Insertion sort
4. Greedy Algorithms
i. Quicksort
Textbooks:
Contacts:
Grading:
Lecture Schedule
Week Topic
1 Students’ registration
Algorithms are used for calculation, data processing, and many other fields.
Algorithms are generally created independent of underlying languages, i.e. an
algorithm can be implemented in more than one programming language.
Breaking a problem into a sequence of discrete steps and decision points
reduces the task into a series of smaller steps of more manageable size.
Three reasons for using algorithms are efficiency, abstraction and reusability.
Efficiency: Certain types of problems, like sorting, occur often in
computing. Efficient algorithms must be used to solve such problems
considering the time and cost factor involved in each algorithm.
Pseudocode and flowcharts are structured ways to express algorithms that avoid
many ambiguities common in natural language statements, while remaining
independent of a particular implementation language.
A: Natural Language
1. Initialize sum
2. Repeat steps 3 and 4 ten times
3. Read number
4. Add number to sum
5. Print sum
B. Pseudo code
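One possible rendering of the same algorithm in Python (an illustrative sketch;
the variable names are assumptions):

total = 0                      # 1. initialize sum
for _ in range(10):            # 2. repeat the next two steps ten times
    number = int(input())      # 3. read number
    total += number            # 4. add number to sum
print(total)                   # 5. print sum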
C. Flowchart Representation
[Flowchart: Start → Read Num → More? (Yes: loop back to Read Num; No: Print Sum → Stop)]
B. Using Pseudocode
Exercises
"Time" can mean the number of memory accesses performed, the number of
comparisons between integers, the number of times some inner loop is
executed, or some other natural unit related to the amount of real time the
algorithm will take.
The choice of the unit of time depends on the elementary operation, i.e. the
most relevant or important operation of the algorithm. Typical elementary
operations include:
o Comparisons
o Additions
o Multiplications
Indeed, for small values of n, most functions will be very similar in running
time. Only for sufficiently large n do differences in running time become
apparent.
Types of Complexity
Worst case: among all inputs of the same size, what is the maximum
running time? Computer scientists mostly use worst-case analysis.
For example, in a linear search, the worst case is when the target item is
at the end of the list or not in the list at all. Then the algorithm must visit
every item and perform n iterations for a list of size n. Thus, the worst-
case complexity of a linear search is O(n).
There are four reasons why algorithms are generally analyzed by their worst
case:
Many algorithms perform to their worst case a large part of the time. For
example, the worst case in searching occurs when we do not find what
we are looking for at all. This frequently happens in database
applications.
The best case is not very informative because many algorithms perform
exactly the same in the best case. For example, nearly all searching
algorithms can locate an element in one inspection at best, so analyzing
this case does not tell us much.
To determine the average case, for example in a linear search, add the
number of iterations required to find the target at each possible position
and divide the sum by n. Thus, the algorithm performs
(n + (n − 1) + (n − 2) + ... + 1)/n = (n + 1)/2 iterations on average.
For example, in a linear search, the best case is when the algorithm finds
the target at the first position, after making one iteration, for an O(1)
complexity.
One way to measure the time cost of an algorithm is to use the computer’s clock
to obtain an actual run time. This process is called benchmarking or profiling
o It starts by determining the time for several different data sets of the
same size and then calculates the average time.
o Next, similar data are gathered for larger and larger data sets. After
several such tests, enough data are available to predict how the algorithm
will behave for a data set of any size.
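A minimal benchmarking harness along these lines might look as follows (a
sketch; the hypothetical work() function stands in for the algorithm under test):

import time
import random

def work(data):                        # hypothetical algorithm under test
    return sorted(data)

for size in [1000, 2000, 4000, 8000]:  # larger and larger data sets
    trials = []
    for _ in range(5):                 # several data sets of the same size
        data = [random.random() for _ in range(size)]
        start = time.time()
        work(data)
        trials.append(time.time() - start)
    print(size, sum(trials) / len(trials))   # average time for this size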
The program uses the time() function to track the running time.
This function returns the current clock reading in seconds, so the difference
between two readings gives the elapsed time, e.g. (in Python):

import time
start = time.time()            # clock reading before the algorithm runs
# ... the algorithm ...
elapsed = time.time() - start  # seconds the algorithm took
Also, the running time of a program varies with the type of operating
system that lies between it and the hardware.
b. Counting Instructions
Keep in mind, however, that when you count instructions, you are
counting the instructions in the high-level code in which the algorithm is
written, not instructions in the executable machine language program.
When analyzing an algorithm in this way, you distinguish between two classes
of instructions:
i. Instructions that are executed the same number of times regardless of the
problem size, e.g. simple assignments outside loops
ii. Instructions whose execution count varies with the problem size,
e.g. instructions within loops
Note: The first class of instructions does not contribute significantly to
algorithm analysis. The instructions in the second class are normally found in
loops or recursive functions. In the case of loops, the focus is on the
instructions performed in any nested loops or, more simply, just the number of
iterations that a nested loop performs.
Because the most important factor affecting running time is normally size of the
input, for a given input size n we often express the time T to run the algorithm
as a function of n, written as T(n).
The running time of an algorithm can also be measured without counting the
exact number of steps of a program, by examining how that number grows with
the size of the input. This approach is based on the asymptotic complexity
measure.
Asymptotic analysis refers to the study of an algorithm as the input size “gets
big” or reaches a limit. Asymptotic notation gives us a method for classifying
functions according to their rate of growth.
Primarily we are interested only in the growth rate of f, which describes how
quickly the algorithm’s performance will degrade as the size of data it processes
becomes arbitrarily large.
(i) Upper bound: It indicates the upper or highest growth rate that the
algorithm can have. A special notation, called big-Oh notation is adopted. It is
usually written in relation to the input size such as O(n). The O(n) is read as
“Oh of n” or “big Oh of n”. The Ο(n) measures the worst case time complexity
or the longest amount of time an algorithm can possibly take to complete.
For f(n) a non-negatively valued function, f(n) is in set O(g(n)) if there exist
two positive constants c and n0 such that f(n) ≤ cg(n) for all n > n0.
In other words,
The definition says that for all inputs of the type in question (such as the worst
case for all inputs of size n) that are large enough (i.e., n > n0), the algorithm
always executes in less than cg(n) steps for some constant c.
i. f(n) is the algorithm’s actual resource requirement (e.g. its running
time) as a function of the input size n.
ii. g(n) is some expression for the upper bound (i.e. g(n) is an arbitrary
time complexity you are trying to relate to your algorithm).
iii. The constant n0 is the smallest value of n for which the claim of an upper
bound holds true.
It should be emphasized, however, that this does not mean that the running time
is always as large as cg(n), even for large input sizes. Thus, the O-notation
provides an upper bound on the running time
The most important property is that big-O gives an upper bound only. If an
algorithm is O(n2), it doesn’t have to take n2 steps (or a constant multiple of n2),
but it can’t take more than a constant multiple of n2. So any algorithm that is
O(n) is also an O(n2) algorithm. Think of big-O as being like "<": any number
that is < n is also < n2.
Here are some rules that help simplify functions by omitting dominated terms:
drop lower-order terms and constant factors, keeping only the fastest-growing
term. For example, f(n) = 3n2 + 10n + 6 simplifies to O(n2); the term n2 is the
one that accounts for most of the running time as n grows arbitrarily large.
(ii) Lower bound – the big-Omega notation Ω: For f(n) a non-negatively valued
function, f(n) is in set Ω(g(n)) if there exist two positive constants c and n0
such that f(n) ≥ cg(n) for all n > n0.
Ω(g(n)) is the set of all functions with a larger or same order of growth as
g(n).
(iii) The big-Theta notation - Θ: When the upper and lower bounds are the
same within a constant factor, we indicate this by using Θ (big-Theta) notation.
Θ(g(n)) is the set of all functions with the same order of growth as g(n).
Θ(g(n)) = {f(n): there exist positive constants c1, c2, and n0 such that
0 ≤ c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ n0}.
Constant time (1): The running time does not depend on the input size; the
algorithm performs a fixed number of operations.
Examples: (i):
(ii):
Note: Each of the assignment statements takes constant time. The for loop in
the second example involves a fixed number of operations and hence also
requires constant time.
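For instance (an illustrative sketch, not the original examples), both of the
following fragments run in constant time:

x = 5                  # (i) assignment statements: constant time each
y = x + 10

for i in range(3):     # (ii) a loop with a fixed number of iterations
    y += i             # also constant overall: 3 iterations regardless of input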
Logarithmic (log n): Implies that the running time of the algorithm is
proportional to the logarithm of the input size. The algorithm of this type does
not use the whole input.
x = logb n means bx = n; in particular, n = 2^k means k = log2 n.
What is important here is that the running time grows in proportion to the
logarithm of the input size (in this case, log to the base 2).
sum = 0;
for (k = 1; k <= n; k *= 2)   // executes about log n times
    sum++;
If n is 8, the loop body runs about log2(8) = 3 times.
Linear (n): The running time grows in direct proportion to the input size. The
loop executes N times, so the total time is N × O(1), which is O(N).
Examples:
(i)
for (int i = 0; i < data.Length; i++)
{
    if (data[i] == find)
        return i;
}
(ii):
(iii):
o Traversing an array.
o Sequential/Linear search in an array
Example 1:
Example 2:
sum = 0;
for (k = 1; k <= n; k *= 2)    // executes log n times
    for (j = 1; j <= n; j++)   // executes n times
        sum++;
The inner loop executes n times while the outer loop executes log n + 1 times,
because on each iteration k is multiplied by two until it exceeds n. The total
runtime is therefore O(n log n).
Quadratic time (n2): The number of operations is proportional to the square of
the size of the task; that is, the running time of the algorithm on an input of
size n is bounded by a quadratic function of n. It is represented as O(n2) and
referred to as quadratic time.
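A typical quadratic fragment is a doubly nested loop over the input, as in this
illustrative sketch:

n = 100
count = 0
for i in range(n):         # outer loop: n iterations
    for j in range(n):     # inner loop: n iterations per outer iteration
        count += 1         # executes n * n = n^2 times, so O(n^2)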
Examples:
(i):
(ii):
Loop2 runs a constant number of times (exactly once) every loop1 iteration
and does not affect the time complexity (although there are three for loops,
only two affect the running time, since they depend on the input size). We can
carefully trace the values of j in each loop1/loop2 iteration to find the number
of times the innermost loop iterates.
(iii):
What is the running time for this code fragment? Clearly it takes longer to run
when n is larger.
Cubic (n3): Similarly, an algorithm that process triples of data items (perhaps in
a triple–nested loop) has a cubic running time. Whenever n doubles, the running
time increases eight fold.
There are 3 nested for loops, each of which runs n times. The innermost loop
therefore executes n*n*n = n3 times. The innermost statement, which
contains a scalar sum and product takes constant O(1) time. So the algorithm
overall takes O(n3) time.
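The fragment described above has the shape of naive matrix multiplication; a
sketch, assuming n × n matrices A and B:

n = 3
A = [[1] * n for _ in range(n)]
B = [[2] * n for _ in range(n)]
C = [[0] * n for _ in range(n)]
for i in range(n):                         # three nested loops, each n times
    for j in range(n):
        for k in range(n):                 # innermost body runs n^3 times
            C[i][j] += A[i][k] * B[k][j]   # one scalar sum and product: O(1)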
Exponential (2n): Few algorithms with exponential running time are
appropriate for practical use, but such algorithms arise naturally as
“brute–force” solutions to problems. Whenever n doubles, the running time
squares.
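As a sketch of why brute force is often exponential: enumerating every subset
of n items examines 2^n candidates.

from itertools import combinations

items = [1, 2, 3, 4]
subsets = []
for r in range(len(items) + 1):             # subsets of every size
    subsets.extend(combinations(items, r))  # 2^n subsets in total
print(len(subsets))                         # 16 for n = 4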
The graph below shows the growth of the algorithm running time with given
input size. The growth rate for an algorithm is the rate at which the cost of the
algorithm grows as the size of its input grows
Exercise
Using the inputs in the table below, compute the growth rates of three
algorithms with log n, n log n and n2 runtimes respectively, and plot a graph
to show the growth rate of each algorithm.

Input size    log n    n log n    n2
256
1024
2048
4096
8192
3. Non-recursive Algorithms
o Using a simple for loop to display the numbers from one to ten is an
iterative process.
Examples:
Algorithm F(n)
    if n = 0 then return 1     // base case
    else return F(n-1) * n     // recursive call
Algorithm Binary(n)
    count := 1;
    while n > 1 do
        count := count + 1;
        n := n/2;
    end
    return count;
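The same two algorithms in Python (a direct translation, offered as a sketch):

def F(n):                  # recursive factorial
    if n == 0:
        return 1           # base case
    return F(n - 1) * n    # recursive call

def binary(n):             # number of bits in the binary representation of n
    count = 1
    while n > 1:
        count += 1
        n //= 2            # integer halving
    return count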
NOTE: The core of the algorithm analysis is to find out how the number of the
basic operations depends on the size of the input.
In the following search algorithm, there are six basic operations which are
marked with a box and labelled 1 – 6.
Rule 1: for loops – the number of iterations × the running time of the body.
The running time of a for loop is at most the running time of the statements
inside the loop multiplied by the number of iterations.
Example:
a). Find the running time of statements when executed only once:
The statement in the loop body has fixed number of operations; hence it
has a constant running time when executed only once.
Rule 2: Nested loops – the product of the sizes of the loops × the running time
of the body.
The total running time is the running time of the inside statements times the
product of the sizes of all the loops.
Applying Rule 1 to the inner (‘j’) loop, we get O(n) for the body of
the outer loop.
The outer loop runs n times, therefore the total time for the nested loops is
O(n · n) = O(n2).
Running time is the product of the size of the loops times the running time of the
body.
Example:
sum = 0;
for (i = 0; i < n; i++)
    for (j = 0; j < 2*n; j++)
        sum++;
We have one operation inside the loops, and the product of the sizes is
n × 2n = 2n2, which is O(n2) once the constant factor is dropped.
Note: if the body contains a function call, its running time has to be taken into
consideration.
sum = 0;
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        sum = sum + function(sum);
Assume that the running time of function(sum) is known to be log n. Then the
total running time is n × n × log n, i.e. O(n2 log n).
Rule 3: Consecutive program fragments – the total running time is the
maximum of the running times of the individual fragments.
sum = 0;
for (i = 0; i < n; i++)      // runs in O(n) time
    sum = sum + i;

sum = 0;
for (i = 0; i < n; i++)
    for (j = 0; j < 2*n; j++)
        sum++;

The first loop runs in O(n) time, the second in O(n2) time; the maximum is O(n2).
Rule 4: If statement
if C
    S1;
else
    S2;
The running time is the maximum of the running times of S1 and S2.
1. The running time of each assignment, read, and write statement can usually
be taken to be O(1).
Consecutive statements: S1; S2; ...; SN. The runtime R of the sequence is the
runtime of the statement with the maximum runtime: R = max(R1, R2, ..., RN).
4. The time to execute a loop is the sum, over all iterations of the loop, of the
time to execute the body and the time to evaluate the condition for termination
(usually the latter is O(1)).
Often this time is the product of the number of times around the loop and
the largest possible time for one execution of the body, but we must
consider each loop separately to be sure.
Count inside out: the runtime is the max of the runtime of each statement
multiplied by the total number of times each statement is executed
j = p[j]*3;
The Algorithm
Algorithm linearSearch
Input: An array A[1..n] of n elements and an element x.
Output: j if x = A[j], 1 ≤ j≤ n, and 0 otherwise.
1. j ←1
2. while (j < n) and (x ≠ A[j])
3. j←j+1
4. end while
5. if x = A[j] then return j else return 0
The algorithm scans the entries in A and compares each entry with x.
If the item is not in the list, the only way to know it is to compare it against
every item present.
o In the best case we will find the item in the first place we look, at the
beginning of the list. We will need only one comparison (i.e. O(1)
running time for the best case).
o In the worst case, we will not discover the item until the very last
comparison, the nth comparison.
o Worse still, the item may not be in the list at all.
The worst case is when there are no matching elements or the first matching
element is the last one on the list. In this case, the algorithm makes the largest
number of key comparisons among all possible inputs of size n.
Thus:
C(n) = n.
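A direct Python translation of linearSearch (0-indexed, so it returns -1 rather
than 0 for "not found", since 0 is a valid index in Python):

def linear_search(A, x):
    for j, item in enumerate(A):   # scan the entries, comparing each with x
        if item == x:
            return j               # found at position j
    return -1                      # not found: all n comparisons were made

print(linear_search([7, 3, 9, 4], 9))   # prints 2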
l ← 0; r ← n − 1
while l ≤ r do
    m ← ⌊(l + r)/2⌋
    if K = A[m]
        return m
    else if K < A[m]
        r ← m − 1
    else
        l ← m + 1
return −1
Equivalently, with Low = 0 and High = N − 1 initially, each probe sets either
Low = Mid + 1 (to search the right half) or High = Mid − 1 (to search the left
half).
Illustration:
Compare X with middle item A[mid], go to left half if X < A[mid] and
right half if X > A[mid]. Repeat.
The standard way to analyze the efficiency of binary search is to count the
number of times the search key is compared with an element of the array.
o The worst-case inputs include all arrays that do not contain a given
search key, as well as some successful searches.
Recall that each comparison eliminates about half of the remaining items from
consideration.
o If we start with n items, about n/2 items will be left after the first
comparison. After the second comparison, there will be about n/4 then,
n/8, n/16, and so on.
o When we split the list enough times, we end up with a list that has just
one item. Either that is the item we are looking for or it is not. Either
way, we are done.
If x is the number of halvings needed to reduce N items to one, then
2^x = N, so x = log2 N.
Detailed Analysis
Worst case: the last iteration occurs when n/2^k ≥ 1 and n/2^(k+1) < 1, i.e.
when exactly one item remains; this gives k = ⌊log2 n⌋, so about log2 n + 1
comparisons in total.
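A Python sketch of the iterative binary search above (assumes A is sorted in
ascending order):

def binary_search(A, key):
    l, r = 0, len(A) - 1
    while l <= r:
        m = (l + r) // 2     # middle index
        if A[m] == key:
            return m
        elif key < A[m]:
            r = m - 1        # continue in the left half
        else:
            l = m + 1        # continue in the right half
    return -1                # key not present

print(binary_search([2, 5, 8, 12, 16], 12))   # prints 3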
If the number of objects is small enough to fit into the main memory,
sorting is called internal sorting.
If the number of objects is so large that some of them reside on external
storage during the sort, it is called external sorting.
Bubble sort
Insertion sort
Selection sort
The first iteration of the inner for loop moves through the record array
from bottom to top, comparing adjacent keys. If the lower-indexed key’s
value is greater than its higher-indexed neighbour, then the two values
are swapped. Once the smallest value is encountered, this process will
cause it to “bubble” up to the top of the array.
The second pass through the array repeats this process. However,
because we know that the smallest value reached the top of the array on
the first pass, there is no need to compare the top two elements on the
second pass. Likewise, each succeeding pass through the array compares
adjacent elements, looking at one less value than the preceding pass.
The action of the algorithm on the list 42, 20, 17, 13, 28, 14, 23, 15 is illustrated
below.
The above is an illustration of Bubble Sort. Each column shows the array after
the iteration with the indicated value of i in the outer for loop. Values above the
line in each column have been sorted. Arrows indicate the swaps that take place
during a given iteration.
The array is scanned from the bottom up, and two adjacent elements are
interchanged if they are found to be out of order with respect to each other.
First, items data[n-1] and data[n-2] are compared and swapped if they
are out of order. Next, data[n-2] and data[n-3] are compared, and their
order is changed if necessary, and so on up to data[1] and data[0]. In this
way, the smallest element is bubbled up to the top of the array.
However, this is only the first pass through the array. The array is
scanned again comparing consecutive items and interchanging them
when needed, but this time, the last comparison is done for data[2] and
data[1] because the smallest element is already in its proper position,
namely, position 0.
The second pass bubbles the second smallest element of the array up to
the second position, position 1. The procedure continues until the last
pass when only one comparison, data[n-1] with data[n-2], and
possibly one interchange are performed.
The sorting activities of the Bubble sort algorithm are summarized as follows:
Version 2
Bubble Sort (Array A[])
For i = 1 to n – 1
    For j = 1 to n – i
        If (A[j] > A[j+1]) then
            Swap(A[j], A[j+1])
The innermost statement, the if, takes O(1) time. It doesn’t necessarily
take the same time when the condition is true as it does when it is false,
but both times are bounded by a constant.
The outer loop executes n − 1 times, but the inner loop executes a number
of times that depends on i: n − 1 times on the first pass, n − 2 on the second,
and so on. The total number of comparisons is therefore
(n − 1) + (n − 2) + ... + 1 = (n2 − n)/2, i.e. O((n2 − n)/2).
Using the rules for big-O given earlier, this bound simplifies to O((n2)/2)
by ignoring a smaller term, and to O(n2), by ignoring a constant factor.
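The same version in Python (an in-place sketch; the inner loop shrinks by one
on each pass):

def bubble_sort(A):
    n = len(A)
    for i in range(n - 1):              # n - 1 passes
        for j in range(n - 1 - i):      # one fewer comparison each pass
            if A[j] > A[j + 1]:
                A[j], A[j + 1] = A[j + 1], A[j]   # swap the out-of-order pair
    return A

print(bubble_sort([42, 20, 17, 13, 28, 14, 23, 15]))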
The selection sort improves on the bubble sort by making only one exchange
for every pass through the list.
In order to do this, a selection sort looks for the largest value as it makes
a pass and, after completing the pass, places it in the proper location. As
with a bubble sort, after the first pass, the largest item is in the correct
place.
After the second pass, the next largest is in place. This process continues
and requires n−1 passes to sort n items, since the final item must be in
place after the (n−1)st pass.
The Selection sort performs sorting by searching for the minimum value and
placing it into the first or last position according to the order (ascending or
descending). The process of searching for the minimum key and placing it in
the proper position is continued until all the elements are placed in the right
positions.
• For each position, find the element that belongs there and put it in place
by swapping it with the element that’s currently there
The figure shows the entire sorting process. On each pass, the largest remaining
item is selected and then placed in its proper location. The first pass places 93,
the second pass places 77, the third places 55, and so on.
The Algorithm
SelectionSort
Input: An array A[1..n] of n elements.
Output: A[1..n] sorted in non-decreasing order.
1. for i ← 1 to n − 1
2.     k ← i
3.     for j ← i + 1 to n    // find the ith smallest element
4.         if A[j] < A[k] then k ← j
5.     end for
6.     if k ≠ i then interchange A[i] and A[k]
7. end for
The number of key comparisons is
C(n) = (n − 1) + (n − 2) + ... + 2 + 1 = n(n − 1)/2 = n2/2 − n/2.
Ignoring the lower-order term and the constant factor, C(n) is O(n2).
After each of the n-1 passes to find the smallest remaining element, the
algorithm performs a swap to put the element in place.
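A Python sketch of SelectionSort (0-indexed; it finds the minimum of the
unsorted part and performs at most one swap per pass):

def selection_sort(A):
    n = len(A)
    for i in range(n - 1):
        k = i
        for j in range(i + 1, n):    # find the smallest remaining element
            if A[j] < A[k]:
                k = j
        if k != i:
            A[i], A[k] = A[k], A[i]  # one swap per pass
    return A

print(selection_sort([93, 77, 55, 26, 44]))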
Insertion Sort iterates through a list of records. Each record is inserted in turn at
the correct position within a sorted list composed of those records already
processed.
Insertion sort conceptually divides the array into two parts, one holding the
data already sorted and the other the data still unsorted.
The primary concept behind insertion sort is to insert each item into its
appropriate place in the final list.
One way to think about this is to imagine that you have a stack of phone
bills from the past two years and that you wish to organize them by date.
Taking each bill in turn, you would add it to the sorted pile that you have
already made.
Below is an illustration of Insertion Sort. Each column shows the array after the
iteration with the indicated value of i in the outer for loop. Values above the
line in each column have been sorted. Arrows indicate the upward motions of
records through the array.
The Algorithm
Algorithm insertionSort(A, n)
Input: array A of n integers
Output: the sorted array A
1. for i ← 1 to n − 1 do
2.     x ← A[i]
3.     for j ← i − 1 downto 0 do
4.         if x < A[j] then A[j+1] ← A[j]
5.         else break
6.     A[j+1] ← x
7. return A
Algorithm insertionSort(A, n)             # operations    total # operations
Input: array A of n integers
Output: the sorted array A
for i ← 1 to n − 1 do                     0               0
    x ← A[i]                              1               n − 1
    for j ← i − 1 downto 0 do             0               0
        if x < A[j] then A[j+1] ← A[j]    2               (n − 1)n
        else break                        0               0
    A[j+1] ← x                            1               n − 1
return A                                  1               1
Total                                                     n2 + n − 1
There are n − 1 passes to sort n items; the iteration starts at position 1 and
moves through position n − 1. The maximum number of comparisons for an
insertion sort is the sum of the first n − 1 integers, n(n − 1)/2, which is O(n2).
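A Python sketch of insertionSort (0-indexed, as in the listing above):

def insertion_sort(A):
    for i in range(1, len(A)):
        x = A[i]                     # next record to insert
        j = i - 1
        while j >= 0 and x < A[j]:   # shift larger elements to the right
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = x                 # insert x into its proper place
    return A

print(insertion_sort([42, 20, 17, 13, 28, 14, 23, 15]))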
If the inclusion of the next input into the partially constructed optimal
solution would result in an infeasible solution, then this input is not added
to the partial solution.
On each step (and this is the central point of this technique) the choice made
must be:
o feasible, i.e. it has to satisfy the problem’s constraints;
o locally optimal, i.e. it has to be the best local choice among all feasible
choices available on that step;
o irrevocable, i.e. once made, it cannot be changed on subsequent steps of
the algorithm.
For the greedy strategy to work well, it is necessary that the problem under
consideration has two characteristics:
o the greedy-choice property: a globally optimal solution can be assembled
by making locally optimal (greedy) choices; and
o optimal substructure: an optimal solution to the problem contains within
it optimal solutions to its sub-problems.
To apply the greedy technique to a problem, we must take into consideration the
following:
Greedy algorithms can be applied in wide range of problems which include the
following problems:
o Scheduling a hall for lectures (only one course can hold at a time) when
several groups want to use it.
o Renting out some piece of equipment to different people is another
example.
o Others include requests for boats to use a repair facility while they are in
port, and planning weekend schedule.
Suppose we are given a set of n lectures, where lecture j starts at time sj and
ends at time fj. The goal is to use the minimum number of classrooms to
schedule all lectures so that no two occur at the same time in the same room.
As an illustration of the problem, consider the sample instance
(b) A solution in which all lectures are scheduled using 3 classrooms: each row
represents a set of lectures that can all be scheduled in a single classroom.
Consider a set of requests for a room. Only one person can reserve the
room at a time, and you want to allow the maximum number of requests.
(1,4),(3,5),(0,6),(5,7),(3,8),(5,9),(6,10),(8,11),(8,12),(2,13),(12,14)
iv. Earliest finish time: Considers jobs in ascending order of finish time fi.
This rule is mostly used because it yields an optimal choice: it leaves the
resource available for as many other activities as possible.
Example:
i a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11
si 1 3 0 5 3 5 6 8 8 2 12
fi 4 5 6 8 8 9 10 11 12 13 14
Exercise: Using the above table, determine the possible sets of mutually
compatible activities by applying the first three approaches above.
Points to remember
For this algorithm we have a list of activities with their starting time and
finishing time.
Our goal is to select the maximum number of non-conflicting activities that
can be performed by a person or a machine, assuming that the person or
machine involved can work on a single activity at a time.
Any two activities are non-conflicting if the starting time of one activity
is greater than or equal to the finishing time of the other activity.
In order to solve this problem we first sort the activities as per their
finishing time in ascending order.
Then we select non-conflicting activities.
Problem
Consider the following 8 activities with their starting and finishing time.
Activity a1 a2 a3 a4 a5 a6 a7 a8
start 1 0 1 4 2 5 3 4
finish 3 4 2 6 9 8 5 5
Sorted Activity a3 a1 a2 a7 a8 a4 a6 a5
start 1 1 0 3 4 4 5 2
finish 2 3 4 5 5 6 8 9
Step 3: select next activity whose start time is greater than or equal to the finish
time of the previously selected activity
Pseudo Code
Greedy_activity_selector (s, f)
1. sort the activities by non-decreasing finishing time
2. n ← length[s]
3. A ← {1}
4. j ← 1
5. for i ← 2 to n
6.     if si ≥ fj then
7.         A ← A ∪ {i}
8.         j ← i
9. return A
The set A collects the selected activities. The variable j specifies the most recent
addition to A. Since the activities are considered in order of non-decreasing
finishing time, fj is always the maximum finishing time of any activity in A.
Running time of line 1 in the above depends on the sort algorithm used.
With merge sort algorithm, it requires O(n log n) time.
O(n) for the greedy collection of activities. (The algorithm takes as
input a set of activities S, each stored as a pair of numbers (si, fi), and
the total number of activities n.)
Total running time is O(n log n) + O(n) = O(n log n)
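A Python sketch of the selector (activities given as (start, finish) pairs; the
sort by finish time is the O(n log n) step):

def greedy_activity_selector(activities):
    activities = sorted(activities, key=lambda a: a[1])  # non-decreasing finish
    selected = [activities[0]]
    last_finish = activities[0][1]
    for s, f in activities[1:]:
        if s >= last_finish:         # compatible with the last selected activity
            selected.append((s, f))
            last_finish = f
    return selected

reqs = [(1,4),(3,5),(0,6),(5,7),(3,8),(5,9),(6,10),(8,11),(8,12),(2,13),(12,14)]
print(greedy_activity_selector(reqs))   # picks (1,4), (5,7), (8,11), (12,14)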
A Cycle is a sequence of nodes and edges without repetitions other than the
starting and ending nodes. Example
[Figure: two small graphs on vertices 1, 2 and 3, illustrating a cycle.]
A spanning tree for a connected graph is a tree whose vertex set is the same as
the vertex set of the given graph, and whose edge set is a subset of the edge set
of the given graph. i.e., any connected graph will have a spanning tree.
Weight of a spanning tree w (T) is the sum of weights of all edges in T. The
Minimum spanning tree (MST) is a spanning tree with the smallest possible
weight.
The following diagrams illustrate a graph and the possible spanning trees from
the graph.
A greedy method to obtain the minimum spanning tree would construct the tree
edge by edge, where each edge is chosen according to some optimization
criterion. An obvious criterion is to choose an edge which adds a minimum
weight to the total weight of the edges selected so far. There are two ways in
which this criterion can be achieved.
i. The set of edges selected so far always forms a tree; the next edge to be
added is such that it not only adds a minimum weight but also forms a tree
with the previous edges. It can be shown that the algorithm results in a
minimum-cost tree; this algorithm is called Prim’s algorithm.
ii. The edges are considered in non-decreasing order of weight; the set T
of edges at each stage is such that it is possible to complete T into a tree;
thus T may not be a tree at all stages of the algorithm. This also results in
a minimum-cost tree; this algorithm is called Kruskal’s algorithm.
The algorithm grows the spanning tree starting from an arbitrary vertex. Let G =
(V,E), where for simplicity V is taken to be the set of integers {1, 2,…,n}. The
algorithm begins by creating two sets of vertices: X = {1} and Y = {2, 3,…,n}. It
then grows a spanning tree, one edge at a time. At each step, it finds an edge
(x, y) of minimum weight with x ∈ X and y ∈ Y, adds it to the tree, and moves y
from Y to X.
1. Create an MST set that keeps track of the vertices already included in the
MST.
2. Assign a key value to every vertex in the input graph. Initialize all key
values as INFINITE (∞), and assign key value 0 to the first vertex so that
it is picked first.
3. While MST set doesn't include all vertices.
1. Pick the vertex u which is not in the MST set and has the minimum key
value, and include u in the MST set.
2. Update the key values of all vertices adjacent to u: for every adjacent
vertex v, if the weight of edge (u, v) is less than the previous key value
of v, set the key value of v to the weight of (u, v).
Step 1:
The vertex incident to the edge of least weight is usually selected as the
starting point.
Step 2:
Find all the edges that connect the tree to new vertices.
Find the least-weight edge among those edges and include it in the
existing tree.
If including that edge creates a cycle, then reject it and look for the
next least-weight edge.
Keep repeating Step 2 until all the vertices are included and the Minimum
Spanning Tree (MST) is obtained.
The Algorithm
{N[y] is neighbour of y}
{C[y] is cost of edge connecting y}
{Vertex y not adjacent to 1}
It follows that the time complexity of the algorithm is O(m + n2) = O(n2).
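A Python sketch of Prim’s algorithm following the key-value description above
(adjacency-matrix input; INF marks a missing edge; all names are illustrative).
Scanning all vertices on every step gives the O(n2) bound just quoted:

INF = float('inf')

def prim_mst(W):                       # W[u][v] = weight of edge (u, v), INF if absent
    n = len(W)
    in_mst = [False] * n
    key = [INF] * n                    # cheapest known cost of connecting each vertex
    parent = [-1] * n
    key[0] = 0                         # start from vertex 0
    for _ in range(n):
        u = min((v for v in range(n) if not in_mst[v]), key=lambda v: key[v])
        in_mst[u] = True               # pick the minimum-key vertex not yet in the MST
        for v in range(n):             # update the keys of u's neighbours
            if not in_mst[v] and W[u][v] < key[v]:
                key[v] = W[u][v]
                parent[v] = u
    return [(parent[v], v) for v in range(1, n)]   # the n - 1 MST edges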
Consider the weighted graph below. The first edge that is added is (1, 2),
followed by edges (1, 3), (4, 6), and (5, 6). Notice that edge (2, 3) creates a
cycle as shown in figure f, and hence is discarded. For the same reason edge (4,
5) is also discarded, as shown in figure g. Finally, edge (3, 4) is included, which
results in the minimum spanning tree (V, T) shown in fig. (h).
The Algorithm
Time complexity
We analyze the time complexity of the algorithm as follows:
Steps 1 and 2 cost O(m log m) and Θ(n), respectively, where m = │E│.
Step 6 costs Θ(1), and since it is executed at most m times, its total cost
is O(m).
Step 7 is executed exactly n - 1 times for a total of Θ(n) time.
The union operation is executed n – 1 times
The find operation is executed at most 2m times.
The overall cost of these two operations (i.e. union and find) is O(m log*
n).
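A Python sketch of Kruskal’s algorithm with a simple union-find (path
compression only; the input is an edge list, and all names are illustrative):

def kruskal_mst(n, edges):             # edges: list of (weight, u, v) triples
    parent = list(range(n))
    def find(x):                       # find the set representative of x
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x
    mst = []
    for w, u, v in sorted(edges):      # non-decreasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # the edge joins two different trees
            parent[ru] = rv            # union
            mst.append((u, v, w))
        # otherwise the edge would create a cycle and is discarded
    return mst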
The 0–1 knapsack problem is posed as follows. A thief robbing a store finds n
items; the ith item is worth vi dollars and weighs wi pounds, where vi and wi are
integers. He wants to take as valuable a load as possible, but he can carry at
most W pounds in his knapsack for some integer W. Which items should he
take? (This is called the 0–1 knapsack problem because each item must either
be taken or left behind; the thief cannot take a fractional amount of an item or
take an item more than once, e.g. a TV set.)
In the fractional knapsack problem, the setup is the same, but the thief can take
fractions of items, rather than having to make a binary (0–1) choice for each
item. You can think of an item in the 0–1 knapsack problem as being like a gold
ingot (mass of metal), while an item in the fractional knapsack problem is more
like gold dust.
o For the 0–1 problem, consider the most valuable load that weighs at most
W pounds. If we remove item j from this load, the remaining load must
be the most valuable load weighing at most W - wj that the thief can take
from the n - 1 original items excluding j.
However, only the fractional knapsack problem has the greedy choice property.
Thus, the fractional knapsack problem is solvable by a greedy strategy, whereas
the 0–1 problem is not (but solvable by dynamic programming). Therefore, our
focus is the fractional knapsack problem.
Given a list of n objects {I1, I2,…, In} and a knapsack (or bag) with
capacity M. Each object Ii has a weight wi and a profit pi. If a fraction xi
(where 0 ≤ xi ≤ 1) of object Ii is placed into the knapsack, then a profit of
pixi is earned.
Mathematically: maximize Σ pixi subject to Σ wixi ≤ M and 0 ≤ xi ≤ 1,
where pi and wi are the profit and weight of the ith object and xi is the
fraction of the ith object to be selected.
Example: Let n = 3, M = 20, profits (p1, p2, p3) = (25, 24, 15) and weights
(w1, w2, w3) = (18, 15, 10).
Solution
Strategy                  x1      x2      x3      Σ wixi    Σ pixi
I   (maximum profit)      1       2/15    0       20        28.2
II  (minimum weight)      0       2/3     1       20        31.0
III (maximum pi/wi)       0       1       1/2     20        31.5
Greedy strategy I: In this case, the items are arranged by their profit values.
Here the item with maximum profit is selected first. If the weight of the object
is less than the remaining capacity of the knapsack then the object is selected
full and the profit associated with the object is added to the total profit.
Otherwise, a fraction of the object is selected so that the knapsack can be filled
exactly. This process continues from selecting the highest profitable object to
the lowest profitable object till the knapsack is exactly full.
Thus, the first item has the maximum profit (25) and is selected first. After
taking this item (w = 18), the sack is left with only 2 units of capacity
(i.e. 20 – 18). Next, a fraction of the second item (profit = 24) is selected.
Since its weight, 15, exceeds the remaining capacity, only the fraction 2/15 of
it is taken. Now the knapsack is full (18 + 2 = 20), so the 3rd item is not
selected. Hence, the total profit = 25 + 24 × 2/15 = 28.2 units and the
solution set is (x1, x2, x3) = (1, 2/15, 0).
Greedy strategy II: In this case, the items are arranged by their weights. Here
the item with the minimum weight is selected first and the process continues as
in strategy I till the knapsack is exactly full.
Thus, the 3rd item (w = 10) and then the 2nd item are selected. After selecting
the 3rd item, 10 units of capacity remain, so only a fraction, 10/15 = 2/3, of
the 2nd item is taken. The total profit = 15 + 24 × 2/3 = 31.0 units and the
solution set is (x1, x2, x3) = (0, 2/3, 1).
Greedy strategy III: In this case, the items are arranged by their profit/weight
ratios, the item with the maximum ratio is selected first, and the process
continues as in strategy I till the knapsack is exactly full.
Using this approach, select first the objects with the maximum profit per unit
weight. Since p1/w1 = 25/18 ≈ 1.39, p2/w2 = 24/15 = 1.6 and p3/w3 = 15/10 = 1.5,
select the 2nd object first, then the 3rd object, followed by the 1st. Taking
the 2nd object in full (w = 15) leaves capacity 5, so the fraction 5/10 = 1/2
of the 3rd object is taken. The total profit is 31.5 (i.e. 24 + 7.5) and the
solution set is (x1, x2, x3) = (0, 1, 1/2), i.e. the weights selected are 0, 15
and 5, totalling 20.
Therefore, it is clear from the above strategies that the greedy method
generates an optimal solution when we select the objects by their
profit-to-weight ratios.
The Algorithm
Line 1 (sorting the objects by profit/weight ratio) takes O(n log n) time,
which is the most efficient runtime possible for comparison-based sorting
algorithms.
Line 6 (i.e. the while loop) takes O(n) time.
Therefore, the total time is O(n log n).
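A Python sketch of greedy strategy III for the fractional knapsack (sort by
profit/weight ratio, then fill):

def fractional_knapsack(profits, weights, M):
    order = sorted(range(len(profits)),
                   key=lambda i: profits[i] / weights[i], reverse=True)
    x = [0.0] * len(profits)           # fraction taken of each object
    remaining = M
    for i in order:
        if weights[i] <= remaining:    # the object fits entirely
            x[i] = 1.0
            remaining -= weights[i]
        else:                          # take just enough to fill the sack
            x[i] = remaining / weights[i]
            break
    total = sum(p * xi for p, xi in zip(profits, x))
    return x, total

print(fractional_knapsack([25, 24, 15], [18, 15, 10], 20))
# prints ([0.0, 1.0, 0.5], 31.5), matching strategy III above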
o The first task in the schedule begins at time 0 and finishes at time 1, the
second task begins at time 1 and finishes at time 2, and so on.
o Because each job takes the same amount of time, we will think of a
schedule S as consisting of a sequence of job “slots” 1, 2, 3, …, where
S(t) is the job scheduled in slot t.
The problem of scheduling unit-time tasks with deadlines and penalties for a
single processor has the following inputs:
We wish to find a schedule for S that minimizes the total penalty incurred for
missed deadlines.
Given is a set S of n tasks {a1, …, an} and a single processor. Each task
takes a unit time to execute, the tasks have deadlines d1, …, dn, and a task
incurs a non-negative penalty wi if it misses its deadline.
Goal: Find a schedule S where the total penalty of the tasks that miss their
deadline is minimized – i.e., the total penalty of the tasks that meet their
deadline is maximized
NOTE:
i. Sort all the given jobs in decreasing order of their profit.
ii. Check the value of the maximum deadline and draw a Gantt chart
whose maximum time is the value of the maximum deadline.
iii. Pick up the jobs one by one and place each on the Gantt chart as far
from 0 as possible, ensuring that the job gets completed before its
deadline.
Given the following six jobs, their deadlines and associated profits, we illustrate
the above strategy as follows:
Jobs J1 J2 J3 J4 J5 J6
Deadlines 5 3 3 2 4 2
Profits 20 18 19 30 12 10
Step 1: Sort all the given jobs in decreasing order of their profit
Jobs J4 J1 J3 J2 J5 J6
Deadlines 2 5 3 3 4 2
Profits 30 20 19 18 12 10
Step 2:
So, draw a Gantt chart with maximum time on Gantt chart = 5 units as shown
Gantt chart
Step 3:
4. Next, take job J2. Since its deadline is 3, we place it in the latest
empty cell no later than 3. The second and third cells are already
filled, so job J2 goes into the first cell.
5. Next, take job J5. Since its deadline is 4, we place it in the latest
empty cell no later than 4.
NOTE: The only job left is J6 whose deadline is 2. All the slots before deadline
2 are already occupied therefore J6 cannot be completed.
Thus the optimal schedule of the jobs is: J2, J4, J3, J5, J1
Total profit: 30 + 20 + 19 + 18 + 12 = 99
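A Python sketch of the Gantt-chart strategy (the slot array is indexed
1..max deadline; names are illustrative):

def job_sequencing(jobs):              # jobs: list of (name, deadline, profit)
    jobs = sorted(jobs, key=lambda j: j[2], reverse=True)   # by profit, descending
    max_d = max(d for _, d, _ in jobs)
    slot = [None] * (max_d + 1)        # slot[t] = job occupying time slot t
    for name, d, p in jobs:
        for t in range(d, 0, -1):      # latest free slot no later than the deadline
            if slot[t] is None:
                slot[t] = name
                break                  # job placed; if no slot is free it is dropped
    return [s for s in slot if s is not None]

jobs = [("J1",5,20),("J2",3,18),("J3",3,19),("J4",2,30),("J5",4,12),("J6",2,10)]
print(job_sequencing(jobs))            # prints ['J2', 'J4', 'J3', 'J5', 'J1']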
Solution:
Job i 1 2 3 4 5 6
Profit gi 20 15 10 7 5 3
Deadline di 3 1 1 3 1 3
Steps:
i. Here the jobs are already sorted in decreasing order of their profit.
ii. Let P = min(n, max deadline) = min(6, 3) = 3, the number of slots to fill.
Initially, all three slots are free:
F            0  1  2  3
Job selected 0  0  0  0
Job 1 (deadline 3) is placed in slot 3:
F            0  1  2  0
Job selected 0  0  0  1
Job 2 (deadline 1) is placed in slot 1; job 3 (deadline 1) then finds no free
slot and is rejected:
F            0  0  2  0
Job selected 0  2  0  1
Job 4 (deadline 3) is placed in slot 2; jobs 5 and 6 then find no free slot:
F            0  0  0  0
Job selected 0  2  4  1
The final optimal sequence: execute the jobs in the order 2, 4, 1, with total
profit 15 + 7 + 20 = 42.
The Algorithm
5. Divide-and-Conquer Algorithms
Divide-and-conquer algorithms solve a problem in three steps:
o they break the problem into several sub-problems that are similar to the
original problem but smaller in size,
o solve the sub-problems recursively, and
o then combine these solutions to create a solution to the original problem.
Combine the solutions to the sub-problems into the solution for the original
problem.
When an algorithm contains a recursive call to itself, its running time can often
be described by a recurrence. Recurrence is briefly explained in the following
section.
5.1 Recurrences
Like all recursive structures, a recurrence consists of one or more base cases
and one or more recursive cases. Each of these cases is an equation or
inequality, with some function value f (n) on the left side.
o The base cases give explicit values for a (typically finite, typically small)
subset of the possible values of n.
o The recursive cases relate the function value f (n) to function value f (k)
for one or more integers k < n;
o typically, each recursive case applies to an infinite number of
possible values of n.
For example, the following recurrence describes the identity function f(n) = n:
f(0) = 0
f(n) = f(n − 1) + 1 for every integer n ≥ 1
Here the first line is the only base case, and the second line is the only
recursive case.
Example 1:
T(n) = 3T(n/5) + 8n2
This recurrence has the form T(n) = aT(n/b) + cnk with a = 3, b = 5, c = 8 and
k = 2, and 3 < 52 = 25.
Applying case (3) of the theorem (a < bk gives T(n) = Θ(nk)), T(n) = Θ(n2).
Example 2:
ALGORITHM Quicksort(A[l..r])
//Sorts a sub-array by quicksort
//Input: Sub-array of array A[0..n − 1], defined by its left and right indices l and r
//Output: Sub-array A[l..r] sorted in non-decreasing order
if l < r
    s ← Partition(A[l..r])   //s is a split position
    Quicksort(A[l..s − 1])
    Quicksort(A[s + 1..r])
ALGORITHM HOARE-PARTITION(A, p, r)
1. x = A[p]
2. i = p − 1
3. j = r + 1
4. while true
5.     repeat
6.         j = j − 1
7.     until A[j] ≤ x
8.     repeat
9.         i = i + 1
10.    until A[i] ≥ x
11.    if i < j
12.        exchange A[i] with A[j]
13.    else return j
The right-to-left scan, denoted below by index pointer j, starts with the last
element of the sub-array. Since we want elements larger than the pivot to be in
the right part of the sub-array, this scan skips over elements that are larger than
the pivot and stops on encountering the first element smaller than or equal to the
pivot.
After both scans stop, three situations may arise, depending on whether or not
the scanning indices have crossed. If scanning indices i and j have not crossed,
i.e. i < j, we simply exchange A[i] and A[j] and resume the scans by
incrementing i and decrementing j, respectively:
If the scanning indices have crossed over, i.e., i > j, we will have partitioned
the sub-array after exchanging the pivot with A[j ]:
Finally, if the scanning indices stop while pointing to the same element, i.e.,
i = j, the value they are pointing to must equal the pivot. Thus, we have the
sub-array partitioned, with the split position s = i = j:
We can combine the last case with the case of crossed-over indices (i > j)
i. Worst-case partitioning
The running time of QUICKSORT is dominated by the time spent in the
partition procedure. Each time the PARTITION procedure is called, a pivot
element is selected, and this element is never included in any future recursive
calls to QUICK-SORT and PARTITION.
One call to PARTITION takes O(1) time plus an amount of time that is
proportional to the number of iterations of the loop.
The partitioning costs Θ(n) time. Since the recursive call on an array of size 0
just returns, the recurrence for the running time is
T(n) = T(n − 1) + Θ(n).
Intuitively, if we sum the costs incurred at each level of the recursion, we get
an arithmetic series, which evaluates to Θ(n2). Thus, if the partitioning is
maximally unbalanced at every recursive level of the algorithm, the running
time is Θ(n2), i.e. O(n2).
Therefore the worst-case running time of quicksort is Θ(n2), which occurs, for
example, when the input array is already completely sorted.
ii. Best-case partitioning
In the most even split, PARTITION produces two sub-problems, each of size no
more than n/2, since one is of size ⌊n/2⌋ and one of size ⌈n/2⌉ − 1. In this
case, quicksort runs much faster: the recurrence for the running time is
T(n) = 2T(n/2) + Θ(n), whose solution is T(n) = Θ(n log n). Thus, the equal
balancing of the two sides of the partition at every level of the recursion
produces an asymptotically faster algorithm.
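A Python sketch of quicksort with Hoare partitioning (first element as pivot,
as in HOARE-PARTITION above; note that with Hoare’s scheme the left recursion
is on A[l..s], not A[l..s − 1]):

def hoare_partition(A, p, r):
    x = A[p]                           # pivot: first element of the sub-array
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while A[j] > x:                # right-to-left scan
            j -= 1
        i += 1
        while A[i] < x:                # left-to-right scan
            i += 1
        if i < j:
            A[i], A[j] = A[j], A[i]    # indices have not crossed: exchange
        else:
            return j                   # split position

def quicksort(A, l, r):
    if l < r:
        s = hoare_partition(A, l, r)
        quicksort(A, l, s)
        quicksort(A, s + 1, r)

data = [5, 3, 2, 6, 4, 1, 3, 7]
quicksort(data, 0, len(data) - 1)
print(data)                            # prints [1, 2, 3, 3, 4, 5, 6, 7]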
A Binary Search Tree (BST), also called an ordered binary tree, has the
following property: for every node n storing a value v,
o all values stored in its left sub-tree (the tree whose root is the left child)
are less than the value v stored in n, and
o all values stored in the right sub-tree are greater than v.
NOTE: Storing multiple copies of the same value in the same tree is avoided.
An attempt to do so can be treated as an error.
The meaning of “less than” or “greater than” depends on the type of values
stored in the tree: it is “<” and “>” for numerical values, and alphabetical
order in the case of strings.
The keys in a binary search tree are always stored in such a way as to satisfy
this binary-search-tree property.
The nodes encountered during the recursion form a simple path downward
from the root of the tree, and thus the running time of TREE-SEARCH is O(h),
where h is the height of the tree.
The height of a tree is the number of nodes on the longest path from the
root node to a leaf node.
In the worst case, a binary search tree with n nodes degenerates into a chain
of height n, so the running time of the search becomes O(n).
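A minimal BST search sketch (recursive; each node is a dict for brevity, and
all names are illustrative):

def tree_search(node, key):
    if node is None:
        return None                    # key not found
    if key == node["value"]:
        return node
    if key < node["value"]:
        return tree_search(node["left"], key)    # smaller values are on the left
    return tree_search(node["right"], key)       # larger values are on the right

root = {"value": 8,
        "left":  {"value": 3,  "left": None, "right": None},
        "right": {"value": 10, "left": None, "right": None}}
print(tree_search(root, 10)["value"])  # prints 10; runtime O(h), h = tree height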
Exercises