DAA Unit 1
The word "algorithm" comes from the name of the ninth-century Persian mathematician Abu Ja'far Muhammad ibn Musa al-Khwarizmi. An algorithm may be defined as follows:
An algorithm is a set of rules for carrying out a calculation, either by hand or on a machine.
An algorithm is a well-defined computational procedure that takes some input and produces some output.
An algorithm is a finite sequence of instructions or steps to achieve some particular output.
ANALYSIS AND COMPLEXITY OF ALGORITHMS
"Analysis of algorithms" is a field of computer science whose overall goal is to understand the complexity of algorithms: the execution time (time complexity) and the storage (space) requirement of an algorithm.
Suppose M is an algorithm and n is the size of the input data. The time and space used by algorithm M are the two main measures of its efficiency. Time is measured by counting the number of key operations; for example, in sorting and searching algorithms, the number of comparisons is the count of key operations. Key operations are chosen so that the time for all other operations is much less than, or at most proportional to, the time for the key operations. Space is measured by counting the maximum amount of memory needed by the algorithm.
The complexity of an algorithm M is the function f(n) which gives the running time and/or storage space requirement of the algorithm in terms of the size n of the input data. Frequently, the storage space required by an algorithm is simply a multiple of the data size n. In general, the term "complexity" used anywhere simply refers to the running time of the algorithm. There are, in general, three cases to consider for the complexity function f(n): the best case, the worst case and the average case.
The analysis of the average case assumes a certain probabilistic distribution for the input data; one such assumption might be that all possible permutations of an input data set are equally likely. The average case therefore uses concepts from probability theory. Suppose the numbers n1, n2, …, nk occur with respective probabilities p1, p2, …, pk. Then the expectation or average value E is given by
E = n1·p1 + n2·p2 + … + nk·pk.
To understand the best, worst and average cases of an algorithm, consider a linear array A containing n elements. Suppose you want either to find the location LOC of a given element (say x) in the array A, or to send some message, such as LOC = 0, to indicate that x does not appear in A. The linear search algorithm solves this problem by comparing x, one by one, with each element of A. That is, we compare x with A[1], then A[2], and so on, until we find LOC such that x = A[LOC].
Analysis of linear search algorithm
The complexity of the search algorithm is given by the number C(n) of comparisons between x and the array elements A[K].
Best case: Clearly the best case occurs when x is the first element in the array A, that is, x = A[1]. In this case C(n) = 1.
Worst case: Clearly the worst case occurs when x is the last element in the array A, or x is not present in the array at all (to be sure of this, we have to search the entire array up to its last element). In this case we have C(n) = n.
Average case: Here we assume that the searched element x appears in the array A, and that it is equally likely to occur at any position in the array. The number of comparisons can be any of the numbers 1, 2, 3, …, n, and each number occurs with probability p = 1/n. Then
C(n) = 1·(1/n) + 2·(1/n) + … + n·(1/n) = (1 + 2 + … + n)/n = (n + 1)/2.
It means the average number of comparisons needed to find the location of x is approximately equal to half the number of elements in the array A.
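For illustration, here is a minimal linear-search sketch in Python (the function name and the convention of returning 0 for "not found" are illustrative, mirroring the LOC = 0 message above):

    def linear_search(A, x):
        """Return the 1-based location LOC of x in A, or 0 if x is absent."""
        for k in range(len(A)):        # compare x with A[1], A[2], ... in turn
            if A[k] == x:              # one key comparison per element
                return k + 1           # LOC such that x = A[LOC] (1-based)
        return 0                       # LOC = 0: x does not appear in A

    # Best case: x is the first element, 1 comparison. Worst case: n comparisons.
    # Example: linear_search([7, 3, 9, 4], 9) returns 3;
    #          linear_search([7, 3, 9, 4], 5) returns 0.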
Unless otherwise stated or implied, we always find and write the complexity of an algorithm in the worst case.
There are three basic asymptotic notations which are used to express the running time of an algorithm in terms of functions whose domain is the set of natural numbers N = {1, 2, 3, …}. These are:
O ('Big-Oh') Notation. [This notation is used to express an upper bound, i.e. the maximum number of steps required to solve a problem]
Ω ('Big-Omega') Notation. [This notation is used to express a lower bound, i.e. the minimum (at least) number of steps required to solve a problem]
Θ ('Theta') Notation. [Used to express both upper and lower bounds; also called a tight bound]
Asymptotic notation gives the rate of growth, i.e. the performance, of the running time for "sufficiently large input sizes", not the exact running time for a specific input size (which should be measured empirically). O-notation is used to express an upper bound (worst case); Ω-notation is used to express a lower bound (best case); and Θ-notation is used to express both an upper and a lower bound on a function (a tight bound).
We generally want to find an asymptotic lower bound and/or upper bound for the growth of our function. The lower bound represents the best-case growth of the algorithm, while the upper bound represents the worst-case growth of the algorithm.
The following fundamental techniques are used to design efficient algorithms:
1. Divide-and-Conquer
2. Greedy method
3. Dynamic Programming
4. Backtracking
5. Branch-and-Bound
1. The divide-and-conquer technique is a top-down approach to solving a problem. An algorithm that follows the divide-and-conquer technique involves three steps:
Divide the original problem into a set of subproblems.
Conquer (or solve) every subproblem individually, recursively.
Combine the solutions of these subproblems to get the solution of the original problem.
A minimal sketch of these three steps follows.
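Merge sort is a standard divide-and-conquer example; the Python sketch below is illustrative:

    def merge_sort(a):
        """Sort list a by divide-and-conquer (merge sort)."""
        if len(a) <= 1:                 # base case: already sorted
            return a
        mid = len(a) // 2
        left = merge_sort(a[:mid])      # Divide + Conquer the left half
        right = merge_sort(a[mid:])     # Divide + Conquer the right half
        return merge(left, right)       # Combine the two sub-solutions

    def merge(left, right):
        """Combine two sorted lists into one sorted list."""
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        out.extend(left[i:])
        out.extend(right[j:])
        return out

    # merge_sort([5, 2, 4, 6, 1, 3]) returns [1, 2, 3, 4, 5, 6]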
2. The greedy technique is used to solve optimization problems. An optimization problem is one in which we are given a set of input values which are required to be either maximized or minimized (the objective function) with respect to some constraints or conditions. A greedy algorithm always makes the choice (the greedy criterion) that looks best at the moment, in order to optimize the given objective function. That is, it makes a locally optimal choice in the hope that this choice will lead to an overall globally optimal solution. A greedy algorithm does not always guarantee the optimal solution, but it generally produces solutions that are very close in value to the optimal one (see the sketch below).
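A common illustration is making change with the fewest coins, where the greedy criterion is "take the largest coin that still fits". The denominations below are illustrative; for this particular coin system the greedy choice happens to be optimal, but, as noted above, that is not guaranteed in general:

    def greedy_change(amount, coins=(25, 10, 5, 1)):
        """Make change using as few coins as possible, choosing greedily."""
        result = []
        for coin in coins:              # denominations sorted large to small
            while amount >= coin:       # greedy criterion: largest coin that fits
                amount -= coin
                result.append(coin)
        return result

    # greedy_change(63) returns [25, 25, 10, 1, 1, 1]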
3. The dynamic programming technique is similar to the divide-and-conquer approach: both solve a problem by breaking it down into several subproblems that can be solved recursively. The difference between the two is that in the dynamic programming approach, the results obtained from solving smaller subproblems are reused (by maintaining a table of results) in the calculation of larger subproblems. Thus dynamic programming is a bottom-up approach that begins by solving the smaller subproblems, saving these partial results, and then reusing them to solve larger subproblems until the solution to the original problem is obtained. Reusing the results of subproblems (by maintaining a table of results) is the major advantage of dynamic programming, because it avoids re-computation (computing results twice or more) of the same subproblem. Dynamic programming therefore takes much less time than naïve or straightforward methods, such as a plain top-down divide-and-conquer approach with many repeated computations. The dynamic programming approach always guarantees an optimal solution (a sketch follows).
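For illustration, a minimal bottom-up dynamic-programming sketch in Python, computing Fibonacci numbers (defined by a recurrence later in this unit). The table of saved results guarantees that each subproblem is computed exactly once, whereas a plain recursive version recomputes the same subproblems exponentially often:

    def fib_dp(n):
        """Bottom-up dynamic programming: O(n) time; table[i] stores F(i)."""
        if n < 2:
            return n
        table = [0] * (n + 1)           # table of partial results
        table[1] = 1
        for i in range(2, n + 1):
            table[i] = table[i - 1] + table[i - 2]  # reuse saved results
        return table[n]

    # fib_dp(10) returns 55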
4. The term "backtrack" was coined by the American mathematician D. H. Lehmer in the 1950s. Backtracking can be applied only to problems which admit the concept of a "partial candidate solution" and a relatively quick test of whether that partial candidate can possibly be completed to a valid solution. Backtracking algorithms try each possibility until they find the right one; backtracking is a depth-first search of the set of possible solutions. During the search, if an alternative doesn't work, the search backtracks to the choice point, i.e. the place which presented different alternatives, and tries the next alternative. When the alternatives at a choice point are exhausted, the search returns to the previous choice point and tries the next alternative there. If there are no more choice points, the search fails. A small sketch follows.
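A compact illustration is subset sum: extend a partial candidate one element at a time, test whether it can still be completed, and backtrack when it cannot. This Python sketch is illustrative, not taken from the notes:

    def subset_sum(nums, target, partial=None):
        """Find a subset of nums summing to target, by backtracking."""
        if partial is None:
            partial = []                  # the current partial candidate
        s = sum(partial)
        if s == target:
            return list(partial)          # valid complete solution found
        if s > target or not nums:
            return None                   # dead end: cannot be completed
        head, rest = nums[0], nums[1:]
        partial.append(head)              # choice point: include head
        found = subset_sum(rest, target, partial)
        if found is not None:
            return found
        partial.pop()                     # backtrack: undo the choice
        return subset_sum(rest, target, partial)   # try the next alternative

    # subset_sum([3, 9, 8, 4, 5, 7], 15) returns, e.g., [3, 8, 4]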
5. Branch-and-bound (B&B) is a rather general optimization technique that applies where the greedy method and dynamic programming fail. The B&B design strategy is very similar to backtracking in that a state-space tree is used to solve a problem. Branch and bound is a systematic method for solving optimization problems. However, it is much slower; indeed, it often leads to exponential time complexities in the worst case. On the other hand, if applied carefully, it can lead to algorithms that run reasonably fast on average. The general idea of B&B is a BFS-like search for the optimal solution, but not all nodes get expanded (i.e., their children generated). Rather, a carefully selected criterion determines which node to expand and when, and another criterion tells the algorithm when an optimal solution has been found. Branch and bound is the most widely used tool for solving large-scale NP-hard combinatorial optimization problems.
The following Table 1 summarizes these techniques, with some common problems that follow each technique and their running times. Each technique leads to a different running time (time complexity).
PSEUDO-CODE FOR ALGORITHM
Pseudo-code (derived from "pseudo" and "code") is a compact, informal, high-level description of a computer programming algorithm that uses the structural conventions of a programming language. Unlike an actual computer language such as C, C++ or Java, pseudo-code typically omits details that are not essential for understanding the algorithm, such as subroutine calls, variable declarations, semicolons, special keywords and so on. Any version of pseudo-code is acceptable as long as its instructions are unambiguous and it resembles a programming language in form. Pseudo-code is independent of any programming language: it can be neither compiled nor executed, and it does not follow strict syntax rules.
Flow charts can be thought of as a graphical alternative to pseudo-code. A flowchart is a schematic representation of an
algorithm, or the step-by-step solution of a problem, using some geometric figures (called flowchart symbols) connected by
flow-lines for the purpose of designing or documenting a program.
The purpose of using pseudo-code is that it may be easier to read than a conventional programming language, which enables (or helps) the programmer to concentrate on the algorithm without worrying about all the syntactic details of a particular programming language. In fact, one can write pseudo-code for a given problem without even knowing what programming language will be used for the final implementation.
The first line of a function consists of the name of the function followed by parentheses; inside the parentheses we pass the parameters of the function. The parameters may be data, variables, arrays, and so on, that are available to the function. Consider, for example, a function whose parameters are three input values, together with an output parameter x that is assigned the maximum of the three input values; a sketch of such a function follows.
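A minimal sketch of such a function in Python (the name Max3 and the parameter names a, b, c are illustrative; Python returns the value x rather than using an output parameter):

    def Max3(a, b, c):
        """Return x, the maximum of the three input values a, b, c."""
        x = a              # assume the first value is the maximum
        if b > x:          # compare with the second value
            x = b
        if c > x:          # compare with the third value
            x = c
        return x

    # Max3(4, 9, 2) returns 9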
1. Give a valid name to the pseudo-code procedure. (See the sample code for insertion sort at the end of this section.)
2. Use line numbers for each line of code.
3. Use proper indentation for every statement in a block structure.
4. For flow control use if-else statements. Always end an if statement with an end-if. The if, else and end-if keywords should be aligned vertically in the same column.
5. Array elements can be represented by the array name followed by the index in square brackets. For example, A[i] indicates the i-th element of the array A.
6. For looping or iteration use for or while statements. Always end a for loop with end-for and a while loop with end-while.
7. The conditional expression of a for or while statement is written as in rule (4). Two or more conditions can be separated with "and".
8. If required, comments can be placed between the symbols /* and */.
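The insertion-sort sample referred to in rule (1) can be rendered as the following minimal Python sketch (the pseudocode conventions above map onto Python's own indentation and block structure):

    def insertion_sort(A):
        """Sort array A in place by repeated insertion."""
        for j in range(1, len(A)):     # invariant: A[0..j-1] is already sorted
            key = A[j]                 # element to insert into the sorted part
            i = j - 1
            while i >= 0 and A[i] > key:
                A[i + 1] = A[i]        # shift larger elements one place right
                i -= 1
            A[i + 1] = key             # insert key into its proper place
        return A

    # insertion_sort([5, 2, 4, 6, 1, 3]) returns [1, 2, 3, 4, 5, 6]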
MATHEMATICAL INDUCTION AND MATHEMATICAL FORMULAE FOR ALGORITHMS
Mathematical Induction
In this section we will describe what mathematical induction is, and also introduce some formulae which are used extensively in the analysis of algorithms.
Mathematical induction is a method of mathematical proof typically used to establish that a given statement is true for all natural numbers (positive integers). It is done by proving that the first statement in an infinite sequence of statements is true, and then proving that if any one statement in the infinite sequence of statements is true, then so is the next one.
For example, for the equation 1 + 2 + … + n = n(n + 1)/2 the sequence of statements is:
Statement 1: 1 = 1(1 + 1)/2
Statement 2: 1 + 2 = 2(2 + 1)/2
Statement 3: 1 + 2 + 3 = 3(3 + 1)/2
…
Statement n: 1 + 2 + … + n = n(n + 1)/2
…
Statements 1-3 are obtained by everywhere replacing n by 1 in the original equation, then n by 2, and then n by 3. Thus mathematical induction is a method of mathematical proof of a given formula (or statement), by proving such a sequence of statements.
A proof by mathematical induction of a given statement S(n) (or formula), defined on the positive integers N, consists of two steps:
1. (Base Step): Prove that S(1) is true.
2. (Inductive Step): Assume that S(n) is true, and prove that S(n + 1) is true, for all n ≥ 1.
Example 1: Prove by induction the proposition that "the sum of the first n positive integers is n(n + 1)/2"; that is,
S(n): 1 + 2 + … + n = n(n + 1)/2.
Proof:
(Base Step): We must show that the given equation is true for n = 1:
S(1): 1 = 1(1 + 1)/2 = 1, which is true.
Hence we have proved that S(1) is true.
(Inductive Step):
Let us assume that the given equation S(n) is true for n; that is, 1 + 2 + … + n = n(n + 1)/2.
Now we have to prove that it is true for (n + 1):
1 + 2 + … + n + (n + 1) = n(n + 1)/2 + (n + 1) = (n + 1)(n/2 + 1) = (n + 1)(n + 2)/2.
Hence S(n + 1) is true whenever S(n) is true. This implies that the equation S(n) is true for all n ≥ 1.
Sum formula for Arithmetic Series
a + (a + d) + (a + 2d) + … + (a + (n − 1)d) = (n/2)·[2a + (n − 1)d], where a is the first term and d is the common difference.
Sum formula for Geometric Series
For the series a + ax + ax² + … + ax^(n−1):
I) Sum = a(xⁿ − 1)/(x − 1) (when x > 1)
II) Sum = a(1 − xⁿ)/(1 − x) (when x < 1)
III) Sum of the infinite series a + ax + ax² + … = a/(1 − x), when |x| < 1.
Logarithmic Formulae
The following logarithmic formulae are quite useful for solving recurrence relations (here a, b, c > 0 and b, c ≠ 1):
1. log_b(xy) = log_b x + log_b y
2. log_b(x/y) = log_b x − log_b y
3. log_b(xⁿ) = n·log_b x
4. log_b a = log_c a / log_c b (change of base)
5. a^(log_b c) = c^(log_b a)
6. log_b b = 1 and log_b 1 = 0
Remark: Since changing the base of a logarithm from one constant to another only changes the value of the logarithm by a constant factor (see property 4), we don't care about the base of a "log" when we are writing a time complexity using O-, Ω- or Θ-notation. We conventionally take 2 as the base of the log. For example, O(log₂ n) = O(log₁₀ n), since log₂ n = log₁₀ n / log₁₀ 2.
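A quick numeric check in Python of property (4), the change-of-base rule this remark relies on:

    import math

    n = 1024.0
    direct = math.log2(n)                        # log2(n) computed directly: 10.0
    via_base_10 = math.log10(n) / math.log10(2)  # change of base from log10
    print(direct, via_base_10)                   # the two agree up to rounding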
Space complexity is defined as the amount of memory a program needs to run to completion.
The space used by any program has:
i. a fixed part, which is independent of the instance characteristics (the number of inputs and outputs), i.e. space for the instructions, for fixed-size component variables (also called the aggregate) and for constants; and
ii. a variable part: the space needed by component variables whose size depends on the particular problem instance, by reference variables, and by stack space.
Thus S(P) = c + Sp(instance characteristics), where c is the fixed part and Sp the variable part.
Why is this of concern?
We could be running on a multi-user system where programs are allocated a specific amount of space.
We may not have sufficient memory on our computer.
There may be multiple solutions, each having different space requirements.
The space complexity may define an upper bound on the data that the program can handle.
Time Complexity
The time T(P) taken by a program P is the sum of its compile time and its run (execution) time. The compile time does not depend on the instance characteristics (i.e. the number of inputs, the number of outputs, the magnitude of the inputs, the magnitude of the outputs, etc.).
1. Algorithm X (a,b,c)
2. Algorithm SUM(a, n)
       S := 0
       for i := 1 to n do
           S := S + a[i]
       end-for
       return S
Here the problem instance is characterized by the value of n, i.e. the number of elements to be summed. ADD, SUB, MUL and DIV are functions whose values are the numbers of additions, subtractions, multiplications and divisions performed when the code for P is run on an instance with characteristic n. Generally, the time complexity of an algorithm is given by the number of steps taken by the algorithm to complete the function it was written for. The number of steps is itself a function of the instance characteristics.
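For the SUM algorithm above, the steps can be tallied line by line. A sketch of such a count in Python form (assigning one step to each assignment, loop test and return is one common convention, not the only one):

    def SUM(a, n):
        S = 0                    # 1 step
        for i in range(n):       # n + 1 steps: n successful loop tests + 1 failing test
            S = S + a[i]         # n steps: one per iteration
        return S                 # 1 step
    # Total: 2n + 3 steps, which is O(n).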
How to calculate the time complexity of a program
The number of machine instructions which a program executes during its run is called its time complexity. This number depends primarily on the size of the program's input. The time taken by a program is the sum of its compile time and its run time; for time complexity we consider the run time only. The time required by an algorithm is determined by the number of elementary operations it performs.
The following primitive operations, which are independent of the programming language, are used to calculate the running time: assigning a value to a variable, calling a function, performing an arithmetic operation, comparing two values, indexing into an array, and returning from a function.
The following fragment shows how to count the number of primitive operations executed by an algorithm.
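The fragment below is an illustrative Python example; the counts in the comments follow the primitive operations just listed:

    def array_max(A, n):
        current_max = A[0]           # 2 ops: one array index + one assignment
        for i in range(1, n):        # about n ops for the loop control
            if A[i] > current_max:   # 2 ops per iteration: index + comparison
                current_max = A[i]   # 2 ops whenever the branch is taken
        return current_max           # 1 op: return
    # Roughly between 3n and 5n primitive operations in total, i.e. O(n).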
Best, Worst and Average Case (Step Count)
Best case: the minimum number of steps that can be executed for the given parameters.
Worst case: the maximum number of steps that can be executed for the given parameters.
Average case: the average number of steps executed for the given parameters.
To better understand all three of the above cases, consider the example of an English dictionary used to look up the meaning of a particular word.
Best case: Suppose we open the dictionary and luckily we immediately find the word we are looking for. This requires only one step (the minimum possible) to get the meaning of the word.
Worst case: Suppose we are searching for the meaning of a word, and that word is either not in the dictionary or takes the maximum possible number of steps to locate (i.e. until no further left-hand or right-hand pages remain to examine).
Average case: If we are searching for a word that requires neither the minimum (best case) nor the maximum (worst case) number of steps, we are in the average case. In this case we do, eventually, get the meaning of that word.
ASYMPTOTIC NOTATIONS (O, Ω, and Θ)
Asymptotic notations were used briefly in earlier sections; in this section we elaborate on them in detail. They will be taken up further in the next unit of this block. These notations are used to describe the running time of an algorithm in terms of functions whose domains are the set of natural numbers, N = {1, 2, …}. Such notations are convenient for describing the worst-case running-time function T(n), where n is the problem (input) size. The complexity function can also be used to compare two algorithms P and Q that perform the same task.
We say that the function f(n) = O(g(n)) [read as "f of n is big-oh of g of n"] if there exist two positive constants C and n0 such that
f(n) ≤ C·g(n) for all n ≥ n0.
[Figure 1: the graph of f(n) stays on or below C·g(n) for all n ≥ n0, no matter how f(n) behaves before n0.]
The intuition behind O-notation is shown in Figure 1: for all values of n to the right of n0, the value of f(n) always lies on or below C·g(n).
To understand O-notation, let us consider the following example.
Example 1.1: For the function defined by f(n) = 2n + 3, show that f(n) = O(n).
Solution: We need positive constants C and n0 with
2n + 3 ≤ C·n for all n ≥ n0. … (1)
Let C = 3; then 2n + 3 ≤ 3n holds for all n ≥ 3, so with C = 3 and n0 = 3 we have f(n) = O(n).
Remark: The values of C and n0 are not unique. For example, to satisfy inequality (1) we can also take C = 4 and n0 = 2. So, depending on the value of C, the value of n0 also changes; any values of C and n0 which satisfy the given inequality are a valid solution.
To show, on the other hand, that n² ≠ O(n), we would need constants C and n0 with
n² ≤ C·n, i.e. n ≤ C, for all n ≥ n0. … (1)
There are no values of C and n0 which satisfy equation (1): for any candidate constant C, you can contradict the inequality by taking any value of n greater than C, that is n > C. Since we cannot find the required constants C and n0 to satisfy (1), n² ≠ O(n).
(vi) Do it yourself.
Theorem: If f(n) = a_m·n^m + a_(m−1)·n^(m−1) + … + a_1·n + a_0 is a polynomial of degree m, then f(n) = O(n^m).
We say that the function f(n) = Ω(g(n)) [read as "f of n is big-omega of g of n"] if and only if there exist two positive constants C and n0 such that
f(n) ≥ C·g(n) for all n ≥ n0.
Note that for all values of n ≥ n0, f(n) always lies on or above C·g(n).
Example: For a given pair of functions, show that f(n) = Ω(g(n)).
Solution:
(ii) To show that f(n) ≠ Ω(g(n)), we have to show that no values of C and n0 exist which satisfy the inequality f(n) ≥ C·g(n) for all n ≥ n0. We can prove such a result by contradiction.
Similarly, we say that f(n) = Θ(g(n)) [read as "f of n is theta of g of n"] if there exist three positive constants c1, c2 and n0 such that
c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0.
The following figure shows the intuition behind the Θ-Notation.
[Figure 3: f(n) = Θ(g(n)); the curve f(n) lies between c1·g(n) and c2·g(n) for all n ≥ n0.]
Note that for all values of n to the right of n0, the value of f(n) lies at or above c1·g(n) and at or below c2·g(n).
Example: For a given function f(n), show that f(n) = Θ(g(n)).
To satisfy both sides of the inequality simultaneously, we have to find the values of c1, c2 and n0 using the inequality
c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0. … (1)
Left-side inequality: c1·g(n) ≤ f(n) is satisfied by a suitable choice of c1 and n0; the right-side inequality f(n) ≤ c2·g(n) is checked in the same way.
Thus f(n) = Θ(g(n)).
Find the O-notation, Ω-notation and Θ-notation for the following functions.
(i)
(ii)
Solution:
(i) Here , So
, and .
(ii) , So
, and .
Q.2: What is the running time to retrieve an element from an array of size n (in the worst case)?
c) d) none of these
c) d)
Q.5: Define an algorithm. What are the various properties of an algorithm?
Q.6: What are the various fundamental techniques used to design an algorithm efficiently? Also give two problems for each that follow these techniques.
Q.9: Define time complexity. Explain how the time complexity of an algorithm is computed.
There are two important ways to measure the effectiveness and efficiency of an algorithm: time complexity and space complexity. Time complexity measures the amount of time used by the algorithm; space complexity measures the amount of space used. We are generally interested in finding the best-case, average-case and worst-case complexities of a given algorithm. When a problem becomes "large", we are interested in the asymptotic complexity, and O (big-oh) notation is used to quantify it.
Recurrence relations often arise in calculating the time and space complexity of algorithms. Any problem can be solved either by writing a recursive algorithm or by writing a non-recursive algorithm. A recursive algorithm is one which makes a recursive call to itself with smaller inputs. We often use a recurrence relation to describe the running time of a recursive algorithm.
A recurrence relation is an equation or inequality that describes a function in terms of its value on smaller inputs, or as a function of preceding (or lower) terms.
1. Basic step: Here we have one or more constant values which are used to terminate the recurrence. They are also known as initial conditions or base conditions.
2. Recursive step: This step is used to find new terms from the existing (preceding) terms. In this step the recurrence computes the next term a_n of the sequence from the k preceding values a_(n−1), a_(n−2), …, a_(n−k). This formula is called a recurrence relation (or recursive formula). The formula refers to itself, and its argument must be on smaller values (closer to the base values).
Hence a recurrence has one or more initial conditions and a recursive formula, known as the recurrence relation.
For example, the Fibonacci sequence can be defined by the following recurrence relation.
1. (Basic step): The given recurrence says that if n = 0 then F(0) = 0 and if n = 1 then F(1) = 1. These two conditions (or values), where the recursion does not call itself, are called the initial conditions (or base conditions).
2. (Recursive step): This step is used to find new terms from the existing (preceding) terms, by using the formula
F(n) = F(n − 1) + F(n − 2), for n ≥ 2.
This formula says that "by adding the two previous terms of the sequence we can get the next term".
For example, F(2) = F(1) + F(0) = 1, F(3) = F(2) + F(1) = 2, F(4) = F(3) + F(2) = 3, and so on.
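Transcribed directly from this recurrence, a minimal recursive sketch in Python (note how the plain recursion recomputes the same subproblems, unlike the table-based version shown earlier):

    def F(n):
        """Fibonacci numbers, written directly from the recurrence."""
        if n == 0:                     # base condition: F(0) = 0
            return 0
        if n == 1:                     # base condition: F(1) = 1
            return 1
        return F(n - 1) + F(n - 2)     # recursive step, for n >= 2

    # [F(i) for i in range(7)] gives [0, 1, 1, 2, 3, 5, 8]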
Let us consider some recursive algorithms and try to write their recurrence relations. Later we will learn some methods to solve these recurrence relations, in order to analyze the running time of these algorithms.
Algorithm: FACT(n)
1: if n = 1 then
2:     return 1
3: else
4:     return n * FACT(n − 1)
5: end-if
Let T(n) denote the running time of FACT(n). Then T(1) is a constant, and T(n) = T(n − 1) + c for n > 1, since FACT does a constant amount of work and then calls itself on input n − 1.
Example 2: Let T(n) denote the number of times the key statement is executed in Algorithm 2.
Algorithm2: Example(n)
1: if
3:
4:
5:
5:
The base case is reached when the input reaches the base value; Algorithm 2 then performs one comparison and one return statement. Therefore, the base case contributes a constant number of steps.
Example 3: Let T(n) denote the number of times the statement is executed in Algorithm 3.
Algorithm3:
1: if
3:
4:
5:
In the substitution method, we guess a bound and then use mathematical induction to prove our guess correct. The iteration method converts the recurrence into a summation and then relies on techniques for bounding summations to solve the recurrence. The Master method provides bounds for recurrences of the form T(n) = a·T(n/b) + f(n), where a ≥ 1 and b > 1.
Solution: Step 1: The given recurrence is quite similar to that of MERGE-SORT, so we guess that the solution is T(n) = O(n log n). We then verify the guess by substituting it into the recurrence.
Remark: Making a good guess that can be a solution of a given recurrence requires experience. So, in general, we do not often use this method to find the solution of a recurrence.
We generally follow these steps to solve any recurrence by iteration: expand the recurrence; then evaluate the resulting summation by using the arithmetic or geometric summation formulae given in Section 1.4 of this unit.
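As a worked illustration of the iteration method, take the merge-sort-like recurrence T(n) = 2T(n/2) + n with T(1) = 1, assuming n is a power of 2; in LaTeX form:

    \begin{align*}
    T(n) &= 2\,T(n/2) + n \\
         &= 2\bigl(2\,T(n/4) + n/2\bigr) + n = 4\,T(n/4) + 2n \\
         &= 4\bigl(2\,T(n/8) + n/4\bigr) + 2n = 8\,T(n/8) + 3n \\
         &\;\;\vdots \\
         &= 2^{k}\,T(n/2^{k}) + k\,n \qquad \text{after $k$ expansions.}
    \end{align*}

The expansion stops when n/2^k = 1, i.e. k = log₂ n, giving T(n) = n·T(1) + n·log₂ n = O(n log n).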
Solution: Here
= ………..
Solution:
Solution: Here
When we are solving recurrences we always omit the ceiling and floor functions, because they do not affect the result.
Hence we can write equation (2) as:
(By substituting
= ………..
Algorithm [ Using log property ]
This recurrence (1), of the general form T(n) = a·T(n/b) + f(n), describes the running time of a divide-and-conquer algorithm.
a) To make a recursion tree for the given recurrence (1), first put the value of f(n) at the root node of the tree, and give this root a child nodes, each labelled T(n/b). The tree now looks like:
[Figure-a: root f(n) with a children, each T(n/b).]
b) Now we have to find the value of T(n/b), by putting n/b in place of n in equation (1). That is,
T(n/b) = a·T(n/b²) + f(n/b). … (2)
From equation (2), f(n/b) is now the value of each node at the next level, and each such node has a branches (child nodes), each of size T(n/b²). Each T(n/b) in Figure-a is replaced accordingly:
[Figure-b: root f(n); a children f(n/b); a² grandchildren T(n/b²).]
c) In this way you can expand the tree one more level (i.e. up to at least two levels).
Step 2: (a) Now find the per-level cost of the tree. The per-level cost is the sum of the costs of all nodes at that level; for example, the per-level cost at level 1 is a·f(n/b). This is also called the row sum.
(b) The total (final) cost of the tree is then obtained by taking the sum of the costs of all the levels. This is also called the column sum.
Let us take an example to understand how to solve a recurrence using the recursion-tree method: T(n) = 2T(n/2) + n. … (1)
1. To make the recursion tree, write the value of f(n) = n at the root node; and
2. the number of children of the root node is equal to the value of a (here the value of a = 2). So the recursion tree looks like:
[Figure-a: root n with two children, each T(n/2).]
b) Now we find the value of T(n/2) in Figure-a, by putting n/2 in place of n in equation (1). That is,
T(n/2) = 2T(n/4) + n/2. … (2)
From equation (2), n/2 is now the value of each node having two branches (child nodes), each of size T(n/4). Each T(n/2) in Figure-a is replaced accordingly:
[Figure-b: root n; two children n/2; four grandchildren T(n/4).]
[Figure-c: the fully expanded recursion tree. Every level sums to n: the root contributes n, level 1 contributes n/2 + n/2 = n, level 2 contributes 4·(n/4) = n, and so on, down to n leaves of cost 1 each.]
Now we find the per-level cost of the tree. The per-level cost is the sum of the costs within each level (the row sum); here the per-level cost is n at every level. For example, the per-level cost at depth 2 in Figure-c is n/4 + n/4 + n/4 + n/4 = n.
The total cost is then the sum of the costs of all the levels (the column sum), which gives the solution of the given recurrence. … (3)
The height of the tree is log₂ n, which can be obtained as follows (see the recursion tree of Figure-c): we start with a problem of size n; the problem size then reduces to n/2, then to n/4, and so on, until the boundary condition (problem size 1) is reached. That is, n/2^k = 1 at the leaves, so k = log₂ n.
Hence the total cost in equation (3) is (per-level cost) × (number of levels) = n·(log₂ n + 1) = O(n log n).
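A small numeric check of this result (illustrative): evaluate T(n) = 2T(n/2) + n directly from the recurrence, with T(1) = 1, and compare it against n·log₂ n + n.

    import math

    def T(n):
        """Evaluate T(n) = 2*T(n/2) + n with T(1) = 1 (n a power of 2)."""
        if n == 1:
            return 1
        return 2 * T(n // 2) + n

    for n in (2, 8, 64, 1024):
        predicted = n * math.log2(n) + n   # n log n + n, from the recursion tree
        print(n, T(n), predicted)          # the two columns match exactly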
Solution: We always omit floor and ceiling functions while solving recurrences. Thus the given recurrence can be written as:
Figure-a to Figure-c show a step-by-step derivation of a recursion tree for the given recurrence (1).
Figure-a
Figure-b
[Figure-c: the expanded recursion tree, with a per-level cost of at most n at each level and T(1) at the leaves.]
--------------- (*)
--------------- (**)
From (*) and (**), we can thus write:
Remark: If
Solution:
Figure-a to Figure-c show a step-by-step derivation of a recursion tree for the given recurrence.
Figure-a
Figure-b
c) In this way, you can extend the tree up to the boundary condition (when the problem size becomes 1). That is:
[Figure-c: the fully expanded recursion tree, whose leaves all have cost 1.]
Hence the total cost of the tree in Figure-c can be obtained by taking the column sum up to the height of the tree.
a)
{ if
//if n is even
b)
{ if
}
2: Solve the following recurrence using the iteration method:
a)
Solution (a): [Using the log property]
Thus
Solution for
Q. Let f(n) and g(n) be two asymptotically positive functions. Prove or disprove the following (using the basic definitions of O, Ω and Θ):
(a)
For C = 2 and
(c) );
(d)
(e)
b.
c.
d.
a.