AlgoBasics: Analysis of Algorithms
CSE373: Design and Analysis of Algorithms
Algorithm
In simple terms, an algorithm is a series of instructions to
solve a problem (complete a task)
Real-life examples:
• Explain how to compute a GCD to an 8-year-old child
• Explain how to tie a tie
Analysis:
• Estimate the cost of an algorithm in terms of resources (e.g., memory, processor, bandwidth, etc.) and performance (time).
• Prove why a certain algorithm works (achieves its goal).
External motivation
• To be able to be hired by a top software company
• To be able to achieve accolades in programming contests
• To be able to do research on design and analysis of algorithms in theoretical CS, Data Mining, AI, etc.
Internal motivation > External motivation
Computational Problem
A computational problem can be represented as a question that describes the requirements of the desired output given an input. For example:
• Is n a prime number? (n is a user input)
• What are the prime factors of n? (n is a user input)
• How many prime factors does n have? (n is a user input)
• What is the maximum value of a[i]? (a[1…n] is an input array)
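For instance, the first question above can be answered by a short program. A minimal sketch in C (the function name is_prime and the trial-division approach are our own illustration, not from the slides):

#include <stdio.h>
#include <stdbool.h>

/* Decision version: is n a prime number? (YES/NO answer) */
bool is_prime(int n)
{
    if (n < 2)
        return false;
    for (int i = 2; (long long)i * i <= n; i++)  /* trial division up to sqrt(n) */
        if (n % i == 0)
            return false;                        /* found a factor: composite */
    return true;
}

int main(void)
{
    int n = 91;                                  /* sample input; 91 = 7 * 13 */
    printf("%d is %sprime\n", n, is_prime(n) ? "" : "not ");
    return 0;
}

Trial division up to √n suffices because any composite n must have a factor no larger than √n.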
Common Types of Computational Problems
• Decision Problems: given an input x, verify whether x satisfies a certain property, i.e., give a YES/NO answer
• 3-coloring decision problem: given an undirected graph G = (V, E), determine if there is a way to assign a "color" chosen from {0, 1, 2} to each vertex in such a way that no two adjacent vertices have the same color
[Figure: an undirected graph on vertices a, b, c, d]
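As a concrete (if naive) illustration of the decision problem, here is a brute-force sketch in C; the adjacency-matrix representation, the fixed vertex count, and all names are our own choices, not from the slides. It tries every one of the 3^V color assignments and answers YES if any of them is proper:

#include <stdio.h>
#include <stdbool.h>

#define V 4   /* vertices: a=0, b=1, c=2, d=3 */

/* A coloring is proper if no two adjacent vertices share a color. */
bool is_proper(const bool adj[V][V], const int color[V])
{
    for (int u = 0; u < V; u++)
        for (int w = u + 1; w < V; w++)
            if (adj[u][w] && color[u] == color[w])
                return false;
    return true;
}

/* Decision problem: does ANY of the 3^V assignments of {0,1,2} work? */
bool three_colorable(const bool adj[V][V])
{
    int color[V];
    for (int a = 0; a < 81; a++) {       /* 81 = 3^V for V = 4 */
        int x = a;
        for (int u = 0; u < V; u++) {    /* decode assignment a in base 3 */
            color[u] = x % 3;
            x /= 3;
        }
        if (is_proper(adj, color))
            return true;
    }
    return false;
}

int main(void)
{
    /* Sample graph on {a, b, c, d} with edges a-b, a-c, b-c, b-d, c-d */
    bool adj[V][V] = {
        {0, 1, 1, 0},
        {1, 0, 1, 1},
        {1, 1, 0, 1},
        {0, 1, 1, 0},
    };
    printf("3-colorable? %s\n", three_colorable(adj) ? "YES" : "NO");
    return 0;
}

Note how easy it is to VERIFY a given coloring (is_proper is fast) compared to FINDING one (the search tries exponentially many candidates); this gap is exactly what the class-NP discussion in this lecture is about.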
Common Types of Computational Problems
• Counting Problem: Compute the number of solutions to a
given search problem
• 3-coloring counting problem: given an undirected graph G = (V, E)
compute the number of ways we can color the graph using 3 colors:
{1,2,3} (while coloring we have to ensure that no two adjacent vertices
have the same color).
• Path counting Problem: Compute the number of possible paths from u
to v in a given graph G.
Common Types of Computational Problems
• Optimization Problems: find the best possible solution among the set of all possible solutions to a search problem
• Graph coloring optimization problem: given an undirected graph G = (V, E), what is the minimum number of colors required to assign "colors" to each vertex in such a way that no two adjacent vertices have the same color?
• Path optimization problem: find the shortest path(s) from u to v in a given graph G.
[Figure: graph on vertices a, b, c, d]
Find all the shortest paths from a to d:
a → b → d
a → c → d
Complexity of Computational Problems
Computational Problems can be categorized into different
classes based on their complexity, such as:
• Unsolvable problems: problems that can’t be solved by
anyone or any machine ever
• Class P problems: problems that are “efficiently/easily
solvable”
• Class NP problems: problems whose outputs can be
“efficiently/easily verified”
• Class NPC: class NP problems that are “hardest to solve”
….
Insertion Sort
INSERTION-SORT(A, n)        ⊳ A[1 . . n]
    for j ← 2 to n
        do key ← A[j]
           i ← j − 1
           while i > 0 and A[i] > key
               do A[i+1] ← A[i]
                  i ← i − 1
           A[i+1] ← key
Indentation/spacing determines where the algorithm/loop/if/else-body ends.

Insertion Sort Simulation (A = 7 4 9 5 1): in each round, key = A[j] is inserted into the already-sorted prefix A[1..j−1]; the array evolves
7 4 9 5 1 → 4 7 9 5 1 (key = 4) → 4 7 9 5 1 (key = 9) → 4 5 7 9 1 (key = 5) → 1 4 5 7 9 (key = 1).
Loop Invariant: At the beginning of each iteration
of the for loop, A[1..j-1] is already sorted
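The pseudocode above maps almost line for line onto C. A minimal sketch using 0-based arrays (so the outer loop starts at j = 1 instead of 2); the test array is the one from the simulation:

#include <stdio.h>

/* C version of INSERTION-SORT(A, n), shifted to 0-based indexing. */
void insertion_sort(int A[], int n)
{
    for (int j = 1; j < n; j++) {
        int key = A[j];
        int i = j - 1;
        while (i >= 0 && A[i] > key) {  /* shift larger elements one slot right */
            A[i + 1] = A[i];
            i = i - 1;
        }
        A[i + 1] = key;                 /* drop key into the hole */
    }
}

int main(void)
{
    int A[] = {7, 4, 9, 5, 1};
    int n = sizeof A / sizeof A[0];
    insertion_sort(A, n);
    for (int i = 0; i < n; i++)
        printf("%d ", A[i]);            /* prints: 1 4 5 7 9 */
    printf("\n");
    return 0;
}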
Why do we analyze algorithms?
• To have an estimate of how much time an algorithm may take to finish for a given input size (running time, a.k.a. time complexity, analysis).
• Sometimes, instead of running time, we are interested in how much memory/space the algorithm may consume while it runs (space complexity).
• It enables us to compare two algorithms.
• What do we actually mean by running time analysis?
  • To determine at what rate the running time increases as the problem size increases.
  • The size of the problem can be a range of things depending on the problem at hand, such as:
    • size of an array, e.g., for sorting/searching in an array
    • polynomial degree of an equation, e.g., for solving an equation
    • number of elements in a matrix, e.g., for computing its determinant
    • number of bits in the binary representation of the input number, e.g., for computing the bitwise NOT of an input number
How Do We Analyze Running Time?
We need to define an objective measure.
• Count the number of statements executed?
  • Associate a "cost" with each statement.
  • Find the "total cost" by finding the total number of times each statement is executed.
• Not good: the number of statements varies with the programming language as well as the style of the individual programmer.
Algorithm 1                      Cost
arr[0] = 0;                      c1
arr[1] = 0;                      c1
arr[2] = 0;                      c1
...
arr[N-1] = 0;                    c1
----------------------
Total: c1 + c1 + ... + c1 = c1·N

Algorithm 2                      Cost
for(i=0; i<N; i++)               c1 + c2·(N+1) + c3·N
    arr[i] = 0;                  c1·N
-----------------------------
Total: c1 + c2(N+1) + c3N + c1N = (c1+c2+c3)N + (c1+c2)
Informal Notion of Running Time
• Express the runtime as a function of the input size n (i.e., as a function f(n)) in order to understand how f(n) grows with n, and
• count only the most significant term of f(n) and ignore everything else (because the rest won't affect the running time much for very large values of n).
Thus the running times (also called time complexities) of the programs of the previous slide become:
f(N) = c1·N ≤ N·(some constant)
g(N) = (c1+c2+c3)N + (c1+c2) ≤ (c1+c2+c3+c1+c2)·N = N·(some constant), for N ≥ 1
Thus both these functions are bounded (from above) by some constant multiple of N, and as such both have the same upper bound: O(N). This means that the running time of each of these algorithms is always less than or equal to a constant multiple of N; we ignore the values of the constants in the Big-Oh notation, i.e., we never write O(c1·N) [it is actually O(N)] or O(65N^2 + 34N + 7) [it is actually O(N^2)].
[Figure: running time (y-axis) vs. increasing n (x-axis) for fA(n) = 30n + 8 and fB(n) = n^2 + 1]
Growth of Functions
[Complexity graphs comparing log(n), n·log(n), n^2, n^3, n^10, and, on a log scale, 1.1^n, 2^n, 3^n, n^10, n^20, n^n]
Asymptotic Notations
• O-notation (Big Oh)
This means that f(n) is bounded from above by a constant multiple of g(n).
Examples:
T(n) = 3n^2 + 10n·lg n + 8 is O(n^2), and also O(n^2·lg n), O(n^3), O(n^4), … (loose upper bounds)
T'(n) = 52n^2 + 3n^2·lg n + 8 is O(n^2·lg n), and also O(n^3), O(n^4), … (loose upper bounds)
[Big-Oh visualization figure]
Asymptotic Notations
• Ω-notation (Big Omega): gives lower bounds (possibly loose)
This means that f(n) is bounded from below by a constant multiple of g(n).
Asymptotic Notations
• Θ-notation (Big Theta)
Examples:
T(n) = 3n^2 + 10n·lg n + 8 is Θ(n^2)
T'(n) = 52n^2 + 3n^2·lg n + 8 is Θ(n^2·lg n)
This means that f(n) is bounded from above and below by constant multiples of g(n), i.e., f(n) is roughly proportional to g(n).
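For reference, the formal definitions behind all three notations, plus one fully worked bound; the constants chosen below are one valid choice among many:

f(n) = O(g(n))  ⇔  there exist c > 0 and n0 > 0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0
f(n) = Ω(g(n))  ⇔  there exist c > 0 and n0 > 0 such that 0 ≤ c·g(n) ≤ f(n) for all n ≥ n0
f(n) = Θ(g(n))  ⇔  f(n) = O(g(n)) and f(n) = Ω(g(n))

Worked example: T(n) = 3n^2 + 10n·lg n + 8 = Θ(n^2). For all n ≥ 1 we have lg n ≤ n and 8 ≤ 8n^2, so T(n) ≤ 3n^2 + 10n^2 + 8n^2 = 21n^2 (upper bound with c = 21); and T(n) ≥ 3n^2 since the other two terms are nonnegative (lower bound with c = 3). Hence n0 = 1 works for both directions.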
Some Examples
Determine the time complexity for the following algorithm.
count = 0;      //c1
i = 0;           //c1
while(i < n){    //(n+1)c2
    count++;     //nc3
    i++;         //nc3
}
Total: T(n) = 2c1 + (n+1)·c2 + 2n·c3 = O(n)
Some Examples
Determine the time complexity for the following algorithm.
sum = 0;
for(i=1; i<=n; i=i*2)
    for(j=0; j<i; j++)
        sum += i*j;
Answer: the outer for loop runs Θ(lg n) times (prove it!) and, for each iteration of the outer loop, the inner loop runs at most n times; this gives the easy upper bound O(n·lg n). A tighter count: the inner loop runs exactly i times, and 1 + 2 + 4 + … + n ≤ 2n, so the total work is in fact Θ(n).
char someString[10];
gets(someString);               // Θ(n) (note: gets is unsafe; fgets is preferred)
int t = strlen(someString);     // Θ(n)
for(i=0; i<t; i++)
    someString[i] -= 32;        // Θ(n) over the whole loop
This example shows that a badly implemented algorithm may have greater time complexity than a more efficient implementation: had the loop been written for(i=0; i<strlen(someString); i++), strlen would be re-evaluated on every iteration, turning the Θ(n) loop into Θ(n^2).
So far, we have ALWAYS been able to determine the time complexity of an algorithm from the input size only. But is the input size enough to determine time complexity unambiguously?
int find_a(char *str)
{
int i;
for (i = 0; str[i]; i++)
{
if (str[i] == 'a')
return i;
}
return -1;
}
What is the time complexity of the above algorithm?
Time complexity: it depends not only on the input size but also on the input's content (where, if anywhere, the first 'a' occurs). Three scenarios:
Best case
Worst case
Average case
Types of Time Complexity Analysis
• Worst-case: (usually done)
• Running time on worst possible input
• Best-case: (bogus)
• Running time on best possible input
How can you arrange the input numbers so that this algorithm (the Insertion-Sort analyzed next) becomes most inefficient (worst case)?
How can you arrange the input numbers so that this algorithm becomes most efficient (best case)?
Insertion Sort: Running Time

Statement                              Cost    Times
for j ← 2 to n                         c1      n
    do key ← A[j]                      c2      n − 1
       i ← j − 1                       c3      n − 1
       while i > 0 and A[i] > key      c4      Σ_{j=2..n} t_j
           do A[i+1] ← A[i]            c5      Σ_{j=2..n} (t_j − 1)
              i ← i − 1                c6      Σ_{j=2..n} (t_j − 1)
       A[i+1] ← key                    c7      n − 1

T(n) = c1·n + c2·(n−1) + c3·(n−1) + c4·Σ_{j=2..n} t_j + c5·Σ_{j=2..n} (t_j − 1) + c6·Σ_{j=2..n} (t_j − 1) + c7·(n−1)

How can we simplify T(n)? Hint: compute the value of T(n) in the best/worst case.

Here t_j = the number of times the condition of the while loop is tested for the current value of j. In the worst case (when the input is reverse-sorted), in each iteration of the for loop, all j − 1 elements need to be right-shifted, i.e., t_j = (j − 1) + 1 = j [1 is added to represent the last test]. Putting this into the equation above, we get T(n) = A·n^2 + B·n + C → O(n^2), where A, B, C are constants.

If you are asked to compute the worst-case time of Insertion-Sort, just say that the while loop runs j times for the worst possible input, i.e., a reverse-sorted array (explain why), and then compute T(n) from T(n) = Σ_{j=2..n} j = n(n+1)/2 − 1, which is O(n^2). You don't really need to show as detailed a calculation as the one shown here and in the book.

What is T(n) in the best case (when the input numbers are already sorted)?
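One way to work that out (a sketch, using the cost names from the table above): when the input is already sorted, the while condition fails on its first test, so t_j = 1 for every j, the two shift lines never execute, and

T(n) = c1·n + (c2 + c3 + c4 + c7)·(n − 1) = (c1 + c2 + c3 + c4 + c7)·n − (c2 + c3 + c4 + c7),

a linear function of n, i.e., Θ(n).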
Polynomial & non-polynomial time algorithms
• Polynomial time algorithm: algorithm whose worst-case running time is polynomial
  • E.g.: Linear Search (in unsorted array): O(n), Binary Search (in sorted array): O(lg n), InsertionSort: O(n^2), etc.
• Non-polynomial time algorithm: algorithm whose worst-case running time is not polynomial
  • Examples: an algorithm to enumerate and print all possible orderings of n persons: O(n!), an algorithm to enumerate and print all possible binary strings of length n: O(2^n)
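A minimal recursive sketch in C of the second example (printing all 2^n binary strings); the function name and the buffer bound are our own choices:

#include <stdio.h>

#define MAXN 20

/* Prints all 2^n binary strings of length n. Since 2^n strings are
   printed, the running time is at least 2^n: not polynomial in n. */
void enumerate(char buf[], int pos, int n)
{
    if (pos == n) {
        buf[n] = '\0';
        printf("%s\n", buf);
        return;
    }
    buf[pos] = '0';
    enumerate(buf, pos + 1, n);  /* extend the prefix with 0 */
    buf[pos] = '1';
    enumerate(buf, pos + 1, n);  /* extend the prefix with 1 */
}

int main(void)
{
    char buf[MAXN + 1];
    enumerate(buf, 0, 3);        /* prints 000, 001, ..., 111 */
    return 0;
}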
Amortized Analysis (Section 17.1)
• Amortized Running Time: average time taken by an operation in a
sequence of operations on a given data structure (doesn’t give time
of a single operation).
• Example: consider the MultipoppableStack data structure; it supports 3 operations on a stack:
– PUSH(x) -> takes O(1) time
– POP() -> takes O(1) time
– MULTIPOP(k) // pops the top k items from the stack
    while not STACK-EMPTY() and k > 0
        do POP()
           k ← k − 1
• Let's consider a sequence of n PUSH, POP & MULTIPOP operations
• An example of such a sequence of n = 9 operations: <PUSH, PUSH, POP, PUSH, PUSH, MULTIPOP(2), PUSH, PUSH, MULTIPOP(3)>
Example of Amortized Analysis (Contd.)
• MULTIPOP is just a number of POP calls, so we only need to count the total # of PUSHes and POPs in the sequence.
• Each object can be POPed at most once for each time it is PUSHed. Therefore:
total # of POPs (including POP calls inside MULTIPOP) ≤ total # of PUSHes ≤ total # of operations = n.
Therefore, the total # of PUSH + POP calls ≤ 2n.
• Each PUSH/POP operation takes O(1) time, so the total time taken by the sequence of operations is ≤ 2n·O(1), i.e., O(n).
• There are n operations in the sequence, so on average each operation takes O(n)/n, which is O(1).
⸫ The amortized running time of an operation on MultipoppableStack is O(1).
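To tie the pieces together, a minimal array-backed sketch of MultipoppableStack in C; the struct layout, the fixed capacity, and the names are our own choices, but each PUSH/POP is O(1) and MULTIPOP(k) is just repeated POPs, exactly as the analysis assumes:

#include <stdio.h>
#include <stdbool.h>

#define CAP 100

typedef struct {
    int data[CAP];
    int top;                 /* number of elements currently on the stack */
} MultipoppableStack;

bool stack_empty(const MultipoppableStack *s) { return s->top == 0; }

void push(MultipoppableStack *s, int x)      /* O(1) */
{
    if (s->top < CAP)
        s->data[s->top++] = x;
}

int pop(MultipoppableStack *s)               /* O(1) */
{
    return s->data[--s->top];
}

void multipop(MultipoppableStack *s, int k)  /* pops the top k items */
{
    while (!stack_empty(s) && k > 0) {
        pop(s);
        k = k - 1;
    }
}

int main(void)
{
    MultipoppableStack s = { .top = 0 };
    /* The n = 9 operation sequence from the slide: */
    push(&s, 1); push(&s, 2);
    pop(&s);
    push(&s, 3); push(&s, 4);
    multipop(&s, 2);
    push(&s, 5); push(&s, 6);
    multipop(&s, 3);
    printf("items left: %d\n", s.top);       /* prints: items left: 0 */
    return 0;
}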