Algorithms
Analysis
Data Structures
Problem Solving
Main Steps:
1. Problem definition
2. Algorithm design / algorithm specification
3. Algorithm analysis
4. Implementation
5. Testing
6. [Maintenance]
1. Problem Definition
What is the task to be accomplished?
  Calculate the average of the grades for a given student
What are the time / space / speed / performance requirements?
2. Algorithm Design / Specification
Algorithm: a finite set of instructions that, if followed, accomplishes a particular task.
Describe it in natural language, pseudo-code, diagrams, etc.
Criteria to follow:
  Input: zero or more quantities
  Output: one or more quantities
  Definiteness: each instruction is clear and precise
  Finiteness: the algorithm must stop after a finite (possibly very large) number of steps
  Effectiveness: each instruction must be basic enough to be feasible
Computer Algorithm
A procedure (a finite set of well-defined instructions) for accomplishing some task which, given an initial state, terminates in a defined end-state.
The computational complexity and efficient implementation of the algorithm are important in computing, and both depend on suitable data structures.
4, 5, 6: Implementation, Testing, Maintenance
Implementation
  Decide on the programming language to use
  • C, C++, Lisp, Java, Perl, Prolog, assembly, etc.
  Write clean, well-documented code
Testing
  Test, test, test
Maintenance
  Integrate feedback from users, fix bugs, ensure compatibility across different versions
3. Algorithm Analysis
Space complexity
  How much space is required?
Time complexity
  How much time does it take to run the algorithm?
Often, we deal with estimates!
Space Complexity
Space complexity = the amount of memory required by an algorithm to run to completion
Some algorithms may be more efficient if the data is completely loaded into memory
Need to look also at system limitations
  E.g., sorting the records of a 2GB file: can I afford to load the entire file into memory?
Time Complexity
Often more important than space complexity
  space available (for computer programs!) tends to grow larger and larger
  time is still a problem for all of us
  3–4GHz processors are on the market
  still … some problems would take years to run to completion
An algorithm's running time is an important issue
Running Time
[Figure: running time (1–5 ms) for inputs A–G, marking the best-case, average-case, and worst-case times.]
Suppose the program includes an if-then statement that may execute or not: variable running time
Typically, algorithms are measured by their worst case
Running Time
The running time of an algorithm varies with the inputs, and typically grows with the size of the inputs.
To evaluate an algorithm or to compare two algorithms, we focus on their relative rates of growth with respect to the increase of the input size.
The average running time is difficult to determine.
We focus on the worst-case running time
  Easier to analyze
[Figure: best-case, average-case, and worst-case running time (0–120) vs. input size (1000–4000).]
Algorithm analysis: Experimental Approach
Write a program to implement the algorithm.
Run this program with inputs of varying size and composition.
Get an accurate measure of the actual running time (e.g., a system call such as date).
Plot the results.
Problems?
[Figure: measured time (ms, 0–9000) vs. input size (0–100) as a scatter plot.]
Limitations of Experimental Approach
The algorithm has to be implemented, which may take a long time and could be very difficult.
Results may not be indicative of the running time on other inputs not included in the experiments.
In order to compare two algorithms, the same hardware and software must be used.
Algorithm analysis: Theoretical Approach
Based on a high-level description of the algorithm, rather than a language-dependent implementation
Makes possible an evaluation of the algorithms that is independent of the hardware and software environments
Generality
Algorithm Description
Pseudocode
  High-level description of an algorithm.
  More structured than plain English.
  Less detailed than a program.
  Preferred notation for describing algorithms.
Example: find the max element of an array

Algorithm arrayMax(A, n)
  Input: array A of n integers
  Output: maximum element of A
  currentMax ← A[0]
  for i ← 1 to n − 1 do
    if A[i] > currentMax then
      currentMax ← A[i]
  return currentMax
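The pseudocode above translates almost line for line into C. The sketch below is not slide material; the name arrayMax comes from the pseudocode, while the int element type and the array-plus-length interface are assumptions.

```c
#include <assert.h>

/* C translation of the arrayMax pseudocode above. */
int arrayMax(const int A[], int n) {
    int currentMax = A[0];               /* currentMax <- A[0]        */
    for (int i = 1; i <= n - 1; i++) {   /* for i <- 1 to n-1 do      */
        if (A[i] > currentMax)           /* if A[i] > currentMax then */
            currentMax = A[i];           /*   currentMax <- A[i]      */
    }
    return currentMax;                   /* return currentMax         */
}
```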
Pseudocode
Control flow
  if … then … [else …]
  while … do …
  repeat … until …
  for … do …
  Indentation replaces braces
Method declaration
  Algorithm method (arg [, arg…])
    Input …
    Output …
Method call
  var.method (arg [, arg…])
Return value
  return expression
Expressions
  ← Assignment (equivalent to =)
  = Equality testing (equivalent to ==)
  n² Superscripts and other mathematical formatting allowed
Primitive Operations
The basic computations performed by an algorithm
Identifiable in pseudocode
Largely independent from the programming language
Exact definition not important
  Use comments
Instructions have to be basic enough and feasible!
Examples:
  Evaluating an expression
  Assigning a value to a variable
  Calling a method
  Returning from a method
Low-Level Algorithm Analysis
Based on primitive operations (low-level computations independent from the programming language)
E.g.:
  Making an addition = 1 operation
  Calling a method or returning from a method = 1 operation
  Indexing into an array = 1 operation
  Comparison = 1 operation, etc.
Method: inspect the pseudo-code and count the number of primitive operations executed by the algorithm
Counting Primitive Operations
By inspecting the code, we can determine the number of primitive operations executed by an algorithm, as a function of the input size.

Algorithm arrayMax(A, n)             # operations
  currentMax ← A[0]                    2
  for i ← 1 to n − 1 do                2 + n
    if A[i] > currentMax then          2(n − 1)
      currentMax ← A[i]                2(n − 1)
    { increment counter i }            2(n − 1)
  return currentMax                    1
                             Total     7n − 1
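As a sanity check on the counts above, the per-line costs can be summed in code and compared against the closed form 7n − 1. countOps is a hypothetical helper, not part of the slides.

```c
#include <assert.h>

/* Sum of the per-line operation counts from the table above:
   2 (initialization) + (2 + n) (loop control) + 3 * 2(n-1)
   (the three lines inside the loop) + 1 (return). */
long countOps(long n) {
    return 2 + (2 + n) + 3 * 2 * (n - 1) + 1;
}
```

For any n ≥ 1 this simplifies to 7n − 1, the total on the slide.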
Estimating Running Time
Algorithm arrayMax executes 7n − 1 primitive operations.
Let's define
  a := time taken by the fastest primitive operation
  b := time taken by the slowest primitive operation
Let T(n) be the actual running time of arrayMax. We have
  a(7n − 1) ≤ T(n) ≤ b(7n − 1)
Therefore, the running time T(n) is bounded by two linear functions.
Growth Rate of Running Time
Changing computer hardware /
software
Affects T(n) by a constant factor
Does not alter the growth rate of T(n)
The linear growth rate of the
running time T(n) is a property of
algorithm arrayMax
Growth Rate
On a graph, as you go to the right, a faster-growing function eventually becomes larger...
[Figure: value of fA(n) = 30n + 8 and fB(n) = n² + 1 as n increases; fB eventually exceeds fA.]
Constant Factors
The growth rate is not affected by
  constant factors or
  lower-order terms
Examples
  10²n + 10⁵ is a linear function
  10³n² + 10⁵n is a quadratic function
Understanding Rate of Growth
Consider the example of buying elephants and a fish:
  Cost: (cost_of_elephants) + (cost_of_fish)
  or Cost ~ cost_of_elephants (approximation)
The low-order terms of a function are relatively insignificant for large n
  n⁴ + 100n² + 10n + 50 ≈ n⁴
The highest-order term of a function determines its rate of growth!
Example
A website to process user data (e.g., financial records).
To process n records:
  program A takes fA(n) = 30n + 8 microseconds
  program B takes fB(n) = n² + 1 microseconds
Which program would you choose, knowing you'll want to support millions of users?
Compare rates of growth: 30n + 8 grows like n, while n² + 1 grows like n²
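The comparison above can be checked numerically: with these formulas, program B is actually faster for small n, but program A wins for every n past the crossover (around n = 31). The two functions come from the slide; wrapping them as C functions is a sketch.

```c
#include <assert.h>

/* Running-time formulas from the slide, in microseconds. */
long fA(long n) { return 30 * n + 8; }   /* program A: linear    */
long fB(long n) { return n * n + 1; }    /* program B: quadratic */
```

fB(30) = 901 < fA(30) = 908, but fB(31) = 962 > fA(31) = 938; from then on the quadratic program only falls further behind, which is why the growth rate, not the constant factor, decides the choice for millions of users.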
Algorithm Analysis
We know:
  Experimental approach – problems
  Low-level analysis – count operations
Characterize an algorithm as a function of the "problem size"
E.g.
  Input data = array → problem size is N (length of array)
  Input data = matrix → problem size is N × M
Asymptotic Notation
A way to describe the behavior of functions
Abstracts away low-order terms and constant factors
How we indicate the running times of algorithms
Describe the running time of an algorithm as n grows to ∞
Asymptotic Notation
Goal: to simplify analysis by getting rid of unneeded information
We want to say in a formal way that 3n² ≈ n²
The “Big-Oh” Notation:
given functions f(n) and g(n), we say
that f(n) is O(g(n)) if and only if there
are positive constants c and n0 such
that f(n) ≤ c g(n) for n ≥ n0
Graphic Illustration
f(n) = 2n + 6
According to the definition, we need to find a function g(n), a constant c, and a constant n0 such that f(n) ≤ c·g(n) when n ≥ n0
Take g(n) = n, c = 4, and n0 = 3:
  2n + 6 ≤ cn
  cn − 2n ≥ 6
  n(c − 2) ≥ 6
  n ≥ 6/(c − 2)   (= 3 for c = 4)
f(n) is O(n); the order of f(n) is n
[Figure: f(n) = 2n + 6 lies below c·g(n) = 4n for n ≥ n0 = 3.]
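The constants chosen above can be spot-checked: with c = 4 and n0 = 3, f(n) = 2n + 6 ≤ 4n holds at every n ≥ 3 and fails below it. This is a numeric check of the chosen constants, not a proof; boundHolds is a hypothetical name.

```c
#include <assert.h>

/* Does f(n) = 2n + 6 stay at or below c*g(n) = 4n at this n? */
int boundHolds(long n) {
    return 2 * n + 6 <= 4 * n;
}
```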
More Examples
O(g(n)) = the set of functions with a smaller or same order of growth as g(n)
50n³ + 20n + 4 is O(n³)
  Would it be correct to say it is O(n³ + n)?
  • Not useful, as n³ exceeds n by far for large values
  Would it be correct to say it is O(n⁵)?
  • OK, but g(n) should be as close as possible to f(n)
3 log(n) + log(log(n)) = O( ? )
  • Simple rule: drop lower-order terms and constant factors
Big-Oh and Growth Rate
The big-Oh notation gives an upper
bound on the growth rate of a function.
The statement “f (n) is O(g (n))” means
that the growth rate of f (n) is no more
than the growth rate of g (n).
We can use the big-Oh notation to rank
functions according to their growth rate.
Big-Oh Rules
If f(n) is a polynomial of degree d, then f(n) is O(nᵈ), i.e.,
  1. Drop lower-order terms
  2. Drop constant factors
Use the smallest possible class of functions
  Say "2n is O(n)" instead of "2n is O(n²)"
Use the simplest expression of the class
  Say "3n + 5 is O(n)" instead of "3n + 5 is O(3n)"
Properties of Big-Oh
log nˣ is O(log n), for x > 0 – how?
nˣ vs. aⁿ, for any fixed x > 0 and a > 1
  An algorithm of order n to a certain power is better than an algorithm of order a (> 1) to the power of n
logˣ n vs. nʸ, for x > 0 and y > 0
  An algorithm of order log n (to a certain power) is better than an algorithm of n raised to a power y.
Asymptotic analysis - terminology
Special classes of algorithms:
  logarithmic: O(log n)
  linear: O(n)
  quadratic: O(n²)
  polynomial: O(nᵏ), k ≥ 1
  exponential: O(aⁿ), a > 1
Polynomial vs. exponential?
Logarithmic vs. polynomial?
Some Numbers

  log n   n    n log n   n²     n³      2ⁿ
  0       1    0         1      1       2
  1       2    2         4      8       4
  2       4    8         16     64      16
  3       8    24        64     512     256
  4       16   64        256    4096    65536
  5       32   160       1024   32768   4294967296
More Examples …
  n⁴ + 100n² + 10n + 50 is of the order of O(n⁴)
  10n³ + 2n² is O(n³)
  n³ − n² is O(n³)
  10 is O(1), 1270 is O(1)
Running Time Calculations
General Rules
1. FOR loop
• The number of iterations times the time of the inside
statements.
2. Nested loops
• The product of the number of iterations times the
time of the inside statements.
3. Consecutive Statements
• The sum of running time of each segment.
4. If/Else
• The testing time plus the larger running time
of the cases.
Some Examples
Case 1: for (i=0; i<n; i++)
          for (j=0; j<n; j++)
            k++;
        → O(n²)
Case 2: for (i=0; i<n; i++)
          k++;
        for (i=0; i<n; i++)
          for (j=0; j<n; j++)
            k++;
        → O(n²)
Case 3: for (int i=0; i<n-1; i++)
          for (int j=0; j<i; j++)
            k += 1;
        → O(n²)
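Case 3 can be made runnable to count how often k is actually incremented: the inner loop runs i times, giving 0 + 1 + … + (n − 2) = (n − 1)(n − 2)/2 increments, which is O(n²). case3Count is a hypothetical name for this sketch.

```c
#include <assert.h>

/* Executes the Case 3 loops above and returns how many times
   the innermost statement ran. */
long case3Count(int n) {
    long k = 0;
    for (int i = 0; i < n - 1; i++)
        for (int j = 0; j < i; j++)
            k += 1;
    return k;
}
```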
Some Math
Properties of logarithms:
  log_b(xy) = log_b x + log_b y
  log_b(x/y) = log_b x − log_b y
  log_b(xᵃ) = a·log_b x
  log_b a = log_x a / log_x b
Properties of exponentials:
  a^(b+c) = aᵇ·aᶜ
  a^(bc) = (aᵇ)ᶜ
  aᵇ / aᶜ = a^(b−c)
  b = a^(log_a b)
  bᶜ = a^(c·log_a b)
Problem:
Order the following functions by their asymptotic growth rates
  n log n
  log n³
  n²
  2^(log n)
  log(log n)
  Sqr(log n)