04 Analysis
1.4 ANALYSIS OF ALGORITHMS
‣ introduction
‣ observations
‣ mathematical models
‣ order-of-growth classifications
‣ theory of algorithms
‣ memory
http://algs4.cs.princeton.edu
Running time
Analytic Engine
Cast of characters
Theoretician wants
to understand.
Reasons to analyze algorithms
Predict performance.
Provide guarantees.
Some algorithmic successes
N-body simulation.
・Simulate gravitational interactions among N bodies.
・Brute force: N^2 steps.
The challenge
Scientific method.
・Observe some feature of the natural world.
・Hypothesize a model that is consistent with the observations.
・Predict events using the hypothesis.
・Verify the predictions by making further observations.
・Validate by repeating until the hypothesis and observations agree.
Principles.
・Experiments must be reproducible.
・Hypotheses must be falsifiable.
3-SUM. Given N distinct integers, how many triples sum to exactly zero?
Ex. Triples that sum to zero: (-40, 40, 0) and (-10, 0, 10).
// ...
int main() {
    ifstream file("1Mints.txt");
    vector<int> a;
    int x;
    while (file >> x) {
        a.push_back(x);
    }
    cout << ThreeSum(a) << endl;
    return 0;
}
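The client above calls ThreeSum, whose definition is elided on this slide. A minimal brute-force sketch is given here, assuming the function takes the vector by reference and returns the number of triples; the signature and the widening to long long are assumptions, not taken from the slides.

```cpp
#include <cstddef>
#include <vector>

// Brute-force 3-SUM (assumed signature): count the triples i < j < k
// with a[i] + a[j] + a[k] == 0. Values are widened to long long before
// adding so that the sum of three ints cannot overflow.
long long ThreeSum(const std::vector<int>& a) {
    const std::size_t n = a.size();
    long long count = 0;
    for (std::size_t i = 0; i < n; i++)
        for (std::size_t j = i + 1; j < n; j++)
            for (std::size_t k = j + 1; k < n; k++)
                if (static_cast<long long>(a[i]) + a[j] + a[k] == 0)
                    count++;
    return count;
}
```

For a hypothetical input of eight distinct integers, 30 -40 -20 -10 40 0 10 5, the function counts four triples, among them (-40, 40, 0) and (-10, 0, 10).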
Measuring the running time
A. Manual.
ThreeSum 2Kints.txt
ThreeSum 4Kints.txt
Measuring the running time
// ...
#include <ctime>
// ...
int main() {
    ifstream file("1Mints.txt");
    vector<int> a;
    int x;
    int c = 0;
    // Keep only the first 2000 integers from the file.
    while (file >> x && c++ < 2000) {
        a.push_back(x);
    }
    // Time only the call to ThreeSum; clock() reports processor time.
    clock_t start = clock();
    cout << ThreeSum(a) << endl;
    clock_t end = clock();
    double elapsed = double(end - start) / CLOCKS_PER_SEC;
    cout << "Time: " << elapsed << "s" << endl;
    return 0;
}
Empirical analysis
Run the program for various input sizes and measure running time.
N        time (seconds)
250      0
500      0
1,000    0.1
2,000    0.8
4,000    6.4
8,000    51.1
16,000   ?
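One way to fill in the missing row is to look at how the time grows when N doubles. The sketch below estimates the exponent b of a power law T(N) = a N^b from two rows of the table and extrapolates to a larger N. The type and function names are hypothetical, chosen only for illustration.

```cpp
#include <cmath>

// One measurement: input size and observed running time in seconds.
struct Observation {
    double n;
    double seconds;
};

// Estimate b in T(N) = a N^b from two observations:
// b = lg(T2 / T1) / lg(N2 / N1).
double EstimateExponent(const Observation& small, const Observation& large) {
    return std::log2(large.seconds / small.seconds) /
           std::log2(large.n / small.n);
}

// Extrapolate from a reference observation, assuming exponent b:
// T(n) = T(ref) * (n / ref.n)^b.
double Predict(const Observation& ref, double b, double n) {
    return ref.seconds * std::pow(n / ref.n, b);
}
```

With the rows for N = 4,000 and N = 8,000, the estimated exponent is close to 3, so each doubling multiplies the time by about 8; with b = 3 the extrapolation from 51.1 seconds gives roughly 409 seconds for N = 16,000.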
Data analysis
Log-log plot. Plot running time T(N) vs. input size N using log-log scale.
The data points fall on a straight line: lg(T(N)) = b lg N + c, with b = 2.999 and c = lg a = -33.2103.
Power law. T(N) = a N^b, where a = 2^c.
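The fitted line can be turned directly into a prediction. The sketch below evaluates T(N) = a N^b as 2^(b lg N + c), using the slope and intercept quoted above as default arguments; the function name is hypothetical.

```cpp
#include <cmath>

// Evaluate the fitted power law T(N) = a N^b with a = 2^c.
// Computed as 2^(b lg N + c) to avoid forming the tiny constant a.
// The defaults are the fitted values quoted on the slide.
double PowerLawSeconds(double n, double b = 2.999, double c = -33.2103) {
    return std::exp2(b * std::log2(n) + c);
}
```

Under these values the model gives about 51 seconds for N = 8,000 and about 408 seconds for N = 16,000.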
Observations.
N        time (seconds)
8,000    51.1
8,000    51.0
8,000    51.1
16,000   410.8
The observations validate the hypothesis.
Experimental algorithmics
Mathematical models for running time
Cost of basic operations
int count = 0;
for (int i = 0; i < N; i++)
    if (a[i] == 0)        // N array accesses
        count++;

operation               frequency
variable declaration    2
assignment statement    2
equal to compare        N
array access            N
increment               N to 2 N
Example: 2-SUM
int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i+1; j < N; j++)
        if (a[i] + a[j] == 0)
            count++;

operation           frequency
equal to compare    ½ N (N − 1)
array access        N (N − 1)
increment           ½ N (N − 1) to N (N − 1)

These frequencies are tedious to count exactly.
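The compare and array-access frequencies can be checked by instrumenting the loop. The sketch below is an illustration only: the struct and function names are hypothetical, and it tallies two of the operations from the table rather than all of them.

```cpp
#include <cstddef>
#include <vector>

// Tallies for one run of the 2-SUM double loop.
struct TwoSumCounts {
    long long equal_compares = 0;  // evaluations of (a[i] + a[j] == 0)
    long long array_accesses = 0;  // reads of a[i] and a[j]
    long long pairs_found = 0;     // pairs that sum to zero
};

// Run the 2-SUM loop and count operations as they happen.
TwoSumCounts CountTwoSumOperations(const std::vector<int>& a) {
    TwoSumCounts c;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; i++) {
        for (std::size_t j = i + 1; j < n; j++) {
            c.array_accesses += 2;  // one read each of a[i] and a[j]
            c.equal_compares += 1;
            if (static_cast<long long>(a[i]) + a[j] == 0)
                c.pairs_found++;
        }
    }
    return c;
}
```

For N = 8 the tallies are 28 compares and 56 array accesses, matching ½ N (N − 1) and N (N − 1).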
Simplifying the calculations
Simplification 1: cost model
Cost model. Use some basic operation as a proxy for running time.
int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i+1; j < N; j++)
        if (a[i] + a[j] == 0)
            count++;

operation           frequency
equal to compare    ½ N (N − 1)
increment           ½ N (N − 1) to N (N − 1)
Simplification 2: tilde notation
operation    frequency                    tilde notation
increment    ½ N (N − 1) to N (N − 1)     ~ ½ N^2 to ~ N^2
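A short worked step shows why the lower-order term can be dropped. It assumes the usual meaning of the tilde: f(N) ~ g(N) when the ratio f(N) / g(N) tends to 1 as N grows.

```latex
\tfrac{1}{2} N (N - 1) \;=\; \tfrac{1}{2} N^2 - \tfrac{1}{2} N \;\sim\; \tfrac{1}{2} N^2
\qquad\text{because}\qquad
\lim_{N \to \infty} \frac{\tfrac{1}{2} N^2 - \tfrac{1}{2} N}{\tfrac{1}{2} N^2}
  \;=\; \lim_{N \to \infty} \left( 1 - \frac{1}{N} \right) \;=\; 1 .
```

For N = 1,000 the dropped term changes the count by only 0.1 percent.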
Example: 2-SUM
int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i+1; j < N; j++)
        if (a[i] + a[j] == 0)
            count++;
A. ~ N^2 array accesses.
Bottom line. Use cost model and tilde notation to simplify counts.
Example: 3-SUM
int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i+1; j < N; j++)
        for (int k = j+1; k < N; k++)
            if (a[i] + a[j] + a[k] == 0)    // "inner loop"
                count++;
A. ~ ½ N^3 array accesses.
Bottom line. Use cost model and tilde notation to simplify counts.
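The 3-SUM count can be checked the same way. The sketch below walks the triple loop without touching any data and charges three array accesses per execution of the inner-loop test; the function name is hypothetical.

```cpp
#include <cstddef>

// Number of array accesses made by the brute-force 3-SUM triple loop on
// an input of size n: three reads (a[i], a[j], a[k]) per inner-loop test.
long long CountThreeSumArrayAccesses(std::size_t n) {
    long long accesses = 0;
    for (std::size_t i = 0; i < n; i++)
        for (std::size_t j = i + 1; j < n; j++)
            for (std::size_t k = j + 1; k < n; k++)
                accesses += 3;
    return accesses;
}
```

The exact total is N (N − 1) (N − 2) / 2. For N = 100 that is 485,100, about 97 percent of ½ N^3 = 500,000, and the ratio approaches 1 as N grows.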
Diversion: estimating a discrete sum
Ex 1. 1 + 2 + … + N.
Ex 2. 1^k + 2^k + … + N^k.
Estimating a discrete sum
Ex 4. 1 + ½ + ¼ + ⅛ + …
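The answers to these examples are not preserved in the text above. As a sketch, the standard estimates are given here, obtained by replacing each sum with the corresponding integral for Ex 1 and Ex 2 and by summing the geometric series for Ex 4.

```latex
\sum_{i=1}^{N} i \;=\; \frac{N (N + 1)}{2} \;\sim\; \tfrac{1}{2} N^2
\qquad
\sum_{i=1}^{N} i^k \;\sim\; \int_{1}^{N} x^k \, dx \;\sim\; \frac{N^{k+1}}{k + 1}
\qquad
1 + \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \cdots \;=\; \sum_{i=0}^{\infty} 2^{-i} \;=\; 2
```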
Estimating a discrete sum
wolframalpha.com
Mathematical models for running time
In practice,
・Formulas can be complicated.
・Advanced mathematics might be required.
・Exact models best left for experts.
T_N = c1 A + c2 B + c3 C + c4 D + c5 E
The costs c1, …, c5 depend on machine and compiler; the frequencies A, …, E depend on algorithm and input.
A = array access
B = integer add
C = integer compare
D = increment
E = variable assignment
Definition. If f(N) ~ c g(N) for some constant c > 0, then the order of growth
of f(N) is g(N).
・Ignores leading coefficient.
・Ignores lower-order terms.
Ex. The order of growth of the running time of this code is N^3.
int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i+1; j < N; j++)
        for (int k = j+1; k < N; k++)
            if (a[i] + a[j] + a[k] == 0)
                count++;
Common order-of-growth classifications
order of growth | name | typical code framework | description | example
1 | constant | a = b + c; | statement | add two numbers
log N | logarithmic | while (N > 1) { N = N / 2; ... } | divide in half | binary search
N log N | linearithmic | [see mergesort lecture] | divide and conquer | mergesort
Practical implications of order-of-growth
N | millions | tens of millions | hundreds of millions | billions
N log N | hundreds of thousands | millions | millions | hundreds of millions
N^2 | hundreds | thousand | thousands | tens of thousands
2^N | 20 | 20s | 20s | 30
Bottom line. Need a linear or linearithmic algorithm to keep pace with Moore's law.
Search Algorithm: Binary Search
    if (A[mid] == Num)
        return mid;    // Num found at mid
    else
        return -1;     // Num not found
}
Analyzing BinarySearch
• log_2 N = log_2 2^X = X
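The identity above says that an array of N = 2^X entries can be halved X times, which bounds the number of probes a binary search makes. Only the tail of the slide's code survives, so the sketch below is a standard iterative variant and not necessarily the slide's version; the function name and the optional probes parameter are assumptions added to make the bound observable.

```cpp
#include <vector>

// Binary search in a sorted vector. Returns the index of key, or -1 if
// key is absent. If probes is non-null, stores how many elements a[mid]
// were examined.
int BinarySearch(const std::vector<int>& a, int key, int* probes = nullptr) {
    int lo = 0;
    int hi = static_cast<int>(a.size()) - 1;
    int count = 0;
    int result = -1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;  // avoids overflow of lo + hi
        count++;
        if (key < a[mid]) {
            hi = mid - 1;              // continue in the left half
        } else if (key > a[mid]) {
            lo = mid + 1;              // continue in the right half
        } else {
            result = mid;              // key found at mid
            break;
        }
    }
    if (probes != nullptr) *probes = count;
    return result;
}
```

For N = 1024 = 2^10 sorted entries, a search examines at most 1 + log_2 N = 11 elements.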
pair (a[i], a[j])    −(a[i] + a[j])
(-40, -20)           60
(-40, 10)            30
⋮                    ⋮
(-20, -10)           30
⋮                    ⋮
(-10, 0)             10
⋮                    ⋮
( 10, 30)            -40
( 10, 40)            -50
Only count if a[i] < a[j] < a[k], to avoid double counting.
ThreeSumFast
N        time (seconds)
32,000   14.88
64,000   59.16
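The pairs listed above suggest a sorting-based approach: sort the array, then for each pair binary-search for the value that would complete a zero sum. The body of ThreeSumFast is not shown in the text, so the sketch below is one plausible reading, assuming distinct integers as in the problem statement. It searches only to the right of j, which enforces a[i] < a[j] < a[k].

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sorting-based 3-SUM sketch (assumes the integers are distinct).
// Takes the vector by value so the caller's data is left unsorted.
long long ThreeSumFast(std::vector<int> a) {
    std::sort(a.begin(), a.end());
    const std::size_t n = a.size();
    long long count = 0;
    for (std::size_t i = 0; i < n; i++) {
        for (std::size_t j = i + 1; j < n; j++) {
            // Value that completes the triple; widened to avoid overflow.
            const long long target = -(static_cast<long long>(a[i]) + a[j]);
            // Search only a[j+1..n-1], so each triple is counted once.
            const auto first = a.begin() + static_cast<std::ptrdiff_t>(j + 1);
            if (std::binary_search(first, a.end(), target))
                count++;
        }
    }
    return count;
}
```

There are about ½ N^2 pairs and each costs one binary search, so the running time is proportional to N^2 log N in the worst case.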
Goals.
・Establish “difficulty” of a problem.
・Develop “optimal” algorithms.
Approach.
・Suppress details in analysis: analyze “to within a constant factor.”
・Eliminate variability in input model: focus on the worst case.
Upper bound. Performance guarantee of algorithm for any input.
Lower bound. Proof that no algorithm can do better.
Optimal algorithm. Lower bound = upper bound (to within a constant factor).
Commonly-used notations in the theory of algorithms
notation | provides | example | shorthand for | used to
Big Theta | asymptotic order of growth | Θ(N^2) | ½ N^2; 10 N^2; 5 N^2 + 22 N log N + 3 N; ⋮ | classify algorithms
Big Oh | Θ(N^2) and smaller | O(N^2) | 10 N^2; 100 N; 22 N log N + 3 N; ⋮ | develop upper bounds
Big Omega | Θ(N^2) and larger | Ω(N^2) | ½ N^2; N^5; N^3 + 22 N log N + 3 N; ⋮ | develop lower bounds
Theory of algorithms: example 1
Goals.
・Establish “difficulty” of a problem and develop “optimal” algorithms.
・Ex. 1-SUM = “Is there a 0 in the array?”
Upper bound. A specific algorithm.
・Ex. Brute-force algorithm for 1-SUM: Look at every array entry.
・Running time of the optimal algorithm for 1-SUM is O(N).
Lower bound. Proof that no algorithm can do better.
・Ex. Have to examine all N entries (any unexamined one might be 0).
・Running time of the optimal algorithm for 1-SUM is Ω(N).
Optimal algorithm.
・Lower bound equals upper bound (to within a constant factor).
・Ex. Brute-force algorithm for 1-SUM is optimal: its running time is Θ(N).
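The brute-force algorithm for 1-SUM is short enough to write out. The sketch below uses a hypothetical function name; it examines each entry at most once, which is where the O(N) upper bound comes from.

```cpp
#include <vector>

// Brute-force 1-SUM: is there a 0 in the array?
// Examines each entry at most once, so at most N array accesses.
bool ContainsZero(const std::vector<int>& a) {
    for (int x : a)
        if (x == 0)
            return true;
    return false;
}
```

When the array holds no zero, every one of the N entries is examined, which matches the lower-bound argument.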
Theory of algorithms: example 2
Goals.
・Establish “difficulty” of a problem and develop “optimal” algorithms.
・Ex. 3-SUM.
Upper bound. A specific algorithm.
・Ex. Brute-force algorithm for 3-SUM.
・Running time of the optimal algorithm for 3-SUM is O(N^3).
・Ex. Improved algorithm for 3-SUM.
・Running time of the optimal algorithm for 3-SUM is O(N^2 log N).
Start.
・Develop an algorithm.
・Prove a lower bound.
Gap?
・Lower the upper bound (discover a new algorithm).
・Raise the lower bound (more difficult).
Golden Age of Algorithm Design.
・1970s-.
・Steadily decreasing upper bounds for many important problems.
・Many known optimal algorithms.
Caveats.
・Overly pessimistic to focus on worst case?
・Need better than “to within a constant factor” to predict performance.
Commonly-used notations in the theory of algorithms
notation | provides | example | shorthand for | used to
Tilde | leading term | ~ 10 N^2 | 10 N^2; 10 N^2 + 22 N log N; 10 N^2 + 2 N + 37 | provide approximate model
Big Theta | asymptotic order of growth | Θ(N^2) | ½ N^2; 10 N^2; 5 N^2 + 22 N log N + 3 N | classify algorithms
Big Oh | Θ(N^2) and smaller | O(N^2) | 10 N^2; 100 N; 22 N log N + 3 N | develop upper bounds
Big Omega | Θ(N^2) and larger | Ω(N^2) | ½ N^2; N^5; N^3 + 22 N log N + 3 N | develop lower bounds
Typical memory usage for fundamental types and vectors
type         bytes
bool         1
char         1
int          4
float        4
long         4
long long    8
double       8

one-dimensional vectors
type              bytes
vector<char>      N + 24
vector<int>       4 N + 24
vector<double>    8 N + 24

two-dimensional vectors
type                      bytes
vector<vector<char>>      ~ 1 M N
vector<vector<double>>    ~ 8 M N
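The one-dimensional figures can be approximated in code as a fixed per-vector overhead plus the element storage. The helper below is a hypothetical sketch: it assumes the overhead equals sizeof of the vector object (24 bytes on many 64-bit implementations, which is an assumption about the platform), it ignores allocator bookkeeping, and it does not apply to the bit-packed std::vector<bool>.

```cpp
#include <cstddef>
#include <vector>

// Rough memory estimate for a std::vector<T>, in bytes:
// the vector object itself plus the storage it has reserved.
// Allocator bookkeeping is not included.
template <typename T>
std::size_t EstimateVectorBytes(const std::vector<T>& v) {
    return sizeof(v) + v.capacity() * sizeof(T);
}
```

For a std::vector<int> of N elements with capacity N, on a platform where sizeof(int) is 4 and the vector object is 24 bytes, this gives the 4 N + 24 shown in the table.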
Typical memory usage for objects in C++
struct S {
    char a;    // size: 1, alignment: 1
    char b;    // size: 1, alignment: 1
};             // sizeof(S) = 2, alignof(S) = 1
• Objects of type S can be allocated at any address because both S.a and S.b can be allocated at any address.
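The size and alignment of S can be checked at compile time. The sketch below also adds a second struct, Padded, which is hypothetical and not from the slides, to show how a member with stricter alignment changes the layout. The exact sizes are typical of common compilers rather than guaranteed, so only relations that hold in practice are asserted.

```cpp
#include <cstddef>

struct S {
    char a;  // size: 1, alignment: 1
    char b;  // size: 1, alignment: 1
};

// Hypothetical contrast case: the int member forces stricter alignment,
// so the compiler typically inserts padding after 'a'.
struct Padded {
    char a;
    int b;
};

static_assert(sizeof(S) == 2, "two chars, no padding expected");
static_assert(alignof(S) == 1, "S can live at any address");
static_assert(alignof(Padded) == alignof(int), "strictest member wins");
static_assert(sizeof(Padded) % alignof(Padded) == 0,
              "size is a multiple of the alignment");
static_assert(offsetof(Padded, b) % alignof(int) == 0,
              "b starts on an int boundary");
```

On a typical platform with 4-byte int, sizeof(Padded) is 8 rather than 5 because of three padding bytes after a.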
Typical memory usage summary
Turning the crank: summary
Empirical analysis.
・Execute program to perform experiments.
・Assume power law and formulate a hypothesis for running time.
・Model enables us to make predictions.
Mathematical analysis.
・Analyze algorithm to count frequency of operations.
・Use tilde notation to simplify analysis.
・Model enables us to explain behavior.
Scientific method.
・Mathematical model is independent of a particular system;
applies to machines not yet built.
・Empirical analysis is necessary to validate mathematical models
and to make predictions.