0% found this document useful (0 votes)
21 views

Analysis of Algorithms

The document discusses analyzing algorithms by making observations about their performance on different sized inputs, developing mathematical models to describe their behavior, and classifying their performance based on order-of-growth as the input size increases to predict how they will scale to larger problems. It provides an example of empirically analyzing the running time of a 3-SUM algorithm to find triples of numbers from an input that sum to zero and fitting the results to a mathematical model to hypothesize the algorithm's order-of-growth.

Uploaded by

dogan20021907
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views

Analysis of Algorithms

The document discusses analyzing algorithms by making observations about their performance on different sized inputs, developing mathematical models to describe their behavior, and classifying their performance based on order-of-growth as the input size increases to predict how they will scale to larger problems. It provides an example of empirically analyzing the running time of a 3-SUM algorithm to find triples of numbers from an input that sum to zero and fitting the results to a mathematical model to hypothesize the algorithm's order-of-growth.

Uploaded by

dogan20021907
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 43

1.

4 A NALYSIS OF A LGORITHMS
‣ introduction
‣ observations
‣ mathematical models
‣ order-of-growth classifications
‣ memory
h t t p : / / a l g s 4. c s . p r i n c e t o n . e d u
Running time

“ As soon as an Analytic Engine exists, it will necessarily guide the future


course of the science. Whenever any result is sought by its aid, the question
will arise—By what course of calculation can these results be arrived at by
the machine in the shortest time? ” — Charles Babbage (1864)

how many times do you


have to turn the crank?

Analytic Engine
Reasons to analyze algorithms

Predict performance.

Compare algorithms. this course

Provide guarantees.

Understand theoretical basis. theory of algorithms

Primary practical reason: avoid performance bugs.

client gets poor performance because programmer


did not understand performance characteristics
Some algorithmic successes

N-body simulation.
・Simulate gravitational interactions among N bodies.
・Brute force: N steps.
2

・Barnes-Hut algorithm: N log N steps, enables new research. Andrew Appel


PU '81
The challenge

Q. Will my program be able to solve a large practical input?

Why is my program so slow ? Why does it run out of memory ?

Insight. [Knuth 1970s] Use scientific method to understand performance.


Scientific method applied to analysis of algorithms

A framework for predicting performance and comparing algorithms.

Scientific method.
・Observe some feature of the natural world.
・Hypothesize a model that is consistent with the observations.
・Predict events using the hypothesis.
・Verify the predictions by making further observations.
・Validate by repeating until the hypothesis and observations agree.

Principles.
・Experiments must be reproducible.
・Hypotheses must be falsifiable.

Feature of the natural world. Computer itself.


1.4 A NALYSIS OF A LGORITHMS
‣ introduction
‣ observations
‣ mathematical models
‣ order-of-growth classifications
‣ memory
h t t p : / / a l g s 4. c s . p r i n c e t o n . e d u
Example: 3-SUM

3-SUM. Given N distinct integers, how many triples sum to exactly zero?

% more 8ints.txt a[i] a[j] a[k] sum


8
30 -40 10 0
30 -40 -20 -10 40 0 10 5 1

2 30 -20 -10 0
% java ThreeSum 8ints.txt
3 -40 40 0 0
4
4 -10 0 10 0

Context. Deeply related to problems in computational geometry.


3-SUM: brute-force algorithm

public class ThreeSum


{
public static int count(int[] a)
{
int N = a.length;
int count = 0;
for (int i = 0; i < N; i++)
for (int j = i+1; j < N; j++)
for (int k = j+1; k < N; k++) check each triple

if (a[i] + a[j] + a[k] == 0)


count++;
return count;
}

public static void main(String[] args)


{
In in = new In(args[0]);
int[] a = in.readAllInts();
StdOut.println(count(a));
}
}
Measuring the running time

Q. How to time a program?


A. Manual.
Measuring the running time

Q. How to time a program?


A. Automatic.

public class Stopwatch

Stopwatch() create a new stopwatch

double elapsedTime() time since creation (in seconds)

public static void main(String[] args)


{
In in = new In(args[0]);
int[] a = in.readAllInts();
Stopwatch stopwatch = new Stopwatch();
client code
StdOut.println(ThreeSum.count(a));
double time = stopwatch.elapsedTime();
StdOut.println("elapsed time " + time);
}
Empirical analysis

Run the program for various input sizes and measure running time.
Empirical analysis

Run the program for various input sizes and measure running time.

N time (seconds) †

250 0

500 0

1,000 0.1

2,000 0.8

4,000 6.4

8,000 51.1

16,000 ?
Data analysis

Standard plot. Plot running time T (N) vs. input size N.


Data analysis

Log-log plot. Plot running time T (N) vs. input size N using log-log scale.

lg(T (N)) = b lg N + c
b = 2.999
c = -33.2103

T (N) = a N b, where a = 2 c
3 orders
of magnitude

power law

Regression. Fit straight line through data points: a N b. slope

Hypothesis. The running time is about 1.006  10 –10  N 2.999 seconds.


Prediction and validation

Hypothesis. The running time is about 1.006  10 –10  N 2.999 seconds.

"order of growth" of running


time is about N3 [stay tuned]
Predictions.
・51.0 seconds for N = 8,000.
・408.1 seconds for N = 16,000.

Observations.
N time (seconds) †

8,000 51.1

8,000 51

8,000 51.1

16,000 410.8

validates hypothesis!
Experimental algorithmics

System independent effects.


・Algorithm. determines exponent
・Input data. in power law

determines constant in
System dependent effects.
power law
・Hardware: CPU, memory, cache, …
・Software: compiler, interpreter, garbage collector, …
・System: operating system, network, other apps, …

Bad news. Difficult to get precise measurements.


Good news. Much easier and cheaper than other sciences.

e.g., can run huge number of experiments


1.4 A NALYSIS OF A LGORITHMS
‣ introduction
‣ observations
‣ mathematical models
‣ order-of-growth classifications
‣ memory
h t t p : / / a l g s 4. c s . p r i n c e t o n . e d u
Cost of basic operations

Challenge. How to estimate constants.

operation example nanoseconds †

integer add a + b 2.1

integer multiply a * b 2.4

integer divide a / b 5.4

floating-point add a + b 4.6

floating-point multiply a * b 4.2

floating-point divide a / b 13.5

sine Math.sin(theta) 91.3

arctangent Math.atan2(y, x) 129

... ... ...

† Running OS X on Macbook Pro 2.2GHz with 2GB RAM


Cost of basic operations

Observation. Most primitive operations take constant time.

operation example nanoseconds †

variable declaration int a c1

assignment statement a = b c2

integer compare a < b c3

array element access a[i] c4

array length a.length c5

1D array allocation new int[N] c6 N

2D array allocation new int[N][N] c7 N 2

Caveat. Non-primitive operations often take more than constant time.

novice mistake: abusive string concatenation


Example: 1-SUM

Q. How many instructions as a function of input size N ?

int count = 0;
for (int i = 0; i < N; i++)
if (a[i] == 0)
count++;

N array accesses

operation frequency

variable declaration 2

assignment statement 2

less than compare N+1

equal to compare N

array access N

increment N to 2 N
Example: 2-SUM

Q. How many instructions as a function of input size N ?

int count = 0;
for (int i = 0; i < N; i++)
for (int j = i+1; j < N; j++)
if (a[i] + a[j] == 0)
count++;

operation frequency

variable declaration N+2

assignment statement N+2

less than compare ½ (N + 1) (N + 2)

equal to compare ½ N (N − 1)
tedious to count exactly
array access N (N − 1)

increment ½ N (N − 1) to N (N − 1)
Simplification 1: cost model

Cost model. Use some basic operation as a proxy for running time.

int count = 0;
for (int i = 0; i < N; i++)
for (int j = i+1; j < N; j++)
if (a[i] + a[j] == 0)
count++;

operation frequency

variable declaration N+2

assignment statement N+2

less than compare ½ (N + 1) (N + 2)

equal to compare ½ N (N − 1)

array access N (N − 1) cost model = array accesses

increment ½ N (N − 1) to N (N − 1) (we assume compiler/JVM do not


optimize any array accesses away!)
Simplification 2: tilde notation

・Estimate running time (or memory) as a function of input size N.


・Ignore lower order terms.
– when N is large, terms are negligible
– when N is small, we don't care

Ex 1. ⅙ N 3 + 20 N + 16 ~ O(N 3)
Ex 2. ⅙ N 3 + 100 N 4/3 + 56 ~ O(N 3)
Ex 3. ⅙N3 - ½N 2 + ⅓ N ~ O(N 3)

discard lower-order terms


(e.g., N = 1000: 166.67 million vs. 166.17 million)
Simplification 2: tilde notation

・Estimate running time (or memory) as a function of input size N.


・Ignore lower order terms.
– when N is large, terms are negligible
– when N is small, we don't care

operation frequency Big-O notation

variable declaration N+2 O(N)

assignment statement N+2 O(N)

less than compare ½ (N + 1) (N + 2) O(N 2)

equal to compare ½ N (N − 1) O(N 2)

array access N (N − 1) O(N)

increment ½ N (N − 1) to N (N − 1) O(N 2)
Example: 2-SUM

Q. Approximately how many array accesses as a function of input size N ?

int count = 0;
for (int i = 0; i < N; i++)
for (int j = i+1; j < N; j++)
"inner loop"
if (a[i] + a[j] == 0)
count++;

A. O(N 2) array accesses.

Bottom line. Use cost model and Big-O notation to simplify counts.
35
Example: 3-SUM

Q. Approximately how many array accesses as a function of input size N ?

int count = 0;
for (int i = 0; i < N; i++)
for (int j = i+1; j < N; j++)
for (int k = j+1; k < N; k++)
"inner loop"
if (a[i] + a[j] + a[k] == 0)
count++;

A. O(N 3) array accesses.

Bottom line. Use cost model and tilde notation to simplify counts.
36
Mathematical models for running time

In principle, accurate mathematical models are available.

In practice,
・Formulas can be complicated.
・Advanced mathematics might be required.
・Exact models best left for experts.
costs (depend on machine, compiler)

TN = c1 A + c2 B + c3 C + c4 D + c5 E
A= array access
B= integer add
frequencies
C= integer compare
(depend on algorithm, input)
D= increment
E= variable assignment

Bottom line. We use approximate models in this course: T(N) ~ O(N 3).
38
1.4 A NALYSIS OF A LGORITHMS
‣ introduction
‣ observations
‣ mathematical models
‣ order-of-growth classifications
‣ memory
h t t p : / / a l g s 4. c s . p r i n c e t o n . e d u
Common order-of-growth classifications

Good news. The set of functions


1, log N, N, N log N, N 2, N 3, and 2N
suffices to describe the order of growth of most common algorithms.

41
Common order-of-growth classifications

order of
name typical code framework description example T(2N) / T(N)
growth

add two
1 constant a = b + c; statement 1
numbers

while (N > 1)
log N logarithmic divide in half binary search ~1
{ N = N / 2; ... }

for (int i = 0; i < N; i++) find the


N linear loop 2
{ ... } maximum

divide
N log N linearithmic [see mergesort lecture] mergesort ~2
and conquer

for (int i = 0; i < N; i++)


check all
N 2 quadratic for (int j = 0; j < N; j++) double loop 4
pairs
{ ... }

for (int i = 0; i < N; i++)


for (int j = 0; j < N; j++) check all
N3 cubic triple loop 8
for (int k = 0; k < N; k++) triples
{ ... }

exhaustive check all


2N exponential [see combinatorial search lecture] T(N)
search subsets

42
Practical implications of order-of-growth

time to process millions of inputs


growth
rate
1970s 1980s 1990s 2000s

1 instant instant instant instant

log N instant instant instant instant

N minutes seconds second instant

tens of
N log N hour minutes seconds
seconds

N2 decades years months weeks

N3 never never never millennia

43
Binary search demo

Goal. Given a sorted array and a key, find index of the key in the array?

Binary search. Compare key against middle entry.


・Too small, go left.
・Too big, go right.
・Equal, found.

successful search for 33

6 13 14 25 33 43 51 53 64 72 84 93 95 96 97
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

lo hi

44
Binary search: Java implementation

Trivial to implement?
・First binary search published in 1946.
・First bug-free one in 1962.
・Bug in Java's Arrays.binarySearch() discovered in 2006.
public static int binarySearch(int[] a, int key)
{
int lo = 0, hi = a.length-1;
while (lo <= hi)
{
int mid = (lo + hi) / 2;
if (key < a[mid]) hi = mid - 1; one "3-way compare"
else if (key > a[mid]) lo = mid + 1;
else return mid;
}
return -1;
}

Invariant. If key appears in the array a[], then a[lo] <= key <= a[hi].

45
Binary search: mathematical analysis

Proposition. Binary search uses at most 1 + lg N key compares to search in


a sorted array of size N.

Def. T (N) = # key compares to binary search a sorted subarray of size ≤ N.

Binary search recurrence. T (N) ≤ T (N / 2) + 1 for N > 1, with T (1) = 1.

left or right half


(floored division)
Pf sketch. [assume N is a power of 2]

T (N) ≤ T (N / 2) + 1 [ given ]

≤ T (N / 4) + 1 + 1 [ apply recurrence to first term ]

≤ T (N / 8) + 1 + 1 + 1 [ apply recurrence to first term ]

≤ T (N / N) + 1 + 1 + … + 1 [ stop applying, T(1) = 1 ]

= 1 + lg N
46
1.4 A NALYSIS OF A LGORITHMS
‣ introduction
‣ observations
‣ mathematical models
‣ order-of-growth classifications
‣ theory of algorithms
h t t p : / / a l g s 4. c s . p r i n c e t o n . e d u ‣ memory
Basics

Bit. 0 or 1. NIST most computer scientists


Byte. 8 bits.
Megabyte (MB). 1 million or 220 bytes.
Gigabyte (GB). 1 billion or 230 bytes.

64-bit machine. We assume a 64-bit machine with 8-byte pointers.


・Can address more memory.
・Pointers use more space. some JVMs "compress" ordinary object
pointers to 4 bytes to avoid this cost

60
Typical memory usage for primitive types and arrays

type bytes type bytes

boolean 1 char[] 2 N + 24

byte 1 int[] 4 N + 24

char 2 double[] 8 N + 24

int 4 one-dimensional arrays

float 4

long 8
type bytes
double 8
char[][] ~2MN
primitive types

int[][] ~4MN

double[][] ~8MN

two-dimensional arrays

61
Typical memory usage for objects in Java

Object overhead. 16 bytes.


Reference. 8 bytes.
Padding. Each object uses a multiple of 8 bytes.

Ex 1. A Date object uses 32 bytes of memory.

16 bytes (object overhead)

4 bytes (int)
4 bytes (int)
4 bytes (int)
4 bytes (padding)

32 bytes

62
Typical memory usage for objects in Java
Typical memory usage for objects in Java
Turning the crank: summary

Empirical analysis.
・Execute program to perform experiments.
・Assume power law and formulate a hypothesis for running time.
・Model enables us to make predictions.
Mathematical analysis.
・Analyze algorithm to count frequency of operations.
・Use tilde notation to simplify analysis.
・Model enables us to explain behavior.
Scientific method.
・Mathematical model is independent of a particular system;
applies to machines not yet built.
・Empirical analysis is necessary to validate mathematical models
and to make predictions.

71

You might also like