Principles of Algorithm Analysis: Biostatistics 615

This document discusses principles for analyzing algorithms. It introduces common relationships between the input size N and an algorithm's running time, such as O(N), O(log N), and O(N²). It compares the sequential search and binary search algorithms, showing that binary search has better worst-case performance of O(log N) compared to sequential search's O(N). Empirical testing of the algorithms on various input sizes is also presented.


Principles of Algorithm Analysis

Biostatistics 615, Lecture 3
Problem Set…
- Questions?
- FAQ:
  • How to compile C code?
  • A 10-minute introduction for the adventurous…
C Compilers
- Popular commercial compilers include:
  • Borland C++ Builder
  • Microsoft Visual C++
  • Metrowerks CodeWarrior
- Several free compilers are also available:
  • Borland C++ (older version, no graphics)
  • GCC
GCC
- GNU C Compiler
  • Available on most UNIX systems
  • Also on newer Macintosh computers
- For Windows, download from:
  • www.mingw.org
  • www.cygwin.com
Running GCC
- Command-line application
- Basic usage is:

  gcc -o program_name source.c

  • Use extension ".c" or ".cpp" for source code
- Reads in the source file(s) and creates an executable program
A simple program …

/* This is a comment */
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int lucky;               // Variable declaration

    srand(123456);           // Initialize the random number generator

    lucky = rand() % 49;     // Generate a random number from 0 to 48

    printf("Hello! My lucky number is %d\n", lucky);

    return 0;
}
Good editors for C programs…
- Commercial compilers provide very fancy editors
- A good free alternative is nedit
  • On Windows, available when you install cygwin
Today
- Strategies for comparing algorithms
- Common relationships between algorithm complexity and input data
- Compare two simple search algorithms


Objectives
- A framework for:
  • empirical testing
  • approximate analysis
- Highlight performance characteristics of algorithms
Specific Questions
- Compare two algorithms for one task
- Predict performance in a new environment
  • If we had a computer that was 10x faster and could handle 10x more data, how would our approach perform?
- Set values of algorithm parameters


Two Common Mistakes
- Ignoring the performance of an algorithm
  • Shunning faster algorithms to avoid complexity in the program
  • Settling for simple N² algorithms when N log N alternatives of modest complexity exist
- Placing too much weight on the performance of an algorithm
  • Improving an already very fast program is not worth it
  • Spending too much time tinkering with code is rarely a good use of time
Empirical analysis
- Given two algorithms … which is better?
- Run both
  • Say, algorithm A takes 3 seconds
  • Say, algorithm B takes 30 seconds
- Empirical studies may not always be practical
  • Some algorithms may take too long to run!
Choices of Input Data
- Actual data
  • Measures performance in actual use
- Random data
  • Generic approach; may not be representative
- Perverse data
  • Attempts worst-case analysis
Limitations of Empirical Analysis
- Quality of implementation
  • Is our favored implementation coded more carefully than the other?
- Extraneous factors
  • Compiler
  • Machine
  • Computer system
Limitations of Empirical Analysis
- Requires a working program
- Theoretical analysis is an alternative
  • Estimate potential gains
  • Predict effectiveness relative to new algorithms or computers (that may not yet exist)
Theoretical Analysis
- Predict the performance of an algorithm based on its theoretical properties
- "Independent" of the actual implementation
- Several constructs occur frequently in algorithm analysis
Limitations of Theoretical Analysis
- Efficiency can depend on the compiler
- Efficiency may fluctuate with the input data
- Some algorithms are not well understood
The idea…
- Given a code fragment:

  // Find parent of node i
  i = a[i];

- Consider how many times it is executed
- But not how long each execution takes
Two typical analyses
- Average case, for random input
- Worst case
- Are these representative of real-world problems?
  • Check against empirical predictions…
The Primary Parameter N
- Examples
  • Degree of a polynomial
  • Number of characters in a string
  • Size of a file to be sorted
  • Number of input data items
  • Some other abstract measure of problem size
- With multiple parameters, we can often hold all but one of them constant
Running time as a function of N

f(N)      Description    Running time when N doubles…
1         constant       -
log N     logarithmic    increases by a constant
N         linear         doubles
N log N   log-linear     more than doubles
N²        quadratic      increases fourfold
N³        cubic          increases eightfold
2^N       exponential    running time squares
Running time as a function of N
- Multiple terms may be involved
  • e.g. N + N log N
- Typically, we ignore
  • smaller terms
  • the constant coefficient
  • and focus on the inner loop
- In rare cases, smaller terms and the constant coefficient will be important
Time to Solve a Large Problem

Problem size N = 1,000,000

operations
per second   N         N log N    N²

10^6         seconds   minutes    months
10^9         instant   instant    hours
10^12        instant   instant    seconds
Time to Solve a Huge Problem

Problem size N = 1,000,000,000

operations
per second   N         N log N    N²

10^6         hours     days       never
10^9         seconds   minutes    centuries
10^12        instant   instant    months
Big-Oh Notation
- "The algorithm is O(N)" or "O(N log N)"
  • A common statement
  • What does it mean?
- Summarizes performance for large N
- Focuses on the leading terms of the expression describing running time
Big-Oh Notation
- Consider a function g(N)
- It is said to be O(f(N))
- if there exist constants c0 and N0 such that:
  • N > N0 implies c0·f(N) > g(N)


From N to Running Time…
- Common relationships
  • N²
  • log N
  • N log N
  • N
- The next slides describe examples of how these arise
- The cost of running the program is C(N)
O(N²)
- Loop through the input, eliminating one item at a time

  C(N) = C(N-1) + N   for N ≥ 2, with C(1) = 1
       = C(N-2) + (N-1) + N
       ...
       = 1 + 2 + ... + (N-1) + N
       = N(N+1)/2
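The recurrence above can be checked with a short counter. This is an illustrative sketch, not the lecture's code; the function name quadratic_cost and the loop structure are assumptions.

```c
#include <assert.h>

/* Count the work done by an algorithm that makes one pass over the
   remaining items and eliminates one item per pass, mirroring the
   recurrence C(N) = C(N-1) + N with C(1) = 1. */
long quadratic_cost(int n)
{
    long count = 0;
    for (int remaining = 1; remaining <= n; remaining++)
        count += remaining;   /* one pass over 'remaining' items */
    return count;
}
```

For N = 100 this returns 5050, matching N(N+1)/2.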
O(log N)
- Recursive program that halves the input in one step

  C(2^n) = C(2^(n-1)) + 1   for N ≥ 2, with C(1) = 1
         = C(2^(n-2)) + 1 + 1
         = C(2^(n-3)) + 3
         ...
         = C(2^0) + n
         = n + 1,   where N = 2^n
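A tiny counter confirms the n + 1 result. The function name halving_cost is a hypothetical choice for this sketch, and it assumes N is a power of two.

```c
#include <assert.h>

/* Count the steps taken by a procedure that halves its input until a
   single item remains, mirroring C(2^n) = C(2^(n-1)) + 1 with C(1) = 1.
   Assumes n is a power of two. */
int halving_cost(long n)
{
    int steps = 1;            /* C(1) = 1 */
    while (n > 1) {
        n /= 2;               /* one step halves the input */
        steps++;
    }
    return steps;
}
```

With N = 1024 = 2^10, this returns 11 = n + 1.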
O(N log N)
- Recursive program that processes each item, splits the input into two halves, and examines each one…

  C(N) = 2·C(N/2) + N   for N ≥ 2, with C(1) = 0

  Writing N = 2^n and dividing by 2^n:

  C(2^n) = 2·C(2^(n-1)) + 2^n

  C(2^n)/2^n = C(2^(n-1))/2^(n-1) + 1
             = C(2^(n-2))/2^(n-2) + 1 + 1
             ...
             = n

  so C(N) = N·log2 N
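Evaluating the recurrence directly is another way to see the N·log2 N result. This sketch assumes N is a power of two; the function name nlogn_cost is an invented label.

```c
#include <assert.h>

/* Evaluate C(N) = 2*C(N/2) + N with C(1) = 0 directly, for N a power
   of two. The derivation above predicts C(N) = N*log2(N). */
long nlogn_cost(long n)
{
    if (n == 1) return 0;                  /* C(1) = 0 */
    return 2 * nlogn_cost(n / 2) + n;      /* split, then process each item */
}
```

For N = 1024 this gives 10·1024, as predicted.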
O(2N), i.e. linear
- Halves the input, but must examine each item…

  C(N) = C(N/2) + N   for N ≥ 2, with C(1) = 1
       = N + N/2 + N/4 + N/8 + ...
       ≈ 2N
Application
- Analysis of two search algorithms
- Consider a set of items
  • Evaluate functions that decide whether a particular item is present…
Sequential Search

int search(int a[], int value, int start, int stop)
{
    // Variable declarations
    int i;

    // Search through each item
    for (i = start; i <= stop; i++)
        if (value == a[i])
            return i;

    // Search failed
    return -1;
}
Sequential Search Properties
- Algorithm:
  • Look through the array sequentially until we find a match
- Average cost
  • If a match is found: N/2
  • If no match is found: N
- Actual cost depends on the fraction of successful searches
Better Sequential Search
- If the items are sorted…
- Stop an unsuccessful search early, when we reach an item with a higher value
  • Cost for unsuccessful searches is now N/2
- Overall, the algorithm is still O(N)

Binary Search

int search(int a[], int value, int start, int stop)
{
    while (stop >= start)
    {
        // Find midpoint
        int mid = (start + stop) / 2;

        // Compare midpoint to value
        if (value == a[mid]) return mid;

        // Reduce input in half!!!
        if (value > a[mid])
            { start = mid + 1; }
        else
            { stop = mid - 1; }
    }

    // Search failed
    return -1;
}
Binary Search Properties
- Algorithm:
  • Halve the number of items to consider with each comparison
- Worst-case cost
  • Maximum cost is never greater than log2 N
- Much better than sequential search, but even better methods exist!
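The log2 N bound can be checked empirically by counting midpoint comparisons. The counter argument is an illustrative addition, not part of the lecture's search() interface, and the overflow-safe midpoint is a common refinement rather than the slide's exact code.

```c
/* Instrumented binary search that also reports how many midpoint
   comparisons it made, to check the log2(N) worst-case bound. */
int search_counted(const int a[], int value, int start, int stop,
                   int *comparisons)
{
    *comparisons = 0;
    while (stop >= start)
    {
        int mid = start + (stop - start) / 2;  // Overflow-safe midpoint
        (*comparisons)++;
        if (value == a[mid]) return mid;
        if (value > a[mid]) start = mid + 1;
        else                stop = mid - 1;
    }
    return -1;                                 // Search failed
}
```

For a sorted table of 1024 items, no search (successful or not) should need more than 11 midpoint comparisons.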
Sequential vs. Binary Search

         M = 1,000     M = 10,000     M = 100,000
N        S      B      S       B      S        B
125      1      1      13      2      130      20
250      3      0      25      2      251      22
500      5      0      49      3      492      23
1250     13     0      128     3      1276     25
2500     26     1      267     3      *        28

Timings in seconds, for M searches in a table of N elements
Summary
- Outlined principles for the analysis of algorithms
- Introduced some common relationships between N and running time
- Described two simple search algorithms


Further Reading
z Read chapter 2 of Sedgewick
Tip of the Day:
Defensive Programming
- Document code
  • Indicate its intended purpose
  • Specify required inputs
  • Always indicate the author
- Check for error conditions