Algorithms: Correctness and Complexity. Slide set 12. Problem Complexity
© Theodore Norvell

More Complexity
Approximate time to perform f(n) operations (roughly one operation
per nanosecond):

f(n)       n = 10        n = 50          n = 100        n = 1000
log2 n     3 ns          5 ns            6 ns           10 ns
n          10 ns         50 ns           100 ns         1 µs
n log2 n   33 ns         282 ns          664 ns         10 µs
n^2        100 ns        2.5 µs          10 µs          1 ms
n^3        1 µs          125 µs          1 ms           1 s
n^100      3 × 10^83 y   2.5 × 10^179 y  3 × 10^209 y   3 × 10^310 y
1.1^n      2.6 ns        117 ns          13 µs          8 × 10^50 y
2^n        1 µs          3.5 × 10^24 y   4 × 10^39 y    3 × 10^310 y
n!         3 ms          10 × 10^73 y    3 × 10^167 y   1.3 × 10^2577 y
2^(2^n)    6 × 10^317 y  big             Bigger         HUGE
Another way to look at it is to ask how large an instance
can be solved in a given amount of time:
f(n)       In 1 s          In 1 hr       In 1 day
log2 n     10^300,000,000  big           really big
n          10^9            3.6 × 10^12   8.64 × 10^13
n log2 n   4 × 10^7        10^11         2 × 10^12
n^2        3 × 10^4        2 × 10^6      9 × 10^6
n^3        1000            15,000        44,000
2^n        29              42            46
n!         12              15            16
2^(2^n)    4               5             5
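The first few rows of the first table can be reproduced with a
short script. A minimal sketch (assuming roughly one operation per
nanosecond; the helper names are mine, not the slides'):

import math

NS_PER_YEAR = 1e9 * 60 * 60 * 24 * 365.25   # nanoseconds in a year

growth_rates = {
    "log2 n":   lambda n: math.log2(n),
    "n":        lambda n: n,
    "n log2 n": lambda n: n * math.log2(n),
    "n^2":      lambda n: n ** 2,
    "n^3":      lambda n: n ** 3,
}

def show(ns):
    """Render a time given in nanoseconds in a convenient unit."""
    for unit, factor in [("ns", 1), ("µs", 1e3), ("ms", 1e6), ("s", 1e9)]:
        if ns < 1000 * factor:
            return f"{ns / factor:.3g} {unit}"
    return f"{ns / NS_PER_YEAR:.3g} y"

for name, f in growth_rates.items():
    print(f"{name:9s}", *[show(f(n)) for n in (10, 50, 100, 1000)])

Solving f(n) ≤ budget for n in the same way gives the entries
of the second table.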
We broadly classify functions according to how they
behave:
• Super-exponential functions grow faster than any
exponential function.
∗ Examples: n!, 2^(2^n), the Ackermann/Péter function.
• Exponential functions
∗ 2^n, 3^n, etc.
∗ 2^(Θ(1) n), where Θ(1) represents some positive
coefficient
∗ (The next time someone refers to “exponential
growth” ask yourself if they know what they are
talking about.)
• Polynomial functions: n, n log2 n, n^2, n^3, etc.
∗ While n log n is not really a polynomial, it is bounded
between two polynomials (n ≤ n log2 n ≤ n^2 for n ≥ 2) and
so is considered a polynomial; see the sketch after this
list.
∗ n^Θ(1), where Θ(1) represents some positive exponent
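As a quick check of the point about n log n (a sketch, not from
the slides):

import math

# n log2 n is sandwiched between the polynomials n and n^2 for
# n >= 2, which is why it is grouped with the polynomial functions.
for n in [2, 10, 1000, 10**6]:
    assert n <= n * math.log2(n) <= n ** 2
print("n <= n log2 n <= n^2 on these samples")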
Problem complexity
A problem consists of
• an input set (called the set of instances),
• a set of outputs,
• an acceptability relation between these sets, and
• a measure of input size.
Using precondition P and postcondition R:
• P identifies inputs out of a (possibly) larger set.
• R defines the acceptable outputs for each input.
An algorithm A solves problem B iff, for every input, A
terminates with an acceptable output.
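To make these definitions concrete, here is a minimal sketch in
Python (not from the slides; the names P, R, and solves are mine).
It casts sorting as a problem: the instances are lists of integers,
the size measure would be the length of the list, and the
acceptability relation holds when the output is a nondecreasing
permutation of the input.

from itertools import permutations

def P(xs):
    """Precondition: any finite list of integers is a legal instance."""
    return isinstance(xs, list)

def R(xs, ys):
    """Acceptability relation: ys is a nondecreasing permutation of xs."""
    return sorted(ys) == sorted(xs) and \
           all(ys[k] <= ys[k + 1] for k in range(len(ys) - 1))

def solves(algorithm, instances):
    """algorithm solves the problem on these instances iff every
    legal input yields an acceptable output (termination assumed)."""
    return all(R(xs, algorithm(xs)) for xs in instances if P(xs))

# Check a candidate algorithm on every permutation of 4 distinct values.
instances = [list(p) for p in permutations(range(4))]
print(solves(sorted, instances))   # True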
Exact problem complexity
The worst-case time complexity of a problem is the
worst-case time complexity of the fastest algorithm that
solves it.
For the rest of this slide deck, we are concerned only
with worst-case time complexity.
Note that problem complexity is model dependent.
Usually the RAM model is considered the standard.
When possible, the complexity of a problem should be
stated using Θ(f) notation.
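One way to write this definition symbolically (the notation is
mine, not the slides'): writing time_A(x) for the number of steps
algorithm A takes on instance x in the chosen model,

\mathit{time}_B(n) \;=\; \min_{A \text{ solves } B} \;\; \max_{x \,:\, |x| = n} \; \mathit{time}_A(x)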
Upper-bounds
The exact complexity of a problem is often hard to determine,
since we must consider every algorithm for the problem,
including ones that no one has yet thought of.
You may know a fast algorithm for a problem.
• But that doesn’t mean that Fred won’t come up with a
faster one tomorrow
If you know an algorithm has a worst-case time function
in O(g) then the problem complexity is in O(g).
For example,
• merge sort sorts n numbers in Θ(n log n) time
• therefore (since Θ(g) ⊆ O(g)) merge sort sorts in
O(n log n) time
• therefore the fastest sorting algorithm sorts in
O(n log n) time
• therefore (by definition) the problem complexity of
sorting is in O(n log n)
• in other words O(n log n) is an upper bound for sorting
n numbers.
Proof of the third step: Assume (falsely) that the fastest
sorting algorithm does not take O(n log n) time. Then
merge sort is faster than the fastest sorting algorithm.
Contradiction.
Example: picture a comparison-based sorting algorithm as a
decision tree, where each internal node is a comparison and
each leaf is a possible result. For a given n
• the best case time is the length of the shortest path
from root to leaf
• the worst case time is the length of the longest path
from root to leaf
To simplify, we’ll only count comparisons.
[Since we are investigating lower bounds, it is kosher to
ignore whole classes of operations. If a sorting algorithm
requires at least f(n) comparisons, then it must require
at least f(n) operations.]
Thus the worst (and best, and average) case time for
selection sort is (n^2 − n)/2 comparisons.
But selection sort is not the best algorithm. We want a
result that even the best algorithm cannot beat.
Merge sort, for n = 2^k, requires Θ(n log n) comparisons.
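To see these counts concretely, here is a minimal sketch (not
from the slides; the function names are mine) that counts the
comparisons made by selection sort and by merge sort:

import math, random

def selection_sort_comparisons(n):
    """Selection sort makes (n^2 - n)/2 comparisons regardless of input order."""
    xs = list(range(n))
    count = 0
    for i in range(n):
        m = i
        for j in range(i + 1, n):
            count += 1                  # one comparison per inner iteration
            if xs[j] < xs[m]:
                m = j
        xs[i], xs[m] = xs[m], xs[i]
    return count

def merge_sort_comparisons(xs):
    """Return (sorted copy of xs, number of element comparisons used)."""
    if len(xs) <= 1:
        return list(xs), 0
    mid = len(xs) // 2
    left, cl = merge_sort_comparisons(xs[:mid])
    right, cr = merge_sort_comparisons(xs[mid:])
    merged, count, i, j = [], cl + cr, 0, 0
    while i < len(left) and j < len(right):
        count += 1                      # one comparison per merge step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged += left[i:] + right[j:]
    return merged, count

n = 1024
print(selection_sort_comparisons(n), (n * n - n) // 2)           # both 523776
data = random.sample(range(n), n)
print(merge_sort_comparisons(data)[1], n * round(math.log2(n)))  # about 9000 vs 10240

For n = 1024 it reports 523776 selection sort comparisons,
exactly (n^2 − n)/2, and roughly 9000 merge sort comparisons on
a random input, within a constant of n log2 n = 10240.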
Can we do better?
The best algorithm has the shortest trees (as n
approaches ∞)
The key question is:
• How short can a tree for an input of size n be?
The inputs [2, 1, 3] and [4, 1, 6] look the same in this model,
as the only way to access the data is by comparing.
There are n! distinct inputs, as there are n! permutations
of n distinct values. Two different permutations cannot end at
the same leaf (the rearrangement made at that leaf cannot
correctly sort both), so a correct comparison tree has at least
n! leaves; and a binary tree with n! leaves has height at least
log2(n!). So at least log2(n!) comparisons are needed in the
worst case.
   log2(n!)
=   log2(1 × 2 × ... × n)
=   log2 2 + log2 3 + ... + log2 n
>   "each term log2 k exceeds the area under log2(x − 1)
     from x = k to x = k + 1; see the figure"
   ∫_2^(n+1) log2(x − 1) dx

[Figure: graph of log2(x − 1) for x from 1 to 8, illustrating that
log2 2 + log2 3 + ... + log2 n > ∫_2^(n+1) log2(x − 1) dx.]

   ∫_2^(n+1) log2(x − 1) dx
=   "change of variable, and log2 x = (ln x)/(ln 2)"
   (1/ln 2) ∫_1^n ln x dx
=   "∫ ln x dx = x ln x − x"
   (1/ln 2) ((n ln n − n) − (1 ln 1 − 1))
=   (1/ln 2) (n ln n − n + 1)
∈   "the dominant term is n ln n"
   Ω(n log n)
So any comparison-based sorting algorithm needs Ω(n log n)
comparisons, and hence Ω(n log n) time, in the worst case.
For comparison, log2(1024!) ≈ 8770, while merge sort on 1024
items makes on the order of 1024 × log2 1024 = 10240 comparisons.
So merge sort is quite close to optimal.
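A quick numeric check of these figures (a sketch, not from the
slides):

import math

n = 1024
lower = sum(math.log2(k) for k in range(2, n + 1))   # log2(n!) = log2 2 + ... + log2 n
print(math.ceil(lower))           # 8770 comparisons are forced in the worst case
print(n * round(math.log2(n)))    # 10240 = n log2 n, the order of merge sort's count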
(The reason quicksort is usually quicker than merge sort in
practice is that it makes fewer moves.)