Big O
Big O time is the language and metric we use to describe the efficiency of algorithms. Not understanding it
thoroughly can really hurt you in developing an algorithm. Not only might you be judged harshly for not really
understanding big O, but you will also struggle to judge when your algorithm is getting faster or slower.
An Analogy:
Imagine the following scenario: You’ve got a file on a hard drive and you need to send it to your friend who
lives across the country. You need to get the file to your friend as fast as possible. How should you send it?
Most people's first thought would be email, FTP, or some other means of electronic transfer. That thought is
reasonable, but only half correct.
If it's a small file, you're certainly right. It would take 5 - 10 hours to get to an airport, hop on a flight, and
then deliver it to your friend.
But what if the file were really, really large? Is it possible that it's faster to physically deliver it via plane?
Yes, actually it is. A one-terabyte (1 TB) file could take more than a day to transfer electronically. It would be
much faster to just fly it across the country. If your file is that urgent (and cost isn't an issue), you might just
want to do that.
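To put rough numbers on it (the transfer rate here is just an assumption for illustration): 1 TB is about 8 × 10^12 bits, so even at a sustained 100 Mbps that is 8 × 10^12 / 10^8 ≈ 80,000 seconds, or roughly 22 hours of transfer time. A slower or less reliable connection easily pushes that past a day, while the flight takes the same 5 - 10 hours no matter how big the file is.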
What if there were no flights, and instead you had to drive across the country? Even then, for a really huge
file, it would be faster to drive.
Time Complexity
This is what the concept of asymptotic runtime, or big O time, means. We could describe the data transfer
"algorithm" runtime as:
Electronic Transfer: O(s), where s is the size of the file. This means that the time to transfer the file increases linearly with the size of the file. (Yes, this is a bit of a simplification, but that's okay for these purposes.)
Airplane Transfer: O(1) with respect to the size of the file. As the size of the file increases, it won't take any longer to get the file to your friend. The time is constant.
No matter how big the constant is and how slow the linear increase is, linear will at some point surpass
constant.
There are many more runtimes than this. Some of the most common ones are O(log N), O(N log N), O(N), O(N²), and O(2^N). There's no fixed list of possible runtimes, though.
You can also have multiple variables in your runtime. For example, the time to paint a fence that's w meters wide and h meters high could be described as O(wh). If you needed p layers of paint, then you could say that the time is O(whp).
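As a sketch (paintCell and the meter-by-meter grid are hypothetical; the point is just that the work is proportional to w * h * p):

/* Hypothetical illustration: paint a w x h fence with p coats, one square meter at a time. */
void paintFence(int w, int h, int p) {
    for (int coat = 0; coat < p; coat++) {
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                paintCell(x, y, coat);  // assume constant-time work per cell
            }
        }
    }
}

void paintCell(int x, int y, int coat) {
    /* stand-in for the constant-time work of painting one cell */
}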
Academics use big O, big Θ (theta), and big Ω (omega) to describe runtimes.
O (big O): In academia, big O describes an upper bound on the time. An algorithm that prints all the values in an array could be described as O(N), but it could also be described as O(N²), O(N³), or O(2^N) (or many other big O times). The algorithm is at least as fast as each of these; therefore they are upper bounds on the runtime. This is similar to a less-than-or-equal-to relationship. If Bob is X years old (I'll assume no one lives past age 130), then you could say X ≤ 130. It would also be correct to say that X ≤ 1,000 or X ≤ 1,000,000. These statements are technically true (although not terribly useful). Likewise, a simple algorithm to print the values in an array is O(N) as well as O(N³) or any runtime bigger than O(N).
Ω (big omega): In academia, Ω is the equivalent concept but for lower bound. Printing the values in an array is Ω(N) as well as Ω(log N) and Ω(1). After all, you know that it won't be faster than those runtimes.
Θ (big theta): In academia, Θ means both O and Ω. That is, an algorithm is Θ(N) if it is both O(N) and Ω(N). Θ gives a tight bound on runtime.
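For reference, the standard formal definitions behind these three bounds (stated here in the usual textbook form; the chapter itself keeps things informal) are:

f(N) = O(g(N)) means there exist constants c > 0 and N₀ such that f(N) ≤ c·g(N) for all N ≥ N₀.
f(N) = Ω(g(N)) means there exist constants c > 0 and N₀ such that f(N) ≥ c·g(N) for all N ≥ N₀.
f(N) = Θ(g(N)) means f(N) is both O(g(N)) and Ω(g(N)).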
In industry (and therefore in interviews), people seem to have merged Θ and O together. Industry's meaning of big O is closer to what academics mean by Θ, in that it would be seen as incorrect to describe printing an array as O(N²). Industry would just say this is O(N).
For this book, we will use big O in the way that industry tends to use it: By always trying to offer the tightest
description of the runtime.
Let's look at this from the perspective of quick sort. Quick sort picks a random element as a "pivot" and then swaps values in the array such that the elements less than the pivot appear before the elements greater than the pivot. This gives a "partial sort." Then it recursively sorts the left and right sides using a similar process.
Best Case: If all elements are equal, then quick sort will, on average, just traverse through the array
once. This is O(N). (This actually depends slightly on the implementation of quick sort. There are
implementations, though, that will run very quickly on a sorted array.)
Worst Case: What if we get really unlucky and the pivot is repeatedly the biggest element in the
array? (Actually, this can easily happen. If the pivot is chosen to be the first element in the subarray
and the array is sorted in reverse order, we'll have this situation.) In this case, our recursion doesn't
divide the array in half and recurse on each half. It just shrinks the subarray by one element. This will
degenerate to an O(N²) runtime.
Expected Case: Usually, though, these wonderful or terrible situations won't happen. Sure,
sometimes the pivot will be very low or very high, but it won't happen over and over again. We can
expect a runtime of O(N log N).
We rarely ever discuss best case time complexity, because it's not a very useful concept. After all, we could
take essentially any algorithm, special case some input, and then get an O(1) time in the best case.
For many – probably most – algorithms, the worst case and the expected case are the same. Sometimes
they're different, though, and we need to describe both of the runtimes.
It's easy for candidates to muddle these concepts (probably because both have some concept of "higher," "lower," and "exactly right"), but there is no particular relationship between the concepts.
Best, worst, and expected cases describe the big O (or big theta) time for particular inputs or scenarios.
Big O, big omega, and big theta describe the upper, lower, and tight bounds for the runtime.
Space Complexity
Time is not the only thing that matters in an algorithm. We might also care about the amount of memory – or
space – required by an algorithm.
Space complexity is a parallel concept to time complexity. If we need to create an array of size n, this will
require O(n) space. If we need a two-dimensional array of size n x n, this will require O(n²) space.
Stack space in recursive calls counts, too. For example, code like this would take O(n) time and O(n) space.
int sum(int n) { /* Ex 1. */
    if (n <= 0) {
        return 0;
    }
    return n + sum(n - 1);
}
Each of these calls is added to the call stack and takes up actual memory.
However, just because you have n calls total doesn’t mean it takes O(n) space. Consider the below function,
which adds adjacent elements between 0 and n:
int pairSumSequence(int n) { /* Ex 2. */
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += pairSum(i, i + 1);
    }
    return sum;
}

int pairSum(int a, int b) { /* the helper referred to above: it just adds its two arguments */
    return a + b;
}
There will be roughly O(n) calls to pairSum. However, those calls do not exist simultaneously on the call
stack, so you only need O(1) space.
Drop the Constants
Big O just describes the rate at which the runtime grows, not the exact operation count. For this reason, we drop the constants in runtime. An algorithm that one might have described as O(2N) is actually O(N).
Many people resist doing this. They will see code that has two (non-nested) for loops and call this O(2N). They think they're being more "precise." They're not.
Consider the below code:
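For example (an assumed illustration; the point holds for any pair of equivalent one-loop vs. two-loop solutions), compare these two ways of finding the minimum and maximum of an array:

/* Version 1: one for loop, two pieces of work per iteration. */
int[] minMaxOnePass(int[] array) {
    int min = Integer.MAX_VALUE;
    int max = Integer.MIN_VALUE;
    for (int x : array) {
        if (x < min) min = x;
        if (x > max) max = x;
    }
    return new int[] { min, max };
}

/* Version 2: two for loops, one piece of work per iteration each. */
int[] minMaxTwoPasses(int[] array) {
    int min = Integer.MAX_VALUE;
    for (int x : array) {
        if (x < min) min = x;
    }
    int max = Integer.MIN_VALUE;
    for (int x : array) {
        if (x > max) max = x;
    }
    return new int[] { min, max };
}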
Which one is faster? The first one does one for loop and the other one does two for loops. But then, the first
solution has two lines of code per for loop rather than one.
If you're going to count the number of instructions, then you'd have to go to the assembly level and take into
account that multiplication requires more instructions than addition, how the compiler would optimize
something, and all sorts of other details.
This would be horrendously complicated, so don't even start going down this road. Big O allows us to express how the runtime scales. We just need to accept that it doesn't mean that O(N) is always better than O(N²).
Drop the Non-Dominant Terms
We already said that we drop constants. Therefore, O(N² + N²) would be O(N²). If we don't care about that latter N² term, why would we care about a plain N? We don't.
O(N² + N) becomes O(N²).
O(N + log N) becomes O(N).
O(5*2^N + 1000N^100) becomes O(2^N).
We might still have a sum in a runtime. For example, the expression O(B² + A) cannot be reduced (without some special knowledge of A and B).
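For example, consider these two sketches (hypothetical snippets over arrays arrA and arrB, of sizes A and B respectively):

/* First example: do all the A work, then do all the B work. */
for (int a : arrA) {
    System.out.println(a);
}
for (int b : arrB) {
    System.out.println(b);
}

/* Second example: do the B work for each element of A. */
for (int a : arrA) {
    for (int b : arrB) {
        System.out.println(a + "," + b);
    }
}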
In the first example, we do A chunks of work and then B chunks of work. Therefore, the total amount of work is O(A + B).
In the second example, we do B chunks of work for each element in A. Therefore, the total amount of work is O(A*B).
In other words:
If your algorithm is in the form “do this, then, when you’re all done, do that” then you add the
runtimes.
If your algorithm is in the form “do this for each time you do that” then you multiply the runtimes.
Amortized Time
An ArrayList, or a dynamically resizing array, allows you to have the benefits of an array while offering
flexibility in size. You won’t run out of space in the ArrayList since its capacity will grow as you insert
elements.
An ArrayList is implemented with an array. When the array hits capacity, the ArrayList class will create a
new array with double the capacity and copy all the elements over to the new array.
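A simplified sketch of that doubling behavior (an illustration only, not the real java.util.ArrayList implementation) might look like this:

/* Simplified dynamically resizing array of ints. */
class SimpleArrayList {
    private int[] data = new int[1];  // start with capacity 1
    private int size = 0;

    public void add(int value) {
        if (size == data.length) {                    // array hit capacity
            int[] bigger = new int[data.length * 2];  // double the capacity
            for (int i = 0; i < size; i++) {          // copy all N elements over: O(N)
                bigger[i] = data[i];
            }
            data = bigger;
        }
        data[size] = value;                           // the common case: O(1)
        size++;
    }
}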
The array could be full. If the array contains N elements, then inserting a new element will take O(N) time.
You will have to create a new array of size 2N and then copy N elements over. This insertion will take O(N)
time.
However, we also know that this doesn’t happen very often. The vast majority of the time insertion will be in
O(1) time.
We need a concept that takes both into account. This is what amortized time does. It allows us to describe
that, yes, this worst case happens every once in a while. But once it happens, it won't happen again for so long that the cost is "amortized."
As we insert elements, we double the capacity when the size of the array is a power of 2. So after X elements,
we double the capacity at array sizes 1, 2, 4, 8, 16, ... , X. That doubling takes, respectively, 1, 2, 4, 8, 16, 32,
64, ... , X copies.
What is the sum of 1 + 2 + 4 + 8 + 16 + ... + X? If you read this sum left to right, it starts with 1 and doubles until it gets to X. If you read it right to left, it starts with X and halves until it gets to 1.
What then is the sum of X + X/2 + X/4 + X/8 + ... + 1? This is roughly 2X.
Therefore, X insertions take O(2X) time. The amortized time for each insertion is O(1).
Log N Runtimes
We commonly see O(log N) in runtimes. Where does this come from?
Let's look at binary search as an example. In binary search, we are looking for an element x in an N-element sorted array. We first compare x to the midpoint of the array. If x == middle, then we return. If x < middle, then we search on the left side of the array. If x > middle, then we search on the right side of the array.
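A sketch of this in code (one standard iterative formulation; the exact index bookkeeping can vary):

/* Returns the index of x in the sorted array, or -1 if x is not present. */
int binarySearch(int[] sorted, int x) {
    int low = 0;
    int high = sorted.length - 1;
    while (low <= high) {
        int middle = low + (high - low) / 2;  // midpoint of the current search range
        if (sorted[middle] == x) {
            return middle;                    // found it
        } else if (x < sorted[middle]) {
            high = middle - 1;                // keep searching the left side
        } else {
            low = middle + 1;                 // keep searching the right side
        }
    }
    return -1;
}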
We start off with an N-element array to search. Then, after a single step, we're down to N/2 elements. One more step, and we're down to N/4 elements. We stop when we either find the value or we're down to just one element.
The total runtime is then a matter of how many steps (dividing N by 2 each time) we can take until N
becomes 1.
N = 16
N = 8 /* divide by 2 */
N = 4 /* divide by 2 */
N = 2 /* divide by 2 */
N = 1 /* divide by 2 */
We could look at this in reverse (going from 1 to 16 instead of 16 to 1). How many times can we multiply 1 by
2 until we get N?
N = 1
N = 2 /* multiply by 2 */
N = 4 /* multiply by 2 */
N = 8 /* multiply by 2 */
N = 16 /* multiply by 2 */
2^4 = 16 → log₂16 = 4
log₂N = k → 2^k = N
This is a good takeaway for you to have. When you see a problem where the number of elements in the problem space gets halved each time, that will likely be an O(log N) runtime.
This is the same reason why finding an element in a balanced binary search tree is O(log N). With each
comparison, we go either left or right. Half the nodes are on each side, so we cut the problem space in half
each time.
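As a sketch (assuming a minimal Node class with value, left, and right fields):

class Node {
    int value;
    Node left, right;
}

/* Returns true if x appears in the BST rooted at node. */
boolean contains(Node node, int x) {
    if (node == null) return false;
    if (x == node.value) return true;
    if (x < node.value) {
        return contains(node.left, x);   // go left; the right half of the nodes is discarded
    }
    return contains(node.right, x);      // go right; the left half of the nodes is discarded
}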
Recursive Runtimes
Here’s a tricky one. What’s the runtime of this code?
int f(int n) {
    if (n <= 1) {
        return 1;
    }
    return f(n - 1) + f(n - 1);
}
A lot of people will, for some reason, see the two calls to f and jump to O(N²). This is completely incorrect.
Rather than making assumptions, let’s derive the runtime by walking through the code. Suppose we call f(4).
This calls f(3) twice. Each of those calls to f(3) calls f(2), until we get down to f(1).
f(4)
├── f(3)
│   ├── f(2)
│   │   ├── f(1)
│   │   └── f(1)
│   └── f(2)
│       ├── f(1)
│       └── f(1)
└── f(3)
    ├── f(2)
    │   ├── f(1)
    │   └── f(1)
    └── f(2)
        ├── f(1)
        └── f(1)
The tree will have depth N. Each node (i.e., function call) has two children. Therefore, each level will have twice as many calls as the one above it: the root's level has 2⁰ = 1 node, the next level has 2¹ = 2, then 2² = 4, and so on. Summing across all N levels gives 2⁰ + 2¹ + ... + 2^(N-1) = 2^N - 1 total calls.
Try to remember this pattern. When you have a recursive function that makes multiple calls, the runtime will often (but not always) look like O(branches^depth), where branches is the number of times each recursive call branches. In this case, this gives us O(2^N).
As you may recall, the base of a log doesn't matter for big O since logs of different bases are only different by
a constant factor. However, this does not apply to exponents. The base of an exponent does matter.