Searching and Sorting Algorithms: CS117, Spring 2006 Supplementary Lecture Notes Written by Amy Csizmar Dalal
1 Introduction
How do you find someone’s phone number in the phone book? How do you find your keys when you’ve
misplaced them? If a deck of cards has fewer than 52 cards, how do you determine which card is missing?
Searching for items and sorting through items are tasks that we do every day. Searching and sorting are
also common tasks in computer programs. We search for all occurrences of a word in a file in order to replace
it with another word. We sort the items on a list into alphabetical or numerical order. Because searching
and sorting are common computer tasks, we have well-known algorithms, or recipes, for doing searching
and sorting. We'll look at two searching algorithms and four sorting algorithms here. As we go through
each algorithm in detail, with examples, we'll determine its performance in terms of how quickly and
efficiently it completes its task.
3 Search algorithms
There are two types of search algorithms: those that don’t make any assumptions about the order of the
list, and those that assume the list is already in order. We’ll look at the former first, derive the number of
comparisons required for this algorithm, and then look at an example of the latter.
In the discussion that follows, we use the term search term to indicate the item for which we are searching.
We assume the list to search is an array of integers, although these algorithms will work just as well on any
other primitive data type (doubles, characters, etc.). We refer to the array elements as items and the array
as a list.
• Best case: The best case occurs when we find the item on the first try; i.e., the search term is in the
first position in the list. The number of comparisons in this case is 1.
• Worst case: The worst case occurs when we either don’t find the item in the list or we find the item
in the last place we look, i.e., the search term is the last item in the list. The number of comparisons
in this case is equal to the size of the array. If our array has N items, then sequential search takes N
comparisons in the worst case.
• Average case: On average, the search term will be somewhere in the middle of the list. The number
of comparisons in this case is approximately N/2.
In both the worst case and the average case, the number of comparisons is proportional to the number of
items in the array, N. Thus, we say in these two cases that the number of comparisons is order N, or O(N)
for short. For the best case, we say the number of comparisons is order 1, or O(1) for short.
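The sequential search code itself does not appear in this excerpt, but the algorithm is simple enough to sketch; the method name and the convention of returning -1 for a failed search are our own choices:

public int sequentialSearch(int[] list, int searchTerm) {
    // compare the search term to each item in turn
    for (int i = 0; i < list.length; i++) {
        if (list[i] == searchTerm) {
            return i;   // found it: 1 comparison in the best case, N in the worst
        }
    }
    return -1;  // searched all N items without finding the search term
}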
// set the starting and ending indexes of the array; these will
// change as we narrow our search
int low = 0;
int high = list.length-1;
int mid;
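Only the variable setup is shown above. For reference, here is one complete version of binary search built around those variables; the loop body is a sketch of the standard algorithm, not necessarily the exact code from these notes:

public int binarySearch(int[] list, int searchTerm) {
    // set the starting and ending indexes of the array; these will
    // change as we narrow our search
    int low = 0;
    int high = list.length - 1;
    int mid;
    while (low <= high) {
        // integer division takes the item to the left of the actual midpoint
        mid = (low + high) / 2;
        if (list[mid] == searchTerm) {
            return mid;        // found the search term
        } else if (searchTerm < list[mid]) {
            high = mid - 1;    // eliminate the upper half of the list
        } else {
            low = mid + 1;     // eliminate the lower half of the list
        }
    }
    return -1;  // the search term is not in the list
}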
3.2.3 Example
Find the number 5 in the list {1, 4, 5, 6, 9, 12, 14, 16}.
1. Middle item in the list is 6.² 5 is less than 6, so we know that 5, if it's in the list, is in the first half of
the list. Eliminate everything in the list from 6 on up, and concentrate on the smaller list {1, 4, 5}.
2. Middle item in the list is 4. 5 is greater than 4, so we know that 5, if it’s in the list, is in the second
half of the list. Eliminate everything in the list from 4 on down, and concentrate on the smaller list
{5}.
3. Middle item in the list is 5. 5 = 5, so we found the item! 5 was at position 2 in the original list, so the
algorithm returns this index.
² Since the list has an even number of items, there is no single middle item, so we take one of
the items on either side of the actual midpoint as our midpoint. Typically, since we find the midpoint by integer division, we
take the term to the left of the actual midpoint. However, our algorithm would work just as well if we selected 9 as our middle
item rather than 6 in this example.
The example in the previous section, where we attempted to find the number 5 in a list of 8 items, is
another example of a worst case scenario, because 5 was one away from the midpoint of the list. In this case,
it took three comparisons to find 5 in the list.
If we look at a list that has 16 items, or 32 items, we find that in the worst case it takes 4 and 5
comparisons, respectively, to either find the search term or determine that the search term is not in the
list. In all of these examples, the number of comparisons is log₂ N.³ This is much less than the number of
comparisons required in the worst case for sequential search! The worst case for binary search is thus order
log N, or O(log N) for short; we drop the base 2 from this notation because logarithms of different bases
differ from each other only by a constant factor, and so they are all of the same order.
The average case occurs when the search term is anywhere else in the list. The number of comparisons
is roughly the same as for the worst case, so it also is O(log N).
In general, anytime an algorithm involves dividing a list in half, the number of operations is O(log N). For
example, a list of about a million items requires only about 20 comparisons in the worst case, since 2^20 is
just over 1,000,000.
3.3 Discussion
From the analysis above, binary search certainly seems faster and more efficient than sequential search.
But binary search has an unfair advantage: it assumes the list is already in sorted order! If we sort the items
in a list before running the sequential search algorithm, does the performance of sequential search improve?
The answer is no. If we know that the list is sorted, then we can modify sequential search so that it stops
searching the list once the items in the list are larger than the search term. But, on average, we will still
end up searching roughly half of the list (N/2 comparisons), and in the worst case we will still have to
search the entire list before declaring defeat (N comparisons). So, our modified sequential search is still
O(N) in both the average case and the worst case.
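As a sketch, the early-exit modification might look like this (again, the method name and return convention are ours; the list is assumed to be sorted in ascending order):

public int sortedSequentialSearch(int[] list, int searchTerm) {
    for (int i = 0; i < list.length; i++) {
        if (list[i] == searchTerm) {
            return i;   // found the search term
        }
        if (list[i] > searchTerm) {
            break;      // every remaining item is even larger, so stop early
        }
    }
    return -1;  // the search term is not in the list
}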
Figure 1 illustrates why an O(log N) algorithm is faster than an O(N) algorithm. Some other common
functions are also illustrated for comparison purposes. As you can see, when N is small there is not much
difference between an O(N) algorithm and an O(log N) algorithm. As N grows large, the differences between
these two functions become more pronounced.
Ideally, when talking about any algorithm, we want the number of operations in the algorithm to increase
as slowly as possible as N increases. The best-performing algorithm is O(1), which means that the algorithm
executes in constant time. There are very few algorithms for which this is true, so the next best algorithm
is O(log N).
³ The mathematical expression y = log₂ x means "y is the power to which 2 must be raised so that 2^y = x."
4 Sorting algorithms
There are many different strategies for putting the items in a list in order. Some of them are easy to
implement, but slow; others are trickier to implement, but much faster. We will look at two examples
of "easy but slow" algorithms (selection sort and bubble sort) and two examples of "complicated but fast"
algorithms (quicksort and merge sort).
public void swap(int[] list, int index1, int index2) {
    // store the first item in a temporary variable
    int temp = list[index1];
    // copy the second item into the first index slot
    list[index1] = list[index2];
    // copy the first item (saved in temp)
    // into the second index slot.
    list[index2] = temp;
}
Notice that we don’t return the array after we swap the two items. This is one feature of arrays, and
of objects in general: because we refer to an array by its address, if we change an item in the array within
a method, the change holds outside of the method. We can do this because we are not changing the array
itself; we are just changing one of the values stored in the array. So, in short, this is legal.
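The selection sort method itself is not shown in this excerpt; a minimal sketch, using the swap() helper above (the method and variable names are our own), might look like this:

public void selectionSort(int[] list) {
    for (int i = 0; i < list.length - 1; i++) {
        // find the smallest item in the unsorted part of the list
        int smallest = i;
        for (int j = i + 1; j < list.length; j++) {
            if (list[j] < list[smallest]) {
                smallest = j;
            }
        }
        // move it into position i; if it is already there, there is nothing to swap
        if (smallest != i) {
            swap(list, i, smallest);
        }
    }
}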
4.1.3 Example
Sort the list {15, 4, 23, 12, 56, 2} using selection sort.
1. Current item = 15
Smallest = 15
Compare 15 and 4 : 4 is smaller, so set smallest = 4
Compare 4 and 23 : 4 is smaller
Compare 4 and 12 : 4 is smaller
Compare 4 and 56 : 4 is smaller
Compare 4 and 2 : 2 is smaller, so set smallest = 2
End of list : swap 15 and 2
List is now {2, 4, 23, 12, 56, 15}, and 2 is in its proper position.
2. Current item = 4
Smallest = 4
Compare 4 and 23 : 4 is smaller
Compare 4 and 12 : 4 is smaller
Compare 4 and 56 : 4 is smaller
Compare 4 and 15 : 4 is smaller
End of list : nothing to swap. List is same as in previous step, and both 2 and 4 are in their proper
positions.
3. Current item = 23
Smallest = 23
Compare 23 and 12 : 12 is smaller, so set smallest = 12
Compare 12 and 56 : 12 is smaller
Compare 12 and 15 : 12 is smaller
End of list : swap 23 and 12
List is now {2, 4, 12, 23, 56, 15}.
4. Current item = 23
Smallest = 23
Compare 23 and 56 : 23 is smaller
Compare 23 and 15 : 15 is smaller, so set smallest = 15
End of list : swap 23 and 15
List is now {2, 4, 12, 15, 56, 23}.
5. Current item = 56
Smallest = 56
Compare 56 and 23 : 23 is smaller, so set smallest = 23
End of list : swap 56 and 23
List is now {2, 4, 12, 15, 23, 56}, and the list is sorted.
4.1.4 Performance of selection sort
The best case for selection sort occurs when the list is already sorted. In this case, the number of swaps
is zero. We still have to compare each item in the list to each other item in the list on each pass through
the algorithm. The first time through, we compare the first item in the list to all other items in the list, so
the number of comparisons is (N-1). The second time through, we compare the second item in the list to
the remaining items in the list, so the number of comparisons is (N-2). The total number of comparisons is
(N − 1) + (N − 2) + ... + 2 + 1. This sum simplifies to N(N − 1)/2, which is approximately N²/2. For
example, for the six-item list in the previous section, selection sort performs 5 + 4 + 3 + 2 + 1 = 15
comparisons. Thus, even in the best case, selection sort requires O(N²) comparisons.
The worst case for selection sort occurs when the first item in the list is the largest, and the rest of the
list is in order. In this case, we perform one swap on each pass through the algorithm, so the number of
swaps is N − 1. The number of comparisons is the same as in the best case, O(N²).
The average case requires the same number of comparisons, O(N²), and roughly N/2 swaps. Thus, the
number of swaps in the average case is O(N).
In summary, we say that selection sort is O(N) in swaps and O(N²) in comparisons.
Notice that in each pass, the largest item “bubbles” up the list until it settles in its final position. This
is where bubble sort gets its name.
public void bubbleSort(int[] list) {
    for (int i = 0; i < list.length - 1; i++) {
        boolean isSorted = true;  // assume sorted until we have to swap something
        for (int j = 0; j < list.length - 1 - i; j++) {
            if (list[j] > list[j + 1]) {
                swap(list, j, j + 1);  // adjacent pair is out of order
                isSorted = false;
            }
        }
        if (isSorted) {
            // we didn't find anything to swap, so we're done.
            break;
        }
    }
}
4.2.3 Example
Sort the list {15, 4, 23, 12, 56, 2} using bubble sort. On each pass, we compare each pair of adjacent items
and swap any pair that is out of order. The list after each pass is:
1. {4, 15, 12, 23, 2, 56} (three swaps; 56 settles into its final position)
2. {4, 12, 15, 2, 23, 56} (two swaps)
3. {4, 12, 2, 15, 23, 56} (one swap)
4. {4, 2, 12, 15, 23, 56} (one swap)
5. {2, 4, 12, 15, 23, 56} (one swap)
6. No swaps occur, so the list is sorted and the algorithm stops.
The next two sorts we will look at behave quite differently from selection sort and bubble sort. They are
more complex, but the advantage is that they are much, much faster.
4.3 Quicksort
Quicksort is in fact a very fast sorting algorithm. The basic idea behind quicksort is this: Specify one
element in the list as a “pivot” point. Then, go through all of the elements in the list, swapping items that
are on the “wrong” side of the pivot. In other words, swap items that are smaller than the pivot but on
the right-hand side of the pivot with items that are larger than the pivot but on the left-hand side of the
pivot. Once you’ve done all possible swaps, move the pivot to wherever it belongs in the list. Now we can
ignore the pivot, since it’s in position, and repeat the process for the two halves of the list (on each side of
the pivot). We repeat this until all of the items in the list have been sorted.
Quicksort is an example of a divide and conquer algorithm. Quicksort sorts a list by dividing the list
into smaller and smaller lists, and sorting the smaller lists in turn.
4.3.1 Example
It’s easiest to see how quicksort operates using an example. Let’s sort the list {15, 4, 23, 12, 56, 2} by
quicksort. The first thing we need to do is select a pivot. We can select any item of the list as our pivot, so
for convenience let’s select the first item, 15.
Now, let’s find the first item in the list that’s greater than our pivot. We’ll use the variable low to store
the index of this item. Starting with the first item beyond the pivot (4), we find that the first item that is
greater than 15 is 23. So we set low = 2, since 23 is at index 2 in the list.
Next, we start at the end of the list and work back toward the beginning of the list, looking for the first
item that is less than the pivot. We’ll use the variable high to store the index of this item. The last item in
the list, 2, is less than our pivot (15), so we set high = 5, since 2 is at index 5 in the list.
Now that we’ve found two items that are out of place in the list, we swap them. So, now our list looks
like this: {15, 4, 2, 12, 56, 23}. We also increment low and decrement high by 1 before continuing our search.
Now, low = 3 and high = 4, and these correspond to 12 and 56, respectively, in the list.
Starting at index low, we try to find another item that's greater than our pivot. This time, however, there
is no such item between low and high. We can determine this by watching the values of low and high, and
stopping when low becomes equal to high. At this point, we’re done swapping; now we just have to put the
pivot into the correct position. The proper position for the pivot is the position below where low and high
meet: position 3, which is currently occupied by 12. So, we swap 15 and 12.
At the end of the first pass, the list looks like this: {12, 4, 2, 15, 56, 23}. Clearly, the elements are not in
order, but at least they are all on the correct side of the pivot, and 15 is in fact in the correct place in the
list. We can now safely ignore 15 for the rest of the algorithm.
Now, let’s repeat the procedure on the two halves of the list: {12, 4, 2} and {56, 23}. Let’s start with
the second half, since there are fewer items there.
Again, we pick the first element, 56, as our pivot. We set our low and high markers to 1 and 1, respectively.
But wait a minute. Right off the bat, low and high are equal! In fact, this means that there’s nothing to
swap except for the pivot, and so we swap 56 and 23. This means our list is {23, 56}, which is sorted, so
we can safely ignore both of these for the rest of the algorithm.
At the end of the second pass, the list is {12, 4, 2, 15, 23, 56}, but we still need to sort the first half of
the list.
We pick 12 as our pivot point, set low to 1 and high to 2. We increment low and find that now low =
high, which means we cannot swap anything except for the pivot. We swap the pivot with the last element
in the list, which gives us {2, 4, 12}. 12 is now in the proper position. Finally, we repeat this for the list {2,
4}, find that the two are in order already, and end up with the list {2, 4, 12, 15, 23, 56}, which is sorted.
public void quicksort(int[] list, int low, int high) {
    if (low < high) {
        int mid = partition(list, low, high);
        quicksort(list, low, mid - 1);
        quicksort(list, mid + 1, high);
    }
}
Before we get to the partition method, let’s take a close look at the quicksort() method. quicksort()
actually calls itself, twice! A method that calls itself is called a recursive method. Recursive methods are
useful when we need to repeat a calculation or some other series of tasks some number of times with a
parameter that varies each time the calculation is done, such as to compute factorials or powers of a number.
Recursive methods have three defining features:
• A base case, or stop case, that indicates when the method should stop calling itself
• A test to determine if recursion should continue
• One or more calls to itself
In other words, each time we call the method, we change the parameters that we send to the method,
test to see if we should call the method again, and provide some way to ultimately return from the method.
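For example, a recursive method to compute factorials (our own illustration, not from these notes) exhibits all three features:

public int factorial(int n) {
    if (n <= 1) {
        return 1;                 // base case: stop the recursion
    }
    return n * factorial(n - 1);  // the method calls itself with a smaller parameter
}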
In the quicksort() method, the conditional statement serves as the test. The method is called twice, once
for the lower half of the list and once for the upper half of the list. The “stop case” occurs when low is
greater than or equal to high: in this case, nothing is done because the condition in the conditional statement
is false.
The quicksort() method calls another method that selects the pivot and moves the elements in the list
around as in the example. It returns the index of the pivot point in the list, which is the item in the list
that is now fixed in place.
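To sort an entire array named list, the initial call would presumably be:

quicksort(list, 0, list.length - 1);  // low and high start at the first and last indexes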
if (low < high) {
// found one! move it into position
list[high] = list[low];
swapCount++;
}
}
} while (low < high);
return low;
}
As you can see, partition() performs most of the actual work of the quicksort algorithm.
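Only the tail of partition() appears above. For reference, here is a complete sketch that behaves consistently with the worked example (first item as pivot, low and high markers scanning toward each other); it is our reconstruction, not necessarily the exact code from these notes:

public int partition(int[] list, int low, int high) {
    int pivot = list[low];  // use the first item of this sublist as the pivot
    int i = low + 1;        // scans right, looking for items greater than the pivot
    int j = high;           // scans left, looking for items less than the pivot
    while (i <= j) {
        while (i <= j && list[i] <= pivot) { i++; }  // skip items on the correct side
        while (i <= j && list[j] > pivot)  { j--; }
        if (i < j) {
            // found a pair on the wrong sides of the pivot: swap them
            int temp = list[i];
            list[i] = list[j];
            list[j] = temp;
        }
    }
    // j marks the last item that is <= pivot; move the pivot into that position
    list[low] = list[j];
    list[j] = pivot;
    return j;  // the pivot's final, fixed position in the list
}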
4.4.1 Example
Sort the list {15, 4, 23, 12, 56, 2} using merge sort.
1. Divide the list in half: {15, 4, 23} and {12, 56, 2}.
2. Divide the lists in half again: {15, 4}, {23} and {12, 56}, {2}.
3. Divide the lists in half again: {15}, {4}, {23}, {12}, {56}, {2}. All lists are size 1, so we can now start
merging back together.
4. Merge {15} and {4}
(a) Compare 15 and 4: 4 is smaller, so 4 goes in list first.
(b) 15 is only item remaining, so copy it into the list.
(c) New list: {4, 15}.
5. Merge {4, 15} and {23}
(a) Compare 4 and 23: 4 is smaller, so 4 goes in list first.
(b) Compare 15 and 23: 15 is smaller, so 15 goes in list second.
(c) 23 is only item remaining, so copy it into the list
(d) New list: {4, 15, 23}.
6. Merge {12} and {56}
(a) Compare 12 and 56: 12 is smaller, so 12 goes in list first.
(b) 56 is only item remaining, so copy it into the list.
(c) New list: {12, 56}.
7. Merge {12, 56} and {2}
(a) Compare 12 and 2: 2 is smaller, so 2 goes in list first.
(b) 12 and 56 are in the same list, so there are no more comparisons to make. Copy them into the
new list.
(c) New list: {2, 12, 56}.
8. Merge {4, 15, 23} and {2, 12, 56}
(a) Compare 4 and 2. 2 is smaller, so 2 goes in list first.
(b) Compare 4 and 12. 4 is smaller, so 4 goes in list second.
(c) Compare 15 and 12. 12 is smaller, so 12 goes in list third.
(d) Compare 15 and 56. 15 is smaller, so 15 goes in list fourth.
(e) Compare 23 and 56. 23 is smaller, so 23 goes in list fifth.
(f) 56 is the only item left, so copy it into the last position in the list.
(g) New (and final) list: {2, 4, 12, 15, 23, 56}.
public void mergeSort(int[] list, int low, int high) {
    if (low < high) {
        // find the midpoint and sort each half of the array separately
        int mid = (low + high)/2;
        mergeSort(list, low, mid);
        mergeSort(list, mid + 1, high);
        // merge the halves
        merge(list, low, high);
    }
}
This method calculates the midpoint of the array that’s passed in, uses the midpoint to divide the array
in half, and then calls mergeSort() for each of these halves. Again, the test is found in the conditional
statement, and the stop case occurs when low is greater than or equal to high, which occurs when the size
of the array is 1.
mergeSort() will be recursively called until we have arrays of size 1. At that point, the second method,
merge(), is called to reassemble the arrays:
public void merge(int[] list, int low, int high) {
    // temporary array stores the merged items within the method
    int[] temp = new int[list.length];

    // set the midpoint and the end points for each of the subarrays
    int mid = (low + high)/2;
    int index1 = 0;
    int index2 = low;
    int index3 = mid + 1;

    // compare the front items of the two subarrays and copy the smaller
    // one into the temporary array
    while ((index2 <= mid) && (index3 <= high)) {
        if (list[index2] <= list[index3]) {
            temp[index1] = list[index2];
            index2++;
        } else {
            temp[index1] = list[index3];
            index3++;
        }
        index1++;
    }

    // if there are any items left over in the first subarray, add them to
    // the new array
    while (index2 <= mid) {
        temp[index1] = list[index2];
        index1++;
        index2++;
    }

    // if there are any items left over in the second subarray, add them
    // to the new array
    while (index3 <= high) {
        temp[index1] = list[index3];
        index1++;
        index3++;
    }

    // copy the merged items from the temporary array back into the
    // original array
    for (int i = low; i <= high; i++) {
        list[i] = temp[i - low];
    }
}
Table 1: Summary of sorting algorithms
4.5 Summary
We've looked at four different sorting algorithms: two O(N²) sorts (selection sort and bubble sort), and
two O(N log N) sorts (quicksort and merge sort). Table 1 lists the average/worst case performance of each
algorithm, along with its key advantage and disadvantage relative to the other algorithms.