46B Big O Search Sort
*I’d like to acknowledge Dr. Philip Heller and Dr. Chakarov from the Computer Science Department for sharing their resources for the
course. The majority of the material we use this semester will be based on their work.
Outline
• Big O
• Selection sort
• Insertion sort
– Big O
• Merge sort
• Binary Search
– Big O
Big O
Algorithms need to be correct and efficient
$12n^3 + 1000n^2 + 65$ ➔ O($n^3$)
More on complexity
• Important in algorithms that process all data in a data
structure (array, ArrayList, HashSet, TreeSet, …)
• Look for loops that process each member of the data
structure
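To make the "look for loops" advice concrete, here is a minimal Java sketch that counts how many element visits a single loop and a nested loop make over the same array. The class and method names (LoopCounts, singleLoopVisits, nestedLoopVisits) are made up for this illustration and are not from the course material.

```java
class LoopCounts {
    // One pass over the data: n visits, so the loop is O(n).
    static long singleLoopVisits(int[] data) {
        long visits = 0;
        for (int i = 0; i < data.length; i++) {
            visits++;
        }
        return visits;
    }

    // A loop nested inside another loop: n * n visits, so O(n^2).
    static long nestedLoopVisits(int[] data) {
        long visits = 0;
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < data.length; j++) {
                visits++;
            }
        }
        return visits;
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        System.out.println(singleLoopVisits(data)); // 1000
        System.out.println(nestedLoopVisits(data)); // 1000000
    }
}
```

Growing the array by a factor of 10 grows the first count by 10 and the second by 100, which is the difference Big O is meant to capture.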
Sorting Algorithms
• The easy / obvious algorithms are slow
– Selection Sort: many visits, few moves
– Insertion Sort: many moves, few visits
• The smart algorithms are fast and recursive
– Merge Sort
– Quick Sort (you’ll see this in 146)
Selection Sort
Selection sort
• Selection sort is a sorting algorithm that treats the input as two parts, a sorted part and an unsorted part, and repeatedly
selects the proper next value to move from the unsorted part to the end of the sorted part.
• The index variable i denotes the dividing point. Elements to the left of i are sorted, and elements including and to the
right of i are unsorted. All elements in the unsorted part are searched to find the index of the element with the smallest
value. The variable indexSmallest stores the index of the smallest element in the unsorted part. Once the element with
the smallest value is found, that element is swapped with the element at location i. Then, the index i is advanced one
place to the right, and the process repeats.
• The term "selection" comes from the fact that for each iteration of the outer loop, a value is selected for position i.
• Selection sort has the advantage of being easy to code, involving one loop nested within another loop.
• Selection sort may require a large number of comparisons. The selection sort algorithm runtime is O(N^2).
Selection Sort
• Sorts array members
• We’ll look at sorting ints
• Could be any numeric primitive type, or any Comparable
class type
• The big idea: swap array members until the array is
sorted
– Find smallest member, put it in a[0]
– Find next smallest member, put it in a[1]
– etc.
A simple example
7 10 4 1
1 10 4 7
1 4 10 7
1 4 7 10
Pseudocode
selectionSort(arr)
public class SelectionSorter {
    private int[] a;

    public SelectionSorter(int[] a) {
        this.a = a;
    }

    // Sorts a
    public void sortInPlace() { . . . }
}
The Actual Sorting Code
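public void sortInPlace() {
    for (int startOfUnsorted = 0;
         startOfUnsorted < a.length;
         startOfUnsorted++) {
        // Find index of smallest member of unsorted region.
        // Start from the first unsorted member so the index is always valid.
        int smallestInUnsorted = a[startOfUnsorted];
        int indexOfSmallest = startOfUnsorted;
        for (int i = startOfUnsorted; i < a.length; i++) {
            if (a[i] < smallestInUnsorted) {
                smallestInUnsorted = a[i];
                indexOfSmallest = i;
            }
        }
        // Swap the smallest member into the start of the unsorted region.
        int temp = a[startOfUnsorted];
        a[startOfUnsorted] = a[indexOfSmallest];
        a[indexOfSmallest] = temp;
    }
}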
Trace for a = 7 10 4 1 (figure: each member of the unsorted region is asked "Are you smallest?", then the smallest is swapped into position startOfUnsorted)
startOfUnsorted = 0: 7 10 4 1, find smallest: 4 visits, SWAP: +4 visits ➔ 1 10 4 7
startOfUnsorted = 1: 1 10 4 7, find smallest: +3 visits, SWAP: +4 visits ➔ 1 4 10 7
startOfUnsorted = 2: 1 4 10 7, find smallest: +2 visits, SWAP: +4 visits ➔ 1 4 7 10
startOfUnsorted = 3: 1 4 7 10, find smallest: +1 visit, SWAP: +4 visits ➔ 1 4 7 10
# of visits = 4 + 4 + 3 + 4 + 2 + 4 + 1 + 4
Find smallest: 4 + 3 + 2 + 1 visits; swap: 4 visits on each of the 4 passes
# of visits = $\sum_{i=1}^{4} i + 4 \cdot 4$
1-minute algebra review
$\sum_{i=1}^{x} i = \frac{x(x+1)}{2}$
Check: 1 + 2 + 3 + 4 = 10 = 4*5/2
For array of size n, # of visits =
$\sum_{i=1}^{n} i + 4n$
$= \frac{1}{2} n(n+1) + 4n$
$= \frac{n^2}{2} + \frac{n}{2} + 4n = \frac{n^2}{2} + 4.5n$
➔ O($n^2$), because the $n^2$ term dominates for large n
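The visit count can be checked empirically. The sketch below is an instrumented selection sort, assuming the counting convention used in the trace above (1 visit per member examined while finding the smallest, 4 visits per swap). The class name SelectionSortVisits and the method sortAndCountVisits are hypothetical names chosen for this example.

```java
import java.util.Arrays;

class SelectionSortVisits {
    // Sorts a in place and returns the number of array visits, counted as
    // 1 per member examined and 4 per swap (assumed convention).
    static long sortAndCountVisits(int[] a) {
        long visits = 0;
        for (int startOfUnsorted = 0; startOfUnsorted < a.length; startOfUnsorted++) {
            int indexOfSmallest = startOfUnsorted;
            for (int i = startOfUnsorted; i < a.length; i++) {
                visits++; // "Are you smallest?"
                if (a[i] < a[indexOfSmallest]) {
                    indexOfSmallest = i;
                }
            }
            // Swap: two reads and two writes.
            int temp = a[startOfUnsorted];
            a[startOfUnsorted] = a[indexOfSmallest];
            a[indexOfSmallest] = temp;
            visits += 4;
        }
        return visits;
    }

    public static void main(String[] args) {
        int[] a = {7, 10, 4, 1};
        long visits = sortAndCountVisits(a);
        System.out.println(Arrays.toString(a)); // [1, 4, 7, 10]
        System.out.println(visits);             // 26 = 4*5/2 + 4*4
    }
}
```

For any n the returned count should equal n(n+1)/2 + 4n, regardless of the initial order of the array.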
1 4 7 10 Done!
Pseudocode Practice
Insertion Sort
insertionSort(arr)
Pseudocode
Insertion Sort Solution
insertionSort(arr)
    for i in 1 to arr.length - 1
        toPlace = arr[i]
        j = i - 1 // last index of sorted part
        while j >= 0 and arr[j] > toPlace
            arr[j+1] = arr[j]
            j--
        arr[j+1] = toPlace
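The insertion sort pseudocode translates almost line for line into Java. This is a minimal sketch for int arrays; the class name InsertionSortDemo is an assumption made for the example.

```java
import java.util.Arrays;

class InsertionSortDemo {
    // Sorts arr in place by inserting each member into the sorted part on its left.
    static void insertionSort(int[] arr) {
        for (int i = 1; i < arr.length; i++) {
            int toPlace = arr[i];
            int j = i - 1; // last index of sorted part
            // Shift larger members one place to the right.
            while (j >= 0 && arr[j] > toPlace) {
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = toPlace;
        }
    }

    public static void main(String[] args) {
        int[] arr = {7, 10, 4, 1};
        insertionSort(arr);
        System.out.println(Arrays.toString(arr)); // [1, 4, 7, 10]
    }
}
```

The loop starts at index 1 because a single member at index 0 is already a sorted part of length 1.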
Merge Sort
Merge sort
• Merge sort is a sorting algorithm that divides a list into two halves, recursively sorts each half, and then merges
the sorted halves to produce a sorted list. The recursive partitioning continues until a list of 1 element is reached,
as a list of 1 element is already sorted.
• The merge sort algorithm uses three index variables to keep track of the elements to sort for each recursive
method call. The index variable i is the index of the first element in the list, and the index variable k is the index of
the last element. The index variable j is used to divide the list into two halves. Elements from i to j are in the left
half, and elements from j + 1 to k are in the right half.
• Merge sort merges the two sorted partitions into a single list by repeatedly selecting the smallest element from
either the left or right partition and adding that element to a temporary merged list. Once fully merged, the
elements in the temporary merged list are copied back to the original list.
• The merge sort algorithm's runtime is O(N log N). Merge sort divides the input in half until a list of 1 element is
reached, which requires log N partitioning levels. At each level, the algorithm does about N comparisons
selecting and copying elements from the left and right partitions, yielding N * log N comparisons.
• Merge sort requires O(N) additional memory elements for the temporary array of merged elements. For the final
merge operation, the temporary list has the same number of elements as the input. Some sorting algorithms sort
the list elements in place and require no additional memory, but are more complex to write and understand.
Merge Sort
• Not in-place: uses extra memory
• “Divide-and-conquer” algorithm
• Recursion simplifies by splitting data set in half,
rather than reducing its size by 1
[5 7 1 9 6 5 2 3]
[5 7 1 9] [6 5 2 3]
[5 7] [1 9] [6 5] [2 3]
[5] [7] [1] [9] [6] [5] [2] [3]
Merge Sort (put it back together)
[5] [7] [1] [9] [6] [5] [2] [3]
[5 7] [1 9] [5 6] [2 3]
[1 5 7 9] [2 3 5 6]
[1 2 3 5 5 6 7 9]
Complexity of Merge Sort
How many times is the entire data set split?
[5 7 1 9 6 5 2 3]
[5 7 1 9] [6 5 2 3]
[5 7] [1 9] [6 5] [2 3]
[5] [7] [1] [9] [6] [5] [2] [3]
log2(n) = log2(8) = 3
Complexity of Merge Sort
How do we put the data back together?
[5] [7] [1] [9] [6] [5] [2] [3]
[5 7] [1 9] [5 6] [2 3]
[1 5 7 9] [2 3 5 6]
[1 2 3 5 5 6 7 9]
O(n) for each step
Merge Sort Complexity
• There are log2(n) “levels”
• For each “level”
• O(n) work is done to split and merge
• = O(n * log(n))
If you forget everything else about Merge Sort complexity, remember this!
When computer scientists say log they mean log base 2.
Pseudocode Practice
Merge Sort
mergeSort(arr,start,end)
merge(arr,start,mid,end)
Pseudocode Practice
Merge Sort
mergeSort(arr,start,end)
if start>=end
return
mergedIndex = start
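One possible Java solution for the two signatures above is sketched here. It follows the description given earlier (split at the midpoint, sort each half recursively, merge through a temporary array, copy back), but it is not necessarily the solution the slides intended. The temporary array is indexed from 0 here, and the names MergeSortDemo, merged, left, and right are assumptions made for this example.

```java
import java.util.Arrays;

class MergeSortDemo {
    // Sorts arr[start..end] (both inclusive).
    static void mergeSort(int[] arr, int start, int end) {
        if (start >= end) {
            return; // 0 or 1 element: already sorted
        }
        int mid = start + (end - start) / 2;
        mergeSort(arr, start, mid);
        mergeSort(arr, mid + 1, end);
        merge(arr, start, mid, end);
    }

    // Merges sorted arr[start..mid] and sorted arr[mid+1..end].
    static void merge(int[] arr, int start, int mid, int end) {
        int[] merged = new int[end - start + 1]; // O(n) extra memory
        int mergedIndex = 0;
        int left = start;
        int right = mid + 1;
        // Repeatedly take the smaller front element of the two halves.
        while (left <= mid && right <= end) {
            if (arr[left] <= arr[right]) {
                merged[mergedIndex++] = arr[left++];
            } else {
                merged[mergedIndex++] = arr[right++];
            }
        }
        // Copy whatever remains in either half.
        while (left <= mid) {
            merged[mergedIndex++] = arr[left++];
        }
        while (right <= end) {
            merged[mergedIndex++] = arr[right++];
        }
        // Copy the merged result back into the original array.
        System.arraycopy(merged, 0, arr, start, merged.length);
    }

    public static void main(String[] args) {
        int[] arr = {5, 7, 1, 9, 6, 5, 2, 3};
        mergeSort(arr, 0, arr.length - 1);
        System.out.println(Arrays.toString(arr)); // [1, 2, 3, 5, 5, 6, 7, 9]
    }
}
```

Computing mid as start + (end - start) / 2 avoids integer overflow for very large indices, a detail that does not matter for the slide example.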
Searching algorithms
• An algorithm is a sequence of steps for accomplishing a task.
• Linear search is a search algorithm that starts from the beginning of a list, and checks each element until the
search key is found or the end of the list is reached.
• For a list with N elements, linear search thus requires at most N comparisons. The algorithm is said to require
"on the order" of N comparisons.
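As a minimal sketch of the linear search just described, the method below checks each element from the beginning and returns the index of the first match. Returning -1 for "not found" and the names LinearSearchDemo and linearSearch are conventions assumed for this example.

```java
class LinearSearchDemo {
    // Returns the index of key in list, or -1 if key is not present.
    // Makes at most list.length comparisons, so it is O(N).
    static int linearSearch(int[] list, int key) {
        for (int i = 0; i < list.length; i++) {
            if (list[i] == key) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] list = {1, 2, 3, 4, 12, 33, 67, 99};
        System.out.println(linearSearch(list, 67)); // 6
        System.out.println(linearSearch(list, 5));  // -1
    }
}
```

Linear search does not require the list to be sorted, which is its main advantage over the faster alternatives.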
I want to know if my array contains some element
• Linear Search
• Binary search is a faster algorithm for searching a list if the list's elements are sorted and directly
accessible (such as an array). Binary search first checks the middle element of the list. If the search
key is found, the algorithm returns the matching location. If the search key is not found, the
algorithm repeats the search on the remaining left sublist (if the search key was less than the middle
element) or the remaining right sublist (if the search key was greater than the middle element).
• Binary search is incredibly efficient in finding an element within a sorted list. During each iteration or step of
the algorithm, binary search reduces the search space (i.e., the remaining elements to search within) by half.
The search terminates when the element is found or the search space is empty (element not found).
• For a 32 element list, if the search key is not found, the search space is halved to have 16 elements, then 8, 4,
2, 1, and finally none, requiring only 6 steps. For an N element list, the maximum number of steps required to
reduce the search space to an empty sublist is $\lfloor \log_2 N \rfloor + 1$. Ex: $\lfloor \log_2 32 \rfloor + 1 = 6$.
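The halving behavior can be sketched in Java as follows. The iterative form, the -1 return value for "not found", and the helper stepsForMissingLargeKey (which only counts steps for a key larger than every element, the worst case) are assumptions made for this illustration.

```java
class BinarySearchDemo {
    // Returns the index of key in the sorted array, or -1 if not present.
    static int binarySearch(int[] sorted, int key) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (sorted[mid] == key) {
                return mid;
            } else if (sorted[mid] < key) {
                low = mid + 1;  // key can only be in the right sublist
            } else {
                high = mid - 1; // key can only be in the left sublist
            }
        }
        return -1; // search space is empty
    }

    // Hypothetical helper: number of steps needed to empty the search space
    // of an n-element list when the key is larger than every element.
    static int stepsForMissingLargeKey(int n) {
        int low = 0;
        int high = n - 1;
        int steps = 0;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            low = mid + 1; // always continue in the right sublist
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4, 12, 33, 67, 99};
        System.out.println(binarySearch(sorted, 33));    // 5
        System.out.println(binarySearch(sorted, 5));     // -1
        System.out.println(stepsForMissingLargeKey(32)); // 6
    }
}
```

The result only makes sense when the input array is already sorted; on unsorted data the method may report -1 for a key that is present.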
Binary Search:
Cut the problem in half each time
1 2 3 4 12 33 67 99
12 33 67 99
If it’s here, which half is it in?
12 33
If it’s here, which half is it in?
Complexity of binary search
• Assume n is a power of 2 (i.e., you can keep dividing it in half until you get 1)
• Iteration 1: Array size n
• Iteration 2: Array size n/2
• Iteration 3: Array size (n/2)/2 = n /4
• …
• Iteration k (one element array):
• Array size $n / 2^{k-1} = 1$,
• $n = 2^{k-1}$,
• $\log(n) = \log(2^{k-1})$
• $\log(n) = (k-1)\log 2 = k - 1$.
• Thus the number of iterations is log(n) + 1
• Each iteration takes constant time
• Binary search is O(log(n))
Smart recursive algorithm is able to benefit from sorted array. Simple linear search is not.