Sorting
Sorting Algorithms
Another name for the lecture is Google II. Sorting is a great topic in CS: it is relatively simple, it is extremely important, and it illustrates lots of different algorithms and analysis techniques. There's more than one way to skin a cat.
Google...
Last time, I said Google does its thing in a couple of very significant steps: I. Collect pages from the web (graph search). II. Index them. III. Respond to queries.
Selection Sort
The idea is quite simple. We go through the list one item at a time, keeping track of the smallest item we've found. When we're through the list, we pull the smallest item out and add it to the end of a list of sorted items. We repeat until all the items have been removed.
Code
def Selection(l):
    # Repeatedly pull the smallest remaining item onto the end of the result.
    result = []
    while len(l) > 0:
        (smallest, rest) = findSmallest(l)
        result = result + [smallest]
        l = rest
    return result

def findSmallest(l):
    # Return the smallest item of l and a list of all the other items.
    smallest = l[0]
    rest = []
    for i in range(1, len(l)):
        if l[i] < smallest:
            rest = rest + [smallest]
            smallest = l[i]
        else:
            rest = rest + [l[i]]
    return (smallest, rest)
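To get a feel for how much work selection sort does, here is a minimal sketch that counts comparisons as it goes. The name `selection_sort_counting` and the sample list are hypothetical, chosen only for illustration; the sketch follows the idea described above rather than reproducing the lecture's code line for line.

```python
def selection_sort_counting(items):
    """Selection sort that also reports how many comparisons it made."""
    remaining = list(items)  # work on a copy so the caller's list is untouched
    result = []
    comparisons = 0
    while remaining:
        smallest_index = 0
        for i in range(1, len(remaining)):
            comparisons += 1  # one comparison per remaining candidate
            if remaining[i] < remaining[smallest_index]:
                smallest_index = i
        # Pull the smallest item out and add it to the sorted result.
        result.append(remaining.pop(smallest_index))
    return result, comparisons


ordered, comparisons = selection_sort_counting([5, 2, 9, 1, 7])
print(ordered, comparisons)  # [1, 2, 5, 7, 9] 10
```

For n items the passes cost n-1, n-2, ..., 1, 0 comparisons, which adds up to n(n-1)/2.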
Guess Who?
Each player picks a character. Players take turns asking each other yes/no questions. The first player to uniquely identify the other player's character wins!
Cross-Hatched?
Squiggle?
Insight
Each question splits the remaining set of possibilities into two subsets (yes and no). We want to pick a question so that the larger of the two subsets is as small as possible: half! How many questions does that take?
n = 1: 0 questions
n = 2: 1 question
n = 4: 2 questions
n = 8: 3 questions
n = 16: 4 questions
In general, n possibilities take lg n questions (rounded up when n is not a power of 2).
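The halving argument can be checked with a few lines of Python. This is a sketch under the assumption that every question splits the possibilities as evenly as it can and that we are always left with the larger part; `questions_needed` is a hypothetical name.

```python
def questions_needed(n):
    """Count how many halvings reduce n possibilities to a single one."""
    questions = 0
    while n > 1:
        n = (n + 1) // 2  # worst case: we are left with the larger half
        questions += 1
    return questions


for n in [1, 2, 4, 8, 16]:
    print(n, questions_needed(n))
```

For the powers of two above this prints 0, 1, 2, 3 and 4 questions, matching lg n.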
Binary Search
Let's say we have a sorted list of n items. How many comparisons do we need to make to find where a new item belongs in the list? We could start at the bottom and compare one item at a time until we find the new item's place. Maximum number of comparisons? One for each position: n. We can ask better questions: is the new item bigger than the halfway mark? That gets us down to lg(n+1) comparisons.
[Figure: the items a b c d e f g in sorted order.]
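A minimal sketch of the halfway-mark strategy, counting comparisons as it goes. The function name `find_position` and the seven-letter test list are illustrative assumptions, not part of the lecture.

```python
def find_position(sorted_items, new_item):
    """Return (index where new_item belongs, number of comparisons used)."""
    low, high = 0, len(sorted_items)  # the answer lies somewhere in low..high
    comparisons = 0
    while low < high:
        middle = (low + high) // 2
        comparisons += 1
        if new_item > sorted_items[middle]:
            low = middle + 1   # belongs to the right of the halfway mark
        else:
            high = middle      # belongs at or to the left of it
    return low, comparisons


letters = list("abcdefg")
print(find_position(letters, "e"))  # (4, 3)
```

With n = 7 items there are n + 1 = 8 possible positions, and lg 8 = 3 comparisons are enough to pin one down.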
Quicksort
Quicksort is another sorting algorithm. Idea: break the list of n+1 elements into the median and two lists of n/2 elements each. The two lists are those elements smaller than the median and those larger than the median. Sort the two lists separately. Glue them together with the median in between: all n+1 elements are sorted.
Quicksort Example
Original list: [56, 80, 66, 64, 37, 36, 91, 48, 17, 20, 86, 89, 41, 1, 96, 12, 74]
Median is 56;
smaller: [37, 36, 48, 17, 20, 41, 1, 12]
bigger: [80, 66, 64, 91, 86, 89, 96, 74]
Sort each;
smaller: [1, 12, 17, 20, 36, 37, 41, 48]
bigger: [64, 66, 74, 80, 86, 89, 91, 96]
Glue: [1, 12, 17, 20, 36, 37, 41, 48, 56, 64, 66, 74, 80, 86, 89, 91, 96]
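The median-split idea in this example can be sketched in Python as follows. One loud assumption: the sketch uses `statistics.median_low` from the standard library to find the median, and that function sorts the data internally, so this only illustrates the splitting and gluing, not a practical sorting method. The name `median_quicksort` is hypothetical.

```python
from statistics import median_low


def median_quicksort(items):
    """Sort by splitting around the median, then gluing the pieces together."""
    if len(items) <= 1:
        return list(items)
    median = median_low(items)  # assumption: the median is handed to us
    smaller = [x for x in items if x < median]
    equal = [x for x in items if x == median]
    bigger = [x for x in items if x > median]
    return median_quicksort(smaller) + equal + median_quicksort(bigger)


original = [56, 80, 66, 64, 37, 36, 91, 48, 17, 20, 86, 89, 41, 1, 96, 12, 74]
print(median_low(original))        # 56
print(median_quicksort(original))
```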
But...
If we could find the median, the whole sorting process would be pretty easy. But it is sufficient to split anywhere in the middle half at least half the time: that is still O(n log n). So pick a random list element. 25% of the time it will be in the first quarter of the sorted list, 25% of the time in the last quarter, and 50% of the time in the middle half.
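The 50% claim can be checked by brute force. The sketch below assumes distinct items and tries every possible rank for the pivot, counting how often neither side of the split ends up with more than three quarters of the list; `good_pivot_fraction` is a hypothetical helper name.

```python
def good_pivot_fraction(n):
    """Fraction of the n possible pivot ranks that give a 'middle half' split."""
    good = 0
    for rank in range(n):          # rank = number of items below the pivot
        left_size = rank
        right_size = n - 1 - rank
        if max(left_size, right_size) <= 3 * n / 4:
            good += 1
    return good / n


for n in [100, 1000]:
    print(n, good_pivot_fraction(n))
```

Because every rank is equally likely for a randomly chosen pivot, this fraction is also the probability of a good split.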
Quicksort's Flow
Pick an item, any item (the pivot). Partition the list into the items less than the pivot (left) and the items greater than the pivot (right). Sort the two parts (recursively).
Code
from random import randint

def Quicksort(l):
    # Lists of length 0 or 1 are already sorted.
    if len(l) <= 1:
        return l
    pivot = l[randint(0, len(l) - 1)]  # randint includes both endpoints
    (left, equal, right) = partition(l, pivot)
    return Quicksort(left) + equal + Quicksort(right)

def partition(l, pivot):
    # Split l into the items less than, equal to, and greater than the pivot.
    left = []
    right = []
    equal = []
    for item in l:
        if item < pivot:
            left = left + [item]
        elif item > pivot:
            right = right + [item]
        else:
            equal = equal + [item]
    return (left, equal, right)
Merge Sort
View all the items as separate sorted lists. Pick the two shortest lists and combine them into a single sorted list: compare the first items, move the smaller one to the end of the combined list, and repeat until one list is empty; then append whatever remains of the other list. Repeat until only a single list is left.
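The description above can be sketched as follows. To pick the two shortest lists, this sketch keeps the lists in a heap ordered by length, using the standard `heapq` module; that bookkeeping, the tie-breaking counter, and the names `merge` and `merge_sort` are assumptions made for illustration, since the slide does not say how the lists are tracked.

```python
import heapq
from itertools import count


def merge(a, b):
    """Combine two sorted lists into a single sorted list."""
    merged = []
    i = j = 0
    while i < len(a) and j < len(b):
        # Compare the first remaining items; move the smaller one over.
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    merged.extend(a[i:])  # one of these two is empty; the other is the leftover
    merged.extend(b[j:])
    return merged


def merge_sort(items):
    """Merge the two shortest lists repeatedly until a single list is left."""
    if not items:
        return []
    tiebreak = count()  # unique counter so the heap never compares the lists
    heap = [(1, next(tiebreak), [x]) for x in items]
    heapq.heapify(heap)
    while len(heap) > 1:
        _, _, a = heapq.heappop(heap)
        _, _, b = heapq.heappop(heap)
        combined = merge(a, b)
        heapq.heappush(heap, (len(combined), next(tiebreak), combined))
    return heap[0][2]


print(merge_sort([56, 80, 66, 64, 37, 36, 91, 48]))
```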
Lower Bound
We've shown that we can sort in O(N lg N) comparisons. What if someone comes along and does it better? We need to protect ourselves and prove a lower bound: that is, to show that nothing less than N lg N will suffice. Let's return to Guess Who?
Counting Orderings
How many ways are there to order N elements?
1: 1
2: 2
3: 6 = 3 x 2
4: 24 = 4 x 3 x 2
5: 120 = 5 x 4 x 3 x 2
N: N! = N x (N-1) x (N-2) x ... x 2 x 1
This is known as the factorial function. Thus, sorting must find the unique sorted ordering from a set of N! possibilities using just yes/no questions.
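The counts in this table can be confirmed by listing every ordering explicitly. This sketch uses `itertools.permutations` and `math.factorial` from the standard library; `count_orderings` is a hypothetical helper name.

```python
from itertools import permutations
from math import factorial


def count_orderings(n):
    """Count the orderings of n distinct elements by generating all of them."""
    return sum(1 for _ in permutations(range(n)))


for n in range(1, 6):
    print(n, count_orderings(n), factorial(n))
```

Both columns agree: 1, 2, 6, 24, 120.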
A Little Math
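\[
N! = 1 \times 2 \times 3 \times \cdots \times \frac{N}{2} \times \left(\frac{N}{2}+1\right) \times \cdots \times N
\;\ge\; \left(\frac{N}{2}+1\right) \times \cdots \times N
\;>\; \frac{N}{2} \times \cdots \times \frac{N}{2}
= \left(\frac{N}{2}\right)^{N/2}
\]
Number of comparisons to sort $N$ items
$\ge$ number of yes/no questions to pick one out of $N!$
$\ge$ number of yes/no questions to pick one out of $(N/2)^{N/2}$
$\ge \lg (N/2)^{N/2} = \frac{N}{2} \lg \frac{N}{2}$,
or, essentially, $N \lg N$. O(N lg N) wins!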
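To see how these quantities compare for concrete N, here is a small numeric sketch. The helper names `min_questions` and `half_bound` are hypothetical; `math.log2` and `math.factorial` are standard library functions.

```python
from math import ceil, factorial, log2


def min_questions(n):
    """Yes/no questions needed to single out one of the n! orderings."""
    return ceil(log2(factorial(n)))


def half_bound(n):
    """The simplified lower bound (N/2) lg (N/2) from the argument above."""
    return (n / 2) * log2(n / 2)


# Columns: N, (N/2) lg (N/2), ceil(lg N!), N lg N
for n in [4, 16, 64, 256]:
    print(n, half_bound(n), min_questions(n), n * log2(n))
```

In each row the middle value is sandwiched between the simplified lower bound and N lg N.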