Sorting Algorithms
• Unsorted array: A[0] ... A[n-2] A[n-1]
• Can binary search be used to improve efficiency?
• There is no right shifting of the elements in the sorted array.
Insertion Sort
A[0] ... A[n-2] A[n-1]
• T(n): time to run insertion sort on an array of length n
• Time T(n-1) to sort the segment A[0] to A[n-2] by recursion
• Up to (n-1) steps to insert A[n-1] into the sorted segment
• Hence T(n) = T(n-1) + (n-1); with T(1) = 0 this unrolls to T(n) = 1 + 2 + ... + (n-1) = n(n-1)/2 = O(n^2)
O(n^2) sorting algorithms
• Selection sort and insertion sort are both O(n^2)
• O(n^2) sorting is infeasible for n over 100000
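The recursive view of insertion sort described above (sort A[0] to A[n-2], then insert A[n-1]) can be sketched in Python as follows. This is a minimal illustration, not code from the slides; the function name insertion_sort_recursive and its optional parameter n are assumptions made for the example.

```python
def insertion_sort_recursive(a, n=None):
    """Sort a[0..n-1] in place and return the list.

    Hypothetical helper: sorts the first n-1 items recursively,
    then inserts a[n-1] into the sorted segment.
    """
    if n is None:
        n = len(a)
    if n <= 1:
        # Base case: a segment of length 0 or 1 is already sorted.
        return a
    # Cost T(n-1): sort the segment a[0..n-2] by recursion.
    insertion_sort_recursive(a, n - 1)
    # Up to n-1 steps: shift larger elements right, then place a[n-1].
    key = a[n - 1]
    i = n - 2
    while i >= 0 and a[i] > key:
        a[i + 1] = a[i]
        i -= 1
    a[i + 1] = key
    return a


print(insertion_sort_recursive([43, 32, 22, 78, 63, 57, 91, 13]))
```

Because the recursion depth equals n, Python's default recursion limit makes this form suitable only for short lists; an iterative loop does the same work without that restriction.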
Divide-and-conquer
• Both merge sort and quicksort employ a common algorithmic paradigm based on recursion.
• This paradigm, divide-and-conquer,
  • breaks a problem into sub-problems that are similar to the original problem,
  • recursively solves the sub-problems, and
  • finally combines the solutions to the sub-problems to solve the original problem.
• Because divide-and-conquer solves sub-problems recursively,
  • each sub-problem must be smaller than the original problem, and
  • there must be a base case for sub-problems.
• You should think of a divide-and-conquer algorithm as having three parts:
  • Divide the problem into a number of sub-problems that are smaller instances of the same problem.
  • Conquer the sub-problems by solving them recursively. If they are small enough, solve the sub-problems as base cases.
  • Combine the solutions to the sub-problems into the solution for the original problem.
Merge Sort
[Figure: merge sort applied to the array 43 32 22 78 63 57 91 13]
• T(n) = n(1 + log_2 n) = O(n log n)
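A minimal Python sketch of merge sort, run on the example array from this slide, is given here for illustration. The function names merge and merge_sort are chosen for this example and are not taken from the slides; the comments mark the divide, conquer, and combine parts.

```python
def merge(left, right):
    """Combine two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left half on ties, which keeps the sort stable.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


def merge_sort(a):
    """Return a new sorted list containing the items of a."""
    if len(a) <= 1:
        return list(a)              # base case
    mid = len(a) // 2               # divide
    left = merge_sort(a[:mid])      # conquer the left half
    right = merge_sort(a[mid:])     # conquer the right half
    return merge(left, right)       # combine


print(merge_sort([43, 32, 22, 78, 63, 57, 91, 13]))
# prints [13, 22, 32, 43, 57, 63, 78, 91]
```

The running-time formula on the slide is consistent with the recurrence T(n) = 2T(n/2) + n with T(1) = 1, assuming n is a power of two: two recursive calls on halves plus a linear-time merge.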
Quick Sort
Satellite Data
• In practice, the numbers to be sorted are rarely isolated values. Each is usually part of a collection of data called a record.
• Each record contains a key, which is the value to be sorted; the remainder of the record consists of satellite data, which are usually carried around with the key.
• In practice, when a sorting algorithm permutes the keys, it must permute the satellite data as well.
• If each record includes a large amount of satellite data, we often permute an array of pointers to the records rather than the records themselves in order to minimize data movement.
• For example: when you sort, you can't break the structure. If you have a collection of people with the attributes of a name, a current address, a social security number, and an age, and you want to sort them by age, you can't change the association between the four fields. You just need to sort them within the structure based on the value of a field. In this example, age is the key and the satellite data is the name, address, and social security number.
• To keep things simple, we assume, as we have for binary search trees and red-black trees, that any satellite information associated with a key is stored in the same node as the key.
• In practice, one might actually store with each key just a pointer to another disk page containing the satellite information for that key.
• The algorithms implicitly assume that the satellite information associated with a key, or the pointer to such satellite information, travels with the key whenever the key is moved from node to node.
Significance of stability:
There are a few reasons why stability can be important. One is that swapping two records that do not need to be swapped causes an unnecessary memory update: a page is marked dirty and must be rewritten to disk (or another slow medium).
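The people-sorted-by-age example can be sketched in Python as below. The Person class and all sample values are hypothetical and exist only for illustration. The sketch shows two options mentioned above: moving whole records with their keys, and permuting an array of indices (standing in for pointers) while the records stay in place.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical record: age is the key, the other fields are satellite data.
    name: str
    address: str
    ssn: str
    age: int


people = [
    Person("Ada", "1 Main St", "000-00-0001", 36),
    Person("Bob", "2 Oak Ave", "000-00-0002", 29),
    Person("Cy", "3 Elm Rd", "000-00-0003", 36),
    Person("Dee", "4 Pine Ln", "000-00-0004", 22),
]

# Option 1: sort the records themselves; each record moves as a unit,
# so the association between the four fields is never broken.
by_age = sorted(people, key=lambda p: p.age)

# Option 2: sort an array of indices and leave the records where they are.
order = sorted(range(len(people)), key=lambda i: people[i].age)

print([p.name for p in by_age])  # prints ['Dee', 'Bob', 'Ada', 'Cy']
print(order)                     # prints [3, 1, 0, 2]
```

Python's built-in sort is stable, so Ada and Cy, who share the key 36, keep their original relative order in both results.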
Radix Sort
• Radix sort solves the problem of card sorting by sorting on the least
significant digit first.
• The algorithm then combines the cards into a single deck, with the
cards in the 0 bin preceding the cards in the 1 bin preceding the cards
in the 2 bin, and so on.
• Then it sorts the entire deck again on the second-least significant digit
and recombines the deck in a like manner.
• The process continues until the cards have been sorted on all d digits.
Remarkably, at that point the cards are fully sorted on the d-digit
number.
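The card-sorting procedure above can be sketched in Python for non-negative decimal integers of at most d digits. The function name radix_sort and the sample numbers are assumptions made for this illustration. Each pass distributes the cards into bins 0 to 9 and then recombines the bins in order; appending to the bins preserves the order from the previous pass, which is the stability the method relies on.

```python
def radix_sort(cards, d):
    """Return the cards sorted, assuming non-negative ints with at most d decimal digits."""
    for pos in range(d):                      # least significant digit first
        bins = [[] for _ in range(10)]        # one bin per digit value 0..9
        for card in cards:
            digit = (card // 10 ** pos) % 10
            bins[digit].append(card)          # append keeps earlier order (stable)
        # Recombine: bin 0 precedes bin 1 precedes bin 2, and so on.
        cards = [card for b in bins for card in b]
    return cards


print(radix_sort([329, 457, 657, 839, 436, 720, 355], 3))
# prints [329, 355, 436, 457, 657, 720, 839]
```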
• As a simple example, let us determine the expected number of heads that we obtain when flipping a fair coin. Our sample space is S = {H, T}, and we define a random variable Y which takes on the values H and T, each with probability 1/2. We can then define an indicator random variable X_H, associated with the coin coming up heads, which we can express as the event Y = H. This variable counts the number of heads obtained in this flip, and it is 1 if the coin comes up heads and 0 otherwise. We write
X_H = I{Y = H} = 1 if Y = H, 0 if Y = T.
• The expected value is E[X_H] = 1 · Pr{Y = H} + 0 · Pr{Y = T} = 1 · (1/2) + 0 · (1/2) = 1/2. Thus the expected number of heads obtained by one flip of a fair coin is 1/2. As the following lemma shows, the expected value of an indicator random variable associated with an event A is equal to the probability that A occurs.
Bucket Sort: analysis
• Using this expected value, we conclude that the average-case running time for bucket sort is linear.
• Even if the input is not drawn from a uniform distribution, bucket sort may still run in linear time.
• As long as the input has the property that the sum of the squares of the bucket sizes is linear in the total number of elements, by linearity of expectation bucket sort will run in linear time.
Exercise
• Illustrate the operation of BUCKET-SORT on the array
A = <.79, .13, .16, .64, .39, .20, .89, .53, .71, .42>
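For reference, a minimal Python sketch of bucket sort follows. It assumes, as in the usual textbook formulation, that every input value lies in [0, 1) and that n buckets are used for n values; the function name bucket_sort is chosen for this example. Each bucket is sorted with insertion sort, so the total work is linear plus a term proportional to the sum of the squares of the bucket sizes.

```python
def bucket_sort(a):
    """Return a sorted copy of a, assuming every value x satisfies 0 <= x < 1."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[int(n * x)].append(x)     # bucket i holds values in [i/n, (i+1)/n)
    result = []
    for b in buckets:
        # Insertion sort within the bucket: quadratic in the bucket size.
        for j in range(1, len(b)):
            key = b[j]
            i = j - 1
            while i >= 0 and b[i] > key:
                b[i + 1] = b[i]
                i -= 1
            b[i + 1] = key
        result.extend(b)                  # concatenate the buckets in order
    return result


A = [.79, .13, .16, .64, .39, .20, .89, .53, .71, .42]
print(bucket_sort(A))
```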