06 Lecture
Last time: QuickSort and Ω(n lg n) lower bound for
comparison-based sorting
Today: CountingSort, RadixSort, BucketSort
Posting lecture notes before lecture
Feedback (e.g., lectures, tutorials, homework, textbook)
CountingSort
Idea: sort {5, 3, 1, 2, 6, 4} by placing 1 or 0 in a new array
Idea: sort {2, 5, 3, 0, 2, 3, 0, 2} by “counting” with a new array
[Board example for this input: after prefix sums, C = (2, 2, 5, 7, 7, 8); sorted output B = (0, 0, 2, 2, 2, 3, 3, 5)]
CountingSort depends on a key assumption: the numbers to
be sorted are integers in {0, 1, . . . , k}
Input: A[1 . . . n], where A[j] ∈ {0, 1, . . . , k} for j = 1, 2, . . . , n
Array A and values n and k are given as parameters
Output: B[1 . . . n], sorted
Array B is assumed to be already allocated and is given as a
parameter
Auxiliary storage: C[0 . . . k]
CountingSort
CountingSort(A, B, n, k)
  for i = 0 to k
    C[i] = 0
  for j = 1 to n
    C[A[j]]++
  for i = 1 to k
    C[i] = C[i] + C[i - 1]
  for j = n downto 1
    B[C[A[j]]] = A[j]
    C[A[j]]--
Examples: A = (5, 3, 1, 2, 6, 4) and A = (2, 5, 3, 0, 2, 3, 0, 2)
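A minimal Python sketch of the pseudocode above (0-indexed arrays instead of the 1-indexed A and B on the slide; the function name is illustrative):

def counting_sort(A, k):
    """Return a sorted copy of A, assuming every key is an integer in {0, 1, ..., k}."""
    n = len(A)
    B = [0] * n                         # output array
    C = [0] * (k + 1)                   # auxiliary counts, indices 0..k
    for key in A:                       # count occurrences of each key
        C[key] += 1
    for i in range(1, k + 1):           # prefix sums: C[i] = number of keys <= i
        C[i] += C[i - 1]
    for j in range(n - 1, -1, -1):      # scan right-to-left so equal keys keep their order
        C[A[j]] -= 1
        B[C[A[j]]] = A[j]
    return B

print(counting_sort([5, 3, 1, 2, 6, 4], 6))         # [1, 2, 3, 4, 5, 6]
print(counting_sort([2, 5, 3, 0, 2, 3, 0, 2], 5))   # [0, 0, 2, 2, 2, 3, 3, 5]

Because B is 0-indexed here, the decrement happens before the write; with the slide's 1-indexed B, the decrement comes after.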
Running Time and Space of CountingSort
We need auxiliary array C of size k + 1 for counting
The two loops over i take Θ(k) time; the two loops over j take Θ(n) time
Runtime is clearly Θ(n + k)
Runtime is linear if k = O(n)
Stability
Definition
A sorting algorithm is stable if keys with the same values appear
in the same order in the output as in the input.
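A quick illustration (example values chosen here, not from the slides); Python's built-in sort is stable, so records with equal keys keep their input order:

records = [(1, 'a'), (0, 'b'), (1, 'c'), (0, 'd')]
print(sorted(records, key=lambda r: r[0]))   # sort by the numeric key only
# [(0, 'b'), (0, 'd'), (1, 'a'), (1, 'c')]: 'a' stays before 'c', 'b' before 'd'

Note that the counting_sort sketch above is stable because its final loop scans A from right to left; this is exactly the property RadixSort relies on next.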
RadixSort(A, d)
  for i = 1 to d
    stably sort A based on digit i
Example input (3-digit keys): 326, 453, 608, 835, 751, 435, 704, 690
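A possible Python sketch (my own arrangement, not the slide's): each pass is a stable counting sort on one decimal digit, least significant first; extracting digits with // and % is an implementation choice:

def counting_sort_by_digit(A, exp, base=10):
    """Stably sort A by the digit (key // exp) % base."""
    n = len(A)
    B = [0] * n
    C = [0] * base
    for key in A:
        C[(key // exp) % base] += 1
    for i in range(1, base):
        C[i] += C[i - 1]
    for j in range(n - 1, -1, -1):      # right-to-left scan keeps the pass stable
        d = (A[j] // exp) % base
        C[d] -= 1
        B[C[d]] = A[j]
    return B

def radix_sort(A, d, base=10):
    """Sort d-digit keys by stably sorting on digits 1..d, least significant first."""
    exp = 1
    for _ in range(d):
        A = counting_sort_by_digit(A, exp, base)
        exp *= base
    return A

print(radix_sort([326, 453, 608, 835, 751, 435, 704, 690], 3))
# [326, 435, 453, 608, 690, 704, 751, 835]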
Correctness of RadixSort
RadixSort(A, d)
  for i = 1 to d
    stably sort A based on digit i
Correctness (by induction on i): after the pass on digit i, A is sorted with
respect to the i least significant digits; keys that agree on digit i keep their
relative order from earlier passes because each pass is stable
Running time: Θ(d(n + k)), since each of the d passes is a CountingSort
taking Θ(n + k)
If you assume d and k are constants, then Θ(n)
But wait, what do d and k really mean?
E.g., with decimal notation, k = 10, but is d a constant?
Smaller bases (e.g., base 2) result in more digits
Let us represent the input in binary; keys have b bits
Break every key into “digits” by taking r bits at a time
Then d = ⌈b/r⌉
What is the range of our “digits”? [0 . . . 2^r - 1]
Θ(d(n + k)) = Θ((b/r)(n + 2^r))
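As an illustrative instantiation (the values b = 32 and r = 8 are chosen here for concreteness; they are not from the slide):

\[
  d = \left\lceil \frac{b}{r} \right\rceil = \left\lceil \frac{32}{8} \right\rceil = 4
  \quad\text{digits, each in } [0, 2^{8} - 1] = [0, 255]
\]
\[
  \Theta\!\left(\frac{b}{r}\,(n + 2^{r})\right) = \Theta\bigl(4\,(n + 256)\bigr) = \Theta(n)
\]

Larger r means fewer passes but a larger 2^r term per pass; smaller r (e.g., base 2) means more passes, matching the remark above.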
BucketSort
Key assumption: the input is drawn uniformly at random (think of a
random process such as flipping coins, rolling dice, etc.)
Assume (WLOG) that elements are in the range [0, 1)
BucketSort(A)
  for i = 1 to n
    insert A[i] into list B[⌊nA[i]⌋]
  for i = 0 to n - 1
    sort list B[i] using InsertionSort
  concatenate B[0], B[1], . . . , B[n - 1]
What is a “bucket”? Each B[i] is a list holding the elements of A that
fall in the interval [i/n, (i + 1)/n)
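A minimal Python sketch of the pseudocode above, assuming inputs in [0, 1); sorted() stands in for InsertionSort, and the names are illustrative:

def bucket_sort(A):
    """Sort values assumed to be drawn uniformly from [0, 1)."""
    n = len(A)
    B = [[] for _ in range(n)]          # n empty buckets (lists)
    for x in A:
        B[int(n * x)].append(x)         # bucket index = floor(n * x)
    out = []
    for bucket in B:
        out.extend(sorted(bucket))      # stand-in for InsertionSort on each bucket
    return out

print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
# [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]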
Efficient if no bucket gets too many values
All lines of the algorithm except the InsertionSort calls take O(n) in total
Intuitively, if each bucket gets a constant number of elements,
it takes O(1) time to sort each bucket, and so O(n) overall
We “expect” each bucket to have few elements, since the
average is 1 element per bucket
Best case? Worst case? Average case?
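To make the worst-case question concrete, here is a hypothetical adversarial input (chosen here, not from the slides) for the bucket_sort sketch above: all values crowd into bucket 0, so a single InsertionSort call gets all n elements:

n = 8
A = [i / (10 * n) for i in range(n)]    # all values lie in [0, 0.1)
print([int(n * x) for x in A])          # [0, 0, 0, 0, 0, 0, 0, 0]: every element maps to bucket 0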
BucketSort Summary
The average-case and worst-case runtime analyses are
probabilistic, rather than guaranteed
Those running times depend on the distribution of inputs
If the input isn’t drawn from a uniform distribution on [0, 1),
all bets are off (performance-wise), but the algorithm is still
correct