06 Lecture

The document outlines the assignment of Homework 3 due on May 19 and discusses various sorting algorithms including CountingSort, RadixSort, and BucketSort. It explains the concepts, correctness, and runtime of these algorithms, emphasizing their efficiency and stability. Additionally, it highlights how these sorting methods can escape the Ω(n log n) lower bound for sorting through non-comparison-based techniques.


Administrivia

HW3 assigned today; due Monday May 19 at 9:30am
Optional Midterm Exam: Friday, May 23
Reading: Chapters 8, 9 (Linear-time Sorting, Medians and Order Statistics)
Last time: QuickSort and Ω(n lg n) lower bound for comparison-based sorting
Today: CountingSort, RadixSort, BucketSort
Posting lecture notes before lecture
Feedback (e.g., lectures, tutorials, homework, textbook)
CountingSort

Idea: sort {5, 3, 1, 2, 6, 4} by placing 1 or 0 in new array
Idea: sort {2, 5, 3, 0, 2, 3, 0, 2} by “counting” with new array

Example for A = {2, 5, 3, 0, 2, 3, 0, 2}:
  index i:            0 1 2 3 4 5
  counts C[i]:        2 0 3 2 0 1
  running sums C[i]:  2 2 5 7 7 8
  sorted output B:    0 0 2 2 2 3 3 5
CountingSort

CountingSort depends on a key assumption: numbers to
be sorted are integers in {0, 1, . . . , k}
Input: A[1 . . . n], where A[j] ∈ {0, 1, . . . , k} for j = 1, 2, . . . , n
Array A and values n and k are given as parameters
Output: B[1 . . . n], sorted
Array B is assumed to be already allocated and is given as a
parameter
Auxiliary storage: C[0 . . . k]
CountingSort

CountingSort(A, B, n, k)
  for i ← 0 to k
    C[i] ← 0
  for j ← 1 to n
    C[A[j]]++
  for i ← 1 to k
    C[i] ← C[i] + C[i − 1]
  for j ← n downto 1
    B[C[A[j]]] ← A[j]
    C[A[j]]−−

Trace the algorithm on 5, 3, 1, 2, 6, 4 and on 2, 5, 3, 0, 2, 3, 0, 2
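The pseudocode above maps directly to 0-indexed Python. A minimal sketch (with the decrement moved before the placement to compensate for 0-indexed output):

```python
def counting_sort(A, k):
    """Stably sort a list of integers in {0, ..., k}."""
    n = len(A)
    C = [0] * (k + 1)
    for x in A:                      # C[i] = number of elements equal to i
        C[x] += 1
    for i in range(1, k + 1):        # C[i] = number of elements <= i
        C[i] += C[i - 1]
    B = [None] * n
    for j in range(n - 1, -1, -1):   # scanning right to left keeps the sort stable
        C[A[j]] -= 1                 # decrement first because B is 0-indexed
        B[C[A[j]]] = A[j]
    return B

print(counting_sort([2, 5, 3, 0, 2, 3, 0, 2], 5))  # [0, 0, 2, 2, 2, 3, 3, 5]
```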
Running Time and Space of CountingSort

CountingSort(A, B, n, k)
  for i ← 0 to k
    C[i] ← 0
  for j ← 1 to n
    C[A[j]]++
  for i ← 1 to k
    C[i] ← C[i] + C[i − 1]
  for j ← n downto 1
    B[C[A[j]]] ← A[j]
    C[A[j]]−−

In addition to array A we need another array B of the same size: O(n)
We also need array C of size k + 1 for counting
Runtime is clearly Θ(n + k)
Runtime is linear if k = O(n)
Stability

CountingSort is stable (unlike some other sorting algorithms)

Definition
A sorting algorithm is stable if keys with the same values appear
in the same order in the output as in the input.

Why do we care about this?
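One reason we care: RadixSort, below, depends on a stable intermediate sort. A small sketch of what stability buys, using Python's sorted() (which is guaranteed stable) on hypothetical (key, tag) records:

```python
# Records share keys 1 and 3; the tags record their input order.
records = [(3, 'a'), (1, 'b'), (3, 'c'), (1, 'd')]

# A stable sort orders by key but never reorders records with equal keys.
stable = sorted(records, key=lambda r: r[0])
print(stable)  # [(1, 'b'), (1, 'd'), (3, 'a'), (3, 'c')]
```

Among the equal keys, 'b' stays before 'd' and 'a' stays before 'c', exactly as in the input.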


RadixSort

Used in tabulation machines in the 1880s
Used in punch card readers in the 1900s
Punch card sorters worked on one column at a time
IBM sold many such “business machines”

RadixSort(A, d)
  for i ← 1 to d
    stably sort A based on digit i

Example input: 326, 453, 608, 835, 751, 435, 704, 690
Correctness of RadixSort

Induction on number of passes, starting with least significant
digit (i in pseudocode)
Assume digits 1, 2, . . . , i − 1 are sorted
Show that a stable sort on digit i leaves digits 1, . . . , i sorted
1. If 2 digits in position i are different, ordering by position i is
correct, and positions 1, . . . , i − 1 are irrelevant
2. If 2 digits in position i are equal, the numbers are already in the
right order (by inductive hypothesis). The stable sort on digit i
leaves them in the right order.

This argument shows why it is so important to use a stable sort for
the intermediate sort.
What happens if we start with the most significant digit?
What if not all elements have the same number of digits?
Use padding, e.g., 023 and 437
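The inductive argument can be exercised directly. A sketch of RadixSort(A, d) for nonnegative decimal integers, with Python's stable sorted() standing in for the stable intermediate sort (integer division handles shorter keys as if zero-padded):

```python
def radix_sort_decimal(A, d):
    """Sort nonnegative integers of at most d decimal digits."""
    for i in range(d):  # least significant digit first
        A = sorted(A, key=lambda x: (x // 10**i) % 10)  # stable pass on digit i
    return A

print(radix_sort_decimal([326, 453, 608, 835, 751, 435, 704, 690], 3))
# [326, 435, 453, 608, 690, 704, 751, 835]
```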
Running Time of RadixSort

RadixSort(A, d)
  for i ← 1 to d
    stably sort A based on digit i

Let us use CountingSort as the intermediate sort
Θ(n + k) work per digit, where digits are in range [0, k]
A total of d intermediate sorts
Θ(d(n + k))
If you assume d and k are constants, then Θ(n)
But wait, what do d and k really mean?
E.g., with decimal notation, digits are in {0, . . . , 9}, so k = 9, but is d a constant?
Smaller bases (e.g., base-2) result in more digits

Running Time of RadixSort

RadixSort(A, d)
  for i ← 1 to d
    stably sort A based on digit i

Let us represent the input in binary; keys have ≤ b bits
Break every key into “digits” by taking r bits at a time
Then d = ⌈b/r⌉
What is the range of our “digits”? [0 . . . 2^r − 1]
Θ(d(n + k)) = Θ((b/r)(n + 2^r))
How do we make this as small as possible?
A smaller range of digits results in more digits and vice versa
Balance bn/r against b·2^r/r
Setting r = lg n results in bn/lg n + b·2^(lg n)/lg n = bn/lg n + bn/lg n = Θ(bn/lg n)
Representing a number up to n in binary incurs a lg n blowup
in digits, so b = lg n and Θ(bn/lg n) = Θ(n)
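Putting the pieces together: a sketch of RadixSort on b-bit keys taken r bits at a time, where each pass is a stable CountingSort on the current r-bit digit (the function name and parameters are illustrative):

```python
def radix_sort_bits(A, b, r):
    """Sort nonnegative integers of at most b bits, r bits per digit."""
    mask = (1 << r) - 1
    passes = -(-b // r)                 # d = ceil(b / r)
    for p in range(passes):
        shift = p * r
        C = [0] * (1 << r)              # digit range is [0 .. 2^r - 1]
        for x in A:                     # count each r-bit digit
            C[(x >> shift) & mask] += 1
        for i in range(1, 1 << r):      # running sums
            C[i] += C[i - 1]
        B = [None] * len(A)
        for x in reversed(A):           # stable placement, as in CountingSort
            d = (x >> shift) & mask
            C[d] -= 1
            B[C[d]] = x
        A = B
    return A

print(radix_sort_bits([326, 453, 608, 835, 751, 435, 704, 690], b=10, r=4))
# [326, 435, 453, 608, 690, 704, 751, 835]
```

Each pass does Θ(n + 2^r) work and there are ⌈b/r⌉ passes, matching the Θ((b/r)(n + 2^r)) bound above.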
RadixSort Summary

How does RadixSort escape the Ω(n lg n) lower bound for
sorting?
It uses CountingSort, which uses keys as indices
RadixSort can outperform other sorting algorithms
But note that it does not sort in-place
Requires even more additional space than MergeSort
BucketSort

Idea: break into groups by size (e.g., S, M, L, XL, etc.), sort
each group, list in order
Assume that the input is generated by a random process that
distributes elements uniformly and independently (e.g.,
flipping coins, rolling dice, etc.)
Assume (WLOG) that elements are in the range [0, 1)

BucketSort(A)
  for i ← 1 to n
    insert A[i] into list B[⌊nA[i]⌋]
  for i ← 0 to n − 1
    sort list B[i] using InsertionSort
  concatenate B[0], B[1], . . . , B[n − 1]
BucketSort

BucketSort(A)
  for i ← 1 to n
    insert A[i] into bucket B[⌊nA[i]⌋]
  for i ← 0 to n − 1
    sort list B[i] using InsertionSort
  concatenate B[0], B[1], . . . , B[n − 1]

How many buckets?
What is a “bucket”?
Where is the work of sorting done?
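A sketch of the pseudocode in Python, assuming inputs in [0, 1); list.sort() stands in for InsertionSort on each (expected constant-size) bucket:

```python
import math

def bucket_sort(A):
    """Sort floats in [0, 1), assumed roughly uniformly distributed."""
    n = len(A)
    B = [[] for _ in range(n)]           # n buckets, one per interval of width 1/n
    for x in A:
        B[math.floor(n * x)].append(x)   # bucket index is floor(n * A[i])
    out = []
    for bucket in B:                     # concatenate B[0], ..., B[n-1]
        bucket.sort()                    # stand-in for InsertionSort
        out.extend(bucket)
    return out

print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12]))
# [0.12, 0.17, 0.21, 0.26, 0.39, 0.72, 0.78, 0.94]
```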
Correctness of BucketSort

BucketSort(A)
  for i ← 1 to n
    insert A[i] into bucket B[⌊nA[i]⌋]
  for i ← 0 to n − 1
    sort list B[i] using InsertionSort
  concatenate B[0], B[1], . . . , B[n − 1]

Consider elements A[i] and A[j]
Assume WLOG that A[i] ≤ A[j]
Then ⌊nA[i]⌋ ≤ ⌊nA[j]⌋
Then A[i] is either placed in the same bucket as A[j] or in a
bucket with lower index
In both cases A[i] appears before A[j] in the output:
If A[i] and A[j] are in the same bucket, InsertionSort fixes up the order
If A[i] is in a bucket with lower index, concatenation of the lists fixes up the order
Runtime of BucketSort

BucketSort(A)
  for i ← 1 to n
    insert A[i] into bucket B[⌊nA[i]⌋]
  for i ← 0 to n − 1
    sort list B[i] using InsertionSort
  concatenate B[0], B[1], . . . , B[n − 1]

Efficient if no bucket gets too many values
All lines of the algorithm except InsertionSort take O(n)
Intuitively, if each bucket gets a constant number of elements,
it takes O(1) time to sort each bucket, and so O(n) overall
We “expect” each bucket to have few elements, since the
average is 1 element per bucket
Best case? Worst case? Average case?
BucketSort Summary

How does BucketSort escape the Ω(n lg n) lower bound for
sorting?
Again, not a comparison sort
Uses a function of key values to index into an array
The average-case and worst-case runtime analyses are
probabilistic, rather than guaranteed
Those running times depend on the distribution of inputs
If the input isn’t drawn from a uniform distribution on [0, 1),
all bets are off (performance-wise), but the algorithm is still
correct