Radix Sort Sorting in The C++ STL: CS 311 Data Structures and Algorithms Lecture Slides


Radix Sort

Sorting in the C++ STL

CS 311 Data Structures and Algorithms


Lecture Slides
Monday, March 16, 2009

Glenn G. Chappell
Department of Computer Science
University of Alaska Fairbanks
[email protected]

© 2005-2009 Glenn G. Chappell


Unit Overview
Algorithmic Efficiency & Sorting

Major Topics
 Introduction to Analysis of Algorithms
 Introduction to Sorting
 Comparison Sorts I
 More on Big-O
 The Limits of Sorting
 Divide-and-Conquer
 Comparison Sorts II
 Comparison Sorts III
Radix Sort
Sorting in the C++ STL

16 Mar 2009 CS 311 Spring 2009 2


Review
Introduction to Analysis of Algorithms

Efficiency
General: using few resources (time, space, bandwidth, etc.).
Specific: fast (time).
Analyzing Efficiency
Determine how the size of the input affects running time, measured in
steps, in the worst case.
Scalable
Works well with large problems.

Using Big-O In Words (listed from faster to slower)
O(1): Constant time (cannot read all of input)
O(log n): Logarithmic time (cannot read all of input)
O(n): Linear time
O(n log n): Log-linear time
O(n^2): Quadratic time (usually too slow to be scalable)
O(b^n), for some b > 1: Exponential time (usually too slow to be scalable)
Review
Introduction to Sorting Basics, Analyzing

Sort: Place a list in order.
Example: 3 1 5 3 5 2 becomes 1 2 3 3 5 5.
Key: The part of the data item used to sort.
Comparison sort: A sorting algorithm that gets its information by comparing
items in pairs.
(Figure: items x and y go into a compare operation that answers "x<y?".)
A general-purpose comparison sort places no restrictions on the size of the
list or the values in it.

Five criteria for analyzing a general-purpose comparison sort:
(Time) Efficiency
Requirements on Data
Space Efficiency
 In-place = no large additional storage space required (constant additional space).
Stability
Performance on Nearly Sorted Data
 Nearly sorted means: 1. All items close to proper places, OR 2. few items out of order.
Review
Introduction to Sorting Overview of Algorithms

There is no known sorting algorithm that has all the properties we
would like one to have.
We will examine a number of sorting algorithms. Most of these fall
into two categories: O(n^2) and O(n log n).
Quadratic-Time [O(n^2)] Algorithms
 Bubble Sort
 Selection Sort
 Insertion Sort
 Quicksort
 Treesort (later in semester)
Log-Linear-Time [O(n log n)] Algorithms
 Merge Sort
 Heap Sort (some; mostly later in semester)
 Introsort (not in the text)
Special Purpose: Not Comparison Sorts
 Pigeonhole Sort
 Radix Sort



Review
Comparison Sorts III Quicksort: Description

Quicksort is another divide-and-conquer algorithm. Procedure:
Choose a list item (the pivot).
Do a Partition: put items less than the pivot before it, and items greater than
the pivot after it.
Recursively sort two sublists: items before pivot, items after pivot.
We did a simple pivot choice: the first item. Later, we improve this.
Fast Partition algorithms are in-place, but not stable.
Note: In-place Partition does not give us an in-place Quicksort. Quicksort uses
memory for recursion.
Example:
Initial list: 3 1 5 3 5 2 3 (pivot: the first item, 3)
After Partition: 2 1 3 3 5 5 3 (pivot now in the third position)
After sorting (recursing on) the sublists before and after the pivot: 1 2 3 3 3 5 5
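
The procedure above can be sketched in C++ roughly as follows. This is a minimal illustration, not the code written in class: it assumes a std::vector<int>, and the names partitionFirst and quicksort are hypothetical. The pivot is the first item, as on this slide.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Partition v[lo..hi) around the pivot v[lo] (the first item).
// Returns the pivot's final index. In-place, but not stable.
std::size_t partitionFirst(std::vector<int> & v, std::size_t lo, std::size_t hi)
{
    int pivot = v[lo];
    std::size_t store = lo;            // v[lo+1..store] holds the items < pivot
    for (std::size_t i = lo + 1; i < hi; ++i)
    {
        if (v[i] < pivot)
        {
            ++store;
            std::swap(v[store], v[i]);
        }
    }
    std::swap(v[lo], v[store]);        // Put the pivot between the two groups
    return store;
}

// Sort v[lo..hi) with unoptimized Quicksort.
void quicksort(std::vector<int> & v, std::size_t lo, std::size_t hi)
{
    if (hi - lo < 2)                   // 0 or 1 items: nothing to do
        return;
    std::size_t p = partitionFirst(v, lo, hi);
    quicksort(v, lo, p);               // Items before the pivot
    quicksort(v, p + 1, hi);           // Items after the pivot
}
```

Calling quicksort(v, 0, v.size()) sorts the whole vector. The two recursive calls are where the extra memory mentioned above comes from, even though the Partition itself is in-place.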



Review
Comparison Sorts III Quicksort: Improvements

Unoptimized Quicksort is slow (quadratic time) on nearly sorted
data and uses a lot of space (linear) for recursion.
Three Improvements (we wrote the first two of these)
Median-of-three pivot selection.
 Improves performance on most nearly sorted data.
 Requires random-access data.
Tail-recursion elimination on larger recursive call.
 Reduces space usage to logarithmic.
Do not sort small sublists; finish with Insertion Sort.
 General speed up.
 May adversely affect cache hits.
With these optimizations, Quicksort is still O(n^2) time.
Example (median-of-three pivot):
Initial State: 2 12 9 10 3 1 6 (pivot: 6, the median of 2, 10, and 6)
After Partition: 2 1 3 6 10 9 12
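
As an illustration of the first improvement, here is one possible way to pick a median-of-three pivot. It is a sketch under assumptions: the function name medianOfThree is hypothetical, and "middle" is taken to be index lo + (hi - lo) / 2 of the range v[lo..hi).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Return the index of the median of the first, middle, and last items
// of v[lo..hi). Requires hi > lo. Needs random access to reach the
// middle and last items in constant time.
std::size_t medianOfThree(const std::vector<int> & v, std::size_t lo, std::size_t hi)
{
    std::size_t a = lo;
    std::size_t b = lo + (hi - lo) / 2;
    std::size_t c = hi - 1;
    // Reorder the three indices (not the items) so that v[a] <= v[b] <= v[c]
    if (v[b] < v[a]) std::swap(a, b);
    if (v[c] < v[b]) std::swap(b, c);
    if (v[b] < v[a]) std::swap(a, b);
    return b;
}
```

A Quicksort would typically swap the item at the returned index into the pivot position before doing the Partition. On sorted input the middle item is the median, which is why this choice helps on most nearly sorted data.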



Review
Comparison Sorts III Quicksort: Analysis

Efficiency 
Quicksort is O(n^2).
Quicksort has a very good O(n log n) average-case time.
Requirements on Data 
Non-trivial pivot-selection algorithms (median-of-three and others)
are only efficient for random-access data.
Space Usage 
Quicksort uses space for recursion.
 Additional space: O(log n), if you are clever about it.
 Even if all recursion is eliminated, O(log n) additional space is used.
 This additional space need not hold any data items.
Stability 
Efficient versions of Quicksort are not stable.
Performance on Nearly Sorted Data 
A non-optimized Quicksort is slow on nearly sorted data: O(n^2).
Median-of-three Quicksort is O(n log n) on most nearly sorted data.



Review
Comparison Sorts III Introsort: Description

In 1997, David Musser found out how to make Quicksort log-linear
time.
Keep track of the recursion depth.
If this exceeds some bound (recommended: 2 log_2 n), then switch to
Heap Sort for the current sublist.
 Heap Sort is a general-purpose comparison sort that is log-linear time
and in-place. We will discuss it in detail later in the semester.
Musser calls this technique introspection. Thus, introspective
sorting, or Introsort.
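
The depth-limit idea can be sketched as below. This is a simplified illustration, not Musser's code or any library's implementation: the names introsort and introsortRecurse are hypothetical, the pivot is simply the middle item rather than a median of three, the Partition is done with two calls to std::partition, and there is no final Insertion Sort pass. Heap Sort is done with std::make_heap and std::sort_heap.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Sort v[lo..hi). depthLeft is how many more levels of recursion are allowed.
void introsortRecurse(std::vector<int> & v, std::size_t lo, std::size_t hi, int depthLeft)
{
    if (hi - lo < 2)
        return;
    std::vector<int>::iterator first = v.begin() + lo;
    std::vector<int>::iterator last = v.begin() + hi;
    if (depthLeft == 0)
    {
        // Recursion is too deep: Heap Sort the current sublist instead
        std::make_heap(first, last);
        std::sort_heap(first, last);
        return;
    }
    int pivot = v[lo + (hi - lo) / 2];   // Middle item as pivot (simplification)
    // Split into three groups: [< pivot] [== pivot] [> pivot]
    std::vector<int>::iterator mid1 =
        std::partition(first, last, [pivot](int x) { return x < pivot; });
    std::vector<int>::iterator mid2 =
        std::partition(mid1, last, [pivot](int x) { return !(pivot < x); });
    introsortRecurse(v, lo, mid1 - v.begin(), depthLeft - 1);
    introsortRecurse(v, mid2 - v.begin(), hi, depthLeft - 1);
}

// Sort v, allowing recursion depth up to 2 * floor(log2(n)).
void introsort(std::vector<int> & v)
{
    if (v.size() < 2)
        return;
    int depthLimit = 2 * static_cast<int>(std::log2(static_cast<double>(v.size())));
    introsortRecurse(v, 0, v.size(), depthLimit);
}
```

Because every sublist that gets too deep is handed to a log-linear sort, the quadratic worst case of Quicksort cannot occur.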



Review
Comparison Sorts III Introsort: Diagram

Here is an illustration of how Introsort works.
In practice, the recursion will be deeper than this.
The Insertion-Sort call might not be done, due to its effect on cache hits.
[Diagram: Introsort calls Introsort-recurse and then Insertion Sort. Each
Introsort-recurse works like Mo3 Quicksort (find Mo3 pivot, Partition) and
makes two further Introsort-recurse calls. At the deepest level shown, the
calls are to Heap Sort instead.]
Notes on the diagram:
Tail recursion eliminated. (But it still counts toward the maximum
recursion depth.)
When the sublist to sort is very small, do not recurse. Insertion Sort will
finish the job later [??].
Recursion Depth Limit: When the recursion depth is too great, switch to
Heap Sort to sort the current sublist.
Now, the list is nearly sorted. Finish with a (linear time!) Insertion Sort [??].



Review
Comparison Sorts III Introsort: Analysis

Efficiency
Introsort is O(n log n).
Introsort also has an average-case time of O(n log n) [of course].
 Its average-case time is just as good as Quicksort.
Requirements on Data 
Introsort requires random-access data.
Space Usage 
Introsort uses space for recursion (or simulated recursion).
 Additional space: O(log n) even if all recursion is eliminated.
 This additional space need not hold any data items.
Stability 
Introsort is not stable.
Performance on Nearly Sorted Data 
Introsort is not significantly faster or slower on nearly sorted data.



Radix Sort
Background

We have looked in detail at six general-purpose comparison sorts.


Now we look at two sorting algorithms that do not use a
comparison function:
Pigeonhole Sort.
Radix Sort.
Later in the semester, we will look closer at Heap Sort, which is a
general-purpose comparison sort, but which can also be
conveniently modified to handle other situations.



Radix Sort
Preliminaries: Pigeonhole Sort Description

Suppose we have a list to sort, and:
Keys lie in a small fixed set of values.
Keys can be used to index an array.
 E.g., they might be small-ish nonnegative integers.
So this is not general-purpose, and not even a comparison sort.
Procedure
Make an array of empty lists (buckets), one for each possible key.
Iterate through the given list; insert each item at the end of the
bucket corresponding to its key.
Copy items in each bucket, in order, back to the original list.
Time efficiency: linear time, if written properly.
How is this possible? Answer: We are not doing general-purpose
comparison sorting. Our Ω(n log n) bound does not apply.
This algorithm is often called Pigeonhole Sort.
It is not very practical; it requires a very limited set of keys.
Pigeonhole Sort is stable, and uses linear additional space.



Radix Sort
Preliminaries: Pigeonhole Sort Write It

TO DO
Write a function to do Pigeonhole Sort.
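
One possible solution sketch follows; it is not necessarily the code written in class. It assumes each item is a (key, payload) pair whose key lies in the range 0 to numKeys-1. The type name Item and the function name pigeonholeSort are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// An item to sort: (key, payload). The key must lie in 0 .. numKeys-1.
typedef std::pair<std::size_t, std::string> Item;

// Pigeonhole Sort: stable, O(n + numKeys) time, linear additional space.
void pigeonholeSort(std::vector<Item> & items, std::size_t numKeys)
{
    // One bucket (an initially empty list) for each possible key
    std::vector<std::vector<Item>> buckets(numKeys);

    // Append each item to the bucket for its key.
    // Appending at the end is what keeps the sort stable.
    for (const Item & item : items)
    {
        assert(item.first < numKeys);
        buckets[item.first].push_back(item);
    }

    // Copy the items in each bucket, in order, back to the original list
    std::size_t pos = 0;
    for (const std::vector<Item> & bucket : buckets)
        for (const Item & item : bucket)
            items[pos++] = item;
}
```

No two items are ever compared with each other; the key is used only as an array index.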



Radix Sort
Description
Using Pigeonhole Sort, we can design a practical algorithm: Radix Sort.
Suppose we want to sort a list of strings (in some sense):
Character strings.
Numbers, considered as strings of digits.
Short-ish sequences of some other kind.
Call the entries in a string characters.
These need to be valid keys for Pigeonhole Sort.
In particular, we must be able to use them as array indices.
The algorithm will arrange the list in lexicographic order.
This means sort first by first character, then by second, etc.
For strings of letters, this is alphabetical order.
For positive integers (padded with leading zeroes), this is numerical order.
Radix Sort Procedure
Pigeonhole Sort the list using the last character as the key.
Take the list resulting from the previous step and Pigeonhole Sort it, using
the next-to-last character as the key.
Continue in the same way, moving one character toward the front on each pass.
After re-sorting by first character, the list is sorted.



Radix Sort
Example

Here is the list to be sorted.
583 508 183 90 223 236 924 4 426 106 624
We first sort them by the units digit, using Pigeonhole Sort.
(Nonempty buckets are separated by "|".)
90 | 583 183 223 | 924 4 624 | 236 426 106 | 508
Then Pigeonhole Sort again, based on the tens digit,
in a stable manner (note that the tens digit of 4 is 0).
4 106 508 | 223 924 624 426 | 236 | 583 183 | 90
Again, using the hundreds digit.
4 90 | 106 183 | 223 236 | 426 | 508 583 | 624 | 924
And now the list is sorted.
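
The passes shown above can be sketched in code as follows. This is a minimal illustration under assumptions, not the version written in class: the function name radixSort is hypothetical, the items are nonnegative ints treated as strings of decimal digits, and numDigits is at most 9 so that the divisor fits in a 32-bit int.

```cpp
#include <cstddef>
#include <vector>

// Radix Sort for nonnegative integers with at most numDigits decimal
// digits. Assumes numDigits <= 9 so that divisor does not overflow.
void radixSort(std::vector<int> & v, int numDigits)
{
    const int RADIX = 10;                  // One bucket per decimal digit
    int divisor = 1;                       // 1, 10, 100, ...: selects the digit
    for (int pass = 0; pass < numDigits; ++pass)
    {
        // Pigeonhole Sort, using the current digit as the key
        std::vector<std::vector<int>> buckets(RADIX);
        for (int x : v)
            buckets[(x / divisor) % RADIX].push_back(x);

        // Copy the buckets back in order; this keeps each pass stable
        std::size_t pos = 0;
        for (const std::vector<int> & bucket : buckets)
            for (int x : bucket)
                v[pos++] = x;

        divisor *= RADIX;                  // Move to the next digit leftward
    }
}
```

Calling radixSort(v, 3) on the list above performs the three passes shown. Stability of each pass is essential: it is what preserves the ordering established by the earlier, less significant digits.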



Radix Sort
Write It, Comments

TO DO
Write Radix Sort for small-ish positive integers.

Comments
Radix Sort makes very strong assumptions about the values
in the list to be sorted.
It requires linear additional space.
It is stable.
It does not perform especially well or badly on nearly sorted
data.
Of course, what we really care about is speed. See the next
slide.



Radix Sort
Efficiency [1/2]

How Fast is Radix Sort?


Fix the number of characters and the character set.
Then each sorting pass can be done in linear time.
 Pigeonhole Sort with one bucket for each possible character.
And there are a fixed number of passes.
Thus, Radix Sort is O(n): linear time.
How is this possible?
Radix Sort is a sorting algorithm. However, again, it is
neither general-purpose nor a comparison sort.
 It places restrictions on the values to be sorted: not general-purpose.
 It gets information about values in ways other than making a
comparison: not a comparison sort.
Thus, our argument showing that Ω(n log n) comparisons
were required in the worst case, does not apply.
Radix Sort
Efficiency [2/2]

In practice, Radix Sort is not really as fast as it might seem.


There is a hidden logarithm. The number of passes required is equal
to the length of a string, which is something like the logarithm of
the number of possible values.
If we consider Radix Sort applied to a list in which all the values
might be different, then it is in the same efficiency class as normal
sorting algorithms.
However, in certain special cases (e.g., big lists of small numbers)
Radix Sort can be a useful technique.

Example: 100 million records to sort by ZIP code? Radix Sort works well.
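
The hidden logarithm can be made explicit with a short calculation. The symbols here are notation introduced only for this sketch, not from the slides: d is the string length (the number of passes), b is the size of the character set, M is the number of possible values, and n is the number of items.

```latex
% d characters, each with b possible values, give M possible strings:
M = b^{d} \quad\Longrightarrow\quad d = \log_b M
% Each pass is a Pigeonhole Sort with b buckets, so it costs \Theta(n + b):
T(n) = \Theta\bigl(d\,(n + b)\bigr)
% If all n values may be different, we need M \ge n, so d \ge \log_b n:
T(n) = \Omega(n \log_b n)
% ZIP codes: b = 10,\ d = 5,\ n = 10^{8}.
% Five passes, compared with \log_2 10^{8} \approx 26.6
```

For a fixed character set, n log_b n is a constant multiple of n log n, which is the sense in which Radix Sort falls in the same efficiency class as normal sorting algorithms when all values may differ. With ZIP codes, d stays at 5 no matter how many records there are, which is why the special case works well.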



Sorting in the C++ STL
Specifying the Interface
Iterator-based sorting functions can be specified two ways:
Given a range
 last is actually just past the end, as usual.

template<typename Iterator>
void sortIt(Iterator first, Iterator last);

Given a range and a comparison.

template<typename Iterator, typename Ordering>
void sortIt(Iterator first, Iterator last, Ordering compare);

compare, above, should be something you can use to compare two
values.
compare(val1, val2) should be a legal expression, and should return a
bool: true if val1 comes before val2 (think less-than).
So compare can be a function (passed as a function pointer).
It can also be an object with operator() defined: a function object.
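
The two kinds of comparison can be illustrated as follows. This is a sketch under assumptions: the names descending and CloserTo are hypothetical, and sortIt here is only a stand-in with the second signature above that forwards to std::sort (so it needs random-access iterators); it is not code from the course.

```cpp
#include <algorithm>
#include <vector>

// Comparison as an ordinary function (passed as a function pointer).
// Returns true if a should come before b: larger values first.
bool descending(int a, int b)
{
    return b < a;
}

// Comparison as a function object: a class with operator() defined.
// Orders values by their distance from a target chosen at construction.
class CloserTo
{
public:
    explicit CloserTo(int target) : target_(target) {}

    bool operator()(int a, int b) const
    {
        return dist(a) < dist(b);
    }

private:
    int dist(int x) const
    {
        return x < target_ ? target_ - x : x - target_;
    }

    int target_;
};

// Stand-in for sortIt (assumption): forwards to std::sort.
template<typename Iterator, typename Ordering>
void sortIt(Iterator first, Iterator last, Ordering compare)
{
    std::sort(first, last, compare);
}
```

Given a std::vector<int> v, both sortIt(v.begin(), v.end(), descending) and sortIt(v.begin(), v.end(), CloserTo(4)) are legal calls. The function object has the advantage that it can carry data (here, the target) along with the comparison.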



Sorting in the C++ STL
Overview of the Algorithms [1/4]

The C++ Standard Template Library has six sorting algorithms:


Global function std::sort
Global function std::stable_sort
Member function std::list<T>::sort
Global functions std::partial_sort and std::partial_sort_copy.
Combination of two global functions: std::make_heap &
std::sort_heap
We now look briefly at each of these.



Sorting in the C++ STL
Overview of the Algorithms [2/4]

Function std::sort, in <algorithm>


Global function.
Takes two random-access iterators and an optional comparison.
O(n^2), but has O(n log n) average-case. Not stable.
 This should become O(n log n) in the forthcoming revised C++ standard.
 It is currently O(n log n) in good STL implementations.
Algorithm used:
 Quicksort is what the standards committee was thinking.
 Introsort is what good implementations now use.
 Other algorithms (Heap Sort?) are possible, but unlikely.
Function std::stable_sort, in <algorithm>
Global function.
Takes two random-access iterators and an optional comparison.
O(n log n). Stable.
Algorithm used: probably Merge Sort, general sequence version.
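
The difference in stability can be seen when the comparison looks at only part of each item. The sketch below is illustrative only; Record, keyLess, sortByKeyStable, and sortByKey are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<int, std::string> Record;   // (key, name)

// Compare by key only, so records with equal keys count as equivalent
bool keyLess(const Record & a, const Record & b)
{
    return a.first < b.first;
}

// Stable: records with equal keys keep their original relative order
void sortByKeyStable(std::vector<Record> & recs)
{
    std::stable_sort(recs.begin(), recs.end(), keyLess);
}

// Not guaranteed stable: records with equal keys may end up in any order
void sortByKey(std::vector<Record> & recs)
{
    std::sort(recs.begin(), recs.end(), keyLess);
}
```

With the records (2,a) (1,b) (2,c) (1,d), sortByKeyStable always produces (1,b) (1,d) (2,a) (2,c). sortByKey guarantees only that the keys end up in the order 1 1 2 2.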



Sorting in the C++ STL
Overview of the Algorithms [3/4]

Function std::list<T>::sort, in <list>


Member function. Sorts only objects of type std::list<T>.
Takes either no parameters or a comparison.
O(n log n). Stable.
Algorithm used: probably Merge Sort, Linked-List version.



Sorting in the C++ STL
Overview of the Algorithms [4/4]

We will look at the last two STL algorithms in more detail later in
the semester, when we cover Priority Queues and Heaps:
Functions std::partial_sort and std::partial_sort_copy, in
<algorithm>
 Global functions.
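 std::partial_sort takes three random-access iterators and an optional comparison.
 std::partial_sort_copy takes two input iterators, two random-access iterators, and an optional comparison.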
 O(n log n). Not stable.
 Solve a more general problem than comparison sorting.
 Algorithm used: probably Heap Sort.
Combination: std::make_heap & std::sort_heap, in <algorithm>
 Both Global functions.
 Both take two random-access iterators and an optional comparison.
 Combination is O(n log n). Not stable.
 Solves a more general problem than comparison sorting.
 Algorithm used: almost certainly Heap Sort.
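
As a brief preview, these functions might be used as follows. This is a sketch only; the wrapper names smallestFirst and heapSort are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Put the k smallest items of v, in ascending order, at the front of v.
// The remaining items follow in unspecified order. Requires k <= v.size().
void smallestFirst(std::vector<int> & v, std::size_t k)
{
    // The three iterators: start of range, end of sorted part, end of range
    std::partial_sort(v.begin(), v.begin() + k, v.end());
}

// Sort v using the two STL heap algorithms in combination.
void heapSort(std::vector<int> & v)
{
    std::make_heap(v.begin(), v.end());   // Rearrange the range into a heap
    std::sort_heap(v.begin(), v.end());   // Turn the heap into a sorted range
}
```

Passing k equal to v.size() makes smallestFirst an ordinary sort, which is one sense in which std::partial_sort solves a more general problem than sorting.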



Sorting in the C++ STL
Using the Algorithms [1/2]

Algorithm std::sort is declared in the header <algorithm>.
Call it with two iterators:

#include <algorithm>   // for std::sort
#include <functional>  // for std::greater
#include <vector>      // for std::vector

std::vector<int> v;
std::sort(v.begin(), v.end());  // Ascending order

Or use two iterators and a comparison:

// std::greater<int>() is a default constructor call.
// We can only pass an object, not a type.
std::sort(v.begin(), v.end(), std::greater<int>());  // Descending order

Class template std::greater is defined in <functional>.


Use std::stable_sort similarly to std::sort.



Sorting in the C++ STL
Using the Algorithms [2/2]

When sorting a std::list, use the sort member function:

#include <functional>  // for std::greater
#include <list>        // for std::list

std::list<int> myList;
myList.sort(); // Ascending order
myList.sort(std::greater<int>()); // Descending order
