Sorting algorithm
From Wikipedia, the free encyclopedia
Since the dawn of computing, the sorting problem has attracted a great deal of research,
perhaps due to the complexity of solving it efficiently despite its simple, familiar
statement. For example, bubble sort was analyzed as early as 1956.[1] Although many
consider it a solved problem, useful new sorting algorithms are still being invented (for
example, library sort was first published in 2004). Sorting algorithms are prevalent in
introductory computer science classes, where the abundance of algorithms for the
problem provides a gentle introduction to a variety of core algorithm concepts, such as
big O notation, divide and conquer algorithms, data structures, randomized algorithms,
best, worst and average case analysis, time-space tradeoffs, and lower bounds.
Contents
1 Classification
o 1.1 Stability
2 Comparison of algorithms
3 Inefficient/humorous sorts
4 Summaries of popular sorting algorithms
o 4.1 Bubble sort
o 4.2 Insertion sort
o 4.3 Shell sort
o 4.4 Merge sort
o 4.5 Heapsort
o 4.6 Quicksort
o 4.7 Counting sort
o 4.8 Bucket sort
o 4.9 Radix sort
o 4.10 Distribution sort
5 Memory usage patterns and index sorting
6 See also
7 References
8 External links
Classification
Sorting algorithms used in computer science are often classified by computational complexity (worst, average and best case behavior), memory usage, recursion, stability, whether or not they are comparison sorts, and their general method (insertion, exchange, selection, merging, and so on).
Stability
Stable sorting algorithms maintain the relative order of records with equal keys. If all
keys are different then this distinction is not necessary. But if there are equal keys, then a
sorting algorithm is stable if whenever there are two records (let's say R and S) with the
same key, and R appears before S in the original list, then R will always appear before S
in the sorted list. When equal elements are indistinguishable, such as with integers, or
more generally, any data where the entire element is the key, stability is not an issue.
However, assume that the following pairs of numbers are to be sorted by their first
component:
(4, 2) (3, 7) (3, 1) (5, 6)
In this case, two different results are possible, one which maintains the relative order of records with equal keys, and one which does not:

(3, 7) (3, 1) (4, 2) (5, 6) (order maintained)
(3, 1) (3, 7) (4, 2) (5, 6) (order changed)
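As a concrete illustration, Python's built-in sort is stable, so sorting the pairs above by their first component alone preserves the original order of the two records with key 3 (a minimal sketch; the variable names are illustrative):

    # Python's built-in sort (timsort) is stable: records with equal
    # keys keep their original relative order.
    pairs = [(4, 2), (3, 7), (3, 1), (5, 6)]
    pairs.sort(key=lambda p: p[0])  # compare only the first component
    print(pairs)  # [(3, 7), (3, 1), (4, 2), (5, 6)] -- (3, 7) still precedes (3, 1)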
Unstable sorting algorithms may change the relative order of records with equal keys, but
stable sorting algorithms never do so. Unstable sorting algorithms can be specially
implemented to be stable. One way of doing this is to artificially extend the key
comparison, so that comparisons between two objects with otherwise equal keys are
decided using the order of the entries in the original data order as a tie-breaker.
Remembering this order, however, often involves an additional computational cost.
Sorting based on a primary, secondary, tertiary, etc. sort key can be done by any sorting
method, taking all sort keys into account in comparisons (in other words, using a single
composite sort key). If a sorting method is stable, it is also possible to sort multiple times,
each time with one sort key. In that case the keys need to be applied in order of increasing
priority.
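For instance, with a stable sort the multi-key approach can be written as two successive single-key sorts; a minimal Python sketch, using a hypothetical record layout:

    from operator import itemgetter

    # Hypothetical records: (name, department, salary).
    records = [
        ("Avery", "Sales", 50000),
        ("Blake", "Engineering", 70000),
        ("Casey", "Sales", 60000),
        ("Drew", "Engineering", 65000),
    ]

    # Apply one stable sort per key, in increasing order of priority:
    # the secondary key first, the primary key last.
    records.sort(key=itemgetter(2))  # secondary key: salary
    records.sort(key=itemgetter(1))  # primary key: department
    # Within each department, the earlier salary order is preserved.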
Comparison of algorithms

The following table describes sorting algorithms that are not comparison sorts. As such, they are not limited by the Ω(n log n) lower bound that applies to comparison sorting. Complexities below are in terms of n, the number of items to be sorted, k, the size of each key, and s, the chunk size used by the implementation. Many of them are based on the assumption that the key size is large enough that all entries have unique key values, and hence that n ≪ 2^k, where ≪ means "much less than."
Name            | Average      | Worst              | Memory          | Stable | n ≪ 2^k | Notes
Pigeonhole sort | O(n + 2^k)   | O(n + 2^k)         | O(2^k)          | Yes    | Yes     |
Bucket sort     | O(n · k)     | O(n² · k)          | O(n · k)        | Yes    | No      | Assumes uniform distribution of elements from the domain in the array.
Counting sort   | O(n + 2^k)   | O(n + 2^k)         | O(n + 2^k)      | Yes    | Yes     |
LSD Radix sort  | O(n · k/s)   | O(n · k/s)         | O(n)            | Yes    | No      |
MSD Radix sort  | O(n · k/s)   | O(n · (k/s) · 2^s) | O((k/s) · 2^s)  | No     | No      |
Spreadsort      | —            | —                  | —               | No     | No      | Asymptotics are based on the assumption that n ≪ 2^k, but the algorithm does not require this.
Some sorting algorithms are impractical for real-life use due to extremely poor performance or a requirement for specialized hardware. Additionally, theoretical computer scientists have detailed other sorting algorithms that provide better than O(n log n) time complexity assuming additional constraints, including:

- Han's algorithm, which sorts deterministically in O(n log log n) time and linear space.[2]
- Thorup's randomized algorithm, which also runs in O(n log log n) time and linear space using addition, shift, and bit-wise Boolean operations.[3]
Summaries of popular sorting algorithms

Bubble sort
A bubble sort, a sorting algorithm that continuously steps through a list, swapping items until they appear in the correct order.
Main article: Bubble sort
Bubble sort is a straightforward and simplistic method of sorting data that is used in
computer science education. The algorithm starts at the beginning of the data set. It
compares the first two elements, and if the first is greater than the second, it swaps them.
It continues doing this for each pair of adjacent elements to the end of the data set. It then
starts again with the first two elements, repeating until no swaps have occurred on the last
pass. This algorithm is highly inefficient, and is rarely used[citation needed], except as a
simplistic example. For example, sorting 100 elements naively requires on the order of 100 × 100 = 10,000 comparisons. A slightly better variant, cocktail sort, works by inverting the ordering criteria and the pass direction on alternating passes. A further refinement shortens each pass by one element, since each pass fixes the final position of at least one element, so the total number of comparisons for 100 elements drops to 99 + 98 + ... + 1 = 4,950.
Bubble sort's average case and worst case are both O(n²).
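A minimal Python sketch of the refined variant just described, which shortens each pass by one element and stops early once a pass makes no swaps:

    def bubble_sort(a):
        """Sort the list a in place."""
        for end in range(len(a) - 1, 0, -1):  # each pass is one shorter
            swapped = False
            for i in range(end):
                if a[i] > a[i + 1]:           # adjacent pair out of order
                    a[i], a[i + 1] = a[i + 1], a[i]
                    swapped = True
            if not swapped:                   # no swaps: already sorted
                break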
Insertion sort
Insertion sort is a simple sorting algorithm that is relatively efficient for small lists and
mostly-sorted lists, and often is used as part of more sophisticated algorithms. It works by
taking elements from the list one by one and inserting them in their correct position into a
new sorted list. In arrays, the new list and the remaining elements can share the array's
space, but insertion is expensive, requiring shifting all following elements over by one.
Shell sort (see below) is a variant of insertion sort that is more efficient for larger lists.
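A minimal in-place Python sketch, using the shared-array arrangement described above:

    def insertion_sort(a):
        """Sort the list a in place."""
        for j in range(1, len(a)):
            key = a[j]                  # next element to insert
            i = j - 1
            while i >= 0 and a[i] > key:
                a[i + 1] = a[i]         # shift larger elements right
                i -= 1
            a[i + 1] = key              # drop key into its position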
Shell sort
Shell sort was invented by Donald Shell in 1959. It improves upon bubble sort and
insertion sort by moving out of order elements more than one position at a time. One
implementation can be described as arranging the data sequence in a two-dimensional
array and then sorting the columns of the array using insertion sort. Although this method
is inefficient for large data sets, it is one of the fastest algorithms for sorting small
numbers of elements.
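One common formulation is a gapped insertion sort; the sketch below uses the simple halving gap sequence (Shell's original), though better gap sequences exist:

    def shell_sort(a):
        """Sort the list a in place using gapped insertion sort."""
        gap = len(a) // 2
        while gap > 0:
            for j in range(gap, len(a)):
                key = a[j]
                i = j
                # Insertion sort over elements gap positions apart,
                # moving out-of-order elements many positions at a time.
                while i >= gap and a[i - gap] > key:
                    a[i] = a[i - gap]
                    i -= gap
                a[i] = key
            gap //= 2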
Merge sort
Merge sort takes advantage of the ease of merging already sorted lists into a new sorted
list. It starts by comparing every two elements (i.e., 1 with 2, then 3 with 4...) and
swapping them if the first should come after the second. It then merges each of the
resulting lists of two into lists of four, then merges those lists of four, and so on; until at
last two lists are merged into the final sorted list. Of the algorithms described here, this is
the first that scales well to very large lists, because its worst-case running time is O(n log
n). Merge sort has seen a relatively recent surge in popularity for practical
implementations, being used for the standard sort routine in the programming languages
Perl[5], Python (as timsort[6]), and Java (also uses timsort as of JDK7[7]), among others.
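A minimal Python sketch of the idea; for brevity it is written top-down (recursive halving) rather than with the bottom-up pass structure described above:

    def merge_sort(a):
        """Return a new sorted list."""
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
        # Merge the two sorted halves; <= keeps the sort stable.
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        return merged + left[i:] + right[j:]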
Heapsort
Heapsort is a much more efficient version of selection sort. It also works by determining
the largest (or smallest) element of the list, placing that at the end (or beginning) of the
list, then continuing with the rest of the list, but accomplishes this task efficiently by
using a data structure called a heap, a special type of binary tree. Once the data list has
been made into a heap, the root node is guaranteed to be the largest (or smallest) element.
When it is removed and placed at the end of the list, the heap is rearranged so the largest
element remaining moves to the root. Using the heap, finding the next largest element
takes O(log n) time, instead of O(n) for a linear scan as in simple selection sort. This
allows Heapsort to run in O(n log n) time.
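A minimal sketch using Python's standard-library heap (a min-heap, so elements come out smallest first); production heapsorts instead rearrange the array in place:

    import heapq

    def heap_sort(a):
        """Return a new sorted list via a binary heap (min-heap variant)."""
        heapq.heapify(a)                  # O(n) heap construction; a is consumed
        out = []
        while a:
            out.append(heapq.heappop(a))  # next-smallest element in O(log n)
        return out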
Quicksort
Quicksort is a divide and conquer algorithm that relies on a partition operation: a pivot element is chosen, and the remaining elements are rearranged so that all elements less than the pivot come before it and all greater elements come after it. The two resulting sublists are then sorted recursively. With reasonable pivot choices quicksort runs in O(n log n) time on average, although consistently poor pivots degrade it to O(n²).
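A minimal list-based Python sketch of the scheme (practical implementations partition in place; the random pivot here simply makes the O(n²) worst case unlikely):

    import random

    def quicksort(a):
        """Return a new sorted list."""
        if len(a) <= 1:
            return a
        pivot = random.choice(a)          # partition around a random pivot
        less = [x for x in a if x < pivot]
        equal = [x for x in a if x == pivot]
        greater = [x for x in a if x > pivot]
        return quicksort(less) + equal + quicksort(greater)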
Counting sort
Counting sort is applicable when each input is known to belong to a particular set, S, of
possibilities. The algorithm runs in O(|S| + n) time and O(|S|) memory where n is the
length of the input. It works by creating an integer array of size |S| and using the ith bin to
count the occurrences of the ith member of S in the input. Each input is then counted by
incrementing the value of its corresponding bin. Afterward, the counting array is looped
through to arrange all of the inputs in order. This sorting algorithm often cannot be used
because S needs to be reasonably small for it to be efficient, but the algorithm is
extremely fast and demonstrates great asymptotic behavior as n increases. It also can be
modified to provide stable behavior.
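A minimal Python sketch for the common case where S is the set of integers 0..size-1 (the parameter name is illustrative):

    def counting_sort(a, size):
        """Sort integers drawn from range(size) in O(size + n) time."""
        counts = [0] * size
        for x in a:                    # bin i counts occurrences of value i
            counts[x] += 1
        out = []
        for value, c in enumerate(counts):
            out.extend([value] * c)    # emit each value count times
        return out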
Bucket sort
Bucket sort is a divide and conquer sorting algorithm that generalizes counting sort by
partitioning an array into a finite number of buckets. Each bucket is then sorted
individually, either using a different sorting algorithm, or by recursively applying the
bucket sorting algorithm. Thus this is most effective on data whose values are limited
(e.g. a sort of a million integers ranging from 1 to 1000). A variation of this method, called the single buffered count sort, is reported to be faster than quicksort and to take about the same time on any distribution of input data.
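A minimal Python sketch, assuming keys are floats uniformly distributed in [0, 1) so that the bucket index is just the scaled key:

    def bucket_sort(a, num_buckets=10):
        """Sort floats in [0, 1) by scattering into buckets."""
        buckets = [[] for _ in range(num_buckets)]
        for x in a:
            buckets[int(x * num_buckets)].append(x)  # scatter into buckets
        out = []
        for b in buckets:
            out.extend(sorted(b))      # sort each bucket individually
        return out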
Radix sort
Radix sort is an algorithm that sorts a list of fixed-size numbers of length k in O(n · k)
time by treating them as bit strings. We first sort the list by the least significant bit while
preserving their relative order using a stable sort. Then we sort them by the next bit, and
so on from right to left, and the list will end up sorted. Most often, the counting sort algorithm is used to accomplish the bitwise sorting, since a bit can take only two values: '0' or '1'.
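A minimal Python sketch of LSD radix sort on non-negative integers, processing one bit per pass (chunk size s = 1); the stable two-way split plays the role of the counting sort:

    def radix_sort(a, key_bits):
        """LSD radix sort of non-negative integers of at most key_bits bits."""
        for bit in range(key_bits):    # least significant bit first
            # Stable two-bucket split on the current bit.
            zeros = [x for x in a if not (x >> bit) & 1]
            ones = [x for x in a if (x >> bit) & 1]
            a = zeros + ones
        return a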
Distribution sort
Distribution sort refers to any sorting algorithm where data is distributed from its input to
multiple intermediate structures which are then gathered and placed on the output. See
Bucket sort.
Memory usage patterns and index sorting
When the data set to be sorted approaches or exceeds the available main memory, so that much slower disk or swap space must be used, the memory usage pattern of a sorting algorithm becomes important. For example, the popular recursive quicksort algorithm provides quite reasonable
performance with adequate RAM, but due to the recursive way that it copies portions of
the array it becomes much less practical when the array does not fit in RAM, because it
may cause a number of slow copy or move operations to and from disk. In that scenario,
another algorithm may be preferable even if it requires more total comparisons.
One way to work around this problem, which works well when complex records (such as
in a relational database) are being sorted by a relatively small key field, is to create an
index into the array and then sort the index, rather than the entire array. (A sorted version
of the entire array can then be produced with one pass, reading from the index, but often
even that is unnecessary, as having the sorted index is adequate.) Because the index is
much smaller than the entire array, it may fit easily in memory where the entire array
would not, effectively eliminating the disk-swapping problem. This procedure is
sometimes called "tag sort".[8]
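A minimal Python sketch of the idea, sorting an index of record positions by a small key field rather than moving the records themselves (the record layout is hypothetical):

    # Hypothetical large records keyed by a small field.
    records = [
        {"key": 42, "payload": "..."},
        {"key": 7, "payload": "..."},
        {"key": 19, "payload": "..."},
    ]

    # Sort positions, not records: only the small index moves in memory.
    index = sorted(range(len(records)), key=lambda i: records[i]["key"])

    # One pass over the index yields the records in sorted order.
    ordered = [records[i] for i in index]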
Techniques can also be combined. For sorting very large sets of data that vastly exceed
system memory, even the index may need to be sorted using an algorithm or combination
of algorithms designed to perform reasonably with virtual memory, i.e., to reduce the
amount of swapping required.
References
1. Demuth, H. Electronic Data Sorting. PhD thesis, Stanford University, 1956.
2. Y. Han. Deterministic sorting in O(n log log n) time and linear space. Proceedings of the thirty-fourth annual ACM symposium on Theory of Computing, Montreal, Quebec, Canada, 2002, pp. 602-608.
3. M. Thorup. Randomized sorting in O(n log log n) time and linear space using addition, shift, and bit-wise Boolean operations. Journal of Algorithms, Volume 42, Number 2, February 2002, pp. 205-230.