History of Bucket Sort
Bucket sort, or bin sort, is a sorting algorithm that works by distributing the elements of
an array into a number of buckets. Each bucket is then sorted individually, either using a different
sorting algorithm, or by recursively applying the bucket sorting algorithm. It is a distribution sort, a
generalization of pigeonhole sort, and is a cousin of radix sort in the most-to-least significant digit
flavor. Bucket sort can be implemented with comparisons and therefore can also be considered
a comparison sort algorithm. The computational complexity depends on the algorithm used to sort
each bucket, the number of buckets to use, and whether the input is uniformly distributed.
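To make the scatter-sort-gather structure concrete, here is a minimal Python sketch for floats assumed to lie in [0, 1); the function name, the choice of ten buckets, and the use of the built-in sort per bucket are illustrative choices, not prescribed by the text.

```python
def bucket_sort(values, num_buckets=10):
    """Sort floats in [0, 1) by scattering into buckets, sorting each
    bucket, then concatenating. Assumes a roughly uniform distribution."""
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        # Map each value to a bucket index proportional to its magnitude.
        buckets[int(v * num_buckets)].append(v)
    result = []
    for b in buckets:
        b.sort()  # any per-bucket sort works; recursing is also possible
        result.extend(b)
    return result

print(bucket_sort([0.42, 0.32, 0.33, 0.52, 0.37, 0.47, 0.51]))
```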
Optimizations
A common optimization is to put the unsorted elements of the buckets back in the original array first,
then run insertion sort over the complete array; because insertion sort's runtime is based on how far
each element is from its final position, the number of comparisons remains relatively small, and the
memory hierarchy is better exploited by storing the list contiguously in memory.[2]
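A sketch of this optimization under the same assumptions as above (floats in [0, 1), illustrative names): elements are scattered into buckets, written back to the array still unsorted within each bucket, and a single insertion sort pass over the contiguous array finishes the job.

```python
def bucket_sort_insertion(arr, num_buckets=10):
    """Scatter into buckets, copy back contiguously, then run one
    insertion sort over the whole array; elements are already near
    their final positions, so the insertion sort does little work."""
    buckets = [[] for _ in range(num_buckets)]
    for v in arr:
        buckets[int(v * num_buckets)].append(v)
    # Write the (per-bucket unsorted) elements back into the array.
    i = 0
    for b in buckets:
        for v in b:
            arr[i] = v
            i += 1
    # Standard insertion sort over the contiguous array.
    for j in range(1, len(arr)):
        key = arr[j]
        k = j - 1
        while k >= 0 and arr[k] > key:
            arr[k + 1] = arr[k]
            k -= 1
        arr[k + 1] = key
    return arr
```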
Counting Sort
In computer science, counting sort is an algorithm for sorting a collection of objects according to
keys that are small integers; that is, it is an integer sorting algorithm. It operates by counting the
number of objects that have each distinct key value, and using arithmetic on those counts to
determine the positions of each key value in the output sequence. Its running time is linear in the
number of items and the difference between the maximum and minimum key values, so it is only
suitable for direct use in situations where the variation in keys is not significantly greater than the
number of items. However, it is often used as a subroutine in another sorting algorithm, radix sort,
that can handle larger keys more efficiently.[1][2][3]
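As a sketch of the mechanism just described, here is a minimal counting sort in Python for small non-negative integer keys (the names counting_sort and max_key are illustrative); the prefix-sum step that converts counts into output positions is the arithmetic the text refers to.

```python
def counting_sort(keys, max_key):
    """Sort non-negative integers in [0, max_key] without comparisons.
    Runs in O(n + k) time, where k = max_key + 1 distinct key values."""
    counts = [0] * (max_key + 1)
    for key in keys:
        counts[key] += 1                # histogram of key occurrences
    for k in range(1, max_key + 1):
        counts[k] += counts[k - 1]      # prefix sums: end position of each key
    output = [0] * len(keys)
    for key in reversed(keys):          # reverse scan keeps the sort stable
        counts[key] -= 1
        output[counts[key]] = key
    return output

print(counting_sort([4, 1, 3, 4, 3], max_key=4))   # [1, 3, 3, 4, 4]
```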
Because counting sort uses key values as indexes into an array, it is not a comparison sort, and
the Ω(n log n) lower bound for comparison sorting does not apply to it.[1] Bucket sort may be used for
many of the same tasks as counting sort, with a similar time analysis; however, compared to
counting sort, bucket sort requires linked lists, dynamic arrays or a large amount of preallocated
memory to hold the sets of items within each bucket, whereas counting sort instead stores a single
number (the count of items) per bucket.
History
Although radix sorting itself dates back far longer, counting sort, and its application to radix sorting,
were both invented by Harold H. Seward in 1954.[1][4][8]
Radix Sort
History
Radix sort dates back as far as 1887 to the work of Herman Hollerith on tabulating machines.[1] Radix
sorting algorithms came into common use as a way to sort punched cards as early as 1923.[2]
The first memory-efficient computer algorithm for radix sorting was developed in 1954 at MIT by Harold H. Seward.
Computerized radix sorts had previously been dismissed as impractical because of the perceived
need for variable allocation of buckets of unknown size. Seward's innovation was to use a linear
scan to determine the required bucket sizes and offsets beforehand, allowing for a single static
allocation of auxiliary memory. The linear scan is closely related to Seward's other algorithm
— counting sort.
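The following Python sketch shows the idea behind that counting pass in a least-significant-digit radix sort for non-negative integers; it is an illustration of the technique, not Seward's original implementation, and the base of 10 is an arbitrary choice. Each pass histograms one digit, turns the counts into bucket offsets via a prefix sum, and scatters elements into a single preallocated auxiliary array.

```python
def radix_sort_lsd(nums, base=10):
    """LSD radix sort for non-negative integers. Each pass uses a
    counting scan to size the buckets, so one statically allocated
    auxiliary array suffices instead of per-bucket dynamic storage."""
    if not nums:
        return nums
    exp = 1
    while max(nums) // exp > 0:
        counts = [0] * base
        for n in nums:
            counts[(n // exp) % base] += 1       # bucket sizes for this digit
        offsets = [0] * base
        for d in range(1, base):
            offsets[d] = offsets[d - 1] + counts[d - 1]  # start offset per bucket
        output = [0] * len(nums)                 # single static allocation
        for n in nums:
            d = (n // exp) % base
            output[offsets[d]] = n
            offsets[d] += 1                      # stable within each bucket
        nums = output
        exp *= base
    return nums

print(radix_sort_lsd([170, 45, 75, 90, 802, 24, 2, 66]))
```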
In the modern era, radix sorts are most commonly applied to collections of binary strings and integers. They have been shown in some benchmarks to be faster than other, more general-purpose sorting algorithms, sometimes 50% to three times as fast.[3][4][5]
Digit Order
Radix sorts can be implemented to start at either the most significant digit (MSD) or least significant
digit (LSD). For example, with 1234, one could start with 1 (MSD) or 4 (LSD).
Quicksort
Quicksort is a comparison sort, meaning that it can sort items of any type for which a "less-than"
relation (formally, a total order) is defined. Efficient implementations of quicksort are not a stable
sort, meaning that the relative order of equal sort items is not preserved. Quicksort can operate in-
place on an array, requiring small additional amounts of memory to perform the sorting. It can be
viewed as a generalization of selection sort: always selecting the minimum element amounts to
partitioning around the worst-case pivot, whereas quicksort's pivot choice usually produces more
balanced partitions.
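A minimal in-place quicksort sketch in Python, using the simple Lomuto partition scheme with a last-element pivot for readability; production implementations typically use better pivot selection (such as median-of-three) and Hoare's original partition scheme.

```python
def quicksort(arr, lo=0, hi=None):
    """In-place quicksort using Lomuto partitioning. Average O(n log n);
    worst case O(n^2) when partitions are maximally unbalanced."""
    if hi is None:
        hi = len(arr) - 1
    if lo >= hi:
        return
    pivot = arr[hi]                      # last element as pivot (illustrative)
    i = lo
    for j in range(lo, hi):
        if arr[j] < pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[hi] = arr[hi], arr[i]    # place pivot in its final position
    quicksort(arr, lo, i - 1)            # sort elements below the pivot
    quicksort(arr, i + 1, hi)            # sort elements above the pivot

data = [5, 2, 9, 1, 7, 3]
quicksort(data)
print(data)                              # [1, 2, 3, 5, 7, 9]
```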
The quicksort algorithm was developed in 1959 by Tony Hoare while in the Soviet Union, as a
visiting student at Moscow State University. At that time, Hoare worked on a project on machine
translation for the National Physical Laboratory. As a part of the translation process, he needed to
sort the words in Russian sentences prior to looking them up in a Russian-English dictionary that
was already sorted in alphabetic order on magnetic tape.[4] After recognizing that his first
idea, insertion sort, would be slow, he came up with a new idea: Quicksort. He wrote
a program in Mercury Autocode for the partition but could not write the program to account for the list
of unsorted segments. On returning to England, he was asked to write code for Shellsort as part of his
new job. Hoare mentioned to his boss that he knew of a faster algorithm, and his boss bet sixpence
that he did not. His boss ultimately accepted that he had lost the bet. Later, Hoare learned
about ALGOL and its ability to do recursion, which enabled him to publish the code in Communications
of the Association for Computing Machinery, the premier computer science journal of the time.[2][5]
Quicksort is a space-optimized version of the binary tree sort. Instead of inserting items sequentially
into an explicit tree, quicksort organizes them concurrently into a tree that is implied by the recursive
calls.
The most direct competitor of quicksort is heapsort. Heapsort's running time is O(n log n), but
heapsort's average running time is usually considered slower than in-place quicksort. This result is
debatable; some publications indicate the opposite.[28][29] Introsort is a variant of quicksort that
switches to heapsort when a bad case is detected to avoid quicksort's worst-case running time.
Quicksort also competes with merge sort, another O(n log n) sorting algorithm. Mergesort is a stable
sort, unlike standard in-place quicksort and heapsort, and can be easily adapted to operate on linked
lists and very large lists stored on slow-to-access media such as disk storage or network-attached
storage.
Bucket sort with two buckets is very similar to quicksort; the pivot in this case is effectively the value
in the middle of the value range, which does well on average for uniformly distributed inputs.
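As an illustration of that correspondence, here is a hedged sketch in Python (the function name is hypothetical): splitting around the midpoint of the value range mirrors a quicksort partition whose pivot is the range midpoint rather than an element of the array.

```python
def two_bucket_sort(values):
    """Recursively split values around the midpoint of their value range,
    mirroring a quicksort whose pivot is the range midpoint. Works well
    on average for uniformly distributed inputs."""
    if len(values) <= 1:
        return values
    lo, hi = min(values), max(values)
    if lo == hi:                          # all values equal: nothing to split
        return values
    mid = (lo + hi) / 2                   # the implicit "pivot"
    low = [v for v in values if v <= mid]
    high = [v for v in values if v > mid]
    return two_bucket_sort(low) + two_bucket_sort(high)

print(two_bucket_sort([0.42, 0.87, 0.13, 0.55, 0.30]))
```

Because lo < hi guarantees that both buckets are non-empty, each recursive call strictly shrinks the input, so the recursion terminates even with duplicate values.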
Generalization
Richard Cole and David C. Kandathil, in 2004, discovered a one-parameter family of sorting
algorithms, called partition sorts, which on average (with all input orderings equally likely) perform at
most n log n + O(n) comparisons (close to the information theoretic lower bound) and Θ(n log n)
operations; at worst they perform Θ(n log² n) comparisons (and also operations); these are in-place,
requiring only O(log n) additional space. Practical efficiency and smaller variance in performance
were demonstrated against optimised quicksorts (of Sedgewick and Bentley-McIlroy).[34]