History of Bucket Sort

Bucket Sort

Bucket sort, or bin sort, is a sorting algorithm that works by distributing the elements of
an array into a number of buckets. Each bucket is then sorted individually, either using a different
sorting algorithm, or by recursively applying the bucket sorting algorithm. It is a distribution sort, a
generalization of pigeonhole sort, and is a cousin of radix sort in the most-to-least significant digit
flavor. Bucket sort can be implemented with comparisons and therefore can also be considered
a comparison sort algorithm. The computational complexity depends on the algorithm used to sort
each bucket, the number of buckets to use, and whether the input is uniformly distributed.
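
As an illustration, the following is a minimal sketch in Python, assuming keys are floating-point values uniformly distributed in [0, 1); the bucket count and the use of a built-in sort per bucket are illustrative choices, not part of the algorithm's definition:

    def bucket_sort(arr, num_buckets=10):
        # Distribute elements into buckets by scaled value.
        buckets = [[] for _ in range(num_buckets)]
        for x in arr:
            index = min(int(x * num_buckets), num_buckets - 1)
            buckets[index].append(x)
        # Sort each bucket individually, then concatenate.
        result = []
        for b in buckets:
            result.extend(sorted(b))
        return result

For example, bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72]) returns [0.17, 0.26, 0.39, 0.72, 0.78].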

Optimizations
A common optimization is to put the unsorted elements of the buckets back in the original array first,
then run insertion sort over the complete array; because insertion sort's runtime is based on how far
each element is from its final position, the number of comparisons remains relatively small, and the
memory hierarchy is better exploited by storing the list contiguously in memory.[2]
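
A sketch of this variant, under the same uniform-distribution assumption as above (the scatter step and bucket count are again illustrative):

    def bucket_sort_with_insertion(arr, num_buckets=10):
        buckets = [[] for _ in range(num_buckets)]
        for x in arr:
            buckets[min(int(x * num_buckets), num_buckets - 1)].append(x)
        # Put the unsorted bucket contents back into the original array.
        i = 0
        for b in buckets:
            for x in b:
                arr[i] = x
                i += 1
        # One insertion sort pass over the nearly sorted array; cheap
        # because each element is already close to its final position.
        for j in range(1, len(arr)):
            key = arr[j]
            k = j - 1
            while k >= 0 and arr[k] > key:
                arr[k + 1] = arr[k]
                k -= 1
            arr[k + 1] = key
        return arr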

Comparison with other sorting algorithms


Bucket sort can be seen as a generalization of counting sort; in fact, if each bucket has size 1 then
bucket sort degenerates to counting sort. The variable bucket size of bucket sort allows it to use
O(n) memory instead of O(M) memory, where M is the number of distinct values; in exchange, it
gives up counting sort's O(n + M) worst-case behavior.
Bucket sort with two buckets is effectively a version of quicksort where the pivot value is always
selected to be the middle value of the value range. While this choice is effective for uniformly
distributed inputs, other means of choosing the pivot in quicksort such as randomly selected pivots
make it more resistant to clustering in the input distribution.
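
A hypothetical sketch of this correspondence, splitting on the midpoint of the current value range rather than on a sampled pivot (the fallback for exhausted ranges is a safety detail added here, not part of the comparison):

    def two_bucket_sort(arr, lo=None, hi=None):
        if len(arr) <= 1:
            return list(arr)
        if lo is None:
            lo, hi = min(arr), max(arr)
        mid = (lo + hi) / 2.0  # the "pivot" = middle of the value range
        if mid <= lo or mid >= hi:
            return sorted(arr)  # range exhausted; fall back
        left = [x for x in arr if x < mid]
        right = [x for x in arr if x >= mid]
        return two_bucket_sort(left, lo, mid) + two_bucket_sort(right, mid, hi)
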
The n-way mergesort algorithm also begins by distributing the list into n sublists and sorting each
one; however, the sublists created by mergesort have overlapping value ranges and so cannot be
recombined by simple concatenation as in bucket sort. Instead, they must be interleaved by a merge
algorithm. However, this added expense is counterbalanced by the simpler scatter phase and the
ability to ensure that each sublist is the same size, providing a good worst-case time bound.
Top-down radix sort can be seen as a special case of bucket sort where both the range of values
and the number of buckets is constrained to be a power of two. Consequently, each bucket's size is
also a power of two, and the procedure can be applied recursively. This approach can accelerate the
scatter phase, since we only need to examine a prefix of the bit representation of each element to
determine its bucket.
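
For instance, with non-negative integer keys the bucket at each level of recursion can be chosen by a shift and mask; the following two-bucket (one bit per level) sketch is illustrative, with the key width (here, values below 2^32) an assumed parameter:

    def msd_radix_sort(arr, shift=31):
        # Bucket on the bit at position `shift` (the next prefix bit).
        if len(arr) <= 1 or shift < 0:
            return list(arr)
        zeros = [x for x in arr if not ((x >> shift) & 1)]
        ones = [x for x in arr if (x >> shift) & 1]
        return msd_radix_sort(zeros, shift - 1) + msd_radix_sort(ones, shift - 1)
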
Counting Sort

In computer science, counting sort is an algorithm for sorting a collection of objects according to
keys that are small integers; that is, it is an integer sorting algorithm. It operates by counting the
number of objects that have each distinct key value, and using arithmetic on those counts to
determine the positions of each key value in the output sequence. Its running time is linear in the
number of items and the difference between the maximum and minimum key values, so it is only
suitable for direct use in situations where the variation in keys is not significantly greater than the
number of items. However, it is often used as a subroutine in another sorting algorithm, radix sort,
that can handle larger keys more efficiently.[1][2][3]
Because counting sort uses key values as indexes into an array, it is not a comparison sort, and
the Ω(n log n) lower bound for comparison sorting does not apply to it.[1] Bucket sort may be used for
many of the same tasks as counting sort, with a similar time analysis; however, compared to
counting sort, bucket sort requires linked lists, dynamic arrays or a large amount of preallocated
memory to hold the sets of items within each bucket, whereas counting sort instead stores a single
number (the count of items) per bucket.

Input and output assumptions


In the most general case, the input to counting sort consists of a collection of n items, each of which
has a non-negative integer key whose maximum value is at most k.[3] In some descriptions of
counting sort, the input to be sorted is assumed to be more simply a sequence of integers itself,[1] but
this simplification does not accommodate many applications of counting sort. For instance, when
used as a subroutine in radix sort, the keys for each call to counting sort are individual digits of larger
item keys; it would not suffice to return only a sorted list of the key digits, separated from the items.
In applications such as in radix sort, a bound on the maximum key value k will be known in advance,
and can be assumed to be part of the input to the algorithm. However, if the value of k is not already
known then it may be computed, as a first step, by an additional loop over the data to determine the
maximum key value that actually occurs within the data.
The output is an array of the items, in order by their keys. Because of the application to radix sorting,
it is important for counting sort to be a stable sort: if two items have the same key as each other,
they should have the same relative position in the output as they did in the input.[1][2]
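
A sketch of the procedure under these assumptions: items with integer keys in [0, k], the key extracted by a caller-supplied function (an illustrative choice), and stability guaranteed by scattering the input in order:

    def counting_sort(items, key, k):
        # Count occurrences of each key value.
        counts = [0] * (k + 1)
        for item in items:
            counts[key(item)] += 1
        # Prefix sums turn counts into starting output positions.
        total = 0
        for v in range(k + 1):
            counts[v], total = total, total + counts[v]
        # Scatter items to their positions; scanning the input in
        # order makes the sort stable.
        output = [None] * len(items)
        for item in items:
            output[counts[key(item)]] = item
            counts[key(item)] += 1
        return output

For example, counting_sort(words, key=len, k=longest) would order strings by length while preserving the input order among strings of equal length.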

History
Although radix sorting itself dates back far longer, counting sort, and its application to radix sorting,
were both invented by Harold H. Seward in 1954.[1][4][8]
Radix Sort

In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by
creating and distributing elements into buckets according to their radix. For elements with more than
one significant digit, this bucketing process is repeated for each digit, while preserving the ordering
of the prior step, until all digits have been considered. For this reason, radix sort has also been
called bucket sort and digital sort.
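
A least-significant-digit sketch for non-negative integers, using a stable counting pass per digit (the base and the arithmetic digit extraction are shown for illustration):

    def radix_sort(nums, base=10):
        if not nums:
            return nums
        exp = 1
        while max(nums) // exp > 0:
            # Stable counting sort keyed on the current digit.
            counts = [0] * base
            for n in nums:
                counts[(n // exp) % base] += 1
            for d in range(1, base):
                counts[d] += counts[d - 1]  # cumulative end positions
            output = [0] * len(nums)
            for n in reversed(nums):  # reverse scan preserves stability
                digit = (n // exp) % base
                counts[digit] -= 1
                output[counts[digit]] = n
            nums = output
            exp *= base
        return nums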

History
Radix sort dates back as far as 1887 to the work of Herman Hollerith on tabulating machines.[1] Radix
sorting algorithms came into common use as a way to sort punched cards as early as 1923.[2]
The first memory-efficient computer algorithm for radix sorting was developed in 1954 at MIT by Harold H. Seward.
Computerized radix sorts had previously been dismissed as impractical because of the perceived
need for variable allocation of buckets of unknown size. Seward's innovation was to use a linear
scan to determine the required bucket sizes and offsets beforehand, allowing for a single static
allocation of auxiliary memory. The linear scan is closely related to Seward's other algorithm
— counting sort.
In the modern era, radix sorts are most commonly applied to collections of
binary strings and integers. In some benchmarks, radix sort has been shown to be faster than
other, more general-purpose sorting algorithms, sometimes 50% to three times as fast.[3][4][5]

Digit Order
Radix sorts can be implemented to start at either the most significant digit (MSD) or least significant
digit (LSD). For example, with 1234, one could start with 1 (MSD) or 4 (LSD).
Quicksort

Quicksort (sometimes called partition-exchange sort) is an efficient sorting algorithm, serving as a
systematic method for placing the elements of a random access file or an array in order. Developed
by British computer scientist Tony Hoare in 1959[1] and published in 1961,[2] it is still a commonly
used algorithm for sorting. When implemented well, it can be about two or three times faster than its
main competitors, merge sort and heapsort.[3]

Quicksort is a comparison sort, meaning that it can sort items of any type for which a "less-than"
relation (formally, a total order) is defined. Efficient implementations of quicksort are not stable,
meaning that the relative order of equal sort items is not preserved. Quicksort can operate in-place
on an array, requiring only small additional amounts of memory to perform the sorting. It is very
similar to selection sort, except that it does not always choose the worst-case partition.
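
A minimal illustrative sketch (not the in-place partitioning scheme used in practice; the middle element is chosen as pivot here purely for simplicity):

    def quicksort(arr):
        if len(arr) <= 1:
            return arr
        pivot = arr[len(arr) // 2]
        # Partition into elements less than, equal to, and greater
        # than the pivot, then recurse on the unequal parts.
        less = [x for x in arr if x < pivot]
        equal = [x for x in arr if x == pivot]
        greater = [x for x in arr if x > pivot]
        return quicksort(less) + equal + quicksort(greater)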

The quicksort algorithm was developed in 1959 by Tony Hoare while in the Soviet Union, as a
visiting student at Moscow State University. At that time, Hoare worked on a project on machine
translation for the National Physical Laboratory. As a part of the translation process, he needed to
sort the words in Russian sentences prior to looking them up in a Russian-English dictionary that
was already sorted in alphabetic order on magnetic tape.[4] After recognizing that his first
idea, insertion sort, would be slow, he quickly came up with a new idea that was Quicksort. He wrote
a program in Mercury Autocode for the partition but could not write the program to account for the list
of unsorted segments. On return to England, he was asked to write code for Shellsort as part of his
new job. Hoare mentioned to his boss that he knew of a faster algorithm and his boss bet sixpence
that he did not. His boss ultimately accepted that he had lost the bet. Later, Hoare learned
about ALGOL and its ability to do recursion that enabled him to publish the code in Communications
of the Association for Computing Machinery, the premier computer science journal of the time.[2][5]

Quicksort is a space-optimized version of the binary tree sort. Instead of inserting items sequentially
into an explicit tree, quicksort organizes them concurrently into a tree that is implied by the recursive
calls.

The most direct competitor of quicksort is heapsort. Heapsort's running time is O(n log n), but
heapsort's average running time is usually considered slower than in-place quicksort. This result is
debatable; some publications indicate the opposite.[28][29] Introsort is a variant of quicksort that
switches to heapsort when a bad case is detected to avoid quicksort's worst-case running time.

Quicksort also competes with merge sort, another O(n log n) sorting algorithm. Mergesort is a stable
sort, unlike standard in-place quicksort and heapsort, and can be easily adapted to operate on linked
lists and very large lists stored on slow-to-access media such as disk storage or network-attached
storage.

Bucket sort with two buckets is very similar to quicksort; the pivot in this case is effectively the value
in the middle of the value range, which does well on average for uniformly distributed inputs.
Generalization
Richard Cole and David C. Kandathil, in 2004, discovered a one-parameter family of sorting
algorithms, called partition sorts, which on average (with all input orderings equally likely) perform at
most n log n + O(n) comparisons (close to the information-theoretic lower bound) and Θ(n log n)
operations; at worst they perform Θ(n log² n) comparisons (and also operations); these are in-place,
requiring only O(log n) additional space. Practical efficiency and smaller variance in performance
were demonstrated against optimised quicksorts (of Sedgewick and Bentley-McIlroy).[34]
