Chapter-5. Sorting and Searching
We sort many things in our everyday lives: A handful of cards when playing Bridge;
bills and other piles of paper; jars of spices; and so on. And we have many intuitive
strategies that we can use to do the sorting, depending on how many objects we
have to sort and how hard they are to move around. Sorting is also one of the most
frequently performed computing tasks. We might sort the records in a database
so that we can search the collection efficiently. We might sort the records by zip
code so that we can print and mail them more cheaply. We might use sorting as an
intrinsic part of an algorithm to solve some other problem, such as when computing
the minimum-cost spanning tree.
1.1 Definition of Sorting
Given a set of records $r_1, r_2, \ldots, r_n$ with key values $k_1, k_2, \ldots, k_n$, the Sorting
Problem is to arrange the records into any order $s$ such that records $r_{s_1}, r_{s_2}, \ldots, r_{s_n}$
have keys obeying the property $k_{s_1} \le k_{s_2} \le \ldots \le k_{s_n}$. In other words, the sorting
problem is to arrange a set of records so that the values of their key fields are in
non-decreasing order.
When comparing two sorting algorithms, the most straightforward approach
would seem to be to simply program both and measure their running times. However, such
timings depend on the machine, the implementation, and the particular input, so when analyzing
sorting algorithms it is traditional to measure the number of comparisons made between keys.
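To make the idea of counting key comparisons concrete, here is a minimal sketch that wraps the comparator passed to the standard library's std::sort so that every comparison is tallied. The helper name count_comparisons is hypothetical and is not part of the chapter; the exact count depends on the library's sorting implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): sorts a copy of `keys` with
// std::sort and returns how many key comparisons were performed.
std::size_t count_comparisons(std::vector<int> keys) {
    std::size_t comparisons = 0;
    std::sort(keys.begin(), keys.end(),
              [&comparisons](int a, int b) {
                  ++comparisons;  // one comparison between two keys
                  return a < b;
              });
    return comparisons;
}
```

Calling count_comparisons on inputs of growing size gives a machine-independent measure of the work done, which is what the comparison-counting convention is meant to capture.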
1.2 Applications of Sorting
Commercial computing: Organizations organize their data by sorting it. Whether it is accounts to be
sorted by name or number, transactions by time or place, mail by postal code or address, or files
by name or date, processing such data is sure to involve a sorting algorithm somewhere along the
way.
Search for information: Keeping data in sorted order makes it possible to efficiently search
through it using the classic binary search algorithm.
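As an illustration of why sorted data is efficient to search, the following is a minimal sketch of binary search over a sorted array of integers. The function name binary_search_index and the choice of int keys are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the index of `key` in `sorted`
// (which must be in non-decreasing order), or -1 if it is absent.
long binary_search_index(const std::vector<int>& sorted, int key) {
    std::size_t lo = 0;
    std::size_t hi = sorted.size();  // search the half-open range [lo, hi)
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] == key) return static_cast<long>(mid);
        if (sorted[mid] < key) {
            lo = mid + 1;  // key can only be in the right half
        } else {
            hi = mid;      // key can only be in the left half
        }
    }
    return -1;
}
```

Each iteration halves the remaining range, so the number of key comparisons grows only logarithmically with the number of records.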
Operations research: Suppose that we have N jobs to complete. We want to maximize customer
satisfaction by minimizing the average completion time of the jobs. The shortest processing time
first rule, where we schedule jobs in increasing order of processing time, is known to accomplish
this goal.
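The shortest processing time first rule can be sketched in a few lines: sort the processing times, then accumulate the completion time of each job. The function name average_completion_time_spt is hypothetical, and the sketch assumes the jobs run one after another on a single machine.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: schedules jobs in increasing order of processing
// time and returns the average completion time (0 for no jobs).
double average_completion_time_spt(std::vector<double> times) {
    if (times.empty()) return 0.0;
    std::sort(times.begin(), times.end());  // shortest job first
    double clock = 0.0;  // time at which the current job finishes
    double total = 0.0;  // sum of all completion times
    for (double t : times) {
        clock += t;
        total += clock;
    }
    return total / times.size();
}
```

For two jobs with processing times 4 and 2, running the shorter job first gives completion times 2 and 6 (average 4), whereas the opposite order gives 4 and 6 (average 5).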
Combinatorial search: A classic paradigm in artificial intelligence is to define a set of
configurations with well-defined moves from one configuration to the next and a priority
associated with each move.
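Prim's algorithm, Kruskal's algorithm and Dijkstra's algorithm are classical algorithms that process
graphs. Kruskal's algorithm sorts the edges by weight, while Prim's and Dijkstra's algorithms
repeatedly extract the smallest item from a priority queue, an operation closely related to sorting.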
Huffman compression is a classic data compression algorithm that depends upon processing a set
of items with integer weights by combining the two smallest to produce a new one whose weight
is the sum of its two constituents.
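The combining step of Huffman compression can be sketched with a min-priority queue. This is only the weight-combining loop, not a full encoder; the function name huffman_cost is hypothetical, and the value it returns (the sum of all combined weights) is chosen here simply to make the loop's effect observable.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: repeatedly removes the two smallest weights and
// inserts their sum, returning the total of all sums produced.
long huffman_cost(const std::vector<long>& weights) {
    // std::greater turns the default max-heap into a min-heap.
    std::priority_queue<long, std::vector<long>, std::greater<long>>
        heap(weights.begin(), weights.end());
    long cost = 0;
    while (heap.size() > 1) {
        long a = heap.top(); heap.pop();  // smallest weight
        long b = heap.top(); heap.pop();  // second smallest weight
        cost += a + b;
        heap.push(a + b);                 // new item whose weight is the sum
    }
    return cost;
}
```

For the weights 1, 2, 3 the loop first combines 1 and 2 into 3, then combines 3 and 3 into 6, so the returned total is 9.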
String processing algorithms are often based on sorting.
1.3 Bubble Sort
Bubble Sort is often taught to novice programmers in introductory computer science courses. It is
a relatively slow sort. It has a poor best-case running time.
Bubble Sort consists of a simple double for loop. The first iteration of the
inner for loop moves through the record array from left to right, comparing
adjacent keys. If the lower-indexed key’s value is greater than its higher-indexed neighbor, then
the two values are swapped. Once the largest value is encountered,
this process will cause it to “bubble” up to the right of the array. The second pass
through the array repeats this process. However, because we know that the largest
value reached the right of the array on the first pass, there is no need to compare
the rightmost two elements on the second pass. Likewise, each succeeding pass through
the array compares adjacent elements, looking at one less value than the preceding
pass. The following figure [1] shows an example of Bubble Sort:
[1] https://eleni.blog/2019/06/09/sorting-in-go-using-bubble-sort/
Input: A (array), N (number of elements)
Step 0: pass = 1
Step 1: u = N − pass
        Compare the pairs (A[0], A[1]), (A[1], A[2]), (A[2], A[3]), …,
        (A[u−1], A[u]) and exchange the elements of a pair if they are not in
        order.
Step 2: pass = pass + 1. If a swap occurred during Step 1 and pass ≤ N − 1,
        then go to Step 1.
Note that if no swap occurs during a pass, all later passes are redundant. In the above
illustration, pass 7 is redundant because no swap occurred in pass 6.
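The steps above, including the early exit when a pass makes no swap, can be sketched in C++ as follows. The function name bubble_sort and the use of std::vector<int> are assumptions made for the example; the chapter does not prescribe a particular implementation.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// A minimal sketch of Bubble Sort with early termination.
void bubble_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t pass = 1; pass < n; ++pass) {
        const std::size_t u = n - pass;  // last index compared in this pass
        bool swapped = false;
        for (std::size_t i = 0; i < u; ++i) {
            if (a[i] > a[i + 1]) {       // adjacent pair out of order
                std::swap(a[i], a[i + 1]);
                swapped = true;
            }
        }
        if (!swapped) break;  // no swap: the array is already sorted
    }
}
```

On an array that is already sorted, the first pass makes no swap and the function returns after n − 1 comparisons.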
Bubble Sort is of little use in the real world, but it is easy to understand and quick to
implement. It is used when a fast algorithm is needed to sort: 1) an extremely small set of data
(e.g., trying to get the books on a library shelf back in order) or 2) a nearly sorted set of data
(e.g., trying to decide which laptop to buy, because it is easier to compare pairs of laptops one at
a time and decide which you prefer than to look at them all at once and decide which is best).
1.4 Selection Sort
Consider the problem of sorting a pile of phone bills for the past year. An intuitive approach might
be to look through the pile until you find the bill for
January, and pull that out. Then look through the remaining pile until you find the
bill for February, and add that behind January. Proceed through the ever-shrinking
pile of bills to select the next one in order until you are done. The 𝑖th pass of Selection Sort
“selects” the 𝑖th smallest key in the array, placing that record into position i. In other
words, Selection Sort first finds the smallest key in an unsorted list, then the second
smallest, and so on. Its unique feature is that there are few record swaps. To find
the next smallest key value requires searching through the entire unsorted portion
of the array, but only one swap is required to put the record in place. Thus, the total
number of swaps required will be n − 1 (we get the last record in place “for free”).
Selection Sort (as written here) is essentially a Bubble Sort, except that rather
than repeatedly swapping adjacent values to get the next smallest record into place, we instead
remember the position of the element to be selected and do one swap
at the end. Selection Sort is particularly advantageous when the cost to do a swap is high, for
example, when the elements are long strings or other large records. Selection Sort is more efficient
than Bubble Sort (by a constant factor) in most other situations as well.
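A minimal C++ sketch of Selection Sort as described above is given below. The function name selection_sort is an assumption, and returning the number of swaps is an addition made here only to make the n − 1 swap count visible.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// A minimal sketch of Selection Sort. Returns the number of swaps
// performed (returned here purely for illustration).
std::size_t selection_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    std::size_t swaps = 0;
    for (std::size_t i = 0; i + 1 < n; ++i) {
        std::size_t smallest = i;  // position of the smallest key seen so far
        for (std::size_t j = i + 1; j < n; ++j) {
            if (a[j] < a[smallest]) smallest = j;
        }
        std::swap(a[i], a[smallest]);  // one swap per pass
        ++swaps;
    }
    return swaps;
}
```

For an array of 8 elements the function performs 7 swaps regardless of the initial order, while the number of comparisons is the same as in Bubble Sort without early termination.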
There is another approach to keeping the cost of swapping records low that
can be used by any sorting algorithm even when the records are large. This is
to have each element of the array store a pointer to a record rather than store the
record itself. In this implementation, a swap operation need only exchange the
pointer values; the records themselves do not move. The following figure shows an example of
Selection Sort:
Figure 5.2: An illustration of Selection Sort to sort an array in increasing order
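The pointer-based approach described above can be sketched as follows: the array being sorted holds pointers, so each swap exchanges two pointer values while the records stay where they are. The Record type, its fields, and the function name sorted_view are hypothetical, and std::sort stands in for whichever sorting algorithm is used.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record type standing in for a large record.
struct Record {
    int key;
    std::string payload;
};

// Returns pointers to the records arranged in non-decreasing key order.
// The records themselves are never moved or copied.
std::vector<const Record*> sorted_view(const std::vector<Record>& records) {
    std::vector<const Record*> view;
    view.reserve(records.size());
    for (const Record& r : records) view.push_back(&r);
    std::sort(view.begin(), view.end(),
              [](const Record* x, const Record* y) {
                  return x->key < y->key;  // compare keys through the pointers
              });
    return view;
}
```

The returned pointers remain valid only as long as the original vector of records is neither destroyed nor resized.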