CHAPTER 1
Searching and Sorting
The process of locating target data is known as searching. Consider a situation where you
are trying to get the phone number of your friend from a telephone directory. The
telephone directory can be thought of as a table or a file, which is a collection of records.
Each record has one or more fields such as name, address, and telephone number. The
fields that are used to distinguish records are known as keys. While searching, we are
asked to find the record that carries the target key, along with its associated information. When we
think of a telephone directory, the search is usually by name. However, when we try to
locate the record corresponding to a given telephone number, the key will be the
telephone number.
If given an address and the person’s name and telephone number need to be located, the
person’s address will be the key.
If a key determines each record uniquely, it is called a primary key. For
example, telephone number is a primary key. As any field of a record may serve as the
key for a particular application, keys may not always be unique. For example, if we use
‘name’ as the key for a telephone directory, there may be one or more persons with the
same name. In addition, sorted organization of a directory makes searching easier and
faster.
We may use one of the two linear data structures, arrays and linked lists, for storing the
data. Search techniques may vary according to data organization. The data may be stored
on secondary (permanent) storage or in primary storage (main memory). If the search is
applied to a table that resides on secondary storage (hard disk), it is called external
searching, whereas searching a table held in primary storage is called internal
searching, which is faster than external searching.
One of the most popular applications of search algorithms is adding a record in the
collection of records. While adding, the record is searched by key and if not present, it is
inserted in the collection. Such a technique of searching the record and inserting it if not
found is known as search and insert algorithm.
Depending on the way data is scanned for searching a particular record, the search
techniques are categorized as follows:
Sequential search
Binary search
The easiest search technique is a sequential search. There are two ways for storing the
collection of records namely, sequential and non-sequential. Let us assume that we have a
sequential file F, and we wish to retrieve a record with a certain key value k. If F has n
records with key values ki, i = 1 to n, then one way to carry out the retrieval is
by examining the key values in the order of their arrangement until the correct record is
located. Such a search is known as sequential search since the records are examined
sequentially from the first till the last.
Hence, a sequential search begins with the first available record and proceeds to the next
available record repeatedly until we find the target key or conclude that it is not found.
Sequential search is also called linear search.
Algorithm:
1. Set i = 0, flag = 0
2. Compare key[i] and target
       if (key[i] = target)
           Set flag = 1, location = i and goto step 5
       else
           Set i = i + 1
3. If i < n, goto step 2
4. If flag = 0, report target not found
5. If flag = 1, report target found at location 'location'
6. stop
The following Figure shows a sample sequential unordered data and traces the search for
the target data of 89.
Index      0    1    2    3    4    5    6    7    8
Elements   23   12   9    10   11   89   78   66   88
                                    ↑ target location (target data = 89)
Initially, i = 0 and the target element 89 is to be searched. At each pass, the target 89 is
compared with the element at the ith location till it is found or the index i exceeds the size.
At i = 5, the search is successful.
Let us compute the amount of time the sequential search needs to search for a target data.
For this, we must compute the number of times the comparisons of keys is done. In
general, for any search algorithm, the computational complexity is computed by
considering the number of comparisons made.
Average complexity is the sum of the number of comparisons for each position of the
target data divided by n:
(1 + 2 + … + n)/n = (n + 1)/2
Hence, the average number of comparisons done by the sequential search method in the
case of a successful search is (n + 1)/2. An unsuccessful search requires n
comparisons, so the complexity is denoted as O(n).
->Pros and Cons of Sequential Search:
Pros:
● Suitable for storage structures which do not support direct access to data, for
example, magnetic tape, linked list, etc.
● Best case is one comparison, worst case is n comparisons, and average case is (n
+ 1)/2 comparisons
Cons:
● In the case of ordered data, other search techniques such as binary search are
found more suitable.
->Variations of Sequential Search: The time complexity of sequential search is O(n);
this amounts to one comparison in the best case, n comparisons in the worst case, and (n
+ 1)/2 comparisons in the average case. The algorithm starts at the first location and the
search continues till the last element. We can make a few changes leading to a few
variations in the sequential search algorithm. There are three such variations:
1. Sentinel search
2. Probability search
3. Ordered list search
Sentinel search: We note that in steps 2–4 of the above algorithm, there are two
comparisons: one for the element (key) to be searched and the other for the end of the
array. The algorithm ends either when the target is found or when the last element is
compared. The algorithm can be modified to eliminate the end-of-list test by placing the
target at the end of the list as just one additional entry. This additional entry at the end of
the list is called a sentinel. Now, we need not test for the end-of-list condition within the
loop and merely check after the loop completes whether we found the actual target or the
sentinel. This modification avoids one comparison within the loop that repeats n times.
The only care to be taken is not to consider the sentinel entry as a data member.
Algorithm:
1. Set i = 0
2. list[n] = target   {add sentinel}
3. Compare key[i] and target
       if (key[i] = target)
           Set location = i and goto step 6
4. Set i = i + 1
5. Goto step 3
6. if (location < n)
       report target found at location 'location'
   else
       report target not found
7. stop
Probability search: In probability search, the elements that are more likely to be
searched for are placed at the beginning of the array and those that are less likely are
placed at the end of the array.
Ordered list search: When elements are ordered, binary search is preferred. However,
when data is ordered and is of smaller size, sequential search with a small change is
preferred to binary search. In addition, when the data is ordered but stored in a data
structure such as a linked list, modified sequential search is preferred. While searching an
ordered list, we need not continue the search till the end of list to know that the target
element is not in the list. While searching in an ascending ordered list, whenever an
element that is greater than or equal to the target is encountered, the search stops. We can
also add a sentinel to avoid the end of list test.
BINARY SEARCH:-
As discussed, sequential search is not suitable for larger lists. It requires n comparisons in
the worst case. We have a better method when the data is sorted.
The method is called binary search: each time, the list to be searched is divided into two
halves and the search continues in only one of them. Consider that the list is
sorted in ascending order. In binary search algorithm, to search for a particular element, it
is first compared with the element at the middle position, and if it is found, the search is
successful, else if the middle position value is greater than the target, the search will
continue in the first half of the list; otherwise, the target will be searched in the second
half of the list. The same process is repeated for one of the halves of the list till the list is
reduced to size one.
Algorithm:
1. Let n be the size of the list; set low = 0, high = n − 1, flag = 0
2. If low > high, goto step 5; else compute middle = (low + high)/2
3. if (key[middle] = target)
       Set flag = 1, position = middle and goto step 5
   else if (key[middle] > target)
       high = middle − 1
   else
       low = middle + 1
4. Goto step (2)
5. if flag = 1
       report as target element found at location 'position'
   else
       report target element not found
6. stop
Ex:
int Binary_Search_non_recursive(int A[], int n, int key)
{
    int low = 0, high = n - 1, mid;
    while (low <= high)
    {
        mid = (low + high)/2;
        if (A[mid] == key)
            return mid;
        else if (key < A[mid])
            high = mid - 1;
        else
            low = mid + 1;
    }
    return -1;   /* target not found */
}
Ex:
int Binary_Search(int A[], int low, int high, int key)
{
    int mid;
    if (low <= high)
    {
        mid = (low + high)/2;
        if (A[mid] == key)
            return mid;
        else if (key < A[mid])
            return Binary_Search(A, low, mid - 1, key);
        else
            return Binary_Search(A, mid + 1, high, key);
    }
    return -1;   /* target not found */
}
The number of comparisons made by binary search is described by the following
recurrence:

T(n) = T(1),         n = 1
T(n) = T(n/2) + c,   n > 1
The most popular and easiest way to solve a recurrence relation is to repeatedly make
substitutions for each occurrence of the function T on the right-hand side until all such
occurrences disappear.
T(n) = T(n/2) + c = T(n/4) + 2c = … = T(1) + c log2 n
Hence, T(n) = O(log2 n)
->Pros and Cons of Binary Search:
Pros
● Requires only O(log2 n) comparisons, so it is suitable for large lists
Cons
● Requires the data to be sorted
● Not suitable for storage structures that do not support direct access to data, for
example, magnetic tape and linked list
SORTING:-
Sorting is the operation of arranging the records of a table according to the key value of
each record, or it can be defined as the process of converting an unordered set of
elements to an ordered set.
BUBBLE SORT:-
The bubble sort is the oldest and the simplest sort in use. Unfortunately, it is also the
slowest. The bubble sort works by comparing each item in the list with the item next to it
and swapping them if required. This causes larger values to move towards the end of the
list while smaller values 'sink' towards the beginning of the list. Viewing the start of the
array as the top, the bubble sort derives its name from the fact that the smallest data item
bubbles up to the top of the sorted array. The following figure illustrates a bubble sort
using an array of size 7.
Array:   76  67  36  55  23  14  6

(each pass bubbles the largest remaining element to the end)
Pass 1:  67  36  55  23  14  6   76
Pass 2:  36  55  23  14  6   67  76
Pass 3:  36  23  14  6   55  67  76
Pass 4:  23  14  6   36  55  67  76
Pass 5:  14  6   23  36  55  67  76
Pass 6:  6   14  23  36  55  67  76

Final sorted array:  6  14  23  36  55  67  76
Algorithm:
1. For i = 1 to n − 1   {number of passes}
2.     For j = 0 to n − i − 1   {compare adjacent pairs}
           if (A[j] > A[j + 1])
               temp = A[j]
               A[j] = A[j + 1]
               A[j + 1] = temp
           End
       end
3. stop
Prog:
void Bubble_Sort(int A[], int n)
{
    int i, j, temp;
    for (i = 1; i < n; i++)              // number of passes
        for (j = 0; j < n - i; j++)      // comparisons within a pass
            if (A[j] > A[j + 1])
            {                            // swap adjacent elements
                temp = A[j];
                A[j] = A[j + 1];
                A[j + 1] = temp;
            }
}
Analysis of Bubble Sort: The algorithm begins by comparing the top item of the array
with the next and swapping them if necessary. After n - 1 comparisons, the largest among
a total of n items descends to the bottom of the array, that is, to the nth location. The
process is then repeated to the remaining n - 1 items in the array. For n data items, the
method requires n(n - 1)/2 comparisons and on an average, almost one-half as many
swaps. The bubble sort, therefore, is very inefficient in large sorting jobs.
The analysis of this routine is a bit difficult. If we do not stop iterations when the array is
sorted, the analysis is simple: pass i performs (n − i) comparisons, and this totals up to
(n − 1) + (n − 2) + (n − 3) + … + 1 = n(n − 1)/2
Hence, the time complexity for each of the cases is given by the following:
Average case: O(n²)   Best case: O(n²)   Worst case: O(n²)
INSERTION SORT:-
The insertion sort works just like its name suggests—it inserts each item into its proper
place in the final list. The simplest implementation of this requires two list structures: the
source list and the list into which the sorted items are inserted.
Ex: Consider the given unsorted array. Sort this array in ascending order using insertion
sort.
Elements  76  67  36  55  23  14  6
Index      0   1   2   3   4   5   6

Solution:
Pass 1: Consider the first element as a sorted list, and insert the second number 67
into it.
        Sorted: 67 76             Unsorted: 36 55 23 14 6
Pass 2: Insert number 36 in the sorted list.
        Sorted: 36 67 76          Unsorted: 55 23 14 6
Pass 3: Insert number 55 in the sorted list.
        Sorted: 36 55 67 76       Unsorted: 23 14 6
Pass 4: Insert number 23 in the sorted list.
        Sorted: 23 36 55 67 76    Unsorted: 14 6
Pass 5: Insert number 14 in the sorted list.
        Sorted: 14 23 36 55 67 76    Unsorted: 6
Pass 6: Insert number 6 in the sorted list.
        Sorted: 6 14 23 36 55 67 76
Algorithm:
1. Set J = 2
2. Check if list (J) < list (J − 1): if so, interchange them; set J = J − 1 and repeat step (2)
until J = 1
3. Set J = 3, 4, 5, . . ., N and keep on executing step (2)
4. stop

Prog:
void Insertion_Sort(int A[], int n)
{
    int i, j, element;
    for (i = 1; i < n; i++)
    {
        element = A[i];              // element to be inserted
        j = i - 1;
        while (j >= 0 && A[j] > element)
        {
            A[j + 1] = A[j];         // shift larger elements right
            j = j - 1;
        }
        A[j + 1] = element;          // place element at the (j + 1)th position
    }
}
Analysis of Insertion Sort: Although the insertion sort is almost always better than
the bubble sort, the time required in both the methods is approximately the same, that
is, it is proportional to n², where n is the number of data items in the array. In the
worst case, inserting the ith element requires i comparisons, which totals up to
(n − 1) + (n − 2) + … + 1 = (n − 1) × n/2
which is O(n²).
SELECTION SORT:-
The selection sort algorithms construct the sorted sequence, one element at a time, by
adding elements to the sorted sequence in order. At each step, the next element to be
added to the sorted sequence is selected from the remaining elements.
Because the elements are added to the sorted sequence in order, they are always
added at one end. This makes selection sorting different from insertion
sorting. In insertion sorting, the elements are added to the sorted sequence in an
arbitrary order. Therefore, the position in the sorted sequence at which each
subsequent element is inserted is arbitrary.
In this method, we sort a set of unsorted elements in two steps. In the first step, find
the smallest element in the structure. In the second step, swap the smallest element
with the element at the first position. Then, find the next smallest element and swap
with the element at the second position. Repeat these steps until all elements get
arranged at proper positions.
Ex: Look at the following array of unsorted integers. The working of selection sort is
shown in the following table with the resultant array after each pass, where the updated
values of the index variable i and minpos after each pass are indicated.
Elements  76  67  36  55  23  14  6
Index      0   1   2   3   4   5   6

Table: Selection sort
                Index:  0   1   2   3   4   5   6      i   minpos
Initial array          76  67  36  55  23  14  6       0     6
Pass 1                  6  67  36  55  23  14  76      1     5
Pass 2                  6  14  36  55  23  67  76      2     4
Pass 3                  6  14  23  55  36  67  76      3     4
Pass 4                  6  14  23  36  55  67  76      4     4
Pass 5                  6  14  23  36  55  67  76      5     5
Sorted array            6  14  23  36  55  67  76
Prog:
void Selection_Sort(int A[], int n)
{
    int i, j, minpos, temp;
    for (i = 0; i < n - 1; i++)
    {
        minpos = i;
        for (j = i + 1; j < n; j++)      // scan positions i + 1 to n − 1
            if (A[j] < A[minpos])
                minpos = j;              // remember the smallest element
        if (minpos != i)
        {                                // swap the ith element and minpos element
            temp = A[i];
            A[i] = A[minpos];
            A[minpos] = temp;
        }
    }
}
Analysis of Selection Sort: In the above Program Code we can note that there are two
loops, one nested within the other. During the first pass, (n − 1) comparisons are made. In
the second pass, (n − 2) comparisons are made. In general, for the ith pass, (n − i)
comparisons are required.
(n − 1) + (n − 2) + … + 1 = n(n −1)/2
Therefore, the number of comparisons for the selection sort is proportional to n², which
means that it is O(n²). The different cases are as follows:
Average case: O(n²)   Best case: O(n²)   Worst case: O(n²)
QUICK SORT:-
Quick sort is based on the divide-and-conquer strategy. This sort technique initially
selects an element called the pivot, ideally near the middle of the list to be sorted, and
then the items on either side are moved so that the elements on one side of the pivot are
smaller and those on the other side are larger. Now, the pivot is at the right position with
respect to the sorted sequence. These two steps, selecting the pivot and arranging the
elements on either side of it, are then applied recursively to both halves of the list till the
list size reduces to one.
There are several strategies for choosing the pivot. A popular way is to take the first
element as the pivot.
● Pick an element in the array to serve as a ‘pivot’ (usually the left-most element in
the list).
● Partition the array into two parts—one with elements smaller than the pivot and
the other with elements larger than the pivot by traversing from both the ends and
performing swaps if needed.
Let us consider an example. Let the list of numbers to be sorted be {13, 11,
14, 11, 15, 19, 12, 16, 15, 13, 15, 18, 19}. Now, the first element 13 becomes
pivot. We need to place 13 at a proper location so that all elements to its left
are smaller and the right are greater.
Index   0   1   2   3   4   5   6   7   8   9  10  11  12
A      13  11  14  11  15  19  12  16  15  13  15  18  19

Initially, the array is pivoted about its first element A[pivot] = 13.
Let us first find the elements larger than the pivot, that is, 13. In addition,
let us find the last element not larger than the pivot. These elements are in
positions 2 and 9. Let us swap those.
After swapping the elements at positions 2 and 9:

13  11  13  11  15  19  12  16  15  14  15  18  19

Next, the first element larger than the pivot is 15 at position 4, and the last
element not larger than the pivot is 12 at position 6. Swapping these:

13  11  13  11  12  19  15  16  15  14  15  18  19
Here, the lower and upper bounds have crossed. So let us now swap the pivot with the
element 12.
12  11  13  11  13  19  15  16  15  14  15  18  19
Here, we get two partitions as represented in the following sequence:
(12  11  13  11)  13  (19  15  16  15  14  15  18  19)
Recursively applying similar steps to each sub-list on the right and left side of
the pivot, we get,
11  11  12  13  13  14  15  15  15  16  18  19  19
Algorithm: Quicksort(low, high)
1. If low ≥ high, goto step 10
2. Set key = A[low], i = low + 1, j = high
3. Increment i while i ≤ high and A[i] < key
4. Decrement j while A[j] > key
5. If i < j, swap A[i] and A[j]
6. If i < j, goto step 3
7. Swap A[low] and A[j]   {the key reaches its final position j}
8. call Quicksort(low, j − 1)
9. call Quicksort(j + 1, high)
10. Stop
With the first seven steps of the process, the elements lesser than the key value are placed
at the left side and the elements greater than the key value are placed at the right side of
the key value.
Choice of Pivot: We can choose any entry in the list as the pivot. The choice of the first
entry as pivot is popular but often a poor choice. If the list is already sorted, then there
will be no element less than the first element selected as pivot, and so one of the sub-lists
will be empty. Hence, we choose a pivot near the centre of the list, in the hope that our
choice will position the list in such a manner that about half the elements will come on
each side of the pivot.
The choice of the pivot near the centre is also arbitrary, and hence, it is not necessary that
it will always divide the list into half. A good way to choose a pivot is to use a random
number generator to choose the position of the next pivot in each of the activations of
quick sort.
The average complexity of the quick sort algorithm is O(nlogn). However, the worst case
time complexity is O(n2).
MERGE SORT:-
We know that merge sort first divides the whole array iteratively into
equal halves unless the atomic values are achieved. We see here that
an array of 8 items is divided into two arrays of size 4.
We further divide these arrays and we achieve atomic value which can
no more be divided.
Now, we combine them in exactly the same manner as they were
broken down.
We first compare the element for each list and then combine them
into another list in a sorted manner. We see that 14 and 33 are in
sorted positions. We compare 27 and 10 and in the target list of 2
values we put 10 first, followed by 27. We change the order of 19 and
35 whereas 42 and 44 are placed sequentially.
After the final merging, the list should look like this:
10  14  19  27  33  35  42  44
Algorithm:
mergesort(L, n)
{
    if (n == 1)
        return (L);
    else
    {
        split L into two halves L1 and L2;
        return (merge(mergesort(L1, n/2),
                      mergesort(L2, n/2)));
    }
}
Time Complexity:
The recurrence is T(n) = 2T(n/2) + cn, which solves to
T(n) = O(nlogn)