Unit 9
Searching Techniques
UNIT 9 SEARCHING AND SORTING TECHNIQUES
9.0 INTRODUCTION
Searching is the process of looking for something: finding one piece of data that has
been stored within a whole group of data. It is often the most time-consuming part of
many computer programs. There are a variety of methods, or algorithms, used to
search for a data item, depending on how much data there is to look through, what
kind of data it is, what type of structure the data is stored in, and even where the data
is stored: inside computer memory or on some external medium.
So far, we have studied a variety of data structures, their types and their uses.
In this unit, we concentrate on techniques for searching for a particular data item or
piece of information within a large amount of data. There are two basic
searching techniques: Linear (or Sequential) Search and Binary Search.
Searching is a very common task in day-to-day life: at some time or other, we all
search for something we need at home, in the office or in the market, or search for a
word in a dictionary. In this unit, we will see that if the items are organised in
some manner, the search becomes efficient and fast.
The same holds for computer programs. Suppose a telephone directory containing
names and numbers is stored in memory in an array. How do we find a person's
number? We search the array for the given name. If the names were organised in some
order (for example, alphabetically), the search would be fast.
So, a search algorithm is an algorithm that accepts an argument ‘a’ and tries to
find the data in a file or table whose key matches ‘a’.
9.1 OBJECTIVES
After going through this unit, you should be able to:
know the basic concepts of searching;
know the process of performing the Linear Search;
know the process of performing the Binary Search; and
know the applications of searching.
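Before studying Linear Search, let us define some terms related to searching. A file is a
collection of records, each record is made up of fields, and the key is the field whose
value is used to identify (and search for) a record.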
For example, the telephone directory that we discussed in previous section can be
considered as a file, where each record contains two fields: name of the person and
phone number of the person.
Which field serves as the ‘key’ depends on the application. It is usually the name of
the person, but it could also be the phone number. We locate a particular
record by matching the input argument ‘a’ with the key value.
The simplest of all the searching techniques is Linear or Sequential Search. As the
name suggests, all the records in a file are examined sequentially, one by one, comparing
each key value with the search argument until a match occurs.
Linear Search is applied to a table that is organised as an array. Let us assume that
a file contains ‘n’ records and that each record has several fields but only one key.
The key values are stored in an array, say ‘m’. As the file has ‘n’ records, the
size of the array will be ‘n’, and the value m[i] will be the key of the record at
position i. Also, let ‘el’ be the value being searched for, i.e. the search argument.
Algorithm
Step 1: [Initialize]
        k = 0
        flag = 1
Step 2: Repeat Step 3 while (k < n) and (flag = 1)
Step 3: if (m[k] = el)
        then
            flag = 0
            print “Search is successful” and element is found at location (k+1)
            stop
        endif
        k = k + 1
Step 4: if (flag = 1) then
            print “Search is unsuccessful”
        endif
Step 5: stop
Program 9.1 examines each of the key values in the array ‘m’, one by one, and stops
when a match occurs or the whole array has been searched.
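A minimal C sketch of the linear search algorithm above, assuming integer keys. The function name linear_search and the convention of returning 0 when the search fails are illustrative choices, not part of Program 9.1.

```c
#include <stdio.h>

/* Linear (sequential) search over the key array m[0..n-1].
   Returns the 1-based location of el, or 0 if el is not present.
   linear_search and the 0-on-failure convention are illustrative. */
int linear_search(const int m[], int n, int el)
{
    int k;
    for (k = 0; k < n; k++)              /* Steps 2-3: examine each key in turn */
    {
        if (m[k] == el)
        {
            printf("Search is successful: element found at location %d\n", k + 1);
            return k + 1;
        }
    }
    printf("Search is unsuccessful\n");  /* Step 4: no key matched */
    return 0;
}
```

For example, with m = {42, 7, 19, 7, 3}, searching for 19 returns 3 after three comparisons, while searching for 100 makes all five comparisons and returns 0.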
Example:
A telephone directory with n = 10 records and the Name field as key. Let us assume that
the names are stored in array ‘m’, i.e. m[0] to m[9], and that the search has to be made for
the name “Radha Sharma”, i.e. el = “Radha Sharma”.
Telephone Directory
The above algorithm searches for el = “Radha Sharma” and stops at index 6
of the array. The required phone number is “26150880”, which is stored at
location 7, i.e. 6+1.
How many comparisons are made when searching for a given
element?
The number of comparisons depends on where the record with the argument key
appears in the array. If the record is in the first position, one comparison is made; if
the record is in the last position, ‘n’ comparisons are made.
If the record is equally likely to appear at any position in the array, a
successful search takes (n+1)/2 comparisons on average, and an unsuccessful search takes
‘n’ comparisons.
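The (n+1)/2 figure follows from averaging over the n equally likely positions: a record at position i is found after exactly i comparisons, and each position has probability 1/n.

```latex
C_{\text{avg}} = \frac{1}{n}\sum_{i=1}^{n} i
             = \frac{1}{n}\cdot\frac{n(n+1)}{2}
             = \frac{n+1}{2}
```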
☞ Check Your Progress 1
1) Linear search uses an exhaustive method of checking each element in the array
against a key value. When a match is found, the search halts. Will sorting the
array before using the linear search have any effect on its order of efficiency?
……………………………………………………………………………………
2) In the best case, the element is found with the fewest
comparisons. Where in the list would the key element be located?
……………………………………………………………………………………
An array-based binary search selects the middle element of the array and compares its
value with the key value. Because the array is sorted, if the key value is less than
the middle value, the key must be in the first half of the array. Likewise, if the
key value is greater than the middle value, the key must be in the
second half of the array. In either case, we can, in effect,
“throw out” one half of the search space with only one comparison.
Now, knowing that the key must be in one half of the array or the other, the binary
search examines the mid value of the half in which the key must reside. The algorithm
thus narrows the search area by half at each step until it has either found the key data
or the search fails.
As the name suggests, binary means two, so it divides an array into two halves for
searching. This search is applicable only to an ordered table (in either ascending or
in descending order).
Let us write an algorithm for Binary Search and then we will discuss it. The array
consists of elements stored in ascending order.
Algorithm
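Step 1: Declare an array ‘k’ of size ‘n’, i.e. k(n) is an array which stores all the keys of
a file containing ‘n’ records, and let ‘el’ be the search argument
Step 2: low ← 0
Step 3: high ← n - 1
Step 4: while (low <= high)
            mid = (low + high)/2
            if (el = k[mid]) then
                print “Search is successful” and element is found at location (mid+1)
                stop
            else
                if (el < k[mid]) then
                    high = mid - 1
                else
                    low = mid + 1
                endif
            endif
        endwhile
Step 5: print “Search is unsuccessful”
Step 6: Stop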
/*Header Files*/
#include <stdio.h>

/*Functions*/
/* Searches the array (sorted in ascending order) for value.
   Returns the 1-based position of value, or 0 if it is not present. */
int binary_search(int array[], int value, int size)
{
    int high = size - 1, low = 0, mid;

    printf("\n\n Looking for %d\n", value);
    while (high >= low)
    {
        mid = low + (high - low) / 2;    /* middle of the current search space */
        printf("Low %d Mid %d High %d\n", low, mid, high);
        if (value == array[mid])
        {
            printf("Key value found at position %d\n", mid + 1);
            return mid + 1;
        }
        else if (value < array[mid])
            high = mid - 1;              /* key can only be in the lower half */
        else
            low = mid + 1;               /* key can only be in the upper half */
    }
    printf("Key value not found\n");
    return 0;
}

/*Main Function*/
int main(void)
{
    int array[100], i;
    /*Inputting Values to Array (enter them in ascending order)*/
    for (i = 0; i < 100; i++)
    {
        printf("Enter the number: ");
        if (scanf("%d", &array[i]) != 1)
            return 1;                    /* stop on invalid input */
    }
    printf("Result of search %d\n", binary_search(array, 33, 100));
    printf("Result of search %d\n", binary_search(array, 75, 100));
    printf("Result of search %d\n", binary_search(array, 1, 100));
    return 0;
}
Program 9.2 : Binary Search
Example:
Value | Index
22 | 1
33 | 2
44 | 3
55 | 4
Let key = 55, low = 0, high = 4
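The example can be traced with a small C function that prints low, mid and high at each step. The value at index 0 is not shown in the example, so this sketch assumes a hypothetical 11 there; any value smaller than 22 gives the same trace. The name binary_search_trace is also hypothetical.

```c
#include <stdio.h>

/* Iterative binary search on an ascending array a[0..n-1] that prints
   low, mid and high at every probe, mirroring the example's trace.
   Returns the 0-based index of key, or -1 if key is absent. */
int binary_search_trace(const int a[], int n, int key)
{
    int low = 0, high = n - 1;
    while (low <= high)
    {
        int mid = (low + high) / 2;
        printf("low = %d, mid = %d, high = %d, a[mid] = %d\n",
               low, mid, high, a[mid]);
        if (a[mid] == key)
            return mid;
        else if (key < a[mid])
            high = mid - 1;   /* discard the upper half */
        else
            low = mid + 1;    /* discard the lower half */
    }
    return -1;
}
```

With a = {11, 22, 33, 44, 55} and key = 55, the probes are mid = 2 (33), mid = 3 (44) and mid = 4 (55), so the function returns 4, i.e. position 5.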
Each comparison in binary search halves the number of possible candidates
where the key value can be found, because the array is divided into two halves
in each iteration. Thus, the maximum number of key comparisons is approximately
log2 n, so the order of binary search is O(log n).
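A short derivation of this bound: each probe leaves at most half of the remaining candidates, so after k - 1 probes at most n/2^(k-1) candidates remain, and a k-th probe is needed only if at least one candidate is left.

```latex
\frac{n}{2^{k-1}} \ge 1 \iff k \le \log_2 n + 1
\quad\Longrightarrow\quad
k_{\max} = \lfloor \log_2 n \rfloor + 1 = O(\log n)
```

For n = 8, 128 and 256 this gives 4, 8 and 9 probes respectively.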
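Binary search is much faster than linear search. Here are some comparisons:
Array size | Linear search (average comparisons, about n/2) | Binary search (maximum comparisons, approx.)
8 | 4 | 4
128 | 64 | 8
256 | 128 | 9
1000 | 500 | 11
100,000 | 50,000 | 18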
A binary search on an array is O(log2 n) because at each test, you can “throw out”
one half of the search space or array whereas a linear search on an array is O(n).
It is noteworthy that, for very small arrays, a linear search can be faster than a
binary search. However, as the size of the array to be searched increases, binary
search is the clear winner in terms of the number of comparisons and therefore overall
speed.
Still, binary search has some drawbacks. First, it requires the data being
searched to be in sorted order. If even one element is out of order, the whole
process can fail. When presented with a set of unsorted
data, the programmer must decide whether to sort the data and then apply a binary
search, or simply apply the less efficient linear search. Is the cost of sorting the data
worth the increase in search speed gained with binary search? If you are searching
only once, it is probably better to do a linear search in most cases.
9.4 APPLICATIONS
Searching techniques are used in many places in today’s world, be it
the Internet, search engines, online enquiry, text pattern matching or finding a record
in a database.
The most important application of searching is to track a particular record from a large
file, efficiently and faster.
1. Spell Checker
2. Search Engines
Search engines use software robots to survey the Web and build their databases. Web
documents are retrieved and indexed using keywords. When you enter a query at a
search engine website, your input is checked against the search engine’s keyword
indices, and the best matches are returned to you as hits. This checking is done with
one of the search algorithms.
Search engines use software programs known as robots, spiders or crawlers. A robot
is a piece of software that automatically follows hyperlinks from one document to the
next around the Web. When a robot discovers a new site, it sends information back to
its main site to be indexed. Because Web documents are one of the least static forms
of publishing (i.e., they change a lot), robots also update previously catalogued sites.
How quickly and comprehensively they carry out these tasks varies from one search
engine to the next.
3. String Pattern matching
We will illustrate insertion sort with an example (refer to Figure 9.1) before
presenting the formal algorithm.
Example : Sort the following list using the insertion sort method:
Thus, to find the correct position of the target, search the list until an item just greater
than the target is found. Shift all the items from this point one position down the list,
and insert the target into the vacated slot. Repeat this process for every element in the
list; the result is a sorted list.
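A minimal C sketch of insertion sort on an int array. Instead of searching from the front for the first item greater than the target, it scans backwards from the target's old position, shifting each larger item down by one as it goes; this finds the same slot and does the shifting in the same pass. The name insertion_sort is illustrative.

```c
#include <stddef.h>

/* Insertion sort (ascending) on a[0..n-1]. Each item in turn is the
   "target": larger items before it are shifted one place down the
   list and the target is placed in the vacated slot. */
void insertion_sort(int a[], size_t n)
{
    size_t i, j;
    for (i = 1; i < n; i++)
    {
        int target = a[i];
        j = i;
        while (j > 0 && a[j - 1] > target)
        {
            a[j] = a[j - 1];   /* shift the larger item one position down */
            j--;
        }
        a[j] = target;         /* insert the target into the vacated slot */
    }
}
```

Because only strictly larger items are shifted, equal items keep their original relative order, so this version is stable.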
9.5.2 Bubble Sort
In this sorting algorithm, multiple swaps can take place in one pass. Smaller elements
move, or ‘bubble’, up to the top of the list, hence the name of the algorithm.
In this method, adjacent members of the list to be sorted are compared. If the item above
is greater than the item immediately below it, they are swapped. This process is
carried on until the list is sorted.
1. Begin
2. Read the n elements a[1..n]
3. for i = 1 to n-1
       for j = n downto i+1
           if a[j] < a[j-1]
               swap(a[j], a[j-1])
4. End // of Bubble Sort
Number of comparisons = $(N-1) + (N-2) + \cdots + 2 + 1 = \frac{N(N-1)}{2} = O(N^2)$
This inefficiency is due to the fact that an item moves only to the next position in each
pass.
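A minimal C sketch of the bubble sort algorithm above, using 0-based indices: in pass i, adjacent items are compared from the bottom of the list upwards, so the smallest remaining item bubbles up to position i. The name bubble_sort is illustrative.

```c
/* Bubble sort (ascending) on a[0..n-1]. Pass i compares neighbours
   from the bottom of the list up to position i, so after the pass
   a[i] holds the smallest of a[i..n-1]. Makes n(n-1)/2 comparisons. */
void bubble_sort(int a[], int n)
{
    int i, j, tmp;
    for (i = 0; i < n - 1; i++)
    {
        for (j = n - 1; j > i; j--)
        {
            if (a[j] < a[j - 1])   /* out of order: swap the neighbours */
            {
                tmp = a[j];
                a[j] = a[j - 1];
                a[j - 1] = tmp;
            }
        }
    }
}
```

Using a strict less-than test avoids swapping equal neighbours, which saves work and keeps the sort stable.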
Rearrange the list so that this item is in the proper position, i.e., all preceding items have a
lesser value and all succeeding items have a greater value than this item.
Repeat steps 1 & 2 for sublist1 & sublist2 till A[ ] is a sorted list.
The rearrangement uses A[I] as the dividing element:
1. Choose A[I] as the dividing element.
2. From the left end of the list (A[0] onwards) scan till an item A[R] is found
whose value is greater than A[I].
3. From the right end of the list (A[N] backwards) scan
till an item A[L] is found whose value is less than A[I].
Program 9.3 gives the program segment for Quick sort. It uses recursion.
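/* Sorts A[m..n] (inclusive) into ascending order, using A[m] as the pivot */
void Quicksort(int A[], int m, int n)
{
    int i, j, k, temp;
    if (m < n)
    {
        i = m;
        j = n + 1;
        k = A[m];                      /* pivot (dividing element) */
        do
        {
            do                         /* scan right for an item >= pivot */
                ++i;
            while (i <= n && A[i] < k);
            do                         /* scan left for an item <= pivot */
                --j;
            while (A[j] > k);
            if (i < j)                 /* swap the out-of-place pair */
            {
                temp = A[i];
                A[i] = A[j];
                A[j] = temp;
            }
        } while (i < j);
        temp = A[m];                   /* put the pivot in its final place */
        A[m] = A[j];
        A[j] = temp;
        Quicksort(A, m, j - 1);        /* sort sublist1 */
        Quicksort(A, j + 1, n);        /* sort sublist2 */
    }
}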
The Quick sort algorithm uses O(N log2 N) comparisons on average. Its
performance can be improved by keeping the following points in mind.
1. Switch to a sorting scheme that is faster for small lists, such as insertion sort, when
the sublist size becomes comparatively small.
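The first improvement can be sketched by adding a size check to the partitioning scheme of Program 9.3. CUTOFF = 10 is a hypothetical threshold, not a value given in this unit; in practice it is tuned by measurement. The names hybrid_quicksort and insertion_range are likewise illustrative.

```c
#define CUTOFF 10   /* hypothetical threshold for "comparatively small" */

/* Insertion sort on A[m..n] (inclusive), used for small sublists. */
static void insertion_range(int A[], int m, int n)
{
    int i, j, t;
    for (i = m + 1; i <= n; i++)
    {
        t = A[i];
        for (j = i; j > m && A[j - 1] > t; j--)
            A[j] = A[j - 1];          /* shift larger items down */
        A[j] = t;
    }
}

/* Quicksort on A[m..n] with the same partitioning as Program 9.3, but
   sublists with fewer than CUTOFF items are handed to insertion sort. */
void hybrid_quicksort(int A[], int m, int n)
{
    int i, j, k, temp;
    if (n - m + 1 < CUTOFF)
    {
        insertion_range(A, m, n);     /* small (or empty) sublist */
        return;
    }
    i = m;
    j = n + 1;
    k = A[m];                         /* pivot */
    do
    {
        do ++i; while (i <= n && A[i] < k);
        do --j; while (A[j] > k);
        if (i < j)
        {
            temp = A[i]; A[i] = A[j]; A[j] = temp;
        }
    } while (i < j);
    temp = A[m]; A[m] = A[j]; A[j] = temp;
    hybrid_quicksort(A, m, j - 1);
    hybrid_quicksort(A, j + 1, n);
}
```

Small sublists are where recursion overhead dominates, so letting insertion sort finish them usually reduces the total work without changing the O(N log2 N) average.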
9.5.4 2-Way Merge Sort
Merge sort is also one of the ‘divide and conquer’ class of algorithms. The basic idea is
to divide the list into a number of sublists, sort each of these sublists and merge
them to get a single sorted list. The 2-way merge sort initially views
the input as n lists of size 1. These are merged to get n/2 lists of size
2. These n/2 lists are then merged pairwise, and so on, until a single list is obtained. This can be
better understood by the following example. This is also called Concatenate sort.
Figure 9.2 depicts 2-way merge sort.
Merge sort is the best method for sorting linked lists in random order. The total computing
time is O(n log2 n).
The disadvantage of merge sort is that, for arrays, it needs a second array of the same size
for the merge phase. That is, to sort a list of size n, it needs space for 2n elements.
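A minimal C sketch of the bottom-up 2-way merge sort described above. It treats the input as n runs of size 1 and merges runs pairwise, alternating between the input array and one extra array of n elements (hence the space for 2n elements). The names merge_sort and merge_runs are illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* Merges sorted runs src[lo..mid-1] and src[mid..hi-1] into dst[lo..hi-1]. */
static void merge_runs(const int src[], int dst[], int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++]; /* <= keeps it stable */
    while (i < mid) dst[k++] = src[i++];
    while (j < hi)  dst[k++] = src[j++];
}

/* 2-way (bottom-up) merge sort of a[0..n-1]: runs of size 1 are merged
   into runs of size 2, 4, 8, ... until one run remains.
   Returns 0 on success, -1 if the work array cannot be allocated. */
int merge_sort(int a[], int n)
{
    int *tmp, *src, *dst, *swap;
    int width, lo;
    if (n < 2) return 0;
    tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL) return -1;
    src = a;
    dst = tmp;
    for (width = 1; width < n; width *= 2)
    {
        for (lo = 0; lo < n; lo += 2 * width)
        {
            int mid = (lo + width < n) ? lo + width : n;
            int hi = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge_runs(src, dst, lo, mid, hi);
        }
        swap = src; src = dst; dst = swap;  /* merged runs feed the next pass */
    }
    if (src != a)                           /* result is in tmp: copy back */
        memcpy(a, src, (size_t)n * sizeof *a);
    free(tmp);
    return 0;
}
```

For a linked list, the same merging can be done by relinking nodes, which is why the extra n-element array is an array-specific cost.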
A complete binary tree is said to satisfy the ‘heap condition’ if the key of each node is
greater than or equal to the key in its children. Thus the root node will have the largest
key value.
Trees can be represented as arrays by first numbering the nodes level by level, starting
from the root and going from left to right. The key values of the nodes are then assigned
to the array positions given by the node numbers. For the example tree, the corresponding
array is depicted in Figure 9.4.
The relationships of a node can also be determined from this array representation. If a
node is at position j, its children will be at positions 2j and 2j + 1. Its parent will be at
position $\lfloor j/2 \rfloor$.
Consider the node M. It is at position 5. Its parent node is, therefore, at position
$\lfloor 5/2 \rfloor = 2$, i.e. the parent is R. Its children are at positions $2 \times 5 = 10$ and
$(2 \times 5) + 1 = 11$ respectively, i.e. E and I are its children.
A Heap is a complete binary tree, in which each node satisfies the heap condition,
represented as an array.
We will now study the operations possible on a heap and see how these can be combined
to generate a sorting algorithm.
1. Initially, R is added as the right child of J and given the number 13.
2. But R > J, so the heap condition is violated.
3. Move R up to position 6 and move J down to position 13.
4. R > P, so the heap condition is still violated.
5. Swap R and P.
6. The heap condition is now satisfied by all nodes, giving the heap of Figure 9.5.
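The insertion steps above (place the new item at the next free position, then move it up while it is greater than its parent) can be sketched in C using the 1-based array layout described earlier. This sketch assumes integer keys rather than letters; heap_insert is a hypothetical name.

```c
/* Max-heap stored in heap[1..*size] (position 0 unused), so the node at
   position j has children at 2j and 2j + 1 and its parent at j / 2.
   Inserts item at the next position and moves it up while it is larger
   than its parent. The array must have room for *size + 2 elements. */
void heap_insert(int heap[], int *size, int item)
{
    int j = ++(*size);              /* new last position (13 in the example) */
    while (j > 1 && heap[j / 2] < item)
    {
        heap[j] = heap[j / 2];      /* smaller parent moves down */
        j /= 2;                     /* continue from the parent's position */
    }
    heap[j] = item;                 /* heap condition now holds */
}
```

Instead of swapping at every level, the parent is copied down and the new item is written once at its final position, which does the same job as steps 3 and 5 with fewer assignments.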
We will first see two methods of heap construction and then the removal of items in order
from the heap to sort the list.
Top-down construction: Insert items into an initially empty heap, satisfying the heap
condition at all steps.
Bottom-up construction: Starting from the rightmost node, modify the tree so that every
node satisfies the heap condition.
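A minimal C sketch of bottom-up construction on the 1-based layout. Positions n/2 + 1 to n are leaves and already satisfy the heap condition, so the sketch starts at the rightmost internal node, position n/2, and works back to the root. The names build_heap and sift_down are illustrative.

```c
/* Restores the heap condition for the subtree rooted at position j of
   heap[1..n], assuming both subtrees of j already satisfy it. */
static void sift_down(int heap[], int j, int n)
{
    int item = heap[j];
    int child;
    while ((child = 2 * j) <= n)
    {
        if (child < n && heap[child + 1] > heap[child])
            child++;                 /* pick the larger child */
        if (item >= heap[child])
            break;                   /* heap condition holds here */
        heap[j] = heap[child];       /* larger child moves up */
        j = child;
    }
    heap[j] = item;
}

/* Bottom-up heap construction of heap[1..n] (position 0 unused). */
void build_heap(int heap[], int n)
{
    int j;
    for (j = n / 2; j >= 1; j--)     /* rightmost internal node back to root */
        sift_down(heap, j, n);
}
```

Applied to the input file (2, 3, 81, 64, 4, 25, 36, 16, 9, 49) in positions 1 to 10, build_heap(a, 10) produces the array 81, 64, 36, 16, 49, 25, 2, 3, 9, 4.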
Example: Build a heap of the following using top down approach for heap
construction.
PROFESSIONAL
Figure 9.6 shows different steps of the top down construction of the heap.
Example: The input file is (2,3,81,64,4,25,36,16,9, 49). When the file is interpreted as a
binary tree, it results in Figure 9.7. Figure 9.8 depicts the heap.
Figure 9.9 illustrates various steps of the heap of Figure 9.8 as the sorting takes place.
Its panels track the items removed so far and the size of the remaining heap, for example:
Sorted: 81, 64, 49, 36, 25 (Size: 5)
Sorted: 81, 64, 49, 36, 25, 16 (Size: 4)
Sorted: 81, 64, 49, 36, 25, 16, 9 (Size: 3)
Sorted: 81, 64, 49, 36, 25, 16, 9, 4 (Size: 2)
Sorted: 81, 64, 49, 36, 25, 16, 9, 4, 3 (Size: 1)
Sorted: 81, 64, 49, 36, 25, 16, 9, 4, 3, 2 (Result)
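A compact C sketch of heap sort on the 1-based layout: build a heap bottom-up, then repeatedly swap the root (the largest remaining item) with the last item of the heap and restore the heap condition on the smaller heap. Items therefore leave the heap in the order 81, 64, 49, and so on, as in Figure 9.9, and each is parked just after the shrinking heap, so the array ends in ascending order. The names heap_sort and sift_down are illustrative.

```c
/* Restores the heap condition for the subtree rooted at j of a[1..n]. */
static void sift_down(int a[], int j, int n)
{
    int item = a[j];
    int child;
    while ((child = 2 * j) <= n)
    {
        if (child < n && a[child + 1] > a[child])
            child++;                  /* larger child */
        if (item >= a[child])
            break;
        a[j] = a[child];
        j = child;
    }
    a[j] = item;
}

/* Heap sort of a[1..n] (position 0 unused) into ascending order. */
void heap_sort(int a[], int n)
{
    int j, last, tmp;
    for (j = n / 2; j >= 1; j--)      /* build a max-heap bottom-up */
        sift_down(a, j, n);
    for (last = n; last > 1; last--)
    {
        tmp = a[1];                   /* remove the largest item ... */
        a[1] = a[last];
        a[last] = tmp;                /* ... and park it after the heap */
        sift_down(a, 1, last - 1);    /* heap size shrinks by one */
    }
}
```

For the input file (2, 3, 81, 64, 4, 25, 36, 16, 9, 49) stored in a[1..10], heap_sort(a, 10) leaves a[1..10] = 2, 3, 4, 9, 16, 25, 36, 49, 64, 81. Each removal costs O(log n), giving O(n log n) overall with no extra array.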
The first method is called the MSD (Most Significant Digit) sort and the second method
is called the LSD (Least Significant Digit) sort. Here, ‘digit’ stands for a key. Though they are
called sorting methods, MSD and LSD sorts only decide the order of sorting. The actual
sorting could be done by any of the sorting methods discussed in this unit (for LSD, the
method used in each pass must be stable, so that the order set by earlier passes is kept).
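A minimal sketch of LSD ordering in C, assuming the usual meaning of LSD: records with two keys are sorted on the least significant key first and then on the most significant key, using a stable method in each pass. Insertion sort is used here because it is stable; struct rec, key1, key2, stable_sort_on and lsd_sort are all hypothetical names.

```c
#include <stddef.h>

/* A record with two keys; key1 is the most significant. */
struct rec { int key1; int key2; };

static int key_of(const struct rec *p, int which)
{
    return which == 1 ? p->key1 : p->key2;
}

/* Stable insertion sort of r[0..n-1] on key1 (which == 1) or key2. */
static void stable_sort_on(struct rec r[], size_t n, int which)
{
    size_t i, j;
    for (i = 1; i < n; i++)
    {
        struct rec t = r[i];
        int kt = key_of(&t, which);
        /* strict > leaves equal keys in their current order (stability) */
        for (j = i; j > 0 && key_of(&r[j - 1], which) > kt; j--)
            r[j] = r[j - 1];
        r[j] = t;
    }
}

/* LSD order: least significant key first, most significant key last.
   Because each pass is stable, records with equal key1 stay ordered
   by key2 after the second pass. */
void lsd_sort(struct rec r[], size_t n)
{
    stable_sort_on(r, n, 2);
    stable_sort_on(r, n, 1);
}
```

For the records (2,1), (1,3), (2,0), (1,1), lsd_sort produces (1,1), (1,3), (2,0), (2,1).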
☞ Check Your Progress 3
1) The complexity of Bubble sort is
2) Quick sort algorithm uses the programming technique of
3) Write a program in ‘C’ language for 2-way merge sort.
4) The complexity of Heap sort is
9.7 SUMMARY
Searching is the process of looking for something. Searching a list of 100000
elements is not the same as searching a list of 10 elements. We discussed two
searching techniques in this unit, namely Linear Search and Binary Search. Linear Search
examines the elements of the given list one by one until it finds the key value. Binary Search
repeatedly halves the given sorted list to find the key value. So, the major difference is the way
the given list is presented. Binary search is efficient in most cases. Though it has the overhead
that the list must be sorted before the search can start, this is well compensated by
the time it takes to search, which is much less than that of linear search. There are a
large number of applications of searching, a few of which were discussed in this unit.
1) No
2) It will be located at the beginning of the list
(a) F
(b) F
(c) F