Sri Vidya College of Engineering and Technology Course Material
UNIT 5
SEARCHING AND SORTING ALGORITHMS
INTRODUCTION TO SEARCHING ALGORITHMS
Searching is an operation or technique that finds the place of a given element or value in a
list. A search is said to be successful or unsuccessful depending on whether the element being
searched for is found. Some of the standard searching techniques used with data structures are
listed below:
1. Linear Search
2. Binary Search
LINEAR SEARCH
Linear search is a very basic and simple search algorithm. In linear search, we search for an
element or value in a given array by traversing the array from the start until the desired element
or value is found.
It compares the element to be searched with all the elements present in the array. When the
element is matched successfully, it returns the index of the element in the array; otherwise it
returns -1.
Linear search is applied to unsorted or unordered lists, when there are few elements in the list.
[Figure: Linear Search example]
Algorithm
Linear Search (Array A, Value x)
Step 1: Set i to 1
Step 2: if i > n then go to step 7
Step 3: if A[i] = x then go to step 6
Step 4: Set i to i + 1
Step 5: Go to Step 2
Step 6: Print Element x Found at index i and go to step 8
Step 7: Print element not found
Step 8: Exit
EC8393/Fundamentals of Data Structures in C, Unit 5
Pseudocode
procedure linear_search (list, value)
   for each item in the list
      if item == value
         return the item's location
      end if
   end for
end procedure
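The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the function name linear_search and the sample list are assumptions, indices are 0-based (the step-by-step algorithm counts from 1), and returning -1 for an unsuccessful search is a convention assumed here.

```python
def linear_search(items, value):
    """Return the 0-based index of the first match, or -1 if absent."""
    for index, item in enumerate(items):
        if item == value:
            return index      # successful search
    return -1                 # unsuccessful search (assumed convention)


# Hypothetical sample data, used only for illustration.
data = [20, 10, 60, 40, 30, 15]
print(linear_search(data, 40))   # 3
print(linear_search(data, 99))   # -1
```

In the worst case every element is inspected once, which is where the O(n) running time comes from.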
Features of Linear Search Algorithm
1. It is used for small, unsorted (unordered) lists of elements.
2. It has a time complexity of O(n), which means the running time grows linearly with the
number of elements. This is acceptable for small lists but slow for large ones.
3. It has a very simple implementation.
BINARY SEARCH
Binary search is used with a sorted array or list. In binary search, we follow these steps:
1. We start by comparing the element to be searched with the element in the middle of
the list/array.
2. If we get a match, we return the index of the middle element.
3. If we do not get a match, we check whether the element to be searched is less than or
greater than the middle element.
4. If the element/number to be searched is greater in value than the middle number,
then we pick the elements on the right side of the middle element (as the list/array is
sorted, all the numbers on the right are greater than the middle number), and start
again from step 1.
5. If the element/number to be searched is lesser in value than the middle number, then
we pick the elements on the left side of the middle element, and start again from
step 1.
Binary search is useful when there is a large number of elements in an array and they are
sorted. So a necessary condition for binary search to work is that the list/array should be sorted.
Features of Binary Search
1. It is great for searching through large sorted arrays.
2. It has a time complexity of O(log n), which is a very good time complexity.
3. It has a simple implementation.
Binary search is a fast search algorithm with a run-time complexity of O(log n). This search
algorithm works on the principle of divide and conquer. For this algorithm to work properly, the
data collection should be in sorted form.
Binary search looks for a particular item by comparing it with the middle item of the collection.
If a match occurs, then the index of the item is returned. If the middle item is greater than the
target item, then the item is searched for in the sub-array to the left of the middle item.
Otherwise, the item is searched for in the sub-array to the right of the middle item. This process
continues on the sub-array until the size of the sub-array reduces to zero.
How Binary Search Works?
For a binary search to work, it is mandatory for the target array to be sorted. We shall learn
the process of binary search with an example. The following is our sorted array, and let us
assume that we need to search for the location of value 31 using binary search.
Value:  10  14  19  26  27  31  33  35  42  44
Index:   0   1   2   3   4   5   6   7   8   9
First, we shall determine half of the array by using this formula -
mid = low + (high - low) / 2
Here it is, 0 + (9 - 0) / 2 = 4 (integer value of 4.5). So, 4 is the mid of the array.
Now we compare the value stored at location 4 with the value being searched, i.e. 31. We
find that the value at location 4 is 27, which is not a match. As the target value is greater than
27 and we have a sorted array, we also know that the target value must be in the upper portion of
the array.
We change our low to mid + 1 and find the new mid value again.
low = mid + 1
mid = low + (high - low) / 2
Our new mid is 7 now. We compare the value stored at location 7 with our target value 31.
The value stored at location 7 is 35, which is not a match; rather, it is more than what we are
looking for. So, the value must be in the lower part from this location, and we change our high
to mid - 1 = 6.
Hence, we calculate the mid again. This time it is 5.
We compare the value stored at location 5 with our target value. We find that it is a match.
We conclude that the target value 31 is stored at location 5.
Binary search halves the searchable items at every step and thus reduces the count of
comparisons to be made to a very small number.
Pseudocode
The pseudocode of the binary search algorithm should look like this:
Procedure binary_search
   A <- sorted array
   n <- size of array
   x <- value to be searched
   Set lowerBound = 1
   Set upperBound = n
   while x not found
      if upperBound < lowerBound
         EXIT: x does not exist.
      set midPoint = lowerBound + ( upperBound - lowerBound ) / 2
      if A[midPoint] < x
         set lowerBound = midPoint + 1
      if A[midPoint] > x
         set upperBound = midPoint - 1
      if A[midPoint] = x
         EXIT: x found at location midPoint
   end while
end procedure
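The procedure can be sketched in Python as below. This is an illustrative sketch under stated assumptions: the name binary_search and the probes list (which records every midpoint inspected) are not part of the notes, indices are 0-based, and -1 signals an unsuccessful search. The data is the ten-element sorted array used in the worked example.

```python
def binary_search(sorted_items, target):
    """Return (index, probes); index is -1 when target is absent."""
    low, high = 0, len(sorted_items) - 1
    probes = []                            # midpoints inspected (for tracing)
    while low <= high:
        mid = low + (high - low) // 2      # same formula as in the notes
        probes.append(mid)
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            low = mid + 1                  # continue in the upper portion
        else:
            high = mid - 1                 # continue in the lower portion
    return -1, probes


data = [10, 14, 19, 26, 27, 31, 33, 35, 42, 44]
print(binary_search(data, 31))   # (5, [4, 7, 5])
```

The probes 4, 7, 5 are the same midpoints visited in the worked example, so three comparisons are enough to locate 31 among ten elements.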
SORTING
Preliminaries
A sorting algorithm is an algorithm that puts the elements of a list in a certain order. The
most used orders are numerical order and lexicographical order. Efficient sorting is important for
optimizing the use of other algorithms that require sorted lists to work correctly, and for
producing human-readable output.
Sorting algorithms are often classified by:
* Computational complexity (worst, average and best case) in terms of the size of the
list. For typical sorting algorithms, good behaviour is O(N log N) and bad behaviour
is O(N^2); the ideal (best-case) behaviour is O(N).
* Memory utilization.
* Stability: maintaining the relative order of records with equal keys.
* Number of comparisons.
* Methods applied, such as insertion, exchange, selection, merging, etc.
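Stability is easiest to see with records that share a key. The short sketch below relies on Python's built-in sorted(), which is documented to be stable; the student names and marks are made-up sample data, not taken from the notes.

```python
# Hypothetical (name, mark) records; two pairs of records share a mark.
records = [("Ravi", 72), ("Anu", 85), ("Kiran", 72), ("Devi", 85), ("Mani", 60)]

# Sort by mark only. A stable sort keeps records with equal marks in
# their original relative order (Ravi before Kiran, Anu before Devi).
by_mark = sorted(records, key=lambda record: record[1])
print(by_mark)
```

An unstable algorithm would be free to place Kiran before Ravi, because both have the same key.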
Sorting is a process of linear ordering of a list of objects.
Sorting techniques are categorized into
* Internal Sorting
* External Sorting
Internal sorting takes place in the main memory of a computer.
e.g. Bubble sort, Insertion sort, Shell sort, Quick sort, Heap sort, etc.
External sorting takes place in the secondary memory of a computer, since the number of
objects to be sorted is too large to fit in main memory.
e.g. Merge Sort, Multiway Merge, Polyphase Merge.
THE BUBBLE SORT
The bubble sort makes multiple passes through a list. It compares adjacent items and
exchanges those that are out of order. Each pass through the list places the next largest value in
its proper place. In essence, each item "bubbles" up to the location where it belongs.
Fig. 5.1 shows the first pass of a bubble sort. In each row, one pair of adjacent items is
compared to see if they are out of order. If there are n items in the list, then there are n - 1
pairs of items that need to be compared on the first pass. It is important to note that once the
largest value in the list is part of a pair, it will continually be moved along until the pass is
complete.
First Pass
54 | 26 | 93 | 17 | 77 | 31 | 44 | 55 | 20 | Exchange
26 | 54 | 93 | 17 | 77 | 31 | 44 | 55 | 20 | No Exchange
26 | 54 | 93 | 17 | 77 | 31 | 44 | 55 | 20 | Exchange
26 | 54 | 17 | 93 | 77 | 31 | 44 | 55 | 20 | Exchange
26 | 54 | 17 | 77 | 93 | 31 | 44 | 55 | 20 | Exchange
26 | 54 | 17 | 77 | 31 | 93 | 44 | 55 | 20 | Exchange
26 | 54 | 17 | 77 | 31 | 44 | 93 | 55 | 20 | Exchange
26 | 54 | 17 | 77 | 31 | 44 | 55 | 93 | 20 | Exchange
26 | 54 | 17 | 77 | 31 | 44 | 55 | 20 | 93 | 93 in place after first pass
Fig. 5.1 Bubble Sort: First Pass
At the start of the second pass, the largest value is now in place. There are n - 1 items
left to sort, meaning that there will be n - 2 pairs. Since each pass places the next largest
value in place, the total number of passes necessary will be n - 1. After completing the n - 1
passes, the smallest item must be in the correct position with no further processing required.
The exchange operation is sometimes called a "swap".
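Program for bubble sort:
def bubbleSort(alist):
    # Each pass bubbles the largest remaining item to position passnum.
    for passnum in range(len(alist)-1,0,-1):
        for i in range(passnum):
            if alist[i]>alist[i+1]:
                # Swap the adjacent pair using a temporary variable.
                temp = alist[i]
                alist[i] = alist[i+1]
                alist[i+1] = temp

alist = [54,26,93,17,77,31,44,55,20]
bubbleSort(alist)
print(alist)
Output:
[17, 20, 26, 31, 44, 54, 55, 77, 93]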
Analysis:
To analyze the bubble sort, we should note that regardless of how the items are arranged in
the initial list, n - 1 passes will be made to sort a list of size n. Table 1 shows the number of
comparisons for each pass. The total number of comparisons is the sum of the first n - 1
integers, which is n(n - 1)/2, so the bubble sort performs O(n^2) comparisons. In the best case,
if the list is already ordered, no exchanges will be made. However, in the worst case, every
comparison will cause an exchange. On average, we exchange half of the time.
Table 1: Comparisons for each pass of bubble sort
Pass     Comparisons
1        n - 1
2        n - 2
3        n - 3
...      ...
n - 1    1
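The counts in the analysis can be checked with an instrumented version of the sort. The sketch below is an illustration only: the function name bubble_sort_counts and its counters are assumptions added for this check, and the algorithm itself is the same pass structure as the bubble sort described in this section.

```python
def bubble_sort_counts(items):
    """Bubble sort a copy of items; return (sorted, comparisons, exchanges)."""
    data = list(items)
    comparisons = 0
    exchanges = 0
    for passnum in range(len(data) - 1, 0, -1):
        for i in range(passnum):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                exchanges += 1
    return data, comparisons, exchanges


n = 9
print(bubble_sort_counts(range(1, n + 1)))   # already ordered: no exchanges
print(bubble_sort_counts(range(n, 0, -1)))   # reversed: every comparison exchanges
```

For n = 9 both runs make 9 * 8 / 2 = 36 comparisons; only the number of exchanges differs (0 in the best case, 36 in the worst case).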
Disadvantages:
A bubble sort is often considered the most inefficient sorting method, since it must exchange
items before their final location is known. These "wasted" exchange operations are very costly.
However, because the bubble sort makes passes through the entire unsorted portion of the list, it
has the capability to do something most sorting algorithms cannot. In particular, if during a pass
there are no exchanges, then we know that the list must be sorted. A bubble sort can be modified
to stop early if it finds that the list has become sorted. This means that for lists that require
just a few passes, a bubble sort may have an advantage in that it will recognize the sorted list
and stop.
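The early-stopping modification described above can be sketched as follows. This is one possible way to write it, not the notes' own program: the name short_bubble_sort, the exchanged flag and the returned pass count are assumptions introduced for illustration.

```python
def short_bubble_sort(items):
    """Sort items in place; stop as soon as a pass makes no exchange.

    Returns the number of passes actually performed.
    """
    passes = 0
    passnum = len(items) - 1
    exchanged = True
    while passnum > 0 and exchanged:
        exchanged = False
        passes += 1
        for i in range(passnum):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                exchanged = True
        passnum -= 1
    return passes


# Hypothetical nearly sorted list: only 90 is out of place.
nearly_sorted = [20, 30, 40, 90, 50, 60, 70, 80, 100, 110]
print(short_bubble_sort(nearly_sorted), nearly_sorted)
```

On this input the first pass moves 90 into place and the second pass makes no exchange, so the sort stops after 2 passes instead of the usual n - 1 = 9.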
5.6. THE SELECTION SORT
The selection sort improves on the bubble sort by making only one exchange for every
pass through the list. In order to do this, a selection sort looks for the largest value as it makes
a pass and, after completing the pass, places it in the proper location. As with a bubble sort,
after the first pass, the largest item is in the correct place. After the second pass, the next
largest is in place. This process continues and requires n - 1 passes to sort n items, since the
final item must be in place after the (n - 1)st pass.
The figure shows the entire sorting process. On each pass, the largest remaining item is
selected and then placed in its proper location. The first pass places 93, the second pass places
77, the third places 55, and so on.
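54 | 26 | 93 | 17 | 77 | 31 | 44 | 55 | 20 | 93 is largest
54 | 26 | 20 | 17 | 77 | 31 | 44 | 55 | 93 | 77 is largest
54 | 26 | 20 | 17 | 55 | 31 | 44 | 77 | 93 | 55 is largest
54 | 26 | 20 | 17 | 44 | 31 | 55 | 77 | 93 | 54 is largest
31 | 26 | 20 | 17 | 44 | 54 | 55 | 77 | 93 | 44 is largest (stays in place)
31 | 26 | 20 | 17 | 44 | 54 | 55 | 77 | 93 | 31 is largest
17 | 26 | 20 | 31 | 44 | 54 | 55 | 77 | 93 | 26 is largest
17 | 20 | 26 | 31 | 44 | 54 | 55 | 77 | 93 | 20 is largest (stays in place)
17 | 20 | 26 | 31 | 44 | 54 | 55 | 77 | 93 | 17 ok, list is sorted.
Program for Selection Sort: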
def selectionSort(alist):
    # fillslot is the position that receives the largest remaining item.
    for fillslot in range(len(alist)-1,0,-1):
        positionOfMax = 0
        for location in range(1,fillslot+1):
            if alist[location]>alist[positionOfMax]:
                positionOfMax = location
        # One exchange per pass: move the largest item into fillslot.
        temp = alist[fillslot]
        alist[fillslot] = alist[positionOfMax]
        alist[positionOfMax] = temp

alist = [54,26,93,17,77,31,44,55,20]
selectionSort(alist)
print(alist)
Output:
[17, 20, 26, 31, 44, 54, 55, 77, 93]
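To illustrate the "one exchange per pass" claim, the sketch below counts the exchanges a selection sort makes. It is a variation written for this illustration, not the program above: the name selection_sort_exchanges is an assumption, it works on a copy of the list, and it skips the exchange when the largest item is already in its slot.

```python
def selection_sort_exchanges(items):
    """Selection sort a copy of items; return (sorted_list, exchanges)."""
    data = list(items)
    exchanges = 0
    for fillslot in range(len(data) - 1, 0, -1):
        position_of_max = 0
        for location in range(1, fillslot + 1):
            if data[location] > data[position_of_max]:
                position_of_max = location
        if position_of_max != fillslot:      # skip a pointless self-swap
            data[fillslot], data[position_of_max] = (
                data[position_of_max], data[fillslot])
            exchanges += 1
    return data, exchanges


result, exchanges = selection_sort_exchanges([54, 26, 93, 17, 77, 31, 44, 55, 20])
print(result, exchanges)
```

For a list of n items the exchange count can never exceed n - 1 (here 8), while the number of comparisons is still n(n - 1)/2, the same as for the bubble sort.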
INSERTION SORT
Insertion sort works by taking elements from the list one by one and inserting them in their
correct position into a new sorted list. Insertion sort consists of N - 1 passes, where N is the
number of elements to be sorted. The i-th pass of insertion sort will insert the element A[i] into
its rightful place among A[1], A[2], ..., A[i - 1]. After doing this insertion, the records
occupying A[1] ... A[i] are in sorted order.
Insertion Sort Procedure
void Insertion_Sort (int a[], int n)
{
    int i, j, temp;
    for (i = 0; i < n; i++) {
        temp = a[i];                               /* element to insert */
        for (j = i; j > 0 && a[j-1] > temp; j--)
            a[j] = a[j-1];                         /* shift larger elements right */
        a[j] = temp;                               /* drop element into its place */
    }
}
Example
Consider an unsorted array as follows,
20 10 60 40 30 15
Passes of Insertion Sort
ORIGINAL     | 20 | 10 | 60 | 40 | 30 | 15 | POSITIONS MOVED
After i = 1  | 10 | 20 | 60 | 40 | 30 | 15 | 1
After i = 2  | 10 | 20 | 60 | 40 | 30 | 15 | 0
After i = 3  | 10 | 20 | 40 | 60 | 30 | 15 | 1
After i = 4  | 10 | 20 | 30 | 40 | 60 | 15 | 2
After i = 5  | 10 | 15 | 20 | 30 | 40 | 60 | 4
Sorted Array | 10 | 15 | 20 | 30 | 40 | 60 |
Analysis Of Insertion Sort
WORST CASE ANALYSIS - O(N^2)
BEST CASE ANALYSIS - O(N)
AVERAGE CASE ANALYSIS - O(N^2)
Limitations Of Insertion Sort :
* It is relatively efficient only for small lists and mostly-sorted lists.
* It is expensive because each insertion shifts all following elements by one.
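The passes of insertion sort can be traced with a short Python sketch. The function name insertion_sort_trace and the moves_per_pass list are assumptions added for illustration; the shifting logic follows the insertion sort procedure described in this section, run on the example array 20 10 60 40 30 15.

```python
def insertion_sort_trace(items):
    """Insertion sort a copy of items.

    Returns (sorted_list, moves_per_pass), where moves_per_pass[k] is the
    number of positions the element inserted in pass i = k + 1 moved left.
    """
    data = list(items)
    moves_per_pass = []
    for i in range(1, len(data)):
        temp = data[i]
        j = i
        while j > 0 and data[j - 1] > temp:
            data[j] = data[j - 1]      # shift the larger element right
            j -= 1
        data[j] = temp
        moves_per_pass.append(i - j)
    return data, moves_per_pass


print(insertion_sort_trace([20, 10, 60, 40, 30, 15]))
# ([10, 15, 20, 30, 40, 60], [1, 0, 1, 2, 4])
```

The moves 1, 0, 1, 2, 4 correspond to the "positions moved" column of the example. An already sorted input gives zero moves in every pass, which is the O(N) best case.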
SHELL SORT
Shell sort was invented by Donald Shell. It improves upon bubble sort and insertion sort by
moving out-of-order elements more than one position at a time. It works by arranging the data
sequence in a two-dimensional array and then sorting the columns of the array using insertion
sort.
In Shell sort the whole array is first fragmented into K segments, where K is preferably a
prime number. After the first pass the whole array is partially sorted. In the next pass, the value
of K is reduced, which increases the size of each segment and reduces the number of segments. The
next value of K is chosen so that it is relatively prime to its previous value. The process is
repeated until K = 1, at which point the array is sorted. The insertion sort is applied to each
segment, so each successive segment is partially sorted. The Shell sort is also called the
Diminishing Increment Sort, because the value of K decreases continuously.
Shell Sort Routine
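void shellsort (int A[ ], int N)
{
    int i, j, k, temp;
    /* Gap sequence N/2, N/4, ..., 1 is assumed here; the worked example uses 5, 3, 1. */
    for (k = N / 2; k > 0; k = k / 2)
        for (i = k; i < N; i++) {
            temp = A[i];
            for (j = i; j >= k && A[j - k] > temp; j = j - k)
                A[j] = A[j - k];       /* shift within the segment */
            A[j] = temp;
        }
}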
Example
Consider an unsorted array as follows,
81 94 11 96 12 35 17 95 28 58
Here N = 10, the first pass has K = 5 (10/2)
81 94 11 96 12 35 17 95 28 58
After first pass
35 17 11 28 12 81 94 95 96 58
In second pass, K is reduced to 3
35 17 11 28 12 81 94 95 96 58
After second pass,
28 12 11 35 17 81 58 95 96 94
In third pass, K is reduced to 1
The final sorted array is
11 12 17 28 35 58 81 94 95 96
Analysis Of Shell Sort :
WORST CASE ANALYSIS - O(N^2)
BEST CASE ANALYSIS - O(N log N)
AVERAGE CASE ANALYSIS - O(N^1.5)
Advantages Of Shell Sort :
* It is one of the fastest algorithms for sorting a small number of elements.
* It requires relatively small amounts of memory.
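The passes of the worked example can be reproduced with a gapped insertion sort. In the sketch below the gap sequence is passed in explicitly so that the example's values K = 5, 3, 1 can be used; the function name shell_sort and the snapshots list are assumptions made for illustration.

```python
def shell_sort(items, gaps):
    """Shell sort a copy of items using the given decreasing gap sequence.

    Returns (sorted_list, snapshots); snapshots holds (gap, list_after_pass).
    The last gap must be 1 for the result to be fully sorted.
    """
    data = list(items)
    snapshots = []
    for gap in gaps:
        # Insertion sort applied to elements that are 'gap' positions apart.
        for i in range(gap, len(data)):
            temp = data[i]
            j = i
            while j >= gap and data[j - gap] > temp:
                data[j] = data[j - gap]
                j -= gap
            data[j] = temp
        snapshots.append((gap, list(data)))
    return data, snapshots


final, snapshots = shell_sort([81, 94, 11, 96, 12, 35, 17, 95, 28, 58], (5, 3, 1))
for gap, state in snapshots:
    print(gap, state)
```

The state after gap 5 is 35 17 11 28 12 81 94 95 96 58 and the state after gap 3 is 28 12 11 35 17 81 58 95 96 94, matching the passes of the example; the final pass with gap 1 is an ordinary insertion sort on an almost sorted array.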
5.9. RADIX SORT
Radix sort is a simple method, similar to the one used when alphabetizing a large list of
names. Intuitively, one might want to sort numbers on their most significant digit. However, radix
sort works counter-intuitively by sorting on the least significant digits first. On the first pass,
all the numbers are sorted on the least significant digit and combined in an array. Then on the
second pass, the entire numbers are sorted again on the second least significant digit and combined
in an array, and so on.
Algorithm: Radix-Sort (list, n)
shift = 1
for loop = 1 to keysize do
   for entry = 1 to n do
      bucketnumber = (list[entry].key / shift) mod 10
      append (bucket[bucketnumber], list[entry])
   list = combinebuckets()
   shift = shift * 10
Analysis
Each key is looked at once for each digit (or letter, if the keys are alphabetic) of the longest
key. Hence, if the longest key has m digits and there are n keys, radix sort has order O(m.n).
However, if we look at these two values, the size of the keys will be relatively small when
compared to the number of keys. For example, if we have six-digit keys, we could have a million
different records.
Here, we see that the size of the keys is not significant, and this algorithm is of linear
complexity O(n).
Example
The following example shows how radix sort operates on seven 3-digit numbers.
Input   1st Pass   2nd Pass   3rd Pass
329     720        720        329
457     355        329        355
657     436        436        436
839     457        839        457
436     657        355        657
720     329        457        720
355     839        657        839
In the above example, the first column is the input. The remaining columns show the list
after successive sorts on increasingly significant digit positions. The code for radix sort
assumes that each element in an array A of n elements has d digits, where digit 1 is the
lowest-order digit and d is the highest-order digit.
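A least-significant-digit radix sort along the lines of the Radix-Sort algorithm can be sketched in Python. The sketch assumes non-negative integer keys and a known number of digits; the names radix_sort, digits and passes are introduced here for illustration only.

```python
def radix_sort(numbers, digits):
    """LSD radix sort for non-negative integers with at most 'digits' digits.

    Returns (sorted_list, passes), where passes[k] is the list after the
    pass on the (k + 1)-th least significant digit.
    """
    data = list(numbers)
    passes = []
    shift = 1
    for _ in range(digits):
        buckets = [[] for _ in range(10)]            # one bucket per digit 0-9
        for value in data:
            buckets[(value // shift) % 10].append(value)
        # Combine the buckets in order; order inside each bucket is preserved.
        data = [value for bucket in buckets for value in bucket]
        passes.append(list(data))
        shift *= 10
    return data, passes


result, passes = radix_sort([329, 457, 657, 839, 436, 720, 355], 3)
for number, state in enumerate(passes, start=1):
    print(number, state)
```

The three intermediate lists are the three pass columns of the seven-number example. Each pass touches every key once, so d passes over n keys give the O(d.n) cost discussed in the analysis.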
Example
To show how radix sort works, consider the following array:
var array = [88, 410, 1772, 20]
Radix sort relies on the positional notation of integers, as shown here:
thousands  hundreds  tens  ones
First, the array is divided into buckets based on the value of the least significant digit,
the ones digit:
0 | 410, 20
2 | 1772
8 | 88
These buckets are then emptied in order, resulting in the following partially-sorted array:
array = [410, 20, 1772, 88]
Next, repeat this procedure for the tens digit:
1 | 410
2 | 20
7 | 1772
8 | 88
The relative order of the elements didn't change this time, but you've still got more
digits to inspect.
The next digit to consider is the hundreds digit:
0 | 20, 88
4 | 410
7 | 1772
For values that have no hundreds position (or any other position without a value), the digit
will be assumed to be zero.
Reassembling the array based on these buckets gives the following:
array = [20, 88, 410, 1772]
Finally, consider the thousands digit:
0 | 20, 88, 410
1 | 1772
Reassembling the array from these buckets leads to the final sorted array:
array = [20, 88, 410, 1772]
When multiple numbers end up in the same bucket, their relative ordering doesn't change.
For example, in the zero bucket for the hundreds position, 20 comes before 88. This is because the
previous step put 20 in a lower bucket than 88, so 20 ended up before 88 in the array.
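The bucket contents at each digit position can be printed with a small helper. This is only a sketch for following the example by hand: bucket_by_digit and the place variable are hypothetical names, and the keys are assumed to be non-negative integers.

```python
def bucket_by_digit(numbers, place):
    """Group numbers by the digit at 'place' (1 = ones, 10 = tens, ...).

    Returns a dict {digit: [numbers...]} with digits in increasing order;
    numbers keep their incoming relative order inside each bucket.
    """
    buckets = {}
    for value in numbers:
        digit = (value // place) % 10      # a missing position counts as 0
        buckets.setdefault(digit, []).append(value)
    return dict(sorted(buckets.items()))


array = [88, 410, 1772, 20]
place = 1
while place <= max(array):
    buckets = bucket_by_digit(array, place)
    # Empty the buckets in digit order to rebuild the array.
    array = [value for digit in buckets for value in buckets[digit]]
    print(place, buckets, array)
    place *= 10
```

Because each bucket keeps the incoming order of its numbers, the ordering established by an earlier (less significant) digit survives later passes, which is exactly why 20 stays ahead of 88 in the hundreds pass.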
HASHING
Hash Table
The hash table data structure is an array of some fixed size, containing the keys. A key is a
value associated with each record.
Fig. 5.10 Hash Table
HASHING FUNCTION
A hashing function is a key-to-address transformation, which acts upon a given key to
compute the relative position of the key in an array.
A simple Hash function
HASH (KEYVALUE) = KEYVALUE MOD TABLESIZE
Example : - Hash (92)
Hash (92) = 92 mod 10 = 2
The keyvalue "92" is placed in the relative location "2".
Routine For Simple Hash Function
int Hash (char *key, int Tablesize)
{
    int Hashval = 0;
    while (*key != '\0')          /* add up the character codes of the key */
        Hashval += *key++;
    return Hashval % Tablesize;
}
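Both hash functions described above can be tried out in Python. The sketch below is illustrative: hash_mod and hash_string are names chosen here, and the string version sums character codes in the same spirit as the C routine.

```python
def hash_mod(key_value, table_size):
    """HASH(KEYVALUE) = KEYVALUE MOD TABLESIZE for integer keys."""
    return key_value % table_size


def hash_string(key, table_size):
    """Sum the character codes of the key, then reduce modulo table size."""
    hash_val = 0
    for ch in key:
        hash_val += ord(ch)
    return hash_val % table_size


print(hash_mod(92, 10))          # 2
print(hash_string("ab", 10))     # (97 + 98) % 10 = 5
print(hash_string("ba", 10))     # 5 as well: same characters, same sum
```

The last two calls show a weakness of the additive scheme: keys made of the same characters in a different order always land in the same slot, so two distinct keys can hash to one position.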
Some of the Methods of Hashing Function
1. Modulo Division
2. Mid-Square Method
3. Folding Method
4. Pseudo Random Method
5. Digit or Character Extraction Method
6. Radix Transformation
Collisions
A collision occurs when two or more elements are hashed (mapped) to the same value, i.e.
when two key values hash to the same position.
Collision Resolution
When two items hash to the same slot, there must be a systematic method for placing the second
item in the hash table. This process is called collision resolution.
Some of the Collision Resolution Techniques
1. Separate Chaining 2. Open Addressing 3. Multiple Hashing
SEPARATE CHAINING
Separate chaining is an open hashing technique. A pointer field is added to each record location.
When an overflow occurs, this pointer is set to point to overflow blocks, making a linked list.
In this method, the table can never overflow, since the linked lists are only extended upon the
arrival of new keys.
Insert : 10, 11, 81, 10, 7, 34, 94, 17
0 -> 10
1 -> 11 -> 81
2
3
4 -> 34 -> 94
5
6
7 -> 7 -> 17
8
9
Insertion
To perform the insertion of an element, traverse down the appropriate list to check whether
the element is already in place.
If the element is a new one, it is inserted either at the front of the list or at the end of the
list.
If it is a duplicate element, an extra field is usually kept to record it (for example, an
occurrence count).
INSERT 10
Hash (k) = k % Tablesize
Hash (10) = 10 % 10
Hash (10) = 0
INSERT 11
Hash (11) = 11 % 10
Hash (11) = 1
INSERT 81
Hash (81) = 81 % 10
Hash (81) = 1
The element 81 collides with 11 at the same hash value 1. To place the value 81 at this position,
perform the following:
Traverse the list to check whether it is already present.
Since it is not already present, insert it at the end of the list. Similarly, the rest of the
elements are inserted.
Routine To Perform Insertion
void Insert (int key, Hashtable H)
{
    Position Pos, Newcell;
    List L;
    /* Traverse the list to check whether the key is already present */
    Pos = Find (key, H);
    if (Pos == NULL)    /* Key is not found */
    {
        Newcell = malloc (sizeof (struct ListNode));
        if (Newcell != NULL)
        {
            L = H->Thelists [Hash (key, H->Tablesize)];
            Newcell->Next = L->Next;
            Newcell->Element = key;
            /* Insert the key at the front of the list */
            L->Next = Newcell;
        }
    }
}
Find Routine
Position Find (int key, Hashtable H)
{
    Position P;
    List L;
    L = H->Thelists [Hash (key, H->Tablesize)];
    P = L->Next;
    /* Walk the chain until the key is found or the list ends */
    while (P != NULL && P->Element != key)
        P = P->Next;
    return P;
}
Advantage
More elements can be inserted, as it uses an array of linked lists.
Disadvantages of Separate Chaining
* It requires pointers, which occupy more memory space.
* It takes more effort to perform a search, since it takes time to evaluate the hash function
and also to traverse the list.
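Separate chaining can be sketched compactly in Python. The sketch makes several simplifying assumptions that go beyond the notes: the class name ChainedHashTable is hypothetical, each chain is a Python list standing in for a linked list, new keys are appended at the end of the chain, and a duplicate key is simply ignored rather than counted.

```python
class ChainedHashTable:
    """Hash table with separate chaining (Python lists stand in for linked lists)."""

    def __init__(self, table_size=10):
        self.table_size = table_size
        self.lists = [[] for _ in range(table_size)]   # one chain per slot

    def _hash(self, key):
        return key % self.table_size                   # Hash(k) = k % Tablesize

    def find(self, key):
        """Return True if key is stored in its chain."""
        return key in self.lists[self._hash(key)]

    def insert(self, key):
        """Append key to the end of its chain unless it is already present."""
        chain = self.lists[self._hash(key)]
        if key not in chain:
            chain.append(key)


table = ChainedHashTable(10)
for key in [10, 11, 81, 10, 7, 34, 94, 17]:
    table.insert(key)

for slot, chain in enumerate(table.lists):
    print(slot, chain)
```

Slots 1, 4 and 7 each end up with a two-element chain (11 and 81, 34 and 94, 7 and 17), and the second insertion of 10 leaves slot 0 unchanged. A lookup costs one hash evaluation plus a walk along a single chain.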