Unit 5 Sort and Search
Sorting
Process of arranging elements in a particular order (e.g., ascending or descending)
Useful in searching (e.g., binary search requires sorted data)
Useful as a prerequisite in many algorithms (uniqueness check, etc.),
and therefore improves the efficiency of those algorithms.
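To illustrate sorting as a prerequisite, here is a minimal C sketch of a uniqueness check (the helper names `all_unique` and `cmp_int` are hypothetical). After sorting with the standard library's `qsort`, equal values sit next to each other, so a single linear pass finds any duplicate. This costs O(n log n) overall instead of the O(n²) of comparing every pair.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for ints; avoids the overflow risk of returning a - b */
static int cmp_int(const void *pa, const void *pb)
{
    int a = *(const int *)pa, b = *(const int *)pb;
    return (a > b) - (a < b);
}

/* Returns 1 if all n values are distinct, 0 otherwise.
   Sorts the array in place: equal values become adjacent,
   so comparing each element with its neighbour is enough. */
int all_unique(int a[], int n)
{
    assert(n >= 0);
    qsort(a, (size_t)n, sizeof a[0], cmp_int);
    for (int i = 1; i < n; i++)
        if (a[i] == a[i - 1])
            return 0;
    return 1;
}
```

Note that the array is reordered in place; copy it first if the original order matters.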
Insertion Sort
The list is divided into two groups: sorted and unsorted.
Initially, the sorted group contains only the first element, and the rest of the
elements are in the unsorted group.
One element of the unsorted group is taken at a time and inserted into its
correct position in the sorted group.
Example: (figure: successive passes of insertion sort on a sample list of integers)
Algorithm
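for (i = 1; i <= n-1; i++)
{
    ele = a[i];                      /* next element of the unsorted group */
    for (j = i-1; j >= 0 && ele < a[j]; j--)
    {
        a[j+1] = a[j];               /* shift larger sorted element right */
    }
    a[j+1] = ele;                    /* insert into its correct place */
}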
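The corrected algorithm can be written as a self-contained C function. This is a sketch; the function name `insertion_sort` is an assumption, and `a[0..i-1]` plays the role of the sorted group.

```c
#include <assert.h>

/* Insertion sort, ascending. Each pass takes a[i] from the unsorted
   group and shifts larger elements of the sorted group a[0..i-1] one
   place right, then drops a[i] into the gap. */
void insertion_sort(int a[], int n)
{
    assert(n >= 0);
    for (int i = 1; i < n; i++) {
        int ele = a[i];          /* element being inserted */
        int j = i - 1;
        while (j >= 0 && a[j] > ele) {
            a[j + 1] = a[j];     /* shift larger element right */
            j--;
        }
        a[j + 1] = ele;          /* place element in its slot */
    }
}
```

Using `>` rather than `>=` in the loop keeps equal keys in their original order, so the sort is stable.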
BUCKET SORT
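Radix sort distributes keys into buckets one digit at a time. The input below is an array
of 14 integers. For base-10 integers, the number of buckets is 10, from 0 to 9. The first
pass distributes the keys into buckets by the least significant digit (LSD), and the
buckets are then merged in order from 0 to 9. The result of the first pass is: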
RADIX SORT - Example
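Input: 100, 150, 65, 25, 19, 8, 4, 67, 73, 90, 128, 248, 328, 440

Pass 1 (units digit):
Bucket 0: 100, 150, 90, 440
Bucket 1: (empty)
Bucket 2: (empty)
Bucket 3: 73
Bucket 4: 4
Bucket 5: 65, 25
Bucket 6: (empty)
Bucket 7: 67
Bucket 8: 8, 128, 248, 328
Bucket 9: 19

Merge (buckets 0 to 9 in order):
100 150 090 440 073 004 065 025 067 008 128 248 328 019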
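A single pass of the example can be checked in C. The sketch below (the function `radix_pass` and the limit `MAXN` are hypothetical) distributes non-negative keys by the digit at place value `p` and merges buckets 0 to 9. With `p = 1` it reproduces the merged line above, and repeating with `p = 10` and `p = 100` completes the sort for keys below 1000.

```c
#include <assert.h>

#define MAXN 64   /* hypothetical capacity limit per call */

/* One LSD radix-sort pass over a[0..n-1] using the decimal digit at
   place value p (1 = units, 10 = tens, ...). Order inside a bucket is
   the arrival order, which is what makes the later passes correct. */
void radix_pass(int a[], int n, int p)
{
    int bucket[10][MAXN];
    int count[10] = {0};
    int m = 0;

    assert(n >= 0 && n <= MAXN && p > 0);
    /* distribute: append each key to the bucket of its current digit */
    for (int k = 0; k < n; k++) {
        assert(a[k] >= 0);
        int d = (a[k] / p) % 10;
        bucket[d][count[d]++] = a[k];
    }
    /* merge: read buckets 0..9 in order back into a[] */
    for (int d = 0; d < 10; d++)
        for (int k = 0; k < count[d]; k++)
            a[m++] = bucket[d][k];
}
```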
Radix Sort – Program
#include <stdio.h>

#define MAX 50

/* LSD radix sort of up to MAX non-negative integers, one decimal digit per pass */
int main(void)
{
    int sorted[MAX], bucket[10][MAX], count[10];
    int j, k, m, flag, N;
    long long p;

    printf("\nEnter the number of elements to be sorted (1-%d): ", MAX);
    if (scanf("%d", &N) != 1 || N < 1 || N > MAX)
        return 1;
    printf("\nEnter the non-negative elements to be sorted:\n");
    for (k = 0; k < N; k++)
        if (scanf("%d", &sorted[k]) != 1 || sorted[k] < 0)
            return 1;

    /* flag counts keys with no digits at place value p or higher
       (key / p == 0). When that holds for all N keys, the sort is done. */
    for (p = 1, flag = 0; flag != N; p *= 10)
    {
        flag = 0;
        for (j = 0; j < 10; j++)
            count[j] = 0;

        /* distribute: append each key to the bucket of its current digit */
        for (k = 0; k < N; k++)
        {
            int digit = (int)((sorted[k] / p) % 10);
            bucket[digit][count[digit]++] = sorted[k];
            if (sorted[k] / p == 0)
                flag++;
        }

        /* collect: concatenate buckets 0..9, keeping order within each */
        for (j = 0, m = 0; j < 10; j++)
            for (k = 0; k < count[j]; k++)
                sorted[m++] = bucket[j][k];
    }

    printf("\nSorted List:\n");
    for (j = 0; j < N; j++)
        printf("%d\t", sorted[j]);
    printf("\n");
    return 0;
}
Address Calculation Sort
• This can be one of the fastest distributive sorting techniques if
enough space is available. It is also called sorting by hashing.
• In this algorithm, a hash function is applied to each element in
the list. The result of the hash function gives the address in the
table where the key is placed.
• Linked lists are used as the address table for storing keys (one
linked list per hash address).
• The linked lists into which the hash function places the elements are
called subfiles. An item is placed into a subfile in correct sequence by
using any sorting method. After all the elements are placed into subfiles,
the lists (subfiles) are concatenated to produce the sorted list.
Procedure:
1. In this method a hash function f is applied to each key.
2. The result of this function determines into which of the
several subfiles the record is to be placed. The function should
have the property that if x <= y, then f(x) <= f(y). Such a
function is called order-preserving.
3. An item is placed into a subfile in correct sequence by using
any sorting method – simple insertion is often used.
4. After all the elements are placed into subfiles, the lists
(subfiles) are concatenated to produce the sorted list.
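A minimal C sketch of the procedure, assuming keys in the range 0 to 99 and the order-preserving hash f(x) = x / 10 (the tens digit), which gives 10 subfiles. The names `address_calc_sort`, `addr_hash` and `NSUB` are hypothetical. Each subfile is a singly linked list kept in order by simple insertion, and the lists are concatenated at the end.

```c
#include <assert.h>
#include <stdlib.h>

#define NSUB 10  /* number of subfiles (one linked list per hash address) */

struct node { int key; struct node *next; };

/* Assumed order-preserving hash for keys 0..99: the tens digit.
   If x <= y then x / 10 <= y / 10, so subfile order follows key order. */
static int addr_hash(int x) { return x / 10; }

/* Address calculation sort of a[0..n-1]; keys must lie in 0..99. */
void address_calc_sort(int a[], int n)
{
    struct node *sub[NSUB] = {NULL};
    int m = 0;

    for (int i = 0; i < n; i++) {
        assert(a[i] >= 0 && a[i] <= 99);
        struct node *nd = malloc(sizeof *nd);
        if (nd == NULL)
            abort();
        nd->key = a[i];
        /* simple insertion: stop at the first node with a larger key */
        struct node **pp = &sub[addr_hash(a[i])];
        while (*pp != NULL && (*pp)->key <= a[i])
            pp = &(*pp)->next;
        nd->next = *pp;
        *pp = nd;
    }
    /* concatenate subfiles 0..NSUB-1 back into a[], freeing the nodes */
    for (int s = 0; s < NSUB; s++) {
        struct node *p = sub[s];
        while (p != NULL) {
            struct node *next = p->next;
            a[m++] = p->key;
            free(p);
            p = next;
        }
    }
}
```

The speed of this method depends on the hash spreading keys evenly; if most keys fall into one subfile, it degenerates into a single insertion sort.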
SEARCHING TECHNIQUES
Hashing
Table 1. Records of employees
Common hash functions:
• Division Method
• Multiplication Method
• Mid-Square Method
• Folding Method
Division Method: This method divides x by M and then uses
the remainder obtained.
h(x) = x mod M
Example: Given a hash table of size 1000, map the key 12345 to
an appropriate location in the hash table.
Solution: h(12345) = 12345 mod 1000 = 345
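The division method is a one-line computation. A hedged C sketch (`hash_division` is a hypothetical name):

```c
#include <assert.h>

/* Division method: h(x) = x mod M for a table of M slots (M > 0). */
unsigned long hash_division(unsigned long x, unsigned long M)
{
    assert(M > 0);
    return x % M;
}
```

For the example, `hash_division(12345, 1000)` evaluates to 345.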
Mid-Square Method:
Step 1: Square the value of the key. That is, find k².
Step 2: Extract the middle r digits of the result obtained in Step 1.
Example: Calculate the hash value for keys 1234 and 5642 using
the mid-square method. The hash table has 100 memory locations.
Solution: The hash table has 100 memory locations whose
indices vary from 0 to 99.
This means that only two digits are needed to map the key to a
location in the hash table, so r = 2.
When k = 1234, k² = 1522756, h(1234) = 27
When k = 5642, k² = 31832164, h(5642) = 21
Observe that the 3rd and 4th digits starting from the right are
chosen.
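The mid-square example can be reproduced in C. This sketch hard-codes r = 2 and the choice of the 3rd and 4th digits from the right, as in the example; `hash_mid_square` is a hypothetical name, and keys are assumed small enough that k² fits in a `long long`.

```c
#include <assert.h>

/* Mid-square hash for a 100-slot table (r = 2): square the key, drop the
   two lowest digits, and keep the next two (3rd and 4th from the right). */
int hash_mid_square(long long k)
{
    assert(k >= 0 && k <= 3037000499LL);  /* so k * k fits in long long */
    long long sq = k * k;
    return (int)((sq / 100) % 100);
}
```

Which digits count as the "middle" is a design choice that must stay fixed for every key; for keys whose squares have different lengths, a fixed position from the right is simpler than a true centre.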
Folding Method:
Step 1: Divide the key value into a number of parts. That is, divide
k into parts k1, k2, ..., kn, where each part has the same number
of digits except the last part, which may have fewer digits than the
other parts.
Step 2: Add the individual parts. That is, obtain the sum of k1+ k2
+ ... + kn. The hash value is produced by ignoring the last carry, if
any.
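Example: Given a hash table of 100 locations, calculate
the hash value using the folding method for keys 5678, 321,
and 34567.
Solution: The table has 100 locations (indices 0 to 99), so each part has two digits.
5678: 56 + 78 = 134; ignoring the carry, h(5678) = 34
321: 32 + 1 = 33, so h(321) = 33
34567: 34 + 56 + 7 = 97, so h(34567) = 97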
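A C sketch of the folding method (hypothetical name `hash_fold`), assuming the key is split from the left into parts of `r` digits, which is how the example splits 34567 into 34, 56 and 7. Ignoring the last carry is the same as taking the sum modulo 10^r.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Folding hash: split the decimal digits of k, left to right, into parts
   of r digits (the last part may be shorter), add the parts, and drop any
   carry beyond r digits by reducing modulo 10^r. */
int hash_fold(unsigned long k, int r)
{
    char digits[32];
    int mod = 1, sum = 0;

    assert(r >= 1 && r <= 4);          /* keeps every sum well inside int */
    for (int i = 0; i < r; i++)
        mod *= 10;
    snprintf(digits, sizeof digits, "%lu", k);
    size_t len = strlen(digits);
    for (size_t start = 0; start < len; start += (size_t)r) {
        int part = 0;
        for (size_t i = start; i < start + (size_t)r && i < len; i++)
            part = part * 10 + (digits[i] - '0');
        sum = (sum + part) % mod;      /* drop the carry as we go */
    }
    return sum;
}
```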
Collision Resolution Technique
• Collisions occur when the hash function maps two different keys to the
same location. Obviously, two records cannot be stored in the same
location.
• To solve the problem of collision, a collision resolution technique is
applied. The two most popular methods of resolving collisions are:
1. Open addressing
2. Chaining
• The hash table contains two types of values: sentinel values (e.g., –1) and
data values. The presence of a sentinel value indicates that the location
contains no data value at present but can be used to hold a value.
• The process of examining memory locations in the hash table is called
probing.
Collision Resolution by Open addressing
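Open addressing can be implemented using linear probing,
quadratic probing, double hashing, and rehashing.
Linear Probing: if a value is already stored at the location generated by
h'(k) = k mod m, the following hash function is used to resolve the collision:
h(k, i) = [h'(k) + i] mod m
where m is the size of the hash table and i is the probe number that varies
from 0 to m–1.
Example (hash table of size 10): insert the key 101.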
Key = 101
h(101, 0) = (101 mod 10 + 0) mod 10 = (1) mod 10 = 1
T[1] is occupied, so the key 101 cannot be stored in T[1]. Therefore, probe the
next location, this time with i = 1.
Key = 101
h(101, 1) = (101 mod 10 + 1) mod 10 = (1 + 1) mod 10 = 2
T[2] is also occupied, so the key cannot be stored in this location. The procedure is
repeated until the hash function generates the address of location 8, which is vacant
and can be used to store the value.
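Linear probing insertion can be sketched in C as follows, using -1 as the sentinel for a vacant slot. The function name `lp_insert` is hypothetical, and since the full key list for this example is not shown, the test assumes the keys 72, 27, 36, 24, 63, 81 and 92 were inserted first; with those keys, 101 lands in T[8], matching the text.

```c
#include <assert.h>

#define EMPTY (-1)   /* sentinel: slot holds no key */

/* Linear probing insert into T[0..m-1]:
   h(k, i) = (k mod m + i) mod m for i = 0..m-1.
   Returns the slot used, or -1 if every slot is occupied. Keys >= 0. */
int lp_insert(int T[], int m, int k)
{
    assert(m > 0 && k >= 0);
    for (int i = 0; i < m; i++) {
        int slot = (k % m + i) % m;
        if (T[slot] == EMPTY) {
            T[slot] = k;
            return slot;
        }
    }
    return -1;
}
```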
Searching a Value using Linear Probing
• While searching for a value in a hash table, the array index is
recomputed and the key of the element stored at that location is
compared with the value that has to be searched.
• If a match is found, the search operation is successful.
• If the key does not match, the search function begins a
sequential search of the array that continues until:
  - the value is found, or
  - the search function encounters a vacant location in the array,
    indicating that the value is not present, or
  - the search function terminates because it reaches the end of the
    table and the value is not present.
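A matching search sketch in C (hypothetical name `lp_search`). As a design choice, it wraps around to the start of the table and stops after examining all m slots, rather than stopping at the physical end. The table contents in the test are an assumption: the state after inserting 72, 27, 36, 24, 63, 81, 92 and 101 with linear probing.

```c
#include <assert.h>

#define EMPTY (-1)   /* sentinel: slot holds no key */

/* Linear probing search: recompute h(k, 0) = k mod m and step forward one
   slot at a time. Returns the key's index, or -1 if a vacant slot is met
   or all m slots have been examined. Assumes no keys were ever deleted,
   so a vacant slot really does end the probe sequence. */
int lp_search(const int T[], int m, int k)
{
    assert(m > 0 && k >= 0);
    for (int i = 0; i < m; i++) {
        int slot = (k % m + i) % m;
        if (T[slot] == k)
            return slot;
        if (T[slot] == EMPTY)
            return -1;
    }
    return -1;
}
```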
Quadratic Probing: In this technique, if a value is already stored at the
location generated by h(k), the following hash function is used to
resolve the collision:
h(k, i) = [h'(k) + c1 × i + c2 × i²] mod m
where m is the size of the hash table, h'(k) = k mod m, i is the probe
number that varies from 0 to m–1, and c1 and c2 are constants such that
c1, c2 ≠ 0.
Example: Consider a hash table of size 10. Using quadratic probing, insert
the keys 72, 27, 36, 24, 63, 81, and 101 into the table. Take c1 = 1 and c2 = 3.
Let h'(k) = k mod m, m = 10, h(k, i) = [h'(k) + c1 × i + c2 × i²] mod m
Key = 72
h(72, 0) = [72 mod 10 + 1 × 0 + 3 × 0] mod 10 = [72 mod 10] mod 10 = 2 mod 10 = 2
Key = 101
h(101, 0) = [101 mod 10 + 1 × 0 + 3 × 0] mod 10 = [101 mod 10 + 0] mod 10 = 1 mod 10 = 1
Since T[1] is already occupied, the key 101 cannot be stored in T[1]. Therefore, probe the
next location, this time with i = 1.
Key = 101
h(101, 1) = [101 mod 10 + 1 × 1 + 3 × 1] mod 10 = [101 mod 10 + 1 + 3] mod 10
= [101 mod 10 + 4] mod 10 = [1 + 4] mod 10 = 5 mod 10 = 5
Since T[5] is vacant, the key 101 is stored in T[5].
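A C sketch of quadratic probing with the example's parameters (m = 10, c1 = 1, c2 = 3). The name `qp_insert` is hypothetical. Because quadratic probing need not visit every slot, the sketch gives up after m probes and returns -1.

```c
#include <assert.h>

#define EMPTY (-1)   /* sentinel: slot holds no key */

/* Quadratic probing insert into T[0..m-1]:
   h(k, i) = [k mod m + c1*i + c2*i*i] mod m, for i = 0..m-1.
   Returns the slot used, or -1 if no vacant slot was reached. Keys >= 0. */
int qp_insert(int T[], int m, int k, int c1, int c2)
{
    assert(m > 0 && k >= 0 && c1 >= 0 && c2 >= 0);
    for (long i = 0; i < m; i++) {
        int slot = (int)((k % m + c1 * i + c2 * i * i) % m);
        if (T[slot] == EMPTY) {
            T[slot] = k;
            return slot;
        }
    }
    return -1;
}
```

Inserting 72, 27, 36, 24, 63 and 81 fills T[2], T[7], T[6], T[4], T[3] and T[1] without collisions, so 101 collides at T[1] and moves to T[5] on the second probe.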
Double Hashing: Double hashing uses one hash value and then
repeatedly steps forward by an interval until an empty location is reached.
The interval is decided using a second, independent hash function,
hence the name double hashing. In double hashing, we use two hash
functions rather than a single function. The hash function in the case
of double hashing can be given as:
h(k, i) = [h1(k) + i × h2(k)] mod m
where m is the size of the hash table, h1(k) and h2(k) are two hash
functions given as h1(k) = k mod m and h2(k) = k mod m', i is the probe
number that varies from 0 to m–1, and m' is chosen to be less than m.
We can choose m' = m–1 or m–2.
Example: Consider a hash table of size = 10. Using double hashing,
insert the keys 72, 27, 36, 24, 63, 81, 92, and 101 into the table.
Take h1(k) = k mod 10 and h2(k) = k mod 8.
We have h(k, i) = [h1(k) + i × h2(k)] mod m
Key = 72
h(72, 0) = [72 mod 10 + (0 × 72 mod 8)] mod 10 = [2 + (0 × 0)] mod 10 = 2 mod 10 = 2
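A C sketch of double hashing with h1(k) = k mod m and h2(k) = k mod m' (hypothetical name `dh_insert`, with the parameter `m2` standing for m'). Like the quadratic sketch, it stops after m probes and returns -1 if no vacant slot was reached.

```c
#include <assert.h>

#define EMPTY (-1)   /* sentinel: slot holds no key */

/* Double hashing insert into T[0..m-1]:
   h(k, i) = [h1(k) + i * h2(k)] mod m, h1(k) = k mod m, h2(k) = k mod m2,
   for i = 0..m-1. Returns the slot used, or -1 on failure. Keys >= 0. */
int dh_insert(int T[], int m, int m2, int k)
{
    assert(m > 0 && m2 > 0 && k >= 0);
    int h1 = k % m, h2 = k % m2;
    for (int i = 0; i < m; i++) {
        int slot = (int)((h1 + (long)i * h2) % m);
        if (T[slot] == EMPTY) {
            T[slot] = k;
            return slot;
        }
    }
    return -1;
}
```

Running the rest of the example through this sketch is instructive: 92 goes to T[0] (h2(92) = 4, probing 2, 6, 0), but for 101, h2(101) = 5 and the probe sequence alternates between T[1] and T[6], both occupied, so the key cannot be placed. In general h2(k) must be nonzero and relatively prime to m for the probe sequence to reach every slot; with m = 10 and h2(k) = k mod 8 that is not guaranteed.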
Each file has a list of attributes associated with it that gives the
operating system and the application software information about the
file and how it is intended to be used.
Attributes Flag
A file can have six additional attributes attached to it. These attributes
are usually stored in a single byte, with each bit representing a
specific attribute.
• Read-only: A file marked as read-only cannot be deleted or modified.
• Volume Label: Every disk volume is assigned a label for identification. The
label can be assigned at the time of formatting the disk or later through
various tools such as the DOS command LABEL.
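The attribute byte can be manipulated with bit masks. The C sketch below uses hypothetical constant and function names; the bit values follow the MS-DOS FAT directory-entry attribute byte (read-only, hidden, system, volume label, directory, archive), one concrete example of the six attributes mentioned above.

```c
#include <assert.h>

/* Hypothetical names; bit values as in the MS-DOS FAT attribute byte. */
enum {
    ATTR_READ_ONLY = 0x01,
    ATTR_HIDDEN    = 0x02,
    ATTR_SYSTEM    = 0x04,
    ATTR_VOLUME    = 0x08,
    ATTR_DIRECTORY = 0x10,
    ATTR_ARCHIVE   = 0x20
};

/* Nonzero if the given attribute bit is set. */
int has_attr(unsigned char attrs, unsigned char flag)
{
    return (attrs & flag) != 0;
}

/* Return attrs with the given bit turned on. */
unsigned char set_attr(unsigned char attrs, unsigned char flag)
{
    return (unsigned char)(attrs | flag);
}

/* Return attrs with the given bit turned off. */
unsigned char clear_attr(unsigned char attrs, unsigned char flag)
{
    return (unsigned char)(attrs & ~flag);
}
```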
Binary file:
• contains any type of data encoded in binary form for computer storage
and processing purposes.
• provides efficient storage of data, but can be read only through an
appropriate program.
• is not human-readable.
BASIC FILE OPERATIONS
FILE ORGANIZATION
Example: with base address 1000 and a fixed record length of 20, the address of record 5 is
1000 + (5–1) * 20
= 1000 + 80
= 1080
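The calculation above follows the usual address formula for fixed-length records, address = base + (n - 1) × record length. A hedged C sketch (`record_address` is a hypothetical name; records are numbered from 1):

```c
#include <assert.h>

/* Address of record number n (1-based) in a file of fixed-length records. */
long record_address(long base, long n, long record_length)
{
    assert(n >= 1 && record_length > 0);
    return base + (n - 1) * record_length;
}
```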
Indexed Sequential File Organization
• Indexed sequential file organization stores data for fast retrieval. The
records in an indexed sequential file are of fixed length and every
record is uniquely identified by a key field.
• It maintains a table known as the index table, which stores the record
number (key) and the address of every record.
• This type of file organization is called indexed sequential file
organization because physically the records may be stored anywhere,
but the index table stores the address of those records.
• An indexed sequential file combines the concepts of sequential and
relative files.
• While the index table is read sequentially to find the address of the
desired record, a direct access is made to the address of the specified
record in order to access it randomly.
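The two-step access (read the index table sequentially, then go directly to the address) can be sketched in C. The struct and function names are hypothetical, and addresses are plain numbers rather than real disk offsets.

```c
#include <assert.h>

/* One entry of the index table: a record key and the record's address. */
struct index_entry { int key; long address; };

/* Read the index table sequentially for key. Returns the record's address,
   which the caller then uses for direct access, or -1 if not indexed. */
long isam_lookup(const struct index_entry idx[], int n, int key)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (idx[i].key == key)
            return idx[i].address;
    return -1;
}
```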
INDEXING