Chapter 9 Searching

This document provides an overview of searching algorithms and data structures. It discusses linear search, binary search, and tree search. Linear search is applicable for unsorted data and has a time complexity of O(n). Binary search is for sorted data and has a time complexity of O(log n). Tree search uses binary search trees and has an average time complexity of O(log n). The document also briefly covers hashing techniques.


DATA STRUCTURES AND ALGORITHMS
CHAPTER 9 (SEARCHING)

By: Er. Santosh Panjiyar


Searching
 The process of retrieving some particular information
from a large amount of previously stored information.
 The information can be in sorted or unsorted form.
 Normally we think of the information as divided up into
records, each record having a key for use in
searching.
 The goal of the search is to find all records with keys
matching a given search key.
 The purpose of the search is usually to access
information within the record for processing.
 It is not compulsory that the searched key values are
found in the records.
Searching (Types)
 Searching generally falls in two categories:
 Internal Search
 External Search
 Internal Search:
 If the data to be searched are all present in main memory, then the
searching becomes internal search
 Internal Searches are faster than External Search and hence are
recommended whenever possible
 They are mainly done among the data which occupy less space
compared to the space of RAM
 External Search:
 If most of the data to be searched are in auxiliary memory, then the
search becomes External Search.
 If the data are very large and our main memory is not large enough
to hold them all during the process, then the external search is used.
Searching Algorithms
• Several searching algorithms exist
• The right choice depends on how the data are arranged
• An algorithm performs better than another only when the data are arranged in the form it expects
• We are going to study the following 3 searching techniques:
  • Linear Search: for unsorted data in a linear structure
  • Binary Search: for sorted data in a linear structure
  • Tree Search: for data maintained in search trees
Linear Search
• Also called Sequential Search
• The simplest of all the searching techniques
• Applicable to data organized as an array or a linked list
• Practical mainly for small amounts of data
• Each element in the array is compared with the value to be searched
• If an element matches, the search is successful
• Otherwise the comparison continues until every element has been compared
• If no element has matched by the end of the array, the search is considered unsuccessful
Linear Search (Example)
[Figures: one search in which the item is found and one in which it is not found]
Linear Search (Algorithm)
Declare and initialize necessary variables
    n; a[n]; item (the value to be searched in the array)
    flag=0 (records whether the search has succeeded)
For i=0 to n-1
    if a[i]=item
        display “Search Successful”
        flag=1
        stop
    end if
End for
If flag=0
    display “Search Unsuccessful”
End if
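The pseudocode above can be sketched in Python as follows. This is only an illustration: the function name linear_search and the sample list are assumptions, not part of the slides, and the function returns an index (or -1) instead of displaying a message.

```python
def linear_search(a, item):
    """Return the index of the first element equal to item, or -1 if absent."""
    for i, value in enumerate(a):
        if value == item:      # match found: the search is successful
            return i
    return -1                  # every element was compared without a match


data = [15, 3, 42, 8, 23]      # sample values, not taken from the slides
print(linear_search(data, 42))
print(linear_search(data, 7))
```

Returning from inside the loop plays the role of the flag and the stop statement in the pseudocode.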
Linear Search
• Advantages:
  • Simple, and well suited to searching small amounts of data
  • Works on unsorted data
• Disadvantages:
  • Slower than the other searching algorithms
  • Practical only for small amounts of data
Efficiency of Sequential search
• Best case: the desired element is in the first position of the array, so only one comparison is made. The time complexity is O(1).
• Average case: assume the item is equally likely to appear at any position in the array. The number of comparisons can then be any of 1, 2, 3, …, n, each with probability p = 1/n, so
    T(n) = 1·(1/n) + 2·(1/n) + … + n·(1/n)
         = (1/n)(1 + 2 + 3 + … + n)
         = (1/n) · n(n+1)/2
         = (n+1)/2
  which is O(n).
• Worst case: the item is the last element in the array or is not there at all. In either situation every element is compared, so
    T(n) = n
  and the worst-case complexity is O(n).
• An unsuccessful search always takes n comparisons. Except in the best case, the number of comparisons is O(n).
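As a quick check of the average-case figure, the following sketch counts the comparisons a sequential search makes for every possible position of the item and averages them. The helper name comparisons_to_find and the array size n = 9 are assumptions chosen for illustration.

```python
def comparisons_to_find(a, item):
    """Count how many elements a sequential search compares before stopping."""
    count = 0
    for value in a:
        count += 1
        if value == item:
            break
    return count


n = 9
a = list(range(n))                      # n distinct values, one per position
average = sum(comparisons_to_find(a, x) for x in a) / n
print(average, (n + 1) / 2)             # both values should agree
print(comparisons_to_find(a, -1))       # unsuccessful search: n comparisons
```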
Binary Search
• Used when the data items are in sorted form (ascending or descending)
• Much more efficient than linear search on sorted data
• The key value is compared with the middle element of the list
  • If the values are equal, the search is successful and the process stops
  • If the middle value is less than the key, the key can only be in the upper half of the list (for ascending order)
  • If the middle value is greater than the key, the key can only be in the lower half of the list
• The search is repeated on the lower or upper half until the required value is found or no items remain to be searched
Binary Search (Example)
[Figures: two worked searches, one for ‘x’ and one for ‘4’, each taking 4 iterations in total]
Binary Search (Algorithm)
1) Declare and initialize necessary variables: n, a[n], item, first=0, last=n-1, middle=int((first+last)/2)
2) Repeat steps (3) and (4) while ((first<=last) && (a[middle]!=item))
3) If (item<a[middle])
       last=middle - 1
   else
       first=middle + 1
4) middle=int((first+last)/2)
5) If (first<=last)
       print “search successful”
   else
       print “search unsuccessful”
6) Exit
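A minimal Python sketch of the steps above, assuming the array is sorted in ascending order. The function name binary_search and the sample data are illustrative only; the function returns the index of the item, or -1 when it is absent.

```python
def binary_search(a, item):
    """Search a list sorted in ascending order; return an index or -1."""
    first, last = 0, len(a) - 1
    while first <= last:
        middle = (first + last) // 2
        if a[middle] == item:
            return middle            # search successful
        if item < a[middle]:
            last = middle - 1        # continue in the lower half
        else:
            first = middle + 1       # continue in the upper half
    return -1                        # first > last: search unsuccessful


data = [2, 4, 7, 10, 13, 18, 21, 30]   # sample sorted values
print(binary_search(data, 4))
print(binary_search(data, 5))
```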
Binary Search
• Binary search is best suited if the data
  • are present in sorted form and
  • are represented in an array or list
• The main drawbacks are:
  • Requires the data to be already sorted
  • Cannot be used where there are many insertions or deletions
Complexity of binary search
• Each comparison in binary search halves the size of the search area. In the worst case the method therefore needs about log2(n) + 1 comparisons, where n is the total number of items.
• Thus the binary search algorithm is O(log2 n), usually written O(log n).
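The worst-case bound can be checked empirically. The sketch below counts loop iterations of a binary search over every item of a sorted array and compares the maximum with floor(log2 n) + 1, which for a positive integer n is exactly n.bit_length() in Python. The helper name count_iterations and the size n = 1000 are assumptions made for illustration.

```python
def count_iterations(a, item):
    """Return how many middle elements a binary search inspects."""
    first, last = 0, len(a) - 1
    iterations = 0
    while first <= last:
        iterations += 1
        middle = (first + last) // 2
        if a[middle] == item:
            break
        if item < a[middle]:
            last = middle - 1
        else:
            first = middle + 1
    return iterations


n = 1000
a = list(range(n))
worst = max(count_iterations(a, x) for x in a)
print(worst, n.bit_length())   # n.bit_length() equals floor(log2 n) + 1
```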
Tree Search
• If the data are arranged in a search tree structure, tree search can be applied
• The search tree is generally a Binary Search Tree
• The key value is first compared with the root node
  • If it matches, the search is successful
  • If it does not match, either the left or the right subtree is searched, depending on the comparison result
  • If the data item is less than the root node, the left subtree is searched
  • If the data item is greater than the root node, the right subtree is searched
• The traversal is repeated until the searched item is found or a null value is reached
Tree Search (Example)
Q. Search for 31
1) Compare key 31 with root value 11: 31>11, so move to right
2) Compare key 31 with value 19: 31>19, so move to right
3) Compare key 31 with value 43: 43>31, so move to left
4) Found 31: Search Successful
Tree Search (Algorithm)
Declare and initialize necessary variables
    node=root (node pointer pointing to the root of the tree); item (data to be searched); flag=0
If (node=NULL)
    display “Empty Tree”; stop;
End if
While (node!=NULL)
    if (item=node->data)
        display “Search Successful”; flag=1; stop;
    else if (item>node->data)
        node=node->right;
    else
        node=node->left;
    end if
End while
If (flag=0)
    display “Search Unsuccessful”
End if
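The algorithm above can be sketched in Python as follows. The Node class and the function name tree_search are assumptions for illustration. The sample tree contains only the nodes on the search path of the example (11, 19, 43, 31); the tree drawn on the original slide presumably had more nodes.

```python
class Node:
    """A binary search tree node."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def tree_search(root, item):
    """Return True if item is stored in the binary search tree rooted at root."""
    node = root
    while node is not None:
        if item == node.data:
            return True
        # Larger items live in the right subtree, smaller ones in the left.
        node = node.right if item > node.data else node.left
    return False                      # a null link was reached


# Only the nodes visited in the example: 11 -> 19 -> 43 -> 31
root = Node(11, right=Node(19, right=Node(43, left=Node(31))))
print(tree_search(root, 31))
print(tree_search(root, 20))
```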
Efficiency of Binary search tree
• The time required to search a binary search tree varies between O(log n) and O(n), depending on the structure of the tree.
• If the records are inserted in random order, reasonably balanced trees result more often than not, so that on average the search time remains O(log n).
Hashing
• Hashing is the technique of representing longer records by shorter values called keys
• The keys are placed in a table called the hash table, where they are compared to find the records
• A simple search scheme in which the records are indexed using a hash function is called hashing
• A hash table is a dictionary in which keys are mapped to array positions by a hash function
• A searched item can be accessed directly through the hash table, because its key maps to the position of its record
• Hashing requires a minimal number of comparisons (generally 1) to find the desired record
Hashing
• Let's consider an example:
  • Numbers from 1 to 99 can be indexed in a hash table by using the hash function of modulo 10 division
  • This function yields the last digit of each number, which can be used as the key for the hash table
  • If random numbers are chosen, not every index may be filled
  • Some indices may contain more than one value and some may contain none
  • This imbalance in the indices is called Clustering
  • More than one value in a single index is called Collision, which causes a conflict while searching

Key   Value
0     10, 20, 40
1
2     42, 82
3
4     64
5     95
6
7     87, 47
8
9     99
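The modulo 10 example can be reproduced with a few lines of Python. The function name build_table is an assumption, and each index simply holds a Python list of the values that hash to it, so collisions show up as lists with more than one entry.

```python
def build_table(numbers, listsize=10):
    """Group numbers by the hash value number % listsize."""
    table = {index: [] for index in range(listsize)}
    for number in numbers:
        table[number % listsize].append(number)
    return table


numbers = [10, 20, 40, 42, 82, 64, 95, 87, 47, 99]   # values from the example
table = build_table(numbers)
for index, values in table.items():
    print(index, values)
```

Indices 0, 2 and 7 end up with several values (collisions), while indices 1, 3, 6 and 8 stay empty.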
Hashing (terminology)
 Home Address:
 Address produced by the hash function
 Prime area:
 The memory location that contains all the home addresses
 Synonyms:
 A set of keys that hash to the same locations
 Collision:
 The location of data to be inserted is already occupied by
the synonym data
 Any given hashing technique can be considered ideal if,
 There is no location collision
 The address space in memory is compact
Hash Function (Types)
• Different types of hash functions can be used
• Some of the popular ones are:
  • Direct hashing
  • Modulo Division
  • Multiplicative
  • Digit Extraction (truncation)
  • Mid-Square
  • Folding
Direct Hashing
• The address is the key itself
• Hash(key)=key
• The main advantage is that there is no collision
• The disadvantage is that the address space (storage) is as large as the key space

Address   Key
0         0
1         1
---       ---
50        50
51        51
---       ---
1089      1089
1090      1090
Modulo Division
• Hash(key)=address=key%listsize
• Yields a hash value that belongs to the set {0, 1, 2, 3, …, listsize-1}
• Fewer collisions occur if listsize is a prime number
• Example:
  • A numbering system to handle 1,500 students
  • If the key is 12865
  • Address=hash(12865)=12865%1500=865
Multiplicative Method
• Address=hash(k)=floor(listsize*(k*c-floor(k*c)))
• Where 0<c<1
• Note: floor(X) is the largest integer not greater than X
• Description:
  • Multiply the key k by a real number c between 0 and 1
  • Take the fractional part of the product, k*c-floor(k*c)
  • This is a value between 0 and 1 that behaves like a random number
  • Multiply the result by listsize and take the integer part
  • The final result is the required address
Multiplicative Method
Example:
Assume;
k=12876
Listsize=100
C=0.12
Now the address is:
Address=
hash(12876)=Floor(100*(12876*0.12-floor(12876*0.12)))
=floor(100*(1545.12-floor(1545.12)))
=floor(100*(1545.12-1545))
=floor(100*0.12)
=12
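The calculation above can be sketched in Python. One caveat, which is an implementation choice of this sketch and not something the slides discuss: with binary floating point, 12876*0.12 is stored as slightly less than 1545.12, and the final floor can then come out as 11. The sketch therefore passes c as an exact fraction. The function name multiplicative_hash is an assumption.

```python
from fractions import Fraction
from math import floor


def multiplicative_hash(k, c, listsize):
    """Address = floor(listsize * fractional part of k*c), with 0 < c < 1."""
    product = k * c
    fractional = product - floor(product)
    return floor(listsize * fractional)


# c = 0.12 written exactly as 12/100 to avoid floating-point rounding
print(multiplicative_hash(12876, Fraction(12, 100), 100))   # 12
```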
Digit Extraction
• Digits at specific places in the key are extracted
• The places from which the digits are extracted are predefined
• The same extraction technique is used for all the keys
• Here, Address=selected digits from the key
Example (extracting the 1st, 4th and 5th digits):
345261 → 326
167524 → 152
543625 → 562
987709 → 970
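In the four examples above the address is formed from the 1st, 4th and 5th digits of the key (counting from the left). A small sketch under that reading, with the function name digit_extraction and the positions parameter as assumptions:

```python
def digit_extraction(key, positions=(1, 4, 5)):
    """Build the address from the digits at the given 1-based positions."""
    digits = str(key)
    return int("".join(digits[p - 1] for p in positions))


for key in (345261, 167524, 543625, 987709):
    print(key, digit_extraction(key))
```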
Mid Square
• A few middle digits are extracted from the key
• The extracted number is squared
• The squared result is the required address value
• The number of digits chosen depends on the number of digits allowed for indexing
Example:
Assume;
k=12876
Extract the second and third digits
N=28
Now,
Address=hash(12876)=N*N
=28*28
=784
Mid Square
• The major disadvantage is that the squared value may be too large
• The remedy is to use only a portion of the result
• A few digits from the middle of the result are used
Example:
K=39873
Address=98*98=9604
which is too long
Hence use only the middle portion of the result
New Address=60
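Both mid-square examples can be reproduced with the sketch below. It follows the variant described on these slides (square the extracted digits, then keep the middle of the result if it is too long). The function name mid_square and its parameters first, last and address_digits are assumptions introduced for illustration.

```python
def mid_square(key, first=2, last=3, address_digits=3):
    """Square the digits first..last of the key (1-based, inclusive).

    If the square has more than address_digits digits, keep only that
    many digits from its middle.
    """
    digits = str(key)
    n = int(digits[first - 1:last])
    squared = str(n * n)
    if len(squared) > address_digits:
        start = (len(squared) - address_digits) // 2
        squared = squared[start:start + address_digits]
    return int(squared)


print(mid_square(12876))                     # 28*28 = 784
print(mid_square(39873, address_digits=2))   # 98*98 = 9604, middle digits 60
```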
Folding
 The key is divided into parts whose size matches the address
size (or less for the last part)
 Sum all the divided parts
 If there is any carry in the result, then discard it
 Thus formed number is the address for the key
Example:
Assume;
k=12896543
Hash table size = 000 to 999 (i.e. 3 digits)
Our part division will be: 128+965+43
=1136
Truncate the carry (i.e. 1 in thousand’s place)
Hence our address will be
Address=136
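The folding example can be sketched as follows: the key is cut into 3-digit parts from the left, the parts are added, and any carry beyond the address size is dropped with a modulo. The function name fold_shift and the address_digits parameter are assumptions.

```python
def fold_shift(key, address_digits=3):
    """Split the key into parts of address_digits digits, add them, drop the carry."""
    digits = str(key)
    parts = [int(digits[i:i + address_digits])
             for i in range(0, len(digits), address_digits)]
    return sum(parts) % 10 ** address_digits


print(fold_shift(12896543))   # 128 + 965 + 43 = 1136, carry dropped: 136
```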
Collision Resolution
• Direct hashing maps each key value to its own address, so it is a one-to-one mapping technique and no collision occurs.
• All other hashing techniques may result in some collisions
• Different collision resolution techniques are used
• These techniques are independent of the hashing function applied
• All these techniques aim to minimize clustering, because clustering is a main cause of collisions
Collision Resolution (Techniques)
 Two basic techniques are used:
 Rehashing
 Also called Open Addressing
 The types are:
 Linear Probing
 Quadratic Probing
 Double Hashing
 Chaining
Collision Resolution
• Open Addressing:
  • When a collision occurs, an unoccupied address is searched for placing the new element
  • A rehashing function rh is applied to the address value h(key) if h(key) is already occupied in the hash table
  • If rh(h(key)) is also occupied, we apply rh(rh(h(key))), and so on until an open address is found
  • It can be done in 3 different ways:
    • Linear Probing
    • Quadratic Probing
    • Double Hashing
Linear probing
• When a home address is occupied, go to the next address
• Next address = current address + 1
• rh(k,i) = (h(k)+i) % listsize
• Where h(k) = k % listsize
• and i = 0, 1, 2, 3, …, listsize-1
Linear Probing
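The linear probing rule rh(k,i) = (h(k)+i) % listsize can be sketched in Python as below. The function name insert_linear, the use of None for an empty slot and the sample keys are assumptions for illustration; the keys are not taken from the slides.

```python
def insert_linear(table, key):
    """Insert key using linear probing; return the address that was used."""
    listsize = len(table)
    home = key % listsize                  # h(k)
    for i in range(listsize):
        address = (home + i) % listsize    # rh(k, i)
        if table[address] is None:
            table[address] = key
            return address
    raise OverflowError("hash table is full")


table = [None] * 10
for key in [18, 41, 22, 44, 59, 32, 31, 73]:   # sample keys
    insert_linear(table, key)
print(table)
```

Keys 32, 31 and 73 all end up in the run of occupied slots from address 1 to 6, which illustrates primary clustering.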
• Advantages:
  • Simple to implement
  • Data tend to stay close to their home address, which keeps the storage compact
• Disadvantages:
  • Data tend to cluster around specific home addresses (Primary Clustering)
  • If the data are not at the searched location, a linear search of the following addresses is required, which is a slow process
Quadratic Probing
• Tends to minimize the primary clustering problem of linear probing
• The value is moved a considerable distance from the initial collision
• The address increment is the collision probe number squared, i.e.
• rh(k,i) = (h(k) + i^2) % listsize
• Where h(k) = k % listsize
• and i = 0, 1, 2, 3, …, listsize-1
Quadratic Probing
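A sketch of the quadratic probing rule rh(k,i) = (h(k) + i^2) % listsize follows. The function name insert_quadratic, the None marker for empty slots and the sample keys are assumptions. Note that quadratic probing does not necessarily visit every address, so this sketch reports failure after listsize probes even if the table is not full.

```python
def insert_quadratic(table, key):
    """Insert key using quadratic probing; return the address that was used."""
    listsize = len(table)
    home = key % listsize                      # h(k)
    for i in range(listsize):
        address = (home + i * i) % listsize    # rh(k, i)
        if table[address] is None:
            table[address] = key
            return address
    raise OverflowError("no free slot found by quadratic probing")


table = [None] * 10
for key in [18, 41, 22, 44, 59, 32, 31, 73]:   # sample keys
    insert_quadratic(table, key)
print(table)
```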
• Advantages:
  • Works much better than linear probing
  • Removes primary clustering
• Disadvantages:
  • More time-consuming than linear probing
  • Produces secondary clustering
Double Hashing
• Two different hash functions are used to generate the address if the initial hashing results in a collision
• This removes secondary clustering
• The initial hash value is combined with the second hash function to compute a new hash value
• hp(k, i) = (h1(k) + i*h2(k)) % listsize
• Where h1(k) = k % listsize
• and h2(k) = k % (some integer slightly less than listsize)
• i = 0, 1, 2, 3, …, (listsize-1)
Double Hashing
Keys inserted in order: 76, 93, 40, 47, 10, 55, 73, 56
h1(k) = k % 10
hp(k,i) = (h1(k) + i * h2(k)) % listsize
Where i = 0, 1, 2, 3, …, listsize-1
h2(k) = k % (listsize-1)

Table contents after each insertion:
Address   76   93   40   47   10   55   73   56
0                   40   40   40   40   40   40
1                             10   10   10   10
2
3              93   93   93   93   93   93   93
4                                       73   73
5                                  55   55   55
6         76   76   76   76   76   76   76   76
7                        47   47   47   47   47
8                                            56
9
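The table above can be reproduced with the following sketch, using h1(k) = k % listsize and h2(k) = k % (listsize-1) with listsize = 10. The function name insert_double_hashing and the None marker are assumptions. The guard that replaces a step of 0 by 1 is also an assumption of this sketch (without it a key with h2(k) = 0 would probe the same address forever); none of the example keys triggers it.

```python
def insert_double_hashing(table, key):
    """Insert key using double hashing; return the address that was used."""
    listsize = len(table)
    h1 = key % listsize
    h2 = key % (listsize - 1)
    if h2 == 0:
        h2 = 1          # assumption: avoid a zero step, not stated on the slides
    for i in range(listsize):
        address = (h1 + i * h2) % listsize     # hp(k, i)
        if table[address] is None:
            table[address] = key
            return address
    raise OverflowError("no free slot found")


table = [None] * 10
for key in [76, 93, 40, 47, 10, 55, 73, 56]:
    insert_double_hashing(table, key)
print(table)
```

Keys 10, 73 and 56 collide at their home addresses 0, 3 and 6 and are placed one step h2(k) further on, at addresses 1, 4 and 8.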
Open Addressing (Disadvantage)
• Major disadvantages are:
  • Each collision resolution increases the probability of future collisions
  • If the number of keys is greater than the number of addresses in the hash table, a collision is sure to occur
  • This is called overflow
• To overcome these disadvantages, separate chaining is used.
Chaining
• Also called separate chaining
• Uses a fixed-size hash table
• This method maintains a chain of the elements that have the same hash address
• Linked lists are used to store the synonyms
• Each slot in the hash table points to the head of a linked list
• All the elements for that address are placed in that linked list
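A minimal sketch of separate chaining, assuming the hash function key % listsize and insertion at the head of each chain (the slides do not say where in the list new elements go). The class names Node and ChainedHashTable are illustrative.

```python
class Node:
    """One element of a chain (singly linked list)."""
    def __init__(self, key, next=None):
        self.key = key
        self.next = next


class ChainedHashTable:
    def __init__(self, listsize=10):
        self.slots = [None] * listsize       # each slot is the head of a chain

    def insert(self, key):
        address = key % len(self.slots)
        # New node becomes the head; the old chain hangs behind it.
        self.slots[address] = Node(key, self.slots[address])

    def search(self, key):
        node = self.slots[key % len(self.slots)]
        while node is not None:
            if node.key == key:
                return True
            node = node.next
        return False


table = ChainedHashTable()
for key in [10, 20, 40, 42, 82, 64]:         # sample keys with synonyms
    table.insert(key)
print(table.search(20), table.search(30))
```

Because synonyms share a chain, the table never overflows, but a search may have to walk through the whole chain of one address.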