Chapter 9: Searching
Linear Search (Example)
[Figure: linear search walk-through showing the target element found]
Best case:
In the best case, the desired element is in the first position of the
array, i.e. only one comparison is made. So the time complexity is O(1).
Average case: If the item is equally likely to appear at any position in
the array, the number of comparisons can be any of 1, 2, 3, ..., n, and
each occurs with probability p = 1/n. Then the time complexity T(n) is:
T(n) = 1·(1/n) + 2·(1/n) + ... + n·(1/n)
     = (1/n)(1 + 2 + 3 + ... + n) = (1/n) · n(n+1)/2 = (n+1)/2
So T(n) = O((n+1)/2), i.e. O(n) on average.
Worst case: Clearly, the worst case occurs when the item is the last
element in the array or is not present at all. In either situation we have
T(n) = n + 1
Accordingly, T(n) = O(n+1), i.e. O(n), is the worst-case complexity.
An unsuccessful search takes n comparisons, so in any case the number of
comparisons is O(n).
Note: the worst case also arises when the item is the last element.
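As a concrete illustration, here is a minimal linear search sketch in C; the function and variable names (linear_search, arr, n, item) are illustrative, not taken from the slides.

```c
#include <stdio.h>

/* Return the index of item in arr[0..n-1], or -1 if it is not present. */
int linear_search(const int arr[], int n, int item) {
    for (int i = 0; i < n; i++) {
        if (arr[i] == item)
            return i;           /* best case: found at i == 0 */
    }
    return -1;                  /* worst case: n comparisons, not found */
}

int main(void) {
    int a[] = {7, 3, 9, 1, 4};
    int n = sizeof(a) / sizeof(a[0]);
    int pos = linear_search(a, n, 9);
    if (pos >= 0)
        printf("Search successful at index %d\n", pos);
    else
        printf("Search unsuccessful\n");
    return 0;
}
```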
Binary Search
If the data items are arranged in sorted form (i.e. ascending or descending), this algorithm can be used
It is a much more efficient algorithm than linear search for sorted data
The key value is compared with the middle element of the list
If the values are equal, the search is successful and the process stops
If the middle value is less than the key, the result lies in the upper half of the list
If the middle value is greater than the key, the result lies in the lower half of the list
The search is repeated on the lower or upper half of the list until the required value is found or the whole list has been searched
Binary Search (Example)
[Figure: binary search for 'x', found after 4 iterations]
Binary Search (Example)
[Figure: binary search for '4', found after 4 iterations]
Binary Search (Algorithm)
1) Declare and initialize the necessary variables: n, a[n], first=0, last=n-1, item, middle=int((first+last)/2)
2) Repeat steps (3) and (4) while ((first<=last) && (a[middle]!=item))
3) If (item < a[middle])
       last = middle - 1
   else
       first = middle + 1
4) middle = int((first+last)/2)
5) If (a[middle] == item)
       print "search successful"
   else
       print "search unsuccessful"
6) Exit
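The same steps can be expressed as a small C function; the names binary_search, a, n, and item are illustrative assumptions, and the array is assumed to be sorted in ascending order.

```c
#include <stdio.h>

/* Return the index of item in the ascending array a[0..n-1], or -1 if absent. */
int binary_search(const int a[], int n, int item) {
    int first = 0, last = n - 1;
    while (first <= last) {
        int middle = first + (last - first) / 2;  /* avoids overflow of first+last */
        if (a[middle] == item)
            return middle;                        /* search successful */
        else if (item < a[middle])
            last = middle - 1;                    /* continue in the lower half */
        else
            first = middle + 1;                   /* continue in the upper half */
    }
    return -1;                                    /* search unsuccessful */
}

int main(void) {
    int a[] = {4, 9, 17, 23, 31, 43, 58};
    int n = sizeof(a) / sizeof(a[0]);
    int i = binary_search(a, n, 31);
    printf("31 -> %s (index %d)\n", i >= 0 ? "found" : "not found", i);
    i = binary_search(a, n, 5);
    printf("5 -> %s (index %d)\n", i >= 0 ? "found" : "not found", i);
    return 0;
}
```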
Binary Search
Binary search is best suited if
the data are present in sorted form and
are represented in an array or list
[Figure: binary search trace for key 31: 43 > 31, move to the left half, 31 found, search successful]
Tree Search (Algorithm)
Declare and initialize the necessary variables:
node = root (node pointer pointing to the root of the tree), item (data to be searched), flag = 0
If (node == NULL)
    display "Empty Tree"; Stop;
End if
While (node != NULL)
    if (item == node->data)
        display "Search Successful"; flag = 1; stop;
    else if (item > node->data) node = node->right;
    else node = node->left;
    end if
End while
If (flag == 0)
    display "Search Unsuccessful"
End if
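A minimal C sketch of this tree search is shown below, assuming a simple struct node with data, left, and right fields (names chosen for illustration).

```c
#include <stdio.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *left, *right;
};

/* Search a binary search tree iteratively, mirroring the algorithm above. */
struct node *tree_search(struct node *root, int item) {
    struct node *node = root;
    while (node != NULL) {
        if (item == node->data)
            return node;            /* search successful */
        else if (item > node->data)
            node = node->right;     /* item is larger: go right */
        else
            node = node->left;      /* item is smaller: go left */
    }
    return NULL;                    /* search unsuccessful (or empty tree) */
}

/* Helper used only to build a small example tree. */
struct node *insert(struct node *root, int data) {
    if (root == NULL) {
        struct node *n = malloc(sizeof *n);
        n->data = data;
        n->left = n->right = NULL;
        return n;
    }
    if (data < root->data) root->left = insert(root->left, data);
    else                   root->right = insert(root->right, data);
    return root;
}

int main(void) {
    struct node *root = NULL;
    int keys[] = {43, 31, 58, 17, 40};
    for (int i = 0; i < 5; i++) root = insert(root, keys[i]);
    printf("%s\n", tree_search(root, 31) ? "Search Successful" : "Search Unsuccessful");
    printf("%s\n", tree_search(root, 99) ? "Search Successful" : "Search Unsuccessful");
    return 0;
}
```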
Efficiency of Binary Search Tree
The time required to search a binary search tree varies between O(log n) and O(n), depending on the structure of the tree.
If the records are inserted in random order, reasonably balanced trees result more often than not, so on average the search time remains O(log n).
Hashing
Hashing is the technique of representing longer records by shorter values called keys
The keys are placed in a table called a hash table, where the keys are compared to find the records
A simple search scheme in which records are indexed using a certain hash function is called hashing
A hash table is a dictionary in which keys are mapped to array positions by a hash function
Items being searched for can be accessed directly through the hash table by mapping the corresponding key values to their records
The hashing technique requires the minimum (generally 1) number of comparisons to find the desired record
Hashing
Let us consider an example:
Numbers from 1 to 99 can be indexed in a hash table by using the hash function of modulo-10 division
This function yields the last digit of the number, which can be used as the key (index) into the hash table
If random numbers are chosen, all the indices may not be filled
Some indices may contain more than one value and some may contain none
This imbalance among indices is called clustering
More than one data item at a single index is called a collision, which causes a conflict while searching

Key  Value
0    10, 20, 40
1
2    42, 82
3
4    64
5    95
6
7    87, 47
8
9    99
Hashing (Terminology)
Home address:
Address produced by the hash function
Prime area:
The memory area that contains all the home addresses
Synonyms:
A set of keys that hash to the same location
Collision:
The location of the data to be inserted is already occupied by synonym data
Any given hashing technique can be considered ideal if:
There are no collisions
The address space in memory is compact
Hash Function (Types)
Different types of hash functions can be used
Some of the popular ones are:
Direct hashing
Modulo division
Multiplicative method
Digit extraction
Mid-square
Folding
Direct Hashing
The address is the key itself:
Hash(key) = key

Address  Key
0        0
1        1
---      ---
---      ---
50       50

The main advantage is that there are no collisions
The disadvantage is that the address space must be as large as the range of keys, which is usually impractical

Modulo Division
Address = hash(key) = key % listsize
Example (listsize = 1500):
Address = hash(12865) = 12865 % 1500 = 865
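A small C sketch of the modulo-division hash, with direct hashing as the trivial case, is given below; LISTSIZE and the function names are illustrative.

```c
#include <stdio.h>

#define LISTSIZE 1500   /* illustrative table size taken from the example above */

/* Direct hashing: the key itself is used as the address. */
unsigned direct_hash(unsigned key) {
    return key;
}

/* Modulo division: address = key % listsize. */
unsigned mod_hash(unsigned key) {
    return key % LISTSIZE;
}

int main(void) {
    printf("direct_hash(50) = %u\n", direct_hash(50));    /* 50  */
    printf("mod_hash(12865) = %u\n", mod_hash(12865));    /* 865 */
    return 0;
}
```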
Multiplicative Method
Address = hash(k) = floor(listsize * (k*c - floor(k*c)))
where 0 < c < 1
Note: floor(x) is the largest integer not greater than x
Description:
Multiply the key k by a real number c between 0 and 1
Take the fractional part of the product, [k*c - floor(k*c)]
This is a number between 0 and 1
Multiply the result by listsize and take the integer part
The final result is the required address
Multiplicative Method
Example:
Assume:
k = 12876
listsize = 100
c = 0.12
Now the address is:
Address = hash(12876) = floor(100*(12876*0.12 - floor(12876*0.12)))
= floor(100*(1545.12 - floor(1545.12)))
= floor(100*(1545.12 - 1545))
= floor(100*0.12)
= 12
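The multiplicative method can be sketched in C as below (the function and parameter names are illustrative). Note that with floating-point arithmetic the computed address may differ by one from the exact hand calculation above, since 0.12 is not exactly representable as a double.

```c
#include <stdio.h>
#include <math.h>

/* Multiplicative method: address = floor(listsize * frac(k * c)), 0 < c < 1. */
unsigned mult_hash(unsigned k, unsigned listsize, double c) {
    double product = (double)k * c;
    double frac = product - floor(product);        /* fractional part of k*c */
    return (unsigned)floor((double)listsize * frac);
}

int main(void) {
    /* Exact hand calculation gives 12; floating-point rounding may yield 11. */
    printf("mult_hash(12876) = %u\n", mult_hash(12876, 100, 0.12));
    return 0;
}
```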
Digit Extraction
Digits at specific places in the key are extracted
The places from which the digits are extracted are predefined
The same extraction pattern is used for all the keys
Here,
Address = digits selected from the key
Example (selecting the first, fourth, and fifth digits):
345261 -> 326
167524 -> 152
543625 -> 562
987709 -> 970
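A small C sketch of digit extraction, hard-coded to pick the first, fourth, and fifth digits of a six-digit key as in the example above (the selection positions and function name are illustrative).

```c
#include <stdio.h>

/* Extract the 1st, 4th and 5th digits of a six-digit key and
   concatenate them to form a three-digit address. */
unsigned digit_extract(unsigned key) {
    unsigned d1 = (key / 100000) % 10;  /* 1st digit */
    unsigned d4 = (key / 100) % 10;     /* 4th digit */
    unsigned d5 = (key / 10) % 10;      /* 5th digit */
    return d1 * 100 + d4 * 10 + d5;
}

int main(void) {
    printf("%u\n", digit_extract(345261));  /* 326 */
    printf("%u\n", digit_extract(167524));  /* 152 */
    printf("%u\n", digit_extract(543625));  /* 562 */
    printf("%u\n", digit_extract(987709));  /* 970 */
    return 0;
}
```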
Mid-Square
A few middle digits of the key are extracted
The extracted number is then squared
The squared result is the required address value
The number of digits chosen depends on the number of digits allowed for indexing
Example:
Assume:
k = 12876
Extract the second and third digits:
N = 28
Now,
Address = hash(12876) = N*N
= 28*28
= 784
Mid-Square
The major disadvantage is that the value obtained by squaring may be too large
The resolution is to use only a portion of the result
A few digits from the middle of the result are used
Example:
k = 39873
Extract the second and third digits: N = 98
Address = 98*98 = 9604
This is too long, hence only a portion of the result is used
New address = 60 (the middle two digits of 9604)
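A C sketch of this mid-square variant (take the second and third digits of a five-digit key, square them, and keep the middle two digits when the square is too long); the function names are illustrative.

```c
#include <stdio.h>

/* Mid-square variant from the slides: take the 2nd and 3rd digits of a
   five-digit key and square them. */
unsigned mid_square(unsigned key) {
    unsigned n = (key / 100) % 100;   /* 2nd and 3rd digits, e.g. 12876 -> 28 */
    return n * n;                     /* 28 * 28 = 784 */
}

/* When the square is too long, keep only its middle two digits. */
unsigned middle_two_digits(unsigned sq) {
    return (sq / 10) % 100;           /* 9604 -> 60 */
}

int main(void) {
    printf("%u\n", mid_square(12876));                     /* 784 */
    printf("%u\n", middle_two_digits(mid_square(39873)));  /* 98*98 = 9604 -> 60 */
    return 0;
}
```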
Folding
The key is divided into parts whose size matches the address size (the last part may be smaller)
All the parts are summed
If there is a carry in the result, it is discarded
The number thus formed is the address for the key
Example:
Assume:
k = 12896543
Hash table size = 000 to 999 (i.e. 3 digits)
The parts are: 128 + 965 + 43
= 1136
Truncate the carry (i.e. the 1 in the thousands place)
Hence the address will be:
Address = 136
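A C sketch of folding for a numeric key split, left to right, into three-digit parts (the part size and function name are illustrative assumptions).

```c
#include <stdio.h>

/* Fold-shift hashing: split the key, reading left to right, into 3-digit
   parts (the last part may be shorter), sum the parts, and discard any
   carry beyond 3 digits. */
unsigned fold_hash(unsigned long key) {
    char digits[32];
    int len = snprintf(digits, sizeof digits, "%lu", key);
    unsigned long sum = 0;
    for (int i = 0; i < len; i += 3) {
        unsigned part = 0;
        for (int j = i; j < i + 3 && j < len; j++)
            part = part * 10 + (unsigned)(digits[j] - '0');  /* build one part */
        sum += part;
    }
    return (unsigned)(sum % 1000);   /* truncate the carry */
}

int main(void) {
    printf("%u\n", fold_hash(12896543));   /* 128 + 965 + 43 = 1136 -> 136 */
    return 0;
}
```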
Collision Resolution
Direct hashing maps the key values to individual addresses; hence it is a one-to-one mapping technique and no collision occurs.
All other hashing techniques may result in some collisions
Different collision resolution techniques are therefore used
These techniques are independent of the hashing function applied
All of these techniques aim to minimize clustering, because clustering is the main cause of collisions
Collision Resolution (Techniques)
Two basic techniques are used:
Rehashing
Also called Open Addressing
The types are:
Linear Probing
Quadratic Probing
Double Hashing
Chaining
Collision Resolution
Open Addressing:
When a collision occurs, an unoccupied address is searched for to place the new element
A rehash function rh is applied to the address h(key) if h(key) is already occupied in the hash table.
Again, if rh(h(key)) is already occupied, we apply rh(rh(h(key))), and so on until an open address is found
It can be done in 3 different ways:
Linear Probing
Quadratic Probing
Double Hashing
Linear Probing
When a home address is occupied, go to the next address:
Next address = current address + 1
rh(k, i) = (h(k) + i) % listsize
where h(k) = k % listsize
and i = 0, 1, 2, 3, ..., listsize-1
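Below is a minimal C sketch of insertion with linear probing into a small open-addressed table; LISTSIZE, EMPTY, and the function names are illustrative assumptions.

```c
#include <stdio.h>

#define LISTSIZE 10
#define EMPTY    -1      /* sentinel marking an unused slot */

int table[LISTSIZE];

/* Insert key using linear probing: try h(k), h(k)+1, h(k)+2, ... (mod LISTSIZE).
   Returns the slot used, or -1 if the table is full. */
int insert_linear(int key) {
    int home = key % LISTSIZE;
    for (int i = 0; i < LISTSIZE; i++) {
        int slot = (home + i) % LISTSIZE;
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;   /* overflow: every slot is occupied */
}

int main(void) {
    for (int i = 0; i < LISTSIZE; i++) table[i] = EMPTY;
    int keys[] = {76, 93, 40, 47, 10, 55};
    for (int i = 0; i < 6; i++)
        printf("key %d -> slot %d\n", keys[i], insert_linear(keys[i]));
    return 0;
}
```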
Linear Probing
Advantages:
Simple to implement
Data tend to cluster around the home address, giving compact use of disk space
Disadvantages:
Data tend to cluster around specific home addresses (primary clustering)
A linear search is required if the data are not present at the probed location, which is a very slow process
Quadratic Probing
Tends to minimize the problem of primary clustering seen in linear probing
The value is moved a considerable distance from the initial collision
The address increment is the square of the collision probe number, i.e.
rh(k, i) = (h(k) + i^2) % listsize
where h(k) = k % listsize
and i = 0, 1, 2, 3, ..., listsize-1
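The only change from the linear-probing sketch above is the probe increment; a self-contained C sketch (same illustrative table size and naming) is:

```c
#include <stdio.h>

#define LISTSIZE 10
#define EMPTY    -1

int table[LISTSIZE];

/* Insert key using quadratic probing: try h(k), h(k)+1, h(k)+4, h(k)+9, ...,
   all taken modulo LISTSIZE. Returns the slot used, or -1 on failure. */
int insert_quadratic(int key) {
    int home = key % LISTSIZE;
    for (int i = 0; i < LISTSIZE; i++) {
        int slot = (home + i * i) % LISTSIZE;   /* quadratic increment i^2 */
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;   /* no free slot found along this probe sequence */
}

int main(void) {
    for (int i = 0; i < LISTSIZE; i++) table[i] = EMPTY;
    int keys[] = {76, 93, 40, 47, 10, 55, 73};
    for (int i = 0; i < 7; i++)
        printf("key %d -> slot %d\n", keys[i], insert_quadratic(keys[i]));
    return 0;
}
```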
Quadratic Probing
Advantages:
Works much better than linear probing
Removes primary clustering
Disadvantages:
More time-consuming than linear probing
Produces secondary clustering
Double Hashing
Two different hash functions are used: a second hash function generates the probe step if the initial hashing results in a collision
This removes secondary clustering
When a collision occurs, the key is rehashed with the second function and a new probe address is computed:
hp(k, i) = (h1(k) + i*h2(k)) % listsize
where h1(k) = k % listsize
and h2(k) = k % (some integer slightly less than listsize)
i = 0, 1, 2, 3, ..., listsize-1
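A minimal C sketch of insertion with double hashing, using h1(k) = k % LISTSIZE and h2(k) = k % (LISTSIZE - 1) as in the example that follows; the table size and names are illustrative.

```c
#include <stdio.h>

#define LISTSIZE 10
#define EMPTY    -1

int table[LISTSIZE];

int h1(int k) { return k % LISTSIZE; }
/* Second hash gives the probe step; in practice it must never be 0. */
int h2(int k) { return k % (LISTSIZE - 1); }

/* Insert key with double hashing: probe h1(k), h1(k)+h2(k), h1(k)+2*h2(k), ... */
int insert_double(int key) {
    for (int i = 0; i < LISTSIZE; i++) {
        int slot = (h1(key) + i * h2(key)) % LISTSIZE;
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;   /* probe sequence exhausted without finding a free slot */
}

int main(void) {
    for (int i = 0; i < LISTSIZE; i++) table[i] = EMPTY;
    int keys[] = {76, 93, 40, 47, 10, 55, 73, 56};
    for (int i = 0; i < 8; i++)
        printf("key %d -> slot %d\n", keys[i], insert_double(keys[i]));
    return 0;
}
```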
Double Hashing (Example)
Insert the keys 76, 93, 40, 47, 10, 55, 73, 56 into a table with listsize = 10, using
h1(k) = k % 10
h2(k) = k % (listsize - 1) = k % 9
hp(k, i) = (h1(k) + i * h2(k)) % listsize, where i = 0, 1, 2, 3, ..., listsize-1

Index  Key   Probe
0      40    h1(40) = 0
1      10    h1(10) = 0 occupied; h2(10) = 1, so (0 + 1) % 10 = 1
2      --
3      93    h1(93) = 3
4      73    h1(73) = 3 occupied; h2(73) = 1, so (3 + 1) % 10 = 4
5      55    h1(55) = 5
6      76    h1(76) = 6
7      47    h1(47) = 7
8      56    h1(56) = 6 occupied; h2(56) = 2, so (6 + 2) % 10 = 8
9      --
Open Addressing (Disadvantages)
The major disadvantages are:
Each resolved collision increases the probability of future collisions
If the number of keys is more than the size of the hash table, a collision is certain to occur.
This is called overflow
To overcome these disadvantages, separate chaining is used.
Chaining
Also called separate chaining
Uses a fixed-size hash table
This method maintains a chain of the elements that have the same hash address.
Linked lists are used to store the synonyms
Each slot in the hash table points to the head of a linked list
All the elements for that address are placed in the linked list
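A compact C sketch of separate chaining (a hash table of singly linked lists); the node type, TABLE_SIZE, and function names are illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE 10

struct node {
    int key;
    struct node *next;
};

struct node *table[TABLE_SIZE];   /* each slot heads a chain of synonyms */

/* Insert a key at the head of the chain for its hash address. */
void chain_insert(int key) {
    int index = key % TABLE_SIZE;
    struct node *n = malloc(sizeof *n);
    n->key = key;
    n->next = table[index];
    table[index] = n;
}

/* Search the chain at the key's hash address. */
int chain_search(int key) {
    for (struct node *n = table[key % TABLE_SIZE]; n != NULL; n = n->next)
        if (n->key == key)
            return 1;   /* search successful */
    return 0;           /* search unsuccessful */
}

int main(void) {
    int keys[] = {10, 20, 40, 42, 82, 64, 95, 87, 47, 99};
    for (int i = 0; i < 10; i++) chain_insert(keys[i]);
    printf("%s\n", chain_search(82) ? "Search Successful" : "Search Unsuccessful");
    printf("%s\n", chain_search(33) ? "Search Successful" : "Search Unsuccessful");
    return 0;
}
```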