Searching Hashing

Uploaded by

roopitaryaman

Quality Content for Outcome based Learning

Searching & Hashing

Ver. No.: 1.1 Copyright © 2021, ABES Engineering College


General Guideline
© (2021) ABES Engineering College.

This document contains valuable confidential and proprietary information of ABESEC. Such confidential and
proprietary information includes, amongst others, proprietary intellectual property which can be legally protected and
commercialized. Such information is furnished herein for training purposes only. Except with the express prior
written permission of ABESEC, this document and the information contained herein may not be published,
disclosed, or used for any other purpose.

Module Objective

Linear Search – Problem – Solution –Complexity - Application - Disadvantages

Binary Search – Problem – Solution – Complexity-Application - Disadvantages

Problems – Solution And Variation of Binary Search and linear Search

Ternary Search – Jump Search – Exponential Search – Interpolation Search

Hashing – Hash Function

Collision Resolution Technique – Linear Probing – Quadratic Probing – Rehashing/Double Hashing

Searching

Searching:

Searching is the process of finding the position of a given value in a list of values.

It decides whether a search key is present in the data or not.

It is the algorithmic process of finding a particular item in a collection of items.

There are several methods of searching in the list of items.

Some of them are listed below:

• Linear/Sequential Search (Brute Force approach)
• Binary Search (Decrease and conquer approach)
• Ternary Search
• Jump Search
• Exponential Search
• Indexed Sequential Search (Combination of Sequential and Binary Search)
• Hashing

Real world Problems

There are many scenarios in which searching is a frequently performed operation.
Some of them are listed below:

• Dictionary, where the meaning is searched for a given word.
• Catalog, from which the description of a particular item is searched, e.g. e-retail stores like Amazon and Flipkart.
• Daily attendance through fingerprint scan (biometric search).
• Telephone directory of a mobile phone, in which a telephone number is searched by name.
• Keyword search in Google, etc.
• Research article search at Research Gate, Sci-hub, etc.

Linear/ Sequential Search

A linear search scans the elements one at a time, in sequence; that is why it is also known as sequential search.

Example:
Suppose we have to find the mobile number of some person, and the number is stored in the address book of the phone. If we start from the first contact and scroll down one by one until the desired mobile number is found, this process is a sequential search.

Algorithm

ALGORITHM LinearSearch(A[ ], N, SearchKey)
BEGIN:
    FOR i=0 to N-1 DO
        IF A[i] == SearchKey THEN
            RETURN i
    RETURN -1 //invalid index indicating search element is not found
END;

Example

Let us consider the following example. The below-given figure shows an array of character values having 8 data items.

If we want to search 'D', then the searching begins from the 0th element and each element is scanned one by one until the search element is found or the end of the array is reached.
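
The LinearSearch pseudocode can be sketched in Python roughly as follows. The name `linear_search` and the eight-letter sample array are assumptions made for illustration; the array stands in for the figure, which is not reproduced here.

```python
def linear_search(items, search_key):
    """Return the index of search_key in items, or -1 if it is absent."""
    for i, value in enumerate(items):
        if value == search_key:
            return i
    return -1  # invalid index: the search element was not found


# Assumed sample data standing in for the 8-item character array.
letters = ["A", "B", "C", "D", "E", "F", "G", "H"]
print(linear_search(letters, "D"))  # 3
print(linear_search(letters, "Z"))  # -1
```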

Analysis of algorithm

Time Complexity
Best-case complexity: Ω(1), when the element is found at the first position.
Worst-case complexity: O(n), when the element is found at the last index or the element is not present in the array.
Average-case complexity: For the average-case analysis, we need to consider the probability of finding the element at every position. In a set of randomly arranged data elements, finding the search element at any position is equally likely. In a data set of n elements, the probability of finding the element at any given position is 1/n, and finding it at position i costs i comparisons.
Total Effort = ∑ Probability * No. of Comparisons
= 1/n * 1 + 1/n * 2 + 1/n * 3 + … + 1/n * (n-1) + 1/n * n
= 1/n * (1 + 2 + 3 + … + n)
= 1/n * ∑ i (for i = 1 to n)
= 1/n * n*(n+1)/2
= (n+1)/2
= θ(n)

Space Complexity: In the algorithm written above, we just need a loop counter as additional memory. The space
complexity thus is constant i.e., θ(1).

Linear Search in 2-D Array (Matrices)

ALGORITHM LinearSearchIn2D(A[m][n], SearchKey)
BEGIN:
    FOR i=0 to m-1 DO
        FOR j=0 to n-1 DO
            IF A[i][j] == SearchKey THEN
                WRITE(“Element found at Row i, Column j”)
                RETURN
    WRITE(“Element not present”)
END;

Time complexity = O(m*n), which is O(n^2) for a square matrix (m = n)
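
A minimal Python sketch of the 2-D search, assuming the matrix is given as a list of rows. The name `linear_search_2d` and the sample matrix are illustrative only. Returning as soon as the key is found is what keeps the "not present" message from being printed after a successful search.

```python
def linear_search_2d(matrix, search_key):
    """Return (row, column) of the first match, or None if absent."""
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value == search_key:
                return (i, j)  # stop at the first match
    return None


# Assumed sample matrix with m = 2 rows and n = 3 columns.
grid = [[4, 9, 1],
        [7, 3, 8]]
print(linear_search_2d(grid, 3))   # (1, 1)
print(linear_search_2d(grid, 10))  # None
```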

Practice Problem

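For every number in the array, the algorithm Print writes LIKE if the number contains the digits 25 or is divisible by 25, and DISLIKE otherwise. It uses two helper algorithms, Contains25 and DivisibleBy25.

Examples:
Given number is 225: Both helper algorithms return true, because 25 is present in 225 and 225 is divisible by 25.
Given number is 175: Contains25 returns false because 25 is not present. DivisibleBy25 returns true because 175 is divisible by 25.
Given number is 149: Both helper algorithms return false, as the number does not contain 25 and is not divisible by 25.

ALGORITHM Print(A[], N)
BEGIN:
    FOR i=0 TO N-1 DO
        IF Contains25(A[i]) || DivisibleBy25(A[i]) THEN
            WRITE(“LIKE”)
        ELSE
            WRITE(“DISLIKE”)
END;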


ALGORITHM Contains25(N)
BEGIN:
WHILE N!=0 DO
IF N%100==25 THEN
RETURN 1
N=N/10
RETURN 0
END;

ALGORITHM DivisibleBy25(N)
BEGIN:
RETURN N%25==0
END;
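
The two helpers and the LIKE/DISLIKE loop can be sketched in Python as below. The function names are illustrative, and the sketch assumes integer inputs; note that the pseudocode's N=N/10 has to be an integer division (`//` in Python) for the loop to terminate.

```python
def contains_25(n):
    """True if the decimal digits of n contain the consecutive digits 2, 5."""
    n = abs(n)
    while n != 0:
        if n % 100 == 25:   # last two digits are 25
            return True
        n //= 10            # integer division drops the last digit
    return False


def divisible_by_25(n):
    return n % 25 == 0


def like_or_dislike(numbers):
    return ["LIKE" if contains_25(x) or divisible_by_25(x) else "DISLIKE"
            for x in numbers]


print(like_or_dislike([225, 175, 149]))  # ['LIKE', 'LIKE', 'DISLIKE']
```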

Disadvantage
• Linear search makes no assumption about the order of the data elements; they may be randomly arranged.
• If the data elements are arranged in some order, can the search effort be brought down?
• Linear search still takes O(n) time in the worst case, even on ordered data.
• Can we reduce this?

Binary Search

Binary search is an approach that can be applied ..


If the Binary search is performed on a sorted array, the procedure goes like:

1. Find the middle element of the array.
2. Compare the mid element with the search element.
3. There are three possible cases:
   a. The mid element is the same as the search element. The search can be declared successful in this case.
   b. If the search element is less than the mid element, the search area is restricted to the left half of mid only.
   c. Similarly, if it is greater than the mid element, the search area is the right half of mid.
4. The process is repeated until the search element is found or the search area becomes empty.

Solution Approach

Example:
• Let us understand the working of binary search through an example.
• Suppose we have an array of size 10, which is indexed from 0 to 9 as shown in the below figure, and we want to search element 22 (Key) in the given array.

Algorithm

ALGORITHM BinarySearch(A[ ], N, SearchKey)
BEGIN:
    Low = 0
    High = N-1
    WHILE Low <= High DO
        Mid = ⌊(Low + High)/2⌋
        IF A[Mid] == SearchKey THEN
            RETURN Mid
        ELSE
            IF SearchKey < A[Mid] THEN
                High = Mid-1
            ELSE
                Low = Mid+1
    RETURN -1 // invalid index indicating that element is not found
END;
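
A minimal Python sketch of the iterative algorithm follows. The name `binary_search` and the ten-element sorted array are assumptions; the array from the figure is not reproduced here, so the sample data merely contains the key 22 used in the example.

```python
def binary_search(sorted_items, search_key):
    """Return the index of search_key in sorted_items, or -1 if absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2          # floor of the midpoint
        if sorted_items[mid] == search_key:
            return mid
        if search_key < sorted_items[mid]:
            high = mid - 1               # continue in the left half
        else:
            low = mid + 1                # continue in the right half
    return -1  # invalid index: element not found


# Assumed sorted sample array of size 10 (indices 0 to 9).
data = [3, 8, 12, 17, 22, 29, 35, 41, 50, 64]
print(binary_search(data, 22))  # 4
print(binary_search(data, 41))  # 7
print(binary_search(data, 5))   # -1
```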

Analysis of Algorithm
Time complexity:
In binary search, best-case complexity is Ω(1), where the element is found at the middle index in the first run.

For the worst-case analysis, let us have a second look at the algorithm. Constant number of statements is required to
be executed for dividing the search area. The number of elements in either of the selected half will be N/2. A binary
search is performed on the selected area recursively. The Recurrence given below justifies this paragraph.

T(N) = T(N/2) +C
T(N) is the time complexity of Binary search in an array of size N. T(N/2) is the Time complexity of Binary Search on an
array of size N/2.

T(N) = T(N/2) + C
= [T(N/4) + C] + C
= T(N/2^2) + 2C
= [T(N/8) + C] + 2C
= T(N/2^3) + 3C
…
= T(N/2^K) + K.C

After K divisions, the length of the array becomes 1
Length of array = N/2^K = 1
=> N = 2^K
Applying log function on both sides
=> log2(N) = log2(2^K)
=> log2(N) = K log2(2)
=> K = log2(N)

T(N) = T(N/2^K) + K.C
= T(1) + log2(N).C
Searching in an array of size one requires constant computations
= C’ + log2(N).C
= O(log2 N)

Space Complexity: The algorithm takes high, low, and mid variables in the logic. Count of 3 is constant; hence the
space complexity of Binary Search is θ(1).

Recursive Binary Search
The recursive approach of Binary search is similar to the iterative one. Every time a part of the array is selected for search, we perform a Binary search on that part recursively.

ALGORITHM BinarySearch(A[ ], Low, High, SearchKey)
BEGIN:
    IF Low <= High THEN
        Mid = ⌊(Low + High)/2⌋
        IF A[Mid] == SearchKey THEN
            RETURN Mid
        ELSE
            IF SearchKey < A[Mid] THEN
                RETURN BinarySearch(A[ ], Low, Mid–1, SearchKey)
            ELSE
                RETURN BinarySearch(A[ ], Mid+1, High, SearchKey)
    RETURN –1
END;
The recursive Binary Search approach is easy to write, but it increases the space complexity because of pending activation records. The space taken by recursive Binary Search is O(log2 N).
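
The recursive version can be sketched in Python as follows; the function name and sample data are illustrative. Each recursive call's result must be returned to the caller, otherwise a successful search deep in the recursion would be lost.

```python
def binary_search_recursive(sorted_items, low, high, search_key):
    """Search sorted_items[low..high]; return the index or -1."""
    if low > high:
        return -1                        # empty search area
    mid = (low + high) // 2
    if sorted_items[mid] == search_key:
        return mid
    if search_key < sorted_items[mid]:
        return binary_search_recursive(sorted_items, low, mid - 1, search_key)
    return binary_search_recursive(sorted_items, mid + 1, high, search_key)


# Assumed sample data.
values = [1, 3, 5, 7, 9]
print(binary_search_recursive(values, 0, len(values) - 1, 7))  # 3
print(binary_search_recursive(values, 0, len(values) - 1, 4))  # -1
```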

Comparison of Linear Search with Binary Search
Working
  Linear Search: Iterates through all the elements and compares each of them with the key to be searched.
  Binary Search: Repeatedly halves the part of the array that has to be searched, comparing the key with the mid element every time.

Prerequisites
  Linear Search: Data can be random or sorted; the algorithm remains the same, so there is no need for any pre-work.
  Binary Search: It works only on a sorted array, so sorting the array is a prerequisite for this algorithm.

Use Case
  Linear Search: Generally preferred for smaller and randomly ordered datasets.
  Binary Search: Preferred for comparatively larger and sorted datasets.

Effectiveness
  Linear Search: Less efficient in the case of larger datasets.
  Binary Search: More efficient in the case of larger datasets.

Time Complexity
  Linear Search: Best-case complexity - Ω(1), Worst-case complexity - O(n)
  Binary Search: Best-case complexity - Ω(1), Worst-case complexity - O(log2 n)

Question - Case 1

Given an array, do any two elements exist in the array whose sum is above 1000?

What is the time complexity?

i) O(N)
ii) O(N^2)
iii) O(NlogN)
iv) O(logN)

Question - Case 1- 4

Given a sorted array, do any two elements exist in the array whose sum is above 1000?

Case 1 - O(N^2) using linear search
Case 2 - O(NlogN) using linear and binary search combination
Case 3 - O(N) using two variable technique
Case 4 – O(1) Conceptual: in a sorted array it is enough to check the sum of the two largest elements

Question - Case 1- 4

Given a sorted array, do any two elements exist in the array whose sum is equal to 1000?

Case 1 - O(N^2) using linear search
Case 2 - O(NlogN) using linear and binary search combination
Case 3 - O(N) using two variable technique
Case 4 – O(1) Conceptual // Not possible
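
The "two variable technique" of Case 3 is commonly implemented with two indices that move toward each other. The sketch below is one such reading, assuming the array is sorted in ascending order; the function name and sample data are illustrative.

```python
def has_pair_with_sum(sorted_items, target):
    """O(N) check for two elements (at different positions) summing to target."""
    left, right = 0, len(sorted_items) - 1
    while left < right:
        pair_sum = sorted_items[left] + sorted_items[right]
        if pair_sum == target:
            return True
        if pair_sum < target:
            left += 1     # need a larger sum: advance the small end
        else:
            right -= 1    # need a smaller sum: retreat the large end
    return False


# Assumed sample data, sorted in ascending order.
print(has_pair_with_sum([100, 250, 400, 600, 750, 900], 1000))  # True
print(has_pair_with_sum([100, 200, 300], 1000))                 # False
```

Each step discards one element that cannot be part of a matching pair, so at most N-1 comparisons are made.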

Index Sequential Search

• It is a searching technique that uses both sequential searching and random-access searching methods. In this technique, a given sorted array of n elements is divided into groups based on the group size. Then we create an index array that contains the starting index of each group, stored in increasing order.
• If we want to search an element called a key in the given array, we first find the group in which that element can be present, and then search only within that group.
• Following are the steps to implement Index Sequential Search:
1. Suppose an array A of N elements is given in which elements are stored in sorted order. Divide the array into groups according to the group size.
2. Store the starting index of each group in the index array. Each element in the index array points to a group of elements. From this index array, we can get the starting and ending index of the group where the search element is expected to be present.
3. Read the search element called KEY.
4. Compare KEY with the first element of each group in turn. If KEY is greater than or equal to the first element of the group, move to the next group; otherwise, perform the sequential search in the previous group. Repeat this step until all groups are processed; if no group stops the scan, search the last group.

Application

Index sequential search is used to search or access records in a database. It accesses database records very quickly if the index table is organized correctly. The main advantage of indexed sequential search is that it reduces the search time for a given item, because the sequential search is performed on a smaller range compared to the whole table.

Algorithm

ALGORITHM: IndexSequentialSearch(A[ ], N, KEY)
BEGIN:
    ind[ ], indele[ ]              // index array: starting positions and first elements of the groups
    n1 = 0, si, ei, flag = 0, j = 0
    FOR i = 0 TO N-1 STEP +3 DO    // group size is 3
        indele[n1] = A[i]
        ind[n1] = i
        n1 = n1+1
    IF KEY < indele[0] THEN
        WRITE(“ITEM NOT FOUND”)
        RETURN
    ELSE
        FOR i = 1 TO n1-1 DO
            IF KEY < indele[i] THEN
                si = ind[i–1]
                ei = ind[i]
                flag = 1
                BREAK
    IF flag == 0 THEN              // KEY can only be in the last group
        si = ind[n1-1]
        ei = N-1
    FOR i = si TO ei DO
        IF KEY == A[i] THEN
            j = 1
            BREAK
    IF j == 1 THEN
        WRITE(i+1)                 // position counted from 1
    ELSE
        WRITE(“No. not found”)
END;
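
A compact Python sketch of the same idea, assuming a sorted list and a group size of 3 as in the pseudocode. The names `index_sequential_search`, `starts` and `firsts` are illustrative, and the function returns a 0-based index or -1 instead of printing.

```python
def index_sequential_search(sorted_items, key, group_size=3):
    """Return the 0-based index of key, or -1 if it is absent."""
    n = len(sorted_items)
    if n == 0:
        return -1
    # Build the index: start position and first element of every group.
    starts = list(range(0, n, group_size))
    firsts = [sorted_items[s] for s in starts]

    if key < firsts[0]:
        return -1  # smaller than every element

    # Sequentially scan the index for the last group whose first element <= key.
    group = 0
    while group + 1 < len(starts) and firsts[group + 1] <= key:
        group += 1

    # Sequential search inside the selected group only.
    begin = starts[group]
    end = min(begin + group_size, n)
    for i in range(begin, end):
        if sorted_items[i] == key:
            return i
    return -1


# Assumed sample data.
data = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
print(index_sequential_search(data, 35))  # 6
print(index_sequential_search(data, 12))  # -1
```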


Analysis

• Time Complexity: O(N/K + K)
• If the index is created by selecting every K-th element in the list, i.e. the size of each group is K, then the size of the index is N/K. A sequential search over the index takes O(N/K) time, and the sequential search inside the selected group takes O(K) time. This complexity can be further reduced if we apply binary search on the index array to select the group.
• In that case it takes O(log2(N/K)) time to search the index array and O(K) time to perform the sequential search.

• Space Complexity: ϴ(N/K)
• If the index is created by selecting every K-th element in the list, the size of the index is N/K. This is the additional space beyond the original array. If the index array is already given, the extra space complexity is ϴ(1).


HASHING

Hashing

The above elements are mapped using the hash function H(x) = x mod 5.

Suppose we want to search for the topic hashing in a Data Structure book. Instead of using the linear search or binary search technique, we can directly use the index page to see its exact page number, which finds the topic in O(1) time.
Definition of Hashing
• The process of transforming an element into a secret element using a hash code is known as hashing.
• Hashing in the data structure is a technique of mapping a large chunk of data into small tables using a hashing function.
• It is a technique that uniquely identifies a specific item from a collection of similar items.
Hash Function

The mathematical function used for transforming an element into a mapped one, or into a secret code, is known as a Hash Function.

Hashing Functions

Modulo Method (Division Modulo)


This method computes the hash code using the division method. The simple formula for calculating the hash code is given by:
Hash Code(key) = key mod HashTableSize
The size of the Hash Table is decided based on the input data set.

Input elements: 35, 44, 22, 19, 11, 20, 43, 6, 88, 27
Hash code(35) = 35 mod 10 = 5
Hash code(44) = 44 mod 10 = 4

Index: 0   1   2   3   4   5   6   7   8   9
Key:   20  11  22  43  44  35  6   27  88  19
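
The table can be reproduced with a few lines of Python. This is only a sketch: `modulo_hash` and `build_table` are illustrative names, and the helper deliberately raises an error on a collision because this particular key set happens to have none.

```python
def modulo_hash(key, table_size):
    return key % table_size


def build_table(keys, table_size):
    """Place each key at key mod table_size; fail loudly on a collision."""
    table = [None] * table_size
    for key in keys:
        slot = modulo_hash(key, table_size)
        if table[slot] is not None:
            raise ValueError(f"collision at slot {slot}")
        table[slot] = key
    return table


keys = [35, 44, 22, 19, 11, 20, 43, 6, 88, 27]
print(build_table(keys, 10))
# [20, 11, 22, 43, 44, 35, 6, 27, 88, 19]
```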


Consider a Hash Table of size 100.
The hash codes for 123, 223, 323, 423 are:
123 mod 100 = 23
223 mod 100 = 23
323 mod 100 = 23
423 mod 100 = 23
We cannot store these keys at the same location in the Hash Table.
WHAT CAN BE DONE IN SUCH A CASE?


We can divide the keys by a number with the fewest factors, i.e. a prime number.

Disadvantage: This method may suffer from collisions. If two elements, when passed through the hash function, result in the same hash code, a collision is said to have occurred.

Mid Square Method

The hash code is extracted from the middle of the square of the key.
Consider a hash table of size N in which each location has an address of k digits, i.e. k is the number of digits of the highest possible index. This hashing process picks the middle k digits from the square of the key to act as the hash code if the hash table size is a power of 10; otherwise the modulus of these middle k digits with the table size is taken.

Key     Key^2       H(K)
104     10816       middle k digits of 10816
4012    16096144    middle k digits of 16096144
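
A hedged Python sketch of the mid-square idea follows. The slide does not state the table sizes used for the two keys, nor which digits count as "middle" when the split is uneven, so both are assumptions here: the table sizes 10 and 100 are hypothetical, and the extra digit is left on the right when the square cannot be split evenly.

```python
def mid_square_hash(key, table_size):
    """Hash by taking the middle k digits of key squared.

    k is the number of digits of the largest address (table_size - 1).
    Assumption: for an uneven split, the extra digit stays on the right.
    """
    k = len(str(table_size - 1))
    squared = str(key * key).zfill(k)       # pad very small squares
    start = (len(squared) - k) // 2
    middle = int(squared[start:start + k])
    return middle % table_size              # no effect when size is a power of 10


# Hypothetical table sizes chosen only for illustration.
print(mid_square_hash(104, 10))    # 104^2 = 10816, middle digit 8
print(mid_square_hash(4012, 100))  # 4012^2 = 16096144, middle digits 96
```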


Folding Method
• Divide the key into pieces of equal size (each as long as the largest address in the table) and add these pieces together.
• The modulus of the sum with the table size is taken, which gives the hash code.
H(k) = sum mod N
Let the table size (N) be 1000, i.e. addresses range from 0 – 999. The largest address has 3 digits. If the key is 12345678, it is broken down into groups of 3 digits each, starting from the right:
12 + 345 + 678 = 1035
Hash code = 1035 % 1000 = 35


If the table size (N) is 13, i.e. addresses range from 0 – 12, the largest address has 2 digits, so the key is divided into groups of 2 digits each. If the key is 12345678,
12 + 34 + 56 + 78 = 180
Hash Code = 180 % 13 = 11
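
Both folding examples can be checked with a short Python sketch. The name `fold_hash` is illustrative, and the grouping from the right-hand side is inferred from the 12 + 345 + 678 split in the first example.

```python
def fold_hash(key, table_size):
    """Split the key's digits into groups, add the groups, take the modulus."""
    group_len = len(str(table_size - 1))   # digits in the largest address
    digits = str(key)
    total = 0
    end = len(digits)
    while end > 0:                         # walk the groups from the right
        start = max(0, end - group_len)
        total += int(digits[start:end])
        end = start
    return total % table_size


print(fold_hash(12345678, 1000))  # 35
print(fold_hash(12345678, 13))    # 11
```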

A variation of the folding method is Reverse Folding, in which either the odd group or the
even group is reversed before addition.

Collision Resolution in Hashing

When the hash function maps two keys to the same location, the situation is termed a collision.
H: Key1 → L
H: Key2 → L
As two data values cannot be kept in the same location, a collision is not a desirable situation.
Avoiding collisions completely is difficult, even with a good hash function. Some method should be used to resolve them.


There are two known methods of collision resolution in Hashing.


• Open Addressing
• Chaining


Open Addressing
Every key considers the entire table as the storage space. Thus, if it does not find the appropriate place for storage through the hash function, it tries to find the next free available slot.
There are 3 different open addressing mechanisms:
• Linear Probing
• Quadratic Probing
• Rehashing/Double hashing


Linear Probing
If the key cannot be stored/searched at the given hash location, try to find the next
available free slot by traversing sequentially
If the table size is TS,
H(K,j) = (H(K) + j) modulus TS
j=0, 1, 2, ...
Sequence of investigation:
• H(K)
• (H(K) + 1 ) modulus TS
• (H(K) + 2) modulus TS,
• …
Modulus is applied because the Hash table is considered to be circular in nature.
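
The probe sequence can be sketched in Python as below. The sketch assumes H(K) = K mod TS, uses `None` to mark a free slot, and does not handle deletion; the function names and the table of size 5 are illustrative.

```python
def linear_probe_insert(table, key):
    """Insert key using H(K, j) = (H(K) + j) mod TS; return the slot used."""
    size = len(table)
    home = key % size                 # assumed H(K)
    for j in range(size):
        slot = (home + j) % size      # wraps around: the table is circular
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


def linear_probe_search(table, key):
    """Follow the same probe sequence; return the slot or -1."""
    size = len(table)
    home = key % size
    for j in range(size):
        slot = (home + j) % size
        if table[slot] is None:
            return -1                 # reached a free slot: key is absent
        if table[slot] == key:
            return slot
    return -1


table = [None] * 5                    # hypothetical table size
print(linear_probe_insert(table, 4))  # 4
print(linear_probe_insert(table, 9))  # 0 (slot 4 is taken, wraps to 0)
print(linear_probe_search(table, 9))  # 0
```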

Quadratic Probing
Idea: when there is a collision, check the next candidate position in the table using the quadratic formula:
H’(K,j) = (H(K) + j^2) modulus TS
j = 0, 1, 2, ...
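
A minimal Python sketch of quadratic probing, again assuming H(K) = K mod TS. The table size 7 and the keys are hypothetical. Unlike linear probing, the quadratic sequence does not necessarily visit every slot, so the sketch gives up after TS attempts.

```python
def quadratic_probe_insert(table, key):
    """Insert key using H'(K, j) = (H(K) + j^2) mod TS; return the slot used."""
    size = len(table)
    home = key % size                     # assumed H(K)
    for j in range(size):
        slot = (home + j * j) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no free slot found along the probe sequence")


table = [None] * 7                         # hypothetical table size
print(quadratic_probe_insert(table, 10))   # 3
print(quadratic_probe_insert(table, 17))   # 4  (3 + 1^2)
print(quadratic_probe_insert(table, 24))   # 0  ((3 + 2^2) mod 7)
```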


Rehashing/Double Hashing
The first hash function is used to find the hash table location, and the second hash function is used to find the increment sequence.

H(K,j) = (H(K) + j*H’(K)) modulus TS, j = 0, 1, ...

Sequence of investigation:
- H(K)
- (H(K) + H’(K)) modulus TS
- (H(K) + 2*H’(K)) modulus TS
- (H(K) + 3*H’(K)) modulus TS and so on


Example
• H(K) = K modulus 13
• H’(K) = 1 + (K modulus 11)
• H(K,i) = (H(K) + i*H’(K)) modulus 13

• Insert key 14 in the given table

H’(14) = 1 + (14 modulus 11) = 4
H(14,0) = 14 modulus 13 = 1
H(14,1) = (H(14) + H’(14)) modulus 13 = (1 + 4) modulus 13 = 5
H(14,2) = (H(14) + 2*H’(14)) modulus 13 = (1 + 8) modulus 13 = 9
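
The probe sequence for key 14 can be replayed in Python. The table from the figure is not reproduced here, so the sketch assumes that slots 1 and 5 are already occupied (by the hypothetical keys 27 and 18), which forces the third probe.

```python
def h1(key):
    return key % 13            # H(K)


def h2(key):
    return 1 + (key % 11)      # H'(K), never zero


def double_hash_insert(table, key):
    """Insert key using H(K, i) = (H(K) + i * H'(K)) mod TS; return the slot."""
    size = len(table)
    for i in range(size):
        slot = (h1(key) + i * h2(key)) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no free slot found along the probe sequence")


table = [None] * 13
double_hash_insert(table, 27)          # assumed existing key, lands in slot 1
double_hash_insert(table, 18)          # assumed existing key, lands in slot 5
print(double_hash_insert(table, 14))   # 9 (probes slots 1, 5, then 9)
```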

Chaining
• The chaining method uses an array of linked lists for the Hash Table, in contrast to a linear array.
• The keys with the same hash address go to the same linked list.
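
A small Python sketch of chaining is given below. For brevity it uses Python lists as the chains in place of hand-written linked lists, and it assumes the division hash key mod table size; the class and method names are illustrative.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining."""

    def __init__(self, table_size):
        # One (initially empty) chain per hash address.
        self.buckets = [[] for _ in range(table_size)]

    def _slot(self, key):
        return key % len(self.buckets)   # assumed hash function

    def insert(self, key):
        chain = self.buckets[self._slot(key)]
        if key not in chain:             # keep each key only once
            chain.append(key)

    def contains(self, key):
        return key in self.buckets[self._slot(key)]


table = ChainedHashTable(100)
for key in (123, 223, 323, 423):         # all hash to address 23
    table.insert(key)
print(table.buckets[23])                 # [123, 223, 323, 423]
print(table.contains(323), table.contains(523))  # True False
```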

Load Factor of a Hash Table

• Load factor is defined as


λ = N/TS
– N refers to the Total number of keys stored in the hash table
– TS refers to the Hash Table Size
