
DS (KCS-301) Unit 3 CSE Sorting and Searching

The document discusses different hashing techniques used for efficient searching and retrieval of records from storage. Hashing uses a hash function to map records to unique addresses in storage. Common hash functions include division, mid-square, and folding methods. Collisions occur when different keys map to the same address. Separate chaining and closed hashing like linear probing are methods used to handle collisions. The ideal hash function minimizes collisions and evenly distributes records for efficient storage and retrieval.

Uploaded by

fijoxa3396
Data Structure (KCS-301)
(Unit-3)


Index Sequential Search


------------------------------------------------

Hashing: Hashing (also called hash addressing) is a searching technique whose search
time is, ideally, independent of the number of input elements n.

A hash function H is applied to a key K to generate a memory address L, i.e.

H : K -> L

Some applications, such as direct files, require the search to be of order O(1), i.e.
the time of search should be independent of the number of elements or keys in the storage
area. In this situation a hash function is used to obtain an address (ideally a unique one)
for a given key or token.

A hash function is any function that can be used to map data of arbitrary size to data of a
fixed size. The values returned by a hash function are called hash values, hash codes,
digests, or simply hashes.

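Example: MD5 and SHA-1 are well-known hash functions designed for cryptographic applications.
Both are now considered insecure against collision attacks, and they serve a different purpose
from the table-addressing hash functions described below.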

Types of Hashing Functions:

➢ Division method
➢ Midsquare method
➢ Folding method

1. Division method: In this method the key K is divided by a prime number p. After the
division, the remainder is taken as the address.

Ex. Let key K = 189235 and prime number p = 41 (say)

∴ Hash Address = K mod p
               = 189235 mod 41
               = 20 (remainder)
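The division method above can be checked with a few lines of Python. This is a minimal sketch: the function name `division_hash` and the default divisor 41 are illustrative assumptions taken from the worked example, not part of any library.

```python
def division_hash(key: int, p: int = 41) -> int:
    """Division-method hash: the remainder left after dividing key by p.

    p is assumed to be a prime number, as in the worked example.
    """
    return key % p


# Worked example from the notes: K = 189235, p = 41
print(division_hash(189235))  # prints 20
```

The result is always in the range 0 to p-1, so p also fixes the number of addresses available.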

2. Midsquare method: In this method the value of the key K is squared, and a suitable
number of digits from the middle of K² is chosen as the address of the record.
Let us assume that the 4th and 5th digits from the right of K² are selected as the hash
address, as shown below:

K            : 5314      6218      9351
K²           : 28238596  38663524  87441201
Hash Address : 38        63        41

So the records with K = 5314, 6218 and 9351 would be stored at addresses 38, 63 and 41
respectively.
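The midsquare example can be reproduced as follows. This is a sketch under the example's own assumption that the 4th and 5th digits from the right of K² form the address; the name `midsquare_hash` is hypothetical.

```python
def midsquare_hash(key: int) -> int:
    """Midsquare hash: square the key and keep two middle digits.

    Following the example, the 4th and 5th digits from the right are kept:
    dropping the three rightmost digits (// 1000) and then keeping the
    next two (% 100) selects exactly those positions.
    """
    square = key * key
    return (square // 1000) % 100


for k in (5314, 6218, 9351):
    print(k, k * k, midsquare_hash(k))
```

Which digits to keep is a design choice; a different table size would call for a different number of middle digits.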

3. Folding method: In this method, the key is split into pieces and a suitable arithmetic
operation is performed on the pieces. The operation can be add, subtract, divide etc.

Ex. i) Let key K = 189235

Let us split it into two parts, 189 and 235

By adding the two parts we get

Hash Address = 189 + 235 = 424

ii) Let key K = 123529164

Splitting it into three parts, 123, 529 and 164, and adding them we get

Hash Address = 123 + 529 + 164 = 816
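Both folding examples can be reproduced with the sketch below. It assumes the key is split into three-digit pieces from the left and that the pieces are added; the name `folding_hash` and the parameter `piece_len` are illustrative, not taken from the notes.

```python
def folding_hash(key: int, piece_len: int = 3) -> int:
    """Folding hash: split the decimal digits into pieces and add them.

    If the number of digits is not a multiple of piece_len, the last
    piece is simply shorter.
    """
    digits = str(key)
    pieces = [int(digits[i:i + piece_len])
              for i in range(0, len(digits), piece_len)]
    return sum(pieces)


print(folding_hash(189235))     # 189 + 235
print(folding_hash(123529164))  # 123 + 529 + 164
```

The sum can be larger than any single piece, so a real table would typically reduce it further (for example with the division method) to fit the table size.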

There is a chance that applying the hash function to different keys generates the same
address. Such a mapping of different keys to the same address is known as a collision, and
the keys involved are called synonyms.

H : K1 -> L

H : K2 -> L

To manage collisions, the overflowed keys must be stored in some other storage
space called the overflow area.

** So the preferred hash function is the one that generates the minimum number of collisions.

Requirements for Hashing Algorithms:

The important features required in a hash algorithm or function are:

Repeatable: The function should always generate the same address for the same key, so that
a record can be stored and retrieved afterward.

Even Distribution: The records of a file should be evenly distributed throughout the
allocated storage space.

Minimum collision: It should generate unique addresses for different keys as far as possible,
so that the number of collisions is minimized.

Overflow Management (Collision Handling) Methods:


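Separate Chaining (Open hashing):
The idea is to make each cell of the hash table point to a linked list of records that have
the same hash function value.
Let us consider a simple hash function “key mod 7” and the sequence of keys 50, 700,
76, 85, 92, 73, 101. Keys 50, 85 and 92 all hash to address 1, so they are linked into one
chain at that cell; similarly 73 and 101 share address 3, while 700 and 76 sit alone at
addresses 0 and 6.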
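The chains produced by this example can be traced with a short sketch. It makes one simplifying assumption: Python lists stand in for the linked lists, since only the grouping of keys matters here. The names `build_chained_table` and `search` are hypothetical.

```python
def build_chained_table(keys, size=7):
    """Separate chaining: each cell holds a chain of keys with the same hash.

    A Python list is used per cell in place of a linked list (assumption
    made for brevity); the hash function is "key mod size".
    """
    table = [[] for _ in range(size)]
    for key in keys:
        table[key % size].append(key)
    return table


def search(table, key):
    """Look only in the chain of the cell the key hashes to."""
    return key in table[key % len(table)]


table = build_chained_table([50, 700, 76, 85, 92, 73, 101])
for address, chain in enumerate(table):
    print(address, chain)
```

A search inspects a single chain, which is why long chains are what push the search time toward O(n).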

Advantages:
1) Simple to implement.
2) The hash table never fills up; we can always add more elements to a chain.
3) Less sensitive to the hash function or load factor.
4) It is mostly used when it is unknown how many and how frequently keys may be inserted
or deleted.
Disadvantages:
1) Cache performance of chaining is not good, as keys are stored in linked lists. Open
addressing provides better cache performance, as everything is stored in the same table.
2) Wastage of space (some parts of the hash table are never used).
3) If a chain becomes long, the search time can become O(n) in the worst case.
4) Uses extra space for links.


Closed Hashing: The different methods in closed hashing are as follows:

1. Linear probing: if location L is full then go to location L+1, if location L+1 is full
then go to location L+2 and so on.
2. Quadratic probing: if location L is full then go to location L+1², if location L+1² is
full then go to location L+2² = L+4 and so on.

** (probing means searching)

3. Double hashing: Two hash functions are used:

H1 : K -> L_H1 (let H1 be the Division method hash function)

H2 : K -> L_H2 (let H2 be the Midsquare method hash function)

If location L_H1 is full then go to location L_H1 + L_H2, if location L_H1 + L_H2 is
full then go to location L_H1 + 2*L_H2, if location L_H1 + 2*L_H2 is full then go to
location L_H1 + 3*L_H2 and so on.

4. Rehashing: When the address generated by a hash function F1 for a key Kj collides with
that of another key Ki, another hash function F2 is applied to the address to obtain a new
address A2. The collided key Kj is then stored at the new address A2.

If the space referred to by A2 is also occupied, the process of rehashing is
repeated. In fact, rehashing is applied repeatedly to the intermediate address until a free
location is found where the record can be stored.
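The three probing schemes (linear, quadratic and double hashing) differ only in which location is tried on the i-th attempt, which the sketch below makes explicit. Several things here are assumptions and not from the notes: the table size of 7, wrapping addresses around with mod, all function names, and the second hash `1 + key mod (size - 1)`, which is swapped in for the midsquare H2 because it can never produce a step of zero.

```python
def linear_probe(home, i, key, size):
    # i-th attempt: L, L+1, L+2, ...
    return (home + i) % size


def quadratic_probe(home, i, key, size):
    # i-th attempt: L, L+1^2, L+2^2, ...
    return (home + i * i) % size


def double_hash_probe(home, i, key, size):
    # i-th attempt: L_H1, L_H1 + L_H2, L_H1 + 2*L_H2, ...
    step = 1 + key % (size - 1)  # hypothetical H2, chosen to be non-zero
    return (home + i * step) % size


def insert(table, key, probe):
    """Store key in the first free location along its probe sequence."""
    size = len(table)
    home = key % size  # H1: division method
    for i in range(size):
        slot = probe(home, i, key, size)
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free location found along the probe sequence")


# Keys 50, 85 and 92 all have home address 1 under "key mod 7".
for probe in (linear_probe, quadratic_probe, double_hash_probe):
    table = [None] * 7
    print(probe.__name__, [insert(table, k, probe) for k in (50, 85, 92)])
```

With the same three colliding keys, linear probing fills neighbouring locations, while quadratic probing and double hashing spread them further apart.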
