10 Hashing Indexing

The document discusses indexing and hashing techniques for efficiently searching large datasets. It explains the motivation for indexing by comparing search times for different columns. Indexing creates an index table that maps column values like phone numbers to row numbers, allowing faster searching. The document also covers hashing, which maps keys to hash codes using a hash function to create a hash table for search. It discusses collision resolution techniques like chaining and probing to handle collisions.

Uploaded by

Harnek Singh

CSN 102: DATA STRUCTURES
Indexing and Hashing
Example
• Consider a large number of records, with multiple fields in
each record

SID       Name              Email              Phone
17103001  Paras Gupta       [email protected]  7889541349
17103002  Sanamdeep Singh   [email protected]  8284942755
17103003  Shreya Gupta      [email protected]  7347555334
17103004  Ashutosh Sah      [email protected]  9766749590
17103005  Barleen Dhaliwal  [email protected]  7009047379
17103006  Pranav Dhingra    [email protected]  8284841852
Motivation for Indexing
• Search(SID=17103004): Apply binary search on the SID
column, since the records are sorted on SID. Time complexity O(log n)
• Search(Phone=7347555334): Linear search over the Phone
column. Time complexity O(n)
OR
Sort the complete data on the Phone column, and then binary
search. Time complexity = O(n log n) + O(log n)
= O(n log n)

• Sorting and then searching is therefore not efficient: it
costs more than the O(n) linear search.


Indexing
• Create a table with two fields per entry (key, pointer to
record) for the column. Sort this table on the key. [This is a
one-time cost]
• The next time a non-sorted column of the data set is searched,
search this index table to get the address of the desired
record in O(log n)
Index Table: Example
Phone Row No
7009047379 5
7347555334 3
7889541349 1
8284841852 6
8284942755 2
9766749590 4

• Search(Phone=7347555334): O(log n) using index table
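
The index table above can be sketched in Python as follows. This is only an illustration, not the course's implementation: the names build_index and index_search are hypothetical, and the index is stored as two parallel lists so that the standard-library bisect module can binary-search the keys.

```python
from bisect import bisect_left

# Phone column in record order (row numbers start at 1, as on the slide).
phones = [7889541349, 8284942755, 7347555334,
          9766749590, 7009047379, 8284841852]

def build_index(column):
    """One-time step: sort (key, row no) pairs and split them into two lists."""
    pairs = sorted((key, row) for row, key in enumerate(column, start=1))
    keys = [key for key, _ in pairs]
    rows = [row for _, row in pairs]
    return keys, rows

def index_search(keys, rows, key):
    """Binary search on the sorted keys; return the row number or None."""
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return rows[pos]
    return None

keys, rows = build_index(phones)
row = index_search(keys, rows, 7347555334)
print(row)
```

The sort is paid once when the index is built; each later search touches only O(log n) index entries.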


Types of Indexing
• Dense indexing: The index table has an entry for each record,
with a pointer to the corresponding record
• Searching with a dense index means locating the key in the
index table and then accessing the record through its pointer
• E.g.
Phone Row No
7009047379 5
7347555334 3
7889541349 1
8284841852 6
8284942755 2
9766749590 4
Types of Indexing (cont’d)
• Sparse indexing: The index table has entries for only some
of the records
• Possible only when the records are sorted on the key
• To search for a key, find the range of records within which
the desired record lies, then search that range
• E.g.

SID Row No
17103001 1
17103003 3
17103005 5
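
A sparse-index lookup on this example might look like the sketch below. The function name sparse_search is hypothetical, and the scan inside the located range is a plain linear scan chosen for brevity.

```python
from bisect import bisect_right

# Records sorted by SID (row numbers start at 1, as on the slide).
sids = [17103001, 17103002, 17103003, 17103004, 17103005, 17103006]

# Sparse index from the slide: an entry for every second record only.
index_keys = [17103001, 17103003, 17103005]
index_rows = [1, 3, 5]

def sparse_search(key):
    """Return the row number of key, or None if it is absent."""
    # Last index entry whose key is <= the search key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None                       # key is smaller than every record
    start = index_rows[pos]
    # The range ends where the next index entry begins (or at the last record).
    end = index_rows[pos + 1] if pos + 1 < len(index_rows) else len(sids) + 1
    for row in range(start, end):
        if sids[row - 1] == key:
            return row
    return None

print(sparse_search(17103004))
```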
Multilevel Indexing
• Indexing also serves to reduce the number of disk accesses
• If the index table is too large, create an index on the index
table. This second index has to be sparse, so that it is smaller
than the table it indexes
• Repeat this strategy until the top-level index table is
sufficiently small
• Searching starts at the highest level and proceeds down
through the lower levels to the 1st-level index
Multilevel Indexing (cont’d)
• Example:

2nd level Index
Phone       Row No
7009047379  1
7889541349  3
8284942755  5

1st level Index
Phone       Row No
7009047379  5
7347555334  3
7889541349  1
8284841852  6
8284942755  2
9766749590  4
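
A two-level lookup over this example can be sketched as below. It assumes that the "Row No" column of the 2nd-level index gives the position (starting at 1) of the entry in the 1st-level index, which is how the example reads; the name multilevel_search is hypothetical.

```python
from bisect import bisect_right

# 1st-level (dense) index: (phone, row no in the data set), sorted by phone.
level1 = [(7009047379, 5), (7347555334, 3), (7889541349, 1),
          (8284841852, 6), (8284942755, 2), (9766749590, 4)]

# 2nd-level (sparse) index: phone and its position in level1, starting at 1.
level2_keys = [7009047379, 7889541349, 8284942755]
level2_pos = [1, 3, 5]

def multilevel_search(phone):
    """Search the 2nd level first, then only the matching part of level 1."""
    top = bisect_right(level2_keys, phone) - 1
    if top < 0:
        return None
    start = level2_pos[top]
    end = level2_pos[top + 1] if top + 1 < len(level2_pos) else len(level1) + 1
    for i in range(start, end):
        key, row = level1[i - 1]
        if key == phone:
            return row
    return None

print(multilevel_search(8284841852))
```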
Hash Function
• A hash function is a function that maps data of arbitrary
size to values of a fixed size
• A hash function maps keys to hash codes (also called hash
values or hashes)
• Hash functions have many applications, such as building hash
tables and cryptography
Hashing
• A key in the data set is mapped to a hash code (an index into
the hash table) using a hash function
• The hash table therefore stores the key and a pointer to the
record in the actual data set

Key -> Hash Function -> Hash Code
Hash Table: Example

Data set
SID       Name              Phone
17103001  Paras Gupta       7889541349
17103002  Sanamdeep Singh   8284942755
17103003  Shreya Gupta      7347555334
17103004  Ashutosh Sah      9766749590
17103005  Barleen Dhaliwal  7009047378
17103006  Pranav Dhingra    8284841852

Hash Table
Index  Phone       Row
0      9766749590  3
1
2      8284841852  5
3
4      7347555334  2
5      8284942755  1
6
7
8      7009047378  4
9      7889541349  0
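
The slide does not state which hash function produced this table. The indices shown are consistent with Phone % 10 and with rows numbered from 0, so the sketch below assumes both; build_hash_table and lookup are hypothetical names.

```python
M = 10  # table size assumed from the example (indices 0 to 9)

# Phone column of the example data set, in record order.
phones = [7889541349, 8284942755, 7347555334,
          9766749590, 7009047378, 8284841852]

def build_hash_table(keys, m):
    """Store (key, row) at index key % m; rows are numbered from 0."""
    table = [None] * m
    for row, key in enumerate(keys):
        index = key % m
        if table[index] is not None:
            # No collision handling in this sketch.
            raise ValueError(f"collision at index {index}")
        table[index] = (key, row)
    return table

def lookup(table, key):
    """Return the row stored for key, or None if it is absent."""
    entry = table[key % len(table)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

table = build_hash_table(phones, M)
print(lookup(table, 7347555334))
```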
Hash Function: Example
• Key % M, where M is the size of the hash table
• Key folding % M: e.g. if keys have length 3n, split the key
into groups of 3 digits and sum the groups. Then take the sum mod M.
Let key = 123456789
folded key = 123 + 456 + 789 = 1368
% M = 1368 % 1000 = 368

Let key = 789456123
folded key = 789 + 456 + 123 = 1368
% M = 1368 % 1000 = 368

Both keys map to the same hash code, 368.
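
The folding calculation above can be reproduced with a short function. This is a sketch: fold_hash is a hypothetical name, and when the key length is not a multiple of 3 the last group is simply shorter, a case the slide does not cover.

```python
def fold_hash(key, m, group=3):
    """Split the decimal digits of key into groups, sum them, take mod m."""
    digits = str(key)
    total = sum(int(digits[i:i + group])
                for i in range(0, len(digits), group))
    return total % m

a = fold_hash(123456789, 1000)   # 123 + 456 + 789 = 1368, 1368 % 1000 = 368
b = fold_hash(789456123, 1000)   # 789 + 456 + 123 = 1368, 1368 % 1000 = 368
print(a, b)
```

Because addition ignores the order of the groups, any two keys made of the same 3-digit groups fold to the same hash code.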
Collision
• A collision occurs when two keys are mapped to the same hash
index
• We need collision resolution techniques.
• Some techniques are:
1. Chaining
2. Open Addressing:
   1. Linear probing
   2. Quadratic probing
   3. Double Hashing
Chaining
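• If two or more keys map to the same hash index, create a
linked list at that index and store the keys in it
• E.g. Let M=10 and keys={10,20,30,40,50}. Assuming hash
code = key % M, every key maps to index 0:

Index  Keys
0      10 -> 20 -> 30 -> 40 -> 50
1
2
3
4
5
6
7
8
9

• Any number of keys can be accommodated
• Search time increases, since the list at an index has to be
traversed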
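
A minimal chaining sketch follows. It assumes hash code = key % M and uses a Python list for each chain in place of a hand-written linked list; the class name ChainedHashTable is hypothetical.

```python
class ChainedHashTable:
    """Hash table in which each index holds a chain (here a list) of keys."""

    def __init__(self, m):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def insert(self, key):
        bucket = self.buckets[key % self.m]
        if key not in bucket:          # skip duplicates
            bucket.append(key)

    def contains(self, key):
        # Only the chain at the key's hash index is traversed.
        return key in self.buckets[key % self.m]

chained = ChainedHashTable(10)
for k in (10, 20, 30, 40, 50):
    chained.insert(k)
print(chained.buckets[0])
```

With these keys every insert lands in the chain at index 0, so a search degrades to a linear scan of that one chain.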
Linear Probing
• If the hash index is not available, store the key at the next
available index
• Add i to the hash code and take % M, i.e. try index
(hash code + i) % M for i = 1, 2, 3, 4, … until a free index is found
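
Insertion with linear probing can be sketched as follows, assuming hash code = key % M and a table that marks free indices with None; insert_linear is a hypothetical name.

```python
def insert_linear(table, key):
    """Place key at the first free index (hash + i) % M, i = 0, 1, 2, ..."""
    m = len(table)
    home = key % m
    for i in range(m):
        index = (home + i) % m
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")

linear_table = [None] * 10
linear_slots = [insert_linear(linear_table, k) for k in (10, 20, 30, 41)]
print(linear_slots)
```

Keys 10, 20 and 30 all hash to index 0 and fill indices 0, 1 and 2; key 41 hashes to index 1, finds 1 and 2 taken, and ends up at index 3.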
Quadratic probing
• If the hash index is not available, increase the hash code
by i + i^2 (and take % M), where i = 1, 2, 3, 4, …
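
The probe sequence with offset i + i^2 can be sketched as below, again assuming hash code = key % M and None for free indices; insert_quadratic is a hypothetical name.

```python
def insert_quadratic(table, key):
    """Place key at the first free index (hash + i + i*i) % M."""
    m = len(table)
    home = key % m
    for i in range(m):
        index = (home + i + i * i) % m
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free index found in the probe sequence")

quad_table = [None] * 10
quad_slots = [insert_quadratic(quad_table, k) for k in (10, 20, 30)]
print(quad_slots)
```

Note that i + i^2 = i(i + 1) is always even, so with M = 10 the offsets only ever reach 0, 2 and 6 from the home index. A fourth key hashing to index 0 would fail here even though the table is mostly empty, which is why the choice of M and of the offset formula matters for this technique.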
Double Hashing
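• Use two hash functions, H1 and H2, to generate the hash code
for a key
• If index H1 is not available, try (H1+H2)%M; more generally,
the i-th probe is (H1 + i*H2)%M, where i = 1, 2, 3, 4, …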
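
A common way to turn the two hash values into a probe sequence is (H1 + i*H2) % M, where i = 1 gives the (H1+H2)%M of the slide. The sketch below uses that form. The two functions h1 and h2 and the table size M = 11 are assumptions chosen for illustration: h2 never returns 0, and a prime M lets every probe sequence reach every index.

```python
def h1(key, m):
    return key % m

def h2(key, m):
    # Hypothetical second hash function; always in the range 1 .. m-1.
    return 1 + (key % (m - 1))

def insert_double(table, key):
    """Place key at the first free index (H1 + i * H2) % M, i = 0, 1, 2, ..."""
    m = len(table)
    start, step = h1(key, m), h2(key, m)
    for i in range(m):
        index = (start + i * step) % m
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free index found in the probe sequence")

double_table = [None] * 11
double_slots = [insert_double(double_table, k) for k in (11, 22, 33)]
print(double_slots)
```

All three keys have H1 = 0, but their H2 values differ (2, 3 and 4), so the colliding keys 22 and 33 move to different indices instead of following the same probe path.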
Search in Hash Table
• To search for a key, generate the hash index for the key and
look at that location in the hash table
• If a different key is found there, continue probing according
to the collision resolution technique used, until the key is
found or an empty location is reached
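
As an illustration, a search in a table filled by linear probing could look like this. It assumes hash code = key % M, None for empty locations, and that no keys have been deleted (deletion needs extra bookkeeping that the slides do not cover); search_linear is a hypothetical name.

```python
def search_linear(table, key):
    """Return the index holding key, or None if key is not in the table."""
    m = len(table)
    home = key % m
    for i in range(m):
        index = (home + i) % m
        if table[index] is None:
            return None          # empty location reached: key is absent
        if table[index] == key:
            return index
    return None                  # every location probed without a match

# Table after inserting 10, 20, 30, 41 with linear probing (M = 10).
probe_table = [10, 20, 30, 41, None, None, None, None, None, None]
found = search_linear(probe_table, 41)
missing = search_linear(probe_table, 51)
print(found, missing)
```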
Perfect Hash Function
• If each key in a given data set is mapped to a unique hash
code, the function is called a perfect hash function
• Hard to achieve in practice
• Search time with a perfect hash function is always
constant, O(1), since there are no collisions
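
Whether a given function is perfect for a given data set can be checked directly. The helper is_perfect below is a hypothetical name; the key sets are taken from the earlier examples, with key % 10 assumed as the hash function.

```python
def is_perfect(hash_fn, keys):
    """True if hash_fn gives every key in keys a distinct hash code."""
    codes = [hash_fn(k) for k in keys]
    return len(set(codes)) == len(codes)

# Phone numbers from the hash table example.
example_phones = [7889541349, 8284942755, 7347555334,
                  9766749590, 7009047378, 8284841852]

phones_ok = is_perfect(lambda k: k % 10, example_phones)
chain_ok = is_perfect(lambda k: k % 10, [10, 20, 30, 40, 50])
print(phones_ok, chain_ok)
```

The same function is perfect for one data set and not for the other, which is why perfection is a property of a function together with a specific set of keys.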
