Unit 4-Hashing

This document discusses different hashing techniques used for database management systems. It begins by describing basic hashing and its benefits and drawbacks. It then covers static hashing methods like internal and external hashing. The document introduces dynamic hashing techniques like extendible hashing and linear hashing to allow hash tables to dynamically grow and shrink. It provides examples and explanations of how these techniques work. Finally, it compares hashing to B+ trees in terms of speed and growth characteristics.

MC 302 – DBMS: Hashing

Goonjan Jain
Department of Applied Mathematics
Delhi Technological University
Outline
• hashing
• extendible hashing
• linear hashing
• Hashing vs B+ trees
Hashing
• Primary file organization
• Very fast access to records under certain search conditions
• Search condition must be an equality condition on a single field: the hash field (or hash key)
• Hash function (randomizing function): applied to the hash field value, yields the address of the disk block that holds the record
Static Hashing
• Fixed number of buckets
• Drawback for dynamic files
• Types:
• Internal Hashing
• External Hashing
Internal Hashing
• For internal files
• Implement a hash table using an array of records
• M slots addressed as 0 to M-1
• Choose a hash function that transforms the hash field value into an integer from 0 to M-1
• Most common hash function: h(k) = k mod M
• Problems:
• No guarantee that different values will hash to different addresses
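A minimal sketch of the division hash function and the collision problem it creates. The table size M = 7 and the sample keys are assumptions chosen only for illustration.

```python
M = 7  # number of slots; an arbitrary size assumed for this illustration

def h(k: int) -> int:
    """Division-method hash: map an integer key to a slot in 0..M-1."""
    return k % M

# Sample keys (assumed). 10 and 24 are different keys but share slot 3;
# 15 and 8 are different keys but share slot 1.
keys = [10, 24, 15, 8]
slots = {k: h(k) for k in keys}
print(slots)
```

Distinct keys landing on the same slot is exactly the situation that collision resolution has to handle.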
Collision
• Hash field value hashed to an address already occupied
• Collision resolution: find another position
• Collision Resolution Techniques:
• Open addressing:
• Starting from the hashed address, check subsequent positions until an unused position is found
• Chaining:
• Place new value in an unused overflow location
• Set a pointer from the address to the overflow location

• Multiple Hashing:
• Apply a second hash function
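The first two techniques can be sketched as follows. This is a simplified illustration, not a definitive implementation: the table size M = 7, the keys, and the function names are assumptions, open addressing is shown in its linear-probing form, and chaining is modelled with one Python list per slot instead of explicit overflow locations and pointers. Multiple hashing is not shown.

```python
from typing import List, Optional

M = 7  # table size (assumed for illustration)

def h(k: int) -> int:
    return k % M

def insert_open_addressing(table: List[Optional[int]], k: int) -> int:
    """Linear probing: scan forward (wrapping around) from h(k) to the first free slot."""
    for i in range(M):
        pos = (h(k) + i) % M
        if table[pos] is None:
            table[pos] = k
            return pos
    raise RuntimeError("hash table is full")

def insert_chaining(table: List[List[int]], k: int) -> None:
    """Chaining (simplified): every key hashed to a slot is kept in that slot's list."""
    table[h(k)].append(k)

# 10, 24 and 17 all hash to slot 3.
open_table: List[Optional[int]] = [None] * M
positions = [insert_open_addressing(open_table, k) for k in (10, 24, 17)]

chain_table: List[List[int]] = [[] for _ in range(M)]
for k in (10, 24, 17):
    insert_chaining(chain_table, k)

print(positions)       # slots chosen by linear probing
print(chain_table[3])  # chain attached to slot 3
```

With open addressing the colliding keys spill into neighbouring slots, while with chaining they stay logically attached to their home slot.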
External Hashing
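• Target address space is made of buckets
• Each bucket can hold multiple records
• A bucket is either
• one disk block, or
• a cluster of contiguous blocks
• Hash function maps a key to a relative bucket number
• Collision problem is less severe, because several records can hash to the same bucket without causing problems
• If a bucket is full, a variation of chaining can be used:
• each bucket keeps a pointer to a chain of overflow records; the pointers in the chain are record pointers, each made of a block address and a record position within the block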
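A small sketch of external hashing with bucket overflow, under stated assumptions: 3 buckets, 2 records per bucket, and a Python list standing in for the chain of overflow record pointers. The names `Bucket`, `insert`, and `search` are hypothetical.

```python
BUCKET_CAPACITY = 2  # records that fit in one bucket (assumed)
NUM_BUCKETS = 3      # fixed number of buckets, as in static hashing (assumed)

class Bucket:
    def __init__(self):
        self.records = []   # records stored in the bucket's own block(s)
        self.overflow = []  # stand-in for the chain of overflow records

buckets = [Bucket() for _ in range(NUM_BUCKETS)]

def insert(key):
    """Place the key in its bucket, or in the overflow chain if the bucket is full."""
    b = buckets[key % NUM_BUCKETS]
    if len(b.records) < BUCKET_CAPACITY:
        b.records.append(key)
    else:
        b.overflow.append(key)

def search(key):
    """Look in the home bucket first, then follow its overflow chain."""
    b = buckets[key % NUM_BUCKETS]
    return key in b.records or key in b.overflow

for key in (3, 6, 9, 4):
    insert(key)

print(buckets[0].records, buckets[0].overflow)
```

Keys 3 and 6 fill bucket 0 without any collision handling; only the third key for that bucket (9) has to go to the overflow chain.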
Problem with (static) hashing
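• Overflow: the file grows beyond the fixed number of buckets, so overflow chains get long
• Underflow: the file shrinks, leaving allocated buckets mostly empty
• Solution: dynamic hashing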
Dynamic Hashing
• Idea: shrink / expand the hash table on demand (dynamic hashing)
• Details: how to grow gracefully on overflow?
• Many solutions; one of them is extendible hashing
Extendible Hashing
• Keep a directory with pointers to hash buckets
• Uses:
• Directory: an array of 2^d bucket addresses, where d is the global depth
• The first d bits of the hash value are used as the index of a directory entry
• The directory entry determines the bucket address
• Each bucket stores
• Local depth d’: the number of bits on which the bucket contents are based
• Q: how to divide the contents of a bucket in two?
• A: hash each key into a very long bit string; keep only as many bits as needed
Extendible Hashing – Directory Doubling
• When a bucket overflows, there are 2 cases:
• If local depth d’ = global depth d
• Double the directory, then split the bucket
• If d’ < d
• Split the bucket; no need to double the directory
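The two overflow cases can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a definitive implementation: hash values are 8 bits wide (k mod 256), each bucket holds 2 keys, and a bucket that can no longer be split is simply allowed to grow, as a stand-in for an overflow chain. The class and method names are hypothetical.

```python
HASH_BITS = 8        # assumed width of the hash value
BUCKET_CAPACITY = 2  # assumed number of keys per bucket

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 0
        self.directory = [Bucket(0)]  # 2**global_depth entries

    def _index(self, k):
        # The first global_depth bits of the hash value select the directory entry.
        return (k % (1 << HASH_BITS)) >> (HASH_BITS - self.global_depth)

    def search(self, k):
        return k in self.directory[self._index(k)].keys

    def insert(self, k):
        bucket = self.directory[self._index(k)]
        # Second condition: all hash bits used up, so let the bucket grow (overflow stand-in).
        if len(bucket.keys) < BUCKET_CAPACITY or bucket.local_depth == HASH_BITS:
            bucket.keys.append(k)
            return
        if bucket.local_depth == self.global_depth:
            # Case d' = d: double the directory; entry i becomes entries 2i and 2i+1.
            self.directory = [b for b in self.directory for _ in range(2)]
            self.global_depth += 1
        # Both cases: split the bucket on one more bit.
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        shift = self.global_depth - bucket.local_depth
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> shift) & 1:
                self.directory[i] = sibling  # entries whose new bit is 1 move to the sibling
        old_keys, bucket.keys = bucket.keys, []
        for key in old_keys + [k]:
            self.insert(key)  # redistribute; may trigger a further split

table = ExtendibleHash()
for key in (1, 128, 192, 64, 224, 32):
    table.insert(key)
print(table.global_depth, [b.keys for b in table.directory])
```

In this run, inserting 192 and 224 each force a directory doubling (d’ = d), while inserting 32 only splits a bucket (d’ < d), so the directory stays at 4 entries.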
Extendible Hashing
• Advantages:
• Performance of file does not degrade with increase in size
• Space overhead for directory is negligible
• Splitting causes minor reorganization
• Disadvantages:
• The directory must be searched before the bucket: 2 block accesses instead of 1
Linear hashing
• Motivation: extendible hashing needs directory which doubles
• Q: can we do something simpler, with smoother growth?
• A: split buckets from left to right, regardless of which one overflowed
Initially: h(x) = x mod N (N=3 here)
Assume capacity: 2 records / bucket
Use two hash functions:
h0(x) = x mod N (N=3 here) - for buckets that have not been split yet
h1(x) = x mod (2*N) (N=3 here) - for buckets that have already been split
Linear Hashing Example
• In the following, M=3 (the initial number of buckets, called N above)
• Each bucket holds 2 keys, plus one extra slot for an overflow key.
• s is a pointer to the split location, i.e. the bucket where the next split will take place.
• Insert order: 1,7,3,8,12,4,11,2,10,13
• After inserting the keys up to 12: bucket 0 = {3, 12}, bucket 1 = {1, 7}, bucket 2 = {8}, and s points to bucket 0
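Linear Hashing Example
• Inserting 4 (4 mod 3 = 1) overflows bucket 1. On overflow we split the bucket that s points to, bucket 0 here (no matter whether it is full or partially empty), and then increment s.
• Bucket 0 is split and all of its keys are rehashed with h1.
• 12 stays in bucket 0 (12 mod 6 = 0) and 3 is placed in the new bucket 3 (3 mod 6 = 3).
• Then 11 and 2 are inserted. Both hash to bucket 2 (not yet split), and inserting 2 overflows it.
• s is pointing to bucket 1, hence bucket 1 is split by rehashing its keys.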
Linear Hashing Example
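• After the split: bucket 1 = {1, 7}, new bucket 4 = {4}, and s points to bucket 2
• Insertion of 10:
• 10 mod 3 = 1 and bucket 1 < s, so bucket 1 has already been split; hash 10 again using h1(10) = 10 mod 6 = 4, which places it in bucket 4
• Insertion of 13:
• 13 mod 3 = 1 again, so rehash: h1(13) = 13 mod 6 = 1, bucket 1
• Bucket 1 is full, so this is an overflow
• Split the bucket that s points to (bucket 2).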
Linear Hashing Example
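• Final hash table: bucket 0 = {12}, bucket 1 = {1, 7} with overflow key 13, bucket 2 = {8, 2}, bucket 3 = {3}, bucket 4 = {4, 10}, bucket 5 = {11}
• s is moved back to the top (bucket 0) because one full cycle of splits is completed, and the level is incremented.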
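The insertion and split rules used in this example can be sketched as a short simulation. It is a minimal sketch under stated assumptions: overflow keys are simply kept in the same Python list as the bucket's regular keys, any insertion that overflows a bucket triggers one split at s, and the class name `LinearHash` is hypothetical.

```python
N = 3         # initial number of buckets (M in the example)
CAPACITY = 2  # keys per bucket before an insertion counts as an overflow

class LinearHash:
    def __init__(self):
        self.level = 0  # number of completed split rounds
        self.s = 0      # split pointer: next bucket to split
        self.buckets = [[] for _ in range(N)]

    def _address(self, k):
        b = k % (N * 2 ** self.level)            # h0 in the first round
        if b < self.s:                           # bucket b was already split this round
            b = k % (N * 2 ** (self.level + 1))  # h1 in the first round
        return b

    def insert(self, k):
        b = self._address(k)
        self.buckets[b].append(k)  # keys beyond CAPACITY act as overflow keys
        if len(self.buckets[b]) > CAPACITY:
            self._split()          # split the bucket at s, not necessarily bucket b

    def _split(self):
        old = self.buckets[self.s]
        self.buckets[self.s] = []
        self.buckets.append([])    # new bucket has index s + N * 2**level
        for key in old:
            self.buckets[key % (N * 2 ** (self.level + 1))].append(key)
        self.s += 1
        if self.s == N * 2 ** self.level:  # every bucket of this round has been split
            self.s = 0
            self.level += 1

table = LinearHash()
for key in (1, 7, 3, 8, 12, 4, 11, 2, 10, 13):
    table.insert(key)
print(table.buckets, table.s, table.level)
```

Running the insert order from the example ends with six buckets, s back at 0 and the level incremented, with 13 still sitting as an overflow key in bucket 1.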
Linear hashing - Searching
• Algorithm to find key ‘k’:
    compute b = h0(k)      // original slot
    if b < s               // bucket b has already been split
        compute b = h1(k)
    search bucket b
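The lookup steps above translate almost directly into code. In this sketch the function name `find_bucket` and its parameters are assumptions; `n` is the initial number of buckets and `s` the current split pointer, with h0(k) = k mod n and h1(k) = k mod 2n as defined earlier.

```python
def find_bucket(k: int, s: int, n: int) -> int:
    """Return the number of the bucket that must be searched for key k."""
    b = k % n            # h0: original slot
    if b < s:            # that bucket has already been split
        b = k % (2 * n)  # h1: decides between the old bucket and its split image
    return b

# State taken from the example once buckets 0 and 1 have been split: n = 3, s = 2.
print(find_bucket(10, s=2, n=3))  # h0 gives 1 < s, so h1 is used
print(find_bucket(11, s=2, n=3))  # h0 gives 2, not yet split, so h0 is final
```

Only one extra comparison and at most one extra hash computation are needed; no directory lookup is involved.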
Linear hashing - Deletion
• Deletion is the inverse of insertion
• If a bucket underflows, contract the file
• If the last bucket is empty,
• remove it and
• decrement s.
• If s is 0 and the last bucket becomes empty,
• s is made to point to bucket (n/2)-1, where n is the current number of buckets,
• the level is decremented, and
• the empty bucket is removed.
B+ Trees vs Hashing
Strengths of B+ trees:
• Speed on search: exact match queries (worst case), range queries, nearest-neighbor queries
• Speed on insertion + deletion
• Smooth growing and shrinking (no re-org)
Strengths of hashing:
• Speed on exact match queries, on the average
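A rough illustration of why ordered structures handle range queries and hash structures do not. This is only an analogy under stated assumptions: a Python dict stands in for a hash file, and a sorted list searched with `bisect` stands in for the ordered leaf level of a B+ tree; neither is an actual disk-based implementation.

```python
import bisect

keys = [1, 7, 3, 8, 12, 4, 11, 2, 10, 13]

# Hash-style access: exact match is a single lookup on average.
hash_index = {k: f"record-{k}" for k in keys}

# Ordered access (stand-in for B+ tree leaves): keys are kept in sorted order.
sorted_keys = sorted(keys)

def range_query(lo: int, hi: int) -> list:
    """Return all keys in [lo, hi] by locating both ends in the sorted order."""
    left = bisect.bisect_left(sorted_keys, lo)
    right = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[left:right]

print(hash_index[8])      # exact match via hashing
print(range_query(3, 8))  # range query via the ordered structure
```

Answering the same range query from `hash_index` would require checking every key, because hashing deliberately scatters neighbouring key values.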
