Linear Hashing
E0 261
Jayant Haritsa
Computer Science and Automation
Indian Institute of Science
[Figure: static hashing. A key is mapped by h(key) mod N to one of the primary bucket pages 0, 1, ..., N-1; overflow pages are attached to primary bucket pages.]
JAN 2021 LINEAR-HASHING Slide 4
Extendible Hashing
• Situation: Bucket (primary page) becomes full.
Why not re-organize file by doubling # of buckets?
– Reading and writing all pages is expensive!
– Idea: Use directory of pointers to buckets, double # of
buckets by doubling the directory, splitting just the
bucket that overflowed.
– Directory much smaller than file, so doubling it is much
cheaper. Only one page of data entries is split. No
overflow page.
– Trick lies in how hash function is adjusted.
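The idea above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the slide author's implementation: the names `Bucket`, `ExtendibleHash`, `_slot` and `_split` are hypothetical, the hash function defaults to the identity, each bucket holds 4 entries, the directory starts at global depth 1, and directory entries are selected by the low-order bits of the hash value.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=4, hash_fn=lambda k: k):
        self.capacity = capacity          # entries per bucket page (assumed)
        self.hash_fn = hash_fn            # identity by default (assumed)
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, key):
        # Directory index = last `global_depth` bits of the hash value.
        return self.hash_fn(key) & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._slot(key)].keys

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # Double the directory: only pointers are copied, no data pages.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        self._split(bucket)               # split just the bucket that overflowed
        self.insert(key)                  # retry; may trigger a further split

    def _split(self, bucket):
        bit = 1 << bucket.local_depth     # the new bit that separates the two halves
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        old_keys, bucket.keys = bucket.keys, []
        for i, b in enumerate(self.directory):
            if b is bucket and (i & bit):
                self.directory[i] = image  # redirect half of the pointers
        for k in old_keys:
            self.directory[self._slot(k)].keys.append(k)
```

Note that the retry in `insert` never terminates if more than `capacity` keys share the same hash value, since no number of extra bits can separate them.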
• Directory is array of size 4.
• To find bucket for record r with key k, take last 'global depth' # bits of h(k)
– If h(k) = 5 = binary 101, it is in bucket pointed to by 01.
[Figure: directory with global depth 2 and entries 00, 01, 10, 11. Bucket B (local depth 2): 1* 5* 21* 13*. Bucket C (local depth 2): 10*. Bucket D (local depth 2): 15* 7* 19*.]
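As a quick check of the lookup rule, the snippet below computes the directory entry for the data entries shown on the slide. It assumes the starred numbers are the hash values themselves (as in the h(k) = 5 example); `directory_entry` and `slide_entries` are illustrative names.

```python
def directory_entry(hash_value, global_depth):
    """Return the directory entry as a bit string of length global_depth."""
    index = hash_value & ((1 << global_depth) - 1)   # keep the last global_depth bits
    return format(index, "0{}b".format(global_depth))


# Data entries as they appear in the slide's buckets.
slide_entries = {"B": [1, 5, 21, 13], "C": [10], "D": [15, 7, 19]}
for name, values in slide_entries.items():
    print(name, sorted({directory_entry(v, 2) for v in values}))
```

Every entry of a bucket maps to the same directory entry, which is what allows a single pointer per entry to locate the right page.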
[Figure: directory doubling. Before: global depth 2, directory entries 00 to 11. After: global depth 3, directory entries 000 to 111. Bucket B (1* 5* 21* 13*), Bucket C (10*) and Bucket D (15* 7* 19*) keep local depth 2; Bucket A2 (4* 12* 20*, local depth 3) is the 'split image' of Bucket A1.]
Comments on Extendible Hashing
• If directory fits in memory, equality search
answered with one disk access; else two.
• Directory grows in spurts, and, if the distribution of
hash values is skewed, directory can grow large.
• Multiple entries with same hash value cause
problems
– Unfixable issue for duplicates
– Results in EH directory exploding!
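A back-of-the-envelope sketch of why skew and duplicates hurt: the function below (a hypothetical helper, `required_global_depth`) finds the smallest global depth at which no group of hash values sharing the same low-order bits exceeds the bucket capacity. It assumes low-order-bit addressing and hash values of at most `max_bits` bits.

```python
from collections import Counter


def required_global_depth(hash_values, capacity, max_bits=32):
    """Smallest global depth at which every bucket fits, or None if none exists."""
    if not hash_values:
        return 0
    for depth in range(max_bits + 1):
        mask = (1 << depth) - 1
        group_sizes = Counter(h & mask for h in hash_values)
        if max(group_sizes.values()) <= capacity:
            return depth
    return None   # more than `capacity` identical hash values: splitting never helps


print(required_global_depth(list(range(8)), capacity=4))        # evenly spread values
print(required_global_depth([0, 16, 32, 48, 64], capacity=4))   # skewed low-order bits
print(required_global_depth([7] * 5, capacity=4))               # duplicates
```

With the skewed input, five entries force a directory of 2^5 = 32 pointers; with five identical hash values, no directory size is enough.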
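[Figure: linear hashing address space, marked 0, Next, N_R, M, 2N_R. Buckets 0 to Next-1 are addressed by h_level, buckets Next to N_R-1 by h_level-1, and buckets N_R to M-1 by h_level.]
• Address computation: m = h_level(k)
if m >= M, m = m - N_R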
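The address rule can be written out as a short function. This is a sketch under assumptions read off the diagram: N_R = 2^(level-1) · N is the number of buckets at the start of the round, M = N_R + Next is the number of buckets that currently exist, and h_level(k) is taken to be the already-hashed key modulo 2N_R. The name `lh_bucket` and its parameters are illustrative.

```python
def lh_bucket(hashed_key, level, next_ptr, n_initial):
    """Bucket number for a key under the slide's rule (level >= 1 assumed)."""
    n_round = (2 ** (level - 1)) * n_initial   # N_R: buckets at start of this round
    m_total = n_round + next_ptr               # M: buckets that exist right now
    m = hashed_key % (2 * n_round)             # m = h_level(k)
    if m >= m_total:                           # that bucket has not been created yet,
        m -= n_round                           # so fall back to the unsplit bucket
    return m


# N = 4 initial buckets, level 1, bucket 0 already split (Next = 1, so M = 5).
for key in (8, 4, 12, 5, 7):
    print(key, lh_bucket(key, level=1, next_ptr=1, n_initial=4))
```

No directory is needed: level and Next alone determine the bucket.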
• Hash Function:
– h_i(k) = [(A k) mod w] mod (2^i N)
where A = 6125423371, w = 2^32
(A is relatively prime to w)
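A direct transcription of this hash family, using the constants from the slide. The function name `h_i` and the argument order are illustrative; integer keys are assumed.

```python
from math import gcd

A = 6125423371      # multiplier from the slide
W = 2 ** 32         # w from the slide


def h_i(i, key, n):
    """h_i(k) = [(A k) mod w] mod (2^i N) for an integer key."""
    return ((A * key) % W) % ((2 ** i) * n)


print(gcd(A, W))    # 1 when A is relatively prime to w

# Moving from h_i to h_(i+1) either keeps the bucket or adds 2^i N to it,
# which is exactly the "stay or move to the split image" choice in a split.
n = 4
for key in range(1, 6):
    print(key, h_i(0, key, n), h_i(1, key, n))
```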
• In LH, a split has only local impact, whereas in the
B-tree, a split is "global"
• In LH, duplicates don't cause a problem because
of the presence of overflow buckets
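The duplicate-handling point can be illustrated with a toy linear hash file. This is a sketch under assumptions, not the scheme's definitive form: the class `LinearHash` is hypothetical, the key itself serves as the hash value, a Python list longer than `capacity` stands in for a primary page plus its overflow pages, and a split of bucket Next is triggered whenever an insert causes an overflow.

```python
class LinearHash:
    def __init__(self, n_initial=4, capacity=4):
        self.n_round = n_initial      # N_R: buckets at the start of the round
        self.next = 0                 # Next: bucket to be split next
        self.capacity = capacity      # entries per primary page (assumed)
        self.buckets = [[] for _ in range(n_initial)]

    def _address(self, key):
        m = key % (2 * self.n_round)
        if m >= len(self.buckets):    # len(self.buckets) plays the role of M
            m -= self.n_round
        return m

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        bucket.append(key)            # entries beyond capacity sit on overflow pages
        if len(bucket) > self.capacity:
            self._split()             # split bucket Next, not necessarily this one

    def _split(self):
        image_index = len(self.buckets)          # equals Next + N_R
        old = self.buckets[self.next]
        self.buckets.append([k for k in old if k % (2 * self.n_round) == image_index])
        self.buckets[self.next] = [k for k in old if k % (2 * self.n_round) != image_index]
        self.next += 1
        if self.next == self.n_round:            # round complete
            self.n_round *= 2
            self.next = 0

    def search(self, key):
        return self.buckets[self._address(key)].count(key)


lh = LinearHash()
for _ in range(10):
    lh.insert(5)                      # ten duplicates of the same key
print(lh.search(5), len(lh.buckets))
```

The duplicates simply accumulate in one bucket's overflow chain while the file grows by one bucket per split, so growth stays linear in the number of inserts.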
E0 361