
Principles of Database Management Systems: 4.2: Hashing Techniques

This document discusses hashing techniques for database management systems. It describes how hashing works by using a hash function to map keys to storage locations. Two common hashing alternatives are presented: using the hash value directly to determine the storage block, or locating records indirectly via index buckets. The document then discusses dynamic hashing techniques like extensible hashing and linear hashing that allow hash tables to grow without full reorganizations.

Uploaded by

gowtham1990

Principles of Database Management Systems

4.2: Hashing Techniques


Pekka Kilpeläinen
(after Stanford CS245 slide originals by
Hector Garcia-Molina, Jeff Ullman and
Jennifer Widom)

DBMS 200 Notes 4.2: Hashi 1


Hashing?
Locating the storage block of a record by the hash value h(k) of its key k
Normally really fast: records (often) located by a single disk access



Hashing

[Figure: key -> h(key) -> one of the buckets (typically 1 disk block each); the selected bucket stores the record with that key]



Two alternatives
(1) Hash value determines the storage block directly
[Figure: key -> h(key) -> block holding the records]
Used to implement a primary index



Two alternatives
(2) Records located indirectly via index buckets
[Figure: key -> h(key) -> index bucket entry for key 1 -> record]
Used for a secondary index



Example hash function

Key = x_1 x_2 ... x_n, an n-byte character string
Have b buckets
h = (x_1 + x_2 + ... + x_n) mod b, so h ∈ {0, 1, ..., b-1}
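
The byte-sum function above can be sketched in a few lines of Python. This is only an illustration: the name sum_hash and the choice of UTF-8 to turn a string into bytes are assumptions, not part of the slides.

```python
def sum_hash(key: str, b: int) -> int:
    """Add up the bytes x_1 ... x_n of the key and reduce the sum modulo b."""
    # Assumption: the key is turned into bytes with UTF-8.
    return sum(key.encode("utf-8")) % b


if __name__ == "__main__":
    # The result is always a bucket number in {0, 1, ..., b-1}.
    for key in ("abc", "cab", "database"):
        print(key, "->", sum_hash(key, 4))
```

Because addition ignores the order of the bytes, keys that are permutations of each other (such as "abc" and "cab") always land in the same bucket.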



This may not be the best function

Good hash function: the expected number of keys/bucket is the same for all buckets

Read Knuth Vol. 3 if you really need to select a good function.
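
One way to see why the byte-sum function may not be the best choice is to count keys per bucket for a skewed key set. The sketch below is illustrative only; sum_hash, the key set (all permutations of "abc") and b = 4 are assumptions chosen to make the effect visible.

```python
from collections import Counter
from itertools import permutations


def sum_hash(key: str, b: int) -> int:
    """Byte-sum hash: (x_1 + ... + x_n) mod b."""
    return sum(key.encode("utf-8")) % b


# Six different keys made of the same three bytes.
keys = ["".join(p) for p in permutations("abc")]

# Number of keys that fall into each of the b = 4 buckets.
load = Counter(sum_hash(k, 4) for k in keys)
print(load)
```

All six keys share one bucket, so for this key set the number of keys per bucket is far from equal.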



Next: example to illustrate inserts, overflows, deletes
[Figure: records are placed into buckets by h(K)]



EXAMPLE: 2 records/bucket

INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0, h(e) = 1

Resulting buckets:
0: d
1: a, c -> overflow block: e
2: b
3: (empty)
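
The insert example can be replayed with a minimal sketch of a static hash table whose buckets hold 2 records and chain overflow blocks. The class name Bucket and the lookup table H (which simply hard-codes the hash values given in the example) are assumptions made for illustration.

```python
BUCKET_CAPACITY = 2  # records per block, as in the example


class Bucket:
    """One block plus an optional chain of overflow blocks."""

    def __init__(self):
        self.records = []
        self.overflow = None  # next block in the overflow chain

    def insert(self, record):
        if len(self.records) < BUCKET_CAPACITY:
            self.records.append(record)
        else:
            # Block is full: push the record down the overflow chain.
            if self.overflow is None:
                self.overflow = Bucket()
            self.overflow.insert(record)

    def chain(self):
        """Return the contents of the primary block and its overflow blocks."""
        blocks, block = [], self
        while block is not None:
            blocks.append(list(block.records))
            block = block.overflow
        return blocks


# Hash values copied from the example; a real table would compute h(key).
H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}

table = [Bucket() for _ in range(4)]
for key in "abcde":
    table[H[key]].insert(key)

layout = [bucket.chain() for bucket in table]
print(layout)
```

Record e is the third record hashed to bucket 1, so it ends up in an overflow block.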



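EXAMPLE: deletion

Delete: e, f, c

Buckets before the deletions:
0: a
1: b, c -> overflow block: d
2: e
3: f, g

After deleting c, d can move from the overflow block into bucket 1
After deleting f, maybe move g up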



Rule of thumb:
Try to keep space utilization between 50% and 80%
Utilization = (# keys used) / (total # keys that fit)
If < 50%, wasting space
If > 80%, overflows significant (depends on how good the hash function is & on # keys/bucket)
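
As a small worked example of the rule of thumb, the sketch below computes the utilization of a table and classifies it against the 50% and 80% limits. The function names and the sample numbers are assumptions for illustration.

```python
def utilization(keys_used: int, buckets: int, keys_per_bucket: int) -> float:
    """Utilization = (# keys used) / (total # keys that fit)."""
    return keys_used / (buckets * keys_per_bucket)


def advice(u: float) -> str:
    """Apply the 50%..80% rule of thumb to a utilization value."""
    if u < 0.5:
        return "wasting space"
    if u > 0.8:
        return "overflows significant"
    return "ok"


if __name__ == "__main__":
    # 5 keys in 4 buckets of 2 keys each: 5 / 8 = 0.625
    u = utilization(5, 4, 2)
    print(u, advice(u))
```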



How do we cope with growth?
Overflows and reorganizations
Dynamic hashing: # of buckets may vary
- Extensible
- Linear
- also others ...



Extensible hashing: two ideas

(a) Use i of b bits output by hash function (for example, b = 32)
[Figure: h(K) outputs b bits, e.g. 00110101; i of them are used, and i grows over time]



(b) Use directory

[Figure: h(K)[i] selects a directory entry, which points to a bucket]

Directory contains 2^i pointers to buckets, and stores i.
Each bucket stores j, indicating # of bits used for placing the records in this block (j ≤ i)



Extensible Hashing: Insertion
If there's room in bucket h(k)[i], place record there; otherwise
- If j = i, set i = i+1 and double the directory
- If j < i, split the block in two, distribute records among them now using j+1 bits of h(k) (repeat until some records end up in the new bucket); update pointers of bucket array
See the next example
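
The insertion steps above can be sketched as follows. This is a minimal in-memory illustration, not a definitive implementation: the class and method names are assumptions, keys are given directly as b-bit hash values, and the leading bits of h(k) are used for addressing, which matches the 4-bit example on the following slides.

```python
class Bucket:
    def __init__(self, j):
        self.j = j      # number of hash bits used for placing records here
        self.keys = []


class ExtensibleHash:
    def __init__(self, b=4, capacity=2):
        self.b = b                # total number of bits output by h
        self.capacity = capacity  # keys per block
        self.i = 1                # bits currently used by the directory
        self.directory = [Bucket(1), Bucket(1)]  # 2**i pointers

    def _prefix(self, h, bits):
        """Return the leading `bits` bits of the b-bit value h."""
        return h >> (self.b - bits)

    def insert(self, h):
        while True:
            bucket = self.directory[self._prefix(h, self.i)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(h)
                return
            if bucket.j == self.b:
                # Assumption: overflow blocks are outside this sketch.
                raise OverflowError("all b bits already in use")
            if bucket.j == self.i:
                # Double the directory: new entries 2k and 2k+1 copy old entry k.
                self.directory = [p for p in self.directory for _ in (0, 1)]
                self.i += 1
            self._split(bucket)  # now j < i; loop repeats if still no room

    def _split(self, old):
        j = old.j + 1
        zero, one = Bucket(j), Bucket(j)
        for key in old.keys:
            # Distribute on the j-th leading bit of the hash value.
            (one if self._prefix(key, j) & 1 else zero).keys.append(key)
        for d, pointer in enumerate(self.directory):
            if pointer is old:
                # j-th leading bit of the i-bit directory index d.
                bit = (d >> (self.i - j)) & 1
                self.directory[d] = one if bit else zero


table = ExtensibleHash(b=4, capacity=2)
for h in (0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001):
    table.insert(h)
print(table.i, [(bkt.j, bkt.keys) for bkt in table.directory])
```

The loop in insert covers the "repeat" remark: if every record of a split block lands on the same side and that side is still full, the block is split again with one more bit.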
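Example: h(k) is 4 bits; 2 keys/block

Before (i = 1):
0 -> bucket (j = 1): 0001
1 -> bucket (j = 1): 1001, 1100

Insert 1010: the bucket of entry 1 is full and j = i, so the directory doubles (i = 2) and the bucket is split using 2 bits

New directory (i = 2):
00, 01 -> bucket (j = 1): 0001
10 -> bucket (j = 2): 1001, 1010
11 -> bucket (j = 2): 1100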
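Example continued (i = 2)

Insert: 0111, 0000

0111 fits into the bucket shared by 00 and 01 (j = 1), which then holds 0001, 0111
0000 overflows that bucket; since j < i, it is split using 2 bits:
00 -> bucket (j = 2): 0000, 0001
01 -> bucket (j = 2): 0111
10 -> bucket (j = 2): 1001, 1010
11 -> bucket (j = 2): 1100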



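Example continued

Insert: 1001

The bucket of 10 (1001, 1010) is full and j = i = 2, so the directory doubles (i = 3) and the bucket is split using 3 bits:
000, 001 -> bucket (j = 2): 0000, 0001
010, 011 -> bucket (j = 2): 0111
100 -> bucket (j = 3): 1001, 1001
101 -> bucket (j = 3): 1010
110, 111 -> bucket (j = 2): 1100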



Extensible hashing: deletion

Reverse insert procedure

Example:
Walk thru insert example in reverse!



Summary Extensible hashing
+ Can handle growing files without full reorganizations
+ Only one data block examined
- Indirection (not bad if directory in memory)
- Directory doubles in size (first it fits in memory, then it does not -> sudden performance degradation)



Linear hashing: grow # of buckets by one

Two ideas:
(a) Use i low order bits of hash
[Figure: a b-bit hash value, e.g. 01110101; the i low order bits are used, and i grows]
(b) File grows linearly

No bucket directory needed



Linear Hashing: Parameters
n: number of buckets in use
   buckets numbered 0 ... n-1
i: number of bits of h(k) used to address buckets
   i = ⌈log2(n)⌉
r: number of records in hash table
   ratio r/n limited to fit an avg bucket in a block
   next example: r ≤ 1.7n, and block holds 2 records
   => AVG bucket occupancy is 1.7/2 = 0.85 of a block



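Example: 2 keys/block, b = 4 bits, n = 2, i = 1
bucket 0: 0000, 1010
bucket 1: 1111

insert 0101: it goes to bucket 1, which then holds 0101, 1111
now r = 4 > 1.7n -> get new bucket 10 and distribute keys btw buckets 00 and 10 (buckets 0 and 1 are now addressed as 00 and 01)

Rule: If h(k)[i] = (a_1 ... a_i)_2 < n, then look at bucket h(k)[i]; else look at bucket h(k)[i] - 2^(i-1) = (0 a_2 ... a_i)_2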
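
The lookup rule can be written as a short function. This is a sketch under the slide's definitions (i low order bits, n buckets in use); the function name bucket_address is an assumption.

```python
def bucket_address(h: int, i: int, n: int) -> int:
    """Return the bucket to look at for hash value h, given i bits and n buckets."""
    m = h & ((1 << i) - 1)     # h(k)[i]: the i low order bits of h
    if m < n:
        return m               # bucket m is in use
    return m - (1 << (i - 1))  # bucket m not in use yet: clear the leading bit a_1


if __name__ == "__main__":
    # n = 3, i = 2: bucket 11 does not exist yet, so 0111 is redirected to 01.
    print(bucket_address(0b0111, 2, 3))
```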



n = 3, i = 2; distribute keys btw buckets 00 and 10:
bucket 00: 0000 (1010 moves to bucket 10)
bucket 01: 0101, 1111
bucket 10: 1010



n = 3, i = 2; insert 0001:
can have overflow chains!
bucket 00: 0000
bucket 01: 0101, 1111 -> overflow block: 0001
bucket 10: 1010



n = 3, i = 2; insert 0111:
bucket 11 not in use -> redirect to 01
bucket 00: 0000
bucket 01: 0101, 1111 -> overflow block: 0001, 0111
bucket 10: 1010
now r = 6 > 1.7n -> get new bucket 11



n = 4, i = 2; distribute keys btw 01 and 11:
bucket 00: 0000
bucket 01: 0101, 0001
bucket 10: 1010
bucket 11: 1111, 0111 (moved from bucket 01)
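
The whole linear hashing example can be replayed with a small simulation. It is a sketch under stated assumptions: keys are given directly as 4-bit hash values, each bucket is modelled as one Python list standing for the primary block plus its overflow chain, and the class and method names are made up for illustration.

```python
import math


class LinearHash:
    def __init__(self, capacity=2, max_fill=1.7):
        self.capacity = capacity  # records per block
        self.max_fill = max_fill  # upper limit for the ratio r/n
        self.n = 2                # number of buckets in use
        self.i = 1                # number of bits of h(k) in use
        self.r = 0                # number of records stored
        self.buckets = [[], []]   # list = primary block + overflow chain

    def address(self, h):
        m = h & ((1 << self.i) - 1)  # i low order bits
        return m if m < self.n else m - (1 << (self.i - 1))

    def insert(self, h):
        self.buckets[self.address(h)].append(h)
        self.r += 1
        if self.r > self.max_fill * self.n:
            self._add_bucket()

    def _add_bucket(self):
        new = self.n                     # number of the new bucket
        self.n += 1
        if self.n > (1 << self.i):       # keep i = ceil(log2(n))
            self.i += 1
        old = new - (1 << (self.i - 1))  # bucket whose keys may move
        keys, self.buckets[old] = self.buckets[old], []
        self.buckets.append([])
        for key in keys:                 # distribute btw old and new bucket
            self.buckets[self.address(key)].append(key)

    def overflow_blocks(self, bucket):
        """Number of overflow blocks chained to the given bucket."""
        blocks = math.ceil(len(self.buckets[bucket]) / self.capacity)
        return max(0, blocks - 1)


lh = LinearHash()
for h in (0b0000, 0b1010, 0b1111, 0b0101, 0b0001, 0b0111):
    lh.insert(h)
print(lh.n, lh.i, lh.buckets)
```

Only one bucket is split per growth step, so keys in the other buckets never move; the price is that a bucket such as 01 may carry an overflow chain until its turn comes.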



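Example Continued: How to grow beyond this?

i = 2 -> 3
Bucket labels 00, 01, 10, 11 become 000, 001, 010, 011; new buckets 100, 101, 110, 111 are then added one at a time
m = 11 (max used block) -> 100 -> 101 ...
[Figure: buckets hold 0000 | 0101 | 1010 | 1111; when bucket 101 is added, the keys 0101 move there from bucket 001]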



Summary Linear Hashing
+ Can handle growing files without full reorganizations
+ No indirection directory of extensible hashing
- Can have overflow chains, but probability of long chains can be kept low by controlling the r/n fill ratio (?)



Summary

Hashing
- How it works
- Dynamic hashing
- Extensible
- Linear



Next:

Indexing vs Hashing
Index definition in SQL



Indexing vs Hashing

Hashing good for probes given key
e.g., SELECT ...
      FROM R
      WHERE R.A = 5



Indexing vs Hashing

INDEXING (Including B-Trees) good for Range Searches:
e.g., SELECT ...
      FROM R
      WHERE R.A > 5
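
The difference between the two access patterns can be illustrated in memory. In this sketch a Python dict stands in for a hash structure and a sorted list searched with bisect stands in for an ordered index such as a B-tree; both stand-ins and the sample rows are assumptions, not how a DBMS stores data on disk.

```python
import bisect

# Made-up rows (A, payload) of a relation R.
rows = [(7, "g"), (2, "b"), (5, "e"), (9, "i"), (5, "x")]

# Hash-style access: the key leads straight to its bucket.
hash_index = {}
for a, payload in rows:
    hash_index.setdefault(a, []).append(payload)

# Ordered access: keys kept in sorted order.
ordered = sorted(rows)
keys = [a for a, _ in ordered]


def equal_to(value):
    """Probe for R.A = value with a single bucket lookup."""
    return hash_index.get(value, [])


def greater_than(value):
    """Range search for R.A > value: find the boundary, then scan in order."""
    return ordered[bisect.bisect_right(keys, value):]


print(equal_to(5))
print(greater_than(5))
```

The dict answers the equality probe directly but offers no order, so a range search on it would have to visit every bucket.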



Index definition in SQL

Create index name on rel (attr)

Create unique index name on rel (attr)
   defines candidate key

Drop INDEX name
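
The three statements can be tried out with Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in DBMS; the table rel, its columns and the index name rel_attr_idx are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rel (attr INTEGER, other TEXT)")

# Plain index, then drop it again.
conn.execute("CREATE INDEX rel_attr_idx ON rel (attr)")
conn.execute("DROP INDEX rel_attr_idx")

# Unique index: attr now behaves as a candidate key.
conn.execute("CREATE UNIQUE INDEX rel_attr_idx ON rel (attr)")
conn.execute("INSERT INTO rel VALUES (5, 'first')")
try:
    conn.execute("INSERT INTO rel VALUES (5, 'second')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The unique index refuses a second row with attr = 5.
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM rel").fetchone()[0]
print(duplicate_rejected, row_count)
```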



Note: CANNOT SPECIFY TYPE OF INDEX (e.g. B-tree, Hashing, ...)
OR PARAMETERS (e.g. Load Factor, Size of Hash, ...)
... at least in SQL
Oracle and IBM DB2 UDB provide a PCTFREE clause to indicate the proportion of B-tree blocks initially left unfilled
Oracle: Hash clusters with built-in or DBA-specified hash function



The BIG picture.
Chapters 2 & 3: Storage, records, blocks...
Chapter 4: Access Mechanisms
- Indexes
- B trees
- Hashing
NEXT: Chapters 6 & 7: Query Processing
