Lec 20-24
[Figure: OPTION 1, index diagram of field values and pointers]
• Option 3 is most commonly used
• Record pointers
• Implemented using one level of indirection so that index entries are of fixed length and have unique field values
Figure taken from Elmasri, 4e
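The level of indirection described above can be sketched as follows. This is a minimal illustration, not the textbook's implementation: the records, the names `indirection`, `index`, and `lookup`, and the use of Python lists in place of disk blocks are all assumptions made for the example.

```python
import bisect

# Hypothetical data file: (name, dept) records; dept is a nonkey field.
records = [("Ann", 3), ("Bob", 5), ("Cy", 3), ("Di", 4), ("Ed", 3)]

# Distinct field values, sorted, so the index can be binary searched.
values = sorted({dept for _name, dept in records})

# Level of indirection: one block of record pointers per distinct value.
# A "record pointer" here is simply a position in the records list.
indirection = [
    [rid for rid, (_name, dept) in enumerate(records) if dept == v]
    for v in values
]

# Index entries are fixed-length (value, block number) pairs with unique values.
index = [(v, block_no) for block_no, v in enumerate(values)]

def lookup(value):
    """Return all records whose dept equals value (empty list if none)."""
    i = bisect.bisect_left(values, value)
    if i == len(index) or index[i][0] != value:
        return []
    _value, block_no = index[i]
    return [records[rid] for rid in indirection[block_no]]

print(lookup(3))  # [('Ann', 3), ('Cy', 3), ('Ed', 3)]
```

Each index entry stays the same size no matter how many records share a value, because the variable-length list of pointers lives in the indirection block.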
© Prof. Navneet Goyal, BITS, Pilani
Types of Single-level Indexes

               Ordering Field      Nonordering Field
Key Field      Primary Index       Secondary Index (key)
Nonkey Field   Clustering Index    Secondary Index (nonkey)
* Yes if every distinct value of the ordering field starts from a new block; no otherwise
** For Option 1
*** For Options 2 & 3
Multilevel Indexes
• In all single-level indexes, the index file is always sorted on the search key
• For an index with b_i blocks, a binary search requires approximately log_2(b_i) block accesses
• The idea behind multilevel indexes is to reduce the part of the index file that we continue to search by a factor of bfr_i (the blocking factor of the index)
Multilevel Indexes
• Blocking factor = block size in bytes / record size in bytes
• bfr_i, the blocking factor for the index, is always greater than 2
• Search space is reduced much faster than with binary search, which only halves it at each step
• bfr_i is called the fan-out (fo) of the multilevel index
• Searching a multilevel index requires approximately log_fo(b_i) block accesses, which is a smaller number than for binary search if fo > 2
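A small calculation makes the comparison concrete. The block size, index entry size, and number of index blocks below are hypothetical values chosen only for illustration, and `ceil_log` is a helper introduced here, not something from the slides.

```python
def ceil_log(n, base):
    """Smallest k such that base**k >= n, computed with integers only."""
    k, reach = 0, 1
    while reach < n:
        reach *= base
        k += 1
    return k

# Assumed values for the example.
block_size = 1024      # bytes per block
entry_size = 15        # bytes per index entry
b_i = 1000             # blocks in the single-level index

fo = block_size // entry_size        # blocking factor of the index = fan-out
binary_accesses = ceil_log(b_i, 2)   # about log_2(b_i)
multilevel_accesses = ceil_log(b_i, fo)  # about log_fo(b_i)

print(fo)                   # 68
print(binary_accesses)      # 10
print(multilevel_accesses)  # 2
```

With a fan-out of 68, each level narrows the search to 1/68 of the remaining index blocks, against 1/2 per step for binary search.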
• To find bucket for r, take last 'global depth' # bits of h(r); we denote r by h(r).
• If h(r) = 5 = binary 101, it is in bucket pointed to by 01.
[Figure: extendible hashing example. DIRECTORY with 4 entries (00, 01, 10, 11), global depth 2. Bucket B: 1* 5* 21* 13*; Bucket C: 10*; Bucket D: 15* 7* 19*; each of these buckets has local depth 2.]
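Taking the last 'global depth' bits is a bit mask on the hash value. A minimal sketch, where the function name `directory_slot` is an assumption made for the example:

```python
def directory_slot(h, global_depth):
    """Directory entry for hash value h: its last global_depth bits."""
    return h & ((1 << global_depth) - 1)

# h(r) = 5 = binary 101; with global depth 2 the last two bits are 01.
print(format(directory_slot(5, 2), "02b"))   # 01

# Every entry of Bucket B in the figure ends in 01.
print([format(directory_slot(h, 2), "02b") for h in (1, 5, 21, 13)])
# ['01', '01', '01', '01']
```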
[Figure: the directory before doubling (2-bit entries 00 to 11) and after doubling (3-bit entries 000 to 111). Bucket B (1* 5* 21* 13*), Bucket C (10*), and Bucket D (15* 7* 19*) are unchanged; Bucket A2 (4* 12* 20*) is the 'split image' of Bucket A.]
Points to Note
• 20 = binary 10100. Last 2 bits (00) tell us r belongs in A or A2. Last 3 bits needed to tell which.
• Global depth of directory: Max # of bits needed to tell which bucket an entry belongs to.
• Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.
• When does bucket split cause directory doubling?
• Before insert, local depth of bucket = global depth. Insert causes local depth to become > global depth; directory is doubled by copying it over and 'fixing' pointer to split image page. (Use of least significant bits enables efficient doubling via copying of the directory.)
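The split and doubling rules above can be sketched in a few lines. This is an illustrative in-memory model under stated assumptions, not the slides' implementation: the class names, a bucket capacity of 4, and the identity hash h(r) = r are assumptions, and the values 32 and 16 are filler chosen so that Bucket A is full before 20 arrives (the figure does not show Bucket A's contents).

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []

class ExtendibleHash:
    def __init__(self, capacity=4):
        self.capacity = capacity          # assumed bucket capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]

    def _slot(self, h):
        # Last global_depth bits of the hash value.
        return h & ((1 << self.global_depth) - 1)

    def insert(self, h):
        bucket = self.directory[self._slot(h)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(h)
            return
        # Bucket is full. Double the directory only if local == global depth.
        if bucket.local_depth == self.global_depth:
            # With least significant bits, doubling is a plain copy.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)        # the split image
        new_bit = 1 << (bucket.local_depth - 1)
        # 'Fix' the pointers: slots whose new bit is 1 go to the split image.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = image
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._slot(k)].keys.append(k)
        # Retry; this recurses forever if more than `capacity` entries share
        # one hash value, the problem case for extendible hashing.
        self.insert(h)

table = ExtendibleHash(capacity=4)
for h in (1, 5, 21, 13, 10, 15, 7, 19, 4, 12, 32, 16):  # 32, 16 are assumed
    table.insert(h)
print(table.global_depth)                    # 2
table.insert(20)                             # bucket for 00 is full: doubling
print(table.global_depth)                    # 3
print(sorted(table.directory[0b100].keys))   # [4, 12, 20]
```

Buckets that did not split keep local depth 2 and are each reached from two directory slots after doubling (for example 001 and 101).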
Comments on Extendible Hashing
• If directory fits in memory, equality search answered with one disk access; else two.
• 100MB file, 100 bytes/rec, 4K pages contains 1,000,000 records (as data entries) and 25,000 directory elements; chances are high that directory will fit in memory.
• Directory grows in spurts, and, if the distribution of hash values is skewed, directory can grow large.
• Multiple entries with same hash value cause problems!
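The figures in the sizing example follow from simple division. The sketch below assumes "4K" means 4000 bytes (4096 gives the same 40 records per page), one directory element per data page, and a hypothetical 8 bytes per directory element; none of these assumptions is stated in the slides.

```python
file_bytes = 100 * 10**6     # 100MB file
record_bytes = 100           # bytes per record
page_bytes = 4000            # "4K" page, assumed to be 4000 bytes

num_records = file_bytes // record_bytes        # records in the file
records_per_page = page_bytes // record_bytes   # records that fit on a page
num_pages = num_records // records_per_page     # data pages (buckets)

# Assumption: one directory element per data page, 8 bytes per element.
directory_bytes = num_pages * 8

print(num_records)       # 1000000
print(records_per_page)  # 40
print(num_pages)         # 25000
print(directory_bytes)   # 200000
```

Under these assumptions the directory takes about 200KB, which is small next to the 100MB file.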
Linear Hashing
• Linear hashing is an alternative mechanism which avoids these disadvantages at the possible cost of more bucket overflows
• Motivation: Extendible hashing uses a directory that grows by doubling… Can we do better? (smoother growth)
• LH: split buckets from left to right, regardless of which one overflowed (simple, but it works!!)
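[Figure: linear hashing growth starting from M=3 buckets (0, 1, 2) addressed by mod 3. Buckets 3, 4, and 5 are added one at a time, and buckets that have already split are addressed by mod 6. Once all three original buckets have split, M=6 and every address uses mod 6.]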
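The left-to-right splitting rule can be sketched as below. This is a simplified in-memory model under stated assumptions: the class and attribute names, a bucket capacity of 2, the hash h(r) = r, and the policy of splitting once per overflowing insert are all choices made for the example; overflow pages are modelled by letting a list grow past the capacity.

```python
class LinearHash:
    def __init__(self, m=3, capacity=2):
        self.m = m                    # buckets at the start of this round
        self.capacity = capacity      # assumed entries per bucket
        self.next = 0                 # next bucket to split, left to right
        self.buckets = [[] for _ in range(m)]

    def _address(self, key):
        a = key % self.m
        if a < self.next:             # bucket already split this round
            a = key % (2 * self.m)
        return a

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        bucket.append(key)            # past capacity means an overflow page
        if len(bucket) > self.capacity:
            self._split()             # splits bucket `next`, not this one

    def _split(self):
        old = self.buckets[self.next]
        stay = [k for k in old if k % (2 * self.m) == self.next]
        move = [k for k in old if k % (2 * self.m) != self.next]
        self.buckets[self.next] = stay
        self.buckets.append(move)     # new bucket number is m + next
        self.next += 1
        if self.next == self.m:       # every original bucket has split
            self.m *= 2
            self.next = 0

lh = LinearHash(m=3, capacity=2)
for key in (3, 6, 9, 1, 4, 7, 12, 18):
    lh.insert(key)
print(lh.m, lh.next)   # 6 0
print(lh.buckets)      # [[6, 12, 18], [1, 7], [], [3, 9], [4], []]
```

The last insert (18) overflows bucket 0, yet the bucket that splits is bucket 2, because the split pointer had already advanced to it. Bucket 0 keeps its overflow until the pointer comes round again.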