IT3020 L06 Indexing
Indexes
Files of Records
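A file must support fetching a particular record, specified by its record id (rid).
A file must support scanning all records (possibly with some conditions on the records to be retrieved).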
Three types of file organization:
Heap File Organization
Sequential File Organization
Hashing File Organization
Alternative File Organizations
Many alternatives exist, each ideal for some situations and not so good in others:
Heap files: suitable when the typical access is a file scan retrieving all records.
Search (equality or range) needs to scan the whole file.
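The heap-file behaviour above can be sketched in a few lines. This is a minimal illustration, assuming a rid of the form (page id, slot number), a hypothetical page capacity of two records, and assumed field names `name`, `age`, `sal`; it shows that a record can be fetched directly by rid, but any equality or range search has to read every page.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Rid = Tuple[int, int]  # hypothetical rid format: (page id, slot number)


@dataclass
class Record:
    # Field names are assumed for illustration only.
    name: str
    age: int
    sal: int


class HeapFile:
    """Unordered file of records stored in fixed-capacity pages."""

    def __init__(self, records_per_page: int = 2) -> None:
        self.records_per_page = records_per_page
        self.pages: List[List[Record]] = [[]]

    def insert(self, rec: Record) -> Rid:
        # Append to the last page; start a new page when it is full.
        if len(self.pages[-1]) == self.records_per_page:
            self.pages.append([])
        self.pages[-1].append(rec)
        return (len(self.pages) - 1, len(self.pages[-1]) - 1)

    def fetch(self, rid: Rid) -> Record:
        # Direct access to one record by its rid.
        page_id, slot = rid
        return self.pages[page_id][slot]

    def scan(
        self, pred: Callable[[Record], bool] = lambda r: True
    ) -> Tuple[List[Record], int]:
        # Records are unordered, so every search reads every page.
        matches: List[Record] = []
        pages_read = 0
        for page in self.pages:
            pages_read += 1
            matches.extend(r for r in page if pred(r))
        return matches, pages_read
```

For example, after inserting four records into two pages, `scan(lambda r: r.age == 44)` still reads both pages even though it matches only some records.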
Indexes
Three alternatives for what a data entry stores for search key value k:
1. The data record itself, with key value k (Alt. 1)
2. <k, rid of the data record with search key value k> (Alt. 2)
3. <k, list of rids of data records with search key k> (Alt. 3)
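The three alternatives can be written as data types. This is a hedged sketch: the class names `Alt1`/`Alt2`/`Alt3`, the helper functions and the (page id, slot) rid format are hypothetical. It shows the practical difference between Alt. 2 (one entry per data record) and Alt. 3 (one entry per distinct key value).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Rid = Tuple[int, int]  # hypothetical rid format: (page id, slot number)


@dataclass
class Alt1:
    key: int
    record: Tuple  # the whole data record lives in the index


@dataclass
class Alt2:
    key: int
    rid: Rid  # one entry per data record


@dataclass
class Alt3:
    key: int
    rids: List[Rid]  # one entry per distinct key value


def build_alt2(records: List[Tuple[Rid, int]]) -> List[Alt2]:
    # records are (rid, search key) pairs; entries are kept in key order.
    return sorted((Alt2(k, rid) for rid, k in records), key=lambda e: e.key)


def build_alt3(records: List[Tuple[Rid, int]]) -> List[Alt3]:
    # Group all rids that share a key into a single entry.
    groups: Dict[int, List[Rid]] = defaultdict(list)
    for rid, k in records:
        groups[k].append(rid)
    return [Alt3(k, rids) for k, rids in sorted(groups.items())]
```

With duplicate keys, Alt. 3 produces fewer (but variable-length) entries than Alt. 2.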
Terminology
A file of records containing index entries is called an index file.
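(Figure: clustered vs. unclustered index; index entries support direct search for data entries, which point to data records such as <Tracy, 44, 5004>)
(Figure: index file whose index entries k1 k2 … kN support direct search, above the data entries, which form the "sequence set")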
B+ Trees in Practice
Typical order: 100. Typical fill-factor: 67%.
Average fanout ≈ 2 × 100 × 67% ≈ 133
Typical capacities:
Height 4: $133^4 = 312{,}900{,}721$ records
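The capacity arithmetic above can be checked with a short sketch. It assumes a node of order d holds up to 2d entries and treats the 67% fill-factor as 2/3, which is what yields the slide's fanout of 133; `avg_fanout`, `capacity` and `min_height` are hypothetical helper names.

```python
def avg_fanout(order: int, fill_factor: float) -> int:
    # A node of order d holds up to 2d entries; scale by the average fill.
    return int(2 * order * fill_factor)


def capacity(fanout: int, height: int) -> int:
    # Each level multiplies the number of reachable entries by the fanout.
    return fanout ** height


def min_height(fanout: int, n_records: int) -> int:
    # Smallest height whose capacity covers n_records.
    h = 0
    while capacity(fanout, h) < n_records:
        h += 1
    return h
```

For example, `avg_fanout(100, 2/3)` gives 133, `capacity(133, 3)` gives 2,352,637 and `capacity(133, 4)` gives 312,900,721; one more record than that needs height 5.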
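(Figure: B+ tree with leaf level 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*)
Inserting 8* overfills the leaf 2* 3* 5* 7* 8*, which must split. Minimum occupancy is guaranteed in both leaf and index page splits.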
(Figure: B+ tree after inserting 8*: root 17; index nodes 5 13 and 24 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*)
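The two splits in the insertion of 8* can be sketched as follows. This assumes order d = 2 (four entries per node, as in the figure) and works on plain sorted key lists; `split_leaf` and `split_index` are hypothetical names. In a leaf split the first key of the right leaf is copied up and stays in the leaf; in an index split the middle key is pushed up and leaves the node.

```python
from typing import List, Tuple


def split_leaf(entries: List[int], new_key: int, d: int = 2) -> Tuple[List[int], List[int], int]:
    """Insert new_key into a full leaf (2d entries) and split it.
    Returns (left, right, key copied up to the parent)."""
    merged = sorted(entries + [new_key])  # 2d + 1 entries
    left, right = merged[:d], merged[d:]
    return left, right, right[0]  # copied up: it also stays in the right leaf


def split_index(keys: List[int], d: int = 2) -> Tuple[List[int], int, List[int]]:
    """Split an overfull index node (2d + 1 keys).
    Returns (left, key pushed up to the parent, right)."""
    return keys[:d], keys[d], keys[d + 1:]
```

`split_leaf([2, 3, 5, 7], 8)` yields leaves `[2, 3]` and `[5, 7, 8]` with 5 copied up; the parent then holds 5 13 17 24 30, and `split_index` on it yields `[5, 13]`, 17 pushed up to a new root, and `[24, 30]`.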
(Figure: B+ tree after deleting 19* and 20*: root 17; index nodes 5 13 and 27 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*)
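Observe the `toss' of an index entry (on the right): after deleting 24*, leaf 22* merges with its sibling 27* 29*, so the right subtree's leaves become 22* 27* 29* | 33* 34* 38* 39*.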
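Leaf underflow handling in the deletion example can be sketched with a single function. This assumes order d = 2, a right sibling only, and borrowing just one entry (real systems may redistribute more evenly); `fix_underflow` is a hypothetical name. Redistribution returns a new separator for the parent; a merge returns no separator, meaning the parent's entry is tossed.

```python
from typing import List, Optional, Tuple


def fix_underflow(
    leaf: List[int], sibling: List[int], d: int = 2
) -> Tuple[List[int], Optional[List[int]], Optional[int]]:
    """leaf has fewer than d entries; sibling is its right neighbour.
    Redistribute if the sibling can spare an entry, otherwise merge."""
    if len(sibling) > d:
        # Borrow the sibling's smallest entry; the new separator is the
        # sibling's new first key.
        leaf = leaf + [sibling[0]]
        sibling = sibling[1:]
        return leaf, sibling, sibling[0]
    # Merge: the separator in the parent is tossed.
    return leaf + sibling, None, None
```

Deleting 19* and 20* leaves `[22]`; `fix_underflow([22], [24, 27, 29])` gives `[22, 24]`, `[27, 29]` and separator 27. Deleting 24* next leaves `[22]` again, and since `[27, 29]` cannot lend, the leaves merge into `[22, 27, 29]` and 27 is tossed.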
Alternatives…
(Figure: index with overflow leaf pages)
(Figure: static hashing with primary bucket pages 0 … N-1, each followed by a chain of overflow pages)
Static Hashing… (contd.)
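The static hashing layout above (N primary bucket pages plus overflow chains) can be sketched as follows. The hash function h(key) = key mod N and the page capacity of two entries are assumptions for illustration; the point is that N never changes, so a bucket that receives many keys grows a long overflow chain that every search must read.

```python
from typing import List

PAGE_CAPACITY = 2  # hypothetical number of entries per page


class StaticHash:
    def __init__(self, n_buckets: int) -> None:
        self.n = n_buckets
        # Each bucket: primary page first, then its overflow pages.
        self.buckets: List[List[List[int]]] = [[[]] for _ in range(n_buckets)]

    def _bucket_of(self, key: int) -> int:
        # Hypothetical hash function: h(key) = key mod N, giving 0 .. N-1.
        return key % self.n

    def insert(self, key: int) -> None:
        pages = self.buckets[self._bucket_of(key)]
        if len(pages[-1]) == PAGE_CAPACITY:
            pages.append([])  # chain a new overflow page
        pages[-1].append(key)

    def search(self, key: int) -> bool:
        # Equality search reads the primary page and every overflow page.
        return any(key in page for page in self.buckets[self._bucket_of(key)])

    def overflow_pages(self, bucket: int) -> int:
        return len(self.buckets[bucket]) - 1
```

With N = 4, inserting 0, 4, 8, 12 sends all four keys to bucket 0, which then needs one overflow page.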
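Example (extendible hashing)
Directory is an array of size 4 (global depth 2).
To find the bucket for r, take the last `global depth' bits of h(r); we denote r by h(r).
If h(r) = 5 = binary 101, it is in the bucket pointed to by 01.
(Figure: directory with global depth 2 pointing to data pages: 00 → Bucket A (contents not shown); 01 → Bucket B: 1* 5* 21* 13*; 10 → Bucket C: 10*; 11 → Bucket D: 15* 7* 19*; Buckets B, C and D have local depth 2)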
(Figure: Bucket A splits; its `split image', Bucket A2, holds 4* 12* 20*. Left: directory with global depth 2 (entries 00 to 11); Bucket B 1* 5* 21* 13*, Bucket C 10* and Bucket D 15* 7* 19* keep local depth 2. Right: the directory doubled to global depth 3 (entries 000 to 111); Buckets B, C and D still have local depth 2, and Bucket A2 (local depth 3) holds 4* 12* 20*.)
Points to Note