File Organization and Indexing: Prof P Sreenivasa Kumar, Department of CS&E, IITM
Data transfer:
Move the r/w head to the appropriate track
• time needed - seek time – ~ 12 to 14 ms
Wait for the appropriate block to come under r/w head
• time needed - rotational delay - ~3 to 4ms (avg)
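As a rough illustration of the two delays above, the sketch below adds them to estimate the time needed to position the head on a block. The block transfer time is not given in the text and is ignored here; the function name is only illustrative.

```python
def positioning_time_ms(seek_ms, rotational_delay_ms):
    """Time to reach a block: seek time plus rotational delay (transfer time ignored)."""
    return seek_ms + rotational_delay_ms

# Ranges quoted above: seek ~12 to 14 ms, rotational delay ~3 to 4 ms (avg).
best = positioning_time_ms(12, 3)
worst = positioning_time_ms(14, 4)
print(f"positioning time: about {best} to {worst} ms per random block access")
```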
Data Records and Files
Fixed length record type: each field is of fixed length
• in a file of records of this type, the record number can be
used to locate a specific record
• the number of records and the length of each field are available
in the file header
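A minimal sketch of locating a fixed-length record by its record number. The header layout (record count, record length) and the three-field record format are assumptions made for illustration; the text only says that this information is available in the file header.

```python
import io
import struct

# Hypothetical layouts, chosen only for this example.
HEADER = struct.Struct("<ii")      # (number of records, record length in bytes)
RECORD = struct.Struct("<i20sd")   # 4-byte id, 20-byte name, 8-byte salary

def write_file(records):
    """Write the header followed by the fixed-length records into an in-memory file."""
    f = io.BytesIO()
    f.write(HEADER.pack(len(records), RECORD.size))
    for rid, name, salary in records:
        f.write(RECORD.pack(rid, name.encode("ascii"), salary))
    return f

def read_record(f, n):
    """Fetch record number n (0-based) with one seek: offset = header size + n * record length."""
    f.seek(0)
    count, rec_len = HEADER.unpack(f.read(HEADER.size))
    if not 0 <= n < count:
        raise IndexError(n)
    f.seek(HEADER.size + n * rec_len)
    rid, name, salary = RECORD.unpack(f.read(rec_len))
    return rid, name.rstrip(b"\x00").decode("ascii"), salary

f = write_file([(1, "asha", 10.0), (2, "ravi", 20.5), (3, "meena", 30.0)])
print(read_record(f, 1))
```

Because every record has the same length, no scan is needed: the byte offset follows directly from the record number.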
File blocks:
sequence of blocks containing all the records of the file
Linked allocation
• each file block is assigned to some disk block
• each disk block has a pointer to next block of the sequence
• file expansion is easy; but scanning is slow
Mixed allocation
Operations on Files
Insertion of a new record: may involve searching for appropriate
location for the new record
Deletion of a record: locating the record may involve a search;
deleting the record may involve movement of other records
Update a record field/fields: equivalent to delete and insert
Search for a record: given value of a key field / non-key field
Range search: given range values for a key / non-key field
How successfully we can carry out these operations
depends on the organization of the file and the availability
of indexes
Heap file:
Records are appended to the file as they are inserted
Simplest organization
Insertion - Read the last file block, append the record and
write back the block - easy
Locating a record given values for any attribute
• requires scanning the entire file – very costly
Heap files are often used only along with other access structures.
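The cost asymmetry described above can be sketched as follows. This is an in-memory model, not a disk implementation: each inner list stands for one file block, and the class and method names are illustrative.

```python
class HeapFile:
    """Toy heap file: records are appended to the last block; search scans every block."""

    def __init__(self, block_capacity):
        self.block_capacity = block_capacity
        self.blocks = []          # each inner list models one file block

    def insert(self, record):
        # Only the last block is touched; a new block is started when it is full.
        if not self.blocks or len(self.blocks[-1]) == self.block_capacity:
            self.blocks.append([])
        self.blocks[-1].append(record)

    def search(self, field, value):
        # No ordering and no index: every block has to be read.
        blocks_read = 0
        matches = []
        for block in self.blocks:
            blocks_read += 1
            matches.extend(r for r in block if r[field] == value)
        return matches, blocks_read

hf = HeapFile(block_capacity=2)
for i in range(5):
    hf.insert({"id": i, "dept": "cs" if i % 2 else "ee"})
matches, blocks_read = hf.search("dept", "cs")
print(len(matches), "matches after reading", blocks_read, "blocks")
```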
Sorted files / Sequential files (1/2)
Ordering field: The field whose values are used for sorting the
records in the data file
Ordering key field: An ordering field that is also a key
Deleting a record
• deletion markers are used.
Hashed Files
Very useful file organization, if quick access to the data record is
needed given the value of a single attribute.
Hash function h: maps the values from the domain of the hashing
attribute to bucket numbers
Inserting Records into a Hashed File
Insertion: for the given record R, apply h on the value of the hashing
attribute to get the bucket number r.
If there is space in bucket r, place R there; else place R in the
overflow chain of bucket r.
Search: given the hash field value k, compute r = h(k). Get the bucket
r and search for the record. If not found, search the overflow chain
of bucket r.
[Figure: main buckets 0, 1, 2, …, M-1, each with a chain of overflow buckets]
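The insertion and search rules above can be sketched as below. The hash function h(k) = k mod M and the in-memory lists standing in for main buckets and overflow chains are assumptions for illustration; the text does not fix a particular h.

```python
class HashedFile:
    """Static hashing with M main buckets and one overflow chain per bucket."""

    def __init__(self, num_buckets, bucket_capacity):
        self.M = num_buckets
        self.capacity = bucket_capacity
        self.main = [[] for _ in range(num_buckets)]
        self.overflow = [[] for _ in range(num_buckets)]

    def h(self, key):
        # Assumed hash function for this sketch.
        return key % self.M

    def insert(self, key, record):
        r = self.h(key)
        if len(self.main[r]) < self.capacity:
            self.main[r].append((key, record))      # space in bucket r
        else:
            self.overflow[r].append((key, record))  # overflow chain of bucket r

    def search(self, key):
        r = self.h(key)
        # Look in the main bucket first, then follow the overflow chain.
        for k, record in self.main[r] + self.overflow[r]:
            if k == key:
                return record
        return None

hf = HashedFile(num_buckets=4, bucket_capacity=2)
for k in (1, 5, 9, 6):
    hf.insert(k, f"rec{k}")
print(hf.search(9), hf.search(13))
```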
Hashing for Dynamic File Organization
Dynamic files
• files where record insertions and deletions take place frequently
• the file keeps growing and also shrinking
Locating a record
Match the d-bit sequence with an entry in the directory and go to
the corresponding bucket to find the record
Insertion in Extendible Hashing Scheme (1/2)
2-bit sequence for the record to be inserted: 00
[Figure: directory with global depth d = 2 (entries 00, 01, 10, 11), before and after the insertion. Before: bucket b0 is full. After: b0 is split, a new bucket b3 appears, and all local depths = 2.]
Extendible Hashing Example
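Bucket capacity – 2    Initial buckets = 1
Keys in binary: 45 = 101101, 22 = 10110, 12 = 1100, 11 = 1011, 15 = 1111, 10 = 1010
(The bucket contents in the figures correspond to indexing the directory by the low-order bits of the key.)

Insert 45, 22
Global depth = 0
• single bucket (local depth 0): 45, 22

Insert 12
Bucket overflows; local depth = global depth
⇒ Directory doubles and split image is created
Global depth = 1
• 0 → bucket (local depth 1): 22, 12
• 1 → bucket (local depth 1): 45

Insert 11
Global depth = 1
• 0 → bucket (local depth 1): 22, 12
• 1 → bucket (local depth 1): 45, 11

Insert 15
Overflow occurs. Global depth = local depth
⇒ Directory doubles and split occurs
Global depth = 2
• 00, 10 → bucket (local depth 1): 22, 12
• 01 → bucket (local depth 2): 45
• 11 → bucket (local depth 2): 11, 15

Insert 10
Overflow occurs. Since local depth < global depth,
split image is created; directory is not doubled
Global depth = 2
• 00 → bucket (local depth 2): 12
• 01 → bucket (local depth 2): 45
• 10 → bucket (local depth 2): 22, 10
• 11 → bucket (local depth 2): 11, 15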
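The example above can be replayed with a small simulation. This is a sketch under stated assumptions: the directory is indexed by the low-order bits of the key (which is what the bucket contents in the example correspond to), buckets are in-memory lists, and the class names are illustrative. Deletion and directory shrinking are not modelled.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []

class ExtendibleHash:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]          # 2**global_depth entries

    def _index(self, key):
        # Low-order global_depth bits of the key select the directory entry.
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # Directory doubles: entry i + 2**d initially shares the bucket of entry i.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        self._split(bucket)
        self.insert(key)                      # retry (may split again if still full)

    def _split(self, bucket):
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)    # the split image
        bit = 1 << (bucket.local_depth - 1)   # the newly significant bit
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            (image if k & bit else bucket).keys.append(k)
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = image

eh = ExtendibleHash(capacity=2)
for key in (45, 22, 12, 11, 15, 10):
    eh.insert(key)
for i, b in enumerate(eh.directory):
    print(format(i, "02b"), b.local_depth, b.keys)
```

The final directory matches the last step of the example: 00 holds 12, 01 holds 45, 10 holds 22 and 10, 11 holds 11 and 15.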
Linear Hashing
Does not require a separate directory structure
Insertion (1/3)
Initially the structure has M main buckets (0, …, M-1) and a few overflow buckets
When the first overflow in any bucket occurs (say, in bucket s):
• Insert the record in the overflow chain of bucket s
• Create a new bucket M
• Split the bucket 0 by using h1
• Some records stay in bucket 0 and some go to bucket M (the split image of bucket 0)
[Figure: main buckets 0, 1, …, M-1 with overflow buckets; new bucket M is the split image of bucket 0]
Insertion (2/3)
On first overflow, irrespective of where it occurs, bucket 0 is split
On subsequent overflows, buckets 1, 2, 3, … are split in that order
(This is why the scheme is called linear hashing)
N: the next bucket to be split
After M overflows, all M original buckets have been split
Hash-function pairs in use in successive rounds: (h0, h1), (h1, h2), …, (hi, hi+1)
[Figure: main buckets 0, 1, 2, … followed by their split images M, M+1, …]
Insertion (3/3)
Say the hash functions in use are hi, hi+1
To insert record with hash field value x,
Compute hi(x)
if hi(x) < N, the original bucket is already split
place the record in bucket hi+1(x)
else place the record in bucket hi(x)
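Example
M = 2 main buckets; bucket capacity 2 (as drawn in the figures)
Initially h0 = x mod 2, h1 = x mod 4, N = 0

Insert 12, 11
• 0: 12 (N)
• 1: 11

Insert 14
• 0: 12, 14 (N)
• 1: 11
(Annotations on this slide: B0 overflows; bucket pointed by N is split; hash functions are changed.)

Insert 13
• 0: 12, 14 (N)
• 1: 11, 13

Insert 9
B1 overflows: 9 goes to the overflow chain of bucket 1
B0 is split using h1 and split image (bucket 2) is created
• 0: 12
• 1: 11, 13 → overflow: 9 (N)
• 2: 14

Insert 10
h1 is applied here (h0(10) = 0 < N), so 10 goes to bucket 2
• 0: 12
• 1: 11, 13 → overflow: 9 (N)
• 2: 14, 10

Insert 18
Overflow at B2; B1 is split
Hash functions become h0 = x mod 4, h1 = x mod 8
• 0: 12 (N)
• 1: 9, 13
• 2: 14, 10 → overflow: 18
• 3: 11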
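A compact simulation of the linear hashing example, under these assumptions: hi(x) = x mod (M · 2^i), every overflow triggers exactly one split of bucket N, and each Python list models a main bucket together with its overflow chain. Class and attribute names are illustrative.

```python
class LinearHash:
    def __init__(self, m=2, capacity=2):
        self.m = m                    # initial number of main buckets (M)
        self.capacity = capacity      # records per main bucket
        self.level = 0                # i: the functions in use are h_i and h_(i+1)
        self.next = 0                 # N: the next bucket to be split
        self.buckets = [[] for _ in range(m)]   # main bucket + overflow chain

    def _h(self, i, x):
        return x % (self.m * 2 ** i)

    def address(self, x):
        b = self._h(self.level, x)
        if b < self.next:             # bucket b was already split in this round
            b = self._h(self.level + 1, x)
        return b

    def insert(self, x):
        b = self.address(x)
        overflow = len(self.buckets[b]) >= self.capacity
        self.buckets[b].append(x)     # beyond capacity = overflow chain
        if overflow:
            self._split()

    def _split(self):
        old = self.buckets[self.next]
        self.buckets[self.next] = []
        self.buckets.append([])       # split image of bucket N
        for k in old:
            self.buckets[self._h(self.level + 1, k)].append(k)
        self.next += 1
        if self.next == self.m * 2 ** self.level:
            # Every bucket of this round has been split: change hash functions.
            self.level += 1
            self.next = 0

lh = LinearHash(m=2, capacity=2)
for key in (12, 11, 14, 13, 9, 10, 18):
    lh.insert(key)
for number, bucket in enumerate(lh.buckets):
    print(number, bucket)
```

The final state has four buckets, with 18 sitting beyond the capacity of bucket 2 (that is, in its overflow chain), and N back at 0 with h0 = x mod 4.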
Index Structures
Index: A disk data structure
– enables efficient retrieval of a record given the value(s) of certain attributes
– these attributes are called the indexing attributes
Primary Index:
Index built on ordering key field of a file
Clustering Index:
Index built on ordering non-key field of a file
Secondary Index:
Index built on any non-ordering field of a file
Index file: ordered file (sorted on OKF)
[Figure: index file entries pointing to blocks Bj of the data file]
An Example
Data file:
No. of blocks b = 9500
Block size B = 4KB
OKF length V = 15 bytes
Block pointer length p = 6 bytes
Index file:
No. of records ri = 9500
Size of entry V + p = 21 bytes
Blocking factor BFi = ⌊4096/21⌋ = 195
No. of blocks bi = ⌈ri/BFi⌉ = 49
Max no. of block accesses for getting a record
using the primary index: 1 + ⌈log2 bi⌉ = 7
Max no. of block accesses for getting a record
without using the primary index: ⌈log2 b⌉ = 14
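The arithmetic of this example can be checked with a few lines. The function name is illustrative; the rounding (floor for the blocking factor, ceiling for block counts and for the binary-search depth) is the usual convention and is assumed here.

```python
import math

def primary_index_cost(b, B, V, p):
    """Return (BFi, bi, accesses with index, accesses without index)."""
    bf_i = B // (V + p)                               # index entries per block
    b_i = math.ceil(b / bf_i)                         # one index entry per data block
    with_index = math.ceil(math.log2(b_i)) + 1        # binary search on index + 1 data block
    without_index = math.ceil(math.log2(b))           # binary search on the data file
    return bf_i, b_i, with_index, without_index

print(primary_index_cost(b=9500, B=4096, V=15, p=6))
```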
Clustering Index
Built on ordered files where ordering field is not a key
Index attribute: ordering field (OF)
Index entry: (distinct value Vi of the OF, address of the first block that has a record with OF value Vi)
Secondary Index
Built on any non-ordering field (NOF) of a data file.
Case I: NOF is also a key (Secondary key)
Index entry: (value of the NOF Vi, pointer to the record with Vi as the NOF value)
(2) Index entry: (value of the NOF Vi, pointer to a block that has pointer(s) to the record(s)
with Vi as the NOF value)
Remarks:
(1) index entry – variable length record
(2) index entry – fixed length – One more level of indirection
An Example
Data file:
No. of records r = 90,000 Block size B = 4KB
Record length R = 100 bytes BF = 4096/100 = 40,
b = 90000/40 = 2250
NOF length V = 15 bytes length of a record pointer Pr = 7 bytes
Index file :
No. of records ri = 90,000 record length = V + Pr = 22 bytes
BFi = 4096/22 = 186 No. of blocks bi = 90000/186 = 484
For the example data file:
Data file: 2250 blocks
First level index: 484 blocks
Second level index: 3 blocks
Top level index: 1 block
No. of block accesses required:
multi-level index: 4
single level index: 10
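The level sizes and access counts for this example can be recomputed as below. It is a sketch assuming that every index level uses the same blocking factor BFi = 186 and that the single-level case uses binary search on the first-level index; the function name is illustrative.

```python
import math

def index_levels(num_entries, bf_i):
    """Blocks per index level, from the first level up to the one-block top level."""
    levels = []
    entries = num_entries
    while True:
        blocks = math.ceil(entries / bf_i)
        levels.append(blocks)
        if blocks == 1:
            return levels
        entries = blocks          # the next level has one entry per block of this level

levels = index_levels(num_entries=90000, bf_i=4096 // 22)
multi_level_accesses = len(levels) + 1                       # one block per level + data block
single_level_accesses = math.ceil(math.log2(levels[0])) + 1  # binary search + data block
print(levels, multi_level_accesses, single_level_accesses)
```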
Index Sequential Access Method (ISAM) Files
ISAM files –
Ordered files with a multilevel primary/clustering index
Insertions:
Handled using overflow chains at data file blocks
Deletions:
Handled using deletion markers
Order (m) of an Internal Node
• Order of an internal node is the maximum number of tree
pointers held in it.
• Maximum of (m-1) keys can be present in an internal node
Internal Node Structure
Pi: Tree pointer (block pointer)    Ki: Key value    m: Order (internal)    ⌈m/2⌉ ≤ j ≤ m
Node layout: P1 K1 P2 K2 … Ki-1 Pi Ki … Kj-1 Pj …
Sub-trees:
• P1: keys x ≤ K1
• Pi: keys Ki-1 < x ≤ Ki
• Pj: keys Kj-1 < x
Example: node with keys 2, 5, 12 (fourth key slot empty)
• sub-trees: x ≤ 2;  2 < x ≤ 5;  5 < x ≤ 12;  x > 12
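Choosing the sub-tree to follow in an internal node is a search for the first key that is not smaller than x. A minimal sketch using Python's bisect module; the function name is illustrative and the result is a 0-based pointer position (0 means P1).

```python
from bisect import bisect_left

def child_index(keys, x):
    """0-based position of the tree pointer to follow for search key x.

    With keys K1 < K2 < ..., position 0 covers x <= K1, position i covers
    K_i < x <= K_(i+1), and position len(keys) covers x > the last key.
    """
    return bisect_left(keys, x)

keys = [2, 5, 12]
for x in (1, 2, 3, 5, 12, 13):
    print(x, "-> P%d" % (child_index(keys, x) + 1))
```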
Internal Nodes
An internal node of a B+-tree of order m:
• It contains at least ⌈m/2⌉ pointers, except when it is the root node
  (Root node: a minimum of 2 pointers is ok)
• It contains at most m pointers.
• If it has P1, P2, …, Pj pointers with
  K1 < K2 < K3 … < Kj-1 as keys, where ⌈m/2⌉ ≤ j ≤ m, then
  – P1 points to the sub-tree with records having key value x ≤ K1
Order Calculation
Block size: B, Size of Index field: V
Size of block pointer: P, Size of record pointer: Pr
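With these symbols, the orders follow from requiring that a node fit in one block. The sketch below assumes the usual space constraints, which are not spelled out in the text: an internal node holds m block pointers and m-1 keys, and a leaf holds mleaf (key, record pointer) pairs plus one block pointer to the next leaf. The sample sizes are the ones used in the earlier index examples.

```python
def internal_order(B, V, P):
    # Largest m with m*P + (m-1)*V <= B.
    return (B + V) // (P + V)

def leaf_order(B, V, Pr, P):
    # Largest m_leaf with m_leaf*(V + Pr) + P <= B.
    return (B - P) // (V + Pr)

B, V, P, Pr = 4096, 15, 6, 7      # sizes in bytes, taken from the earlier examples
m = internal_order(B, V, P)
m_leaf = leaf_order(B, V, Pr, P)
print("m =", m, " m_leaf =", m_leaf)
```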
[Figure: example B+-tree. Root: (3 7). Internal nodes: (2 -) (4 -) (9 -). Leaves: (1 2) (3 -) (4 -) (6 7) (8 9) (12 15)^]
Insertion into B+- trees
Every (key, record pointer) pair is inserted in an appropriate leaf
(Search for it)
• If a leaf node overflows:
  – Node is split at j = ⌈(mleaf + 1)/2⌉
  – First j entries are kept in the original node
  – Entries from j+1 onwards are moved to a new node
  – jth key value Kj is replicated in the parent of the leaf.
• If an internal node overflows:
  – Node is split at j = ⌊(m + 1)/2⌋
  – Values and pointers up to Pj are kept in the original node
  – jth key value Kj is moved to the parent of the internal node
  – Pj+1 and the rest of the entries are moved to a new node.
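The two split rules can be sketched as plain list operations. Assumptions: a leaf is a list of (key, record pointer) pairs, an internal node is a list of keys plus a list of pointers, and the split position is rounded up for leaves and down for internal nodes (in the worked example, where m = 3, the rounding of the internal split makes no difference). Function names and the pointer labels are illustrative.

```python
import math

def split_leaf(entries, m_leaf):
    """Split an overflowing leaf holding m_leaf + 1 (key, record pointer) pairs."""
    j = math.ceil((m_leaf + 1) / 2)
    left, right = entries[:j], entries[j:]
    copied_up = left[-1][0]           # K_j is replicated in the parent, and stays in the leaf
    return left, right, copied_up

def split_internal(keys, pointers, m):
    """Split an overflowing internal node holding m keys and m + 1 pointers."""
    j = (m + 1) // 2
    left = (keys[:j - 1], pointers[:j])    # P1 K1 ... K_(j-1) Pj
    moved_up = keys[j - 1]                 # K_j moves to the parent, and is not kept below
    right = (keys[j:], pointers[j:])       # P_(j+1) K_(j+1) ...
    return left, moved_up, right

print(split_leaf([(11, "r11"), (14, "r14"), (20, "r20")], m_leaf=2))
print(split_internal([12, 14, 25], ["a", "b", "c", "d"], m=3))
```

The difference between the two cases is that a leaf key is copied upward (it must remain in the leaf, where the record pointers live), while an internal key is moved upward.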
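Example (each leaf holds 2 entries; "-" marks an empty slot, "^" a null next-leaf pointer)
1. Initial tree: a single leaf (11 20)^
2. Insert 14: (11 14 20) overflows. The leaf is split
   at j = ⌈(mleaf + 1)/2⌉ = 2; 14 is copied to parent level
3. Insert 25: inserted at leaf level
   Root: (14 -)
   Leaves: (11 14) (20 25)^
4. Insert 30: overflow; split at 25. 25 is copied to upper level
   Root: (14 25)
   Leaves: (11 14) (20 25) (30 -)^
5. After 12 is added at the leaf level:
   Root: (14 -)
   Leaves: (11 12) (14 -) (20 25) (30 -)
6. Insert 22
   Root: (14 -)
   Internal: (12 -) (22 25)
   Leaves: (11 12) (14 -) (20 22) (25 -) (30 -)
7. Insert 23, 24
   Root: (14 24)
   Internal: (12 -) (22 -) (25 -)
   Leaves: (11 12) (14 -) (20 22) (23 24) (25 -) (30 -)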
Example
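Initial tree
Root: (14 24)
Internal: (12) (22) (25)
Leaves: (11 12) (14) (20 22) (23 24) (25) (30)

Delete 20: entry removed from leaf
Root: (14 24)
Internal: (12) (22) (25)
Leaves: (11 12) (14) (22) (23 24) (25) (30)

Delete 22: 22 is removed from leaf and internal node;
entries from right sibling are distributed to left
Root: (14 24)
Internal: (12) (23) (25)
Leaves: (11 12) (14) (23) (24) (25) (30)

Delete 24
Root: (14)
Internal: (12) (23 25)
Leaves: (11 12) (14) (23) (25) (30)

Delete 14
Root: (12)
Internal: (11) (23 25)
Leaves: (11) (12) (23) (25) (30)

Delete 12: level drop has occurred
Root: (23 25)
Leaves: (11) (23) (25) (30)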
3) Height of the tree is less as only keys are used for indexing
Parallel Access of Multiple Disks
Single Disk: high block access time: 6msec – 50msec
Why not use parallel access to improve performance?
RAID – Redundant Arrays of Independent Disks (current usage)
Redundant Arrays of Inexpensive Disks (early usage)
RAID techniques aim to improve performance and reliability
Data Striping
Data Striping – distribute data on multiple disks
Bit-level striping: ith bit of each byte – stored on the ith disk
Use 8 disks for 8 bits of a byte. // higher granularity is also possible
One (parallel) block read – 8 blocks of the data file
Transfer rate – eight times that of single disk
Read/write of a block – involves use of all the disks
An example scenario:
Mean Time To Failure (MTTF) of a disk: 2,40,000 hrs (i.e. 240,000 hours)
That is, probability of failure of a single disk in an hour: 1/2,40,000
This is unacceptable.
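To see why striping over many disks hurts reliability, the sketch below estimates the MTTF of an array that fails as soon as any one disk fails. The disk count of 100 is a hypothetical value chosen for illustration (the text does not give one), and disk failures are assumed to be independent with a constant per-hour failure probability.

```python
def array_mttf_hours(disk_mttf_hours, n_disks):
    """Approximate MTTF of an array with no redundancy: any single failure loses data."""
    # Per-hour failure probability of the array is roughly n_disks / disk_mttf_hours.
    return disk_mttf_hours / n_disks

disk_mttf = 240_000            # hours, from the scenario above
n_disks = 100                  # hypothetical array size
mttf = array_mttf_hours(disk_mttf, n_disks)
print(f"array MTTF: {mttf:.0f} hours (about {mttf / 24:.0f} days)")
```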
Mirroring disks to increase reliability
Mirroring – Each disk has a mirror disk – same data on both
If a disk fails – use the mirror of that disk till the original is replaced
If a disk k fails: recompute the ith bit of its block j from the ith parity bit of block j
and the ith bits of block j on the surviving disks
Do this for all blocks to recover the data of disk k!
N – data disks, one extra disk – good performance and reliability!
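Parity-based recovery amounts to a bitwise XOR. The sketch below is a toy model in which each disk contributes one block of bytes; the function names are illustrative. The parity block is the XOR of the data blocks, and a lost block is the XOR of the parity block with the surviving data blocks.

```python
from functools import reduce

def parity_block(blocks):
    """Bitwise XOR of equally sized blocks, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def recover_block(surviving_blocks, parity):
    """Rebuild the block of the failed disk from the survivors and the parity block."""
    return parity_block(list(surviving_blocks) + [parity])

data = [b"\x0f\xa0", b"\x33\x55", b"\xff\x00"]   # block j on three data disks
parity = parity_block(data)                       # block j on the parity disk

# Suppose disk k = 1 fails: rebuild its block from disks 0 and 2 plus the parity.
rebuilt = recover_block([data[0], data[2]], parity)
print(rebuilt == data[1])
```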
Distributed Parity
N data disks and 1 redundant (parity) disk
• Very good performance and protection against single-disk crash
• Updating any data block – requires updating the parity disk
• Usage of parity disk – high and it ages faster!
Standard RAID Levels
RAID-0 – Block-level striping; No parity data; No mirroring
RAID-1 – Mirrored disks; No parity; No data striping
RAID-2 – Bit-level striping; Redundancy using Hamming codes
Not in much use currently.
RAID-3 – Byte-level striping; dedicated parity disk
Not in common use currently.
RAID-4 – Block-level striping; dedicated parity disk
RAID-5 – Block-level striping; distributed parity
RAID-6 – Block-level striping; double distributed parity;
Up to 2 disk crashes can be tolerated