
Index Structures

Lecture 11–12

BIL344 Database Systems


Mustafa Sert
Assoc. Prof.
[email protected]

Department of Computer Engineering, Başkent University
Ankara 06810 TURKEY
Agenda

• Physical Storage
• Tree-based index structures

BIL344 Database Systems (2012) | Mustafa Sert, Ph.D.


Disk Storage Devices

• Preferred secondary storage device for high storage capacity and low cost.
• Data are stored as magnetized areas on magnetic disk surfaces.
• A disk pack contains several magnetic disks connected to a rotating spindle.
• Each disk surface is divided into concentric circular tracks.
• Track capacities typically vary from 4 to 50 KB or more.

Disk Storage Devices (contd.)

• A track is divided into smaller blocks or sectors, because it usually contains a large amount of information.
  - The division of a track into sectors is hard-coded on the disk surface and cannot be changed.
  - One type of sector organization calls a portion of a track that subtends a fixed angle at the center a sector.
• A track is divided into blocks.
  - The block size B is fixed for each system.
  - Typical block sizes range from B = 512 bytes to B = 4096 bytes.
  - Whole blocks are transferred between disk and main memory for processing.

Disk Storage Devices (contd.)

Disk Storage Devices (contd.)
• A read-write head moves to the track that contains the block to be transferred.
• Disk rotation moves the block under the read-write head for reading or writing.
• A physical disk block (hardware) address consists of:
  - a cylinder number (an imaginary collection of tracks of the same radius from all recorded surfaces),
  - the track number or surface number (within the cylinder),
  - and the block number (within the track).
• Reading or writing a disk block is time-consuming because of the seek time s and the rotational delay (latency) rd.
• Double buffering can be used to speed up the transfer of contiguous disk blocks.
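The cost of one random block read can be estimated with a short calculation. This sketch assumes the common textbook model in which the cost is the seek time s, plus the average rotational delay rd (half a revolution), plus a block transfer time btt. The btt term and all drive figures (9 ms seek, 7,200 rpm, 100 MB/s, 4,096-byte block) are illustrative assumptions, not parameters from these notes.

```python
def rotational_delay_ms(rpm: float) -> float:
    """Average rotational delay rd: on average the disk turns half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2


def block_transfer_ms(block_bytes: int, rate_bytes_per_s: float) -> float:
    """Assumed block transfer time btt: block size divided by transfer rate."""
    return block_bytes / rate_bytes_per_s * 1000.0


def random_block_access_ms(seek_ms: float, rpm: float, block_bytes: int,
                           rate_bytes_per_s: float) -> float:
    """Seek time s + rotational delay rd + block transfer time btt."""
    return (seek_ms + rotational_delay_ms(rpm)
            + block_transfer_ms(block_bytes, rate_bytes_per_s))


# Hypothetical drive: 9 ms average seek, 7,200 rpm, 100 MB/s, 4,096-byte blocks.
print(f"rd = {rotational_delay_ms(7200):.2f} ms")                               # rd = 4.17 ms
print(f"one block = {random_block_access_ms(9.0, 7200, 4096, 100e6):.2f} ms")  # one block = 13.21 ms
```

With these assumed figures, seek time and rotational delay account for almost all of the roughly 13.2 ms. Transferring the 4 KB block itself takes only about 0.04 ms. This is why reading contiguous blocks in sequence, once the head is positioned, is much cheaper than reading the same number of scattered blocks.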

Disk Storage Devices (contd.)

Typical Disk Parameters

(Courtesy of Seagate Technology)

Records

• Fixed-length and variable-length records
• Records contain fields, which have values of a particular type
  - E.g., amount, date, time, age
• Fields themselves may be fixed-length or variable-length.
• Variable-length fields can be mixed into one record:
  - Separator characters or length fields are needed so that the record can be “parsed.”
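Here is a minimal sketch of the length-field option. Each variable-length field is preceded by its length, so the record can be parsed back without scanning for separators. The 2-byte little-endian length prefix, the UTF-8 encoding and the sample field values are illustrative assumptions.

```python
import struct
from typing import List

LENGTH = struct.Struct("<H")  # assumed: 2-byte unsigned little-endian length field


def encode_record(fields: List[str]) -> bytes:
    """Concatenate the fields, each preceded by its length in bytes."""
    out = bytearray()
    for value in fields:
        data = value.encode("utf-8")
        out += LENGTH.pack(len(data))
        out += data
    return bytes(out)


def parse_record(raw: bytes) -> List[str]:
    """Walk the byte string, using each length field to locate the next field."""
    fields, pos = [], 0
    while pos < len(raw):
        (n,) = LENGTH.unpack_from(raw, pos)
        pos += LENGTH.size
        fields.append(raw[pos:pos + n].decode("utf-8"))
        pos += n
    return fields


rec = encode_record(["Smith", "2012-05-14", "Ankara"])
print(len(rec), parse_record(rec))   # 27 ['Smith', '2012-05-14', 'Ankara']
```

A separator-character scheme would instead scan for a reserved byte. Length fields avoid having to forbid that byte inside field values.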

Blocking
• Blocking:
  - Refers to storing a number of records in one block on the disk.
  - The blocking factor (bfr) is the number of records per block.
  - There may be empty space in a block if an integral number of records does not fit in one block.
• Spanned vs. unspanned records:
  - Spanned: a record may be split across several blocks (required when a record is larger than a block).
  - Unspanned: NO record is split across multiple blocks.
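Consider fixed-length records of R bytes stored unspanned in blocks of B bytes. The blocking factor is bfr = ⌊B / R⌋, each block wastes B − bfr·R bytes, and a file of r records needs b = ⌈r / bfr⌉ blocks. A small sketch of that arithmetic follows; the sizes used are made-up example values.

```python
import math
from typing import Tuple


def blocking(block_size: int, record_size: int, num_records: int) -> Tuple[int, int, int]:
    """Return (bfr, unused bytes per block, blocks needed) for an unspanned file."""
    if record_size > block_size:
        raise ValueError("record larger than a block: unspanned organization impossible")
    bfr = block_size // record_size              # floor(B / R)
    unused = block_size - bfr * record_size      # empty space left in each block
    blocks = math.ceil(num_records / bfr)        # ceil(r / bfr)
    return bfr, unused, blocks


# Hypothetical file: 4,096-byte blocks, 100-byte records, 30,000 records.
print(blocking(4096, 100, 30_000))   # (40, 96, 750)
```

A record larger than the block would force spanned organization, so the function rejects that case rather than modelling it.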

Files of Records

• A file is a sequence of records, where each record is a collection of data values (or data items).
• A file descriptor (or file header) includes information that describes the file, such as the field names and their data types, and the addresses of the file blocks on disk.
• Records are stored on disk blocks.
• The blocking factor bfr for a file is the (average) number of file records stored in a disk block.
• A file can have fixed-length records or variable-length records.

Files of Records (contd.)
• File records can be unspanned or spanned:
  - Unspanned: no record can span two blocks.
  - Spanned: a record can be stored in more than one block.
• The physical disk blocks that are allocated to hold the records of a file can be contiguous, linked, or indexed.
• In a file of fixed-length records, all records have the same format. Usually, unspanned blocking is used with such files.
• Files of variable-length records require additional information to be stored in each record, such as separator characters and field types.
  - Usually, spanned blocking is used with such files.

Unordered Files

• Also called a heap or a pile file.
• New records are inserted at the end of the file.
• A linear search through the file records is necessary to search for a record.
  - This requires reading and searching half the file blocks on average, and is hence quite expensive.
• Record insertion is quite efficient.
• Reading the records in order of a particular field requires sorting the file records.
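The following toy model contrasts cheap insertion with expensive search in a heap file. It assumes each block is a Python list holding at most bfr records. Insertion touches only the last block, while lookup scans blocks in order and counts how many it had to read. The class and method names are hypothetical.

```python
from typing import List, Optional, Tuple


class HeapFile:
    """Unordered (heap) file: each block holds at most bfr records."""

    def __init__(self, bfr: int) -> None:
        self.bfr = bfr
        self.blocks: List[List[Tuple[int, str]]] = []

    def insert(self, key: int, value: str) -> None:
        """Append at the end of the file: only the last block is touched."""
        if not self.blocks or len(self.blocks[-1]) == self.bfr:
            self.blocks.append([])               # last block full: allocate a new one
        self.blocks[-1].append((key, value))

    def find(self, key: int) -> Tuple[Optional[str], int]:
        """Linear search; returns (value or None, number of blocks read)."""
        for reads, block in enumerate(self.blocks, start=1):
            for k, v in block:
                if k == key:
                    return v, reads
        return None, len(self.blocks)            # absent: every block was read


f = HeapFile(bfr=4)
for k in [42, 7, 19, 3, 88, 51, 12, 60]:         # arbitrary arrival order
    f.insert(k, f"rec{k}")
print(f.find(88))   # ('rec88', 2)
print(f.find(99))   # (None, 2)
```

For a key that is present, the expected cost is about half the file (b/2 block reads). A key that is absent costs all b blocks.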

Ordered Files
• Also called a sequential file.
• File records are kept sorted by the values of an ordering field.
• Insertion is expensive: records must be inserted in the correct order.
  - It is common to keep a separate unordered overflow (or transaction) file for new records to improve insertion efficiency; this is periodically merged with the main ordered file.
• A binary search can be used to search for a record on its ordering field value.
  - This requires reading and searching about log2(b) of the b file blocks on average, an improvement over linear search.
• Reading the records in order of the ordering field is quite efficient.
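Here is a sketch of binary search on the ordering field at the block level. It assumes the file is a list of blocks whose records are already sorted by key, and it counts each block read. Each probe compares the key with the first and last keys of the middle block, so at most ⌊log2(b)⌋ + 1 blocks are read. The function name and the sample file are hypothetical.

```python
from typing import List, Optional, Tuple

Block = List[Tuple[int, str]]


def binary_search_blocks(blocks: List[Block], key: int) -> Tuple[Optional[str], int]:
    """Return (value or None, blocks read) for a file sorted on its key."""
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]                     # one block read
        reads += 1
        if key < block[0][0]:
            hi = mid - 1                        # key can only be in an earlier block
        elif key > block[-1][0]:
            lo = mid + 1                        # key can only be in a later block
        else:
            for k, v in block:                  # search inside the block in memory
                if k == key:
                    return v, reads
            return None, reads
    return None, reads


# Hypothetical file: 1,024 blocks of 4 sorted records (keys 0, 2, 4, ..., 8190).
keys = list(range(0, 8192, 2))
blocks = [[(k, f"rec{k}") for k in keys[i:i + 4]] for i in range(0, len(keys), 4)]
value, reads = binary_search_blocks(blocks, 5000)
print(value, reads)   # rec5000, after at most floor(log2(1024)) + 1 = 11 block reads
```

For this hypothetical 1,024-block file, binary search reads at most 11 blocks. A linear scan reads about 512 blocks on average.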

Ordered Files (contd.)

Average Access Times

• The following table shows the average time required to access a specific record for a given type of file.

B-Trees and B+-Trees

Dynamic Multilevel Indexes Using B-Trees and B+-Trees
• Most multilevel indexes use B-tree or B+-tree data structures because of the insertion and deletion problem: in a fully packed (static) multilevel index, inserting or deleting an entry can force the index files to be reorganized.
  - These structures leave space in each tree node (disk block) to allow for new index entries.
• These data structures are variations of search trees that allow efficient insertion and deletion of new search values.
• In B-tree and B+-tree data structures, each node corresponds to a disk block.
• Each node is kept between half-full and completely full.

Dynamic Multilevel Indexes Using B-Trees and B+-Trees (contd.)
• An insertion into a node that is not full is quite efficient.
• If a node is full, the insertion causes a split into two nodes.
  - Splitting may propagate to other tree levels.
• A deletion is quite efficient if a node does not become less than half full.
• If a deletion causes a node to become less than half full, it must be merged with neighboring nodes.

B-tree Structures

The Nodes of a B+-tree
• The nodes of a B+-tree:
  - (a) Internal node of a B+-tree with q - 1 search values.
  - (b) Leaf node of a B+-tree with q - 1 search values and q - 1 data pointers.
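One way to model the two node types in code is shown below. It assumes an internal node stores its search values and one more child pointer than values. A leaf stores search values paired with data pointers, plus a pointer to the next leaf, as in the usual B+-tree leaf layout. Keys smaller than a separator go left and keys greater than or equal to it go right. The class names and the use of strings as stand-in data pointers are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int] = field(default_factory=list)        # search values
    data_ptrs: List[str] = field(default_factory=list)   # stand-ins for record addresses
    next_leaf: Optional[Leaf] = None                      # chains leaves for range scans


@dataclass
class Internal:
    keys: List[int] = field(default_factory=list)                    # q - 1 search values
    children: List[Union[Internal, Leaf]] = field(default_factory=list)  # q tree pointers


def search(node: Union[Internal, Leaf], key: int) -> Optional[str]:
    """Descend from the root to a leaf; keys < separator go left, >= go right."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    for k, ptr in zip(node.keys, node.data_ptrs):
        if k == key:
            return ptr
    return None


# Hand-built two-level tree: the separator 60 divides the two leaves.
right = Leaf([60, 80], ["r60", "r80"])
left = Leaf([50], ["r50"], next_leaf=right)
root = Internal([60], [left, right])
print(search(root, 80), search(root, 55))   # r80 None
```

`bisect_right` returns the number of separators that are less than or equal to the key. It therefore sends a key equal to a separator into the right subtree, matching the convention above.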

Difference between B-tree and B+-tree

• In a B-tree, pointers to data records exist at all levels of the tree.
• In a B+-tree, all pointers to data records exist at the leaf-level nodes.
• A B+-tree can have fewer levels (or a higher capacity of search values) than the corresponding B-tree.
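The last point can be made concrete with a worked calculation. The sizes are hypothetical: block B = 512 bytes, search field V = 9 bytes, tree (block) pointer P = 6 bytes and record pointer Pr = 7 bytes.
- A B-tree node holds p tree pointers and p − 1 (key, record pointer) pairs, so p·P + (p − 1)(V + Pr) ≤ B.
- A B+-tree internal node holds no record pointers, so p·P + (p − 1)·V ≤ B.
- A B+-tree leaf holds p_leaf (key, record pointer) pairs plus one next-leaf pointer, so p_leaf(V + Pr) + P ≤ B.

```python
def btree_order(B: int, V: int, P: int, Pr: int) -> int:
    """Largest p with p*P + (p - 1)*(V + Pr) <= B."""
    return (B + V + Pr) // (P + V + Pr)


def bplus_internal_order(B: int, V: int, P: int) -> int:
    """Largest p with p*P + (p - 1)*V <= B (no record pointers above the leaves)."""
    return (B + V) // (P + V)


def bplus_leaf_order(B: int, V: int, P: int, Pr: int) -> int:
    """Largest p_leaf with p_leaf*(V + Pr) + P <= B (one next-leaf pointer)."""
    return (B - P) // (V + Pr)


B, V, P, Pr = 512, 9, 6, 7   # hypothetical sizes in bytes
print(btree_order(B, V, P, Pr),          # 24
      bplus_internal_order(B, V, P),     # 34
      bplus_leaf_order(B, V, P, Pr))     # 31
```

With these assumed sizes, a B+-tree internal node can point to 34 children, compared with 24 for a B-tree node, because no record pointers are stored above the leaf level. Larger fan-out usually means fewer levels for the same number of search values.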

Definition of a B-Tree

• A B-tree with capacity order d is defined as:
  - d ≤ # of records per node ≤ 2d (except the root, which has between 1 and 2d records)
  - d + 1 ≤ # of pointers per node ≤ 2d + 1 (except the root, which has between 2 and 2d + 1 pointers)
  - All the leaf nodes are on the same level.
  - Each node, except the root, must have a storage utilization (load factor) of at least 50%.
• Example node with capacity order d = 1:
  - Max records = 2d = 2
  - Max pointers = 2d + 1 = 3

Ptr | Record | Ptr | Record | Ptr
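The constraints above can be checked mechanically. This is a minimal sketch that assumes a node is a pair (keys, children), with sorted, unique integer keys standing in for records and an empty children list for a leaf. The representation and the function name are hypothetical. The function checks the record-count bounds for capacity order d, the n + 1 pointer rule, key ordering, and that all leaves lie on the same level.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    keys: List[int]                 # records, represented by their unique keys
    children: List["Node"]          # empty list for a leaf


def is_valid_btree(root: Node, d: int) -> bool:
    """Check the B-tree constraints for capacity order d."""
    leaf_levels = set()

    def check(node: Node, level: int, lo: float, hi: float, is_root: bool) -> bool:
        n = len(node.keys)
        if not ((1 if is_root else d) <= n <= 2 * d):        # record-count bounds
            return False
        if not all(a < b for a, b in zip(node.keys, node.keys[1:])):
            return False                                     # keys out of order
        if not all(lo < k < hi for k in node.keys):
            return False                                     # key in the wrong subtree
        if not node.children:                                # leaf: remember its level
            leaf_levels.add(level)
            return True
        if len(node.children) != n + 1:                      # n records need n + 1 pointers
            return False
        bounds = [lo, *node.keys, hi]
        return all(check(child, level + 1, bounds[i], bounds[i + 1], False)
                   for i, child in enumerate(node.children))

    return check(root, 0, float("-inf"), float("inf"), True) and len(leaf_levels) == 1


# Capacity order d = 1: every non-root node holds 1 or 2 records.
ok = Node([50], [Node([20], []), Node([70, 80], [])])
uneven = Node([50], [Node([20], []), Node([70], [Node([60], []), Node([90], [])])])
print(is_valid_btree(ok, 1), is_valid_btree(uneven, 1))   # True False
```

The second tree fails only the "all leaves on the same level" rule: its leaves sit at levels 1 and 2.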

B+-tree Example
Capacity order (leaf) = 1
Capacity order (internal) = 2

Insert the following records, in order:

80, 50, 100, 90, 60, 65, 70, 75, 55, 64, 51, 76, 77, 78, 200

[Figures: the state of the tree after each insertion, showing the leaf and internal-node splits]

Final tree after inserting 200 (keys < separator go left, keys ≥ separator go right):

Root:      [65 78]
Internal:  [51 55 60]                [70 76]                [80 100]
Leaves:    [50] [51] [55] [60 64]    [65] [70 75] [76 77]    [78] [80 90] [100 200]
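Below is a compact sketch of B+-tree insertion under stated assumptions. A leaf holds at most 2·d_leaf keys and an internal node at most 2·d_internal keys; here d_leaf = 1 and d_internal = 2, as in the example. An overfull leaf is split in half and a copy of the right half's first key is inserted into the parent. An overfull internal node is split and its middle key moves up. Textbooks differ on which half keeps the extra key, so the trees this sketch builds can differ from the figures. The class and method names are hypothetical.

```python
from bisect import bisect_right, insort
from typing import List, Optional, Tuple


class Node:
    def __init__(self, leaf: bool) -> None:
        self.leaf = leaf
        self.keys: List[int] = []
        self.children: List["Node"] = []        # internal nodes only
        self.next: Optional["Node"] = None      # leaf chain, leaves only


class BPlusTree:
    """Sketch: leaves hold 1..2*d_leaf keys, internal nodes d_internal..2*d_internal."""

    def __init__(self, d_leaf: int = 1, d_internal: int = 2) -> None:
        self.max_leaf, self.max_internal = 2 * d_leaf, 2 * d_internal
        self.root = Node(leaf=True)

    def insert(self, key: int) -> None:
        split = self._insert(self.root, key)
        if split:                                   # root split: tree grows one level
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys, new_root.children = [sep], [self.root, right]
            self.root = new_root

    def _insert(self, node: Node, key: int) -> Optional[Tuple[int, Node]]:
        """Insert below node; return (separator, new right sibling) if node split."""
        if node.leaf:
            insort(node.keys, key)
            if len(node.keys) <= self.max_leaf:
                return None
            mid = len(node.keys) // 2
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right             # leaf split: COPY first key up
        i = bisect_right(node.keys, key)            # < goes left, >= goes right
        split = self._insert(node.children[i], key)
        if not split:
            return None
        sep, child = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, child)
        if len(node.keys) <= self.max_internal:
            return None
        mid = len(node.keys) // 2
        up = node.keys[mid]                         # internal split: MOVE middle key up
        right = Node(leaf=False)
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = node.children[mid + 1:], node.children[:mid + 1]
        return up, right

    def scan(self) -> List[int]:
        """Go to the leftmost leaf, then follow the leaf chain."""
        node: Optional[Node] = self.root
        while not node.leaf:
            node = node.children[0]
        out: List[int] = []
        while node:
            out.extend(node.keys)
            node = node.next
        return out


tree = BPlusTree(d_leaf=1, d_internal=2)
for k in [80, 50, 100, 90, 60, 65, 70, 75, 55, 64, 51, 76, 77, 78, 200]:
    tree.insert(k)
print(tree.scan())   # [50, 51, 55, 60, 64, 65, 70, 75, 76, 77, 78, 80, 90, 100, 200]
```

Because this sketch keeps the smaller half of a split leaf on the left, its final root holds 70 and 77 rather than the 65 and 78 shown in the figure. Both trees satisfy the B+-tree constraints, which shows how the shape depends on the split convention.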
Deletion from B-Tree

• Must maintain the B-tree constraints.
• The process is the opposite of insertion: Join-and-Demote.
• There are three possible scenarios:
  - Deletion from a leaf node, and the constraints are maintained
  - Deletion from a non-leaf node
  - Deletion from a leaf node, and the constraints are violated (two choices: redistribution or coalescing)

Deletion from B-Tree

• Scenario 1: Deletion from a leaf node that does not violate the minimum capacity constraint
  - Just delete the record from its location in the node.
• Scenario 2: When deleting from a non-leaf node, the comparison key must be maintained (delete the record with key 84).
  - Replace the record with its inorder successor (one step right, then all the way left). In our case, it is the record with key 87.

[Figure: before the deletion; nodes (200, 350, 460), (73, 84), (120, 160, 176) and the leaf (87, 91, 92), which holds 87, the inorder successor of 84]

Deletion from B-Tree

• Scenario 2 (cont.): When deleting from a non-leaf node, the comparison key must be maintained.

[Figure: after the deletion; node (73, 87) and leaf (91, 92); nodes (200, 350, 460) and (120, 160, 176) unchanged]
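Scenario 2 can be sketched as follows, assuming a node is a (keys, children) pair with an empty children list for a leaf. To delete a key from a non-leaf node, step to the child just right of the key and follow leftmost children down to a leaf. Then move that leaf's first key (the inorder successor) into the key's slot. The record has now been removed from a leaf, where Scenario 1, 3 or 4 applies. The names are hypothetical, and only the nodes (73, 84) and (87, 91, 92) in the usage line come from the figure; the other two leaves are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int]
    children: List[Node] = field(default_factory=list)   # empty for a leaf


def replace_with_successor(node: Node, key: int) -> int:
    """Scenario 2: overwrite `key` in a non-leaf node with its inorder successor.

    The successor is removed from its leaf and returned. The caller must still
    repair that leaf if it has dropped below the minimum capacity.
    """
    i = node.keys.index(key)
    leaf = node.children[i + 1]          # one step right of the key...
    while leaf.children:
        leaf = leaf.children[0]          # ...then all the way left
    successor = leaf.keys.pop(0)         # smallest key in the right subtree
    node.keys[i] = successor
    return successor


# Hypothetical subtree: only (73, 84) and (87, 91, 92) come from the figure.
sub = Node([73, 84], [Node([50, 61]), Node([75, 80]), Node([87, 91, 92])])
replace_with_successor(sub, 84)
print(sub.keys, sub.children[2].keys)   # [73, 87] [91, 92]
```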


Deletion from B-Tree

• Scenario 3: Remove a record from a leaf node, causing a minimum capacity constraint violation that can be corrected by redistribution.
  - DELETE 70: borrow a record from a sibling.
  - Look right first, by convention.
  - "Rotate" with the record in their parent.

[Figure: parent (85, 200, 350) with adjacent leaves (53, 70) and (90, 100, 105)]
[Figure: after removing 70, the leaf (53) is below minimum capacity; its right sibling (90, 100, 105) can spare a record]
[Figure: after the rotation, 85 moves down and 90 moves up: parent (90, 200, 350), leaves (53, 85) and (100, 105)]
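The redistribution step reduces to a small function on a parent and two adjacent leaves. This sketch assumes capacity order d = 2, which the example implies because (53, 70) is at the minimum before the deletion. Nodes are plain sorted lists of keys, and the right sibling is tried first, per the convention above. The function name is hypothetical.

```python
from typing import List


def borrow_from_right(parent: List[int], i: int, left: List[int],
                      right: List[int], d: int) -> bool:
    """Rotate through the parent: parent[i] moves down, right[0] moves up.

    `left` and `right` are the leaves on either side of parent[i].
    Returns False if the right sibling has no spare record (it holds only d).
    """
    if len(right) <= d:
        return False
    left.append(parent[i])        # separator moves down into the deficient leaf
    parent[i] = right.pop(0)      # sibling's smallest record becomes the separator
    return True


# DELETE 70 from the figure's tree (d = 2).
parent, left, right = [85, 200, 350], [53, 70], [90, 100, 105]
left.remove(70)                   # left is now deficient: 1 < d
borrow_from_right(parent, 0, left, right, d=2)
print(parent, left, right)        # [90, 200, 350] [53, 85] [100, 105]
```

A symmetric borrow-from-left would move parent[i - 1] down and the left sibling's largest record up. For non-leaf nodes, the sibling's outermost child pointer would move along with the record. This sketch covers leaves only.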


Deletion from B-Tree

• Scenario 4: Remove a record from a leaf node, causing a minimum capacity constraint violation that requires coalescing of nodes.
  - DELETE 70: Join-and-Demote.
  - Combine the siblings and the parent record between them into one node.
  - If the parent becomes deficient, apply the same step recursively.
  - If this propagates to the root and leaves the root empty, the tree collapses (its height shrinks by one level).

[Figure: parent (85, 200, 350) with adjacent leaves (53, 70) and (90, 100)]
[Figure: after removing 70, the leaf (53) is below minimum capacity, and its sibling (90, 100) has no record to spare]
[Figure: after Join-and-Demote: parent (200, 350), merged leaf (53, 85, 90, 100)]
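Join-and-Demote can be sketched under the same assumptions as the redistribution case: d = 2 and nodes as sorted lists of keys. Here the parent is given together with the list of its children. The merge always fits in one node, because the deficient leaf has d − 1 records and the sibling exactly d, so the result has (d − 1) + 1 + d = 2d records. The function name and the last two leaves in the usage example are hypothetical.

```python
from typing import List


def join_and_demote(parent: List[int], children: List[List[int]], i: int) -> List[int]:
    """Merge children[i], the separator parent[i], and children[i + 1] into one node.

    The parent loses one record and one child pointer. Returns the merged node.
    The caller must check whether the parent is now deficient and, if so,
    repeat the fix one level up; an empty root is discarded (the tree collapses).
    """
    merged = children[i] + [parent.pop(i)] + children[i + 1]   # separator is demoted
    children[i:i + 2] = [merged]
    return merged


# DELETE 70 from the figure's tree (d = 2): the sibling (90, 100) cannot lend.
parent = [85, 200, 350]
children = [[53, 70], [90, 100], [210, 220], [400, 410]]   # last two leaves are hypothetical
children[0].remove(70)
join_and_demote(parent, children, 0)
print(parent, children[0])        # [200, 350] [53, 85, 90, 100]
```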


Exercise 2: An Example of an Insertion in a B+-tree

Exercise 3: An Example of a Deletion in a B+-tree

Exercise 4: Delete the record with key 24

[Figure: a B+-tree whose index nodes include the keys 5, 13, 17, 27 and 30, and whose leaves hold the keys 2, 3, 5, 7, 8, 14, 16, 22, 24, 27, 29, 33, 34, 38, 39]

Association: < : Left, ≥ : Right
