CNG351 Lecture 12 B


Indexing Structures for Files

CNG351 - Data Management and File Structures


Lecture - 12
Instructor: Dr. Yeliz Yesilada
Multi-Level Indexes
• Because a single-level index is an ordered file, we can create a
primary index to the index itself;
– In this case, the original index file is called the first-level
index and the index to the index is called the second-level
index.
• We can repeat the process, creating a third, fourth, ..., top level
until all entries of the top level fit in one disk block
• A multi-level index can be created for any type of first-level
index (primary, secondary, clustering) as long as the first-level
index consists of more than one disk block
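
To see how quickly the levels shrink, the following minimal sketch counts index levels by repeatedly building one entry per block of the level below until a single block remains. The helper name `index_levels` and the sample figures in the usage note are assumptions for illustration, not values from the lecture.

```python
import math

def index_levels(first_level_entries, entries_per_block):
    """Count the levels of a multi-level index (hypothetical helper)."""
    levels = 1
    # Number of disk blocks occupied by the first-level index.
    blocks = math.ceil(first_level_entries / entries_per_block)
    while blocks > 1:
        # The next level holds one entry per block of the level below it.
        levels += 1
        blocks = math.ceil(blocks / entries_per_block)
    return levels
```

For example, with an assumed 30,000 first-level entries and 68 entries per block, `index_levels(30000, 68)` returns 3: the levels occupy 442, 7, and 1 blocks.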

CNG 351 - lecture 11 2/34


A Two-level Primary Index



Multi-Level Indexes
• Multilevel index reduces the number of blocks
accessed when searching for a record given its
indexing field value.
• Such a multi-level index is a form of search tree
– However, insertion and deletion of new index entries is
a severe problem because every level of the index is
an ordered file.
• To retain the benefits of multilevel indexing while reducing
index insertion and deletion problems, designers adopted a
multilevel index that leaves some space in each of its blocks
for inserting new entries, called a dynamic multilevel index.
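
The reduction in block accesses can be illustrated with a small sketch of a lookup through a two-level primary index. The data layout (lists standing in for disk blocks) and the name `two_level_lookup` are assumptions made for this example only; each index level stores the first key of every block below it.

```python
from bisect import bisect_right

def two_level_lookup(top, first_level, data_blocks, key):
    """Return (found, blocks_read) for a toy two-level primary index.

    top         -- sorted anchor keys, one per first-level index block
    first_level -- list of (anchor_keys, data_block_ids), one pair per index block
    data_blocks -- list of sorted key lists, one per data block
    """
    reads = 1                               # read the single top-level block
    i = bisect_right(top, key) - 1
    if i < 0:
        return False, reads                 # key is smaller than every anchor
    reads += 1                              # read one first-level index block
    anchors, block_ids = first_level[i]
    j = bisect_right(anchors, key) - 1      # j >= 0 because anchors[0] == top[i]
    reads += 1                              # read one data block
    return key in data_blocks[block_ids[j]], reads
```

In this sketch a successful search always reads three blocks, one per index level plus the data block, regardless of how many data blocks the file has.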



Tree ADT 101



A Node in a Search Tree with Pointers to
Subtrees below It



A search tree of order p = 3.



Using a Search Tree
• We can use a search tree as a mechanism to search for records
stored in a disk file.
• The values in the tree can be values of one of the fields of the
file, called the search field.
• Each key value in the tree is associated with a pointer to the
record in the data file having that value.
• Two issues:
– Balance: a balanced tree is important because it guarantees that no
nodes will be at very high levels and hence require many
block accesses during a tree search.
– Record deletion: some nodes can be left nearly empty, wasting storage.
• B-trees address both issues.
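
A minimal sketch of searching such a tree, assuming each node keeps its search values sorted and has one more subtree pointer than it has values. The class `TreeNode` and function `search` are illustrative names; the count of visited nodes stands in for block accesses.

```python
from bisect import bisect_left

class TreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                 # sorted search-field values
        self.children = children or []   # len(keys) + 1 subtrees, or [] for a leaf

def search(node, key):
    """Return (found, nodes_visited); each visited node models one block access."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, key)  # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited
        # Descend into the subtree between keys[i-1] and keys[i], if any.
        node = node.children[i] if node.children else None
    return False, visited
```

The number of nodes visited is bounded by the height of the tree, which is why keeping the tree balanced matters.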
Dynamic Multilevel Indexes
Using B-Trees and B+-Trees
• Most multi-level indexes use B-tree or B+-tree data structures
because of the insertion and deletion problem
– These structures leave space in each tree node (disk block) to allow for
new index entries
• These data structures are variations of search trees that allow
efficient insertion and deletion of new search values.
• In B-Tree and B+-Tree data structures, each node corresponds
to a disk block
• Each node is kept between half-full and completely full



B-Tree
• The tree is always balanced, and the space wasted by deletion,
if any, never becomes excessive.
• Formal definition of a B-tree of order p:
– Each internal node in the B-tree is of the form:
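• <P1, <K1, Pr1>, P2, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pq>, where q <= p.
Each Pi is a tree pointer and each Pri is a data pointer.
– Within each node, K1 < K2 < ... < Kq-1.
– Each node has at most p tree pointers.
– Each node, except the root and leaf nodes, has at least
ceiling(p/2) tree pointers.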
– A node with q tree pointers, q<=p, has q-1 search key field
values (and hence q-1 data pointers).
– All leaf nodes are at the same level.
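
The per-node rules above can be checked mechanically. The sketch below is a hypothetical helper (`check_btree_node` is not part of any library) that reports which rules a single node violates, given its keys and its number of tree pointers q.

```python
import math

def check_btree_node(keys, num_tree_pointers, p, is_root=False, is_leaf=False):
    """Return the list of rules violated by one node of a B-tree of order p."""
    problems = []
    # K1 < K2 < ... < Kq-1
    if any(a >= b for a, b in zip(keys, keys[1:])):
        problems.append("keys not strictly increasing")
    # At most p tree pointers.
    if num_tree_pointers > p:
        problems.append("more than p tree pointers")
    # At least ceiling(p/2) tree pointers, except for the root and leaves.
    if not is_root and not is_leaf and num_tree_pointers < math.ceil(p / 2):
        problems.append("fewer than ceiling(p/2) tree pointers")
    # q tree pointers go with q-1 search key values.
    if len(keys) != num_tree_pointers - 1:
        problems.append("q tree pointers require q-1 keys")
    return problems
```

The rule that all leaf nodes are at the same level concerns the whole tree and is not covered by this per-node check.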



Example B-tree with p=3



B-Trees

• Insertion:
– An insertion into a node that is not full is quite efficient
– If a node is full the insertion causes a split into two
nodes
– Splitting may propagate to other tree levels
• Deletion:
– A deletion is quite efficient if a node does not become
less than half full
– If a deletion causes a node to become less than half
full, it must be merged with neighboring nodes



Inserting 16*, 8* into Example B tree
[Figure: root index node 13 17 24 30. Inserting 16* into the leaf 14* 15* gives 14* 15* 16*. Inserting 8* into the full leaf 2* 3* 5* 7* overflows it ("You overflow").]
• One new child (leaf node) is generated; we must add one more
pointer to its parent, and thus one more key value as well.
Inserting 8* (cont.)
• Copy up the middle value (leaf split).
• Entry to be inserted in parent node: 5. (Note that 5 is copied up and
continues to appear in the leaf.)
[Figure: the overflowing leaf splits into 2* 3* and 5* 7* 8*. Inserting 5 into the parent 13 17 24 30 gives 5 13 17 24 30 ("You overflow!").]
Insertion into B tree (cont.)
• Understand the difference between copy-up and push-up.
• Observe how minimum occupancy is guaranteed in both leaf and
index page splits.
• We split the index node 5 13 17 24 30, redistribute entries evenly,
and push up the middle key.
• Entry to be inserted in parent node: 17. (Note that 17 is pushed up and only
appears once in the index. Contrast this with a leaf split.)
[Figure: after the split, the two index nodes hold 5 13 and 24 30.]
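
The difference between copy-up and push-up can be sketched in a few lines. The function names are illustrative, and splitting at the middle position `len // 2` is an assumption chosen because it reproduces the splits shown on the slides.

```python
def split_leaf(entries):
    """Leaf split: the middle key is COPIED up and stays in the right leaf."""
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]

def split_index(keys):
    """Index split: the middle key is PUSHED up and leaves both halves."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]
```

With the lecture's values, `split_leaf([2, 3, 5, 7, 8])` yields the leaves `[2, 3]` and `[5, 7, 8]` with 5 copied up, while `split_index([5, 13, 17, 24, 30])` yields `[5, 13]` and `[24, 30]` with 17 pushed up.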
Example B Tree After Inserting 8*
Root
17

5 13 24 30

2* 3* 7* 8* 14* 15* 19* 20* 22* 27* 29* 33* 34* 38* 39*

Notice that root was split, leading to increase in height.



Delete 19* and 20*
Root
17

5 13 24 30

2* 3* 7* 8* 14* 16* 19* 20* 22* 27* 29* 33* 34* 38* 39*

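[Figure: "You underflow": after 19* and 20* are removed, the leaf holds only 22*. After redistribution with its sibling, it holds 22* 24*.]

Have we forgotten something?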


Deleting 19* and 20* (cont.)
Root

17

5 13 27 30

2* 3* 7* 8* 14* 16* 22* 24* 29* 33* 34* 38* 39*

• Notice how 27 is copied up.


• But can we move it up?
• Now we want to delete 24
• Underflow again! But can we redistribute this time?



Deleting 24*
• You underflow! Merge with sibling!
• Observe the two leaf nodes are merged, and 27 is
discarded from their parent, but …
• Observe 'pull down' of index entry (below).

[Figure: index node 30 over the leaves 22* 27* 29* and 33* 34* 38* 39*]

New root 5 13 17 30

2* 3* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39*


B+-Trees
• A B+-tree is a variation of a B-tree.
• In a B-tree, pointers to data records exist at all levels
of the tree.
• In a B+-tree, all pointers to data records exist at the
leaf-level nodes (data pointers are only at the leaf
nodes).
• A B+-tree can have fewer levels (or a higher capacity of
search values) than the corresponding B-tree.
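
The capacity claim can be checked with a little arithmetic. The sketch below computes the largest order p that fits in one block for a B-tree node and for a B+-tree internal node. The sizes used in the usage note (512-byte block, 6-byte tree pointer, 7-byte data pointer, 9-byte key) are assumptions for illustration, not figures from the lecture.

```python
def btree_order(block, tree_ptr, data_ptr, key):
    """Largest p with p*tree_ptr + (p-1)*(data_ptr + key) <= block."""
    return (block + data_ptr + key) // (tree_ptr + data_ptr + key)

def bplus_internal_order(block, tree_ptr, key):
    """Largest p with p*tree_ptr + (p-1)*key <= block (no data pointers)."""
    return (block + key) // (tree_ptr + key)
```

Under these assumed sizes, `btree_order(512, 6, 7, 9)` gives 24 and `bplus_internal_order(512, 6, 9)` gives 34, so the B+-tree internal node has a higher fan-out and the tree needs fewer levels for the same number of search values.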



The Nodes of a B+-tree



B+ Tree

25 50 75

5 10 15 20 | 25 30 | 50 55 60 65 | 75 80 85 90

A B+ tree is a B-tree variant whose leaf nodes form a linked list.
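
One benefit of linking the leaves is that a range query can walk from leaf to leaf without returning to the index. The following sketch assumes a bare `Leaf` class with a `next` pointer and starts the scan at a given leaf; in a full tree that leaf would be located by descending from the root. All names are illustrative.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys    # sorted keys stored in this leaf
        self.next = None    # pointer to the next leaf in key order

def link(leaves):
    """Chain the leaves into a linked list and return the first one."""
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    return leaves[0]

def range_scan(start_leaf, low, high):
    """Collect all keys k with low <= k <= high by following leaf links."""
    out = []
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return out          # keys are sorted, so the scan can stop
            if k >= low:
                out.append(k)
        leaf = leaf.next
    return out
```

Using the leaves from the figure above, a scan for keys between 20 and 55 returns 20, 25, 30, 50, 55 after visiting three leaves.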

Inserting a Data Entry into a B+ Tree: Summary
• Find correct leaf L.
• Put data entry onto L.
– If L has enough space, done!
– Else, must split L (into L and a new node L2)
• Redistribute entries evenly, put middle key in L2
• Copy up middle key.
• Insert index entry pointing to L2 into parent of L.
• This can happen recursively
– To split index node, redistribute entries evenly, but push up
middle key. (Contrast with leaf splits.)
• Splits “grow” tree; root split increases height.
– Tree growth: gets wider or one level taller at top.
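
The summary above can be turned into a compact recursive sketch. It assumes every node holds at most `MAX_KEYS` keys (4 here, an assumption that matches the figures), stores keys only, and omits data pointers and leaf links; `Node`, `insert`, and `insert_root` are illustrative names rather than the lecture's own code.

```python
from bisect import bisect_right, insort

MAX_KEYS = 4  # assumed capacity of every node

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children      # None marks a leaf

def insert(node, key):
    """Insert key below node; return (separator, new_right_node) on a split."""
    if node.children is None:                     # leaf: put the entry here
        insort(node.keys, key)
        if len(node.keys) <= MAX_KEYS:
            return None
        mid = len(node.keys) // 2
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right               # copy up the middle key
    i = bisect_right(node.keys, key)              # choose the subtree
    split = insert(node.children[i], key)
    if split is None:
        return None
    sep, right_child = split                      # child split: add index entry
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right_child)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]                           # push up the middle key
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return up, right

def insert_root(root, key):
    """Insert into the tree and return the (possibly new) root."""
    split = insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])             # root split: height grows
```

Starting from a root 25 50 75 over the leaves 5 10 15 20, 25 30, 50 55 60 65 and 75 80 85 90, inserting 28, 70 and 95 in turn with `insert_root` ends with a new root holding 60 over the index nodes 25 50 and 75 85.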



Insertion Example
• Add Record with Key 28:

• Add Record with Key 70: 50 55 60 65 70

Insertion Example (cont.)

• Add Record with Key 95: 75 80 85 90 95
– Split Leaf Node: 25 50 60 75 85
– Split Parent Node:

• New B+ Tree:

Rotation

• B+ trees can incorporate rotation to reduce the
number of node splits. A rotation occurs when a leaf
node is full, but one of its sibling nodes is not full.
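
A minimal sketch of rotation toward a right sibling, assuming leaves are plain sorted lists and the parent's separator equals the smallest key of the right sibling. The function name, the status strings, and the sample values in the usage note are made up for illustration; the lecture's own before/after figures are not reproduced here.

```python
from bisect import insort

def insert_with_rotation(leaf, right_sibling, key, capacity):
    """Insert key into leaf; rotate into the right sibling instead of splitting.

    Returns (status, new_separator). On "split needed" the leaf temporarily
    holds capacity + 1 keys and the caller must split it.
    """
    insort(leaf, key)
    if len(leaf) <= capacity:
        return "inserted", None
    if len(right_sibling) < capacity:
        # Move the largest key of the overfull leaf to the front of the sibling.
        right_sibling.insert(0, leaf.pop())
        # The parent's separator must now be the sibling's smallest key.
        return "rotated", right_sibling[0]
    return "split needed", None
```

For instance, with a hypothetical full leaf `[50, 55, 60, 65]`, a right sibling `[75, 80]` and a capacity of 4, inserting 70 moves 70 into the sibling and returns the new separator 70, so no split occurs.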
• Example: Insert 70
• Before

• After

Deleting a Data Entry from a B+ Tree: Summary
• Start at root, find leaf L where entry belongs.
• Remove the entry.
– If L is at least half-full, done!
– If L has only d-1 entries (d being the minimum number of entries a node may hold),
• Try to re-distribute, borrowing from sibling (adjacent node with
same parent as L).
• If re-distribution fails, merge L and sibling.
• If merge occurred, must delete entry (pointing to L or sibling) from
parent of L.
• Merge could propagate to root, decreasing height.
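
The leaf-level part of these steps can be sketched as follows, assuming leaves are sorted lists, only the right sibling is considered, and d is the minimum number of entries per node. The function name and the status strings are illustrative; removing the parent's index entry after a merge is left to the caller.

```python
def delete_from_leaf(leaf, right_sibling, key, d):
    """Delete key from leaf; return (status, new_separator_for_parent)."""
    leaf.remove(key)
    if len(leaf) >= d:
        return "ok", None                        # still at least half full
    if len(right_sibling) > d:
        # Redistribute: borrow the sibling's smallest entry.
        leaf.append(right_sibling.pop(0))
        # The sibling's new smallest key is copied up as the separator.
        return "redistributed", right_sibling[0]
    # Sibling has no entry to spare: merge it into this leaf.
    leaf.extend(right_sibling)
    right_sibling.clear()
    return "merged", None                        # caller deletes the parent entry
```

Run with d = 2 on the leaf 19 20 22 and its sibling 24 27 29 from the earlier delete example, deleting 19 needs no repair, deleting 20 borrows 24 and makes 27 the new separator, and deleting 24 then merges the two leaves into 22 27 29.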



B * Tree
• Same as a B-tree, but non-root nodes must be at least 2/3
full instead of 1/2. To maintain this, instead of
immediately splitting a node when it gets full, its
keys are shared with the node next to it, i.e. rotate
before you split.

Leaf Below    Index Below    Delete Actions
Fill Factor   Fill Factor
NO            NO             Delete the record from the leaf. If the key
                             appears in the index, use the next key to
                             replace it.
YES           NO             Combine the leaf and its sibling. Change
                             the index to reflect the change.
YES           YES            1. Combine the leaf and its sibling.
                             2. Adjust the index to reflect the change.
                             3. Combine the index with its sibling.
                             Continue combining index nodes until you
                             reach a node with the correct fill factor or
                             you reach the root.

Deletion Example
• Original B+ tree (After inserting Key 95):

Deletion Example (cont.)
• Delete Record with Key 70:

Deletion Example (cont.)
• Delete Record with Key 25:

Deletion Example (cont.)
• Delete Record with Key 60:

• The leaf containing 60 (60 65) will be below the fill factor
after the deletion. Thus, we must combine leaf nodes.
• With recombined leaves, the index will be reduced by one key.
Hence, it will also fall below the fill factor. Thus, we must
combine index nodes.
• Sixty appears as the only key in the root index node.
Obviously, it will be removed with the deletion.
Summary
• Multilevel Indexes
• Dynamic Multilevel Indexes
– Using B-Trees, and
– Using B+-Trees
• Indexes on Multiple Keys

