
B - TREES: (Loosely Based On The Cow Book: Ch. 10)

This document discusses B+ trees, which are commonly used to index database tables. It describes how B+ trees are structured, with index entries in non-leaf nodes pointing to child nodes and data entries in leaf nodes containing record IDs. It explains how to perform operations like searching, inserting, and deleting records from the B+ tree through examples. Key aspects covered include splitting and merging nodes, redistributing entries between siblings to improve space utilization, and handling duplicate keys.


B-TREES
(LOOSELY BASED ON THE COW BOOK: CH. 10)

Fall 2013
9/17/13 CS 564: Database Management Systems, Jignesh M. Patel 1
Motivation
Consider the following table:
CREATE TABLE Tweets (
uniqueMsgID INTEGER, -- unique message id
tstamp TIMESTAMP, -- when was the tweet posted
uid INTEGER, -- unique id of the user
msg VARCHAR (140), -- the actual message
zip INTEGER -- zipcode when posted
);
Consider the following query, Q1:
SELECT * FROM Tweets
WHERE uid = 145;
And, the following query, Q2:
SELECT * FROM Tweets
WHERE zip BETWEEN 53000 AND 54999;
Ways to evaluate the queries efficiently?
1. Store the table as a heapfile, scan the file. I/O cost?
2. Store the table as a sorted file, binary search the file. I/O cost?
3. Store the table as a heapfile, build an index, and search using the index.
4. Store the table in an index file. The entire tuple is stored in the index!
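As a rough way to compare the first three options, the sketch below counts page reads for a single equality lookup. It assumes a file of N pages, that the sorted file is sorted on the search attribute, and that an index lookup costs about log_F N page reads for fanout F. The function names, the page count, and the fanout in the usage lines are illustration values only.

```python
import math

def heap_scan_cost(num_pages: int) -> int:
    # A heap file has no order, so every page must be read.
    return num_pages

def sorted_file_cost(num_pages: int) -> int:
    # Binary search over the pages of a file sorted on the search attribute.
    return math.ceil(math.log2(num_pages))

def tree_index_cost(num_leaf_pages: int, fanout: int) -> int:
    # One page read per level of a tree with the given fanout.
    return math.ceil(math.log(num_leaf_pages, fanout))

pages = 1_000_000  # assumed file size in pages
print(heap_scan_cost(pages), sorted_file_cost(pages), tree_index_cost(pages, 133))
```

Under these assumptions the scan reads every page, while the sorted file and the tree index need only a handful of reads.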
Index
Two main types of indices
  Hash index: good for equality search (e.g. Q1)
  B-tree index: good for both range search (e.g. Q2) and equality search (e.g. Q1)
Generally a hash index is faster than a B-tree index for equality search
Hash indices aim to get O(1) I/O and CPU performance for search and insert
B-Trees have O(log_F N) I/O and CPU cost for search, insert and delete.

What is in the index
Two things: index key and some value
  Insert(indexkey, value)
  Search(indexkey) -> value(s)
What is the index key for Q1 and Q2?
Consider Q3:
SELECT * FROM Tweets
WHERE uid = 145 AND
zip BETWEEN 53000 AND 54999;
Value:
  Record id
  List of record ids
  The entire tuple!
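The insert/search interface on this slide can be sketched as a toy in-memory index. This is only an illustration: `ToyIndex` is a hypothetical name, it is backed by a Python dictionary rather than disk pages, and record ids are assumed to be (page, slot) pairs.

```python
from collections import defaultdict

class ToyIndex:
    """Maps an index key to the list of values (here: record ids) stored under it."""

    def __init__(self):
        self._entries = defaultdict(list)

    def insert(self, index_key, value):
        # Several values may share one key, e.g. many tweets by one uid.
        self._entries[index_key].append(value)

    def search(self, index_key):
        # Returns a copy of the stored values, or an empty list for an unknown key.
        return list(self._entries.get(index_key, []))

uid_index = ToyIndex()
uid_index.insert(145, (3, 0))  # hypothetical record id: page 3, slot 0
uid_index.insert(145, (7, 2))
print(uid_index.search(145))
```

Storing the whole tuple instead of a record id only changes what is passed as `value`.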

(Ubiquitous) B+ Tree
Height-balanced (dynamic) tree structure
Insert/delete at log_F N cost (F = fanout, N = # leaf pages)
Minimum 50% occupancy (except for root).
  Each node contains d <= m <= 2d entries.
  The parameter d is called the order of the tree.
Supports equality and range-searches efficiently.
[Figure: index entries (direct search) in the non-leaf levels, above the data entries in the leaf level]
Data Entries
  Entries in the leaf pages: (search key value, recordid)
Index Entries
  Entries in the index (i.e. non-leaf) pages: (search key value, pageid)

Example B+ Tree
Search: Starting from root, examine index entries in non-leaf nodes, and traverse down the tree until a leaf node is reached
  Non-leaf nodes can be searched using a binary or a linear search.
Search for 5*, 15*, all data entries >= 24*
Root: 13 17 24 30
Leaves: 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*
Height = 1
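The search procedure can be sketched on this example tree as follows. It is a minimal in-memory illustration: `Node`, `find_leaf`, `search`, and `range_from` are hypothetical names, keys stand in for data entries, and each non-leaf node is searched with a binary search via the standard `bisect` module.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf
        self.next = None          # right neighbor, used by leaves only

def find_leaf(root, key):
    # Descend from the root; keys equal to a separator go to the right child.
    node = root
    while node.children is not None:
        node = node.children[bisect_right(node.keys, key)]
    return node

def search(root, key):
    return key in find_leaf(root, key).keys

def range_from(root, low):
    # Locate the first qualifying leaf, then follow the leaf-level chain.
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        result.extend(k for k in leaf.keys if k >= low)
        leaf = leaf.next
    return result

leaves = [Node(k) for k in ([2, 3, 5, 7], [14, 16], [19, 20, 22],
                            [24, 27, 29], [33, 34, 38, 39])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Node([13, 17, 24, 30], leaves)

print(search(root, 5), search(root, 15), range_from(root, 24))
```

The range scan touches the index only once, to find the starting leaf; the rest is a sequential walk along the leaves.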

B+-tree Page Format
Leaf page (data entries):
  Prev Page Pointer (P0) | R1 K1 | R2 K2 | ... | Rn Kn | Next Page Pointer (Pn+1)
  Ri points to record i.
Non-leaf page (index entries):
  P1 K1 P2 K2 P3 ... Pm Km Pm+1
  P1: pointer to a page with values < K1
  P2: pointer to a page with values s.t. K1 <= values < K2
  P3: pointer to a page with values s.t. K2 <= values < K3
  Pm+1: pointer to a page with values >= Km

B+ Trees in Practice
Typical order: 100. Typical fill-factor: 67%.
  average fanout = 133
Typical capacities:
  Height 4: 133^4 = 312,900,721 records
  Height 3: 133^3 = 2,352,637 records
Can often hold top levels in buffer pool:
  Level 1 = 1 page = 8 Kbytes
  Level 2 = 133 pages = roughly 1 Mbyte
  Level 3 = 17,689 pages = roughly 133 MBytes
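The capacity figures follow directly from the average fanout. The short sketch below recomputes them, taking the fanout of 133 and the 8 Kbyte page size from the slide; the function names are hypothetical.

```python
FANOUT = 133  # average fanout from the slide
PAGE_KB = 8   # page size from the slide

def records_at_height(height: int) -> int:
    # Each level multiplies the number of reachable entries by the fanout.
    return FANOUT ** height

def pages_at_level(level: int) -> int:
    # Level 1 is the root, level 2 its children, and so on.
    return FANOUT ** (level - 1)

for h in (3, 4):
    print(f"height {h}: {records_at_height(h):,} records")
for lvl in (1, 2, 3):
    print(f"level {lvl}: {pages_at_level(lvl):,} pages, {pages_at_level(lvl) * PAGE_KB:,} Kbytes")
```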

B+-Tree: Inserting a Data Entry
Find correct leaf L.
Put data entry onto L.
  If L has enough space, done!
  Else, must split L (into L and a new node L2)
    Redistribute entries evenly, copy up middle key.
    Insert index entry pointing to L2 into parent of L.
This can happen recursively
  To split non-leaf node, redistribute entries evenly, but pushing up the middle key. (Contrast with leaf splits.)
Splits "grow" tree; root split increases height.
  Tree growth: gets wider or one level taller at top.
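The insertion steps can be sketched in a few lines. This is a simplified in-memory illustration, not a page-based implementation: `Node`, `insert`, and `insert_root` are hypothetical names, the order is assumed to be d = 2, keys stand in for data entries, and leaf sibling pointers are omitted.

```python
from bisect import bisect_right, insort

D = 2  # assumed order: a node holds at most 2*D keys

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf

def insert(node, key):
    """Insert key below node; return (separator, new_right_node) if node split, else None."""
    if node.children is None:
        insort(node.keys, key)
        if len(node.keys) <= 2 * D:
            return None
        # Leaf split: the middle key is COPIED up and stays in the new leaf.
        right = Node(node.keys[D:])
        node.keys = node.keys[:D]
        return right.keys[0], right
    i = bisect_right(node.keys, key)
    split = insert(node.children[i], key)
    if split is None:
        return None
    separator, right_child = split
    node.keys.insert(i, separator)
    node.children.insert(i + 1, right_child)
    if len(node.keys) <= 2 * D:
        return None
    # Non-leaf split: the middle key is PUSHED up and stays in neither half.
    middle = node.keys[D]
    right = Node(node.keys[D + 1:], node.children[D + 1:])
    node.keys, node.children = node.keys[:D], node.children[:D + 1]
    return middle, right

def insert_root(root, key):
    # A split of the root creates a new root, so the tree grows at the top.
    split = insert(root, key)
    if split is None:
        return root
    separator, right = split
    return Node([separator], [root, right])

leaves = [Node(k) for k in ([2, 3, 5, 7], [14, 16], [19, 20, 22],
                            [24, 27, 29], [33, 34, 38, 39])]
new_root = insert_root(Node([13, 17, 24, 30], leaves), 8)
print(new_root.keys, [child.keys for child in new_root.children])
```

Run on a root holding 13, 17, 24, 30 over five leaves, inserting 8 splits first the leftmost leaf and then the root, which adds one level.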

Inserting 8* into B+ Tree
Tree before the insert:
  Root: 13 17 24 30
  Leaves: 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*
Leaf after the split: 2* 3* | 5* 7* 8*
Entry to be inserted in parent node: 5
  Copied up (and continues to appear in the leaf)

Inserting 8* into B+ Tree
Insert in parent node.
  17 is pushed up (and only appears once in the index)
  Index nodes after the split: 5 13 | 24 30

Inserting 8* into B+ Tree
Tree after the insert:
  Root: 17
  Index nodes: 5 13 | 24 30
  Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*
Root was split: height increases by 1
Could avoid split by re-distributing entries with a sibling
  Sibling: immediately to left or right, and same parent

Inserting 8* into B+ Tree
Re-distributing entries with a sibling
  Improves page occupancy
  Usually not used for non-leaf node splits. Why?
    Increases I/O, especially if we check both siblings
    Better if split propagates up the tree (rare)
    Use only for leaf level entries as we have to set pointers
Tree after re-distribution:
  Root: 8 17 24 30 (8 replaces 13)
  Leaves: 2* 3* 5* 7* | 8* 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*
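Re-distribution on insert can be sketched at the leaf level as below. This is a simplified illustration under stated assumptions: leaves are plain sorted Python lists, only the right sibling is considered, the capacity of 4 entries matches the example, and `insert_with_redistribution` is a hypothetical name.

```python
from bisect import insort

def insert_with_redistribution(leaf, right_sibling, key, capacity=4):
    """Insert key into leaf; on overflow, shift the largest entry to the right sibling.

    Returns the new separator key for the parent, or None if the leaf did not overflow.
    """
    insort(leaf, key)
    if len(leaf) <= capacity:
        return None
    if len(right_sibling) >= capacity:
        raise ValueError("sibling is full: fall back to a split")
    # Move the largest entry over; the sibling's first key becomes the separator.
    right_sibling.insert(0, leaf.pop())
    return right_sibling[0]

leaf, sibling = [2, 3, 5, 7], [14, 16]
new_separator = insert_with_redistribution(leaf, sibling, 8)
print(leaf, sibling, new_separator)
```

The parent needs only a separator update, so no new node is created and the height is unchanged.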

B+-Tree: Deleting a Data Entry
Start at root, find leaf L where entry belongs.
Remove the entry.
  If L is at least half-full, done!
  If L has only d-1 entries,
    Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
    If re-distribution fails, merge L and sibling.
If merge occurred, must delete entry (pointing to L or sibling) from parent of L.
Merge could propagate to root, decreasing height.
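The leaf-level part of deletion can be sketched as follows. It is a minimal illustration, assuming order d = 2, leaves stored as sorted Python lists, and a right sibling under the same parent; `delete_from_leaf` is a hypothetical name, and updating the parent is left to the caller.

```python
D = 2  # assumed order: a leaf must keep at least D entries

def delete_from_leaf(leaf, right_sibling, key):
    """Delete key from leaf; return (action, new_separator_or_None)."""
    leaf.remove(key)
    if len(leaf) >= D:
        return "done", None
    if len(right_sibling) > D:
        # Re-distribute: borrow the sibling's smallest entry, then copy up
        # the sibling's new first key as the separator.
        leaf.append(right_sibling.pop(0))
        return "redistributed", right_sibling[0]
    # Merge: the sibling empties into the leaf; the caller must now delete
    # the parent entry that pointed to the sibling.
    leaf.extend(right_sibling)
    right_sibling.clear()
    return "merged", None

leaf, sibling = [19, 20, 22], [24, 27, 29]
first = delete_from_leaf(leaf, sibling, 22)
second = delete_from_leaf(leaf, sibling, 20)
third = delete_from_leaf(leaf, sibling, 24)
print(first, second, third, leaf)
```

The three calls exercise the three outcomes in turn: a plain delete, a re-distribution, and a merge.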

Deleting 22* and 20*
Deleting 22* is easy.
Deleting 20* is done with re-distribution.
Notice how middle key is copied up.
Tree after both deletions:
  Root: 17
  Index nodes: 5 13 | 27 30 (27 replaces 24)
  Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 19* 24* | 27* 29* | 33* 34* 38* 39*

... And Then Deleting 24*
Must merge.
In the non-leaf node, toss the index entry with key value = 27
  Right index node after the leaf merge: 30
  Its leaves: 19* 27* 29* | 33* 34* 38* 39*
  Can this merge?
Pull down of index entry:
  Root: 5 13 17 30
  Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 19* 27* 29* | 33* 34* 38* 39*

Non-leaf Re-distribution
Tree during deletion of 24*.
Can re-distribute entry from left child of root to right child.
  Root: 22
  Index nodes: 5 13 17 20 | 30
  Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 17* 18* | 20* 21* | 22* 27* 29* | 33* 34* 38* 39*

After Re-distribution
Rotate through the parent node
It suffices to re-distribute index entry with key 20; for illustration, 17 is also re-distributed
  Root: 17
  Index nodes: 5 13 | 20 22 30
  Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 17* 18* | 20* 21* | 22* 27* 29* | 33* 34* 38* 39*
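Rotating an entry through the parent can be sketched as below. This is an illustration only: index nodes are represented as separate key and child lists, leaves as plain lists of keys, and `rotate_right` is a hypothetical name.

```python
def rotate_right(left_keys, left_children, parent_key, right_keys, right_children):
    """Move the last entry of the left index node through the parent into the right node.

    Returns the key that replaces parent_key in the parent.
    """
    # The old separator comes down into the right node...
    right_keys.insert(0, parent_key)
    # ...the left node's last child pointer moves across...
    right_children.insert(0, left_children.pop())
    # ...and the left node's last key goes up as the new separator.
    return left_keys.pop()

left_keys = [5, 13, 17, 20]
left_children = [[2, 3], [5, 7, 8], [14, 16], [17, 18], [20, 21]]
right_keys = [30]
right_children = [[22, 27, 29], [33, 34, 38, 39]]

parent = 22
parent = rotate_right(left_keys, left_children, parent, right_keys, right_children)
parent = rotate_right(left_keys, left_children, parent, right_keys, right_children)
print(parent, left_keys, right_keys)
```

One rotation already restores minimum occupancy in the right node; the second one evens out the two index nodes.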

B+-Tree Deletion
Try redistribution with all siblings first, then merge. Why?
  Good chance that redistribution is possible (large fanout!)
  Only need to propagate changes to parent node
Files typically grow, not shrink!

Duplicates
Duplicate keys: many data entries with the same key value
Solution 1:
  All entries with a given key value reside on a single page
  Use overflow pages!
Solution 2:
  Allow duplicate key values in data entries
  Modify search
  Use RID to get a unique (composite) key!
Use list of rids instead of a single rid in the leaf level
  Single data entry could still span multiple pages
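The composite-key idea from Solution 2 can be sketched as follows. It is a simplified illustration: one sorted Python list stands in for the leaf level, record ids are assumed to be (page, slot) pairs, and the function names are hypothetical.

```python
from bisect import bisect_left, insort

entries = []  # sorted list of (key, rid) pairs standing in for the leaf level

def insert_entry(key, rid):
    # (key, rid) is unique even when many records share the same key.
    insort(entries, (key, rid))

def search_key(key):
    # (key,) sorts before every (key, rid), so this finds the first duplicate.
    i = bisect_left(entries, (key,))
    rids = []
    while i < len(entries) and entries[i][0] == key:
        rids.append(entries[i][1])
        i += 1
    return rids

insert_entry(145, (7, 2))
insert_entry(145, (3, 0))
insert_entry(200, (1, 1))
print(search_key(145))
```

The modified search lands on the first entry with the requested key and scans right until the key changes.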

A Note on Order
Order (d) concept replaced by physical space criterion in practice (at least half-full).
  Index (i.e. non-leaf) pages can typically hold many more entries than leaf pages.
    Leaf pages could have actual data records
  Variable sized records and search keys mean different nodes will contain different numbers of entries.
  Even with fixed length fields, multiple records with the same search key value (duplicates) can lead to variable-sized data entries (e.g. list of rids).

ISAM - Indexed Sequential Access Method
A static B+-tree
  When the index is created, build a B+-tree on the relation
  Updates and deletes don't change the non-leaf pages.
  Use overflow pages. Leaf pages could be empty!
Search Cost: log_F N + # overflow pages
[Figure: non-leaf pages above the leaf pages (primary pages, stored sequentially), with overflow pages chained off the primary pages]
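The effect of overflow pages on search cost can be sketched with a single primary leaf page and its overflow chain. This is an illustration under stated assumptions: `IsamLeaf` is a hypothetical name, a page holds 2 keys, and the static non-leaf levels are left out, so the count covers only the leaf and its overflow pages.

```python
class IsamLeaf:
    """A primary leaf page with a chain of overflow pages."""

    def __init__(self, keys, capacity=2):
        self.keys = list(keys)
        self.capacity = capacity
        self.overflow = None

    def insert(self, key):
        # The index above is static, so a full page never splits;
        # the key goes to the first page in the chain with free space.
        page = self
        while len(page.keys) >= page.capacity:
            if page.overflow is None:
                page.overflow = IsamLeaf([], page.capacity)
            page = page.overflow
        page.keys.append(key)

    def search(self, key):
        # Returns (found, pages_read); every overflow page visited adds one read.
        pages_read, page = 0, self
        while page is not None:
            pages_read += 1
            if key in page.keys:
                return True, pages_read
            page = page.overflow
        return False, pages_read

leaf = IsamLeaf([10, 20])
for k in (15, 17, 12):
    leaf.insert(k)
print(leaf.search(10), leaf.search(12), leaf.search(99))
```

Long chains are what make the "# overflow pages" term dominate after many inserts into the same key range.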
