B - TREES: (Loosely Based On The Cow Book: Ch. 10)
This document discusses B+ trees, which are commonly used to index database tables. It describes how B+ trees are structured, with index entries in non-leaf nodes pointing to child nodes and data entries in leaf nodes containing record IDs. It explains how to perform operations like searching, inserting, and deleting records from the B+ tree through examples. Key aspects covered include splitting and merging nodes, redistributing entries between siblings to improve space utilization, and handling duplicate keys.
Fall 2013, 9/17/13. CS 564: Database Management Systems, Jignesh M. Patel

Motivation
Consider the following table:

    CREATE TABLE Tweets (
        uniqueMsgID INTEGER,   -- unique message id
        tstamp TIMESTAMP,      -- when was the tweet posted
        uid INTEGER,           -- unique id of the user
        msg VARCHAR (140),     -- the actual message
        zip INTEGER            -- zipcode when posted
    );

Consider the following query, Q1:

    SELECT * FROM Tweets WHERE uid = 145;

And the following query, Q2:

    SELECT * FROM Tweets WHERE zip BETWEEN 53000 AND 54999;

Ways to evaluate the queries efficiently?
1. Store the table as a heap file, scan the file. I/O cost?
2. Store the table as a sorted file, binary search the file. I/O cost?
3. Store the table as a heap file, build an index, and search using the index.
4. Store the table in an index file. The entire tuple is stored in the index!

Index
Two main types of indices:
- Hash index: good for equality search (e.g. Q1).
- B-tree index: good for both range search (e.g. Q2) and equality search (e.g. Q1).
Generally a hash index is faster than a B-tree index for equality search.
- Hash indices aim to get O(1) I/O and CPU performance for search and insert.
- B-trees have O(log_F N) I/O and CPU cost for search, insert and delete.

What Is in the Index
Two things: an index key and some value.
- Insert(indexkey, value)
- Search(indexkey) -> value(s)
What is the index key for Q1 and Q2? Consider Q3:

    SELECT * FROM Tweets WHERE uid = 145 AND zip BETWEEN 53000 AND 54999

The value can be:
- a record id,
- a list of record ids,
- the entire tuple!

(Ubiquitous) B+ Tree
- Height-balanced (dynamic) tree structure.
- Insert/delete at log_F N cost (F = fanout, N = # leaf pages).
- Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries. The parameter d is called the order of the tree.
- Supports equality and range searches efficiently.
[Figure: index entries (direct search) at the top, data entries at the bottom.]
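To put the log_F N cost quoted above next to the heap-file and sorted-file options from the motivation, here is a rough back-of-the-envelope sketch. The file size (1,000,000 pages) and the fanout (133) are illustrative assumptions, and the function names are hypothetical; the counts ignore caching and the cost of reading the matching records themselves.

```python
def heap_scan_ios(n_pages: int) -> int:
    """A heap file has no order, so the scan reads every page."""
    return n_pages


def binary_search_ios(n_pages: int) -> int:
    """Page reads to locate a key in a sorted file: ceil(log2(n_pages))."""
    ios, reach = 0, 1
    while reach < n_pages:
        reach *= 2
        ios += 1
    return ios


def btree_index_levels(n_leaf_pages: int, fanout: int) -> int:
    """Index levels above the leaves: ceil(log_F(N)), computed with integers."""
    levels, reach = 0, 1
    while reach < n_leaf_pages:
        reach *= fanout
        levels += 1
    return levels


# Assumed numbers, for illustration only.
N_PAGES = 1_000_000
FANOUT = 133

print("heap scan      :", heap_scan_ios(N_PAGES), "page reads")
print("binary search  :", binary_search_ios(N_PAGES), "page reads")
print("B+ tree descent:", btree_index_levels(N_PAGES, FANOUT), "index levels, plus one leaf read")
```

Under these assumptions the tree needs a handful of page reads where the sorted file needs about twenty and the heap scan needs a million, which is the gap the large fanout buys.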
Data entries: entries in the leaf pages, of the form (search key value, recordid).
Index entries: entries in the index (i.e. non-leaf) pages, of the form (search key value, pageid).

Example B+ Tree
- Search: starting from the root, examine index entries in non-leaf nodes and traverse down the tree until a leaf node is reached.
- Non-leaf nodes can be searched using a binary or a linear search.
- Search for 5*, 15*, and all data entries >= 24*.

    Root:   13 | 17 | 24 | 30          (Height = 1)
    Leaves: [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

B+-tree Page Format
Leaf page (data entries: record 1, record 2, ..., record n):

    P0 | R1 K1 | R2 K2 | ... | Rn Kn | Pn+1

    P0   = prev page pointer
    Pn+1 = next page pointer
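The prev/next page pointers in the leaf layout are what make a range search such as "all data entries >= 24*" cheap: descend once, then walk the leaf chain. Below is a minimal sketch over the example tree, assuming a single root level held as a plain list of keys; `LeafPage`, `find_leaf` and `range_scan` are hypothetical names, and record ids are left out.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class LeafPage:
    keys: List[int]
    prev: Optional["LeafPage"] = field(default=None, repr=False, compare=False)
    next: Optional["LeafPage"] = field(default=None, repr=False, compare=False)


def link(leaves: List[LeafPage]) -> List[LeafPage]:
    """Chain the leaves left to right through their prev/next pointers."""
    for left, right in zip(leaves, leaves[1:]):
        left.next, right.prev = right, left
    return leaves


def find_leaf(root_keys: List[int], leaves: List[LeafPage], key: int) -> LeafPage:
    """Child i holds values v with K_i <= v < K_(i+1), so bisect_right picks it."""
    return leaves[bisect_right(root_keys, key)]


def range_scan(root_keys: List[int], leaves: List[LeafPage], low: int) -> Iterator[int]:
    """Yield every key >= low: one descent, then follow next-page pointers."""
    leaf: Optional[LeafPage] = find_leaf(root_keys, leaves, low)
    start = bisect_left(leaf.keys, low)
    while leaf is not None:
        yield from leaf.keys[start:]
        leaf, start = leaf.next, 0


# The example tree from the slides.
ROOT_KEYS = [13, 17, 24, 30]
LEAVES = link([LeafPage(k) for k in
               ([2, 3, 5, 7], [14, 16], [19, 20, 22], [24, 27, 29], [33, 34, 38, 39])])

print(5 in find_leaf(ROOT_KEYS, LEAVES, 5).keys)    # equality search that succeeds
print(15 in find_leaf(ROOT_KEYS, LEAVES, 15).keys)  # equality search that fails
print(list(range_scan(ROOT_KEYS, LEAVES, 24)))      # range search
```

A binary search (`bisect`) is used inside each node here; a linear scan of the node would give the same answer.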
Non-leaf page (index entries):

    P1 | K1 | P2 | K2 | P3 | ... | Pm | Km | Pm+1

    P1   = pointer to a page with values < K1
    P2   = pointer to a page with values s.t. K1 <= values < K2
    P3   = pointer to a page with values s.t. K2 <= values < K3
    Pm+1 = pointer to a page with values >= Km

B+ Trees in Practice
- Typical order: 100. Typical fill-factor: 67%.
  - Average fanout = 133 (67% of the 2d = 200 entries a full node can hold).
- Typical capacities:
  - Height 4: 133^4 = 312,900,721 records
  - Height 3: 133^3 = 2,352,637 records
- Can often hold the top levels in the buffer pool:
  - Level 1 = 1 page = 8 KBytes
  - Level 2 = 133 pages = about 1 MByte
  - Level 3 = 17,689 pages = about 138 MBytes

B+-Tree: Inserting a Data Entry
- Find the correct leaf L.
- Put the data entry onto L.
  - If L has enough space, done!
  - Else, must split L (into L and a new node L2):
    - Redistribute entries evenly, copy up the middle key.
    - Insert an index entry pointing to L2 into the parent of L.
- This can happen recursively.
  - To split a non-leaf node, redistribute entries evenly, but push up the middle key. (Contrast with leaf splits.)
- Splits "grow" the tree; a root split increases height.
  - Tree growth: gets wider or one level taller at top.

Inserting 8* into B+ Tree
Starting tree:

    Root:   13 | 17 | 24 | 30
    Leaves: [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

The leaf [2* 3* 5* 7*] is full, so it splits into [2* 3*] and [5* 7* 8*]. The entry to be inserted in the parent node is 5: it is copied up (and continues to appear in the leaf).

Inserting 8* into B+ Tree
Insert 5 in the parent node. The parent [13 17 24 30] is full as well, so it splits into [5 13] and [24 30]. The middle key 17 is pushed up (and only appears once in the index).

Inserting 8* into B+ Tree

    Root:        17
    Index pages: [5 13] [24 30]
    Leaves:      [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

- Root was split: height increases by 1.
- Could avoid the split by re-distributing entries with a sibling.
  - Sibling: immediately to the left or right, and same parent.
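The contrast between the two kinds of split in the 8* walkthrough (copy up for a leaf, push up for a non-leaf node) can be sketched as follows. This is a keys-only illustration with order d = 2; the function names are hypothetical, and child pointers and record ids are omitted for brevity.

```python
from typing import List, Tuple


def split_leaf(keys: List[int], d: int) -> Tuple[List[int], List[int], int]:
    """Split an overfull leaf (2d + 1 sorted keys) and COPY UP the middle key.

    The separator is the first key of the new right leaf, so it still
    appears at the leaf level after the split.
    """
    assert len(keys) == 2 * d + 1
    left, right = keys[:d], keys[d:]
    return left, right, right[0]


def split_index(keys: List[int], d: int) -> Tuple[List[int], List[int], int]:
    """Split an overfull non-leaf node (2d + 1 sorted keys) and PUSH UP the middle key.

    The middle key moves to the parent and is kept in neither half.
    """
    assert len(keys) == 2 * d + 1
    return keys[:d], keys[d + 1:], keys[d]


D = 2  # order of the example tree

# Step 1: 8* lands in the full leaf [2 3 5 7], which must split.
left_leaf, right_leaf, copied_up = split_leaf(sorted([2, 3, 5, 7] + [8]), D)

# Step 2: the copied-up key goes into the full root [13 17 24 30], which splits too.
left_index, right_index, pushed_up = split_index(sorted([13, 17, 24, 30] + [copied_up]), D)

print("leaves:", left_leaf, right_leaf, "copied up:", copied_up)
print("index :", left_index, right_index, "pushed up:", pushed_up)
```

The pushed-up key becomes the only entry of a new root, which is how the tree grows one level taller at the top.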
Inserting 8* into B+ Tree
- Re-distributing entries with a sibling
  - Improves page occupancy.
  - Usually not used for non-leaf node splits. Why?
    - Increases I/O, especially if we check both siblings.
    - Better if split propagates up the tree (rare).
    - Use only for leaf level entries as we have to set pointers.

With re-distribution, 8* moves into the right sibling and the separator 13 in the root becomes 8:

    Root:   8 | 17 | 24 | 30
    Leaves: [2* 3* 5* 7*] [8* 14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

B+-Tree: Deleting a Data Entry
- Start at the root, find the leaf L where the entry belongs.
- Remove the entry.
  - If L is at least half-full, done!
  - If L has only d-1 entries:
    - Try to re-distribute, borrowing from a sibling (adjacent node with the same parent as L).
    - If re-distribution fails, merge L and the sibling.
- If a merge occurred, must delete the entry (pointing to L or the sibling) from the parent of L.
- A merge could propagate to the root, decreasing height.

Deleting 22* and 20*
- Deleting 22* is easy.
- Deleting 20* is done with re-distribution. Notice how the middle key is copied up.

    Root:        17
    Index pages: [5 13] [27 30]
    Leaves:      [2* 3*] [5* 7* 8*] [14* 16*] [19* 24*] [27* 29*] [33* 34* 38* 39*]

... And Then Deleting 24*
- Must merge.
- In the non-leaf node, toss the index entry with key value = 27.

    Index page: [30]
    Leaves:     [19* 27* 29*] [33* 34* 38* 39*]

- Can this merge? The index page [30] is now below minimum occupancy, so it merges with its sibling [5 13], pulling down the index entry 17 from the root:

    Root:   5 | 13 | 17 | 30
    Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [19* 27* 29*] [33* 34* 38* 39*]

Non-leaf Re-distribution
- Tree during deletion of 24*.
- Can re-distribute an entry from the left child of the root to the right child.

    Root:        22
    Index pages: [5 13 17 20] [30]
    Leaves:      [2* 3*] [5* 7* 8*] [14* 16*] [17* 18*] [20* 21*] [22* 27* 29*] [33* 34* 38* 39*]
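Moving an entry between two non-leaf siblings has to go through their parent: the parent's separator comes down into the receiving node and the donor's last key goes up to replace it. A minimal sketch on the tree above, assuming index nodes are represented as plain key lists with leaf labels A to G standing in for child pointers (the labels and the function name are hypothetical):

```python
from typing import List, Tuple

Node = Tuple[List[int], List[str]]  # (keys, child labels); an m-key node has m + 1 children


def rotate_right(left: Node, parent_key: int, right: Node) -> Tuple[Node, int, Node]:
    """Move one entry from the left index node to the right one via the parent."""
    left_keys, left_children = left
    right_keys, right_children = right
    new_parent_key = left_keys[-1]            # donor's last key goes up
    moved_child = left_children[-1]           # its right-hand child changes owner
    new_left = (left_keys[:-1], left_children[:-1])
    new_right = ([parent_key] + right_keys,   # old separator comes down
                 [moved_child] + right_children)
    return new_left, new_parent_key, new_right


# Leaves: A=[2 3] B=[5 7 8] C=[14 16] D=[17 18] E=[20 21] F=[22 27 29] G=[33 34 38 39]
left_node: Node = ([5, 13, 17, 20], ["A", "B", "C", "D", "E"])
right_node: Node = ([30], ["F", "G"])
root_key = 22

# One rotation already restores minimum occupancy (d = 2) on the right.
once = rotate_right(left_node, root_key, right_node)
print(once)

# A second rotation evens the two nodes out further.
twice = rotate_right(once[0], once[1], once[2])
print(twice)
```

Each rotation keeps the search-key ordering intact because the child that changes owner holds exactly the values between the key that went up and the key that came down.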
After Re-distribution
- Rotate through the parent node.
- It suffices to re-distribute the index entry with key 20; for illustration, 17 is also re-distributed.

    Root:        17
    Index pages: [5 13] [20 22 30]
    Leaves:      [2* 3*] [5* 7* 8*] [14* 16*] [17* 18*] [20* 21*] [22* 27* 29*] [33* 34* 38* 39*]

B+-Tree Deletion
- Try redistribution with all siblings first, then merge. Why?
  - Good chance that redistribution is possible (large fanout!).
  - Only need to propagate changes to the parent node.
- Files typically grow, not shrink!

Duplicates
- Duplicate keys: many data entries with the same key value.
- Solution 1:
  - All entries with a given key value reside on a single page.
  - Use overflow pages!
- Solution 2:
  - Allow duplicate key values in data entries.
  - Modify search.
- Use the RID to get a unique (composite) key!
- Use a list of rids instead of a single rid in the leaf level.
  - A single data entry could still span multiple pages.

A Note on Order
- The order (d) concept is replaced by a physical space criterion in practice (at least half-full).
  - Index (i.e. non-leaf) pages can typically hold many more entries than leaf pages.
    - Leaf pages could have actual data records.
  - Variable-sized records and search keys mean different nodes will contain different numbers of entries.
  - Even with fixed-length fields, multiple records with the same search key value (duplicates) can lead to variable-sized data entries (e.g. a list of rids).

ISAM - Indexed Sequential Access Method
- A static B+-tree.
  - When the index is created, build a B+-tree on the relation.
  - Updates and deletes don't change the non-leaf pages.
  - Use overflow pages. Leaf pages could be empty!
- Search cost: log_F N + # overflow pages.
[Figure: non-leaf pages on top; leaf pages below (primary pages, stored sequentially), with overflow pages chained off the primary pages.]
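The "+ # overflow pages" term in the ISAM search cost can be seen in a toy model. The sketch below assumes a single level of separators standing in for the non-leaf pages and fully packed primary pages at build time; the class and method names are hypothetical, and only page reads at the leaf level are counted.

```python
from bisect import bisect_right, insort
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PrimaryPage:
    capacity: int
    keys: List[int] = field(default_factory=list)
    overflow: List[List[int]] = field(default_factory=list)  # chain of overflow pages


class Isam:
    def __init__(self, sorted_keys: List[int], capacity: int) -> None:
        # Primary pages and separators are built once and never restructured.
        self.pages = [PrimaryPage(capacity, list(sorted_keys[i:i + capacity]))
                      for i in range(0, len(sorted_keys), capacity)]
        self.separators = [page.keys[0] for page in self.pages[1:]]

    def _page_for(self, key: int) -> PrimaryPage:
        return self.pages[bisect_right(self.separators, key)]

    def insert(self, key: int) -> None:
        page = self._page_for(key)
        if len(page.keys) < page.capacity:
            insort(page.keys, key)           # room left on the primary page
            return
        if not page.overflow or len(page.overflow[-1]) >= page.capacity:
            page.overflow.append([])         # chain a new overflow page
        page.overflow[-1].append(key)        # the static index is left untouched

    def search(self, key: int) -> Tuple[bool, int]:
        """Return (found, leaf-level page reads): primary page plus overflow chain."""
        page = self._page_for(key)
        reads, found = 1, key in page.keys
        for overflow_page in page.overflow:
            if found:
                break
            reads += 1
            found = key in overflow_page
        return found, reads


isam = Isam([10, 20, 30, 40, 50, 60], capacity=2)
for k in (23, 25, 27):
    isam.insert(k)

print(isam.separators)   # unchanged by the inserts
print(isam.search(40))   # found on its primary page
print(isam.search(27))   # found only after walking the overflow chain
```

Because the separators never change, a key range that receives many inserts keeps growing its overflow chain, which is the behavior a B+ tree avoids by splitting.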