Database Modeling - Notes-V
Database Modeling - Notes-V
1
Random Access Methods
Hashing
Basic mechanism – transformation of a primary key directly to a physical address,
called a bucket (or indirectly via a logical address)
2
Extendible Hashing
* number of buckets grow or contracts
* bucket splits when it becomes full
* collisions are resolved immediately, no long overflow chains
* primary key transformed to an entry in the Bucket Address Table
(BAT), typically in RAM
* BAT has pointers to disk buckets that hold the actual data
* Retrieve a single record = 1 rba (access the bucket in one step)
* Cost (service time) of I/O for updates, inserts, and deletes is the same as for B+-trees
3
B-trees and B+-trees
B-tree index basic characteristics
* each node contains p pointers and p-1 records
* each pointer at level i is for a data and pointer block at level i+1
* i=1 denotes the root level (single node or block)
* can be inefficient for searching because of the overhead in each search level
4
B+-tree index basic characteristics
* eliminates data pointers from all nodes except the leaf nodes
* each non-leaf index node has p pointers and p-1 key values
* each pointer at level i is for an index block (of key/pointer pairs) at level i+1
* each leaf index has a key value/pointer pair to point to the actual data
block (and record) containing that primary key value
* leaf index nodes can be logically connected via pointers for ordered sequence search
* hybrid method for efficient random access and sequential search
Example: B + -tree
To determine the order of a B+-tree, let us assume that the database has 500,000
records of 200 bytes each, the search key is 15 bytes, the tree and data pointers are
5 bytes, and the index node (and data block size) is 1024 bytes. For this
configuration we have non-leaf index node size = 1024 bytes = p*5 + (p-1)*15
bytes
p = floor((1024+15)/20) = floor(51.95) = 51
number of search key values in the leaf nodes = floor ((1024-5)/(15+5))=50
h = height of the B+-tree (number of index levels, including the leaf index nodes
n = number of records in the database (or file); all must be pointed at from the next to last level, h-
1
ph-1(p-1) > n
(h-1)log p + log(p-1) > log n
(h-1)log p > log n-log(p-1)
h > 1 + (log n-log(p-1)) / log p
h > 1 + (log 500,000-log 50)/log 51 = 3.34, h=4 (nearest higher integer)
A good approximation can be made by assuming that the leaf index nodes are
implemented with p pointers and p key values:
ph > n
h log p > log n
h > log n/log p
In this case, the result above becomes h > 3.35 or h = 4.
5
B+-tree performance
read a single record (B+-tree) = h+1 rba
As an example, consider the insertion of a node (with key value 77) to the B+-
tree shown in Fig. 6.6. This insertion requires a search (query) phase and an
insertion phase with one split node. The total insertion cost for height 3 is
insertion cost = (3 + 1) rba search cost + (2 rba) rewrite cost
+ 1 split *(2 rba rewrite cost)
= 8 rba
6
7
Secondary Indexes
Basic characteristics of secondary indexes
* based on Boolean search criteria (AND, OR, NOT) of attributes that are
not the primary key
* one accession list per attribute value; pointers have block address and
record offset typically
bfac = block_size/pointer_size
* assume all accesses to the accession list are random due to dynamic re-allocation
of disk blocks
8
9