Database Modeling - Notes-V

Uploaded by

ishtiaq.hussain

Batch processing of k sequentially stored records

read the transaction file:


lra = k where k = number of transaction records
sba = ceil(k/tfbf) where tfbf is the transaction file blocking factor
read the master file:
lra = n
sba = ceil(n/bf) where bf is the master file blocking factor
write a new master file:
lra = n + adds - deletes
sba = ceil((n+adds-deletes)/bf)
where adds is the number of records added or inserted,
and deletes is the number of records deleted.
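The three batch phases above can be tabulated with a short helper. This is a minimal sketch; the function name `batch_update_cost` and the sample figures are illustrative assumptions, not part of the notes.

```python
import math

def batch_update_cost(k, n, tfbf, bf, adds=0, deletes=0):
    """Return (lra, sba) for each phase of a sequential batch update.

    k: transaction records, n: master records,
    tfbf / bf: blocking factors of the transaction and master files.
    """
    new_n = n + adds - deletes  # size of the new master file
    return {
        "read_transactions": (k, math.ceil(k / tfbf)),
        "read_master": (n, math.ceil(n / bf)),
        "write_new_master": (new_n, math.ceil(new_n / bf)),
    }

# Hypothetical workload: 1,000 transactions against a 100,000-record master.
costs = batch_update_cost(k=1000, n=100_000, tfbf=50, bf=10, adds=200, deletes=100)
for phase, (lra, sba) in costs.items():
    print(phase, "lra =", lra, "sba =", sba)
```

Because every block is read or written sequentially, the sba counts shrink in proportion to the blocking factors.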

Random Access Methods
Hashing
Basic mechanism – transformation of a primary key directly to a physical address,
called a bucket (or indirectly via a logical address)

Collisions – handled by variations of chained overflow techniques

random access to a hashed file
lra = 1 + overflow(avg)
rba = 1 + overflow(avg)
insertion into a hashed file
lra = 1 + overflow(avg) + rewrite
rba = 1 + overflow(avg) + 1 (the extra 1 rba is for the rewrite)
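The hashed-file costs can be written as two small functions. This is a sketch that assumes the rewrite costs exactly one additional block access, as the note on the rewrite suggests; the function names are illustrative.

```python
def hashed_read_cost(overflow_avg):
    """Expected rba to fetch one record: the home bucket plus the
    average number of overflow blocks followed."""
    return 1 + overflow_avg

def hashed_insert_cost(overflow_avg):
    """Expected rba to insert one record: locate the target block,
    then rewrite it (assumed to cost 1 rba)."""
    return hashed_read_cost(overflow_avg) + 1

# Hypothetical file where half of all lookups follow one overflow pointer.
print(hashed_read_cost(0.5), hashed_insert_cost(0.5))
```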

Extendible Hashing
* number of buckets grows or contracts
* bucket splits when it becomes full
* collisions are resolved immediately, no long overflow chains
* primary key transformed to an entry in the Bucket Address Table
(BAT), typically in RAM
* BAT has pointers to disk buckets that hold the actual data
* Retrieve a single record = 1 rba (access the bucket in one step)
* Cost (service time) of I/O for updates, inserts, and deletes is the same as for B+-trees
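The mechanism in the list above can be illustrated with a toy in-memory version. This is only a sketch under stated assumptions: buckets are Python dicts rather than disk blocks, the BAT is indexed by the low-order bits of Python's built-in `hash`, and bucket contraction is omitted. The class and method names are hypothetical.

```python
class ExtendibleHash:
    """Toy extendible hash: a bucket address table (BAT) with
    2**global_depth entries pointing at fixed-capacity buckets."""

    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.global_depth = 1
        # Each bucket records its own local depth and its records.
        self.bat = [{"depth": 1, "items": {}}, {"depth": 1, "items": {}}]

    def _index(self, key):
        # The low-order global_depth bits of the hash select the BAT entry.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        # One BAT lookup (in RAM) followed by one bucket access.
        return self.bat[self._index(key)]["items"].get(key)

    def put(self, key, value):
        while True:
            bucket = self.bat[self._index(key)]
            if key in bucket["items"] or len(bucket["items"]) < self.capacity:
                bucket["items"][key] = value
                return
            self._split(bucket)  # bucket is full: split and retry

    def _split(self, bucket):
        if bucket["depth"] == self.global_depth:
            # Double the BAT; the new half mirrors the existing pointers.
            self.bat = self.bat + self.bat
            self.global_depth += 1
        bucket["depth"] += 1
        new_bucket = {"depth": bucket["depth"], "items": {}}
        high_bit = 1 << (bucket["depth"] - 1)
        # BAT entries for this bucket whose new distinguishing bit is set
        # are redirected to the new bucket.
        for i, b in enumerate(self.bat):
            if b is bucket and i & high_bit:
                self.bat[i] = new_bucket
        # Redistribute the old records between the two buckets.
        old_items, bucket["items"] = bucket["items"], {}
        for k, v in old_items.items():
            self.bat[self._index(k)]["items"][k] = v

table = ExtendibleHash(bucket_capacity=2)
for key in range(20):
    table.put(key, key * key)
print(table.get(7), table.global_depth, len(table.bat))
```

Small integer keys are used in the demo because their hash values are stable across runs; a full bucket is split on the spot, so no overflow chain ever forms.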

B-trees and B+-trees
B-tree index basic characteristics
* each node contains p pointers and p-1 records
* each pointer at level i is for a data and pointer block at level i+1
* i=1 denotes the root level (single node or block)
* can be inefficient for searching because of the overhead in each search level

B+-tree index basic characteristics
* eliminates data pointers from all nodes except the leaf nodes
* each non-leaf index node has p pointers and p-1 key values
* each pointer at level i is for an index block (of key/pointer pairs) at level i+1
* each leaf index has a key value/pointer pair to point to the actual data
block (and record) containing that primary key value
* leaf index nodes can be logically connected via pointers for ordered sequence search
* hybrid method for efficient random access and sequential search

Example: B+-tree
To determine the order of a B+-tree, assume that the database has 500,000
records of 200 bytes each, the search key is 15 bytes, the tree and data pointers are
5 bytes each, and the index node size (and data block size) is 1024 bytes. For this
configuration:
non-leaf index node size = 1024 bytes = p*5 + (p-1)*15 bytes
p = floor((1024+15)/20) = floor(51.95) = 51
number of search key values in the leaf nodes = floor((1024-5)/(15+5)) = 50
h = height of the B+-tree (number of index levels, including the leaf index nodes)
n = number of records in the database (or file); all must be pointed at from the next-to-last level, h-1

p^(h-1) * (p-1) > n
(h-1) log p + log(p-1) > log n
(h-1) log p > log n - log(p-1)
h > 1 + (log n - log(p-1)) / log p
h > 1 + (log 500,000 - log 50) / log 51 = 3.34, so h = 4 (nearest higher integer)
A good approximation can be made by assuming that the leaf index nodes are
implemented with p pointers and p key values:
p^h > n
h log p > log n
h > log n / log p
In this case, h > log 500,000 / log 51 = 3.34, so again h = 4.
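The order and height calculation can be checked numerically. The sketch below uses integer arithmetic instead of logarithms to avoid rounding concerns; the function names are illustrative, and the pointer subtracted in `leaf_entries` simply mirrors the (1024-5) term in the example (presumably space for one extra pointer in each leaf, which is an assumption).

```python
def bplus_order(block_size, key_size, pointer_size):
    """Largest p such that p pointers and p-1 keys fit in one block:
    p*pointer_size + (p-1)*key_size <= block_size."""
    return (block_size + key_size) // (key_size + pointer_size)

def leaf_entries(block_size, key_size, pointer_size):
    """Key/pointer pairs per leaf node, after setting aside one pointer."""
    return (block_size - pointer_size) // (key_size + pointer_size)

def bplus_height(n, p):
    """Smallest integer h satisfying p**(h-1) * (p-1) > n."""
    h = 1
    while p ** (h - 1) * (p - 1) <= n:
        h += 1
    return h

p = bplus_order(block_size=1024, key_size=15, pointer_size=5)
print(p, leaf_entries(1024, 15, 5), bplus_height(500_000, p))
```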

B+-tree performance
read a single record (B+-tree) = h+1 rba

update a single record (B+-tree) = search cost + rewrite data block
= (h+1) rba + 1 rba

general update cost for insertion (B+-tree)
= search cost (i.e., h+1 reads)
+ simple rewrite of data block and leaf index node pointing to the
data block (i.e., 2 rewrites)
+ nos*(write of new split index node
+ rewrite of the index node pointer to the new index node)
+ nosb*(write of new split data block)
= (h+1) rba + 2 rba + nos*(2 rba) + nosb*(1 rba)
where nos is the number of index node split operations required and nosb is the
number of data block split operations required

general update cost for deletion (B+-tree)
= search cost (i.e., h+1 reads)
+ simple rewrite of data block and leaf index node pointing to the
data block (i.e., 2 rewrites)
+ noc*(rewrite of the node pointer to the remaining node)
= (h+1) rba + 2 rba + noc*(1 rba)
where noc is the number of consolidations of index nodes required.

As an example, consider the insertion of a node (with key value 77) into the B+-tree
shown in Fig. 6.6. This insertion requires a search (query) phase and an
insertion phase with one split node. The total insertion cost for height 3 is
insertion cost = (3 + 1) rba search cost + (2 rba) rewrite cost
+ 1 split * (2 rba rewrite cost)
= 8 rba
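The B+-tree cost formulas translate directly into code. This is a minimal sketch with illustrative function names; the final line reproduces the worked example (height 3, one index node split).

```python
def bplus_read_cost(h):
    """rba to read one record: h index levels plus the data block."""
    return h + 1

def bplus_update_cost(h):
    """Search, then rewrite the data block."""
    return bplus_read_cost(h) + 1

def bplus_insert_cost(h, nos=0, nosb=0):
    """Search + 2 rewrites + 2 rba per index node split (nos)
    + 1 rba per data block split (nosb)."""
    return bplus_read_cost(h) + 2 + nos * 2 + nosb * 1

def bplus_delete_cost(h, noc=0):
    """Search + 2 rewrites + 1 rba per index node consolidation (noc)."""
    return bplus_read_cost(h) + 2 + noc * 1

print(bplus_insert_cost(h=3, nos=1))  # the worked example: 8 rba
```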

Secondary Indexes
Basic characteristics of secondary indexes
* based on Boolean search criteria (AND, OR, NOT) of attributes that are
not the primary key

* attribute type index is level 1 (usually in RAM)

* attribute value index is level 2 (usually in RAM)


* accession list is level 3 (ordered list of pointers to blocks containing
records with the given attribute value)

* one accession list per attribute value; pointers have block address and
record offset typically

* accession lists can be merged to satisfy the intersection (AND) of
records that satisfy more than one condition
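Because each accession list is ordered, an AND condition can be evaluated with a single linear merge. The sketch below assumes each pointer is a (block address, record offset) tuple; the attribute values used in the sample lists are hypothetical.

```python
def intersect_accession_lists(a, b):
    """Return the pointers present in both sorted accession lists."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])  # record satisfies both conditions
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # advance whichever list is behind
        else:
            j += 1
    return result

# Hypothetical lists for two attribute values, e.g. city and department.
city_list = [(1, 0), (1, 3), (4, 2), (7, 1)]
dept_list = [(1, 3), (2, 0), (7, 1), (9, 4)]
print(intersect_accession_lists(city_list, dept_list))
```

Only the pointers that survive the merge (the t target records) need to be fetched from the data file.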

Boolean query cost (secondary index)
= search attribute type index + search attribute value index
+ search and merge m accession lists + access t target records
= (0 + 0 + sum of m accession list accesses) rba + t rba
= (sum of m accession list costs) rba + t rba
where m is the number of accession lists to be merged and t is the number
of target records to be accessed after the merge operation.
accession list cost (for accession list j) = ceil(p_j/bfac) rba
where p_j is the number of pointer entries in the jth accession list and bfac is
the blocking factor for all accession lists

bfac = block_size/pointer_size
* assume all accesses to the accession list are random due to dynamic re-allocation
of disk blocks

* use the 1% rule (any variable affecting the result by less than 1% is ignored)
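The Boolean query cost can be sketched as follows. The two in-RAM index levels contribute 0 rba, so only the accession lists and the target records are counted. The function name and sample figures are illustrative, and rounding bfac down to a whole number of pointers per block is an assumption.

```python
import math

def boolean_query_cost(pointer_counts, t, block_size, pointer_size):
    """Total rba for a Boolean query over a secondary index.

    pointer_counts: number of pointer entries p_j in each of the m lists.
    t: number of target records accessed after the merge.
    """
    bfac = block_size // pointer_size  # pointers per accession list block
    list_cost = sum(math.ceil(p_j / bfac) for p_j in pointer_counts)
    return list_cost + t

# Hypothetical query: two lists of 1,000 and 250 pointers, 20 target records.
print(boolean_query_cost([1000, 250], t=20, block_size=2000, pointer_size=4))
```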
