DBMS Indexing and Storage
Modified from:
Database System Concepts, 6th Ed.
©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
Chapter 11: Indexing and Storage
n DBMS Storage
l Memory hierarchy
l File Organization
l Buffering
n Indexing
l Basic Concepts
l B+-Trees
l Static Hashing
l Index Definition in SQL
l Multiple-Key Access
CS425 – Fall 2013 – Boris Glavic 11.2 ©Silberschatz, Korth and Sudarshan
Memory Hierarchy
DBMS Storage
n Modern Computers have different types of memory
l Cache, Main Memory, Harddisk, SSD, …
n Memory types have different characteristics in terms of
l Persistent vs. volatile
l Speed (random vs. sequential access)
l Size
l Price – this usually determines size
n Database systems are designed to use these different memory
types effectively
l Need for persistent storage: the state of the database needs to be
written to persistent storage
This guarantees that database content is not lost when the computer is
shut down
l Moving data between different types of memory
Want to use fast memory to speed up operations
Need slower memory for its larger capacity
Storage Hierarchy
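[Figure: storage hierarchy. From top to bottom: cache, main memory, flash memory, magnetic disk, optical disk, magnetic tapes. Speed decreases and size increases toward the bottom; the lower levels are persistent storage.]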
Main Memory vs. Disk
n Why do we not use only main memory?
l What if the database does not fit into main memory?
l Main memory is volatile
n Main memory vs. disk
l Given the available main memory, when do we keep which part of the
database in main memory?
Buffer manager: component of the DBMS that decides when to
move data between disk and main memory
l How do we ensure the transaction property of durability?
The buffer manager needs to make sure that data written by committed
transactions is written to disk
Random vs. Sequential Access
n Transfer of data from disk has a minimal size = 1 block
l Reading 1 byte takes as long as reading one block (e.g., 4KB)
n Random Access
l Read data from anywhere on the disk
l Need to move the arm to the right track (seek time)
l Need to wait until the right sector is under the head (on average ½ of the time
for one rotation) (rotational delay)
l Then can transfer data at ~ transfer rate
n Sequential Access
l Read data that is on the current track + sector
l Can transfer data at ~ transfer rate
n Reading a large number of small pieces of data randomly is very slow
compared to sequential access
l Thus, try to lay out data on disk in a way that enables sequential
access
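The gap between random and sequential access can be estimated with a back-of-the-envelope cost model. The sketch below assumes illustrative drive parameters (4 ms average seek, 7200 rpm, 100 MB/s transfer rate, 4 KB blocks); these numbers and all function names are assumptions for illustration, not values from the slides.

```python
# Rough cost model for reading blocks from a magnetic disk.
# All drive parameters below are illustrative assumptions.
SEEK_MS = 4.0                 # average seek time
RPM = 7200                    # rotational speed
TRANSFER_MB_PER_S = 100.0     # sustained transfer rate
BLOCK_BYTES = 4096            # block size (4 KB)

def rotational_delay_ms(rpm: float = RPM) -> float:
    """Average rotational delay: half of the time for one rotation."""
    return 0.5 * 60_000.0 / rpm

def transfer_ms(num_blocks: int) -> float:
    """Time to transfer num_blocks at the sustained transfer rate."""
    return num_blocks * BLOCK_BYTES / (TRANSFER_MB_PER_S * 1_000_000) * 1000.0

def random_read_ms(num_blocks: int) -> float:
    """Every block pays seek + rotational delay + transfer."""
    return num_blocks * (SEEK_MS + rotational_delay_ms() + transfer_ms(1))

def sequential_read_ms(num_blocks: int) -> float:
    """One seek and one rotational delay, then pure transfer."""
    return SEEK_MS + rotational_delay_ms() + transfer_ms(num_blocks)

if __name__ == "__main__":
    n = 10_000
    print(f"random:     {random_read_ms(n):.0f} ms")
    print(f"sequential: {sequential_read_ms(n):.0f} ms")
```

Under these assumed parameters the per-block seek and rotational delay dominate the random case, which is why the layout advice above matters.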
File Organization
Fixed-Length Records
n Simple approach:
l Store record i starting from byte n · (i – 1), where n is the size of
each record. Put at most ⌊P / n⌋ records on each page, where P is the page size.
l Record access is simple but records may cross blocks
Modification: do not allow records to cross block boundaries
n Deletion of record i (here n denotes the number of records in the file):
alternatives:
l move records i + 1, . . ., n
to i, . . . , n – 1
l move record n to i
l do not move records, but
link all free records on a
free list
Free Lists
n Store the address of the first deleted record in the file header.
n Use this first record to store the address of the second deleted record,
and so on
n Can think of these stored addresses as pointers since they “point” to
the location of a record.
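A minimal sketch of a free list for fixed-length records follows. It keeps the file in an in-memory bytearray; the 16-byte record size, the 4-byte header, the sentinel value, and the class and method names are all assumptions made for illustration. Slots are numbered from 0 here, whereas the slides number records from 1.

```python
import struct

RECORD_SIZE = 16          # n: bytes per record (assumed)
HEADER_SIZE = 4           # file header: slot number of the first free record
NO_SLOT = 0xFFFFFFFF      # sentinel meaning "end of free list"

class FixedLengthFile:
    """In-memory stand-in for a file of fixed-length records."""

    def __init__(self) -> None:
        self.data = bytearray(struct.pack("<I", NO_SLOT))
        self.num_slots = 0

    def _offset(self, slot: int) -> int:
        # Record in slot i (0-based) starts at header + n * i.
        return HEADER_SIZE + RECORD_SIZE * slot

    def _free_head(self) -> int:
        return struct.unpack_from("<I", self.data, 0)[0]

    def _set_free_head(self, slot: int) -> None:
        struct.pack_into("<I", self.data, 0, slot)

    def insert(self, payload: bytes) -> int:
        record = payload.ljust(RECORD_SIZE, b"\0")[:RECORD_SIZE]
        slot = self._free_head()
        if slot == NO_SLOT:
            # Free list is empty: append a new slot at the end of the file.
            slot = self.num_slots
            self.num_slots += 1
            self.data.extend(bytes(RECORD_SIZE))
        else:
            # Pop the head: the freed record stores the next free slot.
            nxt = struct.unpack_from("<I", self.data, self._offset(slot))[0]
            self._set_free_head(nxt)
        off = self._offset(slot)
        self.data[off:off + RECORD_SIZE] = record
        return slot

    def delete(self, slot: int) -> None:
        # Push the slot onto the free list, reusing its bytes for the link.
        struct.pack_into("<I", self.data, self._offset(slot), self._free_head())
        self._set_free_head(slot)

    def read(self, slot: int) -> bytes:
        off = self._offset(slot)
        return bytes(self.data[off:off + RECORD_SIZE]).rstrip(b"\0")
```

The sketch does not track which slots are currently deleted, so reading a freed slot returns the link bytes; a real implementation would need a deletion marker or a slot directory.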
Variable-Length Records
Variable-Length Records: Slotted Page Structure
Organization of Records in Files
Buffering
Buffer Manager
n Buffer Manager
l Responsible for loading pages from disk and writing modified
pages back to disk
n Handling blocks
1. If the block is already in the buffer, the buffer manager
returns the address of the block in main memory
2. If the block is not in the buffer, the buffer manager
   a. Allocates space in the buffer for the block
      i. Replacing (throwing out) some other block, if required,
      to make space for the new block.
      ii. Replaced block written back to disk only if it was
      modified since the most recent time that it was written
      to/fetched from the disk.
   b. Reads the block from the disk to the buffer, and returns
   the address of the block in main memory to requester.
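The block-handling steps above can be sketched as a toy buffer pool. Here the disk is simulated by a Python dict, the victim is chosen by LRU (an assumed policy, since the steps themselves do not fix one), and the names `BufferManager`, `get_block`, and `mark_dirty` are hypothetical.

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer pool; `disk` is a dict block_id -> bytes standing in for disk."""

    def __init__(self, disk: dict, capacity: int) -> None:
        self.disk = disk
        self.capacity = capacity
        self.frames = OrderedDict()   # block_id -> bytearray, LRU first
        self.dirty = set()            # blocks modified since last read/write
        self.disk_reads = 0
        self.disk_writes = 0

    def get_block(self, block_id: int) -> bytearray:
        # Step 1: block already in the buffer.
        if block_id in self.frames:
            self.frames.move_to_end(block_id)   # now most recently used
            return self.frames[block_id]
        # Step 2a: make space, replacing another block if required.
        if len(self.frames) >= self.capacity:
            victim, data = self.frames.popitem(last=False)  # LRU (assumed policy)
            if victim in self.dirty:
                # Write back only if the block was modified.
                self.disk[victim] = bytes(data)
                self.disk_writes += 1
                self.dirty.discard(victim)
        # Step 2b: read the block from disk into the buffer.
        self.frames[block_id] = bytearray(self.disk[block_id])
        self.disk_reads += 1
        return self.frames[block_id]

    def mark_dirty(self, block_id: int) -> None:
        """Caller signals that it modified the buffered copy."""
        self.dirty.add(block_id)
```

Pinning and forced output are left out of this sketch to keep it short.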
Buffer-Replacement Policies
n Most operating systems replace the block least recently used
(LRU strategy)
n Idea behind LRU – use past pattern of block references as a
predictor of future references
n Queries have well-defined access patterns (such as sequential
scans), and a database system can use the information in a user’s
query to predict future references
l LRU can be a bad strategy for certain access patterns involving
repeated scans of data
For example: when computing the join of 2 relations r and s
by nested loops
for each tuple tr of r do
    for each tuple ts of s do
        if the tuples tr and ts match …
l A mixed strategy with hints on the replacement strategy provided
by the query optimizer is preferable
Buffer-Replacement Policies (Cont.)
n Pinned block – memory block that is not allowed to be written
back to disk. E.g., an operation still needs this block.
n Toss-immediate strategy – frees the space occupied by a block
as soon as the final tuple of that block has been processed
n Most recently used (MRU) strategy – system must pin the
block currently being processed. After the final tuple of that block
has been processed, the block is unpinned, and it becomes the
most recently used block.
n Buffer manager can use statistical information regarding the
probability that a request will reference a particular relation
l E.g., the data dictionary is frequently accessed. Heuristic:
keep data-dictionary blocks in main memory buffer
n Buffer managers also support forced output of blocks for the
purpose of recovery (more in Chapter 16 in the textbook)
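The effect of the replacement policy on repeated scans can be illustrated with a small simulation. This is a simplified sketch: it models only the inner relation of a nested-loops join as a reference string of block numbers, ignores pinning, and uses a hypothetical helper `count_misses`; the buffer size and block counts are assumed values.

```python
from collections import OrderedDict

def count_misses(accesses, capacity, policy):
    """Count buffer misses for a block reference string under 'LRU' or 'MRU'."""
    buf = OrderedDict()          # ordered from least to most recently used
    misses = 0
    for block in accesses:
        if block in buf:
            buf.move_to_end(block)
            continue
        misses += 1
        if len(buf) >= capacity:
            # LRU evicts the oldest entry, MRU the newest one.
            buf.popitem(last=(policy == "MRU"))
        buf[block] = True
    return misses

# Inner relation s occupies 4 blocks and is scanned once per outer tuple.
s_blocks = [0, 1, 2, 3]
accesses = s_blocks * 5          # 5 repeated scans of s

if __name__ == "__main__":
    for policy in ("LRU", "MRU"):
        print(policy, count_misses(accesses, capacity=3, policy=policy))
```

With a buffer one block smaller than the scanned relation, LRU evicts each block just before it is needed again, so every reference misses, while MRU keeps most of the relation resident.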
Indexing and Hashing
Basic Concepts
n Indexing mechanisms are used to speed up access to desired data.
l E.g., author catalog in library
n Search key: attribute or set of attributes used to look up records in a
file.
n An index file consists of records (called index entries) of the form
(search-key, pointer)
n Index files are typically much smaller than the original file
n Two basic kinds of indices:
l Ordered indices: search keys are stored in some sorted order
l Hash indices: search keys are distributed uniformly across
“buckets” using a “hash function”.
Index Evaluation Metrics
n Access types supported efficiently. E.g.,
l records with a specified value in the attribute
l or records with an attribute value falling in a specified range of
values.
n Access time
n Insertion time
n Deletion time
n Space overhead
Ordered Indices
n In an ordered index, index entries are stored sorted on the search key
value. E.g., author catalog in library.
n Primary index: in a sequentially ordered file, the index whose search
key specifies the sequential order of the file.
l Also called clustering index
l The search key of a primary index is usually but not necessarily the
primary key.
n Secondary index: an index whose search key specifies an order
different from the sequential order of the file. Also called
non-clustering index.
n Index-sequential file: ordered sequential file with a primary index.
Secondary Indices Example
Primary and Secondary Indices
n Indices offer substantial benefits when searching for records.
n BUT: Updating indices imposes overhead on database
modification: when a file is modified, every index on the file
must be updated.
n Sequential scan using primary index is efficient, but a
sequential scan using a secondary index is expensive
l Each record access may fetch a new block from disk
l Block fetch requires about 5 to 10 milliseconds, versus
about 100 nanoseconds for memory access
Secondary Indices
n Frequently, one wants to find all the records whose values in
a certain field (which is not the search-key of the primary
index) satisfy some condition.
l Example 1: In the instructor relation stored sequentially by
ID, we may want to find all instructors in a particular
department
l Example 2: as above, but where we want to find all
instructors with a specified salary or with salary in a
specified range of values
n We can have a secondary index with an index record for
each search-key value
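The two examples above can be sketched with in-memory stand-ins for secondary indices. The sample rows, the list-position "pointers", and the helper names are illustrative assumptions; a real system would store block addresses or record identifiers.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Illustrative instructor records, stored in ID order: (ID, name, dept, salary)
instructors = [
    (10101, "Srinivasan", "Comp. Sci.", 65000),
    (12121, "Wu", "Finance", 90000),
    (15151, "Mozart", "Music", 40000),
    (22222, "Einstein", "Physics", 95000),
    (45565, "Katz", "Comp. Sci.", 75000),
]

# Secondary index on department: one entry per search-key value,
# pointing to all records with that value.
dept_index = defaultdict(list)
for pos, rec in enumerate(instructors):
    dept_index[rec[2]].append(pos)

# Secondary index on salary: (salary, position) pairs kept in sorted order,
# so that range queries become a binary search plus a scan.
salary_index = sorted((rec[3], pos) for pos, rec in enumerate(instructors))

def by_dept(dept):
    return [instructors[p] for p in dept_index.get(dept, [])]

def by_salary_range(low, high):
    lo = bisect_left(salary_index, (low, -1))
    hi = bisect_right(salary_index, (high, len(instructors)))
    return [instructors[p] for _, p in salary_index[lo:hi]]
```

Note that the positions returned for a salary range are scattered across the file, which is the reason a scan through a secondary index can touch a new block for every record.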
B+-Tree Index
Example of B+-Tree
B+-Tree Index Files (Cont.)
B+-Tree Node Structure
n Typical node
Leaf Nodes in B+-Trees
Example of B+-tree
Observations about B+-trees
n Since the inter-node connections are done by pointers,
“logically” close blocks need not be “physically” close.
n The non-leaf levels of the B+-tree form a hierarchy of sparse
indices.
n The B+-tree contains a relatively small number of levels
Level below root has at least $2 \cdot \lceil n/2 \rceil$ values
Next level has at least $2 \cdot \lceil n/2 \rceil \cdot \lceil n/2 \rceil$ values
.. etc.
l If there are K search-key values in the file, the tree height is
no more than $\lceil \log_{\lceil n/2 \rceil}(K) \rceil$
l thus searches can be conducted efficiently.
n Insertions and deletions to the main file can be handled
efficiently, as the index can be restructured in logarithmic time
(as we shall see).
Queries on B+-Trees
n Find record with search-key value V.
1. C = root
2. While C is not a leaf node {
   1. Let i be the least value s.t. V ≤ Ki.
   2. If no such i exists, set C = last non-null pointer in C
   3. Else { if (V = Ki) set C = Pi+1 else set C = Pi }
}
3. Let i be the least value s.t. Ki = V
4. If there is such a value i, follow pointer Pi to the desired record.
5. Else no record with search-key value V exists.
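The lookup procedure above can be sketched in Python on a small hand-built tree. The `Node` class, the 0-based lists, and the sample keys are assumptions for illustration; duplicates and the leaf-to-leaf pointers are not modelled.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Any, List, Optional

@dataclass
class Node:
    keys: List[Any]
    # Internal node: len(children) == len(keys) + 1, children[i] is left of keys[i].
    children: List["Node"] = field(default_factory=list)
    # Leaf node: records[i] is the record (pointer) for keys[i].
    records: List[Any] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children

def find(root: Node, v: Any) -> Optional[Any]:
    c = root
    while not c.is_leaf:
        i = bisect_left(c.keys, v)       # least i with v <= keys[i]
        if i == len(c.keys):             # no such i: follow the last pointer
            c = c.children[-1]
        elif c.keys[i] == v:             # v equals the key: go right of it
            c = c.children[i + 1]
        else:                            # v is smaller: go left of it
            c = c.children[i]
    i = bisect_left(c.keys, v)
    if i < len(c.keys) and c.keys[i] == v:
        return c.records[i]
    return None                          # no record with search-key value v

# Hand-built example tree with three leaves under one root.
leaf1 = Node(keys=[2, 3], records=["r2", "r3"])
leaf2 = Node(keys=[5, 7], records=["r5", "r7"])
leaf3 = Node(keys=[11, 13], records=["r11", "r13"])
root = Node(keys=[5, 11], children=[leaf1, leaf2, leaf3])
```

For example, `find(root, 7)` descends into the middle leaf, and `find(root, 4)` reaches the first leaf and returns `None`.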
Handling Duplicates
n With duplicate search keys
l In both leaf and internal nodes,
we cannot guarantee that K1 < K2 < K3 < . . . < Kn–1
Queries on B+-Trees (Cont.)
n If there are K search-key values in the file, the height of the tree is no
more than $\lceil \log_{\lceil n/2 \rceil}(K) \rceil$.
n A node is generally the same size as a disk block, typically 4
kilobytes
l and n is typically around 100 (40 bytes per index entry).
n With 1 million search key values and n = 100
l at most $\lceil \log_{50}(1{,}000{,}000) \rceil = 4$ nodes are accessed in a lookup.
n Contrast this with a balanced binary tree with 1 million search key
values — around 20 nodes are accessed in a lookup
l above difference is significant since every node access may need
a disk I/O, costing around 20 milliseconds
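The arithmetic behind this comparison can be checked with a few lines of Python. The function names are hypothetical, and the floating-point logarithm is adequate for these values but may need care for exact powers of the base.

```python
import math

def bplus_max_height(num_keys: int, n: int) -> int:
    """Upper bound ceil(log_{ceil(n/2)}(K)) on the number of levels."""
    base = math.ceil(n / 2)
    return math.ceil(math.log(num_keys) / math.log(base))

def binary_tree_height(num_keys: int) -> int:
    """Approximate depth of a balanced binary tree with num_keys keys."""
    return math.ceil(math.log2(num_keys))

if __name__ == "__main__":
    print(bplus_max_height(1_000_000, 100))   # B+-tree with n = 100
    print(binary_tree_height(1_000_000))      # balanced binary tree
```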
Updates on B+-Trees: Insertion (Cont.)
n Splitting a leaf node:
l take the n (search-key value, pointer) pairs (including the one
being inserted) in sorted order. Place the first $\lceil n/2 \rceil$ in the original
node, and the rest in a new node.
l let the new node be p, and let k be the least key value in p. Insert
(k,p) in the parent of the node being split.
l If the parent is full, split it and propagate the split further up.
n Splitting of nodes proceeds upwards till a node that is not full is found.
l In the worst case the root node may be split increasing the height
of the tree by 1.
Result of splitting node containing Brandt, Califieri and Crick on inserting Adams
Next step: insert entry with (Califieri,pointer-to-new-node) into parent
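The leaf split can be sketched as a small function, using the example of inserting Adams with n = 4. The function name `split_leaf`, the tuple representation of entries, and the placeholder pointer strings are assumptions; propagating the separator into the parent is not shown.

```python
import math
from bisect import insort

def split_leaf(entries, new_entry, n):
    """Split a full leaf.

    entries:   the n - 1 sorted (search-key value, pointer) pairs in the leaf
    new_entry: the (search-key value, pointer) pair being inserted
    Returns (original_node_entries, new_node_entries, k), where k is the
    least key in the new node, to be inserted into the parent.
    """
    pairs = list(entries)
    insort(pairs, new_entry)            # all n pairs in sorted order
    assert len(pairs) == n, "leaf must be full before the split"
    cut = math.ceil(n / 2)              # first ceil(n/2) stay in the original node
    left, right = pairs[:cut], pairs[cut:]
    return left, right, right[0][0]

leaf = [("Brandt", "p1"), ("Califieri", "p2"), ("Crick", "p3")]
left, right, k = split_leaf(leaf, ("Adams", "p0"), n=4)
```

Here `left` holds Adams and Brandt, `right` holds Califieri and Crick, and `k` is Califieri, matching the next step described above.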
B+-Tree Insertion
Examples of B+-Tree Deletion
Examples of B+-Tree Deletion (Cont.)
Non-Unique Search Keys
n Alternatives to scheme described earlier
l Buckets on separate block (bad idea)
l List of tuple pointers with each key
Extra code to handle long lists
Deletion of a tuple can be expensive if there are many
duplicates on search key (why?)
Low space overhead, no extra cost for queries
l Make search key unique by adding a record-identifier
Extra storage overhead for keys
Simpler code for insertion/deletion
Widely used
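The last alternative, making the search key unique by appending a record identifier, can be sketched with a sorted list standing in for the leaf level of an index. The helper names and the use of non-negative integers as record identifiers are assumptions.

```python
from bisect import bisect_left, insort

index = []   # sorted (search_key, record_id) pairs; stand-in for leaf entries

def insert(key, rid):
    insort(index, (key, rid))

def delete(key, rid):
    """Delete exactly one entry; binary search finds it even among duplicates."""
    i = bisect_left(index, (key, rid))
    if i < len(index) and index[i] == (key, rid):
        del index[i]
        return True
    return False

def lookup(key):
    """Return all record ids for key (record ids assumed to be >= 0)."""
    i = bisect_left(index, (key, -1))
    rids = []
    while i < len(index) and index[i][0] == key:
        rids.append(index[i][1])
        i += 1
    return rids
```

Because the composite key is unique, deleting one tuple does not require scanning all entries that share its search-key value.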
Hashing
Static Hashing
Example of Hash File Organization
Handling of Bucket Overflows
n Bucket overflow can occur because of
l Insufficient buckets
l Skew in distribution of records. This can occur due to two
reasons:
multiple records have same search-key value
chosen hash function produces non-uniform distribution of key
values
n Although the probability of bucket overflow can be reduced, it cannot
be eliminated; it is handled by using overflow buckets.
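A minimal sketch of a static hash file with linked overflow buckets follows. The bucket capacity, the class names, and the use of Python's built-in `hash` as the hash function are assumptions for illustration.

```python
BUCKET_CAPACITY = 2   # records per bucket (assumed)

class Bucket:
    def __init__(self):
        self.records = []       # (key, record) pairs
        self.overflow = None    # next bucket in the overflow chain

class StaticHashFile:
    def __init__(self, num_buckets, hash_fn=hash):
        # The number of buckets is fixed when the file is created.
        self.buckets = [Bucket() for _ in range(num_buckets)]
        self.hash_fn = hash_fn

    def _bucket(self, key):
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def insert(self, key, record):
        b = self._bucket(key)
        # Walk the chain until a bucket with free space is found,
        # allocating a new overflow bucket at the end if necessary.
        while len(b.records) >= BUCKET_CAPACITY:
            if b.overflow is None:
                b.overflow = Bucket()
            b = b.overflow
        b.records.append((key, record))

    def lookup(self, key):
        b, found = self._bucket(key), []
        while b is not None:    # the whole chain must be searched
            found.extend(r for k, r in b.records if k == key)
            b = b.overflow
        return found

    def chain_length(self, key):
        """Number of buckets (primary + overflow) searched for key."""
        b, n = self._bucket(key), 0
        while b is not None:
            n += 1
            b = b.overflow
        return n
```

With too few buckets or a skewed key distribution the chains grow, and each lookup has to read every bucket in the chain.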
Handling of Bucket Overflows (Cont.)
Hash Indices
n Hashing can be used not only for file organization, but also for index-
structure creation.
n A hash index organizes the search keys, with their associated record
pointers, into a hash file structure.
n Strictly speaking, hash indices are always secondary indices
l if the file itself is organized using hashing, a separate primary
hash index on it using the same search-key is unnecessary.
l However, we use the term hash index to refer to both secondary
index structures and hash organized files.
Example of Hash Index
Deficiencies of Static Hashing
n In static hashing, function h maps search-key values to a fixed set B
of bucket addresses. Databases grow or shrink with time.
l If the initial number of buckets is too small, and the file grows, performance
will degrade due to too many overflows.
l If space is allocated for anticipated growth, a significant amount of
space will be wasted initially (and buckets will be underfull).
l If database shrinks, again space will be wasted.
n One solution: periodic re-organization of the file with a new hash
function
l Expensive, disrupts normal operations
n Better solution: allow the number of buckets to be modified dynamically.
Index Definition in SQL
n Create an index
create index <index-name> on <relation-name>
(<attribute-list>)
E.g.: create index b_index on branch(branch_name)
n Use create unique index to indirectly specify and enforce the
condition that the search key is a candidate key.
n To drop an index
drop index <index-name>
n Most database systems allow specification of type of index, and
clustering.
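These statements can be tried out with Python's built-in sqlite3 module. The sketch below uses an in-memory database; the two-column `branch` table and its sample rows are assumptions for illustration, and other database systems may differ in syntax details such as index types and clustering options.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table branch (branch_name text, branch_city text)")

# Plain index: duplicate search-key values are allowed.
conn.execute("create index b_index on branch(branch_name)")
conn.execute("insert into branch values ('Downtown', 'Brooklyn')")
conn.execute("insert into branch values ('Downtown', 'Queens')")

# Drop the index, remove the duplicate, and create a unique index instead.
conn.execute("drop index b_index")
conn.execute("delete from branch where branch_city = 'Queens'")
conn.execute("create unique index b_unique on branch(branch_name)")

# The unique index now enforces that branch_name is a candidate key.
try:
    conn.execute("insert into branch values ('Downtown', 'Queens')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

index_names = [row[0] for row in
               conn.execute("select name from sqlite_master where type = 'index'")]
```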
End of Chapter