Ch08 Storage Indexing Overview
Overview of Storage and Indexing
Chapter 8

“How index-learning turns no student pale
Yet holds the eel of science by the tail.”
-- Alexander Pope (1688-1744)

Tapes: Can only read pages in sequence
    Cheaper than disks; used for archival storage
File organization: Method of arranging a file of records on external storage.
    Record id (rid) is sufficient to physically locate a record
    Indexes are data structures that allow us to find the record ids of records with given values in index search key fields
Architecture: Buffer manager stages pages from external storage to main memory buffer pool. File and index layers make calls to the buffer manager.
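The relationship between rids and indexes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the file is modelled as an in-memory list of pages, a rid is taken to be a (page number, slot number) pair, and the sample records and the names `heap_file`, `fetch`, and `build_index` are hypothetical.

```python
from collections import defaultdict

# Hypothetical heap file: a list of pages, each page a list of records.
# Each record is (name, age, sal). A rid is assumed to be (page_no, slot_no).
heap_file = [
    [("Ashby", 25, 3000), ("Basu", 33, 4003)],     # page 0
    [("Bristow", 29, 2007), ("Cass", 50, 5004)],   # page 1
    [("Daniels", 22, 6003), ("Jones", 40, 6003)],  # page 2
]

def fetch(rid):
    """Locate a record physically from its rid alone."""
    page_no, slot_no = rid
    return heap_file[page_no][slot_no]

def build_index(field_pos):
    """Map each value of the search key field to the rids of matching records."""
    index = defaultdict(list)
    for page_no, page in enumerate(heap_file):
        for slot_no, record in enumerate(page):
            index[record[field_pos]].append((page_no, slot_no))
    return index

sal_index = build_index(2)                      # search key: sal
rids = sal_index[6003]                          # rids of records with sal = 6003
records = [fetch(rid) for rid in rids]          # follow the rids to the records
```

A real system would read pages through the buffer manager rather than indexing a Python list; the sketch only shows that the index yields rids and a rid is enough to reach the record.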
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
B+ Tree Indexes
[Figure: non-leaf pages above, leaf pages below]
Leaf pages contain data entries, and are chained (prev & next)

Example B+ Tree
Root: 17
    Entries <= 17: 5 13
        2* 3* | 5* 7* 8* | 14* 16*
    Entries > 17: 27 30
        22* 24* | 27* 29* | 33* 34* 38* 39*
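A search over the example tree can be sketched as follows. This is an illustrative in-memory model, not a page-based implementation: the `Node` class and the function names are hypothetical, and the sketch assumes that a key equal to a separator is routed to the right child, which matches where 5* and 27* sit in the example.

```python
from bisect import bisect_right

class Node:
    """Non-leaf nodes hold separator keys and children; leaves hold data entries."""
    def __init__(self, keys=None, children=None, entries=None):
        self.keys = keys or []
        self.children = children      # None for leaf pages
        self.entries = entries or []  # data entries (leaf pages only)
        self.next = None              # chain to the next leaf page

# Build the example tree: root 17, index entries 5 13 and 27 30.
leaves = [Node(entries=e) for e in
          ([2, 3], [5, 7, 8], [14, 16], [22, 24], [27, 29], [33, 34, 38, 39])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b                        # leaf pages are chained
root = Node(keys=[17], children=[
    Node(keys=[5, 13], children=leaves[0:3]),
    Node(keys=[27, 30], children=leaves[3:6]),
])

def find_leaf(node, key):
    """Descend from the root to the leaf page that could contain key."""
    while node.children is not None:
        node = node.children[bisect_right(node.keys, key)]
    return node

def search(key):
    """Equality search: is there a data entry key*?"""
    return key in find_leaf(root, key).entries

def range_scan(lo, hi):
    """Find the first leaf for lo, then follow the leaf chain up to hi."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.entries:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out
```

Because the leaves are chained, a range scan needs only one descent from the root and then walks sideways through the leaf pages.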
* Several assumptions underlie these (rough) estimates!
Understanding the Workload
For each query in the workload:
    Which relations does it access?
    Which attributes are retrieved?
    Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
For each update in the workload:
    Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
    The type of update (INSERT/DELETE/UPDATE), and the attributes that are affected.

Choice of Indexes
What indexes should we create?
    Which relations should have indexes? What field(s) should be the search key? Should we build several indexes?
For each index, what kind of an index should it be?
    Clustered? Hash/tree?
Clustering on E.hobby helps!
    FROM Emp E
    WHERE E.hobby=Stamps

Lexicographic order, or Spatial order.
Data entries in index sorted by <sal,age>
Data entries sorted by <sal>
Composite Search Keys
To retrieve Emp records with age=30 AND sal=4000, an index on <age,sal> would be better than an index on age or an index on sal.
    Choice of index key orthogonal to clustering etc.
If condition is: 20<age<30 AND 3000<sal<5000:
    Clustered tree index on <age,sal> or <sal,age> is best.
If condition is: age=30 AND 3000<sal<5000:
    Clustered <age,sal> index much better than <sal,age> index!
Composite indexes are larger, updated more often.

Index-Only Plans
A number of queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available.

<E.dno>
    SELECT D.mgr
    FROM Dept D, Emp E
    WHERE D.dno=E.dno

<E.dno,E.eid> Tree index!
    SELECT D.mgr, E.eid
    FROM Dept D, Emp E
    WHERE D.dno=E.dno

<E.dno>
    SELECT E.dno, COUNT(*)
    FROM Emp E
    GROUP BY E.dno

<E.dno,E.sal> Tree index!
    SELECT E.dno, MIN(E.sal)
    FROM Emp E
    GROUP BY E.dno

<E.age,E.sal> or <E.sal,E.age> Tree!
    SELECT AVG(E.sal)
    FROM Emp E
    WHERE E.age=25 AND E.sal BETWEEN 3000 AND 5000
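The two GROUP BY queries can be answered from index data entries alone, which the following sketch tries to make concrete. It assumes dense indexes whose data entries are (key, rid) pairs kept in key order, as a tree index would keep them. The entry values and the names `dno_index`, `dno_sal_index`, `count_by_dno`, and `min_sal_by_dno` are hypothetical.

```python
from itertools import groupby

# Hypothetical data entries of an index on <E.dno>: (dno, rid), sorted by dno.
dno_index = [
    (10, (0, 0)), (10, (2, 1)),
    (20, (0, 1)), (20, (1, 0)), (20, (3, 2)),
    (30, (1, 1)),
]

# Hypothetical data entries of a tree index on <E.dno,E.sal>: ((dno, sal), rid).
dno_sal_index = [
    ((10, 3000), (2, 1)), ((10, 4500), (0, 0)),
    ((20, 2800), (1, 0)), ((20, 3900), (3, 2)), ((20, 5100), (0, 1)),
    ((30, 4200), (1, 1)),
]

def count_by_dno(entries):
    """SELECT E.dno, COUNT(*) ... GROUP BY E.dno, using only index entries."""
    return [(dno, sum(1 for _ in group))
            for dno, group in groupby(entries, key=lambda e: e[0])]

def min_sal_by_dno(entries):
    """SELECT E.dno, MIN(E.sal) ... GROUP BY E.dno, using only index entries.

    Entries are sorted by (dno, sal), so the first entry seen for each dno
    carries that department's minimum salary.
    """
    result = {}
    for (dno, sal), _rid in entries:
        result.setdefault(dno, sal)
    return result
```

Neither function follows a rid, so no Emp tuple is ever fetched. The MIN query relies on the sorted order of a tree index, which is why the slide marks that case "Tree index!".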
Summary (Contd.)
Data entries can be actual data records, <key, rid> pairs, or <key, rid-list> pairs.
    Choice orthogonal to indexing technique used to locate data entries with a given key value.
Can have several indexes on a given file of data records, each with a different search key.
Indexes can be classified as clustered vs. unclustered, primary vs. secondary, and dense vs. sparse. Differences have important consequences for utility/performance.
Understanding the nature of the workload for the application, and the performance goals, is essential to developing a good design.
    What are the important queries and updates? What attributes/relations are involved?
Indexes must be chosen to speed up important queries (and perhaps some updates!).
    Index maintenance overhead on updates to key fields.
    Choose indexes that can help many queries, if possible.
    Build indexes to support index-only strategies.
    Clustering is an important decision; only one index on a given relation can be clustered!
    Order of fields in composite index key can be important.
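Why field order matters can be seen in a rough sketch of the condition age=30 AND 3000<sal<5000 evaluated against two composite orderings. Sorted Python lists stand in for the sorted data entries of a tree index. The sample (age, sal) values and the function names are hypothetical, and "entries scanned" is only a crude proxy for I/O cost.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (age, sal) values for a few Emp records.
emps = [(25, 3500), (30, 2500), (30, 3200), (30, 4100), (30, 5200),
        (35, 3100), (35, 4800), (40, 4500), (28, 3900), (30, 4700)]

age_sal = sorted(emps)                              # key order <age,sal>
sal_age = sorted((sal, age) for age, sal in emps)   # key order <sal,age>

def scan_age_sal(age, lo, hi):
    """With <age,sal>, the qualifying entries form one contiguous run."""
    start = bisect_right(age_sal, (age, lo))   # first entry with sal > lo
    end = bisect_left(age_sal, (age, hi))      # first entry with sal >= hi
    return age_sal[start:end], end - start     # (matches, entries scanned)

def scan_sal_age(age, lo, hi):
    """With <sal,age>, every entry in the sal range is scanned, then filtered."""
    start = bisect_right(sal_age, (lo, float("inf")))
    end = bisect_left(sal_age, (hi, float("-inf")))
    scanned = sal_age[start:end]
    matches = [(a, s) for s, a in scanned if a == age]
    return matches, len(scanned)
```

On this sample both scans return the same three matches, but the <age,sal> order touches 3 entries while the <sal,age> order touches 8, since the equality on age can only narrow the scan when age is the leading field.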