Lecture12 (CNC 312)
Lecture 12
Motivation
❖ A DBMS stores vast quantities of data
❖ Data is stored on external storage devices and fetched
into main memory as needed for processing
❖ A page is the unit of information read from or written to
disk (in a DBMS, a page may be 8 KB or more).
❖ Data on external storage devices:
▪ Disks: Can retrieve a random page at fixed cost
But reading several consecutive pages is much cheaper than reading
them in random order
▪ Tapes: Can only read pages in sequence
Cheaper than disks; used for archival storage
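A rough sketch of why consecutive pages are cheaper to read than random ones. The timing constants and function names below are illustrative assumptions, not figures from the lecture:

```python
# Hypothetical timing parameters (assumptions, not from the lecture).
SEEK_AND_ROTATE_MS = 10.0    # cost to position the disk head for one access
TRANSFER_MS_PER_PAGE = 0.1   # cost to transfer one page once positioned


def random_read_cost(num_pages: int) -> float:
    """Every page pays the positioning cost plus its own transfer."""
    return num_pages * (SEEK_AND_ROTATE_MS + TRANSFER_MS_PER_PAGE)


def sequential_read_cost(num_pages: int) -> float:
    """Consecutive pages pay the positioning cost once, then only transfers."""
    if num_pages == 0:
        return 0.0
    return SEEK_AND_ROTATE_MS + num_pages * TRANSFER_MS_PER_PAGE


random_100 = random_read_cost(100)
sequential_100 = sequential_read_cost(100)
```

Under these assumed constants the gap grows with the number of pages, since the positioning cost dominates each random access.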
Structure of a DBMS
❖ External storage access
▪ Disk space manager manages persistent data
[Figure: DBMS layers, including Query Optimization and Execution and Relational Operators; callout: "These layers must consider ..."]
Files versus Indices
❖ File organization:
▪ Method of arranging a file of records on external storage.
▪ A record id (rid) is sufficient to physically locate a record
❖ Indexes:
▪ Indexes are data structures that allow us to find the record ids
of records with given values in the index search key fields
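A minimal sketch of the two ideas above. It assumes a rid is a (page id, slot number) pair, which is a common convention but not stated in the lecture; the toy file and the names `Rid`, `fetch`, and `build_index` are hypothetical:

```python
from collections import defaultdict
from typing import NamedTuple


class Rid(NamedTuple):
    page_id: int  # which page of the file holds the record
    slot: int     # position of the record within that page


# A toy file: page_id -> list of records stored on that page.
pages = {
    0: [{"name": "bob", "age": 12}, {"name": "cal", "age": 11}],
    1: [{"name": "joe", "age": 12}, {"name": "sue", "age": 13}],
}


def fetch(rid: Rid) -> dict:
    """The rid alone is enough to locate the record."""
    return pages[rid.page_id][rid.slot]


def build_index(field: str) -> dict:
    """Map each search key value to the rids of records having that value."""
    index = defaultdict(list)
    for page_id, records in pages.items():
        for slot, record in enumerate(records):
            index[record[field]].append(Rid(page_id, slot))
    return index


age_index = build_index("age")
matches = [fetch(rid)["name"] for rid in age_index[12]]
```

Here `matches` holds the names of all records with age 12, found through the index rather than by scanning every page.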
4
File Organizations
▪ Heap (random order) files: Suitable when typical
access is a file scan retrieving all records.
Alternatives for Data Entry k* in Index
❖ Data Entry : Records stored in index file
▪ Given search key value k, provide for efficient retrieval of all
data entries k* with value k.
Alternatives for Data Entries
❖ Alternative 1: Full data record with key value k
❖ Alternatives 2 (<k, rid>) and 3 (<k, list-of-rids>):
▪ Data entries typically much smaller than data records.
❖ Comparison:
▪ Both better than Alternative 1 with large data records,
especially if search keys are small.
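The three alternatives can be sketched side by side on a tiny example. The records, the rids, and the choice of age as search key are illustrative assumptions:

```python
# rid -> full data record (name, age, sal); rids are hypothetical (page, slot) pairs.
records = {
    (0, 0): ("bob", 12, 10),
    (0, 1): ("cal", 11, 80),
    (1, 0): ("joe", 12, 20),
    (1, 1): ("sue", 13, 75),
}
AGE = 1  # position of the search key field inside a record

# Alternative 1: the data entry is the full data record itself.
alt1 = sorted(records.values(), key=lambda rec: rec[AGE])

# Alternative 2: one <k, rid> pair per record.
alt2 = sorted((rec[AGE], rid) for rid, rec in records.items())

# Alternative 3: one <k, list-of-rids> entry per distinct key value.
alt3 = {}
for k, rid in alt2:
    alt3.setdefault(k, []).append(rid)
```

Alternatives 2 and 3 store only keys and rids, so their entries stay small no matter how wide the data records are.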
Index Classification
Index Clustered vs Unclustered
❖ Observation 1:
▪ Alternative 1 implies clustered. True ?
❖ Observation 2:
▪ In practice, clustered also implies Alternative 1 (since
sorted files are rare).
❖ Observation 3:
▪ A file can be clustered on at most one search key.
❖ Observation 4:
▪ Cost of retrieving data records through index varies
greatly based on whether index is clustered or not !!
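Observation 4 can be illustrated with a small simulation. This is a sketch under simplifying assumptions: a hypothetical page capacity, no buffering, and an unclustered layout modeled as a random permutation of the rids:

```python
import random

RECORDS_PER_PAGE = 10  # hypothetical page capacity


def pages_touched(rids_in_key_order):
    """Count page fetches when following rids in key order.

    Pessimistic assumption: no buffering, so a fetch happens whenever
    the page differs from the previously fetched one.
    """
    fetches, last_page = 0, None
    for page_id, _slot in rids_in_key_order:
        if page_id != last_page:
            fetches += 1
            last_page = page_id
    return fetches


num_records = 1000
# Clustered: the i-th record in key order sits at position i in the file.
clustered = [(i // RECORDS_PER_PAGE, i % RECORDS_PER_PAGE) for i in range(num_records)]
# Unclustered: the same rids, but key order is unrelated to file order.
unclustered = clustered[:]
random.Random(0).shuffle(unclustered)

# A range selection matching 200 consecutive keys.
clustered_cost = pages_touched(clustered[100:300])
unclustered_cost = pages_touched(unclustered[100:300])
```

In the clustered case the 200 matching records occupy 20 consecutive pages, while in the unclustered case nearly every matching record can land on a different page.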
Clustered vs. Unclustered Index
[Figure: CLUSTERED vs. UNCLUSTERED index; index entries direct the search for data entries]
❖ Use Alternative (2) for data entries
❖ Data records are stored in Heap file.
▪ To build clustered index, first sort the Heap file
▪ Overflow pages may be needed for inserts.
▪ Thus, the order of data records is close to (not identical to) the sort order.
[Figure: tree-structured index; non-leaf pages on top, leaf pages (sorted by search key) at the bottom]
❖ Index leaf pages contain data entries, and are chained (prev & next)
❖ Index non-leaf pages have index entries; only used to direct searches:
Index entry layout: P0 K1 P1 K2 P2 ... Km Pm
Example B+ Tree
Note how data entries in leaf level are sorted
Root: 17
Second level: [5 13] [27 30]
Leaf level (data entries): 2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*
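A hedged sketch of how a search descends this example tree. The leaf boundaries are inferred from the separator keys (they are not spelled out in the extracted slide), and the convention that a key equal to a separator goes to the right child is an assumption consistent with 5* appearing after separator 5:

```python
from bisect import bisect_right


def node(keys, children):
    """A non-leaf page: m keys K1..Km and m+1 child pointers P0..Pm."""
    return {"keys": keys, "children": children}


# Leaf pages (assumed grouping, inferred from the separator keys).
leaves = [[2, 3], [5, 7, 8], [14, 16], [22, 24], [27, 29], [33, 34, 38, 39]]
tree = node([17], [node([5, 13], leaves[0:3]), node([27, 30], leaves[3:6])])


def search(root, k):
    """Follow index entries down to a leaf; return (found, leaf)."""
    current = root
    while isinstance(current, dict):
        # P0 covers k < K1; Pi covers Ki <= k < K(i+1); Pm covers k >= Km.
        current = current["children"][bisect_right(current["keys"], k)]
    return k in current, current


found_7, leaf_7 = search(tree, 7)
found_15, leaf_15 = search(tree, 15)
```

The search for 7 goes left at the root (7 < 17), then to the middle child of [5 13], and finds 7* in that leaf. The search for 15 ends in the leaf holding 14* and 16*, where no matching entry exists.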
Hash-Based Indexes
❖ Index is a collection of buckets.
▪ Bucket = primary page plus zero or more overflow pages.
▪ Buckets contain data entries.
❖ Hashing function h:
▪ h(r) = bucket in which data entry for record r belongs.
▪ h looks at search key fields of r.
▪ No need for “index entries” due to one-level index file
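The bucket structure described above can be sketched as follows. The page capacity, the modulo hash function on integer keys, and the class names are all illustrative assumptions:

```python
PAGE_CAPACITY = 2  # hypothetical number of data entries per page


class Bucket:
    def __init__(self):
        # pages[0] is the primary page; any further pages are overflow pages.
        self.pages = [[]]

    def insert(self, entry):
        if len(self.pages[-1]) == PAGE_CAPACITY:
            self.pages.append([])  # chain a new overflow page
        self.pages[-1].append(entry)

    def entries(self):
        return [entry for page in self.pages for entry in page]


class HashIndex:
    def __init__(self, num_buckets):
        self.buckets = [Bucket() for _ in range(num_buckets)]

    def h(self, key):
        # Assumed hash function over an integer search key.
        return key % len(self.buckets)

    def insert(self, key, rid):
        self.buckets[self.h(key)].insert((key, rid))

    def lookup(self, key):
        # One hash computation picks the bucket; no index entries are traversed.
        return [rid for k, rid in self.buckets[self.h(key)].entries() if k == key]


index = HashIndex(num_buckets=4)
for rid, key in enumerate([4, 8, 12, 5]):
    index.insert(key, rid)
```

Keys 4, 8, and 12 all hash to bucket 0 in this sketch, so the third insert spills into an overflow page.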
Understanding the Workload
❖ For each query in workload:
▪ Which relations does it access?
▪ Which attributes are retrieved?
▪ Which attributes are involved in selection/join conditions?
▪ How selective are these conditions likely to be?
Choice of Indexes
❖ What indexes should we create?
▪ Which relations should have indexes?
▪ What field(s) should be the search key?
▪ Should we build several indexes?
Choice of Indexes: One Approach
▪ Consider most important queries in turn.
Choice of Indexes: Simple Approach
Index Selection Guidelines
❖ Attributes in WHERE clause are candidates for index keys.
Examples of Clustered Indexes
SELECT E.dno
FROM Emp E
WHERE E.age>40
❖ Trade-offs :
▪ How selective is the condition?
(all > 40?) or (only some > 40)
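The selectivity trade-off can be made concrete with a back-of-the-envelope estimate. The formulas below are rough, commonly used approximations that ignore the cost of traversing the index itself, and the file sizes are hypothetical:

```python
import math


def scan_cost(num_pages):
    """A full file scan reads every page once."""
    return num_pages


def clustered_index_cost(num_pages, selectivity):
    """Matching records are contiguous, so roughly that fraction of the pages."""
    return math.ceil(selectivity * num_pages)


def unclustered_index_cost(num_records, selectivity):
    """Worst case: one page fetch per matching record."""
    return math.ceil(selectivity * num_records)


NUM_PAGES = 1_000        # hypothetical size of Emp in pages
NUM_RECORDS = 100_000    # hypothetical number of Emp records

# Many employees are over 40 (10% match) vs. only a few (0.1% match).
many_unclustered = unclustered_index_cost(NUM_RECORDS, 0.1)
few_unclustered = unclustered_index_cost(NUM_RECORDS, 0.001)
many_clustered = clustered_index_cost(NUM_PAGES, 0.1)
```

Under these assumptions, an unclustered index on age loses to a plain file scan when many records match, but wins when only a few do. A clustered index stays cheaper than the scan in both cases.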
Examples of Clustered Indexes
SELECT E.dno
FROM Emp E
WHERE E.hobby='Stamps'
Indexes with Composite Search Keys
Composite Search Keys: Search on a combination of fields (sal and age).
Data records sorted by name:
name  age  sal
bob   12   10
cal   11   80
joe   12   20
sue   13   75
Data entries in index, by search key:
<age, sal>: 11,80  12,10  12,20  13,75
<age>:      11  12  12  13
<sal, age>: 10,12  20,12  75,13  80,11  (sorted by <sal,age>)
<sal>:      10  20  75  80  (sorted by <sal>)
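The data entries in the figure can be reproduced by sorting the four records on each search key. This is a minimal sketch; Python's tuple comparison stands in for the index's ordering on composite keys:

```python
# (name, age, sal) records from the figure.
records = [("bob", 12, 10), ("cal", 11, 80), ("joe", 12, 20), ("sue", 13, 75)]

# Composite keys: tuples compare field by field, left to right.
age_sal = sorted((age, sal) for _name, age, sal in records)
sal_age = sorted((sal, age) for _name, age, sal in records)

# Single-field keys.
age_only = sorted(age for _name, age, _sal in records)
sal_only = sorted(sal for _name, _age, sal in records)
```

The same four records yield a different entry order for each choice of search key, which is why the field order in a composite key matters.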
Equality and Composite Search Keys
Ranges and Composite Search Keys
Examples of composite key indexes using lexicographic order.
❖ Range query: Some field value is not a constant.
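A sketch of how lexicographic order affects a query with an equality on one field and a range on the other. The query (age = 12 AND sal > 10) and the helper names are hypothetical, and sorted Python lists stand in for the leaf level of each index:

```python
import math
from bisect import bisect_right

age_sal = [(11, 80), (12, 10), (12, 20), (13, 75)]  # entries sorted by <age, sal>
sal_age = [(10, 12), (20, 12), (75, 13), (80, 11)]  # entries sorted by <sal, age>


def via_age_sal(entries, age, min_sal):
    """age = constant, sal > min_sal: the matches form one contiguous run."""
    lo = bisect_right(entries, (age, min_sal))   # first entry after (age, min_sal)
    hi = bisect_right(entries, (age, math.inf))  # end of the run for this age
    return entries[lo:hi], hi - lo               # (matches, entries examined)


def via_sal_age(entries, age, min_sal):
    """Same query: the sal range is contiguous, but age must be filtered."""
    lo = bisect_right(entries, (min_sal, math.inf))
    scanned = entries[lo:]
    return [(s, a) for s, a in scanned if a == age], len(scanned)


matches_1, examined_1 = via_age_sal(age_sal, age=12, min_sal=10)
matches_2, examined_2 = via_sal_age(sal_age, age=12, min_sal=10)
```

Both indexes find the single matching record, but in this sketch the <age, sal> index examines only that entry, while the <sal, age> index examines every entry with sal > 10.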
Index-Only Plans
❖ Answer a query without retrieving actual tuples …
❖ Is that possible?
SELECT E.dno, MIN(E.sal)
FROM Emp E
GROUP BY E.dno
▪ Index on <E.dno>? On <E.sal>? On <E.dno,E.sal>?
PROS:
+ The chance for index-only evaluation is increased.
CONS:
- Index size is larger.
- Slower update response: an update to any field in the search key must also update the index.
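A minimal sketch of an index-only evaluation of the query above, assuming a tree index on <dno, sal> whose sorted data entries are available as a list. The entry values are made up for illustration:

```python
from itertools import groupby

# Data entries of a hypothetical tree index on <dno, sal>, in sorted order.
dno_sal_entries = sorted([(2, 50), (1, 30), (1, 10), (3, 70), (2, 40)])

# Entries for one dno are adjacent and sorted by sal, so the first entry
# of each group carries MIN(sal). No data record is ever fetched.
min_sal_by_dno = {
    dno: next(group)[1]
    for dno, group in groupby(dno_sal_entries, key=lambda entry: entry[0])
}
```

An index on <dno> alone would lack the sal values, and one on <sal> alone would lack dno, so in this sketch only the composite key carries everything the query needs.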
Index-Only Plans
❖ Tree index on <dno,age>, or on <age,dno>:
SELECT E.dno, COUNT (*)
FROM Emp E
WHERE E.age=30
GROUP BY E.dno
❖ Which is better?
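One way to explore the question is to count how many data entries each index must examine. This is a sketch with made-up employee data; sorted lists stand in for the leaf levels of the two tree indexes:

```python
import math
from bisect import bisect_left, bisect_right
from collections import Counter

# Hypothetical (dno, age) pairs for the Emp relation.
emps = [(1, 30), (1, 45), (2, 30), (2, 30), (3, 25), (3, 30)]

age_dno = sorted((age, dno) for dno, age in emps)  # index on <age, dno>
dno_age = sorted(emps)                             # index on <dno, age>


def count_with_age_dno(entries, age):
    """age = constant selects one contiguous run, already grouped by dno."""
    lo = bisect_left(entries, (age, -math.inf))
    hi = bisect_right(entries, (age, math.inf))
    return Counter(dno for _age, dno in entries[lo:hi]), hi - lo


def count_with_dno_age(entries, age):
    """Matching entries are scattered across dno groups: scan every entry."""
    return Counter(dno for dno, a in entries if a == age), len(entries)


counts_1, examined_1 = count_with_age_dno(age_dno, 30)
counts_2, examined_2 = count_with_dno_age(dno_age, 30)
```

Both plans are index-only and return the same counts, but in this sketch the <age,dno> index examines only the entries with age 30, which suggests it is the better fit for an equality condition on age.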