B-Tree Indexes
Michael J. Corey, Michael Abbey, Daniel J. Dechichio, Ian Abramson: Oracle8 Tuning.
Osborne/ORACLE Press, 1998, ISBN 0-07-882390-0, 608 pages, ca. $44.99.
Oracle 8i Concepts, Release 2 (8.1.6), Oracle Corporation, 1999, Part No. A76965-01.
Page 10-23 ff: Indexes
Oracle 8i Designing and Tuning for Performance, Release 2 (8.1.6), Oracle Corporation,
1999, Part No. A76992-01. 12. Data Access Methods.
Oracle 8i SQL Reference, Release 2 (8.1.6), Oracle Corp., 1999, Part No. A76989-01.
CREATE INDEX, page 7-291 ff.
Oracle8 Administrator's Guide, Release 8.0, Oracle Corp., 1997, Part No. A58397-01.
Appendix A: Space Estimations for Schema Objects.
Universität Halle, 2005
4. B-Tree Indexes
Objectives
After completing this chapter, you should be able to:
write a short paragraph about what indexes are.
explain the B-tree data structure.
decide whether a given index is useful for a given
query, select good indexes for an application.
explain why indexes also have disadvantages.
enumerate input data about the application that is
necessary for physical database design.
write CREATE INDEX statements in Oracle SQL.
Stefan Brass: Datenbanken II B
Overview
1. Motivation
2. B-Trees
3. Query Evaluation with Indexes, Index Selection
4. Index Declaration, Data Dictionary
5. Physical Design Summary
Motivation (1)
Consider a table with information about customers:
CUSTOMERS(CUSTNO, FIRST_NAME, LAST_NAME, STREET,
CITY, STATE, ZIP, PHONE, EMAIL)
CUSTOMERS
CUSTNO   FIRST_NAME  LAST_NAME
1000001  John        Smith
1000002  Ann         Miller
1000003  David       Meyer
...      ...         ...
Motivation (2)
Suppose that a specific customer record is queried:
SELECT *
FROM   CUSTOMERS
WHERE  CUSTNO = 1000002
If there are no special access structures, the query
is executed with a full table scan:
for each row C in CUSTOMERS do
if C.CUSTNO = 1000002 then print C; fi
od
I.e., all 2 million rows are read from disk.
Motivation (3)
Average lengths: CUSTNO 5, FIRST_NAME 7, LAST_NAME 7,
STREET 20, CITY 10, STATE 2, ZIP 5, PHONE 10, EMAIL 20.
Then an average row needs 100 bytes:
86 bytes for the data, 9 for the column lengths, 3 for the row header, and 2
for the row directory entry.
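The row size estimate can be re-computed as a quick check. The following is only a sketch of the arithmetic on this slide; the variable names are made up for illustration.

```python
# Average column lengths in bytes, taken from the slide.
avg_len = {
    "CUSTNO": 5, "FIRST_NAME": 7, "LAST_NAME": 7, "STREET": 20,
    "CITY": 10, "STATE": 2, "ZIP": 5, "PHONE": 10, "EMAIL": 20,
}

data_bytes = sum(avg_len.values())   # column data
length_bytes = len(avg_len)          # one length byte per column
ROW_HEADER = 3                       # row header
ROW_DIR_ENTRY = 2                    # row directory entry in the block

row_bytes = data_bytes + length_bytes + ROW_HEADER + ROW_DIR_ENTRY
print(data_bytes, row_bytes)
```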
Motivation (4)
Even if the entire table is stored in one extent of
contiguous blocks, a full table scan will need about
12 seconds.
This assumes that the disk reads 20 MB/s in a sequential scan.
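A rough lower bound for the scan time follows from the row data alone. The sketch below uses the 2 million rows and the 100-byte average row from the example; it ignores block headers and free space inside the blocks, which presumably account for the difference to the 12 seconds stated above.

```python
ROWS = 2_000_000          # table size from the example
ROW_BYTES = 100           # average row size from the example
SCAN_RATE = 20 * 10**6    # 20 MB/s sequential read, as assumed above

# Time to read only the row data, without any per-block overhead.
raw_seconds = ROWS * ROW_BYTES / SCAN_RATE
print(raw_seconds)
```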
Motivation (5)
One must consider not only a single query run in
isolation, but the entire system load.
Suppose that 100 employees enter orders in parallel, and for each order the customer data must be
accessed.
Since the DBMS (using the full table scan) can
process only five queries per minute, each employee
can only enter one order every 20 minutes.
If two full table scans run interleaved, the disk head has to move back and
forth, and the total time will be more than twice the time needed
for a single full table scan.
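The throughput figures can be verified with a short calculation. This sketch assumes, as the text does, that the scans are processed one after the other and that each takes about 12 seconds.

```python
SCAN_SECONDS = 12     # time for one full table scan
EMPLOYEES = 100       # employees entering orders in parallel

queries_per_minute = 60 / SCAN_SECONDS
# Each employee has to wait until all others have been served once.
minutes_per_order = EMPLOYEES / queries_per_minute
print(queries_per_minute, minutes_per_order)
```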
Motivation (6)
DB systems offer special data structures (indexes)
that make it possible to find all rows with a given attribute
value without reading and checking every row.
Consider how useful an index is in a book:
It is the only way to find all occurrences of a keyword without reading the entire text.
A typical B-tree index in a DBMS is very similar:
A (sorted) list of all occurring values for a specific
column together with references to the rows that
contain the respective value.
Motivation (7)
Indexes are sometimes called inverted files:
The heap file that contains the table data supports the mapping from ROWIDs to attribute
values. An index supports the inverse mapping, from attribute values to ROWIDs.
A file containing a text document maps positions in the text to
words (i.e. it defines what is the first word, the second word, and
so on).
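The "inversion" can be illustrated with two dictionaries. This is only a toy sketch: the ROWIDs are made-up small integers, the fourth row is invented to get a duplicate name, and the function name invert is hypothetical.

```python
from collections import defaultdict

# Heap file: ROWID -> attribute values (ROWIDs are made-up integers).
heap = {
    1: {"CUSTNO": 1000001, "LAST_NAME": "Smith"},
    2: {"CUSTNO": 1000002, "LAST_NAME": "Miller"},
    3: {"CUSTNO": 1000003, "LAST_NAME": "Meyer"},
    4: {"CUSTNO": 1000004, "LAST_NAME": "Smith"},   # invented row
}

def invert(heap, column):
    """Build the inverse mapping: attribute value -> list of ROWIDs."""
    index = defaultdict(list)
    for rowid, row in heap.items():
        index[row[column]].append(rowid)
    return dict(index)

by_name = invert(heap, "LAST_NAME")
print(by_name["Smith"])
```

A real index additionally keeps the values sorted, so that a value can be found without looking at all entries.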
Motivation (8)
In order to solve the example query, the DBMS
will first search the index over CUSTOMERS(CUSTNO)
(e.g. 4 block accesses).
In the index entry for the given customer number
1000002, it will find the ROWID of the requested
CUSTOMERS row.
Finally, it reads the row with this ROWID from the
CUSTOMERS table (1 block access, 2 if row migrated).
In total, the query is executed in about 50 msec.
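The estimate of about 50 msec is consistent with roughly 10 msec per random block access. That per-access time is an assumption made for this sketch; it is not stated in the text.

```python
RANDOM_ACCESS_MS = 10   # assumed time per random block access (not from the text)
INDEX_BLOCKS = 4        # block accesses in the index
TABLE_BLOCKS = 1        # 2 if the row has migrated

def lookup_ms(index_blocks, table_blocks, access_ms=RANDOM_ACCESS_MS):
    """Estimated time for one index lookup plus fetching the row."""
    return (index_blocks + table_blocks) * access_ms

print(lookup_ms(INDEX_BLOCKS, TABLE_BLOCKS))   # normal row
print(lookup_ms(INDEX_BLOCKS, 2))              # migrated row
```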
[Figure: diagram with the labels "Conceptual Schema" and "New Translation"]
Overview
1. Motivation
2. B-Trees
B+-Trees: Overview
The usual data structure for an index is the B+-tree.
Every modern DBMS contains some variant of B-trees plus maybe
other index structures for special applications.
[Figure: example tree with the nodes 10, 7, 2, 9, 15, 12, 17; labels: Root, Leaf]
[Figure: block with the key values 150, 390, 562, 785 and pointers to Block 2, Block 3, Block 4, Block 5, Block 6]
B+-Trees: Structure (2)
The blocks in the lowest level (leaf blocks) contain all occurring customer numbers (in ordered sequence) together with the address (ROWID) of
the corresponding CUSTOMERS row.
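B+-Trees: Structure (3)
[Figure: a branch block pointing to leaf blocks with the entries (230, 390), (410, 540), (670, 780); each leaf entry points to the CUSTOMERS row with that CUSTNO; the rows are stored in the order 390, 780, 540, 670, 230, 410]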
It is also possible to create B-trees over the combination of two or more columns.
Then the indexed values are basically the concatenation of column
values (with e.g. a separator character).
B+-Trees: Structure (5)
[Figure: a branch block pointing to leaf blocks with the entries Brass, Brown, Meyer, Smith; the leaf entries point to rows of CUSTOMERS, whose LAST_NAME column contains Smith, Brass, Meyer, Smith, Brown, Brass]
[Figure: leaf blocks with the entries (230, 390), (410, 540), (670, 780)]
A tree of height 4 requires 5 (possibly 6) block accesses to get the row for a given customer number:
four for the index and one for fetching the row from the table with
the ROWID obtained from the index. In case of a migrated row
(which should be rare), 2 block accesses are needed for fetching the row.
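The counting of block accesses can be made concrete with a toy search routine. This is a minimal sketch, not the implementation used by any DBMS: each Python object stands for one block, the classes Leaf and Branch are hypothetical, and the ROWIDs are made-up strings. The leaf contents follow the example values 230 to 780 used in the figures.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys, rowids):
        self.keys, self.rowids = keys, rowids

class Branch:
    def __init__(self, keys, children):
        # children[i] holds keys < keys[i]; children[-1] holds the rest.
        self.keys, self.children = keys, children

def search(root, key):
    """Return (ROWID or None, number of index blocks visited)."""
    node, accesses = root, 1
    while isinstance(node, Branch):
        node = node.children[bisect_right(node.keys, key)]
        accesses += 1
    if key in node.keys:
        return node.rowids[node.keys.index(key)], accesses
    return None, accesses

root = Branch([410, 670], [
    Leaf([230, 390], ["r230", "r390"]),
    Leaf([410, 540], ["r410", "r540"]),
    Leaf([670, 780], ["r670", "r780"]),
])

rowid, index_accesses = search(root, 540)
print(rowid, index_accesses + 1)   # +1 block access for fetching the row
```

The number of index blocks visited equals the height of the tree, independent of the searched key.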
B+-Trees: Performance (3)
B+-Trees: Performance (5)
[Figure: a branch block pointing to leaf blocks with the entries Brass, Brown and Meyer, Smith]
Overview
1. Motivation
2. B-Trees
Only rows satisfying both conditions are retrieved from the table.
[Figure: operator tree with the join condition A=B]
[Figure: labels "*", "INVOICES", "AMOUNT >= 10000", "AMOUNT"]
[Figure: operator tree with the join condition A=B]
The DBMS could sort the ROWIDs before accessing the base table, but if the number of ROWIDs
is large, this is also an expensive operation.
Oracle sorts ROWIDs only for intersecting them.
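Intersecting two ROWID lists is cheap once both are sorted, because a single merge pass suffices. The following sketch shows the general technique only; it is not a description of how Oracle implements it, and the ROWIDs are made-up integers.

```python
def intersect_sorted(a, b):
    """Merge-style intersection of two sorted ROWID lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# ROWIDs delivered by two indexes for two conditions (made-up values).
rowids_cond1 = [42, 7, 19, 3, 88]
rowids_cond2 = [19, 5, 88, 61]

candidates = intersect_sorted(sorted(rowids_cond1), sorted(rowids_cond2))
print(candidates)
```

Only the rows in the result have to be fetched from the table.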
If, however, only one row fits in each block, accessing 3% of the rows via the index is
advantageous even with completely random accesses.
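How many blocks must be read depends strongly on the number of rows per block. The sketch below estimates the expected fraction of table blocks that contain at least one qualifying row. It assumes that the qualifying rows are distributed independently and uniformly over the blocks; the value of 20 rows per block is chosen only for illustration.

```python
def fraction_of_blocks_touched(selectivity, rows_per_block):
    """Expected fraction of blocks holding at least one qualifying row,
    assuming qualifying rows are spread independently and uniformly."""
    return 1 - (1 - selectivity) ** rows_per_block

for rows_per_block in (1, 20):
    frac = fraction_of_blocks_touched(0.03, rows_per_block)
    print(rows_per_block, round(frac, 3))
```

With one row per block, only 3% of the blocks are read. With many rows per block, a large part of the table is read via random accesses, and the full table scan becomes competitive.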
In extreme cases, the time limit built into most query optimizers might lead to a bad query evaluation
plan when there are too many possibilities.
Thus, the indexed column must contain many different values, e.g. not only male and female.
It might confuse the optimizer if values are not equally distributed.
For indexes on attribute combinations, put the attribute first that is also used without the other.
An index on (A, B) can be used like an index on A, but not like an
index on B. The performance is slightly worse than the index only on A
(and one loses the order of the ROWIDs), but one would usually not
declare an index on (A, B) and one on A.
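Why an index on (A, B) helps for conditions on A, but not for conditions on B alone, can be seen from the sort order of its entries. The sketch below models the index as a sorted list of (A, B, ROWID) tuples; all values, ROWIDs, and function names are made up for illustration.

```python
from bisect import bisect_left

# Index on (A, B): entries are (A, B, ROWID) tuples kept in sorted order.
index_ab = sorted([
    ("Brass", "Halle", 6), ("Brass", "Berlin", 2), ("Brown", "Halle", 5),
    ("Meyer", "Bonn", 3), ("Smith", "Halle", 1), ("Smith", "Bonn", 4),
])

def lookup_by_a(index, a):
    """Range scan: all entries with the given A are adjacent in the index."""
    start = bisect_left(index, (a,))
    result = []
    for entry in index[start:]:
        if entry[0] != a:
            break
        result.append(entry[2])
    return result

def lookup_by_b(index, b):
    """No usable order on B alone: every entry must be inspected."""
    return [rowid for (_, eb, rowid) in index if eb == b]

print(lookup_by_a(index_ab, "Smith"))
print(lookup_by_b(index_ab, "Halle"))
```

Note that the ROWIDs for one value of A come back ordered by B, not by ROWID.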
Overview
1. Motivation
2. B-Trees
3. Query Evaluation with Indexes, Index Selection
CREATE [UNIQUE] INDEX Name ON Table (Column [, Column ...])
DROP INDEX Name
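The basic forms of CREATE INDEX and DROP INDEX can be tried out without an Oracle installation. The sketch below uses SQLite (via Python's sqlite3 module) as a stand-in, since it accepts the same basic statements; the index names I_CUSTNO and I_NAME are made up, and Oracle-specific options are not covered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (CUSTNO INTEGER, FIRST_NAME TEXT, LAST_NAME TEXT)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
    [(1000001, "John", "Smith"), (1000002, "Ann", "Miller"),
     (1000003, "David", "Meyer")],
)

# A unique index on one column and a non-unique index on two columns.
conn.execute("CREATE UNIQUE INDEX I_CUSTNO ON CUSTOMERS (CUSTNO)")
conn.execute("CREATE INDEX I_NAME ON CUSTOMERS (LAST_NAME, FIRST_NAME)")

# Ask SQLite (not Oracle) how it would evaluate the example query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM CUSTOMERS WHERE CUSTNO = 1000002"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)

conn.execute("DROP INDEX I_NAME")
remaining = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name")]
print(remaining)
```

The query result is the same with and without the index; only the evaluation plan changes.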
AVG_DATA_BLOCKS_PER_KEY: Average number of data blocks that contain rows with the same attribute value.
I.e. we must expect that many block accesses to the table for a
given column value. Again, ANALYZE TABLE must be used.
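The meaning of this statistic can be illustrated by computing it for a small sample. The sketch follows the definition given in the text; it does not claim to reproduce how Oracle computes the value internally. The sample data and block numbers are made up.

```python
from collections import defaultdict

def avg_data_blocks_per_key(rows):
    """rows: iterable of (key value, block number) pairs, one per table row."""
    blocks = defaultdict(set)
    for key, block in rows:
        blocks[key].add(block)
    return sum(len(b) for b in blocks.values()) / len(blocks)

# Made-up sample: (LAST_NAME, number of the block that stores the row).
sample = [("Smith", 1), ("Smith", 1), ("Smith", 4),
          ("Brass", 2), ("Meyer", 3), ("Meyer", 5)]
print(avg_data_blocks_per_key(sample))
```

Two rows with the same value in the same block count only once, since one block access fetches both.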
Overview
1. Motivation
2. B-Trees
3. Query Evaluation with Indexes, Index Selection
4. Index Declaration, Data Dictionary
Experimental Approach
Since these parameters are difficult to estimate and
change over time, one must be prepared to repeat
the physical design step from time to time.
Creating a new index is simple in relational systems. However, if one
has to buy entirely new hardware because performance criteria are not
met, one has a problem. Thus, it is important to think about realistic
system loads during the design.
Outlook (1)
Further Oracle Data Structures:
Clusters for storing table data
This makes it possible to store rows with the same attribute value together. It is
even possible to store rows from different tables in one cluster (which makes
joins very fast).
Hash clusters
Here, the block (storage position) is computed from the column value.
This is the fastest possible access for conditions of the form A = c,
but it is less flexible than a B-tree.
Bitmap indexes
Good for columns with few different values: for each row and each
possible value there is one bit.
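The idea of a bitmap index can be sketched with Python integers used as bit strings. This is a toy model only (real bitmap indexes are compressed and stored in blocks); the column values and function names are made up.

```python
def build_bitmaps(values):
    """One bitmap (a Python int) per distinct value; bit i is set
    if row i has that value."""
    bitmaps = {}
    for i, v in enumerate(values):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << i)
    return bitmaps

def rows_of(bitmap):
    """Row positions whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

state = ["CA", "NY", "CA", "TX", "NY", "CA"]   # made-up column values
sex = ["f", "m", "m", "f", "f", "m"]

state_bm, sex_bm = build_bitmaps(state), build_bitmaps(sex)
both = state_bm["CA"] & sex_bm["m"]            # AND of two conditions
print(rows_of(both))
```

Conditions on several such columns can be combined with bitwise operations before any table row is accessed.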
Outlook (2)
Further Oracle Data Structures, continued:
Index-organized tables
Instead of ROWIDs, the index contains the complete rows.
Function-based indexes
Instead of indexing an attribute value, the search key of the index can
be a function of the row.
Object-relational features
E.g. non-first normal form tables: Table entries can be arrays.
Outlook (3)
The literature contains many more data structures
for indexes:
E.g. there are special indexes for geometric data,
where one can search all points in a given rectangle, the nearest point to a given point, etc.
In general, an index allows special ways to compute
certain parameterized queries. E.g. a hash index on
R(A) supports
SELECT ROWID FROM R WHERE A=:1
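A hash index in this sense can be sketched as a fixed array of buckets, where the bucket is computed from the value of A. The class below is a toy model with hypothetical names; the contents of R(A) are made up, and the ROWIDs are simply row positions.

```python
class HashIndex:
    """Toy hash index on R(A): the bucket is computed from the value of A."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, a):
        return self.buckets[hash(a) % len(self.buckets)]

    def insert(self, a, rowid):
        self._bucket(a).append((a, rowid))

    def lookup(self, a):
        """Corresponds to: SELECT ROWID FROM R WHERE A = :1"""
        return [rowid for (value, rowid) in self._bucket(a) if value == a]

idx = HashIndex()
for rowid, a in enumerate([17, 42, 17, 99]):   # made-up contents of R(A)
    idx.insert(a, rowid)
print(idx.lookup(17))
```

Since neighboring values end up in unrelated buckets, such an index supports only equality conditions, not range conditions like A >= c.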