Index Architecture: Febriliyan Samopa

The document provides an overview of index architecture, including basic concepts, ordered and hash indices, index evaluation metrics, and hands-on examples. It discusses key aspects of index structures like search keys, index entries, ordered vs hash indices. It also covers ordered index structures like dense and sparse indices, primary and secondary indices, and multilevel indices. The document concludes with explaining B+ tree index files, their structure and properties of leaf and non-leaf nodes.
Index Architecture

Febriliyan Samopa
Basic Concepts

• Indexes are used to speed up record retrieval in response to certain search conditions.
• Index structures provide secondary access paths.
• Search key : an attribute or set of attributes (fields) used to look up records in a file.
• Any field can be used to create an index.
• Multiple indexes can be constructed.
Basic Concepts
• An index file consists of records (called index entries) of the form :
Search Key | Pointer
• Index files are typically much smaller than the original file.
• Two basic kinds of indices:
• Ordered indices: search keys are stored in sorted
order
• Hash indices: search keys are distributed uniformly
across “buckets” using a “hash function”.
Index Evaluation Metrics
• Access types supported efficiently. E.g. :
• Records with a specified value in the
attribute.
• Records with an attribute value falling in a
specified range of values.
• Access time
• Insertion time
• Deletion time
• Space overhead
Hands On
• Create a copy of table Sales.SalesOrderDetail :

• Try to select a row from that table :
[Screenshot: SSMS query window. Callouts: the command that makes SSMS show the result's statistics in the Messages window; the tab to click to show the Messages window.]
Hands On
• Try to select the same row from the original table :
[Screenshot: SSMS query window. Callouts: the command that makes SSMS show the result's statistics in the Messages window; the tab to click to show the Messages window.]
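The screenshots for this exercise did not survive. As a rough stand-in, the sketch below reproduces the idea with Python's built-in sqlite3 module instead of SQL Server and SSMS (an assumption made only so the example is self-contained). The table and column names are hypothetical. A copy made with CREATE TABLE ... AS SELECT keeps the rows but not the primary key or any index, so the same lookup changes from an index search to a full scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for Sales.SalesOrderDetail; the primary key gives it an index.
conn.execute(
    "CREATE TABLE sales_order_detail ("
    "id INTEGER PRIMARY KEY, product TEXT, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_order_detail VALUES (?, ?, ?)",
    [(i, f"product-{i % 50}", i % 7) for i in range(1, 10001)],
)
# The copy keeps the rows but not the primary key, so it has no index at all.
conn.execute("CREATE TABLE sales_copy AS SELECT * FROM sales_order_detail")


def query_plan(table):
    """Return SQLite's plan text for looking up one row by id."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM {table} WHERE id = 5000"
    ).fetchall()
    return " ".join(row[-1] for row in rows)


plan_original = query_plan("sales_order_detail")
plan_copy = query_plan("sales_copy")
print("original:", plan_original)  # a SEARCH that uses the primary key
print("copy:    ", plan_copy)      # a SCAN over the whole table
```

In SSMS the equivalent comparison would be read from the statistics in the Messages window rather than from a plan string.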
Ordered Index
• An ordered index is similar to the index in a textbook.
• In an ordered index, index entries are stored
sorted on the search key value. E.g., author
catalog in library.
• Primary index: in a sequentially ordered file,
the index whose search key specifies the
sequential order of the file.
• Also called clustering index, if the search key
value is not unique for some records.
• The search key of a primary index is usually but
not necessarily the primary key.
Ordered Index
• Secondary index: an index whose search
key specifies an order different from the
sequential order of the file. Also called
non-clustering index.
• Index-sequential file: ordered sequential
data file with a primary index.
• A data file can only have one primary index
but can have many secondary indices.
Dense Index
• Dense index — Index record appears for
every search-key value in the file.
• E.g. index on ID attribute of instructor
relation below :
Dense Index
• Dense Index can be clustered.
• E.g. index on dept_name, with instructor
file sorted on dept_name below :
Sparse Index
• Sparse Index — Index records for only
some (not all) search-key values.
• Applicable when records are sequentially
ordered on search-key.
Sparse Index

• To locate a record with search-key value K, we need to :
• Find the index record with the largest search-key value ≤ K.
• Search the file sequentially starting at the record to which the index record points.
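The two lookup steps can be sketched as follows. This is a minimal illustration, assuming a sparse index with one entry per block and a small made-up instructor file; the block layout and the helper name `lookup` are hypothetical.

```python
from bisect import bisect_right

# Data file: blocks of (ID, name) records, sequentially ordered on ID (sample data).
blocks = [
    [(10101, "Srinivasan"), (12121, "Wu"), (15151, "Mozart")],
    [(22222, "Einstein"), (32343, "El Said"), (33456, "Gold")],
    [(45565, "Katz"), (58583, "Califieri"), (76543, "Singh")],
]
# Sparse index: one (least search key in block, block number) entry per block.
sparse_index = [(block[0][0], number) for number, block in enumerate(blocks)]
index_keys = [key for key, _ in sparse_index]


def lookup(k):
    """Return the name stored under search key k, or None if it is absent."""
    # Step 1: index record with the largest search-key value <= k.
    pos = bisect_right(index_keys, k) - 1
    if pos < 0:
        return None  # k is smaller than every key in the file
    # Step 2: scan the file sequentially from the block the index record points to.
    for block in blocks[sparse_index[pos][1]:]:
        for key, name in block:
            if key == k:
                return name
            if key > k:
                return None  # passed the position where k would be
    return None


print(lookup(32343))
print(lookup(20000))
```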
Sparse Index
Compared to dense indices:
• Less space and less maintenance overhead for
insertions and deletions.
• Generally slower than dense index for locating records.
• Good tradeoff : sparse index with an index entry for
every block in file, corresponding to least search-key
value in the block.
Secondary Index
• Frequently, one wants to find all the records
whose values in a certain field (which is not the
search-key of the primary index) satisfy some
condition :
• Example 1: In the instructor relation stored
sequentially by ID, we may want to find all
instructors in a particular department.
• Example 2: as above, but where we want to find all
instructors with a specified salary or with salary in a
specified range of values.
• We can have a secondary index with an index
record for each search-key value.
Secondary Index
• Secondary Index record points to a bucket that
contains pointers to all the actual records with
that particular search-key value.
• Secondary indices have to be dense.
Primary and Secondary Index
• Indices offer substantial benefits when searching for
records.
• BUT: Updating indices imposes overhead on
database modification – when a file is modified,
every index on the file must be updated.
• Sequential scan using primary index is efficient, but
a sequential scan using a secondary index is
expensive :
• Each record access may fetch a new block from
disk.
• Block fetch requires about 5 to 10 milliseconds,
versus about 100 nanoseconds for memory
access.
Multilevel Index
• If primary index does not fit in memory, access
becomes expensive.
• Solution : treat primary index kept on disk as a
sequential file and construct a sparse index on it :
• outer index – a sparse index of primary index
• inner index – the primary index file
• If even outer index is too large to fit in main
memory, yet another level of index can be
created, and so on.
• Indices at all levels must be updated on insertion
or deletion from the file.
Multilevel Index
Index Update - Deletion
• If deleted record was the only record
in the file with its particular search-key
value, the search-key is also deleted
from the index.
Index Update - Deletion
Single-level index deletion :
• Dense indices – deletion of search-key is
similar to file record deletion.
Index Update - Deletion
Single-level index deletion :
• Sparse indices :
• If no entry for the search key exists in the index, only the record in the file is deleted and the index is left unchanged.
• If an entry for the search key exists in the
index, it is deleted by replacing the entry in
the index with the next search-key value in
the file (in search-key order).
• If the next search-key value already has an
index entry, the entry is deleted instead of
being replaced.
Index Update - Deletion
Index Update - Insertion
Single-level index insertion:
• Perform a lookup using the search-key value
appearing in the record to be inserted.
• Dense indices – if the search-key value does not
appear in the index, insert it.
Example : insert row (33456, Gold, Physics, 87000) → a new index entry is created.
Index Update - Insertion
Single-level index insertion:
• Sparse indices – if index stores an entry
for each block of the file, no change
needs to be made to the index unless a
new block is created.
• If a new block is created, the first search-
key value appearing in the new block is
inserted into the index.
Index Update - Insertion
Example : every block can contain at most 4 rows.
[Figure callouts: insert row (58583, Califieri, History, 62000); insert row (83821, Brandt, Comp. Sci., 92000); new block; insert new index entry (76543).]
B+-Tree Index Files
B+-tree indices are an alternative to indexed-sequential files.
• Disadvantage of indexed-sequential files :
• Performance degrades as file grows, since many overflow
blocks get created.
• Periodic reorganization of entire file is required.
• Advantage of B+-tree index files :
• Automatically reorganizes itself with small, local, changes,
in the face of insertions and deletions.
• Reorganization of entire file is not required to maintain
performance.
• (Minor) disadvantage of B+-trees :
• Extra insertion and deletion overhead, space overhead.
• Advantages of B+-trees outweigh disadvantages thus
B+-trees are used extensively.
B+-Tree Index Files
Example of B+-Tree
B+-Tree Index Files
A B+-tree is a rooted tree satisfying the following
properties:
• All paths from root to leaf are of the same length
• Each node that is not a root or a leaf has between ⌈𝑛/2⌉ and 𝑛 children (𝑛 = number of pointers in a node).
• A leaf node has between ⌈(𝑛–1)/2⌉ and 𝑛–1 values.
• Special cases:
• If the root is not a leaf, it must have at least 2 children → 𝑛 ≥ 3.
• If the root is a leaf, it can have between 0 and (𝑛–1) values.
B+-Tree Node Structure
• Typical node : P1 | K1 | P2 | … | Pn–1 | Kn–1 | Pn
• Ki are the search-key values
• Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes).
• The search-keys in a node are ordered :
K1 < K2 < K3 < . . . < Kn–1
(Initially assume no duplicate keys, address duplicates later)
Leaf Nodes in B+-Trees
Properties of a leaf node:
• For i = 1, 2, . . ., n–1, pointer Pi points to a file record
with search-key value Ki.
• If Li , Lj are leaf nodes and i < j, Li’s search-key values
are less than or equal to Lj’s search-key values.
• Pn points to next leaf node in search-key order.
Non-Leaf Nodes in B+-Trees
• Non leaf nodes form a multi-level sparse index
on the leaf nodes.
• For a non-leaf node with n pointers:
• All the search-keys in the subtree to which P1 points
are less than K1.
• For 2 ≤ i ≤ n–1, all the search-keys in the subtree to
which Pi points have values greater than or equal to
Ki–1 and less than Ki.
• All the search-keys in the subtree to which Pn points
have values greater than or equal to Kn–1.
Example of B+-Tree

B+-tree for instructor file (𝑛 = 6)

• Leaf nodes must have between 3 and 5 values (⌈(𝑛–1)/2⌉ and 𝑛–1, with 𝑛 = 6).
• Non-leaf nodes other than root must have between 3 and 6 children (⌈𝑛/2⌉ and 𝑛 with 𝑛 = 6).
• Root must have at least 2 children.
Observations about B+-Tree
• Since the inter-node connections are done by pointers, “logically”
close blocks need not be “physically” close.
• The non-leaf levels of the B+-tree form a hierarchy of sparse
indices.
• The B+-tree contains a relatively small number of levels
• Level below root has at least 2 × ⌈𝑛/2⌉ values.
• Next level has at least 2 × ⌈𝑛/2⌉ × ⌈𝑛/2⌉ values.
• ... and so on.
• If there are K search-key values in the file, the tree height is no more than ⌈log(K)⌉, with the logarithm taken to base ⌈𝑛/2⌉.
• Thus searches can be conducted efficiently.
• Insertions and deletions to the main file can be handled
efficiently, as the index can be restructured in logarithmic time
(as we shall see).
Queries on B+-Trees
Find record with search-key value V :
1. C = root.
2. While C is not a leaf node :
   2.1. Let i be the least value so that V ≤ Ki.
   2.2. If no such i exists, set C = last non-null pointer in C.
   2.3. Else { if (V = Ki ) set C = Pi+1 else set C = Pi }
3. Let i be the least value so that Ki = V.
4. If there is such a value i, follow pointer Pi to the desired record.
5. Else no record with search-key value V exists.
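The find procedure can be sketched in a few lines. This is a minimal illustration, assuming unique keys and a hand-built two-level tree with sample instructor names; the `Node` class and the `leaf` helper are hypothetical, and the leaf-to-leaf pointer Pn is left out because the procedure does not use it.

```python
from dataclasses import dataclass


@dataclass
class Node:
    keys: list       # K1..Km in sorted order
    pointers: list   # non-leaf: m+1 child nodes; leaf: one record per key
    leaf: bool = False


def find(root, v):
    """Return the record stored under search key v, or None if absent."""
    c = root
    while not c.leaf:
        # Least i such that v <= Ki (0-based position in c.keys).
        i = next((j for j, k in enumerate(c.keys) if v <= k), None)
        if i is None:
            c = c.pointers[-1]       # last non-null pointer in C
        elif v == c.keys[i]:
            c = c.pointers[i + 1]    # V = Ki, so follow Pi+1
        else:
            c = c.pointers[i]        # V < Ki, so follow Pi
    for k, record in zip(c.keys, c.pointers):
        if k == v:
            return record
    return None


def leaf(*names):
    """Build a leaf whose record pointers are placeholder strings."""
    return Node(list(names), [f"row:{name}" for name in names], leaf=True)


root = Node(
    keys=["Einstein", "Mozart", "Srinivasan"],
    pointers=[
        leaf("Brandt", "Califieri"),
        leaf("Einstein", "Gold"),
        leaf("Mozart", "Singh"),
        leaf("Srinivasan", "Wu"),
    ],
)
print(find(root, "Singh"))
print(find(root, "Adams"))
```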
Queries on B+-Trees Example
[Animated figure: the find procedure traced for the row with search-key value V = 'Singh'. Recoverable callouts: C = root; let i be the least value so that V ≤ Ki; V ('Singh') is compared with Ki ('Srinivasan'); if no such i exists, follow the last non-null pointer in C; C is a leaf; the row with value 'Singh' is reached through the pointer Pi of the leaf.]
Handling Duplicates
• With duplicate search keys in both leaf and
internal nodes :
• Cannot guarantee that K1 < K2 < K3 < . . . < Kn–1.
• But can guarantee K1 ≤ K2 ≤ K3 ≤ . . . ≤ Kn–1.
• Search-keys (V) in the subtree to which Pi points satisfy :
• V ≤ Ki but not necessarily V < Ki.
• To see why, suppose the same search-key value V is present in two leaf nodes Li and Li+1, then in the parent node Ki must be equal to V.
Handling Duplicates
• Modify find procedure as follows :
• Traverse Pi even if V = Ki.
• As soon as we reach a leaf node C, check if C has
only search key values less than V, if so set C = right
sibling of C before checking whether C contains V.
• Procedure printAll
• Uses modified find procedure to find first occurrence
of V.
• Traverse through consecutive leaves to find all
occurrences of V.
Modified Queries on B+-Trees
Find record with search-key value V :
1. C = root.
2. While C is not a leaf node :
   2.1. Let i be the least value so that V ≤ Ki.
   2.2. If no such i exists, set C = last non-null pointer in C.
   2.3. Else set C = Pi (first change)
3. If for all Ki in C, Ki < V then C = right sibling of C. (second change)
4. Let i be the least value so that Ki = V.
5. If there is such a value i, follow pointer Pi to the desired record.
6. Else no record with search-key value V exists.
Queries on B+-Trees
• If there are K search-key values in the file, the height of the tree is no more than ⌈log(K)⌉, with the logarithm taken to base ⌈𝑛/2⌉.
• A node is generally the same size as a disk block, typically 4 kilobytes, and 𝑛 is typically around 100 (40 bytes per index entry).
• With 1 million search key values and 𝑛 = 100, at most ⌈log50(1,000,000)⌉ = 4 nodes are accessed in a lookup.
• Contrast this with a balanced binary tree with 1 million search key values : around log2(1,000,000) ≈ 20 nodes are accessed in a lookup.
• The above difference is significant since every node access may need a disk I/O, costing around 20 milliseconds.
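The arithmetic behind these two figures can be checked directly. The helper name below is hypothetical; it simply rounds the logarithm up to a whole number of nodes.

```python
from math import ceil, log


def max_nodes_on_path(num_keys, min_fanout):
    """Upper bound on nodes visited per lookup: ceil(log base min_fanout of num_keys)."""
    return ceil(log(num_keys, min_fanout))


# B+-tree with n = 100: every non-leaf node has at least ceil(100 / 2) = 50 children.
bplus_nodes = max_nodes_on_path(1_000_000, 50)
# Balanced binary tree: fan-out 2.
binary_nodes = max_nodes_on_path(1_000_000, 2)
print(bplus_nodes, binary_nodes)
```

At roughly one disk I/O per node, that is the difference between about 4 and about 20 block fetches per lookup.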
Insert on B+-Trees
1. Find the leaf node in which the search-key value would
appear
2. If the search-key value is already present in the leaf node
1. Add record to the file.
2. If necessary add a pointer to the bucket.
3. If the search-key value is not present, then :
1. Add the record to the main file (and create a bucket if
necessary).
2. If there is room in the leaf node, insert (key-value,
pointer) pair in the leaf node.
3. Otherwise, split the node (along with the new (key-
value, pointer) entry) as discussed in the next slide.
Insert on B+-Trees
• Splitting a leaf node :
• Take the n (search-key value, pointer) pairs (including the one being inserted) in sorted order. Place the first ⌈𝑛/2⌉ in the original node, and the rest in a new node.
• let the new node be p, and let k be the least key value in
p. Insert (k,p) in the parent of the node being split.
• If the parent is full, split it and propagate the split further
up.
• Splitting of nodes proceeds upwards till a node that
is not full is found.
• In the worst case the root node may be split increasing
the height of the tree by 1.
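A leaf split can be sketched as below. This is a minimal illustration, assuming 𝑛 = 4 (a full leaf holds 3 pairs) and placeholder record pointers; the function name `split_leaf` and the sample keys are hypothetical.

```python
from math import ceil


def split_leaf(pairs, new_pair, n):
    """Split a full leaf (n-1 sorted pairs) that receives one more pair.

    Returns (pairs kept in the original node, pairs for the new node,
    least key of the new node, which goes into the parent).
    """
    all_pairs = sorted(pairs + [new_pair])   # the n pairs in sorted order
    cut = ceil(n / 2)                        # the first ceil(n/2) stay in place
    original, new_node = all_pairs[:cut], all_pairs[cut:]
    return original, new_node, new_node[0][0]


full_leaf = [("Brandt", "row1"), ("Califieri", "row2"), ("Crick", "row3")]
original, new_node, parent_key = split_leaf(full_leaf, ("Adams", "row4"), n=4)
print(original, new_node, parent_key)
```

The returned key and a pointer to the new node form the (k, p) pair that is inserted into the parent, which may in turn trigger a split one level up.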
Insert on B+-Trees
• Splitting a non-leaf node: when inserting
(k,p) into an already full internal node N :
• Copy N to an in-memory area M with space
for n+1 pointers and n keys.
• Insert (k,p) into M.
• Copy P1, K1, …, K⌈𝑛/2⌉–1, P⌈𝑛/2⌉ from M back into node N.
• Copy P⌈𝑛/2⌉+1, K⌈𝑛/2⌉+1, …, Kn, Pn+1 from M into newly allocated node N’.
• Insert (K⌈𝑛/2⌉, N’) into the parent of N.
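The non-leaf split can be sketched the same way. This is a minimal illustration that follows the ⌈𝑛/2⌉ split point used on this slide, with 𝑛 = 4, integer keys and string placeholders for child pointers; the function name `split_internal` is hypothetical.

```python
from bisect import bisect_right
from math import ceil


def split_internal(keys, pointers, k, p, n):
    """Insert (k, p) into a full non-leaf node (n-1 keys, n pointers) and split it.

    Returns ((keys, pointers) of N, key moved up to the parent,
    (keys, pointers) of the new node N').
    """
    # M: in-memory copy with room for n keys and n+1 pointers.
    m_keys, m_pointers = list(keys), list(pointers)
    pos = bisect_right(m_keys, k)
    m_keys.insert(pos, k)
    m_pointers.insert(pos + 1, p)   # p sits immediately to the right of k
    h = ceil(n / 2)
    node = (m_keys[: h - 1], m_pointers[:h])     # P1, K1, ..., K(h-1), Ph
    new_node = (m_keys[h:], m_pointers[h:])      # P(h+1), K(h+1), ..., Kn, P(n+1)
    return node, m_keys[h - 1], new_node         # Kh moves up to the parent


node, up_key, new_node = split_internal(
    [20, 40, 60], ["A", "B", "C", "D"], k=50, p="X", n=4
)
print(node, up_key, new_node)
```

Unlike a leaf split, the middle key is moved up rather than copied: it appears in the parent only.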
Insert on B+-Trees Example
[Animated figure: insertion traced for search-key value V = 'Adams'. Recoverable callouts: V ('Adams') < K1, so C = P1; C is a leaf; the search-key is not found, so insert the new row in the file; the leaf is full, so split C into 2 nodes; insert ('Adams'); modify the parent node's key value accordingly. Other keys visible in the figure: 'Brandt', 'Einstein', 'Mozart'.]
Delete on B+-Trees
• Find the record to be deleted, and remove it from the
main file and from the bucket (if present).
• Remove (search-key value, pointer) from the leaf node if
there is no bucket or if the bucket has become empty.
• If the node has too few entries due to the removal, and
the entries in the node and a sibling fit into a single
node, then merge siblings :
• Insert all the search-key values in the two nodes into
a single node (the one on the left), and delete the
other node.
• Delete the pair (Ki–1 , Pi ), where Pi is the pointer to
the deleted node, from its parent, recursively using
the above procedure.
Delete on B+-Trees
• Otherwise, if the node has too few entries due to the
removal, but the entries in the node and a sibling do not
fit into a single node, then redistribute pointers :
• Redistribute the pointers between the node and a sibling such that both have at least the minimum number of entries.
• Update the corresponding search-key value in the parent of the node.
• The node deletions may cascade upwards till a node which has ⌈𝑛/2⌉ or more pointers is found.
• If the root node has only one pointer after deletion, it is
deleted and the sole child becomes the root.
Delete on B+-Trees Example
[Animated figure: deletion traced for search-key value V = 'Srinivasan'. Recoverable callouts: V ('Srinivasan') ≥ K1 ('Mozart'), so C = P1+1 = P2; remove the row from the file; remove ('Srinivasan') from leaf node C; the node is under-full, so merge it with the previous node; update C's parent node.]
B-Tree Index Files
• Similar to B+-tree, but a B-tree allows each search-key value to appear only once, which eliminates redundant storage of search keys.
• Because search keys in non-leaf nodes appear nowhere else in the B-tree, an additional pointer field for each search key in a non-leaf node must be included.

Typical B-Tree nodes : (a) Leaf node, (b) Non leaf node
B-Tree Index Files Example

• B-tree (above) and B+-tree (below) on same data


B-Tree Index Files
• Advantages of B-Tree indices:
• May use fewer tree nodes than a corresponding B+-Tree.
• Sometimes possible to find search-key value before reaching
leaf node.
• Disadvantages of B-Tree indices:
• Only small fraction of all search-key values are found early.
• Non-leaf nodes are larger, so fan-out is reduced. Thus, B-
Trees typically have greater depth than corresponding B+-
Tree.
• Insertion and deletion more complicated than in B+-Trees.
• Implementation is harder than B+-Trees.
• Typically, advantages of B-Trees do not outweigh disadvantages.
Multiple-Key Access
• Use multiple indices for certain types of queries.
• Example :
SELECT ID
FROM instructor
WHERE dept_name = 'Finance' AND salary = 80000
• Possible strategies for processing query using indices on
single attributes:
1. Use index on dept_name to find instructors with department
name Finance, then test salary = 80000.
2. Use index on salary to find instructors with a salary of
$80000, then test dept_name = ‘Finance’.
3. Use dept_name index to find pointers to all records
pertaining to the ‘Finance’ department. Similarly use index
on salary. Then take intersection of both sets of pointers
obtained.
Indices on Multiple Keys

• Composite search keys are search keys containing more than one attribute :
• E.g. (dept_name, salary)
• Lexicographic ordering : (a1 , a2 ) < (b1 , b2 ) if either :
• a1 < b1 , or
• a1 = b1 and a2 < b2
Indices on Multiple Attributes
• Suppose we have an index on combined search-key :
(dept_name, salary)
• With the where clause :
where dept_name = 'Finance' and salary = 80000
• The index on (dept_name, salary) can be used to fetch only records that satisfy both conditions.
• Using separate indices is less efficient : we may fetch many records (or pointers) that satisfy only one of the conditions.
• Can also efficiently handle :
where dept_name = 'Finance' and salary < 80000
• But cannot efficiently handle :
where dept_name < 'Finance' and salary = 80000
• May fetch many records that satisfy the first but not the second condition.
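Why the second query is handled well and the third is not can be seen on a sorted list of (dept_name, salary) pairs, which Python compares lexicographically. This is a minimal sketch with made-up index entries; the list stands in for the leaf level of a composite index.

```python
from bisect import bisect_left

# Leaf level of a hypothetical index on (dept_name, salary), in lexicographic order.
index = sorted([
    ("Biology", 72000), ("Biology", 80000), ("Comp. Sci.", 65000),
    ("Finance", 70000), ("Finance", 80000), ("Finance", 90000),
    ("History", 62000), ("Physics", 80000),
])

# dept_name = 'Finance' and salary < 80000: one contiguous slice of the index.
lo = bisect_left(index, ("Finance",))          # first entry with dept_name 'Finance'
hi = bisect_left(index, ("Finance", 80000))    # first 'Finance' entry with salary >= 80000
finance_below = index[lo:hi]

# dept_name < 'Finance' and salary = 80000: the index only bounds dept_name,
# so every entry before 'Finance' is read and salary is tested afterwards.
scanned = index[:lo]
matches = [entry for entry in scanned if entry[1] == 80000]

print(finance_below)
print(len(scanned), matches)
```

In the first case every entry read is a result; in the second, most entries read are discarded by the salary test.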
Degenerated Index
• The most prominent myth is that an index
can become degenerated after a while and
must be re-built regularly.
• Sparse Index, static hash index may
degenerate but not the B-Tree/B+-Tree.
• Rebuilding an index on B-Tree/B+-Tree
might reduce the number of leaf nodes by
about 20% - 30%, other gain of an index
rebuild is only 0%-2% (on unique index)
Slow Index
• Despite the efficiency of the tree traversal, there
are still cases where an index lookup doesn't work
as fast as expected.
• An index rebuild, the supposed miracle solution, does not improve performance in the long run.
• A slow index lookup has two possible causes : the leaf node chain and the access to the table.
Slow Index
• The leaf node chain :
An index lookup not only needs to perform the
tree traversal, it also needs to follow the leaf
node chain.
Slow Index
• The access to the table :
There is an additional table access for each hit.
Slow Index
• An index lookup requires three steps:
1. The tree traversal.
2. Following the leaf node chain.
3. Fetching the table data.
• The tree traversal is the only step that has
an upper bound for the number of
accessed blocks—the index depth.
• The other two steps might need to access
many blocks—they cause a slow index
lookup.
Slow Index
Three distinct operations that describe a basic
index lookup:
• INDEX UNIQUE SCAN
Performs the tree traversal only. For unique attribute or
primary key.
• INDEX RANGE SCAN (INDEX SEEK)
Performs the tree traversal and follows the leaf node chain
to find all matching entries.
• TABLE ACCESS BY INDEX ROWID (RID Lookup)
Retrieves the row from the table. This operation is (often)
performed for every matched record from a preceding index
scan operation.
Primary Key Index Lookup
• Example :
SELECT first_name, last_name
FROM employees
WHERE employee_id = 123

• The where clause cannot match multiple rows because the primary key constraint ensures uniqueness of the employee_id values.
• The database does not need to follow the
index leaf nodes—it is enough to traverse
the index tree.
Primary Key Index Lookup

•The Execution Plan


SQL Server Operations in Execution Plan

• Clustered index seek :
• Find record based on clustered index – via B-Tree structure.
structure.
• Index seek :
• Find record based on nonclustered index – via B-
Tree structure.
• Clustered index scan :
• Find record based on un-indexed attribute.
• Index scan
• Find record based on nonclustered index – via
scanning index file (dense index read).
Performance Considerations
• Attributes to be retrieved
• Whether or not they are included in the search key.
• Where clause
• Should include indexed attributes.
• Indexed attributes
• Index only the required attributes.
• Concatenated key index
• The first attribute should be included in the where clause.
Concatenated Key Index Lookup
• Concatenated keys are built to make a value unique across two or more attributes :
CREATE UNIQUE INDEX employee_pk
ON employees (employee_id, subsidiary_id)
• Suppose that employee_id alone is not unique because the employees table was built by combining the employee tables of different subsidiaries :
SELECT first_name, last_name
FROM employees
WHERE employee_id = 123 AND subsidiary_id = 30
• Note that the column order of a concatenated index has
great impact on its usability so it must be chosen
carefully.
Concatenated Key Index Lookup

• B+-Tree sorted index.
• The database considers each column according to its position in the index definition to sort the index entries :
• The first column is the primary sort criterion.
• The second column determines the order only if two entries have the same value in the first column, and so on.
• A concatenated index is one index across
multiple columns.
• The first column is important.
Concatenated Key Index Lookup
Concatenated Key Index Lookup

• Reversing the column order for better performance :
CREATE UNIQUE INDEX employee_pk
ON employees (subsidiary_id , employee_id)
• Useful when searches filter mainly on subsidiary_id.
• The benefit is an index seek rather than an index scan.
• The resulting B-Tree structure is totally different.
Concatenated Key Index Lookup
• Rules
• In general, a database can use a concatenated index
when searching with the leading (leftmost) columns.
• An index with three columns can be used when searching
for the first column, when searching with the first two
columns together, and when searching using all
columns.
• The order of columns in a where clause is not a problem
for SQL server execution plan.
• Trade off
• Storage Space.
• Maintenance overhead.
• The fewer indexes a table has, the better the insert,
delete and update performance.
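The leading-column rule can be observed in a query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example runs on its own; plan wording differs between products), with made-up rows in a hypothetical employees table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER, subsidiary_id INTEGER, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(e, s, f"first{e}", f"last{e}") for s in range(1, 31) for e in range(1, 101)],
)
conn.execute(
    "CREATE UNIQUE INDEX employee_pk ON employees (employee_id, subsidiary_id)"
)


def query_plan(where_clause):
    """Return SQLite's plan text for a query with the given where clause."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT first_name, last_name FROM employees WHERE "
        + where_clause
    ).fetchall()
    return " ".join(row[-1] for row in rows)


plan_leading = query_plan("employee_id = 123")                        # leading column only
plan_both = query_plan("subsidiary_id = 30 AND employee_id = 123")    # both, any order
plan_second_only = query_plan("subsidiary_id = 30")                   # second column only
for plan in (plan_leading, plan_both, plan_second_only):
    print(plan)
```

The first two plans search the index; the third cannot use it for a search because the leading column is not constrained.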
Changing an Index
• Change an index when needed for better performance.
• Compare the execution plan cost before and after the change to check performance.
• Go through each execution plan :
• Especially when the where clause uses attributes that are not in the index's column list.
• The best execution plan depends on the table's data distribution :
• The query optimizer also uses this distribution.
Changing an Index
• Ignore index :
SELECT first_name, last_name
FROM employees WITH(INDEX(0))
WHERE employee_id = 123 AND subsidiary_id = 30
• Disable all indexes on a table :
ALTER INDEX ALL ON <TABLE NAME> DISABLE;
• Disable one index on a table :
ALTER INDEX <INDEX NAME> ON <TABLE NAME> DISABLE;
Impact of Greater, Less and BETWEEN

• The biggest performance risk of an INDEX RANGE SCAN is the leaf node traversal.
• The goal is to keep the scanned index range as small as possible.
• Check the range scan by printing the result.
SELECT first_name, last_name, date_of_birth
FROM employees
WHERE
date_of_birth >= TO_DATE(?, 'YYYY-MM-DD')
AND
date_of_birth <= TO_DATE(?, 'YYYY-MM-DD')
Impact of Greater, Less and BETWEEN

SELECT first_name, last_name, date_of_birth
FROM employees
WHERE
date_of_birth >= TO_DATE(?, 'YYYY-MM-DD')
AND
date_of_birth <= TO_DATE(?, 'YYYY-MM-DD')
AND subsidiary_id = ?

• By adding an equality column, the range between the start and stop conditions becomes smaller, but in what column order?
Impact of Greater, Less and BETWEEN

• The next figures show the effect of the column order on the scanned index range.
• For this illustration we search all employees of subsidiary 27 who were born between January 1st and January 9th 1971.

SUBSIDIARY_ID is useless during tree traversal, yet it helps to narrow the range a bit.
Impact of Greater, Less and BETWEEN

• The picture looks entirely different when reversing the column order.
• The tree traversal directly leads to the second leaf node. In this case, all where clause conditions limit the scanned index range so that the scan terminates at the very same leaf node.

Rule of thumb: index for equality first, then for ranges.
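The effect of the column order can be counted on a toy index. This is a minimal sketch with synthetic data (one employee per subsidiary per day, 30 subsidiaries, 20 days), where sorted tuple lists stand in for the leaf level of each index and the scanned range is the slice between the start and stop positions.

```python
from bisect import bisect_left, bisect_right

# Synthetic data: ISO date strings sort in date order.
days = [f"1971-01-{d:02d}" for d in range(1, 21)]
employees = [(dob, sub) for dob in days for sub in range(1, 31)]

index_date_first = sorted((dob, sub) for dob, sub in employees)
index_subsidiary_first = sorted((sub, dob) for dob, sub in employees)

start, end, subsidiary = "1971-01-01", "1971-01-09", 27

# Index on (date_of_birth, subsidiary_id): the date range bounds the scan,
# subsidiary_id only trims the first and last day.
lo = bisect_left(index_date_first, (start, subsidiary))
hi = bisect_right(index_date_first, (end, subsidiary))
scanned_date_first = hi - lo
matches_date_first = [e for e in index_date_first[lo:hi] if e[1] == subsidiary]

# Index on (subsidiary_id, date_of_birth): both conditions bound the scan.
lo = bisect_left(index_subsidiary_first, (subsidiary, start))
hi = bisect_right(index_subsidiary_first, (subsidiary, end))
scanned_subsidiary_first = hi - lo

print(scanned_date_first, len(matches_date_first), scanned_subsidiary_first)
```

Both indexes return the same 9 rows, but with the equality column first the scanned range contains nothing else.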


Index Merge
• In most cases one index with multiple columns (a concatenated or compound index) is better than several single-column indexes.
• Nevertheless there are queries where a single index
cannot do a perfect job :
SELECT first_name, last_name, date_of_birth
FROM employees
WHERE UPPER(last_name) < ? AND date_of_birth < ?
• It is impossible to define a B-Tree index that would
support this query without filter predicates.
• For an explanation, you just need to remember that
an index is a linked list.
Index Merge
• Solution :
• Accept the filter predicate and use a multi-column
index.
• Use two separate indexes, one for each column then
merge the result.
Partial Indexes/ Filtered Indexes

• In data warehouse projects, where this case often occurs, bitmap indexes are used :
• But the greatest weakness of bitmap indexes is their very poor insert, update and delete scalability.
Partial Indexes/ Filtered Indexes
• So far we have only discussed which columns to add
to an index.
• With partial (PostgreSQL) or filtered (SQL Server)
indexes you can also specify the rows that are
indexed.
• A partial index is useful for commonly used where
conditions that use constant values—like the status
code
SELECT message
FROM messages
WHERE processed = 'N' AND receiver = ?
• Messages that were already processed are rarely
needed.
Partial Indexes/ Filtered Indexes

• Multiple-key index :
CREATE INDEX messages_todo
ON messages (receiver, processed)

• Filtered indexes :
CREATE INDEX messages_todo
ON messages (receiver)
WHERE processed = 'N'
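The filtered index can be tried out with Python's built-in sqlite3 module, which also supports a WHERE clause in CREATE INDEX. SQLite is used here only as a self-contained stand-in for PostgreSQL or SQL Server, and the rows are made up: 1 message in 100 is still unprocessed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, receiver INTEGER, processed TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO messages (receiver, processed, message) VALUES (?, ?, ?)",
    [(i % 20, "N" if i % 100 == 0 else "Y", f"msg {i}") for i in range(1, 2001)],
)
# Only rows with processed = 'N' get an index entry.
conn.execute(
    "CREATE INDEX messages_todo ON messages (receiver) WHERE processed = 'N'"
)

total_rows = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
indexed_rows = conn.execute(
    "SELECT COUNT(*) FROM messages WHERE processed = 'N'"
).fetchone()[0]
plan = " ".join(
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT message FROM messages "
        "WHERE processed = 'N' AND receiver = ?",
        (0,),
    )
)
print(total_rows, indexed_rows)
print(plan)
```

The index covers only the unprocessed rows, so it stays small no matter how many processed messages accumulate, and the query on the slide can still use it.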
Join Operation
• Join operations may affect performance.
• A join or query involving two or more tables might be performed as nested-loop selects issued from the programming language (via an ORM and functions).
• With that approach, network latencies occur on top of disk latencies.
• An SQL join operation performs better than nested-loop selects.
• Two common approaches in SQL : the join operation and the nested query.
SQL Join and Nested Query
•SQL Join :
SELECT S.SalesAmount FROM EMPLOYEE E
JOIN SALES S ON E.EMP_ID = S.EMP_ID
WHERE EMP_NAME = ?

•Nested Query :
SELECT SalesAmount FROM SALES WHERE
EMP_ID IN (SELECT EMP_ID FROM
EMPLOYEE WHERE EMP_NAME = ? )
Types of Join Operation
• SQL Server employs three types of join
operations :
• Nested loops joins.
• Merge joins.
• Hash joins.
• The type of join operation is chosen automatically by the DBMS.
• Considerations include the amount of data and index availability.
Nested Loops Joins
• Employ nested iteration.
• The two tables A and B act as the outer input and the inner input.
• The outer loop consumes the outer input table row by row.
• The inner loop, executed for each outer row, searches for matching rows in the inner input table.
• Effective when the outer input is small and the inner input is pre-indexed.
• Index the join predicate keys and the where clause attributes.
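The nested iteration can be sketched as below. This is a minimal illustration with made-up rows; a Python dict plays the role of the index on the inner input's join column (a real DBMS would typically use a B-Tree index instead).

```python
# Outer input (small): (emp_id, name). Inner input: (emp_id, sales_amount).
employees = [(1, "Kim"), (2, "Singh")]
sales = [(1, 100), (2, 250), (1, 75), (3, 40)]

# Stand-in for an index on the inner input's join column.
sales_index = {}
for emp_id, amount in sales:
    sales_index.setdefault(emp_id, []).append(amount)


def nested_loops_join(outer, inner_index):
    """For each outer row, look up the matching inner rows through the index."""
    result = []
    for emp_id, name in outer:                        # outer loop
        for amount in inner_index.get(emp_id, []):    # inner loop, index lookup
            result.append((name, amount))
    return result


joined = nested_loops_join(employees, sales_index)
print(joined)
```

Without the index, the inner loop would have to scan all of the inner input once per outer row.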
Nested Loops Joins on SQL Server

• Check performance by means of subtree cost.


Nested Loops Joins on SQL Server
Merge Joins
• The two input tables are sorted on the merge columns :
defined by the equality (ON) clauses of the join predicate (typically via a sorted index).
• The Merge Join operator gets a row from each input and compares them.
• The rows are returned if they correspond to the join type (inner, outer, etc.), otherwise they are discarded.
• Merge join uses a temporary table to store rows.
• Merge join can be an expensive choice if sort operations are required because no index on the join predicate columns is available.
• At least the join predicate columns should be indexed.
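An inner merge join over two inputs that are already sorted on the join key can be sketched as follows. The rows are made up, and the handling of duplicate keys (re-reading the matching group of the right input) is one simple choice among several.

```python
def merge_join(left, right):
    """Inner join of two lists of (key, value) pairs, both sorted by key."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1                       # left key too small: advance left
        elif left[i][0] > right[j][0]:
            j += 1                       # right key too small: advance right
        else:
            key = left[i][0]
            group_start = j              # start of the matching group in right
            while i < len(left) and left[i][0] == key:
                j = group_start
                while j < len(right) and right[j][0] == key:
                    result.append((key, left[i][1], right[j][1]))
                    j += 1
                i += 1
    return result


employees = [(1, "Kim"), (2, "Singh"), (4, "Wu")]   # sorted by emp_id
sales = [(1, 100), (1, 75), (2, 250), (3, 40)]      # sorted by emp_id
merged = merge_join(employees, sales)
print(merged)
```

Each input is read once from front to back, which is why the cost is dominated by sorting when no sorted index is available.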
Merge Joins on SQL Server
• Check performance by means of subtree cost.
Merge Joins on SQL Server
Hash Joins
• The two inputs, the build input and the probe input, are assigned by the query optimizer.
• The smaller input becomes the build input : its rows are scanned or computed, then inserted into a hash table according to the hash value of the hash key.
• The entire probe input is scanned or computed one row at a time.
• For each probe row, the hash key's value is computed.
• The corresponding hash bucket is scanned, and the matches are produced.
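The build and probe phases can be sketched as below. This is a minimal in-memory illustration with made-up rows, where a Python dict is the hash table and "smaller" simply means fewer rows; a real optimizer works from size estimates.

```python
def hash_join(input_a, input_b):
    """Inner join of two lists of (key, value) pairs; returns (key, a_value, b_value)."""
    # The smaller input becomes the build input.
    swapped = len(input_a) > len(input_b)
    build, probe = (input_b, input_a) if swapped else (input_a, input_b)

    # Build phase: insert every build row into the hash table under its key.
    table = {}
    for key, value in build:
        table.setdefault(key, []).append(value)

    # Probe phase: scan the probe input one row at a time and check its bucket.
    result = []
    for key, value in probe:
        for match in table.get(key, []):
            result.append((key, value, match) if swapped else (key, match, value))
    return result


employees = [(1, "Kim"), (2, "Singh")]              # smaller input: build
sales = [(1, 100), (2, 250), (1, 75), (3, 40)]      # larger input: probe
joined = hash_join(employees, sales)
print(joined)
```

Neither input needs to be sorted or indexed on the join column, which is what sets the hash join apart from the other two join types.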
Hash Joins
• The indexing strategy for a hash join is very
different because there is no need to index
the join columns.
• Therefore, a hash join is usable and effective for joins with no index on the join columns.
• Only indexes for independent WHERE
predicates improve hash join performance.
• Indexing join predicates doesn't improve
hash join performance.
Hash Joins on SQL Server
• Check performance by means of subtree cost.
Hash Joins on SQL Server
Sorting and Grouping
• Sorting is a very resource intensive
operation.
• It becomes a problem for large data sets.
• An index provides an ordered
representation of the indexed data.
• The index is, in fact, sorted just like
when using the index definition in an
order by clause.
Sorting ORDER BY
Sorting ORDER BY

•Create index for ORDER BY clause :


Sorting ORDER BY
Sorting ORDER BY
• If the index order corresponds to the ORDER BY clause,
the database can omit the explicit sort operation.
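The missing screenshots can be approximated with Python's built-in sqlite3 module, used here as a stand-in for SQL Server (an assumption; the plan wording is SQLite's own). The table and index names are hypothetical. Before the index exists the plan contains an explicit sort step; afterwards the rows are read from the index in order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO employees (last_name) VALUES (?)",
    [(f"name{i % 97:02d}",) for i in range(500)],
)


def query_plan():
    """Return SQLite's plan text for the ORDER BY query."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT last_name FROM employees ORDER BY last_name"
    ).fetchall()
    return " ".join(row[-1] for row in rows)


plan_before = query_plan()
conn.execute("CREATE INDEX employees_last_name ON employees (last_name)")
plan_after = query_plan()
print("before:", plan_before)  # includes a temporary B-tree used for sorting
print("after: ", plan_after)   # reads the index, no separate sort step
```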
Indexing GROUP BY

• Stream Aggregate (sort/group algorithm) :
• Smaller amount of data.
• Proceed with specific where clause.
• Hash Match (hash algorithm) :
• Large amount of data.
• Optimized by query optimizer.
Indexing GROUP BY
Partial Results
Sometimes we do not need all of the
query result :
• Querying Top-N Rows.
• Paging Through Results.
• Using Window Functions.
Querying Top-N Rows
Querying Top-N Rows
Optimize Top-N execution with an index so that no sort operation is required :
Querying Top-N Rows
After the index is created, there is no costly sort operation.
Paging Through Results
Paging Through Results
Optimize paging (OFFSET) execution with an index so that no sort operation is required :
Paging Through Results
After the index is created, there is no costly sort operation.
Using Window Functions
Using Window Functions
Optimize window function execution with an index to avoid a costly sort operation :
Using Window Functions
After the index is created, the cost of the sort operation is greatly reduced.
1. Draw the final index structure after the key values ‘Jose’ and ‘Lancelot’ are inserted into the index structure above.
2. Draw the final index structure after the key values ‘Brandt’ and ‘Einstein’ are deleted from the index structure above.
