
Indexing Lecture Nov 2023 Summary

1. Indexes are used to retrieve data from a database quickly by working like an index in a book.
2. There are different types of indexes including clustered, non-clustered, primary, and secondary indexes.
3. Multi-level indexes refer to a hierarchical structure of indexes that allows faster data retrieval and reduces disk access by providing more detailed references to the data at each index level.

Indexing

Dr David Hamill
Physical storage of data

• Heap (Unordered) – Records are placed on disk in no particular order.
• Ordered (Sequential) – Records are placed on disk by the value of a specified field.
• Hash – Records are placed on disk according to a hash function.
What are they?

1. Indexes are used to retrieve data from the database quickly.
2. Indexes work just like an index in a book.
3. Users cannot see the indexes; they are only used to speed up
searches/queries.
4. A primary key is indexed automatically.

Note: Updating a table with indexes takes more time than updating a table
without them (because the indexes also need to be updated). So, only create
indexes on fields that will be frequently searched against.
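
The points above can be observed with a minimal sketch using Python's built-in sqlite3 module. The table, column and index names are hypothetical, SQLite stands in for whichever DBMS is in use, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

# Hypothetical table and index names, chosen only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, surname TEXT, dept TEXT)"
)
conn.executemany(
    "INSERT INTO staff (surname, dept) VALUES (?, ?)",
    [(f"Name{i:05d}", f"Dept{i % 10}") for i in range(10_000)],
)

def plan(sql):
    """Return SQLite's query-plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)   # last column is the plan text

query = "SELECT * FROM staff WHERE surname = 'Name04242'"
plan_before = plan(query)    # no index on surname yet: the whole table is scanned
conn.execute("CREATE INDEX idx_staff_surname ON staff (surname)")
plan_after = plan(query)     # the optimiser can now search the new index

# The primary key needs no CREATE INDEX: it is searchable from the start.
plan_pk = plan("SELECT * FROM staff WHERE staff_id = 42")

print(plan_before)
print(plan_after)
print(plan_pk)
```

The index never appears in the query text itself; only the plan changes, which is why users do not "see" indexes.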
Overview

Introduction

Single-Level Ordered Indexes
• Primary Index
• Secondary Index (Non-Clustered)
• Clustering Index

Multi-Level Indexes

B-Trees and B+-Trees
Single-Level Ordered Indexes

• Primary Index: if the data file is sequentially ordered on the indexing
field and that field is a key field of the file (guaranteed to be unique),
then we call it a primary index.

• Clustering Index: if the data file is sequentially ordered on a non-key
field and the indexing field corresponds to that non-key field, then the
index is a clustering index.
Single-Level Ordered Indexes

• Secondary Index: An index that is defined on a non-ordering field of the
data file.

• A file can have at most one physical ordering field.
  ◦ A file can have at most one primary index or clustering index, but not
    both.
  ◦ A file can have several secondary indexes.
  ◦ Secondary indexes do not affect the physical organization of records.
Single-Level Ordered Indexes

• An index can be sparse or dense:
  ◦ A sparse index has an index record for some of the search key values in
    the file.
  ◦ A dense index has an index record for every search key value in the file.
Primary Indexes

• A primary index is built for a data file sorted on its key field.
• The index file is a sorted file whose records are fixed in length,
consisting of two fields:
  ◦ The first field is the same data type as the ordering key field of the
    data file.
  ◦ The second field is a pointer to a disk block.
• The ordering key field is called the primary key of the data file.
• There is one entry for each block in the data file.
• A 1:1 mapping from index file entries to data file records is intuitive but
wasteful.
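
A minimal sketch of a sparse primary index with one entry per block, written in Python for illustration. The block size and key values are made up, and in-memory lists stand in for disk blocks.

```python
from bisect import bisect_right

# Data file: blocks of records sorted on the key field (hypothetical data).
BLOCK_SIZE = 3
keys = [3, 7, 12, 15, 21, 28, 34, 40, 47, 55]
blocks = [keys[i:i + BLOCK_SIZE] for i in range(0, len(keys), BLOCK_SIZE)]

# Sparse primary index: one (first key in block, block number) entry per block.
index = [(block[0], block_no) for block_no, block in enumerate(blocks)]
anchors = [anchor for anchor, _ in index]

def lookup(key):
    """Return the number of the block holding `key`, or None if it is absent."""
    pos = bisect_right(anchors, key) - 1    # last index entry with anchor <= key
    if pos < 0:
        return None                          # key is smaller than every anchor
    block_no = index[pos][1]
    return block_no if key in blocks[block_no] else None

print(len(keys), "records,", len(index), "index entries")
print(lookup(21), lookup(22))
```

Ten records need only four index entries here; a 1:1 (dense) index would need ten.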


Primary Index - Example

[Figure: a sorted index file holding distinct key values, with pointers into the sorted data file]
Primary Index - Performance

• The index file requires significantly fewer blocks than the data file:
  ◦ Sparse index
  ◦ Index file record typically smaller in size than data file record
• A binary search on the index file requires fewer block accesses than a
binary search on the data file.
• Insertion and deletion of records is problematic:
  ◦ Not only do we have to move records in the data file, we also have to
    change some index entries.
• Storage overhead is not a serious problem.
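
A worked example of the block-access saving, computed in Python. All sizes below are assumed values for illustration and do not come from the lecture.

```python
from math import ceil, log2

# Assumed figures for a worked example.
records = 30_000          # records in the data file
block_size = 1024         # bytes per disk block
record_size = 100         # bytes per data record
index_entry_size = 15     # bytes per index entry (key + block pointer)

data_bfr = block_size // record_size             # records per block
data_blocks = ceil(records / data_bfr)           # blocks in the data file
data_search = ceil(log2(data_blocks))            # binary search on the data file

index_bfr = block_size // index_entry_size       # index entries per block
index_blocks = ceil(data_blocks / index_bfr)     # one index entry per data block
index_search = ceil(log2(index_blocks)) + 1      # search the index, then read 1 data block

print(f"data file:  {data_blocks} blocks, {data_search} block accesses")
print(f"with index: {index_blocks} index blocks, {index_search} block accesses")
```

Under these assumptions the index occupies 45 blocks against 3000 for the data file, and a lookup costs 7 block accesses instead of 12.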


Clustering Indexes

• Clustering indexing is a database indexing technique that physically
arranges the data in a table based on the values of the clustered index key.
This means that the rows in the table are stored on disk in the same order as
the clustered index key.

• The leaf nodes of a clustered index contain the data pages.

• A clustered index is generally faster than a non-clustered index, because
the rows are reached without an extra lookup step, and it requires less
memory for operations. A non-clustered index is generally slower and requires
more memory for operations.

• A clustered index is most useful for columns that have range predicates
because it allows better sequential access of data in the table. Since like
values are on the same data page, fewer pages are fetched.
Clustering Indexes

• A clustering index is built for a data file sorted on a non-key field.

• The index file is another sorted file whose records are fixed length,
consisting of two fields:
  ◦ First field is of the same data type as the clustering field of the data
    file.
  ◦ Second field is a pointer to a disk block.

• There is one entry in the clustering index for each distinct value of the
clustering field, containing the value and a pointer to the first block in
the data file that holds at least one record with that value of the
clustering field.
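
The "one entry per distinct value, pointing at the first block" rule can be sketched as follows. This is an illustration only: the data is made up, and a Python dict stands in for what would be a sorted index file on disk.

```python
# Data file sorted on a non-key clustering field (hypothetical department numbers).
BLOCK_SIZE = 3
dept_of_record = [1, 1, 1, 2, 2, 3, 3, 3, 3, 5, 5, 8]
blocks = [dept_of_record[i:i + BLOCK_SIZE]
          for i in range(0, len(dept_of_record), BLOCK_SIZE)]

# Clustering index: distinct value -> number of the FIRST block containing it.
clustering_index = {}
for block_no, block in enumerate(blocks):
    for value in block:
        clustering_index.setdefault(value, block_no)   # keep only the first block

def fetch(value):
    """Collect every record with `value`, reading consecutive blocks from the first one."""
    if value not in clustering_index:
        return []
    found = []
    for block in blocks[clustering_index[value]:]:
        matches = [v for v in block if v == value]
        if not matches:
            break              # file is sorted: no later block can hold the value
        found.extend(matches)
    return found

print(clustering_index)
print(fetch(3))
```

Value 3 starts part-way through block 1 and continues into block 2, so the index entry only needs to name block 1 and the scan follows the sort order from there.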
Clustering Indexes - Performance

• Index file requires significantly fewer blocks than the data file:
  ◦ Sparse index
  ◦ Index file record typically smaller than data file record.

• A binary search on the index file requires fewer block accesses than a
binary search on the data file.

• Insertion and deletion of records is problematic:
  ◦ We have to move records in the data file and we have to change some
    index entries.
  ◦ Common to reserve a whole block for each distinct value of the
    clustering field with all records with that value placed in the block.

• Storage overhead is not typically a serious problem.
Secondary (Non-Clustered) Indexes

• A secondary index is built for a non-ordering field of a data file.
• The index file is itself a sorted file whose records are fixed or variable
length, consisting of two fields:
  ◦ The first field is the same data type as the indexing field.
  ◦ The second field is a pointer to a disk block or to a record.

• We can consider two types of secondary indexes:
  ◦ Case 1: Using a dense secondary index that maps to all records in the
    data file.
  ◦ Case 2: Using a secondary index that has an entry for each distinct key
    value but whose pointers can be multivalued or point to a bucket of
    values.
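
Both cases can be sketched in a few lines of Python. The records are hypothetical, and list positions stand in for record pointers.

```python
from bisect import bisect_left
from collections import defaultdict

# Unordered (heap) data file of (staff_id, department); list position = record pointer.
records = [(17, "Sales"), (4, "HR"), (23, "IT"), (8, "Sales"), (42, "HR"), (15, "Sales")]

# Case 1: dense secondary index on the unique staff_id field, sorted by key.
dense_index = sorted((staff_id, ptr) for ptr, (staff_id, _) in enumerate(records))
dense_keys = [key for key, _ in dense_index]

def find_by_id(staff_id):
    """Binary-search the index, then follow the record pointer."""
    pos = bisect_left(dense_keys, staff_id)
    if pos < len(dense_keys) and dense_keys[pos] == staff_id:
        return records[dense_index[pos][1]]
    return None

# Case 2: one entry per distinct department, pointing at a bucket of record pointers.
buckets = defaultdict(list)
for ptr, (_, dept) in enumerate(records):
    buckets[dept].append(ptr)

def find_by_dept(dept):
    """Follow every pointer in the bucket for this department."""
    return [records[ptr] for ptr in buckets.get(dept, [])]

print(find_by_id(23))
print(find_by_dept("Sales"))
```

Neither index changes the order of `records`, which is why a file can carry several secondary indexes at once.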
Clustered v Non-Clustered

1. Only one clustered index can exist per table, whereas you can create
multiple non-clustered indexes on a single table.
2. A clustered index only determines the order of the table's rows, so it
consumes little extra storage. Non-clustered indexes are stored in a
separate place from the actual table and so claim more storage space.
3. Clustered indexes are generally faster than non-clustered indexes
since they don’t involve any extra lookup step.
Multi-Level Indexes

• When an index file becomes large and extends over many pages, the search
time for the required index increases.

• A multi-level index attempts to overcome this problem by reducing the
search range:
  ◦ Treat the index like any other file
  ◦ Split the index into a number of smaller indexes
  ◦ Maintain an index to the indexes
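
The "index to the indexes" idea can be sketched as follows. This is a static illustration with an assumed, deliberately tiny fan-out. It builds the levels bottom-up and counts one block read per level on lookup.

```python
from bisect import bisect_right

FAN_OUT = 4   # index entries per block (assumed; real blocks hold far more)

def build_levels(keys):
    """Build index levels bottom-up; each level holds the first key of each block below."""
    levels = [keys]
    while len(levels[-1]) > FAN_OUT:
        lower = levels[-1]
        levels.append([lower[i] for i in range(0, len(lower), FAN_OUT)])
    return levels[::-1]                      # top level first

def search(levels, key):
    """Return (found, block_reads), reading one block of FAN_OUT entries per level."""
    start, reads = 0, 0
    for depth, level in enumerate(levels):
        block = level[start:start + FAN_OUT]     # one block read
        reads += 1
        if depth == len(levels) - 1:
            return key in block, reads           # bottom level: check for the key
        pos = bisect_right(block, key) - 1       # last entry <= key
        if pos < 0:
            return False, reads                  # key precedes every entry
        start = (start + pos) * FAN_OUT          # block to read in the level below

keys = list(range(0, 200, 2))                # 100 sorted first-level index entries
levels = build_levels(keys)
print([len(level) for level in levels])
print(search(levels, 86), search(levels, 87))
```

Each level narrows the search to a single block of the level below, so the number of block reads equals the number of levels.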
Multi-Level Indexes

• Multilevel indexes refer to a hierarchical structure of indexes.
• Here, each level of the index provides a more detailed reference to the
data.
• It allows faster data retrieval, reduces disk access, and improves query
performance.
Multi-Level Indexes - Performance

• Search performance improves when searching for a record based on a
specific indexing field value.
• Problems with insertions and deletions are still present.
• To retain the benefits of using multi-level indexing while
reducing insertion and deletion problems, an approach is
taken that leaves some space in each block for inserting
new entries.
• This is called a dynamic multi-level index and is often
implemented using a balanced tree data structure
(B-trees and B+-trees).
There are two types of index

A table or view can contain the following types of indexes:

• Clustered
• NonClustered
Clustered Index

• A clustered index defines the order in which data is physically stored in a table. Table data
can be sorted in only one way; therefore, there can be only one clustered index per table. In SQL
Server, the primary key constraint by default automatically creates a clustered index on that
particular column.

• The only time the data rows in a table are stored in sorted order is when the table
contains a clustered index. When a table has a clustered index, the table is called a
clustered table. If a table has no clustered index, its data rows are stored in an
unordered structure called a heap.

-- Creating a clustered index
CREATE CLUSTERED INDEX IndexName_TableName_ColumnName
ON TableName (ColumnName ASC);
NonClustered Index

• A non-clustered index doesn’t sort the physical data inside the table. In fact, a non-clustered index is
stored at one place and table data is stored in another place. This is similar to a textbook where the
book content is located in one place and the index is located in another. This allows for more than one
non-clustered index per table

• The pointer from an index row in a nonclustered index to a data row is called a row locator. The
structure of the row locator depends on whether the data pages are stored in a heap or a clustered
table. For a heap, a row locator is a pointer to the row. For a clustered table, the row locator is the
clustered index key.

• You can add nonkey columns to the leaf level of the nonclustered index to by-pass existing index key
limits, and execute fully covered, indexed, queries. Both clustered and nonclustered indexes can be
unique.

-- Creating a non-clustered index
CREATE NONCLUSTERED INDEX index_name ON table_name (column_name ASC);


Data Structures

• Most index data structures can be viewed as trees.
• In general, the root of this tree will always be in main memory,
while the leaves will be located on disk.
• The performance of a data structure depends on the number of
nodes in the average path from the root to the leaf.
• Data structures with high fan-out (maximum number of children of
an internal node) are thus preferred.
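
A short calculation shows why high fan-out is preferred. The function below is an illustrative sketch that counts how many index levels are needed before the top level fits in a single block. The entry count and fan-out values are assumptions.

```python
def levels_needed(entries, fan_out):
    """Number of index levels required so that the top level fits in one block."""
    levels = 1
    blocks = -(-entries // fan_out)        # ceiling division: blocks at the lowest level
    while blocks > 1:
        blocks = -(-blocks // fan_out)     # each higher level indexes the blocks below
        levels += 1
    return levels

for fan_out in (2, 100, 1000):
    print(fan_out, levels_needed(1_000_000, fan_out))
```

For one million entries, a fan-out of 2 needs 20 levels while a fan-out of 100 needs 3. Each level is one node on the root-to-leaf path.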
Hash index

• Indexes are used as entry points for memory-optimized tables.
Reading rows from a table requires an index to locate the data in
memory. A hash index consists of a collection of buckets organized
in an array. A hash function maps index keys to corresponding
buckets in the hash index.
• Records are placed on disk according to a hash function.
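
A toy version of the bucket array described above, for illustration only. The class name, bucket count and use of Python's built-in `hash` are assumptions, not how any particular DBMS implements it.

```python
class HashIndex:
    """Toy hash index: a fixed array of buckets, each a list of (key, row) pairs."""

    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # The hash function maps an index key to one bucket in the array.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, row):
        self._bucket(key).append((key, row))

    def lookup(self, key):
        # Only one bucket is examined; colliding keys are filtered out.
        return [row for k, row in self._bucket(key) if k == key]

idx = HashIndex()
idx.insert(12345, "Smith")
idx.insert(12353, "Jones")     # lands in the same bucket as 12345 (both are 1 mod 8)
print(idx.lookup(12345), idx.lookup(99999))
```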
B-tree index

• In computer science, a B-tree is a self-balancing tree data
structure that keeps data sorted and allows searches, sequential
access, insertions, and deletions in logarithmic time. The B-tree is
a generalization of a binary search tree in that a node can have
more than two children (Comer 1979, p. 123).
B-Tree and Hash Index

• Hash indexes don’t help when evaluating range queries
• A hash index outperforms a B-tree on point queries
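
The contrast can be sketched with a Python dict standing in for a hash index and a sorted list standing in for the ordered leaf level of a B-tree. The names and ages are made up.

```python
from bisect import bisect_left

ages = {"Ann": 34, "Bob": 52, "Cat": 61, "Dan": 70, "Eve": 49, "Fay": 68}

# Hash-style index: age -> names. Good for equality, but the keys are unordered.
hash_index = {}
for name, age in ages.items():
    hash_index.setdefault(age, []).append(name)

# Ordered index: (age, name) entries kept in key order.
ordered_index = sorted((age, name) for name, age in ages.items())
ordered_ages = [age for age, _ in ordered_index]

def point_query(age):
    """Equality lookup: a single bucket probe."""
    return hash_index.get(age, [])

def range_query(low, high):
    """Names with low <= age < high: two binary searches, then one contiguous slice."""
    lo = bisect_left(ordered_ages, low)
    hi = bisect_left(ordered_ages, high)
    return [name for _, name in ordered_index[lo:hi]]

def range_query_via_hash(low, high):
    """Without ordering, every key in the hash index has to be examined."""
    return sorted(n for age, names in hash_index.items()
                  if low <= age < high for n in names)

print(point_query(61))
print(range_query(50, 70), range_query_via_hash(50, 70))
```

Both range functions return the same rows, but only the ordered index finds them without visiting every key.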
[Figure: bar chart "Point Queries" comparing throughput (queries/sec) of a B-tree and a hash index]
16/11/2023
B+-Trees

• Special type of tree structure used for search purposes

[Figure: tree diagram labelling the root node, internal nodes, child nodes and leaf nodes. Level 0: A; Level 1: B, C; Level 2: D, E, F]
B+-Trees

B-Trees
◦ Invented in 1969, B-trees are still the prevailing data
structure for indexes in relational databases
◦ A search tree with some additional constraints on it.
◦ These constraints ensure that the tree is always
balanced and that the space wasted by deletion (if any)
never becomes excessive.

B+-Trees
◦ Most implementations of dynamic multilevel index use
a variation of the B-tree data structure called a B+-Tree
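
A minimal sketch of searching a B+-tree, assuming the common layout in which internal nodes hold only separator keys and the leaves hold the data entries and are linked together. The tree is built by hand, and the insertion, splitting and rebalancing logic that keeps a real B+-tree balanced is omitted.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys, values):
        self.keys, self.values, self.next = keys, values, None

class Internal:
    def __init__(self, keys, children):
        # children[i] covers keys below keys[i]; the last child covers the rest.
        self.keys, self.children = keys, children

def find_leaf(node, key):
    """Descend from the root to the leaf that would contain `key`."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node

def search(root, key):
    leaf = find_leaf(root, key)
    if key in leaf.keys:
        return leaf.values[leaf.keys.index(key)]
    return None

def range_scan(root, low, high):
    """Return values with low <= key < high by walking the linked leaves."""
    leaf, result = find_leaf(root, low), []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k >= high:
                return result
            if k >= low:
                result.append(v)
        leaf = leaf.next
    return result

# Hand-built two-level tree with three linked leaves (illustrative data).
d = Leaf([5, 10], ["r5", "r10"])
e = Leaf([20, 30], ["r20", "r30"])
f = Leaf([40, 50], ["r40", "r50"])
d.next, e.next = e, f
root = Internal([20, 40], [d, e, f])

print(search(root, 30), search(root, 7))
print(range_scan(root, 10, 40))
```

The linked leaves are what make range scans cheap: after one descent, the scan moves sideways without returning to the root.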
Troubleshooting Techniques

• Monitoring a DBMS’s performance should be based on queries and
resources.
• The consumption chain helps distinguish problems’ causes from their
symptoms.
• Existing tools help extract relevant performance indicators.
Table tuning – Indexing

• What should be indexed?
  ◦ All attributes where you JOIN
  ◦ All attributes where you filter (WHERE)
  ◦ All attributes where you ORDER or GROUP BY
  ◦ All attributes where you want to do an Index Scan instead of a Table scan.
  ◦ NOT on attributes with an evenly distributed low cardinality.
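
The low-cardinality rule can be made concrete with a rough selectivity measure. The sketch below uses distinct values divided by total rows as one common heuristic, with made-up columns. It is not the exact statistic any particular optimiser uses.

```python
def selectivity(values):
    """Distinct values / total rows: 1.0 means unique, near 0 means few distinct values."""
    return len(set(values)) / len(values)

def rows_per_value(values):
    """Average number of rows an equality lookup would return."""
    return len(values) / len(set(values))

staff_ids = list(range(1, 1001))     # unique per row: a good index candidate
genders = ["F", "M"] * 500           # two evenly distributed values: a poor candidate

print(selectivity(staff_ids), rows_per_value(staff_ids))
print(selectivity(genders), rows_per_value(genders))
```

An index lookup on the second column would still return half the table, so a table scan is usually no worse.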

Table tuning – Indexing

• How should tables be indexed?
  ◦ A multi-column index can only be used from left to right: a query must
    constrain the leading column(s) to benefit from it.
  ◦ Keep them short.
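
The left-to-right rule follows from how a multi-column index is sorted. In this illustrative sketch a sorted list of (surname, firstname) tuples stands in for a composite index. The names are made up.

```python
from bisect import bisect_left, bisect_right

# Composite index on (surname, firstname): sorted by surname, then firstname.
index = sorted([
    ("Stone", "Amy"), ("Adams", "Zoe"), ("Stone", "Ben"),
    ("Baker", "Amy"), ("Young", "Ben"), ("Adams", "Amy"),
])
surnames = [surname for surname, _ in index]

def by_surname(surname):
    """Leading column constrained: matches are contiguous, so binary search works."""
    lo, hi = bisect_left(surnames, surname), bisect_right(surnames, surname)
    return index[lo:hi]

def by_firstname(firstname):
    """Only the second column constrained: matches are scattered, so all entries are scanned."""
    return [entry for entry in index if entry[1] == firstname]

print(by_surname("Stone"))
print(by_firstname("Amy"))
```

The "Amy" entries sit at positions 0, 2 and 3 of the sorted index, so no binary search on the index order can isolate them.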

B-Tree example YouTube

• http://www.youtube.com/watch?v=coRJrcIYbF4
• Uses Bayer & McCreight terminology
Query type dictates the best Index
Point Query

A point query returns at most one record, using an equality condition in the
WHERE clause on a unique field

Eg:
SELECT * FROM staff WHERE StaffID = '12345'
Multi-Point Query

A Multi-Point query may return more than one record using an
equality condition

Eg:
SELECT * FROM EMPLOYEES
WHERE DEPARTMENT = 'Human Resources'
A Range Query

This type of query will return a set of values within an interval or half-interval

Eg:
SELECT * FROM EMPLOYEE
WHERE AGE >= 50 AND AGE < 70

SELECT * FROM EMPLOYEE
WHERE AGE >= 70

An index on the AGE field could speed up these retrievals.


A Prefix Match query

A prefix match query is one where only the first part of the attribute or sequence of attributes is
specified.

Eg:
SELECT FIRSTNAME, SURNAME FROM EMPLOYEES
WHERE SURNAME LIKE 'ST%'

This query will return all records with surnames starting with the letters 'ST'.
In this example it would make sense to index the SURNAME field if this type of
query is to be run repeatedly.
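
An ordered index helps with prefix matches because all keys sharing a prefix sit next to each other. This illustrative sketch uses a sorted list as the index and made-up surnames. It assumes case-sensitive comparison and that no key contains the sentinel character U+10FFFF.

```python
from bisect import bisect_left

surnames = sorted(["Stark", "Smith", "Stone", "Sutton",
                   "Stewart", "Taylor", "Adams", "St John"])

def prefix_match(prefix):
    """Return all keys starting with `prefix` as one contiguous slice of the index."""
    lo = bisect_left(surnames, prefix)                  # first key >= prefix
    hi = bisect_left(surnames, prefix + "\U0010ffff")   # first key past the prefix range
    return surnames[lo:hi]

print(prefix_match("St"))
```

A pattern with a leading wildcard such as LIKE '%ST' has no such contiguous range, so an ordered index on the column would not narrow the search in the same way.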
The End
