Indexing Lecture Nov 2023 Summary

1. Indexes are used to retrieve data from a database quickly by working like an index in a book.
2. There are different types of indexes including clustered, non-clustered, primary, and secondary indexes.
3. Multi-level indexes refer to a hierarchical structure of indexes that allows faster data retrieval and reduces disk access by providing more detailed references to the data at each index level.

Indexing

Dr David Hamill
Physical storage of data

• Heap (Unordered) – Records are placed on disk in no particular order.
• Ordered (Sequential) – Records are placed on disk ordered by the value of a specified field.
• Hash – Records are placed on disk according to a hash function.
What are they?

1. Indexes are used to retrieve data from the database quickly.
2. Indexes work just like an index in a book.
3. Users cannot see the indexes; they are only used to speed up
searches/queries.
4. A primary key is indexed automatically.

Note: Updating a table with indexes takes more time than updating a table
without, because the indexes also need to be updated. So only create indexes on
fields that will be frequently searched against.
Overview

Introduction

Single-Level Ordered Indexes

• Primary Index
• Secondary Index (Non-Clustered)
• Clustering Index

Multi-Level Indexes

B-Trees and B+-Trees

Single-Level Ordered Indexes

• Primary Index: if the data file is sequentially ordered and the indexing field is a key field of the file (guaranteed to be unique), then we call it a primary index.
• Clustering Index: if the data file is sequentially ordered on a non-key field and the indexing field is that non-key field, then the index is a clustering index.
Single-Level Ordered Indexes

• Secondary Index: An index that is defined on a non-ordering field of the data file.

• A file can have at most one physical ordering field.

• A file can have at most one primary index or clustering
index, but not both.
• A file can have several secondary indexes.
• Secondary indexes do not affect the physical organization
of records.
Single-Level Ordered Indexes

• An index can be sparse or dense:

• A sparse index has an index record for some of the
search key values in the file.
• A dense index has an index record for every search
key value in the file.
• A primary index is built for a data file sorted on its key field.
• The index file is a sorted file whose records are fixed in length
consisting of two fields:
• The first field is the same data type as the ordering key
field of the data file.
• The second field is a pointer to a disk block.
• The ordering key field is called the primary key of the data file.
• There is one entry for each block in the data file.
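The lookup path through a sparse primary index can be sketched in a few lines of Python. This is only an illustration under assumed data: the blocks, names and the `lookup` helper are hypothetical, and a real DBMS works with disk blocks rather than in-memory lists.

```python
from bisect import bisect_right

# Hypothetical data file: blocks of records sorted on the key field.
data_blocks = [
    [(1, "Ann"), (4, "Bob"), (7, "Cy")],
    [(10, "Dee"), (13, "Eli"), (16, "Fay")],
    [(19, "Gus"), (22, "Hal"), (25, "Ida")],
]

# Sparse primary index: one entry per block, holding the first key of the
# block. The position in this list doubles as the block pointer.
index_keys = [block[0][0] for block in data_blocks]


def lookup(key):
    """Return the record with the given key, or None if it is absent."""
    # Binary search the index for the last block whose first key is <= key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    # Only the single block the index entry points to is scanned.
    for record in data_blocks[pos]:
        if record[0] == key:
            return record
    return None
```

Here `lookup(13)` searches three index entries instead of nine records and then reads exactly one data block.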
Primary Indexes
• A 1:1 mapping of index file entries to data file records is intuitive but wasteful.
Primary Index - Example
[Figure: a sorted index file of distinct key values pointing into the sorted data file]
Primary Index - Performance
• The index file requires significantly fewer blocks than the data file:
• Sparse index
• Index file record typically smaller in size than data file record
• A binary search on the index file requires fewer block accesses than a binary search on the data file.
• Insertion and deletion of records is problematic:
• Not only do we have to move records in the data file, we also have to change some index entries.
• Storage overhead is not a serious problem.

Clustering Indexes
• Clustered indexing is a database indexing technique that physically arranges the data in a table based on the values of the clustered index key. This means that the rows in the table are stored on disk in the same order as the clustered index key.
• The leaf nodes of a clustered index contain the data pages.
• A clustered index is generally faster than a non-clustered index for lookups on its key, because the data is reached at the leaf level with no extra lookup step. A non-clustered index generally needs more memory for the same operations.
• A clustered index is most useful for columns that have range predicates because it allows better sequential access of data in the table. Since like values are on the same data page, fewer pages are fetched.
Clustering Indexes
• A clustering index is built for a data file sorted on a non-key field.
• The index file is another sorted file whose records are fixed length, consisting of two fields:
• The first field is of the same data type as the clustering field of the data file.
• The second field is a pointer to a disk block.
• There is one entry in the clustering index for each distinct value of the clustering field, containing the value and a pointer to the first block in the data file that holds at least one record with that value.
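As a rough sketch of the "one entry per distinct value, pointing to the first block" rule, the Python below builds such an index over a hypothetical file sorted on a non-key department number. A dict stands in for the sorted index file purely for brevity; the data and function names are assumptions made for illustration.

```python
# Hypothetical data file sorted on a non-key field (department number).
data_blocks = [
    [(1, "Ann"), (1, "Bob"), (2, "Cy")],
    [(2, "Dee"), (2, "Eli"), (3, "Fay")],
    [(3, "Gus"), (5, "Hal"), (5, "Ida")],
]


def build_clustering_index(blocks):
    """Map each distinct clustering value to the first block holding it."""
    index = {}
    for block_no, block in enumerate(blocks):
        for value, _ in block:
            index.setdefault(value, block_no)  # keep the first block only
    return index


def find_all(blocks, index, value):
    """Collect every record with the given clustering value."""
    if value not in index:
        return []
    matches = []
    # Start at the first block for this value and read forward.
    for block in blocks[index[value]:]:
        for record in block:
            if record[0] == value:
                matches.append(record)
            elif record[0] > value:
                return matches  # file is sorted, so no later match exists
    return matches


clustering_index = build_clustering_index(data_blocks)
```

Because the file is ordered on the clustering field, all records for a value are adjacent, so the scan stops as soon as a larger value is seen.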
Clustering Indexes - Performance
• The index file requires significantly fewer blocks than the data file:
• Sparse index
• Index file record typically smaller than data file record.
• A binary search on the index file requires fewer block accesses than a binary search on the data file.
• Insertion and deletion of records is problematic:
• We have to move records in the data file and we have to change some index entries.
• It is common to reserve a whole block for each distinct value of the clustering field, with all records with that value placed in the block.
• Storage overhead is not typically a serious problem.
Secondary (Non-Clustered) Indexes

• A secondary index is built for a non-ordering field of a data file.

• The index file is itself a sorted file whose records are fixed or
variable length consisting of two fields.
• The first field is the same data type as the indexing field.
• The second field is a pointer to a disk block for a record.

• We can consider two types of secondary indexes:

• Case 1: Using a dense secondary index that maps to all records in the
data file.
• Case 2: Using a secondary index that has an entry for each distinct key
value but whose pointers can be multivalued or point to a bucket of
values.
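The two cases can be contrasted with a small Python sketch. The records, the department field and both helper functions are hypothetical; list positions stand in for record pointers.

```python
from bisect import bisect_left

# Hypothetical data file ordered on staff id, NOT on department.
records = [
    (101, "Sales"), (102, "HR"), (103, "Sales"),
    (104, "IT"), (105, "HR"), (106, "Sales"),
]

# Case 1: dense index with one sorted (value, record position) entry per record.
dense_index = sorted((dept, pos) for pos, (_, dept) in enumerate(records))

# Case 2: one entry per distinct value, pointing to a bucket of positions.
bucket_index = {}
for pos, (_, dept) in enumerate(records):
    bucket_index.setdefault(dept, []).append(pos)


def find_dense(dept):
    """Binary search the dense index, then follow each matching pointer."""
    i = bisect_left(dense_index, (dept, -1))
    found = []
    while i < len(dense_index) and dense_index[i][0] == dept:
        found.append(records[dense_index[i][1]])
        i += 1
    return found


def find_bucket(dept):
    """Find the single entry for the value and follow its bucket."""
    return [records[pos] for pos in bucket_index.get(dept, [])]
```

Both functions return the same records; the difference is that Case 1 stores six index entries while Case 2 stores three entries plus their buckets.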
Clustered v Non-Clustered

1. Difference 1: There can be only one clustered index per table, but you can create
multiple non-clustered indexes on a single table.
2. Difference 2: A clustered index determines the order of the table rows themselves, so it
needs little storage beyond the table. Non-clustered indexes are stored in a
separate place from the actual table, claiming more storage space.
3. Difference 3: Lookups through a clustered index are generally faster than through a
non-clustered index, since they don’t involve an extra lookup step to reach the row.
Multi-Level Indexes
• When an index file becomes large and extends over many pages, the search time for the required index entry increases.
• A multi-level index attempts to overcome this problem by reducing the search range:
• Treat the index like any other file.
• Split the index into a number of smaller indexes.
• Maintain an index to the indexes.
• Multi-level indexes refer to a hierarchical structure of indexes.
• Here, each level of the index provides a more detailed reference to the data.
• It allows faster data retrieval, reduces disk access, and improves query performance.
Multi-Level Indexes - Performance

• Search performance improves when searching for a record based on a specific indexing field value.
• Problems with insertions and deletions are still present.
• To retain the benefits of multi-level indexing while reducing insertion and deletion problems, some space is left in each block for inserting new entries.
• This is called a dynamic multi-level index and is often implemented using balanced tree data structures (B-trees and B+-trees).
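A static two-level index ("an index to the indexes") might look like the following Python sketch. The key values, the block size of four entries and the function name are assumptions chosen to keep the example small.

```python
from bisect import bisect_right

# Hypothetical first-level index: sorted (key, data block number) entries.
first_level = [(k, k // 10) for k in range(0, 160, 10)]  # 16 entries
ENTRIES_PER_BLOCK = 4  # assumed capacity of one index block

# Split the first level into index blocks of at most 4 entries each.
index_blocks = [first_level[i:i + ENTRIES_PER_BLOCK]
                for i in range(0, len(first_level), ENTRIES_PER_BLOCK)]

# Second level: one entry per first-level block (its smallest key).
second_level = [block[0][0] for block in index_blocks]


def find_data_block(key):
    """Return the data block number that may hold key, using both levels."""
    # Level 2: choose the first-level block whose key range covers the key.
    top = bisect_right(second_level, key) - 1
    if top < 0:
        return None  # key is smaller than every indexed key
    # Level 1: search inside that single index block only.
    block = index_blocks[top]
    keys = [k for k, _ in block]
    pos = bisect_right(keys, key) - 1
    return block[pos][1]
```

Each lookup touches one second-level block and one first-level block, however many first-level blocks exist. Inserting a new key into this static layout would mean shifting entries, which is the problem dynamic structures such as B+-trees address.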
There are two types of index

A table or view can contain the following types of indexes:

Clustered

NonClustered
Clustered Index

• A clustered index defines the order in which data is physically stored in a table. Table data
can be sorted in only one way, therefore there can be only one clustered index per table. In SQL
Server, a primary key constraint creates a clustered index on that column by default, unless the
table already has a clustered index or a nonclustered index is requested.

• The only time the data rows in a table are stored in sorted order is when the table
contains a clustered index. When a table has a clustered index, the table is called a
clustered table. If a table has no clustered index, its data rows are stored in an
unordered structure called a heap.

-- Creating a clustered index

CREATE CLUSTERED INDEX IndexName_TableName_ColumnName
ON TableName (ColumnName ASC);
NonClustered Index

• A non-clustered index doesn’t sort the physical data inside the table. In fact, a non-clustered index is
stored at one place and table data is stored in another place. This is similar to a textbook where the
book content is located in one place and the index is located in another. This allows for more than one
non-clustered index per table

• The pointer from an index row in a nonclustered index to a data row is called a row locator. The
structure of the row locator depends on whether the data pages are stored in a heap or a clustered
table. For a heap, a row locator is a pointer to the row. For a clustered table, the row locator is the
clustered index key.

• You can add nonkey columns to the leaf level of the nonclustered index to by-pass existing index key
limits, and execute fully covered, indexed, queries. Both clustered and nonclustered indexes can be
unique.

CREATE NONCLUSTERED INDEX index_name ON table_name (column_name ASC);

Data Structures

• Most index data structures can be viewed as trees.

• In general, the root of this tree will always be in main memory,
while the leaves will be located on disk.
• The performance of a data structure depends on the number of
nodes in the average path from the root to the leaf.
• Data structures with high fan-out (maximum number of children of
an internal node) are thus preferred.
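The effect of fan-out on path length can be checked with a little arithmetic. The helper below is a sketch: the function name and the example figures are assumptions, and it simply counts how many levels are needed before a tree with a given fan-out can reach a given number of entries.

```python
def levels_needed(num_entries, fan_out):
    """Smallest number of levels h such that fan_out ** h >= num_entries."""
    if num_entries < 1 or fan_out < 2:
        raise ValueError("need num_entries >= 1 and fan_out >= 2")
    levels, capacity = 0, 1
    while capacity < num_entries:
        capacity *= fan_out  # each extra level multiplies the reach by fan_out
        levels += 1
    return levels
```

For one million entries, `levels_needed(1_000_000, 2)` gives 20 levels for a binary tree, while `levels_needed(1_000_000, 100)` gives 3 levels for a fan-out of 100, which is why wide nodes mean far fewer disk accesses per search.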
Hash index

• Indexes are used as entry points for memory-optimized tables. Reading rows from a table requires an index to locate the data in memory.
• A hash index consists of a collection of buckets organized in an array. A hash function maps index keys to corresponding buckets in the hash index.
• Records are placed on disk according to a hash function.
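The bucket array described above can be imitated with a toy Python class. This is a teaching sketch only: the class name, the default bucket count and the use of Python's built-in `hash` are assumptions, not how any particular DBMS implements its hash indexes.

```python
class HashIndex:
    """Toy hash index: an array of buckets, each a list of (key, row) pairs."""

    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket_for(self, key):
        # The hash function maps an index key to one bucket in the array.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, row):
        self._bucket_for(key).append((key, row))

    def lookup(self, key):
        """Point query: scan only the bucket the key hashes to."""
        return [row for k, row in self._bucket_for(key) if k == key]
```

A call such as `idx.lookup(12345)` inspects a single bucket, which suits point queries. Neighbouring key values land in unrelated buckets, so the structure offers no help for a query over a range of keys.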
B-tree index

• In computer science, a B-tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children (Comer 1979, p. 123).
B-Tree and Hash Index
• Hash indexes don’t help when evaluating range queries.
• A hash index outperforms a B-tree on point queries.
[Figure: bar chart "Point Queries" comparing throughput (queries/sec, axis 0 to 60) for a B-tree and a hash index]
16/11/2023
B+-Trees
• Special type of tree structure used for search purposes.
[Figure: example tree labelled Root Node, Child Node, Internal Node and Leaf Node; Level 0: A, Level 1: B C, Level 2: D E F]

B-Trees
◦ Invented around 1970, B-trees are still the prevailing data
structure for indexes in relational databases.
◦ A B-tree is a search tree with some additional constraints on it.
◦ These constraints ensure that the tree is always
balanced and that the space wasted by deletion (if any)
never becomes excessive.

B+-Trees
◦ Most implementations of dynamic multilevel indexes use
a variation of the B-tree data structure called a B+-tree.
Troubleshooting Techniques
• Monitoring a DBMS’s performance should be based on queries and resources.
• The consumption chain helps distinguish the causes of problems from their symptoms.
• Existing tools help extract relevant performance indicators.
Table tuning – Indexing
• What should be indexed?
• All attributes where you JOIN
• All attributes where you filter (WHERE)
• All attributes where you ORDER or GROUP BY
• All attributes where you want to do an Index Scan instead of a Table scan.
• NOT on attributes with an evenly distributed low cardinality.
Table tuning – Indexing
• How should tables be indexed?
• A multi-column index can only be used from left to right: a query must constrain the leading column(s) of the index to benefit from it.
• Keep index keys short.
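The left-to-right rule follows from how a multi-column index is sorted. In the Python sketch below, a sorted list of tuples stands in for a composite index on (surname, firstname); the data and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical composite index on (surname, firstname): sorted tuples.
index = sorted([
    ("Stone", "Amy"), ("Smith", "Zoe"), ("Stark", "Ned"),
    ("Smith", "Bob"), ("Jones", "Amy"),
])


def by_surname(surname):
    """Leading column constrained: binary search finds one contiguous run."""
    lo = bisect_left(index, (surname,))
    # chr(0x10FFFF) is the largest code point, so this bounds every firstname.
    hi = bisect_right(index, (surname, chr(0x10FFFF)))
    return index[lo:hi]


def by_firstname(firstname):
    """Only the second column constrained: every entry must be scanned."""
    return [entry for entry in index if entry[1] == firstname]
```

Entries for one surname sit next to each other, so `by_surname` needs two binary searches. Entries for one firstname are scattered through the index, so `by_firstname` falls back to a full scan.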
B-Tree example Youtube
• http://www.youtube.com/watch?v=coRJrcIYbF4
• Uses Bayer & McCreight terminology
Query type dictates the best Index
Point Query

This is a query that returns at most one record because of an equality
condition on a unique field in the WHERE clause.

Eg:
SELECT * FROM STAFF WHERE StaffID = '12345'
Multi-Point Query

A Multi-Point query may return more than one record using an equality condition.

Eg:
SELECT * FROM EMPLOYEES
WHERE DEPARTMENT = 'Human Resources'
A Range Query

This type of query will return a set of values within an interval or half-interval.

Eg:
SELECT * FROM EMPLOYEES
WHERE AGE >= 50 AND AGE < 70

SELECT * FROM EMPLOYEES
WHERE AGE >= 70

Indexing the AGE field could speed up these retrievals.
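One way to try this out is with Python's built-in sqlite3 module. The sketch below is illustrative only: the table contents and the index name idx_employees_age are made up, and SQLite is used as a stand-in, so its plan output will not look like that of SQL Server or other systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, surname TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO employees (surname, age) VALUES (?, ?)",
    [("Stone", 45), ("Smith", 52), ("Stark", 69), ("Jones", 70), ("Steel", 81)],
)
# Index on the column used in the range predicate (name is illustrative).
conn.execute("CREATE INDEX idx_employees_age ON employees (age)")

range_sql = (
    "SELECT surname, age FROM employees "
    "WHERE age >= 50 AND age < 70 ORDER BY age"
)
rows = conn.execute(range_sql).fetchall()

# Ask SQLite how it would run the query; the last column is the plan text.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + range_sql)]
```

Printing `plan` shows whether the optimizer chose to search the index for the age interval rather than scan the whole table.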

A Prefix Match query

A Prefix Match query is one where only the first part of the attribute, or of a sequence of attributes, is
specified.

Eg:
SELECT FIRSTNAME, SURNAME FROM EMPLOYEES
WHERE SURNAME LIKE 'ST%'

This query will return all records with surnames starting with the letters 'ST'.
If this type of query is to be run repeatedly, it makes sense to index the
SURNAME field.
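The reason an ordered index helps a prefix match is that all values sharing a prefix are adjacent in sort order. A rough Python sketch, using a sorted list of made-up surnames in place of the index and assuming a non-empty prefix:

```python
from bisect import bisect_left

# Hypothetical index on SURNAME: surnames kept in sorted order.
surnames = sorted(["STONE", "SMITH", "STARK", "JONES", "STEEL", "SUTTON"])


def prefix_match(prefix):
    """Return all surnames starting with a non-empty prefix."""
    lo = bisect_left(surnames, prefix)
    # Smallest string greater than every string with this prefix:
    # bump the last character of the prefix by one code point.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    hi = bisect_left(surnames, upper)
    return surnames[lo:hi]
```

The prefix 'ST' is turned into the half-open range from 'ST' up to (but excluding) 'SU', so a prefix match is really a range query in disguise. A pattern with a leading wildcard, such as '%ST', has no such range and cannot use the ordering.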
The End
