
CNG351 Lecture 12 A

Indexing Structures for Files

CNG351 - Data Management and File Structures


Lecture - 12
Instructor: Dr. Yeliz Yesilada
Outline
• Types of Single-level Ordered Indexes
– Primary Indexes
– Clustering Indexes
– Secondary Indexes
• Multilevel Indexes
• Dynamic Multilevel Indexes Using B-Trees and
B+-Trees
• Indexes on Multiple Keys

CNG 351 - lecture 11 2/23


Overview
• Indexes are used to speed up the retrieval of records
in response to certain search conditions.
• Indexes do not affect the physical placement of
records on disk.
• Any field of the file can be used to create an index,
and multiple indexes on different fields can be
constructed on the same file.
• Types of indexes:
– Ordered files (single-level indexes)
– Tree data structures (multilevel indexes, B+-trees)



Single-Level Ordered Indexes
• A single-level index is an auxiliary file that makes it
more efficient to search for a record in the data file.
• The idea behind an ordered index access structure is
similar to that behind the index at the back of a textbook.
• The index is usually specified on one field of the file
(although it could be specified on several fields)
• One form of an index is a file of entries <field value,
pointer to record>, which is ordered by field value
• The index is called an access path on the field.



Single-Level Ordered Indexes II
• An index access structure is usually defined on a single field of
a file, called an indexing field.
• The values in the index are ordered so that we can do a binary
search on an index.
• The index file usually occupies considerably fewer disk blocks
than the data file because its entries are much smaller.
• A binary search on the index yields a pointer to the file record
• Indexes can also be characterized as dense or sparse
– A dense index has an index entry for every search key
value (and hence every record) in the data file.
– A sparse (or nondense) index, on the other hand, has
index entries for only some of the search values
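
A minimal Python sketch of the dense/sparse distinction, assuming a toy data file held in memory as a list of blocks of (key, payload) records sorted by key. The names `blocks`, `dense_index`, `sparse_index` and `dense_lookup` are illustrative and do not come from the lecture.

```python
from bisect import bisect_left

# Toy ordered data file (assumption): 3 blocks, records sorted by key.
blocks = [
    [(2, "a"), (5, "b"), (8, "c")],
    [(11, "d"), (14, "e"), (17, "f")],
    [(20, "g"), (23, "h"), (26, "i")],
]

# Dense index: one <key, (block number, slot)> entry per record.
dense_index = [
    (key, (b, s))
    for b, block in enumerate(blocks)
    for s, (key, _) in enumerate(block)
]

# Sparse index: one <key, block number> entry per block (first key only).
sparse_index = [(block[0][0], b) for b, block in enumerate(blocks)]


def dense_lookup(key):
    """Binary-search the dense index, then follow the pointer to the record."""
    keys = [k for k, _ in dense_index]
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        b, s = dense_index[i][1]
        return blocks[b][s]
    return None


print(len(dense_index), len(sparse_index))  # 9 entries versus 3 entries
print(dense_lookup(14))
```

The sparse index is a third of the size here because it stores one entry per block instead of one per record; the price is that a lookup must still search inside the block it points to.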



Types of Single-Level Ordered Indexes
• There are several types of ordered indexes:
1. A primary index is specified on the ordering key field
of an ordered file of records.
2. A clustering index is specified on the ordering field
of an ordered file when that field is not a key field.
3. A secondary index can be specified on any
nonordering field of a file.



1. Primary Index
• Defined on an ordered data file.
• The data file is ordered on a key field.
• Includes one index entry for each block in the data
file; the index entry has the key field value for the first
record in the block, which is called the block anchor.
• A similar scheme can use the last record in a block.
• A primary index is a nondense (sparse) index, since
it includes one entry for each disk block of the data file
(holding the key of the block's anchor record) rather
than one for every search value.
Primary Index
• The total number of entries in the index is the same as the
number of disk blocks in the ordered data file.
• The index file needs substantially fewer blocks than does the
data file, for two reasons:
– There are fewer index entries than there are records in the
data file;
– Each index entry is typically smaller in size than a data
record because it has only two fields.
• A binary search on an index file requires fewer block accesses
than a binary search on the data file.
• A record whose primary key value is K lies in the block whose
address is P(i), where K(i) <= K < K(i+1) and <K(i), P(i)> is
index entry i.
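
The rule K(i) <= K < K(i+1) can be sketched as a binary search over the block anchors. This is a minimal illustration, assuming an in-memory toy file where the block number stands in for the block address P(i); `primary_index`, `anchors` and `primary_lookup` are hypothetical names.

```python
from bisect import bisect_right

# Toy ordered data file (assumption): records sorted on the key field.
blocks = [
    [(2, "a"), (5, "b"), (8, "c")],
    [(11, "d"), (14, "e"), (17, "f")],
    [(20, "g"), (23, "h"), (26, "i")],
]

# Primary index: entry i is <K(i), P(i)> = <anchor key of block i, block number>.
primary_index = [(block[0][0], i) for i, block in enumerate(blocks)]
anchors = [k for k, _ in primary_index]


def primary_lookup(key):
    """Return the record with this key, or None if it is not in the file."""
    # Last index entry whose anchor is <= key, so K(i) <= key < K(i+1).
    i = bisect_right(anchors, key) - 1
    if i < 0:
        return None  # key is smaller than the first anchor
    block = blocks[primary_index[i][1]]  # one data block access
    for record in block:
        if record[0] == key:
            return record
    return None


print(primary_lookup(17))
print(primary_lookup(9))
```

Only the index is binary-searched; the data file is touched once, for the single block that can contain the key.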
Primary Index Limitation
• A major problem with a primary index - as with any
ordered file - is insertion and deletion of records.
• Using an overflow file can address this problem.
• Another possibility is to use a linked list of overflow
records for each block in the data file.
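
A minimal sketch of the per-block overflow idea, under stated assumptions: a block holds at most `BLOCK_CAPACITY` keys, a Python list stands in for the linked list of overflow records, and the `Block`, `insert` and `search` names are made up for illustration. The point is that an insertion into a full block does not shift records or change the index.

```python
from bisect import bisect_right, insort

BLOCK_CAPACITY = 3  # assumed number of records per block


class Block:
    def __init__(self, keys):
        self.keys = list(keys)  # sorted records stored in the block itself
        self.overflow = []      # stands in for the block's overflow chain


blocks = [Block([2, 5, 8]), Block([11, 14])]
anchors = [b.keys[0] for b in blocks]  # primary index keys, never rewritten


def target_block(key):
    # Same rule as a primary index lookup; keys below the first anchor
    # are sent to block 0.
    return blocks[max(bisect_right(anchors, key) - 1, 0)]


def insert(key):
    block = target_block(key)
    if len(block.keys) < BLOCK_CAPACITY:
        insort(block.keys, key)     # room left: keep the block sorted
    else:
        block.overflow.append(key)  # block full: chain an overflow record


def search(key):
    block = target_block(key)
    return key in block.keys or key in block.overflow


insert(6)   # block 0 is full, so 6 goes to its overflow chain
insert(12)  # block 1 has room, so 12 is stored in place
print(blocks[0].overflow, blocks[1].keys, anchors)
```

A search now costs the index lookup, the block read, and possibly a walk along the overflow chain, which is why such files are typically reorganized periodically.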



2. Clustering Index
• If the records are physically ordered on a nonkey field - which
does not have a distinct value for each record - that field is called
the clustering field.
• An index called a clustering index is created to speed up the
retrieval of records that have the same value for the clustering
field.
• This differs from the primary index which requires that the
ordering field of the data file have a distinct value for each
record.
• It is another example of a nondense index, with one entry for each
distinct value of the clustering field. Insertion and deletion are
relatively straightforward only when a separate block is reserved
for each value.
• A clustering index is also an ordered file with two fields; the first
field is of the same type as the clustering field of the data file,
and the second field is a block pointer.
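
A minimal sketch of a clustering index, assuming each index entry points to the first block that contains a record with that value (the usual textbook convention). The toy file is ordered on a nonkey `dept` field with two records per block; all names are illustrative.

```python
from bisect import bisect_left

# Toy data file ordered on the nonkey field dept: (dept, name) records.
blocks = [
    [(1, "ann"), (1, "bob")],
    [(1, "cat"), (2, "dan")],
    [(3, "eve"), (3, "fay")],
    [(3, "gus"), (5, "hal")],
]


def build_clustering_index(blocks):
    """One <value, block number> entry per distinct clustering value."""
    index = []
    for block_no, block in enumerate(blocks):
        for value, _ in block:
            if not index or index[-1][0] != value:
                index.append((value, block_no))  # first block with this value
    return index


def find_all(index, blocks, value):
    """Return every record whose clustering field equals value."""
    keys = [v for v, _ in index]
    i = bisect_left(keys, value)
    if i == len(keys) or keys[i] != value:
        return []
    result = []
    # Scan forward from the first block; the file is ordered, so stop at
    # the first larger value.
    for block in blocks[index[i][1]:]:
        for record in block:
            if record[0] > value:
                return result
            if record[0] == value:
                result.append(record)
    return result


index = build_clustering_index(blocks)
print(index)
print(find_all(index, blocks, 3))
```

The index has four entries for eight records, and a lookup reads only the consecutive blocks that hold the requested value.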
A Clustering Index Example



Clustering Index Limitation
• Record insertion and deletion still cause problems
because the data records are physically ordered.
• To alleviate the problem of insertion, it is common to
reserve a whole block for each value of the clustering
field; all records with that value are placed in the
block.
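
A minimal sketch of the reserved-block scheme. Two points are assumptions rather than part of the slide: when the reserved block fills up, a further block is chained for the same value, and a Python dict stands in for the ordered index file. `BLOCK_CAPACITY`, `cluster_index` and `insert` are hypothetical names.

```python
BLOCK_CAPACITY = 2  # assumed number of records per block

blocks = []         # every block holds records of a single clustering value
cluster_index = {}  # clustering value -> block numbers reserved for it


def insert(value, name):
    chain = cluster_index.setdefault(value, [])
    # Reserve a new block if the value has none yet or its last one is full.
    if not chain or len(blocks[chain[-1]]) >= BLOCK_CAPACITY:
        blocks.append([])
        chain.append(len(blocks) - 1)
    blocks[chain[-1]].append((value, name))


for value, name in [(3, "eve"), (1, "ann"), (3, "fay"), (3, "gus")]:
    insert(value, name)

print(cluster_index)
print(blocks)
```

Because no block mixes values, inserting a record never forces records of other values to move, at the cost of partly empty blocks.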



Another Clustering Index Example



3. Secondary Index
• A secondary index provides a secondary means of
accessing a file for which some primary access already
exists.
• The secondary index may be on a field which is a
candidate key and has a unique value in every record, or
on a nonkey field with duplicate values.
• The index is an ordered file with two fields.
– The first field is of the same data type as some
nonordering field of the data file that is an indexing field.
– The second field is either a block pointer or a record pointer.
– There can be many secondary indexes (and hence, indexing
fields) for the same file.
• A secondary index on a key field includes one entry for each
record in the data file; hence, it is a dense index
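
A minimal sketch of a dense secondary index on a key field. It assumes a toy file that is physically ordered on `emp_id` while the index is built on `ssn`, a nonordering candidate key, and it uses record pointers of the form (block number, slot). All names and values are illustrative.

```python
from bisect import bisect_left

# Toy data file ordered on emp_id; ssn is unique but not in file order.
blocks = [
    [(1, "555"), (2, "222")],
    [(3, "888"), (4, "111")],
    [(5, "999"), (6, "333")],
]

# Dense secondary index: one <ssn, record pointer> entry per record,
# kept sorted on ssn so it can be binary-searched.
secondary_index = sorted(
    (ssn, (b, s))
    for b, block in enumerate(blocks)
    for s, (_, ssn) in enumerate(block)
)
ssn_keys = [k for k, _ in secondary_index]


def lookup_by_ssn(ssn):
    """Binary-search the index, then fetch the record it points to."""
    i = bisect_left(ssn_keys, ssn)
    if i < len(ssn_keys) and ssn_keys[i] == ssn:
        b, s = secondary_index[i][1]
        return blocks[b][s]
    return None


print(ssn_keys)
print(lookup_by_ssn("111"))
```

The index has to be dense here because the data file is not ordered on `ssn`, so a block anchor would say nothing about the other records in its block.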
Secondary Index
• A secondary index usually needs more storage space
and longer search time than does a primary index,
because of its larger number of entries.
• However, the improvement in search time for an
arbitrary record is much greater for a secondary
index than for a primary index, since we would have
to do a linear search on the data file if the secondary
index did not exist.
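
To put rough numbers on this comparison, the sketch below counts block accesses for an illustrative file. Every figure is an assumption chosen for the example (30,000 records of 100 bytes, 1024-byte blocks, 15-byte index entries, unspanned blocks, a key-field secondary index, and a linear search that reads half the file on average); none of them comes from the slides.

```python
from math import ceil, log2


def access_costs(r=30_000, block_size=1024, record_size=100, entry_size=15):
    """Block accesses needed to find one record under each access method."""
    bfr = block_size // record_size       # records per data block
    b = ceil(r / bfr)                     # data blocks
    bfr_i = block_size // entry_size      # index entries per index block
    primary_blocks = ceil(b / bfr_i)      # one entry per data block
    secondary_blocks = ceil(r / bfr_i)    # one entry per record
    return {
        "binary search on data file": ceil(log2(b)),
        "primary index": ceil(log2(primary_blocks)) + 1,      # +1 data block
        "linear search (average)": b // 2,
        "secondary index": ceil(log2(secondary_blocks)) + 1,  # +1 data block
    }


for method, cost in access_costs().items():
    print(f"{method}: {cost} block accesses")
```

Under these assumptions the primary index reduces 12 accesses to 7, while the secondary index reduces 1500 accesses to 10: the secondary index is slower in absolute terms but replaces a far more expensive alternative.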



Secondary Index - Nonkey Field
• There are different options for implementing a secondary
index on a nonkey field:
– Option 1: A dense index that includes several entries with
the same K(i) value, one for each record.
– Option 2: Variable-length records for the index entries,
with a repeating field for the pointer.
– Option 3 (more commonly used): Keep the index entries
themselves at a fixed length, with a single entry for
each index field value, and create an extra level of
indirection to handle the multiple pointers.
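
A minimal sketch of Option 3, assuming record pointers are positions in an in-memory list and each "pointer block" is a Python list of such positions. The names `records`, `pointer_blocks`, `index` and `find_by_dept` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# Toy data file, not ordered on dept: (name, dept) records.
records = [
    ("ann", 3), ("bob", 1), ("cat", 3), ("dan", 2), ("eve", 1), ("fay", 3),
]

# Collect the record pointers (list positions) for each dept value.
grouped = defaultdict(list)
for rid, (_, dept) in enumerate(records):
    grouped[dept].append(rid)

pointer_blocks = []  # the extra level of indirection
index = []           # fixed-length entries: <dept, pointer-block number>
for dept in sorted(grouped):
    index.append((dept, len(pointer_blocks)))
    pointer_blocks.append(grouped[dept])


def find_by_dept(dept):
    """Index entry -> pointer block -> all matching records."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, dept)
    if i == len(keys) or keys[i] != dept:
        return []
    return [records[rid] for rid in pointer_blocks[index[i][1]]]


print(index)
print(find_by_dept(3))
```

The index stays small and uniform, with one entry per distinct value, while the variable number of pointers per value is pushed into the pointer blocks.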



Properties of Single-Level Index Types



Summary
• Types of Single-level Ordered Indexes
– Primary Indexes
– Clustering Indexes
– Secondary Indexes

