Chapter 03 - Database Indexing and Tuning - converted
• At the end of this lesson, you will be able to:
  • Describe how the different computer memories evolved.
  • Identify the different types of computer memory and the storage organizations of databases.
  • Recognize the importance of implementing indexes on databases.
  • Explain the key concepts of the different types of database indexes.

3.1 Disk Storage and Basic File Structures
  3.1.1 Computer memory hierarchy
  3.1.2 Storage organization of databases
  3.1.3 Secondary storage media
  3.1.4 Solid State Device Storage
  3.1.5 Placing file records on disk (types of records)
  3.1.6 File Operations
  3.1.7 Files of unordered records (Heap Files) and ordered records (Sorted Files)
  3.1.8 Hashing techniques for storing database records: Internal hashing, external hashing

3.1.1. Computer Memory Hierarchy
• The data collected via a computational database should be stored in a physical storage medium.
• Once stored in a storage medium, the database management software can execute functions on it to retrieve, update and process the data.
• In current computer systems, data is stored and moved across a hierarchy of storage media.
• In this hierarchy, the memory with the highest speed is the most expensive option and also has the lowest capacity.
• The lowest-speed memories are the options with the highest available storage capacity.

Now let’s explain the hierarchy given in the previous slide (slide number 7).

1. Primary Storage
This storage can be operated on directly by the computer’s Central Processing Unit (CPU).
E.g.: Main Memory, Cache Memory.
• Provides fast access to data.
• Limited storage capacity.
• Contents of primary storage are lost when the computer shuts down or in case of a power failure.
• Comparatively more expensive.

• Static Random Access Memory (SRAM) retains its contents as long as power is provided.
• Cache memory in the CPU is identified as Static RAM.
• Data is kept as bits in its memory.
• The most expensive type of memory.
• Using techniques like prefetching and pipelining, the cache memory speeds up the execution of program instructions for the CPU.

• Dynamic Random Access Memory (DRAM) is the CPU's space for storing application instructions and data.
• Main memory of the computer is identified as DRAM.
• The advantage of DRAM is its low cost.
• Compared with Static RAM, it is slower.
© e-Learning Centre, UCSC

Activity
Categorize the following devices as Primary, Secondary or Tertiary Storage Media.
1. Random Access Memory
2. Hard Disk Drive
3. Flash Drive
4. Tape Libraries
5. Optical Jukebox
6. Magnetic Tape
7. Main Memory

3.1.2. Storage Organization of Databases
• Databases usually hold persistent data: large volumes of data stored over long periods of time.
• This persistent data is continuously retrieved and processed during the storage period.
• The place where databases are stored permanently in the computer is secondary storage.
• Magnetic disks are widely used here since:
  - If the database is too large, it will not fit in the main memory.
  - Secondary storage is non-volatile, but the main memory is volatile.
  - The cost of storage per unit of data is lower in secondary storage.
• The data on disk is organized as files of records.
• These records include data about entities, attributes and relationships.
• Whenever a certain portion of the data is retrieved from the DB for processing, it needs to be found on disk, copied to main memory for processing, and then rewritten to the disk if the data gets updated.
• Therefore, the data should be kept on disk in a way that allows it to be accessed quickly when it is needed.
• Primary File Organization defines how the data is stored physically on the disk and how it can be accessed.

File Organization | Description
Heap File | No particular order in storing data. Appends new records to the end.
Sorted File | Maintains an order for the records by sorting data on a particular field.
Hashed File | Uses a hash function applied to a field to identify the record’s place in the database.
B Trees | Use tree structures for storing records.

Activity
State whether the following statements are true or false.
1. The place of permanently storing databases is the primary storage.
2. A Heap File has a specific ordering criterion where the new records are added at the end.
3. Upon retrieval of data from a file, it needs to be found on disk and copied to main memory for processing.
4. The database administrators need to be aware of the physical structuring of the database to identify whether they can be sold to a client.
5. Solid State Drives are identified as alternatives to magnetic disks.

3.1.3. Secondary Storage Media
• The device that holds the magnetic disks is the Hard Disk Drive (HDD).
• The basic unit of data on an HDD is the Bit. Bits together make Bytes. One character is stored using a single byte.
• The Capacity of a disk is the number of bytes the disk can store.
• Disks are composed of magnetic material in the shape of a thin round disk, with a plastic or acrylic cover to protect it.
• A Single Sided Disk stores information on one of its surfaces.
• A Double Sided Disk stores information on both of its surfaces.
• A few disks assembled together make a Disk Pack, which has a higher storage capacity.
• A collection of several contiguous blocks is called a Cluster.
• The hardware mechanism that reads or writes a block of data is the Read/Write Head.
• A read/write head includes an electronic component attached to a mechanical arm.
• Fixed Head Disks - Disk units in which the read/write heads are fixed, with as many heads as there are tracks.
• Movable Head Disks - Disk units with an actuator connected to a second electrical motor that moves the read/write heads together and accurately positions them over the cylinder of tracks defined in a block address.

Activity
Match the description with the relevant technical term out of the following.
[Capacity, Track, Buffer, Hardware Address of a Block, Cluster]
1. The concentric circles on a disk where information is stored.
2. Combination of a cylinder number, track number, and block number.
3. Number of bytes that a disk can store.
4. Collection of contiguous blocks.
5. A reserved location in primary storage that holds a disk block.

3.1.4. Solid State Device Storage
• Solid State Device (SSD) Storage is sometimes known as Flash Storage.
• SSDs can store data on secondary storage without requiring constant power.
• A controller and a group of interconnected flash memory cards are the essential components of an SSD.
• SSDs can be plugged into slots already available for mounting Hard Disk Drives (HDDs) on laptops and servers by using form factors compatible with HDDs.
• SSDs are more durable, run silently, are faster in terms of access time, and deliver better transfer rates than HDDs because they have no moving parts.
• As opposed to an HDD, where Blocks and Cylinders should be pre-assigned for storing data, any address on an SSD can be directly addressed, since there are no restrictions on where data can be stored.
• With this direct access, data is less likely to be fragmented, and restructuring is not needed.
• Dynamic Random Access Memory (DRAM)-based SSDs are also available in addition to flash-memory-based SSDs.
• DRAM-based SSDs are more expensive than flash memory, but they provide faster access. However, they need an internal power supply to operate.

Activity
State four key features of a Solid State Drive (SSD).
1.____________________
2.____________________
3.____________________
4.____________________

3.1.5. Placing File Records on Disk
[Figure: the Employee relation with the columns Emp_No, Name, Date_Of_Birth, Position and Salary, with a Field, a Record and a Value labelled.]
Explanation can be found on the next slide.
• As shown in the previous slide, columns of the table are called fields, rows are called records, and each cell data item is called a value.
• The Data Type of a field is one of the standard data types that are used in programming.
  - Numeric (Integer, Long Integer, Floating Point)
  - Characters / Strings (Fixed length, varying length)
  - Boolean (True or False and 0 or 1)
  - Date, Time data types.
• For a particular computer system, the number of bytes necessary for each data type is fixed.

An example of creating the Employee relation using MySQL:

    CREATE TABLE Employee
    (
        Emp_No        INT,
        Name          CHAR(50),
        Date_Of_Birth DATE,
        Position      CHAR(50),
        Salary        INT
    );

Activity
Select the Data Type that best matches the description out of the following.
[Integer, Floating Point, Date and Time, Boolean, Character]
1. NIC Number of Sri Lankans
2. The access time of users for the Ministry of Health website within a week
3. The number of students in a class
4. Cash balance of a bank account
5. Response to the question by a set of students whether they have received the vaccination for Rubella.

3.1.5. Placing File Records on Disk
• A file is a sequence of records. Usually all records in a file belong to the same record type.
• If every record in the file has the same size (in bytes), the file is said to be made up of Fixed Length Records.
• Variable Length Records means that different records of the file are of different sizes.
• Reasons to have variable length records in a file:
  - One or more fields are of varying size.
  - For individual records, one or more of the fields may have multiple values (Repeating Group/Field).
  - One or more fields are optional (Optional Fields).
  - The file includes different record types (Mixed File).
[Figure: Record Storage Format 3. E.g.: a variable-field record with three types of separator characters.]

• In a record with Optional Fields:
  • A series of <Field-Name, Field-Value> pairs, rather than just the field values, can be stored in each record if the overall number of fields for the record type is high but the number of fields that actually occur in a typical record is low.
  • It is more practical to assign a short Field Type code to each field and include in each record a series of <Field-Type, Field-Value> pairs.
• In a record with a Repeating Field:
  • One separator character can be used to separate the field's repeated values and another separator character can be used to mark the field's end.
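
The <Field-Name, Field-Value> idea for optional fields can be sketched in a few lines of Python. This is only an illustration: the three separator characters chosen here (`=`, `;`, `#`) and the function names are assumptions, since the slides do not name specific separators.

```python
from typing import Dict, Optional

# Hypothetical separator characters (not specified in the slides).
NAME_VALUE_SEP = "="  # separates a field name from its value
FIELD_SEP = ";"       # separates one field from the next
RECORD_END = "#"      # terminates the record


def encode_record(fields: Dict[str, Optional[str]]) -> str:
    """Store only the fields that are present, as name/value pairs."""
    pairs = [
        f"{name}{NAME_VALUE_SEP}{value}"
        for name, value in fields.items()
        if value is not None  # optional fields without a value are skipped
    ]
    return FIELD_SEP.join(pairs) + RECORD_END


def decode_record(raw: str) -> Dict[str, str]:
    """Rebuild the field dictionary from the separator-delimited string."""
    if not raw.endswith(RECORD_END):
        raise ValueError("record terminator missing")
    body = raw[:-1]
    result: Dict[str, str] = {}
    if body:
        for pair in body.split(FIELD_SEP):
            name, _, value = pair.partition(NAME_VALUE_SEP)
            result[name] = value
    return result


encoded = encode_record({"Name": "Nimal", "Position": None, "Salary": "70000"})
decoded = decode_record(encoded)
```

Because the missing `Position` field is simply not written, records with few populated fields stay short. The sketch assumes that field values never contain the separator characters themselves.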

Activity
Fill in the blanks in the following statements.
1. A file whose records differ in size is called a _______________.
2. A _________________ includes different types of records inside it.
3. In a file, the records belong to _________ record type.
4. A ___________ length record can be made by assigning “Null” for optional fields where data values are not available.
5. To determine and terminate variable-length fields, special characters called __________ can be used.

3.1.5. Placing File Records on Disk
• A Block is the unit of data transfer between disk and memory.
• When the block size exceeds the record size, each block will contain several records. However, certain files may have exceptionally large records that cannot fit in a single block.
• Blocking Factor (bfr) is the number of records per block.
• If Block Size > Record Size, bfr can be calculated using the equation below.

bfr = floor(B / R)
Block Size = B bytes
Record Size = R bytes
• In calculating the bfr, the floor function rounds the result down to the nearest integer.
• Because of this rounding, there may be some unused space remaining in each block.
• The unused space can be calculated with the equation given below.

Unused Space in bytes = B - (bfr * R)
where B is the block size and bfr * R is the space occupied by the records in the block.

• To make use of this unused space and minimize waste, part of a record can be stored in one block and the remaining part in another block.
• If the next block on disk is not the one holding the remainder of the record, a Pointer at the end of the first block refers to it.
• Spanned Organization of Records - One record can span more than one block.
  • Used when a record is larger than the block size.
• Unspanned Organization of Records - Records are not allowed to span more than one block.
  • Used with fixed length records.
Let’s look at the representation of Spanned and Unspanned Organization of Records.
• A spanned or unspanned organization can be utilized with variable-length records.
• If it is a spanned organization, each block may store a different number of records.
• Here the bfr would be the average number of records per block.
• Hence the number of blocks b needed for a file of r records is,

b = ceiling(r / bfr)
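
The blocking factor, unused space and block count formulas can be checked with a short Python sketch. The values B = 1,024, R = 100 and r = 1,000 are made up for illustration and do not come from the slides; the function names are likewise hypothetical.

```python
import math


def blocking_factor(block_size: int, record_size: int) -> int:
    """bfr = floor(B / R); meaningful when the block is at least one record long."""
    return block_size // record_size


def unused_space(block_size: int, record_size: int) -> int:
    """Bytes left over in each block under unspanned organization: B - bfr * R."""
    return block_size - blocking_factor(block_size, record_size) * record_size


def blocks_needed(num_records: int, bfr: int) -> int:
    """b = ceiling(r / bfr)."""
    return math.ceil(num_records / bfr)


# Illustrative values only (assumed, not taken from the slides).
B, R, r = 1024, 100, 1000
bfr = blocking_factor(B, R)
wasted = unused_space(B, R)
b = blocks_needed(r, bfr)
print(bfr, wasted, b)  # prints: 10 24 100
```

With these numbers each block holds 10 records and leaves 24 bytes unused, which is the space a spanned organization would try to reclaim.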

Activity
State the answer for the following calculations.
Consider a disk with block size B = 512 bytes. A file has r = 30,000 EMPLOYEE records of fixed length. Each record has the following fields: NAME (30 bytes), NIC (9 bytes), DEPARTMENTCODE (9 bytes), ADDRESS (40 bytes), PHONE (9 bytes), BIRTHDATE (8 bytes), SEX (1 byte), JOBCODE (4 bytes), SALARY (4 bytes, real number). An additional byte is used as a deletion marker.

3.1.6. File Operations
[Diagram: Operations on Files are divided into Retrieval Operations and Update Operations.]

Emp_No | Name | Date_Of_Birth | Position | Salary
0001 | Nimal | 1971-04-13 | Manager | 70,000
0005 | Krishna | 1980-01-25 | Supervisor | 50,000

• Simple Selection Condition: Search for the record where Emp_No = “0005”
• Complex Selection Condition: Search for the record where Salary > 60,000
• When several file records meet a search criterion, the first record in the physical sequence of file records is located and designated as the Current Record. Subsequent search operations start from this record and find the next record in the file that meets the criterion.
• The actual procedures for identifying and retrieving file records differ from one system to the next.
• The following are called “Set at a time” operations since they are applied to the file in full.
[Table: Operation, Description]
• File Organization - The way a file's data is organized into records, blocks, and access structures, including how records and blocks are placed on the storage media and interconnected.
• Dynamic Files - Files on which modifications are frequently done.
• Read Only File - A file that cannot be modified by the end user.

Activity
Match the following descriptions with the relevant file operation out of the following.
1. Returns the initial record if the file has just been opened or reset; otherwise, returns the next record.
2. Releases the buffers and does any other necessary cleaning actions.
3. Sets the file pointer of an open file to the beginning of the file.
4. The first record that meets a search criterion is found.
5. Locates all the records in the file that satisfy a search condition.

3.1.7. Files of unordered records (Heap Files) and ordered records (Sorted Files)
Files of Unordered Records (Heap Files)
• Records are entered into the file in the order in which they are received, thus new records are placed at the end.
• Inserting a new record is quick and efficient. The file's last disk block is copied into a buffer, the new record is added, and the block is then rewritten to disk. The address of the last file block is kept in the file header.
• Searching for a record is done by Linear Search.
• If just one record meets the search criteria, the program will typically read into memory and search half of the file blocks before finding the record. Here, on average, searching (b/2) blocks for a file of b blocks is required.
• If the search criteria is not satisfied by any record, or is satisfied by many records, the program must read and search all b blocks in the file.
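
A minimal Python sketch of linear search over a heap file follows. It assumes the file is modelled as a list of blocks, each block being a list of (key, value) records; this representation and the function name are illustrative choices, not something the slides prescribe.

```python
def linear_search(blocks, key):
    """Scan the blocks in physical order; return (record, blocks_read)."""
    blocks_read = 0
    for block in blocks:
        blocks_read += 1  # each block must be brought into memory
        for record in block:
            if record[0] == key:
                return record, blocks_read
    return None, blocks_read  # key absent: every block was read


# A small heap file: records appear in arrival order, not key order.
heap_file = [
    [(7, "G"), (2, "B")],
    [(9, "I"), (4, "D")],
    [(1, "A"), (6, "F")],
    [(3, "C"), (8, "H")],
]

found, cost_hit = linear_search(heap_file, 1)     # found in the third block
missing, cost_miss = linear_search(heap_file, 99)  # not in the file
```

A successful search stops as soon as the record is found, while an unsuccessful one reads all b blocks, which matches the b/2 average and b worst case described above.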

Files of Ordered Records (Sorted Files)
• The values of one of the fields of a file's records, called the Ordering Field, can be used to physically order the records on disk. This produces an ordered or sequential file.
• Ordered records offer a few benefits over files that are unordered.
• Benefits of ordered records:
  • Because no sorting is necessary, reading the records in order of the ordering key values becomes highly efficient.
  • Because the next record is usually in the same block as the current one, locating it in order of the ordering key typically does not need any extra block accesses.
  • When the binary search approach is employed, a search criterion based on the value of an ordering key field results in quicker access.
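
Binary search on an ordered file works on blocks rather than individual records. The sketch below assumes the file is a list of non-empty blocks, each a list of (key, value) records sorted on the ordering key; the data and names are hypothetical.

```python
def binary_search_blocks(blocks, key):
    """Binary search over the blocks of an ordered file.

    Returns (record, block_accesses); record is None if the key is absent.
    """
    low, high = 0, len(blocks) - 1
    accesses = 0
    while low <= high:
        mid = (low + high) // 2
        block = blocks[mid]  # reading the middle block costs one access
        accesses += 1
        if key < block[0][0]:       # smaller than the first key in the block
            high = mid - 1
        elif key > block[-1][0]:    # larger than the last key in the block
            low = mid + 1
        else:                       # the key can only be in this block
            for record in block:
                if record[0] == key:
                    return record, accesses
            return None, accesses
    return None, accesses


# Ordered file with keys 1..16, two records per block (8 blocks).
ordered_file = [[(k, f"rec{k}"), (k + 1, f"rec{k + 1}")] for k in range(1, 17, 2)]

hit, hit_cost = binary_search_blocks(ordered_file, 11)
miss, miss_cost = binary_search_blocks(ordered_file, 99)
```

Each iteration halves the range of candidate blocks, so the number of block accesses grows roughly with log2 of the number of blocks instead of linearly.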

3.1.8. Hashing techniques for storing database records: Internal hashing, external hashing
• Internal Hashing.
• For internal files, hashing is usually implemented as a Hash Table, using an array of records.
• Method 1 for Internal Hashing
  • If the array index range is 0 to m – 1, there are m slots with addresses that correspond to the array indexes.
  • Then a hash function is selected that converts the value of the hash field into an integer between 0 and m – 1.
  • The record address is then calculated using the given function.

h(K) = K mod m
where h(K) is the hash function applied to K, and K is the field value.

[Figure: Internal Hashing Data Structure - array of m positions to use in internal hashing.]
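
The mod-based hash function can be tried out with a small in-memory table. This is a minimal sketch: m = 7 and the record values are arbitrary, and the `insert` helper simply reports a collision instead of resolving it, because collision resolution is not covered at this point in the slides.

```python
M = 7  # number of slots, addressed 0 .. M - 1 (illustrative value)


def h(key: int) -> int:
    """h(K) = K mod m"""
    return key % M


table = [None] * M  # the array of m positions


def insert(key: int, record: str) -> bool:
    """Place the record in slot h(key); return False if the slot is taken."""
    slot = h(key)
    if table[slot] is not None:
        return False  # collision: another key already hashed to this slot
    table[slot] = (key, record)
    return True


def lookup(key: int):
    """Return the record stored for key, or None if it is not in the table."""
    entry = table[h(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None


first = insert(10, "Nimal")     # 10 mod 7 = 3, slot 3 is free
second = insert(17, "Krishna")  # 17 mod 7 = 3, collides with key 10
```

Keys 10 and 17 both map to slot 3, which shows why a practical hash file also needs a collision resolution strategy.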
• External Hashing.
• Hashing for disk files is called External Hashing.
• To match the properties of disk storage, the target address space is made up of Buckets, each of which stores many records.
• A bucket is either a single disk block or a group of contiguous disk blocks.
• Rather than allocating an absolute block address to the bucket, the hashing function maps a key to a relative bucket number.
• The bucket number is converted into the matching disk block address via a table in the file header.
[Figure: matching bucket numbers (0 to M - 1) to disk block addresses.]
• Since a bucket holds many records, several records can hash to the same bucket without causing problems, so the collision problem is less severe with buckets.
• When a bucket is filled to capacity and a new record hashes to it, a variant of chaining can be used in which each bucket stores a pointer to a linked list of overflow records for that bucket.
• Here, the linked list pointers should be Record Pointers, which comprise a block address as well as a relative record position inside the block.
[Figure: handling overflow for buckets by chaining.]
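
External hashing with bucket overflow can be sketched as follows. Everything concrete here is an assumption made for illustration: M = 4 buckets, a capacity of 2 records per bucket, made-up block addresses starting at 1000, and a plain Python list standing in for the linked list of overflow records.

```python
M = 4                 # number of buckets (assumed)
BUCKET_CAPACITY = 2   # records that fit in one bucket (assumed)

# Header table: relative bucket number -> disk block address (made-up addresses).
bucket_directory = {n: 1000 + n for n in range(M)}

# Simulated disk: block address -> records stored in that block.
disk = {address: [] for address in bucket_directory.values()}

# Overflow chain per bucket; a list stands in for the linked list of records.
overflow = {n: [] for n in range(M)}


def insert(key: int, record: str) -> None:
    bucket_no = key % M                      # relative bucket number
    block = disk[bucket_directory[bucket_no]]
    if len(block) < BUCKET_CAPACITY:
        block.append((key, record))
    else:
        overflow[bucket_no].append((key, record))  # bucket full: chain it


def search(key: int):
    bucket_no = key % M
    for stored_key, record in disk[bucket_directory[bucket_no]]:
        if stored_key == key:
            return record
    for stored_key, record in overflow[bucket_no]:  # follow the overflow chain
        if stored_key == key:
            return record
    return None


for k, name in [(1, "A"), (5, "B"), (9, "C"), (2, "D")]:
    insert(k, name)
```

Keys 1, 5 and 9 all hash to bucket 1. The first two fit in the bucket itself and the third goes to the overflow chain, so a search for key 9 checks the bucket first and then the chain.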

3.2 Introduction to indexing
• Some Commonly Used Types of Indexes
  • Single Level Ordered Indexes
    • Primary Index
    • Secondary Index
    • Clustering Index
  • Multi Level Tree Structured Indexes
    • B Trees
    • B+ Trees
  • Hash Indexes
  • Logical Indexes
  • Multi Key Indexes
  • Bitmap Indexes

3.3 Types of Indexes
• Single Level Indexes: Primary, Clustering and Secondary indexes
• Primary, Clustering and Secondary indexes are types of single level ordered indexes.
• In some books, the last pages hold an ordered list of words, arranged from A to Z. Each entry gives the word together with the page numbers on which that word appears. This list of words is known as an index.
• If a reader needs to find a particular term, he/she can go to the index, find the pages where the term appears, and then go through those pages.
• Otherwise readers have to go through the whole book searching for the term, which is similar to a linear search.
• Primary Index - defined for an ordered file of records using the ordering key field.
• File records on a disk are physically ordered by the ordering key field. This ordering key field holds unique values for each record.
• A Clustering Index is applied when multiple records in the file have the same value for the ordering field; here the ordering field is a non-key field. In this scenario, the data file is referred to as a clustered file.
• A file can have at most one physical ordering field. Therefore, a file can have either one primary index or one clustering index, but not both at once.
• Unlike primary indexes, a file can have several secondary indexes in addition to the primary index.
• Single Level Indexes: Primary indexes
• Primary indexes are access structures that are used to increase the efficiency of searching and accessing the data records in a data file.
• A primary index file is an ordered file whose records are of fixed length and consist of two fields.
• One field is the ordering key field. The ordering key field of the index file and the primary key of the data file have the same data type.
• The other field contains pointers to the disk blocks.
• Hence, the index file contains one index entry (a.k.a. index record) for each block in the data file.
• As mentioned before, an index entry consists of two values:
  i. The primary key field value of the first record in a data block.
  ii. A pointer to the data block which contains that primary key field value.
• For index entry i, the two field values can be referred to as <K(i), P(i)>.
The image given in the next slide illustrates the index file and
respective block pointers to the data file.
• In the illustration on the previous slide, the number of index entries in the index file is equal to the number of disk blocks in the data file.
• Anchor record/Block anchor: for a given block in an ordered data file, the first record in that block is known as the anchor record. Each block has an anchor record.
• Dense index and Sparse index
  i. An index that contains an index entry for each record in the data file (or each search key value) is referred to as a dense index.
  ii. An index that contains index entries for only some of the records in the data file is referred to as a sparse index.
• Therefore, by definition, a primary index is a sparse (or non-dense) index, since it does not keep an index entry for every record in the data file. Instead, a primary index keeps one index entry for the anchor record of each block of the data file.
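
A sparse primary index built from block anchors can be sketched in Python. The data file is modelled as a list of non-empty blocks of (key, value) records sorted on the key, and a block number stands in for the block pointer P(i); these modelling choices and the function names are assumptions for illustration.

```python
from bisect import bisect_right


def build_primary_index(blocks):
    """One <K(i), P(i)> entry per block: anchor key and block number."""
    return [(block[0][0], block_no) for block_no, block in enumerate(blocks)]


def lookup(blocks, index, key):
    """Binary search the index, then read the single data block it points to."""
    anchors = [anchor for anchor, _ in index]
    pos = bisect_right(anchors, key) - 1  # last anchor that is <= key
    if pos < 0:
        return None                       # key is smaller than every anchor
    _, block_no = index[pos]
    for record in blocks[block_no]:
        if record[0] == key:
            return record
    return None


data_file = [
    [(1, "a"), (3, "b")],
    [(5, "c"), (8, "d")],
    [(10, "e"), (12, "f")],
]
primary_index = build_primary_index(data_file)
```

The index has three entries for six records, one per block, which is what makes it sparse. A lookup for key 8 finds anchor 5 as the last anchor not greater than 8 and then scans only that block.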
• Single Level Indexes: Primary indexes
Ex: Let’s say we have an ordered file, ordered on its key field. File records are of fixed size and are unspanned. The following details are given, and we are going to calculate the block accesses required when performing a binary search on the data file.
number of records r = 300,000
block size B = 4,096 bytes
record length R = 100 bytes
We can calculate the blocking factor,
bfr = floor(B/R) = floor(4,096/100) = 40 records per block
Hence, the number of blocks needed to store all records,
b = ceiling(r/bfr) = ceiling(300,000/40) = 7,500 blocks.
Block accesses required = ceiling(log2 b) = ceiling(log2 7,500) = 13

Ex: For the previous scenario, if we have a primary index file with a 9 bytes long ordering key field (V) and a 6 bytes long block pointer (P), the required block accesses can be calculated as follows.
number of records r = 300,000
block size B = 4,096 bytes
index entry length Ri = (V + P) = 15 bytes
We can calculate the blocking factor for the index,
bfri = floor(B/Ri) = floor(4,096/15) = 273 index entries per block
The number of index entries required, ri, is equal to the number of blocks required for the data file, so ri = b = 7,500.
Hence, the number of blocks needed for the index file,
bi = ceiling(ri/bfri) = ceiling(7,500/273) = 28 blocks.
Go to the next slide for the rest of the calculation.
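
The arithmetic of this example can be reproduced with a few lines of Python. The first part mirrors the figures given above. The last step is an assumption about how the calculation continues: it follows the usual treatment of a primary index, in which a binary search on the index file is followed by one more access to fetch the data block.

```python
import math

# Values given in the example.
r, B, R = 300_000, 4_096, 100
V, P = 9, 6

# Binary search directly on the ordered data file.
bfr = B // R                                       # floor(4,096 / 100)
b = math.ceil(r / bfr)                             # data blocks
accesses_without_index = math.ceil(math.log2(b))

# Primary index: one entry per data block.
Ri = V + P                                         # index entry length
bfri = B // Ri                                     # index entries per block
ri = b                                             # number of index entries
bi = math.ceil(ri / bfri)                          # index blocks

# Assumed final step: binary search on the index plus one data block access.
accesses_with_index = math.ceil(math.log2(bi)) + 1

print(bfr, b, accesses_without_index)       # prints: 40 7500 13
print(bfri, bi, accesses_with_index)        # prints: 273 28 6
```

Under that assumption the index reduces the search from 13 block accesses to 6, because the index file occupies far fewer blocks than the data file.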
• Single Level Indexes: Primary indexes
• An unordered overflow file can be used to reduce the problem of inserting records into a data file that has a primary index.
• Adding a linked list of overflow records for each block in the data file is another way to address this issue.
• Deletion markers can be used to manage the issues with record deletion.

• Single Level Indexes: Clustering indexes
• When a data file is ordered on a non-key field, which does not have unique values, the file is known as a clustered file. The field used to order the file is known as the clustering field.
• A clustering index accelerates the retrieval of all records whose clustering field (the field used to order the data file) has the same value.
• In a primary index the ordering field consists of distinct values, unlike in a clustering index.

• Single Level Indexes: Secondary indexes
• A secondary index provides an additional means of accessing a data file that already has a primary access path.
• Data file records can be ordered, unordered or hashed. Yet, the index file is ordered.
• A candidate key, which has unique values for every record, or a non-key field, which holds duplicate values, can be used as the indexing field to define the secondary index.
• The first field of the index file has the same data type as the indexing field, which is a non-ordering field of the data file.
• A block pointer or a record pointer is put in the second field.
• For a single file, several secondary indexes (and therefore indexing fields) can be created. Each of these serves as an additional method of accessing that file based on a specific field.
• For a secondary index created on a candidate key (unique key/primary key), which has unique values for every record in the file, the secondary index will have an entry for every record in the data file.
• The reason for having an entry for every record is that the key attribute used to create the secondary index has a distinct value for each record.
• In such scenarios, the secondary index is a dense index which holds a key value and a block pointer for each record in the data file.
• Single Level Indexes: Secondary indexes
• As in the primary index, the two fields of an index entry are referred to as <K(i), P(i)>.
• Since the index entries are ordered by the value of K(i), a binary search can be performed on the index.
• However, block anchors cannot be used, since the records of the data file are not physically ordered by the values of the secondary key field.
• This is why an index entry is created for each record of the data file, instead of one per block anchor as in a primary index.
• Because of its larger number of entries, a secondary index requires more storage space than a primary index.
• On the other hand, a secondary index gives a greater improvement in the search time for an arbitrary record.
• This is because, without the secondary index, we would have to do a linear search of the data file.
• With a primary index, a binary search can still be performed on the main file even if the index is not present.
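A dense secondary index on a key field can be sketched as follows. The sample records, the field names `id` and `name`, and the helper `lookup` are hypothetical, chosen only to illustrate the <K(i), P(i)> entries and the binary search on the index.

```python
from bisect import bisect_left

# Hypothetical unordered data file: record i is stored at position i
records = [
    {"id": 17, "name": "Perera"},
    {"id": 4, "name": "Silva"},
    {"id": 29, "name": "Fernando"},
    {"id": 11, "name": "Dias"},
]

# Dense secondary index on "id": one <K(i), P(i)> entry per record,
# kept sorted by K(i) even though the data file itself is not ordered
index = sorted((rec["id"], pos) for pos, rec in enumerate(records))
keys = [k for k, _ in index]

def lookup(key):
    """Binary-search the index, then follow the pointer into the data file."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[index[i][1]]
    return None

print(lookup(11))
print(lookup(5))
```

Without the index, finding `id = 11` would require scanning the unordered `records` list from the start, which mirrors the linear search on the data file.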
• Single Level Indexes: Secondary indexes
• If we perform a binary search on this secondary index, the required number of block accesses is
ceiling(log2 bi) = ceiling(log2 1,099) = 11 block accesses.
• Since one additional block access is needed to fetch the record from the data file, the total is
11 + 1 = 12 block accesses.
• Compared to the linear search, which required 3,750 block accesses, the secondary index shows a big improvement with 12 block accesses. It is slightly worse than the primary index, which needed only 6 block accesses.
• This difference results from the sizes of the two indexes. The primary index is sparse, so it occupies only 28 blocks.
• The secondary index is dense, so it occupies 1,099 blocks, which is much longer than the primary index.
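The figures quoted above can be reproduced with a short calculation. The parameter values below (300,000 records of 100 bytes, 4,096-byte blocks, a 9-byte key and a 6-byte block pointer) are assumptions: they are not stated on these slides but are consistent with the totals of 7,500 data blocks, 28 primary index blocks and 1,099 secondary index blocks. The function name `index_search_cost` is hypothetical.

```python
import math

def index_search_cost(num_entries, entry_size, block_size):
    """Return (index blocking factor, index blocks, block accesses) for a
    single-level index: binary search on the index plus one data block read."""
    bfri = block_size // entry_size            # index entries per block
    blocks = math.ceil(num_entries / bfri)     # blocks occupied by the index
    accesses = math.ceil(math.log2(blocks)) + 1
    return bfri, blocks, accesses

# Assumed parameters, chosen to match the totals quoted on the slides
r, R, B = 300_000, 100, 4096    # records, record length, block size (bytes)
V, P = 9, 6                     # key field and block pointer sizes (bytes)

bfr = B // R                    # records per data block
b = math.ceil(r / bfr)          # data blocks

primary = index_search_cost(b, V + P, B)     # sparse: one entry per data block
secondary = index_search_cost(r, V + P, B)   # dense: one entry per record
linear_avg = b // 2                          # average linear search cost

print("data blocks:", b)
print("primary index:", primary)
print("secondary index:", secondary)
print("linear search (average):", linear_avg)
```

The only difference between the two index calls is the number of entries, which is what makes the dense secondary index about forty times larger than the sparse primary index.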
• Single Level Indexes: Secondary indexes
• A secondary index provides a logical ordering of the records by the indexing field, so the records can be retrieved in the order of that field.
• In contrast, primary and clustering indexes assume that the physical ordering of the file matches the order of the indexing field.
3.3.2 Multilevel indexes: Overview of multilevel indexes
• Since a single-level index is an ordered file, we can create a primary index on the index file itself.
• The original index file is called the first-level index, and the index created on it is called the second-level index.
• We can repeat the process, creating a third, fourth, ..., top level, until all entries of the top level fit in one disk block.
• A multilevel index can be created on any type of first-level index (primary, secondary or clustering), as long as the first-level index occupies more than one disk block.
• As discussed in topic 3.3, the primary, clustering and secondary indexing schemes all use an ordered index file.
• A binary search is used to locate an entry in the index. Each step halves the part of the index file that remains to be searched, which is why the log function to the base 2 is used: approximately log2 bi block accesses for an index with bi blocks.
• Multilevel indexing speeds up this search by reducing the search space by a larger factor in each step, which is possible when the blocking factor of the index is greater than 2.
• In multilevel indexing, the blocking factor of the index, bfri, is referred to as the fan-out and is denoted by fo.
• In multilevel indexing, the number of block accesses required is approximately logfo bi.
• If the first-level index has r1 entries, the blocking factor for the first level is bfri = fo.
• The number of blocks required for the first level is therefore ceiling(r1 / fo).
• The second-level index has one entry for each first-level block, so its number of entries is r2 = ceiling(r1 / fo).
• Similarly, r3 = ceiling(r2 / fo).
• A second level is needed only if the first level requires more than one block. Likewise, a further level is considered only if the current level requires more than one block.
• If the top level is t, then
t = ceiling(logfo(r1))
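The level-by-level construction can be sketched as a loop. The inputs r1 = 300,000 and fo = 273 are assumptions chosen to match the 1,099-block first level used elsewhere in this chapter, and the function name `multilevel_index` is hypothetical.

```python
import math

def multilevel_index(r1, fo):
    """Return the number of blocks at each index level, first level first."""
    if r1 < 1 or fo < 2:
        raise ValueError("need at least one entry and a fan-out of 2 or more")
    levels = []
    entries = r1
    while True:
        blocks = math.ceil(entries / fo)   # blocks needed at this level
        levels.append(blocks)
        if blocks == 1:                    # top level fits in one block
            break
        entries = blocks                   # next level: one entry per block
    return levels

levels = multilevel_index(300_000, 273)
t = len(levels)                            # number of index levels
accesses = t + 1                           # one block per level + data block

print("blocks per level:", levels)
print("top level t:", t)
print("closed form:", math.ceil(math.log(300_000, 273)))
print("block accesses:", accesses)
```

For these inputs the loop and the closed-form expression for t agree. Searching reads one block at each level and then one data block, compared with 12 block accesses for the binary search on the single-level secondary index.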
[Figure: linear scales for department_id and gpa]
3.5 Other types of Indexes
• A bitmap index is commonly used for queries on multiple keys.
• It is generally used for relations that consist of a large number of rows.
• A bitmap index can be created for every value, or range of values, in one or more columns.
• However, the columns used to create a bitmap index should have a fairly small number of unique values.
• Suppose we create a bitmap index on column C for a particular value V, and the relation has n records.
• The bitmap for V then contains n bits.
• For the record with record number i, the ith bit is set to 1 if that record has the value V in column C; otherwise it is set to 0.
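A minimal sketch of the bitmap construction, and of how two bitmaps combine to answer a query on multiple keys, is given below. The six-record relation, the column names `gender` and `dept`, and the helper `build_bitmaps` are hypothetical.

```python
def build_bitmaps(column):
    """Map each distinct value V to an n-bit list: bit i is 1 if record i holds V."""
    n = len(column)
    bitmaps = {}
    for i, value in enumerate(column):
        bitmaps.setdefault(value, [0] * n)[i] = 1
    return bitmaps

# Hypothetical relation with n = 6 records and two low-cardinality columns
gender = ["M", "F", "F", "M", "F", "M"]
dept = [1, 2, 1, 3, 2, 1]

gender_idx = build_bitmaps(gender)
dept_idx = build_bitmaps(dept)

# Query on multiple keys: gender = 'F' AND dept = 2 is a bitwise AND
match = [a & b for a, b in zip(gender_idx["F"], dept_idx[2])]
rows = [i for i, bit in enumerate(match) if bit]

print(gender_idx["F"])
print(dept_idx[2])
print(rows)
```

Each distinct value costs n bits, which is why the approach suits columns with few unique values.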
3.6 Index Creation and Tuning
3.7 Physical Database Design in Relational Databases
• Analyzing the Database queries and transactions
• Before designing the physical structure, we should have a thorough idea of the intended use of the database and a general knowledge of the queries that will be run.
• The physical design of a database should provide an appropriate structure to store the data and, at the same time, deliver good performance.
• The mix of queries, transactions and applications expected to run on the database is among the factors that the database designer should consider before designing the physical structure.
• Let's discuss each factor in detail.
• For a retrieval query, the following information is important:
i. The relations that will be accessed by the query
ii. The attributes specified in the selection condition
iii. The type of selection condition (equality, inequality, range, etc.)
iv. The attributes that link multiple tables (join conditions)
v. The attributes retrieved by the query
• The attributes in ii and iv are candidates for index creation.
• Analyzing the Database queries and transactions
• For an update operation or update transaction, we should consider:
i. The files subject to update
ii. Whether it is an insert, delete or update operation
iii. The attributes specified in the selection condition of an update or delete
iv. The attributes whose values are changed by the update query
• The attributes in iii are candidates for index creation.
• Analyzing the Expected Frequency of Invocation of Queries and Transactions
• We must consider how frequently we expect each query to be invoked.
• An aggregated list of the expected frequencies of all queries and transactions, along with their attributes, is prepared.
• Analyzing the Time Constraints of Queries and Transactions
• Some queries and transactions have rigid time constraints. For example, in a stock exchange system some queries must be completed within milliseconds.
• Generally, primary access structures provide the most efficient way of locating a record in a file. Hence, the selection attributes of queries with time constraints should be given high priority when creating primary access structures.
• Analyzing the Expected Frequency of the update queries
• Updating the access paths of a file slows down update operations. Therefore, as few access paths as possible should be specified for files that are subject to frequent updates.
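One way to carry out the frequency analysis described above is to tally how often each attribute appears in selection and join conditions, and how often its value is changed by updates. The sketch below is only an illustration: the workload, the frequencies and the attribute names are hypothetical, and the tally is an aid to the designer's judgment, not a rule taken from the slides.

```python
from collections import Counter

# Hypothetical workload entries:
# (expected invocations per day, selection attrs, join attrs, updated attrs)
workload = [
    (500, ["student.gpa"], ["student.department_id"], []),
    (200, ["student.department_id"], [], []),
    (50, ["student.id"], [], ["student.gpa"]),
]

lookup_use = Counter()   # weighted use in selection and join conditions
update_use = Counter()   # weighted number of value changes

for freq, selection, join, updated in workload:
    for attr in selection + join:
        lookup_use[attr] += freq
    for attr in updated:
        update_use[attr] += freq

# Attributes with high lookup use are index candidates; a high update
# count on the same attribute is a reason for caution.
for attr, uses in lookup_use.most_common():
    print(attr, "lookups:", uses, "updates:", update_use[attr])
```

Here `student.department_id` is used most often in conditions, while `student.gpa` is both searched and updated, so an index on it speeds up retrieval at the cost of extra work on each update.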
Activity
1. What are the types of single level ordered indexes?
a. ________
b. ________
c. ________

Activity
Fill in the blanks.
1. A file can have _____ physical ordering field.
2. Primary, Clustering and Secondary index are types of _______ level _____ indexes.
3. ________ search is possible on the index field since it has ________ values.
4. Indexing access structure is established on ______ ____.
5. Index file is usually _____ than the data file.
Activity
Mark whether the given statement is true or false.
1. An unordered file which consists of two fields and limited length records is known as a primary index file. ( t/f )
2. For a given block in an ordered data file, the first record in that block is known as the anchor record. ( t/f )
3. An index that contains index entries for only some records in the data file is referred to as a non-dense index. ( t/f )
4. The ordering key field of the index file and the primary key of the data file have the same data type. ( t/f )
5. In a primary index, the index file contains one index entry (a.k.a. index record) for each record in the data file. ( t/f )

Activity
1. You have a file with 600,000 records (r), which is ordered by its key field. Each record of this file is fixed length and unspanned. The record length (R) is 100 bytes and the block size (B) is 4096 bytes.
a. What is the blocking factor?
b. How many blocks are required to store this file?
c. Calculate the number of block accesses required when performing a binary search on this file to access the data.
Activity
1. You have a file with 400,000 records (r), which is ordered by its key field. Each record of this file is fixed length and unspanned. The record length (R) is 100 bytes and the block size (B) is 4096 bytes.
a. What is the blocking factor?
b. How many blocks are required to store this file?
c. Calculate the number of block accesses required when performing a binary search on this file to access the data.

Activity
1. You have a file with 400,000 records (r), which is ordered by its key field. Each record of this file is fixed length and unspanned. The record length (R) is 100 bytes and the block size (B) is 4096 bytes. Suppose you have created a primary index file with a 9-byte ordering key field (V) and a 6-byte block pointer (P).
a. What is the blocking factor for the index?
b. How many blocks are required for the index file?
c. Calculate the number of block accesses required when performing a binary search on the index file to access the data.
Activity
1. A data file with 400,000 records (r) is ordered by a non-key field called "product_category". The product_category field has 750 distinct values. The record length (R) is 100 bytes and the block size (B) is 4096 bytes. Suppose you have created a clustering index file on this non-key field with a 9-byte ordering field (V) and a 6-byte block pointer (P).
a. What is the blocking factor for the index?
b. How many blocks are required to store the index file?
c. Calculate the number of block accesses required when performing a binary search on the index file to access the data.

Activity
1. Assume we search on a non-ordering key field, V = 9 bytes long, in a file with 600,000 records of a fixed length of 100 bytes. The block size is B = 8,192 bytes.
a. What is the blocking factor?
b. What is the required number of blocks?
c. How many block accesses are required for a linear search?
Activity
1. Assume we create a secondary index on a non-ordering key field, V = 9 bytes long, with block pointers of P = 6 bytes, in a file with 600,000 records of a fixed length of 100 bytes. The block size is B = 8,192 bytes.
a. What is the blocking factor for the index?
b. How many blocks are required to store the index file?
c. Calculate the number of block accesses required when performing a binary search on the index file to access the data.

Activity
1. Assume we have a file with multi-level indexing. In the first level, the number of blocks is b1 = 1,099 and the blocking factor is bfri = 273.
a. Calculate the number of blocks required for the second level.
b. Calculate the number of blocks required for the third level.
c. What is the top level of the index (t)?
d. How many block accesses are required to access a record using this multi-level index?