
Chapter 03 - Database Indexing and Tuning - converted

The lesson on Database Indexing and Tuning covers the evolution of computer memory hierarchy and indexing methods. It aims to teach about different types of computer memory, the importance of database indexing, and various indexing techniques. Key topics include disk storage, file structures, and the organization of databases.

Uploaded by

Kavini Amandi
3 : Database Indexing and Tuning

IT3306 – Data Management
Level II - Semester 3

© e-Learning Centre, UCSC

Overview

• This lesson on Database Indexing and Tuning discusses the gradual development of the computer memory hierarchy and, with it, the evolution of indexing methods.
• Here we look into the types of computer memory and the types of indexes in detail.

Intended Learning Outcomes

• At the end of this lesson, you will be able to:
  • Describe how the different computer memories evolved.
  • Identify the different types of computer memory and the storage organizations of databases.
  • Recognize the importance of implementing indexes on databases.
  • Explain the key concepts of the different types of database indexes.

List of subtopics

3.1 Disk Storage and Basic File Structures
    3.1.1 Computer memory hierarchy
    3.1.2 Storage organization of databases
    3.1.3 Secondary storage mediums
    3.1.4 Solid State Device Storage
    3.1.5 Placing file records on disk (types of records)
    3.1.6 File Operations
    3.1.7 Files of unordered records (Heap Files) and ordered records (Sorted Files)
    3.1.8 Hashing techniques for storing database records: Internal hashing, external hashing

List of subtopics

3.2 Introduction to indexing
    Introduce index files, indexing fields, index entry (record pointers and block pointers)
3.3 Types of Indexes
    3.3.1 Single Level Indexes: Primary, Clustering and Secondary indexes
    3.3.2 Multilevel indexes: Overview of multilevel indexes
3.4 Indexes on Multiple Keys
3.5 Other types of Indexes
    Hash indexes, bitmap indexes, function based indexes
3.6 Index Creation and Tuning
3.7 Physical Database Design in Relational Databases

3.1 Disk Storage and Basic File Structures

3.1.1. Computer Memory Hierarchy

[Figure: Computer Memory is divided into storage that is directly accessible to the CPU (Primary Storage) and storage that is not directly accessible to the CPU (Secondary Storage and Tertiary Storage).]

• The data collected via a computational database should be stored in a physical storage medium.
• Once the data is stored in a storage medium, the database management software can execute functions on it to retrieve, update and process the data.
• In current computer systems, data is stored and moved across a hierarchy of storage media.
• In this hierarchy, the memory with the highest speed is the most expensive option and also has the lowest capacity.
• The lowest-speed memory options are the ones with the highest available storage capacity.

Now let's explain the hierarchy given in the previous slides.

1. Primary Storage

This is storage that the computer's Central Processing Unit (CPU) can operate on directly.
Eg: Main Memory, Cache Memory.
• Provides fast access to data.
• Limited storage capacity.
• Contents of primary storage are lost when the computer shuts down or in case of a power failure.
• Comparatively more expensive.


Primary Storage - Static RAM

• Static Random Access Memory (SRAM) is memory that retains its contents as long as power is provided.
• Cache memory in the CPU is identified as Static RAM.
• Data is kept as bits in its memory.
• The most expensive type of memory.
• Using techniques like prefetching and pipelining, the cache memory speeds up the execution of program instructions for the CPU.

Primary Storage - Dynamic RAM

• Dynamic Random Access Memory (DRAM) is the CPU's space for storing application instructions and data.
• The main memory of the computer is identified as DRAM.
• The advantage of DRAM is its low cost.
• Compared with Static RAM, it is slower.

2. Secondary Storage

Operates external to the computer's main memory.
Eg: Magnetic Disks, Flash Drives, CD-ROM
• The CPU cannot process data in secondary storage directly. It must first be copied into primary storage before the CPU can handle it.
• Mostly used for online storage of enterprise databases.
• For enterprise databases, magnetic disks have been used as the main storage medium.
• Recently there has been a trend to use flash memory for storing moderate amounts of permanent data.
• A Solid State Drive (SSD) is a form of memory that can be used instead of a disk drive.
• Less expensive than primary storage.
• The storage capacity is measured in:
  - Kilobytes (kB)
  - Megabytes (MB)
  - Gigabytes (GB)
  - Terabytes (TB)
  - Petabytes (PB)

3. Tertiary Storage

Operates external to the computer's main memory.
Eg: CD-ROMs, DVDs
• The CPU cannot process data in tertiary storage directly. It must first be copied into primary storage before the CPU can handle it.
• Removable media that can be used as offline storage falls in this category.
• Large capacity to store data.
• Comparatively lower cost.
• Slower access to data than primary storage media.

4. Flash Memory

• A popular type of memory because of its non-volatility.
• Uses the technique of EEPROM (Electrically Erasable Programmable Read-Only Memory).
• High performance memory.
• Fast access.
• One disadvantage is that an entire block must be erased and written at the same time.
• Two types:
  - NAND Flash Memory
  - NOR Flash Memory
• Common examples:
  - Devices in Cameras, MP3/MP4 Players, Cellphones, USB Flash Drives

5. Optical Drives

• The most popular types of optical media are CDs and DVDs.
• The capacity of a CD is 700 MB, and DVDs have capacities ranging from 4.5 to 15 GB.
• A CD-ROM is read using laser technology. Its data cannot be overwritten.
• CD-R (compact disk recordable) and DVD-R: allow data to be written once and then read as many times as required.
• Currently the use of this type of storage is declining due to the popularity of magnetic disks.

6. Magnetic Tapes

• Used for archiving and as a backup storage of data.
• Note that Magnetic Disks (400 GB–8 TB) and Magnetic Tapes (2.5 TB–8.5 TB) are two different storage types.
Activity

Categorize the following devices as Primary, Secondary or Tertiary Storage Media.
1. Random Access Memory
2. Hard Disk Drive
3. Flash Drive
4. Tape Libraries
5. Optical Jukebox
6. Magnetic Tape
7. Main Memory

3.1.2. Storage Organization of Databases

• Databases usually hold Persistent data. This means large volumes of data stored over long periods of time.
• This persistent data is continuously retrieved and processed during the storage period.
• The place where databases are stored permanently in the computer memory is the secondary storage.
• Magnetic disks are widely used here since:
  - If the database is too large, it will not fit in the main memory.
  - Secondary storage is non-volatile, but the main memory is volatile.
  - The cost of storage per unit of data is lower in secondary storage.

• The Solid State Drive (SSD) is one of the latest technologies identified as an alternative to magnetic storage disks.
• However, it is expected that magnetic disks will continue to be the primary option for the storage of large databases.
• Magnetic tapes are also used for database backup purposes due to their comparatively lower cost.
• But the data on them needs to be loaded and read before processing. In contrast, magnetic disks can be accessed directly at any time.

• Physical Database Design is a process that entails selecting the techniques that best suit the application requirements from a variety of data organizing approaches.
• When designing, implementing, and operating a database on a certain DBMS, database designers and DBAs must be aware of the benefits and drawbacks of each storage medium.

• The data on disk is organized as Files of Records.
• These records include data about entities, attributes and relationships.
• Whenever a certain portion of the data is retrieved from the DB for processing, it needs to be found on disk, copied to main memory for processing, and then rewritten to the disk if the data gets updated.
• Therefore, the data should be kept on disk in a way that allows it to be quickly accessed when it is needed.

• Primary File Organization defines how the data is stored physically on the disk and how it can be accessed.

File Organization   Description
Heap File           No particular order in storing data. Appends new records to the end.
Sorted File         Maintains an order for the records by sorting data on a particular field.
Hashed File         Uses a hash function on a field to identify the record's place in the database.
B Trees             Use tree structures for record storing.

Activity

State whether the following statements are true or false.

1. The place of permanently storing databases is the primary storage.
2. A Heap File has a specific ordering criterion where the new records are added at the end.
3. Upon retrieval of data from a file, it needs to be found on disk and copied to main memory for processing.
4. The database administrators need to be aware of the physical structuring of the database to identify whether they can be sold to a client.
5. Solid State Drives are identified as alternatives for magnetic disks.

3.1.3. Secondary Storage Media

• The device that holds the magnetic disks is the Hard Disk Drive (HDD).
• The basic unit of data on an HDD is the Bit. Bits together make Bytes. One character is stored using a single byte.
• The Capacity of a disk is the number of bytes the disk can store.
• Disks are composed of magnetic material in the shape of a thin round disk, with a plastic or acrylic cover to protect it.
• A Single Sided Disk stores information on one of its surfaces.
• A Double Sided Disk stores information on both of its surfaces.
• A few disks assembled together make a Disk Pack, which has higher storage capacity.
• On a disk surface, information is stored in concentric circles of small width, each with its own diameter.
• Each of these circles is called a Track.
• A Cylinder is a group of tracks on different surfaces of a disk pack that have the same diameter.
• Retrieval of data stored on the same Cylinder is faster compared to data stored in different Cylinders.
• A track is broken into smaller Blocks or Sectors since it typically includes a vast amount of data.

[Figure: Hardware components on disk: (a) a single-sided disk with read/write hardware; (b) a disk pack with read/write hardware.]

[Figure: Different sector organizations on disk: (a) sectors subtending a fixed angle; (b) sectors maintaining a uniform recording density.]

• During disk formatting, the operating system divides a track into equal-sized Disk Blocks (or pages). The size of each block is fixed and cannot be adjusted dynamically.
• Blocks are separated by Interblock Gaps, which are fixed in size and contain specifically coded control information recorded during disk formatting.
• The Hardware Address of a Block is the combination of a cylinder number, track number (surface number inside the cylinder on which the track is placed), and block number (within the track).
• A Buffer is a reserved region in primary storage that holds one disk block.
• Read Command - The disk block is copied into the buffer.
• Write Command - The contents of the buffer are copied into the disk block.

• A collection of several contiguous blocks is called a Cluster.
• The hardware mechanism that reads or writes a block of data is the Read / Write Head.
• In a read/write head, an electronic component is coupled to a mechanical arm.
• Fixed Head Disks - Disk units in which the read/write heads are fixed, with as many heads as there are tracks.
• Movable Head Disks - Disk units with an actuator connected to a second electrical motor that moves the read/write heads together and accurately positions them over the cylinder of tracks defined in a block address.

Activity

Match the description with the relevant technical term out of the following.
[Capacity, Track, Buffer, Hardware Address of a Block, Cluster]

1. The concentric circles on a disk where information is stored.
2. Combination of a cylinder number, track number, and block number.
3. Number of bytes that a disk can store.
4. Collection of contiguous blocks.
5. A disk block stored in a reserved location in primary storage.

3.1.4. Solid State Device Storage

• Solid State Device (SSD) Storage is sometimes known as Flash Storage.
• SSDs have the ability to store data on secondary storage without requiring constant power.
• A controller and a group of interconnected flash memory cards are the essential components of an SSD.
• SSDs can be plugged into slots already available for mounting Hard Disk Drives (HDDs) on laptops and servers by using form factors compatible with HDDs.
• Because there are no moving parts, SSDs are more durable, run silently, have faster access times, and deliver better transfer rates than HDDs.

• As opposed to an HDD, where Blocks and Cylinders should be pre-assigned for storing data, any address on an SSD can be directly addressed, since there are no restrictions on where data can be stored.
• With this direct access, data is less likely to be fragmented, and there is no need for restructuring.
• Dynamic Random Access Memory (DRAM)-based SSDs are also available in addition to flash memory.
• DRAM-based SSDs are more expensive than flash memory, but they provide faster access. However, they need an internal power supply to operate.
Activity

State four key features of a Solid State Drive (SSD).
1.____________________
2.____________________
3.____________________
4.____________________

3.1.5. Placing File Records on Disk

An explanation can be found on the next slide.

Employee Relation

Emp_No   Name      Date_Of_Birth   Position     Salary
0001     Nimal     1971-04-13      Manager      70,000
0005     Krishna   1980-01-25      Supervisor   50,000

[Figure labels: each column is a Field, each row is a Record, and each cell holds a Value.]
• As shown in the previous slide, the columns of the table are called fields, the rows are called records, and each cell data item is called a value.
• The Data Type of a field is usually one of the standard data types used in programming:
  - Numeric (Integer, Long Integer, Floating Point)
  - Characters / Strings (Fixed length, varying length)
  - Boolean (True or False and 0 or 1)
  - Date, Time data types.
• For a particular computer system, the number of bytes necessary for each data type is fixed.

An example of creating the Employee relation using MySQL:

    CREATE TABLE Employee
    (
        Emp_No        INT,
        Name          CHAR(50),
        Date_Of_Birth DATE,
        Position      CHAR(50),
        Salary        INT
    );

Activity

Select the Data Type that best matches the description out of the following.
[Integer, Floating Point, Date and Time, Boolean, Character]

1. NIC Number of Sri Lankans
2. The access time of users for the Ministry of Health website within a week
3. The number of students in a class
4. Cash balance of a bank account
5. Response to the question by a set of students whether they have received the vaccination for Rubella.

3.1.5. Placing File Records on Disk

• A File is a sequence of records. Usually all records in a file belong to the same record type.
• If the size of each record in the file is the same (in bytes), the file is said to be made up of Fixed Length Records.
• Variable Length Records means that different records of the file are of different sizes.
• Reasons to have variable length records in a file:
  - One or more fields are of different sizes.
  - For individual records, one or more of the fields may have multiple values (Repeating Group / Field).
  - One or more fields are optional (Optional Fields).
  - The file includes different record types (Mixed File).

• In a Fixed Length Record:
  • The system can identify the starting byte location of each field relative to the starting position of the record, since each record has the same fields and field lengths. This makes it easier for programs that access such files to locate field values.
  • However, variable length records can also be stored as fixed length records:
    • By assigning "Null" for optional fields where data values are not available.
    • By assigning space for the maximum possible number of values for each repeating group.
  • In each of these cases, space is wasted.

• In a Variable Length Record:
  • Each field in each record contains a value, but the precise length of some field values is not known in advance.
  • To determine where a variable-length field ends, special separator characters can be used to terminate it.
  • Alternatively, the number of bytes in the field value can be stored in the record together with the field.
  • Separators that can be used are: ?, $, % (characters that do not appear in any field value).
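The two record layouts described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular DBMS stores records: the field widths follow the Employee table's CHAR(50) columns, the 4-byte integers and the helper names (`pack_fixed`, `pack_variable`, and so on) are hypothetical, and `$` is assumed never to occur inside a field value.

```python
import struct

# Hypothetical fixed-length layout: Emp_No (4-byte int), Name (50 bytes),
# Position (50 bytes), Salary (4-byte int). "<" disables alignment padding.
RECORD_FORMAT = "<i50s50si"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # same size for every record


def pack_fixed(emp_no, name, position, salary):
    """Encode one record; struct pads short strings with null bytes."""
    return struct.pack(RECORD_FORMAT, emp_no, name.encode(),
                       position.encode(), salary)


def unpack_fixed(data):
    """Decode one fixed-length record and strip the null padding."""
    emp_no, name, position, salary = struct.unpack(RECORD_FORMAT, data)
    return (emp_no, name.rstrip(b"\x00").decode(),
            position.rstrip(b"\x00").decode(), salary)


SEPARATOR = "$"  # assumption: never appears inside a field value


def pack_variable(fields):
    """Terminate every variable-length field with the separator character."""
    return "".join(field + SEPARATOR for field in fields).encode()


def unpack_variable(data):
    """Split on the separator; the last piece is the empty tail."""
    return data.decode().split(SEPARATOR)[:-1]
```

With the fixed layout, every field starts at a known byte offset, at the cost of padding. With separators, each record takes only as many bytes as its values need, but a field can only be located by scanning the record.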
Record Storage Format 1

Eg: A fixed-length record with four fields and size of 44 bytes.

Record Storage Format 2

Eg: A record with two variable-length fields (Name and Department) and two fixed-length fields (NIC and Job_Code). A Separator Character is used to mark the record separation.

Record Storage Format 3

Eg: A variable-field record with three types of separator characters.

• In a record with Optional Fields:
  • A series of <Field-Name, Field-Value> pairs can be stored in each record instead of just the field values, if the overall number of fields for the record type is high but the number of fields that actually occur in a typical record is low.
  • It is more practical to assign a short Field Type code to each field and include in each record a series of <Field-Type, Field-Value> pairs.
• In a record with a Repeating Field:
  • One separator character can be used to separate the field's repeated values and another separator character can be used to mark the field's end.
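The optional-field and repeating-field techniques above can be combined in one small sketch. Everything here is an illustrative assumption rather than a real storage format: the one-character field type codes, the field names, and the use of `?`, `%` and `$` as the three separators, none of which may occur inside a value.

```python
TYPE_SEP = "?"   # separates a field type code from its value(s)
VALUE_SEP = "%"  # separates the repeated values of one field
FIELD_SEP = "$"  # marks the end of a field

# Hypothetical one-character field type codes.
FIELD_CODES = {"Name": "N", "Phone": "P", "Department": "D"}
CODE_TO_FIELD = {code: name for name, code in FIELD_CODES.items()}


def encode_record(record):
    """Store only the fields that are present, as <Field-Type, Field-Value> pairs."""
    parts = []
    for name, value in record.items():
        if value is None:  # optional field with no value: takes no space
            continue
        values = value if isinstance(value, list) else [value]
        parts.append(FIELD_CODES[name] + TYPE_SEP
                     + VALUE_SEP.join(values) + FIELD_SEP)
    return "".join(parts)


def decode_record(text):
    """Rebuild the record; a field with several values comes back as a list."""
    record = {}
    for pair in text.split(FIELD_SEP):
        if not pair:  # empty tail after the final field separator
            continue
        code, raw = pair.split(TYPE_SEP, 1)
        values = raw.split(VALUE_SEP)
        record[CODE_TO_FIELD[code]] = values if len(values) > 1 else values[0]
    return record
```

A record such as `{"Name": "Nimal", "Phone": ["0711", "0112"], "Department": None}` is stored as `N?Nimal$P?0711%0112$`: the missing department costs nothing, and the two phone numbers share one field.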

Activity

Fill in the blanks in the following statements.

1. A file in which the records are of different sizes is called a _______________.
2. A _________________ includes different types of records inside it.
3. In a file, the records belong to _________ record type.
4. A ___________ length record can be made by assigning "Null" for optional fields where data values are not available.
5. To determine and terminate variable lengths, special characters named as __________ can be used.

3.1.5. Placing File Records on Disk

• A Block is the unit of data transfer between disk and memory.
• When the block size exceeds the record size, each block will contain several records. However, certain files may have exceptionally large records that cannot fit in a single block.
• The Blocking Factor (bfr) is the number of records per block.
• If Block Size > Record Size, bfr can be calculated using the equation below.

    bfr = floor(B / R)

    Block Size = B bytes
    Record Size = R bytes

• In calculating the bfr, the floor function rounds the number down to the nearest integer.
• Because of this rounding, there may be some space left over in each block.
• The unused space can be calculated with the equation given below.

    Unused Space in bytes = B - (bfr * R)

    where B is the block size and bfr * R is the space occupied by the records in the block.

• To make use of the unused space and minimize waste, a part of a record can be stored in one block and the other part can be stored in another block.
• If the next block on disk is not the one holding the remainder of the record, a Pointer at the end of the first block refers to it.
• Spanned Organization of Records - One record may span more than one block.
  • Used when a record is larger than the block size.
• Unspanned Organization of Records - Records are not allowed to span more than one block.
  • Used with fixed length records.
Let's look at the representation of Spanned and Unspanned Organization of Records.

[Figure: Spanned and unspanned organization of records]

• A spanned or unspanned organization can be utilized with variable-length records.
• With variable-length records in a spanned organization, each block may store a different number of records.
• Here the bfr would be the average number of records per block.
• Hence the number of blocks b needed for a file of r records is:

    b = ceiling(r / bfr)

Example of Calculation

There is a disk with block size B = 256 bytes. A file has r = 50,000 STUDENT records of fixed length. Each record has the following fields:
NAME (55 bytes), STDID (4 bytes), DEGREE (2 bytes), PHONE (10 bytes), SEX (1 byte).

(i) Calculate the record size in bytes.

    Record Size R = (55 + 4 + 2 + 10 + 1) = 72 bytes

(ii) Calculate the blocking factor (bfr).

    Blocking factor bfr = floor(B / R)
                        = floor(256 / 72)
                        = 3 records per block

    Floor Function = Rounds the value down to the nearest integer.

Example of Calculation Continued...

(iii) Calculate the number of file blocks (b) required to store the STUDENT records, assuming an unspanned organization.

    Number of blocks needed for file = ceiling(r / bfr)
                                     = ceiling(50000 / 3)
                                     = 16667

    Ceiling Function = Rounds the value up to the next integer.

• A File Header, also known as a File Descriptor, includes information about a file that is required by the system applications which access the file records.
• The header contains information to determine the disk addresses of the file blocks, as well as record format descriptions. These may include field lengths and the order of fields within a record for fixed-length unspanned records, and field type codes, separator characters, and record type codes for variable-length records.
• To search for a record on disk, one or more blocks are transferred into main memory buffers.
• The search algorithms must do a Linear Search over the file blocks if the address of the block containing the requested record is unknown.
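The STUDENT calculation worked through in the preceding slides (record size, blocking factor, unused space, and number of blocks) can be reproduced with a short script. This is only a sketch of the same arithmetic; the function names are hypothetical, and an unspanned organization of fixed-length records is assumed throughout.

```python
import math


def record_size(field_sizes):
    """R: total bytes of one fixed-length record."""
    return sum(field_sizes.values())


def blocking_factor(block_size, rec_size):
    """bfr = floor(B / R): whole records that fit in one block."""
    return block_size // rec_size


def unused_space(block_size, rec_size):
    """B - (bfr * R): bytes left over in each block."""
    return block_size - blocking_factor(block_size, rec_size) * rec_size


def blocks_needed(num_records, bfr):
    """b = ceiling(r / bfr) for an unspanned organization."""
    return math.ceil(num_records / bfr)


# Values taken from the STUDENT example.
student_fields = {"NAME": 55, "STDID": 4, "DEGREE": 2, "PHONE": 10, "SEX": 1}
B = 256
r = 50_000

R = record_size(student_fields)   # 72 bytes
bfr = blocking_factor(B, R)       # 3 records per block
wasted = unused_space(B, R)       # 40 bytes unused in each block
b = blocks_needed(r, bfr)         # 16667 blocks
print(R, bfr, wasted, b)
```

The 40 unused bytes per block are the space that a spanned organization would try to reclaim.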

Activity

State the answer for the following calculations.

Consider a disk with block size B = 512 bytes. A file has r = 30,000 EMPLOYEE records of fixed length. Each record has the following fields: NAME (30 bytes), NIC (9 bytes), DEPARTMENTCODE (9 bytes), ADDRESS (40 bytes), PHONE (9 bytes), BIRTHDATE (8 bytes), SEX (1 byte), JOBCODE (4 bytes), SALARY (4 bytes, real number). An additional byte is used as a deletion marker.

(i) Calculate the record size in bytes.
(ii) Calculate the blocking factor (bfr).
(iii) Calculate the number of file blocks (b) required to store the EMPLOYEE records, assuming an unspanned organization.

3.1.6. File Operations

Operations on Files

• Retrieval Operations - Do not change any data in the file, but locate certain records based on a selection / filtering condition.
• Update Operations - Change the file by insertion, deletion or modification of certain records based on a selection / filtering condition.
Emp_No   Name      Date_Of_Birth   Position     Salary
0001     Nimal     1971-04-13      Manager      70,000
0005     Krishna   1980-01-25      Supervisor   50,000

• Simple Selection Condition
  Search for the record where Emp_No = "0005"
• Complex Selection Condition
  Search for the record where Salary > 60,000

• When several file records meet a search criterion, the first record in the physical sequence of file records is identified and assigned as the Current Record. Subsequent search operations start from this record and find the next record in the file that meets the criterion.
• The actual procedures for identifying and retrieving file records differ from one system to the next.

The following are the File Access Operations.

Operation       Description
Open            Allows reading from or writing to a file. Sets the file pointer to the file's beginning.
Reset           Sets the file pointer of an open file to the beginning of the file.
Find (Locate)   The first record that meets a search criterion is found. The block holding that record is transferred to a main memory buffer. The file pointer is set to the buffer record, which becomes the current record.
Read (Get)      Copies the current record from the buffer to a user-defined program variable. The current record pointer may also be advanced to the next record in the file using this command.
FindNext        Searches the file for the next entry that meets the search criteria. The block holding that record is transferred to a main memory buffer.
Delete          The current record is deleted, and the file on disk is updated to reflect the deletion.

Operation       Description
Modify          Modifies some field values of the current record, and the file on disk is updated to reflect the modification.
Insert          Inserts a new record in the file by locating the block where the record is to be inserted and transferring that block into a main memory buffer. The file on disk is then updated to reflect the insertion.
Close           Releases the buffers and does any other necessary cleaning actions to complete the file access.

• The following is called a "Record at a time" operation since it is applied to a single record.

Operation       Description
Scan            Returns the initial record if the file has just been opened or reset; otherwise, it returns the next record.

• The following are called "Set at a time" operations since they are applied to the file in full.

Operation       Description
FindAll         Locates all the records in the file that satisfy a search condition.
FindOrdered     Retrieves all the records in the file in a specified order.
Reorganize      Starts the reorganization process (in cases such as ordering the records).

• File Organization - The way a file's data is organized into records, blocks, and access structures, including how records and blocks are placed on the storage media and interconnected.
• Access Methods - An access method provides a set of operations that may be applied to a file. In general, a file structured using a specific organization can be accessed via a variety of access methods.
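To make the difference between record-at-a-time and set-at-a-time operations concrete, here is a toy sketch. It is an assumption-laden illustration, not a real file system or DBMS interface: the `RecordFile` class and its method names are hypothetical, the "file" is an in-memory list with no blocks or buffers, and only Open, Reset, Scan, Find, FindNext and FindAll are modelled.

```python
class RecordFile:
    """Toy in-memory stand-in for a file of records (no real disk I/O)."""

    def __init__(self, records):
        self._records = list(records)
        self._pos = None        # index of the current record, None = before first
        self._condition = None  # search condition remembered by find()

    def open(self):
        """Prepare the file for access and point at its beginning."""
        self.reset()

    def reset(self):
        """Set the file pointer back to the beginning of the file."""
        self._pos = None

    def scan(self):
        """Return the first record after open/reset, otherwise the next one."""
        nxt = 0 if self._pos is None else self._pos + 1
        if nxt >= len(self._records):
            return None
        self._pos = nxt
        return self._records[nxt]

    def find(self, condition):
        """Locate the first record meeting the condition; it becomes current."""
        self._condition = condition
        self.reset()
        return self.find_next()

    def find_next(self):
        """Continue from the current record to the next matching record."""
        start = 0 if self._pos is None else self._pos + 1
        for i in range(start, len(self._records)):
            if self._condition(self._records[i]):
                self._pos = i
                return self._records[i]
        return None

    def find_all(self, condition):
        """Set-at-a-time: return every record that satisfies the condition."""
        return [rec for rec in self._records if condition(rec)]


employees = [
    {"Emp_No": "0001", "Name": "Nimal", "Salary": 70000},
    {"Emp_No": "0005", "Name": "Krishna", "Salary": 50000},
]
f = RecordFile(employees)
f.open()
first = f.find(lambda rec: rec["Emp_No"] == "0005")      # the Krishna record
high_paid = f.find_all(lambda rec: rec["Salary"] > 60000)  # only the Nimal record
```

Find and FindNext depend on the current record, so they are naturally called in a loop, whereas FindAll hands back the whole result set in one call.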
3.1 Disk Storage and Basic File Structures

3.1.6. File Operations

• Static Files - Files on which modifications are rarely made.
• Dynamic Files - Files on which modifications are made frequently.
• Read Only File - A file that cannot be modified by the end user.

Activity

Match the following descriptions with the relevant file operation from the list below.

[Find, Reset, Close, Scan, FindAll]

1. Returns the initial record if the file has just been opened or reset; otherwise, returns the next record.
2. Releases the buffers and does any other necessary cleaning actions.
3. Sets the file pointer of an open file to the beginning of the file.
4. The first record that meets a search criterion is found.
5. Locates all the records in the file that satisfy a search condition.

3.1 Disk Storage and Basic File Structures

3.1.7. Files of unordered records (Heap Files) and ordered records (Sorted Files)

Files of Unordered Records (Heap Files)

• Records are placed in the file in the order in which they are inserted, so new records go at the end of the file.
• Inserting a new record is quick and efficient. The file's last disk block is copied into a buffer, the new record is added, and the block is rewritten back to disk. The address of the last file block is kept in the file header.
• Searching for a record requires a Linear Search.
• If just one record meets the search criteria, the program will, on average, read into memory and search half of the file blocks before finding the record. For a file of b blocks, this means searching (b/2) blocks on average.
• If no record or several records satisfy the search criteria, the program must read and search all b blocks in the file.
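The heap file behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name HeapFile, the blocking factor of 2, and the "id" field are all made up for the example, and blocks are modelled as in-memory lists rather than disk blocks.

```python
class HeapFile:
    """Toy heap file: a list of blocks, each block a list of records."""

    def __init__(self, records_per_block):
        self.records_per_block = records_per_block  # assumed fixed blocking factor
        self.blocks = [[]]

    def insert(self, record):
        # New records always go to the last block; open a new block when it is full.
        if len(self.blocks[-1]) == self.records_per_block:
            self.blocks.append([])
        self.blocks[-1].append(record)

    def linear_search(self, key):
        """Return (record, blocks_read); record is None when nothing matches."""
        for blocks_read, block in enumerate(self.blocks, start=1):
            for record in block:
                if record["id"] == key:
                    return record, blocks_read
        return None, len(self.blocks)


heap = HeapFile(records_per_block=2)
for student_id in [42, 7, 19, 3, 25, 11]:
    heap.insert({"id": student_id})

found, cost_hit = heap.linear_search(19)     # 19 sits in the second block
missing, cost_miss = heap.linear_search(99)  # absent key: every block is read
```

Here the successful search reads 2 of the 3 blocks, while the unsuccessful search has to read all 3.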

3.1 Disk Storage and Basic File Structures

3.1.7. Files of unordered records (Heap Files) and ordered records (Sorted Files)

Files of Unordered Records (Heap Files)

• Deleting a Record.
  • To delete a record, a program must first locate its block, copy the block into a buffer, remove the record from the buffer, and then rewrite the block back to the disk.
  • Deleting a large number of records in this way leaves unused space in the blocks and wastes storage.
  • The space of deleted records can be reused to store new records, but this involves additional work.
  • Deletion Marker - An extra byte or bit stored with every record. When the record is deleted, the marker is set to a certain value, which differs from the value it holds while the record contains valid data.

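A deletion marker can be pictured with the small Python sketch below. It is a simplified illustration, not a prescribed implementation: the "deleted" flag, the function names, and the policy of reusing the first marked slot on insertion are assumptions made for the example.

```python
# One block of records; "deleted" plays the role of the deletion marker.
block = [
    {"id": 1, "deleted": False},
    {"id": 2, "deleted": False},
    {"id": 3, "deleted": False},
]

def delete_record(block, key):
    """Set the deletion marker instead of physically removing the record."""
    for record in block:
        if record["id"] == key and not record["deleted"]:
            record["deleted"] = True
            return True
    return False

def find(block, key):
    """Only records whose marker is not set are considered during a search."""
    for record in block:
        if record["id"] == key and not record["deleted"]:
            return record
    return None

def insert_reusing_space(block, new_id):
    """Reuse the slot of a deleted record if one exists; otherwise append."""
    for record in block:
        if record["deleted"]:
            record["id"], record["deleted"] = new_id, False
            return
    block.append({"id": new_id, "deleted": False})

delete_record(block, 2)
gone = find(block, 2)            # None: the record is marked as deleted
insert_reusing_space(block, 9)   # takes over the slot that held record 2
```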

3.1 Disk Storage and Basic File Structures

3.1.7. Files of unordered records (Heap Files) and ordered records (Sorted Files)

Files of Unordered Records (Heap Files)

• Modifying a Record.
  • Because the updated record may not fit in its former space on disk, modifying a variable-length record may require deleting the old record and inserting the modified record.
• Reading Records in Order.
  • To read all records in order of the values of some field, a sorted copy of the file is created. Because sorting a large disk file is costly, special external sorting techniques are used.

3.1 Disk Storage and Basic File Structures

3.1.7. Files of unordered records (Heap Files) and ordered records (Sorted Files)

Files of Ordered Records (Sorted Files)

• The records of a file can be physically ordered on disk by the values of one of their fields, called the Ordering Field. The result is an ordered or sequential file.
• Ordered records offer a few benefits over unordered files.
• Benefits of Ordered records:
  • Because no sorting is necessary, reading the records in order of the ordering key values becomes highly efficient.
  • Because the next record is usually in the same block as the current one, locating it in order of the ordering key typically does not need any extra block accesses.
  • When the binary search approach is employed, a search criterion based on the value of an ordering key field results in quicker access.

3.1 Disk Storage and Basic File Structures

3.1.8. Hashing techniques for storing database records: Internal hashing, external hashing

• Another kind of primary file organization is based on Hashing, which allows extremely quick access to records under specific search conditions.
• The search condition must be an equality condition on a single field, termed the Hash Field.
• The hash field is usually also a key field of the file, in which case it is referred to as the hash key.
• The concept behind hashing is to provide a function h, also known as a Hash Function or randomizing function, that is applied to a record's hash field value and returns the address of the disk block where the record is stored.

Activity

Fill in the blanks with the correct technical term.

1. The _____________________ is an extra byte that is stored with a record and is updated when the record is deleted.
2. A field which can generate an ordered or sequential file by physically ordering the records is called ______________.
3. The function which calculates the Hash value of a field is called ______________.
4. Searching for a record in a Heap file is done by the ____________.

3.1 Disk Storage and Basic File Structures

3.1.8. Hashing techniques for storing database records: Internal hashing, external hashing

• Internal Hashing.
  For internal files, hashing is usually implemented as a Hash Table using an array of records.
• Method 1 for Internal Hashing
  • If the array index range is 0 to m - 1, there are m slots whose addresses correspond to the array indexes.
  • A hash function is then selected that converts the value of the hash field into an integer between 0 and m - 1.
  • The record address is then calculated using the chosen function:
    h(K) = K mod m
    where K is the hash field value and h(K) is the hash function applied to K.

[Figure: Internal Hashing Data Structure - Array of m positions to use in internal hashing]
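As a minimal sketch of Method 1, the Python snippet below stores integer keys in an array of m slots using h(K) = K mod m. The table size of 7 and the sample keys are assumptions chosen so that no two keys collide.

```python
M = 7                  # number of slots, indexes 0 to M - 1 (assumed for illustration)
table = [None] * M     # the array of M positions

def h(key):
    """Hash function h(K) = K mod M for integer hash field values."""
    return key % M

for key in (15, 23, 40):       # 15 mod 7 = 1, 23 mod 7 = 2, 40 mod 7 = 5
    table[h(key)] = key
```

After the loop, slots 1, 2 and 5 hold the three keys and the other four slots are still empty.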

3.1 Disk Storage and Basic File Structures

3.1.8. Hashing techniques for storing database records: Internal hashing, external hashing

• Internal Hashing.
• Method 2 for Internal Hashing
  • By using an algorithm that calculates the hash function. The following hashing algorithm applies the mod hash function to a character string K:

    temp ← 1;
    for i ← 1 to 20 do temp ← temp * code(K[i]) mod M;
    hash_address ← temp mod M;

• Method 3 for Internal Hashing
  • Folding - To compute the hash address, an arithmetic function such as addition or a logical function such as Exclusive OR (XOR) is applied to distinct sections of the hash field value.
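Methods 2 and 3 can be tried out with the Python sketch below. Several details are assumptions rather than part of the slides: ord() stands in for code(), the string K is padded with spaces to 20 characters, and the folding functions split the key into 2-digit or 8-bit sections.

```python
def string_hash(key, m, length=20):
    """Method 2: multiply the character codes of a fixed-length string, mod m."""
    padded = key.ljust(length)[:length]   # assumption: K is space padded to `length`
    temp = 1
    for ch in padded:
        temp = (temp * ord(ch)) % m       # ord() plays the role of code(K[i])
    return temp % m

def fold_add(key, m, chunk=2):
    """Method 3 (folding by addition): add the digit sections of the key, mod m."""
    digits = str(key)
    parts = [int(digits[i:i + chunk]) for i in range(0, len(digits), chunk)]
    return sum(parts) % m

def fold_xor(key, m, chunk_bits=8):
    """Method 3 (folding by XOR): XOR the bit sections of the key, mod m."""
    result = 0
    while key:
        result ^= key & ((1 << chunk_bits) - 1)
        key >>= chunk_bits
    return result % m

address = string_hash("Smith", 101)      # some address between 0 and 100
by_addition = fold_add(123456, 100)      # (12 + 34 + 56) mod 100 = 2
by_xor = fold_xor(0x1234, 256)           # 0x34 XOR 0x12 = 0x26 = 38
```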
3.1 Disk Storage and Basic File Structures

3.1.8. Hashing techniques for storing database records: Internal hashing, external hashing

• Internal Hashing.
  • Collision - Occurs when the hash field value of a record that is being inserted hashes to an address that already holds another record.
  • Because the hash address is already taken, the new record must be placed in a different location.
  • Collision Resolution - The process of finding another location.
  • There are several methods for collision resolution.
• Methods of Collision Resolution
  • Open Addressing - Starting with the occupied position indicated by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.
  • Chaining - The new record is placed in an unused overflow location, and the pointer of the occupied hash address location is set to the address of that overflow location.
  • Multiple Hashing - If the first hash function results in a collision, the program applies a second hash function. If another collision occurs, the program uses open addressing or applies a third hash function, followed by open addressing if required.

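Open addressing and chaining can be compared with the short Python sketch below. It is a simplification under stated assumptions: the table has 5 slots, probing wraps around to the start of the table, and a per-slot Python list stands in for the overflow locations linked by pointers.

```python
M = 5   # number of slots (assumed for illustration)

def h(key):
    return key % M

def insert_open_addressing(table, key):
    """Check subsequent positions (wrapping around) until an empty one is found."""
    start = h(key)
    for step in range(M):
        pos = (start + step) % M
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("hash table is full")

def insert_chaining(table, key):
    """Colliding keys are kept in a per-slot overflow list (stands in for pointers)."""
    table[h(key)].append(key)

# 12, 22 and 7 all hash to address 2, so the second and third inserts collide.
open_table = [None] * M
positions = [insert_open_addressing(open_table, k) for k in (12, 22, 7)]

chained_table = [[] for _ in range(M)]
for k in (12, 22, 7):
    insert_chaining(chained_table, k)
```

With open addressing the three keys end up in positions 2, 3 and 4; with chaining they all stay attached to address 2.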

3.1 Disk Storage and Basic File Structures

3.1.8. Hashing techniques for storing database records: Internal hashing, external hashing

• External Hashing.
  • Hashing for disk files is called External Hashing.
  • To suit the characteristics of disk storage, the target address space is made up of Buckets, each of which holds multiple records.
  • A bucket is either a single disk block or a cluster of contiguous disk blocks.
  • Rather than assigning an absolute block address to the bucket, the hashing function maps a key to a relative bucket number.
  • A table in the file header converts the bucket number into the corresponding disk block address.

[Figure: Matching bucket numbers (0 to M - 1) to disk block addresses]


3.1 Disk Storage and Basic File Structures

3.1.8. Hashing techniques for storing database records: Internal hashing, external hashing

• External Hashing.
  • The collision problem is less severe with buckets, because as many records as will fit in a bucket can hash to the same bucket without causing problems.
  • When a bucket is filled to capacity and a new record hashes to it, a variant of chaining can be used in which each bucket stores a pointer to a linked list of overflow records for that bucket.
  • Here, the linked list pointers should be Record Pointers, which comprise a block address as well as a relative record position inside the block.

[Figure: Handling overflow for buckets by chaining]

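Bucket overflow handling can be sketched as follows. The bucket capacity of 2, the 3 buckets, and the use of a Python list in place of the linked list of record pointers are all assumptions made to keep the example short.

```python
BUCKET_CAPACITY = 2   # records that fit in one bucket (assumed)
NUM_BUCKETS = 3       # buckets are numbered 0 to NUM_BUCKETS - 1

# Each bucket has its main record area plus an overflow chain.
buckets = [{"records": [], "overflow": []} for _ in range(NUM_BUCKETS)]

def bucket_number(key):
    """The hash function yields a relative bucket number, not a block address."""
    return key % NUM_BUCKETS

def insert(key):
    bucket = buckets[bucket_number(key)]
    if len(bucket["records"]) < BUCKET_CAPACITY:
        bucket["records"].append(key)
    else:
        bucket["overflow"].append(key)   # stands in for the linked list of overflow records

def search(key):
    bucket = buckets[bucket_number(key)]
    return key in bucket["records"] or key in bucket["overflow"]

for k in (3, 6, 9, 4):    # 3, 6 and 9 all hash to bucket 0; 4 hashes to bucket 1
    insert(k)
```

Keys 3 and 6 fill bucket 0, so key 9 goes to that bucket's overflow chain, and a search for 9 has to follow the chain.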

Activity

Match the description with the correct term.

1. Applying an arithmetic function such as addition or a logical function such as Exclusive OR (XOR) to distinct sections of the hash field value.
2. The technique used for hashing where the program uses a second hash function if the first hash function fails.
3. The instance when the value of the hash field of a newly inserted record hashes to an address that already contains another record.
4. Starting with the occupied place given by the hash address, the program examines the succeeding locations in succession until an unused (empty) spot is located when a collision has occurred.
5. A continuous group of disk blocks or a single disk block, of which the target address space is made up.

3.2 Introduction to indexing

• Indexes are used to speed up record retrieval in response to specific search criteria.
• The index structures are extra files on disk that provide secondary access paths, allowing users to access records in different ways without changing the physical location of records in the original data file on disk.
• They make it possible to quickly access records using the Indexing Fields that were used to create the index.
• Any field in the file can be used to create an index, and the same file can have multiple indexes on separate fields as well as indexes on multiple fields.

3.2 Introduction to indexing

• Some Commonly used Types of Indexes
  • Single Level Ordered Indexes
    • Primary Index
    • Secondary Index
    • Clustering Index
  • Multi Level Tree Structured Indexes
    • B Trees
    • B+ Trees
  • Hash Indexes
  • Logical Indexes
  • Multi Key Indexes
  • Bitmap Indexes

3.3 Types of Indexes

• Single Level Indexes: Primary, Clustering and Secondary indexes
  • Primary, Clustering and Secondary indexes are types of single level ordered indexes.
  • Some books end with an ordered list of terms, arranged from A to Z. Each entry gives the term along with the page numbers on which it appears. This list is known as an index.
  • A reader who wants to find a particular term can look it up in the index, find the pages where the term appears, and then go directly to those pages.
  • Otherwise, readers would have to go through the whole book searching for the term, which is similar to a linear search.

3.3 Types of Indexes

• Single Level Indexes: Primary, Clustering and Secondary indexes
  • Primary Index - Defined for an ordered file of records, using the ordering key field.
  • The file records on disk are physically ordered by the ordering key field, which holds a unique value for each record.
  • Clustering Index - Applied when multiple records in the file can have the same value for the ordering field; here the ordering field is a non-key field. In this case, the data file is referred to as a clustered file.
  • A file can have at most one physical ordering field. Therefore, a file can have either one primary index or one clustering index, but not both.
  • Unlike primary indexes, a file can have several secondary indexes in addition to its primary index.


3.3 Types of Indexes

• Single Level Indexes: Primary indexes
  • Primary indexes are access structures used to increase the efficiency of searching and accessing the data records in a data file.
  • A primary index file is an ordered file of fixed-length records with two fields.
  • One field is the ordering key field. It has the same data type as the primary key of the data file.
  • The other field contains pointers to disk blocks.
  • Hence, the index file contains one index entry (a.k.a. index record) for each block in the data file.
  • Each index entry therefore consists of two values:
    i. The primary key field value of the first record in a data block.
    ii. A pointer to the data block that contains that record.
  • For index entry i, the two field values are written as <K(i), P(i)>.


3.3 Types of Indexes

• Single Level Indexes: Primary indexes
  • Ex: Assuming that “name” is a unique field and that “name” has been used to order the data file, we can create the index file as follows.
    <K(1) = (Aaron, Ed), P(1) = address of block 1>
    <K(2) = (Adams, John), P(2) = address of block 2>
    <K(3) = (Alexander, Ed), P(3) = address of block 3>

The following figure illustrates the index file and the respective block pointers to the data file.

[Figure: Primary index on the ordering key field]

3.3 Types of Indexes

• Single Level Indexes: Primary indexes
  • In the preceding illustration, the number of index entries in the index file is equal to the number of disk blocks in the data file.
  • Anchor record/Block anchor: For a given block in an ordered data file, the first record in that block is known as the anchor record. Each block has an anchor record.
  • Dense index and Sparse index
    i. An index that contains an index entry for every record in the data file (or every search key value) is referred to as a dense index.
    ii. An index that contains index entries for only some of the records in the data file is referred to as a sparse index.
  • Therefore, by definition, a primary index is a sparse (or non-dense) index, since it does not keep an index entry for every record in the data file. Instead, it keeps one index entry for the anchor record of each block of the data file.


3.3 Types of Indexes

• Single Level Indexes: Primary indexes
  • Generally, a primary index file takes much less space than the data file, for two reasons.
    i. The number of index entries is smaller than the number of records in the data file.
    ii. Each index entry holds only two fields, both of which are comparatively short.
  • Hence, a binary search on the index file requires fewer block accesses than a binary search on the data file.
  • A binary search on an ordered file with b blocks requires approximately log2 b block accesses.
  • Let’s assume that we want to access a record whose primary key value is K. The record resides in the block whose address is P(i), where K(i) ≤ K < K(i + 1).
  • Since the data file is physically ordered on the primary key, all records with key values in this range reside in the ith block.
  • Therefore, to retrieve the record with a given value K, a binary search is performed on the index file to find index entry i.
  • Then we can get the block address P(i) and retrieve the record.
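The lookup through a primary index can be sketched with Python's bisect module. The block contents below are illustrative assumptions (only the three anchor names come from the example above), and the list index of a block stands in for the block pointer P(i).

```python
from bisect import bisect_right

# Data file ordered on the key; each inner list is one block.
data_blocks = [
    ["Aaron, Ed", "Abbott, Diane"],
    ["Adams, John", "Akers, Jan"],
    ["Alexander, Ed", "Alfred, Bob"],
]

# Sparse primary index: K(i) is the anchor (first) record of block i.
index_keys = [block[0] for block in data_blocks]

def lookup(key):
    """Return the number of the block holding `key`, or None if it is absent."""
    i = bisect_right(index_keys, key) - 1   # binary search: last entry with K(i) <= key
    if i < 0:
        return None                         # key is smaller than every anchor
    return i if key in data_blocks[i] else None   # one more access to read block i

block_of_akers = lookup("Akers, Jan")
```

The binary search touches only the index; exactly one data block is then read to fetch the record.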

3.3 Types of Indexes

• Single Level Indexes: Primary indexes

Ex: Let’s say we have a file ordered on its key field. The file records are of fixed size and are unspanned. Given the following details, we calculate the block accesses required when performing a binary search on the data file.
number of records r = 300,000
block size B = 4,096 bytes
record length R = 100 bytes
We can calculate the blocking factor,
bfr = floor(B/R) = floor(4,096/100) = 40 records per block
Hence, the number of blocks needed to store all records is
b = ceiling(r/bfr) = ceiling(300,000/40) = 7,500 blocks.
Block accesses required = ceiling(log2 b) = ceiling(log2 7,500) = 13

Ex: For the same scenario, suppose we have a primary index file with an ordering key field (V) 9 bytes long and a block pointer (P) 6 bytes long. The required block accesses can be calculated as follows.
number of records r = 300,000
block size B = 4,096 bytes
index entry length Ri = (V + P) = 15 bytes
We can calculate the blocking factor for the index,
bfri = floor(B/Ri) = floor(4,096/15) = 273 index entries per block
The number of index entries required (ri) is equal to the number of blocks required for the data file, so ri = 7,500.
Hence, the number of blocks needed for the index file is
bi = ceiling(ri/bfri) = ceiling(7,500/273) = 28 blocks.

3.3 Types of Indexes

Block accesses required for a binary search on the index file
= ceiling(log2 bi) = ceiling(log2 28) = 5

However, to access the record using the index, we have to perform the binary search on the index file plus one additional block access to retrieve the record from the data file.

• Therefore, the total number of block accesses needed to access the record is
ceiling(log2 bi) + 1 = 5 + 1 = 6 block accesses.

• Single Level Indexes: Primary indexes
  • A primary index has problems when we add new records to, or delete existing records from, an ordered file.
  • If a new record is inserted at its correct position according to the ordering, existing records in the data file may have to be moved to make space for the new record.
  • Sometimes this movement changes the anchor records of some blocks as well.
  • Deletion of records has the same issue as insertion.

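The arithmetic of the primary index example (r = 300,000, B = 4,096, R = 100, V = 9, P = 6) can be checked with a few lines of Python. The helper name blocks_needed is hypothetical; it simply combines the floor and ceiling steps used in the calculation.

```python
import math

def blocks_needed(num_entries, block_size, entry_size):
    """Blocks required to hold num_entries unspanned, fixed-length entries."""
    bfr = block_size // entry_size          # blocking factor = floor(B / R)
    return math.ceil(num_entries / bfr)

r, B, R = 300_000, 4096, 100
b = blocks_needed(r, B, R)                          # data file blocks
binary_search_on_data = math.ceil(math.log2(b))     # binary search on the data file

Ri = 9 + 6                                          # index entry = key field + block pointer
bi = blocks_needed(b, B, Ri)                        # one index entry per data block
via_primary_index = math.ceil(math.log2(bi)) + 1    # index search + one data block access
```

This gives b = 7,500 and bi = 28, with 13 block accesses for the binary search on the data file against 6 through the primary index, matching the figures worked out by hand.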
3.3 Types of Indexes

• Single Level Indexes: Primary indexes
  • An unordered overflow file can be used to reduce this problem.
  • Adding a linked list of overflow records for each block in the data file is another way to address this issue.
  • Deletion markers can be used to manage the issues with record deletion.

• Single Level Indexes: Clustering indexes
  • When a data file is ordered on a non-key field, which does not have a distinct value for each record, the file is known as a clustered file. The field used to order the file is known as the clustering field.
  • A clustering index speeds up the retrieval of all records that have the same value for the clustering field.
  • Unlike in a clustering index, the ordering field of a primary index has distinct values.


3.3 Types of Indexes

• Single Level Indexes: Clustering indexes
  • A clustering index also consists of two fields. One is for the clustering field of the data file and the other is for block pointers.
  • The index file contains one entry for each distinct value of the clustering field, with a pointer to the first block in which a record with that clustering field value appears.
  • Since the data file is ordered, inserting and deleting records causes problems with a clustering index as well.
  • A common method to address this problem is to reserve an entire block (or a set of neighbouring blocks) for each value of the clustering field.
  • All records that have the same clustering field value are stored in that allocated block.
  • This method eases the insertion and deletion of records.


3.3 Types of Indexes

• Single Level Indexes: Clustering indexes
  • This problem can be reduced using an unordered overflow file.
  • Adding a linked list of overflow records for each block in the data file is another way to address this issue.
  • Deletion markers can be used to manage the issues with record deletion.
  • A clustering index is also a sparse index, since it contains an entry for each distinct value of the ordering field in the data file rather than an entry for every record.

[Figure: Clustering Index]


3.3 Types of Indexes

[Figure: Clustering Index with allocation of blocks for distinct values in the ordered key field]

• Single Level Indexes: Clustering indexes

Ex: For the same ordered file with r = 300,000 and B = 4,096 bytes, let’s say we have used the field “Zip code”, which is a non-key field, to order the data file.
Assumption: Each Zip Code has an equal number of records and there are 1,000 distinct values for Zip Codes (ri). Index entries consist of a 5-byte long Zip Code and a 6-byte long block pointer.
Size of the index entry Ri = 5 + 6 = 11 bytes
Blocking factor bfri = floor(B/Ri) = floor(4,096/11) = 372 index entries per block
Hence, the number of blocks needed bi = ceiling(ri/bfri) = ceiling(1,000/372) = 3 blocks.
Block accesses to perform a binary search on the index = ceiling(log2 bi) = ceiling(log2 3) = 2
3.3 Types of Indexes

• Single Level Indexes: Secondary indexes
  • A secondary index provides an additional means of accessing a data file that already has a primary access path.
  • The data file records can be ordered, unordered, or hashed. The index file, however, is always ordered.
  • The indexing field used to define a secondary index can be a candidate key, which has a unique value for every record, or a non-key field, which has duplicate values.
  • The first field of the index file has the same data type as the indexing field, which is a non-ordering field of the data file.
  • The second field holds either a block pointer or a record pointer.
  • Several secondary indexes (and therefore indexing fields) can be created for a single file. Each of these serves as an additional method of accessing the file based on a specific field.
  • A secondary index created on a candidate key (unique key/primary key), which has a unique value for every record in the file, has an entry for every record in the data file.
  • The reason for having an entry for every record is that the key attribute used to create the secondary index has a distinct value for each record.
  • In such cases, the secondary index is a dense index that holds a key value and a block pointer for each record in the data file.

3.3 Types of Indexes

• Single Level Indexes: Secondary indexes
  • As with the primary index, the two fields of an index entry are referred to as <K(i), P(i)>.
  • Since the index entries are ordered by the value of K(i), a binary search can be performed on the index.
  • However, block anchors cannot be used, since the records of the data file are not physically ordered by the values of the secondary key field.
  • This is why an index entry is created for each record of the data file, instead of using block anchors as in the primary index.
  • Due to the large number of entries, a secondary index requires more storage space than a primary index.
  • On the other hand, a secondary index gives a greater improvement in the search time for an arbitrary record.
  • This is because, without the secondary index, we would have to do a linear search of the data file.
  • For a primary index, a binary search can still be performed on the main file even if the index is not present.

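A dense secondary index over an unordered file can be sketched as follows. The sample records and names are made up for illustration, and a (block number, slot) pair is assumed as the record pointer P(i).

```python
from bisect import bisect_left

# Unordered data file: the records are NOT sorted on "id".
data_blocks = [
    [{"id": 42, "name": "Silva"}, {"id": 7, "name": "Perera"}],
    [{"id": 19, "name": "Fernando"}, {"id": 3, "name": "Dias"}],
]

# Dense index: one <K(i), P(i)> entry per record, kept sorted on K(i).
index = sorted(
    (record["id"], (block_no, slot))
    for block_no, block in enumerate(data_blocks)
    for slot, record in enumerate(block)
)
index_keys = [key for key, _ in index]

def lookup(key):
    """Binary search on the index, then follow the record pointer."""
    i = bisect_left(index_keys, key)
    if i == len(index_keys) or index_keys[i] != key:
        return None
    block_no, slot = index[i][1]
    return data_blocks[block_no][slot]

record_19 = lookup(19)
```

The index has as many entries as the file has records, which is why it is larger than a primary index on the same file.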

3.3 Types of Indexes

• Single Level Indexes: Secondary indexes

Ex: Take the same file as in the primary index example, and assume we search on a non-ordering key field V = 9 bytes long, in a file of 300,000 records with a fixed length of 100 bytes. The block size is B = 4,096 bytes.
We can calculate the blocking factor,
bfr = floor(B/R) = floor(4,096/100) = 40 records per block
Hence, the number of blocks needed,
b = ceiling(r/bfr) = ceiling(300,000/40) = 7,500 blocks.
• If we perform a linear search on this file, the required number of block accesses on average
= b/2 = 7,500/2 = 3,750 block accesses

However, if we have a secondary index on that non-ordering key field, with block pointers P = 6 bytes long,
Length of the index entry Ri = V + P = 9 + 6 = 15 bytes
Blocking factor bfri = floor(B/Ri) = floor(4,096/15) = 273 index entries per block
Since the secondary index is dense, the number of index entries (ri) is the same as the number of records (300,000) in the file.
• Therefore, the number of blocks required for the secondary index is
bi = ceiling(ri/bfri) = ceiling(300,000/273) = 1,099 blocks

3.3 Types of Indexes

• Single Level Indexes: Secondary indexes
  • If we perform a binary search on this secondary index, the required number of block accesses can be calculated as follows,
    ceiling(log2 bi) = ceiling(log2 1,099) = 11 block accesses.
  • Since we need one additional block access to retrieve the record from the data file using the index, the total number of block accesses required is
    11 + 1 = 12 block accesses.
  • Compared to the linear search, which required 3,750 block accesses, the secondary index shows a big improvement with 12 block accesses. But it is slightly worse than the primary index, which needed only 6 block accesses.
  • This difference is a result of the size of the indexes. The primary index is sparse and therefore has only 28 blocks.
  • The secondary index, which is dense, requires 1,099 blocks, which is much larger than the primary index.

3.3 Types of Indexes

• Single Level Indexes: Secondary indexes
  • A secondary index retrieves the records in the order of the indexing field on which it was created, because secondary indexing gives a logical ordering of the records.
  • Primary and clustering indexes, however, assume that the physical ordering of the file matches the order of the indexing field.

3.3.2 Multilevel indexes: Overview of multilevel indexes

• Since a single-level index is an ordered file, we can create a primary index on the index file itself.
• Here the original index file is called the first-level index and the index created on it is called the second-level index.
• We can repeat the process, creating a third, fourth, ..., top level until all entries of the top level fit in one disk block.
• A multi-level index can be created for any type of first-level index (primary, secondary, clustering) as long as the first-level index consists of more than one disk block.


3.3.2 Multilevel indexes: Overview of multilevel indexes

• As discussed in topic 3.3, the primary, clustering and secondary indexing schemes each use an ordered index file.
• A binary search is used to search the index, and at each step the algorithm reduces the part of the index file still to be searched by a factor of 2. Hence we use the log function to the base 2 (log2 bi).
• Multilevel indexing speeds up this search by reducing the search space faster, provided the blocking factor of the index is greater than 2.
• In multilevel indexing, the blocking factor of the index bfri is referred to as the fan-out, symbolized as fo.
• In multilevel indexing, the number of block accesses required is (approximately) logfo bi.
• If the first-level index has r1 entries, the blocking factor for the first level is bfr1 = fo.
• The number of blocks required for the first level is given by ceiling(r1/fo).
• Therefore, the number of entries in the second-level index is r2 = ceiling(r1/fo).
• Similarly, r3 = ceiling(r2/fo).
• However, we need a second level only if the first level requires more than 1 block. Likewise, we consider a next level only if the current level requires more than 1 block.
• If the top level is t,
  t = ceiling(logfo(r1))
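The level-by-level reasoning above can be traced with a short Python sketch. The function name index_levels is hypothetical, and the inputs reuse the dense secondary index example (r1 = 300,000 first-level entries, fo = 273) purely as an illustration.

```python
import math

def index_levels(r1, fo):
    """Return the number of blocks at each index level, first level first."""
    levels = []
    entries = r1
    while True:
        blocks = math.ceil(entries / fo)   # blocks needed at this level
        levels.append(blocks)
        if blocks == 1:                    # the top level fits in one block
            break
        entries = blocks                   # next level: one entry per block of this level
    return levels

levels = index_levels(r1=300_000, fo=273)
t = len(levels)                            # number of index levels
```

For these inputs the levels need 1,099, 5 and 1 blocks, so t = 3, which agrees with ceiling(logfo(r1)). Searching then reads one block per index level plus one data block.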

3.4 Indexes on Multiple Keys

• If a certain combination of attributes is used frequently, we can set up a key on that combination of attributes for efficient access.
• For example, let's say we have a file for students, containing student_id, name, age, gpa, department_id and department_name.
• If we want to find students whose department_id = 1 and gpa = 3.5, we can use one of the following search strategies.

1. Assuming only department_id has an index, we can access the records with department_id = 1 using the index and then, among them, find the records that have gpa = 3.5.
2. Alternatively, assuming only gpa has an index and not department_id, we can access the records with gpa = 3.5 using the index and then, among them, find the records that have department_id = 1.
3. If both the department_id and gpa fields have indexes, we can get the records that meet each individual condition (department_id = 1, gpa = 3.5) and then take the intersection of those two sets of records.

1 1
© e-Learning Centre, UCSC 2 © e-Learning Centre, UCSC 2
5 6
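The three strategies can be compared with a small in-memory sketch. The sample records and the helper `build_index` are hypothetical; each single-attribute "index" is modelled as a dictionary from a value to the set of record positions holding it.

```python
# Hypothetical sample data; only the fields needed for the example are kept.
students = [
    {"student_id": 1, "department_id": 1, "gpa": 3.5},
    {"student_id": 2, "department_id": 1, "gpa": 2.8},
    {"student_id": 3, "department_id": 2, "gpa": 3.5},
    {"student_id": 4, "department_id": 1, "gpa": 3.5},
]


def build_index(records, field):
    """Map each value of `field` to the set of record positions that hold it."""
    index = {}
    for pos, rec in enumerate(records):
        index.setdefault(rec[field], set()).add(pos)
    return index


dept_index = build_index(students, "department_id")
gpa_index = build_index(students, "gpa")

# Strategy 1: use the department_id index, then filter on gpa.
s1 = {p for p in dept_index.get(1, set()) if students[p]["gpa"] == 3.5}

# Strategy 2: use the gpa index, then filter on department_id.
s2 = {p for p in gpa_index.get(3.5, set()) if students[p]["department_id"] == 1}

# Strategy 3: use both indexes and intersect the two sets of positions.
s3 = dept_index.get(1, set()) & gpa_index.get(3.5, set())

result_ids = sorted(students[p]["student_id"] for p in s3)
print(result_ids)
```

All three strategies return the same records; they differ in how many candidate records have to be touched before the final answer is known.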

• All of the methods mentioned above will eventually give the same set of records as the result.
• However, the number of records that meet only one of the specified conditions (either department_id = 1 or gpa = 3.5) is larger than the number of records that satisfy both conditions (department_id = 1 and gpa = 3.5).
• Hence, none of the above three methods is efficient for finding the records we require.
• Having a multiple-key index on department_id and gpa would be more efficient in this case, because we can find the records that meet the given requirements just by accessing the index file.
• We refer to keys containing multiple attributes as composite keys.

• Ordered Index on Multiple Attributes
  • We can create a key field for the previously discussed file as <department_id, gpa>.
  • The search key is also a pair of values. For the previous example this will be <1, 3.5>.
  • In general, if an index is created on attributes <A1, A2, A3, ..., An>, the search key values are tuples with n values: <v1, v2, v3, ..., vn>.
  • A lexicographic (alphabetical) ordering of these tuple values establishes an order on the composite search keys.
  • For example, all the composite keys with department_id = 1 will precede those with department_id = 2.
  • When the department_id is the same, the composite keys are sorted in ascending order of gpa.
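A minimal sketch of an ordered index on the composite key <department_id, gpa> is shown below. Python tuples already compare lexicographically, so a sorted list of tuples behaves like the ordering described above. The record pointers ("r1", "r2", ...) and the sample keys are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Each index entry is (composite key, record pointer); the pointers are made up.
entries = sorted([
    ((2, 3.1), "r5"),
    ((1, 3.5), "r1"),
    ((1, 2.8), "r2"),
    ((2, 3.5), "r3"),
    ((1, 3.5), "r4"),
])
keys = [key for key, _ in entries]  # sorted lexicographically by (department_id, gpa)


def lookup(department_id, gpa):
    """Equality search on the full composite key using binary search."""
    lo = bisect_left(keys, (department_id, gpa))
    hi = bisect_right(keys, (department_id, gpa))
    return [ptr for _, ptr in entries[lo:hi]]


def lookup_department(department_id):
    """Search on the leading attribute only: all keys that start with department_id."""
    lo = bisect_left(keys, (department_id,))
    hi = bisect_left(keys, (department_id + 1,))
    return [ptr for _, ptr in entries[lo:hi]]


print(lookup(1, 3.5))        # records matching <1, 3.5>
print(lookup_department(1))  # all records of department 1, in gpa order
```

Because the ordering is led by department_id, a search on department_id alone can still use the index, whereas a search on gpa alone cannot use it in the same way.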

• Partitioned Hashing
  • Partitioned hashing is an extension of static external hashing (when a search-key value is provided, the hash function always computes the same address) that allows access on multiple keys.
  • It is suitable only for equality comparisons. It does not support range queries.
  • For a key consisting of n attributes, n separate hash addresses are generated. The bucket address is the concatenation of these n addresses.
  • It is then possible to search for a composite key by looking up the buckets that match the parts of the address in which we are interested.
  • For example, consider the composite search key <department_id, gpa>.
  • If department_id and gpa are hashed into a 2-bit and a 6-bit address respectively, we get an 8-bit bucket address.
  • If department_id = 1 is hashed to 01 and gpa = 3.5 is hashed to 100011, then the bucket address is 01100011.
  • To search for students with a gpa of 3.5, only the last 6 bits are fixed, so we search the buckets 00100011, 01100011, 10100011 and 11100011.
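The address construction in this example can be sketched as follows. The hash functions themselves are not specified in the lesson, so the sketch takes the already-hashed bit patterns (01 and 100011) as inputs; the function and constant names are assumptions.

```python
DEPT_BITS = 2  # width of the department_id part of the address
GPA_BITS = 6   # width of the gpa part of the address


def bucket_address(dept_hash: int, gpa_hash: int) -> str:
    """Concatenate the two partial hash addresses into one 8-bit bucket address."""
    combined = (dept_hash << GPA_BITS) | gpa_hash
    return format(combined, f"0{DEPT_BITS + GPA_BITS}b")


def buckets_for_gpa(gpa_hash: int) -> list[str]:
    """gpa part fixed, department_id part unknown: try every 2-bit prefix."""
    return [bucket_address(d, gpa_hash) for d in range(2 ** DEPT_BITS)]


def buckets_for_department(dept_hash: int) -> list[str]:
    """department_id part fixed, gpa part unknown: try every 6-bit suffix."""
    return [bucket_address(dept_hash, g) for g in range(2 ** GPA_BITS)]


print(bucket_address(0b01, 0b100011))     # full composite key
print(buckets_for_gpa(0b100011))          # search on gpa only
print(len(buckets_for_department(0b01)))  # search on department_id only
```

A search on gpa alone has to visit 4 buckets, while a search on department_id alone has to visit 64, which is why the number of bits given to each attribute matters.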

• Partitioned Hashing
  • Advantages of partitioned hashing:
    i. It is easy to extend to any number of attributes.
    ii. The bucket addresses can be designed so that frequently accessed attributes get the higher-order bits in the address. (Higher-order bits are the leftmost bits.)
    iii. There is no need to maintain a separate access structure for individual attributes.
  • Disadvantages of partitioned hashing:
    i. It cannot handle range queries on any of the component attributes.
    ii. In most cases, records are not kept in the order of the key that was used for the hash function. Hence, using the lexicographic order of a combination of attributes as a key (e.g. <department_id, gpa>) to access the records would not be straightforward or efficient.

• Grid Files
  • A grid file is constructed using a grid array with one linear scale (or dimension) for each of the search attributes.
  • For the previous example of the students file, we can construct one linear scale for department_id and another for gpa.
  • The linear scales are chosen so that the records are distributed uniformly over the cells for the attributes being indexed.
  • Each cell points to some bucket address where the records corresponding to that cell are stored.

The following illustration shows a grid array for the Student file with one linear scale for department_id and another for the gpa attribute.

[Figure: Student File grid array, 4 × 4 cells, indexed 0 to 3 along the department_id linear scale and 0 to 3 along the gpa linear scale]

Linear scale for department_id
department_id   scale value
0               0
1               1
2               2
3               3

Linear scale for gpa
gpa             scale value
< 1.0           0
1.0 - 1.9       1
2.0 - 2.9       2
≥ 3.0           3

• Grid Files
  • When we query for department_id = 1 and gpa = 3.5, the query maps to cell (1, 3) of the grid array in the illustration above.
  • Records for this combination can be found in the corresponding bucket.
  • Due to the nature of this indexing, we can also perform range queries.
  • As an example, for the range query gpa > 2.0 and department_id < 2, the pool of buckets for the cells with department_id scale value 0 or 1 and gpa scale value 2 or 3 is selected.

[Figure: the grid array with the cells selected by the range query highlighted]

  • Grid files can be applied to any number of search keys.
  • If we have n search keys, we get a grid array of n dimensions.
  • Hence it is possible to partition the file along the dimensions of the search key attributes.
  • Thus, grid files provide access by combinations of values along the dimensions of the grid array.
  • Space overhead and the additional maintenance cost of reorganizing dynamic files are some drawbacks of grid files.
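A grid-file lookup can be sketched as follows. The gpa boundaries 1.0, 2.0 and 3.0 are an assumption that closes the gaps between the ranges on the linear scale, and department_id values 0 to 3 are assumed to map to themselves. The names `cell_for`, `insert` and `range_cells` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

# Assumed linear scale for gpa: [0, 1.0) -> 0, [1.0, 2.0) -> 1, [2.0, 3.0) -> 2, >= 3.0 -> 3
GPA_BOUNDARIES = [1.0, 2.0, 3.0]


def cell_for(department_id: int, gpa: float) -> tuple[int, int]:
    """Map a (department_id, gpa) pair to its grid cell."""
    return (department_id, bisect_right(GPA_BOUNDARIES, gpa))


grid = defaultdict(list)  # cell -> bucket of records


def insert(record: dict) -> None:
    grid[cell_for(record["department_id"], record["gpa"])].append(record)


def range_cells(max_department_exclusive: int, min_gpa: float) -> list[tuple[int, int]]:
    """Cells that may hold records with department_id < max and gpa > min_gpa.

    Records in the boundary cells still need to be checked against the
    exact condition after their buckets are read.
    """
    first_gpa_cell = bisect_right(GPA_BOUNDARIES, min_gpa)
    return [
        (d, g)
        for d in range(max_department_exclusive)
        for g in range(first_gpa_cell, len(GPA_BOUNDARIES) + 1)
    ]


insert({"student_id": 1, "department_id": 1, "gpa": 3.5})
print(cell_for(1, 3.5))     # cell for the equality query
print(range_cells(2, 2.0))  # cells for gpa > 2.0 and department_id < 2
```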
3.5 Other types of Indexes

• Hash Indexes
  • The hash index is a secondary structure that allows access to the file using hashing.
  • The search key is defined on an attribute other than the one used for organizing the primary data file.
  • Index entries consist of the hashed key and a pointer to the record corresponding to that key.
  • An index file with a hash index can be organized as a dynamically expandable hash file.

[Figure: Hash-based indexing]

• Bitmap Indexes
  • A bitmap index is commonly used for querying on multiple keys.
  • Generally, it is used for relations that consist of a large number of rows.
  • A bitmap index can be created for every value or range of values in a single column or in multiple columns.
  • However, the columns used to create a bitmap index should have a fairly small number of unique values.
  • Suppose we create a bitmap index on column C for a particular value V, and we have n records.
  • The index then contains n bits.
  • For a given record with record number i, if that record has the value V in column C, the ith bit is set to 1; otherwise it is set to 0.

• Bitmap Indexes
  • The table below has a column that records the gender of each employee.
  • The bitmap index for each value is an array of bits, as shown.

Row_id  Emp_id  Lname     Gender  M  F
0       51024   Sandun    M       1  0
1       23402   Kamalani  F       0  1
2       62104   Eranda    M       1  0
3       34723   Christina F       0  1
4       81165   Clera     F       0  1
5       13646   Mohamad   M       1  0
6       54649   Karuna    M       1  0
7       41301   Padma     F       0  1

M  10100110
F  01011001

  • In this example, for the value F in column Gender, the bits at positions 1, 3, 4 and 7 (counting from 0) are set to "1" because the records with row ids 1, 3, 4 and 7 have the value F in it. The bits for row ids 0, 2, 5 and 6 are set to "0".
  • A bitmap index is created on a set of n records numbered from 0 to n − 1, with a record id or row id that can be mapped to a physical address.
  • This physical address is made up of a block number and a record offset within the block.
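The bitmaps in this example can be reproduced with a short sketch. Bit strings are used instead of packed bits to keep the output readable; the function names are hypothetical.

```python
# Gender column of the Employee table, in Row_id order 0..7.
genders = ["M", "F", "M", "F", "F", "M", "M", "F"]


def build_bitmaps(column):
    """Build one bitmap per distinct value: bit i is 1 if row i holds that value."""
    bitmaps = {}
    for value in sorted(set(column)):
        bitmaps[value] = "".join("1" if v == value else "0" for v in column)
    return bitmaps


def matching_rows(bitmap):
    """Return the row ids whose bit is set to 1."""
    return [i for i, bit in enumerate(bitmap) if bit == "1"]


bitmaps = build_bitmaps(genders)
print(bitmaps["M"])
print(bitmaps["F"])
print(matching_rows(bitmaps["F"]))
```

With bitmaps on several columns, a multi-key condition can be answered by combining the bitmaps bit by bit (AND for "and", OR for "or") before any record is fetched.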

• Function based indexing
  • This method was introduced in commercial DBMS products such as the Oracle relational DBMS.
  • In function based indexing, a function is applied to one or more columns and the resulting value is the key to the index.
  • For example, we can create an index on the uppercase form of the Lname field as follows:

    CREATE INDEX upper_lname
    ON Employee (UPPER(Lname));

  • The "UPPER" function is applied to the "Lname" field to create an index called "upper_lname".
  • If we run the following query, the DBMS can use the upper_lname index rather than searching the entire table.

    SELECT Emp_id, Lname
    FROM Employee
    WHERE UPPER(Lname) = 'SANDUN';
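The same idea can be tried out with Python's built-in sqlite3 module, which also supports indexes on expressions (SQLite 3.9.0 or later). This is a sketch on SQLite rather than Oracle, with a reduced Employee table whose rows are taken from the bitmap example; the exact wording of the query plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp_id INTEGER, Lname TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(51024, "Sandun"), (23402, "Kamalani"), (62104, "Eranda")],
)

# Function-based (expression) index on the uppercase form of Lname.
conn.execute("CREATE INDEX upper_lname ON Employee (UPPER(Lname))")

# The WHERE clause uses the same expression as the index definition.
query = "SELECT Emp_id, Lname FROM Employee WHERE UPPER(Lname) = 'SANDUN'"
rows = conn.execute(query).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(rows)
print(plan)
```

The comparison value is written in uppercase because the index stores UPPER(Lname); comparing against 'Sandun' would match no rows.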
3.6 Index Creation and Tuning

• Index Creation
  • An index is not an essential part of a data file, so we can create and remove indexes dynamically.
  • Indexes are usually known as access structures. We can create indexes based on the frequently used search requirements.
  • A secondary index does not depend on the physical ordering of the data file.
  • A secondary index can be created in conjunction with virtually any primary record organization.
  • A secondary index can be used in addition to the primary organization of the file, such as ordering, hashing or mixed files.
  • The following command is a general way of creating an index in an RDBMS:

    CREATE [ UNIQUE ] INDEX <index name>
    ON <table name> ( <column name> [ <order> ] { , <column name> [ <order> ] } )
    [ CLUSTER ] ;

  • Keywords in square brackets are optional.
  • [ CLUSTER ] → sort the records in the data file on the indexing attribute.
  • <order> → ASC/DESC (default: ASC)

• Tuning Indexes
  • The indexes that we have created may require modification for the following reasons:
    i. Some queries take a long time to run because a suitable index is missing.
    ii. Some indexes may not be used at all.
    iii. The attributes used to create an index may be subject to frequent changes.
  • DBMSs provide options to view the execution order of a query. The indexes used and the number of disk accesses are included in this view, which is known as the query plan.
  • With the query plan, we can identify whether the above problems are taking place and update or remove indexes accordingly.
  • Database tuning takes place with the goal of achieving the best overall performance. The requirements are evaluated dynamically, and the organization of the indexes and files is changed accordingly.
  • Changing a nonclustered index into a clustered index (or a clustered index into a nonclustered index) and creating or dropping indexes are some ways of improving performance.
  • Rebuilding an index may improve performance by reclaiming the space wasted by many deletions.

3.7 Physical Database Design in Relational Databases

• Analyzing the Database queries and transactions
  • Before designing the physical structure, we should have a thorough idea of the intended use of the database and a general knowledge of the queries that will be run.
  • The physical design of a database should provide an appropriate structure to store the data and, at the same time, facilitate good performance.
  • The mix of queries, transactions and applications that are expected to run on the database are factors that the database designer should consider before designing the physical structure.
  • Let's discuss each factor in detail.
  • For a retrieval query, the following information is important:
    i. The relations that will be accessed by the query
    ii. The attributes specified in the selection condition
    iii. The type of selection condition (equality, inequality, range, etc.)
    iv. The attributes that link multiple tables (join conditions)
    v. The attributes retrieved by the query
  • The attributes in ii and iv are candidates for index creation.

  • For an update operation or update transaction, we should consider:
    i. The files that will be updated
    ii. Whether it is an insert, delete or update operation
    iii. The attributes specified in the selection condition for an update or delete
    iv. The attributes whose values are changed by the update query
  • The attributes in iii are candidates for index creation.

• Analyzing the Expected Frequency of Invocation of Queries and Transactions
  • We must consider how frequently we expect to invoke a particular query.
  • An aggregated list of the expected frequencies of all the queries and transactions, along with their attributes, is prepared.

• Analyzing the Time Constraints of Queries and Transactions
  • Some queries and transactions have rigid time constraints. For example, in a stock exchange system, some queries are required to complete within milliseconds.
  • Generally, primary access structures provide the most effective way of locating a record in a file. Hence, the selection attributes of queries with time constraints should be given high priority when creating primary access structures.

• Analyzing the Expected Frequency of the update queries
  • Updating the access paths of a file slows down update operations. Therefore, a minimum number of access paths should be specified for files that are subject to frequent updates.

• Analyzing the Uniqueness constraint on attributes
  • The primary key of a file, and the unique attributes of a file that are candidate keys, should have access paths defined.
  • Having an index (the access path) defined makes it easy to search the index when checking for uniqueness.
  • This helps to check uniqueness when inserting new records: if the value already exists, the database rejects the record because it violates the uniqueness constraint.

• Design Decisions about indexing
  • Whether to index an attribute:
    • In general, indexes are created on the attributes that are used as a unique key of the file, or on the attributes that are used in selection conditions or join conditions in queries.
    • Having multiple indexes can allow some operations to be processed just by scanning the indexes, without accessing the data file.

• Design Decisions about indexing
  • What attribute or attributes to index on:
    • An index can be defined on a single attribute, or it can be a composite index created on multiple attributes. In a composite index, the order of the attributes should match the way they are used in the corresponding queries.
      Ex: A composite key on (department, subject) assumes that the queries are based on subjects within a department.
  • Whether to set up a clustered index:
    • We cannot have both a primary index and a clustering index on the same file, because in both cases the data file is physically ordered on the indexing attribute. Clustering pays off when the records themselves are retrieved; if a query can be answered just by accessing the index, there is no benefit in making that index clustered. If multiple queries require clustering on different attributes, we should evaluate the gain of each and decide which attribute to use.
  • Whether to use dynamic hashing for the file:
    • Dynamic hashing is suitable for files that are subject to frequent expansion and shrinking.

Activity

1. What are the types of single level ordered indexes?
   a. ________
   b. ________
   c. ________

Activity

Fill in the blanks
1. A file can have _____ physical ordering field.
2. Primary, Clustering and Secondary indexes are types of _______ level _____ indexes.
3. ________ search is possible on the index field since it has ________ values.
4. An indexing access structure is established on ______ ____.
5. The index file is usually _____ than the data file.

Activity

Mark whether each statement is true or false.
1. An unordered file which consists of two fields and limited-length records is known as a primary index file. ( t/f )
2. For a given block in an ordered data file, the first record in that block is known as the anchor record. ( t/f )
3. Indexes that contain index entries for only some records in the data file are referred to as non-dense indexes. ( t/f )
4. The ordering key field of the index file and the primary key of the data file have the same data type. ( t/f )
5. In a primary index, the index file contains one index entry (a.k.a. index record) for each record in the data file. ( t/f )

Activity

1. You have a file with 600,000 records (r), which is ordered by its key field. Each record of this file is fixed length and unspanned. The record length (R) is 100 bytes and the block size (B) is 4096 bytes.
   a. What is the blocking factor?
   b. How many blocks are required to store this file?
   c. Calculate the number of block accesses required to perform a binary search on this file and access the data.

Activity

1. You have a file with 400,000 records (r), which is ordered by its key field. Each record of this file is fixed length and unspanned. The record length (R) is 100 bytes and the block size (B) is 4096 bytes.
   a. What is the blocking factor?
   b. How many blocks are required to store this file?
   c. Calculate the number of block accesses required to perform a binary search on this file and access the data.

Activity

1. You have a file with 400,000 records (r), which is ordered by its key field. Each record of this file is fixed length and unspanned. The record length (R) is 100 bytes and the block size (B) is 4096 bytes. If you have created a primary index file with a 9-byte ordering key field (v) and a 6-byte block pointer (p),
   a. What is the blocking factor for the index?
   b. How many blocks are required for the index file?
   c. Calculate the number of block accesses required to perform a binary search on the index file and access the data.

Activity

1. A data file with 400,000 records (r) is ordered by a non-key field called "product_category". The product_category field has 750 distinct values. The record length (R) is 100 bytes and the block size (B) is 4096 bytes. If you have created a primary index file on this non-key field with a 9-byte ordering key field (v) and a 6-byte block pointer (p),
   a. What is the blocking factor for the index?
   b. How many blocks are required to store the index file?
   c. Calculate the number of block accesses required to perform a binary search on the index file and access the data.

Activity

1. Assume we search on a non-ordering key field of length V = 9 bytes in a file with 600,000 records of fixed length 100 bytes. The block size is B = 8,192 bytes.
   a. What is the blocking factor?
   b. What is the required number of blocks?
   c. How many block accesses are required for a linear search?

Activity

1. Assume we create a secondary index on a non-ordering key field of length V = 9 bytes, with block pointers of length P = 6 bytes, in a file with 600,000 records of fixed length 100 bytes. The block size is B = 8,192 bytes.
   a. What is the blocking factor for the index?
   b. How many blocks are required to store the index file?
   c. Calculate the number of block accesses required to perform a binary search on the index file and access the data.

Activity

1. Assume we have a file with multi-level indexing. In the first level, the number of blocks is b1 = 1099 and the blocking factor (bfri) is 273.
   a. Calculate the number of blocks required for the second level.
   b. Calculate the number of blocks required for the third level.
   c. What is the top-level index (t)?
   d. How many block accesses are required to access a record using this multi-level index?
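The calculations in these activities follow the same few steps, which can be checked with small helper functions. The sketch below assumes the standard formulas for unspanned, fixed-length records (blocking factor ⌊B / R⌋, number of blocks ⌈r / bfr⌉, binary search cost ⌈log2 b⌉) and uses hypothetical figures that differ from the activities, so the exercises are left for the reader.

```python
def blocking_factor(block_size: int, record_size: int) -> int:
    """Records per block for unspanned, fixed-length records: floor(B / R)."""
    return block_size // record_size


def blocks_needed(num_records: int, bfr: int) -> int:
    """Blocks needed to hold the records: ceil(r / bfr)."""
    return -(-num_records // bfr)


def binary_search_accesses(num_blocks: int) -> int:
    """Block accesses for a binary search over b blocks: ceil(log2(b))."""
    return (num_blocks - 1).bit_length()


# Hypothetical data file: r = 30,000 records, R = 100 bytes, B = 1,024 bytes.
bfr = blocking_factor(1024, 100)
b = blocks_needed(30_000, bfr)
data_search = binary_search_accesses(b)

# Hypothetical primary index on that file: V = 9 bytes, P = 6 bytes,
# with one index entry per data block.
bfri = blocking_factor(1024, 9 + 6)
bi = blocks_needed(b, bfri)
index_search = binary_search_accesses(bi) + 1  # +1 to read the data block

print(bfr, b, data_search)
print(bfri, bi, index_search)
```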
