
3 : Database Indexing and Tuning

IT3306 – Data Management


Level II - Semester 3

© e-Learning Centre, UCSC


Overview

• This lesson on Database Indexing and Tuning discusses the
gradual development of the computer memory hierarchy and, in
turn, the evolution of indexing methods.

• Here we look into the types of computer memory and the types of
indexes in detail.

© e-Learning Centre, UCSC 2


Intended Learning Outcomes

• At the end of this lesson, you will be able to:


• Describe how the different computer memories evolved.
• Identify the different types of computer memory and the storage
organizations of databases
• Recognize the importance of implementing indexes on databases
• Explain the key concepts of the different types of database indexes.

© e-Learning Centre, UCSC 3


List of subtopics

3.1 Disk Storage and Basic File Structures


3.1.1 Computer memory hierarchy
3.1.2 Storage organization of databases
3.1.3 Secondary storage mediums
3.1.4 Solid State Device Storage
3.1.5 Placing file records on disk (types of records)
3.1.6 File Operations
3.1.7 Files of unordered records (Heap Files) and ordered
records (Sorted Files)
3.1.8 Hashing techniques for storing database records: Internal
hashing, external hashing

© e-Learning Centre, UCSC 4


List of subtopics

3.2 Introduction to indexing


Introduce index files, indexing fields, index entry (record pointers
and block pointers)
3.3 Types of Indexes

3.3.1 Single Level Indexes: Primary, Clustering and Secondary


indexes
3.3.2 Multilevel indexes: Overview of multilevel indexes
3.4 Indexes on Multiple Keys
3.5 Other types of Indexes
Hash indexes, bitmap indexes, function based indexes
3.6 Index Creation and Tuning
3.7 Physical Database Design in Relational Databases

© e-Learning Centre, UCSC 5


3.1 Disk Storage and Basic File Structures
3.1.1. Computer Memory Hierarchy

Computer Memory
• Directly accessible to the CPU: Primary Storage
• Not directly accessible to the CPU: Secondary Storage and
Tertiary Storage

© e-Learning Centre, UCSC 6


3.1 Disk Storage and Basic File Structures
3.1.1. Computer Memory Hierarchy

• The data in a database must be stored on a physical storage
medium.
• Once the data is stored, the database management software can
retrieve, update and process it.
• In current computer systems, data is stored and moved across a
hierarchy of storage media.
• In this hierarchy, the memory with the highest speed is the
most expensive option and has the lowest capacity.
• The slowest memory offers the largest available storage
capacity at the lowest cost.

© e-Learning Centre, UCSC 7


3.1 Disk Storage and Basic File Structures

3.1.1. Computer Memory Hierarchy


Now let’s explain the hierarchy given in the previous slide
(slide number 7).

1. Primary Storage
This category can be operated on directly by the computer’s
Central Processing Unit (CPU).
Eg: Main Memory, Cache Memory.
• Provides fast access to data.
• Limited storage capacity.
• Contents of primary storage are lost when the computer shuts
down or in case of a power failure.
• Comparatively more expensive.

© e-Learning Centre, UCSC 8


3.1 Disk Storage and Basic File Structures

3.1.1. Computer Memory Hierarchy

Primary Storage - Static RAM

• Static Random Access Memory (SRAM) is memory that retains its
data as long as power is provided.
• The cache memory in the CPU is implemented as static RAM.
• Data is kept as bits in its memory cells.
• It is the most expensive type of memory.
• Using techniques like prefetching and pipelining, the cache
memory speeds up the execution of program instructions by the
CPU.

© e-Learning Centre, UCSC 9


3.1 Disk Storage and Basic File Structures

3.1.1. Computer Memory Hierarchy

Primary Storage - Dynamic RAM

• Dynamic Random Access Memory (DRAM) is the


CPU's space for storing application instructions and
data.
• Main memory of the computer is identified as the
DRAM.
• The advantage of DRAM is its low cost.
• Compared with static RAM, it is slower.

1
© e-Learning Centre, UCSC
0
3.1 Disk Storage and Basic File Structures
3.1.1. Computer Memory Hierarchy

2. Secondary Storage
Operates external to the computer’s main memory.
Eg: Magnetic Disks, Flash Drives, CD-ROM
• The CPU cannot process data in secondary storage
directly. It must first be copied into primary storage
before the CPU can handle it.
• Mostly used for online storage of enterprise
databases.
• With regards to enterprise databases, the magnetic
disks have been used as the main storage medium.
• Recently there is a trend to use flash memory for the
purpose of storing moderate amounts of permanent
data.
• Solid State Drive (SSD) is a form of memory that can
be used instead of a disk drive.
1
© e-Learning Centre, UCSC
1
3.1 Disk Storage and Basic File Structures

3.1.1. Computer Memory Hierarchy

2. Secondary Storage
• Less expensive than primary storage.
• The storage capacity is measured in:
- Kilobytes (kB)
- Megabytes (MB)
- Gigabytes (GB)
- Terabytes (TB)
- Petabytes (PB)

1
© e-Learning Centre, UCSC
2
3.1 Disk Storage and Basic File Structures

3.1.1. Computer Memory Hierarchy

3. Tertiary Storage
Operates external to the computer’s main memory.
Eg: CD - ROMs, DVDs
• The CPU cannot process data in tertiary storage
directly. It must first be copied into primary storage
before the CPU can handle it.
• Removable media that can be used as offline storage
falls in this category.
• Large capacity to store data.
• Comparatively less cost.
• Slower access to data than primary storage media.

1
© e-Learning Centre, UCSC
3
3.1 Disk Storage and Basic File Structures

3.1.1. Computer Memory Hierarchy

4. Flash Memory
• A popular type of memory because of its non-volatility.
• Uses EEPROM (Electrically Erasable Programmable Read-Only
Memory) technology.
• High performance memory.
• Fast access.
• One disadvantage is that an entire block must be erased and
written over at once.
• Two Types:
- NAND Flash Memory
- NOR Flash Memory
• Common examples:
- Devices in Cameras, MP3/MP4 Players,
Cellphones, USB Flash Drives
1
© e-Learning Centre, UCSC
4
3.1 Disk Storage and Basic File Structures

3.1.1. Computer Memory Hierarchy

5. Optical Drives
• Most popular type of Optical Drives are CDs and DVDs.
• The capacity of a CD is about 700 MB, and DVDs have capacities
ranging from 4.5 to 15 GB.
• CD-ROMs are read using laser technology and cannot be
overwritten.
• CD-R (compact disk recordable) and DVD-R: allow data to be
written once and then read as many times as required.
• Currently this type of storage is declining in use due to the
popularity of magnetic disks.

1
© e-Learning Centre, UCSC
5
3.1 Disk Storage and Basic File Structures

3.1.1. Computer Memory Hierarchy

6. Magnetic Tapes
• Used for archiving and as a backup storage of data.
• Note that Magnetic Disks (400 GB–8TB) and Magnetic
Tapes (2.5TB–8.5TB) are two different storage types.

1
© e-Learning Centre, UCSC
6
Activity

Categorize the following devices as Primary, Secondary or


Tertiary Storage Media.
1. Random Access Memory
2. Hard Disk Drive
3. Flash Drive
4. Tape Libraries
5. Optical Jukebox
6. Magnetic Tape
7. Main Memory

1
© e-Learning Centre, UCSC
7
3.1 Disk Storage and Basic File Structures
3.1.2. Storage Organization of Databases
• Databases usually contain Persistent data: large volumes of
data stored over long periods of time.
• This persistent data is continuously retrieved and processed
during the storage period.
• Databases are stored permanently on secondary storage.
• Magnetic disks are widely used here since:
- If the database is too large, it will not fit in the
main memory.
- Secondary storage is non-volatile, but the main memory
is volatile.
- The cost of storage per unit of data is less on
secondary storage.
1
© e-Learning Centre, UCSC
8
3.1 Disk Storage and Basic File Structures

3.1.2. Storage Organization of Databases


• Solid State Drive (SSD) is one of the latest
technologies identified as an alternative for
magnetic storage disks.
• However, it is expected that the primary option for
the storage of large databases will continue to be
the magnetic disks.
• Magnetic tapes are also used for database backup purposes due
to their comparatively lower cost.
• However, the data on tape must be loaded and read before
processing. In contrast, magnetic disks can be accessed
directly at any time.

1
© e-Learning Centre, UCSC
9
3.1 Disk Storage and Basic File Structures

3.1.2. Storage Organization of Databases

• Physical Database Design is a process that entails


selecting the techniques that best suit the
application requirements from a variety of data
organizing approaches.
• When designing, implementing, and operating a
database on a certain DBMS, database designers
and DBAs must be aware of the benefits and
drawbacks of each storage medium.

2
© e-Learning Centre, UCSC
0
3.1 Disk Storage and Basic File Structures

3.1.2. Storage Organization of Databases

• The data on disk is grouped into Records or Files.


• These records include data about entities, attributes
and relationships.
• Whenever a certain portion of the data is retrieved from the
DB for processing, it needs to be found on disk, copied to main
memory for processing, and then rewritten to the disk if the
data gets updated.
• Therefore, the data should be kept on disk in a way
that allows them to be quickly accessed when they are
needed.

2
© e-Learning Centre, UCSC
1
3.1 Disk Storage and Basic File Structures
3.1.2. Storage Organization of Databases

• Primary File Organization defines how the data is


stored physically in the disk and how they can be
accessed.

File Organization – Description
• Heap File: No particular order in storing data; new records
are appended to the end.
• Sorted File: Maintains an order for the records by sorting
data on a particular field.
• Hashed File: Uses the hash function of a field to identify the
record’s place in the database.
• B-Trees: Use tree structures for record storage.
2
© e-Learning Centre, UCSC
2
Activity

State whether the following statements are true or false.


1. The place of permanently storing databases is the
primary storage.
2. A Heap File has a specific ordering criterion where the
new records are added at the end.
3. Upon retrieval of data from a file, it needs to be found
on disk and copied to main memory for processing.
4. The database administrators need to be aware of the
physical structuring of the database to identify whether
they can be sold to a client.
5. Solid State Drives are identified alternatives for
magnetic disks.

2
© e-Learning Centre, UCSC
3
3.1 Disk Storage and Basic File Structures
3.1.3. Secondary Storage Media

• The device that holds the magnetic disks is the Hard


Disk Drive (HDD).
• Basic unit of data on a HDD is the Bit. Bits together
make Bytes. One character is stored using a single
byte.
• Capacity of a disk is the number of bytes the disk
can store.
• Disks are composed of magnetic material in the
shape of a thin round disk, with a plastic or acrylic
cover to protect it.
• A Single-Sided Disk stores information on only one of its
surfaces.
• A Double-Sided Disk stores information on both of its
surfaces.
• A few disks assembled together make a Disk Pack, which has a
higher storage capacity.
2
© e-Learning Centre, UCSC
4
3.1 Disk Storage and Basic File Structures

3.1.3. Secondary Storage Media

• On a disk surface, information is stored in


concentric circles of small width, each with its own
diameter.
• Each of these circles is called a Track.
• A Cylinder is a group of tracks on different surfaces
of a disk pack that have the same diameter.
• Retrieval of data stored on the same Cylinder is
faster compared to data stored in different
Cylinders.
• A track is broken into smaller Blocks or Sectors
since it typically includes a vast amount of data.

2
© e-Learning Centre, UCSC
5
3.1 Disk Storage and Basic File Structures
3.1.3. Secondary Storage Media

Hardware components
on disk:

a) A single-sided disk
with read/write
hardware.

b) A disk pack with read/write heads.

2
© e-Learning Centre, UCSC
6
3.1 Disk Storage and Basic File Structures

3.1.3. Secondary Storage Media

Different sector organizations


on disk:

(a) Sectors subtending a fixed


angle

(b) Sectors maintaining a


uniform recording density

2
© e-Learning Centre, UCSC
7
3.1 Disk Storage and Basic File Structures

3.1.3. Secondary Storage Media


• During disk formatting, the operating system divides a track
into equal-sized Disk Blocks (or pages). The size of each block
is fixed and cannot be adjusted dynamically.
• Blocks are separated by Interblock Gaps, which are fixed in
size and contain specially coded control information recorded
during disk formatting.
• The Hardware Address of a Block is the combination of a
cylinder number, track number (surface number inside the
cylinder on which the track is placed), and block number
(within the track).
• A Buffer is a reserved region in primary storage that holds
one disk block.
• Read Command - the disk block is copied into the buffer.
• Write Command - the contents of the buffer are copied into
the disk block.
2
© e-Learning Centre, UCSC
8
3.1 Disk Storage and Basic File Structures

3.1.3. Secondary Storage Media


• A group of several contiguous blocks is called a Cluster.
• The hardware mechanism that reads or writes a block of data
is the Read/Write Head.
• A read/write head consists of an electronic component coupled
to a mechanical arm.
• Fixed-Head Disks - The read/write heads on the disk unit are
fixed, with as many heads as there are tracks.
• Movable-Head Disks - Disk units with an actuator connected to
a second electrical motor that moves the read/write heads
together and accurately positions them over the cylinder of
tracks specified in a block address.

2
© e-Learning Centre, UCSC
9
Activity

Match the description with the relevant technical term out


of the following.
[Capacity, Track, Buffer, Hardware Address of a Block,
Cluster]

1. The concentric circles on a disk where information is


stored.
2. Combination of a cylinder number, track number, and
block number
3. Number of Bytes that a disk can store
4. Collection of shared blocks
5. A disk block stored in a reserved location in primary
storage.

3
© e-Learning Centre, UCSC
0
3.1 Disk Storage and Basic File Structures

3.1.4. Solid State Device Storage

• Solid State Device (SSD) Storage is sometimes


known as Flash Storage.
• They have the ability to store data on secondary
storage without requiring constant power.
• A controller and a group of interconnected flash
memory cards are the essential components of an
SSD.
• SSDs can be plugged into slots already available for
mounting Hard Disk Drives (HDDs) on laptops and
servers by using form factors compatible with HDDs.
• SSDs are more durable, run silently, are faster in terms of
access time, and deliver better transfer rates than HDDs
because they have no moving parts.

3
© e-Learning Centre, UCSC
1
3.1 Disk Storage and Basic File Structures

3.1.4. Solid State Device Storage

• As opposed to HDD, where Blocks and Cylinders


should be pre-assigned for storing data, any
address on an SSD can be directly addressed,
since there are no restrictions on where data can be
stored.
• With this direct access, data is less likely to be fragmented,
and there is no need for reorganization.
• Dynamic Random Access Memory (DRAM)-based SSDs are also
available in addition to flash-based SSDs.
• DRAM-based SSDs are more expensive than flash memory, but they
provide faster access. However, they need an internal power
supply to operate.
3
© e-Learning Centre, UCSC
2
Activity

State four key features of a Solid State Drive (SSD).


1.____________________
2.____________________
3.____________________
4.____________________

3
© e-Learning Centre, UCSC
3
3.1 Disk Storage and Basic File Structures

3.1.5. Placing File Records on Disk


Explanation can be found on the next slide.

Employee Relation:

Emp_No | Name    | Date_Of_Birth | Position   | Salary
0001   | Nimal   | 1971-04-13    | Manager    | 70,000
0005   | Krishna | 1980-01-25    | Supervisor | 50,000

(Each column is a Field, each row is a Record, and each cell
holds a Value.)
© e-Learning Centre, UCSC
3.1 Disk Storage and Basic File Structures
3.1.5. Placing File Records on Disk

• As shown in the previous slide, the columns of the table are
called fields, the rows are called records, and each cell holds
a data value.
• The Data Type is one of the standard data types that
are used in programming.
- Numeric (Integer, Long Integer, Floating Point)
- Characters / Strings (Fixed length, varying
length)
- Boolean (True or False and 0 or 1)
- Date, Time
• For a particular computer system, the number of bytes
necessary for each data type is fixed.

3
© e-Learning Centre, UCSC
5
3.1 Disk Storage and Basic File Structures

3.1.5. Placing File Records on Disk

CREATE TABLE Employee
(
  Emp_No        INT,
  Name          CHAR(50),
  Date_Of_Birth DATE,
  Position      CHAR(50),
  Salary        INT
);

An example for the creation of the Employee relation using MySQL
with data types.

3
© e-Learning Centre, UCSC
6
Activity

Select the Data Type that best matches the description out of
the following.
[Integer, Floating Point, Date and Time, Boolean, Character]

1. NIC Number of Sri Lankans


2. The access time of users for the Ministry of Health website
within a week
3. The number of students in a class
4. Cash balance of a bank account
5. Response to the question by a set of students whether
they have received the vaccination for Rubella.

3
© e-Learning Centre, UCSC
7
3.1 Disk Storage and Basic File Structures

3.1.5. Placing File Records on Disk

• File is a sequence of records. Usually all records in a


file belong to the same record type.
• If the size of each record in the file is the same (in
bytes) the file is known to be made up of Fixed
Length Records.
• Variable Length Records means that different
records of the file are of different sizes.
• Reasons to have variable length records in a file:
- One or more fields are of different sizes.
- For individual records, one or more of the fields
may have multiple values (Repeating Group/
Field)
- One or more fields are optional (Optional Fields)
- File includes different record types (Mixed File)
3
© e-Learning Centre, UCSC
8
3.1 Disk Storage and Basic File Structures

3.1.5. Placing File Records on Disk

• In a Fixed Length Record;


• The system can identify the starting byte location
of each field relative to the starting position of the
record since each record has equal fields and
field lengths. This makes it easier for programs
that access such files to locate field values.
• However, variable length records can also be stored as fixed
length records:
• By assigning “Null” to optional fields where data values
are not available.
• By allocating space for the maximum possible number of
values for each repeating field.
• In each of these cases, space is wasted.

3
© e-Learning Centre, UCSC
9
3.1 Disk Storage and Basic File Structures

3.1.5. Placing File Records on Disk

• In a Variable Length Record;


• Each field in each record contains a value, but the exact
length of some field values is not known in advance.
• Special separator characters can be used to terminate a
variable-length field; alternatively, the number of bytes in
the field value can be stored before the value itself.
• Separator characters that can be used include: ?, $, %
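
As a rough illustration of the idea above, here is a minimal
Python sketch (not part of the slides) that encodes and decodes a
variable-length record, assuming '$' separates field values and
'%' terminates the record; the separator choices and the field
values are hypothetical.

# Variable-length record with separator characters (assumed: '$' and '%').
FIELD_SEP = "$"
RECORD_END = "%"

def encode_record(values):
    # Join the field values with the field separator and terminate the record.
    return FIELD_SEP.join(values) + RECORD_END

def decode_record(raw):
    # Strip the record terminator and split on the field separator.
    return raw.rstrip(RECORD_END).split(FIELD_SEP)

record = encode_record(["199812345V", "Silva, Nimal", "Accounts"])
print(record)                 # 199812345V$Silva, Nimal$Accounts%
print(decode_record(record))  # ['199812345V', 'Silva, Nimal', 'Accounts']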

4
© e-Learning Centre, UCSC
0
3.1 Disk Storage and Basic File Structures

3.1.5. Placing File Records on Disk

Record Storage Format 1


Eg: A fixed-length record with four fields and size of 44 bytes.

4
© e-Learning Centre, UCSC
1
3.1 Disk Storage and Basic File Structures

3.1.5. Placing File Records on Disk

Record Storage Format 2


Eg: A record with two variable-length fields (Name and
Department) and two fixed-length fields (NIC and Job_Code ).
Separator Character is used to mark the record separation.

4
© e-Learning Centre, UCSC
2
3.1 Disk Storage and Basic File Structures

3.1.5. Placing File Records on Disk

Record Storage Format 3


Eg: A variable-field record with three types of separator
characters

4
© e-Learning Centre, UCSC
3
3.1 Disk Storage and Basic File Structures

3.1.5. Placing File Records on Disk


• In a record with Optional fields;
• A series of <Field-Name, Field-Value> pairs can
be added in each record instead of the field values
if the overall number of fields for the record type is
high but the number of fields that actually occur in
a typical record is low.
• It will be more practical to store a Field Type
code, to each field and include in each record a
series of <Field-Type, Field-Value>.
• In a record with a Repeating Field;
• One separator character can be used to separate
the field's repeated values and another separator
character can be used to mark the field's end.

4
© e-Learning Centre, UCSC
4
Activity
Fill in the blanks in the following statements.
1. A file where the sizes of records in it are different in size
is called a _______________.

2. A _________________ includes different types of


records inside it.

3. In a file, the records belong to _________ record type.

4. A ___________ length record can be made by assigning


“Null” for optional fields where data values are not
available

5. To determine and terminate variable lengths special


characters named as __________ can be used.

© e-Learning Centre, UCSC 45
3.1 Disk Storage and Basic File Structures
3.1.5. Placing File Records on Disk
• Block is a unit of data transfer between disk and
memory.
• When the block size exceeds the record size, each
block will contain several records, however, certain
files may have exceptionally large records that cannot
fit in a single block.
• Blocking Factor (bfr) is the number of records stored per
block.
• If Block Size > Record Size,
bfr can be calculated using the equation below.

bfr = floor(B / R)
where Block Size = B bytes
and Record Size = R bytes
© e-Learning Centre, UCSC 46
3.1 Disk Storage and Basic File Structures

3.1.5. Placing File Records on Disk


• In calculating the bfr, a floor function rounds the value down
to the nearest integer.
• Since the records do not, in general, fill the block exactly,
some space may remain unused in each block.
• The unused space can be calculated with the equation given
below.

Unused space (in bytes) = B − (bfr * R)
4
© e-Learning Centre, UCSC
7
3.1 Disk Storage and Basic File Structures

3.1.5. Placing File Records on Disk

• To use this unused space and minimize waste, part of a record
can be stored in one block and the rest can be stored in
another block.
• A Pointer at the end of the first block points to the block
containing the remainder of the record, in case it is not the
next consecutive block on disk.
• Spanned Organization of Records - One record
spanning to more than one block.
• Used when a record is larger than the block size.
• Unspanned Organization of Records - Not allowing
records to span into more than one block.
• Used with fixed length records.

4
© e-Learning Centre, UCSC
8
3.1 Disk Storage and Basic File Structures

3.1.5. Placing File Records on Disk

Let’s look at the representation of Spanned and Unspanned


Organization of Records.

4
© e-Learning Centre, UCSC
9
3.1 Disk Storage and Basic File Structures

3.1.5. Placing File Records on Disk

• A spanned or unspanned organization can be utilized


in variable-length records.
• If it is a spanned organization, each block may store a
different number of records.
• Here the bfr would be the average number of records
per block.
• Hence the number of blocks b needed for a file of r
records is,

b = ceiling(r / bfr)

5
© e-Learning Centre, UCSC
0
3.1 Disk Storage and Basic File Structures

3.1.5. Placing File Records on Disk

Example of Calculation
There is a disk with block size B=256 bytes. A file has
r=50,000 STUDENT records of fixed-length. Each
record has the following fields:
NAME (55 bytes), STDID (4 bytes),
DEGREE(2 bytes), PHONE(10 bytes),
SEX (1 byte).

(i) Calculate the record size in Bytes.

Record Size R = (55 + 4 + 2 + 10 + 1) = 72 bytes


5
© e-Learning Centre, UCSC
1
3.1 Disk Storage and Basic File Structures

3.1.5. Placing File Records on Disk

Example of Calculation Continued…

(ii) Calculate the blocking factor (bfr)

Blocking factor bfr = floor (B/R)


= floor(256/72)
= 3 records per block

Floor Function = Rounds the value


down to the previous integer.
5
© e-Learning Centre, UCSC
2
3.1 Disk Storage and Basic File Structures

3.1.5. Placing File Records on Disk


Example of Calculation Continued...
(iii) Calculate the number of file blocks (b) required to
store the STUDENT records, assuming an unspanned
organization.

Number of blocks needed for file = ceiling(r/bfr)


= ceiling(50000/3)
= 16667

Ceiling Function = Rounds the


value up to the next integer.
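
The same arithmetic can be expressed as a short Python sketch;
the variable names are chosen here for illustration and are not
from the slides.

import math

# Reproducing the STUDENT example: B = 256-byte blocks,
# r = 50,000 fixed-length records, unspanned organization.
B = 256
R = 55 + 4 + 2 + 10 + 1      # NAME + STDID + DEGREE + PHONE + SEX = 72 bytes
r = 50_000

bfr = B // R                  # floor(B / R) = 3 records per block
unused = B - bfr * R          # 256 - 3 * 72 = 40 bytes unused per block
b = math.ceil(r / bfr)        # ceiling(50,000 / 3) = 16,667 blocks

print(R, bfr, unused, b)      # 72 3 40 16667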

5
© e-Learning Centre, UCSC
3
3.1 Disk Storage and Basic File Structures
3.1.5. Placing File Records on Disk

• A File Header, also known as a File Descriptor,


includes information about a file that is required by
the system applications which access the file
records.
• The header contains information used to determine the disk
addresses of the file blocks, as well as record format
descriptions. These may include field lengths and the order of
fields within a record for fixed-length unspanned records, and
field type codes, separator characters, and record type codes
for variable-length records.
• One or more blocks are transferred into main
memory buffers to search for a record on disk.
• The search algorithms must do a Linear Search
over the file blocks if the address of the block
containing the requested record is unknown.
5
© e-Learning Centre, UCSC
4
Activity

State the answer for the following calculations.


Consider a disk with block size B=512 bytes. A file
has r=30,000 EMPLOYEE records of fixed-length.
Each record has the following fields: NAME (30 bytes),
NIC (9bytes), DEPARTMENTCODE (9 bytes),
ADDRESS (40 bytes), PHONE (9 bytes),BIRTHDATE
(8 bytes), SEX (1 byte), JOBCODE (4 bytes), SALARY
(4 bytes, real number). An additional byte is used as a
deletion marker.

(i) Calculate the record size in Bytes.


(ii) Calculate the blocking factor (bfr)
(iii) Calculate the number of file blocks (b) required to
store the EMPLOYEE records, assuming an
unspanned organization.
5
© e-Learning Centre, UCSC
5
3.1 Disk Storage and Basic File Structures

3.1.6 File Operations

Operations on Files

• Retrieval Operations: Do not change any data in the file, but
locate certain records based on a selection / filtering
condition.
• Update Operations: Change the file by insertion, deletion or
modification of certain records based on a selection /
filtering condition.

© e-Learning Centre, UCSC


3.1 Disk Storage and Basic File Structures

3.1.6. File Operations

Emp_No | Name    | Date_Of_Birth | Position   | Salary
0001   | Nimal   | 1971-04-13    | Manager    | 70,000
0005   | Krishna | 1980-01-25    | Supervisor | 50,000

• Simple Selection Condition


Search for the record where Emp_No = “0005”
• Complex Selection Condition
Search for the record where Salary>60,000

5
© e-Learning Centre, UCSC
7
3.1 Disk Storage and Basic File Structures

3.1.6. File Operations

• When several file records meet a search criterion,


the first record in the physical sequence of file
records is identified and assigned as the Current
Record. Following search operations will start with
this record and find the next record in the file that
meets the criterion.

• The actual procedures for identifying and retrieving


file records differ from one system to the next.

5
© e-Learning Centre, UCSC
8
3.1 Disk Storage and Basic File Structures
3.1.6. File Operation

The following are the File Access Operations.

Operation Description
Open Prepares the file for reading or writing. Sets the file
pointer to the file's beginning.
Reset Sets the file pointer of an open file to the
beginning of the file.
Find (Locate) The first record that meets a search
criterion is found. The block holding that
record is transferred to a main memory
buffer. The file pointer is set to the buffer
record, which becomes the current record.

5
© e-Learning Centre, UCSC
9
3.1 Disk Storage and Basic File Structures
3.1.6. File Operations

Operation Description
Read (Get) Copies the current record from the buffer to
a user-defined program variable. The
current record pointer may also be
advanced to the next record in the file using
this command.
FindNext Searches the file for the next entry that
meets the search criteria. The block holding
that record is transferred to a main memory
buffer.
Delete The current record is deleted, and the file
on disk is updated to reflect the deletion.

6
© e-Learning Centre, UCSC
0
3.1 Disk Storage and Basic File Structures

3.1.6. File Operations

Operation Description
Modify Modifies some field values for the current
record and the file on disk is updated to
reflect the modification.

Insert Inserts a new record by locating the block where the
record is to be placed, transferring that block into a
main memory buffer, adding the record, and updating the
file on disk to reflect the insertion.
Close Releases the buffers and does any other
necessary cleaning actions to complete the
file access.
6
© e-Learning Centre, UCSC
1
3.1 Disk Storage and Basic File Structures

3.1.6. File Operations

• The following is called “Record at a time” operation


since it is applied to a single record.

Operation Description
Scan Scan returns the initial record if the file
has just been opened or reset;
otherwise, it returns the next record.

6
© e-Learning Centre, UCSC
2
3.1 Disk Storage and Basic File Structures

3.1.6. File Operations

• The following are called “Set at a time” operations


since they are applied to the file in full.

Operation Description
FindAll Locates all the records in the file that
satisfy a search condition.
FindOrdered Locates all the records in the file in a
specified order condition.
Reorganize Starts the reorganization process. (In
cases such as ordering the records)

6
© e-Learning Centre, UCSC
3
3.1 Disk Storage and Basic File Structures

3.1.6. File Operations

• File Organization - The way a file's data is


organized into records, blocks, and access
structures, including how records and blocks are
put on the storage media and interconnected.

• Access Methods - An access method provides a set of operations
that may be applied to a file. In general, a file structured
using a specific organization can be accessed via a variety of
access methods.

6
© e-Learning Centre, UCSC
4
3.1 Disk Storage and Basic File Structures

3.1.6. File Operations

• Static Files - The files on which modifications are


rarely done.

• Dynamic Files - The files on which modifications


are frequently done.

• Read Only File - A file where modifications cannot


be done by the end user.

6
© e-Learning Centre, UCSC
5
Activity

Match the following descriptions with the relevant file


operation out of the following.

[Find, Reset, Close, Scan, FindAll]

1. Returns the initial record if the file has just been


opened or reset; otherwise, returns the next record.
2. Releases the buffers and does any other necessary
cleaning actions
3. Sets the file pointer of an open file to the beginning of
the file
4. The first record that meets a search criterion is found
5. Locates all the records in the file that satisfy a search
condition.
6
© e-Learning Centre, UCSC
6
3.1 Disk Storage and Basic File Structures

3.1.7. Files of unordered records (Heap Files) and


ordered records (Sorted Files)

Files of Unordered Records (Heap Files)

• Records are entered into the file in the order in


which they are received, thus new records are
placed at the end.
• Inserting a new record is quick and efficient. The last disk
block of the file is copied into a buffer, the new record is
added, and the block is then rewritten back to disk. The
address of the last file block is kept in the file header.
• Searching for a record is done by the Linear
Search.
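
A minimal Python sketch of a heap file, assuming each “block” is
simply a list holding up to bfr in-memory records; the HeapFile
class and field names are illustrative only.

class HeapFile:
    def __init__(self, bfr):
        self.bfr = bfr          # records per block
        self.blocks = [[]]      # start with one empty block

    def insert(self, record):
        # New records always go into the last block (appended at the end).
        if len(self.blocks[-1]) >= self.bfr:
            self.blocks.append([])
        self.blocks[-1].append(record)

    def linear_search(self, field, value):
        # Searching a heap file requires scanning it block by block.
        for block_no, block in enumerate(self.blocks):
            for record in block:
                if record.get(field) == value:
                    return block_no, record
        return None

f = HeapFile(bfr=2)
f.insert({"Emp_No": "0001", "Name": "Nimal"})
f.insert({"Emp_No": "0005", "Name": "Krishna"})
print(f.linear_search("Emp_No", "0005"))   # (0, {'Emp_No': '0005', ...})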
6
© e-Learning Centre, UCSC
7
3.1 Disk Storage and Basic File Structures

3.1.7. Files of unordered records (Heap Files) and


ordered records (Sorted Files)

Files of Unordered Records (Heap Files)

• If just one record meets the search criteria, the


program will typically read into memory and search
half of the file blocks before finding the record. Here,
on average, searching (b/2) blocks for a file of b
blocks is required.
• If the search condition is not satisfied by any record, or if
multiple records satisfy it, the program must read and search
all b blocks in the file.

6
© e-Learning Centre, UCSC
8
3.1 Disk Storage and Basic File Structures

3.1.7. Files of unordered records (Heap Files) and


ordered records (Sorted Files)

Files of Unordered Records (Heap Files)

• Deleting a Record.
• A program must first locate its block, copy the block
into a buffer, remove the record from the buffer, and
then rewrite the block back to the disk to delete a
record.
• This method of deleting a large number of data
results in waste of storage space.

6
© e-Learning Centre, UCSC
9
3.1 Disk Storage and Basic File Structures

3.1.7. Files of unordered records (Heap Files) and


ordered records (Sorted Files)

Files of Unordered Records (Heap Files)

• Deleting a Record cont.


• Deletion Marker - An extra byte or bit stored with every
record. The marker is set to a particular value when the
record is deleted; a different value indicates a valid
(not deleted) record.
• Reusing the space of deleted records for new insertions is
also possible, but it requires additional bookkeeping.

7
© e-Learning Centre, UCSC
0
3.1 Disk Storage and Basic File Structures

3.1.7. Files of unordered records (Heap Files) and


ordered records (Sorted Files)

Files of Unordered Records (Heap Files)

• Modifying a Record.
• Because the updated record may not fit in its
former space on disk, modifying a variable-length
record may require removing the old record and
inserting the modified record.

7
© e-Learning Centre, UCSC
1
3.1 Disk Storage and Basic File Structures

3.1.7. Files of unordered records (Heap Files) and


ordered records (Sorted Files)

Files of Unordered Records (Heap Files)

• Reading a Record.
• To read all records in order of the values of some field, a
sorted copy of the file must be created. Because sorting a
large disk file is a costly operation, special external
sorting techniques are employed.

7
© e-Learning Centre, UCSC
2
3.1 Disk Storage and Basic File Structures

3.1.7. Files of unordered records (Heap Files) and


ordered records (Sorted Files)

Files of Ordered Records (Sorted Files)

• The values of one of the fields of a file's records,


called the Ordering Field can be used to physically
order the data on disk. It will generate an ordered or
sequential file.
• Ordered records offer a few benefits over files that are
unordered.
• The benefits are listed in the next slide.

7
© e-Learning Centre, UCSC
3
3.1 Disk Storage and Basic File Structures

3.1.7. Files of unordered records (Heap Files) and


ordered records (Sorted Files)

Files of Ordered Records (Sorted Files)

• Benefits of Ordered records:


• Because no sorting is necessary, reading the records in order
of the ordering key values becomes highly efficient.
• Because the next record is usually in the same block as the
current one, locating the next record in order of the ordering
key typically requires no additional block accesses.
• When the binary search technique is employed, a search
condition on the value of the ordering key field results in
faster access.
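
A minimal Python sketch of this idea, assuming the ordered file is
a list of blocks sorted on a hypothetical Emp_No ordering key; a
binary search over the block anchors locates the single candidate
block.

import bisect

# Ordered file as a list of blocks, each sorted on the ordering key Emp_No.
blocks = [
    [{"Emp_No": 1, "Name": "Aaron"}, {"Emp_No": 3, "Name": "Adams"}],
    [{"Emp_No": 5, "Name": "Alex"},  {"Emp_No": 8, "Name": "Banda"}],
    [{"Emp_No": 9, "Name": "Chen"},  {"Emp_No": 12, "Name": "Dias"}],
]
first_keys = [b[0]["Emp_No"] for b in blocks]   # anchor key of each block

def search_ordered(key):
    # Binary search on the anchors; only one block then needs to be scanned.
    i = bisect.bisect_right(first_keys, key) - 1
    if i < 0:
        return None
    for record in blocks[i]:
        if record["Emp_No"] == key:
            return record
    return None

print(search_ordered(8))   # {'Emp_No': 8, 'Name': 'Banda'}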
7
© e-Learning Centre, UCSC
4
3.1 Disk Storage and Basic File Structures

3.1.8. Hashing techniques for storing database


records: Internal hashing, external hashing

• Another kind of primary file structure is Hashing,


which allows for extremely quick access to
information under specific search conditions.
• The search condition must be an equality condition on a single
field, termed the Hash Field.
• The hash field is usually also a key field of the file, in
which case it is referred to as the Hash Key.
• The concept behind hashing is to offer a function h,
also known as a Hash Function or randomizing
function, that is applied to a record's hash field
value and returns the address of the disk block
where the record is stored.
7
© e-Learning Centre, UCSC
5
Activity
Fill in the blanks with the correct technical term.

1. The _____________________ is an extra byte that is


stored with a record which will get updated when a record
is deleted.

2. A field which can generate an ordered or sequential file by


physically ordering the records is called ______________.

3. The function which calculates the Hash value of a field is


called ______________.

4. Searching for a record in a Heap file is done by the


____________.
7
© e-Learning Centre, UCSC
6
3.1 Disk Storage and Basic File Structures
3.1.8. Hashing techniques for storing database records:
Internal hashing, external hashing

• Internal Hashing.
When it comes to internal files, hashing is usually
done with a Hash Table and an array of records.
• Method 1 for Internal Hashing
• If the array index range is 0 to m – 1, there are m slots
with addresses that correspond to the array indexes.
• Then a hash function is selected that converts the
value of the hash field into an integer between 0 and
m-1.
• The record address is then calculated using the hash
function:

h(K) = K mod m

where K is the hash field value and h(K) gives the slot address
in the range 0 to m − 1.
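
A minimal Python sketch of Method 1, assuming integer hash field
values, m = 7 slots, and input keys chosen so that no collision
occurs.

m = 7                       # number of slots in the hash table
table = [None] * m          # array of m record slots

def h(K):
    return K % m            # hash function: h(K) = K mod m

for emp_no in (10, 15, 23):
    table[h(emp_no)] = {"Emp_No": emp_no}

print(h(15), table[h(15)])  # 1 {'Emp_No': 15}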
7
© e-Learning Centre, UCSC
7
3.1 Disk Storage and Basic File Structures

3.1.8. Hashing techniques for storing database records:


Internal hashing, external hashing
• Internal Hashing.

Internal Hashing Data Structure - Array of m positions to use in


internal hashing
7
© e-Learning Centre, UCSC
8
3.1 Disk Storage and Basic File Structures

3.1.8. Hashing techniques for storing database records:


Internal hashing, external hashing
• Internal Hashing.

• Method 2 for Internal Hashing


• By using algorithms that calculate the Hash Function

temp ← 1;
for i ← 1 to 20 do temp ← temp * code(K[i ] ) mod M ;
hash_address ← temp mod M;

Hashing Algorithm in applying the mod hash function to a


character string K.
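
A possible Python rendering of this pseudocode, assuming code() is
the character's ordinal value and the key is padded or truncated
to 20 characters to match the loop bound.

def hash_string(K, M):
    # Fixed 20-character key, as implied by "for i <- 1 to 20".
    K = K.ljust(20)[:20]
    temp = 1
    for ch in K:
        # code(K[i]) is taken here to be the character's ordinal value.
        temp = (temp * ord(ch)) % M
    return temp % M

print(hash_string("Silva, Nimal", 101))   # a bucket/slot number in 0..100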
7
© e-Learning Centre, UCSC
9
3.1 Disk Storage and Basic File Structures

3.1.8. Hashing techniques for storing database records:


Internal hashing, external hashing
• Internal Hashing.

• Method 3 for Internal Hashing


• Folding - To compute the hash address, an arithmetic
function such as addition or a logical function such as
Exclusive OR (XOR) is applied to distinct sections of
the hash field value.

8
© e-Learning Centre, UCSC
0
3.1 Disk Storage and Basic File Structures

3.1.8. Hashing techniques for storing database records:


Internal hashing, external hashing
• Internal Hashing.
• Collision - When the hash field value of a record that
is being inserted hashes to an address that already
holds another record.
• Because the hash address is already taken, the new
record must be moved to a different location.
• Collision Resolution - The process of finding another
location.
• There are several methods for collision resolution.

8
© e-Learning Centre, UCSC
1
3.1 Disk Storage and Basic File Structures
3.1.8. Hashing techniques for storing database
records: Internal hashing, external hashing
• Internal Hashing.
• Methods of Collision Resolution
• Open Addressing - The program scans the
subsequent locations in order until an unused
(empty) position is discovered, starting with the
occupied position indicated by the hash address.
• Chaining - The new record is placed in an unused overflow
location, and the pointer of the occupied hash address location
is changed to point to that overflow location.
• Multiple Hashing - If the first hash function fails,
the program uses a second hash function. If a new
collision occurs, the program will utilize open
addressing or a third hash function, followed by
open addressing if required.
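
A minimal Python sketch of open addressing, assuming a table of
m = 7 slots and h(K) = K mod m; the key values are illustrative.

m = 7
table = [None] * m

def insert_open_addressing(key, record):
    pos = key % m                       # initial hash address
    for step in range(m):
        slot = (pos + step) % m         # probe the subsequent locations in order
        if table[slot] is None:         # first unused (empty) position found
            table[slot] = (key, record)
            return slot
    raise RuntimeError("hash table is full")

print(insert_open_addressing(10, "A"))  # 3
print(insert_open_addressing(24, "B"))  # collides at slot 3, stored in slot 4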

8
© e-Learning Centre, UCSC
2
3.1 Disk Storage and Basic File Structures
3.1.8. Hashing techniques for storing database
records: Internal hashing, external hashing

• External Hashing.
• Hashing for disk files is named as External
Hashing.
• The target address space is built up of Buckets,
each of which stores many records, to match the
properties of disk storage.
• A bucket is a contiguous group of disk blocks or a single disk
block.
• Rather than allocating an absolute block address to
the bucket, the hashing function translates a key to a
relative bucket number.
• The bucket number is converted into the matching
disk block address via a table in the file header.
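
A minimal Python sketch of this mapping, assuming M = 4 buckets
and made-up block addresses in the header table.

M = 4                                    # number of buckets
header_table = {0: "block@2000", 1: "block@2400",
                2: "block@2800", 3: "block@3200"}
buckets = {i: [] for i in range(M)}      # each bucket can hold many records

def insert(key, record):
    bucket_no = key % M                  # key -> relative bucket number
    buckets[bucket_no].append(record)
    # The header table converts the bucket number to a disk block address.
    return bucket_no, header_table[bucket_no]

print(insert(4321, {"Emp_No": 4321}))    # (1, 'block@2400')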

8
© e-Learning Centre, UCSC
3
3.1 Disk Storage and Basic File Structures
3.1.8. Hashing techniques for storing database records:
Internal hashing, external hashing
• External Hashing.
The following diagram shows matching bucket
numbers (0 to M -1) to disk block addresses.

8
© e-Learning Centre, UCSC
4
3.1 Disk Storage and Basic File Structures
3.1.8. Hashing techniques for storing database
records: Internal hashing, external hashing

• External Hashing.
• Since many records fit in a bucket, several records can hash
to the same bucket without causing problems, so the collision
problem is less severe with buckets.
• When a bucket is full to capacity and a new record is
entered, a variant of chaining can be used in which a
pointer to a linked list of overflow records for the
bucket is stored in each bucket.
• Here, the linked list pointers should be Record
Pointers, which comprise a block address as well
as a relative record position inside the block.

8
© e-Learning Centre, UCSC
5
3.1 Disk Storage and Basic File Structures
3.1.8. Hashing techniques for storing database records:
Internal hashing, external hashing
• External Hashing.
Handling overflow for buckets by chaining

8
© e-Learning Centre, UCSC
6
Activity
Match the description with the correct term.

1. Applying an arithmetic function such as addition or a


logical function such as Exclusive OR (XOR) to distinct
sections of the hash field value.
2. The technique used for hashing where the program uses
a second hash function if first hash function fails.
3. The instance when the value of the hash field of a newly
inserted record hashes to an address that already
contains another record.
4. Starting with the occupied place given by the hash
address, the program examines the succeeding locations
in succession until an unused (empty) spot is located
when a collision has occurred.
5. A continuous group of disk blocks or a single disk block
which is comprising of the target address space.

8
© e-Learning Centre, UCSC
7
3.2 Introduction to indexing

• Indexes are used to speed up record retrieval in


response to specific search criteria.
• The index structures are extra files on disk that provide
secondary access pathways, allowing users to access
records in different ways without changing the physical
location of records in the original data file on disk.
• They make it possible to quickly access records using
the Indexing Fields that were used to create the index.
• Any field in the file can be used to generate an index,
and the same file can have numerous indexes on
separate fields as well as indexes on multiple fields.

8
© e-Learning Centre, UCSC
8
3.2 Introduction to indexing

• Some Commonly used Types of Indexes


• Single Level Ordered Indexes
• Primary Index
• Secondary Index
• Clustering Index
• Multi Level Tree Structured Indexes
• B Trees
• B+ Trees
• Hash Indexes
• Logical Indexes
• Multi Key Indexes
• Bitmap Indexes

8
© e-Learning Centre, UCSC
9
3.3 Types of Indexes

• Single Level Indexes: Primary, Clustering and


Secondary indexes
• Primary, Clustering and Secondary index are types
of single level ordered indexes.
• In some books, the last pages contain an ordered list of
words, categorized from A to Z. Each entry gives the word
together with the page numbers where that word appears. This
list of words is known as an index.
• If readers need to find a particular term, they can go to the
index, find the pages where the term appears, and then go
through those particular pages.
• Otherwise readers have to go through the whole book searching
for the term, which is similar to a linear search.
9
© e-Learning Centre, UCSC
0
3.3 Types of Indexes

• Single Level Indexes: Primary, Clustering and


Secondary indexes
• Primary Index - defined for an ordered file of
records using the ordering key field.
• File records on a disk are physically ordered by the
ordering key field. This ordering key field holds
unique values for each record.
• A clustering index is used when multiple records in the file
have the same value for the ordering field; here the ordering
field is a non-key field. In this scenario, the data file is
referred to as a clustered file.

9
© e-Learning Centre, UCSC
1
3.3 Types of Indexes

• Single Level Indexes: Primary, Clustering and


Secondary indexes
• A file can have at most one physical ordering field.
Therefore, a file can have either one primary index or one
clustering index, but not both at once.
• Unlike primary indexes, a file can have several secondary
indexes in addition to the primary index.

9
© e-Learning Centre, UCSC
2
3.3 Types of Indexes

• Single Level Indexes: Primary indexes


• Primary indexes are access structures that are used to
increase the efficiency of searching and accessing the data
records in a data file.
• A primary index is an ordered file with two fields and
fixed-length records.
• The first field is of the same data type as the ordering key
field (the primary key) of the data file.
• The other field contains pointers to disk blocks.
• Hence, the index file contains one index entry (a.k.a. index
record) for each block in the data file.

9
© e-Learning Centre, UCSC
3
3.3 Types of Indexes

• Single Level Indexes: Primary indexes


• As mentioned before, an index entry consists of two values:
i. the primary key field value of the first record in a data
block, and
ii. a pointer to the data block that contains that record.

For index entry i, the two field values are referred to as
<K(i), P(i)>.
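
A minimal Python sketch of a sparse primary index, assuming the
data file is a list of blocks ordered on a Name key; the names
and block contents are illustrative.

import bisect

# Data file as a list of blocks, physically ordered on the key field.
data_blocks = [
    ["Aaron, Ed", "Abbot, Diane"],       # block 0
    ["Adams, John", "Akers, Jan"],       # block 1
    ["Alexander, Ed", "Alfred, Bob"],    # block 2
]

# One index entry <K(i), P(i)> per block: anchor key and block pointer.
index = [(block[0], p) for p, block in enumerate(data_blocks)]
anchor_keys = [k for k, _ in index]

def lookup(name):
    # Binary search on the anchors finds the only block that could hold the record.
    p = bisect.bisect_right(anchor_keys, name) - 1
    if p < 0:
        return None
    block_no = index[p][1]
    return block_no if name in data_blocks[block_no] else None

print(lookup("Adams, John"))   # 1  (record found in block 1)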

9
© e-Learning Centre, UCSC
4
3.3 Types of Indexes

• Single Level Indexes: Primary indexes

• Ex: Assuming that “name” is a unique field and the


“name” has been used to order the data file, we can
create index file as follows.
<K(1) = (Aaron, Ed), P(1) = address of block 1>
<K(2) = (Adams, John), P(2) = address of block 2>
<K(3) = (Alexander, Ed), P(3) = address of block 3>

The image given in the next slide illustrates the index file and
respective block pointers to the data file.

9
© e-Learning Centre, UCSC
5
3.3 Types of Indexes

Primary index on the


ordering key field

9
© e-Learning Centre, UCSC
6
3.3 Types of Indexes

• Single Level Indexes: Primary indexes

• In the given illustration of the previous slide,


number of index entries in the index file is
equal to the number of disk blocks in the data
file.
• Anchor record/Block anchor: for a given block in
an ordered data file, the first record in that block is
known as anchor record. Each block has an anchor
record.

9
© e-Learning Centre, UCSC
7
3.3 Types of Indexes

• Single Level Indexes: Primary indexes


• Dense index and Sparse index
i. An index that contains an index entry for every record in
the data file (or every search key value) is referred to as
a dense index.
ii. An index that contains index entries for only some of the
records in the data file is referred to as a sparse index.
• Therefore, by definition, the primary index falls into the
sparse (non-dense) index type, since it does not keep index
entries for every record in the data file. Instead, the
primary index keeps an index entry only for the anchor record
of each block of the data file.

9
© e-Learning Centre, UCSC
8
3.3 Types of Indexes

• Single Level Indexes: Primary indexes

• Generally, a primary index file takes less space than the data
file for two reasons.
i. The number of index entries is smaller than the number of
records in the data file.
ii. Each index entry holds only two fields, which are
comparatively short.
• Hence, performing a binary search on the index file requires
fewer block accesses than a binary search on the data file.

9
© e-Learning Centre, UCSC
9
3.3 Types of Indexes

• Single Level Indexes: Primary indexes

• A binary search on an ordered file with b blocks requires
approximately log2 b block accesses.
• Let’s assume that we want to access a record whose primary key
value is K, and that the record resides in the block whose
address is P(i), where K(i) ≤ K < K(i + 1).
• Since the physical ordering of the data file is based on the
primary key, all records whose key values fall in this range
reside in the ith block.
• Therefore, to retrieve the record corresponding to the given K
value, a binary search is performed on the index file to find
index entry i.
• Then we can get the block address from P(i) and retrieve the
record.
1
© e-Learning Centre, UCSC 0
3.3 Types of Indexes

• Single Level Indexes: Primary indexes


Ex: Let’s say we have an ordered file with its key field. File
records are of fixed size and are unspanned. Following
details are given and we are going to calculate the block
accesses require when performing a binary search,
number of records r = 300,000
block size B = 4,096 bytes
record length R = 100 bytes
We can calculate the blocking factor,
bfr = (B/R)= floor(4,096/100) = 40 records per block
Hence, the number of blocks needed to store all records
b = (r/bfr) = ceiling(300,000/40)= 7,500 blocks.
Block accesses required = log2 b
= ceiling(log2 7,500)= 13
1
© e-Learning Centre, UCSC 0
3.3 Types of Indexes
• Single Level Indexes: Primary indexes
Ex: For the previous scenario given, if we have a primary
index file with 9 bytes long ordering key field (V) and 6 bytes
long block pointer (P), the required block accesses can be
calculated as follows.
number of records r = 300,000
block size B = 4,096 bytes
index entry length Ri = (V+P)= 15
We can calculate the blocking factor for the index,
bfri = floor(B/Ri) = floor(4,096/15) = 273 index entries per block
The number of index entries required is equal to the number of
blocks required for the data file (ri = 7,500).
Hence, the number of blocks needed for the index file,
bi = ceiling(ri/bfri) = ceiling(7,500/273) = 28 blocks.
Go to the next slide for the rest of the calculation
1
© e-Learning Centre, UCSC 0
3.3 Types of Indexes

block accesses required


= log2 bi= ceiling(log2 28)= 5

However to access the record using the index, we


have to perform binary search on the index file plus
one additional access to retrieve the record.

• Therefore the formula for total number of block


access to access the record should be,
log2 bi + 1 accesses = 6 block accesses.
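
The two calculations above can be reproduced with a short Python
sketch; the variable names follow the slides.

import math

# Block accesses: binary search on the data file vs. via the primary index.
B, R, r = 4096, 100, 300_000
V, P = 9, 6                       # key field and block pointer sizes (bytes)

bfr = B // R                      # 40 records per block
b = math.ceil(r / bfr)            # 7,500 data blocks
data_accesses = math.ceil(math.log2(b))          # 13

Ri = V + P                        # 15-byte index entries
bfri = B // Ri                    # 273 index entries per block
bi = math.ceil(b / bfri)          # 28 index blocks (one entry per data block)
index_accesses = math.ceil(math.log2(bi)) + 1    # 5 + 1 = 6

print(data_accesses, index_accesses)   # 13 6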

1
© e-Learning Centre, UCSC 0
3.3 Types of Indexes

• Single Level Indexes: Primary indexes

• Primary indexing has problems when we add new


records to or delete existing records from an ordered
file.
• If a new record is inserted in its correct position according
to the ordering, existing records in the data file may have to
be moved to make space for the new record.
• Sometimes this movement changes the block anchor records as
well, so the corresponding index entries must also be updated.
• Deletion of records also has the same issue as
insertion.

1
© e-Learning Centre, UCSC 0
3.3 Types of Indexes

• Single Level Indexes: Primary indexes

• An unordered overflow file can be used to reduce this
problem.
• Adding a linked list of overflow records for each
block in the data file is another way to address this
issue.
• Deletion markers can be used to manage the
issues with record deletion.

1
© e-Learning Centre, UCSC 0
3.3 Types of Indexes

• Single Level Indexes: Clustering indexes

• When a data file is ordered on a non-key field that does not
have unique values, such files are known as clustered files.
The field used to order the file is known as the clustering
field.
• A clustering index speeds up the retrieval of all records
whose clustering field (the field used to order the data file)
has the same value.
• In a primary index the ordering field consists of distinct
values, unlike in a clustering index.

1
© e-Learning Centre, UCSC 0
3.3 Types of Indexes

• Single Level Indexes: Clustering indexes

• Clustering index also consists of two fields. One is


for the clustering field of the data file and the second
one is for block pointers.
• In the index file, there is one entry for each distinct value
of the clustering field, with a pointer to the first block in
which a record with that clustering field value appears.

1
© e-Learning Centre, UCSC 0
3.3 Types of Indexes

• Single Level Indexes: Clustering indexes

• Since the data file is ordered, entering and deleting


records still causes problems in the clustering index
as well.
• A common method to address this problem is to allocate an
entire block (or a set of neighbouring blocks) for each value
of the clustering field.
• All records that have the same clustering field value will be
stored in the allocated block(s).
• This method eases the insertion and deletion of records.

1
© e-Learning Centre, UCSC 0
3.3 Types of Indexes

• Single Level Indexes: Clustering indexes

• This problem can be scaled down using an


unordered overflow file.
• Adding a linked list of overflow records for each
block in the data file is another way to address this
issue.
• Deletion markers can be used to manage the
issues with record deletion.
• A clustering index also falls into the sparse index type,
since the index file contains entries only for the distinct
values of the clustering field in the data file, rather than
for each and every record.

1
© e-Learning Centre, UCSC 0
3.3 Types of Indexes

Clustering Index

1
© e-Learning Centre, UCSC 1
3.3 Types of Indexes
Clustering Index with allocation of
blocks for distinct values in the
ordered key field.

1
© e-Learning Centre, UCSC 1
3.3 Types of Indexes
• Single Level Indexes: Clustering indexes
Ex: For the same ordered file with r = 300,000, B = 4,096
bytes, let’s say we have used a field “Zip code“ which is non
key field, to order the data file.
Assumption: Each Zip Code has equal number of records and
there are 1000 distinct values for Zip Codes (ri). Index entries
consist of 5-byte long Zip Code and 6-byte long block pointer.
Size of the record Ri = 5+6 = 11 bytes
Blocking factor bfri = floor(B/Ri) = floor(4,096/11)
                     = 372 index entries per block
Hence, number of blocks needed bi = ceiling(ri/bfri)
                     = ceiling(1,000/372) = 3 blocks.
Block accesses to perform a binary search
                     = ceiling(log2 bi) = ceiling(log2 3) = 2
1
© e-Learning Centre, UCSC 1
3.3 Types of Indexes

• Single Level Indexes: Secondary indexes

• A secondary index provides an additional medium


for accessing a data file which already has a
primary access.
• Data file records can be ordered, unordered, or hashed; the
index file itself is ordered.
• Either a candidate key, which has a unique value for every
record, or a non-key field, which may hold duplicate values,
can be used as the indexing field of a secondary index.
• The first field of the index file has the same data
type as the non-ordering field in the data file, which
is an indexing field.
• A block pointer or a record pointer is put in the
second field.
1
© e-Learning Centre, UCSC 1
3.3 Types of Indexes

• Single Level Indexes: Secondary indexes

• For a single file, several secondary indexes (and therefore
indexing fields) can be created - each of these serves as an
additional method of accessing the file based on a specific
field.
• For a secondary index created on a candidate key
(unique key/ primary key), which has unique values
for every record in the file, the secondary index will
get entries for every record in the data file.
• The reason to have entries for every record is, the
key attribute which is used to create secondary
index has distinct values for each and every record.
• In such scenarios, the secondary index will create a
dense index which holds key value and block
pointer for each record in the data file.
1
© e-Learning Centre, UCSC 1
3.3 Types of Indexes

• Single Level Indexes: Secondary indexes

• As in the primary index, the two fields of an index
entry are referred to as <K (i), P (i)>.
• Since the index entries are ordered by the value of
K (i), a binary search can be performed on the index.
• However, block anchors cannot be used, since the
records of the data file are not physically ordered by
the values of the secondary key field.
• This is the reason for creating an index entry for
each data record instead of using block anchors
as in the primary index.

1
© e-Learning Centre, UCSC 1
3.3 Types of Indexes

• Single Level Indexes: Secondary indexes

• Due to its larger number of entries, a secondary
index requires more storage space than a
primary index.
• On the other hand, a secondary index gives a
greater improvement in the search time for an
arbitrary record.
• This improvement matters more because, without
the secondary index, we would have to do a linear
search of the data file.
• For a primary index, a binary search can be
performed on the main file even if the index is not
present.

1
© e-Learning Centre, UCSC 1
3.3 Types of Indexes

• Single Level Indexes: Secondary indexes


Ex: If we take the same example as in the primary index and assume
we search on a non-ordering key field V = 9 bytes long, in a
file with 300,000 records with a fixed length of 100 bytes, and
given block size B = 4,096 bytes.
We can calculate the blocking factor,
bfr = floor(B/R) = floor(4,096/100) = 40 records per block
Hence, the number of blocks needed,
b = ceiling(r/bfr) = ceiling(300,000/40) = 7,500 blocks.
• If we perform a linear search on this file, the average
number of block accesses = b/2
                          = 7,500/2
                          = 3,750 block accesses
1
© e-Learning Centre, UCSC 1
3.3 Types of Indexes
• Single Level Indexes: Secondary indexes
However, if we have a secondary index on that non-ordering
key field, with block pointers P = 6 bytes long,
Length of an index entry Ri = V + P
                            = 9 + 6 = 15 bytes
Blocking factor bfri = floor(B/Ri)
                     = floor(4,096/15) = 273 index entries per block
Since the secondary index is dense, the number of index
entries (ri) is the same as the number of records (300,000) in the
file.
• Therefore, the number of blocks required for the secondary
index is,
bi = ceiling(ri / bfri)
   = ceiling(300,000 / 273)
   = 1,099 blocks
© e-Learning Centre, UCSC 1
3.3 Types of Indexes

• Single Level Indexes: Secondary indexes


• If we perform a binary search on this secondary index,
the required number of block accesses is,
ceiling(log2 bi) = ceiling(log2 1,099)
                 = 11 block accesses.
• Since we need one additional block access to read the
record from the data file using the index, the total number
of block accesses required is,
11 + 1 = 12 block accesses.

1
© e-Learning Centre, UCSC 1
3.3 Types of Indexes

• Single Level Indexes: Secondary indexes


• Compared to the linear search, which required 3,750
block accesses, the secondary index shows a big
improvement with 12 block accesses. But it is
slightly worse than the primary index, which
needed only 6 block accesses.
• This difference is a result of the size of the primary
index. The primary index is a sparse index and
therefore has only 28 blocks.
• The secondary index, which is dense, requires
1,099 blocks. This is much larger when
compared to the primary index.
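As a rough check, the Python sketch below (an illustration, not part of the original slides) recomputes the three figures compared above from the stated assumptions: r = 300,000 records of 100 bytes, 4,096-byte blocks, a 9-byte key field and a 6-byte block pointer.

    import math

    r, R, B = 300_000, 100, 4096      # records, record length, block size
    V, P = 9, 6                       # key field size, block pointer size

    b = math.ceil(r / (B // R))                         # 7,500 data blocks
    linear = b // 2                                     # ~3,750 accesses on average

    bfri = B // (V + P)                                 # 273 index entries per block
    primary_blocks = math.ceil(b / bfri)                # 28 (one entry per block anchor)
    primary = math.ceil(math.log2(primary_blocks)) + 1  # 5 + 1 = 6 accesses

    secondary_blocks = math.ceil(r / bfri)              # 1,099 (one entry per record)
    secondary = math.ceil(math.log2(secondary_blocks)) + 1   # 11 + 1 = 12 accesses

    print(linear, primary, secondary)                   # 3750 6 12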

1
© e-Learning Centre, UCSC 2
3.3 Types of Indexes

• Single Level Indexes: Secondary indexes


• A secondary index retrieves the records in the order
of the indexing field only logically: secondary indexing
provides a logical ordering of the records by that
field.
• In contrast, primary and clustering indexes assume
that the physical ordering of the file matches the
order of the indexing field.

1
© e-Learning Centre, UCSC 2
3.3.2 Multilevel indexes: Overview of multilevel
indexes

• Since a single-level index is an ordered file, we
can create a primary index on the index file itself.
• Here the original index file is called the first-level
index, and the index created on the original index is
called the second-level index.
• We can repeat the process, creating a third, fourth, ...,
top level until all entries of the top level fit in one disk
block.
• A multi-level index can be created for any type of first-
level index (primary, secondary, clustering) as long as
the first-level index consists of more than one disk
block.

1
© e-Learning Centre, UCSC 2
3.3.2 Multilevel indexes: Overview of multilevel
indexes

• As we have discussed in topic 3.3, an ordered index
file is associated with the primary, clustering and
secondary indexing schemes.
• Binary search is used on such an index, and the
algorithm reduces the part of the index file to be
searched by a factor of 2 in each step. Hence we
use the log function to the base 2: log2 (bi).
• Multilevel indexing makes this search faster by
reducing the search space by the blocking factor of
the index, which is usually greater than 2.
• In multilevel indexing, the blocking factor of the index
bfri is referred to as the fan-out, symbolized as fo.
• In multilevel indexing, the number of block accesses
required is (approximately) logfo bi.
1
© e-Learning Centre, UCSC 2
3.3.2 Multilevel indexes: Overview of multilevel
indexes

• If the first-level index has r1 entries, the blocking factor
for the first level is bfr1 = fo.
• The number of blocks required for the first level is
therefore ⎡ r1 / fo ⎤.
• Hence, the number of entries in the second-level
index is r2 = ⎡ r1 / fo ⎤.
• Similarly, r3 = ⎡ r2 / fo ⎤.
• However, we need a second level only if the first
level requires more than 1 block. Likewise, we add a
further level only if the current level requires more than 1
block.
• If the top level is level t, then
t = ⎡ logfo (r1) ⎤
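The following Python sketch (an illustration, not part of the original slides) builds the levels bottom-up for the running example, using fo = 273 and r1 = 300,000 first-level entries from the earlier secondary-index calculation.

    import math

    def multilevel_levels(r1, fo):
        # Return the number of levels t and the block count at each level.
        blocks_per_level, entries = [], r1
        while True:
            blocks = math.ceil(entries / fo)   # blocks needed at this level
            blocks_per_level.append(blocks)
            if blocks == 1:                    # the top level fits in one block
                break
            entries = blocks                   # the next level indexes these blocks
        return len(blocks_per_level), blocks_per_level

    t, levels = multilevel_levels(300_000, 273)
    print(t, levels)                            # 3 [1099, 5, 1]
    print(math.ceil(math.log(300_000, 273)))    # 3, matching t = ceiling(logfo(r1))
    # Accessing a record through this index then costs t + 1 = 4 block accesses.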
© e-Learning Centre, UCSC 2
3.4 Indexes on Multiple Keys

• If a certain combination of attributes is used frequently,
we can set up an index on that combination of
attributes for efficient access.
• For example, let's say we have a file for students,
containing student_id, name, age, gpa, department_id
and department_name.
• If we want to find students whose department_id = 1 and
gpa is 3.5, we can use one of the search strategies specified
on the next slide.

1
© e-Learning Centre, UCSC 2
3.4 Indexes on Multiple Keys

1. Assuming only the department_id has an index, we
can access the records with department_id = 1 using
the index and then check those records for gpa = 3.5.
2. Alternatively, assuming only the gpa has an
index and not the department_id, we can access the
records with gpa = 3.5 using the index and then check
those records for department_id = 1.
3. If both the department_id and gpa fields have
indexes, we can retrieve the records that meet each
individual condition (department_id = 1 and gpa = 3.5)
and then take the intersection of those two sets of
records.

1
© e-Learning Centre, UCSC 2
3.4 Indexes on Multiple Keys
• All of the mentioned methods will eventually give the
same set of records as the result.
• However, the number of individual records that meet
one of the conditions (either department_id = 1 or
gpa = 3.5) is usually much larger than the number of
records that satisfy both conditions (department_id = 1
and gpa = 3.5).
• Hence, none of the above three methods is efficient for
the search we require.
• Having a multiple-key index on department_id and gpa
would be more efficient in this case, because we can
find the records that meet both requirements
just by accessing the index file.
• We refer to keys containing multiple attributes as
composite keys.

1
© e-Learning Centre, UCSC 2
3.4 Indexes on Multiple Keys
• Ordered Index on Multiple Attributes
• We can create a composite key field for the previously
discussed file as <department_id, gpa>.
• The search key is then also a pair of values; for the
previous example this will be <1, 3.5>.
• In general, if an index is created on attributes
<A1, A2, A3, ..., An>, the search key values are tuples
with n values: <v1, v2, v3, ..., vn>.
• A lexicographic (alphabetical) ordering of these tuple
values establishes an order on the composite
search keys.
• For example, all the composite keys with 1 for
department_id will precede those with department_id
2.
• When the department_id is the same, the composite
keys will be sorted in ascending order of the gpa.
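A minimal Python sketch (an illustration, not part of the original slides) of this lexicographic ordering; the sample key values below are made up.

    # Hypothetical <department_id, gpa> composite keys.
    keys = [(2, 3.1), (1, 3.5), (1, 2.0), (3, 1.8), (2, 2.4)]

    # Python sorts tuples lexicographically: first by department_id,
    # then by gpa within the same department_id.
    print(sorted(keys))
    # [(1, 2.0), (1, 3.5), (2, 2.4), (2, 3.1), (3, 1.8)]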
© e-Learning Centre, UCSC 2
3.4 Indexes on Multiple Keys

• Partitioned Hashing
• Partitioned hashing is an extension of static external
hashing (where, for a given search-key value, the
hash function always computes the same address),
which allows access on multiple keys.
• It is suitable only for equality comparisons; it
does not support range queries.
• For a key consisting of n attributes, n separate hash
addresses are generated. The bucket address is a
concatenation of these n addresses.
• It is then possible to search for a composite key by
looking up the appropriate buckets that match the
parts of the address in which we are interested.

1
© e-Learning Centre, UCSC 2
3.4 Indexes on Multiple Keys

• Partitioned Hashing
• For example, consider the composite search key
<department_id, gpa>.
• If department_id and gpa are hashed into 2-bit and
6-bit addresses respectively, we get an 8-bit bucket
address.
• If department_id = 1 hashes to 01 and gpa = 3.5
hashes to 100011, then the bucket address is
01100011.
• To search for students with a 3.5 gpa, we can search
the buckets 00100011, 01100011, 10100011 and
11100011.
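A rough Python sketch (an illustration, not part of the original slides) of this idea; the 2-bit and 6-bit hash functions below are arbitrary stand-ins, not the actual functions assumed on the slide.

    def h_dept(department_id):
        return format(hash(department_id) % 4, '02b')     # 2-bit partial address

    def h_gpa(gpa):
        return format(hash(gpa) % 64, '06b')              # 6-bit partial address

    def bucket_address(department_id, gpa):
        # The bucket address is the concatenation of the partial addresses.
        return h_dept(department_id) + h_gpa(gpa)

    def buckets_for_gpa(gpa):
        # Equality search on gpa alone: fix the gpa bits and enumerate
        # all 4 possible department_id prefixes.
        return [format(i, '02b') + h_gpa(gpa) for i in range(4)]

    print(bucket_address(1, 3.5))   # one 8-bit bucket address
    print(buckets_for_gpa(3.5))     # the 4 candidate buckets to examine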

1
© e-Learning Centre, UCSC 3
3.4 Indexes on Multiple Keys

• Partitioned Hashing
• Advantages of partitioned hashing:
i. Ease of extending for any number of attributes.
ii. Ability to design the bucket addresses in a way
that frequently accessed attributes get higher-
order bits in the address. (Higher-order bits are
the left most bits)
iii. There is no need to maintain a separate access
structure for individual attributes.

1
© e-Learning Centre, UCSC 3
3.4 Indexes on Multiple Keys

• Partitioned Hashing
• Disadvantages of partitioned hashing:
i. Inability to handle range queries on any of the
component attributes.
ii. Most of the time, records are not maintained in
the order of the key that was used for the hash
function. Hence, using the lexicographic order of
a combination of attributes as a key (e.g.,
<department_id, gpa>) to access the records
would not be straightforward or efficient.

1
© e-Learning Centre, UCSC 3
3.4 Indexes on Multiple Keys

• Grid Files
• Constructed using a grid array with one linear
scale (or dimension) for each of the search
attributes.
• For the previous example of the Student file, we can
construct a linear scale for department_id and
another for gpa.
• Each linear scale is chosen so that records are
distributed uniformly across the cells for the
attribute being indexed.
• Each cell points to some bucket address where
the records corresponding to that cell are stored.

1
© e-Learning Centre, UCSC 3
3.4 Indexes on Multiple Keys
The following illustration shows a grid array for the Student file,
with one linear scale for department_id and another for the gpa
attribute. The grid cells are numbered 0 to 3 on each dimension.

Linear scale for department_id      Linear scale for gpa
  department_id 0 -> cell 0           gpa < 0.9     -> cell 0
  department_id 1 -> cell 1           gpa 1.0 - 1.9 -> cell 1
  department_id 2 -> cell 2           gpa 2.0 - 2.9 -> cell 2
  department_id 3 -> cell 3           gpa > 3.0     -> cell 3
© e-Learning Centre, UCSC 3
3.4 Indexes on Multiple Keys

• Grid Files
• When we query for department_id = 1 and gpa = 3.5, it
maps to cell (1,3) of the grid array shown on the previous
slide.
• Records for this combination can be found in the
corresponding bucket.
• Due to the nature of this indexing, we can also perform
range queries.
• As an example, for the range query gpa > 2.0 and
department_id < 2, the following bucket pool can be
selected.
(Figure: the selected bucket pool covers grid cells (0,2), (0,3),
(1,2) and (1,3), that is, department_id cells 0 and 1 combined
with gpa cells 2 and 3.)

© e-Learning Centre, UCSC
3.4 Indexes on Multiple Keys

• Grid Files
• Grid files can be applied to any number of search
keys.
• If we have n search keys, we get a grid
array of n dimensions.
• Hence it is possible to partition the file along the
dimensions of the search key attributes.
• Thus, grid files provide access by combinations of
values along the dimensions of the grid array.
• Space overhead and the additional maintenance cost
of reorganizing dynamic files are some
drawbacks of grid files.
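A simplified Python sketch (an illustration, not part of the original slides) of a grid-file lookup using the two linear scales above; the exact cell boundaries and the bucket contents are assumptions made for the example.

    import bisect

    # Linear scales from the illustration: department_id 0-3 maps to
    # cells 0-3 directly; gpa is mapped to cells 0-3 by its range.
    def dept_cell(department_id):
        return department_id                      # identity scale

    def gpa_cell(gpa):
        # boundaries near 1.0, 2.0 and 3.0 separate the four gpa cells
        return bisect.bisect_right([1.0, 2.0, 3.0], gpa)

    grid = {}            # (dept cell, gpa cell) -> bucket of record addresses

    def lookup(department_id, gpa):
        return grid.get((dept_cell(department_id), gpa_cell(gpa)), [])

    # Range query gpa > 2.0 AND department_id < 2 selects a bucket pool.
    pool = [(d, g) for d in (0, 1) for g in (2, 3)]
    print(pool)          # [(0, 2), (0, 3), (1, 2), (1, 3)]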

1
© e-Learning Centre, UCSC 3
3.5 Other types of Indexes

• Hash Indexes
• The hash index is a secondary structure that allows
access to the file using hashing.
• The search key is defined on an attribute other than
the one used for organizing the primary data file.
• Index entries consist of the search key and a
pointer to the record corresponding to that
key.
• The index file holding these entries can be organized
as a dynamically expandable hash file.
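A very small Python sketch (an illustration, not part of the original slides) of the idea: the search key is hashed into a bucket that stores <key, record pointer> entries. A static number of buckets is assumed here for brevity, rather than a dynamically expandable file.

    NUM_BUCKETS = 8
    buckets = [[] for _ in range(NUM_BUCKETS)]   # each bucket holds (key, pointer) entries

    def insert(search_key, record_pointer):
        buckets[hash(search_key) % NUM_BUCKETS].append((search_key, record_pointer))

    def lookup(search_key):
        bucket = buckets[hash(search_key) % NUM_BUCKETS]
        return [ptr for key, ptr in bucket if key == search_key]

    insert('Sandun', (17, 3))      # hypothetical pointer: block 17, record offset 3
    print(lookup('Sandun'))        # [(17, 3)]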

1
© e-Learning Centre, UCSC 3
3.5 Other types of Indexes

• Hash Indexes
Hash-based indexing.

1
© e-Learning Centre, UCSC 3
3.5 Other types of Indexes

• Bitmap Indexes

• A bitmap index is commonly used for querying on
multiple keys.
• Generally, it is used for relations that consist
of a large number of rows.
• A bitmap index can be created for every value or range
of values in single or multiple columns.
• However, the columns used to create a bitmap index
should have a relatively small number of distinct values.

1
© e-Learning Centre, UCSC 3
3.5 Other types of Indexes

• Bitmap Indexes

• Suppose we create a bitmap index on column
C for a particular value V, and we have n
records.
• The bitmap for value V then contains n bits.
• For a given record with record number i, if that
record has the value V in column C, the ith bit is set
to 1; otherwise it is 0.

1
© e-Learning Centre, UCSC 4
3.5 Other types of Indexes

• Bitmap Indexes
• In the given table we have a column to record the
gender of the employee.
• The bitmap indexes for the gender values are arrays
of bits, as shown in the last two columns.

Row_id  Emp_id  Lname      Gender   M   F
0       51024   Sandun     M        1   0
1       23402   Kamalani   F        0   1
2       62104   Eranda     M        1   0
3       34723   Christina  F        0   1
4       81165   Clera      F        0   1
5       13646   Mohamad    M        1   0
6       54649   Karuna     M        1   0
7       41301   Padma      F        0   1

Bitmap for M: 10100110
Bitmap for F: 01011001

© e-Learning Centre, UCSC
3.5 Other types of Indexes

• Bitmap Indexes
• According to the example given in the previous slide,
• If we consider the value F in column Gender, the 1st,
3rd, 4th and 7th bits are marked as "1" because record
ids 1, 3, 4 and 7 have the value F in them, while the
bits for record ids 0, 2, 5 and 6 are set to "0".
• A bitmap index is created on a set of records which
are numbered from 0 to n - 1, with a record id or row id
that can be mapped to a physical address.
• This physical address is composed of a block number
and a record offset within the block.
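A short Python sketch (an illustration, not part of the original slides) that builds these gender bitmaps from the sample table and uses one of them to find the matching rows.

    genders = ['M', 'F', 'M', 'F', 'F', 'M', 'M', 'F']   # rows 0..7 from the table

    # One bitmap per distinct value: bit i is 1 if row i holds that value.
    bitmaps = {value: ''.join('1' if g == value else '0' for g in genders)
               for value in set(genders)}

    print(bitmaps['M'])    # 10100110
    print(bitmaps['F'])    # 01011001

    # Rows matching Gender = 'F' are the positions of the 1-bits.
    print([i for i, bit in enumerate(bitmaps['F']) if bit == '1'])   # [1, 3, 4, 7]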

1
© e-Learning Centre, UCSC 4
3.5 Other types of Indexes

• Function based indexing


• This method was introduced in commercial DBMS
products such as the Oracle relational DBMS.
• In function-based indexing, a function is applied to
one or more columns and the resulting value is the
key to the index.
• For example, we can create an index on the
uppercase form of the Lname field as follows;

CREATE INDEX upper_lname
ON Employee (UPPER(Lname));

• The "UPPER" function is applied on the "Lname" field to
create an index called "upper_lname".
© e-Learning Centre, UCSC
3.5 Other types of Indexes

• Function based indexing


• If we issue the following query, the DBMS will use the
index created on UPPER(Lname) rather than scanning
the entire table.

SELECT Emp_id, Lname
FROM Employee
WHERE UPPER(Lname) = 'SANDUN'

1
© e-Learning Centre, UCSC 4
3.6 Index Creation and Tuning

• Index Creation
• An index is not an essential part of a data file;
we can create and remove indexes
dynamically.
• Indexes are often called access structures. We
can create indexes based on the frequently used
search requirements.
• A secondary index can be created without changing
the physical ordering of the data file.
• A secondary index can therefore be created in
conjunction with virtually any primary record organization.
• It can be used in addition to a primary organization
such as ordered, hashed or mixed
files.
1
© e-Learning Centre, UCSC 4
3.6 Index Creation and Tuning

• Index Creation
• The following command is the general way of creating an
index in an RDBMS;
CREATE [ UNIQUE ] INDEX <index name>
ON <table name> ( <column name> [ <order> ]
                  { , <column name> [ <order> ] } )
[ CLUSTER ] ;
• Keywords in square brackets are optional.
• [CLUSTER] → sorts the records in the data file on the
indexing attribute.
• <order> → ASC/DESC (default - ASC)
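As a concrete illustration (not part of the original slides), the snippet below creates a secondary index using Python's built-in sqlite3 module; the table and index names are made up, and SQLite does not support the optional CLUSTER keyword shown above.

    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.execute("CREATE TABLE Employee (Emp_id INTEGER PRIMARY KEY, Lname TEXT)")

    # Create an (ascending) index on the Lname column.
    conn.execute("CREATE INDEX idx_lname ON Employee (Lname ASC)")

    # Many DBMSs can display the query plan; in SQLite, EXPLAIN QUERY PLAN
    # reports whether the new index is used for a given query.
    for row in conn.execute(
            "EXPLAIN QUERY PLAN SELECT * FROM Employee WHERE Lname = 'Sandun'"):
        print(row)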

1
© e-Learning Centre, UCSC 4
3.6 Index Creation and Tuning

• Tuning Indexes
• The indexes that we have created may require
modification due to the following reasons:
i. Certain queries take too long to run because a
needed index is missing.
ii. An index may not get utilized at all.
iii. The attributes used to create the index
might be subject to frequent changes.
• Most DBMSs provide options to view how a query is
executed. The indexes used and the number of disk
accesses are included in this view, which is known as
the query plan.
• With the query plan, we can identify whether the above
problems are taking place and hence update or
remove indexes accordingly.

1
© e-Learning Centre, UCSC 4
3.6 Index Creation and Tuning

• Tuning Indexes

• Database tuning takes place with the goal of achieving
the best overall performance. The requirements are
dynamically evaluated and the organization of the
indexes and files is changed accordingly.
• Changing a nonclustered index into a clustered index
(or a clustered index into a nonclustered index), and
creating or dropping indexes, are some ways of
improving performance.
• Rebuilding an index might help to improve
performance by reclaiming the space wasted due to
many deletions.

1
© e-Learning Centre, UCSC 4
3.7 Physical Database Design in Relational
Databases

• Analyzing the Database queries and transactions

• Before designing the physical structure, we should have
a thorough idea of the intended use of the database and
at least abstract knowledge of the queries that will be
used.
• The physical design of a database should provide the
appropriate structure to store data and, at the same
time, facilitate good performance.
• The mix of queries, transactions and applications that
are expected to run on the database is one of the
factors that the database designer should consider
before designing the physical structure.
• Let us discuss each factor in detail.

1
© e-Learning Centre, UCSC 4
3.7 Physical Database Design in Relational
Databases

• Analyzing the Database queries and transactions

• For a retrieval query, the information given below is
important:
i. The relations that will be accessed by the query
ii. The attributes specified in the selection
condition
iii. The type of selection condition (equality,
inequality, range, etc.)
iv. The attributes that help in linking multiple tables
(join conditions)
v. The attributes retrieved by the query
• The attributes in (ii) and (iv) are candidates for index
creation.

1
© e-Learning Centre, UCSC 5
3.7 Physical Database Design in Relational
Databases

• Analyzing the Database queries and transactions

• When it comes to an update operation or update
transaction, we should consider,
i. The files subject to update
ii. Whether it is an insert, delete or update
operation
iii. The attributes specified in the selection
condition used to select the records to update or
delete
iv. The attributes whose values are changed by the
update query
• The attributes in (iii) are useful when creating an index.

1
© e-Learning Centre, UCSC 5
3.7 Physical Database Design in Relational
Databases

• Analyzing the Expected Frequency of Invocation of


Queries and Transactions

• We must consider how frequently we expect to call/


invoke a particular query.
• An aggregated list of expected frequencies for all
the queries and transactions along with their
attributes is prepared.

1
© e-Learning Centre, UCSC 5
3.7 Physical Database Design in Relational
Databases

• Analyzing the Time Constraints of Queries and


Transactions.

• Some queries and transactions have rigid time
constraints. For example, in a stock
exchange system, some of the queries are required to
be completed within milliseconds.
• Generally, primary access structures provide the
most efficient way of locating a record in a file.
Hence, selection attributes in queries with time
constraints should be given high priority when
creating primary access structures.

1
© e-Learning Centre, UCSC 5
3.7 Physical Database Design in Relational
Databases

• Analyzing the Expected Frequency of the update


queries

• Updating the access paths themselves slows
down update operations. Therefore, the minimum
number of access paths should be specified for files
that are subject to frequent updates.

1
© e-Learning Centre, UCSC 5
3.7 Physical Database Design in Relational
Databases

• Analyzing the Uniqueness constraint on attributes

• The primary key of a file, and any unique attributes of
the file that are candidate keys, should have access
paths defined on them.
• Having an index (access path) defined makes it
easy to search the index when checking for
uniqueness.
• This helps to check uniqueness when
inserting new records, because if the value already
exists, the database will reject the record since it
violates the uniqueness constraint.

1
© e-Learning Centre, UCSC 5
3.7 Physical Database Design in Relational
Databases

• Design Decisions about indexing


• Whether to index an attribute:
• In general, indexes are created on the attributes
that are used as the unique key of the file, or on
attributes used in selection conditions or
in join conditions of queries.
• Multiple indexes may also be defined so that some
operations can be processed by scanning the indexes
alone, rather than accessing the data file.

1
© e-Learning Centre, UCSC 5
3.7 Physical Database Design in Relational
Databases

• Design Decisions about indexing


• What attribute or attributes to index on:
• An index can be defined on a single attribute or it
could be a composite index created on multiple
attributes. In a composite index, the order of the
attributes should match their order in the
respective queries.
Ex: A composite index on (department,
subject) assumes that queries are based on
subjects within a department.

1
© e-Learning Centre, UCSC 5
3.7 Physical Database Design in Relational
Databases

• Design Decisions about indexing


• Whether to set up a clustered index:
• We cannot have both a primary and a clustering index
on the same file, because the data file is physically
ordered on a single field in either case. A
clustered index is useful only if it helps answer the
expected queries; otherwise, there is
no point in making a clustered index. If multiple
queries require clustering on different attributes, we
should evaluate the gain of each and decide
which attribute to use.
• Whether to use dynamic hashing for the file:
• Dynamic hashing is suitable for files which
are subject to frequent expansion and shrinking.

1
© e-Learning Centre, UCSC 5
Activity

1. What are the types of single level ordered indexes?


a. ________
b. ________
c. ________

1
© e-Learning Centre, UCSC 5
Activity

Fill in the blanks


1. A file can have _____ physical ordering field.
2. Primary, Clustering and Secondary index are types of
_______ level _____ indexes.
3. ________ search is possible on the index field since it has
________ values.
4. Indexing access structure is established on ______ ____.
5. Index file is usually _____ than the datafile.

1
© e-Learning Centre, UCSC 6
Activity

Mark whether each of the given statements is true or false.

1. An unordered file which consists of two fields and
fixed-length records is known as a primary index file. ( t/f )
2. For a given block in an ordered data file, the first record
in that block is known as the anchor record. ( t/f )
3. Indexes that contain index entries for only some records in
the data file are referred to as non-dense indexes. ( t/f )
4. The ordering key field of the index file and the primary key
of the data file have the same data type. ( t/f )
5. In a primary index, the index file contains one index
entry (a.k.a. index record) for each record in the data
file. ( t/f )

1
© e-Learning Centre, UCSC 6
Activity

1. You have a file with 600,000 records (r), which is


ordered by its key field and each record of this file is
fixed length and unspanned. Record length (R) is 100
bytes and block size(B) is 4096.
a. What is the blocking factor?
b. How many blocks required to store this file?
c. Calculate the number of block accesses required
when performing a binary search on this file and
access data.

1
© e-Learning Centre, UCSC 6
Activity

1. You have a file with 400,000 records (r), which is


ordered by its key field and each record of this file is
fixed length and unspanned. Record length (R) is 100
bytes and block size(B) is 4096.
a. What is the blocking factor?
b. How many blocks required to store this file?
c. Calculate the number of block accesses required
when performing a binary search on this file and
access data.

1
© e-Learning Centre, UCSC 6
Activity

1. You have a file with 400,000 records (r), which is


ordered by its key field and each record of this file is
fixed length and unspanned. Record length (R) is 100
bytes and block size(B) is 4096. If you have created a
primary index file with 9 bytes long ordering key field
(v) and 6 bytes long block pointer (p),
a. What is the blocking factor for index ?
b. How many blocks required for the index file?
c. Calculate the number of block accesses required
when performing a binary search on the index file and
accessing the data.

1
© e-Learning Centre, UCSC 6
Activity

1. A data file with 400,000 records (r) is ordered by a non-


key field called “product_category”. The
product_category field has 750 distinct values. Record
length (R) is 100 bytes and block size(B) is 4096. If you
have created a clustering index file on this non-key field
with a 9 bytes long ordering field value (v) and 6 bytes long
block pointer (p),
a. What is the blocking factor for index ?
b. How many blocks required to store the index file?
c. Calculate the number of block accesses required
when performing a binary search on index file and
access data.

1
© e-Learning Centre, UCSC 6
Activity

1. Assume we search for a non-ordering key field V = 9


bytes long, in a file with 600,000 records with a fixed
length of 100 bytes. And given block size B =8,192
bytes.
a. What is the blocking factor?
b. What is the required number of blocks?
c. How many block accesses required for a linear
search?

1
© e-Learning Centre, UCSC 6
Activity

1. Assume we create a secondary index on non-ordering


key field V = 9 bytes long, with entries for the block
pointers P=6, in a file with 600,000 records with a fixed
length of 100 bytes. And given block size B =8,192
bytes.
a. What is the blocking factor for index ?
b. How many blocks required to store the index file?
c. Calculate the number of block accesses required
when performing a binary search on index file and
access data.

1
© e-Learning Centre, UCSC 6
Activity

1. Assume we have a file with multi-level indexing. In the


first level, number of blocks b1 = 1099 and blocking
factor (bfri) = 273.
a. Calculate the number of blocks required for second
level.
b. Calculate the number of blocks required for third
level.
c. what is the top level index(t)?
d. How many block accesses required to access a
record using this multi-level index?

1
© e-Learning Centre, UCSC 6
