Organizing Files For Performance
By
Dr. Ahmed Taha
Lecturer, Computer Science Department,
Faculty of Computers & Informatics,
Benha University
Book Title:
File Structures : An Object-Oriented Approach
with C++
Authors:
Michael J. Folk
Greg Riccardi
Bill Zoellick
Publisher:
ADDISON-WESLEY
Edition:
Third edition (March, 1998)
Book Contents
# Chapter Title
1 Introduction to File Structures
2 Fundamental File Processing Operations
3 Secondary Storage and System Software
4 Fundamental File Structure Concepts
5 Managing Files of Records
6 Organizing Files for Performance
7 Indexing
8 Cosequential Processing and the Sorting of Large Files
9 Multi-level Indexing and B-trees
10 Indexed Sequential File Access and Prefix B+ Trees
11 Hashing
12 Extendible Hashing
Organizing Files for Performance
Lecture No. 5
Contents
1 Motivation
Motivation
❖ Let us consider a file of records (fixed length or variable length)
Strategies for Record Deletion
❖ How to delete records and reuse the unused space?
1. Record Deletion and Storage Compaction
❖ Deletion can be done by marking a record as deleted; storage compaction later rewrites the file without the marked records to reclaim their space.
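A minimal sketch of marking and compaction, assuming the file is modeled as a Python list of record strings and that a "*" in the first character marks a deleted record. The marker character and the function names are illustrative assumptions, not taken from the slides.

```python
DELETED_MARK = "*"  # assumed marker; the slides do not name one


def mark_deleted(records, rrn):
    """Mark the record at position rrn as deleted without moving any data."""
    records[rrn] = DELETED_MARK + records[rrn][1:]


def is_deleted(record):
    """A record counts as deleted when it starts with the marker."""
    return record.startswith(DELETED_MARK)


def compact(records):
    """Storage compaction: build a new file that skips the marked records."""
    return [r for r in records if not is_deleted(r)]


records = ["Ames|Mary", "Morr|John", "Folk|Mike"]
mark_deleted(records, 1)      # the file still holds three records
compacted = compact(records)  # the marked record is dropped here
```

Until compaction runs, the marked record still occupies its space; programs reading the file simply skip it.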
2. Deleting Fixed-Length Records and Reclaiming Space Dynamically
❖ How can the space of deleted records be reused for records that are added later?
❖ Use an “AVAIL LIST”, a linked list of available record slots.
❖ A header record stores the beginning of the AVAIL LIST.
[Figure: file of fixed-length records]
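The AVAIL LIST idea can be sketched as follows, assuming an in-memory stand-in for the file in which slot i is the record with RRN i, a deleted slot holds the RRN of the next free slot, and -1 ends the list. The class name, the sentinel, and the tuple used for deleted slots are hypothetical choices for illustration.

```python
class FixedLengthFile:
    """In-memory stand-in for a file of fixed-length records (hypothetical)."""

    END_OF_LIST = -1  # assumed sentinel meaning "no more free slots"

    def __init__(self):
        self.avail_head = self.END_OF_LIST  # plays the role of the header record
        self.slots = []                     # slot i is the record with RRN i

    def add(self, data):
        """Reuse the first slot on the avail list, or append if the list is empty."""
        if self.avail_head == self.END_OF_LIST:
            self.slots.append(data)
            return len(self.slots) - 1
        rrn = self.avail_head
        _, next_free = self.slots[rrn]  # a deleted slot stores the next free RRN
        self.avail_head = next_free
        self.slots[rrn] = data
        return rrn

    def delete(self, rrn):
        """Push the slot onto the front of the avail list."""
        self.slots[rrn] = ("*", self.avail_head)
        self.avail_head = rrn


f = FixedLengthFile()
for name in ["Ames", "Morr", "Folk", "Dunn"]:
    f.add(name)
f.delete(1)
f.delete(3)             # avail list is now 3 -> 1 -> end
reused = f.add("Zane")  # takes slot 3, the most recently freed
```

Treating the list as a stack keeps both deletion and insertion at a constant cost, since only the header and one slot are touched.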
3. Deleting Variable-Length Records
❖ An RRN cannot be used because the records differ in length; the exact byte offset of each deleted record must be used instead.
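To illustrate why byte offsets are needed, here is a small sketch that assumes records are stored back to back and that each AVAIL LIST entry keeps an (offset, size) pair. The record sizes and helper names are made up for the example.

```python
def byte_offsets(record_sizes):
    """Return the starting byte offset of each record stored back to back."""
    offsets, position = [], 0
    for size in record_sizes:
        offsets.append(position)
        position += size
    return offsets


sizes = [40, 64, 28, 52]       # hypothetical record lengths in bytes
offsets = byte_offsets(sizes)  # no single record length to multiply an RRN by

avail_list = []                # entries are (byte offset, size) pairs


def delete(index):
    """Put the deleted record's offset and size at the front of the avail list."""
    avail_list.insert(0, (offsets[index], sizes[index]))


delete(1)
delete(3)
```

Storing the size next to the offset lets a later insertion check whether a free slot is large enough for the new record.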
Placement Strategies for New Records
❖ There are several strategies for selecting a record from
AVAIL LIST when adding a new record:
1. First-Fit Strategy
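A minimal sketch of first fit, assuming the AVAIL LIST is represented as a plain list of slot sizes in list order. The list contents below are hypothetical and the function name is illustrative.

```python
def first_fit(avail_list, needed):
    """Return the index of the first slot with size >= needed, or None."""
    for i, size in enumerate(avail_list):
        if size >= needed:
            return i
    return None


avail = [10, 50, 22, 60]       # hypothetical slot sizes, in list order
chosen = first_fit(avail, 20)  # stops at the first slot that is big enough
```

Here the search stops at the size-50 slot even though the size-22 slot would fit more tightly, because first fit never looks past the first adequate slot.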
2. Best-Fit Strategy
❖ Example:
▪ AVAIL LIST: size=10,size=22,size=50,size=60
▪ record to be added: size=20
▪ Which record from AVAIL LIST is used for the new record?
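The example can be worked through with a short sketch, assuming the AVAIL LIST is a list of slot sizes and that best fit means the smallest slot that is still large enough. The function name is illustrative.

```python
def best_fit(avail_list, needed):
    """Return the index of the smallest slot with size >= needed, or None."""
    candidates = [(size, i) for i, size in enumerate(avail_list) if size >= needed]
    if not candidates:
        return None
    return min(candidates)[1]


avail = [10, 22, 50, 60]
chosen = best_fit(avail, 20)
leftover = avail[chosen] - 20  # bytes left unused in the chosen slot
```

Because the list in the example is in ascending order of size, the first adequate slot is also the best fit, so the search can stop as soon as one is found.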
3. Worst-Fit Strategy
❖ Example:
▪ AVAIL LIST: size=60,size=50,size=22,size=10
▪ record to be added: size=20
▪ Which record from AVAIL LIST is used for the new record?
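The same example under worst fit, sketched with the assumption that the AVAIL LIST is a list of slot sizes and that the largest slot is always chosen. The function name is illustrative.

```python
def worst_fit(avail_list, needed):
    """Return the index of the largest slot if it can hold the record, else None."""
    if not avail_list:
        return None
    largest = max(range(len(avail_list)), key=lambda k: avail_list[k])
    return largest if avail_list[largest] >= needed else None


avail = [60, 50, 22, 10]
chosen = worst_fit(avail, 20)
leftover = avail[chosen] - 20  # large remainder that can go back on the list
```

With the list in descending order, only the first entry needs to be examined: if it is too small, no other slot can hold the record.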
How to choose between Strategies?
❖ We must consider two types of fragmentation within a file:
❖ Internal Fragmentation
▪ wasted space within a record.
❖ External Fragmentation
▪ space is available on the AVAIL LIST, but each piece is too small to be reused.
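A small sketch of how the two kinds of waste can be measured, assuming sizes in bytes and a hypothetical `min_useful` threshold below which a leftover piece is considered too small to hold any record. Both function names are illustrative.

```python
def place_whole(slot_size, record_size):
    """Give the whole slot to the record; the unused bytes stay inside the record."""
    return slot_size - record_size  # internal fragmentation


def place_and_split(slot_size, record_size, min_useful):
    """Return the leftover to the avail list and flag it if it is too small to reuse."""
    leftover = slot_size - record_size
    return leftover, leftover < min_useful  # True means external fragmentation


internal = place_whole(22, 20)               # 2 bytes wasted within the record
tight_split = place_and_split(22, 20, 10)    # 2-byte piece that cannot be reused
roomy_split = place_and_split(60, 20, 10)    # 40-byte piece that can be reused
```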
Study This!
❖ For each of the following approaches, which type of
fragmentation arises, and which placement strategy is more
suitable?
❖ When the added record is smaller than the item taken from
AVAIL LIST:
Physical File Storage
❖ Each of your disks contains its own index file so that
information about its contents is always available when the
disk is in use.
❖ A file that does not fit into a single cluster spills over into the
next contiguous (meaning adjacent) cluster, unless that
cluster already contains data.
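A minimal simulation of this behavior, assuming the disk is a list of clusters where `None` means free, and assuming that when the adjacent cluster is occupied the file continues in the next free cluster in disk order. That fallback rule and all names here are assumptions made for illustration.

```python
def allocate(clusters, name, n_needed):
    """Fill free clusters in disk order, skipping any that already hold data."""
    used = []
    for i, owner in enumerate(clusters):
        if len(used) == n_needed:
            break
        if owner is None:
            clusters[i] = name
            used.append(i)
    if len(used) < n_needed:
        for i in used:  # undo the partial allocation
            clusters[i] = None
        raise ValueError("not enough free clusters")
    return used


clusters = [None] * 8
clusters[2] = "other"                    # cluster 2 already contains data
placed = allocate(clusters, "report", 3)
```

The file ends up in clusters that are not all adjacent, which is how parts of a file become scattered across the disk.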
What happens when a file is deleted
❖ When a file is deleted, the operating system simply changes
the status of the file’s clusters to “empty” and removes the
file name from the index file.
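The deletion step can be sketched as follows, assuming the index file is modeled as a dictionary from file name to cluster numbers and the cluster status as a list of strings. The names and the sample data are hypothetical.

```python
def delete_file(index, cluster_status, name):
    """Remove the name from the index and mark the file's clusters as empty."""
    for cluster in index.pop(name):
        cluster_status[cluster] = "empty"


index = {"report": [0, 1, 3]}                # file name -> clusters it occupies
status = ["used", "used", "used", "used"]
data = ["AAA", "BBB", "xxx", "CCC"]          # cluster contents

delete_file(index, status, "report")
```

In this sketch only the bookkeeping changes: the cluster contents in `data` are left as they were until another file overwrites them.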
Fragmentation & Defragmentation
❖ As a computer writes files on a disk, parts of files tend to
become scattered all over the disk.