O02CA503: Operating Systems (Semester 1)
Unit 8
File System Interface and Implementation
TABLE OF CONTENTS

1. Introduction
   1.1 Objectives
2. Concept of a File
   2.1 Attributes of a file
   2.2 Operations on files
   2.3 Types of files
   2.4 Structure of a file
3. File Access Methods
   3.1 Sequential access
   3.2 Direct access
   3.3 Indexed sequential access
4. Directory Structure
   4.1 Single-level directory
   4.2 Two-level directory
   4.3 Tree-structured directories
5. Allocation Methods
   5.1 Contiguous allocation
   5.2 Linked allocation
   5.3 Indexed allocation
   5.4 Performance comparison
6. Free Space Management
   6.1 Bit vector
   6.2 Linked list
   6.3 Grouping
   6.4 Counting
7. Directory Implementation
   7.1 Linear list
   7.2 Hash table
1. INTRODUCTION
In the previous unit, we discussed virtual memory and various page replacement algorithms. The operating system is a resource manager, and secondary resources such as the disk must also be managed. Information is stored in secondary storage because it costs less, is non-volatile and provides large storage space. Processes access data and information on secondary storage during execution, so the operating system has to organize data properly on secondary storage for efficient access.
The file system is the most visible part of an operating system. It provides on-line storage of, and access to, both data and code of the operating system and its users. It resides on secondary storage because of the two main characteristics of secondary storage: large storage capacity and non-volatile nature. This unit gives you an overview of the file system interface and its implementation.
1.1. Objectives
After studying this unit, you should be able to:
• explain various file concepts
• discuss different file access methods
• describe various directory structures
• list and explain various disk space allocation methods
• manage free space on the disk effectively
• implement a directory
2. CONCEPT OF A FILE
Users use different storage media such as magnetic disks, tapes, optical disks and so on. All these
different storage media have their own way of storing information. The operating system provides
a uniform logical view of information stored in these different media. The operating system
abstracts from the physical properties of its storage devices to define a logical storage unit called
a file. These files are then mapped on to physical devices by the operating system during use.
The storage devices are usually non-volatile, meaning the contents stored in these devices persist
through power failures and system reboots.
The concept of a file is extremely general. A file is a collection of related information recorded on secondary storage: for example, a file containing student information, a file containing employee information, or files containing C source code. A file is thus the smallest allotment of logical secondary storage; any information to be stored on secondary storage must be written to a file. Information in files could be program code or data in numeric, alphanumeric, alphabetic or binary form, either formatted or free-form. A file is therefore a collection of records if it is a data file, or a collection of bits, bytes or lines if it holds code.
Program code stored in files could be source code, object code or executable code whereas data
stored in files may consist of plain text, records pertaining to an application, images, sound and
so on. Depending on the contents of a file, each file has a pre-defined structure. For example, a
file containing text is a collection of characters organized as lines, paragraphs and pages whereas
a file containing source code is an organized collection of segments which in turn are organized
into declaration and executable statements.
2.1. Attributes of a file
Every file has attributes such as its name, type, location, size, protection information, and the time and date of creation or last modification. All these attributes are stored in a centralized place called the directory. The directory grows large when there are many files, and it also requires permanent storage.
2.2. Operations on files
The operating system provides six basic operations on files. They are:
1) Creating a file: The two steps in file creation include space allocation for the file and an entry
to be made in the directory to record the name and location of the file.
2) Writing a file: The parameters required to write into a file are the name of the file and the
contents to be written into it. Given the name of the file the operating system makes a search
in the directory to find the location of the file. An updated write pointer enables to write the
contents at a proper location in the file.
3) Reading a file: To read information stored in a file the name of the file specified as a parameter
is searched by the operating system in the directory to locate the file. An updated read pointer
helps read information from a particular location in the file.
4) Repositioning within a file: The file is searched in the directory, and the current file position is set to a given new value. No I/O takes place. This operation is also known as a file seek.
5) Deleting a file: The directory is searched for the particular file. If it is found, file space and other
resources associated with that file are released and the corresponding directory entry is
erased.
6) Truncating a file: The file attributes remain the same, but the file is reduced in size because the user deletes information in it. The end-of-file pointer is reset.
Other common operations are combinations of these basic operations. They include append,
rename and copy. A file on the system is very similar to a manual file. An operation on a file is
possible only if the file is open. After performing the operation, the file is closed. All the above
basic operations together with the open and close are provided by the operating system as system
calls.
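These basic operations map directly onto system calls in many operating systems. The following minimal sketch uses the POSIX interface (open, write, lseek, read, ftruncate, close, unlink); the file name demo.txt and the byte counts are illustrative assumptions, not features of any particular system.

```c
/* A minimal sketch of the basic file operations through the POSIX
 * system-call interface. File name and sizes are illustrative. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char buf[16];

    /* Create: allocate space and make a directory entry. */
    int fd = open("demo.txt", O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Write: the write pointer advances past the bytes written. */
    write(fd, "hello world", 11);

    /* Reposition (file seek): move the pointer, no I/O occurs. */
    lseek(fd, 6, SEEK_SET);

    /* Read: bytes come from the current position of the pointer. */
    ssize_t n = read(fd, buf, sizeof buf - 1);
    if (n >= 0) { buf[n] = '\0'; printf("read back: %s\n", buf); }

    /* Truncate: reduce the file to 5 bytes; attributes remain. */
    ftruncate(fd, 5);

    close(fd);

    /* Delete: release file space and erase the directory entry. */
    unlink("demo.txt");
    return 0;
}
```

Note how open and close bracket all the other operations, exactly as described above.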
2.3. Types of files
A common technique for implementing file types is to include the type as part of the file name, in the form of an extension. For example, C source code files have a .c extension, COBOL source code files have a .cob extension and so on.
If an operating system can recognize the type of a file, it can operate on the file sensibly. For example, an attempt to print an executable file should be aborted, since it would produce only garbage. Another use of file types is the capability of the operating system to automatically recompile the latest version of the source code before executing the modified program, as observed in the Turbo / Borland integrated program development environment.
Operating system support for multiple file structures makes the operating system more complex, so some operating systems support only a minimal number of file structures. A very good example of this type is the UNIX operating system: UNIX treats each file as a sequence of bytes, and it is up to the application program to interpret the file. This gives maximum flexibility but minimal support from the operating system. Irrespective of file structure support, every operating system must support at least an executable file structure so that it can load and execute programs.
2.4. Structure of a file
Disk I/O is always performed in terms of blocks. A block is a physical unit of storage, and usually all blocks are the same size, for example 512 bytes. Logical records have their own structure that is rarely an exact multiple of the physical block size; therefore a number of logical records are packed into one physical block. This helps the operating system to easily locate an offset within a file. For example, as discussed above, UNIX treats files as sequences of bytes. If each physical block is, say, 512 bytes, then the operating system packs and unpacks logical records into 512-byte physical blocks.
File access is always in terms of blocks. The logical record size, the physical block size and the packing technique determine the number of logical records that can be packed into one physical block, and the mapping is usually done by the operating system. Since the total file size is not always an exact multiple of the block size, the last physical block containing logical records is not full; some part of this last block is always wasted, on average half a block. This is termed internal fragmentation. The larger the physical block size, the greater the internal fragmentation. All file systems suffer from internal fragmentation; it is the penalty paid for easy file access by the operating system in terms of blocks instead of bits or bytes.
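The arithmetic behind internal fragmentation is easy to sketch. The block size, record size and record count below are illustrative assumptions.

```c
/* Sketch: packing fixed-size logical records into physical blocks and
 * measuring the space lost to internal fragmentation.
 * All three sizes are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    const long block_size  = 512;  /* bytes per physical block */
    const long record_size = 100;  /* bytes per logical record */
    const long num_records = 998;  /* records in the file      */

    long records_per_block = block_size / record_size;           /* 5   */
    long blocks_needed = (num_records + records_per_block - 1)
                         / records_per_block;                    /* 200 */
    /* Space allocated but not covered by records is wasted. */
    long wasted = blocks_needed * block_size
                - num_records * record_size;                     /* 2600 */

    printf("blocks needed: %ld, bytes lost to fragmentation: %ld\n",
           blocks_needed, wasted);
    return 0;
}
```

With a larger block size the waste grows, which is exactly the trend described above.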
SELF-ASSESSMENT QUESTIONS – 1
1. The operating system provides a uniform logical view of information stored in different
storage media. (True / False)
2. A _______________ is a collection of related information recorded on the secondary
storage.
3. Usually a block size will be _______________ bytes. (Pick the right option)
a) 512
b) 256
c) 128
d) 64
3. FILE ACCESS METHODS
Information is stored in files, and files reside on secondary storage. When this information is to be used, it has to be accessed and brought into main memory. Information in files can be accessed in many ways, usually depending on the application. The access methods are:
• Sequential access
• Direct access
• Indexed sequential access
3.1. Sequential access
In sequential access, information in a file is accessed sequentially, one record after another. This method is best suited where most of the records in a file are to be processed, for example, transaction files.
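The contrast between sequential and direct access can be sketched with C's standard I/O. Assuming a file of fixed-size records (the 100-byte record size and the file name records.dat are hypothetical), sequential access reads record after record, while direct access computes the offset of the i-th record and jumps straight to it.

```c
/* Sketch: sequential versus direct access on a file of fixed-size
 * records. Record size and file name are illustrative assumptions. */
#include <stdio.h>

#define RECORD_SIZE 100  /* bytes per logical record */

/* Direct access: jump straight to the i-th record. */
size_t read_record(FILE *fp, long i, char *buf) {
    fseek(fp, i * RECORD_SIZE, SEEK_SET);  /* offset = i * record size */
    return fread(buf, 1, RECORD_SIZE, fp);
}

int main(void) {
    char buf[RECORD_SIZE];
    FILE *fp = fopen("records.dat", "rb");
    if (!fp) return 1;

    /* Sequential access: records are read one after another. */
    while (fread(buf, 1, RECORD_SIZE, fp) == RECORD_SIZE) {
        /* process each record in turn */
    }

    /* Direct access: fetch record 42 without reading the others. */
    read_record(fp, 42, buf);

    fclose(fp);
    return 0;
}
```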
4. DIRECTORY STRUCTURE
File systems are very large, so files have to be organized. Usually a two-level organization is used:
• The file system is divided into partitions. By default, there is at least one partition. Partitions
are nothing but virtual disks with each partition considered as a separate storage device.
• Each partition has information about the files in it. This information is nothing but a table of
contents. It is known as a directory.
The directory maintains information about the name, location, size and type of all files in the
partition. A directory has a logical structure. This is dependent on many factors including
operations that are to be performed on the directory like search for file/s, create a file, delete a file,
list a directory, rename a file and traverse a file system. For example, the dir, del, ren commands
in MS-DOS.
4.1. Single-level directory
A single-level directory has limitations as the number of files and users increases. Since there is only one directory listing all the files, no two files can have the same name; file names must be unique in order to distinguish one file from another. Even with one user, it is difficult to maintain unique file names when the number of files becomes large.
4.2. Two-level directory
A two-level directory structure has one directory exclusively for each user. Each user's directory is similar in structure and maintains information only about the files present in it. The operating system has one master directory for a partition, which contains an entry for each user directory (Refer figure 8.2).
Files with the same name may exist across user directories, but not within the same user directory. File maintenance is easy, and users are isolated from one another. But when users work in a group and each wants to access files in another user's directory, this may not be possible.
Access to a file is through the user name and the file name. This is known as a path, and a path thus uniquely identifies a file. For example, in MS-DOS, if 'C' is the partition, then C:\USER1\TEST, C:\USER2\TEST and C:\USER3\C are all files in user directories. Files can be created, deleted, searched and renamed only within the user directories.
4.3. Tree-structured directories
A tree-structured directory generalizes the two-level structure by letting users create their own sub directories. Every file has a unique path, which runs from the root through all the sub directories down to the specific file.
Usually, the user has a current directory. User-created sub directories can be traversed, and files are usually accessed by giving their path names. Path names can be either absolute or relative. Absolute path names begin at the root and give the complete path down to the file; relative path names begin at the current directory. Allowing users to define sub directories lets them organize their files by topic. A directory is treated as just another file in the directory one level higher in the hierarchy. To delete a directory, it must be empty. Two options exist: delete all files and then delete the directory, or delete all entries in the directory when the directory itself is deleted. Deletion may be a recursive process, since a directory to be deleted may itself contain sub directories.
SELF-ASSESSMENT QUESTIONS – 2
4. The CPU can directly process the information stored in secondary memory without
transferring it to main memory. (True / False)
5. In _______________ method, information in a file is accessed sequentially one
record after another.
6. A _______________ is a tree of height two with the master file directory at the root
having user directories as descendants that in turn have the files themselves as
descendants. (Pick the right option)
a) Single level directory
b) Tree structured directory
c) Two level directory
d) Directory
5. ALLOCATION METHODS
Allocation of disk space to files is a problem concerned with how effectively disk space is utilized and how quickly files can be accessed. The three major methods of disk space allocation are:
• Contiguous allocation
• Linked allocation
• Indexed allocation
5.1. Contiguous allocation
In contiguous allocation, each file occupies a set of contiguous blocks on the disk. A file that is 'n' blocks long starting at location 'b' occupies blocks b, b+1, b+2, ....., b+(n-1). The directory entry for each contiguously allocated file gives the address of the starting block and the length of the file in blocks (See figure 8.4).
Accessing a contiguously allocated file is easy; both sequential and direct access are possible. For sequential access, the block after the current one is accessed next, whereas for direct access the address of the ith block is calculated directly as b + i, where b is the starting block address.
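A short sketch makes the b + i mapping concrete. The directory-entry structure below is a hypothetical illustration; real directory formats vary from system to system.

```c
/* Sketch: logical-to-physical block mapping under contiguous
 * allocation. The directory-entry layout is an assumption. */
#include <stdio.h>

struct dir_entry {
    long start;   /* first disk block of the file (b) */
    long length;  /* file length in blocks (n)        */
};

/* The ith logical block of the file lives at disk block b + i. */
long physical_block(const struct dir_entry *f, long i) {
    if (i < 0 || i >= f->length) return -1;  /* out of range */
    return f->start + i;
}

int main(void) {
    struct dir_entry f = { .start = 14, .length = 3 };
    printf("block 2 of the file -> disk block %ld\n",
           physical_block(&f, 2));  /* prints 16 */
    return 0;
}
```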
A major disadvantage of contiguous allocation is finding contiguous space large enough for the file. From the set of free blocks, a first-fit or best-fit strategy is adopted to find 'n' contiguous holes for a file of size 'n'. These algorithms suffer from external fragmentation: as disk space is allocated and released, a single large hole of disk space becomes fragmented into smaller holes. Sometimes the total size of all the holes put together is larger than the size of the file to be allocated, yet the file cannot be allocated the space because no single contiguous hole is as large as the file. This is when external fragmentation has occurred. Compaction of disk space is a solution to external fragmentation, but it has a very large overhead.
Another problem with contiguous allocation is to determine the space needed for a file. The file is
a dynamic entity that grows and shrinks. If allocated space is just enough (a best-fit allocation
strategy is adopted) and if the file grows, there may not be space on either side of the file to
expand. The solution to this problem is to again reallocate the file into a bigger space and release
the existing space. Another solution that could be possible if the file size is known in advance is
to make an allocation for the known file size. But in this case there is always a possibility of a large
amount of internal fragmentation because initially the file may not occupy the entire space and
also grow very slowly.
5.2. Linked allocation
In linked allocation, each file is a linked list of disk blocks that may be scattered anywhere on the disk. Initially one block is allocated to the file, and the directory records this block as both the start and the end. As the file grows, additional blocks are allocated, with the current block containing a pointer to the next block and the end block being updated in the directory.
This allocation method does not suffer from external fragmentation because any free block can
satisfy a request. Hence there is no need for compaction. Moreover, a file can grow and shrink
without problems of allocation.
Linked allocation has some disadvantages. Random access of files is not possible. To access the
ith block access begins at the beginning of the file and follows the pointers in all the blocks till the
ith block is accessed. Therefore, access is always sequential. Also, some space in all the allocated
blocks is used for storing pointers. This is clearly an overhead as a fixed percentage from every
block is wasted. This problem is overcome by allocating blocks in clusters that are nothing but
groups of blocks. But this tends to increase internal fragmentation. Another problem in this
allocation scheme is that of scattered pointers. If for any reason a pointer is lost, then the file after
that block is inaccessible. A doubly linked block structure may solve the problem at the cost of
additional pointers to be maintained.
MS-DOS uses a variation of linked allocation called the File Allocation Table (FAT). The FAT resides on the disk, contains one entry for each disk block, and is indexed by block number. The directory contains the starting block address of the file; that block's entry in the FAT holds a pointer to the next block, and so on up to the last block (Refer figure 8.6). Random access of files is possible because the FAT can be scanned for a direct block address.
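The chain lookup just described can be sketched as a walk along the FAT. The FAT array, the end-of-chain marker and the sample chain below are illustrative assumptions.

```c
/* Sketch: following a FAT chain to find the ith block of a file.
 * FAT contents, start block and EOF marker are illustrative. */
#include <stdio.h>

#define FAT_EOF (-2)  /* assumed end-of-chain marker */

/* fat[b] holds the number of the block that follows block b. */
long ith_block(const long *fat, long start, long i) {
    long b = start;
    while (i-- > 0 && b != FAT_EOF)
        b = fat[b];  /* hop to the next block in the chain */
    return b;        /* FAT_EOF if the file has fewer blocks */
}

int main(void) {
    /* A tiny FAT: the file starts at block 4; chain 4 -> 7 -> 2. */
    long fat[8] = { 0 };
    fat[4] = 7; fat[7] = 2; fat[2] = FAT_EOF;

    printf("2nd block after the start: %ld\n",
           ith_block(fat, 4, 2));  /* prints 2 */
    return 0;
}
```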
5.3. Indexed allocation
In indexed allocation, all the block pointers of a file are brought together into one location called the index block; the ith entry of the index block points to the ith block of the file. Indexed allocation does suffer from wasted block space: pointer overhead is greater in indexed allocation than in linked allocation, because every file needs an index block. What, then, should be the size of the index block? If it is too big, space is wasted; if it is too small, large files cannot be stored. To store large files, more than one index block can be linked together, and multilevel index blocks are also used. A combined scheme having direct blocks as well as indexed blocks has been implemented in the UNIX operating system.
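A single-level index lookup can be sketched as follows. The 128-entry index block is an illustrative assumption; real systems such as UNIX add indirect levels for larger files.

```c
/* Sketch: direct access under single-level indexed allocation.
 * The ith entry of the index block names the ith data block.
 * The 128-entry index block is an illustrative assumption. */
#include <stdio.h>

#define INDEX_ENTRIES 128  /* pointers held by one index block */

struct file_index {
    long block[INDEX_ENTRIES];  /* block[i] -> disk block of part i */
};

long lookup(const struct file_index *f, long i) {
    if (i < 0 || i >= INDEX_ENTRIES)
        return -1;  /* would need a linked or multilevel index */
    return f->block[i];
}

int main(void) {
    struct file_index f = { .block = { 9, 16, 1, 10, 25 } };
    printf("block 3 of the file -> disk block %ld\n",
           lookup(&f, 3));  /* prints 10 */
    return 0;
}
```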
5.4. Performance comparison
The allocation methods differ in how efficiently they use disk space and how quickly data blocks can be accessed. Conversion of a file from one allocation type to another needs a copy operation to the desired type. Some systems support both contiguous and indexed allocation: a file initially has contiguous allocation, and as it grows a switch to indexed allocation takes place. If, on average, files are small, then contiguous file allocation is advantageous and provides good performance.
6. FREE SPACE MANAGEMENT
The disk is a scarce resource, but disk space can be reused. Free space present on the disk is maintained by the operating system: physical blocks that are free are listed in a free-space list. When a file is created or a file grows, requests for blocks of disk space are satisfied from the free-space list, and the list is updated accordingly. Similarly, freed blocks are added back to the free-space list. The free-space list can be implemented in the following ways:
6.1. Bit vector
Here the free-space list is implemented as a bit vector with one bit per block, where a free block is represented by a 1 and an allocated block by a 0.
Illustration: If blocks 2, 4, 5, 9, 10, 12, 15, 18, 20, 22, 23, 24, 25 and 29 are free and the rest are allocated, then a free-space list implemented as a bit vector looks as follows:
00101100011010010010101111000100000.........
The advantage of this approach is that it is very simple to implement and efficient to access. If only one free block is needed, a search for the first '1' in the vector suffices. If a contiguous allocation of 'b' blocks is required, then a contiguous run of 'b' 1's is searched for; the first such run is chosen under the first-fit scheme, and the best such run under the best-fit scheme.
Bit vectors are inefficient if they are not in memory. Also, the size of the vector has to be updated
if the size of the disk changes.
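Allocating a single block from a bit vector amounts to finding the first 1 bit. A minimal sketch, using the same convention as the illustration above (1 means free, bits read most-significant first):

```c
/* Sketch: finding the first free block in a bit-vector free list.
 * Convention: bit = 1 means the block is free. */
#include <limits.h>
#include <stdio.h>

long first_free(const unsigned char *bitmap, long nbytes) {
    for (long i = 0; i < nbytes; i++) {
        if (bitmap[i] == 0) continue;           /* all 8 blocks in use */
        for (int b = 0; b < CHAR_BIT; b++)      /* scan within a byte  */
            if (bitmap[i] & (1u << (CHAR_BIT - 1 - b)))
                return i * CHAR_BIT + b;        /* block number        */
    }
    return -1;  /* no free block */
}

int main(void) {
    /* Blocks 2, 4 and 5 free -> bits 00101100 -> byte 0x2C. */
    unsigned char bitmap[] = { 0x2C, 0x00 };
    printf("first free block: %ld\n", first_free(bitmap, 2)); /* 2 */
    return 0;
}
```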
6.2. Linked list
Another approach is to link together all the free blocks: the first free block contains a pointer to the next free block, that block points to the following free block, and so on. This scheme works well for linked allocation. But if contiguous allocation is used, searching for 'b' contiguous free blocks calls for a traversal of the free-space list, which is not efficient. The FAT in MS-DOS builds free-block accounting into the allocation data structure itself, where free blocks have a special entry, say -1, in the FAT.
6.3. Grouping
Another approach is to store the addresses of 'n' free blocks in the first free block. Here (n-1) of these blocks are actually free; the nth address is that of a block containing the next set of free block addresses. This method has the advantage that a large number of free block addresses are available in a single place, unlike the previous linked approach where free block addresses are scattered.
6.4. Counting
If contiguous allocation is used and a file has freed its disk space then a contiguous set of ‘n’
blocks is free. Instead of storing the addresses of all these ‘n’ blocks in the free-space list, only
the starting free block address and a count of the number of blocks free from that address can be
stored. This is exactly what is done in this scheme where each entry in the free-space list is a disk
address followed by a count.
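The counting scheme can be sketched by compressing a sorted list of free block numbers into (start, count) pairs. The block numbers below are the first few from the earlier illustration.

```c
/* Sketch: building (start, count) entries for the counting scheme
 * from a sorted list of free block numbers. */
#include <stdio.h>

struct free_run { long start, count; };

long build_runs(const long *blocks, long n, struct free_run *runs) {
    long r = 0;
    for (long i = 0; i < n; i++) {
        if (r > 0 && blocks[i] == runs[r-1].start + runs[r-1].count)
            runs[r-1].count++;  /* block extends the current run */
        else
            runs[r++] = (struct free_run){ blocks[i], 1 };
    }
    return r;  /* number of runs produced */
}

int main(void) {
    long freeb[] = { 2, 4, 5, 9, 10, 12 };
    struct free_run runs[6];
    long r = build_runs(freeb, 6, runs);
    for (long i = 0; i < r; i++)
        printf("(%ld, %ld) ", runs[i].start, runs[i].count);
    printf("\n");  /* prints: (2, 1) (4, 2) (9, 2) (12, 1) */
    return 0;
}
```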
7. DIRECTORY IMPLEMENTATION
The choice of directory implementation affects the efficiency, performance and reliability of the file system. A directory can be implemented using either of two data structures:
• Linear list: a linear list of file names with pointers to the data blocks. It is simple to program but time-consuming to search, since finding an entry requires a linear scan.
• Hash table: a hash function computed on the file name returns a pointer to the file's entry, which greatly decreases the directory search time.
SELF-ASSESSMENT QUESTIONS – 3
7. In contiguous allocation the disk access time is reduced, as disk head movement is
usually restricted to only one track. (True / False)
8. FAT stands for _______________.
9. If the file size is small, then _______________ is advantageous and provides good
performance. (Pick the right option)
a) Linked allocation
b) Indexed allocation
c) Contiguous file allocation
d) Linked with indexed allocation
8. DISK STRUCTURE AND DISK SCHEDULING
The physical structure of a hard disk drive (HDD) comprises the following main components involved in data storage and retrieval:
1. Platters: These are the circular disks inside the HDD made of a rigid material, usually aluminum
or glass, coated with a magnetic material. Data is stored magnetically on these platters.
2. Read/Write Head: These are the tools that the drive uses to read data from and write data to
the platters. They "float" just above the surface of the platters on a cushion of air generated by
the spinning platters.
3. Track: This is a concentric circle on the surface of a platter. Tracks are sub-divided into sectors
and are the paths on which the read/write head reads and writes data.
4. Sector: These are subdivisions of a track and represent the smallest unit of storage on a disk.
A sector typically stores a fixed amount of data, such as 512 bytes or 4K bytes.
5. Cylinder: This is a concept used to describe the set of tracks that are at the same position on
each platter within the HDD. When the read/write heads move across the platters to access
data, they move in unison, so all the heads are over the same track number on each platter,
forming a cylinder.
6. Spindle: This is the axis on which the platters spin. It is driven by the motor, allowing the platters
to rotate at high speeds, which is essential for the read/write heads to access data.
7. Arm Assembly: This assembly holds the read/write heads and is mechanically moved in and
out to place the heads over the desired track on the platters. It is precisely controlled to allow
the heads to find and follow the tracks on the platters.
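The geometry described above determines the raw capacity of a drive: cylinders × tracks per cylinder (heads) × sectors per track × bytes per sector. The figures below are illustrative of an older CHS-addressed disk, not of any specific drive.

```c
/* Sketch: raw disk capacity from (illustrative) CHS geometry. */
#include <stdio.h>

int main(void) {
    long cylinders = 16383;  /* tracks per surface         */
    long heads     = 16;     /* surfaces (tracks/cylinder) */
    long sectors   = 63;     /* sectors per track          */
    long bytes     = 512;    /* bytes per sector           */

    long long capacity = (long long)cylinders * heads * sectors * bytes;
    printf("capacity: %lld bytes (~%.1f GB)\n", capacity, capacity / 1e9);
    return 0;  /* ~8.5 GB for these figures */
}
```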
1. FCFS (First-Come First-Served)
FCFS is the simplest disk scheduling algorithm: requests are serviced strictly in the order in which they arrive. Example: consider a disk with 200 tracks numbered 0 to 199, the disk head currently at track 53, and a queue of requests for tracks 98, 183, 37, 122, 14, 124, 65, 67. Under FCFS:
• The current position is 53. The first request is for track 98, so the disk head moves from 53 to 98.
• Next, move to track 183. The disk head moves from 98 to 183.
• Then, move to track 37. The disk head moves from 183 to 37.
• Next, move to track 122. The disk head moves from 37 to 122.
• Move to track 14. The disk head moves from 122 to 14.
• Then, to track 124. The disk head moves from 14 to 124.
• Next, to track 65. The disk head moves from 124 to 65.
• Finally, to track 67. The disk head moves from 65 to 67.
The total head movement in this example can be calculated by summing the distances moved
between each of these requests. This total head movement is used as a metric to determine the
efficiency of the disk scheduling algorithm. The FCFS algorithm does not minimize this head
movement, and thus, it may not be the most efficient algorithm, especially in a system with a large
number of disk I/O requests.
Starting from track 53, the head movements will be calculated as follows:
1. Move from 53 to 98: |98 − 53| = 45 tracks
2. Move from 98 to 183: |183 − 98| = 85 tracks
3. Move from 183 to 37: |183 − 37| = 146 tracks
4. Move from 37 to 122: |122 − 37| = 85 tracks
5. Move from 122 to 14: |122 − 14| = 108 tracks
6. Move from 14 to 124: |124 − 14| = 110 tracks
7. Move from 124 to 65: |124 − 65| = 59 tracks
8. Move from 65 to 67: |67 − 65| = 2 tracks
Now, we add up all these movements to get the total head movement:
Total Head Movement = 45 + 85 + 146 + 85 + 108 + 110 + 59 + 2 = 640 tracks
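The FCFS total can be computed mechanically by summing the absolute seek distances, as the sketch below does for the example queue.

```c
/* Sketch: total head movement under FCFS, i.e. requests serviced
 * strictly in arrival order. */
#include <stdio.h>
#include <stdlib.h>

long fcfs_movement(long head, const long *req, long n) {
    long total = 0;
    for (long i = 0; i < n; i++) {
        total += labs(req[i] - head);  /* seek to the next request */
        head = req[i];
    }
    return total;
}

int main(void) {
    long req[] = { 98, 183, 37, 122, 14, 124, 65, 67 };
    printf("FCFS total: %ld tracks\n",
           fcfs_movement(53, req, 8));  /* prints 640 */
    return 0;
}
```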
2. SSTF (Shortest Seek Time First)
SSTF selects, at each step, the pending request closest to the current head position. Example: imagine a disk with 200 tracks numbered 0 to 199. The disk head is currently at track 53, and we have a queue of disk access requests for tracks in the following order: 98, 183, 37, 122, 14, 124, 65, 67. Under SSTF, starting at track 53, the disk head moves in the following order of tracks:
• First, it moves to track 65, as it is closest to the starting track 53.
• Next, it goes to track 67 and then to track 37, each being the nearest pending request at that point.
• After that, the head moves to track 14 and then to track 98.
• Continuing, it accesses tracks 122 and 124 in sequence.
• Finally, it serves the request at track 183.
This order ensures the shortest seek time first, as at each step, the disk head moves to the nearest
track with a pending request.
To calculate the total head movement under Shortest Seek Time First (SSTF) scheduling, follow the head as it moves to the nearest pending request at each step:
1. The head starts at track 53. The nearest request is at track 65: |65 − 53| = 12 tracks.
2. The next nearest request is at track 67: |67 − 65| = 2 tracks.
3. From 67, the nearest pending request is track 37: |67 − 37| = 30 tracks.
4. From 37, it moves to track 14: |37 − 14| = 23 tracks.
5. From 14, it moves to track 98: |98 − 14| = 84 tracks.
6. From 98, it moves to track 122: |122 − 98| = 24 tracks.
7. From 122, it moves to track 124: |124 − 122| = 2 tracks.
8. Finally, it moves to the furthest request at track 183: |183 − 124| = 59 tracks.
Adding up all the individual movements gives the total head movement: 12 + 2 + 30 + 23 + 84 + 24 + 2 + 59 = 236 tracks.
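SSTF can be sketched as a greedy selection: at each step, service the pending request nearest the current head position. Applied to the same queue, the sketch reproduces the 236-track total.

```c
/* Sketch: total head movement under SSTF (greedy nearest-request). */
#include <stdio.h>
#include <stdlib.h>

long sstf_movement(long head, long *req, long n) {
    long total = 0;
    for (long served = 0; served < n; served++) {
        long best = served;  /* index of the nearest pending request */
        for (long i = served; i < n; i++)
            if (labs(req[i] - head) < labs(req[best] - head))
                best = i;
        total += labs(req[best] - head);
        head = req[best];
        /* swap the served request out of the pending region */
        long tmp = req[served]; req[served] = req[best]; req[best] = tmp;
    }
    return total;
}

int main(void) {
    long req[] = { 98, 183, 37, 122, 14, 124, 65, 67 };
    printf("SSTF total: %ld tracks\n",
           sstf_movement(53, req, 8));  /* prints 236 */
    return 0;
}
```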
3. SCAN
SCAN disk scheduling, also known as the elevator algorithm, is a method the operating system
uses to schedule disk I/O requests.
It works as follows:
✓ The disk arm starts at a particular track and moves in a direction towards one end of the disk.
✓ As it moves, it services all the requests (reads or writes) that come along that path.
✓ Upon reaching the end of the disk or the last request in the direction it is moving, the disk arm
reverses its direction.
✓ The arm then services the remaining requests in the new direction, again servicing all requests
encountered along this path.
The name "elevator algorithm" comes from the similarity of the disk arm movement to an elevator
in a building, which travels in one direction servicing all the floors (requests) on its path until it
reaches the top or bottom and then reverses direction.
A key advantage of SCAN is that it prevents the "starvation" of requests, since the arm eventually services all the tracks on its path.
4. C-SCAN (Circular-SCAN)
In C-SCAN disk scheduling, also known as Circular SCAN, the disk arm moves in a single direction and services all the requests until it reaches the other end of the disk. After reaching the end, it immediately returns to the beginning of the disk without servicing any requests on the return trip, and then starts servicing requests in the same direction again. This provides a more uniform wait time compared to SCAN.
Example: Suppose we have a disk with 200 tracks numbered from 0 to 199. The head of the
disk is initially at track number 53, and the requests arrive in the following order: 98, 183, 37,
122, 14, 124, 65, 67. The direction of head movement is from track 0 to track 199.
Solution:
• The disk head starts at track 53 and moves towards the higher numbered tracks.
• Services requests at 65, 67, 98, 122, 124, and 183 in that order as it moves in the direction
of increasing track numbers.
• Once it reaches the end of the disk (track 199), it jumps back to track 0 without servicing any requests on the way.
• It then services the remaining requests (14 and 37) starting from the lowest track number and going upwards.
Counting the return sweep, the total head movement is (199 − 53) + (199 − 0) + (37 − 0) = 146 + 199 + 37 = 382 tracks.
5. LOOK
The LOOK algorithm is a practical variant of SCAN: the arm goes only as far as the final request in each direction and then reverses immediately, without travelling all the way to the end of the disk. For the same example, with the head starting at track 53 and moving first towards the lower-numbered tracks, the head services 37 and 14, reverses at 14, and then services 65, 67, 98, 122, 124 and 183. The total head movement is (53 − 14) + (183 − 14) = 39 + 169 = 208 tracks.
6. C-LOOK
The Circular-LOOK (C-LOOK) algorithm is a variant of LOOK. With C-LOOK, the disk arm again
starts at one end and moves towards the other end, servicing requests. However, unlike LOOK,
when the arm reaches the furthest request in one direction, it doesn't service any requests on its
return to the start; it jumps directly to the first request in the queue, effectively looping around like
a circle. C-LOOK, therefore, always moves in the same direction and doesn't reverse as LOOK
does.
Example: Suppose we have a disk with 200 tracks numbered from 0 to 199. The head of the
disk is initially at track number 53, and the requests arrive in the following order: 98, 183, 37,
122, 14, 124, 65, 67. The direction of head movement is from track 0 to track 199.
The total head movement incurred while servicing these requests is (183 − 53) + (183 − 14) + (37 − 14) = 130 + 169 + 23 = 322 tracks, counting the jump back from the furthest request to the lowest pending one.
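The C-LOOK total can be computed by sweeping up to the furthest request, jumping back to the lowest pending request, and finishing the upward sweep. The sketch below counts the jump, matching the 322-track figure above, and assumes requests exist on both sides of the head.

```c
/* Sketch: total head movement under C-LOOK (upward direction).
 * Assumes at least one request on each side of the head. */
#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b) {
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

long clook_movement(long head, long *req, long n) {
    qsort(req, n, sizeof *req, cmp);      /* sort the request tracks  */
    long i = 0;
    while (i < n && req[i] < head) i++;   /* first request above head */
    long total = req[n-1] - head;         /* sweep up to the furthest */
    total += req[n-1] - req[0];           /* jump back to the lowest  */
    total += req[i-1] - req[0];           /* finish the upward sweep  */
    return total;
}

int main(void) {
    long req[] = { 98, 183, 37, 122, 14, 124, 65, 67 };
    printf("C-LOOK total: %ld tracks\n",
           clook_movement(53, req, 8));  /* prints 322 */
    return 0;
}
```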
9. SUMMARY
11. ANSWERS
Answer 1: Information in files can be accessed in many ways, usually depending on the application. The access methods include:
a. Sequential access
b. Direct access
(Refer Section 3)
Answer 2: The directory maintains information about the name, location, size and type of all files
in the partition. A directory has a logical structure. This is dependent on many factors including
operations that are to be performed on the directory like search for file/s, create a file, delete a file,
list a directory, rename a file and traverse a file system. (Refer Section 4)
Answer 3: Allocation of disk space to files is a problem concerned with how effectively disk space is utilized and how quickly files can be accessed. The three major methods of disk space allocation are:
a. Contiguous allocation
b. Linked allocation
c. Indexed allocation
(Refer Section 5)