File Systems in Operating Systems
A computer file is a medium used for saving and managing data in a computer system. The
data stored in a computer system is entirely digital, and there are various types of files that
help us store it.
File systems are a crucial part of any operating system, providing a structured way to store,
organize, and manage data on storage devices such as hard drives, SSDs, and USB drives.
Essentially, a file system acts as a bridge between the operating system and the physical
storage hardware, allowing users and applications to create, read, update, and delete files in
an organized and efficient manner.
File Directories:
A file directory is a collection of files. The directory contains information about the files,
including their attributes, location, and ownership. Much of this information, especially that
concerned with storage, is managed by the operating system. The directory is itself a file,
accessible by various file management routines.
A device directory contains the following information:
Name
Type
Address
Current length
Maximum length
Date last accessed
Date last updated
Owner id
Protection information
The operations performed on a directory are:
Search for a file
Create a file
Delete a file
List a directory
Rename a file
Traverse the file system
Advantages of Maintaining Directories
Efficiency: A file can be located more quickly.
Naming: It is convenient for users, because two users can use the same name for
different files, and the same file can have different names.
Grouping: Files can be grouped logically by their properties, e.g. all Java programs,
all games, etc.
Single-Level Directory
In this scheme, a single directory is maintained for all users.
Naming Problem: Users cannot have the same name for two files.
Grouping Problem: Users cannot group files according to their needs.
Two-Level Directory
In this scheme, a separate directory is maintained for each user.
Path Name: Because there are two levels, every file has a path name (user directory plus
file name) that locates it.
Now, different users can have files with the same name.
Searching is efficient in this method.
Tree-Structured Directory
The directory is maintained in the form of a tree. Searching is efficient, and files can also be
grouped. A file can be referred to by an absolute or a relative path name.
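A tree-structured directory can be pictured as nested directories. An absolute path is resolved starting from the root, and a relative path is resolved starting from the current directory. The sketch below models this with in-memory dictionaries. The function `resolve` and the sample tree are hypothetical, and a real file system keeps its directories on disk.

```python
from typing import Union

# A directory maps names to a sub-directory (dict) or a file (str contents).
Node = Union[dict, str]

def resolve(root: dict, cwd: list[str], path: str) -> Node:
    """Resolve an absolute ("/a/b") or relative ("b/c") path in a tree directory."""
    # Absolute paths start at the root; relative paths start at the current directory.
    parts = [] if path.startswith("/") else list(cwd)
    for name in path.split("/"):
        if name in ("", "."):
            continue              # skip empty components and "."
        if name == "..":
            if parts:
                parts.pop()       # move up one level (the root's parent is the root)
        else:
            parts.append(name)
    node: Node = root
    for name in parts:
        if not isinstance(node, dict) or name not in node:
            raise FileNotFoundError(path)
        node = node[name]
    return node

tree = {
    "alice": {"java": {"Main.java": "class Main {}"}, "games": {}},
    "bob": {"notes.txt": "hello"},
}
print(resolve(tree, [], "/alice/java/Main.java"))            # absolute path
print(resolve(tree, ["alice"], "java/Main.java"))            # relative to /alice
print(resolve(tree, ["alice", "java"], "../../bob/notes.txt"))
```

Both users can still own a file named `notes.txt`, because the two files have different absolute paths.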
File Allocation Methods
There are several file allocation methods, listed below.
Contiguous Allocation
Linked Allocation (Non-Contiguous Allocation)
Indexed Allocation
Contiguous Allocation
A single contiguous set of blocks is allocated to a file at the time of file creation. Thus, this
is a pre-allocation strategy that uses variable-size portions. The file allocation table needs just
a single entry for each file, showing the starting block and the length of the file. This
method is best from the point of view of the individual sequential file. Multiple blocks can
be read in at a time to improve I/O performance for sequential processing. It is also easy to
retrieve a single block. For example, if a file starts at block b and the ith block of the file is
wanted, its location on secondary storage is simply b + i - 1.
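The b + i - 1 formula can be checked with a short sketch. Like the text, it numbers the blocks within the file from 1. The function name `contiguous_block` and the sample values are illustrative only.

```python
def contiguous_block(start: int, length: int, i: int) -> int:
    """Return the disk block holding the i-th block (1-based) of a contiguous file."""
    if not 1 <= i <= length:
        raise IndexError("block number outside the file")
    return start + i - 1   # b + i - 1 from the text

# A file that starts at block 14 and is 3 blocks long occupies blocks 14, 15 and 16.
print([contiguous_block(14, 3, i) for i in range(1, 4)])
```

The lookup needs only the file's two table entries (start and length). That is why contiguous allocation makes retrieving a single block cheap.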
Disadvantages of Contiguous Allocation
External fragmentation will occur, making it difficult to find contiguous blocks of space
of sufficient length. A compaction algorithm will be necessary to free up additional
space on the disk.
Also, with pre-allocation, it is necessary to declare the size of the file at the time of
creation.
Linked Allocation (Non-Contiguous Allocation)
Allocation is on an individual block basis. Each block contains a pointer to the next block
in the chain. Again, the file table needs just a single entry for each file, showing the starting
block and the length of the file. Although pre-allocation is possible, it is more common
simply to allocate blocks as needed. Any free block can be added to the chain, and the blocks
need not be contiguous. A file can always grow as long as a free disk block is available.
There is no external fragmentation, because only one block is needed at a time. There can be
internal fragmentation, but only in the last disk block of the file.
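The chain of per-block pointers can be sketched with a dictionary that maps each block number to the block it points to. `END` is a hypothetical end-of-chain marker, and the block numbers are illustrative.

```python
END = -1  # hypothetical marker meaning "no next block"

def read_chain(next_ptr: dict[int, int], start: int) -> list[int]:
    """Return a file's disk blocks by following each block's next pointer."""
    blocks = []
    block = start
    while block != END:
        blocks.append(block)
        block = next_ptr[block]   # the pointer stored inside the current block
    return blocks

def append_block(next_ptr: dict[int, int], start: int, free_block: int) -> None:
    """Grow the file by linking any free block onto the end of its chain."""
    last = read_chain(next_ptr, start)[-1]
    next_ptr[last] = free_block
    next_ptr[free_block] = END

# A file scattered over blocks 9 -> 16 -> 1 -> 10; the file table records start block 9.
next_ptr = {9: 16, 16: 1, 1: 10, 10: END}
append_block(next_ptr, 9, 25)
print(read_chain(next_ptr, 9))
```

Reaching the last block requires walking the whole chain. This is the sequential-only access listed among the disadvantages below.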
Disadvantages of Linked Allocation (Non-Contiguous Allocation)
Internal fragmentation exists in the last disk block of the file.
There is an overhead of maintaining the pointer in every disk block.
If the pointer of any disk block is lost, the file will be truncated.
It supports only the sequential access of files.
Indexed Allocation
It addresses many of the problems of contiguous and chained (linked) allocation. In this case,
the file allocation table contains a separate one-level index for each file: the index has one
entry for each block allocated to the file. The allocation may be on the basis of fixed-size
blocks or variable-size portions. Allocation by fixed-size blocks eliminates external
fragmentation, whereas allocation by variable-size portions improves locality. This allocation
technique supports both sequential and direct access to the file and thus is the most popular
form of file allocation.
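Direct access under indexed allocation reduces to a single lookup in the file's index. The sketch below assumes fixed-size blocks and a one-level index stored as a list. The name `indexed_block` and the block numbers are hypothetical.

```python
def indexed_block(index: list[int], i: int) -> int:
    """Direct access: the i-th file block (1-based) is read straight from the index."""
    if not 1 <= i <= len(index):
        raise IndexError("block number outside the file")
    return index[i - 1]

index_block = [19, 2, 28, 7]   # per-file index: one entry per allocated block
print(indexed_block(index_block, 3))                          # no chain to traverse
print([indexed_block(index_block, i) for i in range(1, 5)])   # sequential access
```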
RAID
RAID (Redundant Array of Independent Disks) is like having backup copies of your
important files stored in different places on several hard drives or solid-state drives (SSDs).
With most RAID levels, if one drive stops working, your data is still safe because copies (or
enough information to rebuild them) are stored on the other drives. It’s like having a safety
net to protect your files from being lost if one of your drives breaks down.
RAID is a storage technology, used by operating systems and Database Management Systems
(DBMS) alike, that combines multiple physical disk drives into a single logical unit for data
storage. The main purpose of RAID is to improve data reliability, availability, and
performance. There are different levels of RAID, each offering a different balance of these
benefits.
Types of RAID Controller
There are three types of RAID controller:
Hardware Based: In hardware-based RAID, there’s a physical controller that manages the
whole array. This controller can handle the whole group of hard drives together. It’s
designed to work with different types of hard drives, like SATA (Serial Advanced
Technology Attachment) or SCSI (Small Computer System Interface). Sometimes, this
controller is built right into the computer’s main board, making it easier to set up and
manage your RAID system. It’s like having a captain for your team of hard drives, making
sure they work together smoothly.
Software Based: In software-based RAID, the controller doesn’t have its own special
hardware, so it uses the computer’s main processor and memory to do its job. It performs the
same functions as a hardware-based RAID controller, like managing the hard drives and
keeping your data safe. But because it shares resources with other programs on your
computer, it may slow things down. So, while it’s still helpful, it might not give you as big a
speed boost as a hardware-based RAID system.
Firmware Based: Firmware-based RAID controllers are like helpers built into the
computer’s main board. They work with the main processor, just like software-based
RAID, but they only run the array while the computer starts up. Once the operating system
is running, a special driver takes over the RAID job. These controllers aren’t as expensive as
hardware ones, but they make the computer’s main processor work harder. People also call
them hardware-assisted software RAID, hybrid model RAID, or fake RAID.
Why Data Redundancy?
Data redundancy, although it takes up extra space, adds to disk reliability. This means that,
in case of a disk failure, if the same data is also stored on another disk, we can retrieve the
data and continue operating. On the other hand, if the data is spread across multiple disks
without any redundancy, the loss of a single disk can affect the entire data set.
1. RAID-0 (Striping)
Blocks are “striped” across disks.
In the figure, blocks “0,1,2,3” form a stripe.
Instead of placing just one block on a disk at a time, we can place two (or more)
consecutive blocks on a disk before moving on to the next one.
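The striping rule can be written as a mapping from a logical block number to a disk and an offset on that disk. The sketch below assumes round-robin placement, with disks and blocks numbered from 0. The parameter `chunk` is a hypothetical name for the number of consecutive blocks placed on one disk before moving on.

```python
def raid0_location(logical: int, n_disks: int, chunk: int = 1) -> tuple[int, int]:
    """Map a logical block to (disk, block offset on that disk) under RAID-0 striping."""
    unit = logical // chunk              # which chunk the block belongs to
    disk = unit % n_disks                # chunks go round-robin across the disks
    offset = (unit // n_disks) * chunk + logical % chunk
    return disk, offset

# One block per disk: blocks 0,1,2,3 form the first stripe across 4 disks.
print([raid0_location(b, 4) for b in range(5)])
# Two blocks per disk before moving on to the next disk.
print([raid0_location(b, 4, chunk=2) for b in range(4)])
```

Consecutive stripe units land on different disks, so a large sequential read can keep every disk busy at once. No copy of any block exists elsewhere, which is why reliability is 0.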
Evaluation
Reliability: 0
There is no duplication of data. Hence, a block once lost cannot be recovered.
Capacity: N*B
The entire space is being used to store data. Since there is no duplication, N disks each
having B blocks are fully utilized.
Advantages
It is easy to implement.
It utilizes the storage capacity in a better way.
Disadvantages
A single drive loss can result in the complete failure of the system.
It’s not a good choice for a critical system.
2. RAID-1 (Mirroring)
More than one copy of each block is stored, each copy on a separate disk. Thus, every block
has two (or more) copies, lying on different disks.
Raid-1
The above figure shows a RAID-1 system with mirroring level 2.
RAID-0 cannot tolerate any disk failure, whereas RAID-1 can.
Evaluation
Assume a RAID system with mirroring level 2.
Reliability: 1 to N/2
One disk failure can always be handled, because the blocks of that disk have
duplicates on another disk. If we are lucky and disks 0 and 2 fail, this can also be
handled, as the blocks of these disks have duplicates on disks 1 and 3. So, in the best
case, N/2 disk failures can be handled.
Capacity: N*B/2
Only half the space is being used to store data. The other half is just a mirror of the
already stored data.
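The "1 to N/2" range can be checked with a small survivability test. The sketch assumes mirroring level 2 with an even number of disks, where disks 2k and 2k+1 hold identical copies. This matches the example above, in which disks 1 and 3 hold the duplicates of disks 0 and 2. The function name is hypothetical.

```python
def raid1_survives(failed: set[int], n_disks: int) -> bool:
    """Mirroring level 2 (even n_disks): disks 2k and 2k+1 hold identical copies.

    The data survives as long as no mirror pair loses both of its disks.
    """
    return all(not {d, d + 1} <= failed for d in range(0, n_disks, 2))

print(raid1_survives({0}, 4))      # one failure: always survivable
print(raid1_survives({0, 2}, 4))   # best case: N/2 = 2 failures
print(raid1_survives({0, 1}, 4))   # both copies of the same blocks are gone
```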
Advantages
It provides complete redundancy.
It can increase data security and read speed, since reads can be served from either copy.
Disadvantages
It is highly expensive.
Only half of the raw storage capacity is usable.
3. RAID-2 (Bit-Level Striping with Dedicated Parity)
In RAID-2, errors in the data are checked at the bit level. Here, we use the Hamming
code parity method to find and correct errors in the data.
It uses dedicated drives to store the parity (the Hamming error-correcting code).
The structure of RAID-2 is very complex because it uses two groups of disks: one group
stores the bits of each word, and the other stores the error-correcting code.
It is not commonly used.
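The Hamming code mentioned above can be illustrated with the classic Hamming(7,4) code. In this code, 4 data bits get 3 check bits, and any single flipped bit can be located and corrected. For the RAID-2 picture, the sketch assumes each of the 7 codeword bits is written to its own disk (4 data disks plus 3 check disks). Treat it as an illustration of the code, not of any particular controller.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4        # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4        # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4        # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(c: list[int]) -> list[int]:
    """Correct a single flipped bit (if any) and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s4   # 1-based position of the bad bit; 0 = no error
    if error_pos:
        c[error_pos - 1] ^= 1          # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
stored = hamming74_encode(word)   # bit k of the codeword goes to disk k (disks 0-6)
stored[4] ^= 1                    # simulate a faulty bit read from disk 4
print(hamming74_decode(stored))   # [1, 0, 1, 1]
```

Three of the seven disks hold only check bits. This overhead is the source of the high cost listed below.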
Advantages
It uses the Hamming code for error correction.
It uses dedicated drives to store parity.
Disadvantages
It has a complex structure and a high cost due to the extra drives.
It requires extra drives for error detection and correction.
4. RAID-3 (Byte-Level Striping with Dedicated Parity)
It consists of byte-level striping with a dedicated parity drive.
At this level, parity information is computed for each stripe and written to the dedicated
parity drive.
Whenever a drive fails, the data can be reconstructed from the parity drive together with
the surviving drives.
Raid-3
Here, Disk 3 contains the parity bits for Disk 0, Disk 1, and Disk 2. If data is lost, we can
reconstruct it using Disk 3 and the remaining data disks.
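Byte-level striping with a dedicated parity disk can be sketched as follows. The layout follows the Disk 0 to Disk 3 example above: three data disks and one parity disk. Each disk is modeled as a `bytearray`, and a short final stripe is zero-padded. Both are simplifying assumptions, and the function names are hypothetical.

```python
from functools import reduce

def raid3_write(data: bytes, n_data: int) -> list[bytearray]:
    """Stripe bytes round-robin over n_data disks and add a dedicated parity disk."""
    disks = [bytearray() for _ in range(n_data + 1)]   # the last disk holds parity
    padded = data + bytes(-len(data) % n_data)         # pad to whole stripes with zeros
    for s in range(0, len(padded), n_data):
        stripe = padded[s:s + n_data]
        for i, byte in enumerate(stripe):
            disks[i].append(byte)
        disks[n_data].append(reduce(lambda a, b: a ^ b, stripe))   # XOR parity byte
    return disks

def raid3_rebuild(disks: list[bytearray], lost: int) -> bytearray:
    """Recompute a lost disk by XOR-ing the corresponding bytes of all other disks."""
    others = [d for i, d in enumerate(disks) if i != lost]
    return bytearray(reduce(lambda a, b: a ^ b, col) for col in zip(*others))

disks = raid3_write(b"RAID-3", 3)   # disks 0-2 hold data, disk 3 holds parity
print(bytes(disks[1]))              # b'A-'
print(bytes(raid3_rebuild(disks, 1)))
```

Every read or write of a stripe involves all the data disks. This suits bulk transfers but slows down small files.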
Advantages
Data can be transferred in bulk.
Data can be accessed in parallel.
Disadvantages
It requires an additional drive for parity.
In the case of small-size files, it performs slowly.
5. RAID-4 (Block-Level Striping with Dedicated Parity)
Instead of duplicating data, this adopts a parity-based approach.
Raid-4
In the figure, we can observe one column (disk) dedicated to parity.
Parity is calculated using a simple XOR function. If the data bits are 0,0,0,1, the parity
bit is XOR(0,0,0,1) = 1. If the data bits are 0,1,1,0, the parity bit is XOR(0,1,1,0) = 0.
Put simply, an even number of ones results in parity 0, and an odd number of ones
results in parity 1.
Raid-4
Assume that in the above figure, C3 is lost due to some disk failure. Then, we can
recompute the data bit stored in C3 by XOR-ing the values of all the other columns with
the parity bit. This allows us to recover lost data.
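The parity calculation and the recovery of C3 can both be checked with XOR. Following the examples above, the sketch assumes four data columns (C0 to C3) plus one parity column. The XOR of all four data bits and the parity bit is always 0, so any single missing bit equals the XOR of the others. The function names are illustrative.

```python
from functools import reduce
from operator import xor

def parity(bits: list[int]) -> int:
    """XOR of all bits: 1 if the number of ones is odd, 0 if it is even."""
    return reduce(xor, bits, 0)

def recover_lost_bit(surviving: list[int], parity_bit: int) -> int:
    """The lost bit is the XOR of the surviving data bits and the parity bit."""
    return parity(surviving + [parity_bit])

for row in ([0, 0, 0, 1], [0, 1, 1, 0]):    # bits in columns C0..C3
    p = parity(row)                          # stored on the parity disk
    c3 = recover_lost_bit(row[:3], p)        # pretend C3 was lost
    print(row, "parity", p, "recovered C3", c3)
```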
Evaluation
Reliability: 1
RAID-4 allows recovery of at most 1 disk failure (because of the way parity works). If
more than one disk fails, there is no way to recover the data.
Capacity: (N-1)*B
One disk in the system is reserved for storing the parity. Hence, (N-1) disks are made
available for data storage, each disk having B blocks.
Advantages
It helps in reconstructing the data if at most one disk is lost.
Disadvantages
It cannot reconstruct the data when more than one disk is lost.
6. RAID-5 (Block-Level Striping with Distributed Parity)
This is a slight modification of the RAID-4 system where the only difference is that the
parity rotates among the drives.
Raid-5
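In the figure, we can notice how the parity block “rotates” among the disks.
This was introduced to improve random write performance. In RAID-4, every write must
also update the single parity disk, which becomes a bottleneck.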
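The rotation can be made concrete by generating a layout table. The sketch assumes that the parity for stripe s lives on disk (N - 1 - s) mod N. This is only one of several rotation schemes used in practice, and the function name and "D"/"P" labels are hypothetical.

```python
def raid5_layout(n_disks: int, n_stripes: int) -> list[list[str]]:
    """Place data blocks (Dk) and parity blocks (Ps) so parity moves every stripe.

    Assumes the parity of stripe s lives on disk (n_disks - 1 - s) % n_disks.
    """
    rows, block = [], 0
    for s in range(n_stripes):
        p_disk = (n_disks - 1 - s) % n_disks
        row = []
        for d in range(n_disks):
            if d == p_disk:
                row.append(f"P{s}")
            else:
                row.append(f"D{block}")
                block += 1
        rows.append(row)
    return rows

for row in raid5_layout(4, 4):
    print(" ".join(f"{cell:>3}" for cell in row))
```

With 4 disks and 4 stripes, each disk holds exactly one parity block. Writes to different stripes can therefore update parity on different disks instead of queueing on one.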
Evaluation
Reliability: 1
RAID-5 allows recovery of at most 1 disk failure (because of the way parity works). If
more than one disk fails, there is no way to recover the data. This is identical to RAID-4.
Capacity: (N-1)*B
Overall, space equivalent to one disk is utilized in storing the parity. Hence, (N-1) disks
are made available for data storage, each disk having B blocks.
Advantages
Data can be reconstructed using parity bits.
It improves performance compared with RAID-4, especially for random writes, because
parity updates are spread across all drives.
Disadvantages
Its technology is complex, and extra space is required.
If two disks fail, the data will be lost.
7. RAID-6 (Block-Level Striping with Two Parity Blocks)
RAID-6 helps when there is more than one disk failure: it can survive up to two
simultaneous disk failures. A pair of independent parities is generated for each stripe and
stored on different disks. At least four disk drives are needed for this level.
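The capacity figures given for each level can be collected into one helper. It uses N disks of B blocks each, and RAID-1 assumes mirroring level 2, as in the evaluation above. The RAID-6 figure of (N-2)*B follows from the two parities described here, which use the space of two disks. The function name is hypothetical.

```python
def usable_blocks(level: int, n: int, b: int) -> int:
    """Usable capacity of n disks with b blocks each (RAID-1: mirroring level 2)."""
    if level == 0:
        return n * b              # no redundancy, all space holds data
    if level == 1:
        return n * b // 2         # half the space mirrors the other half
    if level in (4, 5):
        return (n - 1) * b        # one disk's worth of parity
    if level == 6:
        if n < 4:
            raise ValueError("RAID-6 needs at least four disks")
        return (n - 2) * b        # two disks' worth of parity
    raise ValueError(f"level {level} is not covered by this sketch")

for level in (0, 1, 4, 5, 6):
    print(f"RAID-{level}:", usable_blocks(level, 6, 1000), "usable blocks")
```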
There are also hybrid RAIDs, which make use of more than one RAID level nested one
after the other, to fulfill specific requirements.
Raid-6
Advantages
Very high data availability.
Fast read transactions.
Disadvantages
Due to double parity, write transactions are slow.
Extra space is required.
Advantages of RAID
Data redundancy: By keeping copies of the data, or parity information, on multiple disks,
RAID can shield data from disk failures.
Performance enhancement: RAID can enhance performance by distributing data over
several drives, enabling several read/write operations to execute simultaneously.
Scalability: RAID is scalable; the storage capacity can be expanded by adding more disks
to the array.
Versatility: RAID is applicable to a wide range of devices, such as workstations,
servers, and personal computers.
Disadvantages of RAID
Cost: RAID implementation can be costly, particularly for arrays with large capacities.
Complexity: The setup and management of RAID might be challenging.
Decreased performance: The parity calculations necessary for some RAID
configurations, including RAID 5 and RAID 6, may result in a decrease in speed.
Single point of failure: Although RAID offers data redundancy, it is not a comprehensive
backup solution. The array’s whole contents could be lost if the RAID controller
malfunctions.
FILE SYSTEM STRUCTURE:
A file system provides efficient access to the disk by allowing data to be stored, located, and
retrieved in a convenient way. A file system must be able to store a file, locate it, and
retrieve it.
Most operating systems use a layered approach for every task, including file systems.
Each layer of the file system is responsible for some activities.
The image shown below illustrates how the file system is divided into different layers, and
also the functionality of each layer.
o When an application program asks for a file, the request is first directed to the logical
file system. The logical file system contains the metadata of the file and the directory
structure. If the application program doesn't have the required permissions for the file,
this layer throws an error. The logical file system also verifies the path to the file.
o Generally, files are divided into various logical blocks. Files are stored on and retrieved
from the hard disk, which is divided into tracks and sectors. Therefore, to store and
retrieve files, the logical blocks need to be mapped to physical blocks. This mapping is
done by the file organization module, which is also responsible for free-space management.
o Once the file organization module has decided which physical block the application
program needs, it passes this information to the basic file system. The basic file system is
responsible for issuing commands to I/O control in order to fetch those blocks.
o I/O control consists of the code used to access the hard disk. This code is known as
device drivers. I/O control is also responsible for handling interrupts.
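The flow of a read request through the four layers can be sketched as a chain of objects, each calling only the layer below it. The sketch models the "disk" as a dictionary, and it leaves out free-space management and interrupt handling. All class names mirror the layer names above but are hypothetical, since real layers live in kernel code.

```python
class IOControl:
    """Lowest layer: stands in for the device driver that reads physical blocks."""
    def __init__(self, disk: dict[int, bytes]):
        self.disk = disk                      # physical block number -> contents

    def read(self, physical: int) -> bytes:
        return self.disk[physical]

class BasicFileSystem:
    """Issues read commands for physical blocks to the I/O control layer."""
    def __init__(self, io: IOControl):
        self.io = io

    def fetch(self, physical: int) -> bytes:
        return self.io.read(physical)

class FileOrganizationModule:
    """Maps each file's logical block numbers to physical block numbers."""
    def __init__(self, basic: BasicFileSystem, block_map: dict[str, list[int]]):
        self.basic = basic
        self.block_map = block_map            # path -> physical block per logical block

    def read_block(self, path: str, logical: int) -> bytes:
        return self.basic.fetch(self.block_map[path][logical])

class LogicalFileSystem:
    """Top layer: keeps metadata and checks the path and the caller's permissions."""
    def __init__(self, org: FileOrganizationModule, readers: dict[str, set[str]]):
        self.org = org
        self.readers = readers                # path -> users allowed to read it

    def read(self, user: str, path: str, logical: int) -> bytes:
        if path not in self.readers:
            raise FileNotFoundError(path)
        if user not in self.readers[path]:
            raise PermissionError(f"{user} may not read {path}")
        return self.org.read_block(path, logical)

disk = {7: b"hello ", 3: b"world"}
fs = LogicalFileSystem(
    FileOrganizationModule(BasicFileSystem(IOControl(disk)), {"/notes.txt": [7, 3]}),
    {"/notes.txt": {"alice"}},
)
print(fs.read("alice", "/notes.txt", 1))
```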
File Access Methods:
Let's look at various ways to access files stored in secondary memory.
1 Sequential Access
Most operating systems access files sequentially. In other words, most files need to be
accessed sequentially by the operating system.
In sequential access, the OS reads the file word by word. A pointer is maintained which
initially points to the base address of the file. When the user reads the first word of the file,
the pointer provides that word to the user and then advances by one word. This process
continues until the end of the file.
Modern operating systems do provide direct access and indexed access, but the most used
method is sequential access, because most files, such as text files, audio files, and video
files, need to be accessed sequentially.
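The forward-only pointer described above can be sketched as a tiny class that models each word as a list element. The class name `SequentialFile` is hypothetical.

```python
from typing import Optional

class SequentialFile:
    """A file read one word at a time through a pointer that only moves forward."""

    def __init__(self, words: list[str]):
        self.words = words
        self.pointer = 0          # initially points to the base address of the file

    def read_next(self) -> Optional[str]:
        if self.pointer >= len(self.words):
            return None           # end of file reached
        word = self.words[self.pointer]
        self.pointer += 1         # advance the pointer by one word
        return word

f = SequentialFile(["text", "audio", "video"])
while (word := f.read_next()) is not None:
    print(word)
```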
2 Direct Access
Direct access is mostly required in database systems. In most cases, we need filtered
information from the database, and sequential access can be very slow and inefficient in
such cases.
Suppose every block of storage holds 4 records and we know that the record we need is
stored in the 10th block. In that case, sequential access is unsuitable, because it would
traverse all the preceding blocks before reaching the needed record.
Direct access gives the required result, even though the operating system has to perform
some extra work, such as determining the desired block number. This is generally done in
database applications.
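Direct access works because the block number can be computed from the record number. The sketch uses the 4-records-per-block figure from the example, with records and blocks numbered from 0. It also assumes a fixed `RECORD_SIZE` of 16 bytes (a hypothetical value) so that `seek()` can jump straight to a record. An in-memory `io.BytesIO` stands in for a file opened in binary mode.

```python
import io

RECORDS_PER_BLOCK = 4      # from the example in the text
RECORD_SIZE = 16           # hypothetical fixed record size in bytes

def block_of(record_no: int) -> int:
    """0-based block number holding a 0-based record number."""
    return record_no // RECORDS_PER_BLOCK

def read_record(f, record_no: int) -> bytes:
    """Jump straight to the record with seek() instead of reading everything before it."""
    f.seek(record_no * RECORD_SIZE)
    return f.read(RECORD_SIZE)

# The 10th block (index 9) holds records 36-39.
print(block_of(36), block_of(39))

data = b"".join(f"record {i:02d}".encode().ljust(RECORD_SIZE) for i in range(40))
with io.BytesIO(data) as f:
    print(read_record(f, 37).rstrip())
```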
3 Indexed Access
If a file can be sorted on any of its fields, an index can be assigned to a group of records,
and a particular record can then be accessed through its index. The index is simply the
address of a record in the file.
With indexed access, searching in a large database becomes quick and easy, but we need
some extra space in memory to store the index values.
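An index over a sorted field can be searched with binary search, which returns the record's address without scanning the file. The sketch uses Python's standard `bisect` module. The keys and byte addresses are hypothetical, and the two lists are the "extra space" the index costs.

```python
from bisect import bisect_left
from typing import Optional

# Hypothetical index: sorted key values and the address (byte offset) of each record.
index_keys = [101, 205, 342, 478, 590]
index_addrs = [0, 64, 128, 192, 256]

def lookup(key: int) -> Optional[int]:
    """Binary-search the sorted index and return the record's address, if present."""
    pos = bisect_left(index_keys, key)
    if pos < len(index_keys) and index_keys[pos] == key:
        return index_addrs[pos]
    return None

print(lookup(342))
print(lookup(300))
```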
Allocation Methods
There are various methods that can be used to allocate disk space to files. Selecting an
appropriate allocation method significantly affects the performance and efficiency of the
system, because the allocation method determines how the disk is utilized and how files are
accessed.
The following methods can be used for allocation.
1. Contiguous Allocation
2. Extents
3. Linked Allocation
4. Clustering
5. FAT
6. Indexed Allocation
7. Linked Indexed Allocation
8. Multilevel Indexed Allocation
9. Inode
We will discuss three of the most used methods in detail.
1 Contiguous Allocation
If blocks are allocated to a file in such a way that all the logical blocks of the file get
contiguous physical blocks on the hard disk, the allocation scheme is known as contiguous
allocation.
In the image shown below, there are three files in the directory. The starting block and the
length of each file are given in the table. We can check in the table that contiguous blocks
are assigned to each file as per its need.
Advantages
1. It is simple to implement.
2. It gives excellent read performance.
3. It supports random access into files.
Disadvantages
1. The disk becomes externally fragmented.
2. It may be difficult for a file to grow.
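External fragmentation can be demonstrated with a simple first-fit allocator. The sketch models the disk as a list of free/used flags. First fit is one common placement policy, assumed here for illustration. After a file in the middle is deleted, enough free blocks exist in total, but no contiguous run is long enough for a new 4-block file.

```python
from typing import Optional

def first_fit(free: list[bool], length: int) -> Optional[int]:
    """Allocate the first run of `length` contiguous free blocks; return its start."""
    if length <= 0:
        raise ValueError("length must be positive")
    run = 0
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == length:
            start = i - length + 1
            for j in range(start, i + 1):
                free[j] = False       # mark the blocks as allocated
            return start
    return None

disk = [True] * 10
a = first_fit(disk, 3)      # blocks 0-2
b = first_fit(disk, 3)      # blocks 3-5
c = first_fit(disk, 3)      # blocks 6-8
for blk in range(3, 6):
    disk[blk] = True        # delete file b, leaving a 3-block hole
print(disk.count(True))     # free blocks in total
print(first_fit(disk, 4))   # no 4 contiguous free blocks exist
```

The same problem explains why a file is hard to grow: the blocks right after it are often already taken.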
2 Linked List Allocation
Linked list allocation solves the main problems of contiguous allocation (external
fragmentation and file growth). In linked list allocation, each file is treated as a linked list of
disk blocks. The disk blocks allocated to a particular file need not be contiguous on the
disk. Each disk block allocated to a file contains a pointer to the next disk block allocated to
the same file.
Advantages
1. There is no external fragmentation with linked allocation.
2. Any free block can be utilized in order to satisfy the file block requests.
3. File can continue to grow as long as the free blocks are available.
4. Directory entry will only contain the starting block address.
Disadvantages
1. Random Access is not provided.
2. Pointers require some space in the disk blocks.
3. If any pointer in the linked list is broken, the file will be corrupted.
4. Reaching a block requires traversing every block before it.
3 Indexed Allocation
Indexed Allocation Scheme
Instead of maintaining a file allocation table of all the disk pointers, the indexed allocation
scheme stores all the disk pointers for a file in one block, called the index block. The index
block doesn't hold file data; it holds pointers to all the disk blocks allocated to that
particular file. The directory entry contains only the index block address.
Advantages
1. Supports direct access
2. A bad data block causes the loss of only that block.
Disadvantages
1. A bad index block could cause the loss of the entire file.
2. The maximum size of a file depends on the number of pointers an index block can hold.
3. Having an index block for a small file is wasteful.
4. There is more pointer overhead.
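The limit on file size follows from simple arithmetic. An index block of size S with pointers of size p holds S / p pointers, each naming one data block of size S. The block and pointer sizes below are hypothetical example values.

```python
def max_file_size(block_size: int, pointer_size: int) -> int:
    """Largest file (in bytes) addressable by a single index block."""
    pointers_per_block = block_size // pointer_size
    return pointers_per_block * block_size

print(max_file_size(4096, 4))   # 1024 pointers x 4 KiB blocks
print(max_file_size(512, 4))    # 128 pointers x 512-byte blocks
```

For a one-block file, the same scheme still spends a whole index block, so the file occupies two blocks. This is the waste mentioned for small files.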
Disk Data Structures:
There are various on-disk data structures that are used to implement a file system. These
structures may vary depending on the operating system.
1. Boot Control Block
Boot Control Block contains all the information which is needed to boot an operating
system from that volume. It is called boot block in UNIX file system. In NTFS, it is
called the partition boot sector.
2. Volume Control Block
The volume control block contains all the information regarding that volume, such as the
number of blocks, the size of each block, the partition table, and pointers to free blocks and
free FCB blocks. In the UNIX file system, it is known as the superblock. In NTFS, this
information is stored inside the master file table.
3. Directory Structure (per file system)
A directory structure (per file system) contains file names and pointers to
corresponding FCBs. In UNIX, it includes the inode numbers associated with file names.
4. File Control Block
File Control block contains all the details about the file such as ownership details,
permission details, file size,etc. In UFS, this detail is stored in inode. In NTFS, this
information is stored inside master file table as a relational database structure. A
typical file control block is shown in the image below.
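On a UNIX-like system, many of the FCB (inode) fields described above can be read from Python with the standard-library call `os.stat`. The sketch creates a temporary file and prints a few of those fields. On Windows, some fields, such as the owner uid, are not meaningful.

```python
import os
import stat
import tempfile

# Create a small file, then read back the metadata the file system keeps for it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    path = tmp.name

info = os.stat(path)
print("inode number:", info.st_ino)
print("owner uid:", info.st_uid)
print("permissions:", stat.filemode(info.st_mode))
print("size in bytes:", info.st_size)   # 5
print("last modified:", info.st_mtime)
os.remove(path)
```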