OS_FileSystem
Introduction
A file is a collection of related information. Files are organized into a structure called the
file-system. The file systems reside on secondary storage (disks). The file systems provide
efficient and convenient access to the disk by allowing data to be stored, located and retrieved
easily. In this module, we study the different layers of a layered file system, the on-disk and in-memory structures used to implement file systems, and issues related to the implementation of directories.
File-System structure
A file system is organized into many layers. The different layers of a file system are shown
in Figure 34.1. The application programs written by users are shown in the topmost layer.
User programs use the logical file system. The logical file system manages its data
structures and invokes the file-organization module. The file-organization module uses the basic file
system, which in turn uses I/O control, which in turn accesses the I/O devices. The
functionality of each of these layers is described below, starting from the lowest layer.
I/O control
This layer is placed just above the I/O devices. It consists of the device drivers and the interrupt handlers. The device drivers act as an interface between the devices and the operating system, and they transfer information between the main memory and the disk. An example of input given to a device driver is ‘retrieve block 123’. In response, the device driver sends low-level, hardware-specific instructions to the disk controller, which then reads block 123 from the disk.
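To make a block-numbered request concrete, here is a minimal Python sketch. It assumes a disk that behaves like a seekable byte stream and a fixed block size of 512 bytes; both are assumptions, since a real driver programs a disk controller rather than seeking in a file. The names `read_block` and `write_block` are hypothetical. They only show how a request such as ‘retrieve block 123’ maps to a byte offset of 123 × block size.

```python
import io

BLOCK_SIZE = 512  # assumed block size in bytes


def read_block(disk, block_no: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Return the contents of physical block `block_no` from a seekable disk object."""
    disk.seek(block_no * block_size)      # byte offset where the block starts
    data = disk.read(block_size)
    if len(data) != block_size:
        raise OSError(f"short read at block {block_no}")
    return data


def write_block(disk, block_no: int, data: bytes, block_size: int = BLOCK_SIZE) -> None:
    """Overwrite physical block `block_no` with exactly one block of data."""
    if len(data) != block_size:
        raise ValueError("data must be exactly one block long")
    disk.seek(block_no * block_size)
    disk.write(data)


# Usage: an in-memory 200-block "disk" standing in for real hardware.
disk = io.BytesIO(bytes(200 * BLOCK_SIZE))
write_block(disk, 123, b"A" * BLOCK_SIZE)
print(read_block(disk, 123)[:4])   # b'AAAA'
```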
1. Basic file system
This layer issues generic commands to the appropriate device driver to read and write physical blocks on the disk; these commands are carried out by the I/O control layer below it. The basic file system layer is also responsible for managing the memory buffers and caches that the operating system maintains in main memory. Each memory buffer holds the contents of one disk block. Blocks read from the disk are copied into buffers, where they can be reused later. The cache holds frequently used file-system metadata, such as the attributes of a file (the owner of the file, the size of the file and so on). If a file is used frequently, its metadata can be taken from the cache instead of being read from the disk each time.
2. File-organization module
This layer uses the functionality of the basic file system layer. It knows about files and their logical and physical blocks. Logical blocks are numbered relative to a particular file, from 0 to N. Logical block numbers need not match physical block numbers: logical block i need not be kept in physical block i on the disk, but can be kept in any physical disk block. Therefore, it is necessary to know the location of the file on the disk (that is, the locations of the disk blocks where the contents of the file are kept). The file-organization module maintains the data structures that map each logical block number to its physical block number. The file-organization module also has a free-space manager, which tracks unallocated disk blocks.
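A toy Python model of the two jobs of this layer follows. It assumes a plain list per file as the logical-to-physical map and a list of booleans as the free-space bitmap; real systems use the structures described later under allocation methods. All class and method names are hypothetical.

```python
from typing import Dict, List


class FileOrganizationModule:
    """Toy model: per-file logical-to-physical maps plus a free-space bitmap."""

    def __init__(self, total_blocks: int):
        self.free = [True] * total_blocks           # free-space manager: True = unallocated
        self.block_map: Dict[str, List[int]] = {}   # file -> physical block of logical block i

    def allocate_block(self) -> int:
        for b, is_free in enumerate(self.free):
            if is_free:
                self.free[b] = False
                return b
        raise OSError("disk full")

    def append_block(self, name: str) -> int:
        """Give file `name` one more logical block and return its physical number."""
        phys = self.allocate_block()
        self.block_map.setdefault(name, []).append(phys)
        return phys

    def logical_to_physical(self, name: str, logical: int) -> int:
        return self.block_map[name][logical]

    def free_file(self, name: str) -> None:
        for b in self.block_map.pop(name, []):
            self.free[b] = True
```

When two files grow at the same time, their blocks interleave on the disk, so logical block 1 of the first file can end up in physical block 2; the map is what hides this from the layers above.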
3. Logical file system
This layer lies above the file-organization module. This layer manages the metadata
information of a file. The metadata information includes all details about a file except the
actual contents of the file, for example, the name of the file, the size of the file and so on.
This layer also manages the directory structure. It maintains the file structure via file-control
blocks. A file-control block (FCB) (inode in UNIX) has information about a file – owner,
size, permissions, time of access, location of file contents and so on.
Advantages
Duplication of code is minimized. The code for the I/O control and basic file-system layers can be shared by multiple file systems, and only the layers above them need to differ. That is, each file system can have its own logical file system and file-organization module while sharing the same I/O control and basic file-system layers.
Disadvantages
Having a layered file system can introduce more operating-system overhead, resulting in
decreased performance. Deciding how many layers to use and what each layer should do is also a challenge.
Many file systems are in use today, for example the UNIX file system (UFS), FAT, FAT32, NTFS, ext3 and ext4. FAT, FAT32 and NTFS are used in Windows operating systems. Ext3 and ext4 are used in Linux operating systems. Google has its own distributed file system called the Google File System (GFS).
File-System Implementation
In this section, we study the data structures that are used in the implementation
of file systems. There are several on-disk and in-memory structures used for the
implementation of file systems. The on-disk structures are kept in the disks. The on-disk
structures contain information about how to boot an OS stored in the disk, total number of
disk blocks, number and location of free disk blocks, directory structure and individual files.
The in-memory structures are kept in the main memory. The in-memory structures are helpful
for file-system management, caching and so on.
On-Disk Structures
1 Boot control block
The boot control block contains the information needed to boot an operating system from the volume. If the disk has no operating system, this block can be empty. In UNIX, the boot control block is called the boot block. In NTFS, it is called the partition boot sector. The boot control block is usually the first block of the volume where the file system is kept.
2 Volume control block
This data structure contains details about a volume (partition). The information maintained in the volume control block includes the number of blocks in the partition, the size of each block, the number of free blocks in the partition, free-block pointers (addresses of free disk blocks) and so on. In UNIX, the volume control block is called a superblock and is the block next to the boot block. In NTFS, these details are stored in the master file table.
3 Directory structure
The directory structure is used to organize files. In the directory structure, the names of
files and associated information are kept. In UNIX, the directory structure includes file names
and associated inode numbers. In NTFS, these details are stored in the master file table.
4 Per-file file control block (FCB)
For each and every file, information about that file is maintained in a file control block.
The FCB has a unique identifier number to allow association with a directory entry. In UNIX,
the per-file file control block is nothing but the inode. The inode has an inode number, which
is the unique identifier. In NTFS, the details about the file are stored in the master file table.
In-Memory Structures
These are data structures that are maintained inside the main memory.
• Mount table – Information about each mounted volume is maintained in the mount
table.
• Directory-structure cache – Holds directory information of recently accessed
directories. If the same directory has to be accessed again, it is not necessary to read from the
disk. The details can be taken from the directory-structure cache.
• System-wide open-file table – This data structure is common to all the processes
present in the system. This contains a copy of the FCB of each open file.
• Per-process open-file table – This is a table that is available for each and every process.
This per-process open-file table points to the appropriate entry in the system-wide open-file
table.
• Buffers – These are buffers that are kept in the memory to hold the contents of disk
blocks.
Each buffer can hold the contents of one disk block. When contents of a disk block are
read from the disk, they are stored in these buffers kept in memory. If the contents of the
same disk block are needed again, the contents are taken from the buffer and need not be read
from the disk. Similarly, the contents present in the buffer may be modified and need not be
written to the disk for each and every modification. It is enough to write the contents of the
buffer to the disk when the buffer is to be used for some other disk block contents.
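The buffering policy just described (reuse a buffered block, and write a modified buffer back only when the buffer is needed for another block) can be sketched in Python as a small write-back cache. The least-recently-used replacement policy, the dict standing in for the disk, and the class name `BufferCache` are assumptions for illustration.

```python
from collections import OrderedDict
from typing import Dict


class BufferCache:
    """Toy write-back buffer cache: one buffer per disk block, LRU replacement."""

    def __init__(self, disk: Dict[int, bytes], capacity: int):
        self.disk = disk                 # stand-in for the disk: block number -> contents
        self.capacity = capacity         # number of buffers
        self.buffers = OrderedDict()     # block number -> [data, dirty flag]
        self.disk_reads = 0
        self.disk_writes = 0

    def _make_room(self) -> None:
        if len(self.buffers) >= self.capacity:
            victim, (data, dirty) = self.buffers.popitem(last=False)  # least recently used
            if dirty:                    # write back only when the buffer is reused
                self.disk[victim] = data
                self.disk_writes += 1

    def read(self, block_no: int) -> bytes:
        if block_no in self.buffers:     # hit: no disk access
            self.buffers.move_to_end(block_no)
        else:                            # miss: read the block into a buffer
            self._make_room()
            self.buffers[block_no] = [self.disk[block_no], False]
            self.disk_reads += 1
        return self.buffers[block_no][0]

    def write(self, block_no: int, data: bytes) -> None:
        if block_no in self.buffers:
            self.buffers.move_to_end(block_no)
        else:
            self._make_room()
        self.buffers[block_no] = [data, True]   # modified in memory only (dirty)

    def flush(self) -> None:
        """Write every dirty buffer back, e.g. before unmounting."""
        for block_no, buf in self.buffers.items():
            if buf[1]:
                self.disk[block_no] = buf[0]
                self.disk_writes += 1
                buf[1] = False
```

Repeated modifications of the same block cost no disk writes until the buffer is evicted or flushed, which is the saving the paragraph above describes.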
File Operations
We now see how the data structures described in the previous sections are used during
file operations.
1 Create a File
The application program calls the logical file system and gives it the name of the file to be created. The logical file system knows the format of the directory structures. From the file name given by the application program, it determines the directory in which the file is to be created. The logical file system allocates a new FCB (in UNIX, a new inode). The system then reads the appropriate parent directory into memory, updates it with the new file name and FCB, and writes the directory back to disk. Figure 34.2 shows a typical file control block. The FCB has information about the file like the owner of the file, file size, file permissions and so on.
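The create steps can be sketched in Python as follows. The `FCB` fields mirror the attributes listed in the text; the class names, the in-memory dict of directories standing in for on-disk directories, and the starting FCB number are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List


@dataclass
class FCB:
    """Hypothetical file control block (inode-like)."""
    number: int                                        # unique identifier, like an inode number
    owner: str
    permissions: int = 0o644
    size: int = 0
    blocks: List[int] = field(default_factory=list)    # location of file contents
    mtime: float = field(default_factory=time.time)


class LogicalFileSystem:
    def __init__(self):
        self._numbers = count(2)                       # arbitrary starting FCB number
        self.fcbs: Dict[int, FCB] = {}
        # Stand-in for on-disk directories: directory path -> {file name: FCB number}
        self.directories: Dict[str, Dict[str, int]] = {"/": {}}

    def create(self, path: str, owner: str) -> FCB:
        parent, _, name = path.rpartition("/")
        directory = self.directories[parent or "/"]    # "read the parent directory"
        if name in directory:
            raise FileExistsError(path)
        fcb = FCB(number=next(self._numbers), owner=owner)   # allocate a new FCB
        self.fcbs[fcb.number] = fcb
        directory[name] = fcb.number                   # add the name -> FCB entry
        # A real file system would now write the updated directory back to disk.
        return fcb
```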
2 Open a File
When an application program calls open(), the file name is passed to the logical file system. The call first searches the system-wide open-file table to see if the file is already in use by another process. If it is, a per-process open-file table entry is created pointing to the existing system-wide open-file table entry, and the open count in that system-wide entry is incremented. If the file is not already open, the directory structure is searched for the given file name. Parts of the directory structure may be cached in memory; if the needed part is in the cache, it is taken from there, otherwise it is read from the disk.
Once the file is found, its FCB is copied into an entry in the system-wide open-file table in memory. This entry also keeps track of the number of processes that have opened the file. Next, an entry is made in the per-process open-file table, pointing to the system-wide entry. The per-process entry also records where the next read or write should take place in the file (the file offset) and the access mode in which the file is open. open() returns a pointer to (an index of) the entry in the per-process open-file table, and all file operations after the open() system call use this pointer. In UNIX this pointer is called the file descriptor (in Windows, the file handle).
3 Close a File
When a process closes a file, the per-process open-file table’s entry is removed. The count
(count of the number of processes using the file) in the system-wide open-file table entry is
decremented. When the count becomes zero, the updated metadata is copied to the directory
structure in the disk and the system-wide open-file table entry is removed.
Figure (a) shows that the open call accesses the directory structure in memory. If the
directory structure is not cached in memory, it is read from the disk. The file control block is
accessed using the directory structure. If a copy of the FCB is not present in the memory, it is
read from the disk.
Figure (b) shows how the read system call uses the in-memory data structures. The read
uses the index returned by the open call to access the entry in the per-process open-file table.
The entry in the system-wide open-file table is obtained using the pointer from the per-
process open-file table. The file control block is accessed from the system-wide open-file
table and the data blocks are accessed using the entries in the FCB.
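The open, read and close paths can be sketched together in Python. In this sketch the file offset and access mode are kept in the per-process entry, and the FCB is a dict with hypothetical `size` and `blocks` keys; the class names, the dicts standing in for the on-disk directory and data blocks, and the lowest-free-slot descriptor policy are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SystemWideEntry:
    name: str
    fcb: dict               # in-memory copy of the file's FCB
    open_count: int = 0     # number of processes that have the file open


@dataclass
class PerProcessEntry:
    system_entry: SystemWideEntry   # pointer into the system-wide table
    offset: int = 0                 # where the next read/write happens
    mode: str = "r"                 # access mode


class FileTables:
    def __init__(self, directory: Dict[str, dict], blocks: Dict[int, bytes], block_size: int):
        self.directory = directory      # stand-in for the on-disk directory: name -> FCB
        self.blocks = blocks            # stand-in for the disk: block number -> data
        self.block_size = block_size
        self.system_table: Dict[str, SystemWideEntry] = {}

    def open(self, proc_table: List[Optional[PerProcessEntry]], name: str, mode: str = "r") -> int:
        entry = self.system_table.get(name)
        if entry is None:                        # not open anywhere: search the directory
            entry = SystemWideEntry(name, dict(self.directory[name]))
            self.system_table[name] = entry
        entry.open_count += 1
        new = PerProcessEntry(entry, 0, mode)
        for fd, slot in enumerate(proc_table):   # reuse the lowest free slot
            if slot is None:
                proc_table[fd] = new
                return fd
        proc_table.append(new)
        return len(proc_table) - 1               # the index acts as the file descriptor

    def read(self, proc_table, fd: int, nbytes: int) -> bytes:
        pe = proc_table[fd]                      # per-process entry
        fcb = pe.system_entry.fcb                # system-wide entry -> FCB
        out = bytearray()
        while nbytes > 0 and pe.offset < fcb["size"]:
            logical, within = divmod(pe.offset, self.block_size)
            data = self.blocks[fcb["blocks"][logical]]   # FCB maps logical -> physical
            n = min(nbytes, self.block_size - within, fcb["size"] - pe.offset)
            chunk = data[within:within + n]
            if not chunk:
                break
            out += chunk
            pe.offset += len(chunk)
            nbytes -= len(chunk)
        return bytes(out)

    def close(self, proc_table, fd: int) -> None:
        entry = proc_table[fd].system_entry
        proc_table[fd] = None                    # remove the per-process entry
        entry.open_count -= 1
        if entry.open_count == 0:                # last close: write metadata back
            self.directory[entry.name] = entry.fcb
            del self.system_table[entry.name]
```

Two processes that open the same file share one system-wide entry but keep independent offsets, and the system-wide entry disappears only on the last close.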
Partitions and Mounting
Physical disks are commonly divided into smaller units called partitions. They can also be combined
into larger units, but that is most commonly done for RAID installations and is left for later chapters.
Partitions can either be used as raw devices ( with no structure imposed upon them ), or they can be
formatted to hold a filesystem ( i.e. populated with FCBs and initial directory structures as
appropriate. ) Raw partitions are generally used for swap space, and may also be used for certain
programs such as databases that choose to manage their own disk storage system. Partitions
containing filesystems can generally only be accessed using the file system structure by ordinary
users, but can often be accessed as a raw device also by root.
The boot block is accessed as part of a raw partition, by the boot program prior to any operating
system being loaded. Modern boot programs understand multiple OSes and filesystem formats, and
can give the user a choice of which of several available systems to boot.
The root partition contains the OS kernel and at least the key portions of the OS needed to
complete the boot process. At boot time the root partition is mounted, and control is transferred from
the boot program to the kernel found there. ( Older systems required that the root partition lie
completely within the first 1024 cylinders of the disk, because that was as far as the boot program
could reach. Once the kernel had control, then it could access partitions beyond the 1024 cylinder
boundary. )
Continuing with the boot process, additional filesystems get mounted, adding their information into
the appropriate mount table structure. As a part of the mounting process the file systems may be
checked for errors or inconsistencies, either because they are flagged as not having been closed
properly the last time they were used, or just on general principles. Filesystems may be mounted
either automatically or manually. In UNIX a mount point is indicated by setting a flag in the in-
memory copy of the inode, so all future references to that inode get re-directed to the root directory of
the mounted filesystem.
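The redirection through a mount point can be sketched in Python. Each in-memory inode carries an optional reference to the root of a filesystem mounted on it, standing in for the flag described above, and path lookup follows that reference whenever it lands on a mount point. The names `Inode`, `mount` and `lookup` are hypothetical, and the sketch ignores permissions, `..` and symbolic links.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class Inode:
    number: int
    fs: str                                                    # filesystem it belongs to
    entries: Dict[str, "Inode"] = field(default_factory=dict)  # directory contents
    mounted_root: Optional["Inode"] = None                     # set while something is mounted here


def mount(mount_point: Inode, fs_root: Inode) -> None:
    mount_point.mounted_root = fs_root       # mark the in-memory inode as a mount point


def cross_mounts(inode: Inode) -> Inode:
    while inode.mounted_root is not None:    # redirect to the mounted filesystem's root
        inode = inode.mounted_root
    return inode


def lookup(root: Inode, path: str) -> Inode:
    node = cross_mounts(root)
    for name in path.strip("/").split("/"):
        if name:                             # skip empty components, e.g. for "/"
            node = cross_mounts(node.entries[name])
    return node
```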
Virtual File Systems
A Virtual File System (VFS) provides a common interface to multiple different filesystem types.
In addition, it provides a unique identifier ( vnode ) for files across the entire space, including
across all filesystems of different types. ( UNIX inodes are unique only across a single filesystem, and
certainly do not carry across networked file systems. )
Linux VFS provides a set of common functionalities for each filesystem, using function
pointers accessed through a table. The same functionality is accessed through the same table
position for all filesystem types, though the actual functions pointed to by the pointers may be
filesystem-specific. See /usr/include/linux/fs.h for full details. Common operations provided
include open( ), read( ), write( ), and mmap( ).
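The table-of-function-pointers idea can be sketched in Python, with a frozen dataclass playing the role of the table and plain functions playing the role of the pointers. Linux's own table of this kind is `struct file_operations` in `fs.h`; the classes, functions and filesystem names below are hypothetical stand-ins, not that API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FileOperations:
    """One slot per operation; every filesystem type fills the same slots."""
    open: Callable[[str], str]
    read: Callable[[str, int], bytes]


# Hypothetical filesystem-specific implementations.
def extfs_open(path: str) -> str:
    return f"extfs opened {path}"


def extfs_read(path: str, n: int) -> bytes:
    return b"E" * n


def fatfs_open(path: str) -> str:
    return f"fatfs opened {path}"


def fatfs_read(path: str, n: int) -> bytes:
    return b"F" * n


OPS_TABLES: Dict[str, FileOperations] = {
    "extfs": FileOperations(open=extfs_open, read=extfs_read),
    "fatfs": FileOperations(open=fatfs_open, read=fatfs_read),
}


def vfs_open(fstype: str, path: str) -> str:
    return OPS_TABLES[fstype].open(path)        # same slot, filesystem-specific function


def vfs_read(fstype: str, path: str, n: int) -> bytes:
    return OPS_TABLES[fstype].read(path, n)
```

Callers only ever use `vfs_open` and `vfs_read`; adding a new filesystem type means adding one more table, not changing the callers.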
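Directory Implementation
Using a linear list – The simplest way to implement a directory is a linear list of file names with pointers to the data blocks. Finding a file requires a linear search, so operating systems implement a software cache to store the most recently used directory information.
Maintaining a sorted list allows a binary search and reduces the search time. But maintaining a sorted list is costly: each new entry must be inserted at the appropriate position.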
Using a hash data structure – In addition to having a linear list for storing the directory entries, a hash data structure can also be used. The hash table takes a value computed from the file name and returns a pointer to the file name in the linear list. This reduces the file search time. But provision must be made for collisions, that is, situations where two file names hash to the same location.
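The three directory lookups described above can be compared in a short Python sketch: a linear scan, a binary search on a sorted list, and a hash table that points into the linear list. Collisions are handled by chaining (bucket lists), which is only one possible choice; the class and function names are hypothetical, and deletion is omitted for brevity.

```python
import bisect
from typing import List, Optional, Tuple


class HashedDirectory:
    """Linear list of (name, inode number) entries plus a chained hash table over it."""

    def __init__(self, buckets: int = 8):
        self.entries: List[Tuple[str, int]] = []                       # the linear list
        self.table: List[List[int]] = [[] for _ in range(buckets)]     # bucket -> list indexes

    def _bucket(self, name: str) -> List[int]:
        return self.table[hash(name) % len(self.table)]

    def linear_lookup(self, name: str) -> Optional[int]:
        for entry_name, inode in self.entries:   # compares against every entry
            if entry_name == name:
                return inode
        return None

    def lookup(self, name: str) -> Optional[int]:
        for i in self._bucket(name):             # compares only names in one bucket
            if self.entries[i][0] == name:
                return self.entries[i][1]
        return None

    def add(self, name: str, inode: int) -> None:
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        self.entries.append((name, inode))
        self._bucket(name).append(len(self.entries) - 1)   # colliding names share a bucket


def sorted_insert(names: List[str], name: str) -> None:
    bisect.insort(names, name)        # keeps the list sorted; later entries shift right


def sorted_contains(names: List[str], name: str) -> bool:
    i = bisect.bisect_left(names, name)   # binary search
    return i < len(names) and names[i] == name
```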
Allocation Methods
There are three major methods of storing files on disks: contiguous, linked, and
indexed.
1. Contiguous Allocation
Contiguous Allocation requires that all blocks of a file be kept together contiguously.
Access is very fast, because reading successive blocks of the same file generally
requires no movement of the disk heads, or at most one small step to the next adjacent
cylinder.
Storage allocation involves the same issues discussed earlier for the allocation of
contiguous blocks of memory ( first fit, best fit, fragmentation problems, etc. ). The
distinction is that the high time penalty required for moving the disk heads from spot
to spot may now justify the benefits of keeping files contiguously when possible.
( Even file systems that do not by default store files contiguously can benefit from
certain utilities that compact the disk and make all files contiguous in the process. )
Problems can arise when files grow, or if the exact size of a file is unknown at
creation time:
o Over-estimation of the file's final size increases external fragmentation and
wastes disk space.
o Under-estimation may require that a file be moved or a process aborted if the
file grows beyond its originally allocated space.
o If a file grows slowly over a long time period and the total final space must be
allocated initially, then a lot of space becomes unusable before the file fills the
space.
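First-fit allocation and direct access under contiguous allocation can be sketched in Python. The sketch assumes a free-block bitmap (a list of booleans) and a directory entry that records the starting block and the length of the file; both representations and the function names are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def first_fit(free: List[bool], length: int) -> Optional[int]:
    """Start of the first run of `length` consecutive free blocks, or None."""
    run_start, run_len = 0, 0
    for b, is_free in enumerate(free):
        if not is_free:
            run_len = 0
            continue
        if run_len == 0:
            run_start = b
        run_len += 1
        if run_len == length:
            return run_start
    return None


def allocate_contiguous(free: List[bool], length: int) -> Tuple[int, int]:
    start = first_fit(free, length)
    if start is None:   # enough free blocks may exist, but not in one hole
        raise OSError("no hole large enough")
    for b in range(start, start + length):
        free[b] = False
    return start, length            # (start, length) is what the directory entry records


def physical_block(entry: Tuple[int, int], logical: int) -> int:
    start, length = entry
    if not 0 <= logical < length:
        raise IndexError(logical)
    return start + logical          # direct access needs only one addition
```

The growth problems listed above show up directly: a file recorded as (start, length) cannot simply take the next block if that block already belongs to another file.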
A variation is to allocate file space in large contiguous chunks, called extents. When a
file outgrows its original extent, then an additional one is allocated. ( For example an
extent may be the size of a complete track or even cylinder, aligned on an appropriate
track or cylinder boundary. ) The high-performance file system Veritas uses extents
to optimize performance.
Figure: Contiguous allocation of disk space
2. Linked Allocation
Linked allocation does not suffer from external fragmentation at all. Any block anywhere on
the volume can be used in any file any time. It is easy to add more blocks to a file at any time.
Compaction is never necessary.
The directory or FCB contains pointers to the first and last blocks of the file.
One way of implementing the rest of the pointers is to place in each data block a pointer to
the next block.
The major problem with linked allocation is that it supports direct access extremely
poorly. Generally, in order to get access to the Kth block of a file, it is necessary to follow K
pointers, starting with the pointer to the first block.
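The cost of direct access under linked allocation can be seen in a small Python sketch. It assumes a dict as the disk and stores each block's next pointer beside its data; the names `Disk` and `kth_block` are hypothetical. Reaching logical block k takes k + 1 block reads, because every pointer lives inside a data block.

```python
from typing import Dict, Optional, Tuple

# Hypothetical disk image: physical block number -> (data, next physical block or None).
Disk = Dict[int, Tuple[bytes, Optional[int]]]


def kth_block(disk: Disk, first: int, k: int) -> Tuple[bytes, int]:
    """Return (data of logical block k, number of blocks read to get there)."""
    block, reads = first, 0
    for _ in range(k):                 # follow k next-pointers from the first block
        _, nxt = disk[block]
        reads += 1
        if nxt is None:
            raise IndexError("file has fewer than k + 1 blocks")
        block = nxt
    data, _ = disk[block]
    return data, reads + 1


# A four-block file scattered over the volume: 9 -> 16 -> 1 -> 10.
disk: Disk = {9: (b"b0", 16), 16: (b"b1", 1), 1: (b"b2", 10), 10: (b"b3", None)}
print(kth_block(disk, 9, 2))   # (b'b2', 3)
```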
Also, since each file block can be anywhere on the volume, the average seek time required
per block access can be much greater than is the case with contiguous allocation.
If we link contiguous clusters of blocks instead of blocks there will be fewer pointers to
follow and the proportion of space on disk used by pointers will be smaller. On the other
hand, the average amount of internal fragmentation will increase.
Reliability is a problem, since the consequences of a lost or corrupted pointer are potentially
great.
The concept of a file allocation table (FAT) is a useful variation on linked allocation. The
FAT is a table stored at the beginning of the volume, having an entry for each physical block.
It is in effect an array, indexed by physical block number. The pointers used to link files
together are stored in the FAT instead of in the data blocks of the file.
As an example of how to use the FAT, suppose we want to access logical block 2 of file X.
First we consult X's directory entry or FCB to learn the physical block number of X's first
logical block, block 0. Let's say that physical block number is 123. We then examine the
contents of entry 123 in the FAT. Let's say 876 is stored there. That means that 876 is the
physical block number of X's logical block 1. We then go to entry 876 of the FAT, and find
there the physical address of X's logical block 2. Let's say that number is 546. All we have to
do now is request physical block 546.
By putting a special sentinel value in the entries of free blocks, the OS can implement a free-
block list within the FAT.
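The lookup walked through above can be sketched in Python, with the FAT as a list indexed by physical block number. The sentinel values `FREE` and `EOF` are hypothetical choices for illustration (real FAT variants define their own encodings), and the table size of 1000 entries is arbitrary.

```python
from typing import List

FREE = -2   # hypothetical sentinel: block is unallocated
EOF = -1    # hypothetical sentinel: last block of a file


def fat_lookup(fat: List[int], first_block: int, logical: int) -> int:
    """Return the physical block holding `logical` by following links stored in the FAT."""
    block = first_block
    for _ in range(logical):
        block = fat[block]          # the "next" pointer lives in the table, not the data block
        if block < 0:               # ran into EOF (or a corrupt FREE entry)
            raise IndexError("logical block past end of file")
    return block


def free_blocks(fat: List[int]) -> List[int]:
    """The free-block list is implicit: every entry holding the FREE sentinel."""
    return [b for b, nxt in enumerate(fat) if nxt == FREE]


# The worked example from the text: file X starts at physical block 123.
fat = [FREE] * 1000
fat[123], fat[876], fat[546] = 876, 546, EOF
print(fat_lookup(fat, 123, 2))   # 546
```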
Using the FAT for direct access tends to require fewer seeks than with ordinary linked
allocation, because all the pointers are relatively close to each other. If the FAT is small
enough to keep in primary memory, then traversing the FAT would not require any seeks.
(The FAT for a terabyte drive with 4KB blocks would use about a gigabyte of primary
memory.) The design could enhance the reliability of the pointers by making backup copies
of the FAT.
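The memory estimate in the parenthetical works out as follows, assuming binary units (1 TB = 2^40 bytes, 4 KB = 2^12 bytes) and a 4-byte FAT entry per block; the entry size is an assumption, since the text does not state it.

```latex
\frac{1\,\text{TB}}{4\,\text{KB}} = \frac{2^{40}}{2^{12}} = 2^{28} \text{ entries},
\qquad
2^{28} \times 4\,\text{bytes} = 2^{30}\,\text{bytes} = 1\,\text{GB}
```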
Figure: File-allocation table
3. Indexed Allocation
Indexed allocation utilizes a per-file table where all the physical addresses of the data
blocks of the file are stored.
This table, like a FAT, is basically an array of physical block numbers. However, the
indexes into the array are logical block numbers. (The indexes into a FAT
are physical block numbers.)
To find the kth logical block of the file, we simply consult the kth entry in the table
and read off the physical address of the block.
The table is usually called "the index".
The directory or FCB would contain the address of the index.
The indexing scheme works pretty much the same way that paging works in primary
memory management. (The FAT scheme is something like an inverted page table, but
that analogy is weaker.)
With indexed allocation, there is no external fragmentation, and the OS can support
both sequential and direct access with acceptable efficiency.
Internal fragmentation occurs in the last data block of files and in the unused portions
of the index blocks.
If it is a simple one-level index, it may be possible to cache the entire index of a file in
main memory. If so, to find the physical address of any block in the file, it is enough
just to look at a single entry of the index. In contrast, it is likely that there will not be
enough memory to cache an entire FAT, and on average we have to probe half the
file's FAT entries to find one of the file's data blocks.
When using indexed allocation, each file block can be anywhere on the volume,
so there can be a long average seek time required for accessing a series of blocks of
the file, whether sequentially or randomly.
With contiguous allocation, the blocks of a file are close together, so when accessing
a series of blocks, the average seek time tends to be shorter than is the case with either
linked allocation or indexed allocation.
To accommodate large files, the system may resort to using a linked list of index
blocks, or a multilevel index in which one master index points to multiple second-
level index blocks that point to file data blocks.
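Single-level and two-level index lookups can be sketched in Python. The sketch assumes each index block holds `ptrs_per_block` pointers and uses a dict from block number to its pointer list as a stand-in for reading index blocks from disk; the function names are hypothetical.

```python
from typing import Dict, List


def single_level_lookup(index: List[int], logical: int) -> int:
    """One-level index: entry k is the physical address of logical block k."""
    return index[logical]


def two_level_lookup(master: List[int], index_blocks: Dict[int, List[int]],
                     logical: int, ptrs_per_block: int) -> int:
    """Master index -> second-level index block -> data block."""
    outer, inner = divmod(logical, ptrs_per_block)
    second_level = index_blocks[master[outer]]   # one extra index-block read
    return second_level[inner]
```

With 4 pointers per index block, logical block 5 is entry 1 of the second index block (5 = 1 × 4 + 1), so the master index is consulted first and then that one second-level block.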
Figure: Two-level indexing
The UNIX inode uses a variation on multilevel indexing. The first 12 pointers in the inode point directly to file data blocks. One entry of the inode points to a single indirect block, which is a block of pointers to file data blocks. Another entry points to a double indirect block, a block of pointers to single indirect blocks. A third entry points to a triple indirect block, which is a block of pointers to double indirect blocks. "Under this method, the number of blocks that can be allocated to a file exceeds the amount of space addressable by the 4-byte pointers used by many operating systems" (4 GB).
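The claim in the quotation can be checked numerically. This Python sketch assumes 4 KB blocks (the text does not give a block size), 4-byte block pointers and the 12 direct pointers described above; `max_file_size` is a hypothetical helper.

```python
def max_file_size(block_size: int = 4096, ptr_size: int = 4, direct: int = 12) -> int:
    """Largest file (in bytes) addressable by direct, single, double and triple indirect pointers."""
    p = block_size // ptr_size                   # pointers per indirect block
    blocks = direct + p + p ** 2 + p ** 3        # direct + single + double + triple indirect
    return blocks * block_size


print(max_file_size())   # 4402345721856 bytes, roughly 4 TB
print(2 ** 32)           # 4294967296 bytes (4 GB) reachable with a 4-byte offset
```

Almost all of the roughly 4 TB comes from the triple indirect block (1024^3 blocks of 4 KB), which is why that level dominates the limit.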
Figure: Multilevel indexing