File Structure
File Structures
A file structure is a combination of representations for data in
files and of operations for accessing that data. It enables
applications to read, write, and modify data, and may also help to
find the data that matches certain criteria. An improvement in file
structure can make an application hundreds of times faster.
The main goal in designing file structures is to minimize the
number of trips to the disk needed to get the desired information.
Ideally, we get what we need in one disk access, or failing that, in
as few disk accesses as possible.
What is a file?
A file is a container in a computer system that stores data,
information, settings, or commands used by a computer program.
Graphical user interfaces (GUIs), such as those of Microsoft operating
systems, represent files as icons associated with the program that
opens the file.
There are several types of files, such as directory files, data files,
text files, binary files, and graphic files, and each kind contains a
different type of information. In a computer system, files are stored
on hard drives, optical discs, or other storage devices.
In most operating systems, a file must be saved with a unique name
within a given directory. Certain characters cannot be used when
naming a file because they are considered illegal. A filename usually
ends with a file extension, also called a suffix. The extension
typically contains two to four characters that follow the base name
after a period, and it helps to identify the file format, the type of
file, and the attributes related to the file.
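As a minimal illustration of how a filename splits into a base name and an extension, the sketch below uses Python's standard `os.path.splitext`. The helper `split_filename` and the sample filenames are made up for this example.

```python
import os.path

def split_filename(filename):
    """Return (base name, extension without the leading period)."""
    base, ext = os.path.splitext(filename)
    return base, ext.lstrip(".")

# Sample names are illustrative only.
for name in ["report.txt", "photo.jpeg", "archive.tar.gz", "README"]:
    base, ext = split_filename(name)
    print(f"{name!r}: base={base!r}, extension={ext!r}")
```

Only the last suffix counts as the extension here, and a name such as README has none.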
What is a file system?
In computing, a file system -- sometimes written filesystem -- is a
logical and physical system for organizing, managing and accessing
the files and directories on a device's solid-state drive (SSD), hard-
disk drive (HDD) or other media. Without a file system, the operating
system (OS) would see only large chunks of data without any way to
distinguish one file from the next. As data capacities increase, the
efficient organization and accessibility of individual files become
even more important in data storage.
1. Contiguous Allocation
In this scheme, each file occupies a contiguous set of blocks on the disk.
For example, if a file requires n blocks and is given block b as the
starting location, then the blocks assigned to the file will be b, b+1,
b+2, ..., b+n-1. This means that given the starting block address and the
length of the file (in terms of blocks required), we can determine the
blocks occupied by the file.
The directory entry for a file with contiguous allocation contains:
Address of the starting block
Length of the allocated portion
The file ‘mail’ in the following figure starts at block 19 with
length = 6 blocks. Therefore, it occupies blocks 19, 20, 21, 22, 23 and 24.
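The ‘mail’ example can be checked with a short sketch. The function names and the dictionary used as a directory entry are assumptions made for illustration; only the start block (19) and length (6) come from the text.

```python
def contiguous_blocks(start, length):
    """Blocks occupied by a file stored contiguously from `start`."""
    return list(range(start, start + length))

def block_address(start, length, k):
    """Disk address of the k-th block of the file, counting from k = 0."""
    if not 0 <= k < length:
        raise IndexError("block index outside the file")
    return start + k

# Directory entry for 'mail': starting block and length in blocks.
mail = {"start": 19, "length": 6}
print(contiguous_blocks(mail["start"], mail["length"]))  # [19, 20, 21, 22, 23, 24]
print(block_address(mail["start"], mail["length"], 2))   # 21
```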
Advantages:
Both sequential and direct access are supported. For direct access,
the address of the kth block of a file that starts at block b
(counting from k = 0) can easily be obtained as (b+k).
Access is extremely fast because contiguous allocation of file blocks
keeps the number of seeks minimal.
Disadvantages:
This method suffers from both internal and external fragmentation,
which makes it inefficient in terms of disk space utilization.
Increasing the file size is difficult because it depends on contiguous
free blocks being available at that moment.
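To see why external fragmentation hurts contiguous allocation, consider the sketch below. It assumes a simple first-fit search over a free-block map; the search strategy, the function name and the 10-block disk are illustrative choices, not something the text prescribes.

```python
def find_contiguous_run(free_map, n):
    """Return the first start index of n consecutive free blocks, or None.

    free_map[i] is True when block i is free (first-fit search).
    """
    run_start, run_len = None, 0
    for i, is_free in enumerate(free_map):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == n:
                return run_start
        else:
            run_len = 0
    return None

# Hypothetical 10-block disk: F = free, U = used.
free_map = [c == "F" for c in "FFUFFUFFUF"]
print(sum(free_map))                     # 7 free blocks in total
print(find_contiguous_run(free_map, 2))  # 0
print(find_contiguous_run(free_map, 3))  # None, despite 7 free blocks
```

Seven blocks are free, yet a 3-block file cannot be placed because no three free blocks are adjacent.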
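2. Linked Allocation
In this scheme, each file is a linked list of disk blocks that need
not be contiguous. Each block holds a pointer to the next block of
the file.
Advantages:
This is very flexible in terms of file size. File size can be increased
easily since the system does not have to look for a contiguous chunk
of free space.
This method does not suffer from external fragmentation, which makes
it relatively better in terms of disk space utilization.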
Disadvantages:
Because the file blocks are distributed randomly on the disk, a large
number of seeks is needed to access every block individually. This
makes linked allocation slower.
It does not support random or direct access. We cannot directly
access the blocks of a file. Block k of a file can be reached only by
traversing k blocks sequentially (sequential access) from the starting
block of the file via block pointers.
The pointers required in linked allocation incur some extra overhead.
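The cost of reaching block k under linked allocation can be sketched as follows. The in-memory `disk` dictionary, its block numbers and the `read_block` helper are hypothetical stand-ins for real disk blocks.

```python
# Hypothetical disk: block number -> (data, pointer to next block or None).
disk = {
    9:  ("A", 16),
    16: ("B", 1),
    1:  ("C", 10),
    10: ("D", 25),
    25: ("E", None),
}

def read_block(disk, start, k):
    """Return (data, pointers followed) for the k-th block, counting from k = 0."""
    current, hops = start, 0
    for _ in range(k):
        current = disk[current][1]   # follow the pointer stored in the block
        hops += 1
        if current is None:
            raise IndexError("block index outside the file")
    return disk[current][0], hops

print(read_block(disk, 9, 0))  # ('A', 0)
print(read_block(disk, 9, 3))  # ('D', 3)
```

Reaching block k takes k pointer reads, which is why direct access is not practical with this scheme.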
3. Indexed Allocation
In this scheme, a special block known as the Index block contains
the pointers to all the blocks occupied by a file. Each file has its own
index block. The ith entry in the index block contains the disk
address of the ith file block. The directory entry contains the address
of the index block as shown in the image:
Advantages:
This supports direct access to the blocks occupied by the file and
therefore provides fast access to the file blocks.
It overcomes the problem of external fragmentation.
Disadvantages:
The pointer overhead for indexed allocation is greater than for linked
allocation.
For very small files, say files that span only 2-3 blocks, indexed
allocation still keeps one entire block (the index block) for the
pointers, which is inefficient in terms of disk space utilization.
In linked allocation, by contrast, we lose the space of only 1 pointer
per block.
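A rough sketch of indexed allocation follows, covering both the direct lookup and the pointer overhead discussed above. The block size, pointer size and block numbers are assumed values chosen for illustration.

```python
BLOCK_SIZE = 512      # bytes per block (assumed)
POINTER_SIZE = 4      # bytes per block pointer (assumed)

# Hypothetical index block: entry i holds the disk address of file block i.
index_block = [9, 16, 1, 10, 25]

def block_address(index_block, i):
    """Direct access: one lookup in the index block, no chain to follow."""
    if not 0 <= i < len(index_block):
        raise IndexError("block index outside the file")
    return index_block[i]

def pointer_overhead(num_blocks):
    """Bytes spent on pointers as (linked, indexed) for a file of num_blocks blocks."""
    linked = num_blocks * POINTER_SIZE   # one pointer inside every data block
    indexed = BLOCK_SIZE                 # one whole index block, however small the file
    return linked, indexed

print(block_address(index_block, 3))  # 10
print(pointer_overhead(3))            # (12, 512)
```

Under these assumptions a 3-block file spends 12 bytes on pointers with linked allocation but a full 512-byte block with indexed allocation.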
File Allocation Table (FAT)
The FAT file system got its start with the introduction of MS-DOS, or
Microsoft Disk Operating System, and is still in use today, but it has
evolved over the years to accommodate growing data volumes. During
that time, there have been multiple versions of the file system. The
original version was FAT8, followed by FAT12, then FAT16 and finally
FAT32. Several variants have also been developed based on the last three
versions -- FAT12, FAT16 and FAT32 -- further extending the file
system.
The number associated with each FAT version (e.g., FAT16) refers to the
number of bits that are used for each entry in the allocation table. For
example, an allocation table entry for a FAT16 volume is 16 bits long.
However, it's not just the number of entry bits that differentiates the FAT
versions. Each version also supports larger volumes and file sizes than its
predecessor. Even so, FAT is still limited to small-scale storage compared
with more modern file systems. The largest file size FAT32 can support
is 4 gigabytes, and the largest volume size is 2 terabytes.
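The arithmetic behind these limits can be sketched briefly. The numbers below are upper bounds only: some entry values are reserved, and FAT32 uses only 28 of its 32 entry bits for the cluster number, so real cluster counts are lower. The sketch also assumes the 4-gigabyte file limit comes from the 32-bit file-size field in a directory entry. The function name is made up for this example.

```python
def max_entry_values(bits):
    """Upper bound on the distinct values an allocation-table entry of `bits` bits can hold."""
    return 2 ** bits

for name, bits in [("FAT12", 12), ("FAT16", 16), ("FAT32", 32)]:
    print(name, max_entry_values(bits))

# A 32-bit file-size field caps a single file at just under 4 GiB.
max_file_bytes = 2 ** 32 - 1
print(max_file_bytes)  # 4294967295
```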
At one time, the FAT file system was used extensively for Windows
computers. However, today, most Windows systems use the New
Technology File System (NTFS) -- or, to a lesser degree, the Resilient
File System (ReFS), which Microsoft introduced in 2012 promising to
support greater availability, resiliency and scalability.
Tree-structured directory
One bit in each directory entry is used to identify the entry as either a file
(represented by ‘0’) or a directory (represented by ‘1’).
Users
Each user has their own directory and cannot access the directory of
another user. All users can read data from the root directory.
However, they cannot write to or modify the root directory. This
privilege is reserved for the system administrator who has complete
access to the root directory.
(Figure: User directories)
Paths
Absolute path
The absolute path is the path to the file with respect to the root
directory.
Relative path
The relative path is the path to the file with respect to the current
working directory.
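The two kinds of path can be compared with Python's standard `posixpath` module. In this sketch the root directory is written as "/", and both the working directory and the file location are hypothetical values chosen for illustration.

```python
import posixpath

cwd = "/home/Desktop"                  # assumed current working directory
target = "/home/Edpresso/notes.txt"    # hypothetical absolute path to a file

# Relative path: the same file, described from the working directory.
relative = posixpath.relpath(target, start=cwd)
print(relative)                        # ../Edpresso/notes.txt

# Joining the relative path onto the working directory recovers the absolute path.
resolved = posixpath.normpath(posixpath.join(cwd, relative))
print(resolved)                        # /home/Edpresso/notes.txt
```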
(Figure: Path types)
Consider the above directory tree. Let’s assume that the current working
directory is: root/home/Desktop.
To access the file under the directory labeled Edpresso, we can use the
following paths:
Advantages
A tree-structured directory is very general because every file can be
accessed using its absolute path.
Disadvantages
Tree-structured directories do not allow file sharing between users.
They can also be inefficient, because reaching a file may require
traversing several levels of directories.
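A tree-structured directory and the cost of walking it can be sketched as below. The `Entry` class, the `lookup` function and the file `notes.txt` are hypothetical; the `is_dir` flag stands in for the bit that marks an entry as a file or a directory.

```python
class Entry:
    """Directory entry: is_dir plays the role of the file/directory bit."""
    def __init__(self, name, is_dir=False):
        self.name = name
        self.is_dir = is_dir
        self.children = {}     # only used when is_dir is True

    def add(self, child):
        self.children[child.name] = child
        return child

def lookup(root, absolute_path):
    """Walk from the root; return (entry, number of directories entered below root)."""
    entry, depth = root, 0
    for part in absolute_path.strip("/").split("/"):
        if not entry.is_dir or part not in entry.children:
            raise FileNotFoundError(absolute_path)
        entry = entry.children[part]
        if entry.is_dir:
            depth += 1
    return entry, depth

root = Entry("root", is_dir=True)
home = root.add(Entry("home", is_dir=True))
desktop = home.add(Entry("Desktop", is_dir=True))
desktop.add(Entry("notes.txt"))          # hypothetical file

entry, depth = lookup(root, "/home/Desktop/notes.txt")
print(entry.name, entry.is_dir, depth)   # notes.txt False 2
```

Every extra level in the path adds one more directory to search before the file is reached.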
Advantages        Disadvantages
General           Inefficient
Scalable          No file sharing
Easy to search    File duplication in multiple directories