File System
File systems can be used on many types of storage devices using various media.
As of 2019, hard disk drives have been key storage devices and are projected to
remain so for the foreseeable future. Other kinds of media that are used
include SSDs, magnetic tapes, and optical discs. In some cases, such as with tmpfs,
the computer's main memory (random-access memory, RAM) is used to create a
temporary file system for short-term use.
Some file systems are used on local data storage devices; others provide file
access via a network protocol (for example, NFS, SMB, or 9P clients). Some file
systems are "virtual", meaning that the supplied "files" (called virtual files) are
computed on request (such as procfs and sysfs) or are merely a mapping into a
different file system used as a backing store. The file system manages access to
both the content of files and the metadata about those files.
Origin of the term
Before the advent of computers the term file system was used to describe a
method of storing and retrieving paper documents. By 1961, the term was being
applied to computerized filing alongside the original meaning. By 1964, it was in
general use.
Architecture
A file system consists of two or three layers. Sometimes the layers are explicitly
separated, and sometimes the functions are combined.
The logical file system is responsible for interaction with the user application. It
provides the application program interface (API) for file operations
(OPEN, CLOSE, READ, etc.) and passes the requested operation to the layer
below it for processing. The logical file system "manage[s] open file table entries
and per-process file descriptors". This layer provides "file access, directory
operations, [and] security and protection".
The second, optional layer is the virtual file system. "This interface allows support
for multiple concurrent instances of physical file systems, each of which is called a
file system implementation".
The third layer is the physical file system. This layer is concerned with the physical
operation of the storage device (e.g. disk). It processes physical blocks being read
or written. It handles buffering and memory management and is responsible for
the physical placement of blocks in specific locations on the storage medium. The
physical file system interacts with the device drivers or with the channel to drive
the storage device.
Aspects of file systems
Space management
File systems allocate space in a granular manner, usually multiple physical units
on the device. The file system is responsible for organizing files and directories,
and keeping track of which areas of the media belong to which file and which are
not being used. For example, Apple DOS of the early 1980s used a track/sector map
to keep track of the 256-byte sectors on a 140-kilobyte floppy disk.
This results in unused space when a file is not an exact multiple of the allocation
unit, sometimes referred to as slack space. For a 512-byte allocation, the average
unused space is 256 bytes. For 64 KB clusters, the average unused space is 32 KB.
The size of the allocation unit is chosen when the file system is created. Choosing
the allocation size based on the average size of the files expected to be in the file
system can minimize the amount of unusable space. Frequently the default
allocation may provide reasonable usage. Choosing an allocation size that is too
small results in excessive overhead if the file system will contain mostly very large
files.
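The slack-space arithmetic above can be checked with a short sketch. The function names are illustrative, and the sketch assumes that an empty file occupies no allocation units, which not every file system guarantees.

```python
def slack_bytes(file_size: int, allocation_unit: int) -> int:
    """Bytes allocated to a file but not occupied by its contents."""
    if file_size == 0:
        return 0  # assumption: an empty file uses no allocation units
    units = -(-file_size // allocation_unit)  # ceiling division
    return units * allocation_unit - file_size


def average_slack(allocation_unit: int) -> float:
    """Mean slack over every file size from 1 byte up to one full unit."""
    total = sum(slack_bytes(size, allocation_unit)
                for size in range(1, allocation_unit + 1))
    return total / allocation_unit


print(slack_bytes(1000, 512))   # a 1000-byte file in 512-byte units
print(average_slack(512))       # roughly half of the allocation unit
```

If file sizes are spread evenly, the average works out to about half the allocation unit, which is where the 256-byte and 32 KB figures come from.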
File system fragmentation occurs when unused space or single files are not
contiguous. As a file system is used, files are created, modified and deleted. When
a file is created, the file system allocates space for the data. Some file systems
permit or require specifying an initial space allocation and subsequent
incremental allocations as the file grows. As files are deleted, the space they were
allocated eventually is considered available for use by other files. This creates
alternating used and unused areas of various sizes. This is free space
fragmentation. When a file is created and there is not an area of contiguous space
available for its initial allocation, the space must be assigned in fragments. When
a file is modified such that it becomes larger, it may exceed the space initially
allocated to it; another allocation must then be assigned elsewhere, and the file
becomes fragmented.
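The sequence described above (create, delete, create again) can be illustrated with a toy model. This is a minimal sketch, assuming a tiny eight-block device and a naive first-fit allocator; real file systems use more elaborate allocation policies, and all names here are hypothetical.

```python
def allocate(disk: list, name: str, blocks: int) -> list:
    """Claim the first free blocks found, whether or not they are contiguous."""
    free = [i for i, owner in enumerate(disk) if owner is None]
    if len(free) < blocks:
        raise OSError("no space left on device")
    chosen = free[:blocks]
    for i in chosen:
        disk[i] = name
    return chosen


def delete(disk: list, name: str) -> None:
    """Mark every block owned by the named file as free again."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None


def extents(blocks: list) -> int:
    """Count the contiguous runs in a sorted list of block numbers."""
    return sum(1 for k, b in enumerate(blocks)
               if k == 0 or b != blocks[k - 1] + 1)


disk = [None] * 8
allocate(disk, "a", 2)              # blocks 0-1
allocate(disk, "b", 2)              # blocks 2-3
allocate(disk, "c", 2)              # blocks 4-5
delete(disk, "b")                   # leaves a two-block hole in the middle
d_blocks = allocate(disk, "d", 3)   # does not fit in the hole alone
print(d_blocks, extents(d_blocks))
```

File "d" ends up split across the hole left by "b" and the free space at the end of the device, so it occupies two extents instead of one.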
File names
A filename (or file name) is used to identify a storage location in the file system.
Most file systems have restrictions on the length of filenames. In some file
systems, filenames are not case sensitive (i.e., the names MYFILE and myfile refer
to the same file in a directory); in others, filenames are case sensitive (i.e., the
names MYFILE, MyFile, and myfile refer to three separate files that are in the
same directory).
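The difference between the two naming policies can be sketched with a toy directory. The Directory class is hypothetical, and Python's str.casefold stands in for whatever case-folding table a real file system defines.

```python
class Directory:
    """Toy directory that is either case sensitive or case insensitive."""

    def __init__(self, case_sensitive: bool):
        self.case_sensitive = case_sensitive
        self._entries = {}  # lookup key -> (name as first stored, contents)

    def _key(self, name: str) -> str:
        # Assumption: casefold() approximates the file system's folding rules.
        return name if self.case_sensitive else name.casefold()

    def write(self, name: str, data: bytes) -> None:
        key = self._key(name)
        stored_name = self._entries.get(key, (name, None))[0]
        self._entries[key] = (stored_name, data)

    def names(self) -> list:
        return sorted(stored for stored, _ in self._entries.values())


sensitive = Directory(case_sensitive=True)
insensitive = Directory(case_sensitive=False)
for name in ("MYFILE", "MyFile", "myfile"):
    sensitive.write(name, b"x")
    insensitive.write(name, b"x")

print(sensitive.names())    # three separate files
print(insensitive.names())  # one file, keeping the first spelling
```

The case-insensitive variant keeps the spelling used when the file was first created, which mirrors the case-preserving behavior some file systems offer.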
Most modern file systems allow filenames to contain a wide range of characters
from the Unicode character set. However, they may have restrictions on the use
of certain special characters, disallowing them within filenames; those characters
might be used to indicate a device, device type, directory prefix, file path
separator, or file type.
Directories
File systems typically have directories (also called folders) which allow the user to
group files into separate collections. This may be implemented by associating the
file name with an index in a table of contents or an inode in a Unix-like file
system. Directory structures may be flat (i.e. linear), or allow hierarchies where
directories may contain subdirectories. The first file system to support arbitrary
hierarchies of directories was used in the Multics operating system.[12] The native
file systems of Unix-like systems also support arbitrary directory hierarchies, as
do, for example, Apple's Hierarchical File System and its successor HFS+ in classic
Mac OS, the FAT file system in MS-DOS 2.0 and later versions of MS-DOS and
in Microsoft Windows, the NTFS file system in the Windows NT family of
operating systems, and the ODS-2 (On-Disk Structure-2) and higher levels of
the Files-11 file system in OpenVMS.
Metadata
Other bookkeeping information is typically associated with each file within a file
system. The length of the data contained in a file may be stored as the number of
blocks allocated for the file or as a byte count. The time that the file was last
modified may be stored as the file's timestamp. File systems might store the file
creation time, the time it was last accessed, the time the file's metadata was
changed, or the time the file was last backed up. Other information can include
the file's device type (e.g. block, character, socket, subdirectory, etc.), its
owner user ID and group ID, its access permissions and other file attributes (e.g.
whether the file is read-only, executable, etc.).
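Much of this bookkeeping information can be read through the operating system's stat interface. The following is a minimal sketch using Python's standard os and stat modules; the describe helper and its dictionary keys are illustrative names, and the exact meaning of some fields (notably st_ctime) differs between platforms.

```python
import os
import stat
import tempfile


def describe(path: str) -> dict:
    """Collect a few of the metadata fields the file system keeps for a path."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,
        "modified": st.st_mtime,
        "accessed": st.st_atime,
        "metadata_changed": st.st_ctime,  # metadata change time on Unix
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "permissions": stat.filemode(st.st_mode),
        "is_directory": stat.S_ISDIR(st.st_mode),
    }


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "wb") as f:
        f.write(b"hello")
    info = describe(path)
    dir_info = describe(tmp)

print(info["size_bytes"], info["permissions"])
```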
A file system stores all the metadata associated with the file—including the file
name, the length of the contents of a file, and the location of the file in the folder
hierarchy—separate from the contents of the file.
Most file systems store the names of all the files in one directory in one place—
the directory table for that directory—which is often stored like any other file.
Many file systems put only some of the metadata for a file in the directory table,
and the rest of the metadata for that file in a completely separate structure, such
as the inode.
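On a Unix-like system, the split between directory entries and inodes becomes visible when a second name is created for the same file. This sketch assumes a file system that supports hard links; the file names are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "report.txt")
    second = os.path.join(tmp, "alias.txt")
    with open(first, "w") as f:
        f.write("data")

    # Add a second directory entry that refers to the same inode.
    os.link(first, second)

    same_inode = os.stat(first).st_ino == os.stat(second).st_ino
    link_count = os.stat(first).st_nlink
    listing = sorted(os.listdir(tmp))

print(listing, same_inode, link_count)
```

The directory holds two names, but both resolve to one inode, and the inode's link count records how many names point at it.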
Some file systems allow for different data collections to be associated with one
file name. These separate collections may be referred to as streams or forks.
Apple has long used a forked file system on the Macintosh, and Microsoft
supports streams in NTFS. Some file systems maintain multiple past revisions of a
file under a single file name; the filename by itself retrieves the most recent
version, while prior saved versions can be accessed using a special naming
convention such as "filename;4" or "filename(-4)" to access the version four saves
ago.
Utilities
File systems include utilities to initialize, alter parameters of and remove an
instance of the file system. Some include the ability to extend or truncate the
space allocated to the file system.
Directory utilities may be used to create, rename and delete directory entries,
which are also known as dentries (singular: dentry), and to alter metadata
associated with a directory. Directory utilities may also include capabilities to
create additional links to a directory (hard links in Unix), to rename parent links
(".." in Unix-like operating systems), and to create bidirectional links
to files.
File utilities create, list, copy, move and delete files, and alter metadata. They may
be able to truncate data, truncate or extend space allocation, append to, move,
and modify files in-place. Depending on the underlying structure of the file
system, they may provide a mechanism to prepend to or truncate from the
beginning of a file, insert entries into the middle of a file, or delete entries from a
file. Utilities to free space for deleted files, if the file system provides an undelete
function, also belong to this category.
Some file systems defer operations such as reorganization of free space, secure
erasing of free space, and rebuilding of hierarchical structures by providing
utilities to perform these functions at times of minimal activity. An example is the
file system defragmentation utilities.
Some of the most important features of file system utilities are supervisory
activities which may involve bypassing ownership or direct access to the
underlying device. These include high-performance backup and recovery, data
replication, and reorganization of various data structures and allocation tables
within the file system.
Methods for encrypting file data are sometimes included in the file system. This is
very effective since there is no need for file system utilities to know the
encryption seed to effectively manage the data. The risks of relying on encryption
include the fact that an attacker can copy the data and use brute force to decrypt
the data. Additionally, losing the seed means losing the data.
Maintaining integrity
One significant responsibility of a file system is to ensure that the file system
structures in secondary storage remain consistent, regardless of the actions by
programs accessing the file system. This includes actions taken if a program
modifying the file system terminates abnormally or neglects to inform the file
system that it has completed its activities. This may include updating the
metadata, the directory entry and handling any data that was buffered but not
yet updated on the physical storage media.
Other failures which the file system must deal with include media failures or loss
of connection to remote systems.
The file system must also record events to allow analysis of systemic issues as well
as problems with specific files or directories.
User data
The most important purpose of a file system is to manage user data. This includes
storing, retrieving and updating data.
Some file systems accept data for storage as a stream of bytes which are collected
and stored in a manner efficient for the media. When a program retrieves the
data, it specifies the size of a memory buffer and the file system transfers data
from the media to the buffer. A runtime library routine may sometimes allow the
user program to define a record based on a library call specifying a length. When
the user program reads the data, the library retrieves data via the file system and
returns a record.
Some file systems allow the specification of a fixed record length which is used for
all writes and reads. This facilitates locating the nth record as well as updating
records.
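With a fixed record length, the position of the nth record is simply n times the record size. The sketch below assumes a hypothetical 20-byte layout (a 4-byte identifier followed by a 16-byte name) and uses an in-memory buffer in place of a real file.

```python
import io
import struct

# Hypothetical record layout: 4-byte unsigned id + 16-byte padded name.
RECORD = struct.Struct("<I16s")


def write_record(f, n: int, rec_id: int, name: bytes) -> None:
    """Write (or overwrite) record number n in place."""
    f.seek(n * RECORD.size)
    f.write(RECORD.pack(rec_id, name))


def read_record(f, n: int) -> tuple:
    """Jump straight to record number n without reading earlier records."""
    f.seek(n * RECORD.size)
    rec_id, name = RECORD.unpack(f.read(RECORD.size))
    return rec_id, name.rstrip(b"\x00")


f = io.BytesIO()
for n, name in enumerate([b"alpha", b"beta", b"gamma"]):
    write_record(f, n, 100 + n, name)

write_record(f, 1, 101, b"BETA")   # update the middle record in place
print(read_record(f, 2), read_record(f, 1))
```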
An identification for each record, also known as a key, makes for a more
sophisticated file system. The user program can read, write and update records
without regard to their location. This requires complicated management of blocks
of media usually separating key blocks and data blocks. Very efficient algorithms
can be developed with pyramid structures for locating records.
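Keyed access can be sketched by keeping the keys apart from the data. In this simplified illustration, a sorted list searched with binary search stands in for the tree ("pyramid") of key blocks, and a byte array stands in for the data blocks; the KeyedFile class is hypothetical and does not reclaim space from superseded records.

```python
import bisect


class KeyedFile:
    """Toy keyed file: a sorted key index pointing into a data area."""

    def __init__(self):
        self._keys = []             # sorted keys (stand-in for key blocks)
        self._offsets = []          # data offset for each key
        self._data = bytearray()    # stand-in for data blocks

    def put(self, key: str, value: bytes) -> None:
        offset = len(self._data)
        self._data += len(value).to_bytes(4, "little") + value
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._offsets[i] = offset       # update: point at the new copy
        else:
            self._keys.insert(i, key)
            self._offsets.insert(i, offset)

    def get(self, key: str) -> bytes:
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            raise KeyError(key)
        offset = self._offsets[i]
        length = int.from_bytes(self._data[offset:offset + 4], "little")
        return bytes(self._data[offset + 4:offset + 4 + length])


kf = KeyedFile()
kf.put("b", b"two")
kf.put("a", b"one")
kf.put("a", b"uno")     # update by key, regardless of where the data lives
print(kf.get("a"), kf.get("b"))
```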
Using a file system
Utilities, language specific run-time libraries and user programs use file system
APIs to make requests of the file system. These include data transfer, positioning,
updating metadata, managing directories, managing access specifications, and
removal.
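The kinds of requests listed above map onto calls that most operating systems expose in some form. The following sketch uses Python's os module on a temporary directory; the file and directory names are arbitrary, and the calls shown are only one possible selection.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")

    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)  # open, creating the file
    os.write(fd, b"hello world")                       # data transfer
    os.lseek(fd, 6, os.SEEK_SET)                       # positioning
    tail = os.read(fd, 5)
    os.close(fd)

    os.utime(path, (0, 0))                             # updating metadata
    mtime = os.stat(path).st_mtime

    archive = os.path.join(tmp, "archive")
    os.mkdir(archive)                                  # managing directories
    moved = os.path.join(archive, "notes.txt")
    os.rename(path, moved)

    os.remove(moved)                                   # removal
    still_exists = os.path.exists(moved)

print(tail, mtime, still_exists)
```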
Another approach is to partition the disk so that several file systems with different
attributes can be used. One file system, for use as browser cache or email storage,
might be configured with a small allocation size. This keeps the activity of creating
and deleting files typical of browser activity in a narrow area of the disk where it
will not interfere with other file allocations. Another partition might be created
for the storage of audio or video files with a relatively large block size. Yet another
may normally be set read-only and only periodically be set writable.
Having multiple file systems on a single system has the additional benefit that in
the event of a corruption of a single partition, the remaining file systems will
frequently still be intact. This includes virus destruction of the system partition or
even a system that will not boot. File system utilities which require dedicated
access can be effectively completed piecemeal. In addition, defragmentation may
be more effective. Several system maintenance utilities, such as virus scans and
backups, can also be processed in segments. For example, it is not necessary to
back up the file system containing videos along with all the other files if none have
been added since the last backup. As for the image files, one can easily "spin off"
differential images which contain only "new" data written to the master (original)
image. Differential images can be used both for safety (as a "disposable" system
that can be quickly restored if destroyed or contaminated by a virus, since the
old image can be removed and a new image created in a matter of seconds,
even without automated procedures) and for quick virtual machine deployment
(since the differential images can be quickly spawned in batches using a script).
Design limitations
All file systems have some functional limit that defines the maximum storable
data capacity within that system. These functional limits are a best-guess effort by
the designer based on how large the storage systems are right now and how large
storage systems are likely to become in the future. Disk storage has continued to
increase at near exponential rates (see Moore's law), so after a few years, file
systems have kept reaching design limitations that require computer users to
repeatedly move to a newer system with ever-greater capacity.
File system complexity typically varies proportionally with the available storage
capacity. The file systems of early 1980s home computers with 50 KB to 512 KB of
storage would not be a reasonable choice for modern storage systems with
hundreds of gigabytes of capacity. Likewise, modern file systems would not be a
reasonable choice for these early systems, since the complexity of modern file
system structures would quickly consume or even exceed the very limited
capacity of the early storage systems.
Types of file systems
File system types can be classified into disk/tape file systems, network file
systems and special-purpose file systems.
Optical discs
ISO 9660 and Universal Disk Format (UDF) are two common formats that
target Compact Discs, DVDs and Blu-ray discs. Mount Rainier is an extension to
UDF supported since the 2.6 series of the Linux kernel and since Windows Vista that
facilitates rewriting to DVDs.
Flash file systems
A flash file system considers the special abilities, performance and restrictions
of flash memory devices. Frequently a disk file system can use a flash memory
device as the underlying storage media, but it is much better to use a file system
specifically designed for a flash device.
Tape file systems
A tape file system is a file system and tape format designed to store files on
tape. Magnetic tapes are sequential storage media with significantly longer
random data access times than disks, posing challenges to the creation and
efficient management of a general-purpose file system.
In a disk file system there is typically a master file directory, and a map of used
and free data regions. Any file additions, changes, or removals require updating
the directory and the used/free maps. Random access to data regions is
measured in milliseconds so this system works well for disks.
Tape requires linear motion to wind and unwind potentially very long reels of
media. This tape motion may take several seconds to several minutes to move the
read/write head from one end of the tape to the other.
Consequently, a master file directory and usage map can be extremely slow and
inefficient with tape. Writing typically involves reading the block usage map to
find free blocks for writing, updating the usage map and directory to add the data,
and then advancing the tape to write the data in the correct spot. Each additional
file write requires updating the map and directory and writing the data, which
may take several seconds to occur for each file.
Tape file systems instead typically allow for the file directory to be spread across
the tape intermixed with the data, referred to as streaming, so that time-
consuming and repeated tape motions are not required to write new data.
However, a side effect of this design is that reading the file directory of a tape
usually requires scanning the entire tape to read all the scattered directory
entries. Most data archiving software that works with tape storage will store a
local copy of the tape catalog on a disk file system, so that adding files to a tape
can be done quickly without having to rescan the tape media. The local tape
catalog copy is usually discarded if not used for a specified period of time, at
which point the tape must be re-scanned if it is to be used in the future.
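The trade-off described above can be sketched with an in-memory buffer standing in for the tape. The header layout (a 16-byte name followed by a 4-byte length) is purely hypothetical and does not correspond to any real tape format.

```python
import io
import struct

# Hypothetical per-file header: 16-byte padded name + 4-byte data length.
HEADER = struct.Struct("<16sI")


def append_file(tape, name: bytes, data: bytes) -> None:
    """Write a directory entry followed by its data at the end of the tape."""
    tape.seek(0, io.SEEK_END)
    tape.write(HEADER.pack(name, len(data)))
    tape.write(data)


def scan_catalog(tape) -> list:
    """Rebuild the catalog by reading every header from start to end."""
    catalog = []
    tape.seek(0)
    while True:
        raw = tape.read(HEADER.size)
        if len(raw) < HEADER.size:
            break
        name, length = HEADER.unpack(raw)
        catalog.append((name.rstrip(b"\x00"), tape.tell(), length))
        tape.seek(length, io.SEEK_CUR)   # skip over the file data
    return catalog


tape = io.BytesIO()
append_file(tape, b"a.txt", b"first")
append_file(tape, b"b.txt", b"second file")
catalog = scan_catalog(tape)
print(catalog)
```

Appending needs no rewinding, but listing the contents means traversing the whole medium, which is why archiving software typically caches the result of such a scan on disk.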
IBM has developed a file system for tape called the Linear Tape File System. The
IBM implementation of this file system has been released as the open-source IBM
Linear Tape File System — Single Drive Edition (LTFS-SDE) product. The Linear
Tape File System uses a separate partition on the tape to record the index
metadata, thereby avoiding the problems associated with scattering directory entries
across the entire tape.
Tape formatting
Because of the time it can take to format a tape, typically tapes are pre-formatted
so that the tape user does not need to spend time preparing each new tape for
use. All that is usually necessary is to write an identifying media label to the tape
before use, and even this can be automatically written by software when a new
tape is used for the first time.
Database file systems
Another concept for file management is the idea of a database-based file system.
Instead of, or in addition to, hierarchical structured management, files are
identified by their characteristics, like type of file, topic, author, or similar rich
metadata.
IBM DB2 for i (formerly known as DB2/400 and DB2 for i5/OS) is a database file
system that is part of the object-based IBM i operating system (formerly known as
OS/400 and i5/OS), incorporating a single-level store and running on IBM Power
Systems (formerly known as AS/400 and iSeries). It was designed by Frank G. Soltis,
IBM's former chief scientist for IBM i. Between about 1978 and 1988, Soltis and his
team at IBM Rochester successfully designed and applied technologies such as the
database file system, which others, such as Microsoft, later failed to accomplish. These
technologies are informally known as "Fortress Rochester" and in a few basic
aspects were extended from early mainframe technologies, but in many ways were
more advanced from a technological perspective.
Many Web content management systems use a relational DBMS to store and
retrieve files. For example, XHTML files are stored as XML or text fields, while
image files are stored as blob fields; SQL SELECT (with optional XPath)
statements retrieve the files, and allow the use of sophisticated logic and
richer information associations than "usual file systems." Many CMSs also
have the option of storing only metadata within the database, with the
standard filesystem used to store the content of files.
Very large file systems, embodied by applications like Apache
Hadoop and Google File System, use some database file system concepts.
Advantages of File Management System
1. Cost Effective
This type of management also greatly reduces the office space that documents
occupy.
2. Security
The traditional method of storing files cannot match the level of security provided
by a file management system. In fact, security is one of the reasons why many
organizations prefer to use a file management system. The documents stored in a
file management system are protected using authentication methods such as a
username and password.
3. Reliability
Data stored in a file management system is far more reliable than data stored
physically on paper. Unlike with traditional methods of storing data, files here
are much less likely to be damaged or destroyed. Damage from natural causes or
handling can be avoided in a file management system. All user data is stored on
servers, so users can be confident that their documents are protected from
potential damage.
4. Data Sharing
Data sharing is one of the key features of a file management system. An FMS provides
a very efficient way of sharing data with every user. The same data that
is stored in files can be shared with multiple users simultaneously.
5. Data Retrieval
Using a file management system makes it very easy to retrieve data.
A file management system follows a digital approach that provides access to the
required data within a few minutes. Users do not need to search through copies of
documents manually, so very little time is spent on data retrieval.
6. Data Backup
In case of a failure, a file management system provides a seamless way of backing
up data. Computers offer backup functionality for this purpose by default; if
needed, third-party application programs can also be used.
7. Environment Friendly
Because a file management system is digital and involves no paperwork, it can be
considered more environmentally friendly.
Green practices not only reduce an organization's costs but can also improve
a company's overall image. As a result, they can provide tax benefits and
advertising opportunities as well.
Disadvantages of File Management System
1. Redundancy
2. Inconsistency
Data redundancy often leads to data inconsistency, which means that copies of
the same data located in different places contain different values. To prevent
this, there should be paper listing among the different files.
3. Accessibility
4. Integrity
Data in a file management system can lose its integrity, meaning that it is no
longer correct and consistent. Integrity is usually expressed as consistency
constraints, which programmers impose through program code. If integrity
problems persist, adding new constraints becomes difficult.
5. Atomicity
Atomicity means that an operation is either completed entirely or not performed
at all. For example, the system could fail in the middle of a transaction, leaving
the data incomplete. Unlike in a database management system, it is difficult to
ensure atomicity in a file management system.
6. Data duplication
Since data is stored in more than one location, data duplication is possible.
Duplication in a file management system wastes storage space. The duplicates
are difficult to correct because they are independent of each other, so they
require manual correction, which can take time and effort.
7. Data isolation
If data is stored in different locations, it is effectively isolated within the
file management system. Under these circumstances, the formats of the files can
vary significantly. As a result, extracting data from the files can be
difficult, as it requires complex programming.
In the Linux kernel, configfs and sysfs provide files that can be used to query
the kernel for information and configure entities in the kernel.
procfs maps processes and, on Linux, other operating system structures into a
filespace.
When the system needed to write data, the user was notified to press "RECORD"
on the cassette recorder, then press "RETURN" on the keyboard to notify the
system that the cassette recorder was recording. The system wrote a sound to
provide time synchronization, then modulated sounds that encoded a prefix, the
data, a checksum and a suffix. When the system needed to read data, the user
was instructed to press "PLAY" on the cassette recorder. The system
would listen to the sounds on the tape waiting until a burst of sound could be
recognized as the synchronization. The system would then interpret subsequent
sounds as data. When the data read was complete, the system would notify the
user to press "STOP" on the cassette recorder. It was primitive, but it (mostly)
worked. Data was stored sequentially, usually in an unnamed format, although
some systems (such as the Commodore PET series of computers) did allow the
files to be named. Multiple sets of data could be written and located by
fast-forwarding the tape and observing the tape counter to find the approximate
start of the next data region on the tape.
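The prefix, data, checksum and suffix structure described above can be sketched as a byte-level frame. The marker bytes and the additive checksum are assumptions chosen for illustration; actual cassette formats varied by machine and encoded the bytes as audio tones.

```python
PREFIX = b"\x16\x16"   # hypothetical synchronization marker
SUFFIX = b"\x04"       # hypothetical end-of-data marker


def checksum(data: bytes) -> int:
    """Simple additive checksum, reduced to one byte (an assumption)."""
    return sum(data) % 256


def encode_frame(data: bytes) -> bytes:
    return PREFIX + data + bytes([checksum(data)]) + SUFFIX


def decode_frame(frame: bytes) -> bytes:
    if not (frame.startswith(PREFIX) and frame.endswith(SUFFIX)):
        raise ValueError("missing prefix or suffix")
    body = frame[len(PREFIX):-len(SUFFIX)]
    if not body:
        raise ValueError("frame has no checksum byte")
    data, received = body[:-1], body[-1]
    if checksum(data) != received:
        raise ValueError("checksum mismatch")
    return data


frame = encode_frame(b"10 PRINT")
print(decode_frame(frame))
```

A checksum of this kind lets the reader detect a damaged recording, although it cannot repair one.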
Unix-like systems assign a device name to each device, but this is not how the files
on that device are accessed. Instead, to gain access to files on another device, the
operating system must first be informed where in the directory tree those files
should appear. This process is called mounting a file system. For example, to
access the files on a CD-ROM, one must tell the operating system "Take the file
system from this CD-ROM and make it appear under such-and-such directory."
The directory given to the operating system is called the mount point – it might,
for example, be /media. The /media directory exists on many Unix systems (as
specified in the Filesystem Hierarchy Standard) and is intended specifically for use
as a mount point for removable media such as CDs, DVDs, USB drives or floppy
disks. It may be empty, or it may contain subdirectories for mounting individual
devices. Generally, only the administrator (i.e. root user) may authorize the
mounting of file systems.
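Because mounted file systems appear as ordinary directories, a program can discover which mount point a path belongs to by walking up the tree. This is a minimal sketch built on Python's os.path.ismount; the helper name find_mount_point is illustrative.

```python
import os


def find_mount_point(path: str) -> str:
    """Walk up from path until a directory that is a mount point is reached."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


print(find_mount_point(os.getcwd()))
```

On a Unix-like system the loop always terminates, since the root directory is itself a mount point.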
Unix-like operating systems often include software and tools that assist in the
mounting process and provide it with new functionality. Some of these strategies
have been termed "auto-mounting" as a reflection of their purpose.
Solaris
Support for other file systems and significant enhancements were added over
time, including Veritas Software Corp. (journaling) VxFS, Sun Microsystems
(clustering) QFS, Sun Microsystems (journaling) UFS, and Sun Microsystems (open
source, poolable, 128-bit, compressible, and error-correcting) ZFS.
Logical Volume Management allows for spanning a file system across multiple
devices for the purpose of adding redundancy, capacity, and/or throughput.
Legacy environments in Solaris may use Solaris Volume Manager (formerly known
as Solstice DiskSuite). Multiple operating systems (including Solaris) may
use Veritas Volume Manager. Modern Solaris-based operating systems eclipse the
need for volume management through leveraging virtual storage pools in ZFS.
macOS
macOS (formerly Mac OS X) uses the Apple File System (APFS), which in 2017
replaced a file system inherited from classic Mac OS called HFS Plus (HFS+). Apple
also uses the term "Mac OS Extended" for HFS+.[27] HFS Plus is a metadata-rich
and case-preserving but (usually) case-insensitive file system. Due to the Unix
roots of macOS, Unix permissions were added to HFS Plus. Later versions of HFS
Plus added journaling to prevent corruption of the file system structure and
introduced a number of optimizations to the allocation algorithms in an attempt
to defragment files automatically without requiring an external defragmenter.
Filenames can be up to 255 characters. HFS Plus uses Unicode to store filenames.
On macOS, the filetype can come from the type code, stored in the file's metadata, or
the filename extension.
HFS Plus has three kinds of links: Unix-style hard links, Unix-style symbolic links,
and aliases. Aliases are designed to maintain a link to their original file even if
they are moved or renamed; they are not interpreted by the file system itself, but
by the File Manager code in userland.
OS/2
OS/2 1.2 introduced the High Performance File System (HPFS). HPFS supports
mixed case file names in different code pages, long file names (255 characters),
more efficient use of disk space, an architecture that keeps related items close to
each other on the disk volume, less fragmentation of data, extent-based space
allocation, a B+ tree structure for directories, and the root directory located at the
midpoint of the disk, for faster average access. A journaled filesystem (JFS) was
shipped in 1999.
PC-BSD
Plan 9
Plan 9 from Bell Labs treats everything as a file and accesses all objects as a file
would be accessed (i.e., there is no ioctl or mmap): networking, graphics,
debugging, authentication, capabilities, encryption, and other services are
accessed via I/O operations on file descriptors. The 9P protocol removes the
difference between local and remote files. File systems in Plan 9 are organized
with the help of private, per-process namespaces, allowing each process to have a
different view of the many file systems that provide resources in a distributed
system.
Windows makes use of the FAT, NTFS, exFAT, Live File System and ReFS file
systems (the last of these is only supported and usable in Windows Server
2012, Windows Server 2016, Windows 8, Windows 8.1, and Windows 10;
Windows cannot boot from it).
Windows uses a drive letter abstraction at the user level to distinguish one disk or
partition from another. For example, the path C:\WINDOWS represents a
directory WINDOWS on the partition represented by the letter C. Drive C: is most
commonly used for the primary hard disk drive partition, on which Windows is
usually installed and from which it boots. This "tradition" has become so firmly
ingrained that bugs exist in many applications which make assumptions that the
drive that the operating system is installed on is C. The use of drive letters, and
the tradition of using "C" as the drive letter for the primary hard disk drive
partition, can be traced to MS-DOS, where the letters A and B were reserved for
up to two floppy disk drives. This in turn derived from CP/M in the 1970s, and
ultimately from IBM's CP/CMS of 1967.
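The drive-letter convention can be inspected with Python's pathlib, which parses Windows paths on any platform. The system_drive helper is a hypothetical illustration of one way to avoid hard-coding "C:"; it assumes the SystemDrive environment variable that Windows sets, and falls back to "C:" only when the variable is absent.

```python
from pathlib import PureWindowsPath


def system_drive(environ) -> str:
    """Return the drive Windows reports as the system drive, if it is set."""
    return environ.get("SystemDrive", "C:")


path = PureWindowsPath(r"C:\WINDOWS\System32")
print(path.drive)    # the drive letter with its colon
print(path.parts)    # drive and root, then each directory name

# An installation on another drive is handled without special cases.
print(system_drive({"SystemDrive": "D:"}))
```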
OpenVMS
MVS
Data on the AS/400 and its successors consists of system objects mapped into the
system virtual address space in a single-level store. Many types of objects are
defined including the directories and files found in other file systems. File objects,
along with other types of objects, form the basis of the AS/400's support for an
integrated relational database.
The Prospero File System is a file system based on the Virtual System Model.
The system was created by Dr. B. Clifford Neuman of the Information Sciences
Institute at the University of Southern California.
RSRE FLEX file system - written in ALGOL 68
The file system of the Michigan Terminal System (MTS) is interesting because:
(i) it provides "line files" where record lengths and line numbers are associated
as metadata with each record in the file, lines can be added, replaced,
updated with the same or different length records, and deleted anywhere in
the file without the need to read and rewrite the entire file; (ii) using program
keys files may be shared or permitted to commands and programs in addition
to users and groups; and (iii) there is a comprehensive file locking mechanism
that protects both the file's data and its metadata.
Limitations
Converting the type of a file system
In-place conversion
In some cases conversion can be done in-place, although migrating the file system
is more conservative, as it involves creating a copy of the data, and is
recommended. On Windows, FAT and FAT32 file systems can be converted to
NTFS via the convert.exe utility, but not the reverse. On Linux, ext2 can be
converted to ext3 (and converted back), and ext3 can be converted to ext4 (but
not back), and both ext3 and ext4 can be converted to btrfs, and converted back
until the undo information is deleted. These conversions are possible due to using
the same format for the file data itself, and relocating the metadata into empty
space, in some cases using sparse file support.
For example, to migrate a FAT32 file system to an ext2 file system, first create a
new ext2 file system, then copy the data to the new file system, then delete the
FAT32 file system.