Virtual File System - Linux
The main data item in any Unix-like system is the ``file'', and a unique pathname
identifies each file within a running system. Every file appears like any other file in
the way it is accessed and modified: the same system calls and the same user
commands apply to every file. This applies independently of both the physical
medium that holds information and the way information is laid out on the medium.
Abstraction from the physical storage of information is accomplished by dispatching
data transfer to different device drivers; abstraction from the information layout is
obtained in Linux through the VFS implementation.
Object Orientedness
While the description above covers the theoretical organization of information, an
operating system must be able to deal with different ways to lay out information on
disk. While it is theoretically possible to look for an optimum layout of information
on disks and use it for every disk partition, most computer users need to access all
of their hard drives without reformatting, mount NFS volumes across the network,
and sometimes even access those funny CD-ROMs and floppy disks whose filenames
can't exceed 8+3 characters.
The problem of handling different data formats in a transparent way has been
addressed by making super-blocks, inodes and files into ``objects'': an object
declares a set of operations that must be used to deal with it. The kernel does not
need big switch statements to access the different physical layouts
of data, and new ``filesystem types'' can be added and removed at run time.
The whole VFS idea, therefore, is implemented around sets of operations that act on the
objects. Each object includes a structure declaring its own operations, and most
operations receive a pointer to the ``self'' object as first argument, thus allowing
modification of the object itself.
In practice, a super-block structure contains a field ``struct super_operations
*s_op'', an inode contains ``struct inode_operations *i_op'' and a file contains
``struct file_operations *f_op''.
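The dispatch mechanism can be illustrated with a small user-space sketch. It is not kernel code: the names toy_file, toy_file_operations, mem_read, toy_vfs_read and toy_open_mem are hypothetical, and the read signature is deliberately simpler than the one in the real ``struct file_operations''. It only shows how an object carries a pointer to its operations and how each operation receives the object itself as first argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;                          /* forward declaration */

/* Hypothetical, simplified counterpart of struct file_operations. */
struct toy_file_operations {
    /* Each operation receives the object itself as first argument. */
    long (*read)(struct toy_file *self, char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_operations *f_op;   /* the object's operations */
    const char *data;                         /* toy backing storage */
    size_t size;
    size_t pos;                               /* current file position */
};

/* One possible "filesystem module": serves bytes from memory. */
static long mem_read(struct toy_file *self, char *buf, size_t len)
{
    size_t left = self->size - self->pos;

    if (len > left)
        len = left;
    memcpy(buf, self->data + self->pos, len);
    self->pos += len;                     /* the operation modifies the object */
    return (long)len;
}

static const struct toy_file_operations mem_fops = { .read = mem_read };

/* Generic layer: no switch on the filesystem type, just an indirect call. */
long toy_vfs_read(struct toy_file *f, char *buf, size_t len)
{
    assert(f != NULL && f->f_op != NULL);
    if (f->f_op->read == NULL)
        return -1;                        /* operation not supported */
    return f->f_op->read(f, buf, len);
}

/* Build a file object bound to the in-memory operations. */
struct toy_file toy_open_mem(const char *data)
{
    struct toy_file f = { &mem_fops, data, strlen(data), 0 };
    return f;
}
```

A second storage mechanism would only need to supply its own operations structure; toy_vfs_read would not change.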
All the data handling and buffering that is performed by the Linux kernel is
independent of the actual format of the stored data: every communication with the
storage medium passes through one of the operations structures. The ``file-system
type'', then, is the software module that is in charge of mapping the operations to
the actual storage mechanism -- either a block device, a network connection (NFS)
or virtually any other means to store and retrieve data. These software modules
implementing filesystem types can either be linked into the kernel being booted or
compiled in the form of loadable modules.
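Adding and removing filesystem types at run time amounts to keeping a list of type descriptors that modules join and leave. The following user-space sketch is loosely modeled on that idea; toy_fs_type, toy_register_fs, toy_unregister_fs and toy_get_fs are hypothetical names chosen for illustration, not the kernel's own interface, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor of a filesystem type. */
struct toy_fs_type {
    const char *name;
    struct toy_fs_type *next;
};

static struct toy_fs_type *toy_fs_list;   /* head of the registered types */

/* Return the link pointing to the entry called name, or the final NULL link. */
static struct toy_fs_type **toy_find(const char *name)
{
    struct toy_fs_type **p = &toy_fs_list;

    while (*p != NULL && strcmp((*p)->name, name) != 0)
        p = &(*p)->next;
    return p;
}

/* Returns 0 on success, -1 if the name is already registered. */
int toy_register_fs(struct toy_fs_type *fs)
{
    struct toy_fs_type **p;

    assert(fs != NULL && fs->name != NULL);
    p = toy_find(fs->name);
    if (*p != NULL)
        return -1;
    fs->next = NULL;
    *p = fs;                              /* append at the end of the list */
    return 0;
}

/* Returns 0 on success, -1 if this descriptor was not registered. */
int toy_unregister_fs(struct toy_fs_type *fs)
{
    struct toy_fs_type **p = toy_find(fs->name);

    if (*p != fs)
        return -1;
    *p = fs->next;                        /* unlink the descriptor */
    fs->next = NULL;
    return 0;
}

/* Look a type up by name, as a mount request would; NULL if unknown. */
struct toy_fs_type *toy_get_fs(const char *name)
{
    return *toy_find(name);
}
```

In this sketch a loadable module would call toy_register_fs when it is loaded and toy_unregister_fs when it is removed, while a decoder linked into the kernel would register itself once at boot.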
The current implementation of Linux allows loadable modules to be used for all the
filesystem types except that of the root filesystem (the root filesystem must be
mounted before a module can be loaded from it). Actually, the initrd machinery allows a
module to be loaded before the root filesystem is mounted, but this technique is usually
only exploited in installation floppies.
In this article I use the phrase ``filesystem module'' to refer to either a
loadable module or a filesystem decoder linked into the kernel.
Linux Virtual File System
The Linux kernel implements the concept of Virtual File System (VFS, originally
Virtual Filesystem Switch), so that it is (to a large degree) possible to separate
actual "low-level" filesystem code from the rest of the kernel. The API of a
filesystem is described below.
This API was designed with filesystems closely resembling ext2 in mind. For
very different filesystems, like NFS, it fits poorly and causes all kinds of problems.
Four main objects: superblock, dentries, inodes, files
The kernel keeps track of files using in-core inodes ("index nodes"), usually derived
by the low-level filesystem from on-disk inodes.
A file may have several names, and there is a layer of dentries ("directory entries")
that represent pathnames, speeding up the lookup operation.
Several processes may have the same file open for reading or writing, and file
structures contain the information needed for each open, such as the current file position.
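The split between inodes and file structures can be sketched in user space as follows. The types mini_inode and mini_file and the functions mini_open, mini_read and mini_close are hypothetical and far smaller than the kernel's structures; the point is only that every open gets its own position while all opens share one inode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-core inode: one per file, however many times it is open. */
struct mini_inode {
    unsigned long i_ino;
    size_t i_size;
    int i_count;                  /* number of open files using this inode */
};

/* Hypothetical open-file object: one per open, holds the position. */
struct mini_file {
    struct mini_inode *f_inode;
    size_t f_pos;
};

struct mini_file mini_open(struct mini_inode *inode)
{
    struct mini_file f = { inode, 0 };

    assert(inode != NULL);
    inode->i_count++;
    return f;
}

/* Advance the position by up to len bytes; returns the byte count "read". */
size_t mini_read(struct mini_file *f, size_t len)
{
    size_t left = f->f_inode->i_size - f->f_pos;

    if (len > left)
        len = left;
    f->f_pos += len;
    return len;
}

void mini_close(struct mini_file *f)
{
    f->f_inode->i_count--;
    f->f_inode = NULL;
}
```

Two calls to mini_open on the same inode yield two independent positions, which is why one process reading a file does not disturb the offset seen by another.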
Access to a filesystem starts by mounting it. This operation takes a filesystem type
(like ext2, vfat, iso9660, nfs) and a device and produces the in-core superblock that
contains the information required for operations on the filesystem; a third
ingredient, the mount point, specifies what pathname refers to the root of the
filesystem.
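The three ingredients of a mount can be put together in a short user-space sketch. Everything here is an assumption made for illustration: the names sketch_super_block, sketch_mount_fs and sketch_lookup_sb are hypothetical, the table has a fixed size, and the mount point is matched by plain string prefix, whereas the kernel resolves mount points while walking the pathname.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-core superblock: remembers type and device only. */
struct sketch_super_block {
    const char *s_type;
    const char *s_dev;
};

struct sketch_mount {
    const char *mountpoint;
    struct sketch_super_block sb;
};

#define SKETCH_MAX_MOUNTS 8
static struct sketch_mount sketch_mounts[SKETCH_MAX_MOUNTS];
static size_t sketch_nmounts;

/* Record a mount: type + device give the superblock, mountpoint places it. */
int sketch_mount_fs(const char *type, const char *dev, const char *mountpoint)
{
    struct sketch_mount *m;

    assert(type != NULL && dev != NULL && mountpoint != NULL);
    if (sketch_nmounts == SKETCH_MAX_MOUNTS)
        return -1;                        /* table full */
    m = &sketch_mounts[sketch_nmounts++];
    m->mountpoint = mountpoint;
    m->sb.s_type = type;
    m->sb.s_dev = dev;
    return 0;
}

/* Return the superblock whose mount point is the longest prefix of path. */
const struct sketch_super_block *sketch_lookup_sb(const char *path)
{
    const struct sketch_super_block *best = NULL;
    size_t best_len = 0, i;

    for (i = 0; i < sketch_nmounts; i++) {
        const char *mp = sketch_mounts[i].mountpoint;
        size_t n = strlen(mp);
        int is_root = (n == 1 && mp[0] == '/');

        if (strncmp(path, mp, n) != 0)
            continue;
        /* Require a component boundary, so "/mnt" does not match "/mntx". */
        if (!is_root && path[n] != '\0' && path[n] != '/')
            continue;
        if (best == NULL || n > best_len) {
            best = &sketch_mounts[i].sb;
            best_len = n;
        }
    }
    return best;
}
```

With "/" and "/mnt/cdrom" mounted in this sketch, a pathname under "/mnt/cdrom" resolves to the second superblock and every other absolute pathname resolves to the root one.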
Auxiliary objects