What Is A File System?
This report describes the basic foundations of distributed file systems and one example of an
implementation of one such system, the Andrew File System (AFS).
The first part of the report describes the conditions under which distributed systems started to
evolve, and why. It also discusses some common requirements for distributed systems, such as
concurrent updates, different kinds of transparency, and security.
The second part explains the history of AFS since its creation in the 1980s, and its usage today. It
also goes more into detail and describes the basic terms used in AFS systems such as cells and
volumes. The core components of AFS, such as the server and client services, are described, and in
the last part AFS is compared to other systems and its advantages are identified.
Concurrent updates
The file systems in the 1970s were developed for centralized computer systems, where the data was
only accessed by one user at a time. When multiple users and processes needed to access files at the
same time, a notion of locking was introduced. There are two kinds of locks: read and write. Since
two read operations don't conflict, an attempt to set a read lock on an object that is already locked
with a read lock is always successful. But an attempt to set a write lock on an object with a read or
write lock isn't.
There are two philosophies concerning locks: mandatory locking and advisory locking. In
mandatory systems, operations actually fail if a conflicting lock is set. In advisory systems, it is up
to the programs to decide whether the lock is to be honored.
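The advisory model can be demonstrated with POSIX `flock`, which, like AFS's locks, only
constrains programs that cooperate by checking it (a minimal sketch, not AFS-specific code):

```python
import fcntl

# Advisory locking: the lock only matters to programs that check it.
# A process that never calls flock() can still open and write the file;
# that is exactly what makes the lock "advisory" rather than mandatory.
with open("shared.dat", "w") as f:
    # Exclusive (write) lock: conflicts with any held read or write lock.
    fcntl.flock(f, fcntl.LOCK_EX)
    f.write("only one writer at a time\n")
    fcntl.flock(f, fcntl.LOCK_UN)

with open("shared.dat") as f:
    # Shared (read) lock: many readers may hold this simultaneously,
    # since two read operations don't conflict.
    fcntl.flock(f, fcntl.LOCK_SH)
    data = f.read()
    fcntl.flock(f, fcntl.LOCK_UN)
```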
Transparency
An early stated goal for some of the new distributed file systems was to provide the same interface
as the old file systems to processes. This service is known as access transparency and has the
advantage that existing stand-alone systems could be migrated to networks relatively easily, and
old programs wouldn't need to be modified.
Location transparency makes it possible for the clients to access the file system without knowledge
of where the physical files are located. For example, this makes it possible to retain paths even
when the servers are physically moved or replaced.
Scaling transparency is the ability of incremental growth of the system size without having to make
changes to the structure of the system or its applications. The larger the system, the more important
scaling transparency becomes.
Security
In a multi-user environment, a desirable feature is to allow access to data only to authorized users.
Users must be authenticated, and only requests for files to which the user has access rights should
be granted. Most distributed file systems implement some kind of access control lists. Additional
protection mechanisms may include protecting protocol messages by signing them with digital
signatures and encrypting the data.
History
Andrew File System (AFS) is a file system that once was part of a larger project known as Andrew.
Andrew was a project of Carnegie Mellon University (CMU) to develop a distributed computing
environment on the Carnegie Mellon campus in the mid 1980s. AFS was originally developed for a
computer network running BSD UNIX and Mach.
The design goal of AFS was to create a system for large networks. To scale well the system
minimizes client-server communication by using an extensive client-caching scheme. Instead of
transferring the requested data for each request, clients store whole files in their cache.
In 1989, the original design team of AFS left CMU and founded a company named Transarc
Corporation that commercialized AFS. In the 1990s a number of ports of AFS to other systems
were done by Transarc's customers; for example, a group at MIT ported AFS to Linux. IBM
acquired Transarc in the 1990s and continued to produce commercial implementations of AFS. In
2000 IBM decided to release a branch of the source code under the IBM Public License. The
current development of AFS is now done in a project named OpenAFS. Today the OpenAFS
implementation supports a number of platforms such as Linux, Apple Mac OS X, Sun Solaris and
Microsoft Windows NT.
AFS today
Today AFS is mature and is not only used for research purposes; it is deployed in some large
(5000+ workstation) networks. A number of well-known universities such as Stanford, MIT and
KTH use AFS in their computer networks, as do research labs such as CERN and large companies
such as Intel and Morgan Stanley.
OpenAFS is in active development; the most recent version (1.4.1) was released on 2006-04-23 and
added support for Mac OS X 10.4.
Cells
The basic organizational unit of AFS is the cell. An AFS cell is an independently administered
collection of server and client machines. Usually, a cell corresponds to an Internet domain and is
named after it. Since AFS is most commonly used in academic and research environments, it is
not unusual for cells to also participate in a global AFS filetree. However, this is not mandatory.
Servers and clients can only belong to a single cell at a given time, but a user may have accounts in
multiple cells. The cell a user first logs in to is their home cell; other cells are called foreign.
From the user's point of view, the AFS hierarchy is accessed from /afs/[cell name] through normal
file system operations. Since AFS is location transparent, the path to a file in the tree will be the
same regardless of the location of the client. Whether the file is actually accessible or not depends
on the user's credentials and the file's access permissions, as set by Access Control Lists.
AFS clients
The client-side component of AFS is the Cache Manager. Depending on AFS version, it may be run
as a user process (with a few changes in the kernel), or implemented as a set of kernel modifications
(OpenAFS). The responsibilities of the Cache Manager include retrieving files from servers,
maintaining a local file cache, translating file requests into remote procedure calls, and storing
callbacks.
Several key observations of common file access patterns inspired the design of AFS:
• Commonly accessed files are usually a few kilobytes in size
• Reading is more common than writing
• Sequential accesses outnumber random accesses
• Even if the file is shared, writes are usually made by only one user
• Files are referenced in bursts
The strategy chosen for AFS was to cache files locally on the clients. When a file is opened, the
Cache Manager first checks if there is a valid copy of the file in the cache. If this is not the case, the
file is retrieved from a file server. A file that has been modified and closed on the client is
transferred back to the server.
Over time, the client builds up a "working set" of often-accessed files in the cache. A typical
optimal cache size is 100 MB, but this depends on what the client is used for. As long as the cached
files are not modified by other users, they do not have to be fetched from the server when
subsequently accessed. This reduces the load on the network significantly. Caching can be done on
disk or in memory.
The conceptual strategy used is that of whole file serving and caching. However, since version 3 of
AFS, data is actually served and cached in chunks of 64 kilobytes.
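The Cache Manager's decision on open() can be sketched as follows. This is an illustrative
simplification, not real AFS code: `fetch_from_server` is a hypothetical stand-in for the RPC to a
file server, and the cache is reduced to an in-memory dictionary.

```python
# Sketch of the open-time cache check, keyed by file path.
cache = {}  # path -> {"data": bytes, "valid": bool}

def fetch_from_server(path):
    # Stand-in for the remote procedure call that retrieves the file
    # from a file server. Invented for illustration only.
    return b"contents of " + path.encode()

def afs_open(path):
    entry = cache.get(path)
    if entry and entry["valid"]:
        return entry["data"]            # cache hit: no network traffic
    data = fetch_from_server(path)      # cache miss, or callback broken
    cache[path] = {"data": data, "valid": True}
    return data
```

Repeated opens of an unmodified file are served entirely from the cache, which is where the
reduction in network load comes from.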
To ensure that files are kept up-to-date, a callback mechanism is used. When a file is first opened, a
so-called Callback Promise is sent to the client together with the file. The Callback
Promise is stored on the client and marked as valid by the Cache Manager. If the file is updated by
another client, the server sends a Callback to the Cache Manager, which sets the state of the
Callback Promise to cancelled. The next time an application accesses the file, a current copy must
be fetched from the server. If a file is flushed from the cache, the server is notified so that the
callback can be removed.
In the case where the client has been rebooted, all stored Callback Promises need to be checked to
see if their statuses have changed. To achieve this, the Cache Manager issues a Cache Validation
Request.
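The promise/cancel cycle described above might be modelled like this (an illustrative state-machine
sketch; the class and method names are invented, not real AFS interfaces):

```python
VALID, CANCELLED = "valid", "cancelled"

class CacheManager:
    def __init__(self):
        self.promises = {}  # path -> VALID or CANCELLED

    def file_opened(self, path):
        # The server hands out a Callback Promise with the file;
        # the Cache Manager stores it and marks it valid.
        self.promises[path] = VALID

    def callback_received(self, path):
        # Another client changed the file: the server sends a Callback,
        # and the stored promise is set to cancelled.
        self.promises[path] = CANCELLED

    def must_refetch(self, path):
        # On the next access, anything but a valid promise forces a
        # fresh copy to be fetched from the server.
        return self.promises.get(path) != VALID

cm = CacheManager()
cm.file_opened("/afs/example.org/doc.txt")
assert not cm.must_refetch("/afs/example.org/doc.txt")
cm.callback_received("/afs/example.org/doc.txt")  # changed elsewhere
```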
If multiple clients happen to concurrently open, modify, and close the same file, the update resulting
from the last close will overwrite previous changes. That is, the AFS implementation does not
handle concurrent updates, instead delegating this responsibility to the applications.
AFS uses advisory file locking. Byte ranges cannot be locked, only whole files. For this reason,
AFS is not suited for situations where many users simultaneously need to write to a file, for
instance in database applications.
Volumes
The basic unit of storage in AFS is the volume, a logical unit corresponding to a directory in the file
tree. For example, it is common to map users' home directories to separate volumes. System files
that are not needed at boot time can also be placed in volumes.
A volume must fit in a single physical partition, but is usually much smaller. In contrast to having
partitions as the smallest unit of storage, the use of volumes offers several advantages:
• Volumes can be moved to different locations on a live system without affecting availability
• Volumes can be replicated, offering the possibility of load-balancing and increased
availability
• File system quotas can be set on a per-volume basis
• Handling of backup and recovery is simplified
Volume replication
Volumes that are frequently accessed by many clients can be replicated. This entails placing read-
only copies of a volume on different servers, and is thus most useful for volumes whose contents
are frequently accessed but seldom changed, such as UNIX libraries and programs.
Which file server is contacted when a client tries to access a file depends on server load and
availability, and is handled by the Cache Manager without client intervention.
When a file from a replicated volume is retrieved, only one Callback Promise per volume is
obtained. Any change to the volume will break the Callback Promise.
AFS servers
Server machines can have several roles, which may be combined:
• Acting as a simple file server
• Acting as a server for the replicated AFS databases
• Acting as a binary distribution machine, for delivering updated versions of the server
software
• Acting as a system control machine that distributes common configuration files and also acts
as a primary time synchronization site
When there is only one server in a cell, it will take on all of the mentioned roles. If a cell has more
than one server machine, it is recommended to run multiple database servers, to gain the
advantages of database replication. Since the administrative databases are often accessed, doing this
helps distribute the load on the servers. However, having more than three database server machines
is not necessary.
Simple file servers can be added as necessary as the demand for storage space increases. Only one
system control machine is needed in each cell, and there should be as many binary distribution
machines as there are different server system types.
The system can be administered from any client workstation in the cell.
Server processes
The AFS server implementation is modularized, and not all types of server processes are run on
each server. Here is a short summary of the different processes and their functions:
The Basic OverSeer Server (BOS) makes sure that the other server processes are running
correctly, and will restart failed processes. Its primary function is to minimize system outages, and
to reduce the need for manual administrator interventions. After a File Server or Volume Server
fails, the BOS will invoke a special process known as the Salvager, which repairs disk
inconsistencies caused by failure.
The File Server is responsible for delivering and storing files, maintaining a hierarchical file
structure, handling copy/move/create/delete requests, and keeping track of file stats (size,
modification time). It is also involved in creating links between files and granting advisory locks.
To determine if users are authorized to perform requested actions, the File Server communicates
with the Protection Server.
The Authentication Server maintains the Authentication Database, which stores user passwords
and server encryption keys. It employs Kerberos to provide services for mutual authentication
between clients and servers, and to grant time-limited tokens that clients use to prove that they have
been authenticated. An option for the administrator is to use an external Kerberos implementation,
such as MIT's.
The Protection Server maintains the Protection Database, and is a key component in the
implementation of Access Control Lists. ACLs are applied on the directory level, and constitute a
refinement of the normal UNIX permission system. There are seven different types of access
permissions (lookup, insert, delete, administer, read, write, and lock). Up to 20 different ACL
entries can be applied to each directory. For fine-grained access control, users can create ACL
groups. These enable setting permissions for many users (or IP addresses) at the same time.
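A directory-level ACL check with the seven rights could look roughly like the sketch below. The
one-letter abbreviations follow AFS's conventional "rlidwka" shorthand; the data structure and
function are invented for illustration and are not the Protection Server's actual interface.

```python
# The seven AFS directory rights: read, lookup, insert, delete,
# write, lock (k), administer -- conventionally written "rlidwka".
RIGHTS = set("rlidwka")

# Illustrative ACL store: directory -> {principal: set of rights}.
acl = {
    "/afs/example.org/u/alice": {
        "alice":          set("rlidwka"),  # owner: all seven rights
        "system:anyuser": set("rl"),       # anyone: read and lookup only
    },
}

def has_right(directory, principal, right):
    # ACLs apply at the directory level, refining UNIX permissions.
    assert right in RIGHTS
    entries = acl.get(directory, {})
    return right in entries.get(principal, set())
```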
The Volume Server provides facilities related to volume management, such as creation, relocation,
deletion and replication, and preparation for backup and archival of volumes.
The Volume Location Server maintains the Volume Location Database, which keeps track of
which servers different volumes reside on. It is a key component in the transparent access of data,
since the Cache Manager contacts it to determine from which file server to fetch a file.
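The lookup the Cache Manager performs against the VLDB amounts to mapping a volume name to
the set of servers holding a copy, then choosing one of them (a sketch; real AFS also weighs server
load and availability, and these hostnames are hypothetical):

```python
import random

# Illustrative Volume Location Database: volume name -> file servers
# holding a copy. Replicated read-only volumes list several servers.
vldb = {
    "user.alice": ["fs1.example.org"],
    "root.cell":  ["fs1.example.org", "fs2.example.org", "fs3.example.org"],
}

def locate(volume):
    # The Cache Manager asks the VL Server which file servers hold the
    # volume, then picks one -- the client never needs to know where
    # the data physically lives (location transparency).
    return random.choice(vldb[volume])
```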
The Update Server makes sure that all servers have up-to-date versions of the server processes and
common system configuration files.
The Backup Server maintains the Backup Database, and is used to create backup tape archives of
volumes.
Database replication
The contents of the four types of AFS databases may change frequently, and it is of vital
importance to system performance that all database servers have identical copies of the databases.
To ensure this, AFS employs Ubik, a replicated database mechanism implemented with RPC. Only
one server, the synchronization site, may accept updates to the databases. It runs a process known as
the Ubik coordinator. Changes are immediately propagated to other servers. During this time,
read/write access to the databases is not possible. Unless a majority of the other servers receive and
acknowledge the changes, the update is rolled back.
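The majority rule can be stated compactly: an update is kept only if more than half of the database
servers acknowledge it, otherwise it is rolled back. A sketch of that decision (illustrative only, not
the Ubik wire protocol):

```python
def commit_update(acks, total_servers):
    # Ubik-style quorum rule: keep the update only if a strict
    # majority of the database servers acknowledged it.
    return acks > total_servers // 2
```

With three database servers, two acknowledgements form a majority, which is one reason running
more than three database server machines adds little.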
A list of which servers are actually synchronized needs to be maintained. If necessary, the Ubik
coordinator may be moved to a different server through an election process, to maximize
availability.
Clock synchronization
AFS in general, and its database replication in particular, is dependent on server clock
synchronization. This is achieved by using the Network Time Protocol Daemon. One server acts as
a synchronization site, gleaning the time from an external source. The other servers update their
clocks against the synchronization site.
Discussion
The prominent features of AFS are location transparency, its performance and the built-in security:
Location transparency
AFS tracks file locations automatically through system processes and the Volume Location
Database, whereas NFS mount points, which track a file's physical location, must be set up by
administrators and users. AFS users only need to know the pathname to a file and its AFS cell.
Security
AFS uses Kerberos to authenticate users, which improves security in several respects.
Passwords are not sent over the network in plain text. Also, the authentication is mutual - both the
user and the service provider's identities are verified.
The use of Access Control Lists enables the system's users to set detailed access rights for their own
directories.
Conclusion
AFS has some very pleasing qualities, such as very good scaling and (most of the time) good
performance, thanks in part to its client caching. The drawbacks of AFS are that it isn't very
friendly for administrators, and that there may be performance problems when heavily used read-
write files exist in the system. These drawbacks seem insignificant compared to the advantages
when comparing AFS with the competition, at least historically. So why didn't AFS become the
most popular distributed file system? There is no easy answer to that question; perhaps it was the
lack of an early big company doing marketing, the fact that one had to pay a good deal of money
for an AFS license, or its early lack of multi-platform support. Perhaps AFS will have a brighter
future now that it is open source; the future will tell.