Sa301 (Midterm Coverage)
Chapter 4
Of File Systems and Storage Models
This chapter deals primarily with how we store data. Virtually all computer systems require some
way to store data permanently; even so-called “diskless” systems do require access to certain
files in order to boot, run and be useful. Albeit stored remotely (or in memory), these bits reside
on some sort of storage system.
Most frequently, data is stored on local hard disks, but over the last few years more and more of
our files have moved “into the cloud”, where different providers offer easy access to large
amounts of storage over the network. We have more and more computers depending on access
to remote systems, shifting our traditional view of what constitutes a storage device.
As system administrators, we are responsible for all kinds of devices: we build systems running
entirely without local storage just as we maintain the massive enterprise storage arrays that
enable decentralized data replication and archival. We manage large numbers of computers with
their own hard drives, using a variety of technologies to maximize throughput before the data
even gets onto a network.
In order to be able to optimize our systems on this level, it is important for us to understand the
principal concepts of how data is stored, the different storage models and disk interfaces. It is
important to be aware of certain physical properties of our storage media, and the impact they, as
well as certain historic limitations, have on how we utilize disks.
Available storage space is, despite rapidly falling prices for traditional hard disk drives, a scarce
resource. The quote at the beginning of this chapter is rather apt: no matter how much disk
space we make available to our users, we will eventually run out and need to expand. In order to
accommodate the ever-growing need for storage space, we use technologies such as Logical
Volume Management to combine multiple physical devices in a flexible manner to present a
single storage container to the operating system. We use techniques such as RAID to increase
capacity, resilience, or performance (pick two!), and we separate data from one another within one
storage device using partitions. Finally, before we can actually use the disk devices to install an
operating system or any other software, we create a file system on top of these partitions.
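To make the capacity/resilience trade-off behind "pick two!" concrete, the following minimal Python sketch computes the usable capacity and the number of tolerated disk failures for a few common RAID levels; the eight 4 TB drives are hypothetical example values, not a recommendation.

    # Usable capacity and fault tolerance for common RAID levels,
    # given n identical disks of size s (illustrative sketch only).

    def raid_summary(level, n, size_tb):
        if level == 0:       # striping: full capacity, no redundancy
            return n * size_tb, 0
        if level == 1:       # n-way mirroring: one disk's worth of capacity
            return size_tb, n - 1
        if level == 5:       # striping with single parity
            return (n - 1) * size_tb, 1
        if level == 6:       # striping with double parity
            return (n - 2) * size_tb, 2
        if level == 10:      # striped mirror pairs
            return (n // 2) * size_tb, 1   # survives at least one failure
        raise ValueError("unhandled RAID level")

    if __name__ == "__main__":
        n, size_tb = 8, 4    # hypothetical array: eight 4 TB drives
        for level in (0, 1, 5, 6, 10):
            usable, failures = raid_summary(level, n, size_tb)
            print(f"RAID {level:>2}: {usable:>3} TB usable, "
                  f"survives at least {failures} disk failure(s)")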
System administrators are expected to understand all of these topics well. Obviously, each one
can (and does) easily fill many books; in this chapter, we will review the most important concepts
underlying the different technologies from the bottom up to the file system level. At each point, we
will compare and contrast traditional systems with recent developments, illustrating how the
principles, even if applied differently, remain the same. For significantly deeper discussions and
many more details, please see the chapter references, in particular the chapters on file systems
in Silberschatz[8] and McKusick et al.’s canonical paper on the Berkeley Fast File System[9].
In contrast to DAS, a dedicated file server generally contains significantly more and larger disks;
RAID or LVM may likewise be considered a requirement in this solution, so as to ensure both
performance and failover. Given the additional overhead of transferring data over the network, it
comes as no surprise that a certain performance penalty (mainly due to network speed or
congestion) is incurred. Careful tuning of the operating system and in particular the network
stack, the TCP window size, and the buffer cache can help minimize this cost.
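As a small illustration of the kind of tuning involved, the Python sketch below inspects and adjusts the send and receive buffer sizes of a single socket, which bound the effective TCP window for that connection; the system-wide defaults and maxima are normally set in the operating system's network stack (for example via its sysctl-style tunables), and the 1 MiB value here is an arbitrary example, not a recommendation.

    import socket

    # Per-socket buffer sizes: larger buffers allow a larger effective TCP
    # window on high-latency links. The kernel may round or cap the requested
    # values according to its own configured limits.

    REQUESTED = 1 << 20   # 1 MiB, an arbitrary example value

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    for name, opt in (("SO_RCVBUF", socket.SO_RCVBUF),
                      ("SO_SNDBUF", socket.SO_SNDBUF)):
        before = sock.getsockopt(socket.SOL_SOCKET, opt)
        sock.setsockopt(socket.SOL_SOCKET, opt, REQUESTED)
        after = sock.getsockopt(socket.SOL_SOCKET, opt)
        print(f"{name}: default {before} bytes, now {after} bytes")

    sock.close()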
The benefits of using a central file server for data storage are immediate and obvious: data is no
longer restricted to a single physical or virtual host and can be accessed (simultaneously) by
multiple clients. By pooling larger resources in a dedicated NAS device, more storage becomes
available.
Other than the performance impact we mentioned above, the distinct disadvantage lies in the fact
that the data becomes unavailable if the network connection suffers a disruption. In many
environments, the network connection can be considered sufficiently reliable and persistent to
alleviate this concern. However, such solutions are less suitable for mobile clients, such as
laptops or mobile devices, which frequently may disconnect from and reconnect to different
networks. Recent developments in the area of Cloud Storage have provided a number of
solutions (see Section 4.2.4), but it should be noted that mitigation can also be found in certain
older network file systems and protocols: the Andrew File System (AFS), for example, uses a
local caching mechanism that lets it cope with the temporary loss of connectivity without blocking.
While network attached storage is most frequently used for large, shared partitions or data
resources, it is possible to boot and run a server entirely without any direct attached storage. In
this case, the entire file system, operating system kernel and user data may reside on the
network. We touch on this special setup in future chapters.
Figure: NAS and SAN enterprise hardware. On the left, Huawei storage servers with 24 hard drives each
and built-in hardware RAID controllers; on the right, a NetApp Fabric Attached Storage device,
also known as a “filer”, with disk enclosures commonly referred to as “shelves”.
As requirements grow with respect to storage size, data availability, data redundancy, and performance, it becomes desirable to allow
different clients to access large chunks of storage on a block level. To accomplish this, we build
high performance networks specifically dedicated to the management of data storage: Storage
Area Networks.
In these dedicated networks, central storage media is accessed using high performance
interfaces and protocols such as Fibre Channel or iSCSI, making the exposed devices appear
local on the clients. As you can tell, the boundaries between these storage models are not rigid: a
single storage device connected via Fibre Channel to a single host (i.e. an example of DAS) is
indistinguishable (to the client) from a dedicated storage device made available over a Storage
Area Network (SAN). In fact, today’s network file servers frequently manage storage made
available to them over a SAN to export a file system to the clients as NAS.
Figure 4.4 illustrates how the storage volumes managed within a SAN can be accessed by one
host as if it was direct attached storage while other parts are made available via a file server as
NAS to different clients.
Figure 4.4: A SAN providing access to three devices; one host accesses parts of the available
storage as if it was DAS, while a file server manages other parts as NAS for two clients.
In order for the different consumers in a SAN to be able to independently address the distinct storage units, each
is identified by a unique Logical Unit Number (LUN). The system administrator combines the
individual disks via RAID, for example, into separate volumes; assigning to each storage unit an
independent LUN allows for correct identification by the clients and prevents access of data by
unauthorized servers, an important security mechanism. Fibre Channel switches used in SANs
allow further partitioning of the fabric by LUNs and subdivision into SAN Zones, granting specific
sets of clients access to “their” storage devices only.
In this storage model the clients – the computers, file servers or other devices directly attached to
the SAN – are managing the volumes on a block level (much like a physical disk, as discussed in
Section 4.3.1). That is, they
need to create a logical structure on top of the block devices (which is how the SAN units appear to them),
and they control all aspects of the I/O operations down to the protocol level. With this low-level access,
clients can treat the storage like any other device. In particular, they can boot off SAN attached
devices, they can partition the volumes, create different file systems for different purposes on
them and export them via other protocols.
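To illustrate what block-level access means in practice, the following Python sketch reads the first sector of a disk and decodes a classic MBR partition table; the device path is a placeholder, reading it requires appropriate privileges, and a GPT-labeled disk would need a different decoder.

    import struct

    # Decode the four primary partition entries of a classic MBR (sector 0).
    # Each 16-byte entry: status, CHS start (3 bytes), type, CHS end (3 bytes),
    # starting LBA (4 bytes, little-endian), number of sectors (4 bytes).

    DEVICE = "/dev/sda"   # placeholder; adjust for your system, requires root

    with open(DEVICE, "rb") as dev:
        sector = dev.read(512)

    assert sector[510:512] == b"\x55\xaa", "no MBR boot signature found"

    for i in range(4):
        entry = sector[446 + i * 16 : 446 + (i + 1) * 16]
        _status, ptype, lba_start, num_sectors = struct.unpack(
            "<B3xB3xII", entry)
        if num_sectors == 0:
            continue   # unused slot
        print(f"partition {i + 1}: type 0x{ptype:02x}, "
              f"starts at LBA {lba_start}, "
              f"{num_sectors * 512 // (1024 ** 2)} MiB")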
Storage area networks are frequently labeled an “enterprise solution” due to their significant
performance advantages and distributed nature. Especially when used in a switched fabric,
additional resources can easily be made available to all or a subset of clients. These networks
utilize the Small Computer System Interface (SCSI) protocol for communications between the
different devices; in order to build a network on top of this, an additional protocol layer – the Fibre
Channel Protocol (FCP) being the most common one – is required. We will review the various
protocols and interfaces in Section 4.3.
SANs overcome their restriction to a local area network by further encapsulation of the protocol:
Fibre Channel over Ethernet (FCoE) or iSCSI, for example, allow connecting switched SAN
components across a Wide Area Network (or WAN). But the concept of network attached storage
devices facilitating access to a larger storage area network becomes less accurate when end
users require access to their data from anywhere on the Internet. Cloud storage solutions have
been developed to address these needs. However, as we take a closer look at these
technologies, it is important to remember that at the end of the day, somewhere a system
administrator is in charge of making available the actual physical storage devices underlying
these solutions. Much like a file server may provide NAS to its clients over a SAN, so do cloud
storage solutions provide access on “enterprise scale” (and at this size the use of these words
finally seems apt) based on the foundation of the technologies we discussed up to here.
We also have come full circle from direct attached storage providing block-level access, to
distributed file systems, and then back around to block-level access over a dedicated storage
network. But this restricts access to clients on this specific network. As more and more (especially
smaller or mid-sized) companies are moving away from maintaining their own infrastructure
towards a model of Infrastructure as a Service (IaaS) and Cloud Computing, the storage
requirements change significantly, and we enter the area of Cloud Storage.
The term “cloud storage” still has a number of conflicting or surprisingly different meanings. On
the one hand, we have commercial services offering file hosting or file storage services; common
well-known providers currently include Dropbox, Google Drive, Apple’s iCloud and Microsoft’s
SkyDrive. These services offer customers a way to not only store their files, but to access them
from different devices and locations: they effectively provide network attached storage over the
largest of WANs, the Internet.
On the other hand, we have companies in need of more flexible storage solutions than can be
provided with the existing models. Especially the increased use of virtualization technologies
demands faster and more flexible access to reliable, persistent yet relocatable storage devices. In
order to meet these requirements, storage units are rapidly allocated from large storage area
networks spanning entire data centers.
Since the different interpretations of the meaning of “cloud storage” yield significantly different
requirements, the implementations naturally vary, and there are no current industry standards
defining an architecture. As such, we are forced to treat each product independently as a black
box; system administrators and architects may choose to use any number of combinations of the
previously discussed models to provide the storage foundation upon which the final solution is
built.
We define three distinct categories within this storage model: (1) services that provide file system
level access, as in the case of file hosting services such as those mentioned above; (2) services
that provide access on the object level, hiding file system implementation details from the client
and providing for easier abstraction into an API, commonly accessed via web services such
as Amazon’s Simple Storage Service (S3) or Windows Azure’s Blob Storage; and (3) services
that offer clients access on the block level, allowing them to create file systems and partitions as
they see fit (examples include Amazon’s Elastic Block Store (EBS) and OpenStack’s Cinder
service).
Figure 4.5: A possible cloud storage model: an internal SAN is made available over the Internet to
multiple clients. In this example, the storage provider effectively functions as a NAS server,
though it should generally be treated as a black box.
All of these categories have one thing in common, however: in order to provide the ability to
access storage units in a programmatic way – a fundamental requirement to enable the
flexibility needed in demanding environments – they rely on a clearly specified API. Multiple
distributed resources are combined to present a large storage pool, from which units are
allocated, de-allocated, re-allocated, relocated, and duplicated, all by way of higher-level
programs using well-defined interfaces to the lower-level storage systems.
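As a sketch of what such object-level, API-driven access looks like, the following Python snippet uses the boto3 library to store and retrieve a single object in Amazon S3; the bucket name and key are hypothetical, and valid AWS credentials are assumed to be configured in the environment.

    import boto3

    # Object-level access: the client deals in buckets, keys and byte streams;
    # the underlying disks, volumes and file systems remain entirely hidden.

    s3 = boto3.client("s3")

    BUCKET = "example-sa301-bucket"   # hypothetical bucket name
    KEY = "reports/midterm.txt"       # hypothetical object key

    # Store an object ("upload").
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=b"hello, object storage")

    # Retrieve it again ("download") and read the payload.
    response = s3.get_object(Bucket=BUCKET, Key=KEY)
    print(response["Body"].read().decode())

    # Remove the object when done.
    s3.delete_object(Bucket=BUCKET, Key=KEY)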
Customers of cloud storage solution providers reap a wealth of benefits, including: their
infrastructure is simplified through the elimination of storage components; storage units are
almost immediately made available as needed and can grow or shrink according to immediate or
predicted usage patterns;
applications and entire OS images can easily be deployed, imported or exported as virtual
appliances.
Of course these benefits carry a cost. As usual, any time we add layers of abstraction we also run
the risk of increasing, possibly exponentially, the number of ways in which a system can fail.
Cloud storage is no exception: by relying on abstracted storage containers from a third-party
provider, we remove the ability to troubleshoot a system end-to-end; by outsourcing data storage,
we invite a number of security concerns regarding data safety and privacy; by accessing files
over the Internet, we may increase latency and decrease throughput; the cloud service provider
may become a single point of failure for our systems, one that is entirely outside our control.
NAS and SAN solutions typically rely on a private, trusted network to restrict
access to the shared storage, even though network-layer security mechanisms such as IPsec may be
combined with or integrated into the solution. Cloud storage, on the other hand, has to directly
address the problem of transmitting data and providing access over untrusted networks and thus
usually relies on application layer protocols such as Transport Layer Security (TLS)/Secure
Sockets Layer (SSL).
We will touch on some of these aspects in future sections and chapters, but you should keep
them in mind as you evaluate different solutions for different use cases. As we will see, in most
cases the simpler model turns out to be the more scalable and more secure one as well, so
beware adding unneeded layers of complexity!
Despite the increasing popularity of Solid State Drives (SSDs), which offer lower power consumption
and generally higher performance, the dominant medium in use
especially in enterprise scale storage solutions remains the ubiquitous hard drive6, storing data
on rotating, magnetic platters (see Figure 4.6a). Understanding the physical structure of these
traditional storage devices is important for a system administrator, as the principles behind
the addressing modes and partition schemas used here come into play
when we look at how file systems manage data efficiently.
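As a brief illustration of the addressing modes mentioned here, the following Python sketch converts a cylinder/head/sector (CHS) address into a logical block address (LBA) using the standard conversion formula; the 16-head, 63-sector geometry is only an example, chosen to match the classic ATA limits.

    # Convert a CHS (cylinder/head/sector) address to a logical block address:
    # LBA = (C * heads_per_cylinder + H) * sectors_per_track + (S - 1).
    # Sectors are numbered starting at 1, cylinders and heads at 0.

    HEADS_PER_CYLINDER = 16   # example geometry (classic ATA limits)
    SECTORS_PER_TRACK = 63

    def chs_to_lba(c, h, s):
        return (c * HEADS_PER_CYLINDER + h) * SECTORS_PER_TRACK + (s - 1)

    if __name__ == "__main__":
        # The very first sector of the disk: cylinder 0, head 0, sector 1.
        print(chs_to_lba(0, 0, 1))    # -> 0
        # An arbitrary address further into the disk.
        print(chs_to_lba(2, 4, 7))    # -> (2*16 + 4) * 63 + 6 = 2274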
Hard drives can be made available to a server in a variety of ways. Individual disks are
connected directly to a Host Bus Adapter (HBA) using a single data/control cable and a separate
power cable. The traditional interfaces here are SCSI, PATA and SATA, as well as Fibre
Channel.
SCSI, the Small Computer System Interface, has been around for over 25 years and has seen a
large number of confusing implementations and standards7. Once the default method to connect
any peripheral device using long, wide and generally unwieldy ribbon cables, SCSI has now been
largely obsoleted by the Advanced Technology Attachment (ATA) standards. At the same time,
however, it lives on in the Internet Small Computer System Interface (iSCSI), a standard
specifying storage connections using the SCSI command protocol over IP-based networks. iSCSI
is a common choice in storage area networks; as it uses TCP/IP over existing networks, it does
not require a dedicated storage network as is the case in traditional Fibre Channel SANs.
The Parallel Advanced Technology Attachment (PATA) standard, also frequently referred to as
IDE (for Integrated Device Electronics, a reference to the fact that the drive controller is included
in the hard drive), uses a 40 or 80 wire ribbon cable and allows for a maximum number of two
devices on the connector (traditionally referred to as the master and slave; this is often a source
of confusion, as neither device takes control or precedence over the other).
Faster throughput, smaller cables and support for hot-swapping, i.e. the ability to replace a drive
without having to shut down the operating system8, were some of the advantages provided by the
Serial ATA (SATA) interface. A number of revisions and updates to the standard added more
advanced features and, most significantly, increasingly greater transfer speeds.
6 Despite declining prices for SSDs, as of 2012 traditional hard drives remain notably cheaper than
SSDs and have higher storage capacity.
7 Different SCSI versions include such wonderful variations as Fast SCSI, Fast Wide SCSI, Ultra
SCSI, and Wide Ultra SCSI. None of which are to be confused with iSCSI, of course.
8 Hot-swapping was a standard feature in many SCSI implementations. Many system
administrators in charge of the more powerful servers, using larger and more performant [...]
Most motherboards have integrated ATA host adapters, but a server can be extended with
additional HBAs via, for example, its PCI Express expansion slots; similarly, dedicated storage
appliances make use of disk array controllers to combine multiple drives into logical units (more
on that in Section 4.4). Fibre Channel HBAs finally allow a server to connect to a dedicated Fibre
Channel SAN. All of these interfaces can be either internal (the devices connected to the bus are
housed within the same physical enclosure as the server) or external (the devices are entirely
separate from the server, racked and powered independently, and connected with suitable cables).
In the end, consider a host with a large amount of DAS alongside a NAS server managing multiple
terabytes of file space that is housed in a separate device and accessed over a SAN:
the main difference lies not in the technologies and protocols used, but in how they are
combined.