City of Passi

SCHOOL OF SCIENCE IN INFORMATION AND COMMUNICATION TECHNOLOGY


SYSTEM ADMINISTRATION AND MAINTENANCE – SA 301
2ND Semester, A.Y. 2022-2023

MIDTERM COVERAGE

Chapter 4
Of File Systems and Storage Models
This chapter deals primarily with how we store data. Virtually all computer systems require some
way to store data permanently; even so-called “diskless” systems do require access to certain
files in order to boot, run and be useful. Albeit stored remotely (or in memory), these bits reside
on some sort of storage system.
Most frequently, data is stored on local hard disks, but over the last few years more and more of
our files have moved “into the cloud”, where different providers offer easy access to large
amounts of storage over the network. We have more and more computers depending on access
to remote systems, shifting our traditional view of what constitutes a storage device.

As system administrators, we are responsible for all kinds of devices: we build systems running
entirely without local storage just as we maintain the massive enterprise storage arrays that
enable decentralized data replication and archival. We manage large numbers of computers with
their own hard drives, using a variety of technologies to maximize throughput before the data
even gets onto a network.
In order to be able to optimize our systems on this level, it is important for us to understand the
principal concepts of how data is stored, the different storage models and disk interfaces. It is
important to be aware of certain physical properties of our storage media, and the impact they, as
well as certain historic limitations, have on how we utilize disks.
Available storage space is, despite rapidly falling prices for traditional hard disk drives, a scarce
resource. The quote at the beginning of this chapter is rather apt: no matter how much disk
space we make available to our users, we will eventually run out and need to expand. In order to
accommodate the ever-growing need for storage space, we use technologies such as Logical
Volume Management to combine multiple physical devices in a flexible manner to present a
single storage container to the operating system. We use techniques such as RAID to increase
capacity, resilience, or performance (pick two!), and separate data from one another within one
storage device using partitions. Finally, before we can actually use the disk devices to install an
operating system or any other software, we create a file system on top of these partitions.
System administrators are expected to understand all of these topics well. Obviously, each one
can (and does) easily fill many books; in this chapter, we will review the most important concepts
underlying the different technologies from the bottom up to the file system level. At each point, we
will compare and contrast traditional systems with recent developments, illustrating how the
principles, even if applied differently, remain the same. For significantly deeper discussions and
many more details, please see the chapter references, in particular the chapters on file systems
in Silberschatz[8] and McKusick et al.’s canonical paper on the Berkeley Fast File System[9].

4.2 Storage Models


We distinguish different storage models by how the device in charge of keeping the bits in place
interacts with the higher layers: by where raw block device access is made available, by where a
file system is created to make available the disk space as a useful unit, by which means and
protocols the operating system accesses the file system. Somewhat simplified, we identify three
main components: the storage device itself, i.e. the actual medium; the file system, which makes
the block-level storage media available to the operating system; and finally the application
software. The operating system managing the file system and running the application software
then acts as the agent making actual I/O possible.

4.2.1 Direct Attached Storage


By far the most common way to access storage is so simple that we rarely think about it as a
storage model: hard drives are attached (commonly via a host bus adapter and a few cables)
directly to the server; the operating system detects the block devices and maintains a file system
on them, thus allowing for access with the smallest level of indirection. The vast majority of
hosts (laptops, desktop and server systems alike) all utilize this method. The term used
nowadays – Direct Attached Storage (DAS) – was effectively created only after other approaches
became popular enough to require a simple differentiating name.
Figure 4.1a illustrates this model: all interfacing components are within the control of a single
server’s operating system (and frequently located within the same physical case) and multiple
servers each have their own storage system. On the hardware level, the storage media may be
attached using a variety of technologies, and we have seen a number of confusing standards
come and go over the years. The best choice here depends on many factors, including the
number of devices to be attached, driver support for the connecting interface in the OS, and
performance or reliability considerations. We will touch upon all of these aspects throughout this
chapter.
A server with a single hard drive (such as the one shown in Figure 4.1b) is perhaps the simplest
application of this model. But it is not uncommon for servers to have multiple direct attached
devices, the configuration of which then depends entirely on the next higher level of storage
strategies. Individual disks can simply be mounted in different locations of the file system
hierarchy.
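As a small, concrete illustration of this model, the following Python sketch (not part of the original text; it assumes a Linux host where mounted file systems are listed in /proc/mounts) walks the locally mounted block devices and reports capacity and free space for each, which is often the first thing an administrator checks on a host with several direct attached disks.

#!/usr/bin/env python3
"""Report capacity and free space for locally mounted block devices.

A minimal sketch assuming a Linux host: mounted file systems are read
from /proc/mounts, and entries whose source starts with /dev are taken
to be locally visible (typically direct attached) block devices.
"""
import os

def local_mounts(mounts_file="/proc/mounts"):
    """Yield (device, mount_point, fs_type) for /dev-backed mounts."""
    with open(mounts_file) as f:
        for line in f:
            device, mount_point, fs_type = line.split()[:3]
            if device.startswith("/dev/"):
                yield device, mount_point, fs_type

def report():
    for device, mount_point, fs_type in local_mounts():
        st = os.statvfs(mount_point)
        total_gb = st.f_blocks * st.f_frsize / 1e9
        free_gb = st.f_bavail * st.f_frsize / 1e9
        print(f"{device:<20} {mount_point:<25} {fs_type:<8} "
              f"{total_gb:8.1f} GB total  {free_gb:8.1f} GB free")

if __name__ == "__main__":
    report()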
Alternatively, multiple direct attached disks can be combined to create a single logical storage
unit through the use of a Logical Volume Manager (LVM) or a Redundant Array of Independent
Disks (RAID). This allows for improved performance, increased amount of storage and/or
redundancy. We will discuss these concepts in more detail in Section 4.4.
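To make the idea of combining disks more concrete, the sketch below shows the typical Linux LVM command sequence driven from Python; the device names /dev/sdb1 and /dev/sdc1, the volume group and volume names, and the requested size are placeholders, and running anything like this for real requires root privileges and destroys existing data on the named devices.

#!/usr/bin/env python3
"""Illustrative sketch: combine two disks into one logical volume via LVM.

The device names, volume group name and size below are placeholders; the
commands themselves (pvcreate, vgcreate, lvcreate, mkfs.ext4) are the
standard Linux LVM and file system tools.
"""
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def build_logical_volume(disks=("/dev/sdb1", "/dev/sdc1"),
                         vg="vg_data", lv="lv_data", size="500G"):
    # Label each disk as an LVM physical volume.
    for disk in disks:
        run(["pvcreate", disk])
    # Pool the physical volumes into one volume group.
    run(["vgcreate", vg] + list(disks))
    # Carve a single logical volume out of the pool ...
    run(["lvcreate", "--name", lv, "--size", size, vg])
    # ... and create a file system on top of it.
    run(["mkfs.ext4", f"/dev/{vg}/{lv}"])

if __name__ == "__main__":
    build_logical_volume()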
Direct attached storage need not be physically located in the same case (or even rack) as the
server using it. That is, we differentiate between internal storage (media attached inside the
server with no immediate external exposure) and external storage (media attached to a server’s
interface ports, such as Fibre Channel, USB etc.) with cables whose lengths depend on the
technology used. External media allows us to have large amounts of storage housed in a
separate enclosure with its own power supply, possibly located several feet away from the server.
If a server using these disks suffers a hardware failure, it becomes significantly easier to move
the data to another host: all you need to do is connect the cable to the new server.
Simple as this architecture is, it is also ubiquitous. The advantages of DAS should be obvious:
since there is no network or other additional layer in between the operating system and the
hardware, the possibility of failure on that level is eliminated. Likewise, a performance penalty
due to network latency, for example, is impossible. As system administrators, we frequently need
to carefully eliminate possible causes of failures, so the fewer layers of indirection we have
between the operating system issuing I/O operations and the bits actually ending up on a storage
medium, the better.
At the same time, there are some disadvantages. Since the storage media is, well, directly
attached, it implies a certain isolation from other systems on the network. This is both an
advantage as well as a drawback: on the one hand, each server requires certain data to be
private or unique to its operating system; on the other hand, data on one machine cannot
immediately be made available to other systems. This restriction is overcome with either one of
the two storage models we will review next: Network Attached Storage (NAS) and Storage Area
Networks (SANs).
DAS can easily become a shared resource by letting the operating system make available a local
storage device over the network. In fact, all network file servers and appliances ultimately are
managing direct attached storage on behalf of their clients; DAS becomes a building block of
NAS. Likewise, physically separate storage enclosures can function as DAS if connected directly
to a server, or they may be combined with others and connected to a network or storage fabric,
that is: they become part of a SAN.

4.2.2 Network Attached Storage


As the need for more and more data arises, we frequently want to be able to access certain data
from multiple servers. An old and still very common example is to store all your users’ data on
shared disks that are made available to all clients over the network. When a user logs into hostA,
she expects to find all her files in place just as when she logs into hostB. To make this magic
happen, two things are required: (1) the host’s file system has to know how to get the data from a
central location and (2) the central storage has to be accessible over the network. As you can tell,
this introduces a number of complex considerations, not the least of which are access control and
performance.
For the moment, let us put aside these concerns, however, and look at the storage model from a
purely architectural point of view: One host functions as the “file server”, while multiple clients
access the file system over the network. The file server may be a general purpose Unix system or
a special network appliance – either way, it provides access to a number of disks or
other storage media, which, within this system, are effectively direct attached storage. In order for
the clients to be able to use the server’s file system remotely, they require support for (and have
to be in agreement with) the protocols used. However, the clients do not require access to the
storage media on the block level; in fact, they cannot gain such access.
From the clients’ perspective, the job of managing storage has become simpler: I/O operations
are performed on the file system much as they would be on a local file system, with the
complexity of how to shuffle the data over the network being handled in the protocol in question.
This model is illustrated in Figure 4.2, albeit in a somewhat simplified manner: even though the
file system is created on the file server, the clients still require support for the network file system
that brokers the transaction performed locally with the file server.
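As a concrete illustration of the client side of this model, the short sketch below mounts a remote NFS export and then uses it like any local directory; the server name fileserver.example.com, the export path and the mount point are placeholders, and NFS is only one of several protocols a NAS client might speak.

#!/usr/bin/env python3
"""Sketch: mount an NFS export and use it like a local directory.

fileserver.example.com, /export/home and /mnt/home are placeholders; the
client only needs NFS protocol support and network access, never
block-level access to the server's disks.
"""
import subprocess

SERVER_EXPORT = "fileserver.example.com:/export/home"
MOUNT_POINT = "/mnt/home"

def mount_nfs():
    # Requires root; the kernel NFS client handles all network I/O.
    subprocess.run(["mount", "-t", "nfs", SERVER_EXPORT, MOUNT_POINT],
                   check=True)

def demo():
    mount_nfs()
    # From here on, ordinary file I/O works as if the data were local.
    with open(f"{MOUNT_POINT}/README.txt", "w") as f:
        f.write("written by a NAS client\n")

if __name__ == "__main__":
    demo()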

In contrast to DAS, a dedicated file server generally contains significantly more and larger disks;
RAID or LVM may likewise be considered a requirement in this solution, so as to ensure both
performance and failover. Given the additional overhead of transferring data over the network, it
comes as no surprise that a certain performance penalty (mainly due to network speed or
congestion) is incurred. Careful tuning of the operating system and in particular the network
stack, the TCP window size, and the buffer cache can help minimize this cost.
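On a Linux file server or client, the tuning knobs mentioned above are exposed under /proc/sys; the read-only sketch below simply prints the current TCP buffer limits so they can be compared against the bandwidth-delay product of the network in question (which values are actually appropriate depends on the workload and kernel version).

#!/usr/bin/env python3
"""Print current Linux TCP buffer limits relevant to NAS throughput.

Read-only sketch; the paths assume a Linux /proc/sys layout, and the
right settings depend on the network's bandwidth-delay product.
"""
PARAMS = [
    "/proc/sys/net/core/rmem_max",   # max socket receive buffer (bytes)
    "/proc/sys/net/core/wmem_max",   # max socket send buffer (bytes)
    "/proc/sys/net/ipv4/tcp_rmem",   # min / default / max TCP receive buffer
    "/proc/sys/net/ipv4/tcp_wmem",   # min / default / max TCP send buffer
]

def show_tcp_buffers():
    for path in PARAMS:
        try:
            with open(path) as f:
                print(f"{path}: {f.read().strip()}")
        except FileNotFoundError:
            print(f"{path}: not available on this system")

if __name__ == "__main__":
    show_tcp_buffers()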
The benefits of using a central file server for data storage are immediate and obvious: data is no
longer restricted to a single physical or virtual host and can be accessed (simultaneously) by
multiple clients. By pooling larger resources in a dedicated NAS device, more storage becomes
available.
Other than the performance impact we mentioned above, the distinct disadvantage lies in the fact
that the data becomes unavailable if the network connection suffers a disruption. In many
environments, the network connection can be considered sufficiently reliable and persistent to
alleviate this concern. However, such solutions are less suitable for mobile clients, such as
laptops or mobile devices, which frequently may disconnect from and reconnect to different
networks. Recent developments in the area of Cloud Storage have provided a number of
solutions (see Section 4.2.4), but it should be noted that mitigation can also be found in certain
older network file systems and protocols: the Andrew File System (AFS), for example, uses a
local caching mechanism that lets it cope with the temporary loss of connectivity without blocking.
While network attached storage is most frequently used for large, shared partitions or data
resources, it is possible to boot and run a server entirely without any direct attached storage. In
this case, the entire file system, operating system kernel and user data may reside on the
network. We touch on this special setup in future chapters.

4.2.3 Storage Area Networks


Network Attached Storage (NAS) allows multiple clients to access the same file system over the
network, but that means it requires all clients to use specifically this file system. The NAS file
server manages and handles the creation of the file systems on the storage media and allows for
shared access, overcoming many limitations of direct attached storage. At the same time,
however, and especially as we scale up our requirements with respect to storage size, data
availability, data redundancy, and performance, it becomes desirable to allow different clients to
access large chunks of storage on a block level. To accomplish this, we build high performance
networks specifically dedicated to the management of data storage: Storage Area Networks.

Figure 4.3: NAS and SAN enterprise hardware. On the left, Huawei storage servers with 24 hard
drives each and built-in hardware RAID controllers; on the right, a NetApp Fabric Attached
Storage device, also known as a “filer”, with disk enclosures commonly referred to as “shelves”.
In these dedicated networks, central storage media is accessed using high performance
interfaces and protocols such as Fibre Channel or iSCSI, making the exposed devices appear
local on the clients. As you can tell, the boundaries between these storage models are not rigid: a
single storage device connected via Fibre Channel to a single host (i.e. an example of DAS)
is indistinguishable (to the client) from a dedicated storage device made available over a Storage
Area Network (SAN). In fact, today’s network file servers frequently manage storage made
available to them over a SAN to export a file system to the clients as NAS.
Figure 4.4 illustrates how the storage volumes managed within a SAN can be accessed by one
host as if it was direct attached storage while other parts are made available via a file server as
NAS to different clients.

Figure 4.4: A SAN providing access to three devices; one host accesses parts of the available
storage as if it was DAS, while a file server manages other parts as NAS for two clients.

In order for the different consumers in a SAN to be able to independently address the distinct
storage units, each
is identified by a unique Logical Unit Number (LUN). The system administrator combines the
individual disks, via RAID for example, into separate volumes; assigning each storage unit an
independent LUN allows for correct identification by the clients and prevents access to data by
unauthorized servers, an important security mechanism. Fibre Channel switches used in SANs
allow further partitioning of the fabric by LUN and subdivision into SAN Zones, granting sets of
clients access to “their” storage devices only.
In this storage model the clients – the computers, file servers or other devices directly attached to
the SAN – are managing the volumes on a block level (much like a physical disk, as discussed in
Section 4.3.1). That is, they need to create a logical structure on top of the block devices (as
which the SAN units appear),
and they control all aspects of the I/O operations down to the protocol. With this low-level access,
clients can treat the storage like any other device. In particular, they can boot off SAN attached
devices, they can partition the volumes, create different file systems for different purposes on
them and export them via other protocols.
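To make the block-level nature of SAN access tangible, the sketch below uses the standard Linux open-iscsi tools, driven from Python, to discover and log in to an iSCSI target; the portal address 192.0.2.10 and the target IQN are placeholders. Once the login succeeds, the LUN appears as just another local block device that can be partitioned, formatted or placed under LVM, exactly as described above.

#!/usr/bin/env python3
"""Sketch: attach an iSCSI LUN so it appears as a local block device.

Assumes the Linux open-iscsi utilities are installed; the portal address
and the target IQN below are placeholders for illustration only.
"""
import subprocess

PORTAL = "192.0.2.10"                             # iSCSI portal on the SAN
TARGET = "iqn.2023-01.com.example:storage.lun0"   # placeholder target name

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def attach_lun():
    # Ask the portal which targets it exports.
    run(["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL])
    # Log in to one target; the kernel then creates a /dev/sdX block device.
    run(["iscsiadm", "-m", "node", "-T", TARGET, "-p", PORTAL, "--login"])
    # From here on the LUN is handled like direct attached storage: it can
    # be partitioned, put under LVM or RAID, and given a file system.

if __name__ == "__main__":
    attach_lun()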
Storage area networks are frequently labeled an “enterprise solution” due to their significant
performance advantages and distributed nature. Especially when used in a switched fabric,
additional resources can easily be made available to all or a subset of clients. These networks
utilize the Small Computer System Interface (SCSI) protocol for communications between the
different devices; in order to build a network on top of this, an additional protocol layer – the Fibre
Channel Protocol (FCP) being the most common one – is required. We will review the various
protocols and interfaces in Section 4.3.
SANs overcome their restriction to a local area network by further encapsulation of the protocol:
Fibre Channel over Ethernet (FCoE) or iSCSI, for example, allow connecting switched SAN
components across a Wide Area Network (or WAN). But the concept of network attached storage
devices facilitating access to a larger storage area network becomes less accurate when end
users require access to their data from anywhere on the Internet. Cloud storage solutions have
been developed to address these needs. However, as we take a closer look at these
technologies, it is important to remember that at the end of the day, somewhere a system
administrator is in charge of making available the actual physical storage devices underlying
these solutions. Much like a file server may provide NAS to its clients over a SAN, so do cloud
storage solutions provide access on “enterprise scale” (and at this size the use of these words
finally seems apt) based on the foundation of the technologies we discussed up to here.

4.2.4 Cloud Storage


In the previous sections we have looked at storage models that ranged from the very simple and
very local to a more abstracted and distributed approach, to a solution that allows access across
even a Wide Area Network (WAN). At each step, we have introduced additional layers of
indirection with the added benefit of being able to accommodate larger requirements: more
clients, more disk space, increased redundancy, etc.

We also have come full circle from direct attached storage providing block-level access, to
distributed file systems, and then back around to block-level access over a dedicated storage
network. But this restricts access to clients on this specific network. As more and more (especially
smaller or mid-sized) companies are moving away from maintaining their own infrastructure
towards a model of Infrastructure as a Service (IaaS) and Cloud Computing, the storage
requirements change significantly, and we enter the area of Cloud Storage.
The term “cloud storage” still has a number of conflicting or surprisingly different meanings. On
the one hand, we have commercial services offering file hosting or file storage services; common
well-known providers currently include Dropbox, Google Drive, Apple’s iCloud and Microsoft’s
SkyDrive. These services offer customers a way to not only store their files, but to access them
from different devices and locations: they effectively provide network attached storage over the
largest of WANs, the Internet.
On the other hand, we have companies in need of more flexible storage solutions than can be
provided with the existing models. Especially the increased use of virtualization technologies
demands faster and more flexible access to reliable, persistent, yet relocatable storage devices. In
order to meet these requirements, storage units are rapidly allocated from large storage area
networks spanning entire data centers.
Since the different interpretations of the meaning of “cloud storage” yield significantly different
requirements, the implementations naturally vary, and there are no current industry standards
defining an architecture. As such, we are forced to treat each product independently as a black
box; system administrators and architects may choose to use any number of combinations of the
previously discussed models to provide the storage foundation upon which the final solution is
built.
We define three distinct categories within this storage model: (1) services that provide file system
level access, as in the case of file hosting services such as those mentioned above; (2) services
that provide access on the object level, hiding file system implementation details from the client,
providing for easier abstraction into an API, and commonly accessed via web services such as
Amazon’s Simple Storage Service (S3) or Windows Azure’s Blob Storage; and (3) services
that offer clients access on the block level, allowing them to create file systems and partitions as
they see fit (examples include Amazon’s Elastic Block Store (EBS) and OpenStack’s Cinder
service).

Figure 4.5: A possible cloud storage model: an internal SAN is made available over the Internet to
multiple clients. In this example, the storage provider effectively functions as a NAS server,
though it should generally be treated as a black box.
All of these categories have one thing in common, however. In order to provide the ability to
access storage units in a programmatic way – a fundamental requirement to enable the
flexibility needed in demanding environments – they rely on a clearly specified API. Multiple
distributed resources are combined to present a large storage pool, from which units are
allocated, de-allocated, re-allocated, relocated, and duplicated, all by way of higher-level
programs using well-defined interfaces to the lower-level storage systems.
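As an example of such a well-defined interface, the sketch below stores and retrieves an object through Amazon S3’s API using the boto3 library; the bucket name is a placeholder and AWS credentials are assumed to be configured in the environment. Other object stores expose broadly similar, though not identical, primitives.

#!/usr/bin/env python3
"""Sketch: object-level cloud storage via Amazon S3's API (boto3).

Assumes AWS credentials are configured in the environment; the bucket
name below is a placeholder. To the client, storage is an API call, not
a block device or a mounted file system.
"""
import boto3

BUCKET = "example-backup-bucket"   # placeholder bucket name

def store_and_fetch():
    s3 = boto3.client("s3")
    # Upload an object: the provider, not the client, decides how and
    # where the underlying bits are stored and replicated.
    s3.put_object(Bucket=BUCKET, Key="hosts/web01/config.tar.gz",
                  Body=b"example payload\n")
    # Retrieve the object again over the same API.
    response = s3.get_object(Bucket=BUCKET, Key="hosts/web01/config.tar.gz")
    data = response["Body"].read()
    print(f"retrieved {len(data)} bytes")

if __name__ == "__main__":
    store_and_fetch()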
Customers of cloud storage solution providers reap a wealth of benefits, including: their
infrastructure is simplified through the elimination of storage components; storage units are
almost immediately made available as needed and can grow or shrink according to immediate or
predicted usage patterns; applications and entire OS images can easily be deployed, imported or
exported as virtual appliances.
Of course these benefits carry a cost. As usual, any time we add layers of abstraction we also run
the risk of increasing, possibly exponentially, the number of ways in which a system can fail.
Cloud storage is no exception: by relying on abstracted storage containers from a third-party
provider, we remove the ability to troubleshoot a system end-to-end; by outsourcing data storage,
we invite a number of security concerns regarding data safety and privacy; by accessing files
over the Internet, we may increase latency and decrease throughput; the cloud service provider
may become a single point of failure for our systems, one that is entirely outside our control.

4.2.5 Storage Model Considerations


As we have seen in the previous sections, the larger we grow our storage requirements, the
more complex the architecture grows. It is important to keep this in mind: even though added
layers of abstraction and indirection help us scale our infrastructure, the added complexity has
potentially exponential costs. The more moving parts a system has, the more likely it is to break,
and the more spectacular its failure will be.
A single bad hard drive is easy to replace; rebuilding the storage array underneath hundreds of
clients much less so. The more clients we have, the more important it is to build our storage
solution for redundancy as well as reliability and resilience.
System administrators need to understand all of the storage models we discussed, as they are
intertwined: DAS must eventually underlie any storage solution, since the bits do have to be
stored somewhere after all; the concepts of NAS permeate any infrastructure spanning more than
just a few workstations; and SANs and cloud storage combine DAS and NAS in different ways to
make storage available over complex networks.
At each layer, we introduce security risks of which we need to be aware: any time bits are
transferred over a network, we need to consider the integrity and privacy of the files: who has
access, who should have access, how is the access granted, how are clients authenticated, and
so on. NAS and SAN solutions tend to ignore many of these implications and work under the
assumption that the network over which the devices are accessed is “secure”; access controls
are implemented on a higher layer, such as in the file system implementation. Oftentimes,
access to the network in question implies access to the shared storage, even though
network-layer security mechanisms such as IPsec may be combined with or integrated into the
solution. Cloud storage, on the other hand, has to directly address the problem of transmitting
data and providing access over untrusted networks and thus usually relies on application layer
protocols such as Transport Layer Security (TLS)/Secure Sockets Layer (SSL).

Figure 4.6: An open PATA (or IDE) hard drive (left) and a Solid State Drive (right). The HDD
shows the rotating disk platters, the read-write head with its motor, the disk controller and the
recognizable connector socket.
We will touch on some of these aspects in future sections and chapters, but you should keep
them in mind as you evaluate different solutions for different use cases. As we will see, in most
cases the simpler model turns out to be the more scalable and more secure one as well, so
beware adding unneeded layers of complexity!

4.3 Disk Devices and Interfaces


The different storage models we discussed in the previous sections are just a means to access
the storage devices in order to, well, store our data. Devices or media used to store data include
tape drives (for use with magnetic tape), optical media such as CDs, various non-volatile memory
based devices such as flash drives, DRAM-based storage, and of course the hard-disk drive
(HDD). Even though Solid State Drives (SSDs) offer significant advantages such as lower power
consumption and generally higher performance, the dominant medium in use, especially in
enterprise scale storage solutions, remains the ubiquitous hard drive, storing data on rotating,
magnetic platters (see Figure 4.6a): despite declining prices for SSDs, as of 2012 traditional hard
drives remain notably cheaper and have higher storage capacity. Understanding the physical
structure of these traditional storage devices is important for a system administrator, as especially
the principles of the addressing modes and partition schemas used here come into play when we
look at how file systems manage data efficiently.
Hard drives can be made available to a server in a variety of ways. Individual disks are
connected directly to a Host Bus Adapter (HBA) using a single data/control cable and a separate
power cable. The traditional interfaces here are SCSI, PATA and SATA, as well as Fibre
Channel.
SCSI, the Small Computer System Interface, has been around for over 25 years and has seen a
large number of confusing implementations and standards (variations include Fast SCSI, Fast
Wide SCSI, Ultra SCSI, and Wide Ultra SCSI, none of which are to be confused with iSCSI).
Once the default method to connect
any peripheral device using long, wide and generally unwieldy ribbon cables, SCSI has now been
largely obsoleted by the Advanced Technology Attachment (ATA) standards. At the same time,
however, it lives on in the Internet Small Computer System Interface (iSCSI), a standard
specifying storage connections using the SCSI command protocol over IP-based networks. iSCSI
is a common choice in storage area networks; as it uses TCP/IP over existing networks, it does
not require a dedicated storage network as is the case in traditional Fibre Channel SANs.
The Parallel Advanced Technology Attachment (PATA) standard, also frequently referred to as
IDE (for Integrated Device Electronics, a reference to the fact that the drive controller is included
in the hard drive), uses a 40 or 80 wire ribbon cable and allows for a maximum of two
devices on the connector (traditionally referred to as the master and slave; this is often a source
of confusion, as neither device takes control or precedence over the other).
Faster throughput, smaller cables and support for hot-swapping, i.e. the ability to replace a drive
without having to shut down the operating system (a feature that was already standard in many
SCSI implementations), were some of the advantages provided by the Serial ATA (SATA)
interface. A number of revisions and updates to the standard added more advanced features
and, most significantly, increasingly greater transfer speeds.
Most motherboards have integrated ATA host adapters, but a server can be extended with
additional HBAs via, for example, its PCI Express expansion slots; similarly, dedicated storage
appliances make use of disk array controllers to combine multiple drives into logical units (more
on that in Section 4.4). Fibre Channel HBAs finally allow a server to connect to a dedicated Fibre
Channel SAN. All of these interfaces can be either internal (the devices connected to the bus are
housed within the same physical enclosure as the server) or external (the devices are entirely
separate from the server, racked and powered independently and connected with suitable cables).
In the end, consider a host with a large amount of DAS and a NAS server managing multiple
terabytes of file space which is housed in a separate device and which it accesses over a SAN:
the main difference lies not in the technologies and protocols used, but in how they are
combined.
