Sun's Network File System (NFS) : Security
[Figure: Clients 0 through 3 connected over a network to a server]
Operating Systems [Version 0.92], www.ostep.org
49.2 On To NFS
One of the earliest, and quite successful, distributed systems was
developed by Sun Microsystems, and is known as the Sun Network File
System (or NFS) [S86]. In defining NFS, Sun took an unusual approach:
instead of building a proprietary and closed system, Sun developed
an open protocol which simply specified the exact message formats that
clients and servers would use to communicate. Different groups could
develop their own NFS servers and thus compete in an NFS marketplace
while preserving interoperability. It worked: today there are many com-
panies that sell NFS servers (including Oracle/Sun, NetApp [HLM94],
EMC, IBM, and others), and the widespread success of NFS is likely at-
tributed to this “open market” approach.
© 2014, Arpaci-Dusseau, Three Easy Pieces
char buffer[MAX];
int fd = open("foo", O_RDONLY); // get descriptor "fd"
read(fd, buffer, MAX); // read MAX bytes from foo (via fd)
read(fd, buffer, MAX); // read MAX bytes from foo
...
read(fd, buffer, MAX); // read MAX bytes from foo
close(fd); // close file
Now imagine that the client-side file system opens the file by sending
a protocol message to the server saying “open the file ’foo’ and give me
back a descriptor”. The file server then opens the file locally on its side
and sends the descriptor back to the client. On subsequent reads, the
client application uses that descriptor to call the read() system call; the
client-side file system then passes the descriptor in a message to the file
server, saying “read some bytes from the file that is referred to by the
descriptor I am passing you here”.
In this example, the file descriptor is a piece of shared state between
the client and the server (Ousterhout calls this distributed state [O91]).
Shared state, as we hinted above, complicates crash recovery. Imagine
the server crashes after the first read completes, but before the client
has issued the second one. After the server is up and running again,
the client then issues the second read. Unfortunately, the server has no
idea to which file fd is referring; that information was ephemeral (i.e.,
in memory) and thus lost when the server crashed. To handle this situa-
tion, the client and server would have to engage in some kind of recovery
protocol, where the client would make sure to keep enough information
around in its memory to be able to tell the server what it needs to know
(in this case, that file descriptor fd refers to file foo).
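To make the failure concrete, here is a minimal sketch of a stateful server, under some loud assumptions: the table layout and the names srv_open, srv_resolve, and srv_crash_and_reboot are hypothetical and exist only for illustration; they are not part of any real file server. The point is simply that the descriptor-to-file mapping lives in memory, so a reboot leaves the client holding a descriptor the server can no longer interpret.

```c
#include <string.h>

#define MAX_OPEN 8

/* Hypothetical in-memory table kept by a stateful server:
 * server-side descriptor -> file name. It lives only in RAM. */
struct stateful_server {
    char names[MAX_OPEN][32];
    int  in_use[MAX_OPEN];
};

/* "open" message: remember the name, hand back a descriptor. */
int srv_open(struct stateful_server *s, const char *name) {
    for (int fd = 0; fd < MAX_OPEN; fd++) {
        if (!s->in_use[fd]) {
            strncpy(s->names[fd], name, sizeof s->names[fd] - 1);
            s->names[fd][sizeof s->names[fd] - 1] = '\0';
            s->in_use[fd] = 1;
            return fd;
        }
    }
    return -1;   /* table full */
}

/* "read" message: the request carries only the descriptor.
 * Returns the file name it resolves to, or NULL if unknown. */
const char *srv_resolve(const struct stateful_server *s, int fd) {
    if (fd < 0 || fd >= MAX_OPEN || !s->in_use[fd]) return NULL;
    return s->names[fd];
}

/* A crash followed by a reboot wipes all in-memory state. */
void srv_crash_and_reboot(struct stateful_server *s) {
    memset(s, 0, sizeof *s);
}
```

If a client opens "foo", the server reboots, and the client then sends a read with its old descriptor, srv_resolve returns NULL: the server has no idea which file is meant.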
It gets even worse when you consider the fact that a stateful server has
to deal with client crashes. Imagine, for example, a client that opens a file
and then crashes. The open() uses up a file descriptor on the server; how
can the server know it is OK to close a given file? In normal operation, a
client would eventually call close() and thus inform the server that the
file should be closed. However, when a client crashes, the server never
receives a close(), and thus has to notice the client has crashed in order
to close the file.
For these reasons, the designers of NFS decided to pursue a stateless
approach: each client operation contains all the information needed to
complete the request. No fancy crash recovery is needed; the server just
starts running again, and a client, at worst, might have to retry a request.
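A rough sketch of what "all the information needed" might look like for a read follows. The struct read_req, the disk array, and serve_read are assumptions made for this example only; a real NFS request carries a file handle rather than a small integer id. What matters is that the server consults nothing but the request and its persistent storage.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical self-contained read request: everything the server
 * needs travels in the message itself. */
struct read_req {
    unsigned file_id;   /* which file (stand-in for a file handle) */
    size_t   offset;    /* where in the file to start */
    size_t   count;     /* how many bytes to read */
};

/* Stand-in for persistent storage: one small "file" per id. */
const char *disk[] = { "hello, world", "second file" };

/* Serve a read using only the request and the disk contents.
 * No per-client table is consulted, so a reboot loses nothing.
 * Returns the number of bytes copied into out, or -1 on error. */
long serve_read(const struct read_req *r, char *out) {
    if (r->file_id >= sizeof disk / sizeof disk[0]) return -1;
    size_t len = strlen(disk[r->file_id]);
    if (r->offset >= len) return 0;          /* read at or past EOF */
    size_t n = len - r->offset;
    if (n > r->count) n = r->count;
    memcpy(out, disk[r->file_id] + r->offset, n);
    return (long)n;
}
```

Because serve_read depends only on its argument and on the disk, sending the same request before and after a server restart yields the same bytes; a retry is all the "recovery" the client needs.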
NFSPROC_GETATTR
  expects: file handle
  returns: attributes
NFSPROC_SETATTR
  expects: file handle, attributes
  returns: nothing
NFSPROC_LOOKUP
  expects: directory file handle, name of file/directory to look up
  returns: file handle
NFSPROC_READ
  expects: file handle, offset, count
  returns: data, attributes
NFSPROC_WRITE
  expects: file handle, offset, count, data
  returns: attributes
NFSPROC_CREATE
  expects: directory file handle, name of file, attributes
  returns: nothing
NFSPROC_REMOVE
  expects: directory file handle, name of file to be removed
  returns: nothing
NFSPROC_MKDIR
  expects: directory file handle, name of directory, attributes
  returns: file handle
NFSPROC_RMDIR
  expects: directory file handle, name of directory to be removed
  returns: nothing
NFSPROC_READDIR
  expects: directory handle, count of bytes to read, cookie
  returns: directory entries, cookie (to get more entries)
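Since LOOKUP takes a directory handle and a single name, a client that wants the handle for a longer path has to walk it one component at a time. The sketch below illustrates that walk over a toy namespace; the tree contents, the integer handles, and the functions lookup and resolve are all hypothetical stand-ins, not the real wire format.

```c
#include <string.h>

/* Hypothetical toy namespace: each entry maps (parent handle, name)
 * to a child handle. Handle 0 stands for the root directory. */
struct entry { int parent; const char *name; int handle; };

const struct entry tree[] = {
    { 0, "home",    1 },
    { 1, "user",    2 },
    { 2, "foo.txt", 3 },
};

/* Stand-in for NFSPROC_LOOKUP: directory handle + name -> handle,
 * or -1 if the name does not exist in that directory. */
int lookup(int dir_handle, const char *name) {
    for (size_t i = 0; i < sizeof tree / sizeof tree[0]; i++)
        if (tree[i].parent == dir_handle &&
            strcmp(tree[i].name, name) == 0)
            return tree[i].handle;
    return -1;
}

/* Resolve an absolute path one component at a time, issuing one
 * LOOKUP per component. *lookups reports how many were sent. */
int resolve(const char *path, int *lookups) {
    char copy[128];
    strncpy(copy, path, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';

    int handle = 0;                      /* start at the root */
    *lookups = 0;
    for (char *part = strtok(copy, "/"); part != NULL;
         part = strtok(NULL, "/")) {
        handle = lookup(handle, part);
        (*lookups)++;
        if (handle < 0) return -1;       /* component not found */
    }
    return handle;
}
```

In this toy setup, resolving "/home/user/foo.txt" costs three LOOKUP messages, one per path component.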
of the file along with the offset within the file and number of bytes to read.
The server then will be able to issue the read (after all, the handle tells the
server which volume and which inode to read from, and the offset and
count tell it which bytes of the file to read) and return the data to the
client (or an error if there was a failure). WRITE is handled similarly,
except the data is passed from the client to the server, and just a success
code is returned.
One last interesting protocol message is the GETATTR request; given a
file handle, it simply fetches the attributes for that file, including the last
modified time of the file. We will see why this protocol request is impor-
tant in NFSv2 below when we discuss caching (can you guess why?).
Client                                     Server

fd = open("/foo", ...);
  Send LOOKUP (rootdir FH, "foo")
                                           Receive LOOKUP request
                                           look for "foo" in root dir
                                           return foo's FH + attributes
  Receive LOOKUP reply
  allocate file desc in open file table
  store foo's FH in table
  store current file position (0)
  return file descriptor to application

close(fd);
  Just need to clean up local structures
  Free descriptor "fd" in open file table
  (No need to talk to server)
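The client-side bookkeeping in the timeline above can be sketched as follows. The table layout and the names client_open, client_read, and client_close are assumptions for illustration. The sketch also assumes, as the READ protocol message suggests, that each read() is turned into a message carrying an explicit offset taken from the locally tracked file position, and that the server returns all the bytes requested.

```c
#include <stddef.h>

#define MAX_FDS 16

/* Hypothetical client-side open file table entry. */
struct open_file {
    int    in_use;
    int    handle;     /* file handle returned by LOOKUP */
    size_t position;   /* current file position, tracked by the client */
};

struct open_file table[MAX_FDS];

/* The READ message that would go on the wire. */
struct read_msg { int handle; size_t offset; size_t count; };

/* open(): after LOOKUP returns a handle, record it locally. */
int client_open(int handle_from_lookup) {
    for (int fd = 0; fd < MAX_FDS; fd++) {
        if (!table[fd].in_use) {
            table[fd] = (struct open_file){ 1, handle_from_lookup, 0 };
            return fd;
        }
    }
    return -1;   /* no free descriptor */
}

/* read(): build an explicit READ message from local state, then
 * advance the position. A handle of -1 signals a bad descriptor. */
struct read_msg client_read(int fd, size_t count) {
    struct read_msg m = { -1, 0, 0 };
    if (fd < 0 || fd >= MAX_FDS || !table[fd].in_use) return m;
    m.handle = table[fd].handle;
    m.offset = table[fd].position;
    m.count  = count;
    table[fd].position += count;   /* assumes a full read came back */
    return m;
}

/* close(): purely local; nothing is sent to the server. */
void client_close(int fd) {
    if (fd >= 0 && fd < MAX_FDS) table[fd].in_use = 0;
}
```

Two back-to-back calls to client_read(fd, 100) produce messages with offsets 0 and 100; the server never has to remember where the client left off.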
TIP: IDEMPOTENCY IS POWERFUL
Idempotency is a useful property when building reliable systems. When
an operation can be issued more than once, it is much easier to handle
failure of the operation; you can just retry it. If an operation is not idem-
potent, life becomes more difficult.
In this way, the client can handle all timeouts in a unified way. If a
WRITE request was simply lost (Case 1 above), the client will retry it, the
server will perform the write, and all will be well. The same will happen
if the server happened to be down while the request was sent, but back
up and running when the second request is sent, and again all works
as desired (Case 2). Finally, the server may in fact receive the WRITE
request, issue the write to its disk, and send a reply. This reply may get
lost (Case 3), again causing the client to re-send the request. When the
server receives the request again, it will simply do the exact same thing:
write the data to disk and reply that it has done so. If the client this time
receives the reply, all is again well, and thus the client has handled both
message loss and server failure in a uniform manner. Neat!
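The lost-reply case (Case 3) can be simulated in a few lines. Everything here is a stand-in invented for the example: disk plays the server's storage, replies_to_drop plays an unreliable network, and write_with_retry plays the client's timeout-and-resend logic. Because the WRITE names its exact offset, applying it several times leaves the same bytes in the same place.

```c
#include <stddef.h>
#include <string.h>

char disk[16];          /* stand-in for the server's disk */
int  writes_applied;    /* how many times the server ran WRITE */
int  replies_to_drop;   /* simulated network: drop this many replies */

/* Server side: WRITE names the exact offset, so running it twice
 * leaves the same bytes in the same place. */
void server_write(size_t off, const char *data, size_t n) {
    if (off > sizeof disk || n > sizeof disk - off) return;
    memcpy(disk + off, data, n);
    writes_applied++;
}

/* One request/reply exchange. Returns 1 if a reply arrived,
 * 0 if the client's timer would have gone off instead. */
int send_write(size_t off, const char *data, size_t n) {
    server_write(off, data, n);              /* request got through */
    if (replies_to_drop > 0) { replies_to_drop--; return 0; }
    return 1;
}

/* Client side: on timeout, simply send the same request again.
 * Returns the number of attempts used, or -1 if all timed out. */
int write_with_retry(size_t off, const char *data, size_t n,
                     int max_tries) {
    for (int attempt = 1; attempt <= max_tries; attempt++)
        if (send_write(off, data, n)) return attempt;
    return -1;
}
```

With replies_to_drop set to 2, the server performs the write three times, yet the disk ends up exactly as if it had been written once.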
A small aside: some operations are hard to make idempotent. For
example, when you try to make a directory that already exists, you are
informed that the mkdir request has failed. Thus, in NFS, if the file server
receives a MKDIR protocol message and executes it successfully but the
reply is lost, the client may repeat it and encounter that failure when in
fact the operation at first succeeded and then only failed on the retry.
Thus, life is not perfect.
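A small simulation shows the MKDIR problem. The in-memory dirs array and the MK_* result codes are hypothetical; a real server would consult its file system and return a protocol-defined error. The client function models exactly the scenario described: the first request is executed, its reply is lost, and the retry is what the client finally hears about.

```c
#include <string.h>

#define MAX_DIRS 8
enum { MK_OK = 0, MK_EXISTS = 1, MK_FULL = 2 };

char dirs[MAX_DIRS][32];   /* hypothetical server-side namespace */
int  ndirs;

/* Server-side MKDIR: fails if the name is already present. */
int server_mkdir(const char *name) {
    for (int i = 0; i < ndirs; i++)
        if (strcmp(dirs[i], name) == 0) return MK_EXISTS;
    if (ndirs == MAX_DIRS) return MK_FULL;
    strncpy(dirs[ndirs], name, sizeof dirs[0] - 1);
    dirs[ndirs][sizeof dirs[0] - 1] = '\0';
    ndirs++;
    return MK_OK;
}

/* Client side: the first reply is lost, so the client retries and
 * reports whatever the second reply says. */
int client_mkdir_first_reply_lost(const char *name) {
    (void)server_mkdir(name);     /* executed, but reply never arrives */
    return server_mkdir(name);    /* retry: this reply is delivered */
}
```

The call returns MK_EXISTS even though the directory was created by this very client: the operation succeeded, but the application is told it failed.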
C1                C2                C3
cache: F[v1]      cache: F[v2]      cache: empty

                  Server S
                  disk: F[v1] at first
                        F[v2] eventually

Figure 49.7: The Cache Consistency Problem
(version 2), or F[v2] and the old version F[v1] so we can keep the two
distinct (but of course the file has the same name, just different contents).
Finally, there is a third client, C3, which has not yet accessed the file F.
You can probably see the problem that is upcoming (Figure 49.7). In
fact, there are two subproblems. The first subproblem is that the client C2
may buffer its writes in its cache for a time before propagating them to the
server; in this case, while F[v2] sits in C2’s memory, any access of F from
another client (say C3) will fetch the old version of the file (F[v1]). Thus,
by buffering writes at the client, other clients may get stale versions of the
file, which may be undesirable; indeed, imagine the case where you log
into machine C2, update F, and then log into C3 and try to read the file,
only to get the old copy! Certainly this could be frustrating. Thus, let us
call this aspect of the cache consistency problem update visibility; when
do updates from one client become visible at other clients?
The second subproblem of cache consistency is a stale cache; in this
case, C2 has finally flushed its writes to the file server, and thus the server
has the latest version (F[v2]). However, C1 still has F[v1] in its cache; if a
program running on C1 reads file F, it will get a stale version (F[v1]) and
not the most recent copy (F[v2]), which is (often) undesirable.
NFSv2 implementations solve these cache consistency problems in two
ways. First, to address update visibility, clients implement what is some-
times called flush-on-close (a.k.a., close-to-open) consistency semantics;
specifically, when a file is written to and subsequently closed by a client
application, the client flushes all updates (i.e., dirty pages in the cache)
to the server. With flush-on-close consistency, NFS ensures that a subse-
quent open from another node will see the latest file version.
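Flush-on-close can be sketched with a single file and a single dirty flag. The struct client_cache, the server_copy buffer, and the function names are assumptions made for this illustration; a real client tracks dirty pages, not whole-file strings.

```c
#include <string.h>

char server_copy[32] = "v1";   /* stand-in for file F on the server */

/* Hypothetical client-side cache for one open file. */
struct client_cache {
    char data[32];
    int  dirty;    /* set when data holds updates the server lacks */
};

/* write(): update only the local cache and mark it dirty. */
void client_write(struct client_cache *c, const char *contents) {
    strncpy(c->data, contents, sizeof c->data - 1);
    c->data[sizeof c->data - 1] = '\0';
    c->dirty = 1;
}

/* close(): flush dirty data to the server before returning. */
void client_close(struct client_cache *c) {
    if (c->dirty) {
        strcpy(server_copy, c->data);   /* same size, NUL-terminated */
        c->dirty = 0;
    }
}

/* open() on another client with an empty cache: fetch from server. */
const char *other_client_open(void) {
    return server_copy;
}
```

Between client_write and client_close, another client opening the file still sees "v1"; once client_close returns, a subsequent open sees "v2".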
Second, to address the stale-cache problem, NFSv2 clients first check
to see whether a file has changed before using its cached contents. Specifi-
cally, when opening a file, the client-side file system will issue a GETATTR
request to the server to fetch the file’s attributes. The attributes, impor-
tantly, include information as to when the file was last modified on the
server; if the time-of-modification is more recent than the time that the
file was fetched into the client cache, the client invalidates the file, thus
removing it from the client cache and ensuring that subsequent reads will
go to the server and retrieve the latest version of the file. If, on the other
hand, the client sees that it has the latest version of the file, it will go
ahead and use the cached contents, thus increasing performance.
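The check performed at open time boils down to one comparison. In this sketch, struct cached_file and validate_on_open are hypothetical names, and server_mtime stands for the last-modified time carried in a GETATTR reply.

```c
#include <time.h>

/* Hypothetical cached file state on the client. */
struct cached_file {
    int    valid;        /* 1 if cached contents may be used */
    time_t fetched_at;   /* when the contents were fetched */
};

/* Called at open() with the mtime a GETATTR reply reported.
 * Invalidates the cache if the server copy is newer.
 * Returns 1 if the cached contents can be used, 0 otherwise. */
int validate_on_open(struct cached_file *f, time_t server_mtime) {
    if (f->valid && server_mtime > f->fetched_at)
        f->valid = 0;    /* stale: force a re-fetch from the server */
    return f->valid;
}
```

A file fetched at time 100 stays usable if the server reports a modification time of 90, and is invalidated if the server reports 150.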
When the original team at Sun implemented this solution to the stale-
cache problem, they realized a new problem; suddenly, the NFS server
was flooded with GETATTR requests. A good engineering principle to
follow is to design for the common case, and to make it work well; here,
although the common case was that a file was accessed only from a sin-
gle client (perhaps repeatedly), the client always had to send GETATTR
requests to the server to make sure no one else had changed the file. A
client thus bombards the server, constantly asking “has anyone changed
this file?”, when most of the time no one had.
To remedy this situation (somewhat), an attribute cache was added
to each client. A client would still validate a file before accessing it, but
most often would just look in the attribute cache to fetch the attributes.
The attributes for a particular file were placed in the cache when the file
was first accessed, and then would time out after a certain amount of time
(say 3 seconds). Thus, during those three seconds, all file accesses would
determine that it was OK to use the cached file and thus do so with no
network communication with the server.
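An attribute cache with a timeout might look roughly like this. The names attr_cache, send_getattr, and get_mtime are made up for the example, the 3-second constant mirrors the figure used in the text, and the caller passes in both the current time and the server's true mtime so the behavior can be observed without a real network.

```c
#include <time.h>

#define ATTR_TIMEOUT 3   /* seconds, matching the example in the text */

struct attr_cache {
    int    filled;       /* 0 until the first GETATTR reply arrives */
    time_t mtime;        /* cached last-modified time */
    time_t cached_at;    /* when the attributes were cached */
};

int getattr_calls;       /* counts simulated GETATTR round trips */

/* Stand-in for a GETATTR request; server_mtime plays the server. */
time_t send_getattr(time_t server_mtime) {
    getattr_calls++;
    return server_mtime;
}

/* Return the file's mtime, going to the server only when the cached
 * attributes are missing or at least ATTR_TIMEOUT seconds old. */
time_t get_mtime(struct attr_cache *c, time_t now, time_t server_mtime) {
    if (!c->filled || now - c->cached_at >= ATTR_TIMEOUT) {
        c->mtime = send_getattr(server_mtime);
        c->cached_at = now;
        c->filled = 1;
    }
    return c->mtime;
}
```

Note the trade-off this exposes: if the file changes on the server one second after the attributes were cached, get_mtime keeps returning the old value until the timeout expires, so the client may briefly use stale file contents.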
servers will keep it in memory, and subsequent reads of said data (and
metadata) will not go to disk, a potential (small) boost in performance.
More intriguing is the case of write buffering. NFS servers absolutely
may not return success on a WRITE protocol request until the write has
been forced to stable storage (e.g., to disk or some other persistent device).
While they can place a copy of the data in server memory, returning
success to the client on a WRITE protocol request before the data
reaches stable storage could result in incorrect behavior; can you
figure out why?
The answer lies in our assumptions about how clients handle server
failure. Imagine the following sequence of writes as issued by a client:
write(fd, a_buffer, size); // fill first block with a’s
write(fd, b_buffer, size); // fill second block with b’s
write(fd, c_buffer, size); // fill third block with c’s
These writes overwrite the three blocks of a file with a block of a’s,
then b’s, and then c’s. Thus, if the file initially looked like this:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
We might expect the final result after these writes to look like this, with
the x’s, y’s, and z’s overwritten with a’s, b’s, and c’s, respectively:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
Now let’s assume for the sake of the example that these three client
writes were issued to the server as three distinct WRITE protocol mes-
sages. Assume the first WRITE message is received by the server and
issued to the disk, and the client informed of its success. Now assume
the second write is just buffered in memory, and the server also reports
its success to the client before forcing it to disk; unfortunately, the server
crashes before writing it to disk. The server quickly restarts and receives
the third write request, which also succeeds.
Thus, to the client, all the requests succeeded, but we are surprised
that the file contents look like this:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy <--- oops
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
Yikes! Because the server told the client that the second write was
successful before committing it to disk, an old chunk is left in the file,
which, depending on the application, might be catastrophic.
To avoid this problem, NFS servers must commit each write to stable
(persistent) storage before informing the client of success; doing so en-
ables the client to detect server failure during a write, and thus retry until
it finally succeeds. Doing so ensures we will never end up with file con-
tents intermingled as in the above example.
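The three-write scenario can be replayed in a small simulation. The struct server, with one character per block, and the force flag are simplifications invented for this sketch; the flag selects between the correct behavior (commit before replying) and the broken one (buffer in memory and reply anyway).

```c
#include <string.h>

#define NBLOCKS 3

struct server {
    char disk[NBLOCKS];      /* survives a crash */
    char buffered[NBLOCKS];  /* in memory only; 0 means nothing pending */
};

/* Handle a WRITE of one block. If force is nonzero the block goes to
 * disk before the reply; otherwise it is only buffered. Either way
 * the function returns 1, i.e., the server reports success. */
int server_write(struct server *s, int block, char fill, int force) {
    if (block < 0 || block >= NBLOCKS) return 0;
    if (force) s->disk[block] = fill;
    else       s->buffered[block] = fill;
    return 1;
}

/* Crash and reboot: everything held only in memory is gone. */
void server_crash(struct server *s) {
    memset(s->buffered, 0, sizeof s->buffered);
}
```

Starting from blocks x, y, z, writing a with force, b without, crashing, and then writing c with force leaves the disk holding a, y, c, even though the client saw three successes. On a POSIX system, the forcing step would typically involve something like fsync() on the file before the reply is sent; that detail is outside this sketch.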
This requirement creates a problem for NFS server implementations:
without great care, write performance can become the major performance
bottleneck. Indeed, some companies (e.g., Network Appliance) came into
existence with the simple objective of building an NFS server that can
perform writes quickly. One trick they use is to first put writes in
battery-backed memory, which lets the server reply quickly to WRITE
requests without fear of losing the data and without the cost of having
to write to disk right away. A second trick is to use a file system
designed specifically to write to disk quickly when one finally needs
to do so [HLM94, RO91].
49.12 Summary
We have seen the introduction of the NFS distributed file system. NFS
is centered around the idea of simple and fast recovery in the face of
server failure, and achieves this end through careful protocol design. Idem-
potency of operations is essential; because a client can safely replay a
failed operation, it is OK to do so whether or not the server has executed
the request.
We also have seen how the introduction of caching into a multiple-
client, single-server system can complicate things. In particular, the sys-
tem must resolve the cache consistency problem in order to behave rea-
sonably; however, NFS does so in a slightly ad hoc fashion which can
occasionally result in observably weird behavior. Finally, we saw how
server caching can be tricky: writes to the server must be forced to stable
storage before returning success (otherwise data can be lost).
We haven’t talked about other issues which are certainly relevant, no-
tably security. Security in early NFS implementations was remarkably
lax; it was rather easy for any user on a client to masquerade as other
users and thus gain access to virtually any file. Subsequent integration
with more serious authentication services (e.g., Kerberos [NT94]) has
addressed these obvious deficiencies.
References
[C00] “NFS Illustrated”
Brent Callaghan
Addison-Wesley Professional Computing Series, 2000
A great NFS reference; incredibly thorough and detailed per the protocol itself.
[S86] “The Sun Network File System: Design, Implementation and Experience”
Russel Sandberg
USENIX Summer 1986
The original NFS paper; though a bit of a challenging read, it is worthwhile to see the source of these
wonderful ideas.