
Distributed File Systems (II)

Outline
Last topics:

Introduction

Design of distributed file systems

Implementation of distributed file systems

Now:

Case studies: NFS, AFS

Sun's Network File System (NFS)

NFS is a popular and widely used network file system.
NFS was originally designed and implemented by Sun Microsystems for use on its UNIX-based workstations.
Other manufacturers now support it as well, for both UNIX and other operating systems (including Linux, MS-DOS, etc.).
NFS supports heterogeneous systems, for example, MS-DOS clients making use of UNIX servers.
It is not even required that all the machines use the same hardware.

Sun's Network File System (NFS)

Three aspects of NFS are of interest:

architecture
protocol
implementation

NFS Architecture

The basic idea behind NFS is to allow an arbitrary collection of clients and servers to share a common file system.
In most cases, all the clients and servers are on the same LAN.
NFS allows every machine to be both a client and a server at the same time.

Server side:
Each NFS server exports one or more of its directories for access by remote clients. When a directory is made available, so are all of its sub-directories, so the entire directory tree is exported as a unit.
The list of directories a server exports is maintained in the /etc/exports file, so these directories can be exported automatically whenever the server is booted (see the sketch below).
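
A rough sketch of how a server might build its export list at boot time, assuming a simplified /etc/exports format (one directory per line, '#' for comments); real entries also carry per-client options, so this is illustrative only:

    def read_exports(path="/etc/exports"):
        # Parse a simplified exports file: one exported directory per line,
        # '#' starts a comment; real /etc/exports lines also list client options.
        exports = []
        with open(path) as f:
            for line in f:
                line = line.split("#", 1)[0].strip()
                if line:
                    exports.append(line.split()[0])   # keep just the directory name
        return exports

    # At boot, the server would read this list and make each directory
    # (and, implicitly, its entire subtree) available to remote clients.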

NFS Architecture (cont.)

Client (workstation) side:
Clients access exported directories by mounting them. When a client mounts a directory, it becomes part of its directory hierarchy.
A diskless workstation can mount a remote file system on its root directory, resulting in a file system that is supported entirely on a remote server.
Workstations that have a local disk can mount remote directories anywhere they wish. There is no difference between a remote file and a local file.
If two or more clients mount the same directory at the same time, they can communicate by sharing files in their common directories.

NFS Protocols

A protocol is a set of requests sent by clients to servers, along with the corresponding replies sent by the servers back to the clients.
As long as a server recognizes and can handle all the requests in the protocols, it need not know anything at all about its clients.
Clients can treat servers as black boxes that accept and process a specific set of requests; how they do it is their own business.

NFS defines two protocols:
the protocol for mounting volumes
the protocol for directory and file access

NFS Protocols: Mounting

Mounting protocol:
A client can send a path name to a server and request permission to mount that directory somewhere in its directory hierarchy.
The place where it is to be mounted is not contained in the message, as the server does not care where it is to be mounted.
If the path name is legal and the directory specified has been exported, the server returns a file handle to the client.
The file handle contains fields uniquely identifying the file system type, the disk, the i-node number of the directory, and security information.
Subsequent calls to read and write files in the mounted directory use the file handle (a sketch of the exchange follows).
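
A minimal sketch of the exchange, with an invented field layout (the real NFS file handle is an opaque blob that only the server interprets):

    from collections import namedtuple

    # Illustrative handle: file system type, disk, i-node of the directory, security info.
    FileHandle = namedtuple("FileHandle", "fs_type disk inode security")

    EXPORTED = {"/usr/share": FileHandle("ufs", 0, 2, 0x2a)}   # directories from /etc/exports

    def mount_request(pathname):
        # Server side: the message names only the path; where the client will
        # mount it is not included, because the server does not care.
        if pathname not in EXPORTED:
            raise PermissionError(pathname + " is not exported")
        return EXPORTED[pathname]

    # Client side: handle = mount_request("/usr/share"); subsequent read/write
    # requests for files under the mount point carry this handle.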

Mount Protocol

NFS uses the mount protocol to access remote files.
The mount protocol establishes a local name for remote files.
Users access remote files using local names; the OS takes care of the mapping.

Automounting

Sun's version of UNIX also supports automounting.
This feature allows a set of remote directories to be associated with a local directory.
None of these remote directories are mounted (nor are their servers even contacted) when the client is booted.
Instead, the first time a remote file is opened, the operating system sends a message to each of the servers. The first one to reply wins, and its directory is mounted (see the sketch below).
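
The "first one to reply wins" behaviour can be sketched as below; probe() is a stand-in for the real mount request and simply simulates a variable response time:

    import random, time
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def probe(server, pathname):
        # Stand-in for contacting the server's mount daemon.
        time.sleep(random.uniform(0.01, 0.1))      # simulated network/server delay
        return server, "handle-for-" + pathname

    def automount(servers, pathname):
        # Ask every candidate server in parallel; mount from the first that answers.
        with ThreadPoolExecutor(max_workers=len(servers)) as pool:
            futures = [pool.submit(probe, s, pathname) for s in servers]
            for done in as_completed(futures):
                if done.exception() is None:
                    return done.result()           # presumably the least loaded server
        raise OSError("no server answered for " + pathname)

    # automount(["srv1", "srv2", "srv3"], "/usr/local") returns whichever server replied first.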

NFS Automounting

Automounting has two principal advantages over static mounting:
First, with static mounting via the /etc/rc file, if one of the NFS servers happens to be down, it is impossible to bring the client up -- at least not without some difficulty, delay, and quite a few error messages.
Second, by allowing the client to try a set of servers in parallel, a degree of fault tolerance can be achieved (because only one of them needs to be up), and the performance can be improved (by choosing the first one to reply -- presumably the least heavily loaded).

On the other hand, it is assumed that all the file systems specified as alternatives for the automount are identical.
Since NFS provides no support for file or directory replication, it is up to the user to arrange for all the file systems to be the same.
Thus, automounting is most often used for read-only file systems containing system binaries and other files that rarely change.

NFS Protocols: Directory and File Access

Clients can send messages to servers to manipulate directories and to read and write files. They can also access file attributes, such as file mode, size, and time of last modification. Most UNIX system calls are supported by NFS.

In NFS, each message is self-contained.
The advantage of this scheme is that the server does not have to remember anything about open connections in between calls to it. Thus, if a server crashes and then recovers, no information about open files is lost, because there is none.
A server like this, which does not maintain state information about open files, is said to be a stateless server.

In contrast, in UNIX System V, the Remote File System (RFS) requires a file to be opened before it can be read or written.
The server then makes a table entry keeping track of the fact that the file is open and where the reader currently is, so each request need not carry an offset.
The disadvantage of this scheme is that if a server crashes and then quickly reboots, all open connections are lost, and client programs fail. (A rough contrast is sketched below.)
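
A rough contrast between the two styles, with data structures invented for illustration:

    # Stateless (NFS style): every request is self-contained, carrying the file
    # handle and the offset, so the server keeps no per-client state at all.
    def nfs_read(files, file_handle, offset, count):
        return files[file_handle][offset:offset + count]

    # Stateful (RFS style): the server remembers which files are open and where
    # each reader currently is; a crash wipes this table and clients break.
    open_table = {}                      # open_id -> [file_handle, current_offset]

    def rfs_read(files, open_id, count):
        file_handle, offset = open_table[open_id]
        open_table[open_id][1] = offset + count
        return files[file_handle][offset:offset + count]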

File System Operations (1) and (2)

An incomplete list of file system operations supported by NFS appeared here as two tables, which are not reproduced; an illustrative list follows.
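
As a stand-in for the missing tables, the list below names the procedures typically found in the NFS (v2/v3) protocol. It is illustrative, not a copy of the original tables.

    # Typical NFS procedures; roughly what the missing tables enumerate.
    NFS_OPERATIONS = [
        "lookup",                  # look up a name in a directory, returning a handle
        "getattr", "setattr",      # read / change file attributes
        "read", "write",           # data access, always with an explicit offset
        "create", "remove", "rename",
        "link", "symlink", "readlink",
        "mkdir", "rmdir", "readdir",
        "statfs",                  # file system information
    ]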

NFS Protocols: Directory and File Access

The NFS scheme makes it difficult to achieve the exact UNIX file semantics.
In UNIX, a file can be opened and locked so that other processes cannot access it.
When the file is closed, the locks are released.
In a stateless server such as NFS, locks cannot be associated with open files, because the server does not know which files are open. NFS therefore needs a separate, additional mechanism to handle locking.

NFS Protocols: Directory and File Access

NFS uses the UNIX protection mechanism, with rwx bits for the owner, group, and others.
Originally, each request message simply contained the user and group IDs of the caller, which the NFS server used to validate the access. In effect, it trusted the clients not to cheat (a sketch of this check follows).
Currently, public key cryptography can be used to establish a secure key for validating the client and server on each request and reply.
When this option is enabled, a malicious client cannot impersonate another client, because it does not know that client's secret key.
As an aside, cryptography is used only to authenticate the parties. The data themselves are never encrypted.
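
A minimal sketch of the original, trusting scheme: each request carries the caller's numeric user and group IDs, and the server checks them against the file's rwx bits (field names invented for illustration):

    from collections import namedtuple

    Credential = namedtuple("Credential", "uid gids")       # travels with every request
    FileAttr = namedtuple("FileAttr", "owner group mode")   # UNIX rwx bits

    def may_read(cred, attr):
        # Server-side check against the owner / group / other read bits.
        if cred.uid == attr.owner:
            return bool(attr.mode & 0o400)
        if attr.group in cred.gids:
            return bool(attr.mode & 0o040)
        return bool(attr.mode & 0o004)

    # Nothing here proves that the uid is genuine, which is exactly the gap that
    # the cryptographic (secure RPC) variant closes.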

Network Information Service (NIS)

All the keys used for the authentication, as well as other information, are maintained by the NIS (Network Information Service).
The NIS was formerly known as the yellow pages.
Its function is to store (key, value) pairs: when a key is provided, it returns the corresponding value.
Not only does it handle encryption keys, but it also stores the mapping of user names to (encrypted) passwords, as well as the mapping of machine names to network addresses, and other items.

The network information servers are replicated using a master/slave arrangement.
To read their data, a process can use either the master or any of the copies in the slaves.
However, all changes must be made only to the master, which then propagates them to the slaves.
There is a short interval after an update in which the NIS servers are inconsistent. (A toy sketch follows.)
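
In spirit NIS is a replicated (key, value) store; a toy model of it, with map names and entries invented for illustration:

    import copy

    # The master holds the authoritative maps; slaves serve read traffic.
    master = {
        "passwd.byname": {"daniela": "x:1001:1001:/home/daniela"},
        "hosts.byname":  {"fileserver": "192.0.2.10"},
    }
    slaves = [copy.deepcopy(master), copy.deepcopy(master)]

    def nis_lookup(mapname, key, server=None):
        server = server if server is not None else slaves[0]   # reads can go to any replica
        return server[mapname][key]

    def nis_update(mapname, key, value):
        master[mapname][key] = value      # writes go only to the master; slaves are
                                          # refreshed later, hence a short inconsistency window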

Implementation: NFS Layer Structure

(Layer diagram not reproduced here; the three layers are described on the next slide.)

NFS Implementation

It consists of three layers:

System call layer:
This handles calls like OPEN, READ, and CLOSE.

Virtual file system (VFS) layer:
The task of the VFS layer is to maintain a table with one entry for each open file, analogous to the table of i-nodes for open files in UNIX. The VFS layer has an entry, called a v-node (virtual i-node), for every open file, telling whether the file is local or remote.

NFS client code:
Used to create an r-node (remote i-node) in its internal tables to hold the file handles. The v-node points to the r-node. Each v-node in the VFS layer will ultimately contain either a pointer to an r-node in the NFS client code, or a pointer to an i-node in the local operating system. Thus from the v-node it is possible to see whether a file or directory is local or remote, and, if it is remote, to find its file handle (see the sketch below).
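
A compact sketch of this layering, with stand-in stubs for the actual local and remote read paths (all names are illustrative):

    class RNode:
        """Kept by the NFS client code for a remote open file."""
        def __init__(self, file_handle):
            self.file_handle = file_handle

    class VNode:
        """One per open file in the VFS layer: points to a local i-node or an r-node."""
        def __init__(self, local_inode=None, rnode=None):
            self.local_inode = local_inode
            self.rnode = rnode

    def local_read(inode, offset, count):          # stand-in for the local file system
        return b""

    def nfs_client_read(handle, offset, count):    # stand-in for the RPC to the server
        return b""

    def vfs_read(vnode, offset, count):
        # From the v-node we can tell whether the file is local or remote,
        # and, if remote, recover its file handle.
        if vnode.rnode is not None:
            return nfs_client_read(vnode.rnode.file_handle, offset, count)
        return local_read(vnode.local_inode, offset, count)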

NFS Implementation (cont.)

Client caching is used to improve performance:
Transfers between client and server are done in large chunks, normally 8 Kbytes, even if fewer bytes are requested. This is known as read ahead.
The same holds for writes: if a write system call writes fewer than 8 Kbytes, the data are just accumulated locally. Only when the entire 8K chunk is full is it sent to the server. However, when a file is closed, all of its data are sent to the server immediately (sketched below).
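
A sketch of client-side write buffering under this 8 Kbyte policy (illustrative; the real client does this at the block level inside the kernel):

    CHUNK = 8 * 1024

    class BufferedRemoteFile:
        def __init__(self, send_chunk):
            self.send_chunk = send_chunk      # callable that ships one chunk to the server
            self.buffer = bytearray()

        def write(self, data):
            self.buffer += data
            while len(self.buffer) >= CHUNK:  # only full 8K chunks are sent
                self.send_chunk(bytes(self.buffer[:CHUNK]))
                del self.buffer[:CHUNK]

        def close(self):
            if self.buffer:                   # on close, whatever is left is flushed
                self.send_chunk(bytes(self.buffer))
                self.buffer.clear()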

NFS Implementation (cont.)

Client caching improves performance.
Problem: two clients cache the same file block and one of them modifies it. When the other one reads the block, it gets the old value.

Solutions:

Solution 1:
Associate a timer with each cache block; when the timer expires, the entry is discarded. Normally, the timer is 3 sec. for data blocks and 30 sec. for directory blocks. (A minimal sketch follows.)

Solution 2:
Whenever a cached file is opened, a message is sent to the server to find out when the file was last modified.
If the last modification occurred after the local copy was cached, the cached copy is discarded and a new copy is fetched from the server.
Finally, once every 30 sec. a cache timer expires, and all the dirty blocks in the cache are sent to the server.
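
Solution 1 amounts to time-bounded caching; a minimal sketch using the 3 sec. / 30 sec. lifetimes quoted above:

    import time

    DATA_TTL, DIR_TTL = 3.0, 30.0          # seconds, as on the slide

    cache = {}                             # key -> (value, expiry time)

    def cache_put(key, value, is_directory=False):
        ttl = DIR_TTL if is_directory else DATA_TTL
        cache[key] = (value, time.time() + ttl)

    def cache_get(key):
        value, expires = cache.get(key, (None, 0.0))
        if time.time() >= expires:         # timer expired: discard the entry so the
            cache.pop(key, None)           # next access goes back to the server
            return None
        return value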

NFS Implementation (cont.)

Criticism:
NFS has been widely criticized for not implementing proper UNIX semantics:
A write to a file on one client may or may not be seen when another client reads the file, depending on the timing.
When a file is created, it may not be visible to the outside world for as much as 30 sec.

NFS Implementation (cont.)

Lessons learned:
Workstations have cycles to burn, so do work on the client side, not the server side.
Cache whenever possible.
Exploit the usage properties.
Minimize systemwide knowledge and change.
Trust the fewest possible entities.
Batch work where possible.

The Andrew File System (AFS)

A different approach to remote file access.
Meant to serve a large organization, such as a university campus.
Scaling is a major goal.

Basic AFS Model

Files are stored permanently at file server machines.
Users work from workstation machines, with their own private namespace.
Andrew provides mechanisms to cache users' files from the shared namespace.

Basic AFS Model (cont.)

User model of AFS use:
Sit down at any AFS workstation anywhere.
Log in and authenticate who I am.
Access all files without regard to which workstation I'm using.

The local namespace:
Each workstation stores a few files, mostly system programs and configuration files.
Workstations are treated as generic, interchangeable entities.

Virtue and Vice

Vice is the system run by the file servers; it is a distributed system.
Virtue is the protocol client workstations use to communicate with Vice.

Overall Architecture

The system is viewed as a WAN composed of LANs.
Each LAN has a Vice cluster server, which stores local files.
But Vice makes all files available to all clients.

AFS Architecture Diagram

(Diagram not reproduced here: a WAN interconnecting several LANs, each with its own Vice cluster server.)

Caching the User Files

The goal is to offload work from servers to clients.
When must servers do work? To answer requests and to move data.

Whole files are cached at clients. Why?
Minimizes communications with the server.
Most files are used in their entirety anyway.
Easier cache management problem.
Requires substantial free disk space on workstations.
Doesn't address huge-file problems.

The Shared Namespace

An Andrew installation has a global shared namespace.
All clients' files are viewed in the namespace with the same names.
High degree of name and location transparency.

How do servers provide the namespace?

Files are organized into volumes.
Volumes are grafted together into the overall namespace.
Each file has a globally unique ID.
Volumes are stored at individual servers, but a volume can be moved from server to server.

Finding a File

At a high level, files have names.
A directory translates a name to a unique ID.
If the client knows where the volume is, it simply sends the unique ID to the appropriate server.

Finding a Volume

What if you enter a new volume? How do you find which server stores the volume?
A volume-location database is stored on each server.
Once information on a volume is known, the client caches it. (A combined sketch of name and volume resolution follows.)
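
Putting the last two slides together, a rough sketch of name resolution; all identifiers and structures below are invented for illustration:

    # A file ID here is simply (volume, file-within-volume); real AFS IDs are richer.
    directory = {"/afs/cs/notes.txt": ("vol.cs", 42)}       # name -> globally unique ID
    volume_location_db = {"vol.cs": "server3"}              # replicated on every server
    client_volume_cache = {}                                # filled lazily on first use

    def locate(pathname):
        volume, file_id = directory[pathname]
        if volume not in client_volume_cache:               # new volume: consult the database
            client_volume_cache[volume] = volume_location_db[volume]
        return client_volume_cache[volume], (volume, file_id)

    # locate("/afs/cs/notes.txt") -> ("server3", ("vol.cs", 42))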

Moving a Volume

When a volume moves from server to server, the volume-location database is updated.
This is a heavyweight distributed operation.
What about clients with cached information? The old server maintains forwarding information, which also eases server updating.

Handling Cached Files: Venus

Files are fetched transparently when needed.
The file system traps opens and sends them to the local Venus process.

The Venus daemon:
Responsible for handling a single client's cache.
Caches files on open.
Writes modified versions back on close. (A toy model follows.)
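
A toy model of what Venus does on open and close (whole-file caching; the function names are invented):

    local_cache = {}        # pathname -> file contents, held on the workstation's disk

    def venus_open(path, fetch_from_server):
        # On open, fetch the whole file into the local cache if it is not there yet.
        if path not in local_cache:
            local_cache[path] = fetch_from_server(path)
        return local_cache[path]

    def venus_close(path, modified, store_to_server):
        # On close, ship the (possibly modified) copy back to the server in one piece.
        if modified:
            store_to_server(path, local_cache[path])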

Consistency for AFS

If my workstation has a locally cached copy of a file, what if someone else changes it?
Callbacks are used to invalidate my copy.
This requires servers to keep information on who caches which files.

Write Consistency in AFS

What if I write to my cached copy of a file?
I need to get write permission from the server, which invalidates anyone else's callback.
Permission is obtained on open for write; fresh data must be obtained at this point.
Initially, writes go only to the local copy.
On close, Venus sends the update to the server.
The server will invalidate callbacks for other copies. (A sketch of this bookkeeping follows.)
Extra mechanism is needed to handle failures.
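
A sketch of the callback bookkeeping on the server side (simplified to a single writer and synchronous invalidation messages; names are illustrative):

    callbacks = {}          # file_id -> set of clients holding a callback (cached copy)

    def register_callback(file_id, client):
        # Recorded when a client fetches the file, so the server knows who caches it.
        callbacks.setdefault(file_id, set()).add(client)

    def store_on_close(files, file_id, new_data, writer, break_callback):
        files[file_id] = new_data
        for client in callbacks.get(file_id, set()) - {writer}:
            break_callback(client, file_id)     # other caches learn their copy is stale
        callbacks[file_id] = {writer}           # only the writer's copy remains valid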

Storage of Andrew Files

Files are stored in UNIX file systems.
The client cache is a directory on the local machine.
Low-level names do not match Andrew names.

Venus Cache Management

Venus keeps two caches: status and data.
The status cache is kept in virtual memory, for fast attribute lookup.
The data cache is kept on disk.

Venus Process Architecture

Venus is a single user-level process, but multithreaded.
It uses RPC to talk to the server.
The RPC is built on a low-level datagram service.

AFS Security

Only the servers/Vice are trusted here; client machines might be corrupted.
No client programs run on Vice machines.
Clients must authenticate themselves to servers.
Encryption is used to protect transmissions.

AFS File Protection

AFS supports access control lists:
Each file has a list of users who can access it, and their permitted modes of access.
The lists are maintained by Vice and used to mimic UNIX access control. (A small illustration follows.)
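
A small illustration of such a list; the users and rights are invented:

    # Access control list for one file: user -> permitted modes of access.
    acl = {
        "daniela": {"read", "write"},
        "guest":   {"read"},
    }

    def permitted(user, mode):
        return mode in acl.get(user, set())

    # permitted("guest", "write") -> False; Vice evaluates such lists on every access.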

AFS Read-Only Replication

For volumes containing files that are used frequently but not changed often (e.g. executables), AFS allows multiple servers to store read-only copies.
