Distributed File Systems (DFS)
[Figure: users with a thesis, a book, and analysis reports all want to store them safely and keep them always available on a single server, Server A.]
Introduction
[Figure: with files spread over Servers A, B and C, users can store a lot more documents but no longer remember where they put them ("I am not sure whether server A, or B, or C…"). Maybe a DFS is needed.]
[Figure: a Distributed File System in front of Servers A, B and C. It is reliable, fault tolerant, highly available and location transparent: users can access their folders from anywhere and do not have to remember which server stores their data.]
Storage systems and their properties
Other features:
– mountable file stores
– more? ...
What is a file system?
UNIX file system operations:
filedes = open(name, mode)
    Opens an existing file with the given name.
filedes = creat(name, mode)
    Creates a new file with the given name.
    Both operations deliver a file descriptor referencing the open file. The mode is read, write or both.
status = close(filedes)
    Closes the open file filedes.
count = read(filedes, buffer, n)
    Transfers n bytes from the file referenced by filedes to buffer.
count = write(filedes, buffer, n)
    Transfers n bytes to the file referenced by filedes from buffer.
    Both operations deliver the number of bytes actually transferred and advance the read-write pointer.
pos = lseek(filedes, offset, whence)
    Moves the read-write pointer to offset (relative or absolute, depending on whence).
status = unlink(name)
    Removes the file name from the directory structure. If the file has no other names, it is deleted.
status = link(name1, name2)
    Adds a new name (name2) for a file (name1).
status = stat(name, buffer)
    Gets the file attributes for file name into buffer.
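The rule that unlink deletes a file only when it has no other names can be checked with link and stat. The following is a minimal sketch, not part of the original slides: the helper names name_count and demo_link_unlink are hypothetical, and it assumes a local POSIX file system that supports hard links.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the number of names (hard links) the file has, or -1 on error. */
long name_count(const char *name)
{
    struct stat attrs;

    assert(name != NULL);
    if (stat(name, &attrs) < 0)
        return -1;
    return (long)attrs.st_nlink;
}

/* Creates `first`, adds `second` as another name for it, then removes
 * `first`. Returns 0 if the file is still reachable through `second`
 * with exactly one name left, and -1 on any error. */
int demo_link_unlink(const char *first, const char *second)
{
    int fd = creat(first, 0644);

    if (fd < 0)
        return -1;
    if (write(fd, "data", 4) != 4) {
        close(fd);
        return -1;
    }
    if (close(fd) < 0)
        return -1;

    if (link(first, second) < 0)      /* the file now has two names */
        return -1;
    if (name_count(second) != 2)
        return -1;

    if (unlink(first) < 0)            /* one name removed, file survives */
        return -1;
    if (name_count(second) != 1)
        return -1;
    return 0;
}
```

A final unlink on the second name removes the last name, at which point the file itself is deleted.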
Class Exercise A
Write a simple C program to copy a file using the UNIX file system operations.
Exercise A solution
Write a simple C program to copy a file using the UNIX file system operations.
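The solution itself did not survive in these notes, so what follows is one possible sketch rather than the original answer. It puts the copy loop in a hypothetical function copy_file built only from open, creat, read, write and close; the buffer size and the 0644 permission mode are arbitrary choices.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BUFSIZE 4096

/* Copies the file named src to a new file named dst.
 * Returns 0 on success and -1 on any error. */
int copy_file(const char *src, const char *dst)
{
    char buffer[BUFSIZE];
    ssize_t count;
    int in, out, status = 0;

    assert(src != NULL && dst != NULL);

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    out = creat(dst, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    /* read() returns 0 at end of file and a negative value on error. */
    while (status == 0 && (count = read(in, buffer, sizeof buffer)) > 0) {
        ssize_t done = 0;

        /* write() may transfer fewer bytes than requested, so repeat. */
        while (done < count) {
            ssize_t n = write(out, buffer + done, (size_t)(count - done));
            if (n <= 0) {
                status = -1;
                break;
            }
            done += n;
        }
    }
    if (status == 0 && count < 0)
        status = -1;

    if (close(in) < 0)
        status = -1;
    if (close(out) < 0)
        status = -1;
    return status;
}
```

A complete program would call copy_file(argv[1], argv[2]) from main after checking that two file names were given.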
[Figure: file system layers: directories, files, blocks, device.]
Distributed File system/service requirements
Requirements: transparency, concurrency, replication, heterogeneity, fault tolerance, consistency, security, efficiency.
The file service is the most heavily loaded service in an intranet, so its functionality and performance are critical.
Transparency properties
– Access: Same operations (client programs are unaware of the distribution of files).
– Location: Same name space after relocation of files or processes (client programs should see a uniform file name space).
– Mobility: Automatic relocation of files is possible (neither client programs nor system admin tables in client nodes need to be changed when files are moved).
– Performance: Satisfactory performance across a specified range of system loads.
– Scaling: Service can be expanded to meet additional loads or growth.
Concurrency properties
– Changes to a file by one client should not interfere with the operation of other clients simultaneously accessing or changing the same file.
– Isolation
– File-level or record-level locking
– Other forms of concurrency control to minimise contention
Replication properties
– File service maintains multiple identical copies of files.
  • Load-sharing between servers makes the service more scalable
  • Local access has better response (lower latency)
  • Fault tolerance
– Full replication is difficult to implement.
– Caching (of all or part of a file) gives most of the benefits (except fault tolerance).
Heterogeneity properties
– Service can be accessed by clients running on (almost) any OS or hardware platform.
– Design must be compatible with the file systems of different OSes.
– Service interfaces must be open: precise specifications of APIs are published.
Fault tolerance
– Service must continue to operate even when clients make errors or crash.
– Service must resume after a server machine crashes.
– If the service is replicated, it can continue to operate even during a server crash.
Consistency
– Unix offers one-copy update semantics for operations on local files; caching is completely transparent.
– Difficult to achieve the same for distributed file systems while maintaining good performance and scalability.
Security
– Must maintain access control and privacy as for local files.
  • based on identity of user making request
  • identities of remote users must be authenticated
  • privacy requires secure communication
– Service interfaces are open to all processes not excluded by a firewall.
  • vulnerable to impersonation and other attacks
Efficiency
– Goal for distributed file systems is usually performance comparable to a local file system.
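As a local illustration of the file-level and record-level locking mentioned under concurrency, the sketch below uses POSIX advisory locks through fcntl. The wrapper name set_record_lock is hypothetical, and the example only shows the locking interface on a local file; whether and how such locks carry over to a distributed file service depends on the service.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Applies an advisory lock of the given type (F_RDLCK, F_WRLCK or F_UNLCK)
 * to len bytes of the open file fd, starting at byte offset start.
 * A len of 0 means "from start to the end of the file", so start = 0 and
 * len = 0 give a file-level lock, while other values give a record-level
 * lock. Does not block: returns 0 on success and -1 if a conflicting lock
 * is held by another process or on error. */
int set_record_lock(int fd, short type, off_t start, off_t len)
{
    struct flock lock = {0};

    assert(len >= 0);
    lock.l_type = type;
    lock.l_whence = SEEK_SET;   /* start is measured from the file's beginning */
    lock.l_start = start;
    lock.l_len = len;
    return fcntl(fd, F_SETLK, &lock);
}
```

A writer might call set_record_lock(fd, F_WRLCK, 0, 100) before updating the first 100 bytes and set_record_lock(fd, F_UNLCK, 0, 100) afterwards. The locks are advisory, so they constrain only processes that also ask for them.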
File Service Architecture
Model file service architecture
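[Figure: a client computer running a client module, and a server computer offering two groups of operations: Lookup, AddName, UnName, GetNames (directory service) and Read, Write, Create, Delete, GetAttributes, SetAttributes (flat file service).]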
Responsibilities of various modules
Server operations/interfaces for the model file service
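Only the operation names survive in these notes, so the following is a toy in-memory sketch of the Create, Write, Read and Delete operations, not the interface definition itself. The signatures, the FileId type, the size limits and the fs_ prefix are all assumptions. The sketch assumes that every Read and Write request carries an explicit position, so that a repeated request leaves the file in the same state.

```c
#include <assert.h>
#include <string.h>

#define MAX_FILES 16
#define MAX_SIZE  1024

typedef int FileId;             /* hypothetical: index into the table below */

struct flat_file {
    int    in_use;
    size_t length;
    char   data[MAX_SIZE];
};

static struct flat_file store[MAX_FILES];

static int valid(FileId id)
{
    return id >= 0 && id < MAX_FILES && store[id].in_use;
}

/* Create: returns the id of a new empty file, or -1 if the table is full. */
FileId fs_create(void)
{
    for (FileId id = 0; id < MAX_FILES; id++) {
        if (!store[id].in_use) {
            store[id].in_use = 1;
            store[id].length = 0;
            return id;
        }
    }
    return -1;
}

/* Write: stores n bytes at position pos, extending the file if needed.
 * Writing past the current end (leaving a hole) is rejected. */
int fs_write(FileId id, size_t pos, const char *data, size_t n)
{
    assert(data != NULL);
    if (!valid(id) || pos > store[id].length || n > MAX_SIZE - pos)
        return -1;
    memcpy(store[id].data + pos, data, n);
    if (pos + n > store[id].length)
        store[id].length = pos + n;
    return 0;
}

/* Read: copies up to n bytes starting at pos into buf; returns the count. */
long fs_read(FileId id, size_t pos, char *buf, size_t n)
{
    assert(buf != NULL);
    if (!valid(id) || pos > store[id].length)
        return -1;
    if (n > store[id].length - pos)
        n = store[id].length - pos;
    memcpy(buf, store[id].data + pos, n);
    return (long)n;
}

/* Delete: removes the file from the store. */
int fs_delete(FileId id)
{
    if (!valid(id))
        return -1;
    store[id].in_use = 0;
    return 0;
}
```

Because the position is an argument rather than server-side state, the sketch needs no open or close operation and no per-client read-write pointer, unlike the UNIX interface.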
DFS: Case Studies
Case Study: Sun NFS
An industry standard for file sharing on local networks since the 1980s
An open standard with clear and simple interfaces
Closely follows the abstract file service model defined above
Supports many of the design requirements already mentioned:
– transparency
– heterogeneity
– efficiency
– fault tolerance
NFS - History
2010: NFSv4.1
– Adds Parallel NFS (pNFS) for parallel data access
2015: RFC 7530
– Revised specification of the NFS version 4 protocol (obsoletes RFC 3530)
NFS architecture
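[Figure: on the client computer, requests for remote files pass to an NFS client alongside the local UNIX file system and other file systems; the NFS client talks to the NFS server, which sits above a UNIX file system, using the NFS protocol (remote operations).]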
NFS architecture:
Does the implementation have to be in the system kernel?
No:
– there are examples of NFS clients and servers that run at application-
level as libraries or processes (e.g. early Windows and MacOS
implementations, current PocketPC, etc.)
NFS server operations (simplified)
Architecture Components (UNIX / Linux)
Server:
– nfsd: NFS server daemon that services requests from clients.
– mountd: NFS mount daemon that carries out the mount request
passed on by nfsd.
– rpcbind: RPC port mapper used to locate the nfsd daemon.
– /etc/exports: configuration file that defines which portions of the file
systems are exported through NFS and how.
Client:
– mount: standard file system mount command.
– /etc/fstab: file system table file.
– nfsiod: (optional) local asynchronous NFS I/O server.
Mount service
Mount operation:
mount(remotehost, remotedirectory, localdirectory)
Server maintains a table of clients who have
mounted filesystems at that server
Each client maintains a table of mounted file
systems holding:
<IP address, port number, file handle>
Hard versus soft mounts
Local and remote file systems accessible on an NFS client
[Figure: the client's directory tree with two remote mounts: students (from people on Server 1) and staff (from users on Server 2).]
Note: The file system mounted at /usr/students in the client is actually the subtree located at /export/people in Server 1;
the file system mounted at /usr/staff in the client is actually the subtree located at /nfs/users in Server 2.
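The mapping described in the note can be sketched as a lookup in a small client-side mount table. This is an illustrative simplification: the table below stores a server name and an exported directory per mount point, not the IP address, port number and file handle that a real client table would hold, and the names mount_entry and resolve_path are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct mount_entry {
    const char *local_dir;    /* mount point on the client */
    const char *server;       /* remote host */
    const char *remote_dir;   /* exported directory on that host */
};

/* The two remote mounts described in the note above. */
static const struct mount_entry mount_table[] = {
    { "/usr/students", "Server 1", "/export/people" },
    { "/usr/staff",    "Server 2", "/nfs/users"     },
};

/* Translates a client path into "<server>:<remote path>" in out.
 * Returns 1 if the path lies under a remote mount; otherwise copies the
 * path unchanged and returns 0, meaning it is served locally. */
int resolve_path(const char *path, char *out, size_t outlen)
{
    size_t count = sizeof mount_table / sizeof mount_table[0];

    assert(path != NULL && out != NULL && outlen > 0);
    for (size_t i = 0; i < count; i++) {
        const struct mount_entry *m = &mount_table[i];
        size_t n = strlen(m->local_dir);

        /* Match whole path components only, so that a directory such as
         * /usr/staffroom is not mistaken for /usr/staff. */
        if (strncmp(path, m->local_dir, n) == 0 &&
            (path[n] == '/' || path[n] == '\0')) {
            snprintf(out, outlen, "%s:%s%s",
                     m->server, m->remote_dir, path + n);
            return 1;
        }
    }
    snprintf(out, outlen, "%s", path);
    return 0;
}
```

For example, a hypothetical client path /usr/students/jon/notes.txt would resolve to /export/people/jon/notes.txt on Server 1, while any path outside the two mount points stays local.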
Automounter
Kerberized NFS
New design approaches
'Serverless' architecture
– Exploits processing and disk resources in all available network nodes
– Service is distributed at the level of individual files
Examples:
xFS : Experimental implementation demonstrated a substantial performance gain
over NFS and AFS
Peer-to-peer systems: Napster, OceanStore (UCB), Farsite (MSR), Publius (AT&T
research) - see web for documentation on these very recent systems
Cloud-based file systems: Dropbox
[Figure: a Dropbox folder with automatic synchronization.]
Distributed file systems provide the illusion of a local file system and hide complexity from end users.
Sun NFS is an excellent example of a distributed service designed to meet many important design requirements.
Effective client caching can produce file service performance equal to or better than local file systems.
Consistency versus update semantics versus fault tolerance remains an issue.
Most client and server failures can be masked.
Superior scalability can be achieved with whole-file serving (Andrew FS) or the distributed virtual disk approach.
Advanced Features:
– support for mobile users, disconnected operation, automatic re-integration
– support for data streaming and quality of service (Tiger file system, Content Delivery
Networks)