06 dfs2
1
Logistical Updates
• P0
• Due date: Midnight EST 9/22 (Thursday)
• NOTE: We will not accept P0 after Midnight EST 9/24
• Each late day constitutes a 10% penalty (max -20%)
• Attend office hours in case you are having trouble
• Solutions for P0 discussed next Monday
• Recitation sections on Monday 9/26
• Learn about good solutions to P0
• May help you learn how to structure Go code (for P1)
• P1 Released!
• For deadlines, the class page is the most up-to-date source
2
Review of Last Lecture
3
Today's Lecture
4
Topic 2: File Access Consistency
5
Session Semantics in AFS v2
• What it means:
• A file write is visible to processes on the same box
immediately, but not visible to processes on other
machines until the file is closed
• When a file is closed, changes are visible to new
opens, but are not visible to “old” opens
• All other file operations are visible everywhere
immediately
• Implementation
• Dirty data are buffered at the client machine until the file is closed, then flushed back to the server, which prompts the server to send “break callback” messages to other clients caching the file (see the sketch after this slide)
6
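A minimal Go sketch of these close-to-open session semantics, using hypothetical names (CachedFile, Server.Store) rather than the real AFS/Venus code: writes touch only the locally cached copy, and the server sees the new contents only when the file is closed.

```go
// Hypothetical sketch of AFS-style session semantics (not the real Venus code).
package afs

// Server is the flush target; storing a file is what triggers the server to
// send "break callback" messages to other clients caching it.
type Server interface {
	Store(path string, data []byte) error
}

// CachedFile is a whole-file copy cached on the client machine.
type CachedFile struct {
	path   string
	data   []byte
	dirty  bool
	server Server
}

// Write modifies only the local copy; processes on other machines see nothing yet.
func (f *CachedFile) Write(offset int, p []byte) {
	if need := offset + len(p); need > len(f.data) {
		f.data = append(f.data, make([]byte, need-len(f.data))...)
	}
	copy(f.data[offset:], p)
	f.dirty = true
}

// Close flushes dirty data back to the server; only now do new opens on other
// machines observe the update.
func (f *CachedFile) Close() error {
	if !f.dirty {
		return nil
	}
	if err := f.server.Store(f.path, f.data); err != nil {
		return err
	}
	f.dirty = false
	return nil
}
```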
AFS Write Policy
• Writeback cache
• Opposite of NFS “every write is sacred”
• Store chunk back to server
• When cache overflows
• On last user close()
• ...or don't (if client machine crashes)
• Is writeback crazy?
• Write conflicts “assumed rare”
• Who wants to see a half-written file?
7
Results for AFS
9
Implications for Location Transparency
• NFS: no transparency
• If a directory is moved from one server to another, client
must remount
• AFS: transparency
• If a volume is moved from one server to another, only
the volume location database on the servers needs to
be updated
10
Naming in NFS (1)
13
Topic 4: User Authentication and
Access Control
• User X logs onto workstation A, wants to access files
on server B
• How does A tell B who X is?
• Should B believe A?
• Choices made in NFS V2
• All servers and all client workstations share the same <uid, gid> name space; A sends X’s <uid, gid> to B
• Problem: root access on any client workstation can lead
to creation of users of arbitrary <uid, gid>
• Server believes client workstation unconditionally
• Problem: if any client workstation is broken into, the protection of data on the server is lost
• <uid, gid> is sent in clear text over the wire, so request packets can be faked easily
14
User Authentication (cont’d)
15
A Better AAA System: Kerberos
[Diagram] The client sends “Need to access fs” to the KDC (ticket server). The KDC generates a session key S and returns [S] encrypted with the client’s key, along with [S] encrypted with the file server’s key (the ticket the client forwards to the file server). A toy sketch of this exchange follows this slide.
17
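A toy Go sketch of the exchange in the diagram, not real Kerberos: the names (KDC, GrantTicket, seal) and the XOR “encryption” are stand-ins chosen for illustration. The point is that the KDC mints a fresh session key S and returns one copy sealed under the client’s long-term key and one sealed under the file server’s key (the ticket).

```go
// Toy sketch of a Kerberos-style ticket grant (not the real protocol or API).
package kerberos

import "crypto/rand"

// KDC shares a long-term secret key with every principal (clients, servers).
type KDC struct {
	keys map[string][]byte
}

// seal stands in for real symmetric encryption (e.g. AES-GCM); a toy XOR here.
func seal(key, msg []byte) []byte {
	out := make([]byte, len(msg))
	for i, b := range msg {
		out[i] = b ^ key[i%len(key)]
	}
	return out
}

// GrantTicket handles "Need to access fs": it generates a session key S and
// returns [S] sealed with the client's key and [S] sealed with fs's key
// (the ticket the client forwards to the file server).
func (k *KDC) GrantTicket(client, fs string) (forClient, ticket []byte, err error) {
	s := make([]byte, 16)
	if _, err = rand.Read(s); err != nil {
		return nil, nil, err
	}
	return seal(k.keys[client], s), seal(k.keys[fs], s), nil
}
```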
Background
18
CODA
19
Hardware Model
20
Accessibility
21
Design Rationale
• Scalability
• Callback cache coherence (inherited from AFS)
• Whole file caching
• Fat clients (security, integrity)
• Avoid system-wide rapid change
• Portable workstations
• User’s assistance in cache management
22
Design Rationale – Replica Control
• Pessimistic
• Disable all partitioned writes
- Require a client to acquire control (lock) of a cached
object prior to disconnection
• Optimistic
• Assume no one else is touching the file
- requires conflict detection (see the sketch after this slide)
+ fact: low write-sharing in Unix
+ high availability: can access anything within reach
23
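A small Go sketch of the optimistic approach, under the simplifying assumption (mine, for illustration, not Coda’s actual mechanism) that conflicts are detected with a per-object version number: a partitioned write is accepted at reintegration only if the object has not changed since the client cached it.

```go
// Sketch of optimistic replica control with version-based conflict detection.
package replica

import "errors"

// ErrConflict signals a write/write conflict created while partitioned.
var ErrConflict = errors.New("object changed on server while client was partitioned")

// ServerObject is the server's replica of one object.
type ServerObject struct {
	Version int
	Data    []byte
}

// ApplyUpdate replays one partitioned write. cachedVersion is the version the
// client saw before disconnecting; a mismatch means someone else also wrote.
func (o *ServerObject) ApplyUpdate(cachedVersion int, newData []byte) error {
	if cachedVersion != o.Version {
		return ErrConflict // flag for (possibly manual) resolution
	}
	o.Data = newData
	o.Version++
	return nil
}
```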
What about Consistency?
24
Pessimistic Replica Control
25
Leases
26
Optimistic Replica Control (I)
27
Optimistic Replica Control (II)
28
Coda States
[State diagram: Hoarding ↔ Emulating ↔ Reintegrating]
1. Hoarding:
Normal operation mode
2. Emulating:
Disconnected operation mode
3. Reintegrating:
Propagates changes and detects inconsistencies
29
Hoarding
30
Prioritized algorithm
31
Emulation
• In emulation mode:
• Attempts to access files that are not in the client cache appear as failures to applications
• All changes are written to a persistent log, the client modification log (CML); a sketch follows this slide
• Coda removes obsolete entries from the log, e.g. those pertaining to files that have since been deleted
33
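A compact Go sketch of a client modification log. The record types and the optimization shown (a later remove cancelling earlier stores of the same file) are simplified illustrations of the CML idea, not Coda’s on-disk format.

```go
// Sketch of a client modification log (CML); simplified, in-memory only.
package coda

// OpKind identifies the logged mutation.
type OpKind int

const (
	OpStore  OpKind = iota // whole-file store
	OpRemove               // file deletion
)

// Record is one logged mutation made while disconnected.
type Record struct {
	Kind OpKind
	Path string
	Data []byte // payload for OpStore
}

// CML accumulates mutations during emulation (persistently, in the real system).
type CML struct {
	records []Record
}

// Append logs a mutation and drops entries it makes obsolete: a remove cancels
// any earlier store of the same file, keeping the log (and replay time) small.
func (l *CML) Append(r Record) {
	if r.Kind == OpRemove {
		kept := l.records[:0]
		for _, old := range l.records {
			if old.Kind != OpStore || old.Path != r.Path {
				kept = append(kept, old)
			}
		}
		l.records = kept
	}
	l.records = append(l.records, r)
}
```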
Reintegration
• When the workstation reconnects, Coda initiates a reintegration process (a sketch follows this slide)
• Performed one volume at a time
• Venus ships the replay log to all volumes
• Each volume performs a log replay algorithm
• In practice:
• No conflicts at all! Why?
• Over 99% of modifications are made by the same person
• Two users modifying the same object within a day: < 0.75%
35
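Continuing the CML sketch above (same hypothetical coda package, with Record and CML as defined there), a minimal view of reintegration: Venus hands a volume’s log to its server for replay, and any records the server cannot apply cleanly come back as conflicts instead of being applied silently.

```go
// Sketch of reintegration for one volume (hypothetical interface; reuses the
// Record and CML types from the CML sketch above).
package coda

// VolumeServer replays a volume's log and reports records that conflict with
// updates made on the server while the client was disconnected.
type VolumeServer interface {
	Replay(records []Record) (conflicts []Record, err error)
}

// Reintegrate ships one volume's CML to its server. On a clean replay the log
// is discarded; conflicting records are returned for resolution instead.
func Reintegrate(srv VolumeServer, log *CML) ([]Record, error) {
	conflicts, err := srv.Replay(log.records)
	if err != nil {
		return nil, err
	}
	if len(conflicts) == 0 {
		log.records = nil
	}
	return conflicts, nil
}
```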
Remember this slide?
36
What about now?
37
Today's Lecture
38
Low Bandwidth File System
Key Ideas
• A network file system for slow or wide-area networks
• Exploits similarities between files or versions of the same file
• Avoids sending data that can be found in the server’s file system or the client’s cache (see the sketch after this slide)
• Also uses conventional compression and caching
• Requires 90% less bandwidth than traditional
network file systems
39
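A short Go sketch of the “don’t send bytes the other side already has” idea. ChunkStore, Has, and Put are hypothetical RPC names; SHA-1 is the hash LBFS uses to identify chunks.

```go
// Sketch of LBFS-style transfer: send chunk hashes first, bodies only on miss.
package lbfs

import "crypto/sha1"

// ChunkStore abstracts the peer's chunk index (hypothetical RPC surface).
type ChunkStore interface {
	Has(hash [sha1.Size]byte) bool         // does the peer already hold these bytes?
	Put(hash [sha1.Size]byte, body []byte) // upload a missing chunk
}

// Upload sends a file chunk by chunk, skipping any chunk whose contents the
// peer can already reconstruct from its file system or cache.
func Upload(store ChunkStore, chunks [][]byte) (sent int) {
	for _, c := range chunks {
		h := sha1.Sum(c)
		if store.Has(h) {
			continue // identical bytes already exist remotely; only the hash travels
		}
		store.Put(h, c)
		sent++
	}
	return sent
}
```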
Working on slow networks
40
LBFS design
41
Indexing
42
LBFS chunking solution
43
Effects of edits on file chunks
44
More Indexing Issues
• Pathological cases
• Very small chunks
• Sending hashes of chunks would consume as much
bandwidth as just sending the file
• Very large chunks
• Cannot be sent in a single RPC
• LBFS imposes minimum (2 KB) and maximum (64 KB) chunk sizes (see the chunking sketch after this slide)
45
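A Go sketch of content-defined chunking with the 2 KB / 64 KB bounds from the slide. The hash here is a toy stand-in for LBFS’s 48-byte Rabin fingerprint (the real breakpoint test checks the fingerprint’s low 13 bits), so treat the boundary rule as illustrative; the part that matches the slide is the enforcement of the minimum and maximum chunk sizes.

```go
// Sketch of content-defined chunking with minimum and maximum chunk sizes.
package lbfs

const (
	minChunk = 2 * 1024  // 2 KB minimum avoids pathologically small chunks
	maxChunk = 64 * 1024 // 64 KB maximum keeps a chunk within one RPC
	mask     = (1 << 13) - 1
	magic    = 0x78 // arbitrary target for the low 13 bits of the hash
)

// Split breaks data at content-defined boundaries: a boundary is declared when
// the running hash's low bits hit magic, but never before minChunk bytes and
// always by maxChunk bytes.
func Split(data []byte) [][]byte {
	var chunks [][]byte
	start := 0
	var h uint64
	for i, b := range data {
		h = h*31 + uint64(b) // toy hash (not a true sliding-window Rabin fingerprint)
		size := i - start + 1
		if (size >= minChunk && h&mask == magic) || size >= maxChunk {
			chunks = append(chunks, data[start:i+1])
			start, h = i+1, 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}
```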
The Chunk Database
46
DFS in real life
Picture Credit: Yong Cui, QuickSync: Improving Synchronization Efficiency for Mobile Cloud Storage Services
47
Features and Comparisons
51
Key Lessons for LBFS
52