Coda File System: Design Overview
The main purpose of the Coda File System (CFS) is to enable disconnected operation, mainly for portable devices.
It achieves this by caching critical data on the client, which in turn improves availability.
Design overview
• Clients communicate with servers over a high-bandwidth network.
• Clients view Coda as a single, location-transparent shared Unix file system.
• The Coda namespace is mapped to individual file servers at the granularity of sub trees called volumes. At
each client, a cache manager (Venus) dynamically obtains and caches volume mappings.
• High availability is one of the major goals of CFS. Two complementary mechanisms are used to achieve it: server
replication and disconnected operation.
• Coda uses optimistic replica control for providing high availability based on the assumption that there is a
low degree of write-sharing.
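The mapping of the namespace onto volumes can be pictured as a longest-prefix lookup. The following is only a minimal sketch: the table contents, the volume and server names, and the function resolve_volume are hypothetical, and a real Venus obtains and caches these mappings dynamically rather than reading a fixed table.

```python
from typing import List, Optional, Tuple

# Hypothetical mount table: subtree root -> (volume name, servers holding replicas).
VOLUME_MAP = {
    "/coda": ("root.vol", ["server-a"]),
    "/coda/usr/alice": ("user.alice", ["server-a", "server-b"]),
    "/coda/project/docs": ("proj.docs", ["server-b", "server-c"]),
}


def resolve_volume(path: str) -> Optional[Tuple[str, List[str]]]:
    """Return the entry for the deepest volume subtree that contains path."""
    parts = path.rstrip("/").split("/")
    # Try the full path first, then progressively shorter prefixes.
    for end in range(len(parts), 1, -1):
        prefix = "/".join(parts[:end])
        if prefix in VOLUME_MAP:
            return VOLUME_MAP[prefix]
    return None  # path is outside the Coda namespace
```

Because the client only ever names a path, the servers behind a volume can change without the user noticing, which is what location transparency amounts to here.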
Server replication
Volumes have read-write replicas at more than one server. The set of servers holding replicas of a volume is its VSG
(Volume Storage Group), and the subset a client can currently reach is its AVSG (Accessible Volume Storage Group).
Server replication carries a performance cost; to counter it, Coda uses caching on the client's local disk and parallel
access protocols. Venus uses callbacks for cache coherence, to guarantee that an open file yields its latest copy in the
AVSG. Servers notify clients when their cached copies are no longer valid by breaking the callback. Modifications in
Coda are propagated in parallel to all AVSG sites and eventually to missing VSG sites.
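The callback mechanism described above can be sketched as follows. This is a simplified, single-server illustration and not Coda's actual protocol: the class and method names (Server, Client, fetch, store, break_callback) are assumptions made for the example, and replication across the AVSG is left out.

```python
class Server:
    def __init__(self):
        self.files = {}      # path -> (version, data)
        self.callbacks = {}  # path -> set of clients holding a callback promise

    def fetch(self, client, path):
        # Handing out a copy also registers a callback promise for that client.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, writer, path, data):
        version = self.files.get(path, (0, b""))[0] + 1
        self.files[path] = (version, data)
        # Break the callback of every other client that cached the old copy.
        for client in self.callbacks.pop(path, set()):
            if client is not writer:
                client.break_callback(path)
        self.callbacks.setdefault(path, set()).add(writer)


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # path -> (version, data); valid while the callback holds

    def open(self, path):
        if path not in self.cache:
            self.cache[path] = self.server.fetch(self, path)
        return self.cache[path][1]

    def write(self, path, data):
        self.server.store(self, path, data)
        self.cache[path] = self.server.files[path]

    def break_callback(self, path):
        # The cached copy is stale; the next open must refetch.
        self.cache.pop(path, None)
```

While a callback is intact, an open is served from the cache without contacting the server; only a callback break forces a refetch.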
Disconnected operation
While disconnected, Venus relies entirely on its local cache, and cache misses appear as failures to applications.
When it reconnects, Venus propagates changes and reverts to server replication.
• Whole-file caching: offers the added advantage of a much simpler failure model: a cache miss can only occur on
an open, never on a read, write, seek, or close. This, in turn, substantially simplifies the implementation of
disconnected operation.
• Placing of functionality on clients rather than servers.
• Avoidance of system-wide rapid change.
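The failure model of whole-file caching noted above can be illustrated with a small sketch. The names WholeFileCache, CacheMiss and fetch_from_server are hypothetical; the point is only that the single place a miss can surface is open, because the whole file is already local by the time any read or write happens.

```python
import io


class CacheMiss(Exception):
    """Raised when a file is not cached and no server is reachable."""


class WholeFileCache:
    def __init__(self, fetch_from_server, connected=True):
        self._fetch = fetch_from_server  # callable: path -> bytes (assumed)
        self.connected = connected
        self._files = {}                 # path -> entire file contents

    def open(self, path):
        if path not in self._files:
            if not self.connected:
                # The only point at which a miss can be reported.
                raise CacheMiss(path)
            self._files[path] = self._fetch(path)
        # Reads and seeks operate on the local copy and cannot miss.
        return io.BytesIO(self._files[path])
```

A file opened before disconnection stays fully usable afterwards, while opening an uncached file during disconnection fails immediately rather than partway through a read.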
Clients and Servers
Clients are like appliances: they have limited disk storage capacity, their software and hardware may be tampered
with, and their owners may not be diligent about backing up the local disks. Servers are like public utilities: they have
much greater disk capacity, they are physically secure, and they are carefully monitored and administered by
professional staff. Coda therefore distinguishes first-class replicas on servers from second-class replicas (i.e., cache copies) on clients. First-class
replicas are of higher quality: they are more persistent, widely known, secure, available, complete and accurate.
Second-class replicas, in contrast, are inferior along all these dimensions.
Venus
• Because of its complexity, Venus is implemented as a user-level process rather than as part of the kernel.
• A tiny in-kernel MiniCache filters out many kernel-Venus interactions, which would otherwise impose a heavy
performance overhead.
• Venus operates in three states:
o Hoarding: The cache is managed with a prioritized cache management algorithm that combines recent reference
history with the hoard database (HDB). The HDB contains pathnames and priority information. Venus uses
hierarchical cache management so that cached pathnames can still be resolved, and performs hoard walks to
maintain cache equilibrium.
o Emulation: Venus acts as a pseudo-server and records mutating operations in a per-volume log called the replay
log.
o Reintegration: Venus reverts from pseudo-server to cache manager and uses the replay algorithm to propagate
changes from the client to the AVSG. For conflict handling, Coda considers only write/write conflicts.
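The three states and the per-volume replay log can be sketched as a small state machine. This is an illustration under simplifying assumptions, not Venus's implementation: the names VenusSketch, send_to_server, disconnect, mutate and reconnect are invented for the example, and conflict detection and the atomicity of reintegration are omitted.

```python
from enum import Enum


class State(Enum):
    HOARDING = "hoarding"
    EMULATION = "emulation"
    REINTEGRATION = "reintegration"


class VenusSketch:
    def __init__(self, send_to_server):
        self.send = send_to_server  # callable(volume, record); assumed interface
        self.state = State.HOARDING
        self.replay_logs = {}       # volume -> ordered list of logged operations

    def disconnect(self):
        self.state = State.EMULATION

    def mutate(self, volume, op, path):
        record = (op, path)
        if self.state is State.EMULATION:
            # Act as a pseudo-server: log the operation instead of sending it.
            self.replay_logs.setdefault(volume, []).append(record)
        else:
            self.send(volume, record)

    def reconnect(self):
        self.state = State.REINTEGRATION
        # Replay each volume's log in the order the operations were performed.
        for volume, log in self.replay_logs.items():
            for record in log:
                self.send(volume, record)
        self.replay_logs.clear()
        self.state = State.HOARDING
```

Keeping one log per volume means each volume's changes can be replayed independently of the others.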
Strengths
• The simple but elegant idea behind disconnected operation in Coda is that caching can be used to enhance
availability as well as performance.
• Coda is unique in that it exploits caching for both performance and high availability while preserving a high
degree of transparency.
• Users can work disconnected for a couple of days with only about 50-100 MB of local storage, and
reintegration upon reconnection takes only a couple of minutes.
• Coda provides a simple way to prioritize cached files through the hoard database (HDB).
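How a hoard priority and recent usage might combine into a single cache priority can be sketched as below. The linear formula, the weight alpha, the horizon, and the 0-100 priority scale are all assumptions chosen for illustration; the notes above say only that the algorithm uses recent history together with the hoard database.

```python
def current_priority(hoard_priority, ticks_since_use, alpha=0.5, horizon=100):
    """Blend a user-assigned hoard priority with a recency score (assumed formula)."""
    recency = max(0, horizon - ticks_since_use)  # horizon = just used, 0 = long idle
    return alpha * hoard_priority + (1 - alpha) * recency


def choose_victim(entries):
    """entries maps path -> (hoard_priority, ticks_since_use); evict the lowest."""
    return min(entries, key=lambda path: current_priority(*entries[path]))


# Hypothetical cache contents.
entries = {
    "/coda/usr/alice/thesis.tex": (100, 90),  # hoarded, but not used recently
    "/coda/tmp/scratch": (0, 5),              # not hoarded, used very recently
    "/coda/misc/old.log": (0, 95),            # not hoarded, long idle
}
victim = choose_victim(entries)
```

With these numbers the hoarded file survives even though it has been idle, and the unhoarded idle file is the one chosen for eviction.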
Weaknesses
• Coda's conflict handling is very simple. When updates conflict, the first update always goes through and
subsequent updates are lost or rolled back.
• Caching entire files and relying on having few write conflicts limits the applicability of the solution, since it rules
out both applications with a large database or several large files (tens or hundreds of megabytes) and online
transaction processing applications where many writes are made to files.
• Whole file caching means users must plan their work to take full advantage of disconnected operation.
• There needs to be machinery in the system for detecting conflicts, for automating resolution when possible, and
for confining damage and preserving evidence for manual repair.
• Having to repair conflicts manually violates transparency, is an annoyance to users, and reduces the usability of
the system.
• Coda needs the client machines to be powerful, as the majority of disconnected operation is handled by Venus at
the client. For an optimistic replica control strategy to be viable, the system has to be sophisticated.
• Coda is mainly designed for local area networks; there will be a significant drop in performance if Coda is used
on a wide area system.
Comparisons
• Venus is implemented as a user-level process instead of a kernel-level process in order to increase the portability
of the system.
• Callbacks in general increase the performance of the system but callbacks are prone to failures.
• The major limitation to using Coda on mobile devices is storage: the devices need ample local storage to
support Coda.
• The store-id is like a timestamp: it records information about the latest update, such as who updated the file and
when it was last updated.
• General wide area problems are consistency, low network rates and concurrency.
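The role of the store-id in detecting write/write conflicts can be sketched as a comparison made at reintegration time. The field names and the function try_reintegrate are hypothetical; the sketch assumes only what the notes state, namely that a store-id identifies the latest update to a file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreId:
    client: str   # who made the latest update
    counter: int  # stand-in for when it was made (assumed representation)


def try_reintegrate(server, path, base, new_id):
    """Apply a disconnected update only if the server copy is unchanged.

    server maps path -> StoreId of the current copy; base is the store-id the
    client had cached when it went disconnected.
    """
    if server.get(path) != base:
        return False  # someone else updated the file: write/write conflict
    server[path] = new_id
    return True
```

If two clients both modified the same cached version while disconnected, the first to reintegrate succeeds and the second sees a mismatched store-id, which matches the first-update-wins behaviour listed under Weaknesses.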