DC ESE Notes
Mod 2-5
2 Marks Questions:
Mod 2,3
1. Encapsulation in layered protocols
• Each layer adds its own header (and possibly trailer) to data before passing it down, hiding
lower‐level details.
• This “wrapper” lets higher layers treat communication uniformly without knowing how bits
travel.
• Example: Like sending a letter: you put your message in an envelope (application layer), then
the post office adds tracking info (network layer); see the toy sketch below.
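A toy Python sketch of this layered wrapping; the header fields are invented for illustration and do not come from any real protocol:

```python
# Toy encapsulation sketch: each layer wraps the data from the layer above
# inside its own header fields. Field names are illustrative, not a real protocol.
message = "hello"                                                   # application data
segment = {"src_port": 5000, "dst_port": 80, "payload": message}    # transport header
packet = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "payload": segment}  # network header
frame = {"mac": "aa:bb:cc:dd:ee:ff", "payload": packet}             # link-layer header
print(frame)   # the receiver unwraps these layers in reverse order
```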
2. What is MPI, and why is it used in distributed systems?
• MPI (Message Passing Interface) is a standard library for processes to exchange messages in
parallel programs.
• It’s used to coordinate work and share data efficiently across multiple machines or CPU cores.
• Example: Weather simulation on a supercomputer where different nodes calculate different
regions and share boundary data via MPI.
3. State examples for the following message communication models.
1. Transient Synchronous
o Sender blocks until receiver gets the message.
o Example: Making a direct phone call—you speak only when the listener is on the line.
2. Persistent Asynchronous
o Sender writes message to mailbox and continues immediately; receiver reads later.
o Example: Sending an email—you don’t wait for the recipient to read it.
4. Key characteristics of multicast communication
• One‐to‐many delivery: a sender transmits a single message to a group of receivers.
• Efficient bandwidth use: network duplicates data only where branches diverge.
• Example: Live video streaming to multiple subscribers in a video conference.
5. What do we already know about network transparency in RPC?
• RPC makes remote calls look like local function calls, hiding network details.
• Developers don’t manage sockets or serialization manually.
• Example: Calling getUserData() on a server as if it were a local library function.
6. Differences between synchronous and asynchronous RPC
• Synchronous RPC: Caller waits (blocks) until the remote function returns a result.
• Asynchronous RPC: Caller sends request and continues; result is delivered later via callback
or polling.
• Example:
o Sync: Waiting on a web API call before proceeding.
o Async: Sending a notification request and handling response in an event handler.
7. Assumptions made about the reliability of group communication
• Messages may be lost, duplicated, or reordered; protocols must handle these failures.
• Group membership can change (joins/leaves) and must be tracked consistently.
• Example: Chat room apps assume users can disconnect unexpectedly and handle message
recovery.
8. Use of remote procedure call in distributed systems
• Provides a simple interface for invoking operations on remote machines.
• Abstracts marshalling (packing) and unmarshalling of parameters and results.
• Example: A web service exposes processPayment() via RPC so clients call it like a local
function.
9. Elaborate Remote Method Invocation (RMI)
• RMI extends RPC to object‐oriented systems, allowing methods on remote objects to be
invoked.
• It handles object serialization, method dispatch, and remote garbage collection.
• Example: Java RMI lets you call remoteAccount.deposit(100) on a server‐side Account object.
10. Illustrate the concept of Message Passing Interface
• MPI defines functions like MPI_Send and MPI_Recv for explicit data exchange between
processes.
• Processes are identified by ranks; messages include source, destination, and tags.
• Example: In a parallel matrix multiplication, each process sends submatrix results to neighbors
using MPI_Send (see the sketch below).
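A minimal sketch of this send/receive pattern, assuming the mpi4py Python binding (the payload and tag values are illustrative; in C these are the MPI_Send/MPI_Recv calls):

```python
# Minimal MPI point-to-point sketch with mpi4py (run: mpiexec -n 2 python demo.py).
# mpi4py's comm.send/comm.recv wrap the underlying MPI_Send/MPI_Recv calls.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()                      # each process is identified by its rank

if rank == 0:
    boundary = [1.0, 2.0, 3.0]              # e.g. a submatrix boundary row
    comm.send(boundary, dest=1, tag=11)     # blocking send to rank 1; tag labels the message
elif rank == 1:
    boundary = comm.recv(source=0, tag=11)  # blocking receive from rank 0
    print("rank 1 received:", boundary)
```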
11. Elaborate the concept of Interprocess Communication (IPC)
• IPC enables data exchange and synchronization between processes on the same or different
hosts.
• Common mechanisms include pipes, shared memory, message queues, and sockets.
• Example: Two local processes coordinate via a named pipe where one writes log entries and
the other reads them for display (see the sketch below).
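A minimal sketch of that named-pipe example, assuming a POSIX system; the fifo path and log line are illustrative:

```python
# Named-pipe IPC sketch (POSIX): a writer process logs a line, a reader displays it.
import os

FIFO = "/tmp/log_fifo"                     # illustrative path
if not os.path.exists(FIFO):
    os.mkfifo(FIFO)                        # create the named pipe

if os.fork() == 0:                         # child process: the writer
    with open(FIFO, "w") as pipe:
        pipe.write("service started\n")
    os._exit(0)
else:                                      # parent process: the reader
    with open(FIFO) as pipe:               # open blocks until the writer connects
        print("log entry:", pipe.readline().strip())
    os.wait()
    os.remove(FIFO)
```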
12. What is task migration, and how does it contribute to resource optimization?
• Task migration moves tasks between nodes to balance load or avoid failures.
• It ensures efficient use of CPU and memory across the system.
• Example: Cloud platforms shifting user requests from busy to idle servers.
13. Define and provide examples of transient faults and intermittent faults.
• Transient fault: Temporary error that disappears quickly.
o Example: Network lag due to sudden spike in traffic.
• Intermittent fault: A fault that appears, disappears, and reappears sporadically over time.
o Example: A loose cable causing occasional data errors.
14. What are the implications of using multicast communication over broadcast communication?
1. Multicast sends a message only to a specific group of receivers, while broadcast sends to all
devices on the network.
2. Multicast reduces network traffic and improves efficiency by targeting only interested
receivers.
3. It is more secure and controlled, as messages are not exposed to all nodes like in broadcast.
4. Multicast is scalable, especially useful in large networks like live video conferencing or
software updates.
5. However, multicast requires group management, while broadcast is simpler but less efficient.
6. Example: Sending updates to all subscribed users in a stock trading platform using multicast
instead of broadcasting to every user.
15. What are the components of Cristian’s Algorithm, and how do they interact?
1. Cristian’s Algorithm is used for time synchronization between computers in a distributed
system. It involves two main components: a client and a time server.
2. The client sends a request to the server, asking for the current time. It also records the time
when the request was sent to measure delay later.
3. When the server receives the request, it responds with its current time immediately. This
response is then received by the client after some delay.
4. The client measures the round-trip time and estimates the one-way delay as half of it. It then sets its clock to the server’s reported time plus this estimated delay.
5. This adjusted server time is used by the client to set or update its own clock. This method works
well when network delay is small and nearly equal in both directions.
6. For example, Cristian’s Algorithm could be used in a stock trading system where local trading
terminals sync time with a central financial server (see the sketch below).
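A minimal client-side sketch of this calculation; get_server_time() is a hypothetical stand-in for the actual network request:

```python
# Cristian's algorithm, client side. get_server_time() is a hypothetical stand-in
# for the round trip to the time server.
import time

def get_server_time():
    return time.time()            # pretend this value came back from the server

t0 = time.time()                  # client records the send time
server_time = get_server_time()   # server's clock reading, received after some delay
t1 = time.time()                  # client records the arrival time

rtt = t1 - t0
adjusted = server_time + rtt / 2  # assume the one-way delay is half the round trip
offset = adjusted - t1            # how far the local clock drifts from the server
print(f"RTT = {rtt:.6f}s, estimated offset = {offset:+.6f}s")
```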
16. How do you know that logical clocks accurately maintain event order?
1. Logical clocks do not represent actual physical time but help keep track of the order of events
in distributed systems. They are mainly used where event sequencing is more important than
real time.
2. Each process maintains its own counter, which it increments before every event. When sending
a message, it includes this clock value in the message.
3. Upon receiving a message, the receiver compares the incoming clock with its own and sets its
clock to the maximum of both, plus one. This ensures causality is preserved.
4. For example, if Process A sends a message when its clock reads 3 while Process B’s clock reads only 1, B updates its clock to max(1, 3) + 1 = 4 on receipt, reflecting that the receive happened after the send.
5. This follows Lamport’s "happened-before" relation, which ensures that the system respects the
correct order of dependent events across different nodes.
6. Thus, logical clocks maintain the causal sequence of events even if system clocks are
unsynchronized, which is useful in debugging and distributed databases.
17. Illustrate a method for physical clock synchronization in passive mode
1. In passive mode, the client does not send requests but waits for time messages from the server.
It passively listens to synchronization broadcasts.
2. When it receives a message from the server, it records the time of arrival and compares it with
the server time included in the message.
3. The client estimates the one-way delay using past data or by assuming minimal network delay
and adjusts its clock accordingly.
4. This method avoids adding network load because the client does not initiate communication.
5. Passive synchronization is often used in systems like GPS or radio clocks where devices sync
with a central signal.
6. It is also common in large sensor networks where nodes conserve energy by not sending
messages frequently.
18. Illustrate a method for physical clock synchronization in active mode
1. In active mode, the client takes the initiative to send a request to the server for time
synchronization. This is similar to Cristian’s Algorithm.
2. The client sends a message and notes the time it was sent. When the server replies with its
current time, the client notes the arrival time.
3. The client calculates the round-trip delay and assumes the one-way delay is half of it. It then sets its own clock to the server’s reported time plus this one-way delay.
4. This method is more accurate than passive mode as it measures the current network delay
directly.
5. Active synchronization is useful in systems like financial servers or airline booking platforms
where up-to-date time is critical.
6. However, it adds some network overhead due to the active communication.
19. Why is fault tolerance critical for distributed systems? Explain with examples
1. Fault tolerance is crucial in distributed systems because these systems often span multiple
nodes, which may fail independently due to hardware issues, network failures, or software bugs.
2. Fault tolerance ensures that a system can continue operating despite failures, preventing data
loss and maintaining availability.
3. Techniques like replication, redundancy, and consensus protocols are used to ensure that
failures do not cause system-wide outages.
4. For example, in a cloud-based service like AWS, fault tolerance ensures that user data is
available even if one server or data center goes offline.
5. In distributed databases, fault tolerance is achieved by replicating data across different nodes.
If one node fails, the system can still provide service using the replicated data.
6. Without fault tolerance, distributed systems would be vulnerable to unexpected downtimes,
which could lead to service interruptions, data corruption, or loss.
20. What assumptions are made in the Paxos protocol for achieving consensus?
1. Paxos assumes that nodes can fail and recover, but do not act maliciously. It is designed to
tolerate crash failures, not Byzantine failures.
2. It does not assume perfectly reliable delivery: messages can be delayed, lost, or duplicated, but never corrupted, and a message that is retransmitted enough times is eventually delivered.
3. Paxos also assumes that a majority (quorum) of nodes is always available to proceed with
consensus. This is critical to make progress.
4. The protocol expects that there is no global clock and no assumptions about exact timing; it
only requires that messages will eventually reach their destination.
5. Another assumption is that nodes operate independently and can act concurrently, but each
proposer and acceptor behaves correctly according to protocol rules.
6. For example, Paxos can be used in distributed databases to ensure that multiple replicas agree
on the same value, even if some servers crash during the process (see the sketch below).
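A heavily simplified single-decree sketch of these assumptions in action; in-process method calls stand in for real (possibly lossy) messages, and all names are illustrative:

```python
# Single-decree Paxos sketch: one proposer, five acceptors, crash faults only.
# In-process method calls stand in for network messages.

class Acceptor:
    def __init__(self):
        self.promised = 0          # highest proposal number promised so far
        self.accepted = None       # (number, value) of the last accepted proposal

    def prepare(self, n):          # phase 1b
        if n > self.promised:
            self.promised = n
            return ("promise", self.accepted)
        return ("nack", None)

    def accept(self, n, value):    # phase 2b
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return "accepted"
        return "nack"

def propose(acceptors, n, value):
    majority = len(acceptors) // 2 + 1
    # Phase 1a: collect promises from a quorum.
    promises = [r for r in (a.prepare(n) for a in acceptors) if r[0] == "promise"]
    if len(promises) < majority:
        return None
    # Safety rule: adopt the value of the highest-numbered accepted proposal, if any.
    prior = [p[1] for p in promises if p[1] is not None]
    if prior:
        value = max(prior)[1]
    # Phase 2a: ask the quorum to accept.
    acks = [a.accept(n, value) for a in acceptors]
    return value if acks.count("accepted") >= majority else None

acceptors = [Acceptor() for _ in range(5)]
print(propose(acceptors, n=1, value="X"))   # -> X (chosen by a majority)
```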
21. Describe the role of conflict resolution in maintaining data integrity in replication
1. In distributed systems, replication can lead to conflicts when different nodes update the same
data concurrently. Conflict resolution ensures that these updates don’t corrupt data integrity.
2. Conflict resolution techniques can be manual, automatic, or based on predefined rules like
"last-writer-wins", where the latest timestamped update is kept.
3. More advanced systems use merge functions or application-specific logic to combine
conflicting changes meaningfully instead of discarding any.
4. Conflict detection requires version tracking (like vector clocks), which helps identify
divergent updates from different replicas.
5. For example, in collaborative editing tools like Google Docs, conflict resolution ensures that
multiple users editing the same document don’t overwrite each other's changes.
6. Without effective conflict resolution, replication could lead to inconsistencies, defeating the
purpose of having multiple synchronized copies for fault tolerance (see the sketch below).
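A small sketch of the vector-clock conflict detection mentioned in point 4; replica names and clock values are illustrative:

```python
# Vector-clock comparison: decides whether two replica updates are causally
# ordered or concurrent (a true conflict that needs resolution).

def compare(vc_a, vc_b):
    keys = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(k, 0) <= vc_b.get(k, 0) for k in keys)
    b_le_a = all(vc_b.get(k, 0) <= vc_a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "a happened before b"     # b simply supersedes a
    if b_le_a:
        return "b happened before a"
    return "concurrent"                  # divergent updates: merge or apply a rule

print(compare({"r1": 2, "r2": 1}, {"r1": 2, "r2": 3}))  # a happened before b
print(compare({"r1": 3, "r2": 1}, {"r1": 2, "r2": 3}))  # concurrent -> resolve
```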
10 Marks Questions:
Mod 2,3
1. What is Remote Procedure Call? Explain the working of RPC in detail.
1. Remote Procedure Call (RPC) allows a program to invoke a procedure on a remote machine
as if it were local. It hides the network communication details from the programmer.
2. The main goal of RPC is to simplify distributed computing by making remote interactions look
like local function calls.
3. When a client invokes a remote procedure, the client stub packs the request and sends it over
the network to the server.
4. The request travels through the RPC runtime and is received by the server stub, which
unpacks the message and calls the appropriate procedure.
5. The server executes the procedure and sends the result back through the server stub to the client.
6. The client stub then unpacks the result and returns it to the calling function, completing the
RPC call transparently.
7. RPC supports different communication semantics like at-most-once, at-least-once, and
exactly-once, depending on reliability needs.
8. A key challenge in RPC is handling failures—like lost messages or server crashes—while
maintaining consistency.
9. An example of RPC is when a front-end app requests data from a backend server using a remote
call without knowing the actual server location (see the sketch below).
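A minimal sketch of the stub idea using Python's standard xmlrpc module; the procedure name, host, and port are illustrative choices, not part of the general RPC model:

```python
# Minimal RPC sketch with Python's standard xmlrpc module. The library supplies
# the stubs: it marshals arguments and results so the call looks local.
from xmlrpc.server import SimpleXMLRPCServer

def process_payment(amount):
    return f"charged {amount}"

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(process_payment)     # the server stub dispatches here
# server.serve_forever()                      # uncomment to actually serve

# Client side, in a separate process: the proxy object is the client stub.
# import xmlrpc.client
# proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
# print(proxy.process_payment(100))           # invoked like a local function
```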
4. How does the absence of guaranteed delivery affect message queues in distributed systems?
1. In distributed systems, message queues are used to pass messages between components
asynchronously.
2. Without guaranteed delivery, some messages may be lost due to network failures, timeouts,
or crashes.
3. This loss can cause inconsistencies, especially if critical messages like transaction updates are
not received.
4. Systems need to implement retries, acknowledgments, or logging mechanisms to ensure
reliability.
5. Without these, a sender may assume a message was delivered while the receiver never received
it.
6. Applications must be designed to tolerate message loss or ensure idempotent operations to
handle retries safely.
7. Some message queue systems offer options like "at-least-once" or "exactly-once" delivery,
but these come at a performance cost.
8. For example, in an order-processing system, losing a message might result in a customer not
receiving their order confirmation.
9. To mitigate such issues, developers often use persistent message queues or introduce
transaction-like behavior for critical operations (see the sketch below).
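A sketch of the retry-plus-idempotency pattern from points 4–6; send_once() is an invented stand-in for an unreliable queue:

```python
# At-least-once delivery sketch: retry until acknowledged, and deduplicate on the
# receiver with a message id so retries stay safe (idempotent processing).
import random
import time

processed = set()                       # receiver-side dedup store (assumed durable)

def receiver(msg_id, payload):
    if msg_id not in processed:         # a retry of an already-seen message is ignored
        processed.add(msg_id)
        print("processing:", payload)
    return "ack"

def send_once(msg_id, payload):
    if random.random() < 0.5:           # simulate a lost message or lost ack
        return None
    return receiver(msg_id, payload)

def send_reliably(msg_id, payload, retries=10):
    for _ in range(retries):
        if send_once(msg_id, payload) == "ack":
            return True
        time.sleep(0.01)                # back off, then retry
    return False                        # give up; the caller must handle the failure

send_reliably("order-42", "order confirmation")
```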
12. Analyze the happens-before relation in logical clocks. Does the ordering of events matter?
Explain with Lamport’s logic and a suitable example.
1. The happens-before (→) relation defines a causal ordering of events in distributed systems. It
holds that if two events occur in the same process, the earlier event → the later one, and if event
A sends a message that event B receives, then A → B.
2. This relation is transitive: if A → B and B → C, then A → C. It captures the notion of causality,
ensuring effects never precede their causes across processes.
3. Event ordering matters because operations that depend on each other must be observed in the
correct sequence to maintain consistent system state. Ignoring order breaks causality.
4. Lamport’s logical clock assigns a counter to each event: each process increments its clock
before executing an event and attaches this timestamp when sending a message.
5. On receiving a message, a process updates its clock to max(local, received) + 1, ensuring its
new timestamp reflects the causally preceding event.
6. If two events are concurrent (neither → the other), Lamport clocks may still order them
arbitrarily, but this does not violate causality since there is no causal link.
7. Example: Process P1 has clock=2, sends “m” to P2. P2’s clock was 1, so on receipt it becomes
max(1,2)+1 = 3, preserving send(m) → receive(m).
8. This mechanism allows all processes to agree on the order of causally related events even
without synchronized physical clocks.
9. In summary, the happens-before relation and Lamport’s logical clocks guarantee that causally
linked events are consistently ordered across distributed processes (see the sketch below).
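A minimal sketch reproducing the example in point 7; the class and method names are illustrative:

```python
# Lamport clock sketch matching the example above: P1 (clock 2) sends "m" to
# P2 (clock 1); P2 applies the merge rule max(local, received) + 1.

class LamportClock:
    def __init__(self, start=0):
        self.time = start

    def tick(self):                         # local event rule: increment
        self.time += 1
        return self.time

    def on_receive(self, msg_time):         # merge rule preserves happens-before
        self.time = max(self.time, msg_time) + 1
        return self.time

p1, p2 = LamportClock(start=2), LamportClock(start=1)
stamp = p1.time                             # P1 stamps "m" with its clock, 2
print(p2.on_receive(stamp))                 # -> 3, so send(m) -> receive(m) holds
```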
13. Explain the key components of the Ricart–Agrawala algorithm and their importance.
1. The Ricart–Agrawala algorithm is a non-token-based method for ensuring mutual exclusion
in distributed systems. Its first component is timestamped Request messages, carrying the
sender’s Lamport timestamp and ID.
2. Each process maintains a Lamport clock to assign timestamps, ensuring a total ordering of
all requests across the system.
3. When a process wants the critical section (CS), it sends a Request(timestamp, pid) to all other
processes and waits for their replies.
4. On receiving a Request, a process sends an immediate Reply if it is not interested in the CS or
if the incoming timestamp is smaller (higher priority) than its own.
5. If the receiving process is also requesting the CS with a smaller timestamp, it defers its reply
by enqueuing the incoming request in a deferred queue.
6. A process can only enter the CS after receiving Reply messages from all other processes,
guaranteeing exclusive access.
7. After exiting the CS, the process sends Reply messages to all queued requests, allowing them
to proceed in timestamp order.
8. This design ensures fairness (oldest request first), deadlock freedom, and no starvation, as
every request eventually gets a reply.
9. The combination of logical timestamps, request/reply messaging, and deferred queues makes
Ricart–Agrawala simple, decentralized, and efficient for mutual exclusion (see the sketch below).
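A sketch of the core reply/defer rule, with (timestamp, pid) tuples providing the total order; networking is omitted and all names are illustrative:

```python
# Ricart-Agrawala core: decide, for each incoming Request, whether to reply at
# once or defer it until after our own critical section. (timestamp, pid) tuples
# compare lexicographically, giving the required total order.

class Node:
    def __init__(self, pid):
        self.pid = pid
        self.requesting = False
        self.my_request = None      # (timestamp, pid) of our own pending request
        self.deferred = []          # requests to answer after leaving the CS

    def on_request(self, req):
        if self.requesting and self.my_request < req:
            self.deferred.append(req)     # our request is older: make them wait
            return "deferred"
        return "reply"                    # idle, or their request has priority

    def on_exit_cs(self):
        released, self.deferred = self.deferred, []
        return released                   # send Reply to everyone we deferred

n2 = Node(pid=2)
n2.requesting, n2.my_request = True, (4, 2)
print(n2.on_request((7, 1)))   # deferred: (4,2) is older than (7,1)
print(n2.on_request((3, 1)))   # reply: (3,1) is older than (4,2)
print(n2.on_exit_cs())         # [(7, 1)]
```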
15. Illustrate Raymond’s Tree-Based Algorithm for token-based distributed mutual exclusion.
1. Processes are arranged in a logical tree and exactly one token circulates; only the token holder
may enter the CS.
2. Each process has a parent pointer (toward the token) and a FIFO request queue for tracking
pending requests.
3. To request the CS, a process that does not hold the token sends a Request along its parent pointer.
4. Intermediate nodes forward the request upstream if they are not waiting for the token, or queue
it if they have already requested it.
5. When the token holder exits the CS, it checks its queue and sends the token to the first
requester, updating pointers along the path.
6. Example: In a chain P1→P2→P3 with the token at P1, P3’s request travels P3→P2→P1; P1 then sends the token back along P1→P2→P3.
7. Upon receiving the token, each node passes it down the path to the requester, ensuring exclusive
CS access.
8. Message complexity is O(h) where h = tree height, often O(log N) in balanced trees, reducing
overhead compared to full broadcasts.
9. Raymond’s algorithm achieves efficient, fair mutual exclusion by routing the token along tree
paths only to requesting processes (see the sketch below).
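A simplified sketch of the chain example from point 6, assuming a single outstanding request so the FIFO queues can be omitted; each hop flips a parent pointer toward the new holder:

```python
# Simplified Raymond sketch for the chain P1-P2-P3 with the token at P1.
# The request walks up the parent pointers; the token walks back down, and each
# hop re-points the giver toward the new token holder. Queues are omitted.

parent = {"P1": None, "P2": "P1", "P3": "P2"}   # None marks the token holder

def acquire(requester):
    path = [requester]
    while parent[path[-1]] is not None:         # forward the request upstream
        path.append(parent[path[-1]])
    down = list(reversed(path))                 # token path: holder -> requester
    for giver, taker in zip(down, down[1:]):
        parent[giver] = taker                   # giver now points toward the token
    parent[requester] = None                    # requester holds the token
    return path

print(acquire("P3"))    # request path ['P3', 'P2', 'P1']; token returns P1->P2->P3
print(parent)           # {'P1': 'P2', 'P2': 'P3', 'P3': None}
```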
16. Design a solution using Maekawa’s Algorithm for managing resource access in a distributed
file system.
1. Maekawa’s algorithm uses quorum (vote) sets: each server has a unique set of peers, with
every two sets overlapping in at least one member.
2. To access a file, a server sends Request messages to all members of its quorum, asking for
permission.
3. A quorum member sends Grant if it has not already granted permission to another request; otherwise it queues the incoming request until a Release arrives.
4. When the requesting server receives Grants from its entire quorum, it enters the critical section
and accesses the file.
5. After finishing, it broadcasts Release to its quorum, allowing members to grant pending
requests in their queues.
6. Example design: For 16 servers, arrange them in a 4×4 grid. Each server’s quorum is its row
plus column (size 7), ensuring overlap.
7. Overlapping quorums guarantee that two concurrent requests share at least one common
member, preventing simultaneous access.
8. This reduces message complexity to O(√N) per access (versus O(N)) and avoids a single point
of failure.
9. Maekawa’s algorithm thus provides efficient, fault-tolerant, and decentralized control for
distributed file access (see the sketch below).
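A sketch of the 4×4 grid construction from point 6, verifying that every pair of quorums overlaps:

```python
# Maekawa quorum construction: 16 servers on a 4x4 grid, each quorum = the
# server's row plus its column (size 7). Any two such sets must intersect.
import itertools

K = 4                                        # grid side; N = K*K = 16 servers

def quorum(server):
    row, col = divmod(server, K)
    return {row * K + j for j in range(K)} | {i * K + col for i in range(K)}

quorums = [quorum(s) for s in range(K * K)]
assert all(a & b for a, b in itertools.combinations(quorums, 2))  # pairwise overlap
print(sorted(quorum(5)), "size", len(quorum(5)))   # server 5's quorum, size 7
```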
17. Illustrate Suzuki–Kasami Broadcast Algorithm for token-based distributed mutual exclusion.
1. Suzuki–Kasami employs a single token that grants exclusive CS access; requests are broadcast
to all processes.
2. Each process maintains an array RN of highest request numbers seen; the token holder
maintains LN of last served numbers.
3. To request the CS, a process increments its RN entry and broadcasts its RN to all peers.
4. Upon receiving the broadcast, peers update their RN for that sender; explicit replies are not
required.
5. When the token holder exits the CS, it checks for processes with RN[i] > LN[i], enqueues them
in increasing order.
6. It then sends the token to the first process in its queue and updates LN[i] for that process.
7. Example: P2 broadcasts RN[2]=3; token holder P1 sees RN[2]>LN[2], enqueues P2, and sends
the token to P2.
8. This ensures fairness (served in sequence number order) and deadlock freedom, as every
request is eventually served.
9. Suzuki–Kasami is simple and effective, trading off broadcast cost for a clear, sequence-based
token passing mechanism (see the sketch below).
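A bookkeeping sketch of the RN/LN arrays in the spirit of the example in point 7; broadcasts and the token transfer are simulated by direct updates:

```python
# Suzuki-Kasami bookkeeping: RN tracks the highest request number seen for each
# process; LN, carried in the token, records the last request served for each.
N = 3
RN = [0] * N                            # as seen by the current token holder
token = {"LN": [0] * N, "queue": []}

def broadcast_request(i):
    RN[i] += 1                          # P_i numbers its request and broadcasts it

def on_exit_cs(holder):
    token["LN"][holder] = RN[holder]    # mark our own request as served
    for i in range(N):                  # outstanding request: one past last served
        if RN[i] == token["LN"][i] + 1 and i not in token["queue"]:
            token["queue"].append(i)
    return token["queue"].pop(0) if token["queue"] else None  # next token holder

broadcast_request(2)                    # P2 broadcasts a new request number
print(on_exit_cs(0))                    # P0 leaves the CS and passes the token to 2
```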
18. Illustrate the need for a Coordinator in Distributed Systems and demonstrate the Bully
Election Algorithm.
1. A coordinator (leader) centralizes tasks like resource allocation, clock sync, and failure
detection, simplifying coordination among processes.
2. If the coordinator fails, no process can perform these centralized tasks, so a re-election
mechanism is required to maintain system operation.
3. The Bully Algorithm elects the highest-ID alive process as coordinator: any lower-ID process
detecting failure initiates an election.
4. It sends an Election message to all higher-ID processes. If none respond, it declares itself
coordinator and broadcasts a Coordinator message.
5. If a higher-ID process responds, it takes over the election, repeating the process with even
higher IDs until the highest-ID alive process wins.
6. Example: Processes {P1=ID1, P2=ID2, P3=ID3}. If P3 fails, P1 and P2 start elections; P2 hears
no response from P3, becomes coordinator, and informs P1.
7. This ensures that the process with the maximum ID among alive processes becomes the new
leader, restoring central control.
8. The Bully Algorithm handles coordinator failures quickly but can incur O(N²) messages in
worst cases; it’s simple and effective for moderate-sized systems.
9. A reliable coordinator and election algorithm are vital for maintaining consistency and
availability in distributed environments (see the sketch below).
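A compact sketch of the election rule from the example; message passing is collapsed into a recursive call, and the alive set is illustrative:

```python
# Bully election sketch: the initiator yields to any alive higher-id process,
# which restarts the election; the highest alive id ends up coordinator.

def bully_election(initiator, alive):
    higher = [p for p in alive if p > initiator]
    if not higher:
        return initiator                        # nobody higher answered: I win
    return bully_election(min(higher), alive)   # a higher process takes over

alive = {1, 2}                                  # P3 has crashed
print(bully_election(1, alive))                 # -> 2 becomes coordinator
```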
19. Discuss the key components of the Ricart–Agrawala algorithm and explain their significance
in achieving mutual exclusion.
1. Logical Clocks: Each process uses Lamport clocks to timestamp requests, providing a global
ordering of critical section requests.
2. Request Messages: A process wishing to enter the CS sends Request(timestamp, pid) to all
peers, initiating permission-seeking.
3. Immediate Replies: Peers reply immediately if they are not interested in the CS or if the
incoming request’s timestamp is smaller, granting permission.
4. Deferred Replies: If a peer has its own pending request with a smaller timestamp, it queues
the incoming request instead of replying.
5. Deferred Queue: Each process maintains a queue of deferred requests; these are serviced
(replied to) once the process exits the CS.
6. Reply Counting: A requester must receive replies from all peers before entering the CS,
ensuring exclusive access.
7. Exit Protocol: On exiting, the process sends replies to all queued requests, unlocking those
processes to enter the CS.
8. These components ensure mutual exclusion (only one process in CS), fairness (oldest request
first), and deadlock freedom (no cyclic waits).
9. Ricart–Agrawala’s decentralized design avoids single points of failure and scales well for
moderate numbers of processes.
20. Develop a resource access management solution for a distributed file system using Maekawa’s
Algorithm, and explain its working and advantages.
1. In a distributed file system, assign each file server a voting set (quorum) of peers, with
overlapping sets guaranteeing mutual exclusion.
2. To access a file, a server sends Request messages to all members of its quorum, seeking
permission to enter the CS.
3. Quorum members reply with Grant if they are free; otherwise they queue the request for later.
4. Once the server collects Grants from its entire quorum, it accesses the file exclusively in its
critical section.
5. After use, the server sends Release to its quorum, allowing them to grant pending requests from
their queues.
6. Design example: For 25 servers, arrange them in a 5×5 grid; each server’s quorum is its row
plus column (size 9), ensuring overlap.
7. Advantages: Message complexity is reduced to O(√N) per access, significantly less than
contacting all N servers.
8. Overlapping quorums ensure safety (no two servers access simultaneously), and
decentralization avoids single points of failure.
9. Maekawa’s algorithm thus offers an efficient, scalable, and fault-tolerant solution for file access
management in distributed systems.
21. What are the broader implications of message complexity in Raymond’s Tree-Based
Algorithm?
1. In Raymond’s algorithm, message complexity for one CS entry is O(h), where h is the height
(number of hops) between the requester and token-holder.
2. For a balanced tree of N processes, h ≈ O(log N), giving logarithmic message cost and low
average latency.
3. However, in unbalanced trees (e.g., chains), h can be O(N), leading to high latency and heavy
network traffic for each request.
4. High message complexity increases waiting time for the critical section and can become a
bottleneck under heavy contention.
5. On the plus side, Raymond’s algorithm avoids system-wide broadcasts, limiting traffic to the
tree path and often saving messages relative to all-to-all schemes.
6. Designers must balance the tree to keep h small; poor tree structure can cause hotspots near
the token-holder and degrade performance.
7. Understanding message complexity helps in capacity planning—it guides how to structure the
process tree and place the token-holder.
8. In large-scale systems, optimizing for low h ensures scalability, while ignoring complexity can
lead to network congestion and uneven load.
9. Thus, message complexity in Raymond’s algorithm highlights key trade-offs between
efficiency, latency, and scalability in distributed mutual exclusion.
Module 4,5
3. Compare the load balancing approach with the load-sharing approach in distributed
computing.
1. Load Balancing:
Load balancing actively monitors systems and moves tasks between them to keep the workload even.
It reduces delays and ensures no server is overloaded.
Example: A website that sends visitors to less busy servers.
2. Load Sharing:
Load sharing only assigns new tasks to the least loaded server. It doesn’t move running tasks even if a
server becomes busy.
Example: A printer network where a new print job goes to an idle printer.
3. Decision Timing:
Load balancing is often dynamic and done in real-time, while load sharing can be static and simpler.
Balancing adapts to changes better.
Example: In a cloud system, dynamic balancing keeps adjusting as users join or leave.
4. Efficiency:
Load balancing usually gives better performance and responsiveness, especially in large or changing
systems. Load sharing works well in simpler or smaller setups.
Example: A social media app uses load balancing to handle millions of users smoothly.
5. Summary:
Both improve performance, but load balancing is smarter and more proactive. Load sharing is easier
to implement but less flexible.
7. Analyze State Information Exchange, Priority Assistance, and Migration Limitation Policies
in Load Balancing
1. State Information Exchange:
This defines how machines share their load information. It can be done regularly (periodically), when
needed (on demand), or never (static systems).
Example: Servers checking in with each other every 10 seconds to report CPU use.
2. Priority Assistance:
In this, heavily loaded machines request help from lightly loaded ones. It ensures urgent help reaches
overloaded nodes.
Example: A video processing node asks other idle nodes for help rendering scenes.
3. Migration Limitation Policies:
These policies decide how often tasks should be moved. Too much migration can waste time and
overload the network.
Example: A policy that says, “don’t migrate unless CPU usage is above 85%.”
4. Trade-off Management:
More updates mean better load awareness, but also more communication cost. The system must
balance between performance and overhead.
Example: In high-speed trading, the system must act fast without overloading itself.
5. Coordination:
All three policies must work together—good state info helps with fair assistance, and migration limits
protect the system from constant shuffling.
8. Analyze the Trade-offs Between Static and Dynamic Task Assignment Approaches
1. Static Assignment:
In static systems, tasks are assigned in advance based on fixed assumptions about load and resource
availability. It's simple but can’t adapt to real-time changes.
Example: A company assigns tasks based on past usage, assuming it won’t change.
2. Dynamic Assignment:
Dynamic systems assign tasks based on current system conditions. They are more flexible but more
complex.
Example: A cloud service routes requests to the least busy server at that moment.
3. Performance:
Static methods are faster in small or predictable environments. Dynamic ones work better in changing
or large systems but may have some delay due to decision-making.
Example: A web server farm dynamically balances traffic during a big event.
4. Overhead:
Static systems have low communication cost. Dynamic systems use more resources to monitor and
update load states.
Example: A sensor network avoids dynamic assignment to save energy.
5. Error Handling:
Dynamic systems can recover from failures by reassigning tasks. Static systems struggle if a pre-
assigned server crashes.
9. How Can Task Prioritization and Fairness Be Ensured in Global Scheduling Algorithms?
1. Task Priority Levels:
Tasks can be given high, medium, or low priority. The scheduler ensures urgent tasks are served first
without ignoring others.
Example: Emergency alerts get faster service than background file syncing.
2. Aging Technique:
To prevent low-priority tasks from waiting forever, their priority increases over time. This ensures
fairness.
Example: A print job waiting for 10 minutes moves up in priority (see the sketch after this list).
3. Fair Queueing:
The system assigns equal time slices to all users or groups. It balances user needs across the system.
Example: In a classroom, all students get equal computer time during an online test.
4. Feedback Control:
The scheduler uses past performance to adjust priorities. If some tasks always finish late, their future
priority may increase.
Example: A slow-loading webpage is prioritized next time it’s accessed.
5. Policy + Mechanism:
Fairness is a policy goal; the mechanism (like queues, aging, etc.) makes it work. Good design
balances performance with equality.
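A tiny sketch of the aging idea from point 2; the aging rate and task values are invented:

```python
# Aging sketch: a task's effective priority improves the longer it waits, so
# low-priority work is never starved. Rates and timestamps are illustrative.
AGING_RATE = 0.1                        # priority boost per time unit waited

def effective_priority(base, enqueued_at, now):
    return base - AGING_RATE * (now - enqueued_at)   # lower value = served first

tasks = [("backup", 5, 0), ("alert", 1, 90), ("sync", 4, 20)]  # (name, base, t_enqueued)
now = 100
order = sorted(tasks, key=lambda t: effective_priority(t[1], t[2], now))
print([name for name, _, _ in order])   # backup waited longest and runs first
```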
16. Analyze cache validation schemes and explain their different types.
1. Write-Invalidate:
When a client updates a file, other clients' cached copies are marked invalid. They must re-fetch the
file.
Example: After one user saves a shared file, other users’ cached copies are flagged stale and re-fetched on next access.
2. Write-Update:
When one client updates, the new data is sent to all other clients. This keeps all copies up-to-date.
Example: Collaborative live editing in tools like Google Docs.
3. Periodic Validation:
Clients regularly check with the server to see if the file has changed.
Example: Your browser checks if a website has new content every few minutes.
4. On-Open Validation:
Validation happens when a file is opened. If the server version is newer, the client fetches the update.
Example: Dropbox checks for newer versions of a file when you open it.
5. Trade-offs:
Write-update uses more bandwidth, but write-invalidate is simpler. Each is chosen based on system
goals.
18. Explain the quorum-based protocol for updating multiple copies of files.
1. What It Solves:
In distributed systems, files often have multiple copies. Quorum ensures safe updates by requiring a
majority agreement.
2. Read and Write Quorums:
Before a read or write, the system contacts a minimum number of replicas. Write quorum (W) + read
quorum (R) must be > total replicas (N).
Example: For 5 replicas, if W=3 and R=3, any read quorum always includes at least one replica with the latest update (see the sketch after this list).
3. Conflict Avoidance:
By contacting multiple nodes, quorum avoids conflicts from out-of-sync replicas.
Example: Email clients syncing across phone, tablet, and laptop.
4. Strong Consistency:
It ensures at least one node with the latest write is always read from, keeping data correct.
5. Trade-off:
More nodes = more safety but more delay. System designers pick values based on speed vs.
consistency.
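A short sketch verifying the W + R > N overlap guarantee for the N=5, W=3, R=3 example:

```python
# Read/write quorum check for N=5, W=3, R=3: since W + R > N, every possible
# read quorum shares at least one replica with every possible write quorum.
import itertools

N, W, R = 5, 3, 3
replicas = range(N)
overlaps = all(
    set(w) & set(r)
    for w in itertools.combinations(replicas, W)
    for r in itertools.combinations(replicas, R)
)
print("every read sees the latest write:", overlaps)   # True
```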
22. Compare the fault tolerance mechanisms of eager and lazy replication.
1. Eager Replication:
All replicas are updated at the same time. It gives strong consistency but needs lots of coordination.
Example: A bank updates all copies of a customer record instantly.
2. Lazy Replication:
Only one replica is updated first, and others catch up later. It’s faster but may be inconsistent.
Example: A photo upload shows up immediately on your phone, later on your tablet.
3. Fault Tolerance:
Eager systems handle failures better since all nodes are up-to-date. Lazy systems risk data loss if the
main node crashes before syncing.
4. Performance:
Lazy replication has better performance under high load, as fewer updates happen in real-time.
5. Trade-offs:
Eager = safe but slow. Lazy = fast but may show old data. Choice depends on how critical consistency
is.