
DC ESE Notes

Mod 2-5
2 Marks Questions:
Mod 2,3
1. Encapsulation in layered protocols
• Each layer adds its own header (and possibly trailer) to data before passing it down, hiding
lower‐level details.
• This “wrapper” lets higher layers treat communication uniformly without knowing how bits
travel.
• Example: Like sending a letter: you put your message in an envelope (application layer), then
the post office adds tracking info (network layer).
2. What is MPI, and why is it used in distributed systems?
• MPI (Message Passing Interface) is a standard library for processes to exchange messages in
parallel programs.
• It’s used to coordinate work and share data efficiently across multiple machines or CPU cores.
• Example: Weather simulation on a supercomputer where different nodes calculate different
regions and share boundary data via MPI.
3. State examples for the following message communication models.
1. Transient Synchronous
o Sender blocks until receiver gets the message.
o Example: Making a direct phone call—you speak only when the listener is on the line.
2. Persistent Asynchronous
o Sender writes message to mailbox and continues immediately; receiver reads later.
o Example: Sending an email—you don’t wait for the recipient to read it.
4. Key characteristics of multicast communication
• One‐to‐many delivery: a sender transmits a single message to a group of receivers.
• Efficient bandwidth use: network duplicates data only where branches diverge.
• Example: Live video streaming to multiple subscribers in a video conference.
5. What do we already know about network transparency in RPC?
• RPC makes remote calls look like local function calls, hiding network details.
• Developers don’t manage sockets or serialization manually.
• Example: Calling getUserData() on a server as if it were a local library function.
6. Differences between synchronous and asynchronous RPC
• Synchronous RPC: Caller waits (blocks) until the remote function returns a result.
• Asynchronous RPC: Caller sends request and continues; result is delivered later via callback
or polling.
• Example:
o Sync: Waiting on a web API call before proceeding.
o Async: Sending a notification request and handling response in an event handler.
7. Assumptions made about the reliability of group communication
• Messages may be lost, duplicated, or reordered; protocols must handle these failures.
• Group membership can change (joins/leaves) and must be tracked consistently.
• Example: Chat room apps assume users can disconnect unexpectedly and handle message
recovery.
8. Use of remote procedure call in distributed systems
• Provides a simple interface for invoking operations on remote machines.
• Abstracts marshalling (packing) and unmarshalling of parameters and results.
• Example: A web service exposes processPayment() via RPC so clients call it like a local
function.
9. Elaborate Remote Method Invocation (RMI)
• RMI extends RPC to object‐oriented systems, allowing methods on remote objects to be
invoked.
• It handles object serialization, method dispatch, and remote garbage collection.
• Example: Java RMI lets you call remoteAccount.deposit(100) on a server‐side Account object.
10. Illustrate the concept of Message Passing Interface
• MPI defines functions like MPI_Send and MPI_Recv for explicit data exchange between
processes.
• Processes are identified by ranks; messages include source, destination, and tags.
• Example: In a parallel matrix multiplication, each process sends submatrix results to neighbors
using MPI_Send.
11. Elaborate the concept of Interprocess Communication (IPC)
• IPC enables data exchange and synchronization between processes on the same or different
hosts.
• Common mechanisms include pipes, shared memory, message queues, and sockets.
• Example: Two local processes coordinate via a named pipe where one writes log entries and
the other reads them for display.

12. How does the Bully Algorithm ensure leader election?


• When a process notices the coordinator has failed, it starts an election by sending messages to
higher-ID processes.
• If no higher-ID responds, it becomes the leader and announces to all.
• Example: In a group chat, if the admin (ID 5) goes offline, member with ID 4 checks if anyone
higher is active; if not, becomes the new admin.

13. Define the concept of quorum in mutual exclusion algorithms.


• A quorum is a subset of nodes where any two quorums always share at least one common node.
• To enter critical section, permission must be received from all quorum members.
• Example: In a 5-member committee, 3 members approving a decision ensures overlap with
any other group of 3.
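The overlap property can be checked exhaustively for small systems. The Python sketch below (an illustration added here, not part of the notes) verifies that any two 3-member quorums of a 5-member committee must intersect:

```python
from itertools import combinations

def quorums_overlap(n, k):
    """Return True if every pair of k-member quorums from n nodes shares a node."""
    quorums = list(combinations(range(n), k))
    return all(set(a) & set(b) for a, b in combinations(quorums, 2))

# Any two 3-of-5 quorums overlap (3 + 3 > 5), but 2-of-5 quorums need not.
```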

14. Logical clock conditions and implementation rule


• Condition: If event A happens before B, then time(A) < time(B).
• Implementation rule: Increment clock on each event, and on receiving a message, set local
clock to max(local, received)+1.

15. Purpose of an election algorithm


• To ensure only one coordinator or leader is active among distributed nodes.
• Helps in assigning responsibilities like synchronization or resource management.
• Example: Electing a team lead in a project after the current lead leaves.

16. Evaluate logical clock modified received time


• If send = 6, receive = 10: Modified receive = max(6,10)+1 = 11
• If send = 48, receive = 40: Modified receive = max(48,40)+1 = 49
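The receive rule reduces to one line of Python (a minimal sketch of the Lamport rule, added for illustration):

```python
def adjusted_receive_time(send_ts, local_ts):
    """Lamport receive rule: take the max of the message timestamp and the
    local clock, then add one."""
    return max(send_ts, local_ts) + 1
```

With the values above, `adjusted_receive_time(6, 10)` gives 11 and `adjusted_receive_time(48, 40)` gives 49.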

17. Ring Election Algorithm with simultaneous detection


• Both processes send election messages in the ring; only one survives based on highest ID.
• Duplicate elections merge, and the process with the highest ID wins and declares itself.
• Example: Two nodes start elections, but only the message from node with higher ID completes
the ring.

18. Leader election under network failures


• Systems use timeouts and retries to detect failed leaders and start elections.
• Some algorithms tolerate partial failures using majority or quorum agreement.
• Example: In cloud systems, if the main server crashes, a backup automatically takes over.

19. Properties of quorum set in Maekawa’s Algorithm


• Each process has a unique quorum; any two quorums overlap at least at one process.
• Each quorum is much smaller than the total number of processes (size approximately √N in Maekawa's construction).
• Ensures mutual exclusion with fewer messages than full permission sets.

20. Compare performance parameters of non-token-based algorithms


• High message complexity (e.g., Ricart-Agrawala needs 2(N–1) messages).
• More overhead in handling message queues and maintaining request timestamps.
• No token loss, but slower response under heavy load.

21. Difference between clock drift and clock skew


• Clock drift: Clock runs faster/slower than actual time due to hardware differences.
• Clock skew: Difference in time between two clocks at a given moment.
• Example: One watch gains 2 seconds/day (drift); comparing two clocks showing 2:00 and 2:03
is skew.

22. Advantages of Token-Based Algorithm


• Ensures mutual exclusion with low message overhead — often just passing the token to the next requester.
• No starvation: every process eventually gets the token.
• Example: Like a talking stick – only the person holding it can speak, preventing interruptions.
Mod 4,5
1. What are the key advantages of using threads over multiple processes?
• Threads share memory space, so communication is faster and needs fewer resources.
• Creating threads is lighter and quicker than creating new processes.
• Example: A browser using threads to load different tabs efficiently without opening new
processes.

2. What are the benefits of process migration in a distributed system?


• Balances load by moving tasks from busy to idle machines.
• Helps in fault tolerance by shifting processes from failing machines.
• Example: A game server shifting players from an overloaded server to a less crowded one.

3. Explain process thrashing with an example.


• Thrashing occurs when the system spends more time swapping processes (or their pages) in and out of memory than doing useful work.
• It happens when memory is overcommitted, forcing constant swapping.
• Example: Too many apps open on a low-RAM computer cause frequent freezing and lag.

4. Define the concept of the critical section in process synchronization.


• A critical section is a part of code where shared resources are accessed.
• Only one process should execute it at a time to prevent data corruption.
• Example: Two ATMs accessing the same bank account balance.

5. Briefly describe the sender-initiated load-balancing policy.


• The overloaded sender looks for underloaded receivers to send tasks.
• Works well when the system is lightly loaded.
• Example: A busy server distributes tasks to idle servers.

6. Briefly describe the receiver-initiated load-balancing policy.


• Idle nodes (receivers) actively request tasks from busy ones.
• Effective in high-load situations where idle nodes are rare.
• Example: An idle cloud instance asking for tasks from busy peers.

7. Define the threshold in a load-balancing algorithm.


• A threshold is a set limit to decide whether a node is overloaded or underloaded.
• Load balancing decisions are made based on this value.
• Example: If a server load exceeds 80%, it triggers task transfer.
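A threshold policy reduces to a simple comparison; the sketch below (function and parameter names are made up for illustration) classifies a node and decides the transfer direction.

```python
def transfer_role(load, threshold=0.80):
    """Above the threshold the node tries to offload work (sender-initiated);
    below it, the node is a candidate to receive transferred tasks."""
    return "send" if load > threshold else "receive"

# A node at 90% load offloads; one at 50% load accepts transfers.
```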

8. What is global scheduling, and why is it used in distributed systems?


• Global scheduling assigns tasks to nodes across the whole system.
• It improves resource use and performance by avoiding bottlenecks.
• Example: Distributing search queries across all Google data centers.

9. List the desirable features of a global scheduling algorithm.


• Load balancing, minimum communication overhead, and scalability.
• Fault tolerance and fairness among all nodes.

10. What is task migration, and how does it contribute to resource optimization?
• Task migration moves tasks between nodes to balance load or avoid failures.
• It ensures efficient use of CPU and memory across the system.
• Example: Cloud platforms shifting user requests from busy to idle servers.

11. What is migration transparency in process migration, and why is it necessary?


• It hides migration details from users and applications.
• Ensures that the process continues working smoothly after moving.
• Example: A video call continues seamlessly even if the server handling it changes.

12. Define and provide examples of transient faults and intermittent faults.
• Transient fault: Temporary error that disappears quickly.
o Example: Network lag due to sudden spike in traffic.
• Intermittent fault: Occurs randomly over time.
o Example: A loose cable causing occasional data errors.

13. Transient fault vs. permanent fault (with example)


• Transient fault: Temporary and disappears without repair.
o Example: A dropped Wi-Fi connection that reconnects in seconds.
• Permanent fault: Requires hardware/software replacement.
o Example: A failed hard drive that won't work again.

14. Intermittent fault vs. permanent fault (with example)


• Intermittent fault: Occurs occasionally, often hard to detect.
o Example: CPU overheating randomly causing restarts.
• Permanent fault: Constant and requires fixing or replacement.
o Example: Burned-out power supply that never works again.

15. Key advantages of message logging in a distributed system


• Helps recover lost data after crashes by replaying logged messages.
• Enables rollback to consistent states.
• Example: Email servers log all messages, so they can restore emails after a crash.

16. Benefits of replication in a distributed system


• Improves data availability and fault tolerance.
• Increases system performance by reducing access delays.
• Example: Google stores the same data in multiple locations.

17. Briefly explain the continuous consistency model


• Ensures that replicas are “almost” consistent within a certain range or time.
• Allows small delays but keeps data mostly up to date.
• Example: Social media likes taking a few seconds to sync across devices.

18. Briefly explain the sequential consistency model


• All operations appear in the same order across all processes.
• Order respects the program sequence, not real-time.
• Example: If A posts before B, everyone sees A’s post first.

19. What is strict consistency in distributed systems?


• The most rigid model: any read returns the latest write instantly.
• Very hard to implement in real distributed systems.
• Example: Reading a file must always return the most recently updated version.

20. How does eventual consistency differ from sequential consistency?


• Eventual: Data becomes consistent over time, no order guarantee.
• Sequential: All nodes agree on the order of operations.
• Example: Social media profile updates may take time (eventual), but all see updates in the
same order (sequential).

21. How can checkpointing improve fault tolerance in distributed systems?


• Saves system state regularly so recovery can resume from last good point.
• Reduces data loss after failure.
• Example: Autosave in games or Word documents.
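Checkpointing can be sketched with Python's pickle module (a toy illustration; real distributed systems use coordinated, often incremental checkpoints):

```python
import os
import pickle
import tempfile

def checkpoint(state, path):
    """Persist the current state so recovery can resume from this point."""
    with open(path, "wb") as f:
        pickle.dump(state, f)

def recover(path):
    """Reload the last saved state after a failure."""
    with open(path, "rb") as f:
        return pickle.load(f)

path = os.path.join(tempfile.gettempdir(), "demo_checkpoint.bin")
checkpoint({"step": 42, "partial_sum": 1.5}, path)   # periodic save
restored = recover(path)                             # after a simulated crash
```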

22. Why is redundancy essential in fault-tolerant distributed systems?


• Provides backup components that take over if others fail.
• Increases reliability and uptime.
• Example: Airplanes use redundant engines and systems for safety.
5 Marks Questions:
MOD 2,3
1. How does message-oriented communication impact the scalability of distributed systems?
1. Message-oriented communication allows systems to send messages without waiting for an
immediate response, making components loosely coupled.
2. It enables asynchronous communication, meaning systems can continue working while waiting
for messages, improving efficiency.
3. This model supports horizontal scaling, as more servers or services can be added without
redesigning the communication flow.
4. Message queues or brokers (e.g., Kafka, RabbitMQ) can handle large volumes of data and help
distribute load across systems.
5. The system becomes more resilient to failures, as messages can be stored and retried without
affecting overall performance.
6. Example: A video streaming service using message queues to process uploads across multiple
servers for faster results.
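The decoupling a broker provides can be imitated in-process with Python's queue module (a toy sketch; a real deployment would use a broker like Kafka or RabbitMQ as noted above):

```python
import queue
import threading

tasks = queue.Queue()       # stands in for the message broker
results = []

def worker():
    while True:
        job = tasks.get()
        if job is None:     # sentinel: no more messages
            break
        results.append(job * 2)   # "process" the message

consumer = threading.Thread(target=worker)
consumer.start()
for job in (1, 2, 3):
    tasks.put(job)          # producer continues immediately (asynchronous)
tasks.put(None)
consumer.join()
```

The producer never waits for processing; adding more consumer threads scales throughput without changing the producer — the horizontal-scaling point above.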

2. What are the implications of using multicast communication over broadcast communication?
1. Multicast sends a message only to a specific group of receivers, while broadcast sends to all
devices on the network.
2. Multicast reduces network traffic and improves efficiency by targeting only interested
receivers.
3. It is more secure and controlled, as messages are not exposed to all nodes like in broadcast.
4. Multicast is scalable, especially useful in large networks like live video conferencing or
software updates.
5. However, multicast requires group management, while broadcast is simpler but less efficient.
6. Example: Sending updates to all subscribed users in a stock trading platform using multicast
instead of broadcasting to every user.

3. Can you rephrase the concept of "stream-oriented communication" in simpler terms?


1. Stream-oriented communication means data is sent in a continuous flow, like water through a
pipe, rather than in separate chunks.
2. It is used when timing and order of data matter, such as live audio or video transmission.
3. This method is connection-based, often relying on TCP, which maintains the order and
reliability of the stream.
4. Data is delivered byte-by-byte, and the receiver reads it in the same sequence it was sent.
5. It’s ideal for real-time communication where small delays or missing data can affect the
experience.
6. Example: Watching a live YouTube stream, where video and audio are sent continuously and
must be received in the correct order.

4. Why is synchronization important in interprocess communication?


1. Synchronization ensures that two or more processes communicate in an orderly and consistent
way.
2. It prevents race conditions, where multiple processes access shared data at the same time and
cause conflicts.
3. Proper synchronization ensures data consistency and correctness during concurrent access.
4. Without synchronization, one process might read or write incomplete or outdated data.
5. It is especially important in shared memory communication where timing of access matters.
6. Example: Two banking systems updating your account balance at the same time need
synchronization to avoid incorrect totals.

5. How does message-oriented communication relate to asynchronous communication?


1. Message-oriented communication naturally supports asynchronous behavior, allowing systems
to send and receive messages independently.
2. In asynchronous communication, the sender doesn’t wait for the receiver to respond and can
continue processing other tasks.
3. Messages are often stored in queues and processed later when the receiver is ready, improving
system responsiveness.
4. This model supports decoupling between components, allowing them to work at their own pace.
5. It’s suitable for high-load environments like web servers or distributed services.
6. Example: A food delivery app sends a message to the kitchen system, which processes it when
ready — sender and receiver work separately.

6. Compare and contrast synchronous and asynchronous communication in distributed systems.


1. In synchronous communication, the sender waits for the receiver to respond before continuing.
In asynchronous, the sender doesn’t wait.
2. Synchronous is simple and ensures immediate interaction but can slow down the system if the
receiver is busy.
3. Asynchronous is faster and allows multitasking but introduces complexity in message handling
and response tracking.
4. Synchronous is used when real-time interaction is needed (e.g., login authentication).
Asynchronous is used in background tasks (e.g., email sending).
5. Synchronous uses direct communication like RPC, while asynchronous uses message queues
or mailboxes.
6. Example: Booking a ticket (synchronous) vs. submitting a contact form and getting an email
later (asynchronous).

7. Analyze different features of a good message-passing system.


1. A good message-passing system should provide reliable delivery so that no messages are lost
in transit.
2. It should support both synchronous and asynchronous modes to handle various communication
needs.
3. Scalability is important — the system should handle increasing numbers of users and messages
without slowing down.
4. Security features like encryption and authentication ensure data integrity and privacy.
5. It should handle message persistence and fault tolerance so that messages are not lost during
failures.
6. Example: WhatsApp uses a message-passing system that stores undelivered messages and
encrypts them for security.

8. Elaborate the types of communication.


1. Unicast sends a message from one sender to one receiver. It’s used for direct and private
communication.
2. Multicast sends a message from one sender to a group of receivers, used for group
collaboration or updates.
3. Broadcast sends a message to all nodes in the network. It's simple but not efficient for large
systems.
4. Anycast sends a message to the nearest or best receiver out of a group, often used in network
routing.
5. Peer-to-peer allows nodes to both send and receive data directly, without centralized control.
6. Example: A live stream sent via multicast, a personal message via unicast, and a system update
broadcasted to all devices.

9. How does the Remote Procedure Call (RPC) work?


1. The client calls a procedure just like a normal function, but it is actually located on a remote
server.
2. The client-side stub converts the function call into a message and sends it over the network to
the server.
3. The server-side stub receives the message, converts it back into a function call, and executes
the procedure.
4. The result is then sent back through the same path, from the server to the client.
5. The client receives the result and continues execution, unaware that the function was remote.
6. Example: A weather app on your phone makes an RPC to a server to fetch current weather data.
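The stub mechanics can be demonstrated with Python's standard xmlrpc module (a minimal local sketch; the address, port choice, and `add` function are made up for the example):

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Server side: register an ordinary function as a remotely callable procedure.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the proxy (client stub) marshals arguments, sends the request,
# and unmarshals the reply -- the call looks local.
client = ServerProxy(f"http://127.0.0.1:{port}")
result = client.add(2, 3)
server.shutdown()
```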

10. What is the definition of Remote Procedure Call (RPC)?


1. RPC is a communication method that lets a program call a function on another computer as if
it's on the same system. It hides the network communication from the user.
2. It works in a client-server model where the client sends a request and the server performs the
task and sends back the result.
3. RPC uses "stubs" – the client stub packs the function data, and the server stub unpacks and runs
it. This simplifies the process.
4. It can be synchronous (waits for reply) or asynchronous (doesn’t wait). This flexibility helps in
different types of applications.
5. RPC supports location transparency, so the client doesn't need to know where the function is
actually running.
6. Example: A payment app sending a request to a bank’s server to check your balance uses RPC
behind the scenes.

11. Why is message persistence critical in distributed systems?


1. Message persistence ensures that messages aren’t lost if a system crashes or fails. This makes
the system more reliable.
2. If a receiver is offline, messages can be stored and delivered later. This supports delayed
communication.
3. Persistent messaging systems like Kafka and RabbitMQ store messages until they are
successfully received.
4. This allows senders and receivers to work independently, improving scalability and flexibility.
5. It's essential in critical systems like banking or e-commerce, where losing a message could
mean losing money or data.
6. Example: A successful order message from a shopping app is stored even if the server restarts,
so it's processed later.

12. Evaluate T_Client using given timestamps


1. The client sends a request at time T0 = 5:08:15:100 and receives the response at T1 =
5:08:15:900. These times are recorded on the client side and represent the beginning and end
of a communication exchange.
2. The round-trip time (RTT) for this communication is 800 milliseconds, calculated by
subtracting T0 from T1. Since we usually assume a symmetrical network delay, the one-way
delay is 400 milliseconds.
3. The server sent the response message at T_Server = 5:09:25:300. This is the server’s local
time when it processed and sent the reply to the client.
4. To estimate the correct time, the client adds the estimated one-way delay to the server's timestamp, since the reply spent roughly that long in transit: T_Client = T_Server + 400 ms = 5:09:25:700.
5. The client can now set its clock to 5:09:25:700 at the instant the reply arrives, correcting its clearly lagging local clock based on this estimate.
6. This method of adjusting client time is useful in systems like online banking or cloud services,
where accurate time coordination is important.

13. How would you use Berkeley’s Algorithm in a real-time system?


1. Berkeley’s Algorithm is a clock synchronization method used in distributed systems that don’t
rely on a single accurate time source. It works by having one node act as a coordinator to
synchronize other clocks.
2. In a real-time system like a smart energy grid, the control center (coordinator) polls time from
all meters across the city to synchronize operations like power cuts and billing times.
3. After collecting the times, the coordinator calculates the time differences between its clock and
each node’s clock. It removes any clocks with large deviations (outliers).
4. The coordinator then averages the remaining time differences and sends back an adjustment
value to each node, telling it how to change its clock.
5. This helps all the systems, including the coordinator itself, to come to a nearly common time,
even if no clock was perfect to begin with.
6. Such synchronization is crucial in real-time distributed systems to make consistent decisions,
such as coordinated switching in electric networks.
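The averaging step can be sketched in Python (times are given in minutes for simplicity, and the outlier-removal step is omitted):

```python
def berkeley_adjustments(coordinator_time, other_times):
    """Average every clock (coordinator included) and return the correction
    each node, coordinator first, must apply to reach the common time."""
    times = [coordinator_time] + other_times
    avg = sum(times) / len(times)
    return [avg - t for t in times]

# Coordinator reads 3:00, the others 3:25 and 2:50 (as minutes past 2:00):
# the average is 3:05, so the corrections are +5, -20 and +15 minutes.
adjustments = berkeley_adjustments(60, [85, 50])
```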

14. What are the components of Cristian’s Algorithm, and how do they interact?
1. Cristian’s Algorithm is used for time synchronization between computers in a distributed
system. It involves two main components: a client and a time server.
2. The client sends a request to the server, asking for the current time. It also records the time
when the request was sent to measure delay later.
3. When the server receives the request, it responds with its current time immediately. This
response is then received by the client after some delay.
4. The client measures the round-trip time and estimates the one-way delay by dividing it in half.
It adjusts the server’s time by subtracting this delay.
5. This adjusted server time is used by the client to set or update its own clock. This method works
well when network delay is small and nearly equal in both directions.
6. For example, Cristian’s Algorithm could be used in a stock trading system where local trading
terminals sync time with a central financial server.
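Steps 2-5 reduce to one formula, sketched here in Python (timestamps taken as milliseconds; an illustration added to the notes):

```python
def cristian_time(t0, t1, t_server):
    """Client's new clock value at the moment the reply arrives: the server's
    timestamp plus the estimated one-way delay (half the round trip)."""
    return t_server + (t1 - t0) / 2

# Request sent at t0=0 ms, reply received at t1=800 ms, server reported 100 ms:
# one-way delay is 400 ms, so the client sets its clock to 500 ms.
new_time = cristian_time(0, 800, 100)
```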
15. How do you know that logical clocks accurately maintain event order?
1. Logical clocks do not represent actual physical time but help keep track of the order of events
in distributed systems. They are mainly used where event sequencing is more important than
real time.
2. Each process maintains its own counter, which it increments before every event. When sending
a message, it includes this clock value in the message.
3. Upon receiving a message, the receiver compares the incoming clock with its own and sets its
clock to the maximum of both, plus one. This ensures causality is preserved.
4. For example, if Process A sends a message after event 3 and Process B receives it before event
2, logical clocks adjust to reflect the message came after event 3.
5. This follows Lamport’s "happened-before" relation, which ensures that the system respects the
correct order of dependent events across different nodes.
6. Thus, logical clocks maintain the causal sequence of events even if system clocks are
unsynchronized, which is useful in debugging and distributed databases.

16. Illustrate any one algorithm of non-token-based Distributed Mutual Exclusion


1. One popular non-token-based algorithm is Ricart-Agrawala Algorithm, which helps multiple
processes share a critical resource without using a token.
2. In this method, when a process wants to enter a critical section, it sends a timestamped request
to all other processes in the system.
3. Other processes reply immediately if they are not interested in the resource or if the incoming
request has a smaller timestamp than their own request.
4. A process can only enter the critical section when it has received permission from all other
processes, ensuring mutual exclusion.
5. After finishing the critical section, the process sends a release message to all, allowing others
to proceed.
6. For example, this method can be used in a distributed database where only one node can update
shared records at a time.
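The total order that decides who enters first can be sketched as follows (a simplification: the request/reply message exchange itself is not modelled, and the process names are made up):

```python
def entry_order(requests):
    """Requests are (lamport_timestamp, process_id) pairs. A process defers its
    reply to any request ordered after its own, so critical-section entries
    follow this (timestamp, id) total order; ties break on the lower id."""
    return sorted(requests)

order = entry_order([(4, "P2"), (3, "P1"), (4, "P0")])
# P1 (timestamp 3) enters first; P0 beats P2 on the tie at timestamp 4.
```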

17. Illustrate Vector Clock in distributed computing for a given scenario


1. Vector clocks are an improvement over Lamport clocks, used to capture causality between
events in distributed systems more accurately.
2. Each process keeps an array (vector) of counters, one for each process in the system. When a
process performs an event, it increments its own counter.
3. When a message is sent, the entire vector is sent along. On receiving a message, the receiver
takes the element-wise maximum of its own vector and the one received, and then increments
its own counter.
4. For example, if Process A sends a message with vector [2,1,0] to Process B which has [1,2,0],
B updates its vector to [2,2,0] and increments its own to [2,3,0].
5. This way, we can clearly detect whether one event happened before another, or if they are
concurrent.
6. Vector clocks are useful in distributed version control systems, where tracking causality and
resolving conflicts is important.
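The receive rule from steps 3-4 in Python (a direct transcription of the example above, added for illustration):

```python
def vc_receive(own, received, me):
    """On receiving a message: element-wise max of the two vectors, then
    increment the receiver's own slot."""
    merged = [max(a, b) for a, b in zip(own, received)]
    merged[me] += 1
    return merged

# Process B (slot 1) with clock [1,2,0] receives A's message carrying [2,1,0]:
new_b = vc_receive([1, 2, 0], [2, 1, 0], me=1)   # [2,2,0], then +1 -> [2,3,0]
```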

18. Illustrate Election Algorithm with suitable example


1. Election algorithms are used in distributed systems to select a coordinator (leader) among
several nodes when the previous one fails.
2. A common example is the Bully Algorithm, where the process with the highest ID becomes
the coordinator.
3. When a process notices that the coordinator is down, it starts an election by sending messages
to all processes with higher IDs.
4. If no one replies, it becomes the coordinator. If someone with a higher ID replies, that process
continues the election.
5. For example, in a group of 5 servers numbered 1 to 5, if server 3 detects that 5 is down, it sends
a message to 4 and 5. If both are down, 3 declares itself the coordinator.
6. This method is useful in systems like master-slave architectures where a leader must coordinate
tasks.
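Ignoring the message exchange, the outcome of a Bully election can be sketched as: the highest alive id wins, or the initiator wins if no higher process answers (a simplified model added to these notes):

```python
def bully_winner(initiator, alive_ids):
    """The initiator messages all higher ids; any alive higher process takes
    over the election, so the highest alive id becomes coordinator. If no
    higher process answers, the initiator declares itself."""
    higher = [p for p in alive_ids if p > initiator]
    return max(higher) if higher else initiator

# Server 3 starts an election while 4 and 5 are down: 3 becomes coordinator.
leader = bully_winner(3, {1, 2, 3})
```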

19. Illustrate method for Physical Clock Synchronization for passive mode
1. In passive mode, the client does not send requests but waits for time messages from the server.
It passively listens to synchronization broadcasts.
2. When it receives a message from the server, it records the time of arrival and compares it with
the server time included in the message.
3. The client estimates the one-way delay using past data or by assuming minimal network delay
and adjusts its clock accordingly.
4. This method avoids adding network load because the client does not initiate communication.
5. Passive synchronization is often used in systems like GPS or radio clocks where devices sync
with a central signal.
6. It is also common in large sensor networks where nodes conserve energy by not sending
messages frequently.

20. Illustrate method for Physical Clock Synchronization for Active mode
1. In active mode, the client takes the initiative to send a request to the server for time
synchronization. This is similar to Cristian’s Algorithm.
2. The client sends a message and notes the time it was sent. When the server replies with its
current time, the client notes the arrival time.
3. The client calculates the round-trip delay and assumes one-way delay is half. Then it adjusts
the server’s time to fit its own timeline.
4. This method is more accurate than passive mode as it measures the current network delay
directly.
5. Active synchronization is useful in systems like financial servers or airline booking platforms
where up-to-date time is critical.
6. However, it adds some network overhead due to the active communication.

21. Illustrate Lamport Distributed algorithm of non-token-based distributed mutual exclusion


1. The Lamport algorithm allows processes to request access to a critical section using
timestamped messages, without using a token.
2. When a process wants to enter the critical section, it sends a request message to all others and
places its request in a local request queue.
3. Each process replies to incoming requests and maintains its own queue, sorting requests based
on Lamport timestamps and process IDs.
4. A process enters the critical section only when its request is at the front of the queue and it has
received replies from all other processes.
5. Once done, it sends a release message to all, who then remove its request from their queues.
6. This ensures mutual exclusion and fairness. It's suitable for distributed file systems where
multiple users request file access.
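The per-process bookkeeping in steps 2–5 can be sketched as a small class. This is a simplified sketch of one process's state (class and method names are illustrative); real deployments also need reliable FIFO channels between processes.

```python
import heapq

class LamportMutex:
    """One process's state in Lamport's non-token-based mutual exclusion.
    Messages returned by the methods would be sent over the network."""
    def __init__(self, pid, peers):
        self.pid = pid
        self.peers = set(peers)   # IDs of all other processes
        self.clock = 0            # Lamport clock
        self.queue = []           # heap of (timestamp, pid) requests
        self.replies = set()      # peers that have replied to our request

    def request_cs(self):
        self.clock += 1
        heapq.heappush(self.queue, (self.clock, self.pid))
        return ('REQUEST', self.clock, self.pid)   # broadcast to peers

    def on_request(self, ts, sender):
        self.clock = max(self.clock, ts) + 1
        heapq.heappush(self.queue, (ts, sender))
        return ('REPLY', self.clock, self.pid)     # sent back to sender

    def on_reply(self, sender):
        self.replies.add(sender)

    def can_enter(self):
        # Enter only if our request heads the queue and all peers replied.
        return (bool(self.queue) and self.queue[0][1] == self.pid
                and self.replies == self.peers)

    def release_cs(self):
        heapq.heappop(self.queue)   # peers pop our entry on RELEASE
        self.replies.clear()
        return ('RELEASE', self.pid)

# Two-process walkthrough: p1 requests, p2 replies, p1 may enter.
p1 = LamportMutex(1, peers={2})
p2 = LamportMutex(2, peers={1})
req = p1.request_cs()
rep = p2.on_request(req[1], req[2])
p1.on_reply(rep[2])
```

Note how ties are broken by `(timestamp, pid)` ordering in the heap, which is exactly the total order Lamport's algorithm relies on for fairness.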

22. Illustrate Ring Election Algorithm with suitable example


1. The Ring Election Algorithm organizes processes in a logical ring and uses message passing
to elect a new coordinator.
2. When a process detects the failure of the coordinator, it starts an election by sending an election
message to its next neighbor in the ring.
3. Each process forwards the message while adding its own ID. Once the message returns to the
originator, it declares the process with the highest ID as the new coordinator.
4. A coordinator message is then circulated to inform all processes about the new leader.
5. For example, in a ring of 5 processes with IDs 1 to 5, if 3 starts the election and 5 has the highest
ID, then 5 becomes the coordinator.
6. This algorithm is efficient in ring topologies and avoids heavy message overhead in small to
medium systems.
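The example in point 5 can be traced with a short simulation. This sketch collects the IDs in one pass around the ring instead of exchanging real messages, so it only illustrates the logic.

```python
def ring_election(ids, starter):
    """Simulate one ring election.

    ids     - process IDs in ring order
    starter - index of the process that detects the coordinator failure
    """
    n = len(ids)
    collected = [ids[starter]]     # election message carries seen IDs
    i = (starter + 1) % n
    while i != starter:            # message circulates the full ring once
        collected.append(ids[i])
        i = (i + 1) % n
    coordinator = max(collected)   # highest ID wins
    return coordinator

# Ring of processes 1..5; process 3 (index 2) starts the election.
coord = ring_election([1, 2, 3, 4, 5], starter=2)
```

As in the example, process 5 is elected; a second pass around the ring (the coordinator message) would then announce it to everyone, giving roughly 2n messages per election.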
MOD 4,5
1. Analyze the key challenges in designing a load-sharing algorithm for a distributed system
1. One major challenge is monitoring and collecting real-time load information from all nodes
without causing high communication overhead. This is essential to make accurate and timely
decisions.
2. Deciding the threshold for when a node is considered overloaded or underloaded is complex. It
requires balancing performance goals with system stability.
3. Transferring tasks between nodes must be efficient and not disrupt the execution or consume
excessive bandwidth. The cost of migration should be justified by the performance gain.
4. The system should be scalable and able to handle increases in the number of nodes without
significantly affecting performance or reliability.
5. The algorithm must also be fault-tolerant and able to function if a node fails during load sharing.
This is critical in real-world systems like cloud platforms.
6. For example, in a video streaming platform, if one server becomes overloaded, tasks like video
transcoding must be shifted to other servers without delay.

2. Illustrate the different classifications of load-balancing algorithms


1. Load-balancing algorithms are broadly classified as static or dynamic. Static algorithms make
decisions in advance using fixed rules, while dynamic algorithms adapt to current load
conditions in real-time.
2. In a centralized algorithm, a single master node makes all load-balancing decisions. This
simplifies coordination but can become a bottleneck or single point of failure.
3. Distributed algorithms let each node make independent decisions based on local or limited
global information, improving scalability and fault tolerance.
4. Another classification is cooperative vs. non-cooperative. In cooperative algorithms, nodes
share information and work together, while in non-cooperative ones, nodes act independently.
5. Load balancing can also be preemptive, where running tasks are migrated, or non-preemptive,
where only new tasks are considered for load sharing.
6. For instance, round-robin (static, centralized) is used in simple systems, while task-stealing
(dynamic, distributed) is used in parallel computing frameworks like Cilk.
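The static/dynamic distinction in points 1 and 6 can be made concrete with two tiny dispatchers. This is an illustrative sketch: server names and the `loads` dictionary are made-up stand-ins for real load metrics.

```python
from itertools import cycle

def round_robin(servers):
    """Static policy: rotate through servers regardless of current load."""
    rr = cycle(servers)
    return lambda task: next(rr)

def least_loaded(loads):
    """Dynamic policy: pick the currently least-loaded server.
    `loads` maps server -> pending task count (illustrative metric)."""
    def assign(task):
        server = min(loads, key=loads.get)
        loads[server] += 1          # account for the newly assigned task
        return server
    return assign

rr = round_robin(['s1', 's2', 's3'])
static_plan = [rr(t) for t in range(4)]     # ignores load entirely

loads = {'s1': 5, 's2': 0, 's3': 2}
dyn = least_loaded(loads)
dynamic_plan = [dyn(t) for t in range(3)]   # favours the idle server
```

Round-robin needs no state collection (cheap, but blind to hotspots), while the dynamic version must keep `loads` fresh, which is exactly the monitoring overhead discussed in the design-challenge questions.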

3. Compare different models for organizing threads in a distributed environment


1. The many-to-one model maps multiple user-level threads to one kernel thread. It's simple but
limits concurrency, as only one thread can run at a time.
2. The one-to-one model maps each user thread to a unique kernel thread. This supports full
parallel execution but can cause overhead with too many threads.
3. The many-to-many model allows multiple user threads to be mapped to multiple kernel
threads. It balances flexibility with performance and reduces overhead.
4. Some systems use a two-level model, combining many-to-many with one-to-one
characteristics for better tuning and control.
5. Thread model selection affects how distributed applications perform, especially under high
concurrency or I/O-bound workloads.
6. For example, in a cloud-based chat server, the one-to-one model allows each user’s session to
run independently and simultaneously.

4. Analyze the address space transport mechanism during process migration


1. Address space transport is essential when migrating a process from one node to another. It
ensures the process can resume correctly at the destination.
2. The process’s entire memory image, including code, data, and stack segments, must be
transferred. This is time-consuming and must be done efficiently.
3. There are two main techniques: pre-copy and post-copy. Pre-copy transfers memory before
suspension, while post-copy moves the process and then fetches memory as needed.
4. The address space must be reconstructed at the destination, ensuring pointers and references
still point to valid memory locations.
5. Address translation mechanisms like page tables must be updated to reflect the new physical
location. This involves low-level OS and hardware support.
6. For example, in live migration of virtual machines, pre-copy is used to minimize downtime,
copying memory while the VM is still running.

5. Illustrate the process of freezing and restarting a migrated process


1. Freezing a process involves halting its execution temporarily to save its state and memory
contents. This ensures the process can be safely moved without corruption.
2. During freezing, all process state information, including the program counter and register
values, must be saved to persistent storage or memory.
3. Restarting the process involves loading its state from the saved data and resuming execution
as though it had never been paused. This requires the target node to have sufficient resources.
4. The system must ensure that the address space and execution context are properly restored to
avoid errors or inconsistencies during restart.
5. These techniques are particularly useful in virtual machine migration, where entire systems can
be paused, moved, and restarted on different physical machines.
6. For example, cloud service providers use process freezing and restarting to migrate running
applications across servers with minimal downtime.

6. Compare code migration and process migration with examples


1. Code migration involves moving the program code from one machine to another, where it can
be executed. It requires the destination machine to have the necessary execution environment.
2. In process migration, the entire process, including its state, memory, and execution context, is
moved. This is more complex, as it involves ensuring continuity of execution.
3. Code migration is typically simpler, as it doesn’t require the full context of the process. It’s
more commonly used in environments where applications are stateless.
4. Process migration allows the system to balance load and ensure high availability by moving
running tasks between machines, but it introduces overhead due to state transfer.
5. A practical example of code migration is Java’s bytecode, which can run on any platform with
a compatible JVM, while process migration is used in virtualized environments, like live
migration of VMs in cloud computing.
6. In short, code migration is lightweight but less flexible than process migration, which provides
greater control over resource utilization but with higher complexity.

7. Discuss the key challenges in designing an efficient load-balancing algorithm


1. The primary challenge in load balancing is the accurate measurement of system load. If the
load information is inaccurate, the system may end up redistributing tasks inefficiently.
2. Task migration cost is another challenge. The cost of transferring tasks should be weighed
against the benefits of load balancing to avoid diminishing returns.
3. The algorithm must also be scalable, meaning it should efficiently handle increasing nodes or
tasks without degrading performance.
4. The system must adapt to dynamic workloads and be able to respond to sudden spikes in
demand or unexpected node failures.
5. Another challenge is handling heterogeneity in node performance. Different nodes may have
varying computational capacities, requiring a flexible algorithm to ensure fair distribution.
6. For example, in a cloud environment, load balancing ensures that compute tasks are distributed
across servers to optimize response time and avoid server overload.

8. How does load balancing improve resource utilization in distributed systems?


1. Load balancing improves resource utilization by ensuring that no single node is overburdened
while others remain idle, which maximizes the use of available resources.
2. It helps distribute the workload evenly, reducing bottlenecks and ensuring that system resources
like CPU, memory, and storage are used efficiently.
3. By balancing the load, the system can handle more tasks or users, increasing throughput and
reducing response time.
4. Load balancing also helps in fault tolerance by redistributing tasks in case of node failures,
ensuring uninterrupted service.
5. For example, in a web server farm, load balancing ensures that user requests are spread across
multiple servers, avoiding delays and ensuring better performance.
6. In distributed databases, load balancing ensures that read and write operations are evenly
distributed, which leads to faster query responses and improved system stability.
9. Explain the difference between centralized and decentralized scheduling in distributed systems
1. Centralized scheduling involves a single central node making all decisions about where tasks
should be executed. This simplifies management but can lead to bottlenecks and single points
of failure.
2. In decentralized scheduling, each node makes its own scheduling decisions, based on local
information. This increases scalability and fault tolerance but requires more complex
coordination.
3. Centralized scheduling is often easier to implement in small or controlled environments where
the number of nodes is limited.
4. Decentralized scheduling works well in large, dynamic environments like cloud computing,
where nodes can join or leave the system frequently.
5. A practical example of centralized scheduling is a traditional supercomputer, where one
controller manages task distribution, while decentralized scheduling is common in peer-to-
peer systems like torrent networks.
6. In cloud computing, decentralized scheduling is more scalable and efficient, as it can distribute
tasks across hundreds or thousands of servers without a central bottleneck.

10. Describe the role of computational granularity in task assignment


1. Computational granularity refers to the size of the tasks assigned to a node. It can range from
fine-grained (small tasks) to coarse-grained (large tasks).
2. Fine-grained tasks are smaller, allowing better load balancing but requiring more overhead in
task coordination and communication.
3. Coarse-grained tasks, on the other hand, reduce overhead but might not utilize resources
efficiently if the tasks are too large for some nodes.
4. The choice of granularity affects the performance of a distributed system, as smaller tasks may
lead to high communication overhead, while larger tasks might leave some nodes idle.
5. In real-time systems, coarse-grained task assignments are often preferred to meet tight
deadlines, while fine-grained tasks are useful in high-performance computing where load
balancing is crucial.
6. For example, in map-reduce frameworks, the tasks are coarse-grained to minimize coordination
overhead across large clusters of machines.

11. How does virtualization enable resource abstraction in distributed systems?


1. Virtualization abstracts the underlying physical hardware and presents a virtual environment
to applications. This allows multiple virtual machines (VMs) to run on a single physical
machine.
2. It creates a virtualized layer between hardware and software, enabling resource pooling and
isolation of resources across different virtual instances.
3. Virtualization enhances resource utilization by allowing for more flexible and efficient use of
CPU, memory, and storage across multiple workloads.
4. It also simplifies resource management by allowing resources to be allocated dynamically based
on demand, improving scalability and flexibility.
5. For example, in cloud computing, virtualization allows providers to offer resources on-demand,
where users are unaware of the underlying physical infrastructure.
6. In a data center, virtualization enables server consolidation, where multiple virtual servers run
on fewer physical machines, reducing costs and energy consumption.

12. Compare consistency with coherence in distributed systems


1. Consistency refers to ensuring that all copies of a data item reflect the same value, regardless
of which node is accessed.
2. Coherence, on the other hand, ensures that the updates to a shared data item are observed in a
sequential order across nodes.
3. Consistency ensures that there is no discrepancy in the data values, while coherence ensures
the correct order of operations.
4. These concepts are crucial in distributed systems to ensure that multiple replicas of data remain
synchronized and operations occur in a predictable manner.
5. For example, in a distributed database, consistency ensures all replicas have the same data,
while coherence ensures that updates made at one node are observed in the same order by other
nodes.
6. Achieving both consistency and coherence can be challenging, particularly in large-scale
distributed systems, where network delays and failures may cause data inconsistency or
incoherent operations.

13. Analyze how RPC semantics handle failures in a distributed system


1. RPC (Remote Procedure Call) semantics define how a client calls a procedure on a remote
server and handles the response or failure.
2. When a failure occurs, RPC semantics often implement retries or timeouts to handle temporary
issues, but this can lead to issues like duplicate calls or inconsistent results.
3. Some RPC systems, such as idempotent RPCs, are designed to safely handle retries without
causing side effects, ensuring consistency even in the presence of failures.
4. Failure detection is another key aspect; the client must be aware if the server has failed and
how to handle it, either by trying another server or notifying the user.
5. In cases of permanent failure, some RPC systems allow callback mechanisms to notify the
client when the server has recovered or when an alternative service is available.
6. For example, in distributed file systems like HDFS, RPCs may fail due to server unavailability,
and retries or failover mechanisms are used to ensure that the request eventually succeeds.
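The interaction of retries and idempotency in points 2–3 can be shown with a toy client/server pair. This is a sketch under simplifying assumptions: the "network" is a direct method call, and the server deduplicates via a hypothetical request-ID cache.

```python
class FlakyServer:
    """Server that processes each request ID at most once, so client
    retries after a lost reply do not double-apply the operation."""
    def __init__(self, lost_replies):
        self.lost_replies = lost_replies
        self.seen = {}       # request_id -> cached result (dedup cache)
        self.balance = 0

    def deposit(self, request_id, amount):
        if request_id in self.seen:      # duplicate retry: replay result
            return self.seen[request_id]
        self.balance += amount           # apply the operation once
        self.seen[request_id] = self.balance
        if self.lost_replies > 0:        # simulate the reply being lost
            self.lost_replies -= 1
            raise TimeoutError("reply lost")
        return self.balance

def call_with_retries(fn, *args, retries=3):
    """Client-side at-least-once semantics: retry on timeout."""
    for _ in range(retries + 1):
        try:
            return fn(*args)
        except TimeoutError:
            continue
    raise TimeoutError("server unreachable")

server = FlakyServer(lost_replies=1)
result = call_with_retries(server.deposit, "req-42", 100)
```

Without the `seen` cache, the retry would deposit 100 twice; with it, at-least-once delivery plus server-side deduplication behaves like exactly-once for this call.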
14. Explain different failure models in distributed computing
1. In crash failures, a node halts unexpectedly and stops responding; in the crash-recovery variant it can later restart from state saved on stable storage. The system must tolerate crashes without losing consistency.
2. Omission failures occur when a node fails to send or receive messages, which can cause
communication gaps in the system. These need to be detected and handled promptly.
3. Timing failures happen when nodes fail to meet deadlines or timing constraints, leading to
system performance issues. These are particularly relevant in real-time systems.
4. Byzantine failures are the most complex, where nodes provide incorrect or inconsistent
information due to malicious behavior or software bugs. These failures require robust fault-
tolerant mechanisms.
5. Distributed systems must be designed to handle these failures through redundancy,
replication, and consensus protocols to maintain reliability and availability.
6. For example, in a cloud environment, a node might crash (crash failure), or an incorrect result
may be reported due to a bug (Byzantine failure), both of which require mechanisms like fault
detection and recovery.

15. Illustrate various recovery mechanisms in distributed systems


1. Checkpointing is a common recovery mechanism where the state of a process is periodically
saved. If a failure occurs, the system can roll back to the last checkpoint.
2. Logging involves recording every change in the system’s state so that in case of failure, the
system can replay the logs to reach a consistent state.
3. Replication is another mechanism, where multiple copies of data are maintained across
different nodes. If one node fails, another replica can take over.
4. Consensus protocols like Paxos or Raft are used to ensure that nodes in a distributed system
agree on a common state, especially during recovery after failure.
5. Shadow paging and journaling are techniques used in databases to ensure data consistency
during recovery, providing a way to track changes without overwriting existing data.
6. For example, in a distributed database, if one node fails, replicated copies of data are used to
recover without losing data, ensuring high availability.
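Checkpoint-and-rollback (point 1) can be demonstrated with a toy computation. This is a minimal sketch: the checkpoint "storage" is an in-memory buffer standing in for stable storage, and the checkpoint interval is arbitrary.

```python
import io
import pickle

def run_with_checkpoints(total_steps, store, crash_at=None):
    """Sum 0..total_steps-1, checkpointing (step, acc) every 3 steps.
    On restart, execution resumes from the last saved checkpoint."""
    if store.getvalue():                       # resume from checkpoint
        step, acc = pickle.loads(store.getvalue())
    else:
        step, acc = 0, 0
    while step < total_steps:
        if crash_at is not None and step == crash_at:
            raise RuntimeError("node crashed")
        acc += step
        step += 1
        if step % 3 == 0:                      # periodic checkpoint
            store.seek(0)
            store.truncate()
            store.write(pickle.dumps((step, acc)))
    return acc

store = io.BytesIO()
try:
    run_with_checkpoints(10, store, crash_at=7)   # fails mid-run
except RuntimeError:
    pass                                          # roll back to checkpoint
result = run_with_checkpoints(10, store)          # resumes, completes
```

Only the steps after the last checkpoint are re-executed; the trade-off is checkpoint frequency (I/O cost) versus the amount of work lost on a crash.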

16. Analyze the importance of replication and consistency in a distributed system


1. Replication ensures that multiple copies of data exist across different nodes, providing fault
tolerance and availability in case of node failure.
2. Consistency ensures that all replicas of data are synchronized and reflect the same state, which
is vital for maintaining the accuracy of operations across the system.
3. The challenge is achieving strong consistency while maintaining performance, especially in
large systems with frequent updates. Techniques like Quorum-based replication help achieve
this balance.
4. Replication is essential for high availability, but without proper consistency models like
eventual consistency, the system could provide outdated or incorrect data to users.
5. For example, in cloud storage services like Google Drive, data is replicated across servers,
ensuring availability even if one server fails. However, consistency protocols ensure that users
see the most up-to-date version of their files.
6. In distributed systems like databases or file systems, replication and consistency must be
carefully managed to avoid issues like split-brain scenarios, where inconsistent data could
lead to corruption.

17. Discuss replication as a scaling technique


1. Replication is a technique where multiple copies of data or services are maintained across
different nodes. This allows the system to scale horizontally by adding more nodes.
2. By replicating data, systems can handle a higher number of read requests, as the load is
distributed across multiple replicas, improving response times and throughput.
3. Replication also increases availability, as if one node fails, another replica can take over,
reducing downtime and ensuring continuity of service.
4. However, scaling with replication requires handling data consistency and synchronization
between replicas, which can be challenging in highly distributed environments.
5. Geo-replication is another form of replication where copies of data are distributed across
multiple geographical locations, improving fault tolerance and reducing latency for global
users.
6. For example, content delivery networks (CDNs) replicate website content across multiple
servers worldwide, ensuring faster load times for users regardless of their location.

18. Illustrate the FIFO consistency model with an example


1. The FIFO (First-In-First-Out) consistency model ensures that updates to data are seen in the
order they were written by the producer, but it doesn’t guarantee visibility across all nodes
immediately.
2. FIFO guarantees that writes issued by a single process are observed by every other process in the order they were issued; writes from different processes may still interleave differently at different nodes.
3. This model works well in situations where the order of events matters but eventual consistency
is acceptable, such as in messaging or logging systems.
4. For example, in a distributed logging system, if Event A is logged before Event B on one node,
FIFO consistency ensures that all other nodes will see Event A before Event B.
5. FIFO consistency is useful in scenarios where event order matters, but strict consistency isn’t
required for every node at every moment.
6. A real-world example could be in email systems: if messages are received in order, they are
processed in the same order across different servers or clients.
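A receiver can enforce FIFO delivery with per-sender sequence numbers, buffering any message that arrives ahead of a gap. This is a simplified sketch; real systems piggyback the sequence number on every message.

```python
import heapq
from collections import defaultdict

class FifoReceiver:
    """Delivers each sender's messages in send order, holding back
    out-of-order arrivals until the gap is filled."""
    def __init__(self):
        self.expected = defaultdict(int)   # sender -> next seq to deliver
        self.pending = defaultdict(list)   # sender -> heap of (seq, msg)
        self.delivered = []

    def receive(self, sender, seq, msg):
        heapq.heappush(self.pending[sender], (seq, msg))
        heap = self.pending[sender]
        # Deliver every buffered message that is now in sequence.
        while heap and heap[0][0] == self.expected[sender]:
            _, m = heapq.heappop(heap)
            self.delivered.append((sender, m))
            self.expected[sender] += 1

r = FifoReceiver()
r.receive('A', 1, 'second')   # arrives early: buffered, not delivered
r.receive('A', 0, 'first')    # fills the gap: both delivered in order
order = [m for _, m in r.delivered]
```

Note that ordering is tracked per sender only, which is exactly why FIFO consistency is cheaper than sequential consistency: no cross-sender coordination is needed.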
19. Describe the primary-backup model for replica synchronization
1. The primary-backup model involves one primary server handling requests while backup
servers hold replicas of the data. Only the primary is allowed to update the data.
2. When a change occurs, the primary updates its state and then synchronizes the backup replicas
to ensure they remain consistent.
3. This model simplifies consistency management but can create a bottleneck if the primary server
becomes overwhelmed or fails.
4. Backup servers can take over as the primary if it fails, ensuring high availability and fault
tolerance. The backup process must ensure that all data changes are replicated correctly.
5. For example, in database systems, the primary node handles writes, while backups only store
data and can serve read requests.
6. This approach is commonly used in database replication systems to provide fault tolerance
and maintain data consistency.
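Steps 1–2 of the model can be sketched with a synchronous primary that pushes each write to its backups before acknowledging. This is an illustrative in-process sketch; a real system replicates over the network and handles backup failures.

```python
class Replica:
    """A backup: holds a copy of the data, applies updates it is sent."""
    def __init__(self):
        self.data = {}
    def apply(self, key, value):
        self.data[key] = value

class Primary(Replica):
    """Only the primary accepts writes; it applies each write locally,
    then synchronizes every backup before acknowledging the client."""
    def __init__(self, backups):
        super().__init__()
        self.backups = backups
    def write(self, key, value):
        self.apply(key, value)
        for b in self.backups:      # synchronous propagation
            b.apply(key, value)
        return 'ack'
    def read(self, key):
        return self.data.get(key)

backups = [Replica(), Replica()]
primary = Primary(backups)
primary.write('x', 42)
```

Because the ack is withheld until all backups have the update, any backup promoted after a primary failure already holds every acknowledged write; the cost is write latency proportional to the slowest backup.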

20. Why is fault tolerance critical for distributed systems? Explain with examples
1. Fault tolerance is crucial in distributed systems because these systems often span multiple
nodes, which may fail independently due to hardware issues, network failures, or software bugs.
2. Fault tolerance ensures that a system can continue operating despite failures, preventing data
loss and maintaining availability.
3. Techniques like replication, redundancy, and consensus protocols are used to ensure that
failures do not cause system-wide outages.
4. For example, in a cloud-based service like AWS, fault tolerance ensures that user data is
available even if one server or data center goes offline.
5. In distributed databases, fault tolerance is achieved by replicating data across different nodes.
If one node fails, the system can still provide service using the replicated data.
6. Without fault tolerance, distributed systems would be vulnerable to unexpected downtimes,
which could lead to service interruptions, data corruption, or loss.

21. What assumptions are made in the Paxos protocol for achieving consensus?
1. Paxos assumes that nodes can fail and recover, but do not act maliciously. It is designed to
tolerate crash failures, not Byzantine failures.
2. It assumes a reliable message delivery, though messages can be delayed, lost, or duplicated,
but not corrupted. Messages are eventually delivered.
3. Paxos also assumes that a majority (quorum) of nodes is always available to proceed with
consensus. This is critical to make progress.
4. The protocol expects that there is no global clock and no assumptions about exact timing; it
only requires that messages will eventually reach their destination.
5. Another assumption is that nodes operate independently and can act concurrently, but each
proposer and acceptor behaves correctly according to protocol rules.
6. For example, Paxos can be used in distributed databases to ensure that multiple replicas agree
on the same value, even if some servers crash during the process.

22. Describe the role of conflict resolution in maintaining data integrity in replication
1. In distributed systems, replication can lead to conflicts when different nodes update the same
data concurrently. Conflict resolution ensures that these updates don’t corrupt data integrity.
2. Conflict resolution techniques can be manual, automatic, or based on predefined rules like
"last-writer-wins", where the latest timestamped update is kept.
3. More advanced systems use merge functions or application-specific logic to combine
conflicting changes meaningfully instead of discarding any.
4. Conflict detection requires version tracking (like vector clocks), which helps identify
divergent updates from different replicas.
5. For example, in collaborative editing tools like Google Docs, conflict resolution ensures that
multiple users editing the same document don’t overwrite each other's changes.
6. Without effective conflict resolution, replication could lead to inconsistencies, defeating the
purpose of having multiple synchronized copies for fault tolerance.
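The "last-writer-wins" rule from point 2 can be written as a small merge function. This is a deliberately simple sketch: each replica maps a key to a `(timestamp, value)` pair, and ties or clock skew (the rule's known weakness) are ignored.

```python
def lww_merge(replica_a, replica_b):
    """Merge two replica states under last-writer-wins: for each key,
    keep the entry with the newer timestamp."""
    merged = dict(replica_a)
    for key, (ts, val) in replica_b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, val)
    return merged

# Two replicas that diverged while disconnected:
a = {'doc': (10, 'draft v1')}
b = {'doc': (15, 'draft v2'), 'title': (3, 'Notes')}
merged = lww_merge(a, b)
```

LWW silently discards the older concurrent write; systems that cannot afford that (point 3) replace the `ts > ...` comparison with vector-clock checks so truly concurrent updates are detected and merged by application logic instead.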
10 Marks Questions:
Mod 2,3
1. What is Remote Procedure Call? Explain the working of RPC in detail.
1. Remote Procedure Call (RPC) allows a program to invoke a procedure on a remote machine
as if it were local. It hides the network communication details from the programmer.
2. The main goal of RPC is to simplify distributed computing by making remote interactions look
like local function calls.
3. When a client invokes a remote procedure, the client stub packs the request and sends it over
the network to the server.
4. The request travels through the RPC runtime and is received by the server stub, which
unpacks the message and calls the appropriate procedure.
5. The server executes the procedure and sends the result back through the server stub to the client.
6. The client stub then unpacks the result and returns it to the calling function, completing the
RPC call transparently.
7. RPC supports different communication semantics like at-most-once, at-least-once, and
exactly-once, depending on reliability needs.
8. A key challenge in RPC is handling failures—like lost messages or server crashes—while
maintaining consistency.
9. An example of RPC is when a front-end app requests data from a backend server using a remote
call without knowing the actual server location.

2. What is Remote Method Invocation? Explain the working of RMI in detail.


1. Remote Method Invocation (RMI) is a Java-based mechanism allowing an object to invoke
methods on an object located in another Java Virtual Machine (JVM).
2. It enables object-to-object communication in distributed applications using Java, maintaining
object-oriented principles.
3. In RMI, a client accesses a remote object reference, which acts like a proxy to the actual object
on the server.
4. The client stub sends the method call to the server over the network using serialization and the
RMI registry.
5. On the server side, the skeleton receives the request, deserializes the data, and invokes the
method on the actual remote object.
6. After method execution, the result is serialized and returned to the client through the stub.
7. RMI simplifies communication by supporting parameter passing, return values, and object
references between JVMs.
8. It handles network failures and exceptions using Java’s built-in exception handling and
interfaces like RemoteException.
9. For example, a Java-based banking application might use RMI for clients to perform remote
operations like checking account balance or transferring funds.

3. How would you use encapsulation in designing a protocol stack?


1. Encapsulation is the process of wrapping data with necessary protocol information at each
layer of the communication stack.
2. It is essential for modular design, as each layer operates independently with its own headers
and functionality.
3. In a protocol stack, data from the application is passed down through layers like transport,
network, and data link, with each adding its header.
4. For instance, the transport layer might add a TCP header, the network layer adds an IP header,
and so on.
5. The encapsulated data becomes a packet that travels across the network, and each layer at the
receiving end removes its corresponding header.
6. This design allows changes in one layer without affecting others, supporting interoperability
and scalability in distributed systems.
7. Encapsulation also helps in error detection, flow control, and addressing, making protocols
more reliable.
8. For example, in HTTP over TCP/IP, the web page data is encapsulated in HTTP, then in TCP,
and finally in IP for delivery.
9. The encapsulation process ensures that each layer knows how to handle its own part of the
communication, allowing complex communication systems to function efficiently.
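The layering in steps 3–5 can be mimicked with nested dictionaries, each "layer" wrapping the payload with its own header. Header fields here are illustrative placeholders, not real TCP/IP wire formats.

```python
def encapsulate(payload):
    """Wrap application data with one header per layer, innermost first."""
    segment = {'tcp': {'src_port': 5000, 'dst_port': 80}, 'data': payload}
    packet  = {'ip':  {'src': '10.0.0.1', 'dst': '10.0.0.2'}, 'data': segment}
    frame   = {'eth': {'mac': 'aa:bb:cc'}, 'data': packet}
    return frame

def decapsulate(frame):
    """Each receiving layer strips only its own header and passes the
    remainder up, never inspecting the layers above it."""
    packet  = frame['data']    # link layer removes the Ethernet header
    segment = packet['data']   # network layer removes the IP header
    return segment['data']     # transport layer hands data to the app

frame = encapsulate('GET /index.html')
payload = decapsulate(frame)
```

Because each layer touches only its own header, a layer's internals can change (say, swapping the link technology) without any other layer noticing, which is the modularity point made above.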

4. How does the absence of guaranteed delivery affect message queues in distributed systems?
1. In distributed systems, message queues are used to pass messages between components
asynchronously.
2. Without guaranteed delivery, some messages may be lost due to network failures, timeouts,
or crashes.
3. This loss can cause inconsistencies, especially if critical messages like transaction updates are
not received.
4. Systems need to implement retries, acknowledgments, or logging mechanisms to ensure
reliability.
5. Without these, a sender may assume a message was delivered while the receiver never received
it.
6. Applications must be designed to tolerate message loss or ensure idempotent operations to
handle retries safely.
7. Some message queue systems offer options like "at-least-once" or "exactly-once" delivery,
but these come at a performance cost.
8. For example, in an order-processing system, losing a message might result in a customer not
receiving their order confirmation.
9. To mitigate such issues, developers often use persistent message queues or introduce
transaction-like behavior for critical operations.
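The acknowledgment mechanism from points 4–7 can be sketched as a queue that keeps each message "in flight" until the consumer acks it, so a consumer crash causes redelivery rather than loss (at-least-once semantics). Class and method names are illustrative.

```python
from collections import deque

class AckQueue:
    """At-least-once queue sketch: unacked messages are redelivered."""
    def __init__(self):
        self.ready = deque()       # messages awaiting a consumer
        self.in_flight = {}        # delivered but not yet acknowledged
        self.next_id = 0

    def publish(self, msg):
        self.ready.append((self.next_id, msg))
        self.next_id += 1

    def consume(self):
        mid, msg = self.ready.popleft()
        self.in_flight[mid] = msg  # held until the consumer acks
        return mid, msg

    def ack(self, mid):
        del self.in_flight[mid]    # now safe to forget the message

    def requeue_unacked(self):
        """Called after a consumer timeout: put unacked messages back."""
        for mid, msg in sorted(self.in_flight.items(), reverse=True):
            self.ready.appendleft((mid, msg))
        self.in_flight.clear()

q = AckQueue()
q.publish('order-1')
mid, msg = q.consume()     # consumer crashes before acking...
q.requeue_unacked()        # ...so the broker redelivers
mid2, msg2 = q.consume()
q.ack(mid2)
```

Redelivery means the consumer may see a message twice, which is why the idempotency requirement in point 6 matters: processing must be safe to repeat.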

5. Can you design a distributed application architecture using RMI?


1. Yes, RMI is well-suited for designing object-oriented distributed applications in Java.
2. The architecture typically involves clients, a server hosting remote objects, and an RMI
registry for lookup.
3. Remote interfaces are defined, extending the java.rmi.Remote interface, and the server
implements these interfaces.
4. The server registers the remote object in the RMI registry, making it accessible to remote
clients.
5. Clients use Naming.lookup() to retrieve the stub of the remote object and invoke its methods.
6. This architecture supports scalability, as multiple clients can connect to the server
simultaneously.
7. The RMI framework handles serialization, communication, and method invocation over the
network transparently.
8. For example, an online voting system can use RMI to allow users to cast votes remotely via a
centralized Java server.
9. This approach maintains Java’s strong type safety and simplifies the complexities of low-level
network programming.
6. What strategies would you use to resolve synchronization issues in a distributed video
conference?
1. Synchronization ensures all participants in a video conference see and hear events in the correct
order and at the right time.
2. Timestamps are crucial—they help align video and audio streams across devices using
synchronized clocks.
3. Protocols like NTP (Network Time Protocol) or Berkeley’s Algorithm can be used to
maintain synchronized time across systems.
4. Using buffering allows systems to handle delays and jitter by storing data before playback,
improving continuity.
5. Time-stretching techniques can adjust playback speeds slightly to resynchronize streams in
real-time.
6. Distributed consensus may be used to agree on the sequence of shared screen updates or
interactive inputs.
7. Network conditions like latency and packet loss must be addressed using adaptive bitrate
streaming and retransmission protocols.
8. A master clock or coordinator server can be used to issue timestamps or control event order.
9. Real-life apps like Zoom use such strategies to deliver synchronized audio, video, and shared
content to users across the globe.

7. What evidence supports the use of consensus algorithms in group synchronization?


1. Consensus algorithms help multiple nodes agree on a common value even when some nodes
fail or messages are delayed.
2. They are fundamental in achieving group synchronization, especially in fault-tolerant
distributed systems.
3. Paxos, Raft, and Viewstamped Replication are examples of consensus algorithms used in
practice.
4. Google’s Chubby lock service uses Paxos to manage distributed locks and maintain
synchronization across its services.
5. In blockchain networks, consensus ensures all copies of the ledger remain consistent across
thousands of nodes.
6. Distributed databases like Etcd and ZooKeeper rely on consensus to coordinate configuration
changes and leader election.
7. Consensus helps avoid split-brain scenarios where two parts of a system believe they are in
control.
8. Without consensus, nodes might take conflicting decisions, leading to data corruption or service
failure.
9. So, real-world applications across cloud computing, blockchain, and file systems validate the
critical role of consensus in synchronization.

8. Describe Stream-oriented communication with suitable examples.


1. In stream-oriented communication, data is sent as a continuous stream of bytes, allowing
real-time interaction.
2. It does not maintain message boundaries, making it suitable for applications needing smooth
and constant data flow.
3. TCP (Transmission Control Protocol) is a common stream-oriented protocol that ensures
reliable, ordered delivery.
4. Applications like video streaming, audio calls, and file transfers use this mode to maintain
steady data flow.
5. For example, in a live YouTube stream, the server sends video data continuously to the client
as a stream.
6. Stream-oriented systems support flow control and congestion control to adapt to changing
network conditions.
7. Because there are no message boundaries, the application must decide where the message ends
using special markers.
8. Streams can be full-duplex, allowing simultaneous sending and receiving of data, which is vital
in video conferencing.
9. The main advantage is continuous data delivery, but it requires careful handling to manage
boundaries and buffering.
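Point 7 above notes that the application itself must mark message boundaries inside a stream. A common convention (sketched here with hypothetical helper names) is to prefix each message with its length, so the receiver can slice the continuous byte stream back into messages.

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix the payload with its 4-byte big-endian length so the
    receiver can locate message boundaries inside a byte stream."""
    return struct.pack(">I", len(payload)) + payload

def deframe(stream: bytes):
    """Split a received byte stream back into the original messages."""
    messages, offset = [], 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        messages.append(stream[offset + 4 : offset + 4 + length])
        offset += 4 + length
    return messages

data = frame(b"hello") + frame(b"world")   # two messages, one stream
print(deframe(data))  # [b'hello', b'world']
```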

9. Illustrate Message-oriented communication with suitable examples.


1. In message-oriented communication, data is exchanged as discrete messages, each being a
complete unit.
2. It maintains message boundaries, making it suitable for request-response or event-driven
systems.
3. UDP, AMQP, and MQTT are popular message-oriented protocols.
4. A real-life example is email, where each mail is sent as an independent message from one server
to another.
5. Message queues like RabbitMQ or Apache Kafka use message-oriented communication to
deliver data reliably between components.
6. These systems allow for asynchronous communication, decoupling sender and receiver.
7. It also supports multicast and broadcast, making it suitable for chat applications and IoT
devices.
8. Message loss is a risk, so reliability is often added with acknowledgment and retry mechanisms.
9. Message-oriented communication is ideal when you need clear separation between data units,
like in job scheduling or alerts.
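The boundary-preserving behaviour can be seen directly with UDP, one of the protocols listed above. In this loopback sketch, each `sendto()` produces one self-contained datagram, so the receiver gets back exactly the units that were sent, with no framing code needed.

```python
import socket

# Minimal UDP exchange on the loopback interface: each sendto()
# delivers one self-contained datagram, so message boundaries are
# preserved automatically (unlike a TCP byte stream).
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))          # port 0 = let the OS choose
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"job:render", addr)
sender.sendto(b"job:encode", addr)

# Each recvfrom() returns exactly one whole datagram, never a merge.
msg1, _ = receiver.recvfrom(1024)
msg2, _ = receiver.recvfrom(1024)
print(sorted([msg1, msg2]))  # [b'job:encode', b'job:render']
sender.close(); receiver.close()
```

The message names (`job:render`, `job:encode`) are illustrative, echoing the job-scheduling use case in point 9.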

10. Compare the differences between message-oriented communication and stream-oriented
communication.

1. Message-oriented communication sends data in distinct packets, while stream-oriented sends
data as a continuous flow.
2. Messages are self-contained and maintain boundaries, but streams require boundary
management by the application.
3. Message-oriented systems (like UDP, MQTT) are suitable for event-based communication
with less overhead.
4. Stream-oriented systems (like TCP) ensure reliable, ordered byte streams, ideal for continuous
data like audio or video.
5. Message-oriented communication is generally faster and simpler, but less reliable without
added mechanisms.
6. Streams support flow and congestion control, making them more robust but heavier.
7. In message systems, each message is processed separately; in streams, data is read byte-by-
byte.
8. Examples: Kafka for message-based; YouTube Live for stream-based.
9. The choice depends on the application's needs—streams for smooth playback, messages for
independent commands.

11. Illustrate Socket Programming Primitives in detail.


1. Socket programming enables communication between processes on the same or different
machines using a network.
2. A socket is an endpoint for sending or receiving data across a network. It supports both TCP
and UDP.
3. The basic socket primitives include socket(), bind(), listen(), accept(), connect(), send(), and
recv().
4. socket() creates a new socket; bind() assigns it to a specific port and address.
5. listen() puts the socket in passive mode for TCP servers to accept incoming requests.
6. accept() waits for a connection; when a client connects, it returns a new socket for
communication.
7. connect() is used by clients to establish a connection to a server.
8. Once connected, send() and recv() allow data exchange between client and server.
9. Example: A chat application using sockets where one user sends messages and another receives
them using these primitives.
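The primitives above can be exercised end-to-end in a single script: `socket()` → `bind()` → `listen()` → `accept()` on the server side, and `socket()` → `connect()` → `send()`/`recv()` on the client side. This minimal echo sketch runs both sides in one process via a thread, on the loopback interface.

```python
import socket
import threading

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket()
server.bind(("127.0.0.1", 0))                               # bind()
server.listen(1)                                            # listen()
port = server.getsockname()[1]

def serve():
    conn, _ = server.accept()        # accept(): wait for one client
    conn.sendall(conn.recv(1024))    # echo the received bytes back
    conn.close()

threading.Thread(target=serve).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))                         # connect()
client.sendall(b"hi there")                                 # send()
reply = client.recv(1024)                                   # recv()
print(reply)  # b'hi there'
client.close(); server.close()
```

In a real chat application the two endpoints would be separate processes on different machines, but the sequence of primitives is the same.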

12. What is the happens-before relation in a logical clock? Does the ordering of events
matter? Explain with Lamport's logic and a suitable example.
1. The happens-before (→) relation defines a causal ordering of events in distributed systems. It
holds that if two events occur in the same process, the earlier event → the later one, and if event
A sends a message that event B receives, then A → B.
2. This relation is transitive: if A → B and B → C, then A → C. It captures the notion of causality,
ensuring effects never precede their causes across processes.
3. Event ordering matters because operations that depend on each other must be observed in the
correct sequence to maintain consistent system state. Ignoring order breaks causality.
4. Lamport’s logical clock assigns a counter to each event: each process increments its clock
before executing an event and attaches this timestamp when sending a message.
5. On receiving a message, a process updates its clock to max(local, received) + 1, ensuring its
new timestamp reflects the causally preceding event.
6. If two events are concurrent (neither → the other), Lamport clocks may still order them
arbitrarily, but this does not violate causality since there is no causal link.
7. Example: Process P1 has clock=2, sends “m” to P2. P2’s clock was 1, so on receipt it becomes
max(1,2)+1 = 3, preserving send(m) → receive(m).
8. This mechanism allows all processes to agree on the order of causally related events even
without synchronized physical clocks.
9. In summary, the happens-before relation and Lamport’s logical clocks guarantee that causally
linked events are consistently ordered across distributed processes.
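The rules above translate almost directly into code. This minimal sketch shows the worked example from point 7: P1's send (clock 2) forces P2's receive to jump past its local clock of 1.

```python
class LamportClock:
    """Minimal Lamport logical clock: tick before each local event,
    stamp outgoing messages, and merge on receive."""
    def __init__(self):
        self.time = 0

    def tick(self):               # local event or message send
        self.time += 1
        return self.time

    def receive(self, stamp):     # merge rule: max(local, received) + 1
        self.time = max(self.time, stamp) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
p1.tick()                 # P1 local event, clock -> 1
stamp = p1.tick()         # P1 sends m with timestamp 2
p2.tick()                 # P2 local event, clock -> 1
print(p2.receive(stamp))  # max(1, 2) + 1 = 3, so send(m) -> receive(m)
```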

13. Explain the key components of the Ricart–Agrawala algorithm and their importance.
1. The Ricart–Agrawala algorithm is a non-token-based method for ensuring mutual exclusion
in distributed systems. Its first component is timestamped Request messages, carrying the
sender’s Lamport timestamp and ID.
2. Each process maintains a Lamport clock to assign timestamps, ensuring a total ordering of
all requests across the system.
3. When a process wants the critical section (CS), it sends a Request(timestamp, pid) to all other
processes and waits for their replies.
4. On receiving a Request, a process sends an immediate Reply if it is not interested in the CS or
if the incoming timestamp is smaller (higher priority) than its own.
5. If the receiving process is also requesting the CS with a smaller timestamp, it defers its reply
by enqueuing the incoming request in a deferred queue.
6. A process can only enter the CS after receiving Reply messages from all other processes,
guaranteeing exclusive access.
7. After exiting the CS, the process sends Reply messages to all queued requests, allowing them
to proceed in timestamp order.
8. This design ensures fairness (oldest request first), deadlock freedom, and no starvation, as
every request eventually gets a reply.
9. The combination of logical timestamps, request/reply messaging, and deferred queues makes
Ricart–Agrawala simple, decentralized, and efficient for mutual exclusion.
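The reply-or-defer decision at the heart of the algorithm (points 4 and 5) can be sketched as a single predicate. Requests are compared as `(timestamp, pid)` tuples so Lamport timestamps order them and process IDs break ties; the function name is illustrative.

```python
def should_reply(my_requesting, my_req, incoming_req):
    """Decision on receiving Request(ts, pid): reply immediately if
    we are not requesting the CS, or the incoming request is older
    (smaller timestamp, with pid breaking ties); otherwise defer."""
    if not my_requesting:
        return True
    return incoming_req < my_req   # tuple comparison: (timestamp, pid)

# P2 (own request ts=5) receives P1's request with ts=3: P1 is older.
print(should_reply(True, (5, 2), (3, 1)))   # True  -> reply now
# P2 receives P3's request with ts=7: defer until P2 exits the CS.
print(should_reply(True, (5, 2), (7, 3)))   # False -> enqueue in deferred queue
```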

14. Illustrate Singhal’s heuristic algorithm with a suitable example.


1. Singhal’s heuristic algorithm optimizes mutual exclusion by maintaining for each process a
dynamic active set of peers likely to grant permission, reducing needless broadcasts.
2. When a process wants the CS, it sends Request messages only to processes in its active set, not
to all processes in the system.
3. Upon receiving a Request, a process grants permission immediately if it is not requesting the
CS, adding the requester to its active set.
4. If the receiver is also requesting with higher priority, it queues the incoming request for later
reply.
5. Over time, each process’s active set evolves to include only those peers with recent cooperative
interactions, minimizing message traffic.
6. Example: Processes P1, P2, P3 start with active sets {P2,P3}. P1 requests CS, P2 and P3 reply
and add P1. Next time, P1 still contacts only {P2,P3}.
7. This heuristic dramatically reduces messages when requests are frequent among a subset of
processes.
8. Mutual exclusion holds because active sets overlap and timestamp rules ensure only one
process holds all permissions at a time.
9. Singhal’s algorithm balances efficiency (fewer messages) with correctness, making it well-
suited for systems with clustered access patterns.

15. Illustrate Raymond’s Tree-Based Algorithm for token-based distributed mutual exclusion.
1. Processes are arranged in a logical tree and exactly one token circulates; only the token holder
may enter the CS.
2. Each process has a parent pointer (toward the token) and a FIFO request queue for tracking
pending requests.
3. To request the CS, a process sends a Request to its parent pointer if it does not hold the token.
4. Intermediate nodes forward the request upstream if they are not waiting for the token, or queue
it if they have already requested it.
5. When the token holder exits the CS, it checks its queue and sends the token to the first
requester, updating pointers along the path.
6. Example: In a chain P1→P2→P3 with the token at P1, if P3 requests, P3→P2→P1. P1 then
sends token P1→P2→P3.
7. Upon receiving the token, each node passes it down the path to the requester, ensuring exclusive
CS access.
8. Message complexity is O(h) where h = tree height, often O(log N) in balanced trees, reducing
overhead compared to full broadcasts.
9. Raymond’s algorithm achieves efficient, fair mutual exclusion by routing the token along tree
paths only to requesting processes.
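The chain example in point 6 can be sketched by following parent pointers: a request climbs toward the token holder, and the token later travels the same path back, so the message cost is proportional to the path length. This is a simplified illustration of the routing only, not the full queueing logic.

```python
def request_path(parent, start):
    """Follow parent pointers from the requester up to the token
    holder (the root, whose parent is None); the token travels back
    down this path, so message cost grows with its length."""
    path = [start]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

# Chain P3 -> P2 -> P1 with the token held at P1 (the root).
parent = {"P1": None, "P2": "P1", "P3": "P2"}
print(request_path(parent, "P3"))  # ['P3', 'P2', 'P1'] -> 2 hops each way
```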

16. Design a solution using Maekawa’s Algorithm for managing resource access in a distributed
file system.
1. Maekawa’s algorithm uses quorum (vote) sets: each server has a unique set of peers, with
every two sets overlapping in at least one member.
2. To access a file, a server sends Request messages to all members of its quorum, asking for
permission.
3. A quorum member sends Grant only if it has not already granted its single vote to another
request; otherwise it queues the incoming request until it receives a Release.
4. When the requesting server receives Grants from its entire quorum, it enters the critical section
and accesses the file.
5. After finishing, it broadcasts Release to its quorum, allowing members to grant pending
requests in their queues.
6. Example design: For 16 servers, arrange them in a 4×4 grid. Each server’s quorum is its row
plus column (size 7), ensuring overlap.
7. Overlapping quorums guarantee that two concurrent requests share at least one common
member, preventing simultaneous access.
8. This reduces message complexity to O(√N) per access (versus O(N)) and avoids a single point
of failure.
9. Maekawa’s algorithm thus provides efficient, fault-tolerant, and decentralized control for
distributed file access.
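The grid construction from point 6 can be checked programmatically: on an n×n grid, every server's quorum is its whole row plus its whole column, so any two quorums must intersect (A's row crosses B's column somewhere). A small sketch, numbering the 16 servers 0–15:

```python
def grid_quorum(n_side, server):
    """Row-plus-column quorum for a server on an n_side x n_side grid.
    Any two quorums share a member, which is what enforces mutual
    exclusion in Maekawa's scheme."""
    row, col = divmod(server, n_side)
    quorum = {row * n_side + c for c in range(n_side)}    # whole row
    quorum |= {r * n_side + col for r in range(n_side)}   # whole column
    return quorum

q0 = grid_quorum(4, 0)     # top-left server
q15 = grid_quorum(4, 15)   # bottom-right server
print(len(q0))             # 7  (4 + 4 - 1 on a 4x4 grid)
print(q0 & q15)            # non-empty overlap: {3, 12}
```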

17. Illustrate Suzuki–Kasami Broadcast Algorithm for token-based distributed mutual exclusion.
1. Suzuki–Kasami employs a single token that grants exclusive CS access; requests are broadcast
to all processes.
2. Each process maintains an array RN of the highest request numbers it has seen; the token
itself carries an array LN of the last served request numbers.
3. To request the CS, a process increments its RN entry and broadcasts its RN to all peers.
4. Upon receiving the broadcast, peers update their RN for that sender; explicit replies are not
required.
5. When the token holder exits the CS, it sets its own LN entry to its RN entry, then appends
to the token's queue every process j with an outstanding request (RN[j] = LN[j] + 1).
6. It then sends the token to the process at the head of the queue, whose request is served
next.
7. Example: P2 broadcasts RN[2]=3; token holder P1 sees RN[2]>LN[2], enqueues P2, and sends
the token to P2.
8. This ensures fairness (served in sequence number order) and deadlock freedom, as every
request is eventually served.
9. Suzuki–Kasami is simple and effective, trading off broadcast cost for a clear, sequence-based
token passing mechanism.

18. Illustrate the need for a Coordinator in Distributed Systems and demonstrate the Bully
Election Algorithm.
1. A coordinator (leader) centralizes tasks like resource allocation, clock sync, and failure
detection, simplifying coordination among processes.
2. If the coordinator fails, no process can perform these centralized tasks, so a re-election
mechanism is required to maintain system operation.
3. The Bully Algorithm elects the highest-ID alive process as coordinator: any lower-ID process
detecting failure initiates an election.
4. It sends an Election message to all higher-ID processes. If none respond, it declares itself
coordinator and broadcasts a Coordinator message.
5. If a higher-ID process responds, it takes over the election, repeating the process with even
higher IDs until the highest-ID alive process wins.
6. Example: Processes {P1=ID1, P2=ID2, P3=ID3}. If P3 fails, P1 and P2 start elections; P2 hears
no response from P3, becomes coordinator, and informs P1.
7. This ensures that the process with the maximum ID among alive processes becomes the new
leader, restoring central control.
8. The Bully Algorithm handles coordinator failures quickly but can incur O(N²) messages in
worst cases; it’s simple and effective for moderate-sized systems.
9. A reliable coordinator and election algorithm are vital for maintaining consistency and
availability in distributed environments.

19. Discuss the key components of the Ricart–Agrawala algorithm and explain their significance
in achieving mutual exclusion.
1. Logical Clocks: Each process uses Lamport clocks to timestamp requests, providing a global
ordering of critical section requests.
2. Request Messages: A process wishing to enter the CS sends Request(timestamp, pid) to all
peers, initiating permission-seeking.
3. Immediate Replies: Peers reply immediately if they are not interested in the CS or if the
incoming request’s timestamp is smaller, granting permission.
4. Deferred Replies: If a peer has its own pending request with a smaller timestamp, it queues
the incoming request instead of replying.
5. Deferred Queue: Each process maintains a queue of deferred requests; these are serviced
(replied to) once the process exits the CS.
6. Reply Counting: A requester must receive replies from all peers before entering the CS,
ensuring exclusive access.
7. Exit Protocol: On exiting, the process sends replies to all queued requests, unlocking those
processes to enter the CS.
8. These components ensure mutual exclusion (only one process in CS), fairness (oldest request
first), and deadlock freedom (no cyclic waits).
9. Ricart–Agrawala’s decentralized design avoids single points of failure and scales well for
moderate numbers of processes.

20. Develop a resource access management solution for a distributed file system using Maekawa’s
Algorithm, and explain its working and advantages.
1. In a distributed file system, assign each file server a voting set (quorum) of peers, with
overlapping sets guaranteeing mutual exclusion.
2. To access a file, a server sends Request messages to all members of its quorum, seeking
permission to enter the CS.
3. Quorum members reply with Grant if they are free; otherwise they queue the request for later.
4. Once the server collects Grants from its entire quorum, it accesses the file exclusively in its
critical section.
5. After use, the server sends Release to its quorum, allowing them to grant pending requests from
their queues.
6. Design example: For 25 servers, arrange them in a 5×5 grid; each server’s quorum is its row
plus column (size 9), ensuring overlap.
7. Advantages: Message complexity is reduced to O(√N) per access, significantly less than
contacting all N servers.
8. Overlapping quorums ensure safety (no two servers access simultaneously), and
decentralization avoids single points of failure.
9. Maekawa’s algorithm thus offers an efficient, scalable, and fault-tolerant solution for file access
management in distributed systems.

21. What are the broader implications of message complexity in Raymond’s Tree-Based
Algorithm?
1. In Raymond’s algorithm, message complexity for one CS entry is O(h), where h is the height
(number of hops) between the requester and token-holder.
2. For a balanced tree of N processes, h ≈ O(log N), giving logarithmic message cost and low
average latency.
3. However, in unbalanced trees (e.g., chains), h can be O(N), leading to high latency and heavy
network traffic for each request.
4. High message complexity increases waiting time for the critical section and can become a
bottleneck under heavy contention.
5. On the plus side, Raymond’s algorithm avoids system-wide broadcasts, limiting traffic to the
tree path and often saving messages relative to all-to-all schemes.
6. Designers must balance the tree to keep h small; poor tree structure can cause hotspots near
the token-holder and degrade performance.
7. Understanding message complexity helps in capacity planning—it guides how to structure the
process tree and place the token-holder.
8. In large-scale systems, optimizing for low h ensures scalability, while ignoring complexity can
lead to network congestion and uneven load.
9. Thus, message complexity in Raymond’s algorithm highlights key trade-offs between
efficiency, latency, and scalability in distributed mutual exclusion.
Module 4,5

1. Describe the step-by-step procedure for process migration.


1. Selection of Process for Migration:
Before migrating, the system decides which process is best to move. This choice depends on several
factors like CPU usage, memory load, and resource requirements. It avoids moving processes that
depend on special devices or are in critical states. The goal is to reduce overload on one system and
improve overall performance.
Example: If a game server is too busy, the system might choose a less critical background task to
move to another server.
2. Suspending the Process and Saving Its State:
Once a process is selected, it is paused so it won’t change during migration. The system then saves all
parts of the process like memory, program counter, and open files. This saved state is used to recreate
the process later. It is like taking a photo of everything the process is doing.
Example: A paused video editing app may have a large file open, a timeline mid-edit, and effects in
memory—all of this needs saving.
3. Preparing the Destination Machine:
The destination system must be ready to accept the incoming process. It sets up memory, copies
required files, and checks for permissions. The new system must match the process’s needs to run
smoothly after migration.
Example: If the task involves audio processing, the new system must have audio support and enough
memory for large files.
4. Transferring and Restarting the Process:
The saved state is sent over the network to the new machine. Once received, the process is restarted
with the same data and continues working from where it left off. The user usually doesn’t even notice
the move if done correctly.
Example: In cloud gaming, if one server gets overloaded, the game session can be quietly shifted to
another server, and the player keeps playing.
5. Cleanup and Final Setup:
After migration, the system checks if any old links (like file or network connections) need to be
reconnected. Then, the old system cleans up memory and resources used by the process. This frees up
space and keeps things efficient.
Example: After a design tool is moved to another machine, the old machine disconnects its access to
project files to avoid conflicts.

2. Compare different approaches to code migration in distributed systems.


1. Weak Mobility:
In weak mobility, only the code and input data move to the new machine. The execution always starts
from the beginning, not from where it left off. This is simpler and commonly used when tasks are
lightweight or short-lived.
Example: A Java applet downloaded and run on a client browser is weak mobility.
2. Strong Mobility:
Strong mobility moves not just the code but the entire execution state, including memory and
registers. The program resumes from the same point on the new machine. This is useful for long-
running or complex programs.
Example: A paused game level continuing on another server.
3. Remote Evaluation:
Here, a client sends code to a remote machine for execution. It’s used when the client is weak or lacks
power. This helps reduce the client’s workload.
Example: A phone sending an AI image filter to be processed in the cloud.
4. Code on Demand:
In this method, a machine downloads code when needed, often to add features or process new
formats. It reduces the initial load and updates systems quickly.
Example: A website loading JavaScript or Flash only when a user clicks a button.
5. Comparison Summary:
Weak mobility is easier but can’t resume from mid-execution. Strong mobility is more powerful but
complex. Remote evaluation and code-on-demand focus on where the execution happens and how it’s
triggered.

3. Compare the load balancing approach with the load-sharing approach in distributed
computing.
1. Load Balancing:
Load balancing actively monitors systems and moves tasks between them to keep the workload even.
It reduces delays and ensures no server is overloaded.
Example: A website that sends visitors to less busy servers.
2. Load Sharing:
Load sharing only assigns new tasks to the least loaded server. It doesn’t move running tasks even if a
server becomes busy.
Example: A printer network where a new print job goes to an idle printer.
3. Decision Timing:
Load balancing is often dynamic and done in real-time, while load sharing can be static and simpler.
Balancing adapts to changes better.
Example: In a cloud system, dynamic balancing keeps adjusting as users join or leave.
4. Efficiency:
Load balancing usually gives better performance and responsiveness, especially in large or changing
systems. Load sharing works well in simpler or smaller setups.
Example: A social media app uses load balancing to handle millions of users smoothly.
5. Summary:
Both improve performance, but load balancing is smarter and more proactive. Load sharing is easier
to implement but less flexible.

4. Illustrate the task assignment approach with a suitable example.


1. What It Means:
Task assignment involves deciding where each new task should run. The system tries to match the
task’s needs with a machine that can handle it well.
2. Static vs Dynamic:
In static assignment, decisions are made before tasks run. In dynamic assignment, the system checks
current loads and resources before assigning.
3. Goal:
The goal is to reduce task wait time and use all resources efficiently. It also avoids overload and
makes better use of powerful machines.
4. Example:
In a university lab, when students submit code to run, the system assigns each job to the least busy
computer with enough memory.
5. Real-Life Analogy:
It’s like assigning dishes in a kitchen: lighter meals to fast cooks and big meals to stronger chefs,
based on current workload.
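The university-lab example above can be sketched as a small dynamic-assignment rule: filter machines that meet the task's memory need, then pick the least loaded one. All machine names and numbers here are hypothetical.

```python
def assign(task_mem, machines):
    """Dynamic task assignment sketch: choose the least-loaded machine
    that still has enough free memory (in GB) for the task."""
    eligible = [m for m in machines if m["free_mem"] >= task_mem]
    if not eligible:
        return None                      # no machine can take the task
    return min(eligible, key=lambda m: m["load"])["name"]

machines = [
    {"name": "lab-pc-1", "load": 0.9, "free_mem": 8},
    {"name": "lab-pc-2", "load": 0.2, "free_mem": 2},   # idle but too small
    {"name": "lab-pc-3", "load": 0.4, "free_mem": 16},
]
print(assign(task_mem=4, machines=machines))  # 'lab-pc-3'
```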

5. Explain different load estimation policies used by load balancing approach.


1. Queue Length:
The system checks how many tasks are waiting on each machine. A longer queue means a busier
system.
2. CPU Usage:
The percentage of CPU currently used helps estimate load. A high percentage shows the system is
already working hard.
3. Resource Usage:
Memory, disk, and network usage are also checked. A server might have low CPU use but still be
slow due to low memory.
4. Averaging Over Time:
Some systems look at average load over time to avoid reacting to sudden, short spikes in activity.
5. Example:
A file server may appear idle for a moment, but by looking at past 5-minute usage, the system knows
whether to assign new tasks or not.

6. Illustrate Desirable Features of a Global Scheduling Algorithm


1. Fairness:
A good global scheduling algorithm must treat all tasks and users fairly. No task should be starved or
ignored due to system preferences.
Example: An online classroom server gives every student equal bandwidth during a quiz.
2. Scalability:
The algorithm should work efficiently even if more computers or users are added. It should not slow
down when the system grows.
Example: A gaming platform adds more servers as players increase, and tasks are still well managed.
3. Adaptability:
The scheduler should respond to changes in system load and network speed. It must adjust task
assignments in real-time.
Example: If one cloud region becomes slow, tasks are moved to another region.
4. Low Overhead:
It should not use too much system power or network bandwidth just to make decisions. The cost of
scheduling must stay small.
Example: Scheduling shouldn't slow down video streaming more than the streaming itself.
5. Reliability:
Even if one machine crashes, the algorithm should continue scheduling tasks. It must be fault-tolerant.
Example: If one server goes offline, the system reassigns work without human help.

7. Analyze State Information Exchange, Priority Assistance, and Migration Limitation Policies
in Load Balancing
1. State Information Exchange:
This defines how machines share their load information. It can be done regularly (periodically), when
needed (on demand), or never (static systems).
Example: Servers checking in with each other every 10 seconds to report CPU use.
2. Priority Assistance:
In this, heavily loaded machines request help from lightly loaded ones. It ensures urgent help reaches
overloaded nodes.
Example: A video processing node asks other idle nodes for help rendering scenes.
3. Migration Limitation Policies:
These policies decide how often tasks should be moved. Too much migration can waste time and
overload the network.
Example: A policy that says, “don’t migrate unless CPU usage is above 85%.”
4. Trade-off Management:
More updates mean better load awareness, but also more communication cost. The system must
balance between performance and overhead.
Example: In high-speed trading, the system must act fast without overloading itself.
5. Coordination:
All three policies must work together—good state info helps with fair assistance, and migration limits
protect the system from constant shuffling.

8. Analyze the Trade-offs Between Static and Dynamic Task Assignment Approaches
1. Static Assignment:
In static systems, tasks are assigned in advance based on fixed assumptions about load and resource
availability. It's simple but can’t adapt to real-time changes.
Example: A company assigns tasks based on past usage, assuming it won’t change.
2. Dynamic Assignment:
Dynamic systems assign tasks based on current system conditions. They are more flexible but more
complex.
Example: A cloud service routes requests to the least busy server at that moment.
3. Performance:
Static methods are faster in small or predictable environments. Dynamic ones work better in changing
or large systems but may have some delay due to decision-making.
Example: A web server farm dynamically balances traffic during a big event.
4. Overhead:
Static systems have low communication cost. Dynamic systems use more resources to monitor and
update load states.
Example: A sensor network avoids dynamic assignment to save energy.
5. Error Handling:
Dynamic systems can recover from failures by reassigning tasks. Static systems struggle if a pre-
assigned server crashes.

9. How Can Task Prioritization and Fairness Be Ensured in Global Scheduling Algorithms?
1. Task Priority Levels:
Tasks can be given high, medium, or low priority. The scheduler ensures urgent tasks are served first
without ignoring others.
Example: Emergency alerts get faster service than background file syncing.
2. Aging Technique:
To prevent low-priority tasks from waiting forever, their priority increases over time. This ensures
fairness.
Example: A print job waiting for 10 minutes moves up in priority.
3. Fair Queueing:
The system assigns equal time slices to all users or groups. It balances user needs across the system.
Example: In a classroom, all students get equal computer time during an online test.
4. Feedback Control:
The scheduler uses past performance to adjust priorities. If some tasks always finish late, their future
priority may increase.
Example: A slow-loading webpage is prioritized next time it’s accessed.
5. Policy + Mechanism:
Fairness is a policy goal; the mechanism (like queues, aging, etc.) makes it work. Good design
balances performance with equality.
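The aging technique (point 2) can be sketched in a few lines: each scheduling round, every waiting task gains priority in proportion to how long it has waited, so even a low-priority print job eventually outranks fresh high-priority work. Task names and numbers are hypothetical.

```python
def age_priorities(queue, wait_boost=1):
    """Aging sketch: boost each waiting task's priority by how many
    rounds it has waited, then pick the highest-priority task.
    Prevents starvation of low-priority tasks."""
    for task in queue:
        task["priority"] += wait_boost * task["waited_rounds"]
    return max(queue, key=lambda t: t["priority"])["name"]

queue = [
    {"name": "urgent-alert", "priority": 8, "waited_rounds": 0},
    {"name": "print-job",    "priority": 2, "waited_rounds": 10},
]
print(age_priorities(queue))  # 'print-job' (aged to priority 12) runs first
```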

10. Apply Task Migration Strategies to Balance Workloads in a Heterogeneous Environment


1. Understand Machine Differences:
In a heterogeneous system, machines vary in speed and power. The system must know which machine
is best for each type of task.
Example: A graphics-heavy task is sent to a GPU-enabled server.
2. Load Monitoring:
Tasks are moved from overloaded to underloaded machines. The system considers not just CPU, but
also memory and special hardware.
Example: If a slow server gets too many jobs, some are moved to a faster server.
3. Cost of Migration:
Task movement takes time and network bandwidth. Only tasks that give benefit after migration are
moved.
Example: A 5-minute file processing task may be moved, but a 2-second task won’t.
4. Scheduling with Knowledge:
The scheduler considers both the machine capabilities and current load before migrating. This
prevents wrong decisions.
Example: An AI workload is shifted from a busy CPU node to an idle GPU node.
5. Feedback and Tuning:
Over time, the system learns what works best and improves migration choices. This makes future
decisions faster and smarter.

11. How do load balancing strategies influence scalability in distributed systems?


1. Even Task Distribution:
Load balancing ensures tasks are spread evenly across machines, avoiding overuse of any one node.
This keeps the system fast and responsive.
Example: An online store spreads web traffic to many servers on Black Friday.
2. Efficient Resource Use:
Balancing uses all available CPU, memory, and network efficiently, so no resource is wasted. This
supports more users without needing new hardware.
Example: Cloud systems run many small websites on the same servers without slowdowns.
3. Support for Growth:
As the system grows (more users or devices), good balancing ensures new machines can be added
smoothly without changing the whole system.
Example: A video app can scale to millions of viewers by adding new streaming servers.
4. Less Downtime:
With balanced loads, failures on one machine don't stop the system. Tasks shift to working machines
automatically.
Example: If one game server crashes, the match continues on another one.
5. Scalability Outcome:
Overall, load balancing allows systems to grow, handle more work, and respond better, making them
scalable and reliable.

12. Illustrate the Andrew File System (AFS) in detail.


1. Overview:
AFS is a distributed file system designed to share files across multiple machines while providing
access like a local system. It focuses on security, scalability, and efficiency.
2. Location Transparency:
Users don’t need to know where the file is stored. AFS presents one big virtual directory to everyone.
Example: Students in different cities can access the same “/afs/school/notes” folder.
3. Caching:
AFS caches files on the client side. Once a file is accessed, it's stored locally, reducing future network
requests.
Example: If a student opens a project file once, it loads faster next time because it is served from the local disk cache.
4. Server Volume Concept:
Files are grouped into volumes. Volumes can be moved or replicated across servers without affecting
users.
Example: Admins move data from busy to idle servers to boost performance.
5. Access Control:
AFS uses Kerberos for secure logins and lets users control who can read or write their files.
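The caching behaviour in point 3 works through AFS-style *callback promises*: the server remembers which clients cache a file and notifies them when it changes. A toy in-memory sketch (class and method names are hypothetical, not the real AFS API):

```python
class AFSServer:
    """Toy model of AFS callback promises (a deliberate simplification)."""
    def __init__(self):
        self.files = {}        # file name -> content
        self.callbacks = {}    # file name -> clients holding a cached copy

    def fetch(self, client, name):
        # register a callback promise before handing out the file
        self.callbacks.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, name, content):
        self.files[name] = content
        # break callbacks: every caching client learns its copy is stale
        for client in self.callbacks.pop(name, set()):
            client.invalidate(name)

class AFSClient:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def open(self, name):
        if name not in self.cache:   # cache miss, or callback was broken
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def invalidate(self, name):
        self.cache.pop(name, None)

srv = AFSServer()
srv.files["notes.txt"] = "v1"
c = AFSClient(srv)
print(c.open("notes.txt"))    # "v1" (fetched; callback registered)
srv.store("notes.txt", "v2")  # server breaks c's callback
print(c.open("notes.txt"))    # "v2" (re-fetched after invalidation)
```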

13. Illustrate the Network File System (NFS) in detail.


1. Introduction:
NFS allows users to access files over a network as if they were on their local drive. It uses a client-
server model and is commonly used in UNIX/Linux systems.
2. Stateless Protocol:
NFS (especially early versions) doesn’t remember past client activity. Every request includes all
needed info.
Example: Every file read includes user ID, file path, and offset.
3. File Sharing:
NFS enables multiple users to access and even write to the same file across systems.
Example: Office employees edit shared documents on a central server.
4. Mounting:
Clients “mount” remote directories, making them appear as part of the local file system.
Example: “/project” on your PC might actually be stored on another machine.
5. Weaknesses:
Because early NFS was stateless and had no strong authentication, it faced security and consistency
issues.
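The stateless idea in point 2 can be shown as a self-contained read request: every call carries the file handle, offset, byte count, and credentials, so the server keeps no per-client session state. A hedged sketch (field names are illustrative, not the actual NFS wire format):

```python
def nfs_read(disk, request):
    """Serve a read on a stateless server: the request itself carries
    everything needed -- no session is remembered between calls."""
    data = disk[request["file_handle"]]
    off, cnt = request["offset"], request["count"]
    return data[off:off + cnt]

# hypothetical on-disk contents, keyed by file handle
disk = {"fh-42": b"hello distributed world"}
req = {"file_handle": "fh-42", "offset": 6, "count": 11, "uid": 1000}
print(nfs_read(disk, req))   # b"distributed"
```

Because the request is complete in itself, the server can crash and restart between two reads without the client noticing.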

14. Illustrate the architecture of NFS.


1. Client-Server Model:
NFS follows a client-server design. The client requests file operations (read, write) from a remote
server.
2. Mount Protocol:
Before accessing remote files, the client uses the mount protocol to connect to a shared directory.
Example: Client mounts “/home/shared” from Server A to access common files.
3. NFS Protocol:
NFS uses RPC (Remote Procedure Call) to perform file actions over the network. It's simple but
powerful.
4. File System Layer:
The file system layer handles directory structure and file metadata. It converts NFS calls to actual disk
operations.
Example: NFS says “read block 3 of file X,” and the server reads from disk.
5. Daemons:
Background services like nfsd, rpc.mountd, and rpc.statd handle NFS tasks, connection requests, and
recovery.
15. Explain different file accessing models in a distributed system.
1. Remote Access Model:
Files are accessed from a server over the network, and all operations happen remotely.
Example: Editing a file directly on Google Drive.
2. Upload-Download Model:
The whole file is downloaded first, then edited locally, and finally uploaded back.
Example: Downloading a Word doc from email, editing, then re-attaching.
3. Caching Model:
A copy of the file is kept locally for fast access. Changes sync later with the server.
Example: Offline editing in Google Docs that syncs later.
4. Session Semantics:
All changes made during a session are saved when the file is closed. Others don’t see updates until the
session ends.
Example: When two users edit the same file separately, their changes may conflict later.
5. Summary:
Each model balances speed, consistency, and network usage differently depending on system needs.
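Session semantics (point 4) can be sketched as a private working copy that is published only on close. A toy model, assuming a plain dict stands in for the file server:

```python
class SessionFile:
    """Toy session semantics: writes go to a private working copy and
    become visible to others only when close() publishes them."""
    def __init__(self, server, name):
        self.server, self.name = server, name
        self.working = server.get(name, "")   # private session copy

    def write(self, text):
        self.working += text                  # invisible to other users

    def close(self):
        self.server[self.name] = self.working # publish at session end

server = {"doc": "v1 "}
a = SessionFile(server, "doc")
a.write("edit-by-A")
print(server["doc"])   # "v1 " -- others still see the old content
a.close()
print(server["doc"])   # "v1 edit-by-A" -- now published
```

If two sessions close one after the other, the later close wins, which is exactly the conflict risk point 4 describes.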

16. Analyze cache validation schemes and explain their different types.
1. Write-Invalidate:
When a client updates a file, other clients' cached copies are marked invalid. They must re-fetch the
file.
Example: After one client saves a new version of a shared file, every other client must re-download it before editing again.
2. Write-Update:
When one client updates, the new data is sent to all other clients. This keeps all copies up-to-date.
Example: Collaborative live editing in tools like Google Docs.
3. Periodic Validation:
Clients regularly check with the server to see if the file has changed.
Example: Your browser checks if a website has new content every few minutes.
4. On-Open Validation:
Validation happens when a file is opened. If the server version is newer, the client fetches the update.
Example: Dropbox checks for newer versions of a file when you open it.
5. Trade-offs:
Write-update uses more bandwidth, but write-invalidate is simpler. Each is chosen based on system
goals.
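The two main schemes can be contrasted in a few lines, modelling each client cache as a dict (a simplification; real systems also track versions and use callbacks):

```python
def write_invalidate(store, caches, key, value):
    """Writer updates the store; everyone else's cached copy is dropped."""
    store[key] = value
    for cache in caches:
        cache.pop(key, None)      # stale copies must be re-fetched

def write_update(store, caches, key, value):
    """Writer updates the store and pushes fresh data to every cache."""
    store[key] = value
    for cache in caches:
        if key in cache:
            cache[key] = value    # all copies stay current

store, c1, c2 = {"f": 1}, {"f": 1}, {"f": 1}
write_invalidate(store, [c1, c2], "f", 2)
print(c1, c2)                     # {} {} -- both must re-fetch
c1["f"] = c2["f"] = store["f"]    # clients re-fetch
write_update(store, [c1, c2], "f", 3)
print(c1, c2)                     # {'f': 3} {'f': 3}
```

The bandwidth trade-off from point 5 is visible here: write-update sends the new value to every cache, while write-invalidate sends only a small invalidation.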

17. Compare various name resolution techniques in distributed systems.


1. Iterative Resolution:
The client asks one server, which returns the address of the next server to contact.
Example: DNS gives you the next nameserver instead of a final answer.
2. Recursive Resolution:
The server takes responsibility for fully resolving the name and returns the final address.
Example: Your DNS resolver contacts all needed servers on your behalf.
3. Caching:
Clients and servers store results of past resolutions to speed up future lookups.
Example: Your computer remembers the IP of a website you visited earlier.
4. Broadcast:
The name request is sent to all nodes in the network. Only the right one responds.
Example: Some older LAN systems used this method for device discovery.
5. Comparison:
Recursive keeps the client simple but places more work on servers. Iterative reduces server load
but needs smarter clients.
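The iterative/recursive contrast can be simulated with a tiny referral table (the server names and IP address are made up; real DNS involves many more record types and caching layers):

```python
# Toy namespace: each server either knows the answer or refers onward.
SERVERS = {
    "root":       {"referral": "com-ns"},
    "com-ns":     {"referral": "example-ns"},
    "example-ns": {"answer": "93.184.216.34"},
}

def iterative_resolve(name):
    """The client itself follows each referral until an answer appears."""
    server = "root"
    while True:
        reply = SERVERS[server]
        if "answer" in reply:
            return reply["answer"]
        server = reply["referral"]   # client contacts the next server

def recursive_resolve(name, server="root"):
    """The contacted server chases referrals on the client's behalf."""
    reply = SERVERS[server]
    if "answer" in reply:
        return reply["answer"]
    return recursive_resolve(name, reply["referral"])

print(iterative_resolve("www.example.com"))  # 93.184.216.34
print(recursive_resolve("www.example.com"))  # same answer, work done server-side
```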

18. Explain the quorum-based protocol for updating multiple copies of files.
1. What It Solves:
In distributed systems, files often have multiple copies. Quorum ensures safe updates by requiring a
majority agreement.
2. Read and Write Quorums:
Before a read or write, the system contacts a minimum number of replicas. Write quorum (W) + read
quorum (R) must be > total replicas (N).
Example: For 5 replicas, if W=3 and R=3, at least one replica will always have the latest update.
3. Conflict Avoidance:
By contacting multiple nodes, quorum avoids conflicts from out-of-sync replicas.
Example: Email clients syncing across phone, tablet, and laptop.
4. Strong Consistency:
It ensures at least one node with the latest write is always read from, keeping data correct.
5. Trade-off:
More nodes = more safety but more delay. System designers pick values based on speed vs.
consistency.
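The quorum rule W + R > N can be demonstrated directly: because any read quorum must overlap any write quorum, a read always sees at least one replica with the latest version. A sketch assuming a coordinator supplies monotonically increasing version numbers:

```python
import random

N, W, R = 5, 3, 3   # W + R = 6 > 5, so read and write quorums overlap

replicas = [{"version": 0, "value": None} for _ in range(N)]

def write(value, version):
    """Update any W replicas with the new version."""
    for rep in random.sample(replicas, W):
        rep["version"], rep["value"] = version, value

def read():
    """Query any R replicas and trust the highest version seen."""
    sample = random.sample(replicas, R)
    newest = max(sample, key=lambda rep: rep["version"])
    return newest["value"]

write("v1", version=1)
print(read())   # always "v1": any 3 readers overlap the 3 written replicas
```

No matter which random replicas are chosen, the pigeonhole overlap guarantees the newest write is always observed.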

19. Evaluate the effectiveness of DNSSEC in securing DNS systems.


1. What DNSSEC Does:
DNSSEC (DNS Security Extensions) adds digital signatures to DNS records to protect against fake
DNS replies (spoofing).
2. Authentication, Not Encryption:
It ensures the DNS data is genuine and unchanged, but it doesn’t hide the data from others.
Example: You can trust you’re really visiting your bank's site, not a fake one.
3. Trust Chain:
DNSSEC uses a chain of trust from the root to each domain. Each zone signs the next lower zone.
4. Deployment Challenges:
Not all systems support DNSSEC, and some admins find it complex. Slow adoption reduces
effectiveness.
5. Overall Benefit:
When fully deployed, DNSSEC prevents many cyberattacks like cache poisoning and is key for
internet safety.
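The signature idea can be sketched with an HMAC standing in for DNSSEC's real public-key RRSIG records (a deliberate simplification: real DNSSEC uses RSA/ECDSA key pairs and a chain of trust, and the key below is hypothetical):

```python
import hmac, hashlib

ZONE_KEY = b"example-zone-signing-key"   # hypothetical stand-in key

def sign_record(name, ip):
    """Produce a signature over a DNS record (HMAC as a toy stand-in)."""
    data = f"{name}={ip}".encode()
    return hmac.new(ZONE_KEY, data, hashlib.sha256).hexdigest()

def verify_record(name, ip, signature):
    """A resolver rejects any record whose signature does not match."""
    expected = sign_record(name, ip)
    return hmac.compare_digest(expected, signature)

sig = sign_record("bank.example", "203.0.113.7")
print(verify_record("bank.example", "203.0.113.7", sig))   # True: genuine
print(verify_record("bank.example", "198.51.100.9", sig))  # False: spoofed IP
```

The spoofed reply fails verification, which is exactly how DNSSEC defeats cache-poisoning attacks even though the data itself is not encrypted.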

20. How does client-side caching differ from server-side caching?


1. Client-Side Caching:
Data is stored on the client’s local machine. It speeds up access and reduces server load.
Example: A browser stores images so pages load faster next time.
2. Server-Side Caching:
The server keeps commonly requested data ready. This speeds up response for many users.
Example: A news site’s homepage is stored in memory to handle traffic.
3. Control:
Client-side caching is managed by the user's system. Server-side caching is managed by the server.
4. Consistency:
Client caches may get outdated. Server-side caches are easier to keep fresh.
Example: A user sees an old version of a blog post until their cache refreshes.
5. Use Cases:
Client caching helps with offline access. Server caching is better for high-traffic optimization.
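Server-side caching (point 2) is often a time-to-live cache: one fresh copy answers many clients until it expires. A minimal sketch (the class name and TTL value are illustrative):

```python
import time

class TTLCache:
    """Server-side cache sketch: a computed response is reused for
    `ttl` seconds, so many clients share one fresh copy."""
    def __init__(self, ttl):
        self.ttl, self.store = ttl, {}

    def get(self, key, compute):
        entry = self.store.get(key)
        if entry and time.time() - entry[1] < self.ttl:
            return entry[0]                     # still fresh: serve cached
        value = compute()                       # expensive rebuild
        self.store[key] = (value, time.time())
        return value

calls = []
cache = TTLCache(ttl=60)
page = lambda: calls.append(1) or "<html>homepage</html>"
cache.get("/", page)
cache.get("/", page)
print(len(calls))   # 1 -- the second request hit the cache
```

A browser's client-side cache works the same way in miniature, but it serves only one user and the server cannot refresh it directly, which is why client caches go stale more easily.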

21. What are the main challenges of maintaining consistency in DFS?


1. Concurrent Access:
Multiple users may edit the same file at once, leading to conflicts or overwritten data.
2. Network Delays:
Updates may not reach all servers at the same time. Some users may see old versions.
3. Partial Failures:
Some nodes may go down during updates, causing mismatch in data across replicas.
4. Caching Issues:
Clients using cached files may unknowingly work on outdated copies, creating sync errors.
5. Conflict Resolution:
DFS systems need rules or manual help to resolve version conflicts. This can be tricky.
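Conflict detection (point 5) is commonly done with version vectors: if each replica is ahead on a different counter, the edits were concurrent and need resolution. A sketch (the device names are made up):

```python
def compare(vv_a, vv_b):
    """Compare two version vectors: returns 'a_newer', 'b_newer',
    'equal', or 'conflict' (concurrent edits needing resolution)."""
    keys = set(vv_a) | set(vv_b)
    a_ahead = any(vv_a.get(k, 0) > vv_b.get(k, 0) for k in keys)
    b_ahead = any(vv_b.get(k, 0) > vv_a.get(k, 0) for k in keys)
    if a_ahead and b_ahead:
        return "conflict"    # neither copy strictly dominates
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"

print(compare({"phone": 2, "laptop": 1}, {"phone": 1, "laptop": 1}))  # a_newer
print(compare({"phone": 2, "laptop": 1}, {"phone": 1, "laptop": 2}))  # conflict
```

When the result is "conflict", the DFS must fall back on a rule (last-writer-wins, merge) or ask the user, which is why conflict resolution "can be tricky".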

22. Compare the fault tolerance mechanisms of eager and lazy replication.
1. Eager Replication:
All replicas are updated at the same time. It gives strong consistency but needs lots of coordination.
Example: A bank updates all copies of a customer record instantly.
2. Lazy Replication:
Only one replica is updated first, and others catch up later. It’s faster but may be inconsistent.
Example: A photo upload shows up immediately on your phone, later on your tablet.
3. Fault Tolerance:
Eager systems handle failures better since all nodes are up-to-date. Lazy systems risk data loss if the
main node crashes before syncing.
4. Performance:
Lazy replication has better performance under high load, as fewer updates happen in real-time.
5. Trade-offs:
Eager = safe but slow. Lazy = fast but may show old data. Choice depends on how critical consistency
is.
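The eager/lazy contrast in points 1–2 can be modelled in a few lines, with dicts standing in for replicas (a toy model, not a real replication protocol):

```python
class Eager:
    """Eager: a write returns only after every replica is updated."""
    def __init__(self, n):
        self.replicas = [{} for _ in range(n)]

    def write(self, k, v):
        for rep in self.replicas:
            rep[k] = v           # all copies consistent immediately

class Lazy:
    """Lazy: update one replica now, propagate the rest later."""
    def __init__(self, n):
        self.replicas = [{} for _ in range(n)]
        self.pending = []

    def write(self, k, v):
        self.replicas[0][k] = v  # primary only; others stale for now
        self.pending.append((k, v))

    def sync(self):              # later, background propagation
        for k, v in self.pending:
            for rep in self.replicas[1:]:
                rep[k] = v
        self.pending.clear()

eager = Eager(3)
eager.write("x", 1)              # all 3 replicas updated before returning
lazy = Lazy(3)
lazy.write("x", 1)
print(lazy.replicas[1])          # {} -- still stale until sync runs
lazy.sync()
```

The fault-tolerance point falls out of the model: if the lazy primary crashed before `sync()`, the pending update would be lost, while the eager system has no such window.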
