Distributed Computing
AFS FEATURES:
Andrew File System is designed to:
1. Handle terabytes of data.
2. Handle thousands of users.
3. Work in a WAN environment.
Advantages
Shared files that are not updated very often remain valid in the client's cache for a long time.
It allocates a large amount of local disk space for caching.
AFS is designed for large-scale environments and can handle a significant number of users and files.
Disadvantages
AFS setup can be complex & requires a good understanding of its architecture & components.
AFS may exhibit higher latency compared to local file systems
AFS may have compatibility issues with newer operating systems.
Explain NFS in Distributed Systems
1. NFS stands for Network File System.
2. NFS is a platform independent remote file system technology created by Sun Microsystems
(Sun) in 1984.
3. It is a client/server application that provides shared file storage for clients across a network.
4. It was designed to simplify the sharing of file system resources in a network of
non-homogeneous machines.
5. It is implemented using the RPC protocol, and the files are made available over the network via a
Virtual File System (VFS), an interface that runs on top of the TCP/IP layer.
6. Allows an application to access files on remote hosts in the same way it accesses local files
(see the sketch below).
7. NFS is generally implemented following the layered architecture shown in figure 6.8 below.
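As a minimal illustration of this access transparency, assuming an NFS export is already mounted at a hypothetical mount point such as /mnt/nfs, an application reads a remote file exactly as it would a local one:

```python
# Minimal sketch of NFS access transparency.
# Assumes an NFS export has already been mounted at /mnt/nfs (hypothetical path),
# e.g. with: mount -t nfs server:/export /mnt/nfs
with open("/mnt/nfs/shared/report.txt") as f:   # looks identical to a local open()
    data = f.read()
print(len(data), "bytes read from the NFS-mounted file")
```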
Advantages:
NFS is generally easy to set up and configure.
NFS can be scaled to accommodate large file systems and numerous clients.
NFS can offer fast file access and data transfer rates.
NFS simplifies storage management and backup processes.
Disadvantages:
NFS is heavily reliant on network performance and stability.
NFS is a stateless protocol, which means that the server does not track client state
HDFS (Hadoop Distributed File System) is a unique design that provides storage for
extremely large files, with data in the range of petabytes (1 PB = 1000 TB), using a streaming data access pattern.
HDFS is designed on the principle of write-once, read-many-times:
once data is written, large portions of the dataset can be processed any number of times.
NameNode (Master Node): Manages all the slave nodes and assigns work to them.
It executes filesystem namespace operations like opening, closing, and renaming files and
directories. It should be deployed on reliable, high-end hardware, not on
commodity hardware.
DataNode (Slave Node): The actual worker nodes that do the actual work like reading, writing,
processing, etc. They also perform creation, deletion, and replication upon instruction from the
master. They can be deployed on commodity hardware.
HDFS daemons: Daemons are the processes running in the background.
DataNodes:
Run on slave nodes.
Require large disk capacity, as the actual data is stored here.
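The following toy sketch illustrates this division of labour between NameNode and DataNodes; the class names, block naming, and placement logic are illustrative assumptions, not the real Hadoop API:

```python
# Toy sketch of the NameNode / DataNode split (not the Hadoop API).
class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}                      # block_id -> block data
    def store(self, block_id, data):
        self.blocks[block_id] = data

class NameNode:
    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.namespace = {}                   # filename -> [(block_id, [datanode names])]
    def write(self, filename, blocks):
        entries = []
        for i, data in enumerate(blocks):
            block_id = f"{filename}#blk{i}"
            targets = self.datanodes[: self.replication]   # simplistic placement choice
            for dn in targets:                # in real HDFS the client ships blocks
                dn.store(block_id, data)      # to DataNodes directly, not via the NameNode
            entries.append((block_id, [dn.name for dn in targets]))
        self.namespace[filename] = entries    # the NameNode keeps only the metadata

datanodes = [DataNode(f"dn{i}") for i in range(4)]
namenode = NameNode(datanodes)
namenode.write("/logs/app.log", [b"part-0", b"part-1"])
print(namenode.namespace["/logs/app.log"])
```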
Limitations:
Applications that require low-latency access to data, i.e. in the range of milliseconds, will not work
well with HDFS.
Having lots of small files results in lots of seeks and lots of movement from one DataNode to
another to retrieve each small file, which is a very inefficient data access pattern.
DESIRABLE FEATURES OF A GOOD DISTRIBUTED FILE SYSTEM:
I) Transparency: refers to hiding details from the user. It consists of the following types:
a. Structure Transparency: Clients should not know the number or locations of the file servers and the storage
devices.
b. Access Transparency: Local and remote files should be accessible in the same way.
The file system interface should not distinguish between local and remote files.
c. Naming Transparency: The name of a file should give no hint of its location.
Without changing the filename, the file should be allowed to move from one node to another.
d. Replication Transparency: If a file is located in multiple nodes, its existing multiple copies
should be hidden from the client.
II) User Mobility: 1. Users should not be forced to work on specific nodes.
2. Users should have flexibility to work on different nodes at different times.
III) Performance: 1. Measured as the average amount of time required to satisfy client requests.
2. This time also includes network communication overhead when the accessed file is remote.
3. Performance of a DFS should be comparable to the performance of a centralized file system.
IV) Simplicity and ease of use: 1. Semantics of the DFS should be easy to understand.
2. User interface of the file system must be simple and the number of commands should be as
small as possible.
V) Scalability: 1. A DFS grows with time as the network expands by interconnecting additional
networks; the DFS should be able to handle this growth gracefully.
VI) High availability: 1. System should continue to function even when partial failures occur.
2. Failure causes temporary loss of service to small groups of users.
3. Highly available DFS should have multiple independent file servers.
VII) Security: 1. DFS should be secure so as to provide privacy to the users for their data.
2. Necessary Security mechanism must be implemented to keep the data secure.
Immutable shared files semantics: 1. It is based on the use of an immutable file model.
2. An immutable file cannot be modified once it has been created.
3. The creator declares the file as shareable so as to make it an immutable file.
RPC IMPLEMENTATION:
1. RPC (Remote Procedure Call) allows a process to call a procedure on a remote machine as if it were a local call.
2. The basic idea of RPC is to make a remote procedure call look transparent.
3. The calling procedure should not be aware that the called procedure is executing on a
different machine.
4. RPC achieves this transparency in a way analogous to an ordinary local procedure call.
5. The caller (client process) sends a call (request) message to the callee (server process) and
waits for the reply.
6. The request message contains the remote procedure's parameters, among other things.
7. The server process executes the procedure and then returns the result of procedure
execution in a reply message to the client process.
8. Once the reply message is received, the result of procedure execution is extracted, and the
caller’s execution is resumed.
9. The caller has no idea that the work was done remotely rather than by the local operating system.
10. In this way transparency is achieved.
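As a rough sketch of this request/reply flow, Python's standard xmlrpc modules can play the roles of client and server; the procedure name add and the port number are illustrative assumptions:

```python
# Minimal RPC sketch using Python's standard xmlrpc modules.
# Server side: register a procedure and wait for call (request) messages.
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):                      # the remote procedure (illustrative)
    return a + b

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
server.register_function(add, "add")
# server.serve_forever()            # run this in the server process

# Client side: the call looks like an ordinary local procedure call; parameter
# marshalling, the request message, and the reply are handled behind the scenes.
from xmlrpc.client import ServerProxy

proxy = ServerProxy("http://localhost:8000/")
# result = proxy.add(2, 3)          # returns 5 once the server above is running
```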
In an RMI application, we write two programs, a server program (resides on the server) and a
client program (resides on the client).
The RMI architecture includes:
a. Transport Layer: This layer connects the client and the server. It manages the existing
connection and also sets up new connections.
b. Stub: A stub is a representation (proxy) of the remote object at the client. It resides in the client
system and acts as a gateway for the client program.
c. Skeleton: This is the object which resides on the server side. Stub communicates with this
skeleton to pass requests to the remote object.
d. Remote Reference Layer: It is the layer which manages the references made by the client to
the remote object.
WORKING OF RMI:
1. When the client makes a call to the remote object, it is received by the stub which eventually
passes this request to the Remote Reference Layer (RRL).
2. When the client-side RRL receives the request, it invokes a method of the object. It passes
the request to the RRL on the server side.
3. The RRL on the server side passes the request to the Skeleton (proxy on the server) which
finally invokes the required object on the server.
4. The result is passed all the way back to the client.
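Java RMI is the usual setting for this architecture. The Python sketch below only imitates the stub/skeleton idea in a single process (all class and method names are illustrative assumptions): the stub packs the call into a request, and the skeleton unpacks it and invokes the real remote object.

```python
# Conceptual sketch of the stub/skeleton pattern (not the real Java RMI API).
class Account:                       # the remote object living on the server
    def __init__(self, balance):
        self.balance = balance
    def deposit(self, amount):
        self.balance += amount
        return self.balance

class Skeleton:                      # server side: unpacks requests and invokes the object
    def __init__(self, remote_object):
        self.remote_object = remote_object
    def handle(self, request):
        method = getattr(self.remote_object, request["method"])
        return method(*request["args"])

class Stub:                          # client side: packs the call into a request message
    def __init__(self, skeleton):
        self.skeleton = skeleton     # stands in for the remote reference and transport layers
    def deposit(self, amount):
        return self.skeleton.handle({"method": "deposit", "args": (amount,)})

stub = Stub(Skeleton(Account(100)))
print(stub.deposit(50))              # the client calls the stub as if it were local -> 150
```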
GOALS OF RMI
1. To minimize the complexity of the application.
2. To preserve type safety.
3. Distributed garbage collection.
4. Minimize the difference between working with local and remote objects.
LOGIC CLOCK:
1. A logical clock is a mechanism for capturing chronological and causal relationships in a
distributed system.
2. Distributed systems may have no physically synchronous global clock, so a logical clock is used.
3. It allows a global ordering on events from different processes in such systems.
4. The first implementation, the Lamport timestamp, was proposed by Leslie Lamport in 1978.
5. In logical clock systems each process has two data structures: logical local time and logical
global time.
6. Logical local time is used by the process to mark its own events, and logical global time is the
local information about global time.
7. A special protocol is used to update logical local time after each local event, and logical
global time when processes exchange data.
8. Logical clocks are useful in computation analysis, distributed algorithm design, individual
event tracking, and exploring computational progress.
LAMPORT'S LOGICAL CLOCKS: Lamport timestamps were proposed by Leslie Lamport in 1978.
1. In a distributed system, clocks need not be synchronized absolutely.
2. If two processes do not interact, it is not necessary that their clocks be synchronized because
the lack of synchronization would not be observable and thus it does not cause a problem.
3. It is not important that all processes agree on what the actual time is, but that they agree on
the order in which events occur.
4. Lamport clocks are a simple technique used for determining the order of events in a DS.
5. Lamport clocks provide a partial ordering of events – specifically “happened-before” ordering.
6. If there is no “happened-before” relationship, then the events are considered concurrent.
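A minimal sketch of the Lamport clock rules (increment before each local or send event, and take max(local, received) + 1 on receipt); the class and method names are illustrative assumptions:

```python
# Minimal Lamport logical clock sketch.
class LamportClock:
    def __init__(self):
        self.time = 0
    def local_event(self):               # rule: increment before each local event
        self.time += 1
        return self.time
    def send_event(self):                # timestamp attached to an outgoing message
        self.time += 1
        return self.time
    def receive_event(self, msg_time):   # rule: max(local, received) + 1
        self.time = max(self.time, msg_time) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
ts = p1.send_event()                     # p1 sends a message carrying timestamp 1
p2.receive_event(ts)                     # p2's clock jumps to 2, preserving happened-before
```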
In this algorithm (the Ricart-Agrawala algorithm for distributed mutual exclusion):
Two types of messages ( REQUEST and REPLY) are used and communication channels are
assumed to follow FIFO order.
A site sends a REQUEST message to all other sites to get their permission to enter the critical
section.
A site sends a REPLY message to another site to give its permission to enter the critical section.
A timestamp is given to each critical section request using Lamport's logical clock.
Timestamps are used to determine the priority of critical section requests: a smaller timestamp gets higher
priority than a larger one. The execution of critical section requests is always in the order of
their timestamps.
Algorithm:
To enter Critical section:
When a site Si wants to enter the critical section, it sends a timestamped REQUEST message
to all other sites.
When a site Sj receives a REQUEST message from site Si, it sends a REPLY message to site
Si if and only if:
Site Sj is neither requesting nor currently executing the critical section, or
Site Sj is requesting, but the timestamp of Site Si's request is smaller than that of its own request.
Otherwise, the REPLY is deferred.
To execute the critical section:
Site Si enters the critical section if it has received the REPLY message from all other sites.
To release the critical section:
Upon exiting the critical section, site Si sends a REPLY message to all the deferred requests.
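A compact sketch of one site's REQUEST/REPLY logic under these rules; the message transport is abstracted into a send(dest, msg) callback, and the class name RASite is an illustrative assumption:

```python
# Sketch of one site's logic in the timestamped REQUEST/REPLY algorithm.
class RASite:
    def __init__(self, site_id, all_site_ids):
        self.id = site_id
        self.others = [s for s in all_site_ids if s != site_id]
        self.clock = 0                   # Lamport clock for timestamping requests
        self.requesting = False
        self.request_ts = None
        self.replies = set()
        self.deferred = []               # sites whose REPLY we postponed

    def request_cs(self, send):          # send(dest, msg) is the assumed transport
        self.clock += 1
        self.requesting = True
        self.request_ts = (self.clock, self.id)
        self.replies.clear()
        for s in self.others:
            send(s, ("REQUEST", self.request_ts, self.id))

    def on_request(self, ts, sender, send):
        self.clock = max(self.clock, ts[0]) + 1
        if self.requesting and self.request_ts < ts:
            self.deferred.append(sender) # our request has the smaller timestamp: defer
        else:
            send(sender, ("REPLY", self.id))

    def on_reply(self, sender):
        self.replies.add(sender)
        return len(self.replies) == len(self.others)   # True => the site may enter the CS

    def release_cs(self, send):
        self.requesting = False
        for s in self.deferred:          # answer every deferred request on exit
            send(s, ("REPLY", self.id))
        self.deferred.clear()
```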
III) Quick decision-making capability: 1. In a distributed system, the situation keeps on changing.
2. Thus the scheduling algorithm must be capable of making quick decisions about allocation by
analyzing the situation.
V) Stability:
1. Unstable process migration results in a process thrashing.
2. So the scheduling algorithm should have stable process migration.
VI) Scalability:
1. A scheduling algorithm must be capable of handling a network of any size.
2. Thus the scheduling algorithm should be scalable and able to adapt to changes in the system.
VII)Fault Tolerance:
1. In a DS, when the node crashes, it should not affect the performance of the overall system.
2. The scheduling algorithm should be able to handle such situations of fault tolerance.
I) Load Estimation Policy: Estimation of the workload of a particular node is a difficult problem
for which no completely satisfactory solution exists.
A node's workload can be estimated based on measurable parameters such as:
a. Total number of processes on the node. b. Resource demands of these processes.
c. Instruction mixes of these processes. d. Architecture and speed of the node’s processor.
II) Process Transfer Policy: 1. The idea of using this policy is to transfer processes from heavily
loaded nodes to lightly loaded nodes.
2. Most of the algorithms use a threshold policy to decide whether a node is lightly loaded or
heavily loaded.
3. The threshold value is a limiting value of the workload of a node, which can be determined by:
a. Static Policy: Each node has a predefined threshold value.
b. Dynamic Policy: The threshold value for a node is dynamically decided.
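A tiny sketch of a static threshold policy; the threshold value and the load metric (number of processes) are illustrative assumptions:

```python
# Static threshold policy sketch: classify nodes and decide whether to transfer.
THRESHOLD = 10          # predefined limit on the number of processes (assumed value)

def is_heavily_loaded(process_count):
    return process_count > THRESHOLD

def should_transfer(local_load, remote_load):
    # Transfer a process only from a heavily loaded node to a lightly loaded one.
    return is_heavily_loaded(local_load) and not is_heavily_loaded(remote_load)

print(should_transfer(15, 4))   # True: local node is above the threshold, remote is below
```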
III) Location Policy: 1. When a transfer policy decides that a process is to be transferred from
one node to any other lightly loaded node, the challenge is to find the destination node.
2. Location policy determines this destination node.
VI) Migration Limiting Policy: 1. This policy determines the total number of times a process can
migrate from one node to another.
2. It includes: a. Uncontrolled Policy: The remote process is treated as a local process. Hence, it
can be migrated any number of times.
b. Controlled Policy: Migration count is used for remote processes.
Explain message oriented communication with suitable examples.
PERSISTENT COMMUNICATION:
1. In this communication, a transferred message is stored at a communication server until it is
delivered to the receiver.
2. Example: Email system - message is stored at the communication server until it is delivered.
3. It includes synchronous and asynchronous communication.
I) Persistent Synchronous Communication: The message is stored at the receiving side (or its
communication server), and the sender is blocked until that storage has been confirmed.
TRANSIENT COMMUNICATION
1. In this communication, a message is stored only while the sender and receiver applications are executing.
2. A message is discarded by a communication server as soon as it cannot be delivered to the receiver.
3. Example: a router - if the router cannot deliver the message to the next router, it drops the message.
4. It includes asynchronous and synchronous communication:
a. Asynchronous communication: The sender continues immediately after the message has been sent.
b. Synchronous communication: The sender is blocked until the request is known to be accepted.
I) Transient asynchronous communication:
1. A sends the message to B and continues execution (non-blocking).
2. B has to be running, because if it is not running the message will be discarded.
3. Even if any router along the way is down, the message will be discarded.
4. UDP communication is an example of transient asynchronous communication (see the sketch below).
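A minimal UDP sketch of this behaviour: the sender fires the datagram and continues immediately, and if no receiver is listening the message is simply lost. The address and port are illustrative assumptions.

```python
# Transient asynchronous communication sketch using UDP (fire-and-forget).
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(b"status update", ("127.0.0.1", 9999))  # sender does not block or wait for B
sock.close()
# If no process is listening on that address/port (or a router on the way drops the
# packet), the message is silently discarded -- the transient, asynchronous case.
```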
1. In process migration, an entire process has to be moved from one machine to another.
2. This can be a complex and costly task, but it gives good overall performance.
3. In some cases, a code needs to be migrated rather than the process.
4. Code Migration refers to transfer of a program from one node to another.
5. e.g Consider a client server system, in which the server has to manage a big database.
6. The client application has to perform many database operations.
7. In such a situation, it is better to move part of the client application to the server and the result
is sent across the network. Thus code migration is a better option.
8. Code migration is used to improve the overall performance of the system by exploiting parallelism.
9. An example of code migration is shown below in figure 4.4. Here the server is responsible for
providing the client's implementation when the client binds to the server.
10. The advantage of this approach is that the client does not need to install all the required software.
The software can be moved in as and when necessary and discarded when it is no longer needed.
Code migration strategies are approaches used to transfer software code from one environment
to another within distributed systems. Here are some common code migration strategies:
1. Process Migration
Definition: Process migration involves moving an entire running process, including its code,
data, and execution state, from one node to another within the distributed system.
Use Cases: Process migration is useful for load balancing, fault tolerance, and dynamic
resource allocation. For example, if one server becomes overloaded, processes can be
migrated to underutilized servers to balance the workload.
2. Thread Migration
Definition: Thread migration focuses on moving individual threads of execution between different
nodes within the distributed system.
Use Cases: Thread migration is beneficial for optimizing resource usage and improving
parallelism. For instance, threads can be migrated to nodes with available CPU resources to
maximize processing efficiency.
3. Object Migration
Definition: Object migration involves transferring objects or components of an application
between different nodes within the distributed system.
Use Cases: Object migration is commonly used for data replication, caching, and distributed
computing. For example, objects representing frequently accessed data can be migrated closer
to clients to reduce latency.
4. Container Migration
Definition: Container migration focuses on moving containerized applications or services
between different hosts or clusters within the distributed system.
Use Cases: Container migration enables flexible deployment, scaling, and resource
management. For example, containers can be migrated to new hosts to balance resource usage
or perform maintenance without downtime.
1. Crash Failures
Nodes abruptly halt or crash without warning.
This type of failure is characterized by sudden and complete loss of functionality.
Crash failures can lead to data loss or inconsistency if not handled properly.
Systems employ techniques like redundancy and checkpointing to recover from crash failures.
Detecting and isolating crashed nodes is essential for maintaining system integrity.
2. Byzantine Failures
Nodes exhibit arbitrary or malicious behavior, intentionally providing false information.
Byzantine failures can result from compromised nodes or malicious attacks.
They pose significant challenges to system reliability and trustworthiness.
Byzantine fault-tolerant algorithms are used to detect and mitigate these failures.
Consensus protocols and cryptographic techniques help ensure the integrity of communication.
3. Transient Failures
Failures occur temporarily and may resolve on their own.
They are often caused by transient environmental conditions or network glitches.
Transient failures can be challenging to reproduce and diagnose.
Implementing retry mechanisms and exponential backoff strategies can mitigate their impact.
Monitoring and logging transient failures help in identifying underlying causes.
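A minimal retry-with-exponential-backoff sketch of the kind mentioned above; the retry count, delays, and exception type are illustrative assumptions:

```python
# Retry with exponential backoff: a common way to ride out transient failures.
import time

def call_with_retries(operation, max_attempts=5, base_delay=0.1):
    for attempt in range(max_attempts):
        try:
            return operation()                       # e.g. a remote call that may fail transiently
        except OSError:
            if attempt == max_attempts - 1:
                raise                                # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))  # 0.1 s, 0.2 s, 0.4 s, ...
```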
4. Performance Failures
Nodes degrade in performance, leading to slower response times or reduced throughput.
Performance failures can result from resource contention, bottlenecks, or hardware degradation.
They negatively impact the system's scalability and user experience.
Load balancing and resource provisioning techniques help alleviate performance failures.
Monitoring system metrics and performance tuning are crucial for detecting and mitigating
performance issues.
5. Network Partitions
Segments of the network become isolated, leading to communication failures between nodes.
Network partitions can occur due to network outages, misconfigurations, or hardware failures.
They pose challenges to maintaining data consistency and synchronization.
Distributed consensus algorithms and quorum systems are used to handle network partitions.
What are the different data-centric consistency models?
1. Data-centric consistency models aim to provide a system-wide consistent view of the data
store. A data store may be physically distributed across multiple machines.
2. Each process that can access data from the store is assumed to have a local or nearby copy
of the entire store available.
I) Strict Consistency Model: Any read on a data item X returns a value corresponding to the
result of the most recent write on X. This is the strongest form of memory coherence which has
the most stringent consistency requirement.
III) Linearizability: 1. It is weaker than strict consistency, but stronger than sequential consistency.
2. A data store is said to be linearizable when each operation is timestamped and the result of
any execution is the same as if the (read and write) operations by all processes on the data
store were executed in some sequential order.
V) FIFO Consistency: 1. It is weaker than causal consistency, simple and easy to implement
2. This model ensures that all write operations performed by a single process are seen by all
other processes in the order in which they were performed, like a single process in a pipeline.
VI) Weak Consistency: 1. The basic idea behind the weak consistency model is enforcing
consistency on a group of memory reference operations rather than individual operations.
2. A Distributed Shared Memory system that supports the weak consistency model uses a
special variable called a synchronization variable which is used to synchronize memory.
VII) Release Consistency: 1. In this model, a process tells the system whether it is entering or exiting a critical
section, so that the system performs the appropriate operation when a synchronization variable is
accessed by the process.
3. Release consistency can be viewed as a synchronization mechanism based on barriers
instead of critical sections.
VIII) Entry Consistency: 1. In this model every shared data item is associated with a
synchronization variable.
2. In order to access consistent data, each synchronization variable must be explicitly acquired.
3. Release consistency affects all shared data but entry consistency affects only those shared
data associated with a synchronization variable.
The requirement of mutual exclusion is that when process P1 is accessing a shared resource
R1, another process should not be able to access resource R1 until process P1 has finished its
operation with resource R1.
Algorithm:
Requesting the critical section (CS):
1. If the site does not have the token, then it increments its sequence number Reqi[i] and sends
a REQUEST(i, sn) message to all other sites (sn = Reqi[i]).
2. When a site Sj receives this message, it sets Reqj[i] to max(Reqj[i], sn). If Sj has the idle
token, then it sends the token to Si if Reqj[i] = Last[i] + 1.
Message Complexity:
The algorithm requires 0 messages if the site already holds the idle token at the time
of the critical section request, or a maximum of N messages per critical section execution.
These N messages involve:
(N - 1) REQUEST messages
1 reply (token) message
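A simplified sketch of one site's logic in this token-based scheme, using the Req/Last arrays from the description above; the transport send(dest, msg) and the class name are illustrative assumptions, and the token's internal queue is omitted for brevity:

```python
# Simplified sketch of the token-based mutual exclusion logic for one site.
class TokenSite:
    def __init__(self, site_id, n, has_token=False):
        self.id = site_id
        self.n = n
        self.req = [0] * n               # Req[j]: highest request number seen from site j
        self.token = {"last": [0] * n} if has_token else None
        self.in_cs = False

    def request_cs(self, send):
        if self.token is not None:       # already holds the idle token: 0 messages needed
            self.in_cs = True
            return
        self.req[self.id] += 1
        for j in range(self.n):          # (N - 1) REQUEST messages
            if j != self.id:
                send(j, ("REQUEST", self.id, self.req[self.id]))

    def on_request(self, i, sn, send):
        self.req[i] = max(self.req[i], sn)
        if self.token is not None and not self.in_cs \
                and self.req[i] == self.token["last"][i] + 1:
            tok, self.token = self.token, None
            send(i, ("TOKEN", tok))      # the single token (reply) message

    def on_token(self, tok):
        self.token = tok
        self.in_cs = True

    def release_cs(self, send):
        self.in_cs = False
        self.token["last"][self.id] = self.req[self.id]
        for j in range(self.n):          # pass the token on to an outstanding request, if any
            if j != self.id and self.req[j] == self.token["last"][j] + 1:
                tok, self.token = self.token, None
                send(j, ("TOKEN", tok))
                break
```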
Requesting a Token:
1. The node adds “self” in its request queue.
2. Forwards the request to the parent.
3. The parent adds the request to its own request queue.
4. If the parent does not hold the token and has not already sent a request to get the token,
it sends a request to its parent for the token.
5. This process continues till we reach the root (the holder of the token).
Releasing a Token:
1. Ultimately a request will reach the token holder.
2. The token holder will wait till it is done with the critical section.
3. It will forward the token to the node at the head of its request queue.
a. It removes the entry.
b. It updates its parent pointer.
4. Any subsequent node will do the following:
c. Dequeue the head of the queue.
d. If “self” was at the head of its request queue, then it will enter the critical section.
e. Otherwise, it forwards the token to the dequeued entry.
5. After forwarding the token, a process needs to make a fresh request for the token if it still has
outstanding entries in its request queue.
Group communication in distributed systems refers to the process where multiple nodes or
entities communicate with each other as a group.
Instead of sending messages to individual recipients, group communication allows a sender to
transmit information to all members of a group simultaneously.
This method is essential for coordinating actions, sharing data, and ensuring that all participants
in the system are informed and synchronized. It's particularly useful in scenarios like
collaborative applications and real-time updates
Multicast communication involves sending a single message from one sender to multiple
receivers simultaneously within a network. It is particularly useful in distributed systems where
broadcasting information to a group of nodes is necessary
Multicast lets a sender share a message with a specific group of receivers that want it.
This way, the sender can reach many receivers at once, which is more efficient than sending
separate messages to each one.
This approach is often used to send updates to subscribers or in collaborative applications
where real-time sharing of changes is needed.
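A minimal sketch of sending one message to a multicast group over UDP; the group address 224.1.1.1 and the port are illustrative assumptions:

```python
# Multicast sketch: one send reaches every receiver that has joined the group.
import socket

GROUP, PORT = "224.1.1.1", 5007       # illustrative multicast group address and port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay on the local network
sock.sendto(b"update for all group members", (GROUP, PORT))
sock.close()
# Receivers join the group (IP_ADD_MEMBERSHIP socket option) and all get this single
# transmission, instead of the sender unicasting a separate copy to each of them.
```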
Broadcast communication involves sending a message from one sender to all nodes in the
network, ensuring that every node receives the message
Broadcast is when a sender sends a message to every node in the network without targeting
specific recipients.
Messages are delivered to all nodes at once using a special address designed for this purpose.
It’s often used for network management tasks, like sending status updates, or for emergency
alerts that need to reach everyone quickly.
Replication in distributed systems refers to the process of creating and maintaining multiple copies (replicas) of
data, resources, or services across different nodes (computers or servers) within a network. The primary goal of
replication is to enhance system reliability, availability, and performance by ensuring that data or services are
accessible even if some nodes fail or become unavailable.
REPLICATION MODELS:
I) Master-Slave Model:
1. In this model, one of the copies is the master replica and all the other copies are slaves.
2. In this model, the functionality of the slaves is very limited, so the configuration is very
simple. The slaves are essentially read-only.
3. Most master-slave services ignore all the updates or modifications performed at a
slave and "undo" the update during synchronization, making the slave identical to the master.
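A toy sketch of this master-slave arrangement (class names are illustrative assumptions): all updates go to the master, the slaves only serve reads, and synchronization overwrites a slave's copy with the master's state.

```python
# Toy master-slave replication sketch: slaves are read-only copies of the master.
class MasterReplica:
    def __init__(self):
        self.data = {}
    def write(self, key, value):        # updates are accepted only at the master
        self.data[key] = value

class SlaveReplica:
    def __init__(self, master):
        self.master = master
        self.data = {}
    def read(self, key):                # slaves serve reads from their local copy
        return self.data.get(key)
    def synchronize(self):              # the slave is made identical to the master
        self.data = dict(self.master.data)

master = MasterReplica()
slave = SlaveReplica(master)
master.write("x", 1)
slave.synchronize()
print(slave.read("x"))                  # -> 1
```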
II) Client – Server Model:
1. It is like the master-slave model: it designates one server, which serves multiple clients.
2. In Client- server replication all the updates must be propagated first to the server, which then
updates all the other clients.
3. Since all updates must go through the server, the server acts as a physical synchronization
point.
4. In this model the conflicts which occur are always detected only at the server and only the
server needs to handle them.
III) Peer–to–Peer Model: In this model any replica can synchronize with any other replica, and
any file system modification or update can be applied at any replica.
3. These systems can propagate updates faster by making use of any available connectivity.
4. They provide a very rich and robust communication framework.
5. They are more complex in implementation and in the states they can achieve. One more
problem with this model is scalability.