Important Distributed Systems Notes
Distributed Shared Memory (DSM)
DSM is a mechanism that manages memory across multiple nodes and makes inter-process communication transparent to end users: applications behave as if they were running on physically shared memory. DSM thus allows user processes to access shared data without explicit inter-process communication. Every node has its own local memory, provides memory read and write services, and participates in consistency protocols. In other words, DSM implements the shared-memory model in a distributed system even though there is no physical shared memory; all nodes share a single virtual address space, and data moves between the main memories of different nodes as needed.
DSM can be supported by several hardware organizations:
On-Chip Memory:
The CPU and memory are integrated on a single chip and connected directly by address and data lines.
Bus-Based Multiprocessors:
A set of parallel wires called a bus acts as the connection between the CPUs and memory.
Simultaneous access to the same memory by multiple CPUs is prevented by bus arbitration and cache-consistency protocols.
Cache memory is used to reduce bus traffic.
Ring-Based Multiprocessors:
There is no centralized memory or bus; a single shared address space is divided into blocks distributed among the nodes, which are connected in a ring (e.g., Memnet).
DSM offers several further advantages:
Less expensive than a dedicated multiprocessor system.
No single bottleneck in data access.
Scalability: it scales well to a large number of nodes.
Algorithms for implementing DSM:
1. Central-Server Algorithm:
A central server maintains all shared data. It services read requests from other nodes by returning the requested data items, and write requests by updating the data and returning acknowledgement messages.
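A minimal sketch of the central-server algorithm in Python, with network messages reduced to ordinary method calls; the names (CentralServer, ClientNode) and the dictionary-backed store are illustrative assumptions, not part of any standard API:

class CentralServer:
    """The one node that owns all shared data items."""
    def __init__(self):
        self.store = {}  # data item name -> value

    def read(self, key):
        # Service a read request by returning the data item.
        return self.store.get(key)

    def write(self, key, value):
        # Service a write request: update the item, then acknowledge.
        self.store[key] = value
        return "ACK"

class ClientNode:
    """Any other node: every data access is forwarded to the server."""
    def __init__(self, server):
        self.server = server

    def read(self, key):
        return self.server.read(key)

    def write(self, key, value):
        return self.server.write(key, value)

server = CentralServer()
node_a, node_b = ClientNode(server), ClientNode(server)
node_a.write("x", 42)    # write goes to the server, which returns "ACK"
print(node_b.read("x"))  # 42: node B immediately sees node A's write

Because every access goes through one server, consistency is trivial, but the server itself becomes a bottleneck, which motivates the migration algorithm described later.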
What is a distributed file system? Explain the issues related to distributed file systems and also describe its implementation.
What is a Distributed File System?
A Distributed File System (DFS) is a type of file system that allows access to
files and data across multiple physical machines as if they were on a single local
machine. The main goal of a DFS is to facilitate file sharing and storage across
a network of computers, ensuring data availability, redundancy, and efficient
data management.
Key Features of Distributed File Systems
1. Transparency: Users experience seamless file access without needing to
know the file's physical location.
2. Scalability: Can accommodate a growing number of nodes and data without performance degradation.
2. Migration Algorithm:
In contrast to the central-server algorithm, where every data-access request is forwarded to the location of the data, here the data is shipped to the location of the data-access request, which allows subsequent accesses to be performed locally.
Only one node may access a shared data item at a time, and the whole block containing the item migrates rather than the individual item requested.
It is susceptible to thrashing, where pages migrate frequently between nodes while servicing only a few requests each time.
This algorithm also provides an opportunity to integrate DSM with the virtual memory provided by the operating system at each node.
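A minimal sketch of the migration algorithm under the same simplifications; here a "block" is a dictionary of data items, and the block-to-owner map is kept in one place, whereas a real DSM would locate owners via hints, broadcasts, or a manager node:

class Node:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> dict of data items held locally

class MigratingDSM:
    def __init__(self):
        self.owner = {}   # block id -> node currently holding the block

    def access(self, node, block_id, key, value=None):
        # Ship the whole block to the requesting node, not just one item.
        holder = self.owner.get(block_id)
        if holder is not node:
            block = holder.blocks.pop(block_id) if holder else {}
            node.blocks[block_id] = block
            self.owner[block_id] = node   # later accesses here are local
        block = node.blocks[block_id]
        if value is not None:
            block[key] = value            # write
        return block.get(key)             # read

a, b = Node("A"), Node("B")
dsm = MigratingDSM()
dsm.access(a, block_id=0, key="x", value=1)  # block 0 now lives at A
print(dsm.access(b, block_id=0, key="x"))    # 1: whole block moved to B
# Alternating accesses from A and B would now bounce block 0 back and
# forth on every request -- the thrashing behaviour described above.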
Distributed Scheduling:
Distributed scheduling is the process of allocating tasks to various nodes in a
distributed system to achieve optimal performance, load balancing, and efficient
resource utilization. In a distributed system, tasks must be assigned to different
processors or nodes such that the overall execution time is minimized, and
system resources are used effectively.
Key Goals of Distributed Scheduling
1. Load Balancing: Distribute tasks evenly across nodes to prevent any
single node from becoming a bottleneck.
2. Fault Tolerance: Ensure the system can continue functioning smoothly
even if some nodes fail.
3. Minimizing Latency: Reduce the time it takes for tasks to be executed
by minimizing communication delays and ensuring tasks are executed on
appropriate nodes.
4. Maximizing Throughput: Increase the number of tasks completed in a
given time frame by optimizing resource utilization.
Distributed Scheduling Algorithms
Several algorithms have been developed to manage distributed scheduling. They
can be broadly categorized into static and dynamic algorithms.
1. Static Scheduling Algorithms
Static scheduling algorithms make decisions at compile-time before execution
begins. The allocation of tasks to nodes does not change at runtime.
Round Robin:
Tasks are assigned to nodes in a circular order.
Simple and easy to implement, but does not account for node capabilities or task complexities (see the sketch after this list).
Random Assignment:
Tasks are assigned to nodes randomly.
May result in imbalanced loads and does not consider node
performance characteristics.
Min-Min Algorithm:
All tasks are initially unscheduled.
For each task, the minimum completion time over all nodes is computed; the task with the smallest such time is scheduled first, on the node that achieves it.
The process repeats, iteratively assigning tasks to minimize overall execution time.
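Hedged sketches of two of the static schedulers above, Round Robin and Min-Min. Task execution times are assumed known in advance (exec_time[task][node]), which is exactly what lets the decisions be fixed before runtime; the task and node names are made up:

from itertools import cycle

def round_robin(tasks, nodes):
    # Assign tasks to nodes in circular order, ignoring load and speed.
    order = cycle(nodes)
    return {task: next(order) for task in tasks}

def min_min(exec_time):
    # Min-Min: repeatedly pick the task with the smallest achievable
    # completion time and assign it to the node that achieves it.
    unscheduled = set(exec_time)
    ready = {n: 0.0 for times in exec_time.values() for n in times}
    schedule = {}
    while unscheduled:
        candidates = ((t, n, ready[n] + exec_time[t][n])
                      for t in unscheduled for n in exec_time[t])
        task, node, finish = min(candidates, key=lambda c: c[2])
        schedule[task] = node
        ready[node] = finish      # the node is busy until this time
        unscheduled.remove(task)
    return schedule

times = {"t1": {"n1": 3, "n2": 5},
         "t2": {"n1": 2, "n2": 2},
         "t3": {"n1": 4, "n2": 1}}
print(round_robin(times, ["n1", "n2"]))  # t1->n1, t2->n2, t3->n1
print(min_min(times))                    # t3->n2, t2->n1, t1->n1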
2. Dynamic Scheduling Algorithms
Dynamic scheduling algorithms make decisions at runtime, allowing for more
flexibility and adaptability to changing system conditions.
Load Balancing:
Tasks are dynamically assigned based on the current load of each
node.
Nodes with lower load get more tasks to ensure balanced
utilization across the system.
Example: Distributed Hash Table (DHT) based load balancing.
Task Migration:
Tasks can be moved from one node to another during execution to
achieve better load distribution.
Migration decisions are based on the current load and execution
progress.
Example: Condor and MOSIX systems.
Auction-Based Scheduling:
Nodes bid for tasks based on their available resources and
capabilities.
Tasks are assigned to the highest bidding node.
Encourages efficient resource utilization.
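A minimal auction sketch; the bid function (spare capacity after taking the task) and the node/task fields are illustrative assumptions, and real systems use richer cost models:

def bid(node, task):
    # A node bids its spare capacity after taking the task;
    # a node that cannot fit the task abstains.
    spare = node["capacity"] - node["load"] - task["size"]
    return spare if spare >= 0 else None

def auction_assign(task, nodes):
    # Collect bids and award the task to the highest bidder.
    bids = [(bid(n, task), n) for n in nodes if bid(n, task) is not None]
    if not bids:
        return None
    _, winner = max(bids, key=lambda entry: entry[0])
    winner["load"] += task["size"]
    return winner["name"]

nodes = [{"name": "n1", "capacity": 10, "load": 7},
         {"name": "n2", "capacity": 10, "load": 2}]
print(auction_assign({"size": 3}, nodes))  # n2: most spare capacity wins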
Example Algorithm: Load Balancing with Task Migration
Here’s a simple dynamic scheduling algorithm that uses load balancing and task
migration:
1. Initialization:
Each node periodically broadcasts its load status (e.g., CPU usage,
memory usage).
2. Task Assignment:
When a new task arrives, it is assigned to the node with the lowest
load.
3. Load Monitoring:
Nodes continuously monitor their load and the load of neighboring
nodes.
4. Task Migration:
If a node becomes overloaded, it selects a task to migrate to a
neighboring node with lower load.
Migration is based on criteria such as task size, current execution
state, and the load difference between nodes.
5. Consistency and Fault Tolerance:
Migration decisions ensure that the system remains consistent.
Backup mechanisms are in place to handle node failures, such as
task replication or checkpointing.
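A minimal sketch of steps 1 through 4 with the network reduced to in-process objects; the load metric (total queued task size), the overload threshold, and the migration rule are all illustrative assumptions, and step 5 (checkpointing/replication) is omitted:

class SchedNode:
    def __init__(self, name):
        self.name = name
        self.tasks = []        # queued task sizes
        self.neighbours = []

    def load(self):
        # Steps 1 and 3: the load status a node would broadcast/monitor.
        return sum(self.tasks)

def assign(task, nodes):
    # Step 2: a new task goes to the node with the lowest current load.
    target = min(nodes, key=lambda n: n.load())
    target.tasks.append(task)

def rebalance(node, threshold=10):
    # Step 4: an overloaded node sheds work to its lightest neighbour,
    # but only if the move actually reduces the imbalance (this guard
    # is what prevents two nodes from bouncing a task back and forth).
    if node.load() <= threshold or not node.tasks:
        return
    neighbour = min(node.neighbours, key=lambda n: n.load())
    task = min(node.tasks)
    if neighbour.load() + task < node.load():
        node.tasks.remove(task)
        neighbour.tasks.append(task)

a, b = SchedNode("A"), SchedNode("B")
a.neighbours, b.neighbours = [b], [a]
for size in (3, 4, 9):
    assign(size, [a, b])       # loads: A = 12 (3 and 9), B = 4
rebalance(a)
rebalance(b)
print(a.load(), b.load())      # 9 7: A shed its small task to B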
Performance Maintenance in Distributed Scheduling
Maintaining performance in a distributed scheduling system involves several
strategies:
1. Monitoring and Metrics:
Continuously monitor node performance metrics (CPU usage,
memory usage, network latency).
Use these metrics to make informed scheduling and migration
decisions.
2. Adaptive Algorithms:
Implement adaptive algorithms that can adjust their behavior based
on current system conditions.
Example: Algorithms that adapt to workload patterns or changes in
resource availability.
3. Feedback Loops:
Use feedback loops to dynamically adjust scheduling parameters.
Example: Adjusting the frequency of task migration based on
recent load balancing performance.
4. Scalability:
Ensure that the scheduling algorithm scales well with the number
of nodes and tasks.
Distributed data structures and decentralized decision-making can
help achieve scalability.
5. Fault Tolerance:
Incorporate redundancy and failover mechanisms to handle node
failures.
Example: Task replication, checkpointing, and recovery
mechanisms.
6. Load Prediction:
Use predictive analytics to anticipate future loads and proactively
distribute tasks.
Example: Machine learning models that predict workload trends
based on historical data.
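As a minimal stand-in for such a predictive model, an exponential moving average already captures the idea of anticipating load from history; the smoothing factor alpha and the sample values below are made up:

def ema_forecast(samples, alpha=0.3):
    # Predict the next load value from past load samples: recent
    # samples are weighted more heavily than older ones.
    forecast = samples[0]
    for load in samples[1:]:
        forecast = alpha * load + (1 - alpha) * forecast
    return forecast

history = [0.40, 0.55, 0.70, 0.65, 0.80]  # past CPU-utilisation samples
print(round(ema_forecast(history), 3))    # anticipated next-period load

A scheduler could use such forecasts to proactively route new tasks away from nodes whose load is trending upward.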
Conclusion
Distributed scheduling is a critical component of distributed systems, aiming to
optimize resource utilization, balance load, and ensure system reliability. Both
static and dynamic scheduling algorithms have their advantages and trade-offs.
Dynamic algorithms, particularly those incorporating load balancing and task
migration, are essential for adapting to real-time changes in system conditions.
Maintaining performance in distributed scheduling requires continuous
monitoring, adaptive strategies, scalability, fault tolerance, and predictive
capabilities. Through these techniques, distributed systems can achieve high
efficiency, reliability, and performance.