
Unit-1


What's a Distributed System?

A distributed system is a network of interconnected computers that collaborate to perform tasks, behaving as a unified system even if they are physically separated. This approach allows for more efficient processing, resource sharing, and improved reliability.

Think of them as a team that communicates to achieve a common goal. This teamwork
allows for better efficiency and reliability.

1. Concurrency: Concurrency in distributed systems means that multiple tasks can progress
simultaneously. This is achieved through the parallel execution of processes on different
computers, enhancing overall system performance. For example, in a distributed database
system, multiple users can perform queries at the same time without waiting for each
other.
Simple Version: Multiple tasks happen at the same time, like cooking different
dishes simultaneously in a kitchen.

2. No Global Clock: Since computers in a distributed system may be spread across the
globe, maintaining a synchronized global clock is impractical. Each computer typically has
its own local clock, and mechanisms such as timestamps are used to order events. This
lack of a global clock makes coordinating actions more complex but is essential for the
independence of distributed components.
Simple Version: Since the computers can be far away, keeping them all in sync with
the same time is tricky. They each have their own time, like having watches set to
different times

3. Independent Failures: Distributed systems are designed to be resilient in the face of individual component failures. If one computer experiences a malfunction or goes offline,
the rest of the system can continue operating. This fault tolerance is critical for ensuring
the reliability of services. For instance, in a cloud computing environment, if one server
fails, other servers can take over the workload seamlessly.
Simple Version: If one computer has a glitch, the rest can keep going. It's like if one
light goes out in a string of fairy lights, the others still shine.

4. Transparency: Transparency in distributed systems aims to make the system appear as a single entity to users despite its underlying complexity. This involves hiding details
related to the distribution of resources, communication protocols, and system failures.
Users interact with the system without needing to understand its internal workings.
Achieving transparency enhances user experience and simplifies the use of distributed
applications.
Simple Version: You use the system as if it were a single computer, without needing to know how many machines are working behind the scenes.

Examples of Distributed Systems:


1. World Wide Web (WWW): The WWW is a vast distributed system where web servers,
databases, and client devices collaborate. When a user requests a webpage, multiple
servers may be involved in processing the request. Content delivery networks (CDNs)
distribute content across various servers globally, optimizing user experience by
minimizing latency.
Imagine the internet as a vast library. When you request information (a book),
servers (librarians) from various locations collaborate to fetch and deliver the
content to you.

2. Cloud Computing: Cloud computing platforms like AWS or Azure exemplify distributed
systems. These services provide users with virtualized computing resources on-demand.
Users can deploy applications on a network of servers, and the system dynamically
allocates resources based on demand. This scalability is crucial for handling varying
workloads efficiently.
Simple Version: Cloud computing is like renting internet space for your data, allowing easy access from anywhere. It's like storing photos in a virtual space without the bother of physical devices.

Advantages and Applications in Depth:

Advantages:

1. Always Works (Reliability):


The reliability of distributed systems stems from their ability to continue functioning even
if individual components fail. Redundancy and fault-tolerant mechanisms ensure that
service interruptions are minimized, making distributed systems suitable for critical
applications like financial transactions.
• Simple Version: Even if one computer has a problem, the others keep going,
making sure things don't break easily.

2. Handles a Lot of Work (Scalability): Scalability allows a distributed system to handle increased workloads efficiently. Horizontal scalability, achieved by adding more
computers to the network, is a characteristic of many distributed systems. This is
particularly beneficial for applications with unpredictable or varying demands, such as
online retail platforms during peak shopping seasons.
Simple Version: If we need more computer power, we can add more
computers to the team, making it stronger and faster.

3. Sharing is Caring (Resource Sharing): Resource sharing in distributed systems involves the pooling of computing resources, such as storage and processing power. This sharing
optimizes resource utilization across the system. For instance, in a peer-to-peer file-
sharing network, users share their storage space and bandwidth, collectively creating a
distributed storage system.
Simple Version: Computers in the team share things like files and power,
helping each other out.

Applications:
1. Big Data Processing: Distributed systems play a pivotal role in big data processing,
where massive datasets are analyzed to extract valuable insights. Technologies like
Apache Hadoop distribute data processing tasks across a cluster of computers, enabling
the efficient handling of vast amounts of information.
Simple Version: When there's a ton of information to go through, like figuring out
what people like on the internet, a distributed system makes it faster and easier
2. Online Social Networks: Social networking platforms rely on distributed systems to
manage user data, handle interactions, and ensure responsiveness. The distribution of
data and processing across multiple servers enables these platforms to support a large
user base and provide seamless user experiences.
Simple Version: When you post a picture or chat with friends online, a team
of computers across the world makes sure everything runs smoothly.

Distributed Computing vs. Parallel Processing:

Distributed computing and parallel processing share the goal of improving computational
efficiency but differ in their approaches. Distributed computing involves multiple computers
collaborating over a network, often geographically dispersed. In contrast, parallel processing
utilizes multiple processors within a single computer.

Distributed systems offer several advantages over parallel processing systems:

• (Works Everywhere) Flexibility: Distributed systems are more flexible in terms of geographical distribution. Computers in a distributed system can be located anywhere,
facilitating collaboration over large distances. This flexibility is especially valuable in
scenarios where resources need to be utilized globally.
• Simple Version: The team of computers can be anywhere globally, making it
more flexible.

• (Gets Stronger Easily) Scalability: Distributed systems can scale horizontally by adding more computers to the network, providing a more scalable solution. In contrast, the
scalability of parallel processing is constrained by the limitations of a single machine.
• Simple Version: Adding more computers to the team makes it stronger,
while one big computer can only do so much.

• (Stays Strong Even if Some Computers Have Issues) Fault Tolerance: The independence of distributed components enhances fault tolerance. If one part of the
system fails, other components can continue to operate. Parallel processing systems,
being confined to a single machine, are more vulnerable to complete failure if the
machine experiences issues.
Simple Version: If one computer has a problem, the others keep going, unlike one
big computer that might stop working completely

Distributed Transparency in Depth:

1. Access Transparency: Access transparency ensures that users can access resources without being aware of their physical location. This is achieved through mechanisms such as naming services, where users refer to resources by names rather than specific addresses. For example, in a distributed file system, users can access files without needing to know the exact server where the file is stored.

Simple Version: You can get what you need without knowing where it is.

2. Location Transparency: Location transparency hides the physical location of resources or services from users. Users can interact with a resource or service without being
concerned about where it is situated. This transparency is crucial for systems with
dynamic resource allocation, as the actual location of resources can change over time
without affecting user interactions.

Simple Version: It doesn't matter where something is; you can use it without
worrying about where it's kept.

3. Concurrency Transparency: Concurrency transparency allows users to execute multiple tasks concurrently without being aware of potential interference. In a distributed system,
multiple processes may be running simultaneously, and mechanisms such as locks or
transactions ensure that the concurrent execution of tasks does not lead to
inconsistencies. This transparency simplifies the development of applications that require
concurrent processing.

Simple Version: Many things happen at the same time, but you don't need to worry
about them getting mixed up.

4. Failure Transparency: Failure transparency ensures that users are unaware of failures in
the system. Even if a component fails, the system continues to operate, and users may
not experience any disruption. This is achieved through redundancy, fault tolerance
mechanisms, and error recovery strategies. For instance, in a distributed database, data
replication can ensure that data remains available even if a part of the system fails.

Simple Version: Even if one part has a problem, the rest keeps going, and you might
not even notice.

5. Replication Transparency: Replication transparency conceals from users whether a resource or service is replicated for redundancy. Replication is a common technique in
distributed systems to enhance reliability. Users can access a resource or service without
needing to know whether there are multiple copies of it. This transparency simplifies the
management of replicated resources and ensures a consistent user experience.
Simple Version: Sometimes, there are extra copies of things to make sure nothing
gets lost. You don't need to know about these copies; everything just works.

Resource Sharing in Distributed Systems:


Resource sharing in distributed systems involves allowing multiple processes to access and use
resources such as files, databases, or services. This is typically achieved through mechanisms like
remote procedure calls (RPC), message passing, or shared data structures.

Example: File Sharing in a Distributed System. Consider a distributed file system where
multiple nodes share access to a common set of files. In this scenario, a user on one node may
request to read or write to a file that resides on another node. The system must manage file
access, handle concurrency control, and ensure data consistency across nodes.
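To make the idea of remote access concrete, here is a minimal sketch (not part of the original notes) of a file-read operation exposed over RPC using Python's built-in xmlrpc modules; the port, function name, and file name are illustrative assumptions, and real systems would add locking and consistency checks.

# Server side: a node exposes read_file over XML-RPC so other nodes can call it remotely.
from xmlrpc.server import SimpleXMLRPCServer

def read_file(name):
    # In a real distributed file system this would also handle locking,
    # concurrency control, and consistency across replicas.
    with open(name, "r") as f:
        return f.read()

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
server.register_function(read_file, "read_file")
server.serve_forever()

# Client side (run on another node): the remote call looks like a local function call.
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.read_file("shared.txt"))  # "shared.txt" is just an example file name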

The major issues in designing a distributed system in simpler terms:

1. Heterogeneity:
• In a distributed system, we use different types of networks, operating systems,
computer hardware, and programming languages.
• The internet communication protocol helps hide these differences in networks,
and middleware (software in between) can handle other differences.
• Explanation: Imagine a group of friends using different types of phones,
laptops, and speaking different languages. Making them work together
seamlessly can be challenging.

2. Openness:
• A distributed system should be easy to expand. This means we should be able to
add new parts to the system easily.
• We need to create interfaces for different components so that new parts can be
added to the system without much trouble.
• Explanation: Think of a video game that can be updated with new levels.
The game should be designed so that adding these new levels doesn't break
the existing game and is easy to integrate.

3. Security:
• To keep information safe when it travels over a network, we use encryption. This
makes sure that shared resources are protected, and sensitive information
remains a secret.
• Security is crucial to prevent problems like denial of service attacks, which can
disrupt the normal functioning of the system.
• Explanation: Just like you use a secret code with your friend to pass notes so
others can't understand, in a distributed system, we use encryption to keep
information safe when it travels between different parts
4. Scalability:
• Scalability means a system's ability to handle more work as it grows. For example,
if we need to add more machines to manage increased work, the system should
handle it easily.
• The design should allow the system to grow by adding more nodes (machines)
and users without causing problems.
• Explanation: Think of a library that can easily accommodate more books and
more readers as it gets popular without becoming overcrowded or slow.
5. Fault Avoidance:
• Fault avoidance is about designing the system components in a way that reduces
the chances of errors or faults.
• Using reliable components helps improve the system's reliability, making it less
likely to experience issues
• Explanation: Similar to using strong and reliable materials to build a sturdy
bridge that doesn't break easily, in a distributed system, we design
components to minimize the chances of things going wrong.
6. Transparency:
• Transparency in a distributed system means hiding the complex details from
users.
• Users or programmers don't need to worry about where things are located, how
operations are accessed by other parts, or whether something is replicated or
moved. It's all made simple for them.
• Explanation: When you use a smartphone, you don't need to know the
complicated technology behind it. Similarly, in a distributed system, we
want to hide the complexity from users, making it easy for them to use
without understanding the technical details.
Why scalability is important in the design of a distributed system, and some guiding principles:

Why Scalability is Important: Scalability is crucial in designing a distributed system because it helps the system work well as more people or resources are added. It makes sure that as the system grows, it continues to perform efficiently.

Guiding Principles for Designing a Scalable Distributed System:

1. Avoid Centralized Entities:


• Don't use one main point for everything. In a scalable system, if that one point
fails, the whole system might fail.
• Also, when a lot of information flows through one point, like in a wide-area
network, it can get overloaded.
• Example: Imagine a social media platform where all user data is stored in a
single server. If that server goes down, the entire platform might become
inaccessible. Instead, distribute the data across multiple servers to avoid a
single point of failure.
2. Replication and Distributed Control:
• Make copies of important things and let different parts of the system share
control. This helps the system handle more without slowing down.
• Example: In an online shopping system, product information could be
replicated across multiple servers. If one server is busy, users can still access
the information from another server, ensuring a smooth shopping
experience.
3. Avoid Centralized Algorithms:
• Centralized algorithms collect information from all over, process it in one place,
and then send it back. This can create heavy traffic and use a lot of network
space.
• Instead, use decentralized algorithms, where different parts work together
without relying too much on one central point.
• Example: Picture a classroom where all questions are asked to the teacher,
and the teacher answers each one individually. This can create a bottleneck.
In a distributed system, use decentralized algorithms where tasks are shared
among nodes, preventing one central point from becoming a performance
bottleneck.
4. Perform Operations on Client Workstations:
• Do as much work as possible on the computers people use. This helps the system
handle a lot of users without getting too slow.
• All computers in the system should have equal roles to keep things fair and
efficient.
• Example: Consider a group project where every member contributes equally
instead of relying heavily on one person. In a distributed system, distribute
the workload among client workstations to ensure fairness and prevent
overburdening a single point.
5. Control Costs of Physical Resources:
• Make sure that when more people or things join the system, it doesn't cost too
much. The system should be able to grow without becoming too expensive.
• Example: Consider a cloud storage service. If more users need storage space,
a scalable system allows the service provider to add servers without
significantly increasing costs. This way, the system can meet growing
demands without becoming too expensive.

These principles make sure that a distributed system can grow smoothly, work well, and not cost
too much to keep up.

The main limitations of distributed systems, with examples to illustrate each point:

1. Absence of Global Clock:

• Limitation Explanation: In a distributed system, there is no global clock that all processes share. This absence makes it challenging to synchronize actions and events across all nodes in the system.
• Example: Consider a financial trading application where different servers handle
transactions. If there were a global clock, all servers could timestamp transactions
consistently. However, without a global clock, transactions may be recorded with different
timestamps on different servers, causing inconsistencies in financial records.
• Think of a group chat with friends using different devices. If each device has its own
clock, messages might seem to arrive at different times on each device.

2. Absence of Shared Memory:

• Limitation Explanation: In a distributed system, processes do not share a common memory. This lack of shared memory can lead to challenges in maintaining a consistent and up-to-date view of the system.
• Example: Imagine a distributed database where multiple servers store pieces of customer
information. Without shared memory, a customer update performed on one server might
not be immediately visible to another server, leading to inconsistencies in the customer
data across the system.
• Picture two friends working on a project from different places. Without shared memory, they might not see the latest changes made by the other person in real time.

3. Inconsistent Observations:

• Limitation Explanation: Due to the absence of a global clock and shared memory,
different processes in a distributed system may observe events at different times. This
inconsistency can make it difficult to reason about the state of the system.
• Example: In a collaborative document editing system, if two users edit the same
document simultaneously, the absence of a global clock and shared memory might result
in different servers processing the edits in a different order. As a result, the final state of
the document may vary across different parts of the system.
• Imagine friends collaborating on a document online. They might see some changes
instantly but miss others because the system struggles to keep everything perfectly
in sync

4. Difficulty in Obtaining Coherent Global State:

• Limitation Explanation: Because of the absence of a global clock, obtaining a coherent global state of the system is challenging. A coherent state means that all observations from different processes are made at the same physical time.
• Example: In a distributed monitoring system for a network, different monitoring nodes
may observe the network status at slightly different times due to communication delays.
Obtaining a consistent snapshot of the entire network state becomes difficult without a
global clock.
• In a game with players in different locations, trying to capture a snapshot of the
game's status at a specific time becomes tricky without a shared clock.

5. Impacts on System Reasoning and Debugging:

• Limitation Explanation: The limitations in obtaining a coherent global state impact various aspects of system management, including reasoning about system behavior, debugging, and recovering from failures.
• Example: In a distributed e-commerce platform, a customer's order status might be
inconsistent across different servers. Debugging and understanding the sequence of
events leading to an order state become challenging without a clear and synchronized
view of the entire system.
• If an app has a glitch, finding out what went wrong in the sequence of events across
different devices becomes difficult without a shared clock.

In summary, the absence of a global clock and shared memory in distributed systems introduces
complexities that can lead to inconsistencies in observations, difficulties in obtaining a coherent
global state, and challenges in reasoning about system behavior and debugging. These
limitations highlight the need for careful design and coordination mechanisms in distributed
systems.

Causal ordering:

Causal ordering of messages in a distributed system is about making sure messages are received
in the right order based on their cause-and-effect relationships. If one message depends on
another, we want to make sure they are received in the correct sequence.

Problem Scenario: Imagine if we send two messages, let's call them m1 and m2, to someone. If
the person gets m2 before m1, it might cause problems because m2 depends on m1. This can
make the system work incorrectly.

Schiper-Eggli-Sandoz Algorithm: The Schiper-Eggli-Sandoz algorithm is like a set of rules to make sure messages in a distributed system are delivered in the right order. It doesn't care about how many messages there are; it just helps keep things in order.

Sending a Message:

• All messages are timestamped and sent out with a list of all timestamps of messages sent
to other processes.
• Locally store the timestamp of the sent message.

Receiving a Message:

1. A message cannot be delivered if the list of timestamps it carries shows an older message, destined for this process, that has not yet been delivered.
2. If there is no such older message, deliver it and then:
a. Add the timestamp of the delivered message to the local list.
b. Merge the sender's information about messages sent to other processes into the local list.
c. Re-check all buffered messages to see whether any of them can now be delivered.

Explanation:

• Putting timestamps on messages helps us organize them in the right order.


• The list of timestamps makes sure messages are not delivered in the wrong order.
• The algorithm keeps track of what messages are supposed to go where, helping to keep
things organized.
• By checking waiting messages, the algorithm ensures that if a new message arrives, we
can send previous messages if it's now their turn.

So, the Schiper-Eggli-Sandoz algorithm is like a set of rules that helps us make sure messages are
received in the right order, respecting the cause-and-effect relationships between them. It's a
way to keep things organized and prevent problems caused by messages arriving in the wrong
sequence.
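A minimal sketch of the delivery check is shown below. It uses plain vector clocks to decide whether a buffered message can be delivered, which captures the causal-ordering idea but is not the full Schiper-Eggli-Sandoz bookkeeping; the function and variable names are illustrative.

def can_deliver(msg_vc, sender, local_vc):
    # Deliver a message from `sender` only if it is the next one we expect from
    # that sender and everything it causally depends on has already been delivered.
    if msg_vc[sender] != local_vc[sender] + 1:
        return False
    return all(msg_vc[k] <= local_vc[k] for k in range(len(local_vc)) if k != sender)

def on_receive(msg_vc, sender, local_vc, buffer):
    # Buffer the message, then keep delivering anything that has become deliverable.
    buffer.append((msg_vc, sender))
    delivered = True
    while delivered:
        delivered = False
        for item in list(buffer):
            vc, s = item
            if can_deliver(vc, s, local_vc):
                local_vc[s] += 1        # record that this message has now been delivered
                buffer.remove(item)
                delivered = True

For example, if a sender's second message arrives first carrying vector [2, 0, 0] while the receiver's vector is still [0, 0, 0], the check fails and that message waits in the buffer until the first message, carrying [1, 0, 0], has been delivered.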

Chandy-Lamport Global State Recording Algorithm:


The Chandy-Lamport algorithm is a method used in distributed systems to record a consistent
global state across multiple processes. The algorithm uses markers to separate messages in
channels, ensuring that the recorded state reflects a snapshot of the entire system at a particular
point in time.

The Chandy-Lamport algorithm is like a plan for teams of computers to take a group
picture or snapshot together. They use special markers to make sure everyone knows when
the picture starts. This helps in capturing a clear snapshot of what all the computers are
doing at the same time, making it easier to understand and manage the entire team's
activities in a distributed system.

Marker Sending Rule (for Process P):

1. Record Own State: Process P writes down what it's currently doing (local state).
2. Send Markers: For every connection (channel) from P to other processes:
• P sends a special marker before sending any regular messages.
• (Other processes will see this marker and know something important is
happening.)

Marker Receiving Rule (for Process Q):

1. Check Own State: If Q hasn't written down its local state yet:
• Q records the incoming channel the marker arrived on as empty (no regular messages are counted for it).
• Q then follows the marker sending rule: it records its own state and sends markers on all its outgoing channels.
2. If State Already Recorded:
• If Q has already written down its state:
• Q notes down the messages received on its incoming channel after
writing its state but before getting the marker from P.

Explanation:

• This algorithm helps all processes in a group take a snapshot of what they're doing at the
same time.
• When a process decides to take a snapshot (using a marker), it tells others by sending
markers.
• Other processes, upon receiving a marker, note down what messages they've received,
ensuring a clear snapshot.
• The process continues until all processes have taken a snapshot.

Key Points:

• Markers show when the snapshot begins.


• Each process writes down its local state and adjusts its view based on markers.
• The algorithm allows processes to keep working on regular tasks while also taking
snapshots.
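The marker rules above can be sketched roughly as follows; the class, the channel objects with a send() method, and the placeholder helper methods are illustrative assumptions, not a prescribed implementation.

MARKER = "MARKER"   # special marker used to separate pre-snapshot from post-snapshot traffic

class SnapshotProcess:
    def __init__(self, in_channels, out_channels):
        self.in_channels = in_channels              # channels this process receives on
        self.out_channels = out_channels            # channels this process sends on
        self.state_recorded = False
        self.local_snapshot = None
        self.channel_state = {c: [] for c in in_channels}
        self.recording = {c: False for c in in_channels}

    def record_state_and_send_markers(self, marker_channel=None):
        # Marker sending rule: record own state, then send a marker on every
        # outgoing channel before any further regular messages.
        self.local_snapshot = self.capture_local_state()
        self.state_recorded = True
        for c in self.in_channels:
            # Start recording every incoming channel except the one the marker came on.
            self.recording[c] = (c is not marker_channel)
        for c in self.out_channels:
            c.send(MARKER)

    def on_message(self, channel, msg):
        if msg == MARKER:
            if not self.state_recorded:
                # First marker seen: record own state and record this channel as empty.
                self.record_state_and_send_markers(marker_channel=channel)
                self.channel_state[channel] = []
            else:
                # Marker on a channel we were recording: that channel's state is complete.
                self.recording[channel] = False
        else:
            if self.state_recorded and self.recording.get(channel, False):
                # A regular message that was in transit when the snapshot started.
                self.channel_state[channel].append(msg)
            self.handle_application_message(msg)

    def capture_local_state(self):
        return {}        # placeholder for application-specific state

    def handle_application_message(self, msg):
        pass             # placeholder for normal application processing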
Unit-1
This method is crucial for understanding what's happening across different processes in a
distributed system, especially when you need a snapshot of their states for various reasons like
fixing issues or analyzing the system's behavior

Termination Detection by Weight Throwing:

Notations:
• B(DW): a basic message for doing work (computation); DW is the amount of weight (the share of the total work) that the message carries.
• C(DW): a control message, not for actual work; it carries weight DW back to the controlling agent.

Algorithm:

Sending Work (B(DW)):

1. Sending Work Message:


• Imagine Process P has a job to do with weight W. It splits W into two parts, let's
call them W1 and W2. So, W = W1 + W2, and both W1 and W2 are more than
zero.
• P now works with only W1 (P's new weight is W1).
• P sends a special message B(W2) to another process, let's say Q. This message
says, "Hey, I'm doing some work, and it's this heavy (W2)."
2. Receiving Work Message:
• If Process Q gets the message B(W2) from P:
• Q adds the weight W2 to its own weight (W = W + W2).
• If Q wasn't doing anything before, it starts working now.

Sending Control Message (C(DW)):

1. Sending Control Message:


• Any process that is doing work (active) can decide to take a break. It sends a
control message C(W) to a controlling agent. The 'W' is the weight of the process.
• The process sets its weight to 0 (which means it's taking a break and not doing
any work).
2. Receiving Control Message:
• If the controlling agent gets the message C(W):
• The agent adds the received weight W to its own weight.
• If the agent's weight becomes 1 after adding, it means all the work is
done, and the computation is finished.

Explanation:

• Imagine the processes are like workers, and they send messages to each other.
• Work messages (B) are sent when a process wants to do some work, and control
messages (C) are sent when a process wants to take a break.
• The algorithm makes sure everyone is working together and taking breaks as needed
until all the work is done.
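A rough sketch of this weight bookkeeping is given below, assuming a single controlling agent that initially hands out a total weight of 1; the class and method names are illustrative, and exact fractions are used to avoid rounding problems.

from fractions import Fraction

class Worker:
    def __init__(self):
        self.weight = Fraction(0)
        self.active = False

    def receive_work(self, w):
        # Receiving B(W): add the carried weight and become active if idle.
        self.weight += w
        self.active = True

    def send_work(self, other):
        # Split the current weight W into W1 + W2, keep W1, and send B(W2).
        w2 = self.weight / 2
        self.weight -= w2
        other.receive_work(w2)

    def go_idle(self, agent):
        # Becoming idle: send C(W) with all remaining weight to the controlling agent.
        agent.receive_control(self.weight)
        self.weight = Fraction(0)
        self.active = False

class ControllingAgent:
    def __init__(self):
        self.weight = Fraction(0)

    def start(self, first_worker):
        # Hand the total weight of 1 to the first worker; the sum of all weights stays 1.
        first_worker.receive_work(Fraction(1))

    def receive_control(self, w):
        self.weight += w
        if self.weight == 1:
            print("All weight returned: the distributed computation has terminated.")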
Lamport's Logical Clocks:

Lamport's Logical Clocks are a concept in distributed systems designed to order events in a
distributed system. The logical clock values assigned to events represent a partial ordering of the
events, helping to establish a timeline in a distributed environment.

Lamport's Logical Clocks are like a shared timeline for events in a group of computers.
Each event gets a number, and these numbers help us see the order of events. It's like
giving each action a timestamp so we can understand when things happen in a network of
connected computers.

Conditions for Lamport Logical Clocks:

1. Clock Initialization:
• Each process starts with its logical clock initialized to zero.
2. Event Timestamping:
• Every time a process performs an internal event or sends a message, it increments
its logical clock value.
3. Message Reception:
• Upon receiving a message, a process sets its logical clock value to the maximum
of its current value and the timestamp of the received message, plus one.
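As a small illustration of the three rules above, the following sketch keeps one counter per process; the class and method names are just for this example.

class LamportClock:
    def __init__(self):
        self.time = 0                      # rule 1: the clock starts at zero

    def internal_event(self):
        self.time += 1                     # rule 2: tick on every local event
        return self.time

    def send_event(self):
        self.time += 1                     # sending a message is also an event
        return self.time                   # this value is attached to the message

    def receive_event(self, msg_timestamp):
        # rule 3: jump past both our own time and the received timestamp
        self.time = max(self.time, msg_timestamp) + 1
        return self.time

For instance, if a sender stamps a message with 3 and the receiver's clock is at 1, the receive sets the receiver's clock to max(1, 3) + 1 = 4, so the receive is ordered after the send.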

Limitations of Lamport's Logical Clocks with Examples:

1. No Consideration of Event Execution Time:


• Limitation: Lamport's logical clocks do not account for the actual execution time
of events, assuming that events take no time to execute.
• Example: Consider two events, A and B, where A sends a message to B. Lamport's clocks only guarantee that the receive at B gets a larger timestamp than the send at A; they say nothing about how long B takes to process the message, so the clock values cannot be used to reason about actual execution time.
2. Assumption of Instantaneous Message Delivery:
• Limitation: Lamport's logical clocks assume instantaneous message delivery,
ignoring real-world delays in message transmission.
• Example: If a process A sends a message to B, and the message takes some time
to reach B, Lamport's clocks may not accurately reflect the temporal relationship
between the events.
3. Clock Skew and Synchronization:
• Limitation: Lamport's clocks do not synchronize the physical clocks of processes or correct for clock skew; they only provide a logical ordering of events.
• Example: If two processes have slightly unsynchronized physical clocks, the logical clock values say nothing about the real times at which their events occurred.
4. Overhead of Logical Clock Maintenance:
• Limitation: The process of incrementing logical clocks with each event can
introduce overhead.
• Example: In a highly dynamic system with frequent events, constantly updating
logical clocks may lead to unnecessary computational costs.
5. Cannot Infer Causality from Timestamps:
• Limitation: Lamport's clocks guarantee that causally related events receive increasing timestamps, but the converse does not hold: a smaller timestamp does not mean one event happened before the other.
• Example: If events A and B are causally related, their logical clock values will reflect this. However, two independent (concurrent) events also receive ordered timestamps, so the clock values alone cannot reveal the true causal dependencies; vector clocks are needed for that.
6. Complexity in Real-Time System Modeling:
• Limitation: Lamport's clocks are not designed to model real-time systems
accurately.
• Example: In applications where precise timing is crucial, such as systems requiring
microsecond accuracy, Lamport's clocks may not provide the required level of
precision.

In summary, while Lamport's Logical Clocks provide a useful mechanism for establishing a partial
ordering of events in distributed systems, they come with limitations related to the assumptions
made about event execution, message delivery, clock synchronization, and overhead of logical
clock maintenance. In scenarios where these limitations are critical, more advanced clock
synchronization mechanisms and algorithms may be considered.
