Unit-2 Simplified
In distributed systems, imagine you have multiple computers (or nodes) connected to each other, all
working together to accomplish some task. Now, time and global states come into play in managing how
these computers coordinate and communicate with each other.
1. Time: In a distributed system, time can be a tricky concept because each computer operates
independently and might have its own clock. So, when we talk about time in this context, we're usually
concerned with establishing some sort of synchronized time across all the computers. This helps ensure
that they all agree on when events happen and in what order. Without synchronized time, it's
challenging to coordinate actions across the system reliably.
2. Global States: When you have multiple computers doing their own thing, it becomes important to
understand the collective state of the entire system at any given point. That's what we mean by global
states. It's like taking a snapshot of all the computers and their current states (like what data they have,
what they're doing, etc.) at a particular moment in time. Understanding these global states helps in
debugging, monitoring, and ensuring the system behaves as expected.
So, in simple terms, in distributed systems, time and global states are about making sure all the
computers are on the same page regarding when things happen and what the overall state of the system
is at any given moment. This coordination ensures that the system works smoothly and reliably, despite
its distributed nature.
1. Clocks: In a distributed system, each computer has its own clock, just like how you might have a clock
on your phone or computer. But here's the catch: these clocks might not be perfectly synchronized. So,
when we talk about clocks in distributed systems, we're often concerned with making sure that even
though they might not show the exact same time, we can still understand the order of events that
happen across all the computers.
2. Events: Events are things that happen in the system, like a message being sent from one computer to
another or a process starting or ending. In a distributed system, it's crucial to understand the order in
which these events occur, especially when they're happening on different computers. So, we use the
concept of clocks to help us order these events correctly and understand the sequence of actions
happening across the system.
3. Process States: Every computer in a distributed system is running some kind of program or process.
The state of these processes refers to what they're currently doing or what data they're holding.
Understanding the states of these processes is important for keeping track of the overall behavior of the
system and ensuring that everything is working as intended.
So, in simple terms, clocks help us keep track of time (even if they're not perfectly synchronized), events
help us understand what's happening in the system and in what order, and process states tell us what
each part of the system is up to at any given moment. These concepts are essential for managing and
understanding distributed systems.
Clock synchronization:
At its core, clock synchronization involves aligning the clocks of different computers to a common time
reference, thereby establishing a consistent notion of time across the system. This common reference
time allows distributed processes to agree on the order of events, despite variations in clock speeds or
network delays.
One common approach to clock synchronization is the use of network time protocols, such as the
Network Time Protocol (NTP) or the Precision Time Protocol (PTP). These protocols facilitate the
exchange of timing information between computers over a network, enabling them to adjust their clocks
accordingly. NTP, for instance, operates by periodically querying time servers on the network and
adjusting the local clock based on the received time information. PTP, on the other hand, offers higher
precision synchronization suitable for applications requiring sub-microsecond accuracy, such as industrial
automation or financial trading systems.
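The query-and-adjust step that NTP performs can be sketched with the standard four-timestamp calculation.
This is a simplified illustration, not a full NTP client: real implementations also filter samples and
compare several servers. The function name and the millisecond values below are made up for the example.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one request/reply exchange.

    t0: client sends request   (read from the client clock)
    t1: server receives it     (read from the server clock)
    t2: server sends reply     (read from the server clock)
    t3: client receives reply  (read from the client clock)
    """
    # Offset: how far the server clock is ahead of the client clock,
    # assuming the network delay is the same in both directions.
    offset = ((t1 - t0) + (t2 - t3)) / 2
    # Delay: total time spent on the network, excluding server processing.
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Hypothetical exchange, in milliseconds: the client clock is 5000 ms behind,
# each network hop takes 100 ms, and the server takes 10 ms to reply.
offset, delay = ntp_offset_and_delay(t0=100_000, t1=105_100, t2=105_110, t3=100_210)
print(offset, delay)  # 5000.0 200
```

The client would then move its clock forward by the estimated offset. The estimate is exact only when
the two network delays are equal, which is why variable delays limit the accuracy that can be achieved.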
Another technique, which orders events rather than synchronizing physical clocks, is the use of logical
clocks, such as Lamport clocks or vector clocks. These logical clocks do not rely on physical time, but
rather on the ordering of events within the system. Lamport clocks, for example, assign each event a
counter value such that an event that causally precedes another always receives the smaller timestamp.
This gives an ordering consistent with causality even in the absence of synchronized physical clocks,
although two Lamport timestamps alone cannot show that the events are causally related; vector clocks
add that information. While logical clocks do not provide real-time synchronization, they are valuable
for maintaining consistency and ordering in distributed systems.
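As a rough illustration of how a Lamport clock works, here is a minimal sketch. The class and method
names are chosen for this example only, and message passing is simulated by handing the timestamp
from one object to another.

```python
class LamportClock:
    """A logical clock: a counter that moves forward on every event."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # A local event: advance the counter by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event too; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, message_time):
        # Jump past both our own time and the sender's time, so the
        # receive event is always ordered after the send event.
        self.time = max(self.time, message_time) + 1
        return self.time


p = LamportClock()
q = LamportClock()
p.tick()                  # p's clock is now 1
stamp = p.send()          # p's clock is now 2; the message carries 2
q.tick()                  # q's clock is now 1
print(q.receive(stamp))   # 3, which is later than the send timestamp 2
```

The receive rule is what keeps the timestamps consistent with cause and effect: a message is never
received at a logical time earlier than the time at which it was sent.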
However, achieving perfect clock synchronization in a distributed system is inherently challenging due to
factors such as network latency, clock drift, and Byzantine faults. Network delays and variable
communication times can introduce uncertainty in clock adjustments, while clock drift, caused by
hardware variations or temperature fluctuations, can lead to discrepancies that grow gradually.
Byzantine faults, characterized by arbitrary or malicious behavior of nodes, pose additional challenges to
achieving consensus on time values.
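To get a feel for how quickly drift adds up, the arithmetic can be sketched as below. The drift rate of
20 parts per million and the 1 millisecond skew limit are assumed figures for illustration, not values
from these notes, and the function names are made up.

```python
def accumulated_drift(drift_ppm, elapsed_seconds):
    """Seconds of error a clock builds up, given its drift in parts per million."""
    return drift_ppm * 1e-6 * elapsed_seconds


def resync_interval(drift_ppm, max_skew_seconds):
    """How often two clocks must resynchronize to stay within max_skew_seconds.

    Worst case assumed: the two clocks drift in opposite directions,
    so they move apart at twice the drift rate.
    """
    return max_skew_seconds / (2 * drift_ppm * 1e-6)


print(accumulated_drift(20, 86_400))  # about 1.728 seconds of error per day
print(resync_interval(20, 0.001))     # about 25 seconds between resyncs
```

Even a small drift rate therefore forces regular resynchronization when the allowed skew is tight.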
In conclusion, clock synchronization in distributed systems plays a pivotal role in maintaining coherence,
consistency, and reliability across multiple computing nodes. Whether through network time protocols,
logical clocks, or fault-tolerant algorithms, the pursuit of synchronized time fosters order and
coordination in the inherently decentralized landscape of distributed computing. As distributed systems
continue to evolve and proliferate, the quest for precise and robust clock synchronization remains a
cornerstone of their design and operation.
Distributed debugging:
Distributed debugging refers to the process of diagnosing and resolving issues or errors in a distributed
system. In a distributed system, where multiple components or nodes interact with each other over a
network, identifying the root cause of a problem can be challenging due to the complexity of
interactions and the potential for issues to arise in various parts of the system simultaneously.
Side headings:
• Monitoring Metrics
• Logging Mechanisms
• Observability Tools
• Event Correlation
• Anomaly Detection
• Scope Narrowing
• Reproduction Techniques
• Fault Injection
• Distributed Tracing
• Remote Debugging
• Debugging Frameworks
6. Iterative Refinement
• Hypothesis Generation
• Experimentation
Berkeley algorithm:
The Berkeley algorithm is a method used in distributed systems to synchronize the clocks of multiple
computers. Here's how it works in simple terms:
Imagine you have a group of friends who all have slightly different clocks on their phones. Some clocks
might be a little fast, while others might be a bit slow. The Berkeley algorithm helps them agree on a
common time, even though their clocks aren't perfectly synchronized.
1. Leader Election: First, they choose one friend to be the leader. This friend coordinates the clock
synchronization process.
2. Time Query: The leader asks each friend to send their current time.
3. Average Calculation: Once the leader receives the times from everyone, it calculates the average
time, counting its own clock as well.
4. Time Adjustment: The leader then tells each friend how much they need to adjust their clock to match
the average time. For example, if one friend's clock is 2 minutes ahead of the average and another
friend's clock is 1 minute behind it, the leader tells the first to move back by 2 minutes and the second
to move forward by 1 minute.
5. Adjustment Confirmation: Each friend adjusts their clock as instructed by the leader.
By following these steps, the group of friends can synchronize their clocks to a common time, making it
easier for them to coordinate activities and events. The Berkeley algorithm helps ensure that even
though their clocks might not be perfectly accurate, they're all at least in agreement with each other.
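The averaging and adjustment steps can be sketched as follows. This is a simplified illustration under
stated assumptions: it includes the leader's own clock in the average, and it leaves out the
compensation for message travel time and the rejection of wildly wrong clocks that a fuller version
would need. The function name, the friends' names, and the times (in minutes) are invented for the
example.

```python
def berkeley_adjustments(leader_time, follower_times):
    """Return the leader's own adjustment and a per-follower adjustment.

    follower_times maps each follower's name to the time it reported.
    A positive adjustment means "move your clock forward".
    """
    all_times = [leader_time] + list(follower_times.values())
    average = sum(all_times) / len(all_times)
    follower_adjustments = {
        name: average - reported for name, reported in follower_times.items()
    }
    return average - leader_time, follower_adjustments


# Times in minutes: the leader reads 180, Asha is 2 minutes ahead,
# Ben and Chen are each 1 minute behind.
leader_adj, adjustments = berkeley_adjustments(180, {"Asha": 182, "Ben": 179, "Chen": 179})
print(leader_adj)   # 0.0
print(adjustments)  # {'Asha': -2.0, 'Ben': 1.0, 'Chen': 1.0}
```

Sending adjustments rather than the average time itself means a delayed message does not hand a
friend a stale absolute time.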
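Mutual exclusion:
Now imagine a group of friends sharing a single toy that only one of them can use at a time. A simple
way to share it is to put one friend in charge of handing it out:
Requesting Access: When a friend wants to use the toy, they send a request to a coordinator (let's call
them the "toy coordinator").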
Permission Granted: The toy coordinator checks if the toy is currently in use. If not, they grant
permission to the friend requesting it. If the toy is in use, they ask the friend to wait.
Using the Toy: Once permission is granted, the friend can use the toy. They let everyone know that
they're using it.
Releasing the Toy: When they're done, they tell the toy coordinator they're finished, and the toy
coordinator updates everyone else that the toy is available again.
This process ensures that only one friend can use the toy at a time, preventing conflicts and ensuring fair
access for everyone. In a distributed system, this concept is applied to resources like files, databases, or
critical sections of code to ensure that only one process can access them at a time, even though they're
spread across different computers.
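The request, grant, and release steps above can be sketched as a small coordinator class. This is a
single-program simulation with invented names; in a real distributed system the requests and replies
would travel as network messages. It also assumes that a released resource goes straight to whoever
has been waiting longest.

```python
from collections import deque


class Coordinator:
    """Grants one resource to one process at a time, in request order."""

    def __init__(self):
        self.holder = None       # who currently has the resource, if anyone
        self.waiting = deque()   # processes asked to wait, oldest first

    def request(self, process_id):
        # Grant immediately if the resource is free, otherwise queue the caller.
        if self.holder is None:
            self.holder = process_id
            return True
        self.waiting.append(process_id)
        return False

    def release(self, process_id):
        # Only the current holder may release the resource.
        if self.holder != process_id:
            raise ValueError(f"{process_id} does not hold the resource")
        # Hand the resource to the longest-waiting process, if there is one.
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


toy = Coordinator()
print(toy.request("A"))  # True: the toy was free, so A gets it
print(toy.request("B"))  # False: B has to wait
print(toy.release("A"))  # B: the toy passes to the waiting friend
```

Having a single coordinator keeps the rule simple to enforce, at the cost of everything depending on
that one coordinator staying available.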