Unit 5 Distributed
NOSQL DATABASE
Centralised Database Architecture
Distributed Database System
A distributed system
• A distributed system is a collection of independent computers that
appear to the users of the system as a single coherent system. These
computers or nodes work together, communicate over a network, and
coordinate their activities to achieve a common goal by sharing
resources, data, and tasks.
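To make the "single coherent system" idea concrete, here is a minimal Python sketch, assuming an in-memory key-value store. The hypothetical `Cluster` class spreads keys across several independent `Node` objects by hashing, yet the caller only ever uses one `put`/`get` interface and never needs to know which node holds a key. Networking, replication and failure handling are deliberately left out.

```python
from __future__ import annotations

import zlib


class Node:
    """One independent computer in the cluster (hypothetical, in-memory)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}


class Cluster:
    """Single entry point that hides which node stores which key."""

    def __init__(self, node_names: list[str]) -> None:
        self.nodes = [Node(n) for n in node_names]

    def _node_for(self, key: str) -> Node:
        # A stable hash (CRC-32) so the same key always lands on the same node.
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]

    def put(self, key: str, value: str) -> None:
        self._node_for(key).data[key] = value

    def get(self, key: str) -> str | None:
        return self._node_for(key).data.get(key)


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.put("user:42", "Asha")
print(cluster.get("user:42"))  # Asha
```

The hashing step is only one way to divide work among nodes; it is used here because it keeps the routing rule deterministic without any shared directory.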
Advantages of Distributed Systems
• Applications in distributed systems are inherently distributed applications.
• Information in distributed systems is shared among geographically distributed users.
• Resource sharing: autonomous systems can share resources from remote locations.
• It has a better price-performance ratio and flexibility.
• It has shorter response time and higher throughput.
• It has higher reliability and availability against component failure.
• It is extensible: the system can be extended to more remote locations and can grow incrementally.
1. Consistency
• Consistency means that all the nodes (databases) in a network hold the same copy of a
replicated data item, visible to all transactions. It guarantees that every node in a
distributed cluster returns the same, most recent, successful write, so every client has
the same view of the data. There are various consistency models; consistency in CAP refers
to linearizability (atomic consistency), a very strong form of consistency.
• For example, a user checks his account balance and sees that he has 500 rupees.
He spends 200 rupees on some products, so 200 rupees must be deducted, changing his
account balance to 300 rupees. This change must be committed and propagated to all
other databases that hold this user’s details. Otherwise, there will be inconsistency,
and another database might show his account balance as 500 rupees, which is not true.
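The rupee example can be sketched in Python, assuming each database is an in-memory dictionary. The hypothetical `withdraw_everywhere` helper stands in for the commit step: the debit is written to every replica before the operation counts as done. The second hypothetical helper, `withdraw_one_copy`, reproduces the failure case the example warns about. A real database would also need locking and an atomic commit protocol (for instance two-phase commit) so that a crash halfway through cannot leave the copies disagreeing.

```python
from __future__ import annotations


class Replica:
    """One database copy holding account balances (hypothetical)."""

    def __init__(self, balances: dict[str, int]) -> None:
        self.balances = dict(balances)


def withdraw_everywhere(replicas: list[Replica], user: str, amount: int) -> None:
    """Debit every replica before reporting success, so all copies agree."""
    current = replicas[0].balances[user]
    if amount > current:
        raise ValueError("insufficient balance")
    for replica in replicas:
        replica.balances[user] = current - amount


def withdraw_one_copy(replicas: list[Replica], user: str, amount: int) -> None:
    """Debit only the first replica and never tell the others (the bug)."""
    replicas[0].balances[user] -= amount


consistent = [Replica({"user1": 500}) for _ in range(3)]
withdraw_everywhere(consistent, "user1", 200)
print([r.balances["user1"] for r in consistent])  # [300, 300, 300]

inconsistent = [Replica({"user1": 500}) for _ in range(3)]
withdraw_one_copy(inconsistent, "user1", 200)
print([r.balances["user1"] for r in inconsistent])  # [300, 500, 500]
```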
2. Availability
• Availability means that each read or write request for a data item will either be
processed successfully or will receive a message that the operation cannot be
completed. Every non-failing node returns a response for all the read and write
requests in a reasonable amount of time. The key word here is “every”. In simple
terms, every node (on either side of a network partition) must be able to respond in a
reasonable amount of time.
• For example, user A is a content creator with 1000 other users subscribed to his
channel. Another user B, who is far away from user A, tries to subscribe to user A’s
channel. Since the two users are far apart, they are connected to
different database nodes of the social media network. If the distributed system follows
the principle of availability, user B must be able to subscribe to user A’s channel.
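A minimal Python sketch of the availability behaviour in this example, assuming two in-memory nodes and hypothetical names (`SocialNode`, `subscribe`, `sync`). Node 2 answers user B's subscribe request even while node 1 is unreachable, queues the change, and forwards it once the link is back. The price is that node 1 briefly does not know about the new subscriber: that is the consistency the system gives up in order to stay available.

```python
from __future__ import annotations


class SocialNode:
    """A database node of the hypothetical social network."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.subscribers: dict[str, set[str]] = {}
        self.peer: SocialNode | None = None
        self.peer_reachable = True
        # Writes accepted locally but not yet copied to the peer.
        self.pending: list[tuple[str, str]] = []

    def _store(self, channel: str, user: str) -> None:
        self.subscribers.setdefault(channel, set()).add(user)

    def subscribe(self, channel: str, user: str) -> str:
        # Always accept the write locally so the client gets a timely answer.
        self._store(channel, user)
        if self.peer is not None and self.peer_reachable:
            self.peer._store(channel, user)
        else:
            self.pending.append((channel, user))
        return "subscribed"

    def sync(self) -> None:
        # Once the peer is reachable again, replay the queued writes.
        if self.peer is not None and self.peer_reachable:
            for channel, user in self.pending:
                self.peer._store(channel, user)
            self.pending.clear()


node1, node2 = SocialNode("db-1"), SocialNode("db-2")
node1.peer, node2.peer = node2, node1

node2.peer_reachable = False  # node 1 cannot be reached right now
print(node2.subscribe("userA", "userB"))  # subscribed
print("userB" in node1.subscribers.get("userA", set()))  # False (not synced yet)

node2.peer_reachable = True
node2.sync()
print("userB" in node1.subscribers["userA"])  # True
```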
3. Partition Tolerance
• Partition tolerance means that the system can continue operating even if the network connecting the
nodes has a fault that splits it into two or more partitions, where the nodes in each partition can only
communicate among themselves. That is, the system continues to function in spite of network partitions.
Network partitions are a fact of life. Distributed systems guaranteeing partition tolerance can gracefully
recover from partitions once the partition heals.
• For example, consider the same social media network, where two users are trying to find
the subscriber count of a particular channel. Due to a technical fault, a network outage occurs
and the second database, to which user B is connected, loses its connection with the first database.
The subscriber count is therefore shown to user B from a replica of database 1’s data that was backed
up before the network outage. Because the system keeps serving requests despite the partition, it is
partition tolerant.
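The outage scenario can be sketched in Python, assuming two in-memory databases and hypothetical class names (`Database1`, `Database2`). While partitioned, `Database2` keeps answering from the replica it copied earlier, as in the example. The `mode` parameter is an illustrative addition, not part of the original example: with `"CP"` the node refuses the read instead of risking a stale answer, which previews the trade-off the CAP theorem formalizes. During a partition a node can stay available or stay consistent, not both.

```python
from __future__ import annotations


class Database1:
    """The first database, holding the authoritative subscriber counts."""

    def __init__(self, counts: dict[str, int]) -> None:
        self.counts = counts


class Database2:
    """The database user B is connected to; keeps a replica of Database1."""

    def __init__(self, primary: Database1) -> None:
        self.primary = primary
        self.replica = dict(primary.counts)  # backup copied before any outage
        self.partitioned = False

    def subscriber_count(self, channel: str, mode: str = "AP") -> int:
        if not self.partitioned:
            # Link is up: refresh the replica, then answer.
            self.replica = dict(self.primary.counts)
            return self.replica[channel]
        if mode == "AP":
            # Stay available: answer from the last copy, which may be stale.
            return self.replica[channel]
        # mode == "CP": stay consistent by refusing to answer.
        raise ConnectionError("partitioned from database 1; refusing a possibly stale read")


db1 = Database1({"channelA": 1000})
db2 = Database2(db1)
print(db2.subscriber_count("channelA"))  # 1000

db2.partitioned = True         # network outage between the databases
db1.counts["channelA"] = 1001  # someone subscribes via database 1
print(db2.subscriber_count("channelA"))  # 1000 (served from the replica)
```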
The CAP theorem states that a distributed database can provide at
most two of the three properties: consistency, availability, and
partition tolerance. As a result, database systems prioritize only
two properties at a time. It is possible to attain two properties only