BDT Assignment

The document discusses different data distribution models in NoSQL databases including sharding, replication, consistent hashing, gossip protocols, and multi-master replication. It explains advantages and challenges of each model and how understanding the trade-offs is important for effectively designing distributed NoSQL databases.

Uploaded by

poorvaja.r
Copyright
© All Rights Reserved

BIG DATA TECHNOLOGY

ASSIGNMENT 3

POORVAJA R
201501035
AI & ML

DATA DISTRIBUTION MODEL IN NOSQL:


NoSQL databases are a category of databases that provide a
mechanism for the storage and retrieval of data that doesn't adhere to
the traditional relational database management system (RDBMS)
model. These databases are designed to handle various types of data
and to scale horizontally, which makes them suitable for large-scale
applications.

In NoSQL databases, data distribution models play a crucial role in
ensuring efficient and scalable storage and retrieval of data across
multiple nodes or servers.

1. Sharding:
Sharding, also known as partitioning, involves breaking a large
database into smaller, more manageable pieces called shards. Each
shard is stored on a separate server or node. This distribution model
is particularly effective for horizontal scaling, as new servers can be
added to accommodate growing data volumes.
Advantages:
● Improved scalability: Allows the database to handle larger
amounts of data by distributing it across multiple servers.
● Better performance: Queries can be executed in parallel on
different shards, improving overall performance.
Challenges:
● Data consistency: An operation that touches several shards
needs cross-shard coordination (for example, a distributed
transaction) to remain consistent.
● Query complexity: Queries that do not filter on the shard key
may have to visit every shard and merge the results.
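
As a rough illustration of the idea above, the sketch below routes each key to a shard by hashing it. The class name ShardedStore, its methods, and the choice of MD5 with a modulo are assumptions made for this example only; real NoSQL databases use their own partitioning schemes.

```python
import hashlib


class ShardedStore:
    """Toy in-memory store that routes each key to one of N shards (hypothetical)."""

    def __init__(self, num_shards):
        # Each dict stands in for a separate server or node.
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, key):
        # Use a stable hash (not Python's per-process hash()) so the same
        # key is routed to the same shard on every run.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        # A lookup by key touches exactly one shard.
        return self.shards[self.shard_for(key)].get(key)

    def count_all(self):
        # A query that spans shards must visit every shard and merge results.
        return sum(len(shard) for shard in self.shards)


store = ShardedStore(num_shards=4)
for user_id in ["alice", "bob", "carol", "dave"]:
    store.put(user_id, {"name": user_id})
```

Note that plain modulo routing remaps most keys when the number of shards changes, which is the problem consistent hashing (section 3) is meant to reduce.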

2. Replication:
Replication involves creating and maintaining copies of the data on
multiple nodes. Each copy, or replica, is stored on a different server.
This model is commonly used to enhance data availability and fault
tolerance.
Advantages:
● Fault tolerance: If one server fails, data can still be retrieved from
the replicas, ensuring high availability.
● Load balancing: Read operations can be distributed among
replicas, balancing the load and improving performance.
Challenges:
● Consistency: Ensuring consistency between replicas can be
challenging, and different NoSQL databases implement different
consistency models.
● Write performance: Replication can introduce overhead for write
operations, as changes must be propagated to all replicas.
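
A minimal sketch of these trade-offs, assuming a simple scheme in which every write is applied to all live replicas and reads rotate over them. The ReplicatedStore class and its alive flags are hypothetical names for this example and do not describe any particular database.

```python
import itertools


class ReplicatedStore:
    """Toy store that keeps a full copy of the data on every replica (hypothetical)."""

    def __init__(self, num_replicas):
        # Each dict stands in for a replica on a different server.
        self.replicas = [{} for _ in range(num_replicas)]
        self.alive = [True] * num_replicas
        self._next = itertools.cycle(range(num_replicas))

    def write(self, key, value):
        # Write overhead: the change is applied to every live replica.
        for i, replica in enumerate(self.replicas):
            if self.alive[i]:
                replica[key] = value

    def read(self, key):
        # Load balancing: rotate over replicas, skipping failed ones.
        for _ in range(len(self.replicas)):
            i = next(self._next)
            if self.alive[i]:
                return self.replicas[i].get(key)
        raise RuntimeError("no live replica available")


store = ReplicatedStore(num_replicas=3)
store.write("k", "v")
store.alive[0] = False      # simulate a server failure
value = store.read("k")     # still served by a surviving replica
```

In this sketch a replica that is down during a write simply misses it, so it would return stale data after recovering. Handling that case is where the consistency models mentioned above come in.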

3. Consistent Hashing:
Consistent hashing is a technique that distributes data across nodes
in a way that minimizes the need for reorganization when nodes are
added or removed. Each node is assigned a range of hash values,
and data is assigned to the node whose hash range includes the
data's hash value.
Advantages:
● Scalability: Nodes can be added or removed without significant
reorganization of data, making it scalable.
● Load balancing: Spreads data roughly evenly across nodes,
which reduces hotspots.
Challenges:
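● Range queries: Hashing scatters adjacent keys across the
cluster, so a range query may have to access many nodes,
impacting performance.
● Uneven data distribution: With few nodes or skewed keys, some
nodes can end up owning more data than others, leading to
uneven loads. Assigning each node several positions on the
hash ring (virtual nodes) is a common mitigation.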
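
The following is a minimal sketch of a hash ring, not the implementation used by any specific database. The HashRing class, the vnodes parameter, and the node names are assumptions for illustration. Each node is placed at several points on the ring, and a key belongs to the first point at or after its own hash, wrapping around at the end.

```python
import bisect
import hashlib


def _hash(value):
    # Stable hash so that ring positions are the same on every run.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place the node at `vnodes` positions to smooth out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key):
        if not self._ring:
            raise ValueError("ring is empty")
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in moved now belongs to node-d, and all other keys stay where they were. This is the "no significant reorganization" property described above: adding a node only takes keys from existing nodes and never shuffles keys between them.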

4. Gossip Protocol:
Gossip protocols are used to disseminate information about the state
of nodes in a distributed system. Each node periodically exchanges
what it knows with a few randomly chosen peers, so membership and
failure information spreads without a central coordinator. The
database can then use this shared view to decide where data lives.
Advantages:
● Decentralization: Gossip protocols operate in a decentralized
manner, making them resilient to failures.
● Adaptability: Nodes can quickly adapt to changes in the system
by exchanging information.
Challenges:
● Consistency: Achieving consistency can be challenging, as
nodes may have different views of the system state.
● Network overhead: Gossiping introduces network overhead, and
the frequency of gossip messages needs to be carefully
managed.
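
A small simulation can make the exchange concrete. This is a sketch under simplifying assumptions: each node keeps a heartbeat counter per known node, contacts one random peer per round, and both sides keep the higher counter. GossipNode, gossip_round, and the fixed random seed are hypothetical names and choices for this example.

```python
import random


class GossipNode:
    """Toy node that tracks the latest heartbeat it has seen per node (hypothetical)."""

    def __init__(self, name):
        self.name = name
        # Local view of the cluster: node name -> highest heartbeat seen.
        self.view = {name: 0}

    def tick(self):
        # Advance our own heartbeat to show we are still alive.
        self.view[self.name] += 1

    def merge(self, other_view):
        # Keep the newer heartbeat for every node the peer knows about.
        for name, heartbeat in other_view.items():
            if heartbeat > self.view.get(name, -1):
                self.view[name] = heartbeat


def gossip_round(nodes, rng):
    for node in nodes:
        node.tick()
        peer = rng.choice([n for n in nodes if n is not node])
        # Push-pull exchange: both sides end up with the merged view.
        peer.merge(node.view)
        node.merge(peer.view)


rng = random.Random(0)  # fixed seed so the run is repeatable
nodes = [GossipNode(f"n{i}") for i in range(8)]
rounds = 0
max_rounds = 1000       # safety cap for the simulation
while rounds < max_rounds and any(len(n.view) < len(nodes) for n in nodes):
    gossip_round(nodes, rng)
    rounds += 1
```

At the start each node knows only itself. After a few rounds every node has heard of every other node, although the heartbeat values in different views may still differ, which is the consistency challenge listed above.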

5. Multi-Master Replication:
In some NoSQL databases, each node in the system is a master,
meaning it can accept both read and write operations. Multi-master
replication aims to distribute the read and write load across multiple
nodes.
Advantages:
● Improved write scalability: Multiple nodes can accept write
operations, improving write throughput.
● Enhanced fault tolerance: If one node fails, other nodes can
continue processing read and write operations.
Challenges:
● Conflict resolution: Conflicts may arise when multiple nodes
accept conflicting write operations, requiring a mechanism for
resolution.
● Complexity: Managing a multi-master setup can be more
complex than other distribution models.
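
One possible conflict-resolution mechanism is last-write-wins, sketched below. It is only one strategy among several, and the MasterNode class, its logical clock, and the tie-break on node id are assumptions made for this example.

```python
class MasterNode:
    """Toy master that accepts writes and reconciles with peers (hypothetical)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.clock = 0   # simple logical clock, not wall-clock time
        self.data = {}   # key -> (timestamp, node_id, value)

    def write(self, key, value):
        self.clock += 1
        self.data[key] = (self.clock, self.node_id, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[2] if entry else None

    def sync_from(self, other):
        # Last-write-wins: keep the version with the higher
        # (timestamp, node_id) pair; node_id breaks timestamp ties.
        for key, theirs in other.data.items():
            mine = self.data.get(key)
            if mine is None or theirs[:2] > mine[:2]:
                self.data[key] = theirs
        self.clock = max(self.clock, other.clock)


a = MasterNode("a")
b = MasterNode("b")
a.write("cart", ["book"])   # accepted by master a
b.write("cart", ["pen"])    # conflicting write accepted by master b

a.sync_from(b)
b.sync_from(a)
```

After both syncs the two masters agree on ["pen"], because the timestamps tie and "b" sorts after "a". The write made on a is silently discarded, which shows why the choice of resolution mechanism matters.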

NoSQL databases provide flexibility in terms of data distribution
models, allowing developers to choose the most suitable approach
based on the requirements of their application. Sharding, replication,
consistent hashing, gossip protocols, and multi-master replication are
just a few examples of the strategies employed to distribute data
efficiently in NoSQL databases. The choice of a particular model
depends on factors such as scalability needs, fault tolerance
requirements, and the nature of the data being stored. Each model
comes with its own set of advantages and challenges, and
understanding these trade-offs is crucial for designing and managing
distributed NoSQL databases effectively.
