BDT Assignment
ASSIGNMENT 3
POORVAJA R
201501035
AI & ML
1. Sharding:
Sharding, also known as horizontal partitioning, splits a large
database into smaller pieces called shards, each holding a disjoint
subset of the data. Each shard is stored on a separate server or node,
and a shard key (for example, a hash or a range of the record key)
determines which shard holds a given record. This model suits
horizontal scaling, because new servers can be added as data volumes
grow.
Advantages:
● Improved scalability: The database can hold more data than a
single server allows, because the data is spread across servers.
● Better performance: Queries can run in parallel on different
shards, and a query that includes the shard key touches only one
shard.
Challenges:
● Data consistency: Operations that touch more than one shard
need cross-shard coordination to stay consistent.
● Query complexity: Queries that do not include the shard key must
be sent to every shard and their results combined, which adds
complexity and latency.
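The routing described above can be sketched in a few lines. This is a minimal illustration, assuming hash-based placement (hash of the key modulo the number of shards); the names shard_for and ShardedStore are hypothetical, and each shard is modelled as an in-memory dict rather than a real server.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (assumed hash-mod placement)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy sharded key-value store; each dict stands in for one server."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        # A write goes only to the shard that owns the key.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        # A lookup by key touches exactly one shard.
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def scan(self, predicate):
        # A query without the shard key must visit every shard (scatter-gather).
        results = []
        for shard in self.shards:
            results.extend(v for k, v in shard.items() if predicate(k, v))
        return results
```

Note that with hash-mod placement, changing the number of shards remaps most keys, which is the problem consistent hashing (topic 3) addresses.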
2. Replication:
Replication involves creating and maintaining copies of the data on
multiple nodes. Each copy, or replica, is stored on a different server.
This model is commonly used to enhance data availability and fault
tolerance.
Advantages:
● Fault tolerance: If one server fails, data can still be retrieved from
the replicas, ensuring high availability.
● Load balancing: Read operations can be distributed among
replicas, balancing the load and improving performance.
Challenges:
● Consistency: Keeping replicas identical is difficult because
updates reach them at different times, and different NoSQL
databases implement different consistency models (for example,
strong or eventual consistency).
● Write performance: Replication adds overhead to write
operations, as every change must be propagated to all replicas.
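The trade-off above (reads spread over replicas, writes copied to all of them) can be sketched as follows. This is a simplified illustration, assuming synchronous propagation and round-robin reads; Replica and ReplicatedStore are hypothetical names, and a replica that was down during a write is not caught up afterwards, which a real system would have to handle.

```python
import itertools


class Replica:
    """One copy of the data; `alive` simulates server failure."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicatedStore:
    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self._next = itertools.cycle(range(len(self.replicas)))

    def write(self, key, value):
        # Every write is applied to each live replica (the write overhead).
        written = 0
        for replica in self.replicas:
            if replica.alive:
                replica.data[key] = value
                written += 1
        return written

    def read(self, key):
        # Reads rotate over replicas and skip failed ones (load balancing
        # plus fault tolerance).
        for _ in range(len(self.replicas)):
            replica = self.replicas[next(self._next)]
            if replica.alive:
                return replica.data.get(key), replica.name
        raise RuntimeError("no live replica available")
```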
3. Consistent Hashing:
Consistent hashing is a technique that distributes data across nodes
in a way that minimizes data movement when nodes are added or
removed. Nodes and keys are hashed onto the same circular hash
space (a ring). Each node is assigned a range of hash values, and
data is assigned to the node whose hash range includes the data's
hash value. When a node joins or leaves, only the keys in the affected
range move.
Advantages:
● Scalability: Nodes can be added or removed while moving only a
small fraction of the data.
● Load balancing: Spreads keys roughly evenly across nodes,
especially when each node is given several positions on the ring
(virtual nodes), which reduces hotspots.
Challenges:
● Range queries: Because hashing scatters adjacent keys across
nodes, a range query may have to access many nodes, impacting
performance.
● Uneven data distribution: With few nodes or skewed keys, some
nodes may own a larger share of the data, leading to uneven
loads.
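A hash ring like the one described above can be sketched with a sorted list and binary search. This is one possible implementation, not the method of any particular database; HashRing, the vnodes parameter, and the "node#i" labelling of virtual nodes are assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable position on the ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # ring positions per physical node
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node is placed at several positions to even out the load.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        if not self._points:
            raise ValueError("ring has no nodes")
        # First position clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

In this sketch, adding a node changes ownership only for keys that now fall just before one of the new node's positions; every other key stays where it was.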
4. Gossip Protocol:
Gossip protocols are used to disseminate information about the state
of nodes in a distributed system. Each node periodically exchanges
what it knows (for example, which nodes are alive) with a few
randomly chosen peers, so the information spreads through the
cluster without a central coordinator.
Advantages:
● Decentralization: Gossip protocols operate in a decentralized
manner, making them resilient to failures.
● Adaptability: Nodes can quickly adapt to changes in the system
by exchanging information.
Challenges:
● Consistency: Nodes may temporarily hold different views of the
system state until the gossip rounds converge.
● Network overhead: Gossiping introduces network overhead, so
the frequency and size of gossip messages need to be carefully
managed.
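The exchange described above can be simulated in memory. This is a toy sketch under stated assumptions: each node gossips with one random peer per round, state is a heartbeat counter per node, and merging keeps the highest counter seen. GossipNode, gossip_round and rounds_until_known are hypothetical names.

```python
import random


class GossipNode:
    def __init__(self, name):
        self.name = name
        self.view = {name: 0}   # node name -> highest heartbeat seen

    def tick(self):
        # A node bumps its own heartbeat to signal that it is alive.
        self.view[self.name] += 1

    def merge(self, other_view):
        # Keep the newest heartbeat known for every node.
        for node, beat in other_view.items():
            if beat > self.view.get(node, -1):
                self.view[node] = beat


def gossip_round(nodes, rng):
    """Each node exchanges views with one random peer (push-pull)."""
    for node in nodes:
        node.tick()
        peer = rng.choice([n for n in nodes if n is not node])
        node.merge(peer.view)
        peer.merge(node.view)


def rounds_until_known(nodes, rng, max_rounds=1000):
    """Run rounds until every node has heard of every other node."""
    for round_no in range(1, max_rounds + 1):
        gossip_round(nodes, rng)
        if all(len(n.view) == len(nodes) for n in nodes):
            return round_no
    return None
```

Calling rounds_until_known on a small cluster with a seeded random.Random shows how quickly membership information spreads; between rounds, the nodes' views differ, which is the consistency challenge listed above.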
5. Multi-Master Replication:
In multi-master replication, every node in the system is a master,
meaning it can accept both read and write operations. Some NoSQL
databases use this model to distribute the read and write load across
multiple nodes.
Advantages:
● Improved write scalability: Multiple nodes can accept write
operations, improving write throughput.
● Enhanced fault tolerance: If one node fails, other nodes can
continue processing read and write operations.
Challenges:
● Conflict resolution: Conflicts arise when two nodes accept
different writes to the same data at about the same time, so a
resolution mechanism (for example, last-write-wins) is required.
● Complexity: Managing a multi-master setup can be more
complex than other distribution models.
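The conflict-resolution challenge above can be illustrated with a last-write-wins rule. This is only one possible policy and it silently discards the losing write; the Master class, the caller-supplied timestamps, and the node-id tie-break are assumptions made for this sketch.

```python
class Master:
    """A node that accepts writes locally and later syncs with its peers."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.data = {}   # key -> (timestamp, node_id, value)

    def write(self, key, value, timestamp):
        # Any master accepts the write without contacting the others.
        self._apply(key, (timestamp, self.node_id, value))

    def _apply(self, key, record):
        # Last-write-wins: the higher (timestamp, node_id) pair is kept,
        # so ties on timestamp are broken by node id.
        current = self.data.get(key)
        if current is None or record[:2] > current[:2]:
            self.data[key] = record

    def sync_from(self, other):
        # Pull the peer's records and resolve conflicts key by key.
        for key, record in other.data.items():
            self._apply(key, record)

    def read(self, key):
        record = self.data.get(key)
        return record[2] if record else None
```

After two masters accept conflicting writes to the same key and sync in both directions, both hold the same value, because each applies the same deterministic rule.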