Assignment 04 BigData Computing Noc23-Cs112
Solution:
B) Load balancing and distribution of requests
Explanation:
Snitches in Apache Cassandra play a crucial role in determining the network topology and
facilitating efficient routing of requests. They help Cassandra distribute replicas by grouping
machines into data centers and racks. This information is essential for load balancing and
ensuring that requests are routed to the appropriate nodes, which helps in achieving high
availability, fault tolerance, and efficient data distribution across the cluster. While
encryption, compression, and schema management are important aspects of a distributed
database system like Cassandra, these functions are not the primary responsibility of
Snitches.
Solution:
C) Both Statement 1 and Statement 2 are correct.
Explanation:
Statement 1 is correct: In Cassandra, when hinted handoff is enabled and a replica is down
during a write operation, the coordinator node writes to all other replicas that are available
and keeps the write locally as a hint. This ensures that the write is not lost and can be
delivered to the down replica when it comes back up.
Statement 2 is correct: Ec2Snitch is indeed an important Snitch for deployments in Amazon
EC2 environments. It is designed for use in Amazon EC2 deployments where all nodes are
typically in a single region. In Ec2Snitch, the region name refers to the data center, and the
availability zone refers to the rack within a cluster. This snitch helps Cassandra understand
the network topology within Amazon EC2, facilitating efficient routing and data replication.
Therefore, both statements are correct.
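The hinted handoff behaviour described in Statement 1 can be illustrated with a small in-memory sketch. This is a toy model, not Cassandra's implementation: the Replica and Coordinator classes and their methods are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for a replica node."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)


@dataclass
class Coordinator:
    """Hypothetical coordinator that stores hints for replicas that are down."""
    replicas: list
    hints: list = field(default_factory=list)  # (replica name, key, value)

    def write(self, key, value):
        """Write to every live replica; keep a local hint for each down one."""
        acks = 0
        for replica in self.replicas:
            if replica.up:
                replica.data[key] = value
                acks += 1
            else:
                self.hints.append((replica.name, key, value))
        return acks

    def replay_hints(self):
        """Deliver stored hints to replicas that have come back up."""
        by_name = {r.name: r for r in self.replicas}
        remaining = []
        for name, key, value in self.hints:
            target = by_name[name]
            if target.up:
                target.data[key] = value
            else:
                remaining.append((name, key, value))
        self.hints = remaining
```

With three replicas of which one is down, write() reaches two replicas and records one hint; once the third replica is marked up again, replay_hints() delivers the missed write to it.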
3. ZooKeeper allows distributed processes to coordinate with each other through registers,
known as _________________.
A) znodes
B) hnodes
C) vnodes
D) rnodes
Solution:
A) znodes
Explanation:
ZooKeeper allows distributed processes to coordinate with each other through registers
called "znodes." These znodes act as the basic building blocks in ZooKeeper, providing a
distributed and hierarchical namespace where data can be stored and synchronized across a
cluster of machines.
4. In ZooKeeper, when a _______ is triggered, the client receives a packet saying that the
znode has changed.
A) Event
B) Row
C) Watch
D) Value
Solution:
C) Watch
Explanation:
ZooKeeper supports the concept of watches. Clients can set a watch on a znode; when the
znode changes, the watch is triggered and the client receives a packet saying so.
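A minimal sketch of the watch mechanism follows. It is an in-memory toy, not the ZooKeeper client API: the ToyZnodeStore class and its get/set methods are hypothetical. It assumes the standard ZooKeeper behaviour that a watch is a one-time trigger and that the notification only says the znode changed, so the client must read again to see the new value.

```python
class ToyZnodeStore:
    """Hypothetical in-memory store that mimics one-time watches on znodes."""

    def __init__(self):
        self.data = {}      # path -> value
        self.watches = {}   # path -> list of callbacks waiting for a change

    def get(self, path, watch=None):
        """Read a znode and optionally register a one-time watch on it."""
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data.get(path)

    def set(self, path, value):
        """Update a znode, then fire and clear every watch set on it."""
        self.data[path] = value
        for callback in self.watches.pop(path, []):
            callback(path)  # the notification carries the path, not the data
```

A client that calls get("/config", watch=handler) is notified on the next set("/config", ...) only; to keep observing the znode it has to register a new watch when it re-reads.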
5. What does the CAP theorem, proposed by Eric Brewer and subsequently proved by
Gilbert and Lynch, state about distributed systems?
A) You can always achieve all three guarantees: Consistency, Availability, and Partition
tolerance.
B) In a distributed system, you can satisfy at most 3 out of the 3 guarantees.
C) In a distributed system, you can satisfy at most 2 out of the 3 guarantees: Consistency,
Availability, and Partition tolerance.
D) The CAP theorem only applies to centralized systems, not distributed systems.
Solution:
C) In a distributed system, you can satisfy at most 2 out of the 3 guarantees: Consistency,
Availability, and Partition tolerance.
Explanation:
The CAP theorem, proposed by Eric Brewer and subsequently proved by Gilbert and Lynch,
states that in a distributed system, you can achieve at most two out of the three
guarantees: Consistency, Availability, and Partition tolerance. This theorem highlights the
trade-offs that need to be made when designing distributed systems. Depending on the
system's requirements and the nature of network partitions, you may prioritize consistency
and availability, consistency and partition tolerance, or availability and partition tolerance,
but all three cannot be guaranteed simultaneously once a network partition occurs.
Solution:
D) It balances between consistency and availability by requiring a quorum of replicas across
datacenters.
Explanation:
In Cassandra, consistency levels allow clients to specify the level of consistency they require
for read and write operations.
The "QUORUM" consistency level ensures a balance between consistency and availability. It
requires a quorum of replicas to acknowledge the operation. A quorum is calculated as
floor(N/2) + 1 replicas, where N is the total number of replicas (the replication factor).
For example, N = 3 gives a quorum of 2, and N = 5 gives a quorum of 3.
This means that to achieve a successful read or write operation with "QUORUM," the client
must receive acknowledgments from a majority of replicas, ensuring a level of data
consistency while still allowing for reasonable availability and fault tolerance.
Options A, B, and C describe the characteristics of other consistency levels ("ANY," "ALL,"
and "ONE") in Cassandra, which have different trade-offs in terms of consistency,
availability, and speed.
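The quorum arithmetic can be checked with a short sketch. The function names below are hypothetical; the overlap condition R + W > N is the usual rule for guaranteeing that a read touches at least one replica that saw the latest write, and is included here as an assumption beyond what the explanation states.

```python
def quorum(replication_factor: int) -> int:
    """Return the majority size for a given number of replicas."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1  # integer division, i.e. floor(N/2) + 1


def tolerated_failures(replication_factor: int) -> int:
    """Number of replicas that may be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


def reads_see_latest_write(r: int, w: int, n: int) -> bool:
    """True when every read set of size r overlaps every write set of size w."""
    return r + w > n
```

With a replication factor of 3, QUORUM needs 2 acknowledgments and tolerates 1 replica being down; reading and writing at QUORUM gives 2 + 2 > 3, so the read and write sets always overlap.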
7. Which strong consistency model ensures that each operation by a client is visible
instantaneously to all other clients, with real-time visibility?
A) Sequential Consistency
B) Linearizability
C) Transaction ACID properties
D) Transaction chains
Solution:
B) Linearizability
Explanation:
Linearizability (Option B) is a strong consistency model that guarantees that each operation
by a client is visible instantaneously to all other clients, providing real-time visibility.
It ensures that operations appear to be executed instantaneously and in a total order, as if
they were executed one after the other in real-time, even in a distributed system.
This strong consistency model is known for its strict and intuitive guarantees of visibility and
consistency.
Options A, C, and D refer to other consistency models or properties, but they do not provide
the same level of real-time visibility as linearizability.
Solution:
C) To discover location and state information about other nodes in the cluster.
Explanation:
In Cassandra, the gossip protocol is primarily used for discovering location and state
information about other nodes in the cluster.
It is a peer-to-peer communication protocol where nodes periodically exchange information
about themselves and the nodes they are aware of in the cluster.
Through gossip, nodes learn about the status, health, and metadata of other nodes, helping
Cassandra maintain an up-to-date and accurate view of the cluster's topology.
While data replication and consistency are important aspects of Cassandra's functionality,
these are achieved through other mechanisms like partitioning and consistency levels, not
directly through the gossip protocol.
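The exchange of node state described above can be sketched as a toy simulation. This is not Cassandra's gossip implementation: the GossipNode class, its heartbeat counter, and the merge rule (keep the highest heartbeat seen per node) are simplifying assumptions made for illustration.

```python
class GossipNode:
    """Hypothetical node that tracks the latest heartbeat it knows per peer."""

    def __init__(self, name):
        self.name = name
        self.heartbeat = 0
        self.view = {name: 0}  # node name -> highest heartbeat seen

    def tick(self):
        """Advance this node's own heartbeat, signalling that it is alive."""
        self.heartbeat += 1
        self.view[self.name] = self.heartbeat

    def exchange(self, peer):
        """Gossip with one peer: both sides keep the newest entry per node."""
        names = self.view.keys() | peer.view.keys()
        merged = {
            n: max(self.view.get(n, -1), peer.view.get(n, -1)) for n in names
        }
        self.view = dict(merged)
        peer.view = dict(merged)
```

If node a gossips with b and b then gossips with c, node c learns about a without ever contacting it directly, which is how state spreads through the cluster over successive rounds.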
Solution:
C) To define data centers and the number of replicas to place within each data center.
Explanation:
In Cassandra, the Network Topology Strategy is used to specify the organization of data
centers and the number of replicas to place within each data center.
It is a crucial strategy for achieving fault tolerance, data availability, and disaster recovery by
distributing data across multiple data centers and racks.
By defining the placement of replicas on distinct racks, the Network Topology Strategy helps
ensure that data remains available even in the event of node or rack failures.
Options A, B, and D are not accurate descriptions of the Network Topology Strategy's
primary objective. It focuses on data placement and replication, not data types, consistency
levels, or query optimization.
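As a concrete illustration, the per-data-center replica counts are supplied when a keyspace is created. The helper below is a hypothetical sketch that only builds the CQL text; the keyspace name and data center names used with it are made up, and real data center names must match the ones reported by the cluster's snitch.

```python
def create_keyspace_cql(keyspace: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    replicas_per_dc maps a data center name to its replica count.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += [f"'{dc}': {count}" for dc, count in replicas_per_dc.items()]
    return (
        f"CREATE KEYSPACE {keyspace} "
        f"WITH replication = {{{', '.join(options)}}};"
    )
```

Calling create_keyspace_cql("demo", {"dc1": 3, "dc2": 2}) yields a statement that asks for three replicas in dc1 and two in dc2.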
10. How does ZooKeeper achieve high throughput values, including hundreds of thousands
of operations per second for read-dominant workloads?
A) By utilizing a lock-based approach to coordination.
B) By employing eventual consistency for all operations.
C) By exposing wait-free objects to clients.
D) By using fast reads with watches and serving both from local replicas.
Solution:
D) By using fast reads with watches and serving both from local replicas.
Explanation:
ZooKeeper achieves high throughput, including hundreds of thousands of operations per
second for read-dominant workloads, through its use of fast reads with watches and serving
both from local replicas.
Fast reads provide low-latency access to frequently read data, while watches allow clients to
receive notifications of changes to data they are interested in, avoiding the need for
frequent polling.
Serving both reads and watches from local replicas reduces the latency and load on the
ZooKeeper ensemble, contributing to high throughput.
Options A, B, and C are not accurate descriptions of how ZooKeeper achieves high
throughput. ZooKeeper's approach focuses on minimizing latency and optimizing read
operations for distributed coordination.