Cassandra vs MongoDB

Independent benchmark analyses and tests of various NoSQL platforms under big data and
production-level workloads have been performed over the years. Most of them, including recent
ones, show that Apache Cassandra performed significantly better than Couchbase 3.0 and MongoDB 3.0
(with the WiredTiger storage engine) in both throughput and latency.

Cassandra advantages:
Peer to Peer Architecture:

Cassandra follows a peer-to-peer architecture instead of a master-slave architecture, so there is no
single point of failure. Moreover, any number of servers/nodes can be added to a Cassandra cluster
in any of its data centers. Because all the machines are equal peers, any server can serve requests
from any client. With its robust architecture and these characteristics, Cassandra has raised the bar
well above many other databases.
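
A minimal sketch of this, assuming the DataStax Python driver (cassandra-driver) and placeholder node
addresses: the client can bootstrap from any subset of nodes, and whichever node it reaches acts as
coordinator for the request.

    from cassandra.cluster import Cluster

    # Any of these peers can coordinate a request; there is no master.
    # The addresses are placeholders for nodes in your own cluster.
    cluster = Cluster(contact_points=["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    session = cluster.connect()
    print(session.execute("SELECT release_version FROM system.local").one())
    cluster.shutdown()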

Elastic Scalability:

One of the biggest advantages of using Cassandra is its elastic scalability. A Cassandra cluster can be
easily scaled up or scaled down: nodes can be added or removed with little disruption, and you don't
have to restart the cluster or change application queries while scaling. This is why Cassandra is known
for sustaining very high throughput as the node count grows. As scaling happens, read and write
throughput both increase, with zero downtime and no pause to the applications.

High Availability and Fault Tolerance:

Another striking feature of Cassandra is data replication, which makes it highly available and
fault-tolerant. Replication means each piece of data is stored in more than one location, so even if
one node fails, the user can still retrieve the data from another replica. In a Cassandra cluster, each
row is replicated based on its row key, and you can set the number of replicas you want to keep. Just
like scaling, replication can also span multiple data centers, which further strengthens Cassandra's
backup and recovery capabilities.
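
As a sketch of how the replica count is chosen (the keyspace and data center names are made up for
illustration), the replication factor is declared per data center when the keyspace is created, here via
the DataStax Python driver:

    from cassandra.cluster import Cluster

    session = Cluster(["10.0.0.1"]).connect()

    # Hypothetical keyspace: keep three replicas in "dc1" and two in "dc2".
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS demo
        WITH replication = {
            'class': 'NetworkTopologyStrategy',
            'dc1': 3,
            'dc2': 2
        }
    """)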

High Performance:

Cassandra provides very fast writes, which are typically even faster than reads; a single node can
ingest on the order of 80-360 MB/sec. It achieves this using two techniques.

 Cassandra keeps most of the hot data in memory on the responsible node; updates are applied in
memory and written to persistent storage (the file system) lazily. To avoid losing data, however,
Cassandra appends every transaction to a commit log on disk. Unlike in-place updates of data items
on disk, commit-log writes are append-only and therefore avoid rotational delay while writing to
the disk.
 Unless a write has requested full consistency, Cassandra writes the data to enough nodes without
resolving any inconsistencies, and resolves them only on the first read. This process is called
"read repair."

Tunable Consistency:

Characteristics like tunable consistency make Cassandra stand apart from other databases. In Cassandra,
consistency can be of two types:

 Eventual consistency - the client is acknowledged as soon as the cluster accepts the write
 Strong consistency - an update is not considered complete until it has reached all machines or all
the nodes where that particular data is located

You can adopt either of these based on your requirements, and you are also free to blend eventual and
strong consistency. For instance, you can use eventual consistency for remote data centers where
latency is quite high, and strong consistency for local data centers where latency is low.
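
A small sketch of tunable consistency with the DataStax Python driver (the demo keyspace and events
table are hypothetical): the consistency level is chosen per statement, so relaxed and strict operations
can coexist in one application.

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(["10.0.0.1"]).connect("demo")

    # Low-latency, eventually consistent write: one replica acknowledgement is enough.
    relaxed_write = SimpleStatement(
        "INSERT INTO events (id, payload) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.ONE)
    session.execute(relaxed_write, (1, "logged"))

    # Stronger read: a quorum of replicas in the local data center must respond.
    strict_read = SimpleStatement(
        "SELECT payload FROM events WHERE id = %s",
        consistency_level=ConsistencyLevel.LOCAL_QUORUM)
    row = session.execute(strict_read, (1,)).one()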

Replication

Cassandra has much more advanced support for replication because it is aware of the network topology.
Clients can use a specific consistency level to ensure that writes are acknowledged locally or by
remote data centers. This means you can let Cassandra handle redundancy across nodes while it knows
which rack and data center each node is in. Cassandra also monitors nodes and routes queries away from
"slow" responding nodes.
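
For illustration, assuming the DataStax Python driver's execution-profile API (the data center name
"dc1" is a placeholder), the client can be told to prefer replicas in its local data center and to
route each query to a node that actually owns the data:

    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    # Prefer the local data center and pick a replica that owns the token range.
    profile = ExecutionProfile(
        load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc="dc1")))
    cluster = Cluster(["10.0.0.1"],
                      execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    session = cluster.connect()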

Idempotency

Idempotency is easy to maintain (there is no need to query before an insertion), which prevents
duplication of data.
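
A tiny sketch of why no read-before-write is needed (the users table is hypothetical): in CQL an
INSERT is an upsert, so retrying the same write cannot create a duplicate row.

    from cassandra.cluster import Cluster

    session = Cluster(["10.0.0.1"]).connect("demo")  # hypothetical keyspace

    # Writing the same primary key again simply overwrites the columns.
    session.execute(
        "INSERT INTO users (user_id, email) VALUES (%s, %s)",
        (42, "user@example.com"))
    # Running this statement twice still leaves exactly one row for user_id 42.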

Memory requirements

Cassandra is much lighter on memory requirements, especially if you don't need to keep a lot of
data in cache.

Cassandra disadvantages and limitations:


 Aggregations are not performed by the Cassandra nodes - the client must compute them itself. When
an aggregation spans multiple rows, random partitioning makes it very difficult for the client;
the usual recommendation is to use Storm or Hadoop for aggregations.
 Cassandra doesn't provide a custom map/reduce implementation, but it provides native Hadoop
support, including for Hive (a SQL data warehouse built on Hadoop map/reduce)
 Querying options for retrieving data are very limited (see the sketch after this list)
 Cassandra is more complex to use, and more sensitive to queries (in fact, one large query can
very easily bring down a node)
 In-depth understanding of the database is required to effectively manage it.
 Ordering is done per-partition, and is specified at table creation time.
 A single column value may not be larger than 2GB; in practice, "single digits of MB" is a more
reasonable limit, since there is no streaming or random access of blob values.
 Collection values may not be larger than 64KB.
 The maximum number of cells (rows x columns) in a single partition is 2 billion.
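
A brief sketch of the querying limitation (hypothetical users table, DataStax Python driver): queries
are expected to be driven by the partition key, and filtering on other columns requires a secondary
index or an explicit, costly ALLOW FILTERING.

    from cassandra.cluster import Cluster

    session = Cluster(["10.0.0.1"]).connect("demo")

    # Supported: the WHERE clause restricts the partition key.
    session.execute("SELECT * FROM users WHERE user_id = %s", (42,))

    # Rejected unless you add a secondary index or opt in to a cluster-wide scan:
    session.execute(
        "SELECT * FROM users WHERE email = %s ALLOW FILTERING",
        ("user@example.com",))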

Hadoop advantages:
 Hadoop Distributed File System (HDFS) - can store massive distributed unstructured data sets.
Data can be stored directly in HDFS, or it can be stored in a semi-structured format in HBase,
which allows rapid record-level data access
 MapReduce capabilities are very strong

Hadoop disadvantages:
 HDFS file system is extremely complex to set up
 Has single points of failure

Advantages of Cassandra-Hadoop combination:


 Both can be implemented on the same cluster, which means we can have the best of both worlds.
 Time-based and real-time applications run under Cassandra (real-time being Cassandra's strength),
while batch analytics and queries that do not depend on a timestamp run on Hadoop. In this kind of
ecosystem, HDFS is replaced by Cassandra, and this is invisible to the developer. Nodes can be
dynamically reassigned between the Cassandra and Hadoop environments as appropriate.
 Cassandra File System removes the single points of failure associated with HDFS, namely the
NameNode and the JobTracker.

MongoDB advantages:
 Easier development and much better documentation
 Better fit for single server
 Stores BSON (essentially binary JSON), which is easy to manage and extremely useful when working
with web applications
 Strongly consistent by default
 Scalability – MongoDB has a number of features related to scalability:
o automatic sharding (auto-partitioning of data across servers)
o reads and writes distributed over shards
o eventually-consistent reads that can be distributed over replicated servers
 Availability - data is spread across several shards (replica sets). Typically, each shard consists
of multiple Mongo Daemon (mongod) instances: a master (primary) node, multiple slave (secondary)
nodes, and optionally an arbiter. If a slave node fails, the master automatically redistributes its
workload to the remaining slaves; if the master crashes, the surviving members (with the arbiter's
vote) elect a new master. A replica set can span multiple data centers, but writes can only go to
the single primary instance in one data center.
 Simple and powerful indexing - indexes work very similarly to relational databases. You can create
single or compound indexes at the collection level, and every document inserted into that
collection has those fields indexed. Querying by index is extremely fast as long as you keep all
your indexes in memory (see the sketch after this list).
 Dynamic queries, sorting, rich updates…
 MapReduce can be used for batch processing of data and aggregation operations. The
aggregation framework enables users to obtain the kind of results for which the SQL GROUP BY
clause is used.
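
A minimal pymongo sketch of indexing and aggregation (the connection string, database, and collection
names are placeholders):

    from pymongo import ASCENDING, MongoClient

    client = MongoClient("mongodb://localhost:27017")
    orders = client["shop"]["orders"]

    # Compound index at the collection level; every inserted document is indexed.
    orders.create_index([("customer_id", ASCENDING), ("created_at", ASCENDING)])

    # Index-backed query with dynamic sorting.
    recent = orders.find({"customer_id": 42}).sort("created_at", -1).limit(10)

    # Aggregation framework: $group plays the role of SQL's GROUP BY.
    totals = orders.aggregate([
        {"$match": {"status": "shipped"}},
        {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
    ])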

MongoDB disadvantages:
 Global write lock limits its use for big data applications (When you perform a write operation in
MongoDB, it creates a lock on the entire database, not just the affected entries, and not just for
a particular connection. This lock blocks not only other write operations, but also read
operations.)
 Writes in MongoDB are "unsafe" by default.
Data isn't flushed to disk right away, so it is possible for a write operation to return success
and still be lost if the server fails before the data reaches disk. This is how Mongo attains high
performance. If you need increased durability, you can specify a safe write, which guarantees the
data is written to disk before returning (see the sketch after this list).
 Memory usage - MongoDB tends to use more memory because it stores the key names inside each
document; since the data structure is not necessarily consistent across documents, the key names
cannot be factored out into a fixed schema.
 Increasing cluster size in Mongo involves a lot of manual operations done through the command
line. So, it is mandatory that you have a highly skilled system administrator for this database.
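
A short pymongo sketch of opting into safer writes (connection string, database, and collection names
are placeholders): a write concern can require journaling and replica acknowledgement before the call
returns, trading latency for durability.

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017")
    db = client["shop"]

    # Default handle: uses the driver/server default acknowledgement behaviour
    # described above, favouring speed.
    fast_orders = db["orders"]

    # "Safe" handle: wait for the journal (j=True) and, on a replica set,
    # for a majority of members before insert_one() returns.
    safe_orders = db["orders"].with_options(
        write_concern=WriteConcern(w="majority", j=True))

    safe_orders.insert_one({"customer_id": 42, "amount": 99.5})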

Couchbase advantages:
 Couchbase is really user/developer/admin friendly. You can easily see what's going on in your
cluster by using the web console, and when things go wrong the web console is a huge advantage.
 Built-in caching mechanism - Couchbase includes a Memcached component that can operate
independently (if you wish) from the document storage components.
 Low-latency read and write operations
 No single point of failure
 Document access in Couchbase is strongly consistent, query access is eventually consistent
 Scalability - it is easy to scale out the cluster, and live cluster topology changes are supported
(all nodes are identical, easy to set up, and can be added or removed with no changes to the
application). Cross-datacenter replication makes it possible to scale a cluster across data centers
for better data locality and faster data access.
 Availability - Couchbase Server maintains multiple copies (up to three replicas) of each document
in a cluster. Each server is identical and serves both active and replica documents. Data is
uniformly distributed across all the nodes, and the clients are aware of the topology. If a node in
the cluster fails, Couchbase Server detects the failure and promotes replica documents on other
live nodes to active.

Couchbase disadvantages and limitations:


 Max key length = 250 bytes
 Max value size = 20 Mbytes
 Max metadata = 150 bytes per document
 Max Buckets per Cluster = 10
 Max View Key Size = 4096 bytes
