Understanding Data Consistency in Apache Cassandra: Cassandra Essentials Tutorial Series
Agenda
- Overview of reading/writing data in Cassandra
- Details on how Cassandra writes data
- Review of the CAP theorem
- Tunable data consistency
- Choosing a data consistency strategy for writes
- Choosing a data consistency strategy for reads
- CQL examples of data consistency
- Where to get Cassandra
www.datastax.com
Writes in Cassandra
- Data is first written to a commit log for durability
- It is then written to a memtable in memory
- Once the memtable becomes full, it is flushed to disk as an SSTable (sorted strings table)
- Writes are atomic at the row level: either all columns in the row are written or updated, or none are. RDBMS-style transactions are not supported.
[Diagram: INSERT INTO → commit log and memtable → SSTable]
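The write path above can be sketched as a toy single-node model. It is an illustration under stated assumptions, not Cassandra's implementation. The class `WritePathNode` and its `memtable_limit` row-count threshold are hypothetical. Real flushes are triggered by memory thresholds, not by a row count. Reads and compaction are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class WritePathNode:
    """Toy model of one node's write path (hypothetical, not Cassandra's code)."""
    memtable_limit: int = 3  # hypothetical "full" threshold, counted in rows
    commit_log: list = field(default_factory=list)
    memtable: dict = field(default_factory=dict)
    sstables: list = field(default_factory=list)

    def write(self, key, columns):
        # 1. Append to the commit log first, so the write survives a crash.
        self.commit_log.append((key, dict(columns)))
        # 2. Apply all columns of the row to the in-memory memtable together.
        self.memtable.setdefault(key, {}).update(columns)
        # 3. When the memtable is "full", flush it to an SSTable.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # An SSTable is an immutable snapshot of the memtable, sorted by row key.
        rows = sorted(self.memtable.items(), key=lambda kv: kv[0])
        self.sstables.append(tuple((k, dict(v)) for k, v in rows))
        self.memtable = {}
        # The flushed data is now on "disk", so its commit log entries are no
        # longer needed for recovery (simplified: the whole log is cleared here).
        self.commit_log.clear()


node = WritePathNode(memtable_limit=2)
node.write("customer_5", {"total_purchases": 100})
node.write("customer_4", {"total_purchases": 500000})
print(node.sstables)
```

The second write fills the two-row memtable, which is flushed as a key-sorted SSTable. The memtable and commit log are then empty again.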
Cassandra is known for very fast write performance and is often cited as one of the fastest databases for write operations.
Cassandra is up to:
- 4x better in writes
- 2x better in reads
- 12x better in reads/updates
Source (September 2011): http://blog.cubrid.org/dev-platform/nosql-benchmarking/
Tunable Data Consistency
- Choose between strong and eventual consistency (from ALL to ANY node responding), depending on the need
- Can be set on a per-operation basis, for both reads and writes
- Handles multi-data center operations
[Figure: consistency levels for Writes and Reads]
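The strong-to-eventual range can be made concrete with a few common levels. This sketch counts how many replica acknowledgments each level waits for, and how many replicas may be down while the operation still succeeds. It assumes a single data center and a caller-supplied replication factor. The function names are hypothetical. Multi-data center levels such as LOCAL_QUORUM and EACH_QUORUM are deliberately left out.

```python
def required_replicas(level: str, replication_factor: int) -> int:
    """Replica acknowledgments the coordinator waits for (single data center, simplified)."""
    if level == "ANY":
        # Writes only: a hint stored by the coordinator is enough, so no live replica is required.
        return 0
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {level}")


def tolerated_down_replicas(level: str, replication_factor: int) -> int:
    """How many replicas can be down while an operation at this level still succeeds."""
    return replication_factor - required_replicas(level, replication_factor)


for level in ("ANY", "ONE", "QUORUM", "ALL"):
    print(level, required_replicas(level, 3), tolerated_down_replicas(level, 3))
```

With a replication factor of 3, this prints `ANY 0 3`, `ONE 1 2`, `QUORUM 2 1`, and `ALL 3 0`. Moving toward ALL strengthens consistency but reduces how many failures the operation can survive.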
Hinted Handoffs
- Cassandra attempts to write a row to all replicas for that row
- If some replica nodes are unavailable, a hint is stored on one node so that it can update the downed nodes with the row once they are available again
- If no replica nodes are available for a row, the ANY consistency level instructs the coordinator node to store a hint and the row data, which it passes to the replica nodes once they are available
[Diagram: hinted handoff across Replica 1, Replica 2, and Replica 3]
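Hinted handoff can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not Cassandra's implementation. The `Replica` and `Coordinator` classes are hypothetical. The coordinator is assumed to be the node that keeps the hints. Only the ONE and ANY levels are modeled, and hint expiry and delivery timing are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    rows: dict = field(default_factory=dict)


@dataclass
class Coordinator:
    # Hints for downed replicas: (replica name, row key, row value).
    hints: list = field(default_factory=list)

    def write(self, replicas, key, value, level="ONE"):
        if level not in ("ONE", "ANY"):
            raise ValueError(f"level not covered by this sketch: {level}")
        live = [r for r in replicas if r.up]
        if level == "ONE" and not live:
            # This sketch rejects the write up front, storing no hints.
            return False
        for r in replicas:
            if r.up:
                r.rows[key] = value
            else:
                # Remember the write so it can be handed off later.
                self.hints.append((r.name, key, value))
        # With ANY, a stored hint alone is enough for the write to succeed.
        return True

    def replay_hints(self, replicas):
        """Deliver stored hints to replicas that are back up; keep the rest."""
        by_name = {r.name: r for r in replicas}
        pending = []
        for name, key, value in self.hints:
            target = by_name[name]
            if target.up:
                target.rows[key] = value
            else:
                pending.append((name, key, value))
        self.hints = pending


replicas = [Replica("Replica 1"), Replica("Replica 2", up=False), Replica("Replica 3")]
coordinator = Coordinator()
coordinator.write(replicas, "customer_4", 500000)  # Replica 2 misses the write
replicas[1].up = True
coordinator.replay_hints(replicas)  # Replica 2 catches up from the hint
```

If every replica is down, a write at ONE fails, while the same write at ANY succeeds with only hints stored. This matches the ANY behavior described above.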
Read Repair
- Cassandra ensures that frequently read data remains consistent
- When a read is done, the coordinator node compares, in the background, the data from all the remaining replicas that own the row; if they are inconsistent, it issues writes to the out-of-date replicas to update the row to the most recently written values
- Read repair can be configured per column family and is enabled by default
[Diagram: read repair across Replica 1, Replica 2, and Replica 3, showing a repair request]
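Read repair can be sketched as follows. The sketch assumes each replica stores a value together with its write timestamp and that the most recent timestamp wins. The names `Replica` and `read_with_repair` are hypothetical. The repair here runs synchronously for clarity, whereas the slide describes Cassandra doing the comparison in the background.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    rows: dict = field(default_factory=dict)  # row key -> (value, write timestamp)


def read_with_repair(replicas, key):
    """Return the newest value for key and update any out-of-date replicas."""
    versions = [(r, r.rows.get(key)) for r in replicas]
    present = [v for _, v in versions if v is not None]
    if not present:
        return None
    # The version with the highest write timestamp is treated as the latest.
    newest = max(present, key=lambda v: v[1])
    for replica, version in versions:
        if version != newest:
            # Repair write: bring the stale (or missing) replica up to date.
            replica.rows[key] = newest
    return newest[0]


r1 = Replica("Replica 1", {"customer_5": (120, 2)})
r2 = Replica("Replica 2", {"customer_5": (100, 1)})
r3 = Replica("Replica 3")
print(read_with_repair([r1, r2, r3], "customer_5"))  # 120
```

After the read, Replica 2 (stale) and Replica 3 (missing the row) both hold `(120, 2)`.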
CQL Examples
SELECT total_purchases FROM SALES USING CONSISTENCY QUORUM WHERE customer_id = 5;
UPDATE SALES USING CONSISTENCY ONE SET total_purchases = 500000 WHERE customer_id = 4;
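The examples above use QUORUM for a read and ONE for a write. When levels are chosen per operation, the usual rule is that a read is guaranteed to see the latest acknowledged write when the replicas read (R) plus the replicas written (W) exceed the replication factor (RF). The sketch below checks that rule by brute force over every possible read set and write set. The replication factor of 3 is an assumption, since the examples do not state one. The function names are hypothetical, and hinted handoff and read repair are ignored.

```python
from itertools import combinations


def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1


def always_overlap(r: int, w: int, replication_factor: int) -> bool:
    """True if every possible read set of r replicas shares at least one
    replica with every possible write set of w replicas."""
    replicas = range(replication_factor)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


RF = 3  # assumed replication factor
# Reading a row at QUORUM that was written at ONE:
print(always_overlap(quorum(RF), 1, RF))  # False: 2 + 1 is not > 3
# Writing at QUORUM as well closes the gap:
print(always_overlap(quorum(RF), quorum(RF), RF))  # True: 2 + 2 > 3
```

A QUORUM read of a row last written at ONE may therefore return an older value. QUORUM on both sides trades some write availability for a guaranteed overlap.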
Where to Get Cassandra
- Go to www.datastax.com
- DataStax makes free smart start installers available for Cassandra that include:
  - The most up-to-date, production-quality Cassandra version
  - A version of DataStax OpsCenter, a visual, browser-based tool for managing and monitoring Cassandra
  - Drivers and connectors for popular development languages
  - Sample database and application
  - Automatic configuration assistance for optimal performance and setup in either standalone or cluster implementations
  - Getting Started Guide
- Free Online Documentation
- Technical White Papers
- Technical Articles
- Tutorials
- User Forums
- User/Customer Case Studies
- FAQs
- Videos
- Blogs
- Software downloads
Cassandra Essentials Tutorial Series: Understanding Data Partitioning and Replication in Apache Cassandra
Thanks!