
Cassandra Intro

This document provides an introduction to Cassandra, an open source, distributed database management system. It describes how Cassandra was originally developed at Facebook, its origins and influences, key features including high availability, linear scalability, and tunable consistency. The document also summarizes Cassandra's data model as a column-oriented key-value store and discusses its ability to trade off consistency for availability and partition tolerance.

Uploaded by

Alexandra Ureche
Copyright
© All Rights Reserved

Introduction to Cassandra

What Google says...


Origins
● Developed at Facebook to power the Inbox Search feature by Avinash Lakshman (a co-author of Amazon's Dynamo) and Prashant Malik, as a “continuously available” database
● Released as open source on Google Code in July 2008
● March 2009: accepted into the Apache Incubator
● February 2010: became an Apache top-level project; first release, 0.6, on April 12, 2010
● Latest release at the time of writing: Cassandra 3.11 (2017-06-23)

Influenced by:

● Bigtable - 2006 Google Research publication (data model)
● Dynamo - 2007 Amazon paper (distributed design)

What is Cassandra
● Fast distributed database
● Built-in high availability with no single point of failure (SPOF), thanks to replication
● Linear scalability
● Datacenter-aware, with multi-datacenter support
● Runs on commodity hardware
● Easy to manage operationally
● Not a drop-in replacement for your SQL solution
● Tunable consistency
● Column-oriented key-value store

Why Cassandra: choose it when you need scalability and high availability without compromising performance. Its support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.

Fast Database

*See http://www.planetcassandra.org/nosql-performance-benchmarks/

Distributed & Decentralized Database
Distributed: capable of running on multiple machines.

Decentralized: no single point of failure (SPOF).

- No master/slave issues, thanks to a peer-to-peer architecture in which nodes exchange state via a “gossip” protocol
- A single Cassandra cluster may run across geographically dispersed data centers

“Masterless” architecture: all nodes are the same.
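
The peer-to-peer idea can be illustrated with a toy simulation. This is a minimal sketch of generic push-style gossip, not Cassandra's actual implementation: the function name, the one-peer-per-round fanout, and the notion of a single piece of "cluster state" are all assumptions made for illustration.

```python
import random


def rounds_until_all_informed(n_nodes, seed=0, max_rounds=1000):
    """Toy push gossip: every informed node tells one random peer per round.

    Returns (rounds_used, number_of_informed_nodes).
    """
    rng = random.Random(seed)
    informed = {0}  # node 0 starts with the new piece of cluster state
    rounds = 0
    while len(informed) < n_nodes and rounds < max_rounds:
        newly_informed = set()
        for node in informed:
            # Any node can be picked as a peer; no master coordinates this.
            peer = rng.randrange(n_nodes)
            if peer != node:
                newly_informed.add(peer)
        informed |= newly_informed
        rounds += 1
    return rounds, len(informed)


if __name__ == "__main__":
    rounds, informed = rounds_until_all_informed(16, seed=1)
    print(f"{informed} nodes informed after {rounds} rounds")
```

Because the number of informed nodes can at most double per round, 16 nodes need at least 4 rounds in this model; the point is only that state spreads without any node playing a special role.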


Scalable
Cassandra scales horizontally by adding more machines, each holding all or some of the data.

Adding nodes increases throughput linearly.

Nodes can be added or removed seamlessly.

Benchmark setup: clusters of 48, 96, 144, and 288 instances with 10, 20, 30, and 60 clients respectively. Each client generated 20k writes/sec, each write 400 bytes in size.
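
The arithmetic behind the benchmark setup can be worked through in a few lines. This sketch computes the offered load only (what the clients generate), not measured throughput. It assumes the client counts scale with cluster size as 10, 20, 30, 60; the third figure in particular is taken to be 30 here, which is an assumption. The names `offered_load` and `SETUPS` are illustrative.

```python
WRITES_PER_CLIENT = 20_000  # writes/sec generated by each client (from the slide)
WRITE_SIZE_BYTES = 400      # size of each write (from the slide)

# (instances, clients) pairs; the client count of 30 is an assumption.
SETUPS = [(48, 10), (96, 20), (144, 30), (288, 60)]


def offered_load(instances, clients):
    """Return the load the clients generate for one cluster size."""
    total_writes = clients * WRITES_PER_CLIENT
    return {
        "instances": instances,
        "writes_per_sec": total_writes,
        "writes_per_instance": total_writes / instances,
        "mb_per_sec": total_writes * WRITE_SIZE_BYTES / 1e6,
    }


if __name__ == "__main__":
    for instances, clients in SETUPS:
        print(offered_load(instances, clients))
```

Under these assumptions the per-instance load stays constant (about 4,167 writes/sec) while the total grows from 200k to 1.2M writes/sec, which is what a linear-scalability test needs: the cluster and the load grow in the same proportion.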
Column Oriented Key value store
- Data is stored in sparse, multi-dimensional hash tables
- Rows need not all have the same number of columns
- Each row has a unique key, which also determines partitioning
- No relations
- Denormalization is the key word

Relational view:

MovieId  Name          Length  Release
1        Insurgent     144     2015
2        Interstellar  98      2014
3        Mockingjay    122     2014

Cassandra view (row key, followed by that row's own columns):

1   Name          Length  Release
    Insurgent     144     2015
2   Name          Length  Release
    Interstellar  98      2014
3   Name          Length  Release
    Mockingjay    122     2014

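
The "sparse, multi-dimensional hash table" view can be sketched as a dictionary of dictionaries. This is an illustration of the idea only, not how Cassandra stores data on disk. The `Rating` column is a hypothetical addition to show sparseness, and `node_for_key` uses an MD5 hash modulo the node count as a simplified stand-in for a real partitioner and token ring.

```python
import hashlib

# Outer key: the row key. Inner dict: that row's own column names and values.
movies = {
    1: {"Name": "Insurgent", "Length": 144, "Release": 2015},
    2: {"Name": "Interstellar", "Length": 98, "Release": 2014},
    3: {"Name": "Mockingjay", "Length": 122, "Release": 2014},
}

# Sparse rows: one row can carry a column the others do not have.
movies[1]["Rating"] = "PG-13"  # hypothetical column, for illustration only


def node_for_key(row_key, n_nodes):
    """Map a row key to a node index by hashing the key (simplified stand-in)."""
    digest = hashlib.md5(str(row_key).encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return token % n_nodes


if __name__ == "__main__":
    for key, columns in movies.items():
        print(key, "-> node", node_for_key(key, 3), sorted(columns))
```

The row key does double duty here, as in the slide: it identifies the row and it alone decides which node holds it, which is why lookups by key do not need any cross-node join.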


CAP (Brewer’s theorem)
● Consistency (all nodes see the same data at the same time)
● Availability (every request receives a response: ok/not ok)
● Partition tolerance (the system continues to operate despite partial loss of the system)

Trade-offs:

● It is impossible to be both consistent and highly available during a network partition
● Latency between data centers also makes consistency impractical
● Cassandra tends to favor A and P (over C)
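
Tunable consistency means this trade-off can be shifted per request. A commonly cited rule of thumb is that reads see the latest acknowledged write when the number of replicas read (R) plus the number of replicas written (W) exceeds the replication factor (RF). The sketch below works through that arithmetic; the helper names are illustrative, and the mapping of the `ONE`, `QUORUM`, and `ALL` levels to replica counts is a simplification.

```python
def quorum(replication_factor):
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Replica count implied by a consistency level (simplified mapping)."""
    counts = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return counts[level]


def read_sees_latest_write(read_level, write_level, replication_factor):
    """True when the read and write replica sets must overlap (R + W > RF)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    for levels in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(levels, read_sees_latest_write(*levels, replication_factor=3))
```

With RF = 3, reading and writing at `ONE` favors availability and latency (1 + 1 is not greater than 3), while `QUORUM` on both sides gives overlapping replica sets (2 + 2 > 3) at the cost of needing a majority of replicas to respond.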
Cassandra Version History (as of June 2017)
● Recommended stable versions
○ 3.0.14 or 2.2.10
● Older, still supported version (critical fixes only)
○ 2.1, originally released on 16-Sep-2014 (latest update 10-Oct-2016)
● Unsupported versions
○ 2.0, 1.2, 1.1, 1.0, < 1.0
● First official version
○ 0.6 on 12-Apr-2010
● Latest stable version
○ 3.11.0, originally released on 23-June-2017

Cassandra uses a tick-tock-like release model: even-numbered releases provide both new features and bug fixes, while odd-numbered releases include bug fixes only.
