
Cassandra

The document provides an overview of Cassandra including its history, architecture, key features, users, and comparisons to other databases. Cassandra was originally developed at Facebook and is an open source NoSQL database that provides high availability, performance, and scalability across many servers.

Uploaded by

Nikhil Erande
Copyright © All Rights Reserved

Cassandra

CREATED BY
NIKHIL ERANDE

GUIDANCE BY
KAVITA MAHAJAN MADAM
OVERVIEW
1. What is Cassandra?
2. History
3. Architecture
4. Key Features and Benefits
5. Who’s Using Cassandra
6. Comparison with Other Databases
CASSANDRA - A Decentralized Structured Storage System
Apache Cassandra
• Apache Cassandra was initially developed at Facebook to power its Inbox Search feature.
• Cassandra brings together the distributed design of Amazon's highly available Dynamo and the data model of Google's BigTable.

Original author(s): Avinash Lakshman, Prashant Malik
Developer(s): Apache Software Foundation
Initial release: 2008
Stable release: 3.4 / March 8, 2016
Development status: Active
Written in: Java
Operating system: Cross-platform
Available in: English
Type: Database
License: Apache License 2.0
Website: cassandra.apache.org
DEFINITION OF CASSANDRA
Apache Cassandra™ is a free, distributed, high-performance, extremely scalable, fault-tolerant (i.e., no single point of failure) post-relational database solution. Cassandra can serve both as a real-time datastore (the “system of record”) for online/transactional applications and as a read-intensive database for business intelligence systems.
HISTORY OF CASSANDRA
[Figure: Google's Bigtable and Amazon's Dynamo both feed into Cassandra, created at Facebook]
CASSANDRA ARCHITECTURE
• Cassandra was designed with the understanding that system and hardware failures can and do occur
• Peer-to-peer, distributed system
• All nodes are identical (there is no master node)
• Data is partitioned among all nodes in the cluster
• Configurable data replication ensures fault tolerance
• Read/write-anywhere design
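Partitioning and replication as described above can be sketched as a token ring. This is a simplified model, not Cassandra's implementation: it assumes one token per node (no virtual nodes), uses MD5 as a stand-in hash (in the spirit of the older RandomPartitioner; current clusters default to Murmur3), and places replicas on the next nodes clockwise, similar to SimpleStrategy. `TokenRing`, `token` and the node names are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List

def token(key: str) -> int:
    """Hash a key onto the ring (MD5 as a stand-in partitioner hash)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class TokenRing:
    """Toy token ring: each node owns the range ending at its own token."""

    def __init__(self, node_tokens: Dict[str, int], replication_factor: int = 3):
        # Sort nodes by token so the ring can be walked clockwise.
        self.ring = sorted((tok, name) for name, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.ring]
        self.rf = min(replication_factor, len(self.ring))

    def replicas(self, key: str) -> List[str]:
        # Primary owner: first node whose token is >= the key's token (wrapping around).
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        # Remaining replicas: the next rf - 1 nodes clockwise.
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

# Four hypothetical nodes with evenly spaced tokens over the 128-bit MD5 range.
nodes = {"node-a": 0, "node-b": 2**126, "node-c": 2**127, "node-d": 3 * 2**126}
ring = TokenRing(nodes, replication_factor=3)
print(ring.replicas("user:42"))
```

Because any node can compute `replicas()` for any key, a client may send a read or write to any node, which then coordinates the request with the replicas; this is the idea behind the read/write-anywhere design.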
ARCHITECTURE OVERVIEW
• Nodes communicate with each other through the gossip protocol, which exchanges state information across the cluster every second
• Each node records write activity in a commit log, which ensures data durability
• Data is also written to an in-memory structure (the memtable), which is flushed to disk as an SSTable once it is full
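The write path above (commit log, then memtable, then SSTable) can be modeled in a few lines. This is a toy sketch under stated assumptions: `ToyNode` and `memtable_limit` are hypothetical, "full" is counted in entries rather than bytes, and clearing the whole commit log on flush simplifies how Cassandra recycles commit-log segments.

```python
from typing import Dict, List, Optional, Tuple

class ToyNode:
    """Toy model of one node's write path: commit log -> memtable -> SSTable."""

    def __init__(self, memtable_limit: int = 3):
        self.commit_log: List[Tuple[str, str]] = []  # append-only record for durability
        self.memtable: Dict[str, str] = {}           # in-memory, mutable
        self.sstables: List[Dict[str, str]] = []     # immutable "on-disk" tables
        self.memtable_limit = memtable_limit         # hypothetical flush threshold (entries)

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. log the write first
        self.memtable[key] = value            # 2. then update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush once the memtable is "full"

    def flush(self) -> None:
        # Persist the memtable as a sorted, immutable SSTable and start a new one.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        # The flushed data is now in an SSTable, so its log entries are no longer needed.
        self.commit_log.clear()

    def read(self, key: str) -> Optional[str]:
        # Newest data wins: memtable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None

node = ToyNode(memtable_limit=2)
node.write("alice", "TX")
node.write("bob", "CA")    # memtable reaches 2 entries and is flushed
node.write("alice", "NY")  # the newer value lives in the memtable
print(node.read("alice"), len(node.sstables))  # NY 1
```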


ARCHITECTURE OVERVIEW
The schema used in Cassandra is modeled after Google Bigtable: a row-oriented, column-based structure.
• A keyspace is akin to a database in the RDBMS world
• A column family is similar to an RDBMS table but is more flexible and dynamic
• A row in a column family is indexed by its key; other columns may be indexed as well

[Figure: the Portfolio keyspace containing a Customer column family with the columns ID, Name, SSN and DOB]
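The Portfolio/Customer example can be pictured as nested maps: keyspace, then column family, then row key, then column name/value pairs. This is a minimal sketch of the logical model only, not of how Cassandra stores data on disk; the sample values and the `get_column` helper are hypothetical.

```python
from typing import Dict, Optional

Row = Dict[str, str]                # column name -> value
ColumnFamily = Dict[str, Row]       # row key -> row
Keyspace = Dict[str, ColumnFamily]  # column family name -> column family

# Hypothetical sample data for the Portfolio keyspace; the row key plays the role of ID.
portfolio: Keyspace = {
    "Customer": {
        "1001": {"Name": "Alice", "SSN": "000-00-0001", "DOB": "1980-01-01"},
        # Rows need not share the same columns (flexible, sparse schema).
        "1002": {"Name": "Bob"},
    }
}

def get_column(ks: Keyspace, cf: str, row_key: str, column: str) -> Optional[str]:
    """Look up one column by row key; returns None if the row or column is absent."""
    return ks[cf].get(row_key, {}).get(column)

print(get_column(portfolio, "Customer", "1001", "Name"))  # Alice
print(get_column(portfolio, "Customer", "1002", "DOB"))   # None
```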


WHY CASSANDRA?
• Gigabyte-to-petabyte scalability
• Linear performance gains as nodes are added
• No single point of failure
• Easy replication and data distribution
• Multi-data center and cloud capable
• No need for a separate caching layer
• Tunable data consistency
• Flexible schema design
• Data compression
• CQL, an SQL-like query language
• Support for key languages and platforms
• No need for special hardware or software
BIG DATA SCALABILITY
• Capable of comfortably scaling to petabytes
• New nodes = linear performance increases
• New nodes can be added online
[Figure: nodes added to the ring; caption: “Double Throughput Capabilities”]
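Adding nodes online works because each node owns a contiguous range of the token ring: a new node takes over part of one existing node's range, so only the keys in that part have to move. The sketch below is a simplified illustration assuming one token per node, MD5 as the hash, and hypothetical node names and tokens; real clusters usually use virtual nodes, which spread the moved data across many peers.

```python
import hashlib
from typing import Dict

RING_SIZE = 2**128  # size of the MD5 output space, used here as the token range

def token(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def owner(key: str, node_tokens: Dict[str, int]) -> str:
    """The owning node is the first one whose token is >= the key's token (wrapping around)."""
    t = token(key)
    ordered = sorted(node_tokens, key=node_tokens.get)
    return next((n for n in ordered if node_tokens[n] >= t), ordered[0])

keys = [f"user:{i}" for i in range(1000)]
before = {"n1": RING_SIZE // 4, "n2": RING_SIZE // 2, "n3": 3 * RING_SIZE // 4, "n4": RING_SIZE - 1}
# Hypothetical new node n5 joins with a token halfway through n2's range.
after = dict(before, n5=3 * RING_SIZE // 8)

moved = [k for k in keys if owner(k, before) != owner(k, after)]
print(f"{len(moved)} of {len(keys)} keys change owner")
```

Every moved key goes from n2 to n5; the other nodes keep all of their data, which is why the cluster can keep serving requests while the new node joins.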
NO SINGLE POINT OF FAILURE
• All nodes are the same
• Customized replication provides tunable data redundancy
• Read/write from any node
• Data can be replicated across different physical data center racks
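Replicating across racks can be illustrated by walking the ring and preferring nodes on racks that do not yet hold a replica, loosely modeled on how NetworkTopologyStrategy places replicas within a data center. `place_replicas` and the node/rack names are hypothetical; this is a sketch of the idea, not Cassandra's code.

```python
from typing import List, Tuple

def place_replicas(ring: List[Tuple[str, str]], start: int, rf: int) -> List[str]:
    """
    ring:  (node, rack) pairs in token order; start: index of the primary owner.
    Walk clockwise, preferring nodes on racks that hold no replica yet; if there are
    fewer racks than rf, fill the remaining slots with skipped nodes in ring order.
    """
    chosen: List[str] = []
    racks_used = set()
    skipped: List[str] = []
    for i in range(len(ring)):
        if len(chosen) == rf:
            break
        node, rack = ring[(start + i) % len(ring)]
        if rack in racks_used:
            skipped.append(node)
        else:
            chosen.append(node)
            racks_used.add(rack)
    for node in skipped:
        if len(chosen) == rf:
            break
        chosen.append(node)
    return chosen

# Hypothetical six-node data center spread over three racks.
ring = [("n1", "rack1"), ("n2", "rack1"), ("n3", "rack2"),
        ("n4", "rack2"), ("n5", "rack3"), ("n6", "rack3")]
print(place_replicas(ring, start=0, rf=3))  # ['n1', 'n3', 'n5']
```

With one replica per rack, losing an entire rack still leaves two copies of every row.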


EASY REPLICATION / DATA DISTRIBUTION
• Transparently handled by Cassandra
• Multi-data center capable
• Exploits all the benefits of cloud computing
• Supports hybrid cloud/on-premises setups
NO NEED FOR CACHING SOFTWARE
• The peer-to-peer architecture removes the need for a special caching layer and the programming that goes with it
• The database cluster uses the memory of all participating nodes to cache the data assigned to each node
• No inconsistencies between a memory cache and the database are encountered
TUNABLE DATA CONSISTENCY
• Choose between strong and eventual consistency (from all replicas to any node responding), depending on the need
• Can be set on a per-operation basis, for both reads and writes
• Handles multi-data center operations

Write consistency levels: ANY, ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL
Read consistency levels: ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL
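The trade-off between these levels comes down to simple arithmetic. Assuming a single data center with replication factor RF, QUORUM means a strict majority, floor(RF/2) + 1 replicas, and a read is guaranteed to see the latest acknowledged write when the replicas read plus the replicas written exceed RF (R + W > RF). The helper names below are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """QUORUM = a strict majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def reads_see_latest_write(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read replica set must overlap every write replica set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor

rf = 3
print(quorum(rf))                                          # 2
print(reads_see_latest_write(quorum(rf), quorum(rf), rf))  # True: QUORUM reads + QUORUM writes
print(reads_see_latest_write(1, 1, rf))                    # False: ONE + ONE is eventual
print(reads_see_latest_write(1, rf, rf))                   # True: ONE reads after ALL writes
```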
DATA COMPRESSION
• Uses Google’s Snappy data compression algorithm
• Compresses data on a per-column-family basis
• Internal tests at DataStax showed raw data compressed by 80% or more in the best cases
• No performance penalty, and in some cases higher overall performance due to less physical I/O
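Compression savings like those quoted above are usually expressed as 1 - (compressed size / raw size). Python's standard library has no Snappy binding, so this sketch swaps in `zlib` purely to show how the figure is computed; the sample rows are hypothetical, and real results depend heavily on how repetitive the data is.

```python
import zlib

# Hypothetical rows with the kind of repetition often found within one column family.
raw = "".join(f"user{i:05d},state=TX,status=active\n" for i in range(1000)).encode("utf-8")

compressed = zlib.compress(raw)  # zlib stands in for Snappy in this sketch
savings = 1 - len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, compressed={len(compressed)} bytes, savings={savings:.0%}")
```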


CQL LANGUAGE
• Very similar to RDBMS SQL syntax
• Create objects via DDL (e.g., CREATE…)
• Core DML commands supported: INSERT, UPDATE, DELETE
• Query data with SELECT

SELECT *
FROM USERS
WHERE STATE = 'TX';
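In Cassandra, a query like this one filters on STATE, a column that is not the row key, so it generally needs a secondary index on that column (or an explicit ALLOW FILTERING). The idea behind such an index can be sketched as a map from column value to row keys, kept in sync on every insert. `ToyTable` is hypothetical; Cassandra's built-in secondary indexes are maintained locally on each node, and this single-process sketch leaves out that distributed part.

```python
from collections import defaultdict
from typing import Dict, List, Set

class ToyTable:
    """Rows keyed by primary key, plus a hypothetical secondary index on one column."""

    def __init__(self, indexed_column: str):
        self.rows: Dict[str, Dict[str, str]] = {}           # primary key -> row
        self.indexed_column = indexed_column
        self.index: Dict[str, Set[str]] = defaultdict(set)  # column value -> primary keys

    def insert(self, key: str, row: Dict[str, str]) -> None:
        # Simplification: the whole row is replaced (Cassandra merges columns on upsert).
        old = self.rows.get(key)
        if old is not None and self.indexed_column in old:
            self.index[old[self.indexed_column]].discard(key)  # drop the stale entry
        self.rows[key] = row
        if self.indexed_column in row:
            self.index[row[self.indexed_column]].add(key)

    def select_where(self, value: str) -> List[Dict[str, str]]:
        """Roughly what SELECT * FROM users WHERE state = value returns, via the index."""
        return [self.rows[k] for k in sorted(self.index.get(value, set()))]

users = ToyTable("state")
users.insert("alice", {"name": "Alice", "state": "TX"})
users.insert("bob", {"name": "Bob", "state": "CA"})
users.insert("carol", {"name": "Carol", "state": "TX"})
print([r["name"] for r in users.select_where("TX")])  # ['Alice', 'Carol']
```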
WHO’S USING CASSANDRA?
http://www.datastax.com/cassandrausers#all
USE CASE: NETFLIX
• Manage subscriber interactions with downloaded movies
• Need to handle distributed databases all over the world (40 countries)
• Need better TCO than Oracle



Why Cassandra?
• Easy scaling and multi-data center support for geographic data distribution
• Data model is a perfect fit for customer interaction data
• Much better TCO than Oracle or SimpleDB
COMPARISON WITH OTHER DATABASES
MongoDB vs Cassandra
COMPARISON WITH MYSQL
• MySQL (> 50 GB of data): average write ~300 ms; average read ~350 ms
• Cassandra (> 50 GB of data): average write 0.12 ms; average read 15 ms
• Statistics provided by the authors, using Facebook data.
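Taking the averages above at face value (the "~" values are approximate), the ratios work out as follows; this is plain arithmetic on the slide's figures, not an independent benchmark:

```latex
\text{write speedup} \approx \frac{300\ \text{ms}}{0.12\ \text{ms}} = 2500,
\qquad
\text{read speedup} \approx \frac{350\ \text{ms}}{15\ \text{ms}} \approx 23.3
```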
