Apache Cassandra: Database
Database
Software Engineering Branch
Our process is easy
● First: Basics of NoSQL, Cassandra, and the installation process
● Second: The different aspects of the Cassandra database
● Third: A demo illustrating the basics of the Cassandra Query Language
● Last: Strengths and weaknesses, and some questions
1
Introduction
Let’s start with some definitions.
“I don’t always use Cassandra,
but when I do, I denormalize.”
- Meme
NoSQL Databases
A NoSQL database (Not Only SQL) is a database that provides
a mechanism to store and retrieve data other than the tabular
relations used in relational databases. These databases are typically
schema-free, support easy replication, have simple APIs, are
eventually consistent, and can handle huge amounts of data.
NoSQL Databases
In general, they share the following features:
Apache Cassandra
A distributed NoSQL database
system for managing large
amounts of structured data
across many commodity servers,
while providing highly available
service and no single point of
failure.
Characteristics
Cassandra supports most of the general DBMS characteristics:
● Physical independence
● Data security
● Data sharing
● Speed of access
● Verification of integrity
● Manipulability
● Limitation of redundancy
The Installation
To start using Cassandra, we first need to set up a
workspace for it.
Requirements:
● The latest version of Java 8
● The latest version of Python 2.7 or 3.6
● Download the Software (DataStax Community Edition
for Apache Cassandra™)
Additional Tool:
You can use DataGrip to interact with the database
instead of cqlsh, but it requires a license key.
https://www.jetbrains.com/datagrip/
1 – Choose Cassandra
2 – Connection name
3 – Keyspace name
2
Key Principles
The principles of Cassandra that must be understood
Key Features
● High performance
● CQL query interface
● Distributed & decentralized
● Column oriented
● Fault tolerance
Distributed &
Decentralized
● Distributed: Capable of
running on multiple machines
● Decentralized: No single point
of failure
● No master-slave issues due to the
peer-to-peer architecture (gossip protocol)
Read and write requests can go to any node
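To make the peer-to-peer idea concrete, here is a minimal sketch of how information can spread through a cluster by gossip without any master node. It is a simplified push-only model, not Cassandra's actual gossip implementation; the function name `gossip_rounds` and the fixed random seed are assumptions made for illustration.

```python
import random

def gossip_rounds(node_count: int, seed: int = 0) -> int:
    """Count rounds until state first known to node 0 reaches every node."""
    rng = random.Random(seed)   # fixed seed keeps the run reproducible
    informed = {0}              # node 0 starts with the new information
    rounds = 0
    while len(informed) < node_count:
        rounds += 1
        # Every informed node tells one randomly chosen peer per round.
        # The peer may already be informed; no node acts as a master.
        for _node in list(informed):
            informed.add(rng.randrange(node_count))
    return rounds

rounds = gossip_rounds(node_count=16)
print(f"all 16 nodes informed after {rounds} rounds")
```

Because the informed set can at most double per round, 16 nodes need at least 4 rounds in this model; the exact count depends on the random choices.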
Elastic
Scalability
● Cassandra scales horizontally by
adding more machines that
hold all or some of the data
● Adding nodes increases
performance and throughput
linearly
● Increasing or decreasing the
node count happens seamlessly
Scales linearly to terabytes and petabytes of data
High Availability &
Fault Tolerance
High availability?
● Multiple networked computers
operating in a cluster
● Facility for recognizing node
failures
● Failing requests over
to another part of the system
No single point of failure due to the peer-to-peer architecture
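The failover behaviour in the list above can be sketched as follows. This is an illustrative model only: the function `read_with_failover`, the node names, and the in-memory `storage` dictionary are all hypothetical and do not correspond to any Cassandra API.

```python
from typing import Dict, List, Optional

def read_with_failover(key: str,
                       replicas: List[str],
                       up: Dict[str, bool],
                       storage: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return the value from the first replica that is alive."""
    for node in replicas:
        if not up.get(node, False):
            continue                    # node is down: fail over to the next replica
        return storage[node].get(key)   # first live replica answers the request
    raise RuntimeError("no replica available")

# Hypothetical three-node cluster where every node holds a copy of the row.
storage = {name: {"product:1": "Hello"} for name in ("n1", "n2", "n3")}
up = {"n1": False, "n2": True, "n3": True}   # n1 has failed

value = read_with_failover("product:1", ["n1", "n2", "n3"], up, storage)
print(value)
```

The request still succeeds although `n1` is down, because another replica can serve it.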
Column oriented
Key-value store: row R1 holds columns C1, C2 and C3, each identified by its key
Cassandra Query Language
Cassandra Query Language
CREATE TABLE songs (
id uuid PRIMARY KEY, title text,
album text, artist text,
data blob );
SELECT * FROM songs
WHERE id = a3e64f8f...;
SELECT * FROM songs;
Cassandra Query Language
The resulting table in an RDBMS is this:
Cassandra Query Language
The resulting table in Cassandra is this:
g617Dd23… Al Kanas
MySQL Comparison:
Statistics based on 50 GB of data
Cassandra MySQL
And Much More…
The Data Model
How is the database organized?
Data Model
Cluster:
The Cassandra database is distributed over several machines that operate
together. The outermost container is known as the cluster. For failure
handling, data is replicated across nodes, so if a node fails, a replica
takes over. Cassandra arranges the nodes of a cluster in a ring and
assigns data to them.
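The ring arrangement can be illustrated with a small sketch. It assumes one token per node and uses SHA-256 purely as a stable stand-in hash; Cassandra's real partitioner and token allocation differ, and the `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib
from typing import List

class Ring:
    """Toy token ring: each node owns one token, keys go to the next nodes clockwise."""

    def __init__(self, nodes: List[str], replication_factor: int = 3) -> None:
        self.rf = min(replication_factor, len(nodes))
        # Sorted (token, node) pairs form the ring.
        self.tokens = sorted((self._token(n), n) for n in nodes)

    @staticmethod
    def _token(value: str) -> int:
        # Assumption: SHA-256 is used here only to get a deterministic token.
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def replicas(self, partition_key: str) -> List[str]:
        token = self._token(partition_key)
        ring_tokens = [t for t, _ in self.tokens]
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(ring_tokens, token) % len(self.tokens)
        return [self.tokens[(start + i) % len(self.tokens)][1]
                for i in range(self.rf)]

nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = Ring(nodes, replication_factor=3)
owners = ring.replicas("song:42")
print(owners)   # three distinct nodes that would hold this partition
```

Every key maps deterministically to a run of consecutive nodes on the ring, which is what lets any node work out where a row lives.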
Data Model
Keyspace: Outermost container for data (one or more column families), like a database in an RDBMS.
Column family: Contains super columns or columns (but not both).
Column: Basic data structure with a key, a value, and a timestamp.
Data Model
Keyspace (with settings) contains:
Column Family (with settings), which contains:
Column: key, value, timestamp
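The timestamp stored with each column is what allows replicas to agree on the newest value. Below is a minimal last-write-wins sketch; the `Column` dataclass and `reconcile` function are illustrative names, and tie-breaking on equal timestamps is simplified compared with what Cassandra does.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    key: str
    value: str
    timestamp: int   # write time, e.g. microseconds since the epoch

def reconcile(a: Column, b: Column) -> Column:
    """Return the version with the newer timestamp (last write wins)."""
    if a.key != b.key:
        raise ValueError("can only reconcile two versions of the same column")
    # Simplification: on equal timestamps the first argument is kept.
    return a if a.timestamp >= b.timestamp else b

old = Column(key="title", value="Hello", timestamp=1)
new = Column(key="title", value="Hello again", timestamp=2)
print(reconcile(old, new).value)
```

The result is the same regardless of the order in which the two versions are compared, as long as their timestamps differ.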
3
Demo
Examples illustrating different parts of CQL
Examples Using CQL
The following slides demonstrate different cases with the
different CQL interfaces, such as DDL and DML.
Tables used in the examples:
● User: Id, Name, Phone, Age
● Emails: Id, email
Interface DDL
Same as SQL, but with keyspace and type options added.
● DROP: Type, Keyspace, Table, Index, Trigger
● CREATE: Type, Keyspace, Table, Index, Trigger
● ALTER: Type, Keyspace, Table
Interface DML
The DML interface is the same as normal SQL DML:
SELECT, INSERT, UPDATE, DELETE
Interface DCL
USER
VIEW
Metadata
& Logging
How to view metadata and logs in a
Cassandra database?
Metadata Using DESCRIBE KEYSPACE
DESCRIBE KEYSPACE name
Metadata Keyspace
Query the defined keyspaces using the SELECT statement.
SELECT * FROM system_schema.keyspaces;
…… …… ……
Metadata Tables
Getting information about tables in the test keyspace.
…… …… ……
Metadata Columns
Getting information about columns in the users table.
…… …… …… …… ……
Logging with System.log
To see what is happening in the database, you can use the
system.log file under the Cassandra home directory to track
creation queries.
{CASSANDRA HOME}/utils/cassandra.logdir_IS_UNDEFINED/
Logging with System.log
Here is an Example
Logging with Tracing
Tracing is an option that can be switched on or off from cqlsh:
TRACING [ON | OFF]
USE system_traces;
SELECT * FROM events;
Logging with Tracing
Example:
insert into product(id, name) values(UUID(), 'Hello');
Result:
Execute CQL3 query
Parsing insert into product(id , name) values(UUID(), 'Hello');
Preparing statement
……
4
Debate
Strengths and weaknesses of Cassandra.
Strengths (1)
● Linear scale performance
The ability to add nodes without failures leads to predictable
increases in performance
● Supports multiple languages
Python, C#/.NET, C++, Ruby, Java, Go, and many more…
● Operational and developmental simplicity
There are no complex software tiers to be managed, so
administration duties are greatly simplified.
Strengths (2)
● Ability to deploy across data centers
Cassandra can be deployed across multiple, geographically
dispersed data centers
● Cloud availability
Cassandra can be installed in cloud environments
● Peer-to-peer architecture
Cassandra follows a peer-to-peer architecture instead of a
master-slave architecture
Strengths (3)
● Flexible data model
Supports modern data types with fast writes and reads
● Fault tolerance
Nodes that fail can easily be restored or replaced
● High Performance
Cassandra has demonstrated strong performance on
large data sets
Strengths (4)
● Schema-free/Schema-less
In Cassandra, columns can be created at will within
rows. The Cassandra data model is also known as a
schema-optional data model
● AP-CAP
Cassandra is typically classified as an AP system, meaning
that availability and partition tolerance are generally
considered to be more important than consistency in
Cassandra
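Although Cassandra is usually placed on the AP side, its consistency is tunable per request: a read and a write overlap on at least one replica whenever the number of replicas read plus the number written exceeds the replication factor. The sketch below works through that arithmetic; the function names are hypothetical helpers, not part of any driver.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas for a given replication factor."""
    return replication_factor // 2 + 1

def reads_see_latest_write(read_replicas: int,
                           write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor

rf = 3
q = quorum(rf)
print(q)                                   # majority of 3 replicas
print(reads_see_latest_write(q, q, rf))    # quorum reads + quorum writes overlap
print(reads_see_latest_write(1, 1, rf))    # single-replica reads and writes may not
```

With a replication factor of 3, quorum reads and writes touch 2 replicas each, so they always share one; reading and writing a single replica favours availability and latency instead.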
Weaknesses (1)
Use cases where it is better to avoid Cassandra
● If there are too many joins required to retrieve the data
● To store configuration data
● During compaction, things slow down and throughput
degrades
● Basic things like aggregation operators are not supported
● Range queries on partition key are not supported
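Since joins are not available, the usual workaround is to denormalize: store the data already combined in the shape the query needs. The sketch below illustrates this with rows shaped like the User and Emails tables of the demo; the sample values, the assumption that the Emails `id` refers to the user, and the name `build_emails_by_user` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical rows shaped like the User and Emails tables.
users: List[dict] = [{"id": 1, "name": "Ada", "phone": "555-0100", "age": 36}]
emails: List[dict] = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 1, "email": "ada@work.example.com"},
]

def build_emails_by_user(users: List[dict], emails: List[dict]) -> Dict[int, dict]:
    """Precompute at write time the rows a join would produce at read time."""
    addresses = defaultdict(list)
    for row in emails:
        addresses[row["id"]].append(row["email"])
    return {
        user["id"]: {"name": user["name"], "emails": addresses[user["id"]]}
        for user in users
    }

emails_by_user = build_emails_by_user(users, emails)
print(emails_by_user[1])   # one lookup by key, no join needed
```

The cost is duplicated data that has to be kept in sync on every write, which is the trade-off behind the "I denormalize" quote.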
Weaknesses (2)
Use cases where it is better to avoid Cassandra
● If there is transactional data that requires 100%
consistency
● Cassandra can update and delete data, but it is not
designed to do so
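One reason deletes are costly is that Cassandra does not remove data in place: a delete is recorded as a marker (a tombstone) that later reads must skip until compaction purges it. The following is a minimal in-memory sketch of that idea; the `Table` class and its methods are illustrative only and not a Cassandra API.

```python
from typing import Dict, Optional, Tuple

TOMBSTONE = object()   # marker meaning "this key was deleted"

class Table:
    """Toy store where every change, including a delete, is a timestamped write."""

    def __init__(self) -> None:
        self.cells: Dict[str, Tuple[int, object]] = {}

    def write(self, key: str, value: object, timestamp: int) -> None:
        current = self.cells.get(key)
        # Keep the change only if it is at least as new as what is stored.
        if current is None or timestamp >= current[0]:
            self.cells[key] = (timestamp, value)

    def delete(self, key: str, timestamp: int) -> None:
        self.write(key, TOMBSTONE, timestamp)   # a delete is just another write

    def read(self, key: str) -> Optional[object]:
        cell = self.cells.get(key)
        if cell is None or cell[1] is TOMBSTONE:
            return None                         # deleted rows are filtered at read time
        return cell[1]

table = Table()
table.write("song:1", "Hello", timestamp=1)
table.delete("song:1", timestamp=2)
print(table.read("song:1"))       # the row reads as absent
print("song:1" in table.cells)    # but the tombstone still occupies space
```

The tombstone also stops an older, delayed write from bringing the row back, which is why it cannot simply be dropped immediately.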
Thanks!
Any questions?