Intro To Cassandra For Developers
Housekeeping
Courses: youtube.com/DataStaxDevs
Runtime: dtsx.io/workshop
YouTube, Twitch
Achievement Unlocked! - “Introduction to Cassandra”
Homework
Fully managed Cassandra
Without the ops!
DataStax Astra
Apache Cassandra™ = NoSQL Distributed Database
1 installation = 1 NODE
✔ Capacity = ~2-4 TB per node
✔ Throughput = LOTS of Tx/sec/core
Nodes together form a DataCenter, also called a Ring.
Communication between nodes:
✔ Gossiping
Data is Distributed
The partition key (here, Country) decides which node stores each row.
Country  City         Population
USA      New York     8,000,000
USA      Los Angeles  4,000,000
FR       Paris        2,230,000
DE       Berlin       3,350,000
FR       Toulouse     1,100,000
DE       Nuremberg    500,000
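To make "the partition key decides the node" concrete, here is a minimal sketch of the idea on a toy ring that reuses the six tokens shown in the ring diagrams (0, 17, 33, 50, 67, 83). Assumptions: the 0-99 token range and the use of MD5 are illustrative stand-ins; Cassandra's default Murmur3Partitioner hashes to a 64-bit token range. `token_for` and `owner_node` are hypothetical helper names.

```python
import hashlib
from bisect import bisect_left

# Toy ring: the six node tokens shown in the slides, on an assumed 0-99 token range.
RING_TOKENS = [0, 17, 33, 50, 67, 83]
TOKEN_RANGE = 100  # Cassandra's Murmur3Partitioner uses -2**63 .. 2**63 - 1 instead

def token_for(partition_key: str) -> int:
    """Hash a partition key to a token (MD5 is a stand-in for Murmur3 here)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % TOKEN_RANGE

def owner_node(token: int) -> int:
    """Each node owns the range (previous token, its own token]; wrap past the last node."""
    i = bisect_left(RING_TOKENS, token)
    return RING_TOKENS[i % len(RING_TOKENS)]

rows = [("USA", "New York"), ("USA", "Los Angeles"), ("FR", "Paris"),
        ("DE", "Berlin"), ("FR", "Toulouse"), ("DE", "Nuremberg")]
for country, city in rows:
    t = token_for(country)
    # Rows with the same partition key (country) always land on the same node.
    print(f"{country:4} {city:12} token={t:2} node={owner_node(t)}")
```

Because only the partition key is hashed, all USA rows share one token and therefore one owner node, while the cities themselves play no role in placement.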
Replication Factor
RF = 3: Replication Factor 3 means that every row is stored on 3 different nodes.
(Figure: a ring of six nodes with tokens 0, 17, 33, 50, 67 and 83.)
Replication within the Ring
With RF = 3, a row whose partition key hashes to token 59 is stored on node 67, whose token range (50, 67] contains 59, and on the next two nodes clockwise, 83 and 0: three copies in total.
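The placement rule behind the ring figure can be sketched in a few lines. This models SimpleStrategy-style placement (owner node, then the next RF-1 nodes clockwise) on the toy ring from the slides; production clusters normally use NetworkTopologyStrategy, which also takes racks and data centers into account. `replicas` is a hypothetical helper name.

```python
from bisect import bisect_left

RING_TOKENS = [0, 17, 33, 50, 67, 83]  # toy ring from the slides

def replicas(token: int, rf: int = 3) -> list[int]:
    """SimpleStrategy-style placement: the owner node, then the next rf-1 nodes clockwise."""
    if rf > len(RING_TOKENS):
        raise ValueError("replication factor cannot exceed the number of nodes")
    start = bisect_left(RING_TOKENS, token) % len(RING_TOKENS)  # owner of (prev, token]
    return [RING_TOKENS[(start + i) % len(RING_TOKENS)] for i in range(rf)]

print(replicas(59))  # [67, 83, 0]
```

Wrapping with the modulo is what lets token 59's third copy land on node 0, matching the figure.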
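Node Failure
If one of the three replica nodes for row 59 is down, the other two replicas still store the row, and a Hint is kept for the unavailable node (hinted handoff).
Node Failure Recovered
When the failed node comes back, the Hint is replayed to it, so all three replicas hold row 59 again.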
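Hinted handoff can be illustrated with a small in-memory model: the coordinator writes to every live replica and records a hint for each replica that is down, then replays the hints once that node is back. This is a simplified sketch, not Cassandra's implementation (real hints are persisted on disk and expire after a configurable window). Which node fails is not stated on the slides; node 83 is assumed here, and the `Node` and `Coordinator` classes are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    token: int
    up: bool = True
    data: dict = field(default_factory=dict)

@dataclass
class Coordinator:
    nodes: dict[int, Node]
    hints: list[tuple[int, str, str]] = field(default_factory=list)  # (target, key, value)

    def write(self, replica_tokens: list[int], key: str, value: str) -> int:
        """Write to every live replica; store a hint for each replica that is down."""
        acks = 0
        for t in replica_tokens:
            node = self.nodes[t]
            if node.up:
                node.data[key] = value
                acks += 1
            else:
                self.hints.append((t, key, value))
        return acks

    def replay_hints(self) -> None:
        """Deliver stored hints to nodes that are back up; keep the rest."""
        remaining = []
        for t, key, value in self.hints:
            if self.nodes[t].up:
                self.nodes[t].data[key] = value
            else:
                remaining.append((t, key, value))
        self.hints = remaining

nodes = {t: Node(t) for t in [0, 17, 33, 50, 67, 83]}
coord = Coordinator(nodes)
nodes[83].up = False                              # Node Failure (assumed: node 83)
acks = coord.write([67, 83, 0], "row-59", "data")
print(acks, len(coord.hints))                     # 2 1
nodes[83].up = True                               # Node Failure Recovered
coord.replay_hints()
print(sorted(t for t, n in nodes.items() if "row-59" in n.data))  # [0, 67, 83]
```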
Immediate Consistency – A Better Way
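Client Write: CL = QUORUM
Client Read: CL = QUORUM
With RF = 3, QUORUM means 2 replicas, so every quorum read overlaps every quorum write on at least one replica and always sees the latest value.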
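The overlap argument is simple arithmetic: a read is guaranteed to reach at least one replica that acknowledged the write whenever R + W > RF. A minimal sketch, with hypothetical helper names:

```python
def quorum(rf: int) -> int:
    """QUORUM is a majority of the replicas: floor(RF / 2) + 1."""
    return rf // 2 + 1

def is_immediately_consistent(write_replicas: int, read_replicas: int, rf: int) -> bool:
    """Read and write replica sets must overlap when R + W > RF."""
    return write_replicas + read_replicas > rf

rf = 3
w = r = quorum(rf)
print(w, is_immediately_consistent(w, r, rf))  # 2 True
print(is_immediately_consistent(1, 1, rf))     # False (CL = ONE for both)
```

QUORUM on both sides also means one of the three replicas can be down without blocking reads or writes, which is why it is presented as the better option compared with writing or reading at ALL.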
Data Distributed Everywhere
On-premise
Understanding Use Cases
● Scalability
● High Throughput, High Volume
● Heavy Writes, Heavy Reads
● Event Streaming, Internet of Things
● Log Analytics, Other Time Series
1. Tables, Partitions
3. What’s NEXT?
Data Structure: a Cell
An intersection of a row
and a column, stores data.
Data Structure: a Row
A single, structured
data item in a table.
Data Structure: a Partition
A group of rows that share the same partition key.
Example Data: Users organized by city
Keyspace: killrvideo
Table: users_by_city
City     Last Name  First Name  Address        Email
Phoenix  Hellson    Kevin       23 Jackson St  [email protected]
Phoenix  Lastfall   Norda       3 Stone St     [email protected]
Phoenix  Smith      Jana        3 Stone St     [email protected]
Seattle  Franklin   George      2 Star St      [email protected]
Seattle  Jackson    Jane        2 Star St      [email protected]
Seattle  Jasons     Judy        2 Star St      [email protected]
Each city is a partition; each line within it is a row.
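One way to picture the cell / row / partition hierarchy is as nested data: a partition key maps to a list of rows kept sorted by their clustering columns, and each row is a set of cells. The sketch below groups the users_by_city rows that way. Emails are omitted because they are not recoverable from the example, so it sorts only on last and first name; `build_partitions` is a hypothetical helper.

```python
from collections import defaultdict

# Rows from the users_by_city example, deliberately out of order (emails omitted).
rows = [
    {"city": "Phoenix", "last_name": "Smith", "first_name": "Jana", "address": "3 Stone St"},
    {"city": "Seattle", "last_name": "Jasons", "first_name": "Judy", "address": "2 Star St"},
    {"city": "Phoenix", "last_name": "Hellson", "first_name": "Kevin", "address": "23 Jackson St"},
    {"city": "Seattle", "last_name": "Franklin", "first_name": "George", "address": "2 Star St"},
    {"city": "Phoenix", "last_name": "Lastfall", "first_name": "Norda", "address": "3 Stone St"},
    {"city": "Seattle", "last_name": "Jackson", "first_name": "Jane", "address": "2 Star St"},
]

def build_partitions(rows):
    """Group rows by partition key (city), then sort each partition by clustering columns."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["city"]].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: (r["last_name"], r["first_name"]))
    return dict(partitions)

parts = build_partitions(rows)
print([r["last_name"] for r in parts["Phoenix"]])  # ['Hellson', 'Lastfall', 'Smith']
```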
Partition Key
An identifier for a partition. Consists of at least one column; may have more if needed.
CREATE TABLE killrvideo.users_by_city (
    city text,
    last_name text,
    first_name text,
    address text,
    email text,
    PRIMARY KEY ((city), last_name, first_name, email));
Good Example:
● PRIMARY KEY ((city), last_name, first_name, email);
Bad Example:
● PRIMARY KEY ((sensor_id), logged_at); every reading of a sensor goes into a single partition that keeps growing.

Clustering Column(s)
Used to ensure uniqueness and sorting order. Optional.
In users_by_city, last_name, first_name and email are the clustering columns: they sort the rows inside each city partition and, together with city, identify each row uniquely.
Good Example:
● PRIMARY KEY ((city), last_name, first_name, email);
Bad Example:
● PRIMARY KEY ((city), last_name, first_name); two users with the same name in the same city get the same primary key, so the second write overwrites the first.
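The uniqueness problem comes from the fact that a CQL INSERT is an upsert: writing a row whose primary key already exists replaces the existing row instead of adding a new one. The sketch models a table as a dictionary keyed by the primary key tuple. The two users and their example.com addresses are hypothetical data.

```python
# A table modeled as {primary key tuple: row}; inserting an existing key replaces the row,
# mimicking Cassandra's upsert behavior.
def insert(table: dict, pk_columns: tuple, row: dict) -> None:
    key = tuple(row[c] for c in pk_columns)
    table[key] = row

# Hypothetical users: same name, same city, different emails.
users = [
    {"city": "Phoenix", "last_name": "Smith", "first_name": "Jana", "email": "jana1@example.com"},
    {"city": "Phoenix", "last_name": "Smith", "first_name": "Jana", "email": "jana2@example.com"},
]

good, bad = {}, {}
for u in users:
    insert(good, ("city", "last_name", "first_name", "email"), u)
    insert(bad, ("city", "last_name", "first_name"), u)

print(len(good), len(bad))  # 2 1
```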
BUCKETING
● Avoid big and constantly growing partitions!
● Avoid hot partitions
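Bucketing usually means adding a coarse time component to the partition key so that no single partition grows forever. The sketch assumes a hypothetical scheme with one partition per sensor per UTC day, which in CQL would correspond to something like PRIMARY KEY ((sensor_id, day), logged_at). The bucket size is a design choice that depends on write rate; daily buckets are only an example. `bucketed_partition_key` is a hypothetical helper.

```python
from datetime import datetime, timezone

def bucketed_partition_key(sensor_id: str, logged_at: datetime) -> tuple[str, str]:
    """Hypothetical bucketing: one partition per sensor per UTC day."""
    day = logged_at.astimezone(timezone.utc).date().isoformat()
    return (sensor_id, day)

readings = [datetime(2021, 3, 1, 23, 59, tzinfo=timezone.utc),
            datetime(2021, 3, 2, 0, 1, tzinfo=timezone.utc)]
keys = {bucketed_partition_key("s-42", t) for t in readings}
print(sorted(keys))  # [('s-42', '2021-03-01'), ('s-42', '2021-03-02')]
```

Two readings two minutes apart fall into different partitions, so each partition holds at most one day of data for one sensor. The trade-off is that a query spanning several days must read several partitions.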
Normalization
"Database normalization is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by Edgar F. Codd as part of his relational model."
PROS: Simple write, Data Integrity
CONS: Slow read, Complex Queries
Employees
userId  deptId  firstName  lastName
1       1       Edgar      Codd
2       1       Raymond    Boyce
Departments
departmentId  department
1             Engineering
2             Math
Denormalization
"Denormalization is a strategy used on a database to increase performance. In computing, denormalization is the process of trying to improve the read performance of a database, at the expense of losing some write performance, by adding redundant copies of data."
Employees
userId  firstName  lastName  department
1       Edgar      Codd      Engineering
2       Raymond    Boyce     Engineering
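The PROS/CONS of the two approaches can be seen side by side on the Employees/Departments data. In the normalized form a read has to join on deptId; in the denormalized form a read is a single lookup, but renaming a department must touch every copy. Python dicts stand in for tables here, and `read_normalized` and `rename_department` are hypothetical helpers.

```python
# Normalized: two tables; reading an employee's department needs a join on deptId.
employees = [
    {"userId": 1, "deptId": 1, "firstName": "Edgar", "lastName": "Codd"},
    {"userId": 2, "deptId": 1, "firstName": "Raymond", "lastName": "Boyce"},
]
departments = {1: "Engineering", 2: "Math"}

def read_normalized(user_id):
    emp = next(e for e in employees if e["userId"] == user_id)
    return {**emp, "department": departments[emp["deptId"]]}  # join at read time

# Denormalized: the department name is copied into every employee row.
employees_denorm = {
    1: {"userId": 1, "firstName": "Edgar", "lastName": "Codd", "department": "Engineering"},
    2: {"userId": 2, "firstName": "Raymond", "lastName": "Boyce", "department": "Engineering"},
}

def rename_department(old, new):
    """Write-side cost of denormalization: every redundant copy must be updated."""
    updated = 0
    for row in employees_denorm.values():
        if row["department"] == old:
            row["department"] = new
            updated += 1
    return updated

print(read_normalized(2)["department"])         # Engineering
print(employees_denorm[2]["department"])        # Engineering (single lookup, no join)
print(rename_department("Engineering", "R&D"))  # 2
```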
Relational Data Modelling
Starts from the Data and works toward the Application.
1. Analyze raw data
NoSQL Data Modelling
Starts from the Application and works toward the Queries.
1. Analyze user behaviour (customer first!)
Designing Process:
Conceptual Data Model
Designing Process:
Application Workflow
Use-Case I:
● A User opens a Profile
WF2: Find comments related to target user using its identifier, get most recent first
Use-Case II:
● A User opens a Video Page
WF1: Find comments related to target video using its identifier, most recent first
Designing Process:
Mapping
comments_by_user comments_by_video
userid K videoid K
creationdate creationdate C
↑
C
↑
commentid C↑ commentid C↑
videoid userid
comment comment
Designing Process:
Physical Data Model
comments_by_user comments_by_video
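The physical model keeps the same comment in two tables, one per workflow, so each workflow reads a single partition. The in-memory sketch below shows that write pattern: every comment is written to both tables, and each query touches one partition key and returns the most recent comments first. Assumptions: Python dicts stand in for the tables, `uuid.uuid1()` stands in for a CQL timeuuid, and sorting happens at read time for simplicity, whereas Cassandra stores rows already in clustering order. The sample user and video ids are hypothetical.

```python
import uuid
from collections import defaultdict
from datetime import datetime

comments_by_user = defaultdict(list)   # partition key: userid
comments_by_video = defaultdict(list)  # partition key: videoid

def add_comment(userid, videoid, comment, creationdate):
    """Denormalized write: the same comment goes into both tables."""
    commentid = uuid.uuid1()  # time-based UUID, similar in spirit to CQL timeuuid
    row = {"creationdate": creationdate, "commentid": commentid, "comment": comment}
    comments_by_user[userid].append({**row, "videoid": videoid})
    comments_by_video[videoid].append({**row, "userid": userid})
    return commentid

def most_recent_first(partition):
    return sorted(partition, key=lambda r: r["creationdate"], reverse=True)

add_comment("u1", "v1", "First!", datetime(2021, 1, 1))
add_comment("u2", "v1", "Nice video", datetime(2021, 1, 2))
add_comment("u1", "v2", "Great talk", datetime(2021, 1, 3))

# WF1: comments for a video. WF2: comments by a user. Each reads exactly one partition.
print([r["comment"] for r in most_recent_first(comments_by_video["v1"])])  # ['Nice video', 'First!']
print([r["comment"] for r in most_recent_first(comments_by_user["u1"])])   # ['Great talk', 'First!']
```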
3. What’s NEXT?
Homework
MORE LEARNING!!!!
Developer site: datastax.com/dev
● Developer Stories
● New hands-on learning scenarios with Katacoda
● Try it Out
● Cassandra Fundamentals
https://www.datastax.com/learn/cassandra-fundamentals
● New Data Modeling course
https://www.datastax.com/dev/modeling
✔ datastax.com/dev
✔ community.datastax.com
✔ DataStax Developers YouTube Channel
Weekly Workshops https://www.datastax.com/workshops
Join our 10k Discord Community https://bit.ly/cassandra-workshop
The Fellowship of the RINGS
Thank you!