Intro To Cassandra For Developers

Uploaded by

Adithya ghost
Copyright
© © All Rights Reserved
Intro to Cassandra for Developers

Housekeeping
Courses: youtube.com/DataStaxDevs
Runtime: dtsx.io/workshop
Questions: bit.ly/cassandra-workshop
Quiz: menti.com

YouTube, Twitch, Discord

Achievement Unlocked! - “Introduction to Cassandra”
Homework
==
Fully managed Cassandra
Without the ops!
DataStax Astra

Global Scale: Put your data where you need it without compromising performance, availability, or accessibility.

No Operations: Eliminate the overhead to install, operate, and scale Cassandra.

25 Gig Free Tier: Launch a database in the cloud with a few clicks, no credit card required.
menti.com
Apache Cassandra™ = NoSQL Distributed Database

1 Installation = 1 NODE
✔ Capacity = ~ 2-4TB
✔ Throughput = LOTS Tx/sec/core

DataCenter | Ring
[Figure: nodes arranged in a ring]

Communication:
✔ Gossiping
Apache Cassandra™ = NoSQL Distributed Database

- Big Data Ready


- Highest Availability
- Geographical Distribution
- Read/Write Performance
- Vendor Independent
Data is Distributed

Country  City         Population
USA      New York     8.000.000
USA      Los Angeles  4.000.000
FR       Paris        2.230.000
DE       Berlin       3.350.000
UK       London       9.200.000
AU       Sydney       4.900.000
DE       Nuremberg    500.000
CA       Toronto      6.200.000
CA       Montreal     4.200.000
FR       Toulouse     1.100.000
JP       Tokyo        37.430.000
IN       Mumbai       20.200.000

Partition Key: Country
Data is Distributed

Country  City         Population

USA      New York     8.000.000
USA      Los Angeles  4.000.000

FR       Paris        2.230.000
FR       Toulouse     1.100.000

DE       Berlin       3.350.000
DE       Nuremberg    500.000

UK       London       9.200.000

JP       Tokyo        37.430.000

AU       Sydney       4.900.000

CA       Toronto      6.200.000
CA       Montreal     4.200.000

IN       Mumbai       20.200.000
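A minimal sketch of the idea on this slide: rows that share a partition key land on the same node. Assumptions, not from the slides: the node names are hypothetical, MD5 stands in for Cassandra's real partitioner, and a simple modulo replaces token ranges.

```python
import hashlib

# Hypothetical node names; a real cluster assigns token ranges to nodes.
NODES = ["node-1", "node-2", "node-3", "node-4"]


def node_for(partition_key):
    """Hash the partition key and map it to one node (simplified placement)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


rows = [
    ("USA", "New York"),
    ("USA", "Los Angeles"),
    ("FR", "Paris"),
    ("FR", "Toulouse"),
    ("DE", "Berlin"),
]

# Group the rows by the node their partition key (country) hashes to.
placement = {}
for country, city in rows:
    placement.setdefault(node_for(country), []).append((country, city))

for node, stored in sorted(placement.items()):
    print(node, stored)
```

Whatever node a country hashes to, all of its cities end up in the same group, which is the point of the partition key.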
Data is Replicated

RF = 3
Replication Factor 3 means that every row is stored on 3 different nodes.

[Figure: ring of nodes with tokens 17, 33, 50, 67, 83]
Replication within the Ring

RF = 3

[Figure, three steps: a row with token 59 (data) arrives at a ring of nodes with tokens 0, 17, 33, 50, 67, 83; the row is placed on the ring; with RF = 3, three copies of 59 (data) are stored.]
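A rough sketch of how replicas could be picked on this ring. The node tokens and the data token 59 come from the slide; the placement rule (first node at or after the data token, then the next nodes clockwise) is an assumption that mirrors a simple single-datacenter strategy, and the slide itself does not say which nodes are chosen.

```python
from bisect import bisect_left

# Node positions on the ring, as shown on the slide.
NODE_TOKENS = [0, 17, 33, 50, 67, 83]


def replicas(data_token, rf=3):
    """Return the tokens of the rf nodes holding a row, walking clockwise."""
    # First node whose token is >= the data token; wrap around past the last node.
    start = bisect_left(NODE_TOKENS, data_token) % len(NODE_TOKENS)
    return [NODE_TOKENS[(start + i) % len(NODE_TOKENS)] for i in range(rf)]


print(replicas(59))  # prints [67, 83, 0]
```

Under this rule the row with token 59 is owned by node 67 and copied to the next two nodes on the ring.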
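Node Failure

RF = 3

[Figure: ring of nodes with tokens 0, 17, 33, 50, 67, 83 holding copies of 59 (data); one replica is unavailable and a Hint is kept for it.]

Node Failure Recovered

RF = 3

[Figure: the same ring after the node returns; the Hint is delivered so that all three copies of 59 (data) are present.]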
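The "Hint" on these two slides can be illustrated with a toy model: a write that cannot reach a replica is remembered and replayed once the node is back. This is a simplified sketch, not Cassandra's implementation; the `Node` and `Coordinator` classes and the choice of replica tokens (67, 83, 0) are hypothetical.

```python
class Node:
    def __init__(self, token):
        self.token = token
        self.up = True
        self.data = {}


class Coordinator:
    def __init__(self):
        # Each hint is (target node, key, value), kept while the target is down.
        self.hints = []

    def write(self, replica_nodes, key, value):
        """Write to every live replica; store a hint for each one that is down."""
        acks = 0
        for node in replica_nodes:
            if node.up:
                node.data[key] = value
                acks += 1
            else:
                self.hints.append((node, key, value))
        return acks

    def replay_hints(self):
        """Deliver stored hints to nodes that are back; keep the rest."""
        remaining = []
        for node, key, value in self.hints:
            if node.up:
                node.data[key] = value
            else:
                remaining.append((node, key, value))
        self.hints = remaining


replica_nodes = [Node(67), Node(83), Node(0)]
replica_nodes[1].up = False            # node failure
coordinator = Coordinator()
acks = coordinator.write(replica_nodes, 59, "data")

replica_nodes[1].up = True             # node failure recovered
coordinator.replay_hints()
```

With two of three replicas alive the write still collects two acknowledgements, and the third copy is filled in after recovery.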
Immediate Consistency – A Better Way

Client Write: CL = QUORUM
Client Read: CL = QUORUM
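Why QUORUM on both sides gives immediate consistency can be checked with a little arithmetic: if the number of replicas written plus the number of replicas read exceeds RF, at least one replica is in both sets. A small sketch (the function names are illustrative only):

```python
def quorum(rf):
    """Majority of replicas for a given replication factor."""
    return rf // 2 + 1


def read_sees_latest_write(rf, write_replicas, read_replicas):
    """True when the read set and the write set must share a replica."""
    return write_replicas + read_replicas > rf


rf = 3
q = quorum(rf)
print(q, read_sees_latest_write(rf, q, q))  # prints 2 True
print(read_sees_latest_write(rf, 1, 1))     # prints False
```

With RF = 3, QUORUM is 2, so a write to 2 replicas and a read from 2 replicas always overlap on at least one node.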
Data Distributed Everywhere

• Geographic Distribution
• Hybrid-Cloud and Multi-Cloud

[Figure label: On-premise]
Understanding Use Cases

Scalability
  High Throughput, High Volume
  Heavy Writes, Heavy Reads
  Event Streaming, Internet of Things, Log Analytics, Other Time Series

Availability
  Mission-Critical, No Data Loss, Always-on
  Caching, Market Data, Pricing, Inventory

Distributed
  Global Presence, Workload Mobility
  Compliance / GDPR
  Banking, Retail, Tracking / Logistics, Customer Experience

Cloud-native
  Modern Cloud Applications
  API Layer, Enterprise Data Layer
  Hybrid-cloud, Multi-cloud
https://github.com/DataStax-Academy/Intro-to-Cassandra-for-Developers
Intro to Cassandra for Developers

1. Tables, Partitions

2. The Art of Data Modelling

3. What’s NEXT?
Data Structure: a Cell

An intersection of a row
and a column, stores data.
Data Structure: a Row

A single, structured
data item in a table.
Data Structure: a Partition

A group of rows having the same partition token, a base unit of access in Cassandra.

IMPORTANT: stored together, all the rows are guaranteed to be neighbors.

ID   First Name  Last Name  Department
1    John        Doe        Wizardry
399  Marisha     Chapez     Wizardry
415  Maximus     Flavius    Wizardry
Data Structure: a Table

A group of columns and rows storing partitions.

ID  First Name  Last Name  Department
1   John        Doe        Wizardry
2   Mary        Smith      Dark Magic
3   Patrick     McFadin    DevRel
Data Structure: Overall

● Tabular data model, with one twist
● Tables are organized in rows and columns
● Groups of related rows called partitions are stored together on the same node (or nodes)
● Each row contains a partition key
  ○ One or more columns that are hashed to determine which node(s) store that data

[Figure: a Keyspace containing a Table; columns run across, rows run down, and the rows are grouped into partitions x, y, z by their partition key.]
Example Data: Users organized by city

Keyspace killrvideo
Table users_by_city

City     Last Name  First Name  Address         Email
Phoenix  Hellson    Kevin       23 Jackson St.  [email protected]
Phoenix  Lastfall   Norda       3 Stone St      [email protected]
Phoenix  Smith      Jana        3 Stone St      [email protected]
Seattle  Franklin   George      2 Star St       [email protected]
Seattle  Jackson    Jane        2 Star St       [email protected]
Seattle  Jasons     Judy        2 Star St       [email protected]

Each city is one partition; each line is one row.
Partition key column: City
Clustering columns: Last Name, First Name, Email
Data columns: Address


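Creating a Table in CQL

CREATE TABLE killrvideo.users_by_city (
    city text,
    last_name text,
    first_name text,
    address text,
    email text,
    PRIMARY KEY ((city), last_name, first_name, email));

keyspace: killrvideo
table: users_by_city
column definitions: city, last_name, first_name, address, email
Primary key: ((city), last_name, first_name, email)
Partition key: city
Clustering columns: last_name, first_name, email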


Primary Key

An identifier for a row. Consists of at least one Partition Key and zero or more Clustering Columns.

MUST ENSURE UNIQUENESS.
MAY DEFINE SORTING.

CREATE TABLE killrvideo.users_by_city (
    city text,
    last_name text,
    first_name text,
    address text,
    email text,
    PRIMARY KEY ((city), last_name, first_name, email));

Partition key: city
Clustering columns: last_name, first_name, email

Good Examples:

PRIMARY KEY ((city), last_name, first_name, email);

PRIMARY KEY (user_id);

Bad Example:

PRIMARY KEY ((city), last_name, first_name);
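Why the bad example is bad: an insert with an existing primary key overwrites the existing row, so two different users with the same city, last name and first name collapse into one. A toy sketch with a Python dict standing in for the table; the sample users and e-mail addresses are made up for illustration.

```python
def insert(table, key_columns, row):
    """Store a row under its primary key; an existing key is overwritten."""
    key = tuple(row[column] for column in key_columns)
    table[key] = row


# Two different people who happen to share city, last name and first name.
users = [
    {"city": "Seattle", "last_name": "Smith", "first_name": "Jana",
     "email": "jana.one@example.com"},
    {"city": "Seattle", "last_name": "Smith", "first_name": "Jana",
     "email": "jana.two@example.com"},
]

bad_table = {}   # PRIMARY KEY ((city), last_name, first_name)
good_table = {}  # PRIMARY KEY ((city), last_name, first_name, email)
for user in users:
    insert(bad_table, ("city", "last_name", "first_name"), user)
    insert(good_table, ("city", "last_name", "first_name", "email"), user)

print(len(bad_table), len(good_table))  # prints 1 2
```

Adding email to the key keeps both rows, which is what MUST ENSURE UNIQUENESS asks for.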
Partition Key

An identifier for a partition. Consists of at least one column, may have more if needed.

PARTITIONS ROWS.

CREATE TABLE killrvideo.users_by_city (
    city text,
    last_name text,
    first_name text,
    address text,
    email text,
    PRIMARY KEY ((city), last_name, first_name, email));

Partition key: city
Clustering columns: last_name, first_name, email

Good Examples:

PRIMARY KEY (user_id);

PRIMARY KEY ((video_id), comment_id);

Bad Example:

PRIMARY KEY ((sensor_id), logged_at);
Clustering Column(s)

Used to ensure uniqueness and sorting order. Optional.

CREATE TABLE killrvideo.users_by_city (
    city text,
    last_name text,
    first_name text,
    address text,
    email text,
    PRIMARY KEY ((city), last_name, first_name, email));

Partition key: city
Clustering columns: last_name, first_name, email

Not Unique:
PRIMARY KEY ((city), last_name, first_name);
Unique:
PRIMARY KEY ((city), last_name, first_name, email);

Not Sorted:
PRIMARY KEY ((video_id), comment_id);
Sorted:
PRIMARY KEY ((video_id), created_at, comment_id);
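The "Not Sorted" case can be seen in a small sketch: rows inside a partition are ordered by the clustering columns, so clustering on comment_id alone orders by id, while putting created_at first orders by time. The comment ids and dates below are invented sample data.

```python
from datetime import datetime

# One partition (video v1); ids are deliberately not in time order.
comments = [
    {"video_id": "v1", "comment_id": "c1", "created_at": datetime(2021, 3, 1)},
    {"video_id": "v1", "comment_id": "c2", "created_at": datetime(2021, 2, 1)},
    {"video_id": "v1", "comment_id": "c3", "created_at": datetime(2021, 1, 1)},
]


def partition_rows(rows, clustering_columns):
    """Order the rows of one partition by the given clustering columns."""
    return sorted(rows, key=lambda row: tuple(row[c] for c in clustering_columns))


by_id = [r["comment_id"] for r in partition_rows(comments, ("comment_id",))]
by_time = [r["comment_id"]
           for r in partition_rows(comments, ("created_at", "comment_id"))]

print(by_id)    # prints ['c1', 'c2', 'c3']
print(by_time)  # prints ['c3', 'c2', 'c1']
```

Keeping comment_id as the last clustering column still guarantees uniqueness when two comments share a timestamp.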


The Slide of the Year Award!
Rules of a Good Partition
● Store together what you retrieve together
● Avoid big partitions
● Avoid hot partitions

Example: open a video? Get the comments in a single query!

PRIMARY KEY ((video_id), created_at, comment_id);

PRIMARY KEY ((comment_id), created_at);


The Slide of the Year Award!
Rules of a Good Partition
● Store together what you retrieve together
● Avoid big partitions
● Avoid hot partitions

PRIMARY KEY ((video_id), created_at, comment_id);

PRIMARY KEY ((country), user_id);

● Up to 2 billion cells per partition


● Up to ~100k rows in a partition
● Up to ~100MB in a Partition
The Slide of the Year Award!
Rules of a Good Partition
● Store together what you retrieve together
● Avoid big and constantly growing partitions!
● Avoid hot partitions

Example: a huge IoT infrastructure, hardware all over the world, different sensors reporting their state every 10 seconds. Every sensor reports its UUID, timestamp of the report, sensor’s value.

● Sensor ID: UUID
● Timestamp: Timestamp
● Value: float

PRIMARY KEY ((sensor_id), reported_at);


The Slide of the Year Award!
Rules of a Good Partition
● Store together what you retrieve together
● Avoid big and constantly growing partitions! (BUCKETING)
● Avoid hot partitions

Example: a huge IoT infrastructure, hardware all over the world, different sensors reporting their state every 10 seconds. Every sensor reports its UUID, timestamp of the report, sensor’s value.

● Sensor ID: UUID
● MonthYear: Integer or String
● Timestamp: Timestamp
● Value: float

PRIMARY KEY ((sensor_id), reported_at);

PRIMARY KEY ((sensor_id, month_year), reported_at);
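A back-of-the-envelope sketch of what bucketing buys in this example. The 10-second interval is from the slide; encoding month_year as an integer such as 202103 is one assumed option (the slide allows Integer or String), and the function names are illustrative.

```python
from datetime import datetime

REPORT_INTERVAL_SECONDS = 10  # from the slide: one report every 10 seconds


def rows_per_partition(days):
    """Rows one sensor writes into a partition that spans the given days."""
    return days * 24 * 60 * 60 // REPORT_INTERVAL_SECONDS


def partition_key(sensor_id, reported_at):
    """Composite partition key (sensor_id, month_year), e.g. ('s1', 202103)."""
    return (sensor_id, reported_at.year * 100 + reported_at.month)


print(rows_per_partition(365))  # prints 3153600
print(rows_per_partition(31))   # prints 267840
print(partition_key("s1", datetime(2021, 3, 15, 12, 0, 0)))  # prints ('s1', 202103)
```

Without a bucket the partition grows by about 3.15 million rows per sensor per year and never stops; with a monthly bucket each partition is capped at roughly 268 thousand rows. If a tighter limit is wanted, a smaller bucket (for example a day, 8640 rows) follows the same pattern.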


The Slide of the Year Award!
Rules of a Good Partition
● Store together what you retrieve together
● Avoid big partitions
● Avoid hot partitions

PRIMARY KEY (user_id);

PRIMARY KEY ((video_id), created_at, comment_id);

PRIMARY KEY ((country), user_id);


https://github.com/DataStax-Academy/Intro-to-Cassandra-for-Developers#2-create-a-table
Intro to Cassandra for Developers

1. Tables, Partitions

2. The Art of Data Modelling

3. What’s NEXT?
Normalization

“Database normalization is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by Edgar F. Codd as part of his relational model.”

PROS: Simple write, Data Integrity
CONS: Slow read, Complex Queries

Employees
userId  deptId  firstName  lastName
1       1       Edgar      Codd
2       1       Raymond    Boyce

Departments
departmentId  department
1             Engineering
2             Math

Denormalization

“Denormalization is a strategy used on a database to increase performance. In computing, denormalization is the process of trying to improve the read performance of a database, at the expense of losing some write performance, by adding redundant copies of data”

PROS: Quick Read, Simple Queries
CONS: Multiple Writes, Manual Integrity

Employees
userId  firstName  lastName  department
1       Edgar      Codd      Engineering
2       Raymond    Boyce     Engineering
3       Sage       Lahja     Math
4       Juniper    Jones     Botany
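The trade-off between the two slides can be sketched with plain dictionaries standing in for tables. The employee and department values are the ones from the slides; the function names are illustrative only.

```python
# Normalized: two tables, linked by deptId.
employees = {
    1: {"firstName": "Edgar", "lastName": "Codd", "deptId": 1},
    2: {"firstName": "Raymond", "lastName": "Boyce", "deptId": 1},
}
departments = {1: "Engineering", 2: "Math"}

# Denormalized: the department name is copied into every employee row.
employees_denormalized = {
    1: {"firstName": "Edgar", "lastName": "Codd", "department": "Engineering"},
    2: {"firstName": "Raymond", "lastName": "Boyce", "department": "Engineering"},
}


def read_normalized(user_id):
    """Two lookups (a join): the employee row, then its department."""
    employee = employees[user_id]
    return {
        "firstName": employee["firstName"],
        "lastName": employee["lastName"],
        "department": departments[employee["deptId"]],
    }


def read_denormalized(user_id):
    """One lookup: the row already carries everything the query needs."""
    return employees_denormalized[user_id]


def rename_department(old_name, new_name):
    """Multiple writes: every redundant copy has to be updated by hand."""
    updated = 0
    for row in employees_denormalized.values():
        if row["department"] == old_name:
            row["department"] = new_name
            updated += 1
    return updated


print(read_normalized(1) == read_denormalized(1))  # prints True
```

Reads get simpler in the denormalized layout, while a change such as renaming a department touches one row in the normalized layout and every copy in the denormalized one.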

Relational Data Modelling

1. Analyze raw data
2. Identify entities, their properties and relations
3. Design tables, using normalization and foreign keys.
4. Use JOIN when doing queries to join normalized data from multiple tables

Data → Models → Application
NoSQL Data Modelling

1. Analyze user behaviour (customer first!)
2. Identify workflows, their dependencies and needs
3. Define Queries to fulfill these workflows
4. Knowing the queries, design tables, using denormalization.
5. Use BATCH when inserting or updating denormalized data of multiple tables

Application → Models → Data
Designing Process: Step by Step
Entities & Relationships

Queries
Designing Process:
Conceptual Data Model
Designing Process:
Application Workflow

Use-Case I:
● A User opens a Profile

WF2: Find comments related to target user using its identifier, get most recent first

Use-Case II:
● A User opens a Video Page

WF1: Find comments related to target video using its identifier, most recent first
Designing Process:
Mapping

Query I: Find comments posted for a user with a known id (show most recent first) → comments_by_user

Query II: Find comments for a video with a known id (show most recent first) → comments_by_video
Designing Process:
Mapping

comments_by_user:
SELECT * FROM comments_by_user
WHERE userid = <some UUID>

comments_by_video:
SELECT * FROM comments_by_video
WHERE videoid = <some UUID>
Designing Process:
Logical Data Model

comments_by_user
userid        K
creationdate  C
commentid     C↑
videoid
comment

comments_by_video
videoid       K
creationdate  C
commentid     C↑
userid
comment
Designing Process:
Physical Data Model

comments_by_user
userid     UUID      K
commentid  TIMEUUID  C↑
videoid    UUID
comment    TEXT

comments_by_video
videoid    UUID      K
commentid  TIMEUUID  C
userid     UUID
comment    TEXT


Designing Process:
Schema DDL
CREATE TABLE IF NOT EXISTS comments_by_user (
userid uuid,
commentid timeuuid,
videoid uuid,
comment text,
PRIMARY KEY ((userid), commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);

CREATE TABLE IF NOT EXISTS comments_by_video (
videoid uuid,
commentid timeuuid,
userid uuid,
comment text,
PRIMARY KEY ((videoid), commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
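How the two tables serve the two queries can be sketched in memory: each comment is written to both tables (the denormalized write), and each query reads a single partition, newest first. This is only an illustration, not driver code; an increasing counter stands in for the timeuuid, and the user and video ids are made up.

```python
import itertools

# One dict per table: partition key -> rows in that partition.
comments_by_user = {}
comments_by_video = {}

# Stand-in for a timeuuid: later comments get larger ids.
_next_comment_id = itertools.count(1)


def add_comment(userid, videoid, comment):
    """Write the same comment to both tables, as a BATCH would."""
    row = {
        "commentid": next(_next_comment_id),
        "userid": userid,
        "videoid": videoid,
        "comment": comment,
    }
    comments_by_user.setdefault(userid, []).append(row)
    comments_by_video.setdefault(videoid, []).append(row)
    return row["commentid"]


def latest_first(table, partition_key):
    """Read one partition in CLUSTERING ORDER BY (commentid DESC)."""
    rows = table.get(partition_key, [])
    return sorted(rows, key=lambda row: row["commentid"], reverse=True)


add_comment("user-1", "video-1", "first")
add_comment("user-1", "video-2", "second")
add_comment("user-2", "video-1", "third")

print([r["comment"] for r in latest_first(comments_by_user, "user-1")])
print([r["comment"] for r in latest_first(comments_by_video, "video-1")])
```

Each read touches exactly one partition, which is why the tables are keyed by userid and videoid respectively.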
https://github.com/DataStax-Academy/Intro-to-Cassandra-for-Developers#3-execute-crud-operations
menti.com
Intro to Cassandra for Developers

1. Tables, Partitions

2. The Art of Data Modelling

3. What’s NEXT?
Homework
MORE LEARNING!!!!
Developer site: datastax.com/dev

● Developer Stories
● New hands-on learning scenarios with Katacoda
● Try it Out
● Cassandra Fundamentals
● https://www.datastax.com/learn/cassandra-fundamentals
● New Data Modeling course
https://www.datastax.com/dev/modeling

Classic courses available at DataStax Academy


✔ Academy.datastax.com

✔ datastax.com/dev

✔ community.datastax.com

✔ DataStax Developers YouTube Channel

Weekly Workshops https://www.datastax.com/workshops

Join our 10k Discord Community https://bit.ly/cassandra-workshop
The Fellowship of the RINGS

Thank you!
