DATASTAX C*OLLEGE CREDIT:
DATA MODELLING FOR APACHE CASSANDRA
Aaron Morton
Apache Cassandra Committer, DataStax MVP for Apache Cassandra
@aaronmorton
www.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
Agenda
* General Guidelines
* API Choice
* Example

General Guidelines
Cassandra is good at reading data from a row in the order it is stored.
Typically an efficient data model will denormalize data and use the storage engine order.
To create a good data model, understand the queries your application requires.
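A minimal Python sketch of "reading in stored order". It assumes a hypothetical in-memory `ToyRow` class, not anything in Cassandra itself. Columns are kept sorted when they are written, so reading a range forwards or backwards only returns a contiguous run and needs no sorting at read time.

```python
import bisect


class ToyRow:
    """Hypothetical stand-in for one Cassandra row (partition).

    Column names are kept sorted at write time, so a read can return a
    contiguous run of columns without sorting anything.
    """

    def __init__(self):
        self._names = []   # column names, always in sorted order
        self._values = {}  # column name -> value

    def write(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)  # pay for ordering on write
        self._values[name] = value            # an existing column is overwritten

    def slice(self, start, finish, reverse=False):
        """Return (name, value) pairs with start <= name <= finish, in stored order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        names = self._names[lo:hi]
        if reverse:
            names = names[::-1]  # reading backwards walks the same run the other way
        return [(n, self._values[n]) for n in names]


row = ToyRow()
for tweet_id in (3, 1, 2, 5, 4):
    row.write(tweet_id, f"tweet {tweet_id}")

print(row.slice(2, 4))                   # [(2, 'tweet 2'), (3, 'tweet 3'), (4, 'tweet 4')]
print(row.slice(1, 5, reverse=True)[0])  # (5, 'tweet 5')
```

The ordering cost is paid once on write. A data model that lines its queries up with that order gets cheap reads in return.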
API Choice
Multiple APIs?
Initially there was only a Thrift / RPC API, used by language-specific clients.
Cassandra Query Language (CQL) started as a higher-level, declarative alternative.
CQL 3 brings many changes. It is currently in beta in Cassandra v1.1.
CQL 3 uses a table-orientated, schema-driven data model.
(I said it had many changes.)
Example
Twitter Clone
Previously done with Thrift at WDCNZ:
"Hello @World #Cassandra - Apache Cassandra in action"
http://vimeo.com/49762233
This time the Twitter clone uses CQL 3 via the cqlsh tool:
bin/cqlsh -3
Queries?
* Post Tweet to Followers
* Get Tweet by ID
* List Tweets by User
* List Tweets in User Timeline
* List Followers

A keyspace is a namespace container.
Our Keyspace
CREATE KEYSPACE cass_college
WITH strategy_class = 'NetworkTopologyStrategy'
AND strategy_options:datacenter1 = 1;
This keeps one replica of each row in the data center named datacenter1.

A table is a sparse collection of well-known, ordered columns.
First Table
CREATE TABLE User
(
user_name text,
password text,
real_name text,
PRIMARY KEY (user_name)
);
Some users...
cqlsh:cass_college> INSERT INTO User
... (user_name, password, real_name)
... VALUES
... ('fred', 'sekr8t', 'Mr Foo');
cqlsh:cass_college> select * from User;
user_name | password | real_name
-----------+----------+-----------
fred | sekr8t | Mr Foo

Some users...
cqlsh:cass_college> INSERT INTO User
... (user_name, password)
... VALUES
... ('bob', 'pwd');
cqlsh:cass_college> select * from User where user_name = 'bob';
user_name | password | real_name
-----------+----------+-----------
bob | pwd | null
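A toy Python model of what "sparse" means here. The `insert`/`select` helpers and the dict layout are hypothetical and are not a Cassandra API. Each row stores only the columns that were written. A read fills any declared but unwritten column with `None`, which cqlsh displays as `null`. Like a CQL INSERT, `insert` behaves as an upsert.

```python
# Hypothetical toy model of the sparse User table.
SCHEMA = ("user_name", "password", "real_name")  # declared columns
PRIMARY_KEY = "user_name"

user_table = {}  # primary key value -> {column: value}, written columns only


def insert(table, **columns):
    key = columns[PRIMARY_KEY]
    # Upsert: merge the written columns into whatever the row already holds.
    table.setdefault(key, {}).update(columns)


def select(table, key):
    stored = table.get(key)
    if stored is None:
        return None  # no row at all for this key
    # Declared columns that were never written come back as None ("null").
    return {col: stored.get(col) for col in SCHEMA}


insert(user_table, user_name="fred", password="sekr8t", real_name="Mr Foo")
insert(user_table, user_name="bob", password="pwd")

print(select(user_table, "bob"))
# {'user_name': 'bob', 'password': 'pwd', 'real_name': None}
```

Bob's row never stores a `real_name` entry at all. The `null` exists only in the query result.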
Data Model (so far)

CF / Value | User
-----------+------------
user_name  | Primary Key
Tweet Table
CREATE TABLE Tweet
(
tweet_id bigint,
body text,
user_name text,
timestamp timestamp,
PRIMARY KEY (tweet_id)
);
Tweet Table...
cqlsh:cass_college> INSERT INTO Tweet
... (tweet_id, body, user_name, timestamp)
... VALUES
... (1, 'The Tweet','fred',1352150816917);
cqlsh:cass_college> select * from Tweet where tweet_id = 1;
tweet_id | body | timestamp | user_name
----------+-----------+--------------------------+-----------
1 | The Tweet | 2012-11-06 10:26:56+1300 | fred
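The timestamp was inserted as milliseconds since the Unix epoch, and cqlsh displayed it in the client's local time zone. The Python sketch below reproduces that conversion. The fixed +13:00 offset is an assumption taken from the `+1300` in the output.

```python
from datetime import datetime, timedelta, timezone

# The value inserted into the timestamp column: milliseconds since the Unix epoch.
inserted_ms = 1352150816917

# Assumed client zone: a fixed +13:00 offset, matching the "+1300" cqlsh printed.
nzdt = timezone(timedelta(hours=13))
as_datetime = datetime.fromtimestamp(inserted_ms / 1000, tz=nzdt)

# Format like the cqlsh output; the millisecond part is not printed.
print(as_datetime.strftime("%Y-%m-%d %H:%M:%S%z"))  # 2012-11-06 10:26:56+1300
```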

Data Model (so far)

CF / Value | User        | Tweet
-----------+-------------+------------
user_name  | Primary Key | Field
tweet_id   |             | Primary Key
UserTweets Table
CREATE TABLE UserTweets
(
tweet_id bigint,
user_name text,
body text,
timestamp timestamp,
PRIMARY KEY (user_name, tweet_id)
);
UserTweets Table...
cqlsh:cass_college> INSERT INTO UserTweets
... (tweet_id, body, user_name, timestamp)
... VALUES
... (1, 'The Tweet','fred',1352150816917);
cqlsh:cass_college> select * from UserTweets where user_name='fred';
user_name | tweet_id | body | timestamp
-----------+----------+-----------+--------------------------
fred | 1 | The Tweet | 2012-11-06 10:26:56+1300
UserTweets Table...
cqlsh:cass_college> select * from UserTweets where user_name='fred' and tweet_id=1;
user_name | tweet_id | body | timestamp
-----------+----------+-----------+--------------------------
fred | 1 | The Tweet | 2012-11-06 10:26:56+1300
UserTweets Table...
cqlsh:cass_college> INSERT INTO UserTweets
... (tweet_id, body, user_name, timestamp)
... VALUES
... (2, 'Second Tweet', 'fred', 1352150816918);
cqlsh:cass_college> select * from UserTweets where user_name = 'fred';
user_name | tweet_id | body | timestamp
-----------+----------+--------------+--------------------------
fred | 1 | The Tweet | 2012-11-06 10:26:56+1300
fred | 2 | Second Tweet | 2012-11-06 10:26:56+1300
UserTweets Table...
cqlsh:cass_college> select * from UserTweets where user_name = 'fred' order by tweet_id desc;
user_name | tweet_id | body | timestamp
-----------+----------+--------------+--------------------------
fred | 2 | Second Tweet | 2012-11-06 10:26:56+1300
fred | 1 | The Tweet | 2012-11-06 10:26:56+1300
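A toy Python model of `PRIMARY KEY (user_name, tweet_id)`. It assumes a simple dict-of-sorted-lists layout, which is hypothetical and for illustration only. The first key component selects the partition, and the second keeps rows sorted inside it. The three queries above then become a partition lookup, a binary search inside one partition, and a reversed read of one partition.

```python
import bisect
from collections import defaultdict

# Hypothetical model: user_name -> list of (tweet_id, body) sorted by tweet_id.
partitions = defaultdict(list)


def insert(user_name, tweet_id, body):
    rows = partitions[user_name]
    i = bisect.bisect_left(rows, (tweet_id,))  # first row with id >= tweet_id
    if i < len(rows) and rows[i][0] == tweet_id:
        rows[i] = (tweet_id, body)        # same primary key: overwrite (upsert)
    else:
        rows.insert(i, (tweet_id, body))  # new clustering key: keep rows sorted


def select(user_name, tweet_id=None, descending=False):
    rows = partitions.get(user_name, [])  # whole partition, already in order
    if tweet_id is not None:
        i = bisect.bisect_left(rows, (tweet_id,))
        rows = rows[i:i + 1] if i < len(rows) and rows[i][0] == tweet_id else []
    return rows[::-1] if descending else list(rows)


insert("fred", 2, "Second Tweet")  # written out of order on purpose
insert("fred", 1, "The Tweet")

print(select("fred"))                   # [(1, 'The Tweet'), (2, 'Second Tweet')]
print(select("fred", tweet_id=1))       # [(1, 'The Tweet')]
print(select("fred", descending=True))  # [(2, 'Second Tweet'), (1, 'The Tweet')]
```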
UserTimeline
CREATE TABLE UserTimeline
(
tweet_id bigint,
user_name text,
body text,
timestamp timestamp,
PRIMARY KEY (user_name, tweet_id)
);
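The "Post Tweet to Followers" query is where denormalization costs extra writes. Here is a rough Python sketch. It assumes plain dicts stand in for the tables and that a user's timeline holds tweets from the users they follow. It also assumes the follower list is already known; the deck stores that list in a Followers table later on. All names are hypothetical. Posting one tweet writes a copy to Tweet, to UserTweets, and to every follower's UserTimeline partition, then bumps the author's tweets counter.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the deck's tables.
tweet = {}                         # Tweet:        tweet_id -> row
user_tweets = defaultdict(dict)    # UserTweets:   user_name -> {tweet_id: row}
user_timeline = defaultdict(dict)  # UserTimeline: user_name -> {tweet_id: row}
tweet_count = defaultdict(int)     # UserMetrics:  user_name -> tweets counter
followers = {"fred": {"bob"}}      # assumed follower list: user_name -> followers


def post_tweet(tweet_id, user_name, body, timestamp_ms):
    """Write one tweet into every table a later read will need (fan-out on write)."""
    row = {"tweet_id": tweet_id, "user_name": user_name,
           "body": body, "timestamp": timestamp_ms}
    tweet[tweet_id] = dict(row)                   # serves "Get Tweet by ID"
    user_tweets[user_name][tweet_id] = dict(row)  # serves "List Tweets by User"
    for follower in followers.get(user_name, set()):
        # One copy per follower serves "List Tweets in User Timeline".
        user_timeline[follower][tweet_id] = dict(row)
    tweet_count[user_name] += 1                   # like: tweets = tweets + 1


def timeline(user_name):
    """Tweet bodies newest first, like ORDER BY tweet_id DESC on one partition."""
    rows = user_timeline.get(user_name, {})
    return [rows[t]["body"] for t in sorted(rows, reverse=True)]


post_tweet(1, "fred", "The Tweet", 1352150816917)
post_tweet(2, "fred", "Second Tweet", 1352150816918)
print(timeline("bob"))  # ['Second Tweet', 'The Tweet']
```

Each copy is stored separately (`dict(row)`), as each Cassandra table would hold its own copy. That way reading a timeline touches only one partition.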
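Data Model (so far)

CF / Value | User        | Tweet       | User Tweets           | User Timeline
-----------+-------------+-------------+-----------------------+----------------------
user_name  | Primary Key | Field       | Primary Key           | Primary Key
tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component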
UserMetrics Table
CREATE TABLE UserMetrics
(
user_name text,
tweets counter,
followers counter,
following counter,
PRIMARY KEY (user_name)
);
UserMetrics Table...
cqlsh:cass_college> UPDATE
... UserMetrics
... SET
... tweets = tweets + 1
... WHERE
... user_name = 'fred';
cqlsh:cass_college> select * from UserMetrics where user_name
= 'fred';
user_name | followers | following | tweets
-----------+-----------+-----------+--------
fred | null | null | 1
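Counter columns behave differently from ordinary ones. They can only be incremented or decremented, the UPDATE creates the row if it is missing, and a counter that was never touched reads as null. Below is a toy Python model of those rules; the names are hypothetical and not a driver API.

```python
from collections import defaultdict

COUNTER_COLUMNS = ("followers", "following", "tweets")  # UserMetrics counters

# Hypothetical toy: only counters that have been incremented are stored.
user_metrics = defaultdict(dict)  # user_name -> {counter column: value}


def increment(user_name, column, delta=1):
    """Mimics UPDATE UserMetrics SET column = column + delta WHERE user_name = ..."""
    if column not in COUNTER_COLUMNS:
        raise ValueError(f"{column} is not a counter column")
    row = user_metrics[user_name]               # the row is created on first update
    row[column] = row.get(column, 0) + delta    # only relative changes, no plain "set"


def select(user_name):
    row = user_metrics.get(user_name, {})
    # Counters never incremented come back as None (cqlsh shows null).
    return {col: row.get(col) for col in COUNTER_COLUMNS}


increment("fred", "tweets")
print(select("fred"))  # {'followers': None, 'following': None, 'tweets': 1}
```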
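Data Model (so far)

CF / Value | User        | Tweet       | User Tweets           | User Timeline         | User Metrics
-----------+-------------+-------------+-----------------------+-----------------------+-------------
user_name  | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key
tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component |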
Relationships
CREATE TABLE Followers
(
user_name text,
follower text,
timestamp timestamp,
PRIMARY KEY (user_name, follower)
);
CREATE TABLE Following
(
user_name text,
following text,
timestamp timestamp,
PRIMARY KEY (user_name, following)
);
Relationships
INSERT INTO
Following
(user_name, following, timestamp)
VALUES
('bob', 'fred', 1352247749161);
INSERT INTO
Followers
(user_name, follower, timestamp)
VALUES
('fred', 'bob', 1352247749161);
Relationships
cqlsh:cass_college> select * from Following;
user_name | following | timestamp
-----------+-----------+--------------------------
bob | fred | 2012-11-07 13:22:29+1300
cqlsh:cass_college> select * from Followers;
user_name | follower | timestamp
-----------+----------+--------------------------
fred | bob | 2012-11-07 13:22:29+1300
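Following and Followers store the same relationship twice, once for each query direction. Here is a minimal Python sketch that assumes dicts stand in for the two tables; the helper names are hypothetical. A single `follow` call performs both writes, and each list query is then a read of one partition. Names are returned sorted to mirror the clustering column (`following` / `follower`) in each table's primary key.

```python
from collections import defaultdict

# Hypothetical toy versions of the two tables:
# partition key user_name -> {other user: timestamp in ms}.
following = defaultdict(dict)     # Following: who user_name follows
followers_of = defaultdict(dict)  # Followers: who follows user_name


def follow(user_name, target, timestamp_ms):
    """One logical action, two denormalized writes (one per query direction)."""
    following[user_name][target] = timestamp_ms
    followers_of[target][user_name] = timestamp_ms


def list_following(user_name):
    return sorted(following.get(user_name, {}))  # clustering order: by name


def list_followers(user_name):
    return sorted(followers_of.get(user_name, {}))


follow("bob", "fred", 1352247749161)
print(list_following("bob"))   # ['fred']
print(list_followers("fred"))  # ['bob']
```

The design pays for two writes so that neither list ever needs a scan across partitions.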

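Data Model

CF / Value | User        | Tweet       | User Tweets           | User Timeline         | User Metrics | Follows     | Followers
-----------+-------------+-------------+-----------------------+-----------------------+--------------+-------------+----------
user_name  | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key  | Primary Key | Field
tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component |              |             |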
Thanks.
Aaron Morton
@aaronmorton
www.thelastpickle.com