
Cassandra

Cassandra is a distributed, decentralized, highly scalable NoSQL database. It is designed to manage large amounts of structured data across multiple nodes. Its peer-to-peer model has no single point of failure, and data is replicated across nodes for high availability and fault tolerance.

Uploaded by

Charnath G

APACHE CASSANDRA:-

->Cassandra is a distributed, decentralized and highly scalable NoSQL database.

Distributed:-
Capable of running on multiple machines.
Decentralized:-
*There is no master; every node performs the same functions.
*No single point of failure.
*Peer-to-peer connection.
*Identical nodes.
*High availability.
Scalable:-
*Nodes can be added or removed with little or no performance impact.
*No manual intervention or manual rebalancing is needed.

->Data is placed on different machines with a replication factor greater than one to attain high availability.
Reason behind its popularity:-

->Designed to manage very large amounts of structured data.


->Provides high availability with no single point of failure.

Features:-

->High Scalability
->Rigid Architecture
->Fast Linear-scale Performance
->Fault Tolerant

When to use Cassandra:-


->Large and growing data.
->Faster writes.
->High availability is important.
->Changing data.
->Strong consistency is not a strict requirement (eventual consistency is acceptable).

Cassandra Data Model:-

->Column - a name-value pair.

->Row - a container for columns, referenced by a primary key.

->Table - a container for rows.

In row1 we have col1, col2 and col3, but in row2 we have only col1 and col4.

As the picture above shows, if column2 and column3 have no values for a row (i.e. they are null), Cassandra simply does not store them for that row.
This saves a lot of space.
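The space saving can be pictured with a minimal Python sketch. It is only an in-memory illustration, not how Cassandra lays data out on disk; the row and column names follow the example above and the cell values are made up.

```python
# Each row stores only the columns that actually have a value.
table = {
    "row1": {"col1": "a", "col2": "b", "col3": "c"},
    "row2": {"col1": "d", "col4": "e"},
}

def stored_cells(rows):
    """Count the cells that are really present across all rows."""
    return sum(len(columns) for columns in rows.values())

def dense_cells(rows):
    """Count the cells a fixed-schema layout would need (nulls included)."""
    all_columns = set()
    for columns in rows.values():
        all_columns.update(columns)
    return len(rows) * len(all_columns)
```

Here the sparse layout holds 5 cells, while a fixed layout with all four columns in both rows would need 8.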

->Keyspace - a container for tables that spans one or more nodes.


->Cluster - a container for keyspaces.

CASSANDRA CONSTRAINTS:-

->No Joins.

->No Foreign Key.

->No Flexible Queries.


ARCHITECTURE:-

* Cassandra was designed to handle big data workloads across multiple nodes without a single
point of failure.

*It has a peer-to-peer distributed system across its nodes, and data is distributed among all the
nodes in a cluster.

DATA REPLICATION IN CASSANDRA:-

->In Cassandra, nodes in a cluster act as replicas for a given piece of data.
->If some of the nodes respond with an out-of-date value, Cassandra returns the most recent value to the client.
->After returning the most recent value, Cassandra performs a read repair in the background to update the stale values.
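The idea of read repair can be sketched in a few lines of Python. This is a simplified illustration under assumptions: each replica is a plain dict mapping a key to a (timestamp, value) pair, and the repair runs inline rather than in the background as it does in Cassandra. The function name is hypothetical.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and bring stale replicas up to date.

    replicas: list of dicts mapping key -> (timestamp, value).
    """
    responses = [replica[key] for replica in replicas if key in replica]
    if not responses:
        return None
    newest = max(responses, key=lambda cell: cell[0])  # highest timestamp wins
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # the "repair" step
    return newest[1]
```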

CASSANDRA QUERY LANGUAGE:-

->Cassandra Query Language (CQL) is used to access Cassandra through its nodes. CQL treats
the database (Keyspace) as a container of tables.

->Programmers work with CQL either through cqlsh, an interactive prompt, or through separate application language drivers.

->The client can contact any of the nodes for its read and write operations. That node (the coordinator) acts as a proxy between the client and the nodes holding the data.

REPLICATION:-
Replication is a must for high availability. It is controlled by two settings:
REPLICATION FACTOR + REPLICATION STRATEGY
REPLICATION FACTOR:-

Decides how many copies of the data are kept.

REPLICATION STRATEGY:-
Decides which nodes carry those copies.

SimpleStrategy is used with a single data center.


Let’s say the replication factor is 3: the primary node holds one copy and two other nodes hold the replicas.

NetworkTopologyStrategy is used with multiple data centers.

Suppose the client sends a write request to the coordinator. The coordinator finds the primary node for the data, as well as the nodes holding the replicas.
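Replica placement for SimpleStrategy can be sketched as a walk around the ring. This is a minimal sketch: the ring is assumed to be a plain list of node names in clockwise order, the node names are made up, and the function name is hypothetical.

```python
def simple_strategy_replicas(ring, primary, replication_factor):
    """Pick the primary node, then the next nodes clockwise on the ring.

    ring: node names in clockwise ring order.
    primary: the node that owns the data's token.
    """
    start = ring.index(primary)
    count = min(replication_factor, len(ring))  # cannot exceed the node count
    return [ring[(start + i) % len(ring)] for i in range(count)]
```

With a ring of A, B, C, D, E, a primary of D and a replication factor of 3, the copies land on D, E and A.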
CASSANDRA CONSISTENCY:-

Cassandra has tunable consistency.

->Consistency can be increased or decreased.


->There is a trade-off between consistency and performance.
->Consistency can be configured for reads and writes separately.

Write Consistency:-
Number of replicas on which a write must succeed before success is returned to the client.
Read Consistency:-
Number of replicas to check for consistency before data is returned to the client.
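A short Python sketch can make the trade-off concrete. ONE, QUORUM and ALL are Cassandra consistency levels. The helper names are hypothetical, and the rule used here (read replicas + write replicas > replication factor means a read sees the latest successful write) is the usual rule of thumb, not something stated in these notes.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # a majority of the replicas
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level in this sketch: {level}")

def reads_see_latest_write(read_level, write_level, replication_factor):
    """True when the read set and the write set must overlap (R + W > RF)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, QUORUM reads plus QUORUM writes overlap on at least one replica, while ONE plus ONE is faster but may return stale data.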

GOSSIP PROTOCOL:-

Nodes gossip with each other every second to share state information about themselves and about other nodes.

Gossiper class:-
The Gossiper initiates a gossip session with a random node in the cluster.
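One gossip exchange between two nodes can be sketched as a merge of their views. This is a heavily simplified sketch and not the real Gossiper: each node's view is assumed to be a dict mapping an endpoint name to a (version, status) pair, and the higher version wins.

```python
def gossip_exchange(state_a, state_b):
    """Merge two nodes' views in place; the higher version wins per endpoint."""
    merged = dict(state_a)
    for endpoint, (version, status) in state_b.items():
        if endpoint not in merged or version > merged[endpoint][0]:
            merged[endpoint] = (version, status)
    # After the exchange both nodes hold the same, freshest view.
    state_a.clear()
    state_a.update(merged)
    state_b.clear()
    state_b.update(merged)
```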

How write happens:-


CASSANDRA STORAGE & SSTABLE:-

How read happens:-

COMPONENTS OF CASSANDRA:-

Node:
A Cassandra node is a place where data is stored.

Data center:
Data center is a collection of related nodes.
Cluster:
A cluster is a component which contains one or more data centers.
Commit log:
In Cassandra, the commit log is a crash-recovery mechanism. Every write operation is
written to the commit log.
Mem-table:
A mem-table is a memory-resident data structure. After the commit log, the data is written to the mem-table. Sometimes, for a single column family, there will be multiple mem-tables.
SSTable:
It is a disk file to which the data is flushed from the mem-table when its contents reach a threshold value.
Bloom filter:
Bloom filters are quick, probabilistic algorithms for testing whether an element is a member of a set. A Bloom filter can report a false positive but never a false negative. It acts as a special kind of cache: on a read, Cassandra checks the Bloom filter of an SSTable to decide whether that SSTable needs to be read at all.
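The way these components work together can be sketched in Python. This is a toy model under assumptions, not Cassandra's implementation: the class names, the flush threshold of 2 rows, the 64-bit filter and the SHA-256 based hashing are all illustrative choices.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""
    def __init__(self, size=64, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))

class TinyNode:
    def __init__(self, flush_threshold=2):
        self.commit_log = []   # crash-recovery log
        self.memtable = {}     # in-memory writes
        self.sstables = []     # (bloom filter, data) pairs, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. commit log first
        self.memtable[key] = value            # 2. then the mem-table
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                      # 3. flush at the threshold

    def flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        self.sstables.append((bloom, dict(self.memtable)))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for bloom, data in reversed(self.sstables):  # newest SSTable first
            if bloom.might_contain(key) and key in data:
                return data[key]
        return None
```

The Bloom filter check lets a read skip any SSTable that definitely does not hold the key.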

Cassandra Automatic Data Expiration


Cassandra provides functionality by which data can be expired automatically.
During data insertion, you specify a 'ttl' (time to live) value in seconds, using the USING TTL clause. After that amount of time, the data is automatically removed.
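The behaviour of a TTL can be sketched with a small in-memory store. This is only an illustration of the expiry rule: the class name is hypothetical, and the clock is passed in as a function so the example does not depend on real time.

```python
class TtlStore:
    def __init__(self, clock):
        self.clock = clock  # callable returning the current time in seconds
        self.cells = {}

    def insert(self, key, value, ttl=None):
        """Store a value; with a ttl it expires ttl seconds from now."""
        expires_at = None if ttl is None else self.clock() + ttl
        self.cells[key] = (value, expires_at)

    def get(self, key):
        cell = self.cells.get(key)
        if cell is None:
            return None
        value, expires_at = cell
        if expires_at is not None and self.clock() >= expires_at:
            del self.cells[key]  # expired data is no longer returned
            return None
        return value
```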

Cassandra CQLsh:-

->cqlsh stands for Cassandra Query Language shell. It is the command-line tool in which you run CQL commands.
->After installation, Cassandra provides the cqlsh prompt, which lets users communicate with Cassandra.
These are the cql commands available in Cassandra:-

CASSANDRA KEYSPACE:-

Cassandra Create Keyspace:-

->Cassandra Query Language (CQL) lets developers communicate with Cassandra.


->The syntax of Cassandra query language is very similar to SQL.

KEYSPACE:-

Keyspaces consist of core objects called column families (which are like tables in RDBMS)

SYNTAX:-

CREATE KEYSPACE <identifier> WITH <properties>;

OR
CREATE KEYSPACE keyspace_name WITH replication = {'class': '<strategy name>', 'replication_factor': <number of replicas on different nodes>};

Different components of Cassandra Keyspace:-


Strategy:-
There are two types of strategy declaration in Cassandra syntax:
Simple Strategy:
Simple strategy is used in the case of one data center. In this strategy, the first replica is placed on the selected node and the remaining replicas are placed on the next nodes in clockwise direction in the ring, without considering rack or node location.
Network Topology Strategy:
This strategy is used in the case of more than one data center. In this strategy, you have to provide the replication factor for each data center separately.
Replication Factor:
Replication factor is the number of replicas of the data placed on different nodes. A replication factor greater than one is needed to avoid a single point of failure; 3 is a commonly used value.
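A small Python helper can show how the two strategies look in an actual CREATE KEYSPACE statement. The helper and its parameter names are hypothetical, the data center names dc1 and dc2 are made up, and the keyspace name chan is the one used in these notes. The helper only builds the statement text; it does not connect to Cassandra.

```python
def _fmt(value):
    """Quote strings the CQL way; leave numbers as they are."""
    return f"'{value}'" if isinstance(value, str) else str(value)

def create_keyspace_cql(name, strategy, **options):
    """Build a CREATE KEYSPACE statement for the given replication options."""
    replication = {"class": strategy, **options}
    body = ", ".join(f"'{k}': {_fmt(v)}" for k, v in replication.items())
    return f"CREATE KEYSPACE {name} WITH replication = {{{body}}};"

# One data center: a single replication_factor.
simple = create_keyspace_cql("chan", "SimpleStrategy", replication_factor=3)

# Several data centers: one replication factor per data center.
multi_dc = create_keyspace_cql("chan", "NetworkTopologyStrategy", dc1=3, dc2=2)
```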
EXAMPLE:-

The keyspace called ‘chan’ is created now.

Let’s verify that the keyspace is there or not:-

VERIFICATION:-

->To check whether the keyspace has been created, use the "DESCRIBE" command.

->Using this command we can see all the keyspaces that are created.
There is another property of CREATE KEYSPACE in Cassandra.

Durable_writes
By default, the durable_writes property of a keyspace is set to true; you can also set this property to false. However, it should not be set to false on a keyspace that uses SimpleStrategy.
Ex:-

VERIFICATION:-

Using a Keyspace:-
To use the created keyspace, you have to use the USE command.

SYNTAX:-

USE <identifier>

Ex:-
ALTER KEYSPACE:-

The "ALTER keyspace" command is used to alter the replication factor, strategy name and
durable writes properties in created keyspace in Cassandra.

SYNTAX:-

ALTER KEYSPACE <identifier> WITH <properties>;

Old keyspace:-

New Keyspace:-

DROP KEYSPACE:-
In Cassandra, the "DROP KEYSPACE" command is used to drop a keyspace together with all its data, column families, user-defined types and indexes.
Cassandra takes a snapshot of the keyspace before dropping it. If the keyspace does not exist, Cassandra returns an error unless IF EXISTS is used.
SYNTAX:-

DROP KEYSPACE KeyspaceName;

CASSANDRA TABLE INDEX:-

CREATE TABLE:-
In Cassandra, CREATE TABLE command is used to create a table. Here, column family is
used to store data just like table in RDBMS.
So, you can say that CREATE TABLE command is used to create a column family in Cassandra.
EXAMPLE:-

There are two types of primary key:-
->Single primary key: PRIMARY KEY(ColumnName)
->Compound primary key: PRIMARY KEY(ColumnName1, ColumnName2, ...)
Now read the table using SELECT command as follows:-

ALTER TABLE:-
The ALTER TABLE command is used to alter a table after creating it. You can use the ALTER command to perform two types of operations:
*Add a column
*Drop a column

ADD A COLUMN:-
We can add a column to the table by using the ALTER command. While adding a column, you have to make sure that the column name does not conflict with the existing column names and that the table is not defined with the compact storage option.

SYNTAX:-

ALTER TABLE table_name
ADD column_name datatype;

DROP A COLUMN:-
->You can also drop an existing column from a table by using ALTER command.
->You should check that the table is not defined with compact storage option before dropping a
column from a table.

SYNTAX:-

ALTER TABLE table_name
DROP column_name;
You can also drop several columns at a time, like this:
ALTER TABLE table_name
DROP (col1, col2, ...);
EXAMPLE:-

TRUNCATE TABLE:-

TRUNCATE command is used to truncate a table. If you truncate a table, all the rows of the
table are deleted permanently.

SYNTAX:-
TRUNCATE table_name

EXAMPLE:-

CREATE INDEX:-

The CREATE INDEX command is used to create an index on the column specified by the user.
->If data already exists for the column you choose to index, Cassandra indexes that data while the 'create index' statement executes.

SYNTAX:-

CREATE INDEX <identifier> ON <tablename> (<columnname>);


EXAMPLE:-

You can also use the same command to verify whether that particular index has been created.

DROP INDEX:-

DROP INDEX command is used to drop a specified index. If the index name was not specified
during index creation, then index name is TableName_ColumnName_idx.

SYNTAX:-

DROP INDEX <identifier> or DROP INDEX IF EXISTS keyspacename.indexname

EXAMPLE:-

BATCH:-
In Cassandra, BATCH is used to execute multiple modification statements (insert, update, delete) together as a single operation. It is very useful when you have to update some columns as well as delete some existing data.

SYNTAX:

BEGIN BATCH
<insert-stmnt> / <update-stmnt> / <delete-stmnt>
APPLY BATCH;
EXAMPLE:-

CASSANDRA QUERY LANGUAGE(CQL)

CRUD OPERATIONS:-

CREATE DATA:-

INSERT command is used to insert data into the columns of the table.

EXAMPLE:-
READ DATA:-

SELECT command is used to read data from Cassandra table. You can use this command to
read a whole table, a single column, a particular cell etc.

SYNTAX:-

SELECT <column names or *> FROM tablename;

EXAMPLE:-

UPDATE DATA:-
The UPDATE command is used to update data in a Cassandra table. If you see no result after updating the data, the data was updated successfully; otherwise an error is returned.
While updating data in a Cassandra table, the following keywords are commonly used:
*Where: The WHERE clause is used to select the row that you want to update.
*Set: The SET clause is used to set the value.
SYNTAX:-

UPDATE table_name
SET column_name = new_value, ...
WHERE condition;
DELETE DATA:-

The DELETE command is used to delete data from a Cassandra table. You can delete a selected row, or selected columns of a row, by using this command.

SYNTAX:-

DELETE FROM <identifier> where <condition>;

Let’s delete an entire row:-

CASSANDRA COLLECTIONS:-

Cassandra collections are used to store multiple elements in a single column.
There are three types of collection supported by Cassandra:
*Set
*List
*Map
SET COLLECTIONS:-
A set collection stores a group of unique elements, which are returned in sorted order when queried.

EXAMPLE:-

The table will be created like this:-

INSERT VALUES in the table:-

Let’s read the table:


LIST COLLECTION:-

The list collection is used when the order of elements matters.

EXAMPLE:-

MAP COLLECTION:-
-> The map collection is used to store key value pairs.
->It maps one thing to another.

For example, if you want to save course name with its prerequisite course name, you can use
map collection.

EXAMPLE:-

Let’s add data in the table;


CASSANDRA KEYS:-

PRIMARY KEY = PARTITION KEY + CLUSTERING KEY

PARTITION KEY: It decides how data is distributed across nodes.
CLUSTERING KEY: It decides how data is stored on a single node.

How data is distributed:-

Every node is assigned a unique token or range of tokens, which determines which row goes to which node.

How tokens are generated:-


A hashing function called the partitioner is used to generate token values.

CLUSTERING KEYS determine how data is stored within a single node.
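The roles of the two key parts can be sketched in Python. This is a simplified model under assumptions: the ring is a sorted list of (token, node) pairs, each node owns the tokens up to and including its own, MD5 stands in for the partitioner's hash function, and the ring size of 1000 and all function names are made up for illustration.

```python
import bisect
import hashlib

def token_for(partition_key, ring_size=1000):
    """Hash the partition key to a token (MD5 stands in for the partitioner)."""
    digest = hashlib.md5(str(partition_key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % ring_size

def owner_of_token(ring, token):
    """ring: sorted (token, node) pairs. Past the last token, wrap to the first node."""
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token) % len(ring)
    return ring[index][1]

def owner_of_key(ring, partition_key):
    """The partition key decides which node stores the row."""
    return owner_of_token(ring, token_for(partition_key))

def insert_clustered(partition, clustering_key, value):
    """The clustering key keeps rows sorted inside one partition."""
    bisect.insort(partition, (clustering_key, value))
```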

Why cassandra is wide column store:-


TUPLES:-
A tuple gives you the ability to store, for example, an address, but it can be a bit awkward to try to remember the positional values of the various fields of a tuple without having a name associated with each value. There is also no way to update individual fields of a tuple; the entire tuple must be updated.

cqlsh:my_keyspace> ALTER TABLE user ADD
address tuple<text, text, text, int>;

Then you could populate an address using the following statement:


cqlsh:my_keyspace> UPDATE user SET address =
('7712 E. Broadway', 'Tucson', 'AZ', 85715 )
WHERE first_name = 'Mary' AND last_name = 'Rodriguez';

UDT:-(USER DEFINED DATA TYPES)

Cassandra gives you a way to define your own types to extend its data model. These user-defined types (UDTs) are easier to use than tuples since you can specify the values by name rather than position. Create your own address type:

cqlsh:my_keyspace> CREATE TYPE address (
street text,
city text,
state text,
zip_code int);
cqlsh:my_keyspace> ALTER TABLE user ADD
addresses map<text, address>;
InvalidRequest: code=2200 [Invalid query] message="Non-frozen
collections are not allowed inside collections: map<text,
address>"

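As the error shows, a user-defined type nested inside a collection must be frozen, so wrap it in frozen<>:

cqlsh:my_keyspace> ALTER TABLE user ADD addresses map<text,
frozen<address>>;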

Now let’s add a home address :-

cqlsh:my_keyspace> UPDATE user SET addresses = addresses +
{'home': { street: '7712 E. Broadway', city: 'Tucson',
state: 'AZ', zip_code: 85715 } }
WHERE first_name = 'Mary' AND last_name = 'Rodriguez';

cqlsh:my_keyspace> SELECT addresses FROM user
WHERE first_name = 'Mary' AND last_name = 'Rodriguez';


addresses
---------------------------------------------------------
{'home': {street: '7712 E. Broadway',
city: 'Tucson', state: 'AZ', zip_code: 85715}}
(1 rows)

SECONDARY INDEXES:-
If you try to query on a column in a Cassandra table that is not part of the primary
key, you’ll soon realize that this is not allowed. For example, consider this
table, which uses the id as the primary key. Attempting to query by the stud_name
results in the following output:

As the error message instructs, you could override Cassandra’s default behavior in
order to force it to query based on this column using the ALLOW FILTERING keyword.

Drop the index and use Allow Filtering.
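The difference between filtering and using an index can be pictured with a small Python sketch. It is only an analogy, not how Cassandra implements either feature: the table is a dict from id to stud_name (the column names come from the example above), and the sample rows and function names are made up.

```python
students = {1: "asha", 2: "ravi", 3: "asha"}  # id -> stud_name

def query_with_filtering(rows, stud_name):
    """Like ALLOW FILTERING: every row is examined to find the matches."""
    examined = 0
    matches = []
    for row_id, name in rows.items():
        examined += 1
        if name == stud_name:
            matches.append(row_id)
    return matches, examined

def build_index(rows):
    """Like a secondary index: stud_name -> ids, built once ahead of queries."""
    index = {}
    for row_id, name in rows.items():
        index.setdefault(name, []).append(row_id)
    return index
```

Filtering touches all 3 rows to return 2 of them, while the index goes straight to the matching ids. This is why filtering gets expensive on large tables.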

JSON:-

->JSON stands for JavaScript Object Notation.

Query the data without using JSON


Now Query the data using JSON:-

As you can see below, the JSON entry has been inserted into the Cassandra database:-
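An INSERT ... JSON statement can be built from a Python dictionary as sketched below. The helper name, the table name student and the sample row are hypothetical. The helper only produces the statement text and does not connect to Cassandra.

```python
import json

def insert_json_cql(table, row):
    """Build an INSERT ... JSON statement from a dict of column values."""
    payload = json.dumps(row).replace("'", "''")  # escape single quotes for CQL
    return f"INSERT INTO {table} JSON '{payload}';"

statement = insert_json_cql("student", {"id": 1, "stud_name": "asha"})
```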
