Cassandra
Distributed:-
Capable of running on multiple machines.
Decentralized:-
*There is no master; every node performs the same function.
*No single point of failure.
*Peer-to-peer connection.
*Identical nodes.
*High availability.
Scalable:-
*Nodes can be added or removed with little or no performance impact.
*No manual intervention or manual rebalancing is required.
->Data is placed on different machines with a replication factor greater than one to attain high
availability.
Reason behind its popularity:-
Features:-
->High Scalability
->Rigid Architecture
->Fast Linear-scale Performance
->Fault Tolerant
CASSANDRA CONSTRAINTS:-
->No Joins.
*Cassandra was designed to handle big data workloads across multiple nodes without a single
point of failure.
*It has a peer-to-peer distributed system across its nodes, and data is distributed among all the
nodes in a cluster.
->In Cassandra, nodes in a cluster act as replicas for a given piece of data. If some of the nodes
respond with an out-of-date value, Cassandra returns the most recent value to the client.
->After returning the most recent value, Cassandra performs a read repair in the background to
update the stale values.
CASSANDRA QUERY LANGUAGE:-
->Cassandra Query Language (CQL) is used to access Cassandra through its nodes. CQL treats
the database (keyspace) as a container of tables.
->Programmers work with CQL either through cqlsh, an interactive prompt, or through separate
application language drivers.
->The client can approach any of the nodes for its read-write operations. That node
(the coordinator) acts as a proxy between the client and the nodes holding the data.
REPLICATION:-
Replication is a must for high availability.
REPLICATION FACTOR + REPLICATION STRATEGY
REPLICATION FACTOR:-
Decides how many copies of each piece of data are kept in the cluster.
REPLICATION STRATEGY:-
Decides which nodes will carry those copies.
Suppose the client sends a write request to the coordinator. The coordinator finds the primary node
for the data, along with the nodes that hold its replicas.
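The factor and the strategy are both chosen when a keyspace is created. A minimal sketch, assuming a hypothetical keyspace school_multi_dc and two data centers named dc1 and dc2 (the names and replica counts are illustrative and do not come from these notes):

```cql
-- NetworkTopologyStrategy takes one replication factor per data center.
-- 'dc1' and 'dc2' are assumed names; they must match the data center
-- names that the cluster actually reports.
CREATE KEYSPACE IF NOT EXISTS school_multi_dc
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,
    'dc2': 2
  };
```

With these settings every row would have three copies in dc1 and two in dc2.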
CASSANDRA CONSISTENCY:-
Write Consistency:-
Number of replicas on which write must succeed before returning success to client.
Read Consistency:-
Number of replicas to check for consistency before returning data to client.
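In cqlsh, the level used for the requests that follow can be set with the CONSISTENCY command. A small sketch, assuming a keyspace with a replication factor of 3 and a hypothetical table school.student:

```cql
-- Show the consistency level currently used by this cqlsh session.
CONSISTENCY;

-- Require a majority of replicas to respond. With a replication
-- factor of 3, QUORUM means 2 replicas.
CONSISTENCY QUORUM;

-- This read now waits for 2 of the 3 replicas before returning.
-- school.student is an assumed table, not one defined in these notes.
SELECT * FROM school.student WHERE id = 1;
```

When the read level and the write level together cover more replicas than the replication factor (for example QUORUM for both with a factor of 3), every read overlaps at least one replica that received the latest write.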
GOSSIP PROTOCOL:-
Nodes gossip with each other every second to share state information about themselves and the
other nodes they know about.
Gossiper class:-
The Gossiper initiates a gossip session with a random node in the cluster.
COMPONENTS OF CASSANDRA:-
Node:
A Cassandra node is a place where data is stored.
Data center:
Data center is a collection of related nodes.
Cluster:
A cluster is a component which contains one or more data centers.
Commit log:
In Cassandra, the commit log is a crash-recovery mechanism. Every write operation is
written to the commit log.
Mem-table:
A mem-table is a memory-resident data structure. After the commit log, the data is
written to the mem-table. Sometimes, for a single column family, there will be multiple
mem-tables.
SSTable:
It is a disk file to which the data is flushed from the mem-table when its contents reach a
threshold value.
Bloom filter:
A Bloom filter is a quick, probabilistic structure for testing whether an element is a member
of a set. It can report false positives but never false negatives. Cassandra consults Bloom
filters on reads to skip SSTables that cannot contain the requested data.
Cassandra CQLsh:-
->CQLsh stands for Cassandra CQL shell. It is the command-line prompt in which CQL commands
are run.
->After installation, Cassandra provides the Cassandra Query Language shell (cqlsh), which
lets users communicate with the database.
The main CQL commands are covered in the sections that follow:-
CASSANDRA KEYSPACE:-
KEYSPACE:-
Keyspaces consist of core objects called column families (which are like tables in an RDBMS).
SYNTAX:-
CREATE KEYSPACE <identifier> WITH <properties>
OR
CREATE KEYSPACE keyspace_name WITH replication = {'class': 'strategy_name',
'replication_factor': number_of_replicas};
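As a concrete instance of the syntax above, a minimal sketch that assumes a hypothetical keyspace named school on a single data center:

```cql
-- SimpleStrategy places replicas without regard to data centers,
-- so it suits a single-data-center or test cluster.
-- The keyspace name and the factor of 3 are illustrative choices.
CREATE KEYSPACE IF NOT EXISTS school
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
```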
VERIFICATION:-
->To check whether the keyspace was created, use the DESCRIBE command.
->DESCRIBE KEYSPACES lists all the keyspaces that have been created.
There is another property of CREATE KEYSPACE in Cassandra.
Durable_writes
By default, the durable_writes property of a keyspace is set to true; you can also set it to
false. However, durable writes should not be disabled on a keyspace that uses SimpleStrategy.
Ex:-
VERIFICATION:-
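The original example is missing here. One possible sketch, assuming a hypothetical keyspace school_fast and a data center called datacenter1 (the usual name on a stock single-node install, but still an assumption):

```cql
-- durable_writes = false skips the commit log for this keyspace,
-- so unflushed writes can be lost if a node crashes.
CREATE KEYSPACE IF NOT EXISTS school_fast
  WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 1}
  AND durable_writes = false;

-- Verification: the schema tables (Cassandra 3.0 and later) record
-- the property for every keyspace.
SELECT keyspace_name, durable_writes FROM system_schema.keyspaces;
```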
Using a Keyspace:-
To use the created keyspace, you have to use the USE command.
SYNTAX:-
USE <identifier>
Ex:-
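For instance, assuming the hypothetical keyspace school already exists:

```cql
-- Make school the default keyspace for this session, so tables
-- can be named without the "school." prefix afterwards.
USE school;
```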
ALTER KEYSPACE:-
The ALTER KEYSPACE command is used to alter the replication factor, strategy name and
durable_writes property of an existing keyspace in Cassandra.
SYNTAX:-
Old keyspace:-
New Keyspace:-
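The before-and-after listings are missing. A sketch of what they could look like, assuming the hypothetical keyspace school:

```cql
-- Old keyspace: created with a single replica.
CREATE KEYSPACE IF NOT EXISTS school
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- New keyspace: same name, replication factor raised to 3.
ALTER KEYSPACE school
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- Print the keyspace definition to confirm the change.
DESCRIBE KEYSPACE school;
```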
DROP KEYSPACE:-
In Cassandra, the DROP KEYSPACE command is used to drop a keyspace along with all of its data,
column families, user-defined types and indexes.
By default, Cassandra takes a snapshot of the keyspace before dropping it. If the keyspace does
not exist, Cassandra returns an error unless IF EXISTS is used.
SYNTAX:-
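A minimal sketch, again assuming a hypothetical keyspace named school:

```cql
-- IF EXISTS turns the statement into a no-op, rather than an error,
-- when the keyspace is not there.
DROP KEYSPACE IF EXISTS school;
```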
CREATE TABLE:-
In Cassandra, the CREATE TABLE command is used to create a table. A column family stores
data just like a table in an RDBMS, so CREATE TABLE can be said to create a column family.
EXAMPLE:-
There are two types of primary key:-
->Single primary key: PRIMARY KEY(ColumnName)
->Compound primary key: PRIMARY KEY(ColumnName1, ColumnName2, ...)
Now read the table using SELECT command as follows:-
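A sketch covering both kinds of key and the read. The keyspace school, the tables student and enrollment, and all column names other than id and stud_name are hypothetical:

```cql
-- Single primary key: id alone identifies a row.
CREATE TABLE IF NOT EXISTS school.student (
  id int PRIMARY KEY,
  stud_name text,
  stud_city text
);

-- Compound primary key: stud_id is the partition key and course is
-- a clustering column, so one student can have many course rows.
CREATE TABLE IF NOT EXISTS school.enrollment (
  stud_id int,
  course text,
  grade text,
  PRIMARY KEY (stud_id, course)
);

-- Read the (still empty) table back.
SELECT * FROM school.student;
```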
ALTER TABLE:-
ALTER TABLE command is used to alter the table after creating it. You can use the ALTER
command to perform two types of operations:
Add a column
Drop a column
ADD A COLUMN:-
We can add a column to the table by using the ALTER command. While adding a column, you
have to make sure that the column name does not conflict with the existing column names and that the
table is not defined with the compact storage option.
SYNTAX:-
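For example, assuming the hypothetical table school.student and a new column stud_email:

```cql
-- Add a text column; existing rows simply have no value for it.
ALTER TABLE school.student ADD stud_email text;
```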
DROP A COLUMN:-
->You can also drop an existing column from a table by using ALTER command.
->You should check that the table is not defined with compact storage option before dropping a
column from a table.
SYNTAX:-
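A matching sketch for the drop, with the same assumed table and column names:

```cql
-- Remove the column and its data from the table definition.
-- Columns that are part of the primary key cannot be dropped.
ALTER TABLE school.student DROP stud_email;
```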
TRUNCATE TABLE:-
TRUNCATE command is used to truncate a table. If you truncate a table, all the rows of the
table are deleted permanently.
SYNTAX:-
TRUNCATE table_name
EXAMPLE:-
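A minimal sketch, assuming the hypothetical table school.student:

```cql
-- Remove every row but keep the table definition.
TRUNCATE school.student;

-- The table still exists and now returns no rows.
SELECT * FROM school.student;
```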
CREATE INDEX:-
CREATE INDEX command is used to create an index on the column specified by the user.
-> If the data already exists for the column which you choose to index, Cassandra creates indexes
on the data during the 'create index' statement execution.
SYNTAX:-
You can also use the DESCRIBE command to verify whether that particular index was created.
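One way this could look, assuming the hypothetical table school.student and an index name chosen for illustration:

```cql
-- Index the non-key column stud_name. Existing rows are indexed
-- as part of this statement.
CREATE INDEX IF NOT EXISTS stud_name_idx ON school.student (stud_name);

-- The table description printed by cqlsh includes its indexes.
DESCRIBE TABLE school.student;
```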
DROP INDEX:-
DROP INDEX command is used to drop a specified index. If the index name was not specified
during index creation, then index name is TableName_ColumnName_idx.
SYNTAX:-
EXAMPLE:-
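A sketch of both cases, assuming the hypothetical table school.student, one index that was named explicitly and one that was left to the default naming rule:

```cql
-- Index that was created with an explicit name.
DROP INDEX IF EXISTS school.stud_name_idx;

-- Index that was created without a name on student.stud_city;
-- it received the default name TableName_ColumnName_idx.
DROP INDEX IF EXISTS school.student_stud_city_idx;
```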
BATCH:-
In Cassandra, BATCH is used to execute multiple modification statements (insert, update, delete)
as a single operation. It is very useful when you have to update some columns as well as delete
some of the existing data.
SYNTAX:
BEGIN BATCH
<insert-stmnt> / <update-stmnt> / <delete-stmnt>
APPLY BATCH;
EXAMPLE:-
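A sketch that mixes the three statement types, assuming the hypothetical table school.student (id, stud_name, stud_city) and made-up row values:

```cql
BEGIN BATCH
  -- Add a new row.
  INSERT INTO school.student (id, stud_name, stud_city) VALUES (4, 'Dev', 'Pune');
  -- Change one column of an existing row.
  UPDATE school.student SET stud_city = 'Delhi' WHERE id = 2;
  -- Delete a single column value from another row.
  DELETE stud_city FROM school.student WHERE id = 3;
APPLY BATCH;
```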
CRUD OPERATIONS:-
CREATE DATA:-
INSERT command is used to insert data into the columns of the table.
EXAMPLE:-
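For instance, assuming the hypothetical table school.student (id, stud_name, stud_city) and invented sample values:

```cql
-- Each INSERT names the columns it fills; id is the primary key.
INSERT INTO school.student (id, stud_name, stud_city) VALUES (1, 'Asha', 'Mumbai');
INSERT INTO school.student (id, stud_name, stud_city) VALUES (2, 'Ravi', 'Chennai');
```

An INSERT with a primary key that already exists overwrites the stored columns instead of failing.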
READ DATA:-
SELECT command is used to read data from Cassandra table. You can use this command to
read a whole table, a single column, a particular cell etc.
SYNTAX:-
EXAMPLE:-
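The three kinds of read mentioned above could be sketched as follows, assuming the hypothetical table school.student:

```cql
-- Whole table.
SELECT * FROM school.student;

-- A single column for every row.
SELECT stud_name FROM school.student;

-- A particular cell: one column of the row whose primary key is 1.
SELECT stud_name FROM school.student WHERE id = 1;
```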
UPDATE DATA:-
UPDATE command is used to update data in a Cassandra table. If cqlsh prints no output after
the update, the data was updated successfully; otherwise an error is returned.
While updating data in a Cassandra table, the following keywords are commonly used:
Where: The WHERE clause selects the row that you want to update.
Set: The SET clause sets the new value.
SYNTAX:-
UPDATE table_name
SET column_name = new_value, ...
WHERE condition;
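A short instance of this syntax, assuming the hypothetical table school.student and made-up values:

```cql
-- The WHERE clause must identify the row by its primary key.
UPDATE school.student
  SET stud_city = 'Delhi'
  WHERE id = 1;

-- Read the row back to see the new value.
SELECT id, stud_city FROM school.student WHERE id = 1;
```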
DELETE DATA:-
DELETE command is used to delete data from a Cassandra table. You can delete selected columns
of a row or a complete row by using this command. To remove every row of a table, use TRUNCATE.
SYNTAX:-
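Both forms could look like this, assuming the hypothetical table school.student:

```cql
-- Delete one column value from the row with id 1.
DELETE stud_city FROM school.student WHERE id = 1;

-- Delete the entire row with id 2.
DELETE FROM school.student WHERE id = 2;
```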
CASSANDRA COLLECTIONS:-
Cassandra collections let a single column hold multiple elements.
There are three types of collection supported by Cassandra:
Set
List
Map
SET COLLECTIONS:-
A set collection stores a group of unique elements, which are returned in sorted order when queried.
EXAMPLE:-
EXAMPLE:-
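A minimal sketch of a set column, assuming a hypothetical table school.student_contact and invented e-mail addresses:

```cql
CREATE TABLE IF NOT EXISTS school.student_contact (
  id int PRIMARY KEY,
  emails set<text>
);

-- Set literals use curly braces; duplicates are stored only once.
INSERT INTO school.student_contact (id, emails)
  VALUES (1, {'z@example.com', 'a@example.com'});

-- Add one more element to the existing set.
UPDATE school.student_contact
  SET emails = emails + {'m@example.com'}
  WHERE id = 1;

-- The elements come back in sorted order, not insertion order.
SELECT emails FROM school.student_contact WHERE id = 1;
```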
MAP COLLECTION:-
-> The map collection is used to store key value pairs.
->It maps one thing to another.
For example, if you want to save course name with its prerequisite course name, you can use
map collection.
EXAMPLE:-
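The course and prerequisite case could be sketched like this; the table school.course and the course names are hypothetical:

```cql
CREATE TABLE IF NOT EXISTS school.course (
  id int PRIMARY KEY,
  prerequisites map<text, text>
);

-- Each entry maps a course name to its prerequisite course name.
INSERT INTO school.course (id, prerequisites)
  VALUES (1, {'Algorithms': 'Data Structures', 'Calculus II': 'Calculus I'});

-- Add or overwrite a single entry by key.
UPDATE school.course
  SET prerequisites['Databases'] = 'Data Structures'
  WHERE id = 1;

SELECT prerequisites FROM school.course WHERE id = 1;
```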
Every node is assigned a unique token or range of tokens that determines which rows go to
which node.
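The token computed for each row can be inspected from CQL. A small sketch, assuming the hypothetical table school.student with id as its partition key:

```cql
-- token() returns the partitioner's hash of the partition key.
-- The node whose token range covers this value owns the row.
SELECT token(id), id, stud_name FROM school.student;
```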
Cassandra gives you a way to define your own types to extend its data model. These
user-defined types (UDTs) are easier to use than tuples since you can specify the values
by name rather than position. Create your own address type:
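The original definition is missing, so the following is only a sketch. The field names of the address type, the table school.student_address and the sample values are all assumptions:

```cql
CREATE TYPE IF NOT EXISTS school.address (
  street text,
  city text,
  state text,
  zip_code int
);

-- Use the type as a column; frozen stores the value as one unit.
CREATE TABLE IF NOT EXISTS school.student_address (
  id int PRIMARY KEY,
  home frozen<address>
);

-- Fields are given by name, not by position.
INSERT INTO school.student_address (id, home)
  VALUES (1, {street: '7 Hill Rd', city: 'Pune', state: 'MH', zip_code: 411001});
```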
SECONDARY INDEXES:-
If you try to query on a column in a Cassandra table that is not part of the primary
key, you’ll soon realize that this is not allowed. For example, consider this
table, which uses the id as the primary key. Attempting to query by the stud_name
results in the following output:
As the error message instructs, you could override Cassandra’s default behavior in
order to force it to query based on this column using the ALLOW FILTERING keyword.
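The options discussed here can be sketched as follows, assuming the hypothetical table school.student with id as the primary key and a made-up name value:

```cql
-- Rejected by default, because stud_name is not part of the primary key:
-- SELECT * FROM school.student WHERE stud_name = 'Asha';

-- Forced with ALLOW FILTERING; Cassandra may scan many rows to answer it.
SELECT * FROM school.student WHERE stud_name = 'Asha' ALLOW FILTERING;

-- Alternative: create a secondary index on the column, after which
-- the plain query is accepted.
CREATE INDEX IF NOT EXISTS stud_name_idx ON school.student (stud_name);
SELECT * FROM school.student WHERE stud_name = 'Asha';
```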
JSON:-
As shown below, a JSON entry was inserted into the Cassandra database:-
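The original listing is missing. A minimal sketch of a JSON insert and read, assuming the hypothetical table school.student (id, stud_name, stud_city) and invented values:

```cql
-- The JSON keys must match the column names of the table.
INSERT INTO school.student JSON
  '{"id": 5, "stud_name": "Meera", "stud_city": "Kochi"}';

-- SELECT JSON returns each row as a single JSON-encoded string.
SELECT JSON * FROM school.student WHERE id = 5;
```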