
Axway API Management 7.5.x
Cassandra Best Practices

#axway

Agenda

Apache Cassandra - Overview
Apache Cassandra - Focus on consistency level
Apache Cassandra - Additional definitions
Cassandra configuration for API Mgt - Overview
Cassandra configuration for API Mgt - Single node deployment
Cassandra configuration for API Mgt - 3 Node Cluster Deployment (Single Datacenter)
Apache Cassandra - Reference
Apache Cassandra - Tools
Apache Cassandra - To go further in understanding Cassandra

Apache Cassandra
Overview

Cassandra overview
What is Cassandra?

• Apache Cassandra is an open-source, distributed, decentralized storage system (database) for managing very large amounts of structured data spread out across the world. It provides a highly available service with no single point of failure.
• Cassandra is classified as a NoSQL database.
• The primary objectives of a NoSQL database are:
  • simplicity of design
  • horizontal scaling
  • finer control over availability

Cassandra overview
Architecture

Figure: a cluster with 4 Cassandra nodes (Node 1, Node 2, Node 3, Node 4) arranged in a ring.

• Cassandra uses a peer-to-peer distributed system across its nodes, and data is distributed among all the nodes in a cluster.
• All the nodes in a cluster play the same role. Each node is independent and at the same time interconnected to the other nodes.
• Each node in a cluster can accept read and write requests, regardless of where the data is actually located in the cluster.
• When a node goes down, read/write requests can be served from other nodes in the network.
Cassandra overview
Write process

• We consider the following data to illustrate the Cassandra features:

    Jim     Age:36   Car:camaro   Gender:M
    Carol   Age:37   Car:subaru   Gender:F
    Johnny  Age:12                Gender:M
    Suzy    Age:10                Gender:F

  The first column (the name) is the primary key.

• Let's see how Cassandra stores this data:

• Partitioner: it determines how data is distributed across the nodes in the cluster (including replicas). Basically, a partitioner is a function for deriving a token representing a row from its partition key, typically by hashing. Each row of data is then distributed across the cluster by the value of the token.
• Several hash functions are available to distribute data. In our example, the MD5 hash function (128 bits) is applied to each primary key:

    Jim     5e02739678…
    Carol   A9a0198010...
    Johnny  F4eb27cea7…
    Suzy    78b421309e…
Cassandra overview
Write process

• A range of hash values (a token range) is associated with each node. The full hash space is broken into ranges; in our example there are 4 nodes, so each node owns a quarter of it:

    Node     Start          End
    Node 1   0xc000000…1    0x0000000…0
    Node 2   0x0000000…1    0x4000000…0
    Node 3   0x4000000…1    0x8000000…0
    Node 4   0x8000000…1    0xc000000…0

• Data is written to the node that owns the corresponding range (see the sketch below):

    Jim     5e02739678…    Node 3
    Carol   A9a0198010...  Node 4
    Johnny  F4eb27cea7…    Node 1
    Suzy    78b421309e…    Node 3
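To make the hashing and range assignment concrete, here is a minimal Python sketch. It assumes the MD5-based partitioning of this example (a real Cassandra 2.x cluster defaults to the Murmur3 partitioner); token() and owner() are illustrative helpers, not Cassandra APIs, and the exact digests printed may differ from the abbreviated values shown above.

    import hashlib

    # Owners of the four equal quarters of the 128-bit token space, from the range table above
    # (0x0…-0x4…: Node 2, 0x4…-0x8…: Node 3, 0x8…-0xc…: Node 4, 0xc…-wrap: Node 1).
    QUARTER_OWNERS = ["Node 2", "Node 3", "Node 4", "Node 1"]

    def token(partition_key):
        """Derive a 128-bit token from the partition key by MD5 hashing, as in the example."""
        return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)

    def owner(partition_key):
        """Map the token onto one of the four equal token ranges (boundary handling simplified)."""
        quarter = token(partition_key) // (2 ** 126)   # 2**128 split into 4 equal ranges
        return QUARTER_OWNERS[quarter]

    for name in ("Jim", "Carol", "Johnny", "Suzy"):
        print(f"{name:<7} {token(name):032x}  ->  {owner(name)}")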

Cassandra overview
Replication factor - Definition

• The replication factor is the total number of replicas across the cluster.
• You define the replication factor for each data center.
• Generally you should set the replication factor greater than one, but no more than the number of nodes in the cluster.
  • A replication factor of 1 means that there is only one copy of each row, on one node.
  • A replication factor of 2 means two copies of each row, where each copy is on a different node.
• All replicas are equally important; there is no primary or master replica.
• Replicas are placed clockwise around the ring (see the sketch below).

If we go back to our previous example, with a replication factor set to 3 there are 3 replicas of each row:

    Node 1: Jim, Carol, Johnny, Suzy
    Node 2: Carol, Johnny
    Node 3: Jim, Johnny, Suzy
    Node 4: Jim, Carol, Suzy
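Building on the previous sketch, SimpleStrategy-style clockwise placement can be illustrated as follows; the ring order and primary owners are the ones from this example, and replicas() is an illustrative helper, not Cassandra's actual placement code.

    RING_ORDER = ["Node 1", "Node 2", "Node 3", "Node 4"]   # clockwise order of the ring

    def replicas(primary, replication_factor=3):
        """Place copies on the primary node and the next nodes clockwise around the ring."""
        start = RING_ORDER.index(primary)
        return [RING_ORDER[(start + i) % len(RING_ORDER)] for i in range(replication_factor)]

    # Primary owners taken from the partitioning example above.
    for row, primary in {"Jim": "Node 3", "Carol": "Node 4", "Johnny": "Node 1", "Suzy": "Node 3"}.items():
        print(row, "->", replicas(primary))
    # Jim    -> ['Node 3', 'Node 4', 'Node 1']
    # Carol  -> ['Node 4', 'Node 1', 'Node 2']
    # Johnny -> ['Node 1', 'Node 2', 'Node 3']
    # Suzy   -> ['Node 3', 'Node 4', 'Node 1']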
Cassandra overview
Reading data - Process

• Reading data is performed in parallel across a cluster. A user requests data from any node (which becomes that user's coordinator node), with the user's query being assembled from one or more nodes holding the necessary data.
• The client is aware of every single node; it can send a read request to any of them.
• In this example, Node 4 does not have the requested data. Node 4 knows all the nodes of the cluster and plays the role of the coordinator.
• If a particular node holding the required data is down, Cassandra simply requests the data from another node holding a replicated copy of that data.

Figure: the client reads through Node 4 (the coordinator); Node 1 holds the primary copy, Node 2 and Node 3 hold replica copies, each answering with its own acknowledgement latency (5µs, 12µs, 500µs in the example).
Cassandra overview
Read - Process

• If Node 4 has crashed, the client sends its read request to Node 1 instead, which then plays the role of the coordinator.

Figure: the client reads directly from Node 1 (primary copy); Node 2 and Node 3 hold replica copies.
Apache Cassandra
Focus on
consistency level
Consistency
Definition

• Consistency refers to how up-to-date and synchronized a row of Cassandra data is on all of its
replicas. Cassandra extends the concept of eventual consistency by offering tunable
consistency. For any given read or write operation, the client application decides how consistent
the requested data must be.
• Even at low consistency levels, Cassandra writes to all replicas of the partition key, even replicas
in other data centers. The consistency level determines only the number of replicas that need to
acknowledge the write success to the client application. Typically, a client specifies a
consistency level that is less than the replication factor specified by the keyspace. This practice
ensures that the coordinating server node reports the write successful even if some replicas are
down or otherwise not responsive to the write.

Resource: https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dmlAboutDataConsistency.html?hl=consistency
Consistency (Write)
Definition

• The coordinator sends a write request to all replicas that own the row being written. As long as all replica nodes are up and available, they will get the write regardless of the consistency level specified by the client. The write consistency level determines how many replica nodes must respond with a success acknowledgment in order for the write to be considered successful. Success means that the data was written to the commit log and the memtable, as described in "About writes".

Level definitions

• One: A write must be written to the commit log and memtable of at least one replica node.
• Two: A write must be written to the commit log and memtable of at least two replica nodes.
• Quorum*: A write must be written to the commit log and memtable on a quorum of replica nodes across all data centers.
• Local_Quorum*: Strong consistency. A write must be written to the commit log and memtable on a quorum of replica nodes in the same data center as the coordinator node. Avoids the latency of inter-data-center communication.
• All: A write must be written to the commit log and memtable on all replica nodes in the cluster for that partition.

Note*: Q = QUORUM, where Q = N / 2 + 1 (integer division) and N = replication factor (see the sketch below).
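The API Gateway itself talks to Cassandra through the Hector client configured in Policy Studio, but as an illustration of tunable write consistency, here is a minimal sketch using the DataStax Python driver; the contact point, keyspace and table below are placeholders for this example, not part of the product configuration.

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    # Contact point, keyspace and table are placeholders for this illustration.
    cluster = Cluster(["192.168.147.127"])
    session = cluster.connect("demo_keyspace")

    insert = SimpleStatement(
        "INSERT INTO users (id, firstname) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM,  # succeed once a quorum of replicas has acknowledged
    )
    session.execute(insert, ("pmcfadin", "Patrick"))

    # The same statement at ConsistencyLevel.ONE returns as soon as a single replica has written
    # to its commit log and memtable; replicas that missed the write are repaired later.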
Consistency (Write)
Example (1/3)

• If we go back to our previous example (4 nodes, replication factor = 3): the incoming write (1) will go to all 3 nodes (2) that own the requested row.

Figure: the client sends "Write Johnny" to Node 4, which acts as coordinator and forwards the write to Node 1, Node 2 and Node 3 (each stores "+Johnny").

Example (2/3)

• If the write consistency level specified by the client is ONE (1), the first node to complete the write responds back to the coordinator (3), which then proxies the success message back to the client (4). A consistency level of ONE means that it is possible that 2 of the 3 replicas could miss the write if they happened to be down at the time the request was made. If a replica misses a write, Cassandra will make the row consistent later using one of its built-in repair mechanisms: hinted handoff, read repair, or anti-entropy node repair.

Figure: with write consistency level = ONE, the coordinator (Node 4) acknowledges the client as soon as the fastest replica has acknowledged the write.
Consistency (Write)
Example (3/3)

• If the write consistency level specified by the client is ALL (1), a write must be written to the commit log and memtable on all replica nodes in the cluster for that partition (2). The coordinator node must wait for the acknowledgement (3) of all replicas (as specified by the consistency level; 3 in our example) before acknowledging the client (4).

Figure: with write consistency level = ALL, the coordinator (Node 4) waits for the acknowledgements of Node 1, Node 2 and Node 3 (5µs, 12µs and 500µs in the example) before answering the client.
Consistency (Read)
Definition

• There are three types of read requests that a coordinator node can send to a replica:
  • A direct read request
  • A digest request
  • A background read repair request

• The coordinator node contacts one replica node with a direct read request. Then the coordinator sends a digest request to a number of replicas determined by the consistency level specified by the client. The digest request checks the data in the replica node to make sure it is up to date. Then the coordinator sends a digest request to all remaining replicas. If any replica nodes have out-of-date data, a background read repair request is sent. Read repair requests ensure that the requested row is made consistent on all replicas.

• For a digest request, the coordinator first contacts the replicas specified by the consistency level. The coordinator sends these requests to the replicas that are currently responding the fastest. The nodes contacted respond with a digest of the requested data; if multiple nodes are contacted, the rows from each replica are compared in memory to see if they are consistent. If they are not, then the replica that has the most recent data (based on the timestamp) is used by the coordinator to forward the result back to the client.

• To ensure that all replicas have the most recent version of frequently-read data, the coordinator also contacts and compares the data from all the remaining replicas that own the row in the background. If the replicas are inconsistent, the coordinator issues writes to the out-of-date replicas to update the row to the most recent values. This process is known as read repair. Read repair can be configured per table for non-QUORUM consistency levels (using read_repair_chance), and is enabled by default. A sketch of the timestamp-based resolution step follows below.
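The timestamp comparison at the heart of read repair can be sketched like this; ReplicaResponse and resolve() are illustrative constructs for this document, not Cassandra internals.

    from dataclasses import dataclass

    @dataclass
    class ReplicaResponse:
        node: str
        value: str
        timestamp: int  # write timestamp, e.g. microseconds since epoch

    def resolve(responses):
        """Pick the most recent value and list the replicas that need a repair write."""
        newest = max(responses, key=lambda r: r.timestamp)
        stale = [r.node for r in responses if r.timestamp < newest.timestamp]
        return newest.value, stale

    responses = [
        ReplicaResponse("Node 1", "Patrick", timestamp=1700000000_000_002),
        ReplicaResponse("Node 2", "Pat",     timestamp=1700000000_000_001),
    ]
    value, to_repair = resolve(responses)
    print(value, to_repair)   # 'Patrick' ['Node 2']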
Consistency (Read)
Level definition

• One: Returns a response from the closest replica, as determined by the snitch. By default, a read repair runs in the background to make the other replicas consistent.
• Two: Returns the most recent data from two of the closest replicas.
• Quorum*: Returns the record after a quorum of replicas from all data centers has responded.
• Local_Quorum*: Returns the record after a quorum of replicas in the same data center as the coordinator node has responded. Avoids the latency of inter-data-center communication.
• All: Returns the record after all replicas have responded. The read operation fails if a replica does not respond.

Note*: Q = QUORUM, where Q = N / 2 + 1 (integer division) and N = replication factor.

Consistency (Read)
Example (1/2)

• In a single data center cluster with a replication factor of 3 and a read consistency level of ONE (1), the closest replica for the given row is contacted to fulfill the read request (2). In the background, a read repair is potentially initiated.

Figure: the client reads "Johnny" at consistency level ONE through the coordinator (Node 4); only the closest replica needs to answer.

Example (2/2)

• In a single data center cluster with a replication factor of 3 and a read consistency level of QUORUM (1), 2 of the 3 replicas for the given row must respond to fulfill the read request (2 & 3). If the contacted replicas have different versions of the row, the replica with the most recent version will return the requested data. In the background, the third replica is checked for consistency with the first two, and if needed, a read repair is initiated for the out-of-date replicas.

Figure: the client reads "Johnny" at consistency level QUORUM through the coordinator (Node 4); two of the three replicas must acknowledge before the client receives the answer.
Consistency
Summary

• Using a replication factor of 3, a quorum is 2 nodes.


• The cluster can tolerate 1 replica down.
Node 1 Node 2 Node 3 … Node N Node 1 Node 2 Node 3 …
Node 4 Node 4 Node N
Johnny Johnny Johnny Johnny Johnny Johnny

2 nodes can still provide the data Only one node can still provide the data.
The Quorum can be achieved The Quorum can never be achieved

• Using a replication factor of 6, a quorum is 4.


• The cluster can tolerate 2 replicas down.

Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6
Node 7 … Node N Node 7 … Node N
Johnny Johnny Johnny Johnny Johnny Johnny Johnny Johnny Johnny Johnny Johnny Johnny

4 nodes can still provide the data Only 3 nodes can still provide the data
The Quorum can be achieved The Quorum can never be achieved
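A small helper makes the arithmetic behind these two cases explicit (integer division, matching the Q = N / 2 + 1 note above); the function names are just for illustration.

    def quorum(replication_factor):
        """Q = N / 2 + 1 with integer division, N = replication factor."""
        return replication_factor // 2 + 1

    def tolerable_failures(replication_factor):
        """How many replicas can be down while QUORUM reads/writes still succeed."""
        return replication_factor - quorum(replication_factor)

    for rf in (3, 6):
        print(rf, quorum(rf), tolerable_failures(rf))
    # RF 3 -> quorum 2, tolerates 1 replica down
    # RF 6 -> quorum 4, tolerates 2 replicas down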

Apache Cassandra
Additional definitions
Seed Nodes
Definition

• The seed node designation has no purpose other than bootstrapping the
gossip process for new nodes joining the cluster.
• Cassandra nodes use this list of hosts to find each other and learn the
topology of the ring.
• To prevent problems in gossip communications, use the same list of seed
nodes for all nodes in a cluster.
• More than a single seed node per data center is recommended for fault tolerance.
• Example: seed nodes are listed in the seed_provider section of cassandra.yaml (see the excerpts in the single-node and 3-node configuration sections below).

Keyspace
Definition

• A cluster is a container for keyspaces (typically a single keyspace). A keyspace is the outermost container for data in Cassandra, corresponding closely to a relational database.
• In the same way that a database is a container for tables, a keyspace is a container for a list of one or more column families. A column family is roughly analogous to a table in the relational model, and is a container for a collection of rows. Each row contains ordered columns. Column families represent the structure of your data. Each keyspace has at least one and often many column families.
• Like a relational database, a keyspace has a name and a set of attributes that define keyspace-wide behavior:
  • Replication factor
  • Replica placement strategy
  • Column families

Cassandra
configuration for
API Mgt
Overview
Cassandra usage for API Management

• API Gateway KPS (custom KPS tables) *
• API Gateway OAuth Token Store *
• API Manager (Client Registry, API Catalog, quotas)
• API Gateway Client Registry (Client Registry for API Keys and OAuth solution, when API Gateway only is used)

(*) Cassandra is optional for those; other data store options are available.

Cassandra configuration for API Management - overview
Regarding the previous chapters, the elements that must be configured are:

• Each Cassandra node (configured in cassandra.yaml):
  • Node configuration: IP, port
  • Node configuration for client (API Gateway/Manager) connections: rpc address and port
  • Node configuration related to the cluster: seed node (so that the node is aware of all other nodes in the cluster), and the listen address and port used for internode communication (used for replication)
• Keyspace (configured in Policy Studio):
  • Replication factor
  • Replica placement strategy
• Cassandra client, i.e. API Gateway/Manager (configured in Policy Studio):
  • Read and write consistency level
Best Practices (1/2)

• Always build your Cassandra deployment pattern first
• Configure JAVA_HOME to reference an independent Oracle JRE 1.8 installation
  • This completely removes the interdependency between the API Gateway and Cassandra
• Enable authentication and SSL between the Cassandra Hector client and the Cassandra server (multi-node cluster)
• Enable SSL communication between Cassandra servers (multi-node cluster)
• For optimal write performance, place the commit log on a separate disk partition (advice per DataStax:
  https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html)

Best Practices (2/2)

• Configure Cassandra H/A before installing the product
• The only supported configuration is strong consistency with a minimum of 3 nodes, QUORUM for both read and write consistency levels, and a replication factor of 3
• Synchronize time on all servers
• Don't add a node to the cluster if the seed node is not started
• Start Cassandra Node 1 (the seed) first; after it has booted, start Cassandra Node 2, and after that has booted, start Cassandra Node 3

Cassandra configuration
for API Mgt
Single node deployment
Single Node

• A single node deployment is the simplest.
• Suitable for a development environment ONLY.
• The use of LOCALHOST within cassandra.yaml and the Cassandra host (see Policy Studio server settings) is only suitable for a single node deployment. NOTE: IPv6 may need to be disabled.

Figure: one API Gateway (with or without API Manager) connected to a single Cassandra node (Node 1).
Single node configuration
Cassandra.yaml

• seed_provider:
      # Addresses of hosts that are deemed contact points.
      # Cassandra nodes use this list of hosts to find each other and learn
      # the topology of the ring. You must change this if you are running
      # multiple nodes!
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
            # seeds is actually a comma-delimited list of addresses.
            # Ex: "<ip1>,<ip2>,<ip3>"
            - seeds: "127.0.0.1"

• listen_address – interface used for the internode data connection (port 7000 or 7001)
      listen_address: 192.168.147.129
      # listen_interface: eth0
  Note: choose listen address or listen interface, not both.

• rpc_address – interface used for client connections (ports 9160 and 9042)
      rpc_address: 192.168.147.129
      # rpc_interface: eth1
  Note: choose rpc address or rpc interface, not both.

Keyspace and client configuration

• Register Host
• Designate Admin Node Manager
• Configure API Gateway instance
• Configure Hector client via Policy Studio
• Once the configuration is deployed, the Cassandra keyspace will be created
• Install API Manager
• Server Settings / Cassandra / Hosts
Cassandra configuration for
API Mgt
3 Node Cluster Deployment
(Single Datacenter)
3 Node Cluster

• DC 1:
    Cassandra DB Node 1: 192.168.147.127
    Cassandra DB Node 2: 192.168.147.128
    Cassandra DB Node 3: 192.168.147.129

Figure: two API Gateways (with or without API Manager) connected to the 3-node Cassandra cluster (Node 1, Node 2, Node 3).
Each node configuration
Cassandra.yaml

• seed_provider:
      # Addresses of hosts that are deemed contact points.
      # Cassandra nodes use this list of hosts to find each other and learn
      # the topology of the ring. You must change this if you are running
      # multiple nodes!
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
            # seeds is actually a comma-delimited list of addresses.
            # Ex: "<ip1>,<ip2>,<ip3>"
            - seeds: "192.168.147.127"   (all Cassandra instances should reference the same seed)

• listen_address – interface used for the internode data connection (port 7000 or 7001)
      listen_address: 192.168.147.127   (change the address to correspond with the Cassandra instance)
      # listen_interface: eth0
  Note: choose listen address or listen interface, not both.

• rpc_address – interface used for client connections (ports 9160 and 9042)
      rpc_address: 192.168.147.127   (change the address to correspond with the Cassandra instance)
      # rpc_interface: eth1
  Note: choose rpc address or rpc interface, not both.

Keyspace and client configuration
For the first API Gateway/Manager

• Register Host
• Configure 1 API Gateway instance
• Configure Hector client via Policy Studio
• Once the configuration is deployed, the Cassandra keyspace will be created
• Install API Manager on the first gateway
• Update the read/write consistency level to QUORUM for KPS collections via Policy Studio
• Register the remaining hosts and configure the API Gateway instances --- AFTER the replication factor is updated

Update replication factor

• Log in to Cassandra DB Node 1:
      # cd ../cassandra/bin
      # ./cqlsh <IP Address>
      ./cqlsh 192.168.147.127
• Find the keyspace:
      > DESCRIBE KEYSPACES;
      Example: x8746e4a4_e423_40ac_95a7_4934215e4e5d_group_2
• Execute the following command to alter the keyspace:
      > ALTER KEYSPACE x8746e4a4_e423_40ac_95a7_4934215e4e5d_group_2
        WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 3};
• Exit the cqlsh utility and then run the following command on all Cassandra instances:
      nodetool repair x8746e4a4_e423_40ac_95a7_4934215e4e5d_group_2

Keyspace and client configuration
For the other API Gateway/Manager

• Register remaining hosts and configure API Gateway Instance

Apache Cassandra
Reference

Reference
Reference: /tmp --noexec

• /tmp --noexec (Cassandra only)
  • If /tmp is mounted with the noexec option, an error will be generated when Cassandra is started.
  • Solution 1:
    • Create a tmp directory in /opt/cassandra
    • Edit the cassandra-env.sh file (under the /opt/axway/cassandra/conf folder) and add the following line:
          # vi cassandra-env.sh
          JVM_OPTS="$JVM_OPTS -Djava.io.tmpdir=$CASSANDRA_HOME/tmp"
  • Solution 2:
          sudo mount -o remount,exec /tmp

Reference: JAVA_HOME

• JAVA_HOME (this works for both the API Gateway and Cassandra)
      # tar xvzf jdk-8u101-linux-x64.tar.gz -C /opt/jdk/
• Make sure the JAVA_HOME variable is available for all users by adding the following entries to the /etc/profile file:
      # sudo vi /etc/profile
      export JAVA_HOME=/opt/jdk/jdk1.8.0_101
      export PATH=$PATH:$JAVA_HOME/bin

Apache Cassandra
Tools

Tools
http://www.ecyrd.com/cassandracalculator/

Tools
DBeaver - Linux

Apache Cassandra
To go further in understanding Cassandra
Cassandra: Components
Write process - Additional definitions

• Commit log: the commit log is a crash-recovery mechanism in Cassandra. Every write operation is written to the commit log.
• Mem-table: a mem-table is a memory-resident data structure. After the commit log, the data is written to the mem-table. Sometimes, for a single column family, there will be multiple mem-tables.
• SSTable: a disk file to which the data is flushed from the mem-table when its contents reach a threshold value. A toy sketch of this write path follows below.
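To tie these three components together, here is a deliberately simplified, in-memory sketch of the write path (commit log append, memtable update, flush to an SSTable-like file once a threshold is reached). It is a toy model for illustration only, not Cassandra's storage engine; the threshold and file names are made up.

    import json

    MEMTABLE_THRESHOLD = 2          # flush after this many rows, just to keep the demo small
    commit_log = []                  # stand-in for the on-disk commit log (crash recovery)
    memtable = {}                    # in-memory, per-table structure
    sstables = []                    # each flush produces one immutable "SSTable" file

    def write(row_key, column, value):
        commit_log.append((row_key, column, value))        # 1. append to the commit log first
        memtable.setdefault(row_key, {})[column] = value    # 2. update the memtable
        if len(memtable) >= MEMTABLE_THRESHOLD:             # 3. flush when the threshold is reached
            flush()

    def flush():
        filename = f"sstable-{len(sstables)}.json"
        with open(filename, "w") as f:                       # sequential write to disk
            json.dump(memtable, f, indent=2)
        sstables.append(filename)
        memtable.clear()

    write("pmcfadin", "firstname", "Patrick")
    write("jim", "age", 36)          # the second row crosses the threshold and triggers a flush
    print(commit_log, sstables)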

Cassandra: Components
Write process (1/3)
• (1) The client sends the request to the node:
      Update users
      Set firstname = 'Patrick'
      Where id = 'pmcfadin'
• (2) The row key and column (id='pmcfadin', firstname='Patrick') are written into the commit log on the server disk. This is very fast.

Figure: on the Cassandra server's file system, the entry (id='pmcfadin', firstname='Patrick') is appended to the commit log; the data directory is still empty.

Resource: https://www.youtube.com/watch?v=B_HTdrTgGNs
Cassandra: Components
Write process (2/3)
• (3) The data is then put into a memtable stored in memory (memtable for table users: id='pmcfadin', firstname='Patrick', lastname='McFadin').
• (4) An acknowledgement is sent to the client.

Figure: the Cassandra server now holds the row in the memtable (in memory) and in the commit log (on disk); the data directory is still empty.

Resource: https://www.youtube.com/watch?v=B_HTdrTgGNs
Cassandra: Components
Write process (3/3)
• (5) The flush process writes the memtable out into a file called an SSTable, which is flushed to disk. This is not random I/O but sequential I/O (a sequential write), ordered by time.

Figure: after the flush, the row (id='pmcfadin', firstname='Patrick', lastname='McFadin') sits in an SSTable in the data directory, alongside the commit log.

Resource: https://www.youtube.com/watch?v=B_HTdrTgGNs
Replication Strategies
Definition

• A replication strategy determines the nodes where replicas are placed. Two replication strategies are available:
  • SimpleStrategy:
    • Use only for a single data center. SimpleStrategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology (rack or data center location).
  • NetworkTopologyStrategy:
    • Use when you have (or plan to have) your cluster deployed across multiple data centers. This strategy specifies how many replicas you want in each data center.

• The strategy is configured per KEYSPACE (see the sketch below).
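As an illustration of how both strategies are expressed per keyspace, here is a minimal sketch using the DataStax Python driver; the contact point, keyspace names and data center names are placeholders, and in the API Management deployment the keyspace is created by the product and only its replication settings are altered, as shown in the 3-node configuration section.

    from cassandra.cluster import Cluster

    # Contact point, keyspace names and data center names below are illustrative only.
    cluster = Cluster(["192.168.147.127"])
    session = cluster.connect()

    # Single data center: SimpleStrategy with a plain replication factor.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS demo_simple
        WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3}
    """)

    # Multiple data centers: NetworkTopologyStrategy with a replica count per data center.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS demo_multi_dc
        WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2}
    """)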

Snitch
Definition

• A snitch determines which data centers and racks nodes belong to.
• Snitches inform Cassandra about the network topology so that requests are routed efficiently, and they allow Cassandra to distribute replicas by grouping machines into data centers and racks. Specifically, the replication strategy places the replicas based on the information provided by the snitch.
• All nodes must return to the same rack and data center.
• Cassandra does its best not to have more than one replica on the same rack (which is not necessarily a physical location).
• The snitch is referenced in cassandra.yaml:
      endpoint_snitch: SimpleSnitch

Examples

• SimpleSnitch
  • The SimpleSnitch (default) is used only for single-data-center deployments, or a single zone in public clouds. It does not recognize data center or rack information. It treats strategy order as proximity, which can improve cache locality when disabling read repair.
  • Using the SimpleSnitch, you define the keyspace to use SimpleStrategy and specify a replication factor.
• GossipingPropertyFileSnitch
  • Automatically updates all nodes using gossip when adding new nodes; this snitch is recommended for production.
  • It uses the rack and data center information for the local node defined in the cassandra-rackdc.properties file and propagates this information to other nodes via gossip.
Internode communications (gossip)
Definition

• Cassandra uses a protocol called gossip to discover location and state information about the
other nodes participating in a Cassandra cluster.
• Gossip is a peer-to-peer communication protocol in which nodes periodically exchange state
information about themselves and about other nodes they know about.
• The gossip process runs every second and exchanges state messages with up to three other
nodes in the cluster.
• The nodes exchange information about themselves and about the other nodes that they have
gossiped about, so all nodes quickly learn about all other nodes in the cluster.
• A gossip message has a version associated with it, so that during a gossip exchange, older information is overwritten with the most current state for a particular node (see the toy sketch below).
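As a toy model of that versioned exchange (not Cassandra's actual three-message gossip protocol), assuming four nodes that each start off knowing only themselves:

    import random

    # Each node's view: {node name: (version, state)}. Initially a node only knows itself.
    nodes = {f"Node {i}": {f"Node {i}": (1, "UP")} for i in range(1, 5)}

    def merge(view_a, view_b):
        """Keep the highest-versioned entry per node, so older information is overwritten."""
        merged = dict(view_a)
        for node, (version, state) in view_b.items():
            if node not in merged or version > merged[node][0]:
                merged[node] = (version, state)
        return merged

    def gossip_round():
        """Every node exchanges its state with up to three other nodes, as in each gossip run."""
        for name in nodes:
            for peer in random.sample([n for n in nodes if n != name], k=3):
                combined = merge(nodes[name], nodes[peer])
                nodes[name] = combined
                nodes[peer] = dict(combined)

    for _ in range(3):
        gossip_round()
    print({name: sorted(view) for name, view in nodes.items()})  # all nodes quickly learn about all nodes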

Thank you!
