Apache Cassandra Essentials
Apache Cassandra Essentials - Padalia Nitin
Table of Contents
Apache Cassandra Essentials
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Getting Your Cassandra Cluster Ready
Installation
Prerequisites
Compiling Cassandra from source and installing
Installation from a precompiled binary
The installation layout
The directory layout in tarball installations
The directory layout in package-based installation
Configuration files
cassandra.yaml
Running a Cassandra server
Running a Cassandra node
Setting up the cluster
Viewing the cluster status
Summary
2. An Architectural Overview
Background
Cassandra cluster overview
The Gossip protocol
Failure detection
Data distribution
Replication
SimpleStrategy
NetworkTopologyStrategy
Snitches
Virtual nodes
Adding nodes to our cluster
Create keyspace and column family
Summary
3. Creating Database and Schema
A database and schema
Keyspace
Column families
Static rows
Wide rows
A primary key
Partition keys and clustering columns
A composite partition key
Multiple clustering columns
Static columns
Modifying a table
Data types
Counters
Collections
Sets
Lists
Map
UDTs
Secondary indexes
Allowing filtering
TTL
Conditional querying
Conditions on a partition key
Conditions on a partition key and clustering columns
Sorting query results
Write operations
Lightweight transactions
Batch statements
Summary
4. Read and Write – Behind the Scenes
Write operations
CommitLog
Anatomy of Memtable
SSTable explained
SSTable Compaction strategies
Size-tiered compaction
Leveled compaction
DateTiered compaction
Read operations
Reads from row cache
Read operations for row cache miss
Key is in KeyCache
Key search misses both the key cache and the row cache
Delete operations
Data consistency
Read operation
Digest reads
Read repair
Consistency levels
Write operation
Hinted handoff
Consistency levels
Tracing Cassandra queries
Summary
5. Writing Your Cassandra Client
Connecting to a Cassandra cluster
Driver Connection policies
Load balancing policies
Retry policies
Reconnection policies
Reading and writing to the Cassandra cluster
QueryBuilder
Reading and writing asynchronously
Prepared statements
Example REST service using prepared statement
Batch statements
Mapping API
Tracing Cassandra queries using Java driver
Summary
6. Monitoring and Tuning a Cassandra Cluster
Monitoring a Cassandra cluster
Use logging for debugging
Monitoring using command-line utilities
nodetool cfstats
nodetool cfhistograms
nodetool netstats
nodetool tpstats
JConsole
Third-party tools
Tuning Cassandra nodes
Configuring Cassandra caches
Tuning Bloom filters
Configuring and tuning Java
Summary
7. Backup and Restore
Taking backup of a Cassandra cluster
Manual backup
Deleting snapshots
Incremental backup
Restoring data to Cassandra
The Cassandra bulk loader
Exporting and importing data using the Cassandra JSON utility
Loading external data into Cassandra
Removing nodes from Cassandra cluster
Adding nodes to a Cassandra cluster
Replacing dead nodes in a cluster
Summary
Index
Apache Cassandra Essentials
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2015
Production reference: 1161115
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78398-910-2
www.packtpub.com
Credits
Author
Nitin Padalia
Reviewers
Ranjeet Kumar Jha
Sonal Raj
Chaoran Yu
Commissioning Editor
Akram Hussain
Acquisition Editor
Meeta Rajani
Content Development Editor
Aparna Mitra
Technical Editor
Rohan Uttam Gosavi
Copy Editor
Pranjali Chury
Project Coordinator
Mary Alex
Proofreader
Safis Editing
Indexer
Mariammal Chettiyar
Graphics
Disha Haria
Production Coordinator
Nilesh Mohite
Cover Work
Nilesh Mohite
About the Author
Nitin Padalia is the technical leader at Aricent Group, where he builds highly scalable distributed applications for the telecommunications industry. He has worked in telecommunications since the beginning of his career, on protocols and technologies such as SMPP, RTP, SIP, and VoIP, and on applications designed to scale out with the highest possible performance. He has experience developing applications for bare-metal hardware, virtualized environments, and the cloud, using various languages and technologies.
I would like to thank all the reviewers of this book; their comments helped me to present data effectively.
Meeta Rajani, for setting things up and providing input during the initial phase of the book.
Anish Sukumaran, for helping me through his comments and input till the completion of this book.
Chaoran Yu, for good suggestions regarding presenting data and examples in a way that could be more helpful from the readers' perspective.
Ranjit, for his input throughout the book.
I also would like to thank my family—my mother, father, wife, and kids—for letting me take some time out to write this book.
About the Reviewers
Ranjeet Kumar Jha has over 12 years (three years in the big data field) of experience in various phases of the project life cycle, including the development and design phases. He has also been part of production support for Java/JEE and big data-based applications. He is a certified enterprise architect, that is, Oracle Certified Master Enterprise Java JEE Architect, and has worked for over six years as an architect in Java JEE technologies (over three years in the big data field). He has worked in various domains such as finance, insurance, e-commerce, digital media, CMS, security, and online advertisements.
He has worked as a programmer, designer, mentor, and architect on all types of projects related to Java, especially JEE and big data. He is the reviewer of the book Real-time Analytics with Storm and Cassandra.
To find out more about him, visit his LinkedIn profile at https://www.linkedin.com/in/jharanjeet.
I would like to thank my family—my wife, Anila Jha, and two kids, Anushka Jha and Tanisha Jha, for their constant support, encouragement, and patience. Without you, I wouldn't have achieved so much! Love you all immensely.
Sonal Raj is a hacker, Pythonista, big data believer, and a technology dreamer. He has a passion for design and is an artist at heart. He blogs about technology, design, and gadgets at http://www.sonalraj.com/. When not working on projects, he can be found travelling, stargazing, or reading.
He has pursued engineering in computer science and holds a master's degree in IT. He loves to work on community projects. He has been a research fellow at IISc and has taken up projects on graph computations using Neo4j, Storm, and NoSQL databases. He has been a speaker at PyCon India and local meetups and has also published articles and research papers in leading magazines and international journals. He has contributed to several open source projects.
He is the author of Neo4j High Performance, Packt Publishing, and has reviewed titles on technologies such as Storm and Neo4j.
I am grateful to the author for patiently listening to my critiques. I'd like to thank the open source community for keeping their passions alive and contributing to such remarkable projects. A special thank you to my parents, without whom I never would have grown to love learning as much as I do.
Chaoran Yu obtained his bachelor's degree with high honors from UC Berkeley Department of Electrical Engineering and Computer Science in May 2014. He has been a software developer with the data analytics team of Ericsson MediaFirst, a leading IPTV solution, since then. The technologies that he has worked on include Apache Cassandra, Spark, and the Microsoft .NET framework. He organized service and client logging and performance data and wrote code to store them in Cassandra, which he then processed with Spark jobs to generate real-time reports for TV operators. His passion for open source technologies, especially for distributed and scalable systems, makes him an avid learner in this ever-changing technology landscape.
www.PacktPub.com
Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Free access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Preface
Traditional database management systems can become a bottleneck for modern applications, which need storage and retrieval that is highly available, scalable, and highly responsive all at once. Apache Cassandra, a highly available, massively scalable, query-driven NoSQL database, helps our applications meet these requirements. Its core features include handling large volumes of data while letting us configure responsiveness, scalability, and high availability to suit our needs.
In this book, I've provided step-by-step information starting from the basic installation to the advanced installation options and database design techniques. It gives all the information that you will need to design a well-distributed and high performance database. This book focuses on explaining core concepts with simple and easy-to-understand examples. I've also incorporated some code examples with this book. You can use these examples while working on your day-to-day tasks with Cassandra.
What this book covers
Chapter 1, Getting Your Cassandra Cluster Ready, gives an introduction to Cassandra and helps you to set up your cluster. It also introduces you to the various configuration options available to set up your cluster, which can be referred to while fine tuning the cluster.
Chapter 2, An Architectural Overview, helps you to understand the internal architecture of a Cassandra cluster. It details the strategies that Cassandra uses to distribute data among the nodes in the cluster and describes how Cassandra achieves high availability through its replication strategies.
Chapter 3, Creating Database and Schema, details the concepts used by Cassandra. We'll learn to use CQL (Cassandra Query Language), which is used by Cassandra clients to describe data models, to create our databases and tables. Also, we'll discuss various techniques provided by Cassandra that can be used based on our storage and data retrieval requirements.
Chapter 4, Read and Write – Behind the Scenes, is written to help the reader understand the core concepts of the system. We'll discuss the operations that Cassandra performs for every read and write query, along with the data structures and caches it uses. We'll also discuss the configuration options it provides for tuning the trade-off between consistency and latency. In the later parts of this chapter, we'll see how to trace a Cassandra read or write query to debug performance issues.
Chapter 5, Writing Your Cassandra Client, provides some code samples to set up your cluster, learn the core concepts of Cassandra, and create