Apache Cassandra Essentials
Ebook, 338 pages, 2 hours

About this ebook

If you are a developer who is working with Cassandra and you want to deep dive into the core concepts and understand Cassandra’s non-relational nature, then this book is for you. A basic understanding of Cassandra is expected.
Language: English
Release date: Nov 20, 2015
ISBN: 9781783989119

    Apache Cassandra Essentials - Nitin Padalia

    Table of Contents

    Apache Cassandra Essentials

    Credits

    About the Author

    About the Reviewers

    www.PacktPub.com

    Support files, eBooks, discount offers, and more

    Why subscribe?

    Free access for Packt account holders

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Errata

    Piracy

    Questions

    1. Getting Your Cassandra Cluster Ready

    Installation

    Prerequisites

    Compiling Cassandra from source and installing

    Installation from a precompiled binary

    The installation layout

    The directory layout in tarball installations

    The directory layout in package-based installation

    Configuration files

    cassandra.yaml

    Running a Cassandra server

    Running a Cassandra node

    Setting up the cluster

    Viewing the cluster status

    Summary

    2. An Architectural Overview

    Background

    Cassandra cluster overview

    The Gossip protocol

    Failure detection

    Data distribution

    Replication

    SimpleStrategy

    NetworkTopologyStrategy

    Snitches

    Virtual nodes

    Adding nodes to our cluster

    Create keyspace and column family

    Summary

    3. Creating Database and Schema

    A database and schema

    Keyspace

    Column families

    Static rows

    Wide rows

    A primary key

    Partition keys and clustering columns

    A composite partition key

    Multiple clustering columns

    Static columns

    Modifying a table

    Data types

    Counters

    Collections

    Sets

    Lists

    Map

    UDTs

    Secondary indexes

    Allowing filtering

    TTL

    Conditional querying

    Conditions on a partition key

    Conditions on a partition key and clustering columns

    Sorting query results

    Write operations

    Lightweight transactions

    Batch statements

    Summary

    4. Read and Write – Behind the Scenes

    Write operations

    CommitLog

    Anatomy of Memtable

    SSTable explained

    SSTable Compaction strategies

    Size-tiered compaction

    Leveled compaction

    DateTiered compaction

    Read operations

    Reads from row cache

    Read operations for row cache miss

    Key is in KeyCache

    Key search miss both the key cache and the row cache

    Delete operations

    Data consistency

    Read operation

    Digest reads

    Read repair

    Consistency levels

    Write operation

    Hinted handoff

    Consistency levels

    Tracing Cassandra queries

    Summary

    5. Writing Your Cassandra Client

    Connecting to a Cassandra cluster

    Driver Connection policies

    Load balancing policies

    Retry policies

    Reconnection policies

    Reading and writing to the Cassandra cluster

    QueryBuilder

    Reading and writing asynchronously

    Prepared statements

    Example REST service using prepared statement

    Batch statements

    Mapping API

    Tracing Cassandra queries using Java driver

    Summary

    6. Monitoring and Tuning a Cassandra Cluster

    Monitoring a Cassandra cluster

    Use logging for debugging

    Monitoring using command-line utilities

    nodetool cfstats

    nodetool cfhistograms

    nodetool netstats

    nodetool tpstats

    JConsole

    Third-party tools

    Tuning Cassandra nodes

    Configuring Cassandra caches

    Tuning Bloom filters

    Configuring and tuning Java

    Summary

    7. Backup and Restore

    Taking backup of a Cassandra cluster

    Manual backup

    Deleting snapshots

    Incremental backup

    Restoring data to Cassandra

    The Cassandra bulk loader

    Exporting and importing data using the Cassandra JSON utility

    Loading external data into Cassandra

    Removing nodes from Cassandra cluster

    Adding nodes to a Cassandra cluster

    Replacing dead nodes in a cluster

    Summary

    Index

    Apache Cassandra Essentials

    Copyright © 2015 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: November 2015

    Production reference: 1161115

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham B3 2PB, UK.

    ISBN 978-1-78398-910-2

    www.packtpub.com

    Credits

    Author

    Nitin Padalia

    Reviewers

    Ranjeet Kumar Jha

    Sonal Raj

    Chaoran Yu

    Commissioning Editor

    Akram Hussain

    Acquisition Editor

    Meeta Rajani

    Content Development Editor

    Aparna Mitra

    Technical Editor

    Rohan Uttam Gosavi

    Copy Editor

    Pranjali Chury

    Project Coordinator

    Mary Alex

    Proofreader

    Safis Editing

    Indexer

    Mariammal Chettiyar

    Graphics

    Disha Haria

    Production Coordinator

    Nilesh Mohite

    Cover Work

    Nilesh Mohite

    About the Author

    Nitin Padalia is the technical leader at Aricent Group, where he builds highly scalable distributed applications for the telecommunications field. He has worked in telecommunications since the beginning of his career, on protocols and technologies such as SMPP, RTP, SIP, and VoIP, and has focused throughout on developing applications that scale out while delivering the highest possible performance. He has experience developing applications for bare-metal hardware, virtualized environments, and the cloud, using various languages and technologies.

    I would like to thank all the reviewers of this book; their comments helped me to present data effectively.

    Meeta Rajani, for setting things up and providing input during the initial phase of the book.

    Anish Sukumaran, for helping me through his comments and input till the completion of this book.

    Chaoran Yu, for good suggestions regarding presenting data and examples in a way that could be more helpful from the readers' perspective.

    Ranjit, for his input throughout the book.

    I also would like to thank my family—my mother, father, wife, and kids—for letting me take some time out to write this book.

    About the Reviewers

    Ranjeet Kumar Jha has over 12 years of experience (including three years in the big data field) in various phases of the project life cycle, including design and development. He has also been part of production support for Java/JEE and big data-based applications. He is an Oracle Certified Master Enterprise Java JEE Architect and has worked for over six years as an architect in Java JEE technologies. He has worked in various domains such as finance, insurance, e-commerce, digital media, CMS, security, and online advertisements.

    He has worked as a programmer, designer, mentor, and architect on all types of projects related to Java, especially JEE and big data. He is the reviewer of the book Real-time Analytics with Storm and Cassandra.

    To find out more about him, visit his LinkedIn profile at https://www.linkedin.com/in/jharanjeet.

    I would like to thank my family—my wife, Anila Jha, and two kids, Anushka Jha and Tanisha Jha, for their constant support, encouragement, and patience. Without you, I wouldn't have achieved so much! Love you all immensely.

    Sonal Raj is a hacker, Pythonista, big data believer, and a technology dreamer. He has a passion for design and is an artist at heart. He blogs about technology, design, and gadgets at http://www.sonalraj.com/. When not working on projects, he can be found travelling, stargazing, or reading.

    He has pursued engineering in computer science and holds a master's degree in IT. He loves to work on community projects. He has been a research fellow at IISc and has taken up projects on graph computations using Neo4j, Storm, and NoSQL databases. He has been a speaker at PyCon India and local meetups and has also published articles and research papers in leading magazines and international journals. He has contributed to several open source projects.

    He is the author of Neo4j High Performance, Packt Publishing, and has reviewed titles on technologies such as Storm and Neo4j.

    I am grateful to the author for patiently listening to my critiques. I'd like to thank the open source community for keeping their passions alive and contributing to such remarkable projects. A special thank you to my parents, without whom I never would have grown to love learning as much as I do.

    Chaoran Yu obtained his bachelor's degree with high honors from UC Berkeley Department of Electrical Engineering and Computer Science in May 2014. He has been a software developer with the data analytics team of Ericsson MediaFirst, a leading IPTV solution, since then. The technologies that he has worked on include Apache Cassandra, Spark, and the Microsoft .NET framework. He organized service and client logging and performance data and wrote code to store them in Cassandra, which he then processed with Spark jobs to generate real-time reports for TV operators. His passion for open source technologies, especially for distributed and scalable systems, makes him an avid learner in this ever-changing technology landscape.

    www.PacktPub.com

    Support files, eBooks, discount offers, and more

    For support files and downloads related to your book, please visit www.PacktPub.com.

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    https://www2.packtpub.com/books/subscription/packtlib

    Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

    Why subscribe?

    Fully searchable across every book published by Packt

    Copy and paste, print, and bookmark content

    On demand and accessible via a web browser

    Free access for Packt account holders

    If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

    Preface

    Traditional database management systems can become a bottleneck for modern applications that must be highly available, scalable, and highly responsive, because they struggle to meet storage and retrieval needs while delivering all of these attributes at once. Apache Cassandra, a highly available, massively scalable, query-driven NoSQL database, helps our applications achieve these must-have attributes. Its core features include handling large volumes of data while letting us configure responsiveness, scalability, and high availability to suit our requirements.

    In this book, I've provided step-by-step information, from basic installation through advanced installation options to database design techniques. It gives you the information you need to design a well-distributed, high-performance database. The book focuses on explaining core concepts with simple, easy-to-understand examples. I've also included some code examples that you can use in your day-to-day work with Cassandra.

    What this book covers

    Chapter 1, Getting Your Cassandra Cluster Ready, gives an introduction to Cassandra and helps you to set up your cluster. It also introduces you to the various configuration options available to set up your cluster, which can be referred to while fine tuning the cluster.

    Chapter 2, An Architectural Overview, helps you to understand the internal architecture of a Cassandra cluster. It details the strategies Cassandra uses to distribute data among the nodes in the cluster and describes how Cassandra achieves high availability through its replication strategies.

    Chapter 3, Creating Database and Schema, details the concepts used by Cassandra. We'll learn to use CQL (Cassandra Query Language), which is used by Cassandra clients to describe data models, to create our databases and tables. Also, we'll discuss various techniques provided by Cassandra that can be used based on our storage and data retrieval requirements.

    Chapter 4, Read and Write – Behind the Scenes, looks inside Cassandra to help you understand its core concepts. We'll discuss the operations that Cassandra performs for every read and write query, along with the data structures and caches it uses. We'll also discuss the configuration options it provides to manage the trade-off between consistency and latency. In the later parts of this chapter, we'll see how to trace a Cassandra read or write query to debug performance issues.

    Chapter 5, Writing Your Cassandra Client, provides some code samples to set up your cluster, learn the core concepts of Cassandra, and create
