Mastering Elastic Stack
By Gupta Ravi Kumar and Gupta Yuvraj
About this ebook
- Your one-stop solution to perform advanced analytics with Elasticsearch, Logstash, and Kibana
- Learn how to make better sense of your data by searching, analyzing, and logging data in a systematic way
- This highly practical guide takes you through an advanced implementation of the ELK stack in your enterprise environment
This book caters to developers who use the Elastic stack in their day-to-day work, are familiar with the basics of Elasticsearch, Logstash, and Kibana, and now want to become experts at using the Elastic stack for data analytics.
Table of Contents
Mastering Elastic Stack
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Elastic Stack Overview
Introduction to ELK Stack
Logstash
Elasticsearch
Kibana
The birth of Elastic Stack
Beat
Who uses Elastic Stack?
Salesforce
CERN
Green Man Gaming
Stack competitors
Setting up Elastic Stack
Installation of Java
Installation of Java on Ubuntu 14.04
Installation of Java on Windows
Installation of Elasticsearch
Installation of Elasticsearch on Ubuntu 14.04
Installation of Elasticsearch on Windows
Installation of Elasticsearch as a service
Installation of Kibana
Installation of Kibana on Ubuntu 14.04
Installation of Kibana on Windows
Installation of Logstash
Installation of Logstash on Ubuntu 14.04
Installation of Logstash on Windows
Installation of Filebeat
Installation of Filebeat on Ubuntu 14.04
Installation of Filebeat on Windows
X-Pack
Summary
2. Stepping into Elasticsearch
The beginning of Elasticsearch
Key features
Understanding the architecture
Recommended cluster configurations
Minimum master nodes
Local cluster settings
Understanding document processing
Elasticsearch APIs
Document APIs
Single document APIs
Index API
Get API
Delete API
Update API
Multi-document APIs
Multi-get API
Bulk API
Search APIs
Search API
Query parameters
Search shard API
Multi-search APIs
Count API
Validate API
Explain API
Profile API
Field stat API
Indices APIs
Managing indices
Creating an index
Checking if an index exists
Getting index information
Managing index settings
Getting index stats
Getting index segments
Getting index recovery information
Getting shard stores information
Index aliases
Mappings
Closing, opening, and deleting an index
Other operations
Cat APIs
Cluster APIs
Query DSL
Aggregations
Bucket
Metrics aggregations
Avg aggregation
Min aggregation
Max aggregation
Percentiles aggregation
Sum aggregation
Value count aggregation
Cardinality aggregation
Stats aggregation
Extended stats aggregation
A note for painless scripting
Summary
3. Exploring Logstash and Its Plugins
Introduction to Logstash
Why do we need Logstash?
Features of Logstash
Logstash Plugin Architecture
Logstash Configuration File Structure
Value types
Array
Boolean
Bytes
Codec
Comments
Hash
Number
String
Use of Conditionals
Types of Plugins
Input plugins
Filter plugins
Output plugins
Codec plugins
Exploring Input Plugins
stdin
file
path
udp
Exploring Filter Plugins
grok
mutate
csv
Exploring Output Plugins
stdout
file
elasticsearch
Exploring Codec Plugins
rubydebug
json
avro
multiline
Plugins Command-Line Options
Listing of Plugins
Installing a plugin
Removing a plugin
Updating a plugin
Packing a plugin
Unpacking a plugin
Logstash command-line options
Logstash Tips and Tricks
Referencing fields and their values
Adding custom-created grok patterns
Logstash does not show any output
When an input file has already been completely read
When an input file is not modified since 1 day
Logstash Configuration for Parsing Logs
Sample Catalina logs
Sample Tomcat logs
Grok pattern for Catalina logs
Grok pattern for Tomcat logs
Logstash configuration file
Monitoring APIs
Node info API
OS Info
JVM info
Pipeline Info
Plugins Info API
Node stats API
JVM stats
Process stats
Pipeline stats
Hot threads API
Threads
Human
Ignore idle threads
Summary
4. Kibana Interface
Kibana and its offerings
Kibana interface
Exploring the discover interface
Time Filter
Quick time filter
Relative time filter
Absolute time filter
Auto-refresh
Querying and Searching data
Full-text searches
Range searches
Boolean searches
Proximity search
Wildcard searches
Regular expressions search
Grouping
Fields and filters
Filtering the field
Functionalities of filters
Discovery page options
Exploring the visualize interface
Understanding aggregations
Bucket aggregations
Metric aggregations
Visualization Canvas
Area chart
Data table
Line chart
Bubble chart
Markdown widget
Metric
Pie chart
Tag clouds
Tile map
Time series
Vertical bar chart
Exploring the Dashboard interface
Understanding Timelion
Exploring Dev Tools
Exploring the Management interface
Index patterns
Saved objects
Advanced Settings
Status
Putting it all together
Input data
Creating a Logstash configuration file
Using Kibana
Top states based on 2003 RUCC
Top states based on 2003 UIC
Top five area names with less than high school diploma 1970
Top five area names with high school diploma 1970
Percentage of adults having less than high school diploma in 1970 by area and state
Top states as per their count and their top 2013 RUCC
Insights
Creating a dashboard in Kibana
Summary
5. Using Beats
Introduction to Beats
How Beats differ from Logstash
How Beats fits into Elastic Stack
An overview of the different types of Beats
Beats by Elastic Team
Packetbeat
Metricbeat
Filebeat
Winlogbeat
Libbeat
Beats by community
Dockbeat
Lmsensorbeat
Exploring Elastic Team Beats
Understanding Filebeat
Filebeat Prospectors Configuration
Processors configuration
Defining a processor
Output Configuration
Elasticsearch Output Configuration
Logstash Output Configuration
Logging Configuration
Understanding Metricbeat
System Module
CPU metricset
Disk I/O metricset
Filesystem metricset
FsStat metricset
Load metricset
Memory metricset
Network metricset
Process Metricset
Installation of Metricbeat
Installation of Metricbeat on Ubuntu 14.04
Understanding Packetbeat
Installation of Packetbeat
Installation of Packetbeat on Ubuntu 14.04
Exploring Community Beats
Understanding Elasticbeat
Installation of Elasticbeat
Installation of Elasticbeat on Ubuntu 14.04
Elasticbeat configuration
Beats in action with Elastic Stack
Exploring Metricbeat with Logstash and Kibana
Step 1-Configuring Metricbeat to send data to Logstash
Step 2-Creating a Logstash configuration file
Step 3-Downloading and loading the sample Beats dashboard
Step 4-Viewing the sample Beats dashboard
Exploring Elasticbeat with Elasticsearch and Kibana
Step 1-Configuring Elasticbeat to send data to Elasticsearch
Step 2-Downloading and loading the Elasticbeat dashboard
Step 3-Viewing the sample Beats dashboard
Summary
6. Elastic Stack in Action
Understanding problem scenario
Understanding the architecture
Preparing Elastic Stack pipeline
What to capture?
Updated architecture
Configuring Elastic Stack components
Setting up Elasticsearch
Setting up agents/Beats
Packetbeat
Metricbeat
Filebeat
Setting up Logstash
grok for nginx logs
grok for Liferay logs
grok for OpenDJ logs
Config File
Setting up Kibana
Setting up Kibana Dashboards
PacketBeat
MetricBeat
Checking DB (MySQL) Performance
Analyzing CPU usage
Keeping an eye on memory
Checking logs
Finding most visited pages
Visitors' map
Number of visitors in a time frame
Request Types
Error type-log levels
Top referrers
Top agents
Alerting using Logstash e-mail capability
Using a message broker
Summary
7. Customizing Elastic Stack
Extending Elasticsearch
Elasticsearch development environment
Anatomy of an Elasticsearch Java plugin
Building the plugin
Extending Logstash
Generating a plugin
Anatomy of the plugin
weather.rb file
Plugin logic implementation
Reading data from API end point
Preparing an event
Publish the event
Building and installing a plugin
Testing our plugin
Extending Beats
libbeat framework
Creating a beat
Anatomy of a Beat
Beat configuration
weatherbeat.go file
Implementing beat logic
Adding the Configuration
Reading data from API
Parsing the data
Preparing an event
Publishing the event
Running the beat
Extending Kibana
Setting up Kibana development environment
Generating the plugin
Anatomy of a plugin
Summary
8. Elasticsearch APIs
The cluster APIs
Cluster health
Cluster State
Cluster stats
Pending tasks
Cluster reroute
Cluster update settings
Node stats
Nodes info API
Task Management API
The cat APIs
Elasticsearch modules
Cluster module
Discovery module
Gateway module
HTTP module
Indices module
Network module
Node client
Plugins module
Scripting
Snapshot/restore module
Thread pools
Transport module
Tribe nodes module
Ingest nodes
Elasticsearch clients
Supported clients
Community contributed clients
Java API
Connecting to a Cluster
Admin tasks
Managing indices
Creating an index
Getting index settings
Updating index settings
Refreshing an index
Managing clusters
Getting cluster tasks
Getting cluster health
Index-level tasks
Managing documents
Indexing a document
Getting a document
Deleting a document
Updating a document
Query DSL and search API
Aggregations
Elasticsearch plugins
Discovery plugins
Ingest plugins
Elasticsearch SQL
Summary
9. X-Pack: Security and Monitoring
Introduction to X-Pack
Installation of X-Pack
Installing X-Pack in Elasticsearch
Installing X-Pack in Kibana
Installing X-Pack on offline systems
Uninstalling X-Pack
Security
Listing of all users in security
Listing of roles in security
Understanding roles in security
Understanding Cluster Privileges
Understanding Run As privileges
Understanding Indices privileges
Decoding default user roles
kibana_user
superuser
transport_client
Adding a role in security
Updating a role in security
Understanding Field Level Security
Adding a user in security
Updating user details in security
Changing the password of a user in security
Deleting a role in security
Deleting a user in security
Viewing X-Pack information
Enabling and disabling of X-Pack features
Monitoring
Exploring monitoring statistics for Elasticsearch
Discovering the Overview tab
Discovering the Indices tab
Discovering the Nodes tab
Exploring monitoring statistics for Kibana
Understanding Profiler
Summary
10. X-Pack: Alerting, Graph, and Reporting
Alerting and notification
Working of watcher
Trigger
Schedule trigger
Input
Simple input
Search input
HTTP input
Chain input
Conditions
Always condition
Never condition
Compare condition
Array compare condition
Script condition
Transforms
Search transform
Script transform
Chain transform
Actions
Throttling
Graph
Working of Graph
Graph UI
Reporting
Summary
11. Best Practices
Why do we require best practices?
Understanding your use case
Managing configuration files
Elasticsearch - elasticsearch.yml
Kibana - kibana.yml
Choosing the right set of hardware
Memory
Java heap size
Swapping memory
Disks
Sizing disk space
I/O
CPU
Network
Searching and indexing performance
Filter cache
Fielddata size
Indexing buffer
Sizing the Elasticsearch cluster
Choosing the right kind of node
Master and data node
Master node
Data node
Ingest node
No master, no data, and no ingest node
Determining the number of nodes
Determining the number of shards
Reducing disk space
Logstash configuration file
Categorizing multiple sources of data
Using conditionals
Using custom grok patterns
Simplifying _grokparsefailure
Mapping of fields
Dynamic templating
Testing configuration
Re-indexing data
Using aliases
Summary
12. Case Study-Meetup
Understanding meetup scenario
Setting things up
A bit of Meetup API understanding
Setting up Elasticsearch
Preparing Logstash
Setting up Kibana
Analyzing data using Kibana
Filtering Content
Number of Meetups by Country
Top 10 meetup cities in world
Meetups trends by duration
Meetups by RSVP Counts
Number of Groups by country
Number of Groups by join mode
Popular Categories
Popular Topics
Meetup Venue Map
Meetups on Map
Just the number of things
Getting Notified
Summary
Mastering Elastic Stack
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: February 2017
Production reference: 1240217
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78646-001-1
www.packtpub.com
Credits
About the Authors
Ravi Kumar Gupta is an author, reviewer, and open source software evangelist. He pursued an MS degree in software systems at BITS Pilani and a B.Tech at LNMIIT, Jaipur. His technological forte is portal management and development.
He is currently working with Azilen Technologies, where he acts as a Technical Architect and Project Manager. His previous assignment was as a lead consultant with CIGNEX Datamatics. He was a core member of the open source group at TCS, where he started working on Liferay and other UI technologies. During his career, he has been involved in building enterprise solutions using the latest technologies with rich user interfaces and open source tools.
He loves to spend time writing, learning, and discussing new technologies. His interest in search engines, together with a small crawler project during his college days, made him a technology lover. He is one of the authors of Test-Driven JavaScript Development, Packt Publishing. He is an active member of the Liferay forum. He also writes technical articles for his blog at TechD of Computer World (http://techdc.blogspot.in).
He has been a Liferay trainer at TCS and CIGNEX, where he has provided training on Liferay 5.x and 6.x versions. He was also a reviewer for Learning Bootstrap, Packt Publishing.
He can be reached on Skype at kravigupta, on Twitter at @kravigupta, and on LinkedIn at https://in.linkedin.com/in/kravigupta.
Seven blessings and my gratitude to my wife, Kriti. Despite tough times, she motivated me throughout the writing period. Support from my wife and my family, especially my father and mother-in-law, helped me a lot. I can’t forget my co-author, Yuvraj, for his excellent support and understanding. He has been a great friend and help. Without him, it would not have been possible to finish. I would also like to thank the Packt team, reviewers, and editorial team for their cooperation. I truly appreciate you guys. Thank you.
Yuvraj Gupta is an author and a keen technologist with an interest in Big Data, Data Analytics, Data Visualization, and Cloud Computing. He has been working as a Big Data Consultant, primarily in the domain of Big Data Testing. He loves to spend time writing on various social platforms. He is an avid gadget lover, a foodie, and a sports enthusiast, and loves to watch TV series and movies. He always keeps himself updated with the latest happenings in technology. He has authored a book titled Kibana Essentials with Packt Publishing. He can be reached at [email protected] or on LinkedIn at www.linkedin.com/in/guptayuvraj.
I would like to thank my family and friends for encouraging and motivating me to write the book. I would like to thank the reviewers and the whole team at PacktPub who were involved in producing this book; without their support, it would never have been possible. I would like to thank everyone else who helped me directly or indirectly in writing this book. I would also like to thank my teachers, professors, Gurus, schools, and university for playing an important part in providing me with the education that has helped me gain knowledge. Last but not least, I would like to thank my co-author Ravi, without whose help, guidance, and support the book would never have been completed.
About the Reviewer
Marcelo Ochoa works at the system laboratory of Facultad de Ciencias Exactas of the Universidad Nacional del Centro de la Provincia de Buenos Aires and is the CTO at Scotas.com, a company that specializes in near real-time search solutions using Apache Solr and Oracle. He divides his time between university jobs and external projects related to Oracle and big data technologies. He has worked on several Oracle-related projects, such as the translation of Oracle manuals and multimedia CBTs. His background is in database, network, web, and Java technologies. In the XML world, he is known as the developer of the DB Generator for the Apache Cocoon project. He has worked on the open source projects DBPrism and DBPrism CMS, the Lucene-Oracle integration using the Oracle JVM Directory implementation, and the https://restlet.com/ project, where he worked on the Oracle XDB Restlet Adapter, which is an alternative to writing native REST web services inside a database-resident JVM. Since 2006, he has been part of the Oracle ACE program. Oracle ACEs are known for their strong credentials as Oracle community enthusiasts and advocates, with candidates nominated by ACEs in the Oracle technology and applications communities. He has coauthored Oracle Database Programming using Java and Web Services by Digital Press and Professional XML Databases by Wrox Press, and has been a technical reviewer for several Packt books, such as Apache Solr 4 Cookbook, ElasticSearch Server, and others.
www.PacktPub.com
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Customer Feedback
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1786460017.
If you'd like to join our team of regular reviewers, you can e-mail us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Preface
Even structured data is useless if it can’t help you make strategic decisions and improve existing systems. If you love to play with data, or your job requires you to process custom log formats, design a scalable analysis system, and manage logs to do real-time data analysis, this book is your one-stop solution. By combining the massively popular Elasticsearch, Logstash, Beats, and Kibana, ELK Stack has evolved into Elastic Stack, which delivers actionable insights in near real time from almost any type of structured or unstructured data.
This book brushes up your basic knowledge of implementing the Elastic Stack and then dives deeper into complex and advanced scenarios. We’ll help you with data analytics challenges and take you through a practical scenario of an intranet portal to understand how the Elastic Stack components are used. You will be able to grasp advanced techniques for log analysis and visualization. Newly announced features such as Beats and X-Pack are also covered in detail with examples.
Toward the end, you will see how to use the Elastic stack for real-world case studies and we’ll show you some best practices and troubleshooting techniques for the Elastic Stack.
What this book covers
Chapter 1, Elastic Stack Overview, covers the shift from ELK Stack to Elastic Stack, followed by the setup of various components of Elastic Stack.
Chapter 2, Stepping into Elasticsearch, explains how Elasticsearch started as a project and how it works, and covers various Elasticsearch APIs and aggregations.
Chapter 3, Exploring Logstash and Its Plugins, introduces Logstash and its architecture. It also covers various plugins with suitable examples. At the end, a Logstash configuration file for parsing logs is shown.
Chapter 4, Kibana Interface, teaches the various interfaces present in Kibana in depth, along with an example that demonstrates how to combine all the interfaces to create a dashboard.
Chapter 5, Using Beats, introduces Beats and explains how Beats differ from Logstash, followed by an exploration of various Beats, their functionalities, and their setup steps. At the end, it explores how to use Beats in Elastic Stack.
Chapter 6, Elastic Stack in Action, covers a real-world use case of an intranet portal server and shows how to use Elastic Stack components to solve the problem.
Chapter 7, Customizing Elastic Stack, teaches us how to extend each component of Elastic Stack and how to create a plugin for our use cases.
Chapter 8, Elasticsearch APIs, covers various Elasticsearch APIs, Elasticsearch modules, ingest nodes, and discovery plugins, and shows how to use the Java client to perform various Elasticsearch operations.
Chapter 9, X-Pack: Security and Monitoring, introduces X-Pack and covers its installation. It also covers the usage and functionalities provided by Shield, Marvel, and Profiler.
Chapter 10, X-Pack: Alerting, Graph, and Reporting, teaches us about the usage and functionalities of the Watcher, Graph, and Reporting features.
Chapter 11, Best Practices, explains why we need to follow best practices and then lists various best practices, grouped into multiple subsections.
Chapter 12, Case Study-Meetup, starts with an understanding of the problem statement, followed by extending Logstash and creating a plugin to fetch the required information. It then shows how to utilize Elastic Stack components for an end-to-end analysis of Meetup data, showcasing the powerful capabilities of Elastic Stack for data analytics.
What you need for this book
The following table lists all the software and tools required to run the examples in this book. Where required, download links for the software are also provided within the relevant chapter.
Who this book is for
If you have heard of the ELK stack and want to learn more about its latest developments and how it became Elastic Stack, this book is for you. If you use analytics or like to play with visualizations on your data, this book will help you understand how the components of the stack can help you.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: The next lines of code read the link and assign it to the BeautifulSoup function.
A block of code is set as follows:
# Import packages into the project
from bs4 import BeautifulSoup
from urllib.request import urlopen
import pandas as pd
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
<meta charset="utf-8">
<meta name="viewport" content="width=device-width">
Any command-line input or output is written as follows:
C:\Python34\Scripts> pip install --upgrade pip
C:\Python34\Scripts> pip install pandas
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: In order to download new modules, we will go to Files | Settings | Project Name | Project Interpreter.
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can download the code files by following these steps:
1. Log in or register on our website using your e-mail address and password.
2. Hover the mouse pointer over the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box.
5. Select the book for which you're looking to download the code files.
6. Choose from the drop-down menu where you purchased this book from.
7. Click on Code Download.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-Elastic-Stack. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books-maybe a mistake in the text or the code-we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at [email protected] with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
Questions
If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.
Chapter 1. Elastic Stack Overview
It is easy to read a log file of a few megabytes, or even a few hundred, and just as easy to keep data of this size in databases or files and still make sense of it. But then a day comes when this data grows to terabytes or petabytes, and it keeps growing even faster. As data demands increase, ordinary text editors and word processors can no longer cope and cannot even open such large datasets. Yet the raw data still needs to be analyzed to discover insights. You start looking for a tool for large-scale log management, something that can index the data properly and make sense of it. If you Google this, you will stumble upon ELK Stack: Elasticsearch manages your data, Logstash reads the data from different sources, and Kibana visualizes it.
Recently, ELK Stack has evolved into Elastic Stack. In this chapter, we will learn more about it and set it up. The following topics will be covered in this chapter:
Introduction to ELK Stack
The birth of Elastic Stack
Who uses the Stack
Stack competitors
Setting up Elastic Stack
X-Pack
Introduction to ELK Stack
It all began with Shay Banon, who started an open source project called Elasticsearch, the successor of Compass, which gained popularity as one of the top open source database engines. Later, building on its distributed model, Kibana was introduced to visualize the data stored in Elasticsearch. Before that, data was put into Elasticsearch using Rivers, each of which provided a specific input through which data was inserted into Elasticsearch.
However, as Elasticsearch grew in popularity, this setup needed a tool that could insert data into Elasticsearch with the flexibility to perform various transformations on it (to turn unstructured data into structured data and to have full control over how the data is processed). Logstash was born out of this need and was incorporated into the Stack, and together the three tools, Elasticsearch, Logstash, and Kibana, were named ELK Stack.
The following diagram shows a simple data pipeline using ELK Stack:
As we can see from the preceding figure, data is read using Logstash and indexed into Elasticsearch. Later, we can use Kibana to read the indices from Elasticsearch and visualize them using charts and lists. Let's look at these components separately and at the role each plays in the Stack.
Logstash
As mentioned earlier, before ELK Stack, Rivers were used to put data into Elasticsearch. In ELK Stack, Logstash is the entry point for all types of data. Logstash has many input plugins to read data from a number of sources, and many output plugins to send data to a variety of destinations; one of these is the Elasticsearch output plugin, which sends data to Elasticsearch.
After Logstash became popular, Rivers were eventually deprecated, as they made the cluster unstable and caused performance issues.
Logstash does not just ship data from one end to another; it helps us collect raw data and modify or filter it into something meaningful, formatted, and organized. The updated data is then sent to Elasticsearch. If no plugin is available to read data from a specific source, write the data to a particular location, or modify it in your own way, Logstash is flexible enough to let you write your own plugins.
Simply put, Logstash is open source, highly flexible, and rich in plugins, and it can read your data from the location of your choice. It normalizes the data according to the configuration you define and sends it to the destination your requirements call for.
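The input, filter, and output flow described above can be illustrated with a small Python sketch. This is not Logstash code and does not use any Logstash API. In practice these stages are declared in a Logstash configuration file, and parsing of this kind is commonly done with the grok filter. The log format, the regular expression, and the `parse_failure` tag are hypothetical choices made purely for illustration.

```python
import json
import re
from typing import Iterable, Iterator

# Hypothetical pattern for an Apache-style access log line. In Logstash itself,
# this kind of parsing is normally configured with a grok filter instead.
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def read_input(lines: Iterable[str]) -> Iterator[str]:
    """Input stage: yield raw, non-empty lines from any iterable source."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def apply_filter(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Filter stage: turn unstructured text into structured events."""
    for raw in raw_lines:
        match = LOG_PATTERN.match(raw)
        if match is None:
            # Keep unparsable lines, tagged with a hypothetical tag name, instead of dropping them.
            yield {"message": raw, "tags": ["parse_failure"]}
            continue
        event = match.groupdict()
        event["status"] = int(event["status"])
        event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
        event["message"] = raw
        yield event


def write_output(events: Iterable[dict]) -> list:
    """Output stage: serialize each event as JSON, standing in for a real destination."""
    return [json.dumps(event, sort_keys=True) for event in events]


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - - [10/Oct/2016:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
        "not a log line",
    ]
    for doc in write_output(apply_filter(read_input(sample))):
        print(doc)
```

Keeping each stage as a separate generator mirrors the idea that inputs, filters, and outputs are independent, swappable pieces of a pipeline.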
We will be learning more about Logstash in Chapter 3, Exploring Logstash and Its Plugins and Chapter 7, Customizing Elastic Stack.
Elasticsearch
All of the data read by Logstash is sent to Elasticsearch for indexing. Elasticsearch does more than index data: it is a highly scalable, distributed full-text search engine, and it offers much more besides. Elasticsearch manages and maintains your data in the form of indices and lets you query, access, and aggregate the data using its APIs. Elasticsearch is built on Lucene and thus provides you with all of the features that Lucene does.
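At the heart of Lucene's full-text search is an inverted index, which maps each term to the documents that contain it. The following Python sketch is a deliberately simplified model of that idea. It assumes a trivial lowercase-and-split analyzer. Real Lucene indices also record term frequencies and positions, use configurable analyzers, and rank results by relevance, and none of that is modeled here.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Very simplified analyzer: lowercase, then split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class InvertedIndex:
    """Toy inverted index: term -> set of IDs of documents containing that term."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id: int, text: str) -> None:
        """Store the document and record each of its terms in the postings."""
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set:
        """Return IDs of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get() avoids creating empty postings for unknown terms.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    index = InvertedIndex()
    index.add(1, "Logstash reads data from many sources")
    index.add(2, "Kibana visualizes data stored in Elasticsearch")
    index.add(3, "Elasticsearch indexes data")
    print(sorted(index.search("data Elasticsearch")))
```

Because lookups go from term to documents rather than scanning every document, a search only touches the postings for the query terms. This is why full-text search scales to large datasets.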
We will be learning more about Elasticsearch in Chapter 2, Stepping into Elasticsearch, Chapter 7, Customizing Elastic Stack, and Chapter 8, Elasticsearch APIs.
Kibana
Kibana uses Elasticsearch APIs to read and query data from Elasticsearch indices, so that you can visualize and analyze it in the form of charts, graphs, and tables. Kibana is a web application with a highly configurable user interface that lets you query the data, create a number of charts to visualize it, and make real sense of the stored data.
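Kibana visualizations are driven by Elasticsearch search and aggregation requests. The Python sketch below shows roughly what such a request can look like for a bar chart of the most frequent HTTP status codes. It builds a Query DSL body containing a `range` query and a `terms` aggregation, addressed to an index's `_search` endpoint. The index name `weblogs`, the `status` and `@timestamp` fields, and the cluster address are assumptions. This is not the exact request Kibana generates.

```python
import json
import urllib.request


def build_status_chart_query(field: str = "status", top_n: int = 5) -> dict:
    """Query DSL body for a 'top N values' bar chart over the last 24 hours."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the matching documents
        "query": {"range": {"@timestamp": {"gte": "now-24h"}}},  # assumed timestamp field
        "aggs": {
            "by_status": {"terms": {"field": field, "size": top_n}},
        },
    }


def make_search_request(base_url: str, index: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the index's _search endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def buckets_to_rows(response: dict) -> list:
    """Flatten a terms-aggregation response into (label, count) rows for a chart."""
    buckets = response["aggregations"]["by_status"]["buckets"]
    return [(str(b["key"]), b["doc_count"]) for b in buckets]
```

Against a running cluster (assumed here to be at `http://localhost:9200`), passing the prepared request to `urllib.request.urlopen` would send it. The JSON response could then be passed to `buckets_to_rows` to get one bar per bucket.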
We will be learning more about Kibana in Chapter 4, Kibana Interface and Chapter 7, Customizing Elastic Stack.
Once ELK Stack had become robust, a few important and complex demands arose over time, such as authentication, security, and notifications. These demands led to the development of several other tools as the requirements arose: Watcher (alerts and notifications based on changes in data), Shield (authentication and authorization for securing clusters), Marvel (monitoring statistics of the cluster), ES-Hadoop, Curator, and Graph.
The birth of Elastic Stack
All the work of reading data was once done by Logstash, but that is resource-intensive: since Logstash runs on the JVM, it consumes a good amount of memory. The community realized that the pipelining process needed to become lighter and more resource-friendly. In 2015, Packetbeat was born, a network packet analyzer that could read data from different protocols, parse it, and ship it to Elasticsearch. Its lightweight nature did the trick, and a new concept, Beats, was formed. Beats are written in the Go programming language. As the project evolved, ELK Stack was no longer just Elasticsearch, Logstash, and Kibana; Beats also became a significant component.
The pipeline now looked as follows:
Beat
A Beat reads data, parses it, and can ship it to either Elasticsearch or Logstash. Unlike Logstash, Beats are lightweight, serve a specific purpose, and are installed as agents. Several Beats, such as Metricbeat, Filebeat, and Packetbeat, are provided and supported by the Elastic team, and a good number of others have already been written by the community. If you have a specific requirement, you can write your own Beat using the libbeat library.
In simple terms, Beats are very lightweight agents that ship data to either Logstash or Elasticsearch, and the libbeat library gives you the infrastructure to create your own Beats.
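Shippers that send events straight to Elasticsearch typically batch them rather than indexing one document per request. The Python sketch below illustrates that idea with Elasticsearch's `_bulk` endpoint. That endpoint accepts newline-delimited JSON, in which each action line is followed by the document itself. Real Beats are written in Go on top of libbeat, so this is only a language-neutral illustration and not libbeat's API. The index name and batch size are hypothetical. The optional `_type` is relevant only to older clusters, such as the 5.x line, that still used mapping types.

```python
import json
from typing import Iterable, Iterator, List, Optional


def batches(events: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group events into lists of at most batch_size items."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


def bulk_body(index: str, events: Iterable[dict], doc_type: Optional[str] = None) -> str:
    """Build a newline-delimited JSON payload for the _bulk endpoint."""
    lines = []
    for event in events:
        meta = {"_index": index}
        if doc_type is not None:
            meta["_type"] = doc_type  # only for older clusters that use mapping types
        lines.append(json.dumps({"index": meta}))  # action/metadata line
        lines.append(json.dumps(event))            # document source line
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline
```

A shipper built this way would POST each `bulk_body(...)` result to `/_bulk` with the `application/x-ndjson` content type. Sending fewer, larger requests keeps per-event overhead low, which fits the lightweight-agent goal described above.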
We will be learning more about Beats in Chapter 5, Using Beats and Chapter 7, Customizing Elastic Stack.
Together, Elasticsearch, Logstash, Kibana, and Beats became Elastic Stack, formerly known as ELK Stack. Elastic Stack did not just add Beats to the team; all of its components now share the same version number. The first version of Elastic Stack is 5.0.0, and this version applies to all of the components.
This versioning and release method applies not only to Elastic Stack but to the other tools of the Elastic family as well. With so many tools, unification was a problem: each tool had its own version number, and not every version was compatible with the others. To solve this, all of the tools will now be built, tested, and released together.
All of these components play a