Observability Fundamentals
Topics
• Observability with the Elastic Stack
• Logs
• Metrics
• APM
Lesson 1
Observability with the Elastic Stack
Observability
• It is not a technology
• It is an attribute of a system
‒ like high availability, stability and usability
• It helps detect undesirable behaviors
‒ e.g. errors, service downtime and slow responses
• It provides granular information to debug production issues quickly and efficiently
‒ e.g. application traces, event logs and resource information
Observability
[Figure: the Elastic Stack, with Elasticsearch and Kibana]
Index patterns
[Figure: Kibana index patterns such as heartbeat-* and my_index select indices stored in Elasticsearch]
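An index pattern is essentially a wildcard over index names. The sketch below assumes `*` is the only wildcard and uses made-up index names to show how `heartbeat-*` selects several indices while `my_index` matches only itself. The helper names are hypothetical; this is an illustration, not Kibana's implementation.

```javascript
// Convert a simple index pattern, where `*` is the only wildcard,
// into an anchored regular expression.
function patternToRegExp(pattern) {
  const source = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, ".*"); // `*` matches any run of characters
  return new RegExp(`^${source}$`);
}

// Return the index names that a pattern selects.
function matchIndices(pattern, indices) {
  const re = patternToRegExp(pattern);
  return indices.filter((name) => re.test(name));
}

// Hypothetical index names, for illustration only.
const indices = [
  "heartbeat-7.3.0-2019.09.01",
  "heartbeat-7.3.0-2019.09.02",
  "filebeat-7.3.0-2019.09.01",
  "my_index",
];

console.log(matchIndices("heartbeat-*", indices));
console.log(matchIndices("my_index", indices));
```

Escaping the other metacharacters matters because index names often contain dots, which would otherwise match any character.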
Lesson 1
Review - Observability with the Elastic Stack
Summary
• Observability is a search use case
• Observability helps detect undesirable behaviors and provides granular information to debug production issues quickly and efficiently
• The three main pillars of observability are logs, metrics and application traces; uptime and machine learning are additional pillars often present in observable systems
• The Elastic Stack provides a unified implementation of observability
• Elastic Common Schema (ECS) is a standard defined by Elastic to make sure that data collected from different sources can be correlated
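To illustrate why a shared schema matters, the sketch below builds two documents from different sources using a few real ECS field names (`@timestamp`, `message`, `log.level`, `host.name`, `service.name`, `event.dataset`) plus the Metricbeat field `system.cpu.total.pct`, and then correlates them on `host.name`. The helper functions and sample values are hypothetical; real Beats documents carry many more fields.

```javascript
// Hypothetical log event shaped with ECS field names (nested form).
const logEvent = {
  "@timestamp": "2018-09-07T10:10:00.000Z",
  message: "GET /download 503",
  log: { level: "error" },
  host: { name: "web-01" },
  service: { name: "download-service" },
  event: { dataset: "nginx.access" },
};

// Hypothetical metric document from a different source, same ECS fields.
const metricEvent = {
  "@timestamp": "2018-09-07T10:10:00.000Z",
  host: { name: "web-01" },
  event: { dataset: "system.cpu" },
  system: { cpu: { total: { pct: 0.97 } } },
};

// Read a dotted ECS field (e.g. "host.name") from a nested document.
function getField(doc, path) {
  if (path in doc) return doc[path]; // literal keys such as "@timestamp"
  return path
    .split(".")
    .reduce((obj, key) => (obj == null ? undefined : obj[key]), doc);
}

// Group documents from any source by the value of a shared ECS field.
function correlate(docs, field) {
  const groups = new Map();
  for (const doc of docs) {
    const key = getField(doc, field);
    if (key === undefined) continue; // document lacks the field
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc);
  }
  return groups;
}

const byHost = correlate([logEvent, metricEvent], "host.name");
console.log([...byHost.keys()]);
console.log(byHost.get("web-01").map((d) => getField(d, "event.dataset")));
```

Because both sources agree on `host.name`, the error log and the high CPU reading line up without any source-specific mapping.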
Lesson 1
Lab - Observability with the Elastic Stack
Lab Environment
• Visit Strigo using the link that was shared with you, and log in if you haven't already done so
• Click on "My Lab" on the left
Lesson 2
Logs
Business Questions
• ???
• When should we schedule download service maintenance?
• How many webinar signups did we get from Europe?
- Jordan Sissel
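Logs can answer a question such as when to schedule maintenance. The sketch below assumes a hypothetical, synthetic day of access-log timestamps (ISO 8601, UTC) and picks the hour of day with the fewest requests as a candidate maintenance window. The data and the `quietestHour` helper are made up for illustration; in practice an equivalent aggregation would run in Elasticsearch.

```javascript
// Count log events per UTC hour of day and return the quietest hour.
function quietestHour(timestamps) {
  const counts = new Array(24).fill(0);
  for (const ts of timestamps) {
    counts[new Date(ts).getUTCHours()] += 1;
  }
  let best = 0;
  for (let hour = 1; hour < 24; hour++) {
    if (counts[hour] < counts[best]) best = hour; // first minimum wins ties
  }
  return { hour: best, count: counts[best] };
}

// Synthetic access-log timestamps (hypothetical data): every hour gets
// 5 to 11 requests, except 03:00 UTC which gets a single request.
const accessLog = [];
for (let hour = 0; hour < 24; hour++) {
  const requests = hour === 3 ? 1 : 5 + (hour % 7);
  const hh = String(hour).padStart(2, "0");
  for (let i = 0; i < requests; i++) {
    const mm = String(i).padStart(2, "0");
    accessLog.push(`2019-09-01T${hh}:${mm}:00Z`);
  }
}

console.log(quietestHour(accessLog));
```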
What is a log?
• logs are records of activities
‒ by a system
‒ by an application
‒ by a device
‒ by a human
‒ …
• timestamp + data
Log Challenges
• Time Formats
‒ e.g. "Oct 11 20:21:47", "020805 13:51:24"
• Decentralized
‒ logs are spread across all of your servers
‒ SSH + grep aren’t scalable
• Experts Required
‒ limited access to log files on servers
‒ limited knowledge of the log format
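The two time formats above show why logs are hard to combine: one has no year and neither has a time zone. The sketch below normalizes both to ISO 8601 under stated assumptions: UTC for both, a caller-supplied year for the syslog-style stamp, and a YYMMDD reading (with 20xx years) for "020805 13:51:24". The `normalizeTimestamp` helper is hypothetical.

```javascript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Normalize a raw log timestamp to ISO 8601 (UTC assumed, since neither
// format carries a time zone). The syslog style has no year, so the
// caller must supply `defaultYear`.
function normalizeTimestamp(raw, defaultYear) {
  // Syslog style, e.g. "Oct 11 20:21:47" (day may be space-padded).
  let m = /^([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}):(\d{2}):(\d{2})$/.exec(raw);
  if (m) {
    const month = MONTHS.indexOf(m[1]);
    if (month === -1) throw new Error(`unknown month: ${m[1]}`);
    return new Date(Date.UTC(defaultYear, month, Number(m[2]),
      Number(m[3]), Number(m[4]), Number(m[5]))).toISOString();
  }
  // Compact style, assumed to be YYMMDD, e.g. "020805 13:51:24".
  m = /^(\d{2})(\d{2})(\d{2}) +(\d{1,2}):(\d{2}):(\d{2})$/.exec(raw);
  if (m) {
    const year = 2000 + Number(m[1]); // assumption: two-digit years are 20xx
    return new Date(Date.UTC(year, Number(m[2]) - 1, Number(m[3]),
      Number(m[4]), Number(m[5]), Number(m[6]))).toISOString();
  }
  throw new Error(`unrecognized timestamp: ${raw}`);
}

console.log(normalizeTimestamp("Oct 11 20:21:47", 2019));
console.log(normalizeTimestamp("020805 13:51:24"));
```

Every assumption baked into this function (year, zone, field order) is exactly the kind of knowledge that otherwise lives only in the heads of the experts mentioned above.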
Copyright Elasticsearch BV 2015-2019 Copying, publishing and/or
distributing without written permission is strictly prohibited 31
Logs Lifecycle
1. Genesis
2. Logs sent (Filebeat)
3. Logs processed and stored (Elasticsearch, hot and warm)
4. Search and analysis (Kibana: Discover, Visualize, Dashboard, Graph)
5. Archive
6. Purge
Logging Cluster
[Figure: Filebeat on the server ships logs to Elasticsearch]
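The hot, warm and purge stages above map naturally onto Elasticsearch index lifecycle management (ILM). The sketch below builds a policy body as a plain object; the phase names and the `rollover`, `forcemerge` and `delete` actions are ILM's, but every threshold is an assumption chosen for illustration, and the archive step (for example via snapshots) is left out. The body would be sent with `PUT _ilm/policy/<name>`; `buildLogsPolicy` is a hypothetical helper.

```javascript
// Build an ILM policy for log indices. All thresholds are assumptions.
function buildLogsPolicy({
  rolloverAge = "1d",
  warmAfter = "7d",
  deleteAfter = "30d",
} = {}) {
  return {
    policy: {
      phases: {
        // Hot: the index receiving writes; roll over after `rolloverAge` or 50 GB.
        hot: { actions: { rollover: { max_age: rolloverAge, max_size: "50gb" } } },
        // Warm: older data that is still searched; merge down to one segment.
        warm: { min_age: warmAfter, actions: { forcemerge: { max_num_segments: 1 } } },
        // Delete: purge the index entirely.
        delete: { min_age: deleteAfter, actions: { delete: {} } },
      },
    },
  };
}

const body = buildLogsPolicy();
console.log(JSON.stringify(body, null, 2));
```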
Lesson 2
Review - Logs
Summary
• Logs can answer many of the questions we ask of our data
• A log consists of a message with both a timestamp and some piece(s) of data
• Filebeat monitors log directories or specific log files
• Filebeat modules simplify the collection, parsing, and visualization of common log formats
• Once the data is sent to Elasticsearch, you can query Elasticsearch to explore it
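Part of monitoring a log file is remembering how far it has already been read, so that a restart does not resend old lines. The toy sketch below mimics that idea on an in-memory string: it returns only the complete lines appended since a saved byte offset and leaves a trailing partial line for later. The `readNewLines` helper and the file contents are hypothetical; this is not Filebeat's code.

```javascript
// Return the complete lines appended since `offset` (in bytes) and the
// new offset. A trailing partial line is left unread until its "\n" arrives.
function readNewLines(content, offset) {
  const buf = Buffer.from(content, "utf8");
  const lastNewline = buf.lastIndexOf(0x0a); // byte index of the final "\n"
  if (lastNewline < offset) return { lines: [], offset };
  const chunk = buf.subarray(offset, lastNewline + 1).toString("utf8");
  // Empty lines are dropped to keep the sketch short.
  const lines = chunk.split("\n").filter((line) => line.length > 0);
  return { lines, offset: lastNewline + 1 };
}

// A hypothetical log file observed at two points in time.
const fileAtT1 = "line one\nline two\npartial";
const fileAtT2 = fileAtT1 + " line\nline four\n";

const first = readNewLines(fileAtT1, 0);
const second = readNewLines(fileAtT2, first.offset);
console.log(first.lines, second.lines);
```

Persisting `offset` between runs is what makes the reader resumable.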
Lesson 2
Lab - Logs
Lesson 3
Metrics
Monitoring
• Systems and services generate a lot of data that should be:
‒ stored
‒ analyzed
‒ monitored
• Is my system or service alright right now?
• Will my system or service be alright tonight?
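The two questions above call for different checks on a metric series. As a sketch under assumed data, the code below answers "right now?" with a threshold on the latest sample and "tonight?" with a least-squares linear extrapolation. The disk-usage samples, the 90% threshold and the hourly spacing are all hypothetical, and real forecasting is usually more sophisticated than a straight line.

```javascript
// Hourly disk-usage samples in percent (hypothetical), oldest first.
const diskUsedPct = [61, 63, 64, 66, 68, 69, 71, 73];
const THRESHOLD = 90; // assumed alert threshold, in percent

// "Is it alright right now?": compare the latest sample to the threshold.
function okNow(samples, threshold) {
  return samples[samples.length - 1] < threshold;
}

// Fit y = a + b*x by least squares, with x = 0, 1, 2, ... (hours).
function linearFit(samples) {
  const n = samples.length;
  const meanX = (n - 1) / 2;
  const meanY = samples.reduce((sum, y) => sum + y, 0) / n;
  let num = 0;
  let den = 0;
  samples.forEach((y, x) => {
    num += (x - meanX) * (y - meanY);
    den += (x - meanX) ** 2;
  });
  const b = num / den;
  return { a: meanY - b * meanX, b };
}

// "Will it be alright tonight?": extrapolate `hoursAhead` past the last sample.
function okLater(samples, threshold, hoursAhead) {
  const { a, b } = linearFit(samples);
  const x = samples.length - 1 + hoursAhead;
  return a + b * x < threshold;
}

console.log(okNow(diskUsedPct, THRESHOLD));
console.log(okLater(diskUsedPct, THRESHOLD, 12));
```

With these samples the disk is fine now (73%) but the fitted trend crosses 90% within 12 hours.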
Host Metrics
[Figure: a server's logs and metrics, both stored in Elasticsearch]
[2018-09-07T07:48:00,381][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [_8LMCWq] Deleting expired data
• 2018-09-07T06:10:00: Which time zone are we speaking about?
• 2018-09-07 06:10:00 -0400: Oh, OK! New York time zone!
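The exchange above is why timestamps should carry an explicit offset. In JavaScript, for example, an ISO 8601 date-time without an offset is interpreted in the local time zone of the machine running the code, while one with an offset identifies a single instant. The sketch below converts the New York example to UTC; the `toUtc` helper is hypothetical.

```javascript
// Convert "YYYY-MM-DD HH:MM:SS +HHMM" (as in the example above) to ISO 8601 UTC.
function toUtc(stamp) {
  const m = /^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) ([+-])(\d{2})(\d{2})$/.exec(stamp);
  if (!m) throw new Error(`expected an explicit offset: ${stamp}`);
  const [, date, time, sign, hh, mm] = m;
  // Rewrite into the ISO form that the ECMAScript Date parser must accept.
  return new Date(`${date}T${time}${sign}${hh}:${mm}`).toISOString();
}

console.log(toUtc("2018-09-07 06:10:00 -0400"));

// Without an offset the same wall-clock time is ambiguous: this result
// depends on the time zone of the machine running the code.
console.log(new Date("2018-09-07T06:10:00").toISOString());
```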
Metrics Lifecycle
1. Scheduling
2. Metrics sent (Metricbeat)
3. Metrics stored (Elasticsearch)
4. Search and analysis (Kibana: Discover, Visualize, Dashboard, Graph)
5. Archive
6. Purge
Metrics Cluster
[Figure: Metricbeat on the server ships metrics to Elasticsearch]
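Scheduling is what distinguishes metrics from logs: a collector samples values on a fixed period whether or not anything happened. The sketch below mimics that with Node's built-in `os` module and `setInterval`. The document shape loosely follows Metricbeat's `system.load` and `system.memory` fields but is an assumption, the helper names are hypothetical, and printing stands in for sending to Elasticsearch.

```javascript
const os = require("os");

// Take one sample of host metrics, shaped loosely like Metricbeat's system module.
function sampleHostMetrics() {
  const [load1, load5, load15] = os.loadavg(); // always [0, 0, 0] on Windows
  const total = os.totalmem();
  const free = os.freemem();
  return {
    "@timestamp": new Date().toISOString(),
    host: { name: os.hostname() },
    system: {
      load: { 1: load1, 5: load5, 15: load15 },
      memory: { total, used: { pct: (total - free) / total } },
    },
  };
}

// Sample every `periodMs` milliseconds, `count` times, then stop.
function schedule(periodMs, count, onSample) {
  let taken = 0;
  const timer = setInterval(() => {
    onSample(sampleHostMetrics());
    taken += 1;
    if (taken >= count) clearInterval(timer);
  }, periodMs);
}

// Metricbeat's period is configurable (e.g. 10s); 500 ms keeps the demo short.
schedule(500, 3, (doc) => console.log(JSON.stringify(doc)));
```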
Lesson 3
Review - Metrics
Summary
• Metrics and logs provide important observability data
• Logs are about what happened and when it happened
• Metrics are about collecting certain information periodically
• Metricbeat can collect multiple metrics from systems and services
• Once the data is sent to Elasticsearch, you can query Elasticsearch to explore it
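As an example of querying stored metrics, the sketch below builds an Elasticsearch search body that averages CPU usage per minute over the last hour. `system.cpu.total.pct` is a Metricbeat field; the time window, interval and target (for example `POST metricbeat-*/_search`) are assumptions, and `fixed_interval` requires Elasticsearch 7.2 or later (older versions use `interval`). The `cpuPerMinuteQuery` helper is hypothetical.

```javascript
// Build a search body: data since `since`, averaged per `interval`.
function cpuPerMinuteQuery(since = "now-1h", interval = "1m") {
  return {
    size: 0, // only the aggregation results are needed, not the raw hits
    query: { range: { "@timestamp": { gte: since, lte: "now" } } },
    aggs: {
      per_interval: {
        date_histogram: { field: "@timestamp", fixed_interval: interval },
        aggs: { avg_cpu: { avg: { field: "system.cpu.total.pct" } } },
      },
    },
  };
}

const searchBody = cpuPerMinuteQuery();
console.log(JSON.stringify(searchBody, null, 2));
```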
Lesson 3
Lab - Metrics
Lesson 4
APM
What is APM?
• Application Performance Monitoring
[Figure: APM agents in client and server applications send data to the APM Server]
• Agent: collects data and sends it to the APM Server
• APM Server (data processor): receives data from agents, transforms it into Elasticsearch documents, and sends them to Elasticsearch
• Elasticsearch (data storage): stores the data
• Kibana (user interface): visualizes the data sent to Elasticsearch
[Figure: a transaction made up of three spans]
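A transaction (for example, one HTTP request) contains spans for the operations it performs, such as database queries, cache calls or outgoing HTTP requests. The sketch below models that relationship with plain objects and computes how much of the transaction's time was not covered by any span. The object shape, names and durations are hypothetical, not the APM agent's or APM Server's data model.

```javascript
// Hypothetical transaction with three spans; times in ms from transaction start.
const transaction = {
  name: "GET /top/10",
  duration: 120,
  spans: [
    { name: "SELECT FROM products", type: "db", start: 5, duration: 40 },
    { name: "cache GET", type: "cache", start: 50, duration: 10 },
    { name: "GET /inventory", type: "external", start: 65, duration: 45 },
  ],
};

// Total time covered by spans, merging overlapping intervals.
function spanCoverage(spans) {
  const intervals = spans
    .map((s) => [s.start, s.start + s.duration])
    .sort((x, y) => x[0] - y[0]);
  let covered = 0;
  let [curStart, curEnd] = intervals.length ? intervals[0] : [0, 0];
  for (const [start, end] of intervals.slice(1)) {
    if (start <= curEnd) {
      curEnd = Math.max(curEnd, end); // overlapping: extend current interval
    } else {
      covered += curEnd - curStart; // gap: close current interval
      [curStart, curEnd] = [start, end];
    }
  }
  return covered + (curEnd - curStart);
}

// Time spent in the transaction itself, outside any span.
function selfTime(tx) {
  return tx.duration - spanCoverage(tx.spans);
}

console.log(spanCoverage(transaction.spans), selfTime(transaction));
```

Merging overlaps matters because spans can run concurrently (for example parallel outgoing requests), and simply summing their durations would overcount.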
Errors
• An error is either a captured exception or a captured log message
‒ it can contain a stack trace, which is helpful for debugging
‒ it identifies the culprit, indicating where the error originated
‒ it might relate to the transaction during which it happened
• For simplicity, errors are represented by a unique ID
server/top.js in <anonymous> at line 27
25.
26. app.get('/top/10', function (req, res) {
27.   apm.captureError('this is a string', function (err) {
28.     if (err) {
29.       res.status(500).send('could not capture error: ' + err.message)
server.js in <anonymous> at line 75
73.   })
74.
75.   next()
76. })
77.
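Many occurrences of the same error should collapse into one entry with a single ID. Elastic APM stores a grouping key for this purpose (`error.grouping_key`); the sketch below is a simplified stand-in, not the APM Server's algorithm. It assumes that hashing the error type plus the file and function of the top stack frames with SHA-256 is a good enough fingerprint, and it deliberately ignores line numbers. The frame format and `groupingKey` helper are hypothetical.

```javascript
const crypto = require("crypto");

// Derive a stable ID for an error from its type and its top `depth` frames.
// Line numbers are excluded here so small code edits do not split a group.
function groupingKey(error, depth = 3) {
  const frames = error.frames
    .slice(0, depth)
    .map((f) => `${f.filename}:${f.function}`);
  return crypto
    .createHash("sha256")
    .update([error.type, ...frames].join("\n"))
    .digest("hex");
}

// Hypothetical occurrences modeled on the stack trace above.
const first = {
  type: "Error",
  message: "this is a string",
  frames: [
    { filename: "server/top.js", function: "<anonymous>", line: 27 },
    { filename: "server.js", function: "<anonymous>", line: 75 },
  ],
};
const second = {
  ...first,
  frames: [{ ...first.frames[0], line: 28 }, first.frames[1]],
};

console.log(groupingKey(first) === groupingKey(second));
```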
Metrics
• APM agents automatically pick up basic host-level metrics
‒ including system and process-level CPU and memory metrics
• Agent-specific metrics are also available
‒ like JVM metrics in the Java agent
‒ and Go runtime metrics in the Go agent
[Figure: a distributed trace in which Transaction 2 and Transaction 3 each contain spans]
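Distributed tracing works by passing trace context from one service to the next, typically in an HTTP header, so that transactions in different services share one trace ID. The sketch below generates and parses a header in the W3C Trace Context `traceparent` format (`version-traceid-parentid-flags`), which recent Elastic APM agents support. The helper functions are hypothetical, and the validation is deliberately minimal.

```javascript
const crypto = require("crypto");

// Create a traceparent header for a new trace (version 00, sampled flag set).
function newTraceparent() {
  const traceId = crypto.randomBytes(16).toString("hex"); // 32 hex chars
  const parentId = crypto.randomBytes(8).toString("hex"); // 16 hex chars
  return `00-${traceId}-${parentId}-01`;
}

// Parse an incoming header; returns null if it is malformed.
function parseTraceparent(header) {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  const [, version, traceId, parentId, flags] = m;
  // All-zero IDs are invalid in the W3C format.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// A downstream service keeps the trace ID but uses a new parent (span) ID.
function childTraceparent(header) {
  const ctx = parseTraceparent(header);
  if (!ctx) return newTraceparent(); // start a fresh trace on bad input
  const spanId = crypto.randomBytes(8).toString("hex");
  return `00-${ctx.traceId}-${spanId}-${ctx.sampled ? "01" : "00"}`;
}

const incoming = newTraceparent();
const outgoing = childTraceparent(incoming);
console.log(incoming);
console.log(outgoing);
```

Because the trace ID survives every hop, the APM UI can stitch the transactions from each service into one view.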
Lesson 4
Review - APM
Summary
• Elastic APM allows you to monitor software services and applications in real time, collecting detailed performance information on response time for incoming requests, database queries, calls to caches, external HTTP requests, etc.
• Elastic APM also automatically collects unhandled errors and exceptions
• Elastic APM consists of four components: Elasticsearch, APM agents, APM Server, and the Kibana APM UI
• Distributed tracing enables you to analyze performance throughout your microservices architecture in one view
• Once the data is sent to Elasticsearch, you can query Elasticsearch to explore it
Lesson 4
Lab - APM
Quiz Answers
Observability with the Elastic Stack
1. False. Observability is more than detecting undesirable behaviors; it is also about providing operators with granular information to debug production issues quickly and efficiently.
2. Logs, metrics and APM.
3. True. Implementing observability with the Elastic Stack allows data to be correlated through unified UIs, machine learning and alerting.