Intro To Prometheus Workshop - Grafana

This document provides an agenda for an interactive workshop on Prometheus. It introduces the presenters and their backgrounds. The agenda includes introductions to Prometheus, hands-on exercises on Prometheus UI and queries, PromQL, alerting and high availability. It also includes polls to engage participants and a recap of key aspects of Prometheus like being open source, running anywhere, and being incredibly efficient.

Uploaded by

andreiionita

Grafana Labs: Prometheus & more
Interactive Workshop

Prometheus Interactive Workshop
Feel free to ask questions in the Zoom chat

Willie Engelbrecht, Senior Solutions Engineer
- Electronics hobbyist and beginner astrophotographer
https://www.linkedin.com/in/willie-engelbrecht/

Aengus Rooney, Principal Solutions Engineer
- Enjoys cycling and swimming
- Can load a Pez dispenser in one go
@aengusrooney
https://www.linkedin.com/in/aengusrooney/

Nabeel Saad, Principal Solutions Engineer
- Sci-Fi and gaming aficionado
- Enjoys ultimate frisbee, trapezing, and walks on the beach
@saadnabs
www.linkedin.com/in/nabeelsaad

Emil A. Siemes, Principal Solutions Engineer
- Running
- Baking sourdough bread
https://www.linkedin.com/in/emil-andreas-siemes-a793926/

Agenda
● Prometheus Introduction
● Hands-on breakout: Prometheus UI & Queries (20 min)
● PromQL
● Hands-on breakout: PromQL, Grafana Explore, & Dashboarding (20 min)
● Alerting and HA

Meet your team
Introduce yourself to your breakout group!

Please turn on your camera and say ‘Hi’ by sharing your:


● Name
● Company
● Title / Role
Poll Q
The Prometheus Monitoring System

A pull-based monitoring system with dynamic service discovery, built for the cloud.

A powerful query language and multidimensional data model for rich, ad hoc analysis.

Open source, incredibly resource efficient and simple to operate.
Cloud Native

Prometheus pulls (“scrapes”) metrics from jobs.

Service discovery is the source of truth for which jobs exist.

Jobs can be natively instrumented or use adapters (“exporters”).
Rich Analysis

Combine app & infra metrics for capacity planning

rate(container_cpu_usage_seconds_total[1m])
/
rate(request_count[1m])

Join app metrics & metadata to monitor rollouts

... rate(request_count[1m]) ...
* on (instance) group_left (image)
... kube_pod_container_info ...

SLO Alerting to reduce pager fatigue

slo_errors_total{job="frontend"} 145

Figure 5-2. Error rate over a 36-hour period
https://landing.google.com/sre/workbook/chapters/alerting-on-slos/
Recap
Trusted: open source, Apache licensed and vendor neutral

Run anywhere: single binary with no dependencies

Incredibly efficient: monitor thousands of machines with a single Prometheus
Poll Q
Prometheus Data Model & Time Series Data
Time Series Data
● Measurement data stored with a time stamp
● It’s typically regular increments of data points over time
● Conducive to mathematical functions, capacity planning, predictions, and alerting
Prometheus Data Model
Time series

<identifier> → [ (t0, v0), (t1, v1), … ]

● <identifier>: what metric?
● t0, t1, …: timestamps (int64)
● v0, v1, …: values (float64)

Prometheus Data Model

tns_response_message_bytes_count

Metric name
Prometheus Data Model

tns_response_message_bytes_count{job="tns-app", status_code="200"}
tns_response_message_bytes_count{job="tns-app", status_code="404"}
tns_response_message_bytes_count{job="tns-app", status_code="500"}

Metric name (identifier) | Labels

Prometheus Data Model

tns_response_message_bytes_count{job="tns-app", status_code="503"} 77776

Metric name | Labels | Value (float64)
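
The data model on these slides can be sketched in a few lines of Python. This is only an illustration of the idea (an identifier made of a metric name plus labels, mapped to a list of timestamped float values), not how the Prometheus TSDB is implemented. The helper names `series_id`, `format_id`, and `append`, and the sample timestamps, are made up for this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical types for illustration only.
Labels = Tuple[Tuple[str, str], ...]
SeriesId = Tuple[str, Labels]

# identifier -> [(timestamp in ms, value), ...]
storage: Dict[SeriesId, List[Tuple[int, float]]] = {}


def series_id(name: str, labels: Dict[str, str]) -> SeriesId:
    """Build a hashable identifier from a metric name and its labels."""
    return name, tuple(sorted(labels.items()))


def format_id(identifier: SeriesId) -> str:
    """Render an identifier the way it appears on the slides."""
    name, labels = identifier
    body = ", ".join(f'{key}="{value}"' for key, value in labels)
    return f"{name}{{{body}}}" if labels else name


def append(name: str, labels: Dict[str, str], ts_ms: int, value: float) -> None:
    """Append one (timestamp, value) sample to the matching series."""
    storage.setdefault(series_id(name, labels), []).append((ts_ms, float(value)))


metric = "tns_response_message_bytes_count"
append(metric, {"job": "tns-app", "status_code": "200"}, 1_000, 120)
append(metric, {"job": "tns-app", "status_code": "200"}, 16_000, 180)
append(metric, {"job": "tns-app", "status_code": "503"}, 16_000, 77776)

for identifier, samples in storage.items():
    print(format_id(identifier), samples)
```

Each distinct combination of label values produces its own series, which is why the two `status_code` values above end up as two separate entries.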


Prometheus Architecture
[Architecture diagram, built up over four slides]

● Instrumentation & Exposition: targets expose metrics through a client library (web app, API server) or through an exporter (Linux VM, mysqld, cgroups, Windows VM)
● Service Discovery (DNS, Kubernetes, AWS, Consul, custom...): supplies the list of targets
● Collection, Storage & Processing: Prometheus with its TSDB
● Querying, Dashboards: Grafana and the Prometheus Web UI
Poll Q
Installing Prometheus
prometheus.io
Get and Untar
Start Node Exporter
prometheus.yml
Prometheus UI Breakout
See PDF in chat window
Poll Q
PromQL Data Types

● Scalars

● Instant vectors

● Range vectors
PromQL Data Types

● Instant vectors:

prometheus_http_requests_total
● Range vectors

prometheus_http_requests_total{code="200"}[5m]
Prometheus Metric Types

● Counters

● Gauges

● Histograms

● Summary
Counters


rate()

● Range vectors

prometheus_http_requests_total{code="200"}[5m]
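
To make the link between counters, range vectors, and rate() concrete, here is a simplified Python sketch. It assumes a range vector is just a list of (timestamp, value) samples for one series and computes a per-second increase while compensating for counter resets. The function name `simple_rate` and the sample data are invented for this example. The real rate() also extrapolates to the edges of the window, which this sketch leaves out.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter over the given samples.

    Simplified: no extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples in the range")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")

    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # The counter went down, so assume the process restarted
            # and the counter began again from zero.
            increase += value
        else:
            increase += value - previous
        previous = value
    return increase / elapsed


# Hypothetical scrape results, one every 60 seconds, with a restart
# between the third and fourth sample.
window = [(0, 100), (60, 160), (120, 220), (180, 10), (240, 70)]
print(simple_rate(window))
```

Because a counter only ever goes up (apart from resets), its raw value is rarely interesting on its own; the rate of change is what usually gets graphed and alerted on.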
Gauges


Histogram
# TYPE prometheus_http_request_duration_seconds histogram
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.1"} 100
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.2"} 200
prometheus_http_request_duration_seconds_bucket{handler="/",le="0.4"} 300
prometheus_http_request_duration_seconds_bucket{handler="/",le="1"} 400
prometheus_http_request_duration_seconds_bucket{handler="/",le="+Inf"} 1000
prometheus_http_request_duration_seconds_sum{handler="/"} 1847.000596540000001
prometheus_http_request_duration_seconds_count{handler="/"} 1000

histogram_quantile(0.95, rate(prometheus_http_request_duration_seconds_bucket[1m]))
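
The following Python sketch shows roughly how a quantile can be estimated from cumulative buckets like the ones above: find the bucket that contains the target rank, then interpolate linearly inside it. It works on the raw cumulative counts from the slide rather than on rate() output, and the function name `bucket_quantile` is made up here. Treat it as an approximation of the idea, not the exact Prometheus implementation.

```python
import math
from typing import List, Tuple


def bucket_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from (upper bound, cumulative count) pairs.

    Assumes the last bucket is +Inf and the lowest bucket starts at 0.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The rank falls in the +Inf bucket, so the best available
                # answer is the highest finite bucket boundary.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Bucket values copied from the slide.
slide_buckets = [(0.1, 100), (0.2, 200), (0.4, 300), (1.0, 400), (math.inf, 1000)]
print(bucket_quantile(0.25, slide_buckets))
print(bucket_quantile(0.95, slide_buckets))
```

With the slide's numbers, 600 of the 1000 observations are slower than the largest finite bucket (`le="1"`), so any high quantile is capped at that boundary. This is a reminder that bucket boundaries need to cover the latencies you care about.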
Summary
# HELP prometheus_rule_evaluation_duration_seconds The duration for a rule to execute.
# TYPE prometheus_rule_evaluation_duration_seconds summary
prometheus_rule_evaluation_duration_seconds{quantile="0.5"} 6.4853e-05
prometheus_rule_evaluation_duration_seconds{quantile="0.9"} 0.00010102
prometheus_rule_evaluation_duration_seconds{quantile="0.99"} 0.000177367
prometheus_rule_evaluation_duration_seconds_sum 1.623860968846092e+06
prometheus_rule_evaluation_duration_seconds_count 1.112293682e+09
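
Besides the precomputed quantiles, the `_sum` and `_count` series of a summary give the mean observation. A small Python sketch using the numbers from the slide (the function name `summary_mean` is invented for this example):

```python
def summary_mean(total_sum: float, total_count: float) -> float:
    """Mean observation since process start: _sum divided by _count."""
    if total_count == 0:
        return float("nan")
    return total_sum / total_count


# Values copied from the slide.
mean_seconds = summary_mean(1.623860968846092e+06, 1.112293682e+09)
median_seconds = 6.4853e-05  # the quantile="0.5" series

print(f"mean   = {mean_seconds * 1e3:.3f} ms")
print(f"median = {median_seconds * 1e3:.3f} ms")
```

Here the mean is far larger than the median, which suggests a small number of very slow rule evaluations. In a live system one would usually divide the rates of the two series over a time window instead of the lifetime totals.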
PromQL: Binary Operators

Arithmetic            Comparison              Binary logic
+ (addition)          == (equal)              and (intersection)
- (subtraction)       != (not-equal)          or (union)
* (multiplication)    > (greater-than)        unless (complement)
/ (division)          < (less-than)
% (modulo)            >= (greater-or-equal)
^ (exponentiation)    <= (less-or-equal)
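
As a rough mental model for the comparison and logic operators, an instant vector can be pictured as a mapping from label sets to values. The Python sketch below makes that assumption and matches elements on their full label set; the `on` and `ignoring` modifiers are not modelled. All function names and sample values are hypothetical.

```python
from typing import Dict, Tuple

LabelSet = Tuple[Tuple[str, str], ...]
Vector = Dict[LabelSet, float]


def labels(**kv: str) -> LabelSet:
    """Build a hashable, order-independent label set."""
    return tuple(sorted(kv.items()))


def greater_than(vector: Vector, threshold: float) -> Vector:
    """Like `vector > scalar`: a filter that keeps matching elements."""
    return {k: v for k, v in vector.items() if v > threshold}


def vec_and(lhs: Vector, rhs: Vector) -> Vector:
    """Intersection: left elements that have a match on the right."""
    return {k: v for k, v in lhs.items() if k in rhs}


def vec_or(lhs: Vector, rhs: Vector) -> Vector:
    """Union: all left elements plus unmatched right elements."""
    result = dict(lhs)
    for k, v in rhs.items():
        result.setdefault(k, v)
    return result


def vec_unless(lhs: Vector, rhs: Vector) -> Vector:
    """Complement: left elements that have no match on the right."""
    return {k: v for k, v in lhs.items() if k not in rhs}


cpu = {labels(instance="a"): 0.9, labels(instance="b"): 0.2}
in_maintenance = {labels(instance="a"): 1.0, labels(instance="c"): 1.0}

busy = greater_than(cpu, 0.8)
print(vec_and(busy, in_maintenance))
print(vec_unless(busy, in_maintenance))
print(vec_or(cpu, in_maintenance))
```

Note that a comparison between a vector and a scalar acts as a filter by default: elements that fail the comparison are dropped rather than turned into false values.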
Label Matching Operators
Aggregation operators
sum (calculate sum over dimensions)
min (select minimum over dimensions)
max (select maximum over dimensions)
avg (calculate the average over dimensions)
stddev (calculate population standard deviation over dimensions)
stdvar (calculate population standard variance over dimensions)
count (count number of elements in the vector)
count_values (count number of elements with the same value)
bottomk (smallest k elements by sample value)
topk (largest k elements by sample value)
quantile (calculate φ-quantile (0 ≤ φ ≤ 1) over dimensions)
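
Aggregating "over dimensions" means collapsing series that differ only in the labels being dropped. The Python sketch below imitates `sum by (...)` and `topk` on a small instant vector, again modelled as a mapping from label sets to values. The function names and the sample values are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

LabelSet = Tuple[Tuple[str, str], ...]
Vector = Dict[LabelSet, float]


def labels(**kv: str) -> LabelSet:
    """Build a hashable, order-independent label set."""
    return tuple(sorted(kv.items()))


def sum_by(vector: Vector, by: Iterable[str]) -> Vector:
    """Like `sum by (<labels>) (vector)`: keep only the listed labels
    and add up every element that ends up with the same label set."""
    keep = set(by)
    totals: Dict[LabelSet, float] = defaultdict(float)
    for label_set, value in vector.items():
        group = tuple((k, v) for k, v in label_set if k in keep)
        totals[group] += value
    return dict(totals)


def topk(k: int, vector: Vector) -> List[Tuple[LabelSet, float]]:
    """Like `topk(k, vector)`: the k elements with the largest values."""
    return sorted(vector.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical request counts per status code and handler.
requests = {
    labels(code="200", handler="/"): 100.0,
    labels(code="200", handler="/metrics"): 50.0,
    labels(code="500", handler="/"): 3.0,
}

print(sum_by(requests, ["code"]))
print(sum_by(requests, []))
print(topk(1, requests))
```

Passing an empty label list collapses everything into a single value, which mirrors a plain `sum(...)` with no `by` clause.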
PromQL & Dashboarding Breakout
Alerting
The Four Golden Signals
USE and RED Methods

USE
Utilization (U): The proportion of the resource that is used.
Saturation (S): A measure of how “full” a service is, often measured by latency.
Errors (E): The count of error events or rate of failed requests.

RED
Rate (R): The number of requests per second.
Errors (E): The number of failed requests.
Duration (D): The amount of time to process a request.
Alertmanager
● Separate component that sits alongside Prometheus
● Handles alerts received from Prometheus (built-in alerting)

What Does Alertmanager Do?

● 📨 Routes alerts
○ Determines who should receive an alert
○ Sends it along to a notification channel
■ E.g. email, Slack, PagerDuty, etc.
■ Webhooks
● 📥 Deduplicates alerts
● 🔃 Provides High Availability (HA)
Example rule

- alert: KubernetesPodCrashLooping
  expr: increase(kube_pod_container_status_restarts_total[1m]) > 3
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Kubernetes pod crash looping (instance {{ $labels.instance }})
    description: "Pod {{ $labels.pod }} is crash looping\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"

https://awesome-prometheus-alerts.grep.to/rules#kubernetes
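
The `for: 2m` clause in the rule above means the expression has to stay true for two minutes before the alert fires; until then it is only pending. The Python sketch below illustrates that behaviour for a single alert, assuming evenly spaced rule evaluations. The function name `alert_states` and the evaluation results are invented for this example.

```python
from typing import List, Optional


def alert_states(condition_true: List[bool], interval_s: int, for_s: int) -> List[str]:
    """State of one alert after each rule evaluation.

    condition_true[i] says whether the rule expression returned a
    result at evaluation i; evaluations are interval_s seconds apart.
    """
    states: List[str] = []
    active_since: Optional[int] = None
    for i, is_true in enumerate(condition_true):
        now = i * interval_s
        if not is_true:
            # The condition cleared, so the timer starts over.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_for = now - active_since
        states.append("firing" if held_for >= for_s else "pending")
    return states


# Hypothetical results of the crash-loop expression, evaluated every 60 s.
results = [False, True, True, True, False, True]
print(alert_states(results, interval_s=60, for_s=120))
```

A brief blip that clears before the `for` duration has passed never reaches the firing state, which helps keep short-lived spikes from paging anyone.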

Scaling
Prometheus HA

Federation or…

Global Federation

eu-west-1 us-central-1 asia-east-1

Mimir

Central Mimir Cluster

eu-west-1 us-central-1 asia-east-1


Grafana Cloud

Grafana Cloud

eu-west-1 us-central-1 asia-east-1


Summary
● Prometheus
○ Simple to operate
○ Powerful & concise query language
○ Fantastic service discovery
○ Grafana Dashboards
● Easy to get started using Grafana Cloud
● https://grafana.com/signup/cloud (free tier available)

Wrap Up
AMA

Thank You
