Prometheus PromQL Basics

The document provides an introduction to PromQL, the query language for Prometheus, detailing its data model, metric types, and querying capabilities. It covers the four core metric types (Counters, Gauges, Histograms, and Summaries), their characteristics, and best practices for using them. Additionally, it explains how to use selectors, functions, and aggregations in PromQL to effectively analyze time series data.

PromQL

An introduction to PromQL
Agenda
● PromQL introduction
○ Data model
○ Metrics types
● Further PromQL
○ Scalars / Instant / Range Vectors
○ Functions and Aggregations
The Prometheus Data Model
Time Series Data
● Measurement data stored with a timestamp
● Typically data points collected at regular intervals over time
● Conducive to mathematical functions, capacity planning, predictions, and alerting
Prometheus data model
Every time series is uniquely identified by:
● Its metric name
● Optional key-value pairs called labels (to distinguish metrics with the same name)
Each sample in a time series consists of:
● A timestamp with millisecond-level precision (int64)
● A value stored as a 64-bit floating-point number (float64)

<identifier> → [(t0, v0), (t1, v1), … ]

where <identifier> is the metric name plus labels, each t is an int64 timestamp, and each v is a float64 value.
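The mapping from identifier to samples can be sketched in Python. This is only an illustration of the data model, not how Prometheus stores data internally; the names `series_id`, `store`, and `append`, and the metric and label values, are hypothetical.

```python
from collections import defaultdict

def series_id(name, labels):
    """Build a hashable identifier from a metric name and its labels."""
    return (name, tuple(sorted(labels.items())))

# identifier -> list of (timestamp in ms as int, value as float)
store = defaultdict(list)

def append(name, labels, timestamp_ms, value):
    """Add one sample to the series identified by name + labels."""
    store[series_id(name, labels)].append((int(timestamp_ms), float(value)))

append("http_requests_total", {"method": "GET", "code": "200"}, 1000, 1)
append("http_requests_total", {"code": "200", "method": "GET"}, 2000, 3)
append("http_requests_total", {"method": "POST", "code": "200"}, 1000, 7)

# Label order does not matter, so the first two samples land in one series,
# while the different method value creates a second series.
print(len(store))
```

Every distinct combination of label values creates a new entry in `store`, which is why high-cardinality labels multiply the number of time series.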

Cardinality equates to Prometheus performance

● GOOD: datacenter, HTTP method ✅
● BAD: IP address, username ❌

https://www.robustperception.io/cardinality-is-key
Prometheus Metric Types
Prometheus metric types
Prometheus offers four core metric types:
● Counters
● Gauges
● Histograms
● Summaries

* Histograms and Summaries are considered Complex Types

Metric: Counters

Metric types: Counters
● Starts at 0 and only ever increases.
● Tracks the number or size of events; the value your app exposes on its /metrics endpoint is the total since the app started
● Examples:
○ CPU seconds total
○ Packets processed total
○ Requests handled total
● Generally used with functions such as rate() or irate()
● Resets and process restarts? The counter drops back to 0, and rate(), irate(), and increase() compensate for such resets

Metric types: Counters
Should you care about the number of seconds?

node_cpu_seconds_total{mode="system"}

Metric types: Counters
Best to use a rate function with Counters
- rate()
- irate()
- increase()

rate(node_cpu_seconds_total{mode="system"}[1m])
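A simplified Python sketch of why a rate function suits counters: it turns an ever-growing total into a per-second increase and treats any drop in value as a restart from 0. The function name `simple_rate` and the sample data are hypothetical, and the real rate() additionally extrapolates towards the edges of the range, which this sketch deliberately omits.

```python
def simple_rate(samples):
    """Per-second increase across (timestamp_seconds, value) samples.

    A drop in value is treated as a counter reset: the counter is assumed
    to have restarted from 0, so the new value is counted in full.
    Timestamps are assumed to be strictly increasing.
    """
    if len(samples) < 2:
        return None  # not enough points to compute a rate
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

# The counter resets between t=15 and t=30 (130 -> 10).
samples = [(0, 100.0), (15, 130.0), (30, 10.0), (45, 40.0), (60, 70.0)]
print(simple_rate(samples))
```

Without the reset handling, the drop from 130 to 10 would show up as a large negative rate, which is the main reason raw counter values are rarely graphed directly.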

Metric: Gauges

Metric types: Gauges
● A snapshot of state; when aggregating gauges you usually want to take a sum, average, minimum, or maximum
● Examples:
○ Temperature
○ Items in queue
○ Disk space used
○ Memory used
○ Number of switches
○ Packets processed in the last 5 minutes

Metric types: Gauges
node_memory_MemFree_bytes

Metric types: Gauges
For graphs, best used with any *_over_time aggregation:

avg_over_time(node_memory_MemFree_bytes[5m])

For Gauge-type visualizations, best to use the last data point.

Metric: Histogram

Metric types: Histograms
● Samples observations (e.g. request durations or response sizes) and counts them in configurable buckets.
● Buckets are user-definable in ranges
● It also exposes two additional time series: _sum and _count of the observed values

Working with Histograms
● Histograms are essentially collections of counter metrics
● Often based on the time taken to perform an operation
● Allows you to compartmentalize counts into ‘buckets’
○ Buckets are user-definable in ranges
○ +Inf bucket must always be present
● Grafana includes several useful histogram visualizations

Bucket:  0-5ms  0-20ms  0-50ms  0-500ms  +Inf
Count:   0      0       0       0        0


Working with Histograms
Imagine you want to measure the latency of a specific endpoint…

Bucket:  0-5ms  0-20ms  0-50ms  0-500ms  +Inf
Count:   0      0       0       0        0


Working with Histograms

POST /api/db
Time to response: 34ms

Bucket:  0-5ms  0-20ms  0-50ms  0-500ms  +Inf
Count:   0      0       1       1        1


Working with Histograms

POST /api/db
Time to response: 147ms

Bucket:  0-5ms  0-20ms  0-50ms  0-500ms  +Inf
Count:   0      0       1       2        2


Working with Histograms

POST /api/db
Time to response: 1ms

Bucket:  0-5ms  0-20ms  0-50ms  0-500ms  +Inf
Count:   1      1       2       3        3
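The three observations in this walkthrough (34ms, 147ms, 1ms) can be replayed with a minimal Python sketch of cumulative buckets. The class name `Histogram` and its attributes are hypothetical and are not the API of any Prometheus client library.

```python
# Upper bounds in milliseconds; float("inf") plays the role of the +Inf bucket.
BOUNDS_MS = [5, 20, 50, 500, float("inf")]

class Histogram:
    def __init__(self, bounds):
        self.bounds = bounds
        self.buckets = [0] * len(bounds)  # cumulative counts, one per bound
        self.sum = 0.0   # corresponds to the _sum series
        self.count = 0   # corresponds to the _count series

    def observe(self, value):
        # Buckets are cumulative: every bucket whose bound covers the value grows.
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.buckets[i] += 1
        self.sum += value
        self.count += 1

h = Histogram(BOUNDS_MS)
for latency_ms in (34, 147, 1):
    h.observe(latency_ms)
print(h.buckets)  # [1, 1, 2, 3, 3]
```

Because the +Inf bucket covers every value, its count always equals the total number of observations.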


Percentiles using Histograms
● How do you calculate average request duration? (example)

rate(prometheus_http_request_duration_seconds_sum{handler="/api/v1/query"}[5m])
/
rate(prometheus_http_request_duration_seconds_count{handler="/api/v1/query"}[5m])

● What is my 95th-percentile request duration? (example)

histogram_quantile(0.95,
  sum(
    rate(prometheus_http_request_duration_seconds_bucket{}[5m])
  ) by (le)
)
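How a quantile can be estimated from cumulative buckets is sketched below in Python, assuming observations are spread evenly inside each bucket (linear interpolation). The function name `bucket_quantile` and the bucket counts are hypothetical; this is a simplified illustration of the idea behind histogram_quantile(), not its exact implementation.

```python
def bucket_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs, sorted by
    upper bound and ending with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")  # no observations at all
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                # No finite upper bound to interpolate towards.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count

# Upper bounds in seconds, counts are cumulative.
buckets = [(0.005, 10), (0.02, 40), (0.05, 90), (0.5, 100), (float("inf"), 100)]
print(bucket_quantile(0.95, buckets))
```

With these counts the 95th-percentile rank falls halfway into the 0.05s to 0.5s bucket, so the estimate is roughly 0.275s. The result is only as precise as the bucket boundaries, which is why histogram quantiles are approximations.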
Metric: Summaries

Metric types: Summaries
● Quantiles are calculated on the client side.
● Summaries are a good option if you need to calculate accurate quantiles but can’t be sure what the range of the values will be

Histograms vs Summaries
● Whenever in doubt, choose histograms:
○ They allow you to calculate percentiles (and establish your SLO) from the buckets
○ They are less expensive for the application, as quantiles are calculated on the fly on the server side
● Choose summaries if:
○ Your application is already instrumented with this metric type
○ You know exactly which percentiles you need and define them upfront
○ You don’t need aggregations
Metric types: compared

                                     Counter  Gauge  Histogram  Summary
General
  Can go up and down                 No       Yes    No         No
  Is an approximation                No       No     Yes        Yes
Querying
  Can calculate percentiles          No       No     Yes        Yes
  Can use rate function              Yes      No     No         No
  Can query with histogram_quantile  No       No     Yes        No

* rate() still applies to a histogram's _bucket, _sum, and _count series, which are counters
Introducing PromQL
PromQL
Prometheus provides a functional query language called PromQL that lets
the user select and aggregate time series data in real time.
The result of an expression can either be shown as a graph, viewed as
tabular data in Prometheus's expression browser, or consumed by external
systems via the HTTP API.

Selectors
Selects all time series for a specific metric name:
prometheus_http_requests_total

Selects only those time series for a specific metric name that have the label code set to 200:
prometheus_http_requests_total{code="200"}

The expressions inside the curly braces are called label matchers.


Selectors
= Select labels that are exactly equal to the provided string.
!= Select labels that are not equal to the provided string.
=~ Select labels that regex-match the provided string.
!~ Select labels that do not regex-match the provided string.
Selectors
prometheus_http_requests_total{code=~"400|500"}

prometheus_http_requests_total{code!~"2..|3.."}

prometheus_http_requests_total{handler=~"/api.*", code!="200"}
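The four matcher operators can be sketched in Python as a filter over label sets. The function name `matches` and the sample series are hypothetical. Regex matchers in Prometheus are fully anchored, which `re.fullmatch` imitates here; Python's regex syntax is not identical to the one Prometheus uses, so treat this as an approximation.

```python
import re

def matches(labels, matchers):
    """Return True if a label set satisfies every (name, op, value) matcher."""
    for name, op, value in matchers:
        actual = labels.get(name, "")  # a missing label behaves like an empty value
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown matcher operator: {op}")
        if not ok:
            return False
    return True

series = [
    {"handler": "/api/v1/query", "code": "200"},
    {"handler": "/api/v1/query", "code": "400"},
    {"handler": "/-/healthy", "code": "500"},
]

# Equivalent of {handler=~"/api.*", code!="200"}
selected = [
    s for s in series
    if matches(s, [("handler", "=~", "/api.*"), ("code", "!=", "200")])
]
print(selected)
```

All matchers must hold at once, so only the series with handler /api/v1/query and code 400 is selected.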
Range
Select all the values we have recorded within the last 1 minute for all time
series for a specific metric name:

prometheus_http_requests_total{handler="/-/healthy"}[1m]
Functions (i)
Used to apply aggregations, math operations, or transformations, e.g.:

rate(v range-vector)
count_over_time(range-vector)
ceil(v instant-vector)
label_replace(v instant-vector, dst_label string, replacement string, src_label string, regex string)
PromQL Data Types
PromQL data types
Scalar: a simple numeric floating-point value with no labels and no dimensionality.

Instant Vector: a set of time series containing a single sample for each time
series and all sharing the same timestamp.

Range Vector: a set of time series containing a range of data points over
time for each time series.
PromQL: Scalars

PromQL: Instant Vectors
To run an instant query in Grafana, make sure to select Format table
and Type instant.

Format: table
Type: instant

PromQL: Range Vectors
To run an instant query in Grafana, make sure to select Format table
and Type instant.

Format: table
Type: instant

[Diagram: the data points in the range, i.e. the measurements in the last 1m, selected by [1m]]
Query types
Grafana supports two different query types:
● Instant query
● Range query
Query: Instant

Instant query
An instant query is evaluated once, at the execution time of the query (now), looking back over the [range].

[Diagram: avg_over_time(...[1m]) evaluated once, at now]
Instant query
As the range is increased (e.g. to [2m]), more data is queried.

[Diagram: avg_over_time(...[2m]) evaluated once, at now]
Query: Range

Range query
Range queries are instant queries executed repeatedly every step.

Note: Grafana sets the step automatically based on the time range
of the query. As you zoom in and out, the step changes.
[Diagram: avg_over_time(...[1m]) evaluated at three consecutive steps of 1m each, the last one at now]
Range query
step == [range]

[Diagram: avg_over_time(...[1m]) evaluated every 1m up to now; consecutive windows meet with no gap and no overlap]
Range query
step > [range]

[Diagram: avg_over_time(...[1m]) evaluated every 2m up to now; the data between consecutive windows is not queried]
Range query
step < [range]

[Diagram: avg_over_time(...[2m]) evaluated every 1m up to now; consecutive windows overlap]
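The relationship between step and range can be sketched in Python by listing the window each evaluation looks at. The function names `windows` and `coverage` are hypothetical, and all times are in seconds.

```python
def windows(start, end, step, range_):
    """Return the (from, to) window examined at each evaluation time."""
    out = []
    t = start
    while t <= end:
        out.append((t - range_, t))
        t += step
    return out

def coverage(step, range_):
    """Classify how consecutive windows relate to each other."""
    if step > range_:
        return "gaps"     # some data points are never queried
    if step < range_:
        return "overlap"  # some data points are queried more than once
    return "exact"        # windows meet without gap or overlap

# step 2m, range 1m: one minute of data is skipped between evaluations
print(windows(0, 240, 120, 60), coverage(120, 60))
# step 1m, range 2m: each window shares one minute with the previous one
print(windows(0, 120, 60, 120), coverage(60, 120))
```

Keeping the range tied to the step avoids both the skipped data of the first case and the repeated work of the second.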
Range query
step == [$__interval]

As Grafana chooses a new step, the range is automatically updated to match.

[Diagram: avg_over_time(...[$__interval]) evaluated every 5m up to now]
Best practice

Use $__interval or $__rate_interval by default for the best performance and most accurate results



Aggregations
PromQL: Aggregation operators
sum (calculate sum over dimensions)

min (select minimum over dimensions)

max (select maximum over dimensions)

avg (calculate the average over dimensions)

group (all values in the resulting vector are 1)

stddev (calculate population standard deviation over dimensions)

stdvar (calculate population standard variance over dimensions)

count (count number of elements in the vector)

count_values (count number of elements with the same value)

bottomk (smallest k elements by sample value)

topk (largest k elements by sample value)

quantile (calculate φ-quantile (0 ≤ φ ≤ 1) over dimensions)

PromQL: <aggregation>_over_time()
avg_over_time(range-vector): the average value of all points in the specified interval.

min_over_time(range-vector): the minimum value of all points in the specified interval.

max_over_time(range-vector): the maximum value of all points in the specified interval.

sum_over_time(range-vector): the sum of all values in the specified interval.

count_over_time(range-vector): the count of all values in the specified interval.

quantile_over_time(scalar, range-vector): the φ-quantile (0 ≤ φ ≤ 1) of the values in the specified interval.

stddev_over_time(range-vector): the population standard deviation of the values in the specified interval.

stdvar_over_time(range-vector): the population standard variance of the values in the specified interval.

last_over_time(range-vector): the most recent point value in the specified interval.

present_over_time(range-vector): the value 1 for any series in the specified interval.
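The <aggregation>_over_time() family shares one pattern: collect the points of a single series that fall inside the interval, then reduce them to one value. A minimal Python sketch follows; the function name `over_time` and the sample data are hypothetical, and treating the interval as excluding its oldest boundary is an assumption of this sketch.

```python
from statistics import mean

def over_time(samples, now, range_, fn):
    """Apply `fn` to the values of (timestamp, value) samples in the window.

    The window is assumed to cover now - range_ (exclusive) to now (inclusive).
    """
    window = [v for t, v in samples if now - range_ < t <= now]
    return fn(window) if window else None  # no points: no result for this series

samples = [(0, 4.0), (15, 6.0), (30, 8.0), (45, 10.0), (60, 12.0)]

avg_30s = over_time(samples, now=60, range_=30, fn=mean)  # points at t=45, 60
max_60s = over_time(samples, now=60, range_=60, fn=max)   # points at t=15..60
cnt_60s = over_time(samples, now=60, range_=60, fn=len)
print(avg_30s, max_60s, cnt_60s)
```

Swapping `fn` for `min`, `sum`, or a function that returns the last element gives the other members of the family.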

Aggregations
Rate by *:
rate(prometheus_http_requests_total[$__rate_interval])

Rate by code:
sum by (code) (
rate(prometheus_http_requests_total[$__rate_interval])
)

Note the use of $__rate_interval
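What sum by (code) does to the per-series rates can be sketched in Python: series that share the same value for the grouping label are collapsed into one, and every other label is dropped. The function name `sum_by` and the rate values are hypothetical.

```python
from collections import defaultdict

def sum_by(samples, by):
    """Sum sample values, grouping on the label names listed in `by`."""
    groups = defaultdict(float)
    for labels, value in samples:
        # Only the grouping labels survive in the result's identity.
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key] += value
    return dict(groups)

# One (labels, per-second rate) pair per series, as rate() would return them.
rates = [
    ({"handler": "/api/v1/query", "code": "200"}, 2.0),
    ({"handler": "/metrics", "code": "200"}, 0.5),
    ({"handler": "/api/v1/query", "code": "400"}, 0.25),
]

result = sum_by(rates, by=["code"])
print(result)
```

Three input series become two output series, one per distinct code value.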


Aggregations
avg_over_time(
probe_http_duration_seconds[$__interval]
)

Note the use of $__interval


Operators
PromQL: Operators
Arithmetic            Comparison              Binary logic
+ (addition)          == (equal)              and (intersection)
- (subtraction)       != (not-equal)          or (union)
* (multiplication)    >  (greater-than)       unless (complement)
/ (division)          <  (less-than)
% (modulo)            >= (greater-or-equal)
^ (exponentiation)    <= (less-or-equal)
PromQL: Arithmetic operators
instant vector / scalar (example):
rate(node_network_transmit_bytes_total[$__rate_interval]) / 1024

instant vector + instant vector (example):
rate(node_network_transmit_bytes_total[$__rate_interval])
+
rate(node_network_receive_bytes_total[$__rate_interval])
PromQL: Comparison operators
avg_over_time(node_load1[1m]) >
avg_over_time(node_load1[1m] offset 1d)

Useful for setting up alert thresholds

With the bool modifier, the comparison returns 1 or 0 for every series instead of filtering out the series that fail it:

avg_over_time(node_load1[1m]) > bool
avg_over_time(node_load1[1m] offset 1d)
PromQL: Logical binary operators
Show the avg if stddev is above 0.1 (example):
avg_over_time(process_open_fds[$__interval])
and
(stddev_over_time(process_open_fds[5m]) > 0.1)

Alternatively use unless operator (example):
avg_over_time(process_open_fds[$__interval])
unless
(stddev_over_time(process_open_fds[5m]) < 0.1)
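The and and unless operators work on label sets rather than on values: the right-hand side only decides which left-hand series are kept. A minimal Python sketch follows, with vectors modeled as dicts keyed by label set; the helper names and the sample values are hypothetical, and matching on the full label set (without the metric name) is the default behavior this sketch assumes.

```python
def labelset(**labels):
    """Hashable, order-independent representation of a label set."""
    return tuple(sorted(labels.items()))

def vector_and(left, right):
    """Keep left-hand samples whose label set also appears on the right."""
    return {k: v for k, v in left.items() if k in right}

def vector_unless(left, right):
    """Keep left-hand samples whose label set does not appear on the right."""
    return {k: v for k, v in left.items() if k not in right}

avg = {labelset(instance="a"): 12.0, labelset(instance="b"): 30.0}
stddev = {labelset(instance="a"): 0.05, labelset(instance="b"): 0.4}

# A comparison against a scalar filters the vector.
above = {k: v for k, v in stddev.items() if v > 0.1}
below = {k: v for k, v in stddev.items() if v < 0.1}

shown_and = vector_and(avg, above)
shown_unless = vector_unless(avg, below)
print(shown_and, shown_unless)
```

Both forms keep only instance b here. They can differ for a series whose stddev is exactly 0.1 or missing altogether, since unless keeps such a series while and drops it.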
Q&A
