Prometheus PromQL Basics
An introduction to PromQL
Agenda
● PromQL introduction
○ Data model
○ Metric types
● Further PromQL
○ Scalars / Instant / Range Vectors
○ Functions and Aggregations
The Prometheus Data Model
Time Series Data
● Measurement data stored with a
timestamp
● Conducive to mathematical
functions, capacity planning,
predictions, and alerting
Prometheus data model
Every time series is uniquely identified by:
● Its metric name
● Optional key-value pairs called labels (to distinguish time series with the
same metric name)
Each sample in a time series consists of:
● A timestamp with millisecond precision
● A value stored as a 64-bit floating point number
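The split between series identity and samples can be sketched in Python. This is a minimal illustration, not how Prometheus stores data internally: `Sample`, `series_key`, `store`, and `append` are hypothetical names, and the metric name and label values are made up.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    timestamp_ms: int  # milliseconds since the Unix epoch
    value: float       # stored as a 64-bit float


def series_key(name, labels):
    """Identity of a series: the metric name plus its sorted label pairs."""
    return (name, tuple(sorted(labels.items())))


# Hypothetical in-memory store: series identity -> list of samples.
store = {}


def append(name, labels, timestamp_ms, value):
    """Add one sample to the series identified by name and labels."""
    store.setdefault(series_key(name, labels), []).append(
        Sample(timestamp_ms, float(value))
    )


# The first two calls hit the same series (label order does not matter);
# the third call has a different code label, so it creates a new series.
append("http_requests_total", {"code": "200", "method": "GET"}, 1_700_000_000_000, 10)
append("http_requests_total", {"method": "GET", "code": "200"}, 1_700_000_015_000, 12)
append("http_requests_total", {"code": "500", "method": "GET"}, 1_700_000_000_000, 1)

print(len(store))  # 2
```

Every new combination of label values creates another series, which is why label cardinality matters.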
Cardinality (the number of unique time series) drives Prometheus performance
https://www.robustperception.io/cardinality-is-key
Prometheus Metric Types
Prometheus metric types
Prometheus offers four core metric types:
● Counters
● Gauges
● Histograms
● Summaries
Metric: Counters
Metric types: Counters
● A counter starts at 0 and only ever increases
● Counters track the number or size of events; the value your app exposes on
its /metrics endpoint is the total since it started
● Examples:
○ CPU seconds total
○ Packets processed total
○ Requests handled total
● Generally used with functions such as rate() or irate()
● A counter resets to 0 when the process restarts; rate(), irate(), and
increase() detect these resets and adjust for them
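How a counter reset can be absorbed is sketched below in Python. This is a simplified illustration that assumes any decrease between consecutive samples is a reset to 0; the real rate() and increase() functions also extrapolate to the window boundaries, which is omitted here. `increase_with_resets` and the sample values are hypothetical.

```python
def increase_with_resets(values):
    """Total increase across raw counter samples, treating any drop as a reset to 0."""
    total = 0.0
    for prev, curr in zip(values, values[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # The counter went backwards: assume the process restarted and the
            # counter began again from 0, so everything in curr is new.
            total += curr
    return total


# The process restarts between the third and fourth scrape (160 -> 5).
samples = [100, 130, 160, 5, 25]
print(increase_with_resets(samples))  # 85.0
```

A naive last-minus-first calculation would give 25 - 100 = -75 for the same data.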
Metric types: Counters
Should you care about the number of seconds?
node_cpu_seconds_total{mode="system"}
Metric types: Counters
Best to use a rate function with Counters
- rate()
- irate()
- increase()
rate(node_cpu_seconds_total{mode="system"}[1m])
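The difference between rate() and irate() can be sketched in Python. This is a rough approximation under stated assumptions: no counter resets inside the window and no extrapolation to the window boundaries. `simple_rate`, `simple_irate`, and the samples are hypothetical.

```python
def simple_rate(samples):
    """Per-second average over the window: (last - first) / elapsed seconds."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def simple_irate(samples):
    """Per-second rate based only on the last two samples in the window."""
    return simple_rate(samples[-2:])


# (timestamp_s, counter_value) pairs scraped every 15s; the last interval is a burst.
window = [(0, 100.0), (15, 103.0), (30, 106.0), (45, 121.0)]

print(round(simple_rate(window), 3))  # 0.467
print(simple_irate(window))           # 1.0
```

The averaged rate smooths the burst, while the instantaneous rate reflects only the most recent interval.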
Metric: Gauges
Metric types: Gauges
● A snapshot of current state that can go up and down; when aggregating
gauges you usually want a sum, average, minimum, or maximum
● Examples:
○ Temperature
○ Items in queue
○ Disk space used
○ Memory used
○ Number of switches
○ Packets processed in the last 5 minutes
Metric types: Gauges
node_memory_MemFree_bytes
Metric types: Gauges
For gauge-type visualizations, it is best to use the last data point.
For graphs, gauges are best used with an *_over_time aggregation:
avg_over_time(node_memory_MemFree_bytes[5m])
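A small Python sketch of why a windowed average suits graphs while the last value suits a gauge panel. The window boundary handling is simplified, and `avg_over_time`, `last_value`, and the free-memory samples are hypothetical.

```python
def avg_over_time(samples, now, window_s):
    """Average of the sample values with now - window_s < t <= now."""
    values = [v for t, v in samples if now - window_s < t <= now]
    return sum(values) / len(values)


def last_value(samples, now):
    """Most recent sample value at or before now."""
    return [v for t, v in samples if t <= now][-1]


# (timestamp_s, free memory in MiB); one scrape at t=180 caught a short spike.
free_mib = [(0, 400.0), (60, 420.0), (120, 380.0), (180, 900.0), (240, 410.0), (300, 390.0)]

print(avg_over_time(free_mib, now=300, window_s=300))  # 500.0
print(last_value(free_mib, now=300))                   # 390.0
```

The average folds the spike into the whole window, whereas the last value reports only the current state.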
Metric: Histogram
Metric types: Histograms
● Samples observations (e.g. request durations or response sizes) and
counts them in configurable buckets.
● Buckets are user-definable in ranges
● Also exposes two additional time series: _sum and _count of the observed values
Working with Histograms
● Histograms are essentially collections of counter metrics
● Commonly based on the time taken to perform an operation
● Allows you to compartmentalize counts into 'buckets'
○ Buckets are user-definable in ranges
○ +Inf bucket must always be present
● Grafana includes several useful histogram visualizations
Example: cumulative bucket counts (five buckets, smallest to largest) after each request
Start:                                  0 0 0 0 0
POST /api/db, time to response 34ms:    0 0 1 1 1
POST /api/db, time to response 147ms:   0 0 1 2 2
POST /api/db, time to response 1ms:     1 1 2 3 3
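The bucket counts in this example can be reproduced with a short Python sketch. The bucket boundaries are not given in the slides, so `BOUNDS_MS` is an assumption chosen to be consistent with the counts shown; `observe` is a hypothetical helper.

```python
import math

# Assumed upper bounds in milliseconds; the last bucket is always +Inf.
BOUNDS_MS = [10, 25, 50, 250, math.inf]


def observe(counts, value_ms):
    """Buckets are cumulative: increment every bucket whose upper bound >= value."""
    return [c + 1 if value_ms <= le else c for c, le in zip(counts, BOUNDS_MS)]


counts = [0] * len(BOUNDS_MS)
history = []
total_ms = 0   # corresponds to the _sum series
observed = 0   # corresponds to the _count series

for duration_ms in (34, 147, 1):
    counts = observe(counts, duration_ms)
    total_ms += duration_ms
    observed += 1
    history.append(counts)

for row in history:
    print(row)
# [0, 0, 1, 1, 1]
# [0, 0, 1, 2, 2]
# [1, 1, 2, 3, 3]
print(total_ms, observed)  # 182 3
```

Because each bucket only ever increases, every bucket behaves like a counter, and the +Inf bucket always equals the total number of observations.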
Metric types: Summaries
● Quantiles are calculated on the client side.
● Summaries are a good option if you need accurate quantiles but can't
be sure what the range of the values will be
Histograms vs Summaries
● When in doubt, choose histograms:
○ They allow you to calculate percentiles (e.g. to establish your SLO) from the
buckets
○ They are less expensive for the application, as quantiles are calculated on
the fly on the server side at query time
● Choose summaries if:
○ Your application is already instrumented with this metric type
○ You know exactly what percentile you need and define them upfront
○ You don’t need aggregations
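How a percentile can be estimated from cumulative buckets on the server side is sketched below in Python. It uses linear interpolation inside the bucket that contains the target rank, which is the general idea behind histogram_quantile; edge cases are simplified, and the function name, bucket bounds, and counts are hypothetical.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from (upper_bound, cumulative_count)
    pairs sorted by upper bound, where the last bucket is +Inf."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls into the +Inf bucket: the best available
                # answer is the highest finite bucket boundary.
                return buckets[-2][0]
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Cumulative counts after observing 1ms, 34ms, and 147ms with assumed bounds.
buckets = [(10, 1), (25, 1), (50, 2), (250, 3), (math.inf, 3)]
print(estimate_quantile(0.5, buckets))  # 37.5
```

The true median of the three observations is 34ms, so the result is an approximation whose accuracy depends on how the bucket boundaries are chosen.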
Metric types: compared
                                            Counter  Gauge  Histogram  Summary
General
  Can go up and down                        No       Yes    No         No
  Is an approximation                       No       No     Yes        Yes
Querying
  Can calculate percentiles                 No       No     Yes        Yes
  Can use rate function                     Yes      No     No         No
  Can query with histogram_quantile function  No     No     Yes        No
Introducing PromQL
PromQL
Prometheus provides a functional query language called PromQL that lets
the user select and aggregate time series data in real time.
The result of an expression can either be shown as a graph, viewed as
tabular data in Prometheus's expression browser, or consumed by external
systems via the HTTP API.
Selectors
Selects all time series for a specific metric name:
prometheus_http_requests_total
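Selects only those time series for a specific metric name that have the label
code set to 200:
prometheus_http_requests_total{code="200"}
Selects the time series whose code label does not match the regular
expression 2..|3.. (neither 2xx nor 3xx):
prometheus_http_requests_total{code!~"2..|3.."}
Selects the time series whose handler label starts with /api and whose code
label is not 200:
prometheus_http_requests_total{handler=~"/api.*", code!="200"}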
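The four matcher operators (=, !=, =~, !~) can be sketched in Python. The sketch assumes regular expressions must match the whole label value and that a missing label behaves like an empty string; Python's `re` module stands in for the RE2 syntax Prometheus uses, and `matches` and the sample label sets are hypothetical.

```python
import re


def matches(labels, matchers):
    """Return True if a label set satisfies every (label, operator, value) matcher."""
    for name, op, value in matchers:
        actual = labels.get(name, "")  # a missing label is treated as empty
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True


series = [
    {"handler": "/api/v1/query", "code": "200"},
    {"handler": "/api/v1/query", "code": "400"},
    {"handler": "/metrics", "code": "200"},
    {"handler": "/-/healthy", "code": "503"},
]

# {code!~"2..|3.."}
not_2xx_3xx = [s for s in series if matches(s, [("code", "!~", "2..|3..")])]
# {handler=~"/api.*", code!="200"}
api_errors = [s for s in series if matches(s, [("handler", "=~", "/api.*"), ("code", "!=", "200")])]

print([s["code"] for s in not_2xx_3xx])  # ['400', '503']
print(api_errors)  # [{'handler': '/api/v1/query', 'code': '400'}]
```

All matchers inside the braces are combined with a logical AND.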
Range
Select all the values we have recorded within the last 1 minute for all time
series for a specific metric name:
prometheus_http_requests_total{handler="/-/healthy"}[1m]
Functions (i)
Used to apply aggregations, math operations, or transformations, e.g.:
rate(v range-vector)
count_over_time(range-vector)
ceil(v instant-vector)
label_replace(v instant-vector, dst_label string,
replacement string, src_label string, regex string)
PromQL Data Types
PromQL data types
Scalar: a single floating-point number with no labels and no time
dimension.
Instant Vector: a set of time series containing a single sample for each time
series and all sharing the same timestamp.
Range Vector: a set of time series containing a range of data points over
time for each time series.
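The three data types can be pictured as plain Python structures. This is a simplified sketch: real Prometheus only looks back a limited time for the latest sample (5 minutes by default) and tracks staleness, both of which are ignored here. The series names and values are hypothetical.

```python
# Hypothetical raw samples: series -> list of (timestamp_s, value).
samples = {
    'up{job="node"}': [(0, 1.0), (15, 1.0), (30, 0.0), (45, 1.0)],
    'up{job="api"}': [(0, 1.0), (15, 1.0), (30, 1.0), (45, 1.0)],
}


def instant_vector(samples, now):
    """One value per series: the most recent sample at or before now.
    Every value in the result belongs to the same evaluation timestamp."""
    result = {}
    for series, points in samples.items():
        seen = [v for t, v in points if t <= now]
        if seen:
            result[series] = seen[-1]
    return result


def range_vector(samples, now, window_s):
    """Several samples per series: everything with now - window_s < t <= now."""
    return {
        series: [(t, v) for t, v in points if now - window_s < t <= now]
        for series, points in samples.items()
    }


scalar = 0.5  # a scalar is just a number, with no labels attached

print(instant_vector(samples, now=30))
# {'up{job="node"}': 0.0, 'up{job="api"}': 1.0}
print(range_vector(samples, now=45, window_s=30))
# {'up{job="node"}': [(30, 0.0), (45, 1.0)], 'up{job="api"}': [(30, 1.0), (45, 1.0)]}
```

Functions such as rate() take the second shape as input and return the first.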
PromQL: Scalars
PromQL: Instant Vectors
To run an instant query in Grafana, make sure to select Format table
and Type instant.
Format: table
Type: instant
PromQL: Range Vectors
A range vector cannot be graphed directly. To inspect one in Grafana, run it
as an instant query: select Format table and Type instant.
Format: table
Type: instant
Query types
Grafana supports two different query types:
● Instant query
● Range query
Query: Instant
Instant query
An instant query is evaluated at a single point in time: now, the
execution time of the query. A [range] selector looks back over that
window from now.
(Figure: avg_over_time(...[1m]) covering the last 1m before now)
Instant query
As the range is increased to [2m], more data is queried.
(Figure: avg_over_time(...[2m]) covering the last 2m before now)
Query: Range
Range query
Range queries are instant queries executed repeatedly every step.
Note: Grafana sets the step automatically based on the time range
of the query. As you zoom in and out, the step changes.
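(Figure: consecutive evaluations, one per 1m step, ending at now)
Range query: step compared with [range]
● step == [range]: each evaluation covers exactly the interval since the
previous one (figure: 1m windows every 1m)
● step > [range]: the windows leave gaps, so samples between them are never
read (figure: 2m steps)
● step < [range]: consecutive windows overlap, so the same samples feed
several evaluations (figure: avg_over_time(...[2m]) evaluated every 1m)
● step == [$__interval]: the range follows the step that Grafana chooses
(figure: 5m windows every 5m)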
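The relationship between step and [range] in a range query can be sketched in Python. Each evaluation is treated as an instant query whose window ends at the evaluation timestamp; `range_query_windows` and `coverage` are hypothetical helpers, and all times are in seconds.

```python
def range_query_windows(start, end, step, range_s):
    """(window_start, window_end) for every evaluation from start to end, one per step."""
    return [(t - range_s, t) for t in range(start, end + 1, step)]


def coverage(step, range_s):
    """Describe how consecutive windows relate to each other."""
    if step > range_s:
        return "gaps"     # some samples are never read
    if step < range_s:
        return "overlap"  # some samples are read more than once
    return "exact"        # every sample is read exactly once


print(range_query_windows(0, 180, step=60, range_s=60))
# [(-60, 0), (0, 60), (60, 120), (120, 180)]
print(coverage(step=60, range_s=60))    # exact
print(coverage(step=120, range_s=60))   # gaps
print(coverage(step=60, range_s=120))   # overlap
```

Tying the range to the step, as $__interval does in Grafana, keeps the windows lined up as the step changes with the zoom level.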
Best practice
PromQL: <aggregation>_over_time()
avg_over_time(range-vector): the average value of all points in the specified interval.
stddev_over_time(range-vector): the population standard deviation of the values in the specified interval.
stdvar_over_time(range-vector): the population standard variance of the values in the specified interval.
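What "population" means in these definitions can be shown with a short Python sketch: the squared deviations are divided by N rather than N - 1. The function names mirror the PromQL ones for readability, but they are plain Python helpers operating on a hypothetical list of sample values.

```python
import math


def avg_over_time(values):
    """Arithmetic mean of the values in the window."""
    return sum(values) / len(values)


def stdvar_over_time(values):
    """Population variance: mean squared deviation, divided by N."""
    mean = avg_over_time(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def stddev_over_time(values):
    """Population standard deviation: square root of the population variance."""
    return math.sqrt(stdvar_over_time(values))


window = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(avg_over_time(window))     # 5.0
print(stdvar_over_time(window))  # 4.0
print(stddev_over_time(window))  # 2.0
```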
Aggregations
Rate for every series (no aggregation):
rate(prometheus_http_requests_total[$__rate_interval])
Rate by code:
sum by (code) (
rate(prometheus_http_requests_total[$__rate_interval])
)
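What sum by (code) does to the per-series rates can be sketched in Python: series are grouped by the listed labels, all other labels are dropped, and the values in each group are added. `sum_by` and the rate values are hypothetical.

```python
from collections import defaultdict


def sum_by(vector, by):
    """Group an instant vector by the given label names and sum each group."""
    groups = defaultdict(float)
    for labels, value in vector:
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key] += value
    return dict(groups)


# Hypothetical per-series request rates (requests per second).
rates = [
    ({"handler": "/api/v1/query", "code": "200"}, 2.0),
    ({"handler": "/metrics", "code": "200"}, 0.5),
    ({"handler": "/api/v1/query", "code": "400"}, 0.25),
]

print(sum_by(rates, ["code"]))
# {(('code', '200'),): 2.5, (('code', '400'),): 0.25}
```

The rate is computed per series first and summed afterwards, matching the order of operations in the query above.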