Prometheus

Ankit Bansal
Introduction

Prometheus provides:
• A data scraper that pulls metrics over HTTP periodically, at a configured interval
• A time series database that stores all the metrics data
• A simple user interface where you can visualize, query, and monitor all the metrics data
Metrics

Common metrics:
• CPU usage
• Memory consumption
• Requests received
• Etc.

Custom metrics (metrics specific to your business requirements):
• Number of registrations done in the last 2 days
• Assessment evaluations pending at the moment

To support custom metrics, Prometheus supports four metric types:


• Counter
• Gauge
• Histogram
• Summary
Counter

• Counts the number of occurrences of a particular event
• Ex: request count, registration count, etc.
• A counter's value can only increase, or reset to zero when the application restarts
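
The counter rule above can be sketched in a few lines of Python. This is only an illustration: ToyCounter is a hypothetical class written for this slide, not the API of any Prometheus client library.

```python
class ToyCounter:
    """Hypothetical in-memory counter, for illustration only."""

    def __init__(self) -> None:
        # A fresh process always starts from zero (this is the "reset on restart").
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        # A counter may only go up, so negative increments are rejected.
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


requests_total = ToyCounter()
requests_total.inc()      # one request
requests_total.inc(2)     # two more requests
print(requests_total.value)  # 3.0
```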
Gauge

• Represents a single value that can increase or decrease over time
• Ex: memory usage, number of pending assessment evaluations, etc.
Histogram

• Can be used to see how the application performs in different scenarios
• Records observations in buckets; each bucket counts the observations less than or equal to its upper bound (the le label)
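
A rough sketch of how bucketing works, assuming cumulative buckets keyed by their upper bound. The function name, the bucket bounds, and the sample durations are all made up for illustration.

```python
import math


def observe_all(values, upper_bounds):
    """Return cumulative bucket counts keyed by upper bound ("le")."""
    bounds = sorted(upper_bounds) + [math.inf]  # the last bucket catches everything
    counts = {b: 0 for b in bounds}
    for v in values:
        for b in bounds:
            if v <= b:
                counts[b] += 1  # cumulative: every bucket whose bound is >= v
    return counts


durations = [0.05, 0.2, 0.3, 0.7, 2.5]  # hypothetical request durations in seconds
buckets = observe_all(durations, [0.1, 0.5, 1.0])
print(buckets)  # {0.1: 1, 0.5: 3, 1.0: 4, inf: 5}
```

Because only the per-bucket counts are stored, the individual durations are lost, but the counts stay cheap to store and easy to add up.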
Summary

• Similar to a histogram
• Instead of measuring the distribution of values over time, it measures quantiles of the values over time
• Avg response time = (0.23+0.3+…+60.33+0.5)/10 = 6.307
• The average is skewed by outliers such as 60.33, so it is better to use percentiles, ex: the 50th percentile
o 0.19, 0.23, 0.26, … (sort in ascending order)
o With an even number of values, take the two middle values and average them: (0.3+0.36)/2 = 0.33
o It means about fifty percent of the requests took 0.33 sec or less
o Similarly, there are p90 and p99
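
The effect of an outlier on the average versus the median can be checked with Python's statistics module. The latencies below are hypothetical values chosen for this sketch; they are not the ten values from the slide, which are only partly shown.

```python
import statistics

# Hypothetical response times in seconds; the last one is an outlier.
latencies = [0.2, 0.25, 0.3, 0.3, 0.35, 0.4, 0.4, 0.45, 0.5, 60.0]

mean = statistics.mean(latencies)      # pulled up by the single 60 s request
median = statistics.median(latencies)  # average of the two middle values

print(round(mean, 3), round(median, 3))  # 6.315 0.375
```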
Summary vs Histogram

• Summaries also measure events and are an alternative to histograms.
• They are cheaper but lose more data.
• They are calculated at the application level, so their quantiles cannot be aggregated across multiple instances of the same process.
• They are used when the bucket boundaries for a metric are not known beforehand.
• It is highly recommended to use histograms over summaries whenever possible.
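
A small sketch of why the aggregation point matters, using two hypothetical instances with made-up latencies: averaging per-instance medians gives a misleading number, while bucket counts can simply be added.

```python
import statistics

# Hypothetical latencies (seconds) observed by two instances of the same service.
instance_a = [0.1, 0.1, 0.1, 0.1, 0.1]
instance_b = [0.1, 5.0, 5.0]

# Summary-style: each instance reports its own median, and we can only average them.
avg_of_medians = (statistics.median(instance_a) + statistics.median(instance_b)) / 2
# What we actually want: the median over all requests.
true_median = statistics.median(instance_a + instance_b)
print(round(avg_of_medians, 2), true_median)  # 2.55 0.1


def bucket_counts(values, bounds):
    """Cumulative count of values <= each bound (histogram-style)."""
    return [sum(v <= b for v in values) for b in bounds]


bounds = [0.5, 1.0, 10.0]
merged = [a + b for a, b in zip(bucket_counts(instance_a, bounds),
                                bucket_counts(instance_b, bounds))]
# Adding per-instance bucket counts gives the same result as bucketing everything together.
print(merged == bucket_counts(instance_a + instance_b, bounds))  # True
```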
PromQL

• Prometheus Query Language
• Used to query metrics from Prometheus
• Used to build dashboards
Analogy with SQL

SQL:    SELECT * FROM mntc_http_request_count
PromQL: mntc_http_request_count

SQL:    SELECT * FROM mntc_http_request_count WHERE path="/api/ping"
PromQL: mntc_http_request_count{path="/api/ping"}

SQL:    SELECT * FROM mntc_http_request_count WHERE path="/api/ping" AND method="GET"
PromQL: mntc_http_request_count{path="/api/ping", method="GET"}

SQL:    SELECT * FROM mntc_http_request_count WHERE timestamp='1890745000'
PromQL: mntc_http_request_count @ 1890745000

SQL:    SELECT * FROM mntc_http_request_count WHERE timestamp=DATE_SUB(now(), INTERVAL 5 MINUTE)
PromQL: mntc_http_request_count offset 5m

SQL:    SELECT * FROM mntc_http_request_count WHERE timestamp>DATE_SUB(now(), INTERVAL 5 MINUTE)
PromQL: mntc_http_request_count[5m]
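
To make the label-matcher idea concrete, here is a toy Python version of an equality selector such as {path="/api/ping", method="GET"}. The sample data and the select helper are hypothetical; this is not how Prometheus is implemented, only an analogy in code.

```python
# Hypothetical series of mntc_http_request_count, each identified by its labels.
samples = [
    {"labels": {"path": "/api/ping", "method": "GET"}, "value": 5.0},
    {"labels": {"path": "/api/ping", "method": "POST"}, "value": 2.0},
    {"labels": {"path": "/api/users", "method": "GET"}, "value": 9.0},
]


def select(series, **matchers):
    """Keep only the series whose labels equal every given matcher."""
    return [
        s for s in series
        if all(s["labels"].get(name) == wanted for name, wanted in matchers.items())
    ]


print(len(select(samples)))                                   # 3 (no filter)
print(len(select(samples, path="/api/ping")))                 # 2
print(len(select(samples, path="/api/ping", method="GET")))   # 1
```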
Data Types

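Scalar: A simple numeric value
scalar(sum(mntc_http_request_count))    120
(sum() on its own returns a single-element instant vector; scalar() converts it to a plain number)

Instant Vector: One sample per time series, all at the same timestamp
mntc_http_request_count    5@184973000

Range Vector: Multiple samples per time series over the given time period
mntc_http_request_count[3m]    20@187655000
                               40@187655030
                               60@187655060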
Function: Rate

• Rate: Calculates the per-second average rate of increase of a counter over the given time window

Assume the current time is 11:03:30

rate(mntc_PerformanceService_method_count{method="getTrainingPerformance"}[3m])
= (190-90)/(3*60)
= 0.56

Time        No of Requests (Counter metric)
11:00:00    70
11:00:30    90
11:01:00    92
11:01:30    95
11:02:00    98
11:02:30    100
11:03:00    178
11:03:30    190
11:04:00    240
11:04:30    298
11:05:00    310
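
The arithmetic on this slide can be reproduced with a short Python sketch. It follows the slide's simplified formula (last value minus first value, divided by the window length); the real rate() also handles counter resets and extrapolation, which are left out here. The helper name simple_rate is made up.

```python
# (seconds since 11:00:00, counter value), taken from the table up to 11:03:30
samples = [
    (0, 70), (30, 90), (60, 92), (90, 95),
    (120, 98), (150, 100), (180, 178), (210, 190),
]


def simple_rate(points, now, window):
    """(last - first) / window over the samples in [now - window, now]."""
    in_window = [(t, v) for t, v in points if now - window <= t <= now]
    first_value = in_window[0][1]
    last_value = in_window[-1][1]
    return (last_value - first_value) / window


# Current time 11:03:30 is 210 s after 11:00:00; the window is 3m = 180 s.
print(round(simple_rate(samples, now=210, window=180), 2))  # 0.56
```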
Function: irate

• irate: Similar to rate, useful to see sudden spikes or drops in a metric.
It uses the last and second-to-last values in the window instead of the first and last, and divides by the time between those two samples

Assume the current time is 11:03:30

irate(mntc_PerformanceService_method_count{method="getTrainingPerformance"}[3m])
= (190-178)/30
= 0.4

Time        No of Requests (Counter metric)
11:00:00    70
11:00:30    90
11:01:00    92
11:01:30    95
11:02:00    98
11:02:30    100
11:03:00    178
11:03:30    190
11:04:00    240
11:04:30    298
11:05:00    310
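
A minimal sketch of the irate idea in Python, assuming the same table of samples: only the last two samples in the window are used, and the difference is divided by the 30 s between them. The helper name simple_irate is hypothetical, and counter resets are ignored.

```python
# (seconds since 11:00:00, counter value), taken from the table up to 11:03:30
samples = [
    (0, 70), (30, 90), (60, 92), (90, 95),
    (120, 98), (150, 100), (180, 178), (210, 190),
]


def simple_irate(points, now, window):
    """Per-second rate between the last two samples in [now - window, now]."""
    in_window = [(t, v) for t, v in points if now - window <= t <= now]
    (t_prev, v_prev), (t_last, v_last) = in_window[-2], in_window[-1]
    return (v_last - v_prev) / (t_last - t_prev)


# Current time 11:03:30 is 210 s after 11:00:00; the window is 3m = 180 s.
print(simple_irate(samples, now=210, window=180))  # 0.4
```

The window only decides which samples are eligible; it does not appear in the division.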
Function: Increase

• Increase: Gives the net change in the counter value over the specified period
It uses the last value and the first value in the window to calculate the increase

Assume the current time is 11:03:30

increase(mntc_PerformanceService_method_count{method="getTrainingPerformance"}[3m])
= (190-90)
= 100

Time        No of Requests (Counter metric)
11:00:00    70
11:00:30    90
11:01:00    92
11:01:30    95
11:02:00    98
11:02:30    100
11:03:00    178
11:03:30    190
11:04:00    240
11:04:30    298
11:05:00    310
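
The increase calculation can be sketched as below. Summing the step-by-step differences gives the same 100 as last minus first, and it also shows how a counter reset (a drop in value after a restart) can be accounted for. The helper name and the reset example values are assumptions for illustration; the real increase() additionally extrapolates to the window edges.

```python
def simple_increase(values):
    """Sum of increases between consecutive samples, treating a drop as a restart from 0."""
    total = 0
    for prev, cur in zip(values, values[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            total += cur  # counter restarted from zero, so everything in cur is new
    return total


# Counter values from 11:00:30 to 11:03:30 (the 3m window ending at 11:03:30)
window_values = [90, 92, 95, 98, 100, 178, 190]
print(simple_increase(window_values))  # 100

# Hypothetical series with a restart between 100 and 5
print(simple_increase([90, 100, 5, 20]))  # 30
```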
Mistakes to avoid

• Too many labels: Splitting a metric by high-cardinality labels such as userId creates a separate time series for every label value and can overload the Prometheus server
• Too short rate windows: At least 2 samples are needed in the window to calculate a rate.
We have a scrape interval of 30s.
Best Practice
rate window >= 4 x scrape_interval (with a 30s scrape interval, at least 2m)
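
Both points can be put into numbers with a small sketch. The label names and their cardinalities below are hypothetical, and series_count gives an upper bound (every combination of label values occurring).

```python
from math import prod


def series_count(label_cardinalities):
    """Upper bound on time series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


def min_rate_window(scrape_interval_s, factor=4):
    """Smallest recommended rate window in seconds, using the 4x rule from the slide."""
    return factor * scrape_interval_s


print(series_count({"method": 5, "path": 20}))                    # 100
print(series_count({"method": 5, "path": 20, "userId": 10_000}))  # 1000000
print(min_rate_window(30))                                        # 120, i.e. 2m
```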
Golden Metrics

• Traffic / Rate
sum(rate(mntc_PerformanceService_method_count{}[10m]))

• Latency / Duration
histogram_quantile(0.9, sum by (le) (rate(mntc_UserService_method_duration_metric_bucket[10m])))
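
The latency query estimates a quantile from cumulative bucket counts. A simplified sketch of that estimate, assuming linear interpolation inside the bucket that contains the target rank, is shown below. The function name estimate_quantile and the bucket values are made up, and edge cases handled by the real histogram_quantile are left out.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from (upper_bound, cumulative_count) pairs.

    buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if not 0 < q <= 1 or total <= 0:
        raise ValueError("need 0 < q <= 1 and at least one observation")
    rank = q * total  # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Assume observations are spread evenly inside this bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative bucket counts for request duration in seconds
buckets = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
print(round(estimate_quantile(0.5, buckets), 2))  # 0.42
print(round(estimate_quantile(0.9, buckets), 2))  # 1.0
```

The result can never be more precise than the bucket boundaries, which is why choosing sensible buckets matters for latency metrics.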
Over to Grafana

Grafana
Questions?
