Prometheus Monitor
This section covers what Prometheus is, the different use cases where Prometheus is used, and why it is such an important tool in modern infrastructure. We're going to go through the Prometheus architecture, so the different components it contains, see an example configuration, and also look at some of its key characteristics: why it became so widely accepted and popular, especially in containerized environments.
https://www.youtube.com/watch?v=h4Sl21AKiDg
Prometheus was created to monitor highly dynamic container environments like Kubernetes, Docker Swarm, etc. However, it can also be used in a traditional, non-container infrastructure, where you have just bare servers with applications deployed directly on them. Over the past years Prometheus has become the mainstream monitoring tool of choice in the container and microservice world.
So typically, you have multiple servers that run containerized applications, there are hundreds of different processes running on that infrastructure, and things are interconnected, so maintaining such a setup so that it runs smoothly and without application downtimes is very challenging. Imagine having such a complex infrastructure, with loads of servers distributed over many locations, and you have no insight into what is happening on the hardware level or on the application level: errors, response latency, hardware down or overloaded, maybe running out of resources, etc. In such a complex infrastructure there are more things that can go wrong. When you have tons of services and applications deployed, any one of them can crash and cause failure of other services; you have so many moving pieces, and suddenly the application becomes unavailable to users. You must quickly identify what exactly, out of those hundred different things, went wrong, and that can be difficult and time-consuming when debugging the system manually.
So let's take a specific example. Say one specific server ran out of memory and kicked off a running container that was responsible for providing database sync between two database pods in a Kubernetes cluster. That in turn caused those two database pods to fail. That database was used by an authentication service, which also stopped working because the database became unavailable, and then the application that depended on that authentication service couldn't authenticate users in the UI anymore. But from a user perspective, all you see is an error in the UI: can't log in. So how do you know what actually went wrong when you don't have any insight into what's going on inside the cluster? You don't see that red line of the chain of events as displayed here; you just see the error.
So now you start working backwards from there to find the cause and fix it. You check: is the application back up and running, does it show an exception? Is the authentication service running, did it crash, why did it crash? And so on, all the way back to the initial container failure. What would make this process of searching for the problem more efficient is a tool that constantly monitors whether services are running and alerts the maintainers as soon as one service crashes, so you know exactly what happened. Or, even better, a tool that identifies problems before they even occur and alerts the system administrators responsible for that infrastructure to prevent the issue.
So, for example, in this case it would regularly check the status of memory usage on each server, and when on one of the servers it spikes over, say, 70 percent for over an hour, or keeps increasing, it would notify about the risk that the memory on that server might soon run out. Or let's consider another scenario where you suddenly stop seeing logs for your application, because Elasticsearch doesn't accept any new logs, because the server ran out of disk space or Elasticsearch reached the storage limit that was allocated for it. Again, the monitoring tool would continuously check the storage space, compare it with Elasticsearch's storage consumption, see the risk, and notify the maintainers of the possible storage issue. And you can tell the monitoring tool what that critical point is, when the alert should be triggered. For example, if you have a very important application that absolutely cannot afford any log data loss, you may be very strict and want to take measures as soon as 50 or 60 percent capacity is reached. Or maybe you know that adding more storage space will take long, because it's a bureaucratic process in your organization where you need approval from some IT department and several other people; then maybe you also want to be notified earlier about the possible storage issue,
so that you have more time to fix it. Or a third scenario, where the application suddenly becomes too slow because one service breaks down and starts sending hundreds of error messages in a loop across the network; that creates high network traffic and slows down other services too. Having a tool that detects such spikes in network load, plus tells you which service is responsible for causing it, can give you a timely alert to fix the issue. And such automated monitoring and alerting is exactly what Prometheus offers as part of a modern DevOps workflow.
So how does Prometheus actually work, and what does its architecture look like? At its core, Prometheus has a main component called the Prometheus server that does the actual monitoring work, and it is made up of three parts:
- a time series database that stores all the metrics data, like current CPU usage or the number of exceptions in an application
- a data retrieval worker that is responsible for getting, or pulling, those metrics from applications, services, servers and other target resources, and storing them, or pushing them, into that database
- a web server, or server API, that accepts queries for that stored data; that web server component, or server API, is used to display the data in a dashboard or UI, either through the Prometheus dashboard or some other data visualization tool like Grafana
So, the Prometheus server monitors a particular thing, and that thing could be anything: it could be an entire Linux server or Windows server, it could be a standalone Apache server, a single application, or a service like a database. Those things that Prometheus monitors are called targets, and each target has units that you monitor. For a Linux server target it could be current CPU status, memory usage, disk space usage, etc.; for an application it could be, for example, the number of exceptions, the number of requests, or request duration. The unit that you would like to monitor for a specific target is called a metric, and metrics are what gets saved into the Prometheus database
component. Prometheus defines a human-readable, text-based format for these metrics entries. Each entry has TYPE and HELP attributes to increase readability: HELP is basically a description that just describes what the metric is about, and TYPE specifies the metric type. For metrics about how many times something happened, like the number of exceptions an application had or the number of requests it has received, there is a counter type. A metric that can go both up and down is represented by a gauge, for example: what is the current value of CPU usage now, what is the current capacity of disk space now, or what is the number of concurrent requests at a given moment. And for tracking how long something took, or how big something was, for example the size of a request, there is a histogram type.
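As a sketch of what such an entry looks like in the Prometheus text format (the metric name, labels and value here are illustrative, not taken from a specific target), HELP, TYPE and the sample itself appear together:

```
# HELP http_requests_total The total number of HTTP requests received.
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027
```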
So now the interesting question is: how does Prometheus actually collect those metrics from the targets? Prometheus pulls metrics data from the targets from an HTTP endpoint, which by default is the host address plus /metrics. For that to work, first, the target must expose that /metrics endpoint, and second, the data available at the /metrics endpoint must be in a format that Prometheus understands, like the example metric we saw before. Some services already expose Prometheus endpoints, so you don't need extra work to gather metrics from them, but many services don't have native Prometheus endpoints, so an extra component is required, and this component is an exporter. An exporter is basically a script or service that fetches metrics from your target, converts them into the format Prometheus understands, and exposes this converted data at its own /metrics endpoint, where Prometheus can scrape it. Prometheus has a list of exporters for different services like MySQL, Elasticsearch, Linux servers, build tools, cloud platforms and so on.
So, for example, if you want to monitor a Linux server, you can download a node exporter tar file from the Prometheus repository, untar and execute it, and it will start converting the metrics of the server and making them scrapable at its own /metrics endpoint. Then you can configure Prometheus to scrape that endpoint. These exporters are also available as Docker images.
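As a minimal sketch, assuming the node exporter is running on its default port 9100 on the same host as Prometheus, the scrape job you would add to prometheus.yml could look roughly like this:

```yaml
scrape_configs:
  - job_name: "node_exporter"        # name of this scrape job
    static_configs:
      - targets: ["localhost:9100"]  # node exporter's default port; the /metrics path is implied
```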
If you want to monitor your MySQL container in a Kubernetes cluster, you can deploy a sidecar container with the MySQL exporter; it will run inside the pod with the MySQL container, connect to it, and start translating MySQL metrics for Prometheus, making them available at its own /metrics endpoint. And again, once you add the MySQL exporter endpoint to the Prometheus configuration, Prometheus will start collecting those metrics and saving them in its database.
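A minimal sketch of such a sidecar setup, assuming the community prom/mysqld-exporter image and its default port 9104; the pod name, credentials and connection string are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mysql-with-exporter            # illustrative name
spec:
  containers:
    - name: mysql
      image: mysql:8.0
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: "example-password"    # illustrative; use a Secret in practice
    - name: mysqld-exporter            # sidecar that translates MySQL metrics for Prometheus
      image: prom/mysqld-exporter
      env:
        - name: DATA_SOURCE_NAME       # how the exporter connects to MySQL in the same pod
          value: "root:example-password@(localhost:3306)/"
      ports:
        - containerPort: 9104          # exporter's default /metrics port
```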
What about monitoring your own applications? Let's say you want to see how many requests your application is getting at different times, how many exceptions are occurring, how many server resources your application is using, etc. For this use case there are Prometheus client libraries for different languages, like Node.js, Java, etc. Using these libraries you can expose the /metrics scraping endpoint in your application and provide the different metrics that are relevant for you on that endpoint. This is a pretty convenient way for the infrastructure team to tell developers: emit the metrics that are relevant to you, and we will collect and monitor them in our infrastructure. I will also link the list of client libraries Prometheus supports, where you can see the documentation on how to use them.
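As a small sketch using the official Python client library (the metric names, port and application logic here are illustrative; the same idea applies to the Node.js and Java clients mentioned above):

```python
from prometheus_client import Counter, Histogram, start_http_server
import random, time

# Metrics this hypothetical application chooses to expose
REQUESTS = Counter("app_requests_total", "Total number of requests received")
LATENCY = Histogram("app_request_duration_seconds", "Time spent handling a request")

def handle_request():
    REQUESTS.inc()              # count every request
    with LATENCY.time():        # record how long handling took
        time.sleep(random.random() / 10)

if __name__ == "__main__":
    start_http_server(8000)     # exposes /metrics on port 8000 for Prometheus to scrape
    while True:
        handle_request()
```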
Prometheus pulls
Prometheus pulls this data from the endpoints, and that's actually an important characteristic of Prometheus. Let's see why. Most monitoring systems, like Amazon CloudWatch or New Relic, use a push system, meaning applications and servers are responsible for pushing their metric data to a centralized collection platform of that monitoring tool. When you're working with many microservices and you have each service pushing its metrics to the monitoring system, it creates a high load of traffic within your infrastructure, and your monitoring can actually become your bottleneck. So you have monitoring, which is great, but you pay the price of overloading your infrastructure with constant push requests from all the services, thus flooding the network, and you also have to install a daemon on each of these targets to push the metrics to the monitoring server. Prometheus, on the other hand, requires just a scraping endpoint, and this way metrics can also be pulled by multiple Prometheus instances. Another advantage of pull is that Prometheus can easily detect whether a service is up and running, for example when it doesn't respond to the pull or when the endpoint isn't available. With push, if a service doesn't push any data or send its health status, there might be many reasons other than the service not running: the network isn't working, the package got lost on the way, or some other problem, so you don't really have insight into what happened.
But there are a limited number of cases where a target that needs to be monitored runs only for a short time, so it isn't around long enough to be scraped. An example could be a batch job or scheduled job that, say, cleans up some old data or does backups. For such jobs Prometheus offers the Pushgateway component: these short-lived services push their metrics to the Pushgateway, and Prometheus scrapes them from there. But obviously, using the Pushgateway to gather metrics in Prometheus should be the exception, because of the reasons I mentioned earlier.
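A minimal sketch of what such a short-lived job could do using the Python client's Pushgateway support (the gateway address, job name and metric are illustrative):

```python
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
last_success = Gauge(
    "batch_job_last_success_unixtime",
    "Unix timestamp of the last successful run of this batch job",
    registry=registry,
)
last_success.set_to_current_time()

# Push to the Pushgateway (default port 9091); Prometheus then scrapes it from there.
push_to_gateway("localhost:9091", job="nightly_cleanup", registry=registry)
```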
So how does Prometheus know what to scrape and when? All of that is configured in the prometheus.yml configuration file: you define which targets Prometheus should scrape and at what interval, and Prometheus then uses a service discovery mechanism to find those target endpoints.
Service Discovery
When you first download and install Prometheus, you will see a sample config file with some default values in it. Here is an example. We have a global config that defines the scrape interval, or how often Prometheus will scrape its targets, and you can override this for individual targets. The rule_files block specifies the location of any rules we want the Prometheus server to load, and the rules are basically either for aggregating metric values or for creating alerts when some condition is met, like CPU usage reaching 80 percent, for example. So Prometheus uses rules to create new time series entries and to generate alerts, and the evaluation_interval option in the global config defines how often Prometheus will evaluate these rules. The last block, scrape_configs, controls what resources Prometheus monitors; this is where you define the targets. Since Prometheus has its own metrics endpoint to expose its own data, it can monitor its own health.
In this default configuration there is a single job called prometheus, which scrapes the metrics exposed by the Prometheus server itself. So it has a single target at localhost:9090, and Prometheus expects metrics to be available on a target on the path /metrics, which is the default path configured for that endpoint. Here you can also define other endpoints to scrape through jobs: you can create another job and, for example, override the scrape interval from the global configuration and define the target host address.
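A sketch of such a default prometheus.yml, reconstructed from the description above (the extra job and its values are illustrative):

```yaml
global:
  scrape_interval: 15s       # how often Prometheus scrapes its targets
  evaluation_interval: 15s   # how often Prometheus evaluates the rules

rule_files:
  # - "first.rules"          # rule files for aggregation and alerting rules
  # - "second.rules"

scrape_configs:
  - job_name: "prometheus"               # Prometheus scraping its own /metrics endpoint
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "my-app"                   # illustrative extra job
    scrape_interval: 30s                 # overrides the global scrape interval
    static_configs:
      - targets: ["my-app-host:8000"]    # illustrative target host address
```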
A couple of important points here. The first one: how does Prometheus actually trigger the alerts that are defined by the rules, and who receives them? Prometheus has a component called Alertmanager that is responsible for firing alerts via different channels; it could be email, it could be a Slack channel, or some other notification client.
Alert Manager
So the Prometheus server will read the alert rules, and if the condition in a rule is met, an alert gets fired through the configured channel.
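As a sketch of what such an alerting rule could look like in one of the rule files (the rule name, expression and threshold are illustrative, assuming node exporter CPU metrics are available):

```yaml
groups:
  - name: host-alerts
    rules:
      - alert: HostHighCpuLoad
        # average CPU usage across all cores is above 80% for 10 minutes
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU load on {{ $labels.instance }}"
```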
The second point is Prometheus data storage: where does Prometheus store all this data that it collects and then aggregates, and how can other systems access it? Prometheus stores the metrics data on disk: it includes a local on-disk time series database, but it also optionally integrates with remote storage systems. The data is stored in a custom time series format, and because of that you can't write Prometheus data directly into a relational database, for example. Once you've collected the metrics, Prometheus also lets you query the metrics data on targets through its server API using the PromQL query language. You can use the Prometheus dashboard UI to ask the Prometheus server, via PromQL, to show for example the status of a particular target right now, or you can use a more powerful data visualization tool like Grafana to display the data, which under the hood also uses PromQL to get the data out of Prometheus. As an example of PromQL, one query here basically selects all HTTP status codes except the ones in the 400 range, and another one does a subquery on that for a period of 30 minutes; this is just to give you an idea of what the query language looks like.
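A hedged reconstruction of what those two queries might look like (the exact expressions shown in the video aren't reproduced here, so the metric name and intervals are representative):

```
# all HTTP requests whose status code is not in the 400 range
http_requests_total{status!~"4.."}

# subquery: the 5-minute rate of those requests, evaluated over the last 30 minutes at 1m resolution
rate(http_requests_total{status!~"4.."}[5m])[30m:1m]
```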
But with Grafana, instead of writing PromQL queries directly in the Prometheus server UI, you basically have the Grafana UI where you can create dashboards, which then in the background use PromQL to query the data you want to display. Now, concerning PromQL and the Prometheus configuration: configuring the prometheus.yml file to scrape different targets and then creating all those dashboards to display meaningful data out of the scraped metrics can actually be pretty complex, and it's also not very well documented, so there is a steep learning curve to learning how to correctly configure Prometheus and how to then query the collected metrics data to create dashboards.
The final point is an important characteristic of Prometheus: it is designed to be reliable even when other systems have an outage, so that you can diagnose the problems and fix them. Each Prometheus server is standalone and self-contained, meaning it doesn't depend on network storage or other remote services; it is meant to work when other parts of the infrastructure are broken, and you don't need to set up extensive infrastructure to use it, which of course is a great thing. However, it also has the disadvantage that Prometheus can be difficult to scale. When you have hundreds of servers, you might want to have multiple Prometheus servers that somewhere aggregate all this metrics data, and configuring that, and scaling Prometheus in that way, can actually be very difficult because of this characteristic. So while using a single node is less complex and you can get started very easily, it puts a limit on the number of metrics that can be monitored by Prometheus. To work around that, you either increase the capacity of the Prometheus server so it can store more metrics data, or you limit the number of metrics that Prometheus collects from the applications, keeping it down to only the relevant ones.
And finally, in terms of Prometheus with Docker and Kubernetes: as mentioned throughout the video with different examples, Prometheus is fully compatible with both. Prometheus components are available as Docker images and can therefore easily be deployed in Kubernetes or other container environments, and it integrates well with Kubernetes infrastructure, providing cluster node resource monitoring out of the box. That means once it's deployed on Kubernetes, it starts gathering metrics data on each Kubernetes node server without any extra configuration.