Prometheus Concepts
1. Memory usage
One of the servers' memory usage stays above 70% for more than an hour and keeps increasing.
2. Logs unavailable
Elasticsearch stops accepting new logs because the disk space or storage limit allocated to it has completely run out.
3. Error flood
One of the services breaks down and starts sending error messages continuously, consuming all the network bandwidth and slowing down the other services.
Continuous monitoring and timely alerts could catch such issues before they get out of hand.
Components of Prometheus
Prometheus Server
1. Time Series Database (Storage) - stores metrics data like CPU, memory, and disk space utilization, number of requests, exceptions, etc.
2. Data Retrieval Worker (Retrieval) - pulls metrics from applications, services, servers and other target resources and stores them in the Storage database
3. HTTP Web Server - accepts PromQL queries for the stored data through the Server API and serves the results to the Prometheus Web UI / Dashboard or other data visualization tools like Grafana; an example query against this API follows
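Assuming Prometheus is running locally on its default port 9090, the built-in up metric (which reports whether each target is reachable) can be queried directly against the Server API:

curl 'http://localhost:9090/api/v1/query?query=up'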
Prometheus can monitor a wide range of items, like a Windows / Linux server, an Apache server, a single application, or services like databases.
These are called Prometheus Targets.
For Servers
CPU Utilization
Memory Usage
Disk Space Consumption
The unit that we want to monitor for a target is called a metric, and these metrics are stored in the Prometheus Storage database.
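Concretely, a metric is a named value with optional labels, served in Prometheus's plain-text exposition format at a target's /metrics endpoint. An illustrative sample (not from any specific target):

# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027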
Targets that do not expose this /metrics endpoint by default need another component called an Exporter.
There are various exporters available for MySQL, Elasticsearch, Linux Servers,
Build Tools, Cloud Platforms and so on
- For a Linux server: the node exporter tar file from the Prometheus repository, as sketched below
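As a rough sketch (the version v1.8.1 is only a placeholder; check the releases page for the current one), installing and running the node exporter looks like:

wget https://github.com/prometheus/node_exporter/releases/download/v1.8.1/node_exporter-1.8.1.linux-amd64.tar.gz
tar xvfz node_exporter-1.8.1.linux-amd64.tar.gz
cd node_exporter-1.8.1.linux-amd64
./node_exporter   # serves metrics on localhost:9100/metrics by default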
For monitoring our own application metrics, we can use the Prometheus client libraries to expose a /metrics endpoint that Prometheus can then scrape.
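A minimal sketch using the official Python client library, prometheus_client (the metric name myapp_requests_total and port 8000 are arbitrary choices for illustration):

from prometheus_client import start_http_server, Counter
import random
import time

# Hypothetical example metric: counts requests processed by our app
REQUESTS = Counter('myapp_requests_total', 'Total requests processed')

if __name__ == '__main__':
    start_http_server(8000)   # exposes /metrics on port 8000
    while True:
        REQUESTS.inc()        # simulate handling a request
        time.sleep(random.random())

Prometheus can then scrape localhost:8000/metrics like any other target.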
If applications push their metrics to the monitoring system, as in AWS CloudWatch or New Relic, the network can get flooded with traffic. Having the monitoring tool pull the metrics is the better approach, and Prometheus has this advantage.
For services that are short-lived, the services can push their metrics to the Pushgateway, from where Prometheus pulls those metrics.
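A sketch of a short-lived batch job pushing to a Pushgateway with the Python client (the gateway address localhost:9091 and the job/metric names are assumptions):

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
# Hypothetical metric recording when the batch job last finished
last_success = Gauge('myjob_last_success_unixtime',
                     'Last time the batch job finished successfully',
                     registry=registry)
last_success.set_to_current_time()

# Push once at the end of the job; Prometheus later pulls from the gateway
push_to_gateway('localhost:9091', job='my_batch_job', registry=registry)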
Prometheus's configuration file, prometheus.yml, defines
1. what to scrape (on which targets)
2. when to scrape (at what interval)
rule_files:
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
scrape_interval can be used to define how often Prometheus will scrape its targets.
rule_files points to the files that define the rules and thresholds at which alerts are created, like 50% disk space utilization and so on.
evaluation_interval can be used to define how often the above rules are evaluated.
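Putting these together, a minimal sketch of the global section of prometheus.yml (the 15s values and the rules file name are illustrative choices, not defaults to rely on):

global:
  scrape_interval: 15s      # how often to scrape targets by default
  evaluation_interval: 15s  # how often to evaluate the rule files

rule_files:
  - "alert.rules.yml"       # hypothetical rules file; an example appears below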
scrape_configs defines what resources Prometheus monitors
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
  - job_name: node_exporter
    scrape_interval: 1m
    scrape_timeout: 1m
    static_configs:
      - targets: ['localhost:9100']
Below are the defaults for all jobs, and we do not need to provide them unless we want to change them:
metrics_path: "/metrics"
scheme: "http"
Prometheus reads the rules defined in the rule files, and when a condition is met, it pushes the respective alert to another of its components, called Alertmanager, which can be used to notify users or other systems via email, Slack, etc. A sample rule file is sketched below.
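A sketch of such a rule file (the file name, threshold, and expression are assumptions for illustration; the metrics come from the node exporter):

groups:
  - name: example
    rules:
      - alert: HostOutOfDiskSpace
        # fires when less than 50% of the root filesystem is free
        expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} < 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk space is running low on {{ $labels.instance }}"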
Prometheus stores this metrics data on disk; it could be a local or a remote storage system, but it is a time series database on disk.
It uses a custom time series format, so the data cannot be written to a relational database.
We can query Prometheus via the Server API from the Prometheus Web UI using PromQL.
We can also use more powerful data visualization tools like Grafana, which query Prometheus using PromQL as well.
PromQL examples
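A few illustrative queries (the node_* metric names assume a node exporter target):

# all targets currently up (1) or down (0)
up

# per-instance CPU usage in percent, averaged over the last 5 minutes
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# free space on the root filesystem, in percent
node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100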
Next steps:
- Learning PromQL
- Configuring Prometheus YAML Files
- Creating Grafana Dashboards
Prometheus is designed to be
- reliable
- stand-alone and self-contained
- able to work well even if other parts of the infrastructure are broken
- usable without extensive setup
- a single node, which is less complex
Prometheus components are available as Docker images, which makes it easy to deploy them in Kubernetes or other containerized clusters.
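For instance, a minimal sketch of running the server image locally, mounting our own config file (the host path is illustrative):

docker run -p 9090:9090 \
  -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus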
It integrates very well with Kubernetes infrastructure and provides cluster node
resource monitoring out of the box. Once it is deployed on Kubernetes, it starts
gathering metrics data from each node server without any extra configuration.