
Ultimate Monitoring Using Prometheus: Ensuring Optimal Performance & Reliability

Components:
• Prometheus: An open-source monitoring and alerting toolkit. It works on a pull model called scraping: at regular intervals it contacts each target system's metrics endpoint and fetches the current metric values.
• Node Exporter: A monitoring agent installed on every target machine; it exposes hardware and OS metrics (CPU, memory, disk, network) on an endpoint that Prometheus scrapes.
• Blackbox Exporter: Probes endpoints from the outside over HTTP, HTTPS, TCP, ICMP, or DNS, so we can check, for example, whether a website is reachable and responding.
• Alertmanager: Handles alerts sent by client applications such as the Prometheus server. We use it to route notifications when defined conditions fire, e.g. the website being down for 1-5 consecutive minutes, or a service becoming unavailable.

Prerequisites to start:
Created a security group with the following ports open:
• 22 for SSH
• 80 for HTTP
• 443 for HTTPS
• 25 for SMTP
• 465 for SMTPS
• 587 for SMTP (submission)
• 9090 for Prometheus
• 9093 for Alert manager
• 9115 for Blackbox Exporter
• 9100 for Node Exporter
Project steps:
Step 1: Launched 2 EC2 instances with the Ubuntu AMI (instance type t2.medium, storage 20 GB)
and named them Virtual Machine 1 and Virtual Machine 2.

Prometheus component/exporter tarballs: https://prometheus.io/download/

Step 2: In Virtual Machine 1:

Downloaded Node Exporter and started it


→ sudo apt update
## Download Node Exporter
→ wget https://github.com/prometheus/node_exporter/releases/download/v1.8.1/node_exporter-1.8.1.linux-amd64.tar.gz

## Extract Node Exporter
→ tar xvfz node_exporter-1.8.1.linux-amd64.tar.gz
→ mv node_exporter-1.8.1.linux-amd64 node_exporter

## Start Node Exporter
→ cd node_exporter
→ ./node_exporter &
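Starting the exporter with `&` ties it to the current shell session; if the session ends, so does the exporter. For anything longer-lived, a systemd unit is more robust. A minimal sketch, assuming the `ubuntu` user and the extraction path used above (the unit name and paths are illustrative, not from the original setup):

```ini
# /etc/systemd/system/node_exporter.service (hypothetical path and user)
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
User=ubuntu
ExecStart=/home/ubuntu/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable and start it with `sudo systemctl enable --now node_exporter`.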

Step 3:
In Virtual Machine 2, install Prometheus, Alertmanager, and Blackbox Exporter.
Install Prometheus
→ sudo apt update
→ wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
→ tar xvfz prometheus-2.52.0.linux-amd64.tar.gz
→ mv prometheus-2.52.0.linux-amd64 prometheus
→ cd prometheus
→ ./prometheus --config.file=prometheus.yml &

Alert Manager
→ wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
→ tar xvfz alertmanager-0.27.0.linux-amd64.tar.gz
→ mv alertmanager-0.27.0.linux-amd64 alertmanager
→ cd alertmanager
→ ./alertmanager --config.file=alertmanager.yml &

Blackbox Exporter
→ wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz
→ tar xvfz blackbox_exporter-0.25.0.linux-amd64.tar.gz
→ mv blackbox_exporter-0.25.0.linux-amd64 blackbox_exporter
→ cd blackbox_exporter
→ ./blackbox_exporter &
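The Blackbox Exporter ships with a default blackbox.yml; its `http_2xx` module is the one we will later use to probe the website. For reference, a minimal sketch of such a module (the timeout and IP-protocol values here are illustrative, not taken from the original setup):

```yaml
modules:
  http_2xx:                      # probe succeeds if the target returns HTTP 2xx
    prober: http
    timeout: 5s
    http:
      preferred_ip_protocol: ip4
```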

Once the above steps are complete, we should see all the extracted folders.

Once the VM-1 Node Exporter is up and running, its metrics page is available on port 9100.
Step 4:
Now let's run a simple game application to monitor.

To build and run the Boardgame application we need Java and Maven, so we install them with the commands below:

→ cd Boardgame
→ sudo apt install openjdk-11-jre-headless -y
→ sudo apt install maven -y
→ mvn package // build the project

We can then execute the generated jar file to serve the application in the browser:

→ cd target
→ ls // the built .jar file is listed here
→ java -jar database_service_project-0.0.4.jar

Now we can access the game application at: http://3.135.20.106:8080/


Step 5:
Next, go to VM-2 to configure the Prometheus server by defining alert rules for the
different scenarios; based on these rules we will receive alerts.
→ cd prometheus
→ ./prometheus --config.file=prometheus.yml &

We can access the Prometheus server at: http://3.145.128.69:9090/graph

For now, no alert rules are visible, so let's create a new alert_rules.yaml file to configure
alert rules in the Prometheus server:

vi alert_rules.yaml

groups:
- name: alert_rules   # name of the alert rules group
  rules:
  - alert: InstanceDown
    expr: up == 0   # fires when a scrape target is down
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Endpoint {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

  - alert: WebsiteDown
    expr: probe_success == 0   # fires when a Blackbox probe fails
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Website down"
      description: "The website at {{ $labels.instance }} is down."

  - alert: HostOutOfMemory
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 25   # low available memory
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Host out of memory (instance {{ $labels.instance }})"
      description: "Node memory is filling up (< 25% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"

  - alert: HostOutOfDiskSpace
    expr: node_filesystem_avail_bytes{mountpoint="/"} * 100 / node_filesystem_size_bytes{mountpoint="/"} < 50   # low disk space on /
    for: 1s
    labels:
      severity: warning
    annotations:
      summary: "Host out of disk space (instance {{ $labels.instance }})"
      description: "Disk is almost full (< 50% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"

  - alert: HostHighCpuLoad
    expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80   # high CPU load
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Host high CPU load (instance {{ $labels.instance }})"
      description: "CPU load is > 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"

  - alert: ServiceUnavailable
    expr: up{job="node_exporter"} == 0   # fires when the node_exporter job is unreachable
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "Service Unavailable (instance {{ $labels.instance }})"
      description: "The service {{ $labels.job }} is not available\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"

  - alert: HighMemoryUsage
    expr: node_memory_Active_bytes / node_memory_MemTotal_bytes * 100 > 90   # high memory usage
    for: 10m
    labels:
      severity: critical
    annotations:
      summary: "High Memory Usage (instance {{ $labels.instance }})"
      description: "Memory usage is > 90%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"

  - alert: FileSystemFull
    expr: node_filesystem_avail_bytes / node_filesystem_size_bytes * 100 < 10   # file system almost full
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "File System Almost Full (instance {{ $labels.instance }})"
      description: "File system has < 10% free space\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"

Now we need to reference the above rules file in the Prometheus server's
prometheus.yml file.
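A minimal sketch of the entry to add to prometheus.yml (assuming alert_rules.yaml sits in the same directory as the prometheus binary; adjust the path to your layout):

```yaml
rule_files:
  - "alert_rules.yaml"   # the alert rules file created above
```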

Now, to view these alert rules in the Prometheus web UI, restart the Prometheus server
(sending the process SIGHUP also reloads the configuration without a restart):

→ pgrep prometheus // get the process id
→ kill <pid>
→ ./prometheus &
Step 6:
Now we need to connect both the Alertmanager and the VM-1 Node Exporter to the Prometheus
server by updating prometheus.yml.

After restarting the Prometheus server, we should be able to see the Node Exporter in the
Prometheus Targets section.
Next, we need to configure the Blackbox Exporter to probe the website application, so let's
update the scrape configs in prometheus.yml:

→ vi prometheus.yml
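A hedged sketch of what the updated prometheus.yml might contain; the job names, the `<VM-1-IP>` placeholder, and the probed URL are assumptions based on the hosts used in this walkthrough, not a verbatim copy of the original file:

```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]   # Alertmanager running on VM-2

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node_exporter"
    static_configs:
      - targets: ["<VM-1-IP>:9100"]     # Node Exporter on VM-1

  - job_name: "blackbox"
    metrics_path: /probe
    params:
      module: [http_2xx]                # probe via HTTP, expect a 2xx response
    static_configs:
      - targets:
          - http://3.135.20.106:8080/   # the Boardgame application
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target    # pass the website URL as ?target=
      - source_labels: [__param_target]
        target_label: instance          # keep the URL as the instance label
      - target_label: __address__
        replacement: localhost:9115     # actually scrape the Blackbox Exporter
```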

Restart the Prometheus server for the changes to take effect.

We also need to start the Blackbox Exporter.

When we start the Alertmanager, we won't see any alerts routed yet, since the Alertmanager
itself hasn't been configured.

So, let's configure it.


Now we need to configure email notifications so that we receive an email when the defined
conditions are met.

To receive email notifications, we need to enable 2-Step Verification on the Gmail account.
Step 7:

Next, go to https://myaccount.google.com/apppasswords, enter a name, and generate an
app password, which will be used in the routing configuration below.

cd alertmanager
vi alertmanager.yml

---
route:
  group_by:
    - alertname
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: email-notifications

receivers:
  - name: email-notifications
    email_configs:
      - to: [email protected]
        from: [email protected]
        smarthost: smtp.gmail.com:587
        auth_username: [email protected]
        auth_identity: [email protected]
        auth_password: "<gmail-app-password>"   # the app password generated in Step 7; never commit the real value
        send_resolved: true

inhibit_rules:
  - source_match:
      severity: critical
    target_match:
      severity: warning
    equal:
      - alertname
      - dev
      - instance

Now, restart the Alertmanager and check.

Hurray, the monitoring setup is complete!


Everything seems fine now
Step 8:
Next, we will test the entire setup by shutting down the game application.

The alert status first shows as pending.

After 1 minute the status changes to firing, and shortly afterwards we receive an email
notification.

We can also view the notification in the Alertmanager UI.


Next, we will try terminating the Node Exporter.

Terminating the Node Exporter triggers notifications for both the EC2 instance and the
service.
