Adding observability to a
Kubernetes cluster using
Prometheus
Monitoring your services is vital and should be considered part of the
underlying infrastructure for your services. You should put this in
place ahead of creating and deploying your services. In this article I
look at how to deploy Prometheus to provide the observability you need
to run your services.

Martin Hodges


14 min read · 2 days ago

Adding observability to your Kubernetes cluster

This article follows my article on creating a Kubernetes cluster using
Infrastructure as Code (IaC) tools such as Terraform and Ansible and
assumes you have a cluster ready to use. It also requires persistent storage,
which you can read about here.

What to do when things go wrong


Anyone who has developed software knows that things go wrong. Whether
this is during development, testing or in production, unexpected things
happen to break your services. These problems can be broadly categorised
into:

1. Resource depletion (eg: running out of memory)

2. Logic errors (eg: defects in the code)

3. Unexpected user behaviour (including cyber attacks)

4. Operational activities (eg: defects created in the data)

When things do go wrong, the results for your users and for you could be
catastrophic, frustrating or plain old embarrassing. It might be that your
service fails completely, partially, slowly, or worse, without anyone noticing.

The last one is particularly problematic. Catastrophic failures have a
highly visible impact that is felt quickly by users, and the users themselves
provide the feedback. When a service is starting to go wrong, and may even
be making mistakes that no one notices, there is no user feedback. Your
service may deteriorate further and the first you know about it is when it
becomes a catastrophic failure. That is bad for your users, bad for your
business and bad for you.

Hopefully you now understand why you need to ensure that you know about
problems first and that you can fix things before they become catastrophes.

Whilst I will introduce the concepts around monitoring and alerting, there
are many great articles and descriptions that describe these in detail.

Monitoring and Alerting


More often than not, enabling the operation and management of your
services is not considered until it is too late. The focus tends to be on the
functionality provided, not how it is to be operated.

I would recommend you always consider how you will operate and manage
your services from the start of the project. Do not leave it until after your
first major production outage.

With this in mind, I want to introduce monitoring and alerting.

Monitoring
Let’s say you have your Kubernetes cluster running a set of services that you
have deployed. It needs to run 24 x 7. As mentioned earlier, there are many
reasons why it may not, such as running out of resources, unexpected
events, defects and cyber events.

To ensure your services are always available for your users to use, you want
to be able to see how your services are performing at any given time.

You should consider adding a monitoring system to your deployment from
the beginning. It will capture vital performance information from your
services in a central location and will provide you the ability to review and
analyse that information, so you can spot when something is going to go
wrong or work out what happened after something went wrong.

Another reason you should start your monitoring system early is that it can
take time to optimise the information you are going to make available
through the system. By starting early, you stand a better chance of having a
functional, operational and beneficial monitoring system by the time you
launch.

When it comes to the type of information you need your monitoring system to
collect, it is important to understand the difference between logs and
metrics:

A log is a time ordered list of events that your service recorded and
generally contains information that helps you work out what was
happening at that point in time.

A metric is a measurement of resource consumption and/or the number
of events that have occurred over a small time period.

When monitoring a system, it is necessary to capture both logs and metrics.


Metrics may be calculated from the logs and reported alongside the log
entries themselves to allow you to get a full picture. Other metrics may be
collected from the service itself.
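
To make the distinction concrete, here is a small illustration. The log line and metric below are invented for this example and do not come from any particular service; the metric uses the Prometheus exposition format.

# A log entry: a single, timestamped event with context
2024-01-15T10:32:07Z ERROR order-service: failed to reserve stock for order 1234 (timeout)

# A metric: a numeric sample that Prometheus scrapes periodically
http_requests_total{service="order-service",status="500"} 17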

Alerting
Ok, so you are monitoring your services (and the underlying infrastructure
that supports them) but you cannot stay glued to your screen 24x7. Instead,
you need to be told when something has gone wrong (or better still, that
something will go wrong without your action).

This is where alerting comes in.

When things do go wrong (or start to go wrong), you want to know quickly,
regardless of where you might be. For this reason, you want your alert to be
delivered through a channel that is likely to catch your attention no matter
where you are. Channels such as email, Slack, text message, push
notification are examples that are likely to be effective.

The alert is triggered from the information that is collected by your
monitoring system, based on a set of rules that you set up. These rules may
be based on metrics, specific types of log entry or a combination of both.
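
As an illustration only (we will not be enabling Prometheus alerting in this article), a metric-based Prometheus alerting rule might look something like the sketch below. The threshold and duration are arbitrary; the memory metrics are standard node_exporter metrics.

groups:
  - name: example-alerts
    rules:
      - alert: HighMemoryUsage
        # fire when a node has used more than 90% of its memory for 5 minutes
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Node memory usage is above 90%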

You should now be building a picture of how your service, your monitoring
and alerting systems work together.

Monitoring and Alerting

It is important that your monitoring and alerting system is reliable and fault
tolerant. In a failure case you do not want to find that your service
monitoring failed to capture the required information or that it lost it. You
do not want your alerts to fail to reach you.

We will now look at how Prometheus can help within a Kubernetes cluster
by providing monitoring. Prometheus also provides alerting but I want to
come back to that in another article.

Architecture
From my previous articles, I am assuming that you have a Kubernetes cluster
that looks like this:

Kubernetes cluster

We will now install Prometheus onto this cluster, using a Persistent Volume
backed by the nfs-server.

Monitoring solution

We start with a Kubernetes cluster backed by an NFS server and persistent
storage. To this we will add:

A Persistent Volume (PV) for Prometheus

A Persistent Volume Claim (PVC) for Prometheus

Prometheus

All these components will be added to a Kubernetes namespace called
monitoring.

We will configure Prometheus to scrape metrics from the cluster itself.

You will see that Prometheus is capable of providing sophisticated alerting
via its Alertmanager module, but I have decided to use Grafana as it is more
user friendly and lends itself more to ad hoc changes, allowing you to
experiment with alerting rules and levels as you learn how your
system behaves.

Setting up our PVC


You would not be very happy if, when your monitoring pods are restarted,
you lost all your historic data and your configurations, requiring you to start
again. I know this as I have been there.

We need a place for Prometheus to store its data safely. We do this using a
Persistent Volume (PV), which the application claims through a Persistent
Volume Claim (PVC). I have written about creating PVs and PVCs here.

Creating the PVs


I am assuming you have a Kubernetes cluster with access to an NFS server.

I would strongly suggest that you create a separate share for Prometheus. If
you have followed my previous articles, you will need to set up this share.
Log in to your nfs-server and modify this file as root (keep any other
changes you may have made):

/etc/exports

/pv-share *(rw,async,no_subtree_check)
/pv-share/prometheus *(rw,async,no_subtree_check)

Before we load these into NFS, we have to create the subfolder:

sudo mkdir /pv-share/prometheus
sudo chmod 777 /pv-share/prometheus

Note that these file permissions are weak and should not be used for production.
For this article I am showing an example to get you started.

Now load this share and ensure the service starts correctly.

sudo systemctl restart nfs-server
sudo systemctl status nfs-server

You can now use this share.
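
If you want an optional sanity check before moving on, showmount (part of the standard NFS utilities) should list the new export:

showmount -e localhost

# expected output includes something like:
# /pv-share/prometheus *
# /pv-share            *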

Log in to your k8s-master and create the following file (I am assuming here
that you are accessing your cluster via kubectl on your master node. If not,
use whatever access you typically use to deploy to your cluster):

Remember to replace any fields between < and > with your own values.

prometheus-pv.yml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv
spec:
  capacity:
    storage: 10Gi
  storageClassName: prometheus-class
  accessModes:
    - ReadWriteOnce
  nfs:
    path: /pv-share/prometheus
    server: <nfs-server IP address>
  persistentVolumeReclaimPolicy: Retain

Remember to change the path and the server IP address to those used by
your cluster. You may also need to change the overall size of this PV,
which I have set to 10Gi.

Now create the PV and check it has been created:

kubectl create -f prometheus-pv.yml
kubectl get pv

You should see your PV is now available to the cluster:

NAME            CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS
prometheus-pv   10Gi       RWO            Retain           Available           prometheus-class

We now need to create a PVC for the PV. Before we do that, we need to create
the namespace for them:

kubectl create namespace monitoring

Now create this file:

prometheus-pvc.yml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc
  namespace: monitoring
spec:
  storageClassName: prometheus-class
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

Now create it and check it is bound to the PV:

kubectl create -f prometheus-pvc.yml
kubectl get pvc -n monitoring

This should immediately show the PVC bound to its PV:

NAMESPACE    NAME             STATUS   VOLUME          CAPACITY   ACCESS MODES   STORAGECLASS
monitoring   prometheus-pvc   Bound    prometheus-pv   10Gi       RWO            prometheus-class

These can now be mounted in your Prometheus pods.
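
We will let the Helm chart mount the claim for us later but, for reference, a pod that wanted to use this claim directly would reference it along these lines (a minimal sketch for illustration, not part of the Prometheus deployment):

apiVersion: v1
kind: Pod
metadata:
  name: pvc-test
  namespace: monitoring
spec:
  containers:
    - name: test
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: prometheus-pvc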

Installing Helm
It is possible to create manifest files (deployment, service, secrets and
configuration yaml files) for Prometheus and Grafana and then deploy them.
This is a complex process to get right and it is better to use Helm charts
instead, which come with these two applications integrated.

Helm is like a package manager for Kubernetes and manages the
dependencies and configurations required for the application(s) you are
loading. It installs all the manifests you need to make your system
operational.

Helm is installed where you run kubectl, which in my case is on the
k8s-master node.

The easiest way to deploy Helm is from the install script.

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod +x get_helm.sh
./get_helm.sh

You can check Helm is installed with:

helm version
helm

This will show you the version of Helm you have installed as well as a list of
commands you can use with Helm.

Deploying Prometheus and Grafana


Now we have Helm installed, we can use it to deploy Prometheus and
Grafana into our cluster with all the configuration in place to monitor our
cluster.

There are many ways to deploy Prometheus. In fact, you can see a list of
available charts on Artifact Hub alone with:

helm search hub prometheus

The list is very long and can be quite daunting. Some are no longer
supported.

The instructions here are working at the time of writing, Jan 2024.

Deploying Prometheus
First we will add the community Helm chart repository to our system:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

We can now install Prometheus but, if we do that, it will create ephemeral
storage within the container. You can get the installation to create a PV and
PVC automatically using PV operators, but we want it to use the PV and PVC we
created earlier.

To do this we create a values file. A values file is a yaml file that overrides or
defines additional configuration for a Helm chart. In this case we will use a
values file to override the defaults in the Helm chart.

The structure and purpose of the values properties is quite extensive.
Looking at the Helm chart version helps. You can find it here.
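
One way to explore these properties is to dump the chart's default values to a file and browse them locally:

helm show values prometheus-community/prometheus > default-values.yml
less default-values.yml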

Whilst we are telling Prometheus to use our PV/PVC, we can also tell it not to
start its Alertmanager as we will be using Grafana for this. We will also
remove the Push Gateway as we will not be using this.

Create the following file (remember to replace the < > fields with your own
values).

prometheus-values.yml

alertmanager:
  enabled: false
prometheus-pushgateway:
  enabled: false
server:
  service:
    externalIPs:
      - <k8s-master IP address>
    servicePort: 9090
    type: NodePort
    nodePort: 31190
  persistentVolume:
    enabled: true
    existingClaim: prometheus-pvc

Prometheus will normally create a ClusterIP service that requires you to
carry out a bunch of port forwards. In this values file I have asked it to create
a NodePort that is then accessible externally.
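
If you prefer to keep the chart's default ClusterIP service, a temporary port forward is an alternative way to reach the UI. This is a sketch that assumes the release will be named prometheus-monitoring (as we do below) and that the server service keeps the chart's default port of 80; adjust the names if yours differ.

kubectl port-forward -n monitoring svc/prometheus-monitoring-server 9090:80
# then browse to http://localhost:9090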

Now let’s install Prometheus, from the community supported Helm chart,
with our values override file. We will install it into the monitoring
namespace we created earlier.

helm install prometheus-monitoring prometheus-community/prometheus -f prometheus-values.yml -n monitoring

Now check that it is up and running with:

kubectl get pods -n monitoring

You should see:

NAME                                                         READY   STATUS    RESTARTS   AGE
prometheus-monitoring-kube-state-metrics-84945c4bd5-n29mr   1/1     Running   0          12m
prometheus-monitoring-prometheus-node-exporter-ksmnl        1/1     Running   0          12m
prometheus-monitoring-prometheus-node-exporter-swrhj        1/1     Running   0          12m
prometheus-monitoring-prometheus-node-exporter-zp5mz        1/1     Running   0          12m
prometheus-monitoring-server-94f974648-x7jxs                2/2     Running   0          12m

Some descriptions:

kube-state-metrics allows Prometheus to scrape the cluster metrics via
the Kubernetes API

node-exporter allows Prometheus to scrape the cluster nodes themselves
(we have 3 Kubernetes nodes so we need 3 daemon pods)

server is Prometheus itself

Note that you can uninstall Prometheus at any time with:

helm delete prometheus-monitoring -n monitoring

You should not lose any data as the PVC is still retained:

kubectl get pvc -n monitoring

Testing Prometheus
You should now be able to go to a browser on your development machine
and access the UI at http://<k8s-master IP address>:9090/graph.

All being well, you will be presented with the Prometheus graph page. From
here you can type a query into the search bar, for example
kubelet_active_pods. When you click Execute you will see the number of
active pods in your cluster.
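
You can run the same query without the UI by calling Prometheus's HTTP API from your development machine, replacing the placeholder with your master's IP address:

curl 'http://<k8s-master IP address>:9090/api/v1/query?query=kubelet_active_pods'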

Adding monitoring to our gateway and NFS servers


With Prometheus set up, we still have to ensure our servers that are not in
the cluster are also monitored. If you have been following my articles on
automating the creation of a Kubernetes cluster, you will have a cluster that
includes a gateway server that acts as an ingress point to the cluster from the
Internet. You will also have an NFS server that is providing our PVs.

These are vital components in our architecture and, even though they are
not in our cluster, we need to monitor them.

Welcome to the Prometheus Node Exporter. This is a service that will run on
our external nodes, collect metrics from the node's Operating System (OS)
and present them in a way that Prometheus can scrape.

We will need to install Node Exporter on both of our external servers. I’ll
only explain one of them for brevity.

First go to the official set of downloads to find the correct version. At the
time of writing, I am selecting 1.7.0 linux on amd64.

Log in to your server and download the required version:

wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvf node_exporter-1.7.0.linux-amd64.tar.gz

This downloads the archive and unpacks 3 files into a folder within your
current folder. Copy the executable as follows:

sudo cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin

You can now delete the downloaded folder and file.

It is recommended that you run node exporter under a separate user. The
user should not be able to log in. Create the user and assign the binary to
them.

sudo useradd --no-create-home --shell /bin/false node_exporter
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

We now need to create the service to run the exporter. As root create the
following file (remember to change < > fields to match your set up):

/etc/systemd/system/node_exporter.service

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter $ARGS --web.listen-address=<server private IP address>:9100
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Reload the service daemon so it picks up the new service and then start and
enable the service.

sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter

You should be able to check the service is up and working as expected by
logging in to your master node and using:

curl <server private IP address>:9100/metrics -v

You should see a set of metrics being returned.
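
The exact metrics vary from machine to machine, but the output should contain lines along these lines (the metric names are standard node_exporter metrics; the values are illustrative):

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 123456.78
node_memory_MemAvailable_bytes 2.045e+09
node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",mountpoint="/"} 1.23e+10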

For the gateway server, you should not be able to access this endpoint from
the Internet, but you may also find you cannot access it from the k8s-master
node either. This is because the gateway has a firewall preventing the
connection. On the gateway server, enable the port with:

ufw allow 9100/tcp

At this point your two servers should now have node_exporter running on
them. Now we need to get Prometheus to scrape these two new data sources.

Updating Prometheus
We now need to add our two new node_exporter instances to our Prometheus
deployment.

It is tempting to add them in as two new endpoints in the Prometheus values
file but, because the endpoints used by Prometheus are determined from
calls to the Kubernetes API, we will instead add them as extra scrape configs.

Update your prometheus-values.yml to the following, replacing < > fields with
your values.

alertmanager:
  enabled: false
prometheus-pushgateway:
  enabled: false
server:
  service:
    externalIPs:
      - <k8s-master IP address>
    servicePort: 9090
    type: NodePort
    nodePort: 31190
  persistentVolume:
    enabled: true
    existingClaim: prometheus-pvc
extraScrapeConfigs: |
  - job_name: 'rs-nfs-server'
    metrics_path: /metrics
    static_configs:
      - targets:
          - <nfs server IP address>:9100
  - job_name: 'rs-gw'
    metrics_path: /metrics
    static_configs:
      - targets:
          - <gw server IP address>:9100

Note that the extra scrape config is added as a string representing additional
yaml so make sure you copy and paste the config above.

Once you update this configuration, you will need to delete and reinstall
your helm charts.

helm uninstall prometheus-monitoring -n monitoring
helm install prometheus-monitoring prometheus-community/prometheus -f prometheus-values.yml -n monitoring

Check the pod statuses to wait for it to come back up:

kubectl get pods -n monitoring

Now when you look at the Prometheus UI, you should see your additional
servers in your node metrics.
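
A quick way to confirm the new targets are being scraped is to query the up metric for the job names we configured, either in the Prometheus search bar or via the HTTP API:

up{job=~"rs-nfs-server|rs-gw"}
# a value of 1 means the last scrape of that target succeeded, 0 means it failed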

Visualisation and Alerting


Prometheus gives you access to a set of metrics in tabular and graph form.
However, this is not adequate for the majority of uses.

Typically a separate application is used to provide better visualisation. I will
be following this article with another on installing Grafana to provide
visualisation for the metrics.

In addition, Prometheus has sophisticated alerting but I have disabled it in
this deployment as I will be implementing alerting through Grafana.

Summary
In this article we looked at the need to implement monitoring and alerting to
ensure our services remain available and meet the expectations of our
users.

We then created a Persistent Volume to hold our data and followed this up by
installing Helm so we could then install Prometheus using a community
Helm chart.

By overriding the Helm chart defaults, we were able to connect Prometheus
to our PV as well as provide a NodePort service that allows us to access the
user interface. We also added Node Exporter to our non-Kubernetes servers
so that Prometheus could scrape those metrics too.

In my next article, I will show you how to install Grafana and connect it to
Prometheus.

If you found this article of interest, please give me a clap as that helps me
identify what people find useful and what future articles I should write. If
you have any suggestions, please add them in the comments section.
