Hpcsa Block Monitoring Tutorial
Learning Objectives
Tools
• InfluxDB, Telegraf
• Grafana
• Centos 8 server
• an editor (vim, nano,. . . )
• bash
• browser
Contents
Optional: Setup Database and Grafana with specific users 7: Tutorial (0 min)
Optional: Central setup for telegraf 8: Tutorial (0 min)
The goal of this task is to set up InfluxDB v2.x, which is the foundation for the rest of the tutorial. Afterwards
InfluxDB should be running and the following information should be available for the further tasks:
• <influxip> - IP of the InfluxDB server: not the floating IP
• <influxport> - Port of InfluxDB: the port InfluxDB will listen on
• <org> - Organization Name - the organization of the database within InfluxDB
• <bucket> - Bucket Name - the name of the database within InfluxDB
• <influxuser> - the main user for the bucket
• <influxpassword> - Password - the password of the user for the bucket
• <token> - Token - the access token to access the bucket
• <grafanaadminpass> - the password for the admin access to the Grafana server (application)
• <frontendfloatingip> - the floating IP of the frontend server
The term bucket will be explained in more detail later.
Make sure that the data listed above is stored somewhere - e.g. in a pad or in a local editor - for easy
access.
The <frontendfloatingip> can be found in the cloud administration tool cloud.gwdg.de
The database for storing the metrics provided by the different agents on the monitored systems has to be set
up first. This will be done on the frontend server.
• Log in to the server via ssh.
• $ ifconfig
• note down the inet IP address of eth0 as <influxip> without the netmask: e.g. if the inet is 10.254.1.9/24,
<influxip> would be 10.254.1.9
• note down “8086” as <influxport> (will be changed later - just in case)
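Dropping the netmask can also be done in the shell. The following is a minimal sketch; the function name strip_netmask is only chosen for this illustration, and it assumes the address is given in the address/prefix form used in the example above (as printed, for instance, by $ ip -4 -o addr show eth0).

```shell
# strip_netmask ADDRESS
# Print ADDRESS without a trailing "/prefix" part (hypothetical helper name).
strip_netmask() {
    # ${1%%/*} removes everything from the first "/" to the end of the string.
    printf '%s\n' "${1%%/*}"
}

# Example value taken from the text above.
influxip=$(strip_netmask "10.254.1.9/24")
echo "$influxip"   # prints 10.254.1.9
```

An address without a netmask is returned unchanged, so the helper can be applied to either form.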
The standard Red Hat/CentOS package manager yum is used to install InfluxDB. The InfluxDB repository is not yet
in the repository list of yum and has to be added. Copy the following bash code block to the shell and execute
it:
To ensure that this service is started after a boot, it needs to be enabled permanently:
$ sudo systemctl enable influxdb
Hints
• The option --nogpgcheck is used because of an issue with the GPG keys that confirm the identity of the
binary packages installed via yum. Usually the package provider should update these fingerprints when
building new versions, but this does not seem to be the case here. Disabling this check is usually not
recommended.
Once the InfluxDB service is installed and started, it can be configured via the according command line tools.
To set up an initial database, a username, the password for this user and a bucket name - the database name
- have to be defined. Then the following command can be executed to set up InfluxDB (the values should be
replaced by the chosen ones). The general form of the setup command is:
$ influx setup --username <influxuser> --password <influxpassword> --bucket <bucket>
Make sure that the values are noted down and are accessible for later use.
The setup process will ask for <org> - enter it and write it down. Select “0” for the retention time. An
example setup command execution:
influx setup --username hpcuser --password hpcsa_user --bucket hpcsa
? Please type your primary organization name gwdg
? Please type your retention period in hours, or 0 for infinite 0 0
? Setup with these parameters? --- confirm with 'y'
If InfluxDB has to use another port, it can be changed by editing the config files. In this tutorial
this is required in order to allow later access to the InfluxDB web interface. This is done by appending the
according address to the InfluxDB server config and by adapting the config of the influx CLI tool accordingly.
The following has to be executed line by line.
Important!
The tee command with the -a option appends a line to an existing file. If the given command is executed
multiple times, e.g. to configure another port for the http-bind-address, the config file ends up with
multiple lines defining that option. Restarting the InfluxDB service will then fail due to a malformed config,
and the log output does not explain the cause. In that case the wrong http-bind-address lines in the config file
/etc/influxdb/config.toml have to be removed.
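One way to avoid duplicate lines is to set the option in a way that can be repeated safely. The following is a minimal sketch, not the command sequence of this tutorial: the function name set_toml_option is hypothetical, the sample file content is only an assumption about what such a config may contain, and the approach assumes a flat config file without [section] headers. The demonstration works on a scratch file, not on /etc/influxdb/config.toml.

```shell
# set_toml_option FILE KEY VALUE
# Ensure FILE contains exactly one line `KEY = "VALUE"` (hypothetical helper).
set_toml_option() {
    local file=$1 key=$2 value=$3 tmp
    tmp=$(mktemp)
    # Drop every existing definition of KEY, then append exactly one.
    grep -v "^${key}[[:space:]]*=" "$file" > "$tmp"
    printf '%s = "%s"\n' "$key" "$value" >> "$tmp"
    # Write back through cat so the original file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demonstration on a scratch file with assumed sample content.
demo_conf=$(mktemp)
printf 'bolt-path = "/var/lib/influxdb/influxd.bolt"\n' > "$demo_conf"
set_toml_option "$demo_conf" http-bind-address ":8086"
set_toml_option "$demo_conf" http-bind-address ":8009"   # replaces the first value
cat "$demo_conf"
```

Applied to the real config file, the function would have to be run with root rights, and the InfluxDB service restarted afterwards.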
To check if the InfluxDB has been restarted with the correct port:
$ sudo systemctl status influxdb | grep port
The output should show that InfluxDB uses port 8009 now:
Feb 16 21:58:42 worker.novalocal influxd-systemd-start.sh[11730]: ts=2023-02-16T20:58:42.601641Z lvl=info msg=Listening log_id=0g30CKUG000 service=tcp-listener transport=http addr=:8009 port=8009
The <influxport> value in the notes has to be changed to “8009”, too.
To fill the database, at least one node agent has to run and provide metrics to InfluxDB. This
will be set up in this tutorial.
The initial Telegraf agent will be installed on the frontend, as this is the only machine with access to the internet,
which is required for an easy download and install of the Telegraf package.
Telegraf is part of the InfluxDB repository and needs the same repository setup as InfluxDB. Therefore, Telegraf
can be installed directly, as the InfluxDB repository has already been added to the repository list of the frontend server.
$ sudo yum install telegraf --nogpgcheck
The service is now installed but not started yet. Telegraf will not run with the default configuration - it has
to be modified first.
To run Telegraf, at least one input and one output plugin have to be configured.
The standard input plugin is already configured, but the output plugin is not.
The information to be provided in the config in the following steps is the <influxip>, <influxport>,
<bucket>, <org> and <token> from the InfluxDB setup. This information is used by the Telegraf agent
to contact InfluxDB and access the database (bucket) that has been defined.
Steps
1. open the file /etc/telegraf/telegraf.conf with an editor using sudo (e.g. sudo nano)
• if nano is not installed, it can be installed via $ sudo yum install nano
2. search for the section [[outputs.influxdb_v2]]
3. uncomment (remove the leading #) and modify the following entries according to the data collected during
the InfluxDB setup:
• # [[outputs.influxdb_v2]] −→ just uncomment
• # urls = ["http://127.0.0.1:8086"] −→ urls = ["http://<influxip>:<influxport>"]
• # token = "" −→ token = "<token>"
• # organization = "" −→ organization = "<org>"
• # bucket = "" −→ bucket = "<bucket>"
4. save the file and leave the editor (for nano CTRL-O, Return, CTRL-X)
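To check the result of the steps above, it can help to see what the edited section should look like. The following sketch prints such a section from shell variables; the function name telegraf_output_section and the token value are placeholders for illustration, and the other values are the example values used earlier in this tutorial.

```shell
# Placeholder values: replace them with the ones noted during the InfluxDB setup.
influxip="10.254.1.9"
influxport="8009"
org="gwdg"
bucket="hpcsa"
token="REPLACE_WITH_TOKEN"   # not a real token

# Print the output section as it should appear in telegraf.conf (hypothetical helper).
telegraf_output_section() {
    cat <<EOF
[[outputs.influxdb_v2]]
  urls = ["http://${influxip}:${influxport}"]
  token = "${token}"
  organization = "${org}"
  bucket = "${bucket}"
EOF
}

telegraf_output_section
```

The printed text is only meant for comparison with the edited section in /etc/telegraf/telegraf.conf; it does not modify the file.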
When Telegraf is run in the foreground to test the configuration, its output shows which plugins are loaded for input and output.
To stop telegraf:
$ CTRL-C
Now the telegraf service can be started and enabled:
$ sudo systemctl start telegraf
$ sudo systemctl enable telegraf
Check if telegraf has been started:
$ sudo systemctl status telegraf
• the pre-configured standard plugin configuration can be found in the section [[inputs.cpu]] of the
/etc/telegraf/telegraf.conf file. More details on the settings/metrics of this plugin can be found on the according
«CPU Input Plugin» webpage.
The influx command line tools can be used to check if data is arriving in InfluxDB. On the frontend run:
$ influx query
This opens a query prompt: the influx tool now waits for a query. Copy this query into the shell running the
query prompt:
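A query suitable for such a check could look like the one printed by the following sketch. It is an assumption, not necessarily the query intended by this tutorial: it asks for a few CPU measurements from the last five minutes. The function name flux_check_query is hypothetical and the bucket name is the example value from the setup above.

```shell
bucket="hpcsa"   # example bucket name; replace with <bucket>

# Print a small Flux query that returns a few recent CPU data points
# (hypothetical helper, the query itself is an assumed example).
flux_check_query() {
    cat <<EOF
from(bucket: "${bucket}")
  |> range(start: -5m)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> limit(n: 5)
EOF
}

flux_check_query
```

The printed query can be pasted into the waiting influx query prompt and is typically submitted with CTRL-D. If Telegraf is delivering data, a small table of CPU values should be returned.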
Grafana will be used to display the data collected by the Telegraf agent. In the following, the Grafana service/server
will be set up and configured. At the end a simple dashboard will be created.
Grafana is the outward-facing user interface that is used to display plots of time-series data.
Therefore it needs to be installed on the frontend:
$ sudo yum install grafana
This will install two packages.
Before starting Grafana the port has to be adjusted to avoid conflicts:
• open /etc/grafana/grafana.ini in an editor with sudo rights (e.g. sudo nano - install it if not installed)
• search for the option http_port
• set the value to 8000
• save the file and exit
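The same change can be made without an editor. The following is a minimal sketch assuming GNU sed (as on CentOS) and assuming that the option appears in the file as a commented-out line such as ;http_port = 3000. It is demonstrated on a scratch file rather than on /etc/grafana/grafana.ini.

```shell
# Scratch file mimicking the relevant part of grafana.ini (assumed content).
demo_ini=$(mktemp)
printf '[server]\n;http_port = 3000\n' > "$demo_ini"

# Uncomment the option if it is commented out and set it to 8000.
sed -i -E 's/^[;#]?[[:space:]]*http_port[[:space:]]*=.*/http_port = 8000/' "$demo_ini"

cat "$demo_ini"
```

On the real file the sed command would need sudo, and keeping a backup copy of grafana.ini beforehand is advisable.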
Now the Grafana server can be started:
$ sudo systemctl start grafana-server
The status should be checked as before with InfluxDB and Telegraf.
$ sudo systemctl status grafana-server
If the server is running it should be checked if the port is set correctly to “8000”:
$ sudo systemctl status grafana-server | grep address
The output should show the correct port to be used by grafana-server:
Feb 17 06:57:16 frontend.novalocal grafana-server[27906]: t=2023-02-17T06:57:16+0100 lvl=info msg="HTTP Server Listen" logger=http.server address=[::]:8000 protocol=http subUrl= socket=
Grafana is set up via the web interface. Grafana provides its own web server. First the “admin”
account has to be set up. This happens automatically when logging in to Grafana for the first time.
1. open a browser
2. URL to use: <frontendfloatingip>:8000
3. enter “admin” as user name
4. enter “admin” as password
5. Grafana asks for a new password and the according confirmation - enter an arbitrary password
6. the password should be noted down as <grafanaadminpass> - just in case
Hints
• If you lose the Grafana admin password, it can be reset on the frontend:
$ sudo grafana-cli --homepath "/usr/share/grafana" admin reset-admin-password <new password>
To display data, Grafana needs to know where and how to retrieve it.
This information is called a “Datasource”. It is possible to define multiple datasources, but in this case
only one is created.
As “admin” on the Grafana server select “Configuration – Data Sources” from the toolbar on the left:
Fill out the given form with the collected information, check the image and the text below for hints:
When done press the “Save and test” button at the bottom of the form. If everything is set up correctly there
will be green feedback:
Grafana is now able to connect to the given database and retrieve metrics to display.
Now that the datasource is set up, it is possible to create panels showing plots of metrics from this source. In the
following a simple dashboard is created using the Flux query language. Flux is a topic of its own, so the
according query is provided.
This can be done as user “admin” in Grafana as no other user is created yet.
Select the menu “Dashboards — Manage” from the toolbar on the left:
The dashboard creation interface shows up. Select the “Add an empty panel” option in the “Add panel” field:
from(bucket: "<bucket>")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "cpu")
|> filter(fn: (r) => r["_field"] == "usage_user")
|> filter(fn: (r) => r["cpu"] == "cpu-total" or r["cpu"] == "cpu0" or r["cpu"] == "cpu1")
|> yield(name: "mean")
This query requests data from the given bucket within the given time range (this is a dynamic time range).
InfluxDB first selects all data in the given timespan and then applies the filters to it. In this case the query
filters for the user CPU usage on the systems which provide the according measurements.
Press apply in the top right corner:
The panel is shown, but it is empty - the timeframe has to be selected in the dropdown menu first.
To save this dashboard for future use, press the name/title of the dashboard and select “edit”. The editor
shows up again. Save it with the according button in the top right.
The dashboard created is quite simple. The difficulty in creating more interesting panels is knowing the available
metrics and the syntax. InfluxDB provides a web interface for data exploration to make this easier for the
users.
This web interface can be reached via web browser at <frontendfloatingip>:<influxport>.
The login is <influxuser> and <influxpassword>.
Select the data explorer from the toolbar on the left:
This opens the data explorer. At the bottom are columns that let users select the bucket, the measurements and the
according metrics - with this it is easy to drill down through the data in InfluxDB.
This requires a working TIG Stack as it has been installed in the previous tasks.
Check the available plugins on the according webpage https://docs.influxdata.com/telegraf/v1.20/plugins/ ,
select one of them, install and configure it.
Optional: Setup Database and Grafana with specific users 7: Tutorial (0 min)
Hints
• Think about the following: How many users would be useful for InfluxDB and for Grafana? What
rights should they have? Why?
• Do not forget to change parts of the telegraf configuration.
Requires the working TIG stack from this tutorial and telegraf already integrated on the worker nodes.
In a previous lecture Slurm was set up using a central installation and configuration. Is this possible for
telegraf, too? Or does something have to be considered and adapted for telegraf? Implement your solution.
• Hint for tutor: there may be different hardware installed which requires different plugins. Therefore at
least the configuration has to be local for the specific machine types/hardware.
Further Reading
• https://grafana.com/docs/grafana/latest/getting-started/get-started-grafana-influxdb/
• https://grafana.com/docs/grafana/latest/introduction/
• https://docs.influxdata.com/influxdb/v2.6/install/
• https://docs.influxdata.com/telegraf/v1.20/plugins/