Fine-Tuning The Monitoring
We have seen some Checkmk beginners add many systems to the monitoring in a short space of time, perhaps because it is so easy to do in Checkmk.
When they then shortly afterwards activated the notifications for all users, their
colleagues were flooded with hundreds of emails per day, and after only a few days their
enthusiasm for monitoring was effectively destroyed.
Even if Checkmk makes a real effort to define appropriate and sound default values for all
possible settings, it simply cannot know precisely enough how things should be in your IT
environment under normal conditions. Therefore, a bit of manual work is required on
your part to fine-tune the monitoring until even the last false alarm has been eliminated. Apart
from that, Checkmk will also find quite a few real problems that you and your colleagues
have not yet suspected. These, too, must first be properly remedied — in reality, not in the
monitoring!
The following principle has proven itself: first quality, then quantity. Or in other words:
1. Make sure that all services that do not really have a problem are reliably OK.
2. Activate notifications by email or SMS only after Checkmk has been running reliably for a while with no or very few false alarms.
Note: Strictly speaking, false alarms can of course only occur once the notification function has been switched on. What we need to address here is therefore the preliminary stage of notifications: avoiding the states DOWN, WARN or CRIT for problems that are not actually critical.
In the following chapters on configuration, we will show you what fine-tuning options you
have — so that everything that does not cause problems will be green — and how to get
any occasional drop-outs under control.
2. Rules-based configuration
Before we start configuring, we must first briefly look at the settings of hosts and services
in Checkmk. Since Checkmk was developed for large and complex environments, this is
done using rules. This concept is very powerful and brings many advantages to smaller
environments as well.
The basic idea is that you don’t explicitly specify every single parameter for every service,
but implement something like: 'On all production Oracle servers, file systems with the
prefix /var/ora at 90% filled will be WARN and at 95% will be CRIT'.
Such a rule can set thresholds for thousands of file systems with a single action. At the
same time, it also documents very clearly which monitoring policies apply in your
company.
Based on a basic rule, you can then define exceptions for individual cases separately. A
suitable rule might look like this: 'On the Oracle server srvora123 , the file system
/var/ora/db01 at 96% filled will be WARN and at 98% will be CRIT'. This exception
rule is set in Checkmk in the same way as the basic rule.
Each rule has the same structure. It always consists of a condition and a value. You can
also add a description and a comment to document the purpose of the rule.
The rules are organised in rule sets. For each type of parameter, Checkmk has a suitable
rule set ready, so you can choose from several hundred rule sets. For example, there is one
called Filesystems (used space and growth) that sets the thresholds for all services that
monitor filesystems. To implement the above example, you would set the basic rule and
the exception rule from this rule set. To determine which thresholds are valid for a
particular file system, Checkmk goes in sequence through all of the rules valid for the
check. The first rule for which the condition applies sets the value — in this case, the
percentage value at which the file system check becomes WARN or CRIT.
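If you want to picture how this first-match evaluation works, the following small Python sketch may help. It is not Checkmk code: the rule format, host names and tags are invented for illustration, and only the fallback levels of 80% and 90% correspond to Checkmk's defaults for file systems.

# Conceptual sketch, not Checkmk's internal code: rules are checked in order,
# and the first rule whose condition matches supplies the value.
# Hosts, tags and the rule format here are invented for illustration only.
RULES = [
    # Exception rule first: one specific file system on one host
    ({"host": "srvora123", "mount": "/var/ora/db01"}, (96.0, 98.0)),
    # Basic rule: all file systems below /var/ora on production Oracle servers
    ({"tag": "prod-oracle", "mount_prefix": "/var/ora"}, (90.0, 95.0)),
]

DEFAULT_LEVELS = (80.0, 90.0)  # Checkmk's default WARN/CRIT levels for file systems


def thresholds_for(host: str, tags: set[str], mount: str) -> tuple[float, float]:
    """Return the (WARN, CRIT) levels of the first matching rule."""
    for condition, levels in RULES:
        if condition.get("host") not in (None, host):
            continue
        if condition.get("mount") not in (None, mount):
            continue
        if "tag" in condition and condition["tag"] not in tags:
            continue
        if "mount_prefix" in condition and not mount.startswith(condition["mount_prefix"]):
            continue
        return levels          # first match wins, later rules are ignored
    return DEFAULT_LEVELS      # no rule matched


print(thresholds_for("srvora123", {"prod-oracle"}, "/var/ora/db01"))  # (96.0, 98.0)
print(thresholds_for("srvora042", {"prod-oracle"}, "/var/ora/db02"))  # (90.0, 95.0)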
3. Finding rules
You have various options for accessing the rule sets in Checkmk.
On the one hand, you can find the rule sets in the Setup menu under the topics of the
objects for which there are rule sets (Hosts, Services and Agents) in different categories.
For services, there are the following rule set entries: Service monitoring rules, Discovery
rules, Enforced services, HTTP, TCP, Email, … and Other services. If you select one of
these entries, the associated rule sets will be listed on the main page. Depending on the entry, this may be only a handful of rule sets or a great many, as with the Service monitoring rules. You can therefore filter the results page using the Filter field in the menu bar.
If you are unsure in which category the rule set can be found, you can also search through
all rules in one go, either by using the search field in the setup menu or by opening the
rule search page via Setup > General > Rule search. We will take the latter route in the
following chapter, in which we will introduce the process of rule creation.
With the large number of rule sets available, it is not always easy to find the right one,
with or without a search. However, there is another way that you can access the
appropriate rules for an existing service. In a view that includes the service, open the service's menu and select the Parameters for this service entry:
You will receive a page from which you can access all of the rule sets for this service:
In the first box entitled Check origin and parameters, the Filesystems (used space and
growth) entry takes you directly to the set of rules for the file system monitoring
thresholds. However, you can see in the overview that Checkmk has already set default
values, so you only need to create a rule if you want to modify those defaults.
4. Creating rules
What does a rule look like in practice? The best way to start is to formulate the rule you
want to implement in a sentence, like this: 'On all production Oracle servers, tablespaces
DW20 and DW30 will be WARN at a 90% fill level and CRIT at 95%'.
You can then search for an appropriate rule set — in this example via the rule search:
Setup > General > Rule search. This opens a page in which you can search for 'Oracle' or
for 'tablespace' (case-insensitive) and find all of the rule sets that contain this text in their
name or in their description (not shown here):
The rule set Oracle Tablespaces is found in two categories. The number following the title
(here 0 in each case) shows the number of rules that have already been created from this
rule set. If you click on the name in the Service monitoring rules category, you will land in
the overview page for this rule set:
This rule set does not yet contain any rules. You can create the first one with the Create
rule in folder button. In doing so, you will already be defining the first part of the rule’s
condition, namely in which folder it is to apply. If you change the Main directory default
setting to Windows, for example, the new rule only applies to hosts that are directly in or
are below the Windows folder.
Creating — and later editing — this rule opens an input form with three boxes: Rule
Properties, Value and Conditions. We will look at each of these three in turn.
In the Rule Properties box, all entries are optional. In addition to the informative texts,
here you also have the option of temporarily deactivating a rule. This is practical because
you can sometimes avoid deleting and recreating a rule if you temporarily do not need it.
What you find in the Value box depends in each specific case on the content of what is
being regulated:
As you can see, this can be quite a number of parameters. The example shows a typical
case — each individual parameter can be activated by a checkbox, and the rule will then
only apply to this parameter. You can, for example, let another parameter be determined
by a different rule if that simplifies your configuration. In this example, only the threshold
values for the percentage of free space in the tablespace will be defined.
The Conditions box looks a little more confusing at first glance:
In this example we will only go into the parameters that we absolutely need for defining this
rule:
You have already selected the folder when creating the rule, but here you can change it
again if required.
The Host tags are a very important feature in Checkmk, so we will be devoting a separate
chapter to them right after this chapter. At this point, you use one of the predefined host
tags to specify that the rule should only apply to production systems. First select the host
tag group Criticality from the list and then click Add tag condition and select the value
Productive system.
Very important in this example are the Explicit Tablespaces, which restrict the rule to very
specific services. Two points are important here:
The name of this condition adapts to the rule type. If it says Explicit Services, specify
the names of the services concerned. For example, one such could be Tablespace
DW20 — that is, including the word Tablespace . In the example shown, however,
Checkmk only wants to know the name of the tablespace itself, thus DW20 .
The entered texts are always matched against the beginning of the name. Entering DW20 therefore also matches a fictitious tablespace DW20A . If you want to prevent this, append the $ character to the end, i.e. DW20$ , because these entries are so-called regular expressions (see the sketch below).
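If you would like to verify this matching behaviour yourself, the following short Python snippet reproduces it with the standard re module. It is only an illustration of regular expressions anchored at the beginning, not Checkmk code.

import re

# Conditions on service or item names behave like regular expressions that are
# matched against the beginning of the name.
print(bool(re.match("DW20", "DW20A")))    # True:  "DW20" also matches "DW20A"
print(bool(re.match("DW20$", "DW20A")))   # False: "$" anchors the match at the end
print(bool(re.match("DW20$", "DW20")))    # True:  only the exact name matches now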
Note: A detailed description of all of the other parameters and a detailed explanation of
the important concept of rules can be found in the article on rules. By the way, you can
find out more about the service labels, the last parameter in the image above, in the article
on labels.
After all entries for the definition are complete, save the rule with Save. After saving, there
will be exactly one new rule in the rule set:
Tip: If you later work with hundreds of rules rather than just one, there is a danger of losing the overall view. To help you keep that overview, Checkmk provides very helpful entries in the Related menu on every page that lists rules. With these you can display the rules used in the current site (Used rulesets) and, similarly, those that are not used at all (Ineffective rules).
5. Host tags
In the previous chapter we saw an example of a rule that should only apply to production
systems. More specifically, in that rule we defined a condition using the Productive
system host tag. Why did we define the condition as a tag and not simply set it for the
folder? Well, you can only define a single folder structure, and each host can only be in
one folder. But a host can have many different tags, and the folder structure is simply too
limited and not flexible enough for that.
In contrast, you can assign host tags to the hosts as freely and arbitrarily as you like —
regardless of the folder in which the hosts are located. You can then refer to these tags in
your rules. This makes the configuration not only simpler, but also easier to understand
and less error-prone than if you were to define everything explicitly for each host.
But how and where do you define which hosts should have which tag? And how can you
define your own customised tags?
For the definition of host tag groups, see the Setup menu: Setup > Hosts > Tags:
As can be seen, some tag groups have already been predefined. You cannot change most
of these. We also recommend that you do not touch the two predefined example groups
Criticality and Networking Segment. It is better to define your own groups:
Click Add tag group. This opens the page for creating a new tag group. In the first box
Basic settings you assign — as so often in Checkmk — an internal ID that serves as a key
and which cannot be changed later. In addition to the ID, you define a descriptive title,
which you can change at any later time. With Topic you can determine where the tag will
be offered later in the properties for the host. If you create a new topic here, the tag will be
displayed in a separate box in the host properties.
The second box Tag choices is about the actual tags, i.e. the selection options in the group.
Click Add tag choice to create a tag and assign an internal ID and a title for each tag:
Notes:
Groups with only one selection are also allowed and can even be useful. These then
appear as checkboxes. Each host will then have the tag — or not.
At this point, you can ignore the auxiliary tags for now. You can get all the
information on auxiliary tags in particular and on host tags in general in the article
on rules.
Once you have saved this new host tags group with Save, you can start using it.
Now that you have learned the important principles of configuration with rules and host
tags, in the remaining chapters we would like to give you some practical guidelines on
how to reduce false alarms in a new Checkmk system.
By default, Checkmk uses the thresholds 80% for WARN and 90% for CRIT for the fill level of file systems. On a 2 TByte hard disk, 80% used still leaves 400 GByte free, which is perhaps a rather large buffer for a warning. So here are a few tips on the subject of file systems:
Create your own rules in the Filesystems (used space and growth) rule set.
The parameters allow thresholds that depend on the size of the file system. To do this, select Levels for filesystems > Levels for filesystem used space > Dynamic levels. With the Add new element button you can then define your own threshold values per disk size (a sketch of the idea follows after these tips).
It is even easier with the magic factor, which we will introduce in the final chapter.
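To illustrate the idea behind such dynamic levels, here is a small Python sketch. The size brackets and percentages are invented for the example, and the data structure is not the one Checkmk uses internally.

# Hypothetical sketch of size-dependent levels, not Checkmk's implementation.
# Each entry: (minimum file system size in GB, (WARN %, CRIT %)).
DYNAMIC_LEVELS = [
    (1000.0, (92.0, 96.0)),   # 1 TB and larger: warn late, there is plenty of room
    (100.0,  (87.0, 93.0)),   # 100 GB and larger
    (0.0,    (80.0, 90.0)),   # everything smaller: the usual defaults
]


def levels_for(size_gb: float) -> tuple[float, float]:
    """Return the WARN/CRIT percentages of the first (largest) matching bracket."""
    for min_size, levels in DYNAMIC_LEVELS:   # ordered from large to small
        if size_gb >= min_size:
            return levels
    return DYNAMIC_LEVELS[-1][1]


print(levels_for(2000.0))  # (92.0, 96.0) for a 2 TB file system
print(levels_for(50.0))    # (80.0, 90.0) for a small one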
In the Checkmk Raw Edition you first define a time period that covers the
times of the reboot. You can find out how to do this in the article on time periods. Then
create a rule in each of the rule sets Notification period for hosts and Notification period
for services for the affected hosts and select the previously-defined time period there. The
second rule for the services is necessary so that any services that go to CRIT during this
time do not trigger a notification. If problems occur within this time frame — and are also
resolved within the same time frame — no notification will be triggered.
One option is to use scheduled downtimes for this purpose, which you can set for any affected hosts.
Tip: An alternative to creating downtimes for hosts, which we have already described in
the chapter on scheduled downtimes, is the Recurring downtimes for hosts rule set in the
Enterprise Editions. This has the great advantage that hosts that are added to the
monitoring later automatically receive these scheduled downtimes.
You can tell Checkmk that it is perfectly OK for a host to be powered off. To do this, find
the Host check command rule set and set its value to Always assume host to be up:
In the Conditions box, make sure that this rule is really only applied to the appropriate
hosts — depending on the structure you have chosen. For instance, you can define a host
tag and use it here, or you can set the rule for a folder in which all of the printers are
located.
Now, all printers will always be displayed as UP — no matter what their actual status is.
However, the services of the printer will continue to be checked and any timeout would
result in a CRIT state. To avoid this as well, configure a rule for the affected hosts in the
Status of the Checkmk services rule set, in which you set timeouts and connection
problems to OK respectively:
9. Configuring switchports
If you monitor a switch with Checkmk, you will notice that during the service
configuration, a service is automatically created for each port that is UP at the time. This
is a sensible default setting for core and distribution switches — i.e. those to which only
infrastructure devices or servers are connected. However, for switches to which end
devices such as workstations or printers are connected, this leads on the one hand to
continuous notifications if a port goes DOWN, and on the other hand to new services
being continuously found because a previously unmonitored port goes UP.
Two approaches have proven successful for such situations. Firstly, you can restrict the
monitoring to the uplink ports. To do this, create a rule in the Disabled services rule set that excludes the other ports from monitoring.
However, the second method is much more interesting. With this method you monitor all
ports, but allow DOWN to be a valid state. The advantage is that you will have
transmission-error monitoring even for ports to which end devices are connected and can
thus very quickly detect bad patch cables or errors in auto-negotiation. To implement this
function, you need two rules:
The first rule set, Network interface and switch port discovery, defines the conditions
under which switch ports are to be monitored. Create a rule for the desired switches and
select whether individual interfaces (Configure discovery of single interfaces), or groups
(Configure grouping of interfaces) are to be discovered. Then, under Conditions for this
rule to apply > Match port states, activate 2 - down in addition to 1 - up:
In the service configuration of the switches, the ports with the DOWN state will now also
be presented, and you can add these to the list of monitored services. Before you activate
the change, you will still need the second rule that ensures that this state is evaluated as
OK.
This rule set is called Network interfaces and switch ports. Create a new rule, activate the Operational state option, deactivate Ignore the operational state below it, and then activate the 1 - up and 2 - down states for the Allowed operational states (and any other states as may be required).
It is much better to define rules according to which specific services will systematically
not be monitored. For this purpose there is the Disabled services rule set. Here you can,
for example, create a rule according to which file systems with the /var/test mount
point are by definition not to be monitored.
Tip: If you disable an individual service in the service configuration of a host by clicking on the corresponding icon, a rule is automatically created for the host in this very rule set. You can edit this rule manually and, for example, remove the explicit host name. The affected service will then be disabled on all hosts.
You can read more information about this in the article on configuring services.
Some measured values fluctuate strongly, and brief peaks above a threshold are often not a real problem. For this reason, quite a number of check plug-ins offer the option of averaging their metrics over a longer period of time before the thresholds are applied. An example of this is the rule set for CPU utilisation on non-Unix systems, called CPU utilization for simple devices, which provides the Averaging for total CPU utilization parameter:
If you activate this and enter 15 , the CPU load will first be averaged over a period of 15
minutes and only afterwards will the threshold values be applied to this average value.
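The effect of such averaging can be sketched in a few lines of Python. The window size, sample values and thresholds below are made up for the example; the code is not part of Checkmk.

from collections import deque

# Illustrative sketch, not Checkmk code: average the last N samples before
# comparing against the WARN/CRIT thresholds, so short peaks do not alert.
WINDOW = 15            # e.g. 15 one-minute samples, roughly 15 minutes of averaging
WARN, CRIT = 80.0, 90.0

samples: deque[float] = deque(maxlen=WINDOW)


def check(cpu_util: float) -> str:
    """Add a sample, then rate the average of the window instead of the raw value."""
    samples.append(cpu_util)
    avg = sum(samples) / len(samples)
    if avg >= CRIT:
        return "CRIT"
    if avg >= WARN:
        return "WARN"
    return "OK"


for value in [35.0, 40.0, 98.0, 99.0, 42.0, 38.0]:    # a short two-sample peak
    print(f"raw={value:5.1f}  state={check(value)}")  # stays OK despite the peak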
If you create a rule in the Maximum number of check attempts for service rule set and set its value to, say, 3 , a service that goes from OK to WARN, for example, will not yet trigger a notification and will not yet be displayed as a problem in the Overview. The intermediate state the service is now in is called a 'soft state'. Only when the state remains non-OK for three consecutive checks (with the default one-minute check interval, a total duration of just over two minutes) will a persistent problem be reported. Only such a hard state will trigger a notification.
This is admittedly not an attractive solution. You should always try to get to the root of
any problem, but sometimes things are just the way they are, and with the 'check
attempts' you at least have a viable way around such situations.
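As a rough illustration of the soft and hard state logic described above, here is a short Python sketch. It is a simplification, not how Checkmk implements the behaviour.

# Simplified sketch of soft and hard states with a maximum of 3 check attempts.
# Not Checkmk code: a notification is only triggered when a non-OK state has
# persisted for MAX_ATTEMPTS consecutive checks and thus becomes "hard".
MAX_ATTEMPTS = 3


def replay(results: list[str]) -> None:
    attempts = 0
    for state in results:
        attempts = attempts + 1 if state != "OK" else 0
        hard = state == "OK" or attempts >= MAX_ATTEMPTS
        notify = state != "OK" and attempts == MAX_ATTEMPTS
        print(f"{state:4}  {'hard' if hard else 'soft'} state  notify={notify}")


# The first two WARN results stay "soft"; the third becomes "hard" and notifies.
replay(["OK", "WARN", "WARN", "WARN", "WARN", "OK"])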
By default, the Check_MK Discovery service checks every two hours whether new (not yet monitored) services have been found or whether existing services have disappeared. If this is the case, the discovery service goes to WARN. You can then call up the service configuration (on the Services of host page) and bring the list of services back up to date.
Detailed information on this 'Discovery Check' can be found in the article on configuring
services. There you can also learn how you can have unmonitored services added
automatically, which makes the work in a large configuration much easier.
Tip: With Monitor > Analyze > Unmonitored services you can call up a view that shows
you any new or dropped services.