Continuous Monitoring Interview Questions
continuous audit
continuous controls monitoring
continuous transaction inspection
You can answer this question by first mentioning that Nagios is one of the monitoring
tools. It is used for continuous monitoring of systems, applications, services,
business processes, etc. in a DevOps culture. In the event of a failure, Nagios can alert
technical staff of the problem, allowing them to begin remediation before
outages affect business processes, end users, or customers. With Nagios, you don’t
have to explain why an unseen infrastructure outage affected your organization’s bottom
line.
Once you have defined what Nagios is, you can mention the various things that
you can achieve using Nagios.
By using Nagios you can:
This completes the answer to this question. Further details like advantages etc. can be
added as per the direction where the discussion is headed.
Q3. How does Nagios work?
I will advise you to follow the below explanation for this answer:
Nagios runs on a server, usually as a daemon or service. It periodically runs
plugins residing on the same server, and these plugins contact hosts or servers on
your network or on the internet. You can view the status information using the web
interface. You can also receive email or SMS notifications if something happens.
The Nagios daemon behaves like a scheduler that runs certain scripts at certain
moments. It stores the results of those scripts and will run other scripts if these results
change.
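If the interviewer asks you to make the scheduler idea concrete, a small sketch can help. The one below is only an illustration of the behaviour described above, not Nagios code: the names Check, run_due_checks and on_change are made up for this example.

```python
# Hypothetical sketch of "run scripts at certain moments, store the results,
# and run other scripts when the results change". Not Nagios source code.

class Check:
    def __init__(self, name, interval, probe):
        self.name = name
        self.interval = interval    # seconds between two runs
        self.probe = probe          # callable that returns a status string
        self.next_run = 0.0         # time at which the check is next due
        self.last_result = None     # stored result of the previous run


def run_due_checks(checks, now, on_change):
    """Run every check that is due and call on_change when a result differs."""
    for check in checks:
        if now < check.next_run:
            continue                                    # not due yet
        result = check.probe()
        if check.last_result is not None and result != check.last_result:
            on_change(check.name, check.last_result, result)
        check.last_result = result                      # store the result
        check.next_run = now + check.interval           # schedule the next run
```

In this sketch on_change plays the role of the "other scripts" (for example, a notification), and the caller decides how often to call run_due_checks.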
Now expect a few questions on Nagios components like Plugins, NRPE, etc.
Begin this answer by defining Plugins. They are scripts (Perl scripts, Shell scripts, etc.)
that can run from a command line to check the status of a host or service. Nagios uses
the results from Plugins to determine the current status of hosts and services on your
network.
Once you have defined Plugins, explain why we need them. Nagios will execute a
Plugin whenever there is a need to check the status of a host or service. The Plugin
performs the check and then simply returns the result to Nagios. Nagios processes the
results that it receives from the Plugin and takes the necessary actions.
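To show what "returns the result to Nagios" means in practice, you can sketch a tiny plugin. By the Nagios plugin convention, a plugin prints one line of status text and reports its result through the exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The disk check below is a minimal sketch under that convention; the function names and the 80/90 percent thresholds are assumptions chosen for illustration.

```python
import shutil

# Exit codes used by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_disk(used_percent, warn=80.0, crit=90.0):
    """Map a disk usage percentage to (exit_code, status_line).

    The warn/crit defaults are illustrative, not Nagios defaults.
    """
    if used_percent >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.1f}% used"
    if used_percent >= warn:
        return WARNING, f"DISK WARNING - {used_percent:.1f}% used"
    return OK, f"DISK OK - {used_percent:.1f}% used"


def main(path="/"):
    """Check one filesystem, print the status line, return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    if usage.total == 0:
        print("DISK UNKNOWN - filesystem reports zero size")
        return UNKNOWN
    code, message = evaluate_disk(usage.used * 100.0 / usage.total)
    print(message)
    return code
```

A real plugin script would finish by passing the return value of main() to sys.exit(), so that Nagios can read the status from the process exit code.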
For this answer, give a brief definition of NRPE. The NRPE addon is designed to allow
you to execute Nagios plugins on remote Linux/Unix machines. The main reason for
doing this is to allow Nagios to monitor “local” resources (like CPU load, memory usage,
etc.) on remote machines. Since these resources are not usually exposed to
external machines, an agent like NRPE must be installed on the remote Linux/Unix
machines.
I will advise you to explain the NRPE architecture on the basis of the diagram shown below.
The NRPE addon consists of two pieces:
In my opinion, the answer should start by explaining Passive checks. They are
initiated and performed by external applications/processes, and the Passive check
results are submitted to Nagios for processing.
Then explain the need for passive checks. They are useful for monitoring services that
are asynchronous in nature and cannot be monitored effectively by polling their status
on a regularly scheduled basis. They can also be used for monitoring services that are
located behind a firewall and cannot be checked actively from the monitoring host.
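If you are asked how an external application actually submits a passive result, you can mention the external command file. A passive service result is one line of the form [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output written to that file. The sketch below builds such a line; the helper names, the host name web01 and the service name backup_job are made up for illustration, and the location of the command file depends on your installation.

```python
import time


def format_passive_service_result(host, service, return_code, output,
                                  timestamp=None):
    """Build one external-command line that reports a passive service check."""
    if timestamp is None:
        timestamp = int(time.time())
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


def submit(command_file, line):
    """Write the line to the Nagios external command file.

    command_file is the path configured for your installation; it is
    passed in by the caller rather than assumed here.
    """
    with open(command_file, "w") as handle:
        handle.write(line)


# Hypothetical example: a backup job reports its own success (return code 0).
example = format_passive_service_result(
    "web01", "backup_job", 0, "Backup finished", timestamp=1700000000)
```

The return code uses the same values as a plugin exit code, so 0 here means OK.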
Make sure that you stick to the question during your explanation, so I will advise you to
follow the flow mentioned below. Nagios checks for external commands under the
following conditions:
For this answer, first point out the basic difference between Active and Passive checks. The
major difference is that Active checks are initiated
and performed by Nagios, while Passive checks are performed by external applications.
If your interviewer is looking unconvinced with the above explanation then you can also
mention some key features of both Active and Passive checks:
Passive checks are useful for monitoring services that are:
First mention what the main configuration file contains and its function. The main
configuration file contains a number of directives that affect how the Nagios daemon
operates. This config file is read by both the Nagios daemon and the CGIs (the CGI
configuration file specifies the location of your main configuration file).
Now you can tell where it is present and how it is created. A sample main configuration
file is created in the base directory of the Nagios distribution when you run the
configure script. The default name of the main configuration file is nagios.cfg. It is
usually placed in the etc/ subdirectory of your Nagios installation (i.e.
/usr/local/nagios/etc/).
I will advise you to explain Flapping first. Flapping occurs when a service or host
changes state too frequently, which causes a lot of problem and recovery notifications.
Once you have defined Flapping, explain how Nagios detects Flapping. Whenever
Nagios checks the status of a host or service, it will check to see if it has started or
stopped flapping. Nagios follows the below given procedure to do that:
Storing the results of the last 21 checks of the host or service
Analyzing the historical check results and determining where state changes/transitions occur
Using the state transitions to determine a percent state change value (a measure of change) for the host or service
Comparing the percent state change value against low and high flapping thresholds
A host or service is determined to have started flapping when its percent state change
first exceeds the high flapping threshold. A host or service is determined to have stopped
flapping when its percent state change drops below the low flapping threshold.
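The procedure above can be sketched in a few lines of code. This is a simplified illustration, not the Nagios implementation: it counts every state change equally, whereas the Nagios documentation describes giving more weight to recent changes. The class name FlapDetector and the threshold defaults of 20 and 30 percent are assumptions made for the example.

```python
from collections import deque


class FlapDetector:
    """Simplified flap detection over the last 21 check results."""

    def __init__(self, low=20.0, high=30.0, history=21):
        self.states = deque(maxlen=history)   # keeps only the newest results
        self.low = low                        # illustrative low threshold (%)
        self.high = high                      # illustrative high threshold (%)
        self.flapping = False

    def percent_state_change(self):
        """Share of possible transitions in the window that were changes."""
        states = list(self.states)
        changes = sum(1 for a, b in zip(states, states[1:]) if a != b)
        possible = self.states.maxlen - 1     # 21 results allow 20 changes
        return changes * 100.0 / possible

    def record(self, state):
        """Store a check result and update the flapping flag."""
        self.states.append(state)
        percent = self.percent_state_change()
        if not self.flapping and percent > self.high:
            self.flapping = True              # exceeded the high threshold
        elif self.flapping and percent < self.low:
            self.flapping = False             # dropped below the low threshold
        return self.flapping
```

Using two thresholds instead of one means a host that hovers around a single limit does not keep switching between flapping and not flapping.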
Q12. What are the three main variables that affect recursion
and inheritance in Nagios?
Name
Use
Register
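If the interviewer wants more than the three names, you can add that name gives a template its name, use names the template an object inherits from, and register says whether the definition is a real object or only a template. The sketch below shows how such inheritance could be resolved. It is an illustration only: the function resolve, the plain dictionaries and the sample templates are hypothetical, and the rule that the first template listed in use wins is applied as an assumption about how multiple templates are combined.

```python
# These three variables control inheritance and are not passed on themselves.
CONTROL_KEYS = {"name", "use", "register"}


def resolve(definition, templates, _chain=()):
    """Return the definition with all values inherited through 'use' filled in."""
    merged = {}
    parents = [p.strip() for p in definition.get("use", "").split(",")
               if p.strip()]
    # Apply templates in reverse so that the first one listed wins.
    for parent_name in reversed(parents):
        if parent_name in _chain:
            raise ValueError(f"circular template reference: {parent_name}")
        inherited = resolve(templates[parent_name], templates,
                            _chain + (parent_name,))
        merged.update({key: value for key, value in inherited.items()
                       if key not in CONTROL_KEYS})
    merged.update(definition)   # local values override inherited ones
    return merged


# Hypothetical templates (register 0 marks them as templates only).
templates = {
    "generic-host": {"name": "generic-host", "register": "0",
                     "check_interval": "5", "max_check_attempts": "3"},
    "linux-host": {"name": "linux-host", "use": "generic-host",
                   "register": "0", "check_interval": "10"},
}
host = {"host_name": "web01", "use": "linux-host", "max_check_attempts": "5"}
resolved = resolve(host, templates)
```

Here web01 takes check_interval from linux-host and keeps its own max_check_attempts, because a locally defined value overrides an inherited one.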
I will advise you to first give a small introduction on State Stalking. It is used for
logging purposes. When Stalking is enabled for a particular host or service, Nagios will
watch that host or service very carefully and log any changes it sees in the output of
check results.
Depending on the discussion between you and the interviewer you can also add, “It can be
very helpful in later analysis of the log files. Under normal circumstances, the result of
a host or service check is only logged if the host or service has changed state since it
was last checked.”
The node itself may be up, but because Nagios is unable to connect to it, it has to mark
the node as unreachable. To achieve this, Nagios uses parent-child relationships between
components.
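A short sketch makes the parent-child idea easier to explain. The rule illustrated here is that a host whose check fails is DOWN if at least one of its parents is UP (or it has no parents), and UNREACHABLE if every parent is itself DOWN or UNREACHABLE. The function name, the dictionaries and the host names are hypothetical.

```python
def classify_host(host, check_ok, parents):
    """Return UP, DOWN or UNREACHABLE for one host.

    check_ok maps each host to True/False (did its own check succeed);
    parents maps each host to the list of hosts it is reached through.
    """
    if check_ok[host]:
        return "UP"
    host_parents = parents.get(host, [])
    if not host_parents:
        return "DOWN"                       # nothing in between to blame
    if any(classify_host(p, check_ok, parents) == "UP" for p in host_parents):
        return "DOWN"                       # the path is fine, the host is not
    return "UNREACHABLE"                    # every path to the host is broken


# Hypothetical topology: Nagios -> switch -> router -> web01
parents = {"switch": [], "router": ["switch"], "web01": ["router"]}
check_ok = {"switch": True, "router": False, "web01": False}
```

With this data the router is DOWN, because its parent switch is UP, while web01 is only UNREACHABLE, because the one path to it goes through the failed router.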
The current state of monitored services and hosts is determined by two components:
The status of the service or host, i.e. OK, WARNING, UP, DOWN, etc.
The type of state the service or host is in.
There are two types of states: SOFT states and HARD states.
Now explain what Soft and Hard states are:
A Soft Error occurs when a service or host check results in a non-OK or non-UP
state and the check has not yet been rechecked the number of times specified by
the max_check_attempts directive in the service or host definition. When a
service or host recovers from a Soft Error, that is considered a Soft
Recovery.
A Hard Error occurs when a service or host check results in a non-OK or non-UP
state and the check has been rechecked the number of times specified by the
max_check_attempts directive in the service or host definition. When a service
or host recovers from a Hard Error, that is considered a Hard
Recovery.
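The Soft and Hard state rules can be shown with a small state tracker. This is a simplified sketch for a service, not the Nagios state machine: the class name ServiceState and the returned event strings are made up for the example, and the default of 3 for max_check_attempts is only an illustrative value.

```python
class ServiceState:
    """Tracks whether a service problem is still SOFT or already HARD."""

    def __init__(self, max_check_attempts=3):
        self.max_check_attempts = max_check_attempts
        self.attempt = 0            # consecutive non-OK results so far
        self.status = "OK"
        self.state_type = "HARD"

    def record(self, status):
        """Store one check result and return the event it causes, if any."""
        if status == "OK":
            event = None
            if self.status != "OK":
                # Recovery type depends on how far the problem had progressed.
                event = ("SOFT RECOVERY" if self.state_type == "SOFT"
                         else "HARD RECOVERY")
            self.status, self.state_type, self.attempt = "OK", "HARD", 0
            return event
        self.status = status
        self.attempt = min(self.attempt + 1, self.max_check_attempts)
        if self.attempt < self.max_check_attempts:
            self.state_type = "SOFT"    # still being rechecked
            return "SOFT ERROR"
        self.state_type = "HARD"        # rechecks exhausted
        return "HARD ERROR"
```

The point to make in the interview is that the problem only becomes a Hard Error once the rechecks are used up, which keeps a single failed check from being treated as a confirmed outage.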