Lab Guide - ITSI Search Party
Ready to get some hands-on experience with Splunk and IT Service Intelligence? We're
going to create our own Service, KPIs, and Glass Table today! But first, a quick
overview of what we're doing…
This search returns a list of sources and sourcetypes for the data already ingested into
Splunk.
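If you'd like to run something similar yourself, a minimal equivalent (a sketch; the exact
search in the lab screenshot may differ) is:

    | tstats count WHERE index=* BY source, sourcetype

Run it over a short time range such as "Last 24 hours" to keep it quick.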
As a reminder, Splunk can automatically identify fields when data is searched. You’ll
find these fields listed on the left side of the search screen. Mouse over a field to see a
pop-up of the top 10 values for that field. Select one for filtering or reporting options.
The fundamental fields that will always be available are: time, source,
sourcetype, and host. Not all of these will be relevant or useful to our Services today,
but it's good to know your data. You've been provided a cheat sheet below that summarizes
the sourcetypes.
Sourcetypes in today’s Splunk instance:
WinEventLog:Security (Windows Log Events)
  Fields: Failure_Reason, tag
  Hosts: dc-01

apache:access (Web Server Logs)
  Fields: action, bytes, bytes_in, bytes_out, response_time, status
  Hosts: webserver-01, webserver-02

aws:cloudfront (AWS CloudFront)
  Fields: bytes, bytes_in, bytes_out, status
  Hosts: splunk_sh-01

f5:bigip:ltm:locallb:pool:icontrol (Load Balancer Transport Layer Info)
  Fields: current_conns, throughput, total_conns
  Hosts: splunk_sh-01

f5:bigip:system:statistics:icontrol (Load Balancer Request-level Info)
  Fields: 2xx_codes, 3xx_codes, 4xx_codes, 5xx_codes, get_requests, post_requests, requests
  Hosts: splunk_sh-01

newrelic:applications (APM)
  Fields: app_name, application_summary.error_rate, application_summary.response_time, application_summary.throughput, end_user_summary.response_time, end_user_summary.throughput
  Hosts: splunk_sh-01

snow:change_request (Configuration Changes)
  Fields: made_sla, phase
  Hosts: splunk_sh-01

stream:http (Application Performance)
  Fields: bytes, bytes_in, bytes_out, response_time, status
  Hosts: appserver-01, appserver-02, appserver-03

stream:mysql (DB Performance)
  Fields: bytes, query_time, response_time
  Hosts: appserver-01, appserver-02, appserver-03
Table of Contents:
Brief intro
Sourcetypes in today's Splunk instance
1. ITSI Service Review
2. Create a Metrics-based KPI: CPU Utilization: %
3. Create an ad-hoc KPI: Database Service Response Time
4. Exploring KPI Base searches
5. Leverage Base searches for an existing KPI
6. Leverage Base searches for a new KPI
7. Getting notified when there is a problem: Multi KPI Alerts
8. Deep Diving Your Data
9. Building Glass Tables
1. ITSI Service Review
Before we start configuring, let’s see which services are currently configured in our
instance.
This shows you a simple stoplight chart of the various services configured, with Service
Health Scores and Key Performance Indicators (KPIs).
It looks like someone started to create the Database Service, but it's missing some KPIs
that we would like to track. Let's finish the creation of the Database Service by editing
the service and adding some KPIs!
A KPI is a metric value combined with threshold levels corresponding to Normal,
Low, Medium, High, Critical, or Info.
2. Create a Metrics-based KPI: CPU Utilization: %
Note: In a production environment, you'd already have OS monitoring via a
Splunk-provided service template covering the usual KPIs for CPU, memory, disk, and
networking. Here, we're just using this KPI to demonstrate how to add a KPI.
We will now proceed to configure our first KPI! This KPI will be based on the CPU
utilization of our database servers, allowing us to track this KPI in ITSI.
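Behind the scenes, a metrics-based KPI is powered by an mstats-style search against a
metrics index. The index and metric names below are illustrative placeholders (the
wizard builds the real search for you once you pick the metric), but it looks roughly like this:

    | mstats avg(cpu.utilization) WHERE index=em_metrics span=5m BY host

When you configure the KPI in the steps that follow, you simply select the metric and
ITSI generates the equivalent search.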
Because we are working out of a shared instance for this search party, you will clone an
existing service for your own work.
1. Find Database in the list of services, select Database, then click Edit →
Clone
2. Click to edit your newly cloned database service. Going forward, when this Lab
Document references the ‘Database’ service, it is referring to the clone you just
created.
Click Next
Click Next and on the next screen, fill in options as shown here:
- Entity Split Field: host
- Filter to Entities in Service: Yes
- Entity Filter Field: host
- Entity Alias Filtering: host [Note: you can start typing 'host' to speed this up]
Next → Step 4 of 7
Next → Step 5 of 7
Unit: %
Next → Step 6 of 7
Next → Step 7 of 7
On the thresholds page, click “Add Threshold” three times. Set severity levels as
shown here:
You have just set the aggregate thresholds, or “what to look for across all hosts”. Next,
set per-host thresholds as well. Click the “Per Entity Thresholds” tab, then click “Apply
values”. Notice that when you do, you will see all the values for each host in this
service.
Click Finish.
How did ITSI know which hosts were in this Database service? Click the Entities tab
just to review. ITSI allows you to define which hosts, which entities apply to a service –
all in one place. Any KPIs using “Filter to entities in service” will use that list.
***Click the GREEN ‘Save’ button in the bottom right corner once completed!***
3. Create an ad-hoc KPI: Database Service Response Time
On the KPIs tab of the database service, click
New → Generic KPI
Next → Step 2 of 7
Last time, we used a metrics index for our search. Here, we create a KPI using a regular
Splunk search. This allows you to leverage virtually any log source, event source, or
even wire data to power your KPIs.
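As a sanity check, you can preview the kind of values such a KPI will consume with a
quick ad-hoc search (a sketch using the cheat-sheet fields, not necessarily the exact
search used in this step):

    sourcetype=stream:mysql | stats avg(response_time) AS avg_response_ms BY host

In the KPI itself, you point ITSI at the raw events, choose the threshold field, and let
ITSI handle the aggregation.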
Click Run Search, which opens a new tab. You should see a wealth of detail including
queries, response timing, and more. Now that you've verified your search, close this tab.
Next → Step 3 of 7
Leave defaults (No, No)
Next → Step 4 of 7
Leave defaults
Next → Step 5 of 7
Units: ms
Monitoring Lag: 30
Next → Step 6 of 7
Toggle the backfill switch, then leave defaults
Next → Step 7 of 7
Finish
***Click the GREEN ‘Save’ button in the bottom right corner once completed!***
4. Exploring KPI Base searches
A powerful feature of ITSI is the “Base Search”. This feature allows a single search, run
by Splunk, to provide multiple metrics to multiple services, all at once. By leveraging a
library of well-crafted Base Searches, administrators can create additional KPIs without
having to know advanced SPL.
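For example, a single base search over the database wire data could feed both a
response-time metric and an event-count metric. The sourcetype and field names here are
illustrative and may differ from the base search built in this lab, but conceptually it
behaves like:

    sourcetype=stream:mysql | stats avg(time_taken) AS response_time, count AS query_count BY host

ITSI runs the underlying search once on a schedule, and every KPI that references the
base search reuses the resulting metrics.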
Create
Click Run Search. By going back and forth between these two tabs, you can pick from
extracted metrics, and create metrics using the data as a reference.
Title: Response Time
Threshold Field: time_taken
Configure settings as shown here. While you are at it, look at the other options for the
aggregate calculation and other settings.
Click + Add
We will now explore a count-based KPI. While other KPIs leverage values reported into
the Splunk index, a count-based KPI analyzes the number of matching events in the
index. To do so, we must count a field that always exists within the data set. Here, we
use “host” for convenience.
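In plain SPL terms, the idea looks like this (a sketch over the stream:mysql data; the
base search does the equivalent work for you): every event carries a host value, so
counting host is the same as counting events per entity.

    sourcetype=stream:mysql | stats count(host) AS db_queries BY host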
Title: Number of DB queries
Threshold Field: host
Click Add
Click Save
5. Leverage Base searches for an existing KPI
With our new KPI base search, switching our previous ad-hoc KPI over to it will improve
search efficiency and upgrade our old “average” calculation to Perc95.
Configure → Services
Edit your “Database ____” service
Select the DB Response Time KPI
On the line showing the “Source” for this KPI, click Edit
Select the KPI base search you just created, and the Response Time metric.
Click Finish. You will get a warning like this, due to the change in calculation metric.
Click Yes.
6. Leverage Base searches for a new KPI
Now, we create a new KPI using your KPI base search.
Next → Step 2 of 7
Select the KPI base search you created, and the “Number of DB queries” metric
Next → Steps 3 - 5
Leave defaults (you will not be able to edit)
Next → Step 6
Toggle backfilling to last 7 days
Next → Step 7
Leave defaults
The Base Search provides the metric values, but the KPI itself provides the threshold
values. This allows multiple services to leverage the same base search, but have
different thresholds applied to different groups of hosts. Example: DB servers used by
the customer-facing online store might have one set of thresholds, while the HR
database might have the same metric – but different expected values.
For our database service, let’s say we want to track the number of requests – but not
use this as a factor in computing the health score. For this, set the base severity to
“Info”. (For OS health, this is typical for KPIs like network bandwidth.)
Finish
Click Save
7. Getting notified when there is a problem: Multi KPI Alerts
Let’s say we have a well-crafted service with appropriate KPIs. If there’s a problem with
the service, we want an alert to be generated. One alert with all of the related KPI
statuses, NOT one alert per KPI per host. Here's how:
Set calculation to Composite Score and change your time range to Relative, past 5
minutes.
Select your database service from the left column, then check each box of the KPIs
listed, ignoring any that are set to “Info”. Click “Add Selected” near the top of the
screen. Your settings should look something like this:
Now, let’s say that you want to get a single alert sent to the Notable Events framework,
if any of these KPIs shows a problem at Medium or above. The alert will include the
status of each KPI, all in the single alert.
To do this, we will change the weighting (via sliders) for each KPI, up to 11. That’s
right, Splunk goes to 11.
With everything set to 11, even if all other KPIs are currently “green”, that one KPI in the
red will result in a status of red. The worst KPI status of the group will be used for the
status of the group. (Info status values are ignored.)
SAVE!
The Create Correlation Search - Step 1 window will pop up. Customize the alert metrics
based on the fields depicted in the image below.
In the Create Correlation Search - Step 2 window, customize the correlation search.
Great work on getting to this point! Let’s do a quick recap of what we’ve done thus far.
- Created multiple KPIs to finish our database service dependencies
- Adjusted the health settings of our database service to fit our needs
- Used a correlation search to notify us when our database service health
diminishes
We will dive into generating a Deep Dive view next!
8. Deep Diving Your Data
Deep Dive provides a swimlane view of KPIs against time. This enables analysts and
engineers to do what they do best: spot patterns visually. Let's dive (pardon
the pun) straight into it!
Title: Database##
Permissions: Shared in App
Click Create
Click on the Deep Dive link you just created (e.g., database##)
Click the “+ Add Lane” dropdown and select “Add KPI Lane”
Service: Database##
KPI: ServiceHealthScore
Click Create Lane
Please repeat the prior steps for the remaining KPIs within your Database## service
Once you’ve completed the configurations, you should see a Deep Dive view similar to
the one displayed below.
Feel free to rearrange the lanes. Based on the data in the lanes, can you
describe why the Service Health Score is dropping for our Database Service?
9. Building Glass Tables
In this section, we will CLONE the Digital Transaction Flow glass table and apply the
new database service we created.
Find the existing 4 - Digital Transaction Flow dashboard from the list.
Click Edit → Clone
Title: 4 - Digital Transaction Flow 00
Permissions: Shared in App
Click Clone
Click on the new glass table you just created (4 - Digital Transaction Flow ##)
Notice that the Database Service is greyed out and set to N/A
We will now proceed to replace the Database block with our newly created service.
Click Edit
In the Services sidebar on the left, identify the service you created. Click on the “>”
next to your service to expand it: > Database##
Once the service is expanded, you will see the list of KPIs and aggregated health
scores you have created in the previous steps (see below).
Drag and drop the ServiceHealthScore onto the Glass Table canvas. A new widget
appears with the right panel “Configurations” populated. Screen below cropped for ease
of reading.
Alter the Configurations sidebar for this widget to reflect the following:
● Viz Type: Single Value
● Width: 180
● Height: 180
● Label Box: Off
● Thresholds: On
● Drilldown: On
○ Change Default to Saved Deep Dive
○ Find the name of the Deep Dive you created earlier (e.g., database##)
Click Update
Drag and drop the widget to cover the grey N/A box under DATABASE.
Click Save
Click View
Congratulations! You have just created and altered your first Glass Table! You can
now click into the Service Health Score widget you just created to drill down into the
Deep Dive View you created in prior steps.