Lab Guide - ITSI Search Party

Uploaded by

Victor

Brief intro:

Ready to get some hands-on experience with Splunk and IT Service Intelligence? We’re
going to create our own Service, KPIs, and Glass Table today! But first, a quick
overview of what we’re doing…

Let’s see what data we have in our Splunk system:


1. Navigate to the Search app
2. Select Apps > Search & Reporting
3. In the Search bar, type the following search string: index=*
4. Set the time range to Last 60 minutes
5. Click the search icon

This search returns every event from the last 60 minutes, which lets you review the
sources and sourcetypes of the data already ingested into Splunk.

As a reminder, Splunk can automatically identify fields when data is searched. You’ll
find these fields listed on the left side of the search screen. Mouse over a field to see a
pop-up of the top 10 values for that field. Select one for filtering or reporting options.

The default fields that are always available are _time, source, sourcetype, and host.
Not all of these will be relevant or useful to our Services today, but it’s good to
know your data. You’ve been provided a cheat sheet, below, that summarizes the
sourcetypes.

Sourcetypes in today’s Splunk instance:

| Sourcetype | Interesting Fields | Host | Type of Data |
|---|---|---|---|
| WinEventLog:Security | Failure_Reason, tag | dc-01 | Windows Log Events |
| apache:access | Action, bytes, bytes_in, bytes_out, response_time, status | webserver-01, webserver-02 | Web Server Logs |
| aws:cloudfront | Bytes, bytes_in, bytes_out, status | splunk_sh-01 | AWS CloudFront |
| f5:bigip:ltm:locallb:pool:icontrol | Current_conns, throughput, total_conns | splunk_sh-01 | Load Balancer Transport Layer Info |
| f5:bigip:system:statistics:icontrol | 2xx_codes, 3xx_codes, 4xx_codes, 5xx_codes, get_requests, post_requests, requests | splunk_sh-01 | Load Balancer Request-level Info |
| mint:network | Carrier, device, statusCode, url, user | splunk_sh-01 | Mobile Device Data |
| newrelic:applications | App_name, application_summary.error_rate, application_summary.response_time, application_summary.throughput, end_user_summary.response_time, end_user_summary.throughput | splunk_sh-01 | APM |
| snow:change_request | Made_sla, phase | splunk_sh-01 | Configuration Changes |
| stream:http | Bytes, bytes_in, bytes_out, response_time, status | appserver-01, appserver-02, appserver-03 | Application Performance |
| stream:mysql | Bytes, query_time, response_time | appserver-01, appserver-02, appserver-03 | DB Performance |

Table of Contents:

Brief intro
Sourcetypes in today’s Splunk instance
Table of Contents
1. ITSI Service Review
2. Create a Metrics-based KPI: CPU Utilization: %
3. Create an ad-hoc KPI: Database Service Response Time
4. Exploring KPI Base searches
5. Leverage Base searches for an existing KPI
6. Leverage Base searches for a new KPI
7. Getting notified when there is a problem: Multi KPI Alerts
8. Deep Diving Your Data
9. Building Glass Tables

1. ITSI Service Review

Before we start configuring, let’s see which services are currently configured in our
instance.

From the App menu, select IT Service Intelligence

This shows you a simple stop light chart of the various services configured with Service
Health Scores and Key Performance Indicators (KPIs).

Let’s see if there is an existing Database service in ITSI:


In the Filter Services search bar, start typing Database. When you see Database in the
list, click that line. Next, click the Database tile.

It looks like someone started to create the Database Service, but it’s missing some KPIs
that we would like to track. Let’s finish the creation of the Database Service by editing
the service and adding some KPIs!

A KPI is a metric value combined with threshold levels that correspond to the severities
Normal, Low, Medium, High, Critical, or Info.
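
To make the idea of "metric value plus threshold levels" concrete, here is a minimal Python sketch. It is not how ITSI is implemented; the function name and the numeric cut-offs are assumptions chosen for illustration, and only the severity names come from this guide.

```python
# Severity names come from the lab text; the numeric cut-offs are assumed.
# Each entry is (lower bound, severity). A value gets the severity of the
# highest bound it reaches, and the base severity if it reaches none.
ASSUMED_CPU_THRESHOLDS = [
    (95.0, "Critical"),
    (85.0, "High"),
    (70.0, "Medium"),
]


def severity_for(value, thresholds, base_severity="Normal"):
    """Return the severity label for a KPI value (hypothetical helper)."""
    # Check the highest bound first so the most severe match wins.
    for lower_bound, severity in sorted(thresholds, reverse=True):
        if value >= lower_bound:
            return severity
    return base_severity


if __name__ == "__main__":
    for cpu in (42.0, 72.5, 97.1):
        print(cpu, severity_for(cpu, ASSUMED_CPU_THRESHOLDS))
```

The same metric value can map to different severities simply by changing the threshold list, which is why thresholds are configured per KPI.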

2. Create a Metrics-based KPI: CPU Utilization: %
Note: In a production environment, you would already monitor the OS with a
Splunk-provided service template that covers the usual CPU, memory, disk, and network
KPIs. Here, we use this KPI only to demonstrate how to add a KPI.

We will now configure our first KPI! It is based on the CPU utilization of our database
servers, which lets us track that utilization in ITSI.

From the ITSI Application, go to Configure > Services

Because we are working out of a shared instance for this search party, you will clone an
existing service for your own work.

1. Find Database in the list of services, select it, then click Edit → Clone

Name the clone according to your username.

Example: Database-jsmith

2. Click to edit your newly cloned Database service. Going forward, when this lab
guide references the ‘Database’ service, it means the clone you just created.

3. In your new service, select New > Generic KPI

Title: CPU Utilization: %
Description: The percentage of CPU being utilized

Click Next

Here, we create a KPI using values from a Splunk metrics index.

KPI Source: Metrics Search
Select a metrics index: em_metrics
Select a metric: cpu.system

Click Next and, on the next screen, fill in the options as shown here:

Split by Entity: Yes
- Entity Split Field: host
Filter to Entities in Service: Yes
- Entity Filter Field: host
Entity Alias Filtering: host [Note: you can start typing ‘host’ to speed this up]

Next → Step 4 of 7

Set as pictured above!

Next → Step 5 of 7

Unit: %

Next → Step 6 of 7

Toggle Backfill to last 7 days

Next → Step 7 of 7
On the thresholds page, click “Add Threshold” three times. Set the severity levels as
shown here:

You have just set the aggregate thresholds, or “what to look for across all hosts”. Next,
set per-host thresholds as well. Click the “Per Entity Thresholds” tab, then click “Apply
values”. Notice that when you do, you will see all the values for each host in this
service.

Click Finish.

How did ITSI know which hosts were in this Database service? Click the Entities tab
to review. ITSI lets you define which hosts (entities) belong to a service, all in one
place. Any KPI using “Filter to Entities in Service” will use that list.
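
The following Python sketch illustrates, under stated assumptions, what "Filter to Entities in Service" and "Split by Entity" mean for a KPI: only hosts in the service's entity list contribute, and values are computed both per host and across all hosts. The host names, sample values, and the use of a plain average are hypothetical; ITSI performs this work in its own searches.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples of (host, cpu.system value); host names are illustrative.
samples = [
    ("db-a", 40.0), ("db-a", 60.0),
    ("db-b", 90.0), ("db-b", 70.0),
    ("web-x", 10.0),  # this host is not an entity of the service
]
service_entities = {"db-a", "db-b"}  # assumed entity list of the service


def kpi_values(samples, entities):
    """Return (aggregate average, per-entity averages) for in-service hosts."""
    per_host = defaultdict(list)
    for host, value in samples:
        if host in entities:              # filter to entities in the service
            per_host[host].append(value)  # split by entity on the host field
    per_entity = {host: mean(vals) for host, vals in per_host.items()}
    in_service = [v for vals in per_host.values() for v in vals]
    return mean(in_service), per_entity


aggregate, per_entity = kpi_values(samples, service_entities)
print(aggregate, per_entity)
```

Aggregate thresholds would be evaluated against the single combined value, and per-entity thresholds against each host's own value.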

***Click the GREEN ‘Save’ button in the bottom right corner once completed!***

3. Create an ad-hoc KPI: Database Service Response Time
On the KPIs tab of the Database service, click New → Generic KPI

Title: DB Response Time
Description: Response time in milliseconds

Next → Step 2 of 7

Last time, we used a metrics index for our search. Here, we create a KPI using a regular
Splunk search. This allows you to leverage virtually any log source, event source, or
even wire data for powering your KPIs.

KPI Source: Ad Hoc Search
Search: sourcetype=stream:mysql query=*
Threshold Field (type in): time_taken

Click Run Search, which opens a new tab. You should see a wealth of detail, including
queries, response timing, and more. Now that you have verified your search, close this tab.

Next → Step 3 of 7
Leave defaults (No, No)

Next → Step 4 of 7
Leave defaults

Next → Step 5 of 7

Units: ms
Monitoring Lag: 30

Next → Step 6 of 7
Toggle the backfill switch, then leave defaults

Next → Step 7 of 7

Finish

***Click the GREEN ‘Save’ button in the bottom right corner once completed!***

4. Exploring KPI Base searches
A powerful feature of ITSI is the “Base Search”. This feature allows a single search, run
by Splunk, to provide multiple metrics to multiple services, all at once. By leveraging a
library of well-crafted Base Searches, administrators can create additional KPIs without
having to know advanced SPL.
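
As a rough illustration of why one shared search is efficient, the Python sketch below computes several metrics from a single list of events instead of fetching the events once per metric. The event values, metric names, and the helper function are hypothetical; this models the idea only, not ITSI's implementation.

```python
from statistics import mean

# Hypothetical events standing in for the results of one base search.
events = [
    {"host": "db-a", "time_taken": 120},
    {"host": "db-a", "time_taken": 80},
    {"host": "db-b", "time_taken": 100},
]

# Each metric names a threshold field and an aggregate; names are illustrative.
metric_definitions = {
    "avg_time_taken": ("time_taken", mean),
    "max_time_taken": ("time_taken", max),
    "event_count": ("host", len),
}


def run_base_search(events, definitions):
    """Compute every defined metric from one shared list of events."""
    results = {}
    for name, (field, aggregate) in definitions.items():
        values = [event[field] for event in events if field in event]
        results[name] = aggregate(values)
    return results


metrics = run_base_search(events, metric_definitions)
print(metrics)
```

Adding another KPI then means adding one more entry to the metric definitions rather than writing and scheduling another search.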

Configure → KPI Base Searches

Create KPI Base Search (green box, upper right corner)

Be sure to title your KPI base search according to your username.

Title: DB Metrics - jsmith

Create

Search: Ad Hoc Search

sourcetype=stream:mysql query=*

Click Run Search. By switching between the search results tab and the Base Search
Editor tab, you can pick from the extracted fields and create metrics using the data
as a reference.

On the Base Search Editor tab, click Add Metric


We already created an ad-hoc KPI for time taken, but here we’ll improve on this
approach in two ways. First, putting it into a base search makes the metric available
to others without their having to search the raw data. Second, we will use a more
robust calculation than a plain average. Our first metric will show whether the bulk of
queries are taking longer than expected; it ignores the slowest 5% as outliers.
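
The Python sketch below shows why a 95th-percentile value is less sensitive to outliers than an average. It uses the simple nearest-rank definition of a percentile, which is an assumption for illustration; Splunk's own percentile functions may interpolate or approximate differently. The query times are made up.

```python
import math
from statistics import mean


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical query times in ms: 19 ordinary queries and one extreme outlier.
query_times = [100] * 19 + [10_000]

average = mean(query_times)                     # pulled up by the single outlier
p95 = percentile_nearest_rank(query_times, 95)  # reflects the typical query

print(average, p95)
```

With these made-up numbers the average is several times the typical query time, while the 95th percentile still reports the typical value.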

Title: Response Time
Threshold Field: time_taken

Configure settings as shown here. While you are at it, look at the other options for the
aggregate calculation and other settings.

Click + Add

We will now explore a count-based KPI. While other KPIs use values reported into
the Splunk index, a count-based KPI analyzes the number of matching events in the
index. To do so, we must count a field that always exists within the data set. Here, we
use “host” for convenience.

Title: Number of DB queries
Threshold Field: host

Create a metric for host using the info pictured above.
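
To see why the counted field must always exist, consider this small Python sketch. The events and the helper are hypothetical; the point is only that counting a field that is missing from some events undercounts them.

```python
# Hypothetical events; "query" is deliberately missing from one of them.
events = [
    {"host": "db-a", "query": "SELECT 1"},
    {"host": "db-a", "query": "SELECT 2"},
    {"host": "db-b"},
]


def count_field(events, field):
    """Count the events in which the given field is present."""
    return sum(1 for event in events if field in event)


print(count_field(events, "host"))   # every event carries a host
print(count_field(events, "query"))  # events without the field are not counted
```

Because host is present on every event, counting it gives the true number of matching events.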

Click Add

Click Save

5. Leverage Base searches for an existing KPI
Switching our previous ad-hoc search to the new KPI base search will improve search
efficiency and replace our old “average” calculation with Perc95.

Configure → Services
Edit your “Database ____” Service
Select the DB Response Time KPI
On the line showing the “Source” for this KPI, click Edit

Select the KPI base search you just created, and the Response Time metric.

Click Finish. You will get a warning like this, due to the change in calculation metric.

Click Yes.

That was easy. No typing at all.

6. Leverage Base searches for a new KPI
Now, we create a new KPI using your KPI base search.

Edit your “Database ____” Service
New → Generic KPI

Title: Database Service Requests

Next → Step 2 of 7
Select the KPI base search you created, and the “Number of DB queries” metric

Next → Steps 3 - 5
Leave defaults (you will not be able to edit)

Next → Step 6
Toggle backfilling to last 7 days

Next → Step 7
Leave defaults

The Base Search provides the metric values, but the KPI itself provides the threshold
values. This allows multiple services to leverage the same base search, but have
different thresholds applied to different groups of hosts. Example: DB servers used by
the customer-facing online store might have one set of thresholds, while the HR
database might have the same metric – but different expected values.
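
A short Python sketch of that separation: one shared metric value, classified against two different threshold sets. The threshold numbers and variable names are assumptions for illustration only.

```python
def classify(value, thresholds):
    """Return the label of the highest threshold the value reaches, else 'Normal'."""
    label = "Normal"
    for lower_bound, severity in sorted(thresholds):
        if value >= lower_bound:
            label = severity
    return label


# Assumed thresholds (ms) for two hypothetical services sharing one base search.
store_thresholds = [(200, "Medium"), (400, "High"), (800, "Critical")]
hr_thresholds = [(1000, "Medium"), (2000, "High"), (4000, "Critical")]

shared_metric_value = 500  # one value produced by the shared base search

print(classify(shared_metric_value, store_thresholds))
print(classify(shared_metric_value, hr_thresholds))
```

The base search runs once, and each service interprets the result against its own expectations.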

For our database service, let’s say we want to track the number of requests – but not
use this as a factor in computing the health score. For this, set the base severity to
“Info”. (For OS health, this is typical for KPIs like network bandwidth.)

Finish

Click Save

7. Getting notified when there is a problem: Multi KPI Alerts
Let’s say we have a well-crafted service with appropriate KPIs. If there’s a problem with
the service, we want an alert to be generated. One alert with all of the related KPI
statuses, NOT one alert per KPI per host. Here’s how:

In ITSI, select Multi KPI Alerts

Set calculation to Composite Score and change your time range to Relative, past 5
minutes.

Select your database service from the left column, then check each box of the KPIs
listed, ignoring any that are set to “info”. Click “Add Selected” near the top of the
screen. Your settings should look something like this:

Now, let’s say that you want to get a single alert sent to the Notable Events framework
if any of these KPIs shows a problem at Medium or above. The alert will include the
status of each KPI, all in the single alert.

To do this, we will change the weighting (via sliders) for each KPI, up to 11. That’s
right, Splunk goes to 11.

With everything set to 11, even if all other KPIs are currently “green”, a single KPI in
the red will result in a status of red. The worst KPI status of the group will be used for
the status of the group. (Info status values are ignored.)
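
The behavior described above can be modeled with a few lines of Python. This is a simplified sketch that covers only the case where every selected KPI is weighted 11; how ITSI blends lower weights into a composite score is not modeled here. The function names and sample statuses are hypothetical.

```python
# Severities from least to most severe, as named in this guide.
SEVERITY_ORDER = ["Normal", "Low", "Medium", "High", "Critical"]


def group_status(kpi_statuses):
    """Return the worst severity among the KPIs, skipping any reported as Info."""
    considered = [s for s in kpi_statuses.values() if s != "Info"]
    if not considered:
        return "Info"
    return max(considered, key=SEVERITY_ORDER.index)


def should_alert(status, minimum="Medium"):
    """True when the status is at or above the minimum alerting severity."""
    if status == "Info":
        return False
    return SEVERITY_ORDER.index(status) >= SEVERITY_ORDER.index(minimum)


statuses = {
    "CPU Utilization: %": "Normal",
    "DB Response Time": "Critical",
    "Database Service Requests": "Info",
}
worst = group_status(statuses)
print(worst, should_alert(worst))
```

One red KPI is enough to turn the whole group red, and a single alert carries the status of every KPI in the group.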

The workflow you’ve followed so far:

a) Select your time range
b) Select the service on the left, which displays KPIs at the top
c) Select the KPIs you want, click “add” to move them to the Selected KPIs
d) Repeat steps b and c as needed to include KPIs from across multiple services
e) Set sliders to 11

SAVE!

The Create Correlation Search - Step 1 window will pop up. Customize the alert settings
based on the fields depicted in the image below.

Alert if Status is medium or higher (Medium severity or worse)
Run Every: 5 minutes
Suppression: Disabled
Next

In the Create Correlation Search - Step 2 window, customize the correlation search.

Search Name: db-<username>
Notable Event Title: Database Issues - <username>
Severity: Medium
Save

These alerts will appear in Episode Review.

Great work on getting to this point! Let’s do a quick recap of what we’ve done thus far.
- Created multiple KPIs to finish our database service dependencies
- Adjusted the health settings of our database service to fit our needs
- Used a correlation search to notify us when our database service health diminishes
We will dive into generating a Deep Dive view next!

8. Deep Diving Your Data
Deep Dive provides a swimlane view of KPIs against time. This enables analysts and
engineers to do what they do best, pattern match based on graphics! Let’s dive (pardon
the pun) straight into it!

Click Deep Dive

A list of the existing Deep Dive views will be presented.

Click Create Deep Dive

Title: Database##
Permissions: Shared in App
Click Create

Click on the Deep Dive link you just created (e.g., database##)

Click the “+ Add Lane” dropdown and select “Add KPI Lane”

Service: Database##
KPI: ServiceHealthScore
Click Create Lane

Please repeat the prior steps for the remaining KPIs within your Database## service.

Once you’ve completed the configurations, you should see a Deep Dive view similar to
the one displayed below.

Feel free to rearrange the lanes. Based on the data in the lanes, can you
describe why the Service Health Score is dropping for our Database Service?

9. Building Glass Tables
In this section, we will CLONE the Digital Transaction Flow glass table and apply the
new database service we created.

Navigate to Glass Tables

Find the existing 4 - Digital Transaction Flow glass table in the list.
Click Edit → Clone

In the Clone Glass Table window:

Title: 4 - Digital Transaction Flow 00
Permissions: Shared in App
Click Clone

You will return to the Save Glass Tables window

Click on the new glass table you just created (4 - Digital Transaction Flow ##)

Notice that the Database Service is greyed out and set to N/A

We will now replace the Database block with our newly created service.

Click Edit

In the Services sidebar on the left, identify the service you created. Click on the “>”
next to your service to expand it: > Database##

Once the service is expanded, you will see the list of KPIs and aggregated health
scores you have created in the previous steps (see below).

Drag and drop the ServiceHealthScore onto the Glass Table canvas. A new widget
appears with the right panel “Configurations” populated. Screen below cropped for ease
of reading.

Alter the Configurations sidebar for this widget to reflect the following:
● Viz Type: Single Value
● Width: 180
● Height: 180
● Label Box: Off
● Thresholds: On
● Drilldown: On
○ Change Default to Saved Deep Dive
○ Find the name of the Deep Dive you created earlier (e.g., database##)
Click Update

Drag and drop the widget to cover the grey N/A box under DATABASE.

Click Save
Click View

Congratulations! You have just created and altered your first Glass Table! You can
now click into the Service Health Score widget you just created to drill down into the
Deep Dive View you created in prior steps.
