Splunk: Getting Started with ITSI
Table of Contents
Introduction
Product Overview
How ITSI Can Make Your Job Easier
Content Packs and Splunk App for Content Packs
Resources
Features and Definitions
Content Packs & Splunk App for Content Packs
Service Insights Terms
Service Health Scores
Dashboards
Introduction
Hey there! Thanks for checking out this Getting Started Guide. You may be exploring Splunk IT Service Intelligence (ITSI),
or you may already have it but be unsure how best to use it. Either way, this guide is here to help!
We put this guide together to answer all of your questions and help you feel confident navigating around the platform.
But most of all, we know ITSI has an immense amount of value to offer you and your organization so we hope this guide
leaves you feeling empowered when using ITSI.
Product Overview
ITSI consists of two primary components: Service Insights and Event Analytics.
Operational issues put your revenue, customer experience, employee effectiveness and innovation at risk.
The unfortunate reality is that legacy IT tools just aren’t equipped to handle the way businesses operate today —
customer-focused, service-centric, and increasingly hybrid and interconnected digital businesses.
Splunk ITSI addresses these challenges by applying machine learning to data for 360° service monitoring, predictive
analytics and streamlined incident management.
Here are some ways that ITSI can make your job easier:
• ITSI is customizable and monitors performance in the way that the business operates
• Protect business service-level agreements with dashboards to:
- Monitor service health
- Troubleshoot alerts
- Perform root cause analysis
[Figure: Business and Technical Service Level Monitoring]
• Achieve end-to-end visibility across your entire IT environment
Get Started with ITSI
Step 1: Getting Data In
There are two ways to get data into ITSI: entities and content packs.
Entities & Entity Integration
Entities and entity integrations are used to collect and aggregate data into Splunk ITSI. Data is collected into what
we call entities. You can define entities any way that fits your needs, but this usually includes data from servers,
DNS groups, firewalls or other devices. Data can be metrics, logs or traces: anything that helps you gain better
visibility into the health of the services you are responsible for. Data is streamed and collected from native systems
or management/monitoring tools like Splunk Infrastructure Monitoring.
Warning: You can import a maximum of 50,000 entities at a time in ITSI. If you attempt to import more than
50,000 entities, only the first 50,000 are imported.
Prerequisites:
• ITSI role: You have to log in as a user with the itoa_admin or itoa_team_admin ITSI role and access to the Global team.
• Indexed data: You must have already indexed data you want to associate with entities.
For more information, see Documentation - create a single entity in ITSI.
Manually import entities from a Splunk search in ITSI
Create entities from ITSI module searches, saved
searches or ad hoc searches using indexed data coming
into your Splunk platform deployment.
Importing from a CSV file supports five different separators, including comma (,), semicolon (;), pipe (|) and tab (\t).
In this example you want to create two entities called appserver-04 and appserver-05, and associate appserver-04 with
the Web A service and appserver-05 with the Web B service. The Web A service already exists in ITSI but the Web B
service does not.
The following image shows the CSV file to import: [image not reproduced]
You can also collect entity data on a recurring basis with ITSI entity integrations. The integrations that are available are:
• Windows entity integration in ITSI
• VMware vSphere entity integration in ITSI
• Splunk Infrastructure Monitoring entity integration in ITSI
To learn more, see Overview of entity integrations in ITSI.
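Returning to the CSV import example with appserver-04 and appserver-05: the original image of the file is not available
here, so the following is only a rough sketch of how such a file could be generated with Python's standard csv module.
The column names (title, host, service) are assumptions made for illustration, not headers that ITSI requires; you map
columns to entity fields during the import. The 50,000 limit is the one stated in the warning in this guide.

```python
import csv
import io

MAX_ENTITIES_PER_IMPORT = 50_000  # import limit stated in this guide

# Hypothetical column names; the real mapping is chosen during the ITSI import.
FIELDNAMES = ["title", "host", "service"]

ROWS = [
    {"title": "appserver-04", "host": "appserver-04", "service": "Web A"},
    {"title": "appserver-05", "host": "appserver-05", "service": "Web B"},
]


def build_entity_csv(rows, delimiter=","):
    """Return CSV text for an entity import, refusing oversized batches."""
    if len(rows) > MAX_ENTITIES_PER_IMPORT:
        raise ValueError("split the import: more than 50,000 entities")
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDNAMES, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_entity_csv(ROWS)
print(csv_text)
```

Passing delimiter=";" or delimiter="|" produces the same rows with one of the other separators mentioned above.
Checking the row count before writing avoids silently losing everything after the first 50,000 entities.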
Splunk App for Content Packs is a free Splunkbase app for ITSI (version 4.9 and later) that acts as a one-stop shop for
content packs, and out-of-the-box searches and dashboards for common IT infrastructure monitoring sources. With this
app, you no longer need to use the backup/restore functionality to install content packs. Instead, the app contains a library
of readily updated content packs and is used to update all of them, rather than individually updating each content pack.
For guidance on how to onboard data for use with the desired Content Pack, see the details included in the Content
Pack documentation for how to get data in.
• Prerequisite: You must have command-line access and Splunk admin access to an ITSI v4.9 or later instance
• Step 2: Install the app per the instructions on the Splunk Docs page.
• Step 3: Go to Configuration > Data Integrations to see the available content packs. Data Integrations provides top-level
getting-data-in (GDI) guidance for common data sources like Unix and Linux, Windows, and Splunk Infrastructure Monitoring.
Make sure to install the associated Add-On for the Content Pack you downloaded! For example, there is a
corresponding Unix and Linux Add-On that works with the Monitoring Unix and Linux content pack.
For more information on how to install the Splunk App for Content Packs on Splunk Cloud Platform or in on-premises
environments, how to install content packs for ITSI version 4.8.x and earlier, and a list of available
content packs, see the Splunk Content Packs Manual.
Resources
• Blog post: Introducing Splunk App for Content Packs
• Blog post: Content Pack for Microsoft Exchange
[Figure: Another glass table used to evaluate business, operational and SLA performances, along with infrastructure status.]
Step 2a: Services and Service Insights
A service is a set of interconnected applications and hosts that are configured to offer a specific service to the
organization. These services can be internal (an organization’s email system) or external (an organization’s website).
You can create business and technical services that model those within your environment. Some services might have
dependencies on other services. Services contain key performance indicators (KPIs), which make it possible to monitor
service health via service health scores, perform root cause analysis, receive alerts, and ensure that your IT
operations are in compliance with business service-level agreements (SLAs).
Service Insights also enables you to create and use four kinds of dashboards: infrastructure overview, service
analyzer, deep dives and predictive analytics.
Infrastructure Overview Dashboards provide a consolidated view of all of your entities grouped by type with drill
downs to entity-specific dashboards for operating systems, virtual infrastructures, containers and cloud services.
Here is an example of an infrastructure overview dashboard: [image not reproduced]
Deep Dives Dashboards are an investigative tool to help you identify and analyze issues in your IT environment.
Here's what a deep dive dashboard looks like: [image not reproduced]
Use swim lane displays of multiple KPIs and correlate metrics over time to identify root causes.
Predictive Analytics Dashboards predict future incidents 30 minutes in advance using machine learning algorithms and
historical service health scores.
Here is an example of a predictive analytics dashboard: [image not reproduced]
To learn more about these data models, see the Splunk ITSI interactive demo.
Service Modeling and Service Decomposition
Before you are ready to set up your dashboards and services in Splunk ITSI, it’s important to identify what services
will provide the most value.
[Figure: Best Practices for Selecting Services to Apply in ITSI. A flowchart with yes/no branches; its steps include:]
• What are some important services to your organization? These can be business or IT services (e.g.: DNS, online store, EPP).
• Does it impact revenue, customers, etc.?
• Do you currently have supporting data for this service?
• What was the impact of these outages? (e.g.: $/week in lost revenue)
• What additional services, apps and infrastructure support these services? (e.g.: web tier, middleware, database, mobile tier)
• What are the alert sources for these services? (e.g.: Splunk, Nagios)
• You have your supporting services and dependencies!
Create KPIs for Your Services
ITSI KPIs are Key Performance Indicators that are helpful in determining the health of the service they belong to.
KPIs are recurring saved searches that return the value of an IT performance metric. They are created within a
specific service and define everything needed to generate searches to understand the underlying data, including how
to access, aggregate, and qualify with thresholds. There are two types of KPIs: business and technical.
Doing pre-work with service decomposition to correctly identify what services are most valuable to the organization
is a good first step to identifying appropriate KPIs to map to these services.
[Figure: Best Practices for Choosing KPIs. A flowchart with yes/no branches; its steps include:]
• What are some key performance indicators for your services? (e.g.: shopping cart abandonment, delayed messages)
• Is it insightful? We want a value with meaning. Think: thresholds.
• Is it intuitive and easy to understand?
• Is it relevant to the user's job/role function?
• You have your KPIs!
You can also use Content Packs for preconfigured services and KPIs. Here are some KPIs available in the
Microsoft 365 Content Pack:
Availability KPIs
• Extended recovery
• False positive
• Investigating
• Restoring service
• Normal service
Performance KPIs
• Added delegation entry
• Added service principal
• Set company information
• Set password policy
Get Started with Service Insights
Service Insights within Splunk ITSI consists of various dashboard views, alerts and metrics so that you can
effectively monitor and map services within your organization. Here are some ways to get better acquainted with the
various available features and views.
Tasks to tackle
Navigate Service Analyzer
- Explore the Tile View of Services
- Filter to the Shared Infrastructure Service and Show Dependencies
- How many Services are in Shared Infrastructure?
Step 2b: Event Analytics
Event Analytics in Splunk ITSI is where you can streamline your incident management workflows, from alert management
to incident response triggers.
Get Started with Event Analytics
Tasks to tackle
Ingest events through correlation searches.
The data itself comes from Splunk indexes, but ITSI only focuses on a subset of all Splunk Enterprise data. This
subset is generated by correlation searches. A correlation search is a specific type of saved search that generates
notable events from the search results. See Overview of correlation searches in ITSI.
Configure aggregation policies to group events into episodes.
Once notable events start coming in, they need to be organized so you can start gaining value from them. Configure an
aggregation policy to define which notable events are related to each other and group them into episodes. An episode
contains a chronological sequence of events that tells the story of a problem or issue. In the backend, a component
called the Rules Engine executes the aggregation policies you configure. For more information, see Overview of
aggregation policies in ITSI.
Set up automated actions to take on episodes.
You can run actions on episodes either automatically using aggregation policies or manually in Episode Review. Some
actions, like sending an email or pinging a host, are shipped with ITSI. You can also create tickets in external
ticketing systems like ServiceNow or Remedy. Finally, actions can also be modular alerts that are shipped with Splunk
add-ons or apps, or custom actions that you configure.
To learn more about event analytics, see the documentation and Event Analytics section (step 7 / 8) on the Splunk
ITSI interactive demo.
Event Analytics Best Practices for Third-Party Data Sources
To avoid duplicate events, use the same frequency and time range in correlation searches.
When configuring a correlation search, consider using the same value for the search frequency and time range to avoid
duplicate events. For example, a search might run every five minutes and also look back five minutes.
If there's latency in your data and you need to look for events you might have missed, consider expanding the time
range. For example, the search could run every minute but look back 5 minutes.
To reduce load on your system, don't use a time range greater than 5 minutes.
Exceeding a calculation window of 5 minutes can put a lot of load on your system, especially if you have a lot of
events coming in. If you want to avoid putting extra load on your system, consider reducing the time range to 5
minutes or less.
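To make the frequency and time range advice more concrete, here is a small Python simulation. It is not SPL and not
ITSI code; the event times and the helper names are hypothetical. It counts how many times each event would be
returned when a search runs on a schedule and looks back over a fixed window.

```python
def search_windows(frequency_min, lookback_min, total_min):
    """Yield (start, end] windows for a search that runs every frequency_min minutes."""
    for run_time in range(frequency_min, total_min + 1, frequency_min):
        yield (run_time - lookback_min, run_time)


def count_matches(event_times, frequency_min, lookback_min, total_min):
    """Count how many search runs return each event."""
    hits = {t: 0 for t in event_times}
    for start, end in search_windows(frequency_min, lookback_min, total_min):
        for t in event_times:
            if start < t <= end:
                hits[t] += 1
    return hits


events = [2, 7, 9]  # minutes at which hypothetical events were indexed

same = count_matches(events, frequency_min=5, lookback_min=5, total_min=15)
overlap = count_matches(events, frequency_min=1, lookback_min=5, total_min=15)
print(same)     # {2: 1, 7: 1, 9: 1}
print(overlap)  # {2: 5, 7: 5, 9: 5}
```

With matching frequency and time range, each event falls into exactly one window. Running every minute with a
5-minute look back catches late-arriving data, but every event is returned by five overlapping runs, so duplicates
have to be expected and the search does more work per event.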
Additional Resources
• Webinar: Getting Started with ITSI part 2
• Blog post: Event Storms with ITSI
Splunk App for Content Packs is a free application for ITSI (version 4.9 and later) that acts as a one-stop shop for
content packs, and out-of-the-box searches and dashboards for common IT infrastructure monitoring sources. With
this app, you no longer need to use the backup/restore functionality to install content packs. Instead, the app contains
a library of readily updated content packs and is used to update all of them, rather than individually updating each
content pack. Getting started with Splunk for IT operations use cases has never been easier!
Check out the release documentation for content packs for more information.
You can create business and technical services that model those within your environment. Some services might
have dependencies on other services. Services contain key performance indicators (KPIs), which make it possible to
monitor service health via service health scores, perform root cause analysis, receive alerts, and ensure that your IT
operations are in compliance with business service-level agreements (SLAs).
For more information about creating services, see Overview of Creating Services in ITSI and ITSI Thresholding Basics.
Entities
An entity is an IT infrastructure component that is managed to support IT/business services.
Each entity is unique: it can be identified based on its specific attributes and relationships to other IT processes.
Entities contain information ITSI uses to associate services with information found in Splunk searches, imports and
integrations. You can use this entity information to filter items according to the entity definition.
An entity is not a service. You must define entities before creating services. When you configure a service, you can
specify entity matching rules based on entity aliases (that automatically add the entities to your service).
See the Entity Integrations Manual for more information about importing, defining, and managing entities.
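The idea of entity matching rules can be sketched in a few lines of Python. This is a simplified illustration only:
the record layout, the rule format and the function names are assumptions, not ITSI's internal data model, and the
rules are simply AND-ed together here.

```python
from fnmatch import fnmatchcase

# Hypothetical entity records: aliases identify the entity, info fields describe it.
entities = [
    {"title": "appserver-04", "aliases": {"host": "appserver-04"}, "info": {"role": "web"}},
    {"title": "appserver-05", "aliases": {"host": "appserver-05"}, "info": {"role": "web"}},
    {"title": "db-01", "aliases": {"host": "db-01"}, "info": {"role": "database"}},
]


def matches(entity, rule):
    """True if the alias named in the rule matches the rule's wildcard pattern."""
    value = entity["aliases"].get(rule["alias"])
    return value is not None and fnmatchcase(value, rule["pattern"])


def entities_for_service(entities, rules):
    """Return titles of entities that satisfy every rule."""
    return [e["title"] for e in entities if all(matches(e, r) for r in rules)]


web_rules = [{"alias": "host", "pattern": "appserver-*"}]
print(entities_for_service(entities, web_rules))  # ['appserver-04', 'appserver-05']
```

Because the rule is evaluated against aliases rather than a fixed list, a newly imported entity whose host alias
matches the pattern would be picked up by the service without editing the service itself.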
KPIs are created within a specific service and define everything needed to generate searches to understand the
underlying data, including how to access, aggregate and qualify with thresholds.
There are two types of KPIs: business and technical. Some examples of business KPIs are number of site visitors,
number of transactions and number of logins. On the other hand, a few examples of technical KPIs are CPU load
percentage, memory used percentage and response time.
You can use these metrics to measure and ensure that performance remains within acceptable parameters.
ITSI uses a combination of individual KPI health scores and their importance settings to calculate the overall service
health score. After creating a KPI, you may optionally change the importance of the KPI in order to increase or reduce
its impact on the service health score.
ITSI considers KPIs that have an importance value of 11 as a special case that represents a "minimum health indicator"
for the service. When a KPI with an importance value of 11 reaches the critical state, the overall health score for the
service turns critical, regardless of the status of other KPIs in the service.
For more information, see How service health scores are calculated.
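As a rough illustration of how importance can shape a service health score, the sketch below uses a plain
importance-weighted average. That average is an assumption chosen for simplicity, not ITSI's documented formula, and
the KPI scores, field names and the way the override lowers the score are hypothetical. Only the special handling of
importance 11 follows the description above.

```python
def service_health(kpis):
    """Return (score, forced_critical) for a list of KPI dicts.

    Assumption: a plain importance-weighted average stands in for ITSI's
    real calculation, which this guide does not spell out.
    """
    total_weight = sum(k["importance"] for k in kpis)
    if total_weight == 0:
        return (None, False)  # no KPI contributes to the score
    score = sum(k["score"] * k["importance"] for k in kpis) / total_weight

    # Importance 11 marks a "minimum health indicator": if such a KPI is
    # critical, the service is treated as critical regardless of other KPIs.
    forced = [k["score"] for k in kpis
              if k["importance"] == 11 and k["severity"] == "critical"]
    if forced:
        return (min(forced + [score]), True)
    return (score, False)


kpis = [
    {"name": "CPU load percentage", "score": 100, "importance": 5, "severity": "normal"},
    {"name": "Response time", "score": 50, "importance": 5, "severity": "medium"},
]
print(service_health(kpis))  # (75.0, False)

kpis.append({"name": "Number of logins", "score": 0, "importance": 11, "severity": "critical"})
print(service_health(kpis))  # (0, True)
```

Raising or lowering a KPI's importance changes its weight in the average, while the importance 11 case bypasses the
average entirely once that KPI turns critical.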
Dashboards
Infrastructure Overview Dashboard
A consolidated view of all your data integrations and investigation tools for operating systems, virtual infrastructures,
containers, and cloud services.
Visually correlate services to underlying infrastructure with a tile or tree view. Drill down to code level and identify root
causes directly from service monitoring dashboards.
Figure 4: Deep dives display a side-by-side view of KPIs and service health scores over time to help you zoom in on metric and log data and visually correlate
root cause.
Figure 5: Use side-by-side displays of multiple KPIs and correlate metrics over time to identify root causes.
Figure 6: Top five contributing service metrics are displayed to guide troubleshooting.
Glass Tables
Glass tables are custom visualizations that help you monitor real-time interrelationships and dependencies via KPIs
and service health scores across your IT and business services in one view.
Create glass tables to provide dynamic contextual views of your IT topology or business processes and monitor them
in real time. Glass tables also feature a drawing canvas where you can add visualizations in the form of KPIs and service
health scores, upload images and icons, and add charts.
Figure 7: An example of a glass table used to evaluate performance metrics at San Francisco International Airport.
Figure 8: Another example of a glass table used to evaluate business, operational and SLA performances, along with infrastructure status.
For more information, see Overview of the glass table editor in ITSI.
Adaptive Thresholds
Machine learning algorithms are used to automatically update thresholds based on observed behaviors. This not only
determines what should be considered normal in your IT environment but also prevents alerts from becoming stale.
The thresholds automatically recalculate on a nightly basis to ensure that changes in behavior don’t trigger false alerts.
To learn more about how to use adaptive thresholds, see Apply adaptive thresholds to a KPI in ITSI.
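The general idea of recalculating thresholds from observed behavior can be sketched as follows. This is a simplified
stand-in that uses percentiles of historical values; it is not ITSI's algorithm, and the sample data, severity names
and percentile choices are assumptions made for illustration.

```python
import statistics


def recalculate_thresholds(history):
    """Derive severity thresholds from historical KPI values using percentiles."""
    cuts = statistics.quantiles(history, n=100)  # 99 cut points; cuts[i] is percentile i + 1
    return {"medium": cuts[89], "high": cuts[94], "critical": cuts[98]}


def severity(value, thresholds):
    """Map a KPI value to the highest threshold it meets or exceeds."""
    for level in ("critical", "high", "medium"):
        if value >= thresholds[level]:
            return level
    return "normal"


# Hypothetical CPU load samples: the workload grows by 20 points between weeks.
last_week = [40 + (i % 20) for i in range(200)]
this_week = [60 + (i % 20) for i in range(200)]

old = recalculate_thresholds(last_week)
new = recalculate_thresholds(this_week)
print(severity(70, old), severity(70, new))  # critical normal
```

A reading of 70 would raise a critical alert against last week's thresholds, but after recalculation on the newer
data it counts as normal. That is the effect described above: thresholds follow changes in behavior so that alerts
do not become stale.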
Anomaly Detection
Anomaly detection generates notable events when a KPI deviates from an expected pattern. These notable events
represent detected abnormal behavior for service-level (trending) and entity-level (cohesive) KPI data. The algorithms learn
KPI patterns continuously in real time and detect when a KPI departs from its own historical behavior.
Service/KPI alerts
Enable alerting on a single KPI in ITSI so you can be alerted when aggregate KPI threshold values change. ITSI generates
notable events in Episode Review based on the alerting rules you configure.
Use these alerts to investigate and take action on the severity changes of your individual KPIs before they negatively
impact the service as a whole.
To learn more about how to set up KPI alerts, see Receive alerts when KPI severity changes in ITSI.
Multi-KPI Alerts
Trigger alerts based on multiple service conditions. Define severity levels and trigger conditions, or assign weights to
attribute relative importance.
Create a multi-KPI alert from a deep dive view when you see a correlation between two or more KPIs, and get notified
next time a similar problem occurs.
To learn more about how to set up multi-KPI alerts, see Create multi-KPI alerts in ITSI.
Episode Review
Episode Review is an incident management dashboard that prioritizes issues by severity and lets teams trigger a
response from the same view. It reduces “swivel-chair” operations and helps you respond to the most urgent issue first.
Rules Engine
The ITSI Rules Engine is a system for continuously processing notable events to allow for event grouping and
deduplication, as well as automatic action execution, based on user-defined criteria. The system revolves around a
continuously running indexed real-time search that streams all notable events into a custom search command.
The Rules Engine's functionality begins with correlation searches. Correlation searches generate notable events in ITSI,
which are stored in the itsi_tracked_alerts index. The Rules Engine saved search accepts notable events into
the itsi_rules_engine custom search command. The search command generates the internal structures required
for aggregating events into episodes and executing actions.
The Rules Engine search periodically polls the configuration database for updates. If a policy indicates some
action should be executed, the Rules Engine dispatches a REST request to the Event Management Interface to
execute the action.
To see a diagram of the Rules Engine workflow or to learn more about the searches, see Overview of the ITSI Rules Engine.
Notable Events
Notable events represent detected abnormal behavior for service-level (trending) and entity-level (cohesive) KPI data.
Notable events form the basis for episodes, which are intelligently grouped sets of related notable events.
Aggregation Policies
A notable event aggregation policy is the fundamental unit of event grouping in IT Service Intelligence (ITSI).
Aggregation policies are the data structure the Rules Engine uses to group notable events into deduplicated episodes
and organize them in Episode Review. These episodes have their own title, description, severity, status and assignee
that are separate from the individual notable events within the episode. Aggregation policies are also the container for
action rules that automate episode actions, such as sending an email or pinging a host.
To learn more about the 3 components of aggregation policies: 1) filtering, splitting, and breaking criteria, 2) episode
information, and 3) action rules, see Overview of Aggregation Policies in ITSI.
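The filtering, splitting and breaking criteria can be illustrated with a short Python sketch. It is a conceptual
model only, not the Rules Engine: the event fields, the numeric severities, the 600-second pause and the function
name are all hypothetical.

```python
def group_into_episodes(events, filter_fn, split_by, break_after_s):
    """Group events into episodes: filter, split by a field, break on a quiet gap."""
    episodes = []
    open_episode = {}  # split value -> index of that value's current episode
    for event in sorted(events, key=lambda e: e["time"]):
        if not filter_fn(event):
            continue  # filtering criteria: the policy ignores this event
        key = event[split_by]  # splitting criteria: one episode stream per value
        idx = open_episode.get(key)
        # Breaking criteria: start a new episode after a long enough pause.
        if idx is None or event["time"] - episodes[idx][-1]["time"] > break_after_s:
            episodes.append([])
            idx = len(episodes) - 1
            open_episode[key] = idx
        episodes[idx].append(event)
    return episodes


events = [
    {"time": 0, "host": "appserver-04", "severity": 4},
    {"time": 30, "host": "appserver-04", "severity": 6},
    {"time": 40, "host": "appserver-05", "severity": 2},
    {"time": 900, "host": "appserver-04", "severity": 4},
    {"time": 910, "host": "db-01", "severity": 1},
]

episodes = group_into_episodes(
    events, filter_fn=lambda e: e["severity"] >= 2, split_by="host", break_after_s=600
)
summary = [(ep[0]["host"], len(ep), max(e["severity"] for e in ep)) for ep in episodes]
print(summary)
# [('appserver-04', 2, 6), ('appserver-05', 1, 2), ('appserver-04', 1, 4)]
```

Each episode here carries its own summary (host, event count, highest severity) that is separate from the individual
events, loosely mirroring how episodes have their own title, severity and status. Action rules would then be
evaluated per episode, for example when the highest severity reaches a chosen level.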
Additional Resources
• On Demand Services Catalog
• Learning Path: ITSI Admins
• Learning Path: ITSI End Users
• Course: Implementing ITSI
• Course: Using ITSI
• ITSI Community
• .conf Sessions
• Events and Workshops
• Webinars
Use the resources and tools outlined in this Getting Started Guide to explore ITSI and all of its capabilities!
Please reach out to your account team if you have any questions or concerns.
Splunk, Splunk>, Data-to-Everything, D2E and Turn Data Into Doing are trademarks and registered trademarks of Splunk Inc. in the United States and
other countries. All other brand names, product names or trademarks belong to their respective owners. © 2021 Splunk Inc. All rights reserved. 21-20243-Splunk-Getting Started with ITSI-101-GSG