Splunk Search Optimization

Search optimization is a strategy that lets a search run as efficiently as possible. In this section, we will learn how to optimize searches on the Splunk platform.

A poorly constructed search runs longer, retrieves larger quantities of data from the indexes than is required, and consumes more memory and network resources than necessary. Multiply these problems across hundreds or thousands of searches, and the result is a slow or sluggish deployment.

There is a set of fundamental principles we can follow to optimize our searches:

Retrieve only the required data
Move as little data as possible
Parallelize as much work as possible
Set appropriate time windows
We can use the following methods to apply these search optimization principles:

Filter as much as possible in the initial search (see the sketch after this list)
Perform joins and lookups on only the required data
Perform evaluations on the minimum number of events possible
Move commands that bring data to the search head as late as possible in our search criteria.
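
As a sketch of the first method, compare a search that filters late with one that filters in the initial search segment. The index, field, and value names here are illustrative assumptions, not taken from the original:

Less efficient, filtering after retrieval:
index=web | search status=404 | stats count by host

More efficient, filtering in the initial search:
index=web status=404 | stats count by host
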
Indexes and lookups

The Splunk platform uses the information in the index files to determine which events can be retrieved from disk when we run a search. The lower the number of events to be retrieved from disk, the faster the search runs.

How we construct our search can have a huge effect on the number of events retrieved from disk.

When data is indexed, the data is broken into events based on time.

The processed data consists of several files:

The raw data in compressed form (rawdata)
The indexes that point to the raw data (index files, also referred to as tsidx files)
Some metadata files

These files are written to disk and reside in age-organized directory sets called buckets.
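
As a rough illustration, a single warm bucket on disk might contain the following; the path and file names are a sketch only, and exact naming varies by Splunk version:

$SPLUNK_DB/myindex/db/db_1577829600_1577743200_12/
  rawdata/journal.gz (the compressed raw data)
  *.tsidx (the index files that point into the raw data)
  Hosts.data, Sources.data, SourceTypes.data (metadata files)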

Use indexes effectively


One way of limiting the data extracted from disk is to partition data into separate indexes. If we rarely search multiple data types at a time, place the different data types in separate indexes and limit each search to the specific index it needs. For example, store web access data in one index and firewall data in another. This is especially useful for sparse data, which may otherwise be lost in a large amount of irrelevant data.
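
For instance, with the data split into hypothetical web and firewall indexes (the index and field names are assumptions), each search reads only the data it needs:

index=web status=404
index=firewall action=blocked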

An optimized search

Consider a search that retrieves 1 million events from the index, applies a lookup and an eval, and then filters the results on the criteria A=25, L>100, and E>50. We can optimize the entire search by moving some of these components from the later search segments to locations earlier in the search process.

Moving the criterion A=25 before the first pipe filters the events earlier and reduces the number of events extracted from the index to 300,000, a reduction of 700,000 compared to the original search. The lookup is then performed on 300,000 events instead of 1 million events. Moving the criterion L>100 to immediately after the lookup filters the events further, reducing the number of remaining events by another 100,000. The eval is performed on 200,000 events instead of 1 million events.

The criterion E>50 depends on the results of the eval command and cannot be moved. The results are the same as for the original search: 50,000 events are returned, but with much less impact on resources.
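
The original example search is not reproduced in this text, but a hypothetical reconstruction consistent with the figures above might look as follows. The index name, lookup name, and eval expression are assumptions; only the fields A, L, and E and the event counts come from the description:

Original search, which extracts 1,000,000 events from the index:
index=main | lookup mylookup A OUTPUT L | eval E=L*2 | search A=25 L>100 E>50

Optimized search, which extracts 300,000 events; the lookup runs on 300,000 events, the eval on 200,000, and 50,000 events are returned:
index=main A=25 | lookup mylookup A OUTPUT L | search L>100 | eval E=L*2 | search E>50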

Quick tips for optimization


The key to fast searching is to limit the data to the absolute minimum that needs
to be pulled from the disk. In the search, filter the data as early as possible, so
processing takes place on the minimum amount of data needed.

Limit the data from disk


Techniques for restricting the amount of data retrieved from disk include setting a narrow time window, making the search criteria as precise as possible, and retrieving the smallest number of events required.

Narrow the time window


Limiting the time span is one of the most powerful ways to restrict the data that is read from disk. Use the time range picker or specify time modifiers in our search to identify the smallest time window necessary for our search.

If we need to view data from the last hour only, don't use the Last 24 hours
default time range.

If we must use a broad time range, such as Last week or All time, then use other techniques to limit the amount of data retrieved from disk.
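
For example, a search can be pinned to the last hour with the earliest and latest time modifiers; the index name and keyword here are illustrative:

index=web earliest=-1h latest=now error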

Specify the index, source, or source type


To optimize our searches, it is necessary to understand how our data is structured. Take the time to learn which indexes contain our data, what the data sources are, and which source types apply. Knowing this information lets us narrow down our searches.

Run the following search:

*

This search is not optimized, but it provides us with an opportunity to learn about the data we have access to.
In the Selected fields list, click on each field and look at the values for host, source, and sourcetype.
In the Interesting fields list, click on index and look at the names of the indexes that we have access to.
In our searches, specify the index, source, or source type wherever possible. When the Splunk platform indexes data, it automatically adds a number of fields to each event. The index, host, source, and sourcetype fields are added to each event as default fields. A default field is an indexed field that the Splunk platform recognizes at search time. The host, source, and sourcetype fields describe where the event originated.
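
As a sketch, a search that specifies all three of these default fields might look like this; the index, source path, and sourcetype are illustrative assumptions:

index=web sourcetype=access_combined source="/var/log/apache/access.log" error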

Write better searches


This topic examines some of the causes of slow searches and includes guidelines to help us write more efficient searches. Several factors can influence the speed of our searches, including:

The volume of data that we are searching
How our searches are constructed
The number of concurrent searches
To optimize the speed at which our search runs, minimize the processing time
required for each component of the search.

Know your type of search


Search optimization guidelines depend on the type of search we are running and the characteristics of the data we are searching. Searches fall into two categories, based on the objective we wish to accomplish: a search is either intended to retrieve events, or designed to produce a report that summarizes or organizes the data.

Searches that retrieve events


Raw event searches retrieve events from an index without any further processing of the retrieved events. When retrieving events from the index, be as specific as possible about the events we want to see. This can be done with keywords and field-value pairs unique to the events.

If the events in the dataset we want to retrieve occur frequently, the search is
called a dense search. If the events in the dataset that we want to retrieve are
rare, the search is called a sparse search. Sparse searches that run against large
data volumes take longer than dense searches for the same data set.
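
As a hypothetical illustration (the index, sourcetype, and values are assumptions), the first search below is dense because most web access events match it, while the second is sparse because it looks for a rare status code and phrase:

index=web sourcetype=access_combined status=200
index=web sourcetype=access_combined status=503 "payment service timeout"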

Searches that generate reports


Report-generating searches, also called transforming searches, perform additional analysis on events after they are retrieved from an index. This processing can include filtering, transforming, and other operations that apply one or more statistical functions to the result set. Because this processing takes place in memory, the more restrictively and precisely we retrieve the events, the faster the search runs.
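
A sketch of a transforming search that filters tightly before computing statistics (the index, sourcetype, and fields are illustrative):

index=web sourcetype=access_combined status>=500 | stats count by host, status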

Tips for tuning your searches


In most cases, a search is slow to retrieve events from the index because of the complexity of the query. For instance, if our search contains extremely large OR lists, complex subsearches (which expand into OR lists), or phrase searches, processing takes longer. This section explores tips to fine-tune our searches and make them more efficient.

It takes a lot of memory to compute statistics with a BY clause on a set of field values that have high cardinality, that is, many uncommon or unique values. One potential solution is to lower the value of the chunk size setting used with the tstats command. It can also be beneficial to reduce the number of distinct values that the BY clause must process.
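
Assuming the setting referred to above is the chunk_size argument of tstats, a sketch of this tuning might look like the following; the index name and BY fields are illustrative:

| tstats chunk_size=50000 count where index=web by host sourcetype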

Restrict searches to the specific index


If we rarely search over more than one data type at a time, divide the different data types into separate indexes, and then limit our searches to the specific index. For example, store web access data in one index and firewall data in another. This is especially recommended for sparse data, which could otherwise be buried in a large volume of unrelated data.
