Splunk Power
Buttercup Games
rex field=Description "^(?<TaskID>[^-]+).*"
(rex extracts a TaskID field from Description: everything before the first "-")
Join command:
Combines the results from a main search with the results from a subsearch (e.g. one
that searches vendors). The result sets are joined on the product_id field, which is
common to both sources.
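A minimal sketch, assuming both sourcetypes share a product_id field (the sourcetype names here are illustrative):
sourcetype=access_combined
| join product_id
    [search sourcetype=vendor_sales]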
Why Time:
Time is the most efficient way to filter in Splunk
When Splunk indexes data, the data is stored in buckets.
Buckets are directories containing a set of raw data and indexing data.
Buckets have configurable:
maximum size and maximum time span (set by admin users)
Three types of Buckets
1) Hot: As events are indexed, they are placed in hot buckets.
Hot buckets are the only writable buckets.
2) Warm: A hot bucket rolls to warm when
maximum size reached
Time span reached
Indexer is restarted
Upon rolling, the bucket is closed, renamed, and changed to "read only"
e.g. renamed: db_1389230491_1389230488_5
1389230491 (newest event) and 1389230488 (oldest event) in the bucket
3) Cold: A warm bucket rolls to cold when
maximum size reached
Time span reached
Cold buckets are typically stored in different locations than hot and warm
buckets.
This allows them to be stored on slower, more cost-effective infrastructure.
Bucket distribution matters for search performance:
when Splunk searches, it uses the timestamps in each bucket's name to decide
whether to open the bucket directory,
then uncompresses and reads only the raw data relevant to the search.
Bucket names in Splunk indexes are used to:
determine if the bucket should be searched based on the time range of
the search
Wildcards:
Wildcards are tested after all other search terms
Only trailing wildcards make efficient use of the index
Wildcards at the beginning of a string cause Splunk to search all events
within the timeframe,
causing performance issues
Wildcards in the middle of a string produce inconsistent results
Avoid using wildcards to match punctuation
Be as specific as possible in search terms
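A quick illustration (the search term is hypothetical):
status=fail*   --> trailing wildcard, uses the index efficiently
status=*ail    --> leading wildcard, forces a scan of every event in the time range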
Best Practices:
The less data you have to search, the faster splunk will be.
Time is the most efficient filter; after time, index, source, host, and
sourcetype are the most powerful filters
Fields extracted at index time do not need to be extracted for each search.
The more you tell the search engine, the more likely it is that you will get
good results.
sourcetype=access status>299
| chart count over status by host (only one field can be specified after the by
clause)
usenull=f (removes NULL columns from the chart)
The chart command is limited to 10 columns by default; more can be included
with the limit option
useother=f removes the OTHER column from the result/chart.
The over clause allows you to define which field is represented on the X axis
of a chart.
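A sketch pulling the chart options together (same sourcetype and fields as above):
sourcetype=access status>299
| chart count over status by host limit=5 useother=f usenull=f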
TimeChart command:
Performs stats aggregations over time
Time is always the X axis.
sourcetype=vendor_sales
| timechart count by product_name (limit option available)
The timechart command intelligently clusters data into time intervals depending
on the time range selected
sourcetype=vendor_sales
| timechart span=12hr count by product_name (limit option available)
Use the span option to change the clustering interval
Visualization examples:
Format option in visualization
General, X-Axis,Y-axis,Chart Overlay, Legend
Module 4:
=========
Commands to pull geographic data from your machine data, plus visualizations that
make the data easy to understand
iplocation command:
is used to look up and add location information to events
City, country, region, latitude, and longitude are added to events that
include external IP addresses
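A minimal sketch, assuming the events carry a clientip field with external IP addresses:
sourcetype=access_combined
| iplocation clientip
| stats count by Country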
Geostats command:
aggregates geographical data for use on a map visualization.
sourcetype=vendor_sales
| geostats latfield=VendorLatitude longfield=VendorLongitude
count by product_name globallimit=10
Choropleth Map:
is another way to see our data in a geographical visualization
in order to use this we need a .kmz (compressed Keyhole Markup Language) file
Splunk ships with two KMZ files:
geo_us_states.kmz
geo_countries.kmz
Geom command:
Adds a field with geographical data structures matching polygons on a map
sourcetype=vend* VendorID>=5000 AND VendorID<=5055
| stats count as Sales by VendorCountry
| geom geo_countries featureIdField=VendorCountry
(geo_countries is a feature collection)
Trendline Command:
Computes moving averages of field values
sourcetype=access_combined action=purchase
| timechart sum(price) as sales
| trendline wma2(sales) as trend
Field Format:
Format link in statistics tab
Addtotals command:
Computes the sum of all numeric fields for each event
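A sketch appending a row-total column (field names are assumptions):
sourcetype=vendor_sales
| chart sum(price) over product_name by VendorCountry
| addtotals fieldname=Total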
Module 5:
========
Eval Command:
is used to calculate and manipulate field values
Arithmetic, concatenation, and boolean operators are supported
Results can be written to a new field or replace an existing field
Newly created values are case sensitive
e.g. we can use eval to convert bytes to MB using mathematical operations
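A sketch of the bytes-to-MB conversion (field names assumed from typical web access data):
sourcetype=access_combined
| eval bandwidth = round(bytes/1024/1024, 2)
| stats sum(bandwidth) as "Total MB" by host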
Fieldformat command:
Formats values without changing the characteristics of the underlying values
Uses the same functions as the eval command
Unlike eval, fieldformat does not create new field values; it only changes how
values are displayed, and the underlying data in the index does not change
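A sketch, assuming a numeric price field; the stored value stays numeric, only the display changes:
sourcetype=vendor_sales
| stats sum(price) as sales by Vendor
| fieldformat sales = "$" + tostring(sales, "commas")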
Never use the where command when you can filter by search terms
Module 6:
Correlating events:
Transaction:
Any group of related events that span time
e.g. each event can represent a user generating a single HTTP request
Transaction command:
transaction <field-list> (a single field or a comma-separated list)
Definitions (see the sketch after this list):
maxspan: Sets the maximum total time between the earliest and latest
events in a transaction
startswith: Forms transactions starting with specified terms,
field values, or evaluations
endswith: Forms transactions ending with specified terms, field
values, or evaluations
maxpause: Finds groups of events where the time between included
events does not exceed a specified value
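A sketch pulling these options together (field names and terms are assumptions):
sourcetype=access_combined
| transaction clientip maxspan=10m maxpause=1m startswith="addtocart" endswith="purchase"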
Which of these is NOT a field that is automatically created with the transaction
command? maxcount
Module 7:
=======
Knowledge Objects:
Naming Convention:
Example: OPS_WFA_Network_Security_na_IPwhoisAction
Pattern: Group_Type_Platform_Category_Time_Description
Module 8:
Field Extractions:
=================
The Field Extractor Utility allows us to use a graphical user interface
to extract fields that persist as knowledge objects, making them reusable in
searches
How many ways are there to access the Field Extractor Utility? 3
Once a field is created using the regex method, you cannot modify the underlying
regular expression. false
In the Field Extractor Utility, this button will display events that do not contain
extracted fields. Non-Matchers
When extracting fields, we may choose to use our own regular expressions true
Module 9:
Aliases and calc fields:
Field aliases:
Calculated fields must be based on extracted or discovered fields
Fields from a lookup table or generated from a search command cannot be used.
Once a field alias is created: You can still use the original field name to search
Field aliases can only be applied to a single source type, source, or host. false
Field aliases are used to normalize data.
Calculated fields are based on underlying: eval expressions
Module 10:
Tags and Event Types:
These allow you to categorize events based on search terms. event types
Which search would limit an "alert" tag to the "host" field? tag::host=alert
Tags are descriptive names for key value pairs
Event Types do not show up in the Fields List. false
You can only add one tag per field value pair. false
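Once saved, tags and event types become searchable like any other field (the names here are hypothetical):
eventtype=failed_purchase tag=alert
| stats count by host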
Module 11:
macros:
Reusable search strings or portions of search strings
useful for frequent searches with complicated search syntax
Macros allow you to store entire search strings
Time-range independent
Arguments can be passed to the search
Search macros:
What is the correct way to name a macro with two arguments?us_sales(2)
The number of arguments in a macro must be included in the macro name. true
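A sketch of a two-argument macro; the definition uses $...$ tokens for the arguments, and the name and field values here are hypothetical:
Definition of us_sales(2): sourcetype=vendor_sales VendorStateProvince=$state$ date_year=$year$
Usage: `us_sales(California, 2024)` | stats sum(price) as sales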
Module 12:
Workflow Actions:
Create links to interact with external resources or narrow the search
GET method
POST method
Module 13:
Data models
Data model datasets are built from events, searches, or transactions (root event)
Root Data Model:
Datasets --> Root Event / Root Search / Root Transaction / child
Root Event objects enable us to create hierarchies based on a set of events, and
are the most commonly used type of root data model object.
Root Search objects build these hierarchies from a transforming search.
Root Searches do not benefit from data model acceleration;
avoid root searches whenever possible.
Root Transaction objects allow us to create datasets from groups of related
events that span time. They use an existing object from our data hierarchy to
group on.
Child objects allow us to constrain or narrow down the events in the object
above them in the hierarchy tree.
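Once a data model is defined, its datasets can be searched directly; a sketch with a hypothetical model and dataset name:
| from datamodel:"Buttercup_Games.successful_purchases"
| stats count by host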
Using transactions:
Transactions with datasets
do not benefit from data model acceleration (think about the reports
users will be running)