Splunk Data Life Cycle Determining When and Where To Roll Data
The forward-looking statements made in this presentation are being made as of the time and date of its live
presentation. If reviewed after its live presentation, this presentation may not contain current or accurate
information. We do not assume any obligation to update any forward-looking statements we may make. In
addition, any information about our roadmap outlines our general product direction and is subject to change
at any time without notice. It is for informational purposes only and shall not be incorporated into any contract
or other commitment. Splunk undertakes no obligation either to develop the features or functionality
described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in
the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2017 Splunk Inc. All rights reserved.
Who’s This Dude?
Jeff Champagne
Staff Architect
▶︎ Started with Splunk in the fall of 2014
▶︎ Former Splunk customer in the Financial Services Industry
▶︎ Lived previous lives as a Systems Administrator, Engineer, and Architect
▶︎ Loves skiing, traveling, photography, and a good Sazerac
Am I In The Right Place?
You’ll find this session helpful if…
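[Figure: Splunk data life cycle. Hot and Warm buckets live in [homePath], roll to Cold [coldPath], and then roll to Frozen via [coldToFrozenDir] -or- [coldToFrozenScript], or to Delete. Other branches: Data Roll ([vix.provider] -and- [vix.input.x.path]) and TSIDX Reduce. Stages are labeled "Searchable" and "Searchable (but slower)".]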
How Are Events Stored?
What’s enabled out of the box?
[Figure: default life cycle. Hot and Warm buckets in [homePath] (Searchable) roll to Cold and are then deleted.]
Hot/Warm Storage
I’m too hot (hot damn)
Make a dragon wanna retire man
Hot/Warm Storage
How is it used?
800+ IOPS
Hot/Warm Storage
I/O Requirements
• Enterprise Security
• High search concurrency
• Do yourself a favor and use SSD
▶ Sustained I/O per indexer, simultaneously
• All indexers search at the same time
• Important if you’re using a SAN
▶ Testing I/O
• New test suite is coming
• There’s an app for that: https://splunkbase.splunk.com/app/3002/
▶ Block Storage
• We DO NOT support NFS/NAS for Hot/Warm volumes
• Common filesystems: EXT4 or XFS
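Because every indexer searches at the same time, the Hot/Warm IOPS target adds up on shared storage. A minimal Python sketch, assuming the 800+ IOPS figure is a per-indexer sustained floor and that load is spread evenly across indexers; `required_array_iops` is a hypothetical helper, not a Splunk tool:

```python
# Sizing sketch (assumption): all indexers hit shared storage at once, so a SAN
# serving N indexers must sustain N times the per-indexer target, not just one.

HOT_WARM_IOPS_PER_INDEXER = 800  # the "800+ IOPS" Hot/Warm figure, treated as a floor


def required_array_iops(indexer_count: int,
                        iops_per_indexer: int = HOT_WARM_IOPS_PER_INDEXER) -> int:
    """Return the sustained IOPS a shared array needs when all indexers search together."""
    if indexer_count < 1:
        raise ValueError("indexer_count must be at least 1")
    return indexer_count * iops_per_indexer


if __name__ == "__main__":
    # Hypothetical example: 10 indexers sharing one SAN for Hot/Warm volumes
    print(required_array_iops(10))  # 8000
```

An array benchmarked from a single host at 8,000 IOPS looks generous for one indexer but is only just enough for ten.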
Cold Storage
Champagne on Ice
Cold Storage
How is it used?
▶ IO Performance
• Lower IOPS can be tolerated, with the expectation of slower searches
• Don’t go below 350 IOPS
• Remember: IO must be sustained across all indexers at once
▶ Additional storage platforms are supported
• NAS/NFS
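The 350 IOPS floor also limits how many indexers can share one Cold volume, for example on NAS/NFS. A sketch under the same even-load assumption; `max_indexers_for_cold` and the 4,000 IOPS array are hypothetical:

```python
import math

COLD_MIN_IOPS = 350  # "Don't go below 350 IOPS" per indexer for Cold


def max_indexers_for_cold(array_iops: float, min_iops: float = COLD_MIN_IOPS) -> int:
    """Largest number of indexers that can share an array and each still get min_iops.

    Assumes load is spread evenly and every indexer searches Cold at the same time.
    """
    if min_iops <= 0:
        raise ValueError("min_iops must be positive")
    return math.floor(array_iops / min_iops)


if __name__ == "__main__":
    # Hypothetical NAS sustaining 4,000 IOPS for Cold buckets
    print(max_indexers_for_cold(4000))  # 11
```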
Frozen Storage
Let it go, let it go…
Frozen Storage
Ice Ice, Baby
▶ No longer searchable
• Keep data in Cold as long as you can
▶ Data rolls from Cold to Frozen when…
• The total size of the index (Hot+Warm+Cold) exceeds maxTotalDataSizeMB
• The newest event in a bucket is older than frozenTimePeriodInSecs
▶ Default freezing process (coldToFrozenDir)
• TSIDX files are removed; only the raw data is archived
Conf File indexes.conf
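The two freezing triggers can be modeled roughly as follows. This is a simplified Python sketch, not Splunk's internal logic: it describes each bucket only by its newest event time and size, ignores that hot buckets must roll to warm first, and assumes size-based freezing removes the buckets with the oldest newest-event time first. The defaults shown (188697600 seconds, about 6 years, for frozenTimePeriodInSecs and 500000 MB for maxTotalDataSizeMB) are the indexes.conf defaults as we understand them; `Bucket` and `buckets_to_freeze` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

FROZEN_TIME_PERIOD_IN_SECS = 188_697_600  # assumed indexes.conf default, about 6 years
MAX_TOTAL_DATA_SIZE_MB = 500_000          # assumed indexes.conf default


@dataclass(frozen=True)
class Bucket:
    bucket_id: str
    newest_event_epoch: int  # time of the most recent event in the bucket
    size_mb: int


def buckets_to_freeze(buckets: List[Bucket], now_epoch: int,
                      frozen_time_period_in_secs: int = FROZEN_TIME_PERIOD_IN_SECS,
                      max_total_data_size_mb: int = MAX_TOTAL_DATA_SIZE_MB) -> List[str]:
    """Return ids of buckets that would roll to frozen (simplified model)."""
    # Age trigger: every event in the bucket, including the newest, is too old
    aged = [b for b in buckets
            if now_epoch - b.newest_event_epoch > frozen_time_period_in_secs]
    aged_ids = {b.bucket_id for b in aged}

    # Size trigger: freeze the oldest remaining buckets until the index fits
    remaining = sorted((b for b in buckets if b.bucket_id not in aged_ids),
                       key=lambda b: b.newest_event_epoch)
    total_mb = sum(b.size_mb for b in remaining)
    oversize = []
    for b in remaining:
        if total_mb <= max_total_data_size_mb:
            break
        oversize.append(b)
        total_mb -= b.size_mb

    return [b.bucket_id for b in aged + oversize]


if __name__ == "__main__":
    now = 1_500_000_000
    index = [
        Bucket("b1", now - 200_000_000, 100),  # newest event older than ~6 years
        Bucket("b2", now - 1_000, 400_000),
        Bucket("b3", now - 500, 200_000),
    ]
    print(buckets_to_freeze(index, now))  # ['b1', 'b2']
```

In the example, b1 freezes on age alone, and b2 freezes because b2 and b3 together still exceed the 500000 MB cap.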
▶ Thawing is a manual process
• Copy frozen buckets to the thawed path [thawedPath]
• Use the rebuild command to re-index the data
− CLI command: splunk rebuild <bucket directory>
• http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Restorearchiveddata
▶ Re-Indexing
• Does not count against your license
• Takes time
• Use the same estimates as for indexing new data
• Example: a reference indexer can index 300GB/day
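Because thawed data is rebuilt at roughly the same rate as new data is indexed, the 300GB/day reference figure gives a rough time estimate. A sketch assuming the thawed data is spread evenly across the indexers doing the rebuild and that they have the full 300GB/day to spare (in practice they are also indexing new data, so expect it to take longer); `reindex_days` is a hypothetical helper:

```python
REFERENCE_GB_PER_DAY = 300  # reference indexer throughput from the slide


def reindex_days(data_gb: float, indexer_count: int = 1,
                 gb_per_day_per_indexer: float = REFERENCE_GB_PER_DAY) -> float:
    """Rough number of days to re-index thawed data, assuming an even spread."""
    if indexer_count < 1 or gb_per_day_per_indexer <= 0:
        raise ValueError("need at least one indexer and a positive rate")
    return data_gb / (indexer_count * gb_per_day_per_indexer)


if __name__ == "__main__":
    # Hypothetical: 1,800 GB of raw data thawed across 2 indexers
    print(reindex_days(1800, indexer_count=2))  # 3.0
```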
Delete
Let’s just dump it all…
Delete
When do we delete?
Data Roll
▶ If you already have HDFS deployed and are experienced with Hadoop
• Data Roll can help reduce Splunk storage costs
• Use Splunk Bucket Reader to search archived data without Splunk
▶ Don’t use Data Roll if you don’t already use HDFS
• You can deploy Splunk in a similar manner to achieve cost savings
• Searching data natively in Splunk will be faster
▶ Dense searches on HDFS will have the best performance
• Data in HDFS is indexed on-the-fly
• Sparse searches will be slower
TSIDX Reduce
Put your buckets on a diet
TSIDX Reduce
How does it work?
Conf File indexes.conf
▶ Historical/Archive data
• Do NOT use TSIDX reduce on frequently searched data
▶ Dense searches
• Match a large percentage (10% or more) of the events in a bucket
• Largely unaffected by TSIDX reduce
▶ Sparse searches
• Needle in the haystack style searches
• Significantly affected by TSIDX reduce
• 3-10X slower
• Depends on the volume of data searched
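The dense/sparse distinction and the reduction trigger can be expressed as two small checks. This sketch takes the slide's 10% match ratio as the dense threshold and models the indexes.conf settings enableTsidxReduction and timePeriodInSecBeforeTsidxReduction. We believe the latter defaults to 604800 seconds (7 days), but check indexes.conf.spec for your version. Treating a bucket as eligible once its newest event is older than that period is our assumption. `is_dense_search` and `is_reduction_eligible` are hypothetical names.

```python
DENSE_MATCH_RATIO = 0.10                      # "10% or more" of a bucket's events
TIME_PERIOD_BEFORE_TSIDX_REDUCTION = 604_800  # assumed default, 7 days


def is_dense_search(matching_events: int, bucket_events: int) -> bool:
    """True when a search matches at least 10% of the events in a bucket."""
    if bucket_events <= 0:
        raise ValueError("bucket_events must be positive")
    return matching_events / bucket_events >= DENSE_MATCH_RATIO


def is_reduction_eligible(newest_event_epoch: int, now_epoch: int,
                          enable_tsidx_reduction: bool,
                          period_secs: int = TIME_PERIOD_BEFORE_TSIDX_REDUCTION) -> bool:
    """True when reduction is enabled and the bucket's newest event is older than the period."""
    return enable_tsidx_reduction and now_epoch - newest_event_epoch > period_secs


if __name__ == "__main__":
    print(is_dense_search(150, 1_000))  # True: 15% of the events match
    print(is_dense_search(3, 1_000))    # False: needle in the haystack
```

Sparse searches that hit reduced buckets are the case where the 3-10X slowdown applies.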
Retention
How long does this stuff stay around?
Retention
General Guidelines
[Figure: life cycle stages Hot, Warm, Cold, Frozen, Delete, Data Roll, and TSIDX Reduce, with the label "Uncompressed Raw Data".]
Calculating Retention
Splunk Volumes
• Find your “typical” search range:
index=_audit action=search info=completed is_realtime=0
• Consider using TSIDX reduce to conserve more disk space
• Factor reduced buckets into storage planning
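Once you know your typical search range, you can work out a rough split between Hot/Warm and Cold. This sketch assumes the commonly cited rule of thumb that indexed data takes about 50% of its uncompressed raw size on disk, so measure your own data. It ignores cluster replication and TSIDX reduce savings, and it sizes Hot/Warm to cover the typical search range. `plan_retention` is a hypothetical helper.

```python
from typing import Dict

ON_DISK_RATIO = 0.5  # assumed rule of thumb: indexed size is about 50% of raw


def plan_retention(daily_raw_gb: float, typical_search_days: int,
                   total_retention_days: int,
                   on_disk_ratio: float = ON_DISK_RATIO) -> Dict[str, float]:
    """Split total retention into Hot/Warm (typical search range) and Cold storage, in GB."""
    if not 0 <= typical_search_days <= total_retention_days:
        raise ValueError("typical_search_days must be between 0 and total_retention_days")
    per_day_gb = daily_raw_gb * on_disk_ratio
    return {
        "hot_warm_gb": per_day_gb * typical_search_days,
        "cold_gb": per_day_gb * (total_retention_days - typical_search_days),
    }


if __name__ == "__main__":
    # Hypothetical: 100 GB/day raw, most searches cover 30 days, keep 1 year
    print(plan_retention(100, 30, 365))  # {'hot_warm_gb': 1500.0, 'cold_gb': 16750.0}
```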
Retention
Volume Definitions
▶ Control retention for all indexes that reference the volume
• Allows you to consume a defined storage amount across multiple indexes
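Conf File indexes.conf
[volume:<volume name>]
path = <path to volume storage>
maxVolumeDataSizeMB = <maximum size in MB>
[<index name>]
homePath = volume:<volume name>/$_index_name/db
coldPath = volume:<volume name>/$_index_name/colddb
thawedPath = $SPLUNK_DB/<index name>/thaweddb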
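A volume caps the combined size of every index that references it, so growth in one index can push out the oldest data of another. The simplified sketch below assumes that when a volume exceeds maxVolumeDataSizeMB, the buckets whose newest event is oldest roll off first, regardless of which index owns them. As we understand it, buckets rolled off a homePath volume go to Cold, and buckets rolled off a coldPath volume go to Frozen. `VolumeBucket` and `buckets_to_roll` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VolumeBucket:
    index: str
    bucket_id: str
    newest_event_epoch: int
    size_mb: int


def buckets_to_roll(buckets: List[VolumeBucket],
                    max_volume_data_size_mb: int) -> List[Tuple[str, str]]:
    """Return (index, bucket_id) pairs rolled off the volume, oldest first across all indexes."""
    total_mb = sum(b.size_mb for b in buckets)
    rolled = []
    for b in sorted(buckets, key=lambda b: b.newest_event_epoch):
        if total_mb <= max_volume_data_size_mb:
            break
        rolled.append((b.index, b.bucket_id))
        total_mb -= b.size_mb
    return rolled


if __name__ == "__main__":
    volume = [
        VolumeBucket("firewall", "fw_1", 100, 300),
        VolumeBucket("web", "web_1", 200, 300),
        VolumeBucket("web", "web_2", 300, 300),  # new web data pushes the volume over
    ]
    print(buckets_to_roll(volume, max_volume_data_size_mb=700))  # [('firewall', 'fw_1')]
```

Here the web index's growth is what exceeds the cap, but the firewall index loses its oldest bucket.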
▶ Freezing Data
• Frozen buckets are not fixed up by the cluster
− http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Bucketsandclusters#How_the_cluster_handles_frozen_buckets
▶ Storage
• Keep your summaries in the Hot/Warm volume for best performance
• Be aware of storage impact
Questions?
Help me help you
Thank You
Don't forget to rate this session in the .conf2017 mobile app