IT1171
PJ Pokhrel
Performance Engineer | StubHub
© 2019 SPLUNK INC.
6. Summary
StubHub
Introduction to StubHub
Our Stack
How do you detect production issues early in this complexity?
Exceptions Sniffer
What is the exceptions sniffer?
• java.io.IOException
• java.lang.NumberFormatException
• java.lang.NullPointerException
• java.net.SocketTimeoutException
• java.sql.SQLException
• …
• SHBadRequestException
• SHResourceNotFoundException
• UserNotAuthorizedException
• …
Overview:
• Internal Java application that sniffs errors and exceptions in Java apps from application logs
• Calls Splunk REST APIs
• Data processed by the sniffer and saved to PostgreSQL
• Rules engine
• Alert manager module to send alerts
Issues:
• Became slower as data grew
• Time consuming
• Server maintenance
• Dependent on Splunk REST APIs
• No native machine learning support
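The sniffing step can be illustrated with a minimal Python sketch. It is not the original sniffer's code: it assumes exceptions can be recognized in raw log lines by a class name ending in "Exception" or "Error", and the log lines, the `com.example` package and the `count_exceptions` helper are made up for illustration.

```python
import re
from collections import Counter

# Assumed pattern: optional lowercase package prefix, then a class name
# that ends in "Exception" or "Error" (a bare "Exception" is not matched).
EXCEPTION_RE = re.compile(r"\b(?:[a-z_]\w*\.)*[A-Z]\w*(?:Exception|Error)\b")


def count_exceptions(log_lines):
    """Count exception class names seen in an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        for name in EXCEPTION_RE.findall(line):
            counts[name] += 1
    return counts


# Made-up sample lines, only to exercise the function.
sample_logs = [
    "10:00:01 ERROR java.lang.NullPointerException at com.example.Foo.bar(Foo.java:42)",
    "10:00:02 WARN SHBadRequestException: missing eventId",
    "10:00:03 ERROR java.lang.NullPointerException at com.example.Foo.baz(Foo.java:57)",
    "10:00:04 INFO request completed in 12 ms",
]
exception_counts = count_exceptions(sample_logs)
for name, count in exception_counts.most_common():
    print(name, count)
```

Counts like these, bucketed by time and application, are the kind of data the rules engine and alert manager would then act on.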
• Built in Splunk
• Set of data models, metrics, dashboards and alerts
• Uses Splunk components: metric store, alert, dashboard, machine learning functions, etc.
• Allows us to store a lot of data without worrying about space, reducing the time to generate weekly and monthly reports
Journey
How did we get started?
Splunk Metrics
Data retrieval performance before and after
Dashboard
Splunk dashboard that allows users to filter and group data
Weekly Report
Weekly exceptions report heatmap view by role
Monthly Report
Monthly exceptions trend report
Reports are great, but we also needed data in real-time to take action
Alerts
Actionable alert policies
• Threshold method
  – Standard deviation
  – Standard deviation with sliding window
  – Median absolute deviation
• Other ML algorithms
  – Clustering to find underlying structure in exception data
  – Probability density function
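The three threshold methods can be compared with a short Python sketch. It is an illustration under assumptions, not the alert logic used in the talk: the multiplier `k`, the window size, the function names and the exception counts are all made up.

```python
import statistics


def sd_bounds(values, k=3.0):
    """Return (low, high) as mean +/- k population standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return mean - k * sd, mean + k * sd


def mad_bounds(values, k=3.0):
    """Return (low, high) as median +/- k median absolute deviations."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return med - k * mad, med + k * mad


def sliding_sd_outliers(values, window=5, k=3.0):
    """Flag each point against SD bounds built from the previous `window` points."""
    flags = []
    for i, v in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < window:
            flags.append(False)  # not enough history to judge this point
            continue
        low, high = sd_bounds(history, k)
        flags.append(v < low or v > high)
    return flags


# Made-up exception counts per interval, with one spike.
counts = [10, 12, 11, 13, 12, 11, 60, 12, 10, 11]
sd_low, sd_high = sd_bounds(counts, k=3.0)
mad_low, mad_high = mad_bounds(counts, k=3.0)
window_flags = sliding_sd_outliers(counts, window=5, k=3.0)
```

On this made-up series the spike inflates the global standard deviation enough that the 3 SD band does not flag it, while the sliding window and the median absolute deviation both do. The median absolute deviation band is also tight enough that ordinary values sit right on its edge, which hints at why these thresholds can end up too small or too large.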
Both global and local outliers found, but now it’s way too noisy
Thresholds are too small or large at certain periods
The boundaries look better, but it still appears to be way too noisy
[Figure: distribution with axis marks at -2 SD, -1 SD, Average, +1 SD, +2 SD, +3 SD]
| apply MyModel
The multi-modal nature of our data is probably due to the fact that our data is cyclical
Consider what else you may want to split your data by (app type, user group, etc.)
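Why splitting matters can be shown with a small fit-and-apply sketch in Python. It is not the MLTK implementation: it swaps in a plain normal distribution per group, and the group names (`peak`, `off_peak`), the `threshold` value, the function names and the training data are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import NormalDist


def fit_density(rows):
    """Fit one normal distribution per group from (group, value) pairs.

    Each group needs at least two values that are not all identical.
    """
    by_group = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)
    return {g: NormalDist.from_samples(vals) for g, vals in by_group.items()}


def apply_density(model, group, value, threshold=0.01):
    """Return True if value falls in the outer `threshold` tail mass of its group."""
    dist = model[group]
    low = dist.inv_cdf(threshold / 2)
    high = dist.inv_cdf(1 - threshold / 2)
    return value < low or value > high


# Made-up training data: exception counts in two regimes of a daily cycle.
training = (
    [("peak", v) for v in [100, 110, 95, 105, 90]]
    + [("off_peak", v) for v in [10, 12, 9, 11, 8]]
)
model = fit_density(training)

print(apply_density(model, "peak", 100))      # a typical peak value
print(apply_density(model, "off_peak", 100))  # the same value off peak
```

The same count is judged against a different distribution in each group, so a value that is ordinary at peak time can still stand out off peak. A single pooled distribution over both modes would be too wide to catch it.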
Want to Learn More About Density Function?
Additional sessions to further deep dive on the theory and example use cases
Pros
• Easy to understand
• MLTK assistant available
Cons
• Doesn’t support fit or apply
• Complicated to use for alerts

Pros
• Better for outlier detection
• Supports fit and apply, so easier to set up
Cons
• Not available in MLTK assistant yet
Effective Alerts
Final findings for the most effective alert for each exception pattern
Thank You!
Go to the .conf19 mobile app to
Q&A
Steve Veio | Ops Manager
PJ Pokhrel | Performance Engineer
Eurus Kim | Staff ML Architect | Splunk