Observability Buyers Guide
Buyer’s Guide
Improve digital resilience by lowering the cost of unplanned downtime
Table of Contents
About this buyer’s guide
Gartner, Magic Quadrant for Security Information and Event Management (October 2022) | Gartner: Market Share: All Software Markets, Worldwide 2021 (April 2022) | Gartner, Market Share Analysis: ITOM, Performance Analysis Software (October 2022) | IDC, Worldwide
Security Information and Event Management Market Shares, 2021: The Cardinal SIEMs, doc #US48506522 (July 2022) | IDC, Worldwide IT Operations Management Software Market Shares, 2021: Market Growth Moderates, doc #US49609921 (September 2022) | GigaOm, Radar
for Cloud Observability Solutions (March 2022) | Quadrant Knowledge Solutions, SPARK Matrix for Cloud Observability (December 2022)
GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved. Gartner does not endorse any vendor, product or service depicted in its research publications, and does
not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties,
expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
Lastly, we’ll get down to business and help you get started
with your own observability practice, including core buying
criteria (what you should be looking for in an observability
tool), a breakdown of pricing and licensing models for different
tools, and guidance on how to evaluate different observability
vendors for long-term partnership.
Digital resilience.
Observability is …
a practice used by software developers, site reliability engineers and IT operations to improve digital resilience by lowering the cost of unplanned digital downtime.

Ninety-one percent of organizations said one hour of downtime that takes mission-critical infrastructure and applications offline costs them at least $300,000 due to lost business, productivity disruptions and remediation efforts.
When time is money, every minute counts. And during an economic downturn,
every dollar and every purchase lost counts even more.
But how can you be more resilient when our digital reality barely resembles what it looked like just a few years ago? IT and DevOps tech stacks are exploding because of the growing complexity of software development, the need for faster development cycles and the increasing demand for automation and collaboration. At the same time, customers’ expectations continue to grow. They want more digital interactions and they expect them to be perfect. If they aren’t seamless and secure, your customers will punish you not only with their cash but also with their voices. When a page fails to load quickly, you’ll hear about it. You’ll read the angry tweet as the world finds out, too.

So it’s no surprise that if you ask an alert-fatigued SRE or ITOps practitioner what they need to be more resilient, the answer will inevitably be time and information. They care about:
• Wasting less time firefighting in noisy alert storms and war rooms.
• Eliminating the guesswork required to fix problems.
• Making their processes and tech environment more reliable, preventing issues from becoming customer-facing problems.
• Having more complete information available to make the right call at the right time, both tactically day-to-day and for long-term strategic business decisions.
All of these factors increase the likelihood that a critical signal (like a failure, error
or outage) goes unnoticed.
Most other monitoring and observability tools weren’t built to handle the frequency of changes or the explosion of potential failure scenarios found with modern software. For example, the traditional way monitoring tools are used starts with an engineer getting paged and interpreting dashboards or logs to investigate the problem. But these days, it’s impossible to predict all of the ways our software might break, which means it’s impossible to set up alerts for every potential failure scenario.

In addition, finding root causes still requires too much manual labor, guesswork and expensive war room calls. A lot of tools do a good job surfacing visibility into application golden signals (like latency, traffic, errors and saturation) or infrastructure metrics (like memory and CPU utilization). Still, it’s not easy to get to the “so what?” to understand the broader impact of a software or infrastructure performance issue.

Not to mention that nearly every monitoring vendor tries to lock you in by making you learn and implement their proprietary agents for instrumenting telemetry data.

It’s time for a new approach.
An observability practice helps ITOps and engineering teams gain complete business visibility across their infrastructure, applications and digital customer experience. Teams need the ability to proactively spot unknowns and see root causes of problems before customers are impacted, all with full control over their data.
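For readers less familiar with the golden signals mentioned above (latency, traffic, errors and saturation), here is a minimal Python sketch of how they might be summarized for one time window. It is illustrative only: the `Request` record, the `golden_signals` function, the nearest-rank 95th percentile and the use of CPU utilization as the saturation signal are all assumptions made for this example, not any vendor's implementation.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One handled request. Field names are hypothetical, chosen for this sketch."""
    duration_ms: float
    ok: bool


def golden_signals(requests, window_seconds, cpu_utilization):
    """Summarize latency, traffic, errors and saturation for one time window."""
    if not requests:
        return {"latency_p95_ms": 0.0, "traffic_rps": 0.0,
                "error_rate": 0.0, "saturation": cpu_utilization}
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank 95th percentile, using integer math for ceil(0.95 * n).
    rank = (95 * len(durations) + 99) // 100
    failed = sum(1 for r in requests if not r.ok)
    return {
        "latency_p95_ms": durations[rank - 1],
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": failed / len(requests),
        "saturation": cpu_utilization,  # assumed to be supplied by the host
    }


# Ten requests in a 5-second window, two of them failed.
sample = [Request(duration_ms=10.0 * (i + 1), ok=i not in (3, 7)) for i in range(10)]
signals = golden_signals(sample, window_seconds=5, cpu_utilization=0.6)
print(signals)
```

Numbers like these show that something is slow or failing. As the text notes, they do not by themselves answer the “so what?” about business impact.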
After migrating to Splunk, the average time it takes to recover from a system failure has now gone from 30 minutes to about five minutes, and they’re fixing things more than 80% faster.

The right observability solution will give you time back. Pointing teams to a few actionable events and helping prioritize them based on service impact means they can quickly identify root cause and improve their mean time to resolve (MTTR) critical incidents.

“We can now correlate backend traces from APM with frontend traces from RUM. That’s a huge value because that’s been our missing link. It’s been very illuminating and has revealed hidden inefficiencies that we’re now able to address.”
Sean Schade, Principal Architect, Care.com

• With Splunk, teams can view a unified and holistic picture of their application and infrastructure health. Shared visibility can help break down silos and promote collaboration between different teams, such as development, operations and security.
• Splunk provides rich contextual data and insights, such as logs, metrics and traces, which can help teams quickly identify and diagnose issues. This information can be shared across teams, enabling better collaboration and problem solving.
• Splunk supports collaboration workflows, such as shared dashboards, alerts and reports. This helps teams share information and coordinate efforts in real time.

Does it reduce alert noise?
Intelligent event correlation in Splunk ITSI uses machine learning to group and prioritize logs, metrics and events from multiple sources (infrastructure, applications, networks, etc.) and helps ITOps teams reduce alert noise by over 90%.

Does it prioritize alerts?
We do that via guided root cause analysis that uses machine learning and historical data to prioritize alerts and float the big rocks to the top.

Is it OpenTelemetry native?
Many monitoring tools still use proprietary agents that are cumbersome to maintain and very expensive to scale in cloud environments. And while some of them claim to use OpenTelemetry, the number of projects that contribute to OpenTelemetry continues to be low. Using an OpenTelemetry native observability solution helps development teams build faster and more reliable applications by providing streamlined observability, efficient debugging, improved collaboration, better resource allocation and reduced vendor lock-in. Splunk is the top contributor to OpenTelemetry.

“Splunk Observability Cloud captures all the logs, metrics and traces in a way that allows us to understand any event across our platform, so we can ask questions and get answers.”
Matt Coddington, Senior Director of DevOps Engineering, Care.com
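The recovery figures quoted in this section (30 minutes down to about five) can be checked with simple arithmetic. The sketch below assumes MTTR is the average of resolved-minus-detected times and reads “faster” as the relative drop in recovery time. The function names and sample incidents are hypothetical, introduced only for illustration.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents):
    """Average (resolved - detected) over a list of (detected, resolved) pairs."""
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


def percent_reduction(before, after):
    """Relative drop from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# Two made-up incidents that took 20 and 40 minutes to resolve.
start = datetime(2023, 1, 5, 10, 0)
incidents = [
    (start, start + timedelta(minutes=20)),
    (start, start + timedelta(minutes=40)),
]
mttr = mean_time_to_resolve(incidents)

# The figures from the text: 30 minutes before, about five minutes after.
reduction = percent_reduction(timedelta(minutes=30), timedelta(minutes=5))
print(f"MTTR: {mttr}, reduction: {reduction:.1f}%")
```

Under that reading, going from 30 minutes to five is a reduction of roughly 83%, which is consistent with the “more than 80%” claim.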
Logs and machine data
Agility to quickly adapt to changing business conditions, customer needs or market trends.
• Ideal for static reporting | Ideal for on-demand analytics
• Structured events | Any data
• Pre-defined, normalized | Index data “as is” in native format
• Filtered, adulterated data | Complete, pristine raw data
• Limited to known knowns | Unknown unknowns
• ETL into brittle schema | Flexible index “Schema-on-the-Fly”
• Enrich at write | Enrich at read
• Write SQL & build report | Dynamic Google-type search
• New questions = re-write & start over | Ask anything, anytime
• “Data at rest” | “Data in motion”

Metrics
Help DevOps teams quickly identify and resolve issues, improving system reliability and reducing downtime.
• Written for VMs and retrofitted for containers | Natively architected for microservices
• Infrastructure observability | Business observability
• Pre-built metrics | Pre-built & custom metrics + metadata
• Proprietary monetized agents | Any metrics, any source (e.g., OpenTelemetry)
• Heavy processing post ingestion | In-flight real-time analytics
• Memory constrained | Highly scalable
• Pre-defined metrics | Custom metrics for YOUR business
• Limited to known knowns | Infinite dimensions/high cardinality
• Batch analysis | Real-time streaming analytics
• Human analytics & alert overload | Machine-grade analytics & automation
• No or limited logging | Full Splunk integration

Traces
Improved visibility, scalability, performance and development processes.
• Architected for monolithic apps | Architected for modern apps
• Proprietary heavy client | OpenTelemetry open standards client
• Agent-based analytics (snapshots) | Cloud-based analytics (omniscient)
• Rigid, pre-defined data tagging | Flexible, universal data tagging
• DB batch analytics (post ingress) | Real-time streaming analytics & alerting
• Correlate data abstractions (guess) | Analyze 100% of raw data (facts)
• Container-level within apps only | True microservices
• High-level analytics | Granular analytics
• Legacy/monolith APM | Distributed/microservices APM
• Limited RUM | Customer experience end-to-end visibility
• No or limited logging | Full Splunk integration
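The contrast drawn above between “enrich at write” and “enrich at read” (indexing data “as is” and applying a schema on the fly) can be illustrated with a small Python sketch. It is a generic illustration under stated assumptions, not how any particular product works: the `RAW_EVENTS` log lines, the key=value format and the `search` helper are all invented for this example.

```python
import re

# Raw events are kept exactly as received; nothing is parsed at write time.
RAW_EVENTS = [
    "2023-01-05T10:00:01Z service=checkout status=500 latency_ms=1200",
    "2023-01-05T10:00:02Z service=search status=200 latency_ms=85",
    "2023-01-05T10:00:03Z service=checkout status=200 latency_ms=140 region=us-east",
]

# Fields are discovered at read time by matching key=value pairs.
KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def search(events, **conditions):
    """Extract fields at read time and keep events matching every condition."""
    matches = []
    for line in events:
        fields = dict(KEY_VALUE.findall(line))
        if all(fields.get(key) == value for key, value in conditions.items()):
            matches.append(fields)
    return matches


# A question nobody planned for when the data was stored: the `region`
# field appears on only one event, yet it can still be queried.
print(search(RAW_EVENTS, service="checkout"))
print(search(RAW_EVENTS, region="us-east"))
```

Because no schema is fixed when the data is stored, a new field such as `region` does not require rewriting an ingestion pipeline before it can be queried. The trade-off is that parsing work moves to query time.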
leaders. This helps IT teams showcase the value they provide to lines of
business, and drives better informed strategic decisions.
How complex is their pricing model?
Many observability and monitoring tools have complex pricing models that make it difficult for users to understand how much they will have to pay. Some tools charge per host, per container, or per metric, while others charge based on the volume of data ingested or the number of alerts generated. This makes it challenging for users to accurately estimate their costs and plan their budgets.
• Unlike competitors, we do not adjust license cost by host RAM size, so it doesn’t matter whether your cloud VMs have 8, 16, 32 or 64 GB of RAM with Splunk: the cost is the same across all of them.
• OpenTelemetry native solutions like Splunk are more cost effective over time because they don’t require licensing fees and your staff doesn’t have to spend time learning proprietary agents before they can be effective.
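When comparing the pricing models described above, it can help to run your own usage numbers through each one. The following Python sketch compares a per-host model with a volume-based model. Every name and rate in it is hypothetical and invented for illustration; none is taken from any vendor's price list.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Monthly usage figures; all names and numbers are illustrative."""
    hosts: int
    ingested_gb: float


def compare(usage, rate_per_host, rate_per_gb):
    """Return the monthly cost under each model and the name of the cheaper one."""
    costs = {
        "per_host": usage.hosts * rate_per_host,
        "per_gb_ingested": usage.ingested_gb * rate_per_gb,
    }
    cheapest = min(costs, key=costs.get)
    return costs, cheapest


# Hypothetical environment: 50 hosts sending 2,000 GB per month.
usage = Usage(hosts=50, ingested_gb=2000)
costs, cheapest = compare(usage, rate_per_host=15.0, rate_per_gb=0.5)
print(costs, cheapest)
```

Rerunning the comparison with projected growth in hosts or data volume shows how quickly the cheaper option can flip. That is one reason a hard-to-predict pricing model makes budgeting difficult.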
How do I avoid vendor lock-in and how does this vendor support
customization I may require down the road?
Many observability and monitoring tools are offered as a package deal,
making it difficult for users to switch vendors or use only the parts of the
tool that they need. This can lead to vendor lock-in, where users are tied to a
specific vendor and have limited options for customization and flexibility.
• Community and ecosystem
Evaluate the size and vibrancy of the vendor’s community and ecosystem. Look for evidence of an engaged user community, a robust partner network and a commitment to collaboration and knowledge sharing.
  • Splunk has a large and active user community and provides comprehensive support and training resources. This can reduce the need for expensive vendor support contracts.
• Avoid limited integrations
Some observability and monitoring tools are limited in their ability to integrate with other tools or platforms. This can make it difficult for users to get a complete view of their systems and applications, leading to blind spots and making it challenging to identify and diagnose issues.
  • Splunkbase powers Splunk Observability customers with over 2,800 apps (most of these are free). It offers a wide range of content, including apps, add-ons, dashboards and more, that can help observability customers get more out of their Splunk deployment.
  • Lastly, combining the capabilities of Splunk Observability Cloud, Splunk Cloud and Enterprise platforms empowers customers with a holistic, cost-effective and reliable technology platform for all their IT and engineering needs.
• Company history and reputation
Research the vendor’s history and reputation in the market. Look for evidence of stability, growth and a commitment to customer success.
• A Visionary in 2022 Gartner® Magic Quadrant™ for Application Performance Monitoring and Observability [1]
• A Strong Performer in Forrester® Wave™ for AI for IT Operations Solutions, 2022 [2]
• #1 by Market Share in Gartner® Market Share Analysis: ITOM, Health and Performance Analysis Software, Worldwide, 2021 (published 2022) [3]
• Ranked #1 Market Share in IDC Worldwide IT Operations Analytics Software Market Shares, 2021: Market Growth Accelerates, 2022 [4]
• A Market Leader in Research in Action’s Vendor Selection Matrix for AIOps Platforms, 2022
• A Notable Vendor in Constellation ShortLists for Observability, AIOps & Incident Management, 2022
• A Market Leader in Omdia Universe for AIOps, 2021-22
• Top 3 Vendor in EMA Code-Level Observability Award, 2021
1 Gartner® Magic Quadrant™ for Application Performance Monitoring and Observability (June 2022)
GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally, and MAGIC QUADRANT is a registered trademark of Gartner, Inc. and/or its affiliates and are
used herein with permission. All rights reserved. Gartner does not endorse any vendor, product or service depicted in its research publications and does not advise technology users to select only those vendors
with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s Research & Advisory organization and should not be construed as statements of fact. Gartner
disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
2 The Forrester Wave™: Artificial Intelligence for IT Operations, Q4 2022 (December 2022)
3 Gartner® Market Share Analysis: ITOM, Health and Performance Analysis Software, Worldwide, 2021 (October 2022)
4 IDC Research: Worldwide IT Operations Analytics Software Market Shares, 2021: Market Growth Accelerates (doc #US49609921, September 2022)
Get Started
Splunk, Splunk> and Turn Data Into Doing are trademarks and registered trademarks of Splunk Inc. in the United States and
other countries. All other brand names, product names or trademarks belong to their respective owners. © 2023 Splunk Inc.
All rights reserved.