
SLA Monitoring

This document discusses how Catchpoint can be used to monitor service level agreements (SLAs) for cloud services. It explains that SLAs define acceptable service levels through service level objectives (SLOs) and service level indicators (SLIs) that measure metrics like response time and availability. While cloud providers report on SLA performance, independent monitoring through Catchpoint improves accountability by measuring from multiple vantage points. Catchpoint allows configuring tests of critical transactions and alerts when SLOs are breached to help ensure services meet contractual obligations.

Uploaded by

Ary Antonietto

Handbook

SLA Monitoring


Catchpoint SLA Monitoring Handbook


With more applications moving to the cloud, the
role of IT shifts from managing performance and
availability to governance. Ensuring service level
agreements (SLAs) are being met is important for
both the consumer and provider of a SaaS service.
SLAs provide the consumer with objective grading
criteria and protection from poor service. The
provider is able to set appropriate expectations
regarding how the service will be judged, and is
incentivized to improve the quality of service.

The 2017 State of SaaS Performance report
conducted by Tech Target revealed that over 25%
of survey respondents had incurred financial
penalties for failing to meet SLAs. Monitoring
your cloud providers can improve accountability,
and help you potentially recoup costs if an SLA
is breached. This handbook focuses on what
consumers of cloud services can do to measure
the performance and manage SLAs of their various
cloud providers.



The term SLA is widely used and has become an umbrella term. You can’t actually monitor an SLA.
The metric measured is a service level indicator (SLI) in relation to a service level objective (SLO). An
SLI is a quantitative measure of the level of service. This measure could be related to response time,
errors, or availability. The SLO provides a value or range of values considered acceptable for the
SLI. There is generally an upper bound or a lower bound. For example, an SLO may state that DNS
resolution time must not exceed 100 ms.

Service Level Indicator (SLI)
Metric that is measured

Service Level Objective (SLO)
Acceptable range of values for the SLI

Service Level Agreement (SLA)
Legal document or contract with end users defining
consequences if SLO is not met.

The SLA can outline how the SLI is going to be measured, the length of time or number of
measurements that must be outside the range, and the consequences when the agreement is
breached. If there are no consequences, there is no SLA.
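
The relationship between the three terms can be sketched in a few lines of Python. This is a minimal illustration, not a Catchpoint feature: the `Slo` class, the `is_breached` function, and the rule of three consecutive misses are all assumptions chosen for the example. The SLI is the list of measured values, the SLO is the upper bound, and the SLA's breach rule is the number of consecutive measurements allowed outside the range.

```python
from dataclasses import dataclass


@dataclass
class Slo:
    """Hypothetical SLO: an upper bound for the SLI plus the SLA's breach rule."""
    upper_bound_ms: float   # acceptable upper bound, e.g. 100 ms for DNS resolution
    misses_for_breach: int  # consecutive out-of-range measurements that count as a breach


def is_breached(samples_ms, slo):
    """Return True once enough consecutive samples exceed the upper bound."""
    run = 0
    for value in samples_ms:
        # Extend the run of misses, or reset it when a sample is within range.
        run = run + 1 if value > slo.upper_bound_ms else 0
        if run >= slo.misses_for_breach:
            return True
    return False


dns_slo = Slo(upper_bound_ms=100.0, misses_for_breach=3)
print(is_breached([80, 120, 90, 130, 95], dns_slo))    # isolated misses only
print(is_breached([80, 120, 130, 140, 95], dns_slo))   # three misses in a row
```

Isolated slow measurements do not trip the rule in this sketch; only a sustained run does, which mirrors the "length of time or number of measurements" wording above.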



SLAs exist between consumers and providers of
services. The provider can be an external provider
or an internal provider. External providers can
include DNS services, content delivery networks,
hosting providers, productivity software,
communication and collaboration suites. Whether
the provider is internal or external, if the
application or service being provided is
business critical, an SLO and SLA should be set.
But setting an SLO and publishing an agreement
isn’t enough; you also need to measure the service
to ensure the SLOs are being met. This handbook
will cover how Catchpoint can be used to help
you monitor SLOs to ensure you are receiving the
appropriate level of service.



Improve accountability and governance

When talking about SLAs, it is very easy to get into a finger-pointing exercise where each party feels
they are right. “Yes, the performance wasn’t what was expected; the customer is right.” “No, the SLA
wasn’t breached; the provider is right.” Looking at things from multiple angles sometimes reveals
there is no one right answer.

The first step is to collect objective and accurate measurements. Cloud providers often provide
reports on how they are performing in relation to SLAs, but verifying this with your own monitoring
increases confidence in the level of performance being received. With too much noise in the data,
you won’t be able to determine effectively whether the SLA has been breached. Measuring from
a single location isn’t effective either; you need to measure from vantage points that match where your
users are located. The greater the number of vantage points, the easier it is to see whether issues are regional or
global.
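
As a rough illustration of why multiple vantage points help, the sketch below labels an issue as regional or global from per-location results. The function name, the location names, and the simple all-or-some rule are assumptions made for the example rather than anything defined by Catchpoint.

```python
def classify_scope(failing_by_location):
    """Classify an issue from a mapping of location -> True if that location is failing."""
    failing = sorted(loc for loc, failed in failing_by_location.items() if failed)
    if not failing:
        return "healthy", failing
    if len(failing) == len(failing_by_location):
        return "global", failing
    return "regional", failing


# Hypothetical results from three vantage points.
results = {"New York": False, "London": True, "Tokyo": False}
print(classify_scope(results))
```

With a single vantage point the function could only ever report "healthy" or "global", which is the limitation the paragraph above describes.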



It is good to measure from the cloud if that is where
your application is hosted, but that shouldn’t be the
sole source. If the cloud provider experiences an
outage and you’re only monitoring from the same
cloud provider, you will lose visibility and reporting —
not good.

Measurement locations should include backbone and
last mile. The tests from the backbone will eliminate
noise, so these are the metrics that should be used for
validating the SLO as they are the cleanest. Last mile
or real user measurements are still valuable as they
show the end users’ experience.
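
One way to keep the two data sources separate is sketched below: SLO compliance is calculated from backbone measurements only, while last mile measurements are summarized on their own as an indicator of user experience. The sample structure, the field names, and the 100 ms objective are assumptions for illustration.

```python
from statistics import median


def summarize(samples, slo_ms):
    """Validate the SLO on backbone samples and report last mile experience separately."""
    backbone = [s["ms"] for s in samples if s["source"] == "backbone"]
    last_mile = [s["ms"] for s in samples if s["source"] == "last_mile"]
    within = sum(1 for ms in backbone if ms <= slo_ms)
    return {
        # Share of backbone measurements inside the objective (None if no data).
        "backbone_slo_compliance": within / len(backbone) if backbone else None,
        # Typical end-user response time (None if no data).
        "last_mile_median_ms": median(last_mile) if last_mile else None,
    }


samples = [
    {"source": "backbone", "ms": 90},
    {"source": "backbone", "ms": 95},
    {"source": "backbone", "ms": 110},
    {"source": "backbone", "ms": 85},
    {"source": "last_mile", "ms": 150},
    {"source": "last_mile", "ms": 210},
    {"source": "last_mile", "ms": 180},
]
print(summarize(samples, slo_ms=100))
```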

Honeywell used Catchpoint OnPrem agents to
measure the SLAs of their satellite ISP from onboard
a fleet of planes. Network service levels were
measured and revealed that the spotbeam service
was over-saturated by Honeywell’s use during peak
travel times. They were receiving the level of service
they purchased, so no SLAs had been breached, but
their usage exceeded capacity. Measuring from the
vantage point of the user (in this case, passengers
on airplanes) in addition to backbone locations
revealed the service was running as designed, but
users still had a poor experience. To resolve the
performance issues, Honeywell needed to purchase a
higher level of service.



Capture data on critical business transactions

Users are interacting with more than just the home page of an application. They are logging in,
searching for information, and downloading content. If there are SLOs in place for multiple areas of
the application, those need to be monitored.

Transaction monitoring can be more complicated than single page monitoring when determining
which parameters should be used. Different inputs may yield different results based on application
logic or the APIs used to pull information. Using the same inputs may not provide enough insight,
but it isn’t feasible to test every permutation.

Catchpoint offers the ability to quickly create multi-step transactions through a Selenium-based
Chrome script recorder. The script recorder makes it easy to create and upload a transaction to
detect issues with key business processes. Logic can be inserted into scripts to choose different
search terms, select valid travel dates, or always click on the second item in a list by dynamically
cycling through a list of terms, dates, or numbers.
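
The idea of cycling through inputs can be illustrated independently of any scripting tool. The Python sketch below is not Catchpoint script syntax; the function names, the search terms, and the 14-day booking offset are hypothetical. It rotates through a list of search terms by run number and derives travel dates that are always in the future relative to the day the test runs.

```python
from datetime import date, timedelta

# Hypothetical search terms to rotate through on successive test runs.
SEARCH_TERMS = ["laptop", "headphones", "monitor"]


def search_term_for_run(run_number, terms=SEARCH_TERMS):
    """Pick a term by run number so successive runs exercise different inputs."""
    return terms[run_number % len(terms)]


def travel_dates(today, days_ahead=14, trip_length=3):
    """Return (departure, return) dates that stay valid whenever the test runs."""
    depart = today + timedelta(days=days_ahead)
    return depart, depart + timedelta(days=trip_length)


print(search_term_for_run(4))
print(travel_dates(date.today()))
```

Deriving inputs from the run number or the current date keeps the script deterministic for a given run while still varying what is tested over time.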



Because many SaaS solutions are purchased on
a per-seat basis, you may be reluctant to use
valuable seats to monitor the application. But
consider the alternative: what if you don’t monitor
and an issue occurs? A little knowledge can go a long
way; talk to your vendor about providing an extra
seat for testing and monitoring purposes.

Tests can be configured in Catchpoint to run on a
staggered basis, so you don’t have to worry about
errors occurring from simultaneous logins from a
single test account, or you can use multiple logins for
a single test. In the “Targeting and Scheduling” section,
select “Random” for Node Distribution. This will have
the test run at a different time on each node.
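
To show why staggering avoids simultaneous logins, the sketch below gives each node its own start offset within the test interval. It does not describe how Catchpoint implements the "Random" setting; the function name, node names, and five-minute interval are assumptions for illustration.

```python
import random


def staggered_offsets(nodes, interval_seconds, seed=None):
    """Assign each node a distinct start offset (in seconds) within the test interval."""
    if len(nodes) > interval_seconds:
        raise ValueError("more nodes than one-second slots in the interval")
    rng = random.Random(seed)
    # sample() draws without replacement, so no two nodes share an offset.
    offsets = rng.sample(range(interval_seconds), len(nodes))
    return dict(zip(nodes, offsets))


schedule = staggered_offsets(["New York", "London", "Tokyo"], interval_seconds=300, seed=1)
print(schedule)
```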



Receive notifications when something is wrong

Alerting is a core component of monitoring. Alerts let us know when a problem is occurring and
when an action needs to be taken. The actions taken when a SaaS service is facing problems are
different from when an internally-hosted application experiences similar problems, but they are no
less important. Configuring alerts when errors occur or response time thresholds are passed can set
in motion a series of events that may include:

• Disabling a tag for a third-party vendor
• Notifying users that a problem is occurring
• Engaging with the vendor

When tracking the SLAs for third-party tags, the hosts and zones feature allows you to create custom
categories of content known as zones. A zone can be created for your social media tags, advertising
tags, analytics tags, etc. Once the zones are enabled, alerts can be triggered when thresholds are
passed. This enables organizations to quickly identify which third-party component is potentially in
breach of an SLA.
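
The grouping logic behind zones can be sketched as follows. This is an illustration of the concept, not the hosts and zones feature itself: the zone names, the host suffixes, the thresholds, and both function names are assumptions.

```python
# Hypothetical zone definitions: zone name -> host suffixes that belong to it.
ZONES = {
    "social": ["facebook.net", "twitter.com"],
    "ads": ["doubleclick.net"],
    "analytics": ["google-analytics.com"],
}


def zone_for(host):
    """Return the zone whose suffix matches the host, or "other" if none does."""
    for zone, suffixes in ZONES.items():
        if any(host == s or host.endswith("." + s) for s in suffixes):
            return zone
    return "other"


def zones_over_threshold(requests, thresholds_ms):
    """Sum response time per zone and list the zones that pass their threshold."""
    totals = {}
    for host, ms in requests:
        zone = zone_for(host)
        totals[zone] = totals.get(zone, 0) + ms
    return sorted(z for z, total in totals.items()
                  if total > thresholds_ms.get(z, float("inf")))


requests = [
    ("connect.facebook.net", 300),
    ("platform.twitter.com", 250),
    ("stats.g.doubleclick.net", 120),
    ("www.example.com", 400),
]
print(zones_over_threshold(requests, {"social": 500, "ads": 200}))
```

Zones without a configured threshold never alert in this sketch, so first-party content in "other" is ignored.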



Zone alerting is one of many advanced alerts
available to track SLAs. Other alert types include
content matches, response time, and availability
alerts. All alerts can be customized to specify
recipients and additional instructions. Notification
groups allow you to target specific groups based
on the severity of an alert. As an outage escalates,
additional parties will need to be notified; setting a
critical notification group allows those parties to be
notified at the right time.
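
A simplified picture of severity-based notification groups is shown below. The group membership, the email addresses, and the warning and critical thresholds are placeholders invented for the example, not Catchpoint settings.

```python
# Hypothetical notification groups keyed by alert severity.
NOTIFICATION_GROUPS = {
    "warning": ["ops-oncall@example.com"],
    "critical": ["ops-oncall@example.com", "vendor-manager@example.com",
                 "it-leadership@example.com"],
}


def classify(response_ms, warning_ms, critical_ms):
    """Map a response time to a severity, or None when no alert is needed."""
    if response_ms >= critical_ms:
        return "critical"
    if response_ms >= warning_ms:
        return "warning"
    return None


def recipients_for(severity):
    """Return who should be notified for the given severity."""
    return NOTIFICATION_GROUPS.get(severity, [])


severity = classify(response_ms=4200, warning_ms=2000, critical_ms=4000)
print(severity, recipients_for(severity))
```

As the measured value moves from the warning band to the critical band, the wider group is notified, which is the escalation behavior described above.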



The instructions section allows custom content to be included in the alert email. This can be useful
for including action items such as instructions for disabling a tag, details on notifying stakeholders
and users within the organization, or contact information for the SaaS provider. Including all relevant
information in the email saves time when incidents occur.



Partner with vendors and share information

To work together effectively, information must be
shared. Transparency is important when enforcing
SLAs. Instead of implicitly trusting that everything
is fine, both sides must show and share data.
This applies when actively troubleshooting
an issue and when things are running smoothly. Talk
to your SaaS vendor about how they measure and
monitor performance. Get advice from them on
what you should be monitoring from the customer’s
perspective.

Regardless of how much everyone hates outages,
avoiding them completely is simply not possible. Yet
being able to share snapshots, charts, waterfalls, and
other diagnostic information can help to reduce the
mean time to resolve an incident. Public URLs can be
created for the majority of Catchpoint charts, making
it easy to quickly share information with external
recipients when it matters most.



Public URL included in each waterfall.

Link to create a public URL for a chart:

When not actively troubleshooting a performance problem, it can be easy to lose track of whether
SLAs are being met. Receiving regular reports that track performance in relation to the SLOs ensures
that all stakeholders are made aware if and when an SLA has been breached in a given time period.
Color-coded reports from Catchpoint can be emailed on a scheduled basis (hourly to quarterly) or
added as a widget to a dashboard to keep all stakeholders informed of SLA statuses.
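
The color coding in such a report comes down to comparing a measured value with its objective. The sketch below shows one possible rule; the three colors, the function name, and the 0.05 percentage point warning margin are assumptions and may not match how Catchpoint colors its reports.

```python
def status_color(measured_pct, slo_pct, warning_margin_pct=0.05):
    """Return a report color for a measured availability against its SLO."""
    if measured_pct < slo_pct:
        return "red"      # objective missed
    if measured_pct - slo_pct < warning_margin_pct:
        return "yellow"   # objective met, but with little headroom
    return "green"        # objective met comfortably


for measured in (99.99, 99.92, 99.5):
    print(measured, status_color(measured, slo_pct=99.9))
```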



To create an SLA report, first create a template.

The template can then be used to create a scheduled
report showing whether a given test was within the
SLOs specified in your agreement.



The report keeps you up to date on performance
levels on a regular basis.

Applications periodically have maintenance windows
or password changes, which result in test “failures.”
These test runs need to be excluded when
creating reports. Purging this data from tests gives you a
more accurate representation of whether an SLO has
been met.
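
The effect of excluding maintenance windows can be seen in a small availability calculation. This is a sketch of the arithmetic only; the run data, the window times, and the function name are made up for the example.

```python
from datetime import datetime


def availability(runs, maintenance_windows):
    """Percentage of successful runs, ignoring runs inside a maintenance window.

    runs: list of (timestamp, succeeded) tuples.
    maintenance_windows: list of (start, end) tuples; end is exclusive.
    """
    counted = [ok for ts, ok in runs
               if not any(start <= ts < end for start, end in maintenance_windows)]
    if not counted:
        return None  # nothing left to report on
    return 100.0 * sum(counted) / len(counted)


day = datetime(2024, 1, 1)
runs = [
    (day.replace(hour=0), True),
    (day.replace(hour=1), True),
    (day.replace(hour=2), False),  # fails during planned maintenance
    (day.replace(hour=3), True),
    (day.replace(hour=4), False),  # genuine failure
]
window = [(day.replace(hour=1, minute=30), day.replace(hour=2, minute=30))]
print(availability(runs, []))       # all runs counted
print(availability(runs, window))   # maintenance run excluded
```

In this example the excluded run raises reported availability from 60% to 75%, which is why the exclusion rules should be agreed with the provider in advance.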



Conclusion

Customers are no longer relying on a single provider and are utilizing
multiple regions or availability zones for their cloud infrastructure. This offers
protection in case a provider experiences an outage. When Oracle+Dyn
experienced an outage in October 2016, large portions of the internet
stopped working. Being alerted to the issue in a timely manner enabled the
Catchpoint operations team to take action and start to mitigate the impact
to customers. It may not always be possible to have a back-up provider,
but investigate the areas where it makes the most sense. Potential areas to
consider include:

• Content Delivery Networks
• Multiple regions or availability zones for your cloud provider
• Managed DNS services

While having multiple vendors means more SLAs to manage, the benefits of
less downtime far outweigh that overhead, given the consequences of sustained outages.

SLAs provide value and protection to both the consumer and the vendor. The
framework outlined here can help consumers monitor and track the various
service levels, holding vendors accountable and potentially recouping costs.
You’re already monitoring your internal applications and infrastructure; adding a
few more monitors won’t hurt anybody. In fact, it will help everyone involved.


A Different Approach to Digital Experience Monitoring

Catchpoint is a leading digital experience intelligence company
that provides unparalleled insight into your customer-critical
services to help you consistently deliver amazing digital
experiences. Catchpoint is the only performance digital
experience monitoring platform that provides integrated synthetic
and real user monitoring, comprehensive test types, real-time
analytics, and a diverse node network to help you continuously
preempt performance issues and optimize service delivery. More
than 400 customers in over 30 countries trust Catchpoint to
strengthen their brands and grow their businesses.

17 Smart Monitors
Real browser, multi-transaction, mobile, HTML code, API,
streaming, DNS, FTP, TCP, SMTP, ping, traceroute, SSH, NTP, IMAP,
web socket and MQTT.

Deepest and broadest diagnostics
100 days of object level data; 3 years of raw aggregate data.

To request a free trial, visit
catchpoint.com/freetrial
