uTest Whitepaper: New Metrics of Software Testing
White Paper
November 2012

Increase Software Quality
Pass/Fail Tests Aren't Enough; You Must Analyze Metrics Throughout Production for a Successful App
Table of Contents
• Introduction
• About uTest
Introduction
As mobile and desktop devices advance and more people weave technology seamlessly into their everyday lives, how applications function, and how they're produced and tested, need to change. We've entered a time when people expect more from their technology. They expect applications to function correctly, run smoothly and not bog down their devices from the moment they are released. If there is a bug, users expect it to be corrected quickly. This has led to near-constant releases and version updates, and to the need to perform deep metric analysis to ensure apps are running as efficiently as possible. Enter continuous development (which calls for continuous testing), Testing in Production and the rise of modern application metrics.

Adopting a mindset that embraces continuous testing and Testing in Production will help teams perform traditional testing while also providing them with the vital metrics all developers need to pay attention to.
In this whitepaper we’ll cover which metrics are most important to modern
development and how continuous testing and Testing in Production can help you
analyze and act on that data.
Testing in Production
Testing in Production (TiP) is a testing method that advocates releasing your product to
the public while having developers on hand to monitor and immediately fix any bugs.
Bear in mind that this method will not work for applications that have a vetting process
(such as iOS mobile apps). It may seem like a risky option, but if done correctly, TiP can
be extremely useful. But first, you need to be sure that your application is thoroughly
tested before being released into production. That’s where continuous testing comes in.
Testing in Production (TiP) is a natural extension of continuous testing. By the time you
near launch, your product should be largely bug free from the continuous testing. This is
what makes TiP feasible. Though Testing in Production involves releasing your
application to the public, keep in mind that this can be accomplished through a limited
release – either to a subset of users or during a lull in activity. This gives you the same
end user insight while limiting exposure of a potentially faulty product. Since the product
is in active use, development teams have the ability not only to find bugs that eluded
them in the lab, but also see if a bug fix works almost immediately. This last line of
testing defense, which is intended to find fringe use case or in-the-wild bugs that didn’t
appear in traditional testing, helps developers find and fix issues before they cause any
major impact.
TiP Methods
So how do you Test in Production? Do you just release your application in the middle of
the night and watch what happens? Not quite. Here are a few methods (identified by
Seth Eliot, a Senior Test Engineer of Testing Excellence at Microsoft) that will give you
an idea of what you should look for and what you can achieve using TiP.
Data Mining: Data mining allows you to track real usage patterns and tease out
defects. It also lends itself to optimization since you can look at statistics from
before and after changes. Another option is to collect the data in real-time for
team members to analyze later. This can influence future changes.
User Performance Testing: This issue will come up again when we discuss
which metrics are most important. As far as testing goes, use TiP to get an idea
of how your app performs across the hardware/software matrix. Like with data
mining, this gives you access to real life results from a user’s perspective.
Environment Validation: You can run environment validation during the initial
launch or collect data continuously. This category involves traditional pass/fail
tests that look for version compatibility, connection health, installation success
and other important factors across the user matrix.
Experimentation for Design: This is your typical A/B test. Divide your users into two (or more) groups and give each a different design to see which one users respond to best (a minimal bucketing sketch follows this list).
Load Testing in Production: Release your application into TiP and add
synthetic load on top of the real users. You’ll be able to see exactly what
happens if something goes wrong when your app is hit with heavy traffic – before
you disappoint a tidal wave of visitors. Load testing this way can help you identify
issues that may not appear in traditional automated load testing – such as
images not loading properly.
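As a concrete illustration of the Experimentation for Design method, the short sketch below assigns users to A/B groups deterministically by hashing the user ID together with an experiment name. The variant names and experiment label are hypothetical placeholders, not anything prescribed in this whitepaper:

```python
import hashlib
from collections import Counter

# Hypothetical variant names and experiment label; adjust to your own design test.
VARIANTS = ["design_a", "design_b"]


def assign_variant(user_id: str, experiment: str = "homepage_redesign") -> str:
    """Deterministically assign a user to an A/B group.

    Hashing the user ID together with the experiment name keeps each user's
    assignment stable across sessions while giving every experiment its own
    independent split.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]


if __name__ == "__main__":
    # Bucket 10,000 synthetic users and confirm the split is roughly even.
    split = Counter(assign_variant(f"user-{i}") for i in range(10_000))
    print(split)
```

Because the assignment is a pure function of the user ID, the same user always sees the same design, which keeps the experiment's results clean without storing any extra state.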
Remember, ideally these tests shouldn’t have adverse effects on users. Most major
issues should have been caught already. Many of these tests can be performed in ways
that don’t involve releasing your product to your real-life audience. With that being said,
it’s important to remember that your real-life audience will be the ones who actually use
your application in the end. So do both. Use traditional testing methods initially, then use
TiP as a sanity check. You will get the most accurate and useful information by testing
with end users – actionable metrics you can’t get from a controlled environment.
The Metrics
Much of the testing you've likely done to date consists of a simple pass or fail assessment. While this type of testing is important, pass/fail tests alone aren't enough to make your app succeed. Performance-based metrics, especially when gleaned from real-life devices, are particularly important these days. Users will quickly abandon your application if it is slow to load, takes up too much memory or data or doesn't interact properly with other aspects of their device. An end user doesn't care about how many test cases passed or failed; those are useless metrics in the long run. Instead, metrics like CPU usage, API performance and system response time should all be considered while testing. These are the things your users are concerned about.

"If the team really needs to plot the history of pass/fail, the team is living in the past."
- Jason Arbon, Former Google Engineer
As technology continues to become more pervasive, everyday users will get savvier and more comfortable with the "tech" part of technology. There are already a number of consumer-facing applications that help users measure their connection speeds, data usage and a variety of other information that used to be strictly in the realm of developers and testers. With the glut of big data flooding in, it is helpful to focus specifically on the metrics users themselves can access. These will be the ones they are paying attention to and the ones that will be influencing their usage habits. In many cases, ignoring these metrics can cost you users – and ultimately revenue.
End Users
What’s the point of putting out an application if no one uses it? Software testing should
be extremely end-user focused. After all, they're the ones who will ultimately be making the decision to keep using your application or to abandon it.
CPU
Using too much processing power is one of the biggest reasons people will abandon an application. You absolutely must measure this. Slow response time signals to users that something is bogging down their system, and there is a selection of apps across the different markets that let users see how much power each application is using. If your application is flagged as the biggest offender, users are likely to look for a better-tuned replacement, particularly if your app isn't absolutely necessary.
Another important factor to remember – and another reason to monitor CPU usage – is that not all devices have the same processing capabilities. By not monitoring CPU usage across a range of devices you may miss some major device-specific issues. It is particularly important to monitor and test CPU usage as you release new versions or as popular new handsets hit the market. Do not assume that because your application was working fine at one point, it will always be fine.
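As a rough illustration, the sketch below samples one process's CPU usage at a fixed interval using the third-party psutil library (an assumption on our part; the whitepaper doesn't prescribe a tool). The same loop could feed readings into whatever monitoring dashboard you already use:

```python
import os
import time

import psutil  # third-party: pip install psutil (assumed tooling)


def sample_cpu(pid: int, duration_s: float = 10.0, interval_s: float = 1.0) -> list:
    """Sample one process's CPU usage (percent of a single core) at a fixed interval."""
    proc = psutil.Process(pid)
    proc.cpu_percent(None)  # prime the counter; the first reading is always 0.0
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        time.sleep(interval_s)
        samples.append(proc.cpu_percent(None))
    return samples


if __name__ == "__main__":
    # Sample this process for five seconds; point pid at your app's process in practice.
    readings = sample_cpu(os.getpid(), duration_s=5.0)
    print("peak: %.1f%%  mean: %.1f%%" % (max(readings), sum(readings) / len(readings)))
```

Running the same sampler on several handsets or OS versions makes device-specific spikes stand out immediately.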
API
API requests can also have an effect on the response time of an application. If not
properly integrated, APIs can slow down an application or fail to return the desired result. Whether you create a custom API or use an open-source one, be
sure to test the functionality and security of the API itself in addition to testing how it
works on devices.
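For example, a single check on an API endpoint can verify the status code, the payload shape and the latency in one pass. The endpoint URL, thresholds and the requests library below are assumptions for illustration, not anything this whitepaper specifies:

```python
import time

import requests  # third-party: pip install requests (assumed tooling)

# Hypothetical endpoint and thresholds; substitute your own API route and limits.
ENDPOINT = "https://api.example.com/v1/status"
TIMEOUT_S = 5.0
MAX_LATENCY_S = 1.0


def check_endpoint() -> dict:
    """Run one functional + performance check: status code, payload shape, latency."""
    start = time.perf_counter()
    resp = requests.get(ENDPOINT, timeout=TIMEOUT_S)
    latency_s = time.perf_counter() - start

    result = {
        "status_ok": resp.status_code == 200,
        "latency_ok": latency_s < MAX_LATENCY_S,
        "latency_s": round(latency_s, 3),
    }
    # Basic payload validation: expect a JSON body with a 'status' field.
    try:
        result["schema_ok"] = "status" in resp.json()
    except ValueError:
        result["schema_ok"] = False
    return result


if __name__ == "__main__":
    print(check_endpoint())
```

Run on a schedule in production, a check like this catches both outright failures and gradual slowdowns in the services your application depends on.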
APIs are becoming increasingly common in the mobile and web worlds, but unlike CPU usage, APIs are less understood by the average user. This should be an even
bigger incentive to carefully vet and monitor any API requests integrated into your
application. Users will recognize that something is wrong, but they won’t be able to
pinpoint the problem, which will lead to general frustration and anger.
Once you move out of initial testing and into a production environment, continue to
monitor your APIs to ensure outside factors don’t adversely affect the application’s
quality or the effectiveness of the API’s service.
System Response
Testing CPU usage and APIs isn’t enough to ensure good system response time – there
is still a lot more to look at specifically related to system performance.
Measuring system response time has its own set of sub-metrics. Nexcess, a web hosting
company, highlights these specific measurements:
Payload: The total size in bytes sent to the test application, including all resource
files.
Bandwidth: The minimum bandwidth, in Bps, across all network links from client to server.
AppTurns: The number of components (images, scripts, CSS, etc.) needed for
the page to render.
Round-Trip Time: The amount of time in milliseconds it takes to communicate
from client to server.
Concurrency: The number of simultaneous requests an application will make for
resources.
Server Compute Time: The time it takes for the server to parse the request, run
application code, fetch data and compose a response.
Client Compute Time: The time it takes for the application to render client-
facing features (e.g. HTML, scripts, stylesheets, etc. for web apps).
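A few of these sub-metrics can be approximated from the client side with a simple timed request. In the sketch below the URL is a hypothetical placeholder and the requests library is assumed; it captures round-trip time and payload size, while separating server compute time from network time would require server-side instrumentation:

```python
import time

import requests  # third-party: pip install requests (assumed tooling)

# Hypothetical page to measure; swap in your own application URL.
URL = "https://app.example.com/"


def measure(url: str) -> dict:
    """Capture a rough, client-side view of round-trip time and payload size."""
    start = time.perf_counter()
    resp = requests.get(url, timeout=10)
    round_trip_s = time.perf_counter() - start  # DNS + connect + server + transfer combined

    return {
        "round_trip_ms": round(round_trip_s * 1000, 1),
        "payload_bytes": len(resp.content),  # size of the response body actually delivered
        "status_code": resp.status_code,
    }


if __name__ == "__main__":
    print(measure(URL))
```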
Tips
Now you know what to monitor once your application is released to the public, but what if
you turn up a problem? Here are a few recommendations that will help you process the
data you’re collecting and troubleshoot issues if they arise.
API Issues
The best way to avoid API issues is to reduce API request complexity. Combine as many queries as possible into a single request instead of sending many individual ones. Similarly, if you are working with an application that has caching capabilities, figure out which items can be cached to cut down on the amount of data that needs to be retrieved every time your app is used (a minimal caching sketch follows below).
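One way to act on the caching advice is a small time-to-live cache in front of slow or rarely changing API calls. The fetch function and data below are hypothetical placeholders; this is a minimal sketch rather than a production cache:

```python
import time


# Hypothetical stand-in for a slow or rate-limited API call.
def fetch_exchange_rates() -> dict:
    return {"USD_EUR": 0.78}  # imagine a network request here


_cache = {}   # key -> (timestamp, value)
TTL_S = 300   # slowly changing data can safely be reused for minutes


def cached(key, loader, ttl_s=TTL_S):
    """Return a cached value if it is still fresh; otherwise reload and store it."""
    now = time.monotonic()
    hit = _cache.get(key)
    if hit is not None and now - hit[0] < ttl_s:
        return hit[1]
    value = loader()
    _cache[key] = (now, value)
    return value


if __name__ == "__main__":
    rates = cached("exchange_rates", fetch_exchange_rates)
    rates_again = cached("exchange_rates", fetch_exchange_rates)  # served from cache
    print(rates is rates_again)  # True: the second call never hit the "API"
```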
It’s also extremely important to remember that APIs can be affected by platform version,
particularly within the Android ecosystem. Each version of the Android operating system
only supports specific API classes. Identify the most popular devices within your target
demographic and see what platform versions those devices support – it is often not the
most recent platform release. Tailor your API integration to work with the dominant platform version (and remember to update as new versions roll out).
Work with percentiles, not averages. Taking a broad measurement and finding the average response speed will not give you an accurate portrayal. This practice disregards the top and bottom speeds (important information) and doesn't give you any idea of how many users are experiencing what speeds. Measuring response speed and dividing the data into percentile designations will give you a clearer picture of the dominant response speed (see the sketch after this list). If the highest percentiles show slow response times, you have an issue.
Not all performance data is the same. Don’t lump initial load time with
response time for logging in – they are two separate actions and should be
analyzed as such. One action may be slower than others (especially if an action
relies on API calls) but if you look at all response time data together you will not
know which action needs addressing.
Cache what you can. Like with API requests, caching whatever data you can
will reduce response time.
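To make the percentile and per-action advice concrete, the sketch below groups hypothetical response-time samples by action and reports the median and 95th percentile for each, using only the Python standard library:

```python
from statistics import quantiles

# Hypothetical response-time samples in milliseconds, grouped per user action
# rather than lumped together.
samples_ms = {
    "initial_load": [420, 380, 950, 400, 2100, 510, 470, 390, 640, 455],
    "login":        [120, 140, 135, 980, 150, 125, 160, 130, 145, 155],
}

for action, data in samples_ms.items():
    pct = quantiles(data, n=100)  # 1st..99th percentile cut points
    p50, p95 = pct[49], pct[94]
    print(f"{action:13s} median={p50:7.1f} ms   95th percentile={p95:7.1f} ms")
```

Keeping the actions separate makes it obvious which one is dragging, and the 95th percentile shows how bad the experience is for the unlucky slice of users that an average would hide.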
Conclusion
Understanding that these metrics are important to – and often accessible by – your users is a
vital part of modern application development and testing. You cannot push aside data
like CPU usage, API response and system response time and deal with it another day.
Users will notice an issue and know that it isn’t a price they have to pay for technology.
They can and will go elsewhere.
But don’t be overwhelmed by the flood of big data. Take advantage of testing methods
such as continuous testing and Testing in Production to help you not only find and fix
bugs, but also see your application from an end user's perspective. These practices will only
become more important as people continue to grow more and more involved with their
everyday technology.
About uTest
uTest provides in-the-wild testing services that span the entire software development lifecycle –
including functional, security, load, localization and usability testing. The company’s community
of 80,000+ professional testers from 190 countries put web, mobile and desktop applications
through their paces by testing on real devices under real-world conditions.
More info is available at www.utest.com or blog.utest.com, or you can watch a brief online
demo at www.utest.com/demo.
uTest, Inc.
153 Cordaville Road
Southborough, MA 01772
p: 1.800.445.3914
e: [email protected]
w: www.utest.com