How To Write A Performance Test Case

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 10

http://www.myloadtest.

com/how-to-write-a-performance-test-case/

How to write a performance test case


I have decided to release an early draft of this document so that others may provide feedback. Please let me know what you think. Writing test cases for performance testing requires a different mindset to writing functional test cases. Fortunately it is not a difficult mental leap. This article should give you enough information to get you up and running. First, lets set out some background and define some terms that are used in performance testing.
Test case a test case is the same as a use case or business process. Just as with a functional test case, it outlines test steps that should be performed, and the expected result for each step. Test script a test script is a program created by a Performance tester that will perform all the steps in the test case. Virtual user a virtual user generally runs a single test script. Virtual users do not run test scripts using the Graphical User Interface (like a functional test case that has been automated with tools like WinRunner, QuickTest, QARun or Rational Robot); they simulate a real user by sending the same network traffic as a real user would. A single workstation can run multiple virtual users. [Would a diagram be useful here?] Scenario a performance test scenario is a description of how a set of test scripts will be run. It outlines how many times an hour they will be run, how many users will run each test script, and when each test script will be run. The aim of a scenario is to simulate real world usage of a system.

Writing a test case for performance testing is basically writing a simple Requirements Specification for a piece of software (the test script). Just as with any specification, it should be unambiguous and as complete as possible. Every test case will contain the steps to be performed with the application and the expected result for each step. As a performance tester will generally not know the business processes that they will be automating, a test case should provide more detail than may be included in a functional test case intended for a tester familiar with the application.

It is important that the test case describes a single path through the application. Adding conditional branches to handle varying application responses, such as error messages, will greatly increase script development time and the time taken to verify that the test script functions as expected. If a test script encounters an error that it does not expect, it will usually just stop. If the Project Manager decides that test scripts should handle errors the same way a real user would, then information should be included on how to reproduce each error condition, and additional scripting time should be included in the project plan. The main reason a user may be presented with a different flow through the application is the input data that is used. Each test case will be executed with a large amount of input data. Defining data requirements is a critical part of planning for a performance test, and is the most common area to get wrong on a first attempt. It is very easy to forget that certain inputs will present the user with different options. The other important data issues to identify are any data dependencies and any potential problems with concurrency. Is it important that data is used in some business functions before they are used in others? And, will data modified by virtual users cause other virtual users to fail when they try to use the same data? The test tool can partition the data used by each virtual user if these requirements can be identified. It can be difficult for a performance tester to debug test script failures with little knowledge of the application, especially if the failures only occur when multiple virtual users are running at once. One of the most important pieces of information a performance test is designed to discover is the response time of the system under test both at the overall business function level and at the low level of individual steps in the test case, such as the time it takes for a search to return a result set. Any test cases provided to a performance tester should clearly define the start and end points for any transaction timings that should be included in the test results. It is important to remember that the test script is only creating the network traffic that would normally be generated by the application under test. This means that any operations that happen only on the client do not get simulated and therefore do not get included in any transaction timing points. A good example would be a client application that runs on a users PC, and communicates with a server. Starting the client application takes 10 seconds and logging in takes 5 seconds but, since only the login is sending network traffic to the server, the transaction timing point will only measure 5 seconds. Operations that only happen on the client, including the time users take to enter data or spend looking at the screen is simulated with user think time an

intentional delay that is inserted into the test script. If no think time is included, virtual users will execute the steps of the test case as fast as they can, resulting in greater (and unrealistic) load on the system under test. Depending on the sophistication of the performance test tool, the user think time may be automatically excluded from the transaction timing points. Think times are generally inserted outside of any transaction timing points anyhow. While a functional test case will be run once from start to finish, a performance test case will be run many times (iterated) by the same virtual user in a single scenario. Information on how the test steps will be iterated should be included in the test case. For example, if a test case involves a user logging in and performing a search, and the entire test case is iterated by the virtual user; then a test scenario may be generating too many logins if the real users generally stay logged into the application. [Is there a way to break up this sentence?] A more realistic test case may have the virtual user log in once and then keep doing the same action for as long as the virtual user is run. When a script is iterated, consideration should be given to the non-obvious details of how it is iterated. A good example would be a test script simulating users using an Internet search engine. When the test script is iterated, simulating a new search operation, should the virtual user establish a new network connection and empty their cache or should every iteration simulate the same user conducting another search? As all performance test tools have different default behaviours, a good performance tester should clarify this type of detail with business and technical experts. Some performance test tools make these details easier to change than others. If it is not practical to emulate all the attributes of the expected system traffic with a particular tool, then it should be noted as a limitation of the performance test effort, and a technical expert should assess the likely impact. Hopefully this article has provided some insight into the extra considerations that must be given when writing a performance test case, rather than a functional test case. As with any software specification, a performance test case may need to be refined as questions are raised by the performance tester.

An example test case:

Performance vs. load vs. stress testing


Here's a good interview question for a tester: how do you define performance/load/stress testing? Many times people use these terms interchangeably, but they have in fact quite different meanings. This post is a quick review of these concepts, based on my own experience, but also using definitions from testing literature -- in particular: "Testing computer software" by Kaner et al, "Software testing techniques" by Loveland et al, and "Testing applications on the Web" by Nguyen et al. Update July 7th, 2005

From the referrer logs I see that this post comes up fairly often in Google searches. I'm updating it with a link to a later post I wrote called 'More on performance vs. load testing'. Performance testing The goal of performance testing is not to find bugs, but to eliminate bottlenecks and establish a baseline for future regression testing. To conduct performance testing is to engage in a carefully controlled process of measurement and analysis. Ideally, the software under test is already stable enough so that this process can proceed smoothly. A clearly defined set of expectations is essential for meaningful performance testing. If you don't know where you want to go in terms of the performance of the system, then it matters little which direction you take (remember Alice and the Cheshire Cat?). For example, for a Web application, you need to know at least two things:

expected load in terms of concurrent users or HTTP connections acceptable response time Once you know where you want to be, you can start on your way there by constantly increasing the load on the system while looking for bottlenecks. To take again the example of a Web application, these bottlenecks can exist at multiple levels, and to pinpoint them you can use a variety of tools: at the application level, developers can use profilers to spot inefficiencies in their code (for example poor search algorithms) at the database level, developers and DBAs can use database-specific profilers and query optimizers at the operating system level, system engineers can use utilities such as top, vmstat, iostat (on Unix-type systems) and PerfMon (on Windows) to monitor hardware resources such as CPU, memory, swap, disk I/O; specialized kernel monitoring software can also be used at the network level, network engineers can use packet sniffers such as tcpdump, network protocol analyzers such as ethereal, and various utilities such as netstat, MRTG, ntop, mii-tool From a testing point of view, the activities described above all take a white-box approach, where the system is inspected and monitored "from the inside out" and from a variety of angles. Measurements are taken and analyzed, and as a result, tuning is done. However, testers also take a black-box approach in running the load tests against the system under test. For a Web application, testers will use tools that simulate concurrent users/HTTP connections and measure response times. Some lightweight open source tools I've used in the past for this purpose are ab, siege, httperf. A more heavyweight tool I haven't used yet isOpenSTA. I also haven't used The Grinder yet, but it is high on my TODO list. When the results of the load test indicate that performance of the system does not meet its expected goals, it is time for tuning, starting with the application and the database. You want to make sure your code runs as efficiently as possible and your database is optimized on a given OS/hardware configurations. TDD practitioners will find very useful in this context a framework such as Mike Clark's jUnitPerf, which enhances existing unit test code with load test and timed test functionality. Once a particular function or method has been profiled and

tuned, developers can then wrap its unit tests in jUnitPerf and ensure that it meets performance requirements of load and timing. Mike Clark calls this "continuous performance testing". I should also mention that I've done an initial port of jUnitPerf to Python -- I called it pyUnitPerf. If, after tuning the application and the database, the system still doesn't meet its expected goals in terms of performance, a wide array of tuning procedures is available at the all the levels discussed before. Here are some examples of things you can do to enhance the performance of a Web application outside of the application code per se: Use Web cache mechanisms, such as the one provided by Squid Publish highly-requested Web pages statically, so that they don't hit the database Scale the Web server farm horizontally via load balancing Scale the database servers horizontally and split them into read/write servers and read-only servers, then load balance the read-only servers Scale the Web and database servers vertically, by adding more hardware resources (CPU, RAM, disks) Increase the available network bandwidth Performance tuning can sometimes be more art than science, due to the sheer complexity of the systems involved in a modern Web application. Care must be taken to modify one variable at a time and redo the measurements, otherwise multiple changes can have subtle interactions that are hard to qualify and repeat. In a standard test environment such as a test lab, it will not always be possible to replicate the production server configuration. In such cases, a staging environment is used which is a subset of the production environment. The expected performance of the system needs to be scaled down accordingly. The cycle "run load test->measure performance->tune system" is repeated until the system under test achieves the expected levels of performance. At this point, testers have a baseline for how the system behaves under normal conditions. This baseline can then be used in regression tests to gauge how well a new version of the software performs. Another common goal of performance testing is to establish benchmark numbers for the system under test. There are many industry-standard benchmarks such as the ones published by TPC, and many hardware/software vendors will fine-tune their systems in such ways as to obtain a high ranking in the TCP top-tens. It is common knowledge that one needs to be wary of any performance claims that do not include a detailed specification of all the hardware and software configurations that were used in that particular test. Load testing We have already seen load testing as part of the process of performance testing and tuning. In that context, it meant constantly increasing the load on the system via automated tools. For a Web application, the load is defined in terms of concurrent users or HTTP connections. In the testing literature, the term "load testing" is usually defined as the process of exercising the system under test by feeding it the largest tasks it can operate with. Load testing is

sometimes called volume testing, or longevity/endurance testing. Examples of volume testing: testing a word processor by editing a very large document testing a printer by sending it a very large job testing a mail server with thousands of users mailboxes a specific case of volume testing is zero-volume testing, where the system is fed empty tasks Examples of longevity/endurance testing: testing a client-server application by running the client in a loop against the server over an extended period of time Goals of load testing: expose bugs that do not surface in cursory testing, such as memory management bugs, memory leaks, buffer overflows, etc. ensure that the application meets the performance baseline established during performance testing. This is done by running regression tests against the application at a specified maximum load. Although performance testing and load testing can seem similar, their goals are different. On one hand, performance testing uses load testing techniques and tools for measurement and benchmarking purposes and uses various load levels. On the other hand, load testing operates at a predefined load level, usually the highest load that the system can accept while still functioning properly. Note that load testing does not aim to break the system by overwhelming it, but instead tries to keep the system constantly humming like a well-oiled machine. In the context of load testing, I want to emphasize the extreme importance of having large datasets available for testing. In my experience, many important bugs simply do not surface unless you deal with very large entities such thousands of users in repositories such as LDAP/NIS/Active Directory, thousands of mail server mailboxes, multi-gigabyte tables in databases, deep file/directory hierarchies on file systems, etc. Testers obviously need automated tools to generate these large data sets, but fortunately any good scripting language worth its salt will do the job. Stress testing Stress testing tries to break the system under test by overwhelming its resources or by taking resources away from it (in which case it is sometimes called negative testing). The main purpose behind this madness is to make sure that the system fails and recovers gracefully -- this quality is known as recoverability. Where performance testing demands a controlled environment and repeatable measurements, stress testing joyfully induces chaos and unpredictability. To take again the example of a Web application, here are some ways in which stress can be applied to the system: double the baseline number for concurrent users/HTTP connections randomly shut down and restart ports on the network switches/routers that connect the servers (via SNMP commands for example) take the database offline, then restart it

rebuild a RAID array while the system is running run processes that consume resources (CPU, memory, disk, network) on the Web and database servers I'm sure devious testers can enhance this list with their favorite ways of breaking systems. However, stress testing does not break the system purely for the pleasure of breaking it, but instead it allows testers to observe how the system reacts to failure. Does it save its state or does it crash suddenly? Does it just hang and freeze or does it fail gracefully? On restart, is it able to recover from the last good state? Does it print out meaningful error messages to the user, or does it merely display incomprehensible hex codes? Is the security of the system compromised because of unexpected failures? And the list goes on. Conclusion I am aware that I only scratched the surface in terms of issues, tools and techniques that deserve to be mentioned in the context of performance, load and stress testing. I personally find the topic of performance testing and tuning particularly rich and interesting, and I intend to post more articles on this subject in the future.

http://agiletesting.blogspot.com/2005/02/performance-vs-load-vs-stress-testing.html

More on performance vs. load testing


I recently got some comments/questions related to my previous blog entry on performance vs. load vs. stress testing. Many people are still confused as to exactly what the difference is between performance and load testing. I've been thinking more about it and I'd like to propose the following question as a litmus test to distinguish between these two types of testing: are you actively profiling your application code and/or monitoring the server(s) running your application? If the answer is yes, then you're engaged in performance testing. If the answer is no, then what you're doing is load testing. Another way to look at it is to see whether you're doing more of a white-box type testing as opposed to black-box testing. In the white-box approach, testers, developers, system administrators and DBAs work together in order to instrument the application code and the database queries (via specialized profilers for example), and the hardware/operating system of the server(s) running the application and the database (via monitoring tools such as vmstat, iostat, top or Windows PerfMon). All these activities belong to performance testing. The black box approach is to run client load tools against the application in order to measure its responsiveness. Such tools range from lightweight, command-line driven tools such as httperf, openload, siege, Apache Flood, to more heavy duty tools such as OpenSTA, The Grinder, JMeter. This type of testing doesn't look at the internal behavior of the application, nor does it monitor the hardware/OS resources on the server(s) hosting the application. If this sounds like the type of testing you're doing, then I call it load testing. In practice though the 2 terms are often used interchangeably, and I am as guilty as anyone else of doing this, since I called one of my recent blog entries "HTTP performance testing with httperf, autobench and openload" instead of calling it more precisely "HTTP

load testing". I didn't have access to the application code or the servers hosting the applications I tested, so I wasn't really doing performance testing, only load testing. I think part of the confusion is that no matter how you look at these two types of testing, they have one common element: the load testing part. Even when you're profiling the application and monitoring the servers (hence doing performance testing), you still need to run load tools against the application, so from that perspective you're doing load testing. As far as I'm concerned, these definitions don't have much value in and of themselves. What matters most is to have a well-established procedure for tuning the application and the servers so that you can meet your users' or your business customers' requirements. This procedure will use elements of all the types of testing mentioned here and in my previous entry: load, performance and stress testing. Here's one example of such a procedure. Let's say you're developing a Web application with a database back-end that needs to support 100 concurrent users, with a response time of less than 3 seconds. How would you go about testing your application in order to make sure these requirements are met? 1. Start with 1 Web/Application server connected to 1 Database server. If you can, put both servers behind a firewall, and if you're thinking about doing load balancing down the road, put the Web server behind the load balancer. This way you'll have one each of different devices that you'll use in a real production environment. 2. Run a load test against the Web server, starting with 10 concurrent users, each user sending a total of 1000 requests to the server. Step up the number of users in increments of 10, until you reach 100 users. 3. While you're blasting the Web server, profile your application and your database to see if there are any hot spots in your code/SQL queries/stored procedures that you need to optimize. I realize I'm glossing over important details here, but this step is obviously highly dependent on your particular application. Also monitor both servers (Web/App and Database) via command line utilities mentioned before (top, vmstat, iostat, netstat, Windows PerfMon). These utilities will let you know what's going on with the servers in terms of hardware resources. Also monitor the firewall and the load balancer (many times you can do this via SNMP) -- but these devices are not likely to be a bottleneck at this level, since they usualy can deal with thousands of connections before they hit a limit, assuming they're hardware-based and not softwarebased. This is one of the most important steps in the whole procedure. It's not easy to make sense of the output of these monitoring tools, you need somebody who has a lot of experience in system/network architecture and administration. On Sun/Solaris platforms, there is a tool called the SE Performance Toolkit that tries to alleviate this task via built-in heuristics that kick in when certain thresholds are reached and tell you exactly what resource is being taxed.

4. Let's say your Web server's reply rate starts to level off around 50 users. Now you have a repeatable condition that you know causes problems. All the profiling and monitoring you've done in step 3, should have already given you a good idea about hot spots in your applicationm about SQL queries that are not optimized properly, about resource status at the hardware/OS level. At this point, the developers need to take back the profiling measurements and tune the code and the database queries. The system administrators can also increase server performance simply by throwing more hardware at the servers -- especially more RAM at the Web/App server in my experience, the more so if it's Java-based. 5. Let's say the application/database code, as well as the hardware/OS environment have been tuned to the best of everybody's abilities. You re-run the load test from step 2 and now you're at 75 concurrent users before performance starts to degrade. At this point, there's not much you can do with the existing setup. It's time to think about scaling the system horizontally, by adding other Web servers in a load-balanced Web server farm, or adding other database servers. Or maybe do content caching, for example with Apache mod_cache. Or maybe adding an external caching server such as Squid. One very important product of this whole procedure is that you now have a baseline number for your application for this given "staging" hardware environment. You can use the staging setup for nightly peformance testing runs that will tell you whether changes in your application/database code caused an increase or a decrease in performance. 6. Repeat above steps in a "real" production environment before you actually launch your application. All this discussion assumed you want to get performance/benchmarking numbers for your application. If you want to actually discover bugs and to see if your application fails and recovers gracefully, you need to do stress testing. Blast your Web server with double the number of users for example. Unplug network cables randomly (or shut down/restart switch ports via SNMP). Take out a disk from a RAID array. That kind of thing. The conclusion? At the end of the day, it doesn't really matter what you call your testing, as long as you help your team deliver what it promised in terms of application functionality and performance. Performance testing in particular is more art than science, and many times the only way to make progress in optimizing and tuning the application and its environment is by trial-and-error and perseverance. Having lots of excellent open source tools also helps a lot. http://agiletesting.blogspot.com/2005/04/more-on-performance-vs-load-testing.html

You might also like