Performance testing is a non-functional testing technique that assesses system responsiveness and stability under various workloads, measuring attributes like scalability and reliability. It is crucial for ensuring acceptable quality levels for end users across different application domains and involves various types of tests such as load, stress, and scalability testing. The process includes defining stakeholder expectations, conducting tests on representative systems, and analyzing results to identify risks and performance issues.
Performance testing is a non-functional testing technique performed to determine system parameters in terms of responsiveness and stability under various workloads. Performance testing measures the quality attributes of the system, such as scalability, reliability and resource usage.
• Performance testing plays a critical role in establishing acceptable quality levels for the end user and is often closely integrated with other disciplines such as usability engineering and performance engineering. Performance testing is not limited to the web-based domain where the end user is the focus. It is also relevant to different application domains with a variety of system architectures, such as classic client-server, distributed and embedded.
Well-known examples of performance failures include:
• The popular toy store Toysrus.com could not handle the increased traffic generated by its advertising campaign, resulting in the loss of both marketing dollars and potential toy sales.
• An airline website was not able to handle 10,000+ users during a festival offer.
• Encyclopedia Britannica declared free access to its online database as a promotional offer and was unable to keep up with the onslaught of traffic for weeks.
Typical performance testing objectives include the following; test results analysis may also identify other areas of risk that need to be addressed.
• Time Behavior: Generally the evaluation of time behavior is the most common performance testing objective. This aspect of performance testing examines the ability of a component or system to respond to user or system inputs within a specified time and under specified conditions. Measurements of time behavior may vary from the "end-to-end" time taken by the system to respond to user input, to the number of CPU cycles required by a software component to execute a particular task.
• Resource Utilization: If the availability of system resources is identified as a risk, the utilization of those resources (e.g., the allocation of limited RAM) may be investigated by conducting specific performance tests.
• Capacity: If system behavior at the required capacity limits of the system (e.g., numbers of users or volumes of data) is identified as a risk, performance tests may be conducted to evaluate the suitability of the system architecture.
• Performance testing often takes the form of experimentation, which enables measurement and analysis of specific system parameters to take place. These may be conducted iteratively in support of system analysis, design and implementation to enable architectural decisions to be made and to help shape stakeholder expectations.
Principles of Performance Testing
• Tests must be aligned to the defined expectations of different stakeholder groups, in particular users, system designers and operations staff.
• The tests must be reproducible. Statistically identical results (within a specified tolerance) must be obtained by repeating the tests on an unchanged system.
• The tests must yield results that are both understandable and readily comparable to stakeholder expectations.
• The tests can be conducted, where resources allow, either on complete or partial systems or on test environments that are representative of the production system.
• The tests must be practically affordable and executable within the timeframe set by the project.
Performance Testing Process
• Attributes of Performance Testing:
• Speed
• Scalability
• Stability
• Reliability
Types of Performance Testing
• Performance Testing: Performance testing is an umbrella term including any kind of testing focused on the performance (responsiveness) of the system or component under different volumes of load.
• Load Testing: Load testing focuses on the ability of a system to handle increasing levels of anticipated realistic loads resulting from transaction requests generated by controlled numbers of concurrent users or processes.
• Stress Testing: Stress testing focuses on the ability of a system or component to handle peak loads that are at or beyond the limits of its anticipated or specified workloads. Stress testing is also used to evaluate a system's ability to handle reduced availability of resources such as accessible computing capacity, available bandwidth, and memory.
• Scalability Testing: Scalability testing focuses on the ability of a system to meet future efficiency requirements, which may be beyond those currently required. The objective of these tests is to determine the system's ability to grow (e.g., with more users, larger amounts of data stored) without violating the currently specified performance requirements or failing. Once the limits of scalability are known, threshold values can be set and monitored in production to provide a warning of problems which may be about to arise. In addition, the production environment may be adjusted with appropriate amounts of hardware.
• Spike Testing: Spike testing focuses on the ability of a system to respond correctly to sudden bursts of peak loads and return afterwards to a steady state.
• Endurance Testing: Endurance testing focuses on the stability of the system over a time frame specific to the system's operational context. This type of testing verifies that there are no resource capacity problems (e.g., memory leaks, database connections, thread pools) that may eventually degrade performance and/or cause failures at breaking points.
• Concurrency Testing: Concurrency testing focuses on the impact of situations where specific actions occur simultaneously (e.g., when large numbers of users log in at the same time). Concurrency issues are notoriously difficult to find and reproduce, particularly when the problem occurs in an environment where testing has little or no control, such as production.
• Capacity Testing: Capacity testing determines how many users and/or transactions a given system will support and still meet the stated performance objectives. These objectives may also be stated with regard to the data volumes resulting from the transactions.
Testing Types in Performance Testing
• Static Testing: Static testing activities are often more important for performance testing than for functional suitability testing. This is because so many critical performance defects are introduced in the architecture and design of the system. These defects can be introduced by misunderstandings or a lack of knowledge by the designers and architects. They can also be introduced because the requirements did not adequately capture the response time, throughput, or resource utilization targets, the expected load and usage of the system, or the constraints. Static testing activities for performance can include:
• Reviews of requirements with a focus on performance aspects and risks
• Reviews of database schemas, entity-relationship diagrams, metadata, stored procedures and queries
• Reviews of the system and network architecture
• Reviews of critical segments of the system code (e.g., complex algorithms)
• Dynamic Testing: As the system is built, dynamic performance testing should start as soon as possible.
Opportunities for dynamic performance testing include:
• During unit testing, including using profiling information to determine potential bottlenecks and dynamic analysis to evaluate resource utilization
• During component integration testing, across key use cases and workflows, especially when integrating different use case features or integrating with the "backbone" structure of a workflow
• During system testing of overall end-to-end behaviors under various load conditions
• During system integration testing, especially for data flows and workflows across key inter-system interfaces. In system integration testing it is not uncommon for the "user" to be another system or machine (e.g., inputs from sensors and other systems)
• During acceptance testing, to build user, customer, and operator confidence in the proper performance of the system and to fine-tune the system under real-world conditions (but generally not to find performance defects in the system)
In Agile and other iterative-incremental lifecycles, teams should incorporate static and dynamic performance testing into early iterations rather than waiting for final iterations to address performance risks. If custom or new hardware is part of the system, early dynamic performance tests can be performed using simulators. However, it is good practice to start testing on the actual hardware as soon as possible, as simulators often do not adequately capture resource constraints and performance-related behaviors.
The Concept of Load Generation
• Loads are comparable to the data inputs used for functional test cases, but differ in the following principal ways:
• A performance test load must represent many user inputs, not just one
• A performance test load may require dedicated hardware and tools for generation
• Generation of a performance test load is dependent on the absence of any functional defects in the system under test which may impact test execution
The efficient and reliable generation of a specified load is a key success factor when conducting performance tests. There are different options for load generation.
Load Generation via the User Interface
• This may be an adequate approach if only a small number of users are to be represented and if the required numbers of software clients are available from which to enter the required inputs. This approach may also be used in conjunction with functional test execution tools, but may rapidly become impractical as the number of users to be simulated increases. The stability of the user interface (UI) also represents a critical dependency. Frequent changes can impact the repeatability of performance tests and may significantly affect the maintenance costs. Testing through the UI may be the most representative approach for end-to-end tests.
Load Generation using Crowds
• This approach depends on the availability of a large number of testers who will represent real users. In crowd testing, the testers are organized such that the desired load can be generated. This may be a suitable method for testing applications that are reachable from anywhere in the world (e.g., web-based), and may involve the users generating a load from a wide range of different device types and configurations. Although this approach may enable very large numbers of users to be utilized, the load generated will not be as reproducible and precise as other options, and it is more complex to organize.
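Whichever of these options is used, scripted load generation generally follows the same pattern: many virtual users run in parallel, each repeatedly issuing requests and recording its own response times. The following is a minimal sketch of that pattern in Python, using only the standard library; the target URL, user count, request count and timeout are illustrative assumptions, not values from any particular tool.

```python
# Minimal load-generation sketch: several virtual users, each issuing a fixed
# number of requests against a hypothetical endpoint and recording timings.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET_URL = "http://localhost:8080/health"   # hypothetical system under test
VIRTUAL_USERS = 25
REQUESTS_PER_USER = 40

def virtual_user(user_id: int) -> list:
    """One simulated user: send requests sequentially and record response times."""
    timings = []
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(TARGET_URL, timeout=10) as response:
                response.read()
            timings.append(time.perf_counter() - start)
        except OSError:
            pass   # in this sketch a failed request simply yields no timing sample
    return timings

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=VIRTUAL_USERS) as pool:
        per_user = list(pool.map(virtual_user, range(VIRTUAL_USERS)))
    samples = [t for user in per_user for t in user]
    if samples:
        print(f"samples: {len(samples)}, "
              f"mean response time: {sum(samples) / len(samples):.3f}s")
```

A dedicated load generation tool would add pacing, think times, ramp-up profiles, error reporting and result export; the sketch only shows the core loop that the options below automate at larger scale.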
Load Generation via the Application Programming Interface (API)
• This approach is similar to using the UI for data entry, but uses the application's API instead of the UI to simulate user interaction with the system under test. The approach is therefore less sensitive to changes (e.g., delays) in the UI and allows the transactions to be processed in the same way as they would be if entered directly by a user via the UI. Dedicated scripts may be created which repeatedly call specific API routines and enable more users to be simulated compared to using UI inputs.
Load Generation using Captured Communication Protocols
• This approach involves capturing user interaction with the system under test at the communications protocol level and then replaying these scripts to simulate potentially very large numbers of users in a repeatable and reliable manner.
Common Performance Efficiency Failure Modes and Their Causes
• Slow response under all load levels: In some cases, response is unacceptable regardless of load. This may be caused by underlying performance issues, including, but not limited to, bad database design or implementation, network latency, and other background loads. Such issues can be identified during functional and usability testing, not just performance testing, so test analysts should keep an eye open for them and report them.
• Slow response under moderate-to-heavy load levels: In some cases, response degrades unacceptably with moderate-to-heavy load, even when such loads are entirely within normal, expected, allowed ranges. Underlying defects include saturation of one or more resources and varying background loads.
• Degraded response over time: In some cases, response degrades gradually or severely over time. Underlying causes include memory leaks, disk fragmentation, increasing network load over time, growth of the file repository, and unexpected database growth.
• Inadequate or graceless error handling under heavy or over-limit load: In some cases, response time is acceptable but error handling degrades at high and beyond-limit load levels. Underlying defects include insufficient resource pools, undersized queues and stacks, and too rapid time-out settings.
Specific examples of the general types of failures listed above include:
• A web-based application that provides information about a company's services does not respond to user requests within seven seconds (a general industry rule of thumb). The performance efficiency of the system cannot be achieved under specific load conditions.
• A system crashes or is unable to respond to user inputs when subjected to a sudden large number of user requests (e.g., ticket sales for a major sporting event). The capacity of the system to handle this number of users is inadequate.
• System response is significantly degraded when users submit requests for large amounts of data (e.g., a large and important report is posted on a web site for download). The capacity of the system to handle the generated data volumes is insufficient.
• Batch processing is unable to complete before online processing is needed. The execution time of the batch processes is insufficient for the time period allowed.
• A real-time system runs out of RAM when parallel processes generate large demands for dynamic memory which cannot be released in time. The RAM is not dimensioned adequately, or requests for RAM are not adequately prioritized.
• A real-time system component A which supplies inputs to real-time system component B is unable to calculate updates at the required rate. The overall system fails to respond in time and may fail. Code modules in component A must be evaluated and modified ("performance profiling") to ensure that the required update rates can be achieved.
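Several of the failure modes above, notably degraded response over time, only become visible when response times are tracked across a long run rather than judged from a single aggregate figure. As a rough illustration (the sample data and the 20% degradation threshold below are invented, not taken from the source), a simple trend check over endurance-test samples might look like this:

```python
# Sketch: flag the "degraded response over time" failure mode by comparing the
# average response time in the first and last portions of an endurance-test run.
# The window size, threshold and sample data are illustrative assumptions only.

def degrades_over_time(samples: list, window: int = 50,
                       max_growth: float = 1.20) -> bool:
    """Return True if the trailing window is noticeably slower than the leading one."""
    if len(samples) < 2 * window:
        raise ValueError("not enough samples for a trend comparison")
    early = sum(samples[:window]) / window
    late = sum(samples[-window:]) / window
    return late > early * max_growth

# Example: response times (seconds) that creep upward, as a leaking resource might cause.
measured = [0.20 + 0.001 * i for i in range(200)]
print(degrades_over_time(measured))   # True: last window ~0.37s vs first ~0.22s
```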
Typical Metrics Collected in Performance Testing
Why Performance Metrics are Needed
• Accurate measurements and the metrics which are derived from those measurements are essential for defining the goals of performance testing and for evaluating the results of performance testing. Performance testing should not be undertaken without first understanding which measurements and metrics are needed. The following project risks apply if this advice is ignored:
• It is unknown if the levels of performance are acceptable to meet operational objectives
• The performance requirements are not defined in measurable terms
• It may not be possible to identify trends that may predict lower levels of performance
• The actual results of a performance test cannot be evaluated by comparing them to a baseline set of performance measures that define acceptable and/or unacceptable performance
• Performance test results are evaluated based on the subjective opinion of one or more people
• The results provided by a performance test tool are not understood
Collecting Performance Measurements and Metrics
• As with any form of measurement, it is possible to obtain and express metrics in precise ways. Therefore, any of the metrics and measurements described in this section can and should be defined to be meaningful in a particular context. This is a matter of performing initial tests and learning which metrics need to be further refined and which need to be added. For example, the metric of response time will likely appear in any set of performance metrics. However, to be meaningful and actionable, the response time metric will need to be further defined in terms of time of day, number of concurrent users, the amount of data being processed, and so forth.
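As an illustration of that refinement, the sketch below turns raw response-time samples into percentile values per concurrency level, so that each figure can be compared against stakeholder expectations for that specific load. The sample data is invented for the example; a real data set would come from a performance test tool export.

```python
# Sketch: refining a raw "response time" metric into percentiles per concurrency
# level. The (users, seconds) pairs below are made-up illustration data.
from collections import defaultdict
from statistics import quantiles

samples = [
    (10, 0.21), (10, 0.25), (10, 0.22), (10, 0.30), (10, 0.24),
    (50, 0.48), (50, 0.55), (50, 0.51), (50, 0.90), (50, 0.60),
]

by_level = defaultdict(list)
for users, seconds in samples:
    by_level[users].append(seconds)

for users, times in sorted(by_level.items()):
    pct = quantiles(times, n=100)   # yields the 1st..99th percentile estimates
    print(f"{users:>3} users: p50={pct[49]:.2f}s  p90={pct[89]:.2f}s  p95={pct[94]:.2f}s")
```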
• The metrics collected in a specific performance test will vary based on the business context (business processes, customer and user behavior, and stakeholder expectations), the operational context (technology and how it is used), and the test objectives. For example, the metrics chosen for the performance testing of an international e-commerce website will differ from those chosen for the performance testing of an embedded system used to control medical device functionality.
Technical Environment
• The categories of measurements and metrics included below are the ones commonly obtained from performance testing. Performance metrics will vary by the type of the technical environment, as shown in the following list:
• Web-based
• Mobile
• Internet-of-Things (IoT)
• Desktop client devices
• Server-side processing
• Mainframe
• Databases
• Networks
• The nature of software running in the environment (e.g., embedded)
Technical metrics include:
• Resource utilization (e.g., CPU, memory, network bandwidth, network latency, available disk space, I/O rate, idle and busy threads)
• Throughput rate of key transactions (i.e., the number of transactions that can be processed in a given period of time)
• Batch processing time (e.g., wait times, throughput times, database response times, completion times)
• Numbers of errors impacting performance
• Completion time (e.g., for creating, reading, updating, and deleting data)
• Background load on shared resources (especially in virtualized environments)
• Software metrics (e.g., code complexity)
Business Environment
• From the business or functional perspective, performance metrics may include the following:
• Business process efficiency (e.g., the speed of performing an overall business process including normal, alternate and exceptional use case flows)
• Throughput of data, transactions, and other units of work performed (e.g., orders processed per hour, data rows added per minute)
• Service Level Agreement (SLA) compliance or violation rates (e.g., SLA violations per unit of time)
• Scope of usage (e.g., percentage of global or national users conducting tasks at a given time)
• Concurrency of usage (e.g., the number of users concurrently performing a task)
• Timing of usage (e.g., the number of orders processed during peak load times)
Operational Environment
• The operational aspect of performance testing focuses on tasks that are generally not considered to be user-facing in nature. These include the following:
• Operational processes (e.g., the time required for environment start-up, backups, shutdown and resumption times)
• System restoration (e.g., the time required to restore data from a backup)
• Alerts and warnings (e.g., the time needed for the system to issue an alert or warning)
Sources of Performance Metrics
• System performance should be no more than minimally impacted by the metrics collection effort (known as the "probe effect"). In addition, the volume, accuracy and speed with which performance metrics must be collected makes tool usage a requirement. There are three key sources of performance metrics:
• Performance Test Tools: All performance test tools provide measurements and metrics as the result of a test. Tools may vary in the number of metrics shown, the way in which the metrics are shown, and the ability for the user to customize the metrics to a particular situation. Some tools collect and display performance metrics in text format, while more robust tools collect and display performance metrics graphically in a dashboard format.
Many tools offer the ability to export metrics to facilitate test evaluation and reporting.
• Performance Monitoring Tools: Performance monitoring tools are often employed to supplement the reporting capabilities of performance test tools (see also Section 5.1). In addition, monitoring tools may be used to monitor system performance on an ongoing basis and to alert system administrators to lowered levels of performance and higher levels of system errors and alerts. These tools may also be used to detect and notify in the event of suspicious behavior.
• Log Analysis Tools: There are tools that scan server logs and compile metrics from them. Some of these tools can create charts to provide a graphical view of the data (a minimal log-scanning sketch is shown below, after the Test Planning subsection). Errors, alerts and warnings are normally recorded in server logs. These include:
• High resource usage, such as high CPU utilization, high levels of disk storage consumed, and insufficient bandwidth
• Memory errors and warnings, such as memory exhaustion
• Deadlocks and multi-threading problems, especially when performing database operations
• Database errors, such as SQL exceptions and SQL timeouts
Results of a Performance Test
• In functional testing, particularly when verifying specified functional requirements or functional elements of user stories, the expected results usually can be defined clearly and the test results interpreted to determine if the test passed or failed. For example, a monthly sales report shows either a correct or an incorrect total. Whereas tests that verify functional suitability often benefit from well-defined test oracles, performance testing often lacks this source of information. Not only are stakeholders notoriously bad at articulating performance requirements, but many business analysts and product owners are equally bad at eliciting such requirements. Testers often receive limited guidance to define the expected test results.
• When evaluating performance test results, it is important to look at the results closely. Initial raw results can be misleading, with performance failures being hidden beneath apparently good overall results. For example, resource utilization may be well under 75% for all key potential bottleneck resources, while the throughput or response time of key transactions or use cases is an order-of-magnitude too slow.
Performance Testing Activities
• Performance testing is iterative in nature. Each test provides valuable insights into application and system performance. The information gathered from one test is used to correct or optimize application and system parameters. The next test iteration will then show the results of modifications, and so on until the test objectives are reached.
Test Planning
• Test planning is particularly important for performance testing due to the need for the allocation of test environments, test data, tools and human resources. In addition, this is the activity in which the scope of performance testing is established. During test planning, risk identification and risk analysis activities are completed and relevant information is updated in any test planning documentation (e.g., test plan, level test plan). Just as test planning is revisited and modified as needed, so are risks, risk levels and risk status modified to reflect changes in risk conditions.
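The log-scanning sketch referenced under Log Analysis Tools above is shown here. It compiles simple counts of error patterns from a server log; the file name and message patterns are assumptions, since real log formats vary by server and platform.

```python
# Sketch: compile simple error metrics from a server log.
# The log file name and the message patterns are assumptions; adjust to the
# actual log format of the system under test.
import re
from collections import Counter

PATTERNS = {
    "sql_timeout": re.compile(r"SQL.*timeout", re.IGNORECASE),
    "out_of_memory": re.compile(r"OutOfMemory|memory exhausted", re.IGNORECASE),
    "deadlock": re.compile(r"deadlock", re.IGNORECASE),
    "http_5xx": re.compile(r'" 5\d\d '),   # matches a common access-log layout
}

def scan_log(path: str) -> Counter:
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            for name, pattern in PATTERNS.items():
                if pattern.search(line):
                    counts[name] += 1
    return counts

if __name__ == "__main__":
    print(scan_log("server.log"))   # e.g. Counter({'http_5xx': 42, 'sql_timeout': 3})
```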
Test Monitoring and Control
• Control measures are defined to provide action plans should issues be encountered which might impact performance efficiency, such as:
• Increasing the load generation capacity if the infrastructure does not generate the desired loads as planned for particular performance tests
• Changed, new or replaced hardware
• Changes to network components
• Changes to software implementation
The performance test objectives are evaluated to check for exit criteria achievement.
Test Analysis
• Effective performance tests are based on an analysis of performance requirements, test objectives, Service Level Agreements (SLAs), IT architecture, process models and other items that comprise the test basis. This activity may be supported by modeling and analysis of system resource requirements and/or behavior using spreadsheets or capacity planning tools. Specific test conditions are identified such as load levels, timing conditions, and transactions to be tested. The required type(s) of performance test (e.g., load, stress, scalability) are then decided.
Test Design
• Performance test cases are designed. These are generally created in modular form so that they may be used as the building blocks of larger, more complex performance tests.
Test Implementation
• In the implementation phase, performance test cases are ordered into performance test procedures. These performance test procedures should reflect the steps normally taken by the user and other functional activities that are to be covered during performance testing. A test implementation activity is establishing and/or resetting the test environment before each test execution. Since performance testing is typically data-driven, a process is needed to establish test data that is representative of actual production data in volume and type so that production use can be simulated.
Test Execution
• Test execution occurs when the performance test is conducted, often by using performance test tools. Test results are evaluated to determine if the system's performance meets the requirements and other stated objectives. Any defects are reported.
Test Completion
• Performance test results are provided to the stakeholders (e.g., architects, managers, product owners) in a test summary report. The results are expressed through metrics which are often aggregated to simplify the meaning of the test results. Visual means of reporting such as dashboards are often used to express performance test results in ways that are easier to understand than text-based metrics. Performance testing is often considered to be an ongoing activity in that it is performed multiple times and at all test levels (component, integration, system, system integration and acceptance testing). At the close of a defined period of performance testing, a point of test closure may be reached where designed tests, test tool assets (test cases and test procedures), test data and other testware are archived or passed on to other testers for later use during system maintenance activities.
Categories of Performance Risks for Different Architectures
• Application or system performance varies considerably based on the architecture, application and host environment. While it is not possible to provide a complete list of performance risks for all systems, the list below includes some typical types of risks associated with particular architectures:
• Single Computer Systems: These are systems or applications that run entirely on one non-virtualized computer.
Performance can degrade due to excessive resource consumption including memory leaks, background activities such as security software, slow storage subsystems (e.g., low-speed external devices or disk fragmentation), and operating system mismanagement. Performance can also suffer from inefficient implementation of algorithms which do not make use of available resources (e.g., main memory) and as a result execute more slowly than required.
• Multi-tier Systems: These are systems of systems that run on multiple servers, each of which performs a specific set of tasks, such as a database server, an application server, and a presentation server. Each server is, of course, a computer and subject to the risks given earlier. In addition, performance can degrade due to poor or non-scalable database design, network bottlenecks, and inadequate bandwidth or capacity on any single server.
• Distributed Systems: These are systems of systems, similar to a multi-tier architecture, but the various servers may change dynamically, such as an e-commerce system that accesses different inventory databases depending on the geographic location of the person placing the order. In addition to the risks associated with multi-tier architectures, this architecture can experience performance problems due to critical workflows or dataflows to, from, or through unreliable or unpredictable remote servers, especially when such servers suffer periodic connection problems or intermittent periods of intense load.
• Virtualized Systems: These are systems where the physical hardware hosts multiple virtual computers. These virtual machines may host single-computer systems and applications as well as servers that are part of a multi-tier or distributed architecture. Performance risks that arise specifically from virtualization include excessive load on the hardware across all the virtual machines or improper configuration of the host virtual machine resulting in inadequate resources.
• Dynamic/Cloud-based Systems: These are systems that offer the ability to scale on demand, increasing capacity as the level of load increases. These systems are typically distributed and virtualized multi-tier systems, albeit with self-scaling features designed specifically to mitigate some of the performance risks associated with those architectures. However, there are risks associated with failures to properly configure these features during initial setup or subsequent updates.
• Client-Server Systems: These are systems running on a client that communicate via a user interface with a single server, multi-tier server, or distributed server. Since there is code running on the client, the single-computer risks apply to that code, while the server-side issues mentioned above apply as well. Further, performance risks exist due to connection speed and reliability issues, network congestion at the client connection point (e.g., public Wi-Fi), and potential problems due to firewalls, packet inspection and server load balancing.
• Mobile Applications: These are applications running on a smartphone, tablet, or other mobile device. Such applications are subject to the risks mentioned for client-server and browser-based (web) applications. In addition, performance issues can arise due to the limited and variable resources and connectivity available on the mobile device (which can be affected by location, battery life, charge state, available memory on the device and temperature).
For those applications that use device sensors or radios such as accelerometers or Bluetooth, slow dataflows from those sources could create problems. Finally, mobile applications often have heavy interactions with other local mobile apps and remote web services, any of which can potentially become a performance efficiency bottleneck.
• Embedded Real-time Systems: These are systems that work within or even control everyday things such as cars (e.g., entertainment systems and intelligent braking systems), elevators, traffic signals, Heating, Ventilation and Air Conditioning (HVAC) systems, and more. These systems often have many of the risks of mobile devices, including (increasingly) connectivity-related issues since these devices are connected to the Internet. However, the diminished performance of a mobile video game is usually not a safety hazard for the user, while such slowdowns in a vehicle braking system could prove catastrophic.
• Mainframe Applications: These are applications, in many cases decades old, supporting often mission-critical business functions in a data center, sometimes via batch processing. Most are quite predictable and fast when used as originally designed, but many are now accessible via APIs, web services, or through their databases, which can result in unexpected loads that affect the throughput of established applications.
Note that any particular application or system may incorporate two or more of the architectures listed above, which means that all relevant risks will apply to that application or system. In fact, given the Internet of Things and the explosion of mobile applications, two areas where extreme levels of interaction and connection are the rule, it is possible that all architectures are present in some form in an application, and thus all risks can apply.
• While architecture is clearly an important technical decision with a profound impact on performance risks, other technical decisions also influence and create risks. For example, memory leaks are more common with languages that allow direct heap memory management, such as C and C++, and performance issues differ for relational versus non-relational databases. Such decisions extend all the way down to the design of individual functions or methods (e.g., the choice of a recursive as opposed to an iterative algorithm). As a tester, the ability to know about or even influence such decisions will vary, depending on the roles and responsibilities of testers within the organization and the software development lifecycle.
Sequential Development Models
• The ideal practice of performance testing in sequential development models is to include performance criteria as part of the acceptance criteria which are defined at the outset of a project. Reinforcing the lifecycle view of testing, performance testing activities should be conducted throughout the software development lifecycle. As the project progresses, each successive performance test activity should be based on items defined in the prior activities, as shown below.
• Concept – Verify that system performance goals are defined as acceptance criteria for the project.
• Requirements – Verify that performance requirements are defined and represent stakeholder needs correctly.
• Analysis and Design – Verify that the system design reflects the performance requirements.
• Coding/Implementation – Verify that the code is efficient and reflects the requirements and design in terms of performance.
• Component Testing – Conduct component-level performance testing.
• Component Integration Testing – Conduct performance testing at the component integration level.
• System Testing – Conduct performance testing at the system level, which includes hardware, software, procedures and data that are representative of the production environment. System interfaces may be simulated provided that they give a true representation of performance.
• System Integration Testing – Conduct performance testing with the entire system, which is representative of the production environment.
• Acceptance Testing – Validate that system performance meets the originally stated user needs and acceptance criteria.
Iterative and Incremental Development Models
• In these development models, such as Agile, performance testing is also seen as an iterative and incremental activity (see [ISTQB_FL_AT]). Performance testing can occur as part of the first iteration, or as an iteration dedicated entirely to performance testing. However, with these lifecycle models, the execution of performance testing may be performed by a separate team tasked with performance testing. Continuous Integration (CI) is commonly performed in iterative and incremental software development lifecycles, which facilitates a highly automated execution of tests. The most common objective of testing in CI is to perform regression testing and ensure each build is stable. Performance testing can be part of the automated tests performed in CI if the tests are designed in such a way as to be executed at a build level.
• However, unlike functional automated tests, there are additional concerns such as the following:
• The setup of the performance test environment – This often requires a test environment that is available on demand, such as a cloud-based performance test environment.
• Determining which performance tests to automate in CI – Due to the short timeframe available for CI tests, CI performance tests may be a subset of more extensive performance tests that are conducted by a specialist team at other times during an iteration.
• Creating the performance tests for CI – The main objective of performance tests as part of CI is to ensure a change does not negatively impact performance. Depending on the changes made for any given build, new performance tests may be required.
• Executing performance tests on portions of an application or system – This often requires the tools and test environments to be capable of rapid performance testing, including the ability to select subsets of applicable tests.
• Performance testing in iterative and incremental software development lifecycles can also have its own lifecycle activities:
• Release Planning – In this activity, performance testing is considered from the perspective of all iterations in a release, from the first iteration to the final iteration. Performance risks are identified and assessed, and mitigation measures are planned. This often includes planning of any final performance testing before the release of the application.
• Iteration Planning – In the context of each iteration, performance testing may be performed within the iteration and as each iteration is completed. Performance risks are assessed in more detail for each user story.
• User Story Creation – User stories often form the basis of performance requirements in Agile methodologies, with the specific performance criteria described in the associated acceptance criteria. These are referred to as "non-functional" user stories.
• Design of Performance Tests – The performance requirements and criteria described in particular user stories are used for the design of tests.
• Coding/Implementation – During coding, performance testing may be performed at a component level. An example of this would be the tuning of algorithms for optimum performance efficiency.
• Testing/Evaluation – While testing is typically performed in close proximity to development activities, performance testing may be performed as a separate activity, depending on the scope and objectives of performance testing during the iteration. For example, if the goal of performance testing is to test the performance of the iteration as a completed set of user stories, a wider scope of performance testing will be needed than that seen in performance testing a single user story. This may be scheduled in a dedicated iteration for performance testing.
• Delivery – Since delivery will introduce the application to the production environment, performance will need to be monitored to determine if the application achieves the desired levels of performance in actual usage.
Commercial Off-the-Shelf (COTS) and other Supplier/Acquirer Models
• Many organizations do not develop applications and systems themselves, but instead are in the position of acquiring software from vendor sources or from open-source projects. In such supplier/acquirer models, performance is an important consideration that requires testing from both the supplier (vendor/developer) and acquirer (customer) perspectives.
• Regardless of the source of the application, it is often the responsibility of the customer to validate that the performance meets their requirements. In the case of customized vendor-developed software, performance requirements and associated acceptance criteria should be specified as part of the contract between the vendor and customer. In the case of COTS applications, the customer has sole responsibility to test the performance of the product in a realistic test environment prior to deployment.
Tool Support
• Performance testing tools include the following types of tools.
Load Generators
• The generator, through an IDE, script editor or tool suite, is able to create and execute multiple client instances that simulate user behavior according to a defined operational profile. Creating multiple instances in short periods of time will cause load on a system under test. The generator creates the load and also collects metrics for later reporting. When executing performance tests, the objective of the load generator is to mimic the real world as much as is practical. This often means that user requests coming from various locations are needed, not just from the testing location. Environments that are set up with multiple points of presence will distribute where the load originates from so that it is not all coming from a single network. This provides realism to the test, though it can sometimes skew results if intermediate network hops create delays.
Load Management Console
• The load management console provides the control to start and stop the load generator(s). The console also aggregates metrics from the various transactions that are defined within the load instances used by the generator. The console enables reports and graphs from the test executions to be viewed and supports results analysis.
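As a rough sketch of the aggregation work a load management console performs, the example below groups raw samples collected from load generator instances by transaction and reports counts, error rates and timing statistics. The field names and sample values are illustrative assumptions, not the data model of any specific tool.

```python
# Sketch: console-style aggregation of raw results gathered from load
# generators, grouped per defined transaction.
from dataclasses import dataclass
from statistics import mean

@dataclass
class Sample:
    transaction: str      # e.g. "login", "search", "checkout"
    seconds: float        # measured response time
    ok: bool              # whether the request completed without error

def aggregate(samples: list) -> dict:
    report = {}
    for name in sorted({s.transaction for s in samples}):
        subset = [s for s in samples if s.transaction == name]
        times = [s.seconds for s in subset if s.ok]
        report[name] = {
            "count": len(subset),
            "error_rate": 1 - len(times) / len(subset),
            "mean_s": mean(times) if times else float("nan"),
            "max_s": max(times) if times else float("nan"),
        }
    return report

# Raw samples as they might arrive from two generator instances
raw = [Sample("login", 0.31, True), Sample("login", 0.29, True),
       Sample("search", 0.84, True), Sample("search", 1.92, False)]
print(aggregate(raw))
```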
Monitoring Tool
• Monitoring tools run concurrently with the component or system under test and supervise, record and/or analyze the behavior of the component or system. Typical components which are monitored include web server queues, system memory and disk space.
• License models for performance test tools include the traditional seat/site-based license with full ownership, a cloud-based pay-as-you-go license model, and open-source licenses which are free to use in a defined environment or through cloud-based offerings. Each model implies a different cost structure and may include ongoing maintenance. What is clear is that for any tool selected, understanding how that tool works (through training and/or self-study) will require time and budget.
Tool Suitability
• Compatibility: In general, a tool is selected for the organization and not only for a project. This means considering the following factors in the organization:
• Interfaces to external components: Interfaces to software components or other tools may need to be considered as part of the complete integration requirements to meet process or other interoperability requirements (e.g., integration in the CI process).
• Platforms: Compatibility with the platforms (and their versions) within an organization is essential. This applies to the platforms used to host the tools and the platforms with which the tools interact for monitoring and/or load generation.
• Scalability: Another factor to consider is the total number of concurrent user simulations the tool can handle. This includes several factors:
• Maximum number of licenses required
• Load generation workstation/server configuration requirements
• Ability to generate load from multiple points of presence (e.g., distributed servers)
• Understandability: Another factor to consider is the level of technical knowledge needed to use the tool. This is often overlooked and can lead to unskilled testers incorrectly configuring tests, which in turn provide inaccurate results. For testing requiring complex scenarios and a high level of programmability and customization, teams should ensure that the tester has the necessary skills, background, and training.
• Monitoring: Is the monitoring provided by the tool sufficient? Are there other monitoring tools available in the environment that can be used to supplement the monitoring provided by the tool? Can the monitoring be correlated to the defined transactions? All of these questions must be answered to determine if the tool will provide the monitoring required by the project.
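To make the monitoring questions above concrete, the sketch below shows a minimal resource monitor that could run alongside a performance test, sampling CPU and memory usage at a fixed interval and writing the results to a CSV file for later correlation with transaction timings. It assumes the third-party psutil package is installed; the duration, interval and output file name are arbitrary illustrative choices.

```python
# Sketch: a simple monitor that samples CPU and memory alongside a test run and
# writes one CSV row per interval. Assumes the third-party psutil package.
import csv
import time
import psutil

def monitor(duration_s: int = 60, interval_s: float = 5.0,
            out_path: str = "resource_samples.csv") -> None:
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["elapsed_s", "cpu_percent", "memory_percent"])
        start = time.time()
        while time.time() - start < duration_s:
            cpu = psutil.cpu_percent(interval=interval_s)   # blocks for the interval
            mem = psutil.virtual_memory().percent
            writer.writerow([round(time.time() - start, 1), cpu, mem])

if __name__ == "__main__":
    monitor(duration_s=30, interval_s=5.0)
```

A full monitoring tool would also cover web server queues, disk space and network metrics, and would correlate the samples with the transactions defined in the load test rather than writing them to a flat file.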