Software Testing
LESSON 1
This first lesson of the subject introduces you to the fundamental concepts of software testing. A prime aspect of software development is its phases. Before getting into software testing, the reader must understand the phases of software project development, which are associated with the entire software testing task.
At the end of the lesson, I hope you will be able to understand
• Requirements Analysis
• How to Plan?
• The tools and methods of design
• Various Coding methods
• An Independent Testing Module
• The ways of deployment and further maintenance
Software life cycle models describe the phases of the software cycle and the order in which those phases are executed. There are many models, and many companies adopt their own, but all share very similar patterns. A software
project is made up of a series of phases. Broadly, most software projects
comprise the following phases.
• Requirements Gathering and Analysis
• Planning
• Design
• Development or Coding
• Testing
• Deployment and Maintenance
Among the various models, let me explain the waterfall model: a sequential software development model (a process for the creation of software) in which development is seen as flowing steadily downwards (like a waterfall) through the phases of requirements analysis, design, implementation, testing (validation), integration, and maintenance. This is otherwise called the common and classic model, as it is a linear model, depicted in Figure 1.1.
1.2 REQUIREMENTS GATHERING AND ANALYSIS
1.3 PLANNING
1.4 DESIGN
The purpose of the design is to figure out how to satisfy the requirements
enumerated in the System Requirements Specification document. The design
phase produces a representation that will be used by the following phase, the
development phase. This representation should serve two purposes. First, from
this representation, it should be possible to verify that all the requirements are
satisfied. Second, this representation should give sufficient information for the
development phase to proceed with the coding and implementation of the
system. Design is usually split into two levels – high-level design and low-level (or detailed) design. The design step produces the System Design Description
(SDD) document that will be used by development teams to produce the
programs that realize the design.
1.5 DEVELOPMENT OR CODING
Design acts as a blueprint for the actual coding to proceed. This
development or coding phase comprises coding the programs in the chosen
programming language. It produces the software that meets the requirements
the design was meant to satisfy. In addition to programming, this phase also
involves the creation of product documentation.
1.6 TESTING
As the programs are coded (in the chosen programming language), they
are also tested. In addition, after the coding is (deemed) complete, the product
is subjected to testing. Testing is the process of exercising the software product
in pre-defined ways to check if the behavior is the same as expected behavior.
By testing the product, an organization identifies and removes as many defects
as possible before shipping it out.
Table 1.1 Strengths and weaknesses
1.8 LET US SUM UP
In this lesson, we briefly discussed the phases of the SDLC and the activities performed in each phase. Software development life cycle models such as the waterfall, V-process, and other models are illustrated in detail in the third lesson of the courseware.
LESSON 2
2.1 QUALITY
1. How these inputs actually get processed;
2. What changes are actually produced in the internal state or environment; and
3. What outputs are actually produced?
If the actual behavior and the expected behavior are identical in all their
characteristics, then that test case is said to be passed. If not, the given
software is said to have a defect on that test case. How do we increase the
chances of a product meeting the requirements expected of it, consistently and
predictably? There are two types of methods – quality control and quality
assurance.
Quality Assurance                             Quality Control
Usually done throughout the life cycle        Usually done after the product is built
This is usually a staff function              This is usually a line function
VALIDATION
Validation is the process of evaluating a system or component during or
at the end of the development process to determine whether it satisfies specified
requirements. Testing is NOT meant to replace other ways of ensuring quality
(like reviews). It is one of the methods to detect defects in a software product.
There are other methods that achieve the same function. For example, we will see later that following well-defined processes and standards reduces the chances of defects creeping into software. We will also discuss other methods
like reviews and inspections, which actually attempt to prevent defects coming
into the product. To be effective, testing should complement, supplement, and
augment such quality assurance methods discussed in the previous section.
The idea of catching defects within each phase, without letting them reach the testing phase, leads us to define two more terms: verification and validation.
During the requirements gathering phase, the requirements are faithfully
captured. The SRS document is the product of the requirements phase. To
ensure that requirements are faithfully captured, the customer verifies this
document. The design phase takes the SRS document as input and maps the requirements to a design that can drive the coding. The SDD document is the product of the design phase. The SDD is verified by the requirements team to ensure that the design faithfully reflects the SRS, which imposed the conditions at the beginning of the design phase.
Verification takes care of activities to focus on the question "Are we
building the product right?" and validation takes care of a set of activities to
address the question "Are we building the right product?"
To build the product right, certain activities/conditions/procedures are imposed at the beginning of the life cycle. These activities are considered "proactive" as their purpose is to prevent defects before they take shape. The
process activities carried out during various phases for each of the product
releases can be termed as verification. Requirements review, design review, and
code review are some examples of verification activities.
To build the right product, certain activities are carried out during
various phases to validate whether the product is built as per specification.
These activities are considered "reactive" as their purpose is to find defects that affect the product and fix them as soon as they are introduced. Some examples
of validation include unit testing performed to verify if the code logic works,
integration testing performed to verify the design, and system testing performed
to verify that the requirements are met.
To summarize, there are different terminologies that may stand for the
same or similar concepts. For all practical purposes in this study material, we
can assume verification and quality assurance to be one and the same.
Similarly, quality control, validation, and testing mean the same.
Quality Assurance = Verification
Quality Control = Validation = Testing
Check your progress 2
Define Validation.
Notes: a) Write your answer in the space given below
b) Check your answer with the one given at the end of this lesson.
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
An example of applying the ETVX model to the design phase is presented in Figure 2.1.
Figure 2.1 ETVX model applied to design
Entry criteria: Approval of SRS by customer
Input: Approved SRS
Output:
• Architecture documents
• Design documents
• Program Specifications
Exit criteria:
• Complete traceability between design and SRS
• Development team ready to start programming
Check your progress 3
I hope this lesson provided a hands-on view of quality, with its assurance and control aspects, and of the validation and verification processes of a software project.
A model known as the Entry Task Verification eXit (ETVX) model offers several advantages for effective verification and validation.
2. Validation is the process of evaluating a system or component during or
at the end of the development process to determine whether it satisfies
specified requirements.
3. The ETVX model provides:
a. The verification for each phase (or each activity in each phase) helps
prevent defects, or at least, minimizes the time delay between defect
injection and defect detection.
b. Documentation of the detailed tasks that comprise each phase reduces
the ambiguity in interpretation of the instructions and thus minimizes
the variations that can come from repeated executions of these tasks by
different individuals.
LESSON 3
The sequence of activities
The different activities work together in unison in a certain sequence of steps to achieve overall project goals. For example, the process of requirements gathering may involve steps such as interviews with customers, documentation of requirements, validation of documented requirements with customers, and freezing of requirements. These steps may be repeated as many times as needed to get the final frozen requirements.
Methods of verification of each activity, including the mechanism of communication amongst the activities
The different activities interact with one another by means of communication methods. For example, when a defect is found in one activity and is traced back to the causes in an earlier activity, proper verification methods are needed to retrace steps from the point of defect to the cause of the defect.
We will now look at some of the common life cycle models that are used
in software projects. For each model, we will look at:
1. a brief description of the model;
2. the relationship of the model to verification and validation activities;
and
3. typical scenarios where that life cycle model is useful.
Description. With the SDD as input, the project proceeds to the development or
coding phase, wherein programmers develop the programs required to satisfy
the design. Once the programmers complete their coding tasks, they hand the
product to the testing team, who test the product before it is released.
If there is no problem in a given phase, then this method can work, going in one direction (like a waterfall), as shown in Figure 3.1. But what would happen if there are problems after going to a particular phase? For example, you may go into the design phase and find that it is not possible to satisfy the requirements.
1. The software development organization interacts with customers to understand their requirements.
2. The software development organization produces a prototype to show how the eventual software system would look. This prototype would have models of how the input screens and output reports would look, in addition to having some “empty can functionality” to demonstrate the workflow and processing logic.
3. The customer and the development organization review the prototype
frequently so that the customer’s feedback is taken very early in the
cycle (that is, during the requirements gathering phase).
4. Based on the feedback and the prototyping that is produced, the
software development organization produces the System
Requirements Specification document.
5. Once the SRS document is produced, the prototype can be discarded.
6. The SRS document is used as the basis for further design and
development.
Thus, the prototype is simply used as a means of quickly gathering (the
right) requirements. This model has built-in mechanisms for verification and
validation of the requirements. As the prototype is being developed, the
customer’s frequent feedback acts as a validation mechanism. Once the SRS is
produced, it acts as a validation mechanism for the design and subsequent
steps. But the verification and validation activities of the subsequent phases are
actually dictated by the life cycle model that is followed after the SRS is
obtained.
The Rapid Application Development model is a variation of the
Prototyping Model. Like the Prototyping Model, the RAD Model relies on
feedback and interaction by the customers to gather the initial requirements.
However, the Prototyping model differs from the RAD Model on two counts.
First, in the RAD Model, it is not a prototype that is built but the actual
product itself. That is, the built application (prototype, in the previous model) is
not discarded. Hence, it is named Rapid Application Development Model.
Second, in order to ensure formalism in capturing the requirements in
the design and subsequent phases, a Computer Aided Software Engineering
(CASE) tool is used throughout the life cycle, right from requirements gathering.
Such CASE tools have
• Methodologies to elicit requirements;
• Repositories to store the gathered requirements and all downstream entities such as design objects; and
• Mechanisms to automatically translate the requirements stored in the repositories to design and generate the code in the chosen programming environment.
This method can have wider applicability for even general-purpose
products. The automatic generation of the design and programs produced by a
CASE tool makes this model more attractive.
Figure 3.2 The Spiral Model
Units are combined to form components. The testing of the program units forms unit testing.
3.1.5 MODIFIED V MODEL
The V Model splits the design and execution portions of the various types of tests and attaches the test design portion to the corresponding earlier phases of the software life cycle.
An assumption made there was that even though the activity of test execution was split into execution of tests of different types, the execution cannot happen until the entire product is built. For a given product, different units and components can be in different stages of evolution. The V Model does not explicitly address this natural parallelism commonly found in product development.
Just as the V Model introduced various types of testing, the Modified V Model introduces various phases of testing. A phase of testing has a one-to-one mapping to the types of testing; that is, there is a unit-testing phase, component-testing phase, and so on. Once a unit has completed the unit-testing phase, it becomes part of a component and enters the component-testing phase. It then moves to the integration-testing phase, and so on. Rather than view the product as going through different types of tests (as the V Model does), the Modified V Model views each part of the product as going through different phases of testing. These are actually two sides of the same coin and thus provide complementary views. The main advantage the Modified V Model brings to the table is the recognition of the parallelism present in different parts of the product and assigning each part to the most appropriate phase of testing that is possible. In Figure 2.6, the columns of the table represent one side of the V, and the rows (which are test phases) represent the other side of the V.
As can be seen from the above discussion, each of the models has its advantages and disadvantages. Each of them has applicability in a specific scenario. Each of them also poses different issues, challenges, and opportunities for verification and validation, as depicted in Figure 3.4.
3.3 LET US SUM UP
We are at the end of the lesson, in which you learned about the different types of testing, the different life cycle models, and the RAD model. Prototyping is a method of doing a software project as a blueprint, i.e., implementing it for trial before making it a reality.
References:
1. http://en.wikipedia.org/wiki/Software_development_process
2. Srinivasan Desikan and Gopalaswamy Ramesh, Software Testing: Principles and Practices, Pearson Education, 2006.
3. http://www.stylusinc.com/Common/Concerns/SoftwareDevtPhilosophy.php
LESSON 4
In this lesson, we are going to discuss the foremost level of testing, i.e. white box testing. Another interesting kind of testing, called static testing, is also discussed in detail with its sub-heads, such as static testing by humans and automated static testing tools.
At the end of this lesson, you should be able to understand the basic testing methods available and the tools used for them.
White box testing is a way of testing the external functionality of the code by examining and testing the program code that realizes the external functionality. It is also known as clear box, glass box, or open box testing, as given in Figure 4.1.
Figure 4.1 Classification of White Box Testing
White box testing takes into account the program code, code structure
and internal design flow. A number of defects come about because of incorrect
translation of requirements and design into program code. Some other defects
are created by programming errors and programming language idiosyncrasies.
The different methods of white box testing reduce the delay between the injection of a defect in the program code and its detection. Furthermore, since
the program code represents what the product actually does (rather than what
the product is intended to do), testing by looking at the program code makes us
get closer to what the product is actually doing.
Static testing is a type of testing which requires only the source code of the product, not the binaries or executables. Static testing does not involve executing the programs on computers but involves select people going through the code to find out whether
• The code works according to the functional requirements;
• The code has been written in accordance with the design developed earlier in the project life cycle;
• The code for any functionality has been missed out; and
• The code handles errors properly.
Static testing can be done by humans or with the help of specialized tools.
These methods rely on the principle of humans reading the program code
to detect errors rather than computers executing the code to find errors. This
process has several advantages.
1. Sometimes humans can find errors that computers cannot. For example, when there are two variables with similar names and the programmer used a "wrong" variable by mistake in an expression, the computer will not detect the error but will execute the statement and produce incorrect results, whereas a human being can spot such an error (see the sketch after this list).
2. By making multiple humans read and evaluate the program, we can get
multiple perspectives and therefore have more problems identified
upfront than a computer could.
3. A human evaluation of the code can compare it against the specifications
or design and thus ensure that it does what is intended to do. This may
not always be possible when a computer runs a test.
4. A human evaluation can detect many problems at one go and can even
try to identify the root causes of the problems. More often than not,
multiple problems can get fixed by attending to the same root cause.
Typically, in reactive testing, a test uncovers one problem (or, at best, a
few problems) at a time. Often, such testing only reveals the symptoms
rather than the root causes. Thus, the overall time required to fix all the
problems can be reduced substantially by a human evaluation.
5. By making humans test the code before execution, computer resources
can be saved. Of course, this comes at the expense of human resources.
6. A proactive method of testing like static testing minimizes the delay in
identification of the problems. The sooner a defect is identified and
corrected, the lesser is the cost of fixing the defect.
7. From a psychological point of view, finding defects later in the cycle (for
example, after the code is compiled and the system is being put together)
creates immense pressure on programmers. They have to fix defects with
less time to spare. With this kind of pressure, there are higher chances of
other defects creeping in.
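As a small illustration of point 1 above, here is a hedged sketch (the variable names are hypothetical, not drawn from any particular product) of how a similarly named variable slips past the computer:

    #include <stdio.h>

    int main(void)
    {
        /* Two variables with very similar names. */
        double total_price  = 100.0;  /* the value the programmer meant to use */
        double total_prices = 0.0;    /* a near-identical, unrelated name */

        /* BUG: total_prices is used instead of total_price. The computer
         * compiles and executes this statement without complaint and
         * simply prints a wrong answer; a human reading the code against
         * its intent can spot the wrong variable directly. */
        double discounted = total_prices * 0.9;
        printf("List price: %.2f, discounted: %.2f\n", total_price, discounted);
        return 0;
    }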
There are multiple methods to achieve static testing by humans. They are (in
the increasing order of formalism) as follows.
1. Desk checking of the code
2. Code walkthrough
3. Code review
4. Code inspection
Since static testing by humans is done before the code is compiled and executed, some of these methods can be viewed as process-oriented, defect prevention-oriented, or quality assurance-oriented activities rather than pure testing activities. Especially as the methods become increasingly formal (for example, Fagan Inspection), these traditionally fall under the “process” domain.
They find a place in formal process models such as ISO 9001, CMMI, and so on
and are seldom treated as part of the “testing” domain. Nevertheless, as
mentioned earlier in this book, we take a holistic view of "testing" as anything
that furthers the quality of a product. These methods have been included in
this chapter because they have visibility into the program code.
4.2.1.1 Desk checking Normally done manually by the author of the code, desk checking is a method to verify portions of the code for correctness. Such verification is done by comparing the code with the design or specifications to make sure that the code does what it is supposed to do, and does it effectively. This is the desk checking that most programmers do before compiling and executing the code. Whenever errors are found, the author applies the corrections on the spot. This method of catching and correcting errors is characterized by:
1. No structured method or formalism to ensure completeness; and
2. No maintaining of a log or checklist.
In effect, this method relies completely on the author's thoroughness, diligence, and skills. There is no process or structure that guarantees or verifies the effectiveness of desk checking. This method is effective for correcting
"obvious" coding errors but will not be effective in detecting errors that arise
due to incorrect understanding of requirements or incomplete requirements.
This is because developers (or, more precisely, programmers who are doing the
desk checking) may not have the domain knowledge required to understand the
requirements fully.
The main advantage offered by this method is that the programmer who
knows the code and the programming language very well is well equipped to
read and understand his or her own code. Also, since this is done by one
individual, there are fewer scheduling and logistics overheads. Furthermore, the
defects are detected and corrected with minimum time delay.
Some of the disadvantages of this method of testing are as follows.
1. A developer is not the best person to detect problems in his or her own
code. He or she may be tunnel visioned and have blind spots to certain
types of problems.
2. Developers generally prefer to write new code rather than do any form of
testing! (We will see more details of this syndrome later in the section on
challenges as well as when we discuss people issues.)
3. This method is essentially person-dependent and informal and thus may
not work consistently across all developers.
Owing to these disadvantages, the next two types of proactive methods
are introduced. The basic principle of walkthroughs and formal inspections is to
involve multiple people in the review process.
4.2.1.2 Code walkthrough This method and formal inspection (described in
the next section) are group-oriented methods. Walkthroughs are less formal
than inspections. The line drawn in formalism between walkthroughs and
inspections is very thin and varies from organization to organization. The
advantage that walkthrough has over desk checking is that it brings multiple
perspectives. In walkthroughs, a set of people look at the program code and
raise questions for the author. The author explains the logic of the code, and
answers the questions. If the author is unable to answer some questions, he or
she then takes those questions and finds their answers. Completeness is
limited to the area where questions are raised by the team.
4.2.1.3 Formal inspection Code inspection – also called Fagan Inspection (named after its original formulator) – is a method, normally with a high degree
of formalism. The focus of this method is to detect all faults, violations, and
other side-effects. This method increases the number of defects detected by
1. demanding thorough preparation before an inspection/review;
2. enlisting multiple diverse views;
3. assigning specific roles to the multiple participants; and
4. going sequentially through the code in a structured manner.
A formal inspection should take place only when the author has made
sure the code is ready for inspection by performing some basic desk checking
and walkthroughs. When the code is in such a reasonable state of readiness, an
inspection meeting is arranged. There are four roles in inspection. First is the author of the code. Second is a moderator who is expected to formally run the inspection according to the process. Third are the inspectors. These are the people who actually provide review comments for the code. There are typically
multiple inspectors. Finally, there is a scribe, who takes detailed notes during
the inspection meeting and circulates them to the inspection team after the
meeting.
The author or the moderator selects the review team. The chosen members have the skill sets to uncover as many defects as possible. In an introductory meeting, the inspectors get copies (these can be hard copies or soft copies) of the code to be inspected along with other supporting documents
such as the design document, requirements document, and any documentation
of applicable standards. The author also presents his or her perspective of what
the program is intended to do along with any specific issues that he or she may
want the inspection team to put extra focus on. The moderator informs the
team about the date, time, and venue of the inspection meeting. The inspectors
get adequate time to go through the documents and program and ascertain
their compliance to the requirements, design and standards.
The inspection team assembles at the agreed time for the inspection meeting (also called the defect logging meeting). The moderator takes the team sequentially through the program code, asking each inspector if there are any
defects in that part of the code. If any of the inspectors raises a defect, then the inspection team deliberates on the defect and, when it is agreed that there is a defect, classifies it in two dimensions – minor/major and systemic/mis-execution. A mis-execution defect is one which, as the name suggests, happens because of an error or slip on the part of the author. It is unlikely to be repeated later, either in this work product or in other work products. An example of this is using a wrong variable in a statement. Systemic defects, on the other hand, can require correction at a different level. For example, an error such as using some machine-specific idiosyncrasies may have to be removed by changing the coding standards. Similarly, minor defects are defects that may not substantially affect a program, whereas major defects need immediate attention.
A scribe formally documents the defects found in the inspection meeting and the author takes care of fixing these defects. In case the defects are severe, the team may optionally call for a review meeting to inspect the fixes to ensure that they address the problems. In any case, defects found through inspection need to be tracked till completion and someone in the team has to verify that the problems have been fixed properly.
4.2.1.4 Combining various methods The methods discussed above are not
mutually exclusive. They need to be used in a judicious combination to be
effective in achieving the goal of finding defects early.
Formal inspections have been found very effective in catching defects
early. Some of the challenges to watch out for in conducting formal inspections
are as follows.
1. These are time consuming. Since the process calls for preparation as well
as formal meetings, these can take time.
2. The logistics and scheduling can become an issue since multiple people
are involved.
3. It is not always possible to go through every line of code, with several
parameters and their combinations in mind to ensure the correctness of
the logic, side-effects and appropriate error handling. It may also not be
necessary to subject the entire code to formal inspection.
In order to overcome the above challenges, it is necessary to identify,
during the planning stages, which parts of the code will be subject to formal
inspections. Portions of code can be classified on the basis of their criticality or
complexity as "high," "medium," and "low." High or medium complex critical
code should be subject to formal inspections, while those classified as "low" can
be subject to either walkthroughs or even desk checking.
Desk checking, walkthrough, review and inspection are not only used for
code but can be used for all other deliverables in the project life cycle such as
documents, binaries, and media.
b) Check your answer with the one given at the end of this lesson.
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
4.2.2.1 CODING REVIEW CHECKLIST
or the section of code is critical for product functioning?
Is appropriate change history documented?
Are the interfaces and the parameters thereof properly documented?
We stand at the end of this lesson, where you learned the fundamentals of white box testing and static testing. Static testing is categorized into testing by humans and testing with tools. The methods of testing practice were also discussed.
1. White box testing is a way of testing the external functionality of the code
by examining and testing the program code that realizes the external
functionality. This is also known as clear box, or glass box or open box
testing.
2. The methods are
a. Desk checking of the code
b. Code walkthrough
c. Code review
d. Code inspection
3. The Software Development Life Cycle and its phases need software components to face the market regime.
4. Software components enable code reuse through self-contained binary modules created by independent developers. A component can be written in any computer language that supports the creation of components. Components are plugged into the application at runtime.
LESSON 5
STRUCTURAL TESTING
Contents
5.0 Aims and Objectives
5.1 Structural Testing
5.2 Unit/Code Functional Testing
5.3 Code Coverage Testing
5.3.1 Statement Coverage
5.3.2 Path Coverage
5.3.3 Condition Coverage
5.3.4 Function Coverage
5.4 Code Complexity Testing
5.5 Challenges in White Box Testing
5.6 Let Us Sum Up
5.0 AIMS AND OBJECTIVES
This lesson is the end part of Unit I, where we will discuss structural testing, code functional testing, and code coverage testing.
At the end of this lesson, the reader should be able to understand the challenges in white box testing.
5.1 STRUCTURAL TESTING
Structural testing takes into account the code, code structure, internal
design, and how they are coded. The fundamental difference between structural
testing and static testing is that in structural testing tests are actually run by
the computer on the built product, whereas in static testing the product is
tested by humans using just the source code and not the executables or
binaries.
Structural testing entails running the actual product against some pre-designed test cases to exercise as much of the code as possible or necessary. A
given portion of the code is exercised if a test case causes the program to
execute that portion of the code when running the test.
List the basic difference between structural testing and static testing.
Notes: a) Write your answer in the space given below
b) Check your answer with the one given at the end of this lesson.
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
5.3 CODE COVERAGE TESTING
Code coverage testing is made up of the following types of coverage.
1. Statement coverage
2. Path coverage
3. Condition coverage
4. Function coverage
5.3.1 Statement coverage Program constructs in most conventional
programming languages can be classified as
1. Sequential control flow
2. Two-way decision statements like if then else
3. Multi-way decision statements like Switch
4. Loops like while-do, repeat-until, and for
Object-oriented languages have all of the above and, in addition, a number of other constructs and concepts; we will take up issues pertaining to object-oriented languages later. Statement coverage refers to writing test cases that execute each of the program statements. One can start with the assumption that the more the code covered, the better is the testing of the functionality, as the code realizes the functionality. Based on this assumption, code coverage can be achieved by providing coverage to each of the above types of statements. When we consider a two-way decision construct like the if statement, then to cover all the statements, we should also cover the then and else parts of the if statement. This means we should have, for each if-else, (at least) one test case to test the then part and (at least) one test case to test the else part.
A multi-way decision construct such as a switch statement can be reduced to multiple two-way if statements. Thus, to cover all possible switch cases, there
would be multiple test cases. Loop constructs present more variations to take
care of. A loop – in various forms such as for, while, repeat, and so on – is
characterized by executing a set of statements repeatedly until or while certain
conditions are met. A good percentage of the defects in programs come about
because of loops that do not function properly. More often, loops fail in what are
called "boundary conditions." One of the common looping errors is that the
termination condition of the loop is not properly stated. In order to make sure
that there is better statement coverage for statements within a loop, there
should be test cases that
1. Skip the loop completely, so that the situation of the termination
condition being true before starting the loop is tested.
2. Exercise the loop between once and the maximum number of times, to
check all possible “normal” operations of the loop.
3. Try covering the loop around the “boundary” of n – that is, just below n, at n, and just above n (where n is the maximum number of times the loop can execute).
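As a hedged illustration of these loop-coverage guidelines (the routine and its limit MAX are hypothetical), the following C sketch shows test cases that skip the loop, run it a "normal" number of times, and probe the boundary:

    #include <assert.h>

    #define MAX 10   /* assumed maximum number of iterations (n) */

    /* Hypothetical routine: sums the first n elements of a[]. The loop
     * termination check is where boundary defects typically hide (for
     * example, writing i <= n would read one element too many). */
    static int sum_first(const int a[], int n)
    {
        int total = 0;
        for (int i = 0; i < n; i++)
            total += a[i];
        return total;
    }

    int main(void)
    {
        int a[MAX] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        assert(sum_first(a, 0) == 0);             /* skip the loop completely */
        assert(sum_first(a, 1) == 1);             /* execute exactly once */
        assert(sum_first(a, 5) == 5);             /* a "normal" number of iterations */
        assert(sum_first(a, MAX - 1) == MAX - 1); /* just below the boundary n */
        assert(sum_first(a, MAX) == MAX);         /* at the boundary n */
        return 0;
    }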
5.3.2 Path Coverage In path coverage, we split a program into a number of distinct paths. A program (or a part of a program) can start from the beginning and take any of the paths to its completion.
Let us take an example of a date validation routine. The date is accepted as three fields: mm, dd, and yyyy. We have assumed that prior to entering this routine, the values are checked to be numeric. To simplify the discussion, we have assumed the existence of a function called leapyear which will return TRUE if the given year is a leap year. There is an array called DayofMonth which contains the number of days in each month.
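Since the routine's listing is not reproduced here, the following C sketch is an illustrative reconstruction from the description above (the names leapyear and DayofMonth follow the text; the exact original code may differ):

    #include <assert.h>
    #include <stdbool.h>

    /* Returns TRUE if the given year is a leap year (as assumed above). */
    static bool leapyear(int yyyy)
    {
        return (yyyy % 4 == 0 && yyyy % 100 != 0) || (yyyy % 400 == 0);
    }

    /* Number of days in each month, January through December. */
    static const int DayofMonth[12] = {31, 28, 31, 30, 31, 30,
                                       31, 31, 30, 31, 30, 31};

    /* mm, dd and yyyy are assumed to be numeric already. */
    static bool valid_date(int mm, int dd, int yyyy)
    {
        if (mm < 1 || mm > 12)              /* path A: invalid month */
            return false;
        int maxdays = DayofMonth[mm - 1];
        if (mm == 2 && leapyear(yyyy))      /* February of a leap year */
            maxdays = 29;
        if (dd < 1 || dd > maxdays)         /* path B: invalid day */
            return false;
        return true;                        /* path C: valid date */
    }

    int main(void)
    {
        assert( valid_date(2, 29, 2000));   /* 2000 is a leap year */
        assert(!valid_date(2, 29, 1900));   /* 1900 is not a leap year */
        assert(!valid_date(4, 31, 2024));   /* April has only 30 days */
        return 0;
    }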
Path coverage provides a stronger condition of coverage than statement
coverage as it relates to the various logical paths in the program rather than
just program statements.
5.3.3 Condition coverage In the above example, even if we have covered all the paths possible, it would not mean that the program is fully tested. For example, we can make the program take path A by giving a value less than 1 (for example, 0) to mm and find that we have covered path A and the program has detected that the month is invalid. But the program may still not be correctly testing for the other condition, namely mm > 12. Furthermore, most compilers perform optimizations to minimize the number of Boolean operations, and all the conditions may not get evaluated even though the right path is chosen. For example, when there is an OR condition (as in the first IF statement above), once the first part of the IF (for example, mm < 1) is found to be true, the second part will not be evaluated at all, as the overall value of the Boolean is TRUE. Similarly, when there is an AND condition in a Boolean expression, when the first condition evaluates to FALSE, the rest of the expression need not be evaluated at all.
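To make this concrete, here is a hedged set of test cases against the valid_date sketch given earlier; together they force each sub-condition of mm < 1 || mm > 12 to evaluate as both true and false, which merely covering path A (for example, with mm = 0 alone) does not:

    #include <assert.h>
    #include <stdbool.h>

    bool valid_date(int mm, int dd, int yyyy);  /* from the sketch above */

    void test_month_conditions(void)
    {
        assert(!valid_date( 0, 15, 2000)); /* mm < 1 true; mm > 12 short-circuited */
        assert(!valid_date(13, 15, 2000)); /* mm < 1 false; mm > 12 true */
        assert( valid_date( 6, 15, 2000)); /* both sub-conditions false */
    }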
Condition coverage, which can be defined as
    Condition coverage = (Total conditions exercised / Total number of conditions in the program) × 100,
gives an indication of the percentage of conditions covered by a set of test cases. Condition coverage is a much stronger criterion than path coverage, which in turn is a much stronger criterion than statement coverage.
5.3.4 Function coverage This is a new addition to structural testing to identify
how many program functions (similar to functions in "C" language) are covered
by test cases.
The requirements of a product are mapped into functions during the
design phase and each of the functions forms a logical unit. For example, in a
database software, "inserting a row into the database" could be a function. Or,
in a payroll application, "calculate tax" could be a function. Each function
could, in turn, be implemented using other functions. While providing function
coverage, test cases can be written so as to exercise each of the different
functions in the code. The advantages that function coverage provides over the
other types of coverage are as follows.
1. Functions are easier to identify in a program and hence it is easier to
write test cases to provide function coverage.
2. Since functions are at a much higher level of abstraction than code, it is
easier to achieve 100 percent function coverage than 100 percent
coverage in any of the earlier methods.
3. Functions have a more logical mapping to requirements and hence can
provide a more direct correlation to the test coverage of the product. In
the next chapter, we will be discussing the requirement traceability
matrix, which tracks a requirement through the design, coding, and testing
phases. Functions provide one means to achieve this traceability.
Function coverage provides a way of testing this traceability.
4. Since functions are a means of realizing requirements, the importance of
functions can be prioritized based on the importance of the requirements
they realize. Thus, it would be easier to prioritize the functions for
testing. This is not necessarily the case with the earlier methods of
coverage.
5. Function coverage provides a natural transition to black box testing.
We can also measure how many times a given function is called. This will indicate which functions are used most often, and hence these functions become the target of any performance testing and optimization. As an example, if in networking software we find that the function that assembles and disassembles the data packets is used most often, it is appropriate to spend extra effort in improving the quality and performance of that function. Thus, function coverage can help in improving the performance as well as the quality of the product.
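A minimal, hedged sketch of such call counting (the function names echo the hypothetical database and payroll examples above; real products would typically rely on a profiler or an instrumented build instead):

    #include <stdio.h>

    /* Each function increments its own counter, so after a test run we
     * can see both which functions were covered (count > 0) and which
     * are called most often -- the candidates for performance tuning. */
    static unsigned long calls_insert_row = 0;
    static unsigned long calls_calculate_tax = 0;

    static void insert_row(void)    { calls_insert_row++;    /* ... real work ... */ }
    static void calculate_tax(void) { calls_calculate_tax++; /* ... real work ... */ }

    int main(void)
    {
        insert_row();
        insert_row();
        calculate_tax();
        printf("insert_row: %lu calls, calculate_tax: %lu calls\n",
               calls_insert_row, calls_calculate_tax);
        return 0;
    }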
5.3.5 Summary Code coverage testing involves" dynamic testing methods of
executing the product with pre-written test cases, and finding out how much of
code has been covered. If a better coverage of a code is desired, several
iterations of testing may be required. For each iteration, one has to go through
the statistics and write a new set of test cases Ii covering portions of the code
not covered by earlier test cases. To do till type of testing not only does one
need to understand the code, logic but also need to understand how to write
effective test cases that can cover good portions of the code. This type of testing
can also be referred to as "gray box testing" as this uses the combination of
"white box and bill box methodologies" (white + black = gray) for effectiveness.
Performance analysis and optimization Code coverage tests can identify the
areas of a code that are executed most frequently. Extra attention can then be
paid to these sections of the code. If further performance improvement is no
longer possible, then other strategies like caching can be considered. Code
coverage testing provides information that is useful in making such
performance-oriented decisions.
Resource usage analysis White box testing, especially with instrumented code,
is useful in identifying bottlenecks in resource usage. For example, if a
particular resource like the RAM or network is perceived as a bottleneck, then
instrumented code can help identify where the bottlenecks are and point
towards possible solutions.
Checking of critical sections or concurrency related parts of code Critical
sections are those parts of a code that cannot have multiple processes
executing at the same time. Coverage tests with instrumented code are one of
the best means of identifying any violations of such concurrency constraints
through critical sections.
Identifying memory leaks Every piece of memory that is acquired or allocated
by a process (for example, by malloc in C) should be explicitly released (for
example, by free in C). If not, the acquired memory is "lost" and the amount of
available memory decreases correspondingly. Over time, there would be no
memory available for allocation to meet fresh memory requests and processes
start failing for want of memory. The various white box testing methods can
help identify memory leaks. Most debuggers or instrumented code can tally
allocated and freed memory.
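A minimal C illustration of the leak pattern just described (the buffer size and helper names are hypothetical):

    #include <stdlib.h>

    static void leaky(void)
    {
        char *buf = malloc(1024);
        if (buf == NULL)
            return;
        /* ... use buf ... */
        /* BUG: no free(buf); the 1024 bytes are "lost" on every call,
         * and repeated calls steadily shrink the available memory. */
    }

    static void fixed(void)
    {
        char *buf = malloc(1024);
        if (buf == NULL)
            return;
        /* ... use buf ... */
        free(buf);   /* every malloc() is matched by a free() */
    }

    int main(void)
    {
        leaky();   /* a tool that tallies allocations against frees flags this */
        fixed();
        return 0;
    }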
Dynamically generated code White box testing can help identify security holes
effectively, especially in a dynamically generated code. In instances where a
piece of code is dynamically created and executed, the functionality of the
generated code should be tested on the fly. For example, when using web
services, there may be situations wherein certain parameters are accepted from
the users and html/java code may be generated and passed on to a remote
machine for execution. Since after the transaction or service is executed, the
generated code ceases to exist, testing the generated code requires code
knowledge. Hence, the various techniques of white box testing discussed in this
chapter come in handy.
5.4 CODE COMPLEXITY TESTING
In the previous sections, we saw the different types of coverage that can be provided to test a program. Two questions that come to mind while using these types of coverage are:
1. Which of the paths are independent? If two paths are not independent,
then we may be able to minimize the number of tests.
2. Is there an upper bound on the number of tests that must be run to
ensure that all the statements have been executed at least once?
Cyclomatic complexity is a metric that quantifies the complexity of a
program and thus provides answers to the above questions.
A program is represented in the form of a flow graph. A flow graph
consists of nodes and edges. In order to convert a standard flow chart into a
flow graph to compute cyclomatic complexity, the following steps can be taken.
1. Identify the predicates or decision points (typically the Boolean
conditions or conditional statements) in the program.
2. Ensure that the predicates are simple (that is, no and/or, and so on in each predicate). The figure below shows how to break up a condition having an OR into simple predicates. Similarly, if there are loop constructs, break the loop termination checks into simple predicates.
3. Combine all sequential statements into a single node. The reasoning here
is that these statements all get executed, once started.
4. When a set of sequential statements are followed by a simple predicate
(as simplified in (2) above), combine all the sequential statements and the
predicate check into one node and have two edges emanating from this
one node. Such nodes with two edges emanating from them are called
predicate nodes.
5. Make sure that all the edges terminate at some node; add a node to represent all the sets of sequential statements at the end of the program.
We have illustrated the above transformation rules from a conventional flow chart to a flow graph in Figures 5.1 (a) and 5.1 (b). We have color coded the different boxes so that the reader can see the transformation more clearly. The flow chart elements of a given color on the left-hand side get mapped to flow graph elements of the corresponding nodes on the right-hand side. Intuitively, a flow graph and the cyclomatic complexity provide indicators of the complexity of the logic flow in a program and of the number of independent paths in a program.
Flow graph translation of an OR to a simple predicate.
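For reference, cyclomatic complexity is computed from the flow graph as V(G) = E - N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components (1 for a single program); for a structured program this equals the number of simple predicates plus one. A small hedged example:

    /* A small routine after applying the transformation steps above:
     * the OR condition has been split into two simple predicates. */
    int valid_month(int mm)
    {
        if (mm < 1)        /* predicate node 1 */
            return 0;
        if (mm > 12)       /* predicate node 2 */
            return 0;
        return 1;          /* valid month */
    }
    /* Flow graph: 4 nodes (the two predicate nodes, the success return,
     * and a common exit) and 5 edges, so
     *     V(G) = E - N + 2P = 5 - 4 + 2 = 3
     * -- three independent paths, matching predicates + 1 = 2 + 1 = 3. */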
complexity checks must be performed on the modules before embarking upon the testing (or even coding) phase. This can become one of the items to check for in a code review. Based on the complexity number that emerges from using the tool, one can conclude what actions need to be taken to manage the complexity.
5.5 CHALLENGES IN WHITE BOX TESTING
White box testing requires a sound knowledge of the program code and the programming language. This means that the developers should get intimately involved in white box testing. Developers, in general, do not like to perform testing functions. This applies to structural testing as well as static testing methods such as reviews. In addition, because of timeline pressures, the programmers may not "find time" for reviews (a euphemism for wanting to do more coding). We will revisit this myth of dichotomy between testing and development functions in the chapter on people issues (Chapter 13).
Human tendency of a developer being unable to find the defects in
his or her code As we saw earlier, most of us have blind spots in detecting
errors in our own products. Since white box testing involves programmers who
write the code, it is quite possible that they may not be most effective in
detecting defects in their own work products. An independent perspective could
certainly help.
Fully tested code may not correspond to realistic scenarios
Programmers generally do not have a full appreciation of the external
(customer) perspective or the domain knowledge to visualize how a product will
be deployed in realistic scenarios. This may mean that even after extensive
testing, some of the common user scenarios may get left out and defects may
creep in.
These challenges do not mean that white box testing is ineffective. But
when white box testing is carried out and these challenges are addressed by
other means of testing, there is a higher likelihood of more effective testing.
built product, whereas in static testing the product is tested by humans using just the source code and not the executables or binaries.
2. Code coverage testing is made up of the following types of coverage.
1. Statement coverage
2. Path coverage
3. Condition coverage
4. Function coverage
3. The challenges include the human tendency of a developer being unable to find the defects in his or her own code, and the fact that fully tested code may not correspond to realistic scenarios.
UNIT - II
LESSON 6
Black box testing involves looking at the specifications and does not require examining the code of a program. Black box testing is done from the customer's viewpoint. The test engineer engaged in black box testing only knows the set of inputs and expected outputs and is unaware of how those inputs are transformed into outputs by the software. Black box tests are convenient to administer because they use the complete finished product and do not require any knowledge of its construction. Independent test laboratories can administer black box tests to ensure functionality and compatibility.
Black-box test design treats the system as a "black box", so it doesn't explicitly use knowledge of the internal structure. Black-box test design is usually described as focusing on testing functional requirements. Glass-box test design allows one to peek inside the "box", and it focuses specifically on using internal knowledge of the software to guide the selection of test data.
Black box testing thus requires a functional knowledge of the product to be tested. It does not mandate knowledge of the internal logic of the system, nor does it mandate knowledge of the programming language used to build the product. Our tests in the above example were focused on testing the features of the product (lock and key) and its different states; we already knew the expected outcome. You may check if the lock works with some other key (other than its own). You may also want to check with a hairpin or any thin piece of wire if the lock works. We shall see in further sections, in detail, the different kinds of tests that can be performed on a given product.
Test scenarios can be generated as soon as the specifications are ready. Since requirements specifications are the major inputs for black box testing, test design can be started early in the cycle.
Black box testing activities require involvement of the testing team from
the beginning of the software project life cycle, regardless of the software
development life cycle model chosen for the project.
Testers can get involved right from the requirements gathering and analysis phase for the system under test. Test scenarios and test data are prepared during the test construction phase of the test life cycle, when the software is in the design phase.
Once the code is ready and delivered for testing, test execution can be done. All the test scenarios developed during the construction phase are executed. Usually, a subset of these test scenarios is selected for regression testing.
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
The Requirements Traceability Matrix provides a wealth of information for various test metrics. Some of the metrics that can be collected or inferred from this matrix are as follows.
• Requirements addressed priority-wise – This metric helps in knowing the test coverage based on the requirements: the number of tests created for high-priority requirements versus tests created for low-priority requirements.
• Number of test cases requirement wise - For each requirement, the total
number of test cases created.
• Total number of test cases prepared - Total of all the test cases prepared
for all requirements.
Once the test cases are executed, the test results can be used to collect metrics such as
• Total number of test cases (or requirements) passed – Once execution is completed, the total number of passed test cases and what percent of requirements they correspond to.
• Total number of test cases (or requirements) failed – Once execution is completed, the total number of failed test cases and what percent of requirements they correspond to.
• Total number of defects in requirements – List of defects reported for each requirement (defect density for requirements). This helps in doing an impact analysis of which requirements have more defects and how they will impact customers. A comparatively high defect density in low-priority requirements is acceptable for a release. A high defect density in high-priority requirements is considered a high-risk area, and may prevent a product release.
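As an illustration (the requirement and test case identifiers below are hypothetical), a fragment of such a matrix, together with the metrics read off it, might look like this:

    Requirement  Priority  Test cases           Executed  Passed  Defects
    REQ-001      High      TC-01, TC-02, TC-03  3         3       0
    REQ-002      Low       TC-04, TC-05         2         1       2

Here the total number of test cases prepared is five; the high-priority REQ-001 has passed fully, while the two defects against the low-priority REQ-002 would be weighed as described above before deciding on the release.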
A product delivering an error when it is expected to give an error is also a part of positive testing.
Positive testing can thus be said to check the product's behavior for positive and negative conditions as stated in the requirements.
Negative testing is done to show that the product does not fail when an
unexpected input is given. The purpose of negative testing is to try and break
the system. Negative testing covers scenarios for which the product is not
designed and coded. In other words, the input values may not have been
represented in the specification of the product. These test conditions can be
termed as unknown conditions for the product as far as the specifications are
concerned. But, at the end-user level, there are multiple scenarios that are
encountered and that need to be taken care of by the product. It becomes even
more important for the tester to know the negative situations that may occur at
the end-user level so that the application can be tested and made foolproof. A
negative test would be a product not delivering an error when it should or
delivering an error when it should not.
The difference between positive and negative testing is in their coverage. For positive testing, if all documented requirements and test conditions are covered, then coverage can be considered to be 100 percent. If the specifications are very clear, then this coverage can be achieved. In contrast, there is no end to negative testing, and 100 percent coverage in negative testing is impractical. Negative testing requires a high degree of creativity among the testers to cover as many “unknowns” as possible to avoid failure at a customer site.
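As a small illustration (the input field and its range are hypothetical), consider a field specified to accept a month number from 1 to 12.
• Positive tests stay within the specification: 1, 6, and 12 must be accepted, and 0 or 13 must be rejected with the documented error message (a product delivering an error when it is expected to is part of positive testing, as noted above).
• Negative tests go outside the specification to try and break the product: alphabetic input such as "abc", an empty field, an extremely long string, or pasted control characters. The specification says nothing about these, yet the product must not crash or corrupt data.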
• The requirements themselves may not be clearly understood, especially around the boundaries, thus causing even a correctly coded program to not perform the correct way.
Another instance where boundary value testing is extremely useful in uncovering defects is when there are internal limits placed on certain resources, variables, or data structures. Consider a database management system (or a file system) which caches the recently used data blocks in a shared memory area. Usually such a cached area is limited by a parameter that the user specifies at the time of starting up the system. Assume that the database is brought up specifying that the most recent 50 database buffers have to be cached. When these buffers are full and a 51st block needs to be cached, an existing buffer has to be freed up, after storing its contents in secondary memory. As you can observe, both the operations – inserting the new buffer as well as freeing up an existing buffer – happen at the “boundaries”.
To summarize boundary value testing:
• Look for any kind of gradation or discontinuity in data values which affect computation – the discontinuities are the boundary values, which require thorough testing.
• Look for any internal limits such as limits on resources (as in the
example of buffers given above). The behavior of the product at these
limits should also be the subject of boundary value testing.
• Also include in the list of boundary values, documented limits on
hardware resources. For example, if it is documented that a product will
run with minimum 4MB of RAM, make sure you include test cases for
the minimum RAM (4MB in this case).
• The examples given above discuss boundary conditions for input data –
the same analysis needs to be done for output variables also.
Boundary value analysis, discussed in the context of black box testing, applies to white box testing also. Internal data structures like arrays, stacks, and queues
need to be checked for boundary or limit conditions; when there are linked lists
used as internal structures, the behavior of the list at the beginning and end
have to be tested thoroughly.
Boundary values and decision tables help identify the test cases that are
most likely to uncover defects. A generalization of both these concepts is the
concept of equivalence classes.
Now we are at the end of the lesson, and you should be able to understand what black box testing is.
Advantages of Black Box Testing
• more effective on larger units of code than glass box testing
• tester needs no knowledge of implementation, including specific
programming languages
• tester and programmer are independent of each other
• tests are done from a user's point of view
• will help to expose any ambiguities or inconsistencies in the
specifications
• test cases can be designed as soon as the specifications are complete
Disadvantages of Black Box Testing
• only a small number of possible inputs can actually be tested, to test
every possible input stream would take nearly forever
• without clear and concise specifications, test cases are hard to design
• there may be unnecessary repetition of test inputs if the tester is not
informed of test cases the programmer has already tried
• may leave many program paths untested
• cannot be directed toward specific segments of code which may be very
complex (and therefore more error prone)
• most testing related research has been directed toward glass box testing
Check Your Progress: Model Answers
1. Black box testing is done from the customer's viewpoint. The test engineer engaged in black box testing only knows the set of inputs and expected outputs and is unaware of how those inputs are transformed into outputs by the software.
Black-box test design treats the system as a "black-box", so it doesn't
explicitly use knowledge of the internal structure. Black-box test design
is usually described as focusing on testing functional requirements.
Glass-box test design allows one to peek inside the "box", and it focuses
specifically on using internal knowledge of the software to guide the
selection of test data.
2. The decision-table-based testing technique
A decision table has two parts: the conditions part and the actions part.
The decision table specifies under what conditions a test action must be
performed. Each condition expresses a relationship among variables that
must be resolvable as true or false. All the possible combinations of
conditions define a set of alternatives. For each alternative, a test action
should be considered. The number of alternatives increases exponentially
with the number of conditions, which may be expressed as
2^(number of conditions). When the decision table becomes too complex, a
hierarchy of new decision tables can be constructed.
Figure 6.1 Example of a decision table
Because some alternatives specified might be unrealistic, a test strategy
should
1) verify that all alternatives can actually be reached and 2) describe how
the AUT will behave under all alternative conditions. With a decision
table, it is easy to add and remove conditions, depending on the test
strategy. It is easy to increase test coverage by adding new test actions
from iteration to iteration, according to the test strategy.
As illustrated in Figure 6.1, decision tables are useful when specifying,
analyzing, and testing complex logic. They are efficient for describing
situations where varying conditions produce different test actions. They
are powerful for finding faults both in implementation and specifications.
3. All explicit requirements (from the Systems Requirements Specifications)
and implied requirements (inferred by the test team) are collected and
documented as "Test Requirements Specification" (TRS).
LESSON 7
A decision table lists the various decision variables, the conditions (or
values) assumed by each of the decision variables, and the actions to take in
each combination of conditions. The variables that contribute to the decision
are listed as the columns of the table. The last column of the table is the action
to be taken for the combination of values of the decision variables. In cases
when the number of decision variables is many (say, more than five or six) and
the number of distinct combinations of variables is few (say, four or five), the
decision variables can be listed as rows.
The reader would have noticed that there are a number of entries
marked "-" in the decision table. The values of the appropriate decision variables
in these cases do not affect the outcome of the decision. For example, the status
of the spouse is relevant only when the filing status is "Married filing separate
return." Similarly, the age of spouse and whether spouse is blind or not comes
into play only when the status is "Married, filing joint return." Such entries are
called don't cares (sometimes represented by the Greek character phi,Φ). These
don't cares significantly reduce the number of tests to be performed. For
example, if there were no don't cares, there would be eight cases for the
status of "Single": four with the spouse claiming the standard deduction
and four with the spouse not claiming it. Other than
this one difference, there is no material change in the expected result
of the standard deduction amount. We leave it as an exercise for the reader to
enumerate the number of rows in the decision table, should we not allow don't
cares and have to explicitly specify each case. There are formal tools like
Karnaugh Maps which can be used to arrive at a minimal Boolean expression
that represents the various Boolean conditions in a decision table. The
references given at the end of this chapter discuss these tools and techniques.
Thus, decision tables act as invaluable tools for designing black box tests
to examine the behavior of the product under various logical condition input
variables. The steps in forming a decision table are as follows.
1. Identify the decision variables.
2. Identify the possible values of each of the decision variables.
3. Enumerate the combinations of the allowed values of each of the
variables.
4. Identify the cases when values assumed by a variable (or by sets of
variables) are immaterial for a given combination of other input variables.
Represent such variables by the don't care symbol.
5. For each combination of values of decision variables (appropriately
minimized with the don't care scenarios), list out the action or expected
result.
6. Form a table, listing in each but the last column a decision variable. In
the last column, list the action item for the combination of variables in
that row (including don't cares, as appropriate).
Once a decision table is formed, each row of the table acts as the
specification for one test case. Identification of the decision variables makes
these test cases extensive, if not exhaustive. Pruning the table by using don't
cares minimizes the number of test cases. Thus, decision tables are usually
effective in arriving at test cases in scenarios which depend on the values of the
decision variables.
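As a minimal sketch of this in Python (the filing statuses, the don't care entries, and the actions below are invented for illustration and are not real tax rules), each row of the table becomes the specification for one test case, with "-" as the don't care symbol:

# Each rule: (filing_status, spouse_claims_deduction, expected_action).
# "-" is the don't care symbol: that variable does not affect the outcome.
DECISION_TABLE = [
    ("single",                  "-",   "standard_deduction"),
    ("married_filing_separate", True,  "no_deduction"),
    ("married_filing_separate", False, "separate_deduction"),
    ("married_filing_joint",    "-",   "joint_deduction"),
]

def expected_action(filing_status, spouse_claims):
    """Return the action for a combination of decision variables."""
    for status, spouse, action in DECISION_TABLE:
        if status == filing_status and spouse in ("-", spouse_claims):
            return action
    raise ValueError("combination not covered by the decision table")

# Each row acts as the specification for one test case.
assert expected_action("single", True) == "standard_deduction"
assert expected_action("married_filing_separate", True) == "no_deduction"
assert expected_action("married_filing_joint", False) == "joint_deduction"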
WORKING WITH DECISION AND DATA-DRIVEN TABLES
Since the logic is defined in the decision table, the tester does not need to
hard code any testing logic. The decision script just performs the verifications
during execution, compares the result of the verifications with the alternatives
provided by the decision table, and returns the next test script to run if a
solution is found.
A test suite script contains several decision scripts and test scripts. All
the elements of a test suite are defined in a driver table that specifies an
unordered set of test segments. Each test segment consists of a collection of test
scripts that are executed sequentially between two decision scripts. For each
test segment, the driver table specifies the transition between a source test
script and a target test script.
As the decision is computed dynamically by the decision script during
execution, a mechanism of notification must be implemented for the test suite
script to be notified by the decision script about the next test script to run.
When the decision script notifies the test suite script about the next test script
to run, the test suite script queries the driver table to find the next test segment
to run. The process is illustrated in Figure 7.1.
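A minimal sketch of such a driver table in Python (all script names and outcomes below are invented for illustration); the test suite script queries it when a decision script reports its outcome:

# Driver table: (decision script, outcome) -> next test script to run.
DRIVER_TABLE = {
    ("decide_login", "valid_user"):   "test_dashboard",
    ("decide_login", "invalid_user"): "test_error_page",
    ("decide_order", "in_stock"):     "test_checkout",
    ("decide_order", "out_of_stock"): "test_backorder",
}

def next_test_script(decision_script, outcome):
    """Called when a decision script notifies the test suite of its outcome."""
    return DRIVER_TABLE.get((decision_script, outcome))

assert next_test_script("decide_login", "valid_user") == "test_dashboard"
assert next_test_script("decide_order", "out_of_stock") == "test_backorder"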
The set of input values that generate one single expected output is called
a partition. When the behavior of the software is the same for a set of values,
then the set is termed an equivalence class or a partition. In this case a
representative sample from each partition (also called a member of the
equivalence class) is picked up for testing. One sample from the partition is
enough for testing, as picking up more values from the set will produce the
same result and will not yield any additional defects. Since all the values
in the set produce the same output, they are termed an equivalence partition.
Testing by this technique involves (a) identifying all partitions for the
complete set of input, output values for a product and (b) picking up one
member value from each partition for testing to maximize complete coverage.
From the results obtained for a member of an equivalence class or
partition, this technique extrapolates the expected results for all the values in
that partition. The advantage of using this technique is that we gain good
coverage with a small number of test cases. For example, if there is a defect in
one value in a partition, then it can be extrapolated to all the values of that
particular partition. By using this technique, redundancy of tests is minimized
by not repeating the same tests for multiple values in the same partition.
Let us consider the example below, of an insurance company that has
the following premium rates based on the age group.
Life Insurance Premium Rates
A life insurance company has base premium of $0.50 for all ages. Based
on the age group, an additional monthly premium has to be paid that is as
listed in the table below. For example, a person aged 34 has to pay a premium =
base Premium + additional premium = $0.50 + $1.65 = $2.15
Age Group    Additional Premium
Under 35     $1.65
35-59        $2.87
60+          $6.00
Based on the equivalence partitioning technique, the equivalence
partitions that are based on age are given below:
• Below 35 years of age (valid input)
• Between 35 and 59 years of age (valid input)
• 60 years of age and above (valid input)
• Negative age (invalid input)
• Age as 0 (invalid input)
• Age as any three-digit number (valid input)
We need to pick up representative values from each of the above
partitions. You may have observed that even though we have only a small table
of valid values, the equivalence classes should also include samples of invalid
inputs. This is required so that these invalid values do not cause unforeseen
errors. You can see that the above cases include both positive and negative test
input values.
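To make this concrete, here is a minimal sketch in Python; the monthly_premium function and the representative sample ages chosen from each partition are assumptions for illustration:

BASE_PREMIUM = 0.50

def monthly_premium(age):
    """Compute the premium from the rate table; invalid ages raise ValueError."""
    if not isinstance(age, (int, float)) or age <= 0:
        raise ValueError("age must be a positive number")
    if age < 35:
        return BASE_PREMIUM + 1.65
    elif age <= 59:
        return BASE_PREMIUM + 2.87
    else:
        return BASE_PREMIUM + 6.00

# One representative sample per equivalence partition.
assert round(monthly_premium(34), 2) == 2.15   # below 35 (valid)
assert round(monthly_premium(42), 2) == 3.37   # 35-59 (valid)
assert round(monthly_premium(65), 2) == 6.50   # 60 and above (valid)
for invalid in (-1, 0, "abc"):                 # invalid partitions
    try:
        monthly_premium(invalid)
        assert False, "expected ValueError"
    except ValueError:
        pass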
The test cases for the example are based on the equivalence partitions. The
equivalence partitions table has the following columns:
• Partition definition
• Type of input (valid / invalid)
• Representative test data for that partition
• Expected results
Consider another example, of a program that accepts a set of numbers as
input. One way to divide the set is by
1. Prime numbers
2. Composite numbers
3. Numbers with decimal point
These three classes divide the set of numbers into three valid classes. In
addition, to account for any input a user may give, we will have to add an
invalid class-string with alphanumeric characters. As in the previous case, we
can construct an equivalence partitions table for this example.
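A small sketch in Python of these partitions (the classify function is invented for illustration; for simplicity it treats 1 as composite), again with one representative member per class:

def classify(token):
    """Classify an input token into one of the partitions (illustrative)."""
    try:
        value = float(token)
    except ValueError:
        return "invalid: alphanumeric"   # invalid class
    if "." in token:
        return "decimal"                 # numbers with a decimal point
    n = int(value)
    if n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1)):
        return "prime"
    return "composite"

# One representative member per equivalence class.
assert classify("7") == "prime"
assert classify("9") == "composite"
assert classify("3.5") == "decimal"
assert classify("ab12") == "invalid: alphanumeric"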
Thus, as in the first example on life insurance premiums, we have
reduced a potentially infinite input data space to a finite one, without losing the
effectiveness of testing. This is the power of using equivalence classes: choosing
a minimal set of input values that are truly representative of the entire
spectrum and uncovering a higher number of defects.
The steps to prepare an equivalence partitions table are as follows.
• Choose criteria for doing the equivalence partitioning (range, list of
values, and so on)
• Identify the valid equivalence classes based on the above criteria
(number of ranges, allowed values, and so on)
• Select sample data from each partition
• Write the expected result based on the requirements given
• Identify special values, if any, and include them in the table
• Check to have expected results for all the cases prepared
• If the expected result is not clear for any particular test case, mark
appropriately and escalate for corrective actions. If you cannot answer a
question, or find an inappropriate answer, consider whether you want to
record this issue on your log and clarify with the team that
arbitrates/dictates the requirements.
The parameters that generally affect the compatibility of the product are
• Processor (CPU) (Pentium III, Pentium IV, Xeon, SPARC, and so on) and
the number of processors in the machine
• Architecture and characteristics of the machine (32 bit, 64 bit, and so
on)
• Resource availability on the machine (RAM, disk space, network card)
• Equipment that the product is expected to work with (printers, modems,
routers, and so on)
• Operating system (Windows, Linux, and so on and their variants) and
operating system services (DNS, NIS, FTP, and so on)
• Middle-tier infrastructure components such as web server, application
server, network server
• Backend components such as database servers (Oracle, Sybase, and so on)
• Services that require special hardware-cum-software solutions (cluster
machines, load balancing, RAID array, and so on)
• Any software used to generate product binaries (compiler, linker, and so
on and their appropriate versions)
• Various technological components used to generate components (SDK,
JDK, and so on and their appropriate different versions)
The above are just a few of the parameters. There are many more
parameters that can affect the behavior of the product features. In the above
example, we have described ten parameters. If each of the parameters can take
four values, then there are forty different values to be tested. But that is not all.
Not only can the individual values of the parameters affect the features, but so
can the permutations and combinations of the parameters. Taking these
combinations into consideration, the number of times a particular feature has to
be tested may go to thousands or even millions. With the above assumption of
ten parameters and each parameter taking on four values, the total number of
combinations to be tested is 4^10 (over a million), which is a large number and
impossible to test exhaustively.
Some of the common techniques that are used for performing
compatibility testing, using a compatibility table are
1. Horizontal combination All values of parameters that can coexist with the
product for executing the set test cases are grouped together as a row in
the compatibility matrix. The values of parameters that can coexist
generally belong to different layers/types of infrastructure pieces such as
operating system, web server, and so on. Machines or environments are
set up for each row and the set of product features are tested using each
of these environments.
2. Intelligent sampling In the horizontal combination method, each feature
of the product has to be tested with each row in the compatibility matrix.
This involves huge effort and time. To solve this problem, combinations
of infrastructure parameters are combined with the set of features
intelligently and tested. When there are problems due to any of the
combinations then the test cases are executed, exploring the various
permutations and combinations. The selection of intelligent samples is
based on information collected on the set of dependencies of the product
with the parameters. If the product results are less dependent on a set of
parameters, then they are removed from the list of intelligent samples.
All other parameters are combined and tested. This method significantly
reduces the number of permutations and combinations for test cases.
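A sketch in Python illustrating the scale of the compatibility matrix and a reduced sample in the spirit of intelligent sampling; the three parameters, their values, and the claim that the product is insensitive to the web server are purely illustrative assumptions:

import itertools

# Three illustrative compatibility parameters, four values each.
parameters = {
    "os":        ["Windows", "Linux", "Solaris", "AIX"],
    "database":  ["Oracle", "Sybase", "DB2", "MySQL"],
    "webserver": ["Apache", "IIS", "Nginx", "Tomcat"],
}

# Horizontal combination: every row of the full compatibility matrix.
full_matrix = list(itertools.product(*parameters.values()))
print(len(full_matrix))   # 4**3 = 64; with ten such parameters, 4**10 = 1,048,576

# Intelligent sampling: if analysis shows the product is insensitive to the
# web server, drop that parameter and test the remaining ones exhaustively.
reduced = list(itertools.product(parameters["os"], parameters["database"]))
print(len(reduced))       # 16 environments instead of 64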
Compatibility testing not only includes parameters that are outside the
product, but also includes some parameters that are a part of the product. For
example, two versions of a given database may depend on a set of APIs that are
part of the same database. These parameters are also added to the
compatibility matrix and tested. The compatibility testing of a product
involving parts of itself can be further classified into two types.
1. Backward compatibility testing There are many versions of the same
product that are available with the customers. It is important for the
customers that the objects, object properties, schema, rules, reports, and
so on, that are created with an older version of the product continue to
work with the current version of the same product. The testing that
ensures the current version of the product continues to work with the
older versions of the same product is called backward compatibility
testing. The product parameters required for the backward compatibility
testing are added to the compatibility matrix and are tested.
2. Forward compatibility testing There are some provisions for the product
to work with later versions of the product and other infrastructure
components, keeping future requirements in mind. For example, IP
network protocol version 6 (IPv6) uses a 128-bit addressing scheme (IP
version 4 uses only 32 bits). The data structures can now be defined to
accommodate 128-bit addresses, and be tested with a prototype
implementation of the IPv6 protocol stack that is yet to become a completely
implemented product. The features that are part of IPv6 may not yet be
available to end users, but this kind of implementation and testing for the
future helps in avoiding drastic changes at a later point of time. Such
requirements are tested as part of forward compatibility testing. Testing
the product with a beta version of the operating system, early access
version of the developers' kit, and so on are examples of forward
compatibility. This type of testing ensures that the risk involved in
product for future requirements is minimized.
For compatibility testing and to use the techniques mentioned above, an
in-depth internal knowledge of the product may not be required. Compatibility
testing begins after validating the product in the basic environment. It is a type
of testing that involves a high degree of effort, as there are a large number of
parameter combinations. Following some of the techniques mentioned above
may help in performing compatibility testing more effectively.
7.6 DOMAIN TESTING
White box testing required looking at the program code. Black box testing
performed testing without looking at the program code but looking at the
specifications. Domain testing can be considered as the next level of testing in
which we do not look even at the specifications of a software product but are
testing the product, purely based on domain knowledge and expertise in the
domain of application. This testing approach requires critical understanding of
the day-to-day business activities for which the software is written. This type of
testing requires business domain knowledge rather than the knowledge of what
the software specification contains or how the software is written. Thus domain
testing can be considered as an extension of black box testing. As we move from
white box testing through black box testing to domain testing we know less and
less about the details of the software product and focus more on its external
behavior.
The test engineers performing this type of testing are selected because
they have in-depth knowledge of the business domain. Since the depth in
business domain is a prerequisite for this type of testing, sometimes it is easier
to hire testers from the domain area (such as banking, insurance, and so on)
and train them in software, rather than to take software professionals and train
them in the business domain. This reduces the effort and time required for
training the testers in domain testing and also increases the effectiveness of
domain testing.
LESSON 8
INTEGRATION TESTING
Contents
8.0 Aims and Objectives
8.1 What is Integration Testing?
8.2 Integration Testing as a Type of Testing
8.2.1 Top-Down Integration
8.2.2 Bottom-up Integration
8.2.3 Bi-Directional Integration
8.2.4 System Integration
8.2.5 Choosing Integration Method
8.5 Let Us Sum Up
done. Recognizing this complexity, a phase in testing is dedicated to test these
interactions, resulting in the evolution of a process. This ensuing phase is
called the integration testing phase.
Since integration testing is aimed at testing the interactions among the
modules, this testing, just like white box, black box, and other types of testing,
comes with a set of techniques and methods, which we will see in the following
sections. Hence integration testing is also viewed as a type of testing (and thus
fits into the canvas of this part of the book).
Define Integration.
Notes: a) Write your answer in the space given below
b) Check your answer with the one given at the end of this lesson.
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
teams, each having their own schedules. In order to test the interfaces, when
the full functionality of the component being introduced is not available, stubs
are provided. A stub simulates the interface by providing the appropriate values
in the appropriate format as would be provided by the actual component being
integrated.
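For instance, here is a minimal sketch of a stub in Python; the tax-calculation interface, the canned rate, and the invoice format are invented for illustration. While the real component is still being developed, the stub returns well-formed values so the calling module can be integration-tested through the agreed interface:

class TaxCalculatorStub:
    """Stands in for the real tax component that is not yet available.

    It honours the agreed interface and returns values in the expected
    format, so modules that call it can be integration-tested now.
    """
    def calculate_tax(self, gross_amount):
        # Canned response in the correct format (two-decimal currency).
        return round(gross_amount * 0.10, 2)

def generate_invoice(line_total, tax_component):
    tax = tax_component.calculate_tax(line_total)
    return {"net": line_total, "tax": tax, "gross": line_total + tax}

# The caller is exercised through the interface before the real component exists.
assert generate_invoice(100.0, TaxCalculatorStub()) == {
    "net": 100.0, "tax": 10.0, "gross": 110.0
}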
Integration testing is done with test cases which go through the
internal and exported interfaces and test the functionality of the software.
Internal interfaces are for other developers inside an organization and external
interfaces are for third party developers or other users outside the group.
Testing for internal interfaces requires a complete understanding of
architecture and high-level design (HLD) and how they impact the software
functionality. In cases where exported interfaces are provided from the software,
one needs to understand the purpose of those interfaces, why they are
provided, and how they are actually used by developers and solution
integrators. Hence knowledge of design, architecture, and usage is a must for
integration testing.
Initially, the exported (or external) interfaces were provided through APIs
and Software Development Kits (SDKs). The use of SDKs required an
understanding of the programming language on which the API/SDK is provided.
Later, the interfaces became available through scripting languages, without the
need for SDKs. (Some of the popular scripting languages include Perl, Tcl/Tk).
These scripting languages eliminated or minimized the effort in learning the
languages in which the API was written. This also made it possible for the
interfaces to be called from programming language environments different from
the one in which the interface was originally written. This significantly
simplified the usage of exported interfaces. For testing interfaces, we now have
dynamically created scripts, which can be changed at run time, by a few clicks
of the mouse.
All these have made the use of interfaces a lot more widespread. The
number of purposes for which the interfaces are provided has been on the
increase. These interfaces are becoming increasingly generic in nature, no longer
tied to a specific application or language. This has resulted in increasing
the permutations and combinations of scenarios of usage of the interfaces.
Thus, the complexity of integration testing-that is, testing of the various
scenarios of usage of interfaces - has also increased significantly.
While discussing interfaces, we need to keep in mind that not all
interactions between the modules are known and explained through interfaces.
Some of the interfaces are documented and some are not. This gives rise to
another classification of interfaces, that is, implicit and explicit interfaces.
Explicit interfaces are documented interfaces and implicit interfaces are those
which are known internally to the software engineers but are not documented.
The testing (white box/black box) should look for both implicit and explicit
interfaces and test all those interactions.
A question that often arises in the mind of a test engineer is whether
integration testing is a black box or a white box testing approach. In most
cases, the most appropriate answer is to say integration testing is a black box
testing approach. However, in situations where architecture or design
documents do not clearly explain all interfaces among components, the
approach can include going through the code and generating some additional
test cases, and mixing them with the test cases generated using black box
testing approaches. This approach could be termed the "gray box testing"
approach.
There are several methodologies available to decide the order of
integration testing. These are as follows.
1. Top-down integration
2. Bottom-up integration
3. Bi-directional integration
4. System integration
the elapsed time, as we do not have to wait for steps 6 and 8 to get over to start
with testing steps 7 and 9 respectively.
If a set of components and their related interfaces can deliver
functionality without expecting the presence of other components, or with
minimal interface requirements in the software/product, then that set of
components and their related interfaces is called a "sub-system." Each sub-
system in a product can work independently with or without other sub-
systems. This makes the integration testing easier and enables focus on the
required interfaces rather than worrying about each and every combination of
components.
The top-down integration explanation above assumes that a component
provides all the interface requirements of other components even while those
components are getting ready, and does not require modification at a later stage
(that is, after the other components have been developed). This approach
reflects the Waterfall or V model of software development.
If a component at a higher level requires a modification every time a
module gets added at the bottom, then for each component addition, integration
testing needs to be repeated starting from step 1. This may be a requirement for
an iterative model of software development. Hence, whatever the software
development model, top-down integration can still be applied with appropriate
repetition in integration testing.
The number of steps in the bottom-up approach can be optimized into
four steps, by combining steps 2 and 3 and by combining steps 5-8.
needs to consider several other perspectives such as availability of components,
technology used, process, testing skills, and resource availability.
System integration means that all the components of the system are
integrated and tested as a single unit. Integration testing, which is testing of
interfaces, can be divided into two types:
• Components or sub-system integration
• Final integration testing or system integration
When looking at steps for each of the above integration methodologies it is
obvious that complete system integration is also covered as the last step. Thus,
system integration is actually a part of every methodology described above.
The salient point this testing methodology raises is that of optimization.
Instead of integrating component by component and testing, this approach
waits till all components arrive and one round of integration testing is done.
This approach is also called big-bang integration. It reduces testing effort and
removes duplication in testing.
System integration using the big bang approach is well suited in a product
development scenario where the majority of components are already available
and stable, and very few components get added or modified. In this case, instead
of testing component interfaces one by one, it makes sense to integrate all the
components in one go and test once, saving the effort and time needed for the
multi-step component integrations.
While this approach saves time and effort, it is also not without
disadvantages. Some of the important disadvantages that can have a bearing on
the release dates and quality of a product are as follows.
1. When a failure or defect is encountered during system integration, it is very
difficult to locate the problem and to find out in which interface the defect
exists. The debug cycle may involve focusing on specific interfaces and
testing them again.
2. The ownership for correcting the root cause of the defect may be a
difficult issue to pinpoint.
3. When integration testing happens in the end, the pressure from the
approaching release date is very high. This pressure on the engineers
may cause them to compromise on the quality of the product.
4. A certain component may take an excessive amount of time to be ready.
This precludes testing other interfaces and wastes time till the end.
As a result of all these factors, the choice of the method of integration
testing becomes extremely crucial. A judicious combination of the above
methods would be needed to achieve effectiveness in the time and quality of
integration testing.
2. If a set of components and their related interfaces can deliver functionality
without expecting the presence of other components, or with minimal
interface requirements in the software/product, then that set of components
and their related interfaces is called a "sub-system."
3. System integration means that all the components of the system are
integrated and tested as a single unit. Integration testing, which is
testing of interfaces, can be divided into two types:
• Components or sub-system integration
• Final integration testing or system integration
LESSON 9
Integration testing as a phase of testing starts from the point where two
components can be tested together, to the point where all the components work
together as a complete system delivering system/product functionality. In the
integration testing phase, the focus is not only on whether functionality of the
components works well, but also on whether they work together and deliver
sub-system and system functionality.
The integration testing phase focuses on finding defects which
predominantly arise because of combining various components for testing, and
which would not surface when a component or a few components are tested in
isolation. Integration testing as a type focuses on testing the interfaces. This is a
subset of the integration testing phase. When sub-system or system components
are put together (or integrated), defects arise not only because of interfaces, but
also for various other reasons such as usage, incomplete understanding of the
product domain, user errors, and so on. Hence the integration testing phase
needs to focus on interfaces as well as usage flow. It is very important to note
this point to avoid confusion between the integration testing type and the
integration testing phase.
Integration testing as a phase involves different activities and different
types of testing have to be done in that phase. This is a testing phase that
should ensure completeness and coverage of testing for functionality. To
achieve this, the focus should not only be on planned test case execution but
also on unplanned testing, which is termed as "ad hoc testing." A principle of
Testing, there is no end to testing, and quality cannot depend only on pre-
written test cases; ad hoc testing becomes important to integration testing
69
phase. There are different terminologies associated with ad hoc testing, such as
exploratory testing, monkey testing, out of the box testing, and so on. All these
tests perform the same functions during integration testing phase, that is,
uncover or unearth those defects which are not found by planned test case
execution. This approach helps in locating some problems which are not only
difficult to find by planned testing but also difficult to imagine in the first place. The approach
also helps in generating a comfort feeling on the software and getting an overall
acceptance of the product from all internal users of the system.
The integration testing phase involves developing and executing test
cases that cover multiple components and functionality. When the functionality
of different components is combined and tested together for a sequence of
related operations, the tests are called scenarios. Scenario testing is a planned
activity to explore different usage patterns and combine them into test cases
called scenario test cases. We will see scenario testing in more detail in the next
section.
Life cycle/state transition Consider an object, derive the different transitions/
modifications that happen to the object, and derive scenarios to cover them (see
the sketch after this list). For example, in a savings bank account, you can start
by opening an account with a certain amount of money; then make a deposit,
perform a withdrawal, calculate interest, and so on. All these activities are
applied to the "money" object, and the different transformations applied to the
"money" object become different scenarios.
Deployment/implementation stories from customer Develop a scenario from
known customer deployment/implementation details and create a set of
activities performed by various users in that implementation.
Business verticals Visualize how a product/software will be applied in different
verticals and create a set of activities as scenarios to address specific vertical
businesses. For example, take the purchasing function. It may be done
differently in different verticals like pharmaceuticals, software houses, and
government organizations. Visualizing these different types of usage and testing
for them makes the product "multi-purpose."
Battle ground Create some scenarios to justify that "the product works" and
some scenarios to "try and break the system" to justify "the product doesn't
work." This adds flavor to the scenarios mentioned above.
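Below is the sketch referred to in the life cycle/state transition item above: a minimal Python illustration of one scenario as a chain of transformations on the "money" object. The SavingsAccount class, the amounts, and the interest rate are invented for illustration.

class SavingsAccount:
    def __init__(self, opening_amount):
        self.balance = opening_amount
    def deposit(self, amount):
        self.balance += amount
    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
    def add_interest(self, rate):
        self.balance *= (1 + rate)

# One scenario test case: each step continues from the previous one.
account = SavingsAccount(opening_amount=1000)
account.deposit(500)
account.withdraw(200)
account.add_interest(0.04)
assert round(account.balance, 2) == 1352.00   # (1000 + 500 - 200) * 1.04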
The set of scenarios developed will be more effective if the majority of the
approaches mentioned above are used in combination, not in isolation.
A scenario should not be a set of disjointed activities which have no relation to
each other. Any activity in a scenario is always a continuation of the previous
activity, and depends on or is impacted by the results of previous activities.
Effective scenarios will have a combination of current customer
implementation, foreseeing future use of product, and developing ad hoc test
cases. Considering only one aspect (current customer usage or future customer
requirements, for instance) would make scenarios ineffective. If only current
customer usage is considered for testing, new features may not get tested
adequately. Considering only the future market for scenarios may make the
scenarios test only the new features and some of the existing functionality may
not get tested. A right mix of scenarios using the various approaches explained
above is very critical for the effectiveness of scenario testing.
Coverage is always a big question with respect to functionality in
scenario testing. This testing is not meant to cover different permutations and
combinations of features and usage in a product. However, by using a simple
technique, some comfort feeling can be generated on the coverage of activities
by scenario testing.
is only interested in knowing from the computer whether he or she can give the
cash or not. However, the system behavior (computer logic) needs to be tested
before applying the sequence of agent activities and actor activities. In this
example depicted in Figure 9.1, the activities performed by the actor and the
agent can be tested by testers who do not have much knowledge of the product.
Testers who have in-depth knowledge of the product can perform the system
behavior part of testing. They need to know the logic of how the code works and
whether or not the system response is accurate.
As mentioned earlier, actor and agent are roles that represent different
types (classes) of users. Simulating different types of users needs a clear
understanding of the business, and verifying the system response for each of
those users needs a clear understanding of how the product is implemented.
Hence, testers using the use case model, with one person testing the actions and
the other person testing the system response, complement each other's testing,
covering both the business and the implementation aspects of the product at the
same time.
The agent part of the use cases is not needed in all cases. In a
completely automated system involving the customer and the system, use cases
can be written without considering the agent portion. Let us extend the earlier
example of cash withdrawal using an ATM. Table 9.1 illustrates how the actor
and system response can be described in the use case.
Table 9.1 Example actor action and system response
Actor action:     User selects an account type
System response:  Ask the user for the amount to withdraw
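A tiny sketch in Python of how a use case row like the one in Table 9.1 can drive an automated check; the response logic and the wording of the actions are invented for illustration:

# Use case rows: (actor action, expected system response), as in Table 9.1.
USE_CASE_ROWS = [
    ("select account type", "ask for amount to withdraw"),
]

def atm_system_response(actor_action):
    """Stand-in for the system under test (invented logic)."""
    responses = {"select account type": "ask for amount to withdraw"}
    return responses.get(actor_action, "unexpected action")

for action, expected in USE_CASE_ROWS:
    assert atm_system_response(action) == expected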
This way of documenting a scenario and testing makes it simple and also
makes it realistic for customer usage. Use cases are not used only for testing. In
some product implementations, use cases are prepared prior to the design and
coding phases, and they are used as a set of requirements for design and
coding phases. All development activities are performed based on use case
documentation. In extreme programming models these are termed user stories
and form the basis for the architecture/design and coding phases. Hence,
use cases are useful in combining the business perspectives and
implementation detail and testing them together.
LESSON 10
DEFECT BASH
Contents
10.0 Aims and Objectives
10.1 Defect Bash
10.2 Choosing the Frequency and Duration of Defect Bash
10.3 Selecting the Right Product Build
10.4 Communicating the Objective of Defect Bash
10.5 Setting up and monitoring the Lab
10.6 Taking Actions and Fixing Issues
10.7 Optimizing the Effort Involved in Defect Bash
10.8 Let Us Sum Up
6. Letting testing not wait for documentation, or for the time taken for
documentation - "Does testing wait till all documentation is done?"
7. Enabling people to say "the system works" as well as enabling them to
"break the system" - "Testing isn't only to conclude that the system works
or doesn't work"
Even though it is said that defect bash is ad hoc testing, not all
activities of a defect bash are unplanned. All the activities in a defect bash are
planned activities, except for what is to be tested. It involves several steps.
Step 1 Choosing the frequency and duration of defect bash
Step 2 Selecting the right product build
Step 3 Communicating the objective of each defect bash to everyone
Step 4 Setting up and monitoring the lab for defect bash
Step 5 Taking actions and fixing issues
Step 6 Optimizing the effort involved in defect bash
Bug bash is where all the developers, testers, program managers, usability
researchers, designers, documentation folks, and even sometimes marketing
people, put aside their regular day-to-day duties and pound on the product to
get as many eyes on the product as possible.
A bug bash sounds similar to "eating one's own dog food" and is a tool used as
part of the test management approach. A bug bash is usually declared in
advance to the team. The test management team sends out the scope and
assigns testers as resources to assist in setup and also to collect bugs. Test
management might combine this with a small token prize for good bugs found
and/or hold small socials at the end of the bug bash. Another bug bash prize
that has been used is pieing test management team members.
Ad hoc testing is a commonly used term for software testing performed without
planning and documentation. The tests are intended to be run only once,
unless a defect is discovered. Ad hoc testing is a part of exploratory testing,
being the least formal of test methods. In this view, ad hoc testing has been
criticized because it isn't structured, but this can also be a strength: important
defects can be found quickly. It is performed with improvisation; the tester seeks
to find bugs by any means that seem appropriate. It contrasts with regression
testing, which looks for a specific issue with detailed reproduction steps and a
clear expected result. Ad hoc testing is most often used as a complement to
other types of testing.
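As a small illustration of the improvised, run-once spirit of ad hoc (monkey) testing, here is a sketch in Python; parse_age is an invented function under test, and the only check is that random inputs never crash it:

import random
import string

def parse_age(text):
    """Invented function under test: parse an age field, or return None."""
    return int(text) if text.isdigit() else None

random.seed(1)
for _ in range(1000):
    junk = "".join(random.choices(string.printable, k=random.randint(0, 8)))
    try:
        parse_age(junk)    # any unhandled exception is a defect
    except Exception as exc:
        raise AssertionError(f"crashed on input {junk!r}") from exc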
10.3 SELECTING THE RIGHT PRODUCT BUILD
Since the defect bash involves a large number of people, effort and
planning, a good quality build is needed for defect bash. A regression tested
build would be ideal as all new features and defect fixes would have been
already tested in such a build. An intermediate build where the code
functionality is evolving or an untested build will make the purpose and
outcome of a defect bash ineffective. Where a large number of people are
involved, a good quality product build gives confidence on the product and
progress. Also, when testers doing a defect bash uncover an excessive number
of defects or very severe defects, the confidence of the testers falls and the
perception of the product being unstable lingers on for long.
10.4 COMMUNICATING THE OBJECTIVE OF DEFECT BASH
Even though defect bash is an ad hoc activity, its purpose and objective
have to be very clear. Since defect bash involves people performing different
roles, the contribution they make has to be focused towards meeting the
purpose and objective of the defect bash. The objective could be to find a large
number of uncovered defects, or to find out system requirements (CPU,
memory, disk, and so on), or to find the non-reproducible or random defects
which would be difficult to find through other planned tests. Defects that a test
engineer would find easily should not be the objective of a defect bash. When
the objectives are told in advance, the members of the defect bash team will be
in a better position to contribute towards them.
10.5 SETTING UP AND MONITORING THE LAB
Since defect bashes are planned, short-term, and resource-intensive activities, it
makes sense to set up and monitor a laboratory for this purpose. Finding the
right configuration and resources (hardware, software, and the set of people to
perform the defect bash) are activities that have to be planned carefully before a
bash actually starts. Since the effort involved is large, it is critical to ensure
that the right setup is done, so that everyone can perform the desired set of
activities on the software. The majority of defect bashes fail due to inadequate
hardware, wrong software configurations, and perceptions related to
performance and scalability of the software. During a defect bash, the product
parameters and system resources (CPU, RAM, disk, network) need to be
monitored for defects and also corrected so that users can continue to use the
system for the complete duration of the defect bash.
There are two types of defects that will emerge during a defect bash. The
defects that are in the product, as reported by the users, can be classified as
functional defects. Defects that are unearthed while monitoring the system
resources, such as memory leak, long turnaround time, missed requests, high
impact and utilization of system resources, and so on are called non-functional
defects. Defect bash is a unique testing method which can bring out both
functional and non-functional defects. However, if the lab is not set up properly
or not monitored properly, there is a chance that some of the non-functional
defects may not get noticed at all.
The last step is to take the necessary corrective action after the defect
bash. Getting a large number of defects from users is the purpose and also the
normal end result of a defect bash. Many defects could be duplicate defects.
However, different interpretations of the same defect by different users, and the
impact of the same defect showing up differently in different places, make it
difficult to call them duplicates. Since there could be a large number of defects,
the approach to fix problems from a defect bash should not be at a per defect
level. It is difficult to solve all the problems if they are taken one by one and
fixed in code. The defects need to be classified into issues at a higher level, so
that a similar outcome can be avoided in future defect bashes. There could be
one defect associated with an issue, or there could be several defects that
together constitute an issue. An example of an issue is "In all components, all
inputs for employee number have to be validated before using them in business
logic." This enables all defects from different components to be grouped and
classified as one issue. All the issues reported from a defect bash need to be
taken through complete code and design inspections, analyzed, and fixed
together in all the places from where a defect could evolve. Thus the outcome of
a defect bash can also be used for preventing defects in future defect bashes.
is often ignored by many organizations. Owing to project pressure and delays in
development schedules, integration testing may get diluted, as it is performed in
between the component and system testing phases. A separate test team
focusing on integration testing is an initiative recently taken by several
companies to give integration testing the focus it has deserved for a long time.
Integration testing, if done properly, can reduce the number of defects that will
be found in the system testing phase, a phase that is explained in the following
chapter.
We are at the end of the lesson; let us glance at all that was discussed above.
The testing by the participants during a defect bash is not based on written
test cases. What is to be tested is left to an individual's decision and creativity.
Defect bash is an activity involving a large amount of effort (since it involves a
large number of people) and a large amount of planning (as is evident from the
above steps).
Integration testing is both a type of testing and a phase of testing. Integration
testing starts after each of the components has been tested alone and delivered,
using the black box testing approaches discussed earlier.
Check Your Progress: Model Answers
1. Defect bash is an ad hoc testing approach where people performing
different roles in an organization test the product together at the same
time. This is very popular among application development companies,
where the product can be used by people who perform different roles.
2. Defects that a test engineer would find easily should not be the objective
of a defect bash. When the objectives are told in advance, the members of
the defect bash team will be in a better position to contribute towards
them.
3. There are two types of defects that will emerge during a defect bash. The
defects that are in the product, as reported by the users, can be
classified as functional defects. Defects that are unearthed while
monitoring the system resources, such as memory leak, long turnaround
time, missed requests, high impact and utilization of system resources,
and so on, are called non-functional defects.
UNIT - III
LESSON 11
This is the entry point of Unit III, in which we will discuss System and
Acceptance Testing. In this lesson, we will introduce you to system testing,
covering both functional and non-functional testing.
We hope the reader will be able to understand the various circumstances of
applying these test methodologies to real-world software.
different versions of same product(s) or a different competitive product(s)
is called performance testing.
2. Scalability testing A testing approach that requires an enormous amount
of resources to find out the maximum capability of the system parameters
is called scalability testing.
3. Reliability testing To evaluate the ability of the system or an
independent component of the system to perform its required functions
repeatedly for a specified period of time is called reliability testing.
4. Stress testing Evaluating a system beyond the limits of the specified
requirements or system resources (such as disk space, memory,
processor utilization) to ensure the system does not break down
unexpectedly is called stress testing.
5. Interoperability testing This testing is done to ensure that two or more
products can exchange information, use the information, and work
closely together.
6. Localization Testing Testing conducted to verify that the localized
product works in different languages is called localization testing.
The definition of system testing can keep changing, covering wider and
more high-level aspects, depending on the context. A solution provided to a
customer may be an integration of multiple products. Each product may be a
combination of several components. A supplier of a component of a product can
assume the independent component as a system in its own right and do system
testing of the component. From the perspective of the product organization,
integrating those components is referred to as sub-system testing. When all
components, delivered by different component developers, are assembled by a
product organization, they are tested together as a system. At the next level,
there are solution integrators who combine products from multiple sources to
provide a complete integrated solution for a client. They put together many
products as a system and perform system testing of this integrated solution.
System testing is performed on the basis of written test cases according
to information collected from detailed architecture/design documents, module
specifications and system requirements specifications. System test cases are
created after looking at component and integration test cases, and are at the
same time designed to include the functionality that tests the system together.
System test cases can also be developed based on user stories, customer
discussions, and points made by observing typical customer usage.
System testing may not include many negative scenario verifications,
such as testing for incorrect and negative values. This is because such negative
testing would have been already performed by component and integration
testing and may not reflect real-life customer usage.
System testing may be started once unit, component, and integration
testing are completed. This would ensure that the more basic program logic
errors and defects have been corrected. Apart from verifying the business
requirements of the product, system testing is done to ensure that the product
is ready for moving to the user acceptance test level.
Check your Progress 1
the defects is high, then the defects are fixed before the release; else, the
product is released as such. The analysis of defects and their classification into
various categories also gives an idea about the kind of defects that will be found
by the customer after release. This information helps in planning suitable
alternative approaches, workarounds, and so on. Hence, system testing helps in reducing the risk of
releasing a product.
System testing is highly complementary to other phases of testing. The
component and integration test phases are conducted taking inputs from
functional specification and design. The main focus during these testing phases
is technology and product implementation. On the other hand, customer
scenarios and usage patterns serve as the basis for system testing. Thus system
testing phase complements the earlier phases with an explicit focus on
customers. The system testing phase helps in switching this focus of the
product development team towards customers and their use of the product.
To summarize, system testing is done for the following reasons.
1. Provide independent perspective in testing.
2. Bring in customer perspective in testing.
3. Provide a “fresh pair of eyes” to discover defects not found earlier by
testing.
4. Test product behavior in a holistic, complete and realistic environment
5. Test both functional and non-functional aspects of the product.
6. Build confidence in the product.
7. Analyze and reduce the risk of releasing the product.
8. Ensure all requirements are met and ready the product for acceptance
testing.
Functional testing helps in verifying what the system is supposed to do.
It aids in testing the product’s features or functionality. It has only two results
as far as requirements fulfillment is concerned: met or not met. If requirements
are not properly enumerated, functional requirements may be understood in
many ways. Hence, functional testing should have very clear expected results
documented in terms of the behavior of the product. Functional testing
comprises simple methods and steps to execute the test cases. Functional
testing results normally depend on the product, not on the environment. It uses
a pre-determined set of resources and configuration except for a few types of
testing such as compatibility testing where configurations play a role.
Functional testing requires in-depth customer and product knowledge as well
as domain knowledge so as to develop different test cases and find critical
defects, as the focus of the testing is to find defects. Failures in functional
testing normally result in fixes in the code to arrive at the right behavior.
Functional testing is performed in all phases of testing such as unit testing,
component testing, integration testing, and system testing. Having said that,
the functional testing done in the system testing phase (functional system
testing) focuses on product features as against component features and
interface features.
Non-functional testing is performed to verify the quality factors (such as
reliability, scalability etc.). These quality factors are also called non-functional
requirements. Non-functional testing requires the expected results to be
documented in qualitative and quantifiable terms. Non-functional testing
requires a large amount of resources, and the results are different for different
configurations and resources. Non-functional testing is very complex due to the
large amount of data that needs to be collected and analyzed. The focus of non-
functional testing is to qualify the product; it is not meant to be a defect-
finding exercise. Test cases for non-functional testing include clear pass/fail
criteria. However, test results are concluded both on pass/fail definitions and
on the experiences encountered in running the tests.
Apart from verifying the pass or fail status, non-functional tests results
are also determined by the amount of effort involved in executing them and any
problems faced during execution. For example, if a performance test met the
pass/fail criteria only after 10 iterations, then the experience is bad and the test
result cannot be taken as a pass. Either the product or the non-functional testing
process needs to be fixed here.
Non-functional testing requires understanding the product behavior,
design, and architecture and also knowing what the competition provides. It
also requires analytical and statistical skills as the large amount of data
generated requires careful analysis. Failures in non-functional testing affect the
design and architecture much more than the product code. Since non-
functional testing is not repetitive in nature and requires a stable product, it is
performed in the system testing phase.
Some of the points mentioned in Table 11.1 may be seen as
judgmental and subjective. For example, design and architecture knowledge is
needed for functional testing also. Hence all the above points have to be taken
as guidelines, not dogmatic rules. Since both functional and non-functional
aspects are being tested in the system testing phase, the question that can be
asked is "What is the right proportion of test cases/effort for these two types of
testing?" Since functional testing is a focus area starting from the unit testing
phase while non-functional aspects get tested only in the system testing phase,
it is a good idea that a majority of system testing effort be focused on the non-
functional aspects. A 70%-30% ratio between non-functional and functional
testing can be considered good and 50%-50% ratio is a good starting point.
However, this is only a guideline, and the right ratio depends more on the
context, type of release, requirements, and products.
Table 11.1 Functional testing versus non-functional testing
Now we are at the end of the lesson, and we hope the reader can understand
the concepts of system testing and its various subsections.
System testing is the only phase of testing which tests both the functional and
non-functional aspects of the product. On the functional side, system testing
focuses on real-life customer usage of the product and solutions.
Stress testing Evaluating a system beyond the limits of the specified
requirements or system resources (such as disk space, memory, processor
utilization) to ensure the system does not break down unexpectedly is called
stress testing.
Performance/Load testing, Scalability testing, Reliability testing, Stress testing,
Interoperability testing, Localization Testing are the non-functional testing
methods.
Check Your Progress: Model Answers
1. System testing is defined as a testing phase conducted on the complete
integrated system, to evaluate the system compliance with its specified
requirements. It is done after unit, component and integration testing
phases.
2. System testing is conducted with an objective to find product-level
defects and to build confidence before the product is released to
the customer. Component and integration testing phases focus on
finding defects. If the same focus is provided in system testing and
significant defects are found, it may generate a feeling that the product is
unstable (especially because system testing is closer to product release
than component or integration testing).
3. Non-functional testing is performed to verify the quality factors (such as
reliability, scalability, etc.). These quality factors are also called non-
functional requirements. Non-functional testing requires the expected
results to be documented in qualitative and quantifiable terms. Non-
functional testing requires a large amount of resources, and the results
differ for different configurations and resources.
LESSON 12
defects as early as possible. This has to be done after completing all tests meant
for the current phase, without diluting the tests of the current phase.
There are multiple ways system functional testing is performed. There
are also many ways product level test cases are derived for functional testing.
Some of the common techniques are given below.
1. Design/architecture verification
2. Business vertical testing
3. Deployment testing
4. Beta testing
5. Certification, standards, and testing for compliance.
Describe duplication.
Notes: a) Write your answer in the space given below
b) Check your answer with the one given at the end of this lesson.
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
In this method of functional testing, the test cases are developed and
checked against the design and architecture to see whether they are actual
product-level test cases. Comparing this with integration testing, the test cases
for integration testing are created by looking at interfaces whereas system level
test cases are created first and verified with design and architecture to check
whether they are product-level or component-level test cases. The integration
test cases focus on interactions between modules or component whereas the
functional system test focuses on the behavior of the complete product. A side
benefit of this exercise is ensuring completeness of the product implementation.
This technique helps in validating the product features that are written based
on customer scenarios and verifying them using product implementation. If
there is a test case that is a customer scenario but failed validation using this
technique, then it is moved appropriately to component or integration testing
phases. Since functional testing is performed at various test phases, it is
important to reject the test cases and move them to an earlier phase to catch
defects early and avoid any major surprise at the later phases. Some of the
guidelines used to reject test cases for system functional testing include the
following.
1. Is this focusing on code logic, data structures, and units of the product?
(If yes, then it belongs to unit testing.)
2. Is this specified in the functional specification of any component? (If yes,
then it belongs to component testing.)
3. Is this specified in the design and architecture specification meant for integration
testing? (If yes, then it belongs to integration testing.)
4. Is it focusing on product implementation but not visible to customers?
(This is focusing on implementation-to be covered in
unit/component/integration testing.)
5. Is it the right mix of customer usage and product implementation?
(Customer usage is a prerequisite for system testing.)
Yet another aspect involved in business vertical testing is syndication.
Not all the work needed for business verticals is done by product development
organizations. Solution integrators and service providers pay a license fee to a
product organization and sell the products and solutions using their own name and
image. In this case the product name, company name, technology names, and
copyrights may belong to the latter parties or associations, and the former would
like to change the names in the product. A product should provide features for
such syndication, and these features are tested as part of business
verticals testing.
Business vertical testing can be done in two ways-simulation and
replication. In simulation of a vertical test, the customer or the tester assumes
requirements and the business flow is tested. In replication, customer data and
process is obtained and the product is completely customized, tested and the
customized product as it was tested is released to the customer.
As discussed in the chapters on integration testing, business verticals are
tested through scenarios. Scenario testing is only a method to evolve scenarios
and ideas, and is not meant to be exhaustive. It is done more from the
perspective of interfaces and their interaction. Having some business vertical
scenarios created by integration testing ensures quick progress in system
testing, which is done with an end-to-end scenario perspective. In the system
testing phase, the business verticals are tested completely in a real-life customer
environment using aspects such as customization, terminology, and
syndication described in the above paragraphs.
System testing is the final phase before product delivery. By this time the
prospective customers and their configurations would be known, and in some
cases the products would have been committed for sale. Hence system testing is
the right time to test the product for those customers who are waiting for it. The
short-term success or failure of a particular product release is mainly assessed
on the basis of how well these customer requirements are met. This type of
deployment (simulated) testing that happens in a product development company
to ensure that customer deployment requirements are met is called offsite
deployment.
Deployment testing is also conducted after the release of the product by
utilizing the resources and setup available in customer’s locations. This is a
combined effort by the product development organization and the organization
trying to use the product. This is called onsite deployment. Even though onsite
deployment is not conducted in the system testing phase, it is explained here to
set the context. It is normally the system testing team that is involved in
completing the onsite deployment test. Onsite deployment testing is considered
to be a part of acceptance testing and is an extension of offsite deployment
testing
Onsite deployment testing is done at two stages. In the first stage (Stage
1), actual data from the live system is taken and similar machines and
configurations are mirrored, and the operations from the users are rerun on the
mirrored deployment machine. This gives an idea whether the enhanced or
similar product can perform the existing functionality without affecting the user.
This also reduces the risk of a product not being able to satisfy existing
functionality, as deploying the product without adequate testing can cause
major business loss to an organization. Some deployments use intelligent
recorders to record the transactions that happen on a live system and commit
these operations on a mirrored system and then compare the results against
the live system.
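The record-and-compare step described above can be sketched in a few lines of code. The following is a minimal illustration only; the live_system and mirrored_system callables are hypothetical stand-ins for whatever interface the deployed product exposes, not a reference to any specific recorder tool.

def replay_and_compare(transactions, live_system, mirrored_system):
    """Replay each recorded transaction on both systems; report mismatches."""
    mismatches = []
    for txn_id, operation, payload in transactions:
        live_result = live_system(operation, payload)
        mirrored_result = mirrored_system(operation, payload)
        if live_result != mirrored_result:
            mismatches.append((txn_id, live_result, mirrored_result))
    return mismatches

# Usage with trivial stand-ins; an empty list means the two systems agree.
recorded = [(1, "deposit", 100), (2, "withdraw", 40)]
live = lambda op, amount: (op, amount)
mirror = lambda op, amount: (op, amount)
print(replay_and_compare(recorded, live, mirror))  # []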
The objective of the recorder is to help in keeping the mirrored and live
system identical with respect to business transactions. In the second stage
(Stage 2), after a successful first stage, the mirrored system is made a live
system that runs the new product. Regular backups are taken and alternative
methods are used to record the incremental transactions from the time
mirrored system became live. The recorder that was used in the first stage can
also be used here. However, a different method to record the incremental
transactions is advisable, since failures can sometimes be caused by the recorder itself.
This stage helps to avoid any major failures since some of the failures can be
noticed only after an extended period of time. In this stage, the live system that
was used earlier and the recorded transactions from the time mirrored system
became live, are preserved to enable going back to the old system if any major
failures are observed at this stage. If no failures are observed in this (second)
stage of deployment for an extended period (for example, one month), then the
onsite deployment is considered successful and the old live system is replaced
by the new system. Stages 1 and 2 of deployment testing are represented in
Figure.
from the live system are then played back on the product under test under the
supervision of the test engineer (shown by dotted lines). In Stage 2, the test
engineer records all transactions using a recorder and other methods and plays
back on the old live system (shown again by dotted lines).
3. Sending some documents for reading in advance and training the
customer on product usage.
4. Testing the product to ensure it meets "beta testing entry criteria." The
customers and the product development/management groups of the
vendor together prepare sets of entry/exit criteria for beta testing.
5. Sending the beta product (with known quality) to the customer and
enabling them to carry out their own testing.
6. Collecting the feedback periodically from the customers and prioritizing
the defects for fixing.
7. Responding to customers' feedback with product fixes or documentation
changes and closing the communication loop with the customers in a
timely fashion.
8. Analyzing and concluding whether the beta program met the exit criteria.
9. Communicating the progress and action items to customers and formally
closing the beta program.
10. Incorporating the appropriate changes in the product.
Deciding on the entry criteria of a product for beta testing and deciding
the timing of a beta test poses several conflicting choices to be made. Sending
the product too early, with inadequate internal testing, will make the customers
unhappy and may create a bad impression of the product's quality. Sending the
product too late may mean too little time for beta defect fixes, and this
defeats the purpose of beta testing. The late integration testing phase and early
system testing phase are the ideal times for starting a beta program.
It is quite possible that customers discontinue the beta program after
starting it or remain passive, without adequately using the product and giving
feedback. From the customers' perspective, it is possible that beta testing is
normally just one of their activities and it may not be high on their priority list.
Constant communication with the customers is necessary to motivate them to
use the product and help them whenever they are facing problems with the
product. Defects reported in beta programs are also given the same priority and
urgency as that of normal support calls, with the only difference being that the
product development/engineering department is likely to have a more direct
interaction with the beta customers. Failure in meeting beta testing objectives
or in giving timely fixes may mean some customers rejecting the product.
One other challenge in beta programs is the choice of the number of beta
customers. If the numbers chosen are too few, then the product may not get a
sufficient diversity of test scenarios and test cases. If too many beta customers
are chosen, then the engineering organization may not be able to cope up with
fixing the reported defects in time. Thus the number of beta customers should
be a delicate balance between providing a diversity of product usage scenarios
and the manageability of being able to handle their reported defects effectively.
Finally, the success of a beta program depends heavily on the willingness
of the beta customers to exercise the product in various ways, knowing fully
well that there may be defects. This is not an easy task. As mentioned earlier,
the beta customers must be motivated to see the benefits they can get. Only
customers who can be thus motivated and are willing to play the role of trusted
partners in the evolution of the product should participate in the beta program.
Check your progress 3
called testing for standards. Once the product is tested for a set of standards,
they are published in the release documentation for the information of the
customers so that they know what standards are implemented in the product.
There are many contractual and legal requirements for a product. Failing
to meet these may result in business loss and bring legal action against the
organization and its senior management. Some of these requirements could be
contractual obligations and some statutory requirements. Failing to meet these
could severely restrict the market for the product. For example, it may not be
possible to bid for US government organizations if usability guidelines (508
Accessibility Guidelines) are not met. Testing the product for contractual, legal,
and statutory compliance is one of the critical activities of the system testing
team. The following are some examples of compliance testing.
• Compliance to FDA This act by the Food and Drug Administration
requires that adequate testing be done for products such as cosmetics,
drugs, and medical sciences. It also requires that all the test reports,
along with complete documentation of test cases and execution information
for each test cycle with supervisory approvals, be preserved for
checking the adequacy of the tests by the FDA.
• 508 accessibility guidelines This accessibility set of guidelines requires
the product to meet some requirements for its physically challenged
users. These guidelines insist that the product should be as accessible to
physically challenged people as it is to people without those disabilities.
• SOX (Sarbanes-Oxley Act) This act requires that products and services
be audited to prevent financial fraud in the organization. The software is
required to go through all transactions and list out the suspected faulty
transactions for analysis. The testing for this act helps the top executives
by keeping them aware of financial transactions and their validity.
• OFAC and the Patriot Act These require that the transactions of banking
applications be audited for misuse of funds for terrorism.
The terms certification, standards, and compliance testing are used
interchangeably. There is nothing wrong in this usage as long as the
objective of the testing is met. For example, testing in which a certifying agency
helps an organization meet standards can be called both certification testing
and standards testing (for example, OpenLDAP is both a certification and a
standard).
Check Your Progress: Model Answers
1. Duplication refers to the same tests being performed multiple times, and
gray area refers to certain tests being missed out in all the phases. A
small percentage of duplication across phases is unavoidable as different
teams are involved.
2. The integration test cases focus on interactions between modules or
components, whereas the functional system test focuses on the behavior
of the complete product. A side benefit of this exercise is ensuring
completeness of the product implementation.
4. Deployment testing is also conducted after the release of the product by
utilizing the resources and setup available in the customer's locations.
LESSON 13
NON-FUNCTIONAL TESTING
Contents
13.0 Aims and Objectives
13.1 Non-Functional Testing
13.2 Setting up the Configuration
13.3 Coming up with Entry/Exit Criteria
13.4 Balancing key Resources
13.5 Scalability Testing
13.6 Reliability Testing
13.7 Stress Testing
13.8 Interoperability Testing
13.9 Functional Vs Non-functional Testing
13.10 Let Us Sum Up
Operational Readiness Testing
Installation Testing
Security Testing (Application Security, Network, System Security)
We are going to discuss only a few of the above types in this lesson.
In order to create a "near real-life" environment, the details regarding the
customer's hardware setup, deployment information, and test data are collected in
advance. Test data is built based on the sample data given. If it is a new product,
then information regarding similar or related products is collected. These inputs
help in setting up the test environment close to the customer's so that the
various quality characteristics of the system can be verified more accurately.
Coming up with entry and exit criteria is another critical factor in non-
functional testing. Table 6.2 gives some examples of how entry/exit criteria can
be developed for a set of parameters and for various types of nonfunctional
tests. Meeting the entry criteria is the responsibility of the previous test phase
(that is, integration testing phase) or it could be the objective of dry-run tests
performed by the system testing team, before accepting the product for system
testing.
LAN and not for WAN. In the case of WAN or routes involving multiple
hops, the packets generated by the product need to be reduced.
5. More disk space or the complete I/O bandwidth can be used for the
product as long as they are available. While disk costs are getting
cheaper, I/O bandwidth is not.
6. The customer gets the maximum return on investment (ROI) only if the
resources such as CPU, disk, memory, and network are optimally used.
So there is intelligence needed in the software to understand the server
configuration and its usage.
7. Graceful degradation in non-functional aspects can be expected when
resources in the machine are also utilized for different activities in the
server.
8. Predictable variations in performance or scalability are acceptable for
different configurations of the same product.
9. Variation in performance and scalability is acceptable when some
parameters are tuned, as long as we know the impact of adjusting each
of those tunable parameters.
10. The product can behave differently for non-functional factors for different
configurations such as low-end and high-end servers as long as they
support return on investment. This in fact motivates the customers to
upgrade their resources.
Once such sample assumptions are validated by the development team
and customers, non-functional testing is conducted.
requirements, design, and architecture together provide inputs to the scalability
testing on what parameter values are to be tested.
Contrary to other types of testing, scalability testing does not end when
the requirements are met. The testing continues till the maximum capability of
a scalable parameter is found out for a particular configuration. Having a highly
scalable system that considers the future requirements of the customer helps a
product to have a long lifetime. Otherwise, each time there are new
requirements, a major redesign and overhaul takes place in the product and
some stable features may stop working because of those changes, thus creating
quality concerns. The cost and effort involved in such product developments are
very high.
Failures during scalability test include the system not responding, or the
system crashing, and so on. But whether the failure is acceptable or not has to
be decided on the basis of business goals and objectives. For example, a
product not able to respond to 100 concurrent users, while its objective is to serve
at least 200 users simultaneously, is considered a failure. When a product
expected to withstand only 100 users fails only when its load is increased to 200,
then it is a passed test case and an acceptable situation.
Scalability tests help in identifying the major bottlenecks in a product.
When resources are found to be the bottlenecks, they are increased after validating
the assumptions mentioned earlier. If the bottlenecks are in the product, they are fixed.
However, sometimes the underlying infrastructure, such as the operating system or
technology, can also become a bottleneck. In such cases, the product
organization is expected to work with the OS and technology vendors to resolve
the issues.
Scalability tests are performed on different configurations to check the
product's behavior. For each configuration, data are collected and analyzed. On
completion of the tests, the data collected in the templates are analyzed and
appropriate actions are taken. For example, if CPU utilization approaches
100%, then another server is set up to share the load or another CPU is added
to the server. If the results are successful, the tests are repeated for 200 users
and more to find the maximum limit for that configuration.
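As a rough sketch of this ramp-up idea, the snippet below raises the concurrent-user count step by step and reports the last load level that ran without failures. The transaction callable and the load levels are assumed for illustration; a real scalability test would also record CPU, memory, and response-time data at each step.

import concurrent.futures
import time

def run_at_load(transaction, users):
    """Execute the transaction once per simulated user; count failures."""
    failures = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(transaction) for _ in range(users)]
        for future in concurrent.futures.as_completed(futures):
            if future.exception() is not None:
                failures += 1
    return failures

def find_scalability_limit(transaction, loads=(100, 200, 400, 800)):
    """Ramp up the load and return the last level that ran without failures."""
    last_good = 0
    for users in loads:
        start = time.perf_counter()
        failures = run_at_load(transaction, users)
        elapsed = time.perf_counter() - start
        print(f"{users} users: {failures} failures in {elapsed:.1f}s")
        if failures:
            break
        last_good = users
    return last_good

print(find_scalability_limit(lambda: time.sleep(0.01)))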
13.6 RELIABILITY TESTING
13.7 STRESS TESTING
information across systems, the structure and interpretation of these
data structures should be consistent across the system.
2. Changes to data representation as per the system requirements
When two different systems are integrated to provide a response to the
user, data sent from the first system in a particular format must be
modified or adjusted to suit the next system's requirements. This
helps the request to be understood by the current system. Only then can
an appropriate response be sent to the user. For example, when a
little-endian machine passes data to a big-endian machine, the byte ordering
would have to be changed (see the byte-order sketch after this list).
3. Correlated interchange of messages and receiving appropriate
responses When one system sends an input in the form of a message,
the next system is in the waiting mode or listening mode to receive the
input. When multiple machines are involved in information exchange,
there could be clashes, wrong response, deadlocks, or delays in
communication. These aspects should be considered in
architecting/designing the product, rather than leave it to be found as a
surprise during the later phases.
4. Communication and messages When a message is passed on from
system A to system B, and the message is lost or gets garbled, the
product should be tested to check how it responds to such erroneous
messages. The product must not crash or hang. It should give useful
error messages to the user, requesting him to wait for some time until it
recovers the connection. As multiple products are involved, a generic
error message such as "Error from remote machine" will be misleading
and not value adding. The user need not know where the message is
coming from but needs to understand the cause of the message and the
necessary corrective action.
5. Meeting quality factors When two or more products are put together,
there is an additional requirement of information exchange between
them. This requirement should not take away the quality of the products
that would have been already met individually by the products.
Interoperability testing needs to verify this perspective.
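As a small illustration of the byte-ordering issue mentioned in point 2 above, the Python snippet below packs the same 32-bit value in little-endian and big-endian order. Reading one format as the other corrupts the value, which is exactly the class of defect interoperability testing must catch. The value used is arbitrary.

import struct

value = 0x12345678

little = struct.pack("<I", value)  # little-endian byte order
big = struct.pack(">I", value)     # big-endian byte order
print(little.hex())  # 78563412
print(big.hex())     # 12345678

# Interpreting little-endian bytes as big-endian yields the wrong value:
print(hex(struct.unpack(">I", little)[0]))  # 0x78563412, not 0x12345678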
The responsibility for interoperability lies more on the architecture,
design, and standards of various products involved in the domain. Hence,
testing for interoperability yields better results only if the requirements are met
by development activities such as architecture, design, and coding.
Interoperability testing should be restricted to qualify the information exchange
rather than finding defects and fixing them one after another.
Interoperability among products is a collective responsibility and the effort of
many product organizations. All product organizations are expected to work
together to meet the purpose of interoperability. There are standards
organizations that focus on interoperability standards which help the product
organizations to minimize the effort involved in collaborations. They also assist
in defining, implementing, and certifying the standards implementation for
interoperability.
One of the fundamental objectives of a project is to collect both the functional
and non-functional requirements. These need to be kept in balance and
harmony, and most importantly not compromised as the project progresses.
Functional Requirements
The official definition for a functional requirement specifies what the system
should do:
"A requirement specifies a function that a system or component must be able to
perform."
Functional requirements specify specific behavior or functions, for example:
"Display the heart rate, blood pressure and temperature of a patient connected to
the patient monitor."
Typical functional requirements are:
• Business Rules
• Transaction corrections, adjustments, cancellations
• Administrative functions
• Authentication
• Authorization – functions a user is delegated to perform
• Audit Tracking
• External Interfaces
• Certification Requirements
• Reporting Requirements
• Historical Data
• Legal or Regulatory Requirements
Non-Functional Requirements
The official definition for a non-functional requirement specifies how the system
should behave:
"A non-functional requirement is a statement of how a system must behave, it is a
constraint upon the systems behavior."
Non-functional requirements specify all the remaining requirements not covered
by the functional requirements. They specify criteria that judge the operation of
a system, rather than specific behaviors, for example:
"Display of the patient's vital signs must respond to a change in the patient's
status within 2 seconds."
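Because a non-functional requirement like the one above is quantified, it can be checked directly in a test. The sketch below is illustrative only: the display_vital_signs stand-in is hypothetical, and the 2-second limit comes from the example requirement.

import time

RESPONSE_LIMIT_SECONDS = 2.0  # from the example requirement above

def assert_response_time(operation, limit=RESPONSE_LIMIT_SECONDS):
    """Fail if the operation takes longer than the allowed limit."""
    start = time.perf_counter()
    operation()
    elapsed = time.perf_counter() - start
    assert elapsed <= limit, f"took {elapsed:.2f}s, limit {limit:.2f}s"
    return elapsed

# Stand-in for the real operation under test:
display_vital_signs = lambda: time.sleep(0.1)
assert_response_time(display_vital_signs)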
Typical non-functional requirements are:
Performance - Response Time, Throughput, Utilization, Static Volumetric
Security / Penetration Testing, Operational Readiness Testing, Installation
Testing, Security Testing (Application Security, Network, System Security)
2. Scalability tests are performed on different configurations to check product's
behavior. For each configuration, data are collected and analyzed. An
example of a data collection template is given below.
3. Repetitive testing, concurrency, magnitude, and random variation are the
techniques concentrated on in stress testing.
LESSON 14
ACCEPTANCE TESTING
Contents
14.0 Aims and Objectives
14.1 Acceptance Testing
14.2 Acceptance Criteria
14.2.1 Acceptance criteria-Product acceptance
14.2.2 Acceptance criteria-Procedure acceptance
14.3 Selecting Test Cases for Acceptance Testing
14.4 Executing Acceptance Tests
14.5 Let Us Sum Up
14.1 ACCEPTANCE TESTING
14.2 ACCEPTANCE CRITERIA
With some criteria as above (except for downtime), it may look as though
there is nothing to be tested or verified. But the idea of acceptance testing here
is to ensure that the resources are available for meeting those SLAs.
As mentioned, the test cases for acceptance testing are selected from the
existing set of test cases from different phases of testing. This section gives
some guidelines on what test cases can be included for acceptance testing.
1. End-to-end functionality verification Test cases that include the end-
to-end functionality of the product are taken up for acceptance testing.
This ensures that all the business transactions are tested as a whole and
those transactions are completed successfully. Real-life test scenarios are
tested when the product is tested end-to-end.
2. Domain tests Since acceptance tests focus on business scenarios, the
product domain tests are included. Test cases that reflect business
domain knowledge are included.
3. User scenario tests Acceptance tests reflect the real-life user scenario
verification. As a result, test cases that portray them are included.
4. Basic sanity tests Tests that verify the basic existing behavior of the
product are included. These tests ensure that the system performs the
basic operations that it was intended to do. Such tests may gain more
attention when a product undergoes changes or modifications. It is
necessary to verify that the existing behavior is retained without any
breaks.
5. New functionality When the product undergoes modifications or
changes, the acceptance test cases focus on verifying the new features.
6. A few non-functional tests Some non-functional tests are included and
executed as part of acceptance testing to double-check that the non-
functional aspects of the product meet the expectations.
7. Tests pertaining to legal obligations and service level agreements
Tests that are written to check if the product complies with certain legal
obligations and SLAs are included in the acceptance test criteria.
8. Acceptance test data Test cases that make use of customer real-life
data are included for acceptance testing.
Check your progress 3
LESSON 15
This lesson sums up all the testing methods discussed in previous chapters. We
aim to consolidate the testing phases and their procedures. At the end of this
lesson, the reader should be able to understand testing phases, the multi-phase
testing model, and working across multiple releases of a software product.
IEEE standards are the most widely accepted in the software testing industry. However, it
is not mandatory that all software testing processes follow the standard.
Software testing has many different phases such as the test planning, test
specification and test reporting phase.
Test planning is the most important phase in the software testing process. It sets
the process rolling and describes the scope of the testing assignment, the
approach methodology, the resource requirements for testing, and the project
plan or time schedule. The test plan outlines the test items, the system features
to be tested (checking out the functionality of the system), the testing tasks, the
responsibility matrix, and the risks associated with the process. The testing task
is achieved by testing different types of test data. The steps that are followed in
system testing are program testing, string testing, system testing, system
documentation, and user acceptance testing.
Test specification document helps in refining the test approach that has been
planned for executing the test plan. It identifies the test cases, procedures and
the pass/fail criteria for the assignment. The test case specification document
outlines the actual values required as input parameters in the testing process
and the expected outputs of the testing results. It also identifies the various
constraints related to the test case. It is important to note that test cases are re-
usable components and one test case can be used in various test designs. The
test procedure outlines all the processes that are required to test the system
and implement the test cases.
During the testing phase all the activities that occur are documented. There are
various reasons why clear documentation is required during testing. It helps
the development team to understand the bugs and fix them quickly. In case
there is a change in the testing team, it will help the new team members to
quickly understand the process and help in a quick transition.
summary report of the testing process helps the entire project team to
understand the initial flaws in design and development and ensure that the
same errors are not repeated again. There are four types of testing documents.
The transmittal report which specifies the testing events being transmitted from
the development team to the testing team, the test log which is a very important
document and used to document the events that happened during execution,
test incident report which has a list of testing events that requires further
investigation and the test summary report which summarizes the overall testing
activities.
Many software testing companies follow the IEEE standard of software testing
when executing their testing projects. Software application development
companies may have their own testing templates which they use for their
testing requirements. Outsourcing the testing requirements to a third party
vendor helps in improving the quality of the software to the great extent. Also
an unbiased view helps to find out the many different loopholes that are
existent in the software system.
bug fixes. This results in releasing a bad quality product and lack of ownership
on issues. It also creates a repetition of test cases at various phases of testing
in case the quality requirements of that phase are not met. Having too strict
entry criteria solves this problem but a lack of parallelism in this case creates a
delay in the release of the product. These two extreme situations are depicted in
the figure 15.1.
The right approach is to allow product quality to decide when to start a
phase and entry criteria should facilitate both the quality requirements for a
particular phase and utilize the earliest opportunity for starting a particular
phase. The team performing the earlier phase has the ownership to meet the
entry criteria of the following phase.
Some sample entry and exit criteria are given in the tables below. Please note that
there are no entry and exit criteria for unit testing, as it starts soon after the code
is ready to compile, and the entry criteria for component testing can serve as
exit criteria for unit testing. However, unit test regression continues till the
product is released. The criteria given below enable the product quality to
decide on starting/completing test phases at the same time and create many
avenues for allowing parallelism among test phases. Figure 15.1
illustrates the three possible entry criteria of testing phases.
[Figure 15.1 consists of three time charts, one each for too strict, too mild, and optimized entry criteria, showing unit testing, component testing, integration testing, system testing, and acceptance placed along a time line.]
Figure 15.1 Possible entry criteria for testing phases
The above three time charts indicate when a testing phase must be started and
when it must be ended. The duration taken to complete each phase is classified
as optimum, mild, or strict. The software test engineer can follow one among the
above, depending on his or her requirements and time schedule.
15.2 Entry/Exit Criteria for Testing Models
Table 15.1 Sample entry and exit criteria for component testing
Component Testing
Entry Criteria:
• Periodic unit test progress report showing 70% completion rate
• Stable build (installable) with basic features working
Exit Criteria:
• No extreme and critical outstanding defects in features
• All 100% component test cases executed with at least 98% pass ratio
Table 15.2 Sample entry and exit criteria for integration testing
Integration Testing
Entry Criteria:
• Periodic component test progress report (with at least 50% completion ratio) with at least 70% pass rate
Exit Criteria:
• No extreme and critical outstanding defects in features
Table 15.3 Sample entry and exit criteria for acceptance testing
Acceptance Testing
Entry Criteria:
• Periodic integration test progress report with at least 50% pass rate for starting system testing, 90% pass rate for starting acceptance testing
• Stable build (production format) with all features integrated
• No extreme and critical defects outstanding
Exit Criteria:
• All 100% system test cases executed with at least 98% pass ratio
• All 100% acceptance test cases executed with 100% pass rate
• Test summary reports of all phases consolidated (periodic), analyzed, and the defect trend showing a downward trend for the last four weeks
• Performance, load test report for all critical features, system
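Exit rules like "all 100% test cases executed with at least 98% pass ratio" are mechanical enough to check in code. The following is a minimal sketch using the thresholds from the tables above; the function name and the sample counts are illustrative.

def exit_criteria_met(executed, total, passed, required_pass_ratio=0.98):
    """Check the 'all test cases executed, at least 98% pass ratio' rule."""
    all_executed = executed == total
    ratio_ok = executed > 0 and passed / executed >= required_pass_ratio
    return all_executed and ratio_ok

# 500 test cases, all executed, 492 passed -> 98.4% pass ratio:
print(exit_criteria_met(executed=500, total=500, passed=492))  # True
print(exit_criteria_met(executed=480, total=500, passed=480))  # False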
Figure 15.2 Testing with multiple releases
• Compatibility (forward/backward)
• Localization testing
• Interoperability
• API/interface testing
• Performance testing
• Load testing
• Reliability
2. Specified components. A software component must have a specification
in order to be tested. Given any initial state of the component, in a
defined environment, for any fully-defined sequence of inputs and any
observed outcome, it shall be possible to establish whether or not the
component conforms to the specification.
Dynamic execution. Dynamic execution and analysis of the results of
execution must be the focus.
Techniques and measures. Define test case design techniques and test
measurement techniques. The techniques are defined to help users of
this Standard design test cases and to quantify the testing performed.
The definition of test case design techniques and measures provides for
common understanding in both the specification and comparison of
software testing.
Test process attributes. Describe attributes of the test process that
indicate the quality of the testing performed. These attributes are
selected to provide the means of assessing, comparing and improving test
quality.
Generic test process. Define a generic test process. A generic process is
chosen to ensure that this Standard is applicable to the diverse
requirements of the software industry.
4. Performance Testing covers a broad range of engineering or functional
evaluations where a material, product, system, or person is not specified
by detailed material or component specifications; rather, emphasis is on
the final measurable performance characteristics.
Performance testing can refer to the assessment of the performance of a
human examinee. For example, a behind-the-wheel driving test is a
performance test of whether a person is able to perform the functions of
a competent driver of an automobile.
In the computer industry, software performance testing is used to
determine the speed or effectiveness of a computer, network, software
program or device. This process can involve quantitative tests done in a
lab, such as measuring the response time or the number of MIPS
(millions of instructions per second) at which a system functions.
Qualitative attributes such as reliability, scalability and interoperability
may also be evaluated. Performance testing is often done in conjunction
with stress testing.
UNIT – IV
LESSON 16
PERFORMANCE TESTING
Contents
16.0 Aims and Objectives
16.1 Introduction
16.2 Factors governing Performance Testing
16.3 Performance Engineering
16.4 Let Us Sum Up
We are at the beginning of Unit IV, which explores performance testing, tools,
processes, and regression testing. In this first lesson, we will discuss the
fundamentals of performance testing and the factors that govern performance
testing.
The reader is expected to become thorough in writing test cases for all the testing
models required for her/his project specification. The ability to choose the right
testing method is also a quality expected of a good test engineer.
16.1 INTRODUCTION
functions. Qualitative attributes such as reliability, scalability and
interoperability may also be evaluated. Performance testing is often done in
conjunction with stress testing.
Performance testing can verify that a system meets the specifications claimed
by its manufacturer or vendor. The process can compare two or more devices or
programs in terms of parameters such as speed, data transfer rate, bandwidth,
throughput, efficiency or reliability.
Performance testing can also be used as a diagnostic aid in locating
communications bottlenecks. Often a system will work much better if a problem
is resolved at a single point or in a single component. For example, even the
fastest computer will function poorly on today's Web if the connection occurs at
only 40 to 50 Kbps (kilobits per second).
a lack of satisfactory response and the system starts taking more time to
complete business transactions. The "optimum throughput", represented by
the saturation point, is the maximum throughput for the product.
Network latency = N1 + N2 + N3 + N4
Product latency = A1 + A2 + A3
Actual response time = Network latency + Product latency
The discussion about latency is very important for performance, as
any improvement made in the product can only reduce the response time
through the improvements made in A1, A2, and A3. If the network latency is
large relative to the product latency and is affecting the response time, then
there is no point in improving the product performance alone. In such a case it
will be worthwhile looking at improving the network infrastructure. In cases
where network latency is larger or cannot be improved, the product can use
intelligent approaches such as caching, sending multiple requests in one packet,
and receiving responses as a bunch.
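The latency formulas above can be made concrete with assumed numbers. In this sketch the hop and component values are illustrative only; the point is that product-side improvements can never push the response time below the network latency.

# Network hops N1..N4 and product components A1..A3 (assumed values, ms).
network_hops = [12.0, 8.0, 10.0, 9.0]
product_stages = [3.0, 5.0, 2.0]

network_latency = sum(network_hops)    # 39.0 ms
product_latency = sum(product_stages)  # 10.0 ms
actual_response_time = network_latency + product_latency

print(f"actual response time = {actual_response_time} ms")
# Even a perfect product (A1 = A2 = A3 = 0) cannot respond faster than
# the 39 ms contributed by the network alone.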
The next factor that governs the performance testing is tuning. Tuning is
a procedure by which the product performance is enhanced by setting different
values to the parameters (variables) of the product, operation system, and other
components. Tuning improves the product performance without having to touch
the source code of the product. Each product may have certain parameters or
variables that can be set at run time to gain optimum performance.
values that are assumed by such product parameters may not always give
optimum performance for a particular deployment. This necessitates the need
for changing the values of parameters or variables to suit the deployment or for
a particular configuration. While doing performance testing, tuning of parameters is
an important activity that needs to be done before collecting actual numbers.
Another factor that needs to be considered for performance testing is
performance of competitive products. A very well-improved performance of a
product makes no business sense if that performance does not match up to the
competitive products. Hence it is very important to compare the throughput
and response time of the product with those of the competitive products. This
type of performance testing wherein competitive products are compared is
called benchmarking. No two products are the same in features, cost, and
functionality. Hence, it is not easy to decide which parameters must be
compared across two products. A careful analysis is needed to chalk out the list
of transactions to be compared across products, so that an apples-to-apples
comparison becomes possible. This produces meaningful analysis to improve
the performance of the product with respect to competition.
One of the most important factors that affect performance testing is the
availability of resources. A right kind of configuration (both hardware and
software) is needed to derive the best results from performance testing and for
deployments.
The exercise to find out what resources and configurations are needed is
called capacity planning. The purpose of a capacity planning exercise is to help
customers plan for the set of hardware and software resources prior to
installation or upgrade of the product. This exercise also sets the expectations
on what performance the customer will get with the available hardware and
software resources.
To summarize, performance testing is done to ensure that a product
• processes the required number of transactions in any given interval
(throughput);
• is available and running under different load conditions (availability);
• responds fast enough for different load conditions (response time);
• delivers a worthwhile return on investment for the resources (hardware and
software) and helps decide what kind of resources are needed for the product
for different load conditions (capacity planning); and
• is comparable to and better than that of the competitors for different
parameters (competitive analysis and benchmarking).
More than 200 application specific performance parameters at the client and
server end are monitored and the scalability and stability of the application
under load is studied. Using the knowledge base of common and not-so-
common performance problems, a root cause analysis of the performance
bottlenecks is done and recommendations provided
process can involve quantitative tests done in a lab, such as measuring
the response time or the number of MIPS (millions of instructions per
second) at which a system functions.
2. Network latency = N1 + N2 + N3 + N4
Product latency = A1 + A2 + A3
Actual response time = Network latency + Product latency
3. The following are some of the open source performance testing tools with
descriptions.
a) Apache JMeter
Description: Apache JMeter is a 100% pure Java desktop application
designed to load test functional behavior and measure performance. It
was originally designed for testing Web Applications but has since
expanded to other test functions. Apache JMeter may be used to test
performance both on static and dynamic resources (files, Servlets, Perl
scripts, Java Objects, Data Bases and Queries, FTP Servers and more). It
can be used to simulate a heavy load on a server, network or object to
test its strength or to analyze overall performance under different load
types. You can use it to make a graphical analysis of performance or to
test your server/script/object behavior under heavy concurrent load.
b) benerator
Description:
benerator is a framework for creating realistic and valid high-volume test
data, used for (unit/integration/load) testing and showcase setup.
Metadata constraints are imported from systems and/or configuration
files. Data can be imported from and exported to files and systems,
anonymized or generated from scratch. Domain packages provide
reusable generators for creating domain-specific data as names and
addresses internationalizable in language and region. It is strongly
customizable with plugins and configuration options.
LESSON 17
5. Analyzing performance test results
6. Performance tuning
7. Performance benchmarking
8. Recommending right configuration for the customers (Capacity Planning)
handling 1000 transactions per day, with a transaction not taking more than a
minute."
4. Performance numbers derived from architecture and design The
architect or a designer of a product would normally be in a much better
position than anyone else to say what is the performance expected out of
the product. The architecture and design goals are based on the
performance expected for a particular load. Hence, there is an
expectation that the source code is written in such a way that those
numbers are met.
There are two types of requirements that performance testing focuses on:
generic requirements and specific requirements. Generic requirements are
those that are common across all products in the product domain area. All
products in that area are expected to meet those performance expectations. For
some of the products they are mandated by SLAs (Service Level Agreements)
and standards. The time taken to load a page, initial response when a mouse is
clicked, and times taken to navigate between screens are some examples of
generic requirements. Specific requirements are those that depend on
implementation for a particular product and differ from one product to another
in a given domain. An example of specific performance requirement is the time
taken to withdraw cash in an ATM. During performance testing both generic
and specific requirements need to be tested.
As discussed earlier, the requirements for performance testing also
include the load pattern and resource availability and what is expected from the
product under different load conditions. Hence, while documenting the expected
response time, throughput, or any other performance factor, it is equally
important to map different load conditions as illustrated in the example.
Beyond a particular load, any product shows some degradation in
performance. While it is easy to understand this phenomenon, it will be very
difficult to do a performance test without knowing the degree of degradation
with respect to load conditions. Massive degradation in performance beyond a
degree is not acceptable by users. For example, ATM cash withdrawal taking
one hour to complete a transaction (regardless of reason or load) is not
acceptable. In such a case, the customer who requested the transaction would
have waited and left the ATM and the money may get disbursed to the person
who reaches the ATM next. The phenomenon of performance values remaining
within acceptable limits as the load increases is termed graceful performance
degradation. A performance test case for a product needs to validate this
graceful degradation as one requirement.
The next step involved in performance testing is writing test cases. As briefly
discussed earlier, a test case for performance testing should have the following
details defined.
1. List of operations or business transactions to be tested
2. Steps for executing those operations/transactions
3. List of product and OS parameters that impact the performance, and
their values
4. Loading pattern
5. Resources and their configurations (network, hardware, software
configurations)
6. The expected results (that is, expected response time, throughput,
latency)
7. The product versions/competitive products to be compared and
related information such as their corresponding fields
Performance test cases are repetitive in nature. These test cases are normally
executed repeatedly for different values of parameters, different load conditions,
different configurations, and so on. Hence, the details of what tests are to be repeated
for what values should be part of the test case documentation.
While testing the product for different load patterns, it is important to
increase the load or scalability gradually to avoid any unnecessary
failures. For example, if an ATM withdrawal fails for ten concurrent operations,
there is no point in trying it for 10,000 operations. The effort involved in testing
for 10 concurrent operations may be several times less than that of testing for
10,000 operations. Hence, it is a methodical approach to gradually increase the
concurrent operations by, say, 10, 100, 1000, 10,000, and so on, rather than
attempting 10,000 concurrent operations in the first iteration itself. The test case
documentation should clearly reflect this approach.
Performance testing is a laborious process involving time and effort. Not all
operations/business transactions can be included in performance testing.
Hence, all test cases that are part of performance testing have to be assigned
different priorities so that high-priority test cases can be completed before
others. The priority can be absolute, as indicated by the customers, or relative
within the test cases considered for performance testing. Absolute priority
is indicated by the requirements, and the test team normally assigns relative
priority. While executing the test cases, the absolute and relative priorities are
looked at and the test cases are sequenced accordingly.
Performance testing generally involves less effort for execution but more
effort for planning, data collection, and analysis. As discussed earlier, 100%
end-to-end automation is desirable for performance testing, and if that is
achieved, executing a performance test case may just mean running certain
automated scripts. However, the most effort-consuming part of execution is
usually data collection. Data corresponding to the following points needs to be
collected while executing performance tests.
1. Start and end time of test case execution
2. Log and trace/audit files of the product and operating system (for future
debugging and repeatability purposes)
3. Utilization of resources (CPU, memory, disk, network and so on) on a
periodic basis
4. Configuration of all environmental factors (hardware, software, and other
components)
5. The response time, throughput, latency, and so on, as specified in the test
case documentation, at regular intervals
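A small collector for the resource-utilization data in point 3 above can be built with the third-party psutil package; the package choice, file name, and sampling intervals here are one possible setup, not a prescribed tool.

import csv
import time

import psutil  # third-party: pip install psutil

def collect_utilization(path="perf_run.csv", samples=12, period=5):
    """Sample CPU, memory, and disk utilization periodically during a run."""
    with open(path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["timestamp", "cpu_pct", "mem_pct", "disk_pct"])
        for _ in range(samples):
            writer.writerow([
                time.time(),
                psutil.cpu_percent(interval=None),
                psutil.virtual_memory().percent,
                psutil.disk_usage("/").percent,
            ])
            time.sleep(period)

collect_utilization(samples=3, period=1)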
Another aspect involved in performance test execution is scenario
testing. A set of transactions/operations that are usually performed by the user
forms the scenario for performance testing. This particular testing is done to
ensure whether the mix of operations/transactions performed concurrently by
different users/machines meets the performance criteria. In real life, not all
users perform the same operation all the time, and hence these tests are performed.
For example, not all users withdraw cash from an ATM; some of them query for
account balance; some make deposits, and so on. In this case the scenario (with
different users executing different transactions) is executed with the existing
automation that is available, and related data is collected using the existing tools.
What performance a product delivers for different configurations of
hardware and network setup is another aspect that needs to be included
during execution. This requirement mandates the need for repeating the tests for
different configurations. This is referred to as configuration performance tests.
This test ensures that the performance of the product is compatible with
different hardware, utilizing the special nature of those configurations and
yielding the best performance possible. For a given configuration, the product
has to give the best possible performance, and if the configuration is better, it
has to get even better. The performance test case is repeated for each
row in the following table, and factors such as response time and throughput
are recorded and analyzed.
Once performance tests are executed and various data points are
collected, the next step is to plot them. As explained earlier, performance test
cases are repeated for different configurations and different values of
parameters. Hence, it makes sense to group them and plot them in the form of
graphs and charts. Plotting the data helps in making a quick analysis which
would otherwise be difficult to do with only the raw data.
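A typical plot groups response time against load, one series per configuration. The sketch below uses the matplotlib library with made-up sample numbers; the configuration labels and data points are illustrative only.

import matplotlib.pyplot as plt

loads = [10, 50, 100, 200]        # concurrent users
config_a = [120, 150, 240, 600]   # response time (ms), smaller server
config_b = [110, 125, 160, 280]   # response time (ms), larger server

plt.plot(loads, config_a, marker="o", label="2-CPU server")
plt.plot(loads, config_b, marker="s", label="4-CPU server")
plt.xlabel("Concurrent users")
plt.ylabel("Response time (ms)")
plt.title("Response time versus load")
plt.legend()
plt.savefig("response_time_vs_load.png")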
The majority of server-client, Internet, and database applications
store the data in a local high-speed buffer when a query is made. This enables
them to present the data quickly when the same request is made again; this is
called caching. The performance data needs to be differentiated according to
where the result is coming from: the server or the cache. The data points can be
kept as two different data sets, one for the cache and one for the server. Keeping
them as two different data sets enables the performance data to be extrapolated
in the future, based on the hit ratio expected in deployments.
For example, assume that data in a cache can produce a response time
of 1 microsecond while a server access takes 1000 microseconds, and that 90%
of the time a request is satisfied by the cache. Then the average response time is
(0.9 x 1) + (0.1 x 1000) = 100.9 microseconds. The mean response time is thus
calculated as a weighted average rather than a simple mean.
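The weighted-average calculation generalizes to any hit ratio, as the short function below shows; the numbers mirror the example above, and the function name is illustrative.

def mean_response_time(hit_ratio, cache_time_us, server_time_us):
    """Weighted average of cache-hit and server-access response times."""
    return hit_ratio * cache_time_us + (1 - hit_ratio) * server_time_us

# 90% cache hits at 1 microsecond, 10% server accesses at 1000 microseconds:
print(mean_response_time(0.9, 1, 1000))  # 100.9 microseconds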
Some "time initiated activities" of the product or background activities of
the operating system and network may have an effect on the performance data.
An example of one such activity is garbage collection/defragmentation in
memory management of the operating system or a compiler. When such
activities are initiated in the background, degradation in the performance may
be observed. Finding out such background events and separating those data
points and making an analysis would help in presenting the right performance
data.
Once the data sets are organized (after appropriate noise removal and
after appropriate refinement as mentioned above), the analysis of performance
data is carried out to conclude the following.
1. Whether performance of the product is consistent when tests are
executed multiple times
2. What performance can be expected for what type of configuration (both
hardware and software), resources
3. What parameters impact performance and how they can be used to
derive better performance (Please refer to the section on performance
tuning)
4. What is the effect of scenarios involving a mix of several operations on the
performance factors
5. What is the effect of product technologies such as caching on performance
improvements (Please refer to the section on performance tuning)
6. Up to what load the performance numbers are acceptable and whether the product exhibits "graceful degradation"
7. What is the optimum throughput/response time of the product for a set
of factors such as load, resources, and parameters
8. What performance requirements are met and how the performance looks
when compared to the previous version or the expectations set earlier or
the competition
9. Sometimes a high-end configuration may not be available for performance
testing. In that case, using the current set of performance data and the
charts that are available through performance testing, the performance
numbers to be expected from a high-end configuration should be
extrapolated or predicted.
There is one important point that needs to be noted while tuning the
product parameters. Performance tuning provides better results only for a
particular configuration and for certain transactions. It would have achieved
the performance goals, but it may have a side-effect on functionality or on some
non-functional aspects. Therefore, tuning may be counter-productive to other
situations or scenarios. This side-effect of tuning product parameters needs to
be analyzed and such side-effects also should be included as part of the
analysis of this performance-tuning exercise.
Tuning the OS parameters is another step towards getting better performance. The operating system provides various sets of parameters under different categories, and their values can be changed using the appropriate tools that come along with the operating system (for example, the Registry in MS-Windows can be edited using regedit.exe). These operating system parameters are grouped under different categories, as given below, to explain their impact.
1. File system related parameters (for example, number of open files
permitted)
2. Disk management parameters (for example, simultaneous disk
reads/writes)
3. Memory management parameters (for example, virtual memory page size
and number of pages)
4. Processor management parameters (for example, enabling/ disabling
processors in multiprocessor environment)
5. Network parameters (for example, setting TCP/IP time out)
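On Linux, for instance, many of these parameters are exposed as plain files under /proc/sys and can be read (and, with sufficient privileges, written) programmatically. The sketch below only reads a few well-known parameters; the exact set of tunables varies by platform and kernel version.

from pathlib import Path

def read_param(path):
    # Return the value of a kernel parameter, or None if it is unavailable.
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None

# File system: maximum number of open file handles, system-wide.
print("fs.file-max:", read_param("/proc/sys/fs/file-max"))
# Memory management: the kernel's tendency to swap (0-100).
print("vm.swappiness:", read_param("/proc/sys/vm/swappiness"))
# Network: TCP FIN timeout in seconds.
print("tcp_fin_timeout:", read_param("/proc/sys/net/ipv4/tcp_fin_timeout"))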
As explained earlier, not only each of the OS parameters but also their combinations have different effects on product performance. As before, the performance tests have to be repeated for different values of each parameter and for combinations of OS parameters. While repeating the tests, the OS parameters need to be tuned before application/product tuning is done.
There is one important point that needs to be remembered when tuning the OS parameters for improving product performance. The machine on which the parameter is tuned may have multiple products and applications running. Hence, tuning an OS parameter may give better results for the product under test, but may heavily impact the other products running on the same machine. OS parameters therefore need to be tuned only when the complete impact on all applications running in the machine is known, or when it is absolutely necessary and gives big performance advantages. Tuning OS parameters for small gains in performance is not the right thing to do.
Products are normally supported on more than one platform. Hence, the performance tuning procedure should consider the OS parameters and their effect on all supported platforms for the product.
Products being compared in a performance benchmark seldom have the same architecture, design, functionality, and code, and the customers and types of deployments can also be different. Hence, it is very difficult to compare two products on those aspects. End-user transactions/scenarios could be one approach for comparison.
be one approach for comparison. In general, an independent test team or an
independent organization not related to the organizations of the products being
compared does performance benchmarking. This does away with any bias in
the test. The person doing the performance benchmarking needs to have the
expertise in all the products being compared for the tests to be executed
successfully. The steps involved in performance benchmarking are the
following:
1. Identifying the transactions/scenarios and the test configuration
2. Comparing the performance of different products
3. Tuning the parameters of the products being compared fairly to deliver
the best performance
4. Publishing the results of performance benchmarking
As mentioned earlier, as the first step, comparable (apples-to-apples)
transactions/scenarios are selected for performance benchmarking. Normally,
the configuration details are determined well in advance and hence test cases
are not repeated for different configurations. Generally, the test cases for all the
products being compared are executed in the same test bed. However, two to
three configurations are considered for performance benchmarking just to
ensure that the testing provides the breadth required to cover realistic
scenarios.
Once the tests are executed, the next step is to compare the results. This
is where the understanding of the products being compared becomes essential.
Equal expertise level in all the products is desirable for the person doing the
tests. The tunable parameters for the various products may be completely
different and understanding those parameters and their impact on performance
is very important in doing a fair comparison of results. This is one place where
bias can come in. A well tuned product, A, may be compared with a product B
with no parameter tuning, to prove that the product A performs better than B.
It is important that in performance benchmarking all products should be tuned
to the same degree.
From the point of view of a specific product, there could be three outcomes from performance benchmarking. The first outcome is positive, where a set of transactions/scenarios outperforms the competition. The second outcome is neutral, where a set of transactions is comparable with that of the competition. The third outcome is negative, where a set of transactions under-performs compared to the competition. The last outcome may be detrimental to the success of the product; hence, the performance tuning exercise described in the previous section needs to be performed for this set of transactions, using the same configuration, internally by the product organization. If tuning helps, it at least brings down the criticality of the failure; otherwise the performance defects need to be fixed and a subset of the test cases for performance benchmarking repeated. Even though tuning was described here for the negative outcome, it need not be limited to that situation; tuning can be repeated for positive, neutral, and negative results alike to derive the best performance.
Repeating the performance tuning may not always be possible. If neutral agencies (which conduct the benchmarks) are involved, they may just bring out the apples-to-apples comparison and may not do tuning. In such cases, the product's own test teams take care of repeating the tests.
The results of performance benchmarking are published. There are two types of publications involved. One is an internal, confidential publication to product teams, containing all three outcomes described above and the recommended set of actions. The positive outcomes of performance benchmarking are normally published as marketing collateral, which serves as a sales tool for the product. Benchmarks conducted by independent organizations are also published as audited benchmarks.
A typical configuration denotes one in which the product meets the performance requirements of the required load pattern and can also handle a slight increase in that load pattern. A special configuration denotes that capacity planning was done considering all future requirements.
There are two techniques that play a major role in capacity planning: load balancing and high availability. Load balancing ensures that the multiple machines available are used equally to service transactions, so that adding more machines allows more load to be handled by the product. Machine clusters are used to ensure availability: in a cluster there are multiple machines with shared data, so that if one machine goes down, its transactions can be handled by another machine in the cluster. When doing capacity planning, both load balancing and availability factors are included to prescribe the desired configuration.
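As a minimal illustration of the load-balancing idea (the server names are hypothetical), a round-robin dispatcher spreads incoming transactions evenly across the machines in a cluster, and a machine that goes down can simply be removed from the rotation.

import itertools

class RoundRobinBalancer:
    # Distribute transactions evenly across a pool of servers (sketch only).
    def __init__(self, servers):
        self.servers = list(servers)
        self._cycle = itertools.cycle(self.servers)

    def next_server(self):
        return next(self._cycle)

    def remove(self, server):
        # A machine that goes down leaves the rotation; the remaining
        # machines in the cluster handle its transactions.
        self.servers.remove(server)
        self._cycle = itertools.cycle(self.servers)

balancer = RoundRobinBalancer(["app-server-1", "app-server-2", "app-server-3"])
for _ in range(4):
    print(balancer.next_server())   # app-server-1, 2, 3, 1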
The majority of capacity planning exercises are only interpretations of data and extrapolations of the available information. A minor mistake in the analysis of performance results or in extrapolation may cause a deviation from expectations when the product is used in deployments. Moreover, capacity planning is based on performance test data generated in the test lab, which is only a simulated environment. In real-life deployment, there could be several other parameters that impact product performance. Because of these unforeseen factors, apart from the skills mentioned earlier, experience of real-world data and usage patterns is needed for the capacity planning exercise.
Check Your Progress: Model Answers
1. Performance compared to the previous release of the same product; performance compared to competitive products; performance compared to absolute numbers derived from actual need; performance numbers derived from architecture and design
2. List of operations or business transactions to be tested
Steps for executing those operations/transactions
List of product and OS parameters that impact performance, and their values
Loading pattern
Resources and their configurations (network, hardware, software configurations)
The expected results (that is, expected response time, throughput, latency)
The product versions/competitive products to be compared, and related information such as their corresponding details
3. Performance testing is repetitive.
Performance test cases cannot be effective without automation and in
most cases it is, in fact, almost impossible to do performance testing
without automation.
The results of performance testing need to be accurate, and manually
calculating the response time, throughput, and so on can introduce
inaccuracy.
Performance testing takes into account several factors. There are far too many permutations and combinations of those factors, and it is difficult to remember and use them all if the tests are done manually.
The analysis of performance results and failures needs to take into account related information such as resource utilization, log files, trace files, and so on, which are collected at regular intervals. It is impossible to do this testing and perform the book-keeping of all related information and analysis manually.
LESSON 18
There are two types of tools that can be used for performance testing: functional performance tools and load testing tools.
Functional performance tools help in recording and playing back transactions and obtaining performance numbers. This kind of test generally involves very few machines. Load testing tools simulate the load conditions for performance testing without actually having that many users or machines. The load testing tools simplify the complexities involved in creating the load, and without such tools it may be impossible to perform these kinds of tests. As mentioned earlier, this is only a simulated load, and real-life experience may vary from the simulation.
We list below some popular performance tools:
• Functional performance tools
o WinRunner from Mercury
o QA Partner from Compuware
o SilkTest from Segue
• Load testing tools
o LoadRunner from Mercury
o QALoad from Compuware
o SilkPerformer from Segue
There are many vendors who sell these performance tools. The references
at the end of the book point to some of the popular tools.
Performance and load tools can only help in getting performance
numbers. The utilization of resources is another important parameter that
needs to be collected. "Windows Task Manager" and "top" in Linux are examples
of tools that help in collecting resource utilization. Network performance
monitoring tools are available with almost all operating systems today to collect
network data.
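As one concrete (and entirely optional) way to automate this collection, the sketch below uses the third-party psutil package, which exposes CPU, memory, and network counters in a portable way; the one-second sampling interval is arbitrary.

# Sampling resource utilization alongside a performance test run.
# Requires the third-party psutil package (pip install psutil).
import psutil

samples = []
for _ in range(5):                      # five samples, one second apart
    samples.append({
        "cpu_percent": psutil.cpu_percent(interval=1),
        "mem_percent": psutil.virtual_memory().percent,
        "net_bytes_sent": psutil.net_io_counters().bytes_sent,
    })

for sample in samples:
    print(sample)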
Discuss on WinRunner.
Notes: a) Write your answer in the space given below
b) Check your answer with the one given at the end of this lesson.
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
Performance testing follows the same process as any other type of testing. The only difference is in the level of detail and analysis required. As mentioned earlier, the effort involved in performance testing is greater and the tests are generally repeated several times. The increased effort reflects in increased costs, as the resources needed for performance testing are quite high. A major challenge in performance testing is getting the process right so that the effort can be minimized. A simple process for performance testing tries to address these aspects.
Ever-changing performance requirements are a serious threat to the product, as performance can only be improved marginally by fixing the code. As mentioned earlier, the majority of performance issues require rework or changes in architecture and design. Hence, it is important to collect the requirements for performance early in the life cycle and address them, because changes to architecture and design late in the cycle are expensive. While collecting requirements for performance testing, it is important to decide whether they are testable, that is, to ensure that performance requirements are quantified and can be validated in an objective way. If so, the quantified expectation of performance is documented. Making the requirements testable and measurable is the first activity needed for the success of performance testing.
Figure 18.1 Process of performance testing
The next step in the performance testing process in Figure 18.1 is to
create a performance test plan. This test plan needs to have the following
details.
1. Resource requirements All additional resources that are specifically
needed for performance testing need to be planned and obtained.
Normally these resources are obtained, used for performance test, and
released after performance testing is over. Hence, the resources need to
be included as part of the planning and tracked.
2. Test bed (simulated and real life), test-lab setup The test lab, with all
required equipment and software configuration, has to be set up prior to
execution. Performance testing requires a large number of resources and
requires special configurations. Hence, setting up both the simulated
and real-life environment is time consuming and any mistake in the test-
bed setup may mean that the complete performance tests have to be repeated. Hence, it has to be a part of the planning exercise and tracked.
3. Responsibilities Performance defects, as explained earlier, may cause
changes to architecture, design, and code. Additionally, the teams facing
the customers normally communicate requirements for performance.
Multiple teams are involved in the successful execution of performance
tests and all the teams and people performing different roles need to
work together if the objectives of performance have to be met. Hence, a
matrix containing responsibilities must be worked out as part of the
performance test plan and communicated across all teams.
4. Setting up product traces, audit trails, and logs (external and internal) Performance test results need to be associated with traces and audit trails in order to analyze the results and defects. Which traces and audit trails have to be collected is planned in advance as part of the test plan. This must be planned in advance, because enabling too many traces and audit trails may itself start impacting the performance results.
5. Entry and exit criteria Performance tests require a stable product due
to its complexity and the accuracy that is needed. Changes to the
product affect performance numbers and may mean that the tests have
to be repeated. It will be counter-productive to execute performance test
cases before the product is stable or when changes are being made.
Hence, the performance test execution normally starts after the product
meets a set of criteria. The set of criteria to be met are defined well in
advance and documented as part of the performance test plan. Similarly,
a set of exit criteria is defined to conclude the results of performance
tests.
Designing and automating the test cases form the next step in the performance test process. Automation deserves a special mention at this step because it is almost impossible to perform performance testing without automation.
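A bare-bones example of why automation matters is shown below: the harness (the transaction function is a stand-in, not a real product operation) repeats a transaction many times and records accurate response times, book-keeping that would be impractical to do by hand.

import time
import statistics

def run_performance_test(transaction, repetitions=100):
    # Execute a transaction repeatedly and summarize its response times.
    times_ms = []
    for _ in range(repetitions):
        start = time.perf_counter()
        transaction()
        times_ms.append((time.perf_counter() - start) * 1000)
    return {
        "mean_ms": statistics.mean(times_ms),
        "p95_ms": sorted(times_ms)[int(0.95 * len(times_ms)) - 1],
        "max_ms": max(times_ms),
    }

def sample_transaction():
    # Stand-in workload; a real test would invoke the product under test.
    sum(range(10000))

print(run_performance_test(sample_transaction))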
Entry and exit criteria play a major role in the process of performance
test execution. At regular intervals during product development, the entry
criteria are evaluated and the test is started if those criteria are met. There can
be a separate set of criteria for each of the performance test cases. The entry
criteria need to be evaluated at regular intervals since starting the tests early is
counter-productive and starting late may mean that the performance objective
is not met on time before the release. At the end of performance test execution,
the product is evaluated to see whether it met all the exit criteria. If some of the
criteria are not met, improvements are made to the product and the test cases
corresponding to the exit criteria are re-executed with an objective to fill the
gap. This process is repeated till all the exit criteria are met.
Each of the process steps for performance tests described above is critical because of the factors involved (that is, cost, effort, time, and effectiveness). Hence, keeping a strong process for performance testing provides a high return on investment.
18.3 CHALLENGES
Performance testing requires several specialized skills, and it is difficult to find people who have all of them. Training the engineers on these skills and making them available for a long duration for doing performance testing will help in meeting these skill requirements.
Performance testing requires a large number and amount of resources such as hardware, software, effort, time, tools, and people. Even large organizations find the resources needed to meet the objectives of performance testing scarce, and even when they are available, it is only for a short duration. This is yet another challenge in performance testing. Looking at the resources available and trying to meet as many objectives as possible is what is expected from the teams executing performance tests.
Performance test results need to reflect the real-life environment and expectations. But because the tools only simulate the environment, the test lab works under controlled conditions, and the data sets may not have all fields populated the same way as the customer's, reproducing the performance test results in real-life customer deployments is a big challenge. Taking adequate care to create a test bed as close as possible to a customer deployment is another expectation of performance tests.
Selecting the right tool for the performance testing is another challenge.
There are many tools available for performance testing but not all of them meet
all the requirements. Moreover, performance test tools are expensive and
require additional resources to install and use. Performance tools also expect
the test engineers to learn additional meta-languages and scripts. This throws
up another challenge for performance testing.
Interfacing with different teams, which include a set of customers, is yet another challenge in performance testing. Not only the customers but also internal groups such as architects and the development teams set requirements for performance tests. Performance testing is conducted to meet the expectations of customers, architects, and the development team. As a business case, the performance of the product also needs to match up with the competition. As expectations keep growing from all directions, it is difficult to meet all of them at one go. Sustained effort is needed if the majority of performance expectations are to be met.
Lack of seriousness about performance tests on the part of the management and development team is another challenge. Once all functionality is working fine in a product, it is often assumed that the product is ready to ship. For the various reasons specified earlier, performance tests are conducted only after the features are stable, so the defects that come out of these tests need to be taken very seriously by the management. Because it may be too late to fix some defects, or because of release pressures, or because fixes needed in design and architecture may require a big regression effort, some of the defects from these tests are generally postponed to the next release. This defeats the purpose of performance tests. A high degree of management commitment and a directive to fix performance defects before product release are needed for the successful execution of performance tests.
b) Check your answer with the one given at the end of this lesson.
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
18.4 BENEFITS OF AUTOMATED TESTING
1. Fast - WinRunner runs tests significantly faster than
human users.
2. Reliable - Tests perform precisely the same operations each
time they are run, thereby eliminating human error.
3. Repeatable - You can test how the software reacts under repeated
execution of the same operations.
4. Programmable - You can program sophisticated tests that bring out
hidden information from the application.
5. Comprehensive - You can build a suite of tests that covers every
feature in your application.
6. Reusable - You can reuse tests on different versions of an
application, even if the user interface changes.
LESSON 19
REGRESSION TESTING
Contents
19.0 Aims and Objectives
19.1 Introduction
19.2 What is Regression Testing?
19.3 Types of Regression Testing
19.4 When to do Regression Testing?
19.5 Strategies of Regression Testing
19.6 Let Us Sum Up
In this lesson, we introduce regression testing and its types. When regression testing must be done is covered in a separate section. The reader is expected to learn to decide when regression testing is needed and to write test cases for it.
19.1 INTRODUCTION
Regression testing means rerunning test cases from existing test suites to build
confidence that software changes have no unintended side-effects. The “ideal”
process would be to create an extensive test suite and run it after each and
every change. Unfortunately, for many projects this is just impossible because
test suites are too large, because changes come in too fast, because humans are
in the testing loop, because scarce, highly in-demand simulation laboratories
are needed, or because testing must be done on many different hardware and
OS platforms.
Researchers have tried to make regression testing more effective and efficient by developing regression test selection (RTS) techniques, but many problems remain, such as:
• Unpredictable performance. RTS techniques sometimes save time and
money, but they sometimes select most or all of the original test cases.
Thus, developers using RTS techniques can find themselves worse off for
having done so.
• Incompatible process assumptions. Testing time is often limited (e.g.,
must be done overnight). RTS techniques do not consider such
constraints and, therefore, can and do select more test cases than can be
run.
• Inappropriate evaluation models. RTS techniques try to maximize
average regression testing performance rather than optimize aggregate
performance over many testing sessions. However, companies that test
frequently might accept less effective, but cheaper individual testing
sessions if the system would, nonetheless, be well-tested over some short
period of time.
These and other issues have not been adequately considered in current
research, yet they strongly affect the applicability of proposed regression testing
processes. Moreover, we believe that solutions to these problems can be
exploited, singly and in combination, to dramatically improve the costs and
benefits of the regression testing process.
A software product may need changes to fix defects or to add new functionality. Anytime such changes are made, it is important to ensure that
1. The changes or additions work as designed; and
2. The changes or additions do not break something that is already working
and should continue to work.
Regression testing is designed to address the above two purposes. Let us illustrate this with a simple example.
Assume that in a given release of a product, there were three defects –
D1, D2, and D3. When these defects are reported, presumably the development
team will fix these defects and the testing team will perform tests to ensure that
these defects are indeed fixed. When the customers start using the product
(modified to fix defects D1, D2, and D3), they may encounter new defects D4 and D5. Again, the development and testing teams will fix and test these new defect fixes. But in the process of fixing D4 and D5, as an unintended side-effect, D1
may resurface. Thus, the testing team should not only ensure that the fixes
take care of the defects they are supposed to fix but also that they do not break
anything else that was already working.
Regression testing enables the test team to meet this objective. Regression testing is important in today's context since software is released very often to keep up with the competition and increasing customer awareness. It is essential to make quick and frequent releases while also delivering stable software. Regression testing ensures that any new feature introduced to the existing product does not adversely affect the current functionality.
Regression testing follows a selective re-testing technique. Whenever defect fixes are done, the test team selects a set of test cases that need to be run to verify those fixes. An impact analysis is done to find out what areas may be impacted by the defect fixes, and based on this analysis, some more test cases are selected to cover the impacted areas. Since this testing technique focuses on the reuse of existing test cases that have already been executed, it is called selective re-testing. There may be situations where new test cases need to be developed to take care of some impacted areas. However, by and large, regression testing reuses the test cases that are available, as it focuses on testing features that are already available and have been tested at least once.
19.3 TYPES OF REGRESSION TESTING
Before going into the types of regression testing, let us understand what
a "build" means. When internal or external test teams or customers begin using
a product, they report defects. These defects are analyzed by each developer
who makes individual defect fixes. The developers then do appropriate unit
testing and check the defect fixes into a Configuration Management (CM)
System. The source code for the complete product is then compiled and these
defect fixes along with the existing features get consolidated into the build. A
build thus becomes an aggregation of all the defect fixes and features that are
present in the product.
There are two types of regression testing in practice.
1. Regular regression testing
2. Final regression testing
Regular regression testing is done between test cycles to ensure that the defect fixes that have been made, and the functionality that was working in the earlier test cycles, continue to work. Regular regression testing can use more than one product build for the test cases to be executed.
A "final regression testing" is done to validate the final build before
release. The CM engineer delivers the final build with the media and other
contents exactly as it would go to the customer. The final regression test cycle
is conducted for a specific period of duration, which is mutually agreed upon
between the development and testing teams. This is called the" cook time" for
regression testing. Cook time is necessary to keep testing the product for
certain duration, since some of the defects (for example, Memory leaks) can be
unearthed only after the product has been used for certain time duration. The
product is continuously exercised for the complete duration of the cook time to
ensure that such time-bound defects are identified. Some of the test cases are
repeated to find out whether there are failures in the final product that will
reach the customer. All the defect fixes for the release should have been
completed for the build used for the final regression test cycle. The final
regression test cycle is more critical than any other type or phase of testing, as
this is the only testing that ensures the same build of the product that was
tested reaches the customer.
19.4 WHEN TO DO REGRESSION TESTING?
19.5 STRATEGIES OF REGRESSION TESTING
Any time you modify an implementation within a program, you should also do
regression testing. You can do so by rerunning existing tests against the
modified code to determine whether the changes break anything that worked
prior to the change and by writing new tests where necessary. Adequate
coverage without wasting time should be a primary consideration when
conducting regression tests. Try to spend as little time as possible doing
regression testing without reducing the probability that you will detect new
failures in old, already tested code.
Some strategies and factors to consider during this process include the
following:
• Test fixed bugs promptly. The programmer might have handled the
symptoms but not have gotten to the underlying cause.
• Watch for side effects of fixes. The bug itself might be fixed but the fix
might create other bugs.
• Write a regression test for each bug fixed (a small sketch follows this list).
• If two or more tests are similar, determine which is less effective and get
rid of it.
• Identify tests that the program consistently passes and archive them.
• Focus on functional issues, not those related to design.
• Make changes (small and large) to data and find any resulting
corruption.
• Trace the effects of the changes on program memory.
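As a small illustration of the "write a regression test for each bug fixed" strategy above (the function name and defect number are hypothetical), a test is pinned to the exact input that exposed the bug, so the fix is re-verified on every regression run.

# Regression test for a (hypothetical) fixed defect, in pytest style.
# Defect D42: parse_quantity("  7 ") crashed on surrounding whitespace.

def parse_quantity(text):
    # Production code after the fix: tolerate surrounding whitespace.
    return int(text.strip())

def test_parse_quantity_regression_d42():
    # The exact failing input from the defect report is captured here, so
    # any reintroduction of the bug is caught by the regression suite.
    assert parse_quantity("  7 ") == 7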
Check Your Progress: Model Answers
1. Regression testing ensures that any new feature introduced to the existing product does not adversely affect the current functionality.
2. A "final regression testing" is done to validate the final build before
release. The CM engineer delivers the final build with the media and
other contents exactly as it would go to the customer. The final
regression test cycle is conducted for a specific period of duration, which
is mutually agreed upon between the development and testing teams.
This is called the" cook time" for regression testing.
3. A defect tracking system is used to communicate the status of defect
fixes amongst the various stake holders. When a developer fixes a defect,
the defect is sent back to the test engineer for verification using the
defect tracking system. The test engineer needs to take the appropriate
action of closing the defect if it is fixed or reopening it if it has not been
fixed properly.
LESSON 20
Check your progress 1
20.1.2 UNDERSTANDING THE CRITERIA FOR SELECTING THE TEST
CASES
the cause of the failure. It is also recommended that the regular test cycles
before regression testing should have the right mix of both positive and negative
test cases.
The selection of test cases for regression testing depends more on the
impact of defect fixes than the criticality of the defect itself. A minor defect can
result in a major side-effect and a defect fix for a critical defect can have little or
minor side-effect. Hence the test engineer needs to balance these aspects while
selecting test cases for regression testing.
Selecting regression test cases is a continuous process. Each time a set
of regression tests (also called regression test bed) is to be executed, the test
cases need to be evaluated for their suitability, based on the above conditions.
When the test cases have to be selected dynamically for each regression run, it is worthwhile to plan for regression testing from the beginning of the project, even before the test cycles start. To enable choosing the right tests for a regression run, the test cases can be classified into various priorities based on importance and customer usage. As an example, we can classify the test cases into three categories.
Priority-0 These test cases can be called sanity test cases, which check basic functionality and are run for accepting the build for further testing. They are also run when a product goes through a major change. These test cases deliver a very high project value to both the product development teams and the customers.
Priority-1 These test cases use the basic and normal setup, and they deliver high project value to both the development team and the customers.
Priority-2 These test cases deliver moderate project value. They are
executed as part of the testing cycle and selected for regression testing
on a need basis.
Once the test cases are classified into different priorities, the test cases can be selected. There could be several right approaches to regression testing, which need to be decided on a case-to-case basis. There are several methodologies available in the industry for selecting regression test cases. The methodology discussed in this section takes into account the criticality and impact of defect fixes after the test cases have been classified into priorities as explained above.
Case 1 If the criticality and impact of the defect fixes are low, then it is enough that a test engineer selects a few test cases from the test case database (TCDB) - a repository that stores all the test cases that can be used for testing a product - and executes them. These test cases can fall under any priority (0, 1, or 2).
Case 2 If the criticality and the impact of the defect fixes are medium, then we need to execute all Priority-0 and Priority-1 test cases. If the defect fixes need a few additional test cases from Priority-2, those can also be selected and used for regression testing. Selecting Priority-2 test cases in this case is desirable but not necessary.
Case 3 If the criticality and impact of the defect fixes are high, then we need to execute all Priority-0, Priority-1, and a carefully selected subset of Priority-2 test cases.
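The three cases can be condensed into a short selection routine; this is only a sketch, assuming each test case in the TCDB carries a priority field and that the criticality/impact of the fixes has already been graded as low, medium, or high.

def select_regression_tests(tcdb, criticality_impact, extra_p2=None):
    # Pick test cases from the TCDB per the Case 1-3 methodology (sketch).
    if criticality_impact == "low":
        return tcdb[:5]                      # Case 1: a few cases, any priority
    p0_p1 = [tc for tc in tcdb if tc["priority"] in (0, 1)]
    if criticality_impact == "medium":
        return p0_p1 + (extra_p2 or [])      # Case 2: all P0/P1, optional P2
    # Case 3 (high): all P0/P1 plus a carefully chosen subset of P2.
    chosen_p2 = [tc for tc in tcdb if tc["priority"] == 2 and tc.get("selected")]
    return p0_p1 + chosen_p2

tcdb = [
    {"id": "TC1", "priority": 0},
    {"id": "TC2", "priority": 1},
    {"id": "TC3", "priority": 2, "selected": True},
    {"id": "TC4", "priority": 2},
]
print([tc["id"] for tc in select_regression_tests(tcdb, "high")])  # TC1, TC2, TC3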
The above methodology requires that the impact of defect fixes be
analyzed for all defects. This can be a time-consuming procedure. If, for some
reason, there is not enough time and the risk of not doing an impact analysis is
low, then the alternative methodologies given below can be considered.
Regress all For regression testing, all priority 0, 1, and 2 test cases are
rerun. This means all the test cases in the regression test bed/ suite are
executed.
Priority based regression For regression testing based on this priority,
all priority 0, 1, and 2 test cases are run in order, based on the
availability of time. Deciding when to stop the regression testing is based
on the availability of time.
Regress changes For regression testing using this methodology, code changes are compared to those of the last cycle of testing, and test cases are selected based on their impact on the code (gray box testing).
Random regression Random test cases are selected and executed for
this regression methodology.
Context based dynamic regression A few Priority-0 test cases are selected, and based on the context created by analyzing the outcome of those test cases after execution (for example, new defects found, boundary values), additional related test cases are selected to continue the regression testing.
An effective regression strategy is usually a combination of all of the
above and not necessarily any of these in isolation.
Check your progress 3
After selecting the test cases using the above methodology, the next step
is to prepare the test cases for execution. For proceeding with this step, a "test
case result history" is needed.
In a large product release involving several rounds of testing, it is very
important to record what test cases were executed in which cycle, their results,
and related information. This is called test case result history. This is part of
the test case database.
In many organizations, not all the types of testing or all the test cases are
repeated for each cycle. As mentioned, test case result history provides a wealth
of information on what test cases were executed and when. A method or procedure that uses the test case result history to indicate that certain test cases are selected for regression testing is called a reset procedure. Resetting a test case is nothing but setting a flag called "not run" or "execute again" in the test case database (TCDB). The reset procedure also hides the results of previous builds for those test cases, so that the test engineer executing them is not biased by the result history.
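In code, a reset procedure can be modeled as flipping a flag and hiding the prior results; the field names below are illustrative, not those of any particular TCDB implementation.

def reset_test_case(test_case):
    # Mark a test case for fresh execution and hide its result history.
    test_case["state"] = "execute again"   # the 'reset' flag in the TCDB
    test_case["visible_history"] = []      # hide prior results to avoid bias

tc = {"id": "TC7", "state": "not run", "visible_history": ["pass", "pass"]}
reset_test_case(tc)
print(tc)   # {'id': 'TC7', 'state': 'execute again', 'visible_history': []}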
Resetting test cases reduces the risk involved in testing defect fixes by
making the testers go through all the test cases and selecting appropriate test
cases based on the impact of those defect fixes. If there are defect fixes that are
done just before the release, the risk is more; hence, more test cases have to be
selected.
Resetting of test cases is not expected to be done often, and it needs to
be done with the following considerations in mind.
1. When there is a major change in the product.
2. When there is a change in the build procedure which affects the product.
3. Large release cycle where some test cases were not executed for a long
time.
4. When the product is in the final regression test cycle with a few selected
test cases.
5. When there is a situation where the expected results of the test cases could be quite different from the previous cycles.
6. The test cases relating to defect fixes and production problems need to be
evaluated release after release. In case they are found to be working fine,
they can be reset.
7. Whenever existing application functionality is removed, the related test
cases can be reset.
8. Test cases that consistently produce a positive result can be removed.
9. Test cases relating to a few negative test conditions (not producing any
defects) can be removed.
When the above guidelines are not met, we may want to rerun the test
cases rather than reset the results of the test cases. There are only a few
differences between the rerun and reset states in test cases. In both instances,
the test cases are executed but in the case of "reset" we can expect a different
result from what was obtained in the earlier cycles. In the case of rerun, the
test cases are expected to give the same test result as in the past; hence, the
management need not be unduly worried because those test cases are executed
as a formality and are not expected to reveal any major problem.
Test cases belonging to the "rerun" state help to gain confidence in the
product by testing for more time. Such test cases are not expected to fail or
affect the release. Test cases belonging to the "reset" state say that the test
results can be different from the past, and only after these test cases are
executed can we know the result of regression and the release status.
For example, if there is a change in the installation of a product, which
does not affect product functionality, then the change can be tested
independently by rerunning some test cases and the test cases do not have to
be "reset." Similarly, if there is a functionality that underwent a major change
(design or architecture or code revamp), then all the related test cases for that
functionality need to be "reset," and these test cases have to be executed again.
By resetting test cases, the test engineer has no way of knowing their past
results. This removes bias and forces the test engineer to pick up those test
cases and execute them.
A "rerun" state in a test case indicates low risk, while a "reset" state represents medium to high risk for a release. Hence, close to the product release, it is a good practice to execute the "reset" test cases first, before executing the "rerun" test cases.
Reset is also decided on the basis of the stability of the functionalities. If you are in Priority-1 testing and have reached a comfort level in Priority-0 (say, more than a 95% pass rate), then you do not reset Priority-0 test cases unless there is a major change. The same holds for Priority-1 test cases when you are in the Priority-2 test phase.
We will now illustrate the use of the "reset" flag for regression testing in the various phases.
Component test cycle phase Regression testing between component test cycles uses only Priority-0 test cases. For each build that enters the test, the build number is selected and all Priority-0 test cases are reset. The test cycle starts only if all Priority-0 test cases pass.
Integration testing phase After component testing is over, if regression is performed between integration test cycles, Priority-0 and Priority-1 test cases are executed. Priority-1 testing can use multiple builds. In this phase, the test cases are "reset" only if the criticality and impact of the defect fixes and feature additions are high. A "reset" procedure during this phase may affect all Priority-0 and Priority-1 test cases.
System test phase Priority-2 testing starts after all test cases in Priority-1 have been executed with an acceptable pass percentage as defined in the test plan. In this phase, the test cases are "reset" only if the criticality and impact of the defect fixes and feature additions are very high. A "reset" procedure during this phase may affect Priority-0, Priority-1, and Priority-2 test cases.
Why reset test cases Regression testing uses a good number of test cases
which have already been executed and are associated with some results and
assumptions on the result. A "reset" procedure gives a clear picture of how
much of testing still remains, and reflects the status of regression testing.
If test cases are not "reset," then the test engineers tend to report a
completion rate and other results based on previous builds. This is because of
the basic assumption that multiple builds are used in each phase of the testing
and a gut feeling that if something passed in the past builds, it will pass in
future builds also. Regression testing does not go with an assumption that
"Future is an extension of the past." Resetting as a procedure removes any bias
towards test cases because resetting test case results prevents the history of
test cases being viewed by testers.
Apart from test teams, regression test results are monitored by many
people in an organization as it is done after test cycles and sometimes very
close to the release date. Developers also monitor the results from regression as
they would like to know how well their defect fixes work in the product. Hence,
there is a need to understand a method for concluding the results of regression.
Since regression uses test cases that have already been executed more than once, it is expected that 100% of those test cases pass using the same build if the defect fixes are done right. In situations where the pass percentage is not 100, the test manager can compare with the previous results of the test case to conclude whether regression was successful or not.
If the result of a particular test case was a pass using the previous builds
and a fail in the current build, then regression has failed. A new build is
required and the testing must start from scratch after resetting the test
cases.
If the result of a particular test case was a fail using the previous builds
and a pass in the current build, then it is safe to assume the defect fixes
worked.
If the result of a particular test case was a fail using the previous builds
and a fail in the current build and if there are no defect fixes for this
particular test case, it may mean that the result of this test case should
not be considered for the pass percentage. This may also mean that such
test cases should not be selected for regression.
If the result of a particular test case was a fail using the previous builds but it works with a documented workaround, and if you are satisfied with the workaround, then it should be considered as a pass for both the system test cycle and the regression test cycle.
If you are not satisfied with the workaround, then it should be considered as a fail for the system test cycle but may be considered as a pass for the regression test cycle.
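These rules can be encoded directly, as in the sketch below, which assumes each test case records its result on the previous and the current build and whether an accepted workaround exists.

def regression_verdict(previous, current, workaround_accepted=False):
    # Map previous/current build results to a regression conclusion (sketch).
    if previous == "pass" and current == "fail":
        return "regression failed: new build needed, reset and restart"
    if previous == "fail" and current == "pass":
        return "defect fix worked"
    if previous == "fail" and current == "fail":
        return "exclude from pass percentage; reconsider selecting this case"
    if current == "fail-with-workaround":
        return ("pass" if workaround_accepted
                else "fail for system test, pass for regression")
    return "pass"

print(regression_verdict("pass", "fail"))   # regression failed: ...
print(regression_verdict("fail", "pass"))   # defect fix worked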
Once selected, the automated test cases in the regression test bed can be executed along with nightly builds to ensure that the quality of the product is maintained during product development phases.
It was mentioned earlier that knowledge of defects, products and their interdependences, and a well-structured methodology are all very important for selecting test cases. These points stress the need for selecting the right person for the right job. The most experienced or most talented person in the team may do a much better job of selecting the right test cases for regression than someone with less experience. Experience and talent bring in knowledge of fragile areas in the product and improve the impact analysis of defects.
Consider the following two pictures. In the first, a tiger has been put in a cage to prevent harm to human beings. In the second, the members of a family lie inside a mosquito net as protection from mosquitoes.
The same strategy has to be adopted for regression. Like the tiger in the
cage, all defects in the product have to be identified and fixed. This is what
"detecting defects in your product" means. All the testing types discussed in the
earlier chapters and regression testing adopt this technique to find each defect
and fix it.
The photograph of the family under the mosquito net signifies "protecting
your product from defects." The strategy followed here is of defect prevention.
There are many verification and quality assurance activities such as reviews
and inspections that try to do this.
Another aspect related to regression testing is "protecting your product from defect fixes." As discussed earlier, a defect that is classified as minor may create a major impact on the product when its fix gets into the code, much as a mosquito, despite its small size, can have a large impact on humans. Hence, it is a good practice to analyze the impact of defect fixes, irrespective of their size and criticality, before they are incorporated into the code. Such impact analysis is difficult owing to lack of time and the complex nature of products. Hence, it is a good practice to limit the amount of change in the product close to the release date. This prevents defects from seeping in through the defect-fixes route, just as mosquitoes can get into the net through a small hole: a hole made for one mosquito to get out also opens the door for new mosquitoes to come in. Fixing a problem without analyzing its impact can introduce a large number of defects in the product. Hence, it is important to insulate the product from defects as well as defect fixes.
If defects are detected and the product is protected from defects and
defect fixes, then regression testing becomes effective and efficient.
• A methodology for selecting test cases
• Resetting the test cases for test execution
• Concluding the results of a regression cycle
2. Selecting regression test cases is a continuous process. Each time a set
of regression tests (also called regression test bed) is to be executed, the
test cases need to be evaluated for their suitability, based on the above
conditions.
3. Context based dynamic regression A few Priority-0 test cases are selected, and based on the context created by analyzing the outcome of those test cases after execution (for example, new defects found, boundary values), additional related test cases are selected to continue the regression testing.
UNIT - V
LESSON 21
TEST PLANNING
Contents
21.0 Aims and Objectives
21.1 Test Planning
21.1.1 Preparing a Test Plan
21.1.2 Scope Management: Deciding Features to be Tested/Not Tested
21.1.3 Deciding Test Approach/Strategy
21.1.4 Setting up Criteria for Testing
21.1.5 Identifying Responsibilities, Staffing and Training Needs
21.1.6 Identifying Resource Requirements
21.1.7 Identifying Test Deliverables
21.1.8 Testing Tasks: Size and Effort Estimation
21.1.9 Activity Breakdown and Scheduling
21.1.10 Communication Management
21.1.11 Risk Management
21.2 Let Us Sum Up
In this lesson, we discuss test planning in detail and conclude by sharing some of the best practices in test management and execution.
Testing - like any project - should be driven by a plan. The test plan acts
as the anchor for the execution, tracking, and reporting of the entire testing
project and covers
1. What needs to be tested - the scope of testing, including clear identification of what will be tested and what will not be tested.
2. How the testing is going to be performed - breaking down the testing into small and manageable tasks and identifying the strategies to be used for carrying out the tasks.
3. What resources are needed for testing - computer as well as human resources.
4. The time lines by which the testing activities will be performed.
5. Risks that may be faced in all of the above, with appropriate mitigation
and contingency plans.
As was explained in the earlier lessons, various testing teams perform testing in the various phases of testing. One single test plan can be prepared to cover all phases and all teams, or there can be separate plans for each phase or for each type of testing. For example, there need to be plans for unit testing, integration testing, performance testing, acceptance testing, and so on. They can all be part of a single plan or be covered by multiple plans. In situations where there are multiple test plans, there should be one plan that covers the activities common to all plans. This is called the master test plan.
Scope management pertains to specifying the scope of a project. For
testing, scope management entails
1. Understanding what constitutes a release of a product;
2. Breaking down the release into features;
3. Prioritizing the features for testing;
4. Deciding which features will be tested and which will not be; and
5. Gathering details to prepare for estimation of resources for testing.
It is always good to start from the end-goal or product-release perspective
and get a holistic picture of the entire product to decide the scope and priority
of testing. Usually, during the planning stages of a release, the features that
constitute the release are identified. For example, a particular release of an
inventory control system may introduce new features to automatically integrate
with supply chain management and to provide the user with various options of
costing. The testing teams should get involved early in the planning cycle and
understand the features. Knowing the features and understanding them from
the usage perspective will enable the testing team to prioritize the features for
testing.
The following factors drive the choice and prioritization of features to be
tested.
Features that are new and critical for the release The new features of a release set the expectations of the customers and must perform properly. These new features result in new program code and thus have a higher susceptibility and exposure to defects. Furthermore, these are likely to be areas where both the development and testing teams have to go through a learning curve. Hence, it makes sense to put these features on top of the priority list to be tested. This ensures that the key features get enough planning and learning time for testing and do not go out with inadequate testing. To get this prioritization right, the product marketing team and some select customers participate in identifying the features to be tested.
Features whose failures can be catastrophic Regardless of whether a feature
is new or not, any feature the failure of which can be catastrophic or produce
adverse business impact has to be high on the list of features to be tested. For
example, recovery mechanisms in a database will always have to be among the
most important features to be tested.
Features that are expected to be complex to test Early participation of the testing team can help identify features that are difficult to test. This can help in starting work on these features early and in lining up appropriate resources in time.
Features which are extensions of earlier features that have been defect prone As we saw in the lesson on regression testing, certain areas of code tend to be defect prone, and such areas need very thorough testing so that old defects do not creep in again. Such defect-prone features should be included for testing ahead of more stable features.
A product is not just a heterogeneous mixture of these features. These
features work together in various combinations and depend on several
environmental factors and execution conditions. The test plan should clearly
identify these combinations that will be tested.
Given the limitations on resources and time, it is likely that it will not be
possible to test all the combinations exhaustively. During planning time, a test
manager should also consciously identify the features or combinations that will
not be tested. This choice should balance the requirements of time and
resources while not exposing the customers to any serious defects. Thus, the
test plan should contain clear justifications of why certain combinations will
not be tested and what are the risks that may be faced by doing so.
Check your progress 2
Write notes on Scope Management.
Notes: a) Write your answer in the space given below
b) Check your answer with the one given at the end of this lesson.
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
Once we have this prioritized feature list, the next step is to drill down
into some more details of what needs to be tested, to enable estimation of size,
effort, and schedule. This includes identifying
1. What type of testing would you use for testing the functionality?
2. What are the configurations or scenarios for testing the features?
3. What integration testing would you do to ensure these features work
together?
4. What localization validations would be needed?
5. What "non-functional" tests would you need to do?
We have discussed various types of tests in earlier chapters of this book.
Each of these types has applicability and usefulness under certain conditions.
The test approach/strategy part of the test plan identifies the right type of
testing to effectively test a given feature or combination. The test strategy or
approach should result in identifying the right type of test for each of the
features or combinations. There should also be objective criteria for measuring
the success of a test.
Entry criteria ensure that the wasted effort of re-running tests after development delays (see the section on Risk Management below) is minimized. However, it is futile to run certain tests too early. The entry criteria for a test specify threshold criteria for each phase or type of test. There may also be entry criteria for the entire testing activity to start. The completion/exit criteria specify when a test cycle or a testing activity can be deemed complete. Without objective exit criteria, it is possible for testing to continue beyond the point of diminishing returns.
A test cycle or a test activity will not be an isolated, continuous activity that can be carried out at one go. It may have to be suspended at various points in time because it is not possible to proceed further; when it becomes possible to proceed, it has to be resumed. Suspension criteria specify when a test cycle or a test activity can be suspended, and resumption criteria specify when the suspended tests can be resumed. Some of the typical suspension criteria include
1. Encountering more than a certain number of defects, causing frequent
stoppage of testing activity
2. Hitting show stoppers that prevent further progress of testing (for example, if a database does not start, further tests of queries, data manipulation, and so on are simply not possible to execute); and
3. Developers releasing a new version which they advise should be used in
lieu of the product under test (because of some critical defect fixes).
When such conditions are addressed, the tests can resume.
In addition, responsibilities in terms of SLAs for responding to queries should also be addressed during the planning stage.
Staffing is done based on estimation of effort involved and the availability
of time for release. In order to ensure that the right tasks get executed, the
features and tasks are prioritized on the basis of effort, time, and importance.
People are assigned to tasks that achieve the best possible fit between
the requirements of the job and skills and experience levels needed to perform
that function. It may not always be possible to find the perfect fit between the
requirements and the skills available. In case there are gaps between the
requirements and availability of skills, they should be addressed with
appropriate training programs. It is important to plan for such training
programs upfront, as they usually get de-prioritized under project pressures.
As a part of planning for a testing project, the project manager (or test
manager) should provide estimates for the various hardware and software
resources required. Some of the following factors need to be considered.
1. Machine configuration (RAM, processor, disk, and so on) needed to run
the product under test
2. Overheads required by the test automation tool, if any
3. Supporting tools such as compilers, test data generators, configuration
management tools, and so on
4. The different configurations of the supporting software (for example, OS)
that must be present
5. Special requirements for running machine-intensive tests such as load
tests and performance tests
6. Appropriate number of licenses of all the software
In addition to all of the above, there are also other implied environmental
requirements that need to be satisfied. These include office space, support
functions (like HR), and so on.
Underestimation of these resources can lead to considerable slowing
down of the testing efforts and this can lead to delayed product release and to
de-motivated testing teams. However, being overly conservative and "safe" in
estimating these resources can prove to be unnecessarily expensive. Proper
estimation of these resources requires co-operation and teamwork among
different groups: the product development team, testing team, system
administration team, and senior management.
The test plan also identifies the deliverables that should come out of the
test cycle/testing activity. The deliverables include the following, all reviewed
and approved by the appropriate people.
1. The test plan itself (master test plan, and various other test plans for the
project)
2. Test case design specifications
3. Test cases, including any automation that is specified in the plan
4. Test logs produced by running the tests
5. Test summary reports
As we will see in the next section, a defect repository gives the status of
the defects reported in a product life cycle. Part of the deliverables of a test cycle
is to ensure that the defect repository is kept current. This includes entering
new defects in the repository and updating the status of defect fixes after
verification. We will see the contents of some of these deliverables in the later
part of this chapter.
This methodology of estimating size or complexity of an application is
comprehensive in terms of taking into account realistic factors. The
major challenge in this method is that it requires formal training and is
not easy to use. Furthermore, this method is not directly suited to
systems software type of projects.
3. A somewhat simpler representation of application size is the number of
screens, reports, or transactions. Each of these can be further classified
as "simple," "medium," or "complex." This classification can be based on
intuitive factors such as the number of fields in the screen, number of
validations to be done, and so on.
Extent of automation required When automation is involved, the size of work
to be done for testing increases. This is because, for automation, we should first
perform the basic test case design (identifying input data and expected results
by techniques like condition coverage, boundary value analysis, equivalence
partitioning, and so on.) and then scripting them into the programming
language of the test automation tool.
Number of platforms and inter-operability environments to be tested If a
particular product is to be tested under several different platforms or under
several different configurations then the size of the testing task increases. In
fact, as the number of platforms or touch points across different environments
increases, the amount of testing increases almost exponentially.
All the above size estimates pertain to "regular" test case development.
Estimation of size for regression testing involves considering the changes in the
product and other similar factors.
In order to have a better handle on the size estimate, the work to be done
is broken down into smaller and more manageable parts called work breakdown
structure (WBS) units. For a testing project, WBS units are typically test cases
for a given module, test cases for a given platform, and so on. This
decomposition breaks down the problem domain or the product into simpler
parts and is likely to reduce the uncertainty and unknown factors.
Size estimate is expressed in terms of any of the following.
1. Number of test cases
2. Number of test scenarios
3. Number of configurations to be tested
Size estimate provides an estimate of the actual ground to be covered for
testing. This acts as a primary input for estimating effort. Estimating effort is
important because often effort has a more direct influence on cost than size.
The other factors that drive the effort estimate are as follows.
Productivity data Productivity refers to the speed at which the various
activities of testing can be carried out. This is based on historical data available
in the organization. Productivity data can be further classified into the number
of test cases that can be developed per day (or some unit time), the number of
test cases that can be run per day, the number of pages of
documentation that can be tested per day, and so on. Having these fine-grained
productivity data enables better planning and increases the confidence level
and accuracy of the estimates.
Reuse opportunities If the test architecture has been designed keeping reuse
in mind, then the effort required to cover a given size of testing can come down.
For example, if the tests are designed in such a way that some of the earlier
tests can be reused, then the effort of test development decreases.
Robustness of processes Reuse is a specific example of process maturity of an
organization. Existence of well-defined processes will go a long way in reducing
the effort involved in any activity. For example, in an organization with higher
levels of process maturity, there are likely to be
1. Well-documented standards for writing test specifications, test scripts,
and so on;
2. Proven processes for performing functions such as reviews and audits;
3. Consistent ways of training people; and
4. Objective ways of measuring the effectiveness of compliance to processes.
All these reduce the need to reinvent the wheel and thus enable
reduction in the effort involved.
Effort estimate is derived from size estimate by taking the individual
WBS units and classifying them as "reusable," "modifications," and "new
development." For example, if parts of a test case can be reused from existing
test cases, then the effort involved in developing these would be close to zero. If,
on the other hand, a given test case is to be developed fully from scratch, it is
reasonable to assume that the effort would be the size of the test case divided
by productivity.
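One way to formalize this is sketched below; the reuse weights are
illustrative assumptions rather than values prescribed by the source.

\[
\text{Effort} = \sum_{u \in \text{WBS units}} w_u \times \frac{\text{Size}(u)}{\text{Productivity}}
\]

where the weight \(w_u\) is close to 0 for reusable units, between 0 and 1
for units needing modification, and 1 for new development.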
Effort estimate is given in person days, person months, or person years.
The effort estimate is then translated to a schedule estimate. We will address
scheduling in the next sub-section.
The schedule cannot be compressed simply by adding more people to work
on it, expecting the duration to come down proportionally. As stated, adding
more people to an already delayed project is a sure way of delaying the project
even further. This is because, when new people are added to a project, it
increases the communication overheads and it takes some time for the new
members to gel with the rest of the team. Furthermore, these WBS units cannot
be executed in any random order because there will be dependencies among the
activities. These dependencies can be external dependencies or internal
dependencies. External dependencies of an activity are beyond the control and
purview of the manager/person performing the activity. Some of the common
external dependencies are
1. Availability of the product from developers;
2. Hiring;
3. Training;
4. Acquisition of hardware/software required for training; and
5. Availability of translated message files for testing.
Internal dependencies are fully within the control of the manager/person
performing that activity. For example, some of the internal dependencies could
be
1. Completing the test specification
2. Coding/scripting the tests
3. Executing the tests
The testing activities will also face parallelism constraints that will
further restrict the activities that can be done at a time. For example, certain
tests cannot be run together because of conflicting conditions (for example,
requiring different versions of a component for testing) or a high-end machine
may have to be multiplexed across multiple tests.
Based on the dependencies and the parallelism possible, the test
activities are scheduled in a sequence that helps accomplish the activities in the
minimum possible time, while taking care of all the dependencies.
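As a concrete sketch (the activity names are illustrative, not taken from
any particular project), the standard-library graphlib module can order
test activities into "waves" that respect their dependencies; activities
within a wave could run in parallel, subject to the constraints above.

    from graphlib import TopologicalSorter

    # Hypothetical test activities mapped to the activities they depend on.
    dependencies = {
        "write test specification": set(),
        "script the tests": {"write test specification"},
        "product build available": set(),   # an external dependency
        "run functional tests": {"script the tests", "product build available"},
        "run load tests": {"script the tests", "product build available"},
    }

    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    wave = 1
    while sorter.is_active():
        ready = sorter.get_ready()   # activities whose dependencies are met
        # Everything in one wave could run in parallel, unless constraints
        # (conflicting test conditions, shared machines) forbid it.
        print(f"Wave {wave}: {sorted(ready)}")
        sorter.done(*ready)
        wave += 1

Running this prints three waves: specification writing and build
availability first, scripting next, and the two test executions last.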
21.1.10 RISK MANAGEMENT
Just like every project, testing projects also face risks. Risks are events
that could potentially affect a project's outcome. These events are normally
beyond the control of the project manager. Risk management involves
1. Identifying the possible risks;
2. Quantifying the risks;
3. Planning how to mitigate the risks; and
4. Responding to risks when they become a reality.
As some risks are identified and resolved, other risks may surface. Since
risks can arise at any time, risk management is essentially a cycle that goes
through the above four steps repeatedly.
Risk identification consists of identifying the possible risks that may hit a
project. Although there could potentially be many risks that can hit a project, the
risk identification step should focus on those risks that are more likely to
happen. The following are some of the common ways to identify risks in testing.
1. Use of checklists Over time, an organization may gather new learnings about
testing that can be captured in the form of a checklist. For example, if
during installation testing, it is found that a particular step of the
installation has repeatedly given problems, then the checklist can have
an explicit line item to check that particular problem. When checklists
are used for risk identification, there is also a great risk of the checklist
itself being out of date, thereby pointing to red herrings instead of risks!
2. Use of organizational history and metrics When an organization
collects and analyzes the various metrics, the information can provide
valuable insights into what possible risks can hit a project. For example,
the past effort variance in testing can give pointers to how much
contingency planning is required.
3. Informal networking across the industry The informal networking
across the industry can help in identifying risks that other organizations
have encountered.
Risk quantification deals with expressing the risk in numerical terms.
There are two components to the quantification of risk. One is the probability of
the risk happening and the other is the impact of the risk, if the risk happens.
For example, the occurrence of a low-priority defect may have a high
probability, but a low impact. However, a show stopper may have (hopefully!) a
low probability, but a very high impact (for both the customer and the vendor
organization). To quantify both of these into one number, risk exposure is used.
This is defined as the product of risk probability and risk impact. To make
comparisons easy, risk impact is expressed in monetary terms (for example, in
dollars).
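Expressed as a formula (the worked example that follows is hypothetical):

\[
\text{Risk exposure} = \text{Probability of risk} \times \text{Risk impact}
\]

For instance, a risk with a probability of 0.2 and a potential impact of
$50,000 has a risk exposure of $10,000, which can be compared directly
against the exposure of other risks.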
Risk mitigation planning deals with identifying alternative strategies to
combat a risk event, should that risk materialize. For example, a couple of
mitigation strategies for the risk of attrition are to spread the knowledge to
multiple people and to introduce organization-wide processes and standards. To
be better prepared to handle the effects of a risk, it is advisable to have multiple
mitigation strategies.
When the above three steps are carried out systematically and in a timely
manner, the organization would be in a better position to respond to the risks,
should the risks become a reality. When sufficient care is not given to these
initial steps, a project may find itself under immense pressure to react to a risk.
In such cases, the choices made may not be the most optimal or prudent, as
the choices are made under pressure.
The following are some of the common risks encountered in testing
projects and their characteristics.
Unclear requirements The success of testing depends a lot on knowing what
the correct expected behavior of the product under test is. When the
requirements to be satisfied by a product are not clearly documented, there is
ambiguity in how to interpret the results of a test. This could result in wrong
defects being reported or in the real defects being missed out. This will, in turn,
result in unnecessary and wasted cycles of communication between the
development and testing teams and consequent loss of time. One way to
minimize the impact of this risk is to ensure upfront participation of the testing
team during the requirements phase itself.
Schedule dependence The schedule of the testing team depends significantly
on the schedules of the development team. Thus, it becomes difficult for the
testing team to line up resources properly at the right time. The impact of this
risk is especially severe in cases where a testing team is shared across multiple
product groups or in a testing services organization. A possible mitigation
strategy against this risk is to identify a backup project for a testing resource.
Such a backup project may be one that could use an additional resource to
speed up execution but would not be unduly affected if the resource were not
available. An example of such a backup project is chipping in for speeding up
test automation.
Insufficient time for testing Throughout the book, we have stressed the
different types of testing and the different phases of testing. Though some of
these types of testing, such as white box testing, can happen early in the cycle,
most of the tests tend to happen closer to the product release. For example,
system testing and performance testing can happen only after the entire
product is ready and close to the release date. Usually these tests are resource
intensive for the testing team and, in addition, the defects that these tests
uncover are challenging for the developers to fix. As discussed in the chapter
on performance testing, fixing some of these defects could lead to changes in
architecture and design. Carrying out such changes late in the cycle may be
expensive or even impossible. Once the developers fix the defects, the testing
team would have even less time to complete the testing and would be under even
greater pressure. The use of the V model to at least shift the test design part of
the various test types to the earlier phases of the project can help in
anticipating the risks of tests failing at each level in a better manner. This in
turn could lead to a reduction in the last-minute crunch. The metric days
needed for release, when captured and calculated properly, can help in better
planning of the time required for testing.
"Show stopper" defects When the testing team reports defects, the dev-
elopment team has to fix them. Certain defects which are show stoppers may
prevent the testing team from proceeding further with testing until development fixes
such show stopper defects. Encountering such defects will have a double
impact on the testing team: Firstly, they will not be able to continue with the
testing and hence end up with idle time. Secondly, when the defects do get fixed
and the testing team restarts testing, they would have lost valuable time and
will be under tremendous pressure with the deadline drawing nearer. This risk of
show stopper defects can pose a big challenge to scheduling and resource
utilization of the testing teams. The mitigation strategies for this risk are similar
to those for the risk of dependence on development schedules.
Availability of skilled and motivated people for testing As we saw in People
Issues in Testing, hiring and motivating people in testing is a major challenge.
Hiring, retaining, and constantly upgrading the skills of testers in an organization are
vital. This is especially important for testing functions because of the tendency
of people to look for development positions.
Inability to get a test automation tool Manual testing is error prone and
labor intensive. Test automation alleviates some of these problems. However,
test automation tools are expensive. An organization may face the risk of not
being able to afford a test automation tool. This risk can in turn lead to less
effective and efficient testing as well as more attrition. One of the ways in which
organizations may try to reduce this risk is to develop in-house tools. However,
this approach could lead to an even greater risk of having a poorly written or
inadequately documented in-house tool.
These risks are not only potentially dangerous individually, but even
more dangerous when they occur in tandem. Unfortunately, often, these risks
do happen in tandem! A testing group plans its schedules based on
development schedules, development schedules slip, testing team resources
become idle, pressure builds, schedules slip, and the vicious cycle starts all
over again. It is important that these risks be caught early, before they create a
serious impact on the testing teams. Hence, we need to identify the symptoms
for each of these risks. These symptoms and their impacts need to be tracked
closely throughout the project.
21.2 LET US SUM UP
• Identifying the time required for each of the WBS activities, taking into
account the above two factors.
• Monitoring the progress in terms of time and effort
• Rebalancing schedules and resources as necessary
LESSON 22
TEST MANAGEMENT
Contents
22.0 Aims and Objectives
22.1 Test Management
22.2 Choice of Standards
22.3 Test Infrastructure Management
22.4 Test People Management
22.5 Integration with Product Release
22.6 Let Us Sum Up
In this lesson, we are going to introduce Test Management and its fundamental
concepts.
At the end of this lesson, you will be able to understand the choice of standards,
test infrastructure management, test people management, and integration with
a product release.
3. Test coding standards; and
4. Test reporting standards.
Naming and storage conventions for test artifacts Every test artifact (test
specification, test case, test results, and so on) has to be named appropriately
and meaningfully. Such naming conventions should enable
1. Easy identification of the product functionality that a set of tests is
intended for; and
2. Reverse mapping to identify the functionality corresponding to a given set
of tests.
This two-way mapping between tests and product functionality through
appropriate naming conventions will enable identification of appropriate tests to
be modified and run when product functionality changes.
In addition to file-naming conventions, the standards may also stipulate
the conventions for directory structures for tests. Such directory structures can
group logically related tests together (along with the related product
functionality). These directory structures are mapped into a configuration
management repository (discussed later in the chapter).
Documentation standards Most of the discussion on documentation and
coding standards pertains to automated testing. In the case of manual testing,
documentation standards correspond to specifying the user and system
responses at a level of detail that is consistent with the skill level of the
tester.
While naming and directory standards specify how a test entity is
represented externally, documentation standards specify how to capture
information about the tests within the test scripts themselves. Internal
documentation of test scripts is similar to internal documentation of program
code and should include the following.
1. Appropriate header level comments at the beginning of a file that outline
the functions to be served by the test.
2. Sufficient in-line comments, spread throughout the file, explaining the
functions served by the various parts of a test script. This is especially
needed for those parts of a test script that are difficult to understand or
have multiple levels of loops and iterations.
3. Up-to-date change history information, recording all the changes made to
the test file.
Without such detailed documentation, a person maintaining the test
scripts is forced to rely only on the actual test code or script to guess what the
test is supposed to do or what changes happened to the test scripts. This may
not give a true picture. Furthermore, it may place an undue dependence on the
person who originally wrote the tests.
Test coding standards Test coding standards go one level deeper into the tests
and enforce standards on how the tests themselves are written. The standards
may
1. Enforce the right type of initialization and clean-up that the test should
do to make the results independent of other tests;
2. Stipulate ways of naming variables within the scripts to make sure that
a reader consistently understands the purpose of a variable (for
example, instead of generic names such as i, j, and so on, the names
can be meaningful, such as network_init_flag);
3. Encourage reusability of test artifacts (for example, all tests should call
an initialization module init_env first, rather than use their own
initialization routines); and
4. Provide standard interfaces to external entities like the operating system,
hardware, and so on. For example, if tests are required to spawn
multiple OS processes, rather than have each of the tests directly spawn
the processes, the coding standards may dictate that they should all call
a standard function, say, create_os_process. By isolating the external
interfaces separately, the tests can be reasonably insulated from changes
to these lower-level layers. A minimal sketch of a test script that follows
such documentation and coding standards is given below.
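The sketch is illustrative only: the helper routines init_env and
create_os_process are defined inline here to keep it self-contained and
runnable, whereas the standards above would place them in a shared library.

    """Test: network configuration module -- initialization checks.

    Purpose: verify that network initialization sets the expected flag.
    Change history:
        2024-01-10  tester1  Initial version.
        2024-02-02  tester1  Added clean-up of the temporary work directory.
    """
    import os
    import subprocess
    import sys
    import tempfile

    def init_env():
        """Standard initialization routine that every test calls first."""
        workdir = tempfile.mkdtemp(prefix="test_")
        return {"workdir": workdir, "network_init_flag": True}

    def create_os_process(args):
        """Single standard interface for spawning OS processes."""
        return subprocess.run(args, capture_output=True, text=True)

    def test_network_init_flag():
        env = init_env()  # standard set-up rather than a private routine
        try:
            # Meaningful variable names (network_init_flag, not i or j).
            network_init_flag = env["network_init_flag"]
            # Spawn a process only through the standard wrapper.
            result = create_os_process([sys.executable, "-c", "print('ping')"])
            assert network_init_flag and result.returncode == 0
        finally:
            os.rmdir(env["workdir"])  # clean-up keeps tests independent

    if __name__ == "__main__":
        test_network_init_flag()
        print("PASS")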
Test reporting standards Since testing is tightly interlinked with product
quality, all the stakeholders must get a consistent and timely view of the
progress of tests. Test reporting standards address this issue. They provide
guidelines on the level of detail that should be present in the test reports, their
standard formats and contents, recipients of the report, and so on. We will
revisit this in more detail later in this chapter.
Internal standards provide a competitive edge to a testing organization
and act as a first-level insurance against employee turnover and attrition.
Internal standards help bring new test engineers up to speed rapidly. When
such consistent processes and standards are followed across an organization, it
brings about predictability and increases the confidence level one can have on
the quality of the final product. In addition, any anomalies can be brought to
light in a timely manner.
22.3 TEST INFRASTRUCTURE MANAGEMENT
Change control ensures that
1. Changes to test files are made in a controlled fashion and only with
proper approvals.
2. Changes made by one test engineer are not accidentally lost or
overwritten by other changes.
3. Each change produces a distinct version of the file that is recreatable at
any point of time.
4. At any point of time, everyone gets access to only the most recent version
of the test files (except in exceptional cases).
Version control ensures that the test scripts associated with a given release of a
product are base lined along with the product files. Base lining is akin to taking
a snapshot of the set of related files of a version, assigning a unique identifier to
this set. In future, when anyone wants to recreate the environment for the given
release, this label would enable him or her to do so.
The TCDB, defect repository, and SCM repository should complement each other
and work together in an integrated fashion. For example, the defect repository links the
defects, fixes, and tests. The files for all these will be in the SCM. The meta data
about the modified test files will be in the TCDB. Thus, starting with a given
defect, one can trace all the test cases that test the defect (from the TCDB) and
then find the corresponding test case files and source files from the SCM
repository.
Similarly, in order to decide which tests to run for a given regression run,
1. The defects recently fixed can be obtained from the defect repository and
tests for these can be obtained from the TCDB and included in the
regression tests.
2. The list of files changed since the last regression run can be obtained
from the SCM repository and the corresponding test files traced from the
TCDB.
3. The set of tests not run recently can be obtained from the TCDB and
these can become potential candidates to be run at certain frequencies.
A minimal sketch combining these three sources is given below.
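The following sketch shows, under illustrative assumptions, how the three
sources above can be combined; the in-memory dictionaries stand in for
queries against the defect repository, SCM repository, and TCDB, and all
names and data are hypothetical.

    # Stand-ins for repository queries; names and data are illustrative.
    recently_fixed_defects = {"D-101", "D-207"}                   # defect repository
    tests_for_defect = {"D-101": {"tc_login_03"},
                        "D-207": {"tc_query_11"}}                 # TCDB
    changed_files = {"query.c", "report.c"}                       # SCM repository
    tests_for_file = {"query.c": {"tc_query_11", "tc_query_12"},
                      "report.c": {"tc_report_01"}}               # TCDB
    days_since_last_run = {"tc_login_03": 2, "tc_query_11": 5,
                           "tc_report_01": 40, "tc_sched_07": 60} # TCDB

    def select_regression_tests(stale_after_days=30):
        selected = set()
        for defect in recently_fixed_defects:          # 1. tests for fixed defects
            selected |= tests_for_defect.get(defect, set())
        for path in changed_files:                     # 2. tests for changed files
            selected |= tests_for_file.get(path, set())
        for test, age in days_since_last_run.items():  # 3. tests not run recently
            if age > stale_after_days:
                selected.add(test)
        return sorted(selected)

    print(select_regression_tests())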
22.4 TEST PEOPLE MANAGEMENT
ensure that testing focuses on finding relevant and important defects
only.
3. Consistent definitions of the various priorities and severities of the
defects. This will bring in a shared vision between development and
testing teams, on the nature of the defects to focus on.
4. Communication mechanisms to the documentation group to ensure that
the documentation is kept in sync with the product in terms of known
defects, workarounds, and so on.
The purpose of the testing team is to identify the defects in the product
and the risks that could be faced by releasing the product with the existing
defects. Ultimately, the decision to release or not is a management decision,
dictated by market forces and weighing the business impact for the
organization and the customers.
LESSON 23
TEST PROCESS
Contents
23.0 Aims and Objectives
23.1 Test Process
23.2 Putting Together and Base lining a Test plan
23.3 Test Case Specification
23.4 Update of Traceability Matrix
23.5 Identifying possible Candidates for Automation
23.6 Developing and Base lining Test Cases
23.7 Executing Test Cases and keeping Traceability Matrix Current
23.8 Collecting and Analyzing Metrics
23.9 Preparing Test Summary Report
23.10 Recommending Product Release Criteria
23.11 Let Us Sum Up
In this lesson, we discuss the process of testing: how to put together and base
line a test plan, test case specification, the traceability matrix, developing and
executing test cases, and preparing the test summary report. At the end of this
lesson, you will be able to understand the test case specification and all the
above contents.
A test plan combines all the points discussed above into a single
document that acts as an anchor point for the entire testing project. A template
of a test plan is provided in Appendix B at the end of this chapter. Appendix A
gives a checklist of questions that are useful in arriving at a Test Plan.
An organization normally arrives at a template that is to be used across
the board. Each testing project puts together a test plan based on the template.
Should any changes be required in the template, then such a change is made
only after careful deliberations (and with appropriate approvals). The test plan
is reviewed by a designated set of competent people in the organization. It is
then approved by a competent authority, which is independent of the project
manager directly responsible for testing. After this, the test plan is base lined
into the configuration management repository. From then on, the base lined
test plan becomes the basis for running the testing project. Any significant
changes in the testing project should thereafter be reflected in the test plan and
the changed test plan base lined again in the configuration management
repository. In addition, periodically, any changes needed to the test plan
template are discussed among the different stakeholders so that it is kept
current and applicable to the testing teams.
Using the test plan as the basis, the testing team designs test case
specifications, which then becomes the basis for preparing individual test
cases. We have been using the term test cases freely throughout this book.
Formally, a test case is nothing but a series of steps executed on a product,
using a pre-defined set of input data, expected to produce a pre-defined set of
outputs, in a given environment. Hence, a test case specification should clearly
identify
1. The purpose of the test: This lists what feature or part the test is
intended for. The test case should follow the naming conventions (as
discussed earlier) that are consistent with the feature/module being
tested.
2. Items being tested, along with their version/release numbers as
appropriate.
3. Environment that needs to be set up for running the test case: This can
include the hardware environment setup, supporting software
environment setup (for example, setup of the operating system, database,
and so on), setup of the product under test (installation of the right
version, configuration, data initialization, and so on).
4. Input data to be used for the test case: The choice of input data will be
dependent on the test case itself and the technique followed in the test
case (for example, equivalence partitioning, boundary value analysis, and
so on). The actual values to be used for the various fields should be
specified unambiguously (for example, instead of saying "enter a three-
digit positive integer," it is better to say "enter 789"). If automated testing
is to be used, these values should be captured in a file and used, rather
than having to enter the data manually every time.
5. Steps to be followed to execute the test: If automated testing is used,
then, these steps are translated to the scripting language of the tool. If
the testing is manual, then the steps are detailed instructions that can
be used by a tester to execute the test. It is important to ensure that the
level of detail in documenting the steps is consistent with the skill and
expertise level of the person who will execute the tests.
6. The expected results that are considered to be "correct results." These
expected results can be what the user may see in the form of a GUI,
report, and so on and can be in the form of updates to persistent storage
in a database or in files.
7. A step to compare the actual results produced with the expected results:
This step should do an "intelligent" comparison of the expected and
actual results to highlight any discrepancies. By "intelligent" comparison,
we mean that the comparison should take care of "acceptable
differences" between the expected results and the actual results, like
terminal ID, user ID, system date, and so on.
8. Any relationship between this test and other tests: This can be in the
form of dependencies among the tests or the possibility of reuse across
the tests. A minimal sketch capturing these elements is given below.
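The sketch below captures the elements of a test case specification as a
simple data structure and shows one way to implement the "intelligent"
comparison of item 7; the field names, masking patterns, and data are
illustrative assumptions, not a prescribed format.

    import re
    from dataclasses import dataclass, field

    @dataclass
    class TestCaseSpec:
        purpose: str                 # feature or part the test is intended for
        items_under_test: list       # items with version/release numbers
        environment: dict            # hardware/software/product set-up
        input_data: dict             # concrete values, e.g. {"amount": "789"}
        steps: list                  # instructions or script references
        expected_results: list
        related_tests: list = field(default_factory=list)

    # "Intelligent" comparison: mask acceptable differences (system date,
    # user ID, terminal ID) before comparing expected and actual results.
    MASKS = [
        (re.compile(r"\d{4}-\d{2}-\d{2}"), "<DATE>"),
        (re.compile(r"user=\w+"), "user=<ID>"),
        (re.compile(r"term=\w+"), "term=<ID>"),
    ]

    def normalize(text):
        for pattern, placeholder in MASKS:
            text = pattern.sub(placeholder, text)
        return text

    def results_match(expected, actual):
        return normalize(expected) == normalize(actual)

    # The two reports differ only in masked fields, so they match.
    print(results_match("Report for user=alice on 2024-01-05",
                        "Report for user=bob on 2024-02-17"))   # True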
The test case design forms the basis for writing the test cases. Before writing
the test cases, a decision should be taken as to which tests are to be automated
and which should be run manually. Suffice it to say here that some of the criteria
will be used in deciding which scripts to automate include
1. Repetitive nature of the test;
2. Effort involved in automation;
3. Amount of manual intervention required for the test; and
4. Cost of automation tool.
The need for efficient software testing has always existed, because the users
expect software that works. The need for process improvement is apparent, due
to the number of defects delivered, and the time and money consumed in
testing.
We expect that the future will present even more extensive demands to the
testing process. We believe that users will demand better technical quality, and
we expect development organisations to demand less expensive testing.
The solution is not more people equipped with more tools. As already
mentioned, the solution is process improvement. The 'Testing Maturity Model'
(TMM) was developed with this purpose; it is an extension of the CMM.
The TMM is a perfectly good choice, and we will work with this and other
SPI models in the future.
Expand TMM.
Notes: a) Write your answer in the space given below
b) Check your answer with the one given at the end of this lesson.
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
Based on the test case specifications and the choice of candidates for
automation, test cases have to be developed. The development of test cases
entails translating the test specifications to a form from which the tests can be
executed. If a test case is a candidate for automation, then, this step requires
writing test scripts in the automation language. If the test case is a manual test
case, then test case writing maps to writing detailed step-by-step instructions
for executing the test and validating the results. In addition, the test case
should also capture the documentation for the changes made to the test case
since the original development. Hence, the test cases should also have change
history documentation, which specifies
1. What was the change;
2. Why the change was necessitated;
3. Who made the change;
4. When the change was made;
5. A brief description of how the change has been implemented; and
6. Other files affected by the change.
All the artifacts of test cases (the test scripts, inputs, expected
outputs, and so on) should be stored in the test case database and SCM, as
described earlier. Since these artifacts enter the SCM, they have to be reviewed
and approved by appropriate authorities before being base lined.
Notes: a) Write your answer in the space given below
b) Check your answer with the one given at the end of this lesson.
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
When tests are executed, information about test execution gets collected
in test logs and other files. The basic measurements from running the tests are
then converted to meaningful metrics by the use of appropriate transformations
and formulae, as discussed in Metrics and Measurements.
• Test Incident Report: detailing, for any test that failed, the actual
versus expected result, and other information intended to throw
light on why a test has failed. This document is deliberately
named as an incident report, and not a fault report. The reason is
that a discrepancy between expected and actual results can occur
for a number of reasons other than a fault in the system. These
include the expected results being wrong, the test being run
wrongly, or inconsistency in the requirements meaning that more
than one interpretation could be made. The report consists of all
details of the incident such as actual and expected results, when
it failed, and any supporting evidence that will help in its
resolution. The report will also include, if possible, an assessment
of the impact upon testing of an incident.
• Test Summary Report: A management report providing any
important information uncovered by the tests accomplished, and
including assessments of the quality of the testing effort, the
quality of the software system under test, and statistics derived
from Incident Reports. The report also records what testing was
done and how long it took, in order to improve any future test
planning. This final document is used to indicate whether the
software system under test is fit for purpose according to whether
or not it has met acceptance criteria defined by project
stakeholders.
made in the defect repository. Each defect has a unique ID and this is used to
identify the incident. The high impact test incidents (defects) are highlighted in
the test summary report.
Test cycle report As discussed, test projects take place in units of test cycles.
A test cycle entails planning and running certain tests in cycles, each cycle
using a different build of the product. As the product progresses through the
various cycles, it is expected to stabilize. A test cycle report, at the end of
each cycle, gives
1. A summary of the activities carried out during that cycle;
2. Defects that were uncovered during that cycle, based on their severity
and impact;
3. Progress from the previous cycle to the current cycle in terms of defects
fixed;
4. Outstanding defects that are yet to be fixed in this cycle; and
5. Any variations observed in effort or schedule (that can be used for future
planning).
Test summary report The final step in a test cycle is to recommend the
suitability of a product for release. A report that summarizes the results of a
test cycle is the test summary report.
There are two types of test summary reports.
1. Phase-wise test summary, which is produced at the end of every phase
2. Final test summary reports (which has all the details of all testing done
by all phases and teams, also called the "release test report")
LESSON 24
TESTING METRICS
Contents
24.0 Aims and Objectives
24.1 What are Metrics and Measurements?
24.2 Why Metrics in Testing?
24.3 Types of Metrics
24.4 Project Metrics
24.4.1 Effort Variance (Planned vs Actual)
24.4.2 Schedule Variance (Planned vs Actual)
24.4.3 Effort Distribution across Phases
24.5 Progress Metrics
24.5.1 Test Defect Metrics
24.5.2 Development Defect Metrics
24.6 Let Us Sum Up
3. Any pointers to how the data can be used for future planning and
continuous improvements.
Metrics are thus derived from measurements using appropriate formulae
or calculations. Obviously, the same set of measurements can help produce
different sets of metrics, of interest to different people.
From the above discussion, it is obvious that in order for a project's
performance to be tracked and its progress monitored effectively,
1. The right parameters must be measured; the parameters may pertain
to product or to process.
2. The right analysis must be done on the data measured, to draw
correct conclusions about the health of the product or process within
a project or organization.
3. The results of the analysis must be presented in an appropriate form
to the stakeholders to enable them to make the right decisions on
improving product or process quality (or any other relevant business
drivers).
Since the focus of this book is on testing and products under test, only
metrics related to testing and product are discussed in this chapter and not
those meant for process improvements.
Metrics and their analysis convey more meaning when related data
points are combined. Relating several data points and consolidating the result
in terms of charts and pictures simplifies the analysis and facilitates the use of
metrics for decision making.
Effort is the actual time that is spent on a particular activity or a phase.
Elapsed days is the difference between the start of an activity and the
completion of the activity. For example, ordering a product through the web
may involve five minutes of effort and three elapsed days. It is the packaging
and shipping that takes that much duration, not the time spent by the person
in ordering. However, in the schedule, this latency or delay needs to be entered
as three days. Of course, during these three days, the person who ordered the
product can get on to some other activity and do it simultaneously. In
general, effort is derived from productivity numbers, and elapsed days are the
number of days required to complete the set of activities. Elapsed days for a
complete set of activities become the schedule for the project. Collecting and
analyzing metrics involves effort and several steps.
The first step involved in a metrics program is to decide what
measurements are important and collect data accordingly. The effort spent on
testing, number of defects, and number of test cases, are some examples of
measurements. Depending on what the data is used for, the granularity of
measurement will vary.
While deciding what to measure, the following aspects need to be kept in
mind.
1. What is measured should be of relevance to what we are trying to
achieve. For testing functions, we would obviously be interested in the
effort spent on testing, number of test cases, number of defects
reported from test cases, and so on.
2. The entities measured should be natural and should not involve too
many overheads for measurements. If there are too many overheads
in making the measurements or if the measurements do not follow
naturally from the actual work being done, then the people who
supply the data may resist giving the measurement data (or even give
wrong data).
3. What is measured should be at the right level of granularity to satisfy
the objective for which the measurement is being made.
Let us look at the last point on granularity of data in more detail. The
different people who use the measurements may want to make inferences on
different dimensions. The level of granularity of data obtained depends on the
level of detail required by a specific audience. Hence the measurements, and the
metrics derived from them, will have to be at different levels for different
people. An approach involved in getting the granular detail is called data
drilling. Given in the next page is an example of a data drilling exercise. This is
what typically happens in many organizations when metrics/test reports are
presented and shows how different granularity of data is relevant for decision
making at different levels.
The conversation in the example continues till all questions are answered
or till the defects in focus become small in number and can be traced to
certain root causes. The depth to which data drilling happens depends on the
focus area of the discussion or need. Hence, it is important to provide as much
granularity in measurements as possible. In the above example, the
measurement was "number of defects."
Not all conversations involve just one measurement as in the example. A
set of measurements can be combined to generate metrics that will be explained
in further sections of this chapter. An example question involving multiple
measurements is "How many test cases produced the 40 defects in data
migration involving different schema?" There are two measurements involved in
this question: the number of test cases and the number of defects. Hence, the
second step involved in metrics collection is defining how to combine data
points or measurements to provide meaningful metrics. A particular metric can
use one or more measurements.
Knowing the ways in which a measurement is going to be used and
knowing the granularity of measurements leads us to the third step in the
metrics program-deciding the operational requirement for measurements. The
operational requirement for a metrics plan should lay down not only the
periodicity but also other operational issues such as who should collect
measurements, who should receive the analysis, and so on. This step helps to
decide on the appropriate periodicity for the measurements as well as assign
operational responsibility for collecting, recording, and reporting the
measurements and dissemination of the metrics information. Some
measurements need to be made on a daily basis (for example, how many test
cases were executed, how many defects found, defects fixed, and so on). But the
metrics involving a question like the one above ("how many test cases produced
40 defects") is a type of metric that needs to be monitored at extended periods
of time, say, once in a week or at the end of a test cycle. Hence, planning
metrics generation also needs to consider the periodicity of the metrics.
The fourth step involved in a metrics program is to analyze the metrics to
identify both positive areas and improvement areas on product quality. Often,
only the improvement aspects pointed to by the metrics are analyzed and
focused on; it is important also to highlight and sustain the positive areas of the
product. This will ensure that the best practices get institutionalized and also
motivate the team better.
The final step involved in a metrics plan is to take necessary action and
follow up on the action. The purpose of a metrics program will be defeated if the
action items are not followed through to completion. This is especially true of
testing, which is the penultimate phase before release. Any delay in analysis
and following through with action items to completion can result in undue
delays in product release.
Any metrics program, as described above, is a continuous and ongoing
process. As we make measurements, transform the measurements into metrics,
analyze the metrics, and take corrective action, the issues for which the
measurements were made in the first place will become resolved. Then, we
would have to continue the next iteration of metrics programs, measuring
(possibly) a different set of measurements, leading to more refined metrics
addressing (possibly) different issues.
To estimate the test completion date, two data points are needed: the
remaining test cases yet to be executed and how many test cases can be
executed per elapsed day. The test cases that can be executed per person day
are calculated based on a measure called test case execution productivity. This
productivity number is derived from previous test cycles and can be
represented by a formula of the following form.
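A plausible formulation, assuming productivity is measured in executed test
cases per person-day (the symbols are illustrative):

\[
\text{Test case execution productivity} = \frac{\text{Number of test cases executed}}{\text{Person-days spent on execution}}
\]

\[
\text{Days needed for test completion} = \frac{\text{Remaining test cases}}{\text{Test cases executed per elapsed day}}
\]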
Thus, metrics are needed to know test case execution productivity and to
estimate test completion date.
It is not testing alone that determines the date at which the product can
be released. The number of days needed to fix all outstanding defects is another
crucial data point. The number of days needed for defects fixes needs to take
into account the" outstanding defects waiting to be fixed" and a projection of
"how many more defects that will be unearthed from testing in future cycles."
The defect trend collected over a period of time gives a rough estimate of the
defects that will come through future test cycles. Hence, metrics helps in
predicting the number of defects that can be found in future test cycles.
The defect-fixing trend collected over a period of time gives another
estimate of the defect-fixing capability of the team. This measure gives the
number of defects that can be fixed in a particular duration by the development
team. Combining defect prediction with defect-fixing capability produces an
estimate of the days needed for the release. A formula of the following form
can help arrive at a rough estimate of the total days needed for defect fixes.
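A plausible form, consistent with the description above (the terms are
illustrative):

\[
\text{Days needed for defect fixes} = \frac{\text{Outstanding defects} + \text{Defects predicted for future cycles}}{\text{Defects fixed per day}}
\]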
Hence, metrics helps in estimating the total days needed for fixing
defects. Once the time needed for testing and the time for defects fixing are
known, the release date can be estimated. Testing and defect fixing are
activities that can be executed simultaneously, as long as there is a regression
testing planned to verify the outstanding defects fixes and their side-effects. If a
product team follows the model of separate development and testing teams, the
release date is arrived at on the basis of which one (days needed for testing or
days needed for defect fixes) is on the critical path. A formula for the release
date is given below, after the modification discussed next.
The defect fixes may arrive after the regular test cycles are completed.
These defect fixes will have to be verified by regression testing before the
product can be released. Hence, the formula for days needed for release is to be
modified accordingly, as shown below. The formula can be further tuned to
provide more accuracy to estimates as the current formula does not include
various other activities such as documentation, meetings, and so on. The idea
of discussing the formula here is to explain that metrics are important and help
in arriving at the release date for the product.
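Plausible forms of both formulas, under the assumptions above (the terms are
illustrative):

\[
\text{Days needed for release} = \max(\text{Days needed for testing},\ \text{Days needed for defect fixes})
\]

and, once regression of the late defect fixes is accounted for,

\[
\text{Days needed for release} = \max(\text{Days needed for testing},\ \text{Days needed for defect fixes}) + \text{Days needed for regressing the defect fixes}
\]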
The measurements collected during the development and test cycle are
not only used for release but also used for post-release activities. Looking at the
defect trend for a period helps in arriving at approximate estimates for the
number of defects that may get reported post release. This defect trend is used
as one of the parameters to increase the size of the maintenance/sustenance
team to take care of defects that may be reported post release. Knowing the type
of defects that are found during a release cycle and having an idea of all
outstanding defects and their impact helps in training the support staff, thereby
ensuring they are well equipped and prepared for the defects that may get
reported by the customers.
Metrics are not only used for reactive activities. Metrics and their
analysis help in preventing the defects proactively, thereby saving cost and
effort. For example, if there is a type of defect (say, coding defects) that is
reported in large numbers, it is advisable to perform a code review and prevent
those defects, rather than finding them one by one and fixing them in the code.
Metrics help in identifying these opportunities.
Metrics are used in resource management to identify the right size of
product development teams. Since resource management is an important
aspect of product development and maintenance, metrics go a long way in
helping in this area.
There are various other areas where metrics can help; the ability of test
cases to find defects is one such area. We discussed test case result history
in Chapter 8, Regression Testing. When this history is combined with the
metrics of the project, it provides detailed information on which test cases are
likely to uncover more or fewer defects in the current cycle.
To summarize, metrics in testing help in identifying
When to make the release.
What to release: based on defect density (formally defined later) across
modules, their importance to customers, and impact analysis of those
defects, the scope of the product to be released on time can be decided.
Metrics help in making this decision.
Whether the product is being released with known quality: the idea of
metrics is not only for meeting the release date but also to know the
quality of the product and ascertaining the decision on whether we are
releasing the product with the known quality and whether it will function
in a predictable way in the field.
Metrics can be classified into different types based on what they measure
and what area they focus on. At a very high level, metrics can be classified as
product metrics and process metrics. As explained earlier, process metrics are
not discussed in this chapter.
Product metrics can be further classified as
1. Project metrics A set of metrics that indicates how the project is
planned and executed.
2. Progress metrics A set of metrics that tracks how the different activities
of the project are progressing. The activities include both development
activities and testing activities. Since the focus of this book is testing,
only metrics applicable to testing activities are discussed in this book
(and in this chapter). Progress metrics are monitored during testing
phases. They help in finding out the status of test activities
and are also good indicators of product quality. The defects that
emerge from testing provide a wealth of information that helps both the
development team and the test team to analyze and improve. For this reason,
progress metrics in this chapter focus only on defects. Progress metrics,
for convenience, are further classified into test defect metrics and
development defect metrics.
3. Productivity metrics A set of metrics that takes into account various
productivity numbers that can be collected and used for planning and
tracking testing activities. These metrics help in planning and estimating
of testing activities.
late). If the release date (that is, schedule) is met by putting more effort, then
the project planning and execution cannot be considered successful. In fact,
such an idea of adding effort may not always be possible, as the resources may
not be available in the organization every time and engineers working late may
not be productive beyond a certain point.
At the same time, if planned effort and actual effort are the same but if
the schedule is not met, then too the project cannot be considered successful.
Hence, it is a good idea to track both effort and schedule in project metrics.
The basic measurements that are very natural, simple to capture, and
form the inputs to the metrics in this section are
1. The different activities and the initial baselined effort and schedule for
each of the activities; this is input at the beginning of the project/phase.
2. The actual effort and time taken for the various activities; this is entered
as and when the activities take place.
3. The revised estimate of effort and schedule; these are re-calculated at
appropriate times in the project life.
When the base lined effort estimates, revised effort estimates, and actual
effort are plotted together for all the phases of the SDLC, it provides many insights
about the estimation process. As different sets of people may get involved in
different phases, it is a good idea to plot these effort numbers phase-wise.
Normally, this variation chart is plotted at the point revised estimates are being
made or at the end of a release.
If there is a substantial difference between the base lined and revised
effort, it points to incorrect initial estimation. Calculating effort variance for
each of the phases (as calculated by the formula below) provides a quantitative
measure of the relative difference between the revised and actual efforts.
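One plausible formulation, assuming the variance is taken relative to the
revised estimate as discussed below:

\[
\text{Effort variance (\%)} = \frac{\text{Actual effort} - \text{Revised estimated effort}}{\text{Revised estimated effort}} \times 100
\]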
If variance takes into account only revised estimate and actual effort,
then a question arises: what is the use of the base lined estimate? As mentioned
earlier, the effort variation chart provides input to estimation process. When
estimates are going wrong (or right), it is important to find out where we are
going wrong (or right). Many times the revised estimates are done in a hurry, to
respond fast enough to the changing requirements or unclear requirements. If
this is the case, the right parameter for variance calculation is the base lined
estimate. In this case analysis should point out the problems in the revised
estimation process. Similarly, there could be a problem in the baseline
estimation process that can be brought out by variance calculation. Hence, all
the base lined estimates, revised estimates, and actual effort are plotted
together for each of the phases.
The variance can also be negative. A negative variance is an indication of
an overestimate. These variance numbers along with analysis can help in
better estimation for the next release or the next revised estimation cycle.
24.4.2 SCHEDULE VARIANCE (PLANNED VS ACTUAL)
Most software projects are not only concerned about the variance in
effort, but are also concerned about meeting schedules. This leads us to the
schedule variance metric. Schedule variance, like effort variance, is the
deviation of the actual schedule from the estimated schedule. There is one
difference, though. Depending on the SDLC model used by the project, several
phases could be active at the same time. Further, the different phases in SDLC
are interrelated and could share the same set of individuals. Because of all
these complexities involved, schedule variance is calculated only at the overall
project level, at specific milestones, not with respect to each of the SDLC
phases.
Using the data in the above chart, the variance percent can be calculated
using a similar formula as explained in the previous section, considering the
estimated schedule and actual schedule.
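A similar formulation (again, a sketch assuming the same structure as the
effort variance formula):

\[
\text{Schedule variance (\%)} = \frac{\text{Actual duration} - \text{Estimated duration}}{\text{Estimated duration}} \times 100
\]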
Schedule variance is calculated at the end of every milestone to find out
how well the project is doing with respect to the schedule. To get a real picture
of the schedule in the middle of project execution, it is important to calculate
the "remaining days yet to be spent" on the project and plot it along with the
"actual schedule spent", as in the above chart. "Remaining days yet to be spent"
can be calculated by adding up the estimates for all remaining activities. If the
remaining days yet to be spent are not calculated and plotted, the chart adds
little value in the middle of the project, because the deviation cannot be inferred
visually from it. The remaining days in the schedule become zero when the
release is met.
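A small sketch of these two schedule numbers follows; the activity names and
day counts are assumed sample data, not from the text.

```python
# Sketch (sample data) of the two schedule numbers discussed above:
# "remaining days yet to be spent", obtained by adding up all remaining
# activities, and schedule variance at a milestone.

remaining_activities = {           # activity: estimated days left
    "integration testing": 6,
    "system testing": 10,
    "release tasks": 3,
}
remaining_days = sum(remaining_activities.values())

estimated_schedule = 90            # calendar days planned up to this milestone
actual_schedule = 97               # calendar days actually spent so far
variance_pct = (actual_schedule - estimated_schedule) / estimated_schedule * 100

print(f"Remaining days yet to be spent: {remaining_days}")
print(f"Schedule variance at milestone: {variance_pct:+.1f}%")
```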
Effort and schedule variance have to be analyzed in totality, not in
isolation. This is because while effort is a major driver of the cost, schedule
determines how best a product can exploit market opportunities. Variance can
be classified into negative variance, zero variance, acceptable variance, and
unacceptable variance. Generally, 0-5% is considered acceptable variance.
A table of probable causes and outcomes under the various scenarios can
be drawn up, but it may not reflect all possible causes and outcomes. For
example, a negative variance in one phase/module may have nullified a positive
variance in another phase or module of the product. Hence, it is important to
look at the "why and how" in metrics rather than just focusing on "what" was
achieved. The data drilling exercise discussed earlier will help in this analysis.
Some of the typical questions one should ask to analyze effort and schedule
variances are given below.
• Did the effort variance take place because of poor initial estimation or
poor execution?
• If the initial estimation turned out to be off the mark, was it because of
a lack of supporting data to enable good estimation?
• If the effort or schedule in some cases is not in line with what was
estimated, what changes caused the variation? Was there a change in
the technology of what was tested? Was there a new tool introduced for
testing? Did some key people leave the team?
• If the effort was on target but the schedule was not, did the plan take
into account appropriate parallelism? Did it explore the right
multiplexing of the resources?
• Can any process or tool be enhanced to improve parallelism and
thereby speed up the schedules?
• Whenever we get a negative variance in effort or schedule (that is, we
complete the project with less effort and/or faster than planned), do we
know what contributed to the efficiency and, if so, can we
institutionalize the efficiencies to achieve continuous improvement?
Any project needs to be tracked from two angles. One, how well the
project is doing with respect to effort and schedule. This is the angle we have
been looking at so far in this chapter. The other equally important angle is to
find out how well the product is meeting the quality requirements for the
release. There is no point in producing a release on time and within the effort
estimate but with a lot of defects, causing the product to be unusable. One of
the main objectives of testing is to find as many defects as possible before any
customer finds them. The number of defects that are found in the product is
one of the main indicators of quality. Hence in this section, we will look at
progress metrics that reflect the defects (and hence the quality) of a product.
Defects get detected by the testing team and get fixed by the development
team. In line with this thought, defect metrics are further classified into test
defect metrics (which help the testing team in the analysis of product quality
and testing) and development defect metrics (which help the development team
in the analysis of development activities).
How many defects have already been found and how many more may get
unearthed are the two parameters that determine product quality and its
assessment. For this assessment, the progress of testing has to be understood.
If only 50% of testing is complete and 100 defects have been found, then,
assuming that the defects are uniformly distributed over the product (and
keeping all other parameters the same), another 80-100 defects can be
estimated as residual defects.
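The residual-defect arithmetic can be written out as a short sketch; the 50%
and 100-defect figures mirror the example above, and the uniformity
assumption is stated in the comments.

```python
# Sketch of the residual-defect estimate above. Assumes defects are uniformly
# distributed over the product; the text allows 80-100 to cover non-uniformity.

testing_complete = 0.50          # fraction of testing done
defects_found = 100

estimated_total = defects_found / testing_complete   # 200 under uniformity
residual = estimated_total - defects_found
print(f"Estimated residual defects: about {residual:.0f}")   # about 100
```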
The progress chart gives the pass rate and fail rate of executed test
cases, pending test cases, and test cases that are waiting for defects to be fixed.
Representing testing progress in this manner makes it easy to understand
the status and to carry out further analysis. Another perspective from the chart is that
the pass percentage increases and fail percentage decreases, showing the
positive progress of testing and product quality. The defects that are blocking
the execution of certain test cases also get reduced in number as weeks
progress in the above chart. Hence, a scenario represented by such a progress
chart shows that not only is testing progressing well, but also that the product
quality is improving (which in turn means that the testing is effective). If, on the
other hand, the chart had shown a trend that as the weeks progress, the "not
run" cases are not reducing in number, or "blocked" cases are increasing in
number, or "pass" cases are not increasing, then it would clearly point to
quality problems in the product that prevent the product from being ready for
release.
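One way to derive the percentages behind such a progress chart is sketched
below; the weekly counts are invented sample data showing the healthy trend
described above (pass rising, blocked and not-run falling).

```python
# Sketch (sample weekly counts) of the percentages behind a test progress
# chart: pass/fail/blocked/not-run rates per week of testing.

weekly_status = [
    # (week, passed, failed, blocked, not_run)
    (1, 40, 30, 20, 110),
    (2, 90, 40, 15, 55),
    (3, 150, 30, 5, 15),
]

for week, passed, failed, blocked, not_run in weekly_status:
    total = passed + failed + blocked + not_run
    print(f"week {week}: pass {passed / total:6.1%}  fail {failed / total:6.1%}  "
          f"blocked {blocked / total:6.1%}  not run {not_run / total:6.1%}")
```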
The test progress metrics discussed in the previous section capture the
progress of defects found with time. The next set of metrics helps us
understand how the defects that are found can be used to improve testing and
product quality. Not all defects are equal in impact or importance. Some
organizations classify defects by assigning a defect priority (for example, P1, P2,
P3, and so on). The priority of a defect provides a management perspective on
the order of defect fixes. For example, a defect with priority P1 indicates that it
should be fixed before another defect with priority P2. Some organizations use
defect severity levels (for example, S1, S2, S3, and so on). The severity of a defect
gives the test team a perspective of the impact of that defect on product
functionality. For example, a defect with severity level S1 means that either
major functionality is not working or the software is crashing; S2 may mean a
failure or a function not working. From the above example it is clear that
priority is a management perspective and priority levels are relative. This means
that the priority of a defect can change dynamically once assigned. Severity is
absolute and does not change often, as it reflects the state and quality of the
product. Some organizations use a combination of priority and severity to
classify defects.
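A minimal sketch of how priority and severity might be modeled in code
follows. The P1-P3 and S1-S3 labels come from the text; the classes and the
sample defect records are illustrative assumptions.

```python
# Sketch of one way to model the defect classification described above.

from dataclasses import dataclass
from enum import Enum

class Priority(Enum):      # management view; relative, may change dynamically
    P1 = 1
    P2 = 2
    P3 = 3

class Severity(Enum):      # impact view; absolute, rarely changes
    S1 = 1
    S2 = 2
    S3 = 3

@dataclass
class Defect:
    defect_id: str
    priority: Priority     # order in which it should be fixed
    severity: Severity     # impact on product functionality

defects = [Defect("D-101", Priority.P2, Severity.S1),
           Defect("D-102", Priority.P1, Severity.S2)]

# The fix order is driven by priority, not severity:
for d in sorted(defects, key=lambda d: d.priority.value):
    print(d.defect_id, d.priority.name, d.severity.name)
```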
So far, our focus has been on defects and their analysis to help in
knowing product quality and in improving the effectiveness of testing. We will
now take a different perspective and see how metrics can be used to improve
development activities. The defect metrics that directly help in improving
development activities are discussed in this section and are termed as
development defect metrics. While defect metrics focus on the number of
defects, development defect metrics try to map those defects to different
components of the product and to some of the parameters of development such
as lines of code.
While it is important to count the number of defects in the product, for
development it is important to map them to the different components of the
product so that they can be assigned to the appropriate developers to fix. The
project manager in charge of development maintains a module ownership list,
where all product modules and their owners are listed. Based on the number of
defects existing in each of the modules, the effort needed to fix them, and the
availability of skill sets for each of the modules, the project manager assigns
resources accordingly.
It can be noted from the chart that there are four components (install,
reports, client, and database) with over 20 defects, indicating that more focus
and resources are needed for these components. The number of defects and
their classification are denoted in different colors and shading as mentioned in
the legend. The defect classification as well as the total defects corresponding to
each component in the product helps the project manager in assigning and
resolving those defects.
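The component mapping can be sketched as below. The component names follow
the chart described above, while the counts and the 20-defect threshold are
sample values used for illustration.

```python
# Sketch of mapping defects to components and flagging the ones that need
# more focus and resources. All counts are sample data.

from collections import Counter

defect_counts = Counter({"install": 25, "reports": 24, "client": 22,
                         "database": 21, "ui": 6, "others": 4})

FOCUS_THRESHOLD = 20   # components above this need more focus and resources
for component, count in defect_counts.most_common():
    flag = "  <-- needs more focus/resources" if count > FOCUS_THRESHOLD else ""
    print(f"{component:10s} {count:3d}{flag}")
```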
There is another aspect of release, that is, what to release. If one
independent component is producing a large number of defects while all other
components are stable, the scope of the release can be reduced: remove the
defect-producing component and release the other, stable components, thereby
meeting the release date and the release quality, provided the functionality of
that component is not critical to the release. The above classification of defects
into components helps in making such decisions.
2. Progress metrics: A set of metrics that track how the different activities
of the project are progressing. The activities include both development
activities and testing activities. Since the focus of this book is testing,
only metrics applicable to testing activities are discussed in this book
(and in this chapter). Progress metrics are monitored during testing
phases. Progress metrics help in finding out the status of test activities
and are also good indicators of product quality.
3. Schedule variance is calculated at the end of every milestone to find out
how well the project is doing with respect to the schedule. To get a real
picture on schedule in the middle of project execution, it is important to
calculate "remaining days yet to be spent" on the project and plot it along
with the "actual schedule spent" as in the above chart. "Remaining days
yet to be spent" can be calculated by adding up all remaining activities.
LESSON 25
PRODUCTIVITY METRICS
Contents
25.0 Aims and Objectives
25.1 Productivity Metrics
25.1.1 Defects per 100 Hours of Testing
25.1.2 Test Cases Executed per 100 Hours of Testing
25.1.3 Test Cases Developed per 100 Hours of Testing
25.1.4 Defects per 100 Test Cases
25.1.5 Defects per 100 Failed Test Cases
25.1.6 Test Phase Effectiveness
25.1.7 Closed Defect Distribution
25.2 Release Metrics
25.3 Let Us Sum Up
In this lesson, we discuss productivity metrics and how to improve the quality
of software by finding defects. Normalizing defect counts to 100 hours of
testing and identifying defects, which is itself a challenging process, are also
covered in this lesson.
Productivity is defined to reflect the rate, value, and costs of producing
computational results in contribution to achieving the mission objectives of the
using institution. While there is general consensus as to the importance and
abstract nature of a productivity figure of merit, there is no common viewpoint
on the specific metric and formulation by which to quantitatively characterize
it. One approach is a conceptual framework within which to consider
productivity as a parameter, together with a formulation by which to quantify
it. The objective of such a framework is to establish a set of mutually
consistent and complementary metrics that satisfy the discipline of dimensional
analysis and that together provide a rigorous definition of productivity. Such a
framework can serve as a tool for the evaluation and comparison of alternative
high-performance computing systems.
Program testing can only prove the presence of defects, never their
absence. Hence, it is reasonable to conclude that there is no end to testing and
more testing may reveal more new defects. But there may be a point of
diminishing returns when further testing may not reveal any defects. If
incoming defects in the product are reducing, it may mean various things.
1. Testing is not effective.
2. The quality of the product is improving.
3. Effort spent in testing is falling.
The first two aspects have been adequately covered by the metrics discussed
above. The metric defects per 100 hours of testing covers the third point and
normalizes the number of defects found in the product with respect to the effort
spent. It is calculated as given below:
Defects per 100 hours of testing = (Total defects found in the product for a
period/Total hours spent to get those defects) * 100
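The formula translates directly into code. In the sketch below the weekly
defect and hour figures are assumed sample data, chosen to show how
normalizing by effort changes the picture (they anticipate the weeks 9 and 10
scenario discussed next).

```python
# Sketch of the "defects per 100 hours of testing" formula, with sample data.

def defects_per_100_hours(defects_found: int, hours_spent: float) -> float:
    return defects_found / hours_spent * 100

weekly = [(8, 18, 120), (9, 14, 15), (10, 13, 15)]   # (week, defects, hours)
for week, defects, hours in weekly:
    rate = defects_per_100_hours(defects, hours)
    print(f"week {week:2d}: {defects:3d} defects, {rate:6.1f} per 100 hours")
# Weeks 9 and 10 find fewer raw defects but far more per hour of testing --
# the drop in arrivals came from reduced effort, not from improved quality.
```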
Effort plays an important role in judging quality. Assume, for the moment,
that constant effort is spent in all the weeks. A defect-arrival chart that
produces a bell curve then indicates readiness for the release.
However, in real life the above assumption may not be true, and effort is
not spent equally on testing week by week. The charts and analysis can be
misleading if the effort spent towards the end of the release reduces: the
downward trend in defect arrival may be because of less focus on testing, not
because of improved quality.
Suppose, instead, that 15 hours are spent in weeks 9 and 10 and 120
hours in all other weeks. This assumption, which could well reflect reality,
actually suggests that the quality of the product has fallen and that more
defects were found by investing less effort in testing in weeks 9 and 10. This
example clearly shows that the product is not ready for release at all.
It may be misleading to judge the quality of a product without looking at
effort, because a downward defect trend assumes that effort is spent equally
across all weeks. This chart provides the insight that people were pulled out of
testing, or fewer people were available for testing, and that is what made the
defect count come down. Defects per 100 hours of testing provides this
important perspective for making the right release decision.
The number of test cases executed by the test team for a particular
duration depends on team productivity and quality of product. The team
productivity has to be calculated accurately so that it can be tracked for the
current release and be used to estimate the next release of the product. If the
quality of the product is good, more test cases can be executed, as there may
not be defects blocking the tests. Also, there may be fewer defects, and the effort
required in filing, reproducing, and analyzing defects could be minimized.
Hence, test cases executed per 100 hours of testing helps in tracking
productivity and also in judging the product quality. It is calculated using the
formula
Test cases executed per 100 hours of testing = (Total test cases
executed for a period/Total hours spent in test execution) * 100
Define productivity.
Notes: a) Write your answer in the space given below.
b) Check your answer with the one given at the end of this lesson.
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
Both manual execution of test cases and automating test cases require
estimating and tracking of productivity numbers. In a product scenario, not all
test cases are written afresh for every release. New test cases are added to
address new functionality and for testing features that were not tested earlier.
Existing test cases are modified to reflect changes in the product. Some test
cases are deleted if they are no longer useful or if the corresponding features
are removed from the product. Hence the formula for test cases developed uses
the count of added, modified, and deleted test cases.
Test cases developed per 100 hours of testing = (Total test cases
developed for a period / Total hours spent in test case development) * 100
Defects per 100 failed test cases is a good measure of how granular the
test cases are. It indicates
1. How many test cases need to be executed when a defect is fixed;
2. What defects need to be fixed so that an acceptable number of test cases
reach the pass state; and
3. How the fail rate of test cases and defects affect each other for release
readiness analysis.
Defects per 100 failed test cases = (Total defects found for a period /
Total test cases failed due to those defects) * 100
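All of these "per 100" metrics share one shape, so a single helper suffices.
In the sketch below the counts passed in are assumed sample values.

```python
# Sketch of a common helper for the per-100 productivity metrics above.

def per_100(count: int, base: float) -> float:
    """Normalize a count to 'per 100' of the given base (hours or test cases)."""
    return count / base * 100

print(per_100(240, 180))   # test cases executed per 100 hours of execution
print(per_100(60, 150))    # test cases developed per 100 hours of development
print(per_100(45, 90))     # defects per 100 failed test cases
```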
In Principles of Testing we saw that testing is not the job of testers alone.
Developers perform unit testing, and there could be multiple testing teams
performing the component, integration, and system testing phases. The idea of
testing is to find defects early in the cycle and in the early phases of testing. As
testing is performed by various teams with the objective of finding defects early,
a metric is needed to compare the defects filed in each of the testing phases,
such as unit testing (UT), component testing (CT), integration testing (IT), and
system testing (ST).
The following few observations can be made.
1. A good proportion of defects were found in the early phases of testing
(UT and CT).
2. Product quality improved from phase to phase (shown by the lower
percentage of defects found in the later test phases, IT and ST).
Extending this data, some projections on post-release defects can be
arrived at. CT found 32% of the defects and IT found 17%; this is
approximately a 45% reduction in the number of defects. Similarly,
approximately a 35% reduction in the number of defects was observed going
from IT to ST. A further 35% reduction can be assumed post-release, which
amounts to about 7.5% of the total defects. A conservative estimate thus
indicates that close to 7.5% of the total defects will be found by customers.
This may not be an accurate estimate but can be used for staffing and planning
of support activities.
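The projection arithmetic can be checked with a few lines; the 32%, 17%, and
35% figures are the ones quoted above.

```python
# Sketch of the post-release projection above, using the text's percentages.

ct, it = 32.0, 17.0                    # percent of total defects found
reduction_ct_to_it = (ct - it) / ct    # ~0.47, "approximately 45%" in the text

st = it * (1 - 0.35)                   # assume ~35% reduction going IT -> ST
post_release = st * (1 - 0.35)         # assume the same reduction post-release
print(f"Projected post-release defects: {post_release:.1f}% of total")
# ~7.2%, close to the 7.5% figure quoted in the text.
```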
The objective of testing is not only to find defects. The testing team also
has the objective to ensure that all defects found through testing are fixed so
that the customer gets the benefit of testing and the product quality improves.
To ensure that most of the defects are fixed, the testing team has to track the
defects and analyze how they are closed. The closed defect distribution helps in
this analysis.
Productivity is defined to reflect the rate, value, and costs of producing
computational results in contribution to achieving the mission objectives of the
using institution; in other words, it reflects the value delivered to the end user
mission.
References:
1. Software Testing: Principles and Practices – Srinivasan Desikan &
Gopalaswamy Ramesh, 2006, Pearson Education.
2. http://www.a2zdotnet.com/View.aspx?id=51
3. http://en.wikipedia.org/wiki/White_box_testing
4. https://buildsecurityin.us-cert.gov/daisy/bsi/articles/best-practices/white-box/259-BSI.html
5. http://pagkis-software-testing.blogspot.com/2007/10/functional-testing-vs-non-functional.html
6. http://www.testingstandards.co.uk/non_functional_testing_techniques.htm
7. http://en.wikipedia.org/wiki/Performance_testing
8. http://www.opensourcetesting.org/performance.php
9. http://www.cs.umd.edu/~aporter/html/currTesting.html
10. http://www.stpmag.com/downloads/stp-0507_testmetrics.html
11. http://www.webspiders.com/en/testing_deliverables.asp
12. http://it.toolbox.com/blogs/enterprise-solutions/identifying-test-metrics-13382