MODULE – 5
CHAPTER -1
INTEGRATION AND COMPONENT-BASED SOFTWARE TESTING
5.1 Overview
The traditional V model introduced in Chapter 2 divides testing into four main levels
of granularity: module, integration, system, and acceptance test. Module or unit test checks
module behavior against specifications or expectations; integration test checks module
compatibility; system and acceptance tests check behavior of the whole system with respect
to specifications and user needs, respectively. An effective integration test is built on a
foundation of thorough module testing and inspection. Module test maximizes controllability
and observability of an individual unit, and is more effective in exercising the full range of
module behaviors, rather than just those that are easy to trigger and observe in a particular
context of other modules.
While integration testing may to some extent act as a process check on module testing
(i.e., faults revealed during integration test can be taken as a signal of unsatisfactory unit
testing), thorough integration testing cannot fully compensate for sloppiness at the module
level. In fact, the quality of a system is limited by the quality of the modules and components
from which it is built, and even apparently noncritical modules can have widespread effects.
For example, in 2004 a buffer overflow vulnerability in a single, widely used library for
reading Portable Network Graphics (PNG) files caused security vulnerabilities in Windows,
Linux, and Mac OS X Web browsers and email clients.
On the other hand, some unintended side-effects of module faults may become apparent only in integration test, and even a module that satisfies its interface specification may be incompatible because of errors introduced in design decomposition. Integration tests therefore focus on checking compatibility between module interfaces. Some incompatibilities, such as a name clash between modules, can be difficult to reveal, particularly if the name clash appears rarely and only in some installation configurations.
The official investigation of the Ariane 5 accident that led to the loss of the rocket on June 4, 1996, concluded that the accident was caused by incompatibility of a software module
with the Ariane 5 requirements. The software module was in charge of computing the
horizontal bias, a value related to the horizontal velocity sensed by the platform that is
calculated as an indicator of alignment precision. The module had functioned correctly for
Ariane 4 rockets, which were smaller than the Ariane 5, and thus had a substantially lower
horizontal velocity. It produced an overflow when integrated into the Ariane 5 software. The
overflow started a series of events that terminated with self-destruction of the launcher. The
problem was not revealed during testing because of incomplete specifications: the specification of the module and the tests performed at equipment level did not include Ariane 5 trajectory data.
Even if the programming language choice is determined by other factors, many errors
can be avoided by choosing patterns and enforcing coding standards across the entire code
base; the standards can be designed in such a way that violations are easy to detect manually
or with tools. For example, many projects using C or C++ require use of "safe" alternatives to
unchecked procedures, such as requiring strncpy or strlcpy (string copy procedures less
vulnerable to buffer overflow) in place of strcpy. Checking for the mere presence of strcpy is
much easier (and more easily automated) than checking for its safe use. These measures do
not eliminate the possibility of error, but integration testing is more effective when focused
on finding faults that slip through these design measures.
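A minimal C sketch of the coding-standard rule mentioned above (the buffer size and input string are hypothetical): strncpy bounds the copy to the destination buffer, but the terminator must still be added explicitly.

#include <stdio.h>
#include <string.h>

int main(void) {
    const char *input = "a user-supplied string that may be longer than expected";
    char buf[16];

    /* Forbidden by the standard: strcpy(buf, input) copies until the
       terminating '\0', regardless of the size of buf, so a long input
       overflows the buffer. */

    /* Required alternative: strncpy never writes more than the given
       number of bytes, but it does not guarantee termination, so we
       terminate explicitly. */
    strncpy(buf, input, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    printf("%s\n", buf);
    return 0;
}

A simple checking tool (or even a textual search) can flag any remaining occurrence of strcpy, which is what makes such a rule easy to enforce mechanically.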
When many modules are integrated and tested at once, failures may propagate across many of them, making fault localization difficult. Therefore it is worthwhile to thoroughly test a small collection of modules before adding more.
Since incremental assemblies of modules are incomplete, one must often construct
scaffolding - drivers, stubs, and various kinds of instrumentation - to effectively test them.
This can be a major cost of integration testing, and it depends to a large extent on the order in
which modules are assembled and tested.
One extreme approach is to avoid the cost of scaffolding by waiting until all modules
are integrated, and testing them together - essentially merging integration testing into system
testing. In this big bang approach, neither stubs nor drivers need be constructed, nor must the
development be carefully planned to expose well-specified interfaces to each subsystem.
These savings are more than offset by losses in observability, diagnosability, and feedback.
Delaying integration testing hides faults whose effects do not always propagate outward to
visible failures (violating the principle that failing always is better than failing sometimes)
and impedes fault localization and diagnosis because the failures that are visible may be far
removed from their causes. Requiring the whole system to be available before integration
does not allow early test and feedback, and so faults that are detected are much more costly to
repair. Big bang integration testing is less a rational strategy than an attempt to recover from
a lack of planning; it is therefore also known as the desperate tester strategy.
Memory Leaks
Memory leaks are typical of program faults that often escape module testing. They
may be detected in integration testing, but often escape further and are discovered only in
actual system operation.
The Apache Web server, version 2.0.48, contained code for reacting to normal Web page requests that arrived on the secure (https) server port.
That code failed to reclaim some dynamically allocated memory, causing the Web server to "leak" memory at run time. Over a long period of use, or over a shorter period
if the fault is exploited in a denial-of-service attack, this version of the Apache Web server
will allocate and fail to reclaim more and more memory, eventually slowing to the point of
unusability or simply crashing.
The fault is nearly impossible to see in the code itself. The memory that should be deallocated is part of a structure defined and created elsewhere, in the SSL (secure sockets layer) subsystem, written and maintained by a different set of developers. Even reading the definition of the ap_filter_t structure, which occurs in a different part of the Apache Web server source code, does not help, since the ctx field is an opaque pointer (type void * in C). The repair was applied in version 2.0.49 of the server.
Finally, although the fault would be very difficult to detect with conventional unit testing techniques, both static and dynamic analysis techniques exist that could have detected it.
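The following hypothetical C sketch (not the actual Apache code) illustrates the general pattern described above: a structure created by one subsystem carries an opaque pointer to memory that the caller never reclaims.

#include <stdlib.h>

typedef struct {
    void *ctx;   /* opaque pointer: the caller cannot see what it references */
} filter_t;

/* Created by a different subsystem on behalf of the caller. */
static filter_t *make_filter(void) {
    filter_t *f = malloc(sizeof *f);
    if (f == NULL) return NULL;
    f->ctx = calloc(1, 1024);       /* hidden allocation inside the subsystem */
    return f;
}

static void handle_request(void) {
    filter_t *f = make_filter();
    if (f == NULL) return;
    /* ... use f to process one request ... */
    free(f);                        /* the structure is freed, but f->ctx is
                                       never reclaimed: each request leaks
                                       about a kilobyte */
}

int main(void) {
    for (long i = 0; i < 1000000; i++)
        handle_request();           /* memory use grows with every call */
    return 0;
}

Run under a dynamic analysis tool such as valgrind, a leak of this kind is reported directly; this is the sort of static or dynamic analysis referred to above.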
Among strategies for incrementally testing partially assembled systems, we can
distinguish two main classes: structural and feature oriented. In a structural approach,
modules are constructed, assembled, and tested together in an order based on hierarchical
structure in the design. Structural approaches include bottom-up, top-down, and a
combination sometimes referred to as sandwich or backbone strategy. Feature oriented
strategies derive the order of integration from characteristics of the application, and include
threads and critical modules strategies.
Top-down and bottom-up strategies are classic alternatives in system construction and
incremental integration testing as modules accumulate.
A top-down integration strategy begins at the top of the uses hierarchy, including the
interfaces exposed through a user interface or top-level application program interface (API).
The need for drivers is reduced or eliminated while descending the hierarchy, since at each
stage the already tested modules can be used as drivers while testing the next layer. For
example, referring to the excerpt of the Chipmunk Web presence shown in Figure 21.1, we can start by integrating CustomerCare with Customer, while stubbing Account and Order. We could then add either Account, or Order and Package, stubbing Model and Component in the latter case.
Figure 21.1: An excerpt of the class diagram of the Chipmunk Web presence. Modules are
sorted from the top to the bottom according to the use/include relation. The topmost modules
are not used or included in any other module, while the bottommost modules do not include
or use other modules.
Bottom-up integration similarly reduces the need to develop stubs, except for
breaking circular relations. Referring again to the example in Figure 21.1, we can start
bottom-up by integrating Slot with Component, using drivers for Model and Order. We can
then incrementally add Model and Order. We can finally add either Package or Account and
Customer, before integrating CustomerCare, without constructing stubs.
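A minimal C sketch of such scaffolding (the function names are illustrative, not part of the Chipmunk design): a stub stands in for a module that has not yet been integrated, while a driver calls the modules under test and checks their results.

#include <assert.h>
#include <stdio.h>

/* Stub: replaces the not-yet-integrated payment module with a canned answer. */
static int charge_credit_card(double amount) {
    (void)amount;
    return 1;                       /* always report a successful charge */
}

/* Module under test: calls the (stubbed) lower-level module. */
static int place_order(double amount) {
    if (amount <= 0.0)
        return 0;                   /* reject invalid amounts */
    return charge_credit_card(amount);
}

/* Driver: a throwaway test harness that exercises the module under test. */
int main(void) {
    assert(place_order(100.0) == 1);    /* normal order accepted   */
    assert(place_order(-5.0)  == 0);    /* invalid amount rejected */
    printf("scaffolding checks passed\n");
    return 0;
}

As integration proceeds, the stub is replaced by the real module and the driver by the higher-level modules that actually call place_order.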
Top-down and bottom-up approaches to integration testing can be applied early in the
development if paired with similar design strategies: If modules are delivered following the
hierarchy, either top-down or bottom-up, they can be integrated and tested as soon as they are
delivered, thus providing early feedback to the developers. Both approaches increase
controllability and diagnosability, since failures are likely caused by interactions with the
newly integrated modules.
An early top-down approach may result from developing prototypes for early user
feedback, while existing modules may be integrated bottom-up. This is known as the
sandwich or backbone strategy. For example, referring once more to the small system of Figure 21.1, let us imagine reusing existing modules for Model, Slot, and Component, and
developing CustomerCare and Customer as part of an early prototype. We can start
integrating CustomerCare and Customer top down, while stubbing Account and Order.
Meanwhile, we can integrate bottom-up Model, Slot, and Component with Order, using
drivers for Customer and Package. We can then integrate Account with Customer, and
Package with Order, before finally integrating the whole prototype system.
The price of flexibility and adaptability in the sandwich strategy is complex planning
and monitoring. While top-down and bottom-up are straightforward to plan and monitor, a
sandwich approach requires extra coordination between development and test. In contrast to
structural integration testing strategies, feature-driven strategies select an order of integration
that depends on the dynamic collaboration patterns among modules regardless of the static
structure of the system. The thread integration testing strategy integrates modules according
to system features. Test designers identify threads of execution that correspond to system
features, and they incrementally test each thread.
The thread integration strategy emphasizes module interplay for specific functionality.
Referring to the Chipmunk Web presence, we can identify feature threads for assembling
models, finalizing orders, completing payments, packaging and shipping, and so on. Feature
thread integration fits well with software processes that emphasize incremental delivery of
user-visible functionality. Even when threads do not correspond to usable end-user features,
ordering integration by functional threads is a useful tactic to make flaws in integration
externally visible. Incremental delivery of usable features is not the only possible
consideration in choosing the order in which functional threads are integrated and tested. Risk
reduction is also a driving force in many software processes. Critical module integration
testing focuses on modules that pose the greatest risk to the project. Modules are sorted and
incrementally integrated according to the associated risk factor that characterizes the
criticality of each module. Both external risks (such as safety) and project risks (such as
schedule) can be considered.
Feature-driven test strategies usually require more complex planning and management
than structural strategies. Thus, we adopt them only when their advantages exceed the extra
management costs. For small systems a structural strategy is usually sufficient, but for large
systems feature-driven strategies are usually preferred. Often large projects require
combinations of development strategies that do not fit any single integration test strategy. In
these cases, quality managers would combine different strategies: top-down, bottom-up, and
sandwich strategies for small subsystems, and a blend of threads and critical module
strategies at a higher level.
Reusable components are often more dependable than software developed for a single
application. More effort can be invested in improving the quality of a component when the
cost is amortized across many applications. Moreover, when reusing a component that has
been in use in other applications for some time, one obtains the benefit not only of test and
analysis by component developers, but also of actual operational use.
The advantages of component reuse for quality are not automatic. They do not apply
to code that was developed for a single application and then scavenged for use in another.
The benefit of operational experience as a kind of in vivo testing, moreover, is obtained only
to the extent that previous uses of the component are quite similar to the new use. These
advantages are balanced against two considerable disadvantages. First, a component designed
for wide reuse will usually be much more complex than a module designed for a single use; a
rule of thumb is that the development effort (including analysis and test) for a widely usable
component is at least twice that for a module that provides equivalent functionality for a
single application. In addition, a reusable component is by definition developed without full
knowledge of the environment in which it will be used, and it is exceptionally difficult to
fully and clearly describe all the assumptions, dependencies, and limitations that might
impinge upon its use in a particular application.
Components differ from objects in several respects. Components typically use persistent storage, while objects usually have only local state.
Components may be accessed by an extensive set of communication mechanisms,
while objects are activated through method calls.
Components are usually larger grain subsystems than objects.
Component contract or interface The component contract describes the access points and
parameters of the component, and specifies functional and nonfunctional behavior and any
conditions required for using the component.
Framework A framework is a micro-architecture or a skeleton of an application, with hooks
for attaching application-specific functionality or configuration-specific components. A
framework can be seen as a circuit board with empty slots for components.
Frameworks and design patterns Patterns are logical design fragments, while frameworks
are concrete elements of the application. Frameworks often implement patterns.
Component-based system A component-based system is a system built primarily by
assembling software components (and perhaps a small amount of application specific
code) connected through a framework or ad hoc "glue code."
COTS The term commercial off-the-shelf, or COTS, indicates components developed for sale to other organizations.
As with system and acceptance testing of complete applications, it is then necessary to
move to test suites that are more reflective of actual use. Testing with usage scenarios places
a higher priority on finding faults most likely to be encountered in use and is needed to gain
confidence that the component will be perceived by its users (that is, by developers who
employ it as part of larger systems) as sufficiently dependable.
Test designers cannot anticipate all possible uses of a component under test, but
they can design test suites for classes of use in the form of scenarios. Test scenarios are
closely related to scenarios or use cases in requirements analysis and design.
Sometimes different classes of use are clearly evident in the component specification.
For example, the W3C Document Object Model (DOM) specification has parts that deal
exclusively with HTML markup and parts that deal with XML; these correspond to different
uses to which a component implementing the DOM may be put. The DOM specification
further provides two "views" of the component interface. In the flat view, all traversal and
inspection operations are provided on node objects, without regard to subclass. In the
structured view, each subclass of node offers traversal and inspection operations specific to
that variety of node. For example, an Element node has methods to get and set attributes, but
a Text node (which represents simple textual data within XML or HTML) does not.
Software design for testability is an important factor in the cost and effectiveness of test and analysis, particularly for module and component integration.
5.4.1 Overview
System, acceptance, and regression testing are all concerned with the behavior of a
software system as a whole, but they differ in purpose. System testing is a check of
consistency between the software system and its specification (it is a verification activity).
Like unit and integration testing, system testing is primarily aimed at uncovering faults, but
unlike testing activities at finer granularity levels, system testing focuses on system-level
properties. System testing together with acceptance testing also serves an important role in
assessing whether a product can be released to customers, which is distinct from its role in
exposing faults to be removed to improve the product.
The system test suite may share some test cases with test suites used in integration
and even unit testing, particularly when a thread-based or spiral model of development has
been taken and subsystem correctness has been tested primarily through externally visible
features and behavior. However, the essential characteristic of independence implies that test
cases developed in close coordination with design and implementation may be unsuitable.
The overlap, if any, should result from using system test cases early, rather than reusing unit
and integration test cases in the system test suite.
Independence in system testing avoids repeating software design errors in test design.
This danger exists to some extent at all stages of development, but always in trade for some
advantage in designing effective test cases based on familiarity with the software design and
its potential pitfalls. The balance between these considerations shifts at different levels of
granularity, and it is essential that independence take priority at some level to obtain a
credible assessment of quality.
In some organizations, responsibility for test design and execution shifts at a discrete
point from the development team to an independent verification and validation team that is
organizationally isolated from developers. More often the shift in emphasis is gradual,
without a corresponding shift in responsible personnel.
Particularly when system test designers are developers or attached to the development
team, the most effective way to ensure that the system test suite is not unduly influenced by
design decisions is to design most system test cases as early as possible. Even in agile
development processes, in which requirements engineering is tightly interwoven with
development, it is considered good practice to design test cases for a new feature before
implementing the feature. When the time between specifying a feature and implementing it is
longer, early design of system tests facilitates risk-driven strategies that expose critical
behaviors to system test cases as early as possible, avoiding unpleasant surprises as
deployment nears.
The appropriate notions of thoroughness in system testing are with respect to the
system specification and potential usage scenarios, rather than code or design. Each feature or
specified behavior of the system should be accounted for in one or several test cases. In
addition to facilitating design for test, designing system test cases together with the system
requirements specification document helps expose ambiguity and refine specifications.
The set of feature tests passed by the current partial implementation is often used as a gauge of progress, although a count of failing feature-based system tests must be interpreted with care. Additional test cases can be devised during development to check for observable symptoms of failures that
were not anticipated in the initial system specification. They may also be based on failures
observed and reported by actual users, either in acceptance testing or from previous versions
of a system. These are in addition to a thorough specification-based test suite, so they do not
compromise independence of the quality assessment.
Some system properties are inherently global, depending on the interaction of the whole system and its environment. The importance of such global properties is therefore magnified in system testing. Global properties like performance, security, and safety are difficult to
specify precisely and operationally, and they depend not only on many parts of the system
under test, but also on its environment and use. For example, U.S. HIPAA regulations
governing privacy of medical records require appropriate administrative, technical, and
physical safeguards to protect the privacy of health information, further specified as follows:
Implementation specification: safeguards. A covered entity must reasonably safeguard
protected health information from any intentional or unintentional use or disclosure that is in
violation of the standards, implementation specifications or other requirements of this
subpart. [Uni00, sec. 164.530(c)(2)]
It is unlikely that any precise operational specification can fully capture the HIPAA
requirement as it applies to an automated medical records system. One must consider the
whole context of use, including, for example, which personnel have access to the system and
how unauthorized personnel are prevented from gaining access. Some global properties may
be defined operationally, but parameterized by use. For example, a hard-real-time system
must meet deadlines, but cannot do so in a completely arbitrary environment; its performance
specification is parameterized by event frequency and minimum inter-arrival times. An e-
commerce system may be expected to provide a certain level of responsiveness up to a
certain number of transactions per second and to degrade gracefully up to a second, higher rate. A
key step is identifying the "operational envelope" of the system, and testing both near the
edges of that envelope (to assess compliance with specified goals) and well beyond it (to
ensure the system degrades or fails gracefully). Defining borderline and extreme cases is
logically part of requirements engineering, but as with precise specification of features, test
design often reveals gaps and ambiguities.
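For instance, such a parameterized performance specification might be written as follows (the numbers here are hypothetical):

\[
\text{response time} \le 2\,\text{s} \;\;\text{for load } \lambda \le 100 \text{ transactions/s},
\qquad
\text{response time} \le 5\,\text{s} \;\;\text{for } 100 < \lambda \le 150 \text{ transactions/s}.
\]

Testing would then concentrate near the boundaries \(\lambda \approx 100\) and \(\lambda \approx 150\), and well beyond 150, to check that the system degrades gracefully.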
Not all global properties will be amenable to dynamic testing at all, at least in the
conventional sense. One may specify a number of properties that a secure computer system
should have, and some of these may be amenable to testing. Others can be addressed only
through inspection and analysis techniques, and ultimately one does not trust the security of a
system at least until an adversarial team has tried and failed to subvert it. Similarly, there is
no set of test cases that can establish software safety, in part because safety is a property of a
larger system and environment of which the software is only part. Rather, one must consider
the safety of the overall system, and assess aspects of the software that are critical to that
overall assessment. Some but not all of those claims may be amenable to testing.
Testing global system properties may require extensive simulation of the execution
environment. Creating accurate models of the operational environment requires substantial
human resources, and executing them can require substantial time and machine resources.
Usually this implies that "stress" testing is a separate activity from frequent repetition of
feature tests. For example, a large suite of system test cases might well run each night or
several times a week, but a substantial stress test to measure robust performance under heavy
load might take hours to set up and days or weeks to run.
A test case that can be run automatically with few human or machine resources should
generally focus on one purpose: to make diagnosis of failed test executions as clear and
simple as possible. Stress testing alters this: If a test case takes an hour to set up and a day to
run, then one had best glean as much information as possible from its results. This includes
monitoring for faults that should, in principle, have been found and eliminated in unit and
integration testing, but which become easier to recognize in a stress test (and which, for the
same reason, are likely to become visible to users). For example, several embedded system
products ranging from laser printers to tablet computers have been shipped with slow
memory leaks that became noticeable only after hours or days of continuous use. In the case
of the tablet PC whose character recognition module gradually consumed all system memory,
one must wonder about the extent of stress testing the software was subjected to.
Although system and acceptance testing are closely tied in many organizations,
fundamental differences exist between searching for faults and measuring quality. Even when
the two activities overlap to some extent, it is essential to be clear about the distinction, in
order to avoid drawing unjustified conclusions. Quantitative goals for dependability, including reliability, availability, and mean time between failures, are essentially statistical measures and depend on a statistically valid approach to drawing a representative
sample of test executions from a population of program behaviors. Systematic testing, which
includes all of the testing techniques presented heretofore in this book, does not draw statistically representative samples. The purpose of such techniques is not to fail at a "typical" rate, but to exhibit as many failures as possible. They are thus unsuitable for statistical testing.
The first requirement for valid statistical testing is a precise definition of what is being
measured and for what population. If system operation involves transactions, each of which
consists of several operations, a failure rate of one operation in a thousand is quite different
from a failure rate of one transaction in a thousand. In addition, the failure rate may vary
depending on the mix of transaction types, or the failure rate may be higher when one million
transactions occur in an hour than when the same transactions are spread across a day.
Statistical modeling therefore necessarily involves construction of a model of usage, and the
results are relative to that model.
Suppose, for example, that a typical session using the Chipmunk Web sales facility
consists of 50 interactions, the last of which is a single operation in which the credit card is
charged and the order recorded. Suppose the Chipmunk software always operates flawlessly
up to the point that a credit card is to be charged, but on half the attempts it charges the
wrong amount. What is the reliability of the system? If we count the fraction of individual
interactions that are correctly carried out, we conclude that only one operation in 100 fails, so
the system is 99% reliable. If we instead count entire sessions, then it is only 50% reliable,
since half the sessions result in an improper credit card charge.
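Written out, using the numbers in the example:

\[
R_{\text{operation}} = 1 - \frac{0.5}{50} = 0.99
\qquad\qquad
R_{\text{session}} = 1 - 0.5 = 0.50
\]

Each session fails with probability 0.5, and the failure always occurs in exactly one of the 50 interactions, so on average only 1 interaction in 100 fails even though half of all sessions end badly.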
If system behavior varies discontinuously (e.g., performance falls precipitously when system load crosses some threshold), then one may need to make distinct predictions for different value ranges.
A second problem faced by statistical testing, particularly for reliability, is that it may take a great deal of testing to obtain evidence of a sufficient level of reliability. Consider that a system that executes once per second, with a failure rate of one execution in a million, or 99.9999% reliability, still fails about 31 times each year; demonstrating even this level of reliability requires a great testing effort, and it may still not be adequate if each failure could result in death or a lawsuit. For critical systems,
one may insist on software failure rates that are an insignificant fraction of total failures. For
many other systems, statistical measures of reliability may simply not be worth the trouble.
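The arithmetic behind the figure of about 31 failures per year:

\[
3600 \times 24 \times 365 \approx 3.15 \times 10^{7} \text{ executions per year},
\qquad
3.15 \times 10^{7} \times 10^{-6} \approx 31.5 \text{ failures per year}.
\]

Demonstrating a failure rate much below one in a million would require on the order of millions of statistically representative test executions.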
A less formal, but frequently used approach to acceptance testing is testing with users.
An early version of the product is delivered to a sample of users who provide feedback on
failures and usability. Such tests are often called alpha and beta tests. The two terms
distinguish between testing phases. Often the early or alpha phases are performed within the
developing organization, while the later or beta phases are performed at users' sites.
In alpha and beta testing, the user sample determines the operational profile. A good
sample of users should include representatives of each distinct category of users, grouped by
operational profile and significance. Suppose, for example, Chipmunk plans to provide Web-
based sales facilities to dealers, industrial customers, and individuals. A good sample should
include both users from each of those three categories and a range of usage in each category.
In the industrial user category, large customers who frequently issue complex orders as well
as small companies who typically order a small number of units should be represented, as the
difference in their usage may lead to different failure rates. We may weigh differently the
frequency of failure reports from dealers and from direct customers, to reflect either the
expected mix of usage in the full population or the difference in consequence of failure.
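One simple way to combine such reports (a sketch; the weighting scheme is not prescribed here) is a weighted failure rate

\[
\hat{f} \;=\; \sum_{i} w_i \, f_i ,
\]

where \(f_i\) is the failure rate observed for user class \(i\) (dealers, industrial customers, individuals) and \(w_i\) is either the expected fraction of usage by that class in the full population or a weight reflecting the consequence of failures for that class, with \(\sum_i w_i = 1\).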
5.7 Usability
A usable product is quickly learned, allows users to work efficiently, and is pleasant
to use. Usability involves objective criteria such as the time and number of operations
required to perform tasks and the frequency of user error, in addition to the overall, subjective
satisfaction of users.
For test and analysis, it is useful to distinguish attributes that are uniquely associated
with usability from other aspects of software quality (dependability, performance, security,
etc.). Other software qualities may be necessary for usability; for example, a program that
often fails to satisfy its functional requirements or that presents security holes is likely to
suffer poor usability as a consequence. Distinguishing primary usability properties from other
software qualities allows responsibility for each class of properties to be allocated to the most
appropriate personnel, at the most cost-effective points in the project schedule.
Even if usability is largely based on user perception and thus is validated based on
user feedback, it can be verified early in the design and through the whole software life cycle.
The process of verifying and validating usability includes the following main steps:
The purpose of exploratory testing is to investigate the mental model of end users. It
consists of asking users about their approach to interactions with the system. For example,
during an exploratory test for the Chipmunk Web presence, we may provide users with a
generic interface for choosing the model they would like to buy, in order to understand how
users will interact with the system. A generic interface could present information about all
laptop computer characteristics uniformly to see which are examined first by the sample
users, and thereby to determine the set of characteristics that should belong to the summary in
the menu list of laptops. Exploratory test is usually performed early in design.
Users are asked to execute a planned set of actions that are identified as typical uses
of the tested feature. For example, the Chipmunk usability assessment team may ask users to
configure a product, modify the configuration to take advantage of some special offers, and
place an order with overnight delivery.
Users should perform tasks independently, without help or influence from the testing
staff. User actions are recorded, and comments and impressions are collected with a post
activity questionnaire. Activity monitoring can be very simple, such as recording sequences
of mouse clicks to perform each action. More sophisticated monitoring can include recording
mouse or eye movements. Timing should also be recorded and may sometimes be used for
driving the sessions (e.g., fixing a maximum time for the session or for each set of actions).
Guidelines for Web usability and accessibility, such as the W3C Web Content Accessibility Guidelines, include for example:
13. Provide clear and consistent navigation mechanisms to increase the likelihood that a person will find what they are looking for at a site.
14. Ensure that documents are clear and simple, so they may be more easily understood.
When a new version of software no longer correctly provides functionality that should be
preserved, we say that the new version regresses with respect to former versions. The non-regression of new versions (i.e., preservation of functionality) is a basic quality requirement. Disciplined design and development techniques, including precise specification and modularity that encapsulates independent design decisions, improve the likelihood of achieving non-regression. Testing activities that focus on regression problems are called
(non) regression testing.
Changes in the new software version may impact the format of inputs and outputs,
and test cases may not be executable without corresponding changes. Even simple
modifications of the data structures, such as the addition of a field or small change of data
types, may invalidate former test cases, or make former outputs no longer comparable with the new ones. Moreover,
some test cases may be obsolete, since they test features of the software that have been
modified, substituted, or removed from the new version.
Scaffolding that interprets test case specifications, rather than fully concrete test data, can reduce the impact of input and output format changes on regression testing.
Test case prioritization orders frequency of test case execution, executing all of them
eventually but reducing the frequency of those deemed least likely to reveal faults by some
criterion. Alternate execution is a variant on prioritization for environments with frequent
releases and small incremental changes; it selects a subset of regression test cases for each
software version. Prioritization can be based on specification- and code-based regression
test selection techniques. In addition, test histories and fault-proneness models can be
incorporated in prioritization schemes.
For example, a test case that has previously revealed a fault in a module that has
recently undergone change would receive a very high priority, while a test case that has never
failed (yet) would receive a lower priority, particularly if it primarily concerns a
feature that was not the focus of recent changes.
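A minimal C sketch of one such prioritization rule (the scoring weights and test names are hypothetical): tests that previously failed or that touch recently changed modules are scheduled first.

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    const char *name;
    int previously_failed;      /* 1 if the test has ever revealed a fault   */
    int touches_changed_code;   /* 1 if it exercises a recently changed part */
} test_case;

/* Hypothetical priority score: change-related coverage outweighs history. */
static int score(const test_case *t) {
    return 3 * t->touches_changed_code + 2 * t->previously_failed;
}

static int by_priority(const void *a, const void *b) {
    return score((const test_case *)b) - score((const test_case *)a);
}

int main(void) {
    test_case suite[] = {
        { "T-login",    1, 1 },
        { "T-report",   0, 0 },
        { "T-checkout", 0, 1 },
        { "T-search",   1, 0 },
    };
    size_t n = sizeof suite / sizeof suite[0];

    qsort(suite, n, sizeof suite[0], by_priority);   /* highest score first */
    for (size_t i = 0; i < n; i++)
        printf("%zu: %s\n", i + 1, suite[i].name);
    return 0;
}

In an alternate-execution scheme, only a prefix of this ordered list would be run on each release, with the remaining test cases rotated in on later releases.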
Regression test selection techniques are based on either code or specifications. Code
based selection techniques select a test case for execution if it exercises a portion of the code
that has been modified. Specification-based criteria select a test case for execution if it is
relevant to a portion of the specification that has been changed. Code-based regression test techniques can be supported by relatively simple tools. They work even when specifications are not properly maintained. However, like code-based test techniques in general, they do not scale as well as specification-based techniques to the coarser granularity of system testing.
CFG regression testing techniques compare the annotated control flow graphs of the
two program versions to identify a subset of test cases that traverse modified parts of the
graphs. The graph nodes are annotated with corresponding program statements, so that
comparison of the annotated CFGs detects not only new or missing nodes and arcs, but also
nodes whose changed annotations correspond to small, but possibly relevant, changes in
statements.
The CFG for version 2.0 of cgi_decode is given in Figure 22.4. Differences between
version 2.0 and 1.0 are indicated in gray. In the example, we have new nodes, arcs and paths.
In general, some nodes or arcs may be missing (e.g., when part of the program is removed in
the new version), and some other nodes may differ only in the annotations (e.g., when we
modify a condition in the new version).
CFG criteria select all test cases that exercise paths through changed portions of the
CFG, including CFG structure changes and node annotations. In the example, we would
select all test cases that pass through node D and proceed toward node G and all test cases
that reach node L, that is, all test cases except TC1. In this example, the criterion is not very
effective in reducing the size of the test suite because modified statements affect almost all
paths.
Figure: The control flow graph of function cgi_decode version 2.0. Gray background
indicates the changes from the former version.
If we consider only the corrective modification (nodes X and Y ), the criterion is more
effective. The modification affects only the paths that traverse the edge between D and G, so
the CFG regression testing criterion would select only test cases traversing those nodes (i.e.,
TC2, TC3, TC4, TC5, TC8, and TC9). In this case the test suite to be re-executed
includes two-thirds of the test cases of the original test suite. In general, the CFG regression
testing criterion is effective only when the changes affect a relatively small subset of the
paths of the original program, as in the latter case. It becomes almost useless when the
changes affect most paths, as in version 2.0.
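A minimal C sketch of the selection rule (the node identifiers and coverage data are hypothetical, not those of the figure): a test case is selected for re-execution if any CFG node it covered in the previous version has been changed.

#include <stdio.h>
#include <string.h>

#define MAX_COVERED 8

typedef struct {
    const char *name;
    const char *covered[MAX_COVERED];   /* nodes traversed, NULL-terminated */
} test_case;

static int covers_changed_node(const test_case *t,
                               const char *changed[], int n_changed) {
    for (int i = 0; t->covered[i] != NULL; i++)
        for (int j = 0; j < n_changed; j++)
            if (strcmp(t->covered[i], changed[j]) == 0)
                return 1;
    return 0;
}

int main(void) {
    const char *changed[] = { "N4", "N7" };     /* nodes modified in the new version */
    test_case suite[] = {
        { "T-a", { "N1", "N2", "N3", NULL } },
        { "T-b", { "N1", "N4", "N5", NULL } },
        { "T-c", { "N1", "N6", "N7", NULL } },
    };
    for (size_t i = 0; i < sizeof suite / sizeof suite[0]; i++)
        if (covers_changed_node(&suite[i], changed, 2))
            printf("re-run %s\n", suite[i].name);   /* selects T-b and T-c */
    return 0;
}

The same structure works at finer granularity (definition-use pairs, for the DF criterion described below) or at coarser granularity (methods, features, or files).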
DF regression selection techniques re-execute test cases that, when executed on the original
program, exercise DU pairs that were deleted or modified in the revised program. Test cases
that executed a conditional statement whose predicate was altered are also selected, since the
changed predicate could alter some old definition-use associations.
Where test case specifications and test data are generated automatically from a
specification or model, generation can simply be repeated each time the specification or
model changes.
Code-based regression test selection criteria can be adapted for model-based regression test
selection. Consider, for example, the control flow graph derived from the process shipping order specification, and suppose a new item is added to that specification. We can identify regression test cases with the CFG criterion that selects all cases corresponding to international shipping addresses (i.e., test cases TC-1 and TC-5 of the original functional test suite).
At coarser levels of regression testing, one can consider coarser-grain elements such as methods, features, and files.
CHAPTER -2
LEVELS OF TESTING, INTEGRATION TESTING
The waterfall model is closely associated with top–down development and design by
functional decomposition. The end result of preliminary design is a functional decomposition
of the entire system into a tree-like structure of functional components. Figure 1.2 contains a
partial functional decomposition of our automated teller machine (ATM) system.
With this decomposition, top-down integration would begin with the main program, checking the calls to the three procedures at the next level of the tree (TerminalI/O, ManageSession, and ConductTransaction). Following the tree, the ManageSession procedure would be tested next, and then the CardEntry, PINEntry, and SelectTransaction procedures. In this case the actual code for low-level units is replaced by a stub, which is a throwaway piece of code that takes the place of the actual code. Bottom-up integration would follow the opposite sequence, starting with the CardEntry, PINEntry, and SelectTransaction procedures and working up toward the main program. In bottom-up integration, units at higher levels are replaced by drivers that emulate the procedure calls. The "big bang" approach simply puts all the units together at once, with no stubs or drivers. Whichever approach is taken, the goal of traditional integration testing is to integrate previously tested units with respect to the functional decomposition tree.
Functional decomposition ties integration and testing to the waterfall model and to the bottom-up testing order, but it relies on one of the major weaknesses of waterfall development cited by Agresti (1986): the need for "perfect foresight." Functional decomposition can only be well done when the system is completely
understood, and it promotes analysis to the near exclusion of synthesis. The result is a very
long separation between requirements specification and a completed system, and during this
interval, no opportunity is available for feedback from the customer. Composition, on the
other hand, is closer to the way people work: start with something known and understood,
then add to it gradually, and maybe remove undesired portions.
There are three mainline derivatives of the waterfall model: incremental development,
evolutionary development, and the spiral model (Boehm, 1988). Each of these involves a
series of increments or builds, as shown in Figure 11.3. Within a build, the normal waterfall phases from detailed design through testing occur, with one important difference: system testing is split into two steps, regression and progression testing.
The main impact of the series of builds is that regression testing becomes necessary.
The goal of regression testing is to ensure that things that worked correctly in the previous
build still work with the newly added code. Regression testing can either precede or follow
integration testing, or possibly occur in both places. Progression testing assumes that
regression testing was successful and that the new functionality can be tested. (We like to
think that the addition of new code represents progress, not a regression.) Regression testing
is an absolute necessity in a series of builds because of the well-known ripple effect of
changes to an existing system. (The industrial average is that one change in five introduces a
new fault.)
The differences among the three spin-off models are due to how the builds are
identified. In incremental development, the motivation for separate builds is usually to flatten
the staff profile. With pure waterfall development, there can be a huge bulge of personnel for
the phases from detailed design through unit testing. Many organizations cannot support such
rapid staff fluctuations, so the system is divided into builds that can be supported by existing
personnel. In evolutionary development, the presumption of a build sequence is still made,
but only the first build is defined. On the basis of that, later builds are identified, usually in
response to priorities set by the customer/user, so the system evolves to meet the changing
needs of the user.
This foreshadows the customer-driven tenet of the agile methods. The spiral model is
a combination of rapid prototyping and evolutionary development, in which a build is defined
first in terms of rapid prototyping and then is subjected to a go/no-go decision based on
technology-related risk factors. From this, we see that keeping preliminary design as an
integral step is difficult for the evolutionary and spiral models. To the extent that this cannot
be maintained as an integral activity, integration testing is negatively affected. System testing
is not affected.
With rapid prototyping, customers can identify scenarios that are important to them, and these can then be used as system test cases.
These could be precursors to the user stories of the agile life cycles. The main contribution of
rapid prototyping is that it brings the operational (or behavioral) viewpoint to the
requirements specification phase. Usually, requirements specification techniques emphasize
the structure of a system, not its behavior. This is unfortunate because most customers do not
care about the structure, and they do care about the behavior.
Once again, this life cycle (development from an executable specification) has no implications for integration testing. One big difference is that the requirements specification document is explicit, as opposed to a prototype. More important, it is often a mechanical process to derive system test cases from
an executable specification. Although more work is required to develop an executable
specification, this is partially offset by the reduced effort to generate system test cases. Here
is another important distinction: when system testing is based on an executable specification,
we have an interesting form of structural testing at the system level.
• The SATM terminal is sketched in Figure; in addition to the display screen, there are
function buttons B1, B2, and B3, a digit keypad with a cancel key, slots for printer
receipts and ATM cards, and doors for deposits and cash withdrawals.
• The SATM system is described here in two ways: with a structured analysis
approach, and with an object-oriented approach.
The structured analysis approach to requirements specification is still widely used. It enjoys extensive CASE tool support as well as commercial training, and it is described in numerous texts. The technique is based on three complementary models: function, data, and control. Here we use data flow diagrams for the functional model, the entity/relationship model for data, and finite state machine models for the control aspect of the SATM system. The functional and data models were drawn with the Deft CASE tool from Sybase, Inc. That tool identifies external devices with lowercase letters. Elements of the functional decomposition are identified with numbers. The open and filled arrowheads on flow arrows signify whether the flow item is simple or compound. The portion of the SATM system shown here pertains generally to the personal identification number (PIN) verification portion of the system.
The display screen comprises the following:
Screen 1: Welcome
Screen 2: Enter PIN
Screen 3: Wrong PIN
Screen 4: PIN failed, card retained
Screen 5: Select transaction type
Screen 6: Select account type
Screen 7: Enter amount
Screen 8: Insufficient funds
Screen 9: Cannot dispense that amount
Screen 10: Cannot process withdrawals
The upper level finite state machine in Figure 4.12 divides the system into states that
correspond to stages of customer usage.
Other choices are possible, for instance, we might choose states to be screens being
displayed (this turns out to be a poor choice).
Finite state machines can be hierarchically decomposed in much the same way as
dataflow diagrams.
The decomposition of the Await PIN state is shown in Figure 4.13. In both of these
figures, state transitions are caused either by events at the ATM terminal (such as a
keystroke) or by data conditions (such as the recognition that a PIN is correct)
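A minimal C sketch of such a finite state machine (the state and event names are simplified and hypothetical, not the exact SATM model): system-level test threads are then just sequences of events driven through the machine.

#include <stdio.h>

typedef enum { AWAIT_CARD, AWAIT_PIN, AWAIT_TRANSACTION, CARD_RETAINED } state_t;
typedef enum { CARD_INSERTED, PIN_CORRECT, PIN_WRONG } event_t;

static state_t next_state(state_t s, event_t e, int *failed_attempts) {
    switch (s) {
    case AWAIT_CARD:
        if (e == CARD_INSERTED) { *failed_attempts = 0; return AWAIT_PIN; }
        break;
    case AWAIT_PIN:
        if (e == PIN_CORRECT)
            return AWAIT_TRANSACTION;
        if (e == PIN_WRONG)
            return (++*failed_attempts >= 3) ? CARD_RETAINED : AWAIT_PIN;
        break;
    default:
        break;
    }
    return s;   /* events not expected in this state are ignored */
}

int main(void) {
    /* One system-level thread: valid card, two wrong PINs, then the correct PIN. */
    event_t thread[] = { CARD_INSERTED, PIN_WRONG, PIN_WRONG, PIN_CORRECT };
    state_t s = AWAIT_CARD;
    int failed = 0;
    for (size_t i = 0; i < sizeof thread / sizeof thread[0]; i++)
        s = next_state(s, thread[i], &failed);
    printf("final state: %s\n", s == AWAIT_TRANSACTION ? "AwaitTransaction" : "other");
    return 0;
}

Each path through the machine (for example, three wrong PIN entries leading to the card being retained) corresponds to one of the system-level threads listed later in this section.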
During design, some of the original decisions may be revised based on additional
insights and more detailed requirements (for example, performance or reliability
goals).
The end result is a functional decomposition such as the partial one shown in the
structure chart in figure
Notice that the original first-level decomposition into four subsystems is continued: the functionality has been decomposed to lower levels of detail.
If we follow the definition of the SATM system, we could first postulate that system
testing should make sure that all fifteen display screens have been generated (an output domain-based, functional view of system testing).
The entity/relationship model also helps: the one-to-one and one-to-many
relationships help us understand how much testing must be done.
The control model (in this case, a hierarchy of finite state machines) is the most
helpful.
We can express system test cases in terms of paths through the finite state machine(s); doing this yields a system-level analog of structural testing.
The functional models (dataflow diagrams and structure charts) move in the direction
of levels because both express a functional decomposition.
We can look at the structure chart and identify where system testing ends and integration testing starts.
For instance, the following threads are all clearly at the system level:
1. Insertion of an invalid card. (This is probably the "shortest" system thread.)
2. Insertion of a valid card, followed by three failed PIN entry attempts.
3. Insertion of a valid card, a correct PIN entry attempt, followed by a balance
inquiry.
4. Insertion of a valid card, a correct PIN entry attempt, followed by a deposit.
5. Insertion of a valid card, a correct PIN entry attempt, followed by a withdrawal.
6. Insertion of a valid card, a correct PIN entry attempt, followed by an attempt to
withdraw more cash than the account balance.
• Every system has a port boundary; the port boundary of the SATM system includes
the digit keypad, the function buttons, the screen, the deposit and withdrawal doors,
the card and receipt slots, and so on.
• Each of these devices can be thought of as a "port", and events occur at system ports.
• The port input and output events are visible to the customer, and the customer very
often understands system behavior in terms of sequences of port events.
• This fits our understanding of a test case, in which we specify pre-conditions, inputs,
outputs, and post-conditions.
• If a test case (thread) ever requires an input (or an output) that is not visible at the port boundary, the test case cannot be a system-level test case (thread).