STMT Unit-3
UNIT III
Dataflow testing:
Data flow testing is the name given to a family of test strategies based on
selecting paths through the program's control flow in order to explore
sequences of events related to the status of data objects.
For example, pick enough paths to assure that every data object has been
initialized prior to use or that all defined objects have been used for
something.
Motivation: It is our belief that, just as one would not feel confident
about a program without executing every statement in it as part of some
test, one should not feel confident about a program without having seen
the effect of using the value produced by each and every computation.
There are two types of data flow machines with different architectures.
7. GOTO 1
2. Multi-instruction, multi-data (MIMD) machines
These machines can fetch several instructions and objects in parallel.
Bug assumption:
The bug assumption for data-flow testing strategies is that control flow is
generally correct and that something has gone wrong with the software so
that data objects are not available when they should be, or silly things
are being done to data objects.
Also, if there is a control-flow problem, we expect it to have symptoms
that can be detected by data-flow analysis.
Although we'll be doing data-flow testing, we won't be using data flow
graphs as such. Rather, we'll use an ordinary control flow graph
annotated to show what happens to the data objects of interest at the
moment.
The data flow graph is a graph consisting of nodes and directed links.
We will use a control graph to show what happens to data objects of
interest at that moment.
Our objective is to expose deviations between the data flows we have and
the data flows we want.
Data objects can be created, killed, and used. They can be used in two
distinct ways: (1) in a calculation, or (2) as part of a control flow predicate.
The following symbols denote these possibilities:
1. Defined: d - defined, created, initialized etc
2. Killed or undefined: k - killed, undefined, released etc
3. Usage: u - used for something (c - used in Calculations, p - used
in a predicate)
1. Defined (d):
3. Usage (u):
A variable is used for computation (c) when it appears on the right hand
side of an assignment statement.
A file record is read or written.
It is used in a Predicate (p) when it appears directly in a predicate.
The data flow anomaly model prescribes that an object can be in one of four
distinct states:
1. K :- undefined, previously killed, does not exist
2. D :- defined but not yet used for anything
3. U :- has been used for computation or in predicate
4. A :- anomalous
These capital letters (K,D,U,A) denote the state of the variable and should
not be confused with the program action, denoted by lower case letters.
Assume that the variable starts in the K state - that is, it has not been defined
or does not exist. If an attempt is made to use it or to kill it (e.g., say that we're
talking about opening, closing, and using files and that 'killing' means closing),
the object's state becomes anomalous (state A) and, once it is anomalous, no
action can return the variable to a working state. If it is defined (d), it goes into
the D, or defined but not yet used, state. If it has been defined (D) and
redefined (d) or killed without use (k), it becomes anomalous, while usage (u)
brings it to the U state. If in U, redefinition (d) brings it to D, u keeps it in U,
and k kills it.
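The transitions just described can be written down as a small state table. The following Python sketch is only an illustration: the names TRANSITIONS, final_state, and first_anomaly are invented here, and the single action u stands for both c and p uses.

```python
# Transition table for the unforgiving model described in the text.
# Keys are states (K, D, U, A); inner keys are actions (d, k, u).
TRANSITIONS = {
    "K": {"d": "D", "u": "A", "k": "A"},  # use or kill before define is anomalous
    "D": {"d": "A", "u": "U", "k": "A"},  # redefine or kill without use is anomalous
    "U": {"d": "D", "u": "U", "k": "K"},  # every action is legal after a use
    "A": {"d": "A", "u": "A", "k": "A"},  # no action leaves the anomalous state
}


def final_state(actions, start="K"):
    """Return the state reached after applying a string of actions such as 'duk'."""
    state = start
    for action in actions:
        state = TRANSITIONS[state][action]
    return state


def first_anomaly(actions, start="K"):
    """Return the index of the action that first reaches state A, or None."""
    state = start
    for index, action in enumerate(actions):
        state = TRANSITIONS[state][action]
        if state == "A":
            return index
    return None
```

For example, final_state("dud") leaves the variable in D, while final_state("dk") ends in A because the definition was killed without being used.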
An alternate state graph has three normal and three anomalous states, and in it
the kk sequence is not considered anomalous. The difference between this state
graph and the one described above is that redemption is possible. A proper
action from any of the three anomalous states returns the variable to a useful
working state.
There are many things for which current notions of static analysis are
inadequate. They are:
the proper state on a given path or, for that matter, whether they exist
at all.
4. Dynamic Subroutine and Function Names in a Call: A subroutine or
function name is a dynamic variable in a call. What is passed, or a
combination of subroutine names and data objects, is constructed on
a specific path. There's no way, without executing the path, to
determine whether the call is correct or not.
5. False Anomalies: Anomalies are specific to paths. Even a "clear bug"
such as ku may not be a bug if the path along which the anomaly
exists is unachievable. Such "anomalies" are false anomalies.
Unfortunately, the problem of determining whether a path is or is not
achievable is unsolvable.
6. Recoverable Anomalies and Alternate State Graphs: What
constitutes an anomaly depends on context, application, and
semantics. How does the compiler know which model I have in mind?
It can't because the definition of "anomaly" is not fundamental. The
language processor must have a built-in anomaly definition with
which you may or may not (with good reason) agree.
7. Concurrency, Interrupts, System Issues: As soon as we get away
from the simple single-task uniprocessor environment and start
thinking in terms of systems, most anomaly issues become vastly
more complicated. How often do we define or create data objects at an
interrupt level so that they can be processed by a lower-priority
routine? Interrupts can make the "correct" anomalous and the
"anomalous" correct. True concurrency (as in an MIMD machine) and
pseudoconcurrency (as in multiprocessing) systems can do the same
to us. Much of integration and system testing is aimed at detecting
data-flow anomalies that cannot be detected in the context of a single
routine.
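The false anomaly of item 5 can be illustrated with a small Python sketch. The function process and its flag parameter are hypothetical. A static scan that ignores the predicates sees a kill followed by a use (ku) on one path through the code, but that path would need flag to be both true and false, so it can never execute.

```python
def process(flag):
    buffer = [1, 2, 3]        # d: buffer is defined
    if flag:
        del buffer            # k: buffer is killed on this branch
    if not flag:
        return sum(buffer)    # u: buffer is used on this branch
    return 0                  # reached only when the kill branch was taken
```

Both achievable paths run cleanly: one is d followed by k, the other is d followed by u. Spotting the contradiction is easy here only because the two predicates are obviously opposite; as the list notes, the general problem of deciding whether a path is achievable is unsolvable.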
Although static analysis methods have limits, they are worth using and a
continuing trend in language processor design has been better static
analysis methods, especially for data flow anomaly detection. That's good
because it means there's less for us to do as testers and we have far too
much to do as it is.
The data flow model is based on the program's control flow graph. Don't
confuse that with the program's data flow graph.
Here we annotate each link with symbols (for example, d, k, u, c, p) or
sequences of symbols (for example, dd, du, ddd) that denote the sequence
of data operations on that link with respect to the variable of interest.
Such annotations are called link weights.
The control flow graph structure is the same for every variable: it is the
weights that change.
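One way to picture link weights is as one table of annotations per variable laid over a single shared set of links. The sketch below assumes a small made-up graph; LINKS, WEIGHTS, and path_weight are hypothetical names, and an empty string means the link does nothing to that variable.

```python
# Hypothetical control flow graph: each link is a (from_node, to_node) pair.
LINKS = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5)]

# One weight table per variable. The links are identical; only the weights differ.
WEIGHTS = {
    "X": {(1, 2): "d", (2, 3): "p", (2, 4): "p", (3, 5): "c", (4, 5): "k"},
    "Y": {(1, 2): "", (2, 3): "", (2, 4): "", (3, 5): "d", (4, 5): "dc"},
}


def path_weight(path, variable):
    """Concatenate the link weights met along a path for one variable."""
    links = zip(path, path[1:])
    return "".join(WEIGHTS[variable][link] for link in links)
```

With these assumed weights, the path (1, 2, 3, 5) reads as dpc for X but only d for Y, which shows how the same path yields a different action sequence for each variable of interest.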
Components of the model:
1. To every statement there is a node, whose name is unique. Every
node has at least one outlink and at least one inlink except for
exit nodes and entry nodes.
2. Exit nodes are dummy nodes placed at the outgoing arrowheads
of exit statements (e.g., END, RETURN), to complete the graph.
Similarly, entry nodes are dummy nodes placed at entry
statements (e.g., BEGIN) for the same reason.
3. The outlink of simple statements (statements with only one
outlink) are weighted by the proper sequence of data-flow actions
for that statement. Note that the sequence can consist of more
than one letter. For example, the assignment statement A:= A +
B in most languages is weighted by cd or possibly ckd for
variable A. Languages that permit multiple simultaneous
assignments and/or compound statements can have anomalies
Prepared by: Dept. of CSE, RGMCET
SOFTWARE TESTING METHODOLOGIES AND TOOLS
A simple path segment is a path segment in which at most one node is visited
twice. For example, in Figure 4, (7,4,5,6,7) is a simple path segment.
A simple path segment is either loop-free or if there is a loop, only one node is
involved.
A du path from node i to k is a path segment such that if the last link has a
computational use of X, then the path is simple and definition-clear; if the
penultimate (last but one) node is j - that is, the path is (i,p,q,...,r,s,t,j,k) and
link (j,k) has a predicate use - then the path from i to j is both loop-free and
definition-clear.
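The definitions of simple path segment and du path can be turned into small checks. This is a sketch under stated assumptions, not a definitive implementation: the function names are invented, weights maps each link to at most one action letter for the variable X, and "definition-clear" is taken to mean that no link after the first one defines or kills X.

```python
from collections import Counter


def is_loop_free(path):
    """True when no node appears more than once."""
    return len(set(path)) == len(path)


def is_simple(path):
    """True when at most one node is visited twice and none more often."""
    counts = Counter(path)
    repeated = [node for node, count in counts.items() if count > 1]
    return len(repeated) <= 1 and all(counts[node] == 2 for node in repeated)


def is_definition_clear(path, weights):
    """Assumed meaning: no d or k on any link after the first link."""
    later_links = list(zip(path, path[1:]))[1:]
    return not any(weights.get(link, "") in ("d", "k") for link in later_links)


def is_du_path(path, weights):
    """Check the du path rule for a path with at least one link."""
    last_action = weights.get((path[-2], path[-1]), "")
    if last_action == "c":
        # Computational use on the last link: whole path simple and clear.
        return is_simple(path) and is_definition_clear(path, weights)
    if last_action == "p":
        # Predicate use on the last link: path up to node j loop-free and clear.
        prefix = path[:-1]
        return is_loop_free(prefix) and is_definition_clear(prefix, weights)
    return False
```

The path (7,4,5,6,7) from the text passes is_simple but fails is_loop_free, which is exactly the distinction the two cases of the du path rule rely on.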
For variables X and Y: In Figure 3, because variables X and Y are used only on
link (1,3), any test that starts at the entry satisfies this criterion (for variables X
and Y, but not for all variables as required by the strategy).
2. All Uses Strategy (AU): The all uses strategy is that at least one
definition-clear path from every definition of every variable to every use of
that definition
be exercised under some test. Just as we reduced our ambitions by stepping
down from all paths (P) to branch coverage (C2), say, we can reduce the
number of test cases by asking that the test set should include at least one
path segment from every definition to every use that can be reached by that
definition.
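A test set can be checked against the all uses strategy mechanically. The sketch below is illustrative only: WEIGHTS is a made-up graph for a single variable, each link is assumed to carry at most one action (d, k, c, or p), and the function names are hypothetical.

```python
# Hypothetical weights for one variable; links are (from_node, to_node) pairs.
WEIGHTS = {(1, 2): "d", (2, 3): "p", (2, 4): "p",
           (3, 5): "c", (4, 5): "d", (5, 6): "c"}


def du_pairs(weights):
    """All (definition link, use link) pairs joined by a definition-clear path."""
    pairs = set()
    for def_link, action in weights.items():
        if action != "d":
            continue
        frontier, seen = [def_link[1]], set()
        while frontier:
            node = frontier.pop()
            if node in seen:
                continue
            seen.add(node)
            for link, act in weights.items():
                if link[0] != node:
                    continue
                if act in ("c", "p"):
                    pairs.add((def_link, link))
                if act not in ("d", "k"):
                    # The definition is still live past this link.
                    frontier.append(link[1])
    return pairs


def covered_pairs(test_path, weights):
    """Definition-use pairs actually exercised along one test path."""
    covered, live_def = set(), None
    for link in zip(test_path, test_path[1:]):
        act = weights.get(link, "")
        if act == "d":
            live_def = link
        elif act == "k":
            live_def = None
        elif act in ("c", "p") and live_def is not None:
            covered.add((live_def, link))
    return covered


def satisfies_all_uses(test_paths, weights):
    """True when every reachable definition-use pair is exercised by some test."""
    exercised = set()
    for path in test_paths:
        exercised |= covered_pairs(path, weights)
    return du_pairs(weights) <= exercised
```

On this assumed graph there are five definition-use pairs, and the two paths (1,2,3,5,6) and (1,2,4,5,6) together exercise all of them, while either path alone does not.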
3. All p-uses/some c-uses strategy (APU+C): For every variable and every
definition of that variable, include at least one definition-free path from the
definition to every predicate use; if there are definitions of the variables that are
not covered by the above prescription, then add computational use test cases
as required to cover every definition.
For variable Z: In Figure 4, for APU+C we can select paths that all take the
upper link (12,13) and therefore we do not cover the c-use of Z; but that's okay
according to the strategy's definition because every definition is covered. Links
(1,3), (4,5), (5,6), and (7,8) must be included because they contain definitions
for variable Z. Links (3,4), (3,5), (8,9), (8,10), (9,6), and (9,10) must be included
because they contain predicate uses of Z. Find a covering set of test cases
under APU+C for all variables in this example - it only takes two tests.
The above examples imply that APU+C is stronger than branch coverage but
ACU+P may be weaker than, or incomparable to, branch coverage.
5. All Definitions Strategy (AD): The all definitions strategy asks only that every
definition of every variable be covered by at least one use of that variable, be
that use a computational use or a predicate use.
From the definition of this strategy we would expect it to be weaker than both
ACU+P and APU+C.
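Because the all definitions strategy only asks for one use per definition, a check for it is short. The following is a minimal sketch on a made-up graph; WEIGHTS and the function names are hypothetical, and each link is assumed to carry at most one action for the variable.

```python
# Hypothetical weights for one variable; links are (from_node, to_node) pairs.
WEIGHTS = {(1, 2): "d", (2, 3): "p", (2, 4): "p",
           (3, 5): "c", (4, 5): "d", (5, 6): "c"}


def used_definitions(test_path, weights):
    """Definition links whose value reaches at least one use along test_path."""
    used, live_def = set(), None
    for link in zip(test_path, test_path[1:]):
        action = weights.get(link, "")
        if action == "d":
            live_def = link
        elif action == "k":
            live_def = None
        elif action in ("c", "p") and live_def is not None:
            used.add(live_def)
    return used


def satisfies_all_definitions(test_paths, weights):
    """True when every definition is followed by a c-use or p-use in some test."""
    definitions = {link for link, action in weights.items() if action == "d"}
    used = set()
    for path in test_paths:
        used |= used_definitions(path, weights)
    return definitions <= used
```

Here the single path (1,2,4,5,6) is enough: the definition on link (1,2) is covered by the predicate use on (2,4), and the definition on (4,5) by the computational use on (5,6). The uses on links (2,3) and (3,5) are never exercised, which is consistent with the expectation that AD is a weak criterion.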
6. All Predicate Uses (APU), All Computational Uses (ACU) Strategies: The
all predicate uses strategy is derived from APU+C strategy by dropping the
requirement that we include a c-use for the variable if there are no p-uses for
the variable. The all computational uses strategy is derived from ACU+P
strategy by dropping the requirement that we include a p-use for the variable if
there are no c-uses for the variable.
It is intuitively obvious that ACU should be weaker than ACU+P and that APU
should be weaker than APU+C.
The figure below compares path-flow and data-flow testing strategies. The arrows
denote that the strategy at the arrow's tail is stronger than the strategy at the
arrow's head.
The right-hand side of this graph, along the path from "all paths" to "all
statements" is the more interesting hierarchy for practical applications.
Note that although ACU+P is stronger than ACU, both are incomparable to
the predicate-biased strategies. Note also that "all definitions" is not
comparable to ACU or APU.
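The ordering can be treated as a directed graph in which "stronger than" means "reachable by following arrows". The sketch below is an assumption-laden reconstruction: the relations ACU+P over ACU, APU+C over APU, AD below both mixed strategies, and APU+C over branch coverage come from the text, while the remaining links follow the commonly cited form of this hierarchy and should be checked against the figure. The names STRONGER_THAN, is_stronger, and comparable are hypothetical.

```python
# Each strategy maps to the strategies directly below it (arrow tail -> head).
# Links not stated in the text are assumptions based on the usual hierarchy.
STRONGER_THAN = {
    "all paths": ["all du paths"],
    "all du paths": ["all uses"],
    "all uses": ["ACU+P", "APU+C"],
    "ACU+P": ["ACU", "AD"],
    "APU+C": ["APU", "AD"],
    "APU": ["branch"],
    "branch": ["statement"],
}


def is_stronger(a, b):
    """True when strategy b can be reached from strategy a by following arrows."""
    stack, seen = list(STRONGER_THAN.get(a, [])), set()
    while stack:
        current = stack.pop()
        if current == b:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(STRONGER_THAN.get(current, []))
    return False


def comparable(a, b):
    """True when one of the two strategies is stronger than the other."""
    return is_stronger(a, b) or is_stronger(b, a)
```

Under these assumed links, comparable("AD", "ACU") and comparable("AD", "APU") are both false, matching the note that "all definitions" is not comparable to ACU or APU.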
1. Data flow testing is used to detect the abnormalities that may arise
due to data flow anomalies.
2. Data flow testing shows the relationship between the data objects that
represent the data.
3. Data flow testing strategies help in determining the usage of variables that
are included in the test suite.
4. Data flow testing is cost effective.
5. Data flow testing helps resolve problems that are encountered during
the execution of the program.
6. Data flow testing is used in developing web applications with Java
Technology.