Chapter 5
DEPENDENCE, DATA FLOW MODELS, AND DATA FLOW TESTING: Definition-Use pairs; Data flow analysis; Classic analyses; From execution to conservative flow analysis; Data flow analysis with arrays and pointers; Inter-procedural analysis; Overview of data flow testing; Definition-Use associations; Data flow testing criteria; Data flow coverage with complex structures; The infeasibility problem. 7 Hrs
10/11/2011
Collections by: Dr. U.P.Kulkarni
Why Data Flow Models?
Other Models emphasized control
Control flow graph, call graph, finite state machines
We also need to reason about dependence
Where does this value of x come from? What would be affected by changing this? ...
Many program analyses and test design techniques use data flow information
Often in combination with control flow
Def-Use Pairs (1)
A def-use (du) pair associates a point in a program where a value is produced with a point where it is used
Definition: where a variable gets a value
  Variable declaration (often the special value "uninitialized")
  Variable initialization
  Assignment
  Values received by a parameter
Use: extraction of a value from a variable
  Expressions
  Conditional statements
  Parameter passing
  Returns
Def-Use Pairs (2)
    ...
    if (...) {
        x = ... ;          // Definition: x gets a value
        ...
    }
    y = ... + x + ... ;    // Use: the value of x is extracted
    ...
Def-Use path: from the definition of x inside the if block to the use of x in the assignment to y
Def-Use Pairs (3)
/** Euclid's algorithm */
public class GCD {
    public int gcd(int x, int y) {
        int tmp;             // A: def x, y, tmp
        while (y != 0) {     // B: use y
            tmp = x % y;     // C: def tmp; use x, y
            x = y;           // D: def x; use y
            y = tmp;         // E: def y; use tmp
        }
        return x;            // F: use x
    }
}
Def-Use Pairs (4)
A definition-clear path is a path along the CFG from a definition to a use of the same variable without another definition of the variable between
If, instead, another definition is present on the path, then the latter definition kills the former
A def-use pair is formed if and only if there is a definition-clear path between the definition and the use
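The rule above can be tried out on the gcd example. The following is a minimal sketch, not a definitive implementation: it assumes the control flow edges A->B, B->C, C->D, D->E, E->B and B->F, and it assumes that the uses in a node happen before the definitions in that node. The class name GcdDefUse and the "var@node" encoding of a definition are hypothetical names chosen for illustration. A definition reaches a node exactly when some definition-clear path leads there, so the pairs are read off a reaching-definitions fixed point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class GcdDefUse {
    // Nodes A..F of the gcd example, indexed 0..5.
    static final String[] NODES = {"A", "B", "C", "D", "E", "F"};
    // Assumed CFG, given as predecessor lists: A->B, E->B, B->C, C->D, D->E, B->F.
    static final int[][] PRED = {{}, {0, 4}, {1}, {2}, {3}, {1}};
    // Definitions and uses per node, copied from the comments in the gcd listing.
    static final String[][] DEFS = {{"x", "y", "tmp"}, {}, {"tmp"}, {"x"}, {"y"}, {}};
    static final String[][] USES = {{}, {"y"}, {"x", "y"}, {"y"}, {"tmp"}, {"x"}};

    /** Returns DU pairs as strings such as "x@A -> C" (x defined at A, used at C). */
    static Set<String> duPairs() {
        int n = NODES.length;
        List<Set<String>> in = new ArrayList<>();
        List<Set<String>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            in.add(new TreeSet<>());
            out.add(new TreeSet<>());
        }
        boolean changed = true;
        while (changed) { // iterate until no set changes
            changed = false;
            for (int i = 0; i < n; i++) {
                // Definitions reaching the entry of node i: union over predecessors.
                Set<String> reachIn = new TreeSet<>();
                for (int p : PRED[i]) {
                    reachIn.addAll(out.get(p));
                }
                in.set(i, reachIn);
                // A definition in node i kills every incoming definition of the same variable.
                List<String> killedVars = Arrays.asList(DEFS[i]);
                Set<String> reachOut = new TreeSet<>();
                for (String d : reachIn) {
                    String var = d.substring(0, d.indexOf('@'));
                    if (!killedVars.contains(var)) {
                        reachOut.add(d);
                    }
                }
                for (String v : DEFS[i]) {
                    reachOut.add(v + "@" + NODES[i]);
                }
                if (!reachOut.equals(out.get(i))) {
                    out.set(i, reachOut);
                    changed = true;
                }
            }
        }
        // A DU pair exists when a definition of v reaches a node that uses v.
        Set<String> pairs = new TreeSet<>();
        for (int i = 0; i < n; i++) {
            for (String v : USES[i]) {
                for (String d : in.get(i)) {
                    if (d.startsWith(v + "@")) {
                        pairs.add(d + " -> " + NODES[i]);
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String pair : duPairs()) {
            System.out.println(pair);
        }
    }
}
```

Under these assumptions the sketch reports 11 pairs. The declaration of tmp at A never pairs with the use at E, because the definition at C kills it on every path.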
Definition-Clear or Killing
    x = ... ;     // A: def x (Definition: x gets a value)
    q = ... ;
    x = y;        // B: kill x, def x (Definition: x gets a new value, old value is killed)
    z = ... ;
    y = f(x);     // C: use x (Use: the value of x is extracted)
Path A..C is not definition-clear
Path B..C is definition-clear
Data Flow Analysis
Computing data flow information
DATA FLOW ANALYSIS
A flow graph is a directed graph that shows all possible paths of program execution.
DATA FLOW ANALYSIS
Use: yields information useful to both programmers and optimizing compilers
Programmers can be told about:
  Unreachable code / dead code
  Unused parameters to procedures
  Variables that are used before being given an initial value
DATA FLOW ANALYSIS
Reaching definition analysis
Live variable analysis
Use-definition chains (ud chains)
Definition-use chains (du chains)
...
DATA FLOW ANALYSIS
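Live-variable table for nodes 1 to 6, partially filled in (a dash marks an empty set)
NODE     1    2    3    4    5    6
IN       -    a    bc   b    a    c
USE      -    a    bc   b    a    c
OUT      entries shown: a -
In[s]    entries shown: - a -
Def[n]   entries shown: a b c a a -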
DATA FLOW ANALYSIS
Live-variable table for nodes 1 to 6 (a dash marks an empty set)
NODE     1    2    3    4    5    6
IN       -    ac   bc   b    ac   c
USE      -    a    bc   b    a    c
OUT      a    bc   b    a    ac   -
In[s]    a    bc   b    a    ac   -
Def[n]   a    b    c    a    -    -
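The column names in the table suggest the usual live-variable equations: in[n] = use[n] ∪ (out[n] − def[n]), and out[n] = the union of in[s] over all successors s of n. The sketch below iterates those equations to a fixed point. It is illustrative only: the flow graph figure did not survive in these notes, so the successor edges (a straight line 1 to 6 with a loop from node 5 back to node 2) are an assumption, and the use and def sets are read from the USE and Def[n] rows as far as they can be recovered. The class name Liveness is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class Liveness {
    static final int N = 6; // nodes are numbered 1..6; index 0 is unused
    // Assumed successor edges: 1->2, 2->3, 3->4, 4->5, 5->2 (loop), 5->6.
    static final int[][] SUCC = {{}, {2}, {3}, {4}, {5}, {2, 6}, {}};
    // use[n] and def[n], one letter per variable, as read from the table rows.
    static final String[] USE = {"", "", "a", "bc", "b", "a", "c"};
    static final String[] DEF = {"", "a", "b", "c", "a", "", ""};

    static Set<Character> toSet(String letters) {
        Set<Character> result = new TreeSet<>();
        for (char ch : letters.toCharArray()) {
            result.add(ch);
        }
        return result;
    }

    /** Returns the list of in[n] sets (index 0 unused) at the fixed point. */
    static List<Set<Character>> liveIn() {
        List<Set<Character>> in = new ArrayList<>();
        for (int n = 0; n <= N; n++) {
            in.add(new TreeSet<>());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            // Liveness flows backward, so visiting nodes in reverse order converges faster.
            for (int n = N; n >= 1; n--) {
                Set<Character> out = new TreeSet<>();
                for (int s : SUCC[n]) {
                    out.addAll(in.get(s)); // out[n] = union of in[s]
                }
                Set<Character> newIn = new TreeSet<>(out);
                newIn.removeAll(toSet(DEF[n])); // out[n] - def[n]
                newIn.addAll(toSet(USE[n]));    // ... union use[n]
                if (!newIn.equals(in.get(n))) {
                    in.set(n, newIn);
                    changed = true;
                }
            }
        }
        return in;
    }

    public static void main(String[] args) {
        List<Set<Character>> in = liveIn();
        for (int n = 1; n <= N; n++) {
            System.out.println("in[" + n + "] = " + in.get(n));
        }
    }
}
```

A table like the one above may show an intermediate pass of the iteration, so the fixed point computed here can contain more variables in some cells than the table does.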
Data flow analysis with arrays and pointers
Arrays and pointers introduce uncertainty: do different expressions access the same storage?
  a[i] is the same as a[k] when i = k
  a[i] is the same as b[i] when a = b (aliasing)
The uncertainty is accommodated according to the kind of analysis
  Any-path: gen sets should include all potential aliases, and kill sets should include only what is definitely modified
  All-path: vice versa
Scope of Data Flow Analysis
Intraprocedural
  Within a single method or procedure, as described so far
Interprocedural
  Across several methods (and classes) or procedures
Cost/precision trade-offs for interprocedural analysis are critical, and difficult
  Context sensitivity
  Flow sensitivity
Context Sensitivity
Figure (call graph fragment): foo() and bar() each contain a call to sub()
    foo() { sub() }
    bar() { sub() }
    sub() { }
  Each call site has a (call) edge into sub() and a (return) edge back to its caller
A context-sensitive (interprocedural) analysis distinguishes sub() called from foo() from sub() called from bar()
A context-insensitive (interprocedural) analysis does not separate them, as if foo() could call sub() and sub() could then return to bar()
Data flow testing
Data flow concept
Figure (flow graph; node 2 branches to 3 or 4, both rejoin at 5):
  1: x = ....
  2: if ....
  3: ....
  4: x = ....
  5: ...
  6: y = x + ...
Value of x at 6 could be computed at 1 or at 4
A bad computation at 1 or 4 can be revealed only if its value is used at 6
(1,6) and (4,6) are def-use (DU) pairs
  defs at 1, 4
  use at 6
Terms
DU pair: a pair of definition and use for some variable, such that at least one DU path exists from the definition to the use
  x = ... is a definition of x
  = ... x ... is a use of x
DU path: a definition-clear path on the CFG from a definition to a use of the same variable
  Definition-clear: the value is not replaced on the path
  Note: loops could create infinitely many DU paths between a def and a use
Definition-clear path
Figure (flow graph): 1: x = ....; 2: if ....; 3: ....; 4: x = ....; 5: ...; 6: y = x + ...
1,2,3,5,6 is a definition-clear path from 1 to 6
  x is not re-assigned between 1 and 6
1,2,4,5,6 is not a definition-clear path from 1 to 6
  the value of x is killed (reassigned) at node 4
(1,6) is a DU pair because 1,2,3,5,6 is a definition-clear path
Definitions and uses
A program written in a procedural language, such as C or Java, contains variables. Variables are defined by assigning values to them and are used in expressions. Statement x=y+z defines variable x and uses variables y and z. Declaration int x, y, A[10]; defines three variables. Statement scanf("%d %d", &x, &y) defines variables x and y. Statement printf("Output: %d \n", x+y) uses variables x and y.
c-use: Uses of a variable that occur within an expression as part of an assignment statement, in an output statement, as a parameter within a function call, or in a subscript expression are classified as c-uses, where the c stands for computational.
p-use: The occurrence of a variable in an expression used as a condition in a branch statement, such as an if or a while, is considered a p-use. The p in p-use stands for predicate.
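The classification can be illustrated by annotating a small method. This is a sketch only: the class UseKinds and the method clampedDouble are hypothetical and do not come from the course material, and treating the use in a return statement as a c-use is an assumption, since returns are not listed in the definition above.

```java
class UseKinds {
    // Parameters x and limit are defined on entry (values received by a parameter).
    static int clampedDouble(int x, int limit) {
        int y = x * 2;        // def y; c-use x (expression in an assignment)
        if (y > limit) {      // p-use y; p-use limit (condition of a branch)
            y = limit;        // def y; c-use limit
        }
        return y;             // use of y in a return, treated here as a c-use
    }

    public static void main(String[] args) {
        System.out.println(clampedDouble(3, 10)); // branch not taken
        System.out.println(clampedDouble(8, 10)); // branch taken
    }
}
```

The two calls in main exercise both outcomes of the predicate, so each definition of y reaches the return on one of them.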
Data flow graph: Example
For a given test case, identify the possible def-use pairs covered
Figure label: Unreachable node
Example: Test enhancement using data flow
Here is an MC/DC adequate test set that does not reveal the error.
Example (contd.)
Neither of the two tests forces the use, at line 9, of z defined on line 6. To do so, one requires a test that causes the conditions at lines 5 and 8 to be true.
An MC/DC adequate test does not force the execution of this path and hence the divide by zero error is not revealed.
Example (contd.)
Verify that the following test set covers all def-use pairs of z and reveals the error.
Adequacy criteria
All DU pairs: each DU pair is exercised by at least one test case
All DU paths: each simple (non-looping) DU path is exercised by at least one test case
All definitions: for each definition, there is at least one test case that exercises a DU pair containing it
  (Every computed value is used somewhere)
Corresponding coverage fractions can also be defined
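As an illustration of a coverage fraction, the sketch below computes all-DU-pairs coverage for the small flow graph used earlier, where x is defined at nodes 1 and 4 and used at node 6, giving the two DU pairs (1,6) and (4,6). Each test case is represented by the node sequence it executes. The class and method names are hypothetical, and representing a test by its path is a simplification made for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class DuCoverage {
    // Definitions and use of x in the example flow graph.
    static final Set<Integer> DEF_NODES = new HashSet<>(Arrays.asList(1, 4));
    static final int USE_NODE = 6;
    static final int TOTAL_PAIRS = 2; // (1,6) and (4,6)

    /** Returns the DU pairs of x exercised by the given executed paths. */
    static Set<String> coveredPairs(int[][] paths) {
        Set<String> covered = new TreeSet<>();
        for (int[] path : paths) {
            int lastDef = -1; // most recent definition of x seen on this path
            for (int node : path) {
                if (node == USE_NODE && lastDef != -1) {
                    covered.add("(" + lastDef + "," + node + ")");
                }
                if (DEF_NODES.contains(node)) {
                    lastDef = node; // a new definition kills the previous one
                }
            }
        }
        return covered;
    }

    /** Fraction of DU pairs exercised by at least one path. */
    static double allDuPairsCoverage(int[][] paths) {
        return (double) coveredPairs(paths).size() / TOTAL_PAIRS;
    }

    public static void main(String[] args) {
        int[][] oneTest = {{1, 2, 3, 5, 6}};
        int[][] twoTests = {{1, 2, 3, 5, 6}, {1, 2, 4, 5, 6}};
        System.out.println(allDuPairsCoverage(oneTest));
        System.out.println(allDuPairsCoverage(twoTests));
    }
}
```

A single test through node 3 exercises only (1,6), so it reaches half of the DU pairs; adding a test through node 4 exercises (4,6) as well.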
Difficult cases
x[i] = ... ; ... ; y = x[j]
DU pair (only) if i==j
p = &x ; ... ; *p = 99 ; ... ; q = x
*p is an alias of x
m.putFoo(...); ... ; y=n.getFoo(...);
Are m and n the same object? Do m and n share a foo field?
Problem of aliases: Which references are (always or sometimes) the same?
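The first case and the aliasing question can be made concrete in Java, where array references play the role of pointers. This is a small illustrative sketch with hypothetical names (AliasDemo, readAfterWrite, readThroughAlias); it shows only that whether the definition reaches the use depends on run-time values that a static analysis has to approximate.

```java
class AliasDemo {
    /** The write to x[i] reaches the read of x[j] only when i == j. */
    static int readAfterWrite(int[] x, int i, int j) {
        x[i] = 99;      // definition of element x[i]
        return x[j];    // use of element x[j]
    }

    /** The write through a reaches the read through b only when a and b are the same array. */
    static int readThroughAlias(int[] a, int[] b) {
        a[0] = 99;      // definition through reference a
        return b[0];    // use through reference b
    }

    public static void main(String[] args) {
        System.out.println(readAfterWrite(new int[] {1, 2, 3}, 1, 1)); // same element
        System.out.println(readAfterWrite(new int[] {1, 2, 3}, 0, 1)); // different elements
        int[] shared = {1, 2, 3};
        System.out.println(readThroughAlias(shared, shared));              // aliases
        System.out.println(readThroughAlias(shared, new int[] {7, 8, 9})); // distinct arrays
    }
}
```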
Data flow coverage with complex structures
Arrays and pointers are critical for data flow analysis
  Under-estimation of aliases may fail to include some DU pairs
  Over-estimation, on the other hand, may introduce infeasible test obligations
For testing, it may be preferable to accept under-estimation of the alias set rather than over-estimation or expensive analysis
  Controversial: in other applications (e.g., compilers), a conservative over-estimation of aliases is usually required
  Alias analysis may rely on external guidance or other global analysis to calculate good estimates
  Undisciplined use of dynamic storage, pointer arithmetic, etc. may make the whole analysis infeasible
Infeasibility
Consider the c-use at node 4 of z defined at node 5. For this c-use to be covered, control must arrive at node 5 (x>0) and then move to node 4 through nodes 6, 2 and 3. This is not possible because edge (2,3) can be taken only if x<=0.
Infeasibility
Figure (flow graph, nodes 1 to 7): if (cond) at node 1; x = .... on one branch; if (cond) at node 5; y = x + ... at node 6
Suppose cond has not changed between 1 and 5
  Or the conditions could be different, but the first implies the second
Then (3,5) is not a (feasible) DU pair
But it is difficult or impossible to determine which pairs are infeasible
Infeasible test obligations are a problem
  No test case can cover them
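A hedged variant of this situation can be written down and checked by brute force. In the sketch below (hypothetical class InfeasiblePair, not the program from the figure), the second definition of x is guarded by !cond and the use is guarded by cond. A static analysis that ignores the correlation between the two conditions reports the pair (d1, use), yet no input can exercise it because cond does not change in between. The extra variable lastDef is instrumentation added for the example: it records which definition actually reaches the use.

```java
class InfeasiblePair {
    /**
     * Returns the label of the definition of x that reaches the use,
     * or "none" when the use is not executed.
     */
    static String reachingDef(boolean cond, int input) {
        int x = 0;                 // d0: definition of x
        String lastDef = "d0";
        if (!cond) {
            x = input * 2;         // d1: definition of x, only when cond is false
            lastDef = "d1";
        }
        String observed = "none";
        int y = 0;
        if (cond) {
            y = x + 1;             // use of x, only when cond is true
            observed = lastDef;
        }
        return observed;
    }

    public static void main(String[] args) {
        // cond is the only value that influences control flow, so two runs cover all paths.
        System.out.println(reachingDef(true, 5));
        System.out.println(reachingDef(false, 5));
    }
}
```

Neither run reports d1, so the test obligation for (d1, use) cannot be met by any test case. Here the correlation is obvious; in general, deciding it automatically is the hard part.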
Infeasibility
The path-oriented nature of data flow analysis makes the infeasibility problem especially relevant
  Combinations of elements matter!
  Impossible to (infallibly) distinguish feasible from infeasible paths. More paths = more work to check manually.
In practice, reasonable coverage is (often, not always) achievable
  Number of paths is exponential in the worst case, but often linear
  All DU paths is more often impractical
Summary
Data flow testing attempts to distinguish important paths: Interactions between statements
Intermediate between simple statement and branch coverage and more expensive path-based structural testing
Cover Def-Use (DU) pairs: From computation of value to its use
  Intuition: a badly computed value is revealed only when it is used
  Levels: all DU pairs, all DU paths, all defs (some use)
Limits: Aliases, infeasible paths
Worst case is bad (undecidable properties, exponential blowup of paths), so pragmatic compromises are required