Unit 3
UNIT III
Paths, Path products and Regular expressions: path products & path expression, reduction
procedure, applications, regular expressions & flow anomaly detection.
Logic Based Testing: overview, decision tables, path expressions, kv charts, specifications
Path Products
Normally, flow graphs are used to denote only control flow connectivity.
The simplest weight we can give to a link is a name.
Using link names as weights, we then convert the graphical flow graph into an
equivalent algebraic-like expression, which denotes the set of all possible paths from
entry to exit for the flow graph.
Every link of a graph can be given a name.
Link names are denoted by lowercase italic letters. In tracing a path or path
segment through a flow graph, you traverse a succession of link names.
The name of the path or path segment that corresponds to those links is expressed
naturally by concatenating those link names.
For example, if you traverse links a, b, c, and d along some path, the name for that path
segment is abcd. This path name is also called a path product. Figure 5.1 shows some
examples:
Path Expression
Consider a pair of nodes in a graph and the set of paths between those nodes.
Denote that set of paths by an uppercase letter such as X or Y. From Figure 5.1c, the
members of the path set can be listed as follows:
ac, abc, abbc, abbbc, abbbbc, ...
Alternatively, the same set of paths can be denoted by:
ac + abc + abbc + abbbc + abbbbc + ...
The + sign is understood to mean "or": between the two nodes of interest, paths ac, or
abc, or abbc, and so on can be taken.
Any expression that consists of path names and "OR"s and which denotes a set of
paths between two nodes is called a "Path Expression".
Path Products
The name of a path that consists of two successive path segments is conveniently
expressed by the concatenation or Path Product of the segment names.
For example, if X and Y are defined as X = abcde and Y = fghij, then the path corresponding
to X followed by Y is denoted by
XY=abcdefghij
Similarly,
YX=fghijabcde
aX=aabcde
Xa=abcdea
XaX=abcdeaabcde
If X and Y represent sets of paths or path expressions, their product represents the set
of paths that can be obtained by following every element of X by any element of Y in
all possible ways. For example,
X = abc + def + ghi
Y = uvw + z
Then,
XY = abcuvw + defuvw + ghiuvw + abcz + defz + ghiz
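The product of two path sets can be sketched in a few lines of Python. This is only an illustration: the function name `path_product` and the choice of Python sets of strings to represent path sets are assumptions made here, not notation from the text.

```python
from itertools import product

def path_product(xs, ys):
    """Follow every element of xs by every element of ys, in all possible ways."""
    return {x + y for x, y in product(xs, ys)}

# The path sets from the example above.
X = {"abc", "def", "ghi"}
Y = {"uvw", "z"}

XY = path_product(X, Y)
YX = path_product(Y, X)

print(sorted(XY))
print(XY == YX)  # the two products differ, so the order of the factors matters
```

Representing a path set as a Python set also means that a path name listed twice is stored only once.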
If a link or segment name is repeated, that fact is denoted by an exponent. The
exponent's value denotes the number of repetitions:
a^1 = a; a^2 = aa; a^3 = aaa; a^n = aaa...a (n times).
Similarly, if X = abcde then
X^1 = abcde
X^2 = abcdeabcde = (abcde)^2
X^3 = abcdeabcdeabcde = (abcde)^2 abcde
= abcde(abcde)^2 = (abcde)^3
The path product is not commutative (that is, XY ≠ YX in general).
The path product is Associative.
RULE 1: A(BC)=(AB)C=ABC
where A,B,C are path names, set of path names or path expressions.
The zeroth power of a link name, path product, or path expression is also needed for
completeness. It is denoted by the numeral "1" and denotes the "path" whose length is
zero - that is, the path that doesn't have any links.
a^0 = 1
X^0 = 1
Path Sums
The "+" sign was used to denote the fact that path names were part of the same set of
paths.
The "PATH SUM" denotes paths in parallel between nodes.
Links a and b in Figure 5.1a are parallel paths and are denoted by a + b. Similarly,
links c and d are parallel paths between the next two nodes and are denoted by c + d.
The set of all paths between nodes 1 and 2 can be thought of as a set of parallel paths
and denoted by eacf+eadf+ebcf+ebdf.
If X and Y are sets of paths that lie between the same pair of nodes, then X+Y denotes
the UNION of those set of paths. For example, in Figure 5.2:
Distributive Laws
The product and sum operations are distributive, and the ordinary rules of
multiplication apply; that is
RULE 4: A(B+C)=AB+AC and (B+C)D=BD+CD
Applying these rules to Figure 5.1a yields
e(a+b)(c+d)f=e(ac+ad+bc+bd)f = eacf+eadf+ebcf+ebdf
Absorption Rule
If X and Y denote the same set of paths, then the union of these sets is unchanged;
consequently,
RULE 5: X+X=X (Absorption Rule)
If a set consists of path names and a member of that set is added to it, the "new"
name, which is already in that set of names, contributes nothing and can be ignored.
For example,
if X=a+aa+abc+abcd+def then
X+a = X+aa = X+abc = X+abcd = X+def = X
It follows that any arbitrary sum of identical path expressions reduces to the same
path expression.
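The distributive and absorption rules above can be checked mechanically if path sets are modeled as Python sets of strings. The helper names `path_product` and `path_sum` are assumptions introduced for this sketch; they are not part of the original notation.

```python
from itertools import product

def path_product(*path_sets):
    """Concatenate one member from each set, in order, in all possible ways."""
    return {"".join(parts) for parts in product(*path_sets)}

def path_sum(*path_sets):
    """A path sum is the union of the given path sets."""
    return set().union(*path_sets)

# One single-link path set per link name.
a, b, c, d, e, f = ({name} for name in "abcdef")

# RULE 4 applied to e(a+b)(c+d)f
expanded = path_product(e, path_sum(a, b), path_sum(c, d), f)

# RULE 5: adding a member that is already present changes nothing.
X = {"a", "aa", "abc", "abcd", "def"}
absorbed = path_sum(X, {"abc"})

print(sorted(expanded))
print(absorbed == X)
```

Because a set stores each member once, absorption comes for free in this representation.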
Loops
Loops can be understood as an infinite set of parallel paths. Say that the loop consists of a
single link b. Then the set of all paths through that loop point is
b^0 + b^1 + b^2 + b^3 + b^4 + b^5 + ...
This potentially infinite sum is denoted by b*.
The path expression for the above figure is denoted by the notation:
ab*c = ac + abc + abbc + abbbc + ...
Evidently,
aa* = a*a = a^+ and XX* = X*X = X^+
It is often more convenient to denote the fact that a loop cannot be taken more than a
certain number of times, say n.
A bar under the exponent denotes this fact, written here as X^{\underline{n}}:
X^{\underline{n}} = X^0 + X^1 + X^2 + X^3 + X^4 + X^5 + ... + X^n
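A bounded loop can be expanded explicitly, which makes the meaning of the underlined exponent concrete. The sketch below assumes path sets are Python sets of strings, with the empty string standing for the zero-length path "1"; the function names `power` and `bounded_power` are illustrative only.

```python
def power(xs, n):
    """X^n: the n-fold path product of xs with itself; X^0 is the zero-length path."""
    result = {""}
    for _ in range(n):
        result = {prefix + x for prefix in result for x in xs}
    return result

def bounded_power(xs, n):
    """X with an underlined exponent n: X^0 + X^1 + ... + X^n."""
    paths = set()
    for k in range(n + 1):
        paths |= power(xs, k)
    return paths

# ab*c with the loop limited to at most three iterations.
loop = bounded_power({"b"}, 3)
paths = sorted(("a" + middle + "c" for middle in loop), key=len)
print(paths)
```

With the bound set to 3 this lists the first four members of the infinite sum ac + abc + abbc + abbbc + ...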
Rules 6 - 16
The following rules can be derived from the previous rules:
RULE 6: X^{\underline{n}} + X^{\underline{m}} = X^{\underline{n}} if n > m
        X^{\underline{n}} + X^{\underline{m}} = X^{\underline{m}} if m > n
RULE 7: X^{\underline{n}} X^{\underline{m}} = X^{\underline{n+m}}
RULE 8: X^{\underline{n}} X* = X* X^{\underline{n}} = X*
RULE 9: X^{\underline{n}} X^+ = X^+ X^{\underline{n}} = X^+
RULE 10: X* X^+ = X^+ X* = X^+
RULE 11: 1 + 1 = 1
RULE 12: 1X = X1 = X
Following or preceding a set of paths by a path of zero length does not change the set.
RULE 13: 1^n = 1^{\underline{n}} = 1* = 1^+ = 1
No matter how often you traverse a path of zero length, it is a path of zero length.
RULE 14: 1^+ + 1 = 1* = 1
The null set of paths is denoted by the numeral 0. It obeys the following rules:
RULE 15: X + 0 = 0 + X = X
RULE 16: 0X = X0 = 0
If you block the paths of a graph fore or aft by a graph that has no paths, there won't be
any paths.
Reduction Procedure
In the second way, we split the node into two equivalent nodes, call them A and A'
and put in a link between them whose path expression is Z*. Then we remove node A'
using steps 4 and 5 to yield outgoing links whose path expressions are Z*X and Z*Y.
Removing the loop and then node 6 result in the following expression:
a(bgjf)*b(c+gkh)d((ilhd)*imf(bjgf)*b(c+gkh)d)*(ilhd)*e
You can practice by applying the algorithm to the following flow graphs and generating their
respective path expressions:
Applications
The purpose of the node removal algorithm is to present one very generalized
concept, the path expression, and a way of getting it.
Every application follows this common pattern:
1. Convert the program or graph into a path expression.
2. Identify a property of interest and derive an appropriate set of "arithmetic" rules
that characterizes the property.
3. Replace the link names by the link weights for the property of interest. The path
expression has now been converted to an expression in some algebra, such as
ordinary algebra, regular expressions, or boolean algebra. This algebraic
expression summarizes the property of interest over the set of all paths.
4. Simplify or evaluate the resulting "algebraic" expression to answer the question
you asked.
This arithmetic is an ordinary algebra. The weight is the number of paths in each set.
Example
The following is a reasonably well-structured program.
Each link represents a single link and consequently is given a weight of "1" to start. Let's say
the outer loop will be taken exactly four times and the inner loop can be taken zero to three times.
Its path expression, with a little work, is:
a(b+c)d{e(fi)*fgj(m+l)k}*e(fi)*fgh
A: The flow graph should be annotated by replacing each link name with the maximum
number of paths through that link (1) and also noting the number of times for looping.
B: Combine the first pair of parallel links outside the loop and also the pair in the
outer loop.
C: Multiply the things out and remove nodes to clear the clutter.
Alternatively, you could have substituted a "1" for each link in the path expression and then
simplified, as follows:
a(b+c)d{e(fi)*fgj(m+l)k}*e(fi)*fgh
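= 1(1 + 1)1 (1 (1 × 1)^{\underline{3}} 1 × 1 × 1 (1 + 1) 1)^4 1 (1 × 1)^{\underline{3}} 1 × 1 × 1
= 2 (1^{\underline{3}} × 2)^4 1^{\underline{3}}
= 2 (4 × 2)^4 × 4      (since 1^{\underline{3}} = 1^0 + 1^1 + 1^2 + 1^3 = 4)
= 2 × 8^4 × 4 = 32,768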
This is the same result we got graphically. Actually, the outer loop should be taken exactly
four times. That doesn't mean it will be taken zero or four times. Consequently, there is a
superfluous "4" on the outlink in the last step. Therefore the maximum number of different
paths is 8192 rather than 32,768.
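The substitution of "1" for each link can be replayed in code as a cross-check on the arithmetic. This is a sketch under the assumptions stated in the example (inner loop zero to three times, outer loop exactly four times); the variable names and the helper `bounded_loop` are invented here for illustration.

```python
def bounded_loop(weight, max_times):
    """Weight of a loop taken 0..max_times times: w^0 + w^1 + ... + w^max_times."""
    return sum(weight ** k for k in range(max_times + 1))

LINK = 1  # every link starts with a weight of 1

inner = bounded_loop(LINK * LINK, 3)                                   # (fi) taken 0..3 times
head = LINK * (LINK + LINK) * LINK                                     # a(b+c)d
loop_body = LINK * inner * LINK * LINK * LINK * (LINK + LINK) * LINK   # e(fi)*fgj(m+l)k
tail = LINK * inner * LINK * LINK * LINK                               # e(fi)*fgh

total = head * loop_body ** 4 * tail  # outer loop taken exactly four times
print(inner, head, loop_body, tail, total)
```

Dividing the total by the factor of 4 that the text calls superfluous gives the corrected figure of 8192.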
Structured Flowgraph
Structured code can be defined in several different ways that do not involve ad-hoc
rules such as not using GOTOs.
A structured flowgraph is one that can be reduced to a single link by successive
application of the transformations of Figure 5.7.
The node-by-node reduction procedure can also be used as a test for structured code. Flow
graphs that do not contain one or more of the graphs shown below (Figure 5.8) as
subgraphs are structured.
1. Jumping into loops
2. Jumping out of loops
3. Branching into decisions
4. Branching out of decisions
The values of the weights are the number of members in a set of paths.
Example
Applying the arithmetic to the earlier example gives us the identical steps until step 3 (C) as
below:
If you observe the original graph, it takes at least two paths to cover it, and it can be done in
two paths.
If you have fewer paths in your test plan than this minimum, you probably haven't covered. It's
another check.
In this table, in case of a loop, PA is the probability of the link leaving the loop and PL
is the probability of looping.
The rules are those of ordinary probability theory.
If you can do something either from column A with a probability of PA or from
column B with a probability PB, then the probability that you do either is PA + PB.
For the series case, if you must do both things, and their probabilities are independent
(as assumed), then the probability that you do both is the product of their
probabilities.
For example, a loop node has a looping probability of PL and a probability of not
looping of PA, which is obviously equal to 1 - PL.
Following the above rule, all we've done is replace the outgoing probability with 1, so why
the complicated rule? After a few steps in which you've removed nodes, combined parallel
terms, removed loops and the like, you might find something like this:
Which is what we've postulated for any decision. In other words, division by 1 - PL
renormalizes the outlink probabilities so that their sum equals unity after the loop is removed.
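The renormalization step can be illustrated numerically. The node below is hypothetical (the probabilities 0.4, 0.45 and 0.15 are made up for this sketch and do not come from any figure in the text), and the function name `remove_self_loop` is likewise an assumption.

```python
def remove_self_loop(loop_probability, outlink_probabilities):
    """Divide each outlink probability by 1 - PL so they sum to 1 once the loop is removed."""
    if not 0 <= loop_probability < 1:
        raise ValueError("looping probability must be in [0, 1)")
    scale = 1.0 - loop_probability
    return [p / scale for p in outlink_probabilities]

# Hypothetical node: loops with probability 0.4 and exits two ways (0.45 and 0.15).
renormalized = remove_self_loop(0.4, [0.45, 0.15])
print(renormalized, sum(renormalized))
```

Before removal the three probabilities at the node sum to 1; after removal the two remaining outlinks do.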
Here is a complicated bit of logic. We want to know the probability associated with cases A,
B, and C.
Let us do this in three parts, starting with case A. Note that the sum of the probabilities at each
decision node is equal to 1. Start by throwing away anything that isn't on the way to case A,
and then apply the reduction procedure. To avoid clutter, we usually leave out probabilities
equal to 1.
Case A:
Case B is simpler
This checks. It's a good idea when doing this sort of thing to calculate all the
probabilities and to verify that the sum of the routine's exit probabilities does equal 1.
If it doesn't, then you've made a calculation error or, more likely, you've left out some
branch.
How about path probabilities? That's easy. Just trace the path of interest and
multiply the probabilities as you go.
Alternatively, write down the path name and do the indicated arithmetic operation
Say that a path consisted of links a, b, c, d, e, and the associated probabilities were 0.2,
0.5, 1.0, 0.01, and 1 respectively. Path abcbcbcdeabddea would have a probability of 5 x
10^-10.
Long paths are usually improbable.
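The path probability quoted above can be reproduced by multiplying the link probabilities along the path name. The dictionary and function names in this sketch are assumptions chosen for illustration.

```python
from math import prod

# Link probabilities from the example above.
link_probability = {"a": 0.2, "b": 0.5, "c": 1.0, "d": 0.01, "e": 1.0}

def path_probability(path, probabilities):
    """Multiply the probabilities of the links in the order they are traversed."""
    return prod(probabilities[link] for link in path)

p = path_probability("abcbcbcdeabddea", link_probability)
print(f"{p:.1e}")
```

The path uses link a three times, b four times and d three times, so the product is 0.2^3 x 0.5^4 x 0.01^3.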
The model has two weights associated with every link: the processing time for that link,
denoted by T, and the probability of that link, P.
Example:
1. Start with the original flow graph annotated with probabilities and processing time.
2. Combine the parallel links of the outer loop. The result is just the mean of the
processing times for the links because there aren't any other links leaving the first
node. Also combine the pair of links at the beginning of the flow graph.
4. Use the cross-term step to eliminate a node and to create the inner self-loop.
5. Finally, you can get the mean processing time, by using the arithmetic rules as
follows:
Push/Pop, Get/Return
This model can be used to answer several different questions that can turn up in debugging. It
can also help decide which test cases to design.
The numeral 1 is used to indicate that nothing of interest (neither PUSH nor POP)
occurs on a given link.
"H" denotes PUSH and "P" denotes POP. The operations are commutative, associative,
and distributive.
G(G + R)G(GR)*GGR*R
= G(G + R)G^3 R*R
= (G + R)G^3 R*
= (G^4 + G^2)R*
This expression specifies the conditions under which the resources will be balanced
on leaving the routine.
If the upper branch is taken at the first decision, the second loop must be taken four
times.
If the lower branch is taken at the first decision, the second loop must be taken twice.
For any other values, the routine will not balance. Therefore, the first loop does not
have to be instrumented to verify this behavior because its impact should be nil.
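The balance conditions stated above can be confirmed by brute force over the path expression G(G + R)G(GR)*GGR*R. The sketch assumes G is a GET and R a RETURN, counts both along each path, and tries a small range of iteration counts for the two loops; the function name `net_gets` and the chosen ranges are illustrative assumptions.

```python
def net_gets(first_branch, first_loop_count, second_loop_count):
    """GETs minus RETURNs along one path of G(G+R)G(GR)*GGR*R."""
    operations = (
        "G" + first_branch + "G"
        + "GR" * first_loop_count   # first loop: (GR)*
        + "GG"
        + "R" * second_loop_count   # second loop: R*
        + "R"
    )
    return operations.count("G") - operations.count("R")

# Collect every (branch, second-loop count) for which the routine exits balanced.
balanced = {
    (branch, second)
    for branch in "GR"
    for first in range(6)
    for second in range(8)
    if net_gets(branch, first, second) == 0
}
print(sorted(balanced))
```

The first loop count never affects the result, which matches the remark that the first loop need not be instrumented.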
The Problem
The generic flow-anomaly detection problem (note: not just data-flow anomalies, but
any flow anomaly) is that of looking for a specific sequence of operations considering all
possible paths through a routine.
Let the operations be SET and RESET, denoted by s and r respectively, and we want
to know if there is a SET followed immediately by a SET or a RESET followed
immediately by a RESET (an ss or an rr sequence).
The Method
Annotate each link in the graph with the appropriate operator or the null operator 1.
Simplify things to the extent possible, using the fact that a + a = a and 1^2 = 1.
You now have a regular expression that denotes all the possible sequences of operators
in that graph. You can now examine that regular expression for the sequences of
interest.
HUANG's Theorem: Let A, B, C be nonempty sets of character sequences whose smallest
string is at least one character long. Let T be a two-character string of characters.
Then if T is a substring of (i.e., if T appears within) AB^nC, then T will appear in
AB^2C.
As an example, let
A = pp
B = srr
C = rp
T = ss
The theorem states that ss will appear in pp(srr)^n rp if it appears in pp(srr)^2 rp.
However, let
A = p + pp + ps
B = psr + ps(r + ps)
C = rp
T = p^4
Is it obvious that there is a p^4 sequence in AB^nC? The theorem states that we have
only to look at
(p + pp + ps)[psr + ps(r + ps)]^2 rp
Multiplying out the expression and simplifying shows that there is no p^4 sequence.
Incidentally, the above observation is an informal proof of the wisdom of looping
twice discussed in Unit 2. Because data-flow anomalies are represented by
two-character sequences, it follows from the above theorem that looping twice is what you need
to do to find such anomalies.
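The two worked examples of the theorem can be multiplied out mechanically instead of by hand. The sketch below models each set of character sequences as a Python set of strings; the helper name `concat` is an assumption, and the search is an ordinary substring test rather than any special-purpose algorithm.

```python
from itertools import product

def concat(*string_sets):
    """All strings formed by taking one member from each set, in order."""
    return {"".join(parts) for parts in product(*string_sets)}

# First example: pp(srr)^2 rp, searching for ss.
first = concat({"pp"}, {"srr"}, {"srr"}, {"rp"})
has_ss = any("ss" in s for s in first)

# Second example: (p + pp + ps)[psr + ps(r + ps)]^2 rp, searching for p^4.
A = {"p", "pp", "ps"}
B = {"psr"} | concat({"ps"}, {"r", "ps"})
C = {"rp"}
second = concat(A, B, B, C)
has_p4 = any("pppp" in s for s in second)

print(has_ss, has_p4)
```

Both searches come back negative, in line with the statement that there is no p^4 sequence.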
Limitations
Huang's theorem can be easily generalized to cover sequences of greater length than
two characters. Beyond three characters, though, things get complex and this method
has probably reached its utilitarian limit for manual application.
There are some nice theorems for finding sequences that occur at the beginnings and
ends of strings but no nice algorithms for finding strings buried in an expression.
Static flow analysis methods can't determine whether a path is or is not achievable.
Unless the flow analysis includes symbolic execution or similar techniques, the
impact of unachievable paths will not be included in the analysis.
The flow-anomaly application, for example, doesn't tell us that there will be a flow anomaly -
it tells us that if the path is achievable, then there will be a flow anomaly. Such analytical
problems go away, of course, if you take the trouble to design routines for which all paths are
achievable.
Introduction
The functional requirements of many programs can be specified by decision tables,
which provide a useful basis for program and test design.
Consistency and completeness can be analyzed by using boolean algebra, which can
also be used as a basis for test design. Boolean algebra is trivialized by using
Karnaugh-Veitch charts.
"Logic" is one of the most often used words in programmers' vocabularies but one of
their least used techniques.
Boolean algebra is to logic as arithmetic is to mathematics. Without it, the tester or
programmer is cut off from many test and design techniques and tools that incorporate
those techniques.
Logic has been, for several decades, the primary tool of hardware logic designers.
Many test methods developed for hardware logic can be adapted to software logic
testing. Because hardware testing automation is 10 to 15 years ahead of software
testing automation, hardware testing methods and their associated theory are a fertile
ground for software testing methods.
As programming and test techniques have improved, the bugs have shifted closer to
the process front end, to requirements and their specifications. These bugs range from
8% to 30% of the total and because they're first-in and last-out, they're the costliest of
all.
Although programmed tools are nice to have, most of the benefits of boolean algebra can be
reaped by wholly manual means if you have the right conceptual tool: the Karnaugh-Veitch
diagram is that conceptual tool.
Decision Tables
Figure 6.1 is a limited-entry decision table. It consists of four areas called the
condition stub, the condition entry, the action stub, and the action entry.
Each column of the table is a rule that specifies the conditions under which the
actions named in the action stub will take place.
The condition stub is a list of names of conditions.
A rule specifies whether a condition should or should not be met for the rule to be
satisfied. "YES" means that the condition must be met, "NO" means that the condition
must not be met, and "I" means that the condition plays no part in the rule, or it is
immaterial to that rule.
The action stub names the actions the routine will take or initiate if the rule is satisfied.
If the action entry is "YES", the action will take place; if "NO", the action will not
take place.
In addition to the stated rules, we also need a Default Rule that specifies the default action to
be taken when all other rules fail. The default rules for the table in Figure 6.1 are shown in Figure
6.3.
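How a limited-entry decision table selects actions, including the default rule, can be sketched as follows. The table here is entirely hypothetical (it is not the table of Figure 6.1), entries are abbreviated to "Y", "N" and "I", and the names `RULES`, `rule_matches` and `select_actions` are assumptions made for illustration.

```python
# Hypothetical decision table: each rule pairs condition entries with its actions.
# "Y" = condition must be met, "N" = must not be met, "I" = immaterial.
RULES = [
    (("Y", "Y", "I"), ["action 1"]),
    (("N", "I", "Y"), ["action 2", "action 3"]),
]
DEFAULT_ACTIONS = ["default action"]  # taken when all other rules fail

def rule_matches(entries, truth_values):
    """A rule is satisfied when every non-immaterial entry agrees with the condition's value."""
    return all(
        entry == "I" or (entry == "Y") == value
        for entry, value in zip(entries, truth_values)
    )

def select_actions(truth_values):
    """Return the actions of the first satisfied rule, or the default actions."""
    for entries, actions in RULES:
        if rule_matches(entries, truth_values):
            return actions
    return DEFAULT_ACTIONS

print(select_actions((True, True, False)))
print(select_actions((True, False, True)))
```

Each tuple of truth values plays the role of one combination of condition outcomes; a combination covered by no rule falls through to the default.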
If the decision appears on a path, put in a YES or NO as appropriate. If the decision does not
appear on the path, put in an I. Rule 1 does not contain decision C; therefore its entries are:
YES, YES, I, YES.
The corresponding decision table is shown in Table 6.1
The programmer tried to force all three processes to be executed for the cases but
forgot that the B and C predicates would be done again, thereby bypassing processes A2 and
A3.
Table 6.3 shows the conversion of this flow graph into a decision table after expansion.
Path Expressions
General
Logic-based testing is structural testing when it's applied to structure (e.g., control
flow graph of an implementation); it's functional testing when it's applied to a
specification.
In logic-based testing we focus on the truth values of control flow predicates.
A predicate is implemented as a process whose outcome is a truth-functional value.
For our purpose, logic-based testing is restricted to binary predicates.
We start by generating path expressions by path tracing as in Unit V, but this time,
our purpose is to convert the path expressions into boolean algebra, using the
predicates' truth values (e.g., A and Ā) as weights.
Boolean algebra
Steps
Label each decision with an uppercase letter that represents the truth value of the
predicate. The YES or TRUE branch is labeled with a letter (say A) and the NO or
FALSE branch with the same letter overscored (say Ā).
The truth value of a path is the product of the individual labels. Concatenation or
products mean "AND". For example, the straight- through path of Figure 6.5, which
goes via nodes 3, 6, 7, 8, 10, 11, 12, and 2, has a truth value of ABC. The path via
nodes 3, 6, 7, 9 and 2 has a value of .
If two or more paths merge at a node, the fact is expressed by use of a plus sign (+)
which means "OR".
There are only two numbers in boolean algebra: zero (0) and one (1). One means
"always true" and zero means "always false".
In all of the above, a letter can represent a single sentence or an entire boolean
algebra expression.
Individual letters in a boolean algebra expression are called literals (e.g., A, B). The
product of several literals is called a product term (e.g., ABC, DE).
An arbitrary boolean expression that has been multiplied out so that it consists of
the sum of products (e.g., ABC + DEF + GH) is said to be in sum-of-products form.
The result of simplifications (using the rules above) is again in the sum-of-products
form, and each product term in such a simplified version is called a prime implicant.
For example, ABC + AB + DEF reduces by rule 20 to AB + DEF; that is, AB and DEF
are prime implicants.
The path expressions of Figure 6.5 can now be simplified by applying the rules.
Similarly,
The deviation from the specification is now clear. The functions should have been:
Loops complicate things because we may have to solve a boolean equation to determine
what predicate value combinations lead to where.
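The reduction of ABC + AB + DEF to AB + DEF mentioned earlier can be checked exhaustively with a truth table. This sketch treats product as "and" and sum as "or", as the text defines them; the function names are assumptions.

```python
from itertools import product

def original(A, B, C, D, E, F):
    """ABC + AB + DEF"""
    return (A and B and C) or (A and B) or (D and E and F)

def simplified(A, B, C, D, E, F):
    """AB + DEF"""
    return (A and B) or (D and E and F)

# Compare the two expressions on all 2^6 combinations of truth values.
equivalent = all(
    original(*values) == simplified(*values)
    for values in product([False, True], repeat=6)
)
print(equivalent)
```

An exhaustive comparison like this is only practical for a handful of variables, but it is a cheap way to catch slips made during manual simplification.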
KV CHARTS
Introduction
If you had to deal with expressions in four, five, or six variables, you could get
bogged down in the algebra and make as many errors in designing test cases as there
are bugs in the routine you're testing.
The Karnaugh-Veitch chart reduces boolean algebraic manipulations to graphical trivia.
Beyond six variables these diagrams get cumbersome and may not be effective.
Single Variable
Figure 6.6 shows all the boolean functions of a single variable and their equivalent
representation as a KV chart.
The charts show all possible truth values that the variable A can have.
A "1" means the variable’s value is "1" or TRUE. A "0" means that the variable's
value is 0 or FALSE.
The entry in the box (0 or 1) specifies whether the function that the chart represents
is true or false for that value of the variable.
We usually do not explicitly put in 0 entries but specify only the conditions under
which the function is true.
Two Variables
Figure 6.7 shows eight of the sixteen possible functions of two variables.
Each box corresponds to the combination of values of the variables for the row and
column of that box.
A pair may be adjacent either horizontally or vertically but not diagonally.
Any variable that changes in either the horizontal or vertical direction does not appear
in the expression.
In the fifth chart, the B variable changes from 0 to 1 going down the column, and
because the A variable's value for the column is 1, the chart is equivalent to a simple
A.
The first chart has two 1's in it, but because they are not adjacent, each must be taken
separately.
They are written using a plus sign.
It is clear now why there are sixteen functions of two variables.
Each box in the KV chart corresponds to a combination of the variables' values.
That combination might or might not be in the function (i.e., the box corresponding to
that combination might have a 1 or 0 entry).
Since n variables lead to 2^n combinations of 0 and 1 for the variables, and each such
combination (box) can be filled or not filled, there are 2^(2^n) ways of doing this.
Consequently, for one variable there are 2^(2^1) = 4 functions, 16 functions of 2 variables,
256 functions of 3 variables, 65,536 functions of 4 variables, and so on.
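The counting argument can be verified directly: each of the 2^n boxes is independently filled or not. The sketch below computes the count from the formula and, for two variables, also enumerates every possible chart; the function name is an assumption.

```python
from itertools import product

def boolean_function_count(n):
    """Number of distinct boolean functions of n variables: 2^(2^n)."""
    return 2 ** (2 ** n)

counts = {n: boolean_function_count(n) for n in range(1, 5)}
print(counts)

# Enumerate every way of filling the four boxes of a two-variable chart.
two_variable_charts = set(product((0, 1), repeat=2 ** 2))
print(len(two_variable_charts))
```

The enumeration and the formula agree for two variables, and the formula alone is used for the larger cases.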
Given two charts over the same variables, arranged the same way, their product is the
term by term product, their sum is the term by term sum, and the negation of a chart is
gotten by reversing all the 0 and 1 entries in the chart.
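The term-by-term chart operations can be sketched for two variables. Here a chart is modeled as a dictionary from (A, B) value pairs to a 0 or 1 entry; this representation and the function names are assumptions made for illustration.

```python
# The four boxes of a two-variable chart, keyed by the (A, B) values.
CELLS = [(a, b) for a in (0, 1) for b in (0, 1)]

def chart(function):
    """Build the chart of a boolean function of A and B."""
    return {cell: int(bool(function(*cell))) for cell in CELLS}

def chart_product(x, y):
    """Term-by-term product (AND) of two charts."""
    return {cell: x[cell] & y[cell] for cell in CELLS}

def chart_sum(x, y):
    """Term-by-term sum (OR) of two charts."""
    return {cell: x[cell] | y[cell] for cell in CELLS}

def chart_negation(x):
    """Reverse every 0 and 1 entry."""
    return {cell: 1 - x[cell] for cell in CELLS}

chart_a = chart(lambda a, b: a)
chart_b = chart(lambda a, b: b)
print(chart_product(chart_a, chart_b))
print(chart_sum(chart_a, chart_b))
print(chart_negation(chart_a))
```

Both charts must list the same variables in the same arrangement for the term-by-term operations to be meaningful, which the shared `CELLS` list enforces here.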
Three Variables
KV charts for three variables are shown below.
As before, each box represents an elementary term of three variables with a bar
appearing or not appearing according to whether the row-column heading for that box
is 0 or 1.
A three-variable chart can have groupings of 1, 2, 4, and 8 boxes.
A few examples will illustrate the principles:
You'll notice that there are several ways to circle the boxes into maximum-sized covering
groups.