Paths, Path Products and Regular Expressions: UNIT-3


UNIT-3

PATHS, PATH PRODUCTS AND


REGULAR EXPRESSIONS
Motivation
 Flow graphs are an abstract
representation of programs.
 Any question about a program can be cast
into an equivalent question about an
appropriate flowgraph.
 Most software development, testing and
debugging tools use flow graph analysis
techniques.
Path products
 Normally, flow graphs are used to denote only
control flow connectivity.
 The simplest weight we can give to a link is
a name.
 Using link names as weights, we can then
convert the graphical flow graph into an
equivalent algebraic expression that
denotes the set of all possible paths from
entry to exit for the flow graph.
Path products -continued.,
 Every link of a graph can be given a name.
 The link name will be denoted by lower case italic letters.
 In tracing a path or path segment through a flow graph,
you traverse a succession of link names.
 The name of the path or path segment that corresponds
to those links is expressed naturally by concatenating
those link names.
 For example, if you traverse links a,b,c and d along some
path, the name for that path segment is abcd.
 This path name is also called a path product.
Examples of Paths
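[Figure: node 1 -e-> node 3; parallel links a and b from node 3 to node 4; parallel links c and d from node 4 to node 5; node 5 -f-> node 2]
Paths: eacf, eadf, ebcf, ebdf

[Figure: node 1 -a-> node 2 -b-> node 3 -d-> node 4, with link c from node 3 back to node 2]
Paths: abd, abcbd, abcbcbd, abcbcbcbd, ...

[Figure: node 1 -a-> node 2 -c-> node 3, with self-loop b at node 2]
Paths: ac, abc, abbc, abbbc, abbbbc, ...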


Path Expression
 Consider a pair of nodes in a graph and the set of
paths between those nodes. Denote that set of
paths by an upper case letter such as X or Y. The
members of the path set can be listed as follows:
 ac, abc, abbc, abbbc, abbbbc, ...
 Alternatively, the same set of paths can be denoted
by: ac+abc+abbc+abbbc+abbbbc+...
 The + sign is understood to mean “or”: between
the two nodes of interest, paths ac, or abc, or
abbc, and so on can be taken.
 Any expression that consists of path names and
“OR”s and which denotes a set of paths between
two nodes is called a “PATH EXPRESSION”.
Path Products
 The name of a path that consists of two successive path
segments is conveniently expressed by the concatenation or
PATH PRODUCT of the segment names.
 For example, if X and Y are defined as X=abcde and Y=fghij, then
the path corresponding to X followed by Y is denoted by
XY=abcdefghij
Similarly, YX=fghijabcde
aX=aabcde
Xa=abcdea
XaX=abcdeaabcde
Path Products-continued.,
 If X and Y represent sets of paths or path
expressions, their product represents the set of
paths that can be obtained by following every
element of X by any element of Y in all possible
ways.
 For example, if X=abc+def+ghi and
Y=uvw+z
 then
XY=abcuvw+defuvw+ghiuvw+abcz+defz+ghiz
 If a link or segment name is repeated, that fact is
denoted by an exponent. The exponent value
denotes the number of repetitions:
a^1=a; a^2=aa; a^3=aaa; ...; a^n=aaa...a (n times)
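The product and exponent rules above can be tried out with a short sketch. It assumes, purely for illustration, that every link name is a single character, so a path name is a Python string and a set of paths is a Python set. The helper names `path_product` and `path_power` are hypothetical and do not come from the slides.

```python
from itertools import product


def path_product(xs, ys):
    """Follow every path in xs by every path in ys (the path product XY)."""
    return {x + y for x, y in product(xs, ys)}


def path_power(xs, n):
    """Repeat the path set xs n times, i.e. X^n."""
    # Assumption: the empty string stands for the zero-length path.
    result = {""}
    for _ in range(n):
        result = path_product(result, xs)
    return result


X = {"abc", "def", "ghi"}
Y = {"uvw", "z"}
XY = path_product(X, Y)
print(sorted(XY))
print(path_power({"a"}, 3))
print(XY == path_product(Y, X))  # False: the product is not commutative
```

Representing a path set as a Python set means duplicate path names collapse automatically, which matches the set interpretation used in these slides.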
Path Products-continued.,
 Similarly, if X=abcde then
X^1=abcde
X^2=abcdeabcde=(abcde)^2
X^3=abcdeabcdeabcde=(abcde)^3
 The path product is not commutative (that is XY!=YX).
 The path product is Associative.
 RULE 1: A(BC)=(AB)C=ABC
where A,B,C are path names, set of path names or path
expressions.
 The zeroth power of a link name, path product, or path
expression is also needed for completeness. It is denoted
by the numeral “1” and denotes the path whose length is
zero, that is, the path that does not have any links.
a^0=1
X^0=1
Path Sums
 The “+” sign was used to denote the fact
that path names were part of the same set
of paths.
 The “PATH SUM” denotes paths in parallel
between nodes.
Path Sums- continued.,
 For example, in the graph below, links a and b are
parallel and are denoted by a+b. Similarly, c and d
are parallel and are denoted by c+d.
 The set of all paths between nodes 1 and 2 can
be thought of as a set of parallel paths and
denoted by eacf+eadf+ebcf+ebdf
[Figure: node 1 -e-> node 3; parallel links a and b from node 3 to node 4; parallel links c and d from node 4 to node 5; node 5 -f-> node 2]
Path Sums- continued.,
 If X and Y are sets of paths that lie
between the same pair of nodes, then X+Y
denotes the UNION of those
sets of paths.
[Figure: link f, then a first set of parallel paths X, Y and d, then link g, then a second set of parallel paths U, V, W, h, i and j, then link k]
 The first set of parallel paths is denoted by X+Y+d and the
second set by U+V+W+h+i+j. The set of all paths in the flow
graph is f(X+Y+d)g(U+V+W+h+i+j)k
Path Sums- continued.,
 The path sum is a set union operation, so it is
clearly commutative and associative.
 RULE 2: X+Y=Y+X
 RULE 3: (X+Y)+Z=X+(Y+Z)=X+Y+Z
Distributive Laws
 The product and sum operations are distributive, and
the ordinary rules of multiplication apply; that is
 RULE 4: A(B+C)=AB+AC and (B+C)D=BD+CD
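 Applying these rules to the figure below yields
e(a+b)(c+d)f=e(ac+ad+bc+bd)f
=eacf+eadf+ebcf+ebdf
[Figure: node 1 -e-> node 3; parallel links a and b from node 3 to node 4; parallel links c and d from node 4 to node 5; node 5 -f-> node 2]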
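As a check on the distributive law, the factored and the expanded forms of this expression can be compared as sets. This is only a sketch: it assumes single-character link names, models a path sum as a set union and a path product as string concatenation, and the helper names `path_sum` and `path_product` are hypothetical.

```python
from functools import reduce


def path_sum(*terms):
    """Path sum: the union of several sets of paths."""
    return set().union(*terms)


def path_product(*factors):
    """Path product: concatenate one path from each factor, in order."""
    def mul(xs, ys):
        return {x + y for x in xs for y in ys}
    return reduce(mul, factors, {""})


# One single-link path set per link name (assumed one character each).
e, a, b, c, d, f = ({name} for name in "eabcdf")

factored = path_product(e, path_sum(a, b), path_sum(c, d), f)
expanded = path_sum(
    path_product(e, a, c, f),
    path_product(e, a, d, f),
    path_product(e, b, c, f),
    path_product(e, b, d, f),
)
print(sorted(factored))
print(factored == expanded)  # True
```

Because sums are unions, adding a set to itself changes nothing, so `path_sum(factored, factored) == factored` also holds.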
Absorption Rule
 If X and Y denote the same set of paths, then the
union of these sets is unchanged; consequently,
 RULE 5: X+X=X (Absorption Rule)
 If a set consists of paths names and a member of
that set is added to it, the “new” name, which is
already in that set of names, contributes nothing
and can be ignored.
 For example, if X=a+aa+abc+abcd+def then
X+a = X+aa = X+abc = X+abcd = X+def = X
 It follows that any arbitrary sum of identical path
expressions reduces to the same path expression.
Loop
 Loops can be understood as an infinite set
of parallel paths. Say that the loop consists
of a single link b. Then the set of all paths
through that loop point is
b^0+b^1+b^2+b^3+b^4+b^5+...
[Figure: parallel paths b^0, b^1, b^2, b^3, ..., b^n]
Loops – continued.,
 This potentially infinite sum is denoted by b* for an
individual link and by X* when X is a path expression.
[Figure: node 1 -a-> node 2 -c-> node 3, with self-loop b at node 2]
 The path expression for the above figure is denoted by
the notation: ab*c=ac+abc+abbc+abbbc+...
 Evidently aa*=a*a=a^+ and XX*=X*X=X^+
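 It is sometimes convenient to denote the fact that a loop cannot
be taken more than a certain number of times, say n. A
bar is used under the exponent to denote this fact, as
follows (the barred exponent is written here as \underline{n}):
X^{\underline{n}} = X^0+X^1+X^2+X^3+...+X^n
 The following rules can be derived from the
previous rules:
 RULE 6: X^{\underline{n}} + X^{\underline{m}} = X^{\underline{n}} if n>m
 = X^{\underline{m}} if m>n
 RULE 7: X^{\underline{n}} X^{\underline{m}} = X^{\underline{n+m}}
 RULE 8: X^{\underline{n}} X* = X* X^{\underline{n}} = X*
 RULE 9: X^{\underline{n}} X^+ = X^+ X^{\underline{n}} = X^+
 RULE 10: X* X^+ = X^+ X* = X^+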
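The rules for bounded loops can be spot-checked on small path sets. The sketch below assumes that the bounded loop means X^0 + X^1 + ... + X^n (the loop taken at most n times), uses Python strings for path names, and checks Rules 6 and 7 for one sample set only. It is an illustration, not a proof, and the helper names are hypothetical.

```python
def product(xs, ys):
    """Path product of two path sets."""
    return {x + y for x in xs for y in ys}


def power(xs, n):
    """X^n: exactly n repetitions; the empty string is the zero-length path."""
    result = {""}
    for _ in range(n):
        result = product(result, xs)
    return result


def at_most(xs, n):
    """X^0 + X^1 + ... + X^n: the loop taken at most n times."""
    result = set()
    for j in range(n + 1):
        result |= power(xs, j)
    return result


X = {"ab", "c"}
rule6 = (at_most(X, 3) | at_most(X, 2)) == at_most(X, 3)
rule7 = product(at_most(X, 2), at_most(X, 3)) == at_most(X, 5)
print(rule6, rule7)

# ab*c with the loop b limited to three iterations
bounded = product(product({"a"}, at_most({"b"}, 3)), {"c"})
print(sorted(bounded))
```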
Identity Elements
 RULE 11: 1+1=1
 RULE 12: 1X=X1=X. Following or preceding a set of paths
by a path of zero length does not change the set.
 RULE 13: 1^n=1^{\underline{n}}=1*=1^+=1 (where \underline{n} is the exponent with a bar under it). No matter how often you
traverse a path of zero length, it is a path of zero length.
 RULE 14: 1^+ +1=1*=1
 The null set of paths is denoted by the numeral 0. It
obeys the following rules:
 RULE 15: X+0=0+X=X
 RULE 16: 0X=X0=0. If you block the paths of a graph fore
or aft by a graph that has no paths, there won't be any
paths.
A Reduction Procedure
 This procedure is a node by node removal algorithm which is used to
convert a flowgraph whose links are labeled with names into a path
expression that denotes the set of all entry/exit paths in that flow graph.
 Steps to initialize the process:
1. Combine all serial links by multiplying their path expressions.
2. Combine all parallel links by adding their path expressions.
3. Remove all self-loops by replacing them with a link of the form X*, where X is
the path expression of the link in that loop.
 Steps in the algorithm’s loop:
4. Select any node for removal other than the initial or final node. Replace it with
a set of equivalent links whose path expressions correspond to all the ways
you can form a product of the set of inlinks with the set of outlinks of that
node.
5. Combine any remaining serial links by multiplying their path expressions.
6. Combine all parallel links by adding their path expressions.
7. Remove all self-loops as in step 3.
8. Does the graph consist of a single link between the entry node and the exit
node? If yes, then the expression for that link is a path expression for the
original flow graph; otherwise, return to step 4.
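The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: path expressions are plain strings, link names are single characters, the entry node has no inlinks and the exit node has no outlinks, and the caller supplies the order in which nodes are removed. The function names and the `(source, target, name)` link format are hypothetical. Serial combination (steps 1 and 5) is folded into the cross-term step, since removing a node with one inlink and one outlink is exactly a serial product.

```python
def has_top_level_sum(expr):
    """True if expr contains a '+' outside all parentheses."""
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "+" and depth == 0:
            return True
    return False


def factor(expr):
    """Parenthesize a sum before using it inside a product."""
    return f"({expr})" if has_top_level_sum(expr) else expr


def add(old, new):
    """Parallel links: add the path expressions."""
    return new if old is None else f"{old}+{new}"


def star(expr):
    """Self-loop: X becomes X*."""
    return f"{expr}*" if len(expr) == 1 else f"({expr})*"


def remove_self_loops(graph):
    """Steps 3 and 7: drop each self-loop and prefix the node's outlinks with X*."""
    for node, _ in [k for k in graph if k[0] == k[1]]:
        loop = star(graph.pop((node, node)))
        for key in [k for k in graph if k[0] == node]:
            graph[key] = loop + factor(graph[key])


def path_expression(links, entry, exit_, removal_order):
    graph = {}
    for src, dst, name in links:                       # step 2
        graph[(src, dst)] = add(graph.get((src, dst)), name)
    remove_self_loops(graph)                           # step 3
    for node in removal_order:                         # step 4
        in_keys = [k for k in graph if k[1] == node]
        ins = [(k[0], graph.pop(k)) for k in in_keys]
        out_keys = [k for k in graph if k[0] == node]
        outs = [(k[1], graph.pop(k)) for k in out_keys]
        for src, x in ins:
            for dst, y in outs:
                term = factor(x) + factor(y)           # cross term
                graph[(src, dst)] = add(graph.get((src, dst)), term)  # step 6
        remove_self_loops(graph)                       # step 7
    return graph[(entry, exit_)]                       # step 8


# Three-node loop graph: a from 1 to 2, self-loop b at 2, c from 2 to 3.
print(path_expression([(1, 2, "a"), (2, 2, "b"), (2, 3, "c")], 1, 3, [2]))
```

A different removal order generally yields a different but equivalent expression, so the output should be compared as a set of paths rather than as text.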
Cross – Term Step (Step 4)
 The cross term step is the fundamental
step of the reduction algorithm.
 It removes a node, thereby reducing the
number of nodes by one.
 Successive applications of this step
eventually get you down to one entry and
one exit node.
Cross – Term Step (Step 4)-
Example

[Figure: a node with inlinks A and B and outlinks C, D and E is removed and replaced by the links AC, AD, AE, BC, BD and BE]
A Reduction Procedure-
Example
 Applying this algorithm to the following
graph, we remove several nodes in order;
that is
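[Figure: flow graph with entry node 1 and exit node 2. Links: a (1 to 3), b (3 to 4), c (4 to 5), d (5 to 6), e (6 to 2), f (7 to 3), g (4 to 8), h (9 to 5), i (6 to 10), j (8 to 7), k (8 to 9), l (10 to 9), m (10 to 7)]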
 Remove node 10 by applying step 4 and
combine by step 5 to yield
[Figure: links i, l and m are replaced by il (6 to 9) and im (6 to 7)]
 Remove node 9 by applying steps 4 and 5 to
yield
[Figure: links k, il and h are replaced by kh (8 to 5) and ilh (6 to 5)]
 Remove node 7 by steps 4 and 5, as
follows:
[Figure: links j, im and f are replaced by jf (8 to 3) and imf (6 to 3)]
 Remove node 8 by steps 4 and 5, to obtain
[Figure: links g, jf and kh are replaced by gjf (4 to 3) and gkh (4 to 5)]
Parallel Term (step 6)
 Removal of node 8 above led to a pair of
parallel links between nodes 4 and 5.
Combine them to create a path expression
for an equivalent link whose path
expression is c+gkh; that is
[Figure: the links c and gkh between nodes 4 and 5 are replaced by a single link c+gkh; the links gjf, ilh and imf remain]
Loop Term (step 7)
 Removing node 4 leads to a loop term. The
graph has now been replaced with the
following equivalent simpler graph:
[Figure: self-loop bgjf at node 3 and link b(c+gkh) from node 3 to node 5; the links a, d, e, ilh and imf remain]
Loop-removal operations
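[Figure: a node with self-loop z and outlinks x and y is replaced by the same node without the loop and with outlinks z*x and z*y]
In this way, we remove the self-loop and then multiply all
outgoing links by z*.
 Continue the process by applying the loop-removal
step as follows:
[Figure: the self-loop at node 3 is removed and the link from node 3 to node 5 becomes (bgjf)*b(c+gkh)]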
 Removing node 5 produces
[Figure: link (bgjf)*b(c+gkh)d from node 3 to node 6 and self-loop ilhd at node 6]
 Remove the loop at node 6 to yield
[Figure: links (ilhd)*e from node 6 to node 2 and (ilhd)*imf from node 6 to node 3]
 Remove node 3 to yield
[Figure: link a(bgjf)*b(c+gkh)d from node 1 to node 6 and self-loop (ilhd)*imf(bgjf)*b(c+gkh)d at node 6]
 Removing the loop and then node 6 results
in the following expression:
a(bgjf)*b(c+gkh)d((ilhd)*imf(bgjf)*b(c+gkh)d)*(ilhd)*e
Identities
I1: (A + B)* = (A* + B*)*
I2: = (A*B*)*
I3: = (A*B)*A*
I4: = (B*A)*B*
I5: = (A*B + A)*
I6: = (B*A + B)*
I7: (A + B + C + . . .)* = (A*+B*+C*+ . . .)*
I8: = (A*B*C* . . .)*
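Identities I1 to I6 can be spot-checked by enumerating the paths each form generates up to a fixed length and comparing the resulting sets. The sketch below assumes sample sets A = {ab} and B = {c} and an arbitrary cut-off of N = 6 links. It illustrates the identities for this one case and does not prove them; the helper names are hypothetical.

```python
N = 6  # compare only paths of at most N links (an arbitrary cut-off)


def cat(xs, ys):
    """Path product, keeping only paths of at most N links."""
    return {x + y for x in xs for y in ys if len(x) + len(y) <= N}


def star(xs):
    """X*, truncated to paths of at most N links."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = cat(frontier, xs) - result
        result |= frontier
    return result


A, B = {"ab"}, {"c"}
forms = {
    "(A+B)*": star(A | B),
    "I1": star(star(A) | star(B)),
    "I2": star(cat(star(A), star(B))),
    "I3": cat(star(cat(star(A), B)), star(A)),
    "I4": cat(star(cat(star(B), A)), star(B)),
    "I5": star(cat(star(A), B) | A),
    "I6": star(cat(star(B), A) | B),
}
reference = forms["(A+B)*"]
for name, paths in forms.items():
    print(name, paths == reference)
```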
Applications
 The purpose of the node removal algorithm is to present
one very generalized concept, the path expression, and
a way of getting it.
 Every application follows this common pattern:
 Convert the program or graph into a path expression.
 Identify a property of interest and derive an appropriate
set of arithmetic rules that characterizes the property.
 Replace the link names by link weights for the property of
interest. The path expression has now been converted into
an expression in some algebra, such as ordinary algebra or regular expressions.
 Simplify or evaluate the resulting algebraic expression to
answer the question you asked.
Maximum Path Count Arithmetic
 Label each link with a link weight that
corresponds to the number of paths that link
represents.
 Also mark each loop with the maximum number
of times that loop can be taken. If the answer is
infinite, you might as well stop the analysis
because it is clear that the maximum number of
paths will be infinite.
 There are three cases of interest: parallel links,
serial links, and loops
Case        Path expression   Weight expression
Parallels   A+B               W_A + W_B
Series      AB                W_A W_B
Loop        A^n               \sum_{j=0}^{n} W_A^j
 This arithmetic is an ordinary algebra. The weight
is the number of paths in each set.
Maximum Path Count
Arithmetic-example

[Figure: flow graph with links a through m, containing an inner loop and an outer loop; the layout is not recoverable here]
 Each link represents a single link and consequently is given a
weight of “1” to start. Let's say the outer loop will be taken exactly
four times and the inner loop can be taken zero to three times.
 Path expression: a(b+c)d{e(fi)*fgj(m+l)k}*e(fi)*fgh
 Annotate the flow graph by replacing each link name with the
maximum number of paths through that link (1), and note the number of
times for looping: (4-4) for the outer loop and (0-3) for the inner loop.
 Combine the first pair of parallel links outside the loop and also
the pair in the outer loop. Each pair has weight 1+1=2.
 Multiply things out and remove nodes to clear the clutter.
 Take care of the inner loop: there are four possibilities, leading to
the value 4. Then multiply by the following link weight.
 Use the cross-term step to create the outer self-loop
with a weight of 2(4)=8, taken (4-4) times.
 The result is 2 × 8^4 × 4 = 32768.

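Alternatively, you could have substituted a “1” for each link in the
path expression and then simplified, as follows (\underline{3} is an exponent with a bar under it, meaning zero to three repetitions):
1(1+1)1 (1 (1×1)^{\underline{3}} 1×1×1 (1+1) 1)^4 1 (1×1)^{\underline{3}} 1×1×1
= 2 (1^{\underline{3}} × 1 × 2)^4 × 1^{\underline{3}}
but 1^{\underline{3}} = 1^0+1^1+1^2+1^3 = 4
= 2 (4 × 2)^4 × 4 = 2 × 8^4 × 4 = 32768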
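The same substitution can be carried out mechanically. The sketch below is one possible reading of the arithmetic: parallel weights add, series weights multiply, a loop taken at most n times sums the powers from 0 to n, and a loop taken exactly n times uses the n-th power, which is how the calculation above treats the outer loop. The function names are hypothetical.

```python
from math import prod


def parallel(*weights):
    """Parallel links: the weights add."""
    return sum(weights)


def series(*weights):
    """Serial links: the weights multiply."""
    return prod(weights)


def loop_at_most(weight, n):
    """Loop taken 0 to n times: W^0 + W^1 + ... + W^n."""
    return sum(weight ** j for j in range(n + 1))


def loop_exactly(weight, n):
    """Loop taken exactly n times: W^n."""
    return weight ** n


# Path expression: a(b+c)d{e(fi)*fgj(m+l)k}*e(fi)*fgh, every link weight = 1
inner = loop_at_most(series(1, 1), 3)                       # (fi) taken 0 to 3 times
outer_body = series(1, inner, 1, 1, 1, parallel(1, 1), 1)   # e(fi)*fgj(m+l)k
total = series(1, parallel(1, 1), 1,
               loop_exactly(outer_body, 4),
               1, inner, 1, 1, 1)
print(inner, outer_body, total)  # 4 8 32768
```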
Structured Flowgraph
 A structured flowgraph is one that can be
reduced to a single link by successive
application of the following transformations:
[Figure: four transformations, each reducing a structure built from links A and B to a single link: PROCESS, IF THEN ELSE, WHILE DO, and REPEAT UNTIL]
Structured Flowgraph-
continued.,
 Flow graphs that do not contain one or
more of the graphs shown below as
subgraphs are structured.
 Jumping into loops
 Jumping out of loops
 Branching into decisions
 Branching out of decisions
Unstructured Sub Graphs
[Figures: examples of the four unstructured subgraphs: jumping into loops, jumping out of loops, branching into decisions, and branching out of decisions]
Lower Path Count Arithmetic
 A lower bound on the number of paths in a
routine can be approximated for
structured flow graphs.
 The arithmetic is as follows:
Case        Path expression   Weight expression
Parallels   A+B               W_A + W_B
Series      AB                max(W_A, W_B)
Loop        A^n               1, W_1
 The values of the weights are the number of
members in a set of paths.
Minimum Path Count-example.,

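[Figure: the same flow graph as in the maximum path count example, with every link weight set to 1, the outer loop marked (4-4) and the inner loop marked (0-3)]
 Combine the first pair of parallel links outside the loop and also
the pair in the outer loop, giving each pair a weight of 2.
[Figures: successive reductions of the annotated graph; the final weight is 2]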
Mean Processing times of
Routines
 Given the execution time of all statements
or instructions for every link in a flowgraph
and the probability for each direction for all
decisions, the problem is to find the mean processing
time for the routine as a whole.
 The model has two weights associated with
every link: the processing time for that link,
denoted by T, and the probability of that
link, denoted by P.
The rules for mean processing times are:
Case        Path expression   Weight expression
Parallels   A+B               T_{A+B} = (P_A T_A + P_B T_B)/(P_A + P_B)
                              P_{A+B} = P_A + P_B
Series      AB                T_{AB} = T_A + T_B
                              P_{AB} = P_A P_B
Loop        A^n               T_A = T_A + T_L P_L/(1 - P_L)
                              P_A = P_A/(1 - P_L)
In the loop rule, L is the looping link and A is the link that leaves the loop; the left-hand sides are the new weights of A once the loop has been removed.
Example
[Figure: flow graph annotated with a processing time on every link and a probability on every decision branch; the layout is not recoverable here]
 Start with the original flow graph annotated with
probabilities and processing time.
 Combine the parallel links of the outer loop. The result is
just the mean of the processing times for the links
because there aren’t any other links leaving the first
node. Also combine the pair of links at the beginning of
the flow graph. The combined times are 34 for the outer loop pair
and 35.5 for the pair at the beginning.
 Combine as many serial links as you can. This gives 63 for the
outer loop link and 61.5 for the entry link.
 Use the cross-term step to eliminate a node
and to create the inner self-loop, which has time 20 and
probability 0.6; the link leaving it has time 13 and probability 0.4.
 Remove the inner self-loop: it contributes 20(0.6)/(0.4) = 30.
 Combine the serial links again: 10+30+13 = 53, then 53+7 = 60 for
the exit link (0.7) and 63+53 = 116 for the outer loop (0.3).
 Remove the outer loop: it contributes 116(0.3)/(0.7) = 49.714.
 The mean processing time is 61.5+49.714+60 = 171.214.
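The figures in this example can be reproduced with a small sketch of the three rules. Each link is modelled as a `(time, probability)` pair. The loop rule is assumed here to add the exit link's own time to T_L P_L/(1 - P_L), which is what the worked numbers (13 + 30 and 60 + 49.714) imply. The entry link is assumed to have probability 1.0, and the function names are hypothetical.

```python
def series(a, b):
    """Serial links: times add, probabilities multiply."""
    (ta, pa), (tb, pb) = a, b
    return (ta + tb, pa * pb)


def parallel(a, b):
    """Parallel links: probability-weighted mean time, probabilities add."""
    (ta, pa), (tb, pb) = a, b
    return ((pa * ta + pb * tb) / (pa + pb), pa + pb)


def loop(exit_link, loop_link):
    """Remove a self-loop: new weights of the link that leaves the loop."""
    (ta, pa), (tl, pl) = exit_link, loop_link
    return (ta + tl * pl / (1 - pl), pa / (1 - pl))


outer_pair = parallel((20, 0.95), (300, 0.05))   # first pair combined in the example
inner_exit = loop((13, 0.4), (20, 0.6))          # inner loop removed
exit_link = loop((60, 0.7), (116, 0.3))          # outer loop removed
total = series((61.5, 1.0), exit_link)           # entry link followed by exit link

print(round(outer_pair[0], 3))
print(round(inner_exit[0], 3))
print(round(total[0], 3))
```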
Regular Expressions and Flow Anomaly
Detection
The Problem
The generic flow-anomaly detection problem is that of looking for a
specific sequence of operations considering all possible paths
through a routine.

Example:
Let’s say the operations are SET and RESET, denoted by s and r
respectively, and we want to know if there is a SET followed
immediately by a SET or a RESET followed immediately by a RESET
(i.e., an ss or an rr sequence).


1) A file can be opened (o), closed (c), read (r), or written (w).
If the file is read or written to after it’s been closed, the sequence is nonsensical.
Therefore, cr and cw are anomalous.
Similarly, if the file is read before it’s been written, just after opening, we may
have a bug. Therefore, the sequence or is also anomalous.

2) A tape transport can do a rewind (d), fast-forward (f), read (r), write (w), stop
(p), and skip (k). The following sequences are anomalous: df, dr, dw, fd, and fr.
Does the flowgraph lead to anomalous sequences on any path? If so, what
sequences and under what circumstances?

The Method
 Annotate each link in the graph with the appropriate operator or the
null operator 1.
 Simplify things to the extent possible, using the fact that
a + a = a and 1^2 = 1.
 We get a regular expression that denotes all the possible
sequences of operators in that graph.
 Examine that regular expression for the sequences of interest.

Huang's Theorem

Let A, B, C be nonempty sets of character sequences whose
smallest string is at least one character long.
Let T be a two-character string of characters. Then if T is a
substring of (i.e., if T appears within) AB^nC, then T will appear
in AB^2C.

As an example, let
A = pp
B = srr
C = rp
T = ss

The theorem states that ss will appear in pp(srr)^n rp if it appears in
pp(srr)^2 rp. We don’t need the theorem to see that ss does not
appear in the given string. However, let
A = p + pp + ps
B = psr + ps(r + ps)
C = rp
T = p^4
Is it obvious that there is a p^4 sequence in AB^nC? The theorem
states that we have only to look at
(p + pp + ps)[psr + ps(r + ps)]^2 rp.


 If you substitute 1 + X^2 for every
expression of the form X*, the paths that
result from this substitution are sufficient
to determine whether a given two-character
sequence exists or not.
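The substitution can be illustrated on the earlier example with A = pp, B = srr and C = rp. The sketch below assumes single-character operators and plain Python strings. It expands A(1 + B^2)C, collects every two-character sequence that occurs, and compares the result with the sequences found in AB^nC for a few values of n. It is a check on one example rather than a proof, and the helper names are hypothetical.

```python
def cat(*sets):
    """Path product of any number of path sets."""
    result = {""}
    for s in sets:
        result = {x + y for x in result for y in s}
    return result


def power(xs, n):
    """X^n: exactly n repetitions of xs."""
    return cat(*[xs] * n)


def sample(xs):
    """The substitution 1 + X^2 for X*."""
    return {""} | power(xs, 2)


def pairs(paths):
    """All two-character sequences that occur in any path."""
    return {p[i:i + 2] for p in paths for i in range(len(p) - 1)}


A, B, C = {"pp"}, {"srr"}, {"rp"}
reduced = cat(A, sample(B), C)
print(sorted(reduced))
print("ss" in pairs(reduced))  # False: no SET immediately followed by SET

# Every pair seen with 0 to 6 iterations already shows up in the reduced set.
covered = all(pairs(cat(A, power(B, n), C)) <= pairs(reduced) for n in range(7))
print(covered)
```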

 Huang’s theorem can be easily generalized to cover sequences of
greater length than two characters.
 Beyond three characters, though, things get complex and this method
has probably reached its utilitarian limit for manual application.
 If A, B, and C are nonempty sets of strings of one or more characters,
and if T is a string of k characters, and if T is a substring of AB^nC,
where n is greater than or equal to k, then T is a substring of AB^kC.

 A sufficient test for strings of length k can be obtained by
substituting P^{\underline{k}} for every appearance of P* (or P^n, where n is
greater than or equal to k). Recall that P^{\underline{k}}, the exponent k with a bar under it, means
1 + P + P^2 + P^3 + . . . + P^k
 A warning concerning the use of regular expressions: there are
almost no other useful identities beyond those shown earlier for
the path expressions.

 All flow analysis methods lose accuracy and utility if there are
unachievable paths.

 The flow anomaly detection problem is that of looking for
a specific sequence of operations considering all possible
paths through a routine.
 Here we are interested in knowing whether a specific
sequence occurred but not what the net effect of the
routine is.
 The method of anomaly detection:
 Annotate each link in the graph with the appropriate operator or
the null operator (1).
 Simplify things to the extent possible.
 After performing the above two steps you obtain a Regular
Expression that denotes the possible sequences of operators in
that graph.
 You can now examine that regular expression for the sequence of
interest.
