Unit 3 PDF
UNIT 3
PATH TESTING, DATA FLOW TESTING
3.1 Path Testing
Definition
Program graph is a directed graph in which nodes are either entire statements or fragments
of a statement, and edges represent flow of control.
If i and j are nodes in the program graph, there is an edge from node i to node j iff the
statement (fragment) corresponding to node j can be executed immediately after the
statement (fragment) corresponding to node i.
Constructing a program graph from a given program is an easy process. Line numbers refer to
statements and statement fragments. There is an element of judgment here: sometimes it is
convenient to keep a fragment (like a BEGIN) as a separate node; at other times, it seems better
to include this with another portion of a statement.
1. Program triangle2 'Structured programming version of simpler specification'
2. Dim a, b, c As Integer
3. Dim IsATriangle As Boolean
Step 1: Get Input
4. Output("Enter three integers which are sides of a triangle:")
5. Input(a, b, c)
6. Output("Side A is ", a)
7. Output("Side B is ", b)
8. Output("Side C is ", c)
Step 2: Is A Triangle?
9. If (a < b + c) AND (b < a + c) AND (c < a + b)
10. Then IsATriangle := True
11. Else IsATriangle := False
12. EndIf
Step 3: Determine Triangle Type
13. If IsATriangle
14. Then If (a = b) AND (b = c)
15. Then Output("Triangle is Equilateral")
16. Else If (a <> b) AND (a <> c) AND (b <> c)
17. Then Output("Triangle is Scalene")
18. Else Output("Triangle is Isosceles")
19. EndIf
20. EndIf
21. Else Output("Not a Triangle")
22. EndIf
23. End triangle2
A program graph of this program is given in Figure 3.1. Nodes 4 through 8 are a sequence,
nodes 9 through 12 are an IF-THEN-ELSE construct, and nodes 13 through 22 are nested
IF-THEN-ELSE constructs. Nodes 4 and 23 are the program
source and sink nodes, corresponding to the single-entry, single-exit criteria.
[Figure 3.1: Program graph of the triangle program (nodes 4 through 23)]
3.2 DD-Paths
The best known form of structural testing is based on a construct known as a decision-to-
decision path (DD-Path) [Miller 77]. The name refers to a sequence of statements that, in
Miller's words, begins with the "outway" of a decision statement and ends with the "inway"
of the next decision statement.
There are no internal branches in such a sequence, so the corresponding code is like a row
of dominoes lined up so that when the first falls, all the rest in the sequence fall.
Miller's original definition works well for second generation languages like FORTRAN II,
because decision-making statements (such as arithmetic IFs and DO loops) use statement
labels to refer to target statements.
With block-structured languages (Pascal, Ada, C), the notion of statement fragments
resolves the difficulty of applying Miller's original definition; otherwise, we end up with
program graphs in which some statements are members of more than one DD-Path.
We will define DD-Paths in terms of paths of nodes in a directed graph. We might call these
paths chains, where a chain is a path in which the initial and terminal nodes are distinct, and
every interior node has indegree = 1 and outdegree = 1.
Notice that the initial node is 2-connected to every other node in the chain, and there are no
instances of 1- or 3-connected nodes, as shown in Figure 3.3. The length (number of edges)
of the chain in Figure 3.3 is 6. We can have a degenerate case of a chain that is of length 0,
that is, a chain consisting of exactly one node and no edges.
[Figure 3.3: A chain of length 6, showing the initial node, interior nodes, and terminal node]
Definition:
A DD-Path is a sequence of nodes in a program graph such that
Case 1: it consists of a single node with indeg = 0,
Case 2: it consists of a single node with outdeg = 0,
Case 3: it consists of a single node with indeg ≥ 2 or outdeg ≥ 2,
Case 4: it consists of a single node with indeg = 1 and outdeg = 1,
Case 5: it is a maximal chain of length ≥ 1.
Cases 1 and 2 establish the unique source and sink nodes of the program graph of a structured
program as initial and final DD-Paths.
Case 3 deals with complex nodes; it assures that no node is contained in more than one DD-
Path. Case 4 is needed for “short branches”; it also preserves the one fragment, one DD-Path
principle.
Case 5 is the “normal case”, in which a DD-Path is a single entry, single exit sequence of
nodes (a chain). The “maximal” part of the case 5 definition is used to determine the final node
of a normal (non-trivial) chain.
This is a complex definition, so we'll apply it to the program graph in Figure 3.1. Node 4 is a
Case 1 DD-Path; we'll call it "first". Similarly, node 23 is a Case 2 DD-Path, which we'll
call it "last". Nodes 5 through 8 are a Case 5 DD-Path.
We know that node 8 is the last node in this DD-Path because it is the last node that
preserves the 2-connectedness property of the chain. If we went beyond node 8 to include
node 9, we would violate the indegree = outdegree = 1 criterion of a chain.
If we stopped at node 7, we would violate the "maximal" criterion. Nodes 10, 11, 15, 17,
18, and 21 are Case 4 DD-Paths. Nodes 9, 12, 13, 14, 16, 19, 20, and 22 are Case 3 DD-Paths.
Finally, node 23 is a Case 2 DD-Path.
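The case analysis above can be checked mechanically. Here is a small Python sketch; the edge list is our transcription of Figure 3.1, so treat the exact edges as an assumption:

```python
from collections import defaultdict

# Edge list transcribed from Figure 3.1 by us (an assumption about the
# exact program graph of the triangle program).
edges = [(4,5),(5,6),(6,7),(7,8),(8,9),(9,10),(9,11),(10,12),(11,12),
         (12,13),(13,14),(13,21),(14,15),(14,16),(15,20),(16,17),(16,18),
         (17,19),(18,19),(19,20),(20,22),(21,22),(22,23)]

succ, pred = defaultdict(list), defaultdict(list)
for a, b in edges:
    succ[a].append(b)
    pred[b].append(a)
nodes = sorted(set(succ) | set(pred))

def is_simple(n):
    # indegree = outdegree = 1: a Case 4 node or a member of a Case 5 chain
    return len(pred[n]) == 1 and len(succ[n]) == 1

sources   = [n for n in nodes if not pred[n]]                             # Case 1
sinks     = [n for n in nodes if not succ[n]]                             # Case 2
complexes = [n for n in nodes if len(pred[n]) >= 2 or len(succ[n]) >= 2]  # Case 3

# Group the remaining nodes into maximal chains: a one-node chain is a
# Case 4 DD-Path, a longer chain is a Case 5 DD-Path.
chains = []
for n in nodes:
    if is_simple(n) and not is_simple(pred[n][0]):
        chain = [n]
        while is_simple(succ[chain[-1]][0]):
            chain.append(succ[chain[-1]][0])
        chains.append(chain)

case4 = sorted(c[0] for c in chains if len(c) == 1)
case5 = [c for c in chains if len(c) > 1]
print(sources, sinks, complexes, case4, case5)
```

Running this reproduces the classification in the text: node 4 is the source, node 23 the sink, nodes 9, 12, 13, 14, 16, 19, 20, and 22 are complex, nodes 10, 11, 15, 17, 18, and 21 are one-node (Case 4) chains, and 5 through 8 form the single Case 5 chain.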
All of this is summarized in Table 1, where the DD-Path names correspond to the DD-Path
graph in Figure 3.4.
Part of the confusion with this example is that the triangle problem is logic intensive and
computationally sparse. This combination yields many short DD-Paths. If the THEN and ELSE
clauses contained blocks of computational statements, we would have longer DD-Paths, as
we do in the commission problem.
[Figure 3.4: DD-Path graph of the triangle program]
Definition
Given a program written in an imperative language, its DD-Path graph is the directed
graph in which nodes are DD-Paths of its program graph, and edges represent control
flow between successor DD-Paths.
In effect, the DD-Path graph is a form of condensation graph: in this condensation, 2-
connected components are collapsed into individual nodes that correspond to Case 5 DD-Paths.
The single node DD-Paths (corresponding to Cases 1 - 4) are required to preserve the
convention that a statement (or statement fragment) is in exactly one DD-Path. Without
this convention, we end up with rather clumsy DD-Path graphs, in which some statement
(fragments) are in several DD-Paths.
3.3 Test Coverage Metrics
The raison d’être of DD-Paths is that they enable very precise descriptions of test coverage.
the fundamental limitations of functional testing is that it is impossible to know either the
extent of redundancy or the possibility of gaps corresponding to the way a set of functional
test cases exercises a program.
Test coverage metrics are a device to measure the extent to which a set of test cases covers
(or exercises) a program. There are several widely accepted test coverage metrics; most of
those in Table 2 are due to the early work of E. F. Miller [Miller 77].
Table 2 Structural Test Coverage Metrics
Metric   Description of Coverage
C0       Every statement
C1       Every DD-Path (predicate outcome)
C1p      Every predicate to each outcome
C2       C1 coverage + loop coverage
Cd       C1 coverage + every dependent pair of DD-Paths
CMCC     Multiple condition coverage
Cik      Every program path that contains up to k repetitions of a loop (usually k = 2)
Cstat    "Statistically significant" fraction of paths
C∞       All possible execution paths
Multiple Condition Coverage
Look closely at the compound conditions in DD-Paths A and E. Rather than simply traversing
such predicates to their TRUE and FALSE outcomes, we should investigate the different
ways that each outcome can occur.
One possibility is to make a truth table; a compound condition of three simple conditions
would have eight rows, yielding eight test cases. Another possibility is to reprogram
compound predicates into nested simple IF-THEN-ELSE logic, which will result in more DD-
Paths to cover.
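The truth-table idea can be sketched in a few lines of Python; the three simple conditions are taken from the compound triangle predicate at node 9:

```python
from itertools import product

# Multiple condition coverage for the compound triangle predicate: rather
# than just the TRUE/FALSE outcomes, enumerate every combination of the
# three simple conditions (a truth table with 2^3 = 8 rows).
conditions = ["a < b + c", "b < a + c", "c < a + b"]
rows = list(product([True, False], repeat=len(conditions)))

for row in rows:
    outcome = all(row)   # the compound AND of the simple conditions
    print(", ".join(f"{c}={v}" for c, v in zip(conditions, row)), "->", outcome)
```

Note that for positive sides at most one of the three inequalities can be false (two failures would force one side to be nonpositive), so several of the eight rows are infeasible: an early sighting of the feasibility problems discussed below for basis paths.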
Loop Coverage
The condensation graphs provide us with an elegant resolution to the problems of testing
loops. loops are a highly fault prone portion of source code. To start, there is an
amusing taxonomy of loops in [Beizer 83]: concatenated, nested, and horrible. shown in
Figure 3.5.
[Figure 3.5: Concatenated, nested, and horrible loops]
Horrible loops cannot occur when the structured programming precepts are followed.
When it is possible to branch into (or out from) the middle of a loop, and these branches
are internal to other loops, the result is Beizer’s horrible loop. (Other sources define this as a
knot—how appropriate.)
The simple view of loop testing is that every loop involves a decision, and we need to test
both outcomes of the decision: one is to traverse the loop, and the other is to exit (or not enter)
the loop.
Once a loop has been tested, the tester condenses it into a single node. If loops are nested,
this process is repeated starting with the innermost loop and working outward. This results
in the same multiplicity of test cases we found with boundary value analysis, which makes
sense, because each loop index variable acts like an input variable.
If loops are knotted, it will be necessary to carefully analyze them in terms of the dataflow
methods. As a preview, consider the infinite loop that could occur if one loop tampers with
the value of the other loop’s index.
[Figure 3.6: McCabe's example directed graph, with nodes A through G]
McCabe based his view of testing on a major result from graph theory, which states that the
cyclomatic number of a strongly connected graph is the number of linearly independent
circuits in the graph. (A circuit is similar to a chain: no internal loops or decisions, but the
initial node is the terminal node. A circuit is a set of 3-connected nodes.)
We can always create a strongly connected graph by adding an edge from the (every) sink
node to the (every) source node. (Notice that, if the single entry, single exit precept is
violated, we greatly increase the cyclomatic number, because we need to add edges from each
sink node to each source node.) Figure 3.7 shows the result of doing this; it also contains
edge labels that are used in the discussion that follows.
There is some confusion in the literature about the correct formula for cyclomatic complexity.
Some sources give the formula as V(G) = e - n + p,
while others use the formula V(G) = e - n + 2p; everyone agrees that e is the number of
edges, n is the number of nodes, and p is the number of connected regions.
The confusion apparently comes from the transformation of an arbitrary directed graph (Figure
3.6) to a strongly connected directed graph obtained by adding one edge from the sink to the
source node (as in Figure 3.7). Adding an edge clearly affects the value computed by the
formula, but it shouldn't affect the number of circuits.
Here’s a way to resolve the apparent inconsistency: The number of linearly independent paths
from the source node to the sink node in Figure 3.6 is
V(G) = e - n + 2p = 10 – 7 + 2 ( 1 ) = 5
and the number of linearly independent circuits in the graph in Figure 3.7 is
V(G) = e - n + p = 11 – 7 + 1 = 5
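Both computations are easy to mechanize. A Python sketch, using the edge list as we read it from Figure 3.6 (the exact edges are an assumption):

```python
# Edge list of McCabe's example graph (Figure 3.6) as we read it; the
# exact edges are an assumption.
edges = [("A","B"),("A","D"),("B","C"),("B","E"),("C","B"),("C","G"),
         ("D","E"),("D","F"),("E","F"),("F","G")]
nodes = {n for edge in edges for n in edge}

e, n, p = len(edges), len(nodes), 1          # one connected region
print("V(G) =", e - n + 2 * p)               # 10 - 7 + 2(1) = 5

# Adding the edge G -> A (Figure 3.7) makes the graph strongly connected;
# the other formula then gives the same value.
edges.append(("G","A"))
print("V(G) =", len(edges) - n + 1)          # 11 - 7 + 1 = 5
```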
[Figure 3.7: The graph of Figure 3.6 made strongly connected by adding an edge from node G to node A]
The cyclomatic complexity of the strongly connected graph in Figure 3.7 is 5; thus there are
five linearly independent circuits. If we now delete the added edge from node G to node A,
these five circuits become five linearly independent paths from node A to node G.
In small graphs, we can visually identify independent paths. Here we identify paths as
sequences of nodes:
p1: A, B, C, G
p2: A, B, C, B, C, G
p3: A, B, E, F, G
p4: A, D, E, F, G
p5: A, D, F, G
We can force this beginning to look like a vector space by defining notions of addition and
scalar multiplication: path addition is simply one path followed by another path, and
multiplication corresponds to repetitions of a path.
McCabe arrives at a vector space of program paths. His illustration of the basis part of this
framework is that the path A, B, C, B, E, F, G is the basis sum p2 + p3 - p1, and the path A,
B, C, B, C, B, C, G is the linear combination 2p2 -p1.
It is easier to see this addition with an incidence matrix in which rows correspond to
paths and columns correspond to edges, as in Table 3. The entries are the number of times
each path traverses each edge; reading the columns left to right, the edges are A→B, A→D,
C→B, B→C, B→E, D→E, D→F, E→F, C→G, F→G.
p1: A, B, C, G               1 0 0 1 0 0 0 0 1 0
p2: A, B, C, B, C, G         1 0 1 2 0 0 0 0 1 0
p3: A, B, E, F, G            1 0 0 0 1 0 0 1 0 1
p4: A, D, E, F, G            0 1 0 0 0 1 0 1 0 1
p5: A, D, F, G               0 1 0 0 0 0 1 0 0 1
ex1: A, B, C, B, E, F, G     1 0 1 1 1 0 0 1 0 1
ex2: A, B, C, B, C, B, C, G  1 0 2 3 0 0 0 0 1 0
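The two linear combinations can be verified directly from the table rows. A Python sketch, with the edge-incidence vectors copied from Table 3:

```python
# McCabe's "vector space" arithmetic on the edge-incidence rows of
# Table 3: path addition is componentwise on edge-traversal counts.
p1  = [1,0,0,1,0,0,0,0,1,0]
p2  = [1,0,1,2,0,0,0,0,1,0]
p3  = [1,0,0,0,1,0,0,1,0,1]
ex1 = [1,0,1,1,1,0,0,1,0,1]   # A, B, C, B, E, F, G
ex2 = [1,0,2,3,0,0,0,0,1,0]   # A, B, C, B, C, B, C, G

def combine(coeffs, paths):
    # Linear combination of paths, treated as vectors of edge counts.
    return [sum(c * v for c, v in zip(coeffs, col)) for col in zip(*paths)]

print(combine([1, 1, -1], [p2, p3, p1]) == ex1)   # p2 + p3 - p1
print(combine([2, -1], [p2, p1]) == ex2)          # 2p2 - p1
```

Both prints show True, confirming that ex1 is the basis sum p2 + p3 - p1 and ex2 is the combination 2p2 - p1.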
Here we follow McCabe's example, in which he first postulates the path through nodes A,
B, C, B, E, F, G as the baseline. (This was expressed in terms of paths p1 - p5 earlier.)
The first decision node (outdegree ≥ 2) in this path is node A, so for the next basis path,
we traverse edge 2 instead of edge 1.
We get the path A, D, E, F, G, where we retrace nodes E, F, G in path 1 to be as minimally
different as possible. For the next path, we can follow the second path, and take the other
decision outcome of node D, which gives us the path A, D, F, G.
Now only decision nodes B and C have not been flipped; doing so yields the last two basis
paths, A, B, E, F, G and A, B, C, G. Notice that this set of basis paths is distinct from the one
in Table 3: this is not problematic, because there is no requirement that a basis be unique.
There are two major soft spots in the McCabe view: one is that testing the
set of basis paths is sufficient (it's not), and the other has to do with the yoga-like contortions
we went through to make program paths look like a vector space.
McCabe’s example that the path A, B, C, B, C, B, C, G is the linear combination 2 p2 - p1 is
very unsatisfactory. What does the 2p2 part mean? Execute path p2 twice? (Yes, according to
the math.) Even worse, what does the - p1 part mean? Execute path p1 backwards? Undo
the most recent execution of p1? Don’t do p1 next time? Mathematical sophistries like this
are a real turn-off to practitioners looking for solutions to their very real problems.
To get a better understanding of these problems, we’ll go back to the triangle program
example. Start with the DD-Path graph of the triangle program in Figure 3.4. We begin with
a baseline path that corresponds to a scalene triangle; say with sides 3, 4, 5.
This test case will traverse the path p1. Now if we flip the decision at node A, we get path
p2. Continuing the procedure, we flip the decision at node D, which yields the path p3. Now
we continue to flip decision nodes in the baseline path p1; the next node with outdegree = 2 is
node E.
When we flip node E, we get the path p4. Next we flip node G to get p5. Finally (we know
we're done, because there are only six basis paths), we flip node I to get p6.
Time for a reality check: if you follow paths p2, p3, p4, p5, and p6, you find that they
are all infeasible. Path p2 is infeasible, because passing through node C means the sides are
not a triangle, so none of the sequel decisions can be taken.
Similarly, in p3, passing through node B means the sides do form a triangle, so node L
cannot be traversed. The others are all infeasible because they involve cases where a triangle
is of two types (e.g., isosceles and equilateral).
The problem here is that there are several inherent dependencies in the triangle problem.
One is that if three integers constitute sides of a triangle, they must be one of the three
possibilities: equilateral, isosceles, or scalene. A second dependency is that the three
possibilities are mutually exclusive: if one is true, the other two must be false.
Recall that dependencies in the input data domain caused difficulties for boundary value
testing, and that we resolved these by going to decision table based functional testing, where
we addressed data dependencies in the decision table.
Here we are dealing with code level dependencies, and these are absolutely incompatible
with the latent assumption that basis paths are independent. McCabe’s procedure
successfully identifies basis paths that are topologically independent, but when these
contradict semantic dependencies, topologically possible paths are seen to be logically
infeasible.
One solution to this problem is to always require that flipping a decision results in a
semantically feasible path. Another is to reason about logical dependencies. If we think
about this problem, we can identify several rules:
If node C is traversed, then we must traverse node H.
If node D is traversed, then we must traverse node G.
Taken together, these rules, in conjunction with McCabe's baseline method, will yield a
set of feasible basis paths.
The triangle problem is atypical in that there are no loops. The program has only 18
topologically possible paths, and of these, only four basis paths are feasible.
Thus for this special case, we arrive at the same test cases as we did with special value testing
and output range testing.
For a more positive observation, basis path coverage guarantees DD-Path coverage: the
process of flipping decisions guarantees that every decision outcome is traversed, which is
the same as DD-Path coverage.
McCabe went on to find elemental “unstructures” that violate the precepts of structured
programming [McCabe 76]. These are shown in Figure 3.10.
[Figure 3.10: McCabe's elemental "unstructures": branching into a loop, branching out of a loop, branching into a decision, and branching out of a decision]
Each of these “unstructures” contains three distinct paths, as opposed to the two paths present
in the corresponding structured programming constructs, so one conclusion is that such
violations increase cyclomatic complexity.
The piece d’ resistance of McCabe’s analysis is that these unstructures cannot occur by
themselves: if there is one in a program, there must be at least one more, so a program
cannot be just slightly unstructured.Since these increase cyclomatic complexity, the
minimum number of test cases is thereby increased.
The bottom line for testers is this: programs with high cyclomatic complexity require more
testing. Of the organizations that use the cyclomatic complexity metric, most set some
guideline for maximum acceptable complexity; V(G) = 10 is a common choice.
What happens if a unit has a higher complexity? Two possibilities: either simplify the unit or
plan to do more testing. If the unit is well structured, its essential complexity is 1, so it can
be simplified easily. If the unit has an essential complexity that exceeds the guidelines,
often the best choice is to eliminate the unstructures.
A practical strategy is to select appropriate coverage metrics, and then use these as a cross
check on functional test cases.
When the desired coverage is not attained, follow interesting paths to identify additional
(special value) test cases. This is a good place to revisit the Venn diagram view of testing that
we used in Chapter 1. Figure 3.12 shows the relationship between specified behaviors (set S),
programmed behaviors (set P), and topologically feasible paths in a program (set T).
As usual, region 1 is the most desirable: it contains specified behaviors that are
implemented by feasible paths. By definition, every feasible path is topologically possible,
so the shaded portion (regions 2 and 6) of the set P must be empty.
Region 3 contains feasible paths that correspond to unspecified behaviors. Such extra
functionality needs to be examined: if useful, the specification should be changed, otherwise
these feasible paths should be removed. Regions 4 and 7 contain the infeasible paths; of
these, region 4 is problematic.
Region 4 refers to specified behaviors that have almost been implemented: topologically
possible yet infeasible program paths. This region very likely corresponds to coding errors,
where changes are needed to make the paths feasible.
Region 5 still corresponds to specified behaviors that have not been implemented. Path
based testing will never recognize this region. Finally, region 7 is a curiosity: unspecified,
infeasible, yet topologically possible paths.
There is no problem here, because infeasible paths cannot execute. If the corresponding
code is incorrectly changed by a maintenance action (maybe by a programmer who doesn’t
fully understand the code), these could become feasible paths, as in region 3.
[Figure 3.12: Venn diagram of specified behaviors (S), programmed behaviors (P), and topologically possible paths (T), with regions numbered 1 through 8]
Definition
Node n ∈ G(P) is a defining node of the variable v ∈ V, written as DEF(v, n), iff the value of
the variable v is defined at the statement fragment corresponding to node n.
Input statements, assignment statements, loop control statements, and procedure calls are all
examples of statements that are defining nodes. When the code corresponding to such
statements executes, the contents of the memory location(s) associated with the variables are
changed.
Definition
Node n ∈ G(P) is a usage node of the variable v ∈ V, written as USE(v, n), iff the value
of the variable v is used at the statement fragment corresponding to node n.
Output statements, assignment statements, conditional statements, loop control statements, and
procedure calls are all examples of statements that are usage nodes. When the code
corresponding to such statements executes, the contents of the memory location(s) associated
with the variables remain unchanged.
Definition
A usage node USE(v, n) is a predicate use (denoted as P-use) iff the statement n is a
predicate statement; otherwise, USE(v, n) is a computation use (denoted C-use).
The nodes corresponding to predicate uses always have an outdegree ≥ 2, and nodes
corresponding to computation uses always have outdegree ≤ 1.
Definition
A definition-use (sub)path with respect to a variable v (denoted du-path) is a
(sub)path in PATHS(P) such that, for some v ∈ V, there are define and usage nodes DEF(v,
m) and USE(v, n) such that m and n are the initial and final nodes of the (sub)path.
Definition
A definition-clear (sub)path with respect to a variable v (denoted dc-path) is a
definition-use (sub)path in PATHS(P) with initial and final nodes DEF (v, m) and USE (v, n)
such that no other node in the (sub)path is a defining node of v.
Testers should notice how these definitions capture the essence of computing with stored
data values. Du-paths and dc-paths describe the flow of data across source statements from
points at which the values are defined to points at which the values are used. Du-paths that
are not definition-clear are potential trouble spots.
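These definitions are easy to mechanize once each statement is annotated with the variables it defines and uses. A Python sketch over a hand-made (node, defs, uses) table for commission problem nodes 13 through 19; the annotation itself is our transcription, so treat it as an assumption:

```python
from collections import defaultdict

# Derive DEF and USE node sets from a hand-made table of
# (node, variables-defined, variables-used) tuples; the encoding of
# commission problem nodes 13-19 below is our transcription.
stmts = [
    (13, {"locks"}, set()),                           # input(locks)
    (14, set(), {"locks"}),                           # while NOT (locks = -1), a P-use
    (15, {"stocks", "barrels"}, set()),               # input(stocks, barrels)
    (16, {"totallocks"}, {"totallocks", "locks"}),    # totallocks = totallocks + locks
    (17, {"totalstocks"}, {"totalstocks", "stocks"}),
    (18, {"totalbarrels"}, {"totalbarrels", "barrels"}),
    (19, {"locks"}, set()),                           # input(locks)
]

DEF, USE = defaultdict(set), defaultdict(set)
for node, defs, uses in stmts:
    for v in defs:
        DEF[v].add(node)
    for v in uses:
        USE[v].add(node)

print(sorted(DEF["locks"]), sorted(USE["locks"]))   # [13, 19] [14, 16]
```

This reproduces the observation made later in the chapter: locks has I-defs at nodes 13 and 19, a P-use at node 14, and a C-use at node 16.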
3.6.2 Example
We will use the Commission Problem and its program graph to illustrate these definitions.
The numbered source code is given next, followed by a program graph constructed
according to the procedures.
This program computes the commission on the sales of four salespersons, hence the outer
For-loop that repeats four times. During each repetition, a salesperson’s name is read
from the input device, and the input from that person is read to compute the total numbers of
locks, stocks, and barrels sold by the person.
The While-loop is a classical sentinel controlled loop in which a value of -1 for locks
signifies the end of that person’s data. The totals are accumulated as the data lines are read in
the While-loop.
After printing this preliminary information, the sales value is computed, using the constant
item prices defined at the beginning of the program. The sales value is then used to compute
the commission in the conditional portion of the program.
12 totalbarrels = 0
13 input(locks)
14 while NOT (locks = -1) 'loop condition uses -1 to indicate end of data'
15 input(stocks, barrels)
16 totallocks = totallocks + locks
17 totalstocks = totalstocks + stocks
18 totalbarrels = totalbarrels + barrels
19 input(locks)
20 EndWhile
21 Output("Locks sold: ", totallocks)
22 Output("Stocks sold: ", totalstocks)
23 Output("Barrels sold: ", totalbarrels)
24 locksales = lockprice * totallocks
25 stocksales = stockprice * totalstocks
26 barrelsales = barrelprice * totalbarrels
27 sales = locksales + stocksales + barrelsales
28 Output("Total sales: ", sales)
29 If (sales > 1800.0)
30 Then
31 commission = 0.10 * 1000.0
32 commission = commission + 0.15 * 800.0
33 commission = commission + 0.20 * (sales - 1800.0)
34 Else If (sales > 1000.0)
35 Then
36 commission = 0.10 * 1000.0
37 commission = commission + 0.15 * (sales - 1000.0)
38 Else
39 commission = 0.10 * 1000.0
40 EndIf
41 EndIf
42 Output("Commission is $", commission)
43 End commission
The DD-Paths in this program are given in Table 1, and the DD-Path graph is shown in Figure
10.2. Tables 2 and 3 list the define and usage nodes for five variables in the commission
problem. We use this information in conjunction with the program graph in Figure 10.1 to
identify various definition-use and definition-clear paths.
Table 1 DD-Paths in Figure 10.1
DD - Path Nodes
A 7,8,9,10,11,12,13
B 14
C 15,16,17,18,19,20
D 21,22,23,24,25,26,27,28
E 29
F 30,31,32,33
G 34
H 35,36,37
I 38,39
J 40
K 41,42,43
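Table 1 also makes DD-Path (C1) coverage measurement mechanical: since a DD-Path is a chain, a test execution covers it if the execution visits any of its nodes. A Python sketch, applied to a hypothetical execution trace:

```python
# DD-Path definitions from Table 1 (Python ranges are right-exclusive).
dd_paths = {
    "A": range(7, 14),  "B": [14], "C": range(15, 21),
    "D": range(21, 29), "E": [29], "F": range(30, 34),
    "G": [34], "H": range(35, 38), "I": [38, 39],
    "J": [40], "K": range(41, 44),
}

def covered(trace):
    # A DD-Path is a chain, so visiting any of its nodes covers all of it.
    visited = set(trace)
    return {name for name, dd in dd_paths.items() if visited & set(dd)}

# Hypothetical execution trace: the While-loop body never runs and sales
# fall in the lowest commission band.
trace = list(range(7, 15)) + list(range(21, 30)) + [34, 38, 39, 40, 41, 42, 43]
hit = covered(trace)
print(sorted(hit), f"{len(hit)}/{len(dd_paths)} DD-Paths covered")
```

For this trace, DD-Paths C, F, and H remain uncovered, so full C1 coverage needs at least one test that iterates the loop and tests that reach the other commission bands.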
It’s a judgment call whether or not non-executable statements such as constant (CONST) and
variable (VAR) declaration statements should be considered as defining nodes. Technically,
these only define memory space (the CONST declaration creates a compiler-produced
initial value). Such nodes aren’t very interesting when we follow what happens along their
du-paths, but if there is something wrong, it’s usually helpful to include them. Take your pick.
We will refer to the various paths as sequences of node numbers.
[Figure 10.1: Program graph of the commission problem (nodes 7 through 43)]
p1 = <13, 14>
p2 = <13, 14, 15, 16>
p3 = <19, 20, 14>
p4 = <19, 20, 14, 15, 16>
Du-paths p1 and p2 refer to the priming value of locks which is read at node 13: locks
has a predicate use in the While statement (node 14), and if the condition is true (as in
path p2), a computation use at statement 16. The other two du-paths start near the end of the
While loop and occur when the loop repeats.
The variable totallocks has two defining nodes (DEF(totallocks, 10) and DEF(totallocks, 16))
and three usage nodes (USE(totallocks, 16), USE(totallocks, 21), USE(totallocks, 24)), so we
might expect six du-paths.
Path p5 = <10,11,12,13,14,15,16> is a du-path in which the initial value (0) has a
computation use. This path is definition-clear. The next path is problematic:
p6 = <10,11,12,13,14,15,16,17,18,19,20,14,21>
We have ignored the possible repetition of the While-loop. We could highlight this by
noting that the subpath <16,17,18,19,20,14,15> might be traversed several times. Ignoring
this for now, we still have a du-path that fails to be definition-clear. If there is a problem
with the value of totallocks at node 21 (the WRITE statement), we should look at the
intervening DEF(totallocks, 16) node.
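The definition-clear check itself is a one-liner. A Python sketch applied to p5 and p6; the DEF set for totallocks (nodes 10 and 16) is read from the program listing:

```python
# A definition-clear check per the definition above: no interior node of
# the (sub)path may redefine the variable.
def is_definition_clear(path, def_nodes):
    """path is a du-path (its initial node defines v, its final node uses v);
    def_nodes is the set of all defining nodes of v."""
    interior = path[1:-1]
    return not any(n in def_nodes for n in interior)

DEF_totallocks = {10, 16}   # totallocks = 0 at node 10, reassigned at node 16
p5 = [10, 11, 12, 13, 14, 15, 16]
p6 = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 14, 21]

print(is_definition_clear(p5, DEF_totallocks))  # True: definition-clear
print(is_definition_clear(p6, DEF_totallocks))  # False: node 16 intervenes
```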
The next path contains p6; we can show this by using a path name in place of its
corresponding node sequence:
p7 = <10,11,12,13,14,15,16,17,18,19,20,14,21, 22, 23, 24>
p7 = < p6, 22, 23, 24>
Du-path p7 is not definition-clear because it includes node 16.
Subpaths that begin with node 16 (an assignment statement) are interesting. The first, <16,
16>, seems degenerate. If we “expanded” it into machine code, we would be able to separate
the define and usage portions. We will disallow these as du-paths.
Technically, the usage on the right-hand side of the assignment refers to a value defined at
node 10, (see path p5). The remaining two du-paths are both subpaths of p7:
p8 = <16,17,18,19,20,14,21>
p9 = <16,17,18,19,20,14,21,22,23,24>
Both of these are definition-clear, and both have the loop iteration problem we discussed
before.
The variable sales is defined at node 27 and first used at node 28, so the first du-paths for
sales are short:
p10 = <27, 28>
p11 = <27, 28, 29>
The next du-path extends to the computation uses in the first commission band:
p12 = <27,28,29,30,31,32,33>
Notice that p12 is a definition-clear path with three usage nodes; it also contains paths p10 and
p11. If we were testing with p12, we know we would also have covered the other two
paths.
The IF, ELSE IF logic in statements 29 through 40 highlights an ambiguity in the original
research. There are two choices for du-paths that begin with path p11: the static choice is the
path <27,28,29,30,31,32,33>, the dynamic choice is the path <27,28,29,34>. Here we will
use the dynamic view, so the remaining du-paths for sales are
p13 = <27,28,29,34>
p14 = <27,28,29,34,35,36,37>
p15 = <27,28,29,34,38,39>
Note that the dynamic view is very compatible with the kind of thinking we used for DD-
Paths.
Definition
The set T satisfies the All-Defs criterion for the program P iff for every variable v ∈ V, T
contains definition-clear (sub)paths from every defining node of v to a use of v.
Definition
The set T satisfies the All-Uses criterion for the program P iff for every variable v ∈ V, T
contains definition-clear (sub)paths from every defining node of v to every use of v, and to the
successor node of each USE(v, n).
Definition
The set T satisfies the All-P-Uses/Some C-Uses criterion for the program P iff for every
variable v ∈ V, T contains definition-clear (sub)paths from every defining node of v to every
predicate use of v; and if a definition of v has no P-uses, there is a definition-clear path to at
least one computation use.
Definition
The set T satisfies the All-C-Uses/Some P-Uses criterion for the program P iff for every
variable v ∈ V, T contains definition-clear (sub)paths from every defining node of v to every
computation use of v; and if a definition of v has no C-uses, there is a definition-clear path
to at least one predicate use.
Definition
The set T satisfies the All-DU-Paths criterion for the program P iff for every variable v ∈
V, T contains definition-clear (sub)paths from every defining node of v to every use of v,
and to the successor node of each USE(v, n), and these paths are either single loop
traversals or cycle-free.
These test coverage metrics have several set-theory based relationships, which are referred
to as “subsumption” in [Rapps 85]. When one test coverage metric subsumes another, a set
of test cases that attains coverage in terms of the first metric necessarily attains coverage
with respect to the subsumed metric. These relationships are shown in Figure 3.3.
[Figure: The subsumption hierarchy of coverage metrics: All-Paths subsumes All-DU-Paths, which subsumes All-Uses; All-Uses subsumes both All-P-Uses and All-Defs; All-P-Uses subsumes All-Edges, which subsumes All-Nodes]
Definition
Given a program P, and a set V of variables in P, a slice on the variable set V at
statement n, written S(V,n), is the set of all statements in P that contribute to the values of
variables in V.
Listing the elements of a slice S(V, n) will be cumbersome, because the elements are
program statement fragments. Since it is much simpler to list fragment numbers in G(P), we
make the following trivial change (it keeps the set theory purists happy):
Definition
Given a program P, and a program graph G(P) in which statements and statement
fragments are numbered, and a set V of variables in P, the slice on the variable set V at
statement fragment n, written S(V,n), is the set of node numbers of all statement fragments in
P prior to n that contribute to the values of variables in V at statement fragment n.
The idea of slices is to separate a program into components that have some useful meaning.
First, we need to explain two parts of the definition. Here we mean “prior to” in the dynamic
sense, so a slice captures the execution time behavior of a program with respect to the
variable(s) in the slice.
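For straight-line code, a backward slice can be computed with a single reverse pass over a (line, defs, uses) table, chasing the variables the target value needs. A Python sketch on commission-program lines 24 through 27; restricting attention to these four statements is a simplifying assumption, since loops require full data flow analysis:

```python
# A backward static slice for straight-line code, computed from a
# (line, variables-defined, variables-used) table. The four statements are
# commission-program lines 24-27 (our transcription).
stmts = [
    (24, {"locksales"},   {"lockprice", "totallocks"}),
    (25, {"stocksales"},  {"stockprice", "totalstocks"}),
    (26, {"barrelsales"}, {"barrelprice", "totalbarrels"}),
    (27, {"sales"},       {"locksales", "stocksales", "barrelsales"}),
]

def backward_slice(variables, at_line):
    # One reverse pass suffices for straight-line code: when a statement
    # defines a needed variable, add it and mark its inputs as needed too.
    needed, in_slice = set(variables), set()
    for line, defs, uses in reversed(stmts):
        if line < at_line and defs & needed:
            in_slice.add(line)
            needed |= uses
    return in_slice

print(sorted(backward_slice({"sales"}, 28)))   # [24, 25, 26, 27]
```

This matches the intuition in the text: the value of sales at node 28 is contributed to by the computations of locksales, stocksales, and barrelsales and by the statement that sums them.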
Eventually, we will develop a lattice (a directed, acyclic graph) of slices, in which nodes are
slices, and edges correspond to the subset relationship.
Example
The commission problem is used in this book because it contains interesting data flow
properties, and these are not present in the Triangle problem (or in NextDate). Follow these
examples while looking at the source code for the commission problem that we used to
analyze in terms of define-use paths.
Slices on the locks variable show why it is potentially fault-prone. It has a P-use at node 14 and
a C-use at node 16, and has two definitions, the I-defs at nodes 13 and 19.
Think about slice S24 in terms of its "components", the slices on the C-use variables. We can
write S24 = S10 ∪ S13 ∪ S16 ∪ S21 ∪ S22 ∪ S23, where the values of the six C-use variables
at node 36 are defined by the six slices joined together by the union operation. Notice
how the formalism corresponds to our intuition: if the value of sales is wrong, we first look
at how it is computed, and if this is OK, we check how the components are computed.
Everything comes together (literally) with the slices on commission. There are six A-def
nodes for commission (corresponding to the six du-paths we identified earlier). Three
computations of commission are controlled by P-uses of sales in the IF, ELSE IF logic. This
yields three “paths” of slices that compute commission. (See Figure 3.4.)
S31: S(commission, 31) = {31}
S32: S(commission, 32) = {31, 32}
S33: S(commission, 33) = {7,8,9,10,11,12,13,14,15,16,17,18,19, 20, 24, 25, 26,
27,29,30,31,32,33}
S34: S(commission, 36) = {36}
S35: S(commission, 37) = {7,8,9,10,11,12,13,14,15,16,17,18,19, 20, 24, 25, 26,
27,36,37}
S36: S(commission, 39) = {7,8,9,10,11,12,13,14,15,16,17,18,19, 20, 24, 25, 26,
27,29,34,38,39}
Whichever computation is taken, all come together in the last slice.
S37: S(commission, 41) = {7,8,9,10,11,12,13,14,15,16,17,18,19, 20, 24, 25, 26,
27,29,30,31,32,33,34,35,36,37,38,39}
The slice information improves our insight. Look at the lattice in Figure 3.4; it is a directed
acyclic graph in which slices are nodes, and an edge represents the proper subset relationship.
[Figure 3.4: Lattice (subset ordering) of the slices on commission]
2. Make slices on one variable. The set V in slice S(V,n) can contain several variables,
and sometimes such slices are useful.
3. Make slices for all A-def nodes. When a variable is computed by an assignment
statement, a slice on the variable at that statement will include (portions of) all du-paths
of the variables used in the computation. Slice S({sales}, 36) is a good example of an
A-def slice.
4. Make slices for P-use nodes. When a variable is used in a predicate, the slice on that
variable at the decision statement shows how the predicate variable got its value. This
is very useful in decision-intensive programs like the Triangle program and NextDate.
5. Slices on non-P-use usage nodes aren’t very interesting. We discussed C-use slices
in point 2, where we saw they were very redundant with the A-def slice. Slices on O-
use variables can always be expressed as unions of slices on all the A-defs (and I-defs)
of the O-use variable. Slices on I-use and O-use variables are useful during debugging,
but if they are mandated for all testing, the test effort is dramatically increased.
6. Consider making slices compilable. Nothing in the definition of a slice requires that
the set of statements is compilable, but if we make this choice, it means that a set of
compiler directive and declarative statements is a subset of every slice. If we added this
same set of statements to all the slices we made for the commission program, our
lattices remain undisturbed, but each slice is separately compilable (and therefore
executable).
If there is a problem with commission at line 48, we can divide the program into two
parts, the computation of sales at line 34, and the computation of commission between
lines 35 and 48. If sales is OK at line 34, the problem must lie in the relative
complement; if not, the problem may be in either portion.
7. If you develop a lattice of slices, it's convenient to postulate a slice on the very first
statement. This way, the lattice of slices always terminates in one root node. Show
equal slices with a two-way arrow.