Software Engineering A Holistic View
Software Engineering A Holistic View
BLUM
J
Digitized by the Internet Archive
in 2019 with funding from
Kahle/Austin Foundation
https://archive.org/details/softwareengineer0003blum
Software Engineering
THE JOHNS HOPKINS UNIVERSITY
Applied Physics Laboratory Series in Science and Engineering
BRUCE I. BLUM
Applied Physics Laboratory
The Johns Hopkins University
1 3 5 7 9 8 642
of my Navy research support was used for the preparation of this text,
it certainly has contributed significantly to my understanding of software
engineering.
Many individuals helped me to bring this book to completion. I first
thank Carl Bostrom, the Director of the Laboratory, Don Williams, the
Director of the Research Center, and Vince Sigillito, my immediate
supervisor, for making my environment secure and stimulating. I next
acknowledge (with apology) the help of the students in my 1990 software
engineering classes. They tolerated the preliminary drafts, the inappro¬
priate assignments, and the lack of a complete text. I hope that their
education did not suffer excessively; I am certain that the book
benefitted from their feedback. I also was fortunate in having the
following friends and colleagues review sections of book. Listed
alphabetically, they are: Robert Arnold, Boris Beizer, Grady Booch,
David Card, Peter Coad, James Coolahan, Robert Glass, Mars Gralia,
David Gries, Watts Humphrey, Larrie Hutton, Michael Jackson, Harlan
Mills, Jochen Moehr, David Parnas, and Ralph Semmel. They made
many helpful comments and caught some embarrassing mistakes. I
appreciate their time and interest; clearly, I am responsible for the
errors and inaccuracies that remain.
Having written my first book with a fountain pen, I am proud to
state that this one I typed myself. Still, I required the help of many
others to get it into its final state. Joe Lew and his Graphics and
Composition Section prepared the artwork and responded promptly to
my many updates; Jane Scott got my all letters out on time; John Apel,
who is coordinating the APL series with Oxford, made my job much
easier; and Jeff Robbins, my editor at Oxford, Anita Lekhwani, his
editorial assistant, and the production staff at Oxford also were most
helpful and supportive. Thank you all.
The last paragraph belongs to Harriet. She showed patience when
I was compulsive, interest when I was boring, and belief when I told her
the book would be finished in “about one more month.” She adds
excitement to my life, and that helps me keep software engineering in
perspective. Once again, thanks.
B. I. B.
Laurel, Md.
June 1991
CONTENTS
PROLOGUE, 3
3. MODELING-IN-THE-LARGE, 181
4. MODELING-IN-THE-SMALL, 269
EPILOGUE, 503
Index, 579
Software Engineering
PROLOGUE
Software Engineering: A Holistic View is the third (and final) title for this
book. Yet another book on software engineering? This prologue
explains why I felt one more was required and why I had such trouble in
selecting a title. Tb begin with, I find most software engineering texts
less than satisfactory. They seem to start with a chapter that tells us
why software is important and how poorly we have performed historical¬
ly. Next, the text describes how to manage the software process, usually
with references to the famous waterfall flow (which is criticized). The
remaining chapters follow the waterfall development cycle.
These books make it easy to teach a software life-cycle model and
identify the required documents. The texts also identify some methods
and tools, but often the descriptions are not integrated. Many different
examples are used, and it is difficult to contrast methods. Of course,
one should not expect the students in a one-semester software engineer¬
ing course to learn more than a few methods well. Still, most students
come away with just an understanding of the software life cycle as it was
designed for large projects in the mid-1970s. It is a clearly stated
process that can be memorized and repeated. Unfortunately, the rest
remains fuzzy. My early students seemed to accept software engineering
methods as an adjunct to what they conceived to be their primary
activity: writing program code.
After using these texts for several years, I decided that an alternative
was necessary. The available books were based on a life-cycle orienta¬
tion intended for large, complex systems that involved both hardware
and software development. Most of my students, however, would work
in project teams of 40 or fewer. I was not sure that the large-scale
approach would scale down. I also had a second concern. Software
engineering involves the creation of problem solutions using software,
and code writing should be a relatively small part of that process. Yet
with small class projects, the effort devoted to coding became so great
that many students confused software engineering with good coding
practices. (In fact, there are at least two books with software engineer¬
ing in the title that are, in essence, language-specific programming
texts.)
Thus, the first of my titles was Software Engineering for Small and
Large Projects. Because most users of the book would work on small
(more than half-an-effort-year) or large (up to 100-effort-year) projects,
it seemed appropriate to focus on software engineering for activities of
4 SOFTWARE ENGINEERING: A HOLISTIC VIEW
By 1955 the architecture was understood, and the model that was to
dominate computing through the 1990s had been defined. In fact, this
may explain the phenomenal advances that we have seen in computer
hardware. For almost half a century we have been refining a basic
design and miniaturizing its components. The result has been reduced
costs, increased functionality, and the identification of new, but related,
applications, (e.g., networks and communications). By way of contrast,
progress is much slower in those areas in which we have poorer models
(such as with highly parallel computing).
The computer had emerged; now it was to be used. The first
applications were obvious. The machine was called a calculator, and the
person who used it was the computer. The immediate need was for a
calculator to aid the “computer” in performing repeated computations.
The initial outputs were numerical tables used for ballistic computations,
statistical evaluations, and so on. Soon it was recognized that the
computer (as it now had become known) could manage any symbolic
information, and development of tools that facilitated programming
followed. In the mid-1950s Fortran proved that high-level languages
(HLL) could produce efficient object code. By 1961 Algol, Cobol, and
Lisp were operational. Thus, by the second decade of digital computing,
all of the principal HLLs (other than Prolog) were in existence. Of
course, Algol has undergone numerous name changes; current versions
include Pascal and Ada.
The ENIAC, developed by Mauchly and Eckert in 1945, was the first implementa¬
tion of a large scale vacuum tube computer. It was developed in parallel with von
Neumann’s design efforts. An early modification to the ENIAC replaced its plugboard
memory with a stored program memory.
THE SOFTWARE PROCESS 13
A brief case history of one job done with a system seldom gives
a good measure of its usefulness, particularly when the selection
is made by the authors of the system. Nevertheless, here are the
facts about a rather simple but sizable job. The programmer
attended a one-day course on FORTRAN and spent some more
time referring to the manual. He then programmed the job in
four hours, using 47 FORTRAN statements. These were
compiled by the 704 in six minutes, producing about 1000
instructions. He ran the program and found the output
incorrect. He studied the output and was able to localize his
error in a FORTRAN statement he had written. He rewrote the
offending statement, recompiled, and found that the resulting
program was correct. He estimated that it might have taken
three days to code the job by hand, plus an unknown time to
debug it, and that no appreciable improvement in speed of
execution would have been achieved thereby. [Back57]
Here we see that the HLL allows the programmer to move from a
description of the computation to be performed by the computer to an
explanation of the calculation that is desired. That is, the 47-line
statement defines what is to be done whereas the 1000 instructions detail
how the calculation is to be carried out. The former is an expression of
the problem to be solved; it is more compact than and just as efficient
as its 1000-line expansion.
The 1950s also saw the commercialization of the computer. Eckert
and Mauchly formed their own company and produced the UNIVersal
Automatic Computer (UNIVAC I), which was installed in the Census
Bureau in 1951. Eventually, they were bought out by Remington Rand,
a company that had assembled an impressive array of technological
skills. It was IBM, however, that was able to recognize the industry
needs and ultimately dominate the field. Until 1959, when the transis¬
torized 7090 was released, their offerings were still vacuum-tube based.
Yet, after a somewhat shaky start, the 7090/7094 emerged as the most
common platform for large-scale computing. The success of their early
computers became obvious when, in 1964, IBM offered its third
generation: the 360 family. The new systems corrected the limitations
of the earlier generation. For example, character codes were expanded
to 8 rather than 6 bits thereby supporting both uppercase and lowercase
characters. These improvements meant that the keypunch machines,
tape drives, and some old programs had to be changed. In a very short
period of time, a new but obsolescent technology had entrenched itself,
14 SOFTWARE ENGINEERING: A HOLISTIC VIEW
and the users who had committed themselves to the early systems
objected to the changes. Consequently, the industry resolved to make
future hardware transitions transparent to the users. Thus, while there
is a fifth-generation computer, no computers are labeled fourth
generation.
This was the background leading to the 1968 Garmisch NATO
Science Committee conference on software engineering. Computers
were in widespread use, integrated circuits had made them more
powerful and smaller, the new operating systems supported man-machine
interaction, and the HLLs had shown the way to improved productivity
and reliability. Yet, progress was slower than one would have hoped.
Large projects were difficult to manage and coordinate. Many computer
professionals had learned their skills on the job, and few were aware of
the techniques that a century of engineering had refined. As a result, 50
people from a dozen countries were invited to participate in this first
conference to address the problems of building software applications.
The attendees came from the ranks of academia and industry; each was
involved in the application of computers in a large problem.
The Garmisch conference was a resounding success. A sense of
euphoria arose from the fact that so many people had a common view
of the problem and a belief that a solution was possible. A second
conference, to meet near Rome, was called for the following year. The
following is taken from the Rome report.
With this definition, few would admit that they did not apply software
engineering on their projects. Indeed, Bauer later remarked in 1972:
Here the emphasis shifts to a scientific knowledge base that will guide
in making and evaluating decisions. For systems software (Area 1),
Boehm believed that a scientific base, founded on logical formalisms and
refined empirically, already existed. Engineering discipline was desirable
in the context of a predictable process to be managed. By way of
contrast, Dijkstra approached the problem from the perspective of
computer science. His discipline of programming was mathematically
motivated. These are two different categories of discipline, two
approaches to the problem.
18 SOFTWARE ENGINEERING: A HOLISTIC VIEW
The conferences and journals are the domain of the first subculture.
This is how it should be; academicians are expected to identify solutions
to general problems that may not impact the state of practice for five or
more years. Much has trickled down, and the second subculture has
benefitted from this knowledge. Still, at any point in time, each
subculture will be facing different sets of problems, and often it is the
differences that stand out. The author is a researcher who hopes to
discover ways in which we can advance the state of the art. But this is
a book for practitioners; it builds on what experience has shown us.
Only in a very limited way does it report on experiments or speculate
about what the future may bring.
Freeman suggests an alternative way of dividing the two cultures:
according to their computer science orientation. He points out that
The basic difference between the scientist and the engineer lies
in their goals. A scientist strives to gain new knowledge about
THE SOFTWARE PROCESS 19
9
In the Freeman quotation there is a reference to “information systems.” It is
common in the United States to equate this term with management information systems
and commercial database applications. This is a very narrow view. In much of the world
information technology (IT) includes electronicsystems, telecommunications, computing,
artificial intelligence, etc. [MaBu87, p. 2] In that sense, all computer applications are
indeed information systems.
20 SOFTWARE ENGINEERING: A HOLISTIC VIEW
Although I use the term software process, many still refer to it as the
software life cycle. Frequently, what they mean is a sequence of phases
in the development and use of a software product that is described with
the “waterfall model.” This section discusses that waterfall model and
identifies its strengths and weaknesses. The reader should be aware that
many in the software engineering community feel that this model has
been discredited, whereas others believe it is a basic extension of the
problem-solving paradigm.4 In any case, it is currently in widespread
use, and most of our empirical data have been derived from projects
In mathematics, functions map from a set (called its domain) onto a set (called its
range). Using this concept, the term application domain denotes the "field of use."
Examples of application domains are business systems, embedded military systems, and
operating systems. Domain is used without a qualifier when the context is obvious.
VALIDATION
SOFTWARE
REQUIREMENTS
PRELIMINARY
DESIGN
VALIDATION
DETAILED
FZ DESIGN <-
CODE AND
BEBUG
DEVELOPMENT ,
TEST / TEST AND
—> PREOPERATIONS
VALIDATION TEST/
■ >
OPERATIONS
I AND
MAINTENANCE
REVALIDATION
Hardware Software
5Some diagrams display the parallel development of the hardware and software plus
the integration that produces a final system. The figures in this book show only the
software-related activities.
28 SOFTWARE ENGINEERING: A HOLISTIC VIEW
CD
O
c
Q)
TD
C
o
CL
CO
CD
O
O
essential software process model, and I assert that all software process
models are specializations of it.6
Like the previous model, the essential software process consists of
three transformations, which when composed represent a transformation
from a need to a software solution. In this case the three transforma¬
tions are as follows.
Relating this back to the model shown in Figure 1.4, here the first two
transformations are concerned with the definition of the problem
statement and the third represents the process of detailing the imple¬
mentation statement; Figure 1.5 does not show the activity of converting
a set of programs into a system.
What new insights does this essential model provide? First, it points
out that the process takes place in (at least) two different domains. The
development team must know about the domain in which the application
will be used; this is the basis for the validation decisions. Obviously, the
team also must understand the software environment and tools that they
are to use in the implementation. Computer science teaches only the
second domain, and many applications are in that domain, (e.g., software
tools, operating systems). But application domain experience is equally
as important. Because there are so many application domains, most of
that domain knowledge is acquired on the job. In fact, this is why
software development companies are asked to detail their previous
experience when bidding on a contract. An accounting system is
relatively easy for a company that has built many such systems but quite
difficult for a company that has built only compilers.
Next, it is clear that two kinds of modeling technique are required.
Conceptual modeling relies on the formalisms of the application domain
6In this sense, Figure 1.5 presents a metamodel of the software process, i.e., a model
of the operational software process models.
THE SOFTWARE PROCESS 35
7As mentioned earlier, I have been careful to avoid the word correct. Correctness
is a property that must be with respect to some agreed statement. For example, a
program is correct with respect to its design specification. If a design specification has an
error, then a correct program will have that error. Validity, on the other hand, relates to
the ability of the software product to satisfy the application need. I have been using the
word “proper” to suggest that the design or model is both valid (i.e., the specification
establishes exactly what is desired) and correct (i.e., the product is error free with respect
to its specification). A proper design should result in an error-free product. Perhaps I
should not have left this point to a footnote, but I will make it again.
36 SOFTWARE ENGINEERING: A HOLISTIC VIEW
There are three basic approaches to revising the software process model.
The first is to adjust the management of the waterfall model so that it
will be less sensitive to the need for a complete requirements specifica¬
tion at the start of the project, the second is to integrate software
experimentation with the process, and the third is to extend the scope
of formalism. The three approaches are not mutually exclusive. Each
is intended to reduce uncertainty regarding the application need and the
associated solution, improve the reliability of the product by the early
elimination of errors, and provide tools that can take advantage of
software’s special properties. As we shall see, most modifications to the
waterfall model tend to move it away from the architecture metaphor
and closer to the sculpture metaphor.
All process models recognize the fact that small problems are easier
to solve than large ones. In the waterfall model the large project is
broken into smaller problems by decomposition. The specification is
subdivided into components, which in turn are subdivided into modules,
and so on. The result is a set of program units that must be designed,
implemented, and then integrated. An alternative partitioning of the
system divides the requirements specification into a family of layered
specifications. Each higher level implementation, usually called a build,
depends on the functions provided by earlier (or lower level) implemen¬
tations. The first builds to be constructed are those that are the best
understood or will provide the most service for subsequent builds.
Builds scheduled for later integration are only sketchily defined until
they are ready for implementation. We call this incremental or
evolutionary development [MiLD80, Gilb88j.
There are many advantages to this incremental approach. First,
because a larger problem is decomposed into a sequence of smaller
problems, each implementation task will be relatively small. There are
many short schedules, which makes progress easy to track. Construction
of a build normally is managed using a waterfall model. However,
because there are many small deliverables being produced on a
semimonthly schedule, there can be rapid feedback for the evaluation
process. Team members will not be faced with an extended period of
speculation about the utility and completeness of their designs; there
will be a sense of immediacy to their actions.
A second advantage of incremental design is that the development
team has an opportunity to learn about the problem as the design
progresses. The process may be characterized as one of doing what is
understood best, learning from that activity, identifying the next build,
and then iterating. Historically, this is how information systems were
developed to assist in patient care [Lind79j. In the beginning, the needs
were not well understood because the designers lacked domain experi-
THE SOFTWARE PROCESS 37
the point of all this ... is that questions of timing, storage, etc.
which are otherwise matters of judgment, can now be studied with
precision. Without the simulation the project manager is at the
mercy of human judgment. With the simulation he can at least
perform experimental tests of some key hypotheses and scope down
what remains for human judgment, which in the area of computer
program design (as in the estimation of takeoff gross weight, costs
to complete, or the daily double) is invariably and seriously
optimistic. [Royc70, p. 7]
Both references recognize that when the systems are new, the first
implementations are contaminated by artifacts of the learning process.
Either they are not as good as the increased understanding would allow
them to be, or they contain a residue of false starts and rejected
approaches. In either event, the programs do not offer an effective base
for a robust and maintainable system. Thus, the suggestion that the first
system should be treated as a learning experience. Of course, this simply
reflects good engineering practice. Before building a missile, for
example, there is extensive analysis, computer simulations are run,
individual units are constructed and tested, prototypes are built and
evaluated, and then—and only then—does the missile go into production.
The building of two generations of a software product may seem like a
very expensive learning technique, but prototypes allow us to increase
our understanding selectively in a more targeted and less expensive
manner.
Although rapid prototyping was a topic of discussion in the 1970s,
it received renewed attention in 1981 when Gomaa and Scott presented
their experience at the International Conference on Software Engineer¬
ing [GoSc81], Their goal was to specify the user requirements for a
THE SOFTWARE PROCESS 39
Cumulative
cost
Progress
through
steps
Evaluate alternatives,
Determine identify, resolve risks
objectives,
alternatives,
constraints.
Review
Develop, verify
next-level product
are of the correctness of the object program with respect to the source
program.
At present, most formal methods are research topics. Nevertheless,
some techniques have been used successfully on commercial products.
For example, there is over 10 years of operational experience with the
Vienna Development Method (VDM), and both quality and productivity
are quite good with this method [Jone80, C0HJ86]. Many other formal
methods (some supported by automated environments) are emerging.
Although these tools still have limited acceptance, it seems reasonably
certain that many will evolve into the system-level HLLs that some of
the NATO conference organizers envisioned. Although I provide a
simple illustration of VDM in Chapter 2, this is essentially a book for
today’s practitioners. Consequently, I concentrate on the methods and
tools used most widely. In time, I am certain, software developers will
come to rely on formal methods. For now, however, I shall limit my
discussion to the concept of a prototype in a formal system.
T\vo types of prototype already have been defined: the throwaway
prototype and the evolutionary knowledge-based prototype. The
availability of formal specifications enables the use of a third class of
prototype: the executable specification. A specification differs from a
program in that the specification defines the behavior for all implemen¬
tations while the program defines an efficient implementation for a
particular target operating system and machine. If the specification is
formal and if there is a translator such that the specification is opera¬
tional (i.e., it can be used to determine if the specification defines the
desired behavior for arbitrary test cases), then this operational specifica¬
tion can be viewed as a prototype of the intended system [Zave84]. Of
course, the operational specification does not offer the efficiency of the
end product, but it does provide a test bed for experimenting with the
statement of requirements.
Balzer points out that implementing a system mixes two very difficult
tasks: describing what the software is to do (i.e., its behavior) and
optimizing the performance to produce an acceptable implementation
[Balz85], The operational approach allows the developers to separate
these two concerns. The goal of this type of development can be
thought of as using a CAD/CAM system to deliver a design that is
transmitted automatically to tooling machines and robots that produce
finished products. We do not yet have enough knowledge to do this
with either software or hardware manufacture, but it is a worthwhile
objective and one that will soon be feasible for specialized domains.
When such tools become available, the role of the computer professional
(as well as the design engineer) will change. There will be less concern
for how to construct a solution (after all, much of that process will be
automated) and more regard for what the product is to do and how it
will be used.
42 SOFTWARE ENGINEERING: A HOLISTIC VIEW
of methods for problem solving. In this sense, the theme of this book
may be characterized as a problem-solving approach to software
engineering. What is the application need to be satisfied by a software
product? How do we design a product that meets that need? How do
we detail the design to meet the desired performance characteristics?
Each question is a response to an identified problem, and we rely on
software engineering to provide the methods and tools for arriving at a
proper solution to each. The problems bridge several domains; they
begin in the application domain and evolve to more detailed problems
in the implementation domain.
Now, if I intend to present software engineering from the perspec¬
tive of problem solving, then I should examine how software engineers
solve problems. I do this at three levels. First, I present an overview
of how people solve problems. One may view the next section as
interesting psychology but irrelevant software engineering; nevertheless,
it is important for problem solvers to understand how the human
information processor works and what its strengths and weaknesses are.
We should be sensitive to the kinds of mistakes that humans are prone
to make and how they may be avoided. Following the section on
problem solving there is one on modeling concepts. Again, the
discussion may be rejected as irrelevant—this time for being too
philosophical. The point made by the discussion is that we use models
to guide use in certain types of problem solving. The presentation
identifies the power and limitation of our formal models. In a sense,
not only am I telling readers that there is no cookbook, but I am also
telling them that it is impossible to write one. The final unit in this
section on software engineering as problem solving examines how
software tools and environments can improve the process. The bottom
line is that software design is a human, conceptual activity, and no silver
bullet can solve all our problems. Once we accept these constraints as
our “software engineering solution space,” we can then consider how to
produce software that is valid, correct, and delivered on schedule and
within cost.
There are several ways to review how people solve problems. First, I
might examine the psychology literature of that topic to summarize what
has been learned. For example, consider Dunker’s candle problem,
which has been studied for over 50 years. The subject is given a candle,
a book of matches, and a box of tacks and is instructed to attach the
candle to a wooden door so that there will be light for reading. Think
about solving this problem. Most people try to find some way of
embedding the tack into the candle so that the candle can be tacked to
44 SOFTWARE ENGINEERING: A HOLISTIC VIEW
the door. As it turns out, the “best” (but not the most obvious)
solution is to tack the box to the door and use the box as a holder for
the candle. Not only does this solution hold the candle so that damage
to the door is minimal, but it also catches the dripping wax. The
resolution of this problem involves the use of materials in unintended
ways. Interesting, but how does this relate to the kind of problems that
a software engineer encounters? Let me continue.
A second approach to problem solving is to teach a general method
or a heuristic. Bransford and Stein call their method the IDEAL
problem solver [BrSt84]. Here the acronym stands for the following
sequence of activities:
It can be seen that their method is not very different from the sequence:
o
A heuristic is an informal rule of thumb that generally provides a good solution. By
way of contrast, an algorithm is precise and always arrives at a solution for the domain in
which it is valid. When dealing with uncertainty, algorithms are not available, and we must
fall back to heuristics. Normally, as understanding grows, heuristics are replaced by
algorithms. In a sense, one may think of a heuristic as a formalization of intuition. Of
course, intuition matures with experience, and we replace trial-by-error approaches with
proven methods. An obvious goal of education is to make the student aware of those
methods so that they do not use heuristics when algorithms are available.
THE SOFTWARE PROCESS 45
9The value 7 ± 2 is taken from the title of Miller’s seminal paper, “The magical
number seven plus or minus two: Some limits on our capacity for processing information”
[Mill56],
LONG-TERM MEMORY
ITM
MLTM = x.
kltm = Semantic
WORKING MEMORY
Sensory information flows into Working Memory through the Perceptual Processor.
Working Memory consists of activated chunks in Long-Term Memory. The basic
principle of operation of the Model Human Processor is the Recognize-Act Cycle of
the Cognitive Processor (PO in Figure 2.2). The Motor Processor is set in motion
through activation of chunks in Working Memory.
47
48 SOFTWARE ENGINEERING: A HOLISTIC VIEW
CBSIBMRCA
The process is iterative. It may result in a decision that more data are
required (i.e., that a test should be ordered), that a probable diagnosis
(and associated therapy) is indicated, or both.
In analyzing the model, researchers found that the generation of
early hypotheses had considerable natural force; medical students
generated early hypotheses even when asked to withhold judgment. The
number of hypotheses was usually around four or five and appeared to
have an upper bound of six or seven. The generation of hypotheses was
based more consistently on single salient cues rather than on combina¬
tions of clues. Very few cues seemed to be used, and hypothesis
selection was biased by recent experience. Also, cue interpretation
tended to use three measures: confirm, disconfirm, and noncontributory;
the use of a seven-point scale had no greater explanatory power.
Finally, researchers noted that lack of thoroughness was not as
important a cause of error in diagnosis as were problems in integrating
and combining information. Interpreting these results in the context of
the information-processing model already presented, one might observe
that simple pattern matches are preferred, that the magical number 7 ±
2 holds, and that retrieval from LTM in response to a problem is quite
spontaneous.
In an unrelated study, McDonald tried to measure the impact of
reminders on physician behavior [McDo76]. Here, the physicians were
assisted by a computer system that reviewed a patient’s history and made
recommendations for the current visit. For example, if the patient was
a woman who had not had a Pap smear in the last 12 months, then the
program would recommend that one be ordered. Previous studies had
shown that these reminders affected the physician’s behavior. McDonald
now wanted to learn something about the long-term effect of this type
of system. He designed a crossover study divided into two phases. In
the first phase, half the clinicians received the reminders and the other
half worked without them. During the second phase, the reminder
group worked without the reminders and vice versa. An analysis of the
physicians’ actions again indicated positive changes whenever they had
the reminders. McDonald also expected to find a learning effect; he
thought that the group that first used the reminders would better
conform to the standards of care after the reminders were withdrawn.
52 SOFTWARE ENGINEERING: A HOLISTIC VIEW
But this was not the case. Neither order in the experiment nor the
physicians’ experience affected their behavior. The only factor that
seemed to impact behavior was the presence of a reminder. McDonald
concluded that the physicians were confronted by too many standards of
care and, consequently, suffered from information overload. He titled
his paper, “Computer reminders, the quality of care, and the nonperfect-
ability of man.”
Fortunately, the software engineer seldom has to make decisions
with the time pressures found in medical situations. Accuracy and
quality are far more important than speed to solution; one can indulge
in extensive creative worrying before being satisfied that the proposed
solution is sound. This, then, raises another question. If we think about
a problem long enough and hard enough, will we be able to solve it? It
would be nice to postulate that, when a problem is solvable, we always
use rational means to solve it. Unfortunately, papers with titles such as
suggest that this may not be possible. This raises the issue of bias in
human reasoning [Evan89]. How rational are we when we approach a
problem?
There have been many studies of how people interpret syllogisms.
In the 1920s, Wilkins found that people were better able to evaluate the
validity of concrete arguments, such as
Subsequent research has shown that the use of quantifiers (some or all)
or the introduction of ambiguity into the syllogism adds to the difficulty
in comprehending argument validity. For example,
has logical validity but lacks factual truth. Many subjects allow their
evaluation of the factual truth to bias their interpretation of the logical
validity, they examine the problem in the context of their mental model
rather than as a logical abstraction. (See [Gilh88], from which the last
syllogism was taken, for a more complete discussion of this and other
classic experiments in directed thinking.)
Another type of reasoning bias was uncovered by Wason in the mid-
1960s. In the selection task, a subject is given a deck of cards with
letters on one side and numbers on the other. The rule is that if there
is an A on one side of a card, then there is a 3 on the other side. The
following cards are shown:
A D 3 7
Which cards must be turned over to see if the rule is true or false?
We must turn over the A card to see if a 3 is on the other side and
the 7 card to see if an A is on the other side. Most people select the A
card for turning to see if the cards are consistent with the rule:
One side A => Other side 3, [i.e., (p a (p -> q)) => q, or modus
ponens].10
Most people do not select the 7 card even though it is required to test
the complementary rule
One side not 3 => Other side not A, [i.e., (--q a (p -> q)) => --p, or
modus tollens].
1(^A philosophical footnote. One cannot prove that for every card with an A on one
side there is a 3 on the other unless every A card can be inspected. However, just one A
card with something other than a 3 on the other side proves the rule to be false. This is
the source of the expression “the exception that proves the rule.” (It does not mean, as
it is commonly used, that finding a counterexample adds to our confidence in the rule’s
correctness.) To take this a step further, much of science is empirical in that one begins
with a hypothesis, and then collects data to confirm that hypothesis empirically. Popper
pointed out that, unless all possible cases could be tested, this approach proved nothing.
Proof, in the logical sense, could only refute the hypothesis—not support it [Popp59],
Thus, he suggested that experiments should be defined to falsify the hypothesis so that if
they failed we would have greater confidence in an unprovable hypothesis. It was over a
decade before Popper published his observations; he assumed that it was so obvious that
everyone recognized this fact. Now read on about confirmation bias and the hesitancy to
refute favored hypotheses.
54 SOFTWARE ENGINEERING: A HOLISTIC VIEW
8 x 7 x 6 x 5 x 4 x 3 x 2 x 1,
1 x 2 x 3 x 4 x 5 x 6 x 7 x 8.
N N
Diego to Reno) from local facts (e.g., Nevada borders on the east of
California), and they were wrong. The conclusion that I would like to
leave with the reader is that images (and diagrams) can be of enormous
help in problem solving and visualizing potential solutions. Neverthe¬
less, such devices are not free of bias.
It is now time to summarize what has just been presented in the
context of software engineering. Starting with the physical characteris¬
tics, we note that the human information processor has a small working
memory capacity. The working (or short-term) memory processes
symbolic chunks. Its contents decay rapidly, and they must be either
rehearsed or activated from long-term memory. (Like an airplane in
flight, it can never stop.) Activation is associative and is guided by the
current context (goal). The hypothetico-deductive model indicates that
THE SOFTWARE PROCESS 57
The axiomatic method says that our theorems are true if our
axioms are. The modeling method says that our theorems model
facts in the domain modeled if there is a close enough fit
between the model and the domain modeled. . . . [A]s a
philosopher might say, applied mathematics may not guarantee
knowledge of facts about the physical world, but it can lead to
the next best thing—justified true belief. [Barw89, p. 847]
That is, if we accept the validity of our axioms, then we can have
confidence in all that follows (i.e., justified true belief). But we can
have confidence in these axioms only from experimental, empirical
evidence. Barwise defines the Fallacy of Identification as the failure to
distinguish between some mathematical model and the thing it is a
model of.
So, what does that have to do with software engineering? Every¬
thing. When I defined the essence of the software process, I described
it as a three step process:
nSome computer scientists have developed representation schemes that are broad
enough to support formal modeling from the preliminary (and incomplete) initial
specification through the final implementation. These are called wide spectrum languages.
THE SOFTWARE PROCESS 61
Abstraction Reification
Domain
Objective Experience & Verification
Formalisms
Requirements
Subjective Analysis Validation
Methods
In short, one learns by doing and then reacting to the observed mistakes.
Fortunately, there are methods and tools that can help the software
engineer in “creative” problem solving. Most are designed to structure
what is known in the form of a model that (1) offers a clear expression
of what has been agreed to, (2) aids in the identification of what remains
unknown, and/or (3) facilitates the recognition of contradictions. Some
of the models will be descriptive, and others prescriptive. The final
model, (i.e., the delivered program), is always prescriptive, and thus the
software engineer is, in essence, a builder of models.
1?
Methodology is the study of methods. Because the word is impressive, some use
it instead of the more accurate word, method.
THE SOFTWARE PROCESS 65
Process descriptions
Tools
Support environment
four-level structure that implies that methods and practices are designed
to support a particular process description, and that tools are designed
to carry out the tasks defined by methods and practices. The lowest
level in this hierarchy is the environment, which is defined as an
integrated collection of tools that supports the overriding process
description (or model). There are several contradictions in this
hierarchy. The most glaring of these is that it is not a hierarchy; the
same tool can be used with many methods, and the same methods can
be used with many process descriptions. Nevertheless, the diagram does
offer an effective definition for an environment, and it does display the
relationships among process models, methods, and tools.
In what follows, I assume that the process model will be some
variation of the waterfall model. This is not an endorsement; rather, it
is a recognition that virtually all commercial development conforms to
that model. Because this book is intended for practitioners, it would be
foolhardy to expend much time on alternative paradigms that are not
widely used. The discussion of the methods and practices is presented
in the form of approaches to modeling. A quick review of the table of
contents shows that the material is grouped into the following three
types of activity:
■ Complexity. Software is more complex for its size than any other
human construct. Because no two parts are alike, software
systems contain more details than most manufactured systems in
which repeated elements abound.
role of the human problem solver (i.e., the designer) is central and that
the hope for purely technological solutions is limited.
13
For an interesting and very readable examination of scientific change see Kuhn,
The Structure of Scientific Revolution [Kuhn70], Kuhn calls the set of perceptions that
guides the understanding in some scientific field a paradigm. All problems are interpreted
in the context of this paradigm. When the paradigm fails to explain or predict observed
events, there is a paradigm shift. Older scientists, whose perceptions are rooted in the old
paradigm, resist this change. Thus, in the sense of this section, refuted scientific
perceptions are viewed as myths.
THE SOFTWARE PROCESS 69
This list is reproduced in its entirety from a 1984 paper in the IEEE
Transactions on Software Engineering. The data are correct, but we can
gain a better understanding of the problem by returning to the source.
The GAO report reviewed contracts for the development of custom-
built business and administrative systems. The projects studied
represented the state of the practice in the mid-1970s. The analysis was
divided into two activities. First, there was a pair of surveys, one of 163
contractors, and the other of 113 Federal data processing personnel with
contracting experience. The first survey was never printed, but the
results of the second survey are shown in Figure 1.11. These data
appear to contradict the results just presented.
The second task, which led to the report’s most dramatic findings,
was an analysis of nine software contracts. These projects were brought
to the reviewers’ attention because they were problem cases; in fact,
some were the subject of extended litigation. One small contract was
included as an exemplar of good practice. In all, the nine contracts had
a total value of $6.8 million. The value of the exemplar contract was
$119,800. The distributions were computed on the basis of contract cost.
Consequently, what might have been reported as “one contract out of
nine” became even more striking as “less than 2% of the software
contracted for was usable as delivered.”
Figure 1.12 summarizes the causes for failure in the eight contracts.
(Case 5 obviously was the exemplar.) Notice that only the last item in
the list involves technical issues. In fact, the primary causes for failure
were an inability to: identify what was needed, specify a valid product,
select an appropriate contracting mechanism, and/or control the process.
There is little evidence to suggest that the projects failed because of
shoddy workmanship. Of course, in a chaotic environment, poor
performance usually follows.
Software development has dollar overrun 21.2 29.2 25.7 9.7 6.2 8.0
Software development has calendar overrun 30.1 31.9 25.7 8.0 1.8 2.7
The delivered software must be corrected
or modified by in-house programmers 8.8 34.5 35.4 13.3 6.2 1.8
before it is usable
The software is paid for but never used * 3.6 16 1 57.1 20.5 2.7
The delivered software is difficult to modify 5.3 37.2 38.1 115 4,4 3.5
The contractor's programming practices are
such that the software is easily understood 14.2 62 8 15.0 6.2 ★ 1.8
by agency programs
Case Number
Cause
1 2 3 4 5 6 7 8 9
Having just invited skepticism, I now report on what we know about the
software process. I rely on reports from a variety of projects (ranging
from commercial applications to embedded systems, from assembly
language to high-level language programs) collected over several
decades, and interpreted with an assortment of analytic methods.
Naturally, this diversity ensures imprecision. The objective is to present
broadly held beliefs that are founded on empirical evidence. Gross
measures will suffice to characterize the software process. In Chapter
6 I return to this topic with an examination of how these measures may
be refined to aid managers improve software quality, productivity, and
predictability. For that goal, more refined measures are necessary.
I begin with the classic distribution of cost trends shown in Figure
E13. This figure is taken from Boehm’s 1976 paper, and it shows both
the relative decrease in hardware costs and an increasing commitment
to software maintenance. It should be pointed out that the software
costs include the software (and documentation) associated with the
hardware; that is, the figure does not necessarily represent a ratio for
new developments. Nevertheless, it is clear that equipment cost is now
a secondary factor, and that about half the software activity is devoted
to maintaining systems already in operation. Surveys typically report
that, in an internal data processing organization, less than half the staff
are available for new projects; the rest are engaged in system support
and software maintenance. This results in backlogs of two or more years
from problem identification to solution delivery. Indeed, Martin talks
of an invisible backlog because, when faced with such delays, clients see
no purpose in making a request [Mart82].
If, as the figure demonstrates, software is the critical commodity in
automated systems, it does not follow that most of this investment is in
programming. In project planning, it is common to allocate effort
according to a 40-20-40 rule. Forty percent of the effort is expended on
analysis and design, 20% on coding and debugging, and 40% on testing.
Boehm uses a 40-30-30 distribution with his cost projection model,
where the first 30% is for programming and testing and the second 30%
is considered integration [Boeh81], All of these rules are based on the
analysis of effort distributions from many projects. It is clear that more
effort is devoted to deciding what the software is to do, how it is to do
it, and determining what is done than is expended on the programming.
THE SOFTWARE PROCESS 73
test test
< 4K 1.5
4K - 16K 2.5
16K - 64K 4
64K - 512K 6
512K > 14
The reason for this is that the added staff requires training, which just
adds to the burden of the already overworked, behind-schedule staff.
There are several ways for projects to improve their performance
efficiency (which can be translated into smaller, more effective groups).
The proper use of methods eliminates unnecessary work and avoids the
errors that lead to rework. There also is a broad consensus that the
number of program lines produced per unit of time is independent of
the language used [Broo75, WaGa82, Blum90]. Consequently, the use
of compact representations will improve productivity. As will be shown
in the following chapters, this principle can be applied with higher level
languages, encapsulation, and reuse. Here I limit the discussion to the
effects associated with the choice of programming language. For
example, recall that the Backus FORTRAN example reduced a 1000 line
assembly language project to one of only 47 lines; what might have taken
days was reduced to an afternoon’s work.
There are some formal measures for the expressiveness of languages,
such as the language level of software science [Hals77]. It will suffice,
however, to provide an illustration of the phenomenon without a more
general discussion. In the late 1970s, Albrecht introduced a method for
estimating the size of a project according to the amount of function that
the resulting product would deliver; he called its unit of measure the
function point [Albr79j. Albrecht and Gaffney then analyzed several
projects and showed that productivity (as measured by the number of
lines of code required to produce one unit of functionality) was related
to the language used [AlGa83]. They reported the following:
COBOL 110
PL/1 65
DMS/VS 25
Thus, with more expressive languages, fewer lines of code were required
to program the equivalent functionality, and effort was reduced.
Another factor affecting productivity is the experience and capability
of the development staff. Much of the thrust of the previous section is
that software engineering involves problem solving, and that the more
expert the software engineers, the better the solutions. Do the data
support this view? There have been studies of individual differences
going back to the late 1960s. In one small, but frequently cited study,
it was found that there were performance differences of up to 28:1
[SaEG68]. A later study by Curtis found differences on the order of
THE SOFTWARE PROCESS 77
23:1 [Curt81]. Boehm analyzed the factors that effect software project
costs and classified them as the set of cost drivers shown in Figure 1.15.
This chart shows that the element with the largest range is “person¬
nel/team capability” (i.e., the skills, experience, and talent of the
personnel assigned to the project). Again, this confirms that there are
great differences among individuals. Fortunately, some of my data
suggest that much of this difference reflects a training deficit rather than
an inherent, uncorrectable difference [Blum90]. Thus, Brooks’s advice
to grow great designers (and implemented) implies a productivity bonus.
The initial design, however, represents only a small part of the total
effort devoted to software development. Figure 1.13 showed that half
the software cost is for maintenance. Estimates for the proportion of
the life-cycle cost devoted to maintenance range from 50 to 80%. In any
78 SOFTWARE ENGINEERING: A HOLISTIC VIEW
case, it is broadly recognized that this is the most expensive, and longest
living part of the process. Tb gain an understanding of the problem,
Lientz and Swanson surveyed 1000 Data Processing Management
Association (DPMA) members who considered themselves “managers.”
They received 486 17-page returns, which led them to the following
categorization of maintenance [LiSw80]:
Knowing that this is possible changes our mindset. After all, if bridge
engineers can deliver bridges that require no postconstruction testing,
it is reasonable for us, as software engineers, to expect similar standards.
14For example, see the final chapter in TEDIUM and the Software Process [Blum90].
THE SOFTWARE PROCESS 81
Stan has completed his analysis and is about to plunge headlong into the
hypothesis testing stage. There are some uncontrovertible facts that
support his theory, and he shall receive immediate feedback.
Fortunately, cartoon characters always survive their mistakes, and Stan’s
lesson will be that more analysis is required.
It usually takes longer for software engineers to get feedback, and
often their projects cannot survive major failures. So let us learn from
Stan and finish our analysis before we rush to code our solutions.
Requirements analysis is a very difficult task. It involves
understanding a problem in some application domain and then defining
a solution that can be implemented with software. The methods that we
use will depend on the kind of problem that we are solving and our
experience with this type of problem.
Experience is often the best teacher, and prototypes provide an
opportunity to learn. Of course, the prototypes will reflect the errors
of our way, and they should be discarded. As Brooks has told us, “Plan
to throw one away, you will anyway.” Ask Stan.
There also are ways to learn without doing. In what follows I show
how to model what we know about the data, the processes that
transform the data, and the underlying formalisms that specify the target
system. There are many methods, and few will be used in any one
project.
As a result of this analysis, we will come to understand the problem
we are to resolve and the solution that we intend to implement. We will
document that solution in a software requirements specification, or SRS.
Naturally, the SRS will not detail everything about the intended product;
it will define only its essential properties. Not included in the SRS are
the implicit requirements (necessary for all applications of this class)
and the derived requirements (expressed as design responses to the
essential requirements).
Once we have established what we want, we can proceed to the next
level of modeling and define how to implement it. Unfortunately,
separating this what from the how is not as simple as it may seem, and
we will continue doing requirements analysis and modeling through the
end of Chapter 4.
84 SOFTWARE ENGINEERING: A HOLISTIC VIEW
specification; we know that two pieces cannot occupy the same space at
the same time and that the queen cannot move like a knight. This is
true for every program, on every machine, in every language, in every
operating environment. But the requirement to respond in 30 seconds
is quite different. A particular program in a particular computer with
a controlled number of users might satisfy this requirement. Yet, when
another person logs onto that computer, the response time may degrade
so that it fails to meet the specification. The queens still move like
queens, but the moves may take longer.
Thus we see that there are two types of requirements:
Notice that I just introduced two new items in the above list of
functional requirements: use of a bitmapped display and maintenance of
a move log. Not all chess-playing programs need to satisfy these two
additional requirements, but a particular developer may find these two
requirements essential for any program he builds. The specification
augmented by these two functional requirements still describes the
behaviors for all programs (independent of their implementation
characteristics or operating environment). These behaviors can be
verified as logical properties of all implementations, and that is why I
referred to them as a predicate earlier. The nonfunctional requirements,
on the other hand, are a property of the implementation, and they
cannot be described logically. In the context of the specification, they
represent descriptive information that can be verified only after the
product is complete.
Thus, the description of what the product is to do can be divided
into two categories: the functional requirements define behaviors that
may be logically tested throughout the development process, and the
nonfunctional requirements represent operating characteristics of the
product that can be demonstrated only after the product is available.
Our challenge as software engineers, therefore, is (1) to identify the
functional requirements clearly and verify that no decisions violate them
and (2) to document the nonfunctional requirements and choose design
alternatives that have the highest probability of being able to meet them.
In summary, we see that there are some attributes of the desired
system that we cannot describe logically. That is, our definition of what
88 SOFTWARE ENGINEERING: A HOLISTIC VIEW
In the previous section I spoke of the danger that our designs may
reflect what we know how to build rather than what the sponsor needs.
The section on human problem solving in the previous chapter suggested
that we approach a problem with a particular mental model, that we
organize all new information to be consistent with that model
(frequently constructing naive theories), and that our reasoning
mechanisms are imperfect and biased. I also referenced Simon’s
observation that “even the most talented people require approximately
a decade to reach top professional proficiency” [Simo69, p. 108]. Thus,
unless there is considerable experience with a particular class of
of prototype is that the user will read too much into his
interactions, and the delivered product will not match his
expectations.
This list is far from complete. Throughout the text I shall suggest
problems that can benefit from experience with prototypes. The point
that I wish to make here, however, is simply that prototyping is not a
new and foolproof way of developing systems; rather it is as old as
exploratory experimentation and computer simulation.
I have frequently made the point that the software process involves the
transition from a need to a product that satisfies that need, and that one
cannot apply the process without an understanding of the need being
addressed. That is, because software engineering is a problem-solving
activity, one must know something about the problem being solved to
study the process. For this reason I shall use a common case study for
all the examples in this book. The idea is that if the reader understands
the problem being solved, it will be much easier to follow the methods
used to produce the solution. Moreover, by choosing a problem from
the domain of software engineering, I expect the reader to become
[Figure: Software Incident Report form (lower portion), to be completed by the CM manager — who received the report and when, the corrective action disposition (NAR, DTF, SCN number, or CR number), and the documents affected.]
In general, all SIRs not classified as no action required (NAR) are acted
on in an expeditious manner. Proposals from SIRs and CRs deemed
worthy of further consideration are documented as an engineering change
proposal (ECP), which is ultimately accepted or rejected by the project
sponsor. If an ECP is approved, then the affected CIs are copied from
the baseline for modification. The baseline remains unchanged until the
approved changes are made, the amended CIs have been audited, and the
revisions are available for baseline update. Naturally, the baseline will
retain both the version prior to the update and the new version. (Both
versions may be in operational use; alternatively, errors may be found in
the changed baseline that require the reversion to an earlier baseline.)
[Figure: Change Request form (excerpt) — product name, request number, submitter telephone and e-mail, and a comments section to be completed by the CM manager.]
■ list. This produces various lists from the commentary in the code
management system, for example, a summary of change logs or a
list of checked-out files by version or designer.
2In actual practice, most relational DBMS implementations allow duplicate tuples.
This is a performance compromise and not always a desirable system property.
Fig. 2.4. Sample data from the Software Incident Report relation.
can produce an ECP. The ERM is a graphic model. The entities are
drawn in boxes and the relationships are shown in diamonds. Normally
the relationship names are verbs, and the entity names are nouns.
Figure 2.5 presents an ER diagram for this simple relationship. There
are no arrows in this diagram to suggest the flow in which instances are
created. The diagram may be read from left to right as the declarative
statement, “SIRs produce ECPs,” and from right to left as, “ECPs are
produced by SIRs.”
Of course not every SIR will result in the production of an ECP; the
configuration control board (CCB) may determine that the change is
inappropriate or unnecessary. Thus, the relationship is optional with
respect to the ECP (i.e., if an ECP exists, then an SIR must exist, but if
an SIR exists, it does not follow that an associated ECP has been
produced). Also, a single SIR may produce more than one ECP. We
say that the relationship is one to many from SIR to ECP. Many ERM
notations use symbols on the line between the entities and the
relationship to indicate if the relationship is optional or mandatory as
well as the cardinality. Figure 2.6 illustrates three such notations;
unfortunately, there is no standard. The first depicts the 1:n
relationship with one solid symbol on the SIR side and two open
symbols on the ECP side; solid implies mandatory and open optional.
The second example denotes the cardinality with (1:1) and (0:N)
meaning that each Produces relationship has exactly one SIR and from
zero to an arbitrary number of ECPs. Variations of this form sometimes
omit the participation constraint and use only a 1 along the line near the
SIR box and N along the line near the ECP box; other formats combine
all the information into a single notation, such as 1:(0,n), written
anywhere in the area. The third example uses a circle to indicate
optional, cross lines for a cardinality of one, and a fan-out triangle to
express many. (Not shown is the optional one, which is indicated by a
single cross line and a circle.) In any case, the meaning should be clear.
Every SIR may produce several ECPs, or none; but every ECP is
produced by exactly one SIR.
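By way of a small sketch (mine, with invented names and data), the 1:(0,n) constraint shows up directly in the data: every ECP record carries exactly one SIR number, while a given SIR may be referenced by many ECPs or by none.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class ECP:
        ecp_no: int
        sir_no: int          # mandatory: each ECP is produced by exactly one SIR

    def ecps_produced_by(sir_no: int, ecps: List[ECP]) -> List[ECP]:
        # The result may contain several ECPs or none at all (the optional side).
        return [e for e in ecps if e.sir_no == sir_no]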
The important thing about the ERM and the definition of cardinality
is that it forces the analysts to think about the universe being modeled.
If the 1:(0,n) relationship is true, then it follows that an ECP may not
be prepared in response to more than one SIR. Does this make sense?
In the abstract, one could argue with equal conviction that the relation-
For SIR-No the value set contains numbers that can be assigned at the
time of SIR receipt, the value set for both SIR-Receipt-Date and SIR-
Close-Date is a set of dates in the form month/day/year, and the value
set for SIR-Status is {Analysis, Review, Pending acceptance, Not
approved (closed), Waiting assignment, Change authorized, Validated
change (closed)}. Figure 2.7 uses the notation of [FuNe86] to
illustrate the associations among these items for the data shown in
Figure 2.4.
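As an aside (the code is mine, not part of the notation being discussed), the same associations between attributes and value sets can be written down as types; the status letters follow the value set listed above.

    from dataclasses import dataclass
    from datetime import date
    from enum import Enum
    from typing import Optional

    class SIRStatus(Enum):
        ANALYSIS = "A"
        REVIEW = "R"
        PENDING_ACCEPTANCE = "P"
        NOT_APPROVED = "N"          # closed
        WAITING_ASSIGNMENT = "W"
        CHANGE_AUTHORIZED = "C"
        VALIDATED_CHANGE = "V"      # closed

    @dataclass
    class SIR:
        sir_no: int                        # assigned at the time of SIR receipt
        receipt_date: date                 # month/day/year in the text
        status: SIRStatus
        close_date: Optional[date] = None  # present only once the SIR is closed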
Again, our interest is not in the notation but in the concepts
expressed. Is the value set of SIR-Status complete? Is it too detailed?
Is this concern for SIR status important at this stage of analysis? None
of these questions has an answer outside the context of the application
to be developed. How the system will be used and what reports are
required will establish what status codes are necessary. In some cases,
the status code may be linked to some other action, and thus it must be
defined explicitly. For example, we may state that the code management
component will permit release of CIs only if they are identified with an
ECP whose parent SIR has the status "W". Alternatively, the sponsors
may decide that the SIR status will be used only in reports, and that
control over the release of CIs will be enforced manually. In this way,
questions about the value set codes can precipitate an analysis of
fundamental behavioral issues.
In the SIR entity there still is no way of determining who initiated
the SIR. One could add an attribute to the SIR entity called SIR-
Submitter. Another approach would be to identify two more entities,
Developers and Customers and the relationship Submits between them
and the SIR entity. Developers also create CIs, receive CIs for revision,
and submit revised CIs for auditing and entry under SCM. From the
perspective of procedural actions, each of these tasks has a different
operational flow. From the point of view of the data being modeled,
however, perhaps one abstract data representation can manage all three
processes. This represents a trade-off that has no single correct answer;
there are only poor or wrong answers that ought to be avoided through
careful, up-front analysis. Figure 2.8 contains an ERM that proposes a
model for this part of the SCM system. Before examining it, one last
notational detail must be explained: the weak relationship.
Fig. 2.7. Associations among the attributes, value sets, and entity instances for the SIR data of Figure 2.4 (notation of [FuNe86]):

    Attributes:  SIR-No (primary key)   SIR-Receipt-Date   SIR-Status              SIR-Close-Date
    Value sets:  SIR-No                 Dates              {A, R, P, N, W, C, V}   Dates
    Entities:    123456                 1/17/90            V                       1/19/90
                 123467                 1/20/90            N                       3/17/90
                 123490                 1/20/90            W
Starting with the upper left corner, the diagram shows that both
customers and developers can submit an SIR. Some special notation is
used to indicate that every submission of an SIR comes from either a
customer or a developer, but never both (i.e., the Submitter Submits an
SIR where Submitter is composed of the nonintersecting sets:
Customers and Developers). In this diagram, a weak relationship is
shown for the CR entity to indicate that only Customers Request a CR.
Either an SIR or a CR can Produce an ECP. In this notation, it is not
clear that CR is not weakly related to the relationship Produces. Some
notations avoid this ambiguity by using a double-line diamond for the
weak relationship (i.e., Requests). The diagram now establishes
relationships among customers, designers, CRs, and SIRs. Does this set
of relationships conform to reality? Recall that some change requests
may be submitted by the developers (e.g., if there is no reasonable way
for a requirement to be met, the developers may submit a CR requesting
that the requirement be amended). This ERM does not permit this. I
leave the modification of the ERM to allow designer-submitted CRs as
an exercise, but I shall use this deficiency to make an important point.
When we draw the ERM, our intent is to have a valid model of the
desired SCM system and not just a syntactically correct ERM diagram.
In this case, even though the ERM is correct, it is not valid. The goal
of requirements analysis is to establish validity; correctness can follow
only after the specification exists. As with the draining-the-swamps
example, it is important to remember why we are drawing the ERM.
Figure 2.8 also helps to identify other issues that require
clarification. Consider the relationships among CRs, SIRs, and ECPs.
Should we be able to trace an ECP back to more than one CR or SIR?
If so, then are attributes for the Produces relationship necessary to
indicate status changes when the ECP is rejected, aborted, or completed?
What do we want the system to keep track of at the customer level, and
is it the same as for the developers? Do we want to link information
about Customers in the SCM system with information about Customers
in the Accounts Receivable or Marketing systems? (This concerns the
boundary of the automated SCM.) There are many more questions.
Because we cannot answer them without input from the sponsor, it is
best to move on with the diagram review. Nevertheless, it is important
to notice how we begin with a reasonable approximation and use that
baseline to improve our understanding of the problem. Too often books
present only a diagram containing some accepted solution. But the
diagram is most valuable when it guides us to that solution. Most of the
time our diagrams will be either invalid or incomplete, and the process
of requirements analysis is one of completing and validating the model.
Reading down from the middle of Figure 2.8 we see that Work
Requests are Initiated by an ECP. The figure shows that Work
Requests and ECP are in 1:1 correspondence, but that the initiation of
Each of the operators is a function from the set of relations to the set
of relations. Select produces a relation from SIR that satisfies the
formula SIR-Status = "W", and Project transforms that relation into a
relation with just the two named attributes.
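A small sketch (mine; the relation is held as a list of attribute/value dictionaries, and the choice of the two projected attributes is an assumption) shows how Select and Project behave:

    SIR = [
        {"SIR-No": 123456, "SIR-Receipt-Date": "1/17/90", "SIR-Status": "V"},
        {"SIR-No": 123490, "SIR-Receipt-Date": "1/20/90", "SIR-Status": "W"},
    ]

    def select(relation, predicate):
        # Keep only the tuples that satisfy the formula.
        return [t for t in relation if predicate(t)]

    def project(relation, attributes):
        # Keep only the named attributes; a relation is a set, so drop duplicates.
        projected = [{a: t[a] for a in attributes} for t in relation]
        unique = []
        for t in projected:
            if t not in unique:
                unique.append(t)
        return unique

    waiting = select(SIR, lambda t: t["SIR-Status"] == "W")
    print(project(waiting, ["SIR-No", "SIR-Receipt-Date"]))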
By combining the five primitive operators, one can produce
additional operators. The natural join operator takes two relations that
share one or more attributes and produces a relation in which the tuples
contain all attributes of the independent relations (without repetition of
the shared attributes); tuples are included in the new relation if and only
if the tuples of the original relations share identical values for the
specified attribute(s). For example, assume that we wanted a list of ECP
identifiers (ECP-No) whose source SIR still was waiting assignment.
The relationship between SIR and ECP is the relationship Produces,
and it can be expressed as the relation
3In the field of database design, the term conceptual model implies a model in the
context of this framework. In this book, however, I use the term to represent a much
broader concept.
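Although the page break cuts off the relation itself, the natural join just described can be sketched informally (the attribute names for Produces are my assumption): the ECP identifiers we want are those joined to an SIR whose status is still waiting assignment.

    SIR = [
        {"SIR-No": 123456, "SIR-Status": "V"},
        {"SIR-No": 123490, "SIR-Status": "W"},
    ]
    PRODUCES = [
        {"SIR-No": 123456, "ECP-No": 9001},
        {"SIR-No": 123490, "ECP-No": 9002},
    ]

    def natural_join(r1, r2, shared):
        # Tuples are combined only when they agree on every shared attribute.
        return [{**t1, **t2} for t1 in r1 for t2 in r2
                if all(t1[a] == t2[a] for a in shared)]

    joined = natural_join(SIR, PRODUCES, ["SIR-No"])
    print([t["ECP-No"] for t in joined if t["SIR-Status"] == "W"])   # [9002]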
where CR-No identifies the CR. Clearly, this structure may seem to
offer efficiency if we always expect to list out the customer name with
the CR-No. However, Customer-Name would also appear in the
relation CUSTOMER, and replicating the name can cause problems. For
example, consider the potential for error if we insist on the consistent
spelling of customer names. For a relation to be in second normal form,
all nonkey attributes that are dependent on a proper subset of the key
must be removed. In this case, one would delete Customer-Name from
the REQUESTS. Given that the name already existed in CUSTOMER,
There are other normal forms that deal only with multiple keys (multivalued
dependencies), and it can be shown that third normal form can
produce a scheme that guarantees lossless joins and the preservation of
dependencies. Thus, I will close out the discussion of normal forms here
by simply observing that once the ERM exists it must be transformed
into a set of relations that satisfies some normalization conditions. All
entities and relationships in the ERM must be expressed in the form of
a relation. (I have assumed that a relational DBMS will be used; the
ERM is not restricted to the relational model.) If the ERM properly
defines the entities, then the derived scheme generally will be in third
normal form. Thus, an awareness of normal forms will help in the
creation of the ERM. For a simple guide to normal forms see [Kent83],
and for a discussion of the transformation of the ERM into a data
model or scheme see [Chen85].
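To make the second-normal-form step described above concrete (the relation names follow the text; the sample data and the decomposition are only a sketch of mine):

    # Not in 2NF: Customer-Name depends on Customer-No alone, a proper
    # subset of the key (Customer-No, CR-No), so the name is replicated
    # for every request and the spellings can drift apart.
    REQUESTS_UNNORMALIZED = [
        {"Customer-No": 101, "CR-No": 7, "Customer-Name": "Acme"},
        {"Customer-No": 101, "CR-No": 9, "Customer-Name": "ACME"},
    ]

    # In 2NF: the partial dependency is removed; the name is stored once
    # in CUSTOMER and recovered, when needed, by a join on Customer-No.
    CUSTOMER = [{"Customer-No": 101, "Customer-Name": "Acme"}]
    REQUESTS = [
        {"Customer-No": 101, "CR-No": 7},
        {"Customer-No": 101, "CR-No": 9},
    ]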
Figure 2.9 contains a partial scheme (in third normal form) derived
from Figure 2.8; it omits the structure for the CI entity, which I discuss
below. The first five relations need no explanation. Notice that
SUBMITS has one key term, and REQUESTS has two key terms. This
reflects the fact that the latter is a weak dependency. PRODUCES,
which was referenced in the join example, is not shown here. In Figure
2.8 we see that for every ECP there is exactly one SIR or CR, and we
have stored the identifier as an attribute and added the attribute Source
to indicate whether SIR-No or CR-No was to be used. SIR-No and CR-
No are called foreign keys because they are keys to other relations (i.e.,
SIR and CR). Notice that one of the keys will always be null, and
Source is used to identify which is not null. Managed-By (which is of
the type Developer-No) serves as another foreign key that eliminates the
need for a “manages” relation. Notice that the attributes of SUBMITS
could be appended to those of SIR.

[Figure 2.9 (excerpt): CUSTOMER(Customer-No, Customer-Name, Customer-Address, Salesman-No); DEVELOPER(Developer-No, Developer-Name); SALESMAN(Salesman-No, Salesman-Name)]

Thus, the only relationships that require separate relations are those with more than one key term.
Unfortunately, not all DBMS products support null foreign keys and
types.
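One way around that limitation, sketched here with names of my own choosing, is to let each ECP record carry both optional foreign keys and use Source to say which one is meaningful; a simple consistency check then enforces that exactly one of them is non-null.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ECPRecord:
        ecp_no: int
        source: str                        # "SIR" or "CR"
        sir_no: Optional[int] = None       # foreign key into SIR (null when the source is a CR)
        cr_no: Optional[int] = None        # foreign key into CR (null when the source is an SIR)
        managed_by: Optional[str] = None   # foreign key of type Developer-No

        def is_consistent(self) -> bool:
            # Exactly one of the two foreign keys must be present,
            # and Source must identify which one.
            if self.source == "SIR":
                return self.sir_no is not None and self.cr_no is None
            if self.source == "CR":
                return self.cr_no is not None and self.sir_no is None
            return False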
Now that we have examined the bark of a few neighboring trees, let
us return to our task of mapping the forest. By way of retracing our
steps, we began by saying that we could learn about the system we were
about to specify by abstracting away all the details save for those that
described the data that the system would use. We adopted the ERM as
the tool to guide us in this odyssey, and the early discussions of the
validity of our ERM pivoted on our understanding of what the system
was to do and how the environment in which it would operate behaved.
Because this is only an exercise, we feigned satisfaction with our
understanding as expressed in Figure 2.8, and we moved on to see how
this conceptual model could be transformed into a more formal model.
This involved some backtracking and a whirlwind description of some
relational database theory. The critical observation was that a formalism
offers a rigid, beneficial structure that must be understood to be used.
That is, whereas a wave of the hand can be used to communicate or
clarify a semantic concern, such informality is worthless when dealing
with syntactic issues. And, as I hope I made abundantly clear in Chapter
1, software engineering is the process of managing the transition from
■ The system shall record and display every software incident report
(SIR) and change request (CR) received.
■ The system shall display the status of SIRs and CRs including
dates of receipt, source, and status.
interactions are the SIRs, CRs, ECPs, and CIs. SIRs and CRs come
from both customers and developers,4 somehow ECPs are reviewed by
management and get to the customers (who, after all, may have to pay
for the changes), and eventually the developers get CIs to change and
then submit as updated revisions for SCM. This is a fuzzy, stream-of-
consciousness examination of the SCM flow. But it does not seem to fit
into the framework suggested by the initial diagram because Figure 2.10
is not a good start. So consider the next few paragraphs a scenario
rather than an instruction in context diagramming. Problem solving
always involves backtracking.
One reason that Figure 2.10 is poor is that its choice of CCB as the
central oval is too restrictive. The CCB makes decisions about changes
to the configuration based on the analysis of SIRs and CRs. The CCB
will have to use the SCM system to help them manage their activities
and review the status of outstanding activity. I could concentrate on
how the CCB works and what they need from a SCM system, but this
would be the wrong tool for that purpose. My goal now is to establish
a boundary for the system I am about to specify. More formally, I wish
to restrict the solution space for this application. (For example, does
the SCM solution space include concerns about salesmen, and, if so,
what are they?) Putting the CCB in the center of the diagram may have
been a reasonable start, but now I recognize that the choice was too fine
grained. Therefore, let me change the central oval to a more generic
“CM Group” that includes the CCB plus the people charged with
maintaining the CIs, conducting the Cl auditing, tracking the status of
4The entity-relationship diagram in Figure 2.8 contradicts this fact. The ERM
diagram, therefore, is invalid; it is syntactically correct but nevertheless wrong. As an
exercise, the reader ought to produce a modified diagram. In what follows I will ignore
the misconceptions previously explained away for pedagogical purposes and continue by
modeling only the real world as I see it.
the ECPs, managing the company’s other CM activities, and so on. For
the boundary problem that I am trying to solve here, I do not need more
than a general definition of what organizational entities constitute the
CM group. When I look at the management oval, on the other hand, I
find exactly the opposite fault. If I lump all management together into
a single oval, I end up with a bureaucratic tangle that seems to interact
with everything. For our immediate concerns, it is best to separate the
developers’ managers, who will control the technical decisions, from the
upper-level management, which interacts with the customers and makes
the financial decisions.
This revised set of ovals is shown in Figure 2.11 along with the key
interactions between the central oval and the outside ovals. Let me
trace the sequence of operations for a SIR submitted by a customer. As
shown in the diagram, the SIR is sent from Customers to the CM
Group, which logs it and prepares an Analysis Request that is sent to
the Developers who prepare and return an Analysis Recommendation
to the CM Group. The recommendation is structured by the CM Group
as an ECP for Costing, and it is sent to the Development Managers for
action. The completed ECP is sent to Upper Management for approval,
and if approved it is forwarded to Customers for action. If the
customer is willing to authorize (and possibly pay for) the ECP, that fact
is transmitted to Upper Management, which in turn transmits a Task
Authorization to the Development Managers. The Development
Managers assign Developers to the project and send a Task
Authorization to the CM Group, which causes them to send (perhaps
electronically) the CIs for Modification already identified in the ECP.
After the Developers complete the changes, they inform the
Development Managers and submit the CIs for Auditing and reentry
under SCM as revisions. The Development Managers coordinate the
completion of the task with Upper Management and the Customers,
and eventually the CM Group sends the modified product to all
Customers who have paid for the change (either by contracting for it
directly or by way of a maintenance or warranty agreement). Of course,
these organizations can also Request and Receive Status Reports while
all this is happening.
Did Figure 2.11 say all that? No, some of the details were purposely
excluded. The transactions among the Development Managers,
Management, and the Customers are of no immediate concern to the
CM Group, therefore they do not belong on this diagram. We will draw
separate step-2 diagrams5 for Upper Management, Development
5The step numbers are informal references to the steps in this particular scenario,
and they should not be mistaken for a standard nomenclature.
In fact, we know much more than is shown in this simple diagram. The
diagram’s concise presentation acts as a framework for retrieving that
knowledge as we explore the system’s complexity. We can stop at this
point, or we can use Orr’s methods to continue beyond requirements
analysis and begin the design. By way of illustration, I will outline the
next few steps in Orr’s approach. The method is called Data Structured
System Development (DSSD), and it is described in [Orr81] and
[Hans83]. The principal tool used by DSSD is the Warnier/Orr diagram.
These names suggest that I should have described this method in the
section on data modeling. Soon, I will rationalize my discussing DSSD
in a section on process models; for now I simply acknowledge that the
astute reader is correct in assuming we are about to determine the
underlying real-world structures and then fit the system’s processes to
them.
SIR leads to
Analysis Request, which leads to
Analysis Result, which leads to
Completed ECP, which leads to
ECP Authorization, which leads to
Assignments, which leads to
Completions, which leads to
Products.
[Figure: Warnier/Orr diagram for the chain just listed — SIR, analysis request, analysis result, completed ECP, ECP authorization, assignments, completions, and products — annotated with the activities that transform each output into the next (log in, technical review, review and/or money, management decision, tasking and CIs, acceptance criteria, and manufacture information).]
Each diagram is limited to three to six SA boxes, where each box models
a subject (process) described in the form: under control, input is
transformed into output by the mechanism. Because every box is a
[Figure: the SA box — the Input enters at the left, the Output leaves at the right, the Control enters from above, and the Mechanism supports the box from below; a hierarchy of diagrams runs from the more general to the more detailed.]
and i may be equal to j. The state6 may be viewed as the holding (or
definition) of a set of conditions, and the transition may be interpreted
as the termination of some conditions and the beginning of other
conditions. An event may be thought of as either the state change or
the external activity associated with the transition. The effect of the
state transition may be observed as an action or output. The granularity
of these events, transitions, and actions will define the detail of the
model being constructed. This may seem like a very abstract way to
model a system, but there is a very natural isomorphism. One may
speak only of events, transitions, and actions. Thus, we can list the
events of interest for the process being modeled and identify how they
affect the state and what actions they produce. Once the list exists, we
can use an STD to represent the system.
In the SCM system requirements analysis, the major emphasis has
been on the processing of requests for change. Let us now focus on
what happens to the SCM system as CIs are checked in and out. The
following events and actions seem obvious, and they provide an excellent
start for the event list.
■ Initial. The initial state assumes that the CI is checked into the
SCM workspace and locked.
The STD in Figure 2.20 displays the relationships among these events
(i.e., state transitions). There are several diagramming conventions that
one could use. Here the states are shown in boxes, and squared lines
are used for the transitions. The line to the side of the transition line
signals the event; above it is the event (or condition) and below it is the
action. (Another common diagram format uses circles for the states and
curved arcs for the transitions; to avoid confusion with the notation used
with SA/SD, I have used this straight-line form.)
6In Chapter 4 I will return to the subject of state when I discuss object-oriented
techniques.
[Figure 2.20: state-transition diagram for a single CI. State 1: the CI is checked into the SCM workspace and locked. State 2: the CI is locked and flagged in SCM, and an unlocked copy is in the developer workspace. Check-out (copy to dev) moves from state 1 to state 2, Modify loops on state 2, and Check-in (copy to SCM, delete the dev copy) returns to state 1.]
In this figure there are just two states. Initial sets the first state, and
Check-out moves to the second state. Modify does not change state, and
Check-in returns the system to the initial state. The lines to the side of
the transitions identify the events and, where considered important, the
key actions implied by the state change. As with an SADT diagram,
each of the state boxes can be decomposed into a more detailed STD.
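The same two-state STD can also be written out as a small executable sketch (mine; the state labels and the action comments follow Figure 2.20):

    class CIStateMachine:
        IN_SCM = "CI locked in SCM"                          # state 1, the initial state
        CHECKED_OUT = "CI flagged in SCM, unlocked in Dev"   # state 2

        TRANSITIONS = {
            (IN_SCM, "Check-out"): CHECKED_OUT,     # action: copy the CI to the Dev workspace
            (CHECKED_OUT, "Modify"): CHECKED_OUT,   # no state change
            (CHECKED_OUT, "Check-in"): IN_SCM,      # action: copy to SCM, delete the Dev copy
        }

        def __init__(self):
            self.state = self.IN_SCM                # the Initial event

        def fire(self, event):
            key = (self.state, event)
            if key not in self.TRANSITIONS:
                raise ValueError(f"{event!r} is not enabled in state {self.state!r}")
            self.state = self.TRANSITIONS[key]
            return self.state

    ci = CIStateMachine()
    ci.fire("Check-out"); ci.fire("Modify"); ci.fire("Check-in")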
At first sight, this STD does not seem to help us understand the
problem better. This type of diagram is most useful in process control,
real-time, and man-machine interaction problems. Nevertheless, even
here we can use it to examine what we know about our problem. First,
we should observe that the four different events reduce to just two
states. Is this too high a level of modeling, and if so where should we
add details? For example, we know that we want to use the latest
configuration to prepare an updated release, a process I called
manufacture. Therefore, we should be able to add a Manufacture event
as a transition from the first state back to itself with the action of
creating a new version. This would imply that we can manufacture
products only if the CIs are not flagged. This STD is for a single CI,
■ Identify all events without concern for the states first and then go
back to make meaningful connections.
■ Begin with the initial state and methodically trace through the
system identifying all state changes as you go.
In actual practice, one will probably alternate between methods until the
list is reasonably complete. Once the events have been identified, they
should be associated with transitions (i.e., from one state to another) and
actions. This could be organized in the form of a list (as shown above),
or it could be represented as a table. For example, event data could be
displayed with the states in the first column, the events that cause the
system to enter the given state in the next column, and the events that
cause the system to exit from that state (perhaps back to itself) in the
last column. For the present example, the table would be as shown in
Table 2.1. This information also could be organized by state alone as
shown in Table 2.2.
By presenting the same information in different formats, one can test
the data for completeness and closure. One can ask:
Table 2.1. Sample list of states with enter and exit events.
One can even restructure the information into the form of a decision
table as presented in Figure 2.21. In this table we are concerned with
identifying invalid conditions. Above the double line we identify state
conditions that are of interest to us, and below we list what transitions
are enabled (i.e., valid). Each column in the decision table represents
a rule. The first rule reads that if the CI is flagged in the SCM
workspace and the Dev workspace is empty, then for any value for the
Dev workspace locked status (represented by a dash) this is an invalid
state. The second rule reads that a CI flagged in the SCM workspace
and locked in a nonempty Dev workspace is also invalid. The last
column gives the rule that if the CI in the SCM workspace is not flagged
and the Dev workspace is empty, then check-out is enabled. One can
test this decision table to verify that all possible combinations have been
[Figure 2.21: decision table. The condition rows record whether the CI is flagged in the SCM workspace, whether the Dev workspace is empty, and whether the Dev copy is locked; the action rows indicate, for each rule, whether check-out is enabled or the state is invalid.]
tested (in this case the 2³ = 8 combinations are shown) and that each
combination above the double lines produces exactly one outcome below
the double lines.
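The three rules spelled out above can be captured in a small sketch (mine); the remaining combinations of the three conditions would carry the outcomes given by the other columns of the full table.

    def decision(flagged_in_scm, dev_empty, dev_locked):
        # Rule 1: flagged in SCM with an empty Dev workspace is invalid,
        # whatever the lock status (the dash in the table).
        if flagged_in_scm and dev_empty:
            return "invalid state"
        # Rule 2: flagged in SCM and locked in a nonempty Dev workspace is invalid.
        if flagged_in_scm and not dev_empty and dev_locked:
            return "invalid state"
        # Last rule: not flagged, with an empty Dev workspace, enables check-out.
        if not flagged_in_scm and dev_empty:
            return "enable check-out"
        return "covered by one of the remaining rules"

    assert decision(True, True, False) == "invalid state"
    assert decision(False, True, False) == "enable check-out"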
In the discussion of state-transition diagrams I have presented four
or five equivalent representations for the same information. Although
the example is trivial, it should be clear how an event orientation
contributes a different perspective for examining the system being
specified. Peters has an example illustrating how event analysis can be
combined with the ERM and SA/SD [Pete87], and (as will be discussed
in more detail in Chapter 3) Yourdon shows how the states in a STD
can map onto process definitions [Your89]. In the remainder of this
section I will show how the STD can be extended to model concurrent
operations using a Petri net [Ager79, Pete81, Mura89]. Before describing
the Petri net, I will outline the process to be modeled with one. Assume
that the software product that we maintain in the SCM system consists
of two software components with the following rule for distributing new
versions. One component, the operating system (OS), is externally
supplied, and updates are periodically delivered to the SCM system. The
other component is developed internally and is kept under configuration
management; changes to it are made only in response to the receipt of
an SIR. When an OS change is made, it is tested and released only if
no authorized changes to the internally developed component are in
The idea is that we need to know about both the specific application
being developed and what has been discovered for all applications of this
class. Not understanding the practical constraints may doom us to
unrealistic solutions; not being aware of the underlying theory may lead
to suboptimal (or even unworkable) solutions.
We have now completed a survey of modeling tools that can enhance
our understanding of the problem to be solved; I now turn to the task
of transforming that hard-found knowledge into representations that will
guide the design and implementation activities.
In Sections 2.3.1 and 2.3.2 the emphasis was placed on learning about
the problem to be solved and constructing possible solutions. I
suggested that this was a process of going from fuzzy to clear thinking,
and I showed a variety of methods that could be used to structure our
understanding. Some of these methods were informal, and others were
based on rigorously defined mathematical formalisms. For example, the
descriptions in the SADT boxes captured concepts informally; details
would be added later. The Petri net models, on the other hand, were
pictorial equivalents of a mathematically precise abstraction. In each
case, we used the modeling device to refine our understanding by
transforming partial solutions into a concrete, abstract form that could
be reviewed by our colleagues and sponsors. I now consider how we can
represent what we have discovered during the requirements analysis.
First, we must note that people communicate through words and
text, and some explanatory text is always required. That is discussed in
Section 2.3.4. What I outline here is not a substitute for a written
requirements document; rather, it supplements the written text for those
system attributes that can be expressed formally. Recall that the software
process begins with conceptual models that are eventually transformed
into formal models. In some cases, the conceptual and formal models
are essentially the same. For example, if one were building a compiler,
the programming language would be defined formally; if one were
developing a communication protocol, one might want first to define an
abstract model of that protocol. In both these instances, the output of
the requirements analysis process could be a model that prescribes
reification (i.e., a formal model).
As it turns out, we also can be taught to recognize formality as we
carry out requirements analysis. That is, we can learn to express our
conceptual models formally (as we do when we translate a computational
[Figure: specification languages classified as natural or artificial.]
supported a very simple syntax that could be used to specify the essential
objects in a system along with their relationships. For example, one
might define
Check-in
Description: Logs in new and updated CIs
Receives: CI-materials, Version-instructions
Derives: CI-version
Part of: CI-manager
Subparts: Compute-version,...
Text. The type of CI-BODY is also of a primitive type, in this case the
type that defines a program file (P-file, assumed here to be a system-
wide definition). ASSIGN, the second state variable, is of type Assign,
which is defined as a mapping from one type to another. That is, we
have defined the state so that the assignments are preserved as maps
from an item of type Cid to one of type Developer, which is defined as
a set of authorized identifiers. Because a mapping is a function, no CI
can be assigned to more than one developer in this model.
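An informal, executable analogue of this state model may help (my sketch in ordinary code, not VDM): CIS maps CI identifiers to their bodies, ASSIGN maps CI identifiers to developers, and because a map is a function no CI can be assigned to two developers at once.

    from typing import Dict

    Cid = str          # CI identifier
    Developer = str    # an authorized developer identifier
    PFile = bytes      # stands in for the system-wide program-file type

    class CMState:
        def __init__(self):
            self.cis: Dict[Cid, PFile] = {}          # the configuration (CIS)
            self.assign: Dict[Cid, Developer] = {}   # the current assignments (ASSIGN)

        def invariant(self) -> bool:
            # The state invariant discussed below: no CI may be assigned
            # unless it is already in the configuration.
            return set(self.assign) <= set(self.cis)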
In a certain sense, the state model just described is similar to the
data models that were presented in Section 2.3.1. The major difference
is that the earlier models were intended first to gain an understanding
of the target system and then to establish a foundation for the design.
Here, the formal definition establishes the exact relationships to be
implemented. Methods such as ERM are intended to accommodate
some fuzziness in their descriptions of large systems. Formal methods
such as VDM, on the other hand, require precision and completeness;
often the price that they pay for this is a loss of succinctness. Thus,
what we are doing in this section is recording what we already have
decided about the application to be implemented. (In fact, the
conclusions used here may have been arrived at by using the techniques
described earlier.) The objective now is to have a precise definition that
will allow us to recognize inconsistencies and preserve correctness.
To continue with the illustration, all systems must have a set of
invariant state conditions that is always true. In this example we might
establish the restriction that the domain of Assign must be a subset of
that of Config (i.e., no CIs can be assigned to a developer unless they
are already in the system). This state invariant would be described in
the VDM notation. Once the state has been specified, operations on it
may be defined. For example, there must be an operator that establishes
the state for a new configuration management system. Naturally, we
would want that operator to be available only to authorized developers.
[Figure 2.26: the VDM specification of the INIT operator (/* Initializes CM system */).]
Figure 2.26 defines the INIT operator that tests for authorization and
initializes the variables CIS and ASSIGN as empty.
In this definition the variable MANAGER of type Developer is
supplied. The ext indicates that the definitions for CIS and ASSIGN are
external to this operation. The wr, however, indicates that the
operation may alter the state of (i.e., write to) the identified variables.
The actual operation is defined in terms of a precondition (pre), which
must be true if the operator is to become active, and a postcondition
(post), which must be true after the operator is complete. Here the
precondition is that the value for manager be that of a developer with
the manager privilege. The function that determines this, has-privilege,
is defined in the specification’s last two lines. The first line indicates
the domain and range, and the second line establishes transformation.
In this example, I have used a comment to indicate that some external
system-level function will be required. The postcondition is that the
state for both the variables after the operation (as signified by the
prime) be empty (i.e., =[]). Observe how this specification models what
the INIT operation does without in any way indicating how it should do
it.
Finally, consider a slightly more interesting operation: one that
checks out a CI and assigns it to a developer as shown in Figure 2.27.
In the CHECKOUT operation, a CI and developer are provided. (I
assume that only authorized users have access to this operation.) The
general structure is the same as for INIT. Here only read access is
required to CIS. The precondition is that the CI not already be assigned.
The postcondition indicates that after the operation the state of ASSIGN
(with the prime) is the state before the operation (without the prime)
with the mapping c ↦ d added (actually, overloaded).
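Continuing that informal analogue (again a sketch of mine rather than VDM, building on the CMState class above; the privilege check is a hypothetical stand-in for the external system-level function), the INIT and CHECKOUT operations can be written with their pre- and postconditions as runtime checks:

    def has_privilege(manager: Developer) -> bool:
        # Hypothetical stand-in for the external system-level privilege check.
        return manager in {"manager-1"}

    def init(state: CMState, manager: Developer) -> None:
        assert has_privilege(manager)          # precondition: the manager privilege is required
        state.cis.clear()                      # postcondition: CIS' is empty
        state.assign.clear()                   # postcondition: ASSIGN' is empty

    def checkout(state: CMState, c: Cid, d: Developer) -> None:
        assert c not in state.assign           # precondition: the CI is not already assigned
        state.assign[c] = d                    # postcondition: ASSIGN' is ASSIGN with c mapped to d
        # The proof obligation, checked here dynamically: the state invariant
        # still holds (which in turn requires that c already be in CIS).
        assert state.invariant()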
Now that the state and operations have been defined, we have an
obligation to prove that the state invariant cannot be violated as the
result of the system invoking any of its operations. We do this in a
rigorous rather than fully formal manner. That is, we do not attempt to
produce a mathematically exact proof; instead we go through the
rigorous process of establishing that such a proof could be produced.
For example, to verify the CHECKOUT operation, we would need to
show
Once we have carried out this proof, we are confident that our model of
the CM system preserves the correctness of the initial definitions. The
specification and design process then is repeated at successively lower
levels of detail until a complete implementation results.
It should be obvious from even this watered-down description of
VDM that the use of a formal modeling method is quite different from
the techniques with which most developers are familiar. Because
software engineering always involves groups of people working together,
no general textbook can provide more than an introduction to the
concepts involved. Learning how to use VDM (or Z or GIST, etc.)
usually involves working with experienced leaders. The orientation is so
different that competence requires training with an informed feedback.
Nevertheless, the approach is very important, and it is reasonably certain
that its use will advance over time. Therefore, it is instructive to
examine what these formal techniques contribute and to survey their
strengths and weaknesses.
Recall that there are two dimensions to quality: validation (is this
the right system) and verification (is the system right). The formal
specification extends the level of the formal models so that we begin
with a reference point that defines what the system must do (rather than
just what the programs do). There are methods and tools (such as the
rigorous proofs of VDM) to verify that no decisions violate the
conditions designated in that formal model. Consequently, the
implementation will be correct with respect to its requirements
specification. In other words, the system will always be right. Conditions
that might lead to a failure in verification include massive specification
definitions that make rigor unwieldy, improperly trained developers, etc.
Such problems, however, can be resolved by means of automated support
and more experience with the methods. From a theoretical perspective,
formal methods are the only way to guarantee correctness, and there
now is an emphasis on rectifying the deficiencies of the present support
environments. Once this is accomplished, dissemination of the formal
approach should accelerate.
The problem of validation, however, is more persistent. When I
started this section, I observed that this was an appropriate technique
for certain domains. Moreover, I gave several citations to show that
VDM has indeed been used outside a research setting. But the issue
remains, how do the formalisms help us go from fuzzy to clear thinking?
For some applications, being forced to think about the problem in new
and abstract terms leads us to a better understanding. In other areas,
such as security and concurrency, we can reason about behaviors only in
the abstract; we lack “natural insights” and can understand the problem
only by reference to formal models. Still, the fact remains that many
formal systems sacrifice our intuitive insights in order to achieve an
unambiguous and complete specification. As a result, they increase the
risk of delivering a correct, but invalid product. Of course, this tension
between verification and validation is not unique to the formal methods;
it is simply that the abstractness of the formalisms can make it more
difficult to recognize whether this is the right system.
In summary, the development of formal methods and support
environments is an active area of research, and software engineers are
certain to have increasing exposure to them. As every proponent of a
formal method will testify, the methods are not complete unto
themselves. Designers still will need to use many of the tools and
techniques that are described in this book. And that is my justification
for asking you to read on.
■ What are the local conventions that the document should follow?
In most organizations there are established standards or
conventions for document preparation. Unfortunately, in many
instances these standards are too informal (actually bordering on
nonexistent); in other cases they may be very exact (and
sometimes even excessive). The software engineer has an
obligation to work within his organization to achieve an
appropriate level of documentation for the range of tasks that the
organization undertakes. Too little documentation is
unprofessional and invites disaster; too much documentation
transfers energy from the problem-solving task to a mechanical
and unrewarding activity. Each organization must determine
what is best for it, and each software engineer is expected to
respond to that definition and its refinement.
■ Plans. Plans are necessary before a project (or one of its phases)
begins. The objective of a plan is to lay out the activity in some
preliminary detail and to establish the standards that will be
followed. One of the greatest benefits of a plan is the thought
that goes into it. We plan in order to anticipate problems and
make mistakes when correcting them is still inexpensive. In that
sense, the SRS is a kind of plan for the project. In a large
project there will be a project plan, staffing plans, a configuration
management plan, a testing plan, and so on. Some plans will
simply be a pro forma adaptation of plans used for previous
projects; other plans will be the result of intensive analysis and
risk trade-off. All are important.
2 General Description
2.1 Product Perspective
2.2 Product Functions
2.3 User Characteristics
2.4 General Constraints
2.5 Assumptions and Dependencies
3 Specific Requirements
3.1 Functional Requirements
3.1.1 Functional Requirement 1
3.1.1.1 Introduction
3.1.1.2 Inputs
3.1.1.3 Processing
3.1.1.4 Outputs
3.1.2 Functional Requirement 2
3.5 Attributes
3.5.1 Security
3.5.2 Maintainability
The final two sections of the SRS are the appendices and an index.
To illustrate how the SRS for the SCM might look, Sections 1 and
2 would provide an overview of the SCM software system. Notice that
much of the analysis in the earlier part of this chapter has been
concerned with something much broader than just the SCM software
system. To understand the system, we first needed to model the host
environment. Thus, for example, the SADT flow described in Figure
2.18 depicted the processing of a CI; it did not concentrate on what the
automated portions of the SCM would do. Now that we have
established a model for the environment in which the SCM software
system will operate, we can narrow in on what occurs within that system
boundary. Consequently, the SRS will specify only the features of the
software product that we intend to build. When we talk of product
functions in Section 2.2 of the SRS, we detail only the software product
functions; we do not describe how the entire SCM operates. In the same
way, SRS Section 2.3 defines the characteristics of the SCM software
users. In this case, there are casual users (the CCB), dedicated users
(the data entry clerks), and programmer users (who receive and submit
CIs for update); this characterization of the users’ needs will help the
designers. We should not include descriptions of persons who interact
with the SCM system but not the software (e.g., the customers who
submit CRs). They are outside the system boundary as we have defined
it, and we want the SRS to focus on only the system to be implemented.
Most of the information in the SRS is documented in Section 3.
Section 3.1 might identify the following six functional requirements:
■ Audit change. This involves the return of changed CIs for version
update. The inputs are requests to update and authorizations for
update, the process includes update and flagging, and the output
is an updated configuration.
which of its facilities we expect to use (e.g., its logon security
control), then we ought to specify that here. If HHI plans to have the
SCM system ported to other platforms, then that too should be spelled
out. But if the promotion of the SCM system as a commercial product
is only a pipe dream, then the SRS should not discuss the possibility.
Although there should be no secrets from the designers, we must also
avoid the opposite extreme of providing vague or irrelevant information.
Section 3.3, on performance requirements, must be specific and
verifiable; otherwise it would suffice to state, “The system must respond.”
The design constraints section (3.4) normally will augment or reference
internal standards documents or a project-specific plan. Section 3.5.1,
security, could simply indicate that vendor-supplied logon facilities shall
suffice. If the specifier asked for more, there would be an obligation to
demonstrate that the finished product delivered what was specified.
(Although it might be reasonable to request an extra level of security for
a highly sensitive, potentially hostile software setting, such a requirement
would be inappropriate for an internal SCM system.) Finally, there
always will be some category called “other.” In this case it identifies
features such as an external database that the system requires (e.g., a file
identifying the authorized designers may be made available to the SCM
system by some external system). Operations and site adaptation would
probably not apply to an SCM system that is to be installed on existing
equipment.
Returning to the general discussion of IEEE 830, let me again
emphasize the significance of the descriptive text. Earlier I pointed out
the importance of the communications role of the introduction. Of
course, this is not restricted to just the first section; it is equally
important throughout the document. The fragmentation of the SRS into
highly specialized paragraphs tends to obscure the holistic nature of the
product being specified. Lucid commentary is needed to orient the
reader. After all, of what value is an unambiguous specification when
it is misinterpreted in the context of a naive theory? I also have
indicated that the document format includes alternatives and ellipses.
Clearly, the SRS contents must be adjusted to the needs of the project.
It is useful to have a standard that offers a guideline and provides a
checklist for essential topics; we should not have to reinvent the wheel.
Nevertheless, it would be foolish to use any standard as a rigid template
for entering a series of “not applicable” entries (or even worse,
paragraphs of meaningless text). Finally, it is important to restate that,
although the SRS is intended to specify what is to be done, its very
format may suggest how the product will be implemented. That is, the
Functional Requirements sections may be interpreted as modules with
inputs, processes, outputs, and external interfaces. However, that is not
the intent of the SRS. The SRS should not be seen as (or written as if
it were) a Software Design Specification. The analyst must avoid
To ensure that the SRS contains these properties, the analysts (and their
managers) must systematically review it for completeness. The document
should include or reference the models described earlier in this chapter,
and the models must be checked to ensure that the requirements they
imply are properly expressed in the text. The documentation style
should facilitate both understanding and traceability. In general, the
requirements of Section 3 are best expressed in paragraphs composed of
lists of simple sentences. The vocabulary should be limited to verbs and
nouns that are understood in the context of the SRS, and the vocabulary
should avoid terms that are not readily understood by the users (e.g.,
avoid programming terms). Finally, the specifications should describe
actions and items that are external to the product; internal actions, of
course, are to be established during the design.
Figure 2.28 provides a different description of the properties of a
satisfactory software specification. Prepared by Boehm, this taxonomy
augments the properties of completeness, consistency, and testability
(verifiability) with the issue of feasibility: have we specified all the
necessary characteristics, and are we satisfied that the product can be
[Figure 2.28: Boehm's taxonomy of a satisfactory software specification — complete (no TBDs, closure properties, no nonexistent references, no missing functions, no missing products), consistent (internal, external, traceable), feasible, and testable (specific, unambiguous, quantitative); among the lower-level criteria shown are human engineering, environment, and interaction.]
may misunderstand the problem and deliver a product that meets the
specification but does not correspond to the environment’s needs (i.e.,
not enough of the essential requirements were identified and delivered).
Thus, the analysts must learn to specify enough for the designers to
understand both what is to be delivered and how broad their design
latitude is.
There is also a second dimension to what the SRS does not state; it
consists of the requirements that are assumed to be included according
to the conventional standards of practice. In effect, they represent those
requirements that all applications of this class are expected to meet, even
if the SRS does not call them out explicitly. For instance, one would be
very disappointed to receive a software product in which all the variable
names were in the form Xnnnnnnn, where n is a decimal digit.
However, one would feel quite silly explicitly specifying that the
developers shall not use such mnemonics (even though the designers
might find them meaningful).
Thus, on closer examination we see that there really are three
categories of requirement.
Restating this informally, the essential requirements are those that the
sponsor needs to see in the requirements document, the derived
requirements represent details added by the designers, and the implicit
requirements are the standards followed by the design team for all
products of this category. (The concept of application class is
important; different categories will have different implicit requirements.
For example, one would not expect the same implicit features for a real-time
embedded system as for an ECP Reporting System.)
■ Time to learn. How long does it take typical target users to learn
to use the product for a set of relevant tasks? To quantify this
one must identify the users (e.g., secretaries, data-entry clerks,
nurses) and the generic tasks (e.g., format a letter, enter a
category of data, review medical orders).
In this list, the first two factors relate to the ability of the user to
build (and retain) an effective model of the tasks supported by the
system. For some tasks (e.g., developing the interface for an automatic
teller machine), it is necessary to have the system map onto the users’
mental models and be very easy to learn. For other activities, however,
the product may reorient the existing task structure, and a much longer
training period may be necessary (e.g., many consultants suggest that at
least a week of training is necessary for a CASE tool to be integrated
into the process). Here, time to learn includes both an understanding
of the context of the new product plus its use. The remaining three
measures also combine the general design and user interface factors. Of
These usability requirements clearly show how the essential and implicit
requirements differ. The former are explicit and testable. The designer
has an obligation to meet them, and there are unambiguous criteria for
seeing that they have been met. The latter represent good practices that
one expects the designers to use in producing an implementation.
Experienced designers will know what these implicit characteristics are
and when to modify them. Unfortunately, because these requirements
are subjective, we have no choice but to treat them as design decisions.
Shooman also identifies three managerial measures, but I will defer the
discussion of these considerations until Chapter 6.
This list does not exhaust the set of quality metrics. Figure 2.29
displays a frequently cited illustration of the principal quality concerns
and the metrics available to measure them. On the left are the quality
components of a model developed by Boehm et al. [BoBK78], and on
the right is a comparable model described by McCall et al. [McRW77].
Both models begin with a few high-level quality concerns, which are
decomposed into key factors, which in turn are reduced into measurable
criteria. The Boehm model defines general utility in terms of how the
product supports its objectives (i.e., as-is utility, which includes
reliability, efficiency, and human engineering), how easily the product
can be modified (i.e., maintainability, which includes testability,
understandability, and modifiability), and (at a slightly lower level of
importance) the product’s portability. These, in turn, can be described
by primitive constructs. The McCall quality model also addresses
roughly the same three concerns. These have been summarized as
follows [CaMc78]:
finally we used HLL typing to automate this test. I do not think that we
will ever be able to automate quality, but we can extend our
environments to exclude many of the common faults that degrade
quality. Also, as we gain experience, we can build formal models for
selected quality attributes to ensure that we achieve the required level
without expending effort on a degree of performance that exceeds our
needs. The usability of the previous section is one illustration of such
a model; examples taken from the ilities include reliability [Shoo83,
Musa80, MuIO87] and efficiency (performance) [Ferr78].
It is perhaps fitting that the longest chapter in this book is the one on
requirements analysis. This is the most difficult task in system
development; it also is the most error-prone activity. We begin
requirements analysis with a perceived need, and we must end with a
specification for the automated product that will address that need. Of
course, this is the same kind of engineering problem that we face when
we build a bridge, or a ship, or an electronic device. If we have
constructed solutions to this type of problem before, then our task
reduces to one of detailing a solution tailored to the present conditions.
There is always a margin for error, and we must be careful to test every
decision and assertion. But that is no different for software than for
hardware.
Naturally, the first time a problem is addressed (either because the
technology is moving to a new area or because the developers have no
previous experience), the analysts need to discover what the intended
product must do. There is seldom a single correct (or even best)
answer, and one typically learns only by making (and rectifying) errors.
In Chapter 6 I present several ways in which projects can be structured
to facilitate this discovery process. Here I have described a variety of
modeling tools that foster communications with domain specialists and
provide a better understanding of the problems to be solved. I have
been careful to deemphasize methodology considerations. The reader
should be aware that many believe the consistent use of a fixed set of
methods to be critical; therefore, they begin by teaching the methods.
Although I certainly agree that each organization must commit itself to
a limited number of tools and methods, I have not endorsed any specific
approaches. Each organization must choose based on its problem
domain, operational culture, and customer demands.
Like this chapter, requirements analysis must come to an end. One
can never discover all the requirements; neither will the requirements
remain stable. Thus, at some point the development team must
document the requirements for the new product and begin the task of
constructing it. Up to this point, the analysts have been concerned with
the environment in which the product will operate, and their
descriptions have conveyed how that product will interact with that
larger universe. Naturally, their judgments have been tempered by their
knowledge of how to build software, and their specifications are
intended to be read by software implementors as well as sponsors and
users. Nevertheless, the primary interest in requirements analysis has
been on what the product should do. It is a view of the software as a
black box, a perspective from the software to the outside.
In the next steps we change this orientation. We assume that we
begin with a specification that (1) prescribes the behavior of the target
software, (2) details the essential properties of the product, and (3)
offers sufficient descriptive information to guide in all the design
choices. We now redirect our gaze and look inside to see how we can realize a solution to the problem described in the specification. Rather than
being treated as a black box, the details of the software are revealed to
us through a glass box (also called a white box). We must consider
domain knowledge as we validate, but most of the problem solving will
center around implementation issues. We will have to revisit many of
the same issues encountered during requirements analysis.
MODELING-IN-THE-LARGE
The Naciremas, as Poulson has discovered, have confused their tools and
goals. For now, the damage will be limited to some bruised vegetables.
Nevertheless, they had better get their act together before they go off on
a lion hunt, or the results may be catastrophic.
This chapter describes modeling-in-the-large in the context of two
philosophically opposed approaches. Like the tools of the hunter/
gatherer, the methods are not interchangeable. Therefore, the software
engineer should learn from the Naciremas’ example. He must be careful
to match his tools to his goals.
The two orientations are called decomposition and composition.
Oversimplifying, decomposition starts with a model of the entire system
as a single black box. This is then decomposed into a set of interfacing
functions, each of which also is viewed as a black box. The interfaces
among the functions are identified and defined, and the process is
repeated until the granularity of the black-box contents is so fine that
their operations can be described.
In contrast, composition begins by modeling what is known about
portions of the system. Each smaller model is completed, and the set of
all smaller models is combined to form the system. Where decomposition is top down, composition is outside in. Where decomposition
works with an incomplete model of the entire system, composition deals
with complete models within an incomplete system model. Thus, it
should be obvious that a recipe consisting of two parts decomposition
and two parts composition will be unsatisfactory.
I use structured analysis and structured design to illustrate decomposition. Structured analysis began as a method for requirements analysis, and structured design originated as a technique for effective modularization (i.e., what I call modeling-in-the-small). For composition I describe
two Jackson methods. The first was developed for program design
(clearly modeling-in-the-small) and the second for system design
(modeling-in-the-large).
The following text justifies what may seem a strange choice for a
chapter with the present title. One obvious explanation is that we are
taking a holistic view of software engineering, and boundaries implied
by labels may have little meaning.
In this chapter we make the transition from determining how a need can
be met with a software system to describing how that target system
should be constructed. In the requirements analysis of the previous
chapter, we were concerned only with decisions regarding what the
product must do. We developed an understanding of the application
domain, and we used modeling tools to establish how a software product
could be used. Based on the results of that analysis, we were able to
document a set of requirements that identified all of the essential
features of the target product.
In our software configuration management (SCM) case study, we
first examined how the environment dealt with the issues of change
control and configuration integrity. We constructed a model for an
idealized flow, and then we identified how a software system could
support this flow. Using this model, we then defined the essential
features of the desired system: it must manage the storage and update
of configuration items (CIs), it must log and report on the status of
change requests and software incident reports, etc. Exactly what the
product must provide was documented in a Software Requirements
Specification (SRS). This SRS, naturally, did not detail all the features
of the system to be delivered; we deferred many of the particulars as
design decisions. Thus, for example, the SRS stated that we must be
able to update CI version numbers, but it did not specify how the
interface should operate.
Throughout the requirements analysis phase, we directed our
thinking to how the software would be used in the application domain.
We addressed implementation issues only from the perspective of
feasibility. That is, could a software system reasonably be expected to
support some feature within the given constraints? Once we answered
all these questions, we had a descriptive model that contained
2. This key lesson of software engineering is probably the most difficult to
teach. Because students begin by learning how to write code (i.e., programming-
in-the-small), there is a tendency to focus on the complexities of that task.
Further, when students are given exercises, they often are of a scale that demands
only superficial modeling-in-the-large. The result is that the problem solution can be understood in terms of program units. To resolve uncertainty, code is experimented with through a process called “hacking,” and “commercial-grade” software
is seldom expected. Obviously, there are solid pedagogical foundations for this
approach to teaching. But in the world of software engineering, it is assumed that
the development staff already has the prerequisite coding skills. Here the
emphasis ought to be placed on establishing what the product is to do and how
it should be constructed. Experimentation with software should be limited to
prototypes where only the lessons learned are preserved in the delivered product.
Once the specification for a program has been established, its coding and
verification should become routine.
As one would expect, these methods evolved over time. SA/SD was
initially developed for data processing applications, and the 1980s saw
extensions to the basic SA diagrams for use in real-time applications.
Individuals modified their methods as the underlying technology
changed. DeMarco continually points out that his views are evolving
and his books should not be read too literally; Yourdon recently has
written a book to correct some of his earlier teachings [Your89]. Thus,
when we speak of SA/SD we are talking of a set of tools that guide us
in problem solving. The problems that we confront in SA are different
from those we find when performing SD; consequently, the tools and
methods also change. We should not expect a comprehensive, formal
approach to system design. However, we will find a flexible framework
that guides us in our efforts.
When SA was introduced as an analysis tool, the first task in the method
was the definition of four models:
These four models are constructed using data flow diagrams (DFD),
which I will describe below.
The New Physical Model, once it has been constructed, is redrawn
as a context diagram that consists of a single bubble (circle) with the key
external interfaces identified. In Chapter 2 I used Orr’s method to
establish the system boundary, and there is no need to repeat that
process with the above four models. Figure 2.16 (which has been copied
here as Figure 3.1) is a context diagram for the SCM system. It
represents the culmination of the above analysis process and the
beginning of what I call modeling-in-the-large. The context diagram
describes what the system is to do; our task now is to decide how the
software can provide the desired functionality. In doing this, we refer
■ Data flows. These are the directed arcs of the network. They
represent the data transfers to or from a process. In effect, the
data flows define the process interfaces. The data flows are
drawn as directed lines between nodes in the network.
very easy to create descriptive materials that are easily understood. The
DFDs display system functions in arbitrary levels of detail, the data
dictionary describes the essential characteristics of the data flows, and
the minispecs contain the processing details.4 The result is a collection
of documents that, in combination, can resolve all design concerns.
The context diagram in Figure 3.1 delineates the domain of our
system. The SCM system is shown to interface with three external
entities (terminators) with three major inputs and four major outputs.
Obviously, the diagram reduces system complexity considerably. We
make simplifying assumptions so that we can concentrate on what is of
primary importance in our current analysis. For example, we see that
Developers receive Assignments and CIs from the SCM System and
return Tested CIs to it. That high-level statement has an almost
comical obviousness to it. Our next objective is to clarify what the
statement implies. We will add details as we proceed, but we will always
defer some of the particulars until the final step. Figure 3.2 represents
the first step in this expansion of the context diagram. As a refinement
of the bubble in the context diagram, the level-2 DFD must have exactly
three terminators with four output and three input flows. And the
names of these items must be the same. We are treating the bubble as
a black box and detailing its internal workings; viewing it as a black box
implies that we cannot violate its interfaces.
The SCM System displayed in Figure 3.2 has been decomposed into
six major functions, which bear a remarkable similarity to the system
functions defined in the SRS of Section 2.3.4.
These six functions describe the interface between the SCM software
system and its users. Both manual and computer supported procedures
are implied. We still are modeling the SCM System we intend to build.
Although our present goal is to design the software, we have begun by
modeling the software in the context of its use. One could argue that
this is requirements analysis, and that the discussion belongs in Chapter
2. But we really are making design decisions. Notice that we have
structured the SCM System around two central data stores:
There was no requirement to have these two data stores. We also have
allocated our functions in a very specific way. We set Management
outside the processing flow and allowed it to control activities only by
placing Authorizations in the Change Status File. We identified four
functions in the processing flow, and assumed that Configuration
Manufacture operates asynchronously with respect to the change tasks.
In one sense, this allocation could be presented as a convenient
mechanism for describing what the SCM system does. But it really
represents a commitment to how the implemented system will operate
(and by implication, how it will be built). Given our holistic view, we
should not feel uncomfortable in using an analysis tool for design.
To illustrate the departure that we have taken, compare Orr’s
approach with the SA method shown here. Both begin with the
identical context diagram (Figs. 2.13 and 3.1). The DSSD “entity
diagram,” however, was elaborated to identify sequences of actions as
shown in the main line functional flow of Figure 2.14. The DFD of
Figure 3.2, on the other hand, obscures these sequential associations.
For example, the DFD does not make it clear that the Release for
Change can begin for Emergency Repairs without receiving any inputs
from the Detailed Analysis process. In fact, there is nothing in this
DFD to suggest that Audit Change does not always occur before
Release for Change. Thus, to clarify the interfaces, SA hides the
sequential processing highlighted by DSSD.
The DFD also differs from the SADT diagram in Figure 2.18. The
SADT diagram in that illustration was really the “new logical model.”
It included management as a control and did not explicitly show data
stores. As a logical model, the SADT diagram concentrated on the flow
within the total system. Naturally, one could make the two types of SA
diagrams seem more alike by changing the names of the processes (e.g.,
calling “Preliminary Processing” “Log and Evaluate”). But that would
provide only superficial conformance. Both the diagrams and the
analysis processes differ fundamentally. In the SA that we are using in
this section, the system is represented as a set of functions (or processes). We display those functions as bubbles and link them with other
nodes via data flows. Our objective is to decompose the entire system
into a set of discrete processing units. The DFDs provide a picture of
how the system operates, but little information about what goes on
inside a process or the temporal dependencies among processes.
Naturally, by the end of the SA process—when the data dictionary is
complete and we have the minispecs—we will have discovered and
recorded everything that we need to know to implement the system.
The SA method is iterative. To understand how a process (represented by a bubble) operates, one must either expand it as a next-level DFD or describe it in a minispec. Figure 3.3 details bubble 1, Preliminary Processing. Notice that each bubble has been numbered with a
prefix that indicates parentage. In Figure 3.2 Preliminary Processing
has one input, one update to the Change Status File, and two outputs.
These same interfaces are shown in Figure 3.3; the file update information, however, has been elaborated. Before describing the DFD, let me
review the concept of Preliminary Processing that this DFD represents.
Recall that Customers submit SIRs and CRs. Although Developers
also can submit SIRs, this was not shown in the context diagram. It
could be argued with some justification that this fact was omitted to
eliminate a minor detail that would clutter the diagram and thereby
inhibit comprehension. In any event, the SCM unit receives the SIRs
and CRs along with special information for emergency repairs. (This is
the “Emergency Priority Statement” identified in the SRS discussion of
Section 2.3.4.) Obviously, there will be two processing flows. For
routine and urgent repairs, the staff will enter the initial information
and send the paper report for review and preliminary processing. That
processing may determine that no action is required, and the request will
be closed out. Alternatively, it may be determined that a more detailed
analysis is necessary; in this case, the status will be so set, and the
prioritized requests will be forwarded. For emergency repairs, one
would expect some action to be taken even before the paperwork is
processed. In those situations, the request may be entered after it has
been completed, or the emergency may have been resolved with a quick
fix combined with a lower priority change request. Obviously, there is
a need to edit the change data in the Change Status File and review the
status of pending actions.
The flow supporting the above view of Preliminary Processing is
shown in Figure 3.3. Is this how we actually expect the SCM system to
work? Would it be better to combine the Preliminary Analysis process
identified here with the larger Detailed Analysis process of bubble 2?
There are still many open questions, a fact that reinforces my observation that the requirements analysis of the previous chapter and the design of this chapter overlap. In part, the difference is a matter of focus
and orientation. We are using the DFD here to identify the functions
that the software will support. We recognize that one person may
conduct both the preliminary and the detailed analysis. The present
concern is for the software-supported tools that he will use in each of
those tasks. Naturally, if the tools will not conform to the flow that the
SCM organization uses, then those tools will not be effective. Thus,
when we wear our analyst’s hat, we must consider what the environment
expects the software to support, and when we wear our designer’s hat,
we need to concentrate on how to decompose the software into the units
that will provide the desired services. In the context of this discussion,
we begin with an essential requirement to process new requests of
various priorities. We must validate the subsequent design decisions to
ensure that the finished product will indeed do what was specified. The
fact that validation continues throughout implementation indicates that
the requirements analysis task is never really finished.
Now for a more detailed examination of the level-3 DFD. There are
five functions:
■ Enter Requests. This process enters the new requests into the
Change Status File and forwards the (hardcopy) SIRs and CRs
for Preliminary Analysis. Emergency Repairs are shown here
as being routed directly for action; further discussion with the
users may revise this part of the flow, but a change should have
little impact on the overall processing.
different modes of use, then the design would have to take this
requirement into consideration.
This minispec identifies all the validation criteria that must be satisfied
before a request is forwarded to Detailed Analysis; Emergency Repairs,
of course, receive special handling.
The format for this minispec has emphasized the ability to communicate. The display of the change entry is shown in a figure (i.e., the
Figure X appended to the minispec). The testing associates with each
acceptance criterion a value of OK or HOLD. The objective is to
document what the algorithm will do and not how the algorithm will be
implemented. For example, the above minispec might be coded (in part)
as
Preliminary_Analysis begin;
Preliminary_Analysis end;
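As a further illustration, the following sketch (in Python, with hypothetical acceptance criteria and field names that do not appear in the minispec itself) suggests how the OK/HOLD testing described above might be coded; it is an illustration of a possible design refinement, not the implementation.

def preliminary_analysis(entry):
    # Each acceptance criterion is tested and assigned a value of OK or
    # HOLD, as described in the text; the criteria here are invented.
    criteria = {
        "product identified": bool(entry.get("product_id")),
        "submitter known":    bool(entry.get("submitter")),
        "priority valid":     entry.get("priority") in ("R", "U", "E"),
    }
    results = {name: ("OK" if passed else "HOLD")
               for name, passed in criteria.items()}
    if entry.get("priority") == "E":        # Emergency Repairs receive
        return "route for immediate action", results    # special handling
    if all(value == "OK" for value in results.values()):
        return "forward for Detailed Analysis", results
    return "hold for correction", results

disposition, results = preliminary_analysis(
    {"product_id": "SCM-1", "submitter": "C-042", "priority": "U"})
print(disposition)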
The data dictionary will not be complete until I have provided definitions for SIR Entry and CR Entry, but it will remind me of what has not
yet been defined. Observe that most of the time our design will be
incomplete; in fact, the design will be complete only when it is finished.
Thus, we want our methods to help us deal with the incompleteness we
5. Observe that some of the validation criteria defined in the minispec for 1.2
Preliminary Analysis are the same as those that will be used in 1.1.4 Quality
Review. This suggests that we might want to define some common subroutines
to be used for both processes. Unfortunately, the SA method that we are using
will not remind us of that fact; we will need to recognize and remember this
efficiency on our own.
In this notation the equal sign establishes the definition, the plus sign
is an “and,” the square brackets and bars represent a selection (“or”),
the parentheses indicate an optional term, and the asterisks set off
comments. Thus the SIR Entry consists of an identifier (SIR-No), the
product name (Product-ID), information about the source of the SIR
(Submitter, defined on the next line), and the status (SIR-Status, also
defined below). Submitter is defined to be one of two terms, neither of
which is defined in this part of the data dictionary. This implies that
there must be files that identify the customers and developers, a fact that
is not obvious from the first two DFDs. Of course, this fact will be
clear when the data dictionary is complete. SIR-Status is defined as the
priority flag (defined below as either R for routine, U for urgent, or E
for emergency), the receipt date, the optional assign date (unnecessary
for requests that require no actions), and the close date. Finally, the
definition of Receipt-Date uses a comment to indicate its type, range
and the fact that it is required.
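Because the figure containing that segment is not reproduced here, the following sketch reconstructs it from the description just given, using the notation explained above; the exact names and the comment on Receipt-Date in the original may differ.

SIR Entry     = SIR-No + Product-ID + Submitter + SIR-Status
Submitter     = [Customer-ID | Developer-ID]
SIR-Status    = Priority-Flag + Receipt-Date + (Assign-Date) + Close-Date
Priority-Flag = [R | U | E]     *R = routine, U = urgent, E = emergency*
Receipt-Date  = *date; must lie in an acceptable range; required*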
This segment of the data dictionary defines everything about an SIR
that we want our automated SCM system to maintain in the Change
Status File. Let us return to the SIR form in Figure 2.1 to consider
what is not in the Change Status File. None of the information in the
software identification block (upper right corner) is included. Observe
from the level-2 DFD that if we expect the Change Status File to
contain enough information to identify the CIs Affected for the Release
for Change process, then this information must be entered. However,
there is a reasonable chance for errors in this information at the time
of Preliminary Processing, and it would be best to defer the entry of
this information as part of the Analysis Summary after the Detailed
Analysis is complete. Returning to the SIR form, the description and
analysis boxes contain text that might be costly to enter. If the text were
created using the same computer that stored the Change Status File,
then its automatic capture should be considered. Yet, if the SIR forms
were filled out manually at a customer site, then there would be little
justification for including it as part of the SIR entry. Finally, what
should be entered in the SCM System database from the section labeled
“To be completed by SCM manager?” That answer requires some
discussion with the SCM manager. What is available? What is
necessary? How will it be used? Who will enter it and when? And so
on.
The modeling process involves asking questions of ourselves and
others and then recording the answers in the representation scheme we
have selected. In SA we use DFDs to show interactions among the
processes (functions), minispecs to describe a process once it cannot be
decomposed further, and the data dictionary to establish the contents
and structure of the interfaces. The activity involves iteration, and few
functions are neatly isolated. Nevertheless, the tools of SA provide a
natural medium for recording a design model and critiquing it with others.
After all, this section has provided only a very sketchy definition of the
DFD, minispec, and data dictionary, yet I am confident that the reader
understands the examples. In fact, their intuitive clarity is one reason
for the widespread acceptance of the SA analysis tools. Naturally,
learning to create these documents is not as easy as learning to read
them. It takes time to develop new skills, and one needs the experience
gained from the feedback of error identification and correction.
As we already have seen, the minispecs for computer programs
explain the functions to be supported without providing the implementation details. When do we express the processing of a bubble as a
minispec? DeMarco offers the following guidelines [DeMa78],
= means IS EQUIVALENT TO
+ means AND
[] means SELECT ONE from the items separated by |
() means OPTIONAL
{} means ITERATIONS OF
** are used to set off comments
@ is sometimes used to identify a key field for a store (e.g.,
@SIR-No)
Only the iteration symbol requires further elaboration. When there are
cardinality constraints, the minimum number of iterations is written to
the left of the bracket and the maximum number to the right (e.g.,
1{CIs} for at least one CI identifier and 0{Developer-ID}5 for an
optional field that can contain up to a maximum of five Developer-ID
values). Obviously, the various data dictionary entries for the lists of
CIs and developer assignments will utilize iteration symbols.
There also are conventions for the DFD, and I shall illustrate them
with Figure 3.5, which is the highest-level DFD for the Configuration
Manufacture function. I begin with some observations about the
particular process that I have selected as an exemplar. As noted in
Section 2.2, there are software tools that support this function [Feld79],
but a complete manufacturing system remains a research topic. Thus,
although there are models for configuration manufacture, it is far from
a simple task. We lack a sound conceptual model for describing how it
should be implemented, and we ought to be prepared for feedback,
learning, and revision. (In Brooks's words, we should plan to throw one
away.) In fact, we have here an excellent candidate for a prototype.
Unlike a prototype for a user interface (e.g., Enter Requests), this
prototype would address implementation concerns: Does the data
structure express all the necessary configurations? Are there conflicts
in processing sequences? Is there a potential for deadlock? And so on.
It can be argued that neither the prototype nor the SA method is
appropriate for this type of problem, but I will not participate in that
debate. I simply use the current example to show (1) that DFDs can be
used for other than data processing applications and (2) that any analysis or
design method depends on iteration to resolve questions when new
problems are being addressed.
The DFD in Figure 3.5 conforms to the basic rules for all DFDs:
This is the number of levels for a DFD, and the level of a system is given
as the number of levels for the context diagram. Yourdon suggests, “In a simple
system, one would probably find two or three levels; a medium-size system will
typically have three to six levels; and a large system will have five to eight levels”
[Your89, p. 168].
human control. Still, the figure does identify the following processes
and their interfaces:
■ Construct Test Set. This process receives a Set of CIs and uses
it to construct a Test Set, which is stored in the CI and Configuration Repository. This is a deceptively simple process description. It implies that we keep test sets under configuration
control, that we update them when we modify the program CIs,
and that we know how to combine them into a system (i.e.,
configuration) test.
System will serve its users. For example, when a need for management
reports was identified in Chapter 2, there was no reference to a Change
Status File. We made a design decision that affects the implementation;
indeed, as a result of the SA method we have established both the file
contents and the functions that reference it.7 The obvious questions
now are, Is the design complete? Can we begin to code? As one might
guess from the chapter heading, the answer to both questions is “No.”
So far we have been modeling how the system must be decomposed into
functions. The result is the identification of a set of functions and their
interfaces. The primary orientation throughout this process was the
concern for how the requirements could be expressed and decomposed
as functions. Although we made implementation (i.e., design) decisions,
we were motivated by the need to define how the system would support
its objectives and not how it would be implemented.
It is now time to consider how the system should be implemented.
Not as program code (which would be modeling-in-the-small) but as an
organization of modules. Structured Design (SD) is a method for
composing systems in a way that improves reuse, reliability, and
maintainability. Some SD techniques show how to convert a data flow
graph (such as a DFD) into a structure chart that displays the relationships among the modules. In fact, many commercial training organizations teach SA and SD as complementary techniques. (Sometimes the
combined approach is called Structured Analysis and Logical Design,
SALD.) Thus, the output of SA can be used as the input to SD; both
SA and SD also can be used independently. Although many of the
concepts developed with SD have broad acceptance, SD does not enjoy
the same popularity as SA. Nevertheless, a brief introduction to its
principles will be instructive.
The origins of both SA and SD can be traced back to the IBM Systems Journal paper prepared by Constantine and his students. In it,
the authors defined the problem as follows:
The concern of SD was not on how the modules were implemented (i.e.,
how the code was written) but on how the system’s functions could be
realized as a set of communicating modules. The prevalent modeling
tool of the time was the flowchart, which detailed the order of and
conditions for the execution of blocks of code. SD raised the level of
abstraction from code processes (as would be expressed in a flowchart
symbol) to modules, and the structure chart provided a tool for
modeling module connections and their calling parameters.
In SD the module is defined as an identifiable set of contiguous
program statements such as a subroutine. (While the size of a module
was not defined, the conventions of structured programming limited a
program unit to what could be printed on a single page: about 50 lines
with comments.) The central issues that SD faced were
■ How should modules interact with each other? Coupling was the
term adopted to identify and rank the alternative techniques.
Fig. 3.7. Structure chart for the process model in Figure 3.6.
one set of input data flows and returns another (Transform Data) is
called a transform module. Print Report, the module that receives only input data flows (i.e., it produces the outputs), is called an efferent module.
(Not shown in the figure is a coordinate module that receives control
flows from one module and sends control flows to another module.)
The structure in this example is almost trivially simple. Of course, that
is its advantage. The goal is to reduce a complex system into uncomplicated structures that are easy to code and maintain.
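To make the afferent/transform/efferent vocabulary concrete, here is a minimal sketch (in Python, with invented module names and data) of the kind of structure that Figure 3.7 depicts: a coordinating module at the top of the chart calls an afferent module for its input, passes the result to a transform module, and hands the output to an efferent module.

def get_data():                     # afferent module: obtains the input data flow
    return ["CR-31", "CR-07", "CR-12"]

def transform_data(records):        # transform module: one data flow in, another out
    return sorted(records)

def print_report(results):          # efferent module: produces the output
    for line in results:
        print(line)

def produce_report():               # the coordinating module at the top of the chart
    print_report(transform_data(get_data()))

produce_report()

Factoring each bubble of the data flow graph into a module of this kind is what keeps the resulting structure easy to code and maintain.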
Transform analysis defines a four-step process for modeling a
function as the transformation of an input into an output. The steps are
as follows:
2. Identify the afferent and efferent data elements. This can be done
only by understanding the processing represented by each bubble
in the data flow graph. For example, the structure of the graph
in Figure 3.6 does not rule out the possibility that the first two
bubbles represent a two-step afferent process.
The structure chart in Figure 3.7 uses three levels to illustrate the
concept of repeated factoring. Naturally, the number of levels required
depends on the complexity of the functions that they implement. That
must be a problem-specific consideration; the rules for structuring a
transform-centered design, on the other hand, are always problem
independent.
A second form of analysis is called transaction analysis. Here some
data value or event is used to determine the actions of the system. The
transaction center for this kind of system must get the transaction in its
raw form, analyze it to determine its type, dispatch it according to its
type, and complete the processing of each transaction. Figure 3.8
illustrates transaction processing in a data flow graph for the actions
taken after a new configuration is tested and the Action Flag is entered.
It shows that, depending on the value of the Action Flag, reports will be
sent to management, the testers, and/or the SCM team. The structure
chart for this flow is given in Figure 3.9. On the left the flag is received
and the analysis retrieves information about the configuration. This has
been factored as a transform-centered design. The module structure on
the right illustrates a transaction-centered design. The transaction
processing is detailed on the right. It shows the module connections in
which the processed disposition information (Flag) is used to direct the
configuration information (Summary) to one or more reporting
modules. The diamond represents a selection. Because Flag controls
the processing flow, it is shown with a black dot. Again, the result is a
very simple structure.
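The transaction-centered half of this design can also be sketched in code; the Flag values and reporting modules below are invented stand-ins for those shown in Figure 3.9.

def report_to_management(summary): print("To management:", summary)
def report_to_testers(summary):    print("To testers:", summary)
def report_to_scm_team(summary):   print("To SCM team:", summary)

ROUTING = {                          # the transaction type selects the recipients
    "accepted": (report_to_management, report_to_scm_team),
    "rejected": (report_to_testers, report_to_scm_team),
}

def process_test_result(flag, summary):
    for report in ROUTING[flag]:     # dispatch the Summary according to Flag
        report(summary)

process_test_result("rejected", "configuration build 4.2 failed its test")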
Figure 3.10 contains a more complex data flow graph and its
structure chart is shown in Figure 3.11. The overall structure is
determined by transform analysis, and the factoring indicates that each
process in the graph can be implemented as a single module in the
chart. Notice that, even though we speak of a hierarchy, we do not
insist that the structure chart depict a hierarchy. Process F sends data
to both processes A and E. Consequently, module F must be processed
before either A or E can complete; that is, F is subordinate to both
modules A and E. In constructing the structure chart, we expect the
module structure to reflect the organization of the graph (e.g., because
Fig. 3.8. Data flow graph for processing SCM test results.
Fig. 3.9. Structure diagram for the process model in Figure 3.8.
When describing the background of SA, I pointed out that its origins
were in data processing. The basic principles of SA, however, are not
limited to that application class. SA provides a systematic approach to
the definition of requirements and the decomposition of functions once
a system boundary has been established. For data processing applications, communications among the functions (i.e., processes or bubbles)
are quite straightforward, and they can be modeled with the four symbols
of the DFD. For other types of application, however, more complex
interfunction communications must be supported, and the symbology of
the DFD must be enhanced. In what follows I illustrate how SA has
been extended to support the modeling of real-time applications. I also
show how that experience, in turn, has affected SA.
All computer systems operate in real time. To my knowledge, only
the British Museum algorithm ignores time constraints. (It solves
problems, such as theorem proving, by an exhaustive and unsystematic
search for correct results.) What characterizes a real-time system, then,
is not its concern for performance but
■ The frequent need for unique hardware, which implies the need
for parallel development of hardware and software.
9. This statement implies that the hardware design is available prior to the
specification of the software. Unfortunately, this is seldom the case. With the
parallel development of hardware and software, the designers often must work
with partial requirements and test environments. Because of the physical
difficulty in implementing hardware changes, there is a tendency to rely on
software modifications to compensate for system-level design inadequacies. This
is the essential difficulty of conformity to which Brooks referred [Broo87]. It is
a management-determined response to an unanticipated problem, and no design
method can circumvent its effects.
processes the Current, which is input to all the Process functions and
the Temperature flows that each of them report. From this information
it computes a Correction flow that is sent to Power Source. Here there
is just one Compute Current Change for all the Process transformations. To show one Compute Current Change for each Process,
Compute Current Change would be drawn as a stack of bubbles.
Finally, the figure shows that all Temperature flows are sent to
Emergency Shutdown, which can send the control flow Alarm to
System Control. There are some symbols not included in this DFD.
For example, there is a notation to show that a flow combines the flows
from more than one process. Other SA tools for real-time systems
expand further on this notation to include symbols for loosely coupled
(queued) and closely coupled (synchronized) messages [Goma84, Goma86].
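As a rough sketch of the behavior being modeled (not of the DFD notation itself), the temperature-control loop might read as follows in code; the setpoint, limit, and gain values are invented.

SETPOINT, LIMIT, GAIN = 350.0, 400.0, 0.05

def compute_current_change(current, temperatures):
    # One Correction computed from the Current and all reported Temperatures.
    average_error = sum(SETPOINT - t for t in temperatures) / len(temperatures)
    return current + GAIN * average_error      # the Correction sent to Power Source

def emergency_shutdown(temperatures):
    # Raises the Alarm control flow if any reading exceeds the limit.
    return any(t > LIMIT for t in temperatures)

readings = [348.0, 352.5, 361.2]               # Temperature flows from the Processes
print("Correction:", round(compute_current_change(5.0, readings), 3))
if emergency_shutdown(readings):
    print("Alarm sent to System Control")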
Once the DFD notation has been expanded to model the concepts
essential to a real-time system, Ward and Mellor devote the remainder
of Volume 1 to describing the modeling tools used in their method.
event list. These data then are used to establish the behavioral model
as follows.
If one were to group the design methods as they emerged in the 1970s,
one would not choose the decomposition-composition rubric I am using.
Rather, one would organize the methods into those that concentrated on
the flow of data among processes (i.e., data-flow methods) and those that
focused on the structure of the data (i.e., data-structure methods). As
its name implies, the data flow diagram (DFD) of structured analysis was
developed for a data-flow approach. One builds a model of the system
by first establishing how the data move within the system and then by
defining the transformations that affect the data flows. The transformations are represented as bubbles (processes) in the DFD, and structured analysis provides a systematic method for detailing the process descriptions.
The data-structure proponents, on the other hand, believe that the
best way to understand an organization is to model the data that it uses.
This model, it is asserted, reflects the structure of the universe in which
the organization operates. That is, because the organization uses the
data as surrogates for the real-world entities with which it interacts, the
data structures represent the organization’s richest understanding of its
universe. How the organization uses (transforms) the data (i.e., its
processes) is subject to more change than the data’s underlying
structure. The same analogy can be made for programs; if one under¬
stands the structure of the input and output data, then the structure of
the program should follow.
Both sides in this debate recognized that there was such a philosophic difference that compromise was impossible. One could use either a
data-flow or a data-structure method, but not a combination of the two.
In Section 3.2, the early history of some of the data-flow methods was
summarized. For the data-structure methods, there were three principal
method developers: Warnier (referenced in Section 2.3.1), Orr (whose
requirements analysis methods are described in Section 2.3.2), and
Jackson (whose program design method can be characterized as data-
structure driven).
As its name implies, the Jackson Program Design Methodology
(called JSP for reasons too complicated to get into) is a method for
program design and not for modeling-in-the-large. It begins by defining
the structure of the data on which a program is to operate and then uses
a rigorous approach to ensure that the program and data structures
match. Because of its algorithmic base, it is possible for different
designers using this method to produce identical program solutions.
That is, for some types of problem, two programmers given the same
assignment will produce near-identical solutions. This achievement is
not common to many design methods.
Jackson described his method in 1975 [Jack75] and went on to
address issues in the design of systems. The result was a second method,
called Jackson System Development (JSD), that was presented in
[Jack82]. As one would expect, there is a great deal of commonality
between JSP and JSD, but the former is not a subset of the latter.
These are two different methods; as with structured analysis and
structured design, they can be used either separately or in combination.
Naturally, the Jackson methods continued to evolve. Cameron, who
works for Michael Jackson Systems, Ltd., has written a frequently cited
overview of JSD [Came86] and has edited an excellent IEEE tutorial on
JSP and JSD [Came89]. Among the other books on these methods are
one by King on JSP [King88] and one by Sutcliffe on JSD [Sutc88]. In
what follows I defy the logic of the book’s top-down organization.
Rather than starting with the modeling-in-the-large method of JSD, I
begin by describing JSP. JSP is easy to understand, and by presenting
it first I can lay the foundation for a discussion of JSD, which is often
explained in ways that are difficult to comprehend.
■ Selection (sel ... alt). This is the selection of one item among
alternatives (e.g., case). Selection is indicated by a row of boxes
with a small circle in the upper right corner.
Thus, the notation reduces to rows of boxes with possible symbols in the
upper right corner. A JSP diagram labels the boxes and connects them
with lines. Because the JSP diagram is a graphic representation of a
structured program, each diagram is in the form of a hierarchy. The two
equivalent forms of the JSP notation are shown in Figure 3.13.
Although the notations are similar, there is a fundamental difference
between a Jackson diagram11 and a structured program. In structured
programming, each item may be elaborated in the form of other
structures. It is good practice to design programs using stepwise
refinement [Wirt71], and one often removes unnecessary detail by
replacing a complex structure with a simple abstraction (e.g., substituting
for a complex process the name of a procedure that carries out that
process). In JSP, however, our goal is to create a complete model, and
its notation does not rely on symbols that capture an abstracted concept
to be elaborated. Thus, unlike a DFD, which displays a portion of the
model as a node in a nested representation scheme, each JSP diagram
must be complete. Naturally, the designers cannot work on all aspects
of the problem concurrently; they will have to separate their concerns
and defer independent segments of the design, but the diagrammatic
notation is not intended to manage their incomplete designs.
A seq
B;
C;
D;
A end
A iter
B;
A end
lowest level are assumed to represent the primitives (in this case, the
fields in the data structure). Finally, note that some of the boxes are
necessary to produce a valid structure. For example, Affected items is
a list of the CIs and documents that will be affected by the proposed
change. One might be tempted to write this as a sequence of two
iterations, but that diagram could not be expressed in the formal
notation of structured programming (i.e., one cannot write structured
code for a sequence of two iterations, but one can write structured code
for a sequence of two items, each of which is an iteration). Thus, one
of the main advantages of the Jackson notation is that it enforces formality on a diagram that is easy to comprehend. The naming of the
nonleaf nodes in the tree also has the advantage of providing labels for
things (such as the set of all CIs) that ought to be named.
In this example we have taken the notation of structured programming, which was designed to model dynamic structures, and we used it
to model data, which is static. Jackson believes that programmers often
focus on the dynamics of the program, thereby becoming distracted by
details. They ask questions such as, “What should the program do
here?” when they might better consider, “How often should this
operation be executed?” For example, if the programmer began to write
code for the entry of Affected items, he might have difficulty in
managing the special cases of both CIs and documents, neither CIs nor
documents, and only CIs or only documents. Naturally, if the programmer happened to stumble onto the solution of a sequence of two items,
each of which was an iteration, he would have no trouble. But it
certainly would have been better to start with a good (formal) understanding of the structure rather than hope that a good structure may
present itself.
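In code the distinction is easy to see. The sketch below (with invented names) is a sequence of two components, each of which is an iteration, and the special cases of no CIs, no documents, or neither fall out of the structure without any extra logic.

def enter_affected_items(cis, documents):
    # A sequence of two components...
    for ci in cis:                    # ...the first an iteration of CIs...
        print("CI affected:", ci)
    for document in documents:        # ...the second an iteration of documents.
        print("Document affected:", document)

enter_affected_items(["CI-014", "CI-102"], ["User Manual"])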
JSP avoids the bad program designs that result from an undirected
approach. One begins (as would be expected with a data-structure
method) by diagramming the inputs to and outputs from the desired
program. These two groups of structures then are merged, and the
result is a program structure for transforming the input into the output.
After the program structure has been established, the operations carried
out within that structure are detailed and assigned. Up to this point, all
of the design activity has been concerned with the program’s static
properties; it is left to the final step, in which the structure is copied
over into textual form, to address any remaining dynamic concerns. This
can be restated as the following four steps:
used in the program, and then he associates them with the boxes in the
diagram. (As with the program that is being modeled, the same
operation may be used in more than one place.)
The numbered list of operations for the report program is produced
by starting at the output side of an imaginary data-flow graph and
working toward the input side. By listing the operations in this order
the developer becomes aware of the need for interior nodes in the data-flow graph whenever an output value cannot be copied directly from an
input value. Here is one such list.
1. Open Output
2. Close Output
3. Head new page
4. Print Source and Count
5. Print Total, all priorities
6. Total(Priority) =Total(Priority) + Count
7. Total = 0, all priorities
8. Count=0
9. Count=Count+1
10. Get Priority
11. Set page header
12. Get Source Name
13. Read record
14. Get Date
15. Open Input
16. Close Input
12. There are many forms of pseudocode. One type is the minispec described
in Section 3.2.2, which is intended to convey a process using the precise constructs
of programming. In most cases, however, there is no obligation to implement the
program using the structure implied by the minispec. In this section, the
pseudocode is intended to convey how the program will operate. It eliminates
details to aid comprehension, but it is assumed that the final program will be a
refinement of the pseudocode structure. In JSP, the pseudocode serves as a
program design language (PDL), which will be discussed in Chapter 4.
for the executable program. Figure 3.18 contains the result of step 4; it
should not be difficult to translate this pseudocode into an operational
program in any of a number of programming languages. Depending on
the language chosen, one might have to augment the structure to deal
with dynamic issues such as tests for invalid files, end-of-file processing,
and so on.
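As one illustration, here is a Python rendering of the same design; the record layout (priority, then source, one record per change request) and the report format are assumptions, and the input is presumed to be sorted, as in the text.

from itertools import groupby

def cr_report(records, report_date):
    # records: (priority, source) pairs sorted by priority and then source.
    print("CR Report as of", report_date)            # report header with the date
    totals = {}                                       # Total = 0, all priorities
    for priority, priority_group in groupby(records, key=lambda r: r[0]):
        print("*** Priority", priority, "(new page) ***")      # head a new page
        for source, source_group in groupby(priority_group, key=lambda r: r[1]):
            count = sum(1 for _ in source_group)      # Count = Count + 1 per record
            print(" ", source, count)                 # Print Source and Count
            totals[priority] = totals.get(priority, 0) + count
    for priority, total in totals.items():            # Print Total, all priorities
        print("Total for priority", priority, ":", total)

cr_report([("E", "Plant A"), ("R", "Plant A"), ("R", "Plant B"), ("R", "Plant B")],
          "1 June 1991")

The nesting of the loops mirrors the structure of the report data (priorities containing sources containing counts), which is exactly the correspondence that JSP is designed to guarantee.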
This example illustrates the power of JSP with a very simple
program. If the reader has been doing the exercises, it would be
interesting to compare the above pseudocode with the minispec
produced as the result of exercise 3.10. Also, if this book is being used
as a textbook, it would be interesting to compare the results of the
CR-Report seq
Process-Header seq
Open Input
Open Output
Total=0, all priorities
Read record
Get Date
Set page header
Read record
Process-Header end
Process-Body seq
Priority iter until EOF
Print-Header seq
Get Priority
Head new page
Print-Header end
Process-Priority seq
Process-Source iter while (not EOF)
Initialize-Counter seq
Count=0
Get Source Name
Initialize-Counter end
Compute-Count seq
Total-Number iter while ((not EOF) and (priority = last group priority))
Count=Count+1
Read record
Total-Number end
Compute-Count end
List-Count seq
Print Name and Count
List-Count end
Update-Totals seq
Total(Priority) = Total(Priority) + Count
Update-Totals end
Process-Source end
Process-Priority end
Priority end
Process-Body end
Process-Totals seq
Print Total, all priorities
Close Input
Close Output
Process-Totals end
CR-Report end
various answers to exercise 3.22 using JSP. How similar are the designs?
Do they differ because people were solving different problems? Were
the different solutions to the same problem all free of errors? Was any
correct solution noticeably superior to the others? Clearly, the above
program structure is uninspiring and lacks any creative flair. As Jackson
puts it, it is easier to make a good program fast than to make a fast
program good. Of course, this just confirms the goal of software
engineering. Inspiration and creative flair should be reserved for the
problems that merit them; they should not be dissipated on problems for
which a complete solution can be obtained through the application of
a standard technique.
Naturally, not all programs are as simple as the one just shown. JSP
has compiled methods for collating data streams, handling recognition
difficulties (e.g., knowing when to end the Process-Source iteration in
the above example), and error handling. A major concern in JSP is the
problem of the structure clash, or what to do when the structure of the
input does not match that of the output. There are three categories of
structure clash: boundary clashes, ordering clashes, and interleaving
clashes. In the CR-Report program I eliminated an ordering clash by
starting with a sorted input file, which could be constructed from
iterations of the CR data structure shown in Figure 3.14. However, if
I were asked to prepare the report whose structure is shown on the right
of Figure 3.15 from an input file with iterations of the CR structure in
Figure 3.14, then I would have a structure clash. These two structures
cannot be merged into one without producing an excessively complex
structure that would prove difficult to maintain.
One approach to dealing with an ordering clash is to divide the
program into two parts. The first part produces data in a structure that
the second part can use to produce the output. In the above example,
one program (Sort) produces Sort File from the structure in Figure 3.14,
and the CR-Report program uses Sort File to produce the desired report.
Figure 3.19 illustrates this processing flow. In some cases this is the
most natural way to organize the processing, but in many cases the result
can be inefficient. An implementation solution offered by JSP is called
inversion, and because inversion is also an important tool in JSD, I shall
describe it here. Briefly, inversion is defined as a transformation
technique that introduces a suspend-and-resume mechanism into a
process, thus implementing it as a variable-state subprogram. Again, an
example will help.
Assume that we would like to remove the intermediate file shown in
Figure 3.19. One way to do this would be to structure the first program
so that it could provide the second program with the data it needed
when it was needed. Then, instead of getting its inputs from the
intermediate file, the second program could invoke the first program as
a subroutine. In this example, the CR-Report would change its
Read record
statements with
P1 seq
Setup seq
Get header stuff from input
Write header stuff to output file
Setup end
Body seq
Process iter until finished
Do what’s needed
Write processing stuff to output file
Process end
Body end
Finish seq
Write final stuff to output file
Finish end
Fig. 3.21. A program that creates an intermediate file (P1 before inversion).
In this code there are three places that stuff is written to the output
file. We would like to access the three different categories of output
immediately, rather than wait until the last of the stuff has been entered
in the intermediate file. Thus, we would like the first invocation of the
inverted P1 to return the header stuff, subsequent invocations to return the processing stuff, and the last invocation to return the final stuff. Moreover, we do not want to change the structure of P1. We can do this by introducing the variable QX, which is part of the state of P1 that remains persistent for all invocations during a single execution of P2. In other words, execution always begins at the start of P1, and QX provides a computed go to the appropriate segment of code. Figure 3.22 shows the structure of the inverted form of P1; added or changed lines
are underlined and are in bold. The final label of Q4 causes any
invocations after what is expected to be the last invocation to return the
final stuff. Notice that the suspend-and-resume has not removed the
structure clash, but it has provided an acceptable implementation of its
solution.
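For readers who want to see the mechanism in a contemporary language, the following sketch (not part of JSP) uses a Python generator as the inverted P1: each yield is a suspend point standing in for the state variable QX, and P2 invokes P1 directly instead of reading an intermediate file. The item names are invented.

def p1(items):
    # Inverted producer: each resumption returns the next piece of "stuff"
    # that would otherwise have gone to the intermediate file.
    yield "header stuff for " + str(len(items)) + " items"
    for item in items:
        yield "processing stuff for " + item
    yield "final stuff"

def p2(items):
    # Consumer: drives the suspended producer one piece at a time.
    for stuff in p1(items):
        print("P2 received:", stuff)

p2(["CR-17", "SIR-204"])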
In summary, for simple examples such as the production of a CR
report, JSP is easy to use. Obviously, for more complex problems, more
experience with JSP is required. The discussion has not shown how JSP
manages the problems of error testing, identifying changes in the input
data, collating, and so on. The issue of structure clash, however, was
addressed, and two solutions were offered: the use of intermediate files
(which made sense in the case of the CR report program), and program
inversion (which can be used to extract transaction processing routines
from a much larger program). The orientation of JSP is clearly data-
In the section on JSP I introduced the Jackson diagram and showed how
it could be used to represent the structure of either data or a program.
Because I defined the diagrammatic notation in the context of the three
structured-programming constructs [BoJa66], it should be clear that the notation can be used to represent anything that can be displayed as a
flow diagram.13 In JSD we will use the notation to diagram models of
13. The Warnier notation also has the ability to represent these three basic constructs.
into action with the return of the CI for testing. Test Change is an
iteration of Audit Change, and, as one would expect, the Audit Change
process is isomorphic to the Audit process of Design. The process
repeats until either the change is certified or abandoned. The next
process, Revise, is a selection: enter a New Revision or, as indicated by the dash, do nothing (e.g., ignore the revision if the change is withdrawn). This Update process iterates until the CI is no longer required.
Notice that, like the life of a CI, the process diagrammed in Figure 3.23 is quite long; it can extend over years and perhaps decades. Thus
it is a long-lived process. In software design we are accustomed to think
in terms of fast-acting processes such as subroutines. But here we are
modeling entities that exist in the environment that our target system
will support. Unlike the JSP diagrams of Section 3.3.1, Figure 3.23 does
not model the structure of data or a program; rather, it models the
behaviors of a real-world object about which a program must manage
data. Consequently, if there are 1000 CIs, then there will be 1000 of
these structures active during the life of the system. And if the average
active life of a CI is five years, then on average each process model will
have a life of five years. Obviously, one does not expect a single
program to run for five years, but it is reasonable to expect a system with
a persistent database to run for that period. We use JSD to model the
system; once the structure has been defined, JSP can be used to model
the programs.
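To make the idea of a long-lived, per-entity process less abstract, here is a sketch in which each CI's life is a Python generator, suspended between events and resumed (perhaps years later) when the next event for that CI arrives; the event names loosely follow Figure 3.23 and are otherwise invented. In a real system the suspended state would, of course, be kept in the persistent database rather than in memory.

def ci_life(ci_id):
    # One instance of this process exists for each CI in the system.
    while True:
        event = yield                      # suspend until the next event arrives
        if event == "release":
            print(ci_id, "released for change")
        elif event == "audit":
            print(ci_id, "change audited")
        elif event == "retire":
            print(ci_id, "no longer required")
            return

processes = {ci: ci_life(ci) for ci in ("CI-001", "CI-002")}
for process in processes.values():
    next(process)                          # advance each process to its first suspend

processes["CI-001"].send("release")        # events for different CIs may interleave
processes["CI-002"].send("release")
processes["CI-001"].send("audit")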
The process in Figure 3.23 is not the only valid representation for the CI management process. In fact, one of the exercises suggests a
better solution, and Figure 3.24 contains an alternative process for
Update. Which Update description is best? That depends on how the
users actually manage the CIs. As with the requirements analysis tasks,
there seldom is a right answer; what we are doing is framing an answer
to a question in order to create a system that produces the answer. But
there is an important difference between the diagrams in Figures 3.23
and 3.24. The first model indicates that, if an Update of the CI is in progress, it would be impossible to Release that CI for modification until the Revise is complete. But the second model is restricted to one Update of a CI; it does not prohibit concurrent updates for a CI. That is, Figure 3.23 models the entire life cycle for a CI, whereas Figure 3.24
models the Update process only. Here is a second question for us: Do
we want to allow more than one update to a CI at a time? If the answer is positive, then we probably want two models: an amended model of the CI that does not include the update, and a separate model of Update as
shown in Figure 3.23. Obviously, decisions about the best real-world
model are independent of the JSD methods we use to represent that
model.
This is a good place to pause. Where have we been and where are
we going? We know that before we can build a software product we
first must know what it is to do (i.e., its requirements). Once this has
been established, we lay out the overall architecture of the system
(modeling-in-the-large), and finally we add the details to create an
implementation (modeling-in-the-small). When I described structured
analysis/structured design, I pointed out that the analysis method models
the user environment to create a context diagram, and that diagram
defines the system boundary. From that point on, the analysis process
both identifies requirements and establishes the system structure; the
subsequent structured design activity is concerned with modeling-in-the-
small. In this section on composition, I reversed the order of discussion.
I began with JSP, which is clearly modeling-in-the-small, and I have just
jumped all the way back to the requirements analysis aspects of JSD.
Unfortunately, the world of the software engineer just is not neat. I could
organize the presentation to hide that fact, but sooner or later you
would find out. Chapter 6 discusses how management can make
software development seem neat, but here we ought to examine all those
hidden corners (or, to change metaphors, paint the portrait warts and
all).
One of the problems with a composition philosophy is that it does
not fit neatly into the three-step flow suggested by the titles of Chapters
2 through 4. In our running example, we began with a generic require-
ment, “Give me a good SCM system.” We then studied the problem and
produced a software requirements specification (SRS) that details what
we are expected to deliver. This serves as the acceptance criteria for
what we finally deliver: Does the system manage the CRs and CCB
actions as prescribed? Does the system allow the users to access and
update CIs as detailed? And so on. But that list does not (and ought
not) tell us how to build the system that delivers the enumerated
features. Still, it would be nice to know how each requirement was
implemented. When we use a decomposition technique, it is not
difficult to associate requirements with implementation units. One
begins with a decomposition of the entire system into components, and
then associates each requirement with the components that implement
the behaviors that satisfy it. This linking process can be repeated with
each level of decomposition; it is called providing traceability. But with
composition, we have a different approach. We begin by modeling the
universe in which the system will operate, and then we design the
software to support that model. The SRS still provides a checklist for
the system’s mandatory features; each will be provided, but it will be
difficult to associate requirements with the implementation units that
satisfy them. Of course, in the end the product must comply with the
SRS independent of how it was designed. But we should recognize that
the SRS plays a very different role when we are designing a system using
14. I have quoted Jackson directly here because the phrase “suffer or perform
actions” often is used in the definition of objects in object-oriented programming.
Also, we have seen the term entity used in a slightly different setting in the
material on semantic data modeling and the entity-relationship model. The point
is that there are a variety of names for the real-world items of interest to the
designers, and many (overlapping) paradigms exist for modeling these items and
the systems that support them. More on this topic is given in Chapter 4.
■ Fail The Cl fails the acceptance test and must be returned for
an additional correction cycle.
1. Available = "No"
2. Change-for = CR-number
3. Status = "Inactive"
4. Status = "Active"
5. Status = "Complete"
6. Status = "Accept"
7. Available = "Yes"
Q1: CI seq
      /* Some initialization of CI data entry */
Q2:   Update iter until retirement
         Release seq
            Available="No"
            Change-for=CR-Number
            Status="Inactive"
         Release end
         Modify seq
            Correction iter Status="Accept"
               Change seq
                  Status="Active"
               Change end
Q3:            Evaluate seq
                  Status="Complete"
               Evaluate end
Q4:            Accept sel Input=Pass
                  Pass seq
                     Status="Accept"
                  Pass end
               alt Input=Fail
                  Fail seq
                     Status="Active"
                  Fail end
               Accept end
Q5:         Correction end
         Modify end
         Revise seq
            Available="Yes"
         Revise end
Q6:   Update end
      /* Some retirement of the CI entry */
   CI end
In this text I have added some labels to indicate where the flow may
be broken down into suspend-and-resume subprocesses.
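As a purely illustrative sketch of that idea, a state vector can record the label at which each CI's long-lived process is to resume; the Ada skeleton below (with the actions and the transitions between labels elided) shows only the suspend-and-resume mechanism.

   package CI_PROCESS is
      type LABEL is (Q1, Q2, Q3, Q4, Q5, Q6);
      type STATE_VECTOR is
         record
            RESUME_AT : LABEL := Q1;   -- where this CI's process picks up
         end record;                   -- when its next input arrives
      procedure STEP(SV : in out STATE_VECTOR);
   end CI_PROCESS;

   package body CI_PROCESS is
      procedure STEP(SV : in out STATE_VECTOR) is
      begin
         case SV.RESUME_AT is
            when Q1 => null;   -- initialize the CI entry
            when Q2 => null;   -- Release: record the release of the CI
            when Q3 => null;   -- Evaluate: mark the change complete
            when Q4 => null;   -- Accept: record pass or fail of the audit
            when Q5 => null;   -- close one correction cycle
            when Q6 => null;   -- retire the CI entry
         end case;
      end STEP;
   end CI_PROCESS;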
■ Get CIs Have the CIs assigned to the developer for update.
■ Work Perform the task work as broken into work units. (Work
units may be defined as a time period, such as a week, or a
product deliverable, such as a module change.)
■ Deliver Deliver the updated CIs for entry into the SCM system.
process were ignored. For example, we did not try to show that a CR
results in both task assignments and the release of CIs. Also we did not
indicate restrictions such as that one can assign a task only if the
necessary CIs have Available="Yes"; we simply record these constraints
informally for later use. Our immediate concern has been the definition
of the real-world processes, and the data they use. We are composing
the system as a set of processes that must communicate with each
other.15
As we examine the processes that we have defined, we may find that
the system model does not support all the desired actions. When this
occurs, we must introduce new processes. For example, notice that the
Task entity produces some status data for each Task Unit. If we wanted
to report on the status of all tasks associated with a given CR, however,
we would need another process (CR Progress) that could keep track of
the progress of task units by CR. In general, processes are added for
three reasons:
Thus, the model begins as a description of the real-world that the system
is to support, and then additional system-oriented processes are added.
As Cameron puts it,
Now that we have defined a set of processes, we are ready for the
next step in JSD: the construction of a network of the process models.
By way of introduction, Figure 3.29 contains a fragment of the network
that shows how the CR entity releases a Cl for Correction. The circle
indicates that what is transmitted from the CR entity to the Cl entity is
a data stream (or message). The two processes are shown here at the
network level as boxes; the boxes can be expanded at the process level as
the Jackson diagrams in Figures 3.25 and 3.27. For a slightly more
complex illustration of a network fragment, consider the CR Progress
process, which gets its information from both Task and CR. Its
processing is contained in the network fragment shown in Figure 3.30.
At the end of each Task Unit the Status data are sent (by the data
stream S) to the process CR Progress; the two lines cutting the upper
line indicate that more than one such data stream may be sent. The CR
Progress process also receives requests to produce a status listing. The
request is indicated by the data stream R input to the process, and the
data stream L represents the listing produced by the process. When the
request is made, information is required about the CR associated with
the task, and that is accessed by examining the CR state vector, which is
indicated by the diamond.
In summary, at the modeling level the processes communicate with
each other in two ways.
The system network with these six processes is given in Figure 3.31.
The network shows data stream CR input to the process CR. This
represents the receipt of the physical change request. In some descrip-
tions of JSD, this input will be expanded into a process CR-0 that sends
the data stream CR to the process CR-1. Here CR-0 represents the real-
world change request, and CR-1 is its representation within the SCM
■ The modeling phase in which one models the reality that the
system is to support. First the entities and actions are identified,
and then the entities are expressed as sequential processes using
the Jackson diagram notation. Data attributes also are identified
as the diagrams are augmented and transformed into text.
Once these questions are resolved, the software engineer can focus on
the problems unique to his assignment:
reasonable fashion.16
As a student of software engineering, I am concerned with how the
software process is conducted and the methods that we use to organize
it. From that perspective, methods and tools are interesting to me. But
most readers will be software engineering practitioners, and their
attention will be dominated by the problem domain and the target
support environment. Methods and tools are part of their problem-
solving armory; serviceability and commonality must be favored over
novelty and specialization. This suggests that each organization
probably ought to select one method just as it tries to limit diversity in
its programming languages, operating systems, and computers. Natural-
ly, the standards should not be rigid, and every organization should be
encouraged to move from one family of methods to another whenever
the efficacy of the latter has been demonstrated. Therefore, I think it
important for all students of software engineering to understand both
the top-down methods that we have traditionally used and the formally
based constructive techniques that we are currently adopting.
Given a choice of methods and asked to select one as the standard,
which should one choose? The short answer is that this depends on the
setting and sponsors. If one method already has been selected and is
successful, then one should be careful in making changes. One of the
responsibilities of management is to reduce risks, and sometimes there
is more risk in moving to a productive, but untried, technology than in
continuing along the current path. (More on how to evaluate methods
is presented in Chapter 6.) But what if, after reading the first three
chapters, management recognizes that what they are doing is so bad that
immediate change is necessary? The first criterion is to go with the
method that is best understood. If the supervisors are very familiar with
one method, then it should be given primary consideration. (Of course,
if it is an obsolescent method, then this fact should be considered.)
Also if there is an organization that can train the staff in a method, that
method also should rank high on the list. (Remember, though, that it
may take as much as a year before the staff becomes proficient in the
method.)
Finally, we get to what you have been waiting for: all things being
equal, which method is best? Again, I offer only a soft answer.
Yourdon’s view of “modern structured analysis” presents a method for
combining a variety of modeling tools to produce a design. The
structure of the design is top-down, but the method used to construct it
is not. Similarly, JSD provides an alternative method for modeling a
system. As I have presented the methods here, each begins by examining
the external world that the system must support, then turns to issues
regarding how the system must be organized to provide the identified
services, and finally considers how to program a solution. That is, each
guides the problem solving down a path from the domain in which the
system will be used, to the structure of the system that implements the
solution, down to the details of that implementation. As the problems
change, so too do the tools used to resolve them. I am not an ideo-
logue, and would not necessarily choose one as the best, but I can warn
the readers of some dangers to be avoided.
MODELING-IN-THE-SMALL
As with any large organization, an ice cream factory must organize its
work into small, manageable units. Lorenz offers us one modularization
for a company with a large number of flavors. Some very important
software engineering principles are employed here. Individual flavors
may be added or removed with no impact on the overall system, each
flavor has its own operator that hides information from the other
flavors, and one can model the whole system as specializations of the
FLAVOR_DESK class. I wonder if all ice cream factories have such an
elegant architecture.
This chapter on modeling-in-the-small is really a chapter about how
to construct software modules. I begin with some generalizations that
will be valid for any decomposition philosophy. Documentation is
always necessary, and it must be maintained over the life of the module.
There are standards for good style, which should become ingrained in all
software engineers. Simplicity comes before efficiency; given an efficient
design, the transformation of an inefficient module into a more efficient
one is a refinement. But I expect the reader knows this already.
The bulk of the chapter is concerned with encapsulation techniques.
I introduce this from an historical perspective. Parnas’s views on
modularization and information hiding are described, and abstract data
types are reviewed. After the underlying concepts are presented, Ada is
used to illustrate the abstract data type. The next section builds on the
abstract data type and describes object-oriented programming. These
ideas are expanded to explain object-oriented design and object-oriented
analysis. Thus, after starting with an examination of how to define
software modules, I end up describing another analysis method.
The final section introduces the program proof. It is pointed out
that the proof is not something that one does after the program is
written; rather it involves building a program as its proof. In that sense,
the proof method is universal; it is not limited to the program-coding
activity.
This is the third, and final, chapter with modeling in its title. In
Chapter 2, we began by analyzing and modeling the features a new
system must provide. These essential properties were identified in a
Software Requirements Specification (SRS), which became the defining
document for the target system. During modeling-in-the-large we used
what we had learned from the earlier analysis to establish the system
characteristics (e.g., system structure, interface definitions, algorithm
descriptions). The result was a preliminary design that determined the
target system structure and the functions that it should support. As
designers, we had the freedom to do anything so long as the finished
system included (and did not conflict with) the features detailed in the
SRS.
Now that the modeling-in-the-large is complete, most of the
application-specific issues have been resolved. We have descriptions of
the key report formats; we may assume that they express the users’
desires. We know the principal user interfaces; we need only create the
programs that implement them. The critical algorithms are identified in
the minispecs and design documents; we now must either find programs
to be reused or code the algorithms. In short, modeling-in-the-small
begins when most of the application domain uncertainties have been
cleared up and a framework for the individual programming units has
been established. We are at that point in the essential software process
(Fig. 1.5) where we will be concerned with implementation issues. Our
problem solving now will be guided by the design requirements and our
knowledge of the implementation domain.
■ The test plan, test cases, and ultimately, the test results.
The folder is retained for the life of the unit, and it serves as the
repository for all information regarding that software unit. In time,
because the detailed design information is less expensive to maintain in
independent folders, the collection of folders replaces the Low-Level
Design Specifications of the design specification. As long as the
contents of the folder are consistent with the high-level design specifica-
tion, one can make changes to the former without modifying the latter.
From the perspective of configuration management, both the High-Level
Design Specification and the unit folders are considered CIs, but only
the specification is subject to formal control. That is, the individual
designers are responsible for the integrity of the unit folders, and only
a subset of their contents (e.g., the source code) is managed by the SCM
system. This represents a trade-off between the cost of and the need for
formal control.
It is logical to present this discussion of the design specification and
the unit folder in one place. This does not imply, however, that
documentation is a separate activity; indeed, documents are written and
folders are compiled as the design decisions are formalized. For
example, it is common to use a program design language (PDL) as an
intermediate structure between the high-level statement of what a
module should do and the operational module code that produces the
implementation [CaGo75]. The PDL removes some of the intermediate
details to clarify the processing flow. Obviously, the PDL is created
before the code, and it becomes a part of the detailed design (and thus
the unit folder). If, after the code has been tested, program changes are
made by first revising the PDL and then altering the code, then the PDL
should be retained as an important component of the unit folder. But
if the PDL was simply a device to go from fuzzy to clear thinking (as
with the Orr context diagram of Section 2.3.2), then there is little value
in retaining the PDL in the unit folder when the code no longer reflects
its structure. Thus, the contents of the unit folder may vary over time,
and one may archive obsolete documentation when it has no further
potential use.
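As a simple illustration (the module and all of its names are invented), a unit folder might retain the PDL as comments above the Ada that refines it:

   -- PDL retained in the unit folder:
   --   RELEASE_CI:
   --     mark the CI unavailable
   --     record the change request that the release serves
   --
   -- The Ada refinement of the same two steps:
   procedure RELEASE_CI(AVAILABLE  : in out BOOLEAN;
                        CHANGE_FOR : in out INTEGER;
                        CR_NUMBER  : in     INTEGER) is
   begin
      AVAILABLE  := FALSE;        -- mark the CI unavailable
      CHANGE_FOR := CR_NUMBER;    -- record the change request
   end RELEASE_CI;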
In time, software maintainers will go directly from the high-level
design documents to the code.1 Therefore it is important that the code
itself be described with a prologue similar to that outlined in Figure 4.1.
This prologue provides a general introduction to the module and also
identifies all modifications by date, designer, and purpose. The
referenced documents (including the change request, test data, and test
results) will be available in the unit folder, and the short prologue
description serves as a useful reminder and index of changes.
In this discussion of specifications, I have been very careful to avoid
giving detailed document formats. There is a reason for this. I believe
that documentation is an integral part of the intellectual process of
design. There are two essential reasons for this.
1. If the maintainers reference the code because the high-level documentation
serves as a clear road map to the models to be altered, and if the module code
employs a style that makes it easy to comprehend, then this referencing of the
code is a good thing. But if the maintainers have to rely on the code because all
other system documents are of little value, then this is a symptom of poor
management.
an end.
This view contrasts with what I have previously referred to as the
“document-driven” approach, in which the design effort is organized
around the deliverable documents. Too many people have been taught
to conduct software engineering in that manner, and, in that sense, the
treatment of documentation in this book is revisionist. We must think
of the documents as representing incremental expressions of the problem
solution and not as deliverable items. Form must follow function, and
if a document has marginal immediate or long-term value, then little
effort should be expended on it. Although our documents will be useful
for management confidence and sponsor involvement, that always should
be a secondary consideration.
9. Actually, the language is a representation scheme for the computational
model, and the compiler (or interpreter) is the tool.
Most readers of this book are either software engineers with the
prerequisite skills or persons who have no need to program. Therefore,
I need not teach the reader about programming style, and I can presume
that if programming is important, the reader’s code
I would hope that no readers would see any reason to change their style
on the basis of what I have to say in the next few paragraphs. Neverthe¬
less, it is instructive to review the style elements in the context of
problems they address.
The Elements of Programming Style consists of an introduction plus
six chapters on expression, structure, input and output, common
blunders, efficiency and instrumentation, and documentation. In what
follows, I have regrouped the style rules3 according to some attributes
of the software process that mandate their adoption.
3. The indented rules are quoted from the Summary of Rules [KePl74, pp.
135-137]. In all, 63 rules are given in the book, and about half of them are
repeated here.
This listing has emphasized why the rules are important, not just what
the rules are. As we will see, the methods and languages that we use
may eliminate the need for some rules, but such improvements will not
affect the underlying problems that the rules address.4
Notice that many of these rules suggest that correctness should be
separated from efficiency. First one should get a program that does the
job expected of it, and then one can address its efficiency. (This is the
motivation for Balzer’s operational approach to the software process
discussed in Section 1.2.3.) Bentley warns that “we should almost never
consider efficiency in the first design of a module, and rarely make
changes for efficiency’s sake that convert clean, slow code to messy, fast
code” [Bent82, p. 107]. He observes that various studies have shown
4. If, after reading the above list, the reader feels insecure about his program-
ming style, I urge him to read the Kernighan and Plauger book or one of
Bentley’s programming pearl collections [Bent86, Bent88]. Fairley also provides
an effective list of dos and don’ts of good coding style [Fair85].
Modifying Code

Loops
   1. Code motion out of loops
   2. Combining tests
   3. Loop unrolling
   4. Transfer-driven loop unrolling
   5. Unconditional branch removal
   6. Loop fusion

Logic
   1. Exploit algebraic identities
   2. Short-circuiting monotone functions
   3. Reordering tests
   4. Precompute logical functions
   5. Boolean variable elimination

Procedures
   1. Collapsing procedure hierarchies
   2. Exploit common cases
   3. Coroutines
   4. Transformations on recursive procedures
   5. Parallelism

Expressions
   1. Compile-time initialization
   2. Exploit algebraic identities
   3. Common subexpression elimination
   4. Pairing computations
   5. Exploit word parallelism
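To make the first loop rule concrete, here is a small illustrative Ada fragment (the data and the scale factor are invented); the quotient is loop invariant, so it is computed once before the loop instead of on every pass.

   declare
      type VECTOR is array (1 .. 100) of FLOAT;
      READINGS : VECTOR := (others => 1.0);
      SCALE    : constant FLOAT := 42.0;
      FACTOR   : FLOAT;
      TOTAL    : FLOAT := 0.0;
   begin
      -- Before the transformation, the loop body computed SCALE / 100.0
      -- on every iteration; the expression does not depend on the loop.
      FACTOR := SCALE / 100.0;              -- code motion out of the loop
      for I in VECTOR'RANGE loop
         TOTAL := TOTAL + READINGS(I) * FACTOR;
      end loop;
   end;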
are too often selected with the prime intent to demonstrate what
a computer can do. Instead, a main criterion for selection
should be their suitability to exhibit certain widely applicable
techniques. Furthermore, examples of programs are commonly
presented as finished “products” followed by explanations of
their purpose and their linguistic details. But active program¬
ming consists of the design of new programs, rather than
contemplation of old programs. As a consequence of these
teaching methods, the student obtains the impression that
programming consists mainly of mastering a language . . . and
relying on one’s intuition to somehow transform ideas into
finished programs. [Wirt71, p. 221]
In this process, both the program and the data specifications are refined
in parallel. This intertwining of the procedures and the data they use
solution to the target problem, and the formal model is its realization.
Naturally, when we are trained to think about our problems in (concep¬
tual) representations that are close to the formalisms of the solution
space, it is easier to go from our mental models to the operational
models. In fact, this is why the PDL (or pseudocode) is so effective; it
provides a bridge from the intended implementation to the product’s
construction. When using stepwise refinement, the incomplete solution
acts as a kind of pseudocode.
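A small illustrative Ada sketch (the change-request table is invented) shows how an incomplete solution carries the control structure while the details remain deferred:

   -- First refinement, still pseudocode:
   --    examine every change request; count the ones that are still open
   --
   -- Second refinement, once a data representation has been chosen:
   procedure COUNT_OPEN_CRS is
      subtype CR_INDEX is INTEGER range 1 .. 50;
      type CR_TABLE is array (CR_INDEX) of BOOLEAN;   -- TRUE = still open
      OPEN_CR    : CR_TABLE := (others => FALSE);
      OPEN_COUNT : NATURAL  := 0;
   begin
      null;                      -- read the change requests (still deferred)
      for I in CR_INDEX loop
         if OPEN_CR(I) then
            OPEN_COUNT := OPEN_COUNT + 1;
         end if;
      end loop;
   end COUNT_OPEN_CRS;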
Wirth concludes his paper with the following five-point summary.
The orientation of this paper was the teaching of programming, but the
method is valid for development by experienced programmers. Wirth’s
final sentence stated, “If this paper has helped to dispel the widespread
belief that programming is easy as long as the programming language is
powerful enough and the available computer is fast enough, then it has
achieved one of its purposes.” Similar comments regarding the
limitations of advanced programming languages would be repeated more
than a decade later by Brooks in his “No Silver Bullet” paper [Broo87,
see Section 1.3.3]. Thus, we cannot avoid confronting the difficulties of
creating an implementation.
Wirth presented a top-down, decomposition technique, which
borrowed heavily from the earlier work of Hoare and Dijkstra. The
problems emerged as the leaves on a tree, and each node represented a
decision. (Of course, backtracking might reconfigure the tree.) In the
Again we see the tension between the mind, which is associative and
based on judgment, and mathematics, which is hierarchical and precise.
In Chapter 3 I organized modeling-in-the-large into two camps, each
of which emphasized the strengths of one of these modeling orientations.
Decomposition took the top-down approach, and the nodes in the
hierarchy represented refinements and detailing; composition, on the
other hand, began by modeling what was best understood and then (with
JSD) formally represented that model for conversion into an implemen¬
tation. In each case, the “possible solutions to a given problem emerge
as the leaves of a tree.” That tree, of course, represents a trace of the
solution process and not the form of the solution. In the context of
Wirth’s 1971 paper, the solution (i.e., program) also was a tree, but the
concept of stepwise refinement need not be restricted to what I have
labeled as decomposition methods. It is a general problem-solving
paradigm that can be applied to any method that produces a formal
solution. It is especially well suited to problem-solving activities that
have their expression in formal (provable) representation schemes.
Which raises the question, can stepwise refinement be applied to
different classes of representation schemes in modeling-in-the-small?
The table of contents suggests the answer: Yes, in two ways.
In modeling-in-the-large I pointed out that decomposition evolved
from methods concerned with how data were transformed and composi-
tion from methods focusing on the structure of data. I can extend that
division to modeling-in-the-small by observing that one can look at how
abstract algorithms act on particular data structures (i.e., Wirth’s view
As can be seen from this list, the design was built on levels of abstrac¬
tion. Each level performed services for the functions at the next highest
level; each level had exclusive use of certain resources that other levels
were not permitted to use. Higher level functions could invoke lower-
level functions, but the lower level functions could not invoke higher-
level functions. In effect, each level represented a virtual machine for
the next level functions. Higher level functions need only know what
facilities the virtual machine provided, not how they were implemented.
Lower level functions need only provide the services specified, they need
not know how those services were used. In this way, the system could
be reduced to “encapsulated levels” that were both intellectually
manageable and verifiable. In the sense of the separation of concerns,
given that level n had been verified, one could concentrate on develop¬
ing level n + 1 without concern for level n’s implementation.
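A minimal Ada sketch of such a layering (the package and operation names are invented) shows the lower level exporting a small virtual machine and the higher level relying on nothing but that specification:

   package LEVEL_0 is                 -- the lower-level virtual machine
      procedure WRITE_BLOCK(BLOCK : in INTEGER; VALUE : in INTEGER);
      function  READ_BLOCK(BLOCK : in INTEGER) return INTEGER;
   end LEVEL_0;

   package body LEVEL_0 is            -- its resources; no other level sees them
      STORE : array (1 .. 100) of INTEGER := (others => 0);
      procedure WRITE_BLOCK(BLOCK : in INTEGER; VALUE : in INTEGER) is
      begin
         STORE(BLOCK) := VALUE;
      end WRITE_BLOCK;
      function READ_BLOCK(BLOCK : in INTEGER) return INTEGER is
      begin
         return STORE(BLOCK);
      end READ_BLOCK;
   end LEVEL_0;

   with LEVEL_0;
   package LEVEL_1 is                 -- built only on LEVEL_0's services
      procedure SAVE_STATUS(ID : in INTEGER; STATUS : in INTEGER);
   end LEVEL_1;

   package body LEVEL_1 is
      procedure SAVE_STATUS(ID : in INTEGER; STATUS : in INTEGER) is
      begin
         LEVEL_0.WRITE_BLOCK(BLOCK => ID, VALUE => STATUS);
      end SAVE_STATUS;                -- how LEVEL_0 stores the value is hidden
   end LEVEL_1;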
In 1972 Parnas wrote two important papers that showed how to
produce a program hierarchy in the sense just described. I begin with
a discussion of the second of these papers, “On the Criteria to Be Used
in Decomposing Systems into Modules” [Parn72b]. Here Parnas began
with the observation that modular programming allows one module to
be written without knowledge of the code in another module, and it also
■ Module 1: Input. This reads in the lines from the input media.
(Remember, this was 1972, and one generally used input files
rather than databases.)
■ Module 2: Circular Shift. This would be called after the input was
complete; it prepares a two-field index containing the address of
the first character of each circular shift and a pointer to the input
array where the data are stored.
■ Module 2: Input. This reads the lines from the input media and
calls a procedure to store them internally.
8. One could update this by talking of the structure of the DFD. Compare
Parnas’s observations with those of Jackson and Cameron cited in Chapter 3.
■ Provide the intended user all the information that he will need
to use the program, and nothing more.
Figure 4.4 contains the specification for a stack. The single quotes
identify values before the function is called, the brackets indicate the
scope of quantifiers, and = is “equals.” What we see here is an example
of an abstract data type. In this case the specification is independent of
the implementation; the abstract data type extensions described in
Function PUSH(a)
   possible values: none
   integer: a
   effect: call ERR1 if a > p2 ∨ a < 0 ∨ ’DEPTH’ = p1
           else [VAL = a; DEPTH = ’DEPTH’ + 1;]

Function POP
   possible values: none
   parameters: none
   effect: call ERR2 if ’DEPTH’ = 0
           the sequence "PUSH(a); POP" has no net effect if no error calls
           occur.

Function VAL
   possible values: integer; initial value undefined
   parameters: none
   effect: error call if ’DEPTH’ = 0

Function DEPTH
   possible values: integer; initial value 0
   parameters: none
   effect: none

p1 and p2 are parameters. p1 is intended to represent the
maximum depth of the stack and p2 the maximum width or
maximum size for each item.
Section 4.2.1 allow the designer to generate the implementation from its
specification. Nevertheless, the principle is the same. As Parnas put it,
This is the tension between conceptual and formal modeling in the essential
software process described in Section 1.2.2. Notice that the validity of the method
will depend on how well the specification expresses the concept. Parnas’s fourth
criterion for a specification was that it should discuss the program in terms
normally used by the user and implementer. He specifically excluded “specifica-
tions in terms of the mappings they provide between large input domains and
large output domains or their specification in terms of mappings onto small
automata, etc.” [Parn72a, p. 330] In general, for a formalism to be effective, the
conceptual and formal models must be (in some sense) “close” to each other.
For more of the author’s views on this topic, see [Blum89].
denote the same stack. Therefore the class of stack objects is represent-
ed by equivalence classes of the set of all expressions, which are
determined by the axioms. If the axioms are well chosen, then the
equivalence classes are unique. Liskov and Zilles point out that if
1  CREATE(STACK)
2  STACK(S) & INTEGER(I) ⊃ STACK(PUSH(S,I)) &
      [POP(S) ≠ STACKERROR ⊃ STACK(POP(S))] &
      [TOP(S) ≠ INTEGERERROR ⊃ INTEGER(TOP(S))]
3  (∀A) [A(CREATE) &
      (∀S)(∀I) [STACK(S) & INTEGER(I) & A(S)
         ⊃ A(PUSH(S,I)) & [S ≠ CREATE ⊃ A(POP(S))]

FUNCTIONALITY
   CREATE:  → STACK
   PUSH:    STACK × INTEGER → STACK
   TOP:     STACK → INTEGER ∪ INTEGERERROR
   POP:     STACK → STACK ∪ STACKERROR

AXIOMS
   1'  TOP(PUSH(S,I)) = I
   2'  TOP(CREATE) = INTEGERERROR
   3'  POP(PUSH(S,I)) = S
   4'  POP(CREATE) = STACKERROR
Fig. 4.6. Algebraic specification of the stack abstraction.
(Reprinted from [ZiLi75] with permission, ©1975, IEEE.)
axioms that affect them. Both the users and implementers can see the
interesting properties of the stack, and nothing more. How these
operators are implemented and how the stack is structured are hidden
from view. Abstract data types and object-oriented programming are but
two steps from this algebraic specification in the direction of automated
support to improve constructibility and comprehensibility, and it is now
time to follow that path.10
In the previous section, the concept of an abstract data type (ADT) was
presented from the perspectives of both information hiding and
formalization of the specification process. There is also a strong
intuitive foundation for this concept, which I now shall elaborate. When
high-level programming languages were introduced, there was a need to
10. By way of closure, let me point out that Guttag, a colleague of Liskov,
pursued the role of abstract data types in verification [Gutt77]. Together they
wrote a text on abstraction and specification in program development, which
provides an excellent foundation for building insights into this approach [LiGu86].
Parnas also continued his work. An important paper was published in 1979
containing useful rules for easing the extension and contraction of software
[Parn79]. He spoke of program families and ways to design for change. These
methods were adapted for use in specifying the requirements for the A7-E aircraft
with the title Software Cost Reduction (SCR) [Heni80, Kmie84, PaCW85]. For
a comparison of SCR, structured design and object-oriented design in real-time
systems, see [Kell87]. It is interesting to note that a goal of Parnas’s early work
in data abstraction was to establish mathematical interpretations for specifications.
His current research continues in this direction, and the emphasis is placed on
making the formal interpretations explicit (e.g., [PaSK90]).
associate each variable name with the format and number of words in
storage associated with that name. Thus real and integer were essential
identifiers for the compiler. Not only did the data type identify how
storage should be managed, but it also specified which computational
instructions should be used. One set of arithmetic operators was used
for real numbers, and another for integers. The decision regarding the
evaluation of mixed expressions was left to the compiler designer. The
expression 1.3 + 1 could be evaluated as either the integer 2 or the real
numbers 2.0 or 2.3 depending on the method selected for interpreting
mixed expressions.
Out of this need to link variables with machine instructions and
storage patterns came the recognition that the data type also was an
effective tool in software development. It provided a mechanism for
establishing a design criterion that could be distributed throughout the
system. For example, if one defined CI_No as an integer, then all inputs
could automatically be tested for conformance to the defined characteris-
tics of an integer. Letters, special characters (other than an initial + or
-), or a decimal point would be rejected as invalid inputs. Of course,
defining CI_No to be of type integer also allows one to find the product
of two Cl identifiers, to increment an identifier by one, and other
similarly meaningless operations. One could avoid this by defining
CI_No as an alphanumeric data type, but then one would lose the
automatic checks for nonnumeric inputs. Ideally, one would like to have
a stronger typing mechanism so one could define what was important
(e.g., the Cl identifier is an unsigned number within a specified range)
and rule out what would be inappropriate (e.g., arithmetic on identifi-
ers).
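In Ada, for example, part of that intent can be expressed with a distinct integer type; the sketch below is illustrative only (the bounds are invented), and because arithmetic within the type remains legal, a private type would still be needed to rule it out entirely.

   type CI_NO is range 1 .. 999_999;   -- bounds invented for illustration
   ID : CI_NO := 123_456;
   -- N : INTEGER := ID;               -- rejected: CI_NO is a distinct type
   -- ID := ID * 2;                    -- still legal, although meaningless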
The introduction of strong data typing does have a cost in flexibility,
and some programming languages (such as LISP) support only one type.
Nevertheless, it is broadly accepted that the benefits of data typing
outweigh their limitations, and there has been a trend toward the
development of strongly-typed languages and environments (including
many written in LISP). Most modern programming languages extend the
scope of data types beyond what is required by the compiler. A common
feature is the enumeration type, with which, for example, SIR_Status can
be defined as the set of values {Analysis, Review, Pending acceptance,
Not approved (open), Waiting assignment, Change authorized,
Validated change (closed)}. When SIR_Status is defined with that
value set, only the specific input values will be accepted. (A specifica¬
tion that is both less verbose and less clear is the set {A, R, P, N, W, C,
V}, see Fig. 2.7.)
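An illustrative Ada rendering of that enumeration (with the value names adjusted to legal identifiers and the parenthetical glosses moved into comments) might read:

   type SIR_STATUS is
      (ANALYSIS, REVIEW, PENDING_ACCEPTANCE,
       NOT_APPROVED,                    -- open
       WAITING_ASSIGNMENT, CHANGE_AUTHORIZED,
       VALIDATED_CHANGE);               -- closed
   STATUS : SIR_STATUS := REVIEW;       -- only these seven values are accepted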
If we begin with this interpretation of the data type concept, then we
recognize that each type establishes a class of objects (e.g., reals,
integers, Cl identifiers, SIR status indicators) and the valid operations
on them (e.g., arithmetic operators, concatenation, input, output). One
■ Tasks. These units may operate in parallel with other tasks. The
task specification establishes the name and parameters, the task
body defines its execution, and a task type is a specification that
permits the subsequent declaration of any number of similar
tasks.
Notice that in each case the principle of information hiding relegates the
implementation details to a body of code; there is always a specification
available that simply defines the unit’s interface and (with comments)
describes its use. Because the abstract data types normally are imple-
mented as packages, I shall discuss only that programming unit.
Let us begin with the construction of a package for the stack
abstract data type. Using the algebraic specification described in the
previous section (Fig. 4.6), we can see that the functionality of a stack
is expressed in the visible part of the STACKS package shown in Figure
4.7. It defines the type STACK and presents the four functions
CREATE, PUSH, TOP, and POP that operate on that type. (Ada
reserved words are in bold type.) Notice that the type STACK is limited
private. Both private and limited private imply that the structure of
The following overview draws heavily from Booch’s text, which provides an
excellent introduction to the use of Ada [Booc83]. Ada, by the way, is a
registered trademark of the U. S. Department of Defense.
package STACKS is
   type STACK is limited private;
   function CREATE return STACK;
   procedure PUSH(S : in out STACK; I : in INTEGER);
   procedure TOP(S : in STACK; I : out INTEGER);
   procedure POP(S : in out STACK);
   INTEGER_ERROR, STACK_ERROR, OVERFLOW : exception;
13. In this example I have chosen to call the data type STACK. This can lead
to confusion when distinguishing between the name of the package and the name
of the data type. If I chose the data type name STORAGE, the distinction
between the package and type names in Figure 4.7 would be quite clear. The
result, however, would be a loss in clarity for the package users. The Software
Productivity Consortium recommends the use of a suffix to denote the object’s
class. Using this convention, we would have a package name of STACKS_
PACKAGE and a data type of STACK_TYPE.
private
   type STACK is
      record
         STORE : array (1 .. 100) of INTEGER;
         INDEX : NATURAL range 0 .. 100 := 0;
      end record;
Fig. 4.8. The private part of the Ada STACKS package.
(Naturally, the cost for changing either the private part or the body will
be a recompilation of the affected program units.) The definition shown
in Figure 4.8 is one of an array of maximum size 100. The data
structure used is the record, and a separate variable, INDEX, is
maintained as a pointer. The range for the pointer is from 0 to 100;
negative values are not permitted. I could have defined INDEX to be of
the type INTEGER, but the use of the type NATURAL improves
understandability. For each CREATE, a 100-integer array and an integer
pointer (initialized to zero) will be allocated. The user will have no
access to either component of the record except as defined by PUSH,
TOP, and POP. Notice that there is nothing in the algebraic specifica¬
tion of Figure 4.6 that is comparable to the private part shown in Figure
4.8; the former does not address implementation issues, but the latter
must.
Finally, there is the package body (also hidden from the user), which
implements the STACK operators. Before describing it, however, I will
examine some of the potential dangers we face in creating the implemen-
tation. Returning to Section 4.2.1, recall that I showed how the
algebraic specification of Figure 4.6 was equivalent to the 9 axioms in
the axiomatic specification of Figure 4.5. But I never showed how those
axioms were selected or proved that they ensured the desired behaviors;
I only showed what would happen if one of the axioms was omitted. We
are familiar with stacks, and we accept those 9 axioms; there is no need
to retrace the steps from the “fuzzy thinking” to the clear thinking that
produced them. Yet much of what we will do with encapsulation
techniques (and, in general, software engineering) deals with problems
for which we must create solutions. We may use techniques such as
abstract data types to hide unnecessary details, but the hiding mechanism
does not ensure the correctness, consistency, closure, or completeness of
what is hidden.
Figure 4.9 contains the body of the package. Notice that the axioms
of Figure 4.6 are nowhere to be found. There is a proof obligation to
demonstrate that the procedures are models of the theory expressed in
those axioms (see the discussion of the LST model in Section 1.3.2).
That is, one must prove
      TOP(PUSH(S,I)) = I
      TOP(CREATE) = INTEGERERROR
      POP(PUSH(S,I)) = S
      POP(CREATE) = STACK_ERROR
I leave the proof as an exercise, but it is important to note that the Ada
package can provide only a mechanism for the representation of the
type’s functionality. Verifying that the hidden part does what is implied
remains the developer’s responsibility. In summary, there is a differ-
ence between using a formal representation scheme and using a formal
method. Obviously, the scheme provides assistance, but a difficult
intellectual task remains.
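Although Figure 4.9 is not reproduced here, one possible body for PUSH, consistent with the visible part in Figure 4.7 and the private part in Figure 4.8, is sketched below; it is illustrative only, and it is exactly this kind of code that carries the proof obligation.

   procedure PUSH(S : in out STACK; I : in INTEGER) is
   begin
      if S.INDEX = 100 then        -- the test for overflow
         raise OVERFLOW;
      end if;
      S.INDEX := S.INDEX + 1;
      S.STORE(S.INDEX) := I;       -- so that TOP(PUSH(S,I)) = I can hold
   end PUSH;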
The package presented in Figures 4.7 through 4.9 encapsulates the
concept of a stack as a set of up to 100 integers. This is a very narrow
definition. Some stacks will require a greater capacity, others less.
Some stacks will be integers, others reals or even characters. Of course,
one could simply copy packages and edit them. Ada eliminates the need
for this common implementation practice by supporting the generaliza-
The test for overflow in the body would then be coded as,
package STACKS1
is new STACKGEN(SIZE => 100, ELEM => INTEGER);
package SMALL_BOOL_STACK
is new STACKGEN(SIZE => 5, ELEM => BOOLEAN);
with SMALL_BOOL_STACK;
procedure MAIN_PROGRAM is . . .
declare
S :SMALL_BOOL_STACK.STACK;
Assuming that there was no ambiguity with the type STACK (i.e., the
type name was not used in more than one way in the current program
unit), one could eliminate the need for the package prefix with the use.
Figure 4.11 illustrates how the SMALL_BOOL_STACK might be used.
Notice that use trades convenience for clarity. A quick glance at the
body of Figure 4.11 gives little information about the stack BS. By over-
generic
   SIZE : POSITIVE;
   type ELEM is private;
package STACKGEN is
   type STACK is limited private;
   function CREATE return STACK;
   procedure PUSH(S : in out STACK; I : in ELEM);
   procedure TOP(S : in STACK; I : out ELEM);
   procedure POP(S : in out STACK);
   INTEGER_ERROR, STACK_ERROR, OVERFLOW : exception;
private
   type STACK_SIZE is NATURAL range 0 .. SIZE;
   type STACK is
      record
         STORE : array (1 .. SIZE) of ELEM;
         INDEX : STACK_SIZE := 0;
      end record;
declare
   use SMALL_BOOL_STACK;
   BS : STACK;
   B  : BOOLEAN;
   I, J : INTEGER;
begin
Let us see how abstract data types can help us structure a solution to
this problem.14
The classical approach to this problem would be to use a decomposi-
tion technique, identify processes called Release_CI and Update_CI, and
then write procedural code that incorporates the above conditions. In
terms of Parnas’s KWIC index problem (Section 4.2.1), this would
represent a modularization defined by the flow chart organization. Our
goal, of course, is to use the encapsulation property of the abstract data
type to create a modularization based on design decisions that can be
hidden from the users of the module (i.e., package or object). To
illustrate how this may be done, I begin with a package (CI_RECORD)
that defines the type CI_ID with the following operations:
14. The astute reader will recognize that the objects to be defined in the
following Ada packages are not persistent (i.e., they cease to exist when the main
program terminates). We will ignore that problem by assuming that the package
bodies are extended to data stored in files. Because I am less interested in the
details of an Ada implementation than I am in explaining how abstract data types
are used, I shall ignore the details of ensuring that the stored data persist.
■ NEW_CI This enters a new Cl with the version number 1.1 and
sets the CI_STATUS to Unassigned (i.e., the Cl is not currently
being modified and is available for assignment to a designer for
changes).
package CI_RECORD is
   type CI_ID is limited private;
   function CREATE_CI(CN : in CI_No; D : in Description)
      return CI_ID;
   procedure ACCEPT_CI(C : in out CI_ID);
   procedure ASSIGN_CI(C : in out CI_ID);
   procedure DELETE_CI(C : in out CI_ID);
   procedure EDIT_CI(D : in Description; C : in out CI_ID);
   procedure NEXT_CI(CN : in CI_No; V : in out Version;
      D : out Description; S : out Status; C : out CI_ID);
   function STATUS_ID(C : in CI_ID) return BOOLEAN;
   CREATE_ERROR, ASSIGN_ERROR, DELETE_ERROR,
      UNDEFINED_CI : exception;
private
   type CI_ID is
      record
         ID  : CI_No;        -- Assume type already defined.
         VER : Version;      -- Assume type already defined.
         DES : Description;  -- Assume type already defined.
         CI_STATUS : (Preliminary, Available, Unavailable, Deleted);
      end record;
end CI_RECORD;

package body CI_RECORD is
   function CREATE_CI(CN : in CI_No; D : in Description)
      return CI_ID is
      C : CI_ID;
   begin
      if [some test to see if CI_No is already assigned to a CI_ID]
      then raise CREATE_ERROR;
      end if;
      C.ID := CN; C.VER := "1.1"; C.DES := D;
      C.CI_STATUS := Preliminary;
      return C;
   end CREATE_CI;
end CI_RECORD;
Rather than showing the Ada package for this set of operators, Figure
4.13 contains the pre and postconditions for each of the operators. As
with the VDM example given in Section 2.3.3, the representation in the
figure does not detail how the transformation takes place; it describes
type TASK_CI_DESIGNER_TABLE is
   record
      T : TASK_ID;
      C : CI_ID;
      D : DESIGNER_ID;
   end record;
TASK_CI_DESIGNER_TABLE

   Task    CI       Designer
   1       12345    Joe
   1       23456    Mary
   2       34567    Joe
We now define the operator (procedure) SPLIT that takes the contents
of TASK_CI_DESIGNER_TABLE and creates instances of the ADTs
WORK_PACKAGE and TASK_ASSIGNMENT. In this example, the
results would be as follows:
WORK_PACKAGE TASK_ASSIGNMENT
TASK_CI_DESIGNER_TABLE

   Task    CI       Designer
   1       12345    Joe
   1       12345    Mary
   1       23456    Joe
   1       23456    Mary
   2       34567    Joe
COMBINE(SPLIT(TASK_CI_DESIGNER_TABLE)) ≠ TASK_CI_DESIGNER_TABLE
The previous section on abstract data types showed how one language,
Ada, supported encapsulation. The objective, of course, was to
demonstrate the ADT and not the features of Ada. I now continue the
general discussion of encapsulation techniques with object-oriented
programming (OOP). Clearly, OOP implies a dependence on a
programming language. However, our interest is in the design and
implementation concepts, and I shall deemphasize language specifics.
This section explores encapsulation concepts in the context of modeling-
in-the-small; Section 4.2.4 examines the extension of these ideas to
modeling-in-the-large.
Smalltalk is accepted as the first language to popularize OOP. Its
origins can be traced back to Simula 67, which, as its name suggests, was
intended to support simulations [DaMN70].15 In a discrete simulation
one identifies the objects of interest and the events that activate them,
terminate them, change their state, etc. In Section 3.3 we used JSD to
model the SCM system as a discrete simulation. For example, the life
history of a Cl shown in Figure 3.23 can be interpreted as a simulation
of a CI. For every CI at any point of time, its current status can be
associated with exactly one box; the Jackson diagram also indicates the
states to which the Cl could move next. A common discrete simulation
application is the modeling of a factory. There are machines, machine
operators, and machine options. Several of the machines may be of the
same type or subtype. Both machines and machine operators respond
to events that alter the state of the machine. These events (such as “Set
machine X for process Y”) may have time delays associated with them,
and the simulation reports (for each time slice) the state of each
machine, its output items, its production rate, and so on.
In the JSD simulation we modeled every object of interest as a
sequential process and then combined all the processes as a network. An
15. Simula 67, simply called Simula since 1986, is actually a general-purpose
language, and it is still in use [Pool86].
The first two items on this list are related. In combination, they imply
that each ADT maintains its state. This is a modularization issue.
Notice that when we presented the Ada ADT, it was described as a
package to be imported by a procedure. The procedure (and not the
ADT) maintained the state; the ADT, of course, hid the structural
details of that state from the procedure. In Ada, procedures invoke
other procedures, and many decisions must be made at compile-time.
Thus, the first difference between OOP and imperative languages
such as Pascal and Ada is that they have contrasting philosophies
regarding modularization. In Pascal and Ada, the module is a process,
and the data structures are used to make that process more efficient.
(Review the discussion of stepwise refinement in Section 4.1.3.) With
an OOP, however, the module is an object (or abstract data type) and
its role is both the management of the object’s state and the definition
of the processes (methods) that can act on that state. With an
imperative language, control paths are predetermined at compile-time,
and one module invokes another.17 With OOP there are no prede-
fined control paths. Modules send messages to other modules based on
the dynamics of the processing. That is why Smalltalk is so effective for
managing interactive displays. The developer need not anticipate all the
processing options; he need only define the actions that are valid for an
object in a given state.
Given this different view of a module, the second two OOP
properties are necessary. Dynamic binding supports decision making
based on the state of an object. Clearly, the definition of the process
must be supplied before execution, but the determination of the desired
action will rely on the state information available at the time of
execution. For example, the actions to be taken in moving and clipping
a window will depend, in part, on the size and location of the window,
and this will have been determined during execution. The final OOP
property, inheritance, is useful in generalizing data types. The Ada ADT
does not support inheritance, and it is possible to have an object
orientation without it. Nevertheless, inheritance is generally considered
a prerequisite for OOP.
Now that we have defined four necessary characteristics of an OOP
language, let us stop briefly to review where we have been and where we
17. Again, I am simplifying. Modula, for example, supports coroutines, and
Ada tasks operate asynchronously. See [Wegn87] for a classification of the
language features in the object-oriented paradigm.
are going. Recall that we are still in Section 4.2, entitled Encapsulation
Techniques. We are concerned with how we can use software to package
complex concepts in simpler, more directly comprehensible forms. In
the Parnas definition of information hiding, we seek methods to hide
everything about the implementation of a module that the user does not
need to know and everything about the use of a module that the
implementer does not need to know. Liskov and Zilles showed how to
apply this principle by capturing concepts formally. Finally, Section
4.2.2 illustrated these ideas by demonstrating how Ada could express the
concepts of a stack or a Cl with an ADT. As software engineers, our
principal interest is in the identification, representation, and validation
of concepts. The implementation language that we use is an important,
but secondary detail. Therefore, in the remainder of this section I shall
concentrate on the representational and operational concepts behind
OOP. The goal is not to introduce the reader to some new program-
ming technique. Rather, it is to describe an important modeling
alternative and to show how it can be applied for both implementation
(this section) and design (the next section).
To continue. I have introduced the features of an OOP language
and observed that the essence of this paradigm stems from a decision
regarding modularization. I now reintroduce OOP, but this time I begin
from a modularization perspective. Meyer gives five criteria for
evaluating design methods with respect to modularity [Meye88].
Many of these principles echo what has been said earlier. The section
on structured design (3.2.2) addressed the issue of interfaces and
coupling, and Meyer supplies this section’s third definition of informa-
tion hiding.
What differentiates Meyer’s discussion from those that preceded it
is that his goal is the development of a programming language and
environment that improves modularization by exploiting these five
principles. He concludes his discussion by introducing the open-closed
As we will see in Section 4.2.4, one can use object-oriented design (OOD)
for other than OOP languages. The further one gets from object-
oriented languages, however, the greater the burden for the developers.
As a general rule, we would like our programming tools to enable the
effective expression of our concepts. Therefore, we would expect future
development environments to incorporate the object-based features
described in this chapter.
There are many object-oriented languages other than Smalltalk and
Simula. Three popular languages produce applications coded in C.
C++ is a superset of C and includes some features that have little to do
with object orientedness [Stro86]. Objective-C grafts some Smalltalk
concepts onto a C base [Cox86]. Both C++ and Objective-C use
preprocessors to generate the C code. Eiffel is a complete OOP
language; the output of its compiler is C code. Several OOPs have been
implemented as extensions to a Lisp environment; Loops [BoSt82] and
Flavors [Cann80] are the most frequently cited examples. Finally, there
are OOP environments that operate in personal computers; the March
1989 BYTE contains an overview and summary of object-oriented
resources. Eiffel is perhaps the best language for explaining OOP
concepts, and I shall use it in the following examples.
18. At the end of Section 2.3.1 a similar example was given with CIs that were
classified by both size (small and large) and language (Ada and C).
As with the example in Figure 4.12, I shall ignore the fact that we are
dealing with persistent objects.
I start with a demonstration of information hiding. Assume that we
have created the class CI with the above routines (features or opera-
tions). The code generated from Eiffel will use the class definition as
a template for creating instances (objects). To see how this is done,
consider the following fragment of some class called X. (Remember, all
modules are classes.)
ci_1: CI
This declares that the variable (Meyer prefers the term entity) ci_1 is of
type CI. This is quite similar to the statement in the Ada illustration of
Figure 4.11.
BS :STACK;
In each case, the type has a complex and hidden structure. With the
stack example, storage will be allocated to BS by the compiler; for ci_1,
on the other hand, a type has been associated with a label, but the
object does not exist (i.e., it does not yet have state). To create the
object, Eiffel supplies the predefined feature Create, which is available
for all classes.
ci_1.Create
This creates an instance of the object associated with ci_1. We now can
apply the operations listed above. For example,
declares that the object is a new Cl with the attribute values of number
"123456", descriptive title "Square root program", category "S" for source
code, and language "A" for Ada. Naturally, the user of the feature must
know the interface, but nothing more.
The feature to accept a program after update might be expressed as
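(sketched here; the exact signature is an assumption)

    ci_1.accept_ci ("123456", "V")
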
This interface provides only the CI number ("123456") and the fact that
the update represents a version ("V") or release ("R"). Of course, we
also could have defined some feature that binds a CI number with an
entity (say, open_ci). Using that feature would have implied that the CI
number of ci_1 already was bound to "123456", and the first parameter
in accept_ci would have been unnecessary. Finally, a sequence of code
to assign a CI for modification might first test the status of a CI before
assigning it to a designer (in this case to Joe) as follows.
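A sketch of such a sequence, assuming that status_ci returns the current status and that assign_ci takes the CI number and the designer's name, might read

    if ci_1.status_ci ("123456") = Available then
        ci_1.assign_ci ("123456", "Joe")
    end
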
Figure 4.14 displays the class definition for the above features. A
few comments are obvious. To begin with, Eiffel does not support the
enumerated type that we saw with Ada. There are two reasons for this.
First, Eiffel has been designed to be a “small” (Meyer uses the term
“ascetic”) language; it concentrates on devising powerful notations for
the most advanced and common constructs rather than shorthands for
all possible cases. Second, as an object-oriented language, the designer
is free to define classes for types when that is the best way to express the
concept. Another valid comment about the class definition in Figure
4.14 is that it does not seem to do much error checking. In accept_ci,
for instance, the code allows the processing of CIs that already are in
the Available status. I omitted that test for a reason. Refer back to
Figure 4.13 and observe how preconditions and postconditions were used
class CI export
    new_ci, accept_ci, status_ci, assign_ci, edit_ci_info, edit_ci_text
feature
    ci_no: INTEGER ;
    ci_name: STRING ;
    ci_type: (Source, Object, Test_data, Description) ;    [Invalid Eiffel
    ci_lang: (Ada, C) ;                                      syntax, used
    ci_update: (Version, Release) ;                          for illustration
    ci_status: (Available, Unavailable) ;                    only]
    ci_developer: STRING ;
I used this notation because the declarative statement was much more
direct than an equivalent procedural selection (i.e., if . . . then . . .
else). Eiffel provides preconditions (called require) and postconditions
(called ensure). There also is an invariants clause that represents
■ CI_C. This would be a class with the attributes s_file and o_file.
It would use the C editor for the feature edit_ci_text.
and assuming that a valid CI number was bound to each entity label
(e.g., ci_1 represents the object with CI number "123456"), then the
statement

ci_1.edit_ci_text

presents no difficulty, and neither does

ci_3.edit_ci_text

But consider

ci_2.edit_ci_text
The feature has been defined for the class, but there is not enough
information in the object’s state to decide which editor to invoke.
Because we would not expect edit_ci_text to be valid except for
subclasses of CI, we ought to define this feature in CI as follows:
edit_ci_text is
    require (the CI number is valid)
    deferred
end ; -- edit_ci_text
Here, deferred implies that the definition of this feature is deferred for
definition in descendent classes; the feature will be valid only for those
subclasses.
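For instance, a descendant class might supply the missing body; the following is a sketch of my own, with the editor call left informal.

    class CI_ADA export
        ...
    inherit
        CI
    feature
        edit_ci_text is
            do
                -- invoke the Ada-specific editor on the CI text
            end ; -- edit_ci_text
    end -- class CI_ADA
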
The class hierarchy controls other operations as well. One can
write the statement,
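(sketched here with placeholder entity names and test)

    if some_test then
        ci_2 := ci_ada_1      -- ci_ada_1 is declared of type CI_ADA
    else
        ci_2 := ci_doc_1      -- ci_doc_1 is declared of type CI_DOC
    end
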
After execution, ci_2, which was declared to be of class (type) CI, will
be of either type CI_ADA or CI_DOC (depending on the outcome of
“some test”). Now, the statement
ci_2.edit_ci_text
will be valid. The type associated with ci_2 was changed during
execution (which is an illustration of dynamic binding), and the
processing of the edit text feature will be determined by the object’s
current type (which is an example of polymorphism).
The instances of inheritance presented so far have all been of a
hierarchical form. Meyer’s seventh level of object happiness requires
more. To illustrate this highest level, note that HHI might want two
ways of subdividing their program CIs: by language (Ada or C) and by
operating environment (Unix or DOS). Some features may vary with
language, and others with environment. Multiple inheritance is
required. To see how Eiffel supports it, assume that there already is a
class called CI_UNIX. The following would then create the class of Ada
programs that run under Unix.
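A sketch of such a class (the export list and feature details are omitted here):

    class CI_ADA_UNIX export
        ...
    inherit
        CI_ADA ;
        CI_UNIX
    feature
        ...
    end -- class CI_ADA_UNIX
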
x: STACK [CI];
That simple line certainly hides a lot of information from its user.
Given an object-based language (in this case Ada), Booch then showed
how to design a system with real-world abstractions that map onto
implementation representations.
In his current writings, Booch proposes a four-step life cycle for
OOD [Booc91, pp. 198-206],
In his initial work with Ada, the key decisions were related to the
identification of classes or objects that could be realized as Ada
packages. Booch developed a diagramming convention for Ada
packages, subprograms, and tasks that has found widespread acceptance.
The basic symbols are shown in Figure 4.15. Objects are indicated by a
closed, free-form unit. A subprogram is shown by a rectangle with a
small field at the top (representing the subprogram specification) and a
larger field at the bottom (for the body). The package has symbols on
the left border of the box. Oval symbols indicate abstract data types
(data objects), and rectangular symbols name the package’s operations.
Generic packages and subprograms are shown with dotted lines, and a
task (not to be confused with Task in this example) has the basic form
of a parallelogram. One may nest symbols (or networks of symbols)
[Figure 4.15. Basic symbols of the Booch notation, including those for a Subprogram and a Task.]
Booch's more recent notation supplements the object and module diagrams with state transition diagrams, timing
diagrams, and process diagrams. Figure 4.16 contains a simple class
diagram that shows the class of Status Report, which produces status
report objects. Each status report lists out the active CRs and SIRs
along with the name of the responsible analyst. The figure displays the
inheritance relationship between both CR and SIR and Requests for
Mods as well as that between Analyst and Employee. The class Status
Report uses Report Utilities. The class utility symbol is the shaded class
icon, and the relationship is indicated by the double line; the white
circle represents the interface. The implementation of Status Report
uses the three classes connected to it by the double line; the black circle
indicates this implementation detail. The solid directed line is used for
inherits; a dashed directed line (not shown in the figure) is used for the
instantiates relationship.
I conclude this discussion of OOD with an illustration of the module
diagramming technique. As befits the title of this chapter, the example
is implementation oriented; it reflects Booch’s earlier methods. The
first step in the process is the identification of the objects and their
attributes. In his 1983 book on Ada, Booch adapted a technique
developed by Abbott to extract both the objects and their operations
from a text description [Abbo83]. Variations of this strategy are used
with entity-relationship models and JSD. Here is a brief illustration of
the method; I begin with a description of the SCM system processing.
This is simply a restatement of the first step used in the JSD example of
Section 3.3.2. It really is at too high a level for this illustration, but it
facilitates a comparison between OOD and JSD.
With this text description of a process (algorithm), we can proceed
by analyzing its contents. We begin by identifying the nouns. These
may be classified as common nouns (i.e., a class of things such as vehicle
or table), mass nouns and units of measure (i.e., constraining characteris¬
tics or groupings as in the case of “traffic,” which refers to a collection
of vehicles), and proper names and nouns of direct reference (i.e., names
of specific entities such as Ford truck or a specific table such as
Assigned_CI_Table). The first two noun categories define types (classes
or ADTs), and the third identifies individual objects within a type.
Adjectives indicate attributes. Naturally, the assignment of terms to
these categories depends on the semantic context and the level of
description. This language analysis, therefore, is not a mechanical way
to design a system, but it does provide a way to start. We first underline
the nouns.
As with the JSD example, there are three primary objects, CRs, CIs, and
tasks. I limit myself to those three common nouns because I choose to
treat NAR and ECP as attributes of CR, and Cl file and status as
attributes of Cl. Clearly, I am reading between the lines.
The next step is to repeat this process with the verbs to identify the
operations on the objects. We speak of operations suffered by an object
(i.e., those that generally result from the receipt of a message or input
by the object) and operations required of an object (i.e., those that
generally result in the sending of an message or output by the object).
Underlining the verbs, we get,
20. Now that Booch’s primary concern is for objects and classes, he no longer
finds this method very helpful. In a personal correspondence, he writes, “The use
of identifying nouns and verbs is dated, as it has proved to be of limited utility.
As I describe in the OOD book [Booc91], techniques such as CRC cards [class,
responsibility, collaboration, see BeCu89], domain analysis, and various classifica¬
tion paradigms, are much more powerful.” I include the example here because
there are frequent references to it, and the reader should be aware of the concept
and—in particular—its limitations.
21. The first edition of this book was published in 1990. The second edition
changed some of the terminology and notation, and this discussion uses that of
the second edition. In some cases I include in parentheses terms used only in the
first edition.
One builds this model by first identifying the class-&-objects and then
establishing their dependencies using semantic data modeling. Next, the
details are reduced by grouping the objects into subjects. Finally, the
object data structures (attributes) and operations (services) are defined.
Obviously, an example will help.
In what follows I summarize the method presented in [CoYo91].
That book contains sample illustrations and considerable explanation.
I have assumed that the reader is familiar with the object-oriented
paradigm and the SCM problem; thus my presentation is less complete
than that of Coad and Yourdon. Naturally, the interested reader should
return to the source. I begin with step 1, Identifying Class-&-Objects.
From the previous example, the reader knows that I will identify three:
CRs, CIs, and Tasks. If I were an analyst starting from scratch, however,
how would I determine what the objects were? First I would look at the
problem space (real world of HHI in this case) to identify the objects
(data and processing), which I would expect to see in the target system,
that interact with objects external to the system. I might identify objects
by reviewing the work flow, asking questions about the operations,
reading the existing documentation, or simply looking at pictures and
diagrams. In selecting the objects, I would concern myself with their
structure and interactions. For example, Change Requests (CR) and
Software Incident Reports (SIR) are both specializations of a generic
change request object. A prerequisite for an object is that it must
“remember” something of importance to the system (i.e., have state).
Other items for consideration include the roles played, the number of
attributes, services or attributes shared with other objects, etc. During
the process of identifying objects, some will be observed to have
marginal value (e.g., they remember nothing of importance, they
represent unique instances, or their data can be derived); these potential
objects should be eliminated. Finally, we select the objects of interest
to our model. They are given names consisting of a singular noun or
adjective + noun. We are building a vocabulary for our model, and the
names should be readable and clear. Figure 4.18 contains a collection
of Class-&-Objects that might be selected after the first iteration of step
1. The symbol for a class-&-object is a three component box inside a
shaded box. The name goes into the top section; the next two sections
are used to list the attributes and services.
Step 2, Identifying Structure, joins the identified objects by either
Gen-Spec (ISA or inheritance, formerly called classification) or Whole-
Part (IS_PART_OF, formerly called assembly). During the process of
identifying the structure, new objects may be defined. For instance, by
generalizing, a new object could be defined that heads a Gen-Spec
structure (e.g., Generic CR has been created as a generalization of the
two specific change request categories, CR and SIR). Alternatively, one
might enumerate a set of objects to create a Whole-Part structure (e.g.,
22. These concepts are described in the discussion of Figure 2.6. The first
edition used the third notation shown in that figure.
[Figure: the Class-&-Object model for the SCM example. A Gen-Spec structure relates Generic CR to its specializations CR and SIR; Whole-Part structures relate CI to CI-Info, CI-Text, and CI-Version (with cardinalities such as 0,1 and 0,m); the subjects are 1. CR, 2. Task, and 3. CI. Representative attributes and services include Type, Submitter, CI-Number, CI-Name, Version-No, Status, Add, Delete, Edit, Copy, Create version, Delete version, Assign, Accept, and Display.]
attribute <...>
attribute <...>
attribute <...>
externalInput <...>
externalOutput <...>
notes <...>
and, as needed,
traceabilityCodes
applicationStateCodes
timeRequirements
memoryRequirements
of their programs, but they should use a rigorous approach that implies
that they could complete the proof if given the time.
If the programmer who wants to prove his programs begins with only
a general description of an algorithm (say for the greatest common
divisor), what separates him from the hacker? Both begin without a
formal specification. The hacker writes a program and uses his
interactions with the computer to refine his algorithm. Once finished,
he reviews his code and then “proves” to his satisfaction that it is
correct. The deliberate programmer, on the other hand, develops a
formal specification as he develops his code, and he proves that his code
is correct with respect to that specification. That is, the deliberate
programmer takes a very different approach to programming. Of course,
some things are facilitated by informality. Tailoring a window interface
to a particular application benefits from exploratory development. But
if you lived next to a nuclear power plant, how comfortable would you be
knowing that a hacker developed the safety system?
In what follows, I first borrow from what Gries calls The Science of
Programming [Grie81]. I begin with the book’s motivating introduction,
“Why Use Logic? Why Prove Programs Correct?” The answers are
derived from experience with a very simple task: write a program for
division by means of repeated subtraction. (The program will be used
on a device with no integer division.) Our initial program stores in r
and q the remainder and quotient of x divided by y:
r := x; q := 0;
while r > y do
begin r := r-y; q := q+1 end;
We first debug this program by running some test data to see what we
find. Of course, if we run the program with a lot of data, we will have
to look at a great deal of output. Therefore, we will just check to see
if the results are reasonable. We know that there is a precondition that
y must be positive, and that after the computation
x = y*q + r
must hold. We can code these assertions as tests and list the results
only when we find a test case that fails.
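In the same informal notation, such a check might look like the following (the output statement is illustrative only):

    if (y <= 0) or (x <> y*q + r) then
        write('test fails for x = ', x, ' y = ', y)
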
Success. No failures are listed. Unfortunately, after a while,
someone finds a problem: the values
x = 6, y = 3, q = 1, r = 3
are reported. These values satisfy the check x = y*q + r, yet the result clearly
is wrong: with y = 3 the remainder must be less than 3.
Finally we have a correct program. But we did not prove it correct; the
discovery of errors forced us to make it correct. As a result of
debugging the program, we established the proper assertions. The
finished program with the assertions is shown in Figure 4.24. We have
made our assertions as strong as possible, and we know that if these
assertions hold, the output will always be correct. Correct, of course,
with respect to our definition of division.
Let us examine what we have done here. The example began with
a straightforward solution to the problem. We wrote the code and then,
in the implementation domain (the solution space), we proceeded to test
if we had a proper solution. Two errors ultimately were discovered, and
the result was a correct program. However, there is an alternative. We
could have remained in the application domain (the problem space) and
sought a solution to the problem. This would force us to think through
the algorithm’s pre and postconditions rather than discover them after
we encountered a problem. Look at the first and last assertion in the
final program. They define division. Given this definition, it is not too
difficult to find a sequence of state changes that will guarantee that, if
the precondition is true, then the postcondition will be true (assuming,
of course, that the computer does not fail). And this is what we mean
when we speak of proof of correctness. It represents a systematic
method for thinking about the problem solution.
{y > 0}
r := x; q := 0;
while r >= y do
begin r := r-y; q := q+1 end;
{x = y*q + r and r < y}
The precondition for CI is also the precondition for Design, and the
postcondition for Design ({CI-No exists and status = active}) serves as the
precondition for Maintain. Similarly, the postcondition for Maintain
({CI-No exists and status = inactive}) is the precondition for Retire. The
process can be repeated by expanding the Design and Maintain
sequences until the transition from each pre to postcondition is refined
enough to permit the development of routines that guarantee the
assertions hold. Reexamine the division program in this context and
notice how each statement’s postcondition serves as the next statement’s
precondition. Observe, too, how we have let the proof process help
guide the design activity.
Proof of correctness changes our perception of programming.
Rather than thinking about how the program works, we consider what
we expect to be true about the program state at each transition. That
is, we think about the problem we wish to solve in the mathematical
context of the solution space. In the remainder of this section I
illustrate this idea with an example taken from a paper by Hoare
[Hoar87]. I present only the material that relates to program proofs,
and I recommend the paper as an excellent overview of some formal
methods for program design. The illustration that Hoare uses is a
simple one: writing a greatest common divisor (GCD) program.
Although the problem may be considered trivial, it is best to use a
simple problem for a complex subject. The methods do scale up. As
noted in Section 2.3.3, formal methods are being used effectively for
large projects; furthermore, in the next chapter I will discuss the
cleanroom technique, which also relies on this proof technique.
The first step in this example is to define what we mean by a GCD.
We are free to use any tools for producing a mathematically meaningful
definition that enables the reader to understand what a GCD is. Hoare
defines the necessary relationship between the parameters (x,y) and their
GCD (z) as follows.
D1.1 z divides x
D1.2 z divides y
D1.3 z is the greatest of the set of numbers satisfying D1.1 and
D1.2.
D1.4 [definition of “p divides q”]
D1.5 [definition of “p is the greatest member of a set S”]
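The two bracketed definitions can be written out in the usual way (the wording is mine, not Hoare's):

    D1.4  p divides q if and only if there is a positive integer k such that q = k*p.
    D1.5  p is the greatest member of a set S if and only if p is in S and,
          for every member s of S, s <= p.
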
Of course, the proof was rigorous but not formal. Nevertheless, the
proof has given us confidence that we are pursuing a reasonable goal.
As a rule, there is a tradeoff between generality and efficiency. It
would not be difficult to transform the above specification into a logic
program, but it would operate very slowly. Because we desire an
efficient program, we will use a more restrictive algebraic notation that
does not allow disjunction and negation—only conjunction. By reasoning
about the problem, we derive a set of algebraic equations for the gcd
function. The claim that the computed result r satisfies
r = gcd(p,q)
can be proved solely from the algebraic specification and the previously
known laws of arithmetic. We prove this by induction. Assume that this
is true for all p and q strictly less than N. From L2.1, this is so for N
= 2 (i.e., x = 1). Then there are four cases.
P4.2 Z = gcd(x,y)
N, X, Y := 0, x, y
{P4.3} 0 {P4.2}
to give us
Only the middle task, O', remains to be developed. Hoare suggests that
this be split again into four subtasks in accordance with the following
series of intermediate assertions.
The task O' would, in turn, be expressed as four tasks, but the effect of
the program would remain the same. Once the sequential tasks are
defined, we will have proven that given P4.1, the result will be P4.2.
Moreover, the annotated program is the proof. And that is what we mean
by proof of correctness.
24. One does not use this kind of notation except for pedagogic purposes.
The subdivision of O' would present a formidable notational challenge. The
normal convention is to integrate the assertions into the structured code, as was
done in the integer divide example shown in Figure 4.24.
As the Harris drawing indicates, some things are more difficult to prove
than others. Fortunately, there is an external reality against which this
scientist (the one on the right) can test his hypothesis. That is how the
figures for the speed of light in a vacuum and in water were verified. I
assume that he extrapolated from that model to arrive at the value of
843,000 years per inch for a brick wall. Unless he has an extraordinarily
long-lived light bulb, however, he will have trouble with this experiment.
The software engineer has an equally difficult assignment. Wearing
his verification and validation (V&V) hat, he is to answer the question,
“How good is this software?” Where the modeling of the previous three
chapters concerned the forward-looking, problem-solving aspect of
software development, this chapter examines V&V—the backward¬
looking, solution-assessment counterpart to modeling. V&V may be
viewed as the shining light that makes the software better by either
ensuring that it is right the first time or by removing errors from
software perceived to be good. Naturally, it progresses better through
a clear medium than a brick wall.
Most errors are made early in the development process, and the
most serious of these are those that persist the longest. Therefore, there
is an incentive to remove mistakes before the programs are coded. Of
course, the only way to identify these defects is by reflection and
analysis. When we work with formalisms, they help us identify errors;
if the problem or its solution does not have a formal expression, then
there is no alternative but to think about the problem and its solution.
Reviews, inspections, and walkthroughs are methods for organizing
collective thinking about the recognition and removal of errors.
Once the programs exist, they can be tested to see if they are free
of bugs. There are two basic approaches to machine testing. We can
test programs according to how they have been built. This is structural,
or white box testing. Alternatively, we can test programs with respect
to what they should do. This is functional, or black box testing. In
practice we do both, first with each program and then with components
made of programs. In each case we look for errors, and success
disappoints us; it comes only when we find a previously undetected
error.
If this book were a symphony, then this chapter would be its third
movement. Chapter 1 introduced the theme, and the three modeling
chapters could be interpreted as one long, slow movement. This chapter
on verification and validation (V&V) would then be presented as a
dance, and the book would conclude with a management melody. But,
to continue this analogy, the climax comes—as with Beethoven’s Ninth—in
the longest movement. Problem solving in the construction of the
models is really the most important aspect of the software process.
V&V and management are supporting activities; isolating them for
independent analyses is something of an anticlimax. Nevertheless, each
view is an important part of the process, and the principle of separation
of concerns permits us to break off parts from the whole to examine
them more carefully.
Recall that I described an essential software process in Chapter 1
(Fig. 1.5). It presented the process as a transformation from a need in
the application domain into a software solution that executed in the
implementation domain. Two categories of model were required.
Conceptual models described the problem and its solution in the context
of the application domain, and formal models specified the characteris¬
tics (e.g., behavior and performance) of the software that would realize
the solution. The process had several properties.
■ For any conceptual model there are many formal models that can
produce a desired software solution. Subjective judgment is
required to select a formal model that will be, in some sense, the
best. Similarly, for any formal model there are many implemen¬
tations that will be correct with respect to it.
is deferred until that later phase. The principal concern of “Test and
Preoperations,” however, is integration. It is assumed that prior to
“Code and Debug” we have a correct and valid design and, as a result of
debugging, we also have correct program modules. Thus, in “Test and
Preoperations” we really are establishing that our design indeed was
correct and valid. Now that we have finished building the software, we
integrate it into its operational environment to certify that the system
is right and that this is the right system. Naturally, this would be a
terrible time to find out that we have been wrong; after all, at this point
the implementation is complete.
Of course, the waterfall flow does not wait until the product is coded
before establishing validity. Each precoding phase in Figure 1.2
culminates with a validation activity, thereby ensuring that the require¬
ments or design documents are valid before they are passed on to the
next phase. We know that the cost to correct a defect can be 100 times
more expensive if it is not discovered until the product has been
released. Thus, the early detection of faults is essential. Yet the 40-20-
40 rule (which provides a guideline for the distribution of effort prior
to coding, for coding, and after coding) suggests that a traditional
project organization devotes considerable effort to after-the-fact analysis.
Obviously, this allocation of effort indicates that many problems cannot
be identified until the software is complete. But that is not how we
build a bridge. We know the design is correct before construction
begins; the first 50 cars to cross it do not constitute a test. Should
software be different?
My prejudice is that software should not be different. We should
solve as many problems as possible as early as possible. We should
never defer an honest appraisal until some later time. Neither should
we assume that some independent group can find or fix our imperfect
solutions. There are some management techniques that can help us get
better solutions, and these will be considered in the following chapter.
Nevertheless, the key to quality software is the developers’ continuing
and serious critiquing of their decisions. If modeling is the forward-
looking aspect of problem solving, then V&V constitutes a backward¬
looking examination of each solution in its fullest context. To be
effective, all problem solving requires an evaluation that closes the loop,
and in this chapter we shall consider evaluation methods. Because there
are many methods for modeling, there also will be many methods for
evaluating the decisions expressed in a model.
When the model is formal (as with VDM), methods exist to
determine if the design or implementation is correct. This is the
advantage that a formal method offers. It allows us to reason about
concepts in the abstract and have confidence that our products will
implement those concepts correctly. When the model is not formal (as
with most requirements documents), then we can find errors only by
Naturally, this was before the Challenger accident that so altered the
NASA schedules, and the launch date was less than six years away.
Technical and budgetary problems were everywhere. Perkin-Elmer was
having trouble with the design of the Fine Guidance Sensors, which
would keep the telescope pointed to celestial targets with high
precision; it was not certain that the instrument would be ready for
launch. C. Robert O’Dell, the chief scientist, and the astronomers on
his science advisory panel felt that they were spending 25 hours per day
on budget battles. “I found myself reacting to crises instead of trying to
do the job right.” In this setting, the task of mirror polishing seemed
to require only routine attention. The company already had made a
number of mirrors for the intelligence community; moreover, we have
been grinding mirrors for more than a century.
The main problem faced by the polishing group was one of
precision. To take advantage of the airless clarity of space, the mirror
had to be accurate within 1/65 the wavelength of a helium-neon laser.
The polishing mechanism used the standard technique, but the device to
test the accuracy of the curve would be a variation of the “null
corrector” normally used. Whereas the null corrector helped technicians
identify surface imperfections by shining light onto the mirror face
through a set of lenses, the new “reflective null” used a laser and
carefully calibrated mirrors. The reflective null would be capable of
identifying Hubble mirror surface imperfections on the order of a
thousandth of a wavelength. And so, from 1980 to 1981, the polishing
proceeded under the direction of the operations division while the
experts, who wrote the proposal and designed the instruments, worked
on other tasks.
When the roughly shaped mirror was received, the reflective null
indicated that the surface had a spherical aberration of about half a
wavelength, which ultimately would be removed. Some minor problems
emerged, but they were readily explained away. At the start, when
assembling the reflective null corrector, it was found that the adjustment
screws would not turn far enough. A 1.3-millimeter-thick spacer was
inserted, and the corrector was certified as “correct.” Soon thereafter,
the opticians tried to double check the alignment of the reflective null
with a second device, but when the test failed, the testers accepted the
certified device as the more accurate. Finally, after the polishing was
complete, a double check was made with a third instrument. Again it
failed, and the certified device was trusted once more. In fact, over¬
whelmed by massive cost overruns and schedule slippages, the Perkin-
Elmer technicians did not want to share their results with NASA, and
NASA seemed equally content not to ask for them. In this environment,
additional confirmatory tests would not be well received; there was
neither time nor money for them. The result, as we now know, was a
spherical aberration that images each star in a halo of fuzz. From the
post facto analysis, we have found out that a crucial lens in the reflective
null was 1.308 millimeters out of position. Thus, this lens was polished
with the desired degree of precision, but to the wrong curve.
What does this example teach us? Even when we are confident in
what we are doing, we still are obliged to test and examine all our
decisions as thoroughly as possible. It is NASA policy always to cross
check their designs. There appear to have been only two exceptions to
that policy. The first was with the O-rings that destroyed Challenger,
and the second was with the Hubble Space Telescope mirror. Need I say
more about the importance of being earnest?
■ There are methods for learning more about the application under
development and the efficacy of a proposed solution. Particular
techniques include the use of simulations and prototypes (both
automated and manual).
The methods and tools that we use will depend on the type of applica¬
tion, the environment in which the product is being developed, and the
development stage.
For requirements and design, Boehm suggests the V&V techniques
shown in Table 5.1. Notice that, except for common checklists and
automated cross-referencing systems, there are no standard tools. The
V&V techniques tend to be application specific; each must be tailored
to a particular problem. It is assumed that the more one knows about
the problem and the proposed solution, the better the quality of that
solution will be. In this sense, Table 5.1 does not distinguish between
the knowledge necessary to specify or design the solution and that needed
to verify and validate that it is a good solution. The forward-looking
activity of design and the backward-looking task of V&V are but two
perspectives within a single problem-solving process. The advantage of
V&V is that it provides respite during which we can examine a stable set
of decisions free from other concerns. Without this distinct V&V
process, we would accept our decisions solely on the basis of the
arguments that induced us to make them.
The techniques in Table 5.1 are quite varied; indeed, we could add
to them many of the modeling tools described in Chapter 2. Which
tools we choose will depend on the immediate objectives and project
constraints. In the cited paper, Boehm presents a table that ranks the
listed techniques according to different criteria. For example, he
indicates that manual cross-referencing is very effective for completeness,
consistency, and traceability for small projects, but it is only moderately
effective when the projects are large. Automated cross-referencing, on
the other hand, is shown to be very effective for both large and small
projects. Reading is more effective for small projects than large
projects; checklists are most effective for human engineering, maintain¬
ability, and reliability (a fact that suggests that the checklists must be
somewhat domain specific); and the two detailed automated techniques
are the most effective methods for verifying resource engineering
concerns. Clearly, the diversity of software applications leads to an
assortment of approaches to V&V.
Figure 5.1 contains a different matrix of tools versus goals, this one
prepared by Jones [Jone79]. It is limited to just four categories of
technique and five classes of problem. In his experience, machine
testing (i.e., testing of the programs) is the least effective method for
Notice that management was not mentioned. The goal of this kind of
review is to examine the technical decisions. The presence of managers,
who ultimately will evaluate the participants, is conducive to neither the
free exchange of ideas nor the open admission of failure. As I will
discuss in the next chapter, management does have important responsi¬
bilities in making the reviews work effectively, but attending reviews is
not one of them.
An important guideline for the conduct of a review is that the
material to be examined must be mature and all the participants must
be prepared to discuss it. If these conditions are not both satisfied, then the
leader should cancel the review and document the reason in the report.
The Handbook offers considerable advice to the review leader. For
example, the following is extracted from a summary checklist [FrWe90,
p. 114].
■ Before the review. Is the product ready for review? Are all the
relevant materials in your possession? Have they been distribut¬
ed to the participants? Have the participants confirmed their
acceptance? Has the conference room been scheduled?
In general, the same team will be formed for both the Design I1 and the
Code I2.
The inspection process consists of five steps (in Fagan's terms: overview,
preparation, inspection, rework, and follow-up). The recommended
rate of coverage for the material is 130 NCSS per hour for I1 and
150 NCSS per hour for I2.
Although Fagan initially defined his inspection method for the design
and coding of programs, others have extended the technique to other
development activities. Dunn provides extended checklists for require¬
ments and design [Dunn84, pp. 95-98] as well as code [pp. 121-123].
Table 5.2 illustrates a checklist, taken from another source, prepared for
a requirements inspection. Examples of I1 and I2 checklists are included
in [Faga76]; they contain questions such as, are all constants defined?
Clearly, the kinds of questions asked will reflect the kinds of errors most
commonly found.
One of the advantages of the formal approach suggested by Fagan
is the fact that it collects statistics that can be used both to help in the
identification of errors and in the evaluation of the process. For
example, a 4,439-NCSS test case that he presented in his 1976 paper
showed that an average of 38 errors were found per thousand NCSS
during the I1 and I2 inspections, and an additional 8 errors per thousand
NCSS were discovered during preparation for acceptance test. No
subsequent errors were encountered; that is, the inspection process
found 82% of the errors, and the product was delivered defect free.
Because the inspection process also retained a record of each type of
error found, Fagan was able to compile the distributions summarized in
Table 5.3. Although the sample is far too small to make any general
observations, notice how many of the errors were related to missing or
extra code. The former usually are an indication that some requirements
have been omitted; the latter represent a maintenance problem (i.e., the
delivered product has undocumented and unexpected features).
Not all reviews need be devoted to finding errors. For example, they
frequently are used to educate team members new to the project or
organization. Naturally, the reviews should be continued after the new
members have become indoctrinated; we seldom run out of errors. No
matter which form of walkthrough or inspection method is implemented,
Completeness
1. Are all sources of input identified?
2. What is the total input space?
3. Are there any timing constraints on the inputs?
4. Are all types of outputs identified?
5. What are all the types of runs?
6. What is the input space and output space for each type of
run?
7. Is the invocation mechanism for each run type defined?
8. Are all environmental constraints defined?
9. Are all necessary performance requirements defined?
Ambiguity
1. Are all special terms clearly defined?
2. Does each sentence have a single interpretation in the proper
domain?
3. Is the input-to-output mapping clearly defined for each type
of run?
Consistency
1. Do any of the designated requirements conflict with the
descriptive material?
2. Are there any input states that are mapped to more than one
output state?
3. Is a consistent set of quantitative units used? Are all numeric
quantities consistent?
Name of error
    Design error                        24.4
    Logic error                 39.8    26.4
    Prologue/prose error        17.1    14.9
    All other errors            43.1    34.3
Class of error
    Missing                     57      35
    Wrong                       32      53
    Extra                       11      12
■ A TRW study of reported errors found that 63% could have been
found by code inspections and that 58% could have been found
by design inspections [ThLN78].
The logic behind the utility of inspections is obvious. When the only
product the development team has is text, the only way to find errors is
to establish an organized method to look for them. Technology can help
in removing some of the syntactically identifiable problems, but the hard
work still has to be done by us humans.
I have used the term “bug” several times now. Its use in computing
has been traced by Grace Hopper to the final days of World War II,
when she was part of a team working to build the Mark II, a large relay
computer. One hot summer evening the computer failed, and when they
located the defective relay they found a moth in it. The moth and
explanation were entered into the logbook (see Fig. 5.2). From then on,
whenever Howard Aiken asked if the team was “making any numbers,”
negative responses were given with the explanation, “we were debugging
the computer” [Hopp81]. This idea of a bug has a natural appeal. It
evokes memories of warm summer evenings with background noises and
gnats converging on a cool drink. Benign bugs. Annoying, but
unavoidable. Certainly a more forgiving term than error or defect or
fault or failure or mistake. Frankly, I find it too forgiving a term,
nevertheless, throughout the remainder of this section I will use it.
Testing finds the bugs and debugging removes them.
In his The Art of Software Testing, Myers describes the objective of
testing as follows.
A good test case is one that has a high probability of detecting an as-
yet undiscovered error.
Fig. 5.2. Portion of the log page showing the original bug.
(Photograph courtesy Naval Museum, Naval Surface Weapons Center.)
Thus, we are now in the business of trying to break things that the
programmer believes to be of production quality. We cannot prove
correctness; the most that we can hope for is an increased confidence
that all the most damaging bugs have been found. Myers describes
testing as an extremely creative and intellectually challenging task; just
like design, testing is an art.
Adrion, Branstad, and Cherniavsky observe that, before we begin to
test, five essential components are required [AdBC82].
■ Identify the bugs. This step involves the examination of the test
outputs and the documentation of the test results. If bugs are
detected, then this fact is reported, and the activity reverts to a
debugging phase. After debugging, there is an iteration of the
present step. All passed tests should be repeated with the revised
program. This is called regression testing, and it can discover
errors introduced during the debugging process. If no bugs are
found, then the quality of the test data set should be questioned.
If confidence in the product is low, then the test data set will be
augmented, and the search for bugs continues. If it is believed
that sufficient testing has been conducted, then this fact is
reported, and the testing for this particular product is complete.
Error-Locating Principles
Think.
If you reach an impasse, sleep on it.
If you reach an impasse, describe the problem to someone else.
Use debugging tools only as a second resort.
Avoid experimentation. Use it only as a last resort.
Error-Repairing Principles
Where there is one bug, there is likely to be another.
Fix the error, not just the symptom of it.
The probability of the fix being correct is not 100%.
The probability of the fix being correct drops as the size of the
program increases.
Although somewhat out of date, Myers’s book remains one of the
most insightful and readable introductions to the art of testing.
While the introduction to this section may have brought the reader to
Phase 2, I would hope that the introduction to this chapter is at Phase
4. (Substitute V&V for testing.) In testing we are looking for those
persistent errors that remain after inspections, standards, and good
design methods and tools. We hope that not too many bugs remain, and
we want to develop skills that will eliminate them with as little effort as
possible.
When looking for bugs in the program, we have two basic techniques
available to us. Static analysis examines the program text without
execution. A compiler performs some static analysis; it can, for example,
identify unreachable statements. Dynamic analysis examines the behavior
■ Other, unspecified.
The table was extracted from Software Testing Techniques, second edition,
by Boris Beizer, copyright Boris Beizer, reprinted with permission of Van
Nostrand Reinhold, New York. Percentages in this extracted table may not add
up because some of Beizer’s low-frequency categories have been deleted.
Table 5.4. Beizer’s sample bug statistics.
the total bugs reported. Of the requirements bugs, half are due to
incorrect requirements; the effect of logic errors in the requirements is
negligible. With respect to the implementation and coding errors, most
of these can be traced back to problems in the documentation.
Approximately two-thirds of the errors, however, are associated with
three categories: the improper interpretation of a requirement, or its
implementation with a faulty program structure or incorrect data
definitions, structure, or declarations. Thus, if we wish to direct our
energies to finding the most bugs, then we ought to concentrate on
testing for these three categories of bug.
I already have divided testing categories into two groups: the static
analysis that tries to detect bugs implicit in the program text, and
dynamic analysis that relies on program execution to detect bugs. I now
subdivide the latter into two groups.
■ Structural tests. These are tests that examine the structure of the
program text and its data definitions to identify bugs. They are
also called white box tests (or sometimes glass box tests). The
white box tests systematically examine the most detailed design
document (i.e., the code) to ensure that no structural or data
bugs remain.
Most writers on software testing and quality divide static analysis into
two categories. The first involves the extraction of information about
the object with nonautomated techniques such as the review or inspec¬
tion. This was the subject of Section 5.2. The second category takes
advantage of the fact that the program is a machine-processible object.
It relies on automated techniques to discover real or potential problems.
Compilers provide many such facilities; others are supported by
commercial tools. Each defect and potential source of a problem
identified by a static analyzer represents something that is probably
wrong with the program. Bugs are not independent events, and where
one bug has been found, there is an increased probability that additional
bugs remain. Thus, any program that fails any of these tests is likely to
be bug prone. Dunn catalogues the capabilities of automated static
analysis as shown in Table 5.5.
It follows, therefore, that the automated static analysis techniques
can provide three categories of assistance.
■ Bug identification. The analysis can identify things that are wrong
with the program. Some of these bugs may cause immediate
failure during execution; others may simply suggest a degree of
carelessness that hints at the existence of other, undetected bugs.
Dunn’s list contains examples for the first two categories of automated
static analysis; the remainder of this section considers the basis for
evaluating a program’s complexity.
Among the properties of a program that seem to correlate with its
complexity are its size (usually measured in lines of code), its interfaces
with other modules (usually measured as fan in, the number of programs
invoking the given program, or fan out, the number of programs invoked
by the given program), the number of parameters passed, etc. There are
two special complexity measures—each of which may be calculated
automatically—that have received considerable attention, and I shall
present them here. In the next chapter I show how a measurements
program can determine their effectiveness within a particular setting.
Halstead sought to establish a software science that identifies certain
intrinsic, measurable properties embodied in the code of an algorithm
[Hals77]. He begins with four countable parameters: n1, the number of
unique operators; n2, the number of unique operands; N1, the total number
of operator occurrences; and N2, the total number of operand occurrences.
For example, in the statement

X := X + 1
the operators are := and +, and the operands are X and 1. He then
defines
n = n1 + n2
N = N1 + N2
and the estimated length
N^ = n1 log2 n1 + n2 log2 n2
where the log2 reduces the length value to the number of bits required
to express each operator and operand uniquely. From this one can
compute the volume of the program
V = N log2 n
λ = LV*
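As a quick illustration of the counts (the arithmetic is mine, not Halstead's), consider the single statement X := X + 1:

    n1 = 2 (the operators := and +)      N1 = 2
    n2 = 2 (the operands X and 1)        N2 = 3 (X occurs twice)
    n = 4,  N = 5,  N^ = 2 log2 2 + 2 log2 2 = 4,  V = 5 log2 4 = 10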
V(G) = e - n + 2p
If each program has its own graph, then the value of p can be
thought of as the number of independent programs. A structured
program will have p = 1. Moreover, it can be shown that the cyclomatic
complexity of a structured program equals the number of predicates plus
one. Where there is a compound predicate, such as A and B, each simple
condition is counted as a separate predicate. The cyclomatic number also
can be read directly from a planar flow graph: Euler’s formula for a
connected planar graph states that

n - e + r = 2,

where r is the number of regions (counting the outer region), so that
r = e - n + 2 = V(G).
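For instance, a connected flow graph with 9 edges and 7 nodes for a single program (p = 1) has V(G) = 9 - 7 + 2 = 4; if the program is structured, it contains three simple predicates, and four suitably chosen paths suffice to cover every branch. (The graph here is assumed for illustration.)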
Because the stronger the test, the more effective it will be, I limit the
discussion to branch coverage testing. I also assume that we are working
with structured programs with a single entry and exit.
The discussion of cyclomatic complexity in Section 5.3.2 showed how
a program’s control flow could be represented as a graph. The
cyclomatic number counts the number of control paths. If each
predicate in the program is based on a single condition, then a systemat¬
ic test of each path will constitute C2 testing. McCabe observes that the
number of test cases to achieve this is equal to the cyclomatic number.
Here is the method he uses for establishing a base set of paths (or basis
paths) for structured testing [McCa83].
■ Draw a control flow graph as shown in Figure 5.4; nodes that are
part of a sequential flow are omitted to simplify the graph.
■ Select a baseline path through the graph, one that represents a
typical execution, and then flip the first decision on it (i.e., take the
branch not taken by the baseline) to generate a second path.
■ Next return the first decision to its initial state and flip the
second decision.
■ Continue in this way until every decision has been flipped while
all other decisions have been held to the baseline.
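For instance, a structured program with two independent if-then-else decisions has V(G) = 3. Taking as the baseline the path on which both decisions are false, flipping the first decision yields the path (true, false), and flipping the second yields (false, true); the three paths form a base set, and their number matches the cyclomatic number.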
Because the choice of the baseline is arbitrary, there may be several sets
of data that satisfy this structured-testing criterion. Each application of
the method, however, will produce a test set with the following
properties:
X < A at statement 2,
U < 0 at statement 5,
Z < = Y at statement 6, and
U > 0 at statement 12.
utilization. Nevertheless, this may be the only way in which one can test
certain flows. It also is the only method for evaluating the loop tests to
be described below. Therefore, as its name implies, white box testing
may require that the tester do more than just prepare inputs for a run.
The tester also should be sensitive to the fact that the program he tests
is intended to be a long-lived object. The instrumentation, test sets, and
test outcomes should be designed to have a lifetime equal to that of the
programs. Thought should be given to the preservation of the test cases
and outcomes in a machine processible media that can serve as an oracle
for regression testing.
Returning to the topic of path testing,4 we have seen that McCabe’s
heuristic is quite effective in producing a test set that covers all branches
and paths in the flow. But I have provided a strange program whose
function is not clear. Without explaining what that program is to do, it
is difficult to determine if its logic is valid. The fact that some paths are
never taken may suggest that the program should be reworked, but the
basis path selection process can offer us little insight here. Testing each
statement at least once has absolved us of criminality. The question now
is, was that enough? Beizer states that 65% of all bugs can be caught
in unit testing. In practice, about 35% of all bugs are detected by path
testing; when these tests are augmented with other methods, the
percentage of bugs caught rises to 50 to 60%. Among the limitations of
path testing are the following:
4. Beizer uses the term “path testing” for a family of testing techniques, of
which statement coverage and branch coverage are the first two members. The
base-set heuristic just described usually (but not always) provides branch
coverage.
■ Nested loops. Start at the innermost loop, and set the outer
loops to their minimum values. Test that loop as a single loop.
Continue outward in this manner, but set all but the tested loop
to typical values. Repeat the cases for all loops in the nest
simultaneously.
To close out this discussion of path testing, one must mix thorough¬
ness with thought and practicality. For every test run there must be an
outcome analysis. Stacks of unread listings violate the intent of testing.
The goal is to find errors, and that requires that we look for them. Path
testing, as usually applied, has a goal of ensuring Cl + C2 coverage;
loop testing expands on this basic theme. Some heuristics can help
identify potentially effective test cases. But thought must prevail. Add
to the tests by picking additional paths that are slight variations of
previous paths. If it is not obvious why some control path exists,
question it rather than just test it. Play your hunches and give your
intuition free rein. The software is not fragile, and you ought not be
able to break it. But go ahead, give it your best shot.
Even when the test sets have been prepared with great care, they
still may not detect existing bugs. Among the problems is the possibility
that the predicates in the control flow are correlated (e.g., we cannot
start the loop in the program in Figure 5.3 with X >= A). In some
cases, we may find that seemingly independent predicates are correlated.
For example, consider the following sequence of code.
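The fragment is of roughly the following form (the statement bodies are placeholders of my own):

    if A > 0 then
        X := X + 1
    else
        X := X - 1;
    Z := Z + X;
    if A > 0 then
        Y := Y + Z
    else
        Y := Y - Z;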
Here the program will take the same branch for both selections. The
test designer should consider why the program was designed this way.
Beizer states that this kind of correlated decision is often associated with
the practice of “saving code.” Several times I have stressed the
importance of reuse, but that is quite different from the practice of
creating new programs by combining and editing fragments of old
programs. Perhaps the most important quality factor of operational
code is its maintainability; code-saving tricks are seldom documented
and almost always difficult to maintain. The test designer, therefore,
should question whether the program might better be restructured by
combining the two selection statements with the predicate A > 0 into
one.
Another complication is that of testing blindness in which the desired
path is achieved for the wrong reason. For example, both sets of code
will take the same branch, but the code on the right is buggy.
Correct Buggy
X := 7 X := 7
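For instance, the two fragments might continue with predicates such as these (illustrative only):

    Correct                          Buggy
    if Y > 0 then . . .              if X + Y > 7 then . . .

Because X has just been assigned the value 7, the two predicates select the same branch for every value of Y.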
Path testing cannot differentiate the buggy from the correct path; only
inspection will work here. Related to this is the problem of coincidental
correctness that results when the presence of a bug is not reflected in the
test outcome. For example, one common practice is to instrument the
program to print out the path name (and perhaps the state for key
variables) each time the path has been taken. The tester then goes
through these traces to verify that a test input produced the appropriate
trace. If the trace statement is printed only when the program enters
the path, however, there may be a bug that causes a change in the
control flow without having that change reflected in the trace. To avoid
this kind of problem, the trace statements should be placed at both the
beginning and end of each traced path.
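A sketch of such instrumentation (the path name and the output routine are illustrative):

    write('entering path P3');
    . . . statements on the path . . .
    write('leaving path P3, X = ', X);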
In addition to examining the control paths, one should also look for
data-flow anomalies. As Table 5.4 indicates, there are almost as many
data bugs as there are structural bugs. Thus, it is prudent to heed the
advice of Rapps and Weyuker.
It is our belief that, just as one would not feel confident about
a program without executing every statement in it as part of
some test, one should not feel confident about a program
without having seen the effect of using the value produced by
each and every computation. [RaWe82, p. 272]
What we shall do, then, is assume that the program’s control flow is
correct and look to see if the data objects are available when they
should be, or if silly things are being done to the data objects. Because
of the holistic nature of software, such tests will also help us discover
previously undetected control-flow bugs.
Beizer defines the following method for data-flow testing. As with
control-flow testing, we begin by constructing a graph. Of particular
interest will be the state changes for the data. Beizer identifies three
possible actions on data objects [Beiz90, pp. 151-153]:
* k— The last thing done to the object in the path was to kill it;
normal.
■ d— The last thing done to the object in the path was to define it;
possibly anomalous.
■ u— The last thing done to the object was to use it; normal.
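As a small illustration (mine, not Beizer's), consider the path fragment below; X is defined twice with no intervening use (a dd pattern, suspicious), and W is used without ever having been defined on the path.

    X := 1;
    X := Y + 2;
    Z := W + X;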
One then takes the program's control flow graph (Figure 5.6a) and
annotates it with the names of the data objects used in the predicates.
This graph then can be analyzed for each data object. For example,
Figure 5.6b illustrates the flow for X. It is defined (d) in the link
between nodes 0 and 2, used as a predicate and in a calculation (pc,
which is a detailed form of uu) in the link between nodes 2 and 5, and
so on. One can use this flow to identify possible anomalies manually,
or one can use it to generate test sets. It is also possible to develop
automated tools that help in the record keeping and reduce the effort.
Data-flow testing offers an effective way to identify a relatively large
number of bugs. Moreover, the good coding practices required to
support this kind of testability also enhance the quality of the product.
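As an indication of the record keeping that an automated aid might perform, the following sketch in C scans the sequence of actions recorded for one data object along one path and flags the classic anomalies. The object names and action strings are invented, and predicate and computational uses are both treated simply as u.

    #include <stdio.h>
    #include <string.h>

    /* Scan the d/u/k action string for one data object along one path and
       report suspicious pairs of consecutive actions.  The object starts
       out undefined, which is treated here as the killed (k) state.       */
    static void check_anomalies(const char *object, const char *actions)
    {
        char prev = 'k';
        size_t n = strlen(actions);
        for (size_t i = 0; i < n; i++) {
            char cur = actions[i];
            if (prev == 'k' && (cur == 'u' || cur == 'k'))
                printf("%s: '%c' after '%c' at step %zu (use or kill before define)\n",
                       object, cur, prev, i);
            if (prev == 'd' && (cur == 'd' || cur == 'k'))
                printf("%s: '%c' after '%c' at step %zu (definition never used)\n",
                       object, cur, prev, i);
            prev = cur;
        }
    }

    int main(void)
    {
        check_anomalies("X", "duuk");   /* normal: define, use, use, kill */
        check_anomalies("Y", "dduk");   /* anomaly: redefined before use  */
        check_anomalies("Z", "udku");   /* anomaly: used before define    */
        return 0;
    }

Run on the three invented strings, the sketch accepts the first and reports anomalies for the other two.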
Now that we have identified goals for the test sets, we must prepare
the test data. There are some general rules to follow. For example, we
should test at the boundary of a predicate. Thus, for the predicate X >
0 the tests should include the case of X = 0. Often an examination of
the outcome will indicate that the predicate was incorrect; it should have
been X >= 0. Of course, this is really not a coding bug; it reflects an
inadequate design in which the boundary was not stated clearly. The
next section, on functional or black box testing, deals with these domain
issues in a more systematic manner.
In black box testing we are concerned only with the fact that the
program or component performs as specified. This introduces a
potential problem. The tests can be only as thorough as the specifications. The behaviors of a program can be divided into three categories.
For the expected behaviors, we might try (3,4,5), (4,4,5), and (5,5,5).
Each output will be tested. For the rejected behaviors we might try
combinations that do not produce triangles such as (1,2,3) or (4,4,100).
whereas the second input would not. We need not try combinations
with a zero, negative integer values, real values, integer arrays, or
character strings; the input data type assures us that we need not care
about these situations. We now have the smallest test set we can use.
It tests for each expected outcome and one rejected outcome. Do we
need more tests? Perhaps there is something about the problem that
suggests that the implementation might be dependent on the order of
the values in the triple. If so, we should add (4,5,4) and (5,4,4) for the
expected outcome and (1,3,2), (2,3,1), (2,1,3), (3,2,1), and (3,1,2) for the
rejected outcome. Notice that we have more tests for rejected than for
expected outcomes.
The first observation regarding these test cases is that as a result of
creating them we have found a bug. (In the previous section, Beizer told
us this would happen!) The test cases expect four different outcomes,
but the specification defines only three. Therefore, we must alter the
output to
If this requires a change to the program, we need not run the tests until
the program has been corrected; our first test has been successful
without ever having been run. A second observation about this example
is that if we had thought about the test cases when the program was
being designed, then the bug would not have persisted until the time of
testing. This, of course, is the message that Gries presented in section
4.3. But, even if one elects not to use proofs of correctness, there are
benefits in planning for tests. Bad designs have bugs and are difficult to
test; therefore many of their bugs remain undetected until after delivery.
Good designs, on the other hand, are testable. Consequently, an early
concern for testability can improve the design and reduce the number of
bugs to be removed.
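For readers who wish to experiment, here is a sketch in C of a classifier for the familiar triangle problem, exercised with the test triples discussed above. The program and its outcome names are illustrative assumptions; they are not the specification or the program referred to in the text.

    #include <stdio.h>

    /* Classify three integer sides; the fourth outcome covers the
       rejected inputs that do not form a triangle.                  */
    static const char *classify(int a, int b, int c)
    {
        if (a + b <= c || a + c <= b || b + c <= a)
            return "not a triangle";
        if (a == b && b == c)
            return "equilateral";
        if (a == b || b == c || a == c)
            return "isosceles";
        return "scalene";
    }

    int main(void)
    {
        /* Expected outcomes */
        printf("(3,4,5)   -> %s\n", classify(3, 4, 5));    /* scalene     */
        printf("(4,4,5)   -> %s\n", classify(4, 4, 5));    /* isosceles   */
        printf("(5,5,5)   -> %s\n", classify(5, 5, 5));    /* equilateral */
        /* Rejected outcomes */
        printf("(1,2,3)   -> %s\n", classify(1, 2, 3));
        printf("(4,4,100) -> %s\n", classify(4, 4, 100));
        /* Order-dependence checks */
        printf("(4,5,4)   -> %s\n", classify(4, 5, 4));
        printf("(5,4,4)   -> %s\n", classify(5, 4, 4));
        return 0;
    }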
Unlike the structural tests of Section 5.3.3, the functional tests are
much simpler to set up and run. We seldom have to instrument the
software, and we almost always can initiate our tests by invoking a
program with a given set of parameters. Thus, the principal function of
black box test preparation is the selection of input parameters that will
either produce the expected outcome or reject the inputs. Because we
cannot exercise the component with every possible combination of input,
we seek a more effective method for selecting the tests that will uncover
previously undetected bugs. Myers offers the following guidelines for
developing those test cases [Myer79].
■ The test case should reduce, by more than a count of one, the
number of other test cases that must be developed to achieve
some predefined goal of “reasonable” testing.
■ The test case should cover a large set of other possible test
situations. That is, it should tell us something about the
presence or absence of errors over and above this specific set of
input values.
5I have chosen to illustrate these concepts using some of Myers’s early work,
which is both intuitive and easy to explain in a few pages. The approach,
however, is dated, and—in that sense—my presentation is distorted. In the past 15
years there has been considerable work in the development of algorithms for
producing equivalence classes, and I recommend Beizer’s chapter on domain
testing for a more thorough and up-to-date discussion.
■ First, for the valid equivalence classes, write a new test case that
covers as many of the uncovered valid equivalence cases as
possible.
■ Then, for the invalid equivalence classes, write a test case that
covers one, and only one, class.
The reason that we can test more than one valid class at a time is that
we are seeking failures. Failure in any one equivalence class will be
recognized in the outcome unless there are hidden dependencies among
valid classes (which should have been removed during the partitioning
process). We must test invalid classes separately, however, because there
may be a hierarchy in the program’s testing that prohibits the identification of all invalid responses.
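A small, invented example may make the two rules concrete. Assume a component that accepts a quantity (valid from 1 through 99) and a payment code (valid codes are 'C' for cash and 'K' for credit). The sketch below lists one test that covers several valid classes at once and a separate test for each invalid class.

    #include <stdio.h>

    /* Hypothetical test cases derived from equivalence classes: valid
       classes are combined, invalid classes are tested one at a time.  */
    struct test { int quantity; char payment; const char *purpose; };

    int main(void)
    {
        struct test cases[] = {
            {  10, 'C', "covers two valid classes (quantity in range, cash)" },
            {  10, 'K', "covers the remaining valid class (credit)"          },
            {   0, 'C', "invalid class: quantity below the valid range"      },
            { 150, 'C', "invalid class: quantity above the valid range"      },
            {  10, 'X', "invalid class: unknown payment code"                },
        };
        for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++)
            printf("test %zu: quantity=%d payment=%c  (%s)\n",
                   i + 1, cases[i].quantity, cases[i].payment, cases[i].purpose);
        return 0;
    }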
In selecting the test cases to be used, Myers notes that we should
not rely on arbitrary test data and proposes the method of boundary-
value analysis. He suggests the following criteria.
■ Select one or more elements such that the edge of the equivalence class is the subject of a test. For instance, if the input
domain of a computation is defined to be valid for nonnegative
values less than or equal to 100, then the valid tests would
include the values 0 and 100, and the invalid tests would include the values -1 and 100.001.
errors for this type of product with its required reliability. If the testing
has not uncovered the expected number of bugs, then additional testing
is suggested. (More on this technique is given in Chapter 6.) The
second involves an analysis of the test set. The idea is to calibrate how
effective a test set is for a target program with an unknown number of
bugs by measuring its performance with the same program and a known
set of bugs. The idea of error seeding, usually attributed to Mills, is a
way to estimate the number of errors in the components being tested
[Mill72]. First S errors are seeded into the component so that their
placement is statistically similar to that of the actual errors. Prior to
the seeding, the test set has been used to discover I errors; when rerun
after the seeded errors have been added, K seeded errors will be
uncovered. From this we can estimate E, the number of errors in the
software being tested, as
E = IS/K.
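With invented figures, if S = 20 errors are seeded, the test set had already found I = 25 errors, and the rerun uncovers K = 16 of the seeded errors, then E = 25 x 20/16, or about 31, which suggests that roughly six errors remain undetected. A minimal sketch of the calculation:

    #include <stdio.h>

    /* Mills's error-seeding estimate E = I*S/K with invented figures. */
    int main(void)
    {
        double I = 25.0;   /* indigenous errors found before seeding   */
        double S = 20.0;   /* errors seeded into the component         */
        double K = 16.0;   /* seeded errors uncovered by the rerun     */

        double E = I * S / K;
        printf("estimated indigenous errors E = %.1f\n", E);
        printf("estimated errors still undetected = %.1f\n", E - I);
        return 0;
    }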
the inputs and select cases from those classes to either produce the
expected results or reject the inputs. To maximize the effectiveness of
the test cases, we choose values that are on the boundary of the
partition where the likelihood of making a mistake is highest. Although
this technique has received widespread acceptance, the reader should be
aware that some empirical studies are questioning its utility. Hamlet
and Taylor report, “comparison between [partition] testing that observes
subdomain boundaries and random sampling that ignores the partition
gives the counter-intuitive result that partitioning is of little value”
[HaTh90]. That is, randomly generated test sets with the same statistical
distribution as the data to be processed have been shown to produce
better results than partition testing.6 They assert that partition testing
is most effective “when subdomains with a high failure probability can
be detected—that is, when the failures are suspected and localized.”
Naturally, when the prerequisites for random testing have not been met,
partition testing is the only option.
Thus, in this imperfect world we must live with uncertainty regarding
the validity of the specification, the correctness of the design, the
existence of bugs in the implementation, and the strength of the test
sets. Errors build on each other, and they become more expensive to
remove the later they are detected. Therefore, lacking any foolproof
method for identifying bugs, it is far better to avoid their existence than
to look for them. This is what Dijkstra set out to do in the T.H.E.
system, and another approach is discussed in Section 5.4 on the
cleanroom. Lacking that discipline, however, there is no alternative to
the careful, complete, and systematic act of test design.
5.3.5. Integration
The previous three sections described the kinds of tests available for
identifying bugs in the software. Static analysis reviews the software text
and its surrogates to identify problems, structural testing designs tests
based on how the software has been built, and functional testing
develops tests using knowledge of what the software is to do. These
three techniques are valid for individual programs, units comprised of
6Notice that the definition of a random sample is quite different from that
of an arbitrary test. The random sample is chosen randomly from a statistically
valid model, whereas the criterion for an arbitrary test is simply that it satisfy the
syntactic constraints. Given a statistically valid model for the functionality (and
user profile), random samples are effective in assessing software reliability and
detecting some failures. Arbitrary tests at best are inefficient; at worst, they instill
a false sense of confidence.
alpha and beta tests are not useful; well-defined criteria exist to establish
that the software does precisely what is expected of it.
Myers observes, “Because of the absence of a methodology, system
testing requires a substantial amount of creativity; in fact, the design of
good system test cases requires more creativity, intelligence, and
experience than that required to design the system or program” [Myer79]. He then goes on to identify 15 categories of test that should be
explored when designing test cases.
■ Security testing. Does the system satisfy the security and privacy
objectives?
In describing these tests, Myers points out that the objective should
be to find data that will make them fail. This implies an adversarial
relationship between the developer and tester. The developer tries to
anticipate every problem and make the system correct with respect to
the SRS, and the tester uses the same context in order to find fault with
the system. Clearly, finding problems before the users do is beneficial
to both the developers and testers. I have called the system tests
positive in the sense that they are not intended to find bugs; rather,
their objective is to exercise the system as severely as possible to certify
that it is correct. After the system is accepted and changes have been
made, this same certification process must be repeated if we are to
maintain confidence in the system. Of course, we can consider the
system test positive only if we have already run negative tests for each
testing category before system testing begins. For example, we need to
model performance and throughput at an early stage in the design. If
these are critical properties of the system, then some prototypes or
simulations may be necessary. Key components should be given a
performance or throughput budget and tested against that budget as
early as possible. Waiting to find that we have failed at the time of system test, when the implementation is complete, is a very poor
strategy.
This chapter started with a review of the software process and contrasted
the building of a bridge with the implementation of software. Whereas
the goal of software testing is error discovery, bridge “testing” is seen as
a certification process. The engineers are expected to produce a bridge
design that will be accepted as correct and valid. After construction
begins, changes are expensive; consequently, bridge building (and most
hardware manufacture) is optimized for error removal prior to implementation. In software, however, there is a perception that errors are
inevitable and that debugging and machine testing are the best way to
remove them. Justifications for this belief include software’s limited
experience base and its associated lack of handbooks, the absence of an
external constraining reality, and the fact that machine testing is both
relatively inexpensive and feasible. As a result, a rationale has been
justified that permits developers to defer a complete understanding of
the product’s behavior until after it has been implemented.
This book has presented several methods for getting it right the first
time. Working from formal specifications defines the system’s behavior
at the outset and ensures that the product satisfies those specifications.
Proof of correctness establishes the desired state changes and builds a
program that will realize them. There is at least a decade of experience
with these methods, and they are successful. There also are demonstrated techniques for refining the specifications when there is uncertainty.
The two basic methods cited are prototypes, in which experimentation
leads to a specification for the entire system, and incremental development, in which layers of the system are specified and implemented on
top of previously defined layers (builds). This ability to develop a
system incrementally is a unique property of software. Because software
is not bound by an external reality, there are a very large number of
ways in which a software system may be layered.
The cleanroom approach combines mathematical verification with
software’s facility for incremental development; the result is an analogue
of the hardware development process. Introduced by Mills, Dyer, and
Linger in the early 1980s, the intent is to deny entry of defects during
software development, hence the term “cleanroom” [MiDL87, SeBB87].
As with bridge building, one produces a design that is correct and valid
before implementation begins; incremental development is used to
reduce the work tasks to small units whose progress may be tracked
easily. If the design is indeed error free, then the traditional error-
detection role of testing will not be necessary. Because we expect no
errors to be found, a better goal of testing is to measure the probability
of the zero-defect assertion. Whenever the desired level of reliability
has not been achieved, corrective action can be taken in the next increment.
The cleanroom approach has been used in a production setting since the
mid-1980s. Among the first systems to be implemented in this way are
an IBM COBOL restructuring tool (80K lines), an Air Force helicopter
flight program (35K lines), and a NASA system (30K lines). There also
has been an empirical evaluation of student experience with the method
[SeBB87]. The general conclusion is that the resulting programs are
more reliable than programs developed with the traditional life-cycle
model, that the time required to produce a verified program is less than
or the same as the time necessary to design, code, and debug a program,
that the method of functional verification scales up to large programs,
and that statistical quality control is superior to the time-honored
technique of finding and removing bugs.
To illustrate the benefit of statistical testing, consider Table 5.6,
which summarizes data collected by Adams on software failures for nine
major IBM projects [Adam84]. It displays the mean time between
failures (MTBF) reported for these projects in operational years (i.e., if
a product has been installed at 100 sites for one year and only one
failure was reported, then the MTBF would be 100 years). Two
contrasting measures are shown.
Average percentage of failures:              33.4    46.9    15.8     3.9
Probability of a failure for this frequency: 0.008   0.065   0.202   0.724
Thus, the table shows that 33.4% of the fixes were devoted to failures
that had a 0.008 probability of being seen by a user, whereas only 3.9%
of the fixes were devoted to errors that had a 0.724 probability of
impacting the user. If the distribution of effort to repair errors is
roughly independent of the MTBF, then this table suggests that IBM
spent one-third of its error-correction budget on failures that its
customers hardly ever saw and less than 4% of its budget on the failures
that represented almost three-quarters of all the failures encountered by
its customers! Clearly, this is not an effective allocation of resources.8
These data indicate that not all errors are equal; there is a difference
between discoverable errors and important errors. If we spend our V&V
budget on just discovering errors, it is not clear that we will find the
important errors. To find the important errors we must look for
properties of the software that are operationally significant, not just the
attributes that are likely to be coded wrong. Cobb and Mills assert that
this may best be done by statistical quality control (i.e., the functional
testing of the systems using randomly generated test sets with the same
statistical distributions as the specified inputs) [CoMi90]. They present
evidence to support their claim from projects developed with and
without the cleanroom approach. When the cleanroom is used, of
course, there is no need for structural testing; the programs already have
been verified with respect to what they are to do, which includes the
details of how they do it.
5.5. A summing up
9There are many different definitions for quality. Perhaps the most intuitive
description is that quality is a measure of the buyer’s perception that the software
product performs as expected. Notice how this concept incorporates the ideas of
validation (the product corresponds to the needs of the environment), verification
(the product performs as specified), and the ilities (the product is portable,
maintainable, reliable, etc.). Like good art, quality software is difficult to define
but—with experience—easy to recognize. The software engineer, lacking a precise
definition for quality, must learn to recognize (and to react) when the quality is
deficient. Failure to do so will lead to the loss of his customers.
■ What lessons can be learned from this task that will improve
performance in future tasks?
Notice that the manager need not know how to solve the technical
problems that he is managing; he need only understand what the
technical people are proposing.
Regrettably, empirical evidence suggests that managers are not very
effective in this kind of problem solving. In a paper titled “Software
Failures Are Management Failures,” Wingrove lists the 16 most common
reasons given by project management for failure to meet budget,
timescale, and specifications [Wing87]. Redmill organizes these reasons
as follows [Redm90].
The first question that one should address is how software project
management differs from other technical management domains, or
indeed from nontechnical management. Each project or organization is
identified by a set of goals or objectives, and management’s responsibili-
These principles all support the general concept of keeping the project
within control. Once it is out of control, the team members tend to
react to problems. There is no time to anticipate difficulties, and major
defects can develop unnoticed. The problems in the Hubble Space
Telescope, described in Section 5.1, resulted from reactive management.
Is there anything special about the management of software projects?
Kitchenham offers the following observation.
are common. In the initial experiment, the team was structured with a
Chief Programmer (in this case, Harlan Mills), an assistant (Terry Baker), two to five other programmers, and a librarian who managed the
batch computer runs and maintained the system listings as a common,
public record. Technical leadership was provided by the Chief Programmer (or the assistant in his absence), and the code was open for all
members of the team to examine. In retrospect, we see that the Chief
Programmer Team shared some of the ideas of egoless programming and
group reviews. When evaluating this type of organization, however, it
is difficult to separate the effects of team organization, the principles
exploited by the team, and the extraordinary intellectual capacity of the
initial team leaders. In any event, software engineering textbooks
continue to discuss the advantages and disadvantages of this type of
organization, and I felt that I should at least describe it. For a more
current view of software teams, see [Rett90].
A second examination of software development organization, which
has received considerably less attention, was reported by Licker [Lick83].
By way of background, in 1960 McGregor classified management styles
as conforming to either Theory X (authoritarian) or Theory Y (participative and supportive); each theory was based on the perception that the individual may not be, or may be, responsibility-seeking and eager to work [McGr60]. The result is an ethos for the firm, and an individual, independent of his innate abilities, cannot advance until he becomes
accustomed to the style and feeling of the firm. In 1982 Ouchi wrote a
book on the Japanese approach to management, and he introduced the
next step, Theory Z [Ouch82]. A Type Z (American) organization
operates in harmony with the Japanese philosophy of life-time employment, slow evaluation and promotion, non-specific career paths, implicit
control mechanisms, collective decision-making and responsibility, and
holistic concerns. Software-related firms in Ouchi’s list of Type Z
organizations include IBM, Hewlett-Packard, and Xerox. Briefly, the
individual loyalty is to the organization and group, and career paths
emphasize cross training over specialization.
In the following extract, Licker first cites some observations made
by Kraft [Kraf77], and then comments on the benefit of the holistic
approach.
A dismal picture, and I am not sure I concur with it. But the alternative
is certainly worth considering.
piece of equipment has not arrived, a proposed solution was flawed, key
personnel are unavailable). In this case the model must be altered to
reflect the project’s new reality. Using the model as a guide, we also
may notice that planned events will not occur as scheduled (e.g., the
development of a module is behind schedule, the users are not available
to evaluate a prototype, vacation schedules delay a task’s completion).
Here, managers are expected to react to the microproblems before they
become macroproblems (i.e., threaten the success of the project). Using
the three dimensions of control at their disposal, managers can modify
aspects of local plans without having those changes impact the larger
project (or global) plan. In both situations, the plan (or model) is
revised. In the first case the model was inaccurate; we need to learn
from that experience to improve the quality of subsequent models. In
the second case, management is simply exercising the control expected
of it.
In this description of what management does, I have used the words
model and plan interchangeably. The model of how a project will be
carried out normally is documented as a plan that identifies a sequence
of events. One may think of this plan as a discrete simulation or as a
program [Oste87]. Project progress is evaluated with respect to these
events. For each event, management must ask the question, “Have all
the criteria for completing this event been satisfied?” If the answer is
positive, then the project moves on to the next events in the plan; if the
answer is negative, then management must decide what actions are
necessary. For example, recall the Preliminary Design Review (PDR)
described in Section 5.2.
Unlike the structured reviews, which are designed to find errors, the
PDR is intended to elicit confidence that the preliminary design is
sufficiently complete to warrant the initiation of detailed design. As an
activity, the PDR is a “dog and pony show.” Its intent is to demonstrate
to the bosses, customers, users, and colleagues that the preliminary
design has been thought through thoroughly. The development team
will learn as a result of preparing for the review, and some very useful
knowledge will be exchanged during the review. But this interaction is
only a secondary benefit of the PDR. For management the PDR is one
of many events in its plan. Management must determine if that event
has truly taken place, and make a decision based on their evaluation.
The PDR is but one of the inputs that they will use in coming to a
conclusion regarding the quality of the preliminary design. The review
may have gone well, but the managers may sense that some aspects of
the project were not considered in sufficient detail. Or the review may
have gone poorly, but there are other factors that prohibit delaying the
detailed design. (After all, project management does operate in the real
world.) Thus, in the project plan (model), the PDR is an event (node)
that represents a decision point (branch). From the perspective of
system design, the PDR could be held a month earlier or later—it would
make little difference in the technical outcome. As an event in the
(possibly revised) project plan, however, it is essential that the PDR be
held as scheduled; its occurrence provides information about the validity
of the current plan, which is far more important than anything it might
tell the technical people about the validity of their design.
From this introduction it is clear that management requires two
categories of tool. The first is used to construct the model, and the
second is used to evaluate progress with respect to the model. Of
course, if all projects were alike, then one model would suffice. Even
within a fixed organization, however, this is seldom the case, and
project-specific models must be constructed. Notice that I use the word
“model” even though its realization will be a “plan.” I do this to
emphasize that the role of the model (or plan) is twofold. First, it helps
the development team organize its thoughts and establishes some
necessary baseline documentation, and second, it provides a mechanism
for assessing progress and status. Often, plans are treated as boilerplate
(i.e., documentation to be created, distributed, and filed, but not
necessarily read). Obviously, I reject that view. If the plan will not be
used and kept current, then management should question if it is worth
the effort of its preparation. Software engineering is frequently
described as a document-driven discipline, and the many plans—for the
project, for QA, for CM, for V&V—are cited as illustrations. I take a
pragmatic view. If the plans do not help understand or control the
project, then do not write them. Sometimes the activity of planning
(and other documentation) becomes an end in and of itself. (Recall
Yourdon’s criticism of the physical model in Section 3.2.3.) This must
be avoided. But without a plan, there is no sense of where one should
be. Granularity is reduced to just two events: start and stop. One
recognizes trouble only when “stop” never comes.
What tools do we use for constructing a plan? From reading the
previous chapters we understand the software process; consequently, we
should be able to identify the events that will be of major concern.
Early in the project it will be clear that certain project-specific technical
issues need additional study. There also will be a group of project-
independent events that must be monitored (e.g., QA, CM, and V&V
plans, system PDR, system test). Finally, there will be a series of events
associated with each system component. These are normally organized
in the form of a work breakdown structure (WBS) [Thus80]. The WBS
is an enumeration of all work activities, structured in a hierarchy, that
organizes work into short, manageable tasks. For a task to be manageable, of course, it must have well-defined inputs and outputs, schedules,
and assigned responsibilities. Table 6.1 contains the outline of a
generic WBS for a software implementation project. Naturally, the
lower levels of the WBS will be specific to the project. Each WBS item
Table 6.1. Software work breakdown structure.
defines a milestone (or event) that will be useful for both decision
making and status review. The granularity of the WBS will depend on
the number of milestones and the nature of the project. Too few
milestones deny management the information necessary to maintain the
project under control; too many milestones during a short period may
obscure significant trends.
As with design, diagrammatic tools and multiple views aid problem
solving. The WBS is an organized list of tasks. Because most of those
tasks interact, it is useful to model them as a network using either the
Program Evaluation Review Technique (PERT) or Critical Path Method
(CPM). Figure 6.1 displays a PERT chart (or network) taken from
[USAr73]. The first event (0) starts at time TE = 0, and the final event
(10) occurs 33 days later (TE = 33). The nodes indicate the day number
that each of the nine subtasks is completed, and the TE value indicates
when the event is completed. The edges between nodes indicate the
dependencies among the events; they are labeled with the number of
days to task completion. The critical path in this network is the longest
path through the network between the start and finish nodes. Any
schedule slippage on the critical path will cause a project delay. For
example, the critical path in Figure 6.1 is 0-2-4-6-9-10. If the completion of any of those events is delayed while all the other events on the
path are completed in the scheduled time, then event 10 cannot be
finished in 33 days. In contrast, event 3 can be delayed as many as 7
days before the path 0-3-5-8-10 also becomes critical. Boehm provides
a useful introduction to PERT methods [Boeh81]. There are many
automated tools available for drawing and maintaining these networks.
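The forward pass that produces the TE values can be sketched in a few lines of C. The network below is invented (it is not the network of Figure 6.1); its nodes are numbered so that every edge runs from a lower to a higher number, and the edges are listed in order of their source node so that each TE value is final before it is used.

    #include <stdio.h>

    #define NODES 6

    struct edge { int from, to, days; };

    int main(void)
    {
        /* Invented dependency network; durations are in days. */
        struct edge edges[] = {
            {0, 1, 4}, {0, 2, 6}, {1, 3, 5}, {2, 3, 2},
            {2, 4, 7}, {3, 5, 6}, {4, 5, 4},
        };
        int te[NODES] = {0};

        for (size_t e = 0; e < sizeof edges / sizeof edges[0]; e++) {
            int candidate = te[edges[e].from] + edges[e].days;
            if (candidate > te[edges[e].to])
                te[edges[e].to] = candidate;   /* keep the longest path */
        }
        for (int i = 0; i < NODES; i++)
            printf("event %d: TE = %d days\n", i, te[i]);
        return 0;
    }

In this invented network the critical path is 0-2-4-5, and the final event cannot be completed before TE = 17 days.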
(Figure: a Gantt-style schedule chart plotting tasks such as “Develop procedures” and “Brief management” against weekly calendar dates.)
In common American usage, “anticipate” often is equated with the passive
act of recognition. It has a much stronger meaning. It implies an action that will
counteract some future problem. Naturally, this is how the word is used here.
probability of that outcome and the loss to the parties affected by that
outcome. This may be written,
RE = Prob(UO) x Loss(UO).
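A small, invented example shows how the formula is used to rank risk items; the probabilities and losses below are made up purely for illustration.

    #include <stdio.h>

    /* Rank invented risk items by risk exposure, RE = Prob(UO) x Loss(UO). */
    struct risk { const char *item; double prob; double loss; };

    int main(void)
    {
        struct risk risks[] = {
            { "key subcontractor slips delivery", 0.30, 200000.0 },
            { "throughput budget not met",        0.10, 500000.0 },
            { "user interface rejected by users", 0.40,  80000.0 },
        };
        for (size_t i = 0; i < sizeof risks / sizeof risks[0]; i++)
            printf("RE = $%8.0f  %s\n",
                   risks[i].prob * risks[i].loss, risks[i].item);
        return 0;
    }

In this invented list the subcontractor slip carries the largest exposure even though it is not the largest potential loss.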
This suggests that we can optimize for certain technical aspects at the
potential expense of product features, cost, and/or schedule. Naturally,
this is what a manager is paid to do: to make risk trade-offs, often in the midst of a project. In most cases, there is a limited window for
making a decision. If one does not anticipate risks promptly, problems
emerge and an opportunity is lost. In making these trade-offs, the
These two categories of risk map into the two domains of the essential
software process (Section 1.2.2). The generic risks are concerned with
the implementation domain and are independent of application domain
specifics. In contrast, the project-specific risks are driven by application-
domain issues. Naturally, books on software engineering will focus on
the methods for reducing generic risks. Management’s risk reduction
program, however, is expected to build on the lessons learned regarding
generic risks and focus on the project-specific risks.
Figure 6.3 depicts Boehm’s steps in risk management. First we
assess the risks, and then we control them. The former involves
identification, analysis, and prioritization; the latter involves planning,
resolution, and monitoring. In what follows I review for each step its
objectives and tools. Keep in mind that we are operating in two
dimensions. One dimension is concerned with generic issues that are
valid for all projects of this general class. We expect to develop skills
in identifying these risks so that future project plans can anticipate
them. The second dimension involves project-specific risks. Here one
cannot generalize. Boehm uses case studies to illustrate these risks in
[Boeh89]. As with creating a valid requirements specification or
developing an effective design, one cannot have a good sense for the
risks without also having a solid understanding of the problems to be
solved. That is, to become effective in risk management, management
must be deeply involved in the project. Management tools without that
commitment will not avail.
Obviously, risk assessment begins with risk identification. As with
requirements analysis and testing, checklists are helpful. They catalogue
(Figure 6.3: risk management, after Boehm. Risk assessment comprises risk identification (checklists, decision driver analysis, assumption analysis, decomposition), risk analysis (performance models, cost models, network analysis, decision analysis), and risk prioritization (risk exposure, risk leverage). Risk control comprises risk management planning (buying information, risk avoidance, risk reduction), risk resolution (prototypes, simulations, benchmarks, analyses, staffing), and risk monitoring (risk reassessment, corrective action).)
■ Will your project really get all of the best people? During the
period of proposal writing, all the best resumes are used. Now
that the work is in hand, are those people really available? Are
they even interested in your project?
■ Are the key people compatible? In the best of all possible worlds,
this is never an issue. This is an example of a hygiene factor.
One should develop these checklists and use them as both reminders and
general rules. Each painful mistake should contribute to the list.
Another technique for risk identification is decision-driver analysis.
Here one itemizes the sources of the key decisions about the system. If
a decision has been driven by factors other than technical and management achievability, then it may imply a software risk. Examples include
politically driven decisions (e.g., choice of a piece of equipment or
subcontractor), marketing-driven decisions (e.g., special features or
equipment), or short-term versus long-term decisions. One can also
identify risks by explicitly examining the assumptions, which tend to be
optimistic and hidden. As with error-detection reviews, there is no
alternative to thinking about the problem and relying on past experience. Finally, Boehm identifies the technique of decomposition.
Basically, this acknowledges that when we deal at a high level of
abstraction we hide many of our problems along with the details. One
guideline that he offers is the Pareto 80-20 phenomenon: 80% of the
contribution comes from 20% of the contributors. The examples he lists
are:
■ Why? This is the risk item importance and its relation to the
project objectives.
For each identified risk to be resolved, we get a plan and a task that
must be monitored. The outcome of the task tells us something about
the project or product and removes some uncertainty. For example, in
the SCM system application, we may have detected some risk regarding
the ability of the existing hardware configuration to support all the
users’ files. If this apprehension was justified, then there would be two
alternatives. We could alter the system design or enhance the equip¬
ment. Once the risk is assessed, we will plan and carry out a task to
resolve it. After that task is complete, we will use its findings to adjust
either the plan or the design as necessary. If we did not pursue risk
management, the alternative might have been the completion of an SCM
system that “met specifications” but would not work on the available
equipment because we never thought to insist that it should.
I conclude this section on risk management with some more of
Gilb’s principles; consider them an endorsement for his book [Gilb88].
At the level that I have listed these principles, they simply represent
sensitivity training. The role of management is to look for risks and
anticipate problems. They can assume this proactive assignment, of
course, only if they have good models to work from, and this is the topic
■ How likely is it that factors have not been considered that will
affect deliverable X?
for showing that the waterfall flow and the spiral model are isomorphic;
the former emphasizes the end-of-phase certification and the feedback
from earlier phases but does not depict the risk-reduction activities,
whereas the spiral model focuses on risk reduction by hiding the
feedback and certification details.
If prototyping is a method for top-down risk reduction, incremental
development may be thought of as being bottom up (or outside-in, cf.
Section 3.1). The software prototypes are an adaptation of a hardware
design validation method. Hardware requires top-down development;
the design must be complete before fabrication begins. Incremental
development, on the other hand, takes advantage of some features
unique to software. Because software possesses conformity and
invisibility, it is possible to structure software units to fit arbitrary
requirements. Thus there is a far broader range for software increments
than would be possible for hardware components. We can think of two
classes of software increment. The first is comparable to the hardware
unit. It provides its users with some fixed functionality, and the
designers can translate operational feedback into product improvements
or extensions. This type of incremental development is implicit in
Brooks’s phrase, “grow, don’t build, systems.” It is also how Lehman’s
E-type programs evolve [Lehm80]. This is how PC word processors have
moved from file editing tools to desktop publishing systems; it is also
how experience with small, stand-alone hospital applications matured as
comprehensive hospital information systems [Lind79]. Domain
experience accumulates, gold plating is avoided, and symbiotic interactions between the developers and users produce a sound understanding
of the requirements. A positive feedback cycle is created. Effective
tools help users discover how technology can meet their needs, which
leads to more effective tools.
The second form of incremental development focuses on the
implementation of a single system. One example of this approach was
given in the description of the cleanroom (Section 5.4). Here one
begins with a formal specification for the target product; increments
then are defined, built, and tested. By keeping the deliverables relatively
small, their visibility is enhanced, and their management is easier. The
project plan has a fine granularity, and one quickly can tell whenever the
schedule is not being met. (In Section 5.4 I emphasized the incremental
testing of the product’s reliability; naturally, this management-control
benefit also was present.) Gilb calls his approach to incremental development evolutionary delivery or evo planning [Gilb88]. It
encompasses the following critical concepts.
The four scales are nested, and a ratio scale also has the properties of
the other three. Knowing the metric’s scale provides information about
the allowable operations on it. Of course, the scale may not always be
obvious. For example, is an error that takes 8 hours to correct twice as
“bad” as one that takes 4 hours to correct? Not if we include cost to
recover from, cost to detect, cost to distribute updates for, etc. Thus,
the measurement scales warn of some operations that may produce
meaningless results (e.g., dividing intervals or subtracting ordinals), but
they do not guide us in understanding what our metrics measure.
Because we are not measuring repeatable, objective phenomena, the
primary goal in metric identification is to find measurements that will
tell us something interesting about the project we are managing. For
example, as a metric, lines of code is of little interest. It becomes useful
when there is a model that transforms lines of code into effort months
or lines of code into an expected number of errors. Given these models,
we can estimate lines of code to estimate effort months, or we can count
lines of code to estimate the as-yet undetected errors. The problem is,
however, that there may not be a model suitable for our specific project.
We must either learn to live with an imperfect model or construct a
better one. In either case, we begin by examining experience with
existing models and then determine how to accommodate local factors.
e = (o + 4m + p)/6.
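With invented figures for one component (an optimistic estimate of 4,600 LOC, a most likely estimate of 6,900 LOC, and a pessimistic estimate of 8,800 LOC), the expected size is (4,600 + 4 x 6,900 + 8,800)/6, or about 6,833 LOC. A minimal sketch of the calculation:

    #include <stdio.h>

    /* Expected size e = (o + 4m + p)/6 for one component, using invented
       optimistic, most likely, and pessimistic LOC estimates.            */
    int main(void)
    {
        double o = 4600.0;
        double m = 6900.0;
        double p = 8800.0;

        double e = (o + 4.0 * m + p) / 6.0;
        printf("expected size = %.0f LOC\n", e);
        return 0;
    }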
As the estimates are calculated, they are entered into the first four
columns in the table shown in Figure 6.4. These represent our
estimation of the size of the problem to be solved based on our
experience with such problems. The next two columns of the table list
the multipliers derived from local experience, and the final two columns
contain cost and time as a function of LOC. Notice that it is assumed
that the multipliers vary with the complexity of the software function.
The results of the final calculations are shown in Figure 6.5. In this
illustration I used a dart-board model for computing cost and effort, and
I found that the cost will be $506,000 and that the project will require
116.5 effort months. Two questions are obvious. How good is that
estimate and is the expenditure reasonable for the project? We can gain
some confidence in the estimate by repeating the process using a
different project decomposition. Figure 6.6 shows a decomposition of
software components by phase. Again, historical data are used to
convert lines of code into dollars and days. The totals are within 10%
of each other—close enough to suggest that the estimate is reliable. Now
we must address the question of the project’s being worth the expenditure. If it is not, then a design-to-cost philosophy or an incremental-
L = Ck K^(1/3) td^(4/3),

where Ck is a technology constant, K is the life-cycle effort in person-years, and td is the development time in years.
year curve is off the graph for the 500K line system, and a 500 effort-
year project requires over 5 years. Naturally, the projects are not as
elastic as the curves would suggest. The relationships they define hold
true only for larger projects with “reasonable” variations in the scope of
the key parameters. Therefore, the model has little utility for the small
to large projects that are the focus of this book.
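Assuming the equation in its usual form, solving for effort gives K = L^3/(Ck^3 td^4), which makes the schedule sensitivity explicit: halving the development time multiplies the required effort by sixteen. The sketch below in C tabulates this trade-off for an invented technology constant and product size.

    #include <stdio.h>
    #include <math.h>

    /* Effort implied by L = Ck * K^(1/3) * td^(4/3), solved for K.
       The technology constant and the size are invented.            */
    static double effort(double loc, double ck, double years)
    {
        return pow(loc, 3.0) / (pow(ck, 3.0) * pow(years, 4.0));
    }

    int main(void)
    {
        double loc = 100000.0;   /* delivered lines of code (invented) */
        double ck  = 5000.0;     /* technology constant (invented)     */

        for (double td = 2.0; td <= 4.0; td += 1.0)
            printf("td = %.0f years -> K = %6.1f effort-years\n",
                   td, effort(loc, ck, td));
        return 0;
    }

For these invented values the required effort falls from 500 effort-years on a two-year schedule to about 31 effort-years on a four-year schedule.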
A second class of cost model looks for relationships that will
compute cost and time as a function of the number of lines of code.
Most such relationships are expressed in the form
E = aL^b
where E is the effort (or time) and L is the number of lines of code.
The models focus on producing families of (a,b) values to account for
project-specific factors. The most widely used of these models is the
Constructive COst MOdel (COCOMO) developed by Boehm [Boeh81].
It is defined in three levels: basic, intermediate, and detailed. I will
describe only the intermediate model.
The first step in using the COCOMO model is to establish the type
of project. Boehm defines three modes of development.
Different sets of (a,b) values are used for each mode. Once the mode
is established, the software components are defined and an initial size
estimate in delivered source instructions (DSI) is computed. Unlike the
Pressman example, only a most likely value is given for each component.
These estimates will now be refined using adaptation and cost-driver
data.
The COCOMO adaptation equations first reduce the number of lines
by the amount of effort that will be saved through reuse. The equation
is based on a 40-30-30 rule, and it accounts for the effort saved in the
reuse of designs, code, and/or integration data. The adaptation
adjustment factor (AAF) is computed for each component as

AAF = 0.40(DM) + 0.30(CM) + 0.30(IM),

where DM, CM, and IM are the percentages of the design, the code, and the integration effort that must be modified. For new (i.e., 100% modified) components, AAF is 100. The estimated DSI (EDSI) is then calculated as

EDSI = (ADSI x AAF)/100,

where ADSI is the size of the adapted component in DSI. The component estimates are converted into a nominal effort with mode-specific (a,b) values and then adjusted by the product of the cost-driver effort multipliers to produce the development effort, (MM)DEV. The development schedule has the same form,

T = aE^b.
Three pairs of (a,b) values are available to compute the total development time (TDEV) as a function of (MM)DEV and the development
mode. Once the total elapsed time is established, it must be distributed
into phases (e.g., requirements analysis, product design, programming).
Table look-ups are provided to distribute TDEV among the development
phases as a function of development mode and project size. The result
is a program plan that allocates labor and time by phase.
The basic COCOMO model is similar to the intermediate model
except that it does not apply the adaptation and effort multiplier factors.
The detailed model extends the analysis to each phase. Naturally, the
validity of the model depends on how well the parameters and table
values match the realities of a particular project and organization.
Boehm’s data were derived from numerous studies, and the number of
COCOMO users suggests that they provide a useful baseline. Notice
how the COCOMO approach differs from that of Putnam. Putnam
begins with the assumption that there is something “natural” about the
software process, and he sets out to model that property. Boehm, on
the other hand, believes that there are many factors that affect productivity in software development, and he sets out to identify and empirically quantify them. In this brief discussion, three factors were identified:
the development mode, the degree of reuse (adaptation), and the effort
multipliers. Even if one does not use the COCOMO model, an
awareness of these factors is important to management. Figure 1.13,
presented in a different context in Chapter 1, displays the relative effect
of each multiplier. Consistent with my theme of software engineering
as problem solving, the range of personnel and team capability is twice
as great as that of any other factor. The range suggests that the cost for
a project staffed by the least capable personnel will be four times greater
than that of a project staffed with the best people. Of course, that is
misleading; the least-capable staffing might never get the job done.
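A minimal sketch in C conveys the flavor of these calculations, using the basic model’s organic-mode equations (MM = 2.4(KDSI)^1.05 and TDEV = 2.5(MM)^0.38) and an invented size of 32 KDSI; the intermediate model would further adjust the effort with the adaptation and cost-driver multipliers described above.

    #include <stdio.h>
    #include <math.h>

    /* Basic COCOMO, organic mode; the 32 KDSI size is invented. */
    int main(void)
    {
        double kdsi = 32.0;                    /* thousands of DSI      */
        double mm   = 2.4 * pow(kdsi, 1.05);   /* effort, person-months */
        double tdev = 2.5 * pow(mm, 0.38);     /* schedule, months      */

        printf("effort   = %5.1f person-months\n", mm);
        printf("schedule = %5.1f months\n", tdev);
        printf("staffing = %5.1f people (average)\n", mm / tdev);
        return 0;
    }

For the invented size the sketch reports roughly 91 person-months and a schedule of about 14 months.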
One of the problems with models such as COCOMO is that they
depend on the number of lines of code to estimate the cost. That is,
they assume that the project cost is a function of the size of the
delivered product. The use of fourth generation languages (4GL) and
other productivity-enhancing tools demonstrates that products of
different sizes can deliver identical functionality. Naturally, the smaller
the product, the lower the development cost. But not all project costs
are related to the size of the delivered product. Jones points to the
following paradox: for a fixed product, the cost per line of code increases
as the total number of lines is reduced [Jone86a]. This is because the
fixed costs for requirements analysis, top-level design, integration,
training, etc. are independent of the programming language. As the
denominator in the cost/line decreases, the ratio increases. Thus, the
estimates of the size and effort for atypical development environments
present very special problems [VeTh88].
It would be helpful to have a technique that estimates a project’s
cost as a function of the target system’s attributes rather than its
predicted size. Such a method would compute the cost of the problem
to be solved rather than the product to be delivered. Function points,
initially developed by Albrecht for commercial applications, offer an alternative that comes close to this goal [Albr79]. The objective is to
characterize the functions to be provided by a system and then to use
that characterization for estimating the effort. Five system properties
typically are used to compute the function point count (FP).
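A hedged sketch of the arithmetic follows. The counts are invented, the weights are the commonly published “average” complexity weights, and the adjustment applies the usual factor of 0.65 + 0.01 times the sum of the fourteen complexity ratings.

    #include <stdio.h>

    /* Function-point count with invented inputs.  Counts and weights are
       ordered as: inputs, outputs, inquiries, internal files, interfaces. */
    int main(void)
    {
        int counts[5]  = { 24, 16, 22,  4,  2 };   /* invented counts      */
        int weights[5] = {  4,  5,  4, 10,  7 };   /* "average" weights    */
        int ufc = 0;

        for (int i = 0; i < 5; i++)
            ufc += counts[i] * weights[i];

        int adjustment_sum = 42;                   /* sum of the 14 ratings */
        double fp = ufc * (0.65 + 0.01 * adjustment_sum);

        printf("unadjusted count = %d\n", ufc);
        printf("function points  = %.1f\n", fp);
        return 0;
    }

With these invented counts the unadjusted total is 318 and the adjusted count is about 340 function points.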
commercial tools for software cost and schedule estimation, and most of
them have satisfied and committed users. Find one that works for your
organization, and adapt the tool to your environment (but not vice
versa). Second, to be done in conjunction with number one, learn how
to estimate. If estimation were easy, we would do it instinctively and
never make mistakes. But estimation is counterintuitive, and we need
to develop the experience that makes it seem instinctive. (Repeat the
last sentence with “management,” etc. in place of “estimation.”) To
learn how to estimate we must think of estimating as a continuing
process. (Just like planning.)
Figure 6.8 is taken from a paper by Boehm; it shows how cost
estimates improve as the project progresses. During the feasibility stage,
estimates may be off by as much as a factor of 4. By the time the
detailed design specifications have been developed, estimates for the
remaining work are quite reliable. Certainly, we should not rely on the
least-dependable estimates for planning the project. As we gain more
information, we should refine the estimates and adjust the plan.2
DeMarco suggests that an organization assign a team the sole task of
estimating; by staying with this process and refining the estimates, team
members become skilled estimators [DeMa82]. (In keeping with Theory
Z, one need not stay an estimator forever. In fact, the estimation
lessons learned will be very useful in virtually all other assignments.) If
an organization is to adopt DeMarco’s idea, then it is necessary to
understand what estimating is. It is not the default definition of “the
most optimistic prediction that has a non-zero probability of coming
true.” Rather, it is “a prediction that is equally likely to be above or
below the actual result.” For DeMarco, “Estimates shall be used to
create incentives” and not as management-imposed targets to be met.
That is, by making the estimate an evolving best guess, it no longer can
serve as the abstract event to be managed by. He therefore suggests a
restructuring of the software process to create
His book describes how to define metrics that maximize BPB. As one
would expect, the parameters reflect local factors, and they improve as
experience accumulates.
A brief reprise. The chapter began with a review of metrics and
measurements and then examined three classes of model for estimating
time and cost. After considerable build up, the underlying flaw in this
process was exposed. We use models to build a plan, and we manage
the project with respect to that plan. However, because we are not
physicists trying to conform to some external reality, our models (and
therefore our plans) are not bound by any external constraints. A poor
plan, when well managed, will result in the suboptimal conduct of the
project. We have no alternative but to recognize these facts and learn
to accommodate these difficulties. We must subjectively define objective
criteria for controlling the process; as with other aspects of software
development, the fact that our performance is correct with respect to
plan offers no assurance that our plan is correct. That important point
reiterated, let us continue by examining another category of management
model, which addresses the question of how to validate that a piece of
software functions as expected. As discussed in the previous chapter, the
issue is one of software reliability (i.e., the probability that the software
is defect free). Is this a management or a V&V concern? Obviously,
both. Management must establish the desired level of reliability and
ensure that there is sufficient support for testing; the V&V team,
naturally, will be responsible for the conduct of the task and the
calculation of the reliability estimates. Both management and V&V
must base their decisions using a common set of reliability models.
The field of software reliability is broad, and I cannot hope to cover
much of the material here. The standard work is that of Musa, Iannino,
and Okumoto [MuI087]. Shooman covers the topic in some detail in
his text [Shoo83]; recent surveys include [Goel85], [Leve91], and
[ReVe91]. By way of a short introduction, I review some of the concepts
presented in a paper by Musa and Ackerman with the subtitle, “When
to stop testing?” [MuAc89]. In it, failures are defined as runs in which
the outputs do not conform to the requirements, and faults are the
instructions that underlie the failure. An operational profile defines the
set of valid inputs with their distribution, and we assume that software
failure is a Poisson distribution process. We define μ(τ) to be the cumulative number of failures expected to occur by the time that the software has experienced a given amount of execution time, τ. The derivative of μ represents the instantaneous rate of failure, denoted λ(τ). There are several formulas for computing μ(τ). The static model assumes that the software is not being changed, and λ(τ) reduces to the constant λ. The basic model accounts for changes as the software is being debugged; the assumption is that the faults are equally likely to cause failures. In the logarithmic Poisson model, some faults are
considered more likely to cause failures than others. Figure 6.9
compares these three software reliability models with respect to both
failures experienced versus execution time and failure intensity versus
execution time.
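For concreteness, the basic model is commonly written μ(τ) = ν0(1 - exp(-λ0τ/ν0)) and the logarithmic Poisson model μ(τ) = (1/θ)ln(λ0θτ + 1), where λ0 is the initial failure intensity, ν0 the total expected number of failures, and θ the intensity decay parameter. The sketch below in C tabulates all three models for invented parameter values.

    #include <stdio.h>
    #include <math.h>

    /* Compare the static, basic, and logarithmic Poisson models of the
       expected cumulative failures mu(tau); all parameters are invented. */
    int main(void)
    {
        double lambda0 = 10.0;   /* initial failure intensity            */
        double nu0     = 100.0;  /* total expected failures, basic model */
        double theta   = 0.05;   /* decay parameter, log Poisson model   */

        printf("  tau   static    basic  log-Poisson\n");
        for (double tau = 0.0; tau <= 50.0; tau += 10.0) {
            double m_static = lambda0 * tau;
            double m_basic  = nu0 * (1.0 - exp(-lambda0 * tau / nu0));
            double m_logp   = log(lambda0 * theta * tau + 1.0) / theta;
            printf("%5.0f %8.1f %8.1f %12.1f\n", tau, m_static, m_basic, m_logp);
        }
        return 0;
    }

In the tabulation the static model grows without bound, the basic model saturates at ν0, and the logarithmic Poisson model grows ever more slowly.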
The preconditions for using one of these reliability models are that
the significant failures have been defined, that an operational profile has
been specified, and that a failure-intensity objective has been set to some
desired level of confidence for the defined failures. The idea is that the
model defines the desired reliability property of the software as a
function of its testing time. If too many errors are encountered in the
specified time, then one may conclude that the desired reliability level
has not been achieved. In practice, the process proceeds as follows:
■ Continue testing until the selected model shows that the required
failure-intensity level has been met to the desired level of
confidence. [MuAc89, p. 22]
By selecting test cases that are more likely to fail, testing time may be
compressed. Nevertheless, to provide the necessary confidence for a
critical system that will be broadly distributed, considerable testing time
must be allocated. There are two problems with this approach in small
to large projects. First, the reliability models have been validated on
very large systems using extensive testing time; it is not certain that the
models scale down. Second, the testing is predicated on the formal
definition of the system combined with a solid understanding of its use
and the cost of failures. In effect, what we are doing is using experience
with other systems to build general models for software reliability and
then fitting one of those models to the target system. Subjective
judgment is used in the selection and parameterizing of the model; once
that task is complete, however, decisions regarding the software’s
reliability become objective.
■ Module size. The cost for larger modules is relatively lower than that for smaller modules, and there is no difference in fault rate.
This suggests that standards limiting the size of software units
may be ill advised. However, a subsequent analysis also showed
that complexity limits may promote maintainability.
Card suggests that what I have called open-loop learning is, in reality,
individual learning, which stays with the individual and is not retained as corporate
experience. Thus, as individuals move on, their experience base is lost. Process
management, however, attempts to share a common process that incorporates the
learning; it also provides a foundation for new learning to improve the process.
That is, the structure of process improvement should institutionalize the process,
the improvements, and the improvement process.
we first must get the process under control so that we can predict
outcomes reliably; we then must find ways of improving the process and
evaluating those improvements.
The initial chapters in this book discuss how to develop quality
software (i.e., software that is delivered on time and within budget, does
what is needed, is error free, can be maintained easily, and so on). If it
sufficed to follow those practices, management would have little to do.
Quality products would result automatically. Conversely, attempts by
management to control the process without the commitment of the
development and maintenance teams can have limited impact. There is
no manufacturing process in software, and the prerequisites for quality
improvement are improvements in the analysis, design, and testing of the
products. If software development is an intellectual, problem-solving
activity, so too is the improvement of that activity. Therefore, process
improvement demands the support of both the technical and management staffs, a fact noted throughout Humphrey’s six basic principles of
software process change.
■ Ultimately, everyone must be involved. Software engineering is a team effort, and anyone who does not participate in improvement will miss the benefits and may even inhibit progress.
Observe that principles one and six place the primary responsibility for
process improvement on management.
Because one cannot evaluate the impact of change unless there is a
stable baseline, Humphrey begins with a process assessment designed to
learn how the organization works, to identify its major problems, and to
enroll its opinion leaders in the change process. This assessment activity
requires the support of senior management. The team’s assignment is
to review the organization’s process with respect to some vision of how
its processes should be performed. The five-level maturity framework
is used as the context for comparison, and one outcome of the assessment is a common view of the desired software process. The objective
is to improve how the technical teams operate, and that requires their
cooperation. Confidentiality must be preserved, and the assessors should
avoid the perception that problems are being reported to management.
Honesty and openness are necessary, and the assessment team should
avoid presenting itself as knowing all the answers. The focus must be
on actions that can produce improvements; general sessions on problems
will be limited to a Hawthorne effect. The goal of the assessment
process is to establish a baseline and initiate an improvement program.
A baseline without the commitment to follow up is a hollow gesture; an
attempt to improve the process without a baseline is an open loop.
The thrust of Humphrey’s book is a description of the key elements
within each improvement phase. Most of this material already has been
presented elsewhere in these pages. Nevertheless, it is useful to see how
the concepts are organized from the perspective of Managing the
Software Process [Hump89a]. The tools to achieve the level of a
repeatable process include planning, an initial level of configuration
control, and a software quality control program. The move up to a
defined process requires standards, inspections, testing, extended
configuration management, and a definition of the software process. To
enable the last activity Humphrey suggests the creation of a Software
Engineering Process Group (SEPG) that identifies the key problems,
establishes priorities, defines action plans, gets professional and
management agreement, assigns people, provides training and guidance,
launches implementation, tracks progress, and fixes the inevitable
problems. Naturally, the SEPG requires the support of senior management, the line projects, Software Quality Assurance, and the professional staff; it provides a well-staffed, continuing focus for process improvement, which must be integrated into the organization’s ongoing activities
to succeed. Once a defined process has been established, a baseline
exists against which process interventions can be evaluated. This
requires data gathering and the management of software quality. Finally,
the optimization process emphasizes defect prevention, automated
support, and—perhaps—contracting for software.
Notice that process optimization rests on three pillars. The first is
the positive removal of errors, which in the software context is the only
measure of quality variability. The second is the support of a controlled
process using automation to reduce effort or error potential. Notice
that the premature introduction of automation may simply institutionalize existing practices.
That is, when a process is under statistical control, finding results that
are outside the expected bounds tells us something about the process and
not its control. If we react to individual out-of-bounds events by
adjusting the process’s control, we will—by definition—get it out of
control. However, if we use the out-of-bounds events to guide an
investigation of the process, then we have an opportunity for significant
process improvement. The key, therefore, is not to try to correct the
statistical control for outliers, but to remove those outliers by means of
process changes.
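A minimal sketch of this discipline, with invented inspection data, computes three-sigma control limits from a baseline and flags out-of-bounds observations for investigation rather than for adjustment of the limits.

import statistics

# Invented baseline: defects per KLOC found in earlier, comparable inspections.
baseline = [6.1, 5.4, 7.0, 6.6, 5.9, 6.3, 6.8, 5.7]
mean = statistics.mean(baseline)
sigma = statistics.stdev(baseline)
upper, lower = mean + 3 * sigma, mean - 3 * sigma

# Observations from the current project.
observed = [6.0, 6.4, 11.2, 5.8]
for i, rate in enumerate(observed, start=1):
    if lower <= rate <= upper:
        print(f"inspection {i}: {rate:.1f} defects/KLOC is within limits")
    else:
        # Out of bounds: investigate the process (an unusually complex unit?
        # a different inspection procedure?); do not quietly widen the limits.
        print(f"inspection {i}: {rate:.1f} defects/KLOC is outside "
              f"[{lower:.1f}, {upper:.1f}] -- investigate")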
Humphrey’s recommendations for the prevention of defects are
consistent with the concepts presented throughout this book. They are
important enough to bear restatement.
■ The programmers must evaluate their own errors. They have the
greatest interest and will learn the most from the process.
■ There is no single cure-all that will solve all the problems. This is implicit in my holistic perspective. Error causes must be removed one at a time; the evaluation of specific interventions requires stability in those aspects not being changed.
The above principles for defect prevention are free of any measures.
Naturally, statistical control implies a measurement system that can
indicate when the process is out of control. In hardware manufacturing,
tolerances are computed for the parts; when a sample of parts implies
that tolerance is being exceeded, adjustments are made to the production
line to bring it back under control. With software, statistical control is
interpreted as the ability to predict with an expected level of certainty.
We have seen that management abstracts the project as a plan and then
manages that plan. The plan establishes levels of quality (e.g., reliability
and standards) and resource allocations (e.g., effort and schedule). A
project under statistical control will conform to its plan in a predictable
way. Being under statistical control, however, does not imply that the
process is in any way optimized; it simply means that we have good
models for describing how the processes operate. Without those models
we cannot control the project, but the models themselves are seldom
perfect. They do not model some external, fixed phenomena, and there
is no concept of correctness. Statistical control is equated with the
accuracy of prediction, and the process models are best interpreted as
stepping stones to better processes (with their models). All of which
raises the following question: If there is no way of knowing what the
process should be, then how do we know when changes improve the
process?
The method used to evaluate process changes is based on the
experimental paradigm of the scientific method. We begin with a null
hypothesis (H0) that states that the change to the system or the event
being measured has made no difference (i.e., the magnitude of any
effects can be attributed to chance). An appropriate statistical test for
H0 is selected, and a significance level (α) is specified as the criterion of rejection. The data are collected, the statistic is computed, and its p-value is determined. The null hypothesis is rejected if and only if the p-value is no larger than α. If rejected, we accept the alternative hypothesis (H1), which is actually the hypothesis we are interested in.
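The following fragment illustrates the paradigm with invented defect densities measured before and after a process change; the SciPy library’s two-sample t test, an outside tool chosen only for convenience, stands in for whatever statistic the experimenter would actually select.

from scipy import stats

# Invented defect densities (defects/KLOC) before and after a process change.
before = [7.2, 6.8, 7.9, 6.5, 7.4, 8.1, 6.9]
after = [5.9, 6.1, 6.7, 5.4, 6.3, 5.8, 6.5]

alpha = 0.05                              # significance level chosen in advance
t_stat, p_value = stats.ttest_ind(before, after)

if p_value <= alpha:
    print(f"p = {p_value:.3f} <= {alpha}: reject H0; the change appears real")
else:
    print(f"p = {p_value:.3f} > {alpha}: cannot reject H0")

Only the choice of test and of α involves judgment; once they are fixed, the accept-or-reject decision is mechanical, which is precisely the appeal of the paradigm.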
Typically, the experimental paradigm is reserved for highly repeatable
phenomena, but we have seen that few software processes satisfy such an
interpretation. Basili has adapted the scientific method to software
■ Build. Using the chosen processes, build the products and collect
the prescribed data. Where the processes are closed loop, it is
essential that the data be analyzed promptly so that the results
can be fed back for process adjustment.
■ Analyze. After the project has been completed, analyze the data
to evaluate the current practices, determine problems, and
recommend changes.
by the numbers and sizes of new units, modified units, and reused
units), and documentation (measured by the number of pages).
the observed rate from the current project. When the process exceeds
a control limit, it is by definition out of control and some management
action may be mandated. In this way, the control limits permit normal
variability while providing a very early warning of potential problems.
The same technique also can be used to evaluate the effect of a
technology introduction [Sayw81]. The intervention should produce a
new target/expected rate (with its set of control limits) that is in some
sense better, and project tracking can assess how well the project
conforms to those new limits.
Card’s book is clear and concise, and it is recommended reading for
both students and practitioners. He offers some very specific conclusions concerning the software he analyzed. However, I have not
repeated his results. Unlike physics, with its external universe, the
software process is adaptive. There are no universal measures. The
process of modeling the software process becomes more important than
the form of that perishable process model. There is an inherent tension
between control and improvement. Management needs a fixed model to
control software development and maintenance, and process improvements violate the established control limits. Of course, that is an old
problem for industrial engineers, but it is still new to software engineers.
I let Card have the last word on the subject.
The previous section considered how to use models for both management and process improvement. Two complementary modes of improvement are possible. One addresses the hygiene factors that degrade
productivity and quality. The first five chapters of this book provide a
collection of effective techniques for replacing bad practices. Although
many of these methods are philosophically incompatible, there is much
to choose from. Each alternative provides the means for avoiding
“doing things the wrong way.” Thus, in most cases the hygiene factors
can be confronted simply by process examination, introduction of
refinements and/or education, and then follow up, follow up, follow up.
The second approach to process improvement derives from the
■ Systems software. Here they noted a trend away from very large
products with most development efforts “limited to no more than
two years and 10 programmers on any particular product.” Most
testing is considered to be part of the development effort, and
separate test groups normally report to the development managers. Staff turnover tended to be relatively high, and the software
engineering practices varied widely; “the older the system, the
fewer software engineering techniques used.”
■ Tools typically are funded from project budgets, and there are
few mechanisms for prorating investment and training across
projects.
The reviewers also noted that many tools are incomplete or poorly
documented. “Because such tools fail to live up to promises, project
managers are justifiably reluctant to adopt them or consider subsequently developed tools.” One would hope that a new generation of
commercially supported software tools would mitigate this problem, but
managers (whose primary responsibility is risk reduction) have a long
memory for disappointments. Once a tool fails, there is a limited
incentive to try another.
The remainder of this section describes three promising technologies, which range in age from 5 to 15 years. Some, therefore, are available for immediate exploitation; others may require a decade of
refinement. In making a decision regarding how to implement these
technologies, management must understand what is commercially
available, how it can be integrated into their organization’s operations,
and the readiness of their staff to accommodate the changes. It would
be foolish for me to offer advice in making that determination. I can,
however, explain the technology’s underlying concepts, the understanding
of which is necessary for an intelligent decision.
6.3.1. Reuse
Thus, the solution space is smaller, and both concepts and module units
can be reused. If management commits to an object-based development
environment, the potential for increased reuse is a by-product.
If knowledge-based reuse is not a mature technology and if modular
reuse is enhanced with the adoption of an object-based environment,
then is there anything that management should do beyond simply
promulgating the use of Ada, C++, Eiffel, or Smalltalk? Emphatically,
yes. First, management must recognize that simply changing to a new
language will not lead to much of a productivity improvement. Recall
from Figure 1.13 that the effort multiplier with the least impact is that
of language experience. Once one knows how to program in one
language, one can quickly learn to program in another. Thus, if one has
a decade of experience with FORTRAN programs, it will not take long
to learn how to use Ada to write FORTRAN-style programs, which
promises no encapsulation payoff. If a new method is to be used, then
training is required. There is, however, a corollary to Brooks’s law,
“Training in a productivity-enhancing technology only reduces near-term
productivity.” Therefore, a commitment to a new technology without
budgeting for both the training and learning costs is really no commitment at all.
Even when the new technology is properly integrated into the
environment, reuse benefits will be slow. Because the first projects have
no libraries to draw from, little can be reused. In fact, the cost of
preparing modules for reuse is greater than the cost for their one-time
use. Consequently, one can expect productivity to decrease during the
first few projects conducted in a reuse context [BaBo91, CaBa91]. Reuse
requires a long-term commitment; its benefits may take years to
materialize. Furthermore, management would be prudent to follow the
risk-reduction principle of starting small and learning by doing. Some
researchers are working on environments to maintain and retrieve
reusable fragments. In particular, the faceted-retrieval scheme of Prieto-Diaz and Freeman has received considerable attention [PrFr87]. This kind of work focuses on investigations into the management of large collections of reusable components.
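The flavor of a faceted scheme can be suggested in a few lines of code; the facet names and component descriptors below are invented for illustration and are not taken from the Prieto-Diaz and Freeman work itself.

# Each reusable component is described by a value along each facet.
library = {
    "sort_names":   {"function": "sort",   "object": "names",   "medium": "array"},
    "merge_files":  {"function": "merge",  "object": "records", "medium": "file"},
    "search_index": {"function": "search", "object": "keys",    "medium": "tree"},
}

def retrieve(**facets):
    # Return the components whose descriptors match every requested facet value.
    return [name for name, descriptor in library.items()
            if all(descriptor.get(f) == v for f, v in facets.items())]

print(retrieve(function="sort"))                  # ['sort_names']
print(retrieve(object="records", medium="file"))  # ['merge_files']

Retrieval by facet values, rather than by exact name, is what lets a designer find a candidate component without knowing that it exists.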
6.3.2. Reengineering
Designers need to know about only the first two of these; the third is
the outcome of their development activity. Clearly, the maintainers
must understand what the product does, how it is structured, and what
the implications of change are. Prior to structured programming,
branching encouraged the distribution of processing throughout the
program; after structured programming, the effect of most processing
was limited to what could be read on (or evoked from) a single page.
Encapsulation, as discussed in Chapter 4, continues this process of
localizing concepts and hiding details. Nevertheless, considerable effort
is required to gain a sound understanding of the product. The maintenance activity, therefore, should be organized to speed the accumulation
of the desired knowledge and maximize its retention.
When a development team follows a throw-it-over-the-wall philosophy, its staff retains the product-specific knowledge only through system acceptance. It is assumed that another contractor will assume maintenance responsibility, which requires an investment in learning the above three categories of knowledge. When, on the other hand, the development team has a mission orientation, the “wall” disappears; team
members retain a commitment to the entire product family. In either
case a stable, dedicated maintenance staff is desirable. One learns by
doing, and in time one gets to be very good at it. Moreover, when
maintenance-support tools are used, the training investment also is
reduced. Using maintenance as a training ground for new programmers,
however, is often inefficient. The important assignments are too
difficult for the junior staff, and the knowledge gained from the smaller
assignments is seldom developed. When possible, my personal preference is for the mission orientation; here the interest is in the problems
and their solutions—not a narrow technology (cf. the above discussion of
Theory Z). Knowledge of the problems, technology, and implementation
reinforces each other; the potential for reuse and symbiotic transfer
grows. The mission orientation is most easily instituted with internal
developments.
A second, related aspect of maintenance is the characterization of
its process. Basili suggests that we should consider maintenance “reuse-oriented software development.”
■ Full-reuse model. Here one starts with the requirements for the
new system and reuses as much as possible from the old system.
Because the new system is built from documents and components
in the repository, a mature reuse foundation is a prerequisite.
[Figure: the reuse-oriented maintenance process, from Requirements (constraints, objectives, business rules) through Design to Implementation.]
I have used the term reengineering in the section title; it is the broadest
of the concepts defined here.
If an organization is to perform maintenance, then it must engage
in design recovery. How else will it know what to change? Naturally,
the better the software products conform to modern programming
practices, the easier that design recovery should be. Once the software
structure has become so degraded that modifications are either very
risky, costly, or both, there is no alternative but to freeze or retire the
existing programs. Replacement alternatives include a costly new
system, a difficult-to-maintain revision of the existing system, or an
inexpensive and reliable reengineered system. The adjectives leave little
ambiguity about my recommendation. Although reengineering has a
clear advantage over the other approaches, in the state of today’s
practice it is largely an intellectual and manual process. There are some
design recovery tools that help designers navigate through the existing
programs, and restructuring tools exist to convert operational systems
into forms that are more easily maintained [Arno86]. Here is the
manager’s dilemma. Reuse establishes a foundation for process
improvement in later projects, but its immediate impact is increased
cost. Until reuse is available, the most cost-effective alternative to new
development is reengineering. But, reengineering, without reuse, simply
sets the stage for another iteration of reengineering. That is, because
there is no repository for the recovered designs, the structure of the
newly reengineered system will become obscured during modification,
and ultimately another round of design recovery will be needed. Thus,
design recovery can never be automated until a reuse scaffolding is in
place. And without the ability to reuse, design recovery (which is at the
heart of reengineering) will remain a labor-intensive, problem-solving
activity. No silver bullet.
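To suggest what even a primitive design recovery aid looks like, the sketch below (which is not one of the tools cited above) uses Python’s ast module to recover a crude call map from source text; real tools must cope with far messier languages and far larger systems.

import ast

def call_map(source):
    # Map each top-level function to the names it calls: a crude recovered design.
    tree = ast.parse(source)
    calls = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            called = {n.func.id for n in ast.walk(node)
                      if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}
            calls[node.name] = sorted(called)
    return calls

example = """
def validate(cr): return bool(cr)
def process(cr):
    if validate(cr):
        archive(cr)
def archive(cr): pass
"""
print(call_map(example))
# {'validate': ['bool'], 'process': ['archive', 'validate'], 'archive': []}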
Clearly, automation can help, but we had better know what we are trying
to do before we begin to automate the process.
The presentation in this chapter has been a “what is” rather than a “how
to” description of software project management. I can justify my
approach by pointing to all the books that describe the management
process. The most widely referenced text on the subject is by Metzger
[Metz81]. There also are several books that address the people-oriented
aspects of managing software projects [CoZa80, Lick85]. Most software
engineering textbooks have sections on management; Sage and Palmer
Forty years ago, the initial concern in the new discipline was the
creation of high quality programs. It is my thesis that software
engineering is a response to that concern; it focuses on finding better
ways to design and implement programs.
For example, the waterfall model and its variations aim at validating
and refining the design until it is specific enough to be converted into
a set of programs that can be tested individually and then integrated.
The idea is that we cannot create correct programs until we have a valid
design; once that design exists, programs can be coded. The proof of
correctness addresses the issue of algorithm design; it is concerned with
understanding the logic of the computation, which is—of course—a purer
representation for the program. The cleanroom eschews execution-based
testing so that the designers can focus on what the programs should do
rather than how they happen to behave. Emphasis is placed on
reasoning about the program’s behavior rather than feedback from
hacking. Formal methods such as VDM use mathematical descriptions
at a higher level. Here the product is a system and not just a program,
and the availability of a formal specification provides a mechanism to
evaluate the correctness of the programs. The debate between followers
of decomposition and composition was motivated, in a large part, by a
concern for the best way to structure a system as program modules. In
fact, programming-in-the-large refers to a module interconnection
language, and—as I have described it—modeling-in-the-small focuses on
modularization techniques for implementation. Have I missed alienating
anyone?
Although the previous paragraph identified many different methods,
the central goal was the same. We need to produce programs, which are
formal models of computation. Therefore, as a prerequisite to producing these programs we must produce a design that specifies what the
programs are to do. Decades of software development have taught us
to believe that our only other choice is to write the programs first and
then produce the design—a universally rejected strategy. Nevertheless,
there is another way of looking at the problem. Revert back to the
essential software process, illustrated in Figure 1.5, and view the
software process as the transformation from a need to an automated
response to that need. As shown in the figure, we begin with subjective,
conceptual models of both the problem and its solution. We next create
a formal model that specifies the desired implementation, and then we
create the implementation. Software engineering thus reduces to the
discipline of combining a group of needs with the understanding of a
That is, one cannot understand the problem-oriented paradigm using the
concepts of the product-oriented paradigm. A problem-oriented
“program generator” is about as meaningful as a “correct initial
specification.” There is only an intuitive desire that the term makes
sense.
Having rejected the idea that the program is a product to be
specified and built, I have emancipated myself from the encumbrances
of program development. To address the issue of how to represent the
problem to be solved (i.e., the need to be met using automation), I must
explore how to exploit the special properties of software. (This pair of
complementary sentences expresses the paradigm shift; if that fact is not
clear, then reread them before proceeding.) To continue, software has
(at least) five characteristics that distinguish it from hardware.
2. The version of TEDIUM used for OCIS generates programs in the MUMPS programming language [Lewk89].
EXERCISES
1.1 In 1957 Backus talked about “one job done with a system.” Was
this a good evaluation of the system? How could he have made the
evaluation better? If one were to evaluate a system today, how should
it be done?
1.2 The 1957 Backus quotation showed how FORTRAN changed the
problem and reduced the size of the task. Are there systems in the
1990s that apply the same techniques that FORTRAN used in the
1950s? Explain.
1.4 Boehm, in his 1976 paper, talked of two problem areas. Where
are we today with respect to Area 2? Who are our technicians for Area
2, and what is (or ought to be) their training?
1.11 In Boehm’s 1976 illustration (Fig. 1.2), there are two places
where the word “test” appears. Are these the only places that testing is
done? Explain.
1.12 Are there other ways to show the flow implied by Figures 1.2 and
1.3? Try your own version of the waterfall flow.
1.13 The waterfall flow was designed for very large projects. Are there
problems with its use with small projects? Explain.
1.16 In Figure 1.4, which shows the essence of the waterfall model,
where can computer science improve the process? Explain.
1.17 In the essential software process model (Fig. 1.5), two types of
models are identified. From your experience with software, give three
examples each of a conceptual and a formal model. In the essential
software process, which type of model is of concern to computer
scientists? To practicing software engineers?
1.18 What are the advantages of doing it twice? Do you think that
this is a common industry practice? Explain.
1.24 Take a computer program you have written and break it down
into chunks. Do the chunks map concepts into operational structures
(e.g., sequences and loops)? Are the chunks in turn composed of
chunks?
1.25 Try this experiment with a colleague: Show him a short program
and test his recall; repeat the same experiment with scrambled code.
Compare the performance for the two cases.
1.28 For the designer of an environment for human users, what clues
do the studies of bias in decision making provide?
1.29 Do you agree or disagree with all that has been said about human
problem solving? Explain.
1.31 Why do you think that the LST model concentrates on reification
rather than abstraction?
1.37 Why do you think that the list of potential improvements that
Brooks lists in his “No Silver Bullet” paper can, at best, address only
accidental difficulties?
1.41 List five facts that you believe would surprise most noncomputer
people. That would surprise most managers. Most engineers.
2.2 Give an example of some process that can be specified, but that
may be difficult to implement as a program. What are the major difficulties?
2.6 Is a formal SCM system required for all projects? If not, what
conditions would justify an informal approach to SCM?
2.7 Identify some of the costs and delays inherent in a formal SCM
system. How can they be reduced?
2.8 Refer to the waterfall diagram in Figure 1.2 and list the CIs that
should be maintained under SCM for a large project. Are there any
changes for a small project?
2.10* Write a short report that defines the method for revision
numbering. Be sure to include sufficient detail so that the report can
be referenced as a requirement for the code management component of
the SCM system.
2.11 Give two examples of systems in which the design of the system
can be defined by the data representations it manages. Give two
examples in which the data structures are not important. How
important are processes that act on the data in these illustrations?
2.14* Modify the ERM in Figure 2.8 to reflect the real-world fact that
either customers or developers can submit CRs. (You may have to
extend the notation to show this. Does the new notation introduce
ambiguity?)
2.15* Draw an ERM diagram for a system that makes task assignments,
requests the release of CIs, returns CIs, and reports status. Do not try
to follow the diagram in Figure 2.8. How does your diagram differ from
that in Figure 2.8?
2.16* Develop an ERM for the code management system of the SCM
system. Assume the external interfaces are messages to copy files to and
from the baseline. Obviously, the messages must identify both a
designer and an authorized SCM agent.
2.17 Present a statement using the relational algebra that would list
all CIs assigned to designer Jones. Identify the relations, their
attributes, and keys. Could you use the algebra without knowing the
schema? Now sketch a procedural program to produce the same results
from that schema.
2.20 Assume that there are two categories of customer. Some that
purchased systems and could not submit CRs and others that could
submit both SIRs and CRs. How would this change the diagram in
Figure 2.11? Why?
2.22* Using the context diagram developed in exercise 2.21, describe the
sequential flow for the processing of a CI. Once you have laid out a
sequence including both loops and selections (i.e., if... then ... else), are
there any exceptions to this flow? (Note: in Chapter 3 we will see how
Jackson models this kind of sequence.)
2.23* Expand the Initiate Task box in the SADT diagram shown in
Figure 2.18 as a complete SADT diagram. Be sure that the diagram
inputs and outputs match those of the box in its parent diagram.
2.24* Note that the boxes in Figure 2.18 are actions and the flows are
data. This is sometimes called an “actagram,” and it is possible to draw
a “datagram” in which the boxes are data and the flows are actions. Try
drawing a datagram for the processing of a CR from receipt to CI
update. What data would be drawn in boxes? What actions would flow
from box to box? What are the controls and mechanisms?
2.26* Define an event list for the condition described in exercise 2.25.
Present that event list in the form of a decision table.
2.30 What are the preconditions and postconditions for the two states
in the state-transition diagram in Figure 2.20? For the boxes in the
SADT diagram in Figure 2.18? How formal are your statements?
2.33* Write an SRS for the preliminary processing function of the SCM
system. Are some areas unclear? How can you resolve these issues?
Should you delay the SRS until you have these answers, or should you
defer the resolution until design? Why?
2.34* Use the various checklists to critique your SRS section. How
much additional effort would be required to get an SRS that satisfies
these minimal demands? Is it possible to produce a good SRS without
additional domain knowledge? Explain.
2.35* Outline the characteristics of the user interface for the code
management system. (See exercise 2.9.) Critique your design according
to the criteria given in the chapter.
2.39 How would the emphasis in this chapter change if the case study
were a business management system? An embedded process control
device? The security module of an operating system?
3. Modeling-in-the-Large
3.3 Discuss some aspect of the SCM and examine how it can be
implemented. Does your explanation begin at the top or with some
specific characteristic?
details)? To what extent did the top-down approach add to the clarity
of your presentation?
3.6 Describe the current physical and logical models for the way HHI
supports SCM. How will the new SCM system alter this (i.e., what is
the new logical model)?
3.7* The context diagram in Figure 3.1 was adapted from the entity
diagram developed using Orr’s method. In SA one would produce a new
physical model in the form of a data flow diagram, draw a boundary
around what the system should support, and then reduce what is inside
that boundary to a single bubble. Try this as an exercise starting with
the current logical model.
3.9* Assume that the input to a bubble is a sorted file containing all
active CRs sorted by Priority, Source, and CR-No. The input file has a
header containing its creation date. The output from this process is a
report containing a list of active CRs that identifies the source and the
number of active CRs for that source. The report starts a new page for
each priority category, is headed with the input file creation date, and
contains a summary page with the total number of active CRs in each
category. Write a minispec for this process (bubble). Write a data
dictionary entry for the input file. How would you show the sort order?
(Note: there is no standard method.) Write a description of the output
file. (Hint: show the report layout and identify the items in the report
by their dictionary names, if available.)
3.11* Write a data dictionary for the attributes identified in the partial
SCM scheme shown in Figure 2.9. Does that scheme match what was
produced by completing the DFD? How do you explain the difference?
3.12* Discuss the differences between the DFD and the structure chart.
3.13* Draw a structure chart for the process defined in exercise 3.9.
Explain where transform analysis was conducted. Was transaction
analysis required?
3.14 What insights about the product and its design did you develop
as the result of exercise 3.13? After drawing the structure chart, would
you change the pseudocode in exercise 3.10?
3.17 Discuss how (or if) the three phases of the Ward and Mellor
book correspond to the titles of Chapters 2 through 4. (Hint: as with
much of software engineering, there is no single correct answer.)
3.18* Make an event list for just the code management portion of the
SCM. Use the four-step approach suggested by Yourdon and produce
a DFD and data dictionary. Is this DFD different from what you
described in exercise 3.8? Does a process labeled Source Code
Management exist in the earlier DFDs? Should it?
3.22* Use JSP to produce the pseudocode for a program that reads in
CRs (as structured in Fig. 3.14) and produces a report containing all
CRs that are active. It lists the CR number, source, and priority plus a
count of the number of CIs and documents affected. A final summary
page provides a count of active CRs plus the total number of CIs and
documents affected; this is given for each priority, and there is a total.
3.23* Repeat exercise 3.22 but list only completed CRs. Sort the
output by priority and then by date completed (contained in the Status
Complete block). List the CR number, source, and counts of CIs and
documents affected. Include a final summary page. (Note that this kind
of structure clash is easiest to resolve using a sort file as shown in Fig
3.19.)
3.25* Figure 3.23 shows the life history of a CI. Draw a similar Jackson
diagram for the activities of the CCB.
3.26* What are the actions of interest to the CCB? How are these
actions similar to events as they were discussed in the state-transition
model? Are state changes associated with actions? Are state changes
associated with moving from box to box in a Jackson diagram?
3.29* Produce pseudocode (and the Jackson diagrams) for the three
processes introduced in Figure 3.31. Note how each box in the network
represents a process diagram that can be converted into pseudocode text.
3.31* Use the contents of the control hierarchy shown in Figure 3.34
to produce a structure chart for the SCM system. (Treat Task as one
module; all the other modules should have been expanded as the result
of the above exercises.) Note that one would not normally produce a
structure chart using JSD. I offer this as an exercise to show how the
information produced by the Jackson method may be represented as a
set of interacting modules. The key to JSD is how it helps the designer
find the interconnections.
3.32* Assume that two SCM systems are developed: one from the DFD
diagram in Figure 3.5 and the other from the control hierarchy shown
in Figure 3.34. How would these two systems differ in their internal
structure? The structure of the files on which they operate?
3.34* Continuing with exercise 3.33, discuss the effects on both systems
if we elected to add the completion file (described in exercise 3.24) and
the associated report.
3.35 Assume that you had to give a presentation to the sponsor about
the SCM system that you were about to develop. Which types of
diagrams would you find most effective?
3.36 Discuss how you could combine some of the best features of
SA/SD and JSD/JSP. What would be the advantages? Disadvantages?
4. Modeling-in-the-Small
4.1 The general theme of this text is that as the design progresses we
go from solving problems defined in the application domain to solving
problems in the implementation domain. Using the titles of Chapters
2 through 4 discuss this view. See if you can define the dividing lines
between the steps in the overall process.
4.2* Choose some portion of the SCM design and write a design
specification for that set of functions. Include any of the graphic
descriptive materials (e.g., DFDs) that you may already have produced.
4.3* Design a cover sheet for the unit folders to be used by HHI in its
development of the SCM system.
4.4* Select a program module from the detailed design and prepare a
sample program prologue for it.
4.5 Select two programming languages with which you are familiar
and rate them according to the characteristics of a modern programming
language. Can the languages be improved? If so, how?
4.9* Describe some of the rules identified in Figure 4.2. How many
of these rules can be automated and how many rely on human judgment?
4.14* Draw a DFD for the KWIC index program. Use JSP to create a
program structure diagram for the KWIC index program. How do these
designs differ from each other and the two Parnas modularizations?
4.17 Write pre- and postcondition assertions for the four stack
functions in Figure 4.6. (Use the notation used in the VDM discussion
in Section 1.2.3.)
4.19 Prove that the procedures in Figure 4.9 satisfy the axioms in
Figure 4.6. Is there anything else in Figure 4.9 that requires a proof?
4.21* Declare S to be of type stack and write some Ada code that uses
the operators CREATE, PUSH, TOP, and POP on S. Write some
sequences that would raise an exception.
4.22* Define a package for the type LIST with the operations CREATE,
HEAD, and TAIL where HEAD returns the first item in the list and TAIL
returns the list without the first item. Restrict the list items to the
integers, and identify the exceptions.
4.24* Write a program that uses LIST_CI to list out all versions of a given CI. That is, assume that the program has CI_No as an input. (Use pseudocode and do not be concerned with the exact Ada syntax.) Extend this to a function called SELECT_CI that reads in a value for CI_No, produces the list together with a self-generated sequence number, and once the list is produced prompts the user to select one of the listed items by entering the sequence number. If the entered number is valid, the function returns the CI_ID of the selected item; otherwise it raises a NOT_FOUND exception.
4.25* Augment the definition of the CI_ID record by adding the name of the file in which the CI is actually stored. Explain how this link between the surrogate record that describes the CI and the physical file that stores the CI can be exploited.
4.26* Write the procedure for NEW_VERSION using the CI_RECORD
package. Do the same for ASSIGN.
4.28 Reexamine CI_RECORD and explain why you feel that the design
is satisfactory or not satisfactory. Hint: try to determine an invariant
and prove that the operators do not affect the invariant.
4.31 Provide more examples and counterexamples for the five criteria
for evaluating design methods with respect to modularity. Are there
additional criteria?
4.34 Eiffel has a predefined feature Create that is valid for all classes.
Identify some additional predefined features that might be useful. What
is the danger of having many predefined features?
4.35 Based on the discussion in the text, revise and complete the
definition of the CI class given in Figure 4.14.
4.36 Define the class CI-Info. (Develop your own syntax to show
inheritance if necessary.)
4.38* Define classes for Ada and C programs. Also define classes for
Unix and DOS applications. Now define a class for DOS applications
in C.
4.43* Complete the definition of the Task object in Figure 4.19. Now
complete it in Figure 4.21.
4.44* Use the template in Figure 4.22 to specify the Task object as
defined in exercise 4.43.
4.46 Summarize the five steps in OOA. How do they differ from SA?
JSD? OOD?
4.47* Prove the program shown in Figure 4.24. How do you know the
loop terminates? Was your proof formal or rigorous? If you worked in
a group, did the group interactions help or slow the process?
4.49* Prove the L3 statements about the binary properties of the gcd.
4.50* Complete the proof of Q' using the three intermediate assertions
provided by Hoare.
5.2 Some methods produce programs that are correct the first time.
The 40-20-40 rule implies that 40% of the total effort will be required
for integration and test. If the programs were already correct before
integration, how would the 40-20-40 rule change? How could you verify
this? (The author cannot supply a suggested answer to the first
question, but he has one for the second.)
5.7 Discuss why it is difficult for someone to find his own errors.
Are there cases when a person is the best one to identify errors? Recall
the discussion of incubation in Section 1.3.1; what is the benefit of
having time to think about problems? Recall the situation in the
Hubble Space Telescope project in 1980; what are the dangers of a
highly stressed environment?
5.11* Refine the response to exercise 5.9 and repeat exercise 5.10.
5.15* Lay out a plan for reviews from the creation of requirements to
the implementation of code. What should be examined, when, how, and
by whom?
5.18 Before reading this chapter, did your view of testing conform to
Myers’s definition? Describe your understanding and explain the
implications.
5.22 Compare Fagan’s data in Table 5.4 with Beizer’s data in Table
5.5. Are there any contradictions?
5.25* Find a small program and count its operators and operands.
Compute Halstead length, volume, and program level. How would you
automate this process?
5.27* Construct the test set for path coverage of the program in Figure
5.2. The test set must include both inputs and expected outcomes.
5.28* What does the program in Figure 5.2 do? Should it be recoded
to do this better? Can all paths be tested with the test set? Is this a
problem?
5.29 Observe that exercises 5.27 and 5.28 involve static analysis. After
this analysis are you confident that the program is correct? What kinds
of errors might you find by machine testing?
5.34* Using the answer from exercise 5.28, construct a functional test
set for the program in Figure 5.2. What functional tests could be
developed using only the program code in Figure 5.2?
5.35* Assume that there was a program to test for dataflow anomalies
by static analysis as suggested in exercise 5.32. Describe a set of test
inputs.
identical in each. Given that the data type for st1 and st2 limits them
to from 0 to 6 characters, develop a test set for this program.
5.38* Use error guessing to determine the three tests with the highest
probability of finding an error in the program described in exercise 5.37.
5.3.5. Integration
5.41* Describe a system test for the SCM system. Which of Myers’s
test categories are inappropriate for this product?
6.1 List some management problems that are valid for any kind of
project. List some problems that are unique to a software project.
6.4 Management can control only resources, time, and the product.
Can a schedule be stretched without affecting the other dimensions? If
so, are there limits? Do all product changes affect the other two
dimensions? Give examples.
6.9* Develop a work breakdown structure for the HHI SCM project.
6.12 Describe how you might estimate risk exposure for the SCM
project. How would you compute risk reduction leverage?
6.14* Identify the major risks for the SCM project. Are they generic
or project specific?
6.18* Use Boehm’s top-10 list and identify specific risk examples that
might be expected in the SCM project.
6.19* What would be the best way for HHI to organize the SCM project? Explain your recommendation.
6.21 Can one use incremental development when the project has a
fixed requirement specification?
6.22 Explain DeMarco’s statement, “You can’t control what you can’t
measure.”
6.24* Critique the cost estimates shown in Figures 6.5 and 6.6. Do they
seem reasonable? Is the effort on some modules underestimated?
Overestimated?
6.25 How does the Putnam analysis help in planning the SCM project?
6.26 Using the ideas of the COCOMO model, are there any portions
of the SCM that could be reused from existing HHI products? What of
the effort multipliers; are they all nominal, or are some high? Or low?
6.27 Using the multipliers given in the typical formula, compute the
number of function points in the SCM project. If this exercise is being
done as part of a group activity, first do the computations individually
and then compare results.
6.29* Discuss the reliability of the HHI SCM system. What is the
prerequisite to knowing when enough testing has been conducted?
6.30* Assume that you were responsible for estimating the complexity
of software. Define three complexity metrics that might be effective in
the HHI environment.
6.35 Given the goal of reducing the number of changes to items under
configuration control, use the GQM paradigm to suggest how we might
achieve that goal.
6.37* Assume that you are hired as an expert by HHI to improve their
process. Draw up a plan for what you would do in the first 30 days
given that you have little information about how HHI now conducts its
affairs.
6.45* Prepare a plan for the introduction of CASE tools into HHI.
Appendix B
READINGS
[CaGo75] S. Caine and K. Gordon, PDL—A Tool for Software Design, Proc. Natl. Comp. Conf., 271-276, 1975.
[Gard85] H. Gardner, The Mind’s New Science, Basic Books, New York,
1985.
[Myer79] G. J. Myers, The Art of Software Testing, Wiley, New York, 1979.
[Oste87] L. Osterweil, Software Processes Are Software Too, Proc. 9th Int. Conf. S.E., 2-13, 1987.
[Wall26] G. Wallas, The Art of Thought, Harcourt, Brace, New York, 1926.
Requirements
  functional and nonfunctional, 86-87
  implicit, 102, 167-69
Resnick, B., 49
Restructuring, 496
Rettig, M., 433
Reuse, 490-93
Reuter, V. G., 440
Reverse engineering, 496
Reviews, 270
  benefits from, 377-78
  errors found, 377
  inspection, 373-76
  preliminary and critical, 369-70, 436-37
  requirements inspection, 376
  walkthrough, 370-73
Revision Control System, 98
Revision management systems, 98-100
Riddle, W. E., 487, 490
Rigorous approach, 152, 347
Risk
  generic and project specific, 443
  management, 441-48
  reduction, 448-55
  top-10 checklist, 448-49
Risks to the public, 364
Rochkind, M. J., 98
Rombach, H. D., 481
Ross, D. T., 131, 136, 188
Ross, R., 435
Rosson, M. B., 57
Royce, W., 454
Royce, W. W., 25-26, 37-38, 90
Rumbaugh, J., 329
Sackman, H., 76
SADT, see Structured Analysis and Design Technique
Sage, A. P., 501
Sammarco, J. J., 500
Sarson, T., 188
Sayward, F. G., 486
Schema, 49
Schulmeyer, G. G., 482
Scope of control, 214
Scope of effect, 214
Scott, D. B. H., 38-39
Security, 174
SEI, see Software Engineering Institute
SEL, see Software Engineering Laboratory
Selby, R. W., 417, 419
Selection task, 53, 114
Semantic data models, 119-20
Separation of concerns, 210
Set difference, 114
Seven ± 2, 46
Seven sins of the specifier, 166-67
Seventeen years ± 3, 487
Shen, Y. V., 392, 456, 457
Shneiderman, B., 49, 57, 170
Shoman, K. E., 131, 136
Shooman, M. L., 174, 178, 470
Shriver, B., 329
Siegel, S. G., 95, 97
Simon, H. A., 45, 89
Simplicity, 174
Simula 67, 313
SIR, see software incident report
Six-analyst phenomenon, 227
Smalltalk, 315
Smedema, C. H., 276
Smith, J. M., 120
Software configuration management, 94
Software crisis, 11
Software data collection system, 483
Software design specification, 157, 271-72
Software development paradigms, 509
Software engineering
  definition, 20
  scientific principles, 15
  two cultures, 18
Software engineering hierarchy, 64-65
Software Engineering Institute, 499
Software Engineering Laboratory, 481
Software Engineering Process Group, 477
Software factory, 495
Software incident report, 95, 96