Ravi Sethi - Software Engineering - Basic Principles and Best Practices (2023, Cambridge University Press)
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
www.cambridge.org
DOI: 10.1017/9781009051811
A catalogue record for this publication is available from the British Library.
Cambridge University Press & Assessment has no responsibility for the persistence or
accuracy of URLs for external or third-party internet websites referred to in this
publication and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
Brief Contents
Preface
1 Introduction
3 User Requirements
4 Requirements Analysis
5 Use Cases
7 Architectural Patterns
8 Static Checking
9 Testing
10 Quality Metrics
1 Introduction
1.1 What Is Software Engineering?
1.1.1 Definition of Software Engineering
1.1.2 A Tale of Two Companies
1.2 The Requirements Challenge
1.2.1 Identifying Users and Requirements
1.2.2 Dealing with Requirements Changes
1.3 Software Is Intrinsically Complex
1.3.1 Sources of Complexity
1.3.2 Architecture: Dealing with Program Complexity
1.4 Defects Are Inevitable
1.4.1 Fix Faults to Avoid Failures
1.4.2 Introduction to Testing
1.4.3 Black-Box and White-Box Testing
1.5 Balancing Constraints: The Iron Triangle
1.5.1 Scope. Cost. Time. Pick Any Two!
1.6 Social Responsibility
1.6.1 Case Study: The Volkswagen Emissions Scandal
1.6.2 The ACM Code
1.6.3 Case Study: The Therac-25 Accidents
1.6.4 Lessons for Software Projects
1.7 Conclusion
Further Reading
Exercises
3 User Requirements
3.1 What Is a Requirement?
3.1.1 The Basic Requirements Cycle
3.1.2 Case Study: Requirements Challenges
3.1.3 Kinds of Requirements
3.2 Developing Requirements and Software
3.2.1 Agile Methods Validate Working Software
3.2.2 Case Study: An Agile Emphasis on Requirements
3.2.3 Plan-Driven Methods Validate a Specification
3.3 Eliciting User Needs
3.3.1 A Classification of Needs
3.3.2 Accessing User Needs
3.3.3 Case Study: Intuit’s Design for Delight
3.4 Writing Requirements: Stories and Features
3.4.1 Guidelines for Effective User Stories
3.4.2 Guidelines for System Features
3.4.3 Perspective on User Stories
3.5 Writing User-Experience Scenarios
3.5.1 Guidelines for User-Experience Scenarios
3.5.2 Case Study: A Medical Scenario
3.6 Clarifying User Goals
3.6.1 Properties of Goals
3.6.2 Asking Clarifying Questions
3.6.3 Organizing Goals into Hierarchies
3.6.4 Contributing and Conflicting Goals
3.7 Identifying Security Attacks
3.7.1 Attack Trees: Think Like an Attacker
3.7.2 Finding Possible Attacks
3.8 Conclusion
Further Reading
Exercises
4 Requirements Analysis
4.1 A Checklist Approach
4.2 Relative Estimation: Iteration Planning
4.2.1 Anchoring Can Bias Decisions
4.2.2 Agile Story Points
4.2.3 Velocity of Work
4.3 Structured Group Consensus Estimates
4.3.1 Wideband Delphi and Planning Poker
4.3.2 The Original Delphi Method
4.4 Balancing Priorities
4.4.1 Must-Should-Could-Won’t (MoSCoW) Prioritization
4.4.2 Balancing Value and Cost
4.4.3 Balancing Value, Cost, and Risk
4.5 Customer Satisfiers and Dissatisfiers
4.5.1 Kano Analysis
4.5.2 Classification of Features
4.5.3 Life Cycles of Attractiveness
4.5.4 Degrees of Sufficiency
4.6 Plan-Driven Estimation Models
4.6.1 How Are Size and Effort Related?
4.6.2 The Cocomo Family of Estimation Models
4.7 Conclusion
Further Reading
Exercises
5 Use Cases
5.1 Elements of a Use Case
5.1.1 Actors and Goals Outline a System
5.1.2 Flows and Basic Flows
5.2 Alternative Flows: Conditional Behaviors
5.2.1 Specific Alternative Flows
5.2.2 Extension Points
5.2.3 Bounded Alternative Flows
5.3 Writing Use Cases
5.3.1 A Template for Use Cases
5.3.2 From Actor Intentions to System Interactions
5.3.3 How to Build Use Cases
5.4 Use-Case Diagrams
5.4.1 Diagrams Highlight Goals and Actors
5.5 Relationships between Use Cases
5.5.1 Subflows
5.5.2 Inclusion of Use Cases
5.5.3 Extensions of Use Cases
5.6 Conclusion
Further Reading
Exercises
7 Architectural Patterns
7.1 Software Layering
7.1.1 The Layered Pattern
7.1.2 Design Trade-offs
7.2 Three Building Blocks
7.2.1 The Shared-Data Pattern
7.2.2 Observers and Subscribers
7.3 User Interfaces: Model-View-Controller
7.3.1 Design Decisions
7.3.2 The Basic Model-View-Controller Pattern
7.3.3 Keep Views Simple
7.4 Dataflow Architectures
7.4.1 Dataflow Pipelines
7.4.2 Dataflow Networks
7.4.3 Unbounded Streams
7.4.4 Big Dataflows
7.5 Connecting Clients with Servers
7.5.1 The Client-Server Pattern
7.5.2 Deploying Test Servers
7.5.3 The Broker Pattern
7.6 Families and Product Lines
7.6.1 Commonalities and Variabilities
7.6.2 Software Architecture and Product Lines
7.6.3 Economics of Product-Line Engineering
7.7 Conclusion
Further Reading
Exercises
8 Static Checking
8.1 Architecture Reviews
8.1.1 Guiding Principles for Architecture Reviews
8.1.2 Discovery, Deep-Dive, and Retrospective Reviews
8.2 Conducting Software Inspections
8.2.1 The Phases of a Traditional Inspection
8.2.2 Case Study: Using Data to Ensure Effectiveness
8.2.3 Organizing an Inspection
8.3 Code Reviews: Check Intent and Trust
8.3.1 Invested Expert Reviewers
8.3.2 Reviewing Is Done within Hours
8.4 Automated Static Analysis
8.4.1 A Variety of Static Checkers
8.4.2 False Positives and False Negatives
8.5 Conclusion
Further Reading
Exercises
9 Testing
9.1 Overview of Testing
9.1.1 Issues during Testing
9.1.2 Test Selection
9.1.3 Test Adequacy: Deciding When to Stop
9.1.4 Test Oracles: Evaluating the Response to a Test
9.2 Levels of Testing
9.2.1 Unit Testing
9.2.2 Integration Testing
9.2.3 Functional, System, and Acceptance Testing
9.2.4 Case Study: Test Early and Often
9.3 Code Coverage I: White-Box Testing
9.3.1 Control-Flow Graphs
9.3.2 Control-Flow Coverage Criteria
9.4 Input Coverage I: Black-Box Testing
9.4.1 Equivalence-Class Coverage
9.4.2 Boundary-Value Coverage
9.5 Code Coverage II: MC/DC
9.5.1 Condition and Decision Coverage Are Independent
9.5.2 MC/DC Pairs of Tests
9.6 Input Coverage II: Combinatorial Testing
9.7 Conclusion
Further Reading
Exercises
10 Quality Metrics
10.1 Meaningful Metrics
10.1.1 Metrics Quantify Attributes
10.1.2 Selecting Useful Metrics
10.1.3 Goal-Directed Measurement
10.2 Software Quality
10.2.1 The Many Forms of Software Quality
10.2.2 Measuring Customer Support
10.3 Graphical Displays of Data Sets
10.3.1 Data Sets
10.3.2 Scales of Measurement
10.3.3 Bar Charts Display Data by Category
10.3.4 Gantt Charts Display Schedules
10.4 Product Quality: Measuring Defects
10.4.1 Severity of Defects
10.4.2 Defect-Removal Efficiency
10.4.3 Customer-Found Defects (CFDs)
10.4.4 CFDs Measure Installs, Not Quality
10.5 Ops Quality Improvement: A Case Study
10.5.1 How to Improve Software Quality
10.5.2 The Customer Quality Metric
10.5.3 Subgoals: Product and Process Improvement
10.5.4 Measuring Process Improvements
10.6 Data Dispersion: Boxplots and Histograms
10.6.1 Medians and Quartiles
10.6.2 Box Plots Summarize Data by Quartile
10.6.3 Histograms of Data Spread
10.7 Data Dispersion: Statistics
10.7.1 Variance from the Mean
10.7.2 Discrete Probability Distribution
10.7.3 Continuous Distributions
10.7.4 Introduction to Normal Distributions
10.7.5 Introduction to Student’s t-Distributions
10.8 Confidence Intervals
10.8.1 Definition of Confidence Interval
10.8.2 If the Population Standard Deviation Is Known
10.8.3 If the Population Standard Deviation Is Unknown
10.9 Simple Linear Regression
10.9.1 The Simpler Case: Line through the Origin
10.9.2 Ordinary Least-Squares Fit
10.10 Conclusion
Further Reading
Exercises
This book therefore focuses on basic principles and best practices. The
emphasis is not only on what works, but on why it works. The book
includes real-world examples and case studies, where possible. Some
classic examples are included for perspective.
Principles endure while practices evolve as the assumptions behind
them are reexamined. The principles in the book relate to the intrinsic
properties of software and human nature: software is complex, requirements
change, defects are inevitable, teams need coordination. Assumptions about
how to deal with these intrinsic properties have been tested over the years.
Must testing follow coding? Not with test-driven development. The
distinction between development and maintenance blurs with an evolving
software code base. All assumptions have to be questioned to enable the
pace of continuous deployment. What does not change is that design and
architecture are the key to managing complexity, iterative agile methods
accommodate requirements changes, validation and verification reduce
defects, and a healthy balance of structure and flexibility motivates teams
and improves performance.
Content Organization and Coverage
This book is intended for a junior- or senior-level introductory course in
software engineering. Students are expected to have enough programming
maturity to engage in a team project. They are not expected to have any
prior team experience.
The ACM-IEEE guidelines strongly recommend the inclusion of a
significant project in a software engineering course. First, systematic
engineering methods are intended for problems of complexity and scale.
With a significant project, students get to experience the benefits of
engineering concepts and methods. Second, users and teams bring a human
dimension to the discipline. Working with a real customer on a project
suitable for a team of, say, four provides students with a team experience.
Appendix A addresses the challenge of organizing a course with dual tracks
for concepts and a project. See also the brief comments in the following
paragraphs.
The chapters in this book can be grouped as follows: getting started,
what to build, design and architecture, software quality, and metrics.
Getting Started: Chapters 1–2 Chapter 1 introduces key topics that are
explored in the rest of the book: requirements, software architecture, and
testing. The chapter also has a section on social responsibility and
professional conduct.
Chapter 2 deals with processes, which orchestrate team activities. A
team’s culture and values guide activities that are not covered by the rules
of a process. Process models tend to focus on specific activities, leaving the
rest to the team: Scrum focuses on planning and review events, XP on
development practices, V processes on testing, and the Spiral Framework
on risk reduction. The chapter discusses how a team can combine best
practices from these process models for its project.
Processes and teams are closely tied, since processes organize teams and
their activities. Team skills belong in the same grouping.
Constraints are external forces from the context of a project. For example,
European Union regulations prohibit products from exporting personal data.
Cost, time, and legal constraints are mentioned explicitly in the definition of
software engineering. Projects must also deal with ethical and social norms.
Box 1.1 Origins of the Term Software Engineering
Identify and prioritize user requirements that truly reflect user needs.
The challenges include multiple users with differing needs and
communication gaps between users and developers.
What do users really want? There can be gaps between what users
are willing and able to communicate and what developers are able to
grasp.
The delivered product does what customers asked for, but it does not
have the performance that they need.
Requirements changes due to external factors can lead to a project
being redirected or even canceled.
1.2.2 Dealing with Requirements Changes
Changes in customer needs and requirements have repercussions for the
development process and for the product, because of the relationships shown
in Fig. 1.3. We have no control over changes in customer needs, but there are
two things we can do.
Layered Architectures
For examples, let us turn from a general discussion of architecture to a
specific form: layered architectures, which are widely used. In a layered
architecture, modules are grouped into sets called layers. The layers are
typically shown stacked, one on top of the other. A key property of layered
architectures is that modules in an upper layer may use modules in the layer
immediately below. Modules in a lower layer know nothing about modules
in the layers above them.
Example 1.6 The layered architecture in Fig. 1.4 is a simplified
version of the architecture of many apps. At the top of the diagram is
the Presentation layer, which manages the user interface. Below it is
the Domain layer, which handles the business of the app. In a ride-
sharing app, the Domain layer would contain the rules for matching
riders and drivers. Next is the Service Access layer, which accesses
all the persistent data related to the app; for example, customer
profiles and preferences. At the bottom, the Platform layer is for the
frameworks and the operating system that support the app.
The arrows in Fig. 1.4 are included simply to make the may-use
relationships explicit. Vertical stacking of layers can convey the same
information implicitly. From now on, such down arrows will be dropped
from layered diagrams. By convention, if layer A is immediately above layer
B in a diagram, modules in the upper layer A may use modules in the lower
layer B.
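The may-use discipline of Example 1.6 can be sketched in code. The layer names follow the text; the class and method names, and the ride-sharing details, are illustrative assumptions, not from the book.

```python
# A minimal sketch of the layered architecture in Example 1.6.
# Each layer holds a reference only to the layer immediately below it;
# no layer knows anything about the layers above.

class Platform:
    """Bottom layer: stand-in for OS and framework services."""
    def fetch(self, key):
        # A hypothetical data store; a real app would call a database.
        return {"rider:1": {"name": "Pat", "pickup": "Airport"}}.get(key)

class ServiceAccess:
    """Accesses persistent data; may use only the Platform layer below."""
    def __init__(self, platform):
        self.platform = platform
    def rider_profile(self, rider_id):
        return self.platform.fetch(f"rider:{rider_id}")

class Domain:
    """Business rules; may use only the Service Access layer below."""
    def __init__(self, services):
        self.services = services
    def match_driver(self, rider_id):
        profile = self.services.rider_profile(rider_id)
        # Real matching rules would go here; return a canned match.
        return f"Driver assigned for pickup at {profile['pickup']}"

class Presentation:
    """Top layer: user interface; may use only the Domain layer below."""
    def __init__(self, domain):
        self.domain = domain
    def handle_request(self, rider_id):
        return self.domain.match_driver(rider_id)

# Wiring mirrors the vertical stacking in Fig. 1.4.
app = Presentation(Domain(ServiceAccess(Platform())))
```

Note that `Platform` holds no reference to any layer above it, which is precisely the "know nothing about the layers above" property of the pattern.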
Input Domain The input domain is the set of possible test inputs. A
test input is more than a value for a single variable. A test input
provides values for all the relevant input variables for the software
under test. The input need not be numeric; for example, it could be a
click on a web page or a signal from a sensor.
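The point that a test input supplies values for all relevant input variables, not just one, can be illustrated with a toy function; the function and its parameters are hypothetical.

```python
# A test input bundles one value for EACH relevant input variable of the
# software under test. The shipping-cost function here is illustrative.

def shipping_cost(weight_kg, destination, express):
    """Toy function under test: cost depends on three input variables."""
    base = 5.0 + 2.0 * weight_kg
    if destination == "international":
        base *= 3
    if express:
        base += 10.0
    return base

# One test input = one value for every relevant variable, taken together.
test_input = {"weight_kg": 2.0, "destination": "domestic", "express": False}
result = shipping_cost(**test_input)  # a single point in the input domain
```

The input domain of `shipping_cost` is the set of all such triples, which is why selecting a manageable subset of it is the central problem of test selection.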
The following questions capture the main issues that arise during
testing: Which tests should be selected? When is testing adequate, so
that testing can stop? How should the response to a test be evaluated?
The main barrier to testing is test selection. Once tests are selected,
automated tools make it convenient to rerun all tests after every change to
the program. See Chapter 9.
1.4.3 Black-Box and White-Box Testing
During test selection, we can either treat the software under test as a black
box or we can look inside the box at the source code. Testing that depends
only on the software’s interface is called black-box testing. Testing that is
based on knowledge of the source code is called white-box testing.
Typically, white-box testing is used for smaller units of software and
black-box testing is used for larger segments that are built up from the units.
Black-box testing is the only option if the source code is not available and all
we have is an executable version of the software. Chapter 9 has more
information on testing.
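The contrast between the two kinds of test selection can be sketched on a toy function; the triangle classifier and the specific tests are illustrative, not from the book.

```python
# Black-box vs. white-box test selection on a toy function.

def classify_triangle(a, b, c):
    """Return the kind of triangle with side lengths a, b, c."""
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"

# Black-box tests: chosen from the interface alone -- one test per kind
# of triangle named in the specification -- without reading the code.
black_box_tests = [
    ((3, 3, 3), "equilateral"),
    ((3, 3, 4), "isosceles"),
    ((3, 4, 5), "scalene"),
]

# White-box tests: chosen by reading the source, to exercise each branch
# of the `or` condition; (3, 4, 3) targets the `a == c` comparison that
# black-box selection happened to miss.
white_box_tests = black_box_tests + [
    ((4, 3, 3), "isosceles"),   # b == c branch
    ((3, 4, 3), "isosceles"),   # a == c branch
]

for sides, expected in white_box_tests:
    assert classify_triangle(*sides) == expected
```

The black-box suite passes without ever executing the `a == c` comparison, which is why white-box testing is typically applied to small units where reading the code is practical.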
1.5 Balancing Constraints: The Iron
Triangle
All projects balance competing priorities and constraints. The Iron Triangle
(also known as the Project Management Triangle) illustrates the typical
resource constraints faced by projects: scope, time, and cost; see Fig. 1.6.
Looking ahead, variants of the Iron Triangle will be used in Section 2.1.2 to
illustrate the contrasting priorities of traditional and agile methods.
With Therac-25, before the massive overdose in the East Texas Cancer
Center, there were reports of unexpected behavior from other sites. Each
time, the manufacturer focused narrowly on specific design issues, without
getting at the root cause of the problem or the overall safety of the medical
device.
Lesson Act in the public interest at every stage of a software project,
from requirements, to design, to coding and testing, even to maintenance.
The software needs to be safe not only when it is first delivered, but also
whenever it is modified. Get to the root cause of any problem.
Design Defensively
The product principle of the Code begins as follows:
Scope. Cost. Time. Pick any two! (You can’t simultaneously control
a project’s functionality, budget, and schedule, so pick any two to fix;
the other will vary.)
For history buffs, the 1968 NATO conference was a defining event
for the field of software engineering. The papers and discussions
from the conference make for fascinating reading [143]; see also the
report on the NATO conference in 1969 [40].
Exercises for Chapter 1
Exercise 1.1 Most definitions of software engineering boil down to the
application of engineering methods to software. For each of the following
definitions answer the questions:
Does the definition have any elements that are not covered by the
preceding questions? If so, what is it and how would you summarize
its role?
Exercise 1.3 For each of the following sixteen driving forces on software
projects, choose the “best fit” with the four key drivers in Fig. 1.1:
Customers, Processes, Products, and Constraints. The drivers are listed in
priority order. If a force seems to fit under more than one driver, choose the
earliest driver in the priority order. Briefly justify your grouping of each
force under a key driver. (The forces are in alphabetical order.)
Exercise 1.4 This exercise deals with the distinction between programs and
computations. The program is the following C code for rearranging the
elements of an array:
Consider the function call partition(0,4), where the relevant array elements
are as shown in the following list. A computation path through a program is
the sequence of actions that occur during an execution of the program. In
each case, what are the computation paths through the function body?
Design audit trails and error detection into the system from the start.
How would each of these practices have helped avoid the Therac-25
accidents? Provide two or three bullet items per practice.
2 Software Development Processes
Plan and Build Processes that plan and build are called plan-driven. They
include waterfall processes (Section 2.4) and V processes (Section 2.5).
Plan-driven processes are characterized by two phases:
Plan-driven processes are orderly and were a big improvement over the
ad hoc code-and-fix methods of the 1950s. Plan and build became the
dominant culture in the 1970s. That dominance is illustrated by the
following example.
Example 2.2 The NASA Space Shuttle team viewed plan and build
as being the “ideal.” But they could not use it.
A rigid plan-then-build approach does not allow for requirements
changes after the planning phase. This inability to handle changes is a major
limitation.
Iterate and Evolve Processes that iterate and evolve a product are called
iterative. As a rule, agile methods are iterative. Both Scrum and Extreme
Programming (XP) (Section 2.3) are iterative. Iterative processes evolve a
product as follows:
Refactor: “Don’t hesitate to throw away the clumsy parts and rebuild
them.”
The group valued working software and the refinement of tools through
early user feedback. Echoes of the Unix values resonate in agile values,
discussed in what follows. The idea of stringing tools in a pipeline leads
directly into dataflow networks (Section 7.4).
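The idea of stringing tools into a pipeline can be sketched as chained generators, each stage consuming one stream and producing another, much as Unix connects `grep` and `sort` with pipes; the stage names and the log data are illustrative.

```python
# A pipeline in the Unix spirit: each stage is a generator that consumes
# a stream of lines and yields a transformed stream, like `cmd1 | cmd2`.

def read_lines(text):
    for line in text.splitlines():
        yield line

def grep(pattern, lines):
    """Keep only lines containing the pattern (cf. Unix grep)."""
    for line in lines:
        if pattern in line:
            yield line

def to_upper(lines):
    for line in lines:
        yield line.upper()

log = "error: disk full\ninfo: started\nerror: timeout\n"

# Stages compose left to right, like a shell pipeline.
pipeline = to_upper(grep("error", read_lines(log)))
result = list(pipeline)
```

Because each stage knows only its input stream, stages can be rearranged or replaced independently, which is the property that dataflow networks (Section 7.4) generalize.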
Box 2.1 The Agile Manifesto
That is, while there is value in the items on the right, we value the
items on the left more. See https://agilemanifesto.org.
Let the term agile method apply to any software development method
that conforms with the values in the Agile Manifesto and the principles
behind it. Thus, agile methods are characterized by the following:
Example 2.3 The App team is a small well-knit team that specializes
in building custom smartphone apps. They know from experience
that small screens can lead to surprises when clients see a working
app on a smartphone. It can take a few iterations to converge on the
final product.
They choose how they do the work. They collaborate closely
with clients and expect requirements to change. They rely heavily on
working software. □
Example 2.4 The Corporate team is part of a supplier to large,
stable, highly regulated customers who want semiannual releases.
The customers run extensive tests in their own labs before putting
any software into production. The Corporate team has a dozen
people, including a product manager, who represents customer
interests, a project manager to organize the project, and developers
in different time zones.
Within their corporate approval and review structure, the
developers can choose their own practices. The product is complex
and the team is geographically distributed, so they have chosen to
carefully document design decisions, including what not to change.
The team values working software. While they deliver
semiannually to customers, they have their own internal mini-
deliveries of working software. The product manager knows the
customer’s business inside out and has good contacts in the customer
organization. Requirements are relatively stable, since big changes
tend to wait for the next release. Nevertheless, nondisclosure
(confidentiality) agreements allow the product manager to review the
team’s progress, to keep the project on track to meeting customer
needs. □
2.1.4 Selecting a Process Model
If we know a project’s challenges or risks, we can select a process model to
manage the risks. The decision tree in Fig. 2.3 makes process
recommendations based on two kinds of risk factors: the design complexity
of the product and the stability of the requirements. A third key factor,
product quality, is not included in the decision tree because high product
quality is desirable for all projects; see Section 2.5 for testing practices. For
additional project risks, see Section 2.6.
Figure 2.3 Decision tree for selecting a process model for a project. The
decisions in this tree are based on design complexity and requirements
stability.
Simple Design
With a simple design, there is no need for a significant up-front design
effort. If requirements might change, a team needs to use an agile (iterative)
method. Even if requirements are expected to be stable, unexpected events
could still result in changes.
With a simple design, the decision tree therefore recommends a
combination of Scrum and XP. As we shall see, the Scrum Framework lets
developers choose how they build a product. Extreme Programming
provides an approach to development. The combination of Scrum and XP is
discussed in Section 2.3.
Example 2.5 The App team from Example 2.3 has been retained by
an airline to build a flight-reservations app. The basic features of the
app have been outlined: it will enable passengers to choose a flight,
select seats, and pay by credit card or frequent-flier miles.
The team specializes in apps, so they already have a design in
mind. They therefore take the left branch at the root of the decision
tree (for simple designs). The decision tree takes them to using a
combination of Scrum and XP. This agile approach will be helpful
because the team members expect changes to the requirements. □
Figure 2.4 Elements of Scrum. The diagram shows the events during a
sprint and the artifacts related to planning and review.
“We’re good friends. We see each other every day. We’re all
equals. We don’t need roles and a [development] process.”
At the end of the first iteration, they chose roles: product owner,
scrum master, developers. Why? Because they were well behind
both the other teams and their own plans. □
Product Owner The product owner is responsible for the content of the
product. The owner must be a single person; ownership is not to be spread
across team members. In all team events, the product owner serves as the
voice of the customer and sets priorities for what the team implements.
Scrum Master The scrum master acts as coach, facilitator, and moderator,
responsible for arranging all events and for keeping them focused and on
time. The scrum master guides people by highlighting the purpose and
ground rules of an event, not by telling people how to do their jobs. An
additional responsibility is to remove any external impediments for the team.
Figure 2.5 A process diagram for Scrum. Boxes represent the activities
associated with the Scrum events.
Sprint Planning The purpose of sprint planning is to set a sprint goal and
to select a sprint backlog. The product owner represents customer needs and
priorities. The development team provides estimates for what it can
accomplish during the sprint. See Sections 4.2-4.4 for estimation and
prioritization of work items.
Sprint planning is time-limited to eight hours or less for a one-month
sprint. The attendees are the entire scrum team.
Sprint Goal The product owner proposes a goal for the functionality
to be implemented during the sprint. The owner leads the entire
scrum team in converging on the sprint goal. Once set, the sprint
goal may not be changed during the sprint.
Sprint Backlog Given the sprint goal, the development team has sole
responsibility for selecting product-backlog items to achieve the
goal. The developers are accountable for explaining how completion
of the selected items during the sprint will achieve the sprint goal.
During a sprint, the developers may renegotiate with the product
owner what they can accomplish.
These questions keep the whole development team informed about the
current status and work that remains to be done in the sprint. The scrum
master takes responsibility for addressing any impediments that are external
to the scrum team.
Daily scrums are also known as daily stand-ups because the attendees
sometimes stand (to encourage observance of the 15-minute time limit). The
15-minute time limit works for small teams. The process can be scaled to
larger teams by creating smaller subteams responsible for subsystems. The
daily scrum for the larger team is then a scrum of scrums, with
representatives from the subteams. The representatives are typically the
scrum masters of the subteams.
Product Increment As with any iterative process, the product evolves with
every sprint. The deliverable from a sprint is called a product increment. The
added functionality in the product increment is driven by the sprint goal.
The product increment is reviewed with stakeholders during the sprint
review. The intent is to have a potentially releasable product at the end of
each sprint.
2.2.5 Summary
Highlights Scrum takes an iterative approach to product
development; iterations are called sprints. Scrum’s focus is on
planning and review events to coordinate and structure development
work. Developers are free to choose how they implement the goal for
a sprint.
This advice and the boxes in the development cycle correspond to the
four main XP activities: listening, testing, coding, and design, respectively.
Kent Beck defined XP while he led a payroll project for Chrysler,
starting in 1996. The name comes from taking best practices to the extreme;
for example,
Before any code is written, the new tests should fail. Furthermore, the
tests should fail for the expected reasons. If, however, the tests pass, then
either the feature already exists, or the tests are too weak to find bugs.
Once there is an adequate set of tests, the idea is to make only the code
changes that are necessary; that is, to modify the software just enough so it
passes all tests.
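The test-first cycle described above can be sketched with plain asserts; the feature (a word-count helper) and its name are illustrative assumptions.

```python
# The test-first cycle in miniature.

# Step 1: write the test before the code. Run now, it fails for the
# expected reason -- word_count does not exist yet (NameError) -- not
# because of a wrong answer. If it passed, either the feature already
# existed or the test was too weak.
def test_word_count():
    assert word_count("to be or not to be") == 6
    assert word_count("") == 0

# Step 2: write just enough code to pass all tests; no extra features,
# no speculative generality.
def word_count(text):
    return len(text.split())

# Step 3: rerun the whole suite after the change, so regressions in
# existing behavior are caught immediately.
test_word_count()
```

Keeping the implementation minimal in step 2 is deliberate: any cleanup or restructuring belongs in the refactoring activity that follows, under the protection of the passing tests.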
Regression testing acts as a safety net during coding, since it runs all
tests to verify that the new code did not break some existing feature.
Coding is not the time to restructure the code to gracefully
accommodate the new functionality. Cleanup belongs in the next activity:
refactoring.
Wasted Effort Too much designing too early may lead to wasted
effort on planning for something that will not be needed.
Kent Beck’s Advice In the early days of agile development, some teams
deferred design until the last possible moment and ended up with “brittle
poorly designed systems.” They cited the acronym yagni, which comes from
“you aren’t going to need it.” Yagni came to mean don’t anticipate, don’t
implement any functionality until you need it.
However, yagni does not have to mean no design up front. On design,
Kent Beck’s advice to XP teams
is not to minimize design investment over the short run, but to keep the
design investment in proportion to the needs of the system so far. The
question is not whether or not to design, the question is when to
design.
Figure 2.7 Guidelines for designers. Product complexity pushes for early
design work. Requirements volatility (changeability) pushes for deferring
design work.
The top left box in the figure is for complex products with stable
requirements. With stable requirements, we can go ahead with early design
efforts to address product complexity. System properties, such as scale,
performance, reliability, security, and so on, have to be designed into a
system from the start. They are hard to retrofit. Products in this category can
be designed up front to meet performance, or reliability, or other goals. See
Chapter 7 for architectural patterns, which carry ideas from previous
solutions to recurring problems.
Proceeding clockwise, the top right box is for complex products with
changing requirements. Here, complexity and requirements are opposing
forces. Complexity pushes for early design, but early design efforts might be
wasted if requirements change. The recommendation in this case is to invest
some design effort up front, in a modifiable design, based on what we do
know. For example, a product with stringent reliability requirements is likely
to continue to have stringent reliability requirements. And, for reliability, we
would need primary and backup versions. A modular system isolates design
decisions in modules so they can be changed without affecting the rest of the
system; see Section 6.2. In short, for products in this category, the idea is to
do some up-front design and to isolate and defer design decisions that can be
deferred.
The bottom-right category is for simple products with changing
requirements. For a simple product, we can get started with minimal up-front
design effort. Since the product is simple, we can adapt the design readily to
keep up with changing requirements.
Finally, the bottom-left box is for simple products with stable
requirements. Stable requirements allow us to do design work whenever it is
most convenient. The recommendation is to defer design work until it is
needed, on the off chance of an unexpected change in requirements.
2.3.4 A Scrum+XP Hybrid
Scrum and XP have compatible values and complementary processes, so they
can be readily unified into a single agile method. Scrum concentrates on
time-boxed planning and review events; see Fig. 2.5 for the events and their
associated activities. Extreme Programming concentrates on development
practices.
The process diagram in Fig. 2.8 unifies Scrum and XP. Let us use
Scrum terminology for the unified process: iterations are sprints, user stories
are work items, and collections of user stories form a backlog. For
convenience let us use the terms event and Scrum activity interchangeably to
refer to an event and its activity.
1. Sprint Planning. Select work items for the sprint backlog of work
items for this sprint. This activity is formed by merging Scrum sprint
planning and XP iteration planning (the first activity in the development
cycle in Fig. 2.6).
2. Test Planning. Write tests based on the work items in the sprint
backlog. With this activity, the hybrid process switches to XP for
development. In the hybrid, the XP development activities incorporate
daily scrum events.
4. Design and Refactoring. Clean up the code base. After this activity,
the hybrid process switches to Scrum.
Figure 2.10 For severe defects, the later the fix, the greater the cost. (The
original diagram had a log scale for the relative cost.)
For large projects, the relative cost of fixing a severe defect rises by a
factor of 100 between the initial requirements and design phases and the
final phase where the system has been delivered and put into operation. The
cost jumps by a factor of 10 between requirements and coding and jumps by
another factor of 10 between coding and operation.
The dashed curve for smaller projects is much flatter: the cost of a fix
during operation is seven times the cost of a fix during the requirements
phase. For non-severe defects, the ratio may be 2:1 instead of the 100:1 ratio
for severe defects.
The following table summarizes the preceding discussion comparing
the cost of late fixes (during operation, after delivery) to the cost of early
fixes (during initial requirements and design):

Project size    Defect severity    Late-fix to early-fix cost ratio
Large           Severe             100:1
Small           Severe             7:1
Large           Non-severe         about 2:1
Figure 2.11 Variants of the waterfall model were used for releases
of the successful AT&T/Lucent 5ESS switching system. The rows
show releases between 1985 and 1996. The histograms display effort
levels.
Diagram by Harvey Siy. Used by permission.
The Releases (Projects) The data in the figure is for the time
period 1985–1996. The upper half of the figure is for releases I1–I15
for international markets. The lower half is for releases D1–D11 for
domestic US markets. Development of the releases overlapped. The
histograms in each row show the number of developers on that
release in that time period. Each release was carefully planned and
built using a waterfall variant.
Caveats Waterfall processes are very risky, for two reasons. First,
requirements often change while a product is being built to the
original plan. Second, the cost of fixing a defect rises exponentially
as a project progresses, so late-stage testing can result in costly
rework and project delays.
Waterfall projects typically took months rather than weeks. Some of the
time went into careful up-front planning and documentation. Some of it went
into extensive testing at the end to root out defects. Successful projects often
froze requirements for months, which led to long waits for new product
features. Successful projects also customized their processes to start testing
early, so they used a waterfall variant rather than a purely sequential process.
2.5 Levels of Design and Testing: V
Processes
By specifying what to test and when to test it, a process can provide a
development team with a testing strategy. All teams need a strategy because
testing is such an important part of software development. We know from
waterfall processes that big-bang testing at the end is a bad idea. The XP
strategy is to write tests before writing code to pass the tests. A process does
not tell us how to test; testing is covered in Chapter 9.
V processes introduce the concept of testing at various levels of
granularity, down to smaller and smaller parts, until we get to units of
implementation. Specifically, V processes introduce the concepts of unit,
functional, integration, system, and acceptance testing. These terms are now
part of the software engineering vocabulary. These concepts can be adopted
by any team using any of the processes in this chapter.
V processes are based on the development process for the SAGE air
defense system, built in the 1950s. SAGE (Semi-Automated Ground
Environment) was an ambitious distributed system that grew to 24 radar data
collection centers and three combat centers spread across the United
States.23
2.5.1 Overview of V Processes
A V process has levels of paired specification/design and testing activities.
The higher the level, the larger the software under design and test. The
number of levels varies from project to project. An example of a five-level V
process appears in Fig. 2.12.
Diagrams for V processes resemble the letter V. The design phases are
drawn going down and to the right. Coding and any testing that occurs
during coding are at the bottom. The testing phases are drawn going up and
to the right. The dashed arrows in Fig. 2.12 link a specification (design)
phase with its corresponding testing phase. Design and testing are thus
paired, like opening and closing parentheses, surrounding a coding phase.
At each level, the result of the design activity is a specification of what
to build and test at that level. The testing activity at each level checks that its
corresponding specification is implemented correctly.
Test planning can begin in the down part of the V, before coding begins.
In the SAGE project, test planning was done in parallel with coding. Test
execution must follow coding, so test execution is in the up part of the V.
Including relevant tests with a specification strengthens the specification.
V processes are essentially waterfall processes. Note that the solid
arrows in Fig. 2.12 trace the sequential flow from customer requirements to
acceptance testing. A specific V process may have different levels of
specification and testing.
2.5.2 Levels of Testing, from Unit to Acceptance
The upper levels of a V process focus on validation, and the lower levels
focus on verification, as described in what follows. Both validation and
verification refer to forms of error checking:
The pair of case studies in this section illustrates that a process reduces,
but does not eliminate, risks. The first project was very successful. The
second project, by the same team using the same process, failed to live up to
expectations. It faced many challenges, related to requirements, its code
base, new technology, and staff shortfalls.
Example 2.14 This example illustrates a process adjustment to
manage a potential project risk. For their complex product with
stable requirements, the Corporate team chose a plan-driven
approach in Example 2.6. In Example 2.7, the team added daily
scrums.
The addition of daily scrums is an adjustment to a plan-driven
process. Now, the team has identified performance as a potential risk
for the proposed design. The developers consider three options for
adjusting their process to manage the performance risk:
Risk factors such as the ones in Fig. 2.13 can be quantified and
compared by treating each risk as a cost: it is the cost of something
going wrong. For example, there is a potential cost associated with
missing a deadline or with having to rework a design. Let us refer to
something going wrong as an event. The risk of an event is its
probability times its cost:

risk exposure = (probability of the event) × (cost of the event)
2. For each risk factor, ask questions to classify the risk as high,
medium, or low. See the following for examples of questions.
Sample Questions
Here are some sample questions to help with a quick risk assessment:
How experienced is the team? If the code base has just been handed
to a green team, the team risk may be high. If the team has years of
experience with different projects, the team risk is low.
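The discussion above treats each risk as a cost. A common way to quantify this is risk exposure, the probability of an event times its cost; the sketch below uses that formula with invented events, probabilities, and dollar figures purely for illustration:

```python
# Hedged sketch: quantify risks as expected costs (probability x cost),
# then rank them. The events, probabilities, and costs are hypothetical.

def risk_exposure(probability, cost):
    """Risk exposure of an event = probability of the event times its cost."""
    return probability * cost

# Hypothetical risk factors for a project: (name, probability, cost in $K).
risks = [
    ("Missed deadline", 0.30, 200),
    ("Design rework", 0.10, 500),
    ("Key staff departure", 0.05, 800),
]

# Rank risk factors by exposure, highest first.
ranked = sorted(risks, key=lambda r: risk_exposure(r[1], r[2]), reverse=True)
for name, p, cost in ranked:
    print(f"{name}: exposure = {risk_exposure(p, cost):.0f}K")
```

Ranking by exposure, rather than by raw cost, is what lets a team compare a likely-but-cheap event against a rare-but-expensive one on the same scale.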
The takeaway from this example is that their iterative process enabled
the team to rapidly evolve a prototype into a successful product. The initial
requirements came from internal users within the company. The
requirements were updated based on user feedback and the competitive
threat from Microsoft. Beneath the surface, there was a lurking issue with
the code base. In the rush to market, it had not been adequately refactored to
clean up the code.
2.6.3 Netscape 4.0: A Troubled Project
Netscape 4.0 illustrates that there are limits to what a process can do to
accommodate requirements changes. Iterative and agile methods are not a
silver bullet. Besides requirements, projects can succumb to other
challenges related, for example, to design, technology, and staff shortages.
Example 2.16 The Netscape Communicator 4.0 project had the
same iterative process, the same team, and the same code base as the
3.0 project. But 4.0 faced multiple challenges.
Design and Technology The project had issues with the design of
the code for the product and with the tools to build the system.
Communicator 4.0 was built on the existing code base from
Navigator 3.0. The existing code base needed to be re-architected to
accept the new features, but the schedule did not permit a redesign.
As for tools, the team chose to use Java, so the same Java code
would run on Windows, MacOS, and Unix. Java was relatively new
at the time and did not provide the desired product performance.
(Since then, Java compilers have improved significantly.)
Staff Shortages With multiple platforms to support - Windows,
MacOS, Unix - the team did not have enough testers.
3. For the final cycle, one team was selected to build a viable
system.
Each of the competing teams was free to choose its own process
for a given cycle. The investment level increased with each cycle.
Example 2.19 provides more information about the project and the
benefits of an incremental spiral approach to awarding a large
contract. □
A Spiral Cycle
Each spiral cycle has two main parts, represented by the upper and lower
halves of Fig. 2.15. The upper half defines the problem to be addressed in
this cycle. The lower half develops a solution to the problem.
Values So, where do we start when we organize a new project? Start with
values. The purpose of a project is to build working software that will meet
customer needs upon delivery. This purpose motivates the following agile
values:
Collaborate with customers. After all, the project is for their benefit.
The conclusion is that agile values are a good starting point for defining
a set of values for a team.
For XP, see Beck’s early paper [16] or the updated account in the
book by Beck and Andres [17].
a) For a one-month sprint, what are the durations of the four Scrum
events?
d) Agile processes are great for when you are not sure of the details of
what to build.
Exercise 2.3 A usage survey identified the following as the top five agile
practices.32 For each practice, describe
how it is practiced.
a) Daily Standup
b) Retrospectives
c) Iteration Planning
d) Iteration Review
e) Short Iterations
Exercise 2.4 In practice, processes do not always run smoothly. For each of
the situations listed in this exercise, briefly answer the following questions:
a) The product owner challenges developer estimates for the time and
effort needed for work items. The product owner is convinced that the
developers are inflating their estimates.
b) It appears that, for lack of time, some essential product backlog
items will have to be dropped entirely from the project.
a) Waterfall processes
b) V processes
c) Iterative processes
a) Create a website.
Exercise 2.7 For each of the four quadrants in Fig. 2.18, classify the
following risks as low or high. The quadrants represent combinations of the
market and technology conditions for a software project. Briefly explain
your classification.
a) Requirements-related risk
b) Design-related risk
c) Quality-related risk
Here, New refers to a market or technology that is new to the world, and
Stable stands for well known to the world. There can be surprises with either
new technology or new markets, and there are no surprises left with either
stable technology or stable markets.
Exercise 2.8 For each of the quadrants in Fig. 2.18, how would you address
the requirements, design, and quality risks identified in Exercise 2.7 if you
are using
a) an agile process?
b) a plan-driven process?
3
User Requirements
◈
All software projects face the twin questions of what users want and what to
build to address those wants. These are related, but logically distinct
questions. Requirements elicitation is the process of discovering user wants
and writing them down as user requirements. Requirements analysis is the
process of prioritizing user requirements from various sources and defining
what to build as a product. In practice, elicitation and analysis are entwined:
the same user-developer conversation can touch on a user goal, its priority,
and acceptance tests for it.
This chapter deals with elicitation. It begins with an overview of
requirements and requirements development. It then explores the elicitation
and clarification of user needs and goals. Chapter 5 introduces use cases,
which describe user-system interactions to accomplish user goals. Chapter 4
provides techniques for requirements analysis.
This chapter will enable the reader to:
Elicit the needs and goals conveyed by users through their words,
actions, and sentiments.
The edges in the figure are labeled with requirements produced by the
activities. This chapter includes techniques for eliciting and clarifying user
requirements. Overviews of the various kinds of requirements appear later in
this section.
To elicit means to draw out a response. Requirements elicitation is the
identification of the various stakeholders and the discovery and description
of user motivations, goals, needs, and wants for a project. Section 3.3
discusses how to interact with users to uncover their needs.
During analysis, the output from elicitation is clarified, reconciled, and
prioritized. Analysis produces a definition of the desired behavior,
functionality, and attributes of a product. The product definition consists of a
coherent set of specific requirements. Section 3.6 introduces techniques for
clarifying requirements. Chapter 4 discusses prioritization.
To validate is to confirm with users that the right product is being built.
Agile methods validate working software at the end of each iteration. Plan-
driven methods validate detailed specification documents and high-level
designs. See Section 3.2 for the handling of requirements during software
development. Validation techniques include reviews and acceptance tests.
See Chapter 8 for software-architecture reviews.
In practice, the boundaries between the activities in Fig. 3.1 are fluid. A
single conversation with users can elicit a need and prompt a discussion
about its priority (for use during analysis). The same conversation can then
turn to possible acceptance tests for validating the need.
The activities have been teased apart in the basic cycle because each
has its own bag of tricks and techniques. For a small project, some of the
activities may be combined or skipped. For a large project, each activity can
be a process in its own right.
3.1.2 Case Study: Requirements Challenges
The three leading causes of requirements challenges are (a) multiple
stakeholders with differing and possibly conflicting goals, (b) incomplete or
inadequate requirements, and (c) changes over time. In the next example, the
developers cited changing requirements as their biggest challenge. The root
cause may, however, have been that the project prioritized technology over
user needs and goals.
Example 3.1 In May 2013, the British Broadcasting Corporation
(BBC) canceled its Digital Media Initiative, writing off an
investment of £98.4 million over six years.2 What went wrong? The
chief technology officer, who was subsequently dismissed, was
reported to have said,
Project Delays The project got off to a bad start. Funding was
approved in January 2008 and a vendor was selected in February, but
the contract to build a system was terminated by mutual agreement
in July 2009. A year and a half after it started, the project was back
to square one, with nothing to show.
The BBC then decided in September 2009 to bring the project
in-house. They gave the IT organization the responsibility for
building and delivering the system.
□
The BBC project had multiple requirements-related problems. The
project focused on technology standardization, not on users: its success
depended on a major shift in user behavior. The project began before
requirements were fully developed. Repeated delays meant that users turned
to alternative tools and practices. If there is a single lesson from such
projects, it is this: prioritize value for users and the business over
improvements in technology and infrastructure.
3.1.3 Kinds of Requirements
The rest of this section introduces the various kinds of requirements shown
along the edges in Fig. 3.1. The amount of documentation varies from
project to project. Agile methods work primarily with user requirements; for
example, with user stories. Plan-driven methods typically have detailed
documentation.
Business Requirements The vision and scope of a project lay out the
expectations for the value the project will provide for the customer. The
expectations are the starting point for the project; see the top left of Fig. 3.1.
Let us call the vision and scope the business goals or the business
requirements. Here are some common categories of business goals:
Increase revenues.
Reduce costs.
As a traveler,
Sprint reviews are repeated on Lines 4 and 5 to separate out two key
activities during a sprint review: (a) review the product increment (the
implemented sprint backlog) and (b) update the product backlog based on
user feedback. The review of the increment is a validation activity. The
update to the product backlog is an elicitation activity. Sprint retrospectives
are for process improvement, so there is no direct requirements activity.
With XP, the writing of user stories is an elicitation activity; see Section
3.4 for guidelines for writing user stories. The writing of tests for stories
supports validation. It is left as an exercise to the reader to create a
counterpart of Table 3.1 for XP.
3.2.2 Case Study: An Agile Emphasis on Requirements
The Fast Feedback process in Fig. 3.4 is based on Scenario-Focused
Engineering, which has been used within Microsoft.3 It highlights customer
interactions during software development. The balance tips from mostly
requirements development during early iterations to mostly software
development in later iterations. The Fast Feedback process complements
Scrum and XP. Its requirements activities can be used together with the team
events of Scrum and the development practices of XP.
Figure 3.4 The agile Fast Feedback process emphasizes the identification
and analysis of customer needs.
Example 3.6 Starting at the top left of Fig. 3.4, the four activities of
the Fast Feedback process correspond to elicitation, analysis, design,
and build/validation. The focus of elicitation is on unmet needs and
goals, not on product features and capabilities. The output from
elicitation is a set of stories that represent product opportunities. The
various opportunities could well pull the project in different
directions.
Moving right, the next activity is to analyze the stories (user
requirements) and frame or define a problem to be solved by a
product. Collectively, developers and users sift through and prioritize
the user requirements. Their task is to converge on the specific
problem to be addressed.
For the distinction between eliciting needs and framing the
problem, consider the need to listen to music on the go. This need
can be met in at least three distinct ways:
Correct The SRS must truly reflect user needs and must agree with
other relevant documentation about the project.
Articulated needs relate to what people say and think. They are what
people want us to hear. By listening, we can elicit articulated needs.
Observable needs relate to what people do and use. They are made
visible by actions and usage patterns. Through direct observations
and examination of usage data we can elicit observable needs.
Tacit needs are conscious needs that people cannot readily express.
They relate to what people feel and believe. By empathizing and
trading places - walking in someone else’s shoes - we can elicit tacit
needs.
Example 3.7 Disconnects between what people say and what they
do are not new. Faced with a string of product failures in the 1990s,
Scott Cook, the founder of Intuit, resolved that “for future new
product development, Intuit should rely on customer actions, not
words.”6 □
Example 3.8 Let us apply the model in Fig. 3.5 to the user behavior
of Netflix’s recommendation system for videos. The articulated user
need is for maximal choice and comprehensive search. Their usage
patterns, however, exhibit the opposite: an observable need for a few
compelling choices, simply presented. □
Example 3.9 The user’s articulated request was for an app that
would send a notification when the temperature in the Lunar
Greenhouse wandered outside a narrow range. The developers met
the request, and the story does not end there. Once the developers
learned the motivation for the request, they extended the app to meet
a latent need that made life easier for the user.
The Lunar Greenhouse is part of a study of what it would take
to grow vegetables on the moon or on a long space voyage.7 Water
will be a scarce resource. The temperature in the greenhouse was
therefore strictly controlled: warm enough to grow plants, but not
too warm, to limit water loss due to evaporation. Without
notifications, the user, a graduate student, had to regularly visit the
greenhouse to personally check growing conditions.
The development team created a notification app to meet the
articulated need. They got a feed from sensors in the greenhouse and
sent notifications to the user’s phone. The app eliminated regular
trips to the greenhouse for temperature checks.
There is more to the story. Every time the temperature went out
of range, the graduate student would jump on a bicycle to go to
check the greenhouse. Sometimes, all was well and the temperature
returned to normal, without intervention. The developers set up a
video surveillance camera in the greenhouse and enhanced the app to
provide a live video feed to the user’s phone, on demand. Now, the
graduate student needed to go to the Lunar Greenhouse only when
manual intervention was warranted.
In summary, the articulated user need was for a notification
system. The latent need for video monitoring emerged out of the
close collaboration between the user and the developers. The
developers discovered the latent need by relating to the user’s
motivations and frustrations. □
3.3.2 Accessing User Needs
When two people communicate, the receiver may not fully grasp or may
misinterpret the message from the sender. Thus, there can be communication
gaps when developers elicit user needs and write user requirements. The
purpose of the following general tips is to reduce gaps and inaccuracies in
user requirements. After the tips, we consider techniques for eliciting
articulated and observable needs.
Clarify and confirm that the written requirements capture the user’s
intent, to their satisfaction. To repeat, users must be satisfied with the
written user requirements.
The advice to developers is to hear the other person out before asking a
question or changing the direction of the conversation. Words and phrases
are important to people, so resist the urge to paraphrase or reword what users
say. Marketers use the term Voice of the Customer for a statement “in the
customer’s own words, of the benefit to be fulfilled by the product or
service.”9
Listening and Observing
The qualitative and quantitative techniques in Fig. 3.6 can be used for
eliciting articulated and observable needs. The left column in the figure is
for articulated needs. Interviews are qualitative; that is, interview findings
cannot be readily measured. Surveys, meanwhile, are quantitative, especially
if the questions are multiple choice or on a numeric scale. Observation of
customer behavior and analysis of usage logs are techniques for eliciting
observable needs; see the right column in Fig. 3.6. Observations are
qualitative; usage logs are quantitative.
Here, 〈role〉, 〈task〉, and 〈benefit〉 are placeholders for English phrases.
Any task a user wants to accomplish is equivalent to a user goal, as
discussed in Section 3.6. We therefore treat 〈task〉 as a user goal that may
require clarification.
What is a good user story? Above all, a story is about a user need; it is
not about the technology for addressing the need. Stories must therefore be
written so that users can understand them and relate to them. At the same
time, stories must be written so that developers can implement them and test
for them. It may take multiple conversations to outline and refine the initial
set of user stories for a project.
SMART Stories
Good user stories are SMART, where SMART stands for specific,
measurable, achievable, relevant, and time-bound.
If one of these criteria is not met, then the story has to be refined until it
is SMART. (SMART criteria can be applied to any goal; see Section 3.6.)
Example 3.11 The first draft of a user story may not meet all the
SMART criteria. Consider the following payroll story:
As a payroll manager
As a payroll manager
□
How Much Detail?
When is a story SMART enough that we can stop refining it? A more
fundamental question is: who decides whether a story is SMART enough?
The story’s developers make that decision. A story is SMART enough if the
developers can write tests for it (which implies that they can implement the
story).
As a payroll manager
The takeaway from the preceding example is that a story does not have
to include all the information needed to implement it. It needs to include
enough for developers to write tests for the story.
INVEST in Stories
The INVEST acronym provides a checklist for writing good user stories.
The acronym stands for Independent, Negotiable, Valuable, Estimable,
Small, and Testable.12
Acceptance Tests
User stories are accompanied by acceptance tests that characterize the
behavior to be implemented. Acceptance tests are part of the conversation
between users and developers about user wants.
The following template for writing acceptance tests addresses three
questions from Section 1.4:
Given 〈a precondition〉
then use the federal tax tables for singles to compute the tax
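One way an acceptance test in this given/when/then style can become executable is sketched below. The `compute_tax` function and its flat rates are hypothetical stand-ins, invented for illustration; a real payroll system would consult the actual federal tax tables:

```python
# Hedged sketch: an acceptance test in the given/when/then style for the
# payroll tax story. compute_tax and its flat rates are hypothetical
# stand-ins for the real federal tax tables.

def compute_tax(income, filing_status):
    """Hypothetical tax computation; a real one would use the tax tables."""
    rates = {"single": 0.10, "married": 0.08}  # invented flat rates
    return income * rates[filing_status]

def test_single_filer_uses_singles_table():
    # Given: an employee whose filing status is single
    income, status = 50_000, "single"
    # When: the payroll system computes the tax
    tax = compute_tax(income, status)
    # Then: the singles rate (here, the invented 10%) is applied
    assert tax == 5_000

test_single_filer_uses_singles_table()
```

The comments map each line of the test onto the precondition, action, and expected outcome, which keeps the test readable in the same terms as the conversation with the user.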
I: is implementation-free
R: is based on research
3.5.2 Case Study: A Medical Scenario
The scenario in the following example is prompted by a doctor’s request for
an app to provide reliable information for seriously ill children and their
parents. From the start, the doctor described the motivation for the request
(the current state) and her vision for an app (the desired state). On her own,
the doctor outlined a UX scenario during the first two or three meetings. She
did not have a list of features. She and her colleague were focused on the
overall experience for parents and kids. The desired outcome drove the
iterative development of the prototype for the children’s hospital.
Example 3.16 The narrative in this example has two paragraphs: one
for the introduction and situation, and another paragraph for the
outcome.16
Title: A Medical-Information App for Kids and their Parents
I: Insight The scenario includes key insights into who will use
the app and what the app must provide to support their desired
experiences. The primary users are parents, who want reliable
information. Then come children. The app needs to support a
separate kid-friendly experience. Where does the content come
from? If doctors add and edit content, they will need their own
interface. The outcome could be taken apart, sentence by
sentence, for follow-up conversations about requirements.
□
3.6 Clarifying User Goals
Conversations with users about their needs often begin with goals that
express intent, goals that need to be clarified and refined before they can
be implemented. For example, the conversation with two doctors about the
medical app in Example 3.16 began with
We want an app that provides seriously ill children and their parents
with reliable medical information.
This goal is soft, where soft means that it fails one or more of the
SMART criteria. Recall that SMART stands for specific, measurable,
achievable, relevant, and time-bound. These criteria apply broadly to goals,
requirements, questions, actions, and so on. SMART goals can be readily
implemented and tested.17
Soft goals arise naturally during requirements development. This
section phrases user needs and requirements as goals and asks clarifying
questions to refine soft goals into SMART subgoals and actions. In practice,
goal refinement is an iterative process. The refinement of the initial goal for
the medical app (presented at the beginning of this section) was spread
across two or three sessions with the doctors. The term “reliable
information” was refined into (a) content created by the doctors and their
colleagues, and (b) links to text, images, and videos approved by the doctors.
A single initial soft goal can be refined into multiple subgoals. To keep
track of subgoals that arise during requirements development, this section
introduces goal hierarchies, with the initial goal(s) at the top (root) of a
hierarchy. The children of a node represent subgoals.
3.6.1 Properties of Goals
Anything a user wants to accomplish can be viewed as a user goal. The want
in the user story
Each of these goals fails one or more of the SMART criteria. How
specifically does the payroll system need to be upgraded, and by when? How
do we identify a “trusted” driver? How much inventory do we need to keep
a popular item in stock? How is popular defined? As we shall see, simple
“How?” questions can be a powerful tool for clarifying user goals.
How Else questions tend to elicit alternatives. If the goal is clear from
the context, the simple forms are “How?” and “How else?”.
Quantification questions explore metrics and criteria for determining
whether an optimize goal has been accomplished. They are of two kinds:
How Much and How Many.
Example 3.22 Consider a brainstorming session with users about
their interest in listening to music on the go. The following
conversation illustrates the use of options questions:
How can we provide music while mobile? Download media to
a portable personal device.
How else can we provide music while mobile? Stream music to
a device over the network. □
An and node represents a goal G with subgoals G1, G2,..., Gk, where
every one of G1, G2,..., Gk must be satisfied for goal G to be satisfied.
For example, the goal in Fig. 3.8(a) of beating a competitor is
satisfied only if both of the following are satisfied: release the
product by August, and deliver a superior product.
Figure 3.7 A goal hierarchy for the lunar greenhouse, Example 3.9.
Figure 3.8 Examples of and and or nodes in a goal hierarchy.
(b) Or node
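The and/or semantics can be sketched as a small recursive check over a goal hierarchy: an *and* node is satisfied only if all its subgoals are, while an *or* node is satisfied if at least one is. The node encoding and the example goals below are illustrative, not taken from the book's figures:

```python
# Hedged sketch of and/or nodes in a goal hierarchy. A leaf is either
# achievable or not; an "and" node needs all subgoals satisfied, an "or"
# node needs at least one. The example goals are illustrative.

def satisfied(node):
    kind, value = node
    if kind == "leaf":
        return value  # True if the leaf goal is achievable
    results = [satisfied(child) for child in value]
    return all(results) if kind == "and" else any(results)

beat_competitor = ("and", [
    ("leaf", True),       # Release product by August
    ("or", [              # Deliver a superior product, one way or another
        ("leaf", True),   # Run on multiple platforms
        ("leaf", False),  # Match competitor feature-for-feature
    ]),
])

print(satisfied(beat_competitor))  # → True
```

The same bottom-up evaluation underlies marking leaves as possible or impossible and propagating the marks up the hierarchy.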
Figure 3.9 A goal hierarchy. See Fig. 3.10 for the contributing and
conflicting goals in this hierarchy.
There is only one goal in the hierarchy with more than one
subgoal. It is the top-level goal of delivering a superior product. This
goal has an and node, so, for it to be satisfied, all three of its
subgoals must be satisfied. Netscape’s strategy for a superior
browser was to run on multiple operating systems. Microsoft’s
browser ran only on Windows, at the time. Netscape also closely
monitored beta releases from Microsoft to ensure that its browser
would have competitive features.
The graph in Fig. 3.10 shows the contributes and conflicts
relations for the hierarchy in Fig. 3.9. Since subgoals contribute to
higher goals, there are directed edges from subgoals (child nodes) to
higher goals (parent nodes). As an example, the goal at the top right,
“Deliver superior product,” has three edges coming into it. Two of
the edges are for strong contributions, so they are shown as solid
edges. The third edge is for a weak contribution from “Monitor
competitor’s betas,” so it is dashed.
Figure 3.10 Contributing and conflicting goals for the hierarchy in
Fig. 3.9.
The goal hierarchies in this section illustrate how soft high-level goals
can be refined into more specific subgoals. The hierarchies grow down from
higher goals to subgoals. This process of goal refinement can stop when
SMART goals are reached. Here, SMART means that developers know how
to implement them.
Example 3.25 Consider the following refinement of “Run on
multiple platforms”:
1. At the Leaves Mark each leaf p if that leaf is possible; that is, the
goal at the leaf is achievable. Otherwise, mark it i for impossible.
Walk into the data center and get physical access to the system.
Multiple Goals Users do not speak with one voice; that is, they can
have differing goals and priorities. It may not be possible to
simultaneously satisfy them all.
Changes over Time User wants can change over time, as new needs
emerge, and there are changes in the users’ business and the
marketplace.
For modeling security attacks and how to defend against them, see
Schneier [165].
Exercise 3.2 The two diagrams in Fig. 3.13 are for agile software-
development processes. The Fast Feedback Process (left) highlights
requirements-related activities. Assume that the output of “Identify Needs”
is a set of user stories. XP (right) highlights implementation-related
activities.
Draw a diagram for a unified process that combines the two agile processes.
Your diagram must highlight both requirements-related and implementation-
related activities. Avoid duplication of activities in the unified process.
a) Articulated needs.
b) Observable needs.
c) Tacit needs.
d) Latent needs.
The examples need not be from the same software project. For each
example, address the following questions: What is the underlying need in the
example? How is the need exhibited? How is it accessed, in terms of
listening, observing, and/or empathizing?
Exercise 3.4 Write user stories for the pool-service application in Example
1.7. Include stories from both a technician’s and a manager’s perspective.
Exercise 3.5 Use the pseudo-English syntax in Table 3.2 to write a set of
features for the pool-service application in Example 1.7. Is the set of features
complete? Briefly justify your answer.
Exercise 3.7 Write a UX scenario based on the needs of the user in Example
3.9; the example deals with the Lunar Greenhouse. Give the user a name and
use your imagination to fill in details, so your scenario conforms with the
SPICIER guidelines.
Exercise 3.8 Write a UX scenario for a frequent traveler who wants an app
to help deal with busy airports, flight delays, gate changes, tight connections,
and the like. Put yourself in the shoes of the frequent traveler. What are your
needs and desired end-to-end experience? What is your emotional state?
Include details, in line with the SPICIER guidelines.
Exercise 3.9 Write four user stories based on the following scenario. Let
these be the highest priority user stories, based on perceived business value
for the client.
Your team is doing a project for a nationwide insurance company that
prides itself on its personalized customer service. Each customer has a
designated insurance agent, who is familiar with the customer’s situation and
preferences. The company has engaged you to create an application that will
route customer phone calls and text messages to the designated agent.
There may be times when the designated agent is not available. If the caller still needs to speak to an agent (say, to report an accident), then, as a backup, the automated application must offer to connect the caller with
another agent at the local branch (preferred) or at the regional support center,
which is staffed 24 hours a day, 7 days a week. The regional center will be
able to help the caller because the application will simultaneously send both
the phone call and the relevant customer information to the regional agent’s
computer.
At any choice point, callers will be able to choose to leave a voice
message or request a callback.
Exercise 3.10 The goals for the San Francisco Bay Area Rapid Transit
System (BART) included the following:21
Minimize costs.
Apply goal analysis to refine these top-level goals into SMART goals.
a) Show the questions that you use for refining goals into subgoals.
Briefly explain the rationale for each question.
Once user needs and goals are recorded as user requirements, the emphasis shifts
to requirements analysis. User requirements, even specific and measurable ones,
correspond to wish lists from the various stakeholders. Requirements analysis
prioritizes these wish lists to define precisely what to build as a product; see Fig.
4.1. To prioritize, we must answer three questions. First, what properties of the
product will prioritization be based on? Examples of properties include not only
cost and functionality, but usefulness, usability, and desirability. Second, how
will the properties be quantified? Quantification involves classification into
ranked categories; for example, must-have, should-have, could-have, won’t
have. Third, how do we rank order requirements based on a combination of
properties, such as benefit, cost, and perhaps risk?
Figure 4.1 A view of requirements development. Clarification and refinement
of user needs and goals into SMART user goals begins during elicitation and
continues into requirements analysis. For simplicity, this figure shows only
product requirements.
Example 4.1 Let us apply the useful, usable, desirable criteria to the
initial and redesigned versions of the speech-therapy app of Example
1.3. The app had two main stakeholders: parents and their children.
Parents bought the initial version because they felt it would be useful for
their children. But, they were frustrated because they found it too hard to
use. Thus, for parents, the app was desirable and useful, but not usable.
Children, meanwhile, had no trouble using the initial version but were
bored because it was like a lesson. Thus, for children, the initial version
was usable but neither desirable nor useful (for entertaining them).
The makers of the product found a creative way of redesigning it to
remove usability as a pain point for parents and to make the app
desirable and useful for children. They simplified the user interface for
parents and made the app more like a game for children.
□
Checklist Questions
The following list of questions is offered as a starting point for creating a
checklist for a project.
The end-to-end experience provides the context for the stakeholder’s use of
the product. The question can expose any gaps between the proposed product
and the stakeholder’s desired role for the product in the overall experience. User-
experience (UX) scenarios are designed to expose such gaps; see Section 3.5.
A desirable product is one that customers really want and consider well
worth the price that they paid for it. “Worth what paid for” is a measure of
customer satisfaction. Desirable therefore touches on both aesthetic value and
value as in cost/benefit.
Useful, Usable, and Desirable Are Independent
Useful, usable, and desirable are shown at vertices of a triangle in Fig. 4.2 to
emphasize that they relate to the same product but are independent properties:
A product can be useful and desirable, but not usable; for example, a
product with a complex interface that is hard to use. Apple is known for
its great products; however, after one of its product announcements, the
news headline was “Innovative, but Uninviting.” Why? Because “using
the features is not always easy.”2
A product can be usable and desirable, but not useful; for example,
fashionable apps that are downloaded, but rarely used.
A product can be useful and usable, but not desirable; for example, an
application that gets treated as a commodity or is not purchased at all.
Section 4.5 explores useful features that are taken for granted if they are
implemented, but cause dissatisfaction if they do not live up to
expectations.
Figure 4.2 Key product attributes. Great products address needs, wants, and
ease of use.
4.2 Relative Estimation: Iteration Planning
Agile iteration planning is a form of incremental requirements analysis, in which
the highest-priority user stories are selected for implementation during that
iteration. Lower-priority requirements remain on the wish list and may
eventually be dropped if time runs out. Iteration planning involves both
prioritization and estimation: prioritization to rank order and select user stories,
and estimation to determine how much the development team can implement
during an iteration.
To prioritize a set of items is to rank order or linearly order the items,
without needing to quantify them. To estimate is to quantify some property of an
item; for example, the development effort for implementing a user story. People
tend to be better at prioritizing (comparing) than they are at estimating
(measuring).
This section begins with the phenomenon of anchoring, which can lead to
biased estimates. Avoid anchoring! The section then considers the estimation of
development effort during iterative planning.
4.2.1 Anchoring Can Bias Decisions
Cognitive bias is the human tendency to make systematic errors in judgment
under uncertainty. Anchoring occurs when people make estimates by adjusting a
starting value, which is called an anchor value. The anchor value introduces
cognitive bias because new estimates tend to be skewed toward it (the anchor
value).3
In order to get unbiased estimates from developers, avoid telling them
about customer or management expectations about effort, schedule, or budgets.
Example 4.2 A single number buried in pages of documentation was enough to
anchor estimates in a case study by Jorge Aranda and Steve Easterbrook.
The participants were asked to estimate the time it would take to deliver
a software application.4 Each participant was given a 10-page
requirements document and a three-page “project-setting” document.
The project-setting document had two kinds of information: (1) a brief
description of the client organization, including quotes from interviews;
and (2) background about the development team that would implement
the application, including the skills, experience, and culture of the
developers.
The 23 participants were divided into three groups. The only
difference between the instructions given to the three groups was a quote
on the second page of the project-setting document, supposedly from a
middle manager; see Table 4.1.
Figure 4.3 Connextra’s index card template for a user story. The card includes
fields for a user-assigned priority for benefit and a developer estimate of the
work effort. Connextra was a London startup that, sadly, did not make it.
Story-point estimates of size and effort follow from the observation that
people are better at prioritization than estimation; that is, at predicting relative,
rather than absolute, magnitude. Given work items A and B, it is easier to
estimate whether A requires more or less effort than B, or whether A is simpler
or more complex than B. It is harder to estimate the code size of an
implementation of either A or B.
Point Values 1, 2, 3
The simplest scale for story points has point values 1, 2, and 3, corresponding to
easy, medium, and hard. For example, the developers might assign points as
follows:
1 point for a story that the team knows how to do and could do quickly
(where the team defines quickly).
2 points for a story that the team knows how to do, but the
implementation of the story would take some work.
3 points for a story that the team would need to figure out how to
implement. Hard or 3-point stories are candidates for splitting into
simpler stories.
Example 4.3 A development team assigns 1 point to the following story
because they can readily extract and count words from a document, even
if they have to strip out LaTeX formatting commands:
The team assigns 2 points to the following story because they know
how to implement it, but it will take some work to hook up with the
sensor feed from the greenhouse and send notifications:
The team assigns 3 points to the following story because they are
not sure how to do color matching of images; for example, floral prints
and geometric designs.
Example 4.4 In Fig. 4.4, the team’s estimated velocity was 17, but it
only completed 12 points worth of stories in an iteration. For the next
iteration, the team can adjust its estimated velocity based on its actual
velocity from recent iterations. One possible approach is to use the
average velocity for the past few, say three, iterations. □
Figure 4.4 Estimated and actual velocity for an iteration. The team planned
17 points worth of work, but completed only 12 points worth.
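The adjustment described in Example 4.4 can be sketched in a few lines. This is an illustrative sketch, not code from the book; the function name and the rolling window of three iterations are assumptions suggested by the example.

```python
# Sketch: adjust a team's estimated velocity using the average of its
# recent actual velocities (a rolling window of three iterations, as
# suggested in Example 4.4). Names and window size are illustrative.

def adjusted_velocity(actual_velocities, window=3):
    """Average the actual velocities of the last `window` iterations."""
    recent = actual_velocities[-window:]
    return sum(recent) / len(recent)

# A team whose last three iterations completed 15, 14, and 12 points
# would plan roughly 13-14 points for the next iteration.
print(adjusted_velocity([17, 15, 14, 12]))  # averages 15, 14, and 12
```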
4.3 Structured Group Consensus Estimates
Unbiased consensus estimates from groups can be more accurate than individual
estimates from members of a group. This observation dates back to Aristotle,
who noted that, together, many perspectives from ordinary people were very
likely to be better than those from a few experts.5 The group estimation
techniques in this section are variations on the theme of collecting rounds of
independent estimates until consensus is reached. At the end of each round,
group members get some information about the estimates from other group
members.
Wideband Delphi
For software projects, group discussion can lead to valuable insights that can be
useful during design, coding, and testing. Any concerns that surface during the
discussions can be recorded and addressed later. Pitfalls can hopefully be
avoided. Such insights are missed if group members are kept apart to soften
cognitive bias. Perhaps the benefits of group discussion outweigh the risks of
bias.
The Wideband Delphi method, outlined in Fig. 4.5, combines rounds of
private individual estimates with group discussion between rounds. The
members have a chance to explain the reasoning behind their estimates. The
method has been used successfully for both up-front and agile planning.8
Planning Poker
A variant of the Wideband Delphi method, called Planning Poker, is used for
estimation during agile iteration planning. In Planning Poker, participants are
given cards marked with Fibonacci story points 1, 2, 3, 5, 8, .... Each developer
independently and privately picks a card, representing their estimate for the user
story under discussion. All developers then reveal their cards simultaneously. If
the individual estimates are close to each other, consensus has been reached.
More likely, there will be some high cards and some low cards, with the others
in the middle.
The discussion begins with the developers with the high and low cards. Do
they know something that the others don’t? Or, are they missing something? The
only way to find out is through group discussion. The process then repeats for
further rounds of individual card selection and group discussion until consensus
is reached.
A moderator captures key comments from the group discussion of a story,
so that the comments can be addressed during implementation and testing.9
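A round of Planning Poker can be sketched as follows. The consensus rule used here (all revealed cards equal or adjacent on the Fibonacci scale) is an assumption for illustration; teams define their own threshold for "close enough."

```python
# Sketch of one Planning Poker round: everyone reveals a Fibonacci
# card; consensus is declared when the estimates are close. The rule
# "equal or adjacent on the scale" is an illustrative assumption.

FIB = [1, 2, 3, 5, 8, 13, 21]

def consensus(cards):
    """True if all revealed cards are equal or adjacent on the scale."""
    lo, hi = min(cards), max(cards)
    return FIB.index(hi) - FIB.index(lo) <= 1

round1 = [2, 8, 3, 3, 5]   # the high and low cards trigger discussion
round2 = [3, 5, 3, 3, 5]   # after discussion, estimates converge
print(consensus(round1), consensus(round2))  # False True
```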
4.3.2 The Original Delphi Method
With the original Delphi method, the group members were kept apart and
anonymous. Instead of direct contact with each other, they were provided with
feedback about where their estimate stood, relative to the others. Consensus was
achieved by having several rounds of estimates and feedback.
Example 4.7 The data in Fig. 4.6 is for four rounds of forecasts by a
group of five experts. The same data appears in tabular and graphical
form.
The first round forecasts range from a low of 125 to a high of
1,000. The median forecast for the first round is 200. In the second
round, the range of forecasts narrows to 158-525. The ranges for the
third and fourth rounds are close: 166-332 and 167-349, respectively.
Note that the group is converging on a range, not on a single
forecast. With forecasts, it is not unusual for experts to have differences
of opinion. □
Figure 4.6 An application of the Delphi method with five experts. The same
data appears in both tabular and graphical form.
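The convergence in Example 4.7 can be tracked by computing each round's range and median. The lows, highs, and first-round median below follow the example; the middle values are invented for illustration and are not the book's data.

```python
# Sketch: summarize Delphi rounds by range and median to watch the
# group converge. Endpoints match Example 4.7; the three middle
# forecasts in each round are made-up illustrative values.

from statistics import median

rounds = [
    [125, 150, 200, 400, 1000],
    [158, 180, 200, 350, 525],
    [166, 190, 210, 300, 332],
    [167, 195, 215, 310, 349],
]

for i, forecasts in enumerate(rounds, start=1):
    lo, hi = min(forecasts), max(forecasts)
    print(f"Round {i}: range {lo}-{hi}, median {median(forecasts)}")
```

Note that the ranges narrow from round to round but do not collapse to a single value, matching the observation that the group converges on a range, not a single forecast.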
of three values:
Figure 4.7 Prioritize first by value and cost and then by value and risk.
a) Prioritize first by value and cost, as summarized in Fig. 4.7(a). The four
categories in the figure are high-value, low-cost; high-value, high-cost;
low-value, low-cost; and low-value, high-cost.
Low risk if requirements are stable and the technology is known to the
team.
Medium risk is the default if either the requirements are close to but not
yet stable or the technology is not known to the team but is known to the
world.
The risk may be lowered to Low for an experienced team if either the
requirements risk is low (they are stable) or the technology risk is low (it
is known to the team).
High risk is the default if either the requirements are far from stable or
the technology is new to the world.
Figure 4.8 Combined risk based on two factors: requirements stability and
technology familiarity. Risk increases as requirements become less stable and
as the team becomes less familiar with the technology. In each box, the
default is to go with the higher risk, unless the team has special knowledge
that lowers the risk.
The risk may be lowered to Medium for a team with special skills and
experience, if either the requirements risk is low (they are stable) or the
technology risk is low (it is known to the team).
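The defaults in Fig. 4.8 amount to taking the higher of the two individual risks. The sketch below encodes that rule; the category names are illustrative and may differ from the figure's labels, and the team-specific lowering of risk is left out.

```python
# Sketch of the combined-risk defaults in Fig. 4.8: the default in
# each box is the higher of the requirements risk and the technology
# risk. Category names are illustrative assumptions.

REQ_RISK = {"stable": 1, "close": 2, "unstable": 3}   # requirements stability
TECH_RISK = {"team": 1, "world": 2, "new": 3}          # who knows the technology
LEVELS = {1: "Low", 2: "Medium", 3: "High"}

def default_risk(requirements, technology):
    """Default to the higher of the two individual risks."""
    return LEVELS[max(REQ_RISK[requirements], TECH_RISK[technology])]

print(default_risk("stable", "team"))     # Low
print(default_risk("close", "team"))      # Medium
print(default_risk("unstable", "world"))  # High
```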
4.5 Customer Satisfiers and Dissatisfiers
What makes one product or feature desirable and another taken for granted until
it malfunctions? An analysis of satisfiers and dissatisfiers is helpful for
answering this question. The classification of features in this section can also be
used for prioritization. Features are classified into the following categories: key,
attractors, expected, neutral, and reverse. Key features correspond roughly to
must-haves, expected to should-haves, and reverse to won’t-haves. Attractor
features are the ones to watch. They can surface latent needs - needs that users
did not know that they had. Attractors can be the differentiator that turns a
product into a best seller.
4.5.1 Kano Analysis
Noriaki Kano and his colleagues carried the distinction between job satisfiers
and dissatisfiers over to customer satisfiers and dissatisfiers.13 Kano analysis
assesses the significance of product features by considering the effect on
customer satisfaction of (a) building a feature and (b) not building the feature.
Kano analysis has been applied to user stories and work items in software
development.14
Paired Questions
Kano et al. used questionnaires with paired positive and negative questions
about product features. The paired questions were of the following form:
With each form, positive and negative, they offered five options:
a) I’d like it
b) I’d expect it
c) I’m neutral
d) I can accept it
e) I’d dislike it
Box 4.2 Job Satisfiers and Dissatisfiers
The factors that lead to job satisfaction are different from the factors that
lead to job dissatisfaction, as Frederick Herzberg and his colleagues
discovered in the 1950s:15
Job satisfaction is tied to the work, to what people do: “job content,
achievement on a task, recognition for task achievement, the nature of
the task, responsibility for a task and professional advancement.”
a) I’d be satisfied
b) I’m neutral
c) I’d be dissatisfied
The rows in the nine-box grid correspond to the cases where a feature is
built. The columns are for the cases where the feature is not built. Note the
ordering of the rows and columns: with rows, satisfaction decreases from top to
bottom; with columns, satisfaction decreases from left to right.
Key Features
The top-right box in Fig. 4.9 is for the case where customers are satisfied if the
feature is built and would be dissatisfied if it is not built. With key features, the
more the better, subject to the project’s schedule and budget constraints. For
example, with time-boxed iterations, a constraint would be the number of
features that the team can build in an iteration.
Reverse Features
The box to the bottom left is for the case where customers would be dissatisfied
if the feature is built and satisfied if it is not built. The two adjacent boxes are
also marked Reverse. With reverse features, the fewer that are built, the better.
Attractor Features
The box in the middle of the top row in Fig. 4.9 is for the case where customers
would be satisfied if the feature is built and neutral if it is not. This box
represents features that can differentiate a product from its competition. Features
that address latent needs are likely to show up as attractors.
Example 4.12 Mobile phones with web and email access were a novelty
around the year 2000. Kano analysis in Japan revealed that young people,
including students, were very enthusiastic about the inclusion of the new
features, but were neutral about their exclusion. These features were
therefore attractors.
Indeed, such phones with web and email access rapidly gained
popularity. □
Expected Features
The box in the middle of the right column in Fig. 4.9 is for the case where
customers would be neutral if the feature is built and dissatisfied if it is not built.
Features customers take for granted fit in this box. For example, if a feature does
not meet the desired performance threshold - that is, if performance is not built
in - customers would be dissatisfied. But if performance is built in, then
customers may not even notice it.
Indifferent Features
The middle box in Fig. 4.9 is for the case where customers would be neither
satisfied nor dissatisfied if the feature were built. Features that customers
consider unimportant would also fit in this box.
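The nine-box grid described in this section can be written as a lookup from the pair of customer reactions to a feature category. This is an illustrative sketch; the "Questionable" label for the two contradictory corners (satisfied either way, or dissatisfied either way) is my assumption, not a label from the book's figure.

```python
# Sketch of the nine-box Kano classification. Inputs are the
# customer's reaction if the feature is built and if it is not built.
# "Questionable" for the two contradictory corners is an assumption.

def kano_class(if_built, if_not_built):
    grid = {
        ("satisfied", "dissatisfied"): "Key",
        ("satisfied", "neutral"): "Attractor",
        ("neutral", "dissatisfied"): "Expected",
        ("neutral", "neutral"): "Indifferent",
        ("dissatisfied", "satisfied"): "Reverse",
        ("dissatisfied", "neutral"): "Reverse",
        ("neutral", "satisfied"): "Reverse",
        ("satisfied", "satisfied"): "Questionable",        # assumption
        ("dissatisfied", "dissatisfied"): "Questionable",  # assumption
    }
    return grid[(if_built, if_not_built)]

print(kano_class("satisfied", "neutral"))     # Attractor
print(kano_class("neutral", "dissatisfied"))  # Expected
```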
4.5.3 Life Cycles of Attractiveness
The attractiveness of features can potentially vary over time. The progression in
Fig. 4.10 shows a possible trajectory: from indifferent to attractor to key to
expected. The table in the figure is from the top right of the nine-box grid in Fig.
4.9.
Figure 4.10 Over time, a feature can go from being indifferent to attractor to
key to expected.
Example 4.13 Consider again mobile phones with features for web and
email access. From Example 4.12, these features were attractors when
they were introduced in Japan. Web and email access soon became key
features that everyone wanted. Now, web and email access are expected:
they are taken for granted since practically every phone has them. □
The concept of attractors carries over from the binary “built or not” case to
degrees of sufficiency. For attractors, customers start out neutral if the degree of
the feature is insufficient. As the degree of sufficiency increases, customer
satisfaction increases from neutral to satisfied. Conversely, for expected features,
customers start out dissatisfied. As the feature’s sufficiency increases, customer
dissatisfaction decreases, eventually reaching neutral.
Example 4.14 Response time is an expected feature that is measured on
a sliding scale. Let us measure it in milliseconds. Users are dissatisfied if
it takes too long for an application to respond to a user action. In other
words, users are dissatisfied if response time is insufficient.
As response time improves and becomes fast enough, user
dissatisfaction goes away. In other words, as response time becomes
sufficient, users become neutral. □
4.6 Plan-Driven Estimation Models
A software project has a cost and schedule overrun if the initial development
effort estimates are too low. The state of the art of effort estimation can be
summarized as follows:18
Historical data about similar past projects is a good predictor for current
projects.
The diagram in Fig. 4.11 has been dubbed the Cone of Uncertainty, after
the shape enclosed by the solid lines. Similar diagrams were in use in the 1950s
for cost estimation for chemical manufacturing.19
4.6.1 How Are Size and Effort Related?
The “cost” of a work item can be represented by either the estimated program
size or the estimated development effort for the item. Size and effort are related,
but they are not the same. As size increases, effort increases, but by how much?
The challenge is that, for programs of the same size, development effort can
vary widely, depending on factors such as team productivity, problem domain,
and system architecture. Team productivity can vary by an order of magnitude.20
Critical applications require more effort than casual ones. Effort increases
gradually with a loosely coupled architecture; it rises sharply with tight
coupling.
Project managers can compensate for some of these factors, further
complicating the relationship between size and effort. Experienced project
managers can address productivity variations when they form teams; say, by
pairing a novice developer with someone more skilled. Estimation can be
improved by relying on past data from the same problem domain; for example,
smartphone apps can be compared with smartphone apps, embedded systems
with embedded systems, and so on. Design guidelines and reviews can lead to
cleaner architectures, where effort scales gracefully with size.
The relationship between size and effort is therefore context dependent.
Within a given context, however, historical data can be used to make helpful
predictions.
For large projects or with longer planning horizons, it is better to work with
size. Iteration planning, with its short 1-4 week planning horizons, is often based
on effort estimates. The discussion in this section is in terms of effort - the same
estimation techniques work for both size and effort.
Figure 4.12 How does development effort E grow with program size S? The
three curves are for different values of the constant b in the equation E = aS^b,
where a is also a constant.
The three curves in Fig. 4.12 were obtained by picking suitable values for
the constants a and b in the equation
E = aS^b        (4.1)
Case b > 1 (upper curve). The more usual case is when b > 1 and larger
projects become increasingly harder, either because of the increased need
for team communication or because of increased interaction between
modules as size increases. In other words, the rate of growth of effort
accelerates with size.
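The effect of the exponent b can be seen numerically. The sketch below compares a tenfold increase in size under three values of b; the constants are illustrative only and are not fitted to any real data.

```python
# Sketch: how effort grows with size for the three cases of b in
# E = a * S**b (Equation 4.1). The constants here are illustrative,
# not calibrated values from any model or data set.

def effort(size_kloc, a=3.0, b=1.2):
    return a * size_kloc ** b

for b in (0.9, 1.0, 1.2):  # economy of scale, linear, diseconomy
    ratio = effort(100, b=b) / effort(10, b=b)
    print(f"b={b}: 10x the size costs {ratio:.1f}x the effort")
```

With b > 1, a tenfold increase in size costs more than ten times the effort, which is the "rate of growth of effort accelerates with size" case described above.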
4.6.2 The Cocomo Family of Estimation Models
Equation 4.1, expressing effort as a function of size, is from a model called
Cocomo-81. The name Cocomo comes from Constructive Cost Model. The
basic Cocomo model, introduced in 1981, is called Cocomo-81 to distinguish it
from later models in the Cocomo suite.21
For a given project, the constants a and b in Equation 4.1 are estimated
from historical data about similar projects. IBM data from waterfall projects in
the 1970s fits the following (E is effort in staff-months and S is in thousands of
lines of code):22
(4.2)
Meanwhile, TRW data from waterfall projects fits the following (the three
equations are for three classes of systems):
(4.3)
The constants in (4.3) can be adjusted to account for factors such as task
complexity and team productivity. For example, for a complex task, the
estimated effort might be increased by 25 percent. (The actual percentage
depends on historical data about similar projects by the same team.) Such
adjustments can be handled by picking suitable values for the constants a and b
in the general equation (4.1).
(4.4)
Constants a, b, and c are based on past data about similar projects. Factors
like team productivity, problem complexity, desired reliability, and tool usage
are built into the choice of constants a, b, and c.
New estimation models continue to be explored. As software engineering
evolves, the existing models lose their predictive power.23 Existing models are
designed to fit historical data, and the purpose of advances in software
development is to improve upon (disrupt) the historical relationship between
development effort, program size, and required functionality.
4.7 Conclusion
Requirements analysis is at the interface between the requirements and the
design steps of a project. It shapes user needs and goals into a consistent
prioritized set of functional requirements and quality attributes for a product.
Rough prioritization suffices with agile methods, since each iteration focuses on
the highest-priority items for that iteration. Later iterations can take advantage of
experience and information gained during earlier iterations. By comparison,
plan-driven requirements analysis must plan down to every last detail, since
planning is done once and for all, at a time when uncertainty is the greatest. The
longer the planning horizon, the greater the uncertainty.
A logical progression of analysis activities appears in the following list.
Some items in the progression may just be checklist items for smaller or simpler
projects. For other projects, the items may be major activities, or even processes.
Clarify and refine the top-level user goals. The questions and goal
analysis techniques in Section 3.6 refine soft starting goals into specific
SMART goals.
For Kano analysis, download the paper by Kano [108]; see also the
Wikipedia article on Kano Analysis.
Exercise 4.2 Are the following statements generally true or generally false?
Briefly explain your answers.
a) The expected overall planning effort is less with up-front planning than
with agile planning.
b) The Iron Triangle illustrates connections between time, cost, and scope.
c) The Agile Iron Triangle fixes time and scope and lets costs vary.
i) Three-point estimation takes the average of the best, the most likely, and
the worst case estimates.
Exercise 4.3 Use the nine-box grid in Fig. 4.13 to classify features during Kano
analysis. Give a one-line explanation for why a given class of features belongs in
one of the boxes in the grid. (Note that the ordering of rows and columns in Fig.
4.13 is different from the ordering in Fig. 4.9.)
Figure 4.13 Classification of features during Kano analysis. The rows and
columns have been scrambled, relative to Fig. 4.9.
Exercise 4.4 For each of the following categories, give an example of a feature
related to video conferencing. Be specific and provide realistic examples.
a) Key
b) Attractor
c) Expected
d) Indifferent
e) Reverse
Figure 4.14 Classification of features during Kano analysis.
Exercise 4.5 Consider Kano analysis using the four-box grid in Fig. 4.14,
instead of the nine-box grid in Fig. 4.9.
a) How would you classify features using the four-box grid? Explain your
answer.
b) For each of the four boxes, give an example of a specific feature that
belongs in that box. Why does it belong in the box?
Exercise 4.6 Table 4.2 shows the change in customer satisfaction for Attractors
and Expected features as the degree of sufficiency of a feature increases. Create
the corresponding tables for Key, Reverse, and Indifferent features.
Exercise 4.7 Magne Jørgensen and Barry Boehm engaged in a friendly debate
about the relative merits of expert judgment and formal models (e.g., Cocomo)
for estimating development effort. Read their debate [107] and summarize their
arguments pro and con for:
a) expert judgment.
b) formal models.
Based on their arguments, when and under what conditions would you
recommend estimation methods that rely on
c) expert judgment.
d) formal models.
5
Use Cases
a) Who is the use case for? Call that role the primary actor of the use
case.
b) What does the primary actor want to accomplish with the system?
Call that the user goal of the use case.
As might be expected, the three main elements of a use case are the
primary actor, the goal, and a basic flow.
This chapter will enable the reader to:
A basic flow extends from the start of a use case to the end of the use
case.
When the primary actor and the user goal are clear from the context, a
simple use case can be shown by writing just the basic flow.
Example 5.1 It is common practice to write a flow as a single
numbered sequence that includes both actor actions and system
responses. The following basic flow describes how a user (the
primary actor) interacts with the system to resize a photo (the user
goal):
Here are some tips for developing basic flows; see also Section 5.3 for more
on writing use cases.
Start each action in a flow with either “The actor ...” or “The system
...”; see Example 5.1.
For each basic flow, choose the simplest way of reaching the goal,
from start to successful finish.
See Section 5.2 for alternative flows, which represent additional
behavior, beyond the basic flow.
5.2 Alternative Flows: Conditional
Behaviors
The description of most systems is complicated by the many options, special
cases, exceptions, and error conditions that the systems must be prepared to
handle. An alternative flow models conditional behavior that is a variant of
the success scenario represented by a basic flow. It models behavior that is
essential to a system but is not central to an intuitive understanding of how
the system can meet the user goal of a use case.
Ivar Jacobson introduced use cases in the mid-1980s to model phone
systems, which can be enormously complex. Meanwhile, the basic idea of a
phone call is simple: in the next example, a caller connects with a callee in a
few steps.1
Example 5.4 Consider the basic flow in Fig. 5.2, which connects a
caller with a callee. The following specific alternative flow handles
the case where the caller leaves a message because the callee does
not answer in time. This alternative flow was formed by replacing
actions 3-5 in the basic flow:
ALTERNATIVE FLOW A: LEAVE A MESSAGE
1. The caller provides the callee’s phone number.
Alternative Flow B: Invalid Phone Number
After Action 1 in the basic flow, if the phone number is
invalid,
2b. The system gives an error message.
Resume the basic flow at Action 6.
□
5.2.2 Extension Points
Instead of writing alternative flows by referring to numbered actions in a
basic flow, we can give names to points in a basic flow. Named points
decouple an alternative flow from sequence numbers in the basic flow, so the
sequence numbers can be changed without touching the alternative flows.
Extension points are named points between actions in a flow. In
addition, there can be extension points just before the first action and
immediately after the last action of a flow. The term “extension points” is
motivated by their use for attaching additional behavior that extends the
system. We follow the convention of writing the names of extension points
in boldface. Within basic flows, extension points will be enclosed between
braces, { and }.
Alternative Flow A: Fail Authentication
At {Bank Services}, if authentication has failed,
Display “Authentication failed.”
Resume the basic flow at {Return Card}.
□
Figure 5.3 A basic flow with extension points in boldface.
Alternative Flow B: Insufficient Funds
At {Dispense Cash}, if account has insufficient funds,
Display “Insufficient funds in account.”
Resume the basic flow at {Return Card}.
A Graphical Aside
The graphical view in Fig. 5.4 is included solely to illustrate the relationship
between a basic flow and its alternative flows. It is not recommended for
writing use cases because the separation of basic from alternative flows is
central to the simplicity of use cases. To combine them in one representation
would be bad practice.
Figure 5.4 Basic and specific alternative flows for cash withdrawals from
an ATM. Such a graphical representation does not readily accommodate
bounded alternative flows.
The graph in Fig. 5.4 uses arrows to show how specific alternative
flows attach to the basic flow for cash withdrawals from an ATM. Flows
correspond to paths through the graph. The basic flow goes sequentially
through the numbered actions, from the start to the end of the use case. The
two alternative flows in the figure are for handling authentication failure and
insufficient funds.
5.2.3 Bounded Alternative Flows
Bounded alternative flows attach anywhere between two named extension
points in a basic flow.
Example 5.8 What if the network connection between the ATM and
the server is lost? Connection loss is an external event that can
happen at any time; it is not tied to any specific point in the basic
flow.
For simplicity, we assume that the actions in the basic flow are
atomic; for example, power does not fail in the middle of an action.
In particular, we assume that the action “Dispense cash and update
account balance” is handled properly; that is, cash is dispensed if
and only if the account is updated. A real system would use
transaction processing techniques to ensure that the account balance
reflects the cash dispensed.
With the basic flow and extension points in Fig. 5.3, the
following bounded alternative flow returns the card if the network
connection is lost before cash is dispensed:
Bounded Alternative Flow: Lose Network Connection
At any point between {Read Card} and {Dispense Cash}, if
the network connection is lost,
Display “Sorry, out of service.”
Resume the basic flow at {Return Card}.
□
5.3 Writing Use Cases
Use cases are written so they can be read at multiple levels of detail. The
actors and goals in a collection of use cases serve as an outline of a system.
Basic flows, together with one-line descriptions of alternative flows, are
helpful for an intuitive understanding of the system. Further information
about system behavior can be provided by expanding the one-line
descriptions of alternative flows. This ordering of the elements of use cases
allows them to be developed incrementally. The template in Table 5.1 can be
filled in as information about a project becomes available. This section also
distinguishes between the level of detail in (a) a use case that describes user
intentions and (b) a use case that adds specifics about user-system
interactions.
Table 5.1 A representative template for writing use cases. The elements are
listed in the order they can be added to a use case.
5.3.1 A Template for Use Cases
There is no standard format for writing the elements of a use case. The
template in Table 5.1 is representative of templates that are used in practice.
The elements are considered in this section in the order they can be added to
a use case.
Name and Goal The first element in the template is a name for the use
case. The name is preferably a short active phrase such as “Withdraw Cash”
or “Place an Order.” Next, if needed, is a brief description of the goal of the
use case. If the name is descriptive enough, there may be no need to include
a goal that says essentially the same thing.
The Basic Flow The basic flow is the heart of a use case. It is required.
Begin a basic flow with an actor action. The first action typically triggers the
use case; that is, it initiates the use case.
Alternative Flows Long use cases are hard to read, so the template asks
only for a list of alternative flows. Identify an alternative flow by its name
or its goal. Since requirements can change, alternative flows need not be
fleshed out until the need arises.
Alternative flows must be about optional, exceptional, or truly
alternative behavior. Otherwise, the relevant actions may belong in the basic
flow. If the behavior is not conditional, it is not alternative behavior and may
not belong in this use case; it may belong in some other use case.
Extension Points Use extension points in the basic flow only to indicate
points for inserting additional behavior. Alternative flows, both specific and
bounded, attach to a basic flow at named extension points; for examples, see
Section 5.2.
1. Begin with a list of actors and goals for the whole system. Such a list
can be reviewed with users to (a) validate and prioritize the list of
stakeholders and goals and (b) explore the boundaries of the proposed
system.
2. Draft basic flows and acceptance tests for the prioritized goals. The
draft can focus on user intentions, with system interactions included as
needed. Acceptance tests are helpful for resolving ambiguities and for
reconciling the basic flow and the goal of a use case.
Figure 5.5 Use case diagram for a salary system. An individual can
have two roles: employee and manager. Each role is represented by a
different actor.
“It’s hard to design without something that you could describe as a use
case.”
“Many described use cases informally, for example, as: ‘Structure
plus pithy bits of text to describe a functional requirement. Used to
communicate with stakeholders.’ ”6
5.5 Relationships between Use Cases
Most systems can be described by collections of self-contained use cases. A
self-contained use case may benefit from a private subflow that serves a
well-defined purpose. Inclusion and extension relationships between use
cases must be handled with care. As an influential book by Kurt Bittner and
Ian Spence notes:
If there is one thing that sets teams down the wrong path, it is the
misuse of the use-case relationships.7
5.5.1 Subflows
Even for simple systems, the readability of flows can be enhanced by
defining subflows: a subflow is a self-contained subsequence with a well-
defined purpose. The logic and alternatives remain tied to the basic flow if
subflows are linear; that is, if either all of a subflow's actions are performed or none of them are.
A subflow is private to a use case if it is invoked only within that use
case.
5.5.2 Inclusion of Use Cases
An inclusion is a use case that is explicitly called as an action from a basic
flow. The callee (the inclusion) is unaware of its caller. A possible
application is to partition a use case by factoring out some well-defined
behavior into an inclusion (a use case of its own). Alternatively, two or more
use cases can share the common behavior provided by an inclusion.
Inclusion A use case knows about the inclusion that it calls. The
inclusion use case is unaware of its caller.
Basic Flow: Place Order
1. The shopper enters a product category.
Now, suppose that the shopper has the additional goal of getting
product recommendations. The extension use case Get
Recommendations invokes itself at the extension point in Place
Order:
Extension: Get Recommendations
At {Display Products} in use case Place Order 〈system
makes recommendations〉
User Stories Are Lighter Weight User stories require less effort to create
than use cases because a use case is closer to a collection of related user stories
than it is to an individual story.
Note, however, that a use case need not be fully developed up front; it
can be evolved as a project progresses. A project can opt for a combination
of context and light weight by incrementally developing both use cases and
user stories.
Further Reading
Jacobson, Spence, and Kerr [98] introduce use cases, along with
high-level guidelines for writing and using use cases. They offer use
cases for broad use during software development, not just for
requirements; see Box 5.1.
a) A basic flow
d) How the alternative flows attach to the basic flow using extension
points
For each alternative flow, show the full flow, not just the name of the flow.
For inclusions, provide both a descriptive name and a brief comment about
the role of the included use case.
Exercise 5.2 Write a use case for the software for the insurance company
scenario from Exercise 3.9.
Exercise 5.3 Write a use case for the software to control a self-service
gasoline pump, including the handling of payments, choice of grade of gas,
and a receipt. In addition, when the screen is not being used otherwise, the
system must permit targeted advertising on the screen. Targeted means that
the ads are based on the customer’s purchase history with the vendor.
Exercise 5.4 Prior to meeting with the customer, all you have is the
following brief description of a proposed system:
The system will allow users to compare prices on health-insurance plans
in their area; to begin enrollment in a chosen plan; and to simultaneously
find out if they qualify for government healthcare subsidies. Visitors will
sign up and create their own specific user account first, listing some personal
information, before receiving detailed information about the plans that are
available in their area. [Description adapted from Wikipedia, CC-BY-SA 3.0
license.]10
Write a use case based on this description.
Exercise 5.5 HomeAway allows a user to rent vacation properties across the
world. Write a use case for a renter to select and reserve a vacation property
for specific dates in a given city.
Exercise 5.6 Write a use case for an airline flight-reservations system. For
cities in the United States, the airline either has nonstop flights or flights
with one stop through its hubs in Chicago and Dallas. Your reservations
system is responsible for offering flight options (there may be several
options on a given day, at different prices), seat selection, and method of
payment (choice of credit card or frequent flier miles). Another team is
responsible for the pricing system, which determines the price of a round-
trip ticket.
Exercise 5.7 Write a use case for the software to send a text message
between two mobile phones, as described here. Each phone has its own
Home server in the network, determined by the phone’s number. The Home
server keeps track of the phone’s location, billing, and communication
history. Assume that the source and destination phones have different Home
servers. The destination Home server holds messages until they can be
delivered. Also assume that the network does not fail; that is, the phones stay
connected to the network.
6 Design and Architecture
For any software project, large or small, architecture is key to managing the
intrinsic complexity of software. The design of a system includes its
architecture, so, more broadly, design is key to managing software
complexity. Informally, architecture partitions a system into parts that are
easier to work with than the system as a whole. Let us refer to the parts as
architectural elements, or simply elements. With a clean architecture, we can
reason about the system in terms of the properties of its elements, without
worrying about how the elements are implemented.
Like software, architecture is invisible. What we work with are
descriptions or “views” of an architecture. Different views serve different
purposes; for example, a source code view differs from a deployment view
showing the geographic distribution of the servers.
This chapter provides guidelines for designing and describing system
architecture. It progresses bottom-up from individual classes to whole
systems. UML class diagrams are views of the classes in a system. It may
take multiple views to describe a large system.
This chapter will enable the reader to:
b) The relationships between the elements; that is, how the elements
work together to form the whole system.
Example 6.1 Suppose that a source-code view of a spelling checker
has the following elements:
Design is used to understand and reason about both the external and
internal properties of the system and its parts.3
6.1.3 What Is a Good Software Architecture?
Asking what makes an architecture good is akin to asking what makes a
piece of software good. There are no hard and fast rules, only guidelines.
Here are some questions we might ask of any architecture:
Functional Will the system meet its user requirements? Will it be fit
for purpose? Will it be worth the cost? In short, will it do the job it is
designed to do?
2. The modules interact only through the services that they provide each
other.
At the system level, we can reason about the system purely in terms
of module responsibilities and relationships.
Encrypt, the main module, is responsible for the user interface. Its
primary job is to read plain text and write cipher text. It also gets
an encryption key and initializes the Delta module with the key. It
uses the Shift module.
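The division of responsibilities among the three modules can be sketched in Java. The cipher details here (a letter-shift driven by successive key letters) are illustrative assumptions, since the text does not reproduce the algorithm; only the module responsibilities follow the description above:

```java
// Sketch of the encryption system's modules, under the assumption of a
// simple letter-shift cipher. Delta manages the key, Shift encrypts one
// letter at a time using offsets from Delta, and Encrypt drives the process.
class Delta {
    private String key = "";
    private int pos = 0;
    void setkey(String k) { key = k; pos = 0; }   // initialize with the key
    int offset() {                                // next shift, cycling the key
        int d = key.charAt(pos) - 'a';
        pos = (pos + 1) % key.length();
        return d;
    }
}
class Shift {
    private final Delta delta;
    Shift(Delta delta) { this.delta = delta; }    // Shift uses Delta
    char encrypt(char c) {                        // encrypt one lowercase letter
        if (c < 'a' || c > 'z') return c;
        return (char) ('a' + (c - 'a' + delta.offset()) % 26);
    }
}
class Encrypt {                                   // main module: drives the others
    static String run(String plain, String key) {
        Delta delta = new Delta();
        delta.setkey(key);                        // initializes Delta with the key
        Shift shift = new Shift(delta);           // uses the Shift module
        StringBuilder cipher = new StringBuilder();
        for (char c : plain.toCharArray()) cipher.append(shift.encrypt(c));
        return cipher.toString();
    }
}
```

At the system level, we can still reason purely in terms of responsibilities: Encrypt never looks inside Delta or Shift.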
The other two requirements are (a) buy-in from senior management and
the organization, and (b) support for tools and automation.
6.3 Class Diagrams
If we recursively partition a modular system into its subsystems, and the
subsystems into their parts, we eventually get to units of implementation,
such as classes and procedures. This section introduces UML class diagrams,
a graphical notation for designing and describing classes and their
relationships; that is, how the classes work together. Used properly, the
notation enables the modularity concepts and guidelines in Section 6.2 to be
applied at the class level. UML allows a class diagram to include as much or
as little information about a class as desired. Thus, UML descriptions can
contain implementation details, if a designer chooses to add them.
This section is a selective introduction to the syntax and semantics of
class diagrams in UML (Unified Modeling Language). The language was
formed by combining the object-oriented design methods of Grady Booch,
James Rumbaugh, and Ivar Jacobson.10 The full language is large and
complex, with over twenty kinds of diagrams, each serving a specific
purpose. The different kinds are typically used by themselves, independently
of the others. For example, the use case diagrams of Section 5.4 are part of
UML. Package diagrams are used to illustrate groupings of modules in
Section 6.5.
6.3.1 Representing a Class
A UML class diagram is a developer’s view of the significant classes in a
system’s design. UML enables us to highlight what we consider important
about the classes and their relationships. Other information about the classes
can be suppressed if it is not needed for understanding or documenting a
design.
The simplest representation of an individual class is a box (rectangle)
with just the name of the class. The general representation of a class contains
its name, together with its significant attributes and operations. Informally,
attributes correspond to fields for data. Operations correspond to methods or
procedures/functions. Both attributes and operations are discussed in this
section.
The class name is required. The attributes and operations are optional.
When all three are present, the box for a class has lines to separate them,
stacked with the name on top, the attributes in the middle, and the operations
at the bottom.
Example 6.4 The class diagram in Fig. 6.4 is based on the modular
design in Fig. 6.2 for an encryption system. The classes correspond
to modules with the same name. The dashed arrows in the class
diagram represent dependencies among classes; dependencies are
discussed later in this section.
In the class diagram, the two boxes to the left show classes
Encrypt and Shift by their names alone. To the right, the box for
class Delta is represented by its name, attribute key of type
String, and operations offset() and setkey(). The visibility
markers, + and -, stand for public and private, respectively. See also
the discussion of visibility in what follows. □
The syntax for writing attributes has one required element: the name.
All other elements in the following syntax are optional:
Here, 〈 and 〉 enclose placeholders. The colon, :, goes with the type, and
is dropped if the type is not provided. Similarly, the equal sign, =, is dropped
if no default initial value is provided.
Example 6.5 The following are some of the ways of providing
information about an attribute: the name alone, as in key; the name
with a type, as in key : String; and the full form with a visibility
marker and a default value, as in - key : String = "z" (the specific
forms shown here are illustrative).
Visibility
The visibility markers are as follows: + for public, - for private, # for
protected, and ~ for package visibility.
While the meanings of public and private visibility are essentially the same
across languages, the interpretation of package and protected visibility can
vary subtly. Java and C++ have different notions of protected. In Java, it
means visible within the package and to subclasses, whereas in C++, it
means visible only to subclasses.
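The Java notion can be demonstrated directly; the class names here are illustrative:

```java
// A protected member in Java is accessible to subclasses (and, within the
// same package, to other classes as well).
class Account {
    protected int balance = 100;                  // protected field
}
class SavingsAccount extends Account {
    int readBalance() { return balance; }         // subclass access is allowed
}
```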
Multiplicity
Just as a course can have multiple students, some properties may represent
multiple objects. Multiplicity specifies a range for the number of objects
represented by an attribute. A range is written as a lower bound and an
upper bound separated by two dots, as in 0..1; the upper bound * stands
for no limit, so 0..* allows any number of objects.
Operations
The simplest way to write an operation is to write the name, followed by a
pair of parentheses, (). As previously noted, operations correspond to
methods in a class. The syntax for operations is as follows:
The name and the parentheses are required. The other elements of the syntax
are optional. If present, the parameters are specified by writing a name
followed by a colon and a type. If the return type is not specified, then the
colon, :, is dropped.
Example 6.6 The following are some of the ways of describing the
same operation: the name alone, as in resize(); with a parameter, as
in resize(scale : double); and with a visibility marker and a return
type, as in + resize(scale : double) : void (the forms shown here are
illustrative).
□
6.3.2 Relationships between Classes
We turn now from individual classes to how classes work together. Class
diagrams support three kinds of relationships between classes: dependencies,
generalizations, and associations. Examples of these relationships appear in
Fig. 6.5.
(a) Dependency
(b) Generalization
(c) Association
Generalizations A generalization is a subclass-superclass relationship. It is
represented by an arrow ending in an unfilled triangle touching the
superclass. In Fig. 6.5(b), the arrow is from class Ellipse to the more
general class Shape.
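In Java, the generalization in Fig. 6.5(b) corresponds to a subclass declaration; the area() operation is an illustrative addition, not part of the figure:

```java
// Ellipse is a subclass of the more general class Shape.
class Shape {
    double area() { return 0.0; }                 // generic default
}
class Ellipse extends Shape {
    private final double a, b;                    // semi-axis lengths
    Ellipse(double a, double b) { this.a = a; this.b = b; }
    @Override double area() { return Math.PI * a * b; }
}
```

A client can hold an Ellipse through a Shape reference, which is what makes the superclass "more general."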
Names: Roles and Verbs Associations can be named either after the
underlying property or by the role played by a class. For example, given that
students take courses, the role of a student is to take a course. So, takes is a
suitable name for the association between class Student and class Course.
Similarly, courses have students, so has is a suitable name for the
association between Course and Student.
In a diagram, a role is shown next to the box for a class. For example,
in Fig. 6.6(d), see the roles next to Student and Course.
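The two roles translate naturally into fields; the enroll() operation below is an illustrative helper that keeps both ends of the association consistent:

```java
import java.util.ArrayList;
import java.util.List;

// A Java sketch of the association: a student takes courses, and a
// course has students. Field names follow the role names in the text.
class Student {
    final List<Course> takes = new ArrayList<>();
    void enroll(Course course) {
        takes.add(course);                        // student takes the course
        course.has.add(this);                     // course has the student
    }
}
class Course {
    final List<Student> has = new ArrayList<>();
}
```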
The 4+1 grouping is a helpful starting point for picking views for a
project. The “+1” refers to selected scenarios or use cases that span the four
kinds of views; see Fig. 6.8.
Logical Views
Logical views focus on end-user functionality. Developers can validate the
architecture by tracing how the elements in the view will support a scenario
or use case. Logical views facilitate early stakeholder feedback on questions
like the following: Will the proposed solution meet user needs and goals?
Will it satisfy all stakeholders? Are there any major gaps? The
views are also useful for probing for potential requirements changes. Section
6.2 has design guidelines for modules that isolate anticipated changes.
Development Views
Development views guide the development and evolution of the source code
for a system. The elements in these views are typically modules. Modules
with clear interfaces allow developers of different modules to work
independently and in parallel.
Figure 6.9 A logical view of a system that supports a mobile user
interface for technicians and a web user interface for managers. The
system uses an external map service and an external data store.
Dynamic Views
Dynamic views model the components, messages, and data flows that arise
at run time. Examples of components include objects and processes. The
views can handle distributed or concurrent systems.
Figure 6.10 A development view of the architecture of a compiler that
shows module-submodule relationships.
Example 6.15 With the encryption system in Fig. 6.2, the uses
structure of the system is as follows:
Module Encrypt uses Delta to manage the key, and it uses Shift
to encrypt the plain text, one letter at a time. Module Shift uses Delta
to help decide how to encrypt individual letters. □
1. Introduction
(b) Show message and data flows across the boundary between the
current system and external platforms, databases, and services; for
example, for locations and maps.
(d) Keep the context diagram simple. Use views to describe the
current system itself.
1. Primary Diagram
(b) Briefly describe the purpose of the element and the services it
provides.
(d) Add notes on key design decisions for the element, including
what not to change.
Introduction to CommApp
Example 6.17 The partial module hierarchy in Fig. 6.14 is for the
communications app in Example 6.16. The root of the tree is for the
entire app. The children of the root represent the three main
modules: Model, View, and Controller. The hierarchy also shows
submodules for Model and View. □
Element View
Rationale The look and feel of the app are subject to change, so
the look-and-feel decisions are isolated within the View module. For
example, session windows need not be rectangles with hard edges.
The approach of managing sessions by moving contact cards is
not subject to change.
These views are accompanied by scenarios or use cases that can touch
multiple views. Scenarios account for the +1 in the name 4+1.
Classes and modules are typical architectural elements in development
views. “Module” is a general term that applies not only to classes, but to any
grouping of related data, operations, and other constructs. Modules have
well-defined interfaces. Implementation hiding is a key principle for the
design of modules. The idea is to give each module a specific responsibility
and then hide the implementation of the responsibility from the other
modules. This hiding is achieved by having modules interact only through
their interfaces.
Systems that embody implementation hiding are called modular.
Modularity makes systems easier to work with, since we can reason about
their properties in terms of module responsibilities; that is, the services that
the modules provide each other. Modularity also makes systems easier to
modify. As long as the interface to a module remains intact, we can change
the module’s implementation without touching the rest of the system. Thus,
we can plan for change by isolating likely changes within modules. See
Section 6.2 for guidelines for designing modules. The guidelines also apply
to classes.
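Implementation hiding can be sketched in Java, using the spelling checker of Example 6.1 as the setting. The names and the set-based representation are illustrative assumptions, not the book's design:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// The rest of the system depends only on the SpellChecker interface; the
// hidden representation can later be replaced (say, by a trie) without
// touching any caller.
interface SpellChecker {
    boolean isKnown(String word);                 // the module's whole interface
}
class SetSpellChecker implements SpellChecker {
    private final Set<String> words;              // hidden representation
    SetSpellChecker(Collection<String> dictionary) {
        words = new HashSet<>();
        for (String w : dictionary) words.add(w.toLowerCase());
    }
    public boolean isKnown(String word) {
        return words.contains(word.toLowerCase());
    }
}
```

As long as the SpellChecker interface remains intact, the implementation can change without rippling through the system.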
A presentation or lightweight documentation about a system
architecture can be created by using the following outline:
System Overview
1. Introduction Introduce the problem and the challenges. Summarize
the rationale for the solution.
What is a Module Guide? List its main elements and their purpose or
roles.
Exercise 6.2 Are the following statements generally true or generally false?
Briefly explain your answers.
e) With the XP focus on the simplest thing that could possibly work and
on refactoring to clean up the design as new code is added, there is no
need for architecture.
b) Change your design to track both the top ten videos and the top ten
categories; for example, to settle whether Comedy videos are more
popular than Action videos.
For each of the preceding cases, suppose modules A and B have that kind of
coupling. How would you refactor A and B into modules M1, M2, … that
comply with Information Hiding and provide the same services as A and B?
That is, for each public function A.f() or B.f() in the interfaces of A and B,
there is an equivalent function Mi.f() for some refactored module Mi.
Exercise 6.5 For the system in Exercise 6.3, you decide to treat the system
as a product family because you recognize that the same approach can be
applied to track top-selling items for a retailer or the most emailed items for
a news company.
b) What are the variabilities; that is, how do product family members
differ?
Exercise 6.6 KWIC is an acronym for Key Word in Context. A KWIC index
is formed by sorting and aligning all the “significant” words in a title. For
simplicity, assume that capitalized words are the only significant words. As
an example, the title Wikipedia the Free Encyclopedia has three significant
words, Wikipedia, Free, and Encyclopedia. For the two titles
Exercise 6.7 Given a month and a year, the Unix cal command produces a
calendar for that month; for example, cal 10 1752 produces
Given a year, as in cal 2000, the output is a calendar for that year.
Your task is to design a modular architecture for a proposed
implementation of the cal command. (See instructions below.)
7 Architectural Patterns
Rather than start each design from scratch, software architects tend to adapt
earlier successful designs for similar problems. Even if they do not reuse any
code, the solution approach gives them a head start with the current problem.
Problems like fault tolerance have been addressed over and over again. (The
idea for a solution is to have a live backup system that takes over if the
primary system fails due to a hardware or software fault.)
An architectural pattern outlines properties that are common to all
solutions of a design problem that occurs over and over again. A pattern
provides the core of a solution, in the form of guidelines for designing a
specific solution.
This chapter introduces some frequently occurring architectural
patterns. A partial list appears in Fig. 7.1. To aid the selection of a pattern in
a given situation, the figure mentions what each pattern is good for.
Additional patterns related to clients and servers are covered in Section 7.5.
Section 7.6 deals with product families and product lines, where products
share a “core” of common properties.
Figure 7.1 Some frequently occurring patterns.
Reusability and Portability These two attributes are related, but they are
distinct. For a shining example of reusability, the existing Internet
infrastructure was reused for a new purpose when web browsing was layered
above it, without any change to the infrastructure. Specifically, HTTP was
layered above the Internet protocols, TCP and IP. For an example of
portability, the Unix operating system is ported whenever it is moved from
one machine to another kind of machine.
In short, an upper layer is ported when we move it from atop one lower
layer to another. Meanwhile, a lower layer is reused when we add a new
upper layer above it.
Layer Bridging With layer bridging, a layer may have more than one layer
directly below it. In Fig. 7.4(a), the App layer is drawn so it touches each of
the other layers below it. Thus, modules in the App layer may use modules
in any of the other layers: Framework, Services, and Operating System.
Layers with Sidecar The layers with sidecar pattern differs from the
layered pattern by having one layer, called the sidecar, that vertically spans
all the other horizontal layers; for example, see Fig. 7.4(b). A vertical
cross-cutting sidecar layer may be used for debugging, auditing, security, or
operations management. In Fig. 7.4(b), the sidecar layer is a monitoring
layer.
Care is needed with a sidecar to ensure that layering is preserved; that
is, lower layers cannot access upper layers indirectly through the sidecar.
Layering can be preserved if communications with the sidecar are one-way,
either to or from the sidecar, but not in both directions. In a debugging or
audit setting, modules in the sidecar may reach into the other layers and use
modules in the horizontal layers. On the other hand, in a logging or
monitoring setting, modules in the horizontal layers may use the sidecar to
deposit event logs.
7.1.2 Design Trade-offs
The layered pattern results in strict layering, where each layer knows only
about itself and the interface of the layer directly below. Layer boundaries
are hard and layer implementations are hidden. New layers can be added
without touching the layers below.
In practice, designers face trade-offs between strictness of layering and
other goals. The designers of the Internet protocols chose strictness and
reusability, accepting some loss of efficiency in data transmission. By
contrast, when Unix was ported from one machine to another, the designers
sacrificed some portability (strictness) for performance. They favored
performance at the expense of some added effort because an operating
system is ported once to a new machine, and then used over and over again
on the new machine.
Figure 7.5 Each endpoint (host) attached to the Internet runs a copy of the
IP stack of protocols. The dashed arrows represent the logical flow of data
between corresponding layers at the endpoints. The solid arrows represent
the actual path between applications at the endpoints.
Example 7.2 Strict layering of Internet protocols results in some
data transmission overhead. The lower layers of the IP stack are
unaware of the layers above them, so each layer adds its own
identifying information to a packet, in the form of a header; see Fig.
7.6. The actual data that the endpoints want to exchange, called the
payload, must carry the packet headers with it as overhead.
Starting at the top of Fig. 7.6, a TCP header is added when a
payload of between 64 and 1518 bytes goes down the IP stack to the
TCP layer. The payload is shaded at the TCP level, since TCP does
not look inside the packet at the payload. An IP header is added as
the now TCP packet goes to the IP layer. At the IP level, the TCP
packet is shaded since IP does not look inside it. Another header is
added at the Link layer. At the destination, the headers are stripped
as a packet goes up the IP stack to deliver the payload to the
application there.
The header sizes in Fig. 7.6 are based on the assumption that
IPv4 (IP version 4) is the Internet Layer protocol and that Ethernet is
the Link Layer protocol. For simplicity, we abbreviate IPv4 to IP. □
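The strict layering of Example 7.2 can be sketched as header encapsulation: each layer wraps the packet handed down from above without looking inside it. The sizes below are the standard minimum header lengths (20 bytes for TCP, 20 for IPv4, 18 for Ethernet header plus trailer), not necessarily the figures in Fig. 7.6:

```java
// Each layer adds its own header; the payload from above is opaque to it.
class ProtocolLayer {
    final String name;
    final int headerBytes;
    ProtocolLayer(String name, int headerBytes) {
        this.name = name;
        this.headerBytes = headerBytes;
    }
    int send(int packetBytes) { return packetBytes + headerBytes; }
}
class IpStack {
    // Total bytes on the wire for a given application payload.
    static int bytesOnWire(int payloadBytes) {
        ProtocolLayer[] stack = {
            new ProtocolLayer("TCP", 20),
            new ProtocolLayer("IP", 20),
            new ProtocolLayer("Link", 18),
        };
        int bytes = payloadBytes;
        for (ProtocolLayer layer : stack) bytes = layer.send(bytes);
        return bytes;
    }
}
```

At the destination, the process runs in reverse: each layer strips its own header before passing the packet up.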
Scalability Copies of the data can improve scalability, but they bring
with them the problem of keeping the copies consistent with each
other.
7.2.2 Observers and Subscribers
The observer and the publish-subscribe patterns are closely related. In both
patterns, a publisher component raises (publishes) an event, unaware of who
is interested in being notified about the event. The two patterns differ in
whether the interested component knows the publisher. An interested
component is called an observer if it knows the identity of the publisher. It is
called a subscriber if it does not know the publisher.
5. Subscribers are notified of all events. They pick out the events that
are of interest to them.
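The distinction can be sketched in a few lines of Java. All names below are illustrative, not from the book: an observer registers directly with a publisher it knows, while a subscriber registers with an intermediary broker and never learns the publisher's identity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the observer and publish-subscribe patterns; names are illustrative.
class EventPatterns {
    // Observer pattern: interested parties register directly with the publisher,
    // so they know the publisher's identity.
    static class Publisher {
        private final List<Consumer<String>> observers = new ArrayList<>();
        void addObserver(Consumer<String> o) { observers.add(o); }
        void raise(String event) { observers.forEach(o -> o.accept(event)); }
    }

    // Publish-subscribe: a broker sits between publisher and subscribers.
    // Subscribers are notified of all events and pick out the ones of interest.
    static class Broker {
        private final List<Consumer<String>> subscribers = new ArrayList<>();
        void subscribe(Consumer<String> s) { subscribers.add(s); }
        void publish(String event) { subscribers.forEach(s -> s.accept(event)); }
    }

    public static void main(String[] args) {
        Publisher pub = new Publisher();
        pub.addObserver(e -> System.out.println("observer saw: " + e));
        pub.raise("photo-resized");

        Broker broker = new Broker();
        // The subscriber filters: it sees every event but reacts only to some.
        broker.subscribe(e -> {
            if (e.startsWith("photo-")) System.out.println("subscriber saw: " + e);
        });
        broker.publish("photo-cropped");   // of interest
        broker.publish("window-moved");    // ignored by this subscriber
    }
}
```

Note that in both variants the publishing side is unaware of who is listening; only the registration path differs.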
Separated Presentation
The popular saying “The map is not the territory” illustrates the distinction
between views and models: a model corresponds to the territory and a view
corresponds to a map. There can be many maps of the same territory; for
example, a satellite map, a street map, or a topographical map. The
following example illustrates the separation between models and views.
Example 7.6 In a photo-editing application, the photo objects belong
in the model. Each photo object has attributes such as the photo’s
height, width, resolution, and the pixels that make up its image. The
model also includes the logic for operations on the photo; for
example, for resizing it or cropping it.
Two presentations of a photo object appear in Fig. 7.7. Both
presentations get their information from the model, but they display
different attributes. The image view on the left displays a picture of
the Mona Lisa; that is, it displays the pixels in the photo object. The
dialog view on the right displays numeric values for the photo’s
height, width, and resolution in pixels per inch. The height and width
are given in both inches and in pixels. □
Now, suppose that the user doubles the numeric value of the
height from 450 to 900 pixels in the dialog view. The model is
unaware of the presentations, but in the other direction, the
presentations know about the model. Between them, the view and
the controller for the dialog presentation can therefore send a
message to the model that the height has doubled. The model can
then update the other photo attributes: the width must double and the
pixels from the original image have to be mapped onto the pixels for
the doubled image. (Both the height and width double, so the
number of pixels quadruples.)
When the model changes, both presentations must synchronize
with the model to ensure that they display the current state of the
photo object. □
Model The model is responsible for the domain logic and objects.
There is one model per application domain.
While a user interface always has just one model, it can have multiple
view-controller pairs. Between them, a paired view and controller are
responsible for observing the model. If they are notified of a change to the
model, they retrieve the relevant updated information to synchronize the
view with the model. In practice, MVC architectures vary in exactly how
they observe the model and retrieve updates. In the architecture in Fig. 7.8,
the view does the observing and retrieving.
Example 7.8 With the MVC architecture in Fig. 7.8, this example
traces how a user change made through one view is reflected in all
views. The dialog view of a photo object in Fig. 7.7 has several
fields that are linked. The height appears in both pixels and inches.
Furthermore, we continue to assume that the proportions of the
photo are maintained, so if the height doubles, the width must double
too.
When the user types 9, then 0, then 0 in the field for the height
in pixels, the view raises an event to notify its controller. The
controller gets the digits and passes the new height 900 to the model.
The model resizes the photo object, which doubles the height to 900
and the width to 600.
The model then raises an event about the change in the photo
object. All views are observers of the model, so they are notified of
the changes. They retrieve the latest information about the photo
object and synchronize their displays with the model. Thus, the user
activity in the pixel-height field in the dialog view is propagated not
only to the photo view, but also to other fields in the dialog view,
with the MVC architecture in Fig. 7.8. □
7.3.3 Keep Views Simple
Views are harder to test than the other components in an MVC architecture
because they involve user interaction. Hence the guidance to keep views
simple while dividing work between views and their controllers. Controllers
and models communicate via messages and notification, which are more
readily tested using automated frameworks than user activity; see Chapter 9
for unit test frameworks.
Humble Views
The humble approach is to minimize the behavior of any object that is hard
to test.6 A humble view is limited to detecting user activity and rendering
information on a display. Its controller observes the model and tells the view
what to display. It is easier to intercept and automate testing of messages
from a controller to a view than it is to test what is actually displayed on a
screen.
Example 7.9 Let us revisit Example 7.8, this time with a humble
view. When the user enters 900 for the height of the photo in pixels,
the humble view notifies its controller. The controller tells the model
of the change. The model resizes the photo object and raises an event
about the update to the photo.
The controller is the observer, so it retrieves the new height and
the width. The controller then tells the humble view exactly what to
display in each field in the dialog view. □
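A humble-view version of the photo dialog can be sketched as follows. The class and method names are invented for illustration; the point is that the controller computes every display string, so a test can assert on the controller's output without any real screen.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a humble view for the photo dialog of Fig. 7.7.
// All names are illustrative; the book does not give this code.
class HumbleViewSketch {
    // Model: owns the domain logic (resizing preserves the aspect ratio).
    static class PhotoModel {
        int heightPx = 450, widthPx = 300, resolutionPpi = 72;
        void setHeightPx(int h) {              // resizing logic lives in the model
            widthPx = widthPx * h / heightPx;  // scale width before updating height
            heightPx = h;
        }
    }

    // Humble view: only renders what it is told; nothing here needs testing.
    interface DialogView { void show(Map<String, String> fields); }

    // Controller: observes the model and decides exactly what each field shows.
    static class DialogController {
        Map<String, String> fieldsFor(PhotoModel m) {
            Map<String, String> f = new LinkedHashMap<>();
            f.put("height (px)", String.valueOf(m.heightPx));
            f.put("width (px)", String.valueOf(m.widthPx));
            f.put("height (in)", String.format("%.2f", (double) m.heightPx / m.resolutionPpi));
            return f;
        }
        void modelChanged(PhotoModel m, DialogView view) { view.show(fieldsFor(m)); }
    }

    public static void main(String[] args) {
        PhotoModel model = new PhotoModel();
        model.setHeightPx(900);                 // the user doubles the height
        new DialogController().modelChanged(model,
            fields -> fields.forEach((k, v) -> System.out.println(k + ": " + v)));
    }
}
```

In a unit test, `DialogView` is replaced by a recording stub, so the hard-to-test rendering code never runs; only the controller's easily checked string output is exercised.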
Complex Presentations: Rely on the Controller
A complex user interface can involve more than a passive display of
information from the model. It can involve some decisions and computations
that are specific to a presentation. Such logic is specific to the given view
and has no relevance to the other views. It therefore does not belong with the
model. It belongs in the controller, since we want to keep views humble.7
Example 7.10 Large companies maintain communication networks
with links between sites across the world. Engineers monitor the
links to ensure that the data traffic through them is flowing smoothly.
If a link is overloaded, the engineers take appropriate actions.
The engineers at one company want two views of traffic across
the company network:
Unix Pipelines
Unix pipelines are composed by using the pipe operator |. A pipeline with
the three tools p, q, and r is written as
p|q|r
The Unix operating system takes care of connecting the output of p to the
input of q and the output of q to the input of r.
The operating system comes with a set of useful tools that are designed
to work together in pipelines. The tools work with streams of characters. The
following tools are used in Examples 7.13 and 7.14. For more information
about them, use the Unix man command with the name of the tool.
Suppose that the input to this pipeline consists of the two lines
of text:
1 but
1 by
1 sells
1 she
1 shore
2 shells
2 the
3 sea
□
7.4.2 Dataflow Networks
We get dataflow networks instead of pipelines if we lift the restriction that
each component must have a single input stream and a single output stream.
Example 7.17 The first two components in the pipeline in Fig. 7.10
perform mapping transformations. Translation of nonalphabetic
characters to newlines and the translation of uppercase to lowercase
can both be done a character at a time. Sorting is a prime example of
a reducing function. Note that the word fonts moves up from the last position
after the sort. The fourth component, uniq, is closer to a mapping
than a reducing transformation. It compares consecutive lines and
drops repetitions, so its decisions are based on local context. In the
extreme case, if all the lines in a stream are the same, it reduces the
stream to a single line. In practice, uniq behaves like a mapping. □
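The map/reduce flavor of such a pipeline carries over directly to Java streams. The sketch below mirrors a word-frequency pipeline of the kind discussed in this section (split into words, lowercase, count repetitions, sort by count); the input sentence and all names are invented for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// A Java-streams analogue of a Unix word-frequency pipeline
// (roughly: tr | sort | uniq -c | sort).
class WordFrequency {
    static Map<String, Long> frequencies(String text) {
        return Arrays.stream(text.toLowerCase().split("[^a-z]+")) // tr: split on non-letters
                .filter(w -> !w.isEmpty())                        // drop empty fragments
                .collect(Collectors.groupingBy(
                        Function.identity(), Collectors.counting())) // uniq -c: count repeats
                .entrySet().stream()
                .sorted(Map.Entry.comparingByValue())             // sort by count
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));        // keep sorted order
    }

    public static void main(String[] args) {
        frequencies("She sells sea shells by the sea shore")
                .forEach((w, n) -> System.out.println(n + " " + w));
    }
}
```

Each stage is either a mapping (split, lowercase) or a reduction (grouping, sorting), just as in the pipeline of Fig. 7.10.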
3. Trial Period After successfully handling some of the real client load,
the new version n + 1 is ready for full production; that is, ready to
handle all client requests. During a trial period, the earlier version n
remains on standby.
The time frames for the preceding stages are measured in hours or days.
Problem Ensure that clients get the services they want through a service
interface, without needing to know about the infrastructure (including
servers) that is used to provide the services.
Table 7.1 Quality attributes influence the choice of patterns for a software
problem.
Further Reading
The book by Alexander, Ishikawa, and Silverstein [5] is the
definitive source for Alexander’s patterns for communities and
buildings.
There “are beds and tables in the world - plenty of them, are there not?”
“Yes.”
“But there are only two ideas or forms of them - one the idea of a bed, the
other of a table.”
“True.”
“And the maker of either of them makes a bed or he makes a table for our
use, in accordance with the idea.”
The Republic dates back to circa 380 BCE.19
g) Just as all pipes in a Unix pipeline carry character streams, all links
in a dataflow pipeline carry streams of the same data type.
What specifically are the roles of the view and its controller? Clearly
list any assumptions.
Exercise 7.4 Use Unix tools to create pipelines for the following problems.
In addition to the tools sort, uniq, and tr introduced in Section 7.4, the
suggested tools are the stream editor sed and the programming language
awk. Both sed and awk work a line at a time, where a line is a sequence of
characters ending in the newline character '\n'.
a) Split an input file into a list of words, one per line, where a word is a
sequence of uppercase and lowercase letters. Preserve word order.
Remove all other characters, except for the newline characters
following each word.
Exercise 7.5 Parnas [147] uses KWIC indices to illustrate the modularity
principle; for a quick introduction to KWIC indices, see Exercise 6.6. For
the problem of creating a KWIC index, design separate modular solutions
using each of the following approaches:
Exercise 7.6 This exercise explores the use of T diagrams to describe a
bootstrapping process for creating a compiler for a new class of machines N,
which has no software to begin with. The basic T diagram in Fig. 7.16(a)
shows the three languages that characterize a compiler: the source language
S to be translated, the target language T produced as output, and the
implementation language I that the compiler is written in. The diagram in
Fig. 7.16(b) shows the translation of a compiler. The base of the diagram is a
C compiler on an existing machine M; that is, a compiler from C to M,
written in M, the language of the existing machine. Its input is the source
code in C for a compiler from C to N, the new machine. This C source code
is translated into target code in M: the target is a compiler from C to N that
runs on the existing machine M.
Figure 7.16 (a) T diagram for a compiler from source language S to target
language T, written in implementation language I. (b) Translation of the
source code in C for a compiler from C to N, the language of a new
machine, by a C compiler on an existing machine M.
The Review Is for the Project Team’s Benefit The purpose of a review is
to provide a project team with objective feedback. It is then up to the project
team and its management to decide what to do with the feedback. The
decision may be to continue the project with minor changes, to change the
project’s direction, or even to cancel the project.
Reviews have been found to be cost effective for both projects that are
doing well and for projects that need help. Reviews are not meant to be an
audit on behalf of the project’s management. They are not for finding fault
or assigning blame.
Since the project team will be the one to act on reviewer feedback,
members of the development team have to participate in the review. The
team’s participation also helps to build trust between the team and the
reviewers, which increases the likelihood that the team will act on the
feedback from the review.
The Project Has a System Architect Any project that is important enough
to merit a formal architecture review is important enough to have a dedicated
system architect. The “architect” may be a person or a small team. The
reviewers rely on the architect to describe the architecture and provide the
rationale for the design decisions.
The reviewers also assess the team’s skills. Does anyone on the team
have prior experience with such a system? Are the tools new or known to the
team?
8.1.2 Discovery, Deep-Dive, and Retrospective Reviews
The focus of a review varies from project to project and, for a given project,
from stage to stage in the life of a project. The following three kinds of
reviews are appropriate during the early, middle, and late stages of a project,
respectively:
The moderator role can be split into two: organizer of the inspection
and moderator of the group interactions. Similarly, the author role can be
split into two: author of the artifact, and reader who paraphrases the content
to be reviewed.
8.2.2 Case Study: Using Data to Ensure Effectiveness
At one level, the next example describes how one company uses data to
ensure the effectiveness of its inspections. At another level, the example
illustrates the increasing role of data collection and process improvement in
software engineering. See Chapter 10 for more on metrics and measurement.
Example 8.1 The data in Table 8.3 shows just a few of the internal
benchmarks for inspections at Infosys, around 2002. For
requirements documents, preparation by individual reviewers was at
the rate of 5–7 pages per hour. During the group review meeting, the
progress rate was about the same. The group found 0.5–1.5 minor
and 0.1–0.3 major defects per page. For code inspections, the
individual preparation rate was 100–200 lines per hour and the
progress rate during the group meeting was 110–150 lines per hour.
The review found about equal numbers of minor and major
defects: 0.01–0.06 per line.
Did the inspection find too many minor defects and too few major
ones? Check whether the reviewers understood the artifact well
enough. Also check whether the reference document was precise
enough. Was this the first review for an artifact from this project?
Was the artifact of low quality? If so, does the author need training
or a simpler work assignment? Rework the artifact.
While the data in this example is dated, it continues to be
relevant. Technology has changed in the interim, but preparation,
progress, and defect-discovery rates today are very likely to be
similar to those in Table 8.3. □
8.2.3 Organizing an Inspection
The moderator assembles a team of independent reviewers. The moderator
also ensures that the project team provides clear objectives and adequate
materials for inspection. For example, for an architecture review, the project
must provide a suitable architectural description. For a code inspection, the
code must be compiled and tested.
How Many Reviewers? The cost of an inspection rises with the number of
reviewers: how many are enough? The number of reviewers depends on the
nature of the inspection: with too few, the review team may not have the
required breadth of expertise; with too many, the inspection becomes
inefficient. The fewer the reviewers, the better: not only does the
reviewers’ time cost less, but with more reviewers it takes longer to
coordinate schedules and to collect their comments.
A typical inspection has three to six reviewers. For code inspections,
there is evidence that two reviewers find as many defects as four.5
(a) REVIEW-THEN-COMMIT
(b) COMMIT-THEN-REVIEW
After traveling 350 million miles in 274 days, the Mars rover,
Curiosity, landed flawlessly on the planet’s surface on August 5,
2012. The crucial sequence from entry into the thin Mars atmosphere
to the landing was called “seven minutes of terror” by the engineers
who designed it. So many things had to go right to slow the
spacecraft from 11,200 miles an hour to 0 and land the rover without
crashing on top of it.
Software controlled all functions on the rover and its spacecraft.
The landing sequence was choreographed by 500,000 of the 3
million lines of code, written mostly in C by a team of 35 people.
Verification was deeply entwined with the other development
activities.14
The code for the mission was highly parallel, with 120 tasks.
Parallel programs are notoriously hard to verify. The team used a
powerful technique called logic model checking.
Overall, every precaution was taken to ensure the success of the
mission. What distinguishes the project is not the novelty of the
verification techniques, but the rigor with which the techniques were
applied.
8.4.1 A Variety of Static Checkers
Automated static analysis tools consist of a set of checkers or detectors,
where each checker looks for specific questionable constructions in the
source code. Questionable constructions include the following:
Resource leaks; that is, a resource is allocated but not released.
7) if( condition1 )
8) goto fail;
Using data-flow analysis, we can deduce that the value of logger will
be null when control reaches the last line of the preceding program
fragment. At that point, a failure will occur: there will be no object for
logger to point to, so the method call log(message) will fail.
Before reading the next example, can you find the potential null
dereferences in Fig. 8.6?
Figure 8.6 A program fragment from the open-source Apache Tomcat
Server with a null-dereference bug.
Example 8.4 The real code fragment in Fig. 8.6 avoids a null
dereference for logger, but it introduces a null dereference for
container.
Data-flow analysis would discover that logger in Fig. 8.6 is
defined in two places and used in two places. The two definitions are
on lines 1 and 3. The two uses are on lines 4 and 5. As for
container, there are no definitions; there are four uses, on lines 2,
3, 5, and 7.
A null-dereference failure will occur if container is null
when line 1 is reached. Control then flows from the decision on line
2 to line 4, leaving logger unchanged at null. From the decision
on line 4, control therefore flows to line 7, which has a use of
container. But container is null, so we have a null-dereference
failure.
There is no null dereference if container is non-null when
line 1 is reached, even if container.getLogger() returns null.
□
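The pattern of Examples 8.3 and 8.4 can be reproduced in a few lines of Java. The fragment below is an illustrative reconstruction, not the actual Tomcat code: Container and Logger are stand-in types, and the unguarded use at the end is the kind of path a data-flow checker would flag.

```java
// Illustrative reconstruction of the null-dereference pattern discussed with
// Fig. 8.6; this is NOT the actual Tomcat code. Container/Logger are stand-ins.
class NullDereferenceSketch {
    interface Logger { void log(String message); }
    interface Container { Logger getLogger(); }

    static String describe(Container container, String message) {
        Logger logger = null;                   // definition 1: logger is null
        if (container != null) {
            logger = container.getLogger();     // definition 2: may still be null
        }
        if (logger != null) {
            logger.log(message);                // safe: this use is guarded
            return "logged";
        }
        // Fault: this use of container is NOT guarded. On the path where
        // container is null, logger stays null and the next line dereferences
        // null -- exactly what data-flow analysis would report.
        return "no logger for " + container.toString();
    }

    public static void main(String[] args) {
        Container noLogger = () -> null;        // non-null container, null logger
        // Safe even though getLogger() returns null, matching Example 8.4.
        System.out.println(describe(noLogger, "hi"));
        try {
            describe(null, "hi");               // the path the checker warns about
        } catch (NullPointerException e) {
            System.out.println("null dereference, as the analysis predicts");
        }
    }
}
```

A static checker finds this fault without running the code: it only has to follow the two definitions and the unguarded use along the path where container is null.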
For code reviews, see the review of six major open-source projects
by Rigby et al. [157].
d) “When indexing into a string, are the limits of the string exceeded?”
Exercise 8.2 The following questions have been proposed for an architecture
discovery review. Rank order the questions, from most to least important, in
your opinion. Explain your reasons for the ordering. There is no single right
or wrong answer.
b) Do the goals for the system capture the customer’s desired user
experience?
d) For each use case, is there a set of modules that implements the use
case?
e) Is there any functionality that is not needed for the use cases?
f) Is the architecture clean?
i) Does the team have the expertise for implementing the system?
Exercise 8.3 Suppose that it is two years from now. Your current project has
turned into a successful company. For a deep-dive architecture review, come
up with 10 independent security-related questions. Ensure that there are no
redundant questions.
Describe the main issues during testing: when to use white-box and/or
black-box testing, setting up a test environment, test selection, and
evaluation of test outputs.
Select and design tests to achieve the desired levels of code coverage
for white-box testing, and input domain coverage for black-box
testing.
9.1 Overview of Testing
The code in Fig. 9.1 is from the clock driver of a processor used in digital media
players produced by Microsoft and Toshiba. On December 31, 2008, owners of
the player awoke to find that it froze on startup. On the last day of a leap year,
the code loops forever.1
Figure 9.1 On the last day of a leap year, this code fails. Where is the fault?
Software under Test The software under test can be a code fragment, a
component, a subsystem, a self-contained program, or a complete
hardware-software system.
Example 9.2 Suppose that the software under test is the code on lines
1–17 in Fig. 9.1. The input domain is the set of possible initial integer
values for the variable days. The output domain is the set of possible
final integer values for the variables year and days.
The code in Fig. 9.1 cannot be run as is, because it needs a
definition for ORIGINYEAR and an implementation for function
IsLeapYear(). These things must be provided by the environment.
(We are treating the initial value of variable days as an input, so the
environment does not need to provide a value for days.) □
The following questions capture the main issues that arise during testing:
So, how many successful tests should the software pass before its quality is
deemed “high enough”? In what follows, we consider answers to this question.
Adequacy criteria based on coverage and defect discovery are much better
than arbitrary criteria, such as stopping when time runs out or when a certain
number of defects have been found. They cannot, however, guarantee the
absence of bugs.
While testing alone is not enough, it can be a key component of an overall
quality-improvement plan based on reviews, static analysis, and testing; see
Section 10.4.
9.1.4 Test Oracles: Evaluating the Response to a Test
Implicit in the preceding discussion is the assumption that we can readily tell
whether an output is “correct”; that is, we can readily decide whether the output
response to an input stimulus matches the expected output. This assumption is
called the oracle assumption.
The oracle assumption has two parts:
Most of the time, there is an oracle, human or automated. For values such
as integers and characters, all an oracle may need to do is to compare the output
with the expected value. An oracle based on a known comparison can be easily
automated.
Graphical and audio/video interfaces may require a human oracle. For
example, how do you evaluate a text-to-speech system? It may require a human
to decide whether the spoken output sounds natural to a native speaker.
Figure 9.3 Levels of testing. Functional tests may be merged into system
tests; hence the dashed box.
Each level of testing plays a different role. From Section 2.5, to validate is
to check whether the product will do what users want, and to verify is to check
whether it is being implemented correctly, according to its specification. The top
two levels in Fig. 9.3 validate that the right product is being built. The lower
levels verify the product is being built right (correctly).
System and functional testing may be combined into a single level that tests
the behavior of the system; hence the dashed box for functional tests. The
number of levels varies from project to project, depending on the complexity of
the software and the importance of the application.5
Based on data from hundreds of companies, a rough estimate is that each
level catches about one in three defects.6
9.2.1 Unit Testing
A unit of software is a logically separate piece of code that can be tested by
itself. It may be a module or part of a module. Unit testing verifies a unit in
isolation from the rest of the system. With respect to the overview of testing in
Fig. 9.2, the environment simulates just enough of the rest of the system to allow
the unit to be run and tested.
Unit testing is primarily white-box testing, where test selection is informed
by the source code of the unit. White-box testing is discussed in Section 9.3.
From Section 9.1, the environment includes the context that is needed to
run the software under test. For a Java program, the context includes values of
variables and simulations of any constructs that the software relies on.
Example 9.5 The pseudo-code in Fig. 9.4(a) shows a class Date with a
method getYear(). The body of getYear() is not shown – think of it
as implementing the year calculation in Fig. 9.1.
The code in Fig. 9.4(b) sets up a single JUnit test for getYear().
The annotation @Test marks the beginning of a test. The name of the
test is test365. A descriptive name is recommended, for readability of
messages about failed tests. Simple tests are recommended to make it
easier to identify faults.
The test creates object date and calls getYear(365), where 365
represents December 31, 1980. JUnit supports a range of assert methods;
assertEquals() is an example. If the computed value year does not
equal 1980, the test fails, and JUnit will issue a descriptive message.
For more information about JUnit, visit junit.org. □
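The JUnit code of Fig. 9.4 is not reproduced in this section. The sketch below is an analogous test written with a plain main in place of the JUnit runner, so that it is self-contained; the Date stub and its deliberately simplified year calculation are invented for illustration.

```java
// A self-contained sketch in the spirit of Fig. 9.4. The real version uses
// JUnit (@Test plus assertEquals); here a plain main plays the role of the
// test runner. The Date stub below is illustrative, not the book's code.
class DateTest {
    static class Date {
        // Stand-in for the year calculation of Fig. 9.1: days counted from 1980.
        // Deliberately simplified: it ignores leap years entirely.
        int getYear(int days) {
            int year = 1980;
            while (days > 365) {
                days -= 365;
                year++;
            }
            return year;
        }
    }

    // test365: a descriptive name, one simple check, as the text recommends.
    static void test365() {
        Date date = new Date();
        int year = date.getYear(365);
        // Plays the role of assertEquals(1980, year).
        if (year != 1980) throw new AssertionError("expected 1980, got " + year);
    }

    public static void main(String[] args) {
        test365();
        System.out.println("test365 passed");
    }
}
```

The environment here is exactly what Section 9.1 describes: the stub Date class and the hand-written runner supply just enough context for the unit to be run and checked in isolation.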
Example 9.7 The edges and paths in Fig. 9.5 represent the uses relation
between the modules in a system. Module A uses all the modules below
it. A uses B and C directly; it uses the other modules indirectly.
A uses B, so B must be present and work for A to work. But, B uses
D and E, so D and E must also be present and work for A to work. □
Example 9.8 Consider the modules in Fig. 9.5. Any ordering that adds a
child node before a parent node can serve for incremental integration
testing, except for modules F and G, which use each other. They must be
added together.
Here are two possible orderings for bottom-up testing:
H, D, I, E, B, J, F and G, C, A
J, I, F and G, C, H, E, D, B, A
Stub modules are often more complicated than they first appear to be.9
9.2.3 Functional, System, and Acceptance Testing
Functional testing verifies that the overall system meets its design
specifications. System testing validates the system as a whole with respect to
customer requirements, both functional and nonfunctional. The overall system
may include hardware and third-party software components.
Functional testing may be merged into system testing. If the two are
merged, then system testing performs a combination of verification and
validation.
System testing is a major activity. Nonfunctional requirements include
performance, security, usability, reliability, scalability, serviceability,
and documentation, among others. These are end-to-end properties that can be tested
only after the overall system is available.
Testing for properties like security, usability, and performance is important
enough that it may be split off from system testing and conducted by
dedicated specialized teams.
Acceptance Testing
Acceptance testing differs from the other levels of testing since it is performed
by the customer organization. Mission-critical systems are usually installed in a
lab at a customer site and subjected to rigorous acceptance testing before they
are put into production. Acceptance tests based on usage scenarios ensure that
the system will support the customer organization’s business goals.
9.2.4 Case Study: Test Early and Often
Testing was a top priority during the development of a highly reliable
communications product at Avaya. Tests were a deliverable, along with the code
they tested. The development process called for extensive unit, functional,
integration, system, inter-operability, regression, and performance testing.
Interoperability testing verifies that a given product will work together with
other products, including products from other vendors. Regression testing reruns
all tests after a change to verify that the change did not break any existing
functionality. Performance testing is one of many kinds of testing for quality
attributes.
The relative ordering of tests in the Avaya project is reflected in Fig. 9.6.
Otherwise, the figure is a simplified variant of the Avaya process. The figure has
phases, whereas the Avaya process staggered coding and testing across
time-boxed sprints.
Decision nodes have one incoming and two or more outgoing edges.
Basic nodes represent assignments and procedure calls. Decision
nodes result from Boolean expressions, such as those in conditional and
while statements. The two outgoing edges from a decision node are
called branches and are labeled T for true and F for false. If the Boolean
expression in the decision node evaluates to true, control flows through
the T branch. Otherwise, control flows through the F branch.
□
Figure 9.7 Control-flow graph for the code fragment in Fig. 9.1.
The flow graph for a program can be constructed by applying rules like the
ones in Fig. 9.8. The rules are for a simple language that supports assignments,
conditionals, while loops, and sequences of statements. Variables E and S
represent expressions and statements, respectively. The rules can be applied
recursively to construct the flow graph for a statement. It is left to the reader to
extend the rules to handle conditionals with both then and else parts.
Figure 9.8 Rules for constructing a control-flow graph.
(a) I := E
(b) if (E)S
With test 367, control goes through the body of the while loop
exactly once before exiting. Variable year is initialized to 1980, a leap
year, so this test traces the simple path
The singleton test set {732} happens to cover all nodes in Fig. 9.7.
Multiple tests are usually needed to achieve the desired level of node
coverage. □
These tests achieve 100 percent node coverage, but they do not
achieve 100 percent branch coverage because they do not cover the F
branch from D3 to D1. (In fact, test 732 alone covers all branches
covered by the other tests.)
This F branch out of D3 is covered only when the code is in an
infinite loop. Here’s why. For the branch (D3, D1) to be taken, the
following must hold:
Together, these observations imply that, when the branch (D3, D1)
is taken, days must have value 366 and year must represent a leap year.
With these values for days and year, the program loops forever.
Thus, a test suite designed to achieve 100 percent branch coverage
would uncover the leap-year bug.
The smallest test input that triggers the infinite loop is the value 366
for days – the corresponding date is December 31, 1980. □
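The code of Fig. 9.1 is not reproduced here, but the fault has the widely published shape sketched below (the original driver is in C; this Java rendering adds an iteration cap so that the non-terminating case can be observed safely).

```java
// Sketch of the leap-year fault discussed in this section. The loop structure
// follows the widely published clock-driver bug; the iteration cap is an
// addition so the infinite loop can be detected instead of hanging.
class LeapYearLoop {
    static boolean isLeapYear(int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    // Returns the computed year, or -1 if the loop fails to terminate
    // within the cap (i.e., the infinite-loop fault is triggered).
    static int yearFor(int days) {
        int year = 1980;                  // ORIGINYEAR
        int cap = 10_000;
        while (days > 365) {
            if (--cap == 0) return -1;    // would loop forever without the cap
            if (isLeapYear(year)) {
                if (days > 366) {
                    days -= 366;
                    year += 1;
                }
                // Fault: when days == 366 in a leap year, neither days nor
                // year changes, so the loop condition never becomes false.
            } else {
                days -= 365;
                year += 1;
            }
        }
        return year;
    }

    public static void main(String[] args) {
        System.out.println(yearFor(365)); // terminates: prints 1980
        System.out.println(yearFor(366)); // prints -1: the branch (D3, D1) case
    }
}
```

Tests 365 and 367 both terminate, which is why node coverage alone misses the bug; only a test that forces the F branch out of the inner decision with days equal to 366 exposes it.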
Test inputs are drawn from a set called the input domain. The input domain
can be infinite. If not infinite, it is typically so large that exhaustive testing of all
possible inputs is impossible. Test selection is therefore based on some criterion
for sampling the input domain.
The selection criteria in this section are closely, but not exclusively,
associated with black-box testing, which treats the source code as if it were
hidden. Test selection is based on the software’s specification.
9.4.1 Equivalence-Class Coverage
Equivalence partitioning is a heuristic technique for partitioning the input
domain into subdomains with inputs that are equivalent for testing purposes. The
subdomains are called equivalence classes. If two test inputs are in the same
equivalence class, we expect them to provide the same information about a
program’s behavior: they either both catch a particular fault or they both miss
that fault.
A test suite provides equivalence-class coverage if the set includes a test
from each equivalence class.
There are no hard-and-fast rules for defining equivalence classes – just
guidelines. The following example sets the stage for considering some
guidelines.
Example 9.14 Consider a program that determines whether a year
between 1800 and 2100 represents a leap year. Strictly speaking, the
input domain of this program is the range 1800–2100, but let us take the
input domain to be the integers, since the program might be called with
any integer. Integers between 1800 and 2100 will be referred to as valid
inputs; all other integers will be referred to as invalid inputs.
As a first approximation, we might partition the input domain into
two equivalence classes, corresponding to valid and invalid integer
inputs. For testing, however, it is convenient to start with three
equivalence classes: integers up to 1799; 1800 through 2100; and 2101
and higher.
These equivalence classes can be refined, however, since some
years are leap years and some years are not. The specification of leap
years is as follows:
Every year that is exactly divisible by four is a leap year, except for
years that are exactly divisible by 100, but these centurial years are leap
years if they are exactly divisible by 400.11
This specification distinguishes between years divisible by 4, 100,
400, and all other years. These distinctions motivate the following
equivalence classes (see Fig. 9.9):
Integers between 1800 and 2100 that are not divisible by 4.
Integers between 1800 and 2100 that are divisible by 4, but not by
100.
The integers 1800, 1900, and 2100, which are divisible by 100, but
not by 400.
The integer 2000, which is divisible by 400.
Figure 9.9 Equivalence classes for a leap-year program. The two shaded
regions are for invalid test inputs.
If the specification singles out one or more test inputs for similar
treatment, then put all inputs that get the “same” treatment into an
equivalence class.
Example 9.15 In Fig. 9.9, consider the equivalence classes for valid and
invalid inputs. The valid inputs are the years between 1800 and 2100.
The boundaries of this equivalence class are 1800 and 2100. There are
two classes for invalid inputs: the smaller class of years less than 1800
and the bigger class of years greater than 2100. The upper boundary for
the smaller class is 1799, but there is no lower boundary, since there is
an infinite number of integers smaller than 1800. Similarly, the lower
boundary for the bigger class is 2101 and there is no upper boundary.
For the equivalence class of years between 1800 and 2100 that are
not multiples of 4, the lower boundary is 1801 and the upper boundary is
2099. □
A test suite provides boundary value coverage if it includes the upper and
lower boundaries of each of the equivalence classes. For an equivalence class
with one element, the lower and upper boundaries are the same. In Fig. 9.9, the
year 2000 is in a class by itself.
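The equivalence classes and their boundaries can be exercised with a small test suite. The sketch below is illustrative: the function name is_leap_year and the convention of raising ValueError for invalid inputs are assumptions, not from the text.

```python
def is_leap_year(year: int) -> bool:
    """Leap-year test for the valid range 1800-2100.
    The name and the ValueError behavior are assumptions for illustration."""
    if not 1800 <= year <= 2100:
        raise ValueError(f"year out of range: {year}")
    # Divisible by 4, except centurial years, which must be divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# Boundary-value tests: lower and upper boundary of each equivalence class.
assert is_leap_year(1804) and is_leap_year(2096)          # divisible by 4, not by 100
assert not is_leap_year(1801) and not is_leap_year(2099)  # not divisible by 4
assert not is_leap_year(1800) and not is_leap_year(2100)  # centurial, not by 400
assert is_leap_year(2000)                                 # divisible by 400
for invalid in (1799, 2101):                              # boundaries of invalid classes
    try:
        is_leap_year(invalid)
        assert False, "expected ValueError"
    except ValueError:
        pass
```

Note that the singleton class {2000} has identical lower and upper boundaries, so one test covers both.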
9.5 Code Coverage II: MC/DC
Branch coverage, also known as decision coverage, is adequate for the vast
majority of applications. MC/DC, short for Modified Condition/Decision
Coverage, is a stronger form of white-box testing that is required for some
critical applications. MC/DC applies to decisions involving complex Boolean
expressions containing operators such as & (logical and) and | (logical or). For
example, consider the decision a | b, whose truth table is:

Test  a  b  a | b
 1    T  T    T
 2    T  F    T
 3    F  T    T
 4    F  F    F

Each row of this table represents a test. The columns represent the values of
the conditions a and b and the decision a | b. In test 2, a is T and b is F, so a | b
is T.
Example 9.16 Tests 2 and 3 provide condition coverage, but not
decision coverage. Condition coverage follows from the observation that
in tests 2 and 3, a is T and F, respectively; b is F and T, respectively.
But, the two tests do not provide decision coverage, since a | b is T in
both tests.
Tests 2 and 4 provide decision coverage, but not condition
coverage. While the value of a | b flips from T to F in these tests, b is not
covered, since b is F in tests 2 and 4. □
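The distinction in Example 9.16 can be checked mechanically. In the sketch below (an illustration, not from the text), each test is a tuple of condition values:

```python
def decision_covered(tests, decision):
    """True if the decision takes both outcomes, T and F, across the tests."""
    return {decision(*t) for t in tests} == {True, False}

def conditions_covered(tests, n):
    """True if each of the n conditions takes both T and F across the tests."""
    return all({t[i] for t in tests} == {True, False} for i in range(n))

a_or_b = lambda a, b: a or b
t2, t3, t4 = (True, False), (False, True), (False, False)

# Tests 2 and 3: condition coverage, but not decision coverage.
assert conditions_covered([t2, t3], 2) and not decision_covered([t2, t3], a_or_b)
# Tests 2 and 4: decision coverage, but not condition coverage (b is always F).
assert decision_covered([t2, t4], a_or_b) and not conditions_covered([t2, t4], 2)
```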
9.5.2 MC/DC Pairs of Tests
MC/DC is a stronger criterion than either condition or decision coverage.
Modified Condition/Decision Coverage (MC/DC) requires each condition to
independently affect the outcome of the decision. Independently means that, for
each condition x, there is a pair of tests such that all three of the following hold:
1. From one test to the next, the outcome (truth value) of the decision flips
from T to F, or vice versa.
2. From one test to the next, the value of the condition x flips.
3. The values of the remaining conditions stay the same across the tests.
Example 9.18 The following three tests provide MC/DC coverage for
the decision a | b:
Tests 2 and 4 are an MC/DC pair for a: the outcome changes from T
to F, the value of a flips, and the value of b stays the same. Similarly,
tests 3 and 4 are an MC/DC pair for b. Thus, we have verified that the
test set provides MC/DC coverage for the decision. Note that test 4 appears in
both MC/DC pairs, so it is reused. Therefore, with n = 2 conditions we
have 3 < 2n tests. □
Pairs Tables
A pairs table succinctly identifies the MC/DC pairs for the conditions in a
decision.
The pairs tables for a | b and a & b appear in Table 9.1. A pairs table is an
extension of a truth table for the decision. A truth table has a column for each
condition and a column for the decision. It has a row for each combination of
values for the conditions. A pairs table adds a column for each condition x. If (i,
j) is an MC/DC pair for x, then the added column for x has j in row i and i in row
j.
Table 9.1 Pairs tables for the decisions a | b (on the left) and a & b (on the right).
Example 9.19 The pairs table for the decision (a &b) | c appears in
Table 9.2. The columns in the table are in three sections, separated by
vertical lines. From the left, the first section has a column for each
condition. The second section is for the decision. The third section has
added columns to keep track of the MC/DC pairs for each condition.
There is only one MC/DC pair for a: it is (2, 6). In the added
column for a, the pairs table therefore has 6 in row 2 and 2 in row 6.
None of the other pairs of tests qualifies as an MC/DC pair for a, since
the outcome does not flip: the outcome of the decision remains T for the
pairs (1, 5) and (3, 7), and it remains F for the pair (4, 8).
The only pair for b is (2, 4). There are three pairs for c: they are (3,
4), (5, 6), and (7, 8). □
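The pairs in Example 9.19 can be computed by brute force over the truth table. In this sketch (an illustration; rows are numbered 1 to 2^n with T listed before F, matching the text's tables), a pair of rows qualifies for condition x when exactly one condition differs between the rows, that condition is x, and the outcome of the decision flips:

```python
from itertools import product

def mcdc_pairs(decision, n):
    """Map each condition index to its MC/DC pairs (1-based row numbers)."""
    rows = list(product([True, False], repeat=n))  # row 1 = all T, as in the text
    pairs = {x: [] for x in range(n)}
    for i, ri in enumerate(rows):
        for j in range(i + 1, len(rows)):
            rj = rows[j]
            differing = [x for x in range(n) if ri[x] != rj[x]]
            # MC/DC pair: exactly one condition differs and the outcome flips.
            if len(differing) == 1 and decision(*ri) != decision(*rj):
                pairs[differing[0]].append((i + 1, j + 1))
    return pairs

# Decision (a & b) | c from Example 9.19: one pair for a, one for b, three for c.
print(mcdc_pairs(lambda a, b, c: (a and b) or c, 3))
# → {0: [(2, 6)], 1: [(2, 4)], 2: [(3, 4), (5, 6), (7, 8)]}
```

The output reproduces Table 9.2: pair (2, 6) for a, pair (2, 4) for b, and pairs (3, 4), (5, 6), and (7, 8) for c.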
Selecting MC/DC Tests
The MC/DC tests for a decision can be deduced from its pairs table. The fewer
the tests, the better.
In the following heuristic approach to test selection, “pair” is short for
“MC/DC pair.”
The only pair for a is (2, 6), so tests 2 and 6 must be selected. The
only pair for b is (2, 4), which adds test 4 to the selection.
Condition c has three pairs. Either (3, 4) or (5, 6) would add just
one more test, since 4 and 6 have already been selected. The pair (7, 8)
would add two tests, since neither 7 nor 8 has been selected. Choosing
the pair (3, 4) for c, we get four tests: 2, 3, 4, and 6. □
Empirical studies of software failures show that most failures are triggered
by one or two factors, progressively fewer by three, four, or more factors,
and that the maximum interaction degree is small.
Pairwise Interactions
Pairwise testing addresses two-way interactions. The idea is to test all
combinations of values for each pair of factors. For the system in Example 9.21,
the pairs of factors are
Some conventions will be helpful in organizing sets of tests. Let the letters
A, B, C, . . . represent factors; for example, A represents browser, B represents
platform, and C represents database. Let the integers 0, 1, 2, . . . represent factor
values; for example, for factor B (platform), 0 represents Linux, 1 represents
Windows, and 2 represents Mac OS. Let two-letter combinations AB, AC, BC, . .
. represent the pairs of factors (A, B), (A, C), (B, C), . . ., respectively.
Consider tests involving two factors A and B, each of which can take on the
two values 0 and 1. With two factors and two values, there are four possible
combinations of values for the two factors. A test consists of a specific
combination of values. The following table represents an exhaustive set of tests
involving the two factors:

Test  A  B
 1    0  0
 2    0  1
 3    1  0
 4    1  1

In such tables, columns represent factors, rows represent tests, and table
entries represent values for the factors.
A set of tests is a t-way covering array if the tests include all possible
combinations for each subset of t factors. The next two examples illustrate two-
way covering arrays.
Table 9.3 Tables for three factors, each of which can have two possible
values. See Example 9.22.
(a) All combinations
Table 9.4 All combinations for pairs AB, AC, and BC, and a covering
array. See Example 9.23.
Multi-Way Covering Arrays
The discussion of two-way interactions generalizes directly to the interaction of
three or more factors. More factors need to be considered since two-way testing
finds between 50 percent and 90 percent of faults, depending on the application.
For critical applications, 90 percent is not good enough. Three-way testing raises
the lower bound from 50 percent to over 85 percent.
The benefits of combinatorial testing become more dramatic as the size of
the testing problem increases. The number of possible combinations grows
exponentially with the number of factors. By contrast, for fixed t, the size of a t-
way covering array grows logarithmically with the number of factors. For
example, there are 2^10 = 1024 combinations of 10 factors, with each factor
having two values. There is a three-way covering array, however, that has only
13 tests; see Table 9.5.
Algorithms and tools are available for finding covering arrays. The general
problem of finding covering arrays is believed to be a hard problem. A naive
heuristic approach is to build up a covering array by adding columns for the
factors, one at a time. Entries in the new column can be filled in by extending an
existing test, or by adding a row for a new test.
start with an empty array;
for each factor F:
add a column for F;
mark F;
for each three-way interaction XYF, where X and Y are marked:
for each combination in the combinations table for XYF:
if possible:
fill in a blank in an existing row to cover the combination.
else:
add a row with entries in the columns for X, Y, and F;
comment: leave all other entries in the row blank
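The heuristic above fills in blanks column by column. A different but related greedy strategy, sketched below as an illustration (it is not the book's algorithm), repeatedly picks the candidate test that covers the most still-uncovered t-way combinations. Enumerating all candidate rows is only practical for small configurations.

```python
from itertools import combinations, product

def covering_array(levels, t=2):
    """Greedy t-way covering array; levels[i] is the number of values of factor i.
    Brute-force over all candidate rows, so suitable only for small problems."""
    n = len(levels)

    def combos_of(row):
        # The t-way (factor-subset, value-tuple) combinations a test row covers.
        return {(s, tuple(row[f] for f in s)) for s in combinations(range(n), t)}

    # Every t-way combination that must be covered.
    uncovered = set()
    for s in combinations(range(n), t):
        for vals in product(*(range(levels[f]) for f in s)):
            uncovered.add((s, vals))

    candidates = list(product(*(range(k) for k in levels)))
    tests = []
    while uncovered:
        # Pick the row that covers the most uncovered combinations.
        best = max(candidates, key=lambda row: len(combos_of(row) & uncovered))
        tests.append(best)
        uncovered -= combos_of(best)
    return tests

# Three binary factors: pairwise coverage with far fewer than all 8 combinations.
print(len(covering_array([2, 2, 2], t=2)))  # → 4
```

For three binary factors this greedy happens to find an optimal pairwise array of 4 tests, half of the exhaustive 8.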
9.7 Conclusion
Since exhaustive testing of software is generally impossible or impractical,
defects are detected by running the software on selected test inputs. The
recommended approach is to start testing early, while code is being written or
modified. In other words, start with (low level) units of implementation and then
test larger (higher level) pieces of software. This approach gives us multiple
levels of testing: unit, functional, integration, system, and acceptance testing,
from low to high.
With smaller pieces of software, we can use white-box testing, where tests
are selected to cover or exercise constructs in the code. Black-box testing, based
on inputs and outputs, is used for larger, more complex pieces of software.
Black-box tests are selected to be a representative sample of the set of all
possible inputs.
The idea is to select a good enough set of tests so we can be confident that
the software will behave as expected when it is used. “Good enough” is
quantified by defining two kinds of coverage criteria: code coverage for white-
box, and input-domain coverage for black-box testing.
Code coverage is the degree to which a set of tests exercises specific
programming constructs such as statements and decisions. Of the two, decision
coverage is stronger: it detects more defects. MC/DC (Modified
Condition/Decision Coverage) is an even stronger criterion that is required for aviation
and automotive software. For complex decisions with multiple conditions,
MC/DC requires each condition to independently affect the outcome of the
decision.
Input-domain coverage is the degree to which a test set is a representative
sample of all possible inputs. Equivalence partitioning is a technique that
partitions a potentially infinite input domain into a finite number of equivalence
classes, where all tests in the same equivalence class either all pass together or
all fail together. Testing of highly configurable systems must contend with a
combinatorial explosion of configurations, where a configuration is formed by
choosing one value from the column for factor A, one from the column for
factor B, and so on. Combinatorial testing avoids the explosion by considering
all combinations of just a few of the factors at a time, say t = 3, instead of all
combinations of all the factors.
Further Reading
The overview of testing in Section 9.1 is motivated by Whittaker’s [190]
“practical” tutorial.
Whittaker, Arbon, and Carollo [191] describe how Google tests software.
The classic 1979 book by Myers [140] on the art of testing has a third
2011 edition coauthored by Badgett and Sandler [141].
For more on test-driven development, see the special May–June 2007
issue of IEEE Software (Vol. 24, No. 3).
Exercise 9.2 How would you select “good” tests for test-driven development?
Exercise 9.3 What is the best match between the testing phases of the
development process in Fig. 9.10 and unit, functional, integration, system, and
acceptance testing? Explain your answer.
Here are brief descriptions of the testing phases:
Figure 9.10 The development process for the SAGE air-defense system.
b) Define a test set that maximizes coverage of the nodes in the flowgraph.
For each test, identify the nodes that the test covers.
c) Define a test set that maximizes coverage of the edges in the flowgraph.
For each test, identify the edges that the test covers.
Figure 9.11 Code for white-box testing in Exercise 9.4.
Exercise 9.5 Consider black-box testing of a stack data structure, where the
input domain consists of sequences of push and pop operations. The stack is
initially empty. The software under test produces error messages if a pop is
applied to an empty stack or if a push is applied to a stack that is full. The stack
can hold at most max elements, where max is a parameter that the tester can set.
The testing strategy is to partition the valid input sequences into three
equivalence classes, representing states of the stack: empty, partially full, and
full.
Exercise 9.6 For each of the following decisions, define a minimal set of
MC/DC tests. Briefly explain why the set is minimal. The symbol ! denotes
logical negation.
a) a | (b & c)
b) (!a) & (b | c)
Exercise 9.7 Suppose that a system can be configured by setting four binary-
valued factors, and that you want to test all two-way interactions between
factors.
a) Create no more than six tests to test all two-way interactions. Explain
why your solution works.
For metrics to be useful, they must be consistent with our intuitions about the
real world. As an example, defects can be a useful metric for product quality –
the fewer severe defects, the better. This chapter recommends the following
four-step approach to designing and using meaningful metrics:
d) Finally, analyze the data for insights to guide decisions and predictions
related to the goal.
Display data graphically, using bar charts, Gantt charts, box plots,
and histograms.
Example 10.1 In the example in Fig. 10.1, a program entity has a size
attribute that is measured in lines of code. The specific program has
750 lines, so 750 lines is a data point. □
Example 10.2 This example shows that an attribute can have more
than one metric. Again, let the entity of interest be a program and the
attribute be program size. Consider two metrics loc and ncsl, for lines
of source code with and without comments, respectively. The names
of the metrics are acronyms for “lines of code” and “non-comment
source lines.”
As a variant, suppose that the program includes some open-
source code. We may want two separate metrics for the number of
lines with and without the open-source code. □
Measure Has Two Meanings The term measure has two meanings. As a
noun, it is a synonym for metric; for example, lines of code is a measure of
size. As a verb, “to measure” is to associate a value with a metric.
The term “measurement” is also overloaded. As a process, measurement
defines metrics, so values or symbols can be assigned to attributes. As a value,
a measurement is a specific data value for a metric. For example, 500 lines
and 750 lines are measurements.
Example 10.3 The Microsoft Access team was delighted that total
downloads of their new database product had jumped from 1.1 million
to 8 million over 18 months. Then, they observed that customers were
not using the product. The team followed up with phone interviews
and found that the Net Promoter Score was “terrible.” The verbatim
customer comments were “brutal.”
The team fixed the product, addressing 3–4 design flaws and some
pain points. They also made some training videos, which turned out
to be the most popular fix. The Net Promoter Score jumped an
impressive 35 points. □
Data about product downloads is easy to collect automatically. The
number of downloads is not, however, a useful metric for customer
satisfaction, as the team discovered in the preceding example. See what
follows for more on customer satisfaction.
An IBM study set out to address the larger question of which of the many
metrics in use within the company had the greatest impact on customer
satisfaction.
Example 10.7 The IBM study was based on three years of actual data
from service centers that handled customer calls. It examined 15
different metrics for an operating systems product. The metrics related
to customer-found defects and to customer service requests; see Table
10.1.
The study examined the correlation between the 15 metrics and
customer satisfaction surveys. It found the greatest correlation
between the following two factors and satisfaction surveys:
The next two metrics were much less significant than the first
two:
Table 10.1 Which of these customer-service metrics has the most effect on
customer satisfaction? All 15 were managed and tracked.
Defective fixes during corrective maintenance are also referred to as
breakage. Breakage is a significant dissatisfier. Customers do not like it when
something stops working.
10.3 Graphical Displays of Data Sets
Since the value of a metric can be a symbol or a number, values can be of
many kinds; for example, product names, user story points, release dates, code
coverage ratios, and days to resolution for a customer problem. Each of these
metrics has a different kind of value. Both “release date” and “days to
resolution” involve schedules, but the value of the former is a date and the
value of the latter is a number. We can subtract two dates to get a number (of
days between them), but is it meaningful to add two dates? No, it is not.
This section introduces scales of measurement, which spell out the
operations that are meaningful for a set of values. We then consider simple
graphical displays for entire data sets. The simple displays are bar charts for
data points grouped by category and Gantt charts for dates and schedules. For
histograms and box plots, see Section 10.6.
10.3.1 Data Sets
A data set is a sequence of data points (values), where data points may be
repeated – a later measurement can yield a data point that has been seen
before. When we need to distinguish between the known subset and an entire
data set, the known subset is called a sample and the entire data set is called
the population. The data points in a sample are also known as observations.
(10.1)
The second data set represents contractor bids for building a system to
the same requirements. The bids are in thousands of euros, rounded to the
nearest integer:
(10.2)
Example 10.8 Data set 10.1 has nine data points, so its median is 2.1,
the fifth data point in the following linear ordering:
The mean is the average, 2.49. The mode is the most frequently
occurring data point, 2.1. □
10.3.2 Scales of Measurement
A scale of measurement consists of a set of values and a set of operations on
the values. With the Stevens system, product names, story points, release
dates, and coverage ratios are on different scales. Numbers are on an absolute
scale that is not part of the Stevens system.
Ordinal Scales Values on an ordinal scale are linearly ordered. The only
operation is comparison with respect to the ordering. These comparisons
allow ordinal values to be sorted. As an example, the priority values high,
medium, and low form an ordinal scale. Like nominal values, ordinal values
are used as categories; for example, we can group requirements by priority.
Caution. Sometimes, ordinal values are numbered; for example, a user
story may be assigned one, two, or three story points. In this case, the
numbers are simply symbols and the “distances” between them are not
defined. All we can say is that a one-point story is relatively simple, that a
three-point story is complex, and that a two-point story is somewhere in
between. Where it falls in between is not defined.
Absolute Scale
The Stevens scales that we have just discussed are in addition to the absolute
scale, where values correspond to integers. Counts are on the absolute scale;
for example, the count of the number of lines of code in a program. The term
“absolute” comes from the scale having an absolute zero value. It makes
sense to talk of zero defects in a program that has no defects. An absolute
scale may or may not support negative values; for example, it does not make
sense for a program to have a negative number of defects.
10.3.3 Bar Charts Display Data by Category
Graphical displays of data are routinely used to present and summarize data
about the status of a software project. Charts and plots readily convey
information that could otherwise get lost in a sea of numbers. With displays,
we can spot trends and formulate hypotheses that can then be tested through
data analysis. Here are a couple of typical questions about trends:
Bar charts show numeric values as bars drawn either vertically or
horizontally; see Fig. 10.3. The height or length of a bar is determined by
the value it represents.
Applications Bar charts are used to display values by category. In Fig. 10.3,
the categories are contractors and the values are their bids. The categories in a
bar chart can be on either a nominal or an ordinal scale; for example, products
are on a nominal scale, months are on an ordinal scale.
Example 10.9 For a study of reproducibility in software development,
the Simula Research Lab sought bids for developing the “same”
system. Specifically, all potential bidders were given the same
requirements. Thirty-five companies submitted bids. The median bid
was 25,940 euros.
Bar Charts for Ordinal Values Categories on an ordinal scale (for example,
calendar months) are linearly ordered. Therefore, in a bar chart, the horizontal
position of ordinal categories is significant. For example, project managers
use bar charts to show trends over time. Trends related to the rate of discovery
of new defects are of particular interest. A bar chart shows at a glance whether
this rate is going down, month after month.
Applications The main use of Gantt charts is for planning and tracking
schedules. Here are two examples:
The work breakdown structure for a project shows the order and time
intervals in which tasks must be completed if the project is to stay on
schedule. The tasks are represented by rows in a Gantt chart.
Convention When it is used by itself in this chapter, the term defect refers to
a critical or major defect.
10.4.2 Defect-Removal Efficiency
Defect-removal efficiency is based on the intuitive idea of comparing the
number of defects removed before delivery with the total number that are
detected before and after delivery. The efficiency level is then the percentage
of total defects that are removed before delivery.
Such percentages (or ratios) allow fair comparisons across
products. A larger, more complex product is likely to have many more defects
than a smaller, simpler one. The effect of size and complexity cancels out,
however, because it applies equally to both the numerator and denominator of
the ratio.
A problem with the intuitive idea is that the total number before and after
delivery cannot be measured. “After” is an infinite interval that starts with
delivery and does not end. In accumulating the total, we could wait forever for
another defect to be found and removed.
The solution to this problem is to approximate the total by putting a time
limit on the interval after product delivery. In practice, there is an initial flurry
of customer-found defects, as customers exercise the product in ways that the
developers may not have fully tested. This flurry dies down in the first few
months after delivery.
The 90-day interval in Fig. 10.5 strikes a balance between (a) getting
early feedback about the product and (b) waiting long enough for customer
trouble reports to die down.
Figure 10.5 Defect-removal efficiency is the percentage of total defects
removed during development, where total defects is the total number
removed during development and detected within 90 days after delivery.
DRE = 100 × (defects removed during development) / (defects removed
during development + defects found within 90 days after delivery)   (10.3)
Table 10.2 Defect-removal efficiency (DRE) data for reviews, static analysis,
and testing. Source: Capers Jones.
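Under the 90-day convention of Fig. 10.5, the metric reduces to a simple ratio. The function name and interface in this sketch are assumptions for illustration:

```python
def defect_removal_efficiency(removed_in_dev: int, found_within_90_days: int) -> float:
    """DRE as a percentage: defects removed before delivery over total defects,
    where the total counts defects removed during development plus those
    customers find within 90 days after delivery."""
    total = removed_in_dev + found_within_90_days
    return 100.0 * removed_in_dev / total

# 450 defects removed in development, 50 found in the first 90 days after delivery.
print(defect_removal_efficiency(450, 50))  # → 90.0
```

Because size and complexity affect numerator and denominator alike, the percentage is comparable across products of different sizes.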
10.4.3 Customer-Found Defects (CFDs)
When two customers find the same defect, the product has one fault and two
failures. One fault, because there is one defect/fault for developers to fix. Two
failures, because two customer installations of the product were affected by
failures – presumably, the customers found the defect because they had a
failure of some sort.
Both defects and affected installations are carefully tracked by major
companies. A defect is an inherent property of a product: it cannot be
removed without changing the product. Defect counts are therefore metrics for
product quality. Counts of affected installations are metrics for ops quality:
they reflect the customer experience with operating the product. This section
deals with metrics for defects. Section 10.5 is a case study of an ops quality
metric derived from counts of affected systems.
Example 10.12 Typically, only the first report of a fault is counted as
a customer-found defect. Subsequent reports about the same defect are
counted separately. The following IBM metrics distinguish between
the first and subsequent reports of a defect:
Numbers that were dubbed “genuine” and reported for the first time
2. It is the first report of this product defect. Even if there are multiple
reports, they are about the same defect in the product. A fix to the defect
will handle all of the trouble reports. It therefore makes sense to classify
only the first report as a CFD. (Meanwhile, the number of reports about a
defect is a measure of operations quality, since it reflects the customer
experience with the product.)
The ratios cluster around the median, 2.1 defects per 100 installs.
In other words, the function
Figure 10.7 A partial hierarchy that refines the top-level goal of improving
customer satisfaction with current and future products. The “leaf” goals can
be refined further to get specific measurable goals. The hierarchy includes
some customer-support subgoals that must be met for customer satisfaction
to improve.
Improve the quality of future products so they will have better ops
quality after they are delivered and operated.
The subgoals for improving future product quality are to (a) identify and
(b) make process improvements. Development practices are the main lever for
improving the quality of future products, including products that are still in
development. In short, better processes today will develop better products for
tomorrow – better products that are expected to fail less often. If they fail less
often, they will have better ops quality. The goal hierarchy therefore shows:
process improvement contributes to better
product quality, which contributes to
ops quality improvement
Installed Base The size of the installed base is the total number of installs of
a product. Based on empirical evidence, affected installs grow linearly with
total installs; see Fig. 10.8 for supporting data. CQM therefore takes the ratio
of affected to total installs. This ratio represents the likelihood (probability)
that an install will be affected.
Figure 10.8 Data showing the growth of affected installs (vertical axis)
with total installs (horizontal axis). An install is affected if it reports a
customer-found defect. The data points are for a sequence of releases.
Reporting Interval CQM is derived from a count of installs that are affected
within a fixed time period after their date of installation. That time period is
called the reporting interval. The length of the reporting interval is a
parameter, say n-months. Typical values of n are 1, 3, and 6 months.
For CQM, the installation date must be within the product maturity
period for the install to be included in the counts of affected and total installs;
see Fig. 10.9. The dashed vertical line represents the end of the maturity
period. Installation dates past the maturity period are not included in the
counts. Reporting intervals are shown as solid lines. As long as a reporting
interval begins within the maturity period, it can extend past the maturity
period. All reporting intervals have the same length: n months.
Figure 10.9 To be considered for the n-month CQM ops-quality metric for
a product, an installation date must be within the product maturity period,
which is the first m months after product release. The thick lines indicate
the reporting interval, which is the first n months after the installation date.
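Putting the maturity period and the reporting interval together, an n-month CQM ratio can be sketched as follows. The data layout and function name are assumptions for illustration; dates are expressed as months since product release:

```python
def cqm(installs, n_months, maturity_months):
    """Fraction of eligible installs affected within n months of installation.
    installs: list of (install_month, month_of_first_CFD or None), both measured
    in months since product release (hypothetical layout for illustration)."""
    # Only installs whose installation date falls in the maturity period count.
    eligible = [(d, c) for d, c in installs if d <= maturity_months]
    # An install is affected if its first CFD falls in its reporting interval.
    affected = sum(1 for d, c in eligible
                   if c is not None and c - d <= n_months)
    return affected / len(eligible)

# Three eligible installs; one reports a CFD within its 3-month interval.
installs = [(0, 1), (1, None), (2, 10), (8, 8.5)]  # last install is past maturity
print(round(cqm(installs, n_months=3, maturity_months=6), 2))  # → 0.33
```

Note that the install at month 8 is excluded entirely, while the install at month 2 is eligible but unaffected: its first CFD arrives after its reporting interval ends.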
Both are useful summary statistics. The difference between the median 8
and the mean 20 is a reflection of the dispersion of the data points.
In practice, the median can be a more realistic statistic about a “typical”
data point because the median is not skewed by a few unusually large or
unusually small values. The mean averages values, so it is affected by extreme
data points. When a billionaire walks into a room full of regular people, the
median net worth of the people in the room shifts by at most one position, but
the mean net worth goes through the roof.
Definition of Median
The following definition generalizes the earlier view of a median as the
middle data point. In particular, the median is a value; it does not have to be a
member of the data set.
A median of a data set is any value such that the values in the data set
satisfy both of the following conditions:
At least half of the data points are less than or equal to the value.
At least half of the data points are greater than or equal to the value.
For a data set with an even number of data points, we follow the
convention of picking the median to be the average of the middle two values.
With the data set
The median, 2.1, partitions this data set into two halves, each
with 4 data points. The lower quartile is the median, 1.8, of
Intuitively, quartiles are values that mark the boundaries of the four
quarters of a data set. Quartiles will be used in the following in box plots,
which show the range of values between quartiles. As we shall see, box plots
visually summarize the dispersion of data points. The following definitions
are convenient for working with box plots.
A data set is characterized by the five quartiles, Q0–Q4:
Q0 = minimum, Q1 = lower quartile, Q2 = median,
Q3 = upper quartile, Q4 = maximum   (10.4)
The interquartile range (IQR) is the length of the interval between the
lower and upper quartiles:
IQR = Q3 − Q1   (10.5)
10.6.2 Box Plots Summarize Data by Quartile
Box plots or boxplots, also known as box and whiskers plots, summarize the
dispersion of data points by showing the intervals between quartiles. Boxplots
can be drawn horizontally or vertically. In horizontal boxplots, the minimum
value is at the left; see Fig. 10.11. In vertical boxplots, the minimum value is
at the bottom.
Figure 10.11 Two boxplots for the same data set. (a) The whiskers in the
upper boxplot extend all the way to the minimum value at the left end and
to the maximum value at the right end. (b) The lower boxplot shows
outliers as dots.
The middle half of the data set is identified by a box in the middle of a
boxplot. Specifically, the box identifies the interquartile range between the
lower and upper quartiles. The median is drawn as a line across the box.
The lines that extend from the box toward the left and right ends of a
boxplot are called whiskers. Whiskers can be drawn all the way to the
minimum and maximum values, as in Fig. 10.11(a), or can stop short to show
outliers, as in Fig. 10.11(b). The latter boxplot, in Fig. 10.11(b), has two
outliers at the high end, and no outliers at the low end. See Example 10.17 for
how to draw whiskers.
Example 10.16 The boxplots in Fig. 10.11 are for the data set
The lower half of this data set is 12, 12, 16, 19, 20, so the lower
quartile is 16. The upper half is 22, 24, 30, 56, 80, so the upper
quartile is 30.
Both boxplots in Fig. 10.11 extend from the minimum 12 to the
maximum 80. In both, the box lies between the lower quartile 16 and
the upper quartile 30. The median, 21, is shown as a vertical line
across the box. □
Outliers
Outliers are defined in terms of the interquartile range, IQR, between the
upper and lower quartiles. A data point is an outlier if it is more than 1.5 ×
IQR past the lower or the upper quartile.
Example 10.17 For the data set in Example 10.16, the lower quartile
is 16 and the upper quartile is 30, so
The data set has two outliers: 56 and 80. They are both greater
than 51 = 30 + 21. There are no outliers at the low end of this data set.
□
In a boxplot showing outliers, the whiskers extend as follows:
The whisker from the lower quartile extends to the minimum, Q0, or to
Q1 − 1.5 × IQR, whichever is greater.
The whisker from the upper quartile extends to the maximum, Q4, or
to Q3 + 1.5 × IQR, whichever is lesser.
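These rules can be checked against the data set of Examples 10.16 and 10.17. The sketch below uses the median-of-halves convention for quartiles (the convention used in those examples); the function names are illustrative:

```python
def median(values):
    """Median of a sorted list; average of the middle two for even length."""
    m = len(values) // 2
    return values[m] if len(values) % 2 else (values[m - 1] + values[m]) / 2

def quartiles(data):
    """Five quartiles Q0..Q4; Q1 and Q3 are medians of the lower/upper halves."""
    xs = sorted(data)
    n = len(xs)
    return (xs[0], median(xs[:n // 2]), median(xs), median(xs[(n + 1) // 2:]), xs[-1])

def outliers(data):
    """Data points more than 1.5 x IQR past the lower or upper quartile."""
    q0, q1, q2, q3, q4 = quartiles(data)
    iqr = q3 - q1
    return sorted(x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr)

data = [12, 12, 16, 19, 20, 22, 24, 30, 56, 80]  # data set of Example 10.16
print(quartiles(data))  # → (12, 16, 21.0, 30, 80)
print(outliers(data))   # → [56, 80]
```

With IQR = 30 − 16 = 14, the upper fence is 30 + 21 = 51, so 56 and 80 are outliers and the upper whisker stops at 30, the largest non-outlier... rather, at the largest data point not past the fence.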
10.6.3 Histograms of Data Spread
A histogram is a graphical display of the number of data points that fall within
a given interval; that is, within a given range of values. Informally, a
histogram approximates the dispersion or spread of data points in a data set.
Two examples of histograms appear in Fig. 10.12.
Figure 10.12 Two histograms for the contractor bids data set 10.2.
Example 10.18 Compare the histograms in Fig. 10.12 with the bar
chart in Fig. 10.3. They are all based on the same data about
contractor bids. Entity contractor has two metrics: name and bid. In
the bar chart, the horizontal axis has a category for each contractor
name. The height of each bar represents the value of that contractor’s
bid. Thus, there is a bar for each data point (bid).
In the histograms, the horizontal axis represents the values of the
bids. The vertical bars count the number of bids that fall within a
bin/interval. □
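Counting the data points that fall within each interval can be sketched as follows. The bid values here are hypothetical, since data set 10.2 is not reproduced in this excerpt; the function name is mine.

```python
def histogram(data, bin_width, start=0):
    """Count data points per interval [start + i*w, start + (i+1)*w)."""
    counts = {}
    for x in data:
        i = int((x - start) // bin_width)  # index of the bin containing x
        counts[i] = counts.get(i, 0) + 1
    return counts

# Hypothetical bids; the book's data set 10.2 ranges from 3 to 70.
bids = [3, 7, 14, 15, 22, 28, 31, 40, 55, 70]
histogram(bids, bin_width=12)  # {0: 2, 1: 3, 2: 2, 3: 1, 4: 1, 5: 1}
```

Shifting `start` or changing `bin_width` reproduces the effect described in Example 10.19: the same data can yield differently shaped histograms.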
Example 10.19 Both of the histograms in Fig. 10.12 are for the
contractor bids data set 10.2. The histogram on the left has 6 bins; the
bin width is 12. Here is how we get the bin width. The 6 bins must
cover data points ranging from 3 to 70, so the width must be at least
(70 − 3)/6 ≈ 11.2; rounding up gives a bin width of 12.
The histograms in Fig. 10.12(a– b) are for the same data set, but
their visual appearance is different. Changing the number of bins can
change the shape of a histogram. Shifting the bins can also perturb the
shape.
Consider the effect of shifting the bins in the left histogram by 1.
The contents of the first bin change as follows:
For any given data set, it is worth experimenting with the bin
parameters to get a feel for the distribution of the data points. □
How Many Bins?
The square-root rule is a simple rule for choosing the number of bins: use the
square root of the size of the data set. If N is the number of data points, then
number of bins = ⌈√N⌉ (10.6)
In words, the number of bins is the smallest integer that is greater than or
equal to the square root of the number of data points. This rule yields 6 bins
for the 35 contractor bids; see the histogram in Fig. 10.12(a).
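A minimal sketch of the square-root rule in Python; the function name is mine.

```python
import math

def bin_count(n_points):
    """Square-root rule: the smallest integer >= sqrt(N)."""
    return math.ceil(math.sqrt(n_points))

bin_count(35)  # 6 bins for the 35 contractor bids: sqrt(35) ~ 5.92, rounded up
```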
10.7 Data Dispersion: Statistics
This section introduces variance, a key summary statistic for the dispersion of
data points in a data set. The standard deviation of a data set is the positive
square root of its variance. The section then briefly introduces distributions. A
discrete probability distribution describes the relative frequency with which
observed data points will have a given sample value. A continuous probability
distribution describes the relative likelihood (density) that a randomly chosen
data point will take a value near a given sample value.
10.7.1 Variance from the Mean
Variance is a measure of the spread or deviation of data points from the mean.
Example 10.20 The data points in Fig. 10.13 represent the ops quality
of a sequence of releases of a product. The releases are identified by
the numbers along the horizontal axis. The vertical axis represents the
percentage of installs that were affected by severe defects.
The mean 2.49 of the data set is one summary statistic for the
quality of the releases; see the horizontal line. Variance quantifies how
clustered or how dispersed the data points are from the mean. The
vertical lines show the deviations or distances of the data points from
the mean. □
Figure 10.13 The horizontal line in the middle represents the mean of the
data set. The dashed vertical lines show the deviations of the data points
from the mean.
With small data sets, summary statistics can be swayed by a few extreme
data points.
Example 10.22 The extreme data point 5.0 has an outsize influence
on the variance of data set 10.1, repeated here for convenience:
Let us round the mean 2.49 to 2.5, to keep the numbers simple.
The variance is the average of the squares of the deviations from the
mean:
variance = (1/N) × Σi (xi − x̄)² (10.7)
(10.8)
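The definition of variance as the average of the squared deviations can be sketched directly. The release-quality values below are hypothetical (the book's data set 10.1 is not reproduced in this excerpt); they are chosen so that the mean is 2.5 and one extreme point, 5.0, is included.

```python
def variance(data):
    """Average of the squared deviations from the mean (population variance)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)

# Hypothetical release-quality percentages, not the book's data set 10.1.
data = [1.5, 2.0, 2.5, 3.0, 5.0, 1.0]   # mean = 2.5
v = variance(data)                       # 10/6 ~ 1.67
# The extreme point 5.0 contributes (5.0 - 2.5)**2 = 6.25 of the total
# squared deviation of 10.0, i.e. over 60 percent of it.
```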
Example 10.23 Severity levels are discrete. Consider a data set with
100 defects (data points): 5 critical, 10 major, 60 minor, and 25
cosmetic. From these sample values, we can estimate the probability
with which a randomly selected defect will have a given severity. The
following table shows the frequency of sample values in the data set
and the probability estimates:

Severity      critical   major   minor   cosmetic
Frequency          5        10      60       25
Probability     0.05      0.10    0.60     0.25

(10.9)
For integer sample values, the interval for Φ(x) is (−∞, x).
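The probability estimates of Example 10.23 are frequency counts normalized by the data-set size; the cumulative distribution Φ then sums the mass ϕ over sample values up to x. A sketch, where the ordering of the severity levels used for the cumulative sum is my assumption:

```python
# Frequencies from Example 10.23: 100 defects in all.
freq = {"critical": 5, "major": 10, "minor": 60, "cosmetic": 25}
total = sum(freq.values())

# Probability mass phi: relative frequency of each sample value.
phi = {sev: count / total for sev, count in freq.items()}

# Cumulative distribution Phi over an assumed ordering of severities.
order = ["critical", "major", "minor", "cosmetic"]
Phi, running = {}, 0.0
for sev in order:
    running += phi[sev]
    Phi[sev] = running
# phi["minor"] = 0.6; Phi["minor"] = 0.75; Phi["cosmetic"] = 1.0
```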
10.7.3 Continuous Distributions
Continuous distributions are useful for data analysis even for data sets of
discrete sample values. The counterpart of a probability mass function for
discrete sample values is a probability density function, which maps a real
sample value x to a relative likelihood that a random data point will have that
sample value. Again, we use ϕ to denote a probability density function.
Examples of probability density functions appear in Fig. 10.14.
Figure 10.14 (a) Normal distributions: three with mean 0 and variance 0.2,
1.0, and 5.0; and one with mean −2 and variance 0.5. (b) Student’s t-
Distribution.
(a) Normal
Source: public domain image by Wikimedia Commons.
(b) Student’s t
Source: Skbkekas, CC3.0 BY license.13
Note that for any specific real number x, the probability that a random
data point exactly equals x is 0. Why? Consider
a uniform distribution, where all real sample values between 0 and 1 are
equally likely. There are an infinite number of reals (not floating-point
numbers!) between 0 and 1, so the probability is 0 that a random sample will
exactly equal 0.5 or 0.1415926535 or any other of the infinite number of
possible sample values.
More generally, think of approximating a continuous density function
over the reals by a discrete mass function over the integers. In the
approximation, a real number is rounded to an integer. In the discrete
approximation, the mass function applied to an integer n represents the
probability that a random data point is in the interval (n − 0.5, n+ 0.5) in the
continuous distribution. As we expand this interval, the probability of a
random data point falling in the expanded interval increases. As we shrink the
interval, the probability decreases. In the limit, as we continue to shrink, the
probability of a random data point falling within the approximation interval
goes to zero.
With continuous distributions, we work with probabilities in intervals.
The area under the curve of a probability density function is 1. In other words,
the area under the curve of a probability density function between −∞ and +∞
is 1. With the distributions in Fig. 10.14, the density function drops off sharply
as the distance from the mean increases.
10.7.4 Introduction to Normal Distributions
The four probability density curves in Fig. 10.14(a) are all for a family of
distributions called normal distributions. “Normal” is their name, as opposed
to a characterization, as in, “it is normal for the sun to rise in the east.” The
shape of the curves has led to the popular name “bell curve” for normal
distributions, although other distributions also have bell-shaped curves; for
example, see the Student’s t-distribution in Fig. 10.14(b). Normal distributions
are widely used because they fit many natural phenomena. They also readily
support mathematical analysis.
A normal distribution is characterized by two parameters: its mean µ and
its standard deviation σ (or variance σ2). A standard normal distribution has
mean µ = 0 and equal variance and standard deviation, σ2 = σ = 1. The larger
the standard deviation, the wider and flatter the curve. The tall, narrow curve
in Fig. 10.14(a) with a peak at value 0 has mean µ = 0 and variance σ2 = 0.2.
The wider curves with mean µ = 0 have variance 1.0 and 5.0. The fourth curve
has µ = −2.0 and σ2 = 0.5.
A normal distribution is symmetric around the mean. The probability
density function falls off rapidly as we move away from the mean. The
following table shows the percentage of data points that are within nσ of the
mean; that is, within the interval (µ − nσ, µ + nσ).
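The percentages in such a table can be computed from the error function: for any normal distribution, the probability of falling within nσ of the mean is erf(n/√2). A sketch using the standard library:

```python
import math

def within_n_sigma(n):
    """P(mu - n*sigma < X < mu + n*sigma) for any normal distribution."""
    return math.erf(n / math.sqrt(2))

for n in (1, 2, 3):
    print(f"within {n} sigma: {within_n_sigma(n):.2%}")
# within 1 sigma: 68.27%
# within 2 sigma: 95.45%
# within 3 sigma: 99.73%
```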
10.7.5 Introduction to Student’s t-Distributions
Normal distributions work well with large sample sizes, but not with small ones. Student’s
t-distributions were motivated by the problem of small sample sizes, perhaps
as small as 3 data points. The distributions arise when small numbers of data
points are drawn from a population with a normal distribution. The name
comes from the pseudonym “Student” used by William Sealy Gosset when he
published the distributions in 1908. The distributions are also referred to
simply as t-distributions.14
Members of the family of Student’s t-distributions are characterized by a
parameter called the degrees of freedom, commonly denoted by the Greek
letter ν (“nu”). For a sample size N, the relevant t-distribution has ν = N − 1;
see Fig. 10.14(b) for probability density functions for ν = 1, 2, 5, ∞. The
density curve for a t-distribution resembles the curve for a normal distribution
with mean 0 and standard deviation 1, except that t-distributions have more
probability mass further away from the mean. As the number of degrees of
freedom increases, the curve for a t-distribution approaches a normal
distribution with µ = 0 and σ = 1. In other words, the curve for a t-distribution
is bell shaped and symmetric. It becomes narrower and taller as the number of
degrees of freedom increases; in the limit it equals a normal distribution.
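The convergence of the t-distribution to the standard normal can be checked numerically from the density formula. This sketch uses log-gamma to avoid overflow for large ν; it is an illustration, not the book's code.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_coef = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_coef) * (1 + x * x / nu) ** (-(nu + 1) / 2)

def normal_pdf(x):
    """Standard normal density (mu = 0, sigma = 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# At the peak: the t density is lower than the normal for small nu
# (more mass in the tails), and approaches the normal as nu grows.
t_pdf(0, 1)     # ~0.3183, i.e. 1/pi
t_pdf(0, 1000)  # ~0.3988, close to normal_pdf(0) ~0.3989
```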
10.8 Confidence Intervals
Intuitively, the reliability of a prediction (or estimate) increases with the
number of data points on which the prediction is based. In other words, as the
size of a random sample grows, we can be increasingly confident that the
sample faithfully reflects the population from which the sample is drawn. We
can then use the sample to predict some property of the population, such as
the population’s mean.
Confidence intervals quantify the informal notion of confidence in a
prediction about a largely unknown population. We have a known sample, but
the rest of the population is unknown. Instead of predicting a single point
value, the idea is to predict an interval, a range of values. We want the
prediction to be right with a specified probability, say 95 percent. Here,
“right” means that the predicted interval contains the population’s property of
interest. This probability is called the confidence level. The higher the desired
confidence level, the wider the interval. Conversely, the narrower the
confidence interval, the lower the confidence level.
10.8.1 Definition of Confidence Interval
For concreteness, let the property of interest be the mean µ of the population
and let the desired confidence level be 95 percent. We refer to the population
mean µ as the true mean.
A 95 percent confidence interval is an interval
(predicted-low, predicted-high)
that, with 95 percent probability, contains the property of interest.
With the true population mean as the property of interest, the estimate is
the sample mean.
More precisely, consider a sample data set X with N data points xi. Let x̄
be the average of the N data points. This average is the sample mean
x̄ = (x1 + x2 + · · · + xN) / N (10.10)
The confidence interval is centered on the sample mean:
(x̄ − error-bound, x̄ + error-bound) (10.11)
where error-bound is such that, with a 95 percent confidence level, the
preceding interval contains the true mean µ.
The calculation of the error bound depends on four factors: the sample
size N, the sample standard deviation s, the coverage factor c, and the
desired confidence level. The error bound is
error-bound = c × s / √N (10.12)
Sample Size N The 1/√N term in Equation 10.12 adjusts for the convergence
of the sample mean x̄ on the true population mean µ for increasing values of N.
In the limit, the sample equals the population and x̄ = µ.
Consider the simple case of samples drawn from a population with a
standard normal distribution; that is, µ = 0 and σ = 1. Each random sample
will have its own sample mean x̄. For example, the sample 1.1, −0.3, 0.7 has
N = 3 and x̄ = (1.1 − 0.3 + 0.7)/3 = 0.5,
and the sample −0.5, 0, 1.2, −0.3 has N = 4 and x̄ = (−0.5 + 0 + 1.2 − 0.3)/4 = 0.1.
It can be shown that the term
√N × (x̄ − µ) / σ (10.13)
is a random variable with a standard normal distribution that has mean 0 and
standard deviation 1.
Note that x̄ − µ represents the deviation of the sample mean from the
true population mean.
Coverage Factor c For any normal distribution, with 95 percent probability,
a random data point is within 1.96σ of the mean. Since the term (10.13) has µ
= 0 and σ = 1, with a 95 percent confidence level, its absolute value is less
than or equal to c = 1.96:
|√N × (x̄ − µ) / σ| ≤ 1.96 (10.14)
Equivalently, with 95 percent confidence,
|x̄ − µ| ≤ 1.96 × σ / √N (10.15)
Table 10.3 Partial table of tα,ν values for Student’s t-distribution. Here, ν is
the degrees of freedom; CL is short for Confidence Level; and α = (1−CL)/2.
For ν = ∞, the t-distribution equals a standard normal distribution.
Table entries for a t-distribution are sometimes denoted by tα,ν, where α =
(1 − CL)/2.
Example 10.24 The data set 10.2 of N = 35 contractor bids is a
sample from an unknown population. Let us assume that the
population is normally distributed and that the confidence level is 95
percent. The calculation of the error bound is summarized by the
following:
The estimate for the true population mean is the sample mean,
27.6. The 95 percent confidence interval is
(10.16)
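The error-bound calculation of Section 10.8 can be sketched end to end. The sample below is hypothetical (the contractor-bids values of data set 10.2 are not reproduced in this excerpt), and the coverage factor c = 1.96 assumes a normal population at the 95 percent level; for small N, a t-value from Table 10.3 would replace 1.96.

```python
import math

def confidence_interval(sample, c=1.96):
    """95% CI for the true mean: sample mean +/- c * s / sqrt(N).

    c = 1.96 assumes a (near-)normal population; for small samples,
    use the t-value t_{alpha,nu} from a t-table instead.
    """
    n = len(sample)
    mean = sum(sample) / n
    # Sample standard deviation (divide by N - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    error_bound = c * s / math.sqrt(n)
    return mean - error_bound, mean + error_bound

# Hypothetical sample, not the book's contractor-bids data set 10.2.
low, high = confidence_interval([10, 12, 9, 11, 8, 10, 12, 9, 11, 8])
# roughly (9.08, 10.92) around the sample mean 10.0
```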
10.9 Simple Linear Regression
Figure 10.15 Two regression lines for a data set. The solid line is based on
the ordinary least-squares criterion. The dashed line has half the data points
above it, half below.
Simple linear regression is closely associated with a “goodness of fit”
criterion called ordinary least squares. Given a set of N two-dimensional data
points (xi, yi), simple linear regression fits a line defined by the equation
f(x) = ax + b (10.17)
The least-squares criterion picks a and b to minimize the sum of the squares
of the vertical distances between the data points and the line:
Σi (yi − f(xi))² (10.18)
The solid regression line in Fig. 10.15 satisfies the least-squares criterion.
10.9.1 The Simpler Case: Line through the Origin
Ratios like defect-removal efficiency (Section 10.4) and customer-quality
metric (Section 10.5) correspond to a special case where the regression line
must pass through the origin; see Fig. 10.15. Regression lines through the
origin are motivated by practical examples; for example, a program with 0
lines of code has 0 defects; see also Example 10.25.
We have f (0) = 0 when the regression line passes through the origin, so
Equation 10.17 reduces to
(10.19)
With the mean of the ratios as a candidate for the parameter a, we need to
know if the outliers are exceptions to be ignored, or if they provide
meaningful information. This knowledge requires insights from the real-world
application represented by the data points. If the outliers are exceptions, then
they unduly sway the mean.
10.9.2 Ordinary Least-Squares Fit
We begin with an ordinary least-squares fit through the origin. The solution
for the slope a in f(x) = ax will then be adapted for the case where f(x) = ax +
b.
The general equation for a line,
f(x) = ax + b (10.20)
has two parameters: a for the slope of the line, and b for where the line
intercepts the y-axis. The intercept is the point (0, f(0)). In this case, the
ordinary least-squares regression line passes through what we might call the
intercept point and the center of mass of the data points, (x̄, ȳ). Here x̄ is the
average of the xi values and ȳ is the average of the yi values.
To get the slope a of the regression line f(x) = ax + b, let us rewrite the
slope of a regression line through the origin:
a = Σi (xi − 0)(yi − 0) / Σi (xi − 0)²
The 0 terms in the preceding equation make explicit the fact that the line
passes through the origin, (0, 0). For a regression line that passes through the
center of mass (x̄, ȳ), we substitute x̄ and ȳ for the 0 terms. The slope a for f(x)
= ax + b through the center of mass is
a = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)² (10.22)
The intercept b follows, because the line passes through the center of mass:
b = ȳ − a × x̄ (10.23)
The fitted regression line is then
f(x) = a × (x − x̄) + ȳ (10.24)
To force a regression line through some other point (u, v) instead of the
center of mass, substitute u and v for x̄ and ȳ, respectively.
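The two fits of this section can be sketched directly from the slope formulas: the least-squares line through the origin, and the general line through the center of mass. An illustrative sketch with hypothetical data points, not the book's code:

```python
def slope_through_origin(points):
    """Least-squares slope a for f(x) = a*x (line forced through (0, 0))."""
    num = sum(x * y for x, y in points)
    den = sum(x * x for x, _ in points)
    return num / den

def fit_line(points):
    """Least-squares a and b for f(x) = a*x + b.

    The fitted line passes through the center of mass (x-bar, y-bar).
    """
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    a = (sum((x - x_bar) * (y - y_bar) for x, y in points)
         / sum((x - x_bar) ** 2 for x, _ in points))
    b = y_bar - a * x_bar   # intercept from the center-of-mass property
    return a, b

points = [(1, 2.1), (2, 3.9), (3, 6.2)]   # hypothetical data
slope_through_origin(points)               # ratio-style fit through the origin
fit_line(points)                           # general line a*x + b
```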
10.10 Conclusion
A metric quantifies an attribute (property) of an entity in the real world, such
as source code, a development activity, a delivery event, or a team. A
measurement process quantifies an attribute by associating a data value with a
metric; for example, associating the symbol “critical” as the value of attribute
severity of an entity product defect.
This chapter introduces data measurement, description, and analysis
techniques by applying them to the goal of software quality assessment and
improvement. While software quality is important in its own right, the
applications of metrics extend beyond quality to all areas of software
engineering. For example, a 2014 survey collected 145 questions about
software engineering that Microsoft engineers would like data scientists to
address. The questions were grouped into 12 categories such as customers and
requirements, development practices, and teams and collaboration. The top
five questions appear in Table 10.4.15
Table 10.4 The top five of 145 data-related questions from a survey of
Microsoft engineers. Begel and Zimmerman presented the results in 2014.
See Fenton and Bieman [68] for data description and analysis
techniques.
Exercises for Chapter 10
Exercise 10.1 In each of the following cases, what are the issues, if any, with
the use of the metric for the intended purpose?
Exercise 10.2 Classify the metrics in Table 10.1 according to the six
approaches to software quality in Section 10.2.
Exercise 10.3 For the contractor bids data set 10.2 in Section 10.6, determine
a) the mean
b) the mode(s)
c) the median
e) the variance
Exercise 10.4 Follow the instructions in Exercise 10.3 with the data set
c) Create a boxplot
Exercise 10.6 Determine the following confidence intervals for the true mean,
given the contractor bids data set 10.2:
a) 90%
b) 95%
c) 98%
Exercise 10.7 For the following data set, what is the slope a of a regression
line through the origin, where the line f (x) = ax is determined using the
following approaches:
“We’re good friends. We see each other every day. We’re all
equals. We don’t need roles and a [development] process.”
At the end of the first status review they assigned roles: product
owner, scrum master, developers. Why? Because not only were they well
behind the other teams, they were behind their own plans.
□
Proposal and Status Reports These reports serve two purposes. First,
they ask leading questions to guide project activities; for example, the
proposal asks about the customer, their needs within the context of an end-
to-end experience, and the proposed benefit to be provided by the project.
Second, the reports serve as touch points between the concepts and
project tracks.
They ask students to apply specific concepts to their projects:
If desired, the questions in the templates can be replaced by questions
for other forms of user requirements, functional requirements, and system
description, respectively. With the timeline in Fig. A.1, the reports are due in
weeks 5, 8, and 11, respectively. The reports place a hopefully weak
constraint on the concepts track.
A.2 Project Proposal
Between weeks 2 and 5 on the timeline in Fig. A.1, the teams propose a
project, preferably one with a real customer (outside the team). They elicit
requirements, set project goals, outline an implementation, and come up to
speed on the tools and platforms that they will use. Meanwhile, the lectures
cover user needs and requirements. The detailed questions in the proposal
template enable the teams to work in parallel with the lectures.
During week 5, the teams present their proposals to the whole class.
The feedback helps students tune their project goals. The goals have to be
challenging enough, yet doable.
The proposal and the status reports build up to a comprehensive final
report. All of the reports therefore have essentially the same sections. The
proposal emphasizes the Customer Needs and Project Goals sections.
Material from these sections can be reused in later reports.
Proposal Template
Note In this proposal and the reports, put information where it fits best.
Avoid repeating the same information in a later section.
Descriptive Title Focused on the User Benefits
Team Name: Team Members
1. Introduction
Opening Paragraph
Challenges
2. Customer Need
User Requirements
− Write acceptance tests for the user stories, using the “Given . . .
when . . . then . . .” template.
3. Project Goals
Measures of Success
− Who outside the team have you tested the idea on?
− Describe the real outside customer, if any.
− How will you know whether the customer got their desired
benefits?
− What are your customer-centric measures of success?
4. System Description
For this proposal, a rough draft of this section is enough.
5. Solution Approach
A brief rough draft of this section is enough for this proposal.
− What platforms, tools, libraries, and the like will you use?
− How will you test it?
− How will you evaluate the adequacy of your test strategy?
6. Project Management
Start a Change Log to track any changes to the project as described
in this Proposal. For each entry in the Log, include four things: the
date, description, motivation, and implications of the change.
Team Coordination
7. Team
Backgrounds
Roles
− What are the planned roles for the team members during this
project?
1. Introduction
Highlights
Changes
2. Customer Need
3. Project Goals
Use Cases
− Write a use case for each main user goal for a primary or secondary
customer.
− Show the title, user goal, and full basic flow for each use case.
Choose meaningful titles.
4. System Description
5. Current Status
6. Project Management
Continue to maintain the Change Log. Add any new changes to the
project, tracking the date, motivation, description, and implications of
each change.
7. Team
8. Reflection
1. Introduction
Highlights
Changes
2. Customer Need
− Briefly describe the customer’s desired overall experience.
3. Project Goals
4. System Description
System Overview
5. Current Status
6. Project Management
Continue to maintain the Change Log. Add any new changes to the
project, tracking the date, motivation, description, and implications of
each change.
7. Team
8. Reflection
1. Introduction
Opening Paragraph
Challenges
Changes
2. Customer Need
User Requirements
− Include acceptance tests for the user stories, using the “Given . . .
when . . . then . . .” template.
3. Project Goals
Use Cases
− Include a use case for each main user goal for a primary or
secondary customer.
− Show the title, user goal, and full basic flow for each use case.
Choose meaningful titles.
− For alternative flows that have been implemented, give only the
title, a one-line description, and how the alternative flow connects
with its basic flow.
Measures of Success
− How do you know whether the customer got their desired benefits?
4. System Description
System Overview
− For each element, identify the single owner in the team, even if
multiple team members contributed to the element.
5. Final Status
6. Project Management
− What were the major events during the project? Include dates.
− Anything else?
Team Coordination
7. Team
Backgrounds
− What were the backgrounds of the team members?
Roles
− What were the roles of the team members during this project?
− Did you have access to the data, services, and resources you
needed?
9. Reflection
− For the features that were not implemented, what were the issues?
Recommendations
2 Boehm [30] writes, “On my first day on the job [around 1955], my
supervisor showed me the GD ERA 1103 computer, which filled a large
room. He said, ‘Now listen. We are paying $600 an hour for this and $2
an hour for you, and I want you to act accordingly.’ ”
3 The focus of the 1968 NATO conference was on the “many current
problems in software engineering.” See [143, p. 13– 14].
11 For more on the issues that arise during testing, see the “practice
tutorial” by Whittaker [190].
12 Martin Barnes is credited with creating the Iron Triangle for a 1969
course [187]. Trilemmas have been discussed in philosophy for centuries.
2 Madden and Rone [128] describe the iterative process used for the
parallel development of the hardware, software, and simulators for the
Space Shuttle.
3 Larman and Basili [121] trace the roots of iterative and agile methods
to the “plan-do-study-act” quality improvement cycles proposed by
Walter Shewhart at Bell Labs in the 1930s.
4 The Unix practices in Section 2.1.3 are from McIlroy, Pinson, and
Tague’s foreword to a 1978 collection of papers on Unix [135].
6 Fowler [73].
18 The committee chairman’s opening remarks are from [181, p. 2]. For
the extent of system and end-to-end testing, see [181, p. 57]. The cost of
the website is from [182, p. 19].
19 The chart on the cost of fixing a defect is due to Boehm [30, 24].
26 Cusumano and Yoffie [55, p. 251] relate how the initial requirements
for Netscape 3.0 were set. They quote Bill Turpin, “The original way we
came up with the product ideas was that Marc Andreessen was sort of our
product marketing guy. He went out and met with lots of customers. He
would meet with analysts. He would see what other new companies were
doing.”
27 Iansiti and MacCormack [91] note that by the late 1990s, companies
in a wide range of industries “from computer workstations to banking”
had adopted iterative product development to deal with uncertain and
unexpected requirements changes.
29 See Boehm [29] for more on the Spiral Framework. An early version
of the framework appears in [27]. The treatment in Section 2.7 follows
[29], where Boehm emphasizes that the framework is not a process
model. He describes it as a “risk-driven process model generator.”
3 The Fast Feedback process in Fig. 3.4 is adapted from the Fast
Feedback Cycle in De Bonte and Fletcher [58].
13 Dan North [145] created the template for acceptance tests with Chris
Matts. “We started describing the acceptance criteria in terms of
scenarios, which took the ‘Given-when-then’ form.”
14 The stylized English syntax for features in Table 3.2 is adapted from
Mavin and Wilkinson [131].
3 The terms cognitive bias and anchoring are due to Tversky and
Kahneman [178].
5 From Aristotle [9, Book 3, Part 11], “the many, of whom each
individual is an ordinary person, when they meet together may very
likely be better than the few . . . for some understand one part, and some
another, and among them they understand the whole.”
7 Dalkey and Helmer [56] describe the original Delphi method. They
observe that a roundtable “induces the hasty formulation of preconceived
notions, an inclination to close one’s mind to novel ideas, a tendency to
defend a stand once taken or, alternatively and sometimes alternately, a
predisposition to be swayed by persuasively stated opinions of others.”
See Helmer [85] for a retrospective on the Delphi method.
12 Cohn [51, ch. 9] is the source for the two-step method in Section 4.4.3
for prioritization based on value, cost, and risk.
13 Kano [108] is the source for the treatment of Kano analysis in Section
4.5. The oft-cited paper by Kano et al. [109] is in Japanese.
20 Results from numerous studies, going back to the 1960s, support the
observation that there are order-of-magnitude differences in individual
and team productivity. From early studies by Sackman, Erikson, and
Grant [161], “one poor performer can consume as much time or cost as
5, 10, or 20 good ones.” McConnell [132] outlines the challenges of
defining, much less measuring software productivity.
22 Walston and Felix [186] provide IBM data from the 1970s for
estimating effort from program size. The data for the TRW effort-size
curve is from Boehm [26].
23 Boehm and Valerdi [33] note that “although Cocomo II does a good
job for the 2005 development styles projected in 1995, it doesn’t cover
several newer development styles well. This led us to develop additional
Cocomo II-related models.”
Notes for Chapter 5: Use Cases
2 The ATM use case in Example 5.6 is based on a fully worked-out use
case in Bittner and Spence [22], which Ivar Jacobson, the inventor of use
cases, called “THE book on use cases” [96].
5 Jacobson, Spence, and Kerr [98] provide six principles for use cases:
three for writing and three for using use cases to drive iterative
development.
2 Klein and Weiss [112] note: “Architecture is a part of the design of the
system; it highlights some details by abstracting away from others.”
3 The version of Conway’s law [54] in Box 6.1, is from his website
www.melconway.com/Home/Conways_Law.html.
8 The first guideline for module design is from Parnas [147]. The
remaining guidelines are based on Britton and Parnas [37, pp. 1– 2].
10 See [35] for a user guide by the original authors of UML. The preface
includes a brief history of UML. Grady Booch and James Rumbaugh
created an early draft of UML in 1994, by combining their object-
oriented design methods. They were soon joined by Ivar Jacobson, and
expanded the UML effort to incorporate his methods. UML 1.1 was
adopted as a standard by Object Management Group (OMG) in 1997.
The UML 2.0 standard was adopted in 2005.
In a 2006 interview, Jacobson observed, “UML 2.0 has become very
large, it has become heavy, it’s very hard to learn and study” [97]. UML
has strong name recognition, but actual usage lags. A 2013 survey by
Scott Ambler, an author of UML books, found that while all 162
respondents had heard of UML, “Only 13% found UML to be very
useful, and 45% indicated that UML was useful but they could get by
without it. A further 20% found it ‘more trouble than it was worth’” [6].
12 Views have been incorporated into the standards IEEE 1471 and
ISO/IEC/IEEE 42010.
2 The cross section of the Hagia Sophia is from Lübke and Semrau [127];
see https://commons.wikimedia.org/wiki/File:Hagia-Sophia-
Laengsschnitt.jpg.
1 Maranzano et al. [129] is the primary source for Section 8.1. The
authors were all at Bell Labs when they began conducting architecture
reviews in 1988. From their 700 reviews through 2005, between 29% and
49% of the design issues were categorized under “The proposed solution
doesn’t adequately solve the problem.” Between 10% and 18% came
under “The problem isn’t completely or clearly defined.”
5 Porter, Siy, Toman, and Votta found that there “was no difference
between two - and four-person inspections, but both performed better
than one-person inspections” [152, p. 338].
6 Porter and Votta [153] report on a study that found reviewers who used
checklists were no more effective at finding defects than reviewers who
used ad hoc techniques. Reviewers who used scenarios were more
effective at finding defects.
7 Eick et al. [64, p. 64] found that 90% of defects were found during
individual preparation. This data was collected as part of a study to
estimate residual faults; that is, faults that remain in a completed system.
13 Holzmann [88, 89] describes the development of the software for the
mission to land a rover on Mars.
15 For the SSL bug in iOS and Mac OS, see Bland [23].
16 David Hovemeyer “developed FindBugs as part of his PhD research .
. . in conjunction with his thesis advisor William Pugh.” [11] Example
8.4 is based on [90].
1 The defective code in Fig. 9.1 is from the clock driver for the Freescale
MC 13783 processor used by the Microsoft Zune 30 and Toshiba
Gigabeat S media players [193]. The root cause of the failure on
December 31, 2008 was isolated by “itsnotabigtruck” [94].
5 SWEBOK 3.0 merges system and functional testing into a single level
[36, pp. 4– 5]. The levels of testing in Section 9.2 assign the validation
and verification roles to system and functional testing, respectively. The
classic text on testing by Myers separates system and functional testing
[140].
6 Jones has published summary data from 600 client companies [104]:
“Many test stages such as unit test, function test, regression test, etc. are
only about 35% efficient in finding code bugs, or find one bug out of
three. This explains why 6 to 10 separate kinds of testing are needed.”
7 The xUnit family began with Beck’s automated testing framework for
Smalltalk. In 1997, Smalltalk usage was on the decline and Java usage
was on the rise, so Beck and Gamma created JUnit for Java. They had
three goals for JUnit: make it natural enough that developers would
actually use it; enable tests that retain their value over time; and leverage
existing tests in creating new ones [18].
9 Myers [140, pp. 99– 100] notes that a comparison between top-down
and bottom-up integration testing “seems to give the bottom-up strategy
the edge.”
10 Zhu, Hall, and May [196] survey test coverage and adequacy criteria.
12 Myers [140, pp. 46– 47] provides heuristic guidelines for equivalence
partitioning.
15 Jones and Harrold [105] use pairs tables for MC/DC testing.
16 The NIST ACTS tool for combinatorial testing is available through
GitHub: https://github.com/usnistgov/combinatorial-testing-tools.
18 D. M. Cohen et al. [50] show that for fixed t, the size of a t-way
covering array grows logarithmically with the number of factors. They
also describe heuristics for designing tests. The covering array with 13
tests for 10 factors in Table 9.5 is from Kuhn’s keynote [115]. It also
appears in Hagar et al. [84]. Garvin, M. B. Cohen, and Dwyer [75]
explore an extension of combinatorial testing called constrained
combinatorial testing, where “some features cannot coexist in a
configuration.”
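As a tiny illustration of the covering-array idea (not tied to the ACTS tool or to the array in Table 9.5), three binary factors can be covered pairwise with 4 tests instead of the 2^3 = 8 exhaustive combinations. The checker below, a sketch written for this note, simply enumerates every factor pair and every pair of values:

```c
#include <stdbool.h>

#define FACTORS 3

/* Returns true if, for every pair of factors, all four value
   combinations (00, 01, 10, 11) appear somewhere in the n tests --
   the definition of a 2-way (pairwise) covering array. */
bool covers_all_pairs(const int tests[][FACTORS], int n) {
    for (int a = 0; a < FACTORS; a++)
        for (int b = a + 1; b < FACTORS; b++)
            for (int va = 0; va < 2; va++)
                for (int vb = 0; vb < 2; vb++) {
                    bool found = false;
                    for (int t = 0; t < n && !found; t++)
                        found = (tests[t][a] == va && tests[t][b] == vb);
                    if (!found) return false;
                }
    return true;
}

/* The classic 4-test pairwise covering array for three binary factors. */
const int pairwise4[4][FACTORS] = {
    {0, 0, 0},
    {0, 1, 1},
    {1, 0, 1},
    {1, 1, 0},
};
```

With t = 2 and only three factors the saving is modest; the logarithmic growth shown by D. M. Cohen et al. is what makes a 13-test array suffice for 10 factors.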
3 Basili and Weiss [13] advocate the use of goals to guide data collection
and measurement: “Without goals, one runs the risk of collecting
unrelated, meaningless data.” Basili et al. [12] describe an approach that
they call GQM+Strategies, which extends the Goals-Questions-Metrics
(GQM) approach of [13]. GQM+Strategies starts with high-level goals
that are refined into what they call “measurement” or “GQM” goals.
10 Gantt charts are named after Henry Gantt, who initially used them to
measure worker productivity. See the Wikipedia page for more
information.
11 The medians in Table 10.2 are from summary data provided by Jones [103, 104], reportedly from 600 client companies. There is other anecdotal data to support the observation that a combination of reviews, static analysis, and testing is highly effective for defect detection; for example, see Hackbarth et al. [83].
12 CQM was introduced by Mockus and Weiss [138] under the name Interval Quality. Hackbarth et al. [83] describe the use of CQM to drive quality improvement at Avaya.
15 Begel and Zimmermann [19] describe how they collected and ranked questions that Microsoft engineers have for data scientists.
Notes for Appendix A: A Team Project
[6] S. W. Ambler. UML 2.5: Do you even care? Dr. Dobb’s (November 19, 2013). www.drdobbs.com/architecture-and-design/uml-25-do-you-even-care/240163702.
[19] A. Begel and T. Zimmermann. Analyze this! 145 questions for data scientists in software engineering. Proceedings of the 36th International Conference on Software Engineering (ICSE) (2014) 12–23.
[21] A. Bessey, K. Block, B. Chelf, et al. A few billion lines of code later: Using static analysis to find bugs in the real world. Communications of the ACM 53, 2 (February 2010) 66–75.
[23] M. Bland. Finding more than one worm in the apple. Communications of the ACM 57, 7 (July 2014) 58–64.
[24] B. W. Boehm. Software engineering. IEEE Transactions on Computers C-25, 12 (December 1976) 1226–1241.
[34] K. D. Boklan. How I broke the Confederate code (137 years too late). Cryptologia 30 (2006) 340–345. www.cwu.edu/boersmas/cryptology/confederate%20code%20paper.pdf.
[45] D. Clegg and R. Barker. Case Method Fast Track: A RAD Approach. Addison-Wesley Professional (1994).
[70] M. Fowler. UML Distilled, 3rd ed.: A Brief Guide to the Standard Object Modeling Language. Addison-Wesley (2003).
[77] W. H. Gates III. The Internet Tidal Wave. Internal Microsoft memo (May 26, 1995). www.justice.gov/atr/cases/exhibits/20.pdf.
[79] B. Glick. The BBC DMI project: What went wrong? Computer Weekly (February 5, 2014). www.computerweekly.com/news/2240213773/The-BBC-DMI-project-what-went-wrong.
[87] F. Herzberg. Work and the Nature of Man. World Publishing Co., Cleveland (1966).
[90] D. Hovemeyer and W. Pugh. Finding more null pointer bugs, but not too many. 7th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering. ACM (June 2007) 9–14.
[96] I. Jacobson. Use cases: yesterday, today, and tomorrow. Software and Systems Modeling 3 (2004) 210–220.
[97] I. Jacobson interview. Ivar Jacobson on UML, MDA, and the future of methodologies. InfoQ interview (October 24, 2006). www.infoq.com/interviews/Ivar_Jacobson/.
[98] I. Jacobson, I. Spence, and B. Kerr. Use-Case 2.0: The hub of software development. ACM Queue 14, 1 (January–February 2016) 94–123.
[103] C. Jones. Software Quality in 2013: A Survey of the State of the Art. 35th Annual Pacific NW Software Quality Conference (October 2013). The following website has a link to his video keynote: www.pnsqc.org/software-quality-in-2013-survey-of-the-state-of-the-art/.
[110] R. Kazman and A. Eden. Defining the terms architecture, design, and implementation. news@sei 6, 1 (First Quarter 2003).
[130] I. Maravić. Spotify’s event delivery: The road to the cloud (Part I). (February 25, 2016). https://labs.spotify.com/2016/02/25/spotifys-event-delivery-the-road-to-the-cloud-part-i/.
[136] G. A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review 63, 2 (1956) 81–97.
[153] A. Porter and L. G. Votta Jr. What makes inspections work? IEEE Software 14, 6 (November–December 1997).
[185] L. G. Votta Jr. Does every inspection need a meeting? 1st ACM SIGSOFT Symposium on Foundations of Software Engineering (SIGSOFT ’93). Distributed as Software Engineering Notes 18, 5 (December 1993) 107–114.
daily scrum, 35
daily standup, see daily scrum
Dalkey, Norman Crolee, 318
Dardenne, Anne, 317
Darimont, Robert, 317
data set, 261
dataflow, 186
dates
Microsoft Excel, 263
Unix, 263
Davis, Edward D., 318
De Bonte, Austina, 97, 316–318
Dean, Jeffrey, 5, 314, 321
Decina, Dan, 10
decomposition views, see module hierarchy
defect, 11
discovery rate, 226
severity, 268
defect removal efficiency, 268
DeLine, Rob, 320
Delphi estimation, 318
deployment, 194
design, 144, see also architecture
Design for Delight, 76
desirable, see useful, usable, desirable
development, 2
Dijkstra, Edsger Wybe, 226, 321, 322
dissatisfiers, 113, 116
distribution, 284
Doran, George T., 317
observation, 261
operations quality, see ops quality
ops quality, 259, 260, 272
ordinal scale, 263
ordinary least squares, 293
outlier (boxplot), 280
quality attribute, 69
quality, software
forms of, 257
improvement, 275
question
motivation, 87
options, 88
quantification, 88
QuickBooks app, 76
T diagram, 202
t-distribution, 287, 290
Tague, Berkley A., 315
test-driven development, 40
testing, 12–13, 39–40, 49–51, 222
adequacy, 225
big-bang, 45, 229
black-box, 13, 238, 245
combinatorial, 245
effectiveness, 269, 322, see also coverage
incremental, 229
integration, 229
levels, 51, 227
regression, 39
unit, 228
white-box, 13, 233, 241
Therac-25 accidents, 16, 23, 225
time-boxed, 27
Toman, Carol A., 321
Toshiba Gigabeat S, 322
Turner, Clark R., 16, 314
Tversky, Amos Nathan, 317
UML, 150
class, see class diagram
early draft, 320
package, see package diagram
usage, 319, 320
unbounded data stream, 190
Unified Modeling Language, see UML
univariate data, 265
Unix
culture, 28
dates, 263
pipeline, 187
portability, 177
software tools, 186, 201
usable, see useful, usable, desirable
usage logs, 76
use case, 125
developing, 135
diagram, 136
elements, 125
extension, 137, 138
inclusion, 137
levels, 134, 319
subflow, 137
template, 133
use cases
and user stories, 139
iterative development, 135
user goal, 125
user story, 38, 77–80, 82
UX scenario, 83
XP, 97
yagni, 41, 315
Yoffie, David B., 316
Zave, Pamela, 316
Zhu, Hong, 323
Zimmermann, Thomas, 296, 324