MCS104C ASE Digital Notes

The document discusses requirements elicitation in software engineering, emphasizing its importance in defining system requirements through collaboration between clients, users, and developers. It outlines key activities involved in the process, such as identifying actors, scenarios, and use cases, and highlights the significance of clear communication to avoid costly errors. Additionally, it covers functional and nonfunctional requirements, validation of specifications, and different engineering approaches like greenfield and reengineering.


VTU, Belagavi
Advanced Software Engineering (MCS104C)
From M2 to M5
~By Swapnadeep Kapuri
2VX24SCS17

Module 2, Chapter 1
Requirements Elicitation
A requirement is a feature that the system must have or a constraint that it must satisfy to be
accepted by the client. Requirements engineering aims at defining the requirements of the system
under construction. Requirements engineering includes two main activities: requirements
elicitation, which results in the specification of the system that the client understands, and
analysis, which results in an analysis model that the developers can unambiguously interpret.
Requirements elicitation is the more challenging of the two because it requires the collaboration
of several groups of participants with different backgrounds. On the one hand, the client and the
users are experts in their domain and have a general idea of what the system should do, but they
often have little experience in software development. On the other hand, the developers have
experience in building systems, but often have little knowledge of the everyday environment of
the users.
Scenarios and use cases provide tools for bridging this gap. A scenario describes an
example of system use in terms of a series of interactions between the user and the system. A use
case is an abstraction that describes a class of scenarios. Both scenarios and use cases are written
in natural language, a form that is understandable to the user.
Developers elicit requirements by observing and interviewing users. Developers first
represent the user’s current work processes as as-is scenarios, then develop visionary scenarios
describing the functionality to be provided by the future system. The client and users validate the
system description by reviewing the scenarios and by testing small prototypes provided by the
developers. As the definition of the system matures and stabilizes, developers and the client agree
on a requirements specification in the form of functional requirements, nonfunctional
requirements, use cases, and scenarios.
4.1 Introduction: Usability Examples
Requirements elicitation is about communication among developers, clients, and users to define a
new system. Failure to communicate and understand each other’s domains results in a system that
is difficult to use or that simply fails to support the user’s work. Errors introduced during
requirements elicitation are expensive to correct, as they are usually discovered late in the process,
often as late as delivery. Such errors include missing functionality that the system should have
supported, functionality that was incorrectly specified, user interfaces that are misleading or
unusable, and obsolete functionality. Requirements elicitation methods aim at improving
communication among developers, clients, and users. Developers construct a model of the
application domain by observing users in their environment. Developers select a representation
that is understandable by the clients and users (e.g., scenarios and use cases). Developers validate
the application domain model by constructing simple prototypes of the user interface and
collecting feedback from potential users. An example of a simple prototype is the layout of a user
interface with menu items and buttons. The potential user can manipulate the menu items and
buttons to get a feeling for the usage of the system, but there is no actual response after buttons are
clicked, because the required functionality is not implemented.
4.2 An Overview of Requirements Elicitation
Requirements elicitation focuses on describing the purpose of the system. The client, the
developers, and the users identify a problem area and define a system that addresses the problem.
Such a definition is called a requirements specification and serves as a contract between the client
and the developers. The requirements specification is structured and formalized during analysis
(Module 2, Chapter 2, Analysis) to produce an analysis model (see Figure 4-1). Both requirements
specification and analysis model represent the same information. They differ only in the language
and notation they use; the requirements specification is written in natural language, whereas the
analysis model is usually expressed in a formal or semiformal notation. The requirements
specification supports the communication with the client and users. The analysis model supports
the communication among developers. They are both models of the system in the sense that they
attempt to represent accurately the external aspects of the system. Given that both models represent
the same aspects of the system, requirements elicitation and analysis occur concurrently and
iteratively.
Requirements elicitation and analysis focus only on the user’s view of the system. For
example, the system functionality, the interaction between the user and the system, the errors that
the system can detect and handle, and the environmental conditions in which the system functions
are part of the requirements. The system structure, the implementation technology selected to build
the system, the system design, the development methodology, and other aspects not directly visible
to the user are not part of the requirements.

Requirements elicitation includes the following activities:


• Identifying actors. During this activity, developers identify the different types of users the future
system will support.
• Identifying scenarios. During this activity, developers observe users and develop a set of
detailed scenarios for typical functionality provided by the future system. Scenarios are concrete
examples of the future system in use. Developers use these scenarios to communicate with the user
and deepen their understanding of the application domain.
• Identifying use cases. Once developers and users agree on a set of scenarios, developers derive
from the scenarios a set of use cases that completely represent the future system. Whereas scenarios
are concrete examples illustrating a single case, use cases are abstractions describing all possible
cases. When describing use cases, developers determine the scope of the system.
• Refining use cases. During this activity, developers ensure that the requirements specification is
complete by detailing each use case and describing the behavior of the system in the presence of
errors and exceptional conditions.
• Identifying relationships among use cases. During this activity, developers identify
dependencies among use cases. They also consolidate the use case model by factoring out common
functionality. This ensures that the requirements specification is consistent.
• Identifying nonfunctional requirements. During this activity, developers, users, and clients
agree on aspects that are visible to the user, but not directly related to functionality. These include
constraints on the performance of the system, its documentation, the resources it consumes, its
security, and its quality.
During requirements elicitation, developers access many different sources of information,
including client-supplied documents about the application domain, manuals and technical
documentation of legacy systems that the future system will replace, and most important, the users
and clients themselves. Developers interact the most with users and clients during requirements
elicitation. We focus on two methods for eliciting information, making decisions with users and
clients, and managing dependencies among requirements and other artifacts:
• Joint Application Design (JAD) focuses on building consensus among developers, users, and
clients by jointly developing the requirements specification.
• Traceability focuses on recording, structuring, linking, grouping, and maintaining dependencies
among requirements and between requirements and other work products.
4.3 Requirements Elicitation Concepts
We describe the main requirements elicitation concepts used in this chapter. In particular, we
describe
• Functional Requirements (Section 4.3.1)
• Nonfunctional Requirements (Section 4.3.2)
• Completeness, Consistency, Clarity, and Correctness (Section 4.3.3)
• Realism, Verifiability, and Traceability (Section 4.3.4)
• Greenfield Engineering, Reengineering, and Interface Engineering (Section 4.3.5).
We describe the requirements elicitation activities in Section 4.4.

4.3.1 Functional Requirements


Functional requirements describe the interactions between the system and its environment
independent of its implementation. The environment includes the user and any other external
system with which the system interacts. For example, Figure 4-2 lists functional requirements
for SatWatch, a watch that resets itself without user intervention:
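(Figure 4-2 itself is not reproduced in these notes. The list below is a hedged paraphrase of the kind of
requirements the figure contains, based on the SatWatch description in this chapter, not the exact figure text.)
• The SatWatch determines its position from GPS satellites and displays the time and date of the time
zone it is currently in, without any intervention from the watch owner.
• When the watch owner crosses a time zone boundary, the SatWatch adjusts the displayed time and date
automatically.
• The SatWatch communicates with the WebifyWatch serial device to receive software and data updates
(e.g., new time zone boundary data), again without watch owner intervention.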

The above functional requirements focus only on the possible interactions between
SatWatch and its external world (i.e., the watch owner, GPS, and WebifyWatch). The above
description does not focus on any of the implementation details (e.g., processor, language, display
technology).

4.3.2 Nonfunctional Requirements


Nonfunctional requirements describe aspects of the system that are not directly related to the
functional behavior of the system. Nonfunctional requirements include a broad variety of
requirements that apply to many different aspects of the system, from usability to performance.
The FURPS+ model used by the Unified Process [Jacobson et al., 1999] provides the following
categories of nonfunctional requirements:
• Usability is the ease with which a user can learn to operate, prepare inputs for, and interpret
outputs of a system or component. Usability requirements include, for example, conventions
adopted by the user interface, the scope of online help, and the level of user documentation. Often,
clients address usability issues by requiring the developer to follow user interface guidelines on
color schemes, logos, and fonts.
• Reliability is the ability of a system or component to perform its required functions under stated
conditions for a specified period of time. Reliability requirements include, for example, an
acceptable mean time to failure and the ability to detect specified faults or to withstand specified
security attacks. More recently, this category is often replaced by dependability, which is the
property of a computer system such that reliance can justifiably be placed on the service it delivers.
Dependability includes reliability, robustness (the degree to which a system or component can
function correctly in the presence of invalid inputs or stressful environmental conditions), and safety
(a measure of the absence of catastrophic consequences to the environment).
• Performance requirements are concerned with quantifiable attributes of the system, such as
response time (how quickly the system reacts to a user input), throughput (how much work the
system can accomplish within a specified amount of time), availability (the degree to which a
system or component is operational and accessible when required for use), and accuracy.
• Supportability requirements are concerned with the ease of changes to the system after
deployment, including for example, adaptability (the ability to change the system to deal with
additional application domain concepts), maintainability (the ability to change the system to deal
with new technology or to fix defects), and internationalization (the ability to change the system
to deal with additional international conventions, such as languages, units, and number formats).
The ISO 9126 standard on software quality [ISO Std. 9126], similar to the FURPS+ model,
replaces this category with two categories: maintainability and portability (the ease with which
a system or component can be transferred from one hardware or software environment to another).
(Note: FURPS+ is an acronym using the first letter of the requirements categories: Functionality,
Usability, Reliability, Performance, and Supportability. The + indicates the additional
subcategories. The FURPS model was originally proposed by [Grady, 1992]. The definitions in
this section are quoted from [IEEE Std. 610.12-1990].)
The FURPS+ model provides additional categories of requirements typically also included under
the general label of nonfunctional requirements:
• Implementation requirements are constraints on the implementation of the system, including
the use of specific tools, programming languages, or hardware platforms.
• Interface requirements are constraints imposed by external systems, including legacy systems
and interchange formats.
• Operations requirements are constraints on the administration and management of the system
in the operational setting.
• Packaging requirements are constraints on the actual delivery of the system (e.g., constraints
on the installation media for setting up the software).
• Legal requirements are concerned with licensing, regulation, and certification issues. An
example of a legal requirement is that software developed for the U.S. federal government must
comply with Section 508 of the Rehabilitation Act of 1973, which requires that government information
systems be accessible to people with disabilities.
Nonfunctional requirements that fall into the URPS categories are called quality
requirements of the system. Nonfunctional requirements that fall into the implementation,
interface, operations, packaging, and legal categories are called constraints or pseudo
requirements. Budget and schedule requirements are usually not treated as nonfunctional
requirements, as they constrain attributes of the project rather than of the system.

4.3.3 Completeness, Consistency, Clarity, and Correctness


Requirements are continuously validated with the client and the user. Validation is a critical step
in the development process, given that both the client and the developer depend on the
requirements specification. Requirement validation involves checking that the specification is
complete, consistent, unambiguous, and correct. It is complete if all possible scenarios through
the system are described, including exceptional behavior (i.e., all aspects of the system are
represented in the requirements model). The requirements specification is consistent if it does not
contradict itself. The requirements specification is unambiguous if exactly one system is defined
(i.e., it is not possible to interpret the specification two or more different ways). A specification is
correct if it represents accurately the system that the client needs and that the developers intend to
build (i.e., everything in the requirements model accurately represents an aspect of the system to
the satisfaction of both client and developer).
The correctness and completeness of a requirements specification are often difficult to
establish, especially before the system exists. Given that the requirements specification serves as
a contractual basis between the client and the developers, the requirements specification must be
carefully reviewed by both parties. Additionally, parts of the system that present a high risk should
be prototyped or simulated to demonstrate their feasibility or to obtain feedback from the user. In
the case of SatWatch described above, a mock-up of the watch would be built using a traditional
watch and users surveyed to gather their initial impressions. A user may remark that she wants the
watch to be able to display both American and European date formats.

4.3.4 Realism, Verifiability, and Traceability


Three more desirable properties of a requirements specification are that it be realistic, verifiable,
and traceable. The requirements specification is realistic if the system can be implemented within
constraints. The requirements specification is verifiable if, once the system is built, repeatable
tests can be designed to demonstrate that the system fulfills the requirements specification. For
example, a mean time to failure of a hundred years for SatWatch would be difficult to verify
(assuming it is realistic in the first place). The following requirements are additional examples of
non-verifiable requirements:
• The product shall have a good user interface. -“Good” is not defined.
• The product shall be error free. -Verifying the absence of errors would require an impractically large amount of resources.
• The product shall respond to the user within 1 second for most cases. -“Most cases” is not defined.
A requirements specification is traceable if each requirement can be traced throughout the
software development to its corresponding system functions, and if each system function can be
traced back to its corresponding set of requirements. Traceability includes also the ability to track
the dependencies among requirements, system functions, and the intermediate design artifacts,
including system components, classes, methods, and object attributes. Traceability is critical for
developing tests and for evaluating changes. When developing tests, traceability enables a tester
to assess the coverage of a test case, that is, to identify which requirements are tested and which
are not. When evaluating changes, traceability enables the analyst and the developers to identify
all components and system functions that the change would impact.

4.3.5 Greenfield Engineering, Reengineering, and Interface Engineering


Requirements elicitation activities can be classified into three categories, depending on the source
of the requirements. In greenfield engineering, the development starts from scratch—no prior
system exists—so the requirements are extracted from the users and the client. A greenfield
engineering project is triggered by a user need or the creation of a new market. SatWatch is a
greenfield engineering project.
A reengineering project is the redesign and reimplementation of an existing system
triggered by technology enablers or by business processes [Hammer & Champy, 1993].
Sometimes, the functionality of the new system is extended, but the essential purpose of the system
remains the same. The requirements of the new system are extracted from an existing system.
An interface engineering project is the redesign of the user interface of an existing system.
The legacy system is left untouched except for its interface, which is redesigned and
reimplemented. This type of project is a reengineering project in which the legacy system cannot
be discarded without entailing high costs.
In both reengineering and greenfield engineering, the developers need to gather as much
information as possible from the application domain. This information can be found in procedures
manuals, documentation distributed to new employees, the previous system’s manual, glossaries,
cheat sheets and notes developed by the users, and user and client interviews. Note that although
interviews with users are an invaluable tool, they fail to gather the necessary information if the
relevant questions are not asked. Developers must first gain a solid knowledge of the application
domain before the direct approach can be used.

4.4 Requirements Elicitation Activities


In this section, we describe the requirements elicitation activities. We discuss heuristics and
methods for eliciting requirements from users and modeling the system in terms of these concepts.
Requirements elicitation activities include:
• Identifying Actors (Section 4.4.1)
• Identifying Scenarios (Section 4.4.2)
• Identifying Use Cases (Section 4.4.3)
• Refining Use Cases (Section 4.4.4)
• Identifying Relationships Among Actors and Use Cases (Section 4.4.5)
• Identifying Initial Analysis Objects (Section 4.4.6)
• Identifying Nonfunctional Requirements (Section 4.4.7).

The methods described in this section are adapted from OOSE [Jacobson et al., 1992], the Unified
Software Development Process [Jacobson et al., 1999], and responsibility-driven design [Wirfs-
Brock et al., 1990].

4.4.1 Identifying Actors


Actors represent external entities that interact with the system. An actor can be human or an
external system. In the SatWatch example, the watch owner, the GPS satellites, and the
WebifyWatch serial device are actors (see Figure 4-4). They all exchange information with the
SatWatch. Note, however, that they all have specific interactions with SatWatch: the watch owner
wears and looks at her watch; the watch monitors the signal from the GPS satellites; the
WebifyWatch downloads new data into the watch. Actors define classes of functionality.

Actors are role abstractions and do not necessarily directly map to persons. The same
person can fill the role of WatchOwner and WebifyWatch. However, the functionality they access
is substantially different. For that reason, these two roles are modeled as two different actors.
The first step of requirements elicitation is the identification of actors. This serves both to
define the boundaries of the system and to find all the perspectives from which the developers
need to consider the system. When the system is deployed into an existing organization (such as a
company), most actors usually exist before the system is developed: they correspond to roles in
the organization.
During the initial stages of actor identification, it is hard to distinguish actors from objects.
For example, a database subsystem can at times be an actor, while in other cases it can be part of
the system. Note that once the system boundary is defined, there is no trouble distinguishing
between actors and such system components as objects or subsystems. Actors are outside of the
system boundary; they are external. Subsystems and objects are inside the system boundary; they
are internal. Thus, any external software system using the system to be developed is an actor.

Once the actors are identified, the next step in the requirements elicitation activity is to
determine the functionality that will be accessible to each actor. This information can be extracted
using scenarios and formalized using use cases.

4.4.2 Identifying Scenarios


A scenario is “a narrative description of what people do and experience as they try to make use of
computer systems and applications” [Carroll, 1995]. A scenario is a concrete, focused, informal
description of a single feature of the system from the viewpoint of a single actor. Scenarios cannot
(and are not intended to) replace use cases, as they focus on specific instances and concrete events
(as opposed to complete and general descriptions). However, scenarios enhance requirements
elicitation by providing a tool that is understandable to users and clients.
Scenarios can have many different uses during requirements elicitation and during other
activities of the life cycle. Below is a selection of scenario types taken from [Carroll, 1995]:
• As-is scenarios describe a current situation. During reengineering, for example, the current
system is understood by observing users and describing their actions as scenarios. These scenarios
can then be validated for correctness and accuracy with the users.
• Visionary scenarios describe a future system. Visionary scenarios are used both as a point in the
modeling space by developers as they refine their ideas of the future system and as a
communication medium to elicit requirements from users. Visionary scenarios can be viewed as
an inexpensive prototype.
• Evaluation scenarios describe user tasks against which the system is to be evaluated. The
collaborative development of evaluation scenarios by users and developers also improves the
definition of the functionality tested by these scenarios.
• Training scenarios are tutorials used for introducing new users to the system. These are step-
by-step instructions designed to hand-hold the user through common tasks.
In requirements elicitation, developers and users write and refine a series of scenarios in
order to gain a shared understanding of what the system should be. Initially, each scenario may be
high level and incomplete.
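As an illustrative sketch (a paraphrase in the style of the incident management example used later in this
chapter, not a figure from these notes), an early scenario might look like this:

Scenario name: warehouseOnFire
Participating actor instances: bob, alice: FieldOfficer; john: Dispatcher
Flow of events:
1. Bob, driving down Main Street in his patrol car, notices smoke coming out of a warehouse. His partner,
Alice, activates the “Report Emergency” function of her terminal.
2. Alice enters the address of the building and a short description of its location, and requests two fire
units.
3. John, the Dispatcher, reviews the submitted report, allocates the requested units, and sends Alice an
acknowledgment with an estimated arrival time.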
Developers use existing documents and interviews to answer questions about the application
domain. These sources include user manuals of previous systems, procedures manuals,
company standards, user notes and cheat sheets, and user and client interviews. Developers should
always write scenarios using application domain terms, as opposed to their own terms. As
developers gain further insight into the application domain and the possibilities of the available
technology, they iteratively and incrementally refine scenarios to include increasing amounts of
detail. Drawing user interface mock-ups often helps to find omissions in the specification and to
build a more concrete picture of the system.
The emphasis for developers during actor identification and scenario identification is to
understand the application domain. This results in a shared understanding of the scope of the
system and of the user work processes to be supported. Once developers have identified and
described actors and scenarios, they formalize scenarios into use cases.

4.4.3 Identifying Use Cases


A scenario is an instance of a use case; that is, a use case specifies all possible scenarios for a
given piece of functionality. A use case is initiated by an actor. After its initiation, a use case may
interact with other actors, as well. A use case represents a complete flow of events through the
system in the sense that it describes a series of related interactions that result from its initiation.

Generalizing scenarios and identifying the high-level use cases that the system must
support enables developers to define the scope of the system. Initially, developers name use cases,
attach them to the initiating actors, and provide a high-level description of the use case as in Figure
4-7. The name of a use case should be a verb phrase denoting what the actor is trying to accomplish.
The verb phrase “Report Emergency” indicates that an actor is attempting to report an emergency
to the system (and hence, to the Dispatcher actor). This use case is not called “Record Emergency”
because the name should reflect the perspective of the actor, not the system. It is also not called
“Attempt to Report an Emergency” because the name should reflect the goal of the use case, not
the actual activity.
Attaching use cases to initiating actors enables developers to clarify the roles of the
different users. Often, by focusing on who initiates each use case, developers identify new actors
that have been previously overlooked.
Describing a use case entails specifying four fields:
• Describing the entry and exit conditions of a use case enables developers to understand the
conditions under which a use case is invoked and the impact of the use case on the state of the
environment and of the system. By examining the entry and exit conditions of use cases,
developers can determine if there may be missing use cases. For example, if a use case requires
that the emergency operations plan dealing with earthquakes should be activated, the
requirements specification should also provide a use case for activating this plan.
• Describing the flow of events of a use case enables developers and clients to discuss the
interaction between actors and system. This results in many decisions about the boundary of
the system, that is, about deciding which actions are accomplished by the actor and which
actions are accomplished by the system.
• Finally, describing the quality requirements associated with a use case enables developers to
elicit nonfunctional requirements in the context of a specific functionality. In this book, we
focus on these four fields to describe use cases as they describe the most essential aspects of a
use case. In practice, many additional fields can be added to describe an exceptional flow of
events, rules, and invariants that the use case must respect during the flow of events.
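Because the figures showing the ReportEmergency use case are not reproduced in these notes, the sketch
below (a paraphrase, not the exact figure text) illustrates how these fields fit together:

Use case name: ReportEmergency
Initiating actor: FieldOfficer (the Dispatcher also participates)
Entry condition: The FieldOfficer is logged into a FRIEND terminal.
Flow of events:
1. The FieldOfficer activates the “Report Emergency” function of her terminal.
2. FRIEND responds by presenting a report form to the FieldOfficer.
3. The FieldOfficer completes the form and submits it; FRIEND notifies the Dispatcher.
4. The Dispatcher reviews the submitted information, creates an incident, allocates resources, and sends
an acknowledgment to the FieldOfficer.
Exit condition: The FieldOfficer has received the acknowledgment and the selected resources have been
dispatched.
Quality requirement: The acknowledgment reaches the FieldOfficer within a short, stated time bound
(the concrete bound here is an assumption, not the figure’s exact value).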
Writing use cases is a craft. An analyst learns to write better use cases with experience.
Consequently, different analysts tend to develop different styles, which can make it difficult to
produce a consistent requirements specification. To address the issue of learning how to write use
cases and how to ensure consistency among the use cases of a requirements specification, analysts
adopt a use case writing guide. Figure 4-8 is a simple writing guide adapted from [Cockburn, 2001]
that can be used for novice use case writers.

4.4.4 Refining Use Cases


Figure 4-10 is a refined version of the ReportEmergency use case. It has been extended to include
details about the type of incidents known to FRIEND and detailed interactions indicating how the
Dispatcher acknowledges the FieldOfficer.

The use of scenarios and use cases to define the functionality of the system aims at creating
requirements that are validated by the user early in the development. As the design and
implementation of the system starts, the cost of changing the requirements specification and adding
new unforeseen functionality increases. Although requirements change until late in the
development, developers and users should strive to address most requirements issues early. This
entails many changes and much validation during requirements elicitation. Note that many use
cases are rewritten several times, others substantially refined, and yet others completely dropped.
To save time, much of the exploration work can be done using scenarios and user interface mock-
ups.
The following heuristics can be used for writing scenarios and use cases:
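(The heuristics themselves appear in a figure that is not reproduced in these notes; the bullets below are a
paraphrased sketch drawn from the surrounding discussion, not the exact figure text.)
• Write scenarios and use cases in application domain terms that users and clients understand, not in
developer terms.
• Name each use case with a verb phrase expressing the goal of the initiating actor, and attach it to that
actor.
• Describe the common flow of events first, then add seldom-occurring cases and exception handling.
• Use scenarios and user interface mock-ups to explore alternatives cheaply before committing them to
refined use cases.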
The focus of this activity is on completeness and correctness. Developers identify
functionality not covered by scenarios, and document it by refining use cases or writing new ones.
Developers describe seldom occurring cases and exception handling as seen by the actors. Whereas
the initial identification of use cases and actors focused on establishing the boundary of the system,
the refinement of use cases yields increasingly more details about the features provided by the
system and the constraints associated with them. In particular, the following aspects of the use
cases, initially ignored, are detailed during refinement:
• The elements that are manipulated by the system are detailed. In Figure 4-10, we added details
about the attributes of the emergency reporting form and the types of incidents.
• The low-level sequence of interactions between the actor and the system are specified. In Figure
4-10, we added information about how the Dispatcher generates an acknowledgment by selecting
resources.
• Access rights (which actors can invoke which use cases) are specified.
• Missing exceptions are identified and their handling specified.
• Common functionality among use cases is factored out.

4.4.5 Identifying Relationships among Actors and Use Cases


Even medium-sized systems have many use cases. Relationships among actors and use cases
enable the developers and users to reduce the complexity of the model and increase its
understandability. We use communication relationships between actors and use cases to describe
the system in layers of functionality. We use extend relationships to separate exceptional and
common flows of events. We use include relationships to reduce redundancy among use cases.
Communication relationships between actors and use cases
Communication relationships between actors and use cases represent the flow of
information during the use case. The actor who initiates the use case should be distinguished from
the other actors with whom the use case communicates. By specifying which actor can invoke a
specific use case, we also implicitly specify which actors cannot invoke the use case. Similarly, by
specifying which actors communicate with a specific use case, we specify which actors can access
specific information and which cannot. Thus, by documenting initiation and communication
relationships among actors and use cases, we specify access control for the system at a coarse
level.
The relationships between actors and use cases are identified when use cases are identified.
Figure 4-11 depicts an example of communication relationships in the case of the FRIEND system.
The «initiate» stereotype denotes the initiation of the use case by an actor, and the «participate»
stereotype denotes that an actor (who did not initiate the use case) communicates with the use case.
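For example (a paraphrase of the kind of relationships Figure 4-11 depicts, not the exact figure), the
FieldOfficer actor would be connected to the ReportEmergency use case with an «initiate» relationship,
the Dispatcher would be connected to ReportEmergency with a «participate» relationship, and the
Dispatcher would initiate use cases such as OpenIncident and AllocateResources.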

Extend relationships between use cases


A use case extends another use case if the extended use case may include the behavior of
the extension under certain conditions. Separating exceptional and optional flows of events from
the base use case has two advantages. First, the base use case becomes shorter and easier to
understand. Second, the common case is distinguished from the exceptional case, which enables
the developers to treat each type of functionality differently (e.g., optimize the common case for
response time, optimize the exceptional case for robustness). Both the extended use case and the
extensions are complete use cases of their own. They each must have entry and exit conditions and
be understandable by the user as an independent whole.

Include relationships between use cases


Redundancies among use cases can be factored out using include relationships. Factoring
out shared behavior from use cases has many benefits, including shorter descriptions and fewer
redundancies. Behavior should only be factored out into a separate use case if it is shared across
two or more use cases. Excessive fragmentation of the requirements specification across a large
number of use cases makes the specification confusing to users and clients.

Extend versus include relationships


In summary, the following heuristics can be used for selecting an extend or an include
relationship.
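(The heuristics themselves appear in a figure that is not reproduced in these notes; paraphrased, the usual
guidance is the following.)
• Use an extend relationship when a use case describes exceptional or optional behavior that supplements
a base use case only under certain conditions.
• Use an include relationship when behavior is shared by two or more use cases and can be factored out
into a use case of its own.
For example (paraphrasing the FRIEND examples used in the textbook, not the exact figures), a
ConnectionDown use case could extend ReportEmergency to describe what happens when the network
fails, while a ViewMap use case could be included by several use cases that display city maps.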

In all cases, the purpose of adding include and extend relationships is to reduce or remove
redundancies from the use case model, thus eliminating potential inconsistencies.

4.4.6 Identifying Initial Analysis Objects


One of the first obstacles developers and users encounter when they start collaborating with each
other is differing terminology. Although developers eventually learn the users’ terminology, this
problem is likely to be encountered again when new developers are added to the project.
Misunderstandings result from the same terms being used in different contexts and with different
meanings.
To establish a clear terminology, developers identify the participating objects for each use
case. Developers should identify, name, and describe them unambiguously and collate them into a
glossary, which is also called a data dictionary. Building this glossary constitutes the first step
toward analysis.
The glossary is included in the requirements specification and, later, in the user manuals.
Developers keep the glossary up to date as the requirements specification evolves. The benefits of
the glossary are manifold: new developers are exposed to a consistent set of definitions, a single
term is used for each concept (instead of a developer term and a user term), and each term has a
precise and clear official meaning. The identification of participating objects results in the initial
analysis object model.
The identification of participating objects during requirements elicitation only constitutes
a first step toward the complete analysis object model. The complete analysis model is usually not
used as a means of communication between users and developers, as users are often unfamiliar
with object-oriented concepts. However, the description of the objects (i.e., the definitions of the
terms in the glossary) and their attributes are visible to the users and reviewed.
Many heuristics have been proposed in the literature for identifying objects. Here are a
selected few:
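(The heuristics themselves are listed in a figure that is not reproduced in these notes; the bullets below are
a paraphrased sketch of typical heuristics of this kind, not the exact figure text.)
• Terms that developers or users need to clarify in order to understand a use case.
• Recurring nouns in the use cases (e.g., Incident).
• Real-world entities and processes that the system needs to track (e.g., FieldOfficer, Dispatcher, resource
allocation).
• Data sources or sinks (e.g., Printer) and interface artifacts with which the user interacts (e.g., Station).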

During requirements elicitation, participating objects are generated for each use case. If
two use cases refer to the same concept, the corresponding object should be the same. If two objects
share the same name and do not correspond to the same concept, one or both concepts are renamed
to acknowledge and emphasize their difference. This consolidation eliminates any ambiguity in
the terminology used.
Once participating objects are identified and consolidated, the developers can use them as
a checklist for ensuring that the set of identified use cases is complete.

4.4.7 Identifying Nonfunctional Requirements


Nonfunctional requirements describe aspects of the system that are not directly related to its
functional behavior. Nonfunctional requirements span a number of issues, from user interface look
and feel to response time requirements to security issues. Nonfunctional requirements are defined
at the same time as functional requirements because they have as much impact on the development
and cost of the system.
For example, consider a mosaic display that an air traffic controller uses to track planes. A
mosaic display system compiles data from a series of radars and databases (hence the term
“mosaic”) into a summary display indicating all aircraft in a certain area, including their
identification, speed, and altitude. The number of aircraft such a system can display constrains the
performance of the air traffic controller and the cost of the system. If the system can only handle
a few aircraft simultaneously, the system cannot be used at busy airports. On the other hand, a
system able to handle a large number of aircraft is more costly and more complex to build and to
test.
Nonfunctional requirements can impact the work of the user in unexpected ways. To
accurately elicit all the essential nonfunctional requirements, both client and developer must
collaborate so that they identify, at a minimum, which of the system attributes that are difficult to
realize are critical to the user’s work. In the mosaic display example above, the number of
aircraft that a single mosaic display must be able to handle has implications on the size of the icons
used for displaying aircraft, the features for identifying aircraft and their properties, the refresh
rate of the data, and so on.
The resulting set of nonfunctional requirements typically includes conflicting
requirements. For example, the nonfunctional requirements of the SatWatch (Figure 4-3) call for
an accurate mechanism, so that the time never needs to be reset, and a low unit cost, so that it is
acceptable to the user to replace the watch with a new one when it breaks. These two nonfunctional
requirements conflict as the unit cost of the watch increases with its accuracy. To deal with such
conflicts, the client and the developer prioritize the nonfunctional requirements, so that they can
be addressed consistently during the realization of the system.
There are unfortunately few systematic methods for eliciting nonfunctional requirements.
In practice, analysts use a taxonomy of nonfunctional requirements (e.g., the FURPS+ scheme
described previously) to generate check lists of questions to help the client and the developers
focus on the nonfunctional aspects of the system. As the actors of the system have already been
identified at this point, this check list can be organized by role and distributed to representative
users. The advantage of such check lists is that they can be reused and expanded for each new
system in a given application domain, thus reducing the number of omissions. Note that such check
lists can also result in the elicitation of additional functional requirements. For example, when
asking questions about the operation of the system, the client and developers may uncover a
number of use cases related to the administration of the system. Table 4-3 depicts example
questions for each of the FURPS+ categories.

Once the client and the developers identify a set of nonfunctional requirements, they can
organize them into refinement and dependency graphs to identify further nonfunctional
requirements and identify conflicts. For more material on this topic, the reader is referred to the
specialized literature (e.g., [Chung et al., 1999]).

4.5 Managing Requirements Elicitation


In the previous section, we described the technical issues of modeling a system in terms of use
cases. Use case modeling by itself, however, does not constitute requirements elicitation. Even
after they become expert use case modelers, developers still need to elicit requirements from the
users and come to an agreement with the client. In this section, we describe methods for eliciting
information from the users and negotiating an agreement with a client. In particular, we describe:
• Negotiating Specifications with Clients: Joint Application Design (Section 4.5.1)
• Maintaining Traceability (Section 4.5.2)
• Documenting Requirements Elicitation (Section 4.5.3)

4.5.1 Negotiating Specifications with Clients: Joint Application Design


Joint Application Design (JAD) is a requirements method developed at IBM at the end of the
1970s. Its effectiveness lies in the fact that the requirements elicitation work is done in a single workshop
session in which all stakeholders participate. Users, clients, developers, and a trained session leader
sit together in one room to present their viewpoints, listen to other viewpoints, negotiate, and come
to a mutually acceptable solution. The outcome of the workshop, the final JAD document, is a
complete requirements specification document that includes definitions of data elements, work
flows, and interface screens. Because the final document is jointly developed by the stakeholders
(that is, the participants who not only have an interest in the success of the project, but also can
make substantial decisions), the final JAD document represents an agreement among users, clients,
and developers, and thus minimizes requirements changes later in the development process. JAD
is composed of five activities (Figure 4-15):
1. Project definition. During this activity, the JAD facilitator interviews the project manager and
the client to determine the objectives and the scope of the project. The findings from the interviews
are collected in the Management Definition Guide.
2. Research. During this activity, the JAD facilitator interviews present and future users, gathers
information about the application domain, and describes a first set of high-level use cases. The
JAD facilitator also starts a list of problems to be addressed during the session. The results of this
activity are a Session Agenda and a Preliminary Specification listing work flow and system
information.
3. Preparation. During this activity, the JAD facilitator prepares the session. The JAD facilitator
creates a Working Document, which is the first draft of the final document, an agenda for the
session, and any overhead slides or flip charts representing information gathered during the
Research activity. The JAD facilitator also selects a team composed of the client, the project
manager, selected users, and developers. All stakeholders are represented, and the participants are
able to make binding decisions.
4. Session. During this activity, the JAD facilitator guides the team in creating the requirements
specification. A JAD session lasts for 3 to 5 days. The team defines and agrees on the scenarios,
use cases, and user interface mock-ups. All decisions are documented by a scribe.
5. Final document. The JAD facilitator prepares the Final Document, revising the working
document to include all decisions made during the session. The Final Document represents a
complete specification of the system agreed on during the session. The Final Document is
distributed to the session participants for review. The participants then attend a 1- to 2-hour
meeting to discuss the reviews and finalize the document.
JAD has been used by IBM and other companies. JAD leverages group dynamics to
improve communication among participants and to accelerate consensus. At the end of a JAD
session, developers are more knowledgeable of user needs, and users are more knowledgeable of
development trade-offs. Additional gains result from a reduction of redesign activities
downstream. Because of its reliance on social dynamics, the success of a JAD session often
depends on the qualifications of the JAD facilitator as a meeting facilitator. For a detailed overview
of JAD, the reader is referred to [Wood & Silver, 1989].

4.5.2 Maintaining Traceability


Traceability is the ability to follow the life of a requirement. This includes tracing where a
requirement came from (e.g., who originated it, which client need it addresses) to which
aspects of the system and the project it affects (e.g., which components realize the requirement,
which test case checks its realization). Traceability enables developers to show that the system is
complete, testers to show that the system complies with its requirements, designers to record the
rationale behind the system, and maintainers to assess the impact of change.
Consider the SatWatch system we introduced at the beginning of the chapter. Currently, the
specification calls for a two-line display that includes time and date. After the client decides that
the digit size is too small for comfortable reading, developers change the display requirement to a
single-line display combined with a button to switch between time and date. Traceability would
enable us to answer the following questions:
• Who originated the two-line display requirement?
• Did any implicit constraints mandate this requirement?
• Which components must be changed because of the additional button and display?
• Which test cases must be changed?
The simplest approach to maintaining traceability is to use cross-references among
documents, models, and code artifacts. Each individual element (e.g., requirement, component,
class, operation, test case) is identified by a unique number. Dependencies are then documented
manually as a textual cross-reference containing the number of the source element and the number
of the target element. Tool support can be as simple as a spreadsheet or a word processing tool.
This approach is expensive in time and personpower, and it is error prone. However, for small
projects, developers can observe benefits early.
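As a minimal sketch of this cross-reference approach (not from the textbook; every identifier and name
below is made up for illustration), two simple maps are enough to answer the forward and backward
questions listed above, such as which artifacts a requirement affects and which requirement an artifact
traces back to:

import java.util.*;

public class TraceabilityMatrix {
    // source element id -> ids of elements derived from it (forward links)
    private final Map<String, Set<String>> forward = new HashMap<>();
    // target element id -> ids of elements it was derived from (backward links)
    private final Map<String, Set<String>> backward = new HashMap<>();

    // record one dependency, e.g., requirement -> component or requirement -> test case
    public void addLink(String sourceId, String targetId) {
        forward.computeIfAbsent(sourceId, k -> new TreeSet<>()).add(targetId);
        backward.computeIfAbsent(targetId, k -> new TreeSet<>()).add(sourceId);
    }

    // Which components, classes, or test cases realize or check this requirement?
    public Set<String> tracedFrom(String requirementId) {
        return forward.getOrDefault(requirementId, Collections.emptySet());
    }

    // Which requirements does this artifact trace back to?
    public Set<String> tracedTo(String artifactId) {
        return backward.getOrDefault(artifactId, Collections.emptySet());
    }

    public static void main(String[] args) {
        TraceabilityMatrix trace = new TraceabilityMatrix();
        // hypothetical SatWatch elements: requirement REQ-7 (two-line display),
        // component C-Display, and test case TC-21
        trace.addLink("REQ-7", "C-Display");
        trace.addLink("REQ-7", "TC-21");
        // prints the elements impacted by a change to REQ-7: [C-Display, TC-21]
        System.out.println("Impacted by REQ-7: " + trace.tracedFrom("REQ-7"));
        // prints the requirement(s) that TC-21 traces back to: [REQ-7]
        System.out.println("TC-21 traces back to: " + trace.tracedTo("TC-21"));
    }
}

A real project would keep such links in a spreadsheet or a requirements management tool rather than in
code; the point of the sketch is only that unique identifiers plus explicitly recorded links make both
directions of the trace queryable.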
For large-scale projects, specialized database tools enable the partial automation of the
capture, editing, and linking of traceability dependencies at a more detailed level (e.g., DOORS
[Telelogic] or RequisitePro [Rational]). Such tools reduce the cost of maintaining traceability, but
they require the buy-in and training of most stakeholders and impose restrictions on other tools in
the development process.

4.5.3 Documenting Requirements Elicitation


The results of the requirements elicitation and the analysis activities are documented in the
Requirements Analysis Document (RAD). This document completely describes the system in
terms of functional and nonfunctional requirements. The audience for the RAD includes the client,
the users, the project management, the system analysts (i.e., the developers who participate in the
requirements), and the system designers (i.e., the developers who participate in the system design).
The first part of the document, including use cases and nonfunctional requirements, is written
during requirements elicitation. The formalization of the specification in terms of object models is
written during analysis. Figure 4-16 is an example template for a RAD used in this book.
The first section of the RAD is an Introduction. Its purpose is to provide a brief overview
of the function of the system and the reasons for its development, its scope, and references to the
development context (e.g., reference to the problem statement written by the client, references to
existing systems, feasibility studies). The introduction also includes the objectives and success
criteria of the project.
The second section, Current system, describes the current state of affairs. If the new
system will replace an existing system, this section describes the functionality and the problems
of the current system. Otherwise, this section describes how the tasks supported by the new system
are accomplished now. For example, in the case of SatWatch, the user currently resets her watch
whenever she travels across a time zone. Because of the manual nature of this operation, the user
occasionally sets the wrong time and occasionally neglects to reset it. In contrast, the SatWatch will
continually ensure accurate time within its lifetime. In the case of FRIEND, the current system is
paper based: dispatchers keep track of resource assignments by filling out forms. Communication
between dispatchers and field officers is by radio. The current system requires a high
documentation and management cost that FRIEND aims to reduce.
The third section, Proposed system, documents the requirements elicitation and the
analysis model of the new system. It is divided into four subsections:
• Overview presents a functional overview of the system.
• Functional requirements describe the high-level functionality of the system.
• Nonfunctional requirements describe user-level requirements that are not directly related to
functionality. This includes usability, reliability, performance, supportability, implementation,
interface, operational, packaging, and legal requirements.
• System models describe the scenarios, use cases, object model, and dynamic models for the
system. This section contains the complete functional specification, including mock-ups
illustrating the user interface of the system and navigational paths representing the sequence of
screens. The subsections Object model and Dynamic model are written during the Analysis
activity, described in the next chapter.
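(Figure 4-16 itself is not reproduced in these notes. The outline below is a hedged sketch of a typical RAD
template consistent with the description above, not the exact figure.)
1. Introduction (purpose, scope, objectives and success criteria, references to the development context)
2. Current system
3. Proposed system
3.1 Overview
3.2 Functional requirements
3.3 Nonfunctional requirements (usability, reliability, performance, supportability, implementation,
interface, operations, packaging, legal)
3.4 System models (scenarios, use case model, object model, dynamic model, user interface mock-ups
and navigational paths)
4. Glossary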
The RAD should be written after the use case model is stable, that is, when the number of
modifications to the requirements is minimal. The requirements, however, are updated throughout
the development process when specification problems are discovered or when the scope of the
system is changed. The RAD, once published, is baselined and put under configuration
management. The revision history section of the RAD provides a history of changes, including
the author responsible for each change, the date of the change, and a brief description of the change.

(Note: The ARENA case study appears in Section 4.6, page 152/147, of the textbook
Object-Oriented Software Engineering: Using UML, Patterns, and Java by Bernd Bruegge and Allen H. Dutoit.)

Module 2, Chapter 2
Analysis
Analysis results in a model of the system that aims to be correct, complete, consistent, and
unambiguous. Developers formalize the requirements specification produced during requirements
elicitation and examine in more detail boundary conditions and exceptional cases. Developers
validate, correct and clarify the requirements specification if any errors or ambiguities are found.
The client and the user are usually involved in this activity when the requirements specification
must be changed and when additional information must be gathered.
In object-oriented analysis, developers build a model describing the application domain.
For example, the analysis model of a watch describes how the watch represents time: Does the
watch know about leap years? Does it know about the day of the week? Does it know about the
phases of the moon? The analysis model is then extended to describe how the actors and the system
interact to manipulate the application domain model: How does the watch owner reset the time?
How does the watch owner reset the day of the week? Developers use the analysis model, together
with nonfunctional requirements, to prepare for the architecture of the system developed during
high-level design (Module 3, Chapter 1, System Design: Decomposing the System).
We focus on the identification of objects, their behavior, their relationships, their
classification, and their organization. We describe management issues related to analysis in the
context of a multi-team development project.

5.1 Introduction: An Optical Illusion


Specifications, like multi-stable images, contain ambiguities caused by the inaccuracies inherent
to natural language and by the assumptions of the specification authors. For example, a quantity
specified without a unit is ambiguous (e.g., the “Feet or Miles?” example in Section 4.1), and a time
without a time zone is ambiguous (e.g., when scheduling a phone call between different countries).
Formalization helps identify areas of ambiguity as well as inconsistencies and omissions
in a requirements specification. Once developers identify problems with the specification, they
address them by eliciting more information from the users and the client. Requirements elicitation
and analysis are iterative and incremental activities that occur concurrently.

5.2 An Overview of Analysis


Analysis focuses on producing a model of the system, called the analysis model, which is correct,
complete, consistent, and verifiable. Analysis is different from requirements elicitation in that
developers focus on structuring and formalizing the requirements elicited from users (Figure 5-2).
This formalization leads to new insights and the discovery of errors in the requirements. As the
analysis model may not be understandable to the users and the client, developers need to update
the requirements specification to reflect insights gained during analysis, then review the changes
with the client and the users. In the end, the requirements, however large, should be understandable
by the client and the users.

There is a natural tendency for users and developers to postpone difficult decisions until
later in the project. A decision may be difficult because of lack of domain knowledge, lack of
technological knowledge, or simply because of disagreements among users and developers.
Postponing decisions enables the project to move on smoothly and avoids confrontation with
reality or peers. Unfortunately, difficult decisions eventually must be made, often at higher cost
when intrinsic problems are discovered during testing, or worse, during user evaluation.
Translating a requirements specification into a formal or semiformal model forces developers to
identify and resolve difficult issues early in the development.
The analysis model is composed of three individual models: the functional model,
represented by use cases and scenarios, the analysis object model, represented by class and object
diagrams, and the dynamic model, represented by state machine and sequence diagrams (Figure
5-3). In the previous chapter, we described how to elicit requirements from the users and describe
them as use cases and scenarios. In this chapter, we describe how to refine the functional model
and derive the object and the dynamic model. This leads to a more precise and complete
specification as details are added to the analysis model. We conclude the chapter by describing
management activities related to analysis.

5.3 Analysis Concepts


In this section, we describe the main analysis concepts used in this chapter. In particular, we
describe
• Analysis Object Models and Dynamic Models (Section 5.3.1)
• Entity, Boundary, and Control Objects (Section 5.3.2)
• Generalization and Specialization (Section 5.3.3).

5.3.1 Analysis Object Models and Dynamic Models


The analysis model represents the system under development from the user’s point of view. The
analysis object model is a part of the analysis model and focuses on the individual concepts that
are manipulated by the system, their properties and their relationships. The analysis object model,
depicted with UML class diagrams, includes classes, attributes, and operations. The analysis object
model is a visual dictionary of the main concepts visible to the user. The dynamic model focuses
on the behavior of the system.
The dynamic model is depicted with sequence diagrams and with state machines.
Sequence diagrams represent the interactions among a set of objects during a single use case. State
machines represent the behavior of a single object (or a group of very tightly coupled objects). The
dynamic model serves to assign responsibilities to individual classes and, in the process, to identify
new classes, associations, and attributes to be added to the analysis object model.
When working with either the analysis object model or the dynamic model, it is essential
to remember that these models represent user-level concepts, not actual software classes or
components. For example, classes such as Database, Subsystem, SessionManager, or Network
should not appear in the analysis model, as the user is completely shielded from those concepts.

Note that most classes in the analysis object model will correspond to one or more software classes
in the source code. However, the software classes will include many more attributes and
associations than their analysis counterparts. Consequently, analysis classes should be viewed as
high-level abstractions that will be realized in much more detail later. Figure 5-4 depicts good and
bad examples of analysis objects for the SatWatch example.

5.3.2 Entity, Boundary, and Control Objects


The analysis object model consists of entity, boundary, and control objects [Jacobson et al., 1999].
Entity objects represent the persistent information tracked by the system. Boundary objects
represent the interactions between the actors and the system. Control objects are in charge of
realizing use cases. In the 2Bwatch example, Year, Month, and Day are entity objects; Button and
LCDDisplay are boundary objects; ChangeDateControl is a control object that represents the
activity of changing the date by pressing combinations of buttons.
Modeling the system with entity, boundary, and control objects provides developers with
simple heuristics to distinguish different, but related concepts. For example, the time that is tracked
by a watch has different properties than the display that depicts the time. Differentiating between
boundary and entity objects forces that distinction: The time that is tracked by the watch is
represented by the Time object. The display is represented by the LCDDisplay. This approach with
three object types results in smaller and more specialized objects. The three object-type approach
also leads to models that are more resilient to change: the interface to the system (represented by
the boundary objects) is more likely to change than its basic functionality (represented by the entity
and control objects). By separating the interface from the basic functionality, we are able to keep
most of a model untouched when, for example, the user interface changes, but the entity objects
do not.
To distinguish between different types of objects, UML provides the stereotype mechanism
to enable the developer to attach such meta-information to modeling elements. For example, in
Figure 5-5, we attach the «control» stereotype to the ChangeDateControl object. In addition to
stereotypes, we may also use naming conventions for clarity and recommend distinguishing the
three different types of objects on a syntactical basis: control objects may have the suffix Control
appended to their name; boundary objects may be named to clearly denote an interface feature
(e.g., by including the suffix Form, Button, Display, or Boundary); entity objects usually do not
have any suffix appended to their name. Another benefit of this naming convention is that the type
of the class is represented even when the UML stereotype is not available, for example, when
examining only the source code.
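As an illustration, the following Java sketch shows how the three object types and the naming convention might look once realized in code for the 2Bwatch example. The attributes, operations, and the ChangeDateButton class are assumptions introduced for this sketch, not the textbook's design.

// Entity object: persistent information tracked by the system (no suffix).
// For brevity, the Year, Month, and Day entities are collapsed into one class.
class WatchDate {
    int year;
    int month;
    int day;
}

// Boundary object: depicts information to the actor (interface-related name).
class LCDDisplay {
    void show(WatchDate date) {
        System.out.printf("%04d-%02d-%02d%n", date.year, date.month, date.day);
    }
}

// Boundary object: the button the actor presses.
class ChangeDateButton {
    private final ChangeDateControl control;

    ChangeDateButton(ChangeDateControl control) {
        this.control = control;
    }

    void press() {
        control.advanceDay();   // forward the actor event to the control object
    }
}

// Control object: realizes the change-date use case (Control suffix).
class ChangeDateControl {
    private final WatchDate date;
    private final LCDDisplay display;

    ChangeDateControl(WatchDate date, LCDDisplay display) {
        this.date = date;
        this.display = display;
    }

    void advanceDay() {
        date.day = date.day % 31 + 1;   // simplified date arithmetic for the sketch
        display.show(date);
    }
}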

5.3.3 Generalization and Specialization


When modeling with UML, we use inheritance to organize concepts into hierarchies. At the top of
the hierarchy is a general concept (e.g., an Incident, Figure 5-6), and at the bottom of the hierarchy
are the most specialized concepts (e.g., CatInTree, TrafficAccident, BuildingFire, EarthQuake,
ChemicalLeak). There may be any number of intermediate levels in between, covering more-or-
less generalized concepts (e.g., LowPriorityIncident, Emergency, Disaster). Such hierarchies allow
us to refer to many concepts precisely. When we use the term Incident, we mean all instances of
all types of Incidents. When we use the term Emergency, we only refer to an Incident that requires
an immediate response.
Generalization is the modeling activity that identifies abstract concepts from lower-level
ones. For example, assume we are reverse-engineering an emergency management system and
discover screens for managing traffic accidents and fires. Noticing common features among these
concepts, we create an abstract concept called Emergency to describe the common (and
general) features of traffic accidents and fires.
Specialization is the activity that identifies more specific concepts from a high-level one.
For example, assume that we are building an emergency management system from scratch and that
we are discussing its functionality with the client. The client first introduces us to the concept
of an Incident, then describes three types of Incidents: Disasters, which require the collaboration
of several agencies; Emergencies, which require immediate handling but can be handled by a single
agency; and LowPriorityIncidents, which do not need to be handled if resources are required for
other, higher-priority Incidents.
In both cases, generalization and specialization result in the specification of inheritance
relationships between concepts. In some instances, modelers call inheritance relationships
generalization-specialization relationships. In this book, we use the term “inheritance” to denote
the relationship and the terms “generalization” and “specialization” to denote the activities that
find inheritance relationships.
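Expressed in Java, the Incident hierarchy described above results in the following inheritance relationships. The assignment of the leaf concepts to the intermediate levels is an assumption made for this sketch, and the bodies are empty because, at the analysis level, the classes only name concepts.

abstract class Incident { }

// Intermediate levels of the hierarchy.
class LowPriorityIncident extends Incident { }      // can be deferred
abstract class Emergency extends Incident { }       // requires an immediate response
abstract class Disaster extends Incident { }        // requires several agencies

// Most specialized concepts.
class CatInTree extends LowPriorityIncident { }
class TrafficAccident extends Emergency { }
class BuildingFire extends Emergency { }
class EarthQuake extends Disaster { }
class ChemicalLeak extends Disaster { }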

5.4 Analysis Activities: From Use Cases to Objects


In this section, we describe the activities that transform the use cases and scenarios produced
during requirements elicitation into an analysis model. Analysis activities include:
• Identifying Entity Objects (Section 5.4.1)
• Identifying Boundary Objects (Section 5.4.2)
• Identifying Control Objects (Section 5.4.3)
• Mapping Use Cases to Objects with Sequence Diagrams (Section 5.4.4)
• Modeling Interactions among Objects with CRC Cards (Section 5.4.5)
• Identifying Associations (Section 5.4.6)
• Identifying Aggregates (Section 5.4.7)
• Identifying Attributes (Section 5.4.8)
• Modeling State-Dependent Behavior of Individual Objects (Section 5.4.9)
• Modeling Inheritance Relationships (Section 5.4.10)
• Reviewing the Analysis Model (Section 5.4.11).
We illustrate each activity by focusing on the ReportEmergency use case of FRIEND
described in Module 2, Chapter 1, Requirements Elicitation. These activities are guided by
heuristics. The quality of their outcome depends on the experience of the developer in applying
these heuristics and methods. The methods and heuristics presented in this section are adapted
from [De Marco, 1978], [Jacobson et al., 1999], [Rumbaugh et al., 1991], and [Wirfs-Brock et al.,
1990].

5.4.1 Identifying Entity Objects


Participating objects (see Section 4.4.6) form the basis of the analysis model. As described in
Module 2, Chapter 1, Requirements Elicitation, participating objects are found by examining
each use case and identifying candidate objects. Natural language analysis [Abbott, 1983] is an
intuitive set of heuristics for identifying objects, attributes, and associations from a requirements
specification. Abbott’s heuristics map parts of speech (e.g., nouns, having verbs, being verbs,
adjectives) to model components (e.g., objects, operations, inheritance relationships, classes).
Table 5-1 provides examples of such mappings by examining the ReportEmergency use case
(Figure 5-7).
Natural language analysis has the advantage of focusing on the users’ terms. However, it
suffers from several limitations. First, the quality of the object model depends highly on the style
of writing of the analyst (e.g., consistency of terms used, verbification of nouns). Natural language
is an imprecise tool, and an object model derived literally from text risks being imprecise.
Developers can address this limitation by rephrasing and clarifying the requirements specification
as they identify and standardize objects and terms. A second limitation of natural language analysis
is that there are many more nouns than relevant classes. Many nouns correspond to attributes or
synonyms for other nouns. Sorting through all the nouns for a large requirements specification is
a time-consuming activity. In general, Abbott’s heuristics work well for generating a list of initial
candidate objects from short descriptions, such as the flow of events of a scenario or a use case.

The following heuristics can be used in conjunction with Abbott’s heuristics:

Developers name and briefly describe the objects, their attributes, and their responsibilities
as they are identified. Uniquely naming objects promotes a standard terminology. For entity objects
we recommend always starting with the names used by end users and application domain
specialists. Describing objects, even briefly, allows developers to clarify the concepts they use and
avoid misunderstandings (e.g., using one object for two different but related concepts). Developers
need not, however, spend a lot of time detailing objects or attributes given that the analysis model
is still in flux. Developers should document attributes and responsibilities if they are not obvious;
a tentative name and a brief description for each object is sufficient otherwise. There will be plenty
of iterations during which objects can be revised. However, once the analysis model is stable, the
description of each object should be as detailed as necessary (see Section 5.4.11).

5.4.2 Identifying Boundary Objects


Boundary objects represent the system interface with the actors. In each use case, each actor
interacts with at least one boundary object. The boundary object collects the information from the
actor and translates it into a form that can be used by both entity and control objects.

Boundary objects model the user interface at a coarse level. They do not describe in detail the
visual aspects of the user interface. For example, boundary objects such as “menu item” or “scroll
bar” are too detailed. First, developers can discuss user interface details more easily with sketches
and mock-ups. Second, the design of the user interface continues to evolve as a consequence of
usability tests, even after the functional specification of the system becomes stable. Updating the
analysis model for every user interface change is time consuming and does not yield any
substantial benefit.

We have made progress toward describing the system. We now have included the interface
between the actor and the system. We are, however, still missing some significant pieces of the
description, such as the order in which the interactions between the actors and the system occur.
In the next section, we describe the identification of control objects.

5.4.3 Identifying Control Objects


Control objects are responsible for coordinating boundary and entity objects. Control objects
usually do not have a concrete counterpart in the real world. Often a close relationship exists
between a use case and a control object; a control object is usually created at the beginning of a
use case and ceases to exist at its end. It is responsible for collecting information from the boundary
objects and dispatching it to entity objects. For example, control objects describe the behavior
associated with the sequencing of forms, undo and history queues, and dispatching information in
a distributed system.

In the next section, we construct a sequence diagram using the ReportEmergency use case and the
objects we discovered to ensure the completeness of our model.

5.4.4 Mapping Use Cases to Objects with Sequence Diagrams


A sequence diagram ties use cases with objects. It shows how the behavior of a use case (or
scenario) is distributed among its participating objects. Sequence diagrams are usually not as good
a medium for communication with the user as use cases are, since sequence diagrams require more
background about the notation. For computer-savvy clients, they are intuitive and can be more
precise than use cases. In all cases, however, sequence diagrams represent another shift in
perspective and allow the developers to find missing objects or grey areas in the requirements
specification.
In this section, we model the sequence of interactions among objects needed to realize the
use case. Figures 5-8 through 5-10 are sequence diagrams associated with the ReportEmergency
use case. The columns of a sequence diagram represent the objects that participate in the use case.
The left-most column is the actor who initiates the use case. Horizontal arrows across columns
represent messages, or stimuli, that are sent from one object to the other. Time proceeds vertically
from top to bottom. For example, the first arrow in Figure 5-8 represents the press message sent
by a FieldOfficer to a ReportEmergencyButton. The receipt of a message triggers the activation
of an operation. The activation is represented by a vertical rectangle from which other messages
can originate. The length of the rectangle represents the time the operation is active. In Figure 5-
8, the operation triggered by the press message sends a create message to the
ReportEmergencyControl class. An operation can be thought of as a service that the object
provides to other objects. Sequence diagrams also depict the lifetime of objects. Objects that
already exist before the first stimuli in the sequence diagram are depicted at the top of the diagram.
Objects that are created during the interaction are depicted with the «create» message pointing to
the object. Instances that are destroyed during the interaction have a cross indicating when the
object ceases to exist. Between the rectangle representing the object and the cross (or the bottom
of the diagram, if the object survives the interaction), a dashed line represents the time span when
the object can receive messages. The object cannot receive messages below the cross sign. For
example, in Figure 5-8 an object of class ReportEmergencyForm is created when an object of
ReportEmergencyControl sends the «create» message, and it is destroyed once the
ReportEmergencyForm has been submitted.
In general, the second column of a sequence diagram represents the boundary object with
which the actor interacts to initiate the use case (e.g., ReportEmergencyButton). The third column
is a control object that manages the rest of the use case (e.g., ReportEmergencyControl). From
then on, the control object creates other boundary objects and may interact with other control
objects as well (e.g., ManageEmergencyControl).
By constructing sequence diagrams, we not only model the order of the interaction among
the objects, we also distribute the behavior of the use case. That is, we assign responsibilities to
each object in the form of a set of operations. These operations can be shared by any use case in
which a given object participates. Note that the definition of an object that is shared across two or
more use cases should be identical; that is, if an operation appears in more than one sequence
diagram, its behavior should be the same.

Sharing operations across use cases allows developers to remove redundancies in the
requirements specification and to improve its consistency. Note that clarity should always be given
precedence over eliminating redundancy. Fragmenting behavior across many operations
unnecessarily complicates the requirements specification.
In analysis, sequence diagrams are used to help identify new participating objects and
missing behavior. Because sequence diagrams focus on high-level behavior, implementation issues
such as performance should not be addressed at this point. Given that building interaction diagrams
can be time consuming, developers should focus on problematic or underspecified functionality
first. Drawing interaction diagrams for parts of the system that are simple or well defined might
not look like a good investment of analysis resources, but doing so can help avoid
overlooking some key decisions.
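The following Java sketch suggests how the first messages of the ReportEmergency sequence diagram could eventually map to operations: the actor's press on the boundary object creates the control object, which creates the form, and the control object disposes of the form once the report is submitted. The method signatures and the report content are assumptions for illustration only.

class ReportEmergencyButton {
    // Invoked when the FieldOfficer presses the button (first message).
    void press() {
        ReportEmergencyControl control = new ReportEmergencyControl();   // «create»
        control.start();
    }
}

class ReportEmergencyControl {
    private ReportEmergencyForm form;

    void start() {
        form = new ReportEmergencyForm(this);   // «create» the form boundary object
    }

    // Called by the form once the FieldOfficer submits it.
    void submitReport(String description) {
        System.out.println("Forwarding emergency report: " + description);
        form = null;   // the form ceases to exist after submission
    }
}

class ReportEmergencyForm {
    private final ReportEmergencyControl control;

    ReportEmergencyForm(ReportEmergencyControl control) {
        this.control = control;
    }

    void submit(String description) {
        control.submitReport(description);
    }
}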

5.4.5 Modeling Interactions among Objects with CRC Cards


An alternative technique for identifying interactions among objects is CRC cards [Beck & Cunningham,
1989]. CRC cards (CRC stands for class, responsibilities, and collaborators) were initially
introduced as a tool for teaching object-oriented concepts to novices and to experienced developers
unfamiliar with object-orientation. Each class is represented with an index card (called the CRC
card). The name of the class is depicted on the top, its responsibilities in the left column, and the
names of the classes it needs to accomplish its responsibilities are depicted in the right column.
Figure 5-12 depicts two cards for the ReportEmergencyControl and the Incident classes.

CRC cards can be used during modeling sessions with teams. Participants, typically a mix
of developers and application domain experts, go through a scenario and identify the classes that
are involved in realizing the scenario. One card per instance is put on the table. Responsibilities
are then assigned to each class as the scenario unfolds and participants negotiate the
responsibilities of each object. The collaborators column is filled as the dependencies with other
cards are identified. Cards are modified or pushed to the side as new alternatives are explored.
Cards are never thrown away, because building blocks for past alternatives can be reused when
new ideas are put on the table.
CRC cards and sequence diagrams are two different representations for supporting the
same type of activity. Sequence diagrams are a better tool for a single modeler or for documenting
a sequence of interactions, because they are more precise and compact. CRC cards are a better tool
for a group of developers refining and iterating over an object structure during a brainstorming
session, because they are easier to create and to modify.

5.4.6 Identifying Associations


An association shows a relationship between two or more classes. For example, a FieldOfficer
writes an EmergencyReport (see Figure 5-13). Identifying associations has two advantages. First,
it clarifies the analysis model by making relationships between objects explicit (e.g., an
EmergencyReport can be created by a FieldOfficer or a Dispatcher). Second, it enables the
developer to discover boundary cases associated with links. Boundary cases are exceptions that
must be clarified in the model. For example, it is intuitive to assume that most EmergencyReports
are written by one FieldOfficer. However, should the system support EmergencyReports written
by more than one? Should the system allow for anonymous EmergencyReports? Those questions
should be investigated during analysis by discussing them with the client or with end users.
Associations have several properties:
• A name to describe the association between the two classes (e.g., Writes in Figure 5-13).
Association names are optional and need not be unique globally.
• A role at each end, identifying the function of each class with respect to the associations (e.g.,
author is the role played by FieldOfficer in the Writes association).
• A multiplicity at each end, identifying the possible number of instances (e.g., * indicates a
FieldOfficer may write zero or more EmergencyReports, whereas 1 indicates that each
EmergencyReport has exactly one FieldOfficer as author).

Initially, the associations between entity objects are the most important, as they reveal more
information about the application domain. According to Abbott’s heuristics (see Table 5-1),
associations can be identified by examining verbs and verb phrases denoting a state (e.g., has, is
part of, manages, reports to, is triggered by, is contained in, talks to, includes). Every association
should be named, and roles should be assigned to each end.
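As a hedged illustration, the Writes association of Figure 5-13 could later be realized in Java as shown below, with the role name becoming a field on the 1 side and a collection on the * side. The attributes and the write operation are assumptions for this sketch, not the design the textbook later derives.

import java.util.ArrayList;
import java.util.List;

class FieldOfficer {
    // * side of Writes: a FieldOfficer may write zero or more EmergencyReports.
    private final List<EmergencyReport> reportsWritten = new ArrayList<>();

    EmergencyReport write(String description) {
        EmergencyReport report = new EmergencyReport(this, description);
        reportsWritten.add(report);
        return report;
    }
}

class EmergencyReport {
    private final FieldOfficer author;    // role name "author" on the 1 side
    private final String description;

    EmergencyReport(FieldOfficer author, String description) {
        this.author = author;
        this.description = description;
    }
}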

The object model will initially include too many associations if developers include all
associations identified after examining verb phrases. In Figure 5-14, for example, we identify two
relationships: the first between an Incident and the EmergencyReport that triggered its creation;
the second between the Incident and the reporting FieldOfficer. Given that the EmergencyReport
and FieldOfficer already have an association modeling authorship, the association between
Incident and FieldOfficer is not necessary. Adding unnecessary associations complicates the
model, leading to incomprehensible models and redundant information.
Most entity objects have an identifying characteristic used by the actors to access them.
FieldOfficers and Dispatchers have a badge number. Incidents and Reports are assigned numbers
and are archived by date. Once the analysis model includes most classes and associations, the
developers should go through each class and check how it is identified by the actors and in which
context. For example, are FieldOfficer badge numbers unique across the universe? Across a city?
A Police station? If they are unique across cities, can the FRIEND system know about
FieldOfficers from more than one city? This approach can be formalized by examining each
individual class and identifying the sequence of associations that need to be traversed to access a
specific instance of that class.

5.4.7 Identifying Aggregates


Aggregations are special types of associations denoting a whole–part relationship. For example,
a FireStation consists of a number of FireFighters, FireEngines, Ambulances, and a LeadCar. A
State is composed of a number of Counties that are, in turn, composed of a number of Townships
(Figure 5-15). An aggregation is shown as an association with a diamond on the side of the whole.
There are two types of aggregation, composition and shared. A solid diamond denotes
composition. A composition aggregation indicates that the existence of the parts depends on the
whole. For example, a County is always part of exactly one State, a Township is always part of a
County. As political boundaries do not change often, a Township will not be part of or shared with
another County (at least, not within the lifetime of the emergency response system).
A hollow diamond denotes a shared aggregation relationship, indicating the whole and
the part can exist independently. For example, although a FireEngine is part of at most one
FireStation at a time, it can be reassigned to a different FireStation during its lifetime.

Aggregation associations are used in the analysis model to denote whole–part concepts.
Aggregation associations add information to the analysis model about how containment concepts
in the application domain can be organized in a hierarchy or in a directed graph. Aggregations are
often used in the user interface to help the user browse through many instances. For example, in
Figure 5-15, FRIEND could offer a tree representation for Dispatchers to find Counties within a
State or Townships within a specific County. However, as with many modeling concepts, it is easy
to over-structure the model. If you are not sure that the association you are describing is a whole–
part concept, it is better to model it as a one-to-many association, and revisit it later when you have
a better understanding of the application domain.

5.4.8 Identifying Attributes


Attributes are properties of individual objects. For example, an EmergencyReport, as described
in Table 5-2, has an emergency type, a location, and a description property (see Figure 5-16). These
are entered by a FieldOfficer when she reports an emergency and are subsequently tracked by the
system. When identifying properties of objects, only the attributes relevant to the system should
be considered. For example, each FieldOfficer has a social security number that is not relevant to
the emergency information system. Instead, FieldOfficers are identified by badge number, which
is represented by the badgeNumber property.

Properties that are represented by objects are not attributes. For example, every
EmergencyReport has an author that is represented by an association to the FieldOfficer class.
Developers should identify as many associations as possible before identifying attributes to avoid
confusing attributes and objects. Attributes have:
• A name identifying them within an object. For example, an EmergencyReport may have a
reportType attribute and an emergencyType attribute. The reportType describes the kind of report
being filed (e.g., initial report, request for resource, final report). The emergencyType describes
the type of emergency (e.g., fire, traffic, other). To avoid confusion, these attributes should not
both be called type.
• A brief description.
• A type describing the legal values it can take. For example, the description attribute of an
EmergencyReport is a string. The emergencyType attribute is an enumeration that can take one of
three values: fire, traffic, other. Attribute types are based on predefined basic types in UML.
Attributes can be identified using Abbott’s heuristics (see Table 5-1). In particular, a noun
phrase followed by a possessive phrase (e.g., the description of an emergency) or an adjective
phrase (e.g., the emergency description) should be examined. In the case of entity objects, any
property that must be stored by the system is a candidate attribute.
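Rendered in Java, the EmergencyReport attributes discussed above might look as follows. The enumeration literals and the exact attribute set are assumptions based on the examples in the text.

class EmergencyReport {
    enum EmergencyType { FIRE, TRAFFIC, OTHER }          // legal emergency values
    enum ReportType { INITIAL, RESOURCE_REQUEST, FINAL } // kind of report being filed

    private ReportType reportType;        // distinct from emergencyType to avoid confusion
    private EmergencyType emergencyType;
    private String location;
    private String description;           // free-text description entered by the FieldOfficer
}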

Note that attributes represent the least stable part of the object model. Often, attributes are
discovered or added late in the development when the system is evaluated by the users. Unless the
added attributes are associated with additional functionality, the added attributes do not entail
major changes in the object (and system) structure. For these reasons, the developers need not
spend excessive resources in identifying and detailing attributes that represent less important
aspects of the system. These attributes can be added later when the analysis model or the user
interface sketches are validated.

5.4.9 Modeling State-Dependent Behavior of Individual Objects


Sequence diagrams are used to distribute behavior across objects and to identify operations.
Sequence diagrams represent the behavior of the system from the perspective of a single use case.
State machine diagrams represent behavior from the perspective of a single object. Viewing
behavior from the perspective of each object enables the developer to build a more formal
description of the behavior of the object, and consequently, to identify missing use cases. By
focusing on individual states, developers may identify new behavior. For example, by examining
each transition in the state machine diagram that is triggered by a user action, the developer should
be able to identify a flow step in a use case that describes the actor action that triggers the transition.
Note that it is not necessary to build state machines for every class in the system. Only objects
with an extended lifespan and state-dependent behavior are worth considering. This is almost
always the case for control objects, less often for entity objects, and almost never for boundary
objects.
Figure 5-17 displays a state machine for the Incident class. The examination of this state
machine may help the developer to check if there are use cases for documenting, closing, and
archiving Incidents. By further refining each state, the developer can add detail to the different
user actions that change the state of an incident. For example, during the Active state of an Incident,
FieldOfficers should be able to request new resources, and Dispatchers should be able to allocate
resources to existing incidents.
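A minimal Java sketch of such state-dependent behavior for the Incident class is shown below, assuming a simple state set (Active, Documented, Closed, Archived) consistent with the user actions mentioned above; the actual states are defined by Figure 5-17 and may differ.

class Incident {
    enum State { ACTIVE, DOCUMENTED, CLOSED, ARCHIVED }

    private State state = State.ACTIVE;

    void document() {
        if (state == State.ACTIVE) {
            state = State.DOCUMENTED;
        }
    }

    void close() {
        if (state == State.DOCUMENTED) {
            state = State.CLOSED;
        }
    }

    void archive() {
        if (state == State.CLOSED) {
            state = State.ARCHIVED;
        }
    }

    // Requesting resources is only allowed while the incident is active.
    void requestResource(String resource) {
        if (state != State.ACTIVE) {
            throw new IllegalStateException("Resources can only be requested for active incidents");
        }
        System.out.println("Requested " + resource);
    }
}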

5.4.11 Reviewing the Analysis Model


The analysis model is built incrementally and iteratively. The analysis model is seldom correct or
even complete on the first pass. Several iterations with the client and the user are necessary before
the analysis model converges toward a correct specification usable by the developers for design
and implementation. For example, an omission discovered during analysis will lead to adding or
extending a use case in the requirements specification, which may lead to eliciting more
information from the user.
Once the number of changes to the model is minimal and the scope of the changes is
localized, the analysis model becomes stable. Then the analysis model is reviewed, first by the
developers (i.e., internal reviews), then jointly by the developers and the client. The goal of the
review is to make sure that the requirements specification is correct, complete, consistent, and
unambiguous. Moreover, developers and client also review if the requirements are realistic and
verifiable. Note that developers should be prepared to discover errors downstream and make
changes to the specification. It is, however, a worthwhile investment to catch as many requirements
errors upstream as possible. The review can be facilitated by a checklist or a list of questions.

5.4.12 Analysis Summary


The requirements elicitation activity is highly iterative and incremental. Chunks of functionality
are sketched and proposed to the users and the client. The client adds requirements, criticizes
existing functionality, and modifies existing requirements. The developers investigate
nonfunctional requirements through prototyping and technology studies and challenge each
proposed requirement. Initially, requirements elicitation resembles a brainstorming activity. As the
description of the system grows and the requirements become more concrete, developers need to
extend and modify the analysis model in a more orderly manner to manage the complexity of
information.
Figure 5-19 depicts a typical sequence of the analysis activities. The users, developers, and
client are involved in developing an initial use case model. They identify a number of concepts
and build a glossary of participating objects. These first two activities were discussed in the
previous chapter. The remaining activities were covered in this section. The developers classify
the participating objects into entity, boundary, and control objects (in Define entity objects,
Section 5.4.1, Define boundary objects, Section 5.4.2, and Define control objects, Section
5.4.3). These activities occur in a tight loop until most of the functionality of the system has been
identified as use cases with names and brief descriptions. Then the developers construct sequence
diagrams to identify any missing objects (Define interactions, Section 5.4.4). When all entity
objects have been named and briefly described, the analysis model should remain fairly stable as
it is refined.
Define associations (Section 5.4.6), Define attributes (Section 5.4.8), and Define state-dependent
behavior (Section 5.4.9) constitute the refinement of the analysis model. These three
activities occur in a tight loop during which the state of the objects and their associations are
extracted from the sequence diagrams and detailed. The use cases are then modified to account for
any changes in functionality. This phase may lead to the identification of an additional chunk of
functionality in the form of additional use cases. The overall process is then repeated incrementally
for these new use cases.
During Consolidate model (Section 5.4.10), the developers solidify the model by
introducing qualifiers and generalization relationships and suppressing redundancies. During
Review model (Section 5.4.11), the client, users, and developers examine the model for
correctness, consistency, completeness, and realism. The project schedule should plan for multiple
reviews to ensure high-quality requirements and to provide opportunities to learn the requirements
activity. However, once the model reaches the point where most modifications are cosmetic,
system design should proceed. There will come a point during requirements where no more
problems can be anticipated without further information from prototyping, usability studies,
technology surveys, or system design. Getting every detail right becomes a wasteful exercise: some
of these details will become irrelevant by the next change. Management should recognize this point
and initiate the next phase in the project.

5.5 Managing Analysis


In this section, we discuss issues related to managing the analysis activities in a multi-team
development project. The primary challenge in managing the requirements in such a project is to
maintain consistency while using so many resources. In the end, the requirements analysis
document should describe a single coherent system understandable to a single person.
We first describe a document template that can be used to document the results of analysis
(Section 5.5.1). Next, we describe the role assignment to analysis (Section 5.5.2). We then address
communication issues during analysis (Section 5.5.3). Next, we address management issues related to the iterative
and incremental nature of requirements (Section 5.5.4).
5.5.1 Documenting Analysis
As we saw in the previous chapter, the requirements elicitation and analysis activities are
documented in the Requirements Analysis Document (RAD, Figure 5-20). RAD Sections 1
through 3.5.2 have already been written during requirements elicitation. During analysis, we revise
these sections as ambiguities and new functionality are discovered. The main effort, however,
focuses on writing the sections documenting the analysis object model (RAD Sections 3.5.3 and
3.5.4).
RAD Section 3.5.3, Object models, documents in detail all the objects we identified, their
attributes, and, when we used sequence diagrams, operations. Whereas each object is described with
textual definitions, relationships among objects are illustrated with class diagrams.
RAD Section 3.5.4, Dynamic models, documents the behavior of the object model in terms
of state machine diagrams and sequence diagrams. Although this information is redundant with
the use case model, dynamic models enable us to represent more precisely complex behaviors,
including use cases involving many actors.
The RAD, once completed and published, will be baselined and put under configuration
management. The revision history section of the RAD provides a history of changes, including
the author responsible for each change, the date of the change, and a brief description of the change.

5.5.2 Assigning Responsibilities


Analysis requires the participation of a wide range of individuals. The target user provides
application domain knowledge. The client funds the project and coordinates the user side of the
effort. The analyst elicits application domain knowledge and formalizes it. Developers provide
feedback on feasibility and cost. The project manager coordinates the effort on the development
side. For large systems, many users, analysts, and developers may be involved, introducing
additional challenges during for integration and communication requirements of the project. These
challenges can be met by assigning well-defined roles and scopes to individuals. There are three
main types of roles: generation of information, integration, and review.

• The end user is the application domain expert who generates information about the current
system, the environment of the future system, and the tasks it should support. Each user
corresponds to one or more actors and helps identify their associated use cases.
• The client, an integration role, defines the scope of the system based on user requirements.
Different users may have different views of the system, either because they will benefit from
different parts of the system (e.g., a dispatcher vs. a field officer) or because the users have different
opinions or expectations about the future system. The client serves as an integrator of application
domain information and resolves inconsistencies in user expectations.
• The analyst is the application domain expert who models the current system and generates
information about the future system. Each analyst is initially responsible for detailing one or more
use cases. For a given set of use cases, the analyst identifies a number of objects, their associations,
and their attributes using the techniques outlined in Section 5.4. The analyst is typically a developer
with broad application domain knowledge.
• The architect, an integration role, unifies the use case and object models from a system point of
view. Different analysts may have different styles of modeling and different views of the parts of
the systems for which they are not responsible. Although analysts work together and will most
likely resolve differences as they progress through analysis, the role of the architect is necessary
to provide a system philosophy and to identify omissions in the requirements.
• The document editor is responsible for the low-level integration of the document and for the
overall format of the document and its index.
• The configuration manager is responsible for maintaining a revision history of the document
as well as traceability information relating the RAD with other documents (such as the System
Design Document; see Chapter 6, System Design: Decomposing the System).
• The reviewer validates the RAD for correctness, completeness, consistency, and clarity. Users,
clients, developers, or other individuals may become reviewers during requirements validation.
Individuals who have not yet been involved in the development make excellent reviewers,
because they are more able to identify ambiguities and areas that need clarification.
The size of the system determines the number of different users and analysts that are needed
to elicit and model the requirements. In all cases, there should be one integrating role on the client
side and one on the development side. In the end, the requirements, however large the system,
should be understandable by a single individual knowledgeable in the application domain.
5.5.3 Communicating about Analysis
The task of communicating information is most challenging during requirements elicitation and
analysis. Contributing factors include
• Different backgrounds of participants. Users, clients, and developers have different domains
of expertise and use different vocabularies to describe the same concepts.
• Different expectations of stakeholders. Users, clients, and management have different
objectives when defining the system. Users want a system that supports their current work
processes, with no interference or threat to their current position (e.g., an improved system often
translates into the elimination of current positions). The client wants to maximize return on
investment. Management wants to deliver the system on time. Different expectations and different
stakes in the project can lead to a reluctance to share information and to report problems in a timely
manner.
• New teams. Requirements elicitation and analysis often marks the beginning of a new project.
This translates into new participants and new team assignments, and, thus, into a ramp-up period
during which team members must learn to work together.
• Evolving system. When a new system is developed from scratch, terms and concepts related to
the new system are in flux during most of the analysis and the system design. A term may have a
different meaning tomorrow.
No requirements method or communication mechanism can address problems related to
internal politics and information hiding. Conflicting objectives and competition will always be part
of large development projects. A few simple guidelines, however, can help in managing the
complexity of conflicting views of the system:
• Define clear territories. Defining roles as described in Section 5.5.2 is part of this activity. This
also includes the definition of private and public discussion forums. For example, each team may
have a discussion database as described in Chapter 3, Project Organization and
Communication, and discussion with the client is done on a separate client database. The client
should not have access to the internal database. Similarly, developers should not interfere with
client/user internal politics.
• Define clear objectives and success criteria. The co-definition of clear, measurable, and
verifiable objectives and success criteria by both the client and the developers facilitates the
resolution of conflicts. Note that defining a clear and verifiable objective is a nontrivial task, given
that it is easier to leave objectives open-ended. The objectives and the success criteria of the project
should be documented in Section 1.3 of the RAD.
• Brainstorm. Putting all the stakeholders in the same room to quickly generate solutions and
definitions can remove many barriers in communication. Conducting reviews as a reciprocal
activity (i.e., reviewing deliverables from both the client and the developers during the same
session) has a similar effect.
Brainstorming, and more generally the cooperative development of requirements, can lead
to the definition of shared, ad hoc notations for supporting the communication. Storyboards, user
interface sketches, and high-level dataflow diagrams often appear spontaneously. As the
information about the application domain and the new system accrue, it is critical that a precise
and structured notation be used. In UML, developers employ use cases and scenarios for
communicating with the client and the users, and use object diagrams, sequence diagrams, and
state machines to communicate with other developers (see Sections 4.4 and 5.4). Moreover, the
latest release of the requirements should be available to all participants. Maintaining a live online
version of the requirements analysis document with an up-to-date change history facilitates the
timely propagation of changes across the project.

5.5.4 Iterating over the Analysis Model


Analysis occurs iteratively and incrementally, often in parallel with other development activities
such as system design and implementation. Note, however, that the unrestricted modification and
extension of the analysis model can only result in chaos, especially when a large number of
participants are involved. Iterations and increments must be carefully managed and requests for
changes tracked once the requirements are baselined. The requirements activity can be viewed as
several steps (brainstorming, solidification, maturity) converging toward a stable model.
Brainstorming
Before any other development activity is initiated, the requirements activity is a brainstorming process.
Everything—concepts and the terms used to refer to them—changes. The objective of a
brainstorming process is to generate as many ideas as possible without necessarily organizing
them. During this stage, iterations are rapid and far reaching.
Solidification
Once the client and the developers converge on a common idea, define the boundaries of
the system, and agree on a set of standard terms, solidification starts. Functionality is organized
into groups of use cases with their corresponding interfaces. Groups of functionality are allocated
to different teams that are responsible for detailing their corresponding use cases. During this stage,
iterations are rapid but localized.
Maturity
Changes at the higher level are still possible but more difficult, and thus, are made more
carefully. Each team is responsible for the use cases and object models related to the functionality
they have been assigned. A cross-functional team, the architecture team, made of representatives
of each team, is responsible for ensuring the integration of the requirements (e.g., naming).
Once the client signs off on the requirements, modification to the analysis model should
address omissions and errors. Developers, in particular the architecture team, need to ensure that
the consistency of the model is not compromised. The requirements model is under configuration
management and changes should be propagated to existing design models. Iterations are slow and
often localized.
The number of features and functions of a system will always increase with time. Each
change, however, can threaten the integrity of the system. The risk of introducing more problems
with late changes results from the loss of information in the project. The dependencies across
functions are not all captured; many assumptions may be implicit and forgotten by the time the
change is made.
When changes are necessary, the client and developer define the scope of the change and
its desired outcome and change the analysis model. Given that a complete analysis model exists
for the system, specifying new functionality is easier (although implementing it is more difficult).
5.5.5 Client Sign-Off
The client sign-off represents the acceptance of the analysis model (as documented by the
requirements analysis document) by the client. The client and the developers converge on a single
idea and agree about the functions and features that the system will have. In addition, they agree
on:
• a list of priorities
• a revision process
• a list of criteria that will be used to accept or reject the system
• a schedule and a budget.
Prioritizing system functions allows the developers to understand better the client’s
expectations. In its simplest form, it allows developers to separate bells and whistles from essential
features. It also allows developers to deliver the system in incremental chunks: essential functions
are delivered first; additional chunks are delivered depending on the evaluation of the previous
chunk. Even if the system is to be delivered as a single, complete package, prioritizing functions
enables the client to communicate clearly what is important to her and where the emphasis of the
development should be. Figure 5-21 provides an example of a priority scheme.

After the client sign-off, the requirements are baselined and are used for refining the cost
estimate of the project. Requirements continue to change after the sign-off, but these changes are
subject to a more formal revision process. The requirements change, whether because of errors,
omissions, changes in the operating environment, changes in the application domain, or changes
in technology. Defining a revision process up front encourages changes to be communicated across
the project and reduces the number of surprises in the later stages. Note that a change process need
not be bureaucratic or require excessive overhead. It can be as simple as naming a person
responsible for receiving change requests, approving changes, and tracking their implementation.
Figure 5-22 depicts a more complex example in which changes are designed and reviewed
by the client before they are implemented in the system. In all cases, acknowledging that
requirements cannot be frozen (but only baselined) will benefit the project.
The list of acceptance criteria is revised prior to sign-off. The requirements elicitation and
analysis activity clarify many aspects of the system, including the nonfunctional requirements with
which the system should comply and the relative importance of each function. By restating the
acceptance criteria at sign-off, the client ensures that the developers are updated about any changes
in client expectations.
The budget and schedule are revisited after the analysis model becomes stable.
Whether the client sign-off is a contractual agreement or whether the project is already
governed by a prior contract, it is an important milestone in the project. It represents the
convergence of client and developer on a single set of functional definitions of the system and a
single set of expectations. The acceptance of the requirements analysis document is more critical
than any other document, given that many activities depend on the analysis model.

(Note: The ARENA case study appears in Section 5.6 (page 205/200) of the textbook
Object-Oriented Software Engineering Using UML, Patterns, and Java by Bernd Bruegge and Allen H. Dutoit.)

Module 3, Chapter 1
System Design: Decomposing the System
System design is the transformation of an analysis model into a system design model. During
system design, developers define the design goals of the project and decompose the system into
smaller subsystems that can be realized by individual teams. Developers also select strategies for
building the system, such as the hardware/software strategy, the persistent data management
strategy, the global control flow, the access control policy, and the handling of boundary
conditions. The result of system design is a model that includes a subsystem decomposition and a
clear description of each of these strategies.
System design is not algorithmic. Developers have to make trade-offs among many design
goals that often conflict with each other. They also cannot anticipate all design issues that they will
face because they do not yet have a clear picture of the solution domain. System design is
decomposed into several activities, each addressing part of the overall problem of decomposing
the system:
• Identify design goals. Developers identify and prioritize the qualities of the system that
they should optimize.
• Design the initial subsystem decomposition. Developers decompose the system into
smaller parts based on the use case and analysis models. Developers use standard architectural
styles as a starting point during this activity.
• Refine the subsystem decomposition to address the design goals. The initial
decomposition usually does not satisfy all design goals. Developers refine it until all goals are
satisfied.
6.2 An Overview of System Design
Analysis results in the requirements model described by the following products:
• a set of nonfunctional requirements and constraints, such as maximum response time,
minimum throughput, reliability, operating system platform, and so on
• a use case model, describing the system functionality from the actors’ point of view
• an object model, describing the entities manipulated by the system
• a sequence diagram for each use case, showing the sequence of interactions among objects
participating in the use case.
The analysis model describes the system completely from the actors’ point of view and
serves as the basis of communication between the client and the developers. The analysis model,
however, does not contain information about the internal structure of the system, its hardware
configuration, or more generally, how the system should be realized. System design is the first step
in this direction. System design results in the following products:
• design goals, describing the qualities of the system that developers should optimize
• software architecture, describing the subsystem decomposition in terms of subsystem
responsibilities, dependencies among subsystems, subsystem mapping to hardware, and major
policy decisions such as control flow, access control, and data storage
• boundary use cases, describing the system configuration, startup, shutdown, and
exception handling issues.
The design goals are derived from the nonfunctional requirements. Design goals guide the
decisions to be made by the developers when trade-offs are needed. The subsystem decomposition
constitutes the bulk of system design. Developers divide the system into manageable pieces to deal
with complexity: each subsystem is assigned to a team and realized independently. For this to be
possible, developers need to address system-wide issues when decomposing the system. In this
chapter, we describe the concept of subsystem decomposition and discuss examples of generic
system decompositions called “architectural styles.” In the next chapter, we describe how the
system decomposition is refined to meet specific design goals. Figure 6-2 depicts the relationship
of system design with other software engineering activities.

6.3 System Design Concepts


In this section, we describe subsystem decompositions and their properties in more detail. First,
we define the concept of subsystem and its relationship to classes (Section 6.3.1). Next, we look
at the interface of subsystems (Section 6.3.2): subsystems provide services to other subsystems. A
service is a set of related operations that share a common purpose. During system design, we
define the subsystems in terms of the services they provide. Later, during object design, we define
the subsystem interface in terms of the operations it provides. Next, we look at two properties of
subsystems, coupling and cohesion (Section 6.3.3). Coupling measures the dependencies between
two subsystems, whereas cohesion measures the dependencies among classes within a subsystem.
Ideal subsystem decomposition should minimize coupling and maximize cohesion. Then, we look
at layering and partitioning, two techniques for relating subsystems to each other (Section 6.3.4).
Layering allows a system to be organized as a hierarchy of subsystems, each providing higher-
level services to the subsystem above it by using lower-level services from the subsystems below
it. Partitioning organizes subsystems as peers that mutually provide different services to each
other. In Section 6.3.5, we describe a number of typical software architectures that are found in
practice.
6.3.1 Subsystems and Classes
In Chapter 2, Modeling with UML, we introduced the distinction between application domain and
solution domain. In order to reduce the complexity of the application domain, we identified smaller
parts called “classes” and organized them into packages. Similarly, to reduce the complexity of the
solution domain, we decompose a system into simpler parts, called “subsystems,” which are made
of a number of solution domain classes. A subsystem is a replaceable part of the system with well-
defined interfaces that encapsulates the state and behavior of its contained classes. A subsystem
typically corresponds to the amount of work that a single developer or a single development team
can tackle. By decomposing the system into relatively independent subsystems, concurrent teams
can work on individual subsystems with minimal communication overhead. In the case of complex
subsystems, we recursively apply this principle and decompose a subsystem into simpler
subsystems (see Figure 6-3).

This subsystem decomposition is depicted in Figure 6-4 using UML components.


Components are depicted as rectangles with the component icon in the upper right corner.
Dependencies among components can be depicted with dashed stick arrows. In UML, components
can represent both logical and physical components. A logical component corresponds to a
subsystem that has no explicit run-time equivalent, for example, individual business components
that are composed together into a single run-time application logic layer. A physical component
corresponds to a subsystem that has an explicit run-time equivalent, for example, a database server.
For example, the accident management system we previously described can be
decomposed into a DispatcherInterface subsystem, realizing the user interface for the Dispatcher;
a FieldOfficerInterface subsystem, realizing the user interface for the FieldOfficer; an
IncidentManagement subsystem, responsible for the creation, modification, and storage of
Incidents; a ResourceManagement subsystem, responsible for tracking available Resources (e.g.,
FireTrucks and Ambulances); a MapManagement for depicting Maps and Locations; and a
Notification subsystem, implementing the communication between FieldOfficer terminals and Dispatcher stations.

Several programming languages (e.g., Java and Modula-2) provide constructs for modeling
subsystems (packages in Java, modules in Modula-2). In other languages, such as C or C++,
subsystems are not explicitly modeled, so developers use conventions for grouping classes (e.g., a
subsystem can be represented as a directory containing all the files that implement the subsystem).
Whether or not subsystems are explicitly represented in the programming language, developers
need to document carefully the subsystem decomposition as subsystems are usually realized by
different teams.
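For example, in Java the FRIEND subsystem decomposition could be reflected as one package per subsystem; the package names and the directory layout below are assumptions derived from the subsystem names in the text.

// One possible Java mapping of the subsystem decomposition:
//
//   friend/dispatcherinterface/    DispatcherInterface subsystem
//   friend/fieldofficerinterface/  FieldOfficerInterface subsystem
//   friend/incidentmanagement/     IncidentManagement subsystem
//   friend/resourcemanagement/     ResourceManagement subsystem
//   friend/mapmanagement/          MapManagement subsystem
//   friend/notification/           Notification subsystem

// File: friend/incidentmanagement/Incident.java
package friend.incidentmanagement;

public class Incident {
    // entity class owned by the IncidentManagement subsystem
}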
6.3.2 Services and Subsystem Interfaces
A subsystem is characterized by the services it provides to other subsystems. A service is a set of
related operations that share a common purpose. A subsystem providing a notification service, for
example, defines operations to send notices, look up notification channels, and subscribe and
unsubscribe to a channel. The set of operations of a subsystem that are available to other
subsystems form the subsystem interface. The subsystem interface includes the name of the
operations, their parameters, their types, and their return values. System design focuses on defining
the services provided by each subsystem, that is, enumerating the operations, their parameters, and
their high-level behavior. Object design will focus on the application programmer interface
(API), which refines and extends the subsystem interfaces. The API also includes the type of the
parameters and the return value of each operation.
Provided and required interfaces can be depicted in UML with assembly connectors, also
called ball-and-socket connectors. The provided interface is shown as a ball icon (also called
lollipop) with its name next to it. A required interface is shown as a socket icon. The dependency
between two subsystems is shown by connecting the corresponding ball and socket in the
component diagram.

Figure 6-5 depicts the dependencies among the FieldOfficerInterface, DispatcherInterface, and ResourceManagement subsystems. The FieldOfficerInterface requires the ResourceUpdateService to update the status and location of the FieldOfficer. The DispatcherInterface requires the ResourceAllocationService to identify available resources and allocate them to new Incidents. The ResourceManagement subsystem provides both services.
Note that we use the ball-and-socket notation when the subsystem decomposition is already fairly
stable and that our focus has shifted from the identification of subsystems to the definition of
services. During the early stages of system design, we may not have such a clear understanding of
the allocation of functionality to subsystems, in which case we use the dependency notation
(dashed arrows) of Figure 6-4.
6.3.3 Coupling and Cohesion
Coupling is the number of dependencies between two subsystems. If two subsystems are loosely
coupled, they are relatively independent, so modifications to one of the subsystems will have little
impact on the other. If two subsystems are strongly coupled, modifications to one subsystem are likely to have an impact on the other. A desirable property of a subsystem decomposition is that
subsystems are as loosely coupled as reasonable. This minimizes the impact that errors or future
changes in one subsystem have on other subsystems.
Cohesion is the number of dependencies within a subsystem. If a subsystem contains many
objects that are related to each other and perform similar tasks, its cohesion is high. If a subsystem
contains a number of unrelated objects, its cohesion is low. A desirable property of a subsystem
decomposition is that it leads to subsystems with high cohesion.
6.3.4 Layers and Partitions
A hierarchical decomposition of a system yields an ordered set of layers. A layer is a grouping
of subsystems providing related services, possibly realized using services from another layer.
Layers are ordered in that each layer can depend only on lower level layers and has no knowledge
of the layers above it. The layer that does not depend on any other layer is called the bottom layer,
and the layer that is not used by any other is called the top layer (Figure 6-9). In a closed architecture, each layer can access only the layer immediately below it. In an open architecture, a layer can also access layers at deeper levels.

An example of a closed architecture is the Reference Model of Open Systems Interconnection (in short, the OSI model), which is composed of seven layers [Day &
Zimmermann, 1983]. Each layer is responsible for performing a well-defined function. In addition,
each layer provides its services by using services of the layer below (Figure 6-10).

The Physical layer represents the hardware interface to the network. It is responsible for
transmitting bits over a communication channel. The DataLink layer is responsible for transmitting
data frames without error using the services of the Physical layer. The Network layer is responsible
for transmitting and routing packets within a network. The Transport layer is responsible for
ensuring that the data are reliably transmitted from end to end. The Transport layer is the interface
Unix programmers see when transmitting information over TCP/IP sockets between two processes.
The Session layer is responsible for initializing and authenticating a connection. The Presentation
layer performs data transformation services, such as byte swapping and encryption. The
Application layer is the system you are designing (unless you are building an operating system or
protocol stack). The Application layer can also consist of layered subsystems.
An example of an open architecture is the Swing user interface toolkit for Java [JFC, 2009].
The lowest layer is provided by the operating system or by a windowing system, such as X11, and
provides basic window management. AWT is an abstract window interface provided by Java to
shield applications from specific window platforms. Swing is a library of user interface objects
that provides a wide range of facilities, from buttons to geometry management. An application
usually accesses only the Swing interface. However, the Application layer may bypass the Swing
layer and directly access AWT. In general, the openness of the architecture allows developers to
bypass the higher layers to address performance bottlenecks (Figure 6-12).
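As a small, hypothetical illustration of this openness, an application built on Swing can still call the AWT layer directly, for example to query the screen size from the underlying toolkit:

import java.awt.Dimension;
import java.awt.Toolkit;
import javax.swing.JButton;
import javax.swing.JFrame;

public class OpenArchitectureDemo {
    public static void main(String[] args) {
        // Normal path: the application talks only to the Swing layer.
        JFrame frame = new JFrame("MyTrip");
        frame.add(new JButton("Plan trip"));

        // Open architecture: bypass Swing and call the AWT layer directly.
        Dimension screen = Toolkit.getDefaultToolkit().getScreenSize();
        frame.setSize(screen.width / 2, screen.height / 2);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setVisible(true);
    }
}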

Another approach to dealing with complexity is to partition the system into peer
subsystems, each responsible for a different class of services. For example, an onboard system for
a car could be decomposed into a travel service that provides real-time directions to the driver, an
individual preferences service that remembers a driver’s seat position and favorite radio station,
and vehicle service that tracks the car’s gas consumption, repairs, and scheduled maintenance.
Each subsystem depends loosely on the others, but can often operate in isolation.

6.3.5 Architectural Styles


As the complexity of systems increases, the specification of system decomposition is critical. It is
difficult to modify or correct weak decomposition once development has started, as most
subsystem interfaces would have to change. In recognition of the importance of this problem, the
concept of software architecture has emerged. A software architecture includes system
decomposition, global control flow, handling of boundary conditions, and inter-subsystem
communication protocols [Shaw & Garlan, 1996].
Repository
In the repository architectural style (see Figure 6-13), subsystems access and modify a
single data structure called the central repository. Subsystems are relatively independent and
interact only through the repository. Control flow can be dictated either by the central repository
(e.g., triggers on the data invoke peripheral systems) or by the subsystems (e.g., independent flow
of control and synchronization through locks in the repository).
Repositories are typically used for database management systems, such as a payroll system
or a bank system. The central location of the data makes it easier to deal with concurrency and
integrity issues between subsystems.

Compilers and software development environments also follow a repository architectural style (Figure 6-14). The different subsystems of a compiler access and update a central parse tree
and a symbol table. Debuggers and syntax editors access the symbol table as well. The repository
subsystem can also be used for implementing the global control flow. In the compiler example of
Figure 6-14, each individual tool (e.g., the compiler, the debugger, and the editor) is invoked by
the user.

The repository only ensures that concurrent accesses are serialized. Conversely, the
repository can be used to invoke the subsystems based on the state of the central data structure.
These systems are called “blackboard systems.” The HEARSAY II speech understanding system
[Erman et al., 1980], one of the first blackboard systems, invoked tools based on the current state
of the blackboard.

Repositories are well suited for applications with constantly changing, complex data
processing tasks. Once a central repository is well defined, we can easily add new services in the
form of additional subsystems. The main disadvantage of repository systems is that the central
repository can quickly become a bottleneck, both from a performance aspect and a modifiability
aspect. The coupling between each subsystem and the repository is high, thus making it difficult
to change the repository without having an impact on all subsystems.
Model/View/Controller
In the Model/View/Controller (MVC) architectural style (Figure 6-15), subsystems are
classified into three different types: model subsystems maintain domain knowledge, view
subsystems display it to the user, and controller subsystems manage the sequence of interactions
with the user. The model subsystems are developed such that they do not depend on any view or
controller subsystem. Changes in their state are propagated to the view subsystem via a
subscribe/notify protocol. The MVC is a special case of the repository where Model implements
the central data structure and control objects dictate the control flow.

The subscription and notification functionality associated with this sequence of events is
usually realized with an Observer design pattern (see Section A.7). The Observer design pattern
allows the Model and the View objects to be further decoupled by removing direct dependencies
from the Model to the View. For more details, the reader is referred to [Gamma et al., 1994] and
to Section A.7.
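A minimal Java sketch of this subscribe/notify protocol follows; class and method names are illustrative assumptions. The point is that the Model knows only an abstract Observer interface and never a concrete View:

import java.util.ArrayList;
import java.util.List;

// Abstract subscription interface: the only thing the Model knows about views.
interface Observer {
    void update(Counter model);
}

// Model subsystem: maintains domain state and notifies subscribers on change.
class Counter {
    private int value;
    private final List<Observer> observers = new ArrayList<>();

    void addObserver(Observer o) { observers.add(o); }

    void increment() {
        value++;
        for (Observer o : observers) {
            o.update(this);   // notify: views pull the new state
        }
    }

    int getValue() { return value; }
}

// View subsystem: displays the model; the Model has no dependency on it.
class ConsoleView implements Observer {
    @Override
    public void update(Counter model) {
        System.out.println("Counter is now " + model.getValue());
    }
}

public class MvcSketch {
    public static void main(String[] args) {
        Counter model = new Counter();
        model.addObserver(new ConsoleView());
        model.increment();   // a controller would normally trigger this change
    }
}

With this decoupling, a new view (for example, a Unix-style shell view) can be added by implementing Observer, without modifying the model subsystem.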
The rationale behind the separation of Model, View, and Controller is that user interfaces,
i.e., the View and the Controller, are much more often subject to change than is domain knowledge,
i.e., the Model. Moreover, by removing any dependency from the Model on the View with the
subscription/notification protocol, changes in the views (user interfaces) do not have any effect on
the model subsystems. In the example of Figure 6-16, we could add a Unix-style shell view of the
file system without having to modify the file system. We described a similar decomposition in
Chapter 5, Analysis, when we identified entity, boundary, and control objects. This
decomposition is also motivated by the same considerations about change.
MVC is well suited for interactive systems, especially when multiple views of the same
model are needed. MVC can be used for maintaining consistency across distributed data; however,
it introduces the same performance bottleneck as for other repository styles.

Client/server
In the client/server architectural style (Figure 6-18), a subsystem, the server, provides
services to instances of other subsystems called the clients, which are responsible for interacting
with the user. The request for a service is usually done via a remote procedure call mechanism or
a common object broker (e.g., CORBA, Java RMI, or HTTP). Control flow in the clients and the
servers is independent except for synchronization to manage requests or to receive results.

An information system with a central database is an example of a client/server architectural style. The clients are responsible for receiving inputs from the user, performing range checks, and
initiating database transactions when all necessary data are collected. The server is then responsible
for performing the transaction and guaranteeing the integrity of the data. In this case, a client/server
architectural style is a special case of the repository architectural style in which the central data
structure is managed by a process. Client/server systems, however, are not restricted to a single
server. On the World Wide Web, a single client can easily access data from thousands of different
servers. Client/server architectural styles are well suited for distributed systems that manage large
amounts of data.
Peer-to-peer
A peer-to-peer architectural style (see Figure 6-20) is a generalization of the client/ server
architectural style in which subsystems can act both as client or as servers, in the sense that each
subsystem can request and provide services. The control flow within each subsystem is
independent from the others except for synchronizations on requests.
An example of a peer-to-peer architectural style is a database that both accepts requests
from the application and notifies to the application whenever certain data are changed (Figure 6-
21). Peer-to-peer systems are more difficult to design than client/server systems because they
introduce the possibility of deadlocks and complicate the control flow.

Callbacks are operations that are temporary and customized for a specific purpose. For
example, a DBUser peer in Figure 6-21 can tell the DBMS peer which operation to invoke upon a
change notification. The DBMS then uses the callback operation specified by each DBUser to notify it when a change occurs. Peer-to-peer systems in which a “server” peer invokes “client”
peers only through callbacks are often referred to as client/server systems, even though this is
inaccurate since the “server” can also initiate the control flow.
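A hedged Java sketch of this callback mechanism follows; the DBMS and DBUser names mirror Figure 6-21, but the operations shown are assumptions made for illustration:

import java.util.ArrayList;
import java.util.List;

// Callback interface: the "server" peer invokes the "client" peer through it.
interface ChangeCallback {
    void onChange(String tableName);
}

// DBMS peer: accepts requests and also initiates control flow via callbacks.
class Dbms {
    private final List<ChangeCallback> callbacks = new ArrayList<>();

    void registerCallback(ChangeCallback callback) {
        callbacks.add(callback);
    }

    void update(String tableName) {
        // ... perform the update, then notify every registered peer
        for (ChangeCallback c : callbacks) {
            c.onChange(tableName);
        }
    }
}

// DBUser peer: requests services and supplies the operation to be called back.
public class DbUser {
    public static void main(String[] args) {
        Dbms dbms = new Dbms();
        dbms.registerCallback(table ->
                System.out.println("Change notification for " + table));
        dbms.update("trips");
    }
}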

Three-tier
The three-tier architectural style organizes subsystems into three layers (Figure 6-22):
• The interface layer includes all boundary objects that deal with the user, including
windows, forms, web pages, and so on.
• The application logic layer includes all control and entity objects, realizing the
processing, rule checking, and notification required by the application.
• The storage layer realizes the storage, retrieval, and query of persistent objects.

The three-tier architectural style was initially described in the 1970s for information
systems. The storage layer, an analog to the Repository subsystem in the repository architectural
style, can be shared by several different applications operating on the same data. In turn, the
separation between the interface layer and the application logic layer enables the development or
modification of different user interfaces for the same application logic.

Four-tier
The four-tier architectural style is a three-tier architecture in which the Interface layer is
decomposed into a Presentation Client layer and a Presentation Server layer (Figure 6-23). The
Presentation Client layer is located on the user machines, whereas the Presentation Server layer
can be located on one or more servers. The four-tier architecture enables a wide range of different
presentation clients in the application, while reusing some of the presentation objects across
clients. For example, a banking information system can include a host of different clients, such as
a Web browser interface for home users, an Automated Teller Machine, and an application client
for bank employees. Forms shared by all three clients can then be defined and processed in the
Presentation Server layer, thus removing redundancy across clients.

Pipe and filter


In the pipe and filter architectural style (Figure 6-24), subsystems process data received
from a set of inputs and send results to other subsystems via a set of outputs. The subsystems are
called “filters,” and the associations between the subsystems are called “pipes.” Each filter knows
only the content and the format of the data received on the input pipes, not the filters that produced
them. Each filter is executed concurrently, and synchronization is accomplished via the pipes. The
pipe and filter architectural style is modifiable: filters can be substituted for others or reconfigured
to achieve a different purpose.
The best-known example of a pipe and filter architectural style is the Unix shell [Ritchie
& Thompson, 1974]. Most filters are written such that they read their input and write their results
on standard pipes. This enables a Unix user to combine them in many different ways. Figure 6-25
shows an example made of four filters. The output of ps (process status) is fed into grep (search
for a pattern) to remove all the processes that are not owned by a specific user. The output of grep
(i.e., the processes owned by the user) is then sorted by sort and sent to more, which is a filter that
displays its input to a terminal, one screen at a time.
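The same composition idea can be hinted at in Java. This is a hypothetical, simplified sketch: real pipe-and-filter systems run filters concurrently and synchronize through the pipes, whereas here the filters are composed sequentially for brevity. Each filter knows only the format of its input lines, not which filter produced them:

import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class PipeAndFilterSketch {

    // A filter transforms a list of lines into another list of lines.
    interface Filter extends UnaryOperator<List<String>> {}

    public static void main(String[] args) {
        // grep-like filter: keep only lines owned by "alice"
        Filter grep = lines -> lines.stream()
                .filter(line -> line.contains("alice"))
                .collect(Collectors.toList());

        // sort-like filter: order lines alphabetically
        Filter sort = lines -> lines.stream()
                .sorted()
                .collect(Collectors.toList());

        List<String> input = List.of("bob 42 sh", "alice 17 vi", "alice 99 cc");

        // The "pipe" is simply the composition of independent filters.
        List<String> output = sort.apply(grep.apply(input));
        output.forEach(System.out::println);
    }
}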

Pipe and filter styles are suited for systems that apply transformations to streams of data
without intervention by users. They are not suited for systems that require more complex
interactions between components, such as an information management system or an interactive
system.

6.4 System Design Activities: From Objects to Subsystems


Using MyTrip, a driver can plan a trip from a home computer by contacting a trip-planning service
on the Web (PlanTrip in Figure 6-26). The trip is saved for later retrieval on the server. The trip-
planning service must support more than one driver.

We perform the analysis for the MyTrip system following the techniques outlined in Chapter 5,
Analysis, and obtain the model in Figure 6-28.

In addition, during requirements elicitation, our client specified the following nonfunctional requirements for MyTrip:

6.4.2 Identifying Design Goals


The definition of design goals is the first step of system design. It identifies the qualities that our
system should focus on. Many design goals can be inferred from the nonfunctional requirements
or from the application domain. Others will have to be elicited from the client. It is, however,
necessary to state them explicitly such that every important design decision can be made
consistently following the same set of criteria.

For example, in the light of the nonfunctional requirements for MyTrip described in Section
6.4.1, we identify reliability and fault tolerance to connectivity loss as design goals. We then
identify security as a design goal, as numerous drivers will have access to the same trip planning
server. We add modifiability as a design goal, as we want to provide the ability for drivers to select
a trip planning service of their choice. The following box summarizes the design goals we
identified.

In general, we can select design goals from a long list of highly desirable qualities. Tables
6-2 through 6-6 list a number of possible design criteria. These criteria are organized into five
groups: performance, dependability, cost, maintenance, and end user criteria. Performance,
dependability, and end user criteria are usually specified in the requirements or inferred from the
application domain. Cost and maintenance criteria are dictated by the customer and the supplier.
Performance criteria (Table 6-2) include the speed and space requirements imposed on
the system.

Dependability criteria (Table 6-3) determine how much effort should be expended in
minimizing system crashes and their consequences.

Cost criteria (Table 6-4) include the cost to develop the system, to deploy it, and to
administer it. Note that cost criteria not only include design considerations but managerial ones,
as well. When the system is replacing an older one, the cost of ensuring backward compatibility
or transitioning to the new system has to be taken into account. There are also trade-offs between
different types of costs such as development cost, end user training cost, transition costs, and
maintenance costs. Maintaining backward compatibility with a previous system can add to the
development cost while reducing the transition cost.

Maintenance criteria (Table 6-5) determine how difficult it is to change the system after
deployment.

End user criteria (Table 6-6) include qualities that are desirable from a user’s point of
view, but have not yet been covered under the performance and dependability criteria. Often these
criteria do not receive much attention, especially when the client contracting the system is different
from its users.

When defining design goals, only a small subset of these criteria can be simultaneously
taken into account. It is, for example, unrealistic to develop software that is safe, secure, and cheap.
Typically, developers need to prioritize design goals and trade them off against each other as well
as against managerial goals as the project runs behind schedule or over budget. Table 6-7 lists
several possible trade-offs.

Managerial goals can be traded off against technical goals (e.g., delivery time vs.
functionality). Once we have a clear idea of the design goals, we can proceed to design an initial
subsystem decomposition.
6.4.3 Identifying Subsystems
Finding subsystems during system design is similar to finding objects during analysis. For
example, some of the object identification techniques we described in Chapter 5, Analysis, such
as Abbotts’s heuristics, are applicable to subsystem identification. Moreover, subsystem
decomposition is constantly revised whenever new issues are addressed: several subsystems are
merged into one subsystem, a complex subsystem is split into parts, and some subsystems are
added to address new functionality. The first iterations over subsystem decomposition can
introduce drastic changes in the system design model. These are often best handled through
brainstorming.
Another heuristic for subsystem identification is to keep functionally related objects
together. A starting point is to assign the participating objects that have been identified in each use
case to the subsystems. Some groups of objects, such as the Trip group in MyTrip, are shared and used
for communicating information from one subsystem to another. We can either create a new
subsystem to accommodate them or assign them to the subsystem that creates these objects.

Encapsulating subsystems with the Facade design pattern


Subsystem decomposition reduces the complexity of the solution domain by minimizing
coupling among subsystems. The Facade design pattern (see Appendix A.6 and [Gamma et al.,
1994]) allows us to further reduce dependencies between classes by encapsulating a subsystem
with a simple, unified interface. For example, in Figure 6-30, the Compiler class is a façade hiding
the classes CodeGenerator, Optimizer, ParseNode, Parser, and Lexer. The façade provides access
only to the public services offered by the subsystem and hides all other details, effectively reducing
coupling between subsystems. Subsystems identified during the initial subsystem decomposition
often result from grouping several functionally related classes. These subsystems are good
candidates for the Facade design pattern and should be encapsulated under one class.
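A minimal Java sketch of the Compiler façade of Figure 6-30 follows; the internal responsibilities and method bodies are assumptions made purely for illustration. Only the façade class is public; the contained classes stay hidden inside the subsystem:

// Internal subsystem classes; clients of the subsystem never see them directly.
class Lexer {
    java.util.List<String> tokenize(String source) {
        return java.util.Arrays.asList(source.split("\\s+"));
    }
}

class Parser {
    ParseNode parse(java.util.List<String> tokens) {
        return new ParseNode(tokens);
    }
}

class ParseNode {
    final java.util.List<String> tokens;
    ParseNode(java.util.List<String> tokens) { this.tokens = tokens; }
}

class Optimizer {
    ParseNode optimize(ParseNode tree) { return tree; }
}

class CodeGenerator {
    String generate(ParseNode tree) {
        return "code(" + tree.tokens.size() + " tokens)";
    }
}

// Facade: the only public service of the compiler subsystem.
public class Compiler {
    public String compile(String source) {
        ParseNode tree = new Parser().parse(new Lexer().tokenize(source));
        return new CodeGenerator().generate(new Optimizer().optimize(tree));
    }
}

A client calls only new Compiler().compile(source), so the internal classes can be restructured without affecting other subsystems.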

Module 3, Chapter 2
System Design: Addressing Design Goals
During system design, we identify design goals, decompose the system into subsystems, and refine
the subsystem decomposition until all design goals are addressed. In the previous chapter, we
described the concepts of design goals and system decomposition. In this chapter, we introduce
the system design activities that address the design goals. In particular, we examine
• Selection of off-the-shelf and legacy components. Off-the-shelf or legacy components realize
specific subsystems more economically. The initial subsystem decomposition is adjusted to
accommodate them.
• Mapping of subsystem to hardware. When the system is deployed on several nodes, additional
subsystems are required for addressing reliability or performance issues.
• Design of a persistent data management infrastructure. Managing state that outlives a
single execution of the system has an impact on overall system performance and leads to the
identification of one or more storage subsystems.
• Specification of an access control policy. Shared objects are protected so that user access to
them is controlled. Access control impacts how objects are distributed within subsystems.
• Design of the global control flow. Determining the sequence of operations impacts the interface
of the subsystems.
• Handling of boundary conditions. Once all subsystems have been identified, developers decide
on the order in which individual components are started and shut down.
We then describe the management issues related to system design, such as documentation,
responsibilities, and communication.
7.2 An Overview of System Design Activities
Design goals guide the decisions to be made by the developers especially when trade-offs are
needed. Developers divide the system into manageable pieces to deal with complexity: each
subsystem is assigned to a team and realized independently. In order for this to be possible, though,
developers need to address system-wide issues when decomposing the system. In particular, they
need to address the following issues:
• Hardware/software mapping: What is the hardware configuration of the system? Which node
is responsible for which functionality? How is communication between nodes realized? Which
services are realized using existing software components? How are these components
encapsulated? Addressing hardware/software mapping issues often leads to the definition of
additional subsystems dedicated to moving data from one node to another, dealing with
concurrency, and reliability issues. Off-the-shelf components enable developers to realize complex
services more economically. User interface packages and database management systems are prime
examples of off-the shelf components. Components, however, should be encapsulated to minimize
dependency on a particular component; a competing vendor may offer a better product in the
future, and you want the option to switch.

• Data management: Which data should be persistent? Where should persistent data be stored?
How are they accessed? Persistent data represents a bottleneck in the system on many different
fronts: most functionality in a system is concerned with creating or manipulating persistent data. For
this reason, access to the data should be fast and reliable. If retrieving data is slow, the whole
system will be slow. If data corruption is likely, complete system failure is likely. These issues
must be addressed consistently at the system level. Often, this leads to the selection of a database
management system and of an additional subsystem dedicated to the management of persistent
data.
• Access control: Who can access which data? Can access control change dynamically? How is
access control specified and realized? Access control and security are system wide issues. The
access control must be consistent across the system; in other words, the policy used to specify who
can and cannot access certain data should be the same across all subsystems.
• Control flow: How does the system sequence operations? Is the system event driven? Can it
handle more than one user interaction at a time? The choice of control flow has an impact on the
interfaces of subsystems. If an event-driven control flow is selected, subsystems will provide event
handlers. If threads are selected, subsystems must guarantee mutual exclusion in critical sections.
• Boundary conditions: How is the system initialized and shut down? How are exceptional cases
handled? System initialization and shutdown often represent much of the complexity of a system,
especially in a distributed environment. Initialization, shutdown, and exception handling have an
impact on the interface of all subsystems.
Figure 7-1 depicts the activities of system design. Each activity addresses one of the issues
we described above. Addressing any one of these issues can lead to changes in the subsystem decomposition and can raise new issues.

7.3 Concepts: UML Deployment Diagrams


UML deployment diagrams are used to depict the relationship among run-time components and
nodes. Components are self-contained entities that provide services to other components or actors.
A Web server, for example, is a component that provides services to Web browsers. A Web browser
such as Safari is a component that provides services to a user. A node is a physical device or an
execution environment in which components are executed. A system is composed of interacting
run-time components that can be distributed among several nodes. Furthermore, a node can contain
another node, for example, a device can contain an execution environment.
In UML deployment diagrams, nodes are represented by boxes containing component
icons. Nodes can be stereotyped to denote physical devices or execution environments.
Communication paths between nodes are represented by solid lines. The protocol used by two
nodes to communicate can be indicated with a stereotype on the communication path. Figure 7-2
depicts an example of a deployment diagram with two Web browsers accessing a Web server. The
Web server in turn accesses a database server. We can see from the diagram that the Web browsers
do not directly access the database at any time.

The deployment diagram in Figure 7-2 focuses on the allocation of components to nodes
and provides a high-level view of each component. Components can be refined to include
information about the interfaces they provide and the classes they contain. Figure 7-3 illustrates
the WebServer component and its containing classes.

7.4 System Design Activities: Addressing Design Goals


In this section, we describe the activities needed to ensure that subsystem decomposition addresses
all the nonfunctional requirements and can account for any constraints during the implementation
phase. Here, we refine the subsystem decomposition by
• Mapping Subsystems to Processors and Components (Section 7.4.1)
• Identifying and Storing Persistent Data (Section 7.4.2)
• Providing Access Control (Section 7.4.3)
• Designing the Global Control Flow (Section 7.4.4)
• Identifying Services (Section 7.4.5)
• Identifying Boundary Conditions (Section 7.4.6)
• Reviewing the System Design Model (Section 7.4.7).
7.4.1 Mapping Subsystems to Processors and Components
Selecting a hardware configuration and a platform
Many systems run on more than one computer and depend on access to an intranet or to
the Internet. The use of multiple computers can address high-performance needs and interconnect
multiple distributed users. Consequently, we need to examine carefully the allocation of
subsystems to computers and the design of the infrastructure for supporting communication
between subsystems. These computers are modeled as nodes in UML deployment diagrams.
Because the hardware mapping activity has significant impact on the performance and complexity
of the system, we perform it early in system design.
Selecting a hardware configuration also includes selecting a virtual machine onto which
the system should be built. The virtual machine includes the operating system and any software
components that are needed, such as a database management system or a communication package.
The selection of a virtual machine reduces the distance between the system and the hardware
platform on which it will run. The more functionality the components provide, the less
development work is involved. The selection of the virtual machine, however, may be constrained
by a client who acquires hardware before the start of the project. The selection of a virtual machine
may also be constrained by cost considerations: it can be difficult to estimate whether building a
component costs more than buying it.
We select a Unix machine as the virtual machine for the :WebServer, and the Web browsers Safari and Internet Explorer as the virtual machines for the :OnBoardComputer.

Allocating objects and subsystems to nodes


Once the hardware configuration has been defined and the virtual machines selected,
objects and subsystems are assigned to nodes. This often triggers the identification of new objects
and subsystems for transporting data among the nodes.
In general, allocating subsystems to hardware nodes enables us to distribute functionality
and processing power where it is most needed. Unfortunately, it also introduces issues related to
storing, transferring, replicating, and synchronizing data among subsystems. For this reason,
developers also select the components they will use for developing the system.

7.4.2 Identifying and Storing Persistent Data


Persistent data outlive a single execution of the system. For example, at the end of the day, an
author saves his work into a file on a word processor. The file can then be reopened later. The word
processor need not run for the file to exist. Similarly, information related to employees, their
employment status, and their paychecks live in a database management system. This allows all the
programs that operate on employee data to do so consistently. Moreover, storing data in a database
enables the system to perform complex queries on a large data set (e.g., the records of several
thousand employees).

Where and how data is stored in the system affects system decomposition. In some cases,
for example, in a repository architectural style (see Section 6.3.5), a subsystem can be completely
dedicated to the storage of data. The selection of a specific database management system can also
have implications on the overall control strategy and concurrency management.
Identifying persistent objects
First, we identify which data must be persistent. The entity objects identified during
analysis are obvious candidates for persistency. In general, we can identify persistent objects by
examining all the classes that must survive system shutdown, either in case of a controlled
shutdown or an unexpected crash. The system will then restore these long-lived objects by
retrieving their attributes from storage during system initialization or on demand as the persistent
objects are needed.
Selecting a storage management strategy
Once all persistent objects are identified, we need to decide how these objects should be
stored. The decision for storage management is more complex and is usually dictated by
nonfunctional requirements: Should the objects be retrieved quickly? Must the system perform
complex queries to retrieve these objects? Do objects require a lot of memory or disk space? In
general, there are currently three options for storage management:
• Flat files. Files are the storage abstractions provided by operating systems. The
application stores its data as a sequence of bytes and defines how and when data should be
retrieved. The file abstraction is relatively low level and enables the application to perform a
variety of size and speed optimizations. Files, however, require the application to take care of many
issues, such as concurrent access and loss of data in case of a system crash (see the sketch after this list).
• Relational database. A relational database provides data abstraction at a higher level
than flat files. Data are stored in tables that comply with a predefined type called a schema. Each
column in the table represents an attribute. Each row represents a data item as a tuple of attribute
values. Several tuples in different tables are used to represent the attributes of an individual object.
Mapping complex object models to a relational schema is challenging. Specialized methods, such
as [Blaha & Premerlani, 1998], provide a systematic way of performing this mapping. Relational
databases also provide services for concurrency management, access control, and crash recovery.
Relational databases have been used for a while and are a mature technology. Although scalable
and ideal for large data sets, they are relatively slow for small data sets and for unstructured data
(e.g., images, natural language text).
• Object-oriented database. An object-oriented database provides services similar to a
relational database. Unlike a relational database, it stores data as objects and associations. In
addition to providing a higher level of abstraction (and thus reducing the need to translate between
objects and storage entities), object-oriented databases provide developers with inheritance and
abstract data types. Object-oriented databases significantly reduce the time for the initial
development of the storage subsystem. However, they are slower than relational databases for
typical queries and are more difficult to tune.
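As a hedged illustration of the flat-file option (the Trip class and its field are assumptions, not part of the original model), Java object serialization can write an entity object to a file at shutdown and read it back at the next start-up:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Persistent entity object identified during analysis.
class Trip implements Serializable {
    private static final long serialVersionUID = 1L;
    final String destination;
    Trip(String destination) { this.destination = destination; }
}

public class FlatFileStore {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        File file = new File("trip.dat");

        // Save: the application decides how and when the bytes are written.
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(new Trip("Munich"));
        }

        // Restore on a later execution; concurrency and crash recovery
        // remain the application's responsibility with this strategy.
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            Trip restored = (Trip) in.readObject();
            System.out.println("Restored trip to " + restored.destination);
        }
    }
}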

7.4.3 Providing Access Control


In multi-user systems, different actors have access to different functionality and data. For example,
an everyday actor may only access the data it creates, whereas a system administrator actor may
have unlimited access to system data and to other users’ data. During analysis, we modeled these
distinctions by associating different use cases to different actors. During system design, we model
access by determining which objects are shared among actors, and by defining how actors can
control access. Depending on the security requirements of the system, we also define how actors
are authenticated to the system (i.e., how actors prove to the system who they are) and how selected
data in the system should be encrypted.

Defining access control for a multi-user system is usually more complex than in MyTrip.
In general, we need to define for each actor which operations they can access on each shared object.
For example, a bank teller may post credits and debits up to a predefined amount. If the transaction
exceeds the predefined amount, a manager must approve the transaction. Managers can examine
the branch statistics; but cannot access the statistics of other branches. Analysts can access
information across all branches of the corporation, but cannot post transactions on individual
accounts. We model access on classes with an access matrix. The rows of the matrix represent the
actors of the system. The columns represent classes whose access we control. An entry (class,
actor) in the access matrix is called an access right and lists the operations (e.g., postSmallDebit(),
postLargeDebit(), examineGlobalStats()) that can be executed on instances of the class by the
actor.
We can represent the access matrix using one of three different approaches: global access
table, access control list, and capabilities.
• A global access table represents explicitly every cell in the matrix as a (actor,class,
operation) tuple. Determining if an actor has access to a specific object requires looking up the
corresponding tuple. If no such tuple is found, access is denied.
• An access control list associates a list of (actor,operation) pairs with each class to be
accessed. Empty cells are discarded. Every time an object is accessed, its access list is checked for
the corresponding actor and operation (a code sketch of this approach follows the list below). An example of an access control list is the guest list for a
party. A butler checks the arriving guests by comparing their names against names on the guest
list. If there is a match, the guests can enter; otherwise, they are turned away.
• A capability associates a (class,operation) pair with an actor. A capability allows an actor
access to an object of the class described in the capability. Denying a capability is equivalent to
denying access. An example of a capability is an invitation card for a party. In this case, the butler
checks if the arriving guests hold an invitation for the party. If the invitation is valid, the guests are
admitted; otherwise, they are turned away. No other checks are necessary.
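A minimal sketch of the access control list representation in Java; the actor names and operations are illustrative and follow the bank example above:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// One access control list per protected class: (actor, operation) pairs.
public class AccessControlList {
    private final Map<String, Set<String>> allowedOperations = new HashMap<>();

    public void grant(String actor, String operation) {
        allowedOperations.computeIfAbsent(actor, a -> new HashSet<>()).add(operation);
    }

    public boolean isAllowed(String actor, String operation) {
        return allowedOperations.getOrDefault(actor, Set.of()).contains(operation);
    }

    public static void main(String[] args) {
        // ACL attached to the Account class of the bank example.
        AccessControlList accountAcl = new AccessControlList();
        accountAcl.grant("teller", "postSmallDebit");
        accountAcl.grant("manager", "postLargeDebit");

        System.out.println(accountAcl.isAllowed("teller", "postSmallDebit"));  // true
        System.out.println(accountAcl.isAllowed("teller", "postLargeDebit")); // false
    }
}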
The representation of the access matrix is also a performance issue. Often, the number of
actors and the number of protected objects is too large for either the capability or the access control
list representations. In such cases, rules can be used as a compact representation of the global
access matrix. For example, firewalls protect services located on Intranet Hosts from other hosts
on the Internet. Based on the source host and port, destination host and port, and packet size, the
firewall allows or denies packets to reach their destination.
When the number of actors and objects is large, a rule-based representation is more
compact than either access control lists or capabilities. Moreover, a small set of rules is more
readable, and hence, more easily proofed by a human reader, which is a critical aspect when setting
up a secure environment.
An access matrix only represents static access control. This means that access rights can
be modeled as attributes of the objects of the system. In the bank information system example,
consider a broker actor who is assigned a set of portfolios. By policy, a broker cannot access
portfolios managed by another broker. In this case, we need to model access rights dynamically in
the system, and, hence, this type of access is called dynamic access control.
In both static and dynamic access control, we assume that we know the actor: either the
user behind the keyboard or the calling subsystem. This process of verifying the association
between the identity of the user or subsystem and the system is called authentication. A widely
used authentication mechanism, for example, is for the user to specify a user name, known by
everybody, and a corresponding password, only known to the system and stored in an access
control list. The system protects its users’ passwords by encrypting them before storing or
transmitting them. If only a single user knows this user name–password combination, then we can
assume that the user behind the keyboard is legitimate. Although password authentication can be
made secure with current technology, it suffers from many usability disadvantages: users choose
passwords that are easy to remember and, thus, easy to guess. They also tend to write their
password on notes that they keep close to their monitor, and thus, visible to many other users,
authorized or not. Fortunately, other, more secure authentication mechanisms are available. For
example, a smart card can be used in conjunction with a password: an intruder would need both
the smart card and the password to gain access to the system. Better, we can use a biometric sensor
for analyzing patterns of blood vessels in a person’s fingers or eyes. An intruder would then need
the physical presence of the legitimate user to gain access to the system, which is much more
difficult than just stealing a smart card.
Encryption is used to prevent such unauthorized access. Using an encryption algorithm,
we can translate a message, called “plaintext,” into an encrypted message, called a “ciphertext,”
such that even if intercepted, it cannot be understood. Only the receiver has sufficient knowledge
to correctly decrypt the message, that is, to reverse the original process. The encryption process is
parameterized by a “key,” such that the method of encryption and decryption can be switched
quickly in case the intruder manages to obtain sufficient knowledge to decrypt the message.
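A hedged sketch using the standard javax.crypto API (AES with a freshly generated key); a real system would also have to manage cipher modes, initialization vectors, and key distribution, which are omitted here:

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.util.Base64;

public class EncryptionSketch {
    public static void main(String[] args) throws Exception {
        // The key parameterizes the encryption; only holders of it can decrypt.
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();

        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] ciphertext = cipher.doFinal("transfer 100 to account 42".getBytes());
        System.out.println(Base64.getEncoder().encodeToString(ciphertext));

        // The receiver reverses the process with the same key.
        cipher.init(Cipher.DECRYPT_MODE, key);
        System.out.println(new String(cipher.doFinal(ciphertext)));
    }
}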
Once authentication and encryption are provided, application-specific access control can
be more easily implemented on top of these building blocks. In all cases, addressing security issues
is a difficult topic. When addressing these issues, developers should record their assumptions and
describe the intruder scenarios they are considering. When several alternatives are explored,
developers should state the design problems they are attempting to solve and record the results of
the evaluation. We describe in the next chapter how to do this systematically using issue modeling.
7.4.4 Designing the Global Control Flow
Control flow is the sequencing of actions in a system. In object-oriented systems, sequencing
actions includes deciding which operations should be executed and in which order. These decisions
are based on external events generated by an actor or on the passage of time.
Control flow is a design problem. During analysis, control flow is not an issue, because we assume that all objects run simultaneously, executing operations whenever they need to.
During system design, we need to take into account that not every object has the luxury of running
on its own processor. There are three possible control flow mechanisms:
• Procedure-driven control. Operations wait for input whenever they need data from an
actor. This kind of control flow is mostly used in legacy systems and systems written in procedural
languages. It introduces difficulties when used with object-oriented languages. As the sequencing
of operations is distributed among a large set of objects, it becomes increasingly difficult to
determine the order of inputs by looking at the code.
• Event-driven control. A main loop waits for an external event. Whenever an event
becomes available, it is dispatched to the appropriate object, based on information associated with
the event. This kind of control flow has the advantage of leading to a simpler structure and to
centralizing all input in the main loop (see the sketch after this list). However, it makes multi-step interaction sequences more difficult to implement.
• Threads. Threads are the concurrent variation of procedure-driven control: The system
can create an arbitrary number of threads, each responding to a different event. If a thread needs
additional data, it waits for input from a specific actor. This kind of control flow is the most
intuitive of the three mechanisms. However, debugging threaded software requires good tools:
preemptive thread schedulers introduce nondeterminism and, thus, make testing harder.
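A minimal sketch of event-driven control flow; the event names and handlers are hypothetical. A single main loop takes events from a queue and dispatches each to the subsystem that registered a handler for it:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

public class EventLoopSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> events = new ArrayBlockingQueue<>(16);
        Map<String, Consumer<String>> handlers = new HashMap<>();

        // Subsystems provide event handlers instead of driving the control flow.
        handlers.put("incidentReported",
                e -> System.out.println("IncidentManagement handles " + e));
        handlers.put("resourceRequested",
                e -> System.out.println("ResourceManagement handles " + e));

        events.put("incidentReported");
        events.put("resourceRequested");
        events.put("quit");

        // Main loop: all input is centralized here and dispatched by event type.
        while (true) {
            String event = events.take();
            if (event.equals("quit")) break;
            handlers.getOrDefault(event,
                    e -> System.out.println("unhandled: " + e)).accept(event);
        }
    }
}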

7.4.5 Identifying Services


Until this point, we have examined the key system design decisions that impact the subsystem
decomposition. We have now identified the main subsystems and we have a rough idea of how to
allocate responsibilities to each subsystem. In this activity, we refine the subsystem decomposition
by identifying the services provided by each subsystem. We review each dependency between
subsystems and define an interface for each service we identified (depicted in UML by a lollipop).
In this activity, we name the identified services. During object design, we specify each service
precisely in terms of operations, parameters, and constraints.
By focusing on dependencies between subsystems, we refine the subsystem
responsibilities, we find omissions in our decomposition, and we validate the current software
architecture. By focusing on services (as opposed to attributes or operations), we remain at the
architectural abstraction level, allowing us to reassign responsibilities between subsystems,
without changing many modeling elements.
7.4.6 Identifying Boundary Conditions
In previous sections, we dealt with designing and refining the system decomposition. We now have
a better idea of how to decompose the system, how to distribute use cases among subsystems,
where to store data, and how to achieve access control and ensure security. We still need to examine
the boundary conditions of the system—that is, to decide how the system is started, initialized,
and shut down—and we need to define how we deal with major failures such as data corruption
and network outages, whether they are caused by a software error or a power outage. Use cases
dealing with these conditions are called boundary use cases.
It is common that boundary use cases are not specified during analysis or that they are
treated separately from the common use cases. For example, many system administration functions
can be inferred from the everyday user requirements (registering and deleting users, managing
access control), whereas, many other functions are consequences of design decisions (cache sizes,
location of database server, location of backup server) and not of requirement decisions. In general,
we identify boundary use cases by examining each subsystem and each persistent object:
• Configuration. For each persistent object, we examine in which use cases it is created or
destroyed (or archived). For objects that are not created or destroyed in any of the common use
cases (e.g., Maps in the MyTrip system), we add a use case invoked by a system administrator
(e.g., ManageMaps in the MyTrip system).
• Start-up and shutdown. For each component (e.g., a WebServer), we add three use cases to start, shut down, and configure the component. Note that a single use case can manage several
tightly coupled components.
• Exception handling. For each type of component failure (e.g., network outage), we
decide how the system should react (e.g., inform users of the failure). We document each of these
decisions with an exceptional use case that extends the relevant common use cases identified
during requirements elicitation. Note that, when tolerating the effects of a failure, the handling of
an exceptional condition can lead to changing the system design instead of adding an exceptional
use case. For example, the RouteAssistant can completely download the Trip onto the car before
the start of the trip.
In general, an exception is an event or error that occurs during the execution of the system.
Exceptions are caused by three different sources:
• A hardware failure. Hardware ages and fails. A hard disk crash can lead to the permanent
loss of data. The failure of a network link, for example, can momentarily disconnect two nodes of
the system.
• Changes in the operating environment. The environment also affects the way a system
works. A wireless mobile system can lose connectivity if it is out of range of a transmitter. A power
outage can bring down the system, unless it is fitted with back-up batteries.
• A software fault. An error can occur because the system or one of its components
contains a design error. Although writing bug-free software is difficult, individual subsystems can
anticipate errors from other subsystems and protect against them.
Exception handling is the mechanism by which a system treats an exception. In the case
of a user error, the system should display a meaningful error message to the user so that she can
correct her input. In the case of a network link failure, the system should save its temporary state
so that it can recover when the network comes back on line.
7.4.7 Reviewing System Design
Like analysis, system design is an evolutionary and iterative activity. Unlike analysis, there is no
external agent, such as the client, to review the successive iterations and ensure better quality. This
quality improvement activity is still necessary, and project managers and developers need to
organize a review process to substitute for it. Several alternatives exist, such as using the
developers who were not involved in system design to act as independent reviewers, or to use
developers from another project to act as a peer review. These review processes work only if the
reviewers have an incentive to discover and report problems.
In addition to meeting the design goals that were identified during system design, we need
to ensure that the system design model is correct, complete, consistent, realistic, and readable. The
system design model is correct if the analysis model can be mapped to the system design model.
The model is complete if every requirement and every system design issue has been
addressed.
The model is consistent if it does not contain any contradictions.
The model is realistic if the corresponding system can be implemented.
The model is readable if developers not involved in the system design can understand the
model.
In many projects, you will find that system design and implementation overlap quite a bit.
For example, you may build prototypes of selected subsystems before the architecture is stable in
order to evaluate new technologies. This leads to many partial reviews instead of an encompassing
review followed by a client sign-off, as for analysis. Although this process yields greater flexibility,
it also requires developers to track open issues more carefully. Many difficult issues tend to be
resolved late not because they are difficult, but because they fell through the cracks of the process.

7.5 Managing System Design


In this section, we discuss issues related to managing the system design activities. As in analysis,
the primary challenge in managing the system design is to maintain consistency while using as
many resources as possible. In the end, the software architecture and the system interfaces should
describe a single cohesive system understandable by a single person.
We first describe a document template that can be used to document the results of system
design (Section 7.5.1). Next, we describe the role assignment during system design (Section 7.5.2)
and address communication issues during system design (Section 7.5.3). Next, we address
management issues related to the iterative nature of system design (Section 7.5.4).
7.5.1 Documenting System Design
System design is documented in the System Design Document (SDD). It describes design goals
set by the project, subsystem decomposition (with UML class diagrams), hardware/software
mapping (with UML deployment diagrams), data management, access control, control flow
mechanisms, and boundary conditions. The SDD is used to define interfaces between teams of
developers and serves as a reference when architecture-level decisions need to be revisited. The
audience for the SDD includes the project management, the system architects (i.e., the developers
who participate in the system design), and the developers who design and implement each
subsystem. Figure 7-18 is an example template for a SDD.
The first section of the SDD is an Introduction. Its purpose is to provide a brief overview
of the software architecture and the design goals. It also provides references to other documents
and traceability information (e.g., related requirements analysis document, references to existing
systems, constraints impacting the software architecture).
The second section, Current software architecture, describes the architecture of the
system being replaced. If there is no previous system, this section can be replaced by a survey of
current architectures for similar systems. The purpose of this section is to make explicit the
background information that system architects used, their assumptions, and common issues the
new system will address.
The third section, Proposed system architecture, documents the system design model of
the new system. It is divided into seven subsections:
• Overview presents a bird’s-eye view of the software architecture and briefly describes
the assignment of functionality to each subsystem.
• Subsystem decomposition describes the decomposition into subsystems and the
responsibilities of each. This is the main product of system design.
• Hardware/software mapping describes how subsystems are assigned to hardware and
off-the-shelf components. It also lists the issues introduced by multiple nodes and software reuse.
• Persistent data management describes the persistent data stored by the system and the
data management infrastructure required for it. This section typically includes the description of
data schemes, the selection of a database, and the description of the encapsulation of the database.
• Access control and security describes the user model of the system in terms of an access
matrix. This section also describes security issues, such as the selection of an authentication
mechanism, the use of encryption, and the management of keys.
• Global software control describes how the global software control is implemented. In
particular, this section should describe how requests are initiated and how subsystems synchronize.
This section should list and address synchronization and concurrency issues.
• Boundary conditions describe the start-up, shutdown, and error behavior of the system.
(If new use cases are discovered for system administration, these should be included in the
requirements analysis document, not in this section.)
The fourth section, Subsystem services, describes the services provided by each
subsystem. Although this section is usually empty or incomplete in the first versions of the SDD,
this section serves as a reference for teams for the boundaries between their subsystems. The
interface of each subsystem is derived from this section and detailed in the Object Design
Document.
The SDD is written after the initial system decomposition is done; that is, system architects
should not wait until all system design decisions are made before publishing the document. The
SDD, moreover, is updated throughout the process when design decisions are made or problems
are discovered. The SDD, once published, is baselined and put under configuration management.
The revision history section of the SDD provides a history of changes as a list of changes, including
author responsible for the change, date of change, and brief description of the change.

7.5.2 Assigning Responsibilities


System design in complex systems is centered around the architecture team. This is a cross-
functional team made up of architects who define the subsystem decomposition and selected
developers who will implement the subsystem. It is critical that system design include people who
are exposed to the consequences of system design decisions. The architecture team starts work as
soon as the analysis model is stable and continues to function until the end of the integration phase.
This creates an incentive for the architecture team to anticipate problems encountered during
integration. Below are the main roles of system design:
• The architect takes the main role in system design. The architect ensures consistency in
design decisions and interface styles, and ensures that the design is consistently reflected in the
work of the configuration management and testing teams, in particular in the formulation of the
configuration management policy and the system integration strategy. This is mainly an integration role
consuming information from each subsystem team. The architect is the leader of the cross-
functional architecture team.
• Architecture liaisons are the members of the architecture team. They are representatives
from the subsystem teams. They convey information from and to their teams and negotiate
interface changes. During system design, they focus on the subsystem services; during the
implementation phase, they focus on the consistency of the APIs.
• The document editor, configuration manager, and reviewer roles are the same as for
analysis (see Section 5.5.2).
The number of subsystems determines the size of the architecture team. For complex
systems, an architecture team is introduced for each level of abstraction. In all cases, there should
be one integrating role on the team to ensure consistency and the understandability of the
architecture by a single individual.
7.5.3 Communicating about System Design
Communication during system design should be less challenging than during analysis: the
functionality of the system has been defined; project participants have similar backgrounds and by
now should know each other better. Communication is still difficult, due to new sources of
complexity:
• Size. The number of issues to be dealt with increases as developers start designing. The
number of items that developers manipulate increases: each piece of functionality requires many
operations on many objects. Moreover, developers investigate, often concurrently, multiple
designs and multiple implementation technologies.
• Change. The subsystem decomposition and the interfaces of the subsystems are in
constant flux. Terms used by developers to name different parts of the system evolve constantly.
If the change is rapid, developers may not be discussing the same version of the subsystem, which
can lead to much confusion.
• Level of abstraction. Discussions about requirements can be made concrete by using
interface mock-ups and analogies with existing systems. Discussions about implementation
become concrete when integration and test results are available. System design discussions are
seldom concrete, as consequences of design decisions are felt only later, during implementation
and testing.
• Reluctance to confront problems. The level of abstraction of most discussions can also
make it easy to delay the resolution of difficult issues. A typical resolution of control issues is
often, “Let us revisit this issue during implementation.” Whereas it is usually desirable to delay
certain design decisions, such as the internal data structures and algorithms used by each
subsystem, any decision that has an impact on the system decomposition and the subsystem
interfaces should not be delayed.
• Conflicting goals and criteria. Individual developers often optimize different criteria. A
developer experienced in user interface design will be biased toward optimizing response time. A
developer experienced in databases might optimize throughput. These conflicting goals, especially
when implicit, result in developers pulling the system decomposition in different directions and
lead to inconsistencies.
The same techniques we discussed in analysis (see Section 5.5.3) can be applied during
system design:
• Identify and prioritize the design goals for the system and make them explicit (see
Section 6.4.2). If the developers concerned with system design have input in this process, they will
have an easier time committing to these design goals. Design goals also provide an objective
framework against which decisions can be evaluated.
• Make the current version of the system decomposition available to all concerned. A
live document distributed via the Internet is one way to achieve rapid distribution. Using a
configuration management tool to maintain the system design documents helps developers in
identifying recent changes.
• Maintain an up-to-date glossary. As in analysis, defining terms explicitly reduces
misunderstandings. When identifying and modeling subsystems, provide definitions in addition to
names. A UML diagram with only subsystem names is not sufficient for supporting effective
communication. A brief and substantial definition should accompany every subsystem and class
name.
• Confront design problems. Delaying design decisions can be beneficial when more
information is needed before committing to the design decision. This approach, however, can
prevent the confrontation of difficult design problems. Before tabling an issue, several possible
alternatives should be explored and described, and the delay justified. This ensures that issues can
be delayed without serious impact on the system decomposition.
• Iterate. Selected excursions into the implementation phase can improve the system
design. For example, new features in a vendor-supplied component can be evaluated by
implementing a vertical prototype (see Section 7.5.4) for the functionality most likely to benefit
from the feature.
Finally, no matter how much effort is expended on system design, the system decomposition and
the subsystem interfaces will almost certainly change during implementation. As new information
about implementation technologies becomes available, developers have a clearer understanding of
the system, and design alternatives are discovered. Developers should anticipate change and
reserve some time to update the SDD before system integration.
Module 4, Chapter 1
Object Design: Reusing Pattern Solutions
During analysis, we describe the purpose of the system. This results in the identification of
application objects. During system design, we describe the system in terms of its architecture, such
as its subsystem decomposition, global control flow, and persistency management. During system
design, we also define the hardware/software platform on which we build the system. This allows
the selection of off-the-shelf components that provide a higher level of abstraction than the
hardware. During object design, we close the gap between the application objects and the off-the-
shelf components by identifying additional solution objects and refining existing objects. Object
design includes
• reuse, during which we identify off-the-shelf components and design patterns to make
use of existing solutions
• service specification, during which we precisely describe each class interface
• object model restructuring, during which we transform the object design model to
improve its understandability and extensibility
• object model optimization, during which we transform the object design model to
address performance criteria such as response time or memory utilization.
Object design, like system design, is not algorithmic. The identification of existing patterns
and components is central to the problem-solving process. We discuss these building blocks and
the activities related to them. In this chapter, we provide an overview of object design and focus
on reuse, that is the selection of components and the application of design patterns.
8.2 An Overview of Object Design
Conceptually, software system development fills the gap between a given problem and an
existing machine. The activities of system development incrementally close this gap by identifying
and defining objects that realize part of the system.
Analysis reduces the gap between the problem and the machine by identifying objects
representing problem-specific concepts. During analysis the system is described in terms of
external behavior such as its functionality (use case model), the application domain concepts it
manipulates (object model), its behavior in terms of interactions (dynamic model), and its
nonfunctional requirements.
System design reduces the gap between the problem and the machine in two ways. First,
system design results in a virtual machine that provides a higher level of abstraction than the
machine. This is done by selecting off-the-shelf components for standard services such as
middleware, user interface toolkits, application frameworks, and class libraries. Second, system
design identifies off-the-shelf components for application domain objects such as reusable class
libraries of banking objects.
After several iterations of analysis and system design, the developers are usually left with
a puzzle that has a few pieces missing. These pieces are found during object design. This includes
identifying new solution objects, adjusting off-the-shelf components, and precisely specifying
each subsystem interface and class. The object design model can then be partitioned into sets of
classes that can be implemented by individual developers.
Object design includes four groups of activities (see Figure 8-2):
• Reuse. Off-the-shelf components identified during system design are used to help in the
realization of each subsystem. Class libraries and additional components are selected for basic data
structures and services. Design patterns are selected for solving common problems and for
protecting specific classes from future change. Often, components and design patterns need to be
adapted before they can be used. This is done by wrapping custom objects around them or by
refining them using inheritance. During all these activities, the developers are faced with the same
buy-versus-build trade-offs they encountered during system design.
• Interface specification. During this activity, the subsystem services identified during
system design are specified in terms of class interfaces, including operations, arguments, type
signatures, and exceptions. Additional operations and objects needed to transfer data among
subsystems are also identified. The result of service specification is a complete interface
specification for each subsystem. The subsystem service specification is often called subsystem
API (Application Programmer Interface).
• Restructuring. Restructuring activities manipulate the system model to increase code
reuse or meet other design goals. Each restructuring activity can be seen as a graph transformation
on subsets of a particular model. Typical activities include transforming N-ary associations into
binary associations, implementing binary associations as references, merging two similar classes
from two different subsystems into a single class, collapsing classes with no significant behavior
into attributes, splitting complex classes into simpler ones, and/or rearranging classes and
operations to increase the inheritance and packaging. During restructuring, we address design
goals such as maintainability, readability, and understandability of the system model.
• Optimization. Optimization activities address performance requirements of the system
model. This includes changing algorithms to respond to speed or memory requirements, reducing
multiplicities in associations to speed up queries, adding redundant associations for efficiency,
rearranging execution orders, adding derived attributes to improve the access time to objects, and
opening up the architecture, that is, adding access to lower layers because of performance
requirements.
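To illustrate the last activity above, the following is a minimal Java sketch of a derived attribute
that caches the result of an expensive traversal; the Tournament and Match classes and their
operations are hypothetical and serve only to show the technique:

import java.util.ArrayList;
import java.util.List;

class Match {
    private final int score;
    Match(int score) { this.score = score; }
    int getScore() { return score; }
}

class Tournament {
    private final List<Match> matches = new ArrayList<>();
    private Integer totalScore;             // derived attribute, computed lazily

    void addMatch(Match match) {
        matches.add(match);
        totalScore = null;                  // invalidate the cached value on change
    }

    int getTotalScore() {
        if (totalScore == null) {           // recompute only when the cache is stale
            int sum = 0;
            for (Match match : matches) {
                sum += match.getScore();
            }
            totalScore = sum;
        }
        return totalScore;
    }
}

The derived attribute trades extra storage and bookkeeping for faster repeated access, which is
exactly the kind of trade-off that should be made only when the design goals call for it.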
Object design is not sequential. Although each group of activities described above
addresses a specific object design issue, they usually occur concurrently. A specific off-the-shelf
component may constrain the number of types of exceptions mentioned in the specification of an
operation and thus may impact the subsystem interface. The selection of a component may reduce
the implementation work while introducing new “glue” objects, which also need to be specified.
Finally, restructuring and optimizing may reduce the number of components to be implemented by
increasing the amount of reuse in the system.
Usually, interface specification and reuse activities occur first, yielding an object design
model that is then checked against the use cases that exercise the specific subsystem. Restructuring
and optimization activities occur next, once the object design model for the subsystem is relatively
stable. Focusing on interfaces, components, and design patterns results in an object design model
that is much easier to modify. Focusing on optimizations first tends to produce object design
models that are rigid and difficult to modify. However, as depicted in Figure 8-2, activities of
object design occur iteratively.

8.3 Reuse Concepts: Solution Objects, Inheritance, and Design Patterns
In this section, we present the object design concepts related to reuse:
• Application Objects and Solution Objects (Section 8.3.1)
• Specification Inheritance and Implementation Inheritance (Section 8.3.2)
• Delegation (Section 8.3.3)
• The Liskov Substitution Principle (Section 8.3.4)
• Delegation and Inheritance in Design Patterns (Section 8.3.5).
8.3.1 Application Objects and Solution Objects
UML class diagrams can be used to model both the application domain
and the solution domain. Application objects, also called “domain objects,” represent concepts of
the domain that are relevant to the system. Solution objects represent components that do not have
a counterpart in the application domain, such as persistent data stores, user interface objects, or
middleware.
During analysis, we identify entity objects and their relationships, attributes, and
operations. Most entity objects are application objects that are independent of any specific system.
During analysis, we also identify solution objects that are visible to the user, such as boundary and
control objects representing forms and transactions defined by the system. During system design,
we identify more solution objects in terms of software and hardware platforms. During object
design, we refine and detail both application and solution objects and identify additional solution
objects needed to bridge the object design gap.
8.3.2 Specification Inheritance and Implementation Inheritance
During analysis, we use inheritance to classify objects into taxonomies. This allows us to
differentiate the common behavior of the general case, that is, the superclass (also called the “base
class”), from the behavior that is specific to specialized objects, that is, the subclasses (also called
the “derived classes”). The focus of generalization (i.e., identifying a common superclass from a
number of existing classes) and specialization (i.e., identifying new subclasses given an existing
superclass) is to organize analysis objects into an understandable hierarchy. Readers of the analysis
model can start from the abstract concepts, grasp the core functionality of the system, and make
their way down to concrete concepts and review specialized behavior.
The focus of inheritance during object design is to reduce redundancy and enhance
extensibility. By factoring all redundant behavior into a single superclass, we reduce the risk of
introducing inconsistencies during changes (e.g., when repairing a defect) since we have to make
changes only once for all subclasses. By providing abstract classes and interfaces that are used by
the application, we can write new specialized behavior by writing new subclasses that comply with
the abstract interfaces. For example, we can write an application manipulating images in terms of
an abstract Image class, which defines all the operations that all Images should support, and a
series of specialized classes for each image format supported by the application (e.g., GIFImage,
JPEGImage). When we need to extend the application to a new format, we only need to add a new
specialized class.
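A minimal Java sketch of this idea, assuming a hypothetical draw() operation and a Viewer client
that depends only on the abstract Image class:

// Abstract interface the application is written against.
abstract class Image {
    abstract void draw();                   // every image format must support this
}

// One specialized subclass per supported format.
class GIFImage extends Image {
    void draw() { /* decode and render GIF data */ }
}

class JPEGImage extends Image {
    void draw() { /* decode and render JPEG data */ }
}

// Client code depends only on the abstraction; adding a new format
// (e.g., a PNGImage subclass) requires no change here.
class Viewer {
    void display(Image image) {
        image.draw();
    }
}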
Although inheritance can make an analysis model more understandable and an object
design model more modifiable or extensible, these benefits do not occur automatically. On the
contrary, inheritance is such a powerful mechanism that novice developers often produce code that
is more obfuscated and more brittle than if they had not used inheritance in the first place.
Inheritance yields its benefits by decoupling the classes using a superclass from the
specialized subclasses. In doing so, however, it introduces a strong coupling along the inheritance
hierarchy between the superclass and the subclass. Whereas this is acceptable when the inheritance
hierarchy represents a taxonomy (e.g., it is acceptable for Image and GIFImage to be tightly
coupled), it introduces unwanted coupling in the other cases.
The use of inheritance for the sole purpose of reusing code is called implementation
inheritance. With implementation inheritance, developers reuse code quickly by subclassing an
existing class and refining its behavior. A Set implemented by inheriting from a Hashtable is an
example of implementation inheritance. Conversely, the classification of concepts into type
hierarchies is called specification inheritance (also called “interface inheritance”). The UML
class model of Figure 8-4 summarizes the types of inheritance we discussed in this
section.

8.3.3 Delegation
Delegation is the alternative to implementation inheritance that should be used when reuse is
desired. A class is said to delegate to another class if it implements an operation by resending a
message to another class. Delegation makes explicit the dependencies between the reused class
and the new class. The right column of Figure 8-3 shows an implementation of MySet using
delegation instead of implementation inheritance. The only significant change is the private field
table and its initialization in the MySet() constructor. This addresses both problems we mentioned
before:
• Extensibility. The MySet on the right column does not include the containsKey() method
in its interface and the new field table is private. Hence, we can change the internal representation
of MySet to another class (e.g., a List) without impacting any clients of MySet.
• Subtyping. MySet does not inherit from Hashtable and, hence, cannot be substituted for
a Hashtable in any of the client code. Consequently, any code previously using Hashtables still
behaves the same way.
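A minimal Java sketch of the delegation-based MySet described above; the put() and
containsValue() operations are chosen for illustration and do not reproduce Figure 8-3 verbatim:

import java.util.Hashtable;

// MySet reuses Hashtable through delegation: the Hashtable is a private
// field, not a superclass, so Hashtable methods such as containsKey()
// do not leak into MySet's interface.
public class MySet {
    private final Hashtable<Object, Object> table;

    public MySet() {
        table = new Hashtable<>();
    }

    public void put(Object element) {
        // Delegate the actual storage to the encapsulated Hashtable.
        if (!table.containsKey(element)) {
            table.put(element, element);
        }
    }

    public boolean containsValue(Object element) {
        return table.containsKey(element);
    }

    public int size() {
        return table.size();
    }
}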
Delegation is a preferable mechanism to implementation inheritance as it does not interfere
with existing components and leads to more robust code. Note that specification inheritance is
preferable to delegation in subtyping situations as it leads to a more extensible design.

8.3.4 The Liskov Substitution Principle


The Liskov Substitution Principle [Liskov, 1988] provides a formal definition for specification
inheritance. It essentially states that, if client code uses the methods provided by a superclass,
then developers should be able to add new subclasses without having to change the client code.
For example, in the left column of Figure 8-3, this means that, if a client uses a Hashtable, the
client should not have to be modified when we replace the Hashtable with any of its subclasses,
for example MySet. Clearly, this is not the case, so the relationship between MySet and Hashtable
is not a specification inheritance relationship. Below is the formal definition of the Liskov
Substitution Principle: if for each object o1 of type S there is an object o2 of type T such that,
for all programs P defined in terms of T, the behavior of P is unchanged when o1 is substituted
for o2, then S is a subtype of T [Liskov, 1988].
8.3.5 Delegation and Inheritance in Design Patterns


In general, when to use delegation or inheritance is not always clear and requires some experience
and judgement on the part of the developer. Inheritance and delegation, used in different
combinations, can solve a wide range of problems: decoupling abstract interfaces from their
implementation, wrapping around legacy code, and/or decoupling classes that specify a policy
from classes that provide mechanism.
In object-oriented development, design patterns are template solutions that developers
have refined over time to solve a range of recurring problems [Gamma et al., 1994]. A design
pattern has four elements:
1. A name that uniquely identifies the pattern from other patterns.
2. A problem description that describes the situations in which the pattern can be used.
Problems addressed by design patterns are usually the realization of modifiability and extensibility
design goals and nonfunctional requirements.
3. A solution stated as a set of collaborating classes and interfaces.
4. A set of consequences that describes the trade-offs and alternatives to be considered
with respect to the design goals being addressed.
When studying design patterns, you will notice that many patterns use a mix of inheritance
and delegation and therefore look similar. However, the same mechanisms are used in subtly
different ways. To clarify the differences, we use the following terms to denote different classes
participating in the pattern:
• The client class accesses the pattern. In the class diagram of the Adapter pattern (Figure
8-5), this class is simply called Client. Client classes can be either existing classes of a class library
or new classes of the system under development.
• The pattern interface is the part of the pattern that is visible to the client class. Often,
the pattern interface is realized by an abstract class or an interface. In the Adapter pattern, this class
is called ClientInterface.
• The implementor class provides the lower-level behavior of the pattern. In the Adapter
pattern, the LegacyClass and the Adapter are implementor classes. In many patterns, a number of
collaborating implementor classes are needed to realize the pattern behavior.
• The extender class specializes an implementor class to provide a different
implementation or an extended behavior of the pattern. In the Adapter pattern, the subtypes of
LegacyClass are extender classes. Note that, often, extender classes represent future classes that
developers anticipate.

Since developers have striven to evolve and refine design patterns to maximize reuse
and flexibility, design patterns are usually not the solutions that programmers would initially think of.
patterns capture a great deal of knowledge (e.g., by documenting the context and tradeoffs involved
in applying a pattern), they also constitute a source of guidance about when to use inheritance and
delegation.
8.4 Reuse Activities: Selecting Design Patterns and Components
System design and object design introduce a strange paradox in the development process. On the
one hand, during system design, we construct solid walls between subsystems to manage
complexity by breaking the system into smaller pieces and to prevent changes in one subsystem
from affecting other subsystems. On the other hand, during object design, we want the software to
be modifiable and extensible to minimize the cost of future changes. These are conflicting goals:
we want to define a stable architecture to deal with complexity, but we also want to allow flexibility
to deal with change later in the development process. This conflict can be solved by anticipating
change and designing for it, as sources of later changes tend to be the same for many systems:
• New vendor or new technology. Commercial components used to build the system are
often replaced by equivalent ones from a different vendor. This change is common and generally
difficult to cope with. The software marketplace is dynamic, and vendors might go out of business
before your project is completed.
• New implementation. When subsystems are integrated and tested together, the overall
system response time is, more often than not, above performance requirements. System
performance is difficult to predict and should not be optimized before integration. Developers
should focus on the subsystem services first. This triggers the need for more efficient data
structures and algorithms—often under time constraints.
• New views. Testing the software with real users uncovers many usability problems. These
often translate into the need to create additional views on the same data.
• New complexity of the application domain. The deployment of a system triggers ideas
of new generalizations: a bank information system for one branch may lead to the idea of a multi-
branch information system. The application domain itself might also increase in complexity:
previously, flight numbers were associated with one plane, and one plane only, but with air carrier
alliances, one plane can now have a different flight number from each carrier.
• Errors. Many requirements errors are discovered when real users start using the system.
The use of delegation and inheritance in conjunction with abstract classes decouples the
interface of a subsystem from its actual implementation. In this section, we provide selected
examples of design patterns that can deal with the type of changes mentioned above.
8.4.1 Encapsulating Data Stores with the Bridge Pattern
Consider the problem of incrementally developing, testing, and integrating subsystems realized by
different developers. Subsystems may be completed at different times, delaying the integration of
all subsystems until the last one is completed. To avoid this delay, projects often use a stub
implementation in place of a specific subsystem so that the integration tests can start even before
the subsystems are completed. In other situations, several implementations of the same subsystem
are realized, such as a reference implementation that realizes the specified functionality with the
most basic algorithms, or an optimized implementation that delivers better performance at the cost
of additional complexity. In short, a solution is needed for dynamically substituting multiple
realizations of the same interface for different uses.
This problem can be addressed with the Bridge design pattern (Appendix A.3, [Gamma
et al., 1994]). In the early stages of the project, we are interested in a rudimentary storage
subsystem based on object serialization for the purpose of debugging and testing the core use cases
of the TournamentManagement subsystem. The entity objects will be subject to many changes,
and we do not know yet what performance bottlenecks will be encountered during storage.
Consequently, an efficient storage subsystem should not be the focus of the first prototype. As
discussed during the system design of ARENA (Section 7.6.4), however, we anticipate that both a
file-based implementation and a relational database implementation of the storage subsystem
should be provided, in the first and second iteration of the system, respectively. In addition, a set
of stubs should be provided to allow early integration testing even before the file-based
implementation is ready. To solve this problem, we apply the Bridge pattern shown in Figure 8-7.
The LeagueStore is the interface class to the pattern, and provides all high-level functionality
associated with storage. The LeagueStoreImplementor is an abstract interface that provides the
common interface for the three implementations, namely the StubStoreImplementor for the stubs,
the XMLStoreImplementor for the file-based implementation, and the JDBCStoreImplementor for
the relational database implementation.
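The following Java sketch illustrates this decomposition. LeagueStore, LeagueStoreImplementor,
and the three implementor classes follow Figure 8-7, while the write()/read() operations and the
storeLeague() method are assumptions made for illustration:

// Implementor interface: the common low-level storage operations.
interface LeagueStoreImplementor {
    void write(String leagueData);
    String read(String leagueName);
}

// Concrete implementors that can be substituted without changing clients.
class StubStoreImplementor implements LeagueStoreImplementor {
    public void write(String leagueData) { /* keep data in memory only */ }
    public String read(String leagueName) { return ""; }
}

class XMLStoreImplementor implements LeagueStoreImplementor {
    public void write(String leagueData) { /* serialize to an XML file */ }
    public String read(String leagueName) { return ""; }
}

class JDBCStoreImplementor implements LeagueStoreImplementor {
    public void write(String leagueData) { /* issue SQL statements via JDBC */ }
    public String read(String leagueName) { return ""; }
}

// Abstraction: high-level storage behavior that delegates to an implementor.
public class LeagueStore {
    private final LeagueStoreImplementor implementor;

    public LeagueStore(LeagueStoreImplementor implementor) {
        this.implementor = implementor;
    }

    public void storeLeague(String leagueData) {
        // High-level logic (validation, caching, ...) would go here.
        implementor.write(leagueData);
    }
}

Early integration tests can run with new LeagueStore(new StubStoreImplementor()); later iterations
substitute the XML or JDBC implementor without changing the TournamentManagement subsystem.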

Note that, even if most LeagueStoreImplementors provide similar services, the indirection
introduced by the Bridge abstraction can reduce performance. The design goals we defined at the beginning of system design
(Section 6.4.2) help us decide about performance and modifiability trade-offs.
Inheritance and delegation in the Bridge pattern
The Bridge pattern interface is realized by the Abstraction class, and its behavior by the
selected ConcreteImplementor class. The design pattern can be extended by providing new
RefinedAbstraction or ConcreteImplementor classes. This pattern is a classic example of
combining specification inheritance and delegation to achieve both reuse and flexibility.
On the one hand, specification inheritance is used between the abstract Implementor
interface and the ConcreteImplementor classes. As a result, each ConcreteImplementor can be
substituted transparently at run time, from the point of view of the Abstraction and RefinedAbstraction classes.
This also ensures that, when adding a new ConcreteImplementor, developers will strive to provide
the same behavior as all other ConcreteImplementors.
On the other hand, Abstraction and Implementor are decoupled using delegation. This
enables the distribution of different behavior on each side of the bridge. For example, the
LeagueStore class in Figure 8-7 provides the high-level behavior for storing Leagues, whereas the
concrete LeagueStoreImplementor provides specific lower-level functionality that differs in its
realization from one storage approach to the other. Since LeagueStore and
LeagueStoreImplementor provide different behaviors, they cannot be treated as subtypes
according to the Liskov Substitution Principle.
8.4.2 Encapsulating Legacy Components with the Adapter Pattern


As the complexity of systems increases and the time to market shortens, the cost of software
development significantly exceeds the cost of hardware. Hence, developers have a strong incentive
to reuse code from previous projects or to use off-the-shelf components. Interactive systems, for
example, are now rarely built from scratch; they are developed with user interface toolkits that
provide a wide range of dialogs, windows, buttons, or other standard interface objects. Interface
engineering projects focus on reimplementing only part of an existing system. For example,
corporate information systems, costly to design and build, must be updated to new client hardware.
Often, only the client side of the system is upgraded with new technology; the back end of the
system is left untouched. Whether dealing with off-the-shelf components or legacy code, developers
have to deal with code they cannot modify and which usually was not designed for their system.
We deal with existing components by encapsulating them. This approach has the advantage
of decoupling the system from the encapsulated component, thus minimizing the impact of existing
software on the new design. This can be done using an Adapter pattern.
The Adapter design pattern (Appendix A.2, [Gamma et al., 1994]) converts the interface
of a component into an interface that the client expects. This interface is called the ClientInterface
in Figure 8-5. An Adapter class provides the glue between ClientInterface and LegacyClass. For
example, assume the client is the static sort() method of the Java Arrays class (Figures 8-8 and 8-9).
This method expects two arguments: a, an array of objects, and c, a Comparator object, which
provides a compare() method to define the relative order between elements. Assume we are
interested in sorting strings of the class MyString, which defines the greaterThan() and equals()
methods. To sort an Array of MyStrings, we need to define a new comparator,
MyStringComparator, which provides a compare() method using greaterThan() and equals().
MyStringComparator is an Adapter class.
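A minimal Java sketch of this adapter; MyString, MyStringComparator, and the
greaterThan()/equals() operations follow the description above, while Arrays.sort() and
Comparator are the standard java.util classes the client relies on:

import java.util.Arrays;
import java.util.Comparator;

// Existing class whose interface the client (sort()) does not expect.
class MyString {
    private final String value;
    MyString(String value) { this.value = value; }

    boolean greaterThan(MyString other) {
        return value.compareTo(other.value) > 0;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof MyString && value.equals(((MyString) other).value);
    }

    @Override
    public int hashCode() { return value.hashCode(); }
}

// Adapter: translates the Comparator interface expected by the client
// into calls on the existing greaterThan() and equals() operations.
class MyStringComparator implements Comparator<MyString> {
    public int compare(MyString a, MyString b) {
        if (a.greaterThan(b)) return 1;
        if (a.equals(b)) return 0;
        return -1;
    }
}

public class SortExample {
    public static void main(String[] args) {
        MyString[] strings = {
            new MyString("beta"), new MyString("alpha"), new MyString("gamma")
        };
        // The client sees only the Comparator interface, not MyString's own methods.
        Arrays.sort(strings, new MyStringComparator());
    }
}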

Inheritance and delegation in the Adapter pattern


The Adapter pattern uses specification inheritance between the ClientInterface and the
Adapter. The Adapter in turn delegates to the LegacyClass implementor class to realize the
operations declared in ClientInterface. On the one hand, this enables all client code that already
uses the ClientInterface to work with instances of Adapter transparently and without modification
of the client. On the other hand, the same Adapter can be used for subtypes of the LegacyClass.
Note that the Bridge and the Adapter patterns are similar in purpose and structure. Both
decouple an interface from an implementation, and both use a specification inheritance relationship
and a delegation relationship. They differ in the context in which they are used and in the order in
which delegation and inheritance occur. The Adapter pattern uses inheritance first and then
delegation, whereas the Bridge pattern uses delegation first and then inheritance. The Adapter
pattern is applied when the interface (i.e., ClientInterface) and the implementation (i.e.,
LegacyClass) already exist and cannot be modified. When developing new code, the Bridge pattern
is a better choice as it provides more extensibility.
8.4.3 Encapsulating Context with the Strategy Pattern
Consider a mobile application running on a wearable computer that uses different network
protocols depending on the location of the user: assume, for example, a car mechanic using the
wearable computer to access repair manuals and maintenance records for the vehicle under repair.
The wearable computer should operate in the shop with access to a local wireless network as well
as on the roadside using a third-generation mobile phone network, such as UMTS. When updating
or configuring the mobile application, a system administrator should be able to use the wearable
computer with access to a wired network such as Ethernet. This means that the mobile application
needs to deal with different types of networks as it switches between networks dynamically, based
on factors such as location and network costs. Assume that during the system design of this
application, we identify the dynamic switching between wired and wireless networks as a critical
design goal. Furthermore, we want to be able to deal with future network protocols without having
to recompile the application.
To achieve both of these goals, we apply the Strategy design pattern (Appendix A.9,
[Gamma et al., 1994]). The system model and implementation, respectively, are shown in Figures
8-10 and 8-11. The Strategy class is realized by NetworkInterface, which provides the common
interface to all networks; the Context class is realized by a NetworkConnection object, which
represents a point-to-point connection between the wearable and a remote host. The Client is the
mobile application. The Policy is the LocationManager, which monitors the current location of the
wearable and the availability of networks, and configures the NetworkConnection objects with the
appropriate NetworkInterfaces. When the LocationManager object invokes the
setNetworkInterface() method, the NetworkConnection object shuts down the current
NetworkInterface and initializes the new NetworkInterface transparently from the rest of the
application.
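A minimal Java sketch of this design; NetworkInterface, NetworkConnection, and
setNetworkInterface() follow Figures 8-10 and 8-11, while the concrete network classes and the
open()/send()/close() operations are assumptions for illustration:

// Strategy interface common to all networks.
interface NetworkInterface {
    void open(String destination);
    void send(byte[] data);
    void close();
}

// Concrete strategies for the different link types.
class WirelessNetwork implements NetworkInterface {
    public void open(String destination) { /* associate with the access point */ }
    public void send(byte[] data) { /* transmit over the wireless link */ }
    public void close() { /* release the link */ }
}

class EthernetNetwork implements NetworkInterface {
    public void open(String destination) { /* open a wired socket */ }
    public void send(byte[] data) { /* transmit over Ethernet */ }
    public void close() { /* close the socket */ }
}

// Context: a point-to-point connection that delegates to the current strategy.
class NetworkConnection {
    private NetworkInterface network;
    private final String destination;

    NetworkConnection(String destination, NetworkInterface initial) {
        this.destination = destination;
        this.network = initial;
        this.network.open(destination);
    }

    void send(byte[] data) {
        network.send(data);
    }

    // Called by the LocationManager (the Policy) when location or network cost changes.
    void setNetworkInterface(NetworkInterface newNetwork) {
        network.close();
        network = newNetwork;
        network.open(destination);       // the switch is transparent to the application
    }
}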
Inheritance and delegation in the Strategy pattern


The class diagrams for the Bridge and the Strategy patterns (see Figures 8-7 and 8-10) are
almost identical. The key difference is in the creator of the concrete implementation classes: in the
Bridge pattern, the Abstraction class creates and initializes the ConcreteImplementors. In the
Strategy pattern, however, the Context is not aware of the ConcreteStrategies. Instead, a client
creates the ConcreteStrategy objects and configures the Context. Moreover, ConcreteImplementors
in the Bridge pattern are usually created at initialization time, while ConcreteStrategies in the
Strategy pattern are usually created and substituted several times during run time.
8.4.4 Encapsulating Platforms with the Abstract Factory Pattern
Consider an application for an intelligent house: the application receives events from sensors
distributed throughout the house (e.g., light bulb on, light bulb off, window open, window closed,
inside and outside temperature, weather forecasts), identifies predefined patterns, and issues
commands for actuators (e.g., turn air-conditioning on, store statistics on energy consumption,
close garage door, trigger theft alarm). Although several manufacturers provide the hardware to
build such applications (e.g., EIB, Zumtobel’s Luxmate), interoperability in this domain is
currently poor, preventing the mix and match of devices from different manufacturers, and thus,
making it difficult to develop a single software solution for all manufacturers.
We use the Abstract Factory design pattern (Appendix A.1) to solve this problem. In our
intelligent house, each manufacturer provides temperature sensors, electric blinds that report if
they are forced in, and intelligent light bulbs that report if they have burned out. As shown in
Figure 8-12, these generic objects are called AbstractProducts (e.g., LightBulb, Blind), and their
concrete realizations are called ConcreteProducts (e.g., EIBLightBulb, ZumtobelLightBulb,
EIBBlind, ZumtobelBlind). One factory for each manufacturer (e.g., ZumtobelFactory,
EIBFactory) provides methods for creating the ConcreteProducts (e.g., createLightBulb(),
createBlind()). The Client classes (e.g., a TheftApplication) access only the interfaces provided by
the AbstractFactory and the AbstractProducts, thereby shielding the Client classes completely from
the manufacturer of the underlying products.
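A minimal Java sketch of this structure; the product and factory class names follow Figure 8-12
where possible, while the HouseFactory interface name and the device operations are assumptions
for illustration:

// Abstract products: generic device interfaces used by the application.
interface LightBulb { void turnOn(); void turnOff(); }
interface Blind { void lower(); void raise(); }

// Abstract factory: one creation method per product.
interface HouseFactory {
    LightBulb createLightBulb();
    Blind createBlind();
}

// Concrete products and factory for one manufacturer (EIB).
class EIBLightBulb implements LightBulb {
    public void turnOn() { /* send the EIB bus command */ }
    public void turnOff() { /* send the EIB bus command */ }
}
class EIBBlind implements Blind {
    public void lower() { /* send the EIB bus command */ }
    public void raise() { /* send the EIB bus command */ }
}
class EIBFactory implements HouseFactory {
    public LightBulb createLightBulb() { return new EIBLightBulb(); }
    public Blind createBlind() { return new EIBBlind(); }
}

// The client depends only on the abstract interfaces; the concrete factory
// passed at start-up fixes a consistent product family.
class TheftApplication {
    private final LightBulb bulb;
    private final Blind blind;

    TheftApplication(HouseFactory factory) {
        bulb = factory.createLightBulb();
        blind = factory.createBlind();
    }

    void soundAlarm() {
        bulb.turnOn();
        blind.raise();
    }
}

A ZumtobelFactory with its own products would implement the same HouseFactory interface, so
switching manufacturers means passing a different factory to TheftApplication and nothing else.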

Inheritance and delegation in the Abstract Factory pattern


The Abstract Factory pattern uses specification inheritance to decouple the interface of a
product from its realization. However, since products of the same platform usually depend on each
other and access the concrete product classes, products of different platforms cannot be substituted
transparently. For example, EIBBulbs are incompatible with LuxmateBulbs and should not be
mixed within the same intelligent house system. To ensure that a consistent set of products is
created, the Client can only create products by using a ConcreteFactory, which delegates the
creation operations to the respective products. By using specification inheritance to decouple
ConcreteFactories from their interface, product families from different manufacturers can be
substituted transparently from the client.
8.4.5 Encapsulating Control Flow with the Command Pattern
In interactive systems and in transaction systems, it is often desirable to execute, undo, or store
user requests without knowing the content of the request. For example, consider the case of
matches in the ARENA tournament management system. We want to record individual moves in
matches so that these moves can be replayed by a spectator at a later date. However, we also want
ARENA to support a broad spectrum of games, so we do not want the classes responsible for
recording and replaying moves to depend on any specific game.
We can apply the Command design pattern (Appendix A.4, [Gamma et al., 1994]) to this
effect. The key to decoupling game moves from their handling is to represent game moves as
command objects that inherit from an abstract class called Move in Figure 8-13. The Move class
declares operations for executing, undoing, and storing commands, whereas ConcreteCommand
classes (i.e., TicTacToeMove and ChessMove in ARENA) implement specific commands. The
classes responsible for recording and replaying games only access the Move abstract class
interface, thus making the system extensible to new Games.
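A minimal Java sketch of this design; Move, TicTacToeMove, and ChessMove follow Figure 8-13,
while the MatchReplay invoker and the operation bodies are assumptions for illustration:

import java.util.ArrayList;
import java.util.List;

// Abstract command: the recording and replay machinery depends only on this class.
abstract class Move {
    abstract void execute();
    abstract void undo();
}

// Concrete commands encapsulate game-specific state and behavior.
class TicTacToeMove extends Move {
    private final int row, column;
    TicTacToeMove(int row, int column) { this.row = row; this.column = column; }
    void execute() { /* place a mark on the TicTacToe board */ }
    void undo() { /* remove the mark */ }
}

class ChessMove extends Move {
    private final String from, to;
    ChessMove(String from, String to) { this.from = from; this.to = to; }
    void execute() { /* move the piece on the chess board */ }
    void undo() { /* move the piece back */ }
}

// Invoker: records and replays moves without knowing any specific game.
class MatchReplay {
    private final List<Move> history = new ArrayList<>();

    void record(Move move) {
        move.execute();
        history.add(move);
    }

    void replay() {
        for (Move move : history) {
            move.execute();      // works for any game that supplies Move subclasses
        }
    }
}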

Inheritance and delegation in the Command pattern


The Command design pattern uses specification inheritance between the Command class
and ConcreteCommands, enabling new commands to be added independently from the Invoker.
Delegation is used between ConcreteCommands and Receivers, and between Invoker and
Command, enabling ConcreteCommands to be dynamically created, executed, and stored. The
Command pattern is often used in a Model/View/Controller software architecture, where Receivers
are model objects, Invoker and Commands are controller objects, and Clients creating Commands
are view objects.
8.4.6 Encapsulating Hierarchies with the Composite Design Pattern
User interface toolkits, such as Swing and Cocoa, provide the application developer with a range
of classes as building blocks. Each class implements a specialized behavior, such as inputting text,
selecting and deselecting a check box, pushing a button, or pulling down a menu. The user interface
designer can aggregate these components into windows to build application-specific interfaces. For
example, a preferences dialog may include a number of on-off check boxes for enabling different
features in the application.
As windows become more complex and include many different user interface objects,
their layout (i.e., moving and resizing each component so that the window forms a coherent whole)
becomes increasingly unmanageable. Consequently, modern toolkits enable the developer to
organize the user interface objects into hierarchies of aggregate nodes, called “panels,” that can be
manipulated the same way as the concrete user interface objects. For example, our preferences
dialog can include a top panel for the title of the dialog and instructions for the user, a center panel
containing the checkboxes and their labels, and a bottom panel for the ‘ok’ and ‘cancel’ buttons.
Each panel is responsible for the layout of its subpanels, called “children,” and the overall dialog
only has to deal with the three panels.
Swing addresses this problem with the Composite design pattern (Appendix A.5,
[Gamma et al., 1994]) as depicted in Figure 8-16. An abstract class called Component is the root
of all user interface objects, including Checkboxes, Buttons, and Labels. Composite, also a
subclass of Component, is a special user interface object representing aggregates including the
Panels we mentioned above. Note that Windows and Applets (the root of the instance hierarchy)
are also Composite classes that have additional behavior for dealing with the window manager and
the browser, respectively.
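A small Swing sketch of the hierarchy described above: the three panels are composites, the
check boxes, labels, and buttons are leaf components, and the dialog itself only deals with the panels:

import javax.swing.*;
import java.awt.BorderLayout;

public class PreferencesDialog {
    public static void main(String[] args) {
        // Each panel (a Composite) lays out its own children (leaf Components).
        JPanel top = new JPanel();
        top.add(new JLabel("Preferences - select the features to enable"));

        JPanel center = new JPanel();
        center.add(new JCheckBox("Enable notifications"));
        center.add(new JCheckBox("Remember password"));

        JPanel bottom = new JPanel();
        bottom.add(new JButton("OK"));
        bottom.add(new JButton("Cancel"));

        // Panels and concrete widgets are manipulated uniformly: both are
        // Components that can be added to a container.
        JFrame frame = new JFrame("Preferences");
        frame.setLayout(new BorderLayout());
        frame.add(top, BorderLayout.NORTH);
        frame.add(center, BorderLayout.CENTER);
        frame.add(bottom, BorderLayout.SOUTH);
        frame.pack();
        frame.setVisible(true);
    }
}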

8.4.7 Heuristics for Selecting Design Patterns


Identifying the correct design pattern for a given problem is difficult unless you already have some
experience in using design patterns. Pattern catalogs are large and varied, and one cannot expect
developers to read them completely. As design patterns address a specific design goal or a specific
nonfunctional requirement, another technique is to use key phrases in the Requirements Analysis
Document (RAD) and the System Design Document (SDD) to select candidate patterns. This is
similar to Abbott’s natural language technique described in Module 2, Chapter 2, Analysis. The
heuristics box below provides example key phrases for the patterns covered in this chapter:
8.4.8 Identifying and Adjusting Application Frameworks
Application frameworks
An application framework is a reusable partial application that can be specialized to
produce custom applications [Johnson & Foote, 1988]. In contrast to class libraries, frameworks
are targeted to particular technologies, such as data processing or cellular communications, or to
application domains, such as user interfaces or real-time avionics. The key benefits of application
frameworks are reusability and extensibility. Framework reusability leverages the application
domain knowledge and the prior effort of experienced developers to avoid recreation and
revalidation of recurring solutions. An application framework enhances extensibility by providing
hook methods, which are overridden by the application to extend the application framework.
Hook methods systematically decouple the interfaces and behaviors of an application domain from
the variations required by an application in a particular context. Framework extensibility is
essential to ensure timely customization of new application services and features.
Frameworks can be classified by their position in the software development process.
• Infrastructure frameworks aim to simplify the software development process. Examples
include frameworks for operating systems [Campbell & Islam, 1993], debuggers [Bruegge et al.,
1993], communication tasks [Schmidt, 1997], user interface design [Weinand et al., 1988], and
Java Swing [JFC, 2009]. System infrastructure frameworks are used internally within a software
project and are usually not delivered to a client.
• Middleware frameworks are used to integrate existing distributed applications and components.
Common examples include Microsoft’s MFC and DCOM, Java RMI, WebObjects [Wilson &
Ostrem, 1999], WebSphere [IBM], WebLogic Enterprise Application [BEA], implementations of
CORBA [OMG, 2008], and transactional databases.
• Enterprise application frameworks are application specific and focus on domains such as
telecommunications, avionics, environmental modeling, manufacturing, financial engineering
[Birrer, 1993], and enterprise business activities [JavaEE, 2009].
Infrastructure and middleware frameworks are essential for rapidly creating high-quality
software systems, but they are usually not requested by external customers. Enterprise frameworks,
however, support the development of end-user applications. As a result, buying infrastructure and
middleware frameworks is more cost effective than building them [Fayad & Hamu, 1997].
Frameworks can also be classified by the techniques used to extend them.
• Whitebox frameworks rely on inheritance and dynamic binding for extensibility.
Existing functionality is extended by subclassing framework base classes and overriding
predefined hook methods using patterns such as the template method pattern [Gamma et al., 1994].
• Blackbox frameworks support extensibility by defining interfaces for components that
can be plugged into the framework. Existing functionality is reused by defining components that
conform to a particular interface and integrating these components with the framework using
delegation.
Whitebox frameworks require intimate knowledge of the framework’s internal structure.
Whitebox frameworks produce systems that are tightly coupled to the specific details of the
framework’s inheritance hierarchies, and thus changes in the framework can require the
recompilation of the application. Blackbox frameworks are easier to use than whitebox
frameworks because they rely on delegation instead of inheritance. However, blackbox
frameworks are more difficult to develop because they require the definition of interfaces and
hooks that anticipate a wide range of potential use cases. Moreover, it is easier to extend and
reconfigure blackbox frameworks dynamically, as they emphasize dynamic object relationships
rather than static class relationships. [Johnson & Foote, 1988].
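A minimal Java sketch contrasting the two styles; the ReportGenerator framework classes are
hypothetical and serve only to show where inheritance and delegation come into play:

// Whitebox style: the framework base class fixes the control flow and the
// application extends it by overriding a hook method (template method pattern).
abstract class ReportGenerator {
    public final void generate() {       // template method: invariant control flow
        openDocument();
        writeBody();                      // hook overridden by the application
        closeDocument();
    }
    protected abstract void writeBody();
    private void openDocument() { /* framework code */ }
    private void closeDocument() { /* framework code */ }
}

class SalesReport extends ReportGenerator {
    protected void writeBody() { /* application-specific content */ }
}

// Blackbox style: the framework defines an interface and the application plugs
// in a conforming component; reuse happens through delegation, not subclassing.
interface BodyWriter {
    void writeBody();
}

class BlackboxReportGenerator {
    private final BodyWriter writer;

    BlackboxReportGenerator(BodyWriter writer) { this.writer = writer; }

    public void generate() {
        /* open the document */
        writer.writeBody();               // delegate to the plugged-in component
        /* close the document */
    }
}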
Frameworks, class libraries, and design patterns
Frameworks are closely related to design patterns, class libraries, and components.
Design patterns versus frameworks. The main difference between frameworks and patterns is
that frameworks focus on reuse of concrete designs, algorithms, and implementations in a
particular programming language. In contrast, patterns focus on reuse of abstract designs and small
collections of cooperating classes. Frameworks focus on a particular application domain, whereas
design patterns can be viewed more as building blocks of frameworks.
Class libraries versus frameworks. Classes in a framework cooperate to provide a reusable
architectural skeleton for a family of related applications. In contrast, class libraries are less domain
specific and provide a smaller scope of reuse. For instance, class library components, such as
classes for strings, complex numbers, arrays, and bitsets, can be used across many application
domains. Class libraries are typically passive; that is, they do not implement or constrain the
control flow. Frameworks, however, are active; that is, they control the flow of control within an
application. In practice, developers often use frameworks and class libraries in the same system.
For instance, frameworks use class libraries, such as foundation classes, internally to simplify the
development of the framework. Similarly, application-specific code invoked by framework event
handlers uses class libraries to perform basic tasks, such as string processing, file management,
and numerical analysis.
Components versus frameworks. Components are self-contained instances of classes that are
plugged together to form complete applications. In terms of reuse, a component is a blackbox that
defines a cohesive set of operations that can be used solely with knowledge of the syntax and
semantics of its interface. Compared with frameworks, components are less tightly coupled and
can even be reused on the binary code level. That is, applications can reuse components without
having to subclass from existing base classes. The advantage is that applications do not always
have to be recompiled when components change. The relationship between frameworks and
components is not predetermined. On the one hand, frameworks can be used to develop
components, where the component interface provides a facade pattern for the internal class
structure of the framework. On the other hand, components can be plugged into blackbox
frameworks. In general, frameworks are used to simplify the development of infrastructure and
middleware software, whereas components are used to simplify the development of end-user
application software.
8.5 Managing Reuse
Historically, software development started as a craft, in which each application was custom made
according to the wishes and needs of a single customer. After all, software development
represented only a fraction of the cost of hardware, and computing solutions were affordable only
to few. With the price of hardware dropping and computing power increasing exponentially, the
number of customers and the range of applications has broadened dramatically. Conversely,
software costs increased as applications became more complex. This trend reached the point where
software represented the largest cost in any computing solution, putting tremendous economic
pressure on the project manager to reduce the cost of software. With no silver bullet in sight,
systematic reuse of code, designs, and processes became an attractive solution. Reuse, whether
design patterns, frameworks, or components, has many technical and managerial advantages:
• Lower development effort. When reusing a solution or a component, many standard
errors are avoided. Moreover, in the case of design patterns, the resulting system is more easily
extended and more resilient to typical changes. This results in less development effort and reduces
the need for human resources, which can be redirected to testing the software to ensure better
quality.
• Lower risk. When reusing repetitively the same design pattern or component, the typical
problems that will be encountered are known and can be anticipated. Moreover, the time needed
to adapt the design pattern or to glue the component is also known, resulting in a more predictable
development process and fewer risks.
• Widespread use of standard terms. The reuse of a standard set of design patterns and
components fosters the use of a standard vocabulary. For example, terms such as Adapter, Bridge,
Command, or Facade denote precise concepts that all developers become familiar with. This
reduces the number of different terms and solutions to common problems and reduces
misunderstandings among developers.
• Increased reliability. Reuse by itself does not increase reliability or reduce the need for
testing (see the Ariane 501 incident in Section 3.1 as an illustrative example). Components and
pattern solutions that worked in one context can exhibit unexpected failures in other contexts.
However, a culture of reuse in a software organization can increase reliability for all of the above
reasons: reduced development time can lead to an increased testing effort, repetitive use of
components can lead to a knowledge base of typical problems to be anticipated, and use of standard
terms reduces communication failures.
Unfortunately, reuse does not occur spontaneously within a development organization. The
main challenges include
• NIH (Not Invented Here) syndrome. Since software engineering education (at least
until recently) emphasizes mostly the design of new solutions, developers often distrust the reuse
of existing solutions, especially when the customization of the solution under consideration is
limited or constrained. In such situations, developers believe that they can develop a completely
new solution that is better adapted to their specific problem (which is usually true) in less time
than what they need to understand the reused solution (which is usually not true). Moreover, the
advantages of reuse are visible only in the longer term, while the gratification of developing a new
implementation is instantaneous.
• Process support. The processes associated with identifying, reusing, and customizing an
existing solution are different than those involved in creating a brand-new solution. The first set
of activities requires painstakingly sifting through a large and evolving corpus of knowledge and
carefully evaluating the findings. The second set of activities requires creativity and a good
understanding of the problem. Most software engineering tools and methods are better adapted to
creative activities than to reuse. For example, there are currently many catalogs of design patterns,
but no systematic method for novice developers to identify quickly the appropriate pattern that
should be used in a given situation.
• Training. Given the lack of knowledge support tools for reuse, training is the single most
effective method in establishing a reuse culture. Consequently, the burden of educating developers
to specific reusable solutions and components falls on the development organization.
8.5.1 Documenting Reuse
Reuse activities involve two types of documentation: the documentation of the template solution
being reused and the documentation of the system that is reusing the solution.
The documentation of a reusable solution (e.g., a design pattern, a framework, or a
component) includes not only a description of the solution, but also a description of the class of
problems it addresses, the trade-offs faced by the developer, alternative implementations, and
examples of use. This documentation is typically difficult to produce, as the author of the reusable
solution may not be able to anticipate all the problems it can be used for. Moreover, such
documentation is usually generic and abstract and must be illustrated by concrete examples for
novice developers to fully understand the parameters of the solution. Consequently, documentation
of a reusable solution is usually not ideal. However, developers can incrementally improve this
documentation each time they reuse a solution by adding the following:
• Reference to a system using the solution. Minimally, the documentation of the reusable
solution should include references to each use. If defects are discovered in the reused solution,
these defects can be systematically corrected in all occurrences of reuse.
• Example of use. Examples are essential for developers to understand the strengths and
limitation of the reused solution. Each occurrence of reuse constitutes an example. Developers
should include a brief summary illustrating the problems being solved and the adopted solution.
• Alternative solutions considered. As we saw in this chapter, many design patterns are
similar. However, selecting the wrong pattern can lead to more problems than developing a custom
solution. In the documentation of the example, developers should indicate which other candidate
solutions they discarded and why.
• Encountered trade-offs. Reuse, especially in the case of frameworks and components,
often entails making a compromise and selecting a less-than-optimal solution for some criteria.
For example, one component may offer an interface that is extensible, and another may deliver
better response time.
The documentation of the system under construction should minimally include references
to all the reused solutions. For example, design patterns are not immediately identifiable in the
code, as the classes involved usually have names different from names used in the standard pattern.
Many patterns draw their benefits from the decoupling of certain classes (e.g., the bridge client
from the bridge implementations), so such classes should remain decoupled during future changes
to the system. Similarly, explicitly documenting which classes use which components makes it
easier to adapt the client classes to newer versions of the reused components. Consequently,
developers can further increase the benefits of reuse by documenting the links between reused
solutions and their code, in addition to the standard object design documentation.
A contributing factor for the high cost of change late in the process is the loss of design
context. Developers forget quickly the reasons behind designing complicated workarounds or
complex data structures during early phases of the process. When changing code late in the
process, the probability of introducing errors into the system is high. Hence, the reason for
recording trade-offs, examples, alternatives, and other decision-making information is also to
reduce the cost of change.
8.5.2 Assigning Responsibilities
Individual developers assigned to subsystems will not spontaneously turn to design patterns and
components unless they have experience with these topics. To foster a reuse culture, an
organization needs to make the incentives of reuse as high as possible for the individual developer.
This includes access to expert developers who can provide advice and information about specific components or patterns, training, and emphasis on reuse during design reviews and code inspections. The availability of knowledge lowers the frustration experienced when climbing the learning curve associated with a component. The explicit review of pattern usage (or lack
thereof) increases the organizational incentive for investing time into looking for ready solutions.
Below are the main roles involved in reuse:
• Component expert. The component expert is familiar with using a specific component.
The component expert is a developer and usually has received third-party training in the use of the
component.
• Pattern expert. The pattern expert is the analog of the component expert for a family of
design patterns. However, pattern experts are usually self-made and acquire their knowledge from
experience.
• Technical writer. The technical writer must be aware of reuse and document
dependencies between components, design patterns, and the system, as discussed in the previous
section. This may require the technical writer to become familiar with the solutions typically
reused by the organization and with their associated terms.
• Configuration manager. In addition to tracking configurations and versions of
individual subsystems, the configuration manager must also be aware of the versions of the
components that are used. While newer versions of the components may be used, their introduction
requires tests to be repeated and changes related to the upgrade documented.
The technical means of achieving reuse (e.g., inheritance, delegation, design patterns,
application frameworks) have been available to software engineers for nearly two decades. The
success factors associated with reuse are actually not technical, but managerial. Only an
organization that provides the tools for selecting and improving reusable solutions and the culture
to encourage their use can reap the benefits of design and code reuse.
(Note: ARENA case study is there in Page number 338/333 of the
textbook Object-oriented software engineering _ using UML,
Patterns, -- Bruegge, Bernd; Dutoit, Allen H, in section 8.6)
Module 4, Chapter 2
Object Design: Specifying Interfaces
During object design, we identify and refine solution objects to realize the subsystems defined
during system design. During this activity, our understanding of each object deepens: we specify
the type signatures and the visibility of each of the operations, and, finally, we describe the
conditions under which an operation can be invoked and those under which the operation raises an
exception. As the focus of system design was on identifying large chunks of work that could be
assigned to individual teams or developers, the focus of object design is on specifying the
boundaries between objects. At this stage in the project, a large number of developers concurrently
refines and changes many objects and their interfaces. The pressure to deliver is increasing and the
opportunity to introduce new, complex faults into the design is still there. The focus of interface
specification is for developers to communicate clearly and precisely about increasingly lower-level
details of the system.
The interface specification activities of object design include
• identifying missing attributes and operations
• specifying type signatures and visibility
• specifying invariants
• specifying preconditions and postconditions.
In this chapter, we provide an overview of the concepts of interface specification. We
introduce OCL (Object Constraint Language) as a language for specifying invariants,
preconditions, and postconditions. We discuss heuristics and stylistic guidelines for writing
readable constraints. Finally, we examine the issues related to documenting and managing
interface specifications.
9.2 An Overview of Interface Specification
At this point in system development, we have made many decisions about the system and produced
a wealth of models:
• The analysis object model describes the entity, boundary, and control objects that are
visible to the user. The analysis object model includes attributes and operations for each object.
• Subsystem decomposition describes how these objects are partitioned into cohesive
pieces that are realized by different teams of developers. Each subsystem includes high level
service descriptions that indicate which functionality it provides to the others.
• Hardware/software mapping identifies the components that make up the virtual
machine on which we build solution objects. This may include classes and APIs defined by existing
components.
• Boundary use cases describe, from the user’s point of view, administrative and
exceptional cases that the system handles.
• Design patterns selected during object design reuse describe partial object design models
addressing specific design issues.
All these models, however, reflect only a partial view of the system. Many puzzle pieces
are still missing and many others are yet to be refined. The goal of object design is to produce an
object design model that integrates all of the above information into a coherent and precise whole.
The goal of interface specification, the focus of this chapter, is to describe the interface of each
object precisely enough so that objects realized by individual developers fit together with minimal
integration issues. To this end, interface specification includes the following activities:
• Identify missing attributes and operations. During this activity, we examine each
subsystem service and each analysis object. We identify missing operations and attributes that are
needed to realize the subsystem service. We refine the current object design model and augment it
with these operations.
• Specify visibility and signatures. During this activity, we decide which operations are
available to other objects and subsystems, and which are used only within a subsystem. We also
specify the return type of each operation as well as the number and type of its parameters. The goal of this activity is to reduce coupling among subsystems and provide a small and simple
interface that can be understood easily by a single developer.
• Specify contracts. During this activity, we describe in terms of constraints the behavior
of the operations provided by each object. In particular, for each operation, we describe the
conditions that must be met before the operation is invoked and a specification of the result after
the operation returns.
The large number of objects and developers, the high rate of change, and the number of concurrent decisions made during object design make object design much more complex than
analysis or system design. This represents a management challenge, as many important decisions
tend to be resolved independently and are not communicated to the rest of the project. Object
design requires much information to be made available among the developers so that decisions can
be made consistent with decisions made by other developers and consistent with design goals. The
Object Design Document, a live document describing the specification of each class, supports this
information exchange.
9.3 Interface Specification Concepts
In this section, we present the principal concepts of interface specification:
• Class Implementor, Class Extender, and Class User (Section 9.3.1)
• Types, Signatures, and Visibility (Section 9.3.2)
• Contracts: Invariants, Preconditions, and Postconditions (Section 9.3.3)
• Object Constraint Language (Section 9.3.4)
• OCL Collections: Sets, Bags, and Sequences (Section 9.3.5)
• OCL Quantifiers: forAll and exists (Section 9.3.6).
9.3.1 Class Implementor, Class Extender, and Class User
So far, we have treated all developers as equal. Now that we are delving into the details of object
design and implementation, we need to differentiate developers based on their point of view. While
all use the interface specification to communicate about the class of interest, they view the
specifications from radically different points of view (see also Figure 9-1):
• The class implementor is responsible for realizing the class under consideration. Class
implementors design the internal data structures and implement the code for each public operation.
For them, the interface specification is a work assignment.
• The class user invokes the operations provided by the class under consideration during
the realization of another class, called the client class. For class users, the interface specification
discloses the boundary of the class in terms of the services it provides and the assumptions it makes
about the client class.
• The class extender develops specializations of the class under consideration. Like class implementors, class extenders may invoke operations provided by the class of interest, but they focus on specialized versions of the same services. For them, the interface specification both specifies the current behavior of the class and states any constraints on the services provided by the specialized class.

9.3.2 Types, Signatures, and Visibility


During analysis, we identified attributes and operations without necessarily specifying their types
or their parameters. During object design, we refine the analysis and system design models by
completing type and visibility information. The type of an attribute specifies the range of values
the attribute can take and the operations that can be applied to the attribute. For example, consider
the attribute maxNumPlayers of the Tournament class in ARENA (Figure 9-3). maxNumPlayers
represents the maximum number of Players who can be accepted in a given Tournament. Its type is
int, denoting that it is an integer number. The type of the maxNumPlayers attribute also defines
the operations that can be applied to this attribute: we can compare, add, subtract, or multiply other
integers to maxNumPlayers.
Operation parameters and return values are typed in the same way as attributes are. The
type constrains the range of values the parameter or the return value can take. Given an operation,
the tuple made out of the types of its parameters and the type of the return value is called the
signature of the operation. For example, the acceptPlayer() operation of Tournament takes one
parameter of type Player and does not have a return value. The signature for acceptPlayer() is then
acceptPlayer(Player):void. Similarly, the getMaxNumPlayers() operation of Tournament takes no
parameters and returns an int. The signature of getMaxNumPlayers() is then
getMaxNumPlayers(void):int.
The class implementor, the class user, and the class extender all access the operations and
attributes of the class under consideration. However, these developers have different needs and are
usually not allowed to access all operations of the class. For example, a class implementor accesses
the internal data structures of the class that the class user cannot see. The class extender accesses
only selected internal structures of superclasses. The visibility of an attribute or an operation is a
mechanism for specifying whether the attribute or operation can be used by other classes or not.
UML defines four levels of visibility:
• A private attribute can be accessed only by the class in which it is defined. Similarly, a
private operation can be invoked only by the class in which it is defined. Private attributes and
operations cannot be accessed by subclasses or calling classes. Private operations and attributes
are intended for the class implementor only.
• A protected attribute or operation can be accessed by the class in which it is defined
and by any descendant of that class. Protected operations and attributes cannot be accessed by any
other class. Protected operations and attributes are intended for the class extender.
• A public attribute or operation can be accessed by any class. The set of public
operations and attributes constitute the public interface of the class and is intended for the class
user.
• An attribute or an operation with visibility package can be accessed by any class in the
nearest enclosing package. This visibility enables a set of related classes (for example, forming a
subsystem) to share a set of attributes or operations without having to make them public to the
entire system.
Visibility is denoted in UML by prefixing the name of the attribute or the operation with a
character symbol: – for private, # for protected, + for public, or ~ for package. For example, in
Figure 9-3, we specify that the maxNumPlayers attribute of Tournament is private, whereas all the
class operations are public.
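Although the ARENA source code is not reproduced in these notes, the UML visibility markers map directly onto Java access modifiers. The following sketch shows how the Tournament class of Figure 9-3 could be declared in Java under that mapping; the method bodies are illustrative placeholders, and the Player class is assumed to be defined elsewhere in ARENA.

import java.util.ArrayList;
import java.util.List;

public class Tournament {
    // '-' in UML: private, visible only to the class implementor
    private int maxNumPlayers;
    private List<Player> players = new ArrayList<Player>();

    // '+' in UML: public, part of the interface offered to class users
    public Tournament(int maxNumPlayers) {
        this.maxNumPlayers = maxNumPlayers;
    }
    public int getMaxNumPlayers() {
        return maxNumPlayers;
    }
    public int getNumPlayers() {
        return players.size();
    }
    public boolean isPlayerAccepted(Player p) {
        return players.contains(p);
    }
    public void acceptPlayer(Player p) {
        players.add(p);
    }
    public void removePlayer(Player p) {
        players.remove(p);
    }
}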
Type information alone is often not sufficient to specify the range of legitimate values of
an attribute. In the Tournament example, the int type allows maxNumPlayers to take negative
values, which does not make sense in the application domain. We address this issue with contracts.
9.3.3 Contracts: Invariants, Preconditions, and Postconditions
Contracts are constraints on a class that enable class users, implementors, and extenders to share
the same assumptions about the class [Meyer, 1997]. A contract specifies constraints that the class
user must meet before using the class, as well as constraints that the class implementor and the class extender must ensure in return. Contracts include three types of constraints:
• An invariant is a predicate that is always true for all instances of a class. Invariants are
constraints associated with classes or interfaces. Invariants are used to specify consistency
constraints among class attributes.
• A precondition is a predicate that must be true before an operation is invoked.
Preconditions are associated with a specific operation. Preconditions are used to specify constraints
that a class user must meet before calling the operation.
• A postcondition is a predicate that must be true after an operation is invoked.
Postconditions are associated with a specific operation. Postconditions are used to specify
constraints that the class implementor and the class extender must ensure after the invocation of
the operation.
For example, consider the Java interface for the Tournament from Figure 9-3. This class
provides an acceptPlayer() method to add a Player in the Tournament, a removePlayer() method
to withdraw a Player from the Tournament (e.g., because the player cancelled his application), and
a getMaxNumPlayers() method to get the maximum number of Players who can participate in this
Tournament.
An example of an invariant for the Tournament class is that the maximum number of
Players in the Tournament should be positive. If a Tournament is created with a maxNumPlayers
that is zero, the acceptPlayer() method will always violate its contract and the Tournament will
never start. Using a boolean expression, in which t is a Tournament, we can express this invariant
as
t.getMaxNumPlayers() > 0
An example of a precondition for the acceptPlayer() method is that the Player to be added
has not already been accepted in the Tournament and that the Tournament has not yet reached its maximum number of Players. Using a boolean expression, in which t is a Tournament and p is a Player, we express this precondition as
!t.isPlayerAccepted(p) and t.getNumPlayers() < t.getMaxNumPlayers()
An example of a postcondition for the acceptPlayer() method is that the current number
of Players must be exactly one more than the number of Players before the invocation of
acceptPlayer(). We can express this postcondition as
t.getNumPlayers_afterAccept = t.getNumPlayers_beforeAccept + 1
where getNumPlayers_beforeAccept and getNumPlayers_afterAccept denote the number of Players before and after the invocation of acceptPlayer(), respectively.
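Such constraints can also be turned into run-time checks by the class implementor. The following is a minimal sketch (not the ARENA implementation) of how acceptPlayer() could assert the precondition and postcondition above, assuming the accessor operations already introduced and an internal collection named players:

// Sketch only: checking the acceptPlayer() contract with Java assertions
// (assertions are enabled with the -ea flag of the JVM).
public void acceptPlayer(Player p) {
    // precondition: p not yet accepted and Tournament not full
    assert !isPlayerAccepted(p) && getNumPlayers() < getMaxNumPlayers();

    int numPlayersBefore = getNumPlayers();
    players.add(p);   // players: assumed internal collection of accepted Players

    // postcondition: exactly one more Player than before the call
    assert getNumPlayers() == numPlayersBefore + 1;
}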
We use invariants, preconditions, and postconditions to specify special or exceptional cases
unambiguously. It is also possible to use constraints to completely specify the behavior of an
operation. Such a use of constraints, called “constraint-based specification,” however, is difficult
and can be more complicated than implementing the operation itself. In this book, we do not
describe pure constraint-based specifications. Instead, we focus on specifying operations using
both constraints and natural language and emphasizing boundary cases for the purpose of better
communication among developers.
9.3.4 Object Constraint Language
A constraint can be expressed in natural language or in a formal language such as Object
Constraint Language (OCL) [OMG, 2006]. OCL is a language that allows constraints to be
formally specified on single model elements (e.g., attributes, operations, classes) or groups of
model elements (e.g., associations and participating classes). In the next two sections, we introduce
the basic syntax of OCL. For a complete tutorial on OCL, we refer to [Warmer & Kleppe, 2003].
A constraint is expressed as a boolean expression returning the value True or False. A
constraint can be depicted as a note attached to the constrained UML element by a dependency
relationship. Figure 9-4 depicts a class diagram of Tournament example of the previous section
using UML and OCL.

Attaching OCL expressions to diagrams can lead to clutter. For this reason, OCL
expressions can be alternatively expressed in a textual form. For example, the invariant for the
Tournament class requiring the attribute maxNumPlayers to be positive is written as follows:
context Tournament inv: self.maxNumPlayers > 0
The context keyword indicates the entity to which the expression applies. This is followed
by one of the keywords inv, pre, and post, which correspond to the UML stereotypes «invariant»,
«precondition», and «postcondition», respectively. Then follows the actual OCL expression.
OCL’s syntax is similar to object-oriented languages such as C++ or Java. However, OCL is not a
procedural language and thus cannot be used to denote control flow. Operations can be used in
OCL expressions only if they do not have any side effects.
For invariants, the context for the expression is the class associated with the invariant. The
keyword self (e.g., self.numElements) denotes all instances of the class. Attributes and operations are accessed using the dot notation (e.g., self.maxNumPlayers accesses maxNumPlayers in the current context). The self keyword can be omitted if there is no ambiguity.
For preconditions and postconditions, the context of the OCL expression is an operation.
The parameters passed to the operation can be used as variables in the expression. For example,
consider the following precondition on the acceptPlayer() operation in Tournament:
context Tournament::acceptPlayer(p:Player) pre:
    !isPlayerAccepted(p) and getNumPlayers() < getMaxNumPlayers()
The creators and users of OCL constraints are developers during object design and during
implementation. In Java programs, tools such as iContract [Kramer, 1998] enable developers to
document constraints in the source code using Javadoc style tags, so that constraints are more
readily accessed and updated. Figure 9-5 depicts the Java code corresponding to the constraints
introduced so far.
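Figure 9-5 itself is not reproduced in these notes. As a rough sketch of the idea, the constraints can be embedded in Javadoc comments using iContract-style tags; the exact tag and expression syntax below is only indicative, and the remaining Tournament operations are omitted.

/** @invariant getMaxNumPlayers() > 0 */
public class Tournament {
    // attributes and remaining operations as in Figure 9-3 (omitted here)

    /**
     * Accepts a new Player into this Tournament.
     *
     * @pre  !isPlayerAccepted(p) && getNumPlayers() < getMaxNumPlayers()
     * @post getNumPlayers() == getNumPlayers()@pre + 1
     */
    public void acceptPlayer(Player p) {
        // body realized by the class implementor
    }
}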
9.3.5 OCL Collections: Sets, Bags, and Sequences


In general, constraints involve an arbitrary number of classes and attributes. Consider the class
model of Figure 9-6 representing the associations among the League, Tournament, and Player
classes. Let’s assume we want to refine the model with the following constraints:
1. A Tournament’s planned duration must be under one week.
2. Players can be accepted in a Tournament only if they are already registered with the
corresponding League.
3. The active Players of a League are those who have taken part in at least one Tournament of the League.

Now, let’s review the above constraints in terms of the instances of Figure 9-7:
1. The winter:Tournament lasts two days, the xmas:Tournament three days, both under a
week.
2. All Players of the winter:Tournament and the xmas:Tournament are associated with
tttExpert:League. The Player zoe, however, is not part of the tttExpert:League and does
not take part in either Tournament.
3. tttExpert:League has four active Players, whereas the chessNovice:League has none,
because zoe does not take part in any Tournament.
At first sight, these constraints vary quite a bit: for example, the first constraint involves
attributes of a single class (Tournament.start and Tournament.end); the second one involves three
classes (i.e., Player, Tournament, League) and their associations; the third involves the Players of all Tournaments of a League, reached by navigating a series of associations. In all cases, we start with the class of interest and navigate to one or
more classes in the model.
In general, we distinguish three cases of navigation (Figure 9-8):


• Local attribute. The constraint involves an attribute that is local to the class of interest
(e.g., duration of a Tournament in constraint 1),
• Directly related class. The expression involves the navigation of a single association to
a directly related class (e.g., Players of a Tournament, League of a Tournament).
• Indirectly related class. The constraint involves the navigation of a series of associations
to an indirectly related class (e.g., the Players of all Tournaments of a League).
All constraints can be built using a combination of these three basic cases of navigation.
Once we know how to deal with these three cases of navigation, we can build any constraint. We
already know how to deal with the first type of constraint with the dot notation, as we saw in the
previous section. For example, we can write constraint 1 as follows:
context Tournament inv: self.end - self.start <= Calendar.WEEK
In the second constraint, however, the expression league.players can actually refer to many
objects, since the players association is a many-to-many association. To deal with this situation,
OCL provides additional data types called collections. There are three types of collections:
• OCL sets are used when navigating a single association. For example, navigating the
players association of the winter:Tournament yields the set {alice, bob}. Navigating the players
association from the tttExpert:League yields the set {alice, bob, marc, joe}. Note, however, that
navigating an association of multiplicity 1 yields directly an object, not a set. For example,
navigating the league association from winter:Tournament yields tttExpert:League (as opposed to
{tttExpert:League}).
• OCL sequences are used when navigating a single ordered association. For example, the
association between League and Tournament is ordered. Hence, navigating the tournaments
association from tttExpert:League yields [winter:Tournament, xmas:Tournament] with the index
of winter:Tournament and xmas:Tournament being 1 and 2, respectively.
• OCL bags are multisets: they can contain the same object multiple times. Bags are used
to accumulate the objects when accessing indirectly related objects. For example, when
determining which Players are active in the tttExpert:League, we first navigate the tournaments
association of tttExpert, then the players association from winter:Tournament, and finally the
players association from xmas:Tournament, yielding the bag {alice, bob, bob, marc, joe}. The bag
resulting from navigating the same associations from chessNovice:League results in the empty
bag, as there are no Tournaments in the chessNovice League. In cases where the number of occurrences
of each object in the bag is undesired, the bag can be converted to a set.
OCL provides many operations for accessing collections. The most often used are
• size, which returns the number of elements in the collection
• includes(object), which returns True if object is in the collection
• select(expression), which returns a collection that contains only the elements of the
original collection for which expression is True
• union(collection), which returns a collection containing elements from both the original
collection and the collection specified as parameter
• intersection(collection), which returns a collection that contains only the elements that
are part of both the original collection and the collection specified as parameter
• asSet(), which returns a set containing each element of the collection.
To distinguish attributes in classes from collections, OCL uses the dot notation for accessing attributes and the -> operator for accessing collections. For example, constraint 2 (listed above) can be expressed with an includes operation as follows:
context Tournament::acceptPlayer(p:Player) pre:
    league.players->includes(p)
Starting from the Tournament, we first navigate the league association to reach the League. The next association we navigate is the players association on the League, which results in a set because of the “many” multiplicity of the association. We use the OCL includes() operation on this set to test if the Player p is known to the League.
Navigating a series of at least two associations with one-to-many or many-to-many
multiplicity results in a bag. For example, in the context of a League, the expression
tournaments.players contains the concatenation of all players of the Tournaments related to the
current League. As a result of this concatenation, elements can appear several times. To remove
the duplicates in this bag, for example, when counting the number of Players in a League that have
taken part in a Tournament, we can convert the bag into a set using the OCL asSet operation.
Consequently, we can write constraint 3 (listed above) as follows:

9.3.6 OCL Quantifiers: forAll and exists


So far, we presented examples of constraints using common OCL collection operations such as
includes, union, or asSet. Two additional operations on collections enable us to iterate over
collections and test expressions on each element:
• forAll(variable|expression) is True if expression is True for all elements in the collection.
• exists(variable|expression) is True if there exists at least one element in the collection for
which expression is True.
For example, to ensure that all Matches in a Tournament occur within the Tournament’s
time frame, we can repetitively test the start dates of all matches against the Tournament using
forAll(). Consequently, we write this constraint as follows:
context Tournament inv:
    matches->forAll(m | m.start.after(start) and m.start.before(end))
The OCL exists() operation is similar to forAll(), except that the expressions evaluated on
each element are ORed, that is, only one element needs to satisfy the expression for the exists()
operation to return True. For example, to ensure that each Tournament conducts at least one Match
on the first day of the Tournament, we can write:

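The OCL form of this last constraint is not reproduced in these notes. The semantics of both quantifiers, however, correspond directly to universally and existentially quantified checks over a collection. As an illustration only (Java streams, not OCL, with assumed accessors getMatches(), getStart(), and getEnd() returning the Matches of a Tournament and java.util.Date values), the two checks could be phrased as:

// Illustration of forAll/exists semantics with java.util.stream (not OCL).
class QuantifierIllustration {
    static boolean allMatchesInTimeFrame(Tournament t) {
        return t.getMatches().stream()
                .allMatch(m -> m.getStart().after(t.getStart())
                            && m.getStart().before(t.getEnd()));
    }

    // simplification of "at least one Match on the first day":
    // here, at least one Match starts exactly at the Tournament's opening
    static boolean someMatchAtOpening(Tournament t) {
        return t.getMatches().stream()
                .anyMatch(m -> m.getStart().equals(t.getStart()));
    }
}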
9.4 Interface Specification Activities


Interface specification includes the following activities:
• Identifying Missing Attributes and Operations (Section 9.4.1)
• Specifying Type Signatures and Visibility (Section 9.4.2)
• Specifying Preconditions and Postconditions (Section 9.4.3)
• Specifying Invariants (Section 9.4.4)
• Inheriting Contracts (Section 9.4.5).


To illustrate these activities, we use the ARENA object design model resulting from the
AnnounceTournament use case (Section 5.6). During analysis, we identified several boundary,
control, and entity classes: The TournamentForm class is responsible for generating and processing
all user interface forms, and the TournamentControl class is responsible for coordinating all
transactions between the TournamentForm and the entity classes Tournament, Player, and Match.
Figure 9-9 depicts the attributes, operations, and associations among these classes that have been
identified during analysis. First, we resolve any remaining requirements issues and identify
missing attributes and operations.

9.4.1 Identifying Missing Attributes and Operations


During this step, we examine the service description of the subsystem and identify missing
attributes and operations. During analysis, we may have missed many attributes because we
focused on the functionality of the system: we described the functionality of the system primarily
with the use case model (as opposed to operations in the object model). We focused on the
application domain when constructing the object model and therefore ignored details related to the
system that are independent of the application domain.
To prevent a player from applying to two different tournaments that will be conducted at
the same time, we draw a sequence diagram representing the control and data flow needed (Figure
9-10). Drawing this diagram leads us to the identification of an additional operation,
isPlayerOverbooked(), that checks if the start and end dates of the Tournament of interest overlap
with those of other Tournaments into which the Player has already been accepted.

Since isPlayerOverbooked() enforces a policy that we recently identified while organizing Tournaments, we attach this operation to the TournamentControl class, as opposed to the Player class or the Tournament class. This results in an entity object model that is simpler and more modifiable. For example, other definitions of player commitments (e.g., a player can play only one tournament per week) or policies involving other objects (e.g., matches involving young players must occur before a certain time of the day) can be added or substituted in the TournamentControl class without changes to the entity classes.
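As a hedged sketch of this policy check (the actual ARENA signature is not shown in these notes, and the accessors used below are assumptions), isPlayerOverbooked() could be realized in TournamentControl along these lines:

// Sketch: returns true if the Player is already accepted into another
// Tournament whose dates overlap those of the Tournament being organized.
// Assumes TournamentControl holds a reference 'tournament' to that Tournament
// and that Player.getTournaments() and Tournament.getStart()/getEnd() exist.
public boolean isPlayerOverbooked(Player p) {
    for (Tournament other : p.getTournaments()) {
        boolean overlaps = !(other.getEnd().before(tournament.getStart())
                          || other.getStart().after(tournament.getEnd()));
        if (overlaps) {
            return true;
        }
    }
    return false;
}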
9.4.2 Specifying Types, Signatures, and Visibility
During this step, we specify the types of the attributes, the signatures of the operations, and the
visibility of attributes and operations. Specifying types refines the object design model in two
ways. First, we add detail to the model by specifying the range of each attribute. For example, by
determining the type of the start and end date of a Tournament, we make decisions about the
granularity of the time tracked by the application. By selecting a representation of time including
days, hours, minutes, and seconds, we enable LeagueOwners to conduct several Tournaments per
day. Second, we map classes and attributes of the object model to built-in types provided by the
development environment. For example, by selecting String to represent the name attributes of
Leagues and Tournaments, we can use all the operations provided by the String class to manipulate
name values.
We also consider the relationship between the classes we identified and the classes from
existing components. For example, a number of classes implementing collections are provided in
the java.util package. The List interface provides a way to access an ordered collection of objects
independent from the underlying data structure. The Map interface provides a table mapping from
unique keys to arbitrary entries. We select the List interface for returning collections of objects,
such as the Tournaments to which a Player has been accepted. We select the Map interface for
returning mappings of objects, for example, Player to Scores.
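Because the exact ARENA method names are not reproduced in these notes, the following signatures are only indicative of how these java.util interface choices would appear in the specification:

import java.util.List;
import java.util.Map;

// Indicative signatures: program against the java.util interfaces rather
// than concrete collection classes.
public interface TournamentQueries {
    /** The Tournaments into which the given Player has been accepted, in order. */
    List<Tournament> getTournaments(Player player);

    /** A mapping from each accepted Player to his or her current score. */
    Map<Player, Integer> getScores(Tournament tournament);
}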
Finally, we determine the visibility of each attribute and operation during this step. By
doing so, we determine which attributes should be accessible only indirectly via the class’s
operations, and which attributes are public and can be modified by any other class. Similarly, the
visibility of operations allows us to distinguish between operations that are part of the class
interface and those that are utility methods that can only be accessed by the class. In the case of
abstract classes and classes that are intended to be refined, we also define protected attributes and
methods for the use of subclasses only. Figure 9-11 depicts the refinement of the object model
depicted in Figure 9-9 after types, signatures, and visibility have been assigned.

Once we have specified the types of each attribute, the signature of each operation, and its
visibility, we focus on specifying the behavior and boundary cases of each class by using contracts.
9.4.3 Specifying Pre- and Postconditions


During this step, we define contracts for each public operation of each class. We already said that
a contract is an agreement between the class user and the class implementor. The preconditions of
an operation describe the part of the contract that the class user must respect. The postconditions
describe what the class implementor guarantees in the event the class user fulfilled her part of the
contract. When refining a class, class extenders inherit the contract from the original class
implementor.
For example, in Section 9.4.1, we identified the operation isPlayerOverbooked() on the TournamentControl class, which checks if a Player’s existing commitments would prevent him from taking part in the current Tournament. As a class implementor, we want to make sure that the class user invokes this operation only for Players that have not yet been accepted into the Tournament. If this is the case,
we only need to check if the current Tournament overlaps any that the Player is already taking part
in. We express this with the following contract:

Preconditions and postconditions can also be used to specify dependencies among
operations in the same class. Consider, for example, the operations on the TournamentControl
class. Given a new Tournament, these operations must be invoked in a specific order. We cannot
resolve the sponsorship of a Tournament without knowing which sponsors are interested. Also, we
cannot advertise the Tournament before we resolve the sponsorship issue. For TournamentControl,
we can simply write preconditions and postconditions that examine the state of the associations of
the Tournament class. To state that sponsors cannot be selected before there are interested
advertisers, we write the following:

To ensure that TournamentControl.selectSponsors() is invoked only once, we add the
following precondition:

Finally, to specify how TournamentControl.selectSponsors() sets the advertisers
association, we add the following postcondition:
Below is the complete set of OCL constraints specifying the order of selectSponsors(),
advertiseTournament(), and acceptPlayer().

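The complete OCL listing is not reproduced in these notes. The same ordering can, however, be pictured in code: a hedged sketch of TournamentControl that raises an exception when an operation is invoked out of order might look as follows (all accessor and association names below are assumptions).

import java.util.List;

// Sketch: enforcing the order selectSponsors() -> advertiseTournament()
// -> acceptPlayer() by examining the state of the Tournament's associations.
public class TournamentControl {
    private Tournament tournament;

    public void selectSponsors(List<Advertiser> sponsors) {
        if (tournament.getInterestedSponsors().isEmpty()) {
            throw new IllegalStateException("No interested sponsors yet.");
        }
        if (!tournament.getAdvertisers().isEmpty()) {
            throw new IllegalStateException("Sponsors have already been selected.");
        }
        tournament.setAdvertisers(sponsors);
    }

    public void advertiseTournament() {
        if (tournament.getAdvertisers().isEmpty()) {
            throw new IllegalStateException("Sponsorship has not been resolved yet.");
        }
        // ... notify the selected Advertisers
    }
}

Mapping contract violations to exceptions in this way is revisited in Chapter 10, Mapping Models to Code.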
9.4.4 Specifying Invariants


Once you master the syntax and the concepts behind a constraint language such as OCL, writing
contracts for individual operations is relatively simple. The concept of a contract between the class
user and the class implementor is intuitive (e.g., “this is what I can do if you ensure these
conditions”) and focuses on a relatively short increment in time (i.e., the execution of a single
operation). However, grasping the essence of a class from operation-specific contracts is difficult;
much information is distributed throughout many constraints in many operations, so identifying
general properties of the class is difficult. Hence, the need for invariants. Invariants are much more
difficult to write than preconditions and postconditions, but they provide an overview of the
essential properties of the class.
Invariants constitute a permanent contract that extends and overwrites the operation-
specific contracts. The activity of identifying invariants is similar to that of finding abstract classes
during analysis (Section 5.4.10). A few are obvious and can be written from the start. Others can
be identified by extracting common properties from operation-specific contracts.
An example of an obvious invariant is that all Matches of a Tournament must occur within
the time frame of the Tournament:
context Tournament inv:
    matches->forAll(m | m.start.after(start) and m.start.before(end))
An example of an invariant that is not so obvious, but can be identified by examining the
contracts of the TournamentControl class, is that no Player can take part in two or more
Tournaments that overlap. Although this property can be inferred by examining the TournamentControl.isPlayerOverbooked() operation, we can write it concisely as an invariant.
Since it is a policy decision, we attach this invariant to the TournamentControl class, as opposed
to the Player or the Tournament class.

When specified on several associations, constraints usually become complex and difficult
to understand, especially when nested forAll statements are used. For example, consider an
invariant stating that all Matches in a Tournament must involve only Players that are accepted in
the Tournament:

This constraint involves three collections: players, p.tournaments, and t.matches. We can
simplify this expression by using a bag created while navigating a series of associations:

In general, reducing the number of operations and nesting levels in a constraint makes it
much more understandable.
As illustrated by these examples, it is relatively easy to generate a large number of
constraints for each class. This does not guarantee readability. In fact, writing readable and correct
constraints is difficult. Remember that the reason for writing invariants is to clarify the
assumptions made by the class implementor to the class user. Consequently, when writing
constraints, the class implementor should focus on simple, short constraints that describe boundary
cases that may not otherwise be obvious. Figure 9-12 lists several heuristics to make constraints
more readable.
Invariants, preconditions, and postconditions specify the semantics of each operation by
making explicit what must be true before and after an operation is executed. Hence, contracts
provide clear documentation to the user. As we see in the next section, contracts are also useful for
the class extender.
9.4.5 Inheriting Contracts
In a polymorphic language, a class can be substituted by any of its descendants. That is, a class user invoking operations on a class could instead be invoking operations of a subclass. Hence, the class user
expects that a contract that holds for the superclass still holds for the subclass. We call this contract
inheritance.
For example, in ARENA, consider the inheritance hierarchy between User, LeagueOwner,
Player, Spectator, and Advertiser (Figure 9-13). The User class has an invariant stating that the
email address should be not null so that each user can be notified. If at some point in our design,
we decide that Spectators do not really need an E-mail address, then this contract will be broken
and classes invoking the User.notify() method may break. Consequently, either Spectators should
be taken out of the User hierarchy (i.e., Spectator does not fulfill the User contract) or the invariant
should be revised (i.e., the terms of the contract should be reformulated).

Contracts are inherited in the following manner:


• Preconditions. A method of a subclass is allowed to weaken the precondition of the method it overrides. In other words, an overriding method can handle more cases than the method of its superclass. For example, consider a concrete TournamentStyle class, SimpleKnockOutStyle, that
can deal with any number of Players that is a power of 2. We can express this with a precondition
on the planMatches() operation that restricts the number of Players to a power of 2. A
ComplexKnockOutStyle class refining the SimpleKnockOutStyle could weaken this precondition
by planning Tournaments for any number of Players.
• Postconditions. Methods must ensure the same postconditions as their ancestors or
stricter ones. Assume you are implementing a Set by inheriting from a List (this is a case of
implementation inheritance and questionable practice, as we discussed in Section 8.3.2). The
postcondition of List.add() is that the size of the List increases by one. The Set.add() method, in this case, could not comply with this postcondition, since adding an element to a set does not necessarily increase the size of the set. Hence, Set should not be implemented as a subclass of List (see the sketch after this list).
• Invariants. A subclass must respect all invariants of its superclasses. However, a subclass
can strengthen the inherited invariants. For example, List inherits from Collection. Collection has
an invariant specifying that its size cannot be negative. Consequently, List must respect this
invariant and cannot have a negative size. However, List adds a new invariant that stipulates that
its elements are ordered.
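The Set-versus-List case in the postconditions rule above can be made concrete with a small sketch (hypothetical class, shown only to illustrate the violated postcondition):

import java.util.ArrayList;

// Questionable implementation inheritance: a "set" built by subclassing a list.
// The inherited postcondition of add() ("the size increases by one") cannot be
// guaranteed when the element is already present, so the contract is broken.
class ListBackedSet extends ArrayList<String> {
    @Override
    public boolean add(String element) {
        if (contains(element)) {
            return false;   // size unchanged: violates List's add() postcondition
        }
        return super.add(element);
    }
}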
Contract inheritance is particularly useful when specifying abstract classes or interfaces
that are meant to be refined by class extenders. By precisely documenting the boundary between
the client class and an interface, it is possible for class extenders to implement new refined extender
classes without being familiar with the invoking source code. Contract inheritance is also a
consequence of Liskov’s Substitution Principle (Section 8.3.4), since extender classes must be able
to substitute transparently for an ancestor.
9.5 Managing Object Design


In this section, we discuss management issues related to object design. There are two primary
management challenges during object design:
• Increased communication complexity. The number of participants involved during this
phase of development increases dramatically. The object design models and code are the result of
the collaboration of many people. Management needs to ensure that decisions among these
developers are made consistently with project goals.
• Consistency with prior decisions and documents. Developers often do not appreciate
completely the consequences of analysis and system design decisions before object design. When
detailing and refining the object design model, developers may question some of these decisions
and reevaluate them. The management challenge is to maintain a record of these revised decisions
and to make sure all documents reflect the current state of development.
We discuss these challenges in Section 9.5.1, where we focus on the Object Design
Document, its development and maintenance, and its relationship with other documents, and in
Section 9.5.2, where we describe the roles and responsibilities associated with object design.
9.5.1 Documenting Object Design
Object design is documented in the Object Design Document (ODD). It describes object design
trade-offs made by developers, guidelines they followed for subsystem interfaces, the
decomposition of subsystems into packages and classes, and the class interfaces. The ODD is used
to exchange interface information among teams and as a reference during testing. The audience
for the ODD includes system architects (i.e., the developers who participate in the system design),
developers who implement each subsystem, and testers.
There are three main approaches to documenting object design:
• Self-contained ODD generated from model. The first approach is to document the
object design model the same way we documented the analysis model or the system design model:
we write and maintain a UML model and generate the document automatically. This document
would duplicate any application objects identified during analysis. The disadvantages of this
solution include redundancy with the Requirements Analysis Document (RAD) and a high level
of effort for maintaining consistency with the RAD. Moreover, the ODD duplicates information in
the source code and requires a high level of effort whenever the code changes. This often leads to
an RAD and an ODD that are inaccurate or out of date.
• ODD as extension of the RAD. The second approach is to treat the object design model
as an extension of the analysis model. In other terms, the object design is considered as the set of
application objects augmented with solution objects. The advantage of this solution is that
maintaining consistency between the RAD and the ODD becomes much easier as a result of the
reduction in redundancy. The disadvantages of this solution include polluting the RAD with
information that is irrelevant to the client and the user. Moreover, object design is rarely as simple
as identifying additional solution objects. Often, application objects are changed or transformed to
accommodate design goals or efficiency concerns.
• ODD embedded into source code. The third approach is to embed the ODD into the
source code. As in the first approach, we represent the ODD using a modeling tool (see Figure 9-14). Once the ODD becomes stable, we use the modeling tool to generate class stubs. We describe
each class interface using tagged comments that distinguish source code comments from object
design descriptions. We can then generate the ODD using a tool that parses the source code and
extracts the relevant information (e.g., Javadoc [Javadoc, 2009a]). Once the object design model
is documented in the code, we abandon the initial object design model. The advantage of this
approach is that the consistency between the object design model and the source code is much
easier to maintain: when changes are made to the source code, the tagged comments are updated
and the ODD regenerated. In this section, we focus only on this approach.
The fundamental issue is one of maintaining consistency among two models and the source
code. Ideally, we want to maintain the analysis model, the object design model, and the source
code using a single tool. Objects would then be described once, and consistency among
documentation, stubs, and code would be maintained automatically.
Presently, however, UML modeling tools provide facilities for generating a document from
a model or class stubs from a model. For example, the glossary of the RAD can be generated from
the analysis model by collating the description fields attached to each class (Figure 9-14). The
class stub generation facility, called forward engineering, can be used in the self-contained ODD
approach to generate the class interfaces and stubs for each method.
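For example, a forward-engineering facility given the League class of the object design model might emit a stub along the following lines (illustrative output, not that of any particular tool):

import java.util.List;

// Illustrative generated stub: declarations come from the model,
// method bodies are left for the class implementor to fill in.
public class League {
    private String name;
    private List<Tournament> tournaments;

    /** Returns the name of this League. */
    public String getName() {
        // TODO: implement
        return null;
    }

    /** Returns the Tournaments of this League, in order. */
    public List<Tournament> getTournaments() {
        // TODO: implement
        return null;
    }
}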
Some modeling tools provide facilities for reverse engineering, that is, recreating a UML
model from source code. Such facilities are useful for creating object models from legacy code.
They require substantial hand processing, however, because the tool cannot recreate bidirectional
associations based on reference attributes only.
Tool support currently falls short when maintaining two-way dependencies, in particular
between the analysis model and the source code. Some tools, such as Rational Rose [Rational,
2002] and Together Control Center [TogetherSoft, 2002], realize this functionality by embedding
information about associations and other UML constructs in source code comments. Even though
this allows the tool to recover syntactic changes from the source code, developers must still update
model descriptions to reflect the changes. Because developers need different tools to change the
source code and the model, the model often falls behind.
Figure 9-15 is an example template for a generated ODD.

The first section of the ODD is an introduction to the document. It describes the general
trade-offs made by developers (e.g., buy vs. build, memory space vs. response time), guidelines
and conventions (e.g., naming conventions, boundary cases, exception handling mechanisms), and
an overview of the document.
Interface documentation guidelines and coding conventions are the single most important
factor that can improve communication between developers during object design. These include a
list of rules that developers should use when designing and naming interfaces. These are examples
of such conventions:
• Classes are named with singular nouns.
• Methods are named with verb phrases, fields and parameters with noun phrases.
• Error status is returned via an exception, not a return value.
• Collections and containers have an iterator() method returning an Iterator.
• Iterators returned by iterator() methods are robust to element removals.
The second section of the ODD, Packages, describes the decomposition of subsystems into
packages and the file organization of the code. This includes an overview of each package, its
dependencies with other packages, and its expected usage.
The third section, Class interfaces, describes the classes and their public interfaces. This
includes an overview of each class, its dependencies with other classes and packages, its public
attributes, operations, and the exceptions they can raise.
9.5.2 Assigning Responsibilities
Object design is characterized by a large number of participants accessing and modifying a large
amount of information. To ensure that changes to interfaces are documented and communicated in
an orderly manner, several roles collaborate to control, communicate, and implement changes.
These include the members of the architecture team who are responsible for system design and
subsystem interfaces, liaisons who are responsible for interteam communication, and configuration
managers who are responsible for tracking change.
Below is an example of how roles can be assigned during object design. As in other
activities, the same participant can be assigned more than one role.
• The core architect develops coding guidelines and conventions before object design
starts. As with many conventions, the actual set of conventions is not as important as the
commitment of all architects and developers to use the conventions. The core architect is also
responsible for ensuring consistency with prior decisions documented in the System Design
Document (SDD) and Requirements Analysis Document (RAD).
• The architecture liaisons document the public subsystem interfaces for which they are
responsible. This leads to a first draft of the ODD, which is used by developers. Architecture
liaisons also negotiate changes to public interfaces. Often, the issue is not of consensus, but of
communication: developers depending on the interface may welcome the change if they are
notified first. The architecture liaisons and the core architect form the architecture team.
• The object designers refine and detail the interface specification of the class or subsystem
they implement.
• The configuration manager of a subsystem releases changes to the interfaces and the
ODD once they become available. The configuration manager also keeps track of the relationship
between source code and ODD revisions.
• Technical writers from the documentation team clean up the final version of the ODD.
They ensure that the document is consistent from a structural and content point of view. They also
check for compliance with the guidelines.
As in system design, the architecture team is the integrating force of object design. The
architecture team ensures that changes are consistent with project goals. The documentation team,
including the technical writers, ensures that the changes are consistent with guidelines and
conventions.
9.5.3 Using Contracts During Requirements Analysis
Some requirements analysis approaches advocate the use of constraints much earlier, for example,
during the definition of the entity objects. In principle, OCL can be used in requirements analysis
as well as in object design. In general, developers consider specific project needs before deciding
on a specific approach or level of formalism to be used when documenting operations. Examine
the following trade-offs before deciding if, when, and for which purpose to use constraints:
• Communication among stakeholders. During software development, models support communication among stakeholders. Different models are used for different types of stakeholders. On the one hand, a use case or a user interface mock-up is much easier for a client to understand than an OCL constraint. On the other hand, an OCL constraint is a much more precise statement for the class user.
• Level of detail and rate of change. Attaching constraints to an analysis model requires
a much deeper understanding of the requirements. When this information is available, either from
the user, the client, or general domain knowledge, this results in a more complete analysis model.
When this information is not available, however, clients and developers may be forced to make
decisions too early in the process, increasing the rate of change (and consequently, development
cost) later in the development.
• Level of detail and elicitation effort. Similarly, eliciting detailed information from a
user during analysis may require much more effort than eliciting this information later in the
process, when early versions of the user interface are available and specific issues can be
demonstrated. However, this approach assumes that modifying the components under
consideration is relatively cheap and does not have a serious impact on the rest of the system. This
is the case for user interface layout issues and dialog considerations.
• Testing requirements. During testing, we compare the actual behavior of the system or
a class with the specified behavior. For automated tests or for stringent testing requirements (found,
for example, in application domains such as traffic control, medicine, or pharmaceuticals), this
requires a precise specification to test against. In this case, constraints help a lot if they are
specified as early as possible. We discuss testing in detail in Chapter 11, Testing.
(Note: ARENA case study is there in Page number 379/374 of the
textbook Object-oriented software engineering _ using UML,
Patterns, -- Bruegge, Bernd; Dutoit, Allen H, in section 9.6)
Module 4, Chapter 3
Mapping Models to Code
If the design pattern selection and the specification of class interfaces were done carefully, most
design issues should now be resolved. We could implement a system that realizes the use cases
specified during requirements elicitation and system design. However, as developers start putting
together the individual subsystems developed in this way, they are confronted with many
integration problems. Different developers have probably handled contract violations differently.
Undocumented parameters may have been added to the API to address a requirement change.
Additional attributes may have been added to the object model, but are not handled by the persistence management system, possibly because of a miscommunication. As the delivery pressure increases, addressing these problems results in additional improvised code changes and workarounds that eventually lead to the degradation of the system. The resulting code would have
little resemblance to our original design and would be difficult to understand.
In this chapter, we describe a selection of transformations to illustrate a disciplined
approach to implementation to avoid such a system degradation. These include
• optimizing the class model
• mapping associations to collections
• mapping operation contracts to exceptions
• mapping the class model to a storage schema.
We use Java and Java-based technologies in this chapter. The techniques we describe,
however, are also applicable to other object-oriented programming languages.
10.2 An Overview of Mapping
A transformation aims at improving one aspect of the model (e.g., its modularity) while
preserving all of its other properties (e.g., its functionality). Hence, a transformation is usually
localized, affects a small number of classes, attributes, and operations, and is executed in a series
of small steps. These transformations occur during numerous object design and implementation
activities. We focus in detail on the following activities:
• Optimization (Section 10.4.1). This activity addresses the performance requirements of
the system model. This includes reducing the multiplicities of associations to speed up queries,
adding redundant associations for efficiency, and adding derived attributes to improve the access
time to objects.
• Realizing associations (Section 10.4.2). During this activity, we map associations to
source code constructs, such as references and collections of references.
• Mapping contracts to exceptions (Section 10.4.3). During this activity, we describe the
behavior of operations when contracts are broken. This includes raising exceptions when violations
are detected and handling exceptions in higher level layers of the system.
• Mapping class models to a storage schema (Section 10.4.4). During system design, we
selected a persistent storage strategy, such as a database management system, a set of flat files, or
a combination of both. During this activity, we map the class model to a storage schema, such as
a relational database schema.
10.3 Mapping Concepts
We distinguish four types of transformations (Figure 10-1):
• Model transformations operate on object models (Section 10.3.1). An example is the
conversion of a simple attribute (e.g., an address represented as a string) to a class (e.g., a class
with street address, zip code, city, state, and country attributes).
• Refactorings are transformations that operate on source code (Section 10.3.2). They are
similar to object model transformations in that they improve a single aspect of the system without
changing its functionality. They differ in that they manipulate the source code.
• Forward engineering produces a source code template that corresponds to an object
model (Section 10.3.3). Many modeling constructs, such as attribute and association
specifications, can be mechanically mapped to source code constructs supported by the selected
programming language (e.g., class and field declarations in Java), while the bodies and additional
private methods are added by developers.
• Reverse engineering produces a model that corresponds to source code (Section 10.3.4).
This transformation is used when the design of the system has been lost and must be recovered
from the source code. Although several CASE tools support reverse engineering, much human
interaction is involved for recreating an accurate model, as the code does not include all
information needed to recover the model unambiguously.

10.3.1 Model Transformation


A model transformation is applied to an object model and results in another object model [Blaha
& Premerlani, 1998]. The purpose of object model transformation is to simplify or optimize the
original model, bringing it into closer compliance with all requirements in the specification. A
transformation may add, remove, or rename classes, operations, associations, or attributes. A
transformation can also add information to the model or remove information from it.
In Chapter 5, Analysis, we used transformations to organize objects into inheritance
hierarchies and eliminate redundancy from the analysis model. For example, the transformation in
Figure 10-2 takes a class model with a number of classes that contain the same attribute and
removes the redundancy. The Player, Advertiser, and LeagueOwner in ARENA all have an email
address attribute. We create a superclass User and move the email attribute to the superclass.

In principle, the development process can be thought of as a series of model transformations, starting with the analysis model and ending with the object design model, adding solution domain
details along the way. Although applying a model transformation is a fairly mechanical activity,
identifying which transformation to apply to which set of classes requires judgement and
experience.
10.3.2 Refactoring
A refactoring is a transformation of the source code that improves its readability or modifiability
without changing the behavior of the system [Fowler, 2000]. Refactoring aims at improving the
design of a working system by focusing on a specific field or method of a class. To ensure that the
refactoring does not change the behavior of the system, the refactoring is done in small incremental
steps that are interleaved with tests. The existence of a test driver for each class allows developers
to confidently change the code and encourages them to change the interface of the class as little as
possible during the refactoring.
For example, the object model transformation of Figure 10-2 corresponds to a sequence of
three refactorings. The first one, Pull Up Field, moves the email field from the subclasses to the
superclass User. The second one, Pull Up Constructor Body, moves the initialization code from
the subclasses to the superclass. The third and final one, Pull Up Method, moves the methods
manipulating the email field from the subclasses to the superclass.
Pull Up Field relocates the email field using the following steps (Figure 10-3):
1. Inspect Player, LeagueOwner, and Advertiser to ensure that the email field is equivalent.
Rename equivalent fields to email if necessary.
2. Create public class User.


3. Set parent of Player, LeagueOwner, and Advertiser to User.
4. Add a protected field email to class User.
5. Remove the email fields from Player, LeagueOwner, and Advertiser.
6. Compile and test.

Then, we apply the Pull Up Constructor Body refactoring to move the initialization code
for email using the following steps (Figure 10-4):
1. Add the constructor User(Address email) to class User.
2. Assign the field email in the constructor with the value passed in the parameter.
3. Add the call super(email) to the Player class constructor.
4. Compile and test.
5. Repeat steps 1–4 for the classes LeagueOwner and Advertiser.
At this point, the field email and its corresponding initialization code are in the User class. Now,
we examine if methods using the email field can be moved from the subclasses to the User class.
To achieve this, we apply the Pull Up Method refactoring (a sketch of the resulting code follows the steps below):
1. Examine the methods of Player that use the email field. Note that Player.notify() uses
email and that it does not use any fields or operations that are specific to Player.
2. Copy the Player.notify() method to the User class and recompile.
3. Remove the Player.notify() method.
4. Compile and test.
5. Repeat for LeagueOwner and Advertiser.
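After these three refactorings, the resulting source code might look roughly as follows. This is a minimal, hypothetical Java sketch: it uses String instead of the Address type mentioned above, and only Player is shown (LeagueOwner and Advertiser are analogous).

// Superclass introduced by Pull Up Field; its constructor and notify()
// were moved here by Pull Up Constructor Body and Pull Up Method.
public class User {
    protected String email;              // field pulled up from the subclasses

    public User(String email) {          // initialization code pulled up
        this.email = email;
    }

    // Pulled up because it uses only the email field; the String parameter
    // also keeps it distinct from Object.notify().
    public void notify(String message) {
        System.out.println("Mail to " + email + ": " + message);
    }
}

// Subclass after the refactorings: no email field or notify() method remains.
public class Player extends User {
    public Player(String email) {
        super(email);                    // delegates initialization to User
    }
}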
Applying these three refactorings effectively transforms the ARENA source code in the
same way the object model transformation of Figure 10-2 transformed the ARENA object design
model. Note that the refactorings include many more steps than their corresponding object model
transformation and interleave testing with changes. This is because the source code includes many
more details, so it provides many more opportunities for introducing errors. In the next section, we
discuss general principles for avoiding transformation errors.
10.3.3 Forward Engineering
Forward engineering is applied to a set of model elements and results in a set of corresponding
source code statements, such as a class declaration, a Java expression, or a database schema. The
purpose of forward engineering is to maintain a strong correspondence between the object design
model and the code, and to reduce the number of errors introduced during implementation, thereby
decreasing implementation effort.
For example, Figure 10-5 depicts a particular forward engineering transformation applied
to the classes User and LeagueOwner. First, each UML class is mapped to a Java class. Next, the
UML generalization relationship is mapped to an extends statement in the LeagueOwner class.
Finally, each attribute in the UML model is mapped to a private field in the Java classes and to two
public methods for setting and getting the value of the field. Developers can then refine the result
of the transformation with additional behavior, for example, to check that the new value of
maxNumLeagues is a positive integer.
Note that, except for the names of the attributes and methods, the code resulting from this
transformation is always the same. This makes it easier for developers to recognize transformations
in the source code, which encourages them to comply with naming conventions. Moreover, since
developers use one consistent approach for realizing classes, they introduce fewer errors.
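The Java code produced by such a forward engineering transformation might look roughly like the following sketch; the attribute names follow the discussion above, while the types and the validation in the setter are assumptions.

// Each UML class maps to a Java class; each attribute maps to a private
// field with a public getter and setter.
public class User {
    private String email;

    public String getEmail() { return email; }
    public void setEmail(String value) { email = value; }
}

// The UML generalization maps to an extends clause.
public class LeagueOwner extends User {
    private int maxNumLeagues;

    public int getMaxNumLeagues() { return maxNumLeagues; }

    public void setMaxNumLeagues(int value) {
        // Behavior added by the developer after the transformation, e.g.,
        // checking that the new value is a positive integer.
        if (value <= 0) {
            throw new IllegalArgumentException("maxNumLeagues must be positive");
        }
        maxNumLeagues = value;
    }
}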
10.3.4 Reverse Engineering
Reverse engineering is applied to a set of source code elements and results in a set of model
elements. The purpose of this type of transformation is to recreate the model for an existing system,
either because the model was lost or never created, or because it became out of sync with the
source code. Reverse engineering is essentially an inverse transformation of forward engineering.
Reverse engineering creates a UML class for each class declaration statement, adds an attribute
for each field, and adds an operation for each method. However, because forward engineering can
lose information (e.g., associations are turned into collections of references), reverse engineering
does not necessarily recreate the same model. Although many CASE tools support reverse
engineering, CASE tools provide, at best, an approximation that the developer can use to
rediscover the original model.
10.3.5 Transformation Principles
A transformation aims at improving the design of the system with respect to some criterion. We
discussed four types of transformations so far: model transformations, refactorings, forward
engineering, and reverse engineering. A model transformation improves the compliance of the
object design model with a design goal. A refactoring improves the readability or the modifiability
of the source code. Forward engineering improves the consistency of the source code with respect
to the object design model. Reverse engineering tries to discover the design behind the source
code.
However, by trying to improve one aspect of the system, the developer runs the risk of
introducing errors that will be difficult to detect and repair. To avoid introducing new errors, all
transformations should follow these principles:
• Each transformation must address a single criterion. A transformation should improve
the system with respect to only one design goal. One transformation can aim to improve response
time. Another transformation can aim to improve coherence. However, a transformation should
not optimize multiple criteria. If you find yourself trying to deal with several criteria at once, you will most likely introduce errors by making the source code too complex.
• Each transformation must be local. A transformation should change only a few methods
or a few classes at once. Transformations often target the implementation of a method, in which
case the callers are not affected. If a transformation changes an interface (e.g., adding a parameter to a method), then the client classes should be changed one at a time (e.g., the older method should be kept around for backward compatibility). If you find yourself changing many
subsystems at once, you are performing an architectural change, not an object model
transformation.
• Each transformation must be applied in isolation from other changes. To further localize changes, transformations should be applied one at a time. If you are improving the performance
of a method, you should not add new functionality. If you are adding new functionality, you should
not optimize existing code. This enables you to focus on a limited set of issues and reduces the
opportunities for errors.
• Each transformation must be followed by a validation step. Even though
transformations have a mechanical aspect, they are applied by humans. After completing a
transformation and before initiating the next one, validate the changes. If you applied an object
model transformation, update the sequence diagrams in which the classes under consideration are
involved. Review the use cases related to the sequence diagrams to ensure that the correct
functionality is provided. If you applied a refactoring, run the test cases relevant to the classes
under consideration. If you added new control statements or dealt with new boundary cases, write
new tests to exercise the new source code. It is always easier to find and repair a bug shortly after
it was introduced than later.
10.4 Mapping Activities
In this section, we present transformations that occur frequently to illustrate the principles we
described in the previous section. We focus on transformations during the following activities:
• Optimizing the Object Design Model (Section 10.4.1)
• Mapping Associations to Collections (Section 10.4.2)
• Mapping Contracts to Exceptions (Section 10.4.3)
• Mapping Object Models to a Persistent Storage Schema (Section 10.4.4).
10.4.1 Optimizing the Object Design Model
The direct translation of an analysis model into source code is often inefficient. The analysis model
focuses on the functionality of the system and does not take into account system design decisions.
During object design, we transform the object model to meet the design goals identified during
system design, such as minimization of response time, execution time, or memory resources. For
example, in the case of a Web browser, it might be clearer to represent HTML documents as
aggregates of text and images. However, if we decided during system design to display documents
as they are retrieved, we may introduce a proxy object to represent placeholders for images that
have not yet been retrieved.
In this section, we describe four simple but common optimizations: adding associations to
optimize access paths, collapsing objects into attributes, delaying expensive computations, and
caching the results of expensive computations.
When applying optimizations, developers must strike a balance between efficiency and
clarity. Optimizations increase the efficiency of the system but also the complexity of the models,
making it more difficult to understand the system.
Optimizing access paths
Common sources of inefficiency are the repeated traversal of multiple associations, the
traversal of associations with “many” multiplicities, and the misplacement of attributes
[Rumbaugh et al., 1991].
Repeated association traversals. To identify inefficient access paths, you should identify
operations that are invoked often and examine, with the help of a sequence diagram, the subset of
these operations that requires multiple association traversal. Frequent operations should not require
many traversals, but should have a direct connection between the querying object and the queried
object. If that direct connection is missing, you should add an association between these two
objects. In interface engineering and reengineering projects, estimates for the frequency of access paths can be
derived from the legacy system. In greenfield engineering projects, the frequency of access paths
is more difficult to estimate. In this case, redundant associations should not be added before a
dynamic analysis of the full system—for example, during system testing—has determined which
associations participate in performance bottlenecks.
“Many” associations. For associations with “many” multiplicities, you should try to
decrease the search time by reducing the “many” to “one.” This can be done with a qualified
association (Section 2.4.2). If it is not possible to reduce the multiplicity of the association, you
should consider ordering or indexing the objects on the “many” sides to decrease access time.
Misplaced attributes. Another source of inefficient system performance is excessive
modeling. During analysis many classes are identified that turn out to have no interesting behavior.
If most attributes are only involved in set() and get() operations, you should reconsider folding
these attributes into the calling class. After folding several attributes, some classes may not be
needed anymore and can simply be removed from the model.
The systematic examination of the object model using the above questions should lead to
a model with selected redundant associations, with fewer inefficient many-to-many associations,
and with fewer classes.
Collapsing objects: Turning objects into attributes
After the object model is restructured and optimized a couple of times, some of its classes
may have few attributes or behaviors left. Such classes, when associated only with one other class,
can be collapsed into an attribute, thus reducing the overall complexity of the model.
Consider, for example, a model that includes Persons identified by a SocialSecurity object.
During analysis, two classes may have been identified. Each Person is associated with a
SocialSecurity class, which stores a unique social security number identifying the Person. Now,
assume that the use cases do not require any behavior for the SocialSecurity object and that no
other classes have associations with the SocialSecurity class. In this case, the SocialSecurity class
should be collapsed into an attribute of Person (see Figure 10-6).
The refactoring equivalent to the model transformation of Figure 10-6 is the Inline Class refactoring [Fowler, 2000]; a sketch of the result follows these steps:
1. Declare the public fields and methods of the source class (e.g., SocialSecurity) in the
absorbing class (e.g., Person).
2. Change all references to the source class to the absorbing class.
3. Change the name of the source class to another name, so that the compiler catches any
dangling references.
4. Compile and test.
5. Delete the source class.
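A minimal sketch of the result, assuming SocialSecurity held only a single number with a getter and setter, could look as follows.

// After Inline Class: the SocialSecurity class has been deleted and its
// single attribute has been absorbed into Person.
public class Person {
    private String socialSecurityNumber;    // formerly held by SocialSecurity

    public String getSocialSecurityNumber() {
        return socialSecurityNumber;
    }

    public void setSocialSecurityNumber(String number) {
        this.socialSecurityNumber = number;
    }
}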
Delaying expensive computations
Often, specific objects are expensive to create. However, their creation can often be delayed
until their actual content is needed. For example, consider an object representing an image stored
as a file (e.g., an ARENA AdvertisementBanner). Loading all the pixels that constitute the image
from the file is expensive. However, the image data need not be loaded until the image is displayed.
We can realize such an optimization using a Proxy design pattern [Gamma et al., 1994]. An
ImageProxy object takes the place of the Image and provides the same interface as the Image object
(Figure 10-7). Simple operations such as width() and height() are handled by ImageProxy. When
Image needs to be drawn, however, ImageProxy loads the data from disk and creates a RealImage
object. If the client does not invoke the paint() operation, the RealImage object is not created,
thus saving substantial computation time. The calling classes only access the ImageProxy and the
RealImage through the Image interface.
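A condensed Java sketch of this use of the Proxy pattern is shown below. The Image interface and the loading details are simplifications; the essential point is the lazy creation of RealImage inside paint().

// Interface through which calling classes access both the proxy and the real image.
public interface Image {
    int width();
    int height();
    void paint();
}

// Placeholder that defers the expensive load until paint() is invoked.
public class ImageProxy implements Image {
    private final String fileName;
    private final int width, height;      // cheap metadata available up front
    private RealImage realImage;          // created lazily

    public ImageProxy(String fileName, int width, int height) {
        this.fileName = fileName;
        this.width = width;
        this.height = height;
    }

    public int width()  { return width; }
    public int height() { return height; }

    public void paint() {
        if (realImage == null) {
            realImage = new RealImage(fileName);   // expensive load happens only here
        }
        realImage.paint();
    }
}

// Holds the actual pixel data loaded from the file.
public class RealImage implements Image {
    private final byte[] pixels;

    public RealImage(String fileName) {
        this.pixels = loadPixels(fileName);        // expensive operation
    }

    public int width()  { return 0; }   // derived from the pixel data in a real system
    public int height() { return 0; }
    public void paint() { /* draw the pixels */ }

    private byte[] loadPixels(String fileName) { return new byte[0]; }
}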
Caching the result of expensive computations
Some methods are called many times, but their results are based on values that do not change or
change only infrequently. Reducing the number of computations required by these methods
can substantially improve overall response time. In such cases, the result of the computation should be
cached as a private attribute. Consider, for example, the LeagueBoundary.getStatistics() operation,
which displays the statistics relevant to all Players and Tournaments in a League. These statistics
change only when a Match is completed, so it is not necessary to recompute the statistics every
time a User wishes to see them. Instead, the statistics for a League can be cached in a temporary
data structure, which is invalidated the next time a Match is completed. Note that this approach
includes a time-space trade-off: we improve the average response time for the getStatistics()
operation, but we consume memory space by storing redundant information.
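A sketch of how such caching might be coded is shown below; the LeagueStatistics class and the recomputation logic are hypothetical names used only for illustration.

public class League {
    // Cached result of the expensive computation; null means the cache is invalid.
    private LeagueStatistics cachedStatistics;

    public LeagueStatistics getStatistics() {
        if (cachedStatistics == null) {
            cachedStatistics = computeStatistics();   // expensive, done only when needed
        }
        return cachedStatistics;
    }

    // Called whenever a Match is completed: invalidates the cached statistics.
    public void matchCompleted() {
        cachedStatistics = null;
    }

    private LeagueStatistics computeStatistics() {
        // Iterate over all Players and Tournaments of the League (omitted).
        return new LeagueStatistics();
    }
}

class LeagueStatistics { /* statistics counters omitted */ }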

10.4.2 Mapping Associations to Collections


Associations are UML concepts that denote collections of bidirectional links between two or more
objects. Object-oriented programming languages, however, do not provide the concept of
association. Instead, they provide references, in which one object stores a handle to another object,
and collections, in which references to several objects can be stored and possibly ordered.
References are unidirectional and take place between two objects. During object design, we realize
associations in terms of references, taking into account the multiplicity of the associations and
their direction.
Note that many UML modeling tools accomplish the transformation of associations into
references mechanically. However, even with a tool that accomplishes this transformation, it is
nevertheless critical that you understand its rationale, as you have to deal with the generated code.
Unidirectional one-to-one associations. The simplest association is a unidirectional one-
to-one association. For example (Figure 10-8), in ARENA, an Advertiser has a one-to-one
association with an Account object that tracks all the charges accrued from displaying
AdvertisementBanners. This association is unidirectional, as the Advertiser calls the operations of
the Account object, but the Account never invokes operations of the Advertiser. In this case, we
map this association to code using a reference from the Advertiser to the Account. That is, we add
a field to Advertiser named account of type Account.
Creating the association between Advertiser and Account translates to setting the account
field to refer to the correct Account object. Because each Advertiser object is associated with
exactly one Account, a null value for the account attribute can only occur when an Advertiser object
is being created. Otherwise, a null account is considered an error. Since the reference to the
Account object does not change over time, we make the account field private and add a public
Advertiser.getAccount() method. This prevents callers from accidentally modifying the account
field.
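A minimal Java sketch of this mapping is shown below; creating the Account in the Advertiser constructor is one possible way to satisfy the rule that account is null only while the Advertiser is being created.

public class Advertiser {
    // Unidirectional one-to-one association realized as a single private reference.
    private Account account;

    public Advertiser() {
        this.account = new Account();     // account is set once and never changes
    }

    public Account getAccount() {
        return account;
    }
}

public class Account {
    // In the unidirectional case, the Account holds no reference to the Advertiser.
}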

Bidirectional one-to-one associations. The direction of an association often changes during the development of the system. Unidirectional associations are simple to realize.
Bidirectional associations are more complex and introduce mutual dependencies among classes.
Assume that we modify the Account class so that the display name of the Account is computed
from the name of the Advertiser. In this case, an Account needs to access its corresponding
Advertiser object. Consequently, the association between these two objects must be bidirectional
(Figure 10-9). We add an owner attribute to Account in the Java source code, but this is not
sufficient: by adding a second attribute to realize the association, we introduce redundancy into
the model. We need to ensure that if a given Account has a reference to a specific Advertiser, the
Advertiser has a reference to that same Account. In this case, as the Account object is created by
the Advertiser constructor, we add a parameter to the Account constructor to initialize the owner
field to the correct value. Thus, the initial values for both fields are specified in the same statement
in the Advertiser constructor. Moreover, we make the owner field of Account private and add a
public method to get its value. Since neither the Advertiser class nor the Account class modifies
the field anywhere else, this ensures that both reference attributes remain consistent. Note that this
assumption cannot be enforced through programming language constructs. The developer needs to
document this assumption by writing a one-line comment immediately before the account and
owner fields.
In Figure 10-9, both the Account and the Advertiser classes must be recompiled and tested
whenever we change either class. With a unidirectional association from the Advertiser class to the
Account class, the Account class would not be affected by changes to the Advertiser class.
Bidirectional associations, however, are usually necessary in the case of classes that need to work
together closely. The choice between unidirectional or bidirectional associations is a trade-off to
be evaluated in each specific context. To make the trade-off easier, we can systematically make all
attributes private and provide corresponding getAttribute() and setAttribute() operations to access
the reference. This minimizes changes to APIs when changing a unidirectional association to
bidirectional or vice versa.
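A sketch of the bidirectional variant is shown below, assuming the Account is still created by the Advertiser constructor as described above.

public class Advertiser {
    // Set only in this constructor; both ends of the association are
    // initialized in the same statement.
    private Account account;

    public Advertiser() {
        this.account = new Account(this);
    }

    public Account getAccount() { return account; }
}

public class Account {
    // Back reference to the owning Advertiser; never modified after construction.
    private Advertiser owner;

    public Account(Advertiser owner) {
        this.owner = owner;
    }

    public Advertiser getOwner() { return owner; }
}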

One-to-many associations. One-to-many associations cannot be realized using a single reference or a pair of references. Instead, we realize the “many” part using a collection of
references. For example, assume that an Advertiser can have several Accounts to track the
expenses accrued by AdvertisementBanners for different products. In this case, the Advertiser
object has a one-to-many association with the Account class (Figure 10-10). Because Accounts
have no specific order and because an Account can be part of an Advertiser at most once, we use
a set of references, called accounts, to model the “many” part of the association. Moreover, we
decide to realize this association as a bidirectional association, and so add the addAccount(),
removeAccount(), and setOwner() methods to the Advertiser and Account classes to update the
accounts and owner fields.
As in the one-to-one example, the association must be initialized when Advertiser and
Account objects are created. However, since an Advertiser can have a varying number of Accounts,
the Advertiser object does not invoke the Account constructor. Instead, a control object for creating
and archiving Accounts is responsible for invoking the constructor.
Note that the collection on the “many” side of the association depends on the constraints
on the association. For example, if the Accounts of an Advertiser must be ordered, we need to use
a List instead of a Set. To minimize changes to the interface when association constraints change,
we can set the return type of the getAccounts() method to Collection, a common superclass of List
and Set.
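A sketch of this one-to-many mapping is shown below. The method bodies are assumptions; consistency of the two ends relies on the convention that callers change the association only through addAccount() and removeAccount().

import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class Advertiser {
    private Set<Account> accounts = new HashSet<Account>();

    // Both ends of the association are updated together.
    public void addAccount(Account a) {
        accounts.add(a);
        a.setOwner(this);
    }

    public void removeAccount(Account a) {
        accounts.remove(a);
        a.setOwner(null);
    }

    // Returning Collection (a common supertype of Set and List) minimizes
    // interface changes if the association later has to be ordered.
    public Collection<Account> getAccounts() {
        return accounts;
    }
}

public class Account {
    private Advertiser owner;

    public Advertiser getOwner() { return owner; }

    // Intended to be called only from Advertiser.addAccount()/removeAccount(),
    // so that the accounts set and the owner field stay consistent.
    void setOwner(Advertiser owner) {
        this.owner = owner;
    }
}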

Many-to-many associations. In this case, both end classes have fields that are collections
of references and operations to keep these collections consistent. For example, the Tournament
class of ARENA has an ordered many-to-many association with the Player class. This association
is realized by using a List attribute in each class, which is modified by the operations addPlayer(),
removePlayer(), addTournament(), and removeTournament() (Figure 10-11). We already identified
acceptPlayer() and removePlayer() operations in the object design model (see Figure 9-11). We
rename acceptPlayer() to addPlayer() to maintain consistency with the code generated for other
associations.
As in the previous example, these operations ensure that both Lists remain consistent. If the association between Tournament and Player were unidirectional, we could remove the tournaments attribute and its related methods; note that a unidirectional many-to-many association and a unidirectional one-to-many association are then very similar and difficult to distinguish at the object interface level.
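A condensed sketch of the many-to-many case follows; the contains() guards are what terminate the mutual calls between the two classes.

import java.util.ArrayList;
import java.util.List;

public class Tournament {
    private List<Player> players = new ArrayList<Player>();

    public void addPlayer(Player p) {
        if (!players.contains(p)) {
            players.add(p);
            p.addTournament(this);       // keep the Player side consistent
        }
    }

    public void removePlayer(Player p) {
        if (players.contains(p)) {
            players.remove(p);
            p.removeTournament(this);
        }
    }

    public List<Player> getPlayers() { return players; }
}

public class Player {
    private List<Tournament> tournaments = new ArrayList<Tournament>();

    public void addTournament(Tournament t) {
        if (!tournaments.contains(t)) {
            tournaments.add(t);
            t.addPlayer(this);           // terminates because of the contains() guard
        }
    }

    public void removeTournament(Tournament t) {
        if (tournaments.contains(t)) {
            tournaments.remove(t);
            t.removePlayer(this);
        }
    }

    public List<Tournament> getTournaments() { return tournaments; }
}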
Qualified associations. As we saw in Chapter 2, Modeling with UML, qualified associations are used to reduce the multiplicity of one of the “many” sides in a one-to-many or a many-
to-many association. The qualifier of the association is an attribute of the class on the “many” side
of the association, such as a name that is unique within the context of the association, but not
necessarily globally unique. For example, consider the association between League and Player
(Figure 10-12). It is originally a many-to-many association (a League involves many Players, a
Player can take part in many Leagues). To make it easier to identify Players within a League,
Players can choose a short nickname that must be unique within the League. However, the Player
can choose different nicknames in different Leagues, and the nicknames do not need to be unique
globally within an Arena.
Qualified associations are realized differently from the way one-to-many and many-to-many associations are realized. The main difference is that we use a Map object to represent the
qualified end, as opposed to a List or a Set, and we pass the qualifier as a parameter in the
operations to access the other end of the association. To continue our example, consider the
association between League and Player. We realize this qualified association by creating a private
players attribute in League and a leagues attribute in Player. The players attribute is a Map indexed
by the nickname of the Player within the League. Because the nickname is stored in the Map, a
specific Player can have different nicknames across Leagues. The players attribute is modified
with the operations addPlayer() and removePlayer(). A specific Player is accessed with the getPlayer() operation, passing a specific nickName, which avoids iterating through the Map to find
a specific Player. The other end of the association is realized with a Set, as before.
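A sketch of the League side of this qualified association is shown below; the addLeague() and removeLeague() helpers on Player are assumptions made to keep the example self-contained.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class League {
    // The qualified "many" end: Players indexed by their nickname in this League.
    private Map<String, Player> players = new HashMap<String, Player>();

    public void addPlayer(String nickName, Player p) {
        players.put(nickName, p);
        p.addLeague(this);
    }

    public void removePlayer(String nickName) {
        Player p = players.remove(nickName);
        if (p != null) {
            p.removeLeague(this);
        }
    }

    // The qualifier is passed as a parameter, avoiding an iteration over the Map.
    public Player getPlayer(String nickName) {
        return players.get(nickName);
    }
}

public class Player {
    // The other end of the association is realized with a Set, as before.
    private Set<League> leagues = new HashSet<League>();

    void addLeague(League l)    { leagues.add(l); }
    void removeLeague(League l) { leagues.remove(l); }
}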
Association classes. In UML, we use an association class to hold the attributes and
operations of an association. For example, we can represent the Statistics for a Player within a
Tournament as an association class, which holds statistics counters for each Player/Tournament
combination (Figure 10-13). To realize such an association, we first transform the association class
into a separate object and a number of binary associations. Then we can use the techniques
discussed earlier to convert each binary association to a set of reference attributes. In Section 10.6,
we revisit this case and describe additional mappings for realizing association classes.
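A sketch of the Statistics association class after this first transformation is shown below; the counter attributes are assumptions, and the Player and Tournament classes are the ones sketched earlier.

// The association class becomes an ordinary class holding one reference to
// each end of the original association, plus the association's own attributes.
public class Statistics {
    private Player player;
    private Tournament tournament;
    private int matchesPlayed;       // example counters per Player/Tournament pair
    private int matchesWon;

    public Statistics(Player player, Tournament tournament) {
        this.player = player;
        this.tournament = tournament;
    }

    public Player getPlayer() { return player; }
    public Tournament getTournament() { return tournament; }

    public void recordWin()  { matchesPlayed++; matchesWon++; }
    public void recordLoss() { matchesPlayed++; }
}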
Once associations have been mapped to fields and methods, the public interface of classes
is relatively complete and should change only as a result of new requirements, discovered bugs, or
refactoring.
10.4.3 Mapping Contracts to Exceptions


Object-oriented languages that include constraints, such as Eiffel, can automatically check
contracts and raise exceptions when a contract is violated. This enables a class user to detect bugs
associated with incorrect assumptions about the used class. In particular, this is useful when
developers reuse a set of classes to discover boundary cases. Raising exceptions when
postconditions are violated enables class implementors to catch bugs early, to identify precisely
the operation in which the violation occurred, and to correct the offending code.
Unfortunately, many object-oriented languages, including Java, do not provide built-in
support for contracts. However, we can use their exception mechanisms as building blocks for
signaling and handling contract violations. In Java, we raise an exception with the throw keyword
followed by an exception object. The exception object provides a placeholder for storing information about the exception, usually an error message and a backtrace representing the call stack of the throw. Throwing an exception interrupts the control flow and unwinds the call stack until a matching catch statement is found. The catch statement is followed by a parameter, which is bound to the exception object, and an exception handling block. If the exception object is of the same type as the parameter (or a subclass thereof), the catch statement
matches and the exception handling block is executed.
For example, in Figure 10-14, let us assume that the addPlayer() operation of
TournamentControl is invoked with a player who is already part of the Tournament. In this case,
TournamentControl.addPlayer() throws an exception of type KnownPlayer, which is caught by the
caller, TournamentForm.addPlayer(), which forwards the exception to the ErrorConsole class, and
then proceeds with the next Player. The ErrorConsole boundary object then displays a list of error
messages to the user.
A simple mapping would be to treat each operation in the contract individually and to add code within the method body to check the preconditions, postconditions, and invariants relevant to the operation (a code sketch follows this list):
• Checking preconditions. Preconditions should be checked at the beginning of the
method, before any processing is done. There should be a test that checks if the precondition is
true and raises an exception otherwise. Each precondition corresponds to a different exception, so
that the client class can not only detect that a violation occurred, but also identify which parameter
is at fault.
• Checking postconditions. Postconditions should be checked at the end of the method,
after all the work has been accomplished and the state changes are finalized. Each postcondition
corresponds to a Boolean expression in an if statement that raises an exception if the contract is
violated. If more than one postcondition is not satisfied, only the first detection is reported.
• Checking invariants. When treating each operation contract individually, invariants are
checked at the same time as postconditions.
• Dealing with inheritance. The checking code for preconditions and postconditions
should be encapsulated into separate methods that can be called from subclasses.
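A sketch of how a precondition of addPlayer() could be mapped to Java along these lines is shown below. The KnownPlayer exception follows the example above; the TournamentControl body is an assumption and relies on the Tournament class sketched earlier.

// Exception signaling the violated precondition "the player is not yet
// part of the tournament".
public class KnownPlayer extends Exception {
    public KnownPlayer(Player player) {
        super("Player already accepted: " + player);
    }
}

public class TournamentControl {
    private Tournament tournament;

    public TournamentControl(Tournament tournament) {
        this.tournament = tournament;
    }

    public void addPlayer(Player p) throws KnownPlayer {
        // Precondition check at the beginning of the method: one exception per
        // precondition, so the caller can tell which parameter is at fault.
        if (tournament.getPlayers().contains(p)) {
            throw new KnownPlayer(p);
        }
        tournament.addPlayer(p);
        // Postcondition and invariant checks would follow here, after the state
        // change has been completed.
    }
}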
If we mapped every contract following the above steps, we would ensure that all
preconditions, postconditions, and invariants are checked for every method invocation, and that
violations are detected within one method invocation. While this approach results in a robust
system (assuming the checking code is correct), it is not realistic:
• Coding effort. In many cases, the code required for checking preconditions and
postconditions is longer and more complex than the code accomplishing the real work. This results
in increased effort that could be better spent in testing or code clean-up.
• Increased opportunities for defects. Checking code can also include errors, increasing
testing effort. Worse, if the same developer writes the method and the checking code, it is highly
probable that bugs in the checking code mask bugs in the actual method, thereby reducing the
value of the checking code.
• Obfuscated code. Checking code is usually more complex than its corresponding
constraint and difficult to modify when constraints change. This leads to the insertion of many
more bugs during changes, defeating the original purpose of the contract.
• Performance drawbacks. Systematically checking all contracts can significantly slow
down the code, sometimes by an order of magnitude. Although correctness is always a design goal,
response time and throughput design goals would not be met.
Hence, unless we have a tool for generating checking code automatically, such as iContract
[Kramer, 1998], we need to adopt a pragmatic approach and evaluate the above trade-offs in the
project context. Remember that contracts support communication among developers; consequently, exception handling of contract violations should focus on interfaces between
developers. Below are heuristics to evaluate these trade-offs:

In all cases, the checking code should be documented with comments describing the
constraints checked, both in English and in OCL. In addition to making the code more readable,
this makes it easier to modify the checking code correctly when a constraint changes.
10.4.4 Mapping Object Models to a Persistent Storage Schema
In this section, we look at the steps involved in mapping an object model to a relational database
using Java and database schemas.
A schema is a description of the data, that is, a meta-model for data [Date, 2004]. In UML,
class diagrams are used to describe the set of valid instances that can be created by the source code.
Similarly, in relational databases, the database schema describes the valid set of data records that
can be stored in the database. Relational databases store both the schema and the data. Relational
databases store persistent data in the form of tables (also called relations in the database literature).
A table is structured in columns, each of which represents an attribute. For example, in Figure
10-16, the User table has three columns, firstName, login, and email. The rows of the table
represent data records, with each cell in the table representing the value of the attribute for the data
record in that row. In Figure 10-16, the User table contains three data records each representing
the attributes of specific users Alice, John, and Bob.

A primary key of a table is a set of attributes whose values uniquely identify the data
records in a table. The primary key is used to refer unambiguously to a specific data record when
inserting, updating, or removing it. For example, in Figure 10-16, the login attribute represents a
unique user name within an Arena. Hence, the login attribute can be used as a primary key. Note, however, that the email attribute is also unique across all users in the table. Hence, the email attribute could also be used as a primary key. Sets of attributes that could be used as a primary key are called candidate keys. The candidate key that is actually used by the application to identify data records is called the primary key.
A foreign key is an attribute (or a set of attributes) that references the primary key of
another table. A foreign key links a data record in one table with one or more data records in
another table. In Figure 10-17, the table League includes the foreign key owner that references the
login attribute in the User table in Figure 10-16. Alice is the owner of the tictactoeNovice and
tictactoeExpert leagues and John is the owner of the chessNovice league.
Mapping classes and attributes


When mapping the persistent objects to relational schemata, we focus first on the classes
and their attributes. We map each class to a table with the same name. For each attribute, we add
a column in the table with the name of the attribute in the class. Each data record in the table
corresponds to an instance of the class. By keeping the names in the object model and the relational
schema consistent, we provide traceability between both representations and make future changes
easier.
When mapping attributes, we need to select a data type for the database column. For
primitive types, the correspondence between the programming language type and the database type
is usually trivial (e.g., the Java Date type maps to the datetime type in SQL). However, for other
types, such as String, the mapping is more complex. The type text in SQL requires a specified
maximum size. For example, when mapping the ARENA User class, we could arbitrarily limit the
length of first names to 25 characters, enabling us to use a column of type text[25]. Note that we
have to ensure that users’ first names comply with this new constraint by adding preconditions and
checking code in the entity and boundary objects.
Next, we focus on the primary key. There are two options when selecting a primary key for
the table. The first option is to identify a set of class attributes that uniquely identifies the object.
The second option is to add a unique identifier attribute that we generate.
Mapping associations
After having mapped the classes to relational tables, we now turn to the mapping of
associations. The mapping of associations to a database schema depends on the multiplicity of the
association. One-to-one and one-to-many associations are implemented as a so-called buried
association [Blaha & Premerlani, 1998], using a foreign key. Many-to-many associations are
implemented as a separate table.
Buried associations. Associations with multiplicity one can be implemented using a
foreign key. For one-to-many associations, we add a foreign key to the table representing the class
on the “many” end. For all other associations, we can select either class at the end of the
association. For example (Figure 10-19), consider the one-to-many association between
LeagueOwner and League. We map this association by adding an owner column to the League table
referring to the primary key of the LeagueOwner table. The value of the owner column is the value
of the id (i.e., the primary key) of the corresponding LeagueOwner. If there are multiple Leagues owned
by the same LeagueOwner, multiple data records of the League table have the id of the owner as
value for this column. For associations with a multiplicity of zero or one, a null value indicates
that there are no associations for the data record of interest.

Separate table. Many-to-many associations are implemented using a separate two-column table with foreign keys for both classes of the association. We call this the association table. Each
row in the association table corresponds to a link between two instances. For example, we map the
many-to-many Tournament/Player association to an association table with two columns: one for
the id of the Tournaments, the other for the id of the Players. If a player is part of multiple
tournaments, each player/tournament association will have a separate data record. Similarly, if a
tournament includes multiple players, each player will have a separate data record. The association
table in Figure 10-20 contains two links representing the membership of “alice” and “john” in the
“novice” Tournament.
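The resulting schema can be sketched as SQL statements embedded in Java, the implementation language used in this chapter. The table layout follows the figures above; the exact column types, the League name column, and the name of the association table are assumptions.

public class ArenaSchema {
    // User class mapped to a table; login serves as the primary key.
    static final String CREATE_USER =
        "CREATE TABLE User (" +
        "  login     VARCHAR(25) PRIMARY KEY," +
        "  firstName VARCHAR(25)," +
        "  email     VARCHAR(40))";

    // One-to-many association buried as a foreign key: the owner column of
    // League references the login attribute of the User table.
    static final String CREATE_LEAGUE =
        "CREATE TABLE League (" +
        "  name  VARCHAR(25) PRIMARY KEY," +
        "  owner VARCHAR(25) REFERENCES User(login))";

    static final String CREATE_TOURNAMENT =
        "CREATE TABLE Tournament (" +
        "  name VARCHAR(25) PRIMARY KEY)";

    // Many-to-many Tournament/Player association realized as a separate
    // association table with one foreign key per end.
    static final String CREATE_TOURNAMENT_PLAYER =
        "CREATE TABLE TournamentPlayerAssociation (" +
        "  tournament VARCHAR(25) REFERENCES Tournament(name)," +
        "  player     VARCHAR(25) REFERENCES User(login)," +
        "  PRIMARY KEY (tournament, player))";
}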

Note that one-to-one and one-to-many associations could also be realized with an association table
instead of a buried association. Using a separate table to realize all associations results in a database
schema that is modifiable. For example, if we change the multiplicity of a one-to-many association
to a many-to-many association, we do not need to change the database schema. Of course, this
increases the overall number of tables in the schema and the time to traverse the association. In
general, we need to evaluate this trade-off in the context of the application, examining whether the
multiplicity of the association is likely to change or if response time is a critical design goal.
Mapping inheritance relationships
Relational databases do not directly support inheritance, but there are two main options for
mapping an inheritance relationship to a database schema. In the first option, called vertical
mapping, similar to a one-to-one association, each class is represented by a table and uses a foreign
key to link the subclass tables to the superclass table. In the second option, called horizontal
mapping, the attributes of the superclass are pushed down into the subclasses, essentially
duplicating columns in the tables corresponding to subclasses.
Vertical mapping. Given an inheritance relationship, we map the superclass and
subclasses to individual tables. The superclass table includes a column for each attribute defined
in the superclass. The superclass table also includes an additional column denoting the subclass that corresponds to each data record. The subclass tables include a column for each attribute defined in the subclass. All tables share the same primary key, that is, the identifier of the object. Data
records in the superclass and subclass tables with the same primary key value refer to the same
object.
Horizontal mapping. Another way to realize inheritance is to push the attributes of the
superclass down into the subclasses, effectively removing the need for a superclass table. In this
case, each subclass table duplicates the columns of the superclass.
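The two options can be contrasted with the following sketch, again as SQL embedded in Java, for the User/LeagueOwner hierarchy used earlier; column names and types are assumptions.

public class InheritanceMappings {
    // Vertical mapping: one table per class, linked through a shared primary key.
    static final String VERTICAL_SUPERCLASS =
        "CREATE TABLE User (" +
        "  id    INTEGER PRIMARY KEY," +
        "  email VARCHAR(40)," +
        "  role  VARCHAR(20))";          // role denotes the subclass of each record

    static final String VERTICAL_SUBCLASS =
        "CREATE TABLE LeagueOwner (" +
        "  id            INTEGER PRIMARY KEY REFERENCES User(id)," +
        "  maxNumLeagues INTEGER)";

    // Horizontal mapping: the superclass columns are duplicated in each
    // subclass table, and there is no separate User table.
    static final String HORIZONTAL =
        "CREATE TABLE LeagueOwner (" +
        "  id            INTEGER PRIMARY KEY," +
        "  email         VARCHAR(40)," +      // duplicated from the superclass
        "  maxNumLeagues INTEGER)";
}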

The trade-off between using a separate table for superclasses and duplicating columns in
the subclass tables is between modifiability and response time. If we use a separate table, we can
add attributes to the superclass simply by adding a column to the superclass table. When adding a
subclass, we add a table for the subclass with a column for each attribute in the subclass. If we
duplicate columns, modifying the database schema is more complex and error prone. The
advantage of duplicating columns is that individual objects are not fragmented across a number of
tables, which results in faster queries. For deep inheritance hierarchies, this can represent a
significant performance difference.
In general, we need to examine the likelihood of changes against the performance
requirements in the specific context of the application.
10.5 Managing Implementation
Transformations enable us to improve specific aspects of the object design model and to convert
it into source code. By providing systematic recipes for recurring situations, transformations
enable us to reduce the amount of effort and the overall number of errors in the source code.
However, to retain this benefit throughout the lifetime of the system, we need to document the
application of transformations so that they can be consistently reapplied in the event of changes to
the object design model or the source code.
Reverse engineering attempts to alleviate this problem by allowing us to reconstruct the
object design model from the source code. If we could maintain a one-to-one mapping between
the source code and the object design model, we would not need any documentation: the tools at
hand would automatically apply selected transformations and mirror changes in the source code
and the object design model. However, most useful transformations, including those described in
this chapter, are not one-to-one mappings. As a result, information is lost in the process of applying
the transformation. For example:
• Association multiplicity and collections. Unidirectional one-to-many associations and
many-to-many associations map to the same source code. A CASE tool that reverse-engineers the
corresponding source code usually selects the least restrictive case (i.e., a many-to-many
association). In general, information about association multiplicity is distributed in several places
in the source code, including checking code in the boundary objects.
• Association multiplicity and buried associations. One-to-many associations and one-
to-one associations implemented as a buried association in a database schema suffer from the same
problem. Worse, when all associations are realized as separate tables, all information about
association multiplicity is lost.
• Postconditions and invariants. When mapping contracts to exception-handling code
(Section 10.4.3), we generate checking code only for preconditions. Postconditions and invariants
are not mapped to source code. The object specification and the system become quickly
inconsistent when postconditions or invariants are changed, but not documented.
These challenges boil down to finding conventions and mechanisms to keep the object design
model, the source code, and the documentation consistent with each other. There is no single
answer, but the following principles reduce consistency problems when applied systematically:
• For a given transformation, use the same tool. If you are using a modeling tool to map
associations to code, use the same tool when you change association multiplicities. Modern
modeling tools generate markers as source code comments to enable the repetitive generation of
code from the same model. However, this mapping can easily break when developers interchangeably use a text editor and the modeling tool to change associations. Similarly, if you generate
constraint-checking code with a tool, regenerate the checking code when the constraint is changed.
• Keep the contracts in the source code, not in the object design model. Contracts
describe the behavior of methods and restrictions on parameters and attributes. Developers change
the behavior of an object by modifying the body of a method, not by modifying the object design
model. If the constraint specifications are kept as source code comments, they are more likely to be updated when the code changes.
• Use the same names for the same objects. When mapping an association to source code
or a class to a database schema, use the same names on both sides of the transformation. If the
name is changed in the model, change it in the source code. By using the same names, you provide
traceability among the models and make it easier for developers to identify both ends of the
transformation. This also emphasizes the importance of identifying the right names for classes
during analysis, before any transformations are applied, to minimize the effort associated with
renaming.
• Make transformations explicit. When transformations are applied by hand, it is critical
that the transformation is made explicit in some form so that all developers can apply the
transformation the same way. For example, transformations for mapping associations to collections
should be documented in a coding conventions guide so that, when two developers apply the same
transformation, they produce the same code. This also makes it easier for developers to identify
transformations in the source code. As usual, the commitment of developers to use standard
conventions is more important than the actual conventions.
10.5.2 Assigning Responsibilities
Several roles collaborate to select, apply, and document transformations and the conversion of the
object design model into source code:
• The core architect selects the transformations to be systematically applied. For example,
if it is critical that the database schema is modifiable, the core architect decides that all associations
should be implemented as separate tables.
• The architecture liaison is responsible for documenting the contracts associated with
subsystem interfaces. When such contracts change, the architecture liaison is responsible for
notifying all class users.
• The developer is responsible for following the conventions set by the core architect and
actually applying the transformations and converting the object design model into source code.
Developers are responsible for keeping the source code comments up to date with the rest of the models.
Identifying and applying transformations the first time is relatively trivial. The key
challenge is in reapplying transformations after a change occurs. Hence, when assigning
responsibilities, each role should understand who should be notified in the event of changes.
(Note: The ARENA case study can be found on page 421/416 of the textbook Object-Oriented Software Engineering: Using UML, Patterns, and Java by Bernd Bruegge and Allen H. Dutoit, in Section 10.6.)
Module 4, Chapter 4
Testing
Testing is the process of finding differences between the expected behavior specified by system
models and the observed behavior of the implemented system. Unit testing finds differences
between a specification of an object and its realization as a component. Structural testing finds
differences between the system design model and a subset of integrated subsystems. Functional
testing finds differences between the use case model and the system. Finally, performance testing
finds differences between nonfunctional requirements and actual system performance. When
differences are found, developers identify the defect causing the observed failure and modify the
system to correct it. In other cases, the system model is identified as the cause of the difference,
and the system model is updated to reflect the system.
From a modeling point of view, testing is the attempt to show that the implementation of
the system is inconsistent with the system models. The goal of testing is to design tests that exercise
defects in the system and to reveal problems. This activity is contrary to all other activities we
described in previous chapters: analysis, design, implementation, communication, and negotiation
are constructive activities. Testing, however, is aimed at breaking the system. Consequently, testing
is usually accomplished by developers that were not involved with the construction of the system.
11.1 Introduction: Testing the Space Shuttle
Testing is the process of analyzing a system or system component to detect the differences between
specified (required) and observed (existing) behavior. Unfortunately, it is impossible to completely
test a nontrivial system. First, testing is not decidable. Second, testing must be performed under
time and budget constraints. As a result, systems are often deployed without being completely
tested, leading to faults discovered by end users.
The first launch of the Space Shuttle Columbia in 1981, for example, was canceled because
of a problem that was not detected during development. The problem was traced to a change made
by a programmer two years earlier, who erroneously reset a delay factor from 50 to 80
milliseconds. This added a probability of 1/67 that any space shuttle launch would fail.
Unfortunately, in spite of thousands of hours of testing after the change was made, the fault was
not discovered during the testing phase. During the actual launch, the fault caused a
synchronization problem with the shuttle’s five on-board computers that led to the decision to abort
the launch.
Testing is often viewed as a job that can be done by beginners. Managers would assign the new members to the testing team because the experienced people detested testing or were needed for the more important jobs of analysis and design. Unfortunately, such an attitude leads to many
problems. To test a system effectively, a tester must have a detailed understanding of the whole
system, ranging from the requirements to system design decisions and implementation issues. A
tester must also be knowledgeable of testing techniques and apply these techniques effectively and
efficiently to meet time, budget, and quality constraints.
11.2 An Overview of Testing


Reliability is a measure of success with which the observed behavior of a system conforms to the
specification of its behavior. Software reliability is the probability that a software system will not
cause system failure for a specified time under specified conditions [IEEE Std. 982.2 1988].
Failure is any deviation of the observed behavior from the specified behavior. An erroneous state
(also called an error) means the system is in a state such that further processing by the system will
lead to a failure, which then causes the system to deviate from its intended behavior. A fault, also
called “defect” or “bug,” is the mechanical or algorithmic cause of an erroneous state. The goal of
testing is to maximize the number of discovered faults, which then allows developers to correct
them and increase the reliability of the system.
We define testing as the systematic attempt to find faults in a planned way in the
implemented software. Contrast this definition with another common one: “testing is the process
of demonstrating that faults are not present.” The distinction between these two definitions is
important. Our definition does not mean that we simply demonstrate that the program does what
it is intended to do. The explicit goal of testing is to demonstrate the presence of faults and non-optimal behavior. Our definition implies that the developers are deliberately trying to break the system.
Moreover, for the most part, demonstrating that faults are not present is not possible in systems of
any realistic size.
Most activities of the development process are constructive: during analysis, design, and
implementation, objects and relationships are identified, refined, and mapped onto a computer
environment. Testing requires a different thinking, in that developers try to detect faults in the
system, that is, differences between the reality of the system and the requirements. Many
developers find this difficult to do. One reason is the way we use the word “success” during testing.
Many project managers call a test case “successful” if it does not find a fault; that is, they use the
second definition of testing during development. However, because “successful” denotes an
achievement, and “unsuccessful” means something undesirable, these words should not be used in
this fashion during testing.
In this chapter, we treat testing as an activity based on the falsification of system models,
which is based on Popper’s falsification of scientific theories [Popper, 1992]. According to Popper,
when testing a scientific hypothesis, the goal is to design experiments that falsify the underlying
theory. If the experiments are unable to break the theory, our confidence in the theory is
strengthened and the theory is adopted (until it is eventually falsified). Similarly, in software
testing, the goal is to identify faults in the software system (to falsify the theory). If none of the
tests have been able to falsify software system behavior with respect to the requirements, it is ready
for delivery. In other words, a software system is released when the falsification attempts (tests)
show a certain level of confidence that the software system does what it is supposed to do.
There are many techniques for increasing the reliability of a software system:
• Fault avoidance techniques try to detect faults statically, that is, without relying on the
execution of any of the system models, in particular the code model. Fault avoidance tries
to prevent the insertion of faults into the system before it is released. Fault avoidance
includes development methodologies, configuration management, and verification.
• Fault detection techniques, such as debugging and testing, are uncontrolled and controlled
experiments, respectively, used during the development process to identify erroneous states
and find the underlying faults before releasing the system. Fault detection techniques assist
in finding faults in systems, but do not try to recover from the failures caused by them. In
general, fault detection techniques are applied during development, but in some cases, they
are also used after the release of the system. The black box in an airplane, which logs the last few minutes of a flight, is an example of a fault detection technique.
• Fault tolerance techniques assume that a system can be released with faults and that
system failures can be dealt with by recovering from them at runtime. For example,
modular redundant systems assign the same task to more than one component, then
compare the results from the redundant components. The space shuttle has five onboard
computers running two different pieces of software to accomplish the same task.
In this chapter, we focus on fault detection techniques, including reviews and testing. A review
is the manual inspection of parts or all aspects of the system without actually executing the system.
There are two types of reviews: walkthrough and inspection. In a code walkthrough, the developer
informally presents the API (Application Programmer Interface), the code, and associated
documentation of the component to the review team. The review team makes comments on the
mapping of the analysis and object design to the code using use cases and scenarios from the
analysis phase. An inspection is similar to a walkthrough, but the presentation of the component
is formal. In fact, in a code inspection, the developer is not allowed to present the artifacts (models,
code, and documentation). This is done by the review team, which is responsible for checking the
interface and code of the component against the requirements. It also checks the algorithms for
efficiency with respect to the nonfunctional requirements. Finally, it checks comments about the
code and compares them with the code itself to find inaccurate and incomplete comments. The
developer is only present in case the review needs clarifications about the definition and use of
data structures or algorithms. Code reviews have proven to be effective at detecting faults. In some
experiments, up to 85 percent of all identified faults were found in code reviews [Fagan, 1976],
[Jones, 1977], [Porter et al., 1997].
Debugging assumes that faults can be found by starting from an unplanned failure. The
developer moves the system through a succession of states, ultimately arriving at and identifying
the erroneous state. Once this state has been identified, the algorithmic or mechanical fault causing
this state must be determined. There are two types of debugging: The goal of correctness
debugging is to find any deviation between observed and specified functional requirements.
Performance debugging addresses the deviation between observed and specified nonfunctional
requirements, such as response time.
Testing is a fault detection technique that tries to create failures or erroneous states in a planned
way. This allows the developer to detect failures in the system before it is released to the customer.
Note that this definition of testing implies that a successful test is a test that identifies faults. We
will use this definition throughout the development phases. Another often-used definition of
testing is that “it demonstrates that faults are not present.” We will use this definition only after the
development of the system when we try to demonstrate that the delivered system fulfills the
functional and nonfunctional requirements.
If we used this second definition all the time, we would tend to select test data that have a low
probability of causing the program to fail. If, on the other hand, the goal is to demonstrate that a
program has faults, we tend to look for test data with a higher probability of finding faults. The
characteristic of a good test model is that it contains test cases that identify faults. Tests should
include a broad range of input values, including invalid inputs and boundary cases, otherwise,
faults may not be detected. Unfortunately, such an approach requires extremely lengthy testing
times for even small systems.
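
As an illustration of choosing such inputs, the following sketch exercises a hypothetical withdrawal operation with valid, boundary, and invalid values; the Account class and the chosen amounts are assumptions made for this example, not part of the text.

// Hypothetical example: exercising boundary and invalid inputs for a simple
// withdraw(amount) operation. Account and the chosen values are illustrative.
public class WithdrawBoundaryTests {
    public static void main(String[] args) {
        Account account = new Account(100);     // opening balance of 100

        check(account.withdraw(1), true);       // smallest valid amount
        check(account.withdraw(99), true);      // boundary: exactly the remaining balance
        check(account.withdraw(0), false);      // boundary: zero is rejected
        check(account.withdraw(-5), false);     // invalid: negative amount
        check(account.withdraw(1000), false);   // invalid: exceeds the balance
    }

    static void check(boolean observed, boolean expected) {
        if (observed != expected) {
            System.out.println("Test failed: expected " + expected + " but was " + observed);
        }
    }
}

// Minimal Account class used only to make the sketch self-contained.
class Account {
    private int balance;
    Account(int balance) { this.balance = balance; }
    boolean withdraw(int amount) {
        if (amount <= 0 || amount > balance) return false;
        balance -= amount;
        return true;
    }
}
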
Figure 11-1 depicts an overview of testing activities:
• Test planning allocates resources and schedules the testing. This activity should occur early
in the development phase so that sufficient time and skill is dedicated to testing. For example,
developers can design test cases as soon as the models they validate become stable.
• Usability testing tries to find faults in the user interface design of the system. Often, systems
fail to accomplish their intended purpose simply because their users are confused by the user
interface and unwittingly introduce erroneous data.
• Unit testing tries to find faults in participating objects and/or subsystems with respect to the
use cases from the use case model.
• Integration testing is the activity of finding faults by testing individual components in
combination. Structural testing is the culmination of integration testing involving all
components of the system. Integration tests and structural tests exploit knowledge from the
SDD (System Design Document) using an integration strategy described in the Test Plan (TP).
• System testing tests all the components together, seen as a single system to identify faults with
respect to the scenarios from the problem statement and the requirements and design goals
identified in the analysis and system design, respectively:
o Functional testing tests the requirements from the RAD and the user manual.
o Performance testing checks the nonfunctional requirements and additional design
goals from the SDD. Functional and performance testing are done by developers.
o Acceptance testing and installation testing check the system against the project
agreement and are done by the client, if necessary, with help from the developers.
11.3 Testing Concepts
In this section, we present the model elements used during testing (Figure 11-2):
• A test component is a part of the system that can be isolated for testing. A component can be
an object, a group of objects, or one or more subsystems.
• A fault, also called bug or defect, is a design or coding mistake that may cause abnormal
component behavior.
• An erroneous state is a manifestation of a fault during the execution of the system. An
erroneous state is caused by one or more faults and can lead to a failure.
• A failure is a deviation between the specification and the actual behavior. A failure is triggered
by one or more erroneous states. Not all erroneous states trigger a failure.
o Note that, outside the testing community, developers often do not distinguish between
faults, failures, and erroneous states, and instead, refer to all three concepts as
“errors.”
• A test case is a set of inputs and expected results that exercises a test component with the
purpose of causing failures and detecting faults.
• A test stub is a partial implementation of components on which the tested component depends.
A test driver is a partial implementation of a component that depends on the test component.
Test stubs and drivers enable components to be isolated from the rest of the system for testing.
• A correction is a change to a component. The purpose of a correction is to repair a fault. Note that a correction can introduce new faults.
11.3.1 Faults, Erroneous States, and Failures


To speak about erroneous state, failure, or fault, we need to compare the desired behavior
(described in the use case in the RAD) with the observed behavior (described by the test case).
You are probably already familiar with many other algorithmic faults that are introduced
during the implementation phase. For example, “Exiting a loop too soon,” “exiting a loop too late,”
“testing for the wrong condition,” “forgetting to initialize a variable” are all implementation-
specific algorithmic faults. Algorithmic faults can also occur during analysis and system design.
Stress and overload problems, for example, are object design specific algorithmic faults that lead
to failure when data structures are filled beyond their specified capacity. Throughput and
performance failures are possible when a system does not perform at the speed specified by the
nonfunctional requirements.
A fault in the virtual machine of a software system is an example of a mechanical fault: even if the developers have implemented the system correctly, that is, they have mapped the object
model correctly onto the code, the observed behavior can still deviate from the specified behavior.
In concurrent engineering projects, for example, where hardware is developed in parallel with
software, we cannot always make the assumption that the virtual machine executes as specified.
Other examples of mechanical faults are power failures. Note the relativity of the terms “fault”
and “failure” with respect to a particular system component: the failure in one system component
(the power system) is the mechanical fault that can lead to failure in another system component
(the software system).
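
The following sketch makes the three concepts concrete; the class and the values are hypothetical, chosen only to show one fault, the erroneous state it produces, and the resulting failure.

// Hypothetical example distinguishing fault, erroneous state, and failure.
public class AverageCalculator {

    // Fault: the loop starts at 1 instead of 0, so the first element is skipped.
    public static double average(int[] values) {
        int sum = 0;
        for (int i = 1; i < values.length; i++) {   // should be i = 0
            sum += values[i];
        }
        return (double) sum / values.length;
    }

    public static void main(String[] args) {
        // Erroneous state: after the loop, sum holds 5 instead of 15.
        // Failure: the observed output (about 1.67) deviates from the
        // specified behavior (5.0) and is visible to the caller.
        System.out.println(average(new int[] {10, 2, 3}));
    }
}
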
11.3.2 Test Cases


A test case is a set of input data and expected results that exercises a component with the purpose
of causing failures and detecting faults. A test case has five attributes: name, location, input, oracle,
and log (Table 11-1).
The name of the test case allows the tester to distinguish between different test cases. A
heuristic for naming test cases is to derive the name from the requirement it is testing or from the
component being tested. For example, if you are testing a use case Deposit(), you might want to
call the test case Test_Deposit. If a test case involves two components A and B, a good name would
be Test_AB. The location attribute describes where the test case can be found. It should be either
the path name or the URL to the executable of the test program and its inputs.

Input describes the set of input data or commands to be entered by the actor of the test case
(which can be the tester or a test driver). The expected behavior of the test case is the sequence of
output data or commands that a correct execution of the test should yield. The expected behavior
is described by the oracle attribute. The log is a set of time-stamped correlations of the observed
behavior with the expected behavior for various test runs.
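
The five attributes can be represented in a simple data structure. The following sketch is only an illustrative assumption about how such a test case record might look; it is not a prescribed format.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a test case record with the five attributes
// described above; the field types and example values are assumptions.
public class TestCaseRecord {
    String name;             // e.g., "Test_Deposit"
    String location;         // path or URL of the executable test and its inputs
    List<String> input;      // data or commands entered by the tester or a test driver
    List<String> oracle;     // expected outputs of a correct execution
    List<String> log = new ArrayList<>();   // time-stamped observed vs. expected results

    void record(String timestamp, String observed, String expected) {
        log.add(timestamp + ": observed=" + observed + ", expected=" + expected);
    }
}
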
Once test cases are identified and described, relationships among test cases are identified.
Aggregation and the precede associations are used to describe the relationships between the test
cases. Aggregation is used when a test case can be decomposed into a set of subtests. Two test
cases are related via the precede association when one test case must precede another test case.
Figure 11-9 shows a test model where TestA must precede TestB and TestC. For example,
TestA consists of TestA1 and TestA2, meaning that once TestA1 and TestA2 are tested, TestA is
tested; there is no separate test for TestA. A good test model has as few associations as possible,
because tests that are not associated with each other can be executed independently from each
other. This allows a tester to speed up testing, if the necessary testing resources are available. In
Figure 11-9, TestB and TestC can be tested in parallel, because there is no relation between them.
Test cases are classified into blackbox tests and whitebox tests, depending on which aspect
of the system model is tested. Blackbox tests focus on the input/output behavior of the component.
Blackbox tests do not deal with the internal aspects of the component, nor with the behavior or the
structure of the components. Whitebox tests focus on the internal structure of the component. A
whitebox test makes sure that, independently of the particular input/output behavior, every state
in the dynamic model of the object and every interaction among the objects is tested. As a result,
whitebox testing goes beyond blackbox testing. In fact, most of the whitebox tests require input
data that could not be derived from a description of the functional requirements alone. Unit testing
combines both testing techniques: blackbox testing to test the functionality of the component, and
whitebox testing to test structural and dynamic aspects of the component.
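
The following sketch illustrates the difference on a hypothetical component (the Grader class and its test values are assumptions): the blackbox tests are derived from the specification alone, while the whitebox tests add inputs chosen so that every branch of the implementation is executed.

// Hypothetical component under test: classifies an exam score.
class Grader {
    String grade(int score) {
        if (score < 0 || score > 100) return "invalid";
        if (score >= 50) return "pass";
        return "fail";
    }
}

public class GraderTests {
    public static void main(String[] args) {
        Grader grader = new Grader();

        // Blackbox view: inputs and expected outputs derived from the
        // specification only, without looking at the code.
        assertEquals("pass", grader.grade(75));
        assertEquals("fail", grader.grade(30));

        // Whitebox view: additional inputs chosen so that every branch in
        // grade() is executed, including both sides of the validity check.
        assertEquals("invalid", grader.grade(-1));
        assertEquals("invalid", grader.grade(101));
        assertEquals("pass", grader.grade(50));   // boundary of the score >= 50 branch
    }

    static void assertEquals(String expected, String observed) {
        if (!expected.equals(observed)) {
            System.out.println("Failure: expected " + expected + " but was " + observed);
        }
    }
}
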
11.3.3 Test Stubs and Drivers
Executing test cases on single components or combinations of components requires the tested
component to be isolated from the rest of the system. Test drivers and test stubs are used to
substitute for missing parts of the system. A test driver simulates the part of the system that calls
the component under test. A test driver passes the test inputs identified in the test case analysis to
the component and displays the results.
A test stub simulates a component that is called by the tested component. The test stub
must provide the same API as the method of the simulated component and must return a value
compliant with the return type of the method's signature. Note that the interfaces of all
components must be baselined. If the interface of a component changes, the corresponding test
drivers and stubs must change as well.
The implementation of test stubs is a nontrivial task. It is not sufficient to write a test stub
that simply prints a message stating that the test stub was called. In most situations, when
component A calls component B, A is expecting B to perform some work, which is then returned
as a set of result parameters. If the test stub does not simulate this behavior, A will fail, not because
of a fault in A, but because the test stub does not simulate B correctly.
Even providing a return value is not always sufficient. For example, if a test stub always
returns the same value, it might not return the value expected by the calling component in a
particular scenario. This can produce confusing results and even lead to the failure of the calling
component, even though it is correctly implemented. Often, there is a trade-off between
implementing accurate test stubs and substituting the test stubs by the actual component. For many
components, drivers and stubs are often written after the component is completed, and for
components that are behind schedule, stubs are often not written at all.
To ensure that stubs and drivers are developed and available when needed, several
development methods stipulate that drivers be developed for every component. This results in
lower effort because it provides developers the opportunity to find problems with the interface
specification of the component under test before it is completely implemented.
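
The following sketch shows the idea with hypothetical classes: the component under test depends on an exchange rate service that is not yet available, so a stub stands in for it, and a driver feeds the test inputs and compares the result with the oracle.

// Hypothetical example of a test stub and a test driver.
interface ExchangeRateService {
    double euroRate(String currency);
}

// Test stub: simulates the missing component and returns a value the caller
// can meaningfully work with, not just a "stub called" message.
class ExchangeRateServiceStub implements ExchangeRateService {
    public double euroRate(String currency) {
        return "USD".equals(currency) ? 1.10 : 1.0;
    }
}

// Component under test.
class PriceConverter {
    private final ExchangeRateService rates;
    PriceConverter(ExchangeRateService rates) { this.rates = rates; }
    double toCurrency(double euros, String currency) {
        return euros * rates.euroRate(currency);
    }
}

// Test driver: calls the component under test with the inputs from the
// test case and compares the observed result with the expected result.
public class PriceConverterDriver {
    public static void main(String[] args) {
        PriceConverter converter = new PriceConverter(new ExchangeRateServiceStub());
        double observed = converter.toCurrency(200.0, "USD");
        double expected = 220.0;
        if (Math.abs(observed - expected) > 0.001) {
            System.out.println("Failure: expected " + expected + " but was " + observed);
        }
    }
}
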
11.3.4 Corrections
Once tests have been executed and failures have been detected, developers change the component
to eliminate the suspected faults. A correction is a change to a component whose purpose is to
repair a fault. Corrections can range from a simple modification to a single component, to a
complete redesign of a data structure or a subsystem. In all cases, the likelihood that the developer
introduces new faults into the revised component is high. Several techniques can be used to
minimize the occurrence of such faults:
• Problem tracking includes the documentation of each failure, erroneous state, and fault
detected, its correction, and the revisions of the components involved in the change. Together
with configuration management, problem tracking enables developers to narrow the search for
new faults.
• Regression testing includes the re-execution of all prior tests after a change. This ensures that
functionality which worked before the correction has not been affected. Regression testing is
important in object-oriented methods, which call for an iterative development process. This
requires testing to be initiated earlier and for test suites to be maintained after each iteration.
Regression testing unfortunately is costly, especially when part of the tests is not automated.
• Rationale maintenance includes the documentation of the rationale for the change and its
relationship with the rationale of the revised component. Rationale maintenance enables
developers to avoid introducing new faults by inspecting the assumptions that were used to
build the component.
Next, let us describe in more detail the testing activities that lead to the creation of test cases,
their execution, and the development of corrections. Note: Section 11.4 is not part of this syllabus,
but is included here briefly for context.
11.4 Testing Activities
In this section, we describe the technical activities of testing. These include
• Component inspection, which finds faults in an individual component through the manual
inspection of its source code.
• Usability testing, which finds differences between what the system does and the users’
expectation of what it should do.
• Unit testing, which finds faults by isolating an individual component using test stubs and
drivers and by exercising the component using test cases.
• Integration testing, which finds faults by integrating several components together.
• System testing, which focuses on the complete system, its functional and nonfunctional
requirements, and its target environment.
11.5 Managing Testing
In this section, we describe how to manage testing activities to minimize the resources needed.
Many testing activities occur near the end of the project, when resources are running low and
delivery pressure increases. Often, trade-offs lie between the faults to be repaired before delivery
and those that can be repaired in a subsequent revision of the system. In the end, however,
developers should detect and repair a sufficient number of faults such that the system meets
functional and nonfunctional requirements to an extent acceptable to the client.
First, we describe the planning of test activities (Section 11.5.1). Next, we describe the test
plan, which documents the activities of testing (Section 11.5.2). Next, we describe the roles
assigned during testing (Section 11.5.3). Next, we discuss the topics of regression testing (Section
11.5.4), automated testing (Section 11.5.5), and model-based testing (Section 11.5.6).
11.5.1 Planning Testing
Developers can reduce the cost of testing and the elapsed time necessary for its completion through
careful planning. Two key elements are to start the selection of test cases early and to parallelize
tests.
Developers responsible for testing can design test cases as soon as the models they validate
become stable. Functional tests can be developed when the use cases are completed. Unit tests of
subsystems can be developed when their interfaces are defined. Similarly, test stubs and drivers can
be developed when component interfaces are stable. Developing tests early enables the execution
of tests to start as soon as components become available. Moreover, given that developing tests
requires a close examination of the models under validation, developers can find faults in the
models even before the system is constructed. Note, however, that developing tests early on
introduces a maintenance problem: test cases, drivers, and stubs need to be updated whenever the
system models change.
The second key element in shortening testing time is to parallelize testing activities. All
component tests can be conducted in parallel; double tests for components in which no faults were
discovered can be initiated while other components are repaired.
Testing represents a substantial part of the overall project resources. A typical guideline for
projects following a Unified Process life cycle is to allocate 25 percent of project resources to
testing (see Section 15.4.2; [Royce, 1998]). However, this number can go up depending on safety
and reliability requirements on the system. Hence, it is critical that test planning start early, as early
as the use case model is stable.
11.5.2 Documenting Testing


Testing activities are documented in four types of documents, the Test Plan, the Test Case
Specifications, the Test Incident Reports, and the Test Summary Report:
• The Test Plan focuses on the managerial aspects of testing. It documents the scope, approach,
resources, and schedule of testing activities. The requirements and the components to be tested
are identified in this document.
• Each test is documented by a Test Case Specification. This document contains the inputs,
drivers, stubs, and expected outputs of the tests, as well as the tasks to be performed.
• Each execution of each test is documented by a Test Incident Report. The actual results of the
tests and differences from the expected output are recorded.
• The Test Summary Report lists all the failures discovered during the tests that need to be investigated. From the Test Summary Report, the developers analyze and prioritize each
failure and plan for changes in the system and in the models. These changes in turn can trigger
new test cases and new test executions.
The Test Plan (TP) and the Test Case Specifications (TCS) are written early in the process, as
soon as the test planning and each test case are completed. These documents are under
configuration management and updated as the system models change. Figure 11-27 is an outline
for a Test Plan.

• Section 1 of the test plan describes the objectives and extent of the tests. The goal is to provide
a framework that can be used by managers and testers to plan and execute the necessary tests
in a timely and cost-effective manner.
• Section 2 explains the relationship of the test plan to the other documents produced during the
development effort such as the RAD, SDD, and ODD (Object Design Document). It explains
how all the tests are related to the functional and nonfunctional requirements, as well as to the
system design stated in the respective documents. If necessary, this section introduces a naming
scheme to establish the correspondence between requirements and tests.
• Section 3, focusing on the structural aspects of testing, provides an overview of the system in
terms of the components that are tested during the unit test. The granularity of components and
their dependencies are defined in this section.
• Section 4, focusing on the functional aspects of testing, identifies all features and combinations
of features to be tested. It also describes all those features that are not to be tested and the
reasons for not testing them.
• Section 5 specifies generic pass/fail criteria for the tests covered in this plan. They are
supplemented by pass/fail criteria in the test design specification. Note that “fail” in the IEEE
standard terminology means “successful test” in our terminology.
• Section 6 describes the general approach to the testing process. It discusses the reasons for the
selected integration testing strategy. Different strategies are often needed to test different parts
of the system. A UML class diagram can be used to illustrate the dependencies between the
individual tests and their involvement in the integration tests.
• Section 7 specifies the criteria for suspending the testing on the test items associated with the
plan. It also specifies the test activities that must be repeated when testing is resumed.
• Section 8 identifies the resources that are needed for testing. This should include the physical
characteristics of the facilities, including the hardware, software, special test tools, and other
resources needed (office space, etc.) to support the tests.
• Section 9, the core of the test plan, lists the test cases that are used during testing. Each test
case is described in detail in a separate Test Case Specification document. Each execution of
these tests will be documented in a Test Incident Report document. We describe these
documents in more details later in this section.
• Section 10 of the test plan covers responsibilities, staffing and training needs, risks and
contingencies, and the test schedule.
Figure 11-28 is an outline of a Test Case Specification.

The Test Case Specification identifier is the name of the test case, used to distinguish it from other
test cases. Conventions such as naming the test cases from the features or the component being
tested allow developers to more easily refer to test cases. Section 2 of the TCS lists the components
under test and the features being exercised. Section 3 lists the inputs required for the test cases.
Section 4 lists the expected output. This output is computed manually or with a competing system
(such as a legacy system being replaced). Section 5 lists the hardware and software platform
needed to execute the test, including any test drivers or stubs. Section 6 lists any constraints needed
to execute the test such as timing, load, or operator intervention. Section 7 lists the dependencies
with other test cases.
The Test Incident Report lists the actual test results and the failures that were experienced.
The description of the results must include which features were demonstrated and whether the
features have been met. If a failure has been experienced, the test incident report should contain
sufficient information to allow the failure to be reproduced. Failures from all Test Incident Reports
are collected and listed in the Test Summary Report and then further analyzed and prioritized by
the developers.
11.5.3 Assigning Responsibilities
Testing requires developers to find faults in components of the system. This is best done when the
testing is performed by a developer who was not involved in the development of the component
under test, one who is less reluctant to break the component being tested and who is more likely to
find ambiguities in the component specification.
For stringent quality requirements, a separate team dedicated to quality control is solely
responsible for testing. The testing team is provided with the system models, the source code, and
the system for developing and executing test cases. Test Incident Reports and Test Summary Reports are then sent back to the subsystem teams for analysis and possible revision of the
system. The revised system is then retested by the testing team, not only to check if the original
failures have been addressed, but also to ensure that no new faults have been inserted in the system.
For systems that do not have stringent quality requirements, subsystem teams can double
as a testing team for components developed by other subsystem teams. The architecture team can
define standards for test procedures, drivers, and stubs, and can perform as the integration test
team. The same test documents can be used for communication among subsystem teams.
One of the main problems with usability tests is enrolling participants. Several obstacles
are faced by project managers in selecting real end users [Grudin, 1990]:
• The project manager is usually afraid that users will bypass established technical support
organizations and call the developers directly, once they know how to get to them. Once this
line of communication is established, developers might be sidetracked too often from doing
their assigned jobs.
• Sales personnel do not want developers to talk to “their” clients. Sales people are afraid that
developers may offend the client or create dissatisfaction with the current generation of
products (which still must be sold).
• The end users do not have time.
• The end users dislike being studied. For example, an automotive mechanic might think that an
augmented reality system will put him out of work.
Debriefing the participants is the key to understanding how to improve the usability of
the system being tested. Even though the usability test uncovers and exposes problems, it is often
the debriefing session that illustrates why these problems have occurred in the first place. It is
important to write recommendations on how to improve the tested components as soon as possible
after the usability test is finished, so they can be used by the developers to implement any necessary
changes in the system models of the tested component.
11.5.4 Regression Testing
Object-oriented development is an iterative process. Developers modify, integrate, and retest
components often, as new features are implemented or improved. When modifying a component,
developers design new unit tests exercising the new feature under consideration. They may also
retest the component by updating and rerunning previous unit tests. Once the modified component
passes the unit tests, developers can be reasonably confident about the changes within the
component. However, they should not assume that the rest of the system will work with the
modified component, even if the system has previously been tested. The modification can
introduce side effects or reveal previously hidden faults in other components. The changes can
exercise different assumptions about the unchanged components, leading to erroneous states.
Integration tests that are rerun on the system to produce such failures are called regression tests.
The most robust and straightforward technique for regression testing is to accumulate all
integration tests and rerun them whenever new components are integrated into the system. This
requires developers to keep all tests up-to-date, to evolve them as the subsystem interfaces change,
and to add new integration tests as new services or new subsystems are added. As regression testing
can become time consuming, different techniques have been developed for selecting specific
regression tests. Such techniques include [Binder, 2000]:
• Retest dependent components. Components that depend on the modified component are the
most likely to fail in a regression test. Selecting these tests will maximize the likelihood of
finding faults when rerunning all tests is not feasible.
• Retest risky use cases. Often, ensuring that the most catastrophic faults are identified is more
critical than identifying the largest number of faults. By focusing first on use cases that present
the highest risk, developers can minimize the likelihood of catastrophic failures.
• Retest frequent use cases. When users are exposed to successive releases of the same system,
they expect that features that worked before continue to work in the new release. To maximize
the likelihood of this perception, developers focus on the use cases that are most often used by
the users.
In all cases, regression testing leads to running many tests many times. Hence, regression testing
is feasible only when an automated testing infrastructure is in place, enabling developers to
automatically set up, initialize, and execute tests and compare their results with a predefined oracle.
11.5.5 Automating Testing
Manual testing requires a tester to feed predefined inputs into the system using the user interface,
a command line console, or a debugger. The tester then compares the outputs generated by the
system with the expected oracle. Manual testing can be costly and error prone when many tests
are involved or when the system generates a large volume of outputs. When requirements change
and the system evolves rapidly, tests must be repeated often. This makes these drawbacks worse,
as it is difficult to guarantee that the same test is executed under the same conditions every time.
The repeatability of test execution can be achieved with automation. Although all aspects
of testing can be automated (including test case and oracle generation), the main focus of test
automation has been on execution. For system tests, test cases are specified in terms of the
sequence and timing of inputs and an expected output trace. The test harness can then execute a
number of test cases and compare the system output with the expected output trace. For unit and
integration tests, developers specify a test as a test driver that exercises one or more methods of
the classes under test.
The benefit of automating test execution is that tests are repeatable. Once a fault is
corrected as a result of a failure, the test that uncovered the failure can be repeated to ensure that
the failure does not occur anymore. Moreover, other tests can be run to ensure (to a limited extent)
that no new faults have been introduced. Moreover, when tests are repeated many times, for
example, in the case of refactoring (see Section 10.3.2), the cost of testing is decreased
substantially. However, note that developing a test harness and test cases is an investment. If tests
are run only once or twice, manual testing may be a better alternative.
An example of an automated test infrastructure is JUnit, a framework for writing and
automating the execution of unit tests for Java classes [JUnit, 2009]. The JUnit test framework is
made out of a small number of tightly integrated classes (Figure 11-29). Developers write new test
cases by subclassing the TestCase class. The setUp() and tearDown() methods of the concrete test
case initialize and clean up the testing environment, respectively. The runTest() method includes
the actual test code that exercises the class under test and compares the results with an expected
condition. The test success or failure is then recorded in an instance of TestResult. TestCases can
be organized into TestSuites, which will invoke sequentially each of its tests. TestSuites can also
be included in other TestSuites, thereby enabling developers to group unit tests into increasingly
larger test suites.
Typically, when using JUnit, each TestCase instance exercises one method of the class
under test. To minimize the proliferation of TestCase classes, all test methods exercising the same
class (and requiring the same test environment initialized by the setUp() method) are grouped in
the same ConcreteTestCase class. The actual method that is invoked by runTest() can then be
configured when creating instances of TestCases. This enables developers to organize and
selectively invoke large numbers of tests.
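
As a sketch of this JUnit 3 style (the Account class under test and the expected values are assumptions made for illustration), test methods that share the same setUp() environment are grouped in one TestCase subclass, and the name passed to the constructor selects the method that runTest() invokes:

import junit.framework.Test;
import junit.framework.TestCase;
import junit.framework.TestSuite;

// Illustrative JUnit 3 style unit test; Account is an assumed class under test.
public class AccountTest extends TestCase {
    private Account account;

    public AccountTest(String name) {
        super(name);                  // the name selects the test method to run
    }

    protected void setUp() {
        account = new Account(100);   // initialize the test environment
    }

    protected void tearDown() {
        account = null;               // clean up the test environment
    }

    public void testDeposit() {
        account.deposit(50);
        assertEquals(150, account.getBalance());
    }

    public void testWithdraw() {
        account.withdraw(40);
        assertEquals(60, account.getBalance());
    }

    // Group related tests into a suite; suites can be nested into larger suites.
    public static Test suite() {
        TestSuite suite = new TestSuite();
        suite.addTest(new AccountTest("testDeposit"));
        suite.addTest(new AccountTest("testWithdraw"));
        return suite;
    }
}

// Minimal class under test, included only to make the sketch self-contained.
class Account {
    private int balance;
    Account(int balance) { this.balance = balance; }
    void deposit(int amount) { balance += amount; }
    void withdraw(int amount) { balance -= amount; }
    int getBalance() { return balance; }
}
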
11.5.6 Model-Based Testing
Testing (manual or automated) requires an infrastructure for executing tests, instrumenting the
system under test, and collecting and assessing test results. This infrastructure is called the test
harness or test system. The test system is made of software and hardware components that interact
with various actors, which can then be modeled using UML. In Chapters 2-10, we have shown
how the system under development and the development organization can be modeled in UML.
Similarly, we can model the test system in UML. To be able to do this, we need to extend UML
with new entity objects for modeling the test system.
UML profiles provide a way for extending UML. A UML profile is a collection of new
stereotypes, new interfaces, or new constraints, thus providing new concepts specialized to an
application domain or a solution domain.
U2TP (UML 2 Testing Profile, [OMG, 2005]) is an example of a UML profile, which
extends UML for modeling testing. Modeling the test system in U2TP provides the same
advantages as when modeling the system under development: test cases are modeled in a standard
notation understood by all participants, test cases can be automatically generated from test models,
test cases execution and results can be automatically collected and recorded.
U2TP extends UML with the following concepts:
• The system under test (stereotype «sut»), which may be the complete system under
development, or only a part of it, such as a subsystem or a single class.
• A test case (stereotype «testCase») is a specification of behavior realizing one or more test
objectives. A test case specifies the sequence of interactions among the system under test and
the test components. The interactions are either stimuli on the system under test or observations
gathered from the system under test or from test components. A test case is represented as a
sequence diagram or state machine. Test cases return an enumerated type called verdict,
denoting if the test run passed, failed, was inconclusive, or an error in the test case itself was
detected. In U2TP terminology, an error is caused by a fault in the test system, while a failure
is caused by a fault in the system under test.
• A test objective (stereotype «testObjective») describes in English the goal of one or several
test cases. A test objective is typically a requirement or a part of a requirement that is being
verified. For example, the test objective of the displayTicketPrices test case is to verify that the
correct price is displayed after selecting a zone button on the ticket distributor.
• Test components (stereotype «testComponent»), such as test stubs and utilities needed for
executing a test case. Examples of test components include simulated hardware, simulated user
behavior, or components that inject faults.
• Test contexts (stereotype «testContext»), which include the set of test cases, the configuration
of test components and system under test needed for every test case, and a test control for
sequencing the test cases.
• An arbiter (interface Arbiter), which collects the local test results into an aggregated result.
• A scheduler (interface Scheduler), which creates and coordinates the execution of the test cases
among test components and system under test.
Figure 11-30 depicts an example of a test system in U2TP for the TicketDistributor. The test
context PurchaseTicketSuite groups all the test cases for the PurchaseTicket use case. The system
under test is the TicketDistributor software. To make it easier to control and instrument the system
to assess the success or failure of tests, we simulate the ticket distributor display with a
DisplaySimulator test component.

For example, Figure 11-31 depicts the expected interactions of the displayTicketPrices() test case
resulting in a pass verdict. selectZone1(), selectZone2(), and selectZone4() are stimuli on the system under test. The getDisplay() calls are observations used to assess whether individual test steps were successful.
Note that only the expected interactions are displayed. Any unexpected interactions, missing
interactions, or observations that do not match the oracles, lead to a failed verdict. U2TP also
provides mechanisms, not discussed here, to explicitly model interactions that lead to an
inconclusive or a failed verdict.
The displayTicketPrices() test case of Figure 11-31 explicitly models the mapping between
zones and ticket prices. In a realistic system, this approach would not be sustainable, as many test
cases are repeated for boundary values and with samples of different equivalence classes. To
address this challenge, U2TP provides the concepts of DataPool, DataPartition, and DataSelector,
to represent test data samples, equivalence classes, and data selection strategies, respectively.
These allow developers to parameterize test cases with different sets of values, keeping the specification of
test cases concise and reusable.
//Module 4, Chapter 4 completed


Module 5, Chapter 1
Software Maintenance
Often, the software system undergoes a beta testing phase, during which the users test-run the system and report bugs and deficiencies. The development team removes the bugs and deficiencies. This period may take a few weeks or several months, depending on the nature of the application and the complexity and size of the software system. After beta testing, the software system enters its maintenance phase. The system stabilizes during the first several months, during which the users exercise more and more functions and become familiar with the system's behavior. Removal
of bugs and deficiencies continues, but the rate should reduce significantly. This period is
sometimes called the system aging period. As the world evolves, the system's functionality,
performance, quality of service, or security can no longer satisfy the business needs. Enhancement
to the system is required. In this case, a new project is established to identify new capabilities, and
design and implement the new capabilities to enhance the software system. The new project will
go through the steps as described in the previous chapters. In this way, the system evolves during
the prolonged maintenance period. Various surveys show that software maintenance consumes
60%-80% of the total life-cycle costs; 75% or more of the costs are due to enhancements.
Therefore, software maintenance is an important area of software engineering and deserves an
entire book. This chapter serves as an introduction to the topic. After studying this chapter, you
will learn the following:
• Fundamentals of software maintenance.
• Factors that require software maintenance.
• Lehman's laws of system evolution.
• Types of software maintenance.
• Software maintenance process models and activities.
• Software reverse-engineering.
• Software reengineering.
21.1 WHAT IS SOFTWARE MAINTENANCE?
Many software systems that were constructed decades ago are still in use today. These systems are
called legacy systems. Many of the legacy systems will continue to operate in the next several
decades. One of the reasons that these systems cannot be retired as they should is the high cost of replacing them. Another reason is that there is no guarantee that the new system will be as good as the
replaced system. This is because the legacy systems have embedded the collective knowledge,
experience, and intelligence of thousands of software engineers, domain experts, and users during
the last several decades. Even if replacing an old system is an option, it is too costly for an
organization to replace an existing system frequently. The costs include system development cost,
costs associated with lost productivity due to procurement, participation in requirements gathering,
system design reviews, acceptance testing, user training, beta testing, and adapting to the new
system. Therefore, after their releases, systems undergo a prolonged period of continual
modifications to correct errors, enhance capabilities, adapt to new operating platforms or
environments, and improve the system structure to make further changes possible. This
process is called software maintenance, defined by the IEEE as follows:
Definition 21.1 Software maintenance is modifying a software system or component after
delivery to correct faults, improve performance, add new capabilities, or adapt to a changed
environment.
21.2 FACTORS THAT MANDATE CHANGE
After the system is released, installed, and operated in the target environment, updates to the system are still needed. A number of factors mandate software change:
1. Bug fixes. Although the software system has been tested to achieve a desired test
coverage, some bugs may still occur during the operational phase. These require bug removal and
regression testing to ensure that the modified software passes selected tests performed previously.
2. Change in operating environment. Changes in the hardware, platform, and system
configuration may require modification to the software.
3. Change in government policies and regulations. Changes in government policies and
regulations may require changes to the software system to comply with the new policies and
regulations.
4. Change in business procedures. Many software systems automate business operations.
If the procedures of some of the business operations are changed, then the software system must
be modified accordingly. For example, as security becomes important, many web-based
applications require users to set up authentication questions and answers to better authenticate the
users. Such changes are due to changes in business procedures and require changes to the software.
5. Changes to prevent future problems. Sometimes, changes to the software system are
needed to prevent problems that could occur in the future. For example, a complex component may be redesigned and reimplemented to improve its reliability.
21.3 LEHMAN'S LAWS OF SYSTEM EVOLUTION
Lehman and Belady conducted a series of studies on system evolution. Lehman's laws of
system evolution are the result of these studies. The laws are specified for the so-called E-type of
systems. These are systems that cannot be completely and definitely specified. That is, system
development for such a system is a wicked problem. On the other hand, the S-type systems are
systems that can be completely and definitely specified. Their development is not a wicked
problem. Examples of such systems are mathematical software, chess-playing software, and the
like. The eight Lehman's laws are:
1. Law of continuing change (1974). After the system is released, changes to the system
are required, and these continue until the system is replaced. Changes are due to reasons
described in Section 21.2.
2. Law of increasing entropy or complexity (1974). The structure of the software system
deteriorates as changes are made. This is because changes introduce errors, which
require more changes. Changes often introduce conditional statements to handle
erroneous situations, or check for invocation of new features. These increase the
complexity of the system and coupling between the components. The result is that the
system becomes more and more difficult to understand and maintain. Restructuring or
reengineering is required to improve the structure of the system to reduce the
maintenance cost.
3. Law of self-regulation (1974). The system evolution process is a self-regulating
process. Many system attributes such as maintainability, release interval, error rate, and
the like may appear to be stochastic from release to release. However, their long-term
trends exhibit observable regularities. In fact, this law is universal. That is, it is not
limited to system evolution. It is applicable to everything because everything is a
system. Consider, for example, the stock chart for a public company. The daily prices
may fluctuate, sometimes drastically. However, the long-term movements exhibit an
upward, downward, or flat trend. This law is due to the eighth law, that is, the law of
feedback systems. Indeed, the regularity is the result of the feedback loops, or
interaction of factors that cancel each other as well as enhance each other during a long
period of maintenance activities. This law is also a generalization of the next three laws:
law of conservation of organizational stability, law of conservation of familiarity, and
law of continuing growth. These three laws state the regularities of three specific
aspects.
4. Law of conservation of organizational stability (1978). The maintenance process for
an E-type system tends to exhibit a constant average work rate over the system's
lifetime.
5. Law of conservation of familiarity (1978). The average incremental growth of the
system remains a constant during the system's lifetime.
6. Law of continuing growth (1991). E-type systems must continue their functional growth to satisfy their users.
7. Law of declining quality (1996). The quality of E-type systems will appear to be
declining unless they are rigorously adapted to the changes in the operating
environment.
8. Law of feedback systems (1996). The evolution process consists of multilevel,
multiloop, and multiagent feedback systems that play a role in all the laws. That is, the
other laws are due to the feedback behavior.
As stated in Lehman and Belady's article, the law of increasing entropy implies that the
system would be replaced because the cost to maintain it would exceed the cost of building a new
system. This was true for operating systems, which were studied by Lehman and Belady. However,
many organizations find that replacing a legacy application system is not an option because
numerous business processes and business rules have been implemented in the legacy system
during the prolonged maintenance process. Moreover, millions of records are stored in the
databases. Due to inadequate documentation and the complexity of the system, no one really
knows what is implemented and how to port the data records. Therefore, many legacy systems are
still in use and companies spend hundreds of millions of dollars maintaining them each year.
21.4 TYPES OF SOFTWARE MAINTENANCE


The IEEE categorizes software maintenance into four types:
1. Corrective maintenance: Reactive modification of a software product performed after
delivery to correct discovered faults.
2. Adaptive maintenance: Modification of a software product performed after delivery to
keep a computer program usable in a changed or changing environment.
3. Perfective maintenance: Modification of a software product performed after delivery to
improve performance or maintainability.
4. Emergency maintenance: Unscheduled corrective maintenance performed to keep a
system operational. On the other hand, corrective maintenance is a planned maintenance
activity.
21.5 SOFTWARE MAINTENANCE PROCESS AND ACTIVITIES
Software maintenance is an ongoing activity that continues until the system retires. Like software
development, maintenance requires a process. During the history of software engineering, many
maintenance process models have been proposed. This section reviews some of these process
models. Regardless of which process model is used, software maintenance performs a set of basic
activities. These include:
1. Program understanding. Software systems are large and complex. In many cases, the
specification and design documents are either missing, incomplete, or outdated. However,
the maintenance software engineer must know the software before he can change it.
Therefore, program understanding is required.
2. Change identification and analysis. This activity identifies the needed changes, analyzes
their impact, estimates the change effort, and assesses the change risks.
3. Configuration change control. Changes made to a component of the system may impact
other components. This means that the software engineers who maintain the affected
components must be involved in the decision-making process. This activity requires the
maintenance team to prepare an engineering change proposal. It serves to inform relevant
stakeholders of the changes and solicits their feedback. A board of members reviews the
proposal. The board may approve or disapprove the changes, or require that the proposal
be revised to address their concerns.
4. Change implementation, testing, and delivery. The approved changes are made to the existing system. This involves recovering the design from the implementation and modifying the design to satisfy the changed requirements. The system is modified to incorporate the
changes. Integration testing, acceptance testing, and system testing are performed to ensure
that the system is correctly modified. The system is then delivered to its operating
environment. During its operation, data are collected and used to compute metrics such as
performance, response time, number of reboots, and so on. Defects are recorded. These are
the input to the next cycle of maintenance activities.
21.5.1 Maintenance Process Models


Figure 21.1 shows some of the maintenance process models proposed in the literature. The quick
fix, iterative enhancement, and full reuse models are described in [16]. With the quick fix model in Figure 21.1(a), the source code and the necessary documentation are changed. The source code is compiled and the system is tested. The quick fix model can be applied to all types of maintenance. The iterative enhancement model in Figure 21.1(b) is an adaptation of the evolutionary development model to software maintenance. That is, changes are based on an analysis of the current system. The requirements specification and design of the current system are modified and used as the basis for implementing the changes. This process is repeated for each batch of changes. The full reuse model in Figure 21.1(c) differs from the iterative enhancement model in its emphasis on reusing the existing system's requirements specification, design, implementation, test cases, and other reusable components from a repository. With this model, software reuse processes and techniques are applied to reap the benefits of reuse. The model requires that there be a repository of reusable components, and support for selecting and tailoring the reusable components must be available. The IEEE-1219 model shown in Figure 21.1(d) and the ISO 12207 model are similar to the iterative enhancement model.
Figure 21.2 shows the correspondence between the models.

21.5.2 Program Understanding


To change a software system, the software engineer needs to understand the program. This is
commonly referred to as program understanding or program comprehension. It involves a process
that extracts the design and specification artifacts from the code and represents them in a mental
model. This process is the opposite of the development process, which begins with a problem
statement and ends with the production of the software running in the target environment. Different
mental models are used to represent the design and specification artifacts that are recovered from
the code. These include object-oriented models, control flow models, functional models, and
sophisticated models proposed by researchers. UML diagrams may serve as the object-oriented
models. The flowchart and data flow diagram are examples of control flow models and functional
models, respectively.
Many real-world systems consist of millions of lines of code and thousands of classes. In
addition, object-oriented systems also exhibit complex interdependencies among the classes. For
example, the InterViews library has 122 classes, more than 400 inheritance, aggregation, and association relationships, and more than 1,000 member functions. The classes call each other, resulting in the so-called invocation chains. That is, function f1 calls function f2, which in turn calls function f3, and so on. Figure 21.3 shows the lengths of the invocation chains in the InterViews library. The data do not include calls that involve dynamic binding and function pointers (InterViews was implemented in C++).

The figure shows that two chains have a length of 14. This means 15 functions form a call sequence, that is, f1 calls f2, f2 calls f3, ..., f14 calls f15. Thus, to understand the functionality, the maintainer needs to trace the 14 function calls and make a note of what each function does. He then derives the functionality from the trace. The figure shows that most of the invocation chains call two to nine functions, and about 30% of the chains call more than half a dozen functions.
The complex relationships among the classes and the function invocation chains make object-
oriented programs difficult to understand. Unfortunately, the version of the InterViews library used in the study did not include meaningful in-code comments. Each program file contains only a brief,
generic file header stating the copyright information, and this is the only in-code comment. Thus,
understanding a program is difficult without other supporting documents.
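
A small sketch (with hypothetical classes, not taken from InterViews) shows why even a short invocation chain forces the maintainer to trace several classes before the behavior of a single call becomes clear.

// Hypothetical invocation chain: to understand what printInvoice() does, the
// maintainer must trace three further calls spread over several classes.
public class ChainDemo {
    public static void main(String[] args) {
        Invoice invoice = new Invoice(new Order(new Item[] { new Item(10.0, 2) }));
        invoice.printInvoice();               // f1 -> f2 -> f3 -> f4
    }
}

class Invoice {
    private final Order order;
    Invoice(Order order) { this.order = order; }

    void printInvoice() {                     // f1
        System.out.println("Total: " + order.total());
    }
}

class Order {
    private final Item[] items;
    Order(Item[] items) { this.items = items; }

    double total() {                          // f2: calls f3 for every item
        double sum = 0;
        for (Item item : items) {
            sum += item.price();
        }
        return sum;
    }
}

class Item {
    private final double unitPrice;
    private final int quantity;
    Item(double unitPrice, int quantity) {
        this.unitPrice = unitPrice;
        this.quantity = quantity;
    }

    double price() {                          // f3: calls f4
        return Tax.withTax(unitPrice * quantity);
    }
}

class Tax {
    static double withTax(double amount) {    // f4: end of the chain
        return amount * 1.19;
    }
}
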
21.5.3 Change Identification and Analysis
Software maintenance needs to identify the needed changes based on the change events.
Sometimes, more than one change is needed for a given event. All these changes should be
identified. It is worthwhile to identify alternative changes to accommodate a change event. For
example, if it is costly or time consuming to change a component and there are commercial off-
the-shelf (COTS) alternatives, then changing the component as well as using COTS should be
identified as alternatives. This information allows the change analysis step to evaluate the options
and select a viable option to pursue. Sometimes, it is difficult to fix a component to remove the
root cause, but it is relatively easy to change a different component to fix the problem, at least
temporarily. In this case, these two alternatives should be identified. The changes identified are
analyzed to:
1. Assess the change impact, that is, which other components will be affected by the
changes made to a given component.
2. Estimate the costs and time required to implement the changes and test the result.
3. Identify risks and define resolution measures.
Object-oriented software exhibits complex dependencies among the classes. Changes made
to one class may affect many classes. This is also called the ripple effect. Figure 21.4 shows the
change impact in the InterViews library discussed earlier. The data indicate that on average, 15
classes are affected when one class is changed. The first three cases indicate that changes made to
one class could affect 51, 62, and 74 classes, respectively. However, in some cases, change impact is limited. For example, in one case, five classes are changed but only two classes are affected. As discussed above, there are alternative ways to fix a defect or solve a problem. Change impact analysis
should consider the maintenance costs associated with the alternative ways to change the software.
The number of alternatives should be limited to reduce the cost of change analysis. In addition to
change impact, the risks associated with the changes are identified and measures are defined to
resolve the risks if they do occur. The outcome of this step is used in the configuration change
control step to determine if the proposed changes should be performed.
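
A small sketch (hypothetical classes) illustrates the ripple effect: only one class is deliberately changed, but every class that depends on its interface is affected and must be revisited.

// Hypothetical ripple effect: PaymentService originally exposed charge(double)
// and is changed to charge(double, String). Only PaymentService was changed,
// but every class that calls it is affected.
class PaymentService {
    void charge(double amount, String currency) {      // changed signature
        System.out.println("Charging " + amount + " " + currency);
    }
}

class CheckoutController {
    private final PaymentService payments = new PaymentService();

    void completeOrder(double total) {
        payments.charge(total, "EUR");    // affected: call site had to be updated
    }
}

class SubscriptionRenewer {
    private final PaymentService payments = new PaymentService();

    void renew(double fee) {
        payments.charge(fee, "EUR");      // affected: another ripple of the same change
    }
}

public class RippleDemo {
    public static void main(String[] args) {
        new CheckoutController().completeOrder(49.99);
        new SubscriptionRenewer().renew(9.99);
    }
}
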

Change impact analysis using the design class diagram is associated with a number of
problems. The design class diagram usually does not include implementation classes and
associated relationships. Moreover, the implementation of the classes and relationships of a design
class diagram may differ from their design counterparts. For example, an implemented class may
have more functions than its design counterpart. These functions may call functions of other
classes. Such dependencies may not exist in the design class diagram. Thus, change impact
analysis based on the design class diagram may produce incorrect results. Finally, if the design
class diagram is not available, then it is not possible to use this approach. Change impact analysis
using a reverse-engineering approach offers a solution to the above problems. This is described in
"Reverse-Engineering" (Section 21.6).
21.5.4 Configuration Change Control
Section 21.5.3 shows that changes made to a class may ripple throughout the system, affecting
many classes. In practice, the classes are developed by different teams and team members. If class
A is changed and class B is affected, then the developer of class B needs to know that his class has
been affected. This is necessary because changes to class B may be needed. Thus, changes to the
components of a system must be made in a coordinated manner; otherwise, the project would
become a chaos. The mechanism to coordinate the changes to the components of a system is called
software configuration management. Configuration change control (CCC) is one of its
components. It performs two main functions:
1. Preparing an engineering change proposal. Based on the change analysis result, the
maintenance personnel prepare an engineering change proposal (ECP). The ECP consists of
administrative forms, supporting technical and administrative materials that specify the proposed
changes, the reasons for the changes, the affected components, and the effort, time, and cost
required to implement the changes. The priorities of the changes are specified. A schedule to
implement the changes is also described.
2. Evaluating the engineering change proposal. The ECP is reviewed by a configuration
change control board (CCCB). The board consists of representatives from different parties
including representatives of the development teams of the com ponents that will be affected by the
changes. If the review raises concerns, then the proposal is modified and resubmitted. In some
cases, the proposal is rejected for various reasons. In this case, the proposal is archived for future
reference.
21.5.5 Change Implementation, Testing, and Delivery
Once the ECP is accepted, the changes are implemented. For many real-world systems, the change
incorporation activity needs to implement a set of changes, which may include all types of
maintenance. The implementation is application dependent. It is not pursued further. The
implementation is tested using the existing as well as new test cases. That is, regression testing and
development testing are performed. The modified and tested system is then deployed to operate in
the target environment. During the operation phase, bugs are recorded, various data such as system
logs, transaction processing times and data needed to compute desired metrics are collected. The
data are used to compute metrics to assess the performance of the system. These results are useful
for the next cycle of maintenance work.

21.6 REVERSE-ENGINEERING
The process that analyzes the code to recover the design, the specification, and a problem statement is
the reverse of the development process. Therefore, it is called reverse-engineering.
Chikofsky and Cross define reverse-engineering as "the process of analyzing a subject system to
identify the system's components and their interrelationships, and create representations of the
system in another form or at a higher level of abstraction" [46]. In comparison, "the traditional
process of moving from high-level abstractions and logical, implementation-independent designs
to the physical implementation of a system" is called forward-engineering.

21.6.1 Reverse-Engineering Workflow


Performing reverse-engineering manually is difficult. Therefore, tools are developed to automate
the process. The main components of a reverse-engineering tool are shown in Figure 21.6. The
tool takes the code as the input and displays the design diagrams as the output. The first step of the
reverse-engineering process extracts the software artifacts from the code. For example, to reverse-
engineer an object-oriented program to generate a class diagram, the classes, their attributes and
operations, and the relationships between the classes are extracted. The results are stored in a
database. The artifacts are used to compute the diagram layout, that is, the coordinates that
determine where to draw the classes, attributes, and relationships. Finally, a display component
draws the diagram according to the layout.
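As an illustrative sketch of the artifact-extraction step only (not the implementation of any particular tool), the following Java fragment uses reflection to list a compiled class's attributes, operations, and superclass, which is the raw material from which a class diagram could be laid out and drawn. A real reverse-engineering tool would typically parse the source code instead of relying on reflection.

import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Sketch of the extraction step: recover class structure from compiled code via reflection.
public class ClassArtifactExtractor {

    public static void describe(Class<?> c) {
        String superName = (c.getSuperclass() != null) ? c.getSuperclass().getSimpleName() : "none";
        System.out.println("class " + c.getSimpleName() + " (superclass: " + superName + ")");
        for (Field f : c.getDeclaredFields()) {
            System.out.println("  attribute: " + f.getType().getSimpleName() + " " + f.getName());
        }
        for (Method m : c.getDeclaredMethods()) {
            System.out.println("  operation: " + m.getName());
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Any class on the classpath can be described, e.g., a library class.
        describe(Class.forName("java.util.ArrayList"));
    }
}

The output of such an extractor would be stored in the database mentioned above and then passed to the layout and display components.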
21.6.2 Usefulness of Reverse-Engineering
The diagrams produced by reverse-engineering have a number of uses:
Program understanding. The diagrams produced by a reverse-engineering tool facilitate
understanding of the structure, functionality, and behavior of the software under maintenance. This
is extremely useful when the system documentation is missing, outdated, or inadequate.
Formal analysis. Formal analysis techniques can be applied to the diagrams produced by
a reverse-engineering tool to detect problems that may exist in the software. For example, software
model checking applies the conventional model checking technique to a software model produced
by reverse-engineering.

Test case generation. The diagrams produced by reverse-engineering facilitate test case
generation. For example, the basis paths of a flowchart diagram facilitate the generation of basis
path test cases. The state diagram facilitates generation of state behavior test cases.
Software reengineering. The diagrams produced by reverse-engineering are useful for
software reengineering, a process that restructures the software system to improve certain aspects of it.
This is described in a later section.
21.6.3 Reverse-Engineering: A Case Study
As an illustration, this section presents OOTWorks, the object-oriented software testing and maintenance
environment reported in "Developing an Object-Oriented Software Testing and Maintenance Environment" [101].
It includes the following components, among others:
Object Relation Diagram (ORD). This component takes the source code and produces a
UML class diagram showing the classes, their attributes and operations, and the relationships
between the classes. The user can select which classes and relationships are displayed, and which
classes have their attributes and methods shown. The ORD utilities include:
• Change Impact Analysis. The user can select the classes to be changed and have
the tool highlight the classes that are affected, based on the dependencies among
the classes, as described in the last section.
• Software Metrics. This utility calculates software metrics including, for each
class, the class size, number of lines of code, number of children, fan-in, fan-out,
number of relationships, depth-in-inheritance-tree, and so on.
• Version Comparison. This utility takes as input two versions of the source code
and displays the ORD for the old and new versions. Moreover, the new version
highlights the classes added, changed, and affected. The old version highlights the
classes deleted, changed, and affected.
• Test Order. This utility computes the order to test the classes so that the effort
required to implement the test stubs is substantially reduced.
Block Branch Diagram (BBD). The Block Branch Diagram performs reverse engineering
of the functions of a class and displays the flowcharts for the functions. The BBD component also
calculates and displays the basis paths of the function, highlights the basis path selected, and shows
the variables that are used and modified.
Object Interaction Diagram (OID). This component performs reverse engineering of the
source code to generate and display a sequence diagram that describes the interaction between the
objects.
Object State Diagram (OSD). This component performs reverse-engineering of the
source code to produce and display a state diagram that describes the state dependent behavior of
an object. The utilities include state reachability analysis and state-based fault analysis.
21.7 SOFTWARE REENGINEERING
Software reengineering is an important activity of software maintenance. It is a process that
restructures a software system or component to improve certain aspects of the software. As
Lehman's laws indicate, software systems undergo continual changes. These changes cause the structure of
the software system to deteriorate. As a consequence, the software becomes more difficult to
comprehend and more costly to maintain. In this case, it is necessary to restructure the software
system to reduce the maintenance cost.
21.7.1 Objectives of Reengineering
As discussed above, reengineering is required to improve the structure of the software system so
that further maintenance activities can be performed cost effectively. Besides this, software
reengineering is sometimes performed to improve the quality, security, and performance aspects
of a software system. More specifically, software reengineering is often performed with one or
more objectives in mind. The following is a list of such objectives:
1. Improving the software architecture. One important software reengineering
objective is to improve the software architecture of an existing system. The need for
improvement may be due to different reasons. Improving the software architecture is
achieved by applying architectural design patterns, security patterns, and design
patterns. For example, the controller pattern is often applied to decouple the graphical
user interface from the business objects to improve the architecture.
2. Reducing the complexity of the software. Studies indicate that the complexity of a
system has significant impact on the quality and security of a software system. The
complexity of a software system or component can be measured in different ways. One
complexity metric is the cyclomatic complexity, which was proposed by McCabe and
discussed in Chapters 19 and 20. It measures the number of independent paths or
control flows in a function. If the cyclomatic complexity is high, then the function is
difficult to understand, implement, test, and maintain. Many patterns can be applied to
reduce the complexity. These include observer, state, strategy, and other patterns.
3. Improving the ability to adapt to changes. This includes application of appropriate
design patterns to improve the structure and behavior of the software system so that it
is more adaptable to changes in requirements and operating environment. For example,
the design of the persistence framework in Chapter 17 lets the system easily adapt to
changes in the database management system. Applying this framework to an existing
system improves its adaptiveness.
4. Improving the performance, efficiency, and resource utilization. During the system
operation phase, data about various aspects of the system are collected and metrics are
computed. These are valuable information for identifying places for improvement, for
example, performance bottlenecks, poor workload distribution and poor resource
utilization. Architectural styles, patterns, and efficient algorithms can be applied to
improve the system. For example, virtual proxy, smart proxy, flyweight, and prototype
can be applied to improve performance, object creation speed, and memory usage.
5. Improving the maintainability of the software system. Many patterns can be applied
to make the software system easier to maintain. These include abstract factory, bridge,
builder, chain of responsibility, command, composite, decorator, facade, factory method,
flyweight, interpreter, iterator, mediator, observer, state, strategy, template method, and visitor.

21.7.2 Software Reengineering Process


A typical software reengineering process is shown in Figure 21.7. It involves the following
activities:
1. Identifying places that need improvement. First, the software is analyzed to identify
places where improvement is needed. Several tools are useful for achieving this task.
2. Selecting an improvement strategy. Next, improvement strategies are developed for the items
identified in the previous step. Often, more than one strategy exists for each item. The
improvement strategies are analyzed to assess their change impact, and
the time and cost required to implement the strategies. Based on the analysis result, an
improvement strategy is selected.
3. Implementing the proposed improvements. The proposed improvements are
implemented. Testing and regression testing are performed to ensure that the reengineered software
system satisfies the requirements.
4. Evaluating system against objectives. The modified system is evaluated against the
objectives. If further improvement is needed, then the process is repeated.
21.7.3 Software Reengineering: A Case Study
This section illustrates how software reengineering improves software quality through a small case
study. The system is the OOTWorks environment described in Section 21.6.3. The objective of the
case study is to improve the OOTWorks environment using the tools of OOTWorks. The
reengineering objectives discussed in Section 21.7.1 are taken into consideration. The case study
performs the following steps, as described previously:
1. Identifying places that need improvement.
2. Selecting an improvement strategy.
3. Implementing the proposed improvements.
4. Evaluating the system against improvement objectives.
First, the metric calculation tool of OOTWorks is applied to identify places that need
improvement. The tool indicates that many places require improvement. One of these is the
extremely high complexity of one of the methods of a metrics calculation class. The method
calculates the software metrics selected by the user. The software industry has an unofficial
complexity threshold of 10, but the method has a complexity of 38. To understand why the method
has such a high complexity, a closer examination of the code is performed. It reveals that the
method uses conditional statements to test if a metric is selected. If so, it calculates the metric. The
complexity reflects the use of 38 conditional checks to determine which metrics need to be computed.
The next step is to select an improvement strategy. The use of conditional statements
implies behavior variations. That is, different metrics are calculated by using different algorithms.
This suggests that the polymorphism pattern can be applied. The polymorphism pattern is
summarized as follows:
Problem: How does one handle behavior variations without using conditional statements?
Solution: Define an interface for the behaviors that vary and let the subclasses implement the
behavior variations.
Thus, an abstract class called Metric is defined. It implements the ActionListener interface of Java.
Its actionPerformed(...) method invokes its abstract computeMetric(...) method. The subclasses
of Metric implement the computeMetric(...) method to compute the concrete metrics. Moreover,
the subclasses are the action listeners of the respective metric selection widgets, which are check
boxes. In this way, when the user checks a metric check box, the corresponding metric is
calculated. When the user clicks the Display Metrics button, the selected metrics are displayed.
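A minimal Java sketch of this application of the polymorphism pattern is shown below. Only the names Metric, actionPerformed, and computeMetric come from the description above; the concrete subclass, its metric, and the way results are stored are illustrative assumptions.

import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;

// Abstract metric: each concrete metric is the action listener of its own check box.
abstract class Metric implements ActionListener {
    @Override
    public void actionPerformed(ActionEvent e) {
        // Invoked when the user checks the check box associated with this metric.
        computeMetric();
    }

    // Each subclass implements the computation of one concrete metric.
    protected abstract void computeMetric();
}

// Hypothetical concrete metric; it replaces one branch of the former 38-way conditional,
// so the cyclomatic complexity of each method stays low.
class LinesOfCodeMetric extends Metric {
    @Override
    protected void computeMetric() {
        // ... count the lines of code of the selected classes and store the result ...
    }
}

Registering a LinesOfCodeMetric instance as the listener of the corresponding check box (for example, with JCheckBox.addActionListener) wires the user's selection to the metric computation without any conditional dispatch.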
The third step implements the proposed improvements. That is, the skeleton code for the
Metric abstract class is implemented. Test-driven development is applied to implement each of the
concrete Metric classes, one at a time. More specifically, the skeleton code for the subclass is
implemented. Tests are written to test the unimplemented computeMetric(...) method and make
sure all the tests fail. The implementation of computeMetric(...) involves copying and pasting
the existing code into the appropriate places. The code is modified if needed. In most cases, very
little effort is required to modify the reused code. The tests are run to ensure that the metric is
correct. The process iterates for each of the metrics.
Finally, the modified tool is applied to assess the complexity of the modified class. It shows
that the complexities of the computeMetric(...) methods of the subclasses are low. As expected,
about 40 new classes are added. Regression testing is performed to ensure that the change does not
alter other parts of the software; for this case study, it does not. In addition to the substantial
reduction in complexity, the improvement makes the component much easier to maintain. For
example, adding a new metric is very easy: one needs only to add and implement a Metric subclass and
a check box that notifies an object of this class. Test-driven development of this extension is also
easy, compared to testing the chunk of code that contains 38 nested if-then-else statements.
21.8 PATTERNS FOR SOFTWARE MAINTENANCE
As discussed previously, many patterns can be applied during the maintenance phase to improve
the software system in various ways. This section presents two new patterns. The facade pattern
is useful for simplifying the client that interacts with a group of components. While the facade
pattern simplifies the client interface, the mediator pattern simplifies the internal interaction of
the group of components. Patterns that can be applied during the maintenance phase are not limited
to these two patterns. Therefore, other patterns that can be applied during maintenance are also
reviewed.

21.8.1 Simplifying Client Interface with Façade


During software maintenance, it is common that a client component needs to invoke a group of
components to accomplish a task. Consider, for example, the workflow shown in Figure 21.6. The
client component needs to invoke in a sequence three components that implement three tasks as
represented by the three rectangles. In many cases, the client wants to recover the design from the
code. It does not want to know the components and how to invoke them. A pattern to simplify the
interface for the client is desirable. This is the facade pattern, described in Figure 21.8.
In the original design of the OOTWorks environment, the user needs to invoke the ORD
parser to extract the artifact. The user then invokes the layout component to compute the
coordinates. Finally, the user invokes the display component to draw the diagram. This is not user-
friendly. Therefore, the reengineering effort selects the facade pattern to improve the client
interface. Figure 21.9 shows the application of the facade pattern to simplify the interface for the
client. The benefits of applying the facade pattern to the existing design can easily be derived from
the benefits listed in Figure 21.8.
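A minimal Java sketch of the facade applied to this workflow is shown below. The class names (OrdParser, LayoutEngine, DiagramDisplay, ReverseEngineeringFacade) are illustrative assumptions and do not reproduce the actual OOTWorks code; the point is that the client calls one method instead of invoking three components in the right order.

// Facade: hides the parse -> layout -> display workflow behind a single operation.
class ReverseEngineeringFacade {
    private final OrdParser parser = new OrdParser();
    private final LayoutEngine layout = new LayoutEngine();
    private final DiagramDisplay display = new DiagramDisplay();

    // The client invokes this method only; the facade coordinates the three components.
    public void showClassDiagram(String sourcePath) {
        Object artifacts = parser.extract(sourcePath);   // step 1: extract artifacts from the code
        Object coordinates = layout.compute(artifacts);  // step 2: compute the diagram layout
        display.draw(coordinates);                       // step 3: draw the diagram
    }
}

// Illustrative stand-ins for the three components hidden behind the facade.
class OrdParser      { Object extract(String sourcePath) { return sourcePath; } }
class LayoutEngine   { Object compute(Object artifacts)  { return artifacts; } }
class DiagramDisplay { void draw(Object coordinates)     { /* render the diagram */ } }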

21.8.2 Simplifying Component Interaction with Mediator


The facade pattern simplifies the client interface and lets the facade interact with the components.
If the interaction between the components is complex, as hinted by the gray cloud shape in Figure
21.9, then the components may be tightly coupled with each other. In the worst case, each
component must know the presence of, and how to interact with, each of the other components.
As a consequence, the components may be difficult to test, debug, maintain, and reuse. This kind
of code is often seen during software maintenance. A solution to improve the design is to reduce
the coupling between the interacting components.
The mediator pattern fulfills this objective. Figure 21.10 shows the specification of the
pattern. The pattern assigns the responsibility of coordinating the interacting components to an
object, called the mediator. The pattern replaces the component-to-component interaction in the
existing design with mediator-to-component interaction. The mediator defines the interface for
the components to interact with it. It uses the existing interfaces of the components to interact with
the components.
To illustrate, consider the design in Figure 21.9. The classes enclosed in the gray cloud
shape interact with each other in a complex fashion. The interaction behavior is somewhat difficult
to comprehend. In addition, a change to one class may affect the other classes. To improve, the
mediator pattern is applied. Figure 21.11 shows the result. In the figure, the mediator interacts with
the objects, which are decoupled from each other. The mediator coordinates the interaction.
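The following is a minimal, language-level sketch of the mediator pattern in Java, under the assumption that components report events to the mediator and the mediator decides which other components react. The names Mediator and Component and the broadcast-style coordination are illustrative and simpler than the design shown in Figure 21.11.

import java.util.ArrayList;
import java.util.List;

// The mediator coordinates the components; the components know only the mediator.
class Mediator {
    private final List<Component> components = new ArrayList<>();

    void register(Component c) {
        components.add(c);
        c.setMediator(this);
    }

    // A component reports an event; the mediator decides which other components react.
    void componentChanged(Component sender, String event) {
        for (Component c : components) {
            if (c != sender) {
                c.onEvent(event);
            }
        }
    }
}

abstract class Component {
    protected Mediator mediator;

    void setMediator(Mediator mediator) { this.mediator = mediator; }

    // Components never call each other directly; they go through the mediator.
    protected void raise(String event) { mediator.componentChanged(this, event); }

    abstract void onEvent(String event);
}

Because each component depends only on the mediator's interface, a change to one component no longer forces changes to its former peers, which is exactly the coupling reduction the reengineering effort aims for.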

21.8.3 Patterns for Software Maintenance


Many patterns presented in previous chapters are applicable during the maintenance phase. Figure
21.12 summarizes the patterns that can be applied during the maintenance phase. The first column
is the pattern. The second column shows the types of maintenance for which the pattern can be
applied. The third column briefly describes the use of the pattern and its benefits.

21.10 TOOL SUPPORT FOR SOFTWARE MAINTENANCE


Many software maintenance activities are tedious and time consuming. Moreover, software
maintenance needs to coordinate the changes to ensure consistency. The resulting software system
needs to be retested to ensure that it satisfies the requirements and constraints. The use of software
tools can significantly reduce the time and effort. The following are some of the tools that are
useful for software maintenance:
Reverse-engineering tools are useful for design and specification recovery. They aid
program comprehension and identification of places that need improvement. These tools are
extremely valuable when the design documentation is missing, outdated, or inadequate.
Metrics calculation tools compute and display quantitative measurements of a software
system. They help in identifying and highlighting places that need improvement. For example,
classes that consist of thousands of lines of code are difficult to maintain and are more likely to be
error prone. Classes that have an excessive number of functions may be assigned too many
responsibilities. Methods with a high complexity are candidates for improvement.
Performance measurement tools such as software profilers can display execution times,
invocation frequencies, and memory usage of various components of a software system. They are
useful for identifying performance bottlenecks and memory-intensive components. Software
reengineering may be needed to mitigate these problems.
Static analysis tools are useful for detecting violations of coding standards, incorrect use
of types, existence of certain bugs and anomalies, and security vulnerabilities.
Change impact analysis tools are useful for assessing the scope of impact of proposed
improvements. The change impact analysis results are the basis for the estimation of the effort
required to perform the proposed improvements.

Effort estimation tools are useful for calculating the required time, effort, and costs to
implement the proposed improvements.
Configuration management tools such as Concurrent Versions System (CVS) and Subversion are useful
for coordinating the changes to maintain the consistency of the software being reengineered.
Regression testing tools are useful for rerunning the test cases to ensure that the system
satisfies the requirements and reengineering does not introduce new errors. Some of the tools can
analyze the software and select a subset of test cases to rerun. This reduces the regression testing
time and effort.

Module 5, Chapter 2
Software Configuration Management
During the software life cycle, numerous documents are produced. These include requirements
specification, software design documents, source code, and test cases. These documents depend
on each other. For example, a software design is usually de rived from and dependent on the
requirements specification. Classes depend on other classes. This means that changing the
requirements specification requires changes to the design and changing one class requires changes
to other classes. In general, changes made to a document may ripple throughout the project,
affecting many other documents.
This chapter presents concepts, activities, and techniques for controlling changes to the
documents produced during the life cycle, and tracking the status of the software system and its
components. These activities belong to the software engineering discipline referred to as software
configuration management (SCM). Traditionally, configuration management applied only to the
development of hardware elements of a hardware-software system. It is concerned with the
consistent labeling, tracking, and change control of the hardware elements of a system. Software
configuration management adapts the traditional discipline to software development. In this
chapter, you will learn the following:
• Basic concepts of SCM including software configuration item and baseline.
• Functions of SCM including software configuration identification, software configuration change
control, software configuration auditing, and software configuration status accounting.
• Knowledge of SCM tools such as Revision Control System (RCS), Concurrent Versions System
(CVS), Subversion, Domain Software Engineering Environment (DSEE), and ClearCase.
22.1 THE BASELINES OF A SOFTWARE LIFE CYCLE
During the software development life cycle, numerous documents are produced. As each set of
documents is produced and passes quality reviews, the project is moving closer toward its
completion. In this sense, the successful productions of the needed software artifacts serve as
measurements of the progress of the project. More specifically, the productions of such documents
at significant check points, such as the end of the requirements phase, the end of the design phase,
and so forth, act like the milestones of a long journey. These milestones let us know the progress
status of the project and product. If the project reaches the milestones as scheduled, then the team
knows that it will be able to complete the project on time; otherwise, the team needs to take action
to resolve the discrepancy.
In SCM, the milestones are called baselines. A baseline denotes a significant state of the
software life cycle. For example, the baselines for the agile process/methodology described in
Chapter 1, that is, Figures 2.15 and 2.16, are shown in Figure 22.1. Each baseline is associated
with a set of software artifacts or documents produced in the baseline. These artifacts or documents
are called software configuration items (SCIs). In practice, each project defines its baselines and
configuration items taking into consideration factors such as project size, budget, and available
resources. The concept of a baseline serves several purposes:

1. It defines the important states of progress of a project or product. The baselines in Figure
22.1 define the significant states of a given project.
2. It signifies that the project or product has reached a given state of progress. That is, a
baseline is established when the required SCIs are produced and pass the SQA reviews. At this
point, the SCIs are checked in to the configuration management system. Once a configuration item
is checked in to the configuration management system, changes to the item must go through a
procedure to ensure that the changes will maintain the consistency of the configuration of the
system.
3. It forms a common basis for subsequent development activities. Before the establishment
of the requirements baseline, the teams could proceed with the design activities but changes to the
requirements and use cases are to be expected. The establishment of the requirements baseline
"freezes" the documents associated with the baseline, that is, changes can no longer be made freely.
Needed changes must be documented and evaluated to assess their impact to configuration items
produced in subsequent activities such as design diagrams and implementation.
4. It is a mechanism to control changes to configuration items, as explained in the previous item.

22.2 WHAT IS SOFTWARE CONFIGURATION MANAGEMENT?


Generally speaking, SCM is baseline management and configuration item management. Baseline
management means defining a project's baselines and providing mechanisms to formally establish
the baselines. That is, at the beginning of a project, the baselines of the project, the criteria to
certify the baselines, and procedures to establish the baselines are defined. For example, a project
may adopt the baselines in Figure 22.1. In this case, the criteria for establishing the planning
baseline may be that the requirements, optionally a domain model, abstract and high-level use
cases, a requirement-use case traceability matrix, a use case delivery schedule, and a draft
architectural design are defined, reviewed, and the deficiencies are removed. The procedure for
establishing the baseline may be that these documents are authorized to be checked in to the
configuration management system. When all these documents are checked in, the baseline is
established. The configuration item management aspect of SCM is concerned with updates that
are made to the baseline items. That is, before a document is checked in to the configuration
management system, changes to the document can be made freely. However, once the document
is checked in, then any update to the document must go through a change control procedure to
coordinate the update.
22.3 WHY SOFTWARE CONFIGURATION MANAGEMENT?
For small projects that involve only a few developers working closely at one location, the need for
SCM is not acute. The team members can talk to each other in person to synchronize the updates
to the software artifacts they produce. However, many real-world software systems are developed
by many teams and developers working on shared, interdependent software artifacts
simultaneously, at different locations. In these cases, the work of one team cannot be started until
other documents are produced. Therefore, mechanisms are needed to establish the baselines and
publicize such information so that all of the teams are aware of the progress status of the project.
Updates to the software artifacts must be carefully coordinated to allow the teams to assess the
impact and avoid inconsistent update or overwriting the work produced by others.
Besides the need to synchronize the multiple distributed teams working together on a
project, the need to maintain different versions of a software system requires SCM. Multiple
versions of a system are needed to satisfy the needs of different customers, for example, different
customers require different modules of the system. If a vendor has a few dozen customers, then
the vendor may need to maintain dozens of versions of a software system to satisfy the needs of
its customers. In addition, the vendor may need to maintain different releases of a software system.
In some cases, there are subtle differences between the compilers from different compiler vendors.
This means there are variations in the source code of the software system, resulting in different
versions.
In summary, SCM is needed to coordinate the development activities of the multiple
development teams and team members, as well as to support maintenance of multiple versions of
a software system.
22.4 SOFTWARE CONFIGURATION MANAGEMENT
FUNCTIONS
As depicted in Figure 22.2, SCM consists of four main functions. These are outlined below and
detailed in the following sections.
• Software configuration identification. Software configuration identification defines the
baselines, the configuration items, and a naming schema to uniquely identify each of the
configuration items. This function is performed when a new project starts.
• Configuration change control. Software configuration change control exercises control on
changes to the configuration items to ensure the consistency of the system configuration and
successful cooperation between the teams and team members. This function is performed when
change requests arrive, due to events that require changes.

• Software configuration auditing. Software configuration auditing verifies and validates the
baselines and configuration items, defines and executes mechanisms for formally establishing the
baselines, and ensures that proposed changes are properly implemented.
• Software configuration status accounting. Software configuration status accounting is
responsible for tracking and maintaining information about the system configuration. It provides
database support to the other three functions.

22.4.1 Software Configuration Identification


During the software development process, numerous documents are produced, used, and updated.
Not all documents must be placed under the control of SCM. Which documents need to be
managed depends on various factors including project size, development process, and available
resources. Software configuration identification defines the baselines and the software
configuration items for each of the baselines. For example, Figure 22.1 shows a sample set of
baselines and associated configuration items. A plan-driven project may include more, while an
agile project may include fewer configuration items.
For projects that need to manage many configuration items, a model of baseline and
configuration items is useful. Figure 22.3 shows a template of such a model in UML. Configuration
items are classified into simple software configuration items (Simple SCI) and composite software
configuration items (Composite SCI). A Simple SCI does not include other configuration items.
Examples are an expanded use case, a domain model, a sequence diagram, and a design class
diagram. A Composite SCI may contain other configuration items. For example, a design
specification includes expanded use cases, sequence diagrams, and a design class diagram.
As shown in the model, changes to a SCI may affect other SCIs due to inheritance,
aggregation, and association relationships. As an example of change impact due to an association
relationship, consider a sequence diagram that is derived from an expanded use case. Obviously,
if the expanded use case is modified, the sequence diagram may be affected and may need to be
modified as well. As an example of change impact due to an aggregation relationship, consider a
design specification that contains an expanded use case and a sequence diagram derived from the
expanded use case. If the expanded use case is deleted, then the sequence diagram must be deleted.
These imply that the design specification must be changed.
The abstract software configuration item (SCI), which is displayed in Figure 22.3 in italic
font, defines a set of attributes and operations that are common to all configuration items. Useful
attributes include, but are not limited to, the following:
• ID number-A unique ID to identify the SCI. It should bear certain semantics to communicate
the functionality of the SCI and the system or subsystem it belongs to. For example, a domain
model constructed in increment 1 for a library information system may have an ID number like LIS-Inc1-DM.
• name-The name of the configuration item, for example, Checkout Document Expanded Use
Case, Checkout Document Sequence Diagram, and so on.
• document type-The type of the document of the SCI, for example, requirements specification,
domain model, design specification, test cases, and the like. This attribute eliminates the need for
subclassing.
• document file-The document file or the full path name for the file that contains the SCI.
• author-The developer who creates the configuration item.
• date created, target completion date, and date completed-These are useful for tracking the
status of the SCI.
• version number-This is used to keep track of the multiple versions of a configuration item.
• update history-A list of update summaries, each of which briefly specifies the update, who
performs the update, and date of update.
• description-A brief description of the configuration item.
• SQA personnel-A technical staff member who is responsible for the quality assurance of the configuration
item.
• SCM personnel-A technical staff member who is responsible for checking in the configuration item.
As usual, a simple configuration item has concrete operations to set and get attributes. It
may also include abstract operations for verifying and validating the configuration item as well as
computing various metrics. A composite configuration item has additional operations to add,
remove, and get component configuration items. Finally, a baseline has operations to add, remove,
and get a predecessor as well as operations to add, remove, and get a configuration item. To apply
the model, each concrete project extends the abstract leaf classes to provide concrete
implementations of the abstract operations. In particular, a subclass is created for a set of SCIs
that share the same behavior.
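The following is a minimal Java sketch of this model, assuming a composite-pattern structure. Only a few of the attributes listed above are shown, the concrete subclass DesignSpecification is a hypothetical example, and the operation signatures are illustrative rather than prescribed by the model in Figure 22.3.

import java.util.ArrayList;
import java.util.List;

// Abstract SCI: attributes and operations common to all configuration items.
abstract class SCI {
    String idNumber;      // e.g., an ID encoding system, increment, and document type
    String name;
    String documentType;  // requirements specification, domain model, design specification, ...
    String author;
    int versionNumber;

    abstract boolean verify();    // provided by the concrete project
    abstract boolean validate();  // provided by the concrete project
}

// A simple SCI does not contain other configuration items.
abstract class SimpleSCI extends SCI { }

// A composite SCI may contain other configuration items.
abstract class CompositeSCI extends SCI {
    private final List<SCI> components = new ArrayList<>();
    void add(SCI item)        { components.add(item); }
    void remove(SCI item)     { components.remove(item); }
    List<SCI> getComponents() { return new ArrayList<>(components); }
}

// Hypothetical project-specific subclass, e.g., a design specification that contains
// expanded use cases, sequence diagrams, and a design class diagram.
class DesignSpecification extends CompositeSCI {
    @Override boolean verify()   { return getComponents().stream().allMatch(SCI::verify); }
    @Override boolean validate() { return getComponents().stream().allMatch(SCI::validate); }
}

// A baseline references its predecessor and the configuration items established with it.
class Baseline {
    private Baseline predecessor;
    private final List<SCI> items = new ArrayList<>();
    void setPredecessor(Baseline baseline) { predecessor = baseline; }
    void addItem(SCI item)                 { items.add(item); }
    List<SCI> getItems()                   { return new ArrayList<>(items); }
}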

22.4.2 Software Configuration Change Control


As illustrated in Figure 22.4, software configuration change control (SCCC) involves the following
activities but only two of them are SCCC functions:
1. Identify changes required by various events. Many events mandate changes to the SCIs.
These events include:
a. Software deficiencies, for example, the functionality is inadequate or incorrect, the
performance is unacceptable.
b. Hardware changes, for example, replacement of the computer or hardware devices.
c. Changes to operational requirements, for example, a new security procedure requires
that passwords must satisfy strong password rules and be changed periodically.
d. Improvement and enhancement requests from customer and users, for example,
improvement to user interface and actor-system interaction behavior are required.
e. Changes to budget, project duration, and schedule are required, for example, the schedule
needs to be adjusted to meet an emerging business situation, or the budget is cut and the system's
capabilities must be reduced.
2. Analyze changes. When any of the events occur, the team needs to identify changes to the
configuration items to respond to the change event. The changes are analyzed by respective experts
who are the developers in most cases.
3. Prepare an engineering change proposal. The changes and analysis results are used to prepare
an engineering change proposal (ECP). The ECP consists of administrative forms, and supporting
technical and administrative materials that specify, among others, the following:
a. Description of the proposed changes.
b. Identification of originating organization or developer.
c. Rationale for the changes.
d. Identification of affected baselines and SCIs.
e. Effort, time, and cost required to implement the proposed changes as well as the priority
of each of the proposed changes.
f. Impact to the project schedule.
4. Evaluate engineering change proposals. The ECP is reviewed by a configuration change
control board (CCCB), which consists of representatives from different parties, especially parties
whose work and schedule will be affected by the changes. Three different outcomes are possible:
(1) the proposal is rejected, in which case it is archived; (2) changes to the proposal are required,
in which case it is returned to the proposal preparation function; and (3) the proposal is approved,
in which case the changes are made.
5. Incorporate changes. The approved changes are made to the software system.

22.4.3 Software Configuration Auditing


The software configuration auditing (SCA) function has the following responsibilities:
1. Defining and executing mechanisms for formally establishing a baseline. A baseline
can exist in one of two states: (1) a to-be-established (TBE) baseline, and (2) a sanctioned
baseline. A TBE baseline is brought to existence when one of the associated documents is
produced and entered into the SCM system. A sanctioned baseline is established when all
of the associated configuration items are produced and pass SQA inspection, review and/or
testing, and entered into the SCM system.
2. Configuration item verification. This ensures that what is intended for each configuration
item as specified in one baseline or update is achieved in a succeeding baseline
or update. For example, for each high-level use case allocated to an increment in the
requirements baseline, there must be an expanded use case in the design baseline that
specifies how the system and the actor would interact to carry out the front-end processing
of the use case.
3. Configuration item validation. This checks correctness to ensure that the configuration
item solves the right problem. Consider, for example, the Checkout Document
use case of a library information system (LIS). Verification ensures that there is an
expanded use case in the succeeding baseline. Validation ensures that the specification of
the expanded use case indeed matches the user's expectation.
4. Ensuring that changes specified in approved ECPs are implemented properly and in a timely manner.

22.4.4 Software Configuration Status Accounting


Software configuration status accounting tracks and reports information about the configuration
items. As depicted in Figure 22.2, it provides database support to the other three SCM functions.
As such, the SCM database grows in complexity and in the amount of data that must be
maintained as the project progresses. Data that need to be stored include descriptive information
about the SCIs and baselines, description of ECPs and their status, change status, deficiencies of a
TBE baseline as a result of configuration auditing, relationships between the configuration items,
and relationships between baselines and configuration items.
22.6 SOFTWARE CONFIGURATION MANAGEMENT TOOLS
SCM activities must maintain and process a lot of data and SCIs, which are related to each other
in a complex manner such as trees with branches denoting releases, versions, and revisions. In
addition, SCM tools need to notify relevant teams and team members of the state of the SCIs and
changes to the SCIs. Concurrent updates to the SCIs may be needed to improve efficiency. Such
updates require concurrency control to ensure consistency. Clearly, SCM tools are needed to
support these activities. This section presents the capabilities of such tools.
The capabilities provided by SCM tools vary significantly. Some SCM tools provide full
support to all SCM activities while others support only a subset of the SCM activities. Which SCM
tools to use depends on the project. In general, large, mission-critical systems or distributed
development require more SCM functions. Small, agile projects tend to use only version control
tools. A typical SCM tool provides the following capabilities:
• Version control. The objective of version control is to manage the releases, versions, and
revisions of a software system. It is used during the development process as well as the
maintenance phase. The need for such a function has been discussed in the "Why Software
Configuration Management" section.
• Workspace management. Software engineers work together to design and implement a
software system. To coordinate the work of the software engineers, a central repository of software
artifacts is needed. Workspace management provides local workspaces for the software engineers
and the central repository for the software engineers to share their work. It allows the software
engineers to check in local files to the repository, and check out repository files to their workspaces.
• Concurrency control. Software engineers may need to work on the same set of files
simultaneously, which may result in inconsistent updates. Concurrency control provides
mechanisms to enable or disable concurrent updates. If concurrent update is enabled, then the tool
provides mechanisms to merge the concurrent updates and facilitate resolution of conflicts.
• System build. The system build capability allows the team to specify the system
configuration, that is, which versions of which components should be included in a system. The
SCM tool will automatically compile and link the components to produce the executable system.
• Support to SCM process. This capability is aimed at automating the SCM procedures
described in previous sections.
Tools that provide version control, workspace management, and concurrency control
include Source Code Control System (SCCS), Revision Control System (RCS), Concurrent
Versions System (CVS), Subversion (SVN), Domain Software Engineering Environment (DSEE),
IBM ClearCase, and many others. SCCS is one of the earliest computer-aided tools for source
code revision control. RCS controls access to shared files through an access list of login names
and the ability to lock and unlock a revision. CVS is a substantial extension of RCS and is a
preinstalled plugin of NetBeans. Subversion was initially designed to replace CVS, and hence it
possesses all of the CVS capabilities. However, since its inception in 2000, Subversion has evolved
beyond a CVS replacement and introduced a comprehensive set of advanced features. DSEE is a
proprietary SCM software, which forms the basis for IBM ClearCase. Figure 22.5 is a comparative
summary of some of the features of RCS, CVS, Subversion, and ClearCase. Appendix C.7
describes in detail how to use CVS and Subversion in NetBeans.
Tools that support system build include make and ant. Make is a UNIX/Linux utility and
ant provides the functions of make to build systems using Java components. These tools let the
software engineer specify a script or a sequence of commands. System build is accomplished by
executing the script. Nowadays, system build is supported by almost all of the integrated
development environments (IDEs).
