MCS104C ASE Digital Notes
VTU, Belagavi
Advanced Software Engineering (MCS104C)
From M2 to M5
~By Swapnadeep Kapuri
2VX24SCS17
Module 2, Chapter 1
Requirements Elicitation
A requirement is a feature that the system must have or a constraint that it must satisfy to be
accepted by the client. Requirements engineering aims at defining the requirements of the system
under construction. Requirements engineering includes two main activities: requirements
elicitation, which results in the specification of the system that the client understands, and
analysis, which results in an analysis model that the developers can unambiguously interpret.
Requirements elicitation is the more challenging of the two because it requires the collaboration
of several groups of participants with different backgrounds. On the one hand, the client and the
users are experts in their domain and have a general idea of what the system should do, but they
often have little experience in software development. On the other hand, the developers have
experience in building systems, but often have little knowledge of the everyday environment of
the users.
Scenarios and use cases provide tools for bridging this gap. A scenario describes an
example of system use in terms of a series of interactions between the user and the system. A use
case is an abstraction that describes a class of scenarios. Both scenarios and use cases are written
in natural language, a form that is understandable to the user.
Developers elicit requirements by observing and interviewing users. Developers first
represent the user’s current work processes as as-is scenarios, then develop visionary scenarios
describing the functionality to be provided by the future system. The client and users validate the
system description by reviewing the scenarios and by testing small prototypes provided by the
developers. As the definition of the system matures and stabilizes, developers and the client agree
on a requirements specification in the form of functional requirements, nonfunctional
requirements, use cases, and scenarios.
4.1 Introduction: Usability Examples
Requirements elicitation is about communication among developers, clients, and users to define a
new system. Failure to communicate and understand each other's domains results in a system that
is difficult to use or that simply fails to support the user’s work. Errors introduced during
requirements elicitation are expensive to correct, as they are usually discovered late in the process,
often as late as delivery. Such errors include missing functionality that the system should have
supported, functionality that was incorrectly specified, user interfaces that are misleading or
unusable, and obsolete functionality. Requirements elicitation methods aim at improving
communication among developers, clients, and users. Developers construct a model of the
application domain by observing users in their environment. Developers select a representation
that is understandable by the clients and users (e.g., scenarios and use cases). Developers validate
the application domain model by constructing simple prototypes of the user interface and
collecting feedback from potential users. An example of a simple prototype is the layout of a user
interface with menu items and buttons. The potential user can manipulate the menu items and
buttons to get a feeling for the usage of the system, but there is no actual response after buttons are
clicked, because the required functionality is not implemented.
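For example, a throwaway mock-up of this kind can be assembled quickly with a GUI toolkit. The following sketch (assuming Java and Swing, which the text does not prescribe; the class and label names are illustrative) shows a window whose menu items and buttons are visible but wired to no functionality:

    import javax.swing.*;

    // User interface mock-up: menu items and buttons are shown, but no
    // listeners are registered, so clicking them produces no response.
    public class ReportEmergencyMockup {
        public static void main(String[] args) {
            JFrame frame = new JFrame("Report Emergency (mock-up)");

            JMenuBar menuBar = new JMenuBar();
            JMenu fileMenu = new JMenu("File");
            fileMenu.add(new JMenuItem("Report Emergency"));
            fileMenu.add(new JMenuItem("Exit"));
            menuBar.add(fileMenu);
            frame.setJMenuBar(menuBar);

            JPanel panel = new JPanel();
            panel.add(new JButton("Submit Report"));   // no ActionListener attached
            panel.add(new JButton("Cancel"));
            frame.add(panel);

            frame.setSize(400, 200);
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        }
    }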
4.2 An Overview of Requirements Elicitation
Requirements elicitation focuses on describing the purpose of the system. The client, the
developers, and the users identify a problem area and define a system that addresses the problem.
Such a definition is called a requirements specification and serves as a contract between the client
and the developers. The requirements specification is structured and formalized during analysis
(Module 2, Chapter 2, Analysis) to produce an analysis model (see Figure 4-1). Both requirements
specification and analysis model represent the same information. They differ only in the language
and notation they use; the requirements specification is written in natural language, whereas the
analysis model is usually expressed in a formal or semiformal notation. The requirements
specification supports the communication with the client and users. The analysis model supports
the communication among developers. They are both models of the system in the sense that they
attempt to represent accurately the external aspects of the system. Given that both models represent
the same aspects of the system, requirements elicitation and analysis occur concurrently and
iteratively.
Requirements elicitation and analysis focus only on the user’s view of the system. For
example, the system functionality, the interaction between the user and the system, the errors that
the system can detect and handle, and the environmental conditions in which the system functions
are part of the requirements. The system structure, the implementation technology selected to build
the system, the system design, the development methodology, and other aspects not directly visible
to the user are not part of the requirements.
The above functional requirements focus only on the possible interactions between
SatWatch and its external world (i.e., the watch owner, GPS, and WebifyWatch). The above
description does not focus on any of the implementation details (e.g., processor, language, display
technology).
• Reliability is the ability of a system or component to perform its required functions under stated
conditions for a specified period of time. Reliability requirements include, for example, an
acceptable mean time to failure and the ability to detect specified faults or to withstand specified
security attacks. More recently, this category is often replaced by dependability, which is the
property of a computer system such that reliance can justifiably be placed on the service it delivers.
Dependability includes reliability, robustness (the degree to which a system or component can
function correctly in the presence of invalid inputs or stressful environment conditions), and safety
(a measure of the absence of catastrophic consequences to the environment).
• Performance requirements are concerned with quantifiable attributes of the system, such as
response time (how quickly the system reacts to a user input), throughput (how much work the
system can accomplish within a specified amount of time), availability (the degree to which a
system or component is operational and accessible when required for use), and accuracy.
• Supportability requirements are concerned with the ease of changes to the system after
deployment, including for example, adaptability (the ability to change the system to deal with
additional application domain concepts), maintainability (the ability to change the system to deal
with new technology or to fix defects), and internationalization (the ability to change the system
to deal with additional international conventions, such as languages, units, and number formats).
The ISO 9126 standard on software quality [ISO Std. 9126], similar to the FURPS+ model,
replaces this category with two categories: maintainability and portability (the ease with which
a system or component can be transferred from one hardware or software environment to another).
(Note: FURPS+ is an acronym using the first letter of the requirements categories: Functionality,
Usability, Reliability, Performance, and Supportability. The + indicates the additional
subcategories. The FURPS model was originally proposed by [Grady, 1992]. The definitions in
this section are quoted from [IEEE Std. 610.12-1990].)
The FURPS+ model provides additional categories of requirements typically also included under
the general label of nonfunctional requirements:
• Implementation requirements are constraints on the implementation of the system, including
the use of specific tools, programming languages, or hardware platforms.
• Interface requirements are constraints imposed by external systems, including legacy systems
and interchange formats.
• Operations requirements are constraints on the administration and management of the system
in the operational setting.
• Packaging requirements are constraints on the actual delivery of the system (e.g., constraints
on the installation media for setting up the software).
• Legal requirements are concerned with licensing, regulation, and certification issues. An
example of a legal requirement is that software developed for the U.S. federal government must
comply with Section 508 of the Rehabilitation Act of 1973, requiring that government information
systems must be accessible to people with disabilities.
Nonfunctional requirements that fall into the URPS categories are called quality
requirements of the system. Nonfunctional requirements that fall into the implementation,
interface, operations, packaging, and legal categories are called constraints or pseudo
requirements. Budget and schedule requirements are usually not treated as nonfunctional
requirements, as they constrain attributes of the project.
Each requirement should be traceable to the system functions and work products that realize it,
including system components, classes, methods, and object attributes. Traceability is critical for
developing tests and for evaluating changes. When developing tests, traceability enables a tester
to assess the coverage of a test case, that is, to identify which requirements are tested and which
are not. When evaluating changes, traceability enables the analyst and the developers to identify
all components and system functions that the change would impact.
The methods described in this section are adapted from OOSE [Jacobson et al., 1992], the Unified
Software Development Process [Jacobson et al., 1999], and responsibility-driven design [Wirfs-
Brock et al., 1990].
Actors are role abstractions and do not necessarily directly map to persons. The same
person can fill the role of WatchOwner and WebifyWatch. However, the functionality they access
is substantially different. For that reason, these two roles are modeled as two different actors.
The first step of requirements elicitation is the identification of actors. This serves both to
define the boundaries of the system and to find all the perspectives from which the developers
need to consider the system. When the system is deployed into an existing organization (such as a
company), most actors usually exist before the system is developed: they correspond to roles in
the organization.
During the initial stages of actor identification, it is hard to distinguish actors from objects.
For example, a database subsystem can at times be an actor, while in other cases it can be part of
the system. Note that once the system boundary is defined, there is no trouble distinguishing
between actors and such system components as objects or subsystems. Actors are outside of the
system boundary; they are external. Subsystems and objects are inside the system boundary; they
are internal. Thus, any external software system using the system to be developed is an actor.
Once the actors are identified, the next step in the requirements elicitation activity is to
determine the functionality that will be accessible to each actor. This information can be extracted
using scenarios and formalized using use cases.
Generalizing scenarios and identifying the high-level use cases that the system must
support enables developers to define the scope of the system. Initially, developers name use cases,
attach them to the initiating actors, and provide a high-level description of the use case as in Figure
4-7. The name of a use case should be a verb phrase denoting what the actor is trying to accomplish.
The verb phrase “Report Emergency” indicates that an actor is attempting to report an emergency
to the system (and hence, to the Dispatcher actor). This use case is not called “Record Emergency”
because the name should reflect the perspective of the actor, not the system. It is also not called
“Attempt to Report an Emergency” because the name should reflect the goal of the use case, not
the actual activity.
Attaching use cases to initiating actors enables developers to clarify the roles of the
different users. Often, by focusing on who initiates each use case, developers identify new actors
that have been previously overlooked.
Describing a use case entails specifying four fields (a brief example follows this list):
• Describing the entry and exit conditions of a use case enables developers to understand the
conditions under which a use case is invoked and the impact of the use case on the state of the
environment and of the system. By examining the entry and exit conditions of use cases,
developers can determine if there may be missing use cases. For example, if a use case requires
that the emergency operations plan dealing with earthquakes should be activated, the
requirements specification should also provide a use case for activating this plan.
• Describing the flow of events of a use case enables developers and clients to discuss the
interaction between actors and system. This results in many decisions about the boundary of
the system, that is, about deciding which actions are accomplished by the actor and which
actions are accomplished by the system.
• Finally, describing the quality requirements associated with a use case enables developers to
elicit nonfunctional requirements in the context of a specific functionality. In this book, we
focus on these four fields to describe use cases as they describe the most essential aspects of a
use case. In practice, many additional fields can be added to describe an exceptional flow of
events, rules, and invariants that the use case must respect during the flow of events.
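For instance, a brief description of the ReportEmergency use case using these fields might read as follows (the details are illustrative, not quoted from the FRIEND specification):

Use case name: ReportEmergency
Participating actors: Initiated by FieldOfficer; communicates with Dispatcher
Entry condition: The FieldOfficer is logged into a FRIEND terminal.
Flow of events: 1. The FieldOfficer activates the "Report Emergency" function of her terminal. 2. FRIEND presents a form, which the FieldOfficer fills out with the emergency type, location, and a brief description before submitting it. 3. FRIEND notifies the Dispatcher, who reviews the submitted information and creates an Incident.
Exit condition: The FieldOfficer has received an acknowledgment and the reported emergency has been recorded as an Incident.
Quality requirements: The FieldOfficer's report is acknowledged within a short, bounded time (e.g., 30 seconds).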
Writing use cases is a craft. An analyst learns to write better use cases with experience.
Consequently, different analysts tend to develop different styles, which can make it difficult to
produce a consistent requirements specification. To address the issue of learning how to write use
cases and how to ensure consistency among the use cases of a requirements specification, analysts
adopt a use case writing guide. Figure 4-8 is a simple writing guide adapted from [Cockburn, 2001]
that can be used by novice use case writers.
The use of scenarios and use cases to define the functionality of the system aims at creating
requirements that are validated by the user early in the development. As the design and
implementation of the system starts, the cost of changing the requirements specification and adding
new unforeseen functionality increases. Although requirements change until late in the
development, developers and users should strive to address most requirements issues early. This
entails many changes and much validation during requirements elicitation. Note that many use
cases are rewritten several times, others substantially refined, and yet others completely dropped.
To save time, much of the exploration work can be done using scenarios and user interface mock-
ups.
The following heuristics can be used for writing scenarios and use cases:
By specifying which actors can initiate a
specific use case, we also implicitly specify which actors cannot invoke the use case. Similarly, by
specifying which actors communicate with a specific use case, we specify which actors can access
specific information and which cannot. Thus, by documenting initiation and communication
relationships among actors and use cases, we specify access control for the system at a coarse
level.
The relationships between actors and use cases are identified when use cases are identified.
Figure 4-11 depicts an example of communication relationships in the case of the FRIEND system.
The «initiate» stereotype denotes the initiation of the use case by an actor, and the «participate»
stereotype denotes that an actor (who did not initiate the use case) communicates with the use case.
In all cases, the purpose of adding include and extend relationships is to reduce or remove
redundancies from the use case model, thus eliminating potential inconsistencies.
The participating objects identified from the use cases are collected, together with their definitions, in a
glossary, which is also called a data dictionary. Building this glossary constitutes the first step
toward analysis.
The glossary is included in the requirements specification and, later, in the user manuals.
Developers keep the glossary up to date as the requirements specification evolves. The benefits of
the glossary are manifold: new developers are exposed to a consistent set of definitions, a single
term is used for each concept (instead of a developer term and a user term), and each term has a
precise and clear official meaning. The identification of participating objects results in the initial
analysis object model.
The identification of participating objects during requirements elicitation only constitutes
a first step toward the complete analysis object model. The complete analysis model is usually not
used as a means of communication between users and developers, as users are often unfamiliar
with object-oriented concepts. However, the description of the objects (i.e., the definitions of the
terms in the glossary) and their attributes are visible to the users and reviewed.
Many heuristics have been proposed in the literature for identifying objects. Here are a
selected few:
During requirements elicitation, participating objects are generated for each use case. If
two use cases refer to the same concept, the corresponding object should be the same. If two objects
share the same name and do not correspond to the same concept, one or both concepts are renamed
to acknowledge and emphasize their difference. This consolidation eliminates any ambiguity in
the terminology used.
Once participating objects are identified and consolidated, the developers can use them as
a checklist for ensuring that the set of identified use cases is complete.
Once the client and the developers identify a set of nonfunctional requirements, they can
organize them into refinement and dependency graphs to identify further nonfunctional
requirements and identify conflicts. For more material on this topic, the reader is referred to the
specialized literature (e.g., [Chung et al., 1999]).
4. Session. During this activity, the JAD facilitator guides the team in creating the requirements
specification. A JAD session lasts for 3 to 5 days. The team defines and agrees on the scenarios,
use cases, and user interface mock-ups. All decisions are documented by a scribe.
5. Final document. The JAD facilitator prepares the Final Document, revising the working
document to include all decisions made during the session. The Final Document represents a
complete specification of the system agreed on during the session. The Final Document is
distributed to the session participants for review. The participants then attend a 1- to 2-hour
meeting to discuss the reviews and finalize the document.
JAD has been used by IBM and other companies. JAD leverages group dynamics to
improve communication among participants and to accelerate consensus. At the end of a JAD
session, developers are more knowledgeable of user needs, and users are more knowledgeable of
development trade-offs. Additional gains result from a reduction of redesign activities
downstream. Because of its reliance on social dynamics, the success of a JAD session often
depends on the qualifications of the JAD facilitator as a meeting facilitator. For a detailed overview
of JAD, the reader is referred to [Wood & Silver, 1989].
The first section, Introduction, describes the purpose of the system and situates it in its
development context (e.g., reference to the problem statement written by the client, references to
existing systems, feasibility studies). The introduction also includes the objectives and success
criteria of the project.
The second section, Current system, describes the current state of affairs. If the new
system will replace an existing system, this section describes the functionality and the problems
of the current system. Otherwise, this section describes how the tasks supported by the new system
are accomplished now. For example, in the case of SatWatch, the user currently resets her watch
whenever she travels across a time zone. Because of the manual nature of this operation, the user
occasionally sets the wrong time and occasionally neglects to reset it. In contrast, the SatWatch will
continually ensure accurate time within its lifetime. In the case of FRIEND, the current system is
paper based: dispatchers keep track of resource assignments by filling out forms. Communication
between dispatchers and field officers is by radio. The current system requires a high
documentation and management cost that FRIEND aims to reduce.
The third section, Proposed system, documents the requirements elicitation and the
analysis model of the new system. It is divided into four subsections:
• Overview presents a functional overview of the system.
• Functional requirements describe the high-level functionality of the system.
• Nonfunctional requirements describe user-level requirements that are not directly related to
functionality. This includes usability, reliability, performance, supportability, implementation,
interface, operational, packaging, and legal requirements.
• System models describe the scenarios, use cases, object model, and dynamic models for the
system. This section contains the complete functional specification, including mock-ups
illustrating the user interface of the system and navigational paths representing the sequence of
screens. The subsections Object model and Dynamic model are written during the Analysis
activity, described in the next chapter.
The RAD should be written after the use case model is stable, that is, when the number of
modifications to the requirements is minimal. The requirements, however, are updated throughout
the development process when specification problems are discovered or when the scope of the
system is changed. The RAD, once published, is baselined and put under configuration
management. The revision history section of the RAD provides a history of changes, including
the author responsible for each change, the date of the change, and a brief description of the change.
Module 2, Chapter 2
Analysis
Analysis results in a model of the system that aims to be correct, complete, consistent, and
unambiguous. Developers formalize the requirements specification produced during requirements
elicitation and examine in more detail boundary conditions and exceptional cases. Developers
validate, correct and clarify the requirements specification if any errors or ambiguities are found.
The client and the user are usually involved in this activity when the requirements specification
must be changed and when additional information must be gathered.
In object-oriented analysis, developers build a model describing the application domain.
For example, the analysis model of a watch describes how the watch represents time: Does the
watch know about leap years? Does it know about the day of the week? Does it know about the
phases of the moon? The analysis model is then extended to describe how the actors and the system
interact to manipulate the application domain model: How does the watch owner reset the time?
How does the watch owner reset the day of the week? Developers use the analysis model, together
with nonfunctional requirements, to prepare for the architecture of the system developed during
high-level design (Module 3, Chapter 1, System Design: Decomposing the System).
We focus on the identification of objects, their behavior, their relationships, their
classification, and their organization. We describe management issues related to analysis in the
context of a multi-team development project.
There is a natural tendency for users and developers to postpone difficult decisions until
later in the project. A decision may be difficult because of lack of domain knowledge, lack of
technological knowledge, or simply because of disagreements among users and developers.
Postponing decisions enables the project to move on smoothly and avoids confrontation with
reality or peers. Unfortunately, difficult decisions eventually must be made, often at higher cost
when intrinsic problems are discovered during testing, or worse, during user evaluation.
Translating a requirements specification into a formal or semiformal model forces developers to
identify and resolve difficult issues early in the development.
The analysis model is composed of three individual models: the functional model,
represented by use cases and scenarios, the analysis object model, represented by class and object
diagrams, and the dynamic model, represented by state machine and sequence diagrams (Figure
5-3). In the previous chapter, we described how to elicit requirements from the users and describe
them as use cases and scenarios. In this chapter, we describe how to refine the functional model
and derive the object and the dynamic model. This leads to a more precise and complete
specification as details are added to the analysis model. We conclude the chapter by describing
management activities related to analysis.
Note that most classes in the analysis object model will correspond to one or more software classes
in the source code. However, the software classes will include many more attributes and
associations than their analysis counterparts. Consequently, analysis classes should be viewed as
high-level abstractions that will be realized in much more detail later. Figure 5-4 depicts good and
bad examples of analysis objects for the SatWatch example.
A naming convention makes it possible to distinguish the three different types of objects on a syntactical basis: control objects may have the suffix Control
appended to their name; boundary objects may be named to clearly denote an interface feature
(e.g., by including the suffix Form, Button, Display, or Boundary); entity objects usually do not
have any suffix appended to their name. Another benefit of this naming convention is that the type
of the class is represented even when the UML stereotype is not available, for example, when
examining only the source code.
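For example, applying this convention to the ReportEmergency use case could yield class names such as the following (the class bodies are placeholders; only the naming is of interest here):

    // Entity object: no suffix, named after an application domain concept.
    class Incident { /* ... */ }

    // Boundary object: the suffix denotes an interface feature visible to the actor.
    class ReportEmergencyForm { /* ... */ }

    // Control object: the suffix Control denotes the coordinator of the use case.
    class ReportEmergencyControl { /* ... */ }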
We use the term "inheritance" to denote the relationship and the terms "generalization" and
"specialization" to denote the activities that find inheritance relationships.
In natural language text, some nouns correspond to attributes rather than classes, and others are
synonyms for other nouns. Sorting through all the nouns for a large requirements specification is
a time-consuming activity. In general, Abbott’s heuristics work well for generating a list of initial
candidate objects from short descriptions, such as the flow of events of a scenario or a use case.
Developers name and briefly describe the objects, their attributes, and their responsibilities
as they are identified. Uniquely naming objects promotes a standard terminology. For entity objects
we recommend always starting with the names used by end users and application domain
specialists. Describing objects, even briefly, allows developers to clarify the concepts they use and
avoid misunderstandings (e.g., using one object for two different but related concepts). Developers
need not, however, spend a lot of time detailing objects or attributes given that the analysis model
is still in flux. Developers should document attributes and responsibilities if they are not obvious;
a tentative name and a brief description for each object is sufficient otherwise. There will be plenty
of iterations during which objects can be revised. However, once the analysis model is stable, the
description of each object should be as detailed as necessary (see Section 5.4.11).
Boundary objects model the user interface at a coarse level. They do not describe in detail the
visual aspects of the user interface. For example, boundary objects such as “menu item” or “scroll
bar” are too detailed. First, developers can discuss user interface details more easily with sketches
and mock-ups. Second, the design of the user interface continues to evolve as a consequence of
usability tests, even after the functional specification of the system becomes stable. Updating the
analysis model for every user interface change is time consuming and does not yield any
substantial benefit.
We have made progress toward describing the system. We now have included the interface
between the actor and the system. We are, however, still missing some significant pieces of the
description, such as the order in which the interactions between the actors and the system occur.
In the next section, we describe the identification of control objects.
In the next section, we construct a sequence diagram using the ReportEmergency use case and the
objects we discovered to ensure the completeness of our model.
Sharing operations across use cases allows developers to remove redundancies in the
requirements specification and to improve its consistency. Note that clarity should always be given
precedence over eliminating redundancy. Fragmenting behavior across many operations
unnecessarily complicates the requirements specification.
In analysis, sequence diagrams are used to help identify new participating objects and
missing behavior. Because sequence diagrams focus on high-level behavior, implementation issues
such as performance should not be addressed at this point. Given that building interaction diagrams
can be time consuming, developers should focus on problematic or underspecified functionality
first. Drawing interaction diagrams for parts of the system that are simple or well defined might
not look like a good investment of analysis resources, but it should also be done to avoid
overlooking some key decisions.
Each class is represented by an index card (a CRC card): the name of the class is depicted at the top, its
responsibilities in the left column, and the names of the classes it needs to accomplish its responsibilities
(its collaborators) in the right column.
Figure 5-12 depicts two cards for the ReportEmergencyControl and the Incident classes.
CRC cards can be used during modeling sessions with teams. Participants, typically a mix
of developers and application domain experts, go through a scenario and identify the classes that
are involved in realizing the scenario. One card per instance is put on the table. Responsibilities
are then assigned to each class as the scenario unfolds and participants negotiate the
responsibilities of each object. The collaborators column is filled as the dependencies with other
cards are identified. Cards are modified or pushed to the side as new alternatives are explored.
Cards are never thrown away, because building blocks for past alternatives can be reused when
new ideas are put on the table.
CRC cards and sequence diagrams are two different representations for supporting the
same type of activity. Sequence diagrams are a better tool for a single modeler or for documenting
a sequence of interactions, because they are more precise and compact. CRC cards are a better tool
for a group of developers refining and iterating over an object structure during a brainstorming
session, because they are easier to create and to modify.
• A role at each end, identifying the function of each class with respect to the association (e.g.,
author is the role played by FieldOfficer in the Writes association).
• A multiplicity at each end, identifying the possible number of instances (e.g., * indicates a
FieldOfficer may write zero or more EmergencyReports, whereas 1 indicates that each
EmergencyReport has exactly one FieldOfficer as author).
Initially, the associations between entity objects are the most important, as they reveal more
information about the application domain. According to Abbott’s heuristics (see Table 5-1),
associations can be identified by examining verbs and verb phrases denoting a state (e.g., has, is
part of, manages, reports to, is triggered by, is contained in, talks to, includes). Every association
should be named, and roles should be assigned to each end.
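As a rough illustration (a Java realization is only one possibility and is not prescribed by the analysis model), the Writes association between FieldOfficer and EmergencyReport, together with its author role and its multiplicities, could later map onto code such as:

    import java.util.ArrayList;
    import java.util.List;

    // Each class would normally live in its own file; they are shown together here.
    class FieldOfficer {
        // end of the Writes association: a FieldOfficer may write
        // zero or more EmergencyReports (multiplicity *)
        private final List<EmergencyReport> writtenReports = new ArrayList<>();

        void write(EmergencyReport report) {
            writtenReports.add(report);
            report.setAuthor(this);   // keep both ends of the association consistent
        }
    }

    class EmergencyReport {
        // role "author", multiplicity 1: exactly one FieldOfficer authors each report
        private FieldOfficer author;

        void setAuthor(FieldOfficer author) { this.author = author; }
        FieldOfficer getAuthor() { return author; }
    }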
The object model will initially include too many associations if developers include all
associations identified after examining verb phrases. In Figure 5-14, for example, we identify two
relationships: the first between an Incident and the EmergencyReport that triggered its creation;
the second between the Incident and the reporting FieldOfficer. Given that the EmergencyReport
and FieldOfficer already have an association modeling authorship, the association between
Incident and FieldOfficer is not necessary. Adding unnecessary associations complicates the
model, leading to incomprehensible models and redundant information.
Most entity objects have an identifying characteristic used by the actors to access them.
FieldOfficers and Dispatchers have a badge number. Incidents and Reports are assigned numbers
and are archived by date. Once the analysis model includes most classes and associations, the
developers should go through each class and check how it is identified by the actors and in which
context. For example, are FieldOfficer badge numbers unique across the universe? Across a city?
A Police station? If they are unique across cities, can the FRIEND system know about
FieldOfficers from more than one city? This approach can be formalized by examining each
individual class and identifying the sequence of associations that need to be traversed to access a
specific instance of that class.
Aggregation associations are used in the analysis model to denote whole–part concepts.
Aggregation associations add information to the analysis model about how containment concepts
in the application domain can be organized in a hierarchy or in a directed graph. Aggregations are
often used in the user interface to help the user browse through many instances. For example, in
Figure 5-15, FRIEND could offer a tree representation for Dispatchers to find Counties within a
State or Townships within a specific County. However, as with many modeling concepts, it is easy
to over-structure the model. If you are not sure that the association you are describing is a whole–
part concept, it is better to model it as a one-to-many association, and revisit it later when you have
a better understanding of the application domain.
Properties that are represented by objects are not attributes. For example, every
EmergencyReport has an author that is represented by an association to the FieldOfficer class.
Developers should identify as many associations as possible before identifying attributes to avoid
confusing attributes and objects. Attributes have:
• A name identifying them within an object. For example, an EmergencyReport may have a
reportType attribute and an emergencyType attribute. The reportType describes the kind of report
being filed (e.g., initial report, request for resource, final report). The emergencyType describes
the type of emergency (e.g., fire, traffic, other). To avoid confusion, these attributes should not
both be called type.
• A brief description.
• A type describing the legal values it can take. For example, the description attribute of an
EmergencyReport is a string. The emergencyType attribute is an enumeration that can take one of
three values: fire, traffic, other. Attribute types are based on predefined basic types in UML.
Attributes can be identified using Abbott’s heuristics (see Table 5-1). In particular, a noun
phrase followed by a possessive phrase (e.g., the description of an emergency) or an adjective
phrase (e.g., the emergency description) should be examined. In the case of entity objects, any
property that must be stored by the system is a candidate attribute.
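Extending the earlier sketch of EmergencyReport, the attributes named above might eventually be declared as follows (again purely illustrative; analysis fixes the attribute names and types, not their realization):

    // Enumerations capture the legal values of the two type attributes.
    enum EmergencyType { FIRE, TRAFFIC, OTHER }
    enum ReportType { INITIAL_REPORT, RESOURCE_REQUEST, FINAL_REPORT }

    class EmergencyReport {
        private ReportType reportType;        // kind of report being filed
        private EmergencyType emergencyType;  // type of emergency: fire, traffic, other
        private String description;           // free-form description of the emergency
    }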
Note that attributes represent the least stable part of the object model. Often, attributes are
discovered or added late in the development when the system is evaluated by the users. Unless the
added attributes are associated with additional functionality, the added attributes do not entail
major changes in the object (and system) structure. For these reasons, the developers need not
spend excessive resources in identifying and detailing attributes that represent less important
aspects of the system. These attributes can be added later when the analysis model or the user
interface sketches are validated.
The RAD sections up to and including Section 3.5.2 have already been written during requirements elicitation. During analysis, we revise
these sections as ambiguities and new functionality are discovered. The main effort, however,
focuses on writing the sections documenting the analysis object model (RAD Sections 3.5.3 and
3.5.4).
RAD Section 3.5.3, Object models, documents in detail all the objects we identified, their
attributes, and, when we used sequence diagrams, operations. As each object is described with
textual definitions, relationships among objects are illustrated with class diagrams.
RAD Section 3.5.4, Dynamic models, documents the behavior of the object model in terms
of state machine diagrams and sequence diagrams. Although this information is redundant with
the use case model, dynamic models enable us to represent more precisely complex behaviors,
including use cases involving many actors.
The RAD, once completed and published, will be baselined and put under configuration
management. The revision history section of the RAD provides a history of changes, including
the author responsible for each change, the date of the change, and a brief description of the change.
• The end user is the application domain expert who generates information about the current
system, the environment of the future system, and the tasks it should support. Each user
corresponds to one or more actors and helps identify their associated use cases.
• The client, an integration role, defines the scope of the system based on user requirements.
Different users may have different views of the system, either because they will benefit from
different parts of the system (e.g., a dispatcher vs. a field officer) or because the users have different
opinions or expectations about the future system. The client serves as an integrator of application
domain information and resolves inconsistencies in user expectations.
• The analyst is the application domain expert who models the current system and generates
information about the future system. Each analyst is initially responsible for detailing one or more
use cases. For a set of use cases, the analyst will identify a number of objects, their associations,
and their attributes using the techniques outlined in Section 5.4. The analyst is typically a developer
with broad application domain knowledge.
• The architect, an integration role, unifies the use case and object models from a system point of
view. Different analysts may have different styles of modeling and different views of the parts of
the system for which they are not responsible. Although analysts work together and will most
likely resolve differences as they progress through analysis, the role of the architect is necessary
to provide a system philosophy and to identify omissions in the requirements.
• The document editor is responsible for the low-level integration of the document and for the
overall format of the document and its index.
• The configuration manager is responsible for maintaining a revision history of the document
as well as traceability information relating the RAD with other documents (such as the System
Design Document; see Chapter 6, System Design: Decomposing the System).
• The reviewer validates the RAD for correctness, completeness, consistency, and clarity. Users,
clients, developers, or other individuals may become reviewers during requirements validation.
Individuals who have not yet been involved in the development make excellent reviewers,
because they are more able to identify ambiguities and areas that need clarification.
The size of the system determines the number of different users and analysts that are needed
to elicit and model the requirements. In all cases, there should be one integrating role on the client
side and one on the development side. In the end, the requirements, however large the system,
should be understandable by a single individual knowledgeable in the application domain.
5.5.3 Communicating about Analysis
The task of communicating information is most challenging during requirements elicitation and
analysis. Contributing factors include
• Different backgrounds of participants. Users, clients, and developers have different domains
of expertise and use different vocabularies to describe the same concepts.
• Different expectations of stakeholders. Users, clients, and management have different
objectives when defining the system. Users want a system that supports their current work
processes, with no interference or threat to their current position (e.g., an improved system often
translates into the elimination of current positions). The client wants to maximize return on
investment. Management wants to deliver the system on time. Different expectations and different
stakes in the project can lead to a reluctance to share information and to report problems in a timely
manner.
• New teams. Requirements elicitation and analysis often marks the beginning of a new project.
This translates into new participants and new team assignments, and, thus, into a ramp-up period
during which team members must learn to work together.
• Evolving system. When a new system is developed from scratch, terms and concepts related to
the new system are in flux during most of the analysis and the system design. A term may have a
different meaning tomorrow.
No requirements method or communication mechanism can address problems related to
internal politics and information hiding. Conflicting objectives and competition will always be part
of large development projects. A few simple guidelines, however, can help in managing the
complexity of conflicting views of the system:
• Define clear territories. Defining roles as described in Section 5.5.2 is part of this activity. This
also includes the definition of private and public discussion forums. For example, each team may
have a discussion database as described in Chapter 3, Project Organization and
Communication, and discussion with the client is done on a separate client database. The client
should not have access to the internal database. Similarly, developers should not interfere with
client/user internal politics.
• Define clear objectives and success criteria. The co-definition of clear, measurable, and
verifiable objectives and success criteria by both the client and the developers facilitates the
resolution of conflicts. Note that defining a clear and verifiable objective is a nontrivial task, given
that it is easier to leave objectives open-ended. The objectives and the success criteria of the project
should be documented in Section 1.3 of the RAD.
• Brainstorm. Putting all the stakeholders in the same room to quickly generate solutions and
definitions can remove many barriers in communication. Conducting reviews as a reciprocal
activity (i.e., reviewing deliverables from both the client and the developers during the same
session) has a similar effect.
Brainstorming, and more generally the cooperative development of requirements, can lead
to the definition of shared, ad hoc notations for supporting the communication. Storyboards, user
interface sketches, and high-level dataflow diagrams often appear spontaneously. As the
information about the application domain and the new system accrues, it is critical that a precise
and structured notation be used. In UML, developers employ use cases and scenarios for
communicating with the client and the users, and use object diagrams, sequence diagrams, and
state machines to communicate with other developers (see Sections 4.4 and 5.4). Moreover, the
latest release of the requirements should be available to all participants. Maintaining a live online
version of the requirements analysis document with an up-to-date change history facilitates the
timely propagation of changes across the project.
At sign-off, the client and the developers converge on a single
idea and agree about the functions and features that the system will have. In addition, they agree
on:
• a list of priorities
• a revision process
• a list of criteria that will be used to accept or reject the system
• a schedule and a budget.
Prioritizing system functions allows the developers to understand better the client’s
expectations. In its simplest form, it allows developers to separate bells and whistles from essential
features. It also allows developers to deliver the system in incremental chunks: essential functions
are delivered first; additional chunks are delivered depending on the evaluation of the previous
chunk. Even if the system is to be delivered as a single, complete package, prioritizing functions
enables the client to communicate clearly what is important to her and where the emphasis of the
development should be. Figure 5-21 provides an example of a priority scheme.
After the client signs off, the requirements are baselined and are used for refining the cost
estimate of the project. Requirements continue to change after the sign-off, but these changes are
subject to a more formal revision process. The requirements change, whether because of errors,
omissions, changes in the operating environment, changes in the application domain, or changes
in technology. Defining a revision process up front encourages changes to be communicated across
the project and reduces the number of surprises in the later stages. Note that a change process need
not be bureaucratic or require excessive overhead. It can be as simple as naming a person
responsible for receiving change requests, approving changes, and tracking their implementation.
Figure 5-22 depicts a more complex example in which changes are designed and reviewed
by the client before they are implemented in the system. In all cases, acknowledging that
requirements cannot be frozen (but only baselined) will benefit the project.
The list of acceptance criteria is revised prior to sign-off. The requirements elicitation and
analysis activity clarify many aspects of the system, including the nonfunctional requirements with
which the system should comply and the relative importance of each function. By restating the
acceptance criteria at sign-off, the client ensures that the developers are updated about any changes
in client expectations.
The budget and schedule are revisited after the analysis model becomes stable.
Whether the client sign-off is a contractual agreement or whether the project is already
governed by a prior contract, it is an important milestone in the project. It represents the
convergence of client and developer on a single set of functional definitions of the system and a
single set of expectations. The acceptance of the requirements analysis document is more critical
than any other document, given that many activities depend on the analysis model.
Module 3, Chapter 1
System Design: Decomposing the System
System design is the transformation of an analysis model into a system design model. During
system design, developers define the design goals of the project and decompose the system into
smaller subsystems that can be realized by individual teams. Developers also select strategies for
building the system, such as the hardware/software strategy, the persistent data management
strategy, the global control flow, the access control policy, and the handling of boundary
conditions. The result of system design is a model that includes a subsystem decomposition and a
clear description of each of these strategies.
System design is not algorithmic. Developers have to make trade-offs among many design
goals that often conflict with each other. They also cannot anticipate all design issues that they will
face because they do not yet have a clear picture of the solution domain. System design is
decomposed into several activities, each addressing part of the overall problem of decomposing
the system:
• Identify design goals. Developers identify and prioritize the qualities of the system that
they should optimize.
• Design the initial subsystem decomposition. Developers decompose the system into
smaller parts based on the use case and analysis models. Developers use standard architectural
styles as a starting point during this activity.
• Refine the subsystem decomposition to address the design goals. The initial
decomposition usually does not satisfy all design goals. Developers refine it until all goals are
satisfied.
6.2 An Overview of System Design
Analysis results in the requirements model described by the following products:
• a set of nonfunctional requirements and constraints, such as maximum response time,
minimum throughput, reliability, operating system platform, and so on
• a use case model, describing the system functionality from the actors’ point of view
• an object model, describing the entities manipulated by the system
• a sequence diagram for each use case, showing the sequence of interactions among objects
participating in the use case.
The analysis model describes the system completely from the actors’ point of view and
serves as the basis of communication between the client and the developers. The analysis model,
however, does not contain information about the internal structure of the system, its hardware
configuration, or more generally, how the system should be realized. System design is the first step
in this direction. System design results in the following products:
• design goals, describing the qualities of the system that developers should optimize
• software architecture, describing the subsystem decomposition in terms of subsystem
responsibilities, dependencies among subsystems, subsystem mapping to hardware, and major
policy decisions such as control flow, access control, and data storage
• boundary use cases, describing the system configuration, startup, shutdown, and
exception handling issues.
The design goals are derived from the nonfunctional requirements. Design goals guide the
decisions to be made by the developers when trade-offs are needed. The subsystem decomposition
constitutes the bulk of system design. Developers divide the system into manageable pieces to deal
with complexity: each subsystem is assigned to a team and realized independently. For this to be
possible, developers need to address system-wide issues when decomposing the system. In this
chapter, we describe the concept of subsystem decomposition and discuss examples of generic
system decompositions called “architectural styles.” In the next chapter, we describe how the
system decomposition is refined to meet specific design goals. Figure 6-2 depicts the relationship
of system design with other software engineering activities.
In this section, we look at layering and partitioning, two techniques for relating subsystems to each other (Section 6.3.4).
Layering allows a system to be organized as a hierarchy of subsystems, each providing higher-
level services to the subsystem above it by using lower-level services from the subsystems below
it. Partitioning organizes subsystems as peers that mutually provide different services to each
other. In Section 6.3.5, we describe a number of typical software architectures that are found in
practice.
6.3.1 Subsystems and Classes
In Chapter 2, Modeling with UML, we introduced the distinction between application domain and
solution domain. In order to reduce the complexity of the application domain, we identified smaller
parts called “classes” and organized them into packages. Similarly, to reduce the complexity of the
solution domain, we decompose a system into simpler parts, called “subsystems,” which are made
of a number of solution domain classes. A subsystem is a replaceable part of the system with well-
defined interfaces that encapsulates the state and behavior of its contained classes. A subsystem
typically corresponds to the amount of work that a single developer or a single development team
can tackle. By decomposing the system into relatively independent subsystems, concurrent teams
can work on individual subsystems with minimal communication overhead. In the case of complex
subsystems, we recursively apply this principle and decompose a subsystem into simpler
subsystems (see Figure 6-3).
Several programming languages (e.g., Java and Modula-2) provide constructs for modeling
subsystems (packages in Java, modules in Modula-2). In other languages, such as C or C++,
subsystems are not explicitly modeled, so developers use conventions for grouping classes (e.g., a
subsystem can be represented as a directory containing all the files that implement the subsystem).
Whether or not subsystems are explicitly represented in the programming language, developers
need to document carefully the subsystem decomposition as subsystems are usually realized by
different teams.
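For instance, in Java a subsystem boundary can be made explicit simply by placing its classes in a common package (the package and class names below are hypothetical):

    // File: friend/notification/MessageBus.java
    package friend.notification;   // every class of the Notification subsystem lives here

    public class MessageBus { /* ... */ }

In C or C++, the same grouping would typically be expressed as a directory (e.g., a notification/ directory containing all files of the subsystem).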
6.3.2 Services and Subsystem Interfaces
A subsystem is characterized by the services it provides to other subsystems. A service is a set of
related operations that share a common purpose. A subsystem providing a notification service, for
example, defines operations to send notices, look up notification channels, and subscribe and
unsubscribe to a channel. The set of operations of a subsystem that are available to other
subsystems form the subsystem interface. The subsystem interface includes the name of the
operations, their parameters, their types, and their return values. System design focuses on defining
the services provided by each subsystem, that is, enumerating the operations, their parameters, and
their high-level behavior. Object design will focus on the application programmer interface
(API), which refines and extends the subsystem interfaces. The API also includes the type of the
parameters and the return value of each operation.
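A sketch of such a subsystem interface in Java (the operation names follow the notification service example above; the signatures are assumptions that would only be pinned down during object design):

    // Operations of the notification service that are visible to other subsystems.
    interface NotificationService {
        java.util.List<String> lookupChannels(String namePrefix);  // look up notification channels
        void subscribe(String channel, Subscriber subscriber);     // register for notices on a channel
        void unsubscribe(String channel, Subscriber subscriber);
        void sendNotice(String channel, String notice);            // deliver a notice to all subscribers
    }

    // Implemented by subsystems that want to receive notices.
    interface Subscriber {
        void onNotice(String channel, String notice);
    }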
Provided and required interfaces can be depicted in UML with assembly connectors, also
called ball-and-socket connectors. The provided interface is shown as a ball icon (also called
lollipop) with its name next to it. A required interface is shown as a socket icon. The dependency
between two subsystems is shown by connecting the corresponding ball and socket in the
component diagram.
In a closed architecture, each layer can access only the layer immediately below it. In an open architecture,
a layer can also access layers at deeper levels.
The Physical layer represents the hardware interface to the network. It is responsible for
transmitting bits over a communication channel. The DataLink layer is responsible for transmitting
data frames without error using the services of the Physical layer. The Network layer is responsible
for transmitting and routing packets within a network. The Transport layer is responsible for
ensuring that the data are reliably transmitted from end to end. The Transport layer is the interface
Unix programmers see when transmitting information over TCP/IP sockets between two processes.
The Session layer is responsible for initializing and authenticating a connection. The Presentation
layer performs data transformation services, such as byte swapping and encryption. The
Application layer is the system you are designing (unless you are building an operating system or
protocol stack). The Application layer can also consist of layered subsystems.
An example of an open architecture is the Swing user interface toolkit for Java [JFC, 2009].
The lowest layer is provided by the operating system or by a windowing system, such as X11, and
provides basic window management. AWT is an abstract window interface provided by Java to
shield applications from specific window platforms. Swing is a library of user interface objects
that provides a wide range of facilities, from buttons to geometry management. An application
usually accesses only the Swing interface. However, the Application layer may bypass the Swing
layer and directly access AWT. In general, the openness of the architecture allows developers to
bypass the higher layers to address performance bottlenecks (Figure 6-12).
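A small illustration of this openness (hypothetical application code): a program built against Swing may still reach down to the lower AWT layer directly, for example to query the platform toolkit:

    import javax.swing.JButton;
    import javax.swing.JFrame;
    import java.awt.Dimension;
    import java.awt.Toolkit;

    public class OpenArchitectureDemo {
        public static void main(String[] args) {
            // Usual case: the application talks only to the Swing layer.
            JFrame frame = new JFrame("Open architecture demo");
            frame.add(new JButton("OK"));

            // Bypassing the Swing layer: a direct call into AWT,
            // which an open layered architecture permits.
            Dimension screen = Toolkit.getDefaultToolkit().getScreenSize();
            frame.setSize(screen.width / 2, screen.height / 2);

            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        }
    }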
Another approach to dealing with complexity is to partition the system into peer
subsystems, each responsible for a different class of services. For example, an onboard system for
a car could be decomposed into a travel service that provides real-time directions to the driver, an
individual preferences service that remembers a driver’s seat position and favorite radio station,
and a vehicle service that tracks the car's gas consumption, repairs, and scheduled maintenance.
Each subsystem depends loosely on the others, but can often operate in isolation.
When control flow is dictated by the individual subsystems, the repository only ensures that
concurrent accesses are serialized. Conversely, the
repository can be used to invoke the subsystems based on the state of the central data structure.
These systems are called “blackboard systems.” The HEARSAY II speech understanding system
[Erman et al., 1980], one of the first blackboard systems, invoked tools based on the current state
of the blackboard.
Repositories are well suited for applications with constantly changing, complex data
processing tasks. Once a central repository is well defined, we can easily add new services in the
form of additional subsystems. The main disadvantage of repository systems is that the central
repository can quickly become a bottleneck, both from a performance aspect and a modifiability
aspect. The coupling between each subsystem and the repository is high, thus making it difficult
to change the repository without having an impact on all subsystems.
Model/View/Controller
In the Model/View/Controller (MVC) architectural style (Figure 6-15), subsystems are
classified into three different types: model subsystems maintain domain knowledge, view
subsystems display it to the user, and controller subsystems manage the sequence of interactions
with the user. The model subsystems are developed such that they do not depend on any view or
controller subsystem. Changes in their state are propagated to the view subsystem via a
subscribe/notify protocol. The MVC is a special case of the repository where Model implements
the central data structure and control objects dictate the control flow.
The subscription and notification functionality associated with this sequence of events is
usually realized with an Observer design pattern (see Section A.7). The Observer design pattern
allows the Model and the View objects to be further decoupled by removing direct dependencies
from the Model to the View. For more details, the reader is referred to [Gamma et al., 1994] and
to Section A.7.
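As an illustration of the subscribe/notify protocol, the following minimal Java sketch shows a model that knows only a listener interface, never a concrete view. The class names (FileSystemModel, ConsoleFileView, ModelListener) are illustrative assumptions, not the classes of Figure 6-16.

```java
import java.util.ArrayList;
import java.util.List;

// The listener interface is all the Model knows about its observers.
interface ModelListener {
    void modelChanged(FileSystemModel model);
}

// Model subsystem: maintains domain state, has no dependency on any view.
class FileSystemModel {
    private final List<ModelListener> listeners = new ArrayList<>();
    private int fileCount;

    void subscribe(ModelListener listener) { listeners.add(listener); }

    void createFile() {
        fileCount++;
        // Notify all subscribed views; the Model never names a concrete view.
        for (ModelListener l : listeners) { l.modelChanged(this); }
    }

    int getFileCount() { return fileCount; }
}

// View subsystem: subscribes to the Model and redraws on notification.
class ConsoleFileView implements ModelListener {
    public void modelChanged(FileSystemModel model) {
        System.out.println("View refresh: " + model.getFileCount() + " files");
    }
}

public class MvcSketch {
    public static void main(String[] args) {
        FileSystemModel model = new FileSystemModel();
        model.subscribe(new ConsoleFileView());   // a second view could subscribe too
        model.createFile();                        // controller-triggered change
    }
}
```

A new view (for example, a Unix-style shell view) could be added by subscribing another ModelListener, without touching the model class.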
The rationale behind the separation of Model, View, and Controller is that user interfaces,
i.e., the View and the Controller, are much more often subject to change than is domain knowledge,
i.e., the Model. Moreover, by removing any dependency from the Model on the View with the
subscription/notification protocol, changes in the views (user interfaces) do not have any effect on
the model subsystems. In the example of Figure 6-16, we could add a Unix-style shell view of the
file system without having to modify the file system. We described a similar decomposition in
Module 2, Chapter 1, Analysis, when we identified entity, boundary, and control objects. This
decomposition is also motivated by the same considerations about change.
MVC is well suited for interactive systems, especially when multiple views of the same
model are needed. MVC can be used for maintaining consistency across distributed data; however,
it introduces the same performance bottleneck as for other repository styles.
Client/server
In the client/server architectural style (Figure 6-18), a subsystem, the server, provides
services to instances of other subsystems called the clients, which are responsible for interacting
with the user. The request for a service is usually done via a remote procedure call mechanism or
a common object broker (e.g., CORBA, Java RMI, or HTTP). Control flow in the clients and the
servers is independent except for synchronization to manage requests or to receive results.
Callbacks are operations that are temporary and customized for a specific purpose. For
example, a DBUser peer in Figure 6-21 can tell the DBMS peer which operation to invoke upon a
change notification. The DBMS then uses the callback operation specified by each DBUser for
notification when a change occurs. Peer-to-peer systems in which a “server” peer invokes “client”
peers only through callbacks are often referred to as client/server systems, even though this is
inaccurate since the “server” can also initiate the control flow.
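A minimal Java sketch of this callback idea follows; the interface and operation names (ChangeCallback, registerCallback, updateTable) are assumptions for illustration, not the actual interfaces of Figure 6-21.

```java
// Callback operation that the "client" peer registers with the "server" peer.
interface ChangeCallback {
    void onChange(String tableName);
}

// DBMS peer: stores callbacks and invokes them when data changes, so control
// flow can also start on the "server" side.
class Dbms {
    private final java.util.List<ChangeCallback> callbacks = new java.util.ArrayList<>();

    void registerCallback(ChangeCallback cb) { callbacks.add(cb); }

    void updateTable(String tableName) {
        // ... apply the update, then notify the registered peers
        for (ChangeCallback cb : callbacks) { cb.onChange(tableName); }
    }
}

// DBUser peer: tells the DBMS which operation to invoke on a change.
public class DbUser {
    public static void main(String[] args) {
        Dbms dbms = new Dbms();
        dbms.registerCallback(table -> System.out.println("Refresh cache for " + table));
        dbms.updateTable("Trips");
    }
}
```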
Three-tier
The three-tier architectural style organizes subsystems into three layers (Figure 6-22):
• The interface layer includes all boundary objects that deal with the user, including
windows, forms, web pages, and so on.
• The application logic layer includes all control and entity objects, realizing the
processing, rule checking, and notification required by the application.
• The storage layer realizes the storage, retrieval, and query of persistent objects.
The three-tier architectural style was initially described in the 1970s for information
systems. The storage layer, an analog to the Repository subsystem in the repository architectural
style, can be shared by several different applications operating on the same data. In turn, the
separation between the interface layer and the application logic layer enables the development or
modification of different user interfaces for the same application logic.
Four-tier
The four-tier architectural style is a three-tier architecture in which the Interface layer is
decomposed into a Presentation Client layer and a Presentation Server layer (Figure 6-23). The
Presentation Client layer is located on the user machines, whereas the Presentation Server layer
can be located on one or more servers. The four-tier architecture enables a wide range of different
presentation clients in the application, while reusing some of the presentation objects across
clients. For example, a banking information system can include a host of different clients, such as
a Web browser interface for home users, an Automated Teller Machine, and an application client
for bank employees. Forms shared by all three clients can then be defined and processed in the
Presentation Server layer, thus removing redundancy across clients.
Pipe and filter styles are suited for systems that apply transformations to streams of data
without intervention by users. They are not suited for systems that require more complex
interactions between components, such as an information management system or an interactive
system.
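As a small illustration of the pipe-and-filter idea, the following Java sketch chains three filters over a stream of readings without any user intervention; the data values and stage names are made up.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PipeAndFilterSketch {
    public static void main(String[] args) {
        List<String> rawReadings = List.of(" 12.5 ", "n/a", " 7.0", "19.25 ");

        // Each stage consumes the output of the previous one, like filters
        // connected by pipes; no user interaction occurs along the way.
        List<Double> validReadings = rawReadings.stream()
                .map(String::trim)                        // filter 1: normalize
                .filter(s -> s.matches("\\d+(\\.\\d+)?")) // filter 2: drop bad records
                .map(Double::parseDouble)                 // filter 3: convert
                .collect(Collectors.toList());

        System.out.println(validReadings);                // [12.5, 7.0, 19.25]
    }
}
```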
We perform the analysis for the MyTrip system following the techniques outlined in Chapter 5,
Analysis, and obtain the model in Figure 6-28.
For example, in the light of the nonfunctional requirements for MyTrip described in Section
6.4.1, we identify reliability and fault tolerance to connectivity loss as design goals. We then
identify security as a design goal, as numerous drivers will have access to the same trip planning
server. We add modifiability as a design goal, as we want to provide the ability for drivers to select
a trip planning service of their choice. The following box summarizes the design goals we
identified.
In general, we can select design goals from a long list of highly desirable qualities. Tables
6-2 through 6-6 list a number of possible design criteria. These criteria are organized into five
groups: performance, dependability, cost, maintenance, and end user criteria. Performance,
dependability, and end user criteria are usually specified in the requirements or inferred from the
application domain. Cost and maintenance criteria are dictated by the customer and the supplier.
Performance criteria (Table 6-2) include the speed and space requirements imposed on
the system.
Dependability criteria (Table 6-3) determine how much effort should be expended in
minimizing system crashes and their consequences.
Cost criteria (Table 6-4) include the cost to develop the system, to deploy it, and to
administer it. Note that cost criteria not only include design considerations but managerial ones,
as well. When the system is replacing an older one, the cost of ensuring backward compatibility
or transitioning to the new system has to be taken into account. There are also trade-offs between
different types of costs such as development cost, end user training cost, transition costs, and
maintenance costs. Maintaining backward compatibility with a previous system can add to the
development cost while reducing the transition cost.
Maintenance criteria (Table 6-5) determine how difficult it is to change the system after
deployment.
End user criteria (Table 6-6) include qualities that are desirable from a user’s point of
view, but have not yet been covered under the performance and dependability criteria. Often these
criteria do not receive much attention, especially when the client contracting the system is different
from its users.
When defining design goals, only a small subset of these criteria can be simultaneously
taken into account. It is, for example, unrealistic to develop software that is safe, secure, and cheap.
Typically, developers need to prioritize design goals and trade them off against each other as well
as against managerial goals as the project runs behind schedule or over budget. Table 6-7 lists
several possible trade-offs.
Managerial goals can be traded off against technical goals (e.g., delivery time vs.
functionality). Once we have a clear idea of the design goals, we can proceed to design an initial
subsystem decomposition.
6.4.3 Identifying Subsystems
Finding subsystems during system design is similar to finding objects during analysis. For
example, some of the object identification techniques we described in Chapter 5, Analysis, such
as Abbott’s heuristics, are applicable to subsystem identification. Moreover, subsystem
decomposition is constantly revised whenever new issues are addressed: several subsystems are
merged into one subsystem, a complex subsystem is split into parts, and some subsystems are
added to address new functionality. The first iterations over subsystem decomposition can
introduce drastic changes in the system design model. These are often best handled through
brainstorming.
Another heuristic for subsystem identification is to keep functionally related objects
together. A starting point is to assign the participating objects that have been identified in each use
case to the subsystems. Some groups of objects, such as the Trip group in MyTrip, are shared and used
for communicating information from one subsystem to another. We can either create a new
subsystem to accommodate them or assign them to the subsystem that creates these objects.
Module 3, Chapter 2
System Design: Addressing Design Goals
During system design, we identify design goals, decompose the system into subsystems, and refine
the subsystem decomposition until all design goals are addressed. In the previous chapter, we
described the concepts of design goals and system decomposition. In this chapter, we introduce
the system design activities that address the design goals. In particular, we examine
• Selection of off-the-shelf and legacy components. Off-the-shelf or legacy components realize
specific subsystems more economically. The initial subsystem decomposition is adjusted to
accommodate them.
• Mapping of subsystem to hardware. When the system is deployed on several nodes, additional
subsystems are required for addressing reliability or performance issues.
• Design of a persistent data management infrastructure. Managing the state that outlives a
single execution of the system has an impact on overall system performance and leads to the
identification of one or more storage subsystems.
• Specification of an access control policy. Shared objects are protected so that user access to
them is controlled. Access control impacts how objects are distributed within subsystems.
• Design of the global control flow. Determining the sequence of operations impacts the interface
of the subsystems.
• Handling of boundary conditions. Once all subsystems have been identified, developers decide
on the order in which individual components are started and shut down.
We then describe the management issues related to system design, such as documentation,
responsibilities, and communication.
7.2 An Overview of System Design Activities
Design goals guide the decisions to be made by the developers especially when trade-offs are
needed. Developers divide the system into manageable pieces to deal with complexity: each
subsystem is assigned to a team and realized independently. In order for this to be possible, though,
developers need to address system-wide issues when decomposing the system. In particular, they
need to address the following issues:
• Hardware/software mapping: What is the hardware configuration of the system? Which node
is responsible for which functionality? How is communication between nodes realized? Which
services are realized using existing software components? How are these components
encapsulated? Addressing hardware/software mapping issues often leads to the definition of
additional subsystems dedicated to moving data from one node to another, dealing with
concurrency, and reliability issues. Off-the-shelf components enable developers to realize complex
services more economically. User interface packages and database management systems are prime
examples of off-the-shelf components. Components, however, should be encapsulated to minimize
dependency on a particular component; a competing vendor may offer a better product in the
future, and you want the option to switch.
• Data management: Which data should be persistent? Where should persistent data be stored?
How are they accessed? Persistent data represents a bottleneck in the system on many different
fronts: most functionality in a system is concerned with creating or manipulating persistent data. For
this reason, access to the data should be fast and reliable. If retrieving data is slow, the whole
system will be slow. If data corruption is likely, complete system failure is likely. These issues
must be addressed consistently at the system level. Often, this leads to the selection of a database
management system and of an additional subsystem dedicated to the management of persistent
data.
• Access control: Who can access which data? Can access control change dynamically? How is
access control specified and realized? Access control and security are system-wide issues. The
access control must be consistent across the system; in other words, the policy used to specify who
can and cannot access certain data should be the same across all subsystems.
• Control flow: How does the system sequence operations? Is the system event driven? Can it
handle more than one user interaction at a time? The choice of control flow has an impact on the
interfaces of subsystems. If an event-driven control flow is selected, subsystems will provide event
handlers. If threads are selected, subsystems must guarantee mutual exclusion in critical sections.
• Boundary conditions: How is the system initialized and shut down? How are exceptional cases
handled? System initialization and shutdown often represent much of the complexity of a system,
especially in a distributed environment. Initialization, shutdown, and exception handling have an
impact on the interface of all subsystems.
Figure 7-1 depicts the activities of system design. Each activity addresses one of the issues
we described above. Addressing any one of these issues can lead to changes in subsystem
decomposition and raise new issues.
The deployment diagram in Figure 7-2 focuses on the allocation of components to nodes
and provides a high-level view of each component. Components can be refined to include
information about the interfaces they provide and the classes they contain. Figure 7-3 illustrates
the WebServer component and its containing classes.
Where and how data is stored in the system affects system decomposition. In some cases,
for example, in a repository architectural style (see Section 6.3.5), a subsystem can be completely
dedicated to the storage of data. The selection of a specific database management system can also
have implications on the overall control strategy and concurrency management.
Identifying persistent objects
First, we identify which data must be persistent. The entity objects identified during
analysis are obvious candidates for persistency. In general, we can identify persistent objects by
examining all the classes that must survive system shutdown, either in case of a controlled
shutdown or an unexpected crash. The system will then restore these long-lived objects by
retrieving their attributes from storage during system initialization or on demand as the persistent
objects are needed.
Selecting a storage management strategy
Once all persistent objects are identified, we need to decide how these objects should be
stored. The decision for storage management is more complex and is usually dictated by
nonfunctional requirements: Should the objects be retrieved quickly? Must the system perform
complex queries to retrieve these objects? Do objects require a lot of memory or disk space? In
general, there are currently three options for storage management:
• Flat files. Files are the storage abstractions provided by operating systems. The
application stores its data as a sequence of bytes and defines how and when data should be
retrieved. The file abstraction is relatively low level and enables the application to perform a
variety of size and speed optimizations. Files, however, require the application to take care of many
issues, such as concurrent access and loss of data in case of system crash.
• Relational database. A relational database provides data abstraction at a higher level
than flat files. Data are stored in tables that comply with a predefined type called a schema. Each
column in the table represents an attribute. Each row represents a data item as a tuple of attribute
values. Several tuples in different tables are used to represent the attributes of an individual object.
Mapping complex object models to a relational schema is challenging. Specialized methods, such
as [Blaha & Premerlani, 1998], provide a systematic way of performing this mapping. Relational
databases also provide services for concurrency management, access control, and crash recovery.
Relational databases have been in use for a long time and are a mature technology. Although scalable
and well suited to large data sets, they introduce noticeable overhead for small data sets and are
poorly suited to unstructured data (e.g., images, natural language text).
• Object-oriented database. An object-oriented database provides services similar to a
relational database. Unlike a relational database, it stores data as objects and associations. In
addition to providing a higher level of abstraction (and thus reducing the need to translate between
objects and storage entities), object-oriented databases provide developers with inheritance and
abstract data types. Object-oriented databases significantly reduce the time for the initial
development of the storage subsystem. However, they are slower than relational databases for
typical queries and are more difficult to tune.
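The following Java sketch illustrates the first of these options, flat files, using object serialization; the Trip class and file name are assumptions for illustration. Note how concurrent access and crash recovery remain the application's responsibility.

```java
import java.io.*;

// A persistent entity object; Serializable marks it as storable.
class Trip implements Serializable {
    private static final long serialVersionUID = 1L;
    String destination;
    Trip(String destination) { this.destination = destination; }
}

public class FlatFileStoreSketch {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        File store = new File("trip.dat");

        // Store: the application decides the byte layout (here, Java serialization).
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(store))) {
            out.writeObject(new Trip("Bengaluru"));
        }

        // Retrieve during initialization or on demand.
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(store))) {
            Trip trip = (Trip) in.readObject();
            System.out.println("Restored trip to " + trip.destination);
        }
        // Concurrent access and loss of data after a crash are not handled here.
    }
}
```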
Defining access control for a multi-user system is usually more complex than in MyTrip.
In general, we need to define for each actor which operations they can access on each shared object.
For example, a bank teller may post credits and debits up to a predefined amount. If the transaction
exceeds the predefined amount, a manager must approve the transaction. Managers can examine
the branch statistics, but cannot access the statistics of other branches. Analysts can access
information across all branches of the corporation, but cannot post transactions on individual
accounts. We model access on classes with an access matrix. The rows of the matrix represent the
actors of the system. The columns represent classes whose access we control. An entry (class,
actor) in the access matrix is called an access right and lists the operations (e.g., postSmallDebit(),
postLargeDebit(), examineGlobalStats()) that can be executed on instances of the class by the
actor.
We can represent the access matrix using one of three different approaches: global access
table, access control list, and capabilities.
• A global access table represents explicitly every cell in the matrix as a (actor,class,
operation) tuple. Determining if an actor has access to a specific object requires looking up the
corresponding tuple. If no such tuple is found, access is denied.
• An access control list associates a list of (actor,operation) pairs with each class to be
accessed. Empty cells are discarded. Every time an object is accessed, its access list is checked for
the corresponding actor and operation. An example of an access control list is the guest list for a
party. A butler checks the arriving guests by comparing their names against names on the guest
list. If there is a match, the guests can enter; otherwise, they are turned away.
• A capability associates a (class,operation) pair with an actor. A capability allows an actor
access to an object of the class described in the capability. Denying a capability is equivalent to
denying access. An example of a capability is an invitation card for a party. In this case, the butler
checks if the arriving guests hold an invitation for the party. If the invitation is valid, the guests are
admitted; otherwise, they are turned away. No other checks are necessary.
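The following Java sketch illustrates an access-control-list style check for the bank example above; the actor, class, and operation names are illustrative, and a real system would load the lists from configuration rather than hard-code them.

```java
import java.util.Map;
import java.util.Set;

public class AccessControlListSketch {
    // For each protected class, a map from actor to the operations it may invoke.
    private static final Map<String, Map<String, Set<String>>> acl = Map.of(
        "Account", Map.of(
            "Teller",  Set.of("postSmallDebit", "postSmallCredit"),
            "Manager", Set.of("postSmallDebit", "postLargeDebit", "approveTransaction")),
        "BranchStats", Map.of(
            "Manager", Set.of("examineBranchStats"),
            "Analyst", Set.of("examineGlobalStats")));

    static boolean isAllowed(String actor, String protectedClass, String operation) {
        // Empty cells are simply absent: no entry means access is denied.
        return acl.getOrDefault(protectedClass, Map.of())
                  .getOrDefault(actor, Set.of())
                  .contains(operation);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Teller", "Account", "postSmallDebit"));  // true
        System.out.println(isAllowed("Teller", "Account", "postLargeDebit"));  // false
    }
}
```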
The representation of the access matrix is also a performance issue. Often, the number of
actors and the number of protected objects is too large for either the capability or the access control
list representations. In such cases, rules can be used as a compact representation of the global
access matrix. For example, firewalls protect services located on Intranet Hosts from other hosts
on the Internet. Based on the source host and port, destination host and port, and packet size, the
firewall allows or denies packets to reach their destination.
When the number of actors and objects is large, a rule-based representation is more
compact than either access control lists or capabilities. Moreover, a small set of rules is more
readable, and hence, more easily proofed by a human reader, which is a critical aspect when setting
up a secure environment.
An access matrix only represents static access control, meaning that access rights are fixed and do
not change while the system is running. In the bank information system example,
consider a broker actor who is assigned a set of portfolios. By policy, a broker cannot access
portfolios managed by another broker. In this case, we need to model access rights dynamically in
the system, and, hence, this type of access is called dynamic access control.
In both static and dynamic access control, we assume that we know the actor: either the
user behind the keyboard or the calling subsystem. The process of verifying this claimed identity
of the user or subsystem is called authentication. A widely
used authentication mechanism, for example, is for the user to specify a user name, known by
everybody, and a corresponding password, known only to the user and to the system and stored in an access
control list. The system protects its users’ passwords by encrypting them before storing or
transmitting them. If only a single user knows this user name–password combination, then we can
assume that the user behind the keyboard is legitimate. Although password authentication can be
made secure with current technology, it suffers from many usability disadvantages: users choose
passwords that are easy to remember and, thus, easy to guess. They also tend to write their
password on notes that they keep close to their monitor, and thus, visible to many other users,
authorized or not. Fortunately, other, more secure authentication mechanisms are available. For
example, a smart card can be used in conjunction with a password: an intruder would need both
the smart card and the password to gain access to the system. Better, we can use a biometric sensor
for analyzing patterns of blood vessels in a person’s fingers or eyes. An intruder would then need
the physical presence of the legitimate user to gain access to the system, which is much more
difficult than just stealing a smart card.
Encryption is used to prevent unauthorized access to stored or transmitted data. Using an encryption algorithm,
we can translate a message, called “plaintext,” into an encrypted message, called a “ciphertext,”
such that even if intercepted, it cannot be understood. Only the receiver has sufficient knowledge
to correctly decrypt the message, that is, to reverse the original process. The encryption process is
parameterized by a “key,” so that the key can be changed quickly if an intruder manages to obtain
enough knowledge to decrypt messages.
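As a small illustration of key-parameterized encryption, the following sketch uses the standard Java cryptography API with AES in GCM mode; the message text and parameter choices are assumptions for illustration, not recommendations from the text.

```java
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class EncryptionSketch {
    public static void main(String[] args) throws Exception {
        // The secret key parameterizes encryption; it can be replaced without
        // changing the algorithm if it is ever compromised.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey();

        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);                 // fresh IV per message

        // Plaintext in, ciphertext out.
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal("plaintext message".getBytes("UTF-8"));

        // Only a holder of the same key (and IV) can reverse the process.
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        System.out.println(new String(cipher.doFinal(ciphertext), "UTF-8"));
    }
}
```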
Once authentication and encryption are provided, application-specific access control can
be more easily implemented on top of these building blocks. In all cases, addressing security issues
is a difficult topic. When addressing these issues, developers should record their assumptions and
describe the intruder scenarios they are considering. When several alternatives are explored,
developers should state the design problems they are attempting to solve and record the results of
the evaluation. We describe in the next chapter how to do this systematically using issue modeling.
7.4.4 Designing the Global Control Flow
Control flow is the sequencing of actions in a system. In object-oriented systems, sequencing
actions includes deciding which operations should be executed and in which order. These decisions
are based on external events generated by an actor or on the passage of time.
Control flow is a design problem. During analysis, control flow is not an issue, because we
assume that all objects run simultaneously and execute operations whenever they need to.
During system design, we need to take into account that not every object has the luxury of running
on its own processor. There are three possible control flow mechanisms:
• Procedure-driven control. Operations wait for input whenever they need data from an
actor. This kind of control flow is mostly used in legacy systems and systems written in procedural
languages. It introduces difficulties when used with object-oriented languages. As the sequencing
of operations is distributed among a large set of objects, it becomes increasingly difficult to
determine the order of inputs by looking at the code.
• Event-driven control. A main loop waits for an external event. Whenever an event
becomes available, it is dispatched to the appropriate object, based on information associated with
the event. This kind of control flow has the advantage of leading to a simpler structure and to
centralizing all input in the main loop. However, it makes multi-step interaction sequences more
difficult to implement.
• Threads. Threads are the concurrent variation of procedure-driven control: The system
can create an arbitrary number of threads, each responding to a different event. If a thread needs
additional data, it waits for input from a specific actor. This kind of control flow is the most
intuitive of the three mechanisms. However, debugging threaded software requires good tools:
preemptive thread schedulers introduce nondeterminism and, thus, make testing harder.
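The following Java sketch illustrates the second mechanism, event-driven control: a single main loop dispatches each event to a handler based on information carried with the event. The event and handler names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;

public class EventLoopSketch {
    interface Handler { void handle(String payload); }

    public static void main(String[] args) {
        // Dispatch table: the event type determines which object handles it.
        Map<String, Handler> handlers = Map.of(
            "buttonPressed", (Handler) payload -> System.out.println("Open dialog " + payload),
            "timerExpired",  (Handler) payload -> System.out.println("Refresh view " + payload));

        Queue<String[]> eventQueue = new ArrayDeque<>();
        eventQueue.add(new String[] {"buttonPressed", "Preferences"});
        eventQueue.add(new String[] {"timerExpired", "TripMap"});

        // The main loop: all input is centralized here.
        while (!eventQueue.isEmpty()) {
            String[] event = eventQueue.poll();
            handlers.getOrDefault(event[0], p -> {}).handle(event[1]);
        }
    }
}
```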
use case. For example, the RouteAssistant can completely download the Trip onto the car before
the start of the trip.
In general, an exception is an event or error that occurs during the execution of the system.
Exceptions are caused by three different sources:
• A hardware failure. Hardware ages and fails. A hard disk crash can lead to the permanent
loss of data. The failure of a network link, for example, can momentarily disconnect two nodes of
the system.
• Changes in the operating environment. The environment also affects the way a system
works. A wireless mobile system can lose connectivity if it is out of range of a transmitter. A power
outage can bring down the system, unless it is fitted with back-up batteries.
• A software fault. An error can occur because the system or one of its components
contains a design error. Although writing bug-free software is difficult, individual subsystems can
anticipate errors from other subsystems and protect against them.
Exception handling is the mechanism by which a system treats an exception. In the case
of a user error, the system should display a meaningful error message to the user so that she can
correct her input. In the case of a network link failure, the system should save its temporary state
so that it can recover when the network comes back on line.
7.4.7 Reviewing System Design
Like analysis, system design is an evolutionary and iterative activity. Unlike analysis, there is no
external agent, such as the client, to review the successive iterations and ensure better quality. This
quality improvement activity is still necessary, and project managers and developers need to
organize a review process to substitute for it. Several alternatives exist, such as asking developers
who were not involved in system design to act as independent reviewers, or asking developers from
another project to act as peer reviewers. These review processes work only if the
reviewers have an incentive to discover and report problems.
In addition to meeting the design goals that were identified during system design, we need
to ensure that the system design model is correct, complete, consistent, realistic, and readable. The
system design model is correct if the analysis model can be mapped to the system design model.
The model is complete if every requirement and every system design issue has been
addressed.
The model is consistent if it does not contain any contradictions.
The model is realistic if the corresponding system can be implemented.
The model is readable if developers not involved in the system design can understand the
model.
In many projects, you will find that system design and implementation overlap quite a bit.
For example, you may build prototypes of selected subsystems before the architecture is stable in
order to evaluate new technologies. This leads to many partial reviews instead of an encompassing
review followed by a client sign-off, as for analysis. Although this process yields greater flexibility,
it also requires developers to track open issues more carefully. Many difficult issues tend to be
resolved late not because they are difficult, but because they fell through the cracks of the process.
• Access control and security describes the user model of the system in terms of an access
matrix. This section also describes security issues, such as the selection of an authentication
mechanism, the use of encryption, and the management of keys.
• Global software control describes how the global software control is implemented. In
particular, this section should describe how requests are initiated and how subsystems synchronize.
This section should list and address synchronization and concurrency issues.
• Boundary conditions describe the start-up, shutdown, and error behavior of the system.
(If new use cases are discovered for system administration, these should be included in the
requirements analysis document, not in this section.)
The fourth section, Subsystem services, describes the services provided by each
subsystem. Although this section is usually empty or incomplete in the first versions of the SDD,
this section serves as a reference for teams for the boundaries between their subsystems. The
interface of each subsystem is derived from this section and detailed in the Object Design
Document.
The SDD is written after the initial system decomposition is done; that is, system architects
should not wait until all system design decisions are made before publishing the document. The
SDD, moreover, is updated throughout the process when design decisions are made or problems
are discovered. The SDD, once published, is baselined and put under configuration management.
The revision history section of the SDD provides a history of changes as a list of changes, including
author responsible for the change, date of change, and brief description of the change.
developers who will implement the subsystem. It is critical that system design include people who
are exposed to the consequences of system design decisions. The architecture team starts work as
soon as the analysis model is stable and continues to function until the end of the integration phase.
This creates an incentive for the architecture team to anticipate problems encountered during
integration. Below are the main roles of system design:
• The architect takes the main role in system design. The architect ensures consistency in
design decisions and interface styles. The architect ensures the consistency of the design in the
configuration management and testing teams, in particular in the formulation of the configuration
management policy and the system integration strategy. This is mainly an integration role
consuming information from each subsystem team. The architect is the leader of the cross-
functional architecture team.
• Architecture liaisons are the members of the architecture team. They are representatives
from the subsystem teams. They convey information from and to their teams and negotiate
interface changes. During system design, they focus on the subsystem services; during the
implementation phase, they focus on the consistency of the APIs.
• The document editor, configuration manager, and reviewer roles are the same as for
analysis (see Section 5.5.2).
The number of subsystems determines the size of the architecture team. For complex
systems, an architecture team is introduced for each level of abstraction. In all cases, there should
be one integrating role on the team to ensure consistency and the understandability of the
architecture by a single individual.
7.5.3 Communicating about System Design
Communication during system design should be less challenging than during analysis: the
functionality of the system has been defined; project participants have similar backgrounds and by
now should know each other better. Communication is still difficult, due to new sources of
complexity:
• Size. The number of issues to be dealt with increases as developers start designing. The
number of items that developers manipulate increases: each piece of functionality requires many
operations on many objects. Moreover, developers investigate, often concurrently, multiple
designs and multiple implementation technologies.
• Change. The subsystem decomposition and the interfaces of the subsystems are in
constant flux. Terms used by developers to name different parts of the system evolve constantly.
If the change is rapid, developers may not be discussing the same version of the subsystem, which
can lead to much confusion.
• Level of abstraction. Discussions about requirements can be made concrete by using
interface mock-ups and analogies with existing systems. Discussions about implementation
become concrete when integration and test results are available. System design discussions are
seldom concrete, as consequences of design decisions are felt only later, during implementation
and testing.
• Reluctance to confront problems. The level of abstraction of most discussions can also
make it easy to delay the resolution of difficult issues. A typical resolution of control issues is
often, “Let us revisit this issue during implementation.” Whereas it is usually desirable to delay
certain design decisions, such as the internal data structures and algorithms used by each
subsystem, any decision that has an impact on the system decomposition and the subsystem
interfaces should not be delayed.
• Conflicting goals and criteria. Individual developers often optimize different criteria. A
developer experienced in user interface design will be biased toward optimizing response time. A
developer experienced in databases might optimize throughput. These conflicting goals, especially
when implicit, result in developers pulling the system decomposition in different directions and
lead to inconsistencies.
The same techniques we discussed in analysis (see Section 5.5.3) can be applied during
system design:
• Identify and prioritize the design goals for the system and make them explicit (see
Section 6.4.2). If the developers concerned with system design have input in this process, they will
have an easier time committing to these design goals. Design goals also provide an objective
framework against which decisions can be evaluated.
• Make the current version of the system decomposition available to all concerned. A
live document distributed via the Internet is one way to achieve rapid distribution. Using a
configuration management tool to maintain the system design documents helps developers in
identifying recent changes.
• Maintain an up-to-date glossary. As in analysis, defining terms explicitly reduces
misunderstandings. When identifying and modeling subsystems, provide definitions in addition to
names. A UML diagram with only subsystem names is not sufficient for supporting effective
communication. A brief and substantial definition should accompany every subsystem and class
name.
• Confront design problems. Delaying design decisions can be beneficial when more
information is needed before committing to the design decision. This approach, however, can
prevent the confrontation of difficult design problems. Before tabling an issue, several possible
alternatives should be explored and described, and the delay justified. This ensures that issues can
be delayed without serious impact on the system decomposition.
• Iterate. Selected excursions into the implementation phase can improve the system
design. For example, new features in a vendor-supplied component can be evaluated by
implementing a vertical prototype (see Section 7.5.4) for the functionality most likely to benefit
from the feature.
Finally, no matter how much effort is expended on system design, the system decomposition and
the subsystem interfaces will almost certainly change during implementation. As new information
about implementation technologies becomes available, developers have a clearer understanding of
the system, and design alternatives are discovered. Developers should anticipate change and
reserve some time to update the SDD before system integration.
Module 4, Chapter 1
Object Design: Reusing Pattern Solutions
During analysis, we describe the purpose of the system. This results in the identification of
application objects. During system design, we describe the system in terms of its architecture, such
as its subsystem decomposition, global control flow, and persistency management. During system
design, we also define the hardware/software platform on which we build the system. This allows
the selection of off-the-shelf components that provide a higher level of abstraction than the
hardware. During object design, we close the gap between the application objects and the off-the-
shelf components by identifying additional solution objects and refining existing objects. Object
design includes
• reuse, during which we identify off-the-shelf components and design patterns to make
use of existing solutions
• service specification, during which we precisely describe each class interface
• object model restructuring, during which we transform the object design model to
improve its understandability and extensibility
• object model optimization, during which we transform the object design model to
address performance criteria such as response time or memory utilization.
Object design, like system design, is not algorithmic. The identification of existing patterns
and components is central to the problem-solving process. We discuss these building blocks and
the activities related to them. In this chapter, we provide an overview of object design and focus
on reuse, that is, the selection of components and the application of design patterns.
8.2 An Overview of Object Design
Conceptually, software system development fills the gap between a given problem and an
existing machine. The activities of system development incrementally close this gap by identifying
and defining objects that realize part of the system.
Analysis reduces the gap between the problem and the machine by identifying objects
representing problem-specific concepts. During analysis the system is described in terms of
external behavior such as its functionality (use case model), the application domain concepts it
manipulates (object model), its behavior in terms of interactions (dynamic model), and its
nonfunctional requirements.
System design reduces the gap between the problem and the machine in two ways. First,
system design results in a virtual machine that provides a higher level of abstraction than the
machine. This is done by selecting off-the-shelf components for standard services such as
middleware, user interface toolkits, application frameworks, and class libraries. Second, system
design identifies off-the-shelf components for application domain objects such as reusable class
libraries of banking objects.
After several iterations of analysis and system design, the developers are usually left with
a puzzle that has a few pieces missing. These pieces are found during object design. This includes
identifying new solution objects, adjusting off-the-shelf components, and precisely specifying
each subsystem interface and class. The object design model can then be partitioned into sets of
classes that can be implemented by individual developers.
Object design includes four groups of activities (see Figure 8-2):
• Reuse. Off-the-shelf components identified during system design are used to help in the
realization of each subsystem. Class libraries and additional components are selected for basic data
structures and services. Design patterns are selected for solving common problems and for
protecting specific classes from future change. Often, components and design patterns need to be
adapted before they can be used. This is done by wrapping custom objects around them or by
refining them using inheritance. During all these activities, the developers are faced with the same
buy-versus-build trade-offs they encountered during system design.
• Interface specification. During this activity, the subsystem services identified during
system design are specified in terms of class interfaces, including operations, arguments, type
signatures, and exceptions. Additional operations and objects needed to transfer data among
subsystems are also identified. The result of service specification is a complete interface
specification for each subsystem. The subsystem service specification is often called subsystem
API (Application Programmer Interface).
• Restructuring. Restructuring activities manipulate the system model to increase code
reuse or meet other design goals. Each restructuring activity can be seen as a graph transformation
on subsets of a particular model. Typical activities include transforming N-ary associations into
binary associations, implementing binary associations as references, merging two similar classes
from two different subsystems into a single class, collapsing classes with no significant behavior
into attributes, splitting complex classes into simpler ones, and/or rearranging classes and
operations to increase the inheritance and packaging. During restructuring, we address design
goals such as maintainability, readability, and understandability of the system model.
• Optimization. Optimization activities address performance requirements of the system
model. This includes changing algorithms to respond to speed or memory requirements, reducing
multiplicities in associations to speed up queries, adding redundant associations for efficiency,
rearranging execution orders, adding derived attributes to improve the access time to objects, and
opening up the architecture, that is, adding access to lower layers because of performance
requirements.
Object design is not sequential. Although each group of activities described above
addresses a specific object design issue, they usually occur concurrently. A specific off-the-shelf
component may constrain the number of types of exceptions mentioned in the specification of an
operation and thus may impact the subsystem interface. The selection of a component may reduce
the implementation work while introducing new “glue” objects, which also need to be specified.
Finally, restructuring and optimizing may reduce the number of components to be implemented by
increasing the amount of reuse in the system.
Usually, interface specification and reuse activities occur first, yielding an object design
model that is then checked against the use cases that exercise the specific subsystem. Restructuring
and optimization activities occur next, once the object design model for the subsystem is relatively
stable. Focusing on interfaces, components, and design patterns results in an object design model
that is much easier to modify. Focusing on optimizations first tends to produce object design
models that are rigid and difficult to modify. However, as depicted in Figure 8-2, activities of
object design occur iteratively.
hierarchy between the superclass and the subclass. Whereas this is acceptable when the inheritance
hierarchy represents a taxonomy (e.g., it is acceptable for Image and GIFImage to be tightly
coupled), it introduces unwanted coupling in the other cases.
The use of inheritance for the sole purpose of reusing code is called implementation
inheritance. With implementation inheritance, developers reuse code quickly by subclassing an
existing class and refining its behavior. A Set implemented by inheriting from a Hashtable is an
example of implementation inheritance. Conversely, the classification of concepts into type
hierarchies is called specification inheritance (also called “interface inheritance”). The UML
class model of Figure 8-4 summarizes the four different types of inheritance we discussed in this
section.
8.3.3 Delegation
Delegation is the alternative to implementation inheritance that should be used when reuse is
desired. A class is said to delegate to another class if it implements an operation by resending a
message to another class. Delegation makes explicit the dependencies between the reused class
and the new class. The right column of Figure 8-3 shows an implementation of MySet using
delegation instead of implementation inheritance. The only significant change is the private field
table and its initialization in the MySet() constructor. This addresses both problems we mentioned
before:
• Extensibility. The MySet in the right column does not include the containsKey() method
in its interface and the new field table is private. Hence, we can change the internal representation
of MySet to another class (e.g., a List) without impacting any clients of MySet.
• Subtyping. MySet does not inherit from Hashtable and, hence, cannot be substituted for
a Hashtable in any of the client code. Consequently, any code previously using Hashtables still
behaves the same way.
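A minimal Java reconstruction of the delegation-based MySet described above is sketched below; the exact operations in Figure 8-3 may differ, so treat this as an illustration of the idea rather than a copy of the figure.

```java
import java.util.Hashtable;

// MySet reuses Hashtable through delegation rather than inheritance.
public class MySet {
    // The reused class is hidden behind a private field...
    private final Hashtable<Object, Object> table;

    public MySet() {
        // ...initialized in the constructor, as described above.
        table = new Hashtable<>();
    }

    public void put(Object element) {
        table.put(element, element);       // forward the work to the Hashtable
    }

    public boolean contains(Object element) {
        return table.containsKey(element); // delegation makes the dependency explicit
    }

    public int size() {
        return table.size();
    }
    // No containsKey() in the public interface: the internal representation can
    // be swapped (e.g., for a List) without affecting clients, and MySet is not
    // a subtype of Hashtable, so existing Hashtable client code is unaffected.
}
```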
Since developers have strived to evolve and refine design patterns for maximizing reuse
and flexibility, they are usually not solutions that programmers would initially think of. As design
patterns capture a great deal of knowledge (e.g., by documenting the context and tradeoffs involved
in applying a pattern), they also constitute a source of guidance about when to use inheritance and
delegation.
8.4 Reuse Activities: Selecting Design Patterns and Components
System design and object design introduce a strange paradox in the development process. On the
one hand, during system design, we construct solid walls between subsystems to manage
complexity by breaking the system into smaller pieces and to prevent changes in one subsystem
from affecting other subsystems. On the other hand, during object design, we want the software to
be modifiable and extensible to minimize the cost of future changes. These are conflicting goals:
we want to define a stable architecture to deal with complexity, but we also want to allow flexibility
to deal with change later in the development process. This conflict can be solved by anticipating
change and designing for it, as sources of later changes tend to be the same for many systems:
• New vendor or new technology. Commercial components used to build the system are
often replaced by equivalent ones from a different vendor. This change is common and generally
difficult to cope with. The software marketplace is dynamic, and vendors might go out of business
before your project is completed.
• New implementation. When subsystems are integrated and tested together, the overall
system response time is, more often than not, above performance requirements. System
performance is difficult to predict and should not be optimized before integration. Developers
should focus on the subsystem services first. This triggers the need for more efficient data
structures and algorithms—often under time constraints.
• New views. Testing the software with real users uncovers many usability problems. These
often translate into the need to create additional views on the same data.
• New complexity of the application domain. The deployment of a system triggers ideas
of new generalizations: a bank information system for one branch may lead to the idea of a multi-
branch information system. The application domain itself might also increase in complexity:
previously, flight numbers were associated with one plane, and one plane only, but with air carrier
alliances, one plane can now have a different flight number from each carrier.
• Errors. Many requirements errors are discovered when real users start using the system.
The use of delegation and inheritance in conjunction with abstract classes decouples the
interface of a subsystem from its actual implementation. In this section, we provide selected
examples of design patterns that can deal with the type of changes mentioned above.
8.4.1 Encapsulating Data Stores with the Bridge Pattern
Consider the problem of incrementally developing, testing, and integrating subsystems realized by
different developers. Subsystems may be completed at different times, delaying the integration of
all subsystems until the last one is completed. To avoid this delay, projects often use a stub
implementation in place of a specific subsystem so that the integration tests can start even before
the subsystems are completed. In other situations, several implementations of the same subsystem
are realized, such as a reference implementation that realizes the specified functionality with the
most basic algorithms, or an optimized implementation that delivers better performance at the cost
of additional complexity. In short, a solution is needed for dynamically substituting multiple
realizations of the same interface for different uses.
This problem can be addressed with the Bridge design pattern (Appendix A.3, [Gamma
et al., 1994]). In the early stages of the project, we are interested in a rudimentary storage
subsystem based on object serialization for the purpose of debugging and testing the core use cases
of the TournamentManagement subsystem. The entity objects will be subject to many changes,
and we do not know yet what performance bottlenecks will be encountered during storage.
Consequently, an efficient storage subsystem should not be the focus of the first prototype. As
discussed during the system design of ARENA (Section 7.6.4), however, we anticipate that both a
file-based implementation and a relational database implementation of the storage subsystem
should be provided, in the first and second iteration of the system, respectively. In addition, a set
of stubs should be provided to allow early integration testing even before the file-based
implementation is ready. To solve this problem, we apply the Bridge pattern shown in Figure 8-7.
The LeagueStore is the interface class to the pattern, and provides all high-level functionality
associated with storage. The LeagueStoreImplementor is an abstract interface that provides the
common interface for the three implementations, namely the StubStoreImplementor for the stubs,
the XMLStoreImplementor for the file-based implementation, and the JDBCStoreImplementor for
the relational database implementation.
Note that, even though most LeagueStoreImplementors provide similar services, introducing the Bridge
abstraction adds a level of indirection and can reduce performance. The design goals we defined at the beginning of system design
(Section 6.4.2) help us decide about performance and modifiability trade-offs.
Inheritance and delegation in the Bridge pattern
The Bridge pattern interface is realized by the Abstraction class, and its behavior by the
selected ConcreteImplementor class. The design pattern can be extended by providing new
RefinedAbstraction or ConcreteImplementor classes. This pattern is a classic example of
combining specification inheritance and delegation to achieve both reuse and flexibility.
On the one hand, specification inheritance is used between the abstract Implementor
interface and the ConcreteImplementor classes. As a result, each ConcreteImplementor can be
substituted transparently at run time, from the point of view of the Abstraction and RefinedAbstraction classes.
This also ensures that, when adding a new ConcreteImplementor, developers will strive to provide
the same behavior as all other ConcreteImplementors.
On the other hand, Abstraction and Implementor are decoupled using delegation. This
enables different behavior to be distributed on each side of the bridge. For example, the
LeagueStore class in Figure 8-7 provides the high-level behavior for storing Leagues, whereas the
concrete LeagueStoreImplementor provides specific lower-level functionality that differs in its
realization from one storage approach to the other. Since LeagueStore and
LeagueStoreImplementor provide different behaviors, they cannot be treated as subtypes
according to the Liskov Substitution Principle.
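The following Java sketch illustrates this combination for the LeagueStore bridge; the operation names and the exact division of work between the two sides of the bridge are simplifying assumptions, not the ARENA code.

```java
// Implementor side of the bridge: the common low-level storage interface.
interface LeagueStoreImplementor {
    void write(String leagueData);
}

class StubStoreImplementor implements LeagueStoreImplementor {
    public void write(String leagueData) { /* no-op stub for early integration tests */ }
}

class XMLStoreImplementor implements LeagueStoreImplementor {
    public void write(String leagueData) {
        System.out.println("<league>" + leagueData + "</league>"); // file-based variant
    }
}

// Abstraction side: high-level storage behavior, delegating the low-level work.
class LeagueStore {
    private LeagueStoreImplementor implementor;

    LeagueStore(LeagueStoreImplementor implementor) { this.implementor = implementor; }

    // Implementations can be substituted at run time without touching clients.
    void setImplementor(LeagueStoreImplementor implementor) { this.implementor = implementor; }

    void storeLeague(String name) {
        // high-level behavior (validation, bookkeeping) would live here
        implementor.write(name);
    }
}

public class BridgeSketch {
    public static void main(String[] args) {
        LeagueStore store = new LeagueStore(new StubStoreImplementor());
        store.storeLeague("tictactoe-league");       // early prototype: stub
        store.setImplementor(new XMLStoreImplementor());
        store.storeLeague("tictactoe-league");       // first iteration: file-based
    }
}
```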
uses the ClientInterface to work with instances of Adapter transparently and without modification
of the client. On the other hand, the same Adapter can be used for subtypes of the LegacyClass.
Note that the Bridge and the Adapter patterns are similar in purpose and structure. Both
decouple an interface from an implementation, and both use a specification inheritance relationship
and a delegation relationship. They differ in the context in which they are used and in the order in
which delegation and inheritance occur. The Adapter pattern uses inheritance first and then
delegation, whereas the Bridge pattern uses delegation first and then inheritance. The Adapter
pattern is applied when the interface (i.e., ClientInterface) and the implementation (i.e.,
LegacyClass) already exist and cannot be modified. When developing new code, the Bridge pattern
is a better choice as it provides more extensibility.
8.4.3 Encapsulating Context with the Strategy Pattern
Consider a mobile application running on a wearable computer that uses different network
protocols depending on the location of the user: assume, for example, a car mechanic using the
wearable computer to access repair manuals and maintenance records for the vehicle under repair.
The wearable computer should operate in the shop with access to a local wireless network as well
as on the roadside using a third-generation mobile phone network, such as UMTS. When updating
or configuring the mobile application, a system administrator should be able to use the wearable
computer with access to a wired network such as Ethernet. This means that the mobile application
needs to deal with different types of networks as it switches between networks dynamically, based
on factors such as location and network costs. Assume that during the system design of this
application, we identify the dynamic switching between wired and wireless networks as a critical
design goal. Furthermore, we want to be able to deal with future network protocols without having
to recompile the application.
To achieve both of these goals, we apply the Strategy design pattern (Appendix A.9,
[Gamma et al., 1994]). The system model and implementation, respectively, are shown in Figures
8-10 and 8-11. The Strategy class is realized by NetworkInterface, which provides the common
interface to all networks; the Context class is realized by a NetworkConnection object, which
represents a point-to-point connection between the wearable and a remote host. The Client is the
mobile application. The Policy is the LocationManager, which monitors the current location of the
wearable and the availability of networks, and configures the NetworkConnection objects with the
appropriate NetworkInterfaces. When the LocationManager object invokes the
setNetworkInterface() method, the NetworkConnection object shuts down the current
NetworkInterface and initializes the new NetworkInterface transparently from the rest of the
application.
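A simplified Java sketch of this use of the Strategy pattern follows; the operations on NetworkInterface and the concrete protocol classes are assumptions for illustration, and the LocationManager policy is reduced to a direct call.

```java
// Strategy: the common interface to all network protocols.
interface NetworkInterface {
    void open();
    void send(String message);
    void close();
}

class WirelessLan implements NetworkInterface {
    public void open()         { System.out.println("WLAN up"); }
    public void send(String m) { System.out.println("WLAN: " + m); }
    public void close()        { System.out.println("WLAN down"); }
}

class Umts implements NetworkInterface {
    public void open()         { System.out.println("UMTS up"); }
    public void send(String m) { System.out.println("UMTS: " + m); }
    public void close()        { System.out.println("UMTS down"); }
}

// Context: a point-to-point connection that is unaware of the concrete network.
class NetworkConnection {
    private NetworkInterface network;

    NetworkConnection(NetworkInterface initial) { network = initial; network.open(); }

    // Called by the policy (e.g., a LocationManager) when the location changes.
    void setNetworkInterface(NetworkInterface next) {
        network.close();      // shut down the current network...
        network = next;
        network.open();       // ...and bring up the new one, transparently to clients
    }

    void send(String message) { network.send(message); }
}

public class StrategySketch {
    public static void main(String[] args) {
        NetworkConnection connection = new NetworkConnection(new WirelessLan());
        connection.send("request repair manual");
        connection.setNetworkInterface(new Umts());   // mechanic leaves the shop
        connection.send("request maintenance record");
    }
}
```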
We want ARENA to support a broad spectrum of games, so we do not want the classes responsible for
recording and replaying moves to depend on any specific game.
We can apply the Command design pattern (Appendix A.4, [Gamma et al., 1994]) to this
effect. The key to decoupling game moves from their handling is to represent game moves as
command objects that inherit from an abstract class called Move in Figure 8-13. The Move class declares operations for executing, undoing, and storing commands, whereas concrete command classes (i.e., TicTacToeMove and ChessMove in ARENA) implement specific commands. The classes responsible for recording and replaying games access only the Move abstract class interface, thus making the system extensible to new Games.
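Schematically, the command classes could be coded in Java as follows (GameBoard and its mark() and clear() operations are hypothetical receiver details, used only to make the sketch complete):

public interface GameBoard {                        // hypothetical receiver of the moves
    void mark(int row, int column);
    void clear(int row, int column);
}

public abstract class Move {                        // abstract command
    public abstract void execute(GameBoard board);  // perform the move
    public abstract void undo(GameBoard board);     // revert the move
}

public class TicTacToeMove extends Move {           // concrete command for one game
    private final int row;
    private final int column;

    public TicTacToeMove(int row, int column) {
        this.row = row;
        this.column = column;
    }
    public void execute(GameBoard board) { board.mark(row, column); }
    public void undo(GameBoard board) { board.clear(row, column); }
}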
manipulated the same way as the concrete user interface objects. For example, our preferences
dialog can include a top panel for the title of the dialog and instructions for the user, a center panel
containing the checkboxes and their labels, and a bottom panel for the ‘ok’ and ‘cancel’ buttons.
Each panel is responsible for the layout of its subpanels, called “children,” and the overall dialog
only has to deal with the three panels.
Swing addresses this problem with the Composite design pattern (Appendix A.5,
[Gamma et al., 1994]) as depicted in Figure 8-16. An abstract class called Component is the root
of all user interface objects, including Checkboxes, Buttons, and Labels. Composite, also a
subclass of Component, is a special user interface object representing aggregates including the
Panels we mentioned above. Note that Windows and Applets (the root of the instance hierarchy)
are also Composite classes that have additional behavior for dealing with the window manager and
the browser, respectively.
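The essence of the pattern can be sketched in Java as follows (a generic illustration, not Swing's actual class hierarchy):

import java.util.ArrayList;
import java.util.List;

public abstract class Component {                   // root of all user interface objects
    public abstract void draw();
}

public class Button extends Component {             // a leaf object
    public void draw() { /* draw the button */ }
}

public class Composite extends Component {          // an aggregate of Components
    private final List<Component> children = new ArrayList<Component>();

    public void add(Component child) { children.add(child); }

    public void draw() {
        for (Component child : children) {
            child.draw();                           // a Composite is drawn by drawing its children
        }
    }
}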
Java Swing [JFC, 2009]. System infrastructure frameworks are used internally within a software
project and are usually not delivered to a client.
• Middleware frameworks are used to integrate existing distributed applications and components.
Common examples include Microsoft’s MFC and DCOM, Java RMI, WebObjects [Wilson &
Ostrem, 1999], WebSphere [IBM], WebLogic Enterprise Application [BEA], implementations of
CORBA [OMG, 2008], and transactional databases.
• Enterprise application frameworks are application specific and focus on domains such as
telecommunications, avionics, environmental modeling, manufacturing, financial engineering
[Birrer, 1993], and enterprise business activities [JavaEE, 2009].
Infrastructure and middleware frameworks are essential for rapidly creating high-quality
software systems, but they are usually not requested by external customers. Enterprise frameworks,
however, support the development of end-user applications. As a result, buying infrastructure and
middleware frameworks is more cost effective than building them [Fayad & Hamu, 1997].
Frameworks can also be classified by the techniques used to extend them.
• Whitebox frameworks rely on inheritance and dynamic binding for extensibility.
Existing functionality is extended by subclassing framework base classes and overriding
predefined hook methods using patterns such as the template method pattern [Gamma et al., 1994].
• Blackbox frameworks support extensibility by defining interfaces for components that
can be plugged into the framework. Existing functionality is reused by defining components that
conform to a particular interface and integrating these components with the framework using
delegation.
Whitebox frameworks require intimate knowledge of the framework’s internal structure.
Whitebox frameworks produce systems that are tightly coupled to the specific details of the
framework’s inheritance hierarchies, and thus changes in the framework can require the
recompilation of the application. Blackbox frameworks are easier to use than whitebox
frameworks because they rely on delegation instead of inheritance. However, blackbox
frameworks are more difficult to develop because they require the definition of interfaces and
hooks that anticipate a wide range of potential use cases. Moreover, it is easier to extend and
reconfigure blackbox frameworks dynamically, as they emphasize dynamic object relationships
rather than static class relationships [Johnson & Foote, 1988].
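The difference between the two extension styles can be sketched in Java as follows (all class and method names are illustrative, not taken from any specific framework):

// Whitebox style: the application extends the framework by subclassing and
// overriding a hook method (Template Method pattern).
public abstract class ReportGenerator {
    public final void generate() {                  // template method fixed by the framework
        String body = formatBody();                 // hook supplied by the application subclass
        print(body);
    }
    protected abstract String formatBody();         // hook method
    private void print(String body) { System.out.println(body); }
}

// Blackbox style: the application plugs a component behind a framework-defined
// interface; the framework reuses it through delegation.
public interface BodyFormatter {
    String formatBody();
}

public class ReportEngine {
    private final BodyFormatter formatter;          // plugged-in component

    public ReportEngine(BodyFormatter formatter) { this.formatter = formatter; }

    public void generate() {
        System.out.println(formatter.formatBody()); // delegation instead of inheritance
    }
}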
Frameworks, class libraries, and design patterns
Frameworks are closely related to design patterns, class libraries, and components.
Design patterns versus frameworks. The main difference between frameworks and patterns is
that frameworks focus on reuse of concrete designs, algorithms, and implementations in a
particular programming language. In contrast, patterns focus on reuse of abstract designs and small
collections of cooperating classes. Frameworks focus on a particular application domain, whereas
design patterns can be viewed more as building blocks of frameworks.
Class libraries versus frameworks. Classes in a framework cooperate to provide a reusable
architectural skeleton for a family of related applications. In contrast, class libraries are less domain
specific and provide a smaller scope of reuse. For instance, class library components, such as
classes for strings, complex numbers, arrays, and bitsets, can be used across many application
domains. Class libraries are typically passive; that is, they do not implement or constrain the
control flow. Frameworks, however, are active; that is, they control the flow of control within an
application. In practice, developers often use frameworks and class libraries in the same system.
For instance, frameworks use class libraries, such as foundation classes, internally to simplify the
development of the framework. Similarly, application-specific code invoked by framework event
handlers uses class libraries to perform basic tasks, such as string processing, file management,
and numerical analysis.
Components versus frameworks. Components are self-contained instances of classes that are
plugged together to form complete applications. In terms of reuse, a component is a blackbox that
defines a cohesive set of operations that can be used solely with knowledge of the syntax and
semantics of its interface. Compared with frameworks, components are less tightly coupled and
can even be reused on the binary code level. That is, applications can reuse components without
having to subclass from existing base classes. The advantage is that applications do not always
have to be recompiled when components change. The relationship between frameworks and
components is not predetermined. On the one hand, frameworks can be used to develop
components, where the component interface provides a facade pattern for the internal class
structure of the framework. On the other hand, components can be plugged into blackbox
frameworks. In general, frameworks are used to simplify the development of infrastructure and
middleware software, whereas components are used to simplify the development of end-user
application software.
8.5 Managing Reuse
Historically, software development started as a craft, in which each application was custom made
according to the wishes and needs of a single customer. After all, software development
represented only a fraction of the cost of hardware, and computing solutions were affordable only
to a few. With the price of hardware dropping and computing power increasing exponentially, the number of customers and the range of applications have broadened dramatically. At the same time,
software costs increased as applications became more complex. This trend reached the point where
software represented the largest cost in any computing solution, putting tremendous economic
pressure on the project manager to reduce the cost of software. With no silver bullet in sight,
systematic reuse of code, designs, and processes became an attractive solution. Reuse, whether
design patterns, frameworks, or components, has many technical and managerial advantages:
• Lower development effort. When reusing a solution or a component, many standard
errors are avoided. Moreover, in the case of design patterns, the resulting system is more easily
extended and more resilient to typical changes. This results in less development effort and reduces
the need for human resources, which can be redirected to testing the software to ensure better
quality.
• Lower risk. When repeatedly reusing the same design pattern or component, the typical problems that will be encountered are known and can be anticipated. Moreover, the time needed to adapt the design pattern or to integrate the component is also known, resulting in a more predictable development process and fewer risks.
• Widespread use of standard terms. The reuse of a standard set of design patterns and
components fosters the use of a standard vocabulary. For example, terms such as Adapter, Bridge,
Command, or Facade denote precise concepts that all developers become familiar with. This
reduces the number of different terms and solutions to common problems and reduces
misunderstandings among developers.
• Increased reliability. Reuse by itself does not increase reliability or reduce the need for
testing (see the Ariane 501 incident in Section 3.1 as an illustrative example). Components and
pattern solutions that worked in one context can exhibit unexpected failures in other contexts.
However, a culture of reuse in a software organization can increase reliability for all of the above
reasons: reduced development time can lead to an increased testing effort, repetitive use of
components can lead to a knowledge base of typical problems to be anticipated, and use of standard
terms reduces communication failures.
Unfortunately, reuse does not occur spontaneously within a development organization. The
main challenges include
• NIH (Not Invented Here) syndrome. Since software engineering education (at least
until recently) emphasizes mostly the design of new solutions, developers often distrust the reuse
of existing solutions, especially when the customization of the solution under consideration is
limited or constrained. In such situations, developers believe that they can develop a completely
new solution that is better adapted to their specific problem (which is usually true) in less time
than what they need to understand the reused solution (which is usually not true). Moreover, the
advantages of reuse are visible only in the longer term, while the gratification of developing a new
implementation is instantaneous.
• Process support. The processes associated with identifying, reusing, and customizing an
existing solution are different than those involved in creating a brand-new solution. The first set
of activities requires painstakingly sifting through a large and evolving corpus of knowledge and
carefully evaluating the findings. The second set of activities requires creativity and a good
understanding of the problem. Most software engineering tools and methods are better adapted to
creative activities than to reuse. For example, there are currently many catalogs of design patterns,
but no systematic method for novice developers to quickly identify the appropriate pattern for a given situation.
• Training. Given the lack of knowledge support tools for reuse, training is the single most
effective method in establishing a reuse culture. Consequently, the burden of educating developers
to specific reusable solutions and components falls on the development organization.
8.5.1 Documenting Reuse
Reuse activities involve two types of documentation: the documentation of the template solution
being reused and the documentation of the system that is reusing the solution.
The documentation of a reusable solution (e.g., a design pattern, a framework, or a
component) includes not only a description of the solution, but also a description of the class of
problems it addresses, the trade-offs faced by the developer, alternative implementations, and
examples of use. This documentation is typically difficult to produce, as the author of the reusable
solution may not be able to anticipate all the problems it can be used for. Moreover, such
documentation is usually generic and abstract and must be illustrated by concrete examples for
novice developers to fully understand the parameters of the solution. Consequently, documentation
of a reusable solution is usually not ideal. However, developers can incrementally improve this
documentation each time they reuse a solution by adding the following:
• Reference to a system using the solution. Minimally, the documentation of the reusable
solution should include references to each use. If defects are discovered in the reused solution,
these defects can be systematically corrected in all occurrences of reuse.
• Example of use. Examples are essential for developers to understand the strengths and
limitations of the reused solution. Each occurrence of reuse constitutes an example. Developers
should include a brief summary illustrating the problems being solved and the adopted solution.
• Alternative solutions considered. As we saw in this chapter, many design patterns are
similar. However, selecting the wrong pattern can lead to more problems than developing a custom
solution. In the documentation of the example, developers should indicate which other candidate
solutions they discarded and why.
• Encountered trade-offs. Reuse, especially in the case of frameworks and components,
often entails making a compromise and selecting a less-than-optimal solution for some criteria.
For example, one component may offer an interface that is extensible, and another may deliver
better response time.
The documentation of the system under construction should minimally include references
to all the reused solutions. For example, design patterns are not immediately identifiable in the
code, as the classes involved usually have names different from names used in the standard pattern.
Many patterns draw their benefits from the decoupling of certain classes (e.g., the bridge client
from the bridge implementations), so such classes should remain decoupled during future changes
to the system. Similarly, explicitly documenting which classes use which components makes it
easier to adapt the client classes to newer versions of the reused components. Consequently,
developers can further increase the benefits of reuse by documenting the links between reused
solutions and their code, in addition to the standard object design documentation.
A contributing factor for the high cost of change late in the process is the loss of design
context. Developers quickly forget the reasons behind designing complicated workarounds or
complex data structures during early phases of the process. When changing code late in the
process, the probability of introducing errors into the system is high. Hence, the reason for
recording trade-offs, examples, alternatives, and other decision-making information is also to
reduce the cost of change.
8.5.2 Assigning Responsibilities
Individual developers assigned to subsystems will not spontaneously turn to design patterns and
components unless they have experience with these topics. To foster a reuse culture, an
organization needs to make the incentives of reuse as high as possible for the individual developer.
This includes access to expert developers who can provide advice and information, and specific
components or patterns, training, and emphasis on reuse during design reviews and code
inspections. The availability of knowledge lowers the frustration developers experience while climbing the learning curve associated with a component. The explicit review of pattern usage (or lack thereof) increases the organizational incentive for investing time in looking for ready-made solutions.
Below are the main roles involved in reuse:
• Component expert. The component expert is familiar with using a specific component.
The component expert is a developer and usually has received third-party training in the use of the
component.
• Pattern expert. The pattern expert is the analog of the component expert for a family of
design patterns. However, pattern experts are usually self-made and acquire their knowledge from
experience.
• Technical writer. The technical writer must be aware of reuse and document
dependencies between components, design patterns, and the system, as discussed in the previous
section. This may require the technical writer to become familiar with the solutions typically
reused by the organization and with their associated terms.
• Configuration manager. In addition to tracking configurations and versions of
individual subsystems, the configuration manager must also be aware of the versions of the
components that are used. While newer versions of the components may be used, their introduction
requires tests to be repeated and changes related to the upgrade documented.
The technical means of achieving reuse (e.g., inheritance, delegation, design patterns,
application frameworks) have been available to software engineers for nearly two decades. The
success factors associated with reuse are actually not technical, but managerial. Only an
organization that provides the tools for selecting and improving reusable solutions and the culture
to encourage their use can reap the benefits of design and code reuse.
(Note: the ARENA case study is in Section 8.6, page 338/333, of the textbook Object-Oriented Software Engineering: Using UML, Patterns, and Java by Bernd Bruegge and Allen H. Dutoit.)
Module 4, Chapter 2
Object Design: Specifying Interfaces
During object design, we identify and refine solution objects to realize the subsystems defined
during system design. During this activity, our understanding of each object deepens: we specify
the type signatures and the visibility of each of the operations, and, finally, we describe the
conditions under which an operation can be invoked and those under which the operation raises an
exception. Whereas the focus of system design was on identifying large chunks of work that could be assigned to individual teams or developers, the focus of object design is on specifying the boundaries between objects. At this stage in the project, a large number of developers concurrently refine and change many objects and their interfaces. The pressure to deliver is increasing and the
opportunity to introduce new, complex faults into the design is still there. The focus of interface
specification is for developers to communicate clearly and precisely about increasingly lower-level
details of the system.
The interface specification activities of object design include
• identifying missing attributes and operations
• specifying type signatures and visibility
• specifying invariants
• specifying preconditions and postconditions.
In this chapter, we provide an overview of the concepts of interface specification. We
introduce OCL (Object Constraint Language) as a language for specifying invariants,
preconditions, and postconditions. We discuss heuristics and stylistic guidelines for writing
readable constraints. Finally, we examine the issues related to documenting and managing
interface specifications.
9.2 An Overview of Interface Specification
At this point in system development, we have made many decisions about the system and produced
a wealth of models:
• The analysis object model describes the entity, boundary, and control objects that are
visible to the user. The analysis object model includes attributes and operations for each object.
• Subsystem decomposition describes how these objects are partitioned into cohesive
pieces that are realized by different teams of developers. Each subsystem includes high level
service descriptions that indicate which functionality it provides to the others.
• Hardware/software mapping identifies the components that make up the virtual
machine on which we build solution objects. This may include classes and APIs defined by existing
components.
• Boundary use cases describe, from the user’s point of view, administrative and
exceptional cases that the system handles.
• Design patterns selected during object design reuse describe partial object design models
addressing specific design issues.
All these models, however, reflect only a partial view of the system. Many puzzle pieces
are still missing and many others are yet to be refined. The goal of object design is to produce an
object design model that integrates all of the above information into a coherent and precise whole.
The goal of interface specification, the focus of this chapter, is to describe the interface of each
object precisely enough so that objects realized by individual developers fit together with minimal
integration issues. To this end, interface specification includes the following activities:
• Identify missing attributes and operations. During this activity, we examine each
subsystem service and each analysis object. We identify missing operations and attributes that are
needed to realize the subsystem service. We refine the current object design model and augment it
with these operations.
• Specify visibility and signatures. During this activity, we decide which operations are
available to other objects and subsystems, and which are used only within a subsystem. We also
specify the return type of each operation as well as the number and type of its parameters. The goal of this activity is to reduce coupling among subsystems and provide a small and simple
interface that can be understood easily by a single developer.
• Specify contracts. During this activity, we describe in terms of constraints the behavior
of the operations provided by each object. In particular, for each operation, we describe the
conditions that must be met before the operation is invoked and a specification of the result after
the operation returns.
The large number of objects and developers, the high rate of change, and the number of concurrent decisions made during object design make object design much more complex than
analysis or system design. This represents a management challenge, as many important decisions
tend to be resolved independently and are not communicated to the rest of the project. Object
design requires much information to be made available among the developers so that decisions can
be made consistent with decisions made by other developers and consistent with design goals. The
Object Design Document, a live document describing the specification of each class, supports this
information exchange.
9.3 Interface Specification Concepts
In this section, we present the principal concepts of interface specification:
• Class Implementor, Class Extender, and Class User (Section 9.3.1)
• Types, Signatures, and Visibility (Section 9.3.2)
• Contracts: Invariants, Preconditions, and Postconditions (Section 9.3.3)
• Object Constraint Language (Section 9.3.4)
• OCL Collections: Sets, Bags, and Sequences (Section 9.3.5)
• OCL Quantifiers: forAll and exists (Section 9.3.6).
9.3.1 Class Implementor, Class Extender, and Class User
So far, we have treated all developers as equal. Now that we are delving into the details of object
design and implementation, we need to differentiate developers based on their point of view. While
all use the interface specification to communicate about the class of interest, they view the
specifications from radically different points of view (see also Figure 9-1):
• The class implementor is responsible for realizing the class under consideration. Class
implementors design the internal data structures and implement the code for each public operation.
For them, the interface specification is a work assignment.
• The class user invokes the operations provided by the class under consideration during
the realization of another class, called the client class. For class users, the interface specification
discloses the boundary of the class in terms of the services it provides and the assumptions it makes
about the client class.
• The class extender develops specializations of the class under consideration. Like class users, class extenders may invoke operations provided by the class of interest; however, class extenders focus on specialized versions of the same services. For them, the interface specification specifies both the current behavior of the class and any constraints on the services provided by the specialized class.
Type information alone is often not sufficient to specify the range of legitimate values of
an attribute. In the Tournament example, the int type allows maxNumPlayers to take negative
values, which does not make sense in the application domain. We address this issue with contracts.
9.3.3 Contracts: Invariants, Preconditions, and Postconditions
Contracts are constraints on a class that enable class users, implementors, and extenders to share
the same assumptions about the class [Meyer, 1997]. A contract specifies constraints that the class
user must meet before using the class, as well as constraints that the class implementor and the class extender ensure in return. Contracts include three types of constraints:
• An invariant is a predicate that is always true for all instances of a class. Invariants are
constraints associated with classes or interfaces. Invariants are used to specify consistency
constraints among class attributes.
• A precondition is a predicate that must be true before an operation is invoked.
Preconditions are associated with a specific operation. Preconditions are used to specify constraints
that a class user must meet before calling the operation.
• A postcondition is a predicate that must be true after an operation is invoked.
Postconditions are associated with a specific operation. Postconditions are used to specify
constraints that the class implementor and the class extender must ensure after the invocation of
the operation.
For example, consider the Java interface for the Tournament from Figure 9-3. This class
provides an acceptPlayer() method to add a Player to the Tournament, a removePlayer() method
to withdraw a Player from the Tournament (e.g., because the player cancelled his application), and
a getMaxNumPlayers() method to get the maximum number of Players who can participate in this
Tournament.
An example of an invariant for the Tournament class is that the maximum number of
Players in the Tournament should be positive. If a Tournament is created with a maxNumPlayers
that is zero, the acceptPlayer() method will always violate its contract and the Tournament will
never start. Using a boolean expression, in which t is a Tournament, we can express this invariant
as
t.getMaxNumPlayers() > 0
An example of a precondition for the acceptPlayer() method is that the Player to be added
has not already been accepted in the Tournament and that the Tournament has not yet reached its maximum number of Players. Using a boolean expression, in which t is a Tournament and p is a Player, we express this precondition as
!t.isPlayerAccepted(p) and t.getNumPlayers() < t.getMaxNumPlayers()
An example of a postcondition for the acceptPlayer() method is that the current number
of Players must be exactly one more than the number of Players before the invocation of
acceptPlayer(). We can express this postcondition as
t.getNumPlayers_afterAccept = t.getNumPlayers_beforeAccept + 1
where getNumPlayers_beforeAccept and getNumPlayers_afterAccept denote the number of Players before and after the invocation of acceptPlayer(), respectively.
Attaching OCL expressions to diagrams can lead to clutter. For this reason, OCL
expressions can be alternatively expressed in a textual form. For example, the invariant for the
Tournament class requiring the attribute maxNumPlayers to be positive is written as follows:
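In OCL's textual form, this invariant can be written as follows (a minimal formulation):

context Tournament inv:
    self.maxNumPlayers > 0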
The context keyword indicates the entity to which the expression applies. This is followed
by one of the keywords inv, pre, and post, which correspond to the UML stereotypes «invariant»,
«precondition», and «postcondition», respectively. Then follows the actual OCL expression.
OCL’s syntax is similar to object-oriented languages such as C++ or Java. However, OCL is not a
procedural language and thus cannot be used to denote control flow. Operations can be used in
OCL expressions only if they do not have any side effects.
For invariants, the context for the expression is the class associated with the invariant. The
keyword self (e.g., self.numElements) denotes all instances of the class. Attributes and operations are accessed using the dot notation (e.g., self.maxNumPlayers accesses maxNumPlayers in the current context). The self keyword can be omitted if there is no ambiguity.
For preconditions and postconditions, the context of the OCL expression is an operation.
The parameters passed to the operation can be used as variables in the expression. For example,
consider the following precondition on the acceptPlayer() operation in Tournament:
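In OCL, this precondition can be written along the following lines (the operation names match those used in the boolean expression earlier):

context Tournament::acceptPlayer(p: Player) pre:
    not isPlayerAccepted(p) and getNumPlayers() < getMaxNumPlayers()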
The creators and users of OCL constraints are developers during object design and during
implementation. In Java programs, tools such as iContract [Kramer, 1998] enable developers to
document constraints in the source code using Javadoc style tags, so that constraints are more
readily accessed and updated. Figure 9-5 depicts the Java code corresponding to the constraints
introduced so far.
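Figure 9-5 is not reproduced here; the fragment below sketches what such annotated code could look like. The Javadoc-style tags approximate iContract's notation, and the class skeleton is illustrative rather than ARENA's actual code.

import java.util.ArrayList;
import java.util.List;

/**
 * A Tournament accepts a bounded number of Players.
 * @invariant getMaxNumPlayers() > 0
 */
public class Tournament {
    private int maxNumPlayers;
    private List<Player> players = new ArrayList<Player>();  // Player is assumed to be defined elsewhere in ARENA

    public int getMaxNumPlayers() { return maxNumPlayers; }
    public int getNumPlayers()    { return players.size(); }
    public boolean isPlayerAccepted(Player p) { return players.contains(p); }

    /**
     * Accepts a Player into this Tournament.
     * @pre  !isPlayerAccepted(p) && getNumPlayers() < getMaxNumPlayers()
     * @post isPlayerAccepted(p)
     */
    public void acceptPlayer(Player p) { players.add(p); }
}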
Now, let’s review the above constraints in terms of the instances of Figure 9-7:
1. The winter:Tournament lasts two days, the xmas:Tournament three days, both under a
week.
2. All Players of the winter:Tournament and the xmas:Tournament are associated with
tttExpert:League. The Player zoe, however, is not part of the tttExpert:League and does
not take part in either Tournament.
3. tttExpert:League has four active Players, whereas the chessNovice:League has none,
because zoe does not take part in any Tournament.
At first sight, these constraints vary quite a bit: for example, the first constraint involves
attributes of a single class (Tournament.start and Tournament.end); the second one involves three
classes (i.e., Player, Tournament, League) and their associations; the third involves a set of Matches
within a single Tournament. In all cases, we start with the class of interest and navigate to one or
more classes in the model.
All constraints can be built using a combination of these three basic cases of navigation. We already know how to deal with the first type of constraint with the dot notation, as we saw in the
previous section. For example, we can write constraint 1 as follows:
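One possible formulation, assuming the difference between the end and start attributes can be compared against a one-week constant (Calendar.WEEK is an illustrative name, not defined in these notes):

context Tournament inv:
    self.end - self.start <= Calendar.WEEK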
In the second constraint, however, the expression league.players can actually refer to many
objects, since the players association is a many-to-many association. To deal with this situation,
OCL provides additional data types called collections. There are three types of collections:
• OCL sets are used when navigating a single association. For example, navigating the
players association of the winter:Tournament yields the set {alice, bob}. Navigating the players
association from the tttExpert:League yields the set {alice, bob, marc, joe}. Note, however, that
navigating an association of multiplicity 1 yields directly an object, not a set. For example,
navigating the league association from winter:Tournament yields tttExpert:League (as opposed to
{tttExpert:League}).
• OCL sequences are used when navigating a single ordered association. For example, the
association between League and Tournament is ordered. Hence, navigating the tournaments
association from tttExpert:League yields [winter:Tournament, xmas:Tournament] with the index
of winter:Tournament and xmas:Tournament being 1 and 2, respectively.
• OCL bags are multisets: they can contain the same object multiple times. Bags are used
to accumulate the objects when accessing indirectly related objects. For example, when
determining which Players are active in the tttExpert:League, we first navigate the tournaments
association of tttExpert, then the players association from winter:Tournament, and finally the
players association from xmas:Tournament, yielding the bag {alice, bob, bob, marc, joe}. The bag
resulting from navigating the same associations from chessNovice:League results in the empty
bag, as there are no Tournaments in the chessNovice:League. In cases where the number of occurrences
of each object in the bag is undesired, the bag can be converted to a set.
OCL provides many operations for accessing collections. The most often used are
• size, which returns the number of elements in the collection
• includes(object), which returns True if object is in the collection
• select(expression), which returns a collection that contains only the elements of the
original collection for which expression is True
• union(collection), which returns a collection containing elements from both the original
collection and the collection specified as parameter
• intersection(collection), which returns a collection that contains only the elements that
are part of both the original collection and the collection specified as parameter
• asSet(collection), which returns a set containing each element of the collection.
To distinguish attributes in classes from collections, OCL uses the dot notation for
accessing attributes and the -> operator for accessing collections. For example, constraint 2 (on
page 361) can be expressed with an includes operation as follows:
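One such formulation (a sketch consistent with the explanation that follows) is:

context Tournament inv:
    self.players->forAll(p: Player | self.league.players->includes(p))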
The next association we navigate is the players association on the League, which results in
a set because of the “many” multiplicities of the association. We use the OCL includes() operation
on this set to test if the Player p is known to the League.
Navigating a series of at least two associations with one-to-many or many-to-many
multiplicity results in a bag. For example, in the context of a League, the expression
tournaments.players contains the concatenation of all players of the Tournaments related to the
current League. As a result of this concatenation, elements can appear several times. To remove
the duplicates in this bag, for example, when counting the number of Players in a League that have
taken part in a Tournament, we can convert the bag into a set using the OCL asSet operation.
Consequently, we can write constraint 3 (on page 361) as follows:
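Assuming the League class records its number of active Players in an activePlayers attribute (an illustrative name), one possible formulation is:

context League inv:
    self.activePlayers = self.tournaments.players->asSet()->size()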
The OCL exists() operation is similar to forAll(), except that the expressions evaluated on
each element are ORed, that is, only one element needs to satisfy the expression for the exists()
operation to return True. For example, to ensure that each Tournament conducts at least one Match
on the first day of the Tournament, we can write:
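Assuming that both Match and Tournament carry a start attribute, one way to write this is:

context Tournament inv:
    self.matches->exists(m: Match | m.start = self.start)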
We also consider the relationship between the classes we identified and the classes from
existing components. For example, a number of classes implementing collections are provided in
the java.util package. The List interface provides a way to access an ordered collection of objects
independent from the underlying data structure. The Map interface provides a table mapping from
unique keys to arbitrary entries. We select the List interface for returning collections of objects,
such as the Tournaments to which a Player has been accepted. We select the Map interface for
returning mappings of objects, for example, Player to Scores.
Finally, we determine the visibility of each attribute and operation during this step. By
doing so, we determine which attributes should be accessible only indirectly via the class’s
operations, and which attributes are public and can be modified by any other class. Similarly, the
visibility of operations allows us to distinguish between operations that are part of the class
interface and those that are utility methods that can only be accessed by the class. In the case of
abstract classes and classes that are intended to be refined, we also define protected attributes and
methods for the use of subclasses only. Figure 9-11 depicts the refinement of the object model
depicted in Figure 9-9 after types, signatures, and visibility have been assigned.
Once we have specified the types of each attribute, the signature of each operation, and its
visibility, we focus on specifying the behavior and boundary cases of each class by using contracts.
Below is the complete set of OCL constraints specifying the order of selectSponsors(),
advertiseTournament(), and acceptPlayer().
An example of an invariant that is not so obvious, but can be identified by examining the
contracts of the TournamentControl class, is that no Player can take part in two or more
Tournaments that overlap. Although this property can be inferred by examining the TournamentControl.isPlayerOverbooked() operation, we can write this concisely as an invariant.
Since it is a policy decision, we attach this invariant to the TournamentControl class, as opposed
to the Player or the Tournament class.
When specified on several associations, constraints usually become complex and difficult
to understand, especially when nested forAll statements are used. For example, consider an
invariant stating that all Matches in a Tournament must involve only Players that are accepted in
the Tournament:
This constraint involves three collections: players, p.tournaments, and t.matches. We can
simplify this expression by using a bag created while navigating a series of associations:
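For instance, in the context of a Tournament, the bag obtained by navigating matches and then players allows a compact formulation such as the following sketch:

context Tournament inv:
    self.matches.players->forAll(p: Player | self.players->includes(p))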
In general, reducing the number of operations and nesting levels in a constraint makes it
much more understandable.
As illustrated by these examples, it is relatively easy to generate a large number of
constraints for each class. This does not guarantee readability. In fact, writing readable and correct
constraints is difficult. Remember that the reason for writing invariants is to clarify the
assumptions made by the class implementor to the class user. Consequently, when writing
constraints, the class implementor should focus on simple, short constraints that describe boundary
cases that may not otherwise be obvious. Figure 9-12 lists several heuristics to make constraints
more readable.
be taken out of the User hierarchy (i.e., Spectator does not fulfill the User contract) or the invariant
should be revised (i.e., the terms of the contract should be reformulated).
• ODD embedded into source code. The third approach is to embed the ODD into the
source code. As in the first approach, we represent the ODD using a modeling tool (see Figure 9-
14). Once the ODD becomes stable, we use the modeling tool to generate class stubs. We describe
each class interface using tagged comments that distinguish source code comments from object
design descriptions. We can then generate the ODD using a tool that parses the source code and
extracts the relevant information (e.g., Javadoc [Javadoc, 2009a]). Once the object design model
is documented in the code, we abandon the initial object design model. The advantage of this
approach is that the consistency between the object design model and the source code is much
easier to maintain: when changes are made to the source code, the tagged comments are updated
and the ODD regenerated. In this section, we focus only on this approach.
The fundamental issue is one of maintaining consistency among two models and the source
code. Ideally, we want to maintain the analysis model, the object design model, and the source
code using a single tool. Objects would then be described once, and consistency among
documentation, stubs, and code would be maintained automatically.
Presently, however, UML modeling tools provide facilities for generating a document from
a model or class stubs from a model. For example, the glossary of the RAD can be generated from
the analysis model by collating the description fields attached to each class (Figure 9-14). The
class stub generation facility, called forward engineering, can be used in the self-contained ODD
approach to generate the class interfaces and stubs for each method.
Some modeling tools provide facilities for reverse engineering, that is, recreating a UML
model from source code. Such facilities are useful for creating object models from legacy code.
They require substantial hand processing, however, because the tool cannot recreate bidirectional
associations based on reference attributes only.
Tool support currently falls short when maintaining two-way dependencies, in particular
between the analysis model and the source code. Some tools, such as Rational Rose [Rational,
2002] and Together Control Center [TogetherSoft, 2002], realize this functionality by embedding
information about associations and other UML constructs in source code comments. Even though
this allows the tool to recover syntactic changes from the source code, developers must still update
model descriptions to reflect the changes. Because developers need different tools to change the
source code and the model, the model often falls behind.
Figure 9-15 is an example template for a generated ODD.
The first section of the ODD is an introduction to the document. It describes the general
trade-offs made by developers (e.g., buy vs. build, memory space vs. response time), guidelines
and conventions (e.g., naming conventions, boundary cases, exception handling mechanisms), and
an overview of the document.
Interface documentation guidelines and coding conventions are the single most important
factor that can improve communication between developers during object design. These include a
list of rules that developers should use when designing and naming interfaces. These are examples
of such conventions:
• Classes are named with singular nouns.
• Methods are named with verb phrases; fields and parameters are named with noun phrases.
• Error status is returned via an exception, not a return value.
• Collections and containers have an iterator() method returning an Iterator.
• Iterators returned by iterator() methods are robust to element removals.
The second section of the ODD, Packages, describes the decomposition of subsystems into
packages and the file organization of the code. This includes an overview of each package, its
dependencies with other packages, and its expected usage.
The third section, Class interfaces, describes the classes and their public interfaces. This
includes an overview of each class, its dependencies with other classes and packages, its public
attributes, operations, and the exceptions they can raise.
9.5.2 Assigning Responsibilities
Object design is characterized by a large number of participants accessing and modifying a large
amount of information. To ensure that changes to interfaces are documented and communicated in
an orderly manner, several roles collaborate to control, communicate, and implement changes.
These include the members of the architecture team who are responsible for system design and
subsystem interfaces, liaisons who are responsible for interteam communication, and configuration
managers who are responsible for tracking change.
Below is an example of how roles can be assigned during object design. As in other
activities, the same participant can be assigned more than one role.
• The core architect develops coding guidelines and conventions before object design
starts. As for many conventions, the actual set of conventions is not as important as the
commitment of all architects and developers to use the conventions. The core architect is also
responsible for ensuring consistency with prior decisions documented in the System Design
Document (SDD) and Requirements Analysis Document (RAD).
• The architecture liaisons document the public subsystem interfaces for which they are
responsible. This leads to a first draft of the ODD, which is used by developers. Architecture
liaisons also negotiate changes to public interfaces. Often, the issue is not one of consensus, but of
communication: developers depending on the interface may welcome the change if they are
notified first. The architecture liaisons and the core architect form the architecture team.
• The object designers refine and detail the interface specification of the class or subsystem
they implement.
• The configuration manager of a subsystem releases changes to the interfaces and the
ODD once they become available. The configuration manager also keeps track of the relationship
between source code and ODD revisions.
• Technical writers from the documentation team clean up the final version of the ODD.
They ensure that the document is consistent from a structural and content point of view. They also
check for compliance with the guidelines.
As in system design, the architecture team is the integrating force of object design. The
architecture team ensures that changes are consistent with project goals. The documentation team,
including the technical writers, ensures that the changes are consistent with guidelines and
conventions.
9.5.3 Using Contracts During Requirements Analysis
Some requirements analysis approaches advocate the use of constraints much earlier, for example,
during the definition of the entity objects. In principle, OCL can be used in requirements analysis
as well as in object design. In general, developers consider specific project needs before deciding
on a specific approach or level of formalism to be used when documenting operations. Examine the following trade-offs before deciding if, when, and for which purpose to use constraints:
Module 4, Chapter 3
Mapping Models to Code
If the design pattern selection and the specification of class interfaces were done carefully, most
design issues should now be resolved. We could implement a system that realizes the use cases
specified during requirements elicitation and system design. However, as developers start putting
together the individual subsystems developed in this way, they are confronted with many
integration problems. Different developers have probably handled contract violations differently.
Undocumented parameters may have been added to the API to address a requirement change.
Additional attributes may have been added to the object model, but are not handled by the persistent data management system, possibly because of a miscommunication. As the delivery pressure increases, addressing these problems results in additional improvised code changes and workarounds that eventually lead to the degradation of the system. The resulting code would have
little resemblance to our original design and would be difficult to understand.
In this chapter, we describe a selection of transformations to illustrate a disciplined
approach to implementation to avoid such a system degradation. These include
• optimizing the class model
• mapping associations to collections
• mapping operation contracts to exceptions
• mapping the class model to a storage schema.
We use Java and Java-based technologies in this chapter. The techniques we describe,
however, are also applicable to other object-oriented programming languages.
10.2 An Overview of Mapping
A transformation aims at improving one aspect of the model (e.g., its modularity) while
preserving all of its other properties (e.g., its functionality). Hence, a transformation is usually
localized, affects a small number of classes, attributes, and operations, and is executed in a series
of small steps. These transformations occur during numerous object design and implementation
activities. We focus in detail on the following activities:
• Optimization (Section 10.4.1). This activity addresses the performance requirements of
the system model. This includes reducing the multiplicities of associations to speed up queries,
adding redundant associations for efficiency, and adding derived attributes to improve the access
time to objects.
• Realizing associations (Section 10.4.2). During this activity, we map associations to
source code constructs, such as references and collections of references.
• Mapping contracts to exceptions (Section 10.4.3). During this activity, we describe the
behavior of operations when contracts are broken. This includes raising exceptions when violations
are detected and handling exceptions in higher level layers of the system.
• Mapping class models to a storage schema (Section 10.4.4). During system design, we
selected a persistent storage strategy, such as a database management system, a set of flat files, or
a combination of both. During this activity, we map the class model to a storage schema, such as
a relational database schema.
10.3 Mapping Concepts
We distinguish four types of transformations (Figure 10-1):
• Model transformations operate on object models (Section 10.3.1). An example is the
conversion of a simple attribute (e.g., an address represented as a string) to a class (e.g., a class
with street address, zip code, city, state, and country attributes).
• Refactorings are transformations that operate on source code (Section 10.3.2). They are
similar to object model transformations in that they improve a single aspect of the system without
changing its functionality. They differ in that they manipulate the source code.
• Forward engineering produces a source code template that corresponds to an object
model (Section 10.3.3). Many modeling constructs, such as attribute and association
specifications, can be mechanically mapped to source code constructs supported by the selected
programming language (e.g., class and field declarations in Java), while the bodies and additional
private methods are added by developers.
• Reverse engineering produces a model that corresponds to source code (Section 10.3.4).
This transformation is used when the design of the system has been lost and must be recovered
from the source code. Although several CASE tools support reverse engineering, much human
interaction is involved for recreating an accurate model, as the code does not include all
information needed to recover the model unambiguously.
For example, the Pull Up Field refactoring removes a field that is duplicated in several sibling classes by moving it to a common superclass. The Player, Advertiser, and LeagueOwner in ARENA all have an email address attribute. We create a superclass User and move the email attribute to the superclass.
Then, we apply the Pull Up Constructor Body refactoring to move the initialization code
for email using the following steps (Figure 10-4):
1. Add the constructor User(Address email) to class User.
2. Assign the field email in the constructor with the value passed in the parameter.
3. Add the call super(email) to the Player class constructor.
4. Compile and test.
5. Repeat steps 1–4 for the classes LeagueOwner and Advertiser.
At this point, the field email and its corresponding initialization code are in the User class. Now,
we examine if methods using the email field can be moved from the subclasses to the User class.
To achieve this, we apply the Pull Up Method refactoring:
1. Examine the methods of Player that use the email field. Note that Player.notify() uses
email and that it does not use any fields or operations that are specific to Player.
2. Copy the Player.notify() method to the User class and recompile.
3. Remove the Player.notify() method.
4. Compile and test.
5. Repeat for LeagueOwner and Advertiser.
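After these steps, the shared state and behavior live in the User superclass. A minimal Java sketch of the result (the Address type and the body of notify() stand in for ARENA's actual classes):

public class Address { /* placeholder for ARENA's email address type */ }

public class User {
    private Address email;

    public User(Address email) {
        this.email = email;                          // initialization pulled up from the subclasses
    }

    public void notify(String message) {
        // send the message to this user's email address (placeholder body)
    }
}

public class Player extends User {
    public Player(Address email) { super(email); }   // call added by Pull Up Constructor Body
}

public class LeagueOwner extends User {
    public LeagueOwner(Address email) { super(email); }
}

public class Advertiser extends User {
    public Advertiser(Address email) { super(email); }
}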
Applying these three refactorings effectively transforms the ARENA source code in the
same way the object model transformation of Figure 10-2 transformed the ARENA object design
model. Note that the refactorings include many more steps than the corresponding object model transformation and interleave testing with changes. This is because the source code includes many
more details, so it provides many more opportunities for introducing errors. In the next section, we
discuss general principles for avoiding transformation errors.
10.3.3 Forward Engineering
Forward engineering is applied to a set of model elements and results in a set of corresponding
source code statements, such as a class declaration, a Java expression, or a database schema. The
purpose of forward engineering is to maintain a strong correspondence between the object design
model and the code, and to reduce the number of errors introduced during implementation, thereby
decreasing implementation effort.
For example, Figure 10-5 depicts a particular forward engineering transformation applied
to the classes User and LeagueOwner. First, each UML class is mapped to a Java class. Next, the
UML generalization relationship is mapped to an extends statement in the LeagueOwner class.
Finally, each attribute in the UML model is mapped to a private field in the Java classes and to two
public methods for setting and getting the value of the field. Developers can then refine the result
of the transformation with additional behavior, for example, to check that the new value of
maxNumLeagues is a positive integer.
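The generated code would follow a fixed shape along these lines (a sketch; the attribute names and types shown in Figure 10-5 may differ):

public class User {
    private String email;

    public String getEmail() { return email; }
    public void setEmail(String value) { email = value; }
}

public class LeagueOwner extends User {
    private int maxNumLeagues;

    public int getMaxNumLeagues() { return maxNumLeagues; }
    public void setMaxNumLeagues(int value) { maxNumLeagues = value; }
}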
Note that, except for the names of the attributes and methods, the code resulting from this
transformation is always the same. This makes it easier for developers to recognize transformations
in the source code, which encourages them to comply with naming conventions. Moreover, since
developers use one consistent approach for realizing classes, they introduce fewer errors.
10.3.4 Reverse Engineering
Reverse engineering is applied to a set of source code elements and results in a set of model
elements. The purpose of this type of transformation is to recreate the model for an existing system,
either because the model was lost or never created, or because it became out of sync with the
source code. Reverse engineering is essentially an inverse transformation of forward engineering.
Reverse engineering creates a UML class for each class declaration statement, adds an attribute
for each field, and adds an operation for each method. However, because forward engineering can
lose information (e.g., associations are turned into collections of references), reverse engineering
does not necessarily recreate the same model. Although many CASE tools support reverse
engineering, CASE tools provide, at best, an approximation that the developer can use to
rediscover the original model.
10.3.5 Transformation Principles
A transformation aims at improving the design of the system with respect to some criterion. We
discussed four types of transformations so far: model transformations, refactorings, forward
engineering, and reverse engineering. A model transformation improves the compliance of the
object design model with a design goal. A refactoring improves the readability or the modifiability
of the source code. Forward engineering improves the consistency of the source code with respect
to the object design model. Reverse engineering tries to discover the design behind the source
code.
However, by trying to improve one aspect of the system, the developer runs the risk of
introducing errors that will be difficult to detect and repair. To avoid introducing new errors, all
transformations should follow these principles:
• Each transformation must address a single criterion. A transformation should improve
the system with respect to only one design goal. One transformation can aim to improve response
time. Another transformation can aim to improve coherence. However, a transformation should
not optimize multiple criteria. If you find yourself trying to deal with several criteria at once, you will most likely introduce errors by making the source code too complex.
• Each transformation must be local. A transformation should change only a few methods
or a few classes at once. Transformations often target the implementation of a method, in which case the callers are not affected. If a transformation changes an interface (e.g., adding a parameter to a method), then the client classes should be changed one at a time (e.g., the older method should be kept around for backward compatibility testing). If you find yourself changing many
subsystems at once, you are performing an architectural change, not an object model
transformation.
• Each transformation must be applied in isolation from other changes. To further localize changes, transformations should be applied one at a time. If you are improving the performance
of a method, you should not add new functionality. If you are adding new functionality, you should
not optimize existing code. This enables you to focus on a limited set of issues and reduces the
opportunities for errors.
• Each transformation must be followed by a validation step. Even though
transformations have a mechanical aspect, they are applied by humans. After completing a
transformation and before initiating the next one, validate the changes. If you applied an object
model transformation, update the sequence diagrams in which the classes under consideration are
involved. Review the use cases related to the sequence diagrams to ensure that the correct
functionality is provided. If you applied a refactoring, run the test cases relevant to the classes
under consideration. If you added new control statements or dealt with new boundary cases, write
new tests to exercise the new source code. It is always easier to find and repair a bug shortly after
it was introduced than later.
10.4 Mapping Activities
In this section, we present transformations that occur frequently to illustrate the principles we
described in the previous section. We focus on transformations during the following activities:
• Optimizing the Object Design Model (Section 10.4.1)
• Mapping Associations to Collections (Section 10.4.2)
• Mapping Contracts to Exceptions (Section 10.4.3)
• Mapping Object Models to a Persistent Storage Schema (Section 10.4.4).
10.4.1 Optimizing the Object Design Model
The direct translation of an analysis model into source code is often inefficient. The analysis model
focuses on the functionality of the system and does not take into account system design decisions.
During object design, we transform the object model to meet the design goals identified during
system design, such as minimization of response time, execution time, or memory resources. For
example, in the case of a Web browser, it might be clearer to represent HTML documents as
aggregates of text and images. However, if we decided during system design to display documents
as they are retrieved, we may introduce a proxy object to represent placeholders for images that
have not yet been retrieved.
In this section, we describe four simple but common optimizations: adding associations to
optimize access paths, collapsing objects into attributes, delaying expensive computations, and
caching the results of expensive computations.
When applying optimizations, developers must strike a balance between efficiency and
clarity. Optimizations increase the efficiency of the system but also the complexity of the models,
making it more difficult to understand the system.
Optimizing access paths
Common sources of inefficiency are the repeated traversal of multiple associations, the
traversal of associations with “many” multiplicities, and the misplacement of attributes
[Rumbaugh et al., 1991].
Repeated association traversals. To identify inefficient access paths, you should identify operations that are invoked often and examine, with the help of a sequence diagram, the subset of these operations that requires multiple association traversals. Frequent operations should not require many traversals, but should have a direct connection between the querying object and the queried object. If that direct connection is missing, you should add an association between these two objects. In interface engineering and reengineering projects, estimates for the frequency of access paths can be
derived from the legacy system. In greenfield engineering projects, the frequency of access paths
is more difficult to estimate. In this case, redundant associations should not be added before a
dynamic analysis of the full system—for example, during system testing—has determined which
associations participate in performance bottlenecks.
“Many” associations. For associations with “many” multiplicities, you should try to
decrease the search time by reducing the “many” to “one.” This can be done with a qualified
association (Section 2.4.2). If it is not possible to reduce the multiplicity of the association, you
should consider ordering or indexing the objects on the “many” side to decrease access time.
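As an illustration, a qualified association is typically realized in Java as a Map keyed by the qualifier. The following sketch assumes a League holding many Players, qualified by a nickName attribute; the class and attribute names are chosen for illustration only and are not prescribed by the text.

import java.util.HashMap;
import java.util.Map;

public class League {
    // Qualified association realized as a Map: the qualifier (nickName) maps to a single Player,
    // reducing the "many" side to "one" for lookups.
    private Map<String, Player> players = new HashMap<String, Player>();

    public void addPlayer(Player p) {
        players.put(p.getNickName(), p);
    }

    public Player getPlayer(String nickName) {
        // Direct access instead of a linear search over a "many" collection.
        return players.get(nickName);
    }
}

class Player {
    private final String nickName;
    public Player(String nickName) { this.nickName = nickName; }
    public String getNickName() { return nickName; }
}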
Misplaced attributes. Another source of inefficient system performance is excessive
modeling. During analysis many classes are identified that turn out to have no interesting behavior.
If most attributes are only involved in set() and get() operations, you should consider folding these attributes into the calling class. After folding several attributes, some classes may not be needed anymore and can simply be removed from the model.
The systematic examination of the object model using the above questions should lead to
a model with selected redundant associations, with fewer inefficient many-to-many associations,
and with fewer classes.
Collapsing objects: Turning objects into attributes
After the object model is restructured and optimized a couple of times, some of its classes
may have few attributes or behaviors left. Such classes, when associated only with one other class,
can be collapsed into an attribute, thus reducing the overall complexity of the model.
Consider, for example, a model that includes Persons identified by a SocialSecurity object.
During analysis, two classes may have been identified. Each Person is associated with a
SocialSecurity class, which stores a unique social security number identifying the Person. Now,
assume that the use cases do not require any behavior for the SocialSecurity object and that no
other classes have associations with the SocialSecurity class. In this case, the SocialSecurity class
should be collapsed into an attribute of Person (see Figure 10-6).
The refactoring equivalent to the model transformation of Figure 10-6 is Inline Class
refactoring [Fowler, 2000]:
1. Declare the public fields and methods of the source class (e.g., SocialSecurity) in the
absorbing class (e.g., Person).
2. Change all references to the source class to the absorbing class.
3. Change the name of the source class to another name, so that the compiler catches any
dangling references.
4. Compile and test.
5. Delete the source class.
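In Java, the before and after states of this refactoring might look as follows. This is only a sketch: the attribute and method names (number, getSocialSecurityNumber()) are assumed, since the text specifies only the two classes.

// Before: SocialSecurity is a separate class associated only with Person.
class SocialSecurity {
    private String number;
    SocialSecurity(String number) { this.number = number; }
    String getNumber() { return number; }
}

class Person {
    private SocialSecurity socialSecurity;
    Person(String number) { this.socialSecurity = new SocialSecurity(number); }
    String getSocialSecurityNumber() { return socialSecurity.getNumber(); }
}

// After: the SocialSecurity class has been collapsed into an attribute of Person.
class PersonAfterInlineClass {
    private String socialSecurityNumber;
    PersonAfterInlineClass(String number) { this.socialSecurityNumber = number; }
    String getSocialSecurityNumber() { return socialSecurityNumber; }
}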
Delaying expensive computations
Often, specific objects are expensive to create. However, their creation can often be delayed
until their actual content is needed. For example, consider an object representing an image stored
as a file (e.g., an ARENA AdvertisementBanner). Loading all the pixels that constitute the image
from the file is expensive. However, the image data need not be loaded until the image is displayed.
We can realize such an optimization using a Proxy design pattern [Gamma et al., 1994]. An
ImageProxy object takes the place of the Image and provides the same interface as the Image object
(Figure 10-7). Simple operations such as width() and height() are handled by ImageProxy. When
Image needs to be drawn, however, ImageProxy loads the data from disk and creates a RealImage
object. If the client does not invoke the paint() operation, the RealImage object is not created,
thus saving substantial computation time. The calling classes only access the ImageProxy and the
RealImage through the Image interface.
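A minimal Java sketch of this Proxy arrangement is shown below. The operation names width(), height(), and paint() come from the text; everything else (constructor parameters, the loadPixels() helper, the placeholder return values) is assumed for illustration.

// Interface shared by the proxy and the real image.
interface Image {
    int width();
    int height();
    void paint();
}

// The expensive object: loads all pixels from disk when constructed.
class RealImage implements Image {
    private int[] pixels;
    RealImage(String fileName) { this.pixels = loadPixels(fileName); }
    public int width()  { return 640; }   // placeholder: would be derived from the pixel data
    public int height() { return 480; }   // placeholder
    public void paint() { /* draw the pixels */ }
    private int[] loadPixels(String fileName) { /* expensive file access */ return new int[0]; }
}

// The placeholder: answers cheap queries itself and creates the RealImage only when needed.
class ImageProxy implements Image {
    private final String fileName;
    private final int width, height;      // cheap metadata known without loading the pixels
    private RealImage realImage;          // created lazily

    ImageProxy(String fileName, int width, int height) {
        this.fileName = fileName; this.width = width; this.height = height;
    }
    public int width()  { return width; }
    public int height() { return height; }
    public void paint() {
        if (realImage == null) {          // load the image data only on the first paint()
            realImage = new RealImage(fileName);
        }
        realImage.paint();
    }
}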
Caching the result of expensive computations
Some methods are called many times, but their results are based on values that do not change or
change only infrequently. Reducing the number of computations required by these methods can substantially improve overall response time. In such cases, the result of the computation should be
cached as a private attribute. Consider, for example, the LeagueBoundary.getStatistics() operation,
which displays the statistics relevant to all Players and Tournaments in a League. These statistics
change only when a Match is completed, so it is not necessary to recompute the statistics every
time a User wishes to see them. Instead, the statistics for a League can be cached in a temporary
data structure, which is invalidated the next time a Match is completed. Note that this approach
includes a time-space trade-off: we improve the average response time for the getStatistics()
operation, but we consume memory space by storing redundant information.
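The caching idea might be sketched in Java as follows. Only getStatistics() is named in the text; the representation of the statistics, computeStatistics(), and matchCompleted() are assumptions made for the example.

import java.util.HashMap;
import java.util.Map;

public class LeagueBoundary {
    // Cached result of the expensive computation; null means "not yet computed" or "invalidated".
    private Map<String, Integer> cachedStatistics;

    public Map<String, Integer> getStatistics() {
        if (cachedStatistics == null) {
            cachedStatistics = computeStatistics();   // recompute only when necessary
        }
        return cachedStatistics;
    }

    // Called when a Match is completed: invalidates the cache so the next call recomputes it.
    public void matchCompleted() {
        cachedStatistics = null;
    }

    private Map<String, Integer> computeStatistics() {
        // Expensive traversal of all Players and Tournaments in the League (omitted here).
        return new HashMap<String, Integer>();
    }
}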
10.4.2 Mapping Associations to Collections
Unidirectional one-to-one associations. In ARENA, an Advertiser has a unidirectional one-to-one association with an Account. We map this association to code using a reference from the Advertiser to the Account. That is, we add a field to Advertiser named account of type Account.
Creating the association between Advertiser and Account translates to setting the account
field to refer to the correct Account object. Because each Advertiser object is associated with
exactly one Account, a null value for the account attribute can only occur when an Advertiser object
is being created. Otherwise, a null account is considered an error. Since the reference to the
Account object does not change over time, we make the account field private and add a public
Advertiser.getAccount() method. This prevents callers from accidentally modifying the account
field.
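A minimal sketch of this mapping in Java is shown below; the constructor is an assumption, since the text only states that account is a private field exposed through getAccount().

public class Advertiser {
    private Account account;              // exactly one Account per Advertiser

    public Advertiser(Account account) {
        this.account = account;           // set at creation; a null value only occurs during creation
    }

    public Account getAccount() {         // read-only access: callers cannot replace the Account
        return account;
    }
}

class Account { /* ... */ }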
Bidirectional one-to-one associations. In the bidirectional case (Figure 10-9), the Advertiser holds an account field and the Account holds an owner field, and the two references must be set consistently. We document this assumption by writing a one-line comment immediately before the account and owner fields.
In Figure 10-9, both the Account and the Advertiser classes must be recompiled and tested
whenever we change either class. With a unidirectional association from the Advertiser class to the
Account class, the Account class would not be affected by changes to the Advertiser class.
Bidirectional associations, however, are usually necessary in the case of classes that need to work
together closely. The choice between unidirectional and bidirectional associations is a trade-off to be evaluated in each specific context. To make the trade-off easier, we can systematically make all
attributes private and provide corresponding getAttribute() and setAttribute() operations to access
the reference. This minimizes changes to APIs when changing a unidirectional association to
bidirectional or vice versa.
One-to-many associations. A one-to-many association cannot be realized with a single reference; instead, the “one” side (e.g., Advertiser) holds a collection of references to the objects on the “many” side (e.g., Account), together with operations to add and remove elements. In this example, the Advertiser object does not invoke the Account constructor. Instead, a control object for creating and archiving Accounts is responsible for invoking the constructor.
Note that the type of collection used on the “many” side of the association depends on the constraints on the association. For example, if the Accounts of an Advertiser must be ordered, we need to use
a List instead of a Set. To minimize changes to the interface when association constraints change,
we can set the return type of the getAccounts() method to Collection, a common superclass of List
and Set.
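A separate sketch of the “one” side of such a one-to-many Advertiser–Account association is shown below, assuming no ordering constraint (hence a Set) and assuming addAccount()/removeAccount() as the mutator names.

import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class Advertiser {
    private Set<Account> accounts = new HashSet<Account>();

    public void addAccount(Account a)    { accounts.add(a); }
    public void removeAccount(Account a) { accounts.remove(a); }

    // Returning the Collection supertype keeps the interface stable if the Set
    // is later replaced by a List (e.g., when an ordering constraint is added).
    public Collection<Account> getAccounts() {
        return Collections.unmodifiableCollection(accounts);
    }
}

class Account { /* ... */ }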
Many-to-many associations. In this case, both end classes have fields that are collections
of references and operations to keep these collections consistent. For example, the Tournament
class of ARENA has an ordered many-to-many association with the Player class. This association
is realized by using a List attribute in each class, which is modified by the operations addPlayer(),
removePlayer(), addTournament(), and removeTournament() (Figure 10-11). We already identified
acceptPlayer() and removePlayer() operations in the object design model (see Figure 9-11). We
rename acceptPlayer() to addPlayer() to maintain consistency with the code generated for other
associations.
As in the previous example, these operations ensure that both Lists are consistent. If the association between Tournament and Player were unidirectional, we could simply remove the tournaments attribute and its related methods. Note that a unidirectional many-to-many association and a unidirectional one-to-many association are very similar and difficult to distinguish at the object interface level.
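The bidirectional bookkeeping described above might look as follows in Java. The operation names come from the text; the contains() guards, which both prevent duplicates and stop the mutual calls from recursing forever, are an implementation assumption, and contract checking is omitted.

import java.util.ArrayList;
import java.util.List;

public class Tournament {
    private List<Player> players = new ArrayList<Player>();

    public void addPlayer(Player p) {
        if (!players.contains(p)) {
            players.add(p);
            p.addTournament(this);        // keep the other side of the association consistent
        }
    }
    public void removePlayer(Player p) {
        if (players.contains(p)) {
            players.remove(p);
            p.removeTournament(this);
        }
    }
    public List<Player> getPlayers() { return players; }
}

class Player {
    private List<Tournament> tournaments = new ArrayList<Tournament>();

    public void addTournament(Tournament t) {
        if (!tournaments.contains(t)) {
            tournaments.add(t);
            t.addPlayer(this);
        }
    }
    public void removeTournament(Tournament t) {
        if (tournaments.contains(t)) {
            tournaments.remove(t);
            t.removePlayer(this);
        }
    }
    public List<Tournament> getTournaments() { return tournaments; }
}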
Association classes. In UML, we use an association class to hold the attributes and
operations of an association. For example, we can represent the Statistics for a Player within a
Tournament as an association class, which holds statistics counters for each Player/Tournament
combination (Figure 10-13). To realize such an association, we first transform the association class
into a separate object and a number of binary associations. Then we can use the techniques
discussed earlier to convert each binary association to a set of reference attributes. In Section 10.6,
we revisit this case and describe additional mappings for realizing association classes.
Once associations have been mapped to fields and methods, the public interface of classes
is relatively complete and should change only as a result of new requirements, discovered bugs, or
refactoring.
10.4.3 Mapping Contracts to Exceptions
When a contract violation is detected while adding a Player to a Tournament, an exception is raised and caught by the caller, TournamentForm.addPlayer(), which forwards the exception to the ErrorConsole class and then proceeds with the next Player. The ErrorConsole boundary object then displays a list of error messages to the user.
A simple mapping would be to treat each operation in the contract individually and to add
code within the method body to check the preconditions, postconditions, and invariants relevant
to the operation:
• Checking preconditions. Preconditions should be checked at the beginning of the method, before any processing is done. There should be a test that checks whether the precondition is true and raises an exception otherwise. Each precondition corresponds to a different exception, so that the client class can not only detect that a violation occurred, but also identify which parameter is at fault (see the code sketch after this list).
• Checking postconditions. Postconditions should be checked at the end of the method,
after all the work has been accomplished and the state changes are finalized. Each postcondition
corresponds to a Boolean expression in an if statement that raises an exception if the contract is
violated. If more than one postcondition is not satisfied, only the first detection is reported.
• Checking invariants. When treating each operation contract individually, invariants are
checked at the same time as postconditions.
• Dealing with inheritance. The checking code for preconditions and postconditions
should be encapsulated into separate methods that can be called from subclasses.
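A hedged sketch of these steps for a hypothetical addPlayer() contract follows. The preconditions (the Tournament is not full, the Player is not already registered), the exception names, and the maxNumPlayers attribute are all assumed for illustration; only the general structure (preconditions checked first, one exception per precondition, postcondition checked after the state change) follows the steps above.

import java.util.ArrayList;
import java.util.List;

class TournamentFullException extends Exception {}
class AlreadyRegisteredException extends Exception {}

public class TournamentControl {
    private List<Player> players = new ArrayList<Player>();
    private int maxNumPlayers = 8;

    public void addPlayer(Player p) throws TournamentFullException, AlreadyRegisteredException {
        // Precondition checks: performed before any work, one distinct exception per precondition.
        if (players.size() >= maxNumPlayers) {
            throw new TournamentFullException();
        }
        if (players.contains(p)) {
            throw new AlreadyRegisteredException();
        }

        players.add(p);   // the actual work of the method

        // Postcondition check: performed after the state change is finalized.
        assert players.contains(p) : "postcondition violated: player was not registered";
    }
}

class Player { /* ... */ }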
If we mapped every contract following the above steps, we would ensure that all
preconditions, postconditions, and invariants are checked for every method invocation, and that
violations are detected within one method invocation. While this approach results in a robust
system (assuming the checking code is correct), it is not realistic:
• Coding effort. In many cases, the code required for checking preconditions and
postconditions is longer and more complex than the code accomplishing the real work. This results
in increased effort that could be better spent in testing or code clean-up.
• Increased opportunities for defects. Checking code can also include errors, increasing
testing effort. Worse, if the same developer writes the method and the checking code, it is highly
probable that bugs in the checking code mask bugs in the actual method, thereby reducing the
value of the checking code.
• Obfuscated code. Checking code is usually more complex than its corresponding
constraint and difficult to modify when constraints change. This leads to the insertion of many
more bugs during changes, defeating the original purpose of the contract.
• Performance drawbacks. Systematically checking all contracts can significantly slow down the code, sometimes by an order of magnitude. Although correctness is always a design goal, response time and throughput design goals would not be met.
Hence, unless we have a tool for generating checking code automatically, such as iContract [Kramer, 1998], we need to adopt a pragmatic approach and evaluate the above trade-offs in the project context. Remember that contracts support communication among developers; consequently, exception handling of contract violations should focus on the interfaces between developers.
In all cases, the checking code should be documented with comments describing the
constraints checked, both in English and in OCL. In addition to making the code more readable,
this makes it easier to modify the checking code correctly when a constraint changes.
10.4.4 Mapping Object Models to a Persistent Storage Schema
In this section, we look at the steps involved in mapping an object model to a relational database
using Java and database schemas.
A schema is a description of the data, that is, a meta-model for data [Date, 2004]. In UML,
class diagrams are used to describe the set of valid instances that can be created by the source code.
Similarly, in relational databases, the database schema describes the valid set of data records that
can be stored in the database. Relational databases store both the schema and the data. Relational
databases store persistent data in the form of tables (also called relations in the database literature).
A table is structured in columns, each of which represents an attribute. For example, in Figure
10-16, the User table has three columns, firstName, login, and email. The rows of the table
represent data records, with each cell in the table representing the value of the attribute for the data
record in that row. In Figure 10-16, the User table contains three data records each representing
the attributes of specific users Alice, John, and Bob.
A primary key of a table is a set of attributes whose values uniquely identify the data
records in a table. The primary key is used to refer unambiguously to a specific data record when
inserting, updating, or removing it. For example, in Figure 10-16, the login attribute represents a
unique user name within an Arena. Hence, the login attribute can be used as a primary key. Note,
however, the email attribute is also unique across all users in the table. Hence, the email attribute
could also be used as a primary key. Sets of attributes that could be used as a primary key are called
candidate keys. The candidate key that is actually used by the application to identify data records is the primary key.
A foreign key is an attribute (or a set of attributes) that references the primary key of
another table. A foreign key links a data record in one table with one or more data records in
another table. In Figure 10-17, the table League includes the foreign key owner that references the
login attribute in the User table in Figure 10-16. Alice is the owner of the tictactoeNovice and
tictactoeExpert leagues and John is the owner of the chessNovice league.
Buried associations. Associations with a multiplicity of one on one side can be realized as a buried association, that is, as a foreign key column in the table of the class on the other side. For example, the one-to-many association between LeagueOwner and League is realized as an owner column in the League table referring to the primary key of the LeagueOwner table. The value of the owner column is the value of the id (i.e., the primary key) of the corresponding LeagueOwner. If there are multiple Leagues owned by the same LeagueOwner, multiple data records of the League table have the id of the owner as the value for this column. For associations with a multiplicity of zero or one, a null value indicates that there are no associations for the data record of interest.
Note that one-to-one and one-to-many associations could also be realized with an association table
instead of a buried association. Using a separate table to realize all associations results in a database
schema that is modifiable. For example, if we change the multiplicity of a one-to-many association
to a many-to-many association, we do not need to change the database schema. Of course, this
increases the overall number of tables in the schema and the time to traverse the association. In
general, we need to evaluate this trade-off in the context of the application, examining whether the
multiplicity of the association is likely to change or if response time is a critical design goal.
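The two alternatives could be expressed with DDL like the following, shown here as Java string constants as they might be passed to JDBC. The column types are assumptions; the League, LeagueOwner, owner, and id names come from the text and figures.

public class LeagueSchemaAlternatives {

    // Alternative 1: buried association - the foreign key column lives in the League table.
    static final String BURIED_ASSOCIATION =
          "CREATE TABLE LeagueOwner (id BIGINT PRIMARY KEY);"
        + "CREATE TABLE League (name VARCHAR(32) PRIMARY KEY,"
        + "                     owner BIGINT REFERENCES LeagueOwner(id));";

    // Alternative 2: separate association table - the schema stays unchanged if the association
    // later becomes many-to-many, at the cost of an extra table and an extra join.
    static final String ASSOCIATION_TABLE =
          "CREATE TABLE LeagueOwner (id BIGINT PRIMARY KEY);"
        + "CREATE TABLE League (name VARCHAR(32) PRIMARY KEY);"
        + "CREATE TABLE LeagueOwnerLeague (owner BIGINT REFERENCES LeagueOwner(id),"
        + "                                league VARCHAR(32) REFERENCES League(name));";
}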
Mapping inheritance relationships
Relational databases do not directly support inheritance, but there are two main options for
mapping an inheritance relationship to a database schema. In the first option, called vertical
mapping, similar to a one-to-one association, each class is represented by a table and uses a foreign
key to link the subclass tables to the superclass table. In the second option, called horizontal
mapping, the attributes of the superclass are pushed down into the subclasses, essentially
duplicating columns in the tables corresponding to subclasses.
Vertical mapping. Given an inheritance relationship, we map the superclass and the subclasses to individual tables. The superclass table includes a column for each attribute defined in the superclass, plus an additional column denoting which subclass the data record corresponds to. The subclass tables include a column for each attribute defined in the corresponding subclass. All tables share the same primary key, that is, the identifier of the object. Data records in the superclass and subclass tables with the same primary key value refer to the same object.
Horizontal mapping. Another way to realize inheritance is to push the attributes of the
superclass down into the subclasses, effectively removing the need for a superclass table. In this
case, each subclass table duplicates the columns of the superclass.
The trade-off between using a separate table for superclasses and duplicating columns in
the subclass tables is between modifiability and response time. If we use a separate table, we can
add attributes to the superclass simply by adding a column to the superclass table. When adding a
subclass, we add a table for the subclass with a column for each attribute in the subclass. If we
duplicate columns, modifying the database schema is more complex and error prone. The
advantage of duplicating columns is that individual objects are not fragmented across a number of
tables, which results in faster queries. For deep inheritance hierarchies, this can represent a
significant performance difference.
In general, we need to examine the likelihood of changes against the performance
requirements in the specific context of the application.
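For a User superclass with a LeagueOwner subclass, the two mappings might produce schemas like the ones sketched below, again as DDL in Java string constants. The maxNumLeagues column and the data types are assumptions; the table is named User_ only to avoid the SQL reserved word USER.

public class InheritanceMappingAlternatives {

    // Vertical mapping: one table per class, all sharing the object identifier as primary key;
    // the role column denotes which subclass a User record belongs to.
    static final String VERTICAL =
          "CREATE TABLE User_ (id BIGINT PRIMARY KEY, login VARCHAR(32),"
        + "                    email VARCHAR(32), role VARCHAR(16));"
        + "CREATE TABLE LeagueOwner (id BIGINT PRIMARY KEY REFERENCES User_(id),"
        + "                          maxNumLeagues INT);";

    // Horizontal mapping: the superclass columns are duplicated in each subclass table,
    // so a LeagueOwner is stored in a single row of a single table.
    static final String HORIZONTAL =
          "CREATE TABLE LeagueOwner (id BIGINT PRIMARY KEY, login VARCHAR(32),"
        + "                          email VARCHAR(32), maxNumLeagues INT);";
}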
10.5 Managing Implementation
Transformations enable us to improve specific aspects of the object design model and to convert
it into source code. By providing systematic recipes for recurring situations, transformations
enable us to reduce the amount of effort and the overall number of errors in the source code.
However, to retain this benefit throughout the lifetime of the system, we need to document the
application of transformations so that they can be consistently reapplied in the event of changes to
the object design model or the source code.
Reverse engineering attempts to alleviate this problem by allowing us to reconstruct the
object design model from the source code. If we could maintain a one-to-one mapping between
the source code and the object design model, we would not need any documentation: the tools at hand would automatically apply selected transformations and mirror changes in the source code and the object design model. However, most useful transformations, including those described in this chapter, are not one-to-one mappings. As a result, information is lost in the process of applying
the transformation. For example:
• Association multiplicity and collections. Unidirectional one-to-many associations and
many-to-many associations map to the same source code. A CASE tool that reverse-engineers the
corresponding source code usually selects the least restrictive case (i.e., a many-to-many
association). In general, information about association multiplicity is distributed in several places
in the source code, including checking code in the boundary objects.
• Association multiplicity and buried associations. One-to-many associations and one-to-one associations implemented as a buried association in a database schema suffer from the same
problem. Worse, when all associations are realized as separate tables, all information about
association multiplicity is lost.
• Postconditions and invariants. When mapping contracts to exception-handling code
(Section 10.4.3), we generate checking code only for preconditions. Postconditions and invariants
are not mapped to source code. The object specification and the system quickly become inconsistent when postconditions or invariants are changed but not documented.
These challenges boil down to finding conventions and mechanisms to keep the object design
model, the source code, and the documentation consistent with each other. There is no single
answer, but the following principles reduce consistency problems when applied systematically:
• For a given transformation, use the same tool. If you are using a modeling tool to map
associations to code, use the same tool when you change association multiplicities. Modern
modeling tools generate markers as source code comments to enable the repetitive generation of
code from the same model. However, this mapping can easily break when developers use a text editor and the modeling tool interchangeably to change associations. Similarly, if you generate
constraint-checking code with a tool, regenerate the checking code when the constraint is changed.
• Keep the contracts in the source code, not in the object design model. Contracts
describe the behavior of methods and restrictions on parameters and attributes. Developers change
the behavior of an object by modifying the body of a method, not by modifying the object design
model. By keeping the constraint specifications as source code comments, they are more likely to
be updated when the code changes.
• Use the same names for the same objects. When mapping an association to source code
or a class to a database schema, use the same names on both sides of the transformation. If the
name is changed in the model, change it in the source code. By using the same names, you provide
traceability among the models and make it easier for developers to identify both ends of the
transformation. This also emphasizes the importance of identifying the right names for classes
during analysis, before any transformations are applied, to minimize the effort associated with
renaming.
• Make transformations explicit. When transformations are applied by hand, it is critical
that the transformation is made explicit in some form so that all developers can apply the
transformation the same way. For example, transformations for mapping associations to collections
should be documented in a coding conventions guide so that, when two developers apply the same
transformation, they produce the same code. This also makes it easier for developers to identify
transformations in the source code. As usual, the commitment of developers to use standard
conventions is more important than the actual conventions.
10.5.2 Assigning Responsibilities
Several roles collaborate to select, apply, and document transformations and the conversion of the
object design model into source code:
• The core architect selects the transformations to be systematically applied. For example,
if it is critical that the database schema is modifiable, the core architect decides that all associations
should be implemented as separate tables.
• The architecture liaison is responsible for documenting the contracts associated with
subsystem interfaces. When such contracts change, the architecture liaison is responsible for
notifying all class users.
• The developer is responsible for following the conventions set by the core architect and
actually applying the transformations and converting the object design model into source code.
Developers are responsible for keeping the source code comments up to date with the rest of the models.
Identifying and applying transformations the first time is relatively trivial. The key
challenge is in reapplying transformations after a change occurs. Hence, when assigning
responsibilities, each role should understand who should be notified in the event of changes.
(Note: The ARENA case study appears in Section 10.6, page 421/416 of the textbook Object-Oriented Software Engineering: Using UML, Patterns, and Java by Bernd Bruegge and Allen H. Dutoit.)
Module 4, Chapter 4
Testing
Testing is the process of finding differences between the expected behavior specified by system
models and the observed behavior of the implemented system. Unit testing finds differences
between a specification of an object and its realization as a component. Structural testing finds
differences between the system design model and a subset of integrated subsystems. Functional
testing finds differences between the use case model and the system. Finally, performance testing
finds differences between nonfunctional requirements and actual system performance. When
differences are found, developers identify the defect causing the observed failure and modify the
system to correct it. In other cases, the system model is identified as the cause of the difference,
and the system model is updated to reflect the system.
From a modeling point of view, testing is the attempt to show that the implementation of
the system is inconsistent with the system models. The goal of testing is to design tests that exercise
defects in the system and to reveal problems. This activity is contrary to all other activities we
described in previous chapters: analysis, design, implementation, communication, and negotiation
are constructive activities. Testing, however, is aimed at breaking the system. Consequently, testing
is usually accomplished by developers who were not involved in the construction of the system.
11.1 Introduction: Testing the Space Shuttle
Testing is the process of analyzing a system or system component to detect the differences between
specified (required) and observed (existing) behavior. Unfortunately, it is impossible to completely
test a nontrivial system. First, testing is not decidable. Second, testing must be performed under
time and budget constraints. As a result, systems are often deployed without being completely
tested, leading to faults discovered by end users.
The first launch of the Space Shuttle Columbia in 1981, for example, was canceled because
of a problem that was not detected during development. The problem was traced to a change made
by a programmer two years earlier, who erroneously reset a delay factor from 50 to 80
milliseconds. This added a probability of 1/67 that any space shuttle launch would fail.
Unfortunately, in spite of thousands of hours of testing after the change was made, the fault was
not discovered during the testing phase. During the actual launch, the fault caused a
synchronization problem with the shuttle’s five on-board computers that led to the decision to abort
the launch.
Testing is often viewed as a job that can be done by beginners. Managers often assign new members to the testing team, because experienced people detest testing or are needed for the more important jobs of analysis and design. Unfortunately, such an attitude leads to many
problems. To test a system effectively, a tester must have a detailed understanding of the whole
system, ranging from the requirements to system design decisions and implementation issues. A
tester must also be knowledgeable of testing techniques and apply these techniques effectively and
efficiently to meet time, budget, and quality constraints.
• Fault avoidance techniques try to prevent the insertion of faults into the system before it is released. Fault avoidance includes development methodologies, configuration management, and verification.
• Fault detection techniques, such as debugging and testing, are uncontrolled and controlled
experiments, respectively, used during the development process to identify erroneous states
and find the underlying faults before releasing the system. Fault detection techniques assist
in finding faults in systems, but do not try to recover from the failures caused by them. In
general, fault detection techniques are applied during development, but in some cases, they
are also used after the release of the system. The black box in an airplane, which logs the last few minutes of a flight, is an example of a fault detection technique.
• Fault tolerance techniques assume that a system can be released with faults and that
system failures can be dealt with by recovering from them at runtime. For example,
modular redundant systems assign the same task to more than one component, then
compare the results from the redundant components. The space shuttle has five onboard
computers running two different pieces of software to accomplish the same task.
In this chapter, we focus on fault detection techniques, including reviews and testing. A review
is the manual inspection of parts or all aspects of the system without actually executing the system.
There are two types of reviews: walkthrough and inspection. In a code walkthrough, the developer
informally presents the API (Application Programmer Interface), the code, and associated
documentation of the component to the review team. The review team makes comments on the
mapping of the analysis and object design to the code using use cases and scenarios from the
analysis phase. An inspection is similar to a walkthrough, but the presentation of the component
is formal. In fact, in a code inspection, the developer is not allowed to present the artifacts (models,
code, and documentation). This is done by the review team, which is responsible for checking the
interface and code of the component against the requirements. It also checks the algorithms for
efficiency with respect to the nonfunctional requirements. Finally, it checks comments about the
code and compares them with the code itself to find inaccurate and incomplete comments. The
developer is only present in case the review needs clarifications about the definition and use of
data structures or algorithms. Code reviews have proven to be effective at detecting faults. In some
experiments, up to 85 percent of all identified faults were found in code reviews [Fagan, 1976],
[Jones, 1977], [Porter et al., 1997].
Debugging assumes that faults can be found by starting from an unplanned failure. The
developer moves the system through a succession of states, ultimately arriving at and identifying
the erroneous state. Once this state has been identified, the algorithmic or mechanical fault causing
this state must be determined. There are two types of debugging. The goal of correctness debugging is to find any deviation between observed and specified functional requirements.
Performance debugging addresses the deviation between observed and specified nonfunctional
requirements, such as response time.
Testing is a fault detection technique that tries to create failures or erroneous states in a planned
way. This allows the developer to detect failures in the system before it is released to the customer.
Note that this definition of testing implies that a successful test is a test that identifies faults. We
will use this definition throughout the development phases. Another often-used definition of
testing is that “it demonstrates that faults are not present.” We will use this definition only after the
development of the system when we try to demonstrate that the delivered system fulfills the
functional and nonfunctional requirements.
If we used this second definition all the time, we would tend to select test data that have a low
probability of causing the program to fail. If, on the other hand, the goal is to demonstrate that a
program has faults, we tend to look for test data with a higher probability of finding faults. The
characteristic of a good test model is that it contains test cases that identify faults. Tests should
include a broad range of input values, including invalid inputs and boundary cases; otherwise, faults may not be detected. Unfortunately, such an approach requires extremely lengthy testing
times for even small systems.
11.3.2 Test Cases
Input describes the set of input data or commands to be entered by the actor of the test case
(which can be the tester or a test driver). The expected behavior of the test case is the sequence of
output data or commands that a correct execution of the test should yield. The expected behavior
is described by the oracle attribute. The log is a set of time-stamped correlations of the observed
behavior with the expected behavior for various test runs.
Once test cases are identified and described, relationships among test cases are identified.
Aggregation and the precede associations are used to describe the relationships between the test
cases. Aggregation is used when a test case can be decomposed into a set of subtests. Two test
cases are related via the precede association when one test case must precede another test case.
Figure 11-9 shows a test model where TestA must precede TestB and TestC. For example,
TestA consists of TestA1 and TestA2, meaning that once TestA1 and TestA2 are tested, TestA is
tested; there is no separate test for TestA. A good test model has as few associations as possible,
because tests that are not associated with each other can be executed independently from each
other. This allows a tester to speed up testing, if the necessary testing resources are available. In
Figure 11-9, TestB and TestC can be tested in parallel, because there is no relation between them.
Test cases are classified into blackbox tests and whitebox tests, depending on which aspect
of the system model is tested. Blackbox tests focus on the input/output behavior of the component.
Blackbox tests do not deal with the internal aspects of the component, that is, with its internal behavior or structure. Whitebox tests focus on the internal structure of the component. A
whitebox test makes sure that, independently from the particular input/output behavior, every state
in the dynamic model of the object and every interaction among the objects is tested. As a result,
whitebox testing goes beyond blackbox testing. In fact, most of the whitebox tests require input
data that could not be derived from a description of the functional requirements alone. Unit testing
combines both testing techniques: blackbox testing to test the functionality of the component, and
whitebox testing to test structural and dynamic aspects of the component.
11.3.3 Test Stubs and Drivers
Executing test cases on single components or combinations of components requires the tested
component to be isolated from the rest of the system. Test drivers and test stubs are used to
substitute for missing parts of the system. A test driver simulates the part of the system that calls
the component under test. A test driver passes the test inputs identified in the test case analysis to
the component and displays the results.
A test stub simulates a component that is called by the tested component. The test stub
must provide the same API as the method of the simulated component and must return a value compliant with the return type of the method’s signature. Note that the interface of all
components must be baselined. If the interface of a component changes, the corresponding test
drivers and stubs must change as well.
The implementation of test stubs is a nontrivial task. It is not sufficient to write a test stub
that simply prints a message stating that the test stub was called. In most situations, when
component A calls component B, A is expecting B to perform some work, which is then returned
as a set of result parameters. If the test stub does not simulate this behavior, A will fail, not because
of a fault in A, but because the test stub does not simulate B correctly.
Even providing a return value is not always sufficient. For example, if a test stub always returns the same value, it might not return the value expected by the calling component in a particular scenario. This can produce confusing results and even lead to the failure of the calling component, even though it is correctly implemented. Often, there is a trade-off between implementing accurate test stubs and simply substituting the actual component for the test stub. For many
components, drivers and stubs are often written after the component is completed, and for
components that are behind schedule, stubs are often not written at all.
To ensure that stubs and drivers are developed and available when needed, several development methods stipulate that drivers be developed for every component. This results in lower overall effort, because it gives developers the opportunity to find problems with the interface specification of the component under test before it is completely implemented.
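The following sketch illustrates a stub that simulates plausible behavior rather than merely reporting that it was called, together with a simple driver. All names (PriceCalculator, TariffDatabase) are invented for the example and are not part of the text.

// The component the class under test depends on.
interface TariffDatabase {
    double basePriceForZone(int zone);
}

// Test stub: simulates the missing component with canned but scenario-consistent values.
class TariffDatabaseStub implements TariffDatabase {
    public double basePriceForZone(int zone) {
        return zone * 1.50;
    }
}

// The component under test.
class PriceCalculator {
    private final TariffDatabase tariffs;
    PriceCalculator(TariffDatabase tariffs) { this.tariffs = tariffs; }
    double priceFor(int zone, int numTickets) {
        return tariffs.basePriceForZone(zone) * numTickets;
    }
}

// Test driver: feeds the inputs from the test case into the component and reports the result.
public class PriceCalculatorDriver {
    public static void main(String[] args) {
        PriceCalculator calculator = new PriceCalculator(new TariffDatabaseStub());
        double observed = calculator.priceFor(2, 3);
        System.out.println("expected 9.0, observed " + observed);
    }
}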
11.3.4 Corrections
Once tests have been executed and failures have been detected, developers change the component
to eliminate the suspected faults. A correction is a change to a component whose purpose is to
repair a fault. Corrections can range from a simple modification to a single component, to a
complete redesign of a data structure or a subsystem. In all cases, the likelihood that the developer
introduces new faults into the revised component is high. Several techniques can be used to
minimize the occurrence of such faults:
• Problem tracking includes the documentation of each failure, erroneous state, and fault
detected, its correction, and the revisions of the components involved in the change. Together
with configuration management, problem tracking enables developers to narrow the search for
new faults.
• Regression testing includes the re-execution of all prior tests after a change. This ensures that
functionality which worked before the correction has not been affected. Regression testing is
important in object-oriented methods, which call for an iterative development process. This
requires testing to be initiated earlier and for test suites to be maintained after each iteration.
Regression testing unfortunately is costly, especially when part of the tests is not automated.
• Rationale maintenance includes the documentation of the rationale for the change and its
relationship with the rationale of the revised component. Rationale maintenance enables
developers to avoid introducing new faults by inspecting the assumptions that were used to
build the component.
Next, let us describe in more detail the testing activities that lead to the creation of test cases,
their execution, and the development of corrections. (Note: Section 11.4 is not part of this syllabus, but it is summarized here briefly for completeness.)
11.4 Testing Activities
In this section, we describe the technical activities of testing. These include
• Component inspection, which finds faults in an individual component through the manual
inspection of its source code.
• Usability testing, which finds differences between what the system does and the users’
expectation of what it should do.
163
• Unit testing, which finds faults by isolating an individual component using test stubs and
drivers and by exercising the component using test cases.
• Integration testing, which finds faults by integrating several components together.
• System testing, which focuses on the complete system, its functional and nonfunctional
requirements, and its target environment.
11.5 Managing Testing
In this section, we describe how to manage testing activities to minimize the resources needed.
Many testing activities occur near the end of the project, when resources are running low and
delivery pressure increases. Often, trade-offs lie between the faults to be repaired before delivery
and those that can be repaired in a subsequent revision of the system. In the end, however,
developers should detect and repair a sufficient number of faults such that the system meets
functional and nonfunctional requirements to an extent acceptable to the client.
First, we describe the planning of test activities (Section 11.5.1). Next, we describe the test
plan, which documents the activities of testing (Section 11.5.2). Next, we describe the roles
assigned during testing (Section 11.5.3). Next, we discuss the topics of regression testing (Section
11.5.4), automated testing (Section 11.5.5), and model-based testing (Section 11.5.6).
11.5.1 Planning Testing
Developers can reduce the cost of testing and the elapsed time necessary for its completion through
careful planning. Two key elements are to start the selection of test cases early and to parallelize
tests.
Developers responsible for testing can design test cases as soon as the models they validate
become stable. Functional tests can be developed when the use cases are completed. Unit tests of
subsystems can be developed when their interfaces are defined. Similarly, test stubs and drivers can
be developed when component interfaces are stable. Developing tests early enables the execution
of tests to start as soon as components become available. Moreover, given that developing tests
requires a close examination of the models under validation, developers can find faults in the
models even before the system is constructed. Note, however, that developing tests early on
introduces a maintenance problem: test cases, drivers, and stubs need to be updated whenever the
system models change.
The second key element in shortening testing time is to parallelize testing activities. All
component tests can be conducted in parallel; integration tests for components in which no faults were discovered can be initiated while other components are repaired.
Testing represents a substantial part of the overall project resources. A typical guideline for
projects following a Unified Process life cycle is to allocate 25 percent of project resources to
testing (see Section 15.4.2; [Royce, 1998]). However, this number can go up depending on safety
and reliability requirements on the system. Hence, it is critical that test planning start early, as early
as the use case model is stable.
11.5.2 Documenting Testing
The test plan is organized into the following sections:
• Section 1 of the test plan describes the objectives and extent of the tests. The goal is to provide
a framework that can be used by managers and testers to plan and execute the necessary tests
in a timely and cost-effective manner.
• Section 2 explains the relationship of the test plan to the other documents produced during the
development effort such as the RAD, SDD, and ODD (Object Design Document). It explains
how all the tests are related to the functional and nonfunctional requirements, as well as to the
system design stated in the respective documents. If necessary, this section introduces a naming
scheme to establish the correspondence between requirements and tests.
• Section 3, focusing on the structural aspects of testing, provides an overview of the system in
terms of the components that are tested during the unit test. The granularity of components and
their dependencies are defined in this section.
• Section 4, focusing on the functional aspects of testing, identifies all features and combinations
of features to be tested. It also describes all those features that are not to be tested and the
reasons for not testing them.
• Section 5 specifies generic pass/fail criteria for the tests covered in this plan. They are
supplemented by pass/fail criteria in the test design specification. Note that “fail” in the IEEE
standard terminology means “successful test” in our terminology.
• Section 6 describes the general approach to the testing process. It discusses the reasons for the
selected integration testing strategy. Different strategies are often needed to test different parts
of the system. A UML class diagram can be used to illustrate the dependencies between the
individual tests and their involvement in the integration tests.
• Section 7 specifies the criteria for suspending the testing on the test items associated with the
plan. It also specifies the test activities that must be repeated when testing is resumed.
• Section 8 identifies the resources that are needed for testing. This should include the physical
characteristics of the facilities, including the hardware, software, special test tools, and other
resources needed (office space, etc.) to support the tests.
• Section 9, the core of the test plan, lists the test cases that are used during testing. Each test
case is described in detail in a separate Test Case Specification document. Each execution of
these tests will be documented in a Test Incident Report document. We describe these
documents in more detail later in this section.
• Section 10 of the test plan covers responsibilities, staffing and training needs, risks and
contingencies, and the test schedule.
Figure 11-28 is an outline of a Test Case Specification.
The Test Case Specification identifier is the name of the test case, used to distinguish it from other
test cases. Conventions such as naming the test cases after the features or the component being
tested allow developers to more easily refer to test cases. Section 2 of the TCS lists the components
under test and the features being exercised. Section 3 lists the inputs required for the test cases.
Section 4 lists the expected output. This output is computed manually or with a competing system
(such as a legacy system being replaced). Section 5 lists the hardware and software platform
needed to execute the test, including any test drivers or stubs. Section 6 lists any constraints needed
to execute the test such as timing, load, or operator intervention. Section 7 lists the dependencies
with other test cases.
The Test Incident Report lists the actual test results and the failures that were experienced.
The description of the results must include which features were demonstrated and whether the
features have been met. If a failure has been experienced, the test incident report should contain
sufficient information to allow the failure to be reproduced. Failures from all Test Incident Reports
are collected and listed in the Test Summary Report and then further analyzed and prioritized by
the developers.
11.5.3 Assigning Responsibilities
Testing requires developers to find faults in components of the system. This is best done when the
testing is performed by a developer who was not involved in the development of the component
under test, one who is less reluctant to break the component being tested and who is more likely to
find ambiguities in the component specification.
For stringent quality requirements, a separate team dedicated to quality control is solely
responsible for testing. The testing team is provided with the system models, the source code, and
the system for developing and executing test cases. Test Incident Reports and Test Report
Summaries are then sent back to the subsystem teams for analysis and possible revision of the
system. The revised system is then retested by the testing team, not only to check if the original
failures have been addressed, but also to ensure that no new faults have been inserted in the system.
For systems that do not have stringent quality requirements, subsystem teams can double
as a testing team for components developed by other subsystem teams. The architecture team can
define standards for test procedures, drivers, and stubs, and can perform as the integration test
team. The same test documents can be used for communication among subsystem teams.
One of the main problems of usability tests is enrolling participants. Several obstacles
are faced by project managers in selecting real end users [Grudin, 1990]:
• The project manager is usually afraid that users will bypass established technical support
organizations and call the developers directly, once they know how to get to them. Once this
line of communication is established, developers might be sidetracked too often from doing
their assigned jobs.
• Sales personnel do not want developers to talk to “their” clients. Sales people are afraid that
developers may offend the client or create dissatisfaction with the current generation of
products (which still must be sold).
• The end users do not have time.
• The end users dislike being studied. For example, an automotive mechanic might think that an
augmented reality system will put him out of work.
Debriefing the participants is the key to understanding how to improve the usability of the system being tested. Even though the usability test uncovers and exposes problems, it is often the debriefing session that illustrates why these problems have occurred in the first place. It is
important to write recommendations on how to improve the tested components as fast as possible
after the usability test is finished, so they can be used by the developers to implement any necessary
changes in the system models of the tested component.
11.5.4 Regression Testing
Object-oriented development is an iterative process. Developers modify, integrate, and retest
components often, as new features are implemented or improved. When modifying a component,
developers design new unit tests exercising the new feature under consideration. They may also
retest the component by updating and rerunning previous unit tests. Once the modified component
passes the unit tests, developers can be reasonably confident about the changes within the
component. However, they should not assume that the rest of the system will work with the
modified component, even if the system has previously been tested. The modification can
introduce side effects or reveal previously hidden faults in other components. The changes can
exercise different assumptions about the unchanged components, leading to erroneous states.
Integration tests that are rerun on the system to produce such failures are called regression tests.
The most robust and straightforward technique for regression testing is to accumulate all
integration tests and rerun them whenever new components are integrated into the system. This
requires developers to keep all tests up-to-date, to evolve them as the subsystem interfaces change,
and to add new integration tests as new services or new subsystems are added. As regression testing
can become time consuming, different techniques have been developed for selecting specific
regression tests. Such techniques include [Binder, 2000]:
• Retest dependent components. Components that depend on the modified component are the
most likely to fail in a regression test. Selecting these tests will maximize the likelihood of
finding faults when rerunning all tests is not feasible.
• Retest risky use cases. Often, ensuring that the most catastrophic faults are identified is more
critical than identifying the largest number of faults. By focusing first on use cases that present
the highest risk, developers can minimize the likelihood of catastrophic failures.
• Retest frequent use cases. When users are exposed to successive releases of the same system,
they expect that features that worked before continue to work in the new release. To maximize
the likelihood of this perception, developers focus on the use cases that are most often used by
the users.
In all cases, regression testing leads to running many tests many times. Hence, regression testing
is feasible only when an automated testing infrastructure is in place, enabling developers to
automatically set up, initialize, and execute tests and compare their results with a predefined oracle.
11.5.5 Automating Testing
Manual testing requires a tester to feed predefined inputs into the system using the user interface, a command line console, or a debugger. The tester then compares the outputs generated by the system with the expected oracle. Manual testing can be costly and error prone when many tests are involved or when the system generates a large volume of outputs. When requirements change and the system evolves rapidly, tests need to be repeated often, which makes these drawbacks worse, as it is difficult to guarantee that the same test is executed under the same conditions every time.
The repeatability of test execution can be achieved with automation. Although all aspects
of testing can be automated (including test case and oracle generation), the main focus of test
automation has been on execution. For system tests, test cases are specified in terms of the
sequence and timing of inputs and an expected output trace. The test harness can then execute a
number of test cases and compare the system output with the expected output trace. For unit and
integration tests, developers specify a test as a test driver that exercises one or more methods of
the classes under test.
The benefit of automating test execution is that tests are repeatable. Once a fault is
corrected as a result of a failure, the test that uncovered the failure can be repeated to ensure that
the failure does not occur anymore. Moreover, other tests can be run to ensure (to a limited extent)
that no new faults have been introduced. Moreover, when tests are repeated many times, for
example, in the case of refactoring (see Section 10.3.2), the cost of testing is decreased
substantially. However, note that developing a test harness and test cases is an investment. If tests
are run only once or twice, manual testing may be a better alternative.
An example of an automated test infrastructure is JUnit, a framework for writing and
automating the execution of unit tests for Java classes [JUnit, 2009]. The JUnit test framework is
made out of a small number of tightly integrated classes (Figure 11-29). Developers write new test
cases by subclassing the TestCase class. The setUp() and tearDown() methods of the concrete test
case initialize and clean up the testing environment, respectively. The runTest() method includes
the actual test code that exercises the class under test and compares the results with an expected
condition. The test success or failure is then recorded in an instance of TestResult. TestCases can
be organized into TestSuites, which will invoke sequentially each of its tests. TestSuites can also
be included in other TestSuites, thereby enabling developers to group unit tests into increasingly
larger test suites.
Typically, when using JUnit, each TestCase instance exercises one method of the class
under test. To minimize the proliferation of TestCase classes, all test methods exercising the same
class (and requiring the same test environment initialized by the setUp() method) are grouped in
the same ConcreteTestCase class. The actual method that is invoked by runTest() can then be
configured when creating instances of TestCases. This enables developers to organize and
selectively invoke large numbers of tests.
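A JUnit 3.x style test case following this structure might look as follows. The Tournament and Player classes and their methods are invented for the example; only the framework classes (TestCase, TestSuite) and the setUp()/tearDown() hooks are part of JUnit itself.

import junit.framework.Test;
import junit.framework.TestCase;
import junit.framework.TestSuite;

public class TournamentTest extends TestCase {
    private Tournament tournament;

    public TournamentTest(String name) { super(name); }    // the name selects the test method to run

    protected void setUp()    { tournament = new Tournament("winter", 8); }  // initialize the test environment
    protected void tearDown() { tournament = null; }                         // clean it up

    public void testAddPlayer() {
        tournament.addPlayer(new Player("alice"));
        assertEquals(1, tournament.getNumPlayers());
    }

    public void testMaxNumPlayers() {
        assertEquals(8, tournament.getMaxNumPlayers());
    }

    // TestSuite invoking both tests sequentially; suites can themselves be nested into larger suites.
    public static Test suite() {
        TestSuite suite = new TestSuite();
        suite.addTest(new TournamentTest("testAddPlayer"));
        suite.addTest(new TournamentTest("testMaxNumPlayers"));
        return suite;
    }
}

// Minimal classes under test, included only to make the sketch self-contained.
class Tournament {
    private final String name;
    private final int maxNumPlayers;
    private final java.util.List<Player> players = new java.util.ArrayList<Player>();
    Tournament(String name, int maxNumPlayers) { this.name = name; this.maxNumPlayers = maxNumPlayers; }
    void addPlayer(Player p) { players.add(p); }
    int getNumPlayers() { return players.size(); }
    int getMaxNumPlayers() { return maxNumPlayers; }
}

class Player {
    private final String nickName;
    Player(String nickName) { this.nickName = nickName; }
}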
11.5.6 Model-Based Testing
Testing (manual or automated) requires an infrastructure for executing tests, instrumenting the
system under test, and collecting and assessing test results. This infrastructure is called the test
harness or test system. The test system is made of software and hardware components that interact
with various actors, which can then be modeled using UML. In Chapters 2-10, we have shown
how the system under development and the development organization can be modeled in UML.
Similarly, we can model the test system in UML. To be able to do this, we need to extend UML
with new entity objects for modeling the test system.
UML profiles provide a way for extending UML. A UML profile is a collection of new
stereotypes, new interfaces, or new constraints, thus providing new concepts specialized to an
application domain or a solution domain.
U2TP (UML 2 Testing Profile, [OMG, 2005]) is an example of a UML profile, which
extends UML for modeling testing. Modeling the test system in U2TP provides the same
advantages as when modeling the system under development: test cases are modeled in a standard
notation understood by all participants, test cases can be automatically generated from test models,
and test case executions and results can be automatically collected and recorded.
U2TP extends UML with the following concepts:
• The system under test (stereotype «sut»), which may be the complete system under
development, or only a part of it, such as a subsystem or a single class.
• A test case (stereotype «testCase») is a specification of behavior realizing one or more test
objectives. A test case specifies the sequence of interactions between the system under test and
the test components. The interactions are either stimuli on the system under test or observations
gathered from the system under test or from test components. A test case is represented as a
sequence diagram or state machine. Test cases return an enumerated type called verdict,
denoting if the test run passed, failed, was inconclusive, or an error in the test case itself was
detected. In U2TP terminology, an error is caused by a fault in the test system, while a failure
is caused by a fault in the system under test.
• A test objective (stereotype «testObjective») describes in English the goal of one or several
test cases. A test objective is typically a requirement or a part of a requirement that is being
verified. For example, the test objective of the displayTicketPrices test case is to verify that the
correct price is displayed after a zone button is selected on the ticket distributor.
• Test components (stereotype «testComponent»), such as test stubs and utilities needed for
executing a test case. Examples of test components include simulated hardware, simulated user
behavior, or components that inject faults.
• Test contexts (stereotype «testContext»), which include the set of test cases, the configuration
of test components and system under test needed for every test case, and a test control for
sequencing the test cases.
• An arbiter (interface Arbiter), which collects the local test results into an aggregated result.
• A scheduler (interface Scheduler), which creates and coordinates the execution of the test cases
among test components and system under test.
Figure 11-30 depicts an example of a test system in U2TP for the TicketDistributor. The test
context PurchaseTicketSuite groups all the test cases for the PurchaseTicket use case. The system
under test is the TicketDistributor software. To make it easier to control and instrument the system
to assess the success or failure of tests, we simulate the ticket distributor display with a
DisplaySimulator test component.
For example, Figure 11-31 depicts the expected interactions of the displayTicketPrices() test case
resulting in a pass verdict. selectZone1(), selectZone2(), and selectZone4() are stimuli on the
system under test. The getDisplay() invocations are observations used to assess whether individual test steps were successful.
Note that only the expected interactions are displayed. Any unexpected interactions, missing
interactions, or observations that do not match the oracles, lead to a failed verdict. U2TP also
provides mechanisms, not discussed here, to explicitly model interactions that lead to an
inconclusive or a failed verdict.
The displayTicketPrices() test case of Figure 11-31 explicitly models the mapping between
zones and ticket prices. In a realistic system, this approach would not be sustainable, as many test
cases are repeated for boundary values and with samples of different equivalence classes. To
address this challenge, U2TP provides the concepts of DataPool, DataPartition, and DataSelector,
to represent test data samples, equivalence classes, and data selection strategies, respectively.
These allow developers to parameterize test cases with different sets of values, keeping the specification of
test cases concise and reusable.
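The following sketch illustrates, in plain JUnit code, the data-driven idea behind DataPool and DataPartition: a single test body is exercised against a pool of samples drawn from boundary values and different equivalence classes. The zone-to-price mapping and the lookUpPrice() stand-in are assumptions made purely for illustration; in U2TP the stimuli would be sent to the TicketDistributor through the test components.

import junit.framework.TestCase;

public class TicketPriceTest extends TestCase {

    // Hypothetical data pool: each sample pairs a zone number with the
    // expected price (in cents), covering boundary values and one
    // interior sample per equivalence class.
    private static final int[][] SAMPLES = {
        {1, 120},   // smallest zone (boundary)
        {2, 180},   // interior sample
        {4, 300},   // largest zone (boundary)
    };

    // One concise test body is reused for every sample in the pool.
    public void testDisplayTicketPrices() {
        for (int[] sample : SAMPLES) {
            assertEquals("zone " + sample[0], sample[1], lookUpPrice(sample[0]));
        }
    }

    // Stand-in for the system under test; a real test would stimulate the
    // TicketDistributor and read its display instead.
    private int lookUpPrice(int zone) {
        switch (zone) {
            case 1: return 120;
            case 2: return 180;
            case 4: return 300;
            default: return -1;
        }
    }
}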
Module 5, Chapter 1
Software Maintenance
Often, the software system undergoes a beta testing phase, during which users run the system and
report bugs and deficiencies, and the development team removes them. This period may take a few
weeks or several months, depending on the nature, complexity, and size of the software system.
After beta testing, the software system enters its maintenance phase. The system stabilizes during
the first several months, as users exercise more and more functions and become familiar with the
system's behavior. Removal of bugs and deficiencies continues, but the rate should decrease
significantly. This period is sometimes called the system aging period. As the world evolves, the system's functionality,
performance, quality of service, or security can no longer satisfy the business needs. Enhancement
to the system is required. In this case, a new project is established to identify new capabilities, and
design and implement the new capabilities to enhance the software system. The new project will
go through the steps as described in the previous chapters. In this way, the system evolves during
the prolonged maintenance period. Various surveys show that software maintenance consumes
60%-80% of the total life-cycle costs; 75% or more of the costs are due to enhancements.
Therefore, software maintenance is an important area of software engineering and deserves an
entire book. This chapter serves as an introduction to the topic. After studying this chapter, you
will learn the following:
• Fundamentals of software maintenance.
• Factors that require software maintenance.
• Lehman's laws of system evolution.
• Types of software maintenance.
• Software maintenance process models and activities.
• Software reverse-engineering.
• Software reengineering.
21.1 WHAT IS SOFTWARE MAINTENANCE?
Many software systems that were constructed decades ago are still in use today. These systems are
called legacy systems. Many of the legacy systems will continue to operate in the next several
decades. One of the reasons that these systems cannot be retired as they should be is the high cost of
replacing them. Another reason is that there is no guarantee that the new system will be as good as the
replaced system. This is because the legacy systems have embedded the collective knowledge,
experience, and intelligence of thousands of software engineers, domain experts, and users during
the last several decades. Even if replacing an old system is an option, it is too costly for an
organization to replace an existing system frequently. The costs include system development cost,
costs associated with lost productivity due to procurement, participation in requirements gathering,
system design reviews, acceptance testing, user training, beta testing, and adapting to the new
system. Therefore, after their release, systems undergo a prolonged period of continual
modification to correct errors, enhance capabilities, adapt to new operating platforms or
environments, and improve the system structure so that further changes remain possible. This
process is called software maintenance, defined by the IEEE as follows:
Definition 21.1 Software maintenance is modifying a software system or component after
delivery to correct faults, improve performance, add new capabilities, or adapt to a changed
environment.
21.2 FACTORS THAT MANDATE CHANGE
After the system is released, installed, and operated in the target environment, update to the system
is still needed. A number of factors mandate software change:
1. Bug fixes. Although the software system has been tested to achieve a desired test
coverage, some bugs may still surface during the operational phase. These require bug removal and
regression testing to ensure that the modified software still passes selected tests performed previously.
2. Change in operating environment. Changes in the hardware, platform, and system
configuration may require modification to the software.
3. Change in government policies and regulations. Changes in government policies and
regulations may require changes to the software system to comply with the new policies and
regulations.
4. Change in business procedures. Many software systems automate business operations.
If the procedures of some of the business operations are changed, then the software system must
be modified accordingly. For example, as security becomes important, many web-based
applications require users to set up authentication questions and answers to better authenticate the
users. Such changes are due to changes in business procedures and require changes to the software.
5. Changes to prevent future problems. Sometimes, changes to the software system are
needed to prevent problems that could occur in the future. For example, a complex component may
be redesigned and reimplemented to improve its reliability.
21.3 LEHMAN'S LAWS OF SYSTEM EVOLUTION
Lehman and Belady conducted a series of studies on system evolution. Lehman's laws of
system evolution are the result of these studies. The laws are specified for the so-called E-type of
systems. These are systems that cannot be completely and definitely specified. That is, system
development for such a system is a wicked problem. On the other hand, the S-type systems are
systems that can be completely and definitely specified. Their development is not a wicked
problem. Examples of such systems are mathematical software, chess playing software and the
like. The eight Lehman's laws are:
1. Law of continuing change (1974). After the system is released, changes to the system
are required, and these continue until the system is replaced. Changes are due to reasons
described in Section 21.2.
2. Law of increasing entropy or complexity (1974). The structure of the software system
deteriorates as changes are made. This is because changes introduce errors, which
require more changes. Changes often introduce conditional statements to handle
erroneous situations, or check for invocation of new features. These increase the
complexity of the system and coupling between the components. The result is that the
system becomes more and more difficult to understand and maintain. Restructuring or
reengineering is required to improve the structure of the system to reduce the
maintenance cost.
3. Law of self-regulation (1974). The system evolution process is a self-regulating
process. Many system attributes such as maintainability, release interval, error rate, and
the like may appear to be stochastic from release to release. However, their long-term
trends exhibit observable regularities. In fact, this law is universal. That is, it is not
limited to system evolution. It is applicable to everything because everything is a
system. Consider, for example, the stock chart for a public company. The daily prices
may fluctuate, sometimes drastically. However, the long-term movements exhibit an
upward, downward, or flat trend. This law is due to the eighth law-that is, the law of
feedback systems. Indeed, the regularity is the result of the feedback loops, or
interaction of factors that cancel each other as well as enhance each other during a long
period of maintenance activities. This law is also a generalization of the next three laws-
law of conservation of organizational stability, law of conservation of familiarity, and
law of continuing growth. These three laws state the regularities of three specific
aspects.
4. Law of conservation of organizational stability (1978). The maintenance process for
an E-type system tends to exhibit a constant average work rate over the system's
lifetime.
5. Law of conservation of familiarity (1978). The average incremental growth of the
system remains a constant during the system's lifetime.
6. Law of continuing growth (1991). E-type systems must continue their functional growth
to satisfy their users.
7. Law of declining quality (1996). The quality of E-type systems will appear to be
declining unless they are rigorously adapted to the changes in the operating
environment.
8. Law of feedback systems (1996). The evolution process consists of multilevel,
multiloop, and multiagent feedback systems that play a role in all the laws. That is, the
other laws are due to the feedback behavior.
As stated in Lehman and Belady's article, the law of increasing entropy implies that the
system would be replaced because the cost to maintain it would exceed the cost of building a new
system. This was true for operating systems, which were studied by Lehman and Belady. However,
many organizations find that replacing a legacy application system is not an option because
numerous business processes and business rules have been implemented in the legacy system
during the prolonged maintenance process. Moreover, millions of records are stored in the
databases. Due to inadequate documentation and the complexity of the system, no one really
knows what is implemented and how to port the data records. Therefore, many legacy systems are
still in use and companies spend hundreds of millions of dollars maintaining them each year.
The figure shows that two chains have a length of 14. This means 15 functions form a call
sequence, that is, f1 calls f2, f2 calls f3, ..., f14 calls f15. Thus, to understand the functionality, the
maintainer needs to trace the 14 function calls and make a note of what each function does. He
then derives the functionality from the trace. The figure shows that most of the invocation chains
call two to nine functions, and about 30% of the chains call more than half a dozen functions.
The complex relationships among the classes and the function invocation chains make object-
oriented programs difficult to understand. Unfortunately, the version of the InterViews library
used in the study did not include any in-code comments. Each program file contains only a brief,
generic file header stating the copyright information, and this is the only in-code comment. Thus,
understanding a program is difficult without other supporting documents.
21.5.3 Change Identification and Analysis
Software maintenance needs to identify the needed changes based on the change events.
Sometimes, more than one change is needed for a given event. All these changes should be
identified. It is worthwhile to identify alternative changes to accommodate a change event. For
example, if it is costly or time consuming to change a component and there are commercial off-
the-shelf (COTS) alternatives, then changing the component as well as using COTS should be
identified as alternatives. This information allows the change analysis step to evaluate the options
and select a viable option to pursue. Sometimes, it is difficult to fix a component to remove the
root cause, but it is relatively easy to change a different component to fix the problem, at least
temporarily. In this case, these two alternatives should be identified. The changes identified are
analyzed to:
1. Assess the change impact-that is, which other components will be affected by the
changes made to a given component.
2. Estimate the costs and time required to implement the changes and test the result.
3. Identify risks and define resolution measures.
Object-oriented software exhibits complex dependencies among the classes. Changes made
to one class may affect many classes. This is also called the ripple effect. Figure 21.4 shows the
change impact in the InterViews library discussed earlier. The data indicate that on average, 15
classes are affected when one class is changed. The first three cases indicate that changes made to
one class could affect 51, 62, and 74 classes, respectively. However, in some cases, change impact
is limited. For example, in one case, five classes are changed but only two classes are affected. As
discussed above, there are alternatives to fix a defect or solve a problem. Change impact analysis
should consider the maintenance costs associated with the alternative ways to change the software.
The number of alternatives should be limited to reduce the cost of change analysis. In addition to
change impact, the risks associated with the changes are identified and measures are defined to
resolve the risks if they do occur. The outcome of this step is used in the configuration change
control step to determine if the proposed changes should be performed.
Change impact analysis using the design class diagram is associated with a number of
problems. The design class diagram usually does not include implementation classes and
associated relationships. Moreover, the implementation of the classes and relationships of a design
class diagram may differ from their design counterparts. For example, an implemented class may
have more functions than its design counterpart. These functions may call functions of other
classes. Such dependencies may not exist in the design class diagram. Thus, change impact
analysis based on the design class diagram may produce incorrect results. Finally, if the design
class diagram is not available, then it is not possible to use this approach. Change impact analysis
using a reverse-engineering approach offers a solution to the above problems. This is described in
"Reverse-Engineering" (Section 21.6).
21.5.4 Configuration Change Control
Section 21.5.3 shows that changes made to a class may ripple throughout the system, affecting
many classes. In practice, the classes are developed by different teams and team members. If class
A is changed and class B is affected, then the developer of class B needs to know that his class has
been affected. This is necessary because changes to class B may be needed. Thus, changes to the
components of a system must be made in a coordinated manner; otherwise, the project would
become chaotic. The mechanism to coordinate the changes to the components of a system is called
software configuration management. Configuration change control (CCC) is one of its
components. It performs two main functions:
1. Preparing an engineering change proposal. Based on the change analysis result, the
maintenance personnel prepare an engineering change proposal (ECP). The ECP consists of
administrative forms, supporting technical and administrative materials that specify the proposed
changes, the reasons for the changes, the affected components, and the effort, time, and cost
required to implement the changes. The priorities of the changes are specified. A schedule to
implement the changes is also described.
2. Evaluating the engineering change proposal. The ECP is reviewed by a configuration
change control board (CCCB). The board consists of representatives from different parties
including representatives of the development teams of the components that will be affected by the
changes. If the review raises concerns, then the proposal is modified and resubmitted. In some
cases, the proposal is rejected for various reasons. In this case, the proposal is archived for future
reference.
21.5.5 Change Implementation, Testing, and Delivery
Once the ECP is accepted, the changes are implemented. For many real-world systems, the change
incorporation activity needs to implement a set of changes, which may include all types of
maintenance. The implementation is application dependent and is not pursued further here. The
implementation is tested using the existing as well as new test cases. That is, regression testing and
development testing are performed. The modified and tested system is then deployed to operate in
the target environment. During the operation phase, bugs are recorded, various data such as system
logs, transaction processing times and data needed to compute desired metrics are collected. The
data are used to compute metrics to assess the performance of the system. These results are useful
for the next cycle of maintenance work.
21.6 REVERSE-ENGINEERING
The process that converts the code to recover the design, specification, and a problem statement is
a reverse process of the development process. Therefore, this is called reverse-engineering.
Chikofsky and Cross define reverse-engineering as "the process of analyzing a subject system to
identify the system's components and their interrelationships, and create representations of the
system in another form or at a higher level of abstraction" [46]. In comparison, "the traditional
process of moving from high-level abstractions and logical, implementation-independent designs
to the physical implementation of a system" is called forward-engineering.
Test case generation. The diagrams produced by reverse-engineering facilitate test case
generation. For example, the basis paths of a flowchart diagram facilitate the generation of basis
path test cases. The state diagram facilitates generation of state behavior test cases.
Software reengineering. The diagrams produced by reverse-engineering are useful for
software reengineering-a process that restructures the software system to improve a certain
aspect(s) of the software system. This is described in a later section.
21.6.3 Reverse-Engineering: A Case Study
As an illustration, this section presents OOTWorks, an object-oriented software testing and
maintenance environment reported in "Developing an Object-Oriented Software Testing and Maintenance Environment" [101].
It includes the following components, among others:
Object Relation Diagram (ORD). This component takes the source code and produces a
UML class diagram showing the classes, their attributes and operations, and the relationships
between the classes. The user can select which classes and relationships are displayed, and for
which classes the attributes and methods are shown. The ORD utilities include:
• Change Impact Analysis. The user can select the classes to be changed and have
the tool highlight the classes that are affected, based on the dependencies among
the classes, as described in the last section.
• Software Metrics. This utility calculates software metrics including, for each
class, the class size, number of lines of code, number of children, fan-in, fan-out,
number of relationships, depth-in-inheritance-tree, and so on.
• Version Comparison. This utility takes as input two versions of the source code
and displays the ORD for the old and new versions. Moreover, the new version
highlights the classes added, changed, and affected. The old version highlights the
classes deleted, changed, and affected.
• Test Order. This utility computes the order to test the classes so that the effort
required to implement the test stubs is substantially reduced.
Block Branch Diagram (BBD). The Block Branch Diagram performs reverse engineering
of the functions of a class and displays the flowcharts for the functions. The BBD component also
calculates and displays the basis paths of the function, highlights the basis path selected, and shows
the variables that are used and modified.
Object Interaction Diagram (OID). This component performs reverse engineering of the
source code to generate and display a sequence diagram that describes the interaction between the
objects.
Object State Diagram (OSD). This component performs reverse-engineering of the
source code to produce and display a state diagram that describes the state dependent behavior of
an object. The utilities include state reachability analysis and state-based fault analysis.
21.7 SOFTWARE REENGINEERING
Software reengineering is an important activity of software maintenance. It is a process that
restructures a software system or component to improve certain aspects of the software. As
Lehman's laws indicate, software systems undergo continual changes. These cause the structure of
the software system to deteriorate. As a consequence, the software becomes more difficult to
comprehend and more costly to maintain. In this case, it is necessary to restructure the software
system to reduce the maintenance cost.
21.7.1 Objectives of Reengineering
As discussed above, reengineering is required to improve the structure of the software system so
that further maintenance activities can be performed cost effectively. Besides this, software
reengineering is sometimes performed to improve the quality, security, and performance aspects
of a software system. More specifically, software reengineering is often performed with one or
more objectives in mind. The following is a list of such objectives:
1. Improving the software architecture. One important software reengineering
objective is to improve the software architecture of an existing system. The need for
improvement may be due to different reasons. Improving the software architecture is
achieved by applying architectural design patterns, security patterns, and design
patterns. For example, the controller pattern is often applied to decouple the graphical
user interface from the business objects to improve the architecture.
2. Reducing the complexity of the software. Studies indicate that the complexity of a
system has significant impact on the quality and security of a software system. The
complexity of a software system or component can be measured in different ways. One
complexity metric is the cyclomatic complexity, which was proposed by McCabe and
discussed in Chapters 19 and 20. It measures the number of independent paths or
control flows in a function. If the cyclomatic complexity is high, then the function is
difficult to understand, implement, test, and maintain. Many patterns can be applied to
reduce the complexity. These include observer, state, strategy, and other patterns.
3. Improving the ability to adapt to changes. This includes application of appropriate
design patterns to improve the structure and behavior of the software system so that it
is more adaptable to changes in requirements and operating environment. For example,
the design of the persistence framework in Chapter 17 lets the system easily adapt to
changes in the database management system. Applying this framework to an existing
system improves its adaptiveness.
4. Improving the performance, efficiency, and resource utilization. During the system
operation phase, data about various aspects of the system are collected and metrics are
computed. These are valuable information for identifying places for improvement, for
example, performance bottlenecks, poor workload distribution and poor resource
utilization. Architectural styles, patterns, and efficient algorithms can be applied to
improve the system. For example, virtual proxy, smart proxy, fly-weight, and prototype
can be applied to improve performance, object creation speed, and memory usage.
5. Improving the maintainability of the software system. Many patterns can be applied
to make the software system easier to maintain. These include abstract factory, bridge,
builder, chain of responsibility, command, composite, decorator, facade, factory
method uses conditional statements to test if a metric is selected. If so, it calculates the metric. The
complexity reflects the use of 38 conditional checks to determine which metrics need to be computed.
The next step is to select an improvement strategy. The use of conditional statements
implies behavior variations. That is, different metrics are calculated by using different algorithms.
This suggests that the polymorphism pattern can be applied. The polymorphism pattern is
summarized as follows:
Problem: How does one handle behavior variations without using conditional statements?
Solution: Define an interface for the behaviors that vary and let the subclasses implement the
behavior variations.
Thus, an abstract class called Metric is defined. It implements the ActionListener interface of Java.
Its actionPerformed(...) method invokes its abstract computeMetric(...) method. The subclasses
of Metric implement the computeMetric(...) method to compute the concrete metrics. Moreover,
the subclasses are the action listeners of the respective metric selection widgets, which are check
boxes. In this way, when the user checks a metric check box, the corresponding metric is
calculated. When the user clicks the Display Metrics button, the selected metrics are displayed.
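A minimal sketch of this redesign follows. The method names actionPerformed(...) and computeMetric(...) follow the text; the LinesOfCode subclass, the way a metric result is stored, and the small demo driver are assumptions made only to illustrate how the conditional checks are replaced by polymorphism.

import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;

// Each subclass supplies its own metric computation; the 38-way
// conditional of the original code is replaced by polymorphism.
abstract class Metric implements ActionListener {
    protected double value;
    protected boolean selected;

    // Invoked when the user toggles the metric's check box.
    public void actionPerformed(ActionEvent e) {
        selected = !selected;
        if (selected) {
            value = computeMetric();   // behavior variation handled by the subclass
        }
    }

    // Behavior variation: each concrete metric implements its own algorithm.
    protected abstract double computeMetric();

    public double getValue() { return value; }
}

// Illustrative concrete metric; it replaces one branch of the old conditional.
class LinesOfCode extends Metric {
    private final String[] sourceLines;

    LinesOfCode(String[] sourceLines) { this.sourceLines = sourceLines; }

    protected double computeMetric() {
        return sourceLines.length;
    }
}

class MetricDemo {
    public static void main(String[] args) {
        Metric loc = new LinesOfCode(new String[] { "class A {", "}" });
        loc.actionPerformed(null);            // simulates checking the metric's check box
        System.out.println(loc.getValue());   // prints 2.0
    }
}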
The third step implements the proposed improvements. That is, the skeleton code for the
Metric abstract class is implemented. Test-driven development is applied to implement each of the
concrete Metric classes, one at a time. More specifically, the skeleton code for the subclass is
implemented. Tests are written to test the unimplemented computeMetric(...) method and make
sure all the tests fail. The implementation of the computeMetric(...) involves copying-and-pasting
the existing code into the appropriate places. The code is modified if needed. In most cases, very
little effort is required to modify the reused code. The tests are run to ensure that the metric is
correct. The process iterates for each of the metrics.
Finally, the modified tool is applied to assess the complexity of the modified class. It shows
that the complexities of the computeMetrics(...) methods of the subclasses are low. As expected,
about 40 new classes are added. Regression testing is performed to ensure that the change does not
alter other parts of the software. For this case study, this is indeed the case. In addition to a substantial
reduction in complexity, the improvement makes the component much easier to maintain. For
example, adding new metrics is very easy: one needs only to add and implement a Metric subclass and
a check box to notify an object of this class. Test-driven development of this extension is also made
easy, compared to testing the chunk of code that contains 38 nested if-then-else statements.
21.8 PATTERNS FOR SOFTWARE MAINTENANCE
As discussed previously, many patterns can be applied during the maintenance phase to improve
the software system in various ways. This section presents two new patterns. The facade pattern
is useful for simplifying the client that interacts with a group of components. While the facade
pattern simplifies the client interface, the mediator pattern simplifies the internal interaction of
the group of components. Patterns that can be applied during the maintenance phase are not limited
to these two patterns. Therefore, other patterns that can be applied during maintenance are also
reviewed.
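As a brief illustration of the facade idea, the sketch below wraps three subsystem components behind a single analyze() operation, so the client no longer coordinates them directly. The component names Parser, MetricEngine, and ReportWriter are hypothetical and are not the components used in the text; the mediator pattern is not shown here.

// Subsystem components hidden behind the facade.
class Parser {
    String parse(String source) { return "AST(" + source + ")"; }
}

class MetricEngine {
    int measure(String ast) { return ast.length(); }
}

class ReportWriter {
    String write(int value) { return "complexity = " + value; }
}

// The facade: clients call one method instead of coordinating three components.
class AnalysisFacade {
    private final Parser parser = new Parser();
    private final MetricEngine engine = new MetricEngine();
    private final ReportWriter writer = new ReportWriter();

    String analyze(String source) {
        String ast = parser.parse(source);
        int value = engine.measure(ast);
        return writer.write(value);
    }
}

class FacadeDemo {
    public static void main(String[] args) {
        System.out.println(new AnalysisFacade().analyze("class A {}"));
    }
}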
Effort estimation tools are useful for calculating the required time, effort, and costs to
implement the proposed improvements. Configuration management tools such as Concurrent
Versions System (CVS) and Subversion are useful for coordinating the changes to maintain the
consistency of the software being reengineered.
Regression testing tools are useful for rerunning the test cases to ensure that the system
satisfies the requirements and reengineering does not introduce new errors. Some of the tools can
analyze the software and select a subset of test cases to rerun. This reduces the regression testing
time and effort.
Module 5, Chapter 2
Software Configuration Management
During the software life cycle, numerous documents are produced. These include requirements
specification, software design documents, source code, and test cases. These documents depend
on each other. For example, a software design is usually derived from and dependent on the
requirements specification. Classes depend on other classes. This means that changing the
requirements specification requires changes to the design and changing one class requires changes
to other classes. In general, changes made to a document may ripple throughout the project,
affecting many other documents.
This chapter presents concepts, activities, and techniques for controlling changes to the
documents produced during the life cycle, and tracking the status of the software system and its
components. These activities belong to the software engineering discipline referred to as software
configuration management (SCM). Traditionally, configuration management applied only to the
development of the hardware elements of a hardware-software system. It is concerned with the
consistent labeling, tracking, and change control of the hardware elements of a system. Software
configuration management adapts the traditional discipline to software development. In this
chapter, you will learn the following:
• Basic concepts of SCM including software configuration item and baseline.
• Functions of SCM including software configuration identification, software configuration change
control, software configuration auditing, and software configuration status accounting.
• Knowledge of SCM tools such as Revision Control System (RCS), Concurrent Versions System
(CVS), Subversion, Domain Software Engineering Environment (DSEE), and ClearCase.
22.1 THE BASELINES OF A SOFTWARE LIFE CYCLE
During the software development life cycle, numerous documents are produced. As each set of
documents is produced and passes quality reviews, the project is moving closer toward its
completion. In this sense, the successful production of the needed software artifacts serves as a
measurement of the progress of the project. More specifically, the production of such documents
at significant check points, such as the end of the requirements phase, the end of the design phase,
and so forth, act like the milestones of a long journey. These milestones let us know the progress
status of the project and product. If the project reaches the milestones as scheduled, then the team
knows that it will be able to complete the project on time; otherwise, the team needs to take action
to resolve the discrepancy.
In SCM, the milestones are called baselines. A baseline denotes a significant state of the
software life cycle. For example, the baselines for the agile process/methodology described in
Chapter 1, that is, Figures 2.15 and 2.16, are shown in Figure 22.1. Each baseline is associated
with a set of software artifacts or documents produced in the baseline. These artifacts or documents
are called software configuration items (SCIs). In practice, each project defines its baselines and
configuration items taking into consideration factors such as project size, budget, and available
resources. The concept of a baseline serves several purposes:
1. It defines the important states of progress of a project or product. The baselines in Figure
22.1 define the significant states of a given project.
2. It signifies that the project or product has reached a given state of progress. That is, a
baseline is established when the required SCIs are produced and pass the SQA reviews. At this
point, the SCIs are checked in to the configuration management system. Once a configuration item
is checked in to the configuration management system, changes to the item must go through a
procedure to ensure that the changes will maintain the consistency of the configuration of the
system.
3. It forms a common basis for subsequent development activities. Before the establishment
of the requirements baseline, the teams could proceed with the design activities but changes to the
requirements and use cases are to be expected. The establishment of the requirements baseline
"freezes" the documents associated with the baseline, that is, changes can no longer be made freely.
Needed changes must be documented and evaluated to assess their impact to configuration items
produced in subsequent activities such as design diagrams and implementation.
4. It is a mechanism to control changes to configuration items, as explained in the previous point.
configuration management system. When all these documents are checked in, the baseline is
established. The configuration item management aspect of SCM is concerned with updates that
are made to the baseline items. That is, before a document is checked in to the configuration
management system, changes to the document can be made freely. However, once the document
is checked in, then any update to the document must go through a change control procedure to
coordinate the update.
22.3 WHY SOFTWARE CONFIGURATION MANAGEMENT?
For small projects that involve only a few developers working closely at one location, the need for
SCM is not acute. The team members can talk to each other in person to synchronize the updates
to the software artifacts they produce. However, many real-world software systems are developed
by many teams and developers working on shared, interdependent software artifacts
simultaneously, at different locations. In these cases, the work of one team cannot be started until
other documents are produced. Therefore, mechanisms are needed to establish the baselines and
publicize such information so that all of the teams are aware of the progress status of the project.
Updates to the software artifacts must be carefully coordinated to allow the teams to assess the
impact and avoid inconsistent updates or overwriting the work produced by others.
Besides the need to synchronize the multiple distributed teams working together on a
project, the need to maintain different versions of a software system requires SCM. Multiple
versions of a system are needed to satisfy the needs of different customers, for example, different
customers require different modules of the system. If a vendor has a few dozen customers, then
the vendor may need to maintain dozens of versions of a software system to satisfy the needs of
its customers. In addition, the vendor may need to maintain different releases of a software system.
In some cases, there are subtle differences between the compilers from different compiler vendors.
This means there are variations in the source code of the software system, resulting in different
versions.
In summary, SCM is needed to coordinate the development activities of the multiple
development teams and team members, as well as to support maintenance of multiple versions of
a software system.
22.4 SOFTWARE CONFIGURATION MANAGEMENT FUNCTIONS
As depicted in Figure 22.2, SCM consists of four main functions. These are outlined below and
detailed in the following sections.
• Software configuration identification. Software configuration identification defines the
baselines, the configuration items, and a naming schema to uniquely identify each of the
configuration items. This function is performed when a new project starts.
• Configuration change control. Software configuration change control exercises control over
changes to the configuration items to ensure the consistency of the system configuration and
successful cooperation between the teams and team members. This function is performed when
change requests arrive, due to events that require changes.
• Software configuration auditing. Software configuration auditing verifies and validates the
baselines and configuration items, defines and executes mechanisms for formally establishing the
baselines, and ensures that proposed changes are properly implemented.
• Software configuration status accounting. Software configuration status accounting is
responsible for tracking and maintaining information about the system configuration. It provides
database support to the other three functions.
relationship, consider a sequence diagram that is derived from an expanded use case. Obviously,
if the expanded use case is modified, the sequence diagram may be affected and may need to be
modified as well. As an example of change impact due to an aggregation relationship, consider a
design specification that contains an expanded use case and a sequence diagram derived from the
expanded use case. If the expanded use case is deleted, then the sequence diagram must be deleted.
These imply that the design specification must be changed.
The abstract software configuration item (SCI), which is displayed in Figure 22.3 in italic
font, defines a set of attributes and operations that are common to all configuration items. Useful
attributes include, but are not limited to, the following:
• ID number-A unique ID to identify the SCI. It should bear certain semantics to communicate
the functionality of the SCI and the system or subsystem it belongs to. For example, a domain
model constructed in increment 1 for a library information system may have an ID number like
LIS-Inc1-DM.
• name-The name of the configuration item, for example, Checkout Document Expanded Use
Case, Checkout Document Sequence Diagram, and so on.
• document type-The type of the document of the SCI, for example, requirements specification,
domain model, design specification, test cases, and the like. This attribute eliminates the need for
subclassing.
• document file-The document file or the full path name for the file that contains the SCI.
• author-The developer who creates the configuration item.
• date created, target completion date, and date completed-These are useful for tracking the
status of the SCI.
• version number-This is used to keep track of the multiple versions of a configuration item.
• update history-A list of update summaries, each of which briefly specifies the update, who
performed it, and the date of the update.
• description-A brief description of the configuration item.
• SQA personnel-The technical staff member who is responsible for the quality assurance of the configuration
item.
• SCM personnel-The technical staff member who is responsible for checking in the configuration item.
As usual, a simple configuration item has concrete operations to set and get attributes. It
may also include abstract operations for verifying and validating the configuration item as well as
computing various metrics. A composite configuration item has additional operations to add,
remove, and get component configuration items. Finally, a baseline has operations to add, remove,
and get a predecessor as well as operations to add, remove, and get a configuration item. To apply
the model, each concrete project extends the abstract leaf classes to provide concrete
implementations of the abstract operations. In particular, a subclass is created for a set of SCIs
that share the same behavior.
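The following sketch outlines the configuration item model just described as a small composite structure. Only a few of the listed attributes appear, and the class names SimpleSCI, CompositeSCI, and Baseline, as well as the placeholder verify() behavior, are assumptions made for illustration.

import java.util.ArrayList;
import java.util.List;

// Abstract configuration item: common attributes and operations.
abstract class SCI {
    private final String idNumber;     // e.g., "LIS-Inc1-DM"
    private final String name;
    private String versionNumber;
    private final List<String> updateHistory = new ArrayList<>();

    protected SCI(String idNumber, String name) {
        this.idNumber = idNumber;
        this.name = name;
    }

    public String getIdNumber() { return idNumber; }
    public String getName() { return name; }
    public void setVersionNumber(String v) { versionNumber = v; }
    public void recordUpdate(String summary) { updateHistory.add(summary); }

    // Projects extend the model with concrete verification behavior.
    public abstract boolean verify();
}

// A simple configuration item, e.g., an expanded use case document.
class SimpleSCI extends SCI {
    SimpleSCI(String idNumber, String name) { super(idNumber, name); }
    public boolean verify() { return true; }   // placeholder check
}

// A composite configuration item, e.g., a design specification that
// contains an expanded use case and a sequence diagram.
class CompositeSCI extends SCI {
    private final List<SCI> components = new ArrayList<>();
    CompositeSCI(String idNumber, String name) { super(idNumber, name); }
    public void addComponent(SCI item) { components.add(item); }
    public boolean verify() {
        return components.stream().allMatch(SCI::verify);
    }
}

// A baseline groups the configuration items produced at a milestone and
// keeps a reference to its predecessor baseline.
class Baseline {
    private final String name;
    private Baseline predecessor;
    private final List<SCI> items = new ArrayList<>();
    Baseline(String name) { this.name = name; }
    public void setPredecessor(Baseline p) { predecessor = p; }
    public void addItem(SCI item) { items.add(item); }
}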
this case, it is returned to proposal preparation function; and (3) the proposal is approved, in this
case, the changes are made.
5. Incorporate changes. The approved changes are made to the software system.
Versions System (CVS), Subversion (SVN), Domain Software Engineering Environment (DSEE),
IBM ClearCase, and many others. SCCS is one of the earliest computer-aided tools for source
code revision control. RCS controls access to shared files through an access list of login names
and the ability to lock and unlock a revision. CVS is a substantial extension of RCS and is a
preinstalled plugin of NetBeans. Subversion was initially designed to replace CVS; and hence, it
possesses all of the CVS capabilities. However, since its inception in 2000, Subversion has evolved
beyond a CVS replacement and introduced a comprehensive set of advanced features. DSEE is a
proprietary SCM software, which forms the basis for IBM ClearCase. Figure 22.5 is a comparative
summary of some of the features of RCS, CVS, Subversion, and ClearCase. Appendix C.7
describes in detail how to use CVS and Subversion in NetBeans.
Tools that support system build include make and ant. Make is a UNIX/Linux utility and
ant provides the functions of make to build systems using Java components. These tools let the
software engineer specify a script or a sequence of commands. System build is accomplished by
executing the script. Nowadays, system build is supported by almost all of the integrated
development environments (IDEs).