Large-Scale C++ Software Design
John Lakos
ADDISON-WESLEY
An imprint of Addison Wesley Longman, Inc.
Reading, Massachusetts Harlow, England Menlo Park, California
Berkeley, California Don Mills, Ontario Sydney
Bonn Amsterdam Tokyo Mexico City
The authors and publishers have taken care in the preparation of this book, but make no
expressed or implied warranty of any kind and assume no responsibility for errors or
omissions. No liability is assumed for incidental or consequential damages in connection
with or arising out of the use of the information or programs contained herein.
The publisher offers discounts on this book when ordered in quantity for special sales.
For more information please contact:
DEDICATION
To my parents, Marci and Gene, who prepared and encouraged me.
To my wife, Cathy, who suffered through it with me.
To my daughter, Sarah, who was born in the middle of it all.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, electronic, mechanical, photocopying,
recording, or otherwise, without prior written permission of the publisher. Printed in the
United States of America. Published simultaneously in Canada.
ISBN 0-201-63362-0
3456789 10 11-MA-00999897
Third printing, January 1997
Contents
Figure List
Preface
Chapter 0: Introduction
    0.1 From C to C++
    0.2 Using C++ to Develop Large Projects
        0.2.1 Cyclic Dependencies
        0.2.2 Excessive Link-Time Dependencies
        0.2.3 Excessive Compile-Time Dependencies
        0.2.4 The Global Name Space
        0.2.5 Logical vs. Physical Design
    0.3 Reuse
    0.4 Quality
        0.4.1 Quality Assurance
        0.4.2 Quality Ensurance
    0.5 Software Development Tools
    0.6 Summary
PART I: BASICS
Chapter 1: Preliminaries
    1.1 Multi-File C++ Programs
        1.1.1 Declaration versus Definition
        1.1.2 Internal versus External Linkage
Chapter 3: Components
    3.1 Components versus Classes
    3.2 Physical Design Rules
    3.3 The DependsOn Relation
    3.4 Implied Dependency
    3.5 Extracting Actual Dependencies
    3.6 Friendship
        3.6.1 Long-Distance Friendship and Implied Dependency
        3.6.2 Friendship and Fraud
    3.7 Summary
Appendix A: The Protocol Hierarchy
    Intent
    Also Known As
    Motivation
    Applicability
    Structure
    Participants
    Collaborations
    Consequences
    Implementation
    Sample Code
    Known Uses
    Related Patterns
Appendix B: Implementing an ANSI C-Compatible C++ Interface
    B.1 Memory Allocation Error Detection
    B.2 Providing a Main Procedure (ANSI C Only)
Appendix C: A Dependency Extractor/Analyzer Package
    C.1 Using adep, cdep, and ldep
    C.2 Command-Line Documentation
    C.3 Idep Package Architecture
    C.4 Source Code
Appendix D: Quick Reference
    D.1 Definitions
    D.2 Major Design Rules
    D.3 Minor Design Rules
    D.4 Guidelines
    D.5 Principles
Bibliography
Index
Preface
Back in 1985, Mentor Graphics became one of the first companies to attempt a truly
large project in C++. Back then no one knew how to do that, and no one could have
anticipated the cost overruns, slipped schedules, huge executables, poor performance,
and incredibly expensive build times that a naive approach would inevitably produce.
Many valuable lessons were learned along the way, knowledge obtained through bitter
experience. There were no books to help guide the design process; object-oriented
designs on this scale had never before been attempted.
Ten years later, with a wealth of valuable experience under its belt, Mentor Graphics
has produced several large software systems written in C++, and in doing so has
paved the way for others to do the same without having to pay such a high price for
the privilege.
Audience
Large-Scale C++ Software Design was written explicitly for experienced C++ soft-
ware developers, system architects, and proactive quality-assurance professionals.
This book is particularly appropriate for those involved in large development efforts
such as databases, operating systems, compilers, and frameworks.
Developing a large-scale software system in C++ requires more than just a sound
understanding of the logical design issues covered in most books on C++ program-
ming. Effective design also requires a grasp of physical design concepts that, although
closely tied to the technical aspects of development, include a dimension with which
even expert professional software developers may have little or no experience.
Yet most of the advice presented in this book also applies to small projects. It is typical
for a person to start with a small project and then begin to take on larger and more
challenging enterprises. Often the scope of a particular project will expand, and what
starts out as a small project becomes a major undertaking. The immediate conse-
quences of disregarding good practice in a large project, however, are far more severe
than they are for disregarding good practice in a smaller project.
This book unites high-level design concepts with specific C++ programming details
to satisfy two needs:
Make no mistake, this is an advanced text. This is not the book from which to learn
C++ syntax for the first time, nor is it likely to expose you to the dark corners of the
language. Instead, this book will show you how to use the full power of the C++ lan-
guage in ways that scale well to very large systems.
In short, if you feel that you know C++ well, but would like to understand more about
how to use the language effectively on large projects, this book is for you.
Most people learn by example. In general, I have supplied examples that illustrate
real-world designs. I have avoided examples that illustrate one point but have blatant
errors in other aspects of the design. I have also tried to avoid examples that illustrate
a detail of the language but serve no other useful purpose.
Except where otherwise indicated, all examples in this text are intended to represent
"good design." Examples presented in earlier chapters are therefore consistent with
all practices recommended throughout the book. A disadvantage of this approach is
that you may see code that is written differently from the code you are used to seeing,
without yet knowing exactly why. I feel that being able to use all of the examples in
the book for reference compensates for this drawback.
There are two notable exceptions to this practice: comments and package prefixes.
Comments for many of the examples in this text have simply been omitted for lack of
space. Where they are presented, they are at best minimal. Unfortunately, this is one
place where the reader is asked to "do as I say, not as I do," at least in this book. Let
the reader be assured that in practice I am scrupulous about commenting all interfaces
as I write them (not after).
The second exception is the inconsistent use of package prefixes in the early examples
of the book. In a large project environment package prefixes are required, but they are
awkward at first and take some getting used to. I have elected to omit the consistent use
of registered package prefixes until after they are formally presented in Chapter 7, so as
not to detract from the presentation of other important fundamental material.
Many texts note that inline functions are used in examples for textual brevity when
illustrating intended functionality. Since much of this book is directly related to orga-
nizational issues such as when to inline, my tendency will be to avoid inline functions
xxviii Preface
There are a variety of popular file name extensions used to distinguish C++ header
files and C++ implementation files. For example:
Throughout the examples we consistently use the .h extension to identify C++ header
files and the .c extension to identify C++ implementation files. In the text, we will
frequently refer to header files as .h files and to implementation files as .c files.
Finally, all of the examples in this text have been compiled and are syntactically
correct using SUN's version of CFRONT 3.0 running on SUN SPARCstations, as well as
on HP700 series machines running their native C++ compiler. Of course, any errors are
the sole responsibility of the author.
A Road Map
There is a lot of material to cover in this book. Not all readers will have the same
background. I have therefore provided some basic (but essential) material in Chapter
1 to help level the field. Expert C++ programmers may choose to skim this section or
simply refer to it if needed. Chapter 2 contains a modest collection of software design
rules that I would hope every experienced developer will quickly ratify.
Chapter 0: Introduction.
An overview of what lies in wait for the large-scale C++ software
developer.
PART I: BASICS
Chapter 1: Preliminaries.
A review of basic language information, common design patterns, and
style conventions used in this book.
The remainder of the text is divided into two main sections. The first, entitled "Physical
Design Concepts," presents a sequence of important topics related to the physical
structure of large systems. The material in these chapters (3 through 7) focuses on
aspects of programming that will be entirely new to many readers, and cuts right to
the bone of large program design. This section is presented "bottom up," with each
chapter drawing on information developed in previous chapters.
Chapter 3: Components.
The fundamental physical building blocks of a system.
Chapter 5: Levelization.
Specific techniques for reducing link-time dependencies.
Chapter 6: Insulation.
Specific techniques for reducing compile-time dependencies.
Chapter 7: Packages.
Extending the above techniques to yet larger systems.
The final section, entitled "Logical Design Issues," addresses the conventional
discipline of logical design in conjunction with physical design. These chapters (8
through 10) address the design of a component as a whole, summarize the myriad
issues surrounding sensible interface design, and address implementation issues in the
context of a large-project environment.
Acknowledgments
This book would not have been possible without the diligence of my many colleagues
at Mentor Graphics who have contributed to the company's landmark architectural
and development efforts.
First and foremost, I would like to recognize the contributions of my friend, col-
league, and former college classmate Franklin Klein, who reviewed virtually every
page of the manuscript in its raw form. Franklin provided a sounding board for pre-
senting many concepts that will be new to most software developers. The depth of
Franklin's wisdom, intelligence, knowledge, diplomacy, and grasp of the nuances of
effective communication is unprecedented in my experience. His detailed comments
are responsible for countless revisions in the content, flow, and demeanor of the pre-
sentation.
Several dedicated and gifted software professionals reviewed all or most of the mate-
rial in this book during its formative stages. I consider myself fortunate that they
agreed to invest their valuable time reviewing this book. I would like to thank Brad
Appleton, Rick Cohen, Mindy Garber, Matt Greenwood, Amy Katriel, Tom
O'Rourke, Ann Sera, Charles Thayer, and Chris Van Wyk for the enormous energy
they spent helping to make this book as valuable as it could be. In particular, I would
like to thank Rick Eesley for many fertile discussions and practical recommendations,
especially his plea for a summary at the end of each chapter.
Several expert software developers and quality assurance engineers reviewed individ-
ual chapters. I would like to thank Samir Agarwal, Jim Anderson, Dave Arnone, Rob-
ert Brazile, Tom Cargill, Joe Cicchiello, Brad Cox, Brian Dalio, Shawn Edwards, Gad
Gruenstein, William Hopkins, Curt Horkey, Ajay Kamdar, Reid Madsen, Jason Ng,
Pete Papamichael, Mahesh Ragavan, Vojislav Stojkovic, Clovis Tondo, Glenn Wikle,
Steve Unger, and John Vlissides for their technical contributions. I would also like to
thank Lisa Cavaliere-Kaytes and Tom Matheson of Mentor Graphics for their sugges-
tions regarding some of the figures in this text. In addition I would like to acknowl-
edge the contributions of Eugene Lakos and Laura Mengel.
Since the original printing of this book, I would like to thank the following readers for
helping me to remove some of the inevitable errors for which I take full responsibility:
Jamal Khan, Oat Nguyen, and Scott Meyers.
This book might never have been written were it not for a promotional letter I
received at Columbia University offering me a complimentary review copy of Rob
Murray's book. Since I teach only during the Spring semester, I returned the enclosed
form, but requested that the book be sent to Mentor Graphics instead of Columbia.
Soon after that, I received a call from Pradeepa Siva (of Addison-Wesley's Corporate
& Professional Publishing Group) determined to get to the bottom of this unusual
request. After convincing her of its legitimacy (and some perhaps gratuitous
self-aggrandizement) she remarked, "I think my boss would like to talk with you." A few
days after that, I met with her boss, the publisher. I had always revered the excellence
of the Professional Computing Series produced by this group, and it is that reputation
that ultimately compelled me to commit to writing this book for that series.
I owe a great deal to the members of the Corporate & Professional Publishing Group
at Addison-Wesley. John Wait, its publisher, has patiently provided me with insights
into people and communication that I will forever cherish. From relentlessly reading
books and reviews, to direct discussions with individual software professionals, to
standing in bookstores and discreetly observing the buying habits of potential readers,
John Wait has his fingers on the pulse of the industry.
The production staff headed by Marty Rabinowitz is dedicated to excellence in all its
respects. Despite apprehension expressed to me by authors in academia (associated
with other publishers), I was delighted with the tremendous importance placed by
Marty on delivering a technically accurate, readily usable, and aesthetically appealing
rendering of the author's ideas. I especially want to thank Frances Scanlon for her tire-
less and seemingly endless efforts in typesetting this entire book.
Brian Kernighan, the technical editor of this series, provided valuable contributions
on both style and substance, as well as finding many typographical errors and incon-
sistencies that no one else caught. The depth and breadth of his knowledge coupled
with his concise writing style has in no small way contributed to the success of this
series.
Finally, I would like to thank the other authors in this series for documenting funda-
mental logical concepts and design practices that this book takes for granted.
Introduction
Developing good C++ programs is not easy. Developing highly reliable and maintain-
able software in C++ becomes even more difficult and introduces many new concepts
as projects become larger. Just as experience gained from building single-family
homes does not qualify a carpenter to erect a skyscraper, many techniques and prac-
tices learned through experiences with smaller C++ projects simply do not scale well
to larger development efforts.
This book is about how to design very large, high-quality software systems. It is
intended for experienced C++ software developers who strive to create highly
maintainable, highly testable software architectures. This book is not a theoretical
approach to programming; it is a thorough, practical guide to success, drawing from
years of experience of expert C++ programmers developing huge, multi-site systems.
We will demonstrate how to design systems that involve hundreds of programmers,
thousands of classes, and potentially millions of lines of C++ source code.
This introduction considers some of the kinds of problems encountered when devel-
oping large projects in C++, and provides a context for the groundwork we must do
in the early chapters. In this introduction several terms are used without definition.
Most of these terms should be understandable from context. In the chapters that fol-
low, these terms are defined more precisely. The real payoff will come in Chapter 5,
where we begin to apply specific techniques to reduce the coupling (i.e., the degree
of interdependency) within our C++ systems.
2 Introduction Chapter 0
C++ is not just an extension of C: it supports an entirely new paradigm. The object-
oriented paradigm is notorious for demanding more design effort and savvy than its
procedural counterpart. C++ is more difficult to master than C, and there are innu-
merable ways to shoot yourself in the foot. Often you won't realize a serious error
until it is much too late to fix it and still meet your schedule. Even relatively small
indiscretions, such as the indiscriminate use of virtual functions or the passing of
user-defined types by value, can result in perfectly correct C++ programs that run ten
times slower than they would have had you written them in C.
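The cost of passing user-defined types by value can be made concrete with a small experiment. The sketch below is an illustration only: Matrix, firstByValue, firstByReference, and the two counting helpers are hypothetical names that do not come from the text. It counts copy constructions rather than measuring time, so it shows where the extra work comes from without claiming any particular slowdown factor.

```cpp
#include <cassert>

// Hypothetical user-defined type that counts how often it is copied.
struct Matrix {
    static int s_copies;   // copy constructions performed so far
    double d_data[64];

    Matrix() : d_data() {}                 // zero-initialize the elements
    Matrix(const Matrix& other)
    {
        for (int i = 0; i < 64; ++i) {
            d_data[i] = other.d_data[i];
        }
        ++s_copies;
    }
};

int Matrix::s_copies = 0;

// Pass by value: the argument is copied on every call.
double firstByValue(Matrix m) { return m.d_data[0]; }

// Pass by reference to const: no copy is made.
double firstByReference(const Matrix& m) { return m.d_data[0]; }

// Returns the number of copies caused by 'calls' by-value calls.
int copiesWhenPassedByValue(int calls)
{
    assert(calls >= 0);
    Matrix m;
    const int before = Matrix::s_copies;
    for (int i = 0; i < calls; ++i) {
        firstByValue(m);
    }
    return Matrix::s_copies - before;
}

// Returns the number of copies caused by 'calls' by-reference calls.
int copiesWhenPassedByReference(int calls)
{
    assert(calls >= 0);
    Matrix m;
    const int before = Matrix::s_copies;
    for (int i = 0; i < calls; ++i) {
        firstByReference(m);
    }
    return Matrix::s_copies - before;
}
```

Each by-value call copies all 64 elements, while the by-reference calls copy nothing; in a tight loop that difference is the kind of small indiscretion the paragraph above warns about.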
During the initial exposure to C++, there is invariably a period during which produc-
tivity will grind to a halt as the seemingly limitless design alternatives are explored.
During this period, conventional procedural programmers will be filled with an
uneasiness as they try to get their arms around the concept referred to as
object-oriented.
Although the size and complexity of the C++ language can at first be somewhat over-
whelming for even the most experienced professional C programmers, it does not take
too long for a competent C programmer to get a small, nontrivial C++ program up and
running. Unfortunately, the undisciplined techniques used to create small programs in
C++ are totally inadequate for tackling larger projects. That is to say, a naive application
of C++ technology does not scale well to larger projects. The consequences for the
uninitiated are many.
Just like a program in C, a poorly written C++ program can be very hard to under-
stand and maintain. If interfaces are not fully encapsulating, it will be difficult to tune
Contrary to popular belief, object-oriented programs in their most general form are
fundamentally more difficult to test and verify than their procedural counterparts.
The ability to alter internal behavior via virtual functions can invalidate class invari-
ants essential to correct performance. Further, the potential number of control flow
paths through an object-oriented system can be explosively large.
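As a minimal sketch of how a virtual function can invalidate a class invariant, consider the hypothetical classes below (Counter, BackwardCounter, and neverDecreases are invented for this illustration and do not appear in the text). The base class relies on the unstated assumption that step() never returns a negative number; a derived class breaks that assumption without modifying a single line of the base class.

```cpp
#include <cassert>

// Hypothetical base class whose invariant is "value() never decreases".
class Counter {
    int d_value;

  protected:
    // The base class silently assumes every override returns a value >= 0.
    virtual int step() const { return 1; }

  public:
    Counter() : d_value(0) {}
    virtual ~Counter() {}

    void advance() { d_value += step(); }
    int value() const { return d_value; }
};

// A derived class can alter internal behavior and break the invariant
// without touching any of the base class's code.
class BackwardCounter : public Counter {
  protected:
    virtual int step() const { return -1; }
};

// Returns true if value() never decreased over 'n' calls to advance().
bool neverDecreases(Counter& counter, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        const int before = counter.value();
        counter.advance();
        if (counter.value() < before) {
            return false;
        }
    }
    return true;
}
```

Testing Counter on its own says nothing about BackwardCounter: every override multiplies the behaviors that must be verified, which is one reason the number of control flow paths grows so quickly.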
As programs get larger, forces of a different nature come into play. The following sub-
sections illustrate specific instances of some of the kinds of problems that we are
likely to encounter.
As a software professional, you have probably been in a situation where you were
looking at a software system for the first time and you could not seem to find a
reasonable starting point or a piece of the system that made sense on its own. Not being
able to understand or use any part of a system independently is a symptom of a
cyclically dependent design. C++ objects have a phenomenal tendency to get tangled up in
each other. This insidious form of tight physical coupling is illustrated in Figure 0-1. A
circuit is a collection of elements and wires. Consequently, class Circuit knows
about the definitions of both Element and Wire. An element knows the circuit to
which it belongs, and can tell whether or not it is connected to a specified wire. Hence
class Element also knows about both Circuit and Wire. Finally, a wire can be
connected to a terminal of either an element or a circuit. In order to do its job, class Wire
must access the definitions of both Element and Circuit.
The definitions for each of these three object types reside in separate physical compo-
nents (translation units) in order to improve modularity. Even though the implementa-
tions of these individual types are fully encapsulated by their interfaces, however, the
.c files for each component are forced to include the header files of the other two. The
resulting dependency graph for these three components is cyclic. That is, no one com-
ponent can be used or even tested without the other two.
// circuit.h
// ...
class Wire;
class Circuit {
    // ...
    Wire *addWire(const char*);
    Wire *addElem(const char*);
    // ...
};

// circuit.c
#include "circuit.h"
#include "wire.h"
#include "element.h"
// ...

// element.h
// ...
class Circuit;
class Wire;
class Element {
    // ...
    Circuit *getParent();
    int isConn(const Wire&);
    // ...
};

// element.c
#include "element.h"
#include "circuit.h"
#include "wire.h"
// ...

// wire.h
// ...
class Element;
class Circuit;
class Wire {
    // ...
    void conn(Element*, int term);
    void conn(Circuit*, int term);
    // ...
};

// wire.c
#include "wire.h"
#include "element.h"
#include "circuit.h"
// ...

[Figure 0-1: dependency graph of the circuit, element, and wire components]
Large systems that are naively architected tend to become tightly coupled by cyclic
dependencies and fiercely resist decomposition. Supporting such systems can be a
nightmare, and effective modular testing is often impossible.
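The claim that the three components form a cyclic dependency graph can be checked mechanically. The following is a minimal sketch, assuming each component is represented only by its name and the list of components whose headers its .c file includes; DependencyGraph, reachesCycle, hasCycle, and the two graph builders are hypothetical names for this illustration and are unrelated to the dependency tools listed under Appendix C.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each component name to the components whose headers it includes.
typedef std::map<std::string, std::vector<std::string> > DependencyGraph;

// Depth-first search: returns true if a cycle is reachable from 'node'.
bool reachesCycle(const DependencyGraph& graph,
                  const std::string&     node,
                  std::set<std::string>& inProgress,
                  std::set<std::string>& finished)
{
    if (finished.count(node)) {
        return false;                  // already fully explored
    }
    if (!inProgress.insert(node).second) {
        return true;                   // node is on the current search path
    }
    DependencyGraph::const_iterator it = graph.find(node);
    if (it != graph.end()) {
        for (std::size_t i = 0; i < it->second.size(); ++i) {
            if (reachesCycle(graph, it->second[i], inProgress, finished)) {
                return true;
            }
        }
    }
    inProgress.erase(node);
    finished.insert(node);
    return false;
}

// Returns true if any component depends, directly or indirectly, on itself.
bool hasCycle(const DependencyGraph& graph)
{
    std::set<std::string> inProgress;
    std::set<std::string> finished;
    for (DependencyGraph::const_iterator it = graph.begin();
         it != graph.end(); ++it) {
        if (reachesCycle(graph, it->first, inProgress, finished)) {
            return true;
        }
    }
    return false;
}

// The circuit example: each .c file includes the other two headers.
DependencyGraph circuitExampleGraph()
{
    DependencyGraph g;
    g["circuit"].push_back("wire");
    g["circuit"].push_back("element");
    g["element"].push_back("circuit");
    g["element"].push_back("wire");
    g["wire"].push_back("element");
    g["wire"].push_back("circuit");
    assert(g.size() == 3);
    return g;
}

// A hypothetical acyclic arrangement of the same names, for contrast only.
DependencyGraph acyclicExampleGraph()
{
    DependencyGraph g;
    g["circuit"].push_back("wire");
    g["circuit"].push_back("element");
    g["wire"].push_back("element");
    g["element"];                      // depends on nothing
    return g;
}
```

Calling hasCycle on the first graph reports a cycle, and on the second it does not. The acyclic variant is not a proposed redesign of the circuit example; it only shows what the check looks like when components can be ordered so that each depends solely on components below it.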
Section 0.2.2 Excessive Link-Time Dependencies
One of the nice things about objects is that it is easy to add missing functionality as the
need presents itself. This almost seductive feature of the paradigm has tempted many
conscientious developers to turn lean, well-thought-out classes into huge dinosaurs
that embody a tremendous amount of code, most of which is unused by the vast
majority of its clients. Figure 0-2 illustrates what can happen when the functionality in
a simple String class is allowed to grow to fill the needs of all clients. Each time a
new feature is added for one client, it potentially costs all of the rest of the clients in
terms of increased instance size, code size, runtime, and physical dependencies.
C++ programs are often larger than necessary. If care is not taken, the executable size
for a C++ program could be much larger than it would be if the equivalent program
were written in C. By ignoring external dependencies, overly ambitious class develop-
ers have created sophisticated classes that directly or indirectly depend on enormous
amounts of code. A "Hello World" program employing one particularly elaborate
String class produced an executable size of 1.2 megabytes!
// str.h
#ifndef INCLUDED_STR
#define INCLUDED_STR

class String {
    char *d_string_p;
    int d_length;
    int d_size;
    int d_count;
    // ...
    double d_creationTime;

  public:
    String();
    String(const String& s);
    String(const char *cp);
    String(const char c);
    // ...
    ~String();
    String &operator=(const String& s);
    String &operator+=(const String& s);
    // ...
    // (27 pages omitted!)
    // ...
    int isPalindrome() const;
    int isNameOfFamousActor() const;
};
// ...
#endif

// str.c
#include "str.h"
#include "sun.h"
#include "moon.h"
#include "stars.h"
// ...
// (lots of dependencies omitted)
// ...
#include "everyone.h"
#include "theirbrother.h"

String::String()
: d_string_p(0)
, d_length(0)
, d_size(0)
, d_count(0)
// ...
// ...
// ...
Overweight types such as this String class not only increase executable size but can
make the linking process unduly slow and painful. If the time necessary to link in
String (along with all of its implementation dependencies) is large relative to the
time it would otherwise take to link your subsystem, it becomes less likely that you
would bother to reuse String.
Fortunately, techniques exist for avoiding these and other forms of unwanted
link-time dependencies.
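One way to picture a leaner alternative is to keep the class minimal and move client-specific operations into a separate utility that only interested clients link against. The sketch below is an assumption made for illustration, not necessarily the technique this book goes on to present: the trimmed-down String and the StringUtil struct are hypothetical, and in a real system StringUtil would live in its own component (its own .h and .c files).

```cpp
#include <cassert>
#include <cstring>

// Hypothetical lean string: only what every client needs.
class String {
    char *d_string_p;   // owned, null-terminated copy
    int   d_length;

    static char *clone(const char *cp, int length)
    {
        char *copy = new char[length + 1];
        std::memcpy(copy, cp, length + 1);
        return copy;
    }

  public:
    String(const char *cp)
    : d_string_p(0)
    , d_length(0)
    {
        assert(cp);
        d_length = static_cast<int>(std::strlen(cp));
        d_string_p = clone(cp, d_length);
    }

    String(const String& s)
    : d_string_p(clone(s.d_string_p, s.d_length))
    , d_length(s.d_length)
    {
    }

    ~String() { delete [] d_string_p; }

    String& operator=(const String& s)
    {
        if (this != &s) {
            char *copy = clone(s.d_string_p, s.d_length);
            delete [] d_string_p;
            d_string_p = copy;
            d_length = s.d_length;
        }
        return *this;
    }

    int length() const { return d_length; }

    char operator[](int i) const
    {
        assert(0 <= i && i < d_length);
        return d_string_p[i];
    }
};

// Hypothetical utility, imagined as a separate component. Clients that
// never ask about palindromes never pay for this code.
struct StringUtil {
    // Returns 1 if 's' reads the same forward and backward, and 0 otherwise.
    static int isPalindrome(const String& s)
    {
        for (int i = 0, j = s.length() - 1; i < j; ++i, --j) {
            if (s[i] != s[j]) {
                return 0;
            }
        }
        return 1;
    }
};
```

Because isPalindrome needs only the public interface of String, adding or changing it does not touch the String component at all, so instance size, code size, and physical dependencies stay the same for every other client.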
If you have ever tried to develop a multi-file program in C++, then you know that
changing a header file can potentially cause several translation units to recompile. At
the very early stages of system development, making a change that forces the entire
system to recompile presents no significant burden. As you continue to develop your
system, however, the idea of changing a low-level header file becomes increasingly
distasteful. Not only is the time necessary to recompile the entire system increasing,
but so is the time to compile even individual translation units. Sooner or later, there
comes a point where you simply refuse to modify a low-level class because of the cost
of recompiling. If this sounds familiar, then you may have experienced excessive
compile-time dependencies.
Excessive compile-time coupling, which is virtually irrelevant for small projects, can
grow to dominate the development time for larger projects. Figure 0-3 shows a
common example of what appears to be a good idea at first but turns bad as the size of a
system grows. The myerror component defines a struct, MyError, that contains an
enumeration of all possible error codes. Each new component that is added to the
system naturally includes this header file. Unfortunately, each new component may
have its own error codes that have not already been identified in the master list.
// myerror.h
#ifndef INCLUDED_MYERROR
#define INCLUDED_MYERROR

struct MyError {
    enum Codes {
        SUCCESS = 0,
        WARNING,
        ERROR,
        IO_ERROR,
        // ...
        READ_ERROR,
        WRITE_ERROR,
        // ...
        // ...
        BAD_STRING,
        BAD_FILENAME,
        // ...
        // ...
        CANNOT_CONNECT_TO_WORK_PHONE,
        CANNOT_CONNECT_TO_HOME_PHONE,
        // ...
        // ...
        MARTIANS_HAVE_LANDED,
        // ...
    };
};

#endif
As the number of components gets larger, our desire to add to this list will wane. We
will be tempted to reuse existing error codes that are, perhaps, only roughly
appropriate just to avoid changing myerror.h. Eventually, we will abandon any thought of
adding a new error code, and simply return ERROR or WARNING rather than change
myerror.h. By the time we reach this point, the design is unmaintainable and
practically useless.
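For contrast, here is a minimal sketch of one alternative: each component declares only the status codes it can itself produce, so adding a code forces recompilation of that component's clients alone. This is an assumption made for illustration and may differ from the remedy the book develops later; FilenameStatus, PhoneStatus, checkFilename, and dial are hypothetical names, and the 255-character limit is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical component "filename": owns only the codes it can produce.
struct FilenameStatus {
    enum Code { SUCCESS = 0, EMPTY_NAME, NAME_TOO_LONG };
};

FilenameStatus::Code checkFilename(const char *name)
{
    assert(name);
    const std::size_t MAX_LENGTH = 255;   // arbitrary limit for this sketch
    const std::size_t length = std::strlen(name);
    if (0 == length) {
        return FilenameStatus::EMPTY_NAME;
    }
    if (length > MAX_LENGTH) {
        return FilenameStatus::NAME_TOO_LONG;
    }
    return FilenameStatus::SUCCESS;
}

// Hypothetical component "phone": a new code added here affects only the
// clients of this component, not every translation unit in the system.
struct PhoneStatus {
    enum Code { SUCCESS = 0, CANNOT_CONNECT };
};

PhoneStatus::Code dial(const char *number)
{
    assert(number);
    // Stand-in logic: an empty number can never be connected.
    return '\0' == number[0] ? PhoneStatus::CANNOT_CONNECT
                             : PhoneStatus::SUCCESS;
}
```

The two enumerations never meet in a shared header, so neither component has a reason to include, or be recompiled because of, the other.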
Section 0.2.3 Excessive Compile-Time Dependencies
There are many other causes of unwanted compile-time dependencies. A large C++
program tends to have many more header files than an equivalent C program. The
unnecessary inclusion of one header file by another is a common source of excessive
coupling in C++. In Figure 0-4, for example, it is not necessary to include the
definition of objects in the simulator header file just because a client of class Simulator
may find these definitions useful. Doing so forces the client to depend at compile
time on all such components whether or not they are actually used. Excessive include
directives not only increase the cost of compiling the client, but increase the
likelihood that the client will need to be recompiled as a result of a low-level change to the
system.
The problem was due to organizational details illustrated in part by the simulator
component shown in Figure 0-4. Cosmetic techniques were developed to mitigate this
problem, but the real solution came when the unnecessary compile-time dependencies
were eliminated.
// simulator.h
#ifndef INCLUDED_SIMULATOR
#define INCLUDED_SIMULATOR

#include "cadtool.h"           // required by "IsA" relationship
#include "myerror.h"           // bad idea (see Section 6.9)
#include "circuitregistry.h"   // unnecessary compile-time dependency
#include "inputtable.h"        // unnecessary compile-time dependency
#include "circuit.h"           // required by "HasA" relationship
#include "rectangle.h"         // unnecessary compile-time dependency
// ...
#include <iostream.h>          // unnecessary compile-time dependency

#endif
As with link-time dependencies, there are several specific techniques available for
eliminating compile-time dependencies.
If you have ever worked on a multi-person C++ project, then you know that software
integration is a common forum for unwanted surprises. In particular, the proliferation
of global identifiers can become problematic. One obvious danger is that these names
can collide. The consequence is that the individually developed parts of the system
For example, I have used object libraries that have consisted of literally thousands of
header files. I recall trying to find the definition of a type TargetId at file scope that
looked like a class (but wasn't):
TargetId id;
I remember trying to "grep"3 through all of the thousands of header files looking for
the definition, only to receive a message to the effect that there were too many files. I
wound up having to nest the grep command in a shell script that split up the header
files based on the first letter in order to pare down the problem into 26 problems of
manageable size. I eventually discovered that the "class" I was looking for was not a
class at all. Nor was it a struct or a union! As illustrated in Figure 0-5, the type
TargetId, it turned out, was actually a typedef declaration at file scope for an int!
// upd_system.h
#ifndef INCLUDED_UPD_SYSTEM
#define INCLUDED_UPD_SYSTEM

typedef int TargetId;   // typedef at file scope

#endif
The typedef had introduced a new type name into the global name space. There was
no indication that the type was an int, nor was there any hint of where I might find its
definition.
// upd_system.h
#ifndef INCLUDED_UPD_SYSTEM
#define INCLUDED_UPD_SYSTEM

class upd_System {
    // ...
  public:
    typedef int TargetId;   // much better!
    // ...
};

#endif
Had the typedef declaration been nested within a class (as suggested in Figure 0-6),
the reference would have been qualified with that class name (or the declaration
would have been inherited), making it straightforward to track down:
Following simple practices like the one suggested above can minimize the likelihood
of collisions and at the same time make logical entities easier to find in large systems.
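A minimal sketch of such a qualified reference, assuming a condensed version of the class in Figure 0-6. The function nextTarget is hypothetical and exists only to show how a client would have to spell the type name:

```cpp
// Condensed, hypothetical version of upd_system.h (Figure 0-6).
class upd_System {
  public:
    typedef int TargetId;   // nested: the name lives in the scope of upd_System
};

// A client must qualify the name, which tells the reader where to look for it.
upd_System::TargetId nextTarget(upd_System::TargetId id)
{
    return id + 1;          // TargetId is still just an int
}
```

A search for upd_System leads directly to the header that defines TargetId, which is the point of nesting the typedef.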
Most books on C++ address only logical design. Logical design is that which pertains
to language constructs such as classes, operators, functions, and so on. For example,
whether a particular class should or should not have a copy constructor is a logical
design issue. Deciding whether a particular operator (e.g., operator==) should be a
class member or a free (i.e., nonmember) function is also a logical issue. Even selecting
the types of the internal data members of a class would fall under the umbrella of
logical design.
C++ supports an overwhelmingly rich set of logical design alternatives. For example,
inheritance is an essential ingredient of the object-oriented paradigm. Another, called
layering, involves composing types from more primitive objects, often embedded
directly in the class definition. Unfortunately, there are many who would try to use
inheritance where layering is indicated: A Telephone is not a kind of Receiver, Dial,
or Cord; rather, it is composed of (or "layered on") those primitive parts.
Misdiagnosing a situation in this way can lead to inefficiencies in both time and space,
and can obscure the semantics of the architecture to a point where the entire system
becomes difficult to maintain. Knowing when (and when not) to use a particular lan-
guage construct is part of what makes the experienced C++ developer so valuable.
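The Telephone example can be sketched with layering as follows. The member functions and data shown for Receiver, Dial, and Telephone are assumptions made for illustration only; the text names the classes but does not define them:

```cpp
// Primitive parts, reduced to the bare minimum (hypothetical interfaces).
class Receiver {
    bool d_offHook;
  public:
    Receiver() : d_offHook(false) {}
    void lift() { d_offHook = true; }
    bool isOffHook() const { return d_offHook; }
};

class Dial {
    int d_lastDigit;
  public:
    Dial() : d_lastDigit(-1) {}
    void press(int digit) { d_lastDigit = digit; }
    int lastDigit() const { return d_lastDigit; }
};

// Layering (HasA): a Telephone is composed of its parts; it is not a kind of
// Receiver or Dial, so it does not inherit from either.
class Telephone {
    Receiver d_receiver;
    Dial d_dial;
  public:
    void pickUp() { d_receiver.lift(); }
    void dial(int digit)
    {
        if (d_receiver.isOffHook()) {   // digits register only when off hook
            d_dial.press(digit);
        }
    }
    bool isOffHook() const { return d_receiver.isOffHook(); }
    int lastDigit() const { return d_dial.lastDigit(); }
};
```

Because the parts are private data members, Telephone exposes only the operations that make sense for a telephone, rather than the whole interface of each part.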
Logical design does not address issues such as where to place a class definition. From
a purely logical perspective, all definitions at file scope exist at the same level in a sin-
gle space without boundaries. Where a class is defined relative to its member defini-
tions and supporting free operators is not relevant to logical design. All that is
important is that these logical entities somehow come together to form a working pro-
gram, and that, because the entire program is thought of as a single unit, there is no
notion of individual physical dependencies. The program as a whole depends on itself.
There are several good books on logical design (see the bibliography). Unfortunately,
there are also many problems, which arise only as programs get larger, that these
books do not address. This is because much of the material relevant to successful
large-system design falls under a different category, referred to in this book as
physical design.
Physical design addresses the issues surrounding the physical entities of a system (e.g.,
files, directories, and libraries) as well as organizational issues such as compile-time or
link-time dependencies between physical entities. For example, making a member
ring() of class Telephone an inline function forces any client of Telephone to
have seen not only the declaration of ring() but also its definition in order to compile.
The logical behavior of ring() is the same whether or not ring() is declared
inline. What is affected is the degree and character of the physical coupling between
Telephone and its clients, and therefore the cost of maintaining any program using
Telephone.
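The two alternatives can be sketched side by side. The class names InlineTelephone and OutOfLineTelephone and the counter d_rings are hypothetical; they stand in for two versions of the same Telephone header:

```cpp
// Variant 1: ring() is inline, so its body must appear in the header.
// Every client that includes the header is compiled against that body.
class InlineTelephone {
    int d_rings;
  public:
    InlineTelephone() : d_rings(0) {}
    void ring() { ++d_rings; }            // defined in the class: implicitly inline
    int rings() const { return d_rings; }
};

// Variant 2: the header only declares ring() ...
class OutOfLineTelephone {
    int d_rings;
  public:
    OutOfLineTelephone() : d_rings(0) {}
    void ring();                          // declaration only
    int rings() const { return d_rings; }
};

// ... and the definition would live in the .c file, where it can change
// without forcing clients to recompile.
void OutOfLineTelephone::ring() { ++d_rings; }
```

Both variants behave identically; only the amount of implementation that clients see at compile time differs.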
Good physical design, however, involves more than passively deciding how to partition
the existing logical entities of a system. Physical design implications will often dictate
the outcome of logical design decisions. For example, relationships between classes
in the logical domain, such as IsA, HasA, and Uses, collapse into a single relation-
ship, DependsOn, between components in the physical domain. Furthermore, the
dependencies of a sound physical design will form a graph that has no cycles. There-
fore we avoid logical design choices that would imply cyclic physical dependencies
among components.
Simultaneously satisfying the constraints of both logical and physical design may, at
times, prove challenging. In fact, some logical designs may have to be reworked or
even replaced in order to meet the physical design quality criteria. In my experience,
however, there have always been solutions that adequately address both domains,
although it may (at first) take some time to discover them.
For small projects that fit easily into a single directory, physical design may warrant
little concern. However, for larger projects the importance of a sound physical design
grows rapidly. For very large projects, physical design will be a critical factor in deter-
mining the success of the project.
0.3 Reuse
Object-oriented design touts reuse as an incentive, yet like many other benefits of the
paradigm, it is not without cost. Reuse implies coupling, and coupling in itself is
undesirable. If several programmers are attempting to use the same standard component
without demanding functional changes, the reuse is probably reasonable and justified.
Consider, however, the scenario where there are several clients working on different
programs, and each is attempting to "reuse" a common component to achieve some-
what different purposes. If those otherwise independent clients are actively seeking
enhancement support, they could find themselves at odds with one another as a result
of the reuse: an enhancement for one client could disrupt the others. Worse, we could
wind up with an overweight class (like the String class of Figure 0-2) that serves the
needs of no one.
Reuse is often the right answer. But in order for a component or subsystem to be
reused successfully, it must not be tied to a large block of unnecessary code. That is, it
must be possible to reuse the part of the system that is needed without having to link
in the rest of the system.
Large projects stand to benefit from their implementors' knowing both when to reuse
code and when to make code reusable.
0.4 Quality
Quality has many dimensions. Reliability addresses the traditional definition of quality
(i.e., "Is it buggy?"). A product that is easy to use and does the right thing most of the
time is often considered adequate. For some applications, however (in areas such as
aerospace, medical, and financial, for example), errors can be extremely costly. In
general, software cannot be made reliable through testing alone; by the time you are
able to test it, the software's intrinsic quality has already been established. Not all
software can be tested effectively. For software to be tested effectively, it must be
designed from the start with that goal in mind.
Design for testability, although rarely the first concern of smaller projects, is of para-
mount importance when successfully architecting large and very large C++ systems.
Testability, like quality itself, cannot be an afterthought: it must be considered from
the start, before the first line of code is ever written.
There are many other aspects to quality besides reliability. Functionality, for example,
addresses whether a product does what the customer expects. Sometimes a product
will fail to gain acceptance because it does not have enough of the features that
customers have come to expect. Worse, a product can miss its mark altogether: if a
customer expects to buy a screwdriver, the best hammer in the world will fail a function-
ality test. Having a clear functional specification that meets marketing requirements
before development is underway is an important first step toward ensuring appropriate
functionality. In this book, however, we consider techniques that address how to
design and build large systems, and not what large systems to design.
Usability is yet another measure of quality. Some software products can be very
powerful in the right hands. However, it is not enough that the developer be able to
use the product effectively. If the product is too complex, difficult, awkward, or pain-
ful for the typical intended customer to pick up and use, it will not be used. Often
when we say user, we think of the end user of the system. In a large, hierarchically
designed system, however, the clients of your component are probably just other com-
ponents. Early feedback from customers (including other developers) is essential for
ensuring usability.
of customers. A poorly designed system written in C++ (or any other language, for
that matter) can be expensive to maintain and even more expensive to extend. Large,
maintainable designs don't just happen; they are engineered by following a discipline
that ensures maintainability.
Performance addresses how fast and small the product is. Although object-oriented
design is known to have valuable advantages in the areas of extensibility and reuse,
there are aspects of the paradigm that, if applied naively, can cause programs to run
more slowly and require more memory than is necessary. If our code runs too slowly,
or if it requires much more memory than a competitor's product, we cannot sell it. For
example, modeling every character in a text editor as an object, although perhaps
theoretically appealing, could be an inappropriate design decision if we are interested
in optimal space/time performance. 4 Attempting to replace a heavily used fundamental
type (such as int) with a user-defined version (such as a BigInt class) will
inevitably degrade performance. If we fail to address our performance goals in the
beginning, we may adopt architectures or coding practices that will preclude our ever
achieving these goals, short of rewriting the entire system. Knowing where to accept
some inelegance and knowing how to contain the effects of performance trade-offs
distinguishes software engineers from mere programmers.
4 See the Flyweight pattern in gamma, Chapter 4, pp. 195-206 for a clever solution to this particu-
lar kind of performance problem.
In this process model, the distinction of QA and development is blurred; the technical
qualifications for either position are essentially the same. One day, an engineer could
write an interface and have another engineer review it for consistency, clarity, and
usability. The next day the roles could be reversed. To be truly effective, the culture
must be one of teamwork, with each member helping the other to ensure high-quality
software as it is being developed.
Providing a complete process model is a huge task and well beyond the scope of this
book. However, if high-quality software is to be achieved, system architects and software
developers must take the lead by designing in the quality all along the way.
Large projects can benefit from many kinds of tools, including browsers, incremental
linkers, and code generators. Even simple tools can be very useful. A detailed descrip-
tion of a simple dependency analyzer that I have found invaluable in my own work is
provided in Appendix C.
Some tools can help to mitigate the symptoms of a poor design. Class browsers can
help to analyze convoluted designs and find definitions for logical entities that would
otherwise be hidden, buried within a large project. Sophisticated programming
environments with incremental linkers and program databases can help to push the
envelope of what can be accomplished even with a poor physical design. But none of
these tools address the underlying problem: a lack of inherent design quality.
Unfortunately there is no single quick and easy way to achieve quality. Tools alone
cannot solve fundamental problems resulting from a poor physical design. Although
tools can postpone the onset of some of these symptoms, no tool will design in the
quality for you, nor will it ensure that your design complies with its specification.
Ultimately, it is experience, intelligence, and discipline that yield a quality product.
0.6 Summary
C++ is a whole lot more than just an extension of C. Cyclic link-time dependencies
among translation units can undermine understanding, testing, and reuse. Unnecessary
or excessive compile-time dependencies can increase compilation cost and destroy
maintainability. A disorganized, undisciplined, or naive approach to C++ development
will virtually guarantee that these problems occur as projects become larger.
Most C++ design books address only logical issues (such as classes, functions, and
inheritance) and ignore physical issues (such as files, directories, and dependencies).
In larger systems, however, physical design quality will dictate the correct outcome of
many logical design decisions.
Reuse is not without cost. Reuse implies coupling, and coupling can be undesirable.
Unwarranted reuse is to be avoided.
Quality has many dimensions: reliability, functionality, usability, maintainability, and per-
formance. Each of these dimensions contributes to the success or failure of large projects.
Finally, good tools are an important part of the development process. But tools cannot
make up for a lack of inherent design quality in large C++ systems. This book is about
how to design in that quality.
PART I:
This book covers quite a bit of material relating to object-oriented design and C++
programming. Not all readers will have the same background. In Part I of this text, we
address the fundamentals in an effort to reach a common starting point from which to
launch further discussions.
Chapter 1 is a review of several key properties of the C++ language, basic object-
oriented design principles and notation, and standard coding and documentation
conventions used throughout this text. The purpose of this chapter is to help level the
field. It is expected that much of this material will be familiar to many readers. Nothing
presented here is new. Expert C++ programmers may choose to skim this chapter or
simply refer to it as needed.
This chapter reviews some important aspects of the C++ programming language and
object-oriented analysis that are fundamental to large-system design. Nothing revolu-
tionary is presented; some material, however, may be unfamiliar. We start by examin-
ing multi-file programs, declaration versus definition, and internal versus external
linkage in the contexts of both header (.h) and implementation (.c) files. Next we
explore the use of typedef declarations and assert statements. After considering a few
matters of style regarding naming conventions and class member layout, we explore
one of the most common object-oriented design patterns: iterator. We conclude with
a thorough discussion of the logical design notation used throughout this book, a
brief discussion of inheritance versus layering, and, finally, a recommendation for
minimality in our interfaces.
For all but the tiniest programs, it is neither wise nor practical to place an entire program
in a single file. For one thing, each time you made a change to any part of the program,
you would be forced to recompile the program in its entirety. You also would not be
able to reuse any part of your program in another program without copying the source
code to another file. Such duplication can quickly become a maintenance headache.
Placing the source code for cohesive parts of a program in separate files enables the
program to be compiled more efficiently, while enabling its parts to be reused in other
programs.
In this section, we review some basic properties of the structure of the C++ language with
regard to programs that are created from several source files. These concepts will be
used frequently throughout this book.
int f(int, int);
int f(int, int);
class IntSetIter;
class IntSetIter;
typedef int Int;
typedef int Int;
friend IntSetIter;
friend IntSetIter;
extern int globalVariable;   // bad idea (global variable declaration)
extern int globalVariable;   // (see Section 2.3.1)
are all declarations, and can be repeated any number of times within a single scope.
On the other hand, the following declarations at file scope are also definitions, and
therefore cannot be seen more than once in a given scope without triggering a compile-
time error:
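A few representative definitions at file scope are sketched below. The particular names (x, p, s_instanceCount, twice, half, Color, DEFAULT_TOLERANCE, Stack) are assumptions chosen for illustration:

```cpp
int x;                                      // object definition
char *p;                                    // object definition
static int s_instanceCount;                 // object definition
static int twice(int a) { return 2 * a; }   // function definition
inline int half(int a) { return a / 2; }    // inline function definition
enum Color { RED, GREEN, BLUE };            // enumeration definition
const double DEFAULT_TOLERANCE = 1.0e-6;    // const object definition
class Stack { };                            // class definition

// Repeating any one of the lines above in the same scope (for example,
// a second "int x;") is rejected by the compiler as a redefinition.
```

Each line both introduces a name and provides the unique description of the entity behind it, which is why none of them may be repeated.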
We should note that function and static data member declarations are exceptions that,
although not definitions, may not be repeated within the definition of a class:
class NoGood {
    static int i;   // declaration
    static int i;   // illegal in C++
  public:
    int f();        // declaration
    int f();        // illegal in C++
};
When a .c file is compiled, the header files are first included (recursively) by the C
preprocessor (cpp) to form a single source file containing all the necessary information.
This intermediate file (called a translation unit) is then compiled to produce a .o
file (object file) with the same root name. Linkage connects the symbols produced
within the various translation units to form an executable program. There are two dis-
tinct kinds of linkage: internal and external. The kind of linkage used will directly
influence how we incorporate a given logical construct in our physical design.
Internal linkage means that access to the definition is limited to the current translation
unit. That is, a definition with internal linkage is not "visible" to any other translation
unit and therefore cannot be used to resolve undefined symbols during the linking
process. For example,
static int x;
is defined at file scope, but the keyword static forces the linkage to be internal.
Another example of internal linkage is an enumeration:
Enumerations are definitions (not just declarations), but never themselves introduce
symbols into the .o file. In order for definitions with internal linkage to affect other
parts of a program, they must be placed in the header file, not the .c file.
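As a small sketch of this point, consider an enumeration that would normally be placed in a header. The names Color and isGreen are hypothetical:

```cpp
// Would normally live in a header so that every translation unit sees it.
enum Color { RED, GREEN, BLUE };   // a definition, yet it adds no symbol to the .o file

// Any translation unit that has seen the enumeration can use its names directly.
int isGreen(Color color)
{
    return color == GREEN;
}
```

If the enumeration were defined only in one .c file, no other translation unit could refer to RED, GREEN, or BLUE, because there is no symbol for the linker to resolve.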
class Point {
    int d_x;
    int d_y;
  public:
    Point(int x, int y) : d_x(x), d_y(y) { }   // internal linkage
    int x() const { return d_x; }              // internal linkage
    int y() const { return d_y; }              // internal linkage
    // ...
};                                             // internal linkage

inline int operator==(const Point& left, const Point& right)
{
    return left.x() == right.x() && left.y() == right.y();
}                                              // internal linkage
External linkage means that the definition is not limited to a single translation unit.
Definitions with external linkage produce external symbols in the .o file that are
accessible by all other translation units for resolving their undefined symbols. Such
external symbols must be unique throughout the program or the program will not link.
Note that we will consistently refer to a nonmember function as a free function and
never as a friend function. A free function need not be a friend of any class; whether
or not it is should be an implementation detail (see Section 3.6).
Where possible, the C++ compiler substitutes the body of an inline function directly
in place of the function call and introduces no symbols into the .o file. Sometimes the
compiler will elect (for various reasons, such as recursion or dynamic binding) to lay
down a static copy of an inline function. This static copy introduces only a local symbol
into the current .o file, which cannot interact with external symbols.
Because a declaration is solely for the benefit of the current translation unit,
declarations themselves introduce nothing at all into a .o file. Consider the following
declarations:
None of these declarations themselves affects the contents of the resulting .o file.
Instead, each of these declarations merely names an external symbol, enabling the
current translation unit to gain access to the corresponding global definition if needed.
It is actually the use of the symbol name (e.g., calling a function) and not the declaration
itself that causes an undefined symbol to be introduced into the .o file. It is precisely
this fact that allows early prototyping: as long as the missing functionality is not
needed, partially implemented objects can be used in running programs.
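The effect can be sketched as follows. The names notYetWritten, pendingGlobal, and implemented are hypothetical; the first two are declared but deliberately never defined or used:

```cpp
int notYetWritten(int, int);   // declaration only: no definition exists anywhere
extern int pendingGlobal;      // declaration only: no definition exists anywhere

// The part of the component that has been implemented so far.
int implemented(int a)
{
    return a + 1;              // uses neither of the undefined names
}
```

A program that calls only implemented() links and runs, because neither declaration on its own places an undefined symbol in the .o file. Adding a call to notYetWritten() would make the link fail until a definition is supplied.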
In the previous example, each of the three declarations enabled access to an externally
defined function or object. We might be sloppy and say that these "declarations" have
external linkage. But there are other kinds of declarations that do not serve to enable
access to external definitions. We will often refer to these kinds of declarations as
having "internal" linkage. For example,
is a typedef declaration. It does not introduce any symbols into the .o file, nor does it
enable access to a global object with external linkage: its linkage is internal. An
important kind of declaration that happens to have internal linkage is that of a class.
All of the above have the identical effect of introducing the name Point as some kind
of user-defined type; the particular declaration type (e.g., class) need not match the
actual definition type (e.g., union):
class Rep;
// ...
union Rep {
    // ...
};
The definitions to which these declarations potentially refer also have internal link-
age; this property distinguishes class declarations from the external declarations in
previous examples. Both class declarations and class definitions contribute nothing to
the .o file and are solely for the benefit of the current translation unit.
On the other hand, static class data members (declared within the class definition)
have external linkage:
class Point {
    static int s_numPoints;   // declaration of external object
    // ...
};
The static class data member s_numPoints (shown above) is only a declaration, but
its definition in the .c file has "external" linkage:
// point.c
int Point::s_numPoints;   // definition of external object
                          // (initialized to 0 by default)
Note that, according to the language specification, every static class data member
must be defined exactly once somewhere in the final program. 2
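Putting the declaration and its single definition together gives the following sketch. The constructor, destructor, and numPoints() accessor are assumptions added so that the counter can be observed:

```cpp
// point.h (condensed)
class Point {
    static int s_numPoints;        // declaration only
  public:
    Point() { ++s_numPoints; }     // hypothetical: count live objects
    ~Point() { --s_numPoints; }
    static int numPoints() { return s_numPoints; }
};

// point.c: the one and only definition in the program
int Point::s_numPoints;            // initialized to 0 by default
```

Omitting the definition leaves an undefined symbol at link time as soon as the member is used; supplying it in more than one .c file produces a duplicate symbol.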
It is not possible in C++ to declare an enumeration without defining it. As we will see,
class declarations are quite often used in place of preprocessor include directives to
declare a class without defining it.
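A minimal sketch of the technique, assuming a hypothetical client class Probe that refers to a Circuit only through a pointer:

```cpp
class Circuit;                     // declaration only; no #include "circuit.h"

class Probe {
    Circuit *d_circuit_p;          // a pointer needs only the class declaration
  public:
    Probe(Circuit *circuit) : d_circuit_p(circuit) {}
    Circuit *circuit() const { return d_circuit_p; }
};
```

Because Probe never looks inside a Circuit, its header compiles without the definition of Circuit, and a change to circuit.h does not force clients of Probe to recompile.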
// radio.h
#ifndef INCLUDED_RADIO
#define INCLUDED_RADIO

class Radio {
    static int s_count;          // fine: static member declaration
    static const double S_PI;    // fine: static const member dec.
    int d_size;                  // fine: member data definition
    // ...
  public:
    int size() const;            // fine: member function declaration
    // ...
};                               // fine: class definition

#endif
Figure 1-3: What Does and Does Not Belong in a Header File
The redundancy of duplicated nonmember data definitions affects not only program size
but also runtime performance by defeating the caching mechanism of the host computer.
Occasionally, however, there are valid reasons for placing a static instance of a user-defined
object in a header file at file scope. In particular, the constructor of such an
object can be used to ensure that a particular global facility (such as iostream) has
been initialized before it is used. 3 Although this solution may be elegant for small and
medium-sized systems, it is problematic for very large systems. We will return to this
issue in Section 7.8.1.3.
We will sometimes elect to define functions and data for use in our own implementation
that we do not want exposed outside of our translation unit. Definitions with internal
but not external linkage can appear at file scope in a .c file without affecting the global
(symbol) name space. The definitions to be avoided at file scope in .c files are
data and functions that have not been declared static. For example,
// file1.c
int i;                                              // external linkage
int max(int a, int b) { return a > b ? a : b; }     // external linkage
The above definitions have external linkage and could potentially collide with other
similar names in the global name space. Because inline and static free functions have
internal linkage, these kinds of functions can be defined at file scope in a .c file and
not pollute the global name space. For example,
// file2.c
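Such a file might contain helper functions like the ones sketched here. The names minimum and factorial are hypothetical:

```cpp
// Helpers intended for use only within this .c file.
inline int minimum(int a, int b) { return a < b ? a : b; }

static int factorial(int n)
{
    return n <= 1 ? 1 : n * factorial(n - 1);
}
```

One caution for readers using a standard-conforming compiler: since C++98, a free function declared inline has external linkage unless it is also declared static. For a helper that is meant to be private to a single .c file, static (or an unnamed namespace) is therefore the safer choice.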
Enumeration definitions, nonmember objects declared static, and (by default) const
data definitions also have internal linkage. It is safe to define all of these entities at file
scope in the .c file. For example,
// file3.c
#include <math.h>
class Link;   // internal
Other constructs such as typedef declarations and preprocessor macros do not introduce
exported symbols into the .o file. They too may appear in .c files at file scope
without affecting the global name space. For example,
Typedefs and macros have limited usefulness in C++, and they can be harmful if
abused. We will explore the perils of typedefs in Sections 1.2 and 2.3.3, and those of
macros in Section 2.3.4.
A typedef declaration creates an alias for an existing type, not a new type. A
typedef, therefore, gives only the illusion of type safety. Consequently, typedefs in
the interface can easily do more harm than good.
Consider class Person shown in Figure 1-4. We have decided to nest typedef
declarations within the Person class to avoid affecting the global name space and to
make them easier to find. The setWeight member function is defined to take a weight
argument in "Pounds," while the getHeight method returns height in "Inches."
// person.h
#ifndef INCLUDED_PERSON
#define INCLUDED_PERSON

class Person {
    // ...
  public:
    typedef double Inches;
    typedef double Pounds;
    // ...
    void setWeight(Pounds weight);
    Inches getHeight() const;
    // ...
};

#endif
Unfortunately, a nested typedef offers no more type safety than one declared at file
scope:
The two type names Inches and Pounds are structurally equal and therefore completely
interchangeable. These typedefs afford absolutely no compile-time type safety,
yet make it difficult to know the actual type.
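The lack of type safety can be demonstrated with a sketch based on the Person class of Figure 1-4. The data members, setHeight, getWeight, and the function mixUpUnits are assumptions added so that the example is complete:

```cpp
class Person {
    double d_weight;
    double d_height;
  public:
    typedef double Inches;
    typedef double Pounds;
    Person() : d_weight(0.0), d_height(0.0) {}
    void setWeight(Pounds weight) { d_weight = weight; }
    void setHeight(Inches height) { d_height = height; }   // hypothetical
    Pounds getWeight() const { return d_weight; }          // hypothetical
    Inches getHeight() const { return d_height; }
};

// Compiles without complaint even though the units are mixed up.
void mixUpUnits(Person& person)
{
    Person::Inches height = person.getHeight();
    person.setWeight(height);   // Inches passed where Pounds is expected
}
```

Both names are aliases for double, so the compiler sees nothing wrong with storing a height as a weight.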
Typedefs do, however, have their place when it comes to defining complex function
arguments. For example,
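One common case is a pointer-to-function parameter. The sketch below uses hypothetical names (Comparator, pick, ascending, descending):

```cpp
typedef int (*Comparator)(int, int);   // alias for a pointer-to-function type

// Without the typedef the last parameter would read: int (*compare)(int, int)
int pick(int a, int b, Comparator compare)
{
    return compare(a, b) <= 0 ? a : b;
}

int ascending(int a, int b) { return a - b; }
int descending(int a, int b) { return b - a; }
```

Here the typedef adds no type safety either, but it makes the signature of pick considerably easier to read.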
The standard C library provides a macro called assert (see assert.h) for guaranteeing
that a given expression evaluates to a non-zero value; otherwise an error message
is printed and program execution is terminated. 4 Assertions are convenient to use and
are a powerful implementation-level documentation tool for developers. Assert statements
are like active comments: they not only make assumptions clear and precise,
but if these assumptions are violated, they actually do something about it.
The use of assert statements can be an effective way to catch program logic errors at
runtime, and yet they are easily filtered out of production code. Once development is
complete, the runtime cost of these redundant tests for coding errors can be eliminated
simply by defining the preprocessor symbol NDEBUG during compilation. Be sure,
however, to remember that code placed in the assert itself will be omitted in the production
version. Consider the following partial definition of a String class:
class String {
    enum { DEFAULT_SIZE = 8 };
    char *d_array_p;
    int d_size;
    int d_length;
  public:
    String();
    // ...
};
If (as with the code below) the expression argument to the assert macro affects the
state of the software, then the production version will exhibit disparate behavior.
String::String()
: d_size(DEFAULT_SIZE)
, d_length(0)
{
    assert(d_array_p = new char[d_size]);   // error
}
We can avoid this problem by making sure that the asserted code is completely inde-
pendent of the normal operation of the object:
String::String()
: d_size(DEFAULT_SIZE)
, d_length(0)
{
    d_array_p = new char[d_size];
    assert(d_array_p);                      // fine
}
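The difference between the two constructors can be observed without rebuilding under NDEBUG. The sketch below uses a hypothetical macro, PRODUCTION_ASSERT, that expands to ((void)0), which is what the standard assert expands to when NDEBUG is defined; the two functions are likewise hypothetical:

```cpp
// Stand-in for assert(X) as it behaves when NDEBUG is defined.
#define PRODUCTION_ASSERT(X) ((void)0)

// Mirrors the erroneous constructor: the allocation is inside the assertion.
int allocateInsideAssert()
{
    char *array_p = 0;
    PRODUCTION_ASSERT(array_p = new char[8]);   // expression vanishes entirely
    int allocated = (array_p != 0);
    delete [] array_p;                          // deleting a null pointer is harmless
    return allocated;
}

// Mirrors the corrected constructor: only the check is inside the assertion.
int allocateThenAssert()
{
    char *array_p = new char[8];                // always executed
    PRODUCTION_ASSERT(array_p);                 // only the check vanishes
    int allocated = (array_p != 0);
    delete [] array_p;
    return allocated;
}
```

In the first function the array is never allocated once assertions are compiled out, which is exactly the disparate behavior described above.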
When programmers get together to start a project, they often discuss what coding
standards to adopt. Few of these standards contribute to the quality of the product.
Often they are concerned with questions such as
if (exp) {
if (exp){
At the beginning of one big project, we spent weeks arguing about standards. We con-
cluded that although there is an advantage to standardization, the list of standards
should be as small as possible, and each should be driven by clear engineering princi-
ples. Both of the examples above fail these criteria.
Another thing we learned is that when it comes to enforcing standards, there are two
domains: the interface and the implementation. A good interface is much more impor-
tant than a good implementation. Interfaces have a direct impact on clients and they
also have global implications. Implementations should affect only the authors and
maintainers of code.
There are clear reasons to impose strict standards on interfaces, particularly in large
projects. Interfaces are generally much more difficult and costly to repair than implementation.
It is usually not too difficult to throw out a poor implementation and
replace it with a better one, provided the interface is a good encapsulating one.
The following coding conventions have been debated ad infinitum and have survived
the ordeal. Most of the recommendations proposed here focus on aspects that affect
interfaces, where their benefit will be most strongly felt. Then again, much of this is a
matter of personal taste. If there is one rule, it is to be consistent.
C++ syntax is complex. Subtle clues about the nature of its constructs are always welcome.
A fairly standard and widely accepted practice is to treat type names with special
consideration. In this text we consistently make the first character of a type name
an uppercase letter; non-type names begin with a lowercase letter.
For our purposes, types are those entities that are neither data nor functions:
• Classes
• Structures
• Unions
• Typedefs
• Enumerations
• Templates
class Point;
struct Date;
union Value;
enum Temperature { COLD, WARM, HOT, VERY_HOT } temp; // see footnote 7
typedef Temperature Temp;
template class Stack<int>;
int Point::getX() const;
void Point::setX(int xCoord);
There are two kinds of people when it comes to naming identifiers: those who
advocate the use of the underscore character ('_') to delimit words, and those who
advocate capitalizing the second and subsequent words:
There are arguments for both sides. I was originally in the underscore camp but was
forced to make the change by consensus. Now I realize that it makes no difference: it
is just a matter of what you are used to. Perhaps the capitals are a bit better because
the names are shorter; they become easier to read once you are used to them. Using
capitals also leaves open the use of underscore for other purposes (see Sections 6.4.2
and 7.2.1). The important thing is that there be consistency throughout the product line.
It appears unprofessional and can be annoying for one set of classes to use one
naming convention while other classes in the same product use the other, especially
if outside paying customers will (or some day could) have direct access to the under-
lying C++ classes. Some programmers, however, may dismiss these inconsistencies as
simply a matter of style.
7 In this text the names of enumerators and (static) constants are all uppercase and make use of
underscores to delimit words.
In this book I have adopted the uppercaseStandard. Whatever you adopt, however, I
strongly recommend that you be_consistent, particularly in the interface.
Readability and maintainability are greatly served if people remember to add a consis-
tent prefix (such as d_) to the data members of their classes. Consider the following
Shoe class:
class Shoe {
    double d_temperature;
    int d_size;
    // ...
  public:
    // ...
    void expand(double calories);
    // ...
    void setSize(int size);
    // ...
};
Values held in local (automatic) variables within member functions are only tempo-
rary; they do not exist after the member function returns. On the other hand, class
member data defines the state of the object, which exists between member function
calls:
It is common to see member functions that set an instance variable (e.g., d_size) to
contain a single assignment expression:
inline
void Shoe::setSize(int size)
{
    d_size = size;
}
Putting the d_ in front of data members also obviates dreaming up weird names (e.g.,
sz) for the manipulator function's argument:
The choice of a d_ prefix is quite arbitrary. We do not use only an underscore (_) as a
prefix because identifiers beginning with an underscore are reserved for use by C
compilers.8 Some prefer to use a trailing underscore for this purpose:
I find it useful to leave the suffix open for other purposes (such as _p to identify a
pointer data member).9 You may also want to use a different prefix (such as s_ to
identify static class data). Whether in a class or at file scope, non-const static data
potentially contains instance-independent state information. As discussed in Section
6.3.5, static class data members may be moved to file scope in a .c file to help avoid
compile-time coupling. Because of the very similar properties and interchangeability
of these two types of data, it makes sense to identify state variables in the .c file with
an s_ as well. Consistently following this naming convention makes it easy to search
for all instance-independent state variables in a component.
It is worth noting that static class or file scope constant data is stateless. We can iden-
tify the nature and lifetime of this data simply by making its name all uppercase. For
constant data in class scope, a name such as S_DEFAULT_VALUE or simply
DEFAULT_VALUE could work equally well. In this book we prefer S_DEFAULT_VALUE
for class-scoped constant static data to remind us of the need to keep it private (see
Section 2.2).
8 ellis, Section 2.4, p. 7.
9 See Section 6.4.2 for another use of an identifier suffix.
By contrast, a non-static constant data member has a more limited lifetime and its
value need not always be the same in each incarnation of the object. Consequently, its
name would appear in lowercase and begin with a d_ prefix:
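As a rough illustration, the naming conventions for data members described in this section can be collected in one place. The class Thermostat and every member in it are invented for this sketch; only the d_ and s_ prefixes, the _p suffix, and the uppercase S_ form come from the text.

```cpp
// thermostat.h (hypothetical example; none of these names appear in the text)
class Thermostat {
    // DATA
    double     d_setPoint;        // ordinary instance state: d_ prefix
    double    *d_history_p;       // pointer data member: _p suffix
    const int  d_capacity;        // non-static const member: lowercase, d_ prefix
    static int s_numInstances;    // instance-independent state: s_ prefix
    static const int S_DEFAULT_CAPACITY = 8;   // stateless constant: uppercase

  private:
    // NOT IMPLEMENTED
    Thermostat(const Thermostat&);
    Thermostat& operator=(const Thermostat&);

  public:
    // CREATORS
    explicit Thermostat(int capacity = S_DEFAULT_CAPACITY)
    : d_setPoint(20.0)
    , d_history_p(new double[capacity])
    , d_capacity(capacity)
    {
        ++s_numInstances;
    }

    ~Thermostat()
    {
        delete [] d_history_p;
        --s_numInstances;
    }

    // MANIPULATORS
    void setSetPoint(double setPoint) { d_setPoint = setPoint; }

    // ACCESSORS
    double setPoint() const { return d_setPoint; }
    int capacity() const { return d_capacity; }
    static int numInstances() { return s_numInstances; }
};

int Thermostat::s_numInstances = 0;   // would normally live in the .c file
```

Because the argument of setSetPoint can simply be called setPoint, no abbreviated parameter name is needed, and a search for s_ finds the only instance-independent state in the sketch.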
When using an unfamiliar object, figuring out where to find things can be difficult.
Although member function ordering within a class is clearly a matter of style, from a
client's point of view it helps to be consistent. A fundamental way to classify member
functionality is by whether or not it potentially affects the state of the object.
An organization useful for both a developer and a client is illustrated in Figure 1-5.
This organization has the advantage of grouping by categories of functionality that are
present in nearly every C++ class. This organization is also independent of the partic-
ular abstraction being implemented.
class Car {
    // ...
  public:
    // CREATORS
    Car(int cost = 0);
    Car(const Car& car);
    ~Car();

    // MANIPULATORS
    Car& operator=(const Car& car);
    void addFuel(double numberOfGallons);
    void drive(double deltaGasPedal);
    void turn(double angleInDegrees);
    // ...

    // ACCESSORS
    double getFuel() const;
    double getRPMs() const;
    double getSpeed() const;
    // ...
};
CREATORS bring objects into and out of existence. Notice that operator= is not a
creator, but rather (by convention) the first manipulator. MANIPULATORS are simply
non-const member functions; ACCESSORS are const member functions. This purely
objective grouping makes it easy to verify at a glance that all of the accessors and
none of the manipulators are declared as const members of the class. But the principal
benefit is to provide a common starting point for dissecting the fundamental
functionality of an unfamiliar class. For larger classes, it can be helpful to sort members
within each section alphabetically. For very large classes such as wrappers (discussed
in Sections 5.10 and 6.4.3), other organizations may be more appropriate.
Sometimes people will try to group member functions as get/set pairs as illustrated in
Figure 1-6. For some users, this style is a result of the misguided belief that an object
is little more than a public data structure that has data members, each of which must
have both a "get" (accessor) function and a "set" (manipulator) function. This style
itself could, for some, impede the creation of truly encapsulated interfaces in which the
data members are not necessarily transparently reflected in the behavior of the object.
class Car {
double d_fuel;
double d_speed;
double d_rpms;
public:
Car(int cost = 0);
Car(const Car& car);
Car& operator=(const Car& car);
~Car();
// ...
};
Finally, there is the question of where to place the data members. Properly encapsu-
lated classes do not have public data. From a logical point of view, data members are
merely implementation details of the class. Consequently, many people prefer to place
the implementation details of a class, including the data members, at the end of the
class definition, as illustrated in Figure 1-7.
class Car {
public:
Car(int cost = 0);
Car(const Car& car);
// ...
private:
double d_fuel;
double d_speed;
double d_rpms;
};
Although this organization may be more readable to naive clients, the attempt to hide
the implementation details at the end of the class definition belies the fact that they are
not hidden. The presence of implementation details in the header file imposes a
degree of compile-time coupling that does not evaporate simply by relocating these
details within the class definition.
Since this book addresses physical and organizational design issues, we consistently
place implementation details in the header file ahead of the public interface (partly to
emphasize their presence). In Chapter 6, we discuss how such implementation-level
clutter can be removed from a header file entirely, and thus truly hidden from the client.
1.5 Iterators
Perhaps the most common pattern in object-oriented design is that of an iterator. 10, 11
An iterator is an object that is intimately coupled to and supplied with a primary
object of some kind; its purpose is to allow clients to sequence through the parts,
attributes, or subobjects of the primary object.
Often objects will represent a collection of other objects. Such objects are commonly
referred to as containers. Sets, lists, stacks, heaps, queues, hash tables, and so on are
typical container objects. Note that where relevant, we often identify the source file
for a body of code with a leading comment. For example,
// stack.h
#ifndef INCLUDED_STACK
// ...

// stack.c
#include "stack.h"
// ...
Consider, for example, the simple class implementing a set of integers shown in Figure
1-8. As we can see from its header file, IntSet is implemented using IntSetLink
objects, but that fact is an encapsulated implementation detail of the class. In this
minimal implementation, we have elected to prevent users from constructing a copy of an
IntSet or assigning to one by making these otherwise automatically generated functions
private. (The comment NOT IMPLEMENTED indicates that the functionality
does not exist even privately.) Users of IntSet are allowed only to create an empty
set, add integers to it, check for membership, and destroy it.
// intset.h
#ifndef INCLUDED_INTSET
#define INCLUDED_INTSET

class IntSetLink;
class IntSetIter;
class ostream;

class IntSet {
    // DATA
    IntSetLink *d_root_p; // root of a linked list of integers

    // FRIENDS
    friend IntSetIter;

  private:
    // NOT IMPLEMENTED
    IntSet(const IntSet&);
    IntSet& operator=(const IntSet&);

  public:
    // CREATORS
    IntSet();
        // Create an empty set of integers.

    ~IntSet();
        // Destroy this set.

    // MANIPULATORS
    void add(int i);
        // Add an integer to this set. If the given integer is
        // already present, this operation has no effect.

    // ACCESSORS
    int isMember(int i) const;
        // Returns 1 if integer i is a member of the set,
        // and 0 otherwise.
};

#endif
A tiny test driver that exercises this limited functionality is shown in Figure 1-9. Note
that driver programs in this book are indicated by using the file name suffix .t.c.
// intset.t.c
#include "intset.h"
#include <iostream.h>

main()
{
    IntSet a;
    a.add(1); a.add(2); a.add(3); a.add(2); a.add(4); a.add(6);
    for (int i = 0; i < 10; ++i) {
        cout << ' ' << i << '-' << (a.isMember(i) ? "yes" : "no");
    }
    cout << endl;
}

// Output:
// john@john: a.out
// 0-no 1-yes 2-yes 3-yes 4-yes 5-no 6-yes 7-no 8-no 9-no
// john@john:
Suppose we would like to find out what members exist in the set in order to print
them. Theoretically, we could write the output function ourselves as shown in Figure
1-10, but the performance of that implementation would be somewhat lacking.
An obvious solution is to make the operator<< function a friend of class IntSet in
order to take advantage of its internal representation. We could do that, but what if a
client is not happy with the format supplied by this operator's implementation? What
happens if later we find we need to access the internal members, say, to compare two
IntSet objects?
class IntSet {
    // ...
  public:
    // ...
    void reset();
        // Reset to beginning of sequence of integers. The current
        // integer will be invalid only if the set is empty.

    void advance();
        // Advance to the next integer in the set. If the current
        // integer was the last in the set, the current integer
        // will be invalid after advance returns. Note that the
        // behavior is undefined if the current integer is already
        // not valid.

    int current() const;
        // Return the current integer in the sequence. Note that the
        // behavior is undefined if the current integer is not valid.

    int isCurrentValid() const;
        // Return 1 if the current integer is valid, and 0 otherwise.
        // Note that the current integer is valid if the set is not
        // empty and we have not advanced beyond the last integer
        // in the set.
};
We could keep adding new members and friends, but each time we do, we put both
our clients and ourselves at risk by increasing the complexity of the class. Repeatedly
revisiting and extending the functionality of an object is a well-recognized way of
introducing bugs into software. Also, unless you plan to support multiple versions,
other clients that do not care about this new functionality will have it forced upon them.
Instead of dealing with these deficiencies one at a time, we can address most of them
at once by providing a general and efficient way to access the individual members of
the set. Suppose we decided to add this capability directly to the IntSet class itself,
as depicted in Figure 1-11. It is now possible for a client to iterate through an instance
of class IntSet and print out the contents of that object in any format that is desired.
Figure 1-12 illustrates some of the power of iteration. Regardless of how the imple-
mentation of the set may change, the client's code will not be affected.
    }
    return o << '}';
}
Unfortunately, there are still problems with the design shown in Figure 1-12. For a
given object, there can be at most one iteration going on at any one time. Suppose we
are trying to implement a comparison function for our IntSet and decide, for debugging
purposes, to print out the contents of the sets midway through the comparison
iteration. The print routine would have the unwanted side effect of corrupting the iteration
state for the comparison. The problem is that IntSet allocates enough space to
hold state information for exactly one iteration. That space remains allocated whether
or not an iteration is active. If for some reason we want to have a pair of nested for
loops that iterate over the elements in the same set, we would have to duplicate the
entire set.
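The interference described above can be reproduced with a small sketch. CursorSet is a hypothetical stand-in for a set with one built-in cursor; it uses a std::vector instead of a linked list purely to stay short, and the helper functions countElements and outerPasses are likewise invented for the illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical set with exactly one embedded iteration state.
class CursorSet {
    std::vector<int> d_values;
    std::size_t      d_cursor;   // the one and only place holder

  public:
    CursorSet() : d_cursor(0) {}

    void add(int i) { d_values.push_back(i); }   // no duplicate check, for brevity

    void reset()                { d_cursor = 0; }
    void advance()              { ++d_cursor; }
    int  current() const        { return d_values[d_cursor]; }
    int  isCurrentValid() const { return d_cursor < d_values.size(); }
};

// Count the elements by iterating. This silently overwrites the shared cursor.
int countElements(CursorSet& set)
{
    int n = 0;
    for (set.reset(); set.isCurrentValid(); set.advance()) {
        ++n;
    }
    return n;
}

// Intended to visit every element once, calling countElements on each pass
// as a "harmless" debugging aid. Returns the number of passes actually made.
int outerPasses(CursorSet& set)
{
    int passes = 0;
    for (set.reset(); set.isCurrentValid(); set.advance()) {
        ++passes;
        countElements(set);   // leaves the cursor beyond the last element
    }
    return passes;
}
```

For a set holding three integers, outerPasses makes only one pass instead of three, because the inner iteration leaves the single cursor past the end of the set.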
This problem could be addressed by having the client hold on to the internal state or
retain some other form of place holder. If the client allocates the state dynamically,
the client must remember to delete the state to avoid a memory leak.
If the place holder is in the form of an integer index, there could be some additional
practical constraints on the underlying implementation of the set. For example, if the
set is implemented as a linked list (instead of an array), there is the potential for quadratic,
i.e., O(N^2), behavior during iteration because each iteration of the for loop
would result in having to traverse the list.
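To make the cost concrete, here is a minimal sketch of an integer index used as the place holder for a linked list. The names Link, IndexedList, at, and the d_steps counter are assumptions made for this example; the counter exists only to expose how many links are followed.

```cpp
// Hypothetical singly linked list offering index-based access.
struct Link {
    int   d_value;
    Link *d_next_p;
};

class IndexedList {
    Link        *d_root_p;
    int          d_length;
    mutable long d_steps;   // links followed so far (instrumentation only)

    // NOT IMPLEMENTED
    IndexedList(const IndexedList&);
    IndexedList& operator=(const IndexedList&);

  public:
    IndexedList() : d_root_p(0), d_length(0), d_steps(0) {}

    ~IndexedList()
    {
        while (d_root_p) {
            Link *next_p = d_root_p->d_next_p;
            delete d_root_p;
            d_root_p = next_p;
        }
    }

    void prepend(int i)
    {
        Link *p = new Link;
        p->d_value = i;
        p->d_next_p = d_root_p;
        d_root_p = p;
        ++d_length;
    }

    int length() const { return d_length; }
    long steps() const { return d_steps; }

    int at(int index) const   // walks from the root on every call
    {
        const Link *p = d_root_p;
        for (int i = 0; i < index; ++i) {
            p = p->d_next_p;
            ++d_steps;
        }
        return p->d_value;
    }
};

// Sum all elements using an integer index as the place holder.
long sumByIndex(const IndexedList& list)
{
    long sum = 0;
    for (int i = 0; i < list.length(); ++i) {
        sum += list.at(i);
    }
    return sum;
}
```

With N elements the loop follows 0 + 1 + ... + (N - 1) = N(N - 1)/2 links in total, so a list of 100 integers costs 4950 link traversals for a single pass.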
The standard approach is to supply an iterator class along with each container class
(in the same header file). The iterator is declared a friend of the container and
therefore has access to its internal organization. The iterator class is defined in the
same header file as the container class to avoid the problems associated with "long-
distance" friendship (discussed in Section 3.6). Iterators for concrete containers such
as IntSet are typically created on the program stack; thus their state is destroyed
automatically when the iterator goes out of scope. Iterator objects can be more space
efficient because the space for each iteration need exist only during the iteration
process itself. Also, any number of iterators can be independently active on a given
container at any time without interfering with one another.
As a practical matter, it is common for iterators to assume that the objects on which
they operate are not modified or destroyed during the course of iteration. It is also
common for the order in which objects are presented during iteration to be implemen-
tation dependent and subject to change without notice. Ideally, iterator developers
would explicitly state whether or not the order of iteration is defined. To be safe, cli-
ents of iterators should not assume an order unless one is specified.
Figure 1-13 illustrates the design of the standard iterator pattern used throughout this
book. This iterator object is intended for use with for loops. The syntax of this itera-
tor is quite terse. The use of the operators is by no means obvious, especially if you
have never seen them used this way before. One could easily argue that this style is an
abuse of operator overloading because readability is reduced. There is more to this
story, however.
class IntSetIter {
    // DATA
    IntSetLink *d_link_p; // root of linked list of integers

  private:
    // NOT IMPLEMENTED
    IntSetIter(const IntSetIter&);
    IntSetIter& operator=(const IntSetIter&);

  public:
    // CREATORS
    IntSetIter(const IntSet& intSet);
        // Create an iterator for the specified integer set.

    ~IntSetIter();
        // Destroy this iterator (an unnecessary comment).

    // MANIPULATORS
    void operator++();
        // Advance the state of the iteration to next integer in set.

    // ACCESSORS
    int operator()() const;
        // Return the value of the current integer.
    // ...
};
Because of the frequency with which iterators can and do occur in large designs, the
most important consideration for developers must be consistency. If we avoid operator
overloading and use functions instead, it is important to use the same function names
every time; otherwise we will find ourselves unwittingly misnaming these functions
and forever having to revert to header files for the syntactic details. A representative few
of the many possible equivalent function names are shown in Figure 1-14.
Our experience has shown that adopting the operators indicated in the left column of
Figure 1-14 for each of these standard iteration methods produces a consistent, easy-
to-use, and soon familiar and easily recognized idiom for iteration over concrete
types. Whatever you decide to use, be sure to be consistent throughout your product
line. The final implementation of the IntSet output operator is shown in Figure 1-15;
the terse iterator notation affords a succinct implementation.
    }
    return o << '}';
}
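Putting the pieces together, an output operator built on a separate iterator might look roughly like the following sketch. It is written against standard <ostream> rather than the iostream.h of the period. The layout of IntSetLink, the conversion to const void * used as the validity test, and the helper toString are assumptions made for this example and are not taken from the listings above.

```cpp
#include <ostream>
#include <sstream>
#include <string>

class IntSetIter;

struct IntSetLink {        // assumed layout; normally hidden in the .c file
    int         d_value;
    IntSetLink *d_next_p;
};

class IntSet {
    IntSetLink *d_root_p;  // root of a linked list of integers
    friend class IntSetIter;

    // NOT IMPLEMENTED
    IntSet(const IntSet&);
    IntSet& operator=(const IntSet&);

  public:
    IntSet() : d_root_p(0) {}
    ~IntSet();
    void add(int i);
    int isMember(int i) const;
};

class IntSetIter {
    const IntSetLink *d_link_p;   // current link, or 0 when iteration is done

    // NOT IMPLEMENTED
    IntSetIter(const IntSetIter&);
    IntSetIter& operator=(const IntSetIter&);

  public:
    IntSetIter(const IntSet& intSet) : d_link_p(intSet.d_root_p) {}
    void operator++() { d_link_p = d_link_p->d_next_p; }
    int operator()() const { return d_link_p->d_value; }
    operator const void *() const { return d_link_p; }   // assumed validity test
};

IntSet::~IntSet()
{
    while (d_root_p) {
        IntSetLink *next_p = d_root_p->d_next_p;
        delete d_root_p;
        d_root_p = next_p;
    }
}

int IntSet::isMember(int i) const
{
    for (const IntSetLink *p = d_root_p; p; p = p->d_next_p) {
        if (p->d_value == i) {
            return 1;
        }
    }
    return 0;
}

void IntSet::add(int i)
{
    if (!isMember(i)) {            // adding a duplicate has no effect
        IntSetLink *p = new IntSetLink;
        p->d_value = i;
        p->d_next_p = d_root_p;    // prepend, so iteration order is newest first
        d_root_p = p;
    }
}

std::ostream& operator<<(std::ostream& o, const IntSet& intSet)
{
    o << "{ ";
    for (IntSetIter it(intSet); it; ++it) {
        o << it() << ' ';
    }
    return o << '}';
}

// Hypothetical helper that captures the formatted output as a string.
std::string toString(const IntSet& intSet)
{
    std::ostringstream os;
    os << intSet;
    return os.str();
}
```

Because this sketch prepends new links, adding 1, 2, 3, 2, 4, 6 yields "{ 6 4 3 2 1 }"; a client should not rely on that order, in keeping with the advice that iteration order is implementation dependent unless specified.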
The choice of pre-increment (++it) over the post-increment (it++) in Figure 1-15 is
deliberate; the post-increment version requires a second dummy argument and is not
universally available.12 Furthermore, the semantics of increment for an iterator more
closely pattern those for pre-increment when applied to the fundamental types (see
Section 9.1.1).
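The difference between the two forms can be seen in a small sketch. Counter is a hypothetical class used only to contrast the signatures; it is not part of the intset component.

```cpp
// Hypothetical counter contrasting pre-increment and post-increment.
class Counter {
    int d_value;

  public:
    // CREATORS
    explicit Counter(int value = 0) : d_value(value) {}

    // MANIPULATORS
    Counter& operator++()       // pre-increment: ++c
    {
        ++d_value;
        return *this;
    }

    Counter operator++(int)     // post-increment: c++ (note the dummy int)
    {
        Counter before(*this);  // the old state must be copied
        ++d_value;
        return before;
    }

    // ACCESSORS
    int value() const { return d_value; }
};
```

The conventional post-increment returns a copy of the previous state, which is one more reason it fits poorly with an iterator whose copy constructor is deliberately left unimplemented.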
Object-oriented design lends itself to a rich set of notations. 13 Most of these notations
denote relationships between the logical entities of a design.
DEFINITION:
NOTATION                            MEANING
( X )                               X is a logical entity (e.g., class).
B ---IsA---> A                      B is a kind of A.
B o---Uses-In-The-Interface--- A    B uses A in B's interface.
Throughout this text, we consistently identify logical entities (e.g., classes, structures,
and unions) with an ellipse-like bubble:
( Car )
    class Car {
        // ...
    };

car.c
    // car.c
    #include "car.h"
    // ...

( Car ) o---Uses-In-The-Interface--- ( Gas )
    class Car {
        // ...
      public:
        void addFuel(Gas *);
        // ...
    };

( Car ) •---Uses-In-The-Implementation--- ( Engine )
    class Car {
        Engine d_motor;
        // ...
    };
If there is ever a need for additional logical notation, a labeled arrow that explicitly
identifies the relationship will suffice.
Suppose a Message is a kind of String. That is, an object of type Message can be used
wherever a String object is required.
( String )
    class String {
        // ...
      public:
        // ...
    };
As we can see from the definitions of Figure 1-16a, class Message inherits from class
String, and an arrow is used to denote this relationship in Figure 1-16b:
D ---IsA---> B
That is, D ---IsA---> B means that "D is a kind of B" and that "D inherits from B."
The direction of the arrow is significant; it points in the direction of implied dependency.
Class D depends on B because D is derived from B. B must come first in order
for D to name B as a base class:
class B { /* ... */ };
class D : public B { /* ... */ };
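The Message and String example can be sketched in the same way. The data members, the priority attribute, and the function textLength below are hypothetical; only the two class names and the public inheritance between them come from the text.

```cpp
#include <cstring>

class String {
    char d_text[64];   // fixed-size buffer, only to keep the sketch short

  public:
    String(const char *text)
    {
        std::strncpy(d_text, text, sizeof d_text - 1);
        d_text[sizeof d_text - 1] = '\0';
    }

    int length() const { return static_cast<int>(std::strlen(d_text)); }
};

class Message : public String {   // Message IsA String
    int d_priority;               // hypothetical additional state

  public:
    Message(const char *text, int priority)
    : String(text)
    , d_priority(priority)
    {
    }

    int priority() const { return d_priority; }
};

// Written purely in terms of String; it knows nothing about Message.
int textLength(const String& str)
{
    return str.length();
}
```

A Message can be passed to textLength because it is a kind of String, while String compiles without any knowledge of Message; the dependency runs from the derived class to the base class, just as the arrow does.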
Often you will see the arrow pointed in the opposite direction, which can be mislead-
ing. An arrow shows an asymmetric relationship between two entities denoted by its
label (in this case "IsA"). To draw the arrow the other way, we would logically have to
call the relation something else, such as "Derives" or "Is-A-Base-Class-Of":
This alternative notation is less desirable because the arrow points in the direction
opposite to that of implied dependency.
[Figure: alternative drawings of the relation Square IsA Shape]
Whenever a function names a type in its parameter list or names a type as a return value,
that function is said to use that type in its interface. That is, a type is used in the interface
of a function if the type name is part of the function's return type or signature. 14
For example, a free function such as int operator==(const IntSet&, const IntSet&)
clearly makes use of class IntSet in its interface. This function happens to return an
int, so int also would be considered part of this function's interface. However, fundamental
types are ubiquitous and omitted from such consideration in practice.
There are three levels of logical access for classes in C++: public, protected, and
private. The public interface of a class is defined as the union of the interfaces of the
public member functions of that class. The protected interface of a class is defined
similarly. In other words, when a (public) member function of class B uses class A in
its interface, we say that class B uses class A in B's (public) interface.15 For example,
the constructor for class IntSetIter, IntSetIter(const IntSet&), uses class
IntSet in its interface; therefore IntSet is used in the interface of IntSetIter.
B o------ A    Uses-In-The-Interface
You can think of the o------ symbol as an arrow with its tail at the bubble and the
head missing (or as a conductor's baton pointing at a member of the orchestra). The
direction of the implied arrow is important-it points in the direction of implied
dependency. That is, if B uses A, then B depends on A and not vice versa. (We will talk
more about implied dependency in Section 3.4.)
15 The interaction between friendship and the Uses-In-The-Interface relation is discussed in Section 3.6.1.
Figure 1-18 shows the logical view of the intset component, including the Uses-In-The-Interface
relation among the logical entities (classes and free operator functions)
defined there. The figure reflects that IntSetIter and both free operators use IntSet
in their respective interfaces.
[Figure 1-18: intset, Logical View]
The Uses-In-The-Interface relation is a valuable tool for both logical and physical
design. This notation is most useful when confined to logical entities (classes and free
operators) at file scope. Free operators are frequently omitted from logical diagrams
in order to reduce notational clutter.
The actual logical interface for a class can be quite large and complex. Often the prop-
erty we are most interested in exhibiting is one of intrinsic dependency rather than
detailed usage. The set of types used in the interface of a class is more stable (i.e., less
likely to change during development and maintenance) than the set of types used by
any particular member function. The more abstract usage characteristics of the class
taken as a whole are, therefore, more resilient to small changes in the logical interface
than are the usage characteristics of its individual member functions.
Architects can make good use of this information as they refine high-level designs and
cast them into discrete physical components.
In the above implementation, two iterators are created: one for each IntSet argument.
The body of the for loop is entered only while both iterators refer to valid set
elements. With each iteration through the loop, integers at corresponding positions in
the sets are compared. If any such comparison fails, then the sets are immediately recognized
as being not equal. On exit from the for loop, both of the following conditions
must be true:
1. At least one of the iterators has reached the end of its set and is now
invalid.
The two IntSet objects are equal if and only if both iterators are now invalid.
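The comparison just described can be sketched as follows. To keep the example short, IntSet is replaced here by a compact stand-in that stores its integers in a sorted std::vector, which also guarantees that two equal sets present their elements in the same order; the original uses a linked list. The conversion to const void * as the validity test is likewise an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

class IntSetIter;

// Compact stand-in for IntSet: sorted storage, no duplicates.
class IntSet {
    std::vector<int> d_values;
    friend class IntSetIter;

  public:
    void add(int i)
    {
        std::vector<int>::iterator pos =
            std::lower_bound(d_values.begin(), d_values.end(), i);
        if (pos == d_values.end() || *pos != i) {
            d_values.insert(pos, i);
        }
    }
};

class IntSetIter {
    const std::vector<int>& d_values;
    std::size_t             d_index;

  public:
    IntSetIter(const IntSet& intSet) : d_values(intSet.d_values), d_index(0) {}
    void operator++() { ++d_index; }
    int operator()() const { return d_values[d_index]; }
    operator const void *() const   // non-zero while the current element is valid
    {
        return d_index < d_values.size() ? this : 0;
    }
};

int operator==(const IntSet& left, const IntSet& right)
{
    IntSetIter lit(left);
    IntSetIter rit(right);
    for (; lit && rit; ++lit, ++rit) {
        if (lit() != rit()) {
            return 0;           // a mismatch at corresponding positions
        }
    }
    return !lit && !rit;        // equal only if both iterators are exhausted
}

int operator!=(const IntSet& left, const IntSet& right)
{
    return !(left == right);    // implemented in terms of operator==
}
```

Two independent iterators are active at once here, something the single built-in cursor could not support. Note that operator== names IntSet in its signature and IntSetIter only in its body.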
B •------ A    Uses-In-The-Implementation
That is, B •------ A means that A is used in the implementation of B.
Figure 1-19: Both Kinds of Uses Relations Within the intset Component
Figure 1-19 again shows us the logical view of the intset component along with both
kinds of uses relationships. In particular we see that operator==
uses class IntSet in its interface and class IntSetIter in its implementation.
Although operator!= is shown implemented symmetrically to operator==,
operator!= would probably be implemented in terms of operator== in practice.
A class can use another type in its implementation in several ways. As we will see in
Section 3.4, the particular way in which our class uses a type will affect not only how
our class depends on that type but also to what extent clients of our class will be
forced to depend on that type. For the time being, we simply exhibit the ways in
which a class can use a type in its implementation:
DEFINITION:
Specific kinds of the Uses-In-The-Implementation Relationship:
Name     Meaning
Uses     The class has a member function that names the type.
HasA     The class embeds an instance of the type.
HoldsA   The class embeds a pointer (or reference) to the type.
WasA     The class privately inherits from the type.
1.6.3.1 Uses
If any member function of a class (including a private member) names a type in either
its interface or its implementation, that type is considered to be used in the logical
implementation of the class.
class Crook {
private:
void bribe();
// ...
};
class Judge;
Figure 1-20 illustrates that since type Judge is named in the body of a member function
(bribe) of class Crook, Judge is used in the implementation of Crook. In other
words, class Crook uses Judge.
Another form of usage occurs when a class, X, embeds a (private) data member of
type T. This kind of internal usage is commonly referred to as HasA. Even if class X
contains a data member whose type is merely derived (in the C-language sense) from
T (e.g., T* or T&), T is still considered to be used in the logical implementation of X.
We will occasionally refer to this kind of internal usage as HoldsA.
class BattleShip {
Tower d_controlTower;
Cannon *d_replaceableForwardBattery_p;
Cannon& d_fixedAftBattery;
// ...
};
[Figure 1-21: BattleShip uses Tower (HasA) and Cannon (HoldsA) in its implementation]
Figure 1-21 shows a class definition for Tower and a class declaration for Cannon.
Both of these types are used in the implementation of class BattleShip. In particular,
BattleShip HasA Tower and BattleShip HoldsA Cannon. We make no distinction in
the symbolic notation we use: both HasA and HoldsA are indicated with the usual
•------ notation.
1.6.3.3 WasA
Inheriting privately from a type is yet another way to use that type in the logical
implementation of a class. Private inheritance is an implementation detail of the
derived class. From a logical point of view, a private base class (like a private data
member) is invisible to clients. Private inheritance is a technique that can be used to
propagate only a subset of the attributes of its base class. This seldom-used relation
has been affectionately termed WasA, and is illustrated in Figure 1-22.
[Figure 1-22: ArizonaMemorial WasA Battleship; HasA and HoldsA relations, including Shop, are also shown]
Figure 1-22 shows a class definition for BattleShip that acts as a private base class
for ArizonaMemorial. Once in active service, the battleship Arizona was one of the
casualties of the 1941 bombings of Pearl Harbor. The Arizona is now a museum with
a gift shop and exhibits.
We have now reviewed all of the logical notation we need to get down to the serious
business of physical design. The logical and physical aspects of design are tightly
coupled. Each of the logical relations-IsA, Uses-In-The-Interface, and Uses-In-The-
Implementation-implies a physical dependency between logical entities. As we will
see in Chapter 3, it is ultimately these logical relations that dictate the physical inter-
dependencies within our system.
In the context of object-oriented design, when someone mentions the word hierarchy,
many people think inheritance. Inheritance is one form of logical hierarchy; layering
is another. By far, the more common form of logical hierarchy in object-oriented
design results from layering.
Instances of a layered type are often not programmatically accessible to clients via the
interface of the higher-level object. The connotation is that the primitive type is at a
lower level of abstraction. For example, a person has a heart, a brain, a liver, and so
on, yet these layered organ objects are not part of the public interface of most healthy people.
[Figure 1-23: Person implemented with inheritance (bad idea) versus layering]
Layering is an important and often underdeployed weapon in the arsenal of the object-
oriented designer. It is not uncommon for novice programmers to attempt to use
inheritance where layering is indicated. Figure 1-23 shows two examples of logical
hierarchy. In both cases, Person implicitly depends on Heart, Brain, and Liver in
order to do its job. Layering is clearly the correct approach here because a Person is
not a Heart, a Brain, or a Liver. Instead, a Person has a Heart, a Brain, and a
Liver. Furthermore, these organs must not be exposed in the interface of a Person.
With layering, a client need not be subjected to the interfaces of these internal details.
1.8 Minimality
Some class authors want their classes to be all things to all people. Such classes
have been referred to, affectionately, as Winnebago classes. This very common and
seemingly noble desire is cause for concern. As developers, we must remember that
just because a client asks for an enhancement doesn't mean that it is appropriate for
our class. Suppose you are the author of a class and each of 10 clients asks you for a
different enhancement. If you agree, two things will happen:
1. You will have to implement, test, and document 10 new features that you
did not originally consider part of the abstraction that you were trying to
implement (which in itself is a symptom of a problem).
2. Each of your 10 clients will be given 9 new features that they did not ask
for and probably don't need or want.
Every time you add a feature to please one person, you disrupt and potentially annoy
the rest of your client base. It has happened that classes that were originally light-
weight and very useful have, over time, become so bloated that instead of being good
for everything, they have become, quite literally, good for nothing.
Notice that in Section 1.5 we chose to disallow explicitly the possibility of initialization
or assignment for instances of both IntSet and IntSetIter by declaring the
respective member functions private. Making a copy of a collection can result in non-
trivial development effort, and such functionality for iterators is rarely needed in prac-
tice. We can defer the implementation and testing of superfluous functionality unless
or until a need for that functionality presents itself. Deferring implementation is also
one way to keep our options open. Not only does it require less work to implement,
test, document, and maintain software, but by deliberately not supplying functionality
prematurely, we commit to neither its behavior nor its implementation. In fact, not
implementing functionality can improve usability. For example, making the copy con-
structor private prevents inadvertently passing an object by value-a technique used
in the iostream package. 18
This minimalist approach of making components sufficient but not necessarily com-
plete applies to large projects under development where the users of the component
are "in-house" or in a position to request and receive additional functionality quickly
should it turn out to be needed. The most extreme case occurs where the component is
highly specialized and the author is the only intended user. In that case, implementing
any unneeded functionality is probably unwarranted. Of course, omitting the implementation
of functionality intrinsic to an abstraction would not make sense for, say, a
commercial component library where the users are paying customers and will expect
robust and fully functional objects. This issue is not black and white; between the two
extremes lies a spectrum that corresponds to how widely a component will be used. In
evaluating the trade-offs, remember to consider that functionality is invariably easier
to add than to remove.
1.9 Summary
Large C++ programs reside in more than a single source file. Partitioning programs
into separate translation units makes recompilation more efficient and reuse possible.
Although most C++ declarations can be repeated in a given scope, there must be
exactly one definition of every object, function, or class used in a C++ program.
Definitions with internal linkage are confined to a single translation unit and cannot
affect other translation units unless placed in a header file. Such definitions can exist
at file scope in .c files without affecting the global (symbol) name space.
Definitions with external linkage can be used to resolve undefined symbols in other
translation units at link time. Placing such definitions in header files is almost cer-
tainly a programming error.
Typedef declarations are only aliases for types and provide no additional compile-
time type safety.
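A small sketch makes the point concrete. The aliases Width and Height and both functions are hypothetical; because the two typedefs name the same underlying type, the compiler accepts the arguments in either order.

```cpp
#include <cassert>

typedef int Width;    // hypothetical aliases, for illustration only
typedef int Height;

// Both parameters are plain int as far as the compiler is concerned.
inline int aspectTimes100(Width w, Height h)
{
    assert(h != 0);
    return 100 * w / h;
}

inline int swappedByMistake()
{
    Width  w = 200;
    Height h = 50;
    return aspectTimes100(h, w);   // arguments reversed; compiles silently
}
```

Distinct classes (rather than aliases) would be needed for the compiler to catch the reversed call.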
Assert statements can be used effectively to detect coding errors during development
without affecting program size or runtime performance in the production version of a
product.
• Multi-word identifier names capitalize the first letter of the second and
subsequent words.
• Constants and macros are all uppercase (with words separated by a single
underscore).
• Class data members are prefixed by d_ (or s_ for static members).
• Private details will precede the public interface in class definitions (primarily
to emphasize their presence in the header file).
The iterator design pattern is used to sequence over the parts, attributes, or subobjects
of some primary object. An iterator is declared to be a friend of the primary
object, and its definition should reside in the same header file as that object. The
iterator notation used in this book tersely conforms to a for-loop model.
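As a rough illustration of an iterator that is a friend of its primary object and fits a for-loop, consider the sketch below. IntArray, IntArrayIter, and sum are hypothetical names chosen for this example, and the fixed capacity is an arbitrary simplification.

```cpp
#include <cassert>

class IntArrayIter;

// Hypothetical primary object with a small fixed capacity.
class IntArray {
    enum { CAPACITY = 8 };
    int d_data[CAPACITY];
    int d_length;
    friend class IntArrayIter;   // the iterator sees the private details

  public:
    IntArray() : d_length(0) {}
    void append(int value)
    {
        assert(d_length < CAPACITY);
        d_data[d_length++] = value;
    }
};

class IntArrayIter {
    const IntArray& d_array;
    int d_index;

  public:
    explicit IntArrayIter(const IntArray& array)
    : d_array(array), d_index(0) {}

    void operator++() { ++d_index; }               // advance

    operator const void *() const                  // non-zero while valid
    {
        return d_index < d_array.d_length ? this : 0;
    }

    int operator()() const                         // current element
    {
        return d_array.d_data[d_index];
    }
};

inline int sum(const IntArray& array)
{
    int total = 0;
    for (IntArrayIter it(array); it; ++it) {
        total += it();
    }
    return total;
}
```

The three loop clauses map directly onto construction, validity test, and advance, which is what keeps the notation terse.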
Object-oriented design lends itself to a rich set of logical notations. In this text,
however, we will limit ourselves to three:
The orientation of each symbol (shown here from left to right) should be consistent
with its label and point in the direction of implied dependency. There are a few special
names for some particular kinds of Uses-In-The-Implementation (Uses, HasA,
HoldsA, and WasA); however, the notation used to represent each of these variations
is the same.
Inheritance and layering are two forms of logical hierarchy. Layering is by far the
more common, often involving an implementation-only dependency. Layering, spe-
cifically composition, is preferable to derivation when the class in question cannot
sensibly be thought of as a kind of the proposed base class(es). Finally, extending the
functionality of a single class in response to several clients often results in a class that
is overweight and undesirable. For classes that are not widely used, implementing
excessively complete functionality can unnecessarily increase development time,
maintenance cost, and code size. Deferring the implementation of functionality that is
not yet needed reduces development time while keeping options open. On the other
hand, commercial component libraries are expected to be fully functional and robust.
Ground Rules
This chapter describes a modest collection of fundamental design rules that have
proved useful in practice and that serve as a framework for discussing the material surrounding
more advanced rules presented later in this book. These fundamental rules
address basic practices such as restricting member data access and reducing the num-
ber of identifiers in the global name space. In particular, we examine what types of
constructs can safely be placed at file scope in a header file. The need for both internal
and redundant external include guards will be established. This chapter concludes
with a discussion of what constitutes adequate documentation (such as explicitly iden-
tifying behavior that is undefined), followed by a short list of identifier-naming con-
ventions.
Overview
The beauty of any fine art comes not only from creativity but also from discipline. So
it is with programming. C++ is a large language, and there is ample room to be creative
with it. However, the design space is so big that without discipline (that is,
without some modest constraints on the design structure), large projects can easily
become intractable and unmaintainable. These constraints are presented in the form of
design rules, guidelines, and principles.
Design Rules: Experience tells us that certain coding practices that are perfectly legal
in C++ simply should never be used in a large-project environment. Recommenda-
tions that flatly proscribe or require a given practice without exception are referred to
in this book as design rules. Verifying adherence to these rules cannot be a subjective
process. Design rules must be sufficiently precise, specific, and well defined so that
complying with these rules can be verified objectively. To be effective, design rules
must lend themselves to impersonal, mechanical verification via automated tools.
Guidelines: Experience also tells us that certain other practices should be avoided
wherever possible. Suggested practices of a more abstract nature for which exceptions
are sometimes legitimately made are called guidelines. Guidelines are like rules of thumb
to be followed unless other, more compelling, engineering reasons dictate otherwise.
Principles: There are certain observations and truths that have often proved useful
during the design process but must be evaluated in the context of a specific design.
These are referred to as principles.
This book contains many recommendations. In this chapter I present a set of very
basic design rules that I call ground rules, explaining and (I hope) justifying each rule
as I go. You may not agree with all of them at first, but over time they have proved
both workable and effective for very large projects.
I have subdivided design rules into two distinct categories: major and minor. Major
design rules refer to practices that must always be followed. Deviating from a major
design rule is likely to affect the quality not only of the offending component, but also
of other components within the system. Even infrequent violations could undermine
the success of a large project. Throughout this book, I have assumed that major design
rules are never violated. As always, never never means NEVER. If extraordinary circumstances
and common sense dictate that one or more major design rules be violated,
it is incumbent upon developers to fully understand and appreciate the
implications and possible consequences of their actions. Minor design rules refer to
practices that are strongly recommended but not necessarily critical to a project's
overall success; for example, issues involving constructs that are used only in the
implementation, are unlikely to affect other developers, and are otherwise relatively
contained and easy to fix in isolated instances. Draconian adherence to minor design
rules is not critical because (unlike adherence to major rules) the cost of a project
increases only incrementally with each minor rule violation.
Because it is not expected that there will ever be an engineering reason to violate any
design rule (major or minor), any design rule that proscribes one approach must offer
a suitable alternative that will work in all cases.
Consider the definition of class Rectangle in Figure 2-1. This Rectangle is defined
by providing two Point objects (see Figure 1-1) that identify its lower-left and upper-right
corners. Since this particular implementation of Rectangle stores these Point
values internally, we might be tempted to make the data members public to avoid supplying
manipulator (i.e., set) and accessor (i.e., get) functions for each.
// rectangle.h
#ifndef INCLUDED_RECTANGLE
#define INCLUDED_RECTANGLE
class Rectangle {
  public:
    Point d_lowerLeft;     // bad idea (public data)
    Point d_upperRight;    // bad idea (public data)
  public:
    // CREATORS
    Rectangle(const Point& lowerLeft, const Point& upperRight);
    Rectangle(const Rectangle& rect);
    ~Rectangle();
    // MANIPULATORS
    Rectangle& operator=(const Rectangle& rect);
    void moveBy(const Point& delta);
    // ...
    // ACCESSORS
    int area() const;
    // ...
};
// ...
inline
void Rectangle::moveBy(const Point& delta)
{
    d_lowerLeft += delta;
    d_upperRight += delta;
}
// ...
#endif
Now consider the impact on clients when we discover that Rectangle objects are frequently
moved. To improve performance, we might try changing the representation of
Rectangle objects. For example, instead of storing the absolute location of the upper-right
corner, we might represent that value implicitly by storing its position relative to
the lower-left corner:
class Rectangle {
  public:
    Point d_lowerLeft;          // same purpose as in Figure 2-1
    Point d_upperRightOffset;   // new "relative" representation
    // ...
};
With this new representation, the moveBy member function can be implemented in one
line instead of two because the relative position of the upper-right corner with respect
to the lower-left is not affected by the move:
inline
void Rectangle::moveBy(const Point& delta)
{
    d_lowerLeft += delta;
}
The location of the upper-right corner is no longer stored in the Rectangle object and
therefore must be calculated when needed:
Any clients who previously accessed the d_upperRight data member directly will
now be forced to rework their code. Component reuse compounds this problem. If a
class defining public data is shared among executables, then changing the data repre-
sentation of a single class could necessitate modifying the source code for any number
of separate programs.
Keeping all data members private and providing the appropriate accessor and
manipulator functions, as shown in Figure 2-2, leaves us free to change the internal
representation without forcing our clients to rework their code. The implementation
of getUpperRight() could have been modified to compute that value on demand
without changing its logical interface.
Besides maintainability, there are reasons not to have public data members. For exam-
ple, the values of data members in a class are rarely independent. Direct (writable)
access to data (such as d_area in Figure 2-2) could easily leave an object in an inconsistent
state. Providing only a functional interface grants class authors the level of
control necessary to ensure the integrity of their objects. Providing manipulator and
accessor functions also affords developers the opportunity to insert temporary code
(e.g., print statements for debugging, reference counts for performance tuning, and
assert statements for reliability).2
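The following sketch shows how a functional interface can keep redundant data consistent. The class Extent is hypothetical and much smaller than Rectangle; it assumes only that a cached d_area must always equal d_width * d_height.

```cpp
#include <cassert>

// Hypothetical class: d_area is stored redundantly and must always
// equal d_width * d_height.
class Extent {
    int d_width;
    int d_height;
    int d_area;

  public:
    // CREATORS
    Extent(int width, int height)
    : d_width(width), d_height(height), d_area(width * height)
    {
        assert(width >= 0 && height >= 0);
    }

    // MANIPULATORS
    void setWidth(int width)
    {
        assert(width >= 0);             // catch misuse during development
        d_width = width;
        d_area = d_width * d_height;    // keep the cached value consistent
    }

    // ACCESSORS
    int width() const  { return d_width; }
    int height() const { return d_height; }
    int area() const   { return d_area; }
};
```

Had d_width been public, a client could assign to it directly and leave d_area stale; routing every change through setWidth gives the author one place to maintain the invariant and to add temporary diagnostics.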
// rectangle.h
#ifndef INCLUDED_RECTANGLE
#define INCLUDED_RECTANGLE
class Rectangle {
    Point d_lowerLeft;   // Yet another representation!
    int d_width;         // Fortunately, these data members are private.
    int d_height;
    int d_area;          // Store this redundantly to improve performance.
  public:
    // CREATORS
    Rectangle(const Point& lowerLeft, const Point& upperRight);
    Rectangle(const Rectangle& rect);
    ~Rectangle();
    // MANIPULATORS
    Rectangle& operator=(const Rectangle& rect);
    void moveBy(const Point& delta);
    // ...
    // ACCESSORS
    int area() const;
    Point getLowerLeft() const;
    Point getUpperRight() const;
};
// ...
inline
void Rectangle::moveBy(const Point& delta)
{
    d_lowerLeft += delta;
}
// ...
inline
Point Rectangle::getUpperRight() const
{
    return d_lowerLeft + Point(d_width, d_height);
}
// ...
#endif
Note that public access to data members of a struct (or class) that itself is entirely
hidden (either privately within another class or locally within a .c file) is a separate
matter not covered by the above rule (see Sections 6.4.2 and 8.4). When data
members are not private, it is preferable to denote the deliberate lack of encapsulation
by using the keyword struct instead of class.
Some people advocate the use of protected data to facilitate arbitrary access from a
derived class. But from a maintainability perspective, protected access is like public
access because anyone who wants to get at protected data can do so with only the
modest additional effort of deriving a class. Unlike friendship, which explicitly
denotes who has access to private details, making class data protected results in an
unbounded breach of encapsulation.
The same arguments that applied to the public interface also apply to the protected
interface. Base-class authors can preserve maintainability by treating their protected
and public interfaces as separate but equally important. Keeping all member data pri-
vate and supplying the appropriate protected functions will enable the base-class
implementation to change independently of any derived classes.
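A minimal sketch of this arrangement follows. Counter and EvenCounter are hypothetical classes; the base class keeps its data private and offers derived classes a small protected functional interface instead.

```cpp
#include <cassert>

// Hypothetical base class: data is private, derived classes get functions.
class Counter {
    int d_count;    // private: the representation is free to change

  protected:
    void advance(int n) { d_count += n; }       // protected manipulator
    int rawCount() const { return d_count; }    // protected accessor

  public:
    Counter() : d_count(0) {}
    virtual ~Counter() {}
    int count() const { return d_count; }
};

// Hypothetical derived class that relies only on the protected interface.
class EvenCounter : public Counter {
  public:
    void tick() { advance(2); }
    bool isConsistent() const { return rawCount() % 2 == 0; }
};
```

If Counter later stored its value differently (say, as a count of pairs), only advance, rawCount, and count would need to change; EvenCounter would compile unmodified.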
For projects of even moderate size involving more than a single developer, there is a
danger of name collisions when independently developed parts are integrated into a
single program. The severity of the problem grows exponentially with system size,
and is exacerbated when the collisions result from integrating software provided by
third-party vendors.
There are various ways to pollute the global name space, some more onerous than
others. All of them are counterproductive in a large system environment. We now
address several of these issues independently and conclude this section with a design
rule that describes what kinds of declarations and definitions may exist safely at file
scope in C++ header files.
It has been said that global variables are like a cancer: you can't live with them, but
once established, they are often impossible to cut out. We can always get away with-
out using external global variables in a new C++ project. Exceptions to this rule might
involve access in a baroque program (such as Lex or YACC) that communicates via
global variables or perhaps within embedded systems.
File scope data with external linkage risks collision with global names in other translation
units (whose authors were egocentric enough to believe that they, too, owned
the global scope). But name pollution is only one of the many ways in which global
variables damage a program. Global variables tie objects and code together in ways
that make it virtually impossible to reuse translation units selectively in other pro-
grams. Debugging, testing, and even understanding systems that make liberal use of
global variables can become overwhelmingly costly in large projects.
Provided that you are not forced to use a system that already requires using global
variables in its interface, there are a couple of simple transformations that can unglo-
balize these variables:
int size;
double scale;
const char *system;
These variables can be removed from the global name space by enclosing them in a
struct and making them static members of that structure: 3
struct Global {
    static int s_size;             // bad idea (public data)
    static double s_scale;         // bad idea (public data)
    static const char *s_system;   // bad idea (public data)
};
Remember, of course, to define these static data members in the corresponding .c file.
Now, instead of accessing the global variables using
Although we have solved the global name space problem, we have not done all that
we should. Experience shows that just as with non-static (i.e., instance-specific) member
data, directly accessing static (i.e., class-specific) member data makes large systems
profoundly more expensive to maintain. If we were to change the exported data
type of a member (e.g., s_size) from int to double, that would be an interface
change; all clients would be affected regardless of what we do. But we may decide to
change the implementation of s_size to a computed value based on other, more primitive
values (such as s_width and s_height). Providing static function members to
access (and manipulate) static data members allows us to make such local changes
without perturbing clients of the global scope.
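The kind of local change described here might look like the sketch below, in which the size is no longer stored but computed from s_width and s_height. The setDimensions function is a hypothetical addition, and the definitions of the static members are shown inline only to keep the example in one piece.

```cpp
#include <cassert>

class Global {
    static int s_width;     // more primitive values, now the stored state
    static int s_height;

  public:
    // MANIPULATORS
    static void setDimensions(int width, int height)   // hypothetical
    {
        s_width = width;
        s_height = height;
    }

    // ACCESSORS
    static int getSize() { return s_width * s_height; }  // computed on demand
};

// These definitions would normally live in the corresponding .c file.
int Global::s_width = 0;
int Global::s_height = 0;
```

Clients that call Global::getSize() are unaffected by the switch from a stored to a computed value, which would not be true of clients that read a public s_size directly.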
The next step is to eliminate the public data by making Global a class and providing
static manipulator and accessor methods, as illustrated in Figure 2-3. Class Global
now acts as a logical module accessible from anywhere in the program. Because all
the interface functions are static, there is no need to instantiate an object in order to
use this class. Declaring the default constructor private and leaving it unimplemented
enforces this usage model.
To achieve a flexible design, we should be careful not to overuse global state informa-
tion. The mere fact that we expect to have only a single instance of an object is not suffi-
cient reason to make it a module instead of an instantiatable class. Globally accessible
modules make sense when they correspond to inherently unique entities (such as a system
console) or for system-wide constants (such as those found in limits.h) that are
not dictated by a particular application (see Section 6.2.9). Global modules are best
avoided when other, more localized (e.g., object-based) implementation will suffice. 4
class Global {
    static int s_size;
    static double s_scale;
    static const char *s_system;
  private:
    // NOT IMPLEMENTED
    Global();   // prevent inadvertent instantiation
  public:
    // MANIPULATORS
    static void setSize(int size) { s_size = size; }
    static void setScale(double scale) { s_scale = scale; }
    static void setSystem(const char *system) { s_system = system; }
    // ACCESSORS
    static int getSize() { return s_size; }
    static double getScale() { return s_scale; }
    static const char *getSystem() { return s_system; }
};
Free functions, too, can be a threat to the global name space, especially when they do
not involve any user-defined type in their argument signature. If a free function is
defined with internal linkage in a .h file or with external linkage in a .c file, it may
collide with another function definition with the same name (and signature) during
program integration. Operator functions are an exception.
Fortunately, free functions can always be grouped into a utility class (struct) containing
only static functions. The resulting cohesion is not necessarily optimal, but it
does reduce the likelihood of global name collisions. Here's an example:
The above free functions could always be replaced by the following static methods:
struct SysUtil {
    static int getMonitorResolution();
    static void setSystemScale(double scaleFactor);
    static int isPasswordCorrect(const char *usr, const char *psw);
};
Unfortunately, free operator functions cannot be nested inside classes. This is not a
serious problem because free operators require at least one of their arguments to be a
user-defined type. Hence the likelihood of free operators colliding is remote, and such
collisions are typically not a problem in practice.
Enumerations, typedefs, and (by default) file scope const data all have internal linkage.
People often declare constants, enumerations, or typedefs at file scope in header
files. This is a mistake.
Because C++ fully supports nested types, enumerations can be defined (and typedefs
declared) within the scope of a class without conflicting with other names in the
global name space. By choosing a more limited scope in which to define an enumer-
ation, you ensure that all enumerators of that enumeration become similarly scoped
and thus will not conflict with other names defined outside that scope.
// paint.h
enum Color { RED, GREEN, BLUE, ORANGE, YELLOW };     // bad idea
// juice.h
enum Fruit { APPLE, ORANGE, GRAPE, CRANBERRY };      // bad idea
These two enumerations were probably not written by the same developer, yet it is
quite possible that they could someday be included in the same file, resulting in an
ambiguity, ORANGE, that cannot be resolved!
// picture.c
#include "picture.h"
#include "paint.h"
#include "juice.h"
If these two enumerations are instead defined within separate classes, one can easily
use scope resolution to resolve the ambiguity: Paint::ORANGE or Juice::ORANGE.
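A sketch of the nested form is shown here. The enclosing names Paint and Juice follow the scope-resolved names used in the text; the struct bodies and the helper isOrangeJuice are assumptions made for illustration.

```cpp
#include <cassert>

struct Paint {
    enum Color { RED, GREEN, BLUE, ORANGE, YELLOW };
};

struct Juice {
    enum Fruit { APPLE, ORANGE, GRAPE, CRANBERRY };
};

// Hypothetical helper: both enumerations are visible here, yet each
// ORANGE is named unambiguously through its enclosing scope.
inline bool isOrangeJuice(Juice::Fruit fruit)
{
    return fruit == Juice::ORANGE;
}
```

Both definitions can now appear in the same translation unit, and the two enumerators keep their own values (3 and 1, respectively).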
For similar reasons, typedefs and constant data should also be placed within class
scope in header files. Most constant data is integral, and nested enumerations work
well to provide integral constants within the scope of a class. Other constant types
(e.g., double, String) must be made static members of the class and initialized
within the .c file:
// array.h
#ifndef INCLUDED_ARRAY
#define INCLUDED_ARRAY
class Array {
    enum { DEFAULT_SIZE = 100 };
    static const double DEFAULT_VALUE;
    static const String DEFAULT_NAME;
    // ...
};
#endif

// array.c
#include "array.h"
const double Array::DEFAULT_VALUE = 0.0;
const String Array::DEFAULT_NAME = "";
// ...
In large projects, aside from the global name collisions, there is a very real problem
with even finding enumerations, typedefs, and constants at file scope. Nesting a
typedef within a class forces the name to be fully qualified (or the declaration to be
inherited), making it relatively easy to find. The same reasoning applies to enumera-
tions, but even stronger arguments for nesting enumerations within classes have
already been presented.
There is almost no need for macros in C++. They are useful for include guards (see
Section 2.4), and in a very few cases their benefits outweigh their problems in a .c file
(most notably, when used to achieve conditional compilation for portability or debug-
ging). But in general, preprocessor macros are inappropriate for production software.
The preprocessor is not part of the C++ language; its basis is completely textual, mak-
ing macros painfully hard to debug. Although macros can make code easier to write,
their free form often makes code much harder to read and understand. Consider the
following code fragment:
How would you tell your debugger, browser, or other automated tool to deal with the
above at the source level?
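One well-known consequence of purely textual substitution can be sketched as follows. The macro SQUARE and the helper functions are hypothetical and are not taken from the text; they show only that a macro argument with a side effect is expanded, and therefore evaluated, once per occurrence in the macro body, whereas an inline function evaluates it once.

```cpp
#include <cassert>

#define SQUARE(X) ((X) * (X))   // hypothetical macro, for illustration

inline int square(int x) { return x * x; }

// Returns 3 and records that it was called.
inline int nextValue(int& callCount)
{
    ++callCount;
    return 3;
}

inline int callsMadeByMacro()
{
    int callCount = 0;
    int result = SQUARE(nextValue(callCount));   // expands to two calls
    assert(result == 9);
    return callCount;
}

inline int callsMadeByFunction()
{
    int callCount = 0;
    int result = square(nextValue(callCount));   // argument evaluated once
    assert(result == 9);
    return callCount;
}
```

Nothing at the call site hints at the double evaluation, and a source-level debugger stepping over the line sees only the unexpanded text.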
As bad as macros are in .c files, there are even stronger software engineering reasons
for keeping macros out of header files. Take the case of defining a preprocessor constant
using #define in a header file. Since macros are not part of C++, they cannot be
placed inside the scope of a class. Any file that includes a header file with a #define
will take on that definition.
// theircode.h
#ifndef INCLUDED_THEIRCODE
#define INCLUDED_THEIRCODE
// ...
#define GOOD 0
// ...
#endif

// ourcode.c
#include "ourcode.h"
#include "theircode.h"
// ...
int OurClass::aFunction()
{
    enum { BAD = -1, GOOD = 0 } status = GOOD;
    // ...
    return status;
}
// ...
When file ourcode.c is compiled, the compiler first calls the preprocessor. Even
though GOOD is defined within the protective scope of a function, it is not safe from the
preprocessor, which mercilessly replaces the enumerator GOOD with the literal integer 0:
// ...
int OurClass::aFunction()
{
    enum { BAD = -1, 0 = 0 } status = 0;
    // ...
    return status;
}
When the compiler encounters the enumeration, it spits out Syntax Error, but you won't
know why until you have spent an eternity "grepping" through .h files looking to see
who has #define'd one of your enumerators. Notice that this problem would not have
occurred if the preprocessor symbol had instead been either a const or an enum at file
scope (which, by the way, are also design-rule violations according to Section 2.3.3):
// theircode.h
#ifndef INCLUDED_THEIRCODE
#define INCLUDED_THEIRCODE
// ...
const int GOOD = 100;      // bad idea
                           // file-scope constant data
// ...
#endif

// theircode.h
#ifndef INCLUDED_THEIRCODE
#define INCLUDED_THEIRCODE
// ...
enum { GOOD = 100 };       // bad idea
                           // file-scope enumerated value
// ...
#endif
Preprocessor macros can also be used to implement templates in cases where that
C++ language feature is missing or inadequately implemented. If macros are used for
this purpose, then macro functions will appear in header files. There are ways to
approach this problem, other than resorting to macros, that may be better suited for
large projects. In any event, template-related issues should be addressed early in the
development process.
A name declared at file scope in a header file has the potential to collide with any file-
scope name in any file in the entire system. Even names with internal linkage
declared at file scope in a .c file are not safe from file-scope names in a .h file.
The only things we expect to find at file scope in a header file are class declarations,
class definitions, free operator declarations, and inline function definitions. Nesting
all other constructs within class scope eliminates most of the trouble associated with
name collisions.
To help illustrate this rule, an otherwise meaningless header file containing several
constructs is provided with commentary in Figure 2-4. Note that a static instance of
user-defined type is a special case, which is discussed in Section 7.8.1.3. For now,
avoidance of these static user-defined objects in .h files may be treated as a guideline
and not a rule.
static int fileScopeVariable;     // internal data definition
const int BUFFER_SIZE = 2...;     // const data definition
enum Boolean { ZERO, ONE };
typedef long BigInt;
class Driver {
    enum Color { RED, GREEN };        // fine: enumeration in class scope
    typedef int (Driver::*PMF)();     // fine: typedef in class scope
    static int s_count;               // fine: static member declaration
    int d_size;                       // fine: member data definition
  private:
    struct Pnt {
        short int d_x, d_y;
        Pnt(int x, int y)
        : d_x(x), d_y(y) {}
    };                                // fine: private struct definition
    friend DriverInit;                // fine: friend declaration
  public:
    int static round(double d);       // fine: static member
                                      //       function declaration
    void setSize(int size);           // fine: member function declaration
    int cmp(const Driver&) const;     // fine: const member
                                      //       function declaration
};                                    // fine: class definition

static class DriverInit {
    // ...
} driverInit;                         // special case (see Section 7.8.1.3)

inline
void Driver::setSize(int size)
{
    d_size = size;
}                                     // fine: inline member
                                      //       function definition
ostream& operator<<(ostream& o,
                    const Driver& d); // fine: free operator
                                      //       function declaration
inline
int operator==(const Driver& lhs,
               const Driver& rhs)
{
    return compare(lhs, rhs) == 0;
}                                     // fine: free inline operator
                                      //       function definition
inline
int Driver::round(double d)
{
    return d < 0 ? -int(0.5 - d)
                 : int(0.5 + d);
}                                     // fine: inline static member
                                      //       function definition
If we follow the above recommendation that only class, struct, union, and inline func-
tion definitions appear at file scope in header files, we will still have a problem if the
same header file gets included twice in a single translation unit. This problem could
occur with the simple include graph shown in Figure 2-5.
When component c's .c file is compiled, the preprocessor first includes the corresponding
header file, c.h, which in turn includes the contents of a.h. Next c.h
includes file b.h, triggering a second inclusion of a.h. If a.h has any definitions at all
(and in C++ it almost surely does) the compiler will complain about multiple definitions.
(Diagram: c.c includes c.h; c.h includes a.h and b.h; b.h includes a.h. Bad idea: a.h is missing include guards.)
Figure 2-5: Reconvergent Include Graph Causing Compile-Time Error
For example, when a.h is included in a translation unit, the preprocessor will first
check to see if the preprocessor symbol INCLUDED_A is defined. If not, the guard symbol
INCLUDED_A will be defined once and for all (for this translation unit), and then
preprocessing will proceed by reading the definitions contained within the rest of the
header file. The second (and any subsequent) time this header file is included, the
contents inside the preprocessor #ifndef conditional (i.e., the rest of the file) will be
ignored.
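The effect can be reproduced in a single file by writing out what the preprocessor sees when a guarded a.h is included twice. The contents of a.h (a struct A with one member) and the function valueOf are assumptions made for this sketch.

```cpp
#include <cassert>

// ---- contents of a.h, first inclusion ----
#ifndef INCLUDED_A
#define INCLUDED_A
struct A {
    int d_value;
};
#endif

// ---- contents of a.h, second inclusion (e.g., by way of b.h) ----
#ifndef INCLUDED_A      // already defined: everything up to #endif is skipped
#define INCLUDED_A
struct A {
    int d_value;
};
#endif

inline int valueOf(const A& a)
{
    return a.d_value;
}
```

Without the guard lines, the second definition of struct A would be a redefinition error; with them, the translation unit compiles and contains exactly one definition.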
The actual symbol used for the include guard is not important so long as it does not
match any other symbol in the entire system. Since the include guard is tied to a given
header file, and that header file name must be unique in the system, incorporating that
name in the guard symbol can ensure that no two guard symbols are the same.
The preprocessor knows nothing of C++ scoping rules. We must therefore ensure that
the include guard symbols do not match any other symbols at all, even those within
functions defined in a .c file.
Adopting a standard naming convention of prefixing the root name of the header file
in upper case (e.g., STACK) with a globally reserved prefix (e.g., INCLUDED_) ensures
unique and predictable guard names:
// stack.h
#ifndef INCLUDED_STACK
#define INCLUDED_STACK
// ...
#endif

// iccad_transistor.h
#ifndef INCLUDED_ICCAD_TRANSISTOR
#define INCLUDED_ICCAD_TRANSISTOR
// ...
#endif
Practical isn't always pretty, and this is one of those cases. Theoretically, unique inter-
nal include guards are sufficient. With large projects, however, it can be very costly
not to consider the matter a bit further.
Typically, each screen uses a substantial number of the available widgets. For the purposes
of this discussion, assume each screen type uses all (or most) of these primitive
types in a substantive way that prompts the implementors to include all of w1.h, w2.h,
..., wn.h files in each si.h file. The header file for a typical screen, S13, is shown in
Figure 2-7.
// s13.h
#ifndef INCLUDED_S13
#define INCLUDED_S13
#include "w1.h"
#include "w2.h"
#include "w3.h"
// ...
#include "wn.h"
#include <math.h>
class S13 {
    W1 d_w1a;
    W1 d_w1b;
    W2 d_w2;
    W3 d_w3;
    // ...
    Wn d_wn;
};
#endif
Do you see a potential problem? Let's continue. Suppose you have developed a good
number of screens, and in some translation unit of your system, ck.c, you need to
include all of the screen headers (say to create them). The include graph for a window
application with N = 5 widgets and M = 5 screens is shown in Figure 2-8.
// ck.c
#include "ck.h"
#include "s1.h"
#include "s2.h"
#include "s3.h"
#include "s4.h"
#include "s5.h"
Figure 2-8: Include Graph for One Component in Window System of Size N = 5
When the preprocessor sees that ck.c has included s1.h, it also includes w1.h
through w5.h. Upon encountering s2.h, each of the widget header files must still be
reopened and reprocessed line by line in its entirety searching for the trailing #endif
(only to find that there is nothing else to be done). This redundant preprocessing
occurs with s3.h, s4.h, and again with s5.h. Although this program will compile and
work properly, we had to wait for 25 widget header files to be processed when 5
would have done the job!
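The arithmetic behind those two figures can be written down as a tiny function. The name widgetHeaderOpens and its parameters are hypothetical; the model assumes, as in the discussion above, that every screen header includes every widget header.

```cpp
#include <cassert>

// Number of times a widget header is opened while preprocessing one
// translation unit that includes 'screens' screen headers, each of which
// includes all 'widgets' widget headers.  If 'reopenedPerScreen' is true,
// every screen header causes every widget header to be opened again.
inline int widgetHeaderOpens(int widgets, int screens, bool reopenedPerScreen)
{
    assert(widgets >= 0 && screens >= 0);
    if (screens == 0) {
        return 0;
    }
    return reopenedPerScreen ? widgets * screens : widgets;
}
```

For N = 5 widgets and M = 5 screens this gives 25 opens when headers are reopened for each screen and 5 when each is opened only once; the gap widens as the product of N and M grows.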
Unless care is taken to ensure otherwise, C++ tends to have large, dense include graphs
(much more than C). Although inheritance and layering contribute to this problem, the
underlying cause is often the misguided belief on the part of C++ developers that they
are somehow doing their clients a favor by including in their header file every other
header that a client might need.
Avoiding dense include graphs is part of the topic of Insulation, covered in Chapter 6.
What follows is a practice that will minimize the impact of such reconvergent inclu-
sion, even in a poor design. Note that some development environments are smart
enough to keep track of previously included header files, but many common environ-
ments are not. If portability is an issue, it is better safe than sorry.
Place a redundant (external) include guard around each include directive that occurs in
a header file. This technique, applied to a typical screen header file, is shown in Figure
2-9. Processing file s13.h for the first time will still cause files w1.h, w2.h, ..., wn.h
to be included. Including another screen, however, will not lead to any redundant
parsing of widget headers.
Notice that the redundant include guard for the math standard library header is different
from the rest. Although math.h does have its own internal include guard, it probably
doesn't follow our standard. The runtime libraries supplied with different
compilers are likely to have different naming conventions for the include guards they
use, and these guard names may not always be consistent. Components supplied by
third-party vendors may use yet another convention. For all components that are not
guaranteed to follow our include-guard naming convention, it will be necessary to add
a line that defines the appropriate include guard symbol after the corresponding
include directive (as was done for math.h).
Using redundant include guards is admittedly unpleasant. It now takes not one but at
least three lines to include a header in a header file, or four lines if the included header
came from outside our sphere of influence. Redundant include guards not only make
headers take longer to write, they make headers harder to read. Using redundant
include guards also requires following a consistent and predictable naming conven-
tion. Is it worth it?
Experience with truly large projects that have dense include graphs shows that the
answer is a resounding YES! Initial builds of projects consisting of several million
lines of C++ source code were taking on the order of a week to compile using a large
network of workstations. Inserting redundant include guards reduced compile time
significantly, with no substantive change to the code.
// s13.h
#ifndef INCLUDED_S13
#define INCLUDED_S13
#ifndef INCLUDED_W1
#include "w1.h"
#endif
#ifndef INCLUDED_W2
#include "w2.h"
#endif
#ifndef INCLUDED_W3
#include "w3.h"
#endif
// ...
#ifndef INCLUDED_WN
#include "wn.h"
#endif
#ifndef INCLUDED_MATH
#include <math.h>
#define INCLUDED_MATH   // extra line
#endif
class S13 {
    W1 d_w1a;
    W1 d_w1b;
    W2 d_w2;
    W3 d_w3;
    // ...
    Wn d_wn;
};
#endif
What we have just discussed is typically not an issue for a small or even a medium-size
system. But what would happen if we were dealing with systems that contained
the equivalent of hundreds of primitive widgets with hundreds of primitive screens?
To provide quantitative information demonstrating the benefits of using redundant
include guards, I tried the following experiment.
I let N be the number of widgets as well as the number of screens. I then generated
subsystems and measured the compile time (which is dominated by the C preprocessor
time) for a single translation unit, including all of the screen header files with and
without redundant include guards. I tried the experiment with header files having 10
lines each and again with header files of 100 lines each. I defined the speedup factor
to be the compilation time without redundant include guards divided by the corre-
sponding compilation time with redundant include guards added. The results are
shown in Figure 2-10.
For systems with fewer than eight widgets and eight screens, the speed-up is either
non-existent or minimal, but given that the total compile time was less than 1 CPU
second, it hardly matters.
Header files in C++ are seldom only 10 lines long; 100 lines is still small but more
typical. For systems with 32 widgets, the time spent in the C preprocessor compiling
each client component on my machine can be reduced by a factor of more than 6
(from 5.8 to 0.9 CPU seconds). For systems with 64 widgets, the speedup is a factor of
over 11! Redundant include guards are ugly, but do no real harm. Not using redundant
guards runs the risk of quadratic (i.e., O(N^2)) behavior at compile time.
Note that redundant guards are not necessary in .c files. Short of deliberately duplicating
#include directives in the .c file, the (pathological) worst-case behavior, 2N, remains
linear (i.e., O(N)) with respect to the number of distinct .h files, N.
The data in this section reflects CFRONT running on Unix-based workstations. Other
development environments may have somewhat different characteristics. In Chapter 6
we will see that nesting #include directives in header files is not only undesirable but
often unnecessary. The ugliness of the redundant include guards, if nothing else,
reminds us that we want to avoid placing #include directives in header files whenever
it makes sense to do so.
2.6 Documentation
The examples in this book do not set a good example for what are sufficient comments
for production code (otherwise this would be three books, not one). But comments,
especially in the interface, are an essential part of the development process.
Guideline
Document the interfaces so that they are usable by others; have at
least one other developer review each interface.
To see why it is valuable to have another developer review your interface, try to put
yourself in the position of a client or a test engineer trying to understand your class.
You know very well how to use your interface; after all, you designed it! The terse
names you supplied as member functions are "obvious" and "self-explanatory." But
unless you have taken the time to have someone else review your interface and documentation,
chances are that there is significant room for improvement, particularly in
its usability.
A big part of usability is being able to pick up an unfamiliar header and just start
using it. In practice, header file comments are often the only documentation (or at
least the only up-to-date documentation) that exists for an interface. If clients are
forced to peek at the implementation in order to figure out how to use your compo-
nent, then it is not documented properly.
Guideline
Explicitly state conditions under which behavior is undefined.
struct MathUtil {
    // ...
    static int fact(int n);
        // Returns the product of consecutive integers between 1 and n.
};
What do you think about the comment for function fact? We might guess that fact is
supposed to be the common mathematical function factorial (n!), and that fact(0) is
actually 1 and not 1 * 0 = 0 or undefined. However, that is not what the comment says.
What the comment fails to say is what is supposed to happen when n is non-positive!
A factorial is not defined for negative integral values. It may be that our particular
implementation returns 0 in these cases. What fact(n) returns when the value of n is
negative is an artifact of the implementation and not part of the specification; clients
should be told explicitly not to rely on this behavior. Another implementation replac-
ing this one could easily provide different behavior for negative values of n (including
causing your program to crash).
Unless explicitly stated in our comments, clients and test engineers will, in general,
have no way to distinguish between what is intended or required behavior and what is
simply coincidental behavior resulting from the particular implementation choice. A
better, more usable interface is presented below:
struct MathUtil {
    // ...
    static int factorial(int n);
        // Returns the product of consecutive integers between 1 and n
        // for positive n. If n is 0, 1 is returned.
        // Note that the behavior is not defined for negative values
        // of n nor for results that are too large to fit in an int.
};
Error checking throughout every level of a system in order to detect logic errors can
become expensive, especially for large systems. Good documentation can be a viable
alternative to writing excessive code. For example, some software developers feel that
it is necessary to handle every pointer that comes into a function, even if that pointer
is null. If this function is part of a widely used interface, favoring robustness might
well prove to be a good decision. Alternatively, it can be sufficient to make it clear to
clients that passing a null pointer will result in undefined behavior, backing that up
with an assert statement at the beginning of the function implementation:
// stdio.c
#include <stdio.h>
#include <assert.h>
/* ... */
int printf(const char *format, ...)
{
    assert(format);
    /*
    ...
    */
}
The effective use of both documentation and assert statements can lead to lighter-weight
code that is still quite usable. If someone misuses the function, it is their own
fault, and they'll find out about it soon enough!
It would be laudable if every developer always made it clear when, for example, a
pointer argument to a function cannot be null. Responsible clients, however, should
not assume that a pointer argument can be null unless the resulting behavior is
explicitly stated.
2.7 Identifier-Naming Conventions
Distinguishing data member, type, and constant names from other identifier names in
a consistent and objectively verifiable way can be a significant advantage when main-
taining a large system. Section 1.4.1 presented a collection of naming conventions
that we tersely punctuate here with three design rules and two guidelines.
You may also elect to use s_ to distinguish static from instance data. The above prac-
tice is a minor design rule because clients will never have to deal with this issue
(since, according to Section 2.2, data members should always be private).
The above practice is presented as a rule and not a guideline because it is a widely
accepted and objectively verified standard that improves readability in general,
making interfaces easier to understand and code easier to maintain. It is a minor rule
because an isolated lapse is not the end of the world.
The above practice helps to distinguish constant (and therefore "stateless") variables
from both local variables and member (state) variables. It is presented as a design rule
and not a guideline because it helps to improve maintainability, it is objectively verifi-
able, and it requires no exceptions.
The above practice is also objectively verifiable, but not everyone can be convinced of
its virtue, and it is largely a matter of style. Its utility is in making identifier names
somewhat easier to remember and in exhibiting a more professional image to most
customers. It is presented here as a guideline (particularly for the interface), but tolerates
some degree of individuality in the implementation. (In this book we have
adopted the uppercase standard.)
Guideline
Be consistent about names used in the same way; in particular adopt
consistent method names and operators for recurring design patterns
such as iteration.
Attaining consistency across the interface of a large system can enhance usability and
can also be surprisingly difficult to accomplish. Empowering a group of top-notch
developers to act as "Interface Engineers" has proven effective in achieving consis-
tency across development groups in large projects. Container classes, along with their
iterators, also lend themselves to template implementations (see Section 10.4) that can
be effective at enforcing consistency across otherwise unrelated objects.
2.8 Summary
C++ is a large language, giving way to an even larger design space. In this chapter we
have described a modest set of fundamental design rules and guidelines that have
proven themselves to be useful in practice.
Major design rules are presumed never to be violated. Even infrequent violations
could compromise the integrity of a large system. Throughout this text, we will
assume that all major design rules have been followed consistently.
Minor design rules are also presumed to be followed but perhaps not with draconian
adherence. Deviating from a minor rule in isolated instances is unlikely to have a
severe global impact.
Guidelines are presented as rules of thumb, and should be followed unless there is a
compelling engineering reason to do otherwise.
Exposing the member data of a class to its clients violates encapsulation. Providing
non-private access to member data implies that local changes in representation may
force clients to rework their code. Furthermore, by allowing writable access to data
members, there is no way to prevent accidental misuse from leaving data in an incon-
sistent state. Protected member data is like public member data in that there is no limit
to the number of clients that might be affected by a change to that data.
Global variables pollute the global name space and warp the physical structure of a
design in ways that can make independent testing and selective reuse virtually impos-
sible. There is no need to use global variables in new C++ projects. We can systemat-
ically eliminate global variables by placing them in class scope as private static
members, and then provide public static function members to access them. Excessive
dependency on such modules, however, is a symptom of a poor design.
Free functions, particularly those that do not operate on any user-defined type, are
likely candidates for collision with other functions during integrations. Nesting such
functions in class scope as static members all but eliminates the danger of collision.
Enumerations, typedefs, and constant data also threaten the global name space. By
nesting enumerations within class scope, any ambiguity can be resolved via scope res-
olution. A typedef at file scope can look suspiciously like a class, and be surprisingly
difficult to find in a large project. By nesting typedefs in class scope, they become rel-
atively easy to track down. An integral constant defined in a header file is often best
expressed by an enumerator in class scope. Other types of constants can be scoped by
making them static const members of some class.
Preprocessor macros are difficult to understand for both human beings and machines.
Since macros are not part of the C++ language, they are irreverent of scope, and, if
placed in a header file, they can collide with any identifier in any file in the system.
Consequently macros should not appear in header files except as include guards.
All things considered, we will avoid introducing anything into file scope in a header
file other than classes, structures, unions, and free operators. We will, of course, allow
inline member function definitions in headers.
Including a definition twice results in a compile-time error. Since most C++ header
files contain definitions, it is essential that we protect against the possibility of a
reconvergent include graph. Wrapping the definitions inside a header with internal
include guards ensures that the contents of each header will be incorporated at most
once in any translation unit.
Redundant (external) include guards, although not strictly necessary, ensure that we
avoid potentially quadratic behavior at compile time. By wrapping include directives
in header files with redundant guards, we ensure having to open a header file at most
twice per translation unit.
Not all code must be robust. Redundant, runtime program-error checking at every
level of the system can have an unacceptable impact on performance. A combina-
tion of documentation and assertions can serve the same purpose, but with superior
runtime performance in the final product.
Developing a large-scale software system in C++ requires more than just a sound
understanding of logical design issues. Logical entities, such as classes and functions,
are like the flesh and skin of a system. The logical entities that make up large C++
systems are distributed across many physical entities, such as files and directories.
The physical architecture is the skeleton of the system-if it is malformed, there is no
cosmetic remedy for alleviating its unpleasant symptoms.
The quality of the physical design of a large system will dictate the cost of its mainte-
nance and the potential it has for the independent reuse of its subsystems. Effective
design requires a thorough grasp of physical design concepts that, although closely
tied to many logical design issues, include a dimension with which even expert professional
software developers may have little or no experience. Part II of this book presents
a thorough introduction to the fundamental concepts of good physical design.
Chapter 4 describes the importance of physical hierarchy (i.e., layering) with respect to
development, maintenance, and testing. In this chapter we explore how to characterize
individual components, subsystems, and entire systems in terms of their physical depen-
dencies. We see how to exploit the hierarchical structure of sound physical designs to
achieve higher reliability at lower cost through isolation, incremental, and hierarchical
testing. We also measure how the physical dependencies in a system contribute to the
cost of maintenance and regression testing in terms of link time and disk space.
Chapter 7 extends the concept of levelization to very large systems. Additional physical
structure beyond that of individual components is needed to support the complex
functionality of such systems. Packages represent a physically cohesive collection of
cooperating components and provide a higher level of physical abstraction than can be
achieved with components alone. In this chapter we revisit the concepts of levelization
and insulation in the context of packages as a whole. We also touch on issues pertaining
to the process of developing and releasing stable snapshots of a very large system.
Finally, we discuss the role of main() in object-oriented systems and the relative advantages
of various strategies for initialization.
Chapter 3: Components
This chapter introduces the notion of physical design in contrast to the more popular
topic of logical design. The component is presented as the fundamental unit of design.
Next we explore a small collection of physical rules that ensure important desirable
properties in large designs. We then discuss the DependsOn relation among compo-
nents and see how to infer this relation from abstract logical relationships at design
time. We also see how to track physical dependencies efficiently by examining the
#include graph among components. Finally, we explore the subtle physical implications
of granting friendship both inside and outside components.
Logical design emphasizes the interaction of the classes and functions defined within
a system. From a purely logical point of view, a design can be thought of as a sea of
classes and functions where no physical partitions exist-every class and free
function resides in a single seamless space. Interactive object-oriented languages such
as Smalltalk and CLOS with their rich, runtime environments geared toward a single
developer have no doubt helped to foster this monolithic perspective.
Logical design, however, looks at only one side of the design process. Logical design
does not take into account physical entities such as files and libraries. Compile-time
coupling, link-time dependency, and independent reuse are simply not addressed by
logical design. For example, whether or not a function is declared inline does not
affect what it does, but can greatly affect readily measurable characteristics such as
runtime, compile time, link time, and executable size. Without considering the physical
view of a design, it is not possible to consider the organizational issues that become
important when developing very large systems.
Physical design focuses on the physical entities in the system and how they are
interrelated. In most conventional C++ programming environments, the source code
for every logical entity in the system must reside in a physical entity, commonly
referred to as a file. Ultimately, the physical structure of every C++ program can be
described as a collection of files. Some of these files will be header (.h) files and
some of them will be implementation (.c) files. For small programs, this description
is sufficient. For larger programs, we need to impose additional structure in order
to create maintainable, testable, and reusable subsystems.
1 The notion of a component is presented in Stroustrup, Section 12.3, pp. 422-425. In this chapter,
we expand on that discussion by introducing physical design concepts that make the definition of a
component in C++ concrete.
2 We will ignore extraordinary circumstances that might justify a component having more than a
single .h or .c file.
Section 3.1 Components versus Classes 101
A component will typically define one or more closely related classes and any free
operators deemed appropriate for the abstraction it supports. Basic types such as
Point, String, and BigInt will each be implemented in a component containing a
single class (Figure 3-1a). Container classes such as IntSet, Stack, and List will
typically be implemented in a component containing (at least) the principal class and
its iterator (Figure 3-1b). More complex abstractions involving multiple types such as
Graph can embody several classes in a single component (see Figure 3-1c). Finally,
classes that provide a wrapper for an entire subsystem (see Section 5.10) may form a
thin encapsulating layer consisting of one or more principal classes and many iterators
(Figure 3-1d).
Each of the components in Figure 3-1 (like every other component) has a physical as
well as a logical view. The physical view consists of the .h file and the .c file, with
the .h file included as the first substantive line of the .c file. The physical implementation
of a component always depends on its interface at compile time. This internal
physical coupling contributes to the need to treat these two files as a single physical
entity.
(a) point   (b) intset   (c) graph   (d) simulator
Figure 3-1: Logical versus Physical View of Several Components
A component (and not a class) is the appropriate fundamental unit of both logical and
physical design for at least three reasons:
As a concrete example, Figure 3-2 shows the header file for a stack component containing
two classes defined at file scope, namely, Stack and StackIter. We can also
see that there are two free (i.e., not member) operator functions implementing == and
!= between two Stack objects. Peeking at the implementation, we would discover
that operator== uses StackIter, and that operator!= is implemented in terms of
operator==. The complete set of logical entities at file scope in component stack is
pictured in Figure 3-3a. The physical entities (stack.h and stack.c) along with their
canonical physical relationship are depicted in Figure 3-3b.
// stack.h
#ifndef INCLUDED_STACK
#define INCLUDED_STACK
class StackIter;
class Stack {
    int *d_stack_p;               // pointer to array of int
    int d_sp;                     // stack pointer (index)
    int d_size;                   // size of current array of int
    friend StackIter;             // (no comment needed)
  public:
    // CREATORS
    Stack();                      // create an empty Stack
    Stack(const Stack& stack);    // (no comment needed)
    ~Stack();                     // (no comment needed)
    // MANIPULATORS
    Stack& operator=(const Stack& stack);   // copy Stack from Stack
    void push(int value);         // push integer onto this Stack
    int pop();                    // pop integer off this Stack
                                  // undefined if Stack empty
    // ACCESSORS
    int isEmpty() const;          // 1 if empty else 0
    int top() const;              // integer on top of this Stack
};                                // undefined if Stack empty
int operator==(const Stack& lhs, const Stack& rhs);
    // 1 if two stacks contain identical values else 0
int operator!=(const Stack& lhs, const Stack& rhs);
    // 1 if two stacks do not contain identical values else 0
class StackIter {                 // iter order: top to bottom
    int *d_stack_p;               // points to orig. stack array
    int d_sp;                     // local stack pointer (index)
    StackIter(const StackIter&);              // not implemented
    StackIter& operator=(const StackIter&);   // not implemented
  public:
    // CREATORS
    StackIter(const Stack& stack);   // initialize to top of Stack
    ~StackIter();                    // (no comment needed)
    // MANIPULATORS
    void operator++();               // advance state of iteration
                                     // undefined if done
    // ACCESSORS
    operator const void *() const;   // non-zero if not done else 0
    int operator()() const;          // value of current integer
};                                   // undefined if done
#endif
Figure 3-2: Header File stack.h for a stack Component
Figure 3-3: (a) Logical View and (b) Physical View (stack.h, stack.c) of Component stack
We have chosen a simple stack to ensure that the application functionality does not
obscure the points we want to illustrate. In this example, almost every member is
commented (which is a bare minimum for production code). A stack is a kind of container.
Access to other than the top element of a stack is not normally thought of as
part of a stack abstraction. We have provided the iterator to make the functionality
defined in this stack component more generally extensible by clients, while preserving
encapsulation (see Section 1.5). We make no mention of a maximum stack size
because a stack abstraction has no maximum size. Providing functionality such as
isFull or a return status that exposes artificial limitations imposed by a substandard
implementation not only violates the abstraction but also complicates its use. Such
unexpected, implementation-based limitations are better treated as exceptions. Sometimes,
however, we will allow a client to "help" an object to anticipate future events,
potentially improving performance. In order to avoid exposing a particular implementation
choice, such "help" (like register in C or inline in C++) should be only a
hint and have no programmatically detectable effect (see Section 10.3.1).
The logical interface of a component is the set of types and functionality defined in
the header file that are programmatically accessible by clients of that component.
Private implementation details that for organizational reasons reside in the . h file are
encapsulated and not considered part of the logical interface.
In the same sense that the public interface of a class consists of the union of the inter-
faces of the public members of that class (Section 1.6.2), the "public" interface of a
component consists of the collection of all public member functions, typedefs, enumerations,
and free (operator) functions declared in the component's .h file.
For example, the public member functions of both Stack and StackIter contribute to
the logical interface of component stack. A free operator function such as operator==
is not a member of Stack and therefore is not considered as part of the logical interface
of class Stack. Nonetheless, this operator does extend the set of programmatically
accessible functions defined in component stack and therefore does extend the
component's logical interface. The somewhat subtle issues surrounding friendship are
discussed in Section 3.6.
// stack.t.c
#include "stack.h"
#include <iostream.h>
main()
{
    Stack stack;
    stack.push(111);
    stack.push(222);
    stack.push(333);
    // Output:
    //     333
    //     222
    //     111
therefore will not alter the physical interface, thus forcing clients to recompile. The
downside is that for a lightweight object such as Stack, removing inline functions
could result in an order-of-magnitude loss in runtime performance (see Section 6.6.1).
From a logical point of view, what is and is not used in the implementation of a com-
ponent is an encapsulated detail and unimportant. From a physical point of view, such
usage can imply physical dependencies on other components. It is these physical
dependencies that will affect maintainability and reusability in large systems.
Good design requires that the developer understand the issues involved in both logical
and physical design. Logical design is the natural place to start. We must consider
what logical entities either naturally belong together or are sufficiently interdependent
that they cannot reasonably be separated. We must also consider how much of the
implementation detail we want to expose in the physical interface. Furthermore, we
need to decide on what other components our component will depend, and what impact
changes in these components will have on both our own component and its clients. A
component has not been designed properly until all of these issues have been
addressed.
This section considers the fundamental rules of physical design. These rules are
necessary if our other practices and techniques are to be effective. It is virtually
impossible to correct a large design that has not followed these practices in essence
from the start.
It may seem obvious, but this rule should be stated clearly once. For a component to
be reusable it must be reasonably self-contained. A component may have dependencies
on other components. However, any logical constructs (apart from class declarations)
that a component declares within its own header file, if defined at all, should
be defined entirely within that component.
Figure 3-5 is an example of how not to partition logical entities into physical units.
Class Stack has been defined in component stack, but its implementation is not confined
to the stack component. Stack::push is defined in intset.c, and Stack::pop is
defined in main.c!
// intset.h
#ifndef INCLUDED_INTSET
#define INCLUDED_INTSET
class IntSet {
    // ...
  public:
    // ...
};
#endif

// stack.h
#ifndef INCLUDED_STACK
#define INCLUDED_STACK
class Stack {
    // ...
  public:
    // ...
    void push(int i);
    int pop();
};
#endif

// intset.c          // stack.c          // main.c
// ...               // ...              // ...
The root names of the .c file and the .h file that comprise a component
should match exactly.
It is important for maintainability that the root names of a component's files match
exactly. Knowing, for example, that stack.c and stack.h comprise a single component
not only facilitates manual maintenance but also opens the door to simple object-oriented
design automation tools (see Appendix C).
Unfortunately, some existing object code archivers place relatively low character limits
(e.g., 13) on object file names. Hence it is not always possible to have the name of
the component's .c file mirror the name of its principal class. Worse, some operating
systems limit file names to only eight characters (plus a three-character suffix), which
can be a significant burden when developing very large systems.
The .c file of every component should include its own .h file as the
first substantive line of code.
We must include the .h file of a component in its .c file because the compiler must
see the declaration of a class member before it can compile its definition. This practice
is required by the language and also by many common dependency-analysis tools. The
reason for placing this #include directive at the top of the file is somewhat subtle.
section 3.2 Physical Design Rules 111
Including the .h file as the very first line of the .c file ensures that no critical piece of
information intrinsic to the physical interface of the component is missing from the
.h file (or, if there is, that you will find out about it as soon as you try to compile the
.c file).
// wildthing.h
#ifndef INCLUDED_WILDTHING
#define INCLUDED_WILDTHING
class WildThing {
    // ...
  public:
    WildThing();
    // ...
};
ostream& operator<<(ostream&, const WildThing&);
#endif
Notice that we have overloaded the left-shift operator (<<) in the way that is normal
and customary for stream output. Next consider the implementation:
// wildthing.c
#include <iostream.h>
#include "wildthing.h"
// ...
We try to compile the implementation, and it compiles just fine. Next we create a test
file for wildthing:
// wildthing.t.c
#include <iostream.h>
#include "wildthing.h"

int main()
{
    WildThing wild;
    // ...
    return 0;
}
File wildthing.t.c compiles and links. The program runs perfectly, and we go tell
all our friends that we are done. But there is a bug, and a physical bug at that! The
following program will not compile. Why?
// product.c
#include "wildthing.h"
#include <iostream.h>

int main()
{
    WildThing wild;
    // ...
    return 0;
}
The problem is that we did not declare class ostream before we tried to use it in the
interface of operator<< that is declared in wildthing.h. The order of the #include
directives was reversed in the client code, and now the header itself doesn't parse
because the ostream identifier is not yet declared. How do we fix the problem?
When you figure out the bug, the fix is simple: add the declaration
"class ostream;"³ to wildthing.h at file scope before the first use of ostream:
// wildthing.h
#ifndef INCLUDED_WILDTHING
#define INCLUDED_WILDTHING

class ostream;

class WildThing {
    // ...
  public:
    WildThing();
    // ...
};

ostream& operator<<(ostream&, const WildThing&);

#endif
The more important question is: How do we prevent the problem? The answer is
equally simple. Always make the .c file of each component include the .h file for that
component before including or declaring anything else. In this way each component
ensures that its own header file is self-sufficient with respect to compilation.
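The following is a minimal sketch of the rule in present-day C++, shown as a single translation unit with comments marking where each file would begin. It assumes the standard header <iosfwd> in place of the "class ostream;" declaration used in this chapter, because std::ostream cannot be forward-declared by hand in standard C++. The d_id member and the describe helper are hypothetical and exist only to make the sketch checkable.

```cpp
// ---- wildthing.h (would be its own file) ----
#ifndef INCLUDED_WILDTHING
#define INCLUDED_WILDTHING

#include <iosfwd>   // declares std::ostream without defining it

class WildThing {
    int d_id;       // hypothetical data member
  public:
    explicit WildThing(int id) : d_id(id) {}
    int id() const { return d_id; }
};

std::ostream& operator<<(std::ostream& stream, const WildThing& thing);

#endif

// ---- wildthing.c (would be its own file) ----
// #include "wildthing.h"   // first substantive line in the real file
#include <ostream>          // everything else comes after the component's own header
#include <sstream>
#include <string>

std::ostream& operator<<(std::ostream& stream, const WildThing& thing)
{
    return stream << "WildThing(" << thing.id() << ")";
}

// Hypothetical helper: formats a WildThing through the streaming operator.
std::string describe(const WildThing& thing)
{
    std::ostringstream out;
    out << thing;
    return out.str();
}
```

Because nothing precedes the header in this arrangement, any declaration the header forgot to supply would show up as a compile error in the component itself rather than in some client.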
Guideline
Clients should include header files providing required type defini-
tions directly; except for non-private inheritance, avoid relying on
one header file to include another.
Whether or not one header file should include another is a physical, not a logical,
issue. In cases where the header file itself needs a definition in another header file in
order to compile (see Section 6.3.7), it is correct to place the appropriate #include
directive in that header file (surrounded, of course, by redundant external include
guards as described in Section 2.5).
3 And not the preprocessor directive #include <iostream.h> (as explained in Section 6.3.7).
114 Components Chapter 3
Except for public and protected inheritance, however, the need to include a type's def-
inition rather than forward declare it in the header file is almost always dictated by
encapsulated logical implementation details.
How we layer one type on another will affect the degree of compile-time coupling.
Incrementally reducing compile-time coupling is the topic of Section 6.3. For example,
whether MyType HasA (embeds) Stack or HoldsA (pointer to) Stack could determine
whether mytype.h includes stack.h or simply forward-declares class Stack
(see Section 6.3.2). If I alter the implementation of MyType so that it now HoldsA
(instead of HasA) Stack, the #include directive may no longer be needed in
mytype.h. If I remove that directive, then clients who depended on how Stack was
used in the implementation of MyType would also be forced to change. Even if Stack
were used in the logical interface of MyType, there might still be no need for mytype.h
to include stack.h (see Section 6.3.7). It is up to each client that uses Stack
substantively to include its definition directly.
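A small sketch may make the HoldsA case concrete. It is written as one translation unit with comments marking the file boundaries; the member names, the vector-based Stack, and the roundTrip helper are assumptions made for illustration and are not taken from the components discussed here.

```cpp
// ---- mytype.h (sketch): HoldsA needs only a forward declaration ----
class Stack;                        // no #include "stack.h" is needed here

class MyType {
    Stack *d_stack_p;               // HoldsA: a pointer's size is known without Stack's definition
  public:
    explicit MyType(Stack *stack) : d_stack_p(stack) {}
    void record(int value);         // defined in mytype.c, where Stack is complete
    int last();
};

// ---- stack.h (sketch): clients that use Stack substantively include this themselves ----
#include <vector>

class Stack {
    std::vector<int> d_items;
  public:
    void push(int i) { d_items.push_back(i); }
    int pop() { int i = d_items.back(); d_items.pop_back(); return i; }
};

// ---- mytype.c (sketch): includes mytype.h first, then stack.h ----
void MyType::record(int value) { d_stack_p->push(value); }
int MyType::last() { return d_stack_p->pop(); }

// Hypothetical client: it creates a Stack, so it must include stack.h directly
// rather than hoping that mytype.h happens to include it.
int roundTrip(int value)
{
    Stack stack;
    MyType mine(&stack);
    mine.record(value);
    return mine.last();
}
```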
For similar reasons, it would be unwise for a client to rely on the header of some com-
ponent to forward declare a class used only in that component's logical implementation.
For analysis, maintenance, and particularly testing, it is important that someone (or
some tool) be able to look at only the physical interface of a component and under-
stand the complete logical interface of that component. Requiring a component to
declare its entire logical interface in its header file serves to improve
Suppose someone defined an external free function (or variable) in the .c file of
component foo and failed to declare it as an external function (or variable) in foo.h. Another
component, bar, that happened to link with foo could obtain access to that function
(or variable) by creating the appropriate external declaration locally. This unfortunate
scenario is depicted in Figure 3-6. Note that this example illustrates a poor design (the
kind of example I have tried to avoid presenting in this book).
As the figure shows, the .c file of bar is dependent on the definitions supplied by the
physical implementation of foo but is independent of foo's physical interface. There
is a "backdoor" usage of foo and an implicit physical dependency of bar upon foo
that cannot be detected easily. Automated dependency generators for makefiles
(mkmf, gmake, etc.) that take into account only the #include graph would have no
clue of this subtle dependency. Moreover, to the maintainers of this code there is no
immediate evidence that these two components are coupled. Yet, when we go to reuse
bar, the link phase will fail because the definition of function f (and global variable
size) will be missing.
[Figure 3-6]
Bad idea: Free functions and global variables are design-rule violations.

// bar.h
#ifndef INCLUDED_BAR
#define INCLUDED_BAR
// ...
#endif

// bar.c
#include "bar.h"
// ...
extern int size;
void f(int x, int y);

bar: illegal "backdoor" physical dependency on foo

// foo.h
#ifndef INCLUDED_FOO
#define INCLUDED_FOO
// ...
// Note: neither size nor f is declared in this file.
#endif

// foo.c
#include "foo.h"
// ...
int size;
void f(int x, int y)
{
    // ...
}
[Figure 3-7]

// bar.h
#ifndef INCLUDED_BAR
#define INCLUDED_BAR
// ...
#endif

// bar.c
#include "bar.h"
#include "foo.h"
// ...
int Bar::g()
{
    size = f(x, y);
}

bar: legal physical dependency on foo

// foo.h
#ifndef INCLUDED_FOO
#define INCLUDED_FOO
// ...
#endif

// foo.c
#include "foo.h"
// ...
Had the complete interface been specified in foo.h, the client component, bar, could
simply have included the foo.h file in its own .c file, making the dependency of the
implementation of bar on the interface of foo explicit. This new and somewhat
improved implementation is illustrated in Figure 3-7. However, the use of an external
global variable or an external free function is still a violation of the design rules
presented in Sections 2.3.1 and 2.3.2, respectively.
Classes defined at file scope entirely within the .c file of a component could easily
violate this rule, since non-inline class member functions and static member data have
external linkage. If we impose the same restrictions on classes defined entirely in a .c
file that the C++ language itself imposes on local class definitions (i.e., classes
defined entirely within a single function),⁴ we can avoid creating external definitions
and thereby avoid violating this rule.
Though technically a rule violation, defining a class entirely within a .c file is
relatively harmless in practice because name mangling will tend to discourage one from
trying to make direct use of the external symbols. The only real danger is that the
external definition may collide with some other identical definition (which would still
be the case if that class were defined in its own separate component). A more compelling
reason to avoid defining classes entirely in a .c file might be that it cannot then
be tested directly (see Section 8.4).
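As a hedged aside, C++11 and later offer another way to keep a helper class from producing external definitions: place it in an unnamed namespace, whose members have internal linkage. This technique is not the one described above; the sketch below combines it with the restriction this section suggests (all members defined inline, no static data). The component name counter and the names Tally and sumTo are hypothetical.

```cpp
// ---- counter.c (hypothetical component implementation file) ----
namespace {   // members of an unnamed namespace have internal linkage (C++11 and later)

class Tally {                 // helper class visible only in this .c file
    int d_count;
  public:
    Tally() : d_count(0) {}
    void add(int n) { d_count += n; }      // every member is defined inline
    int total() const { return d_count; }  // and there is no static member data
};

}  // close unnamed namespace

// Assumed to be declared in counter.h (not shown); the only name meant for clients.
int sumTo(int n)
{
    Tally tally;
    for (int i = 1; i <= n; ++i) {
        tally.add(i);
    }
    return tally.total();
}
```

The testing concern raised above still applies: Tally cannot be exercised directly from outside counter.c.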
Avoiding backdoor usage is critical to good physical design and effective reuse. It is
not enough to put the burden solely on the author of a component. To close all the
loopholes, we must make a reciprocal requirement that no client attempt to make use
of any construct with external linkage via local declarations. Instead, clients are
required to include the .h file of a component in order to access any definitions that
the component provides.
Our reason for following this rule is primarily to make the dependency on external
definitions in other components explicit.
// foo.c
#include "foo.h"
extern "C" double pow(double, int);  // bad idea: local extern declaration

However, we will get incorrect results at runtime because the local extern declaration
does not match the actual definition of pow:⁶
The mismatched declarations will cause the second argument of pow to become
garbled. We can avoid such problems and make the dependency explicit by including
the .h file instead:

// foo.c
#include "foo.h"
#include <math.h>  // pow()
By including the header files, inconsistencies with functions having either linkage
characteristic will be caught at compile time, which is eminently preferable to either
link time or runtime.
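A minimal sketch of the preferred form follows, assuming the modern header name <cmath>; the function cube is hypothetical. With the header's declarations in scope, the compiler itself decides how the integer exponent is converted, so nothing reaches the library in a garbled form.

```cpp
#include <cmath>   // supplies the declarations of pow that match the library's definitions

// Hypothetical function used only for illustration.
double cube(double x)
{
    return std::pow(x, 3);   // the int argument is converted according to the declared overloads
}
```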
are an entirely different matter because class definitions have internal linkage. Such
declarations are not only common but desirable, especially where they can eliminate
preprocessor #include directives in header files. This use of class declarations is
discussed with respect to link-time dependencies in Section 5.10 and again with respect
to compile-time dependencies in Section 6.4.3.
Physical dependencies among the components that make up a system will affect its
development, maintenance, testing, and independent reuse. The logical relations
among classes and free (operator) functions will imply physical dependencies among
the components in which they reside. We can define implementation dependency for
functions loosely by saying that a function depends on a component if that component
is needed in order to compile and link the body of that function. We can define imple-
mentation dependency for classes in a similar way. More generally, we can define pre-
cisely a central and purely physical relation among components.
Section 3.3 The DependsOn Relation 121
The DependsOn relation is quite different from the relations we have already seen.
IsA and Uses are logical relations because they apply to logical entities, irrespective
of the physical components in which those logical entities reside. DependsOn is a
physical relation because it applies to components as a whole, which are themselves
physical entities.
The notation used to represent the dependency of one physical unit on another is a
(fat) arrow. For example, a fat arrow drawn from plane to wing
implies that component plane depends on component wing. That is, component
plane cannot be used (i.e., it cannot be linked into a program or possibly even
compiled) unless component wing is also available.
As has been our convention, logical entities are represented by ellipses, and physical
entities are represented by rectangles. Notice that the arrow used to indicate physical
dependency is drawn between components and not individual classes. The (fat) arrow
notation used to denote physical dependency should never be confused with the arrow
notation used to denote inheritance. An inheritance arrow always runs between two
classes (which are logical entities); a DependsOn arrow connects physical entities
(such as files, components, and packages).
To illustrate the DependsOn relation in action, consider the following skeleton header
file for a string component. By the way, don't try to name your component "string"; it
may not work well in the presence of the standard C library header string.h.
// str.h
#ifndef INCLUDED_STR
#define INCLUDED_STR

#ifndef INCLUDED_CHARARRAY
#include "chararray.h"
#endif

class String {
    CharArray d_array;  // HasA
    // ...
  public:
    // ...
};

// ...

#endif
There is just enough information visible for us to see that class String has a data
member of type CharArray. We know from C that if a struct has an instance of a
user-defined type as a data member, it will be necessary to know the size and layout of
that data member even to parse the definition of the struct.
More specifically, it is not possible to compile any file that needs the definition of
String without first including chararray.h. For that reason we are justified in
nesting #include "chararray.h" in the header file of component str along with
the concomitant, redundant include guards.
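The point about size and layout can be checked mechanically. The sketch below is a single translation unit with comments marking the two headers; the fixed-size CharArray and the emptyLength helper are stand-ins assumed for illustration, not the real classes.

```cpp
#include <cstddef>

// ---- chararray.h (sketch; a fixed-size stand-in for the real class) ----
class CharArray {
    char d_data[16];
    std::size_t d_length;
  public:
    CharArray() : d_length(0) {}
    std::size_t length() const { return d_length; }
};

// ---- str.h (sketch): CharArray's definition must already be visible here ----
class String {
    CharArray d_array;   // HasA: the compiler needs CharArray's size and layout
  public:
    std::size_t length() const { return d_array.length(); }
};

// An embedded member makes the enclosing object at least as large as the member.
static_assert(sizeof(String) >= sizeof(CharArray),
              "String embeds a complete CharArray");

// Hypothetical helper: a default-constructed String holds an empty CharArray.
std::size_t emptyLength() { return String().length(); }
```

Replacing the CharArray definition with a bare "class CharArray;" declaration would make the definition of String fail to compile, which is exactly why str.h includes chararray.h.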
[Figure: component str DependsOn component chararray]
// word.h
#ifndef INCLUDED_WORD
#define INCLUDED_WORD

#ifndef INCLUDED_STR
#include "str.h"
#endif

// ...

#endif

// str.h
#ifndef INCLUDED_STR
#define INCLUDED_STR

class CharArray;

// ...

#endif

// word.c
#include "word.h"
// ...

// str.c
#include "str.h"
#include "chararray.h"
// ...
Recall that, except for inline functions, all class member functions and static data
members in C++ have external linkage. For all practical purposes we can say that if a
component needs to include another component in order to compile, it is going to depend on
that component at link time to resolve undefined symbols at the object-code level.
[Figure 3-10: word.c and str.c]
As Figure 3-10 shows, word.o depends on external names defined in str.o. Even if
word.o does not directly use names defined in chararray.o, it does use names
defined in str.o. The names used in str.o to resolve these undefined symbols will
[Figure: components word, str, and chararray; the dependency of word on chararray is implied by transitivity]
The DependsOn relation is important to physical design because it indicates all the
components required for the functionality supplied by a given component to be main-
tained, tested, and reused. We have just seen how to infer physical dependency from
the source code itself. As we will see in Section 3.4, it is possible to infer physical
dependency directly from abstract logical relationships such as IsA and Uses. Infer-
ring physical dependencies at the design stage will help us to achieve a sound physical
architecture early in the development process.
Section 3.4 Implied Dependency 127
Unless otherwise stated, we will assume that if a function uses a user-defined type, it
does so in a substantive way. To explain what we mean by substantive, let us assume
for the moment that if a function uses a type in its interface, it will be necessary for
the component defining that function to include the .h file for the component defining
that type.
// two.c
#include "two.h"
#include "one.h"
// ...
int Two::getInfo(const One& one)
{
    return one.info();
}
// ...

// one.h
#ifndef INCLUDED_ONE
#define INCLUDED_ONE
// ...
class One {
    // ...
  public:
    int info() const;
    // ...
};
// ...
#endif
Figure 3-12 illustrates our assumption that if function Two::getInfo uses class One in
its interface, then it likely does something with One in its implementation that would
require having seen One's definition. In this example, Two::getInfo invokes the
const member function info of class One, which requires the compiler to see the
definition of One in order to compile two.c.
The assumption that the Uses relation implies a compile-time dependency is too
strong. However, this assumption predicts physical implementation dependencies
fairly accurately. It is not necessary for the Uses relation to cause a compile-time
dependency in order to induce an indirect physical dependency. To see how an indirect
link-time dependency can occur, consider adding another component, three, and
two more files, two.h and three.c, to those of the previous example.
// three.c
#include "three.h"
#include "two.h"
// ...
int Three::x2info(const One& one)
{
    return 2 * Two::getInfo(one);
}
// ...

// two.h
#ifndef INCLUDED_TWO
#define INCLUDED_TWO
// ...
class One;
// ...
class Two {
    // ...
  public:
    static int getInfo(const One& one);
    // ...
};
// ...
#endif
As shown in Figure 3-13, three.c defines a member function, x2info, which uses
class One in its interface. However, the argument to x2info is passed by reference and
x2info makes no substantive use of One's definition before passing its argument off to
Two's static member function getInfo, which also accepts a One object by reference.
The x2info function in component three treats class One opaquely and does not
know anything about One other than that it is a class, struct, or union.
Suppose that no other function in class Three uses One (substantively), and also that
there is no other compile-time dependency of component three on component one.
That is, three.h and two.h alone are sufficient to compile three.c, even though One
is used in the interface of three. But function x2info does depend on One indirectly.
If we try to test three, we will not be able to link until we have written and compiled
two.c. To do that, two.c will have to include one.h.
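The ordering of declarations in the following single-file sketch mirrors that argument: Three::x2info is compiled before the definition of One has been seen, while Two::getInfo cannot be compiled until afterward. Making x2info static, and giving One a constructor and a d_info member, are assumptions made so that the sketch is complete.

```cpp
// ---- two.h (sketch) ----
class One;                                    // forward declaration only

class Two {
  public:
    static int getInfo(const One& one);
};

// ---- three.h and three.c (sketch): compiled without ever seeing one.h ----
class Three {
  public:
    static int x2info(const One& one);        // static is an assumption of this sketch
};

int Three::x2info(const One& one)
{
    return 2 * Two::getInfo(one);             // One is passed along opaquely
}

// ---- one.h (sketch) ----
class One {
    int d_info;
  public:
    explicit One(int info) : d_info(info) {}
    int info() const { return d_info; }
};

// ---- two.c (sketch): needs One's definition, so it includes one.h ----
int Two::getInfo(const One& one)
{
    return one.info();
}
```

In a real build, omitting two.o (and therefore one.o) from the link line would leave Two::getInfo undefined, which is the indirect link-time dependency described above.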
[Figure: components three and two; Uses-In-The-Interface and DependsOn (Implied Indirect Link-Time Dependency)]
Certain logical relationships have strong physical implications. For example, deriving
from a type (IsA) or embedding an instance of a type (HasA) always implies that a class
will depend on that type at compile time. In fact, these logical relations imply a
compile-time dependency not only for the class itself, but also for any client of the class.
Figure 3-15 illustrates the physical implications of IsA and HasA for the example in
Figure 3-11. This time Word is reimplemented as a kind of String, and String has a
CharArray data member.⁷ The definitions for both String and CharArray must be
available in order for word.c to compile. Moreover, every client of Word will also
require the definitions of both String and CharArray in order to compile. These same
strong physical implications hold for private derivation and for inline functions that
make substantive use of a type. In all of these cases we are justified in nesting the
required #include directives in the component's .h file.
[Figure: HasA; chararray; DependsOn (Implied Indirect Compile-Time Dependency)]
Figure 3-15: Logical IsA and HasA Relations Implying Component Dependency
Such strong physical coupling is not necessarily implied if a class HoldsA type (that
is, if it has a pointer or reference to that type as a data member), nor is it implied if the
type is used substantively in the body of a non-inline function. Such usage does not
justify forcing clients of the component to depend at compile time on its implementation
types as would result from nesting the #include directive in the component's
header. These subtle but important distinctions will be exploited to reduce
compile-time coupling in Chapter 6.
So far we have dealt with only two or three classes at a time. Now we will infer, from
a given abstract logical representation, the physical dependencies among a somewhat
larger collection of components. The diagram in Figure 3-16 depicts a small sub-
system used to support an online glossary.
7 This form of structural inheritance is potentially unsound from a logical design standpoint.
Suppose the semantics of Word require it to hold a proper subset of the arbitrary data supported by
the String base class (e.g., no space, punctuation, or control characters). Using public inheritance,
there is no way to prevent base class functionality (e.g., String::operator=) from being used by
clients to violate this requirement (see meyers, Item 37, pp. 130-132).
[Figure: components str, chararray, alias, and wordlist]
Figure 3-16: Intercomponent Logical Relationships
At the upper right of Figure 3-16, we see class CharArray in its own separate
component. The String class (to its left) uses CharArray in its implementation, so
we infer a likely physical dependency of component str on component chararray:
[Diagram: str DependsOn chararray]
[Diagram: word DependsOn str]
As we can see from Figure 3-16, Alias not only IsA Word but also Uses Word in its
interface. Notice, however, that the implied dependency of the Uses relationship and
the arrow denoting the IsA relationship point in the same direction (from Alias to
Word). Consequently, there is no implied cyclic dependency between Word and Alias.
It would therefore be possible to use word in a program without including or linking
to alias.
[Diagram: alias DependsOn word (implied)]
Now consider the wordlist component of Figure 3-16, which defines two presumably
template classes Link<Word> and List<Word>. Within component wordlist we see
that there is a logical Uses-In-The-Implementation relationship between List<Word>
and Link<Word>. Since these classes are already defined within the same component,
logical relationships between them cannot affect physical dependencies.
Both Link<Word> and List<Word> use Word in their respective interfaces. Either one
of these logical relationships alone would be sufficient for us to infer a likely physical
dependency of the entire wordlist component on word. Notice again that component
word can exist in a program without including or linking to wordlist, but not vice
versa.
[Figure: component dependency diagram (labels wordlist, chararray)]
The transitive closure of the dependency graph in Figure 3-17 is shown in Figure 3-18a.
All of the edges in this graph labeled with a t are called transitive edges because
their existence is implied by other edges that represent "direct" dependencies. Removing
these redundant transitive edges does not lose essential information, but it does
reduce clutter and make the graph easier to understand.
[Figure 3-18: (a) Complete Graph; (b) Graph with Redundant Edges Removed]
It is easy to tell from Figure 3-18b that word depends indirectly on chararray and
that wordlist depends indirectly on word. In general, a component x DependsOn
component y if and only if there is a path in the dependency graph from x to y.
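The path criterion translates directly into a graph search. The sketch below records only direct dependencies and answers DependsOn queries with a depth-first search; the edge list is read off the discussion of Figure 3-16 and should be treated as an assumption, as should the names Graph, glossaryGraph, and dependsOn.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Direct dependencies, keyed by component name.
typedef std::map<std::string, std::vector<std::string> > Graph;

// Edges assumed from the glossary subsystem discussed in this section.
Graph glossaryGraph()
{
    Graph g;
    g["str"].push_back("chararray");
    g["word"].push_back("str");
    g["alias"].push_back("word");
    g["wordlist"].push_back("word");
    return g;
}

// Depth-first search: true if a path of one or more edges leads from x to y.
bool dependsOn(const Graph& g, const std::string& x, const std::string& y)
{
    std::set<std::string> visited;
    std::vector<std::string> pending(1, x);
    while (!pending.empty()) {
        const std::string current = pending.back();
        pending.pop_back();
        Graph::const_iterator it = g.find(current);
        if (it == g.end()) {
            continue;                        // component with no outgoing edges
        }
        for (std::size_t i = 0; i < it->second.size(); ++i) {
            const std::string& next = it->second[i];
            if (next == y) {
                return true;
            }
            if (visited.insert(next).second) {
                pending.push_back(next);     // explore each component only once
            }
        }
    }
    return false;
}
```

Storing only the direct edges and computing reachability on demand corresponds to working from the reduced graph rather than from the full transitive closure.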
Suppose now that we are designing a large project, guided by implied dependencies.
After the design stage is largely complete and development is under way, we would
like to have a tool that could extract the actual physical dependencies among our com-
ponents. We could then track the actual component dependencies and compare them
with our initial design expectations.
Although it is possible to parse the source for an entire C++ program to determine the
exact component dependency graph, doing so is both difficult and relatively slow.
However, provided the design rules presented in Section 3.2 have been followed, it is
possible to extract the component dependency graph directly from the components'
source files by parsing only the C++ preprocessor #include directives. Such processing
is relatively fast and is done by a number of standard, public-domain dependency
analysis tools (such as gmake, mkmf, and cdep).
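To give a feel for how little parsing is involved, here is a rough sketch that pulls the quoted header names out of the text of one source file. It is not how any of the tools named above work internally; the function name quotedIncludes is hypothetical, and angle-bracket includes, comments, and conditional compilation are deliberately ignored.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Return the header names found in quoted #include directives of one source file.
std::vector<std::string> quotedIncludes(const std::string& sourceText)
{
    std::vector<std::string> headers;
    std::istringstream lines(sourceText);
    std::string line;
    while (std::getline(lines, line)) {
        // A directive starts with '#', optionally preceded by blanks.
        std::string::size_type pos = line.find_first_not_of(" \t");
        if (pos == std::string::npos || line[pos] != '#') {
            continue;
        }
        // The directive name may be separated from '#' by blanks.
        pos = line.find_first_not_of(" \t", pos + 1);
        if (pos == std::string::npos || line.compare(pos, 7, "include") != 0) {
            continue;
        }
        // Only the quoted form names another component's header.
        pos = line.find_first_not_of(" \t", pos + 7);
        if (pos == std::string::npos || line[pos] != '"') {
            continue;
        }
        const std::string::size_type close = line.find('"', pos + 1);
        if (close == std::string::npos) {
            continue;
        }
        headers.push_back(line.substr(pos + 1, close - pos - 1));
    }
    return headers;
}
```

Running such a function over every .h and .c file of a component, and mapping each header name back to its component, yields the direct edges of the include graph.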
The include graph generated by C++ preprocessor #include directives
should alone be sufficient to infer all physical dependencies
within a system provided the system compiles.
Section 3.5 Extracting Actual Dependencies 135
To see why this claim is true, consider the following line of reasoning. If component x
makes direct substantive use of component y, then in order to compile x, the compiler
will have to see the definition supplied in y.h. The only way this can happen is for
component x to directly or indirectly include y.h. As a result of the design rules in
Section 3.2, any such direct substantive use is synonymous with a compile-time
dependency.
The contrapositive (that if x does not include y.h, then x does not have a compile-time
dependency on y) is certainly true, provided x compiles.
Going the other way, the only reason component x would legitimately include the
header file of component y is if component x did in fact make direct substantive use of
component y. Otherwise the inclusion itself would be superfluous and introduce
unwanted compile-time coupling.
Guideline
The very fact that one component includes the header of another forces a compile-time
dependency whether or not one previously existed. If we assume that all
#include directives in a component are necessary, then it is likely that the compile-time
dependency will be accompanied by a link-time dependency (which we already
know is transitive). In other words, "substantive use" should equate to "header file
inclusion," and that substantive use almost always implies a kind of physical dependency
that is transitive.
The #include graph for a set of components is just another relation that happens to
reflect the dependency among components quite accurately. If we interpret "x
Includes y.h (either directly or indirectly)" as "x DependsOn y directly," then the
relation resulting from the #include graph accurately reflects compile-time physical
component dependencies.
The design rule stating that all substantive use of a component must be flagged by
including its header file (rather than via local extern declarations) guarantees that the
transitive closure of the Includes relation indicates all actual physical dependencies
among components.
These extracted dependencies occasionally err on the side of being too conservative.
The dependency graph extracted in this manner may indicate additional, fictitious
dependencies brought on by unnecessary #include directives (which should be
removed). But, provided that all the major design rules are followed, the graph will
never omit an actual component dependency.
The ability to extract actual physical dependencies from a potentially large collection
of components quickly and accurately allows us to verify throughout the development
process that these dependencies are consistent with our overall architectural plan. A
physical dependency extractor/analyzer tool is described in Appendix C.
3.6 Friendship
We now digress to discuss the subtle issues regarding friendship and how granting
friendship affects the logical interface of a class and of a component. The interaction
between friendship and physical design is surprisingly strong. Although ostensibly a
logical design issue, friendship will influence the way in which we collect logical con-
structs into components. The desire to avoid friendship across component boundaries
can even induce us to restructure our logical design. We refer to the material presented
in this section frequently throughout this book.
Guideline
Avoid granting (long-distance) friendship to a logical entity defined
in another component.
According to the Annotated C++ Reference Manual, "A friend is as much a part of
the interface of a class as a member is."⁸ In making this claim, there is an implicit
assumption that the friend is inseparably tied to an object granting it friendship.
From a purely logical point of view, if a class makes a declaration of friendship, then,
according to the definition of Encapsulation (Section 2.2), that declaration is not an
encapsulated detail of the class. Anyone who defines a function whose declaration
exactly matches that of a friend declaration within a class can gain programmatic
access to the private members of that class, provided no other function matching the
friend declaration is defined in the same program. In that very precise sense, the
friend declaration itself is part of the interface of the class; the actual function
definition is not.
By treating the component and not the class as the fundamental unit of design, we
gain an entirely different perspective. As long as friendship is granted locally (i.e., as
long as it is granted only to logical entities defined within the same component), the
friends are, in fact, inseparably tied to the object granting friendship.
If this operator were suddenly declared a friend of class Stack, allegedly placing the
operator itself in the (public) interface of Stack, then it should be possible to detect
this change programmatically, right? But, provided that the operator is defined
within the same component, granting the operator friend status has absolutely no
effect on the logical interface of that component. In fact, from any client's point of
view, whether operator== is or is not a friend of class Stack is an encapsulated
implementation detail of this component!
To illustrate this point further, consider briefly a String class that defines (among
other things) member operator+= to implement concatenation to itself.

class String {
    // ...
  public:
    // ...
    String(const String& string);           // copy constructor
    // ...
    String& operator+=(const String& rhs);  // concatenate to me
};
We can now choose to implement nondestructive concatenation (+) in the same com-
ponent, without making the operator a friend:
class String {
    // ...
    friend String operator+(const String&, const String&);
  public:
    // ...
    String(const String& string);           // copy constructor
    // ...
    String& operator+=(const String& rhs);  // concatenate to me
};
Declaring operator+ to be a friend of String allows for a more efficient implementation
and potentially increases the cost of maintenance, but does not affect the logical
interface of the component.
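For comparison, a non-friend operator+ can be written entirely in terms of the public interface. The sketch below assumes a std::string representation, a const char * constructor, and a text accessor, none of which come from the String class discussed here; the helper joined is likewise hypothetical.

```cpp
#include <string>

// ---- str.h (sketch; std::string stands in for the real representation) ----
class String {
    std::string d_rep;
  public:
    explicit String(const char *text) : d_rep(text) {}
    String& operator+=(const String& rhs) { d_rep += rhs.d_rep; return *this; }
    const char *text() const { return d_rep.c_str(); }
};

// Free operator in the same component, written without friendship:
// copy the left operand, then reuse the public member operator+=.
String operator+(const String& lhs, const String& rhs)
{
    String result(lhs);
    result += rhs;
    return result;
}

// Hypothetical client code: it cannot tell whether operator+ is a friend.
std::string joined()
{
    return (String("wild") + String("thing")).text();
}
```

Switching operator+ to a friend that touches d_rep directly would change neither its declaration as seen by clients nor the result of joined.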
Granting local friendship does not threaten to expose the private details of an object to
unauthorized users. Because classes that are declared friends are defined (locally)
within the header file of the same component, anyone who tries to use the object
granting friendship will have the valid definitions of all friend classes thrust upon
them. Any attempt to redefine these friend classes will be prevented by the compiler,
which will promptly issue an error.
9 The C++ language makes no distinction based on the location of the friend declaration within a
class. However, placing the declaration in a private area of the class reflects the component's
semantics with respect to local friendship.
Granting private access to another physical piece of the system leaves a hole in the
encapsulation that could be abused by plugging in a counterfeit component to obtain
access. For example, suppose the StackIter class from Section 3.1 were declared in
component stackiter, separate from class Stack. Then there would be nothing to
stop a user of the stack component from substituting his or her own component
defining a customized StackIter, and thereby obtaining private access to the Stack
class. Once this happens, the class granting the long-distance friendship has no
protection against access to its private members; its encapsulation has been violated.
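The loophole is easy to demonstrate. In the sketch below, which is a single translation unit with comments marking the component boundaries, the d_secret member and the peek and stolenValue functions are hypothetical; the point is only that the header granting friendship has no way to tell the intended StackIter from a substitute.

```cpp
// ---- stack.h (sketch): grants long-distance friendship to a class defined elsewhere ----
class StackIter;                  // expected to come from a separate stackiter component

class Stack {
    int d_secret;                 // private detail (hypothetical)
    friend class StackIter;       // long-distance friendship
  public:
    Stack() : d_secret(42) {}
};

// ---- counterfeit stackiter component supplied by a client ----
// Nothing in stack.h distinguishes this class from the intended iterator.
class StackIter {
  public:
    static int peek(const Stack& stack) { return stack.d_secret; }
};

// The client now reads Stack's private data through its own "friend".
int stolenValue() { return StackIter::peek(Stack()); }
```

Had StackIter been defined in the same header as Stack, a second definition supplied by the client would have been rejected by the compiler.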
Section 3.6.1 Long-Distance Friendship and Implied Dependency 141
A class is an indivisible logical unit. A free function is a distinct logical unit. Whether
the free function is or is not a friend of a class never affects any implied physical
dependency in the system.
// barop.h
class Bar;
int operator==(const Bar&, const Bar&);
defined in its own component, barop. Figure 3-19 shows this operator along with
class Stack and class StackIter (now shown in separate components as well). This
free operator is neither a member nor a friend of Stack, and therefore it clearly does
not extend the interface of class Stack. But what exactly changes when we declare
this operator a friend of Stack?
[Figure 3-19: components barop, bar, stackiter, and stack]
Would operator==(const Bar&, const Bar&) now be considered part of the interface
of class Stack? If so, then Stack uses Bar in its interface, and there is an erroneous
implied dependency of component stack on component bar as shown in Figure 3-20.
[Figure 3-20: components bar, barop, stackiter, and stack]
The physical dependencies of Stack do not suddenly assume those of barop just by
granting the operator friendship. Using a type implies a dependency on all of its
members but not necessarily on any of its friends. In particular, operator==(const Bar&,
const Bar&) is a friend of Stack. StackIter Uses Stack, but this in no way implies that
StackIter Uses operator==(const Bar&, const Bar&) either directly or indirectly.
Notice the direction of the arrow used to indicate the IsFriendOf relation in Figure 3-20.
The arrow indicates that operator==(const Bar&, const Bar&) is now permitted to
depend on Stack in a more intimate way than before, but it does not guarantee any
actual dependency. There is no physical dependency whatsoever in the opposite
direction, as would be implied by treating operator==(const Bar&, const Bar&) as if
it were part of Stack's logical interface. To summarize, only access privilege and
not physical dependency is altered by granting friendship.
The importance of this principle is illustrated by the following pair of free operators
used to compare objects of type Stack and type Foo (symmetrically):
We do not need to look inside the header file for either Stack or Foo to know that
these operators are not members and therefore are not part of the logical interface of
either class. Since these operators are not part of either class, we could define them in
an entirely separate component that could then be included by clients only when
needed. Regardless of the access privilege, the Uses-In-The-Interface relation points
in one direction, from operator to class, as shown in Figure 3-21.
[Figure 3-21: component foostackop uses components foo and stack in its interface]
Now consider the highly questionable decision to add instead the following two
operator== member functions:
As members, these operator functions are clearly part of the interfaces of their respec-
tive classes. Each member operator uses the other class in its interface. The presence
of these operators introduces an undesirable, cyclic Uses-In-The-Interface relation-
ship between Foo and Stack as shown in Figure 3-22. No such cyclic dependency was
induced when the operators were free and defined in a separate component. Adding
free (operator) functions never affects the logical interface of any class regardless of
access because free operators, unlike members, are not an intrinsic part of any class.
(Note that making operator== a member is a poor decision in terms of purely logical
design considerations, as discussed in Section 9.1.2.)
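
As a minimal sketch of the free-operator alternative, the following shows a pair of symmetric free operators written entirely in terms of public accessors. The accessors size() and value(), the meaning given to equality, and the idea of placing the operators in a component such as foostackop are assumptions made purely for illustration; the bool return type reflects current C++ rather than the int used elsewhere in this text.

```cpp
#include <cassert>

// Hypothetical stand-in for class Stack; only the public accessor matters here.
class Stack {
    int d_size;
  public:
    explicit Stack(int size) : d_size(size) { assert(size >= 0); }
    int size() const { return d_size; }
};

// Hypothetical stand-in for class Foo.
class Foo {
    int d_value;
  public:
    explicit Foo(int value) : d_value(value) {}
    int value() const { return d_value; }
};

// Free operators, as they might appear in a separate component.
// Neither is a member or a friend, so neither class's interface
// mentions the other class.
bool operator==(const Stack& lhs, const Foo& rhs)
{
    return lhs.size() == rhs.value();   // illustrative notion of equality
}

bool operator==(const Foo& lhs, const Stack& rhs)
{
    return lhs.value() == rhs.size();
}
```

Because both operators depend only on the public interfaces of Stack and Foo, neither class needs to know that the other exists, and clients that never compare the two types never need to include the operators.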
An unscrupulous developer can gain access to private details simply by defining the
friend class locally (at file scope). The developer can then exploit these details via
inline functions, which do not have external linkage and hence will not collide with
Section 3.6.2 Friendship and Fraud 145
the legitimate function definitions, even if they are linked into the program. For the
same reason, declaring an individual non-inline free (operator) function a friend,
even locally, is not immune to fraud via inline replacement. People actually do this
in production code. You have been warned!
Figure 3-23 illustrates the highly questionable practice of taking deliberate advantage
of the hole in encapsulation left by employing long-distance friendships. Class Jail
defines a private member release() and befriends a class named JailKey, defined
outside the jail component. The authorized JailKey is defined within component
jailkey, which is linked into the program. A malevolent visitor component
declares a local version of class JailKey hidden entirely within the visitor.c file.
Since this illicit version of JailKey has no members with external linkage, it is able to
coexist silently in a program and still take advantage of the friendship afforded by
Jail. The constructor for the Visitor object named "bugsy" defined in main() cre-
ates an instance of its own JailKey, which on construction calls the private
release() method of Jail. Escape is inevitable.
Sadly, there are even easier and more heinous ways to violate encapsulation:
// felon.c
#define private public    // capital offense
#include "jail.h"
// ...

// jail.h
#if !defined(INCLUDED_JAIL) && !defined(protected) && !defined(private)
#define INCLUDED_JAIL
#endif
// main.c
#include "jail.h"
#include "jailkey.h"
#include "visitor.h"

main()
{                               // Output:
    Jail jail;                  //     john@john: a.out
    JailKey key(jail);          //     Escape!
    Visitor bugsy(jail);        //     john@john:
}

// visitor.h
#ifndef INCLUDED_VISITOR
#define INCLUDED_VISITOR
class Jail;
class Visitor {
    // ...
  public:
    Visitor(const Jail& jail);
    // ...
};
#endif

// visitor.c
#include "visitor.h"
#include "jail.h"               // needed to call a member of Jail
struct JailKey {                // local class
    JailKey(const Jail& jail)
    {
        jail.release();
    }                           // no external linkage
};

Visitor::Visitor(const Jail& jail)
{
    JailKey key(jail);
}

// jail.h
#ifndef INCLUDED_JAIL
#define INCLUDED_JAIL
class JailKey;                  // not defined locally
class Jail {
    friend JailKey;             // long distance
    void release() const;
    // ...
};
#endif

// jail.c
#include "jail.h"
#include <iostream.h>
void Jail::release() const
{
    cout << "Escape!" << endl;
}
3.7 Summary
Friendship directly affects access privilege but not implied dependency. Indirectly,
however, our desire to avoid long-distance friendships will force us to package inti-
mately related logical entities within a single component, thereby coupling them
physically. Ignoring these physical considerations invites clients to exploit the breach
of encapsulation caused by all long-distance friendships, and even by local friend-
ships, to individual, non-inline free (operator) functions.
Physical Hierarchy
In this chapter we explore how to exploit physical hierarchy to facilitate the effective
testing of "good" interfaces. We introduce the notion of level numbers to help charac-
terize components in terms of their physical dependencies. Using a complex example,
we demonstrate the value of testing in isolation as well as testing hierarchically and
incrementally. Finally, we derive an objective metric for quantifying the degree of
physical coupling within an arbitrary subsystem. This metric will help us to evaluate
the impact of various design alternatives by making the notion of physical design
quality more objective and concrete.
When a customer test-drives a car, he or she is looking to see how well the car
performs as a unit: how well the car handles, corners, brakes, and so on. The cus-
tomer is also interested in subjective usability: how "nifty" the car looks, how com-
fortable the seats are, how plush the interior is, and, in general, how satisfying the car
would be to own. Typical customers do not test the air-bags, ball-joints, or engine
mounts to see whether they will perform as expected in all circumstances. When
150 Physical Hierarchy Chapter 4
buying a new car from a reputable manufacturer, the customer simply takes for
granted this important low-level reliability.
For the car to function properly, it is important that each of the objects on which the car
depends works properly as well. Customers do not test each part of the car individu-
ally, but somebody does. It is not the responsibility of the customer to "QA" the car.
The customer is paying for a quality product, and part of that quality is the satisfaction
of knowing that the car works properly.
In the real world, each part of a car has been designed with a well-defined interface
and has been tested in isolation under extreme conditions to ensure that it meets its
specified tolerances long before it is ever integrated into a car. In order to maintain a
car, mechanics must be able to gain access to its various parts from time to time in
order to diagnose and fix problems.
Complex software systems are like cars. All of the low-level parts are objects with
well-defined interfaces. Each part or component can be stress tested in isolation.
These parts can then be integrated, via layering, into a sequence of increasingly com-
plex subsystems, each subsystem with a test suite to ensure that the incremental inte-
gration has occurred properly. This layered architecture enables test engineers to
access the functionality implemented in the lower levels of abstraction without expos-
ing clients of the product to these lower-level interfaces. The final product is also
tested to ensure that it meets customer expectations.
To summarize: a well-designed car is built from layered parts that have been tested
thoroughly by the manufacturer:
1. in isolation,
2. within a sequence of partially integrated subsystems, and
3. as a fully integrated product.
Once assembled, these parts are easily accessible by mechanics to facilitate proper
testing and maintenance. In software, the concepts remain essentially the same.
Section 4.2 A Complex Subsystem 151
Figure 4-1 illustrates an instance of the point-to-point routing problem.1 The enclos-
ing region contains three holes that a successful path may touch but not overlap. The
starting point is indicated by s and the ending point is indicated by e. One of the many
possible shortest rectilinear paths of specified width is defined by the center line,
shown in the figure connecting s and e.
1 We present this authentic example in all its detail. It is not necessary, however, to understand every
aspect of this example in order to benefit from the discussions that follow. A cursory reading will be
sufficient.
[Figure 4-1: an instance of the point-to-point routing problem, showing the enclosing region, the start and end points, the path width, and a shortest rectilinear path]
The logical interface for a component solving this complex problem can be decep-
tively simple. The header file p2p_router.h describing the client's interface for the
point-to-point router subsystem is shown in its entirety in Figure 4-2. The (registered)
class prefix p2p_ identifies this component as belonging to the p2p package as well as
eliminating the possibility of identifier name collisions among classes belonging to
distinct packages (see Section 7.2).
// p2p_router.h
#ifndef INCLUDED_P2P_ROUTER
#define INCLUDED_P2P_ROUTER

class geom_Point;
class geom_Polygon;
class p2p_RouterImp;

class p2p_Router {
    p2p_RouterImp *d_data_p;

    // NOT IMPLEMENTED
    p2p_Router(const p2p_Router&);
    p2p_Router& operator=(const p2p_Router&);

  public:
    // CREATORS
    p2p_Router(const geom_Polygon& enclosingRegion);
        // Create router for specified enclosing region.
        // The region must be a simple, closed polygon.
    ~p2p_Router();

    // MANIPULATORS
    int addObstruction(const geom_Polygon& hole);
        // Add obstruction; obstruction must be a simple, closed polygon.
        // If obstruction overlaps another obstruction or the perimeter
        // of the enclosing shape, return non-zero with no effect and 0
        // otherwise. Note: Regions are allowed to touch but not overlap.

    // ACCESSORS
    int findPath(geom_Polygon *returnValue, const geom_Point& start,
                 const geom_Point& end, int width) const;
        // Determine whether a rectilinear path of specified width exists
        // in the current obstructed region between specified start and
        // end points. Return 1 if such a path exists and 0 otherwise.
        // If a path exists and returnValue is not 0, store the center
        // line of any shortest path in (*returnValue).
};

#endif
There are two user-defined types used in the logical interface of the point-to-point
router subsystem. These types (geom_Polygon and geom_Point) are part of a public
package (geom) of geometric types used widely throughout the entire system. For ref-
erence purposes, the respective interfaces of geom_Point and geom_Polygon are
sketched in Figure 4-3.
class geom_Point {
    // ...
  public:
    geom_Point(int x, int y);
    geom_Point(const geom_Point& point);
    ~geom_Point() {};
    geom_Point& operator=(const geom_Point& point);
    void setX(int x);
    void setY(int y);
    int x() const;
    int y() const;
};

class geom_Polygon {
    // ...
  public:
    geom_Polygon();
    geom_Polygon(const geom_Polygon& pgn);
    ~geom_Polygon() {};
    geom_Polygon& operator=(const geom_Polygon& pgn);
    void appendVertex(const geom_Point& point);
    // ...
    int numVertices() const;
    const geom_Point& vertex(int vertexIndex) const;
    // ...
};
An actual implementation of this subsystem involves some 5,000 lines of C++ source
code (not including comments), yet using the point-to-point router component is very
easy. A straightforward driver that runs the example of Figure 4-1 is given for
completeness in Figure 4-4. Note that the advantage of this long, linear style is its
simplicity. It is typical of drivers actually used during development and testing.
// p2p_router.t.c
#include "p2p_router.h"
#include "geom_polygon.h"
#include "geom_point.h"
#include <iostream.h>

main()
{
    geom_Polygon enclosingRegion;
    enclosingRegion.appendVertex(geom_Point(0, 1000));
    enclosingRegion.appendVertex(geom_Point(0, 600));
    enclosingRegion.appendVertex(geom_Point(700, -100));
    enclosingRegion.appendVertex(geom_Point(2100, -100));
    enclosingRegion.appendVertex(geom_Point(2100, 100));
    enclosingRegion.appendVertex(geom_Point(3000, 100));
    enclosingRegion.appendVertex(geom_Point(3000, -200));
    enclosingRegion.appendVertex(geom_Point(3200, -400));
    enclosingRegion.appendVertex(geom_Point(4500, -400));
    enclosingRegion.appendVertex(geom_Point(5000, 100));
    enclosingRegion.appendVertex(geom_Point(5000, 1000));
    enclosingRegion.appendVertex(geom_Point(0, 1000));

    geom_Polygon hole1;
    hole1.appendVertex(geom_Point(800, 900));
    hole1.appendVertex(geom_Point(800, 700));
    hole1.appendVertex(geom_Point(1400, 700));
    hole1.appendVertex(geom_Point(1400, 900));
    hole1.appendVertex(geom_Point(800, 900));

    geom_Polygon hole2;
    hole2.appendVertex(geom_Point(600, 300));
    hole2.appendVertex(geom_Point(800, 100));
    hole2.appendVertex(geom_Point(1600, 100));
    hole2.appendVertex(geom_Point(1400, 300));
    hole2.appendVertex(geom_Point(600, 300));

    geom_Polygon hole3;
    hole3.appendVertex(geom_Point(2600, 900));
    hole3.appendVertex(geom_Point(2900, 600));
    hole3.appendVertex(geom_Point(3800, 600));
    hole3.appendVertex(geom_Point(3800, 300));
    hole3.appendVertex(geom_Point(4200, 300));
    hole3.appendVertex(geom_Point(4200, 600));
    hole3.appendVertex(geom_Point(4500, 900));
    hole3.appendVertex(geom_Point(2600, 900));

    p2p_Router router(enclosingRegion);
    router.addObstruction(hole1);
    router.addObstruction(hole2);
    router.addObstruction(hole3);

    geom_Polygon centerline;
    geom_Point start(400, 800), end(4600, 500);
    int width = 400;
    // ...
}

// Output:
//     john@john a.out
//     { (400, 800) (400, 500) (3400, 500) (3400, 200) (4600, 200) (4600, 500) }
//     john@john

Section 4.3 The Difficulty in Testing "Good" Interfaces 155
For example, the p2p_router component (Figure 4-2) contains only four public
functions:
1. a constructor,
2. a destructor,
3. addObstruction, and
4. findPath.
The output at the end of Figure 4-4 tells us that this component produced an answer.
Now stop for a moment and imagine that you are a quality assurance test engineer
assigned to this project. How would you go about thoroughly testing such an interface?
First consider that in general there will be many equally good solutions for an instance
of this problem. Verifying that a solution is a rectilinear path of a given width that
connects two points in a region with obstructions is not trivial, but it can be done with-
out extraordinary effort. Verifying that a solution to this problem is optimal is, in gen-
eral, as difficult as finding the solution in the first place.
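
To make the first half of that claim concrete, here is a minimal sketch of the easy part of verification: checking that a reported center line starts and ends at the requested points and consists only of horizontal and vertical segments. The Point struct is a hypothetical stand-in for geom_Point, and the function names are invented for this sketch. Width and obstruction checks are deliberately omitted, and nothing here addresses optimality.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct Point { int x; int y; };   // hypothetical stand-in for geom_Point

// Return true if 'path' begins at 'start', finishes at 'end', and every
// segment is either horizontal or vertical.  Width and obstruction
// checks are omitted from this sketch.
bool isRectilinearPathBetween(const std::vector<Point>& path,
                              const Point& start, const Point& end)
{
    if (path.empty()) {
        return false;
    }
    if (path.front().x != start.x || path.front().y != start.y) {
        return false;
    }
    if (path.back().x != end.x || path.back().y != end.y) {
        return false;
    }
    for (std::size_t i = 1; i < path.size(); ++i) {
        bool sameX = path[i].x == path[i - 1].x;
        bool sameY = path[i].y == path[i - 1].y;
        if (!sameX && !sameY) {
            return false;              // diagonal segment
        }
    }
    return true;
}

// Return the total rectilinear length of the center line.
long pathLength(const std::vector<Point>& path)
{
    long total = 0;
    for (std::size_t i = 1; i < path.size(); ++i) {
        total += std::labs(static_cast<long>(path[i].x) - path[i - 1].x)
               + std::labs(static_cast<long>(path[i].y) - path[i - 1].y);
    }
    return total;
}
```

Applied to the center line printed at the end of Figure 4-4, such a check confirms validity and reports a length, but it cannot say whether some other valid path is shorter.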
You could verify the output by trying several test cases and inspecting them by hand.
Although time consuming, manual inspection can be effective during development.
Consider what happens when the development phase has ended and the subsystem
moves into the maintenance/tuning phase. It would be impractical to think that you or
the developers would be willing or even able to manually review the output of every
subsystem on every release.
One approach commonly used to help automate regression testing involves running a
large number of test cases through the system at the top level and capturing the results.
These results are then inspected once by hand to verify their accuracy. Before each
release, new results are obtained and compared with the original results. Presumably,
if the new output matches the old output exactly, the subsystem is correct.
A significant drawback with regression tests for many complex problems, including this
one, is that there may be multiple correct solutions. Although each of the components of
the point-to-point router subsystem may have completely predictable behavior, there is
room in the specification for the developer to alter p2p_Router's implementation in
ways that produce a different (but equally good) final result for a given input.
On a much smaller scale, consider the specification for a simple iterator on some
collection. Typically there is no constraint on the order in which the elements must be
Section 4.4 Design for Testability 157
presented. The requirement is that each element in the collection be presented exactly
once. Verifying that an iterator is behaving properly in isolation is not difficult. But
when the iterator is embedded in the implementation of a complex subsystem (such as
that headed by the p2p_router component), the ability to test that iterator effectively
may be lost.
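
The "exactly once, in any order" requirement is easy to check when the iterator can be driven directly. The following is a minimal sketch under the assumption that a test driver has already collected the values an iterator presented into a vector; the function name and the use of int elements are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Return true if 'visited' presents each element of 'expected' exactly
// as many times as it occurs in 'expected', in any order.
bool presentsEachElementOnce(const std::vector<int>& expected,
                             const std::vector<int>& visited)
{
    if (expected.size() != visited.size()) {
        return false;
    }
    std::map<int, int> balance;        // element -> (expected - visited)
    for (int e : expected) {
        ++balance[e];
    }
    for (int v : visited) {
        --balance[v];
    }
    for (const auto& entry : balance) {
        if (entry.second != 0) {
            return false;              // element missed or repeated
        }
    }
    return true;
}
```

A test written this way stays valid when the iteration order changes, which an exact comparison against previously captured output would not.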
Even worse than the pseudo-random behavior2 of heuristic-based systems is the com-
pletely unpredictable behavior associated with systems that employ asynchronous
communication. Such systems produce results that are generally not repeatable. In
these cases, high-level regression testing could be virtually useless.
Minimizing the "surface area" in our designs (i.e., providing sufficient but minimal
interfaces) is a cornerstone of good software engineering. Yet there is a cruel irony in
knowing that the very interfaces we strive so hard to achieve can present a formidable
barrier to conventional testing techniques. Fortunately there are techniques that we can
use to overcome these testing problems. The proverb about an ounce of prevention
being worth a pound of cure especially pertains here.
A major component of designing in quality is design for testability (DFT). The impor-
tance of DFT is well recognized in the integrated circuit (IC) industry. In many cases
it is impractical to test IC chips, some with over a million transistors, from the outside
pins alone.
2 For more on pseudo-random functionality, see rand() in Plauger, Chapter 13, p. 337.
When an IC chip is fabricated, it acts as a "black box" and can be tested only from the
external inputs and outputs (pins). Figure 4-5a illustrates the process of trying to test a
hardware subsystem w using only the interface provided to regular clients of the chip
itself. In order to test w, it is necessary not only to figure out what would make a good
test suite for w, but also how to propagate that test suite through the chip to reach the
inputs of w. As if that weren't bad enough, each result that w produces must then be
propagated from the output of w to some output of the chip itself in order to observe
and verify that w has behaved correctly. Ensuring propagation of this information
requires detailed knowledge about the entire chip, knowledge that has nothing to do
with the correct functionality of w.
[Figure 4-5: (a) Testing a Component from the System Level; (b) Testing a Component Directly]
One form of DFT for IC chips called SCAN is accomplished with extra pins and
additional internal circuitry provided solely for testing purposes. Using these special
features, test engineers are able to isolate the various subsystems within the chip. In
so doing they are able to gain direct access to the inputs and outputs of internal sub-
systems and to exercise their functionality directly. In other words, this DFT
approach attempts to grant the tester direct access to a subsystem, thereby eliminating
the cost of propagating signals through the entire chip. In this way, the full functionality
of the subsystem can be explored efficiently as illustrated in Figure 4-5b, without
regard to the details of how the subsystem is used in the larger system.
When first employed, DFT was great for improving quality; however, IC designers
did not appreciate having this additional design requirement. Not only was this an
extra consideration, but it made their designs bigger and therefore much more expen-
sive to produce. Many designers were frustrated, considering this disciplined
approach to be an infringement on their creativity.
Both disciplines require that the functionality in these types be tested thoroughly to
ensure correct behavior when instantiated. But, unlike IC design where each individ-
ual instance of a type must be tested for physical defects, software objects are immune
to such defects. If a class is implemented correctly, then, by definition, all instances of
that type are correctly implemented as well.
Distributing system testing throughout the design hierarchy can be
much more effective per testing dollar than testing at only the highest-
level interface.
From the point of view of testing, each software type is like a real-world instance.
Testing the functionality of a String class is easiest and most effective if done
directly, rather than by attempting to test it as part of a larger system. And, unlike IC
testing, we automatically have direct access to the interface of the software sub-
system: the String class.
Put another way, if we have only X dollars to spend on testing, we can achieve more
thorough coverage if we distribute the testing effort throughout the system, thus test-
ing individual component interfaces directly, than we can by testing from the end
user's interface alone.
Consider again the p2p_router component of Figure 4-2. Even assuming entirely
predictable behavior, it would be ineffective to attempt to test this component entirely
from the highest level, especially given its tiny interface. In analogy to IC testing (see
Figure 4-6), this would be like trying to test a one-million-transistor microprocessor
chip with only two pins!3
Software testing is inherently easier than hardware testing because instances of a class
created within a system are no different from instances of the same class created inde-
pendently, outside that system. If a complex software subsystem were truly analogous
to an IC chip, the implementation would reside entirely within a single physical com-
ponent. If the functionality declared in p2p_router.h were implemented entirely
within p2p_router.c, we would probably be forced to violate encapsulation by pro-
viding extra functionality in the public interface, just to enable effective testing.
3 Other kinds of IC testing strategies such as Built-In Self Test (BIST) place additional circuitry on
the chip that can be enabled to verify that the chip is working properly without having to propagate
specific information to the interface. BIST is somewhat analogous to the use of assert statements in
software. Adding public functionality, such as testMe(), would be a more accurate analogy, but the
physical hierarchy in our software architecture allows us to achieve the same result without adding
any test-specific functionality to the interface of a component.
Section 4.5 Testing in Isolation 161
[Figure 4-6: Fictitious Highly Test-Resistant IC Chip with Only Two Pins (in and out)]
Fortunately, the implementation of the point-to-point router does not live in a single
component. Instead, this implementation is deliberately distributed throughout a
physical hierarchy of components. Even though the client of a p2p_Router object has
no programmatic access to the layered objects that make up the router, it is still possi-
ble for test engineers to identify subcomponents with predictable behaviors that can
be tested and verified much more efficiently in isolation.
Consider the physical architecture for the p2p_router shown in Figure 4-7. By
designing the p2p_router so that each of its subsystems can be developed and tested
individually, we can ensure that each of their upgraded functionalities is in place even
though they cannot be verified through the completed router's interface until some
future date. If programming errors occur, they can be detected and fixed in parallel.
Integration is where most specification errors are detected. When the integrated sys-
tem fails to perform as anticipated, the development team must scramble to diagnose
the problems. Inevitably, they will find many coding bugs not intrinsically related to
the integration itself. Independent testing could have at least allowed these coding
errors to have been diagnosed and fixed much earlier in the development process.
At the lowest levels of a complex system, components are often heavily optimized,
increasing the likelihood of subtle errors and the need for detailed regression tests. For
example, carefully designed, object-specific memory management can often double
runtime performance. However, custom memory managers are quite error prone, and
these errors are among the most difficult to detect and repair. Instrumenting global
operators new and delete in an isolated component test driver can ensure that the
memory-management scheme is functioning properly under a wide variety of condi-
tions, including those encountered only infrequently in practice.
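
One way such instrumentation might look in a test driver is sketched below. It replaces the global allocation functions with counting versions and checks that a component returns every block it acquires. The counters, the helper blocksInUse(), and the IntStack component are all hypothetical names introduced for this sketch, which uses standard C++ headers rather than the pre-standard ones seen elsewhere in this text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters maintained by the replaced global operators.
static long g_numNew = 0;
static long g_numDelete = 0;

void *operator new(std::size_t size)
{
    void *p = std::malloc(size ? size : 1);
    if (!p) {
        throw std::bad_alloc();
    }
    ++g_numNew;
    return p;
}

void operator delete(void *p) noexcept
{
    if (p) {
        ++g_numDelete;
        std::free(p);
    }
}

void operator delete(void *p, std::size_t) noexcept
{
    operator delete(p);                // sized form forwards to the above
}

// Number of dynamically allocated blocks currently outstanding.
long blocksInUse() { return g_numNew - g_numDelete; }

// Hypothetical component under test: a tiny linked stack of ints.
class IntStack {
    struct Node { int d_value; Node *d_next_p; };
    Node *d_top_p;
    IntStack(const IntStack&);             // NOT IMPLEMENTED
    IntStack& operator=(const IntStack&);  // NOT IMPLEMENTED
  public:
    IntStack() : d_top_p(0) {}
    ~IntStack() { while (d_top_p) pop(); }
    void push(int value)
    {
        Node *n = new Node;
        n->d_value = value;
        n->d_next_p = d_top_p;
        d_top_p = n;
    }
    void pop()
    {
        assert(d_top_p);
        Node *n = d_top_p;
        d_top_p = n->d_next_p;
        delete n;
    }
};

// Return true if exercising the stack leaves no blocks outstanding.
bool stackLeavesNoBlocksBehind()
{
    long before = blocksInUse();
    {
        IntStack s;
        for (int i = 0; i < 100; ++i) {
            s.push(i);
        }
        s.pop();
    }                                  // destructor releases the rest
    return blocksInUse() == before;
}
```

The check compares the balance before and after rather than absolute counts, since the runtime library or the compiler may allocate (or elide allocations) on its own.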
Not all programs use all functionality in reusable components. For example, if a pro-
gram does not call the pop() member of a Stack class, there is no way that pop() can
be tested just by testing that program. Even if a particular program calls every func-
tion, there may be states in which objects are supposed to behave properly, but which
the surrounding software does not allow them to attain.
In a large project, the author of the String class is probably not the same individual
as the one whose valid enhancement exposes the problem. Detecting and then repair-
ing such bugs, not to mention the frustration that ensues, is far more expensive than
simply avoiding them in the first place through early, component-level testing in
isolation.
It would be redundant and unnecessarily costly for every system that uses a library
facility such as iostream to have tests to verify that the needed iostream functional-
ity is working properly. People have come to assume that iostream does work as
intended. For large systems, there will probably be many application libraries devel-
oped in house. No single executable will make use of all of this functionality, yet all
of it should be tested thoroughly in isolation.
We can avoid the redundancy by grouping the testing effort with the components them-
selves. In so doing, one extends the notion of object-oriented design to include, as a
single unit, not only the component but also the supporting tests and documentation.
Furthermore, well-written component-level tests can facilitate reuse by providing pro-
spective users with a suite of small but comprehensive examples. The functionality
supplied by each component can now be tested thoroughly in a single place; clients
who depend on these components may reasonably assume they are reliable.
Isolation testing is ideal for identifying low-level problems that result from enhance-
ments and is especially useful for porting a system to new platforms. These low-level
tests ensure the preservation of basic functionality and make it easy to track down any
discrepancies. Occasionally defects escape local detection and are caught by tests at
higher levels. The low-level component test should be updated to expose the errant
behavior before the defect is repaired. Doing this will both facilitate the repair and
preserve modularity by making the testing of this component independent of any
particular client.
There is a point of diminishing returns to testing in isolation. For example, placing the
definition of a Link class for a simple List object in a distinct component so that it
may be tested in isolation is absurd for two reasons:
Determining this point for component-level isolation testing should be done objec-
tively, based on a cost/benefit analysis, not solely by how much a given developer
loathes (or enjoys) testing.
For a design to be tested effectively, it must be possible to decompose the design into
units of functionality whose complexity is manageable. A component is ideal for this
purpose. Consider the header files for three components c1, c2, and c3 depicted in
Figure 4-8. Note that we have declared class C1 in component headers c2.h and c3.h
without providing its definition because it is not necessary to define a class that is
returned by value in order to declare that function.
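
The point about declarations can be seen in a small sketch. The member names and bodies below are assumptions chosen for illustration and are not the contents of Figure 4-8; what matters is that class C2 is fully defined while C1 is still only declared.

```cpp
#include <cassert>

class C1;                      // declaration only; no definition needed yet

class C2 {
  public:
    C1 f() const;              // returns C1 by value: declaring this
                               // function does not require C1's definition
};

// The definition of C1 is needed only where f is defined or called.
class C1 {
    int d_value;
  public:
    explicit C1(int value) : d_value(value) {}
    int value() const { return d_value; }
};

C1 C2::f() const
{
    return C1(42);             // arbitrary value for illustration
}
```

In a physical design, the first part would live in a header such as c2.h, and only the corresponding implementation file (and clients that actually call f) would need to include the header that defines C1.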
Section 4.6 Acyclic Physical Dependencies 165
We can observe (Section 3.4) that there are no implied dependencies of c1 on any other
component. Class C2 uses class C1 in its interface. Therefore it is likely that component
c2 depends on component c1, but, we hope, not on c3. Class C3 uses both C2 and C1 in
its interface, and so c3 is likely to depend on both c2 and c1. The implied dependen-
cies in this system form a directed acyclic graph (DAG) as shown in Figure 4-9a.
Component dependency graphs that contain no cycles have very positive implications
for testability, but not all component dependency graphs are acyclic. To see why, con-
sider what would happen if we changed the return type of C1::f from a C1 to a C2 as
follows:
class C1 {
    // ...
  public:
    // C1 f();  // old
    C2 f();     // new
};
Now C1 uses C2 in its interface and (probably) depends on it. The implied component
dependency graph for this modified system now has a physical cycle, and is shown in
Figure 4-9b.
[Figure 4-9: implied component dependencies of c1, c2, and c3: (a) acyclic, with one redundant edge; (b) with a cycle]
Systems with acyclic physical dependencies (such as the one shown in Figure 4-9a)
are far easier to test effectively than those with cycles. Whenever the component
dependencies in a system are acyclic, there is (at least) one reasonable order to go
about testing the system. Since component c1 depends on nothing else, tests to verify
its functionality in isolation can be written first. Next we see that component c2
depends only on component c1. Because we were able to write effective tests for c1,
we may presume c1 to be functioning properly. We can now write tests for the func-
tional value added by c2. We need not retest the contribution of c1 since that function-
ality is already covered. Then we look at c3 which depends on both c1 and c2.
Because we presumably have already written tests to verify the functionality supplied
by both c1 and c2, we need address only the additional functionality implemented in c3.
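
The reasoning above amounts to visiting components in dependency order. As a rough sketch, assuming the dependencies are available as a map from each component name to the names it depends on (the DependsOn type and testOrder function are hypothetical), a test order can be derived as follows.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Maps each component to the components it depends on.  Every component
// must appear as a key, even if it depends on nothing.
typedef std::map<std::string, std::vector<std::string> > DependsOn;

// Return the components in an order in which each one appears after
// everything it depends on.  Return an empty vector if no such order
// exists (that is, if the dependencies contain a cycle).
std::vector<std::string> testOrder(const DependsOn& dependsOn)
{
    std::vector<std::string> order;
    std::map<std::string, bool> done;
    bool progress = true;
    while (order.size() < dependsOn.size() && progress) {
        progress = false;
        for (const auto& entry : dependsOn) {
            if (done[entry.first]) {
                continue;              // already scheduled
            }
            bool ready = true;
            for (const auto& dep : entry.second) {
                if (!done[dep]) {
                    ready = false;     // a prerequisite is still untested
                    break;
                }
            }
            if (ready) {
                done[entry.first] = true;
                order.push_back(entry.first);
                progress = true;
            }
        }
    }
    if (order.size() < dependsOn.size()) {
        order.clear();                 // a cycle blocked further progress
    }
    return order;
}
```

For the acyclic system of Figure 4-9a this yields c1, c2, c3. For the cyclic variant of Figure 4-9b no component is ever ready, so no order is produced.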
The notion of level numbers is borrowed from the field of digital, gate-level, zero-
delay circuit simulation.4 Here, a gate implements a low-level block of Boolean func-
tionality. Each gate has two or more connection points called terminals. A circuit
consists of an interconnected collection of gates. Like a gate, a circuit has both input
terminals and output terminals. Primary inputs are inputs to the circuit itself. These
inputs are connected to the inputs of some of the gates within the circuit by pair-wise
terminal connectors called wires. The outputs of these gates are connected by wires to
the inputs of still other gates, and so on. A simple circuit with four primary inputs
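(a, b, c, and d) is illustrated in Figure 4-10a.
[Figure 4-10: (a) a simple circuit with four primary inputs a, b, c, and d; (b) a pair of cross-coupled NOR-gates]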
Simulating a circuit involves setting its primary inputs with logical values and then
evaluating each of the (layered) gates in turn. But before any particular gate can be
evaluated, we must make sure that its inputs are valid by ensuring that all gates that
feed this particular gate have already been evaluated.
A circuit is a kind of graph. Here, gates and primary inputs are treated as vertices of a
graph, and wires are treated as (directed) edges. 5 The level number in this context
4 The zero-delay approximation is used primarily in a special kind of circuit simulator known as a
fault simulator. The discovery of this analogy between hardware and software arose, in part, from
the author's Ph.D. research at Columbia University with Professor Stephen H. Unger.
5 The gates themselves impose the edge direction, which reflects the dependency of the gate on its
input source (e.g., either a primary input or the output of another gate in the circuit).
168 Physical Hierarchy Chapter 4
indicates the longest path from a particular gate to a primary input. Primary inputs are
defined to have a level of 0. By evaluating these gates in order of increasing levels, we
can guarantee that every gate's inputs will be valid.
Primary input values are assumed, and do not require evaluation. During simulation,
level-1 gates are fed only by primary inputs. These gates are evaluated first, in
arbitrary order. Next to be evaluated are all level-2 gates. Since level-2 gates are fed only
by one or more level-1 gates (and possibly also by primary inputs), we are assured at
this point that all inputs for level-2 gates have been evaluated. Since a gate at level N
depends only on levels [0 ... N-l] for its inputs, evaluating gates in levelized order
guarantees a successful simulation.
In Figure 4-10a, a level-1 OR-gate feeds the only input of the NOT-gate, making it a
level-2 gate. The AND-gate is fed both by a level-1 OR-gate and a level-2 NOT-gate. The
longest path from the AND-gate to a primary input is 3 (through the NOT-gate to primary
input c or d). The AND-gate belongs to the highest level, 3, and is evaluated last.
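
The levelized evaluation just described can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the Gate structure, the GateKind enumeration, and the functions levelOf, computeLevels, simulate, and exampleCircuit are hypothetical names, and exampleCircuit merely follows the textual description of Figure 4-10a (two OR-gates, a NOT-gate, and an AND-gate). The sketch assumes a circuit without feedback.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical gate model for illustration only.
enum class GateKind { Input, Or, And, Not };

struct Gate {
    GateKind                 kind;
    std::vector<std::size_t> inputs;  // indices of the vertices feeding this gate
};

// Longest path from vertex 'g' to a primary input (primary inputs: level 0).
// Assumes the circuit has no feedback; a cycle would recurse without bound.
int levelOf(const std::vector<Gate>& circuit, std::size_t g, std::vector<int>& memo)
{
    if (memo[g] >= 0) return memo[g];
    int level = 0;
    for (std::size_t in : circuit[g].inputs)
        level = std::max(level, levelOf(circuit, in, memo) + 1);
    return memo[g] = level;
}

std::vector<int> computeLevels(const std::vector<Gate>& circuit)
{
    std::vector<int> memo(circuit.size(), -1);
    for (std::size_t g = 0; g < circuit.size(); ++g) levelOf(circuit, g, memo);
    return memo;
}

// 'value' supplies the primary-input values; gate outputs are overwritten.
std::vector<bool> simulate(const std::vector<Gate>& circuit, std::vector<bool> value)
{
    assert(value.size() == circuit.size());
    const std::vector<int> level = computeLevels(circuit);

    // Visit vertices in order of increasing level.
    std::vector<std::size_t> order(circuit.size());
    std::iota(order.begin(), order.end(), std::size_t(0));
    std::stable_sort(order.begin(), order.end(),
                     [&level](std::size_t x, std::size_t y) { return level[x] < level[y]; });

    for (std::size_t g : order) {
        const Gate& gate = circuit[g];
        if (gate.kind == GateKind::Input) continue;  // assumed, not evaluated
        assert(!gate.inputs.empty());
        bool result = (gate.kind == GateKind::And);  // identity for AND is true
        for (std::size_t in : gate.inputs) {
            const bool bit = value[in];
            if (gate.kind == GateKind::And) result = result && bit;
            else if (gate.kind == GateKind::Or) result = result || bit;
            else result = !bit;                      // NOT has a single input
        }
        value[g] = result;
    }
    return value;
}

// Circuit modeled on the description of Figure 4-10a; the gates are
// deliberately listed out of evaluation order.
std::vector<Gate> exampleCircuit()
{
    return {
        {GateKind::Input, {}},      // 0: a
        {GateKind::Input, {}},      // 1: b
        {GateKind::Input, {}},      // 2: c
        {GateKind::Input, {}},      // 3: d
        {GateKind::And,   {5, 7}},  // 4: AND of OR(a, b) and the NOT-gate
        {GateKind::Or,    {0, 1}},  // 5: OR(a, b)
        {GateKind::Or,    {2, 3}},  // 6: OR(c, d)
        {GateKind::Not,   {6}},     // 7: NOT of OR(c, d)
    };
}
```

For this circuit, computeLevels assigns level 1 to both OR-gates, level 2 to the NOT-gate, and level 3 to the AND-gate, so the AND-gate is evaluated last even though it appears first among the gates.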
Notice that, with the pair of cross-coupled NOR-gates in Figure 4-10b, the longest path
from either gate to either primary input (r or s) is unbounded. This circuit cannot be
levelized; that is, it cannot be assigned unique level numbers. The property that
makes a circuit levelizable is that it has no feedback. This lack of feedback makes the
circuit qualitatively easier to understand, develop, analyze, and test. For these reasons,
feedback is used in large systems only under very restricted circumstances. For com-
pletely analogous reasons, a "lack of feedback" is exactly the property we would like
our software designs to possess.
DEFINITION:
Level 0: A component that is external to our package.
Level 1: A component that has no local physical dependencies.
Level N: A component that depends physically on a component
at level N-l, but not higher.
In this definition, we assume that all components outside our current package6 (e.g.,
iostream) have already been tested and are known to function properly. These
components are treated as "primary inputs" and have a "level" of 0. A component with no
local physical dependencies is defined to have a level of 1. Otherwise, a component is
defined to have a level one more than the maximum level of the components upon
which it depends.
Figure 4-11 shows the component dependency diagram from Figure 3-17 of Section
3.4, which happens not to have any cycles, and hence is levelizable. The level number
is shown in the upper right corner of each component. Component chararray does
not depend on any other components locally but does depend on the standard library
components (which are all assumed to be at level 0), so chararray has a level of 1. A
level-1 component (such as chararray) that depends only on compiler-supplied
libraries is called a leaf component. Leaf components are always testable in isolation.
6 Assume for now that package means the current project directory.
[Figure 4-11: levelized component dependency diagram for chararray, str, word, alias, and wordlist; level 0 represents external or C++ standard library components]
Component str depends only on chararray. The level of str is 2, one more than that
of chararray. Component word depends on str (and indirectly on chararray). Since
str has a level of 2, word has a level of 3. Since word is at level 3, and the only
component on which alias depends directly is word, alias is at level 4. The wordlist
component also depends directly on word but does not depend on alias, so wordlist
is also at level 4.
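
The level assignment walked through above is mechanical enough to automate. The following is a minimal sketch, not the analyzer of Appendix C: the names DependencyMap, visit, and levelize are hypothetical, and any name that does not appear as a key in the map is assumed to be external (level 0). The function reports failure when it encounters a cycle, in which case no level numbers can be assigned.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: component name -> names it depends on directly.
// Names that never appear as a key are treated as external (level 0).
using DependencyMap = std::map<std::string, std::vector<std::string>>;

// Depth-first helper; returns false as soon as a cyclic dependency is found.
bool visit(const std::string&          name,
           const DependencyMap&        deps,
           std::map<std::string, int>& level,
           std::set<std::string>&      inProgress)
{
    if (level.count(name)) return true;                  // already assigned
    DependencyMap::const_iterator it = deps.find(name);
    if (it == deps.end()) { level[name] = 0; return true; }  // external component
    if (!inProgress.insert(name).second) return false;   // revisited: a cycle
    int highest = 0;
    for (const std::string& d : it->second) {
        if (!visit(d, deps, level, inProgress)) return false;
        highest = std::max(highest, level[d]);
    }
    inProgress.erase(name);
    level[name] = highest + 1;   // one more than the highest level depended upon
    return true;
}

// Fills 'level' and returns true if the design is levelizable; returns false
// (leaving 'level' only partially filled) if a cycle is detected.
bool levelize(const DependencyMap& deps, std::map<std::string, int>& level)
{
    level.clear();
    std::set<std::string> inProgress;
    for (const auto& entry : deps)
        if (!visit(entry.first, deps, level, inProgress)) return false;
    assert(inProgress.empty());
    return true;
}
```

Given the direct dependencies described above (chararray on a standard library component, str on chararray, word on str, and both alias and wordlist on word), levelize assigns levels 1, 2, 3, 4, and 4, respectively. Two components that depend on each other cause it to return false.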
With a levelized diagram it is easy to tell what components in this system are testable
in isolation. In the example of Figure 4-12 there is only one independently testable
Section 4.7.2 Using Level Numbers in Software 171
component: chararray. By starting at the lowest level (i.e., 1) and testing all compo-
nents on the current level before moving to the next higher level, we are assured that
all the components on which the current component depends have already been tested.
In the example of Figure 4-11, we can test either wordlist or alias last, but the rest
of the testing order is implied by the level numbers.
Notice that the term levelizable applies to physical, not logical, entities. Although an
acyclic logical dependency graph might imply that a testable physical partition exists,
the level numbers of (physical) components, along with our design rules, imply a via-
ble order for effective testing. Moreover, Figure 4-11 identifies what subsystems can
be reused independently. Figure 4-12 indicates the other components that must
accompany the reuse of any of these components.
Another significant advantage to levelizable designs is that they are more easily
comprehended incrementally. The process of understanding a levelizable design can
proceed in an orderly manner (either top down or bottom up). Not all subsystems
formed by hierarchical designs are reusable. But, to be maintainable, each component
must have a well-defined interface that can be readily understood, regardless of how
general its applicability.
Of course, not all designs are levelizable. Sometimes whether or not a design is level-
izable is not immediately obvious from a logical diagram. Consider the diagram of
Figure 4-13. Can you tell from this diagram whether or not the components in this
design are levelizable?
The indicated logical relationships in this design do not imply cyclic physical depen-
dencies among any of the components. In fact, our design rules ensure that there can be
no hidden physical dependencies (e.g., on external global variables). Figure 4-14 shows
the implied component dependencies and the resulting component level numbers for
this design.
The component/class diagram is cluttered and contains more information than needed
to understand the physical structure of the system. If we rearrange the placement of
the components and eliminate the logical detail, we obtain the strikingly lucid compo-
nent dependency diagram of Figure 4-15.
There is one redundant edge in the diagram of Figure 4-15. Component wordexbuilder
depends directly on components directory, file, and node. As we know from Section
3.3, the DependsOn relationship is transitive. Since directory (and file also) depends
on node, the dependency of wordexbuilder on node is implied and can be removed
without affecting level numbers. The diagram in Figure 4-15 is clearly acyclic and
typical of those for subsystems that address a specific application. At this level of
abstraction, the design appears to be sound.
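
Redundant edges of this kind can be found mechanically. The sketch below is an illustration only: DependencyGraph, Edge, dependsOn, and redundantEdges are hypothetical names, the graph is assumed to be acyclic, and only the four components named above are modeled (the rest of Figure 4-15 is not reproduced). A direct edge is reported as redundant when its target is also reachable through some other direct dependency.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation: component -> set of direct dependencies.
using DependencyGraph = std::map<std::string, std::set<std::string>>;
using Edge            = std::pair<std::string, std::string>;

// True if 'from' depends on 'to' directly or transitively (DependsOn).
// Assumes the graph is acyclic; the assertion trips on a runaway path.
bool dependsOn(const DependencyGraph& g, const std::string& from,
               const std::string& to, std::size_t depth = 0)
{
    assert(depth <= g.size());  // a longer path would imply a cycle
    DependencyGraph::const_iterator it = g.find(from);
    if (it == g.end()) return false;
    for (const std::string& next : it->second)
        if (next == to || dependsOn(g, next, to, depth + 1)) return true;
    return false;
}

// Direct edges whose target is already implied by another direct dependency.
std::vector<Edge> redundantEdges(const DependencyGraph& g)
{
    std::vector<Edge> result;
    for (const auto& entry : g) {
        const std::set<std::string>& direct = entry.second;
        for (const std::string& to : direct) {
            for (const std::string& via : direct) {
                if (via != to && dependsOn(g, via, to)) {
                    result.emplace_back(entry.first, to);
                    break;  // one alternative path is enough
                }
            }
        }
    }
    return result;
}
```

With wordexbuilder depending directly on directory, file, and node, and with directory and file each depending on node, the only edge reported is the one from wordexbuilder to node.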
[Figure 4-15: component dependency diagram arranged by level (Level 1 through Level 5); an asterisk marks the redundant transitive edge]
One of the great values of this analysis is that, after untangling the component
dependency diagram, we were able to make a substantive, qualitative comment about the
integrity of the physical design without even the tiniest discussion of the application
domain. Simple tools to help automate this process are easy to write, and have proven to
be invaluable for large projects. Appendix C describes a simple component-dependency
analyzer.
In this sense, implementing and testing a software system is like building a house.
After the overall architectural design is complete, the bricks (i.e., the components, not
objects) are put in place one by one. The successful addition of each brick depends
not only on its own integrity but also on the integrity of the mortar used to integrate
the brick with the lower-level bricks on which this brick depends. It is easy enough to
inspect each brick for defects along the way. But once complete, the house is often
large and complex, presenting too many barriers to inspect each detail.
section 4.8 Hierarchical and Incremental Testing 175
In this approach, a separate test driver for each component is created by the developer
concurrently with the component itself to exercise and verify functionality
implemented in that component. Not only is this test driver used extensively during
development, but it is later made available to quality assurance (QA) in order to help
describe the intended behavior of the component that it verifies.
Each component can be tested using an individual test driver that exercises the func-
tionality implemented in that particular component. Physical dependency governs the
order in which tests are developed and run. Level numbers serve both to characterize
the relative complexity of a component locally within a package and to provide an
objective strategy for testing.
Individual drivers are necessary in order to ensure that physical design rules are
followed; otherwise we will be unable to demonstrate that functionality declared within
a component is available solely within the subset of components indicated by that
component's dependency graph. To illustrate why this is so, consider the design-rule
violation (shown in Figure 4-16), where component a defines a class A with member
function f(), and a component b (layered on a), which illegally implements A::f().
As illustrated in Figure 4-16a, a single test driver that links to both a and b is
incapable of detecting this major design rule violation. As far as anyone can tell from the
dependency graph, component a is independent of component b and therefore can be
reused independently of b. If someone tries to reuse component a independently of b
and calls f(), A::f will show up as an undefined symbol at link time.
In Figure 4-16b, distinct drivers are provided to exercise the functionality in each
component. When linking the driver for component a, component b is deliberately
excluded from the link process. If the driver for a is at all thorough (i.e., calls each
function at least once), then if A::f is not defined, the error will be caught at link time,
that is, without even having to run the driver. This same technique also serves to
detect components that are not levelizable.
[Figure 4-16: (a) One Driver for Many Components (driver ab.t.c); (b) One Driver per Component (drivers a.t.c and b.t.c)]
Another compelling reason for insisting on individual drivers is that a single compo-
nent typically provides ample functionality for a test driver to exercise thoroughly.
Lumping tests for several components within a single driver would lead to excessively
large (or, more likely, inadequate) tests.
Figure 4-17 illustrates the abstract physical structure of the hierarchical testing
strategy. Each component at level 1 can depend on only external components (all of
which are at level 0). Therefore each component at level 1 can be tested independently
of all other (local) components.
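[Figure 4-17: hierarchical testing strategy; each component at Level 1, Level 2, Level 3, and above has its own test driver (e.g., driver2, driver4)]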
As we proceed to higher levels of the physical design hierarchy, the complexity of the
subsystems will grow, often exponentially. This explosive growth implies that we will
soon reach the point where tests designed to cover the complete behavior of a high-
level interface will be too difficult to write or take too long to run.
Since we can assume that the components at lower levels are supplying objects that
are working properly, the task of incremental testing is often reduced to testing the
way in which these lower-level objects combine to form higher-level objects. Writing
incremental tests is not always easy in practice, and requires intimate knowledge of
the implementation of the component.
For example, suppose a user-defined type X is layered upon three other types, A, B, and
C, each of which lives in a separate component. Figure 4-18a shows part of the defini-
tion for class X. From this partial header we can observe the logical uses relationships
of Figure 4-18b. Now, given that each class resides in a separate component, we can
infer the component dependencies shown in Figure 4-18c.
In this highly simplistic example, testing functions f and g of class X amounts to
verifying that functions X::f and X::g are properly hooked up to the appropriate
underlying functions C::u and C::v, respectively. Since component c is at a lower level than
component x, we can assume that c has already been tested and is internally correct,
making it unnecessary to retest C::u or C::v in the driver for component x. By
contrast, the implementation of X::h is substantial, and therefore is where most of the
testing effort for this component should be focused.
class X {
    A d_a;
    B d_b;
    C d_c;
  public:
    // ...
    int f() { return d_c.u(d_a); }
    int g() { return d_c.v(d_a, d_b); }
    int h();
};
[Figure 4-18: (a) partial definition of class X (above); (b) Logical Relationships Among Classes; (c) Physical Relationships Among Components]
White-box tests are effective at helping the developer flush out low-level
programming errors such as simple coding errors, and often even basic algorithmic errors
resulting in memory leaks and even forced program terminations. Since white-box
tests are implementation dependent, a complete reimplementation of an underlying
object may render such tests ineffective.
White-box testing and 100 percent code coverage are necessary but are not sufficient
to ensure high-quality components. For example, if, as a developer analyzing a prob-
lem, I miss a special case that requires extra processing, it is not likely that the omis-
sion would be uncovered through white-box testing alone.
Unlike the white-box test that verifies that the code works as the developer intended,
the black-box test verifies that the component satisfies its requirements and complies
with its specification.
Black-box testing is driven directly from the component's requirements and specification.
Black-box testing is, for the most part, independent of implementation. Black-box testing
is also appropriate for an independent tester, say from a QA department, who must under-
stand the behavior and proper use of the component from its documentation alone.
One of the appealing properties of incremental testing is that the difficulty of testing
any given component is roughly proportional to the functional value added by that
component itself rather than to the combined complexity of the lower-level compo-
nents on which that component depends. Regardless of how extensive the functional-
ity in components a, b, and c might be, it may be possible to write a relatively short
but thorough incremental test for component x because X::f and X::g merely
propagate information to and from a working C subobject.
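
To make this concrete, here is a hedged sketch of what the forwarding part of such an incremental test might look like. The definitions of A, B, and C are hypothetical stand-ins invented for the illustration (the text shows only part of class X), the constructor of X is likewise assumed, and h is left out because its implementation is not given. The test compares X::f and X::g against direct calls on a separate C object; it does not re-verify what C::u and C::v compute.

```cpp
#include <cassert>

// Hypothetical stand-ins for the lower-level types, each assumed to live in
// its own component and to have been tested at its own level.
struct A { int value; };
struct B { int value; };

class C {
  public:
    int u(const A& a) const { return a.value + 1; }
    int v(const A& a, const B& b) const { return a.value - b.value; }
};

// Class X as in the partial definition; the constructor is an assumption.
class X {
    A d_a;
    B d_b;
    C d_c;
  public:
    X(const A& a, const B& b) : d_a(a), d_b(b), d_c() {}
    int f() { return d_c.u(d_a); }
    int g() { return d_c.v(d_a, d_b); }
    // int h();  // substantial; would receive most of the testing effort
};

// Incremental test: verify only that X::f and X::g are hooked up to the
// right lower-level functions with the right arguments.
void testXForwarding()
{
    const A a = {3};
    const B b = {5};
    X x(a, b);
    C reference;  // trusted, already-tested lower-level object
    assert(x.f() == reference.u(a));
    assert(x.g() == reference.v(a, b));
}
```

The stand-in C::v is deliberately not symmetric in its arguments, so that a driver of this shape would notice if X::g passed d_a and d_b in the wrong order.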
To summarize this section: we want the complexity of the test to correspond to the
complexity of the component under test. We want to test all leaf components in isola-
tion. All higher-level components are tested assuming the lower-level components on
which they depend are internally correct. This incremental, hierarchical strategy
allows us to focus our testing effort where it can do the most good, and to avoid the
redundancy of retesting already tested software.
Let us return once again to the point-to-point router example of Figure 4-2. As
discussed earlier, the interface for p2p_router is difficult to test effectively. It is
precisely for these kinds of interfaces that hierarchical testing is most needed to ensure
quality.
geom_polygon components belong to a separate package, geom, and are assumed by
the p2p package implementor to be internally correct. These reusable library compo-
nents account for a nontrivial portion of the router's implementation.
Let us assume that in our p2p_router subsystem, each of the lowest-level
components has predictable behavior and is eminently testable. The level-1 components are,
as ever, testable in isolation, independently of any other p2p components. Each of
the level-2 components in this subsystem depends on at most two level-1 components.
Each of the level-2 components implements an appropriate amount of additional
functionality that, in combination with the already-tested lower-level functionality, is not
difficult to understand and verify.
The p2p_router component insulates clients of the router from all details of its
implementation, pushing much of its implementation down into the p2p_routerimp
component. In turn, p2p_routerimp serves to expose to test engineers subfunctionality
that would otherwise be inaccessibly defined within p2p_router.c.
In the actual implementation, p2p_router implements less than 10 percent of the solu-
tion; its job is primarily to coordinate the functionality implemented in the lower-level
Section 4.10 Testability versus Testing 183
Testability and testing are not the same thing. In fact, they are largely independent
aspects of quality. By testable, we mean that there is an effective test strategy that will
allow us to verify that the functionality indicated by the interface (along with support-
ing documentation) is realized by the implementation. By tested, we are saying that
the product has demonstrated that it now conforms to its specifications. Testable is
something we strive to make our products from the moment we start our design.
Tested is a state our product must attain before we release it to our customers. Testing
is something we do all along the way.
Knowing when and how much to test is an engineering trade-off. The more thorough
the developer is at testing the code as it is being implemented, the less likely it will be
that unforeseen bugs will affect the development schedule down the road.
On the other hand, developing thorough tests is time consuming and can significantly
increase the up-front cost of development. Often this extra effort is more than
compensated by reduced time spent in maintenance, future enhancements, and even
current development.
Unfortunately, it is inevitable that the interfaces of many components will change sub-
stantially during the early stages of the development process. Some components will
split apart, others will merge together, and still others will disappear entirely. Conse-
quently, developing thorough regression tests at the preliminary stages of a project
may in some cases turn out not to be cost-effective.
If developers do not consider testability when designing their systems, then the testing
process may not be straightforward or effective. In order to facilitate efficient testing,
the testability of a system must be in place long before its components are ever tested.
Often designs begin with acyclic dependencies and then, as they evolve, cyclic depen-
dencies creep in during enhancements. For example, consider adding to class C1 in
Figure 4-8 of Section 4.6 the member g() returning a C2 by value as follows:
class C1 {
    // ...
  public:
    C1 f();
    C2 g();    // new
};
Guideline
The best solution is to correct cyclic dependencies before they happen; or, if they do
creep in, to detect and correct them as soon as they occur. Chapter 5 addresses
techniques for restructuring a cyclically dependent design to eliminate the cycles while
preserving the intended behavior.
Merging components into a single component is the right solution when the objects in
the combined abstraction are naturally tightly coupled and other issues do not
override. If one class befriends another, this would further suggest that the classes belong
in the same component (see Section 3.6.1). Merging tightly coupled, cohesive compo-
nents also has the welcome benefit of reducing the number of components, and hence
the physical complexity of the system, without further compromising testability or
independent reuse.
Occasionally a single, tightly coupled abstraction will be deemed too large to fit in
one component and will be split into mutually dependent components. Most of the
time, however, the tightly coupled part of the abstraction can be isolated from the rest
of the implementation and placed in a single component, which in turn depends on
other independent components. These independent components can now be tested
thoroughly in isolation (see Section 5.9).
single physical unit, which detracts from the uniformity of a maintainable design.
Although such dependencies are undesirable, the overall testability of a system will
not be lost so long as the number and size of such "blobs" are kept to a minimum.
Linking large programs takes a long time. Typically, developers will need to link a
single component many times in the process of creating both the component and its
test driver. After that, the component will need to be linked to its driver whenever
regression tests are run. For small projects, link times are comparable to the compile
times of individual components. As projects get larger, the link time grows to be much
larger than the time needed to compile even the largest of components.
For the sake of this discussion, let's say that the dependencies in a design formed a
perfect binary tree. Just over half of the components would be at level 1 and could be
tested in complete isolation. Another quarter would each depend on two leaf compo-
nents. If we let L represent the number of distinct levels in the tree, then only one of
the 2^L - 1 components would actually depend on all the rest. Although real designs are
not nearly so regular, the advantage of testing a hierarchy of components with acyclic
dependencies remains clear.
Consider the costs associated with developing a set of components. For the moment,
let's assume that link time is proportional to the number of components being linked
together. 8 For instance, if linking one component to a test driver takes 1 CPU second,
then linking five components would take roughly 5 CPU seconds.
-- N • N
8 This assumption is of course only a crude approximation, since link cost will clearly be affected by
variation in component sizes and by the structure of the function-call hierarchy.
9 A fully interdependent design has a direct-dependency graph that is "strongly connected" but not
necessarily "complete." See abo, Section 5.5, p. 189 and Section 10.3, p. 375 for formal definitions
of these respective terms.
Section 4.12 Cumulative Component Dependency 189
Now consider what would happen if our dependencies were acyclic and formed a
binary tree. Now, not all components have equal link cost. Components at level 1
could be linked to their respective test drivers in unit time (e.g., 1 CPU second). Fully
half of the link cost associated with component testing could be virtually eliminated.
Each component at level 2 would depend on two components at level 1 and comprise
a subsystem of size 3 (it would take 3 CPU seconds to link). That is, another quarter
of the test cost associated with linking could be reduced dramatically (by a factor of
N/3). Only one component in this hypothetical system, the root, would require the full
N CPU seconds of link time previously required by each of the N components.
Mathematically we can show that the total link cost to incrementally test a system
whose physical dependencies form a binary tree is proportional to N log(N) instead of
N^2 (see Figure 4-22). For example, in the case with 15 components,
$$\mathrm{CCD}_{\text{Balanced Binary Tree}}(15) = (15 + 1) \cdot (\log_2(15 + 1) - 1) + 1 = 49$$
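
The same figure can be obtained directly from a dependency graph by summing, over all components, the number of components that must be linked to test each one. The following is a minimal sketch under the unit-cost assumption made above; the names DependencyGraph, subsystemSize, ccd, balancedBinaryTree, and ring are hypothetical, and the ring is just one simple example of a fully interdependent (strongly connected) design.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation: deps[i] lists the components on which
// component i depends directly.
using DependencyGraph = std::vector<std::vector<std::size_t>>;

// Number of components that must be linked to test component 'start':
// the component itself plus everything it depends on, directly or not.
std::size_t subsystemSize(const DependencyGraph& deps, std::size_t start)
{
    assert(start < deps.size());
    std::vector<bool>        seen(deps.size(), false);
    std::vector<std::size_t> pending(1, start);
    seen[start] = true;
    std::size_t count = 0;
    while (!pending.empty()) {
        const std::size_t c = pending.back();
        pending.pop_back();
        ++count;
        for (std::size_t d : deps[c]) {
            if (!seen[d]) { seen[d] = true; pending.push_back(d); }
        }
    }
    return count;
}

// Cumulative component dependency: total link cost, in units of one
// component, of testing every component incrementally.
std::size_t ccd(const DependencyGraph& deps)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < deps.size(); ++i) total += subsystemSize(deps, i);
    return total;
}

// Perfect binary tree in heap layout: component i depends on 2i+1 and 2i+2.
DependencyGraph balancedBinaryTree(std::size_t n)
{
    DependencyGraph deps(n);
    for (std::size_t i = 0; i < n; ++i) {
        if (2 * i + 1 < n) deps[i].push_back(2 * i + 1);
        if (2 * i + 2 < n) deps[i].push_back(2 * i + 2);
    }
    return deps;
}

// Cyclic design: each component depends on the next, and the last on the first.
DependencyGraph ring(std::size_t n)
{
    DependencyGraph deps(n);
    for (std::size_t i = 0; i < n; ++i) deps[i].push_back((i + 1) % n);
    return deps;
}
```

For 15 components, ccd(balancedBinaryTree(15)) is 49, in agreement with the formula, whereas ccd(ring(15)) is 15 • 15 = 225.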
The benefits of acyclic dependencies are enormous. The average time to link an indi-
vidual test driver for an acyclic design with tree-like dependencies is proportional to
the log of the number of components, rather than to the number of components itself,
as is the case for cyclic designs.
Number of components    Units of time
on this level           needed to link    Level
1                       2^L - 1           L
...                     ...               ...
2^(L-3)                 7                 3
2^(L-2)                 3                 2
2^(L-1)                 1                 1

Let L be the number of levels in the system (depth of the binary tree).
Let N = 2^L - 1 be the number of components in the system.

$$
\begin{aligned}
\mathrm{CCD}_{\text{Balanced Binary Tree}}(N)
  &= \sum_{i=1}^{L} \left(\text{number of components on level } i\right) \cdot
     \left(\text{link-time cost of testing a component on level } i\right) \\
  &= \sum_{i=1}^{L} 2^{L-i} \cdot (2^{i} - 1) \\
  &= \sum_{i=1}^{L} 2^{L} - \sum_{i=1}^{L} 2^{L-i} \\
  &= 2^{L} \cdot \sum_{i=1}^{L} 1 - \sum_{i=1}^{L} 2^{i-1} \\
  &= 2^{L} \cdot L - (2^{L} - 1) \\
  &= (N + 1) \cdot (\log_2(N + 1) - 1) + 1
\end{aligned}
$$
Figure 4-23 compares link-time costs associated with testing cyclic and hierarchical
systems with N = 1, 3, 7, and 15 components. The number shown corresponding to
each component position in the dependency graph indicates the link cost associated
with incrementally testing that component. The CCD for each system is calculated
and shown at the bottom of its dependency graph. The CCD for each tree-like system
is calculated in two ways: once level by level and once using the equation derived in
Figure 4-22.
[Figure 4-23: link-time costs for cyclic and tree-like dependency structures with N = 1, 3, 7, and 15]
Suppose that you are developing a system that has 63 components, each with its own
test driver. In a cyclic design, each component would take 63 seconds to relink in
order to test. Compare this to a hierarchical design (analyzed in Figure 4-24), in
which fully half of the components can be linked in 1 CPU second, a quarter in 3 CPU
seconds, an eighth in 7 CPU seconds, and so on. Only one of the 63 components takes
the full 63 CPU seconds to link in order to test it. The total cost of linking all 63 test
drivers is calculated in two ways in Figure 4-24 to be 321 CPU seconds (5.35 CPU
minutes). Compare this with the 63^2 = 3,969 CPU seconds (1.1 CPU hours) it would
take to link all 63 test drivers to a cyclically dependent system.
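[Figure 4-24: total link cost for a 63-component hierarchical design]
Level    Components on level    Link cost each    Subtotal
1        32                     1                 32
2        16                     3                 48
3        8                      7                 56
4        4                      15                60
5        2                      31                62
6        1                      63                63
Total                                             321

(63 + 1) • (log2(63 + 1) - 1) + 1 = 64 • 5 + 1 = 321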
system. The total link time alone for building component regression tests on a system
with 1,023 components could range from 1,024 • 9 + 1 = 9,217 CPU seconds (just
over 2.5 hours) for the hierarchically designed system to 1,023 • 1,023 = 1,046,529
CPU seconds (over 12 days) for the cyclically dependent system.
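
These figures are easy to reproduce. The sketch below evaluates the closed-form expression derived in Figure 4-22 and cross-checks it against the level-by-level sum; the function names are hypothetical, and the unit link cost of 1 CPU second per component is the same simplifying assumption used throughout this section.

```cpp
#include <cassert>
#include <cstdint>

// Closed form from Figure 4-22 for a perfect binary tree with 'levels'
// levels, i.e., N = 2^levels - 1 components:
//     CCD = (N + 1) * (log2(N + 1) - 1) + 1
std::uint64_t ccdBalancedBinaryTree(unsigned levels)
{
    assert(levels >= 1 && levels < 32);
    const std::uint64_t n = (std::uint64_t(1) << levels) - 1;
    return (n + 1) * (levels - 1) + 1;   // log2(N + 1) == levels
}

// The same quantity summed level by level: 2^(L-i) components on level i,
// each costing 2^i - 1 units to link.
std::uint64_t ccdBalancedBinaryTreeByLevel(unsigned levels)
{
    assert(levels >= 1 && levels < 32);
    std::uint64_t total = 0;
    for (unsigned i = 1; i <= levels; ++i) {
        const std::uint64_t componentsOnLevel = std::uint64_t(1) << (levels - i);
        const std::uint64_t costEach          = (std::uint64_t(1) << i) - 1;
        total += componentsOnLevel * costEach;
    }
    return total;
}

// Fully interdependent design: every one of the N drivers links all N components.
std::uint64_t ccdCyclic(std::uint64_t n)
{
    return n * n;
}
```

With 6 levels (63 components) both tree functions give 321 against 3,969 for the cyclic design; with 10 levels (1,023 components) they give 9,217 against 1,046,529.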
It is unlikely that a single project would grow to 1,023 components without being
further partitioned into what we call packages. The importance of ensuring acyclic
dependencies among packages is even greater than that for individual components
(see Section 7.3).
CCD is also a predictor of the cumulative disk space requirements for incremental
regression testing. Disk space can become an important consideration when
incrementally testing a large system concurrently. The size of each independent executable test
program on disk will be roughly proportional to the number of components to which
the test driver must statically link. Consequently, cyclically interdependent systems
can require significantly more disk space than do hierarchical designs.
To summarize: our goal is to be able to build a test driver for each component that
links with the component to be tested and only the (few) components on which that
component depends. CCD is a metric that quantifies the coupling of a system in terms
of the total link-time cost associated with testing each component incrementally.
Cyclically dependent components exhibit quadratic behavior in terms of the link time
and disk space required in order to test them incrementally. In contrast, forming an
acyclic (tree-like) hierarchy of component dependencies reduces the link cost of
incremental component testing dramatically.
In this section we characterize what makes a design maintainable in terms of its physical
dependencies. We continue to discuss CCD and how it is used to indicate the overall
maintainability of a subsystem. We also show how to use CCD to measure incremental
improvements in physical design quality.
Imagine joining a company that is developing a very large system. You are handed a
subsystem of about 150,000 lines of C++ code and you are asked to understand what
it does and make suggestions as to how to improve it. Upon examination, you find that
the components (for the most part) are consistent with the rules and guidelines set
forth in Chapters 2 and 3. You then discover that most of the components in the
system depend (either directly or indirectly) on most of the other components. What do
you do? Unfortunately there is no happy ending to this story. The best anyone can do
may be to try to fit the entire design into his or her head, and that may take months.
Had the same subsystem been designed with an eye toward minimizing CCD, most,
if not all, cyclic dependencies would have been eliminated. It would be possible to
study pieces of the subsystem in isolation, to test, verify, tune, and even replace them,
without having to involve the entire subsystem either mentally or physically. In other
words, actively reducing the intercomponent dependencies, as quantified by CCD,
improves understandability and therefore maintainability.
[Figure 4-25: cyclically dependent subsystem, CCD = 49]
Assume now that the cyclic dependencies in the design of Figure 4-25 are removed,
making it levelizable. Although levelizability is highly desirable, some levelizable
architectures are more maintainable and reusable than others. Consider the design
hierarchies shown in Figure 4-26. Each hierarchy contains seven components, and
each is levelizable. Figure 4-26a shows one extreme version of levelization. Designs
of this nature are termed vertical. Each component in this system depends on all of the
components at lower levels. Vertical subsystems exhibit a high degree of coupling,
Section 4.13 Physical Design Quality 195
[Figure 4-26: seven-component hierarchies. (a) Vertical, Level 1 through Level 7, CCD = 28; (b) Tree, CCD = 17; (c) Horizontal, CCD = 7]
Vertical systems are highly inflexible with respect to both testing and reuse. There is
only a single order in which to test purely vertical systems, and that order is entirely
determined by its levelization. Developing a vertical subsystem is also relatively
expensive in terms of link times. The total link cost (CCD) of 28 units for this system
is more than half of the 49 units for the cyclically dependent subsystem shown in Fig-
ure 4-25. Furthermore, a vertical subsystem will be relatively difficult to partition into
parallel development efforts, spread across multiple developers. A vertical subsystem
is, however, acyclic and therefore qualitatively easier to maintain than if it were
cyclic.
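
The CCD values for the two extremes follow from simple counting, sketched below with hypothetical function names. In a purely vertical design the component at level i links i components, so the total is 1 + 2 + ... + N; in a purely horizontal design every component links only itself; in a fully cyclic design every component links all N.

```cpp
#include <cassert>
#include <cstddef>

// Purely vertical design of n components: the component at level i must
// link i components, so CCD = 1 + 2 + ... + n.
std::size_t ccdVertical(std::size_t n)
{
    assert(n >= 1);
    return n * (n + 1) / 2;
}

// Purely horizontal design: every component is tested entirely on its own.
std::size_t ccdHorizontal(std::size_t n)
{
    assert(n >= 1);
    return n;
}

// Fully cyclic design: every component must link all n components.
std::size_t ccdFullyCyclic(std::size_t n)
{
    assert(n >= 1);
    return n * n;
}
```

For seven components these give 28, 7, and 49, respectively, which is why the vertical design costs more than half as much to link as the cyclic one.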
Figure 4-26b shows a design hierarchy in the form of a binary tree. As we know, over
half of the components in this design contribute only a single unit each to the CCD.
Designs will not be perfect binary trees, but the CCD of a binary tree serves as a good
benchmark against which to compare many typical applications. Tree-like designs,
with their lower degree of coupling, are much more flexible and suited to reuse than
vertical designs. At each level there are typically several subsystems that can be tested
and possibly reused independently of the rest of the system. The disk space require-
ment for holding most of the incremental test driver programs will be relatively low.
By making the dependency graph flatter rather than taller, we increase flexibility. The
flatter the design, the greater the potential for independent reuse. Flattening the
dependencies also helps to decrease the time needed for understanding and mainte-
nance. The flatter the design, the more likely a bug can be tracked to a single, isolated
component or a small independent subsystem, and therefore the less disk space will
be required by the driver executable to exercise the defect.
Figure 4-26c shows the other end of the levelization spectrum. This type of design is
characterized as horizontal because all of the components are entirely independent
and decoupled from one another. Components belonging to purely horizontal sub-
systems may be tested in any order and reused in any combination desired. The disk
space requirement for every incremental test driver program will be quite low. Such
dependency characteristics are typical in reusable component libraries but atypical of
subsystems in general.
We can make some objective, quantitative statements about the relative maintainability
and reusability (but not necessarily the "goodness") of a design of a given size based
on its CCD. Design dependencies form a continuum that ranges from cyclic to verti-
cal to tree-like to horizontal. Even in the presence of cycles, every design can be
assigned a CCD. All other things being equal, the lower the CCD, the less expensive
(in terms of link time and disk space) the system will be to develop and maintain.
There is yet another reason to strive for a hierarchical system with a minimal CCD.
Requirements are rarely cast in stone and may change during the development of a
project. By distributing the implementation throughout a hierarchy of components,
the design becomes more resilient to change. The more horizontal an architecture, the
less it is likely that any changes in specification will affect the overall system. This
expected cost due to changes in specification is directly related to the average compo-
nent dependency (ACD) in the system.
As an illustration of reducing CCD, consider the two systems with similar depen-
dency structure shown in Figure 4-27. Design A has a cyclic dependency between two
of its components. Testing either one of these components requires linking to both of
them, along with all of the components on which either one of them depends; this
gives each of them an individual component dependency of 7. Notice also that at the
right of Design A, a portion of the hierarchy is purely vertical.
CCD is an objective metric that characterizes the physical coupling within a system.
CCD can flag subsystems with unusually high incremental development and mainte-
nance costs. For example, a vertical chain is the levelizable configuration with the
highest CCD: N(N + 1)/2. Therefore a CCD of greater than N(N + 1)/2 implies that at
least one cyclic dependency exists. However, CCD is not (by itself) a measure of the
quality of a subsystem.
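The N(N + 1)/2 bound can be checked mechanically. The following is a minimal sketch, assuming a hypothetical representation in which each component is an integer index and deps[i] lists the components that component i depends on directly; the names DependencyGraph, cumulativeComponentDependency, makeVerticalChain, and ccdImpliesCycle are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation: deps[i] lists the components that
// component i depends on directly.
typedef std::vector<std::vector<int> > DependencyGraph;

// Mark every component reachable from 'start', including 'start' itself.
static void markReachable(const DependencyGraph& deps, int start,
                          std::vector<bool>& seen)
{
    if (seen[start]) {
        return;
    }
    seen[start] = true;
    for (std::size_t i = 0; i < deps[start].size(); ++i) {
        markReachable(deps, deps[start][i], seen);
    }
}

// CCD: the sum, over all components, of the number of components needed
// to test that component incrementally (the component itself plus
// everything it depends on, directly or indirectly).
int cumulativeComponentDependency(const DependencyGraph& deps)
{
    int ccd = 0;
    for (std::size_t c = 0; c < deps.size(); ++c) {
        std::vector<bool> seen(deps.size(), false);
        markReachable(deps, static_cast<int>(c), seen);
        for (std::size_t i = 0; i < seen.size(); ++i) {
            if (seen[i]) {
                ++ccd;
            }
        }
    }
    return ccd;
}

// Build a vertical chain: component i depends only on component i - 1.
DependencyGraph makeVerticalChain(int n)
{
    assert(n >= 0);
    DependencyGraph deps(n);
    for (int i = 1; i < n; ++i) {
        deps[i].push_back(i - 1);
    }
    return deps;
}

// A CCD above N(N + 1)/2 cannot come from a levelizable design.
bool ccdImpliesCycle(const DependencyGraph& deps)
{
    int n = static_cast<int>(deps.size());
    return cumulativeComponentDependency(deps) > n * (n + 1) / 2;
}
```

For a chain of 7 components this yields a CCD of 28; making the lowest component depend on the highest turns the chain into one large cycle with a CCD of 49. The test is one-directional: a CCD at or below the bound does not prove that a design is acyclic.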
We can conveniently use the alternate equation derived in Figure 4-22 to determine
the CCD for a (theoretical) binary-tree-like architecture of the same size as those
shown in Figure 4-27. Figure 4-28 demonstrates that a binary-tree-like architecture
with 11 components has a CCD of 32.02, which is comparable with that of Design B.
$$\mathrm{CCD}_{\text{Balanced Binary Tree}}(11) = 12 \cdot \log_2(12) - 11 = 32.02$$
Figure 4-28: Computing the CCD of a Theoretical Balanced Tree of Size 11
$$\mathrm{NCCD}(\text{subsystem}) = \frac{\mathrm{CCD}(\text{subsystem})}{\mathrm{CCD}_{\text{Balanced Binary Tree}}(N_{\text{subsystem}})}$$
The NCCD of a system can be used to characterize the degree of physical coupling
within the system relative to a theoretical binary-dependency tree of the same size.
Referring back to Figure 4-27, the NCCD of Design B was 32/32.02 = 1.00 as com-
pared with an NCCD of 39/32.02 = 1.21 for Design A (and 121/32.02 = 3.78 for the
completely interdependent implementation).
An NCCD of less than 1.0 can be thought of as more "horizontal" or loosely coupled;
such a system probably employs little reuse. An NCCD of greater than 1.0 can be
thought of as more "vertical" and/or tightly coupled; such a system may be making
extensive reuse of components. An NCCD substantially greater than 1.0 indicates that
there may be significant cyclic physical coupling within the system.
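The NCCD arithmetic is easy to reproduce. The sketch below assumes the balanced-tree benchmark has the form (N + 1) log2(N + 1) - N, which matches the computation 12 log2(12) - 11 = 32.02 given for N = 11; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// CCD of a theoretical balanced binary tree of n components, assumed to
// be (n + 1) * log2(n + 1) - n.
double balancedBinaryTreeCcd(int n)
{
    assert(n > 0);
    return (n + 1) * (std::log(n + 1.0) / std::log(2.0)) - n;
}

// NCCD: the CCD of a subsystem of n components divided by the CCD of a
// balanced binary tree of the same size.
double normalizedCcd(int ccd, int n)
{
    return ccd / balancedBinaryTreeCcd(n);
}
```

With these definitions, normalizedCcd(32, 11) is approximately 1.00 and normalizedCcd(121, 11) is approximately 3.78, in line with the values quoted for Design B and for the completely interdependent implementation.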
The degree of maintainability in terms of the CCD that we are able to achieve depends
on the nature of the subsystem. We will not always achieve perfect tree-like maintain-
ability. For horizontal component libraries, we would expect a much lower CCD. The
CCD will be higher for highly interconnected topologies that employ reuse heavily,
such as the window system shown in Figure 2-8 of Section 2.5.
NCCD is not a measure of the relative quality of a system. NCCD is simply a tool for
characterizing the degree of coupling within a subsystem. Increasing the number of
components in a system could artificially reduce the NCCD. One way to do this is to
eliminate completely valid reuse; this would likely not be an improvement.
Figure 4-29 shows two designs with equivalent functionality. Design B is 50 percent
larger than A with a 25 percent larger CCD. On the other hand Design A, through
reuse, exhibits more physical coupling for its size than does Design B. Nonetheless,
Design A may very well be the better engineered and more maintainable design.
        Design A    Design B
SIZE    4           6
CCD     8           10
NCCD    1.05        0.73
Reducing the CCD in a system of a given size is almost always desirable. Reducing
the size (number of components) of a system is also desirable but not at the cost of
introducing cyclic dependencies, inappropriately merging components, or creating
In conclusion, the CCD metric has been introduced to identify explicitly the kind of
dependencies we would like to minimize. NCCD gives us a quantitative way of char-
acterizing the physical dependencies of a subsystem as horizontal, tree-like, vertical,
or cyclic. The precise numerical value of the CCD (or the NCCD) for a given system
is not important. What is important is actively designing systems to keep the CCD for
each subsystem from becoming larger than necessary.
4.14 Summary
Much of the testing strategy in this chapter is motivated by the success of Design For
Testability (DFT) over a decade ago. But, unlike real-world objects, instances of
classes defined within a software system are no different from instances of the same
classes defined outside that system. We can exploit this fact to verify portions of the
design hierarchy in isolation, thereby reducing part of the risk of integration.
Hierarchical testing refers to testing components at each level of the physical hierarchy.
Each lower-level component should provide a well-defined interface and implement
predictable functionality that can be tested, verified, and reused independently of
components at higher levels.
Incremental testing refers to having individual drivers test only the functionality actu-
ally implemented within the component under test; functionality implemented at
lower levels of the physical hierarchy is presumed at this point to be internally correct.
Consequently, incremental tests mirror the complexity of the implementation of the
component under test and not that of the hierarchy of components upon which this
component depends. Incremental testing is a form of white-box testing, which relies
on knowing the implementation of the component in order to improve reliability.
Black-box testing derives from requirements and specifications, and is independent of
implementation. These two forms of testing are complementary, and both contribute
to ensuring overall quality.
Cyclically dependent designs are not levelizable. Such systems are known to be diffi-
cult to maintain and have a correspondingly high CCD. Among designs that are level-
izable, the more horizontal the hierarchy, the lower the CCD. Flattening physical
dependencies helps to decrease the time needed for understanding, development, and
maintenance, while improving the flexibility, testability, and reusability of a system.
NCCD (normalized CCD) helps to categorize the physical structure of arbitrary
designs as cyclic, vertical, tree-like, or horizontal.
Levelization
Link-time dependencies within a system (as quantified by CCD) play a central role in
establishing the overall physical quality of a system. More conventional aspects of
quality, such as understandability, maintainability, testability, and reusability, are all
closely tied to the quality of the physical design. If not carefully prevented, cyclic
physical dependencies will rob a system of this quality, leaving it inflexible and diffi-
cult to manage.
Even levelizable designs can be unnecessarily costly to maintain and enhance. Forced
dependency on large, low-level subsystems can pose a significant development burden
on higher-level subsystems. Minimizing the impact of such dependencies contributes
to the physical quality of the system.
Throughout this chapter we use many examples taken from several application
domains to illustrate these techniques in a variety of contexts. Occasionally we present
a substantial body of source code to make the example concrete for reference purposes.
In this section we look at three ways in which cyclic physical dependencies can occur
in practice. To demonstrate the breadth of this problem, we present and discuss each of
these examples in a separate subsection without attempting to resolve them. These spe-
cific problems and many others will be solved as appropriate techniques are presented
throughout the remainder of this chapter.
5.1.1 Enhancement
Initial designs are usually carefully planned and often levelizable. In time, the unan-
ticipated needs of clients can evoke less-well-thought-out enhancements that induce
unwanted cyclic dependencies. For example, we sometimes find we have similar
objects that, for one reason or another (e.g., performance), coexist in a system but that
contain essentially the same information.
Figure 5-1 shows a simple but illustrative example consisting of two classes, each representing
a kind of box. A Rectangle is defined by two points that determine its
lower-left and upper-right corners. A Window is defined by a center point, a width, and
a height. These objects have distinct performance characteristics but contain the same
logical information.
// rectangle.h
#ifndef INCLUDED_RECTANGLE
#define INCLUDED_RECTANGLE
// ...
    int lowerLeftX() const;
    // ...
};
#endif

// window.h
#ifndef INCLUDED_WINDOW
#define INCLUDED_WINDOW
// ...
    int width() const;
    // ...
};
#endif
Each of these objects will be used to facilitate the rendering of very large designs
interactively on a graphics terminal; draw speed will be critical. For performance rea-
sons, we do not even consider employing virtual functions, and most of the functions
are declared inline.
Allowing two components to "know" about each other via #include
directives implies cyclic physical dependency.
It turns out that clients will occasionally need to be able to convert between these two
types of boxes, perhaps to obtain the performance characteristics of the other. This is
one way in which good designs can sometimes start to deteriorate.
Consider the "solution" set forth in Figure 5-2. We have added to each class a constructor
that takes as its only argument a const reference to the other class. We can
now pass a Window object to a function requiring a Rectangle and vice versa, the conversion
being performed implicitly. How does that sound to you?
If it sounded good to you, you are not alone. But it is not a good solution. For one
thing, any speed benefit that might be realized could be lost by having to construct a
temporary object of the other type on entry to a function. Since the conversion is
implicit and automatic, your clients may not even realize that the extra temporary is
being created (and will blame you for your "slow" class).
Much more importantly, we have introduced a cyclic physical dependency between the
header files of two previously independent components. Each of these components
now must "know" about the other. It is no longer possible to compile, link, test, or use
either one of these components without the other. Most clients will not be concerned
about the subtle differences in performance characteristics between these classes and
would opt to use either one, but rarely both. This unlevelizable enhancement forces
them to take both.
// rectangle.h
#ifndef INCLUDED_RECTANGLE
#define INCLUDED_RECTANGLE
// ...
inline
Rectangle::Rectangle(const Window& w)
{
    // ...
}
#endif

// window.h
#ifndef INCLUDED_WINDOW
#define INCLUDED_WINDOW
// ...
inline
Window::Window(const Rectangle& w)
{
    // ...
}
#endif

[Diagram: component rectangle (rectangle.h, rectangle.c) and component window (window.h, window.c)]
We can move the preprocessor #include directives from the .h files to the .c files (as
shown in Figure 5-3), but this does not eliminate the physical coupling. Both compo-
nents still depend on each other at compile time, and each will potentially depend on
the other at link time. We need to do something a bit more radical.
// rectangle.h
#ifndef INCLUDED_RECTANGLE
#define INCLUDED_RECTANGLE
// ...
#endif

// window.h
#ifndef INCLUDED_WINDOW
#define INCLUDED_WINDOW
// ...
#endif

[Diagram: component rectangle (rectangle.c) and component window (window.c)]
5.1.2 Convenience
Often, in an effort to make a system usable, developers are tempted to create designs
that are not structurally sound. As a second, more involved example of this recurring
theme, consider a graphical shape editor whose design is depicted abstractly in Figure
5-4. The Shape class is abstract and defines a protocol that all concrete shapes must
implement. Every shape has a location that we will assume for now must be manipu-
lated as quickly as possible (i.e., via inline functions). Since some of the functionality
in the Shape class is already implemented, Shape serves not only to define a common
interface, but also to factor the common part of the implementation. 1
1 Section 6.4.1 describes how we could reduce compile-time coupling between consumers and suppliers
of the Shape interface if we relaxed the speed requirement for the moveTo function.
The Shape class could potentially define a large number of pure virtual functions. A
sparse representation of the header file for the shape component is presented in Figure
5-5. Clients of the Shape class will need to be able to create actual shapes, but they
will not need to interact with the derived class interfaces directly. In order to insulate
clients of Shape from concrete classes derived from Shape, the ability to create specific
kinds of Shape is incorporated directly into Shape's interface.
// shape.h
#ifndef INCLUDED_SHAPE
#define INCLUDED_SHAPE

class Screen;

class Shape {
    int d_xCoord;
    int d_yCoord;

  protected:
    Shape(int x, int y);
    Shape(const Shape& shape);
    Shape& operator=(const Shape& shape);

  public:
    static Shape *create(const char *typeName);
    virtual ~Shape();
    // ...
    void moveTo(int x, int y) { d_xCoord = x; d_yCoord = y; }
    // ...
    virtual Shape *clone() const = 0;
    virtual void draw(Screen *s) const = 0;
    // ...
};

#endif
To make it easy to add new shapes by name, the Shape class implements the static
member function create. This method takes the type name of the Shape (as a const
char *) and returns a pointer to a dynamically allocated, newly constructed Shape of
the appropriate concrete type derived from Shape. 2 If no shape corresponding to that
type name exists, the function returns 0. The entire .c file for the shape component is
presented in Figure 5-6.
2 Returning a pointer to a dynamically allocated object is error prone because it leaves the responsi-
bility of deallocation with the client. Failing to catch an exception can easily result in a memory leak.
Handle classes (as discussed in Section 6.5.3) can be used to reduce the potential for memory leaks.
// shape.c
#include "shape.h"
#include "circle.h"
#include "square.h"
#include "triangle.h"
#include "screen.h"
#include "string.h"     // strcmp()

Shape::Shape(const Shape& s)
: d_xCoord(s.d_xCoord)
, d_yCoord(s.d_yCoord)
{}

Shape::~Shape() {}
The Editor class itself is layered upon a number of custom types (E1, ..., En) used
solely in the implementation of Editor. Each of these types uses Shape in its interface
in order to perform various abstract operations on shapes (e.g., moveTo, scale, draw,
and so on). Only one of the implementation components, e1, which implements the add
command, needs to be able to create a shape from a type name. The rest of these components
can use Shape's virtual functions to access a particular Shape's functionality,
and do not need to depend on any concrete Shape directly. Does this sound reasonable?
Although this design may seem appealing from a usability standpoint, it has a design
flaw that makes it quite a bit more expensive to maintain than it need be. The create
member function of Shape uses a constructor of each of the classes derived from
Shape, which forces a mutual dependency between Shape and all classes derived from
Shape. It is therefore not possible to test a specific kind of Shape independently of all
the rest, significantly increasing the link time and disk space required during incre-
mental testing. The shape subsystem, which is otherwise horizontal and therefore
highly reusable, is turned into an all-or-nothing proposition.
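A minimal sketch of what such a create function might look like makes the coupling visible. It assumes two hypothetical concrete shapes, Circle and Square, and a hypothetical name() accessor added only so that the result can be inspected; everything is shown in one translation unit, with comments marking the file each piece would live in.

```cpp
#include <cassert>
#include <cstring>   // std::strcmp

// --- shape.h (abridged) ---
class Shape {
    int d_xCoord;
    int d_yCoord;

  protected:
    Shape(int x, int y) : d_xCoord(x), d_yCoord(y) {}

  public:
    static Shape *create(const char *typeName);
    virtual ~Shape() {}
    int xCoord() const { return d_xCoord; }
    int yCoord() const { return d_yCoord; }
    virtual const char *name() const = 0;   // hypothetical, for illustration
};

// --- circle.h and square.h (hypothetical concrete shapes) ---
class Circle : public Shape {
  public:
    Circle() : Shape(0, 0) {}
    const char *name() const { return "circle"; }
};

class Square : public Shape {
  public:
    Square() : Shape(0, 0) {}
    const char *name() const { return "square"; }
};

// --- shape.c ---
// The base-class component must name every derived class here, so
// shape.c includes circle.h and square.h while each of those headers
// includes shape.h: the components depend on one another.
Shape *Shape::create(const char *typeName)
{
    assert(typeName);
    if (0 == std::strcmp(typeName, "circle")) {
        return new Circle;
    }
    if (0 == std::strcmp(typeName, "square")) {
        return new Square;
    }
    return 0;   // no shape with that type name
}
```

Every new concrete shape adds another branch to Shape::create and another include to shape.c, which is why no single shape can be linked or tested without all the others.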
Adding a new kind of shape to this subsystem requires modifying the Shape base
class, which could produce errors in functionality pertaining to the other indepen-
dently derived classes. The high degree of coupling brought on by having a base class
"know" about its derived classes implies a considerable increase in maintenance cost
and a considerable loss of flexibility and reuse.
Consider the graph shown in Figure 5-7. A graph consists of a collection of nodes and
edges. The nodes within this graph are connected by directed edges. In general, the
edges in the graph will form cycles. 3 Each node consists of some data and some infor-
mation about how the node is incorporated into the graph. In this example the node's
data is no more than a name. The connectivity is represented simply as a list of edges
to or from that node.
[Figure 5-7: a graph whose named nodes (such as Susan and Franklin) are connected by directed edges]
Figure 5-8 illustrates the minimal functionality associated with the node component.
Given a Node, it is possible to ask for its name, find out the number of edges connected
to it, and iterate over these edges by supplying integer indices between 0 and N-l,
where N is the current value returned by Node: : n umEdges ( ).4
// node.h
#ifndef INCLUDED_NODE
#define INCLUDED_NODE

class Edge;

class Node {
    // ...
    Node(const Node&);              // not implemented
    Node& operator=(const Node&);   // not implemented

  public:
    Node(const char *name);
    ~Node();
    const char *name() const;
    int numEdges() const;
    Edge& edge(int index) const;
};

#endif
Figure 5-8: Public Interface of node Component
// edge.h
#ifndef INCLUDED_EDGE
#define INCLUDED_EDGE

class Node;

class Edge {
    // ...
    Edge(const Edge&);              // not implemented
    Edge& operator=(const Edge&);   // not implemented

  public:
    Edge(Node *from, Node *to, double weight);
    ~Edge();
    Node& from() const;
    Node& to() const;
    double weight() const;
};

#endif
Figure 5-9: Public Interface for an edge Component
4 It is a subtle point that supplying an integer index for iteration suggests that the underlying
implementation is likely to be an array of some kind and not a linked list of edges. A naive linked-list
implementation would result in quadratic runtime behavior during iteration.
An Edge in this system is used to connect nodes. Like nodes, edges also contain both
local and network-related functionality. The network-independent information associated
with the Edge in this example is just its weight, and the connectivity information
is just the two Node objects to which the Edge is connected.
Initially we are faced with the unappealing design illustrated in Figure 5-10. Node
uses Edge in its interface and vice versa. As it stands, it seems as though class Node
and class Edge must be mutually dependent; otherwise, how could a client possibly
traverse the graph? Furthermore, there is the question of who owns the memory for
these objects and who is authorized to bring instances of Node and/or Edge into and
out of existence.
[Figure 5-10: the node and edge components, each depending on the other]
Recall from Section 3.6 that friendship does not introduce physical dependencies by
itself, but in order to preserve encapsulation it can indirectly cause physical coupling
to occur. In order to avoid the breach of encapsulation and lack of modularity associated
with long-distance friendships, it may be necessary to group several levelizable
classes within a single component (as explained at the end of Section 5.9). A common
example of this kind of coupling can be seen in virtually every container component
that supplies an iterator. Invariably the iterator will be a friend of the container and
therefore defined within the same component.
The above are but a few examples of the kinds of cyclic coupling that commonly arise
in practice. The remainder of this chapter is devoted to developing various techniques
and transformations for untangling designs that might otherwise seem to defy an acyclic
physical implementation.
5.2 Escalation
Let's now return to the example involving the two cyclically dependent components
(shown in Figure 5-1): rectangle and window. Suppose that instead of having
rectangle and window "know" about each other, we decide arbitrarily that rectangles
are more basic than windows. We can move both conversions into class Window.
Window now "uses" Rectangle but not vice versa, as is illustrated in Figure 5-11.
Level 2: window
Level 1: rectangle

// window.h
#ifndef INCLUDED_WINDOW
#define INCLUDED_WINDOW

#ifndef INCLUDED_RECTANGLE
#include "rectangle.h"
#endif

class Window {
    // ...
  public:
    // ...
    Window(const Rectangle& r);
    // ...
    operator Rectangle() const;
    // ...
};
// ...
#endif

// rectangle.h
#ifndef INCLUDED_RECTANGLE
#define INCLUDED_RECTANGLE
// ...
#endif
This solution requires that we change our point of view somewhat, because the
Rectangle and Window classes are no longer symmetric. Rectangle lives at level 1,
but Window is now defined at level 2. If we want any old box we can reuse Rectangle
and not worry about Window or conversions between the classes. If we need a Window,
however, we will have to take Rectangle also.
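The asymmetric arrangement can be sketched as follows. The data members, the four-integer constructors, and the accessor names other than lowerLeftX and width are assumptions made for illustration, and integer coordinates with even widths and heights are assumed so that the round trip is exact.

```cpp
#include <cassert>

// --- rectangle.h (level 1): knows nothing about Window ---
class Rectangle {
    int d_llx, d_lly, d_urx, d_ury;

  public:
    Rectangle(int llx, int lly, int urx, int ury)
    : d_llx(llx), d_lly(lly), d_urx(urx), d_ury(ury) {}
    int lowerLeftX() const { return d_llx; }
    int lowerLeftY() const { return d_lly; }
    int upperRightX() const { return d_urx; }
    int upperRightY() const { return d_ury; }
};

// --- window.h (level 2): includes rectangle.h ---
class Window {
    int d_centerX, d_centerY, d_width, d_height;

  public:
    Window(int centerX, int centerY, int width, int height)
    : d_centerX(centerX), d_centerY(centerY)
    , d_width(width), d_height(height) {}
    Window(const Rectangle& r);       // Rectangle -> Window
    operator Rectangle() const;       // Window -> Rectangle
    int centerX() const { return d_centerX; }
    int centerY() const { return d_centerY; }
    int width() const { return d_width; }
    int height() const { return d_height; }
};

inline
Window::Window(const Rectangle& r)
: d_centerX((r.lowerLeftX() + r.upperRightX()) / 2)
, d_centerY((r.lowerLeftY() + r.upperRightY()) / 2)
, d_width(r.upperRightX() - r.lowerLeftX())
, d_height(r.upperRightY() - r.lowerLeftY())
{}

inline
Window::operator Rectangle() const
{
    return Rectangle(d_centerX - d_width / 2, d_centerY - d_height / 2,
                     d_centerX + d_width / 2, d_centerY + d_height / 2);
}
```

Both conversions remain implicit, but all knowledge of the other type is confined to the window component; rectangle can be compiled, linked, and tested alone.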
Adding a dependency between two components at the same level (e.g., from v to u in
Figure 5-12) also never introduces a cycle but does affect the level number as shown
in Figure 5-13b. Finally, it may even be possible to add a dependency from a lower-level
component to a higher-level one (e.g., from t to u in Figure 5-12) without introducing
a cyclic dependency. Adding this dependency without introducing a cycle
will be possible if and only if component u does not already dominate component t.
Here, component u does not dominate component t and the result of adding the
dependency from t to u is shown in Figure 5-13c.
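The rule for when a new dependency is safe can be expressed as a reachability test. The sketch below assumes a hypothetical adjacency-list representation and an acyclic starting graph; the small graph used in the usage note is invented for illustration and is not the one in Figure 5-12.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation: deps[i] lists the components that
// component i depends on directly.
typedef std::vector<std::vector<int> > DependencyGraph;

// Return true if 'from' depends on 'to', directly or indirectly.
// Assumes the graph is acyclic, so the recursion terminates.
bool dependsOn(const DependencyGraph& deps, int from, int to)
{
    for (std::size_t i = 0; i < deps[from].size(); ++i) {
        int next = deps[from][i];
        if (next == to || dependsOn(deps, next, to)) {
            return true;
        }
    }
    return false;
}

// Making 'from' depend on 'to' introduces a cycle exactly when 'to'
// already depends on 'from' (or when the two are the same component).
bool canAddDependency(const DependencyGraph& deps, int from, int to)
{
    return from != to && !dependsOn(deps, to, from);
}

// Level 1 for a component with no dependencies; otherwise one more than
// the highest level among the components it depends on.
int level(const DependencyGraph& deps, int component)
{
    int result = 1;
    for (std::size_t i = 0; i < deps[component].size(); ++i) {
        int candidate = 1 + level(deps, deps[component][i]);
        if (candidate > result) {
            result = candidate;
        }
    }
    return result;
}
```

In a four-component graph where components 1 and 2 each depend on component 3 and component 0 depends on nothing, canAddDependency(g, 0, 1) is true even though component 0 sits at a lower level than component 1; after the edge is added, component 0 moves to level 3 and the reverse dependency from 1 to 0 is no longer permitted.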
Of course we could have gone the other way and made Window the primitive object. In that
case rectangle knows about window but not vice versa. This situation is depicted in Figure
5-14. Notice that in this example we have elected to move the #include "window.h"
directive to the rectangle.c file, which implies that the conversion routines will not be
inline.
Level 2: rectangle
Level 1: window

// rectangle.h
#ifndef INCLUDED_RECTANGLE
#define INCLUDED_RECTANGLE
class Window;
// ...
#endif
Both solutions imply that only one component can be used independently of the other.
Either solution is an improvement over the original cyclicly dependent design, but we
can do still better. Many clients who use these components will need one or the other
but not both. Of those that do need to use both components, only some will need to
convert between them. To maximize independent reusability, we can avoid having
either component dominate the other by moving the cycle-inducing functionality to a
higher level, a technique referred to in this book as escalation.
In corporations, if two employees are not able to resolve a dispute, the common prac-
tice is to escalate the problem to a higher level. In the case of objects competing for
dominance, the same solution is often effective. We can create a utility class called
BoxUtil that knows about both the Rectangle and Window classes and then place the
definition of this class in an entirely separate component, as shown in Figure 5-15.
Now clients interested in either Rectangle class or Window class are free to use either
class independently. If a single client happens to use both classes but does not need to
convert between them, so be it. If yet other clients require the conversion routines,
they are available. However, note that conversion between Rectangle and Window,
which used to be implicit, must now be performed explicitly. (See Section 9.3.1 for
more on implicit conversions.)
Note that, in the previous example, we elected to use the keyword struct instead of
class when defining BoxUtil to suggest that this type merely provides a scope for
public nested types and public static member functions. In this convention, all members
of a struct are public and hence there are no data members. Although creating
an instance of such a type is pointless, it does no real harm. We can reduce some
unnecessary clutter if we suppress our compulsion to declare the unimplemented
default constructor private.
Level 2: boxutil
Level 1: rectangle, window

// boxutil.h
#ifndef INCLUDED_BOXUTIL
#define INCLUDED_BOXUTIL

class Rectangle;
class Window;

struct BoxUtil {
    static Window toWindow(const Rectangle& r);
    static Rectangle toRectangle(const Window& w);
};

#endif

// rectangle.h
#ifndef INCLUDED_RECTANGLE
#define INCLUDED_RECTANGLE
// ...
#endif

// window.h
#ifndef INCLUDED_WINDOW
#define INCLUDED_WINDOW
// ...
#endif
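One possible implementation of the escalated conversions is sketched below. The member layouts, constructors, and most accessor names of Rectangle and Window are assumptions for illustration; only the BoxUtil interface is taken from the header above. Integer coordinates with even widths and heights are assumed.

```cpp
#include <cassert>

// --- rectangle.h (level 1) ---
class Rectangle {
    int d_llx, d_lly, d_urx, d_ury;

  public:
    Rectangle(int llx, int lly, int urx, int ury)
    : d_llx(llx), d_lly(lly), d_urx(urx), d_ury(ury) {}
    int lowerLeftX() const { return d_llx; }
    int lowerLeftY() const { return d_lly; }
    int upperRightX() const { return d_urx; }
    int upperRightY() const { return d_ury; }
};

// --- window.h (level 1): no mention of Rectangle ---
class Window {
    int d_centerX, d_centerY, d_width, d_height;

  public:
    Window(int centerX, int centerY, int width, int height)
    : d_centerX(centerX), d_centerY(centerY)
    , d_width(width), d_height(height) {}
    int centerX() const { return d_centerX; }
    int centerY() const { return d_centerY; }
    int width() const { return d_width; }
    int height() const { return d_height; }
};

// --- boxutil.h (level 2) ---
struct BoxUtil {
    static Window toWindow(const Rectangle& r);
    static Rectangle toRectangle(const Window& w);
};

// --- boxutil.c: the only place that includes both headers ---
Window BoxUtil::toWindow(const Rectangle& r)
{
    return Window((r.lowerLeftX() + r.upperRightX()) / 2,
                  (r.lowerLeftY() + r.upperRightY()) / 2,
                  r.upperRightX() - r.lowerLeftX(),
                  r.upperRightY() - r.lowerLeftY());
}

Rectangle BoxUtil::toRectangle(const Window& w)
{
    return Rectangle(w.centerX() - w.width() / 2,
                     w.centerY() - w.height() / 2,
                     w.centerX() + w.width() / 2,
                     w.centerY() + w.height() / 2);
}
```

A client that needs a conversion writes BoxUtil::toWindow(r) explicitly, so the cost of the temporary is visible at the call site, and neither box class has to change when a new conversion is added.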
Now let's consider again the physical coupling induced by the static create function,
defined in the base class of the shape hierarchy of Figure 5-4. Suppose we escalate
create above the level of its derived classes by introducing a new utility class,
ShapeUtil, whose sole purpose is to create shapes. This new class would be placed in
its own component and contain the create function from the original Shape class, as
shown in Figure 5-16.
// shapeutil.h
#ifndef INCLUDED_SHAPEUTIL
#define INCLUDED_SHAPEUTIL

class Shape;

struct ShapeUtil {
    static Shape *create(const char *typeName);
};

#endif
By adding a new component and escalating the Uses relationship to a higher level, we
have removed the cyclic dependencies among all components in the shape subsystem.
The levelized diagram for the new system is shown in Figure 5-17.
It is now possible for each concrete shape to be tested in isolation. Even the partial
implementation provided by class Shape can be tested modularly by deriving a concrete
"stub" class from Shape in the test driver for the shape component. Each of the
concrete shapes can now be reused independently of the rest in any combination. For
example, another system is now able to reuse circle and square without having to
link in triangle.
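The levelized arrangement might look like the following sketch, shown in a single translation unit with comments marking the component each piece would belong to. Circle, Square, StubShape, and the xCoord and yCoord accessors are hypothetical; the draw function is omitted to keep the example short.

```cpp
#include <cassert>
#include <cstring>   // std::strcmp

// --- shape.h (level 1): no knowledge of any derived class ---
class Shape {
    int d_xCoord;
    int d_yCoord;

  protected:
    Shape(int x, int y) : d_xCoord(x), d_yCoord(y) {}

  public:
    virtual ~Shape() {}
    void moveTo(int x, int y) { d_xCoord = x; d_yCoord = y; }
    int xCoord() const { return d_xCoord; }   // hypothetical accessor
    int yCoord() const { return d_yCoord; }   // hypothetical accessor
    virtual Shape *clone() const = 0;
};

// --- circle.h and square.h (level 2): depend only on shape ---
class Circle : public Shape {
  public:
    Circle() : Shape(0, 0) {}
    Shape *clone() const { return new Circle(*this); }
};

class Square : public Shape {
  public:
    Square() : Shape(0, 0) {}
    Shape *clone() const { return new Square(*this); }
};

// --- shapeutil.h and shapeutil.c (level 3) ---
struct ShapeUtil {
    static Shape *create(const char *typeName);
};

Shape *ShapeUtil::create(const char *typeName)
{
    assert(typeName);
    if (0 == std::strcmp(typeName, "circle")) {
        return new Circle;
    }
    if (0 == std::strcmp(typeName, "square")) {
        return new Square;
    }
    return 0;   // no shape with that type name
}

// --- test driver for the shape component: a concrete stub ---
class StubShape : public Shape {
  public:
    StubShape() : Shape(0, 0) {}
    Shape *clone() const { return new StubShape(*this); }
};
```

The stub lets the driver for shape exercise moveTo without linking any real shape, and only clients that call ShapeUtil::create pay for linking every concrete shape.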
It is now also possible to test each of E2, ... , En without having to link to every con-
crete shape. Since these components require only the shape base class interface, it may
be deemed sufficient to test the incremental value added by each of the editor compo-
nents e2, ... , en on only a representative sample of all available concrete shapes.
The advantage of this new design over the original is a reduction in coupling that will
translate directly into reduced development and maintenance costs while amplify-
ing the potential for reuse. It may be difficult to appreciate the importance of this
design approach when the number of implementation components in the editor, and,
particularly, the number of concrete shapes, is small. The real advantage is that this
new design scales up much better than the original as more editor commands and new
kinds of Shape are added.
At first glance, this new design may appear to be unnecessarily complicated, but in
fact it simplifies the job of both developer and client. Even with the additional component
in the new design, the coupling associated with hierarchically testing the shape
subsystem as measured by CCD is reduced by a full 25 percent. The coupling associated
with incrementally testing the editor subsystem is reduced by 17.4 percent, giving
an overall reduction in CCD of 20.5 percent.
Figure 5-18b illustrates the effect when the editor subsystem is made large (30 implementation
components instead of only 3). Now the reduction in component coupling
for the editor subsystem is nearly 46 percent, pushing the overall reduction in CCD to
43.3 percent.
Cyclic coupling at lower levels of the physical hierarchy can have a dramatic effect on
the cost of maintaining clients. As can be seen in Figure 5-18c, when the shape hierar-
chy is made large (30 concrete types instead of only 3), the advantage of the new
design, as measured by CCD, amounts not only to a reduction in coupling of over 90
percent in the shape subsystem but also a reduction of over 44 percent in the editor
subsystem, for a reduction of close to 85 percent overall. When both the shape sub-
system and editor are large, the overall percentage reduction in coupling continues to
improve, as shown in Figure 5-18d.
[Figure 5-18: values as printed in each panel, comparing the original design with the escalated design]

Panel  Reductions shown  Shape subsystem, original          Shape subsystem, escalated
(a)    17.4%, 25.1%      SIZE = 4, CCD = 16, NCCD = 2.10    SIZE = 5, CCD = 12, NCCD = 1.14
(b)    45.9%, 25.1%      SIZE = 4, CCD = 16, NCCD = 2.10    SIZE = 5, CCD = 12, NCCD = 1.14
(c)    44.3%, 90.3%      SIZE = 31, CCD = 961, NCCD = 8.47  SIZE = 32, CCD = 93, NCCD = 0.69
(d)    84.9%, 90.3%      SIZE = 31, CCD = 961, NCCD = 8.47  SIZE = 32, CCD = 93, NCCD = 0.69
The important lesson to be learned from this analysis is that a high degree of coupling
associated with lower-level subsystems can dramatically increase the cost of developing
and maintaining clients and subsystems at higher levels.
5.3 Demotion
Escalation and demotion are similar in that in either case, cyclic dependencies among
components are eliminated by moving the cyclicly dependent functionality to another
level in the physical hierarchy. Let us start by analyzing what happens during a more
general form of escalation. As illustrated in Figure 5-19, two mutually dependent
components (a) are factored into four components (b), two of which may be mutually
dependent and two of which are independent. The two higher-level components can
Now contrast this with the general process of demotion. As shown in Figure 5-20, two
mutually dependent components (a) are again factored into four components (b). Two
of the components depend on the two other components, which may be mutually
dependent. The two lower-level components can then be combined (c) if necessary to
avoid a cyclic dependency or, if cohesive, to reduce physical complexity.
Consider the situation shown in Figure 5-21, in which there are two geometric utility
classes, GeomUtil and GeomUtil2. Each of these utilities provides a suite of functions
that operate on points, lines, and polygons. External clients directly use one, the other,
or both. Unlike geomutil, geomutil2 is complex and depends on many other components,
and even exposes some new types in its interface. Those clients that need only
the basic geometric functionality provided in GeomUtil need not link with the
geomutil2 component.
Section 5.3 Demotion 231
// geomutil2.h
#ifndef INCLUDED_GEOMUTIL2
#define INCLUDED_GEOMUTIL2

class Line;
class Polygon;

struct GeomUtil2 {
    static int crossesSelf(const Polygon& polygon);
    static int doesIntersect(const Line& line1, const Line& line2);
    // ...
};

#endif
// geomutil.h
#ifndef INCLUDED_GEOMUTIL
#define INCLUDED_GEOMUTIL

class Point;
class Line;
class Polygon;

struct GeomUtil {
    static int isInside(const Polygon& polygon, const Point& point);
    static int areColinear(const Line& line1, const Line& line2);
    static int areParallel(const Line& line1, const Line& line2);
    // ...
};

#endif
It may also be the case that the two components have taken on distinct characteristics
due to the demands of the clients who depend on them. In that case, it might be more
appropriate to factor out the common functionality and demote it to a lower level in
the physical hierarchy, as shown in Figure 5-22. That is, we can move both the
doesIntersect and areColinear functions to GeomUtilCore.
Notice again that these utility classes are merely scopes in which to declare static
member functions-they were never intended to be used to create objects. By
employing the "trick" of making both of the original utilities derive publicly from a
common core, clients of the original utilities will not need to alter their code if one or
more of the utility functions they use is demoted.
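The derivation trick can be sketched as follows. This is only an illustration under stated assumptions: the slope/intercept Line stand-in and the function bodies are hypothetical (and ignore vertical lines), chosen so that the fragment compiles on its own.

```cpp
// Hypothetical stand-in for the real Line class (assumption: a line is
// described by slope and intercept; vertical lines are ignored).
struct Line {
    double slope;
    double intercept;
};

// Demoted core: the functionality common to both original utilities.
struct GeomUtilCore {
    static int areColinear(const Line& line1, const Line& line2)
    {
        return line1.slope == line2.slope
            && line1.intercept == line2.intercept;
    }

    static int doesIntersect(const Line& line1, const Line& line2)
    {
        // Lines with different slopes cross; equal slopes meet only if
        // the two lines coincide.
        return line1.slope != line2.slope || areColinear(line1, line2);
    }
};

// The original utilities derive publicly from the core, so an existing
// call such as GeomUtil::areColinear(a, b) continues to compile.
struct GeomUtil : public GeomUtilCore {
    // ... functions that remain specific to GeomUtil
};

struct GeomUtil2 : public GeomUtilCore {
    // ... functions that remain specific to GeomUtil2
};
```

A client that previously wrote GeomUtil2::doesIntersect(a, b) is unaffected by the move, because the inherited static member is found through the derived scope.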
Demotion is a useful tool for reducing the CCD of some designs even when there are
no cyclic dependencies. Suppose a component x depends on only a part of another
complex component y with a high CCD as shown in Figure 5-23a. If we can demote
the common part of y we may be able to spare x some of the physical dependencies
incurred by y (see Figure 5-23b).
Demoting common code enables independent reuse.
Figure 5-24 illustrates a situation in which the enumerated values defined in sub-
system A are used throughout the entire system, yet subsystem B is otherwise inde-
pendent of subsystem A.
struct ScopeOfE {
    enum E { /* ... */ };
};
It may seem that placing a single enumeration in its own class is overkill. In some
cases that is so, but not here. Notice that placing this tiny bit of code in its own com-
ponent has freed subsystem B from the considerable maintenance burden of having to
drag around all of subsystem A.
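As a rough sketch of the effect, consider a class in subsystem B that needs only the enumeration. The enumerator names and the class BThing are hypothetical, introduced purely for illustration.

```cpp
// scopeofe.h (hypothetical leaf component holding only the enumeration)
struct ScopeOfE {
    enum E { e_LOW, e_HIGH };   // enumerator names are assumptions
};

// A class in subsystem B. It needs the enumerated values and nothing
// else, so it depends on the tiny scopeofe component rather than on
// the whole of subsystem A.
class BThing {
    ScopeOfE::E d_level;

  public:
    explicit BThing(ScopeOfE::E level) : d_level(level) {}
    ScopeOfE::E level() const { return d_level; }
};
```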
In the architecture shown in Figure 5-26, the parser is tightly coupled to the runtime
data structure in a single subsystem at the bottom of the system hierarchy. Conse-
quently, we might expect to see a member function of the form
[Figure 5-26: Poorly Factored Runtime Database Architecture. Layers, top to bottom: system; processor; parser & runtime-db.]
At the next level, the processor, which operates off the runtime database, is forced to
depend on the combined parser and runtime database subsystem. The system component
is relatively small and manages both the loading and processing of the runtime database.
Although the above architecture is levelizable, it portends some potentially severe conse-
quences with respect to maintenance and enhancement. The development of a processor
is coupled to both the parser and the runtime database, even though a parser is not needed
for processing. As the system expands and we decide to add more processors, each pro-
cessor must bear the unnecessary burden of linking to the parser during development.
Suppose we decide to change the format of the input file or (worse) make use of multiple
formats. Now, instead of just a single read command, the runtime database must
support several:
[Figure: system above a single subsystem combining parser A, parser B, parser C, and runtime-db.]
The consequence of this subsystem architecture is that all existing parsers must be
linked in whenever we are:
In the original architecture, the database depends on the parser to load the informa-
tion. However, on closer examination (see Figure 5-28), we realize that there is (or
should be) an almost acyclic relationship between the runtime database and the parsers.
The database is a low-level repository for information into which clients (such as
parsers) deposit information and from which clients (such as processors) access and
possibly manipulate information. Each parser depends on the runtime database to
store the parsed information. The problem lies in the gratuitous "upward" dependency
of the runtime database on a parser.
[Figure: parser subsystem (p1, p2, ..., pn) above runtime database subsystem (r1, r2, ..., rn), with a gratuitous upward dependency from the runtime database on a parser.]
If subsequent processing should not alter the runtime database but, say, merely generate
reports, the system can ensure that the database is not overwritten by passing the
processor a read-only (const) reference to the loaded RuntimeDB:
With this new architecture, any number of independent processors can be added to the
system and none of them will depend on any parser. Similarly, parsers can be replaced
or added without affecting the runtime database, processors, or other parsers in any
way. With this architecture it is not hard to imagine that the database, parsers, and pro-
cessors could be reused in various combinations in other standalone applications (e.g.,
translators, archivers, and browsers).
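The demoted arrangement might look roughly like the sketch below. Only the name RuntimeDB comes from the text; LineParser, CountReport, and every member function shown are hypothetical, and the record format (one string per input line) is an assumption made to keep the example small.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Low-level repository: knows nothing about parsers or processors.
class RuntimeDB {
    std::vector<std::string> d_records;

  public:
    void add(const std::string& record) { d_records.push_back(record); }
    std::size_t numRecords() const { return d_records.size(); }
    const std::string& record(std::size_t i) const { return d_records[i]; }
};

// A parser sits above the database and deposits information into it.
struct LineParser {
    // Split 'text' on newlines and store each non-empty line.
    static void load(RuntimeDB *db, const std::string& text)
    {
        std::string::size_type start = 0;
        while (start < text.size()) {
            std::string::size_type end = text.find('\n', start);
            if (end == std::string::npos) {
                end = text.size();
            }
            if (end > start) {
                db->add(text.substr(start, end - start));
            }
            start = end + 1;
        }
    }
};

// A processor also sits above the database but never sees a parser.
// The const reference means it cannot modify the loaded data.
struct CountReport {
    static std::size_t totalLength(const RuntimeDB& db)
    {
        std::size_t total = 0;
        for (std::size_t i = 0; i < db.numRecords(); ++i) {
            total += db.record(i).size();
        }
        return total;
    }
};
```

The system component would create a RuntimeDB, hand its address to whichever parser is appropriate, and then pass the loaded database by const reference to each processor. Neither LineParser nor CountReport mentions the other.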
As a final example of the power of demotion, consider the subsystem shown in Figure
5-30, in which three related components are cyclicly dependent.
6 Normally a utility class is either just a struct to provide a scope for a collection of related free
functions or a module (i.e., the class contains only static data members). In either case, it is not
meaningful to instantiate instances of such a class because they contain no state associated with a
particular instance. See "Class Utilities" in booch, Chapter 5, pp. 186-187.
The problem arises in part because the single Library class serves as both a repository
of low-level information and a collection of (higher-level) reports. Fortunately
there are a couple of alternative solutions. First, by demoting the low-level repository
below the rest of the subsystem, we can eliminate the cyclic coupling (as shown in
Figure 5-31).
Another solution begins by recognizing that a single class, Report, has been used for
two distinct purposes:
Having a single class serve this dual role is also partially to blame for the cyclic
dependency. The Library depends directly on the interface of the base class, Report,
but only indirectly on its implementation, through the use of virtual functions.
Consider what would happen if we split Report into two classes. The first class would
define the interface specified in the original Report class but would not implement any
of the functions. That is, every function in class Report would now be declared a pure7
virtual function. The second class, call it ReportImp, would derive from Report and provide
the generic report implementation by overriding the appropriate virtual functions.
Now it is possible to break the cyclic dependency in the original system (Figure 5-30)
by demoting only the interface defined in the Report base class below the level of
Library. Class ReportImp, which implements common functionality and depends on
StatUtil, remains at a higher level in the physical hierarchy, as shown in Figure 5-32.
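A minimal sketch of the split follows. The class names Report, ReportImp, Library, and StatUtil come from the text; the member functions (title, summary, describe, mean) are hypothetical, and StatUtil is reduced to a single stand-in function so that the fragment is self-contained.

```cpp
#include <string>

// Lowest level: pure interface, demoted below Library. Nothing is
// implemented here, so there is no dependency on StatUtil.
class Report {
  public:
    virtual ~Report() {}                     // virtual, but not pure
    virtual std::string title() const = 0;
    virtual double summary() const = 0;
};

// Library depends only on the Report interface.
class Library {
    const Report *d_report_p;                // held, not owned

  public:
    explicit Library(const Report *report) : d_report_p(report) {}
    std::string describe() const { return d_report_p->title(); }
};

// Hypothetical stand-in for the statistical utility.
struct StatUtil {
    static double mean(double a, double b) { return (a + b) / 2.0; }
};

// Higher level: the generic partial implementation, free to use
// StatUtil without dragging it beneath Library.
class ReportImp : public Report {
  public:
    virtual std::string title() const { return "generic report"; }
    virtual double summary() const { return StatUtil::mean(2.0, 4.0); }
};
```

Library can be compiled and tested against any class derived from Report, including one that never mentions StatUtil.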
Which solution is better? The first solution (Figure 5-31) of factoring the Library
class is ideal for maintenance, because the low-level repository can be developed
independently of the report collection, as can the statistical utility component. The
second solution (Figure 5-32) forces the entire unfactored Library, and therefore the
low-level repository and statistical utility, to be sandwiched between the interface and
partial implementation of Report. From this perspective, the first solution is preferable.
However, there are other reasons that a single base class should define either the
interface or the factored implementation but not both. Separating the interface from
the (partial) implementation of a base class is discussed in detail in Section 6.4.1.
7 The destructor would be declared virtual, but not pure virtual (see Section 9.3.3).
Figure 5-32: Demoting "Just" the Interface of the Report Base Class
In this new architecture (shown in Figure 5-33) the physical structure exhibits more
flexibility than it does in any previous architecture. To avoid unnecessary compile-time
coupling, we would want to separate Report from its partial implementation in
any case. Doing so also allows us to test Report, Collection, and Library by creating
a very simple test-stub, ReportC, that does not use or depend on StatUtil (see Figure
5-34).
Factoring the library component is advantageous because it further reduces the physical
coupling in the subsystem. The separation is particularly appropriate because
we've made StatUtil depend only on Repository, while Collection depends only
on Report, adding considerable flexibility to the hierarchy.
[Figure: three panels: (a) Repository subsystem, (b) StatUtil subsystem, (c) ReportA subsystem.]
Escalation and demotion are closely related. What differentiates escalation from
demotion in character is merely the direction in which a relatively small amount of
offending functionality is moved. In fact, escalation and demotion are actually both
just special cases of the more general repackaging technique illustrated in Figure 5-36.
Here, two mutually dependent components (a) are once again factored into four components
(b). Two of these components, x' and y', may depend only on each other
while the other two components, x'' and y'', potentially depend on each of the other
three components. The two respective pairs of, perhaps, mutually dependent components
may now be recombined into two new components (c). Component u now
depends on component v, which is independent. This general repackaging technique
was applied informally to the components geomutil and geomutil2 discussed at the
beginning of this section.
5.4 Opaque Pointers
Normally we assume that if a function uses an object of type T, it does so in a way that
requires knowing the definition of T. That is, in order to compile the body of the func-
tion, the compiler needs to know the size and layout of the object it uses. The way a
compiler learns the size and layout of an object in C++ is for the component using the
object to include the header file of the component containing the object's class definition.
If a function body can be compiled having seen only the declaration of type T (e.g.,
class T;), then that function itself does not depend on the definition of T. The significance
of using a type in size is that such use induces an immediate compile-time
dependency on the component defining T. (Avoiding unnecessary compile-time
dependencies is the topic of Chapter 6.) The body of a function f using type T in name
but not in size typically, however, calls one or more functions in other components
that, in turn, do depend on the definition of T. In this situation there would continue to
be a link-time dependency of f on T.
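The distinction can be illustrated with a small sketch. The function names passThrough and makeAndRead, and the members of T, are hypothetical; the definition of T is placed in the same fragment only so that the example is self-contained, whereas in a real system it would arrive by including another component's header.

```cpp
class T;                      // declaration only

// Uses T in name: this body compiles having seen only the declaration
// above, because it does nothing but copy an address.
T *passThrough(T *object)
{
    return object;
}

// Definition of T (assumed members, for illustration).
class T {
    int d_value;

  public:
    explicit T(int value) : d_value(value) {}
    int value() const { return d_value; }
};

// Uses T in size: creating a T and calling one of its members both
// require the compiler to have seen the definition of T.
int makeAndRead(int value)
{
    T local(value);
    return local.value();
}
```

Moving passThrough above the definition of T is what demonstrates the point: the compiler accepts it there, while makeAndRead would be rejected in the same position.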
If a function f and all components on which f depends can be compiled and linked,
having seen only the declaration but not the definition of T, then f is said to use T in
name only. For example,
// util.h
#ifndef INCLUDED_UTIL
#define INCLUDED_UTIL

class SomeType;    // declaration only; the definition is never needed

struct Util {
    SomeType *f(SomeType *obj);
};

#endif

// util.c
#include "util.h"
// ...
illustrates a function f using a type SomeType in name only. The significance of using
a type in name only is that there is no implied physical dependency by such use-
even at link time. Without the physical dependency, the coupling is all but eliminated.
Similar definitions can be constructed for a class that uses a type in size or in name
only. Even more useful is that these definitions can be extended to apply to compo-
nents as a whole.
We use the first of these two definitions in Chapter 6. For now, we focus on the ramification
of the second of these two component-level definitions. Note that, as illustrated
in Figure 5-37, a component u that uses a T object in name but also depends on
another component v that, in turn, uses T in size, by transitivity, does not use T in name
only. Component u depends physically on component v and indirectly on component t.
Figure 5-37: Component u Does Not Use Type T In Name Only
The dashed-line form of the uses notation "0- - - - " denotes that the use is "in name
only" and imposes a conceptual but no physical dependency.
Situations involving using a type in name only rarely arise naturally; they are usually
contrived to avoid unwanted physical dependencies. Using a type in name only is pos-
sible when the component doing the "using" refers to the object only by pointer or ref-
erence, and never interacts with the object directly in any way other than to hold its
address.
// handle.h
#ifndef INCLUDED_HANDLE
#define INCLUDED_HANDLE

class Foo;

class Handle {
    Foo *d_opaque_p;

  public:
    Handle(Foo *foo) : d_opaque_p(foo) {}
    void set(Foo *foo) { d_opaque_p = foo; }
    Foo *get() const { return d_opaque_p; }
};

#endif
A pointer is said to be opaque if the definition of the type to which it points is not
included in the current translation unit. Figure 5-38 shows a trivial example of a class
that holds an opaque pointer to an instance of some class named Foo. The client of the
Handle class will ultimately have to include the header file of a component that
defines Foo in order to come up with a Foo object. For testing purposes, any class Foo
will do, including even a mere class declaration as Figure 5-39 demonstrates.
// handle.t.c
#include "handle.h"
#include <assert.h>

int main()
{
    Foo *p1 = (Foo *) 0xBAD;
    Foo *p2 = (Foo *) 0xB0B;
    Handle handle(p1);
    assert(p1 == handle.get());
    handle.set(p2);
    assert(p2 == handle.get());
}
The significance of this example is that it was possible to exercise the functionality of
the Handle class completely without having to include or link to any component
defining class Foo. Such is a litmus test for whether another type has been used not
only opaquely, but also in name only.
Suppose Screen is a container for Widget objects, and suppose furthermore that
each Widget holds a pointer, d_parent_p, identifying the Screen to which the
Widget belongs. Now consider the interfaces for the widget and screen components
suggested in Figure 5-40, and in particular the accessor member function
numberOfWidgetsInParentScreen of class Widget.
This function allows a client holding nothing but a Widget to find out how many other
Widget objects there are in the Screen to which the Widget belongs. From a pure
usability perspective, this architecture may seem appealing; from a maintainability
perspective, it is expensive.
// screen.h
#ifndef INCLUDED_SCREEN
#define INCLUDED_SCREEN
// ...
#endif

// widget.h
#ifndef INCLUDED_WIDGET
#define INCLUDED_WIDGET
// ...
#endif
The problem with the maintainability of this design is that in order to implement the
numberOfWidgetsInParentScreen method in the widget.c file, we will need to
"ask" the parent Screen for this information. Asking Screen anything implies having
seen its definition, which is accomplished by first including screen.h in widget.c.
But doing so leads to the unlevelizable situation depicted in Figure 5-41.
// widget.c
#include "widget.h"
#include "screen.h"
// ...
int Widget::numberOfWidgetsInParentScreen() const
{
    return d_parent_p->numWidgets();
}
The fundamental problem here is that a Widget is trying to do more than it should. A
Widget has functionality that makes sense in its own context, but it cannot in general
know about other Widget objects without asking its parent Screen. Consider again
the analogy to a corporation. You can ask any employee, "What are you doing?" and
the employee should be able to tell you. Similarly, you can always ask the employee,
"Who is your boss?" In contrast, try asking the employee for the number of employees
who work for his or her boss. In general the employee will not know the answer,
and will need to go to the boss and ask.
Actually it is none of the employee's business how many other employees work for
the boss. Consider this alternate approach. Suppose you want to know how many
employees work for my boss. Instead of asking me that question, ask me, "Who is
your boss?" I will tell you, and then you can go and ask her yourself how many
employees work for her. If she wants to tell you, she will.
The use of opaque pointers (used in name only) can serve to break unwanted cyclic
component dependencies. Turning back to our programming example, consider the
alternate definition for the widget component shown in Figure 5-42. In this usage
model, it is possible to ask the Widget for its parent Screen. We will then be able to
ask the parent Screen about its other Widget objects (or anything else, for that matter).
The principal benefit for this model, however, is that component widget no
longer depends on component screen at either compile or link time. The dependency
of widget on screen is now in name only.
// widget.h
#ifndef INCLUDED_WIDGET
#define INCLUDED_WIDGET

class Screen;

class Widget {
    Screen *d_parent_p;     // screen to which this widget belongs
    // ...

  public:
    Widget(Screen *screen);
    // ...
    Screen *parentScreen() const;
    // ...
};

#endif
The new component dependency graph for screen and widget is shown in Figure 5-43.
With this new architecture, it is possible to test all the functionality of widget
independently of the screen component. Other components that use widgets but do
not care about screens need not include screen.h or link to screen.o.
[Figure: component dependency graph. Level 2: screen; Level 1: widget. The use of Screen in the interface of widget is in name only.]
A small test driver that demonstrates the physical independence of widget on screen
is shown in Figure 5-44.
// widget.t.c
#include "widget.h"
#include <iostream>

int main()
{
    Screen *const screen = (Screen *) 0xbad;
    Widget widget(screen);
    if (screen != widget.parentScreen()) {
        std::cout << "Error!" << std::endl;
    }
    // ...
}
widget.parentScreen()->numWidgets()
For convenience, these two operations could be combined into a static member
function of Screen or some other, higher-level class. Instead of saying
widget.numberOfWidgetsInParentScreen()
clients would say
Screen::numberOfWidgetsInParentScreen(widget)
to obtain the value. In either case, this interface forces clients to look outside the
widget component's interface in order to obtain the answer to their question.
Note that when moving functionality from the contained object to the container, the
first argument of each new static member will be either a const reference or a non-const
pointer to the contained object, depending, respectively, on whether the
original member was a const or non-const function. The rationale for this style of
argument passing is taken up in Section 9.1.11.
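Both conventions can be seen in the following sketch. The static member numberOfWidgetsInParentScreen is taken from the text; the adopt function, setParentScreen, and the simplified counting of widgets are hypothetical additions used only to show a former non-const operation alongside a former const one.

```cpp
class Screen;

// widget component: uses Screen in name only.
class Widget {
    Screen *d_parent_p;

  public:
    explicit Widget(Screen *parent = 0) : d_parent_p(parent) {}
    Screen *parentScreen() const { return d_parent_p; }
    void setParentScreen(Screen *parent) { d_parent_p = parent; }
};

// screen component: at a higher level, free to use Widget in size.
class Screen {
    int d_numWidgets;       // simplified: a count, not a collection

  public:
    Screen() : d_numWidgets(0) {}
    int numWidgets() const { return d_numWidgets; }

    // Formerly a const member of Widget, so the widget is passed by
    // const reference.
    static int numberOfWidgetsInParentScreen(const Widget& widget)
    {
        return widget.parentScreen()->numWidgets();
    }

    // Hypothetical operation that modifies the widget, so the widget
    // is passed by non-const pointer.
    static void adopt(Widget *widget, Screen *newParent)
    {
        Screen *oldParent = widget->parentScreen();
        if (oldParent) {
            --oldParent->d_numWidgets;
        }
        widget->setParentScreen(newParent);
        ++newParent->d_numWidgets;
    }
};
```

At the call site, Screen::adopt(&widget, &screen) makes the possibility of modification visible through the address-of operator, while Screen::numberOfWidgetsInParentScreen(widget) reads as a pure query.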
5.5 Dumb Data
The term dumb data refers to a generalization of the concept of opaque pointers. Dumb
data is any kind of information that an object holds but does not know how to interpret.
Such data must be used in the context of another object, usually at a higher level.
An initial cut at the top-level track component is given in Figure 5-46. In this architecture,
a Track holds a collection of Race objects and supplies a RaceIter to iterate
over today's races at the track. The Track takes bets and issues (pointers to) Wager
objects, which can be redeemed after the race is completed.
// track.h
#ifndef INCLUDED_TRACK
#define INCLUDED_TRACK

class Horse;
class Race;
class RaceIter;
class Track;

class Wager {
    const Horse& d_horse;
    double d_amount;
    // ...
    Wager(const Horse& horse, double amount);  // For track's use only
    Wager(const Wager&);                       // i.e., not for use
    Wager& operator=(const Wager&);            // by the public.
    friend class Track;

  public:
    const char *horseName() const;
    int raceNumber() const;
    Track& track() const;
    double amount() const;
};

class Track {
    Race *d_races_p;
    // ...
    friend class RaceIter;

  public:
    // ...
    const Race *lookupRace(int raceNumber) const;
    const Horse *lookupHorse(const char *horseName) const;
    Wager *bet(const Horse& horse, double wagerAmount);
    double redeem(Wager *bet) const;
};

class RaceIter {
    // ...
  public:
    RaceIter(const Track& track);
    void operator++();
    operator const void *() const;
    const Race& operator()() const;
};

#endif
Each Race object maintains the number of that race, the post time for that race, and
the collection of horses running in that race. The race component also provides a
HorseIter to iterate over the horses running in a specified Race. Given a Race object,
it is possible to determine at which track the race will be run. A rough version of the
race component is shown in Figure 5-47.
// race.h
#ifndef INCLUDED_RACE
#define INCLUDED_RACE

class Horse;        // declared so that this header compiles on its own
class Track;
class HorseIter;

class Race {
    // ...
    friend class HorseIter;

  public:
    Race(const Track& track, int raceNumber, double postTime);
    // ...
    int number() const;
    double postTime() const;
    const Track *track() const;
};

class HorseIter {
    // ...
  public:
    HorseIter(const Race& race);
    void operator++();
    operator const void *() const;
    const Horse& operator()() const;
};

#endif
A Horse is defined at the lowest level of the racetrack subsystem's physical hierarchy.
A Horse maintains its name and number, and it can be used to determine in which
race it is scheduled to run. A first cut at our leaf-level horse component is sketched in
Figure 5-48.
// horse.h
#ifndef INCLUDED_HORSE
#define INCLUDED_HORSE

class Race;

class Horse {
    const Race& d_race;
    char *d_name_p;
    int d_number;
    // ...

  public:
    Horse(const Race& race, const char *horseName, int horseNumber);
    // ...
    const char *name() const;
    int number() const;
    const Race *race() const;
};

#endif
In this initial implementation, a Wager is implemented with only two data members as
follows:
class Wager {
    const Horse& d_horse;
    double d_amount;
    // ...
  public:
    // ...
};
The functionality for the horse racetrack system described above implies maintaining
a cyclic internal data structure: Each Track knows the races that it holds, each Horse
knows in which race it runs, and each Race knows both the track in which it is held
and the horses that will participate in it. However, this data structure can be implemented
with acyclic physical dependencies by using opaque pointers as shown by the
component/class diagram of Figure 5-49.
The original architecture presented for the racetrack subsystem has no cyclic physical
dependencies but is nonetheless cyclicly dependent in name. Although having two com-
ponents that each know the names of one or more objects defined in the other compo-
nent is not necessarily bad, there are trade-offs to be made that will be discussed shortly.
Suppose that, instead of identifying the objects in this system by their absolute
addresses, we identify them in terms of indices into a sequence of objects that has
meaning only in the context of the parent object.
The Track would then hold a sequence (array) of Race objects, and each Race would
have an associated integer "index." The Race index would be meaningful only in the
context of a Track object. Since the Race indices can be made to correspond to the
publicly accessible Race numbers, the need for a RaceIter is reduced, provided we
supply an accessor for Track to report the total number of races held today.
By the same argument, each Horse in a Race is naturally assigned a number. Given a
Race that has a sequence of horses, we can identify the Horse within a Race by supplying
its index relative to that race. We therefore can also dispense with the
HorseIter for Race.
When it comes to redeeming wagers, the Track defines a context that is much smaller
than the entire address space (accessible via pointers). In the original implementation,
we used opaque back pointers beginning with Horse, and moved in a bottom-up fashion
to arrive at the Race and finally the Track. In the proposed implementation, the limited
context of the Track is exploited to identify the Race and Horse using a pair of integer
indices as shown in Figure 5-50.
class Wager {
    const Track& d_track;
    double d_amount;
    short int d_raceIndex;
    short int d_horseIndex;
    // ...
  public:
    Wager(const Track& track,
          int horseNumber,
          int raceNumber,
          double amount);
    const Track& track() const;
    double amount() const;
    int horseNumber() const;
    int raceNumber() const;
    // ...
};

class Track {
    Race *d_races_p;
    int d_numRaces;
    // ...
  public:
    Wager *bet(int race, int horse, double amount);
    double redeem(Wager *bet) const;
    const Race *lookupRace(int raceNumber) const;
    const Horse *lookupHorse(const char *horseName) const;
    const Horse *lookupHorse(const Race& race, int horseNumber) const;
    int numRaces() const;
    // ...
};
Observe that, because of the very limited context, we can safely use 16-bit instead of
32-bit integers. This fact could be significant if the number of outstanding wagers at
any one time becomes very large. For example, on my 32-bit machine, where a double
is 8 bytes long and naturally aligned,9 the size of the wager objects drops from 24 to
16 bytes when we make the indices short integers, a savings of 33 percent!
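The arithmetic can be checked with a small sketch. The two structs below are hypothetical stand-ins, not the Wager class itself: fixed-width integers replace the reference and the indices so that the result does not depend on the native pointer size, and the members are ordered largest first, which is an assumption about layout rather than something stated in the text. The sketch also assumes an 8-byte double.

```cpp
#include <cstdint>

// Indices stored as 32-bit integers: 8 + 4 + 4 + 4 = 20 bytes of data,
// which is rounded up to 24 where double is 8-byte aligned.
struct WagerWithIntIndices {
    double       amount;        // 8 bytes
    std::int32_t trackRef;      // 4 bytes, models a 32-bit reference
    std::int32_t raceIndex;     // 4 bytes
    std::int32_t horseIndex;    // 4 bytes
};

// Indices stored as 16-bit integers: 8 + 4 + 2 + 2 = 16 bytes, with no
// padding required.
struct WagerWithShortIndices {
    double       amount;        // 8 bytes
    std::int32_t trackRef;      // 4 bytes, models a 32-bit reference
    std::int16_t raceIndex;     // 2 bytes
    std::int16_t horseIndex;    // 2 bytes
};

static_assert(sizeof(WagerWithShortIndices) == 16,
              "expected 16 bytes with short indices");
static_assert(sizeof(WagerWithShortIndices) < sizeof(WagerWithIntIndices),
              "short indices should make the object smaller");
```

The exact size of the larger struct depends on how the platform aligns double, which is why only the smaller size and the ordering of the two sizes are asserted.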
Figure 5-51 illustrates the revised architecture for the racetrack subsystem. The new
system is significantly simpler. This system has no cyclic dependencies between components
(not even in name) and significantly fewer classes. The principal change
was simply the way in which a Horse is identified.
Dumb data can be more convenient and occasionally more compact than opaque
pointers for identifying other objects. Had the new Wager object identified the Race
and Horse by opaque pointer instead of short integer index, the size of Wager would
again be 24 instead of 16 bytes on my machine.
Another advantage is that the values stored as dumb data are not machine addresses and
therefore contain meaningful values that can be tested explicitly. In the horse-racing
application, the indexed approach is particularly appealing, because the indices
(which are publicly accessible) do have legitimate utility in the user domain. It is not
uncommon to hear a frequent patron of the track request of a parimutuel ticket agent
at the betting window: "Gimme 2 bucks on number 4 in the 9th (to win)!"
A disadvantage of the indexed approach is that it does sacrifice a fair degree of type
safety compared to opaque pointers in that the Race and Horse indices are just integers.
Another drawback is that this implementation forces the Race and Horse collections
to be indexed rather than to remain arbitrary collections. The resulting erosion of
encapsulation could easily have a negative impact on maintainability if exposed to the
general public.
In situations other than our horse-racing example, the dumb-data indices used to iden-
tify subobjects in this way might very well be meaningless to clients of the subsystem.
For these reasons, the use of dumb data is typically an optimized implementation tech-
nique encapsulated within a subsystem and not exposed at the higher levels of a system.
As a similar but more serious example, consider the task of modeling the connectivity
within a circuit consisting of a heterogeneous collection of electrical components.10 A
gate-level circuit, such as the one used to introduce levelization in Figure 4-10 of Sec-
tion 4.7, can be represented as a graph consisting of nodes (called gates) and edges
(called wires). Each gate has a collection of electrically distinct connection points
(called terminals). Conceptually, representing a circuit amounts to maintaining a hetero-
geneous collection of gates and a homogeneous collection of bidirectional wires. Each
wire is attached to two distinct terminals within the circuit, establishing connectivity.
[Figure: circuit C with external terminals a, b, c, d, and e, containing gates g0 and g1, each with terminals x, y, and z.]
In order to traverse the graph, a Terminal must maintain an opaque pointer to its parent
Gate or Circuit. Note that Circuit can be treated as just a special kind of Gate
that contains instances of other gates.11 Cyclic physical component dependencies can
be avoided by using opaque pointers as shown in the partial component/class diagram
of Figure 5-53 (with collection iterators omitted).
10 This example describes the application of dumb data in a very different context. The basic
technique, however, is the same as for the racetrack example.
Here again there is an opportunity to break even the nominal cyclic dependencies by
defining a connection "in context." If a Circuit contains an indexed collection (an
array) of gates, and similarly each Gate contains an array of terminals, then we can
identify a connection point in the context of a Circuit as a simple pair of integer indices.
11 This example illustrates another instance of recursive composition, a design pattern called
Composite in gamma, Chapter 4, pp. 163-173. This pattern was seen previously in terms of Node,
File, and Directory in Figure 4-13 of Section 4.7. The Composite design pattern has been used
effectively to implement hierarchical circuit descriptions.
Consider again the example of Figure 5-52. Suppose the implementation of the circuit
consists of an array of two gates, g0 and g1, with indices 0 and 1, respectively. The
terminals for both g0 and g1 are x, y, and z, and happen to have indices 0, 1, and 2
respectively. We can now describe the connection point "terminal x of gate g1" as the
pair of integer indices (1, 0). We can similarly describe the connection point z of g0 as
the coordinate pair (0, 2).
By convention, we can identify the enclosing circuit by using an index outside the legal
range for gate indices (such as -1). If the circuit's terminal a has index 0, its connection
coordinates could be represented as the pair of indices (-1, 0). The complete list of
connections for this circuit, described in terms of integer coordinates, is provided in
Figure 5-54.
C.a  = (-1, 0)  ...  ( 0, 0) = g0.x
C.b  = (-1, 1)  ...  ( 0, 1) = g0.y
C.c  = (-1, 2)  ...  ( 1, 1) = g1.y
g0.z = ( 0, 2)  ...  ( 1, 0) = g1.x
g0.z = ( 0, 2)  ...  (-1, 4) = C.e
g1.z = ( 1, 2)  ...  (-1, 3) = C.d
class Connection {
    int d_gateIndex;
    int d_terminalIndex;
  public:
    Connection(int gateIndex, int terminalIndex);
    int gateIndex() const;
    int terminalIndex() const;
};
The graph-like nature of Circuit is not evident from the subcomponents. The connectivity
of the circuit is not established until the level of the component that defines
the GateArray class, because it is only at that level that sufficient context exists to
understand the implied graph. Users of Circuit need not necessarily be exposed to
the lower-level Gate and Terminal classes, and may wind up "programming" the circuit
by specifying gates and terminals by names that are translated to indices internally.
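To make the idea of context concrete, here is a minimal sketch in which a higher-level class interprets the index pairs. Connection follows the class shown earlier; the Circuit members (Wire, SELF, connect, numWiresAt) are hypothetical and stand in for whatever the real Circuit or GateArray would provide.

```cpp
#include <cstddef>
#include <vector>

// Dumb data: a (gate index, terminal index) pair that has meaning only
// in the context of the circuit that holds it.
class Connection {
    int d_gateIndex;
    int d_terminalIndex;

  public:
    Connection(int gateIndex, int terminalIndex)
    : d_gateIndex(gateIndex), d_terminalIndex(terminalIndex) {}
    int gateIndex() const { return d_gateIndex; }
    int terminalIndex() const { return d_terminalIndex; }
};

// Hypothetical higher-level class that supplies the context.
class Circuit {
    struct Wire {
        Connection from;
        Connection to;
    };
    std::vector<Wire> d_wires;

  public:
    enum { SELF = -1 };     // gate index denoting the enclosing circuit

    void connect(const Connection& from, const Connection& to)
    {
        Wire wire = { from, to };
        d_wires.push_back(wire);
    }

    // Count the wire ends attached to terminals of the given gate.
    int numWiresAt(int gateIndex) const
    {
        int count = 0;
        for (std::size_t i = 0; i < d_wires.size(); ++i) {
            if (d_wires[i].from.gateIndex() == gateIndex) ++count;
            if (d_wires[i].to.gateIndex() == gateIndex) ++count;
        }
        return count;
    }
};
```

With this sketch, the first row of Figure 5-54 would be entered as connect(Connection(Circuit::SELF, 0), Connection(0, 0)). Nothing in Connection itself knows what a gate or a terminal is.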
To conclude this section: dumb data is a generalization of opaque pointers that can
facilitate the implementation of subsystems, in which low-level objects must implicitly
refer to other low-level objects. This technique is especially indicated where these
references need not be interpreted at the lower levels of the subsystem, but only in the
context of some (usually) higher-level object. This restricted context can allow for
more compact implementations, though it is at the expense of both type safety and
encapsulation. The use of dumb data is typically a low-level implementation detail
and often not exposed in the interfaces of higher-level subsystems.
5.6 Redundancy
Reuse of any kind implies some form of coupling. In some cases the coupling may be
severe. In this book, redundancy refers to the technique of deliberately repeating code
or data in order to avoid unwanted physical dependencies brought on by reuse.
Redundancy is indicated when the functionality exists in a separate physical unit, the
amount of functionality to be reused is relatively small, and the amount of coupling
that would result is so disproportionately large as to outweigh the benefit of the reuse.
For cases where the amount of reuse would be substantial, it is often appropriate to
demote the common code to a lower level where it can be shared.
Even within a single subsystem there is a threshold below which reuse of external
functionality may not be advantageous. Consider two large components that are independent.
It is possible that one of these components implements a tiny piece of functionality
(such as min, max, etc.) that the other could reuse. Demoting this tiny piece of
the implementation to a separate component would unjustifiably increase the physical
complexity of the subsystem. Causing one of these components to depend on the
other just for such a small amount of reuse would unjustifiably increase the CCD of
the subsystem. Allowing one component to dominate the other reduces flexibility for
adding other dependencies resulting from future enhancements. Sometimes a viable
alternative to reuse is simply to repeat the code and avoid the coupling.
destroyed when the object is destroyed. An accessor in the public interface of the
object supplies the name (as a const char *) on request. Other than the name, there
is no use made of String in this object.
(a)
// cell.h
#ifndef INCLUDED_CELL
#define INCLUDED_CELL

#ifndef INCLUDED_STR
#include "str.h"
#endif

#endif

(b)
// cell.h
#ifndef INCLUDED_CELL
#define INCLUDED_CELL

#endif
allocated buffer. Perhaps the biggest benefit is not having to worry about deleting this
String in the Cell's destructor.
To experienced C programmers, none of the above should present any noticeable maintenance
problem. The disadvantage of depending on String is that it is extra baggage
that must follow the cell component around. If str is not part of the same subsystem
or depends on other components, then using String (instead of just a char *) could
result in having to drag around other components or even libraries, further increasing the
burden of using Cell.
As defined in Figure 5-56a, Cell has a String and therefore depends on component
str in size. All clients of Cell will be saddled with not just a link-time but also a
compile-time dependency on component str. This problem is avoided if Cell is
defined as shown in Figure 5-56b. (The issues surrounding unnecessary compile-time
dependencies are the subject of Chapter 6.)
In cases such as the one shown in Figure 5-56, avoiding coupling to the str component
probably outweighs the advantages of reuse. Such would not be the case if the cell
component made any significant use of String's capabilities (e.g., concatenation) or
if String appeared many times in the definition of Cell.
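The redundant alternative might look roughly like the following sketch, in which Cell manages its own character buffer instead of embedding a String. The private duplicate helper and the exact set of members are assumptions for illustration only; the original figure's class body is not reproduced here.

```cpp
#include <cassert>
#include <cstring>

// Cell stores its name in a buffer it owns, so cell.h needs no str.h.
class Cell {
    char *d_name_p;   // dynamically allocated copy of the name (owned)

    // Hypothetical helper: return a newly allocated copy of 's'.
    static char *duplicate(const char *s)
    {
        assert(s != 0);
        char *copy = new char[std::strlen(s) + 1];
        std::strcpy(copy, s);
        return copy;
    }

  public:
    explicit Cell(const char *name) : d_name_p(duplicate(name)) {}
    Cell(const Cell& other) : d_name_p(duplicate(other.d_name_p)) {}
    ~Cell() { delete [] d_name_p; }   // the chore String would have hidden

    Cell& operator=(const Cell& rhs)
    {
        if (this != &rhs) {
            char *copy = duplicate(rhs.d_name_p);  // allocate before freeing
            delete [] d_name_p;
            d_name_p = copy;
        }
        return *this;
    }

    const char *name() const { return d_name_p; }
};
```

The handful of lines repeated here (allocation, copy, and deletion of a character buffer) is the redundancy being traded for the removal of both the compile-time and link-time dependency on str.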
Redundancy can be used effectively in a variety of ways and in conjunction with other
techniques to reduce physical dependencies. In particular, choosing to use objects in
name only can be effective not only for breaking cyclic dependencies within a sub-
system but also for reducing the physical dependency upon other subsystems. Some-
times, however, it is necessary to supply a small amount of redundant information in
order to keep certain objects opaque.
Consider the scenario illustrated in Figure 5-57. We are trying to implement a shape
analyzer on top of a large shape subsystem consisting of, say, 1,000 components.
Figure 5-57: ShapeAnalyzer Forced To Use Highly Coupled Shape Subsystem
Fortunately we need to make use of only a small portion of this subsystem, in particu-
lar, the shape component. Unfortunately this component is fully dependent on the rest
of its subsystem, giving shape a disproportionately large link cost (1,000 units as
measured by its component dependency). The CCD for just the five components of
the analyzer subsystem (that is, excluding the local link cost of maintaining the shape
subsystem) is 5,012.
Often there are sophisticated container objects (such as a priority queue) that hold
other objects, but that need not depend on the contained object in any substantive way.
The job of the ShapeQueue is to maintain a heap of Shape objects ordered by area.
The Shape class supplies a public member function to return its area. Designing
ShapeQueue to use Shape's area() member directly would tie the cost of developing
and maintaining the ShapeQueue (and all of its clients) to the unusually large CCD
imposed by Shape.
Figure 5-58 depicts an alternative architecture, motivated entirely by reducing the cost
of maintenance and testing. Instead of having ShapeQueue get the area data directly
from a Shape, a ShapeManager extracts this value and enters it, redundantly, along with
each opaque Shape pointer into the ShapeQueue. The rest of ShapeAnalyzer's implementation
has been refactored so that all substantive use of class Shape now occurs
only in the shapemanager component (with some additional redundant data, i.e., area,
being stored in each ShapeQueue entry for use by components x, y, and z).
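A minimal sketch of such a queue follows. It assumes that the largest area should surface first and that the queue does not own the shapes; the member names (push, top, topArea, pop) and the use of std::priority_queue are choices made for this example rather than details from the original design.

```cpp
#include <cassert>
#include <queue>
#include <vector>

class Shape;   // used in name only; no definition of Shape is needed here

class ShapeQueue {
    struct Entry {
        Shape *d_shape_p;   // opaque pointer: never dereferenced here
        double d_area;      // redundant copy of the shape's area
    };

    struct SmallerArea {
        bool operator()(const Entry& a, const Entry& b) const
        {
            return a.d_area < b.d_area;
        }
    };

    std::priority_queue<Entry, std::vector<Entry>, SmallerArea> d_heap;

  public:
    // The caller (e.g., a shape manager) supplies the area redundantly.
    void push(Shape *shape, double area)
    {
        Entry entry = { shape, area };
        d_heap.push(entry);
    }

    bool isEmpty() const { return d_heap.empty(); }

    Shape *top() const
    {
        assert(!isEmpty());
        return d_heap.top().d_shape_p;
    }

    double topArea() const
    {
        assert(!isEmpty());
        return d_heap.top().d_area;
    }

    void pop()
    {
        assert(!isEmpty());
        d_heap.pop();
    }
};
```

Because the queue orders entries using only the stored area, it can be compiled, linked, and tested without the shape subsystem; the cost is that a stored area can become stale if the corresponding Shape changes after it is entered.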
Figure 5-58: Reducing CCD by Using Redundant Data and Opaque Pointers
The reduction in maintenance cost associated with the analyzer subsystem of nearly
60 percent is not unusual, nor is the relatively large cost associated with linking to a
large, highly interdependent subsystem. A well-designed subsystem will usually con-
tain a substantial proportion of components that do not depend on any other sub-
systems, and few components that depend on huge, tightly coupled subsystems such
as the one containing Shape.
In short, reuse is rarely without cost, and its benefit must be weighed against the cost
resulting from increased coupling. Very often that cost comes in the form of increased
physical dependence. Techniques used to reduce physical coupling, such as opaque
pointers, occasionally require providing a small amount of redundant information in
order to be applied successfully. In such cases the amount of savings in terms of cou-
pling will dictate the amount of redundancy that is tolerable.
5.7 Callbacks
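#include <stdlib.h>

void qsort(void *base, size_t numElements, size_t sizeofElement,
           int (*compare)(const void *elem1, const void *elem2));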
The first parameter of qsort, base, indicates the starting location of a homogeneous
array of objects whose type is unknown to the qsort routine. The second parameter,
numElements, indicates the number of objects in the base array. The third parameter,
sizeofElement, indicates the uniform size of each element (as defined by the sizeof
operator). The fourth and final parameter, compare, is a pointer to a callback function.
The qsort function assumes that this callback function, compare, will correctly determine
whether the first of the objects, elem1, implied by its two generic pointer arguments
should be considered less than, equal to, or greater than the second argument,
elem2, by returning a negative, 0, or positive value, respectively.
To illustrate a benign use of callbacks, consider the simple problem of sorting a collection
of Cartesian points based on their relative distances from the origin of a two-dimensional
coordinate system. Figure 5-59a depicts an instance of this problem containing
six points labeled a through f. The definition of a Point is given in Figure 5-59b.
// point.h
#ifndef INCLUDED_POINT
#define INCLUDED_POINT

class Point {
    int d_x;
    int d_y;

  public:
    Point(int x, int y) : d_x(x), d_y(y) {}
    Point(const Point& p) : d_x(p.d_x), d_y(p.d_y) {}
    ~Point() {}
    Point& operator=(const Point& p) {
        d_x = p.d_x; d_y = p.d_y; return *this; }
    void setX(int x) { d_x = x; }
    void setY(int y) { d_y = y; }
    int x() const { return d_x; }
    int y() const { return d_y; }
};

#endif
The qsort function rearranges entries by blindly swapping one region of memory of
the specified element size with another, based solely on the value returned from the
callback function. The bitwise copy is performed using a function such as the C
Library function memcpy.
In general, copying objects to new locations using memcpy is dangerous (see Section
10.4.2), because an object may contain a pointer or reference to itself or to other
objects which it is responsible for deleting. On the other hand, it is always safe to
copy and move pointers to objects using memcpy. Suppose we create an array of six
pointers to Point objects and in it store the addresses of six Point objects representing
the points in Figure 5-59a.
In order to use qsort, we will need to give qsort a way to compare two opaque
entries so it can determine their relative order. That is, given the addresses of a pair of
pointers to points (of type const void *), we need a way of determining whether the
distance from the origin to the Point indicated by the first memory address is less
than, equal to, or greater than that of the Point indicated by the second address. An
imperfect implementation of this callback function that suits our immediate needs is
given in Figure 5-60.¹³
13 A better implementation in practice would be to use a double for the intermediate calculations, in order
to avoid overflow. This solution is implementation dependent, and may fail on two points that are placed
nearly the same (large) distance from the origin. A robust but less runtime-efficient solution would be to
make use of a user-defined type (e.g., DoubleInt) that is guaranteed to hold at least twice as many bits as
an int.
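For concreteness, a comparison callback along the lines described might be sketched as follows. This is not the original Figure 5-60; it follows the footnote's suggestion of using double for the intermediate values, and the trimmed-down Point class and the sortByDistance wrapper are assumptions introduced only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdlib>   // std::qsort, std::size_t

// Reduced stand-in for the Point class (accessors only).
class Point {
    int d_x;
    int d_y;
  public:
    Point(int x, int y) : d_x(x), d_y(y) {}
    int x() const { return d_x; }
    int y() const { return d_y; }
};

// Callback for qsort.  Each argument is the address of an array element,
// and each element is itself a pointer to a Point.
int pointCompare(const void *elem1, const void *elem2)
{
    const Point *p1 = *static_cast<const Point *const *>(elem1);
    const Point *p2 = *static_cast<const Point *const *>(elem2);

    // Compare squared distances; the square root would not change the order.
    double d1 = double(p1->x()) * p1->x() + double(p1->y()) * p1->y();
    double d2 = double(p2->x()) * p2->x() + double(p2->y()) * p2->y();

    return (d1 > d2) - (d1 < d2);   // negative, 0, or positive
}

// Hypothetical convenience wrapper: sort an array of Point pointers by
// distance from the origin, nearest first.
void sortByDistance(const Point **array, std::size_t size)
{
    assert(array != 0 || size == 0);
    std::qsort(array, size, sizeof *array, pointCompare);
}
```

Only the pointers are moved by qsort, so the bitwise copying it performs never touches a Point object itself.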
Programmed with the data indicating the starting location, number of entries, size of
each entry, and a callback function that determines the ordinal positions of two entries
in context, we can reuse this modular implementation of the Quicksort algorithm to
solve our problem as shown in Figure 5-61.
// point.t.c
#include "point.h"
#include <stdlib.h>     // qsort()
#include <iostream.h>

main()
{
    print(cout, array, SIZE) << endl;
    cout << "Now sort by distance from origin:" << endl;
    qsort(array, SIZE, sizeof *array, pointCompare);
    print(cout, array, SIZE) << endl;
}
Realize that qsort was developed, tested, and reused many, many times, long before
the Point class or this example was written. Most of the work done by the Quicksort
algorithm is reusable. Only one behavior, compare, varies from one usage to the next.
Supplying a callback is what enables us to factor and reuse this functionality.
The lack of type safety in the interface of qsort is glaring. But because qsort is a
stateless algorithm with a single programmable behavior, the need for a generic sorter
object is controvertible. There is, however, an implied data structure. If we have rea-
class OrderedPointCollection {
    // ...
  public:
    // CREATORS
    OrderedPointCollection();
    virtual ~OrderedPointCollection();
    // MANIPULATORS
    void add(Point *point);
  private:
    // ACCESSORS
    virtual int compare(const Point& point1, const Point& point2) = 0;
};
Figure 5-62: Abstract Base Class for an Arbitrarily Ordered Point Collection
The levelization of this system is shown in Figure 5-63. Notice that the class
OrderedPointCollection depends on Point in name only, but MyPoints::compare
depends on Point in size. The virtual function is acting as a "callback" because the
comparison operation must be performed in the context of the Point's actual definition.
Unlike the callback function taking two generic pointers, the virtual function
expects const references to Point objects. This in-name-only dependency of
OrderedPointCollection on Point provides a welcome degree of type safety,
improving maintainability while making the component easier to use.
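The arrangement can be sketched in a single file as follows. The vector-based storage, the length and point accessors, the const qualifier on compare, and the particular ordering implemented by MyPoints are all assumptions made for this example; only the overall shape (a base class that knows Point in name only, and a derived class that supplies the comparison) comes from the text.

```cpp
#include <cassert>
#include <vector>

class Point;   // lower level: Point is used in name only

class OrderedPointCollection {
    std::vector<Point *> d_points;   // held, not owned (an assumption)

  public:
    OrderedPointCollection() {}
    virtual ~OrderedPointCollection() {}

    // Insert 'point' so that the sequence stays sorted under compare().
    void add(Point *point)
    {
        assert(point != 0);
        std::vector<Point *>::iterator it = d_points.begin();
        while (it != d_points.end() && compare(**it, *point) <= 0) {
            ++it;
        }
        d_points.insert(it, point);
    }

    int length() const { return static_cast<int>(d_points.size()); }
    Point *point(int index) const { return d_points[index]; }

  private:
    // The virtual "callback", supplied by a derived class.
    virtual int compare(const Point& point1, const Point& point2) const = 0;
};

// Higher level: from here on the definition of Point is visible.
class Point {
    int d_x;
    int d_y;
  public:
    Point(int x, int y) : d_x(x), d_y(y) {}
    int x() const { return d_x; }
    int y() const { return d_y; }
};

class MyPoints : public OrderedPointCollection {
    static double distanceSquared(const Point& p)
    {
        return double(p.x()) * p.x() + double(p.y()) * p.y();
    }

    // Order points by distance from the origin.
    int compare(const Point& point1, const Point& point2) const
    {
        double d1 = distanceSquared(point1);
        double d2 = distanceSquared(point2);
        return (d1 > d2) - (d1 < d2);
    }
};
```

Everything above the definition of Point compiles with only a forward declaration, which is what allows the base class to live in a component that does not include point.h.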
Callbacks are powerful decoupling tools, but they should be used only if necessary. A
mutual dependency generated by a pair of classes that call each other's member functions
is a symptom of a poor design. Callbacks can sometimes be used to break the
cycle, but usually this problem is better handled by repackaging the functionality.
Consider again the original, poorly factored, runtime database architecture shown in
Figure 5-27. If the read function of each parser implements a stateless algorithm, we
could conceivably pass the parsing function to the RuntimeDB as a callback:
However, the resulting obfuscation would probably be unjustifiable. Unlike the previous
example where OrderedPointCollection did not depend in size on Point, each
concrete parser would have to know all about the database in order to load it. If parsing
involves state and/or a multifunction interface, the standard object-oriented approach
would be to create an abstract parser base class and to derive concrete parsers for use
with specific formats, as illustrated in Figure 5-64.
This alternative revised architecture is better than the parser design as first presented
in Section 5.3, because there is no physical coupling among individual parsers, nor is
there a dependency of any processor on any parser implementation. However, this
architecture is not optimal because it forces the runtime database to know about the
interface common to all parsers:
#include "parser.h"
The runtime database may be reused by other systems that have no need for parsers.
Coupling the runtime database to a specific parser interface unnecessarily encumbers
the subsystem, making it less general, less understandable, and less appealing to reuse.
This unnecessary coupling could also adversely affect the maintainability of the runtime
database if the kind of information needed during parsing is frequently updated.
The best design for this system was the revised architecture presented in Figure 5-29 of
Section 5.3, which placed the database at the bottom of the system hierarchy with
absolutely no dependency on parsers. That architecture allowed the database group to
develop and test its subsystem in complete isolation, rather than being sandwiched
between the interface and the implementation supplied by the group developing pars-
ers. The moral of this story is that the unnecessary use of callbacks is something to be
avoided.
Callbacks can also be installed statically (i.e., outside of any instance). The new handler¹⁴
is an example of a static callback function with reasonable initial behavior. Clients
can substitute their own function for the default in order to allow them to clean up
their application in a higher-level context.
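In standard C++ the new handler is installed with std::set_new_handler, declared in the header new. The sketch below shows one way a client might substitute its own function; the names outOfMemory and installOutOfMemoryHandler, and the decision to abort after reporting, are assumptions for illustration, and std::get_new_handler requires C++11 or later.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <new>

// Hypothetical application-level handler.  A new handler must free some
// memory, throw std::bad_alloc, or terminate; this one reports and aborts.
void outOfMemory()
{
    std::cerr << "out of memory: shutting down" << std::endl;
    std::abort();
}

// Install the handler above and return the one that it replaced, so that
// the caller can restore the previous behavior later if desired.
std::new_handler installOutOfMemoryHandler()
{
    std::new_handler previous = std::set_new_handler(outOfMemory);
    assert(std::get_new_handler() == outOfMemory);
    return previous;
}
```

The callback is registered once for the whole program rather than per object, which is what makes it a static callback in the sense used here.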
CCD = 30,005 (or CCD = 5)

class SolarSystem {
    Star d_sun;
    PlanetList d_list;
    // ...
};
As you might well imagine, Planet is a very large and complex base class object with
many dependencies and a correspondingly high link-time cost. We would like to avoid
a physical link-time dependency of PlanetList on Planet, especially in this (admittedly
unusual) case where SolarSystem otherwise depends on Planet in name only.
We could try to implement the PlanetList using only opaque pointers to Planet
objects. The problem with that approach is that our PlanetList will not have seen the
definition of class Planet and therefore will not know how to destroy one. We could
change the specification of PlanetList so that it does not itself destroy the planets,
and escalate that functionality to a higher level (e.g., SolarSystem) as suggested in
Figure 5-66.
class SolarSystem {
Star *d_sun_p;
PlanetList d_list;
static void destroyPlanets(PlanetList *list);
// ...
public:
// ...
~SolarSystem() { destroyPlanets(&d_list); }
// ...
};
But in our example, even SolarSystem uses Planet in name only. Since the use of the
PlanetList type is an encapsulated implementation detail of SolarSystem, it is not
obvious how to escalate this functionality any higher.
With complete control over the entire subsystem, a good solution could be to demote the
interface of Planet, as shown in Figure 5-67. Now Planet is just an interface, and all of
the physical coupling is elevated to a higher level that does not affect SolarSystem.
Testing PlanetList will now require deriving a trivial "stub" implementation for
Planet in the PlanetList driver.
CCD = 8
Unfortunately, we don't control the universe and must live with a poorly factored
Planet. We can still break the physical dependency but it will require the use of a
redundant callback function. Suppose we add a static member to class PlanetList of
the following type:
The PlanetList class now has a static data member that is a pointer to a callback function
that potentially has the necessary context to destroy an instance of class Planet.
Before using a PlanetList for the first time, a client (who knows about Planet) should
"prime" the class by calling the static method PlanetList::setDestroyPlanetFunc
with the address of a suitably defined function as shown in Figure 5-68. When the
PlanetList is destroyed, it can then call the destroyPlanet function on each planet
that it owns.
// client.c
#include "client.h"
#include "planet.h"
#include "planetlist.h"
// ...
static void destroyPlanet(Planet *p) { delete p; }
// ...
void Client::init()
{
    PlanetList::setDestroyPlanetFunc(&::destroyPlanet);
    // ...
}
// ...
A rough sketch of the relevant portions of the planetlist component is given in Figure
5-69. Class PlanetList provides a mechanism for a client at a higher level to
install the callback function to destroy a Planet. When a PlanetList is destroyed,
the destructor checks to see if a destroy function has been installed, and if so applies it
to each Planet in the list in turn. If no callback function has been installed by the time
the PlanetList is destroyed, the contained Planet objects are not destroyed and the
dynamic memory associated with each Planet is "leaked." (Memory leaks are discussed
in Section 10.3.5.)
// planetlist.h
// ...
class PlanetListIter;
class PlanetList {
    // ...
    friend PlanetListIter;
  public:
    typedef void DestroyPlanetFunc(Planet *);
  private:
    static DestroyPlanetFunc *d_destroyPlanetFunc_p;
  public:
    static void setDestroyPlanetFunc(DestroyPlanetFunc *func);
    // ...
    ~PlanetList();
    // ...
};
// ...
class PlanetListIter {
    // ...
  public:
    PlanetListIter(const PlanetList &list);
    ~PlanetListIter();
    void operator++();
    operator const void *() const;
    const Planet& operator()() const;
};
// ...

// planetlist.c
#include "planetlist.h"
PlanetList::DestroyPlanetFunc *PlanetList::d_destroyPlanetFunc_p = 0;
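To show how the pieces fit together end to end, here is a compact single-file sketch of the technique. The vector-based storage, the append member, the disabled copy operations, and the instance-counting stand-in for Planet are assumptions added so that the example is self-contained; the typedef, the static pointer, and setDestroyPlanetFunc follow the declarations above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Planet;   // PlanetList knows Planet in name only

class PlanetList {
  public:
    typedef void DestroyPlanetFunc(Planet *);

  private:
    static DestroyPlanetFunc *d_destroyPlanetFunc_p;
    std::vector<Planet *> d_planets;          // opaque pointers
    PlanetList(const PlanetList&);            // not implemented
    PlanetList& operator=(const PlanetList&); // not implemented

  public:
    static void setDestroyPlanetFunc(DestroyPlanetFunc *func)
    {
        d_destroyPlanetFunc_p = func;
    }

    PlanetList() {}
    ~PlanetList();
    void append(Planet *planet) { d_planets.push_back(planet); }
};

PlanetList::DestroyPlanetFunc *PlanetList::d_destroyPlanetFunc_p = 0;

PlanetList::~PlanetList()
{
    if (!d_destroyPlanetFunc_p) {
        return;   // no callback installed: the planets are leaked
    }
    for (std::size_t i = 0; i < d_planets.size(); ++i) {
        d_destroyPlanetFunc_p(d_planets[i]);
    }
}

// Client level: a stand-in Planet that counts its live instances.
class Planet {
  public:
    static int s_liveCount;
    Planet() { ++s_liveCount; }
    ~Planet() { --s_liveCount; }
};

int Planet::s_liveCount = 0;

// Defined where the full definition of Planet is visible.
static void destroyPlanet(Planet *p) { delete p; }
```

A client would call PlanetList::setDestroyPlanetFunc(&destroyPlanet) once, before the first list is destroyed; nothing in PlanetList itself ever requires the definition of Planet.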
Using callbacks in this way is not the least bit elegant. Using PlanetList requires
knowing about low-level details with which a client should not be bothered. This
approach is not recommended for public interfaces, as it can be assumed that people
will forget to initialize the container class before they use it. To make matters a bit
worse, PlanetList is not in the public interface of SolarSystem. It will therefore be
necessary for SolarSystem to provide a static member such as
class SolarSystem {
// ...
public:
static void init(void (*)(Planet *));
// ...
};
that must then forward the initialization call to the PlanetList class.
Used inappropriately, callbacks can blur the responsibility of low-level objects and
result in unnecessary conceptual coupling. In general, callbacks (like recursion) can
be more difficult to understand, maintain, and debug than conventional function calls.
Their (pseudo) asynchronous behavior requires a different type of attention from
developers. As a rule, callbacks should be treated as a refuge of last resort.
5.8 Manager Class
In the name of minimizing complexity and effort, it is easy to become too frugal with
classes. Trying to implement an integer list with only a single class is a good illustra-
tion of this common mistake. One might suggest, as in Figure 5-70a, that a list could
be just a pointer to a Link or, as in Figure 5-70b, that the link operations could be
merged with the methods associated with the List itself.
The problem with approach (a) is that the level of abstraction is too low for an application
to use effectively. Approach (b) fails to encapsulate private implementation
details of List. Clients of a list abstraction will not want to be bothered with the low-level
details of managing the memory of the individual links, or with ensuring that the
low-level policies of a list implementation are enforced.
Figure 5-70: How Not to Implement a List Component
Even in a two-class list architecture, the role of the subordinate class can be abused.
Normally, a list object itself destroys each of its links directly, but as shown in Figure
5-71, the destructor for this List deletes only the head Link. Each Link, in turn,
recursively deletes its d_next_p pointer. This "elegant" approach (apart from being
slower and running the risk of overflowing the program stack for long lists) makes it
less clear which object owns which, primarily because instances of the same type are
authorized to destroy one another. A better, more hierarchically structured way for the
List class to clean up when it is destroyed is to traverse the list of Link objects and to
delete each Link in turn, as shown in Figure 5-72.
Figure 5-71: List with Link that Recursively Deletes the Next Link

List::~List()
{
    while (d_head_p) {
        Link *p = d_head_p;
        d_head_p = d_head_p->next();
        delete p;
    }
}
Figure 5-72: List with Destructor that Iteratively Deletes Each Link
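A complete, if minimal, two-class list built around this destructor might look like the sketch below. The prepend, length, and front members and the disabled copy operations are assumptions added to make the example usable; the point is only that Link never deletes anything and List alone manages the memory.

```cpp
#include <cassert>

// Subordinate class: holds a value and a pointer to its successor.
// A Link never creates or destroys another Link.
class Link {
    int d_value;
    Link *d_next_p;

  public:
    Link(int value, Link *next) : d_value(value), d_next_p(next) {}
    int value() const { return d_value; }
    Link *next() const { return d_next_p; }
};

// Manager class: the only place where Link objects are allocated or freed.
class List {
    Link *d_head_p;
    List(const List&);              // not implemented
    List& operator=(const List&);   // not implemented

  public:
    List() : d_head_p(0) {}
    ~List();

    void prepend(int value) { d_head_p = new Link(value, d_head_p); }

    int length() const
    {
        int count = 0;
        for (const Link *p = d_head_p; p; p = p->next()) {
            ++count;
        }
        return count;
    }

    int front() const
    {
        assert(d_head_p != 0);
        return d_head_p->value();
    }
};

// Iterative cleanup: constant stack depth regardless of list length.
List::~List()
{
    while (d_head_p) {
        Link *p = d_head_p;
        d_head_p = d_head_p->next();
        delete p;
    }
}
```

Because the destructor loops rather than recurses, destroying a list with a very large number of links does not deepen the call stack.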
Again the corporation analogy pertains. Regular employees do not hire and fire each
other; that job is reserved for managers. The intrinsic problem is in not distinguishing
between the classes used to implement an abstraction and the manager class used to
enforce policy, manage memory, and coordinate the implementation classes. Note that
the manager class knows about its subordinate classes, but not vice versa.
All too often the cyclic interconnection among instances of classes seems to suggest
that this cyclic nature should be reflected in the physical design of a system. For small
cyclicly dependent networks of objects that are inherently tightly coupled and whose
definitions fit easily within a single component, there may be no reason to eliminate
such cycles. That is, if it makes sense from a standpoint of usability and reuse to
present two or more cohesive logical units in a single physical unit, and the functional
complexity of the combined implementation does not pose an obstacle to effective
testing, then there may be no problem that requires solving. On the other hand, the
coupling may also be the result of not knowing how to avoid the interdependencies, or
of not even having considered the issue in the first place.
As another example where the concept of a manager class proves useful, consider a
simple graph consisting of nodes and edges. A graph is among the most basic of het-
erogeneous class networks; yet a node could be as complex as a workstation on a
local area network (LAN), or a planet in a solar system. In other words, the size and
complexity of the graph-independent portion of the node and/or edge might be very
large compared to its network-related aspects. It is in these cases that there is consid-
erable motivation to decouple nodes from edges.
Let us start with the situation suggested by Figure 5-10. We can illustrate the princi-
ples related to achieving a levelizable interconnected network of heterogeneous
objects by attempting to develop a simple graph with the premise that Node and Edge
are complex and should belong to distinct physical components.
A known effective technique for avoiding cyclic physical dependencies is to make all
pointers and references to higher-level components be in name only. Perhaps we can
concoct a levelizable subsystem in which edge dominates node. Our strategy will be
to have Node hold a collection of opaque Edge pointers, as illustrated in Figure 5-73.
Taking this approach means that all substantive questions that involve edges cannot be
answered at the node level.
Second, testing Node requires creating a dummy Edge class in order to gain access to
the private addEdge function; that is, we are not able to test Node from its intended
public interface alone.
class Node {
    // ...
    friend Edge;                     // long-distance friend
    void addEdge(Edge *edge);        // private, set only by edge
    Node(const Node&);
    Node& operator=(const Node&);
  public:
    Node(const char *name);          // Who owns the memory for nodes?
    ~Node();                         // Who is allowed to destroy them?
    const char *name() const;
    int numEdges() const;
    Edge& edge(int index) const;     // Reference hampers testing slightly
};                                   // since Edge is used in name only.
Third, the Node's edge function is correctly designed (from the end-user perspective)
to return references and not pointers. A reference (even an opaque one), unlike a
pointer, must identify the address of a valid object and therefore cannot (portably) be
null or refer to an illegal address. So if we ask for an Edge of a newly created Node
(which has no edges) we are in trouble. Incrementally testing Node's public edge
function at the node component's level requires not only creating a dummy Edge class
to gain access to the private addEdge function of Node, but also adding actual
instances of this bogus Edge class so that their (valid) addresses can be compared later
against the lvalues returned by edge(int).
Finally, it is not clear who owns the memory for Node instances or who is allowed to
create and destroy them. For example, what happens if we try to destroy a Node before
we have removed all of its edges? The answer is that nothing unusual happens, at
least not right away. Since Node does not know about Edge, it does not know how to
destroy one. Using an Edge to access a deleted Node will, of course, result in unpredictable
behavior. We could pass a callback function to Node that knows how to delete
an Edge, but then we must ensure that Edge objects are created only on the heap.
At this point our design has run out of steam. As often happens in practice, we need to
step back and look at the abstraction we are trying to implement, namely a graph. Just
as with rectangle and window, neither node nor edge inherently dominates the other.
There is a mutual dependency involving ownership, which we need to escalate to a
higher level of the system.
Figure 5-75 shows the basic architecture of the new design that will serve as a sound
starting point. Class Graph will be responsible for managing the memory associated
with instances of both Edge and Node. Nodes and edges will be added to the graph
through Graph's interface, as opposed to creating them independently. When a Node is
deleted from the graph, Graph itself will ensure that all Edge objects attached to that
Node will be deleted first. This basic design still suffers from the problem that both
Node and Edge must declare Graph to be a friend. Otherwise unruly clients could, for
example, add an Edge to a Node unbeknownst to either the Edge or the Graph, causing
the graph subsystem to become internally inconsistent. Since we want Node and Edge
to be defined in separate components, we are still not satisfied.
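The manager arrangement just described can be sketched as follows, with all three classes shown in one file for brevity. The private constructors and destructors, the vector-based bookkeeping, and the member names addNode, addEdge, removeEdge, and removeNode are assumptions for this illustration; note that the sketch still relies on the friendships the text objects to.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

class Edge;
class Graph;

class Node {
    std::string d_name;
    std::vector<Edge *> d_edges;     // attached edges; not owned by Node
    friend class Graph;              // only Graph creates, wires, destroys
    explicit Node(const char *name) : d_name(name) {}
    ~Node() {}
  public:
    const char *name() const { return d_name.c_str(); }
    int numEdges() const { return static_cast<int>(d_edges.size()); }
};

class Edge {
    Node *d_from_p;
    Node *d_to_p;
    friend class Graph;
    Edge(Node *from, Node *to) : d_from_p(from), d_to_p(to) {}
    ~Edge() {}
  public:
    const Node& from() const { return *d_from_p; }
    const Node& to() const { return *d_to_p; }
};

class Graph {
    std::vector<Node *> d_nodes;     // owned
    std::vector<Edge *> d_edges;     // owned
    Graph(const Graph&);             // not implemented
    Graph& operator=(const Graph&);  // not implemented

    template <class T>
    static void erase(std::vector<T *>& v, T *p)
    {
        v.erase(std::remove(v.begin(), v.end(), p), v.end());
    }

  public:
    Graph() {}
    ~Graph() { while (!d_nodes.empty()) removeNode(d_nodes.back()); }

    Node *addNode(const char *name)
    {
        d_nodes.push_back(new Node(name));
        return d_nodes.back();
    }

    Edge *addEdge(Node *from, Node *to)
    {
        assert(from && to);
        Edge *edge = new Edge(from, to);
        d_edges.push_back(edge);
        from->d_edges.push_back(edge);
        if (to != from) to->d_edges.push_back(edge);
        return edge;
    }

    void removeEdge(Edge *edge)
    {
        erase(edge->d_from_p->d_edges, edge);
        erase(edge->d_to_p->d_edges, edge);
        erase(d_edges, edge);
        delete edge;
    }

    // Edges attached to the node are deleted first, then the node itself.
    void removeNode(Node *node)
    {
        while (!node->d_edges.empty()) removeEdge(node->d_edges.back());
        erase(d_nodes, node);
        delete node;
    }

    int numNodes() const { return static_cast<int>(d_nodes.size()); }
    int numEdges() const { return static_cast<int>(d_edges.size()); }
};
```

Since clients can neither construct nor destroy a Node or an Edge directly, every change to the network passes through Graph, which is what keeps the structure internally consistent.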
For a simple graph, it may be entirely reasonable to place all three classes within a
single component. But because our goal here is to use the graph to illustrate how to
implement much more complex networks, we will not take that approach. There are
(at least) two other ways to address this problem:
1. Factor out as much code as possible from the coupled system into inde-
pendent components, and place the remaining, mutually dependent
classes in a single component.
2. Escalate the level at which encapsulation for the entire subsystem occurs
to eliminate the need for low-level friendships.
5.9 Factoring
dependencies to a higher level where their adverse effects are less pronounced.
To demonstrate the use of factoring, suppose we are given a design consisting of three
intrinsically interdependent classes A, B, and C, as illustrated in Figure 5-76a. Suppose
further that the original logical interface is cast in stone and may not be modified.
More than likely, not all of the functionality implemented in these three classes is
inseparably coupled to the rest. We can use the technique of factoring to extract any
independently testable implementation complexity, and thus reduce the burden of
maintaining the truly cyclicly dependent portion of the code. As illustrated in Figure
5-76b, if we are successful in factoring a significant amount of the implementation
into independent components, the remaining interdependent code may be small
enough to justify placing it into a single component.
Fortunately, our graph example is less extreme than the hypothetical case above. We
have some flexibility in our logical design, and it will turn out that the implied physi-
cal dependencies are not as severe as the hypothetical ones we are postulating. For
now, let us continue to assume the worst; that is, that our initial graph subsystem is a
design consisting of three intrinsically, mutually dependent classes:
Graph
Node Edge
The first place to employ factoring is to separate the part of Node that holds
graph-related data from the part of Node that holds graph-independent data. Inheritance is
ideal for this kind of factoring. We can do the same for Edge. The basic idea is shown
in Figure 5-77.
In this new design, all of the tightly coupled, graph-related functionality lives in a sin-
gle component, implemented using the three classes Graph, Gnode, and Gedge. The
graph-independent data contained in Node and Edge is now pushed down to a lower
level, and can be shared with other applications that are not concerned with the graph-
related functionality.
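The shape of this factoring can be sketched in a few lines. In the sketch below, Node carries only graph-independent data, while Gnode adds a stand-in for graph-related state and can be created only by Graph. The d_index member, the nameLength function, and the reduced Graph interface are assumptions for illustration; Gedge is omitted because it would follow the same pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Lower level (node component): graph-independent data only.
class Node {
    std::string d_name;
  public:
    explicit Node(const char *name) : d_name(name) {}
    const char *name() const { return d_name.c_str(); }
};

// A lower-level client that knows nothing about graphs.
inline std::size_t nameLength(const Node& node)
{
    return std::strlen(node.name());
}

// Higher level (graph component): the tightly coupled classes live together.
class Graph;

class Gnode : public Node {
    int d_index;                     // graph-related: position in its Graph
    friend class Graph;
    Gnode(const char *name, int index) : Node(name), d_index(index) {}
    Gnode(const Gnode&);             // not implemented
    Gnode& operator=(const Gnode&);  // not implemented
  public:
    int index() const { return d_index; }
};

class Graph {
    std::vector<Gnode *> d_nodes;    // owned
    Graph(const Graph&);             // not implemented
    Graph& operator=(const Graph&);  // not implemented
  public:
    Graph() {}
    ~Graph()
    {
        for (std::size_t i = 0; i < d_nodes.size(); ++i) {
            delete d_nodes[i];
        }
    }

    Gnode& addNode(const char *name)
    {
        assert(name != 0);
        d_nodes.push_back(new Gnode(name, static_cast<int>(d_nodes.size())));
        return *d_nodes.back();
    }
};
```

Any function written against Node, such as nameLength here, accepts a Gnode unchanged, yet it can be compiled and tested without the graph component.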
Figure 5-78 illustrates the factored, network-independent portions of node and edge.
In this trivial illustration a Node is nothing more than a name, and an Edge is just a
double. But suppose for a minute that the nodes in the graph are actually cities and
the edges are roads. The network component of a city, implicit in Gnode, is not neces-
sary to perform many complex operations on a Node itself. A Gnode is just a special
kind of Node that participates in Graph operations. Once an instance of Gnode has
been obtained from Graph, it can be used anywhere in which a Node is required, as
illustrated in Figure 5-79.
// node.h
#ifndef INCLUDED_NODE
#define INCLUDED_NODE

class Node {
    // ...
  public:
    Node(const char *name);
    Node(const Node&);
    ~Node();
    Node& operator=(const Node&);
    const char *name() const;
};

#endif

// edge.h
#ifndef INCLUDED_EDGE
#define INCLUDED_EDGE

class Edge {
    // ...
  public:
    Edge(double weight);
    Edge(const Edge&);
    ~Edge();
    Edge& operator=(const Edge&);
    double weight() const;
};

#endif
class Node;
class ostream;

class Census {
    // ...
  public:
    static int countPeople(const Node& node);
    // ...
};

#include "graph.h"

int g(const Gnode& gnode)
{
    return Census::countPeople(gnode);   // uses only the Node portion
}
Another advantage in factoring nodes and edges involves a concept called value
semantics. Saying that a type has value semantics means that a copy constructor and
(usually) an assignment operator are inherently (i.e., semantically) valid operations
for a type.¹⁵
15Sometimes we choose not to implement a copy constructor (e.g., for an iterator) even when the
operation could make sense; however, the abstraction itself has value semantics.
For example, consider a condominium complex that contains a fixed amount of land
on which to build single-family homes. The land is divided into 25 lots, arranged in a
5-by-5 grid. The rows of lots are labeled A to E, and the columns are labeled 1 to 5 as
shown in Figure 5-80.
[Figure 5-80: CondoComplex, a 5-by-5 grid of lots with rows A to E and columns 1 to 5]
Each Lot is a separate object that maintains its own list of adjacent lots and is man-
aged by the CondoComplex object. For example, Lot A2 holds pointers to Lot objects
A1, B2, and A3. A House has value semantics because copy construction makes sense
for a House. In other words, it makes sense to copy a House from Lot to Lot; that is,
all houses could look exactly the same.
Suppose now that a Property consists of both the House and the Lot on which it sits,
and that the CondoComplex object manages an array of Property objects instead of
Lot objects. Does a Property also have value semantics? The answer is no, because
we cannot copy one lot to another.
If we tried to assign the Property with Lot location A2 to the Property with Lot
location C4, we would clobber the adjacency list associated with Lot C4 and invali-
date the larger CondoComplex object. We therefore cannot make arbitrary independent
copies of a Property the way we can for a House. A Property therefore does not
have value semantics.
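The distinction can be checked mechanically. The sketch below is an illustration under stated assumptions, not code from the text: the member names are invented, and it uses the C++11 "= delete" syntax and <type_traits>, which postdate the private-declaration idiom used elsewhere in this chapter.

```cpp
#include <type_traits>
#include <vector>

// A House has value semantics: copying it is meaningful.
struct House {
    int d_bedrooms;
    int d_floors;
};

// A Lot is tied to its place in the complex by its adjacency list.
class Lot {
    std::vector<Lot *> d_adjacent;
  public:
    Lot() {}
    Lot(const Lot&) = delete;              // copying is disabled
    Lot& operator=(const Lot&) = delete;   // assignment would clobber adjacency
    void addAdjacent(Lot *lot) { d_adjacent.push_back(lot); }
    int numAdjacent() const { return static_cast<int>(d_adjacent.size()); }
};

// A Property holds both, so it picks up the restriction of its Lot.
class Property {
    House d_house;
    Lot   d_lot;
  public:
    explicit Property(const House& house) : d_house(house) {}
    const House& house() const { return d_house; }
    void rebuild(const House& house) { d_house = house; }  // copies only the House
    Lot& lot() { return d_lot; }
};

static_assert(std::is_copy_constructible<House>::value, "House is copyable");
static_assert(!std::is_copy_constructible<Property>::value, "Property is not");
static_assert(!std::is_copy_assignable<Property>::value, "Property is not");
```

Note that the House portion of a Property can still be replaced (rebuild above) without disturbing the adjacency information held by its Lot.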
Although the network portion of a node (defined by Gnode) does not have value
semantics, the part that is defined by Node probably does. In C++ terms, this means
that the copy constructor and assignment operator of both Gnode and Gedge would
necessarily be disabled (i.e., declared private), but Node and Edge could each define
meaningful copy constructors and assignment operators, as shown in Figure 5-81.
(The complete interface for graph is given in Figure 5-86.)
Our second opportunity to factor comes from the observation that, in order to manage
Node and Edge objects properly, Graph will need to keep track of the Gnode and Gedge
objects it allocates so that when it is destroyed, all of the memory associated with
the nodes and edges of this graph can be recovered. Moreover, each Gnode will also
have to keep track of the Gedge objects adjacent to it (in name only). We have the
opportunity to factor out all of this functionality from the graph component classes
by creating a collection of opaque pointers.
A bag is a kind of container that, unlike a list, does not impose an order on its ele-
ments nor, unlike a set, does it require elements to be unique. Because the semantics
of a bag are not heavily specified, its implementation is left quite flexible. A Graph
will maintain a bag of Node pointers and a bag of Edge pointers. Whether or not we
have an efficient template implementation, we will want to factor this problem further
by creating a bag of (generic) pointers.
Figure 5-82 shows our factored implementation of a generic bag of pointers and spe-
cialized components that take advantage of this generic container to implement bags
of pointers of a specific type. We can use either layering or private inheritance to
achieve the desired specialization and restore the type safety of the individual opaque
pointers. Templates would be ideal, but some implementations can be very costly in
terms of link time (as discussed in Section 10.4.1). For purely pragmatic reasons we
may be forced to express the specialized types explicitly. Whatever the implementa-
tion, all of the function arguments are forwarded to the generic PtrBag class via
inline functions in order to avoid incurring any additional overhead due to conven-
tional function calls.
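The private-inheritance variant of this specialization might be sketched as follows. This is an assumption-laden illustration rather than the book's component: a std::vector stands in for the linked-list implementation, and the accessors length() and at() are hypothetical conveniences that replace the iterator classes.

```cpp
#include <cstddef>
#include <vector>

// Generic bag of opaque pointers.
class PtrBag {
    std::vector<void *> d_ptrs;
  public:
    void add(void *pointer) { d_ptrs.push_back(pointer); }
    void removeAll(const void *pointer);
    std::size_t length() const { return d_ptrs.size(); }
    void *at(std::size_t i) const { return d_ptrs[i]; }
};

// Too large to be worth inlining, so it is defined out of line.
void PtrBag::removeAll(const void *pointer)
{
    std::vector<void *> kept;
    for (std::size_t i = 0; i < d_ptrs.size(); ++i) {
        if (d_ptrs[i] != pointer) {
            kept.push_back(d_ptrs[i]);
        }
    }
    d_ptrs.swap(kept);
}

struct Gedge { double d_weight; };   // hypothetical minimal edge type

// Type-safe wrapper: private inheritance hides the void * interface, and
// inline forwarding functions restore the Gedge * type at no runtime cost.
class GedgePtrBag : private PtrBag {
  public:
    void add(Gedge *edge) { PtrBag::add(edge); }
    void removeAll(const Gedge *edge) { PtrBag::removeAll(edge); }
    std::size_t length() const { return PtrBag::length(); }
    Gedge *at(std::size_t i) const
    {
        return static_cast<Gedge *>(PtrBag::at(i));
    }
};
```

The single cast in at() is the only place where type safety is restored by hand; clients of GedgePtrBag never see a void pointer.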
[Figure 5-82: components gnodeptrbag and gedgeptrbag above component ptrbag]
Figure 5-83 shows the header for a ptrbag component, consisting of four classes.
PtrBagLink is a low-level implementation class whose use is an encapsulated
implementation detail of the other three classes in the ptrbag component. We could
instead have placed PtrBagLink in a separate component, defined it entirely within
the ptrbag.c file, or nested it within class PtrBag. (The advantages and disadvantages
of these and other similar design alternatives are compared and discussed in Section
8.4.)
// ptrbag.h
#ifndef INCLUDED_PTRBAG
#define INCLUDED_PTRBAG

class PtrBagIter;
class PtrBagManip;

class PtrBagLink {
    void *d_pointer_p;
    PtrBagLink *d_next_p;

  private:
    PtrBagLink(const PtrBagLink&);
    PtrBagLink& operator=(const PtrBagLink&);

  public:
    PtrBagLink(void *pointer, PtrBagLink *next);
    ~PtrBagLink();
    PtrBagLink *&nextRef();                 // used by manipulator
    PtrBagLink *next() const;
    void *pointer() const;
};

class PtrBag {
    PtrBagLink *d_root_p;
    friend class PtrBagIter;
    friend class PtrBagManip;

  private:
    PtrBag(const PtrBag&);
    PtrBag& operator=(const PtrBag&);

  public:
    PtrBag();
    ~PtrBag();
    void add(void *pointer);
    void removeAll(const void *pointer);
};

class PtrBagIter {
    PtrBagLink *d_link_p;

  private:
    PtrBagIter(const PtrBagIter&);
    PtrBagIter& operator=(const PtrBagIter&);

  public:
    PtrBagIter(const PtrBag& bag);
    ~PtrBagIter();
    void operator++();
    void *operator()() const;
    operator const void *() const;
};

class PtrBagManip {
    PtrBagLink **d_addrLink_p;

  private:
    PtrBagManip(const PtrBagManip&);
    PtrBagManip& operator=(const PtrBagManip&);

  public:
    PtrBagManip(PtrBag *bag);
    ~PtrBagManip();
    void advance();
    void remove();
    void *operator()() const;
    operator const void *() const;
};

#endif
PtrBag is a container used to hold generic pointers. For this application, a redundant
but convenient member function is supplied to remove all pointers with the specified
value from the PtrBag. PtrBagIter is part of the logical abstraction of a bag of point-
ers, allowing clients to iterate over the bag, returning its contents in some unspecified
order. PtrBagManip is similar to PtrBagIter except that it allows its client to modify
the bag by selectively removing entries, a capability emphasized by requiring the client
to supply the address of the container to be manipulated.
them poor candidates for inlining. The add function accesses the global free store, so
it is useless to try to inline it for speed purposes. The remove function consists of
enough code that calling a function will probably produce less object code than sub-
stituting the source in place. While the remove function call adds some execution
// ptrbag.c
#include "ptrbag.h"

PtrBag::~PtrBag()
{
    PtrBagManip man(this);
    while (man) {                   // true while a current link exists
        man.remove();               // unlink and delete the current link
    }
}

void PtrBagManip::remove()
{
    PtrBagLink *tmp = *d_addrLink_p;
    *d_addrLink_p = (*d_addrLink_p)->next();   // bypass the current link
    delete tmp;
}
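The manipulator works by holding the address of the pointer that refers to the current link, which is why remove needs no special case for the first element. A self-contained sketch of that technique follows; the Link structure and the free functions add, removeAll, and length are illustrative names, not part of the ptrbag component.

```cpp
// Minimal singly linked list of opaque pointers.
struct Link {
    void *d_pointer_p;
    Link *d_next_p;
};

// Pushes a new link onto the front of the list.
void add(Link **root, void *pointer)
{
    Link *link = new Link;
    link->d_pointer_p = pointer;
    link->d_next_p = *root;
    *root = link;
}

// Removes every link holding 'pointer' and returns the number removed.
// 'addrLink' always holds the address of the pointer that refers to the
// current link: either the root pointer or some link's d_next_p.
int removeAll(Link **root, const void *pointer)
{
    int removed = 0;
    Link **addrLink = root;
    while (*addrLink) {
        if ((*addrLink)->d_pointer_p == pointer) {
            Link *tmp = *addrLink;
            *addrLink = tmp->d_next_p;            // bypass the link
            delete tmp;
            ++removed;
        }
        else {
            addrLink = &(*addrLink)->d_next_p;    // advance
        }
    }
    return removed;
}

// Counts the links in the list.
int length(const Link *root)
{
    int n = 0;
    for (; root; root = root->d_next_p) {
        ++n;
    }
    return n;
}
```

After a removal the manipulator position is already at the following link, so the loop advances only when nothing was removed.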
The component-dependency graph for the new subsystem is shown in Figure 5-85.
Look at all of the functionality that has been extracted from the cyclic group of
classes buried in the graph component. This functionality can now be tested and
reused independently of that cycle. The functionality in gedgeptrbag is reused in two
different ways even within the graph component itself: once in class Graph to keep
track of all edges, and once in class Gnode to keep track of connected edges. At this
point we have reduced the amount of cyclically dependent code to a manageable level
of complexity appropriate for a single component: graph. The complexity of the
graph-independent functionality identified by either Node or Edge is now segregated
into independent components that are testable in isolation.
[Figure 5-85: component-dependency graph for the factored graph subsystem, Levels 1 to 3]
Figure 5-86 gives the complete header file for the graph component. This implemen-
tation is efficient, flexible, and reasonably maintainable. However, using this compo-
nent is not so straightforward because some of the interface (along with the
implementation) has been factored out and placed in reusable components at lower levels.
// graph.h
#ifndef INCLUDED_GRAPH
#define INCLUDED_GRAPH

#ifndef INCLUDED_NODE
#include "node.h"
#endif

#ifndef INCLUDED_EDGE
#include "edge.h"
#endif

#ifndef INCLUDED_GNODEPTRBAG
#include "gnodeptrbag.h"
#endif

#ifndef INCLUDED_GEDGEPTRBAG
#include "gedgeptrbag.h"
#endif

class Graph;
Figure 5-86: graph Component Header Defining Classes Gnode, Gedge, and Graph
For example, suppose you wanted to iterate over the edges connected to a particular
node in a graph. You would need to get the bag of Gedge pointers from that Gnode and
then use that bag to construct an instance of GedgePtrBagIter:
Conveniently, the same methodology works for obtaining all of the edges and nodes
from the graph itself, as illustrated in the implementation of the output operator for a
Graph given in Figure 5-87.
    GnodePtrBagIter nit(graph.nodes());
    if (nit) {
        o << " Nodes:";
    }
    for (; nit; ++nit) {
        o << " " << nit()->name();
    }
    const char *p = " Edges: ";
    const char *q = "        ";
A test driver exercising the graph component of Figure 5-86 is given, along with
its output, in Figure 5-88. Notice that the Gnode pointers returned by both addNode
and findNode point directly at the corresponding Gnode within the Graph. The only
publicly available function in Gnode, edges(), supplies a const reference to its bag of
Gedge pointers, which can then be used directly by the client to traverse the graph.
The only public functionality available in a Gedge provides access to the two Gnode
objects to which the Gedge is connected.
// graph.t.c
#include "graph.h"
#include "gnodeptrbag.h"
#include "gedgeptrbag.h"
#include <iostream.h>

main()
{
    Graph g;
    {
        Gnode *n1 = g.addNode("Mindy");
        Gnode *n2 = g.addNode("Susan");
        Gnode *n3 = g.addNode("Rick");
        g.addNode("Franklin");
        g.addNode("Cathy");
    }
    g.addEdge(g.findNode("Rick"), g.findNode("Franklin"), 2);
    // ...
    cout << g;
}

// Output:
john@john: a.out
Graph:
    Nodes: Cathy Franklin Rick Susan Mindy
    Edges: Rick ---(3)--> Cathy
           Rick ---(2)--> Franklin
           Susan ---(6)--> Franklin
Figure 5-88: Simple Test Driver Illustrating Usage of the graph Component
In this implementation of Graph, private access via (local) friendship to Gnode and
Gedge is essential to preserving encapsulation. This design eliminates the problems
associated with long-distance friendship by physically uniting the parts of the system
that need to share common implementation details via private access. In other words,
by combining Graph, Gnode, and Gedge in a single component, the required friend-
ships are no longer long-distance ones.
As illustrated in Figure 5-89, it turns out that Gnode and Gedge depend on each other
in name only, and have no backward dependency on Graph. Although the three classes
have no cyclic interdependencies, there is still a need for factoring. Clients of this sub-
system will need to interact directly with both Gnode and Gedge. Making the entire
interface of either Gnode or Gedge public would expose clients to implementation
details of the graph component. Worse, doing so would allow clients to violate impor-
tant policies enforced by the Graph manager class.
[Figure 5-89: Graph, Gnode, and Gedge; Gnode and Gedge depend on each other in name only]
For example, making the Gedge constructor public would allow clients to bypass the
Graph object and create instances of a Gedge on the program stack. There would be
nothing to stop a wayward client from adding a Gedge created on the program stack to
a legitimate Gnode belonging to an otherwise valid Graph.
To avoid these problems it is necessary to grant class Graph access to private function-
ality defined in both Gnode and Gedge. Avoiding long-distance friendship then forces
us to place these intimately dependent classes in the same component. Although there
is no direct physical dependency brought on by granting friendship, modularity and
encapsulation dictate the effective physical coupling suggested in Figure 5-90.
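The policy being protected here, that only the manager may create and destroy edges, can be sketched in a few lines. This is a simplified illustration, not the graph component itself: the edge carries only a weight, std::vector replaces the pointer bag, numEdges is a hypothetical accessor, and C++11 "= delete" stands in for the private, unimplemented declarations used in the book's listings.

```cpp
#include <cstddef>
#include <vector>

class Graph;

class Gedge {
    double d_weight;
    friend class Graph;                     // local friendship: same component
    explicit Gedge(double weight) : d_weight(weight) {}
    ~Gedge() {}
  public:
    Gedge(const Gedge&) = delete;
    Gedge& operator=(const Gedge&) = delete;
    double weight() const { return d_weight; }
};

class Graph {
    std::vector<Gedge *> d_edges;           // everything this Graph allocated
  public:
    Graph() {}
    Graph(const Graph&) = delete;
    Graph& operator=(const Graph&) = delete;
    ~Graph()
    {
        for (std::size_t i = 0; i < d_edges.size(); ++i) {
            delete d_edges[i];              // allowed: Graph is a friend
        }
    }
    Gedge *addEdge(double weight)
    {
        Gedge *edge = new Gedge(weight);    // allowed: Graph is a friend
        d_edges.push_back(edge);
        return edge;
    }
    int numEdges() const { return static_cast<int>(d_edges.size()); }
};

// Gedge onStack(2.0);   // would not compile: the constructor is private
```

Clients can still read an edge through the pointer they are handed, but they can neither create one on the program stack nor delete one that the Graph owns.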
[Figure 5-90: the graph component]
The fact that the physical coupling is brought about only by friendship (as discussed in
Section 3.6) and not hard physical dependencies opens the door to another technique,
which we will explore in the next section. For completeness, the implementation file
for the graph component is provided in Figure 5-91.
To summarize the results of this section: factoring is a general technique that can be
used to reduce the maintenance cost of designs with inherent cyclic dependencies. By
relocating some of the implementation complexity to lower-level components, that
functionality can be tested (and possibly reused) independently of the remaining
cyclically interdependent code. Factoring results in more flexible architectures without
sacrificing runtime efficiency. When factoring the interface of a subsystem, clients
may be asked to use component interfaces at lower levels of the subsystem hierarchy.
// graph.c
#include "graph3.h"
#include <string.h>

// -*-*-*-*- class Gnode -*-*-*-*-
Gnode::Gnode(const char *name) : Node(name) {}
Gnode::~Gnode() {}
void Gnode::add(Gedge *edgePtr) { d_edges.add(edgePtr); }
void Gnode::remove(Gedge *edgePtr) { d_edges.removeAll(edgePtr); }
const GedgePtrBag& Gnode::edges() const { return d_edges; }
[Figure: diagram labeled "interface component" and "implementation component"]
Suppose that component y in Figure 5-92b defines the OrderedPointCollection of
Section 5.7. Clients of our subsystem may have absolutely no need for ordered point
collections, yet this component is used by other components within our subsystem to
implement higher-level functionality. At the lower levels of a subsystem, components
will be exchanging correspondingly lower-level information. This information,
although it is an implementation detail to the end user, is well defined, predictable,
and appropriate for the interfaces of low-level components.
We could try to hide OrderedPointCollection by making all of its interface func-
tions private and granting specific, higher-level components, such as u and v, friend
status, but why complicate matters? There is no harm a client can do with the defini-
tion of OrderedPointCollection so long as this type is not used in the interfaces
of the components that define the overall interface to the subsystem.17
The subsystem shown in Figure 5-92a is similar in structure to the factored implemen-
tation of the graph subsystem presented in the previous section. In that architecture
(Figure 5-85), clients were asked to make use of lower-level components (such as
ptrbag) in the normal course of using the subsystem.
Recall that in the factored implementation of the graph subsystem, both Gnode and
Gedge were managed by Graph, meaning that Graph alone was authorized to create
and destroy Gnode and Gedge objects. In that implementation, neither Gnode nor Gedge
was an encapsulated detail of the subsystem; instances of these types, comprising
the graph's implementation, were readily accessible through the interface of class
Graph itself. To prevent clients from usurping the manager class's authority, much of
the interface to both Gnode and Gedge was declared private, and Graph alone was
granted friend status. Solely to avoid the breach of encapsulation that would result
from long-distance friendship, we were compelled to place Graph, Gnode, and Gedge
within a single component.
17 If an implementation class provides functions that alter static variables within the class or .c file,
this principle may not hold.
In the factored solution, only Graph had private access to Gnode and both classes were
defined in the same component. That approach eliminated the potential for improper
direct use of Gnode by clients, but it precluded direct testing of Gnode as well.
With this new approach, instead of being forced to test the low-level functionality of
Gnode (e.g., adding and removing Gedge pointers) indirectly through the interface of
Graph, it is now possible for test engineers to verify this now-public behavior directly.
However, ordinary clients will now also have direct access to this low-level functionality.
// gnode.h
#ifndef INCLUDED_GNODE
#define INCLUDED_GNODE

#ifndef INCLUDED_NODE
#include "node.h"
#endif

#ifndef INCLUDED_GEDGEPTRBAG
#include "gedgeptrbag.h"
#endif

class Gnode : public Node {
    GedgePtrBag d_edges;

  public:
    Gnode(const char *name);
    ~Gnode();
    void add(Gedge *edgePtr);
    void remove(Gedge *edgePtr);
    const GedgePtrBag& edges() const;
};

#endif
Originally, Graph was granted private access to both Gnode and Gedge to preserve
encapsulation. The encapsulation was at risk only because clients of Graph were
granted direct access to the Gnode and Gedge objects, which themselves were largely
implementation details of Graph. If we stop exposing Gnode and Gedge in the inter-
face of Graph, we can avoid this problem entirely.
Failing to publish header files is not the solution; that's cheating. Not granting cli-
ents access to one or more header files will make the use of certain types opaque,
but these types are still programmatically accessible in name and therefore not
encapsulated details. For example, an opaque pointer obtained from one part of the
system could be unexpectedly reintroduced by clients into another part of the system
in a way that renders the system internally inconsistent.
Notice how easy it is for a client to extract an opaque S pointer from an instance of
class W and use it to influence an instance of class E directly:
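The kind of misuse meant here can be reconstructed as a hypothetical sketch. Only the class names S, W, and E come from the text; every member and the function meddle are invented for illustration, and the definition of S appears at the end only so that the sketch is complete in a single file.

```cpp
// ---- what the client sees: S is known in name only ----
class S;

class E {
    const S *d_last_p;
    int d_numAttached;
  public:
    E() : d_last_p(0), d_numAttached(0) {}
    void attach(const S *s) { d_last_p = s; ++d_numAttached; }
    int numAttached() const { return d_numAttached; }
};

class W {
    S *d_s_p;
  public:
    W();
    W(const W&) = delete;
    W& operator=(const W&) = delete;
    ~W();
    S *s() const { return d_s_p; }    // exposes S, if only opaquely
};

// ---- a client that never sees the definition of S ----
void meddle(const W& w, E *e)
{
    e->attach(w.s());    // compiles: only the name S is needed
}

// ---- the withheld implementation ----
class S {
  public:
    int d_value;
};

W::W() : d_s_p(new S) { d_s_p->d_value = 0; }
W::~W() { delete d_s_p; }
```

The function meddle is compiled before the definition of S is visible, yet it moves a piece of W's implementation into E without W's knowledge.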
Compare this approach with a design that properly hides its implementation details
behind an encapsulating interface (i.e., a design where there is no exposure of the
implementation types in the logical interface of the wrapper component for that sub-
system). Even with access to all header files, there is still no programmatic way to
access the low-level implementation objects hiding behind the truly encapsulating
interface of the wrapper.
The advantages of proper encapsulation are many. A clear example is reuse. Trying to
encapsulate an implementation type by withholding a header file effectively prevents
public reuse of that implementation component. If encapsulation is done properly, cli-
ents can have side-by-side access to both low-level types and the subsystems that use
them internally, with no fear that private details of the subsystem will be exposed.
Let us now return to our graph example. Successfully levelizing this new graph archi-
tecture will not be achieved by hiding the low-level implementation types of our sub-
system from test engineers and/or clients. It makes no difference what others do with
their own instances of these types. Rather, successful levelization of this architecture
will be achieved by ensuring that there is no programmatic way to access any instance
of any implementation type that is part of an instance of our subsystem.
[Figure: component-dependency diagram for the new graph architecture, Levels 1 to 4]
The Node and Edge classes, containing only network-independent data, are also pro-
grammatically accessible from the interface of the new graph component. However,
from the perspective of users of the graph subsystem, the types Gnode, Gedge, and
GraphImp and all types defined in ptrbag are now implementation details that are
fully encapsulated by the new wrapper component.
To appreciate this solution, consider that a client who has access to gnode.h still can-
not affect any Gnode that has been created through the graph component's interface.
Of course, the user is still free to create and manipulate his or her independent Gnode
instances (e.g., for testing purposes).
Figure 5-96 shows the header file of the wrapper component for the new graph sub-
system. The four additional support classes (NodeId, EdgeId, NodeIter, and
EdgeIter) establish the encapsulation, and either supply or require private access to
Graph. All of these classes must therefore reside in the same component as Graph in
order to avoid long-distance friendships.
// graph.h
#ifndef INCLUDED_GRAPH
#define INCLUDED_GRAPH

#ifndef INCLUDED_GRAPHIMP
#include "graphimp.h"
#endif

#ifndef INCLUDED_GNODE
#include "gnode.h"
#endif

#ifndef INCLUDED_GEDGE
#include "gedge.h"
#endif

class NodeId {
    Gnode *d_node_p;
    friend class EdgeId;
    friend class Graph;
    friend class NodeIter;
    friend class EdgeIter;

  private:
    NodeId(Gnode *node) : d_node_p(node) {}
    Gnode *gnode() const { return d_node_p; }

  public:
    NodeId() : d_node_p(0) {}
    NodeId(const NodeId& nid) : d_node_p(nid.d_node_p) {}
    ~NodeId() {}
    NodeId& operator=(const NodeId& nid) { d_node_p = nid.d_node_p; return *this; }
    operator Node *() const { return d_node_p; }
    Node *operator->() const { return *this; }
};

class EdgeId {
    Gedge *d_edge_p;
    friend class Graph;
    friend class EdgeIter;

  private:
    EdgeId(Gedge *edge) : d_edge_p(edge) {}
    Gedge *gedge() const { return d_edge_p; }

  public:
    EdgeId() : d_edge_p(0) {}
    EdgeId(const EdgeId& eid) : d_edge_p(eid.d_edge_p) {}
    ~EdgeId() {}
    EdgeId& operator=(const EdgeId& eid) { d_edge_p = eid.d_edge_p; return *this; }
    NodeId from() const { return NodeId(d_edge_p->from()); }
    NodeId to() const { return NodeId(d_edge_p->to()); }
    operator Edge *() const { return d_edge_p; }
    Edge *operator->() const { return *this; }
};

class Graph {
    GraphImp d_imp;
    friend class NodeIter;
    friend class EdgeIter;

  private:
    Graph(const Graph&);                        // not implemented
    Graph& operator=(const Graph&);             // not implemented

  public:
    Graph() {}
    ~Graph() {}
    NodeId addNode(const char *nodeName)
    {
        return NodeId(d_imp.addNode(nodeName));
    }
    NodeId findNode(const char *nodeName)
    {
        return NodeId(d_imp.findNode(nodeName));
    }
    // ...
};

class NodeIter {
    GnodePtrBagIter d_iter;

  private:
    NodeIter(const NodeIter&);                  // not implemented
    NodeIter& operator=(const NodeIter&);       // not implemented

  public:
    NodeIter(const Graph& graph) : d_iter(graph.d_imp.nodes()) {}
    void operator++() { ++d_iter; }
    operator const void *() const { return d_iter; }
    NodeId operator()() const { return NodeId(d_iter()); }
};

class EdgeIter {
    GedgePtrBagIter d_iter;

  private:
    EdgeIter(const EdgeIter&);                  // not implemented
    EdgeIter& operator=(const EdgeIter&);       // not implemented

  public:
    EdgeIter(const Graph& graph) : d_iter(graph.d_imp.edges()) {}
    EdgeIter(const NodeId& nid) : d_iter(nid.gnode()->edges()) {}
    void operator++() { ++d_iter; }
    operator const void *() const { return d_iter; }
    EdgeId operator()() const { return EdgeId(d_iter()); }
};

#endif
Notice that, in this interface, there is no direct access to any Gnode or Gedge. Adding
or looking up a node returns a surrogate object of type NodeId, which holds a pointer
to a Gnode, but under no circumstances will a NodeId ever let the client have access to
more than just the Node portion of the Gnode it holds.
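The surrogate idea can be exercised in a reduced form. The sketch below is an illustration under simplifying assumptions, not the wrapper component itself: the Graph holds a single node, d_numEdges is a hypothetical stand-in for the edge bag, and connect and numEdges are invented operations that show the wrapper unwrapping a NodeId on the client's behalf.

```cpp
#include <string>

class Node {
    std::string d_name;
  public:
    explicit Node(const char *name) : d_name(name) {}
    const char *name() const { return d_name.c_str(); }
};

class Gnode : public Node {
    int d_numEdges;              // hypothetical stand-in for the edge bag
  public:
    explicit Gnode(const char *name) : Node(name), d_numEdges(0) {}
    void add() { ++d_numEdges; }
    int numEdges() const { return d_numEdges; }
};

class Graph;

class NodeId {
    Gnode *d_node_p;
    friend class Graph;
    explicit NodeId(Gnode *node) : d_node_p(node) {}   // only Graph can wrap
    Gnode *gnode() const { return d_node_p; }          // only Graph can unwrap
  public:
    NodeId() : d_node_p(0) {}
    operator Node *() const { return d_node_p; }       // Node portion only
    Node *operator->() const { return d_node_p; }
};

class Graph {
    Gnode d_only;                // a one-node "graph" keeps the sketch short
  public:
    Graph() : d_only("Rick") {}
    NodeId findNode() { return NodeId(&d_only); }
    void connect(const NodeId& id) { id.gnode()->add(); }
    int numEdges(const NodeId& id) const { return id.gnode()->numEdges(); }
};

// id->add();   // would not compile: Node has no add()
```

A client holding a NodeId can read the name through operator-> but must go back through Graph for anything that touches the network portion.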
Modifying the old test driver to accommodate the new wrapper interface requires only
a few minor changes. In particular, (Gnode *) types are replaced by NodeId types
and a few unnecessary #include directives are eliminated. The output is, of course,
identical. The modified test driver is shown in Figure 5-97.
// graph.t.c
#include "graph.h"
#include <iostream.h>

main()
{
    Graph g;
    {
        NodeId n1 = g.addNode("Mindy");
        NodeId n2 = g.addNode("Susan");
        NodeId n3 = g.addNode("Rick");
        g.addNode("Franklin");
        g.addNode("Cathy");
    }
    cout << g;
}
Use of the wrapper interface is in some respects simpler than a factored implementation
because most, if not all, of the available functionality is presented in a single, mono-
lithic header file. For example, to iterate over the edges in a graph or node, we do not
need to look further than the header for graph itself:
Wrapping has the disadvantage of making the interface less flexible and communication
across it slower. A wrapped subsystem is also likely to be more costly to develop ini-
tially. However, wrapping may be the only truly effective way to achieve both level-
ization and encapsulation for subsystems involving many highly interdependent
components.
We have come a long way from the simple two-component example of Figure 5-10 in
Section 5.1.3, but the seven components in Figure 5-95 lay a strong hierarchical foun-
dation for producing a complex yet easy-to-use and highly reliable subsystem. The
topic of wrappers is continued in Section 6.4.3, where we discuss how to insulate our
clients from compile-time dependency on the implementation types below our wrapper
components.
5.11 Summary
By considering the physical implications of our logical design and proactively engi-
neering our system as a levelizable collection of components, we create a hierarchy of
modular abstractions that can be understood, tested, and reused independently of the
rest of our design.
Using these techniques to create levelizable designs tends to reduce the large, some-
times even overwhelming, logical design space, and helps to guide developers in the
direction of more mainstream, maintainable architectures. Fortunately there is a ser-
endipitous synergy between good logical design and good physical design. Given
time, these two design goals will come to reinforce one another.
Insulation
First we establish the need for addressing insulation as part of our overall architectural
design, providing both theoretical and experimental justification. Next, we identify
many specific C++ constructs that can cause compile-time coupling without attempting
to alleviate it. In Section 6.3, we discuss several techniques for insulating individ-
ual details of the implementation exposed via the following mechanisms:
• compiler-generated functions,
• include directives,
• private member data, and
• default arguments.
In Section 6.4, we discuss wholesale techniques used for insulating all details of the
implementation:
• protocol classes,
• fully insulating concrete classes, and
• insulating wrapper components.
Insulating very large subsystems presents a unique problem for developers. In Section
6.5, we explore implementing an ANSI C-compliant procedural interface for a very
large C++ system.
Finally, in Section 6.6, we explore the conditions under which insulation is indicated.
The basic runtime costs associated with insulation will be presented, along with specific
conditions under which insulation is not appropriate. We demonstrate the process of
applying insulation, and measure the runtime costs associated with various degrees of
insulation.
Consider the header file for the stack component shown in Figure 6-1. The logical
interface of this Stack class fully encapsulates its implementation. Programmatically,
// stack.h
#ifndef INCLUDED_STACK
#define INCLUDED_STACK

class Stack {
    int *d_stack_p;
    int d_size;
    int d_length;

  public:
    Stack();
    Stack(const Stack &stack);
    ~Stack();
    Stack& operator=(const Stack &stack);
    void push(int value);
    int pop();
    int top() const;
    int isEmpty() const;
};

#endif
// stack.h
#ifndef INCLUDED_STACK
#define INCLUDED_STACK

class StackLink;

class Stack {
    StackLink *d_stack_p;

  public:
    Stack();
    Stack(const Stack &stack);
    ~Stack();
    Stack& operator=(const Stack &stack);
    void push(int value);
    int pop();
    int top() const;
    int isEmpty() const;
};

#endif
Even though both Stack classes fully encapsulate their implementations, any experi-
enced C++ programmer looking at these header files can immediately determine the
general implementation strategy of these components. Each of these stack compo-
nent headers illustrates the difficulty in concealing proprietary implementations even
with encapsulating interfaces. Inline functions can exacerbate the problem by expos-
ing clients to algorithmic details as well.
But the desire to keep component implementations proprietary is not the dominant
problem for large projects. A client has a right to expect that the logical interface of a
component will not change, and ideally changes made to the logical implementation
of a component should not affect clients. In reality, however, the C++ compiler
depends on all information in a header file, including private data. If a human being
can determine the implementation strategy of a component by inspecting its header,
then it is likely that clients of the component would be forced to recompile if the
implementation strategy of that component changes.
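By way of contrast, and anticipating the fully insulating concrete classes listed for Section 6.4, a header can be arranged so that it reveals nothing about the implementation strategy. The following single-file sketch is an assumption, not one of the book's figures: StackImp and its std::vector member are hypothetical, and the comments mark which part would live in stack.h and which in stack.c.

```cpp
#include <vector>

// ---- what would go in stack.h ----
class StackImp;                  // known in name only

class Stack {
    StackImp *d_imp_p;           // the only data member a client's compiler sees
  public:
    Stack();
    Stack(const Stack& stack);
    ~Stack();
    Stack& operator=(const Stack& stack);
    void push(int value);
    int pop();
    int top() const;
    int isEmpty() const;
};

// ---- what would go in stack.c ----
class StackImp {
  public:
    std::vector<int> d_values;   // could become a linked list without
                                 // any change to the header above
};

Stack::Stack() : d_imp_p(new StackImp) {}
Stack::Stack(const Stack& stack) : d_imp_p(new StackImp(*stack.d_imp_p)) {}
Stack::~Stack() { delete d_imp_p; }

Stack& Stack::operator=(const Stack& stack)
{
    if (this != &stack) {
        *d_imp_p = *stack.d_imp_p;
    }
    return *this;
}

void Stack::push(int value) { d_imp_p->d_values.push_back(value); }

int Stack::pop()
{
    int value = d_imp_p->d_values.back();
    d_imp_p->d_values.pop_back();
    return value;
}

int Stack::top() const { return d_imp_p->d_values.back(); }
int Stack::isEmpty() const { return d_imp_p->d_values.empty(); }
```

The price is an extra allocation and an extra indirection on every call, which is why the conditions under which insulation is appropriate need separate discussion.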
Even for relatively small systems (say, 50,000 lines total), this type of coupling is burden-
some at best; for medium and large systems, it is intolerable. For example, a .c file that
should take only seconds to compile now takes minutes, and the total compile-time cost
of a single uninsulated change is now measured not in CPU seconds but in CPU hours!
The system illustrated in Figure 6-3 consists of a base class Shape, a number of specific
shapes derived from Shape, and a number of clients that depend only on the base class
Shape. This system has no cyclic physical dependencies and is therefore levelizable.
// shape.h
#ifndef INCLUDED_SHAPE
#define INCLUDED_SHAPE

class Shape {
    int d_x;                                // could change to short
    int d_y;                                // could change to short

  public:
    Shape(int x, int y);
    virtual void draw() const;
    int xOrigin() const;
    // ...
};

#endif
// circle.h
#ifndef INCLUDED_CIRCLE
#define INCLUDED_CIRCLE
// ...
#endif

// client3.c
#include "client3.h"
#include "shape.h"
// ...
Originally the author of class Shape decided to use integers to represent the coordinates
of the origin. Later the author realized that the integer range afforded by a short int
was sufficient and that the size of Shape instances could be reduced significantly. The
fundamental type of a private data member used to store the coordinates is clearly an
implementation detail of the Shape class. The interface would not change, and it would
continue to accept and return normal integers in the valid range (see Section 9.2). In
fact, this detail is entirely encapsulated by the interface of Shape. Yet there is a problem.
Suppose that the author of Shape changes the private coordinate data type from int to
short int. Which of the components in Figure 6-3 would be forced to recompile?
Unfortunately, the correct answer is "all of them." Both Circle and Rectangle
inherit from Shape and depend intimately on the internal physical layout of Shape.
When any of Shape's data members change, the internal layout of Circle and
Rectangle will also have to change accordingly.
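The dependency on layout can be observed directly with sizeof. The sketch below uses hypothetical class names and omits the virtual function so that only the data members determine the layout; it illustrates the effect and is not a measurement from the text.

```cpp
#include <cstddef>

// Two versions of the same base class; only the private coordinate type differs.
struct ShapeWithInt   { int   d_x; int   d_y; };
struct ShapeWithShort { short d_x; short d_y; };

// A derived class embeds its base, so its own layout follows the base's.
struct CircleOverInt   : ShapeWithInt   { int d_radius; };
struct CircleOverShort : ShapeWithShort { int d_radius; };

// These values are compiled into every translation unit that creates,
// copies, or allocates one of these objects.
std::size_t baseSizeBefore()    { return sizeof(ShapeWithInt); }
std::size_t baseSizeAfter()     { return sizeof(ShapeWithShort); }
std::size_t derivedSizeBefore() { return sizeof(CircleOverInt); }
std::size_t derivedSizeAfter()  { return sizeof(CircleOverShort); }
```

On a typical platform with 4-byte int and 2-byte short, the base shrinks and the offset of d_radius within the derived object moves with it, so object code compiled against the old header would read and write the wrong bytes.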
Clients of Shape are no better off. For one thing, the position of the virtual table
pointer in the physical layout of the Shape object will almost certainly be affected by
the change from int to short int. Unless the dependent code is recompiled, it sim-
ply will not work. More generally, whenever a header file is modified, all clients that
include that header file must be recompiled. Therefore, whenever any part of the
implementation resides in the header file of a component, the component fails to
"insulate" clients from that part of its logical implementation.
When bugs occur between internal releases of the various levels of a large system,
insulating components (i.e., components that insulate clients from their implementa-
tions) are much more easily patched than non-insulating components. As long as the
interface is not altered, the modified implementation can be dropped in place without
having to recompile other components or worrying about headers becoming out of
date. (We revisit this important topic in Section 7.6.2.)
One final testament to the value of insulation is that it can enable us to replace dynam-
ically loaded libraries transparently. Dynamically loaded libraries are not linked into a
single executable but, rather, are linked on demand into a running program. Suppose
that you are the vendor of some C++-based application library. If you supply a fully
insulated library implementation, then you can provide performance enhancements
and bug-fixes without disturbing your clients at all. Sending them an update does not
force them to recompile or even relink. All they do is reconfigure their environment to
point to the new dynamically loaded library, and off they go.
In the following subsection we take a quantitative look at the cost of compile-time cou-
pling. After that, we look at specific ways in which implementation details in C++ can
become non-insulating, and then discuss transformations that can improve the degree of
insulation.
I then measured the CPU time needed to compile the .c file. The experiment was
repeated using headers 1,000 lines long instead of 100 lines. Figure 6-5 provides the
results of running this simple experiment using the CFRONT 3.0 compiler running on a
SUN SPARC 20 Workstation with 32 megabytes of memory.
1 This subsection provides experimental data to corroborate the claims in the main section and may
be omitted without loss of continuity.
334 Insulation Chapter 6
The first column represents the relative size of the system where N represents the
number of components of equal size. The next two columns represent the measured
compile-time cost for headers on the order of 100 lines and 1,000 lines, respectively.
   N    ~100-line headers    ~1,000-line headers
   1          0.1                    0.4
   2          0.1                    1.0
   4          0.2                    3.4
   8          0.4                   11.0
  16          0.8                   32.2
  32          2.4                  137.7
  64          8.2                  497.5
 128         26.5              more than a day
 256         98.1
 512        397.6
1024    more than a day
If the total number of included lines is around 3,000 (30 small components or 3 large
ones), doubling the number of included lines roughly triples the compile-time cost.
For projects of this scale, the cost of recompiling a single .c file using CFRONT 3.0 is
roughly proportional to N^1.6 and gets progressively worse for larger systems. A translation
unit that might otherwise take only a few seconds to compile might now take
several minutes.
As if this were not bad enough, because each component is compile-time dependent
on every other component, an uninsulated change to any one component implies that
all others must recompile as well. The cost of a single uninsulated change in a large
compile-time coupled system is not proportional to N^2 but more like N^3!
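The exponents quoted above can be related by a short back-of-the-envelope calculation. This is only a sketch, under the assumption that the compile time of one translation unit follows a power law near the measured range; the symbols T, c, and k are introduced here for illustration and do not appear in the original.

```latex
\begin{align*}
T(N) &\approx c\,N^{k}, &
\frac{T(2N)}{T(N)} = 2^{k} \approx 3
  &\;\Longrightarrow\; k = \log_{2} 3 \approx 1.58 \\
\text{cost of one uninsulated change} &\approx N \cdot T(N) \approx c\,N^{k+1}
\end{align*}
```

"Doubling triples the cost" therefore corresponds to an exponent of about 1.6. The last measured doubling in the 100-line column (98.1 to 397.6) is roughly a factor of four, which suggests k is approaching 2 for larger systems; recompiling all N translation units then costs on the order of N^3.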
Section 6.2 C++ Constructs and Compile-Time Coupling 335
If, when compiling any single translation unit, the amount of included header file
information causes the compiler to exceed available physical memory, virtual memory
swapping will completely overwhelm the cost of compilation, as was the case for
the last entry in Column 2 and the last four entries in Column 3 of Figure 6-5. That is,
for a given compiler and system configuration, there can be fairly hard limits to the
absolute size of any given translation unit. For this particular configuration, 60,000
lines was practical; 100,000 lines was not.
Sometimes the logical and physical decompositions of components are naturally con-
sistent with each other. Consider a non-inline member function of a class. Its logical
interface (the declaration) resides in the physical interface (the .h file), and its logical
implementation (the function body) resides in the physical implementation (the .c
file). In this case, the declaration merely describes the interface without exposing any
more information than is necessary or desirable.
C++ does not require that all details regarding the logical implementation exist in the
.c file. C++ allows this tight compile-time coupling for performance reasons. For a
small, light-weight component implementing a stack or a list, avoiding compile-time
coupling by completely insulating its implementation could have too great an impact
on performance to be practical. Such light-weight components typically reach a stable
state quickly and then are seldom if ever modified.
• inheritance,
• layering,
• inline functions,
• private members,
• protected members,
• compiler-generated functions,
• include directives,
• default arguments, and
• enumerations.
Whenever one class derives from another, even privately, there can be no way to insu-
late clients from that fact. Even though private inheritance is considered an encapsu-
lated implementation detail of the derived class, the physical layout of the derived
object forces every client that includes the definition of the derived class to have
already seen the definition of the base class. It is therefore appropriate for the header
file of a derived class to include explicitly the header files containing its base classes.
Whenever a base class header is modified (even if just to add a comment), UNIX utilities
such as make will feel obliged to recompile any client of a derived class before
linking that client into any new executable.
Figure 6-6 illustrates that if any change is made to the physical interface of B, then not
only D but also all clients of D (i.e., C1, C2, and C3) will be forced to recompile.
Section 6.2.2 Layering (HasA/HoldsA) and Compile-Time Coupling 337
private inheritance
In contrast, when a class merely holds the address of an object (HoldsA), the class is
not necessarily dependent on the physical layout of the held object. If so, it is appro-
priate for the header containing the class not to include the header for the held object
but instead merely to declare its type.
Figure 6-7 illustrates a situation where class Stooges uses (in its implementation
only) classes Moe, Larry, and Curly. Unlike classes Larry and Curly, a Moe is embedded
in every Stooges object and therefore is not insulated from clients of Stooges.
Any modification to the header file of Moe will necessitate the recompilation of all cli-
ents of Stooges.
// stooges.h
#ifndef INCLUDED_STOOGES
#define INCLUDED_STOOGES

#ifndef INCLUDED_MOE
#include "moe.h"
#endif

class Larry;
class Curly;

class Stooges {
    Moe d_moe;
    Larry *d_larry;
    Curly& d_curly;

  public:
    Stooges();
    // ...
};

#endif
Figure 6-7: Embedded, Layered Objects Are Not Insulated from Clients
1. Any programmer that can use the component can look at the inline imple-
mentation.
Figure 6-8 illustrates ways that inlining can uninsulate otherwise insulated implementation
details of class Fred. For example, Fred holds pointers to objects of type
Wilma, Betty, Barney, and MrSlate, and therefore Fred's object layout does not
depend on the object layout of any of these types. Because member function
getWilma returns an object of type Wilma by value and is declared inline, it is necessary
for all clients of class Fred to have already seen the definition of class Wilma.
Since member function getBetty is not declared inline, clients of Fred that do not
need to call getBetty (and otherwise do not depend on type Betty in size) need not
include the header file for class Betty. In other words, clients that do not use type
Betty are not forced to depend on Betty at compile time.
2 Passing a user-defined type into a function by value is almost never done (see Section 9.1.11),
// fred.h
#ifndef INCLUDED_FRED
#define INCLUDED_FRED

#ifndef INCLUDED_WILMA
#include "wilma.h"
#endif

#ifndef INCLUDED_MRSLATE
#include "mrslate.h"
#endif

class Barney;
class Betty;

class Fred {
    Wilma *d_wilma_p;
    Barney *d_barney_p;
    Betty *d_betty_p;
    MrSlate *d_mrSlate_p;

  public:
    Fred();
    Wilma getWilma() const { return *d_wilma_p; }
    Betty getBetty() const;     // non-inline function
    const Barney& getBarney() const { return *d_barney_p; }
    double getSalary() { return d_mrSlate_p->askForRaise(); }
};

#endif
An instance of class Barney is returned from member function getBarney by reference,
so unless a client depends on class Barney in size, there is no need for that client
to include the class definition for Barney. Again, the client is not forced to depend on
what it does not need.
Finally, the member function getSalary makes substantive use of the encapsulated
MrSlate object in its implementation. Because getSalary is declared inline, all clients
of Fred are required to have seen the definition of class MrSlate, whether or not
they call getSalary. Of course, should any of the implementations of these inline
functions change, all clients of Fred would have to recompile.
We are often reminded that private member functions are encapsulated implementation
details of a class, but they are not insulated implementation details, even when they
are not declared inline. Altering so much as the signature of a private member function
of a class is enough to force all clients of the component defining that class to
recompile.
Figure 6-9 illustrates the problem with private members. The d_length member is a
detail that was added presumably because it was felt that keeping track of the length
was more efficient than calculating it on demand. If this assumption turns out to be
false, removing d_length will cause all clients of this component to recompile. Similarly,
the copy function was implemented to factor the copy operation for use in both
the copy constructor and assignment operator. If we now decided to change the signature
of this private helper function from copy(const String&) to copy(const char *)
to enable its use in implementing the default constructor as well, all clients would
again be forced to recompile.
// str.h
#ifndef INCLUDED_STR
#define INCLUDED_STR

class String {
    char *d_string_p;
    int d_length;
    void copy(const String& string);

  public:
    String(const char *str);
    String(const String& string);
    ~String();
    String& operator=(const String& string);
    // ...
};

#endif
When considering protected members, base-class authors must now address two dis-
tinct audiences: derived-class authors and general users. Protected functions are in the
interface specifically for derived classes, but are intended to be treated as implementa-
tion details by general users. Note that protected member data is rarely appropriate,
especially in widely used interfaces for which insulation is a design goal.
On the surface, the protected interface provides a convenient place for prospective
derived-class authors to look to determine what will be required of them. However,
just as with private members, the protected interface is declared in the class definition
and is therefore not an insulated implementation detail as far as general users are con-
cerned. Modifying the protected interface of a base class in any way will force the
recompilation of (1) all clients of the base class, (2) all derived classes, and (3) all clients
of the derived classes.
constructor is specified, the compiler will generate one with member-wise copy
semantics. That is, a copy constructor will be generated that copies each member object
and each base-class object according to its own individual initialization semantics.4
  public:
    // CREATORS
    ComplexSymbol(const String& name, double re, double im = 0.0);
    // Default copy ctor and dtor are fine.

    // MANIPULATORS
    // Default assignment operator is fine.
    // ...

    // ACCESSORS
    // ...
};
new declarations into the header file of our component is a change that cannot be
insulated from clients.
In the experiment at the beginning of this chapter (Section 6.1.1) that demonstrated
the high cost of compile-time coupling, we did not even consider the possibility that
each header might directly include every other header. 5 Instead we assumed that each
.c file explicitly included every header in the system because it needed to do so. In
practice, this scenario does not happen.
What is much more likely to occur is that each header file will include one or more
header files that, in turn, include one or more other header files, until eventually virtually
every header file in the system has been included. This is where redundant
include guards (Section 2.5) help to reduce the cost of compiling by eliminating the
quadratic behavior we observed in the time spent by the C++ preprocessor.
Consider the example in Figure 6-11. A Bank class uses a BankCard class and a variety
of currency classes in its interface. The Bank class does not inherit from any other
class. Let us assume that Bank does not have any inline functions that make substantive
use of class BankCard or any of the currency classes. Let us further assume that class
Bank does not embed instances of any user-defined class (HasA) in its own definition.
5 In some environments, you might encounter a limitation on the number of open source files permitted
at any one time.
Section 6.2.7 Include Directives and Compile-Time Coupling 345
// bank.h
#ifndef INCLUDED_BANK
#define INCLUDED_BANK

#ifndef INCLUDED_BANKCARD
#include "bankcard.h"
#endif

#ifndef INCLUDED_GERMANMARKS
#include "germanmarks.h"
#endif

#ifndef INCLUDED_JAPENESEYEN
#include "japeneseyen.h"
#endif

#ifndef INCLUDED_UNITEDSTATESDOLLARS
#include "unitedstatesdollars.h"
#endif

#ifndef INCLUDED_ENGLISHPOUNDS
#include "englishpounds.h"
#endif
// ...
// ...
// ...

class Bank {
    // ...
    Bank(const Bank&);              // We don't want to copy
    Bank& operator=(const Bank&);   // or assign banks.

  public:
    // CREATORS
    Bank();
    ~Bank();

    // MANIPULATORS
    GermanMarks getMarks(BankCard *cashMachineCard, double amount);
    JapeneseYen getYen(BankCard *cashMachineCard, double amount);
    UnitedStatesDollars getDollars(BankCard *cashMachineCard, double amount);
    EnglishPounds getPounds(BankCard *cashMachineCard, double amount);
    // ...
    // ...
    // ...
    LakosianFooBars getFooBars(BankCard *cashMachineCard, double amount);
};

#endif
Now consider a client of an instance of this Bank in the United States. This person is
typically interested in going to the bank with his or her bank card and withdrawing
some amount of money in United States dollars. A simple example of a Person's
withdraw member function is shown in Figure 6-12.
// person.c
#include "person.h"
#include "bank.h"
// ...
Picture the fictitious island republic of Lakos; its national unit of currency, the
FooBar, is notoriously unstable and subject to change without notice. Today this
country has again announced its intention to make an uninsulated change to its imple-
mentation of FooBar. The world financial community is demanding to know who will
be forced to recompile.
Not only will all actual clients of LakosianFooBars have to recompile, but so will all
other clients of Bank. That is, if you banked at this bank, whether or not you ever cared
or had even heard about LakosianFooBars, any change at all to lakosianfoobar.h
will cause software configuration management tools (such as make) to recompile you
automatically.
To add insult to injury, there is no real need for you to be compile-time dependent on
that currency! None of your code depends on that currency at compile time. So why
did bank's author decide to include all these header files in bank.h instead of bank.c?
The answer you might receive is "for the convenience of our clients."
The author of the bank component believes that just in case you might need some
class definition, we'll include it for you. This approach has the relatively small advantage
that as long as you include bank.h, you will never need to include the header for
UnitedStatesDollars or your BankCard. However, this approach also has the relatively
large disadvantage that you will forever be at the mercy of a potentially large
number of header files that you neither control nor otherwise care about.
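As a minimal sketch of the alternative, the header could merely declare the classes named in Bank's interface and leave each client to include the definitions it actually uses. C++ allows a function to be declared with an incomplete class as its return or parameter type; the definition is needed only where the function is defined or called. The single-file layout below and the stand-in class bodies are assumptions made for illustration, not the original component.

```cpp
#include <cassert>

// --- a leaner bank.h (hypothetical rewrite): class names only ---
class BankCard;
class UnitedStatesDollars;
class LakosianFooBars;

class Bank {
  public:
    // Declaring these functions requires only the declarations above.
    UnitedStatesDollars getDollars(BankCard *cashMachineCard, double amount);
    LakosianFooBars getFooBars(BankCard *cashMachineCard, double amount);
};

// --- headers a dollar-only client would include itself (stand-ins) ---
class BankCard {};

class UnitedStatesDollars {
    double d_amount;
  public:
    explicit UnitedStatesDollars(double amount) : d_amount(amount) {}
    double amount() const { return d_amount; }
};

// --- bank.c stand-in: the real file would include every currency header ---
UnitedStatesDollars Bank::getDollars(BankCard *cashMachineCard, double amount)
{
    assert(cashMachineCard);
    return UnitedStatesDollars(amount);
}

// --- client code: compiles although LakosianFooBars is never defined here ---
double withdrawDollars(Bank *bank, BankCard *card, double amount)
{
    return bank->getDollars(card, amount).amount();
}
```

In this arrangement a change to the FooBar header would affect only bank.c and those clients that include that header explicitly.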
the function can be more self-documenting simply because they place more informa-
tion in the header file:
class Circle {
    // ...
  public:
    Circle(double x = 0, double y = 0, double radius = 1);
    // ...
};
Unfortunately, such default values become compiled in along with the interface and
any modification of those values will force clients to recompile.
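One possible way to keep such values out of the header, sketched here as an assumption rather than as the author's prescription, is to replace the default arguments with overloaded functions whose definitions (and therefore the default values) live in the .c file. The file boundaries are indicated by comments; the constant names are hypothetical.

```cpp
#include <cassert>

// --- circle.h (hypothetical): no default values appear in the interface ---
class Circle {
    double d_x;
    double d_y;
    double d_radius;
  public:
    Circle();                                   // component-chosen defaults
    Circle(double x, double y, double radius);
    double x() const { return d_x; }
    double y() const { return d_y; }
    double radius() const { return d_radius; }
};

// --- circle.c (hypothetical): the defaults live with the implementation ---
static const double DEFAULT_X = 0;
static const double DEFAULT_Y = 0;
static const double DEFAULT_RADIUS = 1;

Circle::Circle() : d_x(DEFAULT_X), d_y(DEFAULT_Y), d_radius(DEFAULT_RADIUS) {}

Circle::Circle(double x, double y, double radius)
: d_x(x), d_y(y), d_radius(radius)
{
    assert(radius >= 0);
}
```

Changing DEFAULT_RADIUS now requires recompiling only circle.c. The trade-off is that the header is less self-documenting, and the one- and two-argument forms that the default arguments provided would each need their own overload.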
Enumerations, CPP macros, typedefs, and (by default) non-member const data do not
have external linkage (see Sections 2.3.3 and 2.3.4). As such, these constructs must
appear in the header file of a component if they are to be used by other components (or if
they appear in the body of any inline functions intended for use outside the component).
Figure 6-13 illustrates the common practice in small projects of grouping all system-
wide definitions into a single component. As more components are added to the sys-
tem, these components will typically include this common definitions file. Whenever
the need for a new definition or return status is encountered, it is added to the
sysdefs. h file. The more components that are added, the more opportunities there
are to add to the common definitions. Whenever a common definition is added to
sysdefs.h, almost all components in the system are forced to recompile.
Eventually the system reaches the point where making an addition to the global definitions
is simply too expensive. Instead of being placed in this file, useful definitions
are kept local or private. Instead of adding new specific return status values to the enumeration,
preexisting codes (such as UNSPECIFIED_ERROR) are used over and over,
even though they are vague or even inappropriate.
// sysdefs.h
#ifndef INCLUDED_SYSDEFS
#define INCLUDED_SYSDEFS

#ifndef INCLUDED_MATH
#include <math.h>       // bad idea: should be insulated
#define INCLUDED_MATH
#endif

struct SysDefs {
    typedef int (*Pfdi)(double);
    typedef double (*Pfid)(int);
    enum ReturnStatus {
        SUCCESS = 0,
        WARNING,
        IOERROR,
        FILE_NOT_FOUND,
        // ...
        OUT_OF_RANGE,
        // ...
        OUT_OF_MEMORY,
        // ...
        // ...
        INVALID_GEOMETRY,
        // ...
        // ...
        // ...
        UNSPECIFIED_ERROR
    };
};

#endif
The problem here is that enumerations and typedefs are not implementation details
but rather are plainly part of the public interface of a component. The interface of this
component is not a well-organized, cohesive presentation of a single abstraction.
Instead it is an eclectic hodgepodge of details. This all too common use of enumera-
tions does not scale well as project size increases.
The compile-time coupling in this system arises because this interface is driven not
from the lower levels of the physical hierarchy but from the yet-to-be-implemented
higher levels. This upward dependency imposes an implicit compile-time coupling
among all clients, even though these clients are in unrelated parts of the system. This
example is an instance of a more general problem, involving the sharing of ownership
for a component.
In the following section we discuss specific techniques for addressing this and other
problems related to insulation.
Section 6.3 Partial Insulation Techniques 349
Not every component should attempt to insulate its clients from every implementation
detail. But, all other things being equal, it is better to insulate a client from an implementation
detail than not to do so, even if only to reduce the clutter in the physical
interface.
Figure 6-14 illustrates how a class can privately inherit from another class and then
selectively publish all members with a given name in its own interface using an access
declaration.
6 ellis, Section 11.3, p. 244.
7 stroustrup94, Section 17.5.2, p. 419.
Figure 6-14 (listings not recovered): (a) Private Base Class Header File, base.h; (b) Derived Class Header File, myclass.h
The usefulness of the access declaration is dubious for a couple of reasons. It exposes
a set of functions in the public interface, yet in order for a client to know what those
functions are, the client must look at the header of the privately derived (implementa-
tion) class in order to know the appropriate arguments and return values. Another
problem is that this class fails to insulate its client from its private base class. The cli-
ent is exposed to changes in private (unpublished) functions that may not even be used
in the implementation of the derived class.
One reason for using private inheritance instead of layering is to take advantage of
the virtual table(s) of the base class. By overriding the behavior of the virtual func-
tions declared in a private base class, we may be able to "customize" or "program"
other behaviors that depend on the overridden behavior at the base-class level. It also
is possible to invent a dummy class for derivation purposes and then proceed with lay-
ering using that dummy class. If insulation is not an issue, then private inheritance
may be appropriate. If, however, this class is to become part of a more generally pub-
lic interface, then a transformation from inheritance to layering is in order.
Figure 6-15 illustrates how the same logical interface as the one in Figure 6-14b can
be achieved without exposing clients to the details of the implementation class.
Instead of privately deriving from class Base, the new implementation holds an outwardly
opaque pointer to class Base. Whenever an instance of MyClass is created, the
Section 6.3.1 Removing Private Inheritance 351
Figure 6-15 (listings not recovered): myclass.h and myclass.c
Instead of using access declarations to publish members of a private base class selectively,
new member functions of MyClass are defined (out-of-line) to forward their
calls to corresponding functions defined in class Base. Note that all member functions
of MyClass that depend on Base in size must be declared non-inline if clients are to
be insulated from the definition of Base.
In this way, class MyClass now insulates its clients from all organizational changes to
class Base. Had class Base been abstract, then d_base_p would point to a dummy
concrete class derived from Base, perhaps implemented entirely in file myclass.c.
Note that all of this insulation is not without its cost (e.g., extra function calls and
dynamic allocation), as discussed in detail in Section 6.6.1.
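The transformation from private inheritance to layering can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the original listing: the member functions f1 and f2 and the body of Base are hypothetical, and the three files are shown in one block with their boundaries marked by comments.

```cpp
#include <cassert>

// --- myclass.h (hypothetical): Base is declared but never defined here ---
class Base;

class MyClass {
    Base *d_base_p;                         // outwardly opaque pointer
    MyClass(const MyClass&);                // not implemented in this sketch
    MyClass& operator=(const MyClass&);     // not implemented in this sketch
  public:
    MyClass();
    ~MyClass();
    void f1(int value);                     // forwards to Base::f1
    int f2() const;                         // forwards to Base::f2
};

// --- base.h (stand-in): included only by myclass.c ---
class Base {
    int d_value;
  public:
    Base() : d_value(0) {}
    void f1(int value) { d_value = value; }
    int f2() const { return d_value; }
};

// --- myclass.c: the only file that needs the definition of Base ---
MyClass::MyClass() : d_base_p(new Base) {}

MyClass::~MyClass() { delete d_base_p; }

void MyClass::f1(int value)
{
    assert(d_base_p);
    d_base_p->f1(value);                    // out-of-line forwarding call
}

int MyClass::f2() const { return d_base_p->f2(); }
```

Because every function that touches Base is defined out of line, clients of MyClass compile against the forward declaration alone; the price is one allocation per object and one extra call per forwarded function.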
Even if performance requirements prevent us from fully insulating a class, we can still
insulate clients from an individual implementation class by converting all embedded
instances of that implementation class to pointers (or references) to that class and then
managing those pointers explicitly in the constructors, destructors, and assignment
operators of the class.
Figure 6-16 shows how we can selectively insulate clients from implementation
classes by converting a HasA relationship (Figure 6-16a) to a HoldsA relationship
(Figure 6-16b). In doing so we must redeclare all inline functions that formerly operated
on MyClass data members of type YourClass to be non-inline. The downside of
HoldsA is the increased effort required to manage the layered instance, and also the
additional performance costs associated with indirection, dynamic allocation, and
non-inline functions. Notice how we can continue to access performance-critical
member data (such as d_count) via inline functions.
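A minimal sketch of the "after" arrangement, assuming a hypothetical YourClass with set and get members and hypothetical MyClass accessors (only d_count is named in the text above):

```cpp
#include <cassert>

// --- myclass.h (hypothetical "after" version): YourClass only declared ---
class YourClass;

class MyClass {
    YourClass *d_yourClass_p;               // was an embedded YourClass
    int d_count;
  public:
    MyClass();
    MyClass(const MyClass& other);
    ~MyClass();
    MyClass& operator=(const MyClass& rhs);
    int count() const { return d_count; }   // still inline: no YourClass needed
    void setValue(int value);               // formerly inline, now out of line
    int value() const;                      // formerly inline, now out of line
};

// --- yourclass.h (stand-in): included only by myclass.c ---
class YourClass {
    int d_value;
  public:
    YourClass() : d_value(0) {}
    void set(int value) { d_value = value; }
    int get() const { return d_value; }
};

// --- myclass.c: the held object is managed explicitly ---
MyClass::MyClass() : d_yourClass_p(new YourClass), d_count(0) {}

MyClass::MyClass(const MyClass& other)
: d_yourClass_p(new YourClass(*other.d_yourClass_p)), d_count(other.d_count) {}

MyClass::~MyClass() { delete d_yourClass_p; }

MyClass& MyClass::operator=(const MyClass& rhs)
{
    if (this != &rhs) {
        *d_yourClass_p = *rhs.d_yourClass_p;    // reuse existing allocation
        d_count = rhs.d_count;
    }
    return *this;
}

void MyClass::setValue(int value)
{
    assert(d_yourClass_p);
    d_yourClass_p->set(value);
    ++d_count;
}

int MyClass::value() const { return d_yourClass_p->get(); }
```

Each MyClass owns its own YourClass, so copies remain independent, while count() stays inline because it depends only on data that is still embedded.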
Section 6.3.3 Removing Private Member Functions 353
Figure 6-16 (listings not recovered): (a) Before Insulating YourClass from Clients of MyClass; (b) After Insulating YourClass from Clients of MyClass
Instead of making the function a private member of the class, make it a static free
function declared at file scope in the .c file of the component.8
Sometimes functions are made private members not because they need private access
but because the private section of the header file is a good place to store these factored
helper functions. That is, some private helper functions can do all of their work using
only the public interface of the class. In these cases, the transformation from private
member to static free functions is easy and quickly accomplished in two steps.
The first step is to convert each private member function to a private static member by
adding an appropriate writable pointer or read-only reference parameter to the function.
Consider class MyClass, as defined in Figure 6-17a. Class MyClass contains two
private member functions, f and g. Member f is a non-const (manipulator) function
and member g is a const (accessor) function. The manipulator f potentially alters the
object, so, in keeping with our policy (see Section 9.1.1), we will pass the instance by
non-const pointer along with the other arguments to the function. The accessor g is
innocuous and we will pass the instance by const reference along with g's other arguments,
as shown in Figure 6-17b.
// myclass.h (Figure 6-17a)
#ifndef INCLUDED_MYCLASS
#define INCLUDED_MYCLASS
class MyClass {
    // ...
  private:
    void f(...);
    int g(...) const;
  public:
    // ...
};
#endif

// myclass.h (Figure 6-17b)
#ifndef INCLUDED_MYCLASS
#define INCLUDED_MYCLASS
class MyClass {
    // ...
  private:
    static void f(MyClass *myClass, ...);
    static int g(const MyClass& myClass, ...);
  public:
    // ...
};
#endif
8 We will be able to achieve this same effect more elegantly using unnamed namespaces, as discussed
in stroustrup94, Section 17.5.3, pp. 419-420, once this relatively new language feature
becomes more widely available.
The second step is to remove these function declarations entirely from the header file,
remove the member notation from function definitions in the .c file (shown in Figure
6-18a), and finally precede each of these definitions by the keyword static, as shown
in Figure 6-18b. Note that this second step should not require any changes to the
implementations of the other member functions defined in the .c file.
// myclass.c (Figure 6-18a)
#include "myclass.h"
void MyClass::f(MyClass *myClass, ...) { /* ... */ }
int MyClass::g(const MyClass& myClass, ...) { /* ... */ }
// ...

// myclass.c (Figure 6-18b)
#include "myclass.h"
static void f(MyClass *myClass, ...) { /* ... */ }
static int g(const MyClass& myClass, ...) { /* ... */ }
// ...
Unfortunately, private member functions often operate directly on other private implementation
details, which can make these functions more difficult to extricate. Consider
the list component defined in Figure 6-19. Class List contains three private
member functions (copy, clean, and end) that are used repeatedly to help implement
the public functionality of class List.
The copy function is already a static member, but it needs access to the auxiliary
("slave") class Link. Both clean() and end() depend on access to the private data
member d_head_p that identifies the head of the list, and there are no public functions
that can be used to obtain access to it. Making these three functions non-members of
List will strip them of their privileged access to the implementations of both List
and Link. Although these functions will no longer have access to the private details of
either class, the callers of these functions are members with full access, and they are
at liberty to offer up this information.
// list.h
#ifndef INCLUDED_LIST
#define INCLUDED_LIST

class List;
class ListIter;
class ostream;

class Link {
    int d_data;
    Link *d_next_p;
    friend List;
    friend ListIter;
    Link(const Link& link);             // not implemented
    Link& operator=(const Link& link);  // not implemented

    // CREATORS
    Link(int data, Link *next = 0);
};

class List {
    Link *d_head_p;
    friend ListIter;

  private:
    static Link *copy(const Link *link, Link *end = 0);
        // allocate and return new copy of given list of links
    void clean();
        // destroy and deallocate entire list of links
    Link *& end();
        // return a reference to the end of the list

  public:
    // CREATORS
    List();
    List(const List& list);
    ~List();

    // MANIPULATORS
    List& operator=(const List& list);
    void append(int i);
    void append(const List& list);
    void prepend(int i);
    void prepend(const List& list);
};

ostream& operator<<(ostream& o, const List& list);

class ListIter {
    // ...
};

#endif
Figure 6-19a: list.h File for List Class with Private Member Functions
// list.c
#include "list.h"
#include <iostream.h>

// **********
// class Link
// **********

// CREATORS
Link::Link(int data, Link *next) : d_data(data), d_next_p(next) {}

// **********
// class List
// **********

// PRIVATE MEMBERS
Link *List::copy(const Link *link, Link *end)
{
    Link *linkPtr = end;
    for (Link **addrLinkPtr = &linkPtr; link; link = link->d_next_p) {
        *addrLinkPtr = new Link(link->d_data, *addrLinkPtr);
        addrLinkPtr = &(*addrLinkPtr)->d_next_p;
    }
    return linkPtr;
}

void List::clean()
{
    while (d_head_p) {
        Link *tmp = d_head_p;
        d_head_p = d_head_p->d_next_p;
        delete tmp;
    }
}

// CREATORS
List::List() : d_head_p(0) {}
List::List(const List& list) : d_head_p(copy(list.d_head_p)) {}
List::~List() { clean(); }

// MANIPULATORS
List& List::operator=(const List& list)
{
    if (this != &list) {
        clean();
        d_head_p = copy(list.d_head_p);
    }
    return *this;
}

// FREE FUNCTION
ostream& operator<<(ostream& o, const List& list)
{
    o << '[';
    for (ListIter it(list); it; ++it) {
        o << ' ' << it();
    }
    return o << " ]";
}

// **************
// class ListIter
// **************
// ...
Figure 6-19b: list.c File for List Class with Private Member Functions
As shown in Figure 6-20, we can modify both the clean and end helper member
functions so that they, like copy, are declared static and take as arguments the private
information to which they need access. Clients of these two functions must now
provide a little more information when they make the call, but these functions will no
longer have to rely on private access to the List class to do their jobs. The only problem
that remains is that these functions still depend on access to the private functionality
of the encapsulated Link class in order to accomplish their tasks.
// list.h
// ...
class List {
    // ...
  private:
    static void clean(Link *link);
    static Link *& end(Link **addrLinkPtr);
    // ...
};

// list.c
// ...
// ...
// ...
One solution is to make the needed functionality in the Link class publicly accessible.
Since the use of Link is an encapsulated implementation detail of List, there is little
harm that can come from allowing clients (or test engineers) to play with separate
instances of the Link class. However, a better solution from an insulation point of
view is to move the trivial definition of the Link class to the .c file and make it
entirely public. Not only does this solution increase the insulation of the list component's
implementation, but it also eliminates a lot of unnecessary clutter in its header
file. The improved version of list is shown in Figures 6-21a and 6-21b.
// list.h
#ifndef INCLUDED_LIST
#define INCLUDED_LIST

class Link;
class List;
class ListIter;
class ostream;

class List {
    Link *d_head_p;
    friend ListIter;

  public:
    // CREATORS
    List();
    List(const List& list);
    ~List();

    // MANIPULATORS
    List& operator=(const List& list);
    void append(int i);
    void append(const List& list);
    void prepend(int i);
    void prepend(const List& list);
};

class ListIter {
    // ...
};

#endif
Figure 6-21a: list.h File for list Component with Static Free Functions
// list.c
#include "list.h"
#include <iostream.h>

// **********
// class Link
// **********
struct Link {
    int d_data;
    Link *d_next_p;
    Link(int data, Link *next = 0) : d_data(data), d_next_p(next) {}
};

// **********
// class List
// **********

// STATIC FREE FUNCTIONS
static Link *copy(const Link *link, Link *end = 0)
{
    Link *linkPtr = end;
    for (Link **addrLinkPtr = &linkPtr; link; link = link->d_next_p) {
        *addrLinkPtr = new Link(link->d_data, *addrLinkPtr);
        addrLinkPtr = &(*addrLinkPtr)->d_next_p;
    }
    return linkPtr;
}

// CREATORS
List::List() : d_head_p(0) {}
List::List(const List& list) : d_head_p(copy(list.d_head_p)) {}
List::~List() { clean(d_head_p); }

// MANIPULATORS
List& List::operator=(const List& list)
{
    if (this != &list) {
        clean(d_head_p);
        d_head_p = copy(list.d_head_p);
    }
    return *this;
}

// FREE FUNCTION
ostream& operator<<(ostream& o, const List& list)
{
    o << '[';
    for (ListIter it(list); it; ++it) {
        o << ' ' << it();
    }
    return o << " ]";
}

// **************
// class ListIter
// **************
// ...
Figure 6-21b: list.c File for list Component with Static Free Functions
Sometimes private member functions can be converted to static free functions that are
independent of the types defined in the current component. If these functions are
non-trivial, it could be advantageous to attempt to verify them directly. Instead of creating a
single component with inaccessible yet non-trivial static free functions, consider
making two components, one with public static members used to implement the other.
Figure 6-22 illustrates the result of moving independent static functions at file scope
from the myclass.c file and making them into publicly accessible static member
functions in a separate utility component. This technique makes sense when the
functions are either reusable or non-trivial, and it is especially useful when the CCD of
these functions alone is very much smaller than it is for the original component.
section 6.3.4 Removing Protected Members 363
// myclass.c
#include "myclass.h"
#include "myclassimputil.h"
// ...

// myclassimputil.h
#ifndef INCLUDED_MYCLASSIMPUTIL
#define INCLUDED_MYCLASSIMPUTIL
// ...
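The figure fragment above is cut off after its include guards. As a rough sketch of the two-component arrangement it describes, the following shows helpers that might once have been static free functions in myclass.c recast as public static members of a utility struct. The helper names (clamp, checksum) and the MyClass members are hypothetical and are not taken from the original figure; the file boundaries are indicated only by comments so that the sketch compiles as a single translation unit.

```cpp
#include <cassert>
#include <cstddef>

// --- myclassimputil.h (hypothetical contents) ---
// Formerly static free functions at file scope in myclass.c; as public
// static members they can be called directly from a test driver.
struct MyClassImpUtil {
    static int clamp(int value, int low, int high);
    static unsigned checksum(const char *bytes, std::size_t length);
};

// --- myclassimputil.c ---
int MyClassImpUtil::clamp(int value, int low, int high)
{
    return value < low ? low : (value > high ? high : value);
}

unsigned MyClassImpUtil::checksum(const char *bytes, std::size_t length)
{
    unsigned sum = 0;
    for (std::size_t i = 0; i < length; ++i) {
        sum = sum * 31u + static_cast<unsigned char>(bytes[i]);
    }
    return sum;
}

// --- myclass.h (hypothetical): no mention of the helpers ---
class MyClass {
    int d_level;

  public:
    MyClass() : d_level(0) {}
    void setLevel(int level);
    int level() const { return d_level; }
};

// --- myclass.c: the only place that includes myclassimputil.h ---
void MyClass::setLevel(int level)
{
    d_level = MyClassImpUtil::clamp(level, 0, 10);
}
```

Because the utility has no dependency on MyClass, it can be tested on its own, and clients of myclass.h never see it.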
Although static functions are preferable to private members with respect to
compile-time coupling, performance can become an issue, especially if there is a lot of private
state information that must be passed into and out of the static functions at file scope.
In such cases, other, more general forms of insulation (discussed in Section 6.4) may
be preferable.
What are protected members good for? That is, when is it appropriate to have pro-
tected access to class members? The simplistic answer is that protected members are
appropriate when you wish to distinguish between two distinct audiences: derived-
class authors and general users. The protected interface is every bit as important as the
public interface when it comes to encapsulating private details (see Section 2.2), yet
the protected interface is often given less attention than the public one. Realize that
even though the protected interface of an individual instantiated object is not accessible
by the public, anyone can derive a class that depends on these protected details.
The next question is then, "When would someone want to address two distinct audi-
ences from within a single class?" More often than not, the answer is, "When some-
one is trying to do too much with a single class."
Consider the header for the abstract base class Shape shown in Figure 6-23. Presumably
each derived-shape object has an origin and an area, and knows how to draw
itself on a given Screen. The screen object provides all the functionality needed to
draw lines and arcs; however, writing the code to achieve this has been found to be
both tedious and error prone. Knowing this, the author of the Shape base class has
provided a suite of protected member functions to aid the derived-class author in
implementing his or her own specialized draw function.
Figure 6-24 illustrates a derived Rectangle class and the implementation of its draw
function using protected helper functions provided in the base class. The Rectangle
is defined only by its lower-left and upper-right corners, which implicitly forces the
edges of the Rectangle to be horizontal and vertical. The derived-class author has
also defined the lower-left corner to coincide with the origin of the shape.
// shape.h
#ifndef INCLUDED_SHAPE
#define INCLUDED_SHAPE

#ifndef INCLUDED_POINT
#include "point.h"
#endif

class Screen;

class Shape {
  public:
    // TYPES
    enum Status { IO_ERROR = -1, SUCCESS = 0 };

  private:
    // DATA
    Point d_origin;
    Status d_drawStatus;

  protected:
    // DERIVED CLASS SUPPORT
    static double distance(const Point& start, const Point& end);
    void resetDrawStatus();
    Status getDrawStatus() const;
    void drawLine(Screen *screen, const Point& start, const Point& end);
    void drawArc(Screen *screen, const Point& center, double radius,
                 double startAngle, double endAngle);

  private:
    Shape& operator=(const Shape&);     // not implemented
    Shape(const Shape&);                // not implemented

  public:
    // CREATORS
    Shape(const Point& origin);
    virtual ~Shape();

    // MANIPULATORS
    void setOrigin(const Point& origin);

    // ACCESSORS
    const Point& origin() const;
    virtual double area() const = 0;
    virtual Status draw(Screen *screen) = 0;
};

#endif

Figure 6-23: Shape Class with Protected Support for Derived-Class Authors
// rectangle.h
#ifndef INCLUDED_RECTANGLE
#define INCLUDED_RECTANGLE

#ifndef INCLUDED_SHAPE
#include "shape.h"
#endif

class Rectangle : public Shape {
    // ...

  public:
    // CREATORS
    Rectangle(const Point& lowerLeft, const Point& upperRight);
    Rectangle(const Rectangle& rect);
    ~Rectangle();

    // MANIPULATORS
    Rectangle& operator=(const Rectangle& rect);
    void setUpperRightCorner(const Point& upperRight);

    // ACCESSORS
    const Point& upperRightCorner() const;
    double area() const;
    Shape::Status draw(Screen *screen);
};

#endif

// rectangle.c
#include "rectangle.h"
// ...

Figure 6-24: Derived Rectangle Shape and the Implementation of Its Draw Member
In order to draw a Rectangle, we will need to draw four lines. If any error occurs we
will need to return IO_ERROR from the Rectangle::draw function. Our first step is to
clear the draw status. We then identify the appropriate coordinates and make the
necessary calls to the protected helper functions. If any error occurs along the way, these
helper functions will internally set the draw status to IO_ERROR. When we are done,
we simply return the draw status.
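The steps just described can be sketched as follows. This is only an illustration under stated assumptions: Point and Screen are minimal stand-ins invented here (the real classes are not shown in this section), the Shape helpers are defined inline for brevity, and the data member d_upperRightCorner is a guess at how Rectangle stores its second corner.

```cpp
#include <cassert>

// Stand-in types (assumptions, not the book's real Point and Screen).
struct Point {
    double d_x, d_y;
    Point(double x, double y) : d_x(x), d_y(y) {}
};

struct Screen {
    int d_linesDrawn;
    bool d_failing;                 // when true, every request fails
    Screen() : d_linesDrawn(0), d_failing(false) {}
    bool line(const Point&, const Point&)
    {
        if (d_failing) return false;
        ++d_linesDrawn;
        return true;
    }
};

class Shape {
  public:
    enum Status { IO_ERROR = -1, SUCCESS = 0 };

  private:
    Point d_origin;
    Status d_drawStatus;

  protected:
    void resetDrawStatus() { d_drawStatus = SUCCESS; }
    Status getDrawStatus() const { return d_drawStatus; }
    void drawLine(Screen *screen, const Point& start, const Point& end)
    {
        if (!screen->line(start, end)) d_drawStatus = IO_ERROR;
    }

  public:
    Shape(const Point& origin) : d_origin(origin), d_drawStatus(SUCCESS) {}
    virtual ~Shape() {}
    const Point& origin() const { return d_origin; }
    virtual Status draw(Screen *screen) = 0;
};

class Rectangle : public Shape {
    Point d_upperRightCorner;       // hypothetical data member

  public:
    Rectangle(const Point& lowerLeft, const Point& upperRight)
    : Shape(lowerLeft), d_upperRightCorner(upperRight) {}
    Shape::Status draw(Screen *screen);
};

Shape::Status Rectangle::draw(Screen *screen)
{
    resetDrawStatus();                      // 1. clear the draw status
    const Point& ll = origin();             // 2. identify the corners
    const Point& ur = d_upperRightCorner;
    Point lr(ur.d_x, ll.d_y);
    Point ul(ll.d_x, ur.d_y);
    drawLine(screen, ll, lr);               // 3. draw the four edges
    drawLine(screen, lr, ur);
    drawLine(screen, ur, ul);
    drawLine(screen, ul, ll);
    return getDrawStatus();                 // 4. report the result
}
```

Note that every client of Shape, not just Rectangle, sees the protected helpers and the d_drawStatus member in the header.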
This is one way of doing business that is convenient for base-class authors and
derived-class authors alike, but takes its toll on general clients by compile-time cou-
pling them to numerous implementation details that they neither need nor want. This
scenario is illustrated by the component/class diagram in Figure 6-25.
In this case there is little justification for polluting the public interface of class Shape
with details that only the derived-class authors care about. Suppose that instead of
having each of the derived classes depend on services provided in the base class, each
derived class uses a separate component (if needed) to facilitate drawing. This way, the
unnecessary coupling associated with the protected members would be eliminated.
As Figure 6-26 shows, the new system is now factored so that the derived-class
authors use a separate scribe component that the general public does not see.
The header for the scribe component is shown in Figure 6-27. Since the functionality
provided in this new component is no longer embedded in Shape, we have decided to
uncouple it completely. The drawing functionality no longer depends on Shape in any
way, and now this facility can readily be reused by objects other than those derived
from Shape that might need to render themselves on a Screen.
// scribe.h
#ifndef INCLUDED_SCRIBE
#define INCLUDED_SCRIBE

class Screen;
class Point;

class Scribe {
    int d_hadError;

  private:
    Scribe& operator=(const Scribe&);   // not implemented
    Scribe(const Scribe&);              // not implemented

  public:
    // STATICS
    static double distance(const Point& start, const Point& end);

    // CREATORS
    Scribe();
    ~Scribe();

    // MANIPULATORS
    void drawLine(Screen *screen, const Point& start, const Point& end);
    // ...
};

#endif
Derived-class authors will not find it difficult to use the public members of class
Scribe instead of the protected members of the base class. Since the scribe component
is provided only as a convenience, those who do not find its functionality useful
need neither include its header nor depend on it at link time. The reimplemented draw
function for Rectangle is shown in Figure 6-28. The new version of the header for
the Shape base class is given in Figure 6-29.
Occasionally it is not feasible to remove all of the protected members of a class. Such
is the case when the derived class needs access to protected services provided by a
base class in order to override virtual functions.
// rectangle.c
#include "rectangle.h"
#include "scribe.h"
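The rectangle.c fragment above stops after its include directives. A minimal sketch of how the reimplemented draw might look when it uses a local Scribe object instead of protected base-class helpers is given below. It assumes stand-in Point and Screen types, an accessor named hadError (hypothetical; only the d_hadError member appears in the surviving part of the scribe header), and inline definitions for brevity.

```cpp
#include <cassert>

// Stand-in types (assumptions, not the book's real Point and Screen).
struct Point {
    double d_x, d_y;
    Point(double x, double y) : d_x(x), d_y(y) {}
};

struct Screen {
    int d_linesDrawn;
    bool d_failing;
    Screen() : d_linesDrawn(0), d_failing(false) {}
    bool line(const Point&, const Point&)
    {
        if (d_failing) return false;
        ++d_linesDrawn;
        return true;
    }
};

// scribe component: knows nothing about Shape.
class Scribe {
    int d_hadError;
    Scribe& operator=(const Scribe&);   // not implemented
    Scribe(const Scribe&);              // not implemented

  public:
    Scribe() : d_hadError(0) {}
    void drawLine(Screen *screen, const Point& start, const Point& end)
    {
        if (!screen->line(start, end)) d_hadError = 1;
    }
    int hadError() const { return d_hadError; }   // hypothetical accessor
};

// shape component: no protected members, no draw status.
class Shape {
    Point d_origin;

  public:
    enum Status { IO_ERROR = -1, SUCCESS = 0 };
    Shape(const Point& origin) : d_origin(origin) {}
    virtual ~Shape() {}
    const Point& origin() const { return d_origin; }
    virtual Status draw(Screen *screen) = 0;
};

class Rectangle : public Shape {
    Point d_upperRightCorner;           // hypothetical data member

  public:
    Rectangle(const Point& lowerLeft, const Point& upperRight)
    : Shape(lowerLeft), d_upperRightCorner(upperRight) {}
    Shape::Status draw(Screen *screen);
};

Shape::Status Rectangle::draw(Screen *screen)
{
    Scribe scribe;                      // fresh scribe: no error recorded yet
    const Point& ll = origin();
    const Point& ur = d_upperRightCorner;
    Point lr(ur.d_x, ll.d_y);
    Point ul(ll.d_x, ur.d_y);
    scribe.drawLine(screen, ll, lr);
    scribe.drawLine(screen, lr, ur);
    scribe.drawLine(screen, ur, ul);
    scribe.drawLine(screen, ul, ll);
    return scribe.hadError() ? IO_ERROR : SUCCESS;
}
```

Only rectangle.c needs to include the scribe header in this arrangement; general clients of Shape never see it.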
An abstract base class that defines some shared functionality is sometimes referred to
as a partial implementation. This type of factored implementation allows derived-
class authors to share a common implementation, but protected functionality again
places a burden on general users of the base class by exposing them to uninsulated
implementation details.
// shape.h
#ifndef INCLUDED_SHAPE
#define INCLUDED_SHAPE

#ifndef INCLUDED_POINT
#include "point.h"
#endif

class Screen;

class Shape {
    Point d_origin;

  private:
    Shape& operator=(const Shape&);     // not implemented
    Shape(const Shape&);                // not implemented

  public:
    // TYPES
    enum Status { IO_ERROR = -1, SUCCESS = 0 };

    // CREATORS
    Shape(const Point& origin);
    virtual ~Shape();

    // MANIPULATORS
    void setOrigin(const Point& origin);

    // ACCESSORS
    const Point& origin() const;
    virtual double area() const = 0;
    virtual Status draw(Screen *screen) = 0;
};

#endif
For example, Figure 6-30 illustrates a simple base class that is used both to provide a
common interface and to factor the common implementation for cars. All cars have a
location, yet the public cannot alter that location directly. Instead, clients must call the
public member function drive that, in turn, will cause the location of the car to
change in various ways, depending on the implementation of the actual (derived) car.
// car.h
#ifndef INCLUDED_CAR
#define INCLUDED_CAR

class Car {
    int d_xLocation;
    int d_yLocation;

  private:
    Car(const Car&);                    // not implemented
    Car& operator=(const Car&);         // not implemented

  protected:
    Car(int x, int y);
    int setXLocation(int x);
    int setYLocation(int y);
        // Only derived classes can set the location of a car directly.
    void move(int deltaX, int deltaY);
    static double distance1(double acceleration, double time);
    static double distance2(double acceleration, double velocity);
    double howFar(int newXLocation, int newYLocation) const;

  public:
    // CREATORS
    virtual ~Car();

    // MANIPULATORS
    virtual void drive(/* ... */) = 0;
        // Public clients alter the location of the
        // car by calling the public function drive.

    // ACCESSORS
    int xLocation() const;
    int yLocation() const;
};

#endif
Several helper functions have been supplied in the protected interface of this base
class in order to aid derived-class authors in implementing the drive function of their
own specific class. For instance, the function move takes relative distances and sets the
new absolute location of the Car. Static functions distance1 and distance2 are
independent of instance data and provide support for physical distance calculations. The
howFar accessor function compares the current position with a specified new position
and returns the as-the-crow-flies distance between the two points.
Unlike the Shape base class, however, Car's interface defines a pure virtual function
drive that, depending on the actual derived type of Car, must in turn set the value of
the Car's location using protected functions provided by its partial implementation.
The design of the Car base class couples the interface with at least a portion of the
implementation. Now if a car manufacturer wants to develop an entirely new design
for a car, it is forced to carry around the overhead of the partial implementation
defined in the base class whether or not it is used!
In the case of Car, some of the functionality (e.g., the static functions and the howFar
accessor) could certainly be moved to a separate utility class, as was done for Shape.
But extricating the partial implementation from this base class requires a more
comprehensive effort.
Figure 6-31a illustrates the component/class diagram for the original uninsulated system.
By factoring the pure interface and partial implementation of Car into two separate
classes (Car and CarImp, respectively), we will be able to separate them
physically. By placing the pure interface in a separate component, we provide an
insulating interface for public clients of Car, as illustrated in Figure 6-31b. Note that since
CarImp derives from Car, further derived classes that choose to share the common
implementation may continue to do so. Fortunately, changes made to the physical
organization of CarImp cannot affect clients of Car. The extracted protocol for a Car is
shown in Figure 6-32.
// car.h
#ifndef INCLUDED_CAR
#define INCLUDED_CAR

class Car {
  public:
    // CREATORS
    virtual ~Car();

    // MANIPULATORS
    virtual void drive(/* ... */) = 0;
        // Public clients alter the location of the
        // car by calling the public function drive.

    // ACCESSORS
    virtual int xLocation() const = 0;
    virtual int yLocation() const = 0;
};

#endif

// carimp.h
#ifndef INCLUDED_CARIMP
#define INCLUDED_CARIMP
// ...
What we have done in order to insulate the general users from all of the implementa-
tion details is to extract a pure interface (referred to in this book as a protocol).
Extracting a protocol is a very general and powerful technique for simultaneously
achieving both levelization and insulation. Protocol classes and how to extract them
are the subject of Section 6.4.1.
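To make the factoring concrete, here is a minimal sketch of the Car protocol together with a CarImp partial implementation and one concrete car. It is an illustration only: the parameter list of drive is elided in the figures, so the single int argument used here is an assumption, as are the EastboundCar class and the driveAndReportX client function. File boundaries are marked with comments so that the sketch compiles as one translation unit.

```cpp
#include <cassert>

// --- car.h: the protocol seen by general clients ---
class Car {
  public:
    virtual ~Car();
    virtual void drive(int distance) = 0;   // argument is an assumption
    virtual int xLocation() const = 0;
    virtual int yLocation() const = 0;
};

// --- car.c ---
Car::~Car() {}                              // defined empty and out-of-line

// --- carimp.h: partial implementation, seen only by derived-class authors ---
class CarImp : public Car {
    int d_xLocation;
    int d_yLocation;

  protected:
    CarImp(int x, int y);
    void move(int deltaX, int deltaY);

  public:
    virtual ~CarImp();
    virtual int xLocation() const;
    virtual int yLocation() const;
};

// --- carimp.c ---
CarImp::CarImp(int x, int y) : d_xLocation(x), d_yLocation(y) {}
CarImp::~CarImp() {}
void CarImp::move(int deltaX, int deltaY)
{
    d_xLocation += deltaX;
    d_yLocation += deltaY;
}
int CarImp::xLocation() const { return d_xLocation; }
int CarImp::yLocation() const { return d_yLocation; }

// --- a hypothetical concrete car sharing the partial implementation ---
class EastboundCar : public CarImp {
  public:
    EastboundCar(int x, int y) : CarImp(x, y) {}
    virtual void drive(int distance) { move(distance, 0); }
};

// --- a general client: depends only on car.h ---
int driveAndReportX(Car *car, int distance)
{
    car->drive(distance);
    return car->xLocation();
}
```

In this arrangement a change to the data members or protected helpers of CarImp would require recompiling EastboundCar, but not driveAndReportX.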
section 6.3.5 Removing Private Member Data 375
As you may recall, in the previous section we were able to eliminate all of the
protected members from the Shape base class by introducing a separate facility to support
the implementation of draw functions in derived classes. But the base class Shape still
contained private data.
// myclass.h                            // myclass.h
#ifndef INCLUDED_MYCLASS                #ifndef INCLUDED_MYCLASS
#define INCLUDED_MYCLASS                #define INCLUDED_MYCLASS
Removing private static member data is relatively easy. Figure 6-33a shows a private
static integer data member, s_count, used to track the number of active instances of
MyClass. As long as inline member functions (or long-distance friends) do not require
direct access, it is usually possible to move static member data to a static variable
defined at file scope in the component's .c file.9 Removing non-static member data is
considerably more involved.
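A minimal sketch of the insulated form, assuming a hypothetical non-inline accessor named numInstances (the figure's actual member names beyond s_count are not visible in this copy):

```cpp
#include <cassert>

// --- myclass.h (insulated): the counter no longer appears in the header ---
class MyClass {
  public:
    MyClass();
    MyClass(const MyClass& other);
    ~MyClass();
    static int numInstances();      // hypothetical accessor, not inline
};

// --- myclass.c ---
static int s_count = 0;             // file scope, internal linkage

MyClass::MyClass() { ++s_count; }
MyClass::MyClass(const MyClass&) { ++s_count; }
MyClass::~MyClass() { --s_count; }
int MyClass::numInstances() { return s_count; }
```

Because s_count is visible only inside myclass.c, its type or even its existence can change without forcing clients of myclass.h to recompile. The price is that no inline function in the header can touch it.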
As we saw in Section 6.3.4, changing this encapsulated private data would force all public
clients of base class Shape to recompile. As was done with Car in the previous section,
we can factor Shape into two classes, one containing the pure interface and the
other containing the partial implementation (including the definition of the origin data).
The component/class diagram for the factored Shape hierarchy is given in Figure 6-34.
9 In very rare situations, allowing components to have more than one .c file enables developers of
reusable libraries to partition member function definitions based on usage patterns in order to reduce
the runtime size of typical client programs. Allowing functions to communicate via static variables
defined in the .c file reduces the flexibility to partition the individual member functions of a class
into separate translation units (.c files).
There are two distinct advantages to this architecture:
1. Clients of the Shape class are insulated from all implementation details of
the actual object derived from Shape.
Class Shape no longer embeds an instance of Point, so clients of Shape are no longer
forced to include the definition of Point in order to use a Shape. Derived classes can
continue to share the partial implementation of Shape by deriving from ShapeImp
instead of from Shape. As always, there is absolutely no additional runtime cost
associated with extending the depth in an inheritance hierarchy. The only additional cost is
that the member functions origin and setOrigin, which were statically bound, must
now be invoked through the virtual calling mechanism (see Section 6.6.1).
A protocol class for an arbitrary shape is given in Figure 6-35. Even though the Shape
class now insulates all implementation details from its public clients, we would still
opt to keep the support for drawing in a separate component for two reasons:
// shape.h
#ifndef INCLUDED_SHAPE
#define INCLUDED_SHAPE

class Point;
class Screen;

class Shape {
  public:
    // TYPES
    enum Status { IO_ERROR = -1, SUCCESS = 0 };

    // CREATORS
    virtual ~Shape();

    // MANIPULATORS
    virtual void setOrigin(const Point& origin) = 0;

    // ACCESSORS
    virtual const Point& origin() const = 0;
    virtual double area() const = 0;
    virtual Status draw(Screen *screen) = 0;
};

#endif
Unnecessary include directives can cause compile-time coupling where none would
otherwise exist. There are generally three cases where a #include directive should
appear in the header file of a component:
Infrequently, a header file that contains a local linkage construct (such as enum or
typedef in class scope) can be another plausible excuse for including one header file
in another. In general, however, there are few other situations in which placing a
#include directive in a header file is justified.
As we saw earlier in the Bank example (Section 6.2.7), the bank component author's
decision to include each of the foreign currencies was no favor at all to the clients of
class Bank. The fact that these currencies appeared (in name) in the interface in no
way implied that Bank's clients needed to know their definitions in order to make
good use of Bank. The artificial compile-time dependency of Person on all these
foreign currencies was solely the result of the nested #include directives.
The transformation is simple: move all unnecessary include directives from the header
file to the .c file, and replace them with appropriate ("forward") class declarations.
The class declaration tells the client's C++ compiler that the currency represents some
user-defined object type but says nothing about its internal layout. Clients of Bank are
now insulated from changes made to types they don't use. The easily made insulating
version of the bank component is shown in Figure 6-36.
// bank.h
#ifndef INCLUDED_BANK
#define INCLUDED_BANK

//
//
// ...
class LakosianFooBars;

class Bank {
    // ...
    Bank(const Bank&);                  // We don't want to copy
    Bank& operator=(const Bank&);       // or assign banks.

  public:
    // CREATORS
    Bank();
    ~Bank();

    // MANIPULATORS
    GermanMarks getMarks(BankCard *cashMachineCard, double amount);
    JapaneseYen getYen(BankCard *cashMachineCard, double amount);
    UnitedStateDollars getDollars(BankCard *cashMachineCard, double amount);
    EnglishPounds getPounds(BankCard *cashMachineCard, double amount);
    // ...
    // ...
    // ...
    LakosianFooBars getFooBars(BankCard *cashMachineCard, double amount);
};

#endif
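The reason forward declarations suffice here is that C++ allows a function declaration to name an incomplete class type as its return or parameter type; the full definition is needed only where the function is defined or called. The sketch below shows this in a single translation unit. The contents of GermanMarks and BankCard, and the hasBank client function, are hypothetical and exist only to make the example complete.

```cpp
#include <cassert>

// --- bank.h: only the names of the currency and card types are declared ---
class GermanMarks;
class BankCard;

class Bank {
  public:
    GermanMarks getMarks(BankCard *cashMachineCard, double amount);
};

// A client that never calls getMarks compiles with just the declarations
// above; it needs no currency header at all.
bool hasBank(const Bank *bank) { return bank != 0; }

// --- germanmarks.h and bankcard.h (hypothetical contents), included only
//     by bank.c and by those clients that actually use marks ---
class GermanMarks {
    double d_value;

  public:
    explicit GermanMarks(double value) : d_value(value) {}
    double value() const { return d_value; }
};

class BankCard {};

// --- bank.c: the definition requires the complete types ---
GermanMarks Bank::getMarks(BankCard *, double amount)
{
    return GermanMarks(amount);
}
```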
It is easy enough to remove default arguments from an interface and replace them
with equivalent individual functions: 10
class Circle {
    // ...
  public:
    Circle(double x = 0, double y = 0, double radius = 1);
    // ...
};
We can change the above interface to the more insulating version as follows:
class Circle {
    // ...
  public:
    Circle();
    Circle(double x);                   // do we really want this?
    Circle(double x, double y);
    Circle(double x, double y, double radius);
    // ...
};
Upon reflection we may decide not to provide the identical functionality and to remove
one or more of the options created for us automatically with default arguments.
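As a sketch of where the default values end up after this transformation, the version below drops the single-argument constructor and keeps the defaults as constants in the .c file. The constant names and the accessors x, y, and radius are assumptions introduced for illustration.

```cpp
#include <cassert>

// --- circle.h: no default values are visible to clients ---
class Circle {
    double d_x;
    double d_y;
    double d_radius;

  public:
    Circle();
    Circle(double x, double y);
    Circle(double x, double y, double radius);
    double x() const;               // hypothetical accessors
    double y() const;
    double radius() const;
};

// --- circle.c: defaults can change without recompiling clients ---
static const double DEFAULT_X = 0.0;
static const double DEFAULT_Y = 0.0;
static const double DEFAULT_RADIUS = 1.0;

Circle::Circle()
: d_x(DEFAULT_X), d_y(DEFAULT_Y), d_radius(DEFAULT_RADIUS) {}

Circle::Circle(double x, double y)
: d_x(x), d_y(y), d_radius(DEFAULT_RADIUS) {}

Circle::Circle(double x, double y, double radius)
: d_x(x), d_y(y), d_radius(radius) {}

double Circle::x() const { return d_x; }
double Circle::y() const { return d_y; }
double Circle::radius() const { return d_radius; }
```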
We can sometimes eliminate the compile-time coupling and yet preserve the factoring
of default arguments by interpreting an invalid optional value (e.g., a null pointer, a
zero size, or a negative index) within the body of the function itself. Recall that in the
interface for the p2p_Router (Figure 4-2) there was a function findPath that took an
"optional" first argument, which was the address at which to store the result:
class p2p_Router {
    // ...
  public:
    // ...
    int findPath(geom_Polygon *returnValue, const geom_Point& start,
                 const geom_Point& end, int width) const;
};
10 ellis, Section 8.2.6, p. 142.
By rearranging the order of arguments, we could have made this argument truly
optional without hard-coding any uninsulating value in the interface:
class p2p_Router {
    // ...
  public:
    // ...
    int findPath(const geom_Point& start, const geom_Point& end,
                 int width, geom_Polygon *returnValue = 0) const;
};
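The same idea applies to sizes: the header can advertise a neutral "not specified" value while the real default lives in the .c file. The following is a small sketch using a hypothetical Pool class and DEFAULT_CAPACITY constant, neither of which comes from the text.

```cpp
#include <cassert>

// --- pool.h (hypothetical): 0 means "let the implementation decide" ---
class Pool {
    int d_capacity;

  public:
    explicit Pool(int capacity = 0);
    int capacity() const;
};

// --- pool.c: the actual default is an insulated implementation detail ---
static const int DEFAULT_CAPACITY = 64;

Pool::Pool(int capacity)
: d_capacity(capacity > 0 ? capacity : DEFAULT_CAPACITY)   // invalid => default
{
}

int Pool::capacity() const { return d_capacity; }
```

Changing DEFAULT_CAPACITY here affects only pool.c; the 0 in the header carries no information about the implementation.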
Enumerations in the interface by their very nature evoke compile-time coupling. Judi-
cious use of enumerations, typedefs, and all other constructs with internal linkage in
the interface is essential to achieving good insulation.
Consider the three distinct kinds of enumerations shown in Figure 6-37. The first is a
private implementation detail of the class, the second is a publicly accessible constant
value, and the third is a named, enumerated list of return status values.
// whatever.h
#ifndef INCLUDED_WHATEVER
#define INCLUDED_WHATEVER

class WhatEver {
    enum { DEFAULT_TABLE_SIZE = 100 };      // 1

  public:
    enum { DEFAULT_BUFFER_SIZE = 200 };     // 2
    enum Status { A, B, C };                // 3
    Status doIt();
};

#endif
The first enumeration in Figure 6-37 is inappropriately placed (unless you need a
compile-time constant in the header, e.g., to implement a fixed array bound). This
enumeration should either be moved to the .c file at file scope or, if necessary, be
made a private static const member of the class. Representing this number as a static
class data member gives both inline functions, and functions with friend status
defined outside this translation unit, programmatic access to its value, without
exposing a "magic number" in the header file.
Section 6.3.9 Removing Enumerations 383
The second enumeration should at least be made a private static const class member,
and a public static (perhaps inline) accessor member function should be defined to
return this value. As with most insulation techniques (see Section 6.6.1), we pay a price
in runtime performance for the reduced coupling. In this case, an optimizing compiler
can take advantage of known compile-time constants, such as fundamental data
declared const at file scope, enumerators, and literals. By storing actual values (rather
than addresses) directly in the instruction stream, an extra level of indirection can be
avoided. By definition, however, these compile-time constants cannot be insulated
from clients. Hence, any attempt to change them will inevitably force client recompila-
tion. If this level of performance across this interface is an issue, then this component is
probably at too low a level to be considered a good candidate for insulation.
The third enumeration is clearly part of the interface. It may be that not all of these
status values are returned by functions in this component, but rather that this compo-
nent has been chosen to hold status values for other components as well. However, to
reduce compile-time coupling, a much preferred approach is to distribute the status
values to the appropriate components and not to attempt to reuse them. Distributing
the enumerated status values greatly reduces coupling by allowing the enumeration to
be independent of higher-levels in the physical hierarchy. Defining return values
locally has the added value of not trying to coerce subtly different meanings into
already existing status values. Each status value's meaning is local to the current
object and exactly suited for its purpose. Reusing status values is but one more case
where the benefit of reuse is more than offset by the coupling that ensues. A possible
alternative to the definitions in Figure 6-37 is illustrated in Figure 6-38.
// whatever.h
#ifndef INCLUDED_WHATEVER
#define INCLUDED_WHATEVER

class WhatEver {
    static const int s_defaultBufferSize;   // 2

  public:
    static int getDefaultBufferSize();      // 2
    enum Status { A, B, C };                // 3
    Status doIt();
};

#endif

// whatever.c
#include "whatever.h"
// ...

Figure 6-38: Alternative Definitions for the Three Enumerations of Figure 6-37
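The whatever.c half of the figure is cut off in this copy. One plausible completion, offered only as a sketch, is shown below; the body of doIt is invented purely to give the file-scope constant a use.

```cpp
#include <cassert>

// --- whatever.h, as in the figure above ---
class WhatEver {
    static const int s_defaultBufferSize;   // 2: value not visible here

  public:
    static int getDefaultBufferSize();      // 2
    enum Status { A, B, C };                // 3
    Status doIt();
};

// --- whatever.c (a guess at the missing half) ---
enum { DEFAULT_TABLE_SIZE = 100 };          // 1: now private to this file

const int WhatEver::s_defaultBufferSize = 200;

int WhatEver::getDefaultBufferSize()
{
    return s_defaultBufferSize;
}

WhatEver::Status WhatEver::doIt()
{
    // Hypothetical body: the enumerator is still a compile-time constant
    // inside this translation unit, so it can serve as an array bound.
    int table[DEFAULT_TABLE_SIZE] = { 0 };
    return table[0] == 0 ? A : B;
}
```

Both numbers can now change without touching the header; only the Status enumeration, which is genuinely part of the interface, remains visible to clients.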
Consider a function that returned a "bad" status value as a character string. Clients
would be required to know the exact form of the string. Since this value is insulated,
even determining this string the first time can be challenging for clients. Now, suppose
that one of the returned strings happened to change from ioError to IO_ERROR.
There would be no compiler support to help clients track down all places where the
comparison value in the calling routines would need to change. Even ignoring the
possibility of change, inevitable spelling errors will surely go undetected.
Section 6.4 Total Insulation Techniques 385
In general, the goal of insulation is to shield clients from the compile-time
dependency associated with knowing unnecessary, encapsulated implementation details; it
is not meant to shield clients from the programmatically accessible interface or to
compromise type safety.
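The difference in compiler support can be seen in a small sketch. The Device class and both client functions are hypothetical; the string-returning member exists only to show the hazard.

```cpp
#include <cassert>
#include <cstring>

class Device {                      // hypothetical component
  public:
    enum Status { IO_ERROR = -1, SUCCESS = 0 };

    Status open(bool simulateFailure)
    {
        return simulateFailure ? IO_ERROR : SUCCESS;
    }

    // The same result reported as a string, after the spelling was changed
    // from "ioError" to "IO_ERROR".
    const char *openAsString(bool simulateFailure)
    {
        return simulateFailure ? "IO_ERROR" : "SUCCESS";
    }
};

// Enumerated status: a stale or misspelled enumerator would not compile.
bool failedTyped(Device *device)
{
    return device->open(true) == Device::IO_ERROR;
}

// String status: the client still compares against the old spelling. This
// compiles cleanly and silently misses the error.
bool failedStringly(Device *device)
{
    return std::strcmp(device->openAsString(true), "ioError") == 0;
}
```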
In practice, developers may fail to consider all of the ramifications of their design
decisions. Sometimes it will be necessary to insulate a particularly poorly designed
class from the rest of the system, but applying individual insulation techniques would
be tedious and unnecessarily costly.
11 meyers, Item 34, pp. 111-116; murray, Section 3.3, pp. 72-74.
12 gamma, Abstract Factory, Chapter 3, pp. 87-96; Facade, Chapter 4, pp. 185-194.
A protocol class is a nearly perfect insulator.
13 This requirement is sometimes relaxed to permit extralinguistic support for runtime type
information (RTTI) as is discussed in Appendix A.
Section 6.4.1 The Protocol Class 387
Figure 6-39 illustrates a protocol for a simple file abstraction. The .c file for this
abstraction is nearly empty and contains only the following three lines:
// file.c
#include "file.h"
File::~File() {}    // defined empty and out-of-line
Note that encoding the location as an integer instead of as an enumeration would have
allowed us to add new integer values without requiring existing clients to recompile.
In the same vein, we could then also remove or change these values without being
able to detect the inconsistency at compile time. Removing compile-time coupling at
the expense of compile-time type checking is typically undesirable.
// file.h
#ifndef INCLUDED_FILE
#define INCLUDED_FILE

class File {
  public:
    // TYPES
    enum From { START, CURRENT, END };

    // CREATORS
    virtual ~File();    // not pure virtual!

    // MANIPULATORS
    virtual void seek(int distance, From location) = 0;
    virtual int read(char *buffer, int numBytes) = 0;
    virtual int write(const char *buffer, int numBytes) = 0;

    // ACCESSORS
    virtual int tell(From location) = 0;
};

#endif
Instead we have chosen to define the set of valid location values explicitly in the
interface. It is therefore appropriate to enumerate them in class File. This enumeration is in
no way an implementation detail; it is strictly part of the logical interface of class File.
That is, adding to or changing this enumeration is like adding to or changing the set of
virtual functions: all derived classes and all clients would be forced to recompile.
// filemgr.h
#ifndef INCLUDED_FILEMGR
#define INCLUDED_FILEMGR

class File;

struct FileMgr {
    static File *open(const char *filename);
};

#endif
One or more of the clients in a system may call upon the FileMgr in order to create an
instance of FileImp, a concrete implementation class derived from the protocol
class File. Once it is created, a pointer to the implementation object can be passed
around the system as a pointer to an object of type File with no compile-time
dependencies whatsoever on its implementation.
Figure 6-41 illustrates a system that uses type File, yet is entirely insulated from its
implementation. Class SubSys1 is the part of the system that is responsible for
instantiating new objects of type File, and is therefore link-time, but not compile-time,
dependent on class FileImp. Both SubSys2 and SubSys3 merely use the File protocol.
These components are neither compile-time nor link-time dependent on FileMgr
or even on FileImp. As such, both components subsys2 and subsys3 can be tested
independently of FileMgr. These components can even be tested independently of
FileImp if a suitable stub implementation class is supplied for the File protocol in
the test drivers.
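The testing claim can be illustrated with a short sketch. The protocol below is abbreviated to read and write only, and both the client function copyGreeting (standing in for something inside subsys2) and the in-memory StubFile are hypothetical. Neither FileMgr nor FileImp appears anywhere in it.

```cpp
#include <cassert>
#include <cstring>

// --- file.h (abbreviated protocol) ---
class File {
  public:
    virtual ~File();
    virtual int read(char *buffer, int numBytes) = 0;
    virtual int write(const char *buffer, int numBytes) = 0;
};

// --- file.c ---
File::~File() {}

// --- a function in a component like subsys2: depends only on file.h ---
int copyGreeting(File *file)
{
    static const char text[] = "hello";
    return file->write(text, 5);
}

// --- test driver: a stub that keeps the bytes in memory ---
class StubFile : public File {
    char d_data[64];
    int d_length;

  public:
    StubFile() : d_length(0) {}

    virtual int read(char *buffer, int numBytes)
    {
        int n = numBytes < d_length ? numBytes : d_length;
        std::memcpy(buffer, d_data, n);     // always reads from the start
        return n;
    }

    virtual int write(const char *buffer, int numBytes)
    {
        int room = static_cast<int>(sizeof d_data) - d_length;
        int n = numBytes < room ? numBytes : room;
        std::memcpy(d_data + d_length, buffer, n);
        d_length += n;
        return n;
    }
};
```

The stub is deliberately simplistic; it exists only so that the client code can be exercised without linking the real implementation.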
As we saw with the library subsystem example in Figure 5-32, extracting a protocol
can be used to break cyclic link-time dependencies. By physically separating the
Report's interface from its implementation, we allowed StatUtil to depend on the
lower-level Report protocol while only the higher-level ReportImp partial implementation
depended back on StatUtil. What is new and important here is that changes to
the higher-level implementation component (even in its header file) can have absolutely
no compile-time effect on any clients on the same or lower level of the protocol.
Sometimes we will encounter an instantiatable base class that declares some of its
functions virtual. Often this class contains private data. Sometimes this class will
Consider an instantiatable base class called Elem fitting the description of the previous
paragraph whose usage is suggested in Figure 6-42. The public interface of Elem is
used widely throughout the system by clients to manipulate objects of type Elem (or
derived from Elem). The system architect has thoughtfully isolated the creation of
Elem objects to a single client, client1.
Unfortunately the intrinsic lack of insulation in the Elem base class exposes all clients
of class Elem to the many unnecessary encapsulated implementation details described
above. Clearly the design of the Elem base class is far from perfect and, ideally, it
should be reworked. Reworking (like working in the first place) will require significant
thought and effort. For now, we can insulate the general public from unnecessary
details by extracting a protocol from class Elem.
As illustrated in Figure 6-43, the idea is to create a protocol class at a lower level and
then to escalate static and constructor functionality to a utility class at a higher level.
The protocol will contain only the information needed to access and manipulate
instances of types derived from Elem. The utility will support all static methods,
including insulated support for the creation of concrete instances of types derived
from Elem.
// elem.h
#ifndef INCLUDED_ELEM
#define INCLUDED_ELEM

#ifndef INCLUDED_FOO
#include "foo.h"
#endif

#ifndef INCLUDED_BAR
#include "bar.h"
#endif

class Elem {
    Foo d_fooPart;
    Bar d_barPart;

  private:
    // ...

  protected:
    // ...

  public:
    enum Status { GOOD = 0, BAD, UGLY };

    Elem();
    Elem(const Foo& fooPart);
    Elem(const Bar& barPart);
    Elem(const Foo& fooPart, const Bar& barPart);
    Elem(const Elem& elem);
    virtual ~Elem();
    Elem& operator=(const Elem& elem);
    static double f1() { /* ... */ };
    static void f2(double d);
    Foo f3() const { /* ... */ };
    void f4(const Foo& foo);
    virtual const char *f5() const;
    virtual void f6(const char *name);
    virtual Status f7();
};

#endif
1. Copy the existing component elem, containing base class Elem, to a new
name, elemimp, and rename the contained class to ElemImp. Any class
that previously inherited directly from Elem should now be changed to
inherit directly from ElemImp. This modification will require adjusting
the inheritance portion of the class definition of any derived classes along
with the #include directives of each component containing one or more
of those classes. Derived-class constructor initialization lists may require
some adjustment as well. Note that the Elem type arguments and return
values of all existing non-constructor members of ElemImp should remain
of type Elem (i.e., should not be changed to type ElemImp).
2. Delete all but the public interface of the original Elem class. If enumerations
or typedefs specified in class scope are types used in the interface of
one or more non-static, public functions of Elem, they should remain.
3. Remove the constructors and all other static member functions from the
class, but be sure to leave a virtual destructor, declared non-inline and
defined empty.
4. Make all of the remaining member functions in class Elem pure virtual
and remove their definitions.
5. Remove all #include directives from the elem component. Provide "forward"
class declarations when a user-defined type is used in the interface
of a pure virtual function. The new insulating Elem class should now
appear as in Figure 6-45.
// elem.h
#ifndef INCLUDED_ELEM
#define INCLUDED_ELEM

class Foo;

class Elem {
  public:
    enum Status { GOOD = 0, BAD, UGLY };

    virtual ~Elem();    // defined out-of-line and empty
    virtual Elem& operator=(const Elem& elem) = 0;
    virtual Foo f3() const = 0;
    virtual void f4(const Foo& foo) = 0;
    virtual const char *f5() const = 0;
    virtual void f6(const char *name) = 0;
    virtual Status f7() = 0;
};

#endif
6. Modify class ElemImp to publicly inherit directly from class Elem. The
header for the base class, elem.h, should now be included directly in the
header for the partial implementation, elemimp.h. Each of the public
non-static member functions is now declared virtual and should probably
(although not necessarily) be declared non-inline. Special consideration
should be given to the implementation of the virtual assignment
operator.
The new ElemImp class should now appear as in Figure 6-46. The use of
/* virtual */ indicates that the virtual keyword is optional. The non-inline
static functions defined in the original Elem class were part of its
interface and could have been left in the base class. However, had we
done so, we would have been faced with the following unpleasant alternatives:
// elemimp.h
#ifndef INCLUDED_ELEMIMP
#define INCLUDED_ELEMIMP

#ifndef INCLUDED_ELEM
#include "elem.h"
#endif

#ifndef INCLUDED_FOO
#include "foo.h"
#endif

#ifndef INCLUDED_BAR
#include "bar.h"
#endif

class ElemImp : public Elem {
    Foo d_fooPart;
    Bar d_barPart;

  private:
    // ...

  protected:
    // ...

  public:
    ElemImp();
    ElemImp(const Foo& fooPart);
    ElemImp(const Bar& barPart);
    ElemImp(const Foo& fooPart, const Bar& barPart);
    ElemImp(const ElemImp& elemImp);
    /* virtual */ ~ElemImp();
    /* virtual */ Elem& operator=(const Elem& elem);
    ElemImp& operator=(const ElemImp& elemImp);
    static double f1() { /* ... */ }
    static void f2(double d);
    /* virtual */ Foo f3() const;
    /* virtual */ void f4(const Foo& foo);
    /* virtual */ const char *f5() const;
    /* virtual */ void f6(const char *name);
    /* virtual */ Status f7();
};

#endif
create yet another component, elemutil, containing the struct ElemUtil.
Be sure to include elemimp.h in elemutil.c. Move all of the static
member functions defined in Elem to ElemImp. Now copy all of the public
static functions formerly defined in Elem into ElemUtil and reimplement
them (out of line) to forward all of the client's requests to the corresponding
functions now defined in class ElemImp.
9. Since ElemImp is not abstract (i.e., since it does not contain any pure virtual
functions), it will be desirable to provide an insulated mechanism for
clients so they can instantiate instances of type ElemImp without actually
including the non-insulating class definition. (A separate component will
be needed to insulate the creation of every object derived from class
ElemImp as well.) For each of the constructors defined in ElemImp, define
a new static member function in class ElemUtil, named createElem,
taking precisely the same argument signature as the constructor and
returning a pointer to a dynamically allocated, fully constructed instance
of class ElemImp as a pointer to a non-const Elem.
The new insulating ElemUtil class should now appear as in Figure 6-47.
// elemutil.h
#ifndef INCLUDED_ELEMUTIL
#define INCLUDED_ELEMUTIL

class Elem;
class Foo;
class Bar;

struct ElemUtil {
    static Elem *createElem();
    static Elem *createElem(const Foo& fooPart);
    static Elem *createElem(const Bar& barPart);
    static Elem *createElem(const Foo& fooPart, const Bar& barPart);
    static Elem *createElem(const Elem& elem);
    static double f1();
    static void f2(double d);
};

#endif
The modified system is illustrated in Figure 6-48. Public clients of the new Elem protocol
will now be relieved of all the compile-time coupling formerly associated with
Elem. All of this tight coupling has been completely isolated within the element
subsystem.
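The end result of the extraction can be sketched in a single file. This is only an illustration under stated assumptions: the three "components" are collapsed into one translation unit, the protocol is trimmed to f5, f6, and f7, and the meanings given to those functions (a name getter, a name setter, and a status query) as well as the d_name member and the renameAndRead client function are hypothetical, not taken from the original design.

```cpp
#include <string>

// --- elem.h (protocol): no data, no constructors, only pure virtuals ---
class Elem {
  public:
    enum Status { GOOD = 0, BAD, UGLY };
    virtual ~Elem();                          // defined out of line and empty
    virtual const char *f5() const = 0;
    virtual void f6(const char *name) = 0;
    virtual Status f7() = 0;
};

// --- elemutil.h: insulated creation; names only the protocol ---
struct ElemUtil {
    static Elem *createElem();
};

// --- elemimp.h: normally seen only inside the element subsystem ---
class ElemImp : public Elem {
    std::string d_name;                       // hypothetical implementation detail
  public:
    ElemImp();
    ~ElemImp();
    const char *f5() const;
    void f6(const char *name);
    Status f7();
};

// --- elem.c, elemimp.c, elemutil.c ---
Elem::~Elem() {}
ElemImp::ElemImp() : d_name("unnamed") {}
ElemImp::~ElemImp() {}
const char *ElemImp::f5() const { return d_name.c_str(); }
void ElemImp::f6(const char *name) { d_name = name; }
Elem::Status ElemImp::f7() { return d_name.empty() ? BAD : GOOD; }
Elem *ElemUtil::createElem() { return new ElemImp; }

// --- client code: would need only elem.h and elemutil.h ---
std::string renameAndRead(const char *name)
{
    Elem *e = ElemUtil::createElem();         // never names ElemImp
    e->f6(name);
    std::string result = e->f5();
    delete e;                                 // safe: the destructor is virtual
    return result;
}
```

In this arrangement a change to d_name, or to any other private detail of ElemImp, would force only the element subsystem to recompile; the client function depends on nothing but the protocol and the utility.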
A concrete class is more than just an interface; it defines a useful object that can be
instantiated as an automatic variable on the program stack. Protocol classes (discussed
in Section 6.4.1) are consistent with pure object-oriented design; however,
engineering is anything but pure. Sometimes we would like the insulating benefits of
having a protocol and yet be able to construct an instance of the object (just like any
other concrete class).
Consider the class Example shown in Figure 6-49. This class contains, as embedded
data members, the user-defined types A, B, and C. All member functions are implicitly
declared inline and the .c file is essentially empty. The implementation of this class is
clearly not insulated from clients. Suppose we now realize that this class is going to be
used widely and that the implementation is subject to change. What can we do to insulate
our clients from changes to the implementation details in our example component?
// example.h
#ifndef INCLUDED_EXAMPLE
#define INCLUDED_EXAMPLE

#ifndef INCLUDED_A
#include "a.h"
#endif

#ifndef INCLUDED_B
#include "b.h"
#endif

#ifndef INCLUDED_C
#include "c.h"
#endif

class Example {
    A d_a;
    B d_b;
    C d_c;

    double value2() const { return d_a.value() + d_b.value(); }

  public:
    Example() {}
    Example(const Example& e) : d_a(e.d_a), d_b(e.d_b), d_c(e.d_c) {}
    ~Example() {}
    // ...
};

#endif

// example.c
#include "example.h"
The first step is to replace all embedded data with an outwardly opaque pointer to
hold that data. By removing the embedded instances, we eliminate the need of our clients
to have seen the definitions of classes A, B, and C. We can therefore remove the
explicit #include directives from example.h and replace them with class declarations.
Doing so will often require defining previously inline functions out of line,
which is entirely consistent with our desire to insulate.
Figure 6-50 shows how this transform would look for the example component. As the
figure shows, the .h file is smaller and the .c file is no longer empty. Clients of component
example are now insulated from all implementation (and even interface)
changes to components a, b, and c.
// example.h
#ifndef INCLUDED_EXAMPLE
#define INCLUDED_EXAMPLE

class A;
class B;
class C;

class Example {
    A *d_a_p;
    B *d_b_p;
    C *d_c_p;
    double value2() const;

  public:
    Example();
    Example(const Example& example);
    ~Example();
    Example& operator=(const Example&);
    // ...
};

#endif

// example.c
#include "example.h"
#include "a.h"
#include "b.h"
#include "c.h"

Example::Example()
: d_a_p(new A)
, d_b_p(new B)
, d_c_p(new C)
{}

Example::Example(const Example& example)
: d_a_p(new A(*example.d_a_p))
, d_b_p(new B(*example.d_b_p))
, d_c_p(new C(*example.d_c_p))
{}

// ...
However, our clients are not entirely insulated from changes to the implementation of
the example component itself. Specifically, clients of example are not insulated from
the actual number of outwardly opaque pointers contained in the Example class definition.
Adding a single instance of even a fundamental type to the private data of class
Example would force all of its clients to recompile. Modifying the signature or return
type of any private member function would have the same effect.
How can we completely insulate the implementation of class Example and still have it
remain a concrete class? The answer centers around getting rid of the individual private
data members and replacing them with a single opaque pointer to the class's
representation. 14
DEFINITION: A concrete class is fully insulating if it
1. contains exactly one data member that is an outwardly opaque
pointer to a non-const struct (defined in the .c file) specifying the
implementation of that class,
2. does not contain any other private or protected members of any
kind,
3. does not inherit from any class, and
4. does not declare any virtual or inline functions.
// example.h
#ifndef INCLUDED_EXAMPLE
#define INCLUDED_EXAMPLE

class Example_i;    // fully insulated implementation

class Example {
    Example_i *d_this;

  public:
    Example();
    Example(const Example& example);
    ~Example();
    Example& operator=(const Example& example);
    double value() const;
};

#endif

// example.c
#include "example.h"
#include "a.h"
#include "b.h"
#include "c.h"

struct Example_i {
    A d_a;
    B d_b;
    C d_c;
    // ...
};

// ...
Figure 6-51 illustrates the result of transforming a class that does not insulate its clients
from any of its implementation details to one that is fully insulating. All public
inline functions are eliminated. All private member data and functions are now made
part of an auxiliary struct, defined entirely within the component's .c file. Note that,
in this example, the default member-wise copy semantics for the auxiliary struct
happened to be correct and therefore were not implemented explicitly.
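A complete, compilable version of this transformation might look like the following sketch. It collapses the header and the .c file into one translation unit, and the bodies of A, B, and C are hypothetical stand-ins (each simply returns a constant from value()); the way value() combines value2() with d_c is likewise an assumption made for illustration. The forward declaration uses struct rather than class only to avoid mismatched-tag warnings on some compilers.

```cpp
// --- example.h: all that clients compile against ---
struct Example_i;                       // fully insulated implementation

class Example {
    Example_i *d_this;
  public:
    Example();
    Example(const Example& example);
    ~Example();
    Example& operator=(const Example& example);
    double value() const;
};

// --- hypothetical stand-ins for a.h, b.h, and c.h ---
struct A { double value() const { return 1.0; } };
struct B { double value() const { return 2.0; } };
struct C { double value() const { return 4.0; } };

// --- example.c ---
struct Example_i {
    A d_a;
    B d_b;
    C d_c;
    double value2() const { return d_a.value() + d_b.value(); }
};

Example::Example() : d_this(new Example_i) {}

Example::Example(const Example& example)
: d_this(new Example_i(*example.d_this)) {}

Example::~Example() { delete d_this; }

Example& Example::operator=(const Example& example)
{
    *d_this = *example.d_this;          // member-wise copy of the hidden struct
    return *this;
}

double Example::value() const
{
    return d_this->value2() + d_this->d_c.value();
}
```

Adding a member to Example_i, or changing the signature of value2, touches only the part labeled example.c; nothing that clients see changes.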
The important property of a fully insulated class is that changing its representation
does not affect how clients perceive the physical layout of an instance, because its
implementation (object layout) is always just a single opaque pointer. An instance of
one fully insulating class looks the same as every instance of every other fully insulat-
ing class, regardless of its purpose or functionality. It is this property of physical uni-
formity that enables the arbitrary reimplementation of the class's interface without
having to alter its header file in any way.
Allowing inheritance or virtual functions would affect the object layout by introducing
additional data and/or additional virtual-function-table pointers. Note that inheriting
from even an empty struct may affect the size of the derived object. Thus an
instance of an otherwise fully insulating class that inherits from a base class would
necessarily appear physically different from an instance of a fully insulated class that
does not. In other words, inheriting from a base class would increase the size of a fully
insulating class beyond that of a single pointer, physically distinguishing its instances
from those of other, fully insulating classes.
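The physical-uniformity argument can be checked directly with sizeof. The sketch below uses hypothetical class names (Date, Matrix, VirtualDate) and assumes a typical implementation in which all pointers to struct types have the same size and a virtual function adds a hidden table pointer to each object; neither is strictly guaranteed by the language, so treat the results as representative rather than universal.

```cpp
struct Date_i;      // hypothetical representations, defined in some .c file
struct Matrix_i;

class Date   { Date_i   *d_this; };     // layout of a fully insulating class
class Matrix { Matrix_i *d_this; };     // a different class, same layout

class VirtualDate {                     // NOT fully insulating: has a virtual function
    Date_i *d_this;
  public:
    virtual ~VirtualDate() {}
};

// True when two unrelated fully insulating classes occupy exactly one pointer.
bool sameLayoutSize()
{
    return sizeof(Date) == sizeof(Matrix) && sizeof(Date) == sizeof(void *);
}

// True when declaring a virtual function makes instances larger than one pointer.
bool virtualAddsSize()
{
    return sizeof(VirtualDate) > sizeof(Date);
}
```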
Another important property of being fully insulating is that the class has sole control
over and access to the struct defining its internal representation. Letting the internal
data member point directly at an instance of a class defined in a separate component
would compromise our ability to make independent insulated changes to our own
implementation. In order to add a private member without affecting our clients, we
would be forced to alter the interface of an independently accessible, independently
testable object.
The name of the data structure type (e.g., Example_i) and especially the name of the
instance variable (e.g., d_this) are mostly a matter of style and need not be the same
in all cases. Because the Example_i struct ("hidden" in the .c file) may contain
function or static data members with external linkage, however, there is the possibility
for unexpected link-time collisions with members of like-named classes defined outside
this component. For this reason, the naming convention for the struct defining
the fully insulated implementation should be disjoint from that for naming ordinary
classes. Adopting the prefix of the publicly accessible class name followed by an
underscore ensures that an implementation class local to a component will not collide
with classes defined outside this component. You may find this kind of consistent
convention helpful for identifying the representation of fully insulating classes when
working on large projects.
Here we propose to make the wrapper not only encapsulating but insulating as well.
We therefore endeavor to eliminate the unnecessary clutter and compile-time coupling
associated with an interface that contains irrelevant or perhaps even proprietary infor-
mation.
One way to produce an insulating wrapper component is to apply the total insulation
technique of Section 6.4.2 to the individual objects defined in an encapsulating wrap-
per. We can do this without affecting any of the lower-level objects used to implement
the wrapper.
[Diagram: client code (clients) above the graph wrapper and its subsystem]
Figure 6-52 shows the component dependency for the graph wrapper component of
Figure 5-95. As you may recall, the clients of graph were not permitted to access the
objects defined in the implementation components: graphimp, gnode, gedge, and
ptrbag. However, clients of graph were not insulated from changes to the headers of
these components.
Let us consider insulating the graph wrapper component of Figure 5-95. A brute-
force conversion of graph using the total insulation technique of Section 6.4.2 pro-
duces the header file shown in Figure 6-53. This interface does achieve total insula-
tion, but it is at a significant cost in runtime performance due to extra dynamic
memory allocations.
// graph.h
#ifndef INCLUDED_GRAPH
#define INCLUDED_GRAPH

class NodeId {
    NodeId_i *d_this;    // should be changed to: Gnode *d_node_p;
    friend EdgeId;
    friend Graph;
    friend NodeIter;
    friend EdgeIter;

  public:
    NodeId();
    NodeId(const NodeId& nid);
    ~NodeId();
    NodeId& operator=(const NodeId& nid);
    operator Node *() const;
    Node *operator->() const;
};

class EdgeId {
    EdgeId_i *d_this;    // should be changed to: Gedge *d_edge_p;
    friend Graph;
    friend EdgeIter;

  public:
    EdgeId();
    EdgeId(const EdgeId& eid);
    ~EdgeId();
    EdgeId& operator=(const EdgeId& eid);
    NodeId from() const;
    NodeId to() const;
    operator Edge *() const;
    Edge *operator->() const;
};

class Graph {
    Graph_i *d_this;
    friend NodeIter;
    friend EdgeIter;

  private:
    Graph(const Graph&);                // not implemented
    Graph& operator=(const Graph&);     // not implemented

  public:
    Graph();
    ~Graph();
    NodeId addNode(const char *nodeName);
    NodeId findNode(const char *nodeName);
    void removeNode(const NodeId& nid);
    EdgeId addEdge(const NodeId& from, const NodeId& to, double weight);
    EdgeId findEdge(const NodeId& from, const NodeId& to);
    void removeEdge(const EdgeId& eid);
};

class NodeIter {
    NodeIter_i *d_this;

  private:
    NodeIter(const NodeIter&);
    NodeIter& operator=(const NodeIter&);

  public:
    NodeIter(const Graph& graph);
    ~NodeIter();
    void operator++();
    operator const void *() const;
    NodeId operator()() const;
};

class EdgeIter {
    EdgeIter_i *d_this;

  private:
    EdgeIter(const EdgeIter&);
    EdgeIter& operator=(const EdgeIter&);

  public:
    EdgeIter(const Graph& graph);
    EdgeIter(const NodeId& nid);
    ~EdgeIter();
    void operator++();
    operator const void *() const;
    EdgeId operator()() const;
};

#endif

Figure 6-53: Header for Fully Insulating graph Wrapper Component, graph.h
As Figure 6-54 shows, the fully insulating version of class NodeId now requires
dynamic allocation whenever a NodeId is returned by value:
// (from graph.h)
NodeId::~NodeId()
{
    delete d_this;
}
Instead of insisting on total insulation for all of the wrapper classes, we can achieve
most of the advantages of insulation at considerably less runtime cost if we only partially
insulate the NodeId and EdgeId classes. By exposing just the names of these
implementation classes in the wrapper header, we give up the flexibility to add independent
members to the wrapper classes; however, we retain the right to modify the
organization of Gnode and Gedge in any way we see fit.
// (from graph.h)
class NodeId {
    Gnode *d_node_p;
    friend EdgeId;
    friend Graph;
    friend NodeIter;
    friend EdgeIter;
    // ...
};

// (from graph.c)
NodeId::NodeId() : d_node_p(0) {}
Figure 6-55 demonstrates how one can temper total insulation for lightweight classes to
improve performance. Functions returning NodeId by value can now do so without the
cost of allocating dynamic memory, a cost we attempt to quantify in Section 6.6.1:
Although the runtime performance stands to benefit significantly from the partial
insulation of NodeId and EdgeId, the remaining three classes (Graph, NodeIter, and
EdgeIter) are an entirely separate matter. In each case, insulating the client of the
wrapper from the implementation object requires a dynamic allocation anyway. It
costs no more at runtime to allocate a struct containing the implementation object
than it does to allocate the implementation object itself. Nor is there any additional
runtime cost associated with extra indirection. We have to follow exactly one
pointer; adding a theoretical offset of 0 is removed by standard compile-time optimization.
In terms of performance, fully insulating these classes costs no more than partially
insulating them, so we might as well go for it.
Notice also that Graph, NodeIter, and EdgeIter have each disabled both copy construction
and assignment. Because the normal use of these objects requires creating and
destroying them much less frequently than NodeId and EdgeId, they are naturally better
candidates for insulation. The fully insulated implementations of Graph, NodeIter, and
EdgeIter, along with the partially insulated implementations of NodeId and EdgeId
corresponding to the suggested changes in the header file of Figure 6-53, are provided
for reference in Figure 6-56.
// graph.c
#include "graph.h"
#include "graphimp.h"
#include "gnode.h"
#include "gedge.h"

NodeId::NodeId() : d_node_p(0) {}
NodeId::~NodeId() {}

EdgeId::EdgeId() : d_edge_p(0) {}
EdgeId::~EdgeId() {}

struct Graph_i {
    GraphImp d_imp;
};

struct NodeIter_i {
    GnodePtrBagIter d_iter;
    NodeIter_i(const GnodePtrBag& nodes) : d_iter(nodes) {}
};

struct EdgeIter_i {
    GedgePtrBagIter d_iter;
    EdgeIter_i(const GedgePtrBag& edges) : d_iter(edges) {}
};
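The contrast between the two degrees of insulation can be seen in a much-reduced sketch. Everything here is a simplification made for illustration: the real Gnode and GraphImp are replaced by small stand-ins, edges and iterators are omitted, and the isValid and name accessors on NodeId are hypothetical conveniences that do not appear in the original header.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// --- graph.h (sketch) ---
class Gnode;                              // name exposed, definition withheld
struct Graph_i;

class NodeId {                            // partially insulated: one Gnode pointer
    Gnode *d_node_p;
    friend class Graph;
  public:
    NodeId();
    int isValid() const;                  // hypothetical accessors for this sketch
    const char *name() const;
};

class Graph {                             // fully insulated: one opaque pointer
    Graph_i *d_this;
    Graph(const Graph&);                  // not implemented
    Graph& operator=(const Graph&);       // not implemented
  public:
    Graph();
    ~Graph();
    NodeId addNode(const char *nodeName);
    NodeId findNode(const char *nodeName);
};

// --- graph.c (sketch) ---
class Gnode {                             // stand-in for the real gnode component
    std::string d_name;
  public:
    explicit Gnode(const char *name) : d_name(name) {}
    const char *name() const { return d_name.c_str(); }
};

struct Graph_i {
    std::vector<Gnode *> d_nodes;         // stand-in for GraphImp
};

NodeId::NodeId() : d_node_p(0) {}
int NodeId::isValid() const { return d_node_p != 0; }
const char *NodeId::name() const { return d_node_p->name(); }

Graph::Graph() : d_this(new Graph_i) {}

Graph::~Graph()
{
    for (std::size_t i = 0; i < d_this->d_nodes.size(); ++i) {
        delete d_this->d_nodes[i];
    }
    delete d_this;
}

NodeId Graph::addNode(const char *nodeName)
{
    d_this->d_nodes.push_back(new Gnode(nodeName));
    NodeId id;                            // the handle itself needs no allocation
    id.d_node_p = d_this->d_nodes.back();
    return id;
}

NodeId Graph::findNode(const char *nodeName)
{
    NodeId id;                            // stays null if nothing matches
    for (std::size_t i = 0; i < d_this->d_nodes.size(); ++i) {
        if (0 == std::strcmp(d_this->d_nodes[i]->name(), nodeName)) {
            id.d_node_p = d_this->d_nodes[i];
        }
    }
    return id;
}
```

Returning a NodeId by value copies a single pointer, whereas constructing a Graph performs exactly one allocation for its Graph_i, which is the trade-off the text describes.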
If designed properly, a single wrapper component can effectively insulate clients from
the organizational details of many lower-level implementation components.
Wrapping components individually is also possible, but only when direct interaction
with the underlying component by clients is not required. As an instructive (but
unlikely) example, consider creating the fully insulating wrapper component
pubstack for a non-insulating, list-based stack component.
As illustrated in Figure 6-57, the original stack component exposes three classes and
two operators in its header file. One of these classes, StackLink, is an encapsulated
implementation detail of the other two classes (Stack and StackIter). The wrapper
component, pubstack, exposes two classes, two free operators, and none of the
underlying implementation details. Regardless of how Stack and StackIter are
implemented, clients of the wrapper classes are insulated from all implementation
details.
Figure 6-57: Complete Component/Class Diagram for stack and Its Wrapper
Figure 6-58 shows the header file for a fully insulating wrapper for a stack compo-
nent. Each of the two wrapper classes holds only a single private opaque pointer to its
own internally defined implementation structure. There are no other private or pro-
tected members of any kind in the wrapper's physical interface. All functions will be
defined out of line. The friendships necessary to extract the underlying wrapped
objects from other wrapper objects passed as parameters are the only implementation
details in the physical interface of this wrapper component.
// pubstack.h
#ifndef INCLUDED_PUBSTACK
#define INCLUDED_PUBSTACK

class PubStackIter;
class PubStack_i;

class PubStack {
    PubStack_i *d_this;
    friend PubStackIter;
    // May want to grant access to improve performance and/or reuse:
    //friend int operator==(const PubStack&, const PubStack&);

  public:
    PubStack();
    PubStack(const PubStack& stack);
    ~PubStack();
    PubStack& operator=(const PubStack& stack);
    void push(int value);
    int pop();
    int top() const;
    int isEmpty() const;
};

class PubStackIter_i;

class PubStackIter {
    PubStackIter_i *d_this;
    PubStackIter(const PubStackIter&);
    PubStackIter& operator=(const PubStackIter&);

  public:
    PubStackIter(const PubStack& stack);
    ~PubStackIter();
    void operator++();
    operator const void *() const;
    int operator()() const;
};

#endif
Figure 6-59 shows how the pubstack component is implemented. Virtually all functionality
supplied by PubStack forwards calls out of line to the corresponding functions
of the insulated implementation object, Stack. Each constructor of PubStack
merely allocates an instance of its auxiliary structure, PubStack_i. PubStack's
destructor destroys this dynamically allocated instance, and all member functions
simply forward their input to the corresponding members of the Stack object embedded
in the managed instance of PubStack_i.
// pubstack.c
#include "pubstack.h"
#include "stack.h"

struct PubStack_i {
    Stack d_stack;
};

PubStack::PubStack()
: d_this(new PubStack_i) {}

PubStack::PubStack(const PubStack& s)
: d_this(new PubStack_i(*s.d_this)) {}

struct PubStackIter_i {
    StackIter d_stackIter;
    PubStackIter_i(Stack &stack) : d_stackIter(stack) {}
};
In this example, the free operator== does not absolutely need to have access to the
private implementation of the underlying subobject in order to implement its functionality.
Instead operator== can implement its functionality locally via the public
version of the iterator, which does have private access to the underlying implementation.
If this overhead is deemed excessive, it is easy enough to declare the wrapper
function
int operator==(const PubStack&, const PubStack&)
a friend of class PubStack. Doing so would grant this free operator private access to
PubStack's underlying Stack object, enabling it to invoke the corresponding, lower-level
operator== directly:
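A minimal sketch of the friend-based approach follows, with the iterator omitted. The Stack class shown here is a hypothetical vector-based stand-in for the list-based stack component; only the shape of PubStack and PubStack_i follows the figures above.

```cpp
#include <vector>

// --- stand-in for stack.h: a hypothetical, non-insulating Stack ---
class Stack {
    std::vector<int> d_data;
  public:
    void push(int value) { d_data.push_back(value); }
    int pop() { int v = d_data.back(); d_data.pop_back(); return v; }
    int top() const { return d_data.back(); }
    int isEmpty() const { return d_data.empty(); }
    friend int operator==(const Stack& left, const Stack& right)
    {
        return left.d_data == right.d_data;
    }
};

// --- pubstack.h (sketch) ---
struct PubStack_i;

class PubStack {
    PubStack_i *d_this;
    friend int operator==(const PubStack&, const PubStack&);
  public:
    PubStack();
    PubStack(const PubStack& stack);
    ~PubStack();
    PubStack& operator=(const PubStack& stack);
    void push(int value);
    int pop();
    int top() const;
    int isEmpty() const;
};

int operator==(const PubStack& left, const PubStack& right);

// --- pubstack.c (sketch) ---
struct PubStack_i {
    Stack d_stack;
};

PubStack::PubStack() : d_this(new PubStack_i) {}
PubStack::PubStack(const PubStack& s) : d_this(new PubStack_i(*s.d_this)) {}
PubStack::~PubStack() { delete d_this; }

PubStack& PubStack::operator=(const PubStack& s)
{
    *d_this = *s.d_this;                  // member-wise copy of the hidden struct
    return *this;
}

void PubStack::push(int value) { d_this->d_stack.push(value); }
int PubStack::pop() { return d_this->d_stack.pop(); }
int PubStack::top() const { return d_this->d_stack.top(); }
int PubStack::isEmpty() const { return d_this->d_stack.isEmpty(); }

// Friendship lets the wrapper operator reach the wrapped Stack objects
// and forward to the lower-level operator== directly.
int operator==(const PubStack& left, const PubStack& right)
{
    return left.d_this->d_stack == right.d_this->d_stack;
}
```

The friend declaration is the only implementation detail that leaks into the wrapper's header, and it names nothing from the stack component.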
Figure 6-60 illustrates the problem with wrapping components that have to interact
directly. An ElemSet is an object that manages a collection of objects of type Elem.
ElemSet has a member, void add(const Elem&), that takes an element and adds a copy
of its value to the set. PubElemSet has a similar member, void add(const PubElem&),
which instead takes a PubElem and adds a copy of its value to the set. How would you
propose to implement PubElemSet::add? The only obvious implementation
forces the higher-level PubElemSet to be a long-distance friend of PubElem, which
(see Section 3.6) is a breach of encapsulation.
Forgetting for the moment the inherent problems with long-distance friendships, the
sheer number of required friendships will quickly prove this strategy to be unworkable.
Each wrapper type that is used as an argument to a wrapper class member (or
free operator) must declare that class or operator a friend in order to allow it access to
the underlying representation object being passed. As illustrated in Figure 6-61, two
wrappers, PubA and PubB, are currently used in the public interface of PubX. PubC,
formerly not used by PubX, is in the signature of a member about to be added to PubX. As
the figure shows, adding the member function void h(const PubC& c) to a higher-level
class, PubX, can force a friend declaration to be added to a lower-level class
definition, PubC. This modification in turn forces that class, along with all of its
clients, to recompile!
class PubA;
class PubB;
//class PubC;

class PubX {
    X *d_imp_p;

  public:
    void f(const PubA& a);
    void g(const PubB& b);
    //void h(const PubC& c);
};

(Figure annotation: if we add these, then we'll be forced to add these too!)
Figure 6-61: Two-Way Coupling Caused by the Uses Relation Among Wrappers
Nonetheless, with careful design it is possible and very useful to create multi-component
insulating wrappers. The secret to creating such a wrapper layer is to realize that
only classes and operators within a single component can legitimately take advantage
of what goes on below the interface of that component, via friendships.
Wrapper objects defined within a single wrapper component are at liberty to employ
friendship as needed to look below the local interfaces and manipulate the underlying
representation directly. For example, suppose in Figure 6-62 that (as with ElemSet
and Elem), class E uses class B in its interface and we want to expose a public version
of both E and B to clients. Class PubE will need private access to obtain the instance of
B encapsulated within PubB. We are forced to declare PubE a friend of PubB, making it
necessary to place both PubB and PubE in the same wrapper component to preserve
encapsulation. 15
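The same-component friendship described above might be sketched as follows. The bodies of B and E are hypothetical stand-ins (B holds an int, E sums the values added to it), and copy operations on the wrappers are simply disabled to keep the example short; only the relationship between PubB and PubE is the point.

```cpp
#include <cstddef>
#include <vector>

// --- implementation layer: hypothetical stand-ins for B and E ---
class B {
    int d_value;
  public:
    explicit B(int value) : d_value(value) {}
    int value() const { return d_value; }
};

class E {
    std::vector<B> d_items;
  public:
    void add(const B& b) { d_items.push_back(b); }
    int total() const
    {
        int sum = 0;
        for (std::size_t i = 0; i < d_items.size(); ++i) {
            sum += d_items[i].value();
        }
        return sum;
    }
};

// --- wrapper component header: PubB and PubE are defined together ---
struct PubB_i;
struct PubE_i;

class PubB {
    PubB_i *d_this;
    friend class PubE;                    // local (same-component) friendship
    PubB(const PubB&);                    // not implemented in this sketch
    PubB& operator=(const PubB&);         // not implemented in this sketch
  public:
    explicit PubB(int value);
    ~PubB();
};

class PubE {
    PubE_i *d_this;
    PubE(const PubE&);                    // not implemented in this sketch
    PubE& operator=(const PubE&);         // not implemented in this sketch
  public:
    PubE();
    ~PubE();
    void add(const PubB& b);
    int total() const;
};

// --- wrapper component .c file ---
struct PubB_i {
    B d_b;
    explicit PubB_i(int value) : d_b(value) {}
};

struct PubE_i {
    E d_e;
};

PubB::PubB(int value) : d_this(new PubB_i(value)) {}
PubB::~PubB() { delete d_this; }

PubE::PubE() : d_this(new PubE_i) {}
PubE::~PubE() { delete d_this; }

// PubE reaches below PubB's interface to extract the wrapped B.
void PubE::add(const PubB& b) { d_this->d_e.add(b.d_this->d_b); }
int PubE::total() const { return d_this->d_e.total(); }
```

Because both wrappers and both auxiliary structs live in one component, the friendship never crosses a component boundary.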
15 This technique should not be construed as a general panacea for avoiding long-distance friend-
ships among non-wrapper classes. Since wrapper classes are typically simple in nature, merging
several of them in a single component does not necessarily threaten effective testability. Merging the
implementation components, for example, would defeat the goals of designing a hierarchy of
individual components, each with manageable complexity.
[Diagram: Wrapper (Interface) Layer above Subsystem (Implementation) Layer, with implementation components w, x, y, z]
Figure 6-62: Creating an Insulating, Multi-Component Wrapper
The design goal of avoiding long-distance friendship makes it normal for wrapper-
layer components to be much larger and define significantly more objects than is typ-
ical of components in the underlying, low-level implementation. In particular, the
component containing the PubG wrapper in Figure 6-62, like the graph wrapper com-
ponent of Section 6.4.3, supplies additional iterator classes to provide clients with
insulated access to lower-level functionality.
If the wrapper insulation is complete, there is no place in the physical interfaces of the
wrapper components where the types defined in the implementation components are
even named. A partially insulating wrapper may hold pointers to objects that are
themselves "first-class citizens" defined in separately accessible components. These
objects can be reused independently of the wrapper and are therefore less easily modified.
In contrast, a fully insulating wrapper merely holds a pointer to a simple struct that
defines the private implementation in its .c file. Because there is no independent component,
there is no independent way to interact with the representation directly.
Unlike a partially insulating wrapper, it is possible to add arbitrary private data without
altering any header file.
In either case, the objects used to implement the wrapper are free to interact efficiently
via their own encapsulating (but usually non-insulating) interfaces at the
lower levels of a subsystem.
If our primary goal is to insulate clients from everything that goes on underneath the
facade of an insulating layer for a very large and complex system, we will have to com-
promise. One such compromise is to give up the true logical encapsulation of a wrap-
per and rely on outwardly opaque pointers with unpublished header files to achieve the
encapsulation. This type of interface is commonly referred to as a procedural interface.
The interface we are providing is typically much more abstract than those developers
used to create the implementation in the first place. For the same reasons discussed in
Chapter 4, it would be exceedingly difficult to ensure the reliability of such a system
by testing it from the procedural interface alone. Fortunately for us, however, the
complexity lies at the lower levels of the system. Our job as procedural-interface
authors is to identify an appropriate subset of the types and functionality already
defined in the lower-level implementation components that will allow end users to
accomplish their desired application-level tasks.
Figure 6-63 is an illustration of the way a procedural interface is organized. All of the
publicly accessible interface functions are independent of each other, and all of them
are at a higher level than every implementation component. There is no levelization
issue other than the fact that each individual interface function depends only on the
underlying implementation; the procedural-interface functions should not depend on
each other.
[Diagram: Procedural Interface Layer of functions f1 ... fn, each depending on the implementation components below]
If we decide to insulate using a procedural interface, then we will not incur the over-
head of creating new wrapper objects or be compelled to confine ourselves to a single
component to avoid long-distance friendships. We can simply expose an appropriate
subset of the underlying type names in the procedural interface without publishing
their definitions.
Note that the requirements here are not the same as they were in Section 5.10. There
our goal was to encapsulate the use of components; here our goal is to insulate clients
from the definitions of the objects we want them to use.
Using the same type names as defined in the underlying implementation gives away
little information yet preserves the type safety across the interface. End users of this
interface will benefit from the compiler-enforced type safety in their own applications
as well.
Besides reducing the overhead of additional classes and preserving compiler-enforced type
safety, exposing the underlying types in name may have a very appealing benefit for
marketing. Some customers may want to take advantage of the underlying object-ori-
ented organization of the system, and may be willing to pay extra for this privilege.
By providing these customers with a few key (protocol) base-class header files from
the underlying system, it is possible to enable them to derive their own special types
to be used within the system without exposing a single implementation detail.
Similarly, some customers may want better performance than an insulating procedural
interface can provide. By publishing the header files of just the lowest-level, concrete
objects (e.g., Point, Box, Polygon), preferred customers may create these objects as
automatic variables and access them directly via inline functions. It is by maintaining
type-name consistency across the procedural interface that all of this integration is
made seamless; notice how this would not be possible with an encapsulating wrapper.
For the purposes of this discussion, let's assume we are to create an ANSI C-compatible
interface. We therefore will be forced to use free functions, a necessary violation
of a major design rule from Chapter 2. To help avoid collisions in the global name
space, each of these free functions will begin with a consistent registered prefix (as
discussed in Section 7.2). The ANSI C language does not support C++ references, but
does support the notion of const versus non-const. Therefore all objects will be
passed by pointer, and only non-const objects can be modified or destroyed.
/* CREATORS */
Stack *pi_createStack();
void pi_destroyStack(Stack *thisStack);
ANSI C does not support the overloading of function names, which makes the naming
process problematic-particularly for constructors. Since objects created with a pro-
cedural interface cannot be automatic variables, their creation and destruction is dis-
proportionately more expensive than assignment. For these reasons, we may choose to
omit access to copy constructors, relying instead on the default constructor and
repeated use of the assignment operator.
The type safety afforded by ANSI C goes a long way toward protecting customers
from shooting themselves in the foot. Because this is a C and not a C++ interface,
however, there is also a greater danger that they may accidentally try to destroy some-
thing they did not allocate (and do not own), or try to destroy something they did allo-
cate, but do so more than once. A typical example of a common memory allocation
error is shown in Figure 6-65.
void f()
{
    Stack *s1 = pi_createStack();
    Stack *s2 = pi_createStack();
    /* ... */
    pi_destroyStack(s1);
    pi_destroyStack(s1);    /* Oops! */
}
These kinds of customer errors are among the hardest to debug, and they can be a
costly drain on a customer support organization. Fortunately there is an effective way
to detect most memory allocation errors. A memory allocator that has proven highly
effective at detecting and reporting memory allocation-related customer program-
ming errors in actual products is presented in Appendix B.
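The kind of check such an allocator performs can be sketched in a few lines. This is only an illustration and not the allocator of Appendix B: the LiveRegistry class and the pi_checkedCreateStack, pi_checkedDestroyStack, and pi_liveStackCount functions are hypothetical names invented here, and an empty Stack class stands in for the real one.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical stand-in for the insulated implementation type.
class Stack {};

// Hypothetical registry of the objects handed out across the interface.
class LiveRegistry {
    std::set<const void *> d_live;
  public:
    void add(const void *p) { d_live.insert(p); }
    // Return true (and forget 'p') only if 'p' is currently registered.
    bool remove(const void *p) { return d_live.erase(p) == 1; }
    std::size_t count() const { return d_live.size(); }
};

static LiveRegistry& registry()
{
    static LiveRegistry r;
    return r;
}

Stack *pi_checkedCreateStack()
{
    Stack *s = new Stack;
    registry().add(s);
    return s;
}

// Return 0 on success, and non-zero if 'thisStack' was not created by
// this interface or has already been destroyed; nothing is deleted then.
int pi_checkedDestroyStack(Stack *thisStack)
{
    if (!registry().remove(thisStack)) {
        return -1;              // report the error instead of corrupting the heap
    }
    delete thisStack;
    return 0;
}

std::size_t pi_liveStackCount() { return registry().count(); }
```

With such a check in place, the second destroy call in the example above would be reported rather than silently corrupting memory, and a non-zero live count at program exit would indicate a leak.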
6.5.3 Handles
Basically, a handle is an object that is used to refer to another object. 16 Usually, a han-
dle holds a pointer to the "held" object but contains little else, as illustrated in Figure
6-66. Unlike a wrapper, the object to which the handle refers is programmatically
accessible from the interface of the handle. Handles used in this way are sometimes
called smart pointers. 17 There are many applications for the handle pattern in C++.
The NodeId wrapper class of Figure 5-95 acted as a handle for the Node portion of the
Gnode object to which it held a pointer; EdgeId acted similarly.
[Figure 6-66: a handle holding a pointer to a Stack object (labels: d_length, int[d_size])]
// pi_stackhandle.h
#ifndef INCLUDED_PI_STACKHANDLE
#define INCLUDED_PI_STACKHANDLE

class Stack;

class pi_StackHandle {
    Stack *d_object_p;

  private:
    pi_StackHandle(const pi_StackHandle&);             // not implemented
    pi_StackHandle& operator=(const pi_StackHandle&);  // not implemented

  public:
    // CREATORS
    pi_StackHandle();
    ~pi_StackHandle();

    // MANIPULATORS
    void loadObject(Stack *stack);  // Not intended for public use.

    // ACCESSORS
    operator Stack *() const;       // Conversion operator to allow use
                                    // of this object as if this were
                                    // a writable pointer to a Stack.
};

#endif
// pi_stack.h
#ifndef INCLUDED_PI_STACK
#define INCLUDED_PI_STACK

class Stack;
class pi_StackHandle;

struct pi_Stack {
    // (Stack Creators)
    static void create(pi_StackHandle *handleToBeLoaded);

    // (Stack Manipulators)
    static Stack *assign(Stack *thisStack, const Stack *thatStack);
    static void push(Stack *thisStack, int value);
    static int pop(Stack *thisStack);

    // (Stack Accessors); static members cannot be const-qualified
    static int top(const Stack *thisStack);
    static int isEmpty(const Stack *thisStack);
    static int isEqual(const Stack *left, const Stack *right);
};

#endif

// pi_stack.c
#include "pi_stack.h"
#include "stack.h"
// ...
Since we plan to use handles to manage memory, we will modify the stack creation
function Stack *pi_createStack() that we used in ANSI C. In a handle-based
architecture, an equivalent C++ translation of this function will take a writable pointer
to a stack handle object as a parameter. We avoid the free functions of the ANSI C version
by making this function a static member of class pi_Stack. The header for the entire
pi_stack component is given in Figure 6-69.
void myFunc()
{
    pi_StackHandle h;               // automatic variable
    pi_Stack::create(&h);           // load with dynamically allocated object
    for (int i = 0; i < 10; ++i) {
        pi_Stack::push(h, i);       // push 0, 1, ..., 9 on the managed stack
    }
    int x = pi_Stack::pop(h);       // pop 9 from managed stack into x
    // ...
}
The scoping afforded by C++ classes and the ability to overload function names in C++
simplify the task of naming. The cosmetics of adding handles and scoping function
names do not, however, change the underlying nature of this interface: it is still procedural.
In particular, trying to make a handle look like a wrapper would be ill-advised. Consider
what would happen if, instead of the current interface, we implemented the pop
function as a non-static member of class pi_StackHandle (taking no arguments):
class pi_StackHandle {
    // ...
  public:
    // ...
    int pop();
    // ...
};
// pi_stackhandle.h
#ifndef INCLUDED_PI_STACKHANDLE
#define INCLUDED_PI_STACKHANDLE

class Stack;

class pi_StackHandle {
    Stack *d_stack_p;
    // ...
  public:
    // ...
    ~pi_StackHandle();
    // ...
};

#endif

// pi_stackhandle.c
#include "pi_stackhandle.h"
#include "stack.h"
// ...
pi_StackHandle::~pi_StackHandle()
{
    delete d_stack_p;   // destroy the "held" object
}
// ...
Figure 6-71: Destructor for Manager Handle Destroys Its "Held" Object
The obvious semantics would be that the pop() member should pop and return the top
element of the Stack object managed by this pi_StackHandle. Suppose, however, we
are handed a pointer to a non-const Stack that we do not own. How could we pop it?
If, as customers, all we had at our disposal were a pop() member of class pi_StackHandle,
we would be forced to use the loadObject() member to put this Stack object pointer
inside a handle before we could manipulate it. But if we did that, we would now have
two agents managing the memory of the same Stack object!
The single purpose of the handle in a procedural interface is to manage the memory of
a dynamically allocated object. Except for the pi_Stack::create function, which
loads a pi_StackHandle with a newly allocated Stack object, all of the functionality
defined in the pi_Stack procedural interface should refer directly to the underlying
Stack and not the pi_StackHandle. By following this strategy, customers are never
forced, or even tempted, to abuse a handle to gain access to the functionality of the
underlying object.
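The division of labor described above can be illustrated with a compact, single-file sketch. It assumes a trivial vector-based Stack in place of the real one, adds an s_liveCount counter purely as instrumentation, and introduces two hypothetical helper functions, popBorrowed and demoHandle, to show a client at work. The handle owns the memory; every operation takes a plain Stack pointer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the insulated Stack implementation.
class Stack {
    std::vector<int> d_data;
  public:
    static int s_liveCount;           // instrumentation for this sketch only
    Stack() { ++s_liveCount; }
    ~Stack() { --s_liveCount; }
    void push(int value) { d_data.push_back(value); }
    int pop() { int v = d_data.back(); d_data.pop_back(); return v; }
    bool isEmpty() const { return d_data.empty(); }
};
int Stack::s_liveCount = 0;

class pi_StackHandle {
    Stack *d_object_p;
    pi_StackHandle(const pi_StackHandle&);             // not implemented
    pi_StackHandle& operator=(const pi_StackHandle&);  // not implemented
  public:
    pi_StackHandle() : d_object_p(0) {}
    ~pi_StackHandle() { delete d_object_p; }
    void loadObject(Stack *stack) { delete d_object_p; d_object_p = stack; }
    operator Stack *() const { return d_object_p; }
};

struct pi_Stack {
    static void create(pi_StackHandle *h) { h->loadObject(new Stack); }
    static void push(Stack *thisStack, int value) { thisStack->push(value); }
    static int pop(Stack *thisStack) { return thisStack->pop(); }
    static int isEmpty(const Stack *thisStack) { return thisStack->isEmpty(); }
};

// A function handed a Stack it does not own needs no handle at all.
int popBorrowed(Stack *notOwned) { return pi_Stack::pop(notOwned); }

int demoHandle()
{
    pi_StackHandle h;                 // automatic variable owns the Stack
    pi_Stack::create(&h);
    for (int i = 0; i < 10; ++i) {
        pi_Stack::push(h, i);         // implicit conversion to Stack *
    }
    return popBorrowed(h);            // 'h' remains the sole owner
}                                     // ~pi_StackHandle deletes the Stack
```

Because popBorrowed takes a Stack pointer rather than a handle, there is never a reason to load a borrowed object into a second handle, and so never a second owner.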
6.5.4 Accessing and Manipulating Opaque Objects
As a matter of consistency, it is desirable that our subject type (e.g., Stack) always
appear with an uppercase first letter. Ignoring the prefix, we want the actual function
name to comply with our design rule from Section 2.7 which suggests that all func-
tions begin with a lowercase letter. To lexically distinguish these global functions
from global types, we have inserted the letter f at the beginning of the actual function
name. For example,
Although this style of naming is entirely appropriate for the procedural interface
layer, it does not necessarily translate directly to the underlying objects and member
functions of the implementation layer. For example, representing the conversion func-
tions (see Figure 5-15)
struct Convert {
    static Window toWindow(const Rectangle& r);
    static Rectangle toRectangle(const Window& w);
};
as
approach would quickly lead to an unlevelizable architecture. I suspect that naively
trying to map this kind of naming style onto C++ classes and their member functions is a
primary source of cyclic physical dependencies in many existing systems.
We now turn our attention to the class Shape shown in Figure 6-72. A bounding box is
a minimal rectangle consisting of horizontal and vertical edges that circumscribe a
collection of points. Every Shape, among other things, knows how to return (by value)
a bounding box of type Box that contains the Shape.
class Box;
Because pointers are opaque, there can be no return by value for user-defined types in
a procedural interface. The obvious choice is to allocate a new Box and return a
pointer to it, as shown in Figure 6-73a. One problem with this approach is that objects
returned by value are typically small objects that do not have associated dynamic
memory. Dynamically allocating light-weight objects such as Box or Point every time
one is accessed would create considerable unnecessary overhead. Another problem is
that returning unmanaged objects would make who owned what memory confusing
and increase the likelihood of leaks.
We can avoid both the runtime overhead and confusion about ownership by sticking to
the simple principle that only the objects explicitly allocated by the client of a proce-
dural interface can be destroyed by that client-all other objects are owned and man-
aged by the system. The preferred procedural interface function is indicated in Figure
6-73b.
The improvement in runtime efficiency in Figure 6-73b can be significant. Figure 6-74
shows two implementations of a function that returns the sum of the area of the
bounding boxes for an array of shapes. For small, lightweight objects, such as Point or
Box, that are obtained over and over in a single function, the cost of dynamic allocation
and deallocation on every iteration of the loop (Figure 6-74a) could easily dominate
the runtime cost of the function call. Instead, we can do the allocation once
outside the loop (Figure 6-74b) and then reuse the allocated object over and over,
resulting in a dramatic improvement in runtime efficiency.
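The contrast between the two styles can be made concrete with a small sketch. The function names below (pi_fShapeNewBbox, pi_fShapeGetBbox, sumAreasA, sumAreasB) and the simplified Box and Shape structures are assumptions made for illustration and are not the contents of Figures 6-73 and 6-74; an allocation counter is included only so that the difference is observable.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical lightweight types standing in for the insulated ones.
struct Box { int d_xMin, d_yMin, d_xMax, d_yMax; };
struct Shape { Box d_bbox; };

static int s_boxAllocations = 0;      // instrumentation for this sketch only
int boxAllocations() { return s_boxAllocations; }

Box *pi_createBox() { ++s_boxAllocations; return new Box(); }
void pi_destroyBox(Box *thisBox) { delete thisBox; }
int pi_fBoxArea(const Box *thisBox)
{
    return (thisBox->d_xMax - thisBox->d_xMin)
         * (thisBox->d_yMax - thisBox->d_yMin);
}

// Style (a): allocate a new Box on every call; the caller must destroy it.
Box *pi_fShapeNewBbox(const Shape *thisShape)
{
    Box *result = pi_createBox();
    *result = thisShape->d_bbox;
    return result;
}

// Style (b): load a Box that the caller allocated and still owns.
void pi_fShapeGetBbox(Box *returnValue, const Shape *thisShape)
{
    *returnValue = thisShape->d_bbox;
}

int sumAreasA(const Shape *shapes, std::size_t n)
{
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        Box *box = pi_fShapeNewBbox(&shapes[i]);   // one allocation per shape
        sum += pi_fBoxArea(box);
        pi_destroyBox(box);
    }
    return sum;
}

int sumAreasB(const Shape *shapes, std::size_t n)
{
    Box *box = pi_createBox();                     // one allocation in total
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        pi_fShapeGetBbox(box, &shapes[i]);
        sum += pi_fBoxArea(box);
    }
    pi_destroyBox(box);
    return sum;
}
```

Both functions compute the same sum; style (a) performs one allocation per shape, whereas style (b) performs exactly one regardless of the array length.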
Sometimes the system itself will allocate an object dynamically and return it to the
client. In such cases, a handle class is usually provided by the underlying system to
manage the memory for that object. For example, consider a Shape interface that is a
protocol class for all kinds of Shape objects. Now suppose there is a class PointIter
that is also a protocol for a variety of specific iterator objects that sequence over some
collection of points. It is possible to ask an arbitrary Shape through its protocol to
allocate a shape-specific iterator (derived from PointIter) and return it by loading a
user-supplied instance of a PointIterHandle, as shown in Figure 6-75. 18
class Point;

class PointIter {
  public:
    // CREATORS
    virtual ~PointIter();

    // MANIPULATORS
    virtual void reset() = 0;
    virtual void operator++() = 0;

    // ACCESSORS
    virtual operator const void *() const = 0;
    virtual const Point operator()() const = 0;
};

class PointIterHandle {
    PointIter *d_iter_p;
    PointIterHandle& operator=(PointIterHandle&);
    PointIterHandle(PointIterHandle&);

  public:
    // CREATORS
    PointIterHandle();
    PointIterHandle(PointIter *iterator);
    ~PointIterHandle();

    // MANIPULATORS
    void loadIter(PointIter *newDynamicallyAllocatedIterator);

    // ACCESSORS
    PointIter& operator()() const;
    operator PointIter&() const;
    PointIter *operator->() const;
    PointIter& operator*() const;
};

class Shape {
    // ...
  public:
    // ...
    // ACCESSORS
    virtual void getVertices(PointIterHandle *returnValue) = 0;
    // ...
};

18 See also the Iterator design pattern in Gamma, Chapter 5, pp. 257-71.
In this example, the system is dynamically allocating an iterator object and placing it
in a user-supplied handle. Since the underlying system itself is allocating the memory,
the customer is not authorized to delete it. The customer is, however, authorized to
create and destroy an instance of a PointIterHandle. The customer therefore creates
a PointIterHandle and passes it to the getVertices function of Shape. The handle
is then loaded by the system with a pointer to a dynamically allocated PointIter. The
customer uses the object contained in and managed by the handle. When the handle is
destroyed by the customer, the destructor of the handle in turn destroys the contained,
dynamically allocated iterator. Reusing a handle to obtain another iterator also
prompts the handle to destroy any previously installed iterator before loading the new
one. An ANSI C-compatible procedural interface for the functionality of Figure 6-75
is given in Figure 6-76. The usage of such an interface is illustrated in Figure 6-77.
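The ownership rules just described (the system allocates, the handle destroys, and reloading a handle destroys the previous occupant) can be exercised with a short sketch. The ArrayPointIter and Triangle classes, the sumX function, and the s_live counter are hypothetical and exist only to make the behavior observable; the handle is reduced to the members needed here.

```cpp
#include <cassert>

struct Point { int x, y; };

class PointIter {
  public:
    static int s_live;                   // instrumentation for this sketch only
    PointIter() { ++s_live; }
    virtual ~PointIter() { --s_live; }
    virtual void reset() = 0;
    virtual void operator++() = 0;
    virtual operator const void *() const = 0;
    virtual const Point operator()() const = 0;
};
int PointIter::s_live = 0;

class PointIterHandle {
    PointIter *d_iter_p;
    PointIterHandle(const PointIterHandle&);             // not implemented
    PointIterHandle& operator=(const PointIterHandle&);  // not implemented
  public:
    PointIterHandle() : d_iter_p(0) {}
    ~PointIterHandle() { delete d_iter_p; }
    // Destroy any previously installed iterator, then take ownership.
    void loadIter(PointIter *it) { delete d_iter_p; d_iter_p = it; }
    PointIter& operator*() const { return *d_iter_p; }
};

// Hypothetical concrete iterator over a fixed array of points.
class ArrayPointIter : public PointIter {
    const Point *d_points;
    int d_size;
    int d_index;
  public:
    ArrayPointIter(const Point *points, int size)
    : d_points(points), d_size(size), d_index(0) {}
    void reset() { d_index = 0; }
    void operator++() { ++d_index; }
    operator const void *() const { return d_index < d_size ? this : 0; }
    const Point operator()() const { return d_points[d_index]; }
};

// Hypothetical shape that allocates its own kind of iterator.
class Triangle {
    Point d_v[3];
  public:
    Triangle(Point a, Point b, Point c) { d_v[0] = a; d_v[1] = b; d_v[2] = c; }
    void getVertices(PointIterHandle *returnValue) const
    {
        returnValue->loadIter(new ArrayPointIter(d_v, 3));
    }
};

// Client code: never deletes the iterator, only uses it via the handle.
int sumX(const Triangle& shape, PointIterHandle *handle)
{
    shape.getVertices(handle);
    int sum = 0;
    for (PointIter& it = **handle; it; ++it) {
        sum += it().x;
    }
    return sum;
}
```

Calling sumX twice with the same handle leaves exactly one live iterator, and none remain once the handle goes out of scope.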
/* ACCESSORS */
void pi_fPointIterReset(PointIter *thisIter);
void pi_fPointIterAdvance(PointIter *thisIter);

/* MANIPULATORS */
/* void pi_fPointIterHandleLoadIter(PointIterHandle *thisHandle,
 *                                  PointIter *newDynamicPointIter);
 * Note: not necessary to expose this dangerous function
 */

/* ACCESSORS */
PointIter *pi_fPointIterHandleGetIter(const PointIterHandle *thisHandle);
    /* Note: for a procedural interface, this one accessor is sufficient */
As a procedural-interface author for a large system, you may discover a class interface
that returns a dynamically allocated object directly, without placing it in a
client-supplied handle. In such cases, it will be necessary for you to find (or create) such a
handle of the appropriate type and require your clients to pass a non-const pointer to that
handle into your interface function. You will then have to load the handle with the
system-allocated object yourself. Doing so will preserve the principle that clients of
the procedural interface are authorized to delete only what they explicitly allocate.
    pi_fShapeGetVertices(shape, handle);
    it = pi_fPointIterHandleGetIter(handle);
    pi_destroyPoint(pt);
    pi_destroyPointIterHandle(handle);
}
6.5.5 Inheritance and Opaque Objects
Converting between types related by inheritance is yet another aspect of writing pro-
cedural interfaces for object-oriented designs that must be addressed. The issue at
hand involves how we present a type-safe, procedural interface that supports the
notion of pointer conversion implied by inheritance.
Consider the class diagram shown in Figure 6-78. Class B derives publicly from both
A1 and A2, which means that all of the functionality of both A1 and A2 is accessible
through the public interface of B. Unfortunately, insulation prevents even C++
customers of a procedural interface from knowing anything about how types A1, A2, and
B are related. For example, if we have a pointer to an object of type B and we want to
call a member function defined in A1, we would be out of luck; this is obviously not
acceptable.
Our first thought might be to duplicate the functionality defined in both A1 and A2 in
B. Doing so creates a large number of redundant functions and solves only half the
problem. Suppose we want to use an object of type B in a function that takes an object
of type A1. Should we also make duplicates of each function for every combination of
derived types? I think not.
The C++ language supports implicit (standard) conversion from pointers of a given
type to pointers of another type when the first type publicly inherits (either directly or
indirectly) from the second; it will be necessary to make that conversion explicit in the
procedural interface.
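A1 *pi_convertBA1(B*);
A2 *pi_convertBA2(B*);
C  *pi_convertD1C(D1*);
C  *pi_convertD2C(D2*);
const A1 *pi_convertConstBA1(const B*);
const A2 *pi_convertConstBA2(const B*);
const C  *pi_convertConstD1C(const D1*);
const C  *pi_convertConstD2C(const D2*);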
The explicit conversion functions corresponding to Figure 6-78 are shown in Figure
6-79. In this example, four inheritance relationships induced eight functions. Notice
that there are two kinds of functions: one for const objects and one for non-const
objects. Although this seems painful, it gets even worse.
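To see why the conversion must be performed inside the system rather than by a cast in client code, consider the following sketch. The class bodies are hypothetical (clients of the procedural interface would see only forward declarations), and pi_fA2Value is an invented accessor. Under multiple inheritance the A2 subobject typically does not start at the same address as the B object, so only code that can see the class definitions is able to adjust the pointer correctly.

```cpp
#include <cassert>

// Hypothetical definitions, visible only inside the system.
class A1 {
    int d_a1;
  public:
    A1() : d_a1(1) {}
    int a1() const { return d_a1; }
};

class A2 {
    int d_a2;
  public:
    A2() : d_a2(2) {}
    int a2() const { return d_a2; }
};

class B : public A1, public A2 {};

// Conversion functions: the compiler applies the standard
// derived-to-base pointer conversion, including any address adjustment.
A1 *pi_convertBA1(B *b) { return b; }
A2 *pi_convertBA2(B *b) { return b; }
const A1 *pi_convertConstBA1(const B *b) { return b; }
const A2 *pi_convertConstBA2(const B *b) { return b; }

// Hypothetical procedural accessors defined on the base types.
int pi_fA1Value(const A1 *thisA1) { return thisA1->a1(); }
int pi_fA2Value(const A2 *thisA2) { return thisA2->a2(); }
```

A client holding only a B pointer would then write pi_fA2Value(pi_convertConstBA2(b)) to reach functionality defined on A2, with type safety preserved at every step.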
Now consider what would happen if we introduced one more inheritance relationship
from C to B, as shown in Figure 6-80. In addition to the obvious two additional conver-
sion routines
B *pi_convertCB(C*);
const B *pi_convertConstCB(const C*);
the transitive nature of the IsA relation potentially introduces the following 16 conver-
sions as well:
B  *pi_convertD1B(D1*);
A1 *pi_convertD1A1(D1*);
A2 *pi_convertD1A2(D1*);
B  *pi_convertD2B(D2*);
A1 *pi_convertD2A1(D2*);
A2 *pi_convertD2A2(D2*);
A1 *pi_convertCA1(C*);
A2 *pi_convertCA2(C*);
const B  *pi_convertConstD1B(const D1*);
const A1 *pi_convertConstD1A1(const D1*);
const A2 *pi_convertConstD1A2(const D1*);
const B  *pi_convertConstD2B(const D2*);
const A1 *pi_convertConstD2A1(const D2*);
const A2 *pi_convertConstD2A2(const D2*);
const A1 *pi_convertConstCA1(const C*);
const A2 *pi_convertConstCA2(const C*);
annoyed by having to use one conversion function-let alone three. There is clearly a
trade-off between the number of conversion-function definitions provided and the
number of conversion-function calls required at runtime.
Attempting to maintain and document all of these functions by hand is expensive and
error prone. Fortunately these conversion functions are trivial, regular, and easy to
generate accurately using techniques similar to those employed in Appendix C for
determining level numbers. Note that instead of trying to document all these
conversion functions, it is far more manageable to show users how to infer the appropriate
name based on the two type names:
<Type2> *pi_convert<Type1><Type2>(<Type1>*);
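Generating the full set of prototypes from the inheritance graph is a small transitive-closure computation. The sketch below is one possible approach and is not the tool described in Appendix C; the BaseMap type and the collectAncestors, conversionPrototypes, and exampleHierarchy functions are hypothetical. It reproduces the counts discussed above: 8 functions for the four relationships of Figure 6-78, and 26 once C is derived from B.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each derived type to the list of its direct public bases.
typedef std::map<std::string, std::vector<std::string> > BaseMap;

// Add every direct and indirect base of 'type' to 'result'.
static void collectAncestors(const BaseMap&         bases,
                             const std::string&     type,
                             std::set<std::string> *result)
{
    BaseMap::const_iterator it = bases.find(type);
    if (it == bases.end()) {
        return;                          // 'type' has no bases
    }
    for (std::size_t i = 0; i < it->second.size(); ++i) {
        if (result->insert(it->second[i]).second) {
            collectAncestors(bases, it->second[i], result);
        }
    }
}

// Emit a non-const and a const prototype for every (derived, ancestor) pair.
std::vector<std::string> conversionPrototypes(const BaseMap& bases)
{
    std::vector<std::string> out;
    for (BaseMap::const_iterator it = bases.begin(); it != bases.end(); ++it) {
        const std::string& from = it->first;
        std::set<std::string> ancestors;
        collectAncestors(bases, from, &ancestors);
        for (std::set<std::string>::const_iterator a = ancestors.begin();
             a != ancestors.end(); ++a) {
            const std::string& to = *a;
            out.push_back(to + " *pi_convert" + from + to + "(" + from + "*);");
            out.push_back("const " + to + " *pi_convertConst" + from + to
                          + "(const " + from + "*);");
        }
    }
    return out;
}

// The hierarchy of Figure 6-78, optionally with C derived from B (Figure 6-80).
BaseMap exampleHierarchy(bool cDerivesFromB)
{
    BaseMap bases;
    bases["B"].push_back("A1");
    bases["B"].push_back("A2");
    bases["D1"].push_back("C");
    bases["D2"].push_back("C");
    if (cDerivesFromB) {
        bases["C"].push_back("B");
    }
    return bases;
}
```

The same closure could instead be pruned to direct relationships only, trading fewer definitions for more conversion calls in client code.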
Providing a procedural interface has the distinct disadvantage that, in its pure form,
clients lose the ability to extend the functionality of the system through the use of
inheritance. With careful design, however, it is possible to provide a procedural inter-
face and augment it with a few select header files to mitigate this problem. Procedural
interfaces require the use of long and tedious function names to do what is normally
done by members within class scope, operators, and standard conversions. Function
names become even more tedious when the interface is made ANSI C compliant.
A procedural interface is neither object oriented nor particularly elegant, but it does
have one big advantage: a procedural interface can always be used to insulate the
organization of a large system from clients, even if such an interface was not
considered during the early stages of the design.
6.6 To Insulate or Not to Insulate
There is little we can do to insulate our clients from the changes we make to the
logical interface of our components, a fact that underscores the importance of getting
major interfaces correct early in the design process. Batching up such changes and
publishing them infrequently in the form of a software release (see Section 7.6) can
reduce but not eliminate their cost.
As we have seen from the previous sections in this chapter, there are steps we can take
that will reduce or even eliminate our clients' recompilation costs due to changes in
the logical implementations of our components. But insulation itself is not without
cost. Sometimes it will take more development effort to create an insulating interface
for a component, and in some cases insulation could significantly degrade runtime
performance.
In the following subsections we discuss the costs of insulation, when insulation is (or
is not) appropriate, and what kinds of insulation techniques are best suited for particu-
lar situations that arise commonly in practice.
6.6.1 The Cost of Insulation
Insulating a class clearly can affect its runtime performance. The degree of impact
depends on the class itself, the way it is used, and the techniques used to insulate it.

Access                                    Relative Cost of Access Alone
By value via inline function                1
By pointer via inline function              2
Via non-inline, non-virtual function       10
Via virtual-function mechanism             20

Creation                                  Relative Cost of Allocation Alone
Automatic                                   1.5
Dynamic                                   100+
Figure 6-81 provides some hard numbers for the relative costs of various forms of
function calls and object instantiation. As the figure shows, the cost of accessing data
either directly or through an inline function is statistically identical. Using the
CFRONT 3.0 C++ Compiler on a SUN SPARC-2 workstation with no optimization, it
takes about 1/8 of a microsecond to access an integer data member (either directly or
via an inline function) and assign it to another integer variable (see Figure 6-81a, c).
Notice that it takes 60 percent longer on a SPARC-2 and twice as long on a SPARC-20 to
accomplish this operation if the access must go through a pointer (b, d).19
struct A {
    int d_d;
    inline int i() const;
    int f() const;
    virtual int v() const;
};

main()
{
    A a, *p = &a;
    int j;
                                     // TIME IN MICROSECONDS
                                     // SPARC-2      SPARC-20
    { A a; }                         // h.   0.175      0.060
    { A *p = new A; delete p; }      // i.  11.757      5.478
}

19 The operation becomes bound by memory-access time on the faster SPARC-20.
For a fully insulating class, there can be no inline functions, so each access of a
private member requires indirection through a pointer. The cost of accessing a data
member with a regular function instead of an inline function is increased by almost a
factor of four (e). Notice that the indirection now adds less than 5 percent to the total
cost of the operation (f). That is, the added access cost of not declaring a member
function inline dominates the small additional overhead of the indirection.
For a protocol class, there can be no non-virtual functions, and the pointer indirection
is now mandatory. All function calls must go through the virtual function call mecha-
nism. The cost of performing this same operation with a dynamically bound function
instead of a statically bound function again doubles the cost of the operation (g).20
Although the virtual-function call mechanism is somewhat slower than a direct-func-
tion call, for tiny accessor functions it can be significantly slower than accessing the
data directly, using an inline function. Often, however, if one can afford to make a
function non-inline, one can afford to make it virtual as well. Note that as the size of
the function grows, the runtime cost associated with executing the body of the
function will soon swamp the cost of whatever calling mechanism is used; the speed
improvement of inline over dynamically bound function calls will then become negligible.

20 It is worth reiterating that the depth of an inheritance hierarchy does not affect the runtime
performance of virtual functions. Each class maintains its own virtual table(s), so the cost of dispatching
any virtual function is independent of the number of derivations in the class hierarchy.
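Readers who wish to repeat this kind of measurement on their own platform might start from a sketch like the following. It assumes a C++11 compiler; the nanosecondsPerAccess template is a hypothetical helper, and the constructor and virtual destructor are added to struct A only so that the sketch is well behaved. Absolute and relative timings will differ from the figures quoted above, and a modern optimizer may inline f() when its definition is visible in the same translation unit, so for a faithful comparison f() and v() would need to be defined in a separate file.

```cpp
#include <cassert>
#include <chrono>

struct A {
    int d_d;
    A() : d_d(7) {}
    virtual ~A() {}
    int i() const { return d_d; }   // defined in class: implicitly inline
    int f() const;                  // ordinary (statically bound) function
    virtual int v() const;          // dynamically bound function
};

int A::f() const { return d_d; }
int A::v() const { return d_d; }

// Return the average time, in nanoseconds, of one call to 'access',
// including loop overhead; 'access' must return int.
template <class ACCESS>
double nanosecondsPerAccess(ACCESS access, long iterations)
{
    volatile int sink = 0;          // keeps the loop from being optimized away
    const std::chrono::steady_clock::time_point start =
                                        std::chrono::steady_clock::now();
    for (long k = 0; k < iterations; ++k) {
        sink = access();
    }
    const std::chrono::steady_clock::time_point stop =
                                        std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(stop - start).count()
         / iterations;
}
```

A call such as nanosecondsPerAccess([&] { return p->v(); }, 10000000), where p points to an A, then gives a rough per-call figure that can be compared against the same measurement for a.i() and p->f().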
6.6.2 When Not to Insulate
Some components are simply not intended for general use. When the audience of a
component is limited, insulation is no longer critical. In that case, the impact of
changes to the uninsulated implementation may not pose any great threat. In fact,
some components may be specific to a subsystem that defines a few interface (wrapper)
components that themselves completely insulate the entire multi-component
subsystem from general users. Examples are the p2p_router of Chapter 4 and the graph
wrapper of Section 6.4.3.1. Heroic efforts to insulate the implementations of each of
the individual components that make up such a subsystem would be misplaced.
There are two distinct ways to reduce the frequency of recompilation resulting from
changes to the implementation:
When the runtime cost of work done in a given function call is large relative to the
cost of the call itself, insulation will not pose a significant performance problem.
Therefore, if a class is widely used and its member functions are large, the implemen-
tation of that class should be insulated, regardless of any supposed performance
requirements. On the other hand, highly reused, public components 21 with tiny acces-
sor functions should probably not be insulating unless performance is clearly not an
issue. These factors are summarized in Figure 6-82.
                              Member Function Size
Performance Requirement       small                      large
high                          I   (Don't Insulate!)      II  (Insulate!)
low                           III (Don't Insulate?)      IV  (Insulate!)
Fortunately public, low-level classes are typically developed, tuned, and tested thor-
oughly early in the development process. After that, they are seldom if ever modified.
Such intentionally non-insulating, globally used classes become almost like fundamental
types in the system. Classes such as Point, String, and List are often used
both internally and as a "medium of exchange" among the major subsystems. It is
understood by developers that these highly reused types are not likely to change.
21 The term public here implies a low-level component or interface that is used widely throughout an
entire system.
For a tiny object that does not allocate additional dynamic memory at construction,
the additional cost of returning a fully insulating version of that object by value could
be so severe as to affect the design of the interfaces that use it.
Figure 6-83 illustrates the added runtime cost of partially and fully insulating a Point
class with respect to returning a Point by value from a non-inline function. Using the
original non-insulating Point class implementation of Figure 5-59, it takes 1.52
microseconds on a SPARC-2 for a call to the getPointA function to return a Point by
value. Moving all of the function definitions out of line while leaving the data members
embedded in the class definition causes this time to more than double (3.39
microseconds). Fully insulating the class (implying dynamic allocation of the data)
causes the function call to take 10 times as long as it would have for the non-insulating
implementation. For an ultra light-weight class such as Point, the reduction in
runtime performance incurred by insulating its logical implementation is probably
unacceptable.
Other reasons not to insulate could result from a shortage of personnel. There may be
no compelling reason to insulate, and the incremental increase in development time
necessary to achieve the insulation may not be deemed cost-effective. Creating a
wrapper requires significant planning and effort; deadlines and a lack of experience
may prevent potentially wrappable subsystems from getting wrapped properly.
Insulation may be omitted because the added physical complexity of introducing yet
another component may be judged not to be worth the potential benefit that would be
gained through insulation. Both protocols and wrappers involve creating a separate
component to act as the interface. This separate physical entity contributes to the
overall complexity of the physical architecture.
Finally, insulation is an additional, independent constraint on the implementation of a
component or subsystem. Addressing this requirement leads to a somewhat more
complex implementation that may be harder for some to understand and marginally
more difficult to maintain than an uninsulated component or subsystem. For example,
fully insulating a class requires creating a separate structure in the .c file and
remembering to dynamically allocate and delete it during construction and destruction,
respectively.
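The extra bookkeeping involved can be seen in a minimal sketch of a fully insulating class. The names Point_i and d_this, and the particular set of member functions, are assumptions made for this illustration; both "files" are shown together so the example is self-contained.

```cpp
#include <cassert>

// point.h: all that clients compile against
class Point {
    struct Point_i;                 // defined only in the .c file
    Point_i *d_this;                // opaque pointer to the implementation
  public:
    Point(int x, int y);
    Point(const Point& point);
    ~Point();
    Point& operator=(const Point& point);
    void setX(int x);
    int x() const;
    int y() const;
};

// point.c: the implementation structure is free to change without
// forcing clients to recompile
struct Point::Point_i {
    int d_x;
    int d_y;
};

Point::Point(int x, int y)
: d_this(new Point_i)               // allocate during construction
{
    d_this->d_x = x;
    d_this->d_y = y;
}

Point::Point(const Point& point)
: d_this(new Point_i(*point.d_this))
{
}

Point::~Point()
{
    delete d_this;                  // delete during destruction
}

Point& Point::operator=(const Point& point)
{
    *d_this = *point.d_this;        // copy the data, not the pointer
    return *this;
}

void Point::setX(int x) { d_this->d_x = x; }
int Point::x() const { return d_this->d_x; }
int Point::y() const { return d_this->d_y; }
```

Every constructor must allocate, the destructor must delete, and copying must duplicate the data rather than the pointer; forgetting any one of these is exactly the kind of maintenance hazard the paragraph above describes.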
Added initial development cost, increased component count, and increased complex-
ity are at least tenable reasons to resist unnecessary insulation. There are, however,
clear overall maintenance benefits to be gained from insulation. In the absence of
compelling reasons one way or the other, keep in mind that insulation is more eco-
nomically removed than installed late in the development process.
For large, widely used objects, insulate early and selectively remove
the insulation later if necessary.
Once a system is complete and performance analysis proves that removing the insulation
from a few key components significantly improves the overall system performance,
at least the benefits of the insulation will have been realized throughout the
bulk of the development effort. Waiting until the end of a large project to determine
empirically which components can be insulating without significant loss in performance
sacrifices much of the initial maintenance benefit that insulation provides.
6.6.3 How to Insulate
In practice, there are two main ways to insulate clients from the logical implementation
of a class:
Because a protocol defines a pure interface, clients of the protocol not only do not
depend on the implementation at compile time, but, unlike with other techniques, they
need not depend on any particular implementation at link time either.
Classes that already employ virtual functions are probable candidates for "perfect"
insulation by extracting a protocol class. These objects are already treated as base
classes, and they already incur the extra overhead of carrying around a pointer to a
virtual-function table in every instance of the class. More often than not, base classes
with virtual functions are not intended to be instantiated. If a class either declares any
pure virtual functions or declares all of its constructors non-public, the base class
cannot be instantiated on the program stack by public clients. Therefore, the usage of
such classes will be left essentially unaffected by insulating them with a protocol.
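As a reminder of what extracting a protocol amounts to, here is a minimal sketch. The Shape protocol shown is a simplified, hypothetical one (it is not the Shape of Figure 6-72), and Square and totalArea are invented for the example. The protocol has no data members, no constructors, and only pure virtual functions apart from its destructor.

```cpp
#include <cassert>

// shape.h: hypothetical protocol class, a pure interface
class Shape {
  public:
    virtual ~Shape();
    virtual double area() const = 0;
    virtual void scale(double factor) = 0;
};

// shape.c: the destructor is the only function with a definition
Shape::~Shape() {}

// Client code depends only on the protocol, at compile time and at link time.
double totalArea(const Shape *const *shapes, int numShapes)
{
    double sum = 0.0;
    for (int i = 0; i < numShapes; ++i) {
        sum += shapes[i]->area();
    }
    return sum;
}

// square.h, square.c: one hypothetical implementation, unknown to clients
class Square : public Shape {
    double d_side;
  public:
    explicit Square(double side) : d_side(side) {}
    double area() const { return d_side * d_side; }
    void scale(double factor) { d_side *= factor; }
};
```

Changing the data members of Square, or adding an entirely new kind of shape, requires no recompilation of totalArea or of any other code written against the protocol.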
For utility classes that act as modules (e.g., GeomUtil in Figure 5-21), there is no need to
create an instance of the class in order to use its functionality. In that case, declaring all
member functions static and non-inline, and moving any static member data to the .c
file (at file scope), obviates the overhead of instantiation and the virtual call mechanism.
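A sketch of this arrangement follows. The two member functions shown are hypothetical and are not the contents of Figure 5-21; the call counter simply plays the role of data that would otherwise have been a private static member visible in the header.

```cpp
#include <cassert>
#include <cmath>

// geomutil.h: only static, non-inline function declarations; no data
struct GeomUtil {
    static double distance(double x1, double y1, double x2, double y2);
    static int numCalls();
};

// geomutil.c: the state lives at file scope, invisible to clients
static int s_numCalls = 0;

double GeomUtil::distance(double x1, double y1, double x2, double y2)
{
    ++s_numCalls;
    const double dx = x2 - x1;
    const double dy = y2 - y1;
    return std::sqrt(dx * dx + dy * dy);
}

int GeomUtil::numCalls()
{
    return s_numCalls;
}
```

Because the header contains no data and no function bodies, the counter could be removed or replaced without any client needing to recompile.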
For a "small" class with mostly non-trivial accessor functions such as a reasonable
String class, total insulation might be appropriate. Notice here that the implementation
of even simple functions such as equality (==) and assignment (=) potentially
involves loops, additional dynamic allocation, or at minimum another non-inline
function call to strcmp or strcpy. Insulating the implementation of this class would
actually facilitate performance tuning by allowing different implementation strategies
(e.g., reference counting and caching length) to be profiled and evaluated in the con-
text of actual usage without having to recompile the entire system. Again, the insula-
tion could always be removed (if necessary) much later in the development process.
Large, high-level, instantiatable objects (e.g., a circuit simulator or parser) that do not
make use of inheritance or virtual functions in their interface can usually be "fully
insulated" or "wrapped" with negligible impact on either size or runtime overhead.
Insulation is indicated for such objects, especially when the object is intended for
widespread, general use outside of the local software development group.
The p2p_Router shown in Figure 4-2 illustrates an ideal example of a fully insulating
wrapper. The router is not a module; an instance of the router must be created and
"programmed" before it can be used. The work required to construct an instance of a
p2p_Router is not trivial, nor is the work done by the addObstruction function used
to program it. However, the time spent using the router is completely dominated by
work done in the lower levels of the router subsystem on each call to the findPath
function. The added runtime cost of insulating the router is thus completely negligible.
As a final example, consider how we might go about insulating (someone else's) class
Solid, whose header is shown in Figure 6-84. Solid is intended to be a common base class
for a variety of solids but is not itself instantiatable. This intent is corroborated by observing
that the constructors and assignment operator for the class are declared protected.
// solid.h
#ifndef INCLUDED_SOLID
#define INCLUDED_SOLID

#ifndef INCLUDED_IOSTREAM
#include <iostream.h>
#define INCLUDED_IOSTREAM
#endif

class Solid {
    int d_color;
    double d_scale;
    double d_density;
    ostream *d_errorStream_p;

  protected:
    // STATIC MEMBERS
    static double distance(double x1, double y1, double x2, double y2);

    // CREATORS
    Solid(ostream *errorStream, double density, double scale = 1.0);
    Solid(const Solid& solid);

    // MANIPULATORS
    Solid& operator=(const Solid& solid);
    void setColor(int color) { d_color = color; }

    // ACCESSORS
    virtual double surfaceEquation(double x, double y, double z) = 0;
        // Point(x,y,z) is on the surface when function returns
        // approximately 0 (to within some small tolerance).
    ostream& error() { return *d_errorStream_p; }
    double mass() const { return density() * volume(); }

  public:
    // CREATORS
    virtual ~Solid();

    // MANIPULATORS
    virtual void setTemperature(int degrees) = 0;
        // Changing the temperature may affect color, depending on the
        // actual object.
    void setScale(int scale) { d_scale = scale; }

    // ACCESSORS
    virtual double temperature() const = 0;
    int scale() { return d_scale; }
    int color() const { return d_color; }
    double density() const { return d_density; }
    double volume() const;
    double centerOfMassInX() const;
    double centerOfMassInY() const;
    double centerOfMassInZ() const;
    // ...
};

#endif
Figure 6-84: Base-Class Solid with Public and Protected Interface
The scale attribute of a Solid determines the relative size of the object. Users of
Solid are permitted to access and modify its scale directly. The protected pure-virtual
surfaceEquation function allows a derived class to program the unique behavior
necessary to describe its own surface (parameterized by scale()) through an implicit
equation. For example, the surface of a sphere might be described as
Non-virtual functions in the base class use the surface equation to compute, among
other things, the Solid's volume and center of mass in each spatial dimension. Making
the surfaceEquation function protected prevents direct access to surfaceEquation
by the public. 22
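To make the idea of an implicit surface equation concrete, here is a minimal sketch. It assumes a sphere centered at the origin whose radius equals its scale, so that a point lies on the surface when x*x + y*y + z*z - scale*scale is approximately 0. The stripped-down Solid below, the isOnSurface helper, and the Sphere class are hypothetical stand-ins written for this illustration (and use a const surfaceEquation for convenience); they are not the class of Figure 6-84.

```cpp
#include <cmath>

// Simplified stand-in for the Solid base class (hypothetical; only the
// members needed for this sketch are shown).
class Solid {
    double d_scale;

  protected:
    explicit Solid(double scale) : d_scale(scale) {}

    // Returns approximately 0 when the point (x, y, z) is on the surface.
    virtual double surfaceEquation(double x, double y, double z) const = 0;

  public:
    virtual ~Solid() {}
    double scale() const { return d_scale; }

    // Public helper (an assumption of this sketch) that lets a client ask
    // about the surface without calling the protected equation directly.
    bool isOnSurface(double x, double y, double z, double tol = 1e-9) const {
        return std::fabs(surfaceEquation(x, y, z)) < tol;
    }
};

// Hypothetical derived class: a sphere centered at the origin whose
// radius is taken to be scale().
class Sphere : public Solid {
  protected:
    double surfaceEquation(double x, double y, double z) const override {
        return x * x + y * y + z * z - scale() * scale();
    }

  public:
    explicit Sphere(double radius) : Solid(radius) {}
};
```

The derived class supplies only the equation; any numerical work built on it (such as computing a volume) can stay in non-virtual base-class functions.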
Since it is up to the derived object to define both the behavior of getting and setting
the temperature and how that affects the color of the specific object, the public
temperature and setTemperature functions of Solid have been declared pure virtual.
(Notice that the internal representation of the temperature is already insulated from
clients of the base class.)
Since all objects have a color (encoded as an integer), a private integer data member
and a public inline accessor are provided in the base class. General clients of Solid
are not permitted to set the color of an instance directly. Rather, they are required to
adjust its temperature, which in turn may affect the color of the object. The setColor
manipulator function is therefore protected, so that only the derived object itself can
alter the color of this instance directly.
The public interface provides several accessor functions, some of which (such as
volume) do substantial numerical work when invoked. The protected interface provides
derived-class authors with several helper functions such as setTemperature
that may prove useful in implementing required virtual functions.
Rather than exposing the ostream pointer data member directly in the protected interface,
the protected function error is supplied to provide a convenient stream reference
for reporting errors (such as setting the temperature too low or too high). The
mass may play a role in determining the color, particularly for a very large, dense
Solid such as a black hole. The protected member function mass, which calculates the
mass using members supplied in the public interface, is provided for the convenience
of derived-class authors. Finally, distance is a function frequently used by derived-class
authors. Unlike the mass helper function, distance does not depend on an
instance of any class and so is made a protected static member of class Solid.
22 The desired effect could also have been achieved by making this virtual function private, but that
would have made the tasks of the derived-class developer less obvious.
Clearly the original author of the Solid base class did not consider insulation an
important design criterion, as evidenced by the casual use of inline functions. Fortunately
we have several techniques available to improve the insulation of the implementation
of Solid. These insulation improvements fall into two basic categories:
total and partial.
As an exercise, let us first see what kinds of incremental improvements we can make
to the class Solid:
#ifndef INCLUDED_IOSTREAM
#include <iostream.h>
#define INCLUDED_IOSTREAM
#endif
Without needing to think much at all, we can convert the above to class ostream; to
eliminate unnecessary compile-time dependence on the iostream header and to avoid
the unnecessary creation at startup of a static dummy object in every translation unit
that includes solid.h (see Section 7.8.1.3).
protected:
static double distance(double x1, double y1, double x2, double y2);
Since distance is a static function, it does not depend on the Solid instance data; it
can easily be moved to a separate utility component.
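A minimal sketch of such a utility component follows. The struct name geom_Util is an assumption made for this example; the point is only that the function is declared in one small header and defined out of line, so that derived-class authors can use it without Solid having to carry it.

```cpp
#include <cmath>

// ---- geom_util.h (hypothetical utility component) ----
struct geom_Util {
    // Return the Euclidean distance between (x1, y1) and (x2, y2).
    static double distance(double x1, double y1, double x2, double y2);
};

// ---- geom_util.c ----
double geom_Util::distance(double x1, double y1, double x2, double y2) {
    const double dx = x2 - x1;
    const double dy = y2 - y1;
    return std::sqrt(dx * dx + dy * dy);
}
```

Because the definition is not inline, its body can change without forcing clients of the utility to recompile.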
protected:
double mass() const { return density() * volume(); }
private:
ostream *d_errorStream_p;
protected:
ostream& error() { return *d_errorStream_p; }
It may be possible that some derived Solid objects can be set to any temperature
without error. The error stream function, error(), is just a convenience that some
derived-class authors may find useful. We could simply remove the
d_errorStream_p data member from the factored implementation (as we did with
Scribe in the shape subsystem of Figure 6-26) and let derived-class authors implement
an error stream only if needed.
private:
int d_color;
double d_scale;
double d_density;
protected:
void setColor(int color) { d_color = color; }
public:
int scale() { return d_scale; }
int color() const { return d_color; }
double density() const { return d_density; }
// solid.h
#ifndef INCLUDED_SOLID
#define INCLUDED_SOLID

class Solid_i;

class Solid {
    Solid_i *d_this;

  protected:
    // CREATORS
    Solid(double density, double scale = 1.0);
    Solid(const Solid& solid);

    // MANIPULATORS
    Solid& operator=(const Solid& solid);
    void setColor(int color);

    // ACCESSORS
    virtual double surfaceEquation(double x, double y, double z) = 0;
        // Point(x,y,z) is on the surface when function returns
        // approximately 0 (to within some small tolerance).

  public:
    // CREATORS
    virtual ~Solid();

    // MANIPULATORS
    virtual void setTemperature(int degrees) = 0;
        // Changing the temperature may affect color, depending on the
        // actual object.
    void setScale(int scale);

    // ACCESSORS
    virtual double temperature() const = 0;
    int scale();
    int color();
    double density() const;
    double volume() const;
    double centerOfMassInX() const;
    double centerOfMassInY() const;
    double centerOfMassInZ() const;
    // ...
};

#endif
At this point, to do any better we will have to use some form of total insulation tech-
nique. We cannot fully insulate the implementation of this class as it stands because of
the use of virtual functions. Wrapping this class would preclude general users from
deriving new kinds of Solid at will. Of the insulation techniques presented in this
chapter, extracting a protocol is by far the best alternative here.
As with the Car class shown in Figure 6-30, we are unable simply to remove all of the
protected functions and place them in a separate utility because of their intimate
interaction with the instance itself. That is, functions in the protected interface (e.g.,
setColor) were supplied only to implement virtual functions (e.g., setTemperature)
defined in derived classes. At the same time, these protected functions depend
directly on instance information (e.g., d_color) that is accessible by clients via public
functions (e.g., color) defined in this base class. It is primarily because of the virtual
function dependency on intrinsic instance data that we are forced to extract a protocol
to achieve total insulation.
Figure 6-86 shows the result of extracting a protocol from either the original Solid or
the partially insulated version. Notice how extracting a protocol class always enables
us to avoid exposing the protected members of the base class.
// solid.h
#ifndef INCLUDED_SOLID
#define INCLUDED_SOLID

class Solid {
  public:
    // CREATORS
    virtual ~Solid();

    // MANIPULATORS
    virtual void setTemperature(int degrees) = 0;
        // Changing the temperature may affect color, depending on the
        // actual object.
    virtual void setScale(int scale);

    // ACCESSORS
    virtual double temperature() const = 0;
    virtual int scale();
    virtual int color();
    virtual double density() const;
    virtual double volume() const;
    virtual double centerOfMassInX() const;
    virtual double centerOfMassInY() const;
    virtual double centerOfMassInZ() const;
    // ...
};

#endif
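The following sketch shows one way the pieces might fit together once a protocol has been extracted. It is an illustration under stated assumptions, not the book's implementation: the protocol is reduced to a handful of pure virtual functions, and the names SolidImp, Cube, and massOf are hypothetical. The data members and the protected setColor helper move into a partial implementation class that only derived-class authors need to see.

```cpp
// Protocol: a pure interface with no data and no protected members.
class Solid {
  public:
    virtual ~Solid() {}
    virtual void setTemperature(int degrees) = 0;
    virtual double temperature() const = 0;
    virtual int color() const = 0;
    virtual double density() const = 0;
    virtual double volume() const = 0;
};

// Partial implementation (hypothetical name). It would live in its own
// component, included only by authors of concrete solids.
class SolidImp : public Solid {
    int d_color;
    double d_density;

  protected:
    explicit SolidImp(double density) : d_color(0), d_density(density) {}
    void setColor(int color) { d_color = color; }  // helper for derived classes

  public:
    int color() const override { return d_color; }
    double density() const override { return d_density; }
};

// Hypothetical concrete solid whose color depends on its temperature.
class Cube : public SolidImp {
    double d_edge;
    int d_degrees;

  public:
    Cube(double density, double edge)
    : SolidImp(density), d_edge(edge), d_degrees(0) {}

    void setTemperature(int degrees) override {
        d_degrees = degrees;
        setColor(degrees > 100 ? 1 : 0);  // arbitrary rule for the sketch
    }
    double temperature() const override { return d_degrees; }
    double volume() const override { return d_edge * d_edge * d_edge; }
};

// A general client depends only on the protocol.
double massOf(const Solid& solid) {
    return solid.density() * solid.volume();
}
```

A client such as massOf compiles against the protocol alone, so changes to SolidImp or Cube do not force it to recompile.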
p2p_Router class without disturbing the independently tested implementation component
p2p_RouterImp.
In many cases this degree of insulation may be good enough. But if p2p_router
defines a very public interface, we can do better. A fully insulating p2p_router component
would (forward) declare its own implementation structure (e.g.,
p2p_Router_i). Then, in the p2p_router.c file, struct p2p_Router_i would be
defined with a single embedded member of type p2p_RouterImp:
// p2p_router.c
#include "p2p_router.h"
#include "p2p_routerimp.h"

struct p2p_Router_i {
    p2p_RouterImp d_imp;
};
// ...

// p2p_router.c
#include "p2p_router.h"
#include "p2p_routerimp.h"

struct p2p_Router_i {
    p2p_RouterImp d_imp;
    int d_moreData; // added fully insulated detail
};
// ...
In this case, doing it right requires just a bit more development effort, but achieves
total insulation without affecting runtime performance at all.
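A self-contained sketch of this arrangement is shown below. The names Router, RouterImp, and Router_i, as well as the numObstructions accessor, are stand-ins invented for the example; the real p2p_Router interface is not reproduced here. The header portion mentions only an opaque pointer, while the implementation portion defines the structure and forwards each call.

```cpp
// ---- router.h portion: everything a client compiles ----
struct Router_i;  // forward declaration only; layout is hidden

class Router {
    Router_i *d_this;                  // opaque pointer to the implementation

    Router(const Router&) = delete;    // copying omitted in this sketch
    Router& operator=(const Router&) = delete;

  public:
    Router();
    ~Router();
    void addObstruction(int x, int y);
    int numObstructions() const;
};

// ---- router.c portion: free to change without recompiling clients ----
#include <utility>
#include <vector>

class RouterImp {  // stand-in for the independently tested implementation
    std::vector<std::pair<int, int> > d_obstructions;

  public:
    void addObstruction(int x, int y) {
        d_obstructions.push_back(std::make_pair(x, y));
    }
    int numObstructions() const {
        return static_cast<int>(d_obstructions.size());
    }
};

struct Router_i {
    RouterImp d_imp;
    // Further members could be added here later without touching router.h.
};

Router::Router() : d_this(new Router_i) {}
Router::~Router() { delete d_this; }

void Router::addObstruction(int x, int y) {
    d_this->d_imp.addObstruction(x, y);
}

int Router::numObstructions() const {
    return d_this->d_imp.numObstructions();
}
```

The only per-object cost is one allocation at construction and one extra indirection per call, which is negligible when each call does substantial work.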
Sometimes obtaining the last little bit of insulation can be very costly. Recall from
Section 6.4.3 that we opted not to insulate all of the graph component classes completely.
To do so would have caused a disproportionately high cost in terms of runtime
performance. To illustrate this principle, consider the four related implementations of
the graph subsystem we have seen in this and the previous chapters:
System III: Insulating Wrapper. This subsystem fully insulates the imple-
mentations of three of the five wrapper classes presented in Figure 6-53.
The remaining two classes, NodeId and EdgeId, expose their respective
implementation class names, Gnode and Gedge, in their physical (but not
their logical) interfaces.
System IV: Fully Insulating Wrapper. This subsystem fully insulates the
implementations of all five of the wrapper classes presented in Figure 6-53.
To illustrate the runtime cost of these various graph architectures under a spectrum of
operating conditions, I created a small test program to run a series of experiments. In
this program, the graph subsystem is used to create the arbitrary graph structure
shown in Figure 6-87. In this graph, each edge happens to have a weight of 1, but the
particular edge values will not affect the experiment.
After creating an instance of this graph, the program invokes a NodeIter to iterate
over all 15 nodes in the graph, accumulating the values obtained by calling sum on
each. The recursive function sum explores the graph from a specified node to a specified
depth, accumulating the weights of the edges it encounters along the way. Since
sum is exploring a binary tree, the runtime of sum is exponential with respect to the
depth to which it searches.
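The shape of such a recursive traversal can be sketched as follows. This is not the sum function of the test driver; it is a minimal illustration that assumes a plain adjacency-list graph (the Edge struct and Graph typedef are hypothetical) in place of the graph subsystem's own Node, Edge, and iterator types.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal graph: for each node index, its outgoing edges.
struct Edge {
    int to;         // index of the node this edge leads to
    double weight;  // weight accumulated by the traversal
};
typedef std::vector<std::vector<Edge> > Graph;

// Return the total weight of the edges encountered when exploring the
// graph from 'node' along every path of at most 'depth' edges. An edge
// reached along several paths is counted once per path.
double sum(const Graph& g, int node, int depth) {
    if (depth <= 0) {
        return 0.0;  // nothing is traversed at depth 0
    }
    double total = 0.0;
    for (std::size_t i = 0; i < g[node].size(); ++i) {
        const Edge& e = g[node][i];
        total += e.weight + sum(g, e.to, depth - 1);
    }
    return total;
}
```

With two outgoing edges of weight 1 per node, each extra level doubles the number of calls, and the result for depth d is 2^(d+1) - 2. This is why the cost of the small accessor functions comes to dominate as the depth grows.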
The source for the actual test program is provided in Figure 6-88. The first command-line
argument to the test driver indicates the depth to which sum is to explore the
graph. The second command-line argument specifies the number of times to repeat
the (identical) experiment; this second argument is used to obtain accurate time
measurements for an average iteration.
// graph.t.c
#include "graph.h"
#include "node.h"
#include "edge.h"
#include <iostream.h>
#include <stdlib.h>
// ...
NodeId n0 = g.addNode("n0");
NodeId n1 = g.addNode("n1");
g.addEdge(n, n0, 1);
g.addEdge(n, n1, 1);
// ...
total = 0;
for (NodeIter it(g); it; ++it) {
    total += sum(it(), depth);
}
}
Figure 6-88: Test Driver for Measuring Runtime Efficiency of Graph Subsystems
The test driver was run for depths ranging from 0 to 20 with a repeat value of 1,000
(depth 0-5), 100 (depth 6-10), 10 (depth 11-15), and 1 (depth 16-20) on each of the
four systems described above. 23 The results of this very illuminating experiment are
given in Figure 6-89.
When the depth of the graph traversal is specified as 0, no graph traversal takes place.
Most of the time is spent in building up and tearing down the graph structure. These
kinds of operations are inherently relatively expensive; as the first line of Figure 6-89
indicates, the effects of encapsulating and even insulating are negligible and small,
respectively. When fully insulating, we incur a runtime cost that is 83 percent higher
than our cost when not insulating. This is because of the very pronounced increase in
the cost of returning a fully insulating NodeId by value from NodeIter.
As we increase the depth of the graph, the cost of traversing it begins to affect overall
performance. Functions that are used to read the information in a graph are much
smaller and do much less work per call than those used to construct the graph. These
lightweight functions, however, are called many, many times in the course of travers-
ing the graph.
At a depth of 5, it takes 2.5 times as long for the experiment to run on System I as it
took at a depth of 0; however, many times that number of additional function calls are
occurring. If these small functions are made disproportionately expensive, the run-
time performance will suffer. At this same depth, the encapsulated System II now
experiences an increase of 37 percent compared to the runtime for the unwrapped
System I. The partial insulation of System III causes the experiment to take 5 times as
long. The dynamic allocations brought on by totally insulating NodeId and EdgeId in
System IV have cost us a factor of 30!
At a depth of 10, it takes 100 times as long for the experiment to run on System I as it
did at a depth of 0. The time spent calling those "little" functions now dominates the
runtime cost. For an encapsulating wrapper (System II), this experiment will run
about 67 percent longer. For an insulating wrapper (System III) it will take over 8
times as long, and for a fully insulating wrapper (System IV), it will take fully 40
times as long.
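The source of that cost can be seen in a small sketch. The two handle classes below are hypothetical (NodeIdLight, NodeIdFull, and the allocation counter are names invented for the example, and Gnode is reduced to a single field). The first exposes the implementation class name but copies as a bare pointer; the second hides everything behind its own structure and must therefore allocate on every construction and copy.

```cpp
#include <cstddef>

struct Gnode { int d_value; };  // stand-in for the implementation node

static std::size_t g_allocations = 0;  // counts allocations (sketch only)

// Partially insulating handle: the name Gnode appears in its physical
// interface, but copying is just a pointer copy.
class NodeIdLight {
    Gnode *d_node_p;

  public:
    explicit NodeIdLight(Gnode *node) : d_node_p(node) {}
    int value() const { return d_node_p->d_value; }
};

// Fully insulating handle: even the pointer is hidden behind a private
// structure, so each new handle requires a dynamic allocation.
struct NodeIdFull_i { Gnode *d_node_p; };

class NodeIdFull {
    NodeIdFull_i *d_this;

  public:
    explicit NodeIdFull(Gnode *node) : d_this(new NodeIdFull_i) {
        d_this->d_node_p = node;
        ++g_allocations;
    }
    NodeIdFull(const NodeIdFull& other)
    : d_this(new NodeIdFull_i(*other.d_this)) {
        ++g_allocations;  // every copy pays for a new allocation
    }
    NodeIdFull& operator=(const NodeIdFull& rhs) {
        *d_this = *rhs.d_this;
        return *this;
    }
    ~NodeIdFull() { delete d_this; }
    int value() const { return d_this->d_node_p->d_value; }
};

std::size_t allocationCount() { return g_allocations; }
```

When such a handle is returned by value from an iterator inside a tight loop, the allocation and deallocation can easily cost far more than the work the handle was created to do.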
23 The test driver was trivially altered to accommodate the slightly different interface of System I.
By scanning down Figure 6-89 from this point, we can see that we have reached the
other asymptote; increasing the depth does not further spread the respective runtime
performance ratios of these graph subsystem variants.
5. Providing a totally insulating wrapper for tiny objects that are frequently
returned by value can have a devastating effect on overall performance
(forcing the degree of insulation to be reduced and/or the level of insula-
tion to be escalated).
6.7 Summary
In this chapter, we introduced the concept of insulation as the physical analog of the
logical concept commonly referred to as encapsulation. An implementation detail of a
component is insulated if it can be changed without forcing clients of the component
to recompile.
Several constructs were identified that could potentially result in undesirable compile-time
coupling:
All other things being equal, it is better to insulate a particular implementation detail
from a client than not-even if other details remain uninsulated. Partial implementa-
tion techniques are used to reduce the extent of compile-time coupling without incur-
ring all of the overhead that total insulation could imply:
For widely used interfaces, avoiding all compile-time dependency on the underlying
implementation details is highly desirable. Three general insulation approaches were
discussed to insulate clients from all implementation details:
working on very large systems that may not have been designed with a procedural
interface in mind.
• Time to access data: The class may have embedded data and make effective
use of tiny inline functions to access it.
• Time to create objects: A tiny class (e.g., Point) may not already allocate
dynamic memory.
A large project can span many developers, several layers of management, and even
multiple geographic sites. The physical structure of the system will reflect not only
the logical structure of the application but also the organizational structure of the
development team that implements it. Large systems require hierarchical physical
organization beyond what can be accomplished by a levelizable hierarchy of individ-
ual components alone. In order to encompass more complex functionality, we need to
introduce a unit of physical design at a higher level of abstraction. This chapter
addresses the physical structure needed to support the development of very large sys-
tems. In particular, we introduce a macro unit of physical design referred to in this
book as a package.
In large systems, static initialization can lead to unacceptably long invocation times.
We take a look at four alternative initialization strategies, comparing their relative
strengths and weaknesses as we go. We also address the need to clean up before pro-
gram exit in order to facilitate memory regression testing.
As we saw with the p2p_router example in Chapter 4, we can build fairly complex
subsystems using only a handful of components. In that example, the implementation
of high-level functionality declared within a single component interface was distrib-
uted across a hierarchy of components that greatly improved its testability. A system
consisting of tens of thousands of lines can be supported easily without further parti-
tioning. But what if our systems are much bigger than this? Suppose they consist of
hundreds of thousands of lines of code. How would we address the physical organiza-
tion of literally hundreds of components? As ever, we will address complexity with
the tried-and-true: abstraction and hierarchy.
When designing a system from the highest level, there are almost always large pieces
that it makes sense to talk about abstractly as individual units. Consider the design of an
interpreter for a large language (such as C++) shown in Figure 7-1. Each of the sub-
systems described in that design is likely to be too large and complex to fit appropriately
into a single component. These larger units (indicated in Figure 7-1 with a double
box) are each implemented as a collection of levelizable components.
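[Figure 7-1: design of an interpreter; the top-level Interpreter depends on the parser,
evaluator, and formatter subsystems, which in turn depend on the Runtime Database]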
The dependencies in Figure 7-1 between these larger units represent an envelope for
the aggregate dependencies among the components that comprise each subsystem.
For example, the runtime database is an independent subsystem; it has no dependen-
cies on any external components. Each of the parser, evaluator, and formatter sub-
systems has components that depend on one or more components in the runtime
database, but none of the components in any of these three subsystems depends on
any components in the other two parallel subsystems. The top-level interpreter con-
sists of components that depend on components within each of the three parallel sub-
systems (and perhaps directly on components within the runtime database). Carefully
partitioning a system into large units and then considering the aggregate dependencies
among these units is critical when distributing the development effort for projects
across multiple individuals, development teams, or geographical sites.
Although the design of Figure 7-1 would not be considered a large project, it could eas-
ily be assigned to more than one developer. There is a natural partitioning that would
allow several developers to work on this project concurrently. After the runtime data-
base is designed, there would be an opportunity for three concurrent development
efforts to begin on the parsing, evaluating, and formatting functionality. Once these
pieces start to fall into place, the implementation and testing of the top-level inter-
preter can begin.
Until now, we have discussed these separate subsystems as conceptual units with no
actual physical partitions. If the entire project is expected to require only 20,000 lines
of code and is being implemented by a single developer, there may be no compelling
need to partition the overall architecture into distinct physical units. However, if the
design is, say, 80,000 lines of code or if more than one developer will be working on
the project at any given time, there is a much greater need for the conceptual physical
partitioning to become concrete.
The term package refers to a generally acyclic, often hierarchical collection of components
that together have a cohesive semantic purpose. Physically, a package consists
of a collection of header files along with a single library file containing the
information in the corresponding object (.o) files. A package might consist of a
loosely coupled collection of low-level, reusable components, such as the original
Standard Components library from AT&T, 1 and now the new Standard Template
Library (STL) developed at Hewlett-Packard. 2 A package might also consist of a spe-
cial-purpose subsystem intended for use by only a single client, such as the
p2p_router subsystem from Chapter 4.
Figure 7-2 illustrates one possible organization for packages within a file system. In
this organization, all packages exist at the same level in the directory structure regardless
of their physical interdependencies. All headers (required outside a given package)
are placed in a single, systemwide directory called include. A library file
corresponding to each package is placed in a single systemwide directory called lib.
Each package directory contains files holding the source code for components associated
with that package. As illustrated schematically in Figure 7-2, package pk contains
n components: pk_c1, pk_c2, ..., pk_cn in its source directory. Each component
(e.g., pk_ci) has an associated header file (pk_ci.h), an implementation file
(pk_ci.c), and an individual test driver (pk_ci.t.c) that can be used to exercise the
functionality implemented in the component incrementally. Note that to be effective,
these hierarchical test drivers should be considered as much a part of the system
source code as the components they test. These drivers can be easily distinguished
from the implementation files by their .t.c suffix.
system
    develop
        p1   p2   ...   pm                  (one directory per package pk)
            dependencies
            exported
            source
                pk_c1.h   pk_c1.c   pk_c1.t.c
                pk_c2.h   pk_c2.c   pk_c2.t.c
                ...
                pk_ci.h   pk_ci.c   pk_ci.t.c
                ...
                pk_cn.h   pk_cn.c   pk_cn.t.c
    include
        p1_c1.h   ...   p1_cn.h   p2_c1.h   ...   pm_cn.h
    lib
        libp1.a   libp2.a   ...   libpm.a
In addition to the source directory, there are two files under each package directory.
The dependencies file holds the names of all other packages upon which this package
is authorized to depend. That is, in order to use this package, clients will not have
to include or link to any other component defined in another package unless that pack-
age is named in the dependencies file associated with this package. Although package
dependencies seldom change, it does occasionally happen. Specifying these depen-
dencies is the job of the system architect; verifying them is a process that can and
should be automated.
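As a rough sketch of what such automated verification might look like, the function below compares the dependencies actually found in the source against those the architect has authorized. Everything here is an assumption made for illustration: the map-based representation, the Violation struct, and the function name are hypothetical, and extracting the actual dependencies (for example, from #include directives) is left out entirely.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

typedef std::set<std::string> PackageSet;
// Maps a package name to the set of packages it depends on.
typedef std::map<std::string, PackageSet> DependencyMap;

struct Violation {
    std::string package;    // the package containing the dependency
    std::string dependsOn;  // the package it is not authorized to use
};

// Return every actual dependency that is not listed as authorized for
// the depending package. A package with no entry in 'allowed' is
// treated as being authorized to depend on nothing.
std::vector<Violation> findUnauthorized(const DependencyMap& allowed,
                                        const DependencyMap& actual) {
    std::vector<Violation> result;
    for (DependencyMap::const_iterator it = actual.begin();
         it != actual.end(); ++it) {
        DependencyMap::const_iterator a = allowed.find(it->first);
        for (PackageSet::const_iterator d = it->second.begin();
             d != it->second.end(); ++d) {
            const bool ok = a != allowed.end() && a->second.count(*d) > 0;
            if (!ok) {
                Violation v = { it->first, *d };
                result.push_back(v);
            }
        }
    }
    return result;
}
```

A build step could run such a check on every integration and fail whenever the returned list is nonempty.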
The exported file contains a list of component headers that are to be placed in the
systemwide include directory of Figure 7-2 for use by general clients. Since not all
headers defined in a package are intended for use by external clients, the set of
exported headers may be a proper subset of the components defined within the package.
Until now, we have addressed levelization only at the component level. Recall from
Section 4.7 that components that do not depend on any other (local) components are
assigned a level of 1. By local we were referring to components defined in our package;
components defined in other packages were assigned a level of 0.
[Figure 7-3: local component levels 1 and 2 within package pkgb (package level 2),
which depends on package pkga (package level 1)]
Figure 7-3 illustrates the way we have all along been treating the dependencies of our
subsystem (pkgb) on another subsystem (pkga). When testing our own package hierarchically,
we assume that components defined outside our package are already tested
and known to be internally correct. We therefore can assign to each of these external
components a level number of 0 with respect to our local components. Components
within our own package (e.g., i and j) that do not depend on any other components
local to this package are defined to have a level of 1. Components that depend locally
on components at level 1 but no higher (e.g., k and l) are at level 2.
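The level assignment just described can be sketched as a short recursive function. The representation is an assumption of this example: a map (the hypothetical LocalDeps typedef) from each local component to the components it depends on, where any name absent from the map is taken to be external to the package. The sketch assumes the dependencies are acyclic and makes no attempt to cache results.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Maps each local component to the components it depends on directly.
// Components that do not appear as keys are external to the package.
typedef std::map<std::string, std::vector<std::string> > LocalDeps;

// Return the level of 'name': 0 for an external component; otherwise
// one more than the highest level among its direct dependencies.
// Assumes the dependency graph is acyclic.
int level(const LocalDeps& deps, const std::string& name) {
    LocalDeps::const_iterator it = deps.find(name);
    if (it == deps.end()) {
        return 0;  // external components are presumed already tested
    }
    int highest = 0;
    for (std::size_t i = 0; i < it->second.size(); ++i) {
        highest = std::max(highest, level(deps, it->second[i]));
    }
    return highest + 1;
}
```

A local component with no dependencies, or with only external ones, comes out at level 1, and a component that depends on level-1 components but nothing higher comes out at level 2.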
As Figure 7-4a shows, the individual component dependencies across package bound-
aries of Figure 7-4b have been abstracted away and replaced with overall package
dependencies. For example, the dependencies of component p on component d and
component q on components e and f shown in Figure 7-4b are collectively repre-
sented in Figure 7-4a by the package dependency of pkgc on pkga.
[Figure 7-4: (a) package-level view in which pkga is at package level 1, pkgb and pkgc
are at package level 2, and pkgd is at package level 3, with local component levels
starting again at 1 within each package; (b) the same components unpackaged, with
absolute component levels 1 through 6]
Notice that the local component level numbers within each package of Figure 7-4a
still begin with level 1. This is again because dependencies on other packages are
treated as "primary inputs" (see Section 4.7) and, for the purposes of hierarchical test-
ing, are presumed to be correct. As is common, each of these packages contains leaf
components (i.e., components such as t that do not depend on any other components
in the system). In an unpackaged system (Figure 7-4b), these leaf components would
all have an absolute component level of 1. Consequently there is a tendency for many
components to fall to the lower levels of the unpackaged diagram, perhaps obscuring
their purpose. It is by packaging these leaf components along with their clients that
we are able to improve the modularity of the system.
Often a package will hold dozens of components. While a typical component might
consist of 500 to 1,000 lines of source code, a typical package might encompass any-
where from 5,000 to 50,000 lines of source. Decomposing large designs into cohesive
packages of manageable size greatly simplifies the development process. For develop-
ers, comprehending up to a few dozen components and their detailed interdependen-
cies within a package (as in Figure 7-4a) is significantly easier than understanding the
arbitrary dependencies among potentially hundreds of unpackaged components (as in
Figure 7-4b).
Packaging also allows system architects to understand, discuss, and develop the overall
architecture of a large system at a much higher level of abstraction than would otherwise
be possible. For example, an architect can delineate the responsibility of a
package and then specify acceptable dependencies among entire packages as part of
the overall system design without having to address individual components. The
actual package dependencies can later be extracted from the source code and com-
pared against the architect's specification.
Having all the packages at the same level in the directory structure makes them easily
accessible to developers. Using special-purpose tools (see Appendix C), the physical
package interdependencies can be extracted from and compared against the architect's
specification located within the dependencies file of the development structure shown
in Figure 7-2. Note that to guarantee package-level levelization when testing a new
version of a package, only those packages named in the dependencies file should have
their exported headers made available for inclusion or their library files supplied in
the link command.
The partitioning of components into packages is governed by more than just some
arbitrary threshold of size or complexity. Identifying package-sized units of cohesive
functionality is a natural consequence of top-down design. As with class dependen-
cies within a single component, component dependencies within a package are often
more numerous and intricate than dependencies across package boundaries. Because
of their more localized nature, the physical character of dependencies among compo-
nents within a package often involves more compile-time coupling than their inter-
package counterparts. In fact, some components defined in a package may be merely
insulated implementation details of other components defined in the same package;
the headers for these implementation components would probably not be made avail-
able outside of the package.
entire system. Therefore, highly coupled parts of the system are often better off being
part of a single package.
The degree to which parts of the system are likely to be reused as a unit also plays a
role in the packaging of components. In the example of Figure 7-1, the runtime data-
base may be used by a suite of tools, while the three parallel subsystems are used only
once. Even if the runtime database were very small in comparison to these other parts
of the system, it could make sense to place this low-level subsystem in its own pack-
age to avoid tying its reusable functionality to any of the other less-often-used pack-
ages. (An analogous argument was presented for demoting enum E in Section 5.3;
Figures 5-24 and 5-25.)
As was discussed in Section 2.3.5, the only logical entities declared at file scope in
header files are classes, structs, unions, and free operators. The reason given for this
restriction was to reduce the opportunity for name collisions. When only a single
developer is involved, it is not hard to avoid name collisions simply by following this
strategy. Namespaces (as discussed in Section 7.2.2) can be used to counter a disorganized
proliferation of global names resulting from the integration of completely independent
development efforts. However, when dealing with many developers working
across multiple sites on a large unified system, a more structured approach is required.
The approach taken here, which ensures unique global class names, requires that each
package be associated with a unique registered prefix consisting of two to five charac-
ters. When a package is first created, its prefix is registered with some company-wide
authority or service so that no other package developer will inadvertently reuse it.
Each construct in the header file declared at file scope is prepended with the package
prefix. The . c and . h files implementing this component are also each prepended
with the same prefix. It is by prepending each global name with this registered prefix
that we are able to guarantee that similar names defined in distinct packages cannot
possibly collide.
Each identifier declared at file scope must be preceded by a registered prefix in order
to ensure the avoidance of name conflicts across package boundaries. Although only
classes, structs, unions, and free operators are allowed at file scope, extraordinary circumstances
(such as the ANSI C-compliant interface of Section 6.5.4) could force an
exception to this rule. If for some reason we were to declare a function, variable, enu-
meration, or typedef at file scope in a header file, we would still want to make sure to
prepend each of its file-scope identifiers with the appropriate package prefix. This
independent design rule is illustrated in Figure 7-5.
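A brief sketch of the convention follows. It assumes two packages whose registered prefixes are geom and gui; the gui package and the members shown are hypothetical, chosen only to show that the two Point classes can coexist in one program.

```cpp
// ---- geom_point.h (component in the hypothetical package 'geom') ----
class geom_Point {
    int d_x;
    int d_y;

  public:
    geom_Point(int x, int y) : d_x(x), d_y(y) {}
    int x() const { return d_x; }  // names in class scope need no prefix
    int y() const { return d_y; }
};

// A free operator at file scope is distinguished by its prefixed
// parameter types.
inline bool operator==(const geom_Point& lhs, const geom_Point& rhs) {
    return lhs.x() == rhs.x() && lhs.y() == rhs.y();
}

// ---- gui_point.h (component in the hypothetical package 'gui') ----
class gui_Point {
    int d_row;
    int d_column;

  public:
    gui_Point(int row, int column) : d_row(row), d_column(column) {}
    int row() const { return d_row; }
    int column() const { return d_column; }
};
```

Had both packages simply named their class Point, a client including both headers would not compile.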
3 Note that for the purposes of the convention for distinguishing type names from non-type names as
presented in Section 2.7, we have elected not to treat the prefix as part of the identifier. An equivalent
and equally valid convention would be to capitalize the prefix instead (e.g., Geom_point).
Capitalizing Point rather than Geom merely emphasizes that geom_Point is a Point type in the geom
package.
Identifiers declared within class scope need not have package prefixes because the
enclosing class (which is prefixed) provides a natural shield against collisions as well
as a suitable grouping for related functionality. Similarly, identifiers with internal
linkage, declared and used entirely within a single .c file, also need not use prefixes.
That is, the scope of a typedef, enumeration, static variable, or static (or inline) free
function specified within a .c file is limited to a single translation unit and therefore
cannot collide with an identical short name defined locally within another translation
unit. Static class member data and non-inline member functions have external linkage.
It is therefore appropriate to use package prefixes for class names even when the
class itself is defined and used entirely within a single .c file. Otherwise we run the
risk that such a hidden class will produce external symbols that at link time might collide
with those of a class hidden in the .c file of a component belonging to some other
package. 4
4 Note that prefixes are not strictly necessary for hidden classes, provided that the developer ensures
that all aspects of linkage for the hidden class are internal. A generally useful extension to the pack-
age-prefix technique for naming classes with external linkage that are private to a component was
presented in the context of fully insulating classes at the end of Section 6.4.2.
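The distinction between internal and external linkage can be sketched as follows. This is only an illustration under assumed names: the file geom_polygon.c, the helper clampToRange, and the hidden class geom_PolygonImpUtil are all hypothetical.

```cpp
// geom_polygon.c  (hypothetical implementation file in the geom package)

// Internal linkage or no linkage: these names are confined to this
// translation unit, so short, unprefixed names are safe.
typedef int Index;
enum { MAX_VERTICES = 64 };

static int clampToRange(int value, int low, int high)
{
    return value < low ? low : (value > high ? high : value);
}

// A class hidden in this .c file. Its static data member and its non-inline
// member function have external linkage, so the class name keeps the
// package prefix to keep the resulting link-time symbols unique.
class geom_PolygonImpUtil {
    static int s_callCount;                         // external linkage

  public:
    static Index boundedIndex(int i);               // non-inline: external linkage
    static int callCount() { return s_callCount; }  // inline
};

int geom_PolygonImpUtil::s_callCount = 0;

Index geom_PolygonImpUtil::boundedIndex(int i)
{
    ++s_callCount;
    return clampToRange(i, 0, MAX_VERTICES - 1);
}
```

Had the hidden class been named simply ImpUtil, an identically named hidden class in another package's .c file could produce the same external symbols for boundedIndex and s_callCount.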
Names generated by the compiler are sometimes geared to the name of the source file
itself. In CFRONT, file names are used as a basis for naming both the virtual tables and
the entry points for initializing and destroying instances of user-defined
types defined at file scope, both of which have external linkage. Therefore,
to avoid link-time conflicts, it is important that all source files in the system have
unique names. The library containing all of the .o files for the geom package would
also be adorned in some manner with the "geom" prefix (e.g., libgeom.a on a Unix
system).
For many systems, harsh limitations on file-name length make prepending unique prefixes
painful. If the limitation is eight characters or fewer, the file names could get
rather cryptic. On some systems (e.g., Unix), file-name length is not a problem except
for archaic constraints placed on the length of the name of a .o file that can be placed
in a library archive file. The names of the corresponding .c files may need to be constrained
to some relatively small length (as low as 14 characters on some Unix-based
systems). In this case we can either make the .h files correspondingly short to match
the .c file, or we can provide some sort of external cross reference to allow longer
header file names to be associated with shorter (abbreviated) implementation file
names. On my Unix system I use symbolic links to achieve this mapping during
development.
7.2.2 Namespaces
In July of 1993, the ANSI/ISO Committee adopted the namespace construct designed
by Bjarne Stroustrup to aid in resolving collisions between global identifiers with the
same name. 5 For example,
namespace geom {
    class Point { /* ... */ };
    bool operator==(const Point& left, const Point& right);
    class Polygon { /* ... */ };
    // ...
}
defines a namespace geom. The constructs declared within the braces are placed within
their own scope and therefore will not collide with either global names or names
declared in any other namespace. While using directives are supplied primarily to ease
transition, the intent is to refer to such names with explicit qualification: 6
void mySpace::Class::f()
{
    geom::Point p(3, 2);
    // ...
}
As you can see, both namespaces and registered prefixes can be used in similar ways
to avoid name conflicts among classes developed within a single company. Neither,
however, can serve as a complete substitute for the other.
When dealing with C++ application libraries supplied from two distinct vendors,
there are several potential problems. As described in Appendix B, if the compilers
used to develop these libraries are not compatible, you're out of luck. But even if you
can get both vendors to supply compatible libraries (architecture, operating system,
and compiler/linker), there is no central authority with which to register prefixes; thus
there is a distinct possibility that globally defined names will collide. Herein lies the
power of the namespace construct.
Placing all library code developed by a company within a single namespace wrapper
makes it possible to ensure that even the unlikely event of matching both prefixes
and identifiers can be overcome merely by explicit qualification. Suppose two companies,
SDL and SCI, both supply geometric library software. Each company decides to
create a "unique" package prefix called geom. Obviously, there is a possibility that one
or more of the geometric names (e.g., Point, Line, Polygon) within those packages
will coincide.
6 stroustrup94, Sections 17.4.2, p. 408 and 17.4.5.3, p. 414.
// sdl/geom_point.h
#ifndef INCLUDED_SDL_GEOM_POINT
#define INCLUDED_SDL_GEOM_POINT

namespace SDL {

class geom_Point {
    // ...
  public:
    geom_Point(int x, int y);
    geom_Point(const geom_Point& point);
    ~geom_Point();
    geom_Point& operator=(const geom_Point& point);
    void setX(int x);
    void setY(int y);
    int x() const;
    int y() const;
};

}

#endif

// sci/geom_point.h
#ifndef INCLUDED_GEOM_POINT
#define INCLUDED_GEOM_POINT

class geom_Point {   // declared at file scope, with no enclosing namespace
    // ...
};

#endif

// my_class.c
#include "my_class.h"
#include <sdl/geom_point.h>
#include <sci/geom_point.h>

void my_Class::f()
{
    SDL::geom_Point p(1,2);
    ::geom_Point q(3,4);
    // ...
}
If one (or both) of these companies has the foresight to place their code within a single
companywide namespace, the identifier name conflict-resolution problems disappear.7
The technique of combining package prefixes and namespaces to resolve name conflicts
among multiple vendors is illustrated in Figure 7-6. Even though SCI did not
choose to use namespaces, we can still access their geom_Point class by prepending
the scope resolution operator (::) to designate true file scope. Notice that SDL has
protected itself, but SCI is at risk if some other vendor or one of its clients did not
choose to take these precautions.
Because the C++ language supports the arbitrary nesting of namespaces,8 we could
have elected to resolve interpackage name collisions within our company by replacing
package prefixes with package namespaces. For example,
void f()
{
    SDL::geom_Point pt; // package prefix
    // ...
}

void f()
{
    SDL::geom::Point pt; // package namespace
    // ...
}
As we will soon see, however, replacing package prefixes with package namespaces is
ill advised.
As of the writing of this book (May 1996), the namespace feature of the C++ language
was not generally available. Even if it were, it would not affect the need for prefixes,
which have many advantages beyond simply avoiding name collisions. A package
serves a cohesive purpose that unites the components within it. Each package tends to
take on its own character. This phenomenon is due in part to the intrinsic nature of the
package and also to the subtle variations in style promulgated by its author. By
identifying a component or class as belonging to a particular package, you immediately
provide a context that aids in understanding its broader purpose. 9 In time, the package
prefix will be the first thing to catch your eye when reading application code that
depends on components from multiple packages.
7 We could still have problems if compiler-generated symbols with external linkage are generated
based on the file name (as is the case in some implementations).
8 stroustrup94, Section 17.4.5.4, pp. 415-416.
9 For this and other views on segmenting the global namespace, see stroustrup94, Sections 17.4.1,
p. 406; and 17.4.5.5, pp. 416-417.
Figure 7-7: Link-Time Errors Resulting from Missing the stdc Package Library
7.2.3 Preserving Prefix Integrity
Sometimes there may be a great temptation to distribute logically related units across
multiple physical libraries and to assign these logical units a common package prefix.
For example, a given package (pub) might provide a set of low-level, reusable container
types. Each of these components and each of the types defined therein would
begin with the prefix pub_. Now suppose we are developing our own application
package (xr2e) and discover we need a new type, Btree, which happens to have similar
characteristics (low level, container, reusable) to those found in the pub package.
What should we do?
We might be tempted to call this component pub_btree and place it in our own
library to reflect its logical relationship to the pub package. This urge should be sup-
pressed. The fact that all components with a given package prefix reside in a single
physical library is too valuable to both understanding and managing the organization
of large systems to be sacrificed.
Probably the easiest thing to do is simply to call the class xr2e_Btree and define it in a
component that is part of our own package. Implementing this object locally reduces
the likelihood that it will be reused-which can be both good and bad. By defining the
Btree within the same package, we retain ownership and therefore need not be as con-
cerned about making changes or enhancements to it should it suit our needs to do so.
The potential for reuse is not always obvious a priori. It may be that we believe that no
one else will need a Btree type, so we'll just write it and keep it for ourselves. If others
think this way and the btree component turns out to be truly reusable, we may
eventually see several redundant versions of a Btree popping up in our system. As a
rule, if we see three or more comparable versions of a btree component in our system,
the component may very well be a good candidate for reuse. At this point, we should
probably evaluate the impact of consolidating our system by moving a single, unified
version of Btree to the more public pub package (and changing its prefix to pub_).
Often, we will believe that a component is reusable only to find that it is not needed
by others. Placing such deadweight in highly reusable packages is worse than delay-
ing the entry of potentially reusable components into the pub package. It is almost
always easier to make functionality more rather than less public. If in doubt, it is bet-
ter to defer adding a component to a widely used package until empirical evidence
warrants it.
If we are convinced at the outset that a component absolutely belongs in another package,
then we will need to talk to the developer responsible for maintaining that package.
If your proposal is compelling, as it might well be for a Btree, the owner of pub
may agree to write the btree component for you and place it in the pub package for
all to use. Note that you will now be just another customer of the pub package, and
give up the right to add intrusive special customizations to the pub_btree component.
Scheduling constraints may force you to write the component yourself and hand it
over (along with its incremental test driver) to the pub package developer. After a
careful review, this developer will assume ownership, and again you will become just
like any other client with no special privileges.
The important trade-off here is that if you create a component redundantly, then you
can make it exactly what you want it to be. You will not have to negotiate with other
package developers, and you may be able to avoid additional package dependencies.
If you hand this component over to some other package developer, you relinquish
responsibility for and control over its functionality. If the component is not inherently
reusable, the cost to you and to others of sharing it will probably outweigh any bene-
fit. If the component is a good candidate for reuse, then it could be in everyone's best
interest to have it defined and maintained in a single, semantically cohesive, lower-
level package where it can be found and reused easily.
While the notion of a translation unit is well defined in the C++ language, the notion
of a package is entirely the work of the system developers, and its implementation is
dependent on the particular operating system. Because packages are not part of the
language, it is up to system architects and developers to create these cohesive parti-
tions within a large system, almost entirely on their own.
The registered prefix convention for all global identifiers and files is admittedly pain-
ful at first. In time, most people not only adjust to it but come to depend on it during
their daily development efforts. The advantages afforded by registered package pre-
fixes are well worth the extra effort for developing very large projects.
By analogy, a component is to its package as a planet is to its solar system. Each component
describes a physical entity, and each package describes a cohesive aggregate
of these physical entities. The physical coupling among the nearby components
within a package is typically more acute than the coupling between components in
distinct packages.
Avoiding cyclic dependencies among packages is a major design rule for the following
reasons:
3. Usability. Even if marketing is not an issue, users will not want to have to
link-in a huge library or several large libraries just to use some simple func-
tionality of the basic system (or just one of the supposedly independent
applications). Minimizing package interdependencies reduces the number
of libraries that must be linked into an application, which can in turn help to
reduce the ultimate size of the executable image (both in core and on disk).
5. Reliability. Design for testability dictates that there be a way to test a large
system incrementally and hierarchically. Avoiding cyclic dependencies
among the macroscopic parts of the system is merely a natural conse-
quence of this paradigm.
Although we might be serene enough to tolerate cyclic dependencies among a few com-
ponents within a single package due to carelessness, ignorance, or special circumstance,
we must be steadfast in our resolve to avoid cyclic dependencies among packages.
The techniques for avoiding cyclic dependencies among packages are similar to those
for avoiding cyclic dependencies among components. The basic goal is to ensure that,
if the components in package b depend on services supplied by components in pack-
age a, then components in package a do not depend either directly or indirectly on
components in package b.
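The goal stated above amounts to requiring that the package dependency graph be acyclic, which can be checked mechanically. The following is a minimal sketch, not a description of any particular tool: the PackageGraph representation and the functions packageLevel and isLevelizable are assumptions introduced here. A package with no dependencies on other packages in the graph is assigned level 1, and every other package is one level above the highest package it depends on.

```cpp
#include <map>
#include <set>
#include <string>

// Maps each package name to the set of packages it depends on directly.
typedef std::map<std::string, std::set<std::string> > PackageGraph;

// Returns the level of 'package': 1 if it depends on no other package,
// otherwise 1 + the highest level among the packages it depends on.
// Returns -1 if a cyclic dependency is reachable from 'package'.
// 'levels' caches results; the value 0 marks a package still being visited.
int packageLevel(const PackageGraph&         graph,
                 const std::string&          package,
                 std::map<std::string, int>& levels)
{
    std::map<std::string, int>::iterator found = levels.find(package);
    if (found != levels.end()) {
        return found->second == 0 ? -1 : found->second;  // 0: revisited, so cyclic
    }
    levels[package] = 0;  // mark as "in progress"

    int level = 1;
    PackageGraph::const_iterator node = graph.find(package);
    if (node != graph.end()) {
        for (std::set<std::string>::const_iterator it = node->second.begin();
             it != node->second.end(); ++it) {
            const int below = packageLevel(graph, *it, levels);
            if (below < 0) {
                return -1;  // leave the 0 mark: a cycle is reachable from here
            }
            if (below + 1 > level) {
                level = below + 1;
            }
        }
    }
    levels[package] = level;
    return level;
}

// Returns true if every package in 'graph' can be assigned a level.
bool isLevelizable(const PackageGraph& graph)
{
    std::map<std::string, int> levels;
    for (PackageGraph::const_iterator it = graph.begin(); it != graph.end(); ++it) {
        if (packageLevel(graph, it->first, levels) < 0) {
            return false;
        }
    }
    return true;
}
```

For two packages a and b in which b depends on a, this sketch assigns a level 1 and b level 2; if a also came to depend on b, isLevelizable would report false.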
Figure 7-9 illustrates a situation in which two packages, r2d2 and c3po, have become
interdependent. This problem is entirely analogous to the problem we encountered in
Figure 5-3, where logical constructs in both rectangle and window caused a mutual
dependency between these two components.
Fortunately, remedies analogous to those given in Section 5.2 for untangling the
rectangle and window component dependencies apply here also. For example, we
could escalate two of the components contributing to mutual package-level dependency
to a higher package level, as shown in Figure 7-10. Or we might decide to apply
the more general repackaging technique shown in Figure 5-36 to come up with two
entirely new packages.
The problem identified by Figure 7-12 can arise in practice when a single prefix is
assigned to a conceptual presentation package-that is, a package containing every-
thing directly usable by clients of a multi-package subsystem. If this presentation pack-
age defines both protocol classes (which are inherently very low level) and wrapper
components (which are inherently very high level), it will not be possible to interleave
components from separate, intermediate-level implementation packages and maintain
a levelizable package hierarchy. The solution to this common problem is simply to pro-
vide two separate packages for presentation to clients. One package will reside at the
bottom of the package hierarchy and contain components that define only protocol
classes; the second will reside at the top of the subsystem and define only wrappers.
Although ensuring levelizability among packages is essential, that alone is not sufficient.
For example, Figure 7-13a illustrates a bottom-up approach to packaging in
which we have merely taken the unpackaged design of Figure 7-13b and carefully
diced it into packages whose aggregate dependencies on other packages form an acy-
clic graph. But simply partitioning a sea of levelizable components into an otherwise
arbitrary set of levelizable packages does not address an important aspect of design:
cohesion. To be effective, a package should consist of components and logical entities
that have related semantic characteristics, tight coupling, or otherwise make sense to
be packaged together and treated abstractly at a higher level.
[Figure: packages w, x, y, and z; * marks a redundant edge]
Figure 7-13: Less Useful, Physically Partitioned System (Compare with Figure 7-4)
A better solution in this case would be to create a separate package for this new component,
with a perhaps similar, but not identical, prefix that conveys the similar nature
of the logical semantics yet distinguishes the physical dependency implications.
Placing this heavyweight component in a separate package ensures that clients of the
lightweight package are not saddled with the overhead of unwanted and oppressive
dependencies on libraries they do not need.
Logistically, it makes sense that the package dependencies across sites be minimized
to whatever extent is possible in order to reduce inefficiencies associated with inter-
site communication. Consider the package development distributions proposed in Fig-
ure 7-15. Distribution (A) is pathologically bad, with seven direct package
dependencies across sites. Dividing the diagram with a vertical line (B) illustrates
another inappropriate partition with five direct intersite dependencies. Dividing the
diagram with a horizontal line (C) may provide an optimal solution with a cost of only
three long-distance direct dependencies. Both (D) and (E) also provide potentially
optimal solutions if the complexity of packages and/or available resources at each site
are not evenly distributed.
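The cost assigned to each distribution is simply the number of direct package dependencies whose two ends are developed at different sites. A minimal sketch of that count follows. The function crossSiteCost, the Dependency pair type, and the use of single-character site labels are assumptions made for illustration, and any dependency edges supplied to it are the caller's own; the edges of the figure are not reproduced here.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A direct dependency: 'first' (the client package) depends on 'second'.
typedef std::pair<std::string, std::string> Dependency;

// Counts the direct package dependencies that cross a site boundary.
// 'siteOf' maps each package name to a site label (e.g., 'N' or 'S').
// Dependencies that mention a package with no assigned site are ignored.
int crossSiteCost(const std::vector<Dependency>&     dependencies,
                  const std::map<std::string, char>& siteOf)
{
    int cost = 0;
    for (std::size_t i = 0; i < dependencies.size(); ++i) {
        std::map<std::string, char>::const_iterator client =
                                          siteOf.find(dependencies[i].first);
        std::map<std::string, char>::const_iterator supplier =
                                          siteOf.find(dependencies[i].second);
        if (client != siteOf.end() && supplier != siteOf.end()
            && client->second != supplier->second) {
            ++cost;  // the two packages are developed at different sites
        }
    }
    return cost;
}
```

Evaluating this count for each candidate assignment of packages to sites gives a simple way to compare alternatives, although, as noted above, the lowest count is not automatically the best choice when package complexity or site resources are uneven.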
[Figure 7-15: five alternative distributions, (A) through (E), of the packages pub, geom, grph, prim, elem, cmp, ed, and sym across two sites, N and S]
(A) cost = 7 (B) cost = 5 (C) cost = 3 (D) cost = 3 (E) cost = 3
Identifying packages and delineating their interdependencies can affect the success of
larger projects. Minimizing the cost of interpackage dependencies should be at the
forefront of every architect's mind throughout the design process. Most important,
avoiding the high cost of cyclic dependencies among packages is essential if the flex-
ibility and maintainability of the system are to be preserved.
Packages present a higher level of abstraction than components. For packages with a
horizontal dependency structure, such as geom (see Section 4.13), we must export
most of the individual component header files in order to make the package function-
ality usable by clients (see Figure 7-16a). Even though placing these physically inde-
pendent components in a single package does not hide any additional details, we can
still benefit from the ability to refer to the aggregate of these components abstractly as
geom, a benefit that should not be underestimated.
(a) Horizontal geom Package (b) Tree-Like p2p Package
In the case of tree-like packages, such as p2p, that sport a small number of insulating
wrapper components, we can gain not only the conceptual abstraction but also a physical
abstraction. It is by not exposing superfluous information in the form of
unnecessarily exported header files, as illustrated in Figure 7-16b, that this physical
form of abstraction is realized.
As with a good component interface, the fewer details we expose in the interface of a
package, the easier it is for the package developer to maintain and tune its implemen-
tation. Minimizing the size of the physical interface to which the client is exposed can
also improve usability. Although the surface area of a horizontal package is inherently
large, this need not be the case for a tree-like package.
Answering "yes" to any of the following questions for a particular component defined
in a given package implies that the header for that component must be exported:
2. Does any other exported component in this package fail to insulate its clients
from this component's definition?
3. Do other packages need access to this component (e.g., to reuse its functionality
independently in their own implementations)?
Consider a package such as p2p that is implemented hierarchically and presents its pub-
lic functionality entirely through the interface of only a small collection (one in this
case) of wrapper components. These wrapper components must be exported to the glo-
bal include directory (see Figure 7-2) in order for external clients to use the package.
However, there may be no need to export the header files of the remaining components.
Notice that we are not proposing to withhold header files here for the purpose of
encapsulating details, but rather as a means of reducing the clutter that clients must
wade through in order to use our package. Whether or not we export the implementa-
tion component header files depends on whether or not they are needed (or useful) for
purposes other than creating the .o files that belong to this package's library.
If a wrapper component is encapsulating but not insulating (see Section 6.4.3), it may
be necessary for the client's compiler to have seen the definition of one or more of its
implementation components in order to compile the wrapper interface. If so, you will
be forced to export implementation headers, your clients will depend on them at com-
pile time, and your flexibility to make changes to them will be impeded.
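The difference can be sketched in a single listing. Everything named here is hypothetical: p2p_Router stands in for an implementation component, and the two wrapper classes are illustrations only. The insulating wrapper is deliberately declared before p2p_Router is defined to show that its interface compiles with nothing more than a forward declaration.

```cpp
// --- Would live in the exported header of an insulating wrapper ---
class p2p_Router;                    // forward declaration only

class p2p_InsulatingWrapper {
    p2p_Router *d_router_p;          // held by pointer: definition not needed here

    p2p_InsulatingWrapper(const p2p_InsulatingWrapper&);             // not implemented
    p2p_InsulatingWrapper& operator=(const p2p_InsulatingWrapper&);  // not implemented

  public:
    p2p_InsulatingWrapper();
    ~p2p_InsulatingWrapper();
    void send();
    int messagesSent() const;
};

// --- Would live in the implementation component's header (p2p_router.h) ---
class p2p_Router {
    int d_hops;

  public:
    p2p_Router() : d_hops(0) {}
    void route() { ++d_hops; }
    int hops() const { return d_hops; }
};

// --- Encapsulating but not insulating wrapper ---
// The embedded data member requires the full definition of p2p_Router, so the
// wrapper's header would have to include p2p_router.h, and that header would
// have to be exported to clients.
class p2p_EmbeddingWrapper {
    p2p_Router d_router;             // embedded by value

  public:
    void send() { d_router.route(); }
    int messagesSent() const { return d_router.hops(); }
};

// --- Would live in the insulating wrapper's .c file ---
// Only here is the definition of p2p_Router required.
p2p_InsulatingWrapper::p2p_InsulatingWrapper() : d_router_p(new p2p_Router) {}
p2p_InsulatingWrapper::~p2p_InsulatingWrapper() { delete d_router_p; }
void p2p_InsulatingWrapper::send() { d_router_p->route(); }
int p2p_InsulatingWrapper::messagesSent() const { return d_router_p->hops(); }
```

Both wrappers hide p2p_Router from programmatic access, but only the pointer-based version spares clients the compile-time dependency, at the price of a dynamic allocation and an extra level of indirection.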
Finally, in the process of implementing our package, we may have accidentally created
one or more implementation components that other developers find useful in imple-
menting their own packages. In that case, we may generously decide to publish the
header files for these components. In doing so we enable reuse, but also enable addi-
tional interpackage coupling. This coupling could potentially have an adverse effect on
our ability to maintain our own package, and could introduce new package-level depen-
dencies that were not authorized by the system architect. Such additional package-level
dependencies would further constrain the levelizability of the entire system.
If a component header is not exported, our clients remain entirely insulated from it.
We may feel free to make any changes to it that we like. Once a header file is
exported, changes we make to its interface potentially affect many others who are
attempting to reuse its functionality. Even if we preserve the functionality, making
any change whatsoever to an exported component's header file will annoyingly force
clients who include this header to recompile. This example illustrates yet another situ-
ation in which reuse may not necessarily be a good thing.
In practice, there are likely to be a few low-level (horizontal) packages that export a
relatively large number of logically related and probably widely used component
headers. Most of the remaining packages would then implement sophisticated func-
tionality that operates on common, low-level types. Ideally these higher-level pack-
ages would export relatively small, high-level interfaces in the form of insulating
wrapper component headers.
In very large systems (involving many hundreds of thousands of lines of C++ code),
even a package is not at a high enough level of abstraction to be useful in discussing
overall system architecture. During the process of top-down design, architects will
identify major portions of the system. Each of these major subsystems will be imple-
mented by a team of developers; each subsystem will consist of a cohesive collection
of packages called a group.
Just as related components were collected into packages, so are related packages col-
lected into groups. An individual package is appropriately owned and maintained pri-
marily by a single developer, but a package group is usually owned by the project
manager (or principal engineer) of the development team that is charged with its
implementation.
The same principles that applied to the composition of individual packages and the
interdependencies among them (such as logical cohesion and avoiding cyclic depen-
dencies) apply to package groups as a whole. Like packages, groups should carry a
well-defined architectural significance that governs what is (and what is not) appro-
priate to belong to that group. For example, if a group is entitled "core functionality,"
we should resist placing packages that are not true to that label within this group.
Consider the large system shown in Figure 7-17. Although this system will consist of
some 40 packages (500,000 lines) when complete, its functionality naturally divides into
five vertically arranged package groups. Each of these groups consists of several pack-
ages. Not only are these packages individually levelizable, but the dependencies among
entire groups as defined above are also acyclic. That is, groups at higher levels contain
packages that depend on packages in groups at lower levels, but never vice versa.
Figure 7-17: A Large-System Architecture
There are good reasons for wanting to merge individual package libraries into a single
large group library. Many of these reasons are analogous to those for merging the .o
files of components into a single package library. Consider the internal, package-level
organization for the core database group shown in Figure 7-18. In this architecture,
there are several packages used in the implementation of the core database functionality.
At the lowest level of the core database group, the dbt package represents a horizontal
collection of types and protocols used throughout the group and by its clients. At the
next level are a set of five independent implementation packages. A single package
dbi provides a collection of wrapper components to present the combined functional-
ity of the implementation packages to clients in higher-level groups.
With the exception of the low-level types and protocols defined in dbt, the entire
functionality of the core database group is accessible through the wrapper components
provided in dbi alone. Because dbi is an encapsulating and insulating package
of wrapper components for dba, dbb, dbc, dbd, and dbe, there is no compelling reason
to provide clients of this group with the headers for components defined within these
implementation packages. Once we have built the dbi package library, exporting
these headers to higher-level groups would serve only to clutter the global include
directory. Note again that exposing these headers is not an issue of encapsulation, but
one of insulation and abstraction.
After building the database group, we will make available to clients of the group only
the subset of headers defined in the dbi and dbt packages. As a convenience to our
clients, we will combine all of our individual package libraries into a single group
library file with the associated prefix db,10 and make that file publicly available.
To clients of our core database, it will now appear as if we had implemented the database
as a single package, db, with two related prefixes: dbi and dbt. There may now
be a temptation to rename both dbt and dbi to the simpler db, but this would be a mistake.
Within the collection of packages that comprise the core database group, we
may be looking at literally hundreds of thousands of lines of code. For some, this
would be considered a large system in its own right. If we change the prefix names of
these components, we give up an important maintenance property of our system: the
prefix identifies the package where the source can be found. Furthermore, we lose our
protection against namespace collisions between these two packages.
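Because a registered prefix is separated from the rest of the name by an underscore, the owning package can be recovered from any global identifier mechanically. The helper below is only a sketch of that idea; the function packagePrefix and the identifiers used to exercise it (such as dbi_Session) are hypothetical.

```cpp
#include <string>

// Returns the package prefix of a prefixed identifier, that is, the characters
// preceding the first underscore, or an empty string if there is no underscore.
std::string packagePrefix(const std::string& identifier)
{
    const std::string::size_type underscore = identifier.find('_');
    return underscore == std::string::npos ? std::string()
                                           : identifier.substr(0, underscore);
}
```

Given an identifier such as dbi_Session, the helper yields dbi, which names the package in which to look for the source. Had both packages been renamed to db, the same lookup could no longer distinguish the wrapper package from the package of types and protocols.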
If our solution to these problems is then to combine these two packages into a single
low-level package, we have given up package levelization and any reasonable ability to
develop and test our system hierarchically. We are back to the problem illustrated in
Figure 7-12. From a purely practical point of view, we must remember not to lose sight
of maintainability in our efforts to please the aesthetics of our clients (or ourselves).
10 Note that this name too must be registered to avoid collisions with other group and package
library names.
Step back for a moment and notice that the protocols are part of the lowest-level pack-
age (dbt), not part of the presentation package (dbi). Escalating wrappers and demot-
ing protocols is a general and effective technique that can help to avoid cyclic
dependencies between the public and private packages within a group.
Low-level package partitions continue to serve many useful purposes, even though
most clients will not be concerned about internal partitioning. For example, during the
development process, it is inevitable that bugs will occur. It may then be useful to link
with versions of individual packages that have been compiled to contain debuggable
symbols. For very large systems, trying to link and debug many packages using the
debuggable versions can produce very large executables and make the entire process
exceedingly slow. The amount of disk space alone needed to hold an executable in
which every component in a group has been compiled with the debug option can pose
a significant development burden. Highly effective, commercially available tools 11
used to detect low-level coding errors at runtime can produce executables literally
three times their normal size that run an order of magnitude slower. Having only two
alternatives-all or none-for linking with such large, special-purpose group libraries
is often not practical.
Fortunately most developers, working either within a package group or directly above
it, will probably have a good idea as to which individual packages within the group
are likely to be the ones causing the problem. These developers will know how to
adjust their link command to pull in only the appropriate special-purpose package
libraries, leaving access to the remaining package libraries unaffected. Providing the
ability to select individual specially built package libraries from within a group helps
to widen the envelope of systems that can be developed with a given set of tools on a
given hardware platform.
The size and structure of package aggregates are not bounded. In the example of Figure
7-17, these groups of packages took the form of a vertically arranged sequence. As we
will see in the next section, this vertical arrangement of groups somewhat simplifies
the internal release process. In a yet-larger system (i.e., in excess of a million lines of
source code), groups might form a tree-like or DAG-like structure (see Figure 7-19),
perhaps to reflect the engineering management structure of the development effort. Of
course, in an actual design, the group dependencies would probably not be as regular
as the one shown in the figure.
[Figure: package groups arranged on Group Levels 1 through 7]
Figure 7-19: Hypothetical Very Large System with DAG-Like Group Dependencies
should consist of packages that are logically cohesive or otherwise make sense as a
single cohesive physical unit. As it does with packages, the defined purpose of a
group should govern its contents; what is not germane should not be part of the group.
Of course, dependencies among groups of packages should form a directed acyclic
graph. Although a package is an appropriate size for being owned and implemented
7.6 The Release Process
Internal releases are an integral part of any large development project. Groups of
packages are the smallest unit of functionality that are normally released. At some
regular predetermined interval, the code for a group of packages (e.g., the core database
group, db, of the previous section) is frozen 12 and the process of building a stable
internal release begins.
12 The liberty to make arbitrary updates to this version of the software is suspended.
be built and tested, linking only with level-1 packages. The process of rebuilding a
system is markedly similar to the way the individual components within a package are
developed and tested, but on a larger scale.
The levelization of package groups has a special significance in the release process.
All groups at each level in the system are collectively called a layer. For systems with
vertically arranged groups (see Figure 7-17), each layer consists of only a single
group. For larger systems with more complex group arrangements (see Figure 7-19), a
given layer may consist of several groups. To ensure consistency across the entire
system, it is important that all groups on which a given group g depends have been
released before code for g is frozen and g is released. For example, group dby in
Figure 7-19 is at level 3. The dby group cannot update its dependencies to the new
version of group geo until the xlate group has also been released. In contrast, group dbz
depends only on group geo and hence need not wait for the xlate group to be
released in order to start the update process.
By definition, all groups on a given level are independent of each other. The release
process for each of these groups can occur independently. Although not all groups at
the next higher level will depend on all groups at the previous level, tracking individ-
ual group dependencies during the release process may be more effort than it is worth.
We can simplify the release process while ensuring the consistency of the entire sys-
tem simply by insisting that all groups on a given level are released before beginning
the release process for groups at the next higher level.
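The layer-by-layer rule above can be sketched in code. The following is only an illustration under stated assumptions: the names GroupDeps, groupLevel, releaseLayers, and sampleSystem are hypothetical, and the dependency of xlate on geo is assumed for the sake of the example rather than taken from Figure 7-19.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of a system: each group maps to the groups on
// which it depends directly.  The dependency graph is assumed to be acyclic.
typedef std::map<std::string, std::vector<std::string> > GroupDeps;

// Return the level of 'group': 1 if it depends on no other group, otherwise
// one more than the highest level among its direct dependencies.
int groupLevel(const GroupDeps& deps, const std::string& group)
{
    int level = 1;
    GroupDeps::const_iterator it = deps.find(group);
    if (it != deps.end()) {
        for (std::size_t i = 0; i < it->second.size(); ++i) {
            level = std::max(level, 1 + groupLevel(deps, it->second[i]));
        }
    }
    return level;
}

// Partition all groups into layers; layers[0] holds level 1, and so on.
// Releasing one whole layer before starting the next guarantees that
// everything a group depends on has been released before it is frozen.
std::vector<std::vector<std::string> > releaseLayers(const GroupDeps& deps)
{
    std::vector<std::vector<std::string> > layers;
    for (GroupDeps::const_iterator it = deps.begin(); it != deps.end(); ++it) {
        const int level = groupLevel(deps, it->first);
        assert(level >= 1);
        if (layers.size() < static_cast<std::size_t>(level)) {
            layers.resize(level);
        }
        layers[level - 1].push_back(it->first);
    }
    return layers;
}

// Sample data loosely modeled on the groups named in the text.
GroupDeps sampleSystem()
{
    GroupDeps deps;
    deps["geo"];                      // no dependencies
    deps["xlate"].push_back("geo");   // assumed for illustration
    deps["dbz"].push_back("geo");
    deps["dby"].push_back("geo");
    deps["dby"].push_back("xlate");
    return deps;
}
```

With this sample data, dbz and xlate fall into the same layer, so neither is released before geo and both are released before dby. The recursive level computation revisits shared dependencies; for a very large system one would likely cache the results.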
When the release process for all groups on this layer is complete, the availability of
the new package groups is announced. Developers working on the next higher layer
continue to use the previous release of the lower-level layer until they reach a conve-
nient stopping point. After rerunning their own regression tests one last time, these
developers may now-at their leisure-adjust their environments to refer to the newer
release of the lower-level software.
At this point the developers may have to make changes to their own code to accom-
modate any interface changes made to lower-level package groups since the last
release-a process sometimes referred to as porting. 13 Obviously, with good planning
13 The term porting applies to moving a software system to a new platform. This new platform can
take the form of new hardware, a new operating system, or a new version of the lower layers of the
system itself.
such changes will be minimized. After a few minor adjustments, developers should be
able to rerun their regression tests to verify that changes to the lower-level software
have not altered the nature of the needed functionality. These developers can now
resume development, using the new stable release of the software. At some point
these clients will in turn freeze their code and go through a similar release process.
Notice how a client of the immediately preceding layer is not forced to respond imme-
diately when a new release is published. Experience has shown that providing some
slack between the release of successive layers is an effective way to manage internal
releases within a large system.
Figure 7-20 shows one way to organize the development hierarchy for the system presented
in Figure 7-18. This development-directory structure supports multiple releases
and the notion of header files shared among packages that are not exported outside the
group. At the root of the directory structure there are the five group directories corre-
sponding to the five groups in the system of Figure 7-17; each group has a subdirectory
structure similar to the one shown here for the core database group, db. Beneath the db
directory are subdirectories holding the past several parallel release structures of this
group; the release illustrated in Figure 7-20 for the db group is release 1.6.3.
Under the group's release directory are four directories and a file. The directories are
dependencies, source, include, and lib, and the file is exported. The dependencies
directory indicates the names and release versions of the other groups on which
this group depends. On a Unix-based system, each of these dependencies may be rep-
resented by a symbolic link that refers back to the specific release of the lower-level
group used to build this group. Providing these references allows the include and link
directories of clients to remain relative as they update a single pointer from the old to
the new release of a group.
The source subdirectory is organized in the same way as it was for the much simpler
package-development structure shown in Figure 7-2. As Figure 7-20 indicates, all
of the source for each package within the group lives under a directory corresponding
to its package prefix, which makes it easy for developers to locate packages defined
within the group. Unfortunately, locating packages defined outside the group now
becomes more difficult. This problem can be addressed by having packages within a
group extend a common group prefix (e.g., dba or dbb) or, less desirably, by identifying
the group location in the global package registry. There is an additional issue
involving "prefix prefixes"-that is, how does anyone know that dbq is not a legal
prefix for some new package not in group db?
[Figure 7-20: development-directory tree for the db group. Legible labels: release directories rel1.6.0, rel1.6.1, rel1.6.3, and current; local headers such as dba_c1.h and dba_cn.h; local libraries libdba.a, libdba_g.a, libdbb.a, libdbb_g.a, libdbt.a, and libdbt_g.a; and, under source, a .h, .c, and .t.c file for each component c1 through cn of a package.]
The include directory is now more complex in order to support the notion of exported
versus local headers for this group. The subdirectory local under include is similar to
the global include area of Figure 7-2, but is accessible only from within the db group.
This local directory contains header files that are necessary to support interpackage
communication within this group. The contents of the file exported, defined directly
under the release for the group, identifies individual components or entire packages
whose headers are to be made available to clients external to this group. During a
release, these headers are copied directly into the include directory for the group. 14
Finally, the lib directory is now also more complex in order to support the notion of a
single group library. Again the subdirectory local under lib is similar to the global
lib directory of Figure 7-2 in that this subdirectory holds all of the various versions of
the individual package library files. Instead of containing library files corresponding to
each package, lib contains a single library file representing their union. Providing just
a single library file makes using the group more convenient for general clients.
As Figure 7-20 shows, more than one version of each individual package library may
be built. The suffix _g is used to indicate that the library has debugging symbols.
Many other special forms of libraries may exist as well, for purposes such as perfor-
mance monitoring or runtime memory-bounds checking. If the group is large, it may
not be practical to use or even build special-purpose libraries for the entire group.
Instead, developers will typically identify the individual packages within the group
that they would like to analyze more carefully.
As each package is built, header files that are to be exported from the package for use
by other packages within this group are placed in the local include directory (e.g.,
system/db/rel1.7.1/include/local/dba_c3.h). At the same time, each version of
the individual package libraries is placed in the local lib directory for this group (e.g.,
system/db/rel1.7.1/lib/local/libdba.a). Once all packages local to this group
have been built, the package libraries are combined into a single library and placed in
the lib directory (e.g., system/db/rel1.7.1/lib/libdb.a). Only those headers that
clients of this group will need in order to use the group are then exported to the
include directory (e.g., system/db/rel1.7.1/include/dbi_c1.h).
The directory current is not published but is reserved for ongoing development.
Although changes to published versions are infrequent and carefully controlled (see
Section 7.6.2), changes to the current (development) version may be expected to
occur frequently.
system/db/rel1.7.1/lib/libdb.a
system/db/rel1.7.1/lib/sun4os4/libdb.a
or
system/db/rel1.7.1/lib/hppaux9/libdb.a
The cost of compiling is partially a function of the number of header files in an include
directory, but is even more dependent on the number of directories the compiler has to
search in order to locate all required header files. On most systems, it is significantly
faster to compile components when all of the header files reside in just a few directo-
ries than if the headers are distributed across many individual (package-level) include
directories.
The experiment was repeated for systems containing 1, 10, 100, 1,000, and 10,000
components on structures with varying numbers of include directories. Figure 7-21
contains the results of running the experiment both with the CFRONT compiler on a SUN
SPARC 10 workstation and also with the native C++ Compiler on an HP7000 workstation.
For reference, compiling an otherwise empty component that depends on only a single
package include directory takes approximately 1 CPU second on the
SUN and 0.2 CPU seconds on the HP. As the system size increases, the cost of compiling
increases modestly on the SUN and only negligibly on the HP. For systems on the
order of 1,000 components, compiling a component using individual package
include directories can take nearly twice the CPU time on the SUN and 4.5 times
the CPU time on the HP. For larger systems, the overhead of using individual package
include directories is even more pronounced: roughly an order of magnitude for the
SUN and nearly so for the HP. 15
15 Note that actual elapsed "wall" time can overwhelm even the CPU time when compiling compo-
nents that depend on a large subsystem. For example, the wall time to compile a component against
a 1,000-component system distributed across 100 individual package include directories was 22.1
seconds on the SUN and 4.8 seconds on the HP. When the system consisted of 10,000 components,
the wall time to compile a single component grew to 225.5 seconds on the SUN and 209.2 seconds
on the HP.
[Figure 7-21: compile time for one component, in CPU seconds and relative to using a single include directory, as a function of subsystem size in number of components and of the number of include directories (1, 10, 100, 1000), for the SUN and the HP. Only the baseline entries are legible: 1.0 CPU seconds (100%) on the SUN and 0.2 CPU seconds (100%) on the HP.]
Reducing the amount of time it takes to recompile and relink can have a significant
impact on productivity. Fortunately, there are a couple of ways we can reduce this
problem for large systems short of buying a faster piece of hardware. The most effec-
tive method is to reduce the number of header files via insulation, as discussed in
Chapter 6. Another method, which will have a lesser (but still significant) impact, is to
reduce the number of include directories that a compiler needs to search during a
given compilation. One such way is to propagate the headers exported from lower-level
groups (identified by file dependencies) into a dependent group's own exported
headers directory, perhaps with additional filtering defined in file exported.
As Figure 7-22 illustrates, not all the headers exported by the base and db layers are
needed by clients of the tlk layer. Instead of having tlk simply publish just its own
headers, tlk could republish the necessary subset of lower-level exported headers in
addition to its own exported headers. In this way we can avoid forcing its clients to
specify the separate include directories for both base and db. Now clients of the tlk
layer need specify only one include directory in order to access the tlk layer functionality.
Here again, it is insulation that enables us to reduce the number of headers
we expose to our clients to improve their rate of compilation.
[Figure 7-22: exported headers in each group's include directory]
system/base/rel1.7.1/include: pub_c1.h, pub_c2.h, usr_c1.h, usr_c2.h, usr_c3.h
system/db/rel1.7.1/include: dbi_c1.h, dbi_c2.h, dbt_c1.h, dbt_c2.h, pub_c1.h, pub_c2.h, usr_c1.h
system/tlk/rel1.7.1/include: tlk1_c1.h, tlk1_c2.h, tlk1_c3.h, tlk2_c1.h, tlk3_c1.h, tlk3_c2.h, dbi_c1.h, dbt_c1.h, pub_c1.h
Another alternative is to make the client group responsible for "prefetching" all of its
required headers into a single include directory before attempting to compile. Requir-
ing the client to create a special-purpose directory to efficiently reuse a subsystem in
effect makes such a subsystem less reusable. This second approach seems less
friendly, since it forces the client to do more work to use the subsystem; however, it
can have its advantages in a hostile environment.
7.6.2 Patches
The simplest, safest, and most common kind of patch involves making changes to
only the .c file of a component. After the .c file is modified and compiled, the resulting
.o file may then (on a Unix system) be placed before a library file in the link command
to supplant an existing .o file. Of course, clients can choose whether or not to
link in these patch files; for some, the fix may not be worth the loss in stability.
A patch must not affect the internal layout of any existing object.
Not every bug can be patched. Fortunately, if the header file for the component is not
exported, the layout of such an object can be known only to the components within
the package. In such cases, the bug can almost always be fixed by providing one or
more patch files to solve the problem. However, even if the header file is exported,
there are a number of bugs that can be patched without having to rebuild the entire
system. The more insulated the implementation of a component, the more likely that
it can be patched without affecting components outside the package.
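To make the layout requirement concrete, here is a minimal sketch. All class names are hypothetical, and in a real patch the class would of course keep its released name; three separate names are used only so that the versions can be compared side by side. Equal size is a necessary condition for layout compatibility, not a sufficient one.

```cpp
#include <cassert>

// Hypothetical class as originally released; clients were compiled against
// this definition.
class my_PointReleased {
    int d_x;
    int d_y;
  public:
    my_PointReleased(int x, int y) : d_x(x), d_y(y) {}
    int x() const { return d_x; }
    int y() const { return d_y; }
};

// Candidate patch A adds only a non-virtual member function; the data
// members, and hence the object layout, are untouched.
class my_PointPatchA {
    int d_x;
    int d_y;
  public:
    my_PointPatchA(int x, int y) : d_x(x), d_y(y) {}
    int x() const { return d_x; }
    int y() const { return d_y; }
    int sum() const { return d_x + d_y; }
};

// Candidate patch B adds a data member; every client that creates, copies,
// or embeds the object would now disagree with the library about its size.
class my_PointPatchB {
    int d_x;
    int d_y;
    int d_cachedSum;
  public:
    my_PointPatchB(int x, int y) : d_x(x), d_y(y), d_cachedSum(x + y) {}
    int x() const { return d_x; }
    int y() const { return d_y; }
    int sum() const { return d_cachedSum; }
};

// A crude necessary (but not sufficient) check: the sizes must agree.
template <class ORIGINAL, class PATCHED>
bool hasSameSize()
{
    return sizeof(ORIGINAL) == sizeof(PATCHED);
}

// Sanity check one might run while preparing a patch.
void checkCandidates()
{
    assert((hasSameSize<my_PointReleased, my_PointPatchA>()));
    assert((!hasSameSize<my_PointReleased, my_PointPatchB>()));
}
```

Candidate B is the kind of change that cannot be delivered as a patch when the header is exported, because already-compiled clients have the old size built into their object code.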
Ideally a patch does not require modifying any header files at all. Modifying information
in an exported header file has the potential to affect an unbounded number of clients;
such changes are therefore best avoided. Although risky, there are a number of
repairs we can make that will not invalidate our release, even though it may mean
altering the existing exported header files. If we can guarantee that the effects of these
local changes are link compatible and do not invalidate the release, we can save the
considerable expense and effort of a second release.
Note that the last four examples require modifying a header file. After such a change,
this header file should be artificially backdated to prevent unnecessary recompilations
by clients. The last two examples are risky because of the possibility of introducing an
ambiguity from function or operator overloading in a header file that has already been
included by some client. Had the last example been introduced in a new and separate
header file, there would be no chance that the construct would affect any existing usage.
The lists presented here are not complete, but should give the idea and flavor of the
kinds of changes that, if made carefully, can be accomplished locally via patches. The
only real requirements are that:
3. We are sure that the system would successfully rebuild if we were to try
to do so.
The purpose of a translation unit defining main (other than a hierarchical test driver) is
to provide a C++ subsystem with a command-line interface, interpret environment
variables, and manage global resources, nothing more. A common mistake is to
place far too much code in a file that defines main. Such code cannot be tested incrementally
from a C++ test driver, nor can it be reused within a larger C++ program.
For example, consider a program designed to perform some sort of desktop publishing
function-say a glossary generator, illustrated in Figure 7-23. The function of a glos-
sary generator is to read an input document and store it as a set of unique words. This
input is filtered against a second input defining a set of blocking words. Blocking words
are common words (such as and, this, a, etc.) that are likely not to be appropriate for a
glossary. Next, the remaining set of words is compared against a third input, a thesaurus,
that in this context represents a mapping of aliases or alternate forms to more common
or basic terms. For example, method is another name for member function in C++.
Finally, all basic terms that are not blocked or aliased must be defined in a fourth
input, a dictionary. A dictionary is a mapping from a set of common terms to their
respective definitions. The outputs of the glossary generator are a list of undefined terms
and the alphabetized subset of the definitions in the dictionary corresponding to
recognized terms.
[Figure: inputs Input Text, Blocking Words, Thesaurus, and Dictionary feed the Glossary-Generator Program, which outputs Unrecognized Terms and a Glossary]
Figure 7-23: Glossary-Generator Program
in the dtp_GlossGen class are provided for these purposes. After the glossary generator
is programmed, we can load the individual words of the input text into the glossary-
generator object using the addTextWord manipulator function. Once we are done loading
all the input text for the document, we will create an iterator to sequence over the
glossary definitions in alphabetical order. A second iterator is provided to allow us
to sequence over any undefined terms. Having completed processing on a first document,
we may wish to pass several related documents through the same generator.
The clearInputWords manipulator allows us to start again with a new document while
retaining the previously programmed blocking words, aliases, and definitions.
// dtp_glossgen.h
#ifndef DTP_INCLUDED_GLOSSGEN
#define DTP_INCLUDED_GLOSSGEN

class dtp_GlossDefIter;
class dtp_GlossUndefTermIter;

class dtp_GlossGen {
    // ...
    friend class dtp_GlossDefIter;
    friend class dtp_GlossUndefTermIter;

  private:
    // NOT IMPLEMENTED
    dtp_GlossGen(const dtp_GlossGen&);
    dtp_GlossGen& operator=(const dtp_GlossGen&);

  public:
    // CREATORS
    dtp_GlossGen();
    ~dtp_GlossGen();

    // MANIPULATORS
    int addBlockingWord(const char *blockingWord);
    int addAlias(const char *alias, const char *keyTerm);
    int addDefinition(const char *keyTerm, const char *definition);
    int addTextWord(const char *textWord);
    void clearInputWords();
};

class dtp_GlossDefIter {
    // ...

  private:
    // NOT IMPLEMENTED
    dtp_GlossDefIter(const dtp_GlossDefIter&);
    dtp_GlossDefIter& operator=(const dtp_GlossDefIter&);

  public:
    // CREATORS
    dtp_GlossDefIter(const dtp_GlossGen& glossaryGenerator);
    ~dtp_GlossDefIter();

    // MANIPULATORS
    void operator++();

    // ACCESSORS
    operator const void *() const;
    const char *keyTerm();        // Provides an association
    const char *definition();     // (keyTerm, definition) so
                                  // we choose not to define an
                                  // operator()() here.
};

class dtp_GlossUndefTermIter {
    // ...

  private:
    // NOT IMPLEMENTED
    dtp_GlossUndefTermIter(const dtp_GlossUndefTermIter&);
    dtp_GlossUndefTermIter& operator=(const dtp_GlossUndefTermIter&);

  public:
    // CREATORS
    dtp_GlossUndefTermIter(const dtp_GlossGen& glossaryGenerator);
    ~dtp_GlossUndefTermIter();

    // MANIPULATORS
    void operator++();

    // ACCESSORS
    operator const void *() const;
    const char *operator()() const;  // Returns just the current undefined
};                                   // term so operator()() is ok here.

#endif
Our main will still need to create a dtp_GlossGen object and then translate input from
(files referenced by) the command line into dtp_GlossGen member function calls in
order to program this object appropriately. However, we may elect to use any number
The job of the interpreter component, illustrated in Figure 7-26, is to attach itself to a
glossary-generator object and then exercise that object accordingly, based on com-
mands found in a specified input file or stream. The interpreter object itself is pro-
grammed with two pieces of information:
// dtp_glossgeninterp.h
#ifndef DTP_GLOSS_GEN_INTERP
#define DTP_GLOSS_GEN_INTERP

class dtp_GlossGen;
class ostream;
class istream;
class dtp_GlossGenInterp_i;

class dtp_GlossGenInterp {
    dtp_GlossGenInterp_i *d_this;

  private:
    // NOT IMPLEMENTED
    dtp_GlossGenInterp(const dtp_GlossGenInterp&);
    dtp_GlossGenInterp& operator=(const dtp_GlossGenInterp&);

  public:
    // CREATORS
    dtp_GlossGenInterp(dtp_GlossGen *glossGen);
        // create an interpreter
    ~dtp_GlossGenInterp();
        // destroy this interpreter

    // MANIPULATORS
    void setErrorStream(ostream& errorStream);
        // Set output stream to which detailed errors will be reported.
        // By default, this stream is cerr.

    // ACCESSORS
    int exercise(const char *fileName = "-") const;
        // Parses commands from the specified input file.  Returns
        // -1 on I/O error, 0 on success and 1 on syntax error.

    // ...
};

#endif
Two accessor functions of the interpreter are provided to exercise the functionality of
the associated glossary-generator object. The first simply takes a file name and opens
it if possible. This function then calls the second (more primitive) form, which takes
an open stream and an optional "file" name to be used in formatting error messages.
The lower-level function is exposed in the interface so that the source of the stream
need not be an actual file. Note that these two member functions do not affect the state
of the interpreter; they affect only the state of the glossary generator.
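The relationship between the two forms can be sketched as follows. This is not the interpreter from the figure: my_LineInterp is a hypothetical stand-in that merely counts input lines, the signature of the stream-based form is an assumption, and standard-library streams are used in place of the forward-declared istream and ostream classes.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the interpreter.  It only counts input lines,
// which keeps the delegation between the two 'exercise' forms easy to see.
class my_LineInterp {
    int *d_lineCount_p;  // state of the attached object, not the interpreter

  public:
    explicit my_LineInterp(int *lineCount) : d_lineCount_p(lineCount)
    {
        assert(lineCount);
    }

    // Primitive form: read from an already-open stream.  The 'name' is
    // used only when formatting error messages.
    int exercise(std::istream& input, const char *name) const
    {
        std::string line;
        while (std::getline(input, line)) {
            ++*d_lineCount_p;
        }
        if (input.bad()) {
            std::cerr << name << ": read error" << std::endl;
            return -1;
        }
        return 0;
    }

    // Convenience form: "-" means standard input; otherwise open the file
    // and forward to the primitive form.
    int exercise(const char *fileName = "-") const
    {
        if (0 == std::strcmp(fileName, "-")) {
            return exercise(std::cin, "<stdin>");
        }
        std::ifstream file(fileName);
        if (!file) {
            std::cerr << fileName << ": cannot open" << std::endl;
            return -1;
        }
        return exercise(file, fileName);
    }
};

// Because the primitive form accepts any stream, commands held in memory
// can be interpreted without creating a file.
int exerciseText(const my_LineInterp& interp, const std::string& text)
{
    std::istringstream input(text);
    return interp.exercise(input, "<text>");
}
```

Both functions are const because, as in the text, they change the attached object and leave the interpreter itself untouched. Exposing the stream-based form also makes the interpreter straightforward to drive from a test driver.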
Finally, all that is left to do in main is to create these two objects and sequence
through a set of command-line arguments. If no command-line arguments are specified,
cin should be assumed by default. A tiny standalone main driver for the glossary
generator program is shown in Figure 7-27. This driver illustrates a reusable pattern,
suitable to a variety of standalone applications.
// dtp_glossgeninterp.t.c
//
// Usage: a.out [ <file name> | - ]*
//
// Example:
//
//    john@john: a.out stuff.abc such.def -
//
// The above command line will first read input from the file
// "stuff.abc", then read input from the file "such.def", and
// finally read from standard input (cin).

#include "dtp_glossgeninterp.h"
#include "dtp_glossgen.h"

const char *const defaultArgs[] = { "", "-" };  // has internal linkage
const int defaultNumArgs = sizeof defaultArgs / sizeof *defaultArgs;

int main(int argc, char *argv[])
{
    int status = 0;
    const char *progName = argv[0];
    int numArgs = argc > 1 ? argc : defaultNumArgs;
    const char *const *args = argc > 1 ? argv : defaultArgs;

    dtp_GlossGen glossaryGenerator;
    dtp_GlossGenInterp interpreter(&glossaryGenerator);

    for (int i = 1; i < numArgs && 0 == status; ++i) {
        status = interpreter.exercise(args[i]);
    }

    return status;
}
Figure 7-27: A Standalone Main Driver for Glossary Generator and Interpreter
Ownership of main comes with both privilege and responsibility. There is only one
main in a given program. It is this piece of code that should be responsible for reading
environment variables and establishing global resources. The person who owns main
owns the global name space. For example, there is no harm if the file containing main
defines or accesses external global variables, fails to use package prefixes, and so
forth. To ensure our ability to integrate arbitrary subsystems, however, no other part of
the system should pollute the global name space or attempt to usurp a global resource.
Guideline:
In general, avoid granting one component license that, if also taken
by other components, would adversely impact the system as a whole.
This (Kant-like) philosophy implores that unless we define main, we should not
attempt to do something that, if others did it also, would have a negative consequence
for the overall system.
Excessive use of inline functions is just one example of the kind of behavior that can
lead to subtle integration problems down the road. By cavalierly declaring inappropri-
ately large member functions inline, we can often improve the runtime performance
of our own object in isolation or within a small subsystem. However, this runtime
improvement is obtained at the cost of repeated code and increased executable size.
When such selfishly architected subsystems are integrated into larger subsystems, the
increased code size begins to show its adverse effect. Hardware mechanisms designed
to improve the performance of commonly used routines are defeated by the excessive
repetition of inline code. The increased program size reduces the percentage of the
executable that the operating system can keep in core, which leads to increased swapping.
At some level of integration, many of these objects will actually begin to run
more slowly (as a result of the excessive inlines) than they would have run had some
of the larger functions been declared non-inline. The end result of this selfishness is
a net decrease in overall system performance.
Only the .c file that defines main is authorized to redefine global new
and delete.
An important special case of this philosophy is that only the owner of main can be
authorized to redefine the global operators new and delete. Components that do not
define main are proscribed from such unilateral behaviors. Otherwise two independent
subsystems, each redefining a unique resource (such as global operator new),
would not be link compatible.
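As an illustration of what the owner of main might do, the sketch below replaces the global allocation functions in order to count allocations. It is written in present-day C++ (hence noexcept and the sized form of delete), and the names s_numAllocations, numAllocations, and checkReplacementIsActive are hypothetical. These definitions are assumed to live in the one .c file that also defines main; main itself is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Number of calls made to the replaced global operator new.
static std::size_t s_numAllocations = 0;

// Replacement for the global allocation function.  A program may contain
// only one such definition, which is why two subsystems that each supplied
// their own would fail to link together.
void *operator new(std::size_t size)
{
    ++s_numAllocations;
    if (void *p = std::malloc(size ? size : 1)) {
        return p;
    }
    throw std::bad_alloc();
}

// Matching replacements for the global deallocation functions.
void operator delete(void *p) noexcept
{
    std::free(p);
}

void operator delete(void *p, std::size_t) noexcept
{
    std::free(p);
}

// Accessor for the counter.
std::size_t numAllocations()
{
    return s_numAllocations;
}

// Self-check: a direct call to the global allocation function must go
// through the replacement above.
void checkReplacementIsActive()
{
    const std::size_t before = s_numAllocations;
    void *p = ::operator new(16);
    assert(p);
    assert(s_numAllocations == before + 1);
    ::operator delete(p);
}
```

If an ordinary component shipped definitions like these, any client that also needed to instrument or replace allocation would be locked out, which is exactly the unilateral behavior the principle forbids.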
To summarize: there is no top when designing a large system. The purpose of main is
only to provide a C++ subsystem with an interface to the command line, interpret environment
variables, and manage global resources, nothing more. Factoring functionality
provided by main into separate components facilitates hierarchical testing and
enables easier integration into yet larger systems. The .c file that defines main owns the
global name space and is exempt from certain design rules that pertain to ordinary components.
For components that do not define main, care should be taken not to take liberties
that, if also taken by other components, could compromise the system as a whole.
7.8 Start-Up
The elapsed time between when a program is first invoked and when the thread of
control enters main is referred to in this book as start-up. It is during this time that
potentially all non-local static objects in every translation unit are constructed, as
illustrated in Figure 7-28. 17
17 According to the C++ language specification (ellis, Section 3.4, p. 19), all non-local static objects
within a translation unit must be constructed prior to the first use of any function or object defined
within that translation unit; in practice, however, all such initializations can and commonly do occur
at start-up.
// my_component.c
#include "my_component.h"   // defines class my_Class
#include "pub_list.h"       // defines class pub_List
#include <sys/types.h>      // declares typedef time_t
#include <sys/time.h>       // declares ::time()
Since the order of initialization between non-local static objects defined in separate
translation units is implementation dependent, special care must be taken to ensure
that such static objects are initialized before they are used. When the intent is to pro-
vide a single instance of a globally accessible object, our stated aversion to global
data (Section 2.2) leads us to look for an alternative. Instead of creating an instance of
an object at file scope with external linkage, we can usually achieve our purpose with
a logical construct commonly referred to as a module and implemented in C++ as a
class containing only static members. 18
18 A module can also refer to a physical entity that is similar to a component, but that has a procedural
interface. Note that, in ANSI C, the only way to implement a logical module is as a physical module
(i.e., as a separate translation unit defining static data at file scope). For more about modules, see
stroustrup, Section 1.2.2, p. 16.
Guideline
Prefer modules to non-local static instances of objects, especially when:
1. Direct access to the construct is needed outside a translation
unit.
2. The construct may not be needed during start-up or immedi-
ately thereafter and the time to initialize the construct itself is
significant.
The need to ensure the proper initialization of static constructs before they are used is
well documented. 19 What is less commonly appreciated is the magnitude of the com-
bined impact such initializations can have on start-up time. For small programs, ini-
tializing a few static constructs at start-up would probably have no noticeable impact
on a user's perception of the time needed to invoke the program. However, the larger a
system is, the more opportunity there is for independent static constructs to require
initialization during start-up.
Since every static object defined at file scope or within class scope is potentially constructed
before main is entered, a very large system whose components regularly
define such static objects could take an unacceptably long time to bring up. In fact,
there are documented cases of very large (supposedly interactive) systems where
naively ignoring the cost of initialization at start-up has resulted in invocation times in
excess of 10 minutes!
Non-local static objects are initialized and destroyed automatically by the C++ runtime
system; their indiscriminate use by individual components is a form of egocentric
behavior that degrades the invocation performance of integrated systems. Although
there is nothing we can do to stop these static instances from being initialized at start-
up, there is considerable flexibility about how and when modules are initialized. Fortu-
nately, it is always possible to transform a single global instance of an object into a
module that, when initialized, dynamically allocates that object. 20 Once initialized, the
module can successfully return a reference to the dynamic object it now holds.
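A minimal sketch of this transformation follows, assuming the simplest possible policy of checking on every access. The names my_NameTable and my_NameTableModule are hypothetical, the sketch is not thread-safe, and the dynamically allocated object is deliberately never destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical object of which we want a single, globally accessible
// instance.
class my_NameTable {
    std::vector<std::string> d_names;
  public:
    void add(const std::string& name) { d_names.push_back(name); }
    std::size_t length() const { return d_names.size(); }
};

// The module: a class containing only static members.  Its sole data
// member is a pointer, which needs no runtime initialization at start-up.
class my_NameTableModule {
    static my_NameTable *s_table_p;

    my_NameTableModule();  // NOT IMPLEMENTED: a module is never instantiated

  public:
    static my_NameTable& table()
        // Return a reference to the shared table, allocating it on
        // first use.
    {
        if (0 == s_table_p) {
            s_table_p = new my_NameTable;
        }
        assert(s_table_p);
        return *s_table_p;
    }

    static bool isInitialized()
        // Return true if the shared table has been allocated.
    {
        return 0 != s_table_p;
    }
};

// This definition would normally live in the component's .c file.
my_NameTable *my_NameTableModule::s_table_p = 0;
```

Nothing is constructed before main is entered; the cost of building the table is paid only by programs that actually call table(), and only at the point of first use.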
There are at least four different techniques that can be used to ensure that a module is
initialized before it is used:
• Wake-up initialized
• Explicit init function
• Nifty counter
• Check every time
Each of these initialization strategies has its own advantages and disadvantages; the
best choice will depend on several factors:
By far the best way to initialize a module is to try to have the module "wake up" in an
initialized state. For example, using this wake-up approach, a global registry module
might be implemented as a list of record links, as shown in Figure 7-29.
// ax_registry.h
#ifndef INCLUDED_AX_REGISTRY
#define INCLUDED_AX_REGISTRY

class ax_RecordLink;
class ax_Record;

class ax_Registry {
    static ax_RecordLink *d_list_p;
  public:
    static void addRecord(ax_Record *record);
        // Add record to registry; registry now owns the record.
    // ...
};

#endif

// ax_registry.c
#include "ax_registry.h"

ax_RecordLink *ax_Registry::d_list_p = 0;
// ...
As long as all the static data members are fundamental types (pointers,21 integers,
doubles, arrays of characters, etc.), they will be initialized at load time (i.e., prior to
start-up) without affecting invocation time. Had we instead embedded a pub_List
object (i.e., not just a pointer) as a static member of class ax_Registry, then that
member would get initialized automatically (during start-up), incurring a runtime cost.
Not all modules can wake up initialized. More generally, some components may
define modules or contain static constructs that must be initialized at runtime before
they can be used. One way to enable this initialization is to provide each such component
with an init function, as illustrated in Figure 7-30. This init function must be called (at
least once) before the static constructs provided by the component can be used. The
init-function approach is quite flexible in that the initialization can be deferred until
well after the start-up phase and invoked only if and when the component is actually
needed.
21 A non-local static pointer to a user-defined type can be initialized at load time; in particular,
initialization to 0 is common.
// ax_table.h
#ifndef INCLUDED_AX_TABLE
#define INCLUDED_AX_TABLE

class ax_RecordLink;
class ax_Record;

class ax_Table {
    static ax_RecordLink **d_array_p;
    static int d_size;
  public:
    static void init(int size);
    static void cleanup();
    static int addRecord(const ax_Record& record);
    // ...
};

#endif

// ax_table.c
#include "ax_table.h"
#include "pub_list.h"
#include <memory.h>   // declare memset
// ...
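The listing above elides the implementation. One way the init-function approach might be fleshed out is sketched below, under several assumptions of our own: the record type is reduced to a hypothetical ax_Record holding an id, the table stores plain record pointers instead of ax_RecordLink objects, the array is value-initialized instead of cleared with memset, and addRecord reports an error code when the table is uninitialized (a real component could just as easily crash).

```cpp
class ax_Record {                       // hypothetical minimal record
    int d_id;
  public:
    explicit ax_Record(int id) : d_id(id) {}
    int id() const { return d_id; }
};

class ax_Table {
    static const ax_Record **d_array_p;   // 0 until init is called
    static int d_size;
    static int d_length;
  public:
    static void init(int size);
    static void cleanup();
    static int addRecord(const ax_Record& record);
        // Return 0 on success and non-zero if uninitialized or full.
    static int length() { return d_length; }
};

const ax_Record **ax_Table::d_array_p = 0;
int ax_Table::d_size = 0;
int ax_Table::d_length = 0;

void ax_Table::init(int size)
{
    if (d_array_p) {
        return;                           // already initialized
    }
    d_array_p = new const ax_Record *[size]();   // all entries start at 0
    d_size = size;
    d_length = 0;
}

void ax_Table::cleanup()
{
    delete [] d_array_p;
    d_array_p = 0;
    d_size = 0;
    d_length = 0;
}

int ax_Table::addRecord(const ax_Record& record)
{
    if (!d_array_p || d_length >= d_size) {
        return -1;                        // client forgot init, or table is full
    }
    d_array_p[d_length++] = &record;
    return 0;
}
```

Note that every client must remember to call ax_Table::init before the first addRecord; nothing in the interface enforces that ordering.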
Although flexible, the explicit-init-function approach is quite error prone; clients commonly
forget to initialize a component before using it, often resulting in a fatal runtime
error. To mitigate this problem, we might provide a distinguished component at the
package level (e.g., ax_package) with an init function that initializes any component
requiring runtime initialization defined within this package. At the same time it could
also call the init functions for all other packages upon which this package depends.
The package-level init-function approach has some serious drawbacks. First, there is
the obvious maintenance burden of ensuring that the init function of every contained
component and of every package upon which these components depend gets called by
the package-level init function. Much more problematic is that initializing the entire
package can dramatically increase coupling, potentially drawing in many components
at link time that are not otherwise needed. It is for this latter reason that the use of
package-level init functions is best avoided, especially for a generally reusable
package with a horizontal dependency structure. Instead, it is preferable for components
that depend directly on other components requiring explicit initialization to initialize
such components individually. The client component may in turn supply an init
function for use by its own direct clients, or instead may incorporate some other
initialization technique. Maintaining the initialization graph at a fine level of granularity
helps to keep the CCD of a system to a minimum.
7.8.1.3 The Nifty Counter Technique
When static objects use other static objects, the initialization problem becomes more
complex. For the sake of illustration, suppose that the global pub_List object of Figure
7-30 itself makes use of a static construct that also requires runtime initialization (e.g.,
for class-specific memory management, as discussed in Section 10.3.4). Trying to create
a pub_List as a static object at start-up before the pub_List's static memory management
has been initialized could easily cause a fatal runtime error. Since the relative order
of these two initializations is implementation dependent, special precautions must be
taken.
// pub_list.h
#ifndef INCLUDED_PUB_LIST
#define INCLUDED_PUB_LIST
// ...

class pub_List {
    // ...
};

static struct pub_ListInit {
    pub_ListInit();
    ~pub_ListInit();
} pub_listInit;

#endif

// pub_list.c
#include "pub_list.h"
// ...

static int s_niftyCounter = 0;

pub_ListInit::pub_ListInit()
{
    if (0 == s_niftyCounter++) {
        // init pub_list's static constructs
    }
}

pub_ListInit::~pub_ListInit()
{
    if (0 == --s_niftyCounter) {
        // clean-up pub_list's static constructs
    }
}
Instead of the error-prone init-function approach, we might consider using the nifty-counter
approach.22 In this approach, a dummy static instance of an initialization class
is placed in the header file of a component at file scope, as shown in Figure 7-31. Part
of the purpose of this static instance is to count the number of other components that
include this component's header. Each static instance of this dummy object included
by a translation unit will be constructed during start-up (in some order). The first time
a static instance of the dummy object is constructed, the static count is increased from
0 to 1, and the dummy object knows to initialize its component.23 Each subsequent
time a dummy instance is constructed, the only effect is to increment the static count.
At program exit, the process is reversed; the destructor for each dummy object decrements
the static count. When this count reaches 0, the dummy object knows it is OK
to clean up the component. iostream uses the nifty-counter technique to ensure that
cin, cout, cerr, and clog are initialized before they are used.
The beauty of the nifty-counter approach is that it is foolproof. It is not possible to use
a component requiring runtime initialization without first including its header. Doing
so causes a dummy object to get constructed, which in turn forces an uninitialized
component to become initialized. All this happens before the translation unit that
included the component's header can make use of the newly supplied declarations to
access the component. Thus a class that employs the nifty-counter method of initial-
ization may safely be instantiated statically, even if the class itself uses other non-
local static objects that also employ this technique.
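The counting behavior can be observed in a single-file sketch. Here the header and implementation portions are shown together, and additional including translation units are imitated simply by constructing more pub_ListInit objects. The query functions isInitialized and initCount, and the flags behind them, are assumptions added only so the effect can be checked; they are not part of the technique itself.

```cpp
// ---- would live in pub_list.h ----
class pub_List {
  public:
    static bool isInitialized();   // hypothetical queries for this sketch
    static int  initCount();
};

static struct pub_ListInit {       // one dummy instance per translation unit
    pub_ListInit();
    ~pub_ListInit();
} pub_listInit;

// ---- would live in pub_list.c ----
static int  s_niftyCounter = 0;    // fundamental types: set at load time
static bool s_initialized  = false;
static int  s_initCount    = 0;

pub_ListInit::pub_ListInit()
{
    if (0 == s_niftyCounter++) {   // first dummy object does the real work
        s_initialized = true;
        ++s_initCount;
    }
}

pub_ListInit::~pub_ListInit()
{
    if (0 == --s_niftyCounter) {   // last dummy object cleans up
        s_initialized = false;
    }
}

bool pub_List::isInitialized() { return s_initialized; }
int  pub_List::initCount()     { return s_initCount; }
```

Because the counter and flags are fundamental types with constant initializers, they already hold their initial values before any dummy object is constructed, regardless of the order in which translation units are initialized.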
Another benefit of using the nifty-counter approach is that only those components in a
package that are actually needed in order to link are initialized. The runtime cost of
the nifty-counter initialization mechanism itself is negligible except for pathological
designs containing N components depending directly on M modules, where both N
and M are large. Normally this overhead is not large when compared with the con-
struction of the first static object that does the real work of initializing the component.
The major disadvantage of using nifty counters is that even components that only
might be used at runtime are initialized at start-up anyway. For dynamic libraries that
are loaded into a running program on demand, a non-local static initialization often
requires dragging these libraries in at start-up, which defeats the purpose of demand
loading. If the amount of work done during the initialization itself is large (e.g., load-
ing a multi-dimensional table), it would be wise to consider using another technique
that allows us to defer this initialization until later in the execution of the program.
Non-local static objects are commonly used to load a collection of independent concrete
types into a global registry at start-up. However, linking to some library implementations
(such as archive files on a Unix system) will not incorporate a translation
unit's .o file unless there is an explicit reference to an external symbol that is resolved
by this .o file.
Consider the system illustrated in Figure 7-32. An ax_Registry (see Figure 7-29) is a
module that acts as a global repository for various kinds of concrete records (e.g.,
my_Record) derived from the protocol class ax_Record. Since it is expected that there
will be many different record subtypes, a special helper class, ax_Registrar, is available
to aid in the automatic addition of concrete record types into the global registry at
start-up. Component ax_registrar is presented in Figure 7-33.
// ax_registrar.h
#ifndef INCLUDED_AX_REGISTRAR
#define INCLUDED_AX_REGISTRAR

class ax_Record;

struct ax_Registrar {
    ax_Registrar(ax_Record *(*)());
    ~ax_Registrar();
};

#endif

// ax_registrar.c
#include "ax_registrar.h"
#include "ax_registry.h"

static int s_niftyCounter = 0;

ax_Registrar::ax_Registrar(ax_Record *(*cfp)())
{
    ++s_niftyCounter;
    ax_Registry::addRecord((*cfp)());
}

ax_Registrar::~ax_Registrar()
{
    if (--s_niftyCounter <= 0) {
        ax_Registry::cleanup();
    }
}

// my_record.h
#ifndef INCLUDED_MY_RECORD
#define INCLUDED_MY_RECORD

#ifndef INCLUDED_AX_RECORD
#include "ax_record.h"
#endif
// ...
If concrete record objects reside in such a library archive, there must be some explicit
link-time dependency in order to draw them in. One solution is to provide an empty
non-inline init function to be called by main. However, we can avoid the dependency
of derived-record objects on the registry by escalating the registration process to a
higher level (e.g., main). In so doing we both improve flexibility and reduce the CCD.
The modified architecture using explicit initialization is shown in Figure 7-35.
7.8.1.4 The Check-Every-Time Technique
The larger the program, the less likely it is that we will use all of the functionality it
provides. Infrequently used subsystems may still require significant work to initialize
at runtime. As with insulation, if each function call of a component already performs
a non-trivial task, adding a small amount of additional runtime overhead on each call
will probably not noticeably affect runtime performance.
// ax_ledger.h
#ifndef INCLUDED_AX_LEDGER
#define INCLUDED_AX_LEDGER

class ax_Record;

class ax_Ledger {
    // ...
  public:
    static int addRecord(const ax_Record& record);
    static void cleanup();
    // ...
};

#endif

// ax_ledger.c
#include "ax_ledger.h"
// ...

static int s_initFlag = 0;

void ax_Ledger::cleanup()
{
    // clean-up component's static constructs
    s_initFlag = 0;
}
// ...
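The elided portion of such a component is where the check itself lives. A minimal sketch follows, assuming a hypothetical ax_Record that carries only an id, a private init function, a fixed capacity, and an extra numRecords accessor; none of these details come from the listing above. Each public function tests s_initFlag before doing any work, so clients never initialize anything explicitly.

```cpp
class ax_Record {                     // hypothetical minimal record
    int d_id;
  public:
    explicit ax_Record(int id) : d_id(id) {}
    int id() const { return d_id; }
};

class ax_Ledger {
    static void init();               // private: invoked on demand only
  public:
    static int addRecord(const ax_Record& record);
        // Return 0 on success and non-zero if the ledger is full.
    static int numRecords();
    static void cleanup();
};

enum { LEDGER_CAPACITY = 8 };         // arbitrary size for this sketch

static int  s_initFlag = 0;           // fundamental types: set at load time
static int *s_ids_p    = 0;
static int  s_length   = 0;

void ax_Ledger::init()
{
    s_ids_p = new int[LEDGER_CAPACITY];
    s_length = 0;
    s_initFlag = 1;
}

int ax_Ledger::addRecord(const ax_Record& record)
{
    if (!s_initFlag) init();          // check every time
    if (s_length >= LEDGER_CAPACITY) return -1;
    s_ids_p[s_length++] = record.id();
    return 0;
}

int ax_Ledger::numRecords()
{
    if (!s_initFlag) init();          // check every time
    return s_length;
}

void ax_Ledger::cleanup()
{
    delete [] s_ids_p;
    s_ids_p = 0;
    s_length = 0;
    s_initFlag = 0;                   // next access re-initializes
}
```

The cost is one integer test per call, which is why the approach suits components whose functions already do substantial work.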
7.8.2 Clean-Up
Often just exiting the program will accomplish what our general users want; however,
as responsible developers, we must always consider the testability of our designs.
There are several ways of verifying that our code does not "leak" memory; however,
holding onto memory indefinitely is sometimes hard to distinguish from an actual
leak, especially in regression tests. Constructs, such as multiple inheritance, that
cause dynamically allocated memory to be managed by a pointer to anywhere other
than the beginning of the allocated block make it difficult even for sophisticated tools
to distinguish legitimate use from leaked memory.
One mixed blessing of the nifty-counter approach is that the destructor of the dummy
object can be used to initiate the clean-up of a static construct automatically. This is
good news for quality assurance, but it can present a burden for users who would prefer,
for performance reasons, simply to exit. Fortunately we can always supply a
"switch" in order to program whether clean-up is actually to occur at program exit.
The benefit of providing this extra clean-up capability is an extra measure of quality;
the only real cost is that of additional development time and of a small amount of
extra complexity in the interface.
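One possible shape for such a switch is sketched here. The module my_Pool, its setCleanupAtExit function, and the exitHook function (standing in for whatever the dummy object's destructor would call at program exit) are all hypothetical names chosen for illustration, and defaulting the switch to "on" is likewise only an assumption.

```cpp
class my_Pool {                       // hypothetical module holding dynamic memory
    static char *d_buffer_p;
    static bool  d_cleanupAtExit;
  public:
    static void init(int numBytes);
    static void cleanup();
    static void setCleanupAtExit(bool flag) { d_cleanupAtExit = flag; }
    static bool isAllocated() { return d_buffer_p != 0; }
    static void exitHook();
        // What a nifty-counter destructor would call when its count hits 0.
};

char *my_Pool::d_buffer_p = 0;
bool  my_Pool::d_cleanupAtExit = true;    // assumed default: clean up

void my_Pool::init(int numBytes)
{
    if (!d_buffer_p) {
        d_buffer_p = new char[numBytes];
    }
}

void my_Pool::cleanup()
{
    delete [] d_buffer_p;
    d_buffer_p = 0;
}

void my_Pool::exitHook()
{
    if (d_cleanupAtExit) {            // the "switch"
        cleanup();
    }
}
```

A regression test would leave the switch on so that a leak detector sees no outstanding memory; a production run that wants the fastest possible exit could turn it off.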
7.8.3 Review
To summarize this entire section: initializing modules and non-local static objects at
start-up can make the time to invoke a large program unacceptably long. Although we
cannot affect the point in the program at which these static instances are initialized, it
is always possible to transform a single global instance of an object into a module. An
effective way to ensure initialization without runtime cost is to design the module or
component to wake up initialized by having only fundamental static data members
(which are initialized at load time). Another approach to reducing invocation time is
to defer initialization until it is actually needed. This deferred initialization can be
accomplished using individual init functions or with initialization checks built into
every access. The init-function approach is the most flexible and also the most error
prone, but it may be necessary when the individual access functions are lightweight
and called frequently. Explicit initialization is also required when attempting to link
in self-initializing components stored in a Unix-style library upon which there is no
explicit link-time dependency. The check-every-time approach is foolproof for clients
and especially appropriate when the work done in each function call is already sub-
stantial. Finally, if we know we are likely to need a component initialized immedi-
ately upon invocation and its functions are lightweight and called frequently, the
nifty-counter approach may be the best choice after all. In all cases, providing a mech-
anism to free any dynamic memory held by static constructs (before the program
exits) will facilitate regression testing for memory leaks.
7.9 Summary
consists of an acyclic hierarchy of cooperating components. The file names for each
component within a package and each global construct defined within that component
should begin with the registered prefix allocated to that package. The dominant pur-
pose of this prefix is to identify in which package the definition of a given component
or class can be found. Consistent use of package prefixes partitions the global name
space, which avoids name conflicts during package integration.
Dependencies among packages are defined by the envelope of the individual depen-
dencies between the components that comprise the packages. For reasons relating to
development, marketing, usability, production, and reliability, it is required that the
aggregate dependencies among packages are acyclic. Packages with acyclic depen-
dencies form a levelizable hierarchy that is completely analogous to component level-
ization. Most of the techniques discussed in Chapter 5 for reducing the coupling
between individual components apply to packages as a whole. In particular, escala-
tion, demotion, and factoring are commonly used to reduce the development costs
associated with interpackage dependencies.
Insulation at the package level includes reducing the number (and size) of header files
that must be exported for clients to use the package. Insulating clients of a package
from a particular component contained within the package requires that the compo-
nent itself is not used directly by external clients of the package as a whole, all
exported components that use this component insulate its definition from external cli-
ents, and the individual component is not independently reused by other packages.
Whenever we insulate our clients from the underlying complexities of a subsystem,
we are likely to have improved both its usability and maintainability.
Internal releases are an integral part of any large development project. A directory
structure capable of supporting versioned releases was presented in Section 7.6.1.
Very large systems can be partitioned into horizontal bands of package groups called
layers. A layer corresponds to all groups on a given level. A levelizable system can be
released in stages, starting at the bottom layer (group level 1) and progressing to
higher-level groups. To improve insulation, abstraction, and compile-time perfor-
mance for our clients, we may choose to export only a subset of all headers needed to
compile a given package, group, or layer.
For a large software system written in C++, there is usually no "top": no single program
that defines the system. The purpose of main is only to provide a command-line
interface, interpret environment variables, and manage global resources.
Factoring the underlying functionality provided by main into separately testable and
reusable components facilitates integration into yet larger subsystems. Only the .c
file defining main can take unilateral global actions; components that do not define main
should avoid egocentric behavior that might compromise the integration process down
the road.
Start-up is defined as the time from the moment a program is invoked until the thread
of control enters main. It is during this period that all non-local static objects defined
throughout the entire program are constructed. Naively ignoring the cost of such
initialization can result in unacceptably long invocation times. A module can be
implemented in C++ as a class containing only static members, and is preferable to a
non-local static instance, especially when the cost of initialization is high and the need for
the object is not immediate.
2. Explicit init function: The init function for a component must be called
explicitly before the component can be used.
4. Check every time: Initialization occurs on demand (i.e., the first time any
function in the component is called).
Effective regression testing for memory leaks dictates that we provide a way to free
dynamic memory associated with static constructs--even if this feature is not
required by the application itself.
PART III:
LOGICAL DESIGN ISSUES
Until now, the focus has been primarily on concepts that pertain to physical design
(e.g., components, levelization, insulation, and packages). Although good physical
design is critical to the success of larger projects, fundamental logical design issues
should be addressed by any project team early in the development process.
Logical design is a more mature and well-understood discipline than physical design.
Consequently, the presentation in this part takes on a different flavor. Where possible,
other readily accessible books are cited to help minimize redundancy. Part III of this
book is a terse "reference manual" on the effective logical design of components.
In this final part of the book, we limit ourselves to the design and implementation of
individual components. C++ provides an almost overwhelming logical design space.
This extra freedom can make finding an optimal design more complicated than is warranted
by the functionality implemented by the component. Our goal is therefore to
simplify the interface of each component and eliminate redundant degrees of freedom
that unnecessarily complicate the logical design space.
In Chapter 9 we focus our attention on the abundant issues that confront the compo-
nent-interface author as individual behaviors are cast into the syntax of C++ operators
and member functions. Whether to implement a particular behavior as a member or
free operator, whether to make it virtual, how to pass in a particular argument, and how
to return a value are just some of the 14 separate issues addressed. The consequences
of using the various flavors of integers (e.g., short, unsigned, long) in the interface
are also presented. We then take a close look at the issues surrounding special-case
functionality such as conversion operators, compiler-generated behaviors, and-in
particular-the destructor.
In Chapter 10 we tour some of the issues that face implementors of objects in a
large-system environment, with one eye toward performance and the other toward
reliability. Highlights include the selection and ordering of individual member data and the
effective implementation of individual functions. A large part of this chapter is
devoted to a quantitative analysis of the efficient customized management of an
object's memory. We see that object-specific memory management can be more effi-
cient than the conventional class-specific techniques, while avoiding the potential
problem of soaking up memory in long-running programs. Finally, we explore the
pitfalls of memory management in the context of generic, template-based container
classes and then briefly contrast the applicability of templates with design patterns.
Architecting a Component
An individual object is usually too small to capture a complete concept. For an object to
be effective it may require free operators, or even entire friend classes, in order to cap-
ture the essential behavior of an abstraction. An abstraction is an abstract specification
of objects and functions that cooperate to serve some useful purpose. A component is a
concrete representation of that specification. A component is therefore also the funda-
mental building block of logical design.
Encapsulation, like insulation, can be a matter of degree. The costs associated with
complete encapsulation can often be prohibitively expensive. Sometimes we can
attain considerable performance gains without any real loss in flexibility by settling
for almost complete encapsulation. How and when to make this trade-off requires
careful deliberation.
A component will occasionally need to define and use in its implementation auxiliary
objects that are not intended for direct use by clients. C++ provides several techniques
for implementing such classes, each with advantages and disadvantages. There are
sound reasons for choosing exactly one of these approaches in most cases. Establish-
ing the selection criteria is all that is needed. In this chapter we consider several high-
level aspects of component interface design. We discuss the type and amount of func-
tionality that is appropriate for the component as a whole as well as for the individual
objects it contains. We characterize the costs associated with complete encapsulation,
and present ways to reduce that cost. Finally, we survey the many ways to implement
auxiliary objects within a component, and provide a rationale for making an imple-
mentation choice based on the properties emphasized by the particular usage model.
The component level is also the appropriate level for detailed logical interface design.
When you, as a user, take advantage of a component implementing, say, a list abstraction
(see Figure 6-19) you are probably using more than the functionality provided in
the List class itself. For example, writing a simple output statement such as
involves the use of a free operator (i.e., operator<<) that is not part of the logical
interface of any class. The ListIter class provides functionality that is an intrinsic
part of the list abstraction, yet this functionality is not supplied by the interface of
class List directly.
In other words, a component is the realization of not just a type, but of a self-consistent
microcosm of functionality that, taken as a whole, comprises what we call an abstrac-
tion. It is the entire abstraction, not just a single ADT, that defines a useful logical par-
tition of the functionality within a system that is implemented by a component.
may or may not be necessary for any particular client. However, without the additional
ability to add members to the set, this component will be of little use to anyone.
If a component is not intended for public use, then, as suggested in Section 1.8, the
minimal subset of functionality that does the job efficiently for its known fixed set of
clients is, by definition, sufficient. At the other end of the spectrum, if a component is
intended to be reused widely in various situations throughout a system, then we can-
not necessarily know ahead of time what subset of the functionality will be needed. 3
A complete interface enables all operations commonly expected by users of a given
abstraction to be accomplished in an efficient manner. The more remote our clients,
the more likely we are to opt to err on the side of generality by trying to make the
interface complete. 4
Often a complete interface requires a more involved implementation strategy than one
that would be sufficient for any individual client. Hence, a complete interface may be
more expensive to implement. The more general implementation may also run more
slowly than a specialized version, perhaps even on the most basic and frequently used
operations. 5 Hence, a complete interface may be more expensive at runtime. A more
complete interface is usually larger and more complex, incorporating less frequently
used features. A larger or more complex interface makes it more difficult for clients to
find and use basic features. Hence, a complete interface may be more expensive to
use. Since a complete interface is more expensive according to a variety of measures,
it is wise to be sure that a complete interface is warranted before implementing one.
3 Accidental reuse implies use in situations other than those for which a component was originally
intended. Intentional reuse implies (among other things) a desire on the part of the component
author to provide a complete interface and a robust implementation. If you were to link to a
component that is part of a standard library of "reusable" components (e.g., STL), would you be using it
or reusing it? What about iostream?
4 See Meyers, Item 18, p. 62.
5 For example, template-based container classes that must work correctly when parameterized by
arbitrary user-defined types cannot take the same liberties with bit-wise copy routines (such as
memcpy) as could a container designed exclusively for fundamental types (see Section 10.4.2).
Section 8.2 Component Interface Design 557
Between the two extremes of sufficient and complete can lie a wide middle ground.
For example, it is generally true that assigning the state of one iterator to that of
another is almost never performed in practice. Hence, an iterator's assignment opera-
tor can usually be declared private and left unimplemented, without affecting the
usability of the component. This deliberate omission saves development time and
code size, yet leaves open the possibility of adding that functionality without causing
existing clients to rework their code.
When selecting functions for the interface of a class, our goal should be to strive for a
minimum set, using primitiveness as a criterion. Clearly, adding and deleting members
of a set are independent primitive operations. The ability to iterate over the members
of a set enables a client to determine membership, suggesting that membership itself
is not an independent primitive operation. However, it is likely that determining mem-
bership via iteration is fundamentally much less efficient than it would be if imple-
mented with direct private access to the internal representation (e.g., by binary
search). If determining membership is likely to be a frequently used operation, it
would almost certainly qualify for primitive status.
When selecting functions for the interface of a component, our goal is again to strive for
minimality, but with an eye toward usability. Supplying every conceivable operation for
an abstraction in a component interface increases its girth, overwhelms its clients, and
adversely affects its usability. For example, we could provide non-primitive support for
replacing the top entry on the Stack of Figure 3-2. Although potentially useful to a few
clients, most would find such functionality superfluous.
By the same argument, we could also have omitted the tests for equality in the stack
component of Figure 3-2. Since these tests are implemented as free, non-friend func-
tions, operator== and operator!= could instead be implemented by any developer
who needs them. But if many users are developing applications that will work
together in a large system, it is desirable to avoid having each user rewrite the same
functions within each subsystem. Such redundancy wastes development time, execut-
able size, and, consequently, execution time. Finding the appropriate non-primitive
functionality to add to a component to make it most useful is a design goal. Often the
smallest interface that accomplishes this goal is optimal.
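The following sketch shows how equality tests can be written as free, non-friend functions in terms of a stack's public iterator. The classes my_Stack and my_StackIter are hypothetical stand-ins, not the stack component of Figure 3-2, and the fixed capacity with no overflow checking is a simplification.

```cpp
class my_Stack {
    enum { CAPACITY = 16 };
    int d_stack[CAPACITY];
    int d_sp;                         // number of elements on the stack
    friend class my_StackIter;
  public:
    my_Stack() : d_sp(0) {}
    void push(int value) { d_stack[d_sp++] = value; }   // no overflow check
    void pop() { --d_sp; }
    int top() const { return d_stack[d_sp - 1]; }
    bool isEmpty() const { return 0 == d_sp; }
};

class my_StackIter {                  // iterates from top to bottom
    const int *d_stack;
    int d_sp;
  public:
    my_StackIter(const my_Stack& stack)
    : d_stack(stack.d_stack), d_sp(stack.d_sp) {}
    operator const void *() const { return d_sp > 0 ? this : 0; }
    void operator++() { --d_sp; }
    int operator()() const { return d_stack[d_sp - 1]; }
};

// Non-primitive functionality: implemented without private access.
bool operator==(const my_Stack& lhs, const my_Stack& rhs)
{
    my_StackIter li(lhs);
    my_StackIter ri(rhs);
    for (; li && ri; ++li, ++ri) {
        if (li() != ri()) {
            return false;
        }
    }
    return !li && !ri;                // equal only if both are exhausted
}

bool operator!=(const my_Stack& lhs, const my_Stack& rhs)
{
    return !(lhs == rhs);
}
```

Any client could have written these two functions, which is precisely why supplying them once in the component avoids many slightly different copies scattered throughout a large system.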
The term coupling applies to both logical and physical designs. Physical coupling
comes from placing logical entities in the same component or by creating a physical
dependency of one component upon another. Logical coupling arises from types used
in the interface of one component that are defined or supplied by other components.
As with physical coupling, logical coupling is best kept to a minimum. Reducing the
number of external types used in the logical interface often makes a component easier
to use and to maintain.
Suppose you are creating a very public interface and you need to accept character
string inputs. Which interface in Figure 8-1 do you feel is more general? Your clients
may have their own string class which they are accustomed to using. Every general-purpose
string class will know how to generate a const char * representation. The interface
of Figure 8-1a will force your clients to use class my_String; the interface of
Figure 8-1b will not.
// my_engine.h
#ifndef INCLUDED_MY_ENGINE
#define INCLUDED_MY_ENGINE

class my_String;

class my_Engine {
    // ...
  public:
    my_Engine(const my_String& name);
    // ...
    void setName(const my_String& name);
    // ...
    const my_String& name() const;
    // ...
};

#endif
(a) Using my_String in the Interface

// my_engine.h
#ifndef INCLUDED_MY_ENGINE
#define INCLUDED_MY_ENGINE

class my_Engine {
    // ...
  public:
    my_Engine(const char *name);
    // ...
    void setName(const char *name);
    // ...
    const char *name() const;
    // ...
};

#endif
(b) Using const char * in the Interface
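A brief sketch may help show why interface (b) couples clients less. Below, my_Engine follows the const char * interface of Figure 8-1b with an assumed implementation that keeps its own copy of the name, and their_String is a hypothetical string class that some client happens to prefer. The client needs nothing from the engine's package except my_Engine itself.

```cpp
#include <cstring>

class my_Engine {
    char *d_name_p;                       // owned copy of the name
    my_Engine(const my_Engine&);          // not implemented
    my_Engine& operator=(const my_Engine&);
  public:
    my_Engine(const char *name) : d_name_p(0) { setName(name); }
    ~my_Engine() { delete [] d_name_p; }
    void setName(const char *name);
    const char *name() const { return d_name_p; }
};

void my_Engine::setName(const char *name)
{
    char *copy = new char[std::strlen(name) + 1];
    std::strcpy(copy, name);
    delete [] d_name_p;
    d_name_p = copy;
}

// Hypothetical string class belonging to a client, unknown to my_Engine.
class their_String {
    char d_buffer[32];
  public:
    their_String(const char *s) {
        std::strncpy(d_buffer, s, sizeof d_buffer - 1);
        d_buffer[sizeof d_buffer - 1] = '\0';
    }
    const char *c_str() const { return d_buffer; }
};
```

A client holding a their_String simply passes its character representation, for example my_Engine engine(name.c_str()), and no conversion to or from my_String is ever required.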
The consequences of this form of logical coupling would have been even more severe
had we instead elected to depend on some other non-standard component-library type
(e.g., your_String in the interface). Until an ANSI/ISO standard string component
In short, there are a number of high-level questions we must ask ourselves when
designing the interface of a component. The most important question is, "How public
is this component?" If it will be reused in lots of different and unpredictable ways, it
will need to have a reasonably complete interface. If the component is intended for
private use within a package (and will not be exported), the interface should be suffi-
cient-nothing more. In all cases we can improve the maintainability of our classes if
we design their interfaces to contain only primitive functionality, pushing off useful
but non-primitive functionality into separate operators or classes without private
access. Finally, logical coupling often can result in unwanted physical coupling;
avoiding the use of unnecessary types in the interface of a component that are defined
outside that component can help to alleviate this coupling.
Encapsulation can be harder to achieve than it might at first seem. Like total insula-
tion, total encapsulation can also be prohibitively expensive at runtime.
// bad_point.h
#ifndef INCLUDED_BAD_POINT
#define INCLUDED_BAD_POINT

class bad_Point {
    int d_x;    // (may change to short later)
    int d_y;    // (may change to short later)

  public:
    // CREATORS
    bad_Point(int x, int y) : d_x(x), d_y(y) {}
    bad_Point(const bad_Point& p) : d_x(p.d_x), d_y(p.d_y) {}

    // MANIPULATORS
    bad_Point& operator=(const bad_Point& p) {
        d_x = p.x(); d_y = p.y(); return *this; }
    int& x() { return d_x; }    // bad idea
    int& y() { return d_y; }    // bad idea

    // ACCESSORS
    int x() const { return d_x; }
    int y() const { return d_y; }
};

#endif
Figure 8-3 shows a trivial test driver for the bad_Point interface.
// bad_point.t.c
#include "bad_point.h"
#include <iostream.h>
main()
{
    bad_Point pt(1,2);
    cout << pt << endl;
    pt.x() = 5;
    cout << pt << endl;
}
When run on the example as shown in Figure 8-2, this driver produces the following
output (as expected):
john@john: a.out
( 1, 2)
( 5, 2)
john@john:
But now suppose we change the type of the private data members in bad_Point from
int to short:
class bad_Point {
    short d_x;  // OK, we changed "private" data
    short d_y;  // so what?
  public:
    // ...
and rerun the experiment. The results have now changed to the unexpected:
john@john: a.out
( 1, 2)
( 1, 2)
john@john:
The problem is that the reference returned in the interface (int&) is inconsistent with
the type of data returned (short). As a result, a temporary int is created and a writ-
able reference to that temporary is returned. We could modify the interface functions
to instead return a short&, but then we would have modified the interface in response
to an implementation change, thereby propagating the problem to our clients.
main()
{
    bad_Point pt(1,2);
    cout << pt << endl;
    // pt.x() = 5;   // Returning writable reference replaced
    pt.setX(5);      // by function taking value of x coordinate.
    cout << pt << endl;
}
john@john: a.out
( 1, 2)
( 5, 2)
john@john:
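A minimal sketch of the repaired interface follows. The class name good_Point and the setY manipulator are assumptions made for illustration (only setX appears in the driver above). Because no reference to the private data escapes, the members can be short without any visible effect on clients:

```cpp
#include <cassert>

// Hypothetical repaired point class: the private representation is short,
// but the interface traffics only in int values, never in references.
class good_Point {
    short d_x;
    short d_y;

  public:
    // CREATORS
    good_Point(int x, int y)
    : d_x(static_cast<short>(x)), d_y(static_cast<short>(y)) {}

    // MANIPULATORS
    void setX(int x) { d_x = static_cast<short>(x); }
    void setY(int y) { d_y = static_cast<short>(y); }

    // ACCESSORS
    int x() const { return d_x; }
    int y() const { return d_y; }
};

// Usage: the write reaches the object itself, not a temporary.
inline int xAfterSet()
{
    good_Point pt(1, 2);
    pt.setX(5);
    return pt.x();
}
```

Changing d_x and d_y back to int would require no change to either the interface or the usage function.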
In the bad_Point example, doing it right costs nothing extra; however, in some cases,
total encapsulation can be more expensive. Consider the two potential implementation
strategies for a geom_Box, shown in Figure 8-4. Implementation (a) stores the lower-left
and upper-right corners of the box as points embedded in the geom_Box. It is
therefore possible to return both the lower-left and upper-right corners by const
reference. The center point is not stored, and so it must be calculated and returned by
value. Likewise, both the length and width must be calculated on demand.
Implementation (b), however, stores the center point along with the width and height of the
geom_Box. The center point is returned efficiently by const reference, while the
lower-left and upper-right corners must be calculated and returned by value. Length
and width now require no calculation, but, being fundamental types, they are
returned most efficiently by value.
[Figure 8-4: (a) Stores Lower-Left and Upper-Right Corners; (b) Stores Center, Width, and Height]
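The two storage strategies can be sketched as follows. The names Point, BoxA, and BoxB are hypothetical, and double coordinates are assumed so that the center is exact. Both classes return everything by value, which is the fully encapsulating choice: a client cannot tell which representation is in use, at the price of constructing a point on every access.

```cpp
#include <cassert>

struct Point {
    double x;
    double y;
};

// Strategy (a): store the two corners; compute center and extents on demand.
class BoxA {
    Point d_lowerLeft;
    Point d_upperRight;

  public:
    BoxA(Point lowerLeft, Point upperRight)
    : d_lowerLeft(lowerLeft), d_upperRight(upperRight) {}

    Point lowerLeft() const { return d_lowerLeft; }
    Point upperRight() const { return d_upperRight; }
    Point center() const {
        return Point{(d_lowerLeft.x + d_upperRight.x) / 2,
                     (d_lowerLeft.y + d_upperRight.y) / 2};
    }
    double width() const { return d_upperRight.x - d_lowerLeft.x; }
    double height() const { return d_upperRight.y - d_lowerLeft.y; }
};

// Strategy (b): store center, width, and height; compute corners on demand.
class BoxB {
    Point d_center;
    double d_width;
    double d_height;

  public:
    BoxB(Point lowerLeft, Point upperRight)
    : d_center{(lowerLeft.x + upperRight.x) / 2,
               (lowerLeft.y + upperRight.y) / 2}
    , d_width(upperRight.x - lowerLeft.x)
    , d_height(upperRight.y - lowerLeft.y) {}

    Point lowerLeft() const {
        return Point{d_center.x - d_width / 2, d_center.y - d_height / 2};
    }
    Point upperRight() const {
        return Point{d_center.x + d_width / 2, d_center.y + d_height / 2};
    }
    Point center() const { return d_center; }
    double width() const { return d_width; }
    double height() const { return d_height; }
};
```

Returning the stored point by const reference instead would save a copy in each class, but only for the members that class happens to store, which is exactly how the representation leaks into the interface.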
Part of the advantage of one implementation over the other is in avoiding the expense
of constructing the most frequently accessed point and instead returning it efficiently
by reference. Strictly speaking, however, these two interfaces, though similar, are not
A classic example where encapsulation is not complete can be found in virtually any
general-purpose string class, which for efficiency will invariably provide direct access
as a const char * to its internal null-terminated string representation. Clearly this
interface constrains the internal implementation, forcing it to maintain a valid null-ter-
minated string representation as long as the string object is not modified or deleted.
However, a more encapsulating interface turns out to be too expensive or inconvenient
to be popular.
Another example where the interface constrains the implementation for efficiency can
be found in an unbounded array abstraction in which a writable reference to an
indexed object is returned. As illustrated in Figure 8-5a for an array of points, this
style of interface forces the implementation to maintain the same space for a
geom_Point object once it has been referenced. Any attempt by the array to relocate
the object would invalidate references held by clients.
class geom_PointArray {
// ...
public:
// ...
geom_Point& operator[](int index);
const geom_Point& operator[](int index) const;
// ...
};
By contrast, a naive, fully encapsulated version would provide functions to get and set
a particular element, as illustrated in Figure 8-5b. Notice that this interface is com-
pletely general. There is nothing to stop us from storing the points internally as, say,
two parallel arrays of integers. We might decide to implement some kind of in-core
compression scheme for points. We might even think about swapping part of a large
array out to disk.
class geom_PointArray {
// ...
public:
// ...
geom_Point point(int index) const;
void setPoint(const geom_Point& point, int index);
// ...
};
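To illustrate the freedom this interface leaves open, here is a sketch that stores the points as two parallel arrays of integers. The names Point and ParallelPointArray are hypothetical, std::vector stands in for hand-written storage, indices are assumed non-negative, and unset elements are assumed to read as (0, 0):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Point {
    int d_x;
    int d_y;

  public:
    Point(int x, int y) : d_x(x), d_y(y) {}
    int x() const { return d_x; }
    int y() const { return d_y; }
};

// Fully encapsulated "unbounded" array: no Point object is ever stored,
// so no reference to one could possibly be handed out.
class ParallelPointArray {
    std::vector<int> d_xs;  // x coordinates
    std::vector<int> d_ys;  // y coordinates (always the same length as d_xs)

  public:
    // Return a copy of the indexed point; unset elements read as (0, 0).
    Point point(int index) const {
        std::size_t i = static_cast<std::size_t>(index);
        return i < d_xs.size() ? Point(d_xs[i], d_ys[i]) : Point(0, 0);
    }

    // Store the value of the specified point, growing the array if needed.
    void setPoint(const Point& point, int index) {
        std::size_t i = static_cast<std::size_t>(index);
        if (i >= d_xs.size()) {
            d_xs.resize(i + 1, 0);
            d_ys.resize(i + 1, 0);
        }
        d_xs[i] = point.x();
        d_ys[i] = point.y();
    }

    int size() const { return static_cast<int>(d_xs.size()); }
};
```

An interface that returned a geom_Point& could not be implemented this way, because there would be no stored object for the reference to refer to.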
Although this new interface does nothing to limit our implementation choice, the
runtime cost of using this fully encapsulating interface could be substantially more
expensive--even when the two underlying implementations are identical. For less
lightweight elements (i.e., being significantly larger, having a non-inline copy con-
structor, or requiring dynamic memory allocation at construction), a fully encapsulat-
ing version of the interface could be prohibitively expensive at runtime.
Fortunately, there is another fully encapsulating form of the interface that does afford
some relief for "heavier" objects, particularly when accessing their values. Returning
an object by value will result in construction (and destruction) of at least one tempo-
rary of the indexed type. As Figure 8-5c illustrates, we can pass in a writable pointer
to an existing object instead of returning the object by value. Assigning the value of
the existing object (just once) can often be accomplished with relative efficiency.
class geom_PointArray {
    // ...
  public:
    // ...
    void getPoint(geom_Point *returnValue, int index) const;
    void setPoint(const geom_Point& point, int index);
    // ...
};
To make this all concrete, I created a single experimental version of a PointArray
class with all three modes of access available simultaneously. The contents of Figure
8-6 were placed at the top of a driver file used to compare the relative performance of
these three modes of operation.
// pointarray.t.c
#include "point.h"
#include <memory.h>  // memcpy()
class PointArray {
    Point **d_array_p;  // array of pointers to Point objects
    int d_size;         // current physical size of "unbounded" array
    Point d_dummy;      // not static to avoid construction at startup
  private:
    void resize(int maxIndex);  // extend array of Point pointers when needed
  public:
    // CREATORS
    PointArray(int size) : d_array_p(0), d_size(0), d_dummy(0,0)
    {
        resize(size - 1);  // Factoring is good.
    }
    ~PointArray();
    // MANIPULATORS
    Point& operator[](int index)  // ARRAY A
    {
        if (index >= d_size) {
            resize(index);
        }
        // ...
    }
    // ...
    // ACCESSORS
    int size() const { return d_size; }  // ARRAY A, B, C
    // ...
};
PointArray::~PointArray()
{
    for (int i = 0; i < d_size; ++i) {
        delete d_array_p[i];
    }
    delete [] d_array_p;
}
The first test was to compare the relative efficiency of reading the x coordinate of the
first 1,000 points of the array and accumulating this value in the variable sum. This
experiment was run for each of the three array interfaces as presented in Figure 8-7.
To illustrate the effect the "weight" of the object can have on the interface, the three
different Point implementations used in the experiment of Figure 6-83 were reused
here as well. 7
main()
{
    int arraySize = 1000;
    int sum = 0;
    PointArray array(arraySize);
    const PointArray& constArray = array;  // Provide a const reference to
    // ...                                 // enable the invocation of the
                                           // const version of operator[]
    // INTERFACE A:
    {
        for (int j = 0; j < arraySize; ++j) {
            sum += constArray[j].x();
        }
    }
    // INTERFACE B:
    {
        for (int j = 0; j < arraySize; ++j) {
            sum += constArray.point(j).x();
        }
    }
    // INTERFACE C:
    {
        Point pt(0,0);
        for (int j = 0; j < arraySize; ++j) {
            constArray.getPoint(&pt, j);
            sum += pt.x();
        }
    }
}
Figure 8-8 provides the results of comparing these three different interface styles for
accessing Point objects within the same array. Using the original Point class (line 1)
with all its functions declared inline, the cost of total encapsulation is only minimally
more for the naive encapsulation of ARRAY B (111%) and nonexistent for the full
encapsulation of ARRAY C (100%). Removing the inline functions from the contained
Point type (line 2) makes both constructing and assigning to Point objects
somewhat more expensive. Part of the runtime advantage of ARRAY C (168%) over
ARRAY B (271%) is that the Point assignment is occurring exactly once per array
access without the extra constructor (and destructor) calls generally needed to return
an object by value. For a contained object that allocates dynamic memory on
construction (line 3), the cost of construction (1,673%) well exceeds the cost of assigning
the new value in place (169%). From this data we conclude that there can be substan-
The second test was to compare the relative efficiency of setting the x coordinate of
the first 1,000 points of the array while leaving the y coordinate unchanged. Note that
interface A allows us to accomplish this operation directly, while interfaces B and C
force us to first get the current value of the entire point. This experiment, illustrated in
Figure 8-9, was also run for each of the three array interfaces and for each of the three
Point implementations. The results are tabulated in Figure 8-10.
main()
{
    int arraySize = 1000;
    PointArray array(arraySize);
    PointArray& nonConstArray = array;  // provide non-const reference.
    // ...
    // INTERFACE A:
    {
        for (int j = 0; j < arraySize; ++j) {
            nonConstArray[j].setX(j);
        }
    }
    // INTERFACE B:
    {
        for (int j = 0; j < arraySize; ++j) {
            nonConstArray.setPoint(Point(j, nonConstArray.point(j).y()), j);
        }
    }
    // INTERFACE C:
    {
        Point pt(0,0);
        for (int j = 0; j < arraySize; ++j) {
            nonConstArray.getPoint(&pt, j);
            nonConstArray.setPoint(Point(j, pt.y()), j);
        }
    }
}
Based on the results of this experiment, we can conclude that providing a writable
reference to the contained object can have profound performance benefits that increase
dramatically with the weight of the object. For fully insulating classes, some degree of
relief is provided by returning the original value through the argument list.
Settling for less than full encapsulation is sometimes the right choice.
As a final aside, we should note a subtle problem with the interface of the
unencapsulated version of this array. There are two versions of the [] operator:
    operator[](int index)
and
    operator[](int index) const
The first of these operators can potentially resize the array; the second cannot. If this
array were implemented as a "sparse array," the space for a Point (or a significantly
bigger object) might deliberately be left unallocated until referenced by the non-const
version of operator[]. With this interface, the act of merely "reading" a non-const
array object will implicitly populate it. It would be far more practical to skip
the operator overloading and choose distinct function names for these operations.
Doing so would make this array far less prone to subtle misuse that could result in
grossly excessive allocations of memory.
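As a sketch of what distinct function names might look like for a sparse array, consider the following. The names SparsePointArray, find, and modify are hypothetical, and std::map stands in for whatever sparse representation is actually used:

```cpp
#include <cassert>
#include <cstddef>
#include <map>

struct Point {
    int x;
    int y;
};

class SparsePointArray {
    std::map<int, Point> d_points;  // only populated elements are stored

  public:
    // Read-only lookup: never allocates. Returns a null pointer when the
    // indexed element has not been populated.
    const Point *find(int index) const {
        std::map<int, Point>::const_iterator it = d_points.find(index);
        return it == d_points.end() ? nullptr : &it->second;
    }

    // Explicit write access: populates the element (as (0, 0)) on demand.
    Point& modify(int index) { return d_points[index]; }

    std::size_t numPopulated() const { return d_points.size(); }
};
```

Because the read path has its own name, calling it on a non-const array cannot silently select the populating function, which is the misuse the overloaded operator[] invites.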
If performance is a design goal, then certain implementation choices (e.g., object
compression and swapping to disk) must be ruled out anyway. By making reasonable
assumptions, learned through experience, we can attain most of the benefit of
encapsulation without incurring excessive and unnecessary runtime cost. When total
encapsulation is appropriate, we can sometimes reduce its runtime cost by passing in a
previously constructed object to load instead of returning the object by value.
Often a component will make use of one or more tiny auxiliary classes in its imple-
mentation that are not programmatically accessible in the interfaces of the principal
classes defined in that component. Two characteristics help to distinguish an auxiliary
implementation class from other kinds of classes:
The Link class of the list component shown in Figure 6-19 is a case in point. There
are a variety of ways to realize such implementation classes, each with its advantages
and disadvantages. In this section we explore the pros and cons of a variety of design
options.
Consider a simple integer list class shown in Figure 8-11a. Class my_Link is an imple-
mentation detail of my_List, and is not programmatically accessible from my_List. In
this implementation, the auxiliary class definition is placed in the header file of the
component defining the primary class. This straightforward approach is the simplest
and most common method of implementing components using such auxiliary classes.
// my_list.h
#ifndef INCLUDED_MY_LIST
#define INCLUDED_MY_LIST
class my_Link {
    int d_data;
    my_Link *d_next_p;
  public:
    // ...
};
class my_List {
    my_Link *d_head_p;
    // ...
  public:
    // ...
};
#endif
We could put the link class in its own component, as illustrated in Figure 8-11b. This
arrangement has the advantage of allowing us to test (and even reuse) my_Link inde-
pendently of my_List. But for tiny implementation classes such as my_Link, the cou-
pling brought about by reuse along with the extra physical complexity of a second
component makes this an unlikely choice.
// my_list.h
#ifndef INCLUDED_MY_LIST
#define INCLUDED_MY_LIST
class my_Link;
class my_List {
    my_Link *d_head_p;
    // ...
  public:
    // ...
};
#endif

// my_link.h
#ifndef INCLUDED_MY_LINK
#define INCLUDED_MY_LINK
class my_Link {
    int d_data;
    my_Link *d_next_p;
  public:
    // ...
};
#endif
We could declare my_List a friend of class my_Link and make all of the link's func-
tions private, as suggested in Figure 8-11c. Making my_Link a "slave" class of
my_List prevents clients of component my_list from using my_Link directly; how-
ever, access for direct testing is also precluded.
// my_list.h
#ifndef INCLUDED_MY_LIST
#define INCLUDED_MY_LIST
class my_List;
class my_Link {
    int d_data;
    my_Link *d_next_p;
    friend my_List;
    // ...
};
class my_List {
    my_Link *d_head_p;
    // ...
  public:
    // ...
};
#endif
We could make the my_Link class a local definition, contained entirely within the .c
file, as illustrated in Figure 8-11d. This design would serve to insulate clients from
my_Link. However, in addition to precluding direct testing, this design would also
preclude inlining any members of my_List that made substantive use of my_Link. If
component my_list had also contained an iterator, not being able to inline iterator
functions might have significantly degraded runtime performance.
// my_list.h
#ifndef INCLUDED_MY_LIST
#define INCLUDED_MY_LIST
class my_Link;
class my_List {
    my_Link *d_head_p;
    // ...
  public:
    // ...
};
#endif
Finally, we could make the my_Link class a private (or public) nested class whose
definition is contained entirely within class my_List, as illustrated in Figure 8-11e. This
implementation would not insulate clients from the details of my_Link, but it would
permit members of my_Link to be used in the bodies of inline members of my_List
(and my_ListIter). Making my_Link a nested class avoids affecting the global name
space; making it private makes it encapsulated and therefore not directly usable (or
testable).
// my_list.h
#ifndef INCLUDED_MY_LIST
#define INCLUDED_MY_LIST
class my_List {
    class my_Link {
        int d_data;
        my_Link *d_next_p;
      public:
        // ...
    };
    my_Link *d_head_p;
    // ...
  public:
    // ...
};
#endif
The advantages of each of the implementation alternatives for the my_Link class pre-
sented in this section are summarized in Figure 8-12. Placing my_Link in a separate
component (implementation B) is clearly the most flexible, allowing the component
author to include the auxiliary class definition in either the .c file or the .h file of the
principal component as needed. However, there is a cost associated with each physical
piece of a system. Unless we plan to directly test or independently reuse the auxiliary
class, creating a separate component to hold it would probably be unwarranted.
Nested classes are not as flexible as classes defined at file scope. For example, nested
classes cannot be forward declared;8 hence, nested classes cannot be insulated from
clients of their enclosing class. In addition, nested types are notationally cumbersome
and cause excessive clutter in the physical interface. The syntax of the nested
implementation inhibits our conveniently transplanting the auxiliary class to the .c file or to
another component, should we later decide to insulate or reuse it.
8 The ANSI/ISO committee has adopted a proposal to allow the forward declaration of nested
classes in C++. See stroustrup94, Section 13.5, pp. 289-290.
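[Figure 8-12: comparison of the implementation alternatives for the auxiliary class; rows include Is Directly Testable, Is Physically Coupled, Can Be Insulated, Is Reusable]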
The original implementation (A) and the public nested implementation (E') have sim-
ilar properties. The one benefit of the nested public design is that it does not affect the
global name space. Considering the disadvantages of nested classes described above,
if you're going to make a nested class public, why not just prefix its name and define
it at file scope in the .h file (as in implementation A)?
Though not insulated, private nested classes are truly encapsulated and cannot be
accessed by clients of the primary objects, nor can they be directly tested. The slave
class implementation (C) is almost identical to the private nested class implementa-
tion (E), except that the class itself is part of the global name space, though still not
usable or testable directly. The local class implementation (D) is also similar to a pri-
vate nested implementation (E), except that the local classes are insulated from clients
and therefore cannot be used in the bodies of inline functions of the primary classes.
For more on classes without external linkage, see Section 3.2 (immediately following
the discussion of Figure 3-7).
The best choice in any given situation will depend on the answers to the following
three questions:
y. Does the component expose inline functions that require access to the
auxiliary class?
If the component's header does not define inline functions that make sub-
stantive use of an auxiliary class, that class can be insulated from clients
and is often best implemented locally (D). However, if the auxiliary class
is complex enough to warrant direct testing, the class should be imple-
mented using either (A) or (B).
x. Does the auxiliary class require direct testing?
y. Do inline functions need access to the auxiliary class?
z. Will the component be used widely?

    x      y      z      Implementation
    no     no     no     D
    no     no     yes    D
    no     yes    no     A
    no     yes    yes    C/E
    yes    no     no     A
    yes    no     yes    B
    yes    yes    no     A
    yes    yes    yes    B*

*The header for the auxiliary class is included in the .h file instead of the .c file
of the principal component.
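The decision tree can be written down as a small function. This is only a sketch: it assumes the leaves of the tree read, from left to right, D, D, A, C/E, A, B, A, B*, and the function and parameter names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Return the label of the suggested implementation strategy for an
// auxiliary class, given the answers to questions x, y, and z.
inline std::string chooseAuxiliaryImplementation(bool needsDirectTesting,  // x
                                                 bool inlineAccess,        // y
                                                 bool widelyUsed)          // z
{
    if (!needsDirectTesting) {
        if (!inlineAccess) {
            return "D";                   // local to the .c file
        }
        return widelyUsed ? "C/E" : "A";  // slave/private nested, or same header
    }
    if (!widelyUsed) {
        return "A";                       // same header as the primary class
    }
    return inlineAccess ? "B*" : "B";     // separate component
}
```

Note that the answer to z matters only once x or y has ruled out the local (D) implementation.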
To sum up: defining auxiliary classes in the same header file as the primary classes is
common; often this approach is adequate, if not optimal. Where possible, we would
like to insulate auxiliary classes from our clients by hiding them in the .c file or, if
necessary for testing, placing them in a separate component. For lightweight compo-
nents that are widely used, we may be forced to use slave or private nested classes to
enforce our sole ownership of the auxiliary class. This section is intended as a guide-
line and is not a substitute for the application of common sense.
8.5 Summary
Components serve jointly as effective units of both logical and physical design. An
abstraction is an abstract specification of closely related objects and (operator) func-
tions; a component (interface and implementation) is the corresponding concrete
realization.
There are several competing aspects to consider when creating the high-level specifi-
cation for a component. For components designed as part of a specific subsystem, we
require only that the interface be sufficient for its intended clients. For components that
will be used for various purposes throughout a large system, we expect the interfaces
to be complete. By sufficient we imply that the interface is suitable for solving a par-
ticular instance of a problem in some domain. By complete we mean that the interface
is suitable for solving an arbitrary problem in that domain. Both usability and main-
tainability are enhanced by keeping all component interfaces minimal.
User-defined types used in the interface of a component imply a strong logical depen-
dency on that type. As with physical coupling, logical coupling is best minimized. For
example, it is often preferable to use a const char * parameter instead of some par-
ticular string class in order to avoid unnecessary logical coupling, especially if the
interface will be used by a variety of clients in many different contexts.
When implementing a component, there is often a need to create one or more auxil-
iary classes. These classes are not accessible through the interface(s) of the primary
object(s) defined in the component. These classes are implementation details of the
component and are simple enough that they may not require independent testing. The
following strategies have been identified for implementing auxiliary classes:
Figure 8-12 identifies the various advantages of these implementation strategies for
auxiliary classes with respect to the following questions:
Figure 8-13 provides a decision tree that can be used to select the appropriate imple-
mentation for an auxiliary class based on the context in which it will be used.
Designing a Function
The goal of function design is to provide safe, easy, and efficient access to the behav-
iors defined by an abstraction. The C++ language provides great latitude when it
comes to specifying the interface at the function level. Whether to make a function an
operator, whether it should be a member or free operator, how arguments should be
passed, and how values should be returned are all part of this level of the design pro-
cess. There are reasons beyond style that play a role in making these design decisions,
many of which we touch on in this chapter.
The C++ language places a variety of flavors of fundamental integer types (such as
short, unsigned, long, etc.) at our disposal. These types represent yet another degree
of freedom that, if used thoughtlessly, can complicate and even weaken an interface.
can be eliminated without any loss in effectiveness. The resulting framework can then
help guide us toward simpler, more uniform, and more maintainable interfaces.
There is a list of issues one must address when specifying the interface of a function
in C++ in accordance with the ground rules presented in Chapter 2:
There are two organizational issues that, although not part of the logical interface,
must also be addressed:
There is a great deal of interplay among these issues; typically the answer to one
question will imply or at least affect the answer to another. In what follows we
address each of these issues individually, and provide guidelines for making optimal
design decisions. 1
Apart from the compiler-generated operators (e.g., assignment), the only reason to
make a function an operator is for the notational convenience of the client. Note that,
unlike function notation, operator notation is not context sensitive; the resulting func-
tion call resolution of an operator invoked from a member function will be the same
as if invoked at file scope. 2 When used judiciously, operator overloading has a natural
and obvious advantage over the functional notation-especially for user-defined logi-
cal and arithmetic types.
main()                                  main()
{                                       {
    pub_IntSet a, b, c, d, e, f;            pub_IntSet a, b, c, d, e, f;
Consider the two different usage models shown in Figure 9-1, corresponding to two
different interfaces for an integer set component, pub_intset. Figure 9-1a illustrates
how operator notation can be used effectively. The nature of the set abstraction makes
the meanings of these operators intuitive, even for developers not familiar with this
particular component. Figure 9-1b shows the equivalent computation using the more
bulky function call notation. 3
Readability (more than ease of use) should be the primary reason for
employing operator overloading.
In this integer set application, the operator notation clearly enhances both readability
and ease of use. By readability, we mean the ability of a software engineer to discern,
quickly and accurately, the intended behavior of a body of unfamiliar source code.
Ease of use refers to how easily a developer can use the object effectively to create
new software. Any typical body of source code is read many more times than it is
written ("For most large, long-lifetime software systems maintenance costs exceed
development costs by factors ranging from 2 to 4" 4), so it makes practical sense to
favor readability over ease of use in the long run.
Guideline
The semantics of an overloaded operator should be natural, obvious,
and intuitive to clients.
It is easy to come up with cute and easy-to-use applications for operators that have no
intuitive meaning for developers unfamiliar with your component. Sophomoric antics,
such as defining unary operator~ as a member of a string class to reverse the string in
place, are obviously out of place in a large-scale development environment. The lit-
mus test for determining when to supply operator notation should be whether there is
3 We have made some of the member functions static to enable the same symmetric implicit conver-
sion of arguments, as do the corresponding operators (see Section 9.1.5). The indentation style of
the deeply nested function calls in Figure 9-1b is borrowed from languages such as LISP and CLOS
where such constructs occur frequently.
4 sommerville, Section 1.2.1, p. 10.
Guideline
The syntactic properties of overloaded operators for user-defined types
should mirror the properties already defined for the fundamental types.
In the C++ language, every expression has a value. There are two basic types of values,
called lvalues and rvalues. 6 An lvalue is a value whose address can be taken. If an
lvalue can be on the "left" of an assignment statement, it is said to be a modifiable
lvalue; otherwise it is said to be a non-modifiable lvalue. 7 An rvalue cannot be
assigned to nor can its address be taken. 8 The simplest lvalued expression is a variable
identifier itself. Unless the variable is declared const, it is a modifiable lvalue.
Certain operators, such as assignment (=) and its variations (+= -= *= /= %= ^= &=
|= >>= <<=), pre-increment (++x), and pre-decrement (--x) all return modifiable
lvalues when applied to fundamental types. These operators always return a writable
reference to the modified argument. For example, the hypothetical definition of these
operators for the fundamental type double (if implemented as a C++ class) might
look as shown in Figure 9-2.
Other operators shown in Figure 9-2 return an rvalue because there is no appropriate
lvalue to return. In the case of symmetric binary operators (such as + and *), the value to
be returned is neither the left argument nor the right argument but a new value derived
from both; consequently the return must be by value. 9 Equality (== !=) and relational
(< <= > >=) operators always return an int type rvalue of either 0 or 1; clearly neither
of the input arguments would be appropriate to return here either. The post-increment
and post-decrement operators are an interesting special case in that they are the only
operators that modify the object and yet have no appropriate lvalue to return:
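The contrast between the two return conventions can be sketched with a user-defined type. The class Counter below is hypothetical and is meant only to mirror the behavior of the fundamental types described above: assignment-like operators and the prefix form return a writable reference to the modified object, whereas the postfix form must return the old value by value.

```cpp
#include <cassert>

class Counter {
    int d_value;

  public:
    explicit Counter(int value = 0) : d_value(value) {}

    // Like += on an int: returns a modifiable lvalue (the object itself).
    Counter& operator+=(int rhs) { d_value += rhs; return *this; }

    // Pre-increment: modify, then return a reference to the object.
    Counter& operator++() { ++d_value; return *this; }

    // Post-increment: the result is the value *before* modification, which
    // no longer exists anywhere in the object, so it is returned by value.
    Counter operator++(int) {
        Counter old(*this);
        ++d_value;
        return old;
    }

    int value() const { return d_value; }
};

// Usage: chaining works on the prefix form because it yields the object.
inline int chained()
{
    Counter c(1);
    (++c) += 3;  // c is incremented to 2, then 3 is added to c itself
    return c.value();
}
```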
As a more subtle example, consider the two usage models corresponding to a generic
symbol table abstraction shown in Figure 9-3. In both cases, a symbol table, parameterized
by type int, is constructed, two symbols are added, and the value of symbol
"foo" is looked up by name. Since it is entirely possible that a symbol with the speci-
fied name does not exist in the table, it is not appropriate for the function doing the
lookup to return its result by value or reference; hence the value is returned by pointer.
(Notice how and to what degree we have just taken liberty with encapsulation.) But
compare this usage with what we normally expect when we apply operator[] to a
fundamental array of int. We expect to get back a reference to the indexed value, not
a pointer which may be null. This difference in usage between the operator[] in Figure
9-3a and the usage of operator[] for fundamental types tends to make the function
call notation of Figure 9-3b preferable in this case. Reserving the operator
notation for those cases where the syntax closely mirrors the corresponding funda-
mental syntax reinforces the effectiveness of operator overloading.
9 For a more detailed explanation, see meyers, Item 23, pp. 82-84.
main()                                  main()
{                                       {
    gen_SymTab<int> s;                      gen_SymTab<int> s;
    s("foo", 1);    // operator()           s.add("foo", 1);
    s("bar", 2);    // (bad idea)           s.add("bar", 2);
    const int *val = s["foo"];              const int *val = s.lookup("foo");
    // ...                                  // ...
}                                       }
Figure 9-3: Two Usage Models for a Generic Symbol Table Abstraction
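A minimal sketch of the function-call interface follows. The class name SymTab is hypothetical, std::map is assumed as the underlying store, and add is assumed to overwrite an existing symbol of the same name (the text does not say what gen_SymTab does in that case):

```cpp
#include <cassert>
#include <map>
#include <string>

template <class T>
class SymTab {
    std::map<std::string, T> d_table;

  public:
    // Associate the specified value with the specified name.
    void add(const std::string& name, const T& value) {
        d_table[name] = value;
    }

    // Return the address of the value associated with the specified name,
    // or a null pointer if no such symbol exists.
    const T *lookup(const std::string& name) const {
        typename std::map<std::string, T>::const_iterator it =
                                                          d_table.find(name);
        return it == d_table.end() ? nullptr : &it->second;
    }
};
```

The name lookup sets no expectation about the form of the result, so a possibly null pointer does not surprise the caller the way it would coming from operator[].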
Figure 9-4 summarizes the declarations of most C++ operators as they would be if
applied to fundamental types. (The fundamental operators -> ->* () and , provide
little insight.)
class P {
    T& operator[](int) const;  // indexed array access (binary)
    T& operator*() const;      // pointer dereference (unary)
};
Notice that unary operators that do not modify their arguments are not fundamentally
members. For example, unary operator! works perfectly on a user-defined type such
as an ostream even though there is no ! operator defined for this type:
#include "iostream.h"
void g(ostream& out)
{
if (!out) {
cerr << "output stream is bad" << endl;
return;
}
// ...
}
The code above works because an ostream knows how to convert itself implicitly to a
fundamental type (void *) for which the ! operator is defined. If operator! were
treated as a member of a hypothetical void * class definition, no user-defined conver-
sion could occur and the above code would result in a compile-time error.
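The same mechanism can be sketched with a small user-defined type. The class Stream below is hypothetical and merely imitates the conversion just described; it defines no operator! of its own:

```cpp
#include <cassert>

class Stream {
    bool d_good;

  public:
    explicit Stream(bool good) : d_good(good) {}

    // Implicit conversion to a fundamental pointer type: non-null when the
    // stream is usable, null otherwise.
    operator const void *() const { return d_good ? this : nullptr; }
};

// The built-in ! is applied to the result of the user-defined conversion.
inline bool isBad(const Stream& s)
{
    return !s;
}
```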
Consider what could happen if we defined the concatenation operator (+=) for a string
class to be a free function instead of emulating the approach taken for the fundamental
types. As Figure 9-5 illustrates, making operator+= a free function has enabled the
implicit conversion of its left-hand const char * operand to a temporary pub_String
(denoted here as t005) with "foo" as its value. Even though this temporary would be an
rvalue for fundamental types, it is the temporary pub_String object that then has the
value "bar" concatenated to it (and is not a compile-time error). 11 As this behavior
would likely surprise and annoy our clients, we would be wise to suppress it.
// pub_String.h
// ...
class pub_String {
    // ...
  public:
    pub_String(const char *str);
};
void f()
{
    pub_String a("tar");
    const char *b = "foo";
    pub_String c("bar");
    b += c;         // has no effect: "bar" is concatenated to t005, a
                    // temporary (pub_String) copy of b
    a += b += c;    // a now holds "tarfoobar"
                    // but b remains unaffected.
}
11 The C++ language currently permits the modification of unnamed temporaries of user-defined
type. See murray, Section 2.7.3, pp. 53-55.
On the other hand, we expect certain operations (e.g., + and ==) to work regardless of
the order of their arguments. Consider operator+, which is used to concatenate two
strings and return its result by value. The language allows us to define operator+ as
either a member or a non-member. The same goes for operator==. If we elect to
define these operators as members, then we will subject our clients to the following
anomalous behavior:
void f()
{
    pub_String s("foo"), t("");
    int i;
    t = s + "bar";      // ok
    t = "bar" + s;      // error
    i = s == "bar";     // ok
    i = "bar" == s;     // error
}
and
pub_String::pub_String(const char *)
while no such conversion on the left is possible.12 Making these operators free solves
the symmetry problem until we add the conversion operator
Figure 9-6 illustrates a problem brought about merely by adding a conversion (cast)
operator from a pub_String to a const char *. Strangely, the two apparently similar
operators == and + are not identical with respect to overloading as (naively) we would
like to believe. The difference lies in the fact that there are now two ways to interpret
the == operator:
The problem does not exist for the + operator because there is no way to "add" two
pointer types in C++; hence there is no ambiguity.
// pub_String.h
// ...
class pub_String {
    // ...
  public:
    pub_String(const char *pcc);
    // ...
    operator const char *() const;  // <== new conversion operator
};

void f()
{
    pub_String s("foo"), t("");
    int i;
    t = s + "bar";      // ok
    t = "bar" + s;      // ok
    i = s == "bar";     // error (ambiguous)
    i = "bar" == s;     // error (ambiguous)
    i = strlen(s);      // ok
}
In a real-world string class, we would never rely on the implicit conversion to obtain
the string value for fear that the extra construction and destruction would unduly
affect our performance. Instead, we would define separate overloaded versions of
operator+ to handle each of the three possibilities as efficiently as possible, thus
sidestepping these ambiguity problems.
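The idea of supplying one overload per operand combination can be sketched for the equality operator as follows. This is only an illustration under stated assumptions: the std::string representation, the bool return type, and the helper allOrdersAgree are hypothetical and do not come from the figures; only the pub_String name, its const char * constructor, and its conversion operator are taken from Figure 9-6.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for the pub_String of Figure 9-6.
class pub_String {
    std::string d_value;
  public:
    pub_String(const char *pcc) : d_value(pcc) {}
    operator const char *() const { return d_value.c_str(); }
};

// One free overload per operand combination. Each is an exact match for
// its operand types, so overload resolution never has to choose between
// the converting constructor and the built-in pointer comparison.
inline bool operator==(const pub_String& lhs, const pub_String& rhs)
{
    return 0 == std::strcmp(lhs, rhs);
}

inline bool operator==(const pub_String& lhs, const char *rhs)
{
    return 0 == std::strcmp(lhs, rhs);
}

inline bool operator==(const char *lhs, const pub_String& rhs)
{
    return 0 == std::strcmp(lhs, rhs);
}

// Exercises all three operand orders (hypothetical helper).
inline bool allOrdersAgree()
{
    pub_String s("foo"), t("foo");
    return (s == t) && (s == "foo") && ("foo" == s) && !(s == "bar");
}
```

With all three overloads free, neither operand order is privileged, and no temporary pub_String needs to be constructed for a comparison against a string literal.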
As Figure 9-7 illustrates, in order to accept a const char * on the left of the == operator,
we are forced to make at least one of the equality operator functions a free function.
class pub_String {
    // ...
  public:
    pub_String(const char *pcc);
    operator const char *() const;
    int operator==(const char *pcc) const;  // bad idea: (asymmetric)
        // Allows for user-defined conversion only for the
        // argument on the right side of the operator.
};

struct Foo {
    Foo();
    operator const pub_String& () const;
        // Implicitly convert a Foo to a pub_String.
};

struct Bar {
    Bar();
    operator const char *() const;
        // Implicitly convert a Bar to a (const char *).
};
596 Designing a Function Chapter 9
void g()
{
    Foo foo;
    Bar bar;
    if (bar == foo) {   // ok:    Bar =to=> (const char *)
        // ...                    Foo =to=> (const pub_String&)
    }
    if (foo == bar) {   // error: Foo =NO=> (const pub_String&)
        // ...                    Bar =to=> (const char *)
    }
}
As long as we are supplying all three versions of the operator== function, what harm
could it do to make one a member? The harm is that a lack of symmetry could
surprise our clients. In the event that one object can be implicitly converted to
pub_String and the other to a const char *, we would still expect the order of
comparison to be unimportant. That is, if bar == foo compiles, then so should foo == bar
(and produce the identical result at runtime). However, if the
version is not available as a free function, then there is no way for the following
implicit conversions to occur:
foo == bar
(pub_String) t008
The conclusion is that operator== should always be a free function, regardless of
what other functions are involved. The same reasoning holds for the other binary
operators that do not modify either operand and return their result by value.
Section 9.1.3 Virtual or Non-Virtual Function 597
The example set forth by the language itself is an impartial and useful model that
clients can exploit to infer basic syntactic and axiomatic properties of operators. The
goal of modeling the fundamental operations is not to enable implicit conversions
unnecessarily, but rather symmetrically to avoid surprises. If operator overloading is
used to any great extent, it is reasonable to expect that the abstraction is suitable for
reuse in a variety of situations. Clients of reusable components will appreciate a
consistent and professional interface, devoid of syntactic surprises. Note that the C++
language requires that the following operators be members:13
=        Assignment
[]       Subscript
->       Class member access
()       Function call
(T)      Conversion ("cast") operator
new      (static) allocation operator
delete   (static) deallocation operator
// geom_shape.h
#ifndef INCLUDED_GEOM_SHAPE
#define INCLUDED_GEOM_SHAPE

class geom_Shape {
  public:
    virtual ~geom_Shape();
    virtual const void *classId() const = 0;
    virtual int compare(const geom_Shape& shape) const = 0;
        // Returns negative, zero, or positive corresponding to
        // whether this geom_Shape object is less than, equal to, or
        // greater than the specified geom_Shape object, respectively.
};

#endif
Figure 9-8 illustrates how symmetric operators can and should continue to remain free
even in the presence of polymorphic behavior. Instead of making each of the six
equality and relational operators virtual members of the class, a single virtual com-
pare member is provided. These six operators will now continue to behave symmetri-
cally with respect to any implicit conversion.
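One way the six free operators could be layered on the single virtual compare is sketched below. The sketch is not taken from the figure: classId is omitted for brevity, the bool return type is an assumption, and demo_Square together with its use of dynamic_cast is a hypothetical concrete type added only so that the operators can be exercised.

```cpp
#include <cassert>

// Simplified stand-in for the geom_Shape of Figure 9-8 (classId omitted).
class geom_Shape {
  public:
    virtual ~geom_Shape() {}
    virtual int compare(const geom_Shape& shape) const = 0;
        // Negative, zero, or positive as this object is less than,
        // equal to, or greater than 'shape'.
};

// The operators stay free and non-virtual; all polymorphic behavior is
// funneled through the one virtual function.
inline bool operator==(const geom_Shape& a, const geom_Shape& b) { return a.compare(b) == 0; }
inline bool operator!=(const geom_Shape& a, const geom_Shape& b) { return a.compare(b) != 0; }
inline bool operator< (const geom_Shape& a, const geom_Shape& b) { return a.compare(b) <  0; }
inline bool operator<=(const geom_Shape& a, const geom_Shape& b) { return a.compare(b) <= 0; }
inline bool operator> (const geom_Shape& a, const geom_Shape& b) { return a.compare(b) >  0; }
inline bool operator>=(const geom_Shape& a, const geom_Shape& b) { return a.compare(b) >= 0; }

// Hypothetical concrete shape, ordered by the length of its side.
class demo_Square : public geom_Shape {
    double d_side;
  public:
    explicit demo_Square(double side) : d_side(side) {}
    int compare(const geom_Shape& shape) const
    {
        const demo_Square *other = dynamic_cast<const demo_Square *>(&shape);
        if (!other) {
            return 1;   // placeholder only; a real total order needs a
                        // consistent rule across concrete types
        }
        if (d_side < other->d_side) return -1;
        return d_side > other->d_side;
    }
};
```

Because both operands are taken by const geom_Shape&, an implicit conversion (here, derived to base) applies equally to the left and right side of each operator.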
Equality operators often make sense even when the relational operators do not (think
of a point abstraction). Sometimes sorting a heterogeneous collection allows for more
efficient access. In such cases, any ordering (even an arbitrary one) can be useful. The
virtual classId() method in Figure 9-8 enables derived types to define their own
runtime type identifier.14 Using this identifier, shapes of the same type can be sorted
according to their own internal ordering, while ordering across concrete types can be
defined by some different (perhaps arbitrary) comparison. An implementation of a
geom_Circle, which participates in a total order on shapes, is provided succinctly for
reference in Figure 9-9.
// geom_circle.h
#ifndef INCLUDED_GEOM_CIRCLE
#define INCLUDED_GEOM_CIRCLE

#ifndef INCLUDED_GEOM_SHAPE
#include "geom_shape.h"
#endif

class geom_Circle : public geom_Shape {
    static const void *d_classId_p;
    double d_radius;

  public:
    geom_Circle(double radius) : d_radius(radius) {}
    geom_Circle(const geom_Circle& circle) : d_radius(circle.d_radius) {}
    ~geom_Circle();
    geom_Circle& operator=(const geom_Circle& circle) {
        d_radius = circle.d_radius; return *this; }
    const void *classId() const { return d_classId_p; }
    int compare(const geom_Shape& shape) const;     // virtual
    int compare(const geom_Circle& circle) const;   // non-virtual
};

#endif

// geom_circle.c
#include "geom_circle.h"

const void *geom_Circle::d_classId_p = &d_classId_p;    // runtime type id
geom_Circle::~geom_Circle() {}  // empty & out-of-line (see Section 9.3.3)
More generally, virtual functions are used to describe variation in behavior across
types derived from a common base class. Data members, however, are sufficient for
describing variation in value without having to resort to inheritance.15 For example,
we would not define a protocol class art_Color, and then derive classes art_Red,
art_Blue, and art_Yellow; a single (perhaps fully insulating) concrete art_Color
class that stores one of a number of enumerated colors is probably a more appropriate
design. However, virtual functions are an effective technique for breaking both
compile-time and link-time dependencies (see Section 6.4.1). For that reason, a single
concrete class might be derived from an art_Color protocol.
DEFINITION:
Hide: A member function hides a function with the same
name declared in a base class or at file scope.
Overload: A function overloads the name of another function
with the same name defined in the same scope.
Override: A member function overrides an identical function
declared virtual in a base class.
Redefine: The default definition of a function is irretrievably
replaced by another definition.
Finally, there are four similar terms that are commonly used (and misused) to describe
a function and its effect on other functions (hide, overload, override, and redefine)
that we define here for reference. Distinct functions with the same name are said to be
overloaded only if they are declared in the same scope. When a member function in a
derived class is declared with the identical interface of a function declared virtual in a
base class, that function is said to override the base class function. In all other cases, a
function name hides all identically named functions in an enclosing scope, regardless
of their argument signatures. Functions hidden in a named scope are not directly
accessible, but can be accessed via the scope resolution operator (::). However, when
we redefine a function (e.g., global new or class specific unary &), we replace its
definition; the previous definition is no longer accessible from the program.16
Guideline
We should be careful not to hide the definitions of any base-class functions in derived
classes. In particular, we should never supply a new definition for a non-virtual func-
tion in a derived class, since that would make the function sensitive to the type of any
pointer or reference from which the function might be called.17 Allowing the type of
the pointer or reference to affect which behavior is invoked is counterintuitive, subtle,
and error prone. Hiding functions defined in base classes does not protect them from
use; it merely makes such use more cumbersome. We can always fiddle with the
pointer or use the scope resolution operator to call the hidden member. A better idea is
simply never to hide a member function in the first place. An example of a design pat-
tern involving virtual functions, multiple inheritance, and runtime type identification
can be found in Appendix C.
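The sensitivity to pointer or reference type can be seen in a small sketch. The names demo_Base, demo_Derived, nameViaBase, and nameViaDerived are hypothetical and serve only to illustrate the hazard described above.

```cpp
#include <cassert>
#include <string>

struct demo_Base {
    virtual ~demo_Base() {}
    std::string name() const { return "Base"; }      // non-virtual
};

struct demo_Derived : demo_Base {
    std::string name() const { return "Derived"; }   // hides demo_Base::name
};

// The same object answers differently depending only on the static type
// of the reference through which it is addressed.
inline std::string nameViaBase(const demo_Base& object)
{
    return object.name();
}

inline std::string nameViaDerived(const demo_Derived& object)
{
    return object.name();
}
```

Note that the hidden function is still reachable from a demo_Derived object through the scope resolution operator, as in object.demo_Base::name(); hiding offers no protection, only inconvenience.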
16 ellis, Section 10.2, p. 210, and Section 13.1, p. 310.
17 See meyers, Item 37, pp. 130-132.
Section 9.1.4 Pure or Non-Pure Virtual Member Function 603
Declaring a virtual function to be pure forces the concrete derived-class author to sup-
ply a definition. If failing to supply a specific behavior in a derived class is likely to be
an error, then the virtual function should be declared pure in the base class.
Protocol classes (see Section 6.4.1) are useful for achieving both levelization and
insulation in inheritance hierarchies. We want to avoid defining any behavior in the
protocol class itself; making all of the member functions (except the destructor) pure
virtual enables us to avoid defining any of them.
#include <iostream.h>

Base::~Base() {}

Partial::~Partial() {}

void Derived::f()
{
    cout << "Derived::f" << endl;
    Partial::f();   // explicit call of pure virtual function
}

Derived::~Derived() {}

main()
{   // *** Main Program ***
    Base *b = new Derived;
    b->f();
}

// Output:
//      john@john: a.out
//      Derived::f
//      Partial::f
//      john@john:
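Since the class definitions are not shown with the code above, a self-contained sketch of the same idea may help. The hierarchy below (Base, Partial, Derived) is an assumption inferred from the names used above, and the functions return strings instead of printing so that the behavior is easy to check.

```cpp
#include <cassert>
#include <string>

struct Base {
    virtual ~Base() {}
    virtual std::string f() = 0;
};

struct Partial : Base {
    virtual std::string f() = 0;    // still pure, yet defined below
};

// A pure virtual function may be given a definition. It is never chosen
// by the virtual dispatch mechanism; it is reachable only through an
// explicitly qualified call.
std::string Partial::f() { return "Partial::f"; }

struct Derived : Partial {
    std::string f() { return "Derived::f " + Partial::f(); }
};
```

Declaring Partial::f pure still forces the author of Derived to supply f, while the definition offers a default that must be requested deliberately.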
The obvious reason for making a function a static member of a class is that it does not
depend on any particular instance of an object:
class my_Widget {
    static int d_instanceCount;
    // ...
  public:
    static int instanceCount() { return d_instanceCount; }
    // ...
};

Static member functions are commonly used to implement non-primitive
functionality in a separate utility class.
Section 9.1.6 const Member or Non-const Member Function 605
Escalating functionality to a higher level may require making it a static member func-
tion of a type defined in some other component (see Figure 5-15). If the function is a
convenience function that does not require private access, we might consider making
the function a static member of a separate class to emphasize its non-primitive status.
struct geom_PointUtil {
    static int compareMagnitude(const Point& a, const Point& b);
        // Compare the distance of each point from the origin,
        // and return a negative, zero, or positive value
        // depending on whether the magnitude of a is less than,
        // equal to, or greater than that of b, respectively.
};
Notice that by making compareMagnitude a static function, we retain symmetry
with respect to the implicit conversion of its arguments. Had we instead declared
compareMagnitude a non-static member of geom_Point, there could be cases where
a.compareMagnitude(b) would compile but b.compareMagnitude(a) would not
(see Section 9.1.2).
Though rarely necessary, we could grant private access while retaining this symmetry
just by moving the static method inside the geom_Point class itself.
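The symmetry argument can be made concrete with a sketch. Here geom_Point is a hypothetical minimal class whose converting constructor (two defaulted int arguments) is assumed for illustration, and the comparison uses squared distances, which is one possible implementation among several.

```cpp
#include <cassert>

// Hypothetical point with a converting constructor from int.
class geom_Point {
    int d_x;
    int d_y;
  public:
    geom_Point(int x = 0, int y = 0) : d_x(x), d_y(y) {}
    int x() const { return d_x; }
    int y() const { return d_y; }
};

struct geom_PointUtil {
    static int compareMagnitude(const geom_Point& a, const geom_Point& b)
    {
        // Comparing squared distances from the origin avoids a sqrt.
        long ma = long(a.x()) * a.x() + long(a.y()) * a.y();
        long mb = long(b.x()) * b.x() + long(b.y()) * b.y();
        if (ma < mb) return -1;
        return ma > mb;
    }
};
```

Both compareMagnitude(5, p) and compareMagnitude(p, 5) compile, because the int is converted to a geom_Point in either position. With a non-static member, p.compareMagnitude(5) would compile but there would be no way to write the call with 5 on the left.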
Figure 9-11 illustrates how physical const-ness is enforced in C++. A const member
function is permitted to modify and return a writable reference to memory that is held
and managed by an object. Since all of the functions defined above cause or enable
side effects that are programmatically accessible by clients, none would be considered
logically const functions.
class ex_String {
    char *d_str_p;
  public:
    // ...
    void makeNull() { d_str_p = 0; }            // physically non-const
    void makeEmpty() const { d_str_p[0] = 0; }  // physically const
Guideline
Deciding what is and what is not const behavior of an object is an important part of
the design of its class (see Section 10.3.1). Care must be taken to ensure that there are no
loopholes that could allow a client to circumvent that decision. Returning writable
access to an object's internal representation from a const member function can
short-circuit the ability of the compiler to help ensure that a const object is not modified.20
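A sketch of such a loophole follows. The class demo_String and the functions loophole, value, and clobber are hypothetical names chosen for this illustration; copying is disabled only to keep the sketch short.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical string that owns a heap-allocated buffer.
class demo_String {
    char *d_str_p;
    demo_String(const demo_String&);                // not implemented
    demo_String& operator=(const demo_String&);     // not implemented
  public:
    explicit demo_String(const char *s)
    : d_str_p(new char[std::strlen(s) + 1])
    {
        std::strcpy(d_str_p, s);
    }
    ~demo_String() { delete [] d_str_p; }

    // Physically const (the pointer member is untouched), so this
    // compiles, yet it hands out writable access to the representation.
    char *loophole() const { return d_str_p; }

    // Returning a pointer to const closes the loophole.
    const char *value() const { return d_str_p; }
};

// Modifies a supposedly const object without resorting to a cast.
inline void clobber(const demo_String& s)
{
    s.loophole()[0] = 'X';
}
```

The compiler accepts clobber without complaint; only the choice of return type in the class interface determines whether the logical value of a const object is protected.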
(a)
    // MANIPULATORS
    void setValue(double v);

    // ACCESSORS
    const char *name() const;
    te_Node *parent() const;
    te_Node *child1() const;
    te_Node *child2() const;
};

(b)
    // MANIPULATORS
    void setValue(double v);
    te_Node *parent();
    te_Node *child1();
    te_Node *child2();

    // ACCESSORS
    const char *name() const;
    const te_Node *parent() const;
    const te_Node *child1() const;
    const te_Node *child2() const;
};
When designing the system, it is easy to inadvertently provide ways in which a
non-const version of a reference to an object can be obtained from a const reference to
that same object. For example, Figure 9-12 provides two definitions of a te_Node
used to implement a binary tree. Figure 9-12a defines const member functions that
return writable access to the parent and each child. A function taking a reference to a
const te_Node could easily modify this supposedly const value, without ever
resorting to a cast, as follows:
In the context of a system, an object that does not allow a non-const reference to an
object to be obtained from a const reference to that same object alone (either directly
or indirectly) is said to be const-correct. Figure 9-12b defines a class that does not
provide a way to obtain writable access from a const reference through indirect
means. This implementation is const-correct because it preserves the intent of what
can and cannot be done with a const te_Node alone.
void f(const T1& a1, const T2& a2, ..., const TN& aN)
{
    // There is simply no way for me to get hold of a
    // writable reference to any of a1, a2, ..., aN or
    // any portion thereof (short of casting away const).
}
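The overloading pattern of Figure 9-12b can be sketched in a reduced form. The class below is an assumption made for illustration: it keeps only a parent link and a value, and setParent, value, and parentValue are hypothetical names that do not appear in the figure.

```cpp
#include <cassert>

// Reduced node in the spirit of Figure 9-12b (children omitted).
class te_Node {
    te_Node *d_parent_p;
    double   d_value;
  public:
    te_Node() : d_parent_p(0), d_value(0.0) {}

    // MANIPULATORS
    void setParent(te_Node *parent) { d_parent_p = parent; }
    void setValue(double v) { d_value = v; }
    te_Node *parent() { return d_parent_p; }

    // ACCESSORS
    double value() const { return d_value; }
    const te_Node *parent() const { return d_parent_p; }
};

// Given only a const reference, everything reachable is const as well.
inline double parentValue(const te_Node& node)
{
    const te_Node *p = node.parent();   // 'te_Node *p' would not compile
    return p ? p->value() : 0.0;
}
```

The const and non-const overloads of parent differ only in the const-ness they propagate, so a caller holding a const te_Node cannot reach a writable te_Node through it.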
Guideline
A system should be const-correct.
that allows a client to obtain a modifiable Edge from a const Node. Even if Edge were
to have been careful to perpetuate the const-ness as shown here:
as long as we can obtain a non-const reference to a Node from a non-const Edge, the
subsystem is not const-correct:
Since Edge defines a const member that returns a reference to a const Node, there is
a directed edge from const Edge& to const Node&. The non-const version of this
method is treated analogously. The problem is that Node contains a const member
that returns a non-const Edge reference; hence the upward sloping diagonal entry in
Figure 9-13a. This diagonal entry introduces a cycle that contains both the const and
the non-const versions of Node. Consequently, given a const reference to a Node, we
can potentially obtain a non-const reference to the identical Node. A similar cycle
involves both versions of Edge, so a const Edge reference could be converted to
non-const without the use of a cast.
The conversion graph shown in Figure 9-13b reflects the definition of Node given in
Figure 9-12b. There are conversion cycles between Node& and Edge&, and also
between const Node& and const Edge&. However, there is no one cycle that involves
both versions of either type; this small subsystem is const-correct. This interpretation
of const-correctness generalizes to apply to an entire system.
It is possible that an object may supply const information, such as a name that could
then be used to look up a writable version of the same object in some non-const
container object:
The above is not a violation of const-correctness because the const object alone is
not enough to obtain a non-const version of itself. Had the te_Tree been passed by
const reference, we would expect that a const version of the te_Tree::lookup
member function would instead return a pointer to a const te_Node, ensuring the
const-correct system shown in Figure 9-14.
There are situations where we do want a const member to return a non-const
reference or pointer to another type. In the PointIterHandle of Figure 6-75, we are able
to access a writable version of the contained PointIter from a reference to a const
handle:
The intent here was to emulate the semantics of a writable pointer passed by value;
that is, you can modify the indicated object but you cannot change the handle itself to
refer to a different object.21 However, as Figure 9-15 illustrates, this subsystem is
const-correct because there is no way to obtain a writable PointIterHandle
reference from either kind of PointIter.22
21 The usefulness of a const iterator obtained from a handle passed to a function by const reference
is dubious.
22 Note that verifying const-correctness for a system can be done through static analysis; however,
because violating const-correctness can depend on the underlying object (instance) network, it is
possible to have a const-correct system that cannot statically be proven to be so. Such systems are
more expensive to maintain and much harder to validate.
Guideline
Think twice (at least) before casting away const.
Casting away const-ness serves to undermine all the benefits we worked so hard in
this section to achieve.
Member functions intended for direct use by general clients must be declared public.
Free operators such as the equality or relational operators may be implemented in
terms of a primitive member function (see Figure 9-21). If this primitive member
function is not public, the dependent free operators will need to be declared friends of
the class, which lessens maintainability and enables abuse (see Section 3.6).
Section 9.1.7 Public, Protected, or Private Member Function 613
Member functions that are not public expose general users to uninsu-
lated implementation details.
Private member functions are intended for use within a class and by friends of a class.
Since friendship outside a component is discouraged, there is often little advantage to
non-virtual private member functions over static free functions defined at file scope in
the .c file (see Section 6.3.3). One common valid use of a private non-virtual member
function is to factor out complex but seldom needed behavior from an otherwise tiny
and frequently called inline function. For example, class my_Stack in Figure 10-13
uses the private non-inline member growArray() to implement the public inline push
method:
inline
void my_Stack::push(int value)
{
    if (d_sp >= d_size) {
        growArray();
    }
    // ...
}
Without using private methods, we would be forced either to implement the entire
push method out-of-line with a significant cost in performance, or to violate encapsu-
lation by making the growArray method public. Trying to implement the entire push
function inline is probably out of the question.
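A compact sketch of this arrangement is given below. It is not the my_Stack of Figure 10-13: the data member names other than d_sp and d_size, the initial capacity of 2, the doubling policy, and the pop and length members are all assumptions made so that the fragment is complete.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified stack (copying disabled for brevity).
class my_Stack {
    int *d_stack_p;     // dynamically allocated array
    int  d_size;        // physical capacity
    int  d_sp;          // index of the next free slot

    void growArray();   // rare path: private and not inline
    my_Stack(const my_Stack&);              // not implemented
    my_Stack& operator=(const my_Stack&);   // not implemented
  public:
    my_Stack() : d_stack_p(new int[2]), d_size(2), d_sp(0) {}
    ~my_Stack() { delete [] d_stack_p; }
    void push(int value);
    int pop() { return d_stack_p[--d_sp]; }
    int length() const { return d_sp; }
};

// The common path is tiny and therefore a good candidate for inlining.
inline
void my_Stack::push(int value)
{
    if (d_sp >= d_size) {
        growArray();
    }
    d_stack_p[d_sp++] = value;
}

// In a real component this definition would live in the .c file.
void my_Stack::growArray()
{
    int *tmp = new int[d_size * 2];
    std::memcpy(tmp, d_stack_p, d_size * sizeof *tmp);
    delete [] d_stack_p;
    d_stack_p = tmp;
    d_size *= 2;
}
```

Clients see only push; the reallocation logic stays encapsulated and out of every call site into which push is expanded.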
Private virtual functions make sense when the behavior defined in a derived class is
used only by members and friends of the base class. A potential use of a private
virtual function can be found in the surfaceEquation member of class Solid described
in Section 6.6.3 (see Figure 6-84).
Protected member functions are explicitly earmarked for derived-class authors. Pro-
tected member functions expose general clients to implementation details and are
often best avoided (see Section 6.3.4). Although private virtual functions are not
accessible to derived classes, derived-class authors may be expected to supply definitions
for these functions; in this sense, private virtual functions are exceptional:
class Base {
private:
virtual void programMe();
};
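How a derived class supplies a definition for a function it cannot call may be easier to see in a complete sketch. The variation below is hypothetical: programMe is given a string result and a const qualifier, and the public member run and the class demo_Custom are invented for the illustration.

```cpp
#include <cassert>
#include <string>

class Base {
  private:
    virtual std::string programMe() const { return "default"; }
  public:
    virtual ~Base() {}

    // Only the base class invokes the private virtual function.
    std::string run() const { return "[" + programMe() + "]"; }
};

class demo_Custom : public Base {
  private:
    // Overrides Base::programMe even though Base::programMe is not
    // accessible here; access control does not affect overriding.
    std::string programMe() const { return "custom"; }
};
```

The derived-class author customizes behavior without gaining the ability to invoke the base-class version, and general clients see only run.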
The issue of whether to return by value or not comes down to whether there is some-
thing within the object or argument list suitable to reference. For example, there is no
reasonable implementation23 of
23 See meyers, Item 23, pp. 82-84; Item 31, pp. 102-105.
Section 9.1.8 Return by Value, Reference, or Pointer 615
value, and yet it is still fully encapsulating. The four ways to return a (Foo) value are
summarized in Figure 9-16.
In cases where the function may fail, return by value or reference may not be an option.24
Sometimes we can return the value by pointer, which can be 0 on failure. Another
option is to return an integer status to indicate success or failure, and to return the
object itself through the argument list.
Guideline
For functions that return an error status, an integral value of 0 should
always mean success.
For functions that return an error status as either an int or some enumerated type, it is
convenient to have a way of knowing whether or not this function worked25 without
having to inspect some header file to determine the appropriate success value for this
particular function. Traditionally, a status of zero indicates success, a non-zero status
indicates failure, and the particular non-zero value may be used to provide additional
information to clients.
Often there is exactly one way for a function to work and several
ways for it to fail; as clients, we may not care why it failed.
Very often, clients will not care why an operation failed; in such cases a simple test
for non-zero status is sufficient, as indicated in Figure 9-17. In some cases, this con-
vention may allow us to avoid including an additional header enumerating the error
conditions, thereby reducing unwanted compile-time coupling.
    if (0 != g(...)) {
        status = BAD;
        // ...
    }
    // ...
    return status;
}
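The convention can be sketched with a small function pair. Both parsePositive and parseOrDefault, along with the particular non-zero codes, are hypothetical names and values chosen for this illustration; the result is returned through the argument list as described earlier in this section.

```cpp
#include <cassert>
#include <cstdlib>

// Loads '*result' and returns 0 on success; returns a non-zero status
// (with '*result' untouched) on failure.
int parsePositive(int *result, const char *text)
{
    enum { SUCCESS = 0, NOT_A_NUMBER = 1, NOT_POSITIVE = 2 };

    char *end = 0;
    long value = std::strtol(text, &end, 10);
    if (end == text || *end != '\0') {
        return NOT_A_NUMBER;
    }
    if (value <= 0) {
        return NOT_POSITIVE;
    }
    *result = int(value);
    return SUCCESS;
}

// A client that does not care why the call failed tests only for a
// non-zero status and needs no knowledge of the individual codes.
int parseOrDefault(const char *text, int fallback)
{
    int value;
    if (0 != parsePositive(&value, text)) {
        return fallback;
    }
    return value;
}
```

Because success is always 0, parseOrDefault would continue to work unchanged if further failure codes were added later.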
For non-const member functions, returning a non-const reference to the object itself
is always a viable option. Returning a pointer or reference to an internal part of the
object potentially limits the implementation choice; its impact on encapsulation
should be considered carefully (see Section 8.3).
For polymorphic objects such as geom_Shape (see Figure 9-8), it is not possible to
return an object by value. Returning a clone (dynamic copy) of the object by
non-const pointer places the burden of deallocation on the client, and is prone to memory
leaks. The use of a reference here would be obscure and inappropriate (see Section
9.1.11). A preferred approach for returning newly allocated polymorphic objects
would be to pass in a pointer to a handle (see Section 6.5.3) explicitly designed to
hold a pointer to the base class geom_Shape:
class geom_ShapeUtil {
    void create(geom_ShapeHandle *handle, const char *typeName);
        // Create a new shape of the type specified by typeName and
        // load it into the handle passed in via a non-const pointer.
};
Guideline
Functions that answer a yes-or-no question should be worded appropriately
(e.g., isValid) and return an int value of either 0 ("no") or 1 ("yes").
if (1 == angle.isAcute()) { /* ... */ }
If you cache this value as a flag in the object, you might be tempted to return a
masked bit that could have some non-Boolean value (e.g., 8). Converting a
non-Boolean value x to a Boolean value y is as simple as y = !!x. An appropriate
implementation of isAcute might look as follows:
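One possible shape for such a function is sketched here under stated assumptions: the class layout, the constructor taking degrees, and the mask value 0x8 are hypothetical; only the names geom_Angle, isAcute, d_flags, and ACUTE_MASK appear in the surrounding text.

```cpp
#include <cassert>

class geom_Angle {
    enum { ACUTE_MASK = 0x8 };  // assumed bit position
    int d_flags;                // cached properties of the angle
  public:
    explicit geom_Angle(double degrees) : d_flags(0)
    {
        if (degrees > 0 && degrees < 90) {
            d_flags |= ACUTE_MASK;
        }
    }

    // The masked bit is 0 or 8; '!!' normalizes it to 0 or 1.
    int isAcute() const { return !!(d_flags & ACUTE_MASK); }
};
```

Without the double negation, a test such as 1 == angle.isAcute() would fail for an acute angle, because the raw masked value is 8.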
Since the ANSI/ISO committee has adopted bool as a distinct integral type in C++,
we should probably consider returning bool instead of int in such cases once this
new fundamental type becomes generally available:26
bool geom_Angle::isAcute() const
{
    return d_flags & ACUTE_MASK;
}
Guideline
Avoid declaring results returned by value from functions as const.
Results returned from a function by value are rvalues. In the case of fundamental
types, declaring an rvalue const is redundant, confusing, and can interfere with
template instantiation:
Having just one function body is often easier to maintain than several overloaded ver-
sions. 28 In most cases it is easy enough to use inline functions to create overloaded
versions that, in effect, allow optional arguments to be located in the middle of an
argument list. 29
Figure 9-18 contrasts the use of overloaded functions with that of default arguments
for factoring common code within a constructor call. As shown in Figure 9-18a, fac-
toring the implementations of several overloaded constructors requires the use of an
auxiliary function init, since one constructor cannot usefully be called from
another.30 Such factoring does not allow us to take advantage of the initialization lists
of the constructors. The use of inline functions in Figure 9-18a to forward calls from
several overloaded functions (sometimes referred to as inline forwarding) eliminates
the potential overhead of nested function calls. At the same time, inline forwarding
negates the insulating value of having separate overloaded functions implemented
out-of-line.
(a)
class geom_Point {
    int d_x;
    int d_y;
  private:
    void init(int x, int y);
  public:
    geom_Point();
    geom_Point(int x, int y);
    // ...
};

inline
void geom_Point::init(int x, int y)
{
    d_x = x;
    d_y = y;
}

inline
geom_Point::geom_Point()
{
    init(0, 0);
}

inline
geom_Point::geom_Point(int x, int y)
{
    init(x, y);
}

(b)
class geom_Point {
    int d_x;
    int d_y;
  public:
    geom_Point(int x = 0, int y = 0);
    // ...
};

inline
geom_Point::geom_Point(int x, int y)
: d_x(x)
, d_y(y)
{
}
void g()
{
geom_Point a = 5;
a = a + 10;
}
Default arguments can be more self-documenting, more compact, and more easily
understood by clients than multiple overloaded functions because they place more
information in the header file. As such, default arguments are at odds with the goal of
insulation. For more information about how to reduce compile-time coupling when
using default arguments, see Section 6.3.8.
Guideline
Avoid default arguments that require the construction of an
unnamed temporary object.
Passing user-defined types as default arguments is cumbersome at best; not all objects
make sense as defaults. Constructing a temporary object to pass in by default, like
passing an object by value, is expensive and should be avoided.
Never pass a user-defined type (i.e., class, struct, or union) to a function
by value.
When it comes to returning a value through the argument list, there are two mind sets:
Wherever feasible, we would like to use the language itself to express our intention
instead of relying on a comment. The C++ language definition states that a pointer
may be null but that a reference may not. Returning a value through a modifiable ref-
erence argument makes the semantics clear: the object to receive the value must be
supplied by the client. Any documentation to reiterate this requirement would be
unnecessary and redundant. Consequently there is no need to test for a null
reference, and there is no portable way to do so anyway. Returning an object by
non-const pointer can therefore be reserved exclusively for results that are truly optional;
that is, the pointer is always tested inside the function, and if a null pointer is supplied,
the result is not loaded into the object.
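A truly optional result returned by pointer might look like the following sketch. The function divide and its parameter names are hypothetical; the only point illustrated is that the pointer is tested inside the function before anything is loaded through it.

```cpp
#include <cassert>

// Returns numerator / denominator (denominator must be non-zero). The
// remainder is an optional result: it is loaded only if the caller
// supplies a non-null address.
int divide(int numerator, int denominator, int *remainder)
{
    if (remainder) {
        *remainder = numerator % denominator;
    }
    return numerator / denominator;
}
```

A client interested only in the quotient passes 0 for the last argument, while a client who wants both values passes the address of an int.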
In the other camp,32 classical theory discourages functions that modify their argu-
ments; such functions are known to be more difficult to maintain. Remember that, his-
torically, most of the cost incurred over the life of a system is in maintenance and
enhancement, not initial development. Allowing functions to modify their arguments
by reference makes it more difficult for software engineers maintaining an unfamiliar
The expressive power of writable references applies only if you happen to look at the
appropriate header file. From looking at just the client code in Figure 9-19, it is not at
all clear what caused the value of the my_String variable name initialized with
"Laurel" to come to hold the (incorrect) value "Hardy".
//
// Output:
//      name = Hardy
Functions that actually do modify their arguments are relatively scarce. By adopting
the guideline of modifying function arguments only through non-const pointers, we
make such functions easy to spot from the client code. Figure 9-20 shows that only
one of the three functions that operate on name could legitimately have modified its
value; in this example, we can look in only one place instead of three, simply by virtue
of having followed this guideline.
Guideline
Be consistent about returning values through arguments (e.g., avoid
declaring non-const reference parameters).
// Output:
//      name = Hardy
Figure 9-21 demonstrates that even in the body of a function that takes a non-const
pointer argument, we need look no further than the definition of this function (rather
than the header files declaring each called function) to infer which called functions
might modify that argument.
void f(my_String *name, int i, int j)
{
    int s = my_Stuff::funcX(*name, i);          // should not modify name
    // ...
    int t = your_Problem::funcY(name, j);       // potentially modifies name
    // ...
    int u = their_Thing::funcZ(*name, i + j);   // should not modify name
    // ...
}
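The effect of the guideline on client code can be sketched with std::string standing in for my_String. The functions length, makeUpper, and demo are hypothetical and exist only to show what each call site reveals.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Takes its argument by const reference: cannot modify the caller's string.
int length(const std::string& name)
{
    return int(name.size());
}

// Takes its argument by non-const pointer: the caller must write '&'.
void makeUpper(std::string *name)
{
    for (std::string::size_type i = 0; i < name->size(); ++i) {
        unsigned char c = static_cast<unsigned char>((*name)[i]);
        (*name)[i] = char(std::toupper(c));
    }
}

std::string demo()
{
    std::string name("Laurel");
    int n = length(name);   // no '&': name cannot change here
    makeUpper(&name);       // '&' flags the one call that may modify name
    return n == 6 ? name : std::string();
}
```

When tracking down an unexpected value of name, only the call that takes its address needs to be inspected.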
Historically, passing objects by pointer and passing objects by reference have not
always been equivalent when it came to user-defined conversions. Until the C++
language definition changed for release 2.0 of CFRONT,33 a function taking a non-const
reference to your_Class would allow its argument to undergo a user-defined
conversion to a temporary before the return-by-argument assignment could occur. The value
would therefore not be returned to the caller, and this error would have gone
undetected until runtime. By contrast, a function taking a non-const pointer never permitted
user-defined conversion to occur; a type error would always have been detected at
compile time. The latter is the desired behavior and is consistent with that of member
operators such as operator=, which will not implicitly convert the object that they
modify (see Section 9.1.2).
It is worth noting that standard pointer conversions continue to work as expected when
passing the object to be modified via non-const pointer. In other words, passing the
address of a derived object to a function taking a pointer to one of its public base classes
makes sense and the conversion will occur implicitly. It is only the unwanted,
user-defined conversion that is suppressed by requiring the client to pass the modifiable
argument's address. Fortunately, most current compilers will at least warn you when a
user-defined conversion causes a temporary to be bound to a non-const reference parameter.
The C language does not permit function arguments to be modified directly; C++
does. When first using C++, classic C programmers will often forget to insert the
const qualifier before a reference parameter where appropriate, leaving a reader to
wonder whether the function author intended the argument to be modifiable or not.
Discouraging any use of non-const references in function arguments makes the intent
clear (or the defect immediately obvious).
Forcing the client to pass the address for a modifiable argument often requires the cli-
ent to type the extra keystroke "&". However, this extra keystroke is worth its weight
in gold when it comes to advertising that a function call can potentially modify its
argument. And because most functions do not modify their parameters, almost all
function calls can quickly be eliminated from suspicion.
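A minimal sketch of how the extra "&" advertises modification at the call site follows. The function names (countVowels, appendSuffix, renameAndCount) and the use of std::string are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <string>

// Read-only access: the argument is passed by const reference.
int countVowels(const std::string& text)
{
    int n = 0;
    for (char c : text) {
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u') {
            ++n;
        }
    }
    return n;
}

// Modifiable access: the argument is passed by non-const pointer, so a
// caller holding an object must write '&' to call this function.
void appendSuffix(std::string *text, const std::string& suffix)
{
    assert(text);               // unlike a reference, a pointer may be null
    *text += suffix;
}

// Returns the change in vowel count caused by renaming.
int renameAndCount(std::string *name)
{
    int before = countVowels(*name);   // '*name': cannot modify the string
    appendSuffix(name, "_v2");         // 'name' passed on: may modify it
    return countVowels(*name) - before;
}
```

In client code such as `appendSuffix(&s, "_x")`, the ampersand is the only signal a reader needs in order to suspect that `s` may change; the call `countVowels(s)` can be ruled out without consulting its declaration.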
of their usage relatively clear, compared with returning some little-known object
through a modifiable reference.
Guideline
Another related issue is that passing a user-defined type by const reference is so
common that we might never suspect the importance of a particular value's being an
lvalue. Consider the scenario of Figure 9-22. An infinite precision integer type
my_BigInt is defined that can be constructed from a fundamental int type. The
my_BigIntSet is a homogeneous collection that stores only the address of the object
supplied to its add function. Suppose a naive user tries to create a function g that adds
three integers to the set. Each integer is implicitly converted to a temporary
my_BigInt, which is guaranteed to remain valid only until the function returns; the
temporary can be destroyed any time thereafter until exiting the scope in which the
temporary was created.34 If the second temporary my_BigInt is no longer valid by the
time the isMember method of my_BigIntSet is invoked, a memory reference through
a bad pointer value could easily cause this program to crash!
class my_BigInt {
    // ...
  public:
    my_BigInt(int i);
    // ...
};

class my_BigIntSet {
    const my_BigInt **d_set_p;
    int d_size;    // physical size
    int d_length;  // cardinality
  public:
    // ...
    void add(const my_BigInt& bi);  // bad idea: should pass by pointer
        // Stores the address of this object in the set.
    // ...
};

void g()
{
    my_BigIntSet set;
    set.add(1);       // Address of temporary my_BigInt Added
    set.add(2);       // Address of temporary my_BigInt Added
    set.add(3);       // Address of temporary my_BigInt Added
    set.isMember(2);  // core dump?!
}
Without careful scrutiny of the class definitions, the client has absolutely no warning
that the address of the object (and not a copy of the object) will be retained. Had we
instead defined the add function of my_BigIntSet to take a const pointer, we would
have alerted the client that this function considers the lvalue to be important and, at
the same time, documented that fact directly in the declaration itself.35 The modified
usage model for my_BigIntSet is illustrated in Figure 9-23.
class my_BigIntSet {
    // ...
  public:
    // ...
    void add(const my_BigInt *bi);
    // ...
};

void g()
{
    my_BigIntSet set;
    set.add(1);   // compile time error!
    set.add(&2);  // compile time error!
    // ...
}
Storing the address of an argument to a function is bad form. If the argument is passed
by value, it is represented as a local automatic variable and the address will become
invalid as soon as the function returns. If the argument is passed by const reference,
we have no guarantee that it does not refer to a temporary. Passing the argument by
const pointer instead of by const reference suppresses the implicit creation of a
temporary, which is desired behavior when we plan to hold onto that address. Exceptions
to this guideline do occur in very common idioms when it is obvious from context
(e.g., an iterator) that the address of the object must be stored. Note that when a
function stores the address of a non-const argument for later modification, the two
guidelines presented in this section (e.g., modifiable + lvalue) both apply; in this case, the
object should always be passed by non-const pointer.
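As a minimal sketch of the const-pointer alternative, the hypothetical IntPtrSet below stores addresses of plain int objects; the class, its use of std::vector, and its member names are assumptions for illustration and are not the my_BigIntSet of the figures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical set that retains addresses, not copies.
class IntPtrSet {
    std::vector<const int *> d_items;   // addresses supplied by clients
  public:
    // Stores the address itself. The const pointer parameter tells the
    // client that the lvalue matters and must outlive its stay in the set.
    void add(const int *item)
    {
        assert(item);
        d_items.push_back(item);
    }

    // Returns true if any stored address refers to an int equal to 'value'.
    bool isMember(int value) const
    {
        for (const int *p : d_items) {
            if (*p == value) {
                return true;
            }
        }
        return false;
    }
};
```

With this interface a client must write `set.add(&a)` for some named object `a`; an attempt such as `set.add(&1)` does not compile, so no temporary can slip into the set unnoticed.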
Beyond modification, functions that delete an object should always take a non-const
pointer to that object and never a non-const reference. In order to delete an object,
you must supply a pointer to the delete operator. Taking the address of an object to be
deleted is error prone; some compilers (e.g., CFRONT) will generate a warning
message, cajoling the developer to add an extra assignment (or worse, a cast) to a pointer
variable. Even more compelling is the fact that the C++ language specification
permits the value of a deleted pointer to be adjusted (e.g., to 0) by the compiler.36 A null
(or invalid) reference is not permitted in the language.37
Using pointer instead of reference arguments to capture the semantic properties
mentioned in this section has yet another benefit in terms of maintenance. If a function
that previously did not modify or take the address of an argument should suddenly be
changed to do so, all clients of that function would be forced to examine their code
before they could recompile. This is exactly what we want! Making such significant
semantic changes with syntactic compatibility could silently lead to subtle bugs and
very unpleasant surprises.
36 ellis, Section 5.3.4, p. 63.
37 cargill, Chapter 6, p. 125; ellis, Section 8.4.3, p. 153.
Section 9.1.12 Pass Argument as const or Non-const 629
Guideline
Whenever a parameter passes its argument by reference or pointer to
a function that neither modifies that argument nor stores its writable
address, that parameter should be declared const.
Guideline
Avoid declaring parameters passed by value to a function as const.
Whether or not this local copy is changed is an implementation detail of the function;
declaring it const exposes this decision in the interface, compromising not only
insulation but also readability. This is not an issue for user-defined types since we never
pass them by value anyway (see Section 9.1.11).
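One way to see that const on a by-value parameter is purely an implementation matter: in standard C++ a top-level const on such a parameter is not part of the function's type, so a declaration without it and a definition with it name the same function. The function clampToByte below is a hypothetical example.

```cpp
#include <cassert>

// Declaration as clients would see it in a header: no 'const' on the
// by-value parameter.
int clampToByte(int value);

// Definition: the top-level 'const' here only prevents the body from
// changing its local copy; it defines the function declared above.
int clampToByte(const int value)
{
    if (value < 0) {
        return 0;
    }
    if (value > 255) {
        return 255;
    }
    return value;
}
```

The interface stays free of the qualifier, and the implementer remains free to add or drop it in the definition without affecting any client.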
Guideline
Consider placing parameters (except perhaps those with default
arguments) that enable modifiable access before parameters that
pass arguments by value, const reference, or const pointer.
Except for (optional) parameters with default arguments added after a function is
already in use, parameters that allow their arguments to be modified should precede
parameters whose arguments are passed by value, const reference, or const pointer.
Apart from making where to look for modifiable arguments more uniform, this rec-
ommendation is admittedly arbitrary; however, it is a classic style that is language
independent, predates C++ (and even C), and has proven useful over the years.
Guideline
Avoid granting friendship to individual functions.
From Section 6.2.3 we know that inline functions affect insulation. Apart from expos-
ing the implementation, large inline functions can increase executable size, poten-
tially making an integrated system run slower than if some of these functions had
been declared non-inline. If insulation is not an issue, the first question is whether the
object code resulting from the body of the function is larger or smaller than the non-
inline function call. If the inline object code is no bigger than a function call, inlining
will not increase executable size.
Guideline
Avoid declaring a function inline whose body produces object code
that is larger than the object code produced by the equivalent non-
inline function call itself.
For functions that merely get and set data members, it is often reasonable to use an
inline function without first acquiring performance data. For function bodies that gen-
erate more object code than the corresponding non-inline function call, performance
analysis at the system level should precede the decision to define the function inline.
Passing additional arguments to a function increases the amount of code generated for
a non-inline function call. Therefore, an inline function taking several arguments
could justify a somewhat larger function body before profiling.
632 Designing a Function Chapter 9
If a function is called frequently and performance is critical, the next question to ask
is, "From how many distinct locations can the function be called?" If access to the
function is restricted and the function is known to be called from only a few distinct
locations, then inlining is not likely to be an issue with respect to executable size. If
the function is large and may be called from many locations, the function is not likely
to be a candidate for inlining.
Guideline
Finally, inlining is merely a hint to the compiler; there is no way to ensure that a
function will actually be inlined. Whenever we take the address of a function declared
inline, we force a static (non-inline) version of the function to be generated in the
translation unit where the address was taken. If a function declared inline is too large
or too complex, it might not inline; the metrics that control this are compiler dependent.
When a function does not inline, the compiler defines a static version of the inline
function in each translation unit that uses the inline. These multiple static copies may
cause the executable to be bigger and run more slowly than if the function had been
declared non-inline. Fortunately, there are usually ways to ask a compiler to report
functions that do not inline.39
In Chapter 3 we discussed the uses relation in terms of user-defined types. In this sec-
tion we address the use of various fundamental types in the interface of a function.
Guideline
Avoid using short in the interface; use int instead.
41 In what follows I am assuming a 32-bit (or larger) architecture. If you are working on a 16-bit
machine or an embedded system, some of the statements in this section will not apply.
[Figure: given char c and short s, the operands of c + s and the arguments of f(c, s) are promoted to (int) temporaries.]
(a) Integral Promotion in Binary Operation (b) Integral Promotion in Function Call
Figure 9-25 illustrates a class that uses short instead of int in its interface. Why
might we want to do such a thing? The motivation comes from a desire to express
intent directly in the declaration and avoid having to resort to comments. If we declare
that a parameter is a short, no one would ever try to pass in anything larger and so we
don't have to check it ourselves, right?
class my_Point {
    short d_x;
    short d_y;
  public:
    // CREATORS
    my_Point(short x, short y);
    my_Point(const my_Point& p);
    ~my_Point();
    // MANIPULATORS
    my_Point& operator=(const my_Point& p);
    void x(short x);
    void y(short y);
    // ACCESSORS
    short x() const;
    short y() const;
};
Principle
The fact of the matter is that documenting information in the header file that is useful
only when looking directly at the header file itself can be of limited utility when it
comes to maintenance (see Section 9.1.11). Clients will pass an integer literal or
expression regardless of how we attempt to document it in the header; declaring the
integer to be a short simply causes the truncation to occur outside the function rather
than inside, making it impossible for the function itself to detect an overflow error. To
the client, the perception is the same: the function doesn't work.
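The difference can be sketched with a hypothetical Point class that stores a short but accepts an int. The class, the bool return value of setX, and the range check against SHRT_MIN and SHRT_MAX are assumptions made for illustration, not the author's my_Point.

```cpp
#include <cassert>
#include <climits>

// Hypothetical point: int in the interface, short in the implementation.
class Point {
    short d_x;
  public:
    Point() : d_x(0) {}

    // Stores 'x' and returns true if it fits in a short; otherwise leaves
    // the point unchanged and returns false. Because the parameter is an
    // int, an oversized value arrives intact and can be detected here.
    bool setX(int x)
    {
        if (x < SHRT_MIN || x > SHRT_MAX) {
            return false;
        }
        d_x = static_cast<short>(x);
        return true;
    }

    int x() const { return d_x; }
};
```

Had the parameter been declared short, a call such as `p.setX(70000)` would have converted the value before the function body ever ran, leaving nothing for the range check to catch.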
1. Does using a short in the interface ensure at compile time that overflow
will not occur at runtime?
No. The C++ language allows arithmetic overflow to occur silently at runtime.
Exposing short in the interface limits the size of the coordinates that any
implementation can accommodate and eliminates our ability to detect
overflow; limiting implementation choice is a symptom of reducing
encapsulation.
If anything, the argument may have to have its high-order bits masked
off, requiring additional work and therefore reducing runtime efficiency.
template<short N>
class pub_BitVec {
    int d_bits : N;
  public:
    pub_BitVec();
    int operator[](int i);
    void set(int i);
    void clear(int i);
    void toggle(int i);
};
The C++ language requires that binary operators involving one unsigned integer first
convert the other integer to unsigned before performing the operation. Usually this is
not a problem; however, when it is, it's not at all easy to debug.
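A minimal sketch of this conversion rule follows. The helper names (scaled, isGreater, isGreaterSigned) are hypothetical; the behavior shown relies only on the standard rule that an int operand is converted to unsigned when the other operand is unsigned int. Some compilers warn about the mixed comparison, which is exactly the situation being illustrated.

```cpp
#include <cassert>
#include <climits>

// Multiplies an unsigned value by an int. The int operand is converted to
// unsigned first, so a negative factor wraps around to a large value.
unsigned int scaled(unsigned int count, int factor)
{
    return count * factor;
}

// Compares an unsigned value with an int using the built-in operator.
bool isGreater(unsigned int count, int limit)
{
    return count > limit;        // 'limit' is converted to unsigned here
}

// The same comparison carried out entirely in signed arithmetic.
// Assumes 'count' is small enough to be represented as an int.
bool isGreaterSigned(unsigned int count, int limit)
{
    return static_cast<int>(count) > limit;
}
```

With a count of 3 and a limit of -1, isGreater reports false while isGreaterSigned reports true; the only difference is the type in which the comparison is performed.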
Guideline
Avoid using unsigned in the interface; use int instead.
// Output:                        // Output:
//     john@john: a.out           //     john@john: a.out
//     3 * -1 = 4294967293        //     (3 > -1) = 0
//     john@john:                 //     john@john:
[Diagram: in both 3 * -1 and 3 > -1, the int operand -1 is converted to (unsigned) -1 before the operation is applied.]
Figure 9-27a illustrates that when a signed and an unsigned value are involved in
a binary operation, the bit pattern of the signed number is silently reinterpreted
as an unsigned number. No actual temporary is created. For most integer
representations, the result is the largest number that will fit in an unsigned (e.g.,
class my_Array {
    int *d_array_p;
    unsigned short d_size;  // bad idea: short used in implementation
                            // only, but see Section 10.1.2.
  public:
    // CREATORS
    my_Array(unsigned int size);
    my_Array(const my_Array& array);
    ~my_Array();
    // MANIPULATORS
    my_Array& operator=(const my_Array& array);
    int& operator[](unsigned int i);
    // ACCESSORS
    int operator[](unsigned int i) const;
    unsigned int size() const;
};
Figure 9-28: Using unsigned Integers in the Interface (Bad Idea)
One might argue that we deserve what we get when we mix negative and unsigned
integers. Perhaps, when we do it. But consider the seemingly innocent my_Array
class shown in Figure 9-28.
#include <assert.h>
#include <iostream.h>

void printForwardMovingAverage(const my_Array& a, int width)
{
    assert(width > 0);
    const int N = width - 1;
    int total = 0;
    for (int i = -N; i < a.size(); ++i) {
        if (i + N < a.size()) {
            total += a[i + N];
        }
        cout << i << '\t' << double(total)/width << endl;
        if (i >= 0) {
            total -= a[i];
        }
    }
}
Figure 9-29: Innocent Client Function to Print the Forward Moving Average
Section 9.2.2 Using unsigned in the Interface 639
As a client of the my_Array class, I have written the function shown in Figure 9-29,
which takes an instance of my_Array and prints its forward moving average of speci-
fied width. As a responsible developer, I whipped up the little test driver shown in
Figure 9-30 to verify that my function worked, and it did not.
// test.c
#include <stdlib.h>  // atoi()
Figure 9-30: Test Driver for printForwardMovingAverage Function
The output I expected for the default values (an array of SIZE 4 containing all 1's and
a WINDOW width of 2) was supposed to look as shown in Figure 9-31a; the disappointing
reality is shown in Figure 9-31b.
Figure 9-31: Driver Output for printForwardMovingAverage Function
Even though we know better than to mix unsigned and int, one does not always
check a header for each integer value that is returned. In this case, it was size() that
did us in. The problem is again that comparing a negative number with an unsigned
int will usually go the wrong way, as illustrated again in Figure 9-32.
#include <iostream.h>

main()
{
    my_Array a(10);
    cout << "size = " << a.size() << endl;
    if (a.size() > -1) {
        cout << "size is positive or zero." << endl;
    }
    else {
        cout << "size is negative!!!" << endl;
    }
}

// Output:
//     john@john: a.out
//     size = 10
//     size is negative!!!
//     john@john:
Figure 9-32: Comparing an unsigned int Return Value Against a Negative int Value
All we need to do to repair the damage in this case is to replace the line
for (int i = -N; i < a.size(); ++i) {
Bugs occurring from the use of unsigned in the interface are frustrating and notoriously
hard to detect. Looking at the problem with a debugger, it can seem that the if
statement itself in Figure 9-32 must be broken. It is often quite a stretch to guess that the
return value of the function is declared unsigned and is implicitly converting some
other negative number to a positive value as a result of a binary comparison operation.
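One possible repair for a loop of this kind, offered as a sketch rather than as the author's own correction, is to convert the unsigned size to an int once and compare int with int thereafter. Here std::vector<int> stands in for my_Array (its size() is also unsigned), and the function returns the averages instead of printing them; both choices are assumptions made so that the sketch is self-contained.

```cpp
#include <cassert>
#include <vector>

// Returns the forward moving averages of 'a' for positions -(width - 1)
// through a.size() - 1, following the loop structure of the client
// function discussed above.
std::vector<double> forwardMovingAverage(const std::vector<int>& a, int width)
{
    assert(width > 0);
    const int N = width - 1;
    const int size = static_cast<int>(a.size());  // convert once, up front
    std::vector<double> result;
    int total = 0;
    for (int i = -N; i < size; ++i) {             // int compared with int
        if (i + N < size) {
            total += a[i + N];
        }
        result.push_back(double(total) / width);
        if (i >= 0) {
            total -= a[i];
        }
    }
    return result;
}
```

Because `i` starts out negative, comparing it directly with an unsigned size would convert it to a very large positive value and the loop body would never execute; with the signed copy, the loop runs as intended.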
1. Does using unsigned in the interface ensure at compile time that negative
numbers will not be passed in at runtime?
No. The C++ language allows the bit pattern to be reinterpreted silently at
runtime.
2. Does using unsigned in the interface allow for the possibility of checking
for negative values?
Yes, but you have to coerce the unsigned back to an int internally.
4. Does using unsigned increase the size of the positive integer that can be
stored?
Yes, by 1 bit. This extra bit is rarely useful. If the extra capacity is
needed, there is a risk of losing data when the unsigned is converted back
to an int (see Section 10.1.2).
It increases it. Without looking at the header file, there is no safety
advantage, since the conversion is done silently. Naively using an unsigned
return value in an expression that involves a negative int value will cause
the client's code to break at runtime.
Exposing unsigned in the interface effectively limits the values that any
implementation will accommodate, thereby reducing encapsulation.
7. Does using unsigned in the interface interfere with overloaded function
resolution?
Although in this book we cavalierly assume that an int holds at least 32 bits, in fact
only 16 bits are needed to satisfy the ANSI requirement for type int.42 If you are
working on a 16-bit machine, the following guideline clearly does not apply.
Guideline
Avoid using long in the interface; assert(sizeof(int) >= 4) and use
either int or a user-defined large-integer type instead.
The C++ language defines a long integer to be at least as large as an int. A long int
means "the biggest integer you have"; an int means "the biggest integer that is
efficient" (typically the natural word size of the computer). On a 16-bit machine, a long
is probably a double word (32 bits). On most commercially available compilers for
32-bit workstations, a long is a single 32-bit word. On 64-bit architectures, an int
will probably continue to be set at 32 bits for compatibility with existing programs,
while a long might be 64 bits. If portability is an issue, any assumption that a long int
is more than 32 bits is a recipe for failure (not every machine you may want to port to
will have a 64-bit long int).
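The size assumptions discussed here can be checked mechanically. The sketch below uses static_assert, which postdates this text, as one way to state the 32-bit assumption at compile time; the function longIsWiderThanInt is a hypothetical helper.

```cpp
#include <cassert>
#include <climits>

// Compile-time statement of the assumption that int has at least 32 bits.
// On a platform with a 16-bit int, this line fails to compile.
static_assert(sizeof(int) * CHAR_BIT >= 32, "int must hold at least 32 bits");

// Returns true if a long offers more range than an int on this platform.
// The answer is platform dependent, so portable code should not rely on it.
bool longIsWiderThanInt()
{
    return LONG_MAX > INT_MAX;
}
```

On platforms where long and int have the same size the function returns false, which is precisely why declaring a parameter long does not portably buy additional capacity.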
Figure 9-33 illustrates a component that uses long int instead of int in the interface.
Why might we want to do such a thing? Usually the answer to this question is
something like, "I want it to hold the biggest integers it can." For small projects on small
machines, this reason might be sufficient. For large projects running on
industrial-strength workstations on multiple platforms, the int is either big enough or it isn't;
if you're not sure, then it isn't. Fortunately, the C++ language enables us to define a
larger integer type ourselves.
class my_Point {
    long int d_x;
    long int d_y;
  public:
    // CREATORS
    my_Point(long int x, long int y);
    my_Point(const my_Point& p);
    ~my_Point();
    // MANIPULATORS
    my_Point& operator=(const my_Point& p);
    void x(long int x);
    void y(long int y);
    // ACCESSORS
    long int x() const;
    long int y() const;
};
1. Does using a long in the interface ensure increased capacity over an int?
Not on all platforms. Often an int and a long are the same size. If you
depend on increased capacity, your code will not be portable.
3. Does using long in the interface interfere with overloaded function
resolution?
The C++ language enables floating-point computation to occur in each of the three
floating-point types:
• float,
• double, and
• long double.
43 Going from int to long is not an integral promotion; rather it is a standard conversion. Converting
an int to a long and its (const/volatile) equivalents is the only "non-lossy" standard conversion in
the language.
Section 9.3 Special-Case Functions 645
Guideline
Consider using double exclusively for floating-point types used in the
interface unless there is a compelling reason to use float or long
double.
Historically, C required all floating-point expressions to be of type double and did not
support long double. ANSI C introduced the ability to do arithmetic directly with
float values. Most C library calls pass and return a floating-point value as a double.
These days, much of the computer hardware is optimized to make double
floating-point calculations run as quickly as possible. In fact, a double precision multiply on
my machine is an order of magnitude faster than an integer multiply (which is
implemented as a subroutine).
Principle
In most cases that arise in practice, the only fundamental types you
need in order to represent integer and floating-point numbers in the
interface are int and double, respectively.
The same issues of consistency, error checking, operator overloading, and template
instantiation that applied to the integer types apply to floating-point types as well.
There are a few special member functions that warrant some discussion. Conversion
operators (i.e., single-argument constructors and "cast" operators) and compiler-gen-
erated functions (such as the copy constructor, the assignment operator, and, in partic-
ular, the destructor) deserve specific mention.
Implicit conversions compete with type safety, can introduce ambiguities, and in
general increase the cost of maintaining a program. Any time we create a constructor that
can take a single argument, we enable an implicit user-defined conversion. Defining a
conversion operator other than a constructor, referred to in this book as a cast
operator, also enables implicit conversion. An example of each of these forms can be
found in Figure 9-35.
class pub_String {
    // ...
  public:
    pub_String(const char *cptr, int maxSizeHint = 0);  // "cast constructor"
    // ...
    operator const char *() const;                      // "cast operator"
};
// d2_table.h
#ifndef INCLUDED_D2_TABLE
#define INCLUDED_D2_TABLE

class d2_Entry;
class d2_RowIter;
class d2_ColIter;
class d2_Table {
    // ...
    friend d2_RowIter;
    // ...
  public:
    d2_Table();
    // ...
};

class d2_RowIter {
    // ...
    friend d2_ColIter;
    // ...
  public:
    d2_RowIter(const d2_Table& table);  // takes a d2_Table
    operator const void *() const;
    void operator++();
};

#endif
The intent is that a client will apply a row iterator to the table and, for each row position,
reapply a new column iterator to that row iterator.
As the function in Figure 9-37 shows, editor cut-and-paste can introduce bugs:
cit(t) on the indicated line should have been cit(rit). As long as our code is
"type-safe," we stand a good chance of detecting such bugs at compile time, but not
here! What actually happens is that each instantiation of the second iterator forces an
implicit conversion of the d2_Table, t, to an unnamed temporary of type d2_RowIter
(which happens to be positioned at the first row of the table). There is no guarantee that
this temporary row iterator will remain valid while the column iterator operates; but if
it does, the table will appear as if the contents of all rows are identical to the first.
void g(d2_Table& t)
{
    for (d2_RowIter rit(t); rit; ++rit) {
        for (d2_ColIter cit(t); cit; ++cit) {  // <-- oops!!! "cit(t)"
                                               // should be "cit(rit)"
            cout << cit() << endl;  // print (ith row, jth col) table entry
        }
    }
}
class gr_Node;

class gr_NodeId {
    int d_index;
  public:
    // gr_NodeId(int index);   // there goes type safety
    // ...
};

class gr_Graph {
    // ...
  public:
    // ...
    const gr_Node *lookupNode(const char *name) const;
        // lookup a node in the graph by name
    const gr_Node *lookupNode(const gr_NodeId& id) const;
        // lookup a node in the graph by id
};
Section 9.3.1 Conversion Operators 649
Guideline
Consider avoiding "cast" operators, especially to fundamental
integral types; instead, make the conversion explicit.
In general, explicit conversion functions are more readable and much safer than
implicit conversions. Although cast constructors are a necessary part of doing
business, cast operators are a form of implicit conversion that is more easily avoided: we
can always supply an explicit conversion function to do the work of a cast operator.
As we saw in Figure 9-6, providing pub_String with both a cast constructor and a
cast operator (for implicit conversion to and from a const char *) led to ambiguities
that required further effort to resolve. Had we replaced the cast operator with a
member function such as const char *str() const; with the identical implementation,
no ambiguity would have occurred.44
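A minimal sketch of the named-accessor approach follows. The class Text is hypothetical (it is not pub_String), and its use of std::string internally is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical string class: it keeps the converting constructor but
// replaces the cast operator with a named member function.
class Text {
    std::string d_value;
  public:
    Text(const char *cptr) : d_value(cptr) {}            // "cast constructor"
    const char *str() const { return d_value.c_str(); }  // explicit conversion
};

// Free operator, so either operand may be converted from a const char *.
bool operator==(const Text& lhs, const Text& rhs)
{
    return 0 == std::strcmp(lhs.str(), rhs.str());
}
```

Given `Text t("abc");`, the expression `t == "abc"` has only one viable interpretation: the literal is converted to a Text. If Text also declared `operator const char *() const`, the built-in pointer comparison would become a competing candidate and the same expression would be ambiguous.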
The C++ language requires that the compiler automatically generate the definitions of
certain basic member functions, if needed, unless they are already explicitly declared
in the class (see Section 6.2.6). Most commonly of interest are the generated copy
constructor and assignment operator.
Guideline
Explicitly declare (either public or private) the constructor and
assignment operator for any class defined in a header file, even when
the default implementations are adequate.
If value semantics are to be supported, the next issue is whether the compiler-gener-
ated constructor and/or assignment operator would do the right thing. 47 If the default
definitions are not correct, we will need to declare these members and define them our-
selves. Otherwise, we must determine the likelihood that the default definitions might
become invalid, and determine also the cost to our clients of making an uninsulated
change to our interface if they do. If the expected cost is too high, we would again opt
to define these operations ourselves rather than use the default implementation.
Finally, for very local objects where the compiler-generated implementations make
sense and insulation is not an issue, we might allow these function definitions to
default. In particular, allowing default copy and assignment semantics is often
appropriate for classes defined entirely within a .c file. However, some clients of an
exported class definition that relies on default semantics may be left with this nagging
doubt: is the default implementation really good enough, or did the author simply fail
to address this issue?
Note that some current implementations of C++ do not allow generated operator= to
be called via function notation, nor its address to be taken, as required by the lan-
guage. 48 Such failings by compilers bolster the argument in favor of always declaring
an exposed class's value-semantic operators explicitly.
The destructor is responsible for destroying the object and freeing any resources (e.g.,
dynamic memory) currently managed by that object. When a class declares a function
virtual, it is advertising itself as a base class; what other reason could there be for
declaring a function virtual? Derived classes may accrue resources even when the
base class has none. Conversely, in order to ensure that the derived class destructor is
called, even from a base-class pointer or reference, the base-class destructor must be
declared virtual.49
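The effect of a virtual base-class destructor can be sketched as follows. Base, Derived, destroy, and the counter s_releases are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>

static int s_releases = 0;   // counts buffers released (illustration only)

class Base {
  public:
    virtual ~Base() {}
    virtual int id() const { return 0; }
};

class Derived : public Base {
    int *d_buffer_p;                      // resource owned only by Derived
    Derived(const Derived&);              // not implemented
    Derived& operator=(const Derived&);   // not implemented
  public:
    Derived() : d_buffer_p(new int[16]) {}
    ~Derived() { delete [] d_buffer_p; ++s_releases; }
    int id() const { return 1; }
};

// Destroys an object known only through a base-class pointer.
void destroy(Base *object)
{
    delete object;   // runs ~Derived() first because ~Base() is virtual
}
```

Had ~Base() not been declared virtual, deleting a Derived through a Base pointer would have undefined behavior in standard C++; in practice the derived destructor would typically be skipped and the buffer leaked.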
The cost of ignorance can sometimes be truly staggering. Figure 9-39 depicts a
real-life problem that went undetected in a large project for quite some time. The story
begins with the fact that the popular core_String class is derived from
core_StringBase that contains virtual functions, including of course a virtual
destructor. The core_String class, not allocating any additional resources, failed to
declare a destructor at all. The compiler is required to generate a destructor for the
derived class and place it in a virtual table for the derived class.
class core_StringBase {
    // ...
  public:
    // ...
    virtual ~core_StringBase();
    // ...
    virtual int length();
    virtual operator const char *() const;
};

class core_String : public core_StringBase {
    // ...
  public:
    core_String(const char *cptr);
    core_String(const core_String& string);
    core_String& operator=(const core_String& string);
    int length() { /* ... */ }
};
Figure 9-39: Failing to Define at Least One Virtual Function Out of Line
Not being given any clue as to where to place a unique global copy of the virtual table,
the compiler placed a copy of the table in every translation unit that included the
core_String header. To add insult to injury, there was also no unique place to
generate a non-inline version of the destructor; hence, a static copy of the destructor was
placed in every translation unit along with the virtual tables. Finally, every inline
virtual function (e.g., length) was also denied a unique home for its out-of-line
implementation. A static version of each inline virtual function was also placed in every
translation unit that included the core_String class.
The problem was finally detected when the Unix "nm" utility was run on the
executable and a histogram of static names turned up thousands of static function
definitions, each with the same name, but defined in separate translation units. Declaring
the destructor for core_String and implementing it out of line solved all of the
problems. This behavior is cryptic and implementation dependent; however, this is our
current reality.
In the style we have followed throughout this book, creators precede any other non-
static member functions. Thus, the first virtual member function encountered is
invariably the destructor. Also, in order for the address of a destructor to be placed in
a virtual function table, there must be at least one version of the destructor defined out
of line anyway. The requirement that there must be at least one virtual function
declared non-inline, coupled with the natural lexical position of the destructor within
the class, makes the destructor the natural choice to be declared virtual and defined
out of line.
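The arrangement described here can be sketched in a single listing, with comments marking what would go in the header and what would go in the one implementation file. The classes Shape and Triangle and the file names are hypothetical. Where the virtual table ends up is implementation dependent; some ABIs, for example, emit it in the translation unit that defines the first non-inline, non-pure virtual function.

```cpp
#include <cassert>

// --- would live in the header (e.g., a hypothetical shape.h) ---
class Shape {
  public:
    virtual ~Shape();                 // declared here, defined out of line
    virtual int sides() const = 0;
};

class Triangle : public Shape {
  public:
    virtual ~Triangle();              // first virtual function; not inline
    virtual int sides() const { return 3; }
};

// --- would live in exactly one implementation file (e.g., shape.c) ---
Shape::~Shape() {}
Triangle::~Triangle() {}
```

Because each destructor has exactly one out-of-line definition, the compiler has a unique place in which it can generate that class's virtual table and any out-of-line copies of its inline virtual functions.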
Guideline
In classes that do not otherwise declare virtual functions, explicitly
declare the destructor as non-virtual and define it appropriately
(either inline or out-of-line).
For classes that do not otherwise declare virtual functions, implementing a virtual
destructor is not likely to be appropriate. Making the destructor alone virtual would,
in most implementations, increase the size of each instance by the size of a pointer.
For small objects such as geom_Point, the increase in cost could be 50 percent. One
solution for guarding against memory leaks is that a class derived from a base class
with a non-virtual destructor should avoid managing additional resources that must be
released when the object is destroyed.50
For classes that do not require virtual functions, there is still a reason to require that
the destructor be declared explicitly. Calling the destructor of a fundamental type
explicitly is legal C++:51
int i;
i.int::~int();  // legal C++; does nothing
Attempting an explicit call to the destructor of an object that does not explicitly
declare one and for which none has been generated doesn't work on several current
compilers. Since it is not possible to take the address of a destructor, a destructor is
generated for a class that does not explicitly declare one only when a base class or
embedded member object has a destructor. 52 This fact has implications for template-
based container objects that attempt to call the destructor of the parameterized type
explicitly (Figure 10-33b provides a useful workaround). For consistency, it should be
possible to destroy any object in place, regardless of whether or not a destructor has
been defined.
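The container scenario can be sketched with two small function templates. The names constructAt, destroyAt, and Tracked are hypothetical, and the sketch assumes a compiler that follows the C++ standard, under which the explicit destructor call is accepted for every T; it is not the workaround of Figure 10-33b.

```cpp
#include <cassert>
#include <new>

static int s_destroyed = 0;   // counts Tracked destructions (illustration)

struct Tracked {
    ~Tracked() { ++s_destroyed; }
};

// Constructs a copy of 'value' in caller-supplied raw storage.
template <class T>
T *constructAt(void *storage, const T& value)
{
    return new (storage) T(value);    // placement new
}

// Destroys '*object' in place without releasing its storage.
template <class T>
void destroyAt(T *object)
{
    object->~T();    // also valid when T is a fundamental type such as int
}
```

A container built on these helpers can treat `int` and a user-defined type uniformly: for `int` the destructor call does nothing, and for `Tracked` it runs the user-supplied destructor.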
9.4 Summary
Member:
• We want to disable implicit user-defined conversion for its
leftmost argument.
• It modifies an argument (e.g., = += *= ++).
• The language requires membership (e.g., () [] ->).
Non-Static:
• It depends on data contained within a specific instance of
the class.
• It is an operator function.
• It is static.
passed by value.
There are many alternative integral types available for use in the interface of
functions: short, unsigned, long, etc. In practice, on a 32-bit machine, the only integral
type we need in the interface is int. Using any other type is potentially inefficient,
unencapsulating, error prone, or just plain annoying to use.
There are three alternative floating-point types available in C++: float, double, and
long double. Traditionally all floating-point arguments in C were converted to double
before being passed as arguments. Most hardware is geared to handle double values
as efficiently as possible. Unless there is a compelling reason to do otherwise, all
floating-point numbers should be expressed as double in the interface.
The C++ compiler automatically generates certain undefined functions (if needed).
There are a variety of reasons for not relying on the default behavior, particularly
when the interface is used widely throughout the system. Many implementations of
C++ depend on there being at least one virtual function defined out of line. In our
style, this will always be the destructor. Some current compilers do not allow explicit
calls to destructors that are not explicitly declared. In practice, it is wise to define the
destructor of every class explicitly. For classes with no virtual functions, define the
destructor inline or out of line as appropriate. For classes with virtual functions,
define the destructor out of line. For protocol classes (see Section 6.4.1), the
destructor should be empty.
Implementing an Object
The cavernous realm of object implementation alternatives is made ever more vast by
good (i.e., small, encapsulating) interfaces. Making a design error here is far less
costly than errors at higher levels of design because the problem is confined to a tiny
portion of the overall system. Yet there are still several ways in which even individual
implementation techniques can combine during system integration to affect the over-
all success of a project.
A program must run in an environment with finite resources (e.g., memory). Classes
with many instances active at a single time put a premium on the size of their objects.
The sizes and order of their individual data members will affect this size. Custom
memory-management techniques can sometimes be used to double runtime perfor-
mance, but they can also cause a system, over time, to soak up much more memory
than is actually necessary.
In this final chapter, we examine some basic principles relating to the organizational
details of implementing classes in C++. We even proffer some suggestions on imple-
menting individual member functions. In the remainder of the chapter, we examine
several issues relating to custom memory management.
these techniques are integrated into larger systems. We then present object-specific
memory management as a preferred alternative, one that avoids many of these
problems while achieving essentially the same runtime performance as the class-based
technique. Finally we discuss memory management in the context of templates, and provide
a detailed example of how to implement truly general-purpose container objects.
In this section we discuss logical and organizational issues pertaining to the choice
and ordering of data members within a class.
1 Often a double can be stored on an odd-word boundary (as opposed to an even-word boundary)
without disastrous consequences. However, on some architectures, failing to follow natural
alignment for a double can result in a significant decrease in performance.
Section 10.1.1 Natural Alignment 663
An instance of an array of a given type has the same alignment requirement as that of
the type itself. Satisfying natural alignment for a user-defined type means satisfying
the alignment requirements of the most restrictive embedded subtype. Figure 10-1
gives some examples of natural alignment on a typical 32-bit machine.
The order in which data members are declared can affect object size.
The C++ language guarantees that in the absence of intervening access specifiers
(e.g., public, protected, and private), the memory for non-static data members
will be allocated with increasing address values corresponding to their order of
declaration within the structure; however, they need not be contiguous. 2 Alignment within
a structure can cause gaps at both the middle and end of a structure (but never at the
beginning). As a rule, one can assume natural alignment when it comes to organizing
the layout of a class or struct; however, one should not depend on it.
[Figure 10-2 object-layout diagram: data members such as d_c, d_i1, d_d, and d_i2, with "?" marking alignment holes]
Figure 10-2 gives the size, natural alignment, and corresponding object layout of three
user-defined types. Type D has a hole in the middle because the second data member is
forced to reside on a word boundary. Type E has two holes: the first hole is caused
because the double, d_d, is forced to start on a double word (8-byte) boundary. The
second hole at the end is to ensure that each element in an array of E objects is also
aligned:
given: E a[N], b; // N is compile-time const with value > 0
then: assert(sizeof a == N * sizeof b);
Section 10.1.2 Fundamental Types Used in the Implementation 665
Considering the order in which data members are declared (to reduce object size)
becomes important when there will be many instances of the type active at one time.
We can reorganize the data members of type E in Figure 10-2 to eliminate the holes;
the result is type F, which is 33 percent smaller.
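The effect of declaration order can be sketched as follows. The member lists of E and F below are assumptions chosen to match the description (two ints and a double), not a reproduction of Figure 10-2; the helper function names are hypothetical, and actual sizes are implementation-defined.

```cpp
#include <cassert>
#include <cstddef>   // offsetof

// Hypothetical stand-in for a type with a hole in the middle and at the end.
struct E {
    int    d_i1;   // typically followed by padding so that d_d is aligned
    double d_d;
    int    d_i2;   // typically followed by trailing padding
};

// Same members, declared with the most restrictive alignment first.
struct F {
    double d_d;
    int    d_i1;
    int    d_i2;
};

// With no intervening access specifiers, members appear at increasing
// addresses in declaration order.
bool membersInDeclarationOrder()
{
    return offsetof(E, d_i1) < offsetof(E, d_d)
        && offsetof(E, d_d)  < offsetof(E, d_i2);
}

// Trailing padding is what keeps every element of an array aligned.
bool arrayIsPacked()
{
    const int N = 5;
    E a[N], b;
    return sizeof a == N * sizeof b;
}
```

On many current platforms sizeof(E) is 24 and sizeof(F) is 16, which corresponds to the 33 percent reduction mentioned above, but portable code should not rely on either value.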
Whenever we attempt to allocate an object in place using the placement syntax for over-
loaded global operator new, we must make sure to do so at a properly aligned location.
We may assume that global new returns addresses that will work for the most restrictive
possible boundary. But we must be careful to avoid code such as the following:
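As a hedged illustration of the hazard (hypothetical names, with C++11 alignas and alignof assumed), the sketch below mentions the risky form only in a comment and then constructs the object in storage that is explicitly aligned for it.

```cpp
#include <cassert>
#include <cstdint>   // std::uintptr_t
#include <new>       // placement new

// Returns true if address p is a multiple of the alignment of T.
template <class T>
bool isAlignedFor(const void *p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Construct a double in raw storage that is explicitly aligned for it.
double roundTrip(double value)
{
    // Risky form (not executed here): a plain 'char raw[16]' carries no
    // alignment guarantee, so 'new (raw + 1) double(value)' would usually
    // place the double at a misaligned address.
    alignas(double) unsigned char storage[sizeof(double)];
    assert(isAlignedFor<double>(storage));

    double *d = new (storage) double(value);
    return *d;
}
```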
We argued in Section 9.2 that it is wise to restrict the selection of fundamental types
used in the interface. The use of unusual fundamental types in the implementation
brings up a separate set of issues.
Guideline
Use short instead of int in the implementation as an optimization
only when it is known to be safe to do so.
public:
win_Point(int x, int y);
// ...
};
Guideline
Consider not using unsigned even in the implementation.
// (a) pub_Array (fragment)
  public:
    pub_Array(unsigned int size);
    // ...
};

// (b) core_String (fragment)
  public:
    // ...
    int length() const;
    // ...
};
Figure 10-4 shows two classes in which the use of unsigned and short is misplaced
(assuming a standard 32-bit architecture). In Figure 10-4a, the internal size is made
unsigned to accommodate an array of up to 2^32 integers, presumably to avoid the
possibility of overflow. This decision has prompted the class author to expose the
unsigned int in the interface as well. Even ignoring the adverse effect on the
interface, the reasoning that leads to using unsigned in this example is twice fallacious.
First, there is no way operator new is going to find space for anywhere near 2^32 (about
4 billion) contiguous integer-sized objects (for the foreseeable future at least).
Second, unless a pointer variable is larger than an int, the virtual address space limits the
total number of integers: 2^32 / sizeof(int) <= 2^30. In other words, a (signed) int,
which can hold positive values of up to 2^31 - 1, is more than big enough.
In Figure 10-4b, the core_String class defines its internal size to be a short because
it does not expect the length of a string to exceed several thousand. The internal
variable is then made unsigned, just in case this value exceeds 32,767. Apart from the
loss in maintainability discussed in Section 9.2.2, in all but pathological cases, if
32,767 isn't known to be large enough, then 65,535 is suspect as well. Making a
short value unsigned "just in case" is tempting fate; it is usually better to use an
int than to risk disaster. The misuse of short in this case is made even more
ridiculous because natural alignment will create a hole where the other half of an int could
have been placed; using a short here saves nothing. 4
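Both points can be checked with a short sketch. RepWithShort, RepWithInt, oneLessThan, and countDown are hypothetical names, and the size comparison assumes a typical platform on which a pointer is at least as strictly aligned as an int.

```cpp
#include <cassert>
#include <climits>   // UINT_MAX

// Hypothetical string representations differing only in the length field.
struct RepWithShort { char *d_str_p; unsigned short d_length; };
struct RepWithInt   { char *d_str_p; int            d_length; };

// Subtracting past zero does not go negative for unsigned; it wraps around.
unsigned oneLessThan(unsigned n)
{
    return n - 1;
}

// With an unsigned index the guard 'i >= 0' would always be true, so the
// obvious countdown loop would never terminate. With int it is correct.
int countDown(int size)
{
    int visited = 0;
    for (int i = size - 1; i >= 0; --i) {
        ++visited;
    }
    return visited;
}
```

On such platforms the two representations have the same size, because the bytes saved by the short are consumed by padding.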
As with any localized, code-tuning effort, the decision to use alternate fundamental
integral types (e.g., short, char) to optimize the storage within an object is best
deferred until after the object is working, has been functionally tested, and perfor-
mance analysis data is available. A suite of thorough regression tests will help to
ensure that we do not optimize the correctness out of our implementation. 5
Typedefs are often helpful for expressing complex function declarations. Typedefs
also have a very useful place in the definitions of certain basic types that assume a
precise number of bits in the representation.
4 For further discussion regarding the inappropriate use of fixed-size arrays in the implementation,
see murray, Section 9.2.2, pp. 210-212.
5 See also murray, Sections 9.9-9.10, pp. 234-235; and cargill, Chapter 7, p. 138.
Sometimes we know exactly how many bits we need. For example, when we want to
store information persistently (on disk) that is shared across heterogeneous platforms,
we want to make sure that our basic data types hold no more and no less precision
than needed. Figure 10-5a shows a systemwide header file that isolates the definitions
of types with absolute sizes. When porting to a new platform, we need change only
this one file in order to ensure that objects that assume absolute sizes are handled
correctly. For example, Figure 10-5b shows a geom_Point class that requires exactly 32
bits for each coordinate. Typically, an int corresponds to the word size of the
machine. Even on a 64-bit architecture we need only 32 bits for compatibility with
other architectures. Why waste the space?
// sys_type.h                          // geom_point.h
#ifndef INCLUDED_SYS_TYPE              #ifndef INCLUDED_GEOM_POINT
#define INCLUDED_SYS_TYPE              #define INCLUDED_GEOM_POINT
// ...                                 // ...
#endif                                 #endif
(a) System-Wide Definitions File       (b) Fixed-Size geom_Point Class
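The bodies of the two headers are not shown above. As a rough sketch only, a present-day equivalent might build on the exact-width types of <cstdint>; the typedef names follow those used by the sys_type test driver, while the static_assert checks, the Float32 and Float64 mappings, and the geom_Point members are assumptions made for illustration.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT
#include <cstdint>   // std::int32_t, etc. (exact-width types, where provided)

// Hypothetical system-wide definitions of types with absolute sizes.
typedef std::int8_t   Int8;
typedef std::int16_t  Int16;
typedef std::int32_t  Int32;
typedef std::uint8_t  Uint8;
typedef std::uint16_t Uint16;
typedef std::uint32_t Uint32;
typedef float         Float32;   // assumption: float is a 32-bit type
typedef double        Float64;   // assumption: double is a 64-bit type

// Compile-time checks: a port that violates an assumption fails to build
// instead of misbehaving at run time.
static_assert(CHAR_BIT == 8,                    "8-bit bytes assumed");
static_assert(sizeof(Int32) * CHAR_BIT == 32,   "Int32 must be 32 bits");
static_assert(sizeof(Float32) * CHAR_BIT == 32, "Float32 must be 32 bits");
static_assert(sizeof(Float64) * CHAR_BIT == 64, "Float64 must be 64 bits");

// Hypothetical point class whose coordinates occupy exactly 32 bits each.
class geom_Point {
    Int32 d_x;
    Int32 d_y;
  public:
    geom_Point(int x, int y) : d_x(x), d_y(y) {}
    int x() const { return d_x; }
    int y() const { return d_y; }
};
```

Only the typedefs would need to change when porting; clients such as geom_Point keep using the fixed-size names.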
In case you thought that only functions are worth testing, consider the test driver for
the sys_type component shown in Figure 10-6. Exercising this driver before any
other ensures that components that depend on fixed-size data types are not "fooling
themselves." We have isolated our configuration assumptions to a single file. Com-
pile-time coupling is not a problem here since the common information derives from
the lower-level compiler and the architecture of the machine (which is fixed), and not
from any higher-level extensible collection of components (which could change). 6
// sys_type.t.c
#include "test_util.h" // define TEST_ASSERT, etc.
#include "sys_type.h"
int main()
{
    TEST_BEGIN
    TEST_ASSERT(1 == sizeof(Int8));
    TEST_ASSERT(2 == sizeof(Int16));
    TEST_ASSERT(4 == sizeof(Int32));
    TEST_ASSERT(1 == sizeof(Uint8));
    TEST_ASSERT(2 == sizeof(Uint16));
    TEST_ASSERT(4 == sizeof(Uint32));
    TEST_ASSERT(4 == sizeof(Float32));
    TEST_ASSERT(8 == sizeof(Float64));
    TEST_END
}
Once we are down to the level of implementing functions, most of our decisions are
localized. The cost of making a poor decision is therefore small, because changing it
typically does not affect a large amount of code. Even so, there are a few general
points to keep in mind when writing function bodies.
In longer functions, there are sometimes several paths that can lead to the same
statement; often this statement assumes internal conditions. Figure 10-7 illustrates a
situation in which either the if or the while might not be entered. In any case, the stated
condition that follows the if statement must hold true and is backed up by an assert
statement.
// ...
if (!q) {
    while (p && 0 != strcmp(name, p->name())) {
        p = p->next();
    }
}
// ...
These kinds of internal self-checks do more than merely detect errors at runtime. The
practice of explicitly identifying an assumption encourages a crispness of thinking
that typically makes the logical flow of the function easier for others to follow. 7
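A self-contained sketch of the same pattern follows. Node, find, and the hint parameter are hypothetical, and the asserted condition is an assumption about what such a check might state rather than the one used in Figure 10-7.

```cpp
#include <cassert>
#include <cstring>   // std::strcmp

// Hypothetical singly linked node carrying a name.
struct Node {
    const char *d_name_p;
    Node       *d_next_p;
};

// Return the first node named 'name', or 0 if there is none. If 'hint' is
// non-zero it must already refer to a node named 'name', and the search is
// skipped.
Node *find(Node *head, const char *name, Node *hint)
{
    Node *p = hint ? hint : head;
    if (!hint) {
        while (p && 0 != std::strcmp(name, p->d_name_p)) {
            p = p->d_next_p;
        }
    }
    // Whichever path was taken, p is either null or refers to a match.
    assert(!p || 0 == std::strcmp(name, p->d_name_p));
    return p;
}
```

The assertion documents the condition that every path must establish, so a reader need not trace each branch to learn what holds at that point.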
Obtaining code coverage is one common criterion used to measure the effectiveness
of tests. But the more paths there are through a function, the more difficult it can be to
assure ourselves that the function is reliable under all conditions.
For example, developers sometimes choose to use a pair of pointers when walking a
list to be modified. This approach requires treating the empty list as a special case (or
always maintaining a dummy first link). Rather than using the pointer to the current
link as a state variable, consider maintaining the address of that pointer, as
was done for the PtrBagManip class shown in Figure 5-83.
Figure 10-8a shows an implementation that maintains both a pointer to the current
link and a pointer to the previous link; the d_prevLink_p pointer will be used to
update the d_next_p field of the previous link when the current link is removed. If the
current link happens to be the first link in the list, d_prevLink_p will be 0, and we
will need to update the root of the list instead; we therefore retain a writable pointer to
the PtrBag itself.
Implementation (b) requires only one state variable, and the complexity of removing a
node is significantly reduced.
(a) Treating the Boundary Condition as a Special Case
(b) Treating the Boundary Condition as Part of the Main Algorithm
The technique of maintaining the address of a pointer instead of the pointer itself is a
terse but powerful idiom for manipulating a variety of list-like structures:
struct Link {
    Link *d_next_p;
    Link(Link *next) : d_next_p(next) {}
    // ...
};
The extra level of indirection allows us to insert an element into an ordered list with-
out having to maintain two pointers or treat the empty list as a special case:
This idiom (or its for-loop equivalent) is used to implement the private functions copy
and end in the List class shown in Figure 6-19b, and is also used extensively to imple-
ment a hash-based symbol table (Figure 10-11) in the following section.
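The idiom can be sketched on an ordered list of integers. The d_value payload and the functions insertOrdered, removeValue, and clear are hypothetical additions for illustration; only the Link **addr traversal is the technique under discussion.

```cpp
#include <cassert>

struct Link {
    int   d_value;     // hypothetical payload added for this sketch
    Link *d_next_p;
    Link(int value, Link *next) : d_value(value), d_next_p(next) {}
};

// Insert 'value' into the ascending list rooted at *root.
void insertOrdered(Link **root, int value)
{
    Link **addr = root;                        // address of the current pointer
    while (*addr && (*addr)->d_value < value) {
        addr = &(*addr)->d_next_p;
    }
    *addr = new Link(value, *addr);            // also covers empty list and front
}

// Remove the first link holding 'value'; return 0 if found, non-zero if not.
int removeValue(Link **root, int value)
{
    Link **addr = root;
    while (*addr && (*addr)->d_value != value) {
        addr = &(*addr)->d_next_p;
    }
    if (!*addr) {
        return -1;
    }
    Link *doomed = *addr;
    *addr = doomed->d_next_p;                  // unlink, whether first or not
    delete doomed;
    return 0;
}

// Delete every link and leave the root null.
void clear(Link **root)
{
    while (*root) {
        Link *doomed = *root;
        *root = doomed->d_next_p;
        delete doomed;
    }
}
```

Because addr always refers to the pointer that would have to change, neither function needs a previous-link variable or a separate branch for the head of the list.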
In Section 5.6 we argued that reuse of small functions could result in physical cou-
pling that is not worth the benefit of a factored implementation. However, within a
single well-designed component, there is little justification for replicating code. Often
construction, destruction, and assignment will share common algorithms. As with the
List class shown in Figure 6-19, it can be useful to define a small set of more
primitive functions to factor out commonality from this basic public functionality, as
indicated in Figure 10-9.
It is interesting to note that the assignment operator is not completely primitive; it can be
implemented in terms of the destructor and the copy constructor, which are primitive:
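One way this could look is sketched below with a hypothetical class Name; it is an illustration of the idea and makes no claim to match any listing in the book. Note the assumptions: the class is not used as a base class, and the copy constructor does not throw, since a failure after the explicit destructor call would leave the object destroyed.

```cpp
#include <cassert>
#include <cstring>   // std::strlen, std::strcpy, std::strcmp
#include <new>       // placement new

// Hypothetical string-holding class used only to illustrate the idea.
class Name {
    char *d_str_p;
  public:
    Name(const char *s) : d_str_p(new char[std::strlen(s) + 1])
    {
        std::strcpy(d_str_p, s);
    }
    Name(const Name& other) : d_str_p(new char[std::strlen(other.d_str_p) + 1])
    {
        std::strcpy(d_str_p, other.d_str_p);
    }
    ~Name() { delete [] d_str_p; }

    Name& operator=(const Name& rhs)
    {
        if (this != &rhs) {          // self-assignment must be excluded
            this->~Name();           // primitive 1: destroy the current value
            new (this) Name(rhs);    // primitive 2: copy-construct in place
        }
        return *this;
    }

    const char *str() const { return d_str_p; }
};
```

Where exception safety matters, a copy-and-swap formulation is a commonly used alternative that reuses the same two primitives.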
Figure 10-10 shows the header file for a simple implementation of a symbol table.
This implementation uses closed hashing and so is implemented using a dynamically
allocated array of my_SymTabLink pointers (d_table_p) of size derived from
maxEntriesHint (d_size). There are four basic operations provided: add if not
found, set whether or not found, remove and report if found, and lookup. Each of
these operations can be implemented separately, as shown in Figure 10-11a. However,
each of these functions basically requires locating the pointer to the symbol
(implemented as a my_SymTabLink). Note that only the remove method requires the address
of the pointer; both add and set can always add a new symbol to the front of the list
for a given hash slot, and lookup never adds a new symbol.
// my_symtab.h
#ifndef INCLUDED_MY_SYMTAB
#define INCLUDED_MY_SYMTAB

class my_Value;
class my_SymTabLink;
class my_SymTabIter;

class my_SymTab {
    class my_SymTabLink **d_table_p;  // closed hash table
    int d_size;                       // size of hash table
    friend my_SymTabIter;

  private:
    my_SymTab(const my_SymTab&);             // not implemented
    my_SymTab& operator=(const my_SymTab&);  // not implemented

  public:
    // CREATORS
    my_SymTab(int maxSymbolsHint = 0);  // see Section 10.3.1
        // Optionally specify approx. number of entries (default ~500).
    ~my_SymTab();

    // MANIPULATORS
    my_Value *add(const char *name);
        // Adds a symbol to the table only if name is not already present.
        // Returns a pointer to the internal value if added, and 0 otherwise.
    my_Value& set(const char *name);
        // Adds a symbol to the table if not already present. Returns a
        // reference to the internal value of a symbol with specified name.
    int remove(const char *name);
        // Removes a symbol from the table. Returns 0 if the symbol with
        // the specified name was found, and non-zero otherwise.

    // ACCESSORS
    my_Value *lookup(const char *name) const;
        // Returns a pointer to an existing symbol's value, or 0 if
        // a symbol with the specified name cannot be found.
};

class my_SymTabIter { /* ... */ };

#endif
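As a sketch of how the common step might be factored, the simplified table below routes all four operations through one private helper that returns the address of the relevant pointer. SymTab, its nested Link, the locate function, the hash function, the default size of 17, and the use of int in place of my_Value are all assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstring>   // std::strcmp, std::strlen, std::strcpy

// Simplified, hypothetical symbol table with int values.
class SymTab {
    struct Link {
        char *d_name_p;
        int   d_value;
        Link *d_next_p;
        Link(const char *name, Link *next)
        : d_name_p(new char[std::strlen(name) + 1]), d_value(0), d_next_p(next)
        {
            std::strcpy(d_name_p, name);
        }
        ~Link() { delete [] d_name_p; }
    };

    Link **d_table_p;   // array of d_size list heads
    int    d_size;

    SymTab(const SymTab&);             // not implemented
    SymTab& operator=(const SymTab&);  // not implemented

    // Common primitive: return the address of the pointer that refers to the
    // link named 'name', or of the null pointer that ends that slot's list.
    Link **locate(const char *name) const
    {
        unsigned h = 0;
        for (const char *s = name; *s; ++s) {
            h = 31 * h + static_cast<unsigned char>(*s);
        }
        Link **addr = &d_table_p[h % d_size];
        while (*addr && 0 != std::strcmp(name, (*addr)->d_name_p)) {
            addr = &(*addr)->d_next_p;
        }
        return addr;
    }

  public:
    explicit SymTab(int size = 17) : d_table_p(new Link *[size]), d_size(size)
    {
        for (int i = 0; i < d_size; ++i) d_table_p[i] = 0;
    }
    ~SymTab()
    {
        for (int i = 0; i < d_size; ++i) {
            while (d_table_p[i]) {
                Link *doomed = d_table_p[i];
                d_table_p[i] = doomed->d_next_p;
                delete doomed;
            }
        }
        delete [] d_table_p;
    }

    int *add(const char *name)           // 0 if already present
    {
        Link **addr = locate(name);
        if (*addr) return 0;
        *addr = new Link(name, 0);
        return &(*addr)->d_value;
    }
    int& set(const char *name)           // add if missing, then return value
    {
        Link **addr = locate(name);
        if (!*addr) *addr = new Link(name, 0);
        return (*addr)->d_value;
    }
    int remove(const char *name)         // 0 if found and removed
    {
        Link **addr = locate(name);
        if (!*addr) return -1;
        Link *doomed = *addr;
        *addr = doomed->d_next_p;
        delete doomed;
        return 0;
    }
    int *lookup(const char *name) const  // 0 if not found
    {
        Link **addr = locate(name);
        return *addr ? &(*addr)->d_value : 0;
    }
};
```

In this sketch new symbols are appended where the search stopped rather than pushed onto the front of the slot; that choice falls out of reusing locate and is not required by the interface.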
// my_symtab.c
#include "my_symtab.h"
char
( ! p) {
p.= slot = new my_SymTabLink(name,
Section 10.2.3 Factor Instead of Duplicate 677
// my_symtab.c
#include "my_symtab.h"
*& :. ". . . . . . . . . . . .
**table,