C++ For Java Programmers
Contents
Preface
Chapter 0 Introduction
0.1 A History Lesson
0.2 High Level Differences
0.2.1 Compiled vs. Interpreted Code
0.2.2 Security and Robustness
0.2.3 Multithreading
0.2.4 API Differences
0.3 Ten Reasons To Use C++
0.3.1 C++ Is Still Widely Used
0.3.2 Templates
0.3.3 Operator Overloading
0.3.4 Standard Template Library
0.3.5 Automatic Reclamation of Resources
0.3.6 Conditional Compilation
0.3.7 Distinctions Between Accessor and Mutator
0.3.8 Multiple Implementation Inheritance
0.3.9 Space Efficiency
0.3.10 Private Inheritance
0.4 Key Points
0.5 Exercises
c++book.mif Page vi Saturday, July 12, 2003 10:53 AM
Preface
For many years, C++ was the de facto language of choice in introductory CS courses, due largely to its support for object-oriented programming, as well as its wide adoption in industry. However, because C++ is arguably the most complex language ever to be widely used, Java, which also supports object-oriented programming, has recently emerged as the preferred introductory language. Nonetheless, demand for C++ skills is still high in industry, and most universities require C++ programming at some point in the Computer Science curriculum. Although Java and C++ look similar, programming in C++ is somewhat more challenging and filled with subtle details. While there are many books that thoroughly describe C++ (see the bibliography), the vast majority exceed 1,000 pages, and for the most part are written for either experienced industry programmers or novices.
This book is designed as a quick-start guide for students who are knowledgeable in an object-oriented language (most likely Java) and would like to learn C++. Throughout the text, we compare and contrast Java and C++, and show C++ substitutes for Java equivalents. We do not describe in detail basic concepts (such as inheritance) that are common to Java and C++; rather, we describe how those concepts are implemented in C++. This helps achieve one of the important goals of this book, which is to keep the page count reasonably low. Consequently, this book is not appropriate for students with limited or no prior programming experience.
Organization
The book begins with a brief overview of C++ in Chapter 0. In Chapter 1, we describe some of the basic expressions and statements in C++, which mostly mirror simple Java syntax. Functions, arrays, strings, and parameter passing are discussed in Chapter 2. We use the modern alternative of introducing and using the standard vector and string classes in the C++ library,
Acknowledgements
For this text, I would like to thank my editor Alan Apt and his assistants Toni Holm,
Patrick Linder, and Jake Warde.
I also thank the following reviewers, who provided valuable comments, many of which
have been incorporated into the text:
XXXX, University of YYY
XXXX, University of YYY
XXXX, University of YYY
XXXX, University of YYY
Some of the material in this text (especially Chapters 1, 2, 3, 11, and 12) is adapted from
my textbook Efficient C Programming: A Practical Approach (Prentice-Hall, 1995).
My World Wide Web page http://www.cs.fiu.edu/~weiss will contain updated
source code, an errata list, and a link for receiving bug reports.
M.A.W
Miami, Florida
June, 2003
CHAPTER 0
Introduction
In 1972, Dennis Ritchie designed C and implemented it on a PDP-11. The initial design of C
reflected the fact that it was used to implement an operating system. Until then, all operating
systems were written in an assembly language because compiler technology did not generate
sufficiently efficient code from higher-level languages. C provided constructs, such as pointer
arithmetic, direct memory access, increment operators, bit operations, and hexadecimal con-
stants that mimicked the PDP-11’s instruction set, while at the same time keeping the language
small but high level. Consequently, the compiler was able to generate reasonably efficient code.
Eventually, Unix was written mostly in C, with only a few parts requiring assembly language.
In early 2002, a top-of-the-line home computer that sold for less than $2,000 executed several hundred million instructions per second. Home computers could be purchased with 1 gigabyte of main memory and 120 gigabytes of hard drive space. By the time you read this, the computer just described may well be a relic.
On the other hand, a PDP-11/45, which this author actually used as a college undergraduate, sold for well over $10,000 (in 1970 dollars). Our model had 128 Kilobytes of main memory and executed only several thousand instructions per second. The PDP-11/45 was a 16-bit machine, so, not surprisingly, the int type was 16 bits. But other PDP models had 12-, 18-, or even 36-bit words. The C compiler on the PDP-11/45 would typically take about 30 seconds to compile a trivial one-hundred-line program. But since it used 32K of memory to do so, compilations were queued, like printer jobs, to avoid compiling two programs at once and crashing the system! Our PDP-11/45 supported over 20 simultaneous users, all connected via old-style “dumb” terminals. Most of the terminals operated at 110 baud, though there were a few fast ones that ran at 300 baud. At 110 baud, approximately 10 characters per second are transmitted, so screen editors were not widely used; instead, editing was done a line at a time, using the Unix editor ed (which still exists!). C thus provided rather terse syntax to minimize typing and, more importantly, the number of characters that had to be displayed.
In this environment, it is not surprising that the number one goal of the compiler was to
compile correct programs as fast as possible into code that was as fast as possible. And since the
main users were typically experts, dealing with incorrect programs was left to the programmer,
rather than the compiler (or runtime system). No checks were performed to ensure that a vari-
able was assigned a value prior to use of its value. After all, who wanted to wait any longer for
the program to compile? Arrays were implemented in the same manner as in an assembly lan-
guage, with no bounds checking to consume precious CPU cycles. A host of programming prac-
tices that involved the use of pointers to achieve performance benefits emerged. There was even
a reserved word, register, that was used to suggest to the compiler that a particular local
variable should be stored in a machine register, rather than on the runtime stack, to make the
program run faster, since the compiler would not try to do any flow analysis on its own.
Soon, Unix became popular in the academic community, and with it, the C language grew. At one point, C was known as the great portable language, suitable for systems use on many machines. Unix itself was ported to many platforms. Eventually, numerous software vendors started producing C compilers, adding their own extra features, but in so doing made the language less portable, in part because the language specification was vague in places, allowing competing interpretations. Also, a host of clever but nonetheless unacceptable and unportable programming tricks had emerged; it became clear that these tricks should be disallowed. Since compiler technology had improved and computers had become faster, it also became feasible to add helpful features to the language, in the hope of enhancing portability and reducing the chances of undetected errors. Eventually, in 1989, this resulted in ANSI C, which is now the gold standard of C that most programmers adhere to. In 1999, C99 was adopted, adding some new features, but since C99 is not yet widely implemented in compilers, few programmers find a need to use those features.
ming errors avoided, system security against hacking is enhanced, since hacking generally
works by having the system do something it is not designed to do.
The Java designers have several advantages. First, in designing the language, they can
expect the compiler to work hard. We no longer use PDP-11s, and compiler research has
advanced greatly to the point that modern optimizing compilers can do a terrific job of creating
code, without requiring the programmer to resort to various tricks common in the good old days.
Second, Java designers specified most of the language (classes, inheritance, exceptions) at once,
adding only a second minor revision (inner classes), with most changes after Java 1.1 dealing
with the library. In doing so, they were able to have language features that don’t clash.
Although Java is the new kid on the block (unless C# becomes the next new kid) and has lots of nice features, it is certainly not true that Java is better than C++. Nor would we say that C++ is better than Java. Instead, a modern programmer should be able to use both languages, as each language has applications for which it is the logical choice.
regard. C++ suffers several problems that can never occur in pure Java code. Four that stand out
are the following:
First, it is possible in C++ to have a pointer or reference to an object that has already been returned to the memory heap, which is a sure disaster. This is because standard C++ does not do garbage collection; instead, programmers must manage memory themselves, and programmers are surprisingly bad at doing so. However, some C++ systems include garbage collection and add runtime checks to avoid using stale pointers. These systems come quite close to the Java standard of avoiding memory problems.
Second, Standard C++ does not check array indexes, and a common hacker attack is to find an input routine that reads a string but doesn't check that there is enough space for a very long string. By judiciously passing a huge string, the hacker can overflow the buffer, writing replacement values onto variables that are stored in memory adjacent to the buffer. This could never happen in Java. However, since the C++ specification does not disallow bounds checks, there is no reason that a safe C++ system couldn't check array bounds. It would simply be slower (but safer) than its competitors, and so it is not widely done by default.
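Standard C++ does already offer an opt-in check of this kind for its library vector type (introduced in Chapter 2): indexing with [] is unchecked, while the member function at validates the index and throws an out_of_range exception. The sketch below is only illustrative; the helper name safeGet and its fallback parameter are our own.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>
using namespace std;

// Hypothetical helper: returns arr[ i ] if i is a valid index, else fallback.
// arr.at( i ) checks i against arr.size( ) and throws out_of_range when it
// is too large; arr[ i ] would perform no check at all.
int safeGet( const vector<int> & arr, size_t i, int fallback )
{
    try
    {
        return arr.at( i );
    }
    catch( const out_of_range & )
    {
        return fallback;
    }
}
```

The price of the check is a comparison on every access, which is exactly the speed-versus-safety trade-off described above.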
Third, old C++ typecasts allow type confusion, in which a type is cast to an unrelated type.
This can never happen in Java, but is allowed in C++.
Fourth, in Java, all variables have a definite assigned value prior to use of their value. This is because variables that are not local to a method are initialized by default to zero for primitives and null for references. For local variables, an entire chapter of the Java Language Specification is devoted to definite assignment, whereby the compiler is required to perform a flow analysis and produce an error message if a local variable cannot be proven (under a long set of rules) to have been definitely assigned along every path through the method. In C++, this behavior is not required. Rather, a program that uses an uninitialized variable is said to be incorrect, but the compiler is not required to take any particular action. Most C++ compilers will print warning messages if uninitialized variables are detected, but the program will still compile. A similar story holds for a non-void function in which some path reaches the end of the function without returning a value: Java rejects such code, whereas C++ compilers typically issue at most a warning.
0.2.3 Multithreading
C++ does not support multithreading as part of the language. Instead, one must use a set of library routines that are native to the particular platform. Although Java supports multithreading, the Java memory model and threading specification have recently been found to be inadequate and are undergoing revision.
Standard C++ has a very small API, containing little more than some I/O support, a com-
plex number package, and a Collections API, known as the Standard Template Library (STL).
Of course, compiler vendors augment Standard C++ with huge libraries, but each vendor has
different versions rather than implementing a single standard.
0.3.2 Templates
Perhaps the most requested missing feature in Java is the equivalent of the C++ template. In C++, templates allow the writing of generic, type-independent code, such as sorting algorithms and generic data structures that work for any type. In Java, this is done by using inheritance, but the downside is that many errors are not detected at compile time and instead linger until run time. Further, using templates in C++ seems to lead to faster code than the inheritance-based alternative in Java. Generics are under consideration for Java 1.5. Templates are discussed in Chapter 7.
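To give a flavor of the syntax before Chapter 7, here is a minimal sketch of a function template (the name larger is our own) that works for int, double, string, or any other type that supports operator<:

```cpp
#include <string>
using namespace std;

// A function template: Comparable is a placeholder for any type that
// supports operator<. The compiler generates a separate version of larger
// for each type it is used with, and rejects at compile time any call
// whose argument type has no operator<.
template <typename Comparable>
const Comparable & larger( const Comparable & a, const Comparable & b )
{
    return a < b ? b : a;
}
```

A call such as larger( 3, 7 ) instantiates the int version; a call such as larger( 3, 2.5 ) does not compile, because the two arguments imply different types for Comparable. That error is caught by the compiler rather than at run time.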
so on. Although many (non-memory) resources are released by object finalization, relying on
object finalization in Java is a poor idea, since objects need not be reclaimed if the garbage col-
lector deems that memory is not low.
Feature       Java   C++
Portability   High   Moderate
Templates     No     Yes
Reflection    Yes    No
In C++, each class can provide a special method known as the destructor, which is automatically invoked when an object is no longer active. Careful C++ programmers therefore need not remember to release non-memory resources, and memory resources can often be released by layering the memory allocations inside of classes. Many Java programmers who have prior C++ experience lament the lack of destructors. We describe destructors in Section 4.6.
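The following minimal sketch shows the idea. ResourceHolder, eventLog, and useTwoResources are hypothetical names, and the "resources" are simulated by log entries rather than real files or locks.

```cpp
#include <string>
#include <vector>
using namespace std;

// Records when resources are acquired and released, so that the
// automatic destructor calls can be observed.
vector<string> eventLog;

// Simulated resource holder: the constructor acquires the resource
// and the destructor releases it.
class ResourceHolder
{
  public:
    explicit ResourceHolder( const string & name ) : resourceName( name )
      { eventLog.push_back( "acquire " + resourceName ); }

    ~ResourceHolder( )
      { eventLog.push_back( "release " + resourceName ); }

  private:
    string resourceName;
};

void useTwoResources( )
{
    ResourceHolder first( "file" );
    ResourceHolder second( "lock" );
    eventLog.push_back( "working" );
}   // destructors run here automatically, in reverse order of construction
```

After a call to useTwoResources, eventLog holds "acquire file", "acquire lock", "working", "release lock", "release file". Because the destructors run however the function is exited (including by an exception), the release cannot be forgotten; this technique is commonly called RAII (resource acquisition is initialization).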
• C++ is based on C
• The most important consideration in the design of C++ is to make correct programs run as
fast as possible.
• Compile-time checks in C++ are not as rigid as in Java, but many compilers now perform some of the same checks as a Java compiler, yielding warning messages.
• Run-time checks in C++ are not as rigid as in Java. In C++, bad array indexes, bad type casts, and bad pointers do not automatically cause a runtime error. Some compilers will do more than the minimum and issue runtime errors for you, but this cannot be relied on.
• Compiled units are not compatible across different types of machines, nor are they necessarily compatible on the same machine when generated by different compilers.
• Although the STL in C++ is excellent, the remainder of the Standard C++ library pales in
comparison to Java. But many non-standard additions are available.
0.5 Exercises
1. Compare and contrast the basic design goals of C++ and Java.
2. What are the different versions of C++?
3. What are the basic differences between compiled and interpreted languages?
4. What is a buffer overflow problem?
5. List features of Java that are not part of Standard C++ and result in C++ being a less safe
language.
6. Describe some features of C++ that make it more attractive than Java.
CHAPTER 1
The typical first C++ program is shown in Figure 1-1. By way of comparison, the equivalent
Java code is shown in Figure 1-2. The C++ program should be placed in a file that has an appro-
priate suffix. C++ does not specify a particular suffix, but most compilers recognize .cpp.
Like Java, control starts at main. Unlike Java, main is not part of any class; instead, it is a non-class method. In C++, methods are called member functions, and we will adopt the convention that a function that is not declared as part of a class is simply a function (with the adjective member conveniently omitted). C++ also allows global variables.
1 #include <iostream>
2 using namespace std;
3
4 int main( )
5 {
6     cout << "Hello world" << endl;
7     return 0;
8 }
main must always be declared in global scope, and it must have return type int. By convention, the return value is 0 unless a non-zero error code is to be transmitted back to the invoking process, in a manner similar to calling System.exit (we will always return 0). Although some programmers prefer to use a void return type, the language specification is clear that the return type of main must be int.
main can take additional parameters for command-line arguments (we will discuss this in
Section 11.5). Because main is in global scope, there can be only one version of main. This
contrasts with Java, which allows one main per class.
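As a preview (the details are in Section 11.5), the standard two-parameter form is int main( int argc, char *argv[ ] ), where argc counts the arguments and argv holds them as primitive strings, with argv[ 0 ] conventionally the program name. The hypothetical helper toArgs below is a sketch that copies them into the library vector and string types of Chapter 2:

```cpp
#include <string>
#include <vector>
using namespace std;

// Hypothetical helper: copies the command-line arguments into a vector of
// library strings. In a program it would be called from main, as in
//     int main( int argc, char *argv[ ] )
//     {
//         vector<string> args = toArgs( argc, argv );
//         ...
//     }
vector<string> toArgs( int argc, char *argv[ ] )
{
    vector<string> args;
    for( int i = 0; i < argc; i++ )
        args.push_back( argv[ i ] );   // each argv[ i ] is a primitive string
    return args;
}
```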
Line 1 is an include directive that reads the declarations for the standard I/O routines. The
include directive has the effect of having the compiler logically insert the source code taken
from another file. Specifically, the file name that resides in between < and > refers to a system
file. Alternatively, a pair of double quotes can be used to specify a user-defined (non-system)
file. Thus in our example, the entire contents of the file iostream, which resides in a system-
dependent location, is substituted for line 1. In Section 2.1.7 we discuss how the include direc-
tive is typically used to enable faster compilations.
In C++, lines that begin with # are preprocessor directives. We will see an important
use of preprocessor directives in Section 4.12.
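One simple use, sketched here, is conditional compilation: #ifdef, #else, and #endif let the preprocessor keep or discard source lines before the compiler proper sees them. The macro name DEBUG_BUILD and the function buildMode are hypothetical; many compilers let you define such a macro from the command line (commonly with a -D option).

```cpp
#include <string>
using namespace std;

// Conditional compilation: the preprocessor keeps exactly one of the two
// return statements. DEBUG_BUILD is a hypothetical macro name.
string buildMode( )
{
#ifdef DEBUG_BUILD
    return "debug";
#else
    return "release";
#endif
}
```

Compiled normally, buildMode returns "release"; compiled with DEBUG_BUILD defined, the other branch is kept instead.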
1.1.4 Output
Ignoring the return statement, the simple program in Figure 1-1 consists of a single statement. This statement, shown at line 6, is the output mechanism in C++. Here a constant string is placed on the standard output stream cout. Simple terminal input is similar to output, with cin replacing cout, and >> replacing <<. As an example:
int x;
cout << "Enter a value of x: ";
cin >> x;
reads the next series of characters (skipping leading whitespace), interpreting them as an integer. Of course, this example ignores the handling of input errors, which is crucial in any serious application. Input and output are discussed in more detail in Chapter 9.
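As a minimal sketch of basic error handling (the function readInt is our own), the stream expression in >> x can itself be tested: it is false when the characters cannot be interpreted as an integer. An istringstream, which reads from a string instead of the keyboard, stands in for cin here so that the function is easy to test; the same test works with cin. The parameter x uses call-by-reference, described in Chapter 2, so the caller sees the value read.

```cpp
#include <sstream>
#include <string>
using namespace std;

// Attempts to read an int from text. Returns true and stores the value in x
// on success; returns false if no integer could be read. The same pattern
// works with cin:   if( cin >> x ) ...
bool readInt( const string & text, int & x )
{
    istringstream in( text );   // a stream that reads from a string

    if( in >> x )               // the stream tests false if the read failed
        return true;
    return false;
}
```

Note that an input such as 12abc succeeds and stores 12, because reading stops at the first character that cannot be part of an integer.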
Thus, whereas a short might store values in the range -32768 to 32767, an unsigned short could store values in the range 0 to 65535. Using unsigned to double the range of positive integers carries a significant danger: mixing unsigned and signed values can produce surprising results, so it is best to use unsigned types judiciously. An unsigned constant includes a U at the end, as in 1000U; the constant 1000UL is an unsigned long. By default, the integer types are signed (the exception is plain char, whose signedness is implementation-defined).
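The two small functions below (names our own) sketch where the surprises come from:

```cpp
#include <climits>

// Unsigned arithmetic never produces a negative value; it wraps around.
// Subtracting 1 from an unsigned 0 yields UINT_MAX, the largest unsigned int.
unsigned int decrement( unsigned int u )
{
    return u - 1;
}

// When an int and an unsigned int meet in an expression, the int is
// converted to unsigned int first, so a negative a turns into a huge value.
// For the same reason, the comparison -1 < 1U evaluates to false.
unsigned int mixedSum( int a, unsigned int b )
{
    return a + b;
}
```

A classic consequence is that a loop such as for( unsigned int i = n - 1; i >= 0; i-- ) never terminates, since an unsigned i is always >= 0.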
C++ does not define an equivalent to Java's byte data type. Historically, C++ programmers have used signed char, which on virtually all platforms is eight bits, for this purpose.
1.3.2 Conditionals
The if statement in C++ is identical to Java, except that in C++ the condition of the if statement can be an int (or any other numeric or pointer type), not just a bool. This stems from the fact that historically, early implementations of C++ did not have a bool type, but instead interpreted 0 as false and anything non-zero as true. The unfortunate consequence of this decision is that code such as
if( x = 0 )
which uses = instead of the intended == is not flagged as a compiler error, but instead sets x to
zero and evaluates the condition to false. This is possibly the most common trivial C++ pro-
gramming error.
Occasionally, especially when reading old C++ code, shorthands that make use of the fact
that nonzero evaluates to true are placed in the conditional expression. Thus it is not uncommon
to see tests that should be
if( i != 0 )
rewritten as
if( i )
Newly written code should avoid these types of shortcuts since they tend to make the code less
readable, and no more efficient.
1.3.3 Loops
Like Java, C++ has the for loop, the while loop, and the do loop, along with the break and con-
tinue statements. However in C++, the use of the labelled break and labelled continue to affect
flow of an outer loop is not allowed.
C++ allows the goto, but its use is strongly discouraged.
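Without labeled break, a common C++ idiom for leaving nested loops is to place them in a function of their own and return from the inner loop (a flag variable tested by the outer loop also works). A minimal sketch, with the hypothetical function findInGrid; the row and col parameters use call-by-reference (Chapter 2) to hand back the position:

```cpp
#include <cstddef>
#include <vector>
using namespace std;

// Searches a two-dimensional grid for target. On success, stores the
// position in row and col and returns true; otherwise returns false.
// The return statement leaves both loops at once, doing the job that a
// labeled break would do in Java.
bool findInGrid( const vector< vector<int> > & grid, int target,
                 size_t & row, size_t & col )
{
    for( size_t r = 0; r < grid.size( ); r++ )
        for( size_t c = 0; c < grid[ r ].size( ); c++ )
            if( grid[ r ][ c ] == target )
            {
                row = r;
                col = c;
                return true;
            }

    return false;
}
```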
First, let us mention that C++ is more liberal than Java in accepting code without a cast. In
Java,
double x = 6.0;
int y;
y = x;
does not compile, but the code does compile (possibly with a warning about losing precision) in
C++. In some cases, typecasting is essential in order to produce a correct answer, as in
double quotient;
int x = 6;
int y = 10;
quotient = x / y;
In this example, in both Java and C++, quotient evaluates to 0.0 rather than 0.6, because
the operands are both of type int, and so division is performed with truncation to an int.
There are several syntactic ways to handle the typecast. The Java style, which also works in C++,
quotient = (double) x / (double) y;
in which we cast both operands, is easy to read. Since only one operand needs to be a double,
quotient = (double) x / y;
also works. However, this code is harder to read because one must be aware that the precedence of the typecast operator is higher than the precedence of division, which is why the typecast applies to x, and not to x/y. C++ offers an alternative that many programmers prefer:
quotient = double( x ) / y;
Here, the parentheses surround the expression to be cast, rather than the new type. (Note that this form does not work for type names consisting of more than one word, such as unsigned int.)
Although the double( x ) style is preferable, it is hard to find the typecasts that are being used in a program, since the syntax does not stand out. A third form, a late addition to C++, is to use static_cast:
quotient = static_cast<double>( x ) / y;
Though this requires more typing, it is easy to find the casts mechanically.
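The following sketch (function names are our own) puts the forms side by side, together with a common mistake: casting the result of the division instead of an operand.

```cpp
// Without a cast, x / y is integer division, so 6 / 10 truncates to 0
// before the result is converted to double.
double noCast( int x, int y )          { return x / y; }

// The three casting styles described above; each yields (approximately)
// 0.6 for x = 6 and y = 10.
double javaStyle( int x, int y )       { return (double) x / y; }
double functionStyle( int x, int y )   { return double( x ) / y; }
double staticCastStyle( int x, int y ) { return static_cast<double>( x ) / y; }

// Casting too late does not help: the division has already truncated.
double castTooLate( int x, int y )     { return (double) ( x / y ); }
```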
A different type of typecast, useful for downcasting in inheritance hierarchies, is the
dynamic_cast. We discuss this in Section 6.7.
1.4.2 Labels
C++ allows labels prior to virtually any statement, for use with the goto statement. Since gotos
typically are symptomatic of poorly designed code, one would rarely expect to see labels in C++
code.
to existing types. As an example, suppose that we need to declare objects of type 32-bit unsigned integer. On some machines this might be unsigned long, while on others, perhaps an unsigned int is the most appropriate type. On the first machine, we would use
typedef unsigned long uint32;
whereas on the second machine, we would use
typedef unsigned int uint32;
Now, in the rest of the code, either machine can declare variables of type uint32:
uint32 x, y, z;
meaning that the non-portable aspect of the code is confined to one line.
• The typedef statement allows the programmer to assign meaningful names to existing
types.
1.6 Exercises
CHAPTER 2
In this continuation of the discussion in Chapter 1, we examine how non-class methods are written, discuss the library string and growable array types, and see several parameter passing mechanisms that are available in C++.
2.1 Functions
In Section 1.1.1 we mentioned that C++ allows methods that are not members of a class. Typi-
cally we refer to these as functions. This section discusses functions; member functions are dis-
cussed in Chapter 4.
#include <iostream>
using namespace std;

int max2( int a, int b )
{
    return a > b ? a : b;
}

int main( )
{
    int x = 37;
    int y = 52;

    cout << "Max is " << max2( x, y ) << endl;
    return 0;
}
#include <iostream>
using namespace std;

int main( )
{
    int x = 37;
    int y = 52;

    // Does not compile: max2 has not been declared at this point
    cout << "Max is " << max2( x, y ) << endl;
    return 0;
}

int max2( int a, int b )
{
    return a > b ? a : b;
}
The syntax of the function prototype is for all intents and purposes identical to the listing of a method in a Java interface. For instance, the prototype for max2 is:
int max2( int a, int b );
The prototype does not include the body, but instead terminates the declaration with a semicolon. The prototype, which is also known as a function declaration, allows the max2 function, which is presumed to be defined elsewhere, to be a candidate when max2 is invoked. A typical strategy is thus to list all the prototypes prior to the first function definition, ensuring that all functions can call all other functions. Figure 2-3 illustrates the use of the function prototype.
#include <iostream>
using namespace std;

int max2( int a, int b );

int main( )
{
    int x = 37;
    int y = 52;

    cout << "Max is " << max2( x, y ) << endl;
    return 0;
}

int max2( int a, int b )
{
    return a > b ? a : b;
}
#include <iostream>
using namespace std;

inline int max2( int a, int b )
{
    return a > b ? a : b;
}

int main( )
{
    int x = 37;
    int y = 52;

    cout << "Max is " << max2( x, y ) << endl;
    return 0;
}
Figure 2-4 Inline method definition must be visible prior to call to be expanded
Modern compilers use very sophisticated techniques to decide whether honoring the directive produces better code. The compiler may well refuse to perform the optimization for functions that are too long. Additionally, the directive is likely to be ignored for recursive functions.
Often we would like to split a program up into several source files. In such a case, we expect to
be able to invoke a function that is defined in one file from a point that is not in the same file.
This is acceptable as long as a prototype is visible.
The typical scenario to allow this is that instead of having each file list its function decla-
rations at the top, each file creates a corresponding .h file with the function declarations. Then,
any file that needs these declarations can provide an appropriate include directive.
In our example this gives three files all shown in Figure 2-5. The file max2.h simply lists
the function declarations for all functions defined in max2.cpp. The main program is defined
in a separate file and provides the include directive. Recall that this directive replaces line 5 with
the contents of the file max2.h. Finally, max2.cpp provides an include directive also. This
include directive is not needed, but would be typical in the case in which max2.cpp had sev-
eral functions that were calling each other. Section 4.12 discusses one other issue that is com-
mon with this technique.
Compiling this program depends on the platform. Most IDEs perform two steps: compila-
tion and linking. The compilation stage verifies the syntax and generates object code for each of
the files. The linking stage is used to resolve the function invocations with the actual definitions
(sometimes this occurs at runtime).
comparable as the difference between Java’s built-in array type and ArrayList library type.
Actually, this is probably an understatement, because using built-in arrays in C++ is much more
difficult than using the Java counterpart, whereas vector and ArrayList are for all intents
and purposes identical. The reason is that using C++ built-in arrays may require you to indepen-
dently maintain the array size and also provide additional code to reclaim memory. vector has
neither of these problems. The built-in C++ array type is discussed in Chapter 11. If you would
like to keep your blood pressure low, it’s a good idea to stick with the vector type as much as
possible.
Similarly, strings in C++ come in two basic flavors. The primitive string is simply a built-in array of characters, and is exasperating and dangerous to use (see Chapter 11). The library type, string, is a full-fledged string class and is easy to use.
#include <iostream>
#include <vector>
using namespace std;

int main( )
{
    vector<int> squares;

    for( int i = 0; i < 100; i++ )
        squares.push_back( i * i );

    for( int j = 0; j < squares.size( ); j++ )
        cout << j << " squared is " << squares[ j ] << endl;

    return 0;
}
vector<int> arr3( );
thinking that the default parameter would use size 0. Unfortunately this is wrong! The declara-
tion above does not create a vector; instead, it states that arr3 is a function that takes no
parameters and returns a vector<int>. Ouch! The result would almost certainly be a bizarre
string of unintelligible error messages when arr3 was used later.
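The fix is simply to omit the parentheses. The sketch below (declaredSizes is a hypothetical name) contrasts a correct declaration of an empty vector with the constructor forms that take a size and, optionally, an initial value:

```cpp
#include <cstddef>
#include <vector>
using namespace std;

// Returns the sizes of three correctly declared vectors.
// Note: writing  vector<int> arr1( );  would instead declare a function
// named arr1 that takes no parameters and returns a vector<int>.
vector<size_t> declaredSizes( )
{
    vector<int> arr1;           // no parentheses: empty vector, size 0
    vector<int> arr2( 10 );     // 10 elements, each initialized to 0
    vector<int> arr3( 5, -1 );  // 5 elements, each initialized to -1

    vector<size_t> sizes;
    sizes.push_back( arr1.size( ) );
    sizes.push_back( arr2.size( ) );
    sizes.push_back( arr3.size( ) );
    return sizes;
}
```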
To use the string library type, you must have the include directive (and possibly the usual
using namespace std directive if it is not already present):
#include <string>
Because objects in C++ are meant to look like primitive types, strings in C++ are also easier to use than in Java. Most importantly, the normal equality and relational operators ==, !=, <, <=, >, >= all work for the C++ string type. No more bugs because you forgot to use .equals. When = is applied to a string, a copy is made; subsequent changes to the original do not affect the copy. The length of a string can always be obtained by calling the length member function.
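A small sketch of these value semantics (both function names are our own):

```cpp
#include <string>
#include <utility>
using namespace std;

// Returns -1, 0, or 1 as a is less than, equal to, or greater than b,
// using the ordinary operators in place of Java's equals and compareTo.
int compareStrings( const string & a, const string & b )
{
    if( a == b )
        return 0;
    return a < b ? -1 : 1;
}

// Shows that = copies: modifying the copy leaves the original unchanged.
// Returns the ( original, copy ) pair after the copy has been modified.
pair<string, string> copyThenModify( const string & s )
{
    string original = s;
    string copy = original;     // copy is an independent string
    copy += "!";                // changes copy only
    return make_pair( original, copy );
}
```

In Java, the analogous assignment would merely make two references to the same String object.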
Java strings are immutable: once created, a Java string's state cannot be changed. In contrast, the C++ string is mutable. This has two consequences. First, using the array indexing operator [], not only can individual characters be accessed, they can easily be changed (however, as in vector, no bounds checking is performed). Second, the += operator is efficient; no more StringBuffers! Figure 2-7 illustrates the use of string concatenation; makeLongString takes quadratic time in Java, but is linear in C++.
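A minimal sketch of the first consequence (capitalizeFirst is a hypothetical name; toupper comes from the standard <cctype> header):

```cpp
#include <cctype>
#include <string>
using namespace std;

// Returns s with its first character converted to upper case.
// s is a copy (call-by-value), so the caller's string is not changed;
// within the function, s[ 0 ] can be assigned to because C++ strings
// are mutable. operator[] does no bounds checking, hence the empty test.
string capitalizeFirst( string s )
{
    if( !s.empty( ) )
        s[ 0 ] = toupper( static_cast<unsigned char>( s[ 0 ] ) );
    return s;
}
```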
#include <iostream>
#include <string>
using namespace std;

// return a string that contains n As
// In Java this code takes forever
string makeLongString( int n )
{
    string result = "";

    for( int i = 0; i < n; i++ )
        result += "A";

    return result;
}

int main( )
{
    string manyAs = makeLongString( 250000 );

    cout << "Short string is " << makeLongString( 20 ) << endl;
    cout << "Length is " << manyAs.length( ) << endl;

    return 0;
}
Figure 2-7 String concatenation is efficient in C++ because strings are mutable
Two member functions deserve special attention. First, the member function to get sub-
strings is substr. However, unlike Java, the parameters in C++ represent the starting position
and length of the substring, rather than the starting position and first non-included position. Thus
in
string s = "hello";
string sub = s.substr( 2, 2 ); // gives "ll"
sub is a string of length 2, whose first character is s[2].
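Java programmers who think in terms of s.substring( begin, end ) can convert by passing end - begin as the length. A sketch, with the hypothetical helper javaSubstring; it assumes begin <= end <= s.length( ), and substr itself throws out_of_range if the starting position exceeds the length:

```cpp
#include <string>
using namespace std;

// Hypothetical helper with Java's substring( begin, end ) semantics, where
// end is the first position NOT included. C++'s substr( pos, len ) takes a
// length instead, so the length passed is end - begin.
string javaSubstring( const string & s, string::size_type begin,
                      string::size_type end )
{
    return s.substr( begin, end - begin );
}
```

With a single argument, s.substr( pos ) returns everything from pos to the end, like Java's one-parameter substring.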
Second, some legacy code (and C++ libraries) expect primitive strings, not the string
library type. Resist the temptation to use primitive strings. Instead, do all your dirty work with
the string library type, and when a primitive string is required, use the c_str string member
function that is part of string to obtain the primitive equivalent.
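As a small sketch, the C library function strlen (declared in <cstring>) accepts only a primitive string, so a library string is passed through c_str; the function name primitiveLength is our own. The pointer that c_str returns should be used right away, because it may become invalid once the string is modified.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
using namespace std;

// strlen expects a primitive string (a const char * ending in '\0');
// c_str( ) supplies one for the library string s.
size_t primitiveLength( const string & s )
{
    return strlen( s.c_str( ) );
}
```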
#include <iostream>
using namespace std;

// Incorrect implementation of swap2
void swap2( int val1, int val2 )
{
    int tmp = val1;
    val1 = val2;
    val2 = tmp;
}

int main( )
{
    int x = 37;
    int y = 52;

    swap2( x, y );

    cout << x << " " << y << endl;

    return 0;
}
Figure 2-8 Swap function that does not work because of call-by-value
objects. Specifically, all the items in the array are objects that have been created by calling an appropriate zero-parameter constructor. In C++, this does not work well when the objects are of different types (for instance, types related via inheritance), but if they are all the same type, e.g., string, it works fine and no special syntax is required.
#include <iostream>
using namespace std;

// Correct implementation of swap2
void swap2( int & a, int & b )
{
    int tmp = a;
    a = b;
    b = tmp;
}

int main( )
{
    int x = 37;
    int y = 52;

    swap2( x, y );

    cout << x << " " << y << endl;

    return 0;
}
In Java, all parameters are passed using call-by-value: the values of the actual arguments are
copied into the formal parameters.
Parameter passing in C++ is also call-by-value by default. Although this is often desirable,
there are two separate situations where an alternative is useful.
Figure 2-8 illustrates a function, swap2, that attempts to swap the values of its parameters. But
if main is run, the values of x and y are not swapped, because of call-by-value. The code swaps
val1 and val2, but since these are simply copies of x and y, it is not possible to write a swap
routine that changes x and y when x and y are passed using call-by-value.
C++ allows call-by-reference. To pass a parameter using call-by-reference, we simply
place an & prior to each of the parameters that are to be passed using this mechanism. Thus,
some parameters can be passed using call-by-value, and others using call-by-reference. The
modified version of swap2, which is now correct, is shown in Figure 2-9. Observe that main is
unchanged: no special syntax is used to invoke the function.
Because formal parameters that are passed using call-by-reference are modifiable in the
invoked function, it is illegal to pass a constant using call-by-reference. Thus the code
swap2(x,3) would not compile.
Actual arguments must be type-compatible with the formal parameters, without the use of
a typecast. This is required because a typecast generates a temporary variable, and the temporary
variable would become the actual argument, and then changes to the formal parameter in the
invoked function would change the temporary variable (instead of the original), leading to
hard-to-find bugs.
#include <iostream>
#include <vector>
using namespace std;

// Correct but slow binarySearch: call-by-value copies the whole vector
int binarySearch( vector<int> arr, int x )
{
    int low = 0, high = arr.size( ) - 1;

    while( low <= high )
    {
        int mid = ( low + high ) / 2;
        if( arr[ mid ] == x )
            return mid;
        else if( x < arr[ mid ] )
            high = mid - 1;
        else
            low = mid + 1;
    }

    return -1; // not found
}

int main( )
{
    vector<int> v;

    for( int i = 0; i < 30000; i++ )
        v.push_back( i * i );

    for( int j = 100000; j < 105000; j++ )
        if( binarySearch( v, j ) >= 0 )
            cout << j << " is a perfect square" << endl;

    return 0;
}
Finally, we mention that the parameter passing mechanism is part of the signature of the
function and must be included in function declarations.
execute in less than a millisecond. However, the code takes noticeably longer, using up seconds
of CPU time. Here the problem is not the binary search, but the parameter passing: because we
are using call-by-value, each call to binarySearch makes a complete copy of vector v.
Needless to say, 5,000 copies of a 30,000-element vector do not come cheap. This
problem never occurs in Java because all objects (non-primitive entities) are accessed using Java
reference variables, and so objects are always shared, and never copied by using =.
But in C++, a variable of type vector<int> stores the entire state of the object, and =
copies an entire object to another object. Similarly, call-by-value copies the entire state of the
actual argument to the formal parameter.
Certainly one way to avoid this problem would be to use call-by-reference. Then the for-
mal parameter is just another name for the actual argument; no copy is made. Although this
would significantly increase the speed of the program, and solve the problem, there are two seri-
ous drawbacks. First, using call-by-reference changes the semantics of the function in that the
caller no longer knows that the actual argument will be unchanged after the call. Second, as we
mentioned in Section 2.3.1, constants or actual arguments requiring typecasts would no longer
be acceptable parameters.
The solution, shown in Figure 2-11, is to augment the call-by-reference with the reserved
word const, signifying that the parameters are to be passed by reference, but that the function
promises not to make any changes to the formal parameter. Thus the actual argument is
protected from being changed. If the function implementation attempts to make a change to a
const formal parameter, the compiler will complain and the function will not compile (it is
possible to cast away the const-ness and subvert this rule, but that’s life with C++). We denote
this parameter passing mechanism as call-by-constant reference (even though the more verbose
call-by-reference to a constant is more accurate).
When a parameter is passed using call-by-constant reference, it is acceptable to supply a
constant or an actual argument that requires a typecast. In effect, as far as the caller is concerned,
call-by-constant reference has the same semantics as call-by-value, except that copying is
avoided.
• Call-by-reference is required for any object that may be altered by the function.
• Call-by-value is appropriate for small objects that should not be altered by the function.
This generally includes primitive types and also function objects (Section 7.6.3).
• Call-by-constant reference is appropriate for large objects that should not be altered by the
function. This generally includes library containers such as vector, general class types,
and even string.
2.4 Key Points
• A function declaration consists of the function minus the body and looks like a method
declaration in a Java interface. A function definition includes the body.
• When invoking a function, only those functions whose declarations or definitions have
already been seen are eligible to be candidates.
• C++ allows default parameters; the default parameters must be the last parameters.
• In C++, objects can be created without calling new, and accessed and copied as if they
were primitive entities. See Chapter 3 for more information about this.
• C++ supports arrays using both a primitive array and a vector library type. Similarly,
strings can be supported using both primitive arrays of char and a string library type.
• The vector in C++ has functionality that is similar to the ArrayList class in Java.
• The string class in C++ has functionality that is similar to the String class in Java.
The += operator is efficient, the substring function substr takes a different second
parameter, and all the relational operators work, so methods such as equals and
compareTo are not required.
• C++ supports three modes of parameter passing: call-by-value, call-by-reference, and call-
by-constant reference.
2.5 Exercises
1. Grab the most recent tax table. Write a function that takes an adjusted gross income and
filing status (married, single, etc.) and returns the amount of tax owed. Write a test pro-
gram to verify that your function behaves correctly.
2. Write a function to compute X^N for nonnegative integers N. Assume that X^0 = 1.
3. What is the difference between a function declaration and a function definition?
4. Describe default parameters in C++.
5. What is an inline directive in C++?
6. Describe how C++ supports separate compilation.
7. Write a function that accepts a vector of strings and returns a vector containing the strings
in the vector parameter that have the longest length (in other words, if there are ten strings
that are tied for being the longest length, the return value is a vector of size ten containing
those strings). Then write a test program that reads an arbitrary number of strings, invokes
the function, and outputs the strings in the returned vector.
8. What are the basic differences between the C++ string and the Java String classes?
9. What are the different parameter passing mechanisms in C++, and when are they used?
CHAPTER 3
POSSIBLY the most significant difference between
Java and C++ concerns how objects are managed by the system. Java has a simple model that
guarantees, among other things, that once an object has been created, the Virtual Machine will
never attempt to destroy the object’s memory as long as the object is being actively referenced,
either directly or indirectly. C++, on the other hand, allows objects to be accessed in many dif-
ferent ways, making coding very tricky, and leading to many subtle and hard-to-find runtime
errors.
In this chapter, we review the Java typing system, briefly describe the C++ object model,
and introduce the C++ concepts of pointer variables and reference variables. We will also
describe how the C++ programmer destroys objects that are no longer needed.
the runtime stack, and the local primitive variable is automatically invalidated when the block
that it was created in ends.
The C++ memory model is significantly more complicated. As we have already seen,
local variables, including objects in C++ (such as vectors and strings), can be created without
calling new. In such a case, these local objects are allocated on the runtime stack and have
the same semantics as primitive types. Two important consequences of this are as follows:
First, when = is applied to objects, the state of one object is copied. This contrasts with Java,
where objects are accessed indirectly, by reference variables, and an = copies the value of the
reference variable, rather than the state of the object.
Second, when the block in which a local object is created ends, the object will automati-
cally be reclaimed by the system.
Thus in the following code:
void silly( )
{
vector<int> arr1;
vector<int> arr2;
... // other code not shown
arr2 = arr1;
... // other code not shown
}
arr1 and arr2 are separate vector<int> objects. The statement arr2=arr1 copies the
entire contents of vector<int> arr1 into vector<int> arr2. When silly returns,
arr1 and arr2 will be destroyed, and their memory reclaimed, as part of the function return
sequence.
This example illustrates a significant difference between Java and C++. In Java, objects are
shared by several reference variables and are rarely copied. C++ encourages, by default, the
copying of objects. However, copying objects can take time, and often we need to avoid this
expense. For instance, in Section 2.3 we have already seen additional syntax that allows copies
to be avoided.
In a more general setting inherited from C, C++ allows the programmer to obtain a vari-
able that stores the memory address where the object is being kept. Such a variable is called a
pointer variable in C++. Pointer variables in C++ have many of the same semantics as reference
variables in Java, with extra flexibility (that implies extra dangers). C++ also has another type of
variable called the reference variable, which despite its name is not similar to the reference vari-
able in Java. In the remainder of this chapter, we will discuss both types of variables, and several
tricky C++ issues that are associated with their use.
3.2 Pointers
A pointer variable in C++ is a variable that stores the memory address of any other entity. The
entity can be a primitive type or a class type; however, using pointer variables for primitive
types in good C++ code is rarely needed. Since it does simplify our examples, we will make use
[Figures 3-1 and 3-2 (memory diagrams): ptr is stored at address 1200, x at address 1000, and y at address 1004, with ptr = &x = 1000. After *ptr = 10, x holds 10 and y still holds 7.]
The data being pointed at is obtained by the unary dereferencing operator *. In Figure 3-1
*ptr will evaluate to 5, which is the value of the pointed-at variable x. It is illegal to derefer-
ence something that is not a pointer. The * operator is the inverse of & (for example, *&x=5 is
the same as x=5 as long as &x is legal). Dereferencing works not only for reading values from
an object but also for writing new values to the object. Thus, if we say
*ptr = 10; // LEGAL
we have changed the value of x to 10. Figure 3-2 shows the changes that result. This shows the
problem with pointers: Unrestricted alterations are possible, and a runaway pointer can over-
write all sorts of variables unintentionally.
We could also have initialized ptr at declaration time by having it point to x:
int x = 5;
int y = 7;
int *ptr = &x; // LEGAL
The declaration says that x is an int initialized to 5, y is an int initialized to 7, and ptr is a
pointer to an int and is initialized to point at x. Let us look at what could have gone wrong.
The following declaration sequence is incorrect:
int *ptr = &x; // ILLEGAL: x is not declared yet
int x = 5;
int y = 7;
Here we are using x before it has been declared, so the compiler will complain. Here is another
common error:
int x = 5;
int y = 7;
int *ptr = x; // ILLEGAL: x is not an address
In this case we are trying to have ptr point at x, but we have forgotten that a pointer holds an
address. Thus we need an address on the right side of the assignment. The compiler will com-
plain that we have forgotten the &, but its error message may initially appear cryptic.
[Figure 3-3 (memory diagram): ptr, stored at address 1200, is uninitialized (ptr = ?), alongside x and y.]
Continuing with this example, suppose that we have the correct declaration but with ptr
uninitialized:
int x = 5;
int y = 7;
int *ptr; // LEGAL but ptr is uninitialized
What is the value of ptr? As Figure 3-3 shows, the value is undefined because it was never ini-
tialized. Thus the value of *ptr is also undefined. However, using *ptr when ptr is unde-
fined is worse because ptr could hold an address that makes absolutely no sense at all, thus
causing a program crash if it is dereferenced. Even worse, ptr could be pointing at an address
that is accessible, in which case the program will not immediately crash but will be erroneous. If
*ptr is the target of an assignment, then we would be accidentally changing some other data,
which could result in a crash at a later point. This is a tough error to detect because the cause and
symptom may be widely separated in time.
We have already seen the correct syntax for the assignment:
ptr = &x; // LEGAL
Suppose that we forget the address-of operator. Then the assignment
ptr = x; // ILLEGAL: x is not an address
rightly generates a compiler error. There are two ways to make the compiler shut up. One is to
take the address on the right side, as in the correct syntax. The other method is erroneous:
*ptr = x; // Semantically incorrect
The compiler is quiet because the statement says that the int to which ptr is pointing should
get the value of x. For instance, if ptr is &y, then y is assigned the value of x. This assignment
is perfectly legal, but it does not make ptr point at x. Moreover, if ptr is uninitialized, deref-
erencing it is likely to cause a run-time error, as discussed above. This error is obvious from Fig-
ure 3-3. The moral is to always draw a picture at the first sign of pointer trouble.
Using *ptr=x instead of ptr=&x is a common error for two reasons. First, since it
makes the compiler quiet, programmers feel comfortable about using the incorrect semantics.
Second, it looks somewhat like the syntax used for initialization at declaration time. The differ-
ence is that the * at declaration time is not a dereferencing * but rather just an indication that the
object is a pointer type.
Some final words before we get to some substantive uses: First, sometimes we want to
state explicitly that a pointer is pointing nowhere, as opposed to an undefined location. The
NULL pointer points at a memory location that is guaranteed to be incapable of holding any-
thing. Consequently, a NULL pointer cannot be dereferenced. The symbolic constant NULL was
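prevalent in C, but is being phased out in favor of an explicit 0. But many still feel that NULL is
more readable, and so we use NULL. There is no need to declare it yourself: standard headers
such as <cstddef> already define NULL as a macro, and a declaration such as
const int NULL = 0;
would clash with that macro once any of those headers is included.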
Pointers are best initialized to the NULL pointer because in many cases they have no
default initial values (these rules apply to other predefined types as well).
Second, a dereferenced pointer behaves just like the object that it is pointing at. Thus, after
the following three statements, the value stored in x is 15:
x = 5;
ptr = &x;
*ptr += 10;
However, we must be cognizant of precedence rules because (as we discuss in Section 11.3) it is
possible to perform arithmetic not only on the dereferenced values but also on the (un-derefer-
enced) pointers themselves.1 As an example, the following two statements are very different:
*ptr += 1;
*ptr++;
In the first statement the += operator is applied to *ptr, but in the second statement the ++
operator is applied to ptr. The result of applying the ++ operator to ptr is that ptr will be
changed to point at the next int in memory, one int-sized unit beyond where it used to point.
(We discuss why this might be useful in Section 11.3.)
Third, if ptr1 and ptr2 are pointers to the same type, then
ptr1 = ptr2;
sets ptr1 to point to the same location as ptr2, while
*ptr1 = *ptr2;
assigns the dereferenced ptr1 the value of the dereferenced ptr2. Figure 3-4 shows that these
statements are quite different. Moreover, when the wrong form is used mistakenly, the conse-
quences might not be obvious immediately. In the previous examples, after the assignment,
*ptr1 and *ptr2 are both 7. Similarly, the expression
ptr1 == ptr2
is true if the two pointers are pointing at the same memory location, while
*ptr1 == *ptr2
is true if the values stored at the two indicated addresses are equal. It is a common mistake to use
the wrong form.
The requirement that ptr1 and ptr2 point to the same type is a consequence of the fact
that C++ is strongly typed: We cannot mix different types of pointers without an explicit type
conversion, unless the user has provided an implicit type conversion.
If several pointers are declared in one statement, the * must precede each variable:
int *ptr1, *ptr2; // Correct: ptr1 and ptr2 are both pointers to int
int *ptr1, ptr2; // Wrong!! ptr2 is an int
Finally, when pointers are declared, the white space that surrounds the * is unimportant to
the compiler. Pick a style that you like.
1. This is an unfortunate consequence of C++’s very liberal rules that allow arithmetic on pointers, making use of
the fact that pointers are internally stored as integers. We discuss the reasoning for this in Section 11.3.
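[Figure 3-4 (three memory diagrams): initially ptr1 points at x (5) and ptr2 points at y (7); after ptr1 = ptr2, both point at y and x is still 5; after *ptr1 = *ptr2 instead, x becomes 7.]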
rences in many C++ programs. The delete operator is illustrated at line 16.
An example of a memory leak is shown in Figure 3-6, in which we return at line 9 without
calling delete. Fortunately, many sources of memory leaks can be automatically removed
with care. One important rule is to not use new when a stack-allocated variable can be used
instead. A stack-allocated variable is automatically cleaned up (hence it is also known as an
automatic variable in C++). Thus in this code, it would make sense to allocate the vector on the
runtime stack (and avoid the pointer) instead of using new to create it on the heap.
1 #include <iostream>
2 #include <string>
3 using namespace std;
4
5 int main( )
6 {
7 string *strPtr;
8
9 strPtr = new string( "hello" );
10 cout << "The string is: " << *strPtr << endl;
11 cout << "Its length is: " << (*strPtr).length( ) << endl;
12
13 *strPtr += " world";
14 cout << "Now the string is " << *strPtr << endl;
15
16 delete strPtr;
17
18 return 0;
19 }
1 #include <iostream>
2 #include <vector>
3 using namespace std;
4
5 void leak( int i )
6 {
7 vector<int> *ptrToVector = new vector<int>( i );
8 if( i % 2 == 1 )
9 return;
10 // some other code not shown ...
11 delete ptrToVector;
12 }
Figure 3-6 Illustration of memory leak in program that does nothing useful
3.3.4 Double-delete
A second problem is the so-called double-delete. A double-delete occurs when we attempt to
call delete on the same object more than once. This would occur if we now made the call
delete s; // Oops -- double delete
Since s is stale, the object that it points to is no longer valid. Trouble in the form of a runtime
error is likely to result.
Thus, we see the perils of dynamic memory allocation. We must be certain to never call
delete more than once on an object, and then only after we no longer need it. But if we don’t
call delete at all, we get a memory leak. And if we have a pointer variable and intend to call
delete, we must be certain that the object being pointed at was created by a call to new. When
we have functions calling functions calling other functions, it is hard to keep track of everything.
Figure 3-7 Stale pointers: because of the call to delete t, pointers s and t are now
pointing at an object that no longer exists; a call to delete s would now be
an illegal double-deletion
Figure 3-8 A stale pointer: the pointee, ret, does not exist after dup returns
Figure 3-9 Safer code, but the caller must call delete or there may be a memory leak
Figure 3-10 Using static local variable frees caller from having to call delete
A third option that is sometimes used frees the caller from reclaiming memory. Here, we
return a pointer to a variable that is not allocated from the heap, yet is not allocated on the
runtime stack. Two such entities qualify: a global variable, or a static local variable. A static local
variable is essentially the same as a global variable, except that it is only visible inside the
function in which it was declared. The static local variable is created once (the first time the function
is invoked), and the variable retains its value between calls to the same function. Thus, like a
static class variable in Java, a static local variable could be used to keep track of the number of
times a function has been invoked. Figure 3-10 illustrates this version of dup.
When a function returns a pointer to a static local variable, the caller no longer has to man-
age memory. However, now the caller must use the return value and, in particular, the object
being pointed at by the return value, prior to making another call to the function. Otherwise, in
the following code fragment,
string *s1 = dup( "hello" );
string *s2 = dup( "world" );
cout << *s1 << " " << *s2 << endl;
the string world is printed twice, because both s1 and s2 are pointing at the same
static object (ret), which is storing the result of the last call to dup. Thus, once again, it is
incumbent on the programmer to document that the return value is a pointer to a static variable,
and that the return value must be used promptly.
You should never use delete on an object that was not created by new; if you do, runtime
havoc is likely to result. For instance, an attempt to call delete on s1 or s2 in the last
example may lead to a disaster on some C++ implementations. This shows the largest problem
with pointers in C++: when you receive a pointer variable from a function whose implementation
is hidden (which we would typically expect), you have no way of knowing, unless there are
comments, whether you are responsible for calling delete or must not call delete, and making
the wrong decision results in either a memory leak or perhaps an invalid delete. Furthermore, if
you have an array of pointers, and the objects being pointed at were allocated from the memory
heap, then you may need to call delete on these objects when you are done with the array. If any
object is being pointed at twice, the programmer must write extra code to avoid double-deletions.
count += 3;
Reference variables must be initialized when they are declared and cannot be changed to
reference another variable. This is because an attempted reassignment via
count = someOtherObject;
assigns to the object longVariableName the value of someOtherObject. This example
is a poor use of reference variables but accurately reflects how they are used in a more general
setting in which the scope of the reference variable is different from that of the object being
referenced. One important case is that a reference variable can be used as a formal parameter,
which acts as an alias for an actual argument. We have previously discussed this in the context of
using call-by-reference and call-by-constant reference passing vectors (Section 2.3).
Reference variables are like pointer constants in that the value they store is the address of
the object they refer to. They are different in that an automatic invisible dereference operator is
applied to the reference variable.
Many C++ libraries are based on functions that were written in C, where reference vari-
ables are not available. In C, pointer variables are used to achieve call-by-reference, and C++
programs are likely to run into older code that uses this technique. We discuss using pointers to
simulate call-by-reference semantics in Section 12.3.
Using C++ reference variables instead of pointer variables translates into a notational con-
venience, especially because it allows parameters to be passed by reference without the excess
baggage of the & operator on the actual arguments and the * operator that tends to clutter up C
programs.
By the way, pointers can be passed by reference. This is used to allow a function to change
where a pointer, passed as a parameter, is pointing. A pointer that is passed using
call-by-value cannot be made to point to a new location as seen by the caller (because the
formal parameter stores only a copy of the address).
Because a reference variable must be initialized at the moment it is declared, it is illegal to
have an array of reference variables.
Finally, we mention that a function can return by reference (or constant reference) in order
to possibly avoid the overhead of a copy. In such a case, the expression in the return statement
must be an object whose lifetime extends past the end of the function. In other words, an object
allocated on the runtime stack should not be returned by reference. We will discuss returning by
reference in Chapter 4. Although returning by reference can make the program more efficient,
doing so is fairly tricky, and modern optimizing C++ compilers have ways of avoiding the overhead
of a copy anyway. As a result, except for a few standard places where the return by reference
idiom is used, we do not recommend it.
to construct the string, and then invoke the c_str member function from the string class
to obtain the primitive string.
Other than this, and command-line arguments (Chapter 12), you should not need to use
pointer variables for arrays and strings, but if you must interact with legacy code, see
Chapter 11.
3.6.5 Inheritance
Any serious use of inheritance will require pointer variables and heap memory allocation. The
code will look much like Java code, except that the programmer will have the significant burden
of reclaiming inactive objects. This issue is unavoidable and is a major reason why many people
claim Java is an easier-to-use language for object-oriented programming than C++. We will dis-
cuss this in more detail in Chapter 6.
3.7 Key Points
• In C++, objects can be allocated both on the runtime stack and on the memory heap.
• Objects allocated from the runtime stack are automatically reclaimed when the block in
which they were allocated terminates.
• A pointer variable stores the address where another variable resides. A C++ pointer
variable has semantics similar to those of the Java reference variable (especially =, ==, !=,
and NULL).
• Objects allocated from the memory heap must eventually be returned back to the memory
heap when they are no longer needed. Otherwise, it is possible to create memory leaks that
may eventually cause your program to run out of memory. An object is returned to the
memory heap by calling delete, with the address of the object.
• Never call delete on an object that was not allocated by new.
• After calling delete on an object, all pointers to that object become stale and should not
be used.
• Never attempt to delete an object twice.
• A static local variable is created once per program, but the variable is only visible from
inside the function in which it is declared. Each invocation of the function reuses the same
variable, and its value is retained between function invocations.
• The -> operator is used to access the members of a class-type object through a pointer.
• A reference variable in C++ is a pointer constant that is always dereferenced implicitly.
One can view it as another name for an object. Reference variables must be
initialized when they are declared and cannot be changed to reference other variables.
Arrays of reference variables are illegal.
• Using pointer variables to simulate call-by-reference, or to implement arrays or strings
should be avoided if possible.
• Pointer variables are used to avoid large data moves, to implement linked data structures,
and especially to support inheritance.
3.8 Exercises
1. How is the C++ memory model different from the Java memory model?
2. What is a pointer variable?
3. What is a stale pointer?
4. What is a memory leak?
5. Which objects have their memory automatically reclaimed?
6. What does the delete operator do?
7. What happens if delete is invoked twice on the same object?
8. What happens if delete is invoked on an object that is not heap-allocated?
9. If a function returns a pointer to an object, why can’t the object be a stack-based local vari-
able?
10. If a function returns a pointer to static data, what must the user be sure to do?
11. If a function returns a pointer to heap-allocated data, what must the user be sure to do?
12. What does the -> operator do?
13. What is a reference variable in C++?
14. Why must C++ reference variables be initialized when declared?
15. Consider
int a, b;
int *ptr;
int **ptrptr;
ptr = &a;
ptrptr = &ptr;
c++book.mif Page 52 Saturday, July 12, 2003 10:53 AM
a. Is this legal?
b. What are the values of *ptr and **ptrptr?
c. Is ptrptr=ptr legal?
16. Is *&x always equal to x? If not, give an example.
17. Is &*x always equal to x? If not, give an example.
18. For the declaration
int a = 5;
int *ptr = &a;
What are the values of the following?
a. ptr
b. *ptr
c. ptr == a
d. ptr == &a
e. &ptr
f. *a
g. *&a
h. **&ptr
CHAPTER 4
Object-Based Programming: Classes
LIKE Java, C++ uses the class to support object-based
programming, and uses inheritance to support object-oriented programming. Classes in C++
operate in much the same way as in Java, except that there is significant additional syntax
because of C++’s attempt to make class types look exactly like primitive types.
In this chapter, we begin by examining the similarities, and then we look at a host of
important syntactic constructs found in the implementation of classes in C++. This chapter does
not discuss two interesting topics: operator overloading (Chapter 5) and inheritance (Chapter 6).
class IntCell
{
  public:
    IntCell( int initialValue = 0 )
      { storedValue = initialValue; }

    int getValue( )
      { return storedValue; }

    void setValue( int val )
      { storedValue = val; }

  private:
    int storedValue;
};
Figure 4-2 Initial version of C++ class that stores an int value (needs more work)
1 int main( )
2 {
3 IntCell m1;
4 IntCell m2 = 37;
5 IntCell m3( 55 );
6
7 cout << m1.getValue( ) << " " << m2.getValue( )
8 << " " << m3.getValue( ) << endl;
9 m1 = m2;
10 m2.setValue( 40 );
11
12 cout << m1.getValue( ) << " " << m2.getValue( ) << endl;
13
14 return 0;
15 }
Figure 4-4 Routine to return true if a vector of IntCells contains at least one zero. Not
compatible with original version of IntCell
Two other differences are immediate. First, instead of supplying a visibility modifier for each member, we simply provide visibility modifiers for sections of the class. Thus in Figure 4-2, the constructors and methods are public, while the data is private. There is no package-visible specifier in C++. In a C++ class, members are private until a visibility modifier such as public is seen.
Second, in Java, constructors can invoke each other using a call to this. In C++ as defined by the 1998 standard, this is not allowed (C++11 later added delegating constructors, which provide the same facility). Instead, there are two typical alternatives. The first alternative, shown in Figure 4-2, is to use default parameters in the constructor. Thus an IntCell can be constructed with either an int or no parameters: the default parameter of 0 signals that if no parameter is provided, the parameter defaults to 0. A default parameter need not be a compile-time constant (a global variable or a function call is allowed), but it may not refer to the function's other parameters. Another alternative, which works if the constructors are too different to be expressed with default parameters, is to declare a private member function that can be invoked by all the constructors. (Although, as in Java, the initialization routine can be a public member function, it is best to avoid doing so, because of considerations that come into play with inheritance.)
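The private-helper alternative can be sketched as follows. The class Cell, its string constructor, and the clamping rule inside init are hypothetical, chosen only to give the two constructors some shared logic; this is a minimal sketch, not code from the book.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical class: built either from an int or from a string of
// decimal digits. Both constructors funnel into one private routine.
class Cell
{
  public:
    explicit Cell( int value )
      { init( value ); }

    explicit Cell( const std::string & digits )
      { init( std::atoi( digits.c_str( ) ) ); }

    int getValue( ) const
      { return storedValue; }

  private:
    int storedValue;

    // Shared setup logic (hypothetically, negative values are clamped
    // to 0). Private, so clients cannot re-run it on a built object.
    void init( int value )
      { storedValue = ( value < 0 ) ? 0 : value; }
};
```

Because init is private, a client cannot call it on an already-constructed Cell; only the constructors can.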
Figure 4-3 illustrates the use of the IntCell class. The class declaration of IntCell must be placed prior to using the IntCell type name. Typically, one would place it in a .h file, and the main program would use an include directive. Observe that in main we create IntCell objects on the runtime stack, using both the zero-parameter and the one-parameter forms of the constructor. Also observe that a single constructor argument can be supplied either with = (line 4) or in parentheses (line 5); if there are two or more arguments, they must be placed in parentheses. Line 9 shows that objects can be copied. As a result of this statement, the contents of the IntCell object m2 are copied into IntCell object m1. This is the equivalent of cloning in Java. Thus the first output statement prints 0 37 55, and the second output statement prints 37 and then 40.
to one of the array elements, and thus the array. Though the compiler could look at the implementation of getValue and see that no changes are made to the IntCell, this implementation is often not available in source form (in the most general case, getValue may be invoking other member functions that are already compiled into a library). Thus we need some syntax to tell the compiler that getValue is not going to change the state of the IntCell.
A member function that looks at an object but promises not to change the state of the object is known as an accessor. A member function that might change the state of the object is known as a mutator. In the IntCell class, getValue is logically an accessor, and setValue is logically a mutator.
In C++, a member function is assumed to be a mutator unless it is explicitly marked as an
accessor. To mark a member function as an accessor, we place a const at the end of its signa-
ture as shown:
int getValue( ) const
{ return storedValue; }
Whether a member function is an accessor or a mutator is considered part of its signature.
When invoking a member function on a constant object, only the accessors are candidates; as
we’ve already seen, attempts to invoke the mutator fail.
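As a minimal sketch of why the const matters, the hypothetical routine containsZero below (in the spirit of Figure 4-4) receives its vector by constant reference; it compiles only because getValue is marked as an accessor. The class repeats the IntCell of Figure 4-2 with the const added.

```cpp
#include <vector>

class IntCell
{
  public:
    IntCell( int initialValue = 0 )
      { storedValue = initialValue; }

    int getValue( ) const         // accessor: callable on const objects
      { return storedValue; }

    void setValue( int val )      // mutator: not callable on const objects
      { storedValue = val; }

  private:
    int storedValue;
};

// Returns true if arr contains at least one IntCell storing 0.
// Since arr is a constant reference, only accessors may be invoked
// on its elements; arr[ 0 ].setValue( 1 ) would not compile.
bool containsZero( const std::vector<IntCell> & arr )
{
    for( std::vector<IntCell>::size_type i = 0; i < arr.size( ); i++ )
        if( arr[ i ].getValue( ) == 0 )
            return true;
    return false;
}
```

Removing the const from getValue would make the call inside containsZero a compile error, which is exactly the situation described above.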
Many less-experienced C++ programmers view the const as an annoyance, or little more than a comment. In fact, deciding whether a member function is an accessor or a mutator is an important part of the class design; the const that signifies an accessor should never be omitted. If it is, you wind up having to remove the const in other places, making your code less readable and robust. The ability to mark an object as constant, and to expect that this object’s state will not change (at least in the context of a method), can enable some compiler optimizations and, provided that no thread modifies the object, allows several threads to read it simultaneously without locking. Experienced C++ programmers who move to Java often find the lack of Java support for immutability (the ability to mark the state of an object as unchangeable) to be a liability.
If a member function is marked as an accessor, then the compiler will not allow the mem-
ber function to compile if it attempts to make any changes to the values of its data members.
This includes assignment to the data members as well as invoking other mutators of the class
(since invoking the mutator could change the data members).
Occasionally an accessor needs to change the value of one of its data members. For instance, if a data member is being used to count the number of calls to getValue, we might want to adjust it. This would not normally be allowed if getValue is an accessor, but one can mark a data member as mutable to exempt it from the normal restrictions.
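A sketch of the call-counting idea, assuming a hypothetical data member getCalls and accessor getCallCount that are not part of the book's IntCell:

```cpp
class IntCell
{
  public:
    explicit IntCell( int initialValue = 0 )
      { storedValue = initialValue; getCalls = 0; }

    int getValue( ) const
    {
        ++getCalls;               // legal only because getCalls is mutable
        return storedValue;
    }

    int getCallCount( ) const
      { return getCalls; }

    void setValue( int val )
      { storedValue = val; }

  private:
    int storedValue;
    mutable int getCalls;         // exempt from the accessor restrictions
};
```

Even on a const IntCell, each call to getValue increments getCalls, while any attempt to assign storedValue inside getValue would still be rejected.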
int main( )
{
    IntCell m;

    m = 3;
    cout << m.getValue( ) << endl;

    return 0;
}
Figure 4-5 Example of implicit type conversion, which may or may not be desirable de-
pending on context.
However, this code compiles. The reason is that C++’s type compatibility rules are somewhat lenient in places. The compiler reasons that since m is an IntCell, it should (by default) expect to see an IntCell on the right-hand side of the assignment operator. What it sees is not an IntCell. However, the compiler is willing to do a type conversion. So the question it must answer is whether it can fabricate a temporary variable of type IntCell in place of the 3. That requires constructing the temporary, and since there is an IntCell constructor that takes an int, the compiler will use that constructor to fabricate the temporary and then copy that value into m. Whether this is a good idea depends on the application. For instance, in a class RationalNumber, if there is a constructor that takes a single int, then a statement such as r=0, where r is of type RationalNumber, would be convenient, and would in fact compile.
On the other hand, since the vector type has a one-parameter constructor, this would open the door for allowing:
vector<double> arr(20);
...
arr = 5;
which would create a temporary vector of size 5, copy its contents into arr, and then free up the
temporary, instead of doing the sensible thing and reporting an error.
Faced with this dilemma, C++ adopts the following rule: A one-parameter constructor automatically implies the ability to perform a type conversion. Further, by default the compiler will apply the type conversion if needed, even if not requested, to satisfy assignments and parameter matches (but not for parameters passed by non-constant reference, since a non-constant reference cannot be bound to the temporary that the conversion creates). This is known as an implicit type conversion. The reason for this is that C++ is trying to treat objects in the same fashion as it treats primitives. To disallow implicit type conversions, the programmer can mark the constructor as explicit.
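The explicit keyword is meaningless for constructors that cannot be called with exactly one argument. (A constructor with a default parameter, such as IntCell( int initialValue = 0 ), can be called with one argument, so explicit does apply to it.) If a constructor is marked as explicit, then it will not be considered in object creations in which the initialization uses =; instead you must explicitly place the one parameter in parentheses.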
class IntCell
{
  public:
    explicit IntCell( int initialValue = 0 )
      { storedValue = initialValue; }

    int getValue( ) const
      { return storedValue; }

    void setValue( int val )
      { storedValue = val; }

  private:
    int storedValue;
};
Figure 4-6 Second version of C++ class that stores an int value (still needs more work)
What this means is that if the IntCell constructor is marked explicit, then in Figure 4-3 the declarations of m1 and m3 are acceptable, but the declaration of m2 is in error.
As a general rule, one-parameter constructors should be marked explicit unless it makes sense to allow implicit type conversions. Thus, in the vector type, the single-parameter constructor has been marked explicit. In the string type, in which a const char * (primitive string constant) can be passed as a parameter, the constructor is not marked as explicit, to facilitate the mixing of const char * and string library types.
The revised version of IntCell, with the changes made to signify that getValue is an accessor and that the constructor is explicit, is shown in Figure 4-6.
Figure 4-7 Data members in a Java class and two C++ classes
When a class includes data members that are not primitive entities, a whole new set of considerations arises. For example, the code in Figure 4-7 shows a Java class with three private data members, the syntactically similar C++ class (which has different semantics), and a second C++ class whose semantics more closely mirror the Java class.
Figure 4-8 Memory layout in Java, and also in C++ when data members include pointers
Figure 4-9 Memory layout in C++, when objects are declared as non-pointer data members
First, Figure 4-8 shows the layout in the Virtual Machine for Java objects of type
Student. Since name and birthDate are objects, they are accessed by reference variables.
The intent of the picture is that the data fields that are stored as part of a Student instance are
simply pointer variables.
Figure 4-9 shows the layout in C++ for the first declaration of type Student. The decla-
ration is cosmetically identical to the Java declaration, but as is evident from Figure 4-9, com-
plete instances of the data members are stored as part of the Student entity. We can expect that
a copy of Student objects copies all the data members, and this is the expected behavior in C++, except, of course, that the copy can be expensive if the data members are large. Recall also that by default
parameter passing and returning makes copies, which is why it can be important to pass objects
using call-by-constant reference instead of call-by-value.
Figure 4-9 shows one new issue that C++ must deal with. In Java, in the constructor, each
of the reference data members is initialized to null, and then assigned to reference an object of
the appropriate type. In C++, the first step of initializing members to null does not work. The
alternative, initializing each member with a default constructor, might work, but there are limitations. For instance, there might not be a zero-parameter constructor for type Date. Thus, we
need some syntax to specify how each of the data members is initialized. We discuss this in
Section 4.5.
Although the first piece of C++ in Figure 4-7 looks similar to Java code, we have seen that
it is quite different semantically. Further, often it is the case that the data members are pointer
variables; as we mentioned in Section 3.6, this may be needed in linked data structures and pro-
grams that involve inheritance. The last piece of code in Figure 4-7 shows the data members
declared as pointer variables. With this declaration, the memory layout mirrors the Java layout.
However, this layout creates numerous subtleties.
First, a copy is no longer a real copy; when data members are copied, the result is that the birthDate and name are shared between two instances of Student. This is known as a shallow copy. Typically with C++, we expect a deep copy. Certainly the visible semantics of a class should not depend on its implementation details, and the default visible semantics that we would expect to see should be a deep copy. Thus when data members are pointer variables, if we expect deep copy semantics, we must redefine the assignment operation to ensure correct behavior. We discuss this in Section 4.6.
A related problem concerns how objects of type Student are returned. By default, a copy is made, and the copy creates a new temporary object. This copy is performed not by the assignment operator but by a different kind of constructor, known as the copy constructor. We discuss this in Section 4.6. The default can be time-consuming, so in some cases it is worth trying to avoid it by returning by constant reference. We discuss this in Section 4.10.
Another related problem is that in the first implementation, when an object of type Student is destroyed, all of its constituent components, which are part of the Student, are automatically destroyed too. In the second implementation this is no longer true by default, because name and birthDate are simply pointer variables. The objects they are pointing at are not reclaimed unless delete is applied to them; and as these are private variables, it is difficult for the user to do so. Instead we need to provide a routine, called the destructor, that ensures that private heap objects allocated by the Student class are reclaimed when the Student is itself reclaimed. We discuss this in Section 4.6.
Another complication deals with the fact that in Java, the language consists of both the
Java Language Specification and the Java Virtual Machine Specification. Specifically, a Java
compiler can look at Java bytecode to decide the valid methods for a class. The C++ setup places
the class declaration in a .h file, whose contents are logically copied into all source files that
provide an appropriate include directive. This can make compilation slow. In C++, we can spec-
ify the declaration of the class, which lists its member functions and memory layout, and pro-
vide an implementation separately, thus reducing the size of the .h file. We discuss this in
Sections 4.11 and 4.12.
Consider, for instance, two alternatives for a Student class constructor. The first version
is shown in Figure 4-10 without an initializer list, while an improved version is shown in
Figure 4-11 with an initializer list.
Consider the initialization of the birthDate data member. In this instance, using an ini-
tializer list allows us to initialize the Date data member directly. Without an initializer list, one
must first create a default Date. Since a default Date must represent a valid Date, this Date
may well be initialized to the current Date (today); it certainly will not be some random memory values. Then in the body of the constructor, the intended Date must be copied into the
Date data member, overwriting the initialized state. Obviously this means that we have wasted
CPU cycles in initializing the Date to today’s date. Depending on how complex the Date class
itself is, and how often Student objects are constructed, this could be nontrivial, as it could
involve creation of strings to store months, and so on.
Because each class member should usually be initialized by its own constructor, you should use initializer lists whenever possible. Note, however, that this form is intended for relatively simple cases only. If the initialization is not simple (e.g., if error checks are needed, or the initialization of one data member depends on another), the body of the constructor may be a better place for the more complex logic.
It is important to note that the order of evaluation of the initializer list is given by the order in which the class members are declared in the class, not by the order in which they appear in the initializer list. This is one reason why it is bad style to have the initialization of a data member depend on another data member. If your code depends on the order of initialization, it’s probably dubious code and should be avoided; if this is impossible, at least comment the fact that there is an order dependency, so a future programmer does not change the order of the data members.
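A small, hypothetical illustration of the ordering rule: the initializer list below is written in the reverse of the declaration order, yet first is still initialized before second. Many compilers warn about exactly this mismatch (GCC, for example, has -Wreorder).

```cpp
// Hypothetical class whose initializer list order differs from the
// member declaration order. Initialization follows the declarations.
class OrderDemo
{
  public:
    explicit OrderDemo( int x )
      : second( first + 1 ),   // runs second: first is already set
        first( x )             // runs first, because first is declared first
      { }

    int getFirst( ) const  { return first; }
    int getSecond( ) const { return second; }

  private:
    int first;    // swapping these two declarations would make second
    int second;   // read an uninitialized first: the dependency to avoid
};
```

Here OrderDemo( 10 ) yields first == 10 and second == 11; the code is correct only because of the declaration order, which is the kind of dependency that deserves a comment.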
An initializer list is required in four common situations.
1. If any data member does not have a zero-parameter constructor, the data member must be initialized in the initializer list.
2. If a superclass does not have a zero-parameter constructor, the subclass must use an initializer list to initialize the inherited component (Chapter 6). One can view this as being the same as the first situation.
3. Constant data members must be initialized in the initializer list. A constant data member can never be altered after the data member is constructed. This means you could not apply the copy assignment operator in the body of the class constructor to set the value of the constant data member. An example of a constant data member could be the identification number in a Student class. Each Student has his or her own unique identification number, but presumably the identification number never changes. This is similar to final data members in Java, except that in Java, when the data member is an object, it is the reference variable that is final, not the object. In C++ it is the data member itself that is constant.
4. A data member that is a reference variable (for instance an ostream &) must be initialized in the initializer list.
4.6.1 Destructor
The destructor is called whenever an object goes out of scope or is subjected to a delete. Typically, the only responsibility of the destructor is to free up any resources that were allocated during the use of the object. This includes calling delete for any corresponding news, closing any files that were opened, and so on. The default destructor applies the destructor to each data member that is a class object; nothing is done for primitive data members, so in particular delete is not applied to data members that are pointers.
The first case is the simplest to understand because the constructed objects were explicitly
requested. The second and third cases construct temporary objects that are never seen by the
user. In both cases we are constructing new objects as copies of existing objects, so certainly the
copy constructor is applicable.
By default the copy constructor is implemented by applying copy constructors to each
data member in turn. For data members that are primitive types (for instance, int, double, or
pointers), simple assignment is done. This would be the case for the storedValue data mem-
ber in our IntCell class. For data members that are themselves class objects, the copy con-
structor for each data member’s class is applied to that data member.
4.6.3 operator=
The copy assignment operator, operator=, is called when = is applied to two already-con-
structed objects. lhs=rhs is intended to copy the state of rhs into lhs. By default
operator= is implemented by applying operator= to each data member in turn.
different, then the source file will be truncated, hardly a desirable feature. When performing
copies, the first thing we should do is check for this special case, known as aliasing.
In the routines that we write, if the defaults make sense, we will always accept them. However, if the defaults do not make sense, we will need to implement the destructor, operator=, and the copy constructor. When the default does not work, the copy constructor can generally be implemented by mimicking normal construction and then calling operator=. Another often-used option is to give a reasonable working implementation of the copy constructor, but then place it in the private section, to disallow call-by-value.
class IntCell
{
  public:
    ~IntCell( )
    {
        // Does nothing since IntCell contains only an int data
        // member. If IntCell contained any class objects their
        // destructors would be called.
    }

    IntCell( const IntCell & rhs )
      : storedValue( rhs.storedValue )
    {
    }

    IntCell & operator=( const IntCell & rhs )
    {
        if( this != &rhs )    // Standard alias test
            storedValue = rhs.storedValue;
        return *this;
    }

    ...

  private:
    int storedValue;
};
Figure 4-12 The defaults that are generated automatically by the compiler
class IntCell
{
  public:
    explicit IntCell( int initialValue = 0 )
      { storedValue = new int( initialValue ); }
    int getValue( ) const
      { return *storedValue; }
    void setValue( int val )
      { *storedValue = val; }

  private:
    int *storedValue;
};
1 int f( )
2 {
3 IntCell a( 2 );
4 IntCell b = a;
5 IntCell c;
6
7 c = b;
8 a.setValue( 4 );
9 cout << a.getValue( ) << endl << b.getValue( ) << endl
10 << c.getValue( ) << endl;
11
12 return 0;
13 }
There are now numerous problems, which are exposed in Figure 4-14. First, the output is three 4s, even though logically only a should be 4. The problem is that the default copy constructor and operator= copy the pointer storedValue. Thus a.storedValue, b.storedValue, and c.storedValue all point at the same int value. These copies are shallow: the pointers, rather than the pointees, are copied. A second, less obvious problem is a memory leak. The int initially allocated by a’s constructor remains allocated and needs to be reclaimed. The int allocated by c’s constructor is no longer referenced by any pointer variable. It also needs to be reclaimed, but we no longer have a pointer to it. These problems are illustrated in Figures 4-15 and 4-16.
Figure 4-15 After line 5 in Figure 4-14, default copy constructor generates shallow copies
Figure 4-16 After line 7 in Figure 4-14, default operator= generates shallow copy, and leaks memory
class IntCell
{
  public:
    explicit IntCell( int initialValue = 0 )
      { storedValue = new int( initialValue ); }
    IntCell( const IntCell & rhs )
      { storedValue = new int( *rhs.storedValue ); }

    ~IntCell( )
      { delete storedValue; }

    IntCell & operator=( const IntCell & rhs )
    {
        if( this != &rhs )
            *storedValue = *rhs.storedValue;
        return *this;
    }

    int getValue( ) const
      { return *storedValue; }

    void setValue( int val )
      { *storedValue = val; }

  private:
    int *storedValue;
};
To fix these problems, we implement the Big-Three. The result is shown in Figure 4-17.
Generally speaking, if a destructor is necessary to reclaim memory, then the defaults for copy
assignment and copy construction are not acceptable.
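To confirm that the Big-Three in Figure 4-17 fix both problems, the sketch below repeats that class and reruns the steps of Figure 4-14. The routine copyTest is a hypothetical rework of f that hands the three values back through reference parameters rather than printing them; the expected results are 4, 2, and 2.

```cpp
// IntCell with the Big-Three, as in Figure 4-17.
class IntCell
{
  public:
    explicit IntCell( int initialValue = 0 )
      { storedValue = new int( initialValue ); }

    IntCell( const IntCell & rhs )              // deep copy: a new int
      { storedValue = new int( *rhs.storedValue ); }

    ~IntCell( )                                 // reclaim the pointee
      { delete storedValue; }

    IntCell & operator=( const IntCell & rhs )  // copy pointee, not pointer
    {
        if( this != &rhs )
            *storedValue = *rhs.storedValue;
        return *this;
    }

    int getValue( ) const
      { return *storedValue; }

    void setValue( int val )
      { *storedValue = val; }

  private:
    int *storedValue;
};

// Hypothetical rework of Figure 4-14's f.
void copyTest( int & aVal, int & bVal, int & cVal )
{
    IntCell a( 2 );
    IntCell b = a;        // copy constructor
    IntCell c;

    c = b;                // copy assignment
    a.setValue( 4 );      // affects only a

    aVal = a.getValue( );
    bVal = b.getValue( );
    cVal = c.getValue( );
}                         // destructors delete all three ints here
```

Each object now owns its own int, so changing a no longer changes b or c, and every int allocated in copyTest is reclaimed when the function returns.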
Linked data structures, such as linked lists and binary search trees, provide classic examples in which the Big-Three need to be written. Although the Standard Library provides implementations of stacks, queues, lists, sets, and maps, you may on occasion find that you need to implement your own. For instance, the Standard Library list is a doubly-linked list, and a singly-linked list can be implemented faster and with less space. (By default, the Standard Library queue is built on a deque rather than on a list.) In this section we illustrate a singly-linked list implementation of a queue, with minor syntactical improvements in sections that follow. Our interest in this implementation is concerned mostly with memory management and the Big-Three.
Recall that a queue supports insertion at one end (the back), and deletion at the other end (the front). These operations are enqueue and dequeue, respectively. As Figure 4-18 shows, a singly-linked list in which we store a pointer to both the front and the back of the list can be used to represent a queue. In an empty queue, the data member front is NULL. The linked list contains list nodes that each store the data and a pointer to the next node in the list; this is implemented in the ListNode class shown in Figure 4-19. The class uses public data because the data members need to be accessed from the queue class. There are several alternate solutions, including the use of nested classes, that we will discuss in Sections 4.7 – 4.9.
Figure 4-20 shows that to enqueue a new integer x, we create a new ListNode and
attach it after the last node in the list, in the process updating back. Figure 4-21 shows that to
dequeue the front item, we simply advance front, after saving the data in the front node so it
can be returned. In Java, when we advance front, the node that was formerly referenced by
front becomes unreferenced and eligible for garbage collection. In C++, we must clean up the
memory ourselves.
class ListNode
{
  public:
    int element;
    ListNode *next;

    ListNode( int theElement, ListNode * n = NULL )
      : element( theElement ), next( n ) { }
};
1 class UnderflowException { };
2
3 class IntQueue
4 {
5 private:
6 ListNode *front;
7 ListNode *back;
8
9 public:
10 IntQueue( ) : front( NULL ), back( NULL )
11 { }
12
13 IntQueue( const IntQueue & rhs )
14 : front( NULL ), back( NULL )
15 { *this = rhs; }
16
17 ~IntQueue( )
18 { makeEmpty( ); }
19
20 const IntQueue & operator= ( const IntQueue & rhs )
21 {
22 if( this != &rhs )
23 {
24 makeEmpty( );
25 ListNode *rptr = rhs.front;
26 for( ; rptr != NULL; rptr = rptr->next )
27 enqueue( rptr->element );
28 }
29 return *this;
30 }
31
32 void makeEmpty( )
33 {
34 while( !isEmpty( ) )
35 dequeue( );
36 }
The zero-parameter constructor, shown at lines 10 and 11, does no more than initialize front and back to NULL; this is NOT done by default. The copy constructor at lines 13 to 15 first makes the queue empty by initializing front and back to NULL, and then copies rhs into it using operator=. Since operator= might try to reclaim nodes in this queue prior to the copy, it is important that this queue be placed into a respectable state prior to invoking the copy assignment operator. The destructor is shown at lines 17 and 18. Instead of reclaiming the memory itself, it delegates the dirty work to makeEmpty, which presumably cleans up the memory. The copy assignment operator, operator=, is shown at lines 20 to 30. After the alias test, it empties out this queue, and then steps through rhs, enqueueing each item it sees. The alias test is crucial here, since otherwise, a self-assignment makes the queue empty!
makeEmpty is shown at lines 32 to 36. Recall that the destructor was relying on
makeEmpty to reclaim all the list nodes. makeEmpty is implemented simply by calling
dequeue until the queue is empty, thus delegating the dirty work of memory management yet
again.
isEmpty, getFront, and enqueue are all shown in Figure 4-23, and are relatively straightforward (in some cases trivial), as they are only a few lines of code each and do not involve reclaiming memory. The reclaiming of memory, however, can only be deferred for so long, and finally in dequeue, shown in lines 55 to 62, we have to bite the bullet. As the code shows, a call at line 57 to getFront gives us the frontItem that can be returned at line 61 (or throws an UnderflowException, which is not handled here). Prior to advancing front at line 59, we save a pointer to it (line 58) so we can reclaim the node (line 60) by invoking the delete operator. Thus we have managed to funnel all memory management into three lines of code, which is generally your best strategy, since memory management is so bug-prone.
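Because Figure 4-23 is not reproduced here, the following is only a plausible sketch of the complete IntQueue, with the remaining members (isEmpty, getFront, enqueue, dequeue) written to match the description above; it is not the book's listing, so it should not be read against the line numbers cited above. It reuses ListNode and UnderflowException from Figures 4-19 and 4-22.

```cpp
#include <cstddef>

class ListNode
{
  public:
    int element;
    ListNode *next;

    ListNode( int theElement, ListNode * n = NULL )
      : element( theElement ), next( n ) { }
};

class UnderflowException { };

class IntQueue
{
  public:
    IntQueue( ) : front( NULL ), back( NULL ) { }

    IntQueue( const IntQueue & rhs ) : front( NULL ), back( NULL )
      { *this = rhs; }             // safe: this queue is already empty

    ~IntQueue( )
      { makeEmpty( ); }

    const IntQueue & operator=( const IntQueue & rhs )
    {
        if( this != &rhs )         // alias test: self-assignment is a no-op
        {
            makeEmpty( );
            for( ListNode *rptr = rhs.front; rptr != NULL; rptr = rptr->next )
                enqueue( rptr->element );
        }
        return *this;
    }

    bool isEmpty( ) const
      { return front == NULL; }

    int getFront( ) const
    {
        if( isEmpty( ) )
            throw UnderflowException( );
        return front->element;
    }

    void makeEmpty( )
    {
        while( !isEmpty( ) )
            dequeue( );
    }

    void enqueue( int x )
    {
        if( isEmpty( ) )
            back = front = new ListNode( x );
        else
        {
            back->next = new ListNode( x );   // attach after the last node
            back = back->next;
        }
    }

    int dequeue( )
    {
        int frontItem = getFront( );   // throws if the queue is empty
        ListNode *oldFront = front;    // save the node ...
        front = front->next;           // ... advance front ...
        delete oldFront;               // ... and reclaim the old node
        return frontItem;
    }

  private:
    ListNode *front;
    ListNode *back;
};
```

When the last node is dequeued, back is left dangling, but it is never used while front is NULL: enqueue checks isEmpty first and resets both pointers.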
4.7 Friends
Typically we would like to make data members (and occasionally member functions) private. However, as we saw in Section 4.6.6, it is often inconvenient to do so, because there may be one other class (or possibly a few select classes) that needs access to implementation details. But making the members public grants access to everyone.
Java has an intermediate visibility level, package visibility (obtained by omitting the visibility modifier), which makes specific members accessible to other classes. However, while this allows access only to classes that happen to be in the same package (typically not a severe limitation), the access is granted to all such classes.
In C++, there is no such visibility modifier. Instead, the class can grant waivers to others of the normal privacy restrictions. Such a waiver applies to the access of all of the class’s private members. This waiver can be granted only by the class that is willing to allow access to its private members. The recipient of the waiver can be either an entire class or a specific function. There is no limit to the number of waivers that a class can grant. A waiver is known as a friend declaration.
class ListNode
{
  private:
    int element;
    ListNode *next;

    ListNode( int theElement, ListNode * n = NULL )
      : element( theElement ), next( n ) { }

    friend class IntQueue;
};
class IntQueue
{
  public:
    ...

  private:
    class ListNode
    {
      public:
        int element;
        ListNode *next;

        ListNode( int theElement, ListNode * n = NULL )
          : element( theElement ), next( n ) { }
    };

    ListNode *front;
    ListNode *back;
};
Figure 4-26 Two versions to find the maximum string (alphabetically); only the first version
is correct
C++ allows local classes, in which a class is declared inside a function. However, their utility is dubious because, unlike in Java, automatic local variables in the enclosing function cannot be accessed (static local variables can be accessed, but this hardly seems like sufficient justification to introduce the added complexity of using a local class).
C++ does not allow Java-style anonymous classes.
to avoid the copy of the return value. But this is tricky, and there are two important issues.
First, as Figure 4-26 shows, if you are returning a constant reference, then the expression
that is being returned must have a lifetime that extends past the end of the function. Thus
findMaxWrong is incorrect because it returns a reference to maxValue, and maxValue
does not exist once this function returns. The first implementation is correct, since it is guaran-
teed that the array item exists when the function returns. Notice that a return expression such as
return arr[ maxIndex ] + "";
does not work. The result of the string concatenation is an unnamed temporary string whose
destructor will be called as soon as the function terminates.
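Figure 4-26 is not reproduced here, so the following is a sketch of a correct findMax under the assumption that arr is a vector<string>. The wrong version appears only as a comment, since returning a reference to the local maxValue leaves the caller with a dangling reference.

```cpp
#include <string>
#include <vector>

// Returns a reference to the alphabetically largest string in arr
// (arr must be non-empty). Safe: the referenced element lives in the
// caller's vector, so it still exists after findMax returns.
const std::string & findMax( const std::vector<std::string> & arr )
{
    std::vector<std::string>::size_type maxIndex = 0;
    for( std::vector<std::string>::size_type i = 1; i < arr.size( ); i++ )
        if( arr[ maxIndex ] < arr[ i ] )
            maxIndex = i;
    return arr[ maxIndex ];
}

// WRONG (do not use): maxValue is destroyed when the function returns.
// const std::string & findMaxWrong( const std::vector<std::string> & arr )
// {
//     std::string maxValue = arr[ 0 ];
//     for( std::vector<std::string>::size_type i = 1; i < arr.size( ); i++ )
//         if( maxValue < arr[ i ] )
//             maxValue = arr[ i ];
//     return maxValue;
// }
```

Binding the result to a constant reference, as in const string & s2 = findMax( arr ), makes s2 refer to the element inside arr itself, so no string is copied.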
Ensuring that the return expression has long lifetime is only half of the task. Consider the
following three calls to findMax.
string s1 = findMax( arr ); // copies
const string & s2 = findMax( arr ); // no copy
string & s3 = findMax( arr ); // illegal
The first call is legal, but defeats the entire purpose of returning by constant reference.
Specifically, s1 is a new string object whose initial value is another string, so this statement causes execution of the string copy constructor. The similar code
string s1;
s1 = findMax( arr );
is no better; it creates a default string, and then the second line causes execution of the
string copy assignment operator.
The second call, in which we initialize s2, is the correct way to avoid the copy. The declaration says that s2 is a reference variable. Thus, it is not a new string. The initialization
states that s2 references the same string object as the return value of findMax. Since
findMax returns by (constant) reference, its return value references the maximum string that is
actually contained in arr. Thus s2 references the maximum string that is actually contained
in arr.
If findMax returned by value, then this would appear to be dubious code, because s2 would be referencing an unnamed temporary, whose destructor could be called as soon as the statement terminated. Using s2 at the next line could result in an attempt to access an already destructed string. However, the language specification has specifically contemplated this scenario, and it is guaranteed that the temporary will not be destroyed while the reference variable is active. Even so, in this situation, the return by value causes a copy to create the temporary variable. Thus, to avoid the copy, we must both:
1. have findMax return by constant reference, and
2. store the result in a constant reference variable (such as s2) at the point of the call.
The third call should not compile, because it attempts to throw away the const-ness of the reference returned by findMax. Specifically, findMax was returning a reference that could not be used to modify the referenced object; if this declaration were allowed by the compiler, then s3 could be used to modify the referenced object, in violation of the expectation of findMax.
1 class IntCell
2 {
3 public:
4 explicit IntCell( int initialValue = 0 );
5
6 int getValue( ) const;
7 void setValue( int val );
8
9 private:
10 int storedValue;
11 };
Figure 4-27 Third version of C++ class that stores an int value, now with class declaration separate from implementation (still needs more work)
However, the const-ness can be cast away using the special const_cast:
string & s3 = const_cast<string &> ( findMax( arr ) );
Now an attempt to change the referenced object through s3 compiles. However, if that object was itself defined as const, modifying it is undefined behavior; it might, for instance, reside in read-only memory, causing the program to fail at runtime. Needless to say, using a const_cast to circumvent const-ness is almost always inadvisable and is best avoided.
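To make the three calls concrete, here is a small self-contained sketch. The text's findMax is not shown in this section, so the version below is a hypothetical stand-in that returns a constant reference to the largest string actually stored in a non-empty vector; the helper demoReturnByConstReference is likewise invented for illustration. Comparing addresses shows that s1 is a separate copy while s2 refers to the element inside arr.

```cpp
#include <cstddef>
#include <string>
#include <vector>
using namespace std;

// Hypothetical findMax (assumes arr is non-empty): returns a constant
// reference to the largest string actually stored in arr, so no copy is made.
const string & findMax( const vector<string> & arr )
{
    size_t maxIndex = 0;
    for( size_t i = 1; i < arr.size( ); i++ )
        if( arr[ maxIndex ] < arr[ i ] )
            maxIndex = i;
    return arr[ maxIndex ];
}

// Hypothetical helper: true if s1 is a copy while s2 aliases the element in arr
bool demoReturnByConstReference( const vector<string> & arr )
{
    string s1 = findMax( arr );           // copy constructor: s1 is a new string
    const string & s2 = findMax( arr );   // no copy: s2 refers into arr
    // string & s3 = findMax( arr );      // does not compile: discards const
    // string & s4 = const_cast<string &>( findMax( arr ) );  // compiles, but inadvisable

    const string & maxInArr = findMax( arr );
    return &s1 != &maxInArr && &s2 == &maxInArr && s1 == s2;
}
```

Because s2 is only an alias, any later change to arr (for example, a push_back that reallocates the vector) could leave s2 dangling; the reference is safe only while the referenced element is.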
1 #include "IntCell.h"
2
3 IntCell::IntCell( int initialValue )
4 : storedValue( initialValue )
5 {
6 }
7
8 int IntCell::getValue( ) const
9 {
10 return storedValue;
11 }
12
13 void IntCell::setValue( int val )
14 {
15 storedValue = val;
16 }
Figure 4-28 Third version of C++ class that stores an int value; class implementation (final version)
mented, is the class implementation. The class declaration is also known as the class specifica-
tion, and is sometimes known as the class interface.
Figure 4-27 shows the class declaration for IntCell, which we last saw in Figure 4-6.
The class implementation is shown in Figure 4-28. Observe that we use the :: scoping operator
to signify that we are implementing member functions, rather than plain (non-member) func-
tions. The main program is shown in Figure 4-29.
The same syntax is used whether these member functions are public or private. The signa-
ture of the member functions in the implementation file must exactly match the signature in the
class specification, including parameter passing mechanism, return mechanism, and all uses of
const. The names of the formal parameters do not have to match, and in fact, may be omitted
in the class declaration.
It is illegal to provide an implementation of a member function that was not listed in the
class declaration. If a member function is listed in the class declaration, but is not implemented,
the program will still compile. It will link and run if the missing member function is never actu-
ally invoked. This allows the programmer to implement and debug the class in stages.
The compilation mechanism is the same as described in Section 2.1.7. The .cpp files
(main.cpp and IntCell.cpp) would be compiled as part of the project, and the .h files
would automatically be handled by the include directives.
If the implementation of a member function in IntCell.cpp changes, then only
IntCell.cpp needs to be recompiled. If the class declaration in IntCell.h changes, then
all .cpp files that have referenced the class declaration must be recompiled.
We do note that implementing a member function inside the class declaration has the advantage that an aggressive compiler can perform inline optimization. Thus, trivial one-liners that are not likely to undergo changes in future versions are often implemented in the class declaration. Most notably, this often includes constructors and destructors.
1 #include "IntCell.h"
2 #include <iostream>
3 using namespace std;
4
5 int main( )
6 {
7 IntCell m1;
8 IntCell m2 = IntCell( 37 );
9 IntCell m3( 55 );
10
11 cout << m1.getValue( ) << " " << m2.getValue( )
12 << " " << m3.getValue( ) << endl;
13 m1 = m2;
14 m2.setValue( 40 );
15
16 cout << m1.getValue( ) << " " << m2.getValue( ) << endl;
17
18 return 0;
19 }
Figure 4-29 Example showing the use of IntCell when class is separated (no changes)
1 #ifndef INTCELL_H
2 #define INTCELL_H
3
4 class IntCell
5 {
6 public:
7 explicit IntCell( int initialValue = 0 );
8
9 int getValue( ) const;
10 void setValue( int val );
11
12 private:
13 int storedValue;
14 };
15 #endif
Figure 4-30 Final version of C++ class that stores an int value, with class declaration separate from implementation, includes ifndef/endif idiom
declaration. #ifndef is a preprocessor directive that stands for “if not defined.” The #endif
at line 15 closes the body of the #ifndef. As a matter of safe programming, this idiom, which
we refer to as the ifndef/endif idiom, should be used in all header files. Even if there is no class
declaration in the header file, using this idiom allows the compiler to avoid some parsing effort
if the file appears a second time in a chain of include directives.
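The effect of the idiom can be seen without a second file. In the sketch below, the text of a header like Figure 4-30 is pasted twice into one file, standing in for a header that is reached twice through a chain of include directives; for self-containment, the member functions are defined inside the class rather than in a separate IntCell.cpp. Because INTCELL_H is already defined when the second copy is reached, the preprocessor discards it; without the guard, the second class definition would be a redefinition error.

```cpp
// ---- first inclusion of the (simulated) header IntCell.h ----
#ifndef INTCELL_H
#define INTCELL_H

class IntCell
{
  public:
    explicit IntCell( int initialValue = 0 )
      : storedValue( initialValue ) { }
    int getValue( ) const
      { return storedValue; }
    void setValue( int val )
      { storedValue = val; }

  private:
    int storedValue;
};

#endif

// ---- second inclusion of the same header text ----
#ifndef INTCELL_H       // INTCELL_H is already defined ...
#define INTCELL_H

class IntCell           // ... so this duplicate definition is skipped
{
  public:
    explicit IntCell( int initialValue = 0 )
      : storedValue( initialValue ) { }
    int getValue( ) const
      { return storedValue; }
    void setValue( int val )
      { storedValue = val; }

  private:
    int storedValue;
};

#endif
```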
1 class Ticket
2 {
3 public:
4 Ticket( ) : id( ++ticketCount )
5 { }
6
7 int getID( ) const
8 { return id; }
9
10 static int getTicketCount( )
11 { return ticketCount; }
12
13 private:
14 int id;
15 static int ticketCount;
16
17 Ticket( const Ticket & rhs )
18 {
19 id = ++ticketCount;
20 }
21
22 const Ticket & operator= ( const Ticket & rhs )
23 {
24 if( this != &rhs )
25 id = ++ticketCount;
26 return *this;
27 }
28 };
29
30 int Ticket::ticketCount = 0; // place this line in Ticket.cpp
In this scenario, the ticket id is a data member that is part of each unique ticket instance, but the ticket count is shared data and is thus static. ticketCount is declared at line 15 as a static variable. Unfortunately, this does not cause the creation of the shared ticketCount object; that must be provided separately, as shown on line 30. Note that this definition cannot be placed in the header file, because if the header file is included in separate .cpp files, ticketCount will be multiply defined. (The ifndef/endif idiom only prevents the file from being read more than once within a single .cpp file; it does nothing across separate .cpp files.) ticketCount is initialized when it is defined.
The constructor, shown at lines 4 and 5, increments ticketCount and uses it to initialize id. getTicketCount is a static member function because it does not depend on any particular ticket; it can be invoked, and would return 0, even before the first ticket is created. Also, observe the idiom of disabling the copy constructor and copy assignment operator at lines 17 to 27 by placing them in the private section. Allowing
1 int main( )
2 {
3 cout << Ticket::getTicketCount( ) << " tickets" << endl;
4
5 Ticket t1, t2, t3;
6 cout << Ticket::getTicketCount( ) << " tickets" << endl;
7 cout << "t2 is " << t2.getID( ) << endl;
8
9 return 0;
10 }
1 class MathUtils
2 {
3 ...
4 private:
5 static vector<int> primes;
6 static bool forceStaticInit;
7
8 static bool staticInit( int n )
9 {
10 // Use Sieve of Eratosthenes to eliminate non-primes
11 vector<bool> nums( n + 1, true );
12 for( int i = 2; i * i <= n; i++ )
13 for( int j = i * 2; j <= n; j += i )
14 nums[ j ] = false;
15
16 for( int k = 2; k <= n; k++ )
17 if( nums[ k ] )
18 primes.push_back( k );
19
20 return true;
21 }
22 };
23
24 // In implementation file
25 vector<int> MathUtils::primes;
26 bool MathUtils::forceStaticInit = staticInit( 1000 );
Figure 4-33 Simulating the static initializer in C++: initialize array with prime numbers less than or equal to 1000
1 class Utilities
2 {
3 public:
4 static const int BITS_PER_BYTES = 8;
5 ...
6 };
7
8 // In Utilities.cpp
9 const int Utilities::BITS_PER_BYTES;
Disabling the copy constructor means, among other things, that a Ticket object may not
be passed using call-by-value, nor returned using return-by-value.
To invoke the getTicketCount member function, we once again use the :: scoping
operator, as shown on both lines 3 and 6 in Figure 4-32.
If a static data member involves a complex initialization, Java would process the initializa-
tion in a static initializer block. C++ does not have a static initializer block, but the effect can be
achieved as follows: Define a private static member function staticInit, and use a call to
this static member function to initialize the static data member. If the static data member does
not have appropriate copy semantics that would easily allow this, fabricate a second private
static data member of type bool, and have it invoke staticInit. Inside staticInit, you
can then explicitly initialize all the static data members of the class. An example of this strategy
is shown in Figure 4-33.
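A self-contained variant of Figure 4-33 follows. The public members isSmallPrime and numPrimes are hypothetical additions so that the result can be observed, and the sieve is slightly tightened (it crosses off multiples only of primes, starting at i * i), which does not change the result. One detail worth stating explicitly: within a single file, static data members are initialized in the order in which they are defined, so primes must be defined before forceStaticInit; otherwise staticInit would push onto a vector that has not yet been constructed.

```cpp
#include <cstddef>
#include <vector>
using namespace std;

class MathUtils
{
  public:
    // Hypothetical accessors so the precomputed table can be observed
    static bool isSmallPrime( int n )
    {
        for( size_t i = 0; i < primes.size( ); i++ )
            if( primes[ i ] == n )
                return true;
        return false;
    }

    static int numPrimes( )
      { return static_cast<int>( primes.size( ) ); }

  private:
    static vector<int> primes;
    static bool forceStaticInit;

    // Sieve of Eratosthenes: appends every prime <= n to primes
    static bool staticInit( int n )
    {
        vector<bool> nums( n + 1, true );
        for( int i = 2; i * i <= n; i++ )
            if( nums[ i ] )                           // i is prime
                for( int j = i * i; j <= n; j += i )  // smaller multiples already crossed off
                    nums[ j ] = false;

        for( int k = 2; k <= n; k++ )
            if( nums[ k ] )
                primes.push_back( k );

        return true;
    }
};

// Definitions (normally in MathUtils.cpp). Order matters: primes is defined,
// and therefore constructed, before forceStaticInit runs staticInit.
vector<int> MathUtils::primes;
bool MathUtils::forceStaticInit = staticInit( 1000 );
```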
Because a static member function does not operate on any particular instance of a class (it has no this pointer), it cannot be declared const; static member functions are therefore never marked as accessors. Normally, static member functions would be implemented in the .cpp file along with the non-static member functions.
If a static data member is a constant integer type (int, short, long, char, etc.), its initialization can be performed in the class declaration. However, it must still be defined in the .cpp file, and that definition must not have any initialization. This hardly seems worth the effort, except that initializing this way makes the static data member a constant integral expression, which qualifies it to be used as a case label in a switch statement and in a few other places where constant integral expressions are specifically required. Figure 4-34 illustrates the syntax.
Suppose we want to add defaults. The default string is "", and the default double is 0.0, but we also need a default Date. Assuming Date has a zero-parameter constructor, Date( ) represents a default date. Thus the constructor is:
Student( const string & n = "", const Date & b = Date( ),
double g = 0.0 )
: name( n ), birthDate( b ), gpa( g )
{ }
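The Student and Date classes are not shown in this section, so the sketch below assumes minimal hypothetical versions: a Date whose zero-parameter form yields the year 1970 (an arbitrary choice), and a Student with accessors added only so the defaults can be checked. Because every parameter of the constructor has a default, Student( ) is legal, and the default argument Date( ) is evaluated afresh on each call that omits it.

```cpp
#include <string>
using namespace std;

// Hypothetical minimal Date: the zero-parameter form yields year 1970
class Date
{
  public:
    explicit Date( int y = 1970 ) : year( y ) { }
    int getYear( ) const
      { return year; }

  private:
    int year;
};

class Student
{
  public:
    // The constructor from the text: every parameter has a default
    Student( const string & n = "", const Date & b = Date( ),
             double g = 0.0 )
      : name( n ), birthDate( b ), gpa( g )
      { }

    // Hypothetical accessors, added only for inspection
    const string & getName( ) const
      { return name; }
    const Date & getBirthDate( ) const
      { return birthDate; }
    double getGpa( ) const
      { return gpa; }

  private:
    string name;
    Date birthDate;
    double gpa;
};
```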
4.15 Namespaces
The C++ equivalent of packages is the namespace. To declare a namespace, we simply write
namespace namespaceName
{
...
}
where namespaceName is the name of the namespace. Inside the braces can be functions,
objects, and class declarations. A class ClassName declared in namespace namespaceName
is formally known as namespaceName::ClassName.
As with Java packages, code inside a namespace can refer to other entities in the same namespace without specifying the namespace. Like packages, namespaces are open-ended, so a single namespace can be built up from several separate namespace declarations.
As in Java, it can be inconvenient to write the complete class name, which includes the namespace name. The using directive is the equivalent of an import directive. The first form (formally called a using-declaration),
using namespaceName::ClassName;
allows ClassName to be used as a shorthand for namespaceName::ClassName. The second form,
using namespace namespaceName;
is the equivalent of the Java wild-card import directive (that ends .*), and allows all entities in the namespace namespaceName to be known by their shorthands. (Older compilers handle this wild-card form better than the more specific form above, so C++ code tends to use the wild-card using directive.)
In Java, classes can be declared as public or package visible. No such syntax exists in
C++; all (top-level) classes in the namespace are visible outside of the namespace.
Namespaces can be nested, with the :: operator used to access the nested namespace.
Namespace names are normal C++ identifiers and do not include the dot (.) that is typical in Java package names.
Classes, functions, and objects that are declared outside of any namespace are considered
to be in the global namespace. These can always be accessed with a leading ::, as in
::IntCell. This may be necessary from code that is inside another namespace that also con-
tains a class called IntCell.
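The sketch below exercises these rules with hypothetical namespaces geometry and tools. geometry is declared in two pieces (namespaces are open-ended) and contains the nested namespace geometry::detail; useShorthands shows both forms of using; and tools::globalVersion uses a leading :: to reach the global version function even though tools declares its own function with the same name.

```cpp
// Hypothetical namespace geometry, declared in two separate pieces
namespace geometry
{
    class Point
    {
      public:
        Point( int px, int py ) : x( px ), y( py ) { }
        int getX( ) const { return x; }
        int getY( ) const { return y; }

      private:
        int x;
        int y;
    };
}

namespace geometry      // reopens geometry: namespaces are open-ended
{
    namespace detail    // nested namespace: geometry::detail
    {
        int square( int v ) { return v * v; }
    }

    // Inside geometry, Point needs no qualification
    int distanceSquared( const Point & a, const Point & b )
    {
        return detail::square( a.getX( ) - b.getX( ) )
             + detail::square( a.getY( ) - b.getY( ) );
    }
}

int version( ) { return 1; }   // in the global namespace

namespace tools
{
    int version( ) { return 2; }
    int globalVersion( ) { return ::version( ); }   // leading :: selects the global one
}

int useShorthands( )
{
    using geometry::Point;              // like a single-class Java import
    using namespace geometry::detail;   // like a Java wild-card import
    Point p( 1, 2 );
    return square( p.getX( ) + p.getY( ) );
}
```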
Classes, functions, and objects can be declared in an anonymous namespace. Such entities are not visible outside of the compilation unit (i.e., the .cpp file) in which they are declared. This allows us to declare helper classes and functions without fear of name conflicts with other files.
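A minimal sketch, assuming a hypothetical file such as Report.cpp: the class Counter and the function twice below are visible only in that file, so another .cpp file could define its own Counter or twice without a conflict at link time. Code later in the same file uses them without any qualification.

```cpp
// In a hypothetical Report.cpp
namespace
{
    // Visible only in this compilation unit
    class Counter
    {
      public:
        Counter( ) : count( 0 ) { }
        void increment( ) { count++; }
        int get( ) const { return count; }

      private:
        int count;
    };

    int twice( int n ) { return 2 * n; }
}

// Code in the same file uses the anonymous-namespace entities directly
int countTo( int n )
{
    Counter c;
    for( int i = 0; i < n; i++ )
        c.increment( );
    return twice( c.get( ) );
}
```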
class B
{
...
A *data;
};
Here, both class A and class B contain a data member that is a pointer to an object of the
other class type. The declaration of class A will generate an error, because the compiler does not
know that B is a class type. Clearly switching the order of declaration for classes A and B won’t
work. The solution is an incomplete class declaration that serves solely to allow the compiler to
know the existence of the class type. In our example:
class B; // incomplete class declaration
class A
{
...
B *data;
};
class B
{
...
A *data;
};
Note that if a class makes a more active use of another class, an incomplete class declara-
tion may not be sufficient. For instance, if class A contained a data member of type B, rather than
simply a pointer, the compiler would still complain. In this case, the complete class declaration
of B (the implementation is not needed, but the memory layout is) would have to precede A.
Class B could not then have a data member of type A (since this would imply A and B are infi-
nitely large). Class B could have a data member that is a pointer to A.
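Here is a compilable version of the two-class example, with hypothetical setPartner and getPartner members added so that the mutual pointers can actually be used. The incomplete declaration class B; is enough for A to hold a B *, because the compiler does not need to know B's layout in order to store a pointer to it.

```cpp
#include <cstddef>   // NULL

class B;   // incomplete class declaration: B is a class type, layout unknown

class A
{
  public:
    A( ) : data( NULL ) { }
    void setPartner( B * b )
      { data = b; }
    B * getPartner( ) const
      { return data; }

  private:
    B *data;   // a pointer to an incomplete type is allowed
};

class B
{
  public:
    B( ) : data( NULL ) { }
    void setPartner( A * a )
      { data = a; }
    A * getPartner( ) const
      { return data; }

  private:
    A *data;   // A is already complete here
};
```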
4.17.1 Member Functions and Data Members Cannot Have Same Name
In Java, a class can use the same name for both a method and a data field. This is not allowed in C++. In Java, the name of the class can also be used as a method name in that class (although usually this just means that a programmer intending to write a constructor mistakenly provided a void return type, which then becomes legal). In C++, the compiler will not allow a member function to have the same name as the class in which it is declared.
4.18 Key Points
• C++ classes consist of the declaration and the definition. Sometimes they are combined, but for large applications, it is common to place the declaration in a .h header file and the definition in a .cpp implementation file.
• The header files should use the ifndef/endif idiom to avoid being processed more than
once per implementation file.
• As in Java, constructors are used to create objects. However, primitive data members are not initialized by default.
• A copy constructor is always called whenever a brand new object is created that is initial-
ized to be a copy of an existing object. This includes formal parameters that are passed
using call-by-value.
• A copy assignment (operator=) is always called whenever an already existing object’s
state is changed to be a copy of another already existing object.
• When an object goes out of scope, its destructor is invoked. A destructor is needed for any
class that allocates resources (heap memory, file handles, etc.).
• If a destructor is written, then typically a copy constructor and copy assignment operator
also need to be written to provide deep copy semantics.
• C++ requires that the class designer differentiate accessor member functions (const
member functions) from mutator member functions (non-const member functions). The
const-ness of a member function is part of its signature.
• A one-parameter constructor implies the existence of a type-cast operator. The type-cast
operator may be used by the system implicitly, unless the constructor is declared
explicit.
• Initializer lists are used by the constructor to initialize data members by invoking their constructors. A constant data member, or a data member without a zero-parameter constructor, must be initialized in the initializer list.
• A friend of a class is able to access private members of the class. Typically a class grants
friendship to an entire other class, but it can also grant friendship on a function-by-func-
tion basis.
• C++ supports nested classes but not inner classes or anonymous classes. Although local
classes are allowed, because automatic variables in the function are not accessible, they
have less utility than their Java counterpart.
• C++ allows static members. Although there is no static initializer, one can relatively easily
write a function that simulates the behavior of the static initializer. Static data members
must be declared in the class declaration and defined in the implementation file.
• Functions can return objects by constant reference to avoid copying, but doing so safely is fairly tricky.
• C++ namespaces are the rough equivalent of packages. The using directive is the equiva-
lent of the import directive.
• Incomplete class declarations inform the compiler of the existence of a class and are used when two or more classes refer to each other circularly.
4.19 Exercises
CHAPTER 5
Operator Overloading
In contrast with Java, C++ is designed around making objects appear to be as similar as possible to primitives. In Chapter 4, we saw that this decision introduces significant complications that Java does not have. In this chapter, we look at a feature of C++ that is not part of Java at all, namely operator overloading.
Operator overloading is the ability to define a meaning for the existing operators when they are applied to new class types. In doing so, we can write classes such as Rational, ComplexNumber, and BigDecimal without resorting to named methods such as add. This makes these types look as if they were primitive types built into the language from day 1. Operator overloading has a few subtleties, but for the most part, with reasonable care, it is a useful feature that is not all that complex. Indeed, more than a few programmers list operator overloading as a feature they would like added to Java.
We begin our discussion with a Person class sketched in Figure 5-1. Once again, we will not
separate the class declaration and implementation except when needed later to illustrate syntax.
Our minimal Person class provides a constructor that initializes a name and a social security number (a presumably unique identifier of the person), as well as accessors for those data members and a print routine. Obviously we would expect more data members, accessors, and perhaps some mutators, but this class by itself is sufficient to illustrate some basic principles.
In Java, a class such as this would normally be expected to provide an equals method
(and also a hashCode method). So we have provided an equals method that compares two
Person objects (we will declare two Persons equal if they have the same ssn). Given two
1 class Person
2 {
3 public:
4 Person( int s, const string & n = "" )
5 : ssn( s ), name( n )
6 { }
7
8 const string & getName( ) const
9 { return name; }
10
11 int getSsn( ) const
12 { return ssn; }
13
14 void print( ostream & out = cout ) const
15 { out << "[ " << ssn << ", " << name << " ]"; }
16
17 bool equals( const Person & rhs ) const
18 { return ssn == rhs.ssn; }
19
20 private:
21 const int ssn;
22 string name;
23 };
1 class Person
2 {
3 public:
4 Person( int s, const string & n = "" )
5 : ssn( s ), name( n )
6 { }
7
8 const string & getName( ) const
9 { return name; }
10
11 int getSsn( ) const
12 { return ssn; }
13
14 void print( ostream & out = cout ) const
15 { out << "[ " << ssn << ", " << name << " ]"; }
16
17 bool equals( const Person & rhs ) const
18 { return ssn == rhs.ssn; }
19
20 bool operator==( const Person & rhs ) const
21 { return equals( rhs ); }
22
23 private:
24 const int ssn;
25 string name;
26 };
words, in lhs==rhs, when operator== is a member function, *this is lhs. So if the first operand cannot be of the class type in which the overloaded operator would be a member, as in the case of operator<< where the first operand is an ostream, then we must use a non-member function.
Overloading an operator as a non-member function has a significant disadvantage: since it
is not a member of the class, the implementation cannot access any private data, unless it is
made a friend of the class. Thus, the following code does not compile unless Person has
declared that operator<< is a friend:
ostream & operator<< ( ostream & out, const Person & p )
{
out << "[ " << p.ssn << ", " << p.name << " ]";
return out;
}
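For completeness, here is a sketch of how Person can grant that friendship, with the class trimmed to what the operator needs. The toString helper is hypothetical and exists only to capture the output in a string. Returning out is what lets output calls chain, as in cout << p1 << p2.

```cpp
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

class Person
{
  public:
    Person( int s, const string & n = "" )
      : ssn( s ), name( n )
      { }

    // Grants this non-member function access to ssn and name
    friend ostream & operator<< ( ostream & out, const Person & p );

  private:
    const int ssn;
    string name;
};

ostream & operator<< ( ostream & out, const Person & p )
{
    out << "[ " << p.ssn << ", " << p.name << " ]";   // legal only because of the friend declaration
    return out;                                        // enables chaining: cout << p1 << p2
}

// Hypothetical helper used only to capture the output as a string
string toString( const Person & p )
{
    ostringstream oss;
    oss << p;
    return oss.str( );
}
```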
On the other hand, the implementation of member function operator== can access pri-
vate data, since it is a member function of class Person:
bool operator==( const Person & rhs ) const
{ return ssn == rhs.ssn; }
In reality, the fact that non-member functions cannot access private data is rarely a signifi-
cant liability because the class designer often can implement non-member functions by invoking
public member functions of the class. Furthermore, the member function implementation of
operator== has its own subtle problem. Recall that in Java, we expect equals to be symmetric: a.equals(b) and b.equals(a) should give the same result if neither is null.
The implementation of operator== given above fails this requirement. Specifically,
because the Person constructor is not declared explicit, and because a one-parameter con-
structor is available (a default parameter can be used for the name), the following code com-
piles, and returns true:
Person p1( 123, "Joe" );
cout << ( p1 == 123 ) << endl;
In compiling p1==123, the compiler finds no exact match, but can use the one-parameter constructor to generate a temporary Person object from the int 123. However,
cout << ( 123 == p1 ) << endl;
does not compile: when an overloaded operator is implemented as a class member function, the left-hand operand must already be of the class type, because no implicit type conversion is applied to the object on which a member function is invoked.
If, instead, operator== were implemented as a non-member function:
bool operator== ( const Person & lhs, const Person & rhs )
{
return lhs.equals( rhs );
}
both calls to == above would compile and yield true. In the non-member definition, we now have two parameters. Both are input parameters only and, being potentially large, are passed using call-by-constant reference. Since the function is not a member function, it cannot be an accessor function. Observe that the fact that this non-member function cannot access private data does not limit us, because it can call the public member function equals.
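The symmetry gained by the non-member version can be checked directly. The sketch below trims Person to the constructor, equals, and a getName accessor (kept so that name is used); because the constructor is not explicit, the int 123 may be converted on either side of ==.

```cpp
#include <string>
using namespace std;

class Person
{
  public:
    Person( int s, const string & n = "" )   // not explicit: an int converts to a Person
      : ssn( s ), name( n )
      { }

    const string & getName( ) const
      { return name; }

    bool equals( const Person & rhs ) const
      { return ssn == rhs.ssn; }

  private:
    const int ssn;
    string name;
};

// Non-member: either operand may be implicitly converted, so
// p1 == 123 and 123 == p1 both compile and agree.
bool operator== ( const Person & lhs, const Person & rhs )
{
    return lhs.equals( rhs );
}
```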
Providing both versions of operator== is a mistake: a call such as p1==p2 would be ambiguous and would not compile. In our particular situation, the most sensible approach would be to make the constructor explicit, in which case we would not need to worry about this.
However, there are many cases where implicit type conversions make sense. For instance,
if we have a BigInteger class, it would be nice to be able to write the following code:
int log2( const BigInteger & x )
{
int result = 0;
for( BigInteger b = x; b != 1; b /= 2 )
result++;
return result;
}
Certainly, we would expect that using 1!=b in the for loop would also work. (We’ll presume that ints are 32 bits, and that a BigInteger has fewer than 2,000,000,000 (roughly 2^31) bits. Given that such a number would require 250 megabytes to represent, that’s probably not too unreasonable a limit.) Looking at BigInteger, to support b!=1 and 1!=b we have several choices.
Choice #1 is to provide a single non-member function:
bool operator!= ( const BigInteger & lhs, const BigInteger & rhs )
With this declaration, both b!=1 and 1!=b generate a temporary BigInteger, and in either case the original BigInteger and the temporary BigInteger are compared.
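Since a real BigInteger is far too long to show here, the following sketch uses a hypothetical stand-in that merely wraps a long; it demonstrates only the overloading rules, not arbitrary precision. With the single non-member operator!= of choice #1, both b != 1 and 1 != b compile, and a version of the log2 routine above (returning result at the end, and assuming x is at least 1) runs as expected. bothOrdersAgree is likewise a hypothetical helper.

```cpp
// Hypothetical stand-in for BigInteger: it wraps a long, so it illustrates
// only the overloading rules, not arbitrary-precision arithmetic.
class BigInteger
{
  public:
    BigInteger( long v = 0 ) : value( v ) { }   // not explicit: int -> BigInteger

    const BigInteger & operator/= ( const BigInteger & rhs )
    {
        value /= rhs.value;
        return *this;
    }

    long toLong( ) const
      { return value; }

  private:
    long value;
};

// Choice #1: a single non-member function, so either operand may be converted
bool operator!= ( const BigInteger & lhs, const BigInteger & rhs )
{
    return lhs.toLong( ) != rhs.toLong( );
}

// The log2 routine (assumes x >= 1): counts halvings down to 1
int log2( const BigInteger & x )
{
    int result = 0;
    for( BigInteger b = x; b != 1; b /= 2 )   // 1 and 2 become temporary BigIntegers
        result++;
    return result;
}

// Hypothetical check that both operand orders compile and agree
bool bothOrdersAgree( const BigInteger & b )
{
    return ( b != 1 ) == ( 1 != b );
}
```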
An alternative is to provide both a member function and a non-member function. Because their parameter types differ, they do not clash, and we can overload, yielding both:
bool operator!= ( int lhs, const BigInteger & rhs )
bool BigInteger::operator!= ( const BigInteger & rhs ) const
Of course, the second function would also be declared in the BigInteger class declaration.
With this approach, two BigIntegers are compared with the member operator!=; b!=1 generates a temporary and then the member version of operator!= is used; finally, 1!=b is an exact match for the non-member function. Although we can always provide the following implementation for the non-member function:
bool operator!= ( int lhs, const BigInteger & rhs )
{
return rhs != lhs;
}
we might be able to do better with sufficient effort, based on the fact that lhs must be small (it fits in an int). If no such improvement is possible, we are probably better off with the first choice above. If it is possible, then it stands to reason that it is worth defining three overloaded functions:
bool operator!= ( int lhs, const BigInteger & rhs )
bool BigInteger::operator!= ( const BigInteger & rhs ) const
1 #include "Rational.h"
2 #include <iostream>
3 using namespace std;
4
5 // Rational number test program
6 int main( )
7 {
8 Rational x;
9 Rational sum = 0;
10 int n = 0;
11
12 cout << "Type as many rational numbers as you want" << endl;
13
14 for( sum = 0, n = 0; cin >> x; sum += x, n++ )
15 cout << "Read " << x << endl;
16
17 cout << "Read " << n << " rationals" << endl;
18 cout << "Average is " << ( sum / n ) << endl;
19
20 return 0;
21 }
1 #ifndef RATIONAL_H
2 #define RATIONAL_H
3 #include <iostream>
4 using namespace std;
5
6 class Rational
7 {
8 public:
9 // Constructors
10 Rational( int numerator = 0 )
11 : numer( numerator ), denom( 1 ) { }
12 Rational( int numerator, int denominator )
13 : numer( numerator ), denom( denominator )
14 { fixSigns( ); reduce( ); }
15
16 // Assignment Ops
17 const Rational & operator+=( const Rational & rhs );
18 const Rational & operator-=( const Rational & rhs );
19 const Rational & operator/=( const Rational & rhs );
20 const Rational & operator*=( const Rational & rhs );
21
22 // Unary Operators
23 const Rational & operator++( ); // Prefix
24 Rational operator++( int ); // Postfix
25 const Rational & operator--( ); // Prefix
26 Rational operator--( int ); // Postfix
27 const Rational & operator+( ) const;
28 Rational operator-( ) const;
29 bool operator!( ) const;
30
31 // Named Member Functions
32 double toDouble( ) const // Do the division
33 { return static_cast<double>( numer ) / denom; }
34 int toInt( ) const // Do the division
35 { return numer >= 0 ? numer / denom : - ( -numer / denom ); }
36 bool isPositive( ) const
37 { return numer > 0; }
38 bool isNegative( ) const
39 { return numer < 0; }
40 bool isZero( ) const
41 { return numer == 0; }
42 void print( ostream & out = cout ) const;
43 private:
44 // A rational number is represented by a numerator and
45 // denominator in reduced form
46 int numer; // The numerator
47 int denom; // The denominator
48
49 void fixSigns( ); // Ensures denom >= 0
50 void reduce( ); // Ensures lowest form
51 };
52
53 // Math Binary Ops
54 Rational operator+( const Rational & lhs, const Rational & rhs );
55 Rational operator-( const Rational & lhs, const Rational & rhs );
56 Rational operator/( const Rational & lhs, const Rational & rhs );
57 Rational operator*( const Rational & lhs, const Rational & rhs );
58
59 // Relational & Equality Ops
60 bool operator< ( const Rational & lhs, const Rational & rhs );
61 bool operator<=( const Rational & lhs, const Rational & rhs );
62 bool operator> ( const Rational & lhs, const Rational & rhs );
63 bool operator>=( const Rational & lhs, const Rational & rhs );
64 bool operator==( const Rational & lhs, const Rational & rhs );
65 bool operator!=( const Rational & lhs, const Rational & rhs );
66
67 // I/O
68 ostream & operator<< ( ostream & out, const Rational & value );
69 istream & operator>> ( istream & in, Rational & value );
70 #endif
Figure 5-6 Rational declaration (private section and non-members)
numerator -3 and denominator 3. We allow the denominator to be zero, thus representing infinity, negative infinity, or (if the numerator is also zero) an indeterminate value. These invariants are maintained internally by applying fixSigns and reduce as appropriate. Thus, not only is the data representation private, but so are fixSigns and reduce, which are shown in Figure 5-7.
reduce uses a greatest-common-divisor algorithm, which illustrates the most interesting facet of Figure 5-7. Although we could make the gcd routine a private static member function, we have chosen to make it a non-member function instead; to avoid conflicting with other functions of the same name, we limit its scope to the .cpp file by placing it in the anonymous namespace (Section 4.15). Also, since the sign of the result of the % operator is implementation-defined when an operand is negative (under the C++ standard of the time; C++11 later specified it), we ensure that the operands are nonnegative by having gcd switch the sign of a negative numerator prior to calling gcdRec.
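Figure 5-7 itself is not reproduced in this section, so the following is only one possible sketch of the idea, not the book's code: gcd and gcdRec live in an anonymous namespace, and fixSigns and reduce are written as hypothetical free functions on a numerator/denominator pair (rather than as Rational members) so that the block is self-contained. Note that 0/0 is left alone, since gcd(0, 0) is 0, while a zero denominator with a nonzero numerator reduces to 1/0 or -1/0.

```cpp
namespace   // anonymous namespace: these helpers are private to this .cpp file
{
    // Euclid's algorithm; both arguments are assumed nonnegative
    int gcdRec( int a, int b )
    {
        return b == 0 ? a : gcdRec( b, a % b );
    }

    // Flips a negative first argument so % never sees a negative operand
    // (callers guarantee b >= 0, as fixSigns does for the denominator)
    int gcd( int a, int b )
    {
        if( a < 0 )
            a = -a;
        return gcdRec( a, b );
    }
}

// Hypothetical stand-in for Rational::fixSigns: ensures denom >= 0
void fixSigns( int & numer, int & denom )
{
    if( denom < 0 )
    {
        denom = -denom;
        numer = -numer;
    }
}

// Hypothetical stand-in for Rational::reduce: ensures lowest form
// (assumes fixSigns has already made denom >= 0)
void reduce( int & numer, int & denom )
{
    int d = gcd( numer, denom );
    if( d > 1 )
    {
        numer /= d;
        denom /= d;
    }
}
```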
Our class defines several named member functions that are all trivial. toDouble and
toInt are used to create a double or int from the Rational. An implicit type conversion
is not supported. We could have tried to allow it by using operator overloading to implement
type conversion operators:
Figure 5-7 Private member routines and local gcd to keep Rationals in normalized form
1 bool operator<=( const Rational & lhs, const Rational & rhs )
2 {
3 return !(lhs - rhs).isPositive( );
4 }
and thus we use a temporary. Because of the temporary, we have to return by value instead of
reference. Even if the copy constructor for the return is optimized away, the use of the temporary
suggests that, in many cases, the prefix form will be faster than the postfix form.
The three remaining unary operators have straightforward implementations, as shown in
Figure 5-14. operator! returns true if the object is zero; this is done by applying ! to the
numerator. Unary operator+ evaluates to the current object; a constant reference return can
be used here. operator- returns the negative of the current object by creating a new object
whose numerator is the negative of the current object. The return must be by copy because the
new object is a local variable. However, there is a trap lurking in operator-. If the word
Rational is omitted on line 26, then the comma operator evaluates (-numer,denom) as
denom, and then an implicit conversion gives the Rational denom/1, which is returned.
Have we had enough of the comma operator yet?
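The three unary operators, and the comma-operator trap, can be tried out with a hypothetical stripped-down Rational that keeps only the two constructors and the members needed here. It does not normalize, and the accessors getNumerator and getDenominator are added only for inspection. buggyNegate reproduces the mistake described above: it compiles, typically with no more than a warning, yet returns denom/1.

```cpp
// Hypothetical minimal Rational: just enough to show the unary operators
class Rational
{
  public:
    Rational( int numerator = 0 )
      : numer( numerator ), denom( 1 ) { }
    Rational( int numerator, int denominator )
      : numer( numerator ), denom( denominator ) { }

    bool operator!( ) const               // true if the value is zero
      { return !numer; }
    const Rational & operator+( ) const   // unary plus: the object itself
      { return *this; }
    Rational operator-( ) const           // new local object, so return by value
      { return Rational( -numer, denom ); }

    // The trap: without the word Rational, ( -numer, denom ) is the comma
    // operator; it evaluates to denom, which converts to the Rational denom/1.
    Rational buggyNegate( ) const
      { return ( -numer, denom ); }

    int getNumerator( ) const
      { return numer; }
    int getDenominator( ) const
      { return denom; }

  private:
    int numer;
    int denom;
};
```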
mat.length is 3 (because there are three rows), and each row is itself a one-dimensional
array. Thus, mat[0] is of type double[], and mat[0].length is 2. Similarly
mat[1].length is 3, and mat[2].length is 2.
As shown in Figure 5-15, our class will store the two-dimensional array using a vector of vectors (at line 33). Note that in the declaration of array, white space must separate the two > characters; otherwise a pre-C++11 compiler will interpret the >> token as the shift operator. (C++11 relaxed this rule, but the space is harmless.) In other words, we must write
vector<vector<double> > array; // white space needed
and not
vector<vector<double>> array; // error before C++11
The constructor first constructs array as having rows entries, each of type vector<double>. Since each entry of array is constructed with the zero-parameter constructor, it follows that each entry of array is a vector<double> object of size 0. Thus we have rows zero-length vectors of double. The body of the constructor is then entered and each row is resized to have cols columns. Thus the constructor terminates with what appears to be a two-dimensional array. (Note that resize value-initializes the new elements, so every double starts out as 0.0.)
1 #ifndef MATRIX_OF_DOUBLE_H
2 #define MATRIX_OF_DOUBLE_H
3
4 #include <vector>
5 using namespace std;
6
7 class MatrixOfDouble
8 {
9 public:
10 MatrixOfDouble( int rows, int cols ) : array( rows )
11 { setNumCols( cols ); }
12
13 int numrows( ) const
14 { return array.size( ); }
15 int numcols( ) const
16 { return numrows( ) > 0 ? array[ 0 ].size( ) : 0; }
17
18 void setNumRows( int rows )
19 { array.resize( rows ); }
20 void setNumCols( int cols )
21 { for( int i = 0; i < numrows( ); i++ )
22 array[ i ].resize( cols );
23 }
24 void setDimensions( int rows, int cols )
25 { setNumRows( rows ); setNumCols( cols ); }
26
27 const vector<double> & operator[]( int row ) const
28 { return array[ row ]; }
29 vector<double> & operator[]( int row )
30 { return array[ row ]; }
31
32 private:
33 vector< vector<double> > array;
34 };
35
36 #endif
Because vectors know how to clean up their own memory, we do not need to worry
about a destructor, copy constructor, or copy assignment operator.
The numrows and numcols accessors are easily implemented as shown. As we will see, it is possible for the user to make the two-dimensional array non-rectangular, in which case the result from numcols (which reports only the length of row 0) is meaningless.
We also provide member functions to change the number of rows and columns. Note that
when the number of rows is changed by resizing the array, any additional rows have length 0
(i.e. no columns).
As our Java example illustrated, the key operation is [], which in C++ can be overloaded
as operator[]. The expression mat[r] is the invocation mat.operator[](r), which
returns a vector corresponding to row r of matrix mat. Thus we have a skeleton:
vector<double> operator [] ( int row )
{ return array[ row ]; }
The main question is whether this is an accessor or a mutator, and what the return mecha-
nism should be. If we consider the following routine:
void copy( MatrixOfDouble & to, const MatrixOfDouble & from )
{
for( int r = 0; r < to.numrows( ); r++ )
to[ r ] = from[ r ];
}
in which we copy each row of from into the corresponding row of to, we see a conflict.
If operator[] returns a vector<double> by value, then to[r] is only a temporary copy of a
row, and assigning to it leaves to unchanged. The only way to effect a change of the elements
stored in to is for operator[] to return a reference to a vector. Unfortunately, if that single
operator[] is an accessor (as it must be in order to be applied to from), doing so would allow
from[ r ] = to[ r ];
to compile, in violation of from's const-ness. That cannot be allowed in a good design.
Thus we see that we need operator[] to return a constant reference for from, but a
plain reference for to. In other words, we need two versions of operator[], which differ
only in the return types. That is not allowed for two members that otherwise have the same sig-
nature. And therein lies the trick: whether a member function is an accessor or a mutator is part
of the signature. Thus the version that returns a constant reference will be marked as an accessor,
and the version that returns a reference will be marked as a mutator. This is shown at lines 27 to
30.
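The two overloads can be seen in action in a stripped-down class. This is a sketch under our own assumptions: RowTable and copyRows are hypothetical names, and only the pieces of MatrixOfDouble needed for the demonstration are kept.

```cpp
#include <vector>

// RowTable is a hypothetical, stripped-down stand-in for MatrixOfDouble: it keeps
// only the two operator[] overloads, which differ only in const-ness.
class RowTable
{
  public:
    RowTable( int rows, int cols )
      : array( rows, std::vector<double>( cols ) ) { }

    int numrows( ) const
      { return static_cast<int>( array.size( ) ); }

    // Accessor: the only candidate for a const RowTable; returns a read-only row.
    const std::vector<double> & operator[]( int row ) const
      { return array[ row ]; }

    // Mutator: preferred for a non-const RowTable; returns a modifiable row.
    std::vector<double> & operator[]( int row )
      { return array[ row ]; }

  private:
    std::vector< std::vector<double> > array;
};

// Same shape as the copy routine in the text: to[ r ] uses the mutator and
// from[ r ] uses the accessor, so each row of from is copied into to.
void copyRows( RowTable & to, const RowTable & from )
{
    for( int r = 0; r < to.numrows( ); r++ )
        to[ r ] = from[ r ];
    // from[ 0 ] = to[ 0 ];   // would not compile: from[ 0 ] is a const vector<double> &
}
```

Uncommenting the last line of copyRows should produce a compile-time error, since from is const and only the accessor can be applied to it.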
It is worth emphasizing two points. First, in resolving the call, the compiler will always
choose the mutator unless it is not a candidate (because the object it is acting upon is a constant).
Second, any member function that returns a reference to private data is explicitly allowing the
private data to be changed later on by anyone who retains that reference. As such, the technique
should generally be avoided except in cases such as this one, where this is a desired outcome, and
member functions that return by reference should not be marked as accessors, even though the
call itself does not change the state of the object.
To see how this works, the expression mat[0][1]=3.14 consists of:
vector<double> & row0 = mat.operator[] ( 0 ); // matrix [] mutator
double & item01 = row0.operator[] ( 1 ); // vector [] mutator
item01 = 3.14;
Here row0 is a reference to (i.e., another name for) mat.array[0], which is stored internally
in mat and represents row 0. item01 is a reference to mat.array[0][1], also stored internally.
So changing item01 to 3.14 changes the entry in mat.array[0][1].
Notice that although we are invoking operator[] twice, we are invoking two different
versions of operator[]: one from MatrixOfDouble and one from vector. Both versions are
mutators, yet neither makes any change on its own to the object it is acting upon. But by
returning references into the object, they open the door for later changes.
• Operator overloading allows us to define meanings for the existing operators when
applied to non-primitive types.
• When an operator is overloaded as a member function, the first operand is accessible as
*this, and subsequent operands are parameters. The first operand must be an exact
match of the class type; implicit type conversions are not acceptable.
• When an operator is overloaded as a non-member function, all operands are parameters,
but the implementation does not, by default, have access to private members of any class.
• All but a few operators can be overloaded. The most common operators to overload
include the assignment operator(s), equality and relational operators, and the input and
output stream operators. Later we will see that the function call operator can also be over-
loaded.
• operator<< should be overloaded for all class types. Typically this is done by provid-
ing a companion print member function.
5.9 Exercises
to the corresponding value in the map. If the key is not present, operator[] will insert
it with a default value, and return a reference to the newly inserted value. This implies that
operator[] is a mutator. Add an accessor version with similar semantics, but have it
throw an exception if the key is not found. Provide two separate implementations: one
maintains the items in an array, and a second maintains the items in a linked list. In both
cases, each key and value is stored together in a Pair, which is a nested class that you
should define.
CHAPTER 6
Object-Oriented Programming: Inheritance
Like Java, C++ supports inheritance. Most of the features associated with inheritance that are
found in Java have equivalent implementations in C++. In some cases, these features are not the
default behavior, and thus the C++ programmer must be more careful than a Java programmer. In
other cases, the behavior of similar constructs is slightly different. Additionally, C++ supports
some techniques, such as multiple inheritance, that Java does not allow.
In this chapter, we describe the basics of inheritance in C++, see some of the extra code that
the C++ programmer must write to avoid subtle errors, discuss multiple inheritance in C++, and
examine the differences between similar C++ and Java constructs.
To illustrate the basics of inheritance, we begin with a simple Person class in Figure 6-1 and
then a simple Student class in Figure 6-2 that extends Person. The Person class dupli-
cates Figure 5-1, but has some subtle errors that eventually must be fixed. But first, let’s exam-
ine the basic syntax in the Student class.
First, instead of the extends clause that is found in Java, an IS-A relationship is signalled
in C++ by using the syntax seen at line 1 in Figure 6-2:
class Student : public Person
The reserved word public is required; otherwise we get private inheritance, which does
not model an IS-A relationship (see Section 6.10.2) and is typically not what we want.
1 class Person
2 {
3 public:
4 Person( int s, const string & n = "" )
5 : ssn( s ), name( n )
6 { }
7
8 const string & getName( ) const
9 { return name; }
10 int getSsn( ) const
11 { return ssn; }
12
13 void print( ostream & out = cout ) const
14 { out << ssn << ", " << name; }
15
16 private:
17 int ssn;
18 string name;
19 };
20
21 ostream & operator<< ( ostream & out, const Person & p )
22 {
23 p.print( out );
24 return out;
25 }
Dynamic dispatch is almost always the preferred course of action. However, dynamic dispatch
incurs some run-time overhead, because it requires that the program maintain extra information
and that the compiler generate code to determine which member function to invoke. This
overhead was once thought to be significant, and it would be incurred even if no inheritance was
actually used, so C++ did not make dynamic dispatch the default. This is unfortunate, because we
now know that the overhead of dynamic dispatch is relatively minor.
In order to achieve dynamic dispatch, the C++ programmer must mark the base class
method with the reserved word virtual. A virtual function uses dynamic dispatch. A non-vir-
tual function uses static dispatch. Thus if we rewrite the print member function in class
Person as:
virtual void print( ostream & out = cout ) const
{ out << ssn << ", " << name; }
the code at the start of this section behaves as expected. This also is the minor change that fixes
operator<<.
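The difference between static and dynamic dispatch can be seen in a small sketch. The classes Base and Derived and the helper describe are hypothetical; printStatic is deliberately left non-virtual, while printDynamic is virtual, and both are called through a base class reference.

```cpp
#include <ostream>
#include <sstream>
#include <string>

class Base
{
  public:
    virtual ~Base( ) { }
    void printStatic( std::ostream & out ) const             // non-virtual: static dispatch
      { out << "Base"; }
    virtual void printDynamic( std::ostream & out ) const    // virtual: dynamic dispatch
      { out << "Base"; }
};

class Derived : public Base
{
  public:
    void printStatic( std::ostream & out ) const              // hides Base::printStatic
      { out << "Derived"; }
    void printDynamic( std::ostream & out ) const             // overrides Base::printDynamic
      { out << "Derived"; }
};

// Calls both functions through a base class reference and records the results.
std::string describe( const Base & b )
{
    std::ostringstream out;
    b.printStatic( out );
    out << "/";
    b.printDynamic( out );
    return out.str( );
}
```

Calling describe on a Derived object should yield "Base/Derived": the non-virtual call is bound at compile time from the static type Base, while the virtual call is resolved at run time.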
As a general rule, if a method is overridden in a derived class, it should be declared virtual
in the base class to ensure that the correct method is always selected. In fact, a stronger state-
ment applies: if a method might reasonably be expected to be overridden in a derived class, it
should be declared virtual in the base class. Once a method is marked virtual, it is virtual from
that point down in the inheritance hierarchy.
Only if a member function is intended to be invariant in an inheritance hierarchy (or if an
entire class is not intended to be extended) does it make sense not to mark it as virtual. Thus
getName and getSsn are not marked as virtual. In Java, these would be marked final to
signify that it is illegal to attempt to override them. C++ (before C++11) does not have final
methods or final classes, so the lack of virtual in a method declaration is a signal to the reader
that the method is intended to be final, and those semantics are guaranteed if the method is
invoked through a base class reference (or pointer).
When the class declaration and implementation are separate, the virtual declaration must
be in the member function declaration and should not be in the separate member function defini-
tion.
6.3.1 Defaults
First, we need to decide what the defaults are. For all three, the inherited component is consid-
ered to be a data member. Thus, by default:
• the copy constructor is implemented by invoking a copy constructor on the base class(es),
1 class Student : public Person
2 {
3 ...
4 Student( const Student & rhs )
5 : Person( rhs ), gpa( rhs.gpa )
6 { }
7
8 ~Student( )
9 {
10 // automatically chains up
11 }
12
13 const Student & operator= ( const Student & rhs )
14 {
15 if( this != & rhs )
16 {
17 Person::operator= ( rhs );
18 gpa = rhs.gpa;
19 }
20 return *this;
21 }
22 ...
23 };
Figure 6-3 shows an explicit implementation of the defaults for the Student class. To
implement the copy constructor, we need to make sure to include a call to the base class copy
constructor in the initializer list. To implement the destructor, we simply list the additional
actions that must be taken in addition to the default. To implement the copy assignment operator,
we need to make sure that we chain up to the base class.
In short, to implement copy semantics, we must chain up to the base class, while in the
destructor the chaining up is automatic.
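A compilable sketch of the chaining, assuming minimal versions of Person and Student (the constructors and the getGpa accessor are our assumptions, modeled on the figures in this chapter):

```cpp
#include <string>

// Minimal hypothetical Person; its compiler-generated copy operations are used below.
class Person
{
  public:
    Person( int s, const std::string & n = "" ) : ssn( s ), name( n ) { }
    virtual ~Person( ) { }
    int getSsn( ) const { return ssn; }
    const std::string & getName( ) const { return name; }
  private:
    int ssn;
    std::string name;
};

class Student : public Person
{
  public:
    Student( int s, const std::string & n, double g )
      : Person( s, n ), gpa( g ) { }

    Student( const Student & rhs )
      : Person( rhs ), gpa( rhs.gpa ) { }   // chain up to the Person copy constructor

    const Student & operator= ( const Student & rhs )
    {
        if( this != &rhs )
        {
            Person::operator= ( rhs );        // chain up to the Person copy assignment
            gpa = rhs.gpa;
        }
        return *this;
    }

    double getGpa( ) const { return gpa; }
  private:
    double gpa;
};
```

If the Person::operator= line were omitted, an assignment b = a would copy a's gpa but leave b's ssn and name unchanged.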
Here, s1 points at a Student object, and certainly the second assignment is legal, since
Student IS-A Person. So when delete is invoked, whose destructor is used?
Since the destructor is a member function, the answer depends on whether or not the
destructor is declared virtual. In our original code in Figure 6-1, it is not, so we invoke the
Person destructor instead of the Student destructor. This means that any memory in the
Student data fields that was allocated from the memory heap either directly (via calling new),
or indirectly (e.g. if there is a string, or vector as a data member), is never reclaimed, and
we have a subtle memory leak.
Thus we see an important rule: In a base class, a destructor should always be declared virtual
to ensure polymorphic destruction. As with all virtual methods, this costs some space and
time, so classes that are intended to be final can avoid declaring their destructors virtual. This
rule is easy to get wrong because the base class destructor should be virtual even if defaults are
used throughout the entire hierarchy. Otherwise, as we just mentioned, memory that was allocated
indirectly, and that otherwise would have been released, will leak.
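The effect of a virtual destructor can be observed by logging destructor calls. In this sketch, destructionLog, Base, and Derived are hypothetical names, and the vector member stands in for data that is allocated indirectly from the memory heap.

```cpp
#include <string>
#include <vector>

// destructionLog is a hypothetical global that records destructor calls.
std::string destructionLog;

class Base
{
  public:
    virtual ~Base( )                    // virtual: delete through Base * is polymorphic
      { destructionLog += "~Base "; }
};

class Derived : public Base
{
  public:
    Derived( ) : extra( 1000 ) { }      // extra owns heap memory indirectly
    ~Derived( )                         // runs first; ~Base then runs automatically
      { destructionLog += "~Derived "; }

  private:
    std::vector<int> extra;             // freed only if Derived's destruction runs
};
```

Deleting a Derived through a Base pointer should leave destructionLog equal to "~Derived ~Base ". If virtual is removed from ~Base, that delete is undefined behavior, and in practice ~Derived is typically skipped, so the vector's memory is never reclaimed.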
Figure 6-4 shows the Person class with correct virtual declarations. The Student class
in Figure 6-2 needs no changes from the original.
1. It is declared virtual
2. The declaration is followed by =0
1 class Shape
2 {
3 public:
4 Shape( const string & s ) : shapeType( s ) { }
5 virtual ~Shape( ) { }
6
7 const string & getType( ) const
8 { return shapeType; }
9
10 virtual double getArea( ) const = 0;
11 virtual void print( ostream & out ) const
12 { out << getType( ) << " of area " << getArea( ); }
13
14 private:
15 string shapeType;
16 };
17
18 ostream & operator<< ( ostream & out, const Shape & s )
19 {
20 s.print( out );
21 return out;
22 }
23
24 class Circle : public Shape
25 {
26 public:
27 Circle( double r ) : Shape( "Circle" ), radius( r )
28 { }
29
30 double getArea( ) const
31 { return 3.14 * radius * radius; }
32
33 private:
34 double radius;
35 };
36
37 class Square : public Shape
38 {
39 // similar implementation as Circle not shown
40 };
classes do provide constructors that are used by the subclasses. The destructor at line 5 is
marked virtual (we do not make it abstract, since we have other abstract methods).
getType, shown at lines 7 and 8, is not marked virtual, to signify that it is intended as a
final method. Although subclasses can provide different implementations, if getType is
invoked through a Shape reference or pointer, we will always invoke Shape's getType
method, which is similar to the behavior we would get in Java.
Abstract method getArea is declared at line 10, and is overridden in the Circle class.
The print method is virtual, signalling that we expect that the default implementation we have
provided may need to be overridden. The fact that Circle and Square do not override
print in our implementation does not justify removing the virtual declaration. The declaration
is there to express that the print method is not invariant and may need to be overridden.
In the Circle class, we see that the Circle constructor invokes the Shape constructor
at line 27 to initialize its inherited components.
6.5 Slicing
In our discussion of virtual member functions, we have talked about accessing derived classes
by pointers and reference variables. So why not by direct base class objects? The reason is rela-
tively simple: it doesn’t work!
Consider the following example:
Student s( 123456789, "Jane", 4.0 );
Person p( 987654321, "Bob" );
p = s;
p.print( );
Clearly the first two lines create two objects. The first object is of type Student, and the
second is of type Person, as shown in Figure 6-6. Equally clearly, the third statement copies
s into the already existing object p. As Figure 6-6 shows, p only has room for the name and
Figure 6-6 Memory layout of base class and derived class object (p holds "Bob" and 987654321; s holds "Jane", 123456789, and 4.0); shaded portions are inherited data and might not be visible
In Java, if we want to store a collection of shapes, we can simply throw them in an array of
Shape, and then safely invoke the print and area methods, with automatic dynamic dis-
patch. But what we actually have is not an array of Shape objects, but rather, an array of refer-
ence variables that all reference objects that are (subtypes of) Shapes.
In C++, we cannot store an array of Shape, because of slicing. As soon as we would try
to place a derived class object into the array, it would be sliced. All calls would be on Shape
objects. Of course Shape is abstract, so that is another problem, but the abstractness of Shape
is not really the critical issue. It is the slicing problem.
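A sketch of slicing with minimal versions of Person and Student (the constructors mirror the example above; the helper printed is hypothetical). After p = s, p holds only the Person part of s, so the Student override of print is no longer reachable through p.

```cpp
#include <ostream>
#include <sstream>
#include <string>

class Person
{
  public:
    Person( int s, const std::string & n ) : ssn( s ), name( n ) { }
    virtual ~Person( ) { }
    virtual void print( std::ostream & out ) const
      { out << ssn << ", " << name; }
  private:
    int ssn;
    std::string name;
};

class Student : public Person
{
  public:
    Student( int s, const std::string & n, double g ) : Person( s, n ), gpa( g ) { }
    void print( std::ostream & out ) const
      { Person::print( out ); out << ", " << gpa; }
  private:
    double gpa;
};

// Hypothetical helper: returns the text that print writes for p.
std::string printed( const Person & p )
{
    std::ostringstream out;
    p.print( out );
    return out.str( );
}
```

With these classes, after p = s the call printed( p ) should give "123456789, Jane": the gpa was sliced away and p is still a Person, whereas printing through a Person reference bound to s itself still reaches Student::print.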
We can use an array of pointers to Shapes, and then the code looks exactly like Java. In
fact if:
we can for the most part do everything that we do in Java, with one important exception: Java
does garbage collection, while in C++ we must eventually call delete. And that’s the hard
part.
Using inheritance in C++ implies that we must make significant use of pointers, and allocate
objects on the memory heap. This means we are stuck with the thorny issue of cleaning up
memory, which is quite a nuisance in C++ and notoriously error-prone.
As an example of how we use pointers with inheritance to achieve polymorphic behavior,
the code in Figure 6-8 shows how we store Circles and Squares (and in general any kind of
Shape) in a single collection.
The main routine, shown at lines 36 to 40, simply invokes testShapes, mostly so we
can make clear that any allocated memory in testShapes must be cleaned up. In
testShapes, line 25 is the C++ equivalent of creating an empty ArrayList, and lines
27 to 29 are the equivalent of calling add. We do not have any slicing problems since the
vector is storing pointers, so we are never copying Shape objects themselves. We can then
pass the vector to methods such as printArray and totalArea, which will scan through
the vector, and invoke the appropriate print and area methods on the Shapes being
pointed at.
In testShapes, arr is a local variable allocated on the runtime stack. This means that
when testShapes returns, arr’s destructor will be called. This will free the vector, but
not the Shapes that were being pointed at. Since testShapes created objects from the mem-
ory heap, it must reclaim them, and this is the job of cleanup.
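The pattern just described can be sketched as follows. This is not the code of Figure 6-8: the minimal Shape hierarchy, the counter liveShapes, and the exact bodies of totalArea, cleanup, and testShapes are our assumptions, kept just detailed enough to compile and run.

```cpp
#include <cstddef>
#include <vector>

// liveShapes is a hypothetical counter of Shape objects not yet destroyed.
int liveShapes = 0;

class Shape
{
  public:
    Shape( ) { ++liveShapes; }
    virtual ~Shape( ) { --liveShapes; }     // virtual, so delete through Shape * is correct
    virtual double getArea( ) const = 0;
};

class Circle : public Shape
{
  public:
    explicit Circle( double r ) : radius( r ) { }
    double getArea( ) const { return 3.14 * radius * radius; }
  private:
    double radius;
};

class Square : public Shape
{
  public:
    explicit Square( double s ) : side( s ) { }
    double getArea( ) const { return side * side; }
  private:
    double side;
};

// Sums the areas using dynamic dispatch through each pointer.
double totalArea( const std::vector<Shape *> & arr )
{
    double total = 0.0;
    for( std::size_t i = 0; i < arr.size( ); i++ )
        total += arr[ i ]->getArea( );
    return total;
}

// Deletes every Shape the vector points at; the vector itself is left empty.
void cleanup( std::vector<Shape *> & arr )
{
    for( std::size_t i = 0; i < arr.size( ); i++ )
        delete arr[ i ];
    arr.clear( );
}

double testShapes( )
{
    std::vector<Shape *> arr;            // like an empty ArrayList of Shape
    arr.push_back( new Circle( 1.0 ) );  // like calling add
    arr.push_back( new Square( 2.0 ) );

    double area = totalArea( arr );
    cleanup( arr );   // arr's own destructor would free only the pointers
    return area;
}
```

After testShapes returns, liveShapes should be back to 0, which is one simple way to confirm that cleanup reclaimed every object.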
cleanup simply steps through the array invoking delete. Observe that this would not
work if the Shape class did not correctly declare a virtual destructor. printShapes and
totalArea are similar, and simply illustrate different syntax. Since operator<< accepts
any Shape, and *a[i] is an object that is a (subtype of) Shape we can pass it to
1 class Printable
2 {
3 public:
4 virtual ~Printable( ) { }
5 virtual void print( ostream & out = cout ) const = 0;
6 };
7
8 class Serializable
9 {
10 public:
11 virtual ~Serializable( ) { }
12 virtual void readMe( istream & in = cin ) = 0;
13 virtual void writeMe( ostream & out = cout ) const = 0;
14 };
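A sketch of how a single class can implement both abstract base classes, much as a Java class implements two interfaces. The class Point is hypothetical; the two base classes are repeated from the figure so that the code compiles on its own (with std:: qualification in place of a using directive).

```cpp
#include <iostream>
#include <sstream>
#include <string>

class Printable
{
  public:
    virtual ~Printable( ) { }
    virtual void print( std::ostream & out = std::cout ) const = 0;
};

class Serializable
{
  public:
    virtual ~Serializable( ) { }
    virtual void readMe( std::istream & in = std::cin ) = 0;
    virtual void writeMe( std::ostream & out = std::cout ) const = 0;
};

// Hypothetical class that "implements" both interfaces via multiple inheritance.
class Point : public Printable, public Serializable
{
  public:
    Point( int xx = 0, int yy = 0 ) : x( xx ), y( yy ) { }

    void print( std::ostream & out = std::cout ) const
      { out << "(" << x << ", " << y << ")"; }
    void readMe( std::istream & in = std::cin )
      { in >> x >> y; }
    void writeMe( std::ostream & out = std::cout ) const
      { out << x << " " << y; }

  private:
    int x;
    int y;
};
```

A Point can then be passed wherever a Printable & or a Serializable & is expected.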
1 class Person
2 {
3 public:
4 Person( const string & t, const string & n )
5 : name( n ), ptype( t ) { }
6 virtual ~Person( )
7 { }
8 private:
9 string name;
10 string ptype;
11 };
12
13 class Student : public Person
14 {
15 public:
16 Student( const string & n )
17 : Person( "Student", n ) { ... }
18 int getHours( ) const; // number of credit hours taken
19
20 private:
21 int hours;
22 };
23
24 class Employee : public Person
25 {
26 public:
27 Employee( const string & n )
28 : Person( "Employee", n ) { ... }
29 int getHours( ) const; // number of vacation hours left
30
31 private:
32 int hours;
33 };
34
35 class StudentEmployee : public Student, public Employee
36 {
37 public:
38 // Constructor sets type to StudentEmployee???
39 // ??? getHours ???
40
41 private:
42 // ??? name ???
43 // ??? hours ???
44 };
Student and Employee, and then StudentEmployee extends both Student and
Employee, thus having the functionality of both.
But now consider the problems that have to be resolved in StudentEmployee.
Since getHours is defined in both classes, which is inherited? Because of the ambiguity,
either both are inherited, in which case the caller must make clear which of the methods is to be
used, or both are overridden in the StudentEmployee class. The first possibility leads to
ugly code; in the second, StudentEmployee will not be computing the same information as
both Student and Employee, violating at least one IS-A relationship. Most likely, we would
want to just change the method name to avoid this conflict.
More tricky is the memory layout. Since Student and Employee both define an
hours data member, it seems that StudentEmployee needs two copies of hours. This is
certainly true, as seen if we change the name of one of the getHours methods but leave the data
member intact. A basic tenet of inheritance is that the subclasses inherit the superclass's data.
But now what about name? Both Student and Employee have a name data member,
so according to our logic, there should be two copies. But that’s no good, since Student and
Employee inherited name from Person. By default, however, we get two copies. To get only
one copy, we must use virtual inheritance. When Person is extended, the subclasses use
virtual to signal that any inherited data from Person is stored in a different manner. When
multiple such subclasses are themselves extended, the compiler will be able to distinguish
between data members like hours that were created in the subclasses and data members like
name that are really all part of a single ancestor class.
If the ancestor class supplies some of the data, it makes sense that its constructor should be
invocable. Thus with virtual inheritance, not only can a superclass constructor be invoked in the
initializer list, but so can the constructor of an ancestor class that is virtually extended. When
the most derived object is constructed, that ancestor is initialized only by the most derived class;
the initializers for it that appear in the intermediate classes (here, the Person initializers in
Student and Employee) are ignored.
The result of all these changes, with an illustration of the syntax is shown in Figure 6-12.
Observe the deficiency of this approach: although the conflict is at StudentEmployee, it is
the Student and Employee classes that are responsible for using virtual inheritance. Thus if
the base classes have not already declared their use of virtual inheritance, it is difficult to inherit
from both of them without having to disturb existing code.
1 class Person
2 {
3 public:
4 Person( const string & t, const string & n )
5 : ptype( t ), name( n ) { }
6 virtual ~Person( )
7 { }
8 private:
9 string ptype;
10 string name;
11 };
12
13 class Student : virtual public Person
14 {
15 public:
16 Student( const string & n, int h )
17 : Person( "Student", n ), hours( h ) { ... }
18 int getCreditHours( ) const; // credit hours taken
19
20 private:
21 int hours;
22 };
23
24 class Employee : virtual public Person
25 {
26 public:
27 Employee( const string & n, int h )
28 : Person( "Employee", n ), hours( h ) { ... }
29 int getVacationHours( ) const; // vacation hours left
30
31 private:
32 int hours;
33 };
34
35 class StudentEmployee : public Student, public Employee
36 {
37 public:
38 StudentEmployee( const string & n, int ch, int vh )
39 : Person( "StudentEmployee", n ),
40 Student( "ignored", ch ), Employee( "ignored", vh )
41 { }
42 private:
43 // one name, two hours
44 };
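A compilable sketch of the virtual-inheritance version, with the accessor bodies filled in by us (getType and getName are assumptions added for illustration). It shows that a StudentEmployee contains a single Person subobject, initialized by its own constructor, and two separate hours members.

```cpp
#include <string>

class Person
{
  public:
    Person( const std::string & t, const std::string & n ) : ptype( t ), name( n ) { }
    virtual ~Person( ) { }
    const std::string & getType( ) const { return ptype; }
    const std::string & getName( ) const { return name; }
  private:
    std::string ptype;
    std::string name;
};

class Student : virtual public Person
{
  public:
    Student( const std::string & n, int h ) : Person( "Student", n ), hours( h ) { }
    int getCreditHours( ) const { return hours; }
  private:
    int hours;
};

class Employee : virtual public Person
{
  public:
    Employee( const std::string & n, int h ) : Person( "Employee", n ), hours( h ) { }
    int getVacationHours( ) const { return hours; }
  private:
    int hours;
};

class StudentEmployee : public Student, public Employee
{
  public:
    // The most derived class initializes the single shared Person subobject;
    // the Person( ... ) initializers inside Student and Employee are skipped.
    StudentEmployee( const std::string & n, int ch, int vh )
      : Person( "StudentEmployee", n ),
        Student( "ignored", ch ), Employee( "ignored", vh ) { }
};
```

Because there is only one Person subobject, a call such as getType( ) on a StudentEmployee is not ambiguous, and it should report "StudentEmployee".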
ble with IntCell and would still have getValue and setValue. With private inheritance,
getValue and setValue are visible only inside NewCell: they can be used in the
implementation of get and put, but not outside the NewCell class.
Figure 6-14 Derived class methods hide base class methods with same name
is declared in a derived class, it hides all methods of the same name in the base class. Thus foo
with no parameters is no longer accessible through a derived class reference, even though it
would be accessible through a base class reference.
There are two ways around this problem. The first is to override all of the hidden methods
and redefine them in the derived class with an implementation that chains to the base class. Thus
in class Derived, we add
void foo( ) { Base::foo( ); } // place in class Derived
The other alternative is to provide a using declaration in the derived class. In class
Derived, we add
using Base::bar; // place in class Derived
One reason why this rule is important is that an accessor hides a mutator. Most likely this
was unintentional, but many compilers will warn you, and what they are saying, in effect, is that
you made a slight change to the signature when you overrode the original member function. Pay
attention to those warnings.
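Name hiding and the using declaration can be seen in a small sketch; Base, Hiding, and Unhiding are hypothetical classes, and foo is overloaded in Base so that the hiding is visible.

```cpp
class Base
{
  public:
    virtual ~Base( ) { }
    int foo( ) const { return 0; }
    int foo( int x ) const { return x; }
};

class Hiding : public Base
{
  public:
    int foo( int x ) const { return 2 * x; }   // hides both Base::foo overloads
};

class Unhiding : public Base
{
  public:
    using Base::foo;                            // makes Base::foo( ) visible again
    int foo( int x ) const { return 2 * x; }    // still replaces Base::foo( int )
};
```

Writing h.foo( ) for a Hiding object h should not compile, because the derived foo( int ) hides both base versions; the same call on an Unhiding object compiles and returns 0.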
6.10.7 Reflection
C++ has little support for reflection. About all you can do is apply the typeid operator to an
expression (after including the standard header typeinfo). It returns a reference to a
type_info object representing the runtime type of the expression; for an object of a
polymorphic class (one with at least one virtual function), that is the dynamic type. Standard
C++ guarantees that type_info has a public member function called name, which returns an
implementation-defined string, and that you can compare type_info objects with == and !=. Thus,
Person *p = new Student( 123456789, "Jane", 4.0 );
cout << typeid( *p ).name( ) << endl; // prints an implementation-defined name
cout << ( typeid( *p ) == typeid( Person ) ) << endl; // false (0)
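A runnable sketch of typeid with a minimal hypothetical hierarchy; the helper functions isExactlyStudent and runtimeTypeName are ours. Person has a virtual destructor, which makes the class polymorphic so that typeid reports the dynamic type.

```cpp
#include <string>
#include <typeinfo>

// Hypothetical minimal hierarchy; the virtual destructor makes Person polymorphic.
class Person
{
  public:
    virtual ~Person( ) { }
};

class Student : public Person
{
};

// True only if the object p points at is exactly a Student (not a further subclass).
bool isExactlyStudent( const Person * p )
{
    return typeid( *p ) == typeid( Student );
}

// The returned text is implementation-defined.
std::string runtimeTypeName( const Person * p )
{
    return typeid( *p ).name( );
}
```

The string from name( ) varies by compiler (GCC, for instance, produces a mangled name), so it is suitable for debugging output; for decisions, compare type_info objects instead.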
• C++ does not have interfaces, but the effect can be achieved with an abstract base class
that contains only abstract methods and a virtual destructor.
• C++ allows multiple inheritance. If two implementations conflict, both of those imple-
mentations should have been declared using virtual inheritance to avoid replication of data
members.
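• Protected members are visible in derived class implementations, but a derived class can
access an inherited protected member only through an object, pointer, or reference of its own
class type (or of a class derived from it).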
• Friendship is not inherited.
• There is no root class that is equivalent to Object. Instead templates are used to imple-
ment generic algorithms.
• Although it is bad style, C++ allows the reduction of visibility of a method in a derived
class.
• Methods declared in the derived class hide the base class methods with the same name.
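• The return type of a derived class method can differ from that of the base class method it
overrides, provided the base class method returns a pointer or reference to a class and the
derived class method returns a pointer or reference to a subclass of that class.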
• An array of a derived type is not type compatible with an array of the base type.
• C++ has only minimal support for reflection, essentially the typeid operator.
6.12 Exercises
19. Define an abstract base class called Employee that contains a name (string), a social
security number (string), and the respective accessor functions and contains an abstract
method that gets and sets a salary. It also contains a method called print whose task is to
output the name and social security number. You should not use protected members.
Include a two-parameter constructor, using initializer lists, and give all parameters
defaults. Carefully decide which members should be virtual. Next, derive a class called
Hourly that adds a new data member to store an hourly wage (double). Its print
method must print the name, social security number, and salary (with the phrase
"per hour"). It will certainly want to call the base class print. Provide an accessor
and mutator for the salary, and make sure that its constructor initializes a salary. Next,
derive another class called Salaried that adds a new data member to store a yearly sal-
ary (double). Its print method must print the name, social security number, and salary
(with phrase "annual"). Provide an accessor and mutator for the salary, and make sure
that its constructor initializes a salary. Exercise the classes by declaring objects of type
Salaried and Hourly via constructors and calling their print methods. Provide a
single operator<< that prints an Employee (by calling print). This method will
automatically work for anything in the Employee hierarchy. In order to hold all employ-
ees, create a class called Roster that is able to hold a variable number of Employee *
objects. Roster should have a vector of Employee *. Provide the capability to add
an employee, and print the entire roster of employees. To add an employee, Roster::add
is passed a pointer to an Employee object and calls vector::push_back. Don't worry
about error checks. To summarize, Roster has public methods named add and print. Write
a short test program, in which you create a Roster object, call new for both kinds of
Employee, sending the result to Roster::add, and output the Roster via a call to print.
20. Define a hierarchy that includes Person, Student, Athlete, StudentAthlete,
FootballPlayer, BasketballPlayer, StudentFootballPlayer,
StudentBasketballPlayer, StudentFootballAndBasketballPlayer.
Give a Person a name, a Student a GPA, an Athlete a uniform number, a
FootballPlayer a boolean representing true for offense, and false for defense, a
BasketballPlayer a scoring average (as a double). Define appropriate data repre-
sentations, constructors, destructors, and methods, and make use of virtual inheritance.
Observe that StudentFootballAndBasketballPlayer will have two uniform
numbers. Write a test program that obtains both numbers.
CHAPTER 7
Templates
In Java, inheritance is used to write type-independent code. In C++, however, a different
alternative is used: the template.
In this chapter, we begin by reviewing how generic algorithms are implemented in Java,
and then see the basics of how templates are written for functions and classes. We will also see
how templates are used to implement function objects, which in Java are implemented via inher-
itance and interfaces. We will discuss how templates affect separate compilation, and finally, we
will briefly mention some of the advanced uses of templates.
cates that Comparable is the template argument: it can be replaced by any type to generate a
function (both class and typename can be used interchangeably here). For instance, if a call
1 const int & findMax( const vector<int> & a )
2 {
3 int maxIndex = 0;
4
5 for( int i = 1; i < a.size( ); i++ )
6 if( a[ maxIndex ] < a[ i ] )
7 maxIndex = i;
8
9 return a[ maxIndex ];
10 }
11
12 const double & findMax( const vector<double> & a )
13 {
14 int maxIndex = 0;
15
16 for( int i = 1; i < a.size( ); i++ )
17 if( a[ maxIndex ] < a[ i ] )
18 maxIndex = i;
19
20 return a[ maxIndex ];
21 }
22
23 const string & findMax( const vector<string> & a )
24 {
25 int maxIndex = 0;
26
27 for( int i = 1; i < a.size( ); i++ )
28 if( a[ maxIndex ] < a[ i ] )
29 maxIndex = i;
30
31 return a[ maxIndex ];
32 }
33
34 const IntCell & findMax( const vector<IntCell> & a )
35 {
36 int maxIndex = 0;
37
38 for( int i = 1; i < a.size( ); i++ )
39 if( a[ maxIndex ] < a[ i ] )
40 maxIndex = i;
41
42 return a[ maxIndex ];
43 }
Figure 7-3 findMax function template expanded (including a bad expansion for IntCell)
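The four functions in Figure 7-3 differ only in their element type. A sketch of the single function template that would generate them (our reconstruction, written with std:: qualification) is:

```cpp
#include <string>
#include <vector>

// Comparable must support operator<. The IntCell expansion in Figure 7-3 is marked as
// bad, presumably because IntCell supplies no operator<.
template <typename Comparable>
const Comparable & findMax( const std::vector<Comparable> & a )
{
    int maxIndex = 0;

    for( int i = 1; i < static_cast<int>( a.size( ) ); i++ )
        if( a[ maxIndex ] < a[ i ] )
            maxIndex = i;

    return a[ maxIndex ];   // like the figure, assumes a is non-empty
}
```

Calling findMax on a vector<int> or a vector<string> causes the compiler to expand the template with Comparable replaced by int or string, respectively.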
1 #include "IntCell.h"
2 #include <string>
3 #include <iostream>
4 using namespace std;
5 template <typename Comparable>
6 const Comparable & max2( const Comparable & lhs,
7 const Comparable & rhs )
8 {
9 return lhs > rhs ? lhs : rhs;
10 }
11
12 const string & max2( const string & lhs, const string & rhs )
13 {
14 return lhs > rhs ? lhs : rhs;
15 }
16
17 int main( )
18 {
19 string s = "hello";
20 int a = 37;
21 double b = 3.14;
22
23 cout << max2( a, a ) << endl; // OK: expand with int
24 cout << max2( b, b ) << endl; // OK: expand with double
25 cout << max2( s, s ) << endl; // OK: not a template
26 cout << max2( a, b ) << endl; // Illegal: Comparable deduced as int and as double
27
28 return 0;
29 }
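One way to make the mixed call legal is to supply the template argument explicitly, which turns off deduction for that call. A sketch (mixedMax is a hypothetical wrapper; max2 is repeated from the figure):

```cpp
// The max2 template from the figure, repeated so the sketch compiles on its own.
template <typename Comparable>
const Comparable & max2( const Comparable & lhs, const Comparable & rhs )
{
    return lhs > rhs ? lhs : rhs;
}

// Hypothetical wrapper: max2<double> names the template argument explicitly,
// so no deduction takes place and a is converted to a temporary double.
double mixedMax( int a, double b )
{
    return max2<double>( a, b );   // result is copied while the temporary still exists
}
```

An alternative is to convert one argument, as in max2( static_cast<double>( a ), b ), so that both arguments deduce double.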
1 #ifndef _OBJECTCELL_H
2 #define _OBJECTCELL_H
3
4 template <typename Object>
5 class ObjectCell
6 {
7 public:
8 explicit ObjectCell( const Object & initValue = Object( ) )
9 : storedValue( initValue )
10 { }
11
12 const Object & getValue( ) const
13 { return storedValue; }
14 void setValue( const Object & val )
15 { storedValue = val; }
16
17 private:
18 Object storedValue;
19 };
20 #endif
1 #include "ObjectCell.h"
2 #include <iostream>
3 using namespace std;
4
5 int main( )
6 {
7 ObjectCell<int> m1;
8 ObjectCell<double> m2( 3.14 );
9
10 m1.setValue( 37 );
11 m2.setValue( m2.getValue( ) * 2 );
12
13 cout << m1.getValue( ) << endl;
14 cout << m2.getValue( ) << endl;
15
16 return 0;
17 }
It should be clear that the vector class is in reality a class template. It turns out that the
string class is an instantiated class template (the class template is basic_string). A third
class template is complex, which is most often instantiated as complex<double> and is
found in the standard header file complex. And finally, ostream and istream are actually
instantiations of the class templates basic_ostream and basic_istream.
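These claims about the standard library can be checked at compile time. This sketch assumes a C++11 compiler for static_assert and std::is_same; the function square is a hypothetical example of using complex<double>.

```cpp
#include <complex>
#include <istream>
#include <ostream>
#include <string>
#include <type_traits>

// Each familiar name is an instantiation of a standard class template.
static_assert( std::is_same< std::string, std::basic_string<char> >::value,
               "string is basic_string<char>" );
static_assert( std::is_same< std::ostream, std::basic_ostream<char> >::value,
               "ostream is basic_ostream<char>" );
static_assert( std::is_same< std::istream, std::basic_istream<char> >::value,
               "istream is basic_istream<char>" );

// complex is most often instantiated as complex<double>.
std::complex<double> square( const std::complex<double> & z )
{
    return z * z;
}
```

If any of the assertions were false, the file would fail to compile with the given message.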
1 #ifndef _MATRIX_H
2 #define _MATRIX_H
3 #include <vector>
4 using namespace std;
5
6 template <typename Object>
7 class Matrix
8 {
9 public:
10 Matrix( int rows, int cols ) : array( rows )
11 { setNumCols( cols ); }
12
13 int numrows( ) const
14 { return array.size( ); }
15 int numcols( ) const
16 { return numrows( ) > 0 ? array[ 0 ].size( ) : 0; }
17
18 void setNumRows( int rows )
19 { array.resize( rows ); }
20 void setNumCols( int cols )
21 { for( int i = 0; i < numrows( ); i++ )
22 array[ i ].resize( cols );
23 }
24 void setDimensions( int rows, int cols )
25 { setNumRows( rows ); setNumCols( cols ); }
26
27 const vector<Object> & operator[]( int row ) const
28 { return array[ row ]; }
29 vector<Object> & operator[]( int row )
30 { return array[ row ]; }
31
32 private:
33 vector< vector<Object> > array;
34 };
35
36 #endif
#include <iostream>
#include "Matrix.h"
using namespace std;

int main( )
{
    Matrix<int> m( 2, 2 );
    m[ 0 ][ 0 ] = 1; m[ 0 ][ 1 ] = 2;
    m[ 1 ][ 0 ] = 3; m[ 1 ][ 1 ] = 4;

    cout << "m has " << m.numrows( ) << " rows and "
         << m.numcols( ) << " cols." << endl;

    cout << m[ 0 ][ 0 ] << " " << m[ 0 ][ 1 ] << endl
         << m[ 1 ][ 0 ] << " " << m[ 1 ][ 1 ] << endl;

    return 0;
}
quite cumbersome. For instance, declaring operator= in the class specification requires no extra baggage. In the implementation, however, we would have the brutal:
template <typename Object>
const ObjectCell<Object> &
ObjectCell<Object>::operator= ( const ObjectCell<Object> & rhs )
{
    if( this != &rhs )
        storedValue = rhs.storedValue;
    return *this;
}
Even with this, the issue now becomes how to organize the class template declaration and the member function template definitions. The main problem is that the implementations in Figure 7-10 are not actually functions; they are still wannabes. They are not even expanded when the ObjectCell template is instantiated; each member function template is expanded only when it is invoked.
#ifndef _OBJECTCELL_H
#define _OBJECTCELL_H

template <typename Object>
class ObjectCell
{
  public:
    explicit ObjectCell( const Object & initValue = Object( ) );

    const Object & getValue( ) const;
    void setValue( const Object & val );

  private:
    Object storedValue;
};
#endif
#include "ObjectCell.h"

template <typename Object>
ObjectCell<Object>::ObjectCell( const Object & initValue )
  : storedValue( initValue )
{
}

template <typename Object>
const Object & ObjectCell<Object>::getValue( ) const
{
    return storedValue;
}

template <typename Object>
void ObjectCell<Object>::setValue( const Object & val )
{
    storedValue = val;
}
#include "ObjectCell.cpp"

template class ObjectCell<int>;
template class ObjectCell<double>;
Figure 7-13 Pair class template that is very picky about types
This strategy is illustrated in Figure 7-14. Lines 9 to 12 contain the member template. Observe that the constructor for Pair accepts a Pair with arbitrary types, and as long as the types are compatible, the constructor template will expand. If the types are not compatible, we get a compile-time error due to line 11, which is perfect.
Member templates are also a relatively new addition to C++, so many compilers do not support them. Extending type compatibility among different template instantiations, as shown here, is probably their most common use.
Figure 7-14 Pair class template that allows construction from any type-compatible Pair using a member template
class Rectangle
{
  public:
    explicit Rectangle( int len = 0, int wid = 0 )
      : length( len ), width( wid ) { }

    int getLength( ) const
      { return length; }

    int getWidth( ) const
      { return width; }

    void print( ostream & out = cout ) const
      { out << "Rectangle " << getLength( ) << " by "
            << getWidth( ); }

  private:
    int length;
    int width;
};

ostream & operator<< ( ostream & out, const Rectangle & rhs )
{
    rhs.print( out );
    return out;
}
Figure 7-18 Class that implements comparator interface for rectangles (preliminary)
class LessThanByLength
{
  public:
    bool isLessThan( const Rectangle & lhs,
                     const Rectangle & rhs ) const
      { return lhs.getLength( ) < rhs.getLength( ); }
};
Figure 7-19 Class that implements comparator for rectangles, without comparator
Now main can invoke findMax if it passes a vector of Rectangles and an appropriate function object. main is shown in Figure 7-17. The comparator is an anonymous instance of class LessThanByLength, which implements the Comparator interface and is shown in Figure 7-18. Specifically, it extends an instantiated Comparator template by implementing isLessThan.
Except for the template baggage that replaces the use of Object as a superclass, this
implementation is exactly equivalent to the Java idiom for function objects.
class LessThanByLength
{
  public:
    bool operator( ) ( const Rectangle & lhs,
                       const Rectangle & rhs ) const
      { return lhs.getLength( ) < rhs.getLength( ); }
};
Figure 7-20 Overloading of the function call operator for function object
In the implementation of findMax, the call at line 17 becomes
if( cmp.operator() ( a[ maxIndex ], a[ i ] ) )
Since operator() is the function call operator, this can be rewritten as
if( cmp( a[ maxIndex ], a[ i ] ) )
At this point, it makes sense to rename findMax's parameter from cmp to isLessThan. Thus we get the code in Figure 7-21.
To summarize, the function object is declared in Figure 7-20 and provides an implementation of the function call operator. The routine that uses the function object is a template, and the type of the function object is a template parameter. Syntactically, when the function object is used, it looks like a normal global function call. The template routine can itself be invoked, as in Java, by passing an instance of the function object. The compiler will do all the type resolution.
Function objects are used extensively in the STL, which is the C++ equivalent of the Collections API. The STL is discussed in Chapter 10.
7.8 Exercises
1. When using templates, what kinds of errors are detected at compile time that are not
detected until runtime in Java?
2. When are errors in a template definition detected by the compiler?
3. What is code bloat?
4. What strategies are used for separate compilation of templates?
5. What is a member template?
6. How are templates used for function objects?
7. Implement a function template that takes a single vector as a parameter, and sorts the vector (using any simple sorting algorithm).
8. Implement a function template that takes a single vector as a parameter and a function
object that represents the comparator as a second parameter and sorts the vector using the
comparator as the basis for ordering (using any simple sorting algorithm).
9. Reimplement Exercise 4.19 using templates to generalize the types of the objects in the
stack.
10. Reimplement Exercise 4.20 using templates to generalize the types of the objects in the
set.
11. Reimplement Exercise 5.11 using templates to generalize the types of the keys and values
in the map.
CHAPTER 8
Java is notoriously concerned with not allowing unsafe programs to execute. The compiler will catch many programming errors, and the runtime system will throw exceptions when invalid operations occur. C++ was designed with a different mentality.
In this chapter, we discuss how C++ programs typically handle errors, and then examine exceptions in C++. Although the syntax has similarities (try and catch), C++ exception handling is a shell of what Java provides.
class Account
{
  public:
    Account( int b = 0 )
      : balance( b ) { }
    int getBalance( ) const
      { return balance; }
    void deposit( int d )
      { balance += d; }
  private:
    int balance;
};

int main( )
{
    Account *acc1 = new Account( );
    Account *acc2;                          // never initialized

    acc1->deposit( 50 );
    cout << acc1->getBalance( ) << endl;
    cout << acc2->getBalance( ) << endl;    // undefined behavior: acc2 points nowhere

    return 0;
}
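sense can be seen in Figure 8-3: here we allow the changing of a constant object. Though C++ puts in a good-faith effort to disallow it, if we use a const_cast, we can change an object that we promised not to change. (Strictly speaking, modifying an object that was defined const is undefined behavior, so even the jello output is not guaranteed.)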
But certainly the most evil of C++ issues is shown in Figure 8-4. Here we have an obvious out-of-bounds array index. Almost any language would detect this at runtime. But not C++. Instead, the program typically just treats whatever occupies the memory following arr[9] (four bytes, if an int is four bytes) as arr[10]; formally, the behavior is undefined. If arr[10] were on the left-hand side of the assignment operator, we could even change that memory. Almost certainly it belongs to some other variable in the program, leading to hard-to-find bugs.
In C, on which primitive C++ arrays are based, the lack of bounds checking was justified on the grounds that the implementation simply stored a pointer variable. And since C was a 1970s language used instead of assembly language to implement operating systems, speed was certainly an important criterion.
But the vector class template was added in the mid-1990s, and not including bounds checking as an automatic part of the indexing operator seems inexcusable. Yet it accurately reflects the C++ philosophy of not forcing the user to pay at runtime for anything more than bare necessities, and certainly not paying for a feature that was free in C.
class Barbell
{
  public:
    Barbell( double b ) : weight( b ) { }
    double getWeight( ) const
      { return weight; }
  private:
    double weight;
};

int main( )
{
    Barbell *bb = new Barbell( 15.6 );
    cout << bb->getWeight( ) << endl;

    Account *acc = (Account *) bb;     // C-style cast: nothing checks that bb is an Account
    cout << acc->getBalance( ) << endl;
    acc->deposit( 40 );                // scribbles on the Barbell's memory
    cout << bb->getWeight( ) << endl;

    return 0;
}
Figure 8-2 C++ code that might generate a warning but compiles, allowing type confusion
In C++, error handling in general seems to follow this trend. The compiler and runtime systems are less likely to signal errors than in Java. And even when such errors are signalled, in many cases the programmer can avoid acknowledging that the error has occurred, and can thus continue executing code while the program is no longer in a good state.
int main( )
{
    const string h = "hello";

    // two failed attempts to change h
    h[ 0 ] = 'j';           // does not compile, thankfully
    string & href = h;      // does not compile, thankfully

    // third time is a charm
    string & ref = const_cast<string &>( h );
    ref[ 0 ] = 'j';         // undefined behavior: h was defined const
    cout << h << endl;      // typically prints jello

    return 0;
}
int main( )
{
    int i;
    vector<int> arr( 10, 37 );     // 10 items, all with val 37

    for( i = 0; i <= 10; i++ )     // i == 10 is out of bounds
        cout << i << " " << arr[ i ] << endl;   // arr[ 10 ] is not checked

    return 0;
}
void printExit( )
{
    cout << "Invoking the printExit method..." << endl;
}

int main( )
{
    atexit( printExit );
    foo( );
}
8.2.2 Assertions
The assert preprocessor macro (see Section 12.1 for a discussion of preprocessor macros) is
made available by including the standard header file cassert.
If NDEBUG is defined, calls to assert are ignored. Otherwise, if the argument to assert evaluates to zero (false), an error message is printed and the program is aborted by calling abort. Recall that no destructors are invoked, so this is rather drastic. The error message includes the actual expression as well as the source code file name and line number.
For instance, in Figure 8-6, if foo is called with a NULL pointer, an error message similar to
Assertion failed: s != NULL, file Fig8-6.cpp, line 8
will appear (on the standard error), and the program will terminate abnormally.
1 #include <iostream>
2 #include <string>
3 #include <cassert>
4 using namespace std;
5
6 void foo( string *s )
7 {
8     assert( s != NULL );
9     // ... remainder of foo, which can now safely use *s ...
10 }
Whether assert is active is decided where cassert is included, so NDEBUG must be defined before line 3, via
#define NDEBUG
for assert to be ignored. (NDEBUG is more commonly defined on the compiler command line, for instance with -DNDEBUG.)
However, at run time, if an exception is thrown that is not in the throw list, then std::unexpected is called, which by default calls std::terminate (and thus, normally, abort). It is possible to change the behavior of std::unexpected by invoking set_unexpected.
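The throw list syntax in C++ is slightly different from Java's:
void foo( ) throw( UnderflowException, OverflowException );
(Dynamic exception specifications such as this were deprecated in C++11 and have since been removed from the standard.)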
An empty (as opposed to missing) throw list signals that no exceptions are expected to be thrown. If an exception is thrown but never caught, the function std::terminate will be invoked. This function is also called if the exception handling mechanism determines that the runtime stack has been corrupted, or if a destructor that is executed as part of the handling of an exception itself throws an exception (to avoid infinite recursion). Calling set_terminate allows the programmer to supply a function that contains different behavior for the terminate function.
exception
    runtime_error
        range_error
        overflow_error
        underflow_error
    logic_error
        out_of_range
        length_error
        invalid_argument
        domain_error
    bad_alloc
    bad_cast
    bad_typeid
    bad_exception
    ios_base::failure
Figure 8-7 Standard exceptions
a class type that can store information about why the exception was thrown.
In Java, Throwable objects contain a stack trace as part of their data. Since there is no
root exception class in C++, you cannot count on much information from the exception object
besides its type.
hierarchies and the methods are marked virtual, dynamic dispatch will occur when the
exception’s methods are invoked, as long as the exception is not passed by value.
The alert reader may be wondering why the reference to the exception, e, at line 26 is not stale. After all, in both cases, a local temporary was storing the exception object, and the function in which that temporary was created has terminated. Clearly, if this were a problem, exceptions would have to be passed using call-by-value, which would be impossible with inheritance (since call-by-value and inheritance do not mix). Thus, as a special case, the exception handling mechanism guarantees that the exception object will not have its destructor called until it is no longer an active exception. This means the object is valid until it is caught and the catch block that catches it terminates, unless the catch block rethrows the exception, in which case it remains valid as if the catch block had never executed.
• For robust code, avoid calling exit outside of main. Try to use exceptions instead.
• assert can be used for debugging code, but might be inappropriate for production code.
However, a simple replacement for assert that substitutes a call to abort with throwing
of an exception can easily be written (Section 12.1).
• Exceptions in Java are based on exceptions in C++, but the C++ implementation doesn’t
work well mainly because of the need for backward compatibility with earlier versions of
C++.
• Few C++ library routines signal errors by throwing an exception. Instead, many incorporate error states into return codes, global variables, or object state.
• C++ does not have a finally block. Instead, attempt to ensure that all of your heap-allocated objects are wrapped inside a stack-allocated object, so that they can be freed by calls to destructors. One way of doing this is to use auto_ptr (superseded by unique_ptr in C++11; auto_ptr was removed in C++17).
• A throw list specifies exceptions that can be expected to occur. If a method throws an exception that is not in its throw list, function std::unexpected is invoked.
• A missing throw list indicates that any exception can occur.
• Any object can be thrown in C++.
• The standard library defines a small hierarchy of what would be Java errors and runtime
exceptions that can be extended by the programmer. This hierarchy is rooted at class
exception.
• Inside a catch block, we can rethrow an exception simply by issuing throw. The exception need not be named in the throw statement.
• To catch all exceptions, use catch( ... ).
• Many of the Java rules relating to exceptions are in effect implemented in C++.
• Exceptions should always be caught using call-by-reference or call-by-constant reference.
Catching by reference allows the changing of the state of the exception object prior to
rethrowing it.
• The exception object is always valid up to the end of the catch clause that last handles the
exception.
• Templates and exceptions don’t mix. Avoid using throw lists in function templates.
• auto_ptr is a class template that wraps a pointer variable. The pointer variable should
be viewing a heap-allocated object. When the auto_ptr is destroyed, its destructor will
delete the object it is wrapping if the auto_ptr still enjoys ownership. Ownership is
transferred if the auto_ptr is copied into another auto_ptr.
• Avoid using auto_ptr objects in container classes such as vector.
8.6 Exercises
CHAPTER 9
In Java, I/O is supported by a large library that resides mostly in package java.io. The package makes use of inheritance, by defining four abstract classes (InputStream, OutputStream, Reader, and Writer), which can then be extended to target different sources of data (files, sockets, arrays), and which can be wrapped in a classic decorator pattern to achieve different functionality (buffering, compression, encryption, serialization, pushback operators, and so on). In C++, inheritance is also used to define a hierarchy, though the C++ hierarchy is considerably smaller and the decorator pattern is not used.
In this chapter, we begin by discussing the I/O hierarchy in C++. Because C++ I/O does not, by default, signal errors with exceptions, we see how the error state is encoded in a stream. Then we look at basic output and input, and see an equivalent to the use of StringTokenizer that is seen in Java.
ios_base
    ios
        istream, ostream
            fstream, stringstream (derived from both istream and ostream)
Figure 9-1 Basic inheritance hierarchy of streams. Any class c in shaded area is really a
template instantiation basic_c<char>, and there is a similar instantiation for
wchar_t (e.g. wifstream is used for wide characters)
Subclasses of ios_base are all templates that are instantiated with the type of characters
that are used in the implementation. Standard instantiations include one for char, and another
for the wide character type wchar_t.
The basic_ios class template encapsulates some information that is common to both input and output. The instantiation basic_ios<char> is defined with a typedef as ios, and the similar instantiation basic_ios<wchar_t> is defined with a typedef as wios. In other words, we have (in simplified form):
template <class chartype>
class basic_ios : public ios_base { ... };   // the real template also takes a traits parameter

typedef basic_ios<char>    ios;
typedef basic_ios<wchar_t> wios;
which correctly outputs all successfully read data. However, if a read fails (for instance, if x is an int and there is a non-integer character on the input), this loop terminates at that point, rather than attempting error recovery.
For more robust code, one can use a test that attempts recovery. An example of this strategy is shown in Figure 9-2. Here we have a routine that reads items of arbitrary type, separated by white space (as long as the type has a correctly overloaded operator>>).
This code illustrates three important points. First, fail is true if the I/O operation has failed for some reason but the stream is otherwise still in good shape, so the error is correctable. (Strictly, fail also returns true when the stream is bad, a condition described below.) If, for instance, we are trying to read an integer, fail could indicate that non-integer data, such as the word "Joe", was found on the input stream. Thus, second, we can solve the problem by skipping over the next token. However, third, whenever the stream is in an error state, all I/O on the stream will continue to fail. So before we attempt to read the string that we will discard, we must first clear the error state by invoking clear. (That read cannot normally fail, since there must be data and the stream is otherwise in good shape, but extra-careful code would also check the error state after that read.)
One technical point about this example is not related to I/O, but instead deals with templates. The code can be invoked as:
vector<int> arr;
readData( cin, arr );
If readData returned the vector instead of taking it as a parameter, then the instantiation of the function template would have to be explicit, as in
vector<int> arr = readData<int>( cin );
since a function template's return type is not considered in determining a template expansion, and thus the compiler would have no way to deduce what Object should be.
template <typename Object>
void readData( istream & in, vector<Object> & items )
{
    items.resize( 0 );
    Object x;
    string junk;                 // to skip over bad data

    while( !( in >> x ).eof( ) )
    {
        if( in.fail( ) )
        {
            in.clear( );         // clear the error state
            in >> junk;          // skip over junk
            cerr << "Skipping " << junk << endl;
        }
        else
            items.push_back( x );
    }
    if( !in.fail( ) )            // the last item was read successfully
        items.push_back( x );    // but ended exactly at end-of-file
}
fail for an unsuccessful operation, such as reading data in the wrong format
The method good returns true if the stream is not in an error state. The method bad is
like fail, except it is more severe: the stream has been corrupted for some reason, and so it is
not worth attempting recovery. Figure 9-3 summarizes the methods that can test the state of a
stream.
9.3 Output
As we have already seen, the vast majority of output statements simply use an overloaded operator<<. In addition to operator<<, single characters can be output by invoking the put member function. The most interesting part of output in C++ is probably the technique that is used to finely tune how the output is formatted, especially since we might not be happy with the defaults.
For instance, if we run the code in Figure 9-4, the output that is produced is:
Pat 40000.1
Sandy 125443
Here we see two deficiencies. First, by default, doubles are output with only six significant digits. Second, integer and string types print only the minimum number of characters needed, making it hard to align output. We would prefer output such as:
Pat 40000.11
Sandy 125443.10
in which we force two decimal places, and require that both the string and the double be padded with at least a few spaces. But the string should be placed on the left, followed by padding, while the double should be placed on the right, preceded by padding.
The number of significant digits and digits after the decimal point, as well as how much padding is output and where, are part of the state of a stream. Specifically, they are part of the format state. Thus we can invoke methods on the stream to examine and possibly change the format state, on a stream-by-stream basis.
The easiest way to do this is to use manipulators. For instance, some of the manipulators,
of concern to us, with examples of their use on a specific output stream cout, could include
class Person
{
  public:
    Person( const string & n = "", double s = 0.0 )
      : name( n ), salary( s ) { }

    void print( ostream & out = cout ) const
      { out << name << " " << salary; }

  private:
    string name;
    double salary;
};

ostream & operator<< ( ostream & out, const Person & p )
{
    p.print( out );
    return out;
}

int main( )
{
    vector<Person> arr;
    arr.push_back( Person( "Pat", 40000.11 ) );
    arr.push_back( Person( "Sandy", 125443.10 ) );

    for( int i = 0; i < arr.size( ); i++ )
        cout << arr[ i ] << endl;
    return 0;
}
Most of these change the format state for all subsequent operations (until overridden by a contradictory manipulator), except for setw, which applies only to the next field. The manipulators that accept a parameter are available by including the standard header file iomanip. Figure 9-5 shows how we can use these manipulators to generate aligned, nicely formatted output.
A host of other manipulators are available.
The manipulators boolalpha and noboolalpha control whether bools are printed as true and false or as 1 and 0. The latter is the default (for backward compatibility). Thus
cout << boolalpha << true << " " << noboolalpha << true << endl;
prints
true 1
oct, dec, and hex are manipulators used to control how numbers are output. Alternatively, setbase(b) can be used. The base by default is not printed, but this can be changed by manipulator showbase; the default is noshowbase. Thus,
cout << 37 << " " << oct << 37 << " "
<< hex << 37 << " " << setbase( 10 ) << 37 << endl;
prints
37 45 25 37
If we have
cout << showbase;
cout << 37 << " " << oct << 37 << " "
<< hex << 37 << " " << setbase( 10 ) << 37 << endl;
then the output is
37 045 0x25 37
uppercase and nouppercase control whether the x in 0x and the e in scientific notation are printed in lower case or upper case. left and right control the positioning of the data relative to the padding; internal puts fill characters between the sign and the value. setprecision(n) sets the floating-point precision. setw(w) sets the width of the next output only to w. fixed and scientific control whether floating-point values are output in fixed-point or scientific notation.
setfill(ch) sets the fill character (a space by default) to ch. For instance, setfill('*') can be used to fill with *, as is commonly done on cashier's checks to prevent fraud. The following code
cout << setprecision( 2 ) << setfill( '*' ) << fixed << right;
cout << setw( 8 ) << 12.49 << endl;
cout << setw( 8 ) << 3.1 << endl;
outputs
***12.49
****3.10
9.4 Input
We have already seen that input streams make use of overloaded sets of operator>> to do significant work, and in Section 9.2 we saw how the error state of a stream can be accessed and cleared. In this section we discuss some additional input operations.
Often we want to perform character-at-a-time input. Although operator>> is overloaded to accept a character, using it can be tedious because operator>> skips whitespace by default. The manipulator noskipws turns this off for subsequent reads (skipws turns it back on), and the manipulator ws discards leading whitespace on demand, but at best this is tedious, and at worst it is potentially time-consuming if we repeatedly have to set and reset the format state because the character-at-a-time input is interspersed with other input.
For this reason, istreams provide a get method. There are several versions of get, but
the easiest to use is the one with signature
istream & get( char & ch );
Thus, in
char ch;
if( cin.get( ch ) )
cout << "Read " << ch << endl;
else
cout << "Read error" << endl;
(or for wide characters)
wchar_t ch;
if( wcin.get( ch ) )
wcout << L"Read " << ch << endl;
else
wcout << L"Read error" << endl;
a single character is read, and if the read fails, the stream is put in an error state. The unget method is used to undo a get. The peek method is used to examine the next character in the input stream without consuming it. The declarations of these methods are:
istream & unget( );
int peek( );
istream & getline( istream & in, string & str, char delim )
{
    char ch;
    str = "";    // empty string, will build one char at-a-time

    while( in.get( ch ) && ch != delim )
        str += ch;

    return in;
}
9.5 Files
Files are modelled by either an ifstream for input, or an ofstream for output (again, there are corresponding class template expansions for wide-character implementations). An fstream can be used for both input and output, but we do not recommend it. To perform file I/O, the standard header fstream should be included.
File streams can be constructed with a primitive string (either a string constant, a null-terminated array of characters, or the result of c_str on a string object), and optionally a mode that describes how the file is to be used. (Since C++11, a string object can also be passed directly.) Some examples include:
ifstream file1( "data.txt" );
string name3;
cin >> name3;
ofstream file3( name3.c_str( ), ios_base::out | ios_base::trunc );
string name4;
getline( cin, name4 );
ofstream file4( name4.c_str( ), ios_base::out | ios_base::app );
In these examples, first we see that an ifstream can be constructed with a file name. The second example constructs an ofstream with a primitive string (an array of characters), and the default output mode of truncation. Note that although cin>>name2 compiles, using it is very dangerous, since it can lead to a buffer overflow. Invoking an overloaded version of get that works with character arrays is a much safer solution. Option number 3 uses a string class object, and we see that we can invoke c_str to obtain the primitive string that the ofstream constructor requires. Also, we see an explicit use of a mode, in which we bitwise-or out and trunc; this is the default. Finally, we see a constructor that opens a file for appending. (Older versions of C++ use ios instead of ios_base.) Alternatively, we can simply declare the stream object and use the member function open later. The state of the stream should be tested after it is opened, as in
if( file4 )
{
    // ok: the file is open
}
else
{
    // not ok: report the error
}
When a stream goes out of scope, it is guaranteed that its destructor is called, thereby closing the stream automatically. The user can invoke close if it is desired to close the stream sooner, perhaps to reopen a different file with the same stream object.
offset may be negative. Generally, it is an error to attempt to seekg to before the beginning of the stream. On some systems, notably Unix, seeking past the end of the file is permitted; a subsequent write extends the file, and the gap reads back as zero bytes.
As an example, the routine in Figure 9-7 prints the last howMany characters in the
(binary) file fileName. After opening the file for reading, and checking for errors, we invoke
seekg at line 9 to go to the end. We then back up howMany characters, taking care to avoid
backing up to before the beginning. We do this by calling tellg at line 10 to see how large the
file is and use the smaller of the file size and howMany as the new value of howMany. Then at
line 13 we back up howMany characters, and finally, we can read characters with get and out-
put them with put.
Figure 9-7 Routine that prints last howMany characters from fileName
Figure 9-8 Java code to distinguish lines containing exactly two integers
a file that had only one int on each of the first two lines would be processed without error. If
we wanted to insist that every line had two and only two integers, we would need to do more
work.
In Java, we would read one line at a time into a String. Once we had the string, we
could parse it with a StringTokenizer, using code such as Figure 9-8.
This is exactly the behavior that can be implemented with an istringstream, which, like its companion ostringstream, is available by including the standard header sstream. An istringstream is constructed by passing a string as a parameter. At that point, all of the basic istream operations, including operator>> and the testing of error states, are available. Note that the error states apply to the istringstream, and not the fstream.
Figure 9-9 shows the C++ implementation of twoInts, which is only cosmetically different from the Java code. Observe, first, that fin is a reference to an istream instead of an ifstream. As with Java, it is always best to use the most generic type. Once we have the istringstream at line 6, we can do two input operations and then test the error state, skipping the line if either read failed. Since there is no equivalent to countTokens to check if there are exactly two tokens, we then attempt to read a string, which should fail. If it succeeds, we print an error message and go on to the next line. Note carefully that in this code, each iteration of the loop creates a new istringstream object (on the runtime stack), destroying the one from the previous iteration.
Figure 9-9 C++ code to distinguish lines containing exactly two integers
Declaring the istringstream inside the while loop has the advantage that, since each iteration creates a fresh istringstream, we do not have to clear the error state. The obvious disadvantage is the repeated calls to constructors and destructors. An alternative is to use the str method of istringstream to change the string that the stream reads from. Then we can declare the istringstream object once, outside the loop, without an initial string, and set its string each time around the loop. However, now we must clear the error state. Figure 9-10 shows this approach.
For ostringstream, writes can be directed to a string instead of standard output, files,
or other places. To extract the string from the ostringstream, invoke the str method. A
classic example is the conversion of any (printable) type to a string as shown in Figure 9-11.
9.8 endl
Because endl writes a newline and flushes the stream, using endl can be time-consuming when there is significant disk-bound (or network-bound) I/O. In such a case, writing the "\n" character directly can be more efficient.
On one of our machines, we copied one file to another a line at a time and measured the time spent writing. In this program, which is almost exclusively I/O, we observed that using endl was four times slower for writing. The files were approximately 2,000,000 lines, with 78,000,000 characters. When each line was written using endl, the time spent writing was approximately 40 seconds. Ending the line with "\n" reduced the time to 10 seconds. Using the character '\n' instead of the string "\n" did not affect the running time.
Needless to say, this consideration is important only if a significant portion of the running time is spent performing I/O.
9.9 Serialization
Serialization is not part of standard C++. An implementation might provide its own customized support for serialization, but objects written with such support could generally be read back only by the same implementation.
• The C++ I/O library combines inheritance and templates. The templates are used to spec-
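#include <sstream>
#include <string>
using namespace std;

// Converts any type that supports operator<< into a string.
template <typename Object>
string toString( const Object & x )
{
    ostringstream os;
    os << x;
    return os.str( );
}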
9.11 Exercises
11. Write a program that processes include directives. Since an included file may itself contain
include directives, the basic algorithm is recursive.
12. Write a method that takes the name of a file as a parameter, and reverses the contents of
the file.
CHAPTER 10
In Java, the Collections API in package java.util implements standard data structures such as lists, sets, and maps. C++ has a library that provides similar functionality, namely the Standard Template Library, which is known simply as the STL. As the name suggests, the STL makes heavy use of templates.
In this chapter, we describe how the STL is organized and cover the basic containers and
iterators, as well as a small collection of algorithms such as sorting and searching.
10.1.1 Containers
C++ defines several container templates. As in Java, some collections are unordered and others are ordered. Some collections allow duplicates; others do not. All containers support the following operations:
int size( ) const
void clear( )
bool empty( ) const
size returns the number of elements in the container; clear removes all elements from the container; empty returns true if the container contains no elements and false otherwise.
Unlike Java, there is no universal add method; different containers use different names.
Some of the container class templates are vector, deque, list, set, multiset, map,
multimap, and priority_queue.
vector is the equivalent of an ArrayList. The add operation for vector is named
push_back. vector supports operator[]. list is the equivalent of LinkedList. Its
add operation is also named push_back, but list does not support operator[]. How-
ever, list does support push_front. deque is an array-based data structure that supports
efficient indexing with operator[], and both push_front and push_back, all in con-
stant time per operation.
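The differences among the three sequence containers show up in a short sketch; the function names are hypothetical:

```cpp
#include <deque>
#include <list>
#include <vector>
using namespace std;

// vector: add at the back with push_back; index with operator[].
vector<int> buildVector( )
{
    vector<int> v;
    v.push_back( 1 );
    v.push_back( 2 );
    v[ 1 ] = 20;          // replaces the existing element at index 1
    return v;             // v: 1 20
}

// list: push_front is available, but operator[] is not.
list<int> buildList( )
{
    list<int> lst;
    lst.push_back( 2 );
    lst.push_front( 1 );
    return lst;           // lst: 1 2
}

// deque: both push_front and push_back, plus operator[].
deque<int> buildDeque( )
{
    deque<int> d;
    d.push_back( 2 );
    d.push_front( 1 );
    d.push_back( 3 );
    return d;             // d: 1 2 3
}
```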
set is the equivalent of TreeSet. The add operation for set is insert. multiset allows duplicates, whereas set does not. map is the equivalent of TreeMap. Its add operation is also insert, but one must pass the key and value in a single pair object. However, map also provides an overloaded operator[] that makes the map look just like an array. A multimap allows duplicate keys.
The STL also contains a priority_queue class template. Its add operation is named push.
Comparing these collections with Java, we see that the STL supports sets and maps that contain duplicates, as well as the priority queue, but the original STL does not support searching with hash tables. (C++11 later added hash-based containers such as unordered_set and unordered_map.)
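The add operations named above can be compared side by side in a short sketch; the function names are hypothetical:

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <utility>
using namespace std;

// set ignores a duplicate insert; multiset keeps it.
pair<size_t,size_t> setSizes( )
{
    set<string> s;
    multiset<string> ms;
    s.insert( "pear" );  s.insert( "fig" );  s.insert( "pear" );
    ms.insert( "pear" ); ms.insert( "fig" ); ms.insert( "pear" );
    return make_pair( s.size( ), ms.size( ) );   // ( 2, 3 )
}

// map: insert takes a pair; operator[] does the same job more naturally.
map<string,int> buildAges( )
{
    map<string,int> ages;
    ages.insert( pair<string,int>( "Alice", 30 ) );
    ages[ "Bob" ] = 25;
    return ages;
}

// priority_queue: push adds an item; top views the largest one.
int largestPushed( )
{
    priority_queue<int> pq;
    pq.push( 3 ); pq.push( 9 ); pq.push( 4 );
    return pq.top( );                            // 9
}
```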
10.1.2 Iterators
In Java, each container defines an internal iterator type, but exports it through the Iterator
interface type. In C++, each container defines several iterator types, and these specific iterator
types are used by the programmer instead of an abstract type.
For instance, if we have a vector<int>, the basic iterator type is
vector<int>::iterator. Another iterator type, vector<int>::const_iterator,
does not allow changes to the container on which the iterator is operating. This implies that the
basic iterator can be used to change the container.
All iterators are guaranteed to have at least the following set of operations:
++itr and itr++ advance the iterator itr to the next location. Both the prefix and postfix forms are available. This does not cause any change to the container. *itr gives access to the item at the iterator's current position, and itr1==itr2 and itr1!=itr2 test whether two iterators refer to the same position.
Some iterators support --itr and itr--. Those iterators are called bidirectional iterators. Some iterators support both itr+=k and itr+k. Those iterators are called random-access iterators. itr+=k advances the iterator k positions. itr+k returns a new iterator that is k positions ahead of itr.
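The difference between random-access and bidirectional iterators shows up in a short sketch; the function names are hypothetical, and each assumes its container is large enough:

```cpp
#include <list>
#include <vector>
using namespace std;

// Random-access iterator: += jumps directly to a position.
// Assumes v holds at least three elements.
int thirdElement( const vector<int> & v )
{
    vector<int>::const_iterator itr = v.begin( );
    itr += 2;            // now at index 2
    return *itr;         // dereference to reach the element
}

// Bidirectional iterator: -- works, but += would not compile.
// Assumes lst is not empty.
int lastOfList( const list<int> & lst )
{
    list<int>::const_iterator itr = lst.end( );
    --itr;               // step back from the endmarker to the last item
    return *itr;
}
```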
begin returns an iterator that is positioned at the first item in the container. end returns an iterator that is positioned at the endmarker, which represents a position one past the last element in the container. For instance, on an empty container, begin and end return the same position.
begin and end both make use of the fact that identical-looking methods can be over-
loaded if one is an accessor and one is a mutator. So if begin is invoked on a constant con-
tainer, we will get a const_iterator, which won’t support any changes to the container. If
begin is invoked on a mutable container, we will get an iterator, which can be used to
change the container.
Typically we initialize a local iterator to be a copy of the begin iterator and have it step through the container, stopping as soon as it reaches the endmarker. As an example, Figure 10-1 shows a print function that prints the elements of any container, provided that operator<< is defined for the container's element type. If the container is a set, its elements are output in sorted order. Figure 10-2 calls the print function on four different containers and shows the expected output (in comments). Observe that both set and multiset output in sorted order, with multiset keeping the second insertion of foo.
#include <iostream>
#include <vector>
#include <list>
#include <set>
#include <string>
using namespace std;

// print is the container-printing routine from Figure 10-1

int main( )
{
    vector<int> vec;
    vec.push_back( 3 ); vec.push_back( 4 );

    list<double> lst;
    lst.push_back( 3.14 ); lst.push_front( 6.28 );

    set<string> s;
    s.insert( "foo" ); s.insert( "bar" ); s.insert( "foo" );

    multiset<string> ms;
    ms.insert( "foo" ); ms.insert( "bar" ); ms.insert( "foo" );

    print( vec );   // 3 4
    print( lst );   // 6.28 3.14
    print( s );     // bar foo
    print( ms );    // bar foo foo

    return 0;
}
10.1.3 Pairs
If we try to print a map, the program will not compile, because the elements of a map are pairs of keys and values, and there is no operator<< for pair. If we overload operator<< for pair, then we can in fact use the print routine. Figure 10-3 illustrates the general strategy.
As expected, pair is a class template; it stores two data members, first and second, which can be accessed directly, without invoking methods. So we can easily overload operator<< to output a pair, provided that operator<< is already defined for the types of first and second.
In Figure 10-4 we create a map that stores the name of a city and its zip code, both as strings. (The zip code should not be an int, since many zip codes begin with 0, and the leading zero would be lost.) Lines 10 and 11 show that a map stores pair objects, and that the pair objects can be added by calling insert. Line 12 shows the much more natural equivalent that makes use of operator overloading. We describe maps in more detail in Section 10.7.
#include <iostream>
#include <map>
using namespace std;

template <typename Type1, typename Type2>
ostream & operator<<( ostream & out, const pair<Type1,Type2> & p )
{
    return out << "[" << p.first << "," << p.second << "]";
}
Recall that because of slicing, copying a derived class object into a base class object discards the derived-class parts. Unlike Java, the STL stores copies of objects, not simply references to the objects. Thus, heterogeneous containers storing multiple types of compatible objects should store pointers to the objects, rather than the objects themselves.
Figure 10-5 shows a vector that stores pointers to both Student and Person objects (these classes were defined in Figure 6-2 and Figure 6-4, respectively). Observe that we must again overload operator<<, because if we do not, the existing operator<< that outputs the value of the pointer (the memory address of the object it is pointing at) will be used. By providing our own version, with a pointer to the base class type as the parameter, we have a better match than the existing version, which accepts a generic void * as a parameter.
1 #include <iostream>
2 #include <map>
3 #include <string>
4 using namespace std;
5
6 int main( )
7 {
8 map<string,string> zip;
9
10 zip.insert( pair<string,string>( "Miami", "33199" ) );
11 zip.insert( pair<string,string>( "Princeton", "08544" ) );
12 zip[ "Boston" ] = "02134";
13
14 // Prints: [Boston,02134] [Miami,33199] [Princeton,08544]
15 print( zip );
16
17 return 0;
18 }
10.1.5 Constructors
All containers can be constructed from other containers, even containers of a different type. Apart from the copy constructor, which requires a container of exactly the same type, the constructors do not accept another container; instead, they accept a pair of iterators representing the first item from the other container and the first non-included item from the other container. Thus, for instance,
vector<int> clone( original.begin( ), original.end( ) );
constructs a new vector clone with the same elements as original, which can be any container holding ints.
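Because the constructors take iterators rather than a particular container type, elements can move between quite different containers. In this hypothetical helper, a list is copied into a set (which sorts and removes duplicates), and the set is then copied into a vector:

```cpp
#include <list>
#include <set>
#include <vector>
using namespace std;

// Hypothetical helper: the distinct values of original, in sorted order.
vector<int> sortedDistinct( const list<int> & original )
{
    set<int> s( original.begin( ), original.end( ) );   // sorts, drops duplicates
    return vector<int>( s.begin( ), s.end( ) );         // copies in set order
}
```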
costly than expected. Thus lists in C++ have the additional advantage of generally requiring less data movement than arrays when objects are large and a reasonable estimate of the vector capacity is not available at the start.
The basic operations that are supported by both containers are:
void push_back( const Object & x );
Object & back( );
void pop_back( );
Object & front( );
iterator insert( iterator pos, const Object & x );
iterator erase( iterator pos );
iterator erase( iterator start, iterator end );
push_back adds x to the end of the container. back returns the object at the end of the
container; an accessor is also defined that returns a constant reference. pop_back removes the
object at the end of the container. front returns the object at the front of the container.
insert adds x into the container, prior to the position given by the iterator. This is a constant
time operation for list, but not for vector or deque. insert returns an iterator represent-
ing the position of the inserted item.
Adding x to the front of c could be implemented as
c.insert( c.begin( ), x );
Adding x to the back of c could be implemented as
c.insert( c.end( ), x );
since end returns the endmarker.
The one-parameter erase removes the object at the position given by the iterator, and is
constant time for list, but not vector or deque. It returns the position of the element that
followed pos prior to the call to erase. Most importantly, this operation invalidates pos,
which is now stale. Typically pos is reset to the return value of erase. Removing the first and
last elements of container c can be done with
c.erase( c.begin( ) );
c.erase( --c.end( ) );
In the second call, observe that the return value from c.end() is an unnamed temporary whose
position represents the endmarker. Thus the -- operator changes the state of the unnamed tem-
porary to view the last item in the container. After the erase method is called, the unnamed
temporary’s destructor is invoked.
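The idiom of resetting the iterator to erase's return value is sketched here with a hypothetical removeEvens function on a list:

```cpp
#include <list>
using namespace std;

// Removes every even number from lst. erase invalidates the iterator it
// is given, so itr is reset to erase's return value instead of incremented.
void removeEvens( list<int> & lst )
{
    list<int>::iterator itr = lst.begin( );
    while( itr != lst.end( ) )
    {
        if( *itr % 2 == 0 )
            itr = lst.erase( itr );   // itr now refers to the next element
        else
            ++itr;
    }
}
```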
Two-parameter erase removes all items beginning at position start, up to but not including end. The idea of a half-open range is similar to the substring operations in java.lang.String. It means that an entire container can be erased by the call:
c.erase( c.begin( ), c.end( ) );
A possible implementation of erase is:
Figure 10-6 Awkward routine to print a container in reverse with normal iterator
resenting the last position (not the endmarker), and the beginmarker (not the first position). The
reverse iterator is reverse_iterator or const_reverse_iterator, as appropriate,
and for a reverse iterator, ++ moves toward the front while -- moves toward the rear, opposite
to normal iterator semantics.
#include <queue>
#include <iostream>
#include <list>
using namespace std;

int main( )
{
    queue<int,list<int> > q;
    q.push( 37 ); q.push( 111 );
    for( ; !q.empty( ); q.pop( ) )
        cout << q.front( ) << endl;

    return 0;
}
10.6 Sets
The set class template in C++ behaves in much the same manner as Java's TreeSet. A set does not allow duplicates, and by default, iterating over a set visits the items in sorted order, as determined by operator<. However, sets can use a function object to override the default ordering.
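class PtrToPersonLess
{
  public:
    bool operator() ( const Person *lhs, const Person *rhs ) const
      { return lhs->getSsn( ) < rhs->getSsn( ); }
};

int main( )
{
    set<Person *, PtrToPersonLess> s;

    s.insert( new Person( 987654321, "Bob" ) );
    s.insert( new Student( 123456789, "Jane", 4.0 ) );

    print( s );

    // The set stores only pointers, so reclaim the objects explicitly
    // (assumes Person declares a virtual destructor)
    set<Person *, PtrToPersonLess>::iterator itr;
    for( itr = s.begin( ); itr != s.end( ); ++itr )
        delete *itr;

    return 0;
}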
lower_bound returns an iterator to the first element in the set with a key that is greater
than or equal to x. upper_bound returns an iterator to the first element in the set with a key
that is greater than x. equal_range returns a pair of iterators representing lower_bound
and upper_bound. These routines are typically most useful in multisets.
10.6.3 multisets
A multiset is like a set except that duplicates are allowed. The return type of insert is
modified to indicate that the insert always succeeds. As a result, we no longer need a pair,
but can simply return an iterator representing the newly inserted x.
iterator insert( const Object & x );
iterator insert( iterator hint, const Object & x );
For the multiset, the erase member function that takes an Object x removes all
occurrences of x. To simply remove one occurrence, use the erase member function that takes
an iterator. To find all occurrences of x, we cannot simply call find; it returns an iterator referencing one occurrence (if there is one), but which specific occurrence is returned is not guaranteed. Instead, the range delimited by lower_bound and upper_bound (with upper_bound not included) contains all occurrences of x; typically this range is obtained by a call to equal_range.
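A short sketch of these multiset operations follows; the helper names occurrences and eraseOne are hypothetical:

```cpp
#include <set>
#include <string>
#include <utility>
using namespace std;

typedef multiset<string>::const_iterator MsItr;

// Counts the occurrences of x by walking the half-open range
// [ lower_bound( x ), upper_bound( x ) ) returned by equal_range.
int occurrences( const multiset<string> & ms, const string & x )
{
    pair<MsItr,MsItr> range = ms.equal_range( x );
    int n = 0;
    for( MsItr itr = range.first; itr != range.second; ++itr )
        ++n;
    return n;
}

// Removes one occurrence of x; erase( x ) would remove all of them.
void eraseOne( multiset<string> & ms, const string & x )
{
    multiset<string>::iterator itr = ms.find( x );
    if( itr != ms.end( ) )
        ms.erase( itr );
}
```

In practice, the count member function gives the same answer as occurrences directly; the loop is shown to make the range explicit.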
1 map<string,double> salaries;
2
3 salaries[ "Pat" ] = 75000.00;
4 cout << salaries[ "Pat" ] << endl;
5 cout << salaries[ "Jan" ] << endl;
6
7 map<string,double>::const_iterator itr;
8 itr = salaries.find( "Chris" );
9 if( itr == salaries.end( ) )
10 cout << "Not an employee of this company!" << endl;
11 else
12 cout << itr->second << endl;
function object. Recall from Section 7.6.3 that the function object idiom in C++ is implemented
by providing a class that contains an overloaded operator(), and then instantiating a tem-
plate with the class name as a template parameter. Figure 10-10 illustrates the idiom, in which
the code seen earlier in Figure 10-5 is adapted to use a set instead of a vector.
10.7 Maps
As we have already seen, a map behaves like a set instantiated with a pair representing a key
and value, with a comparison function that refers only to the key. Thus it supports all of the set
operations, including insert, but as we saw in Figure 10-4, we must insert a properly instanti-
ated pair. The find operation for maps requires only a key, but the iterator that it returns ref-
erences a pair. Similarly, erase requires only a key, and otherwise behaves like the set’s
erase.
Most importantly, the map overloads the array indexing operator[]:
ValueType & operator[] ( const KeyType & key )
The semantics of operator[] are as follows. If the key is present in the map, a refer-
ence to the value is returned. If the key is not present in the map, it is inserted with a default
value into the map, and then a reference to the inserted default value is returned. The default
value is obtained by applying a zero-parameter constructor, or is zero for the primitive types.
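These semantics do not allow an accessor version of operator[], and so operator[] cannot be used on a map that is constant. For instance, if a map is passed by constant reference, operator[] is unusable inside the routine. Casting away constness is one possible workaround, but it is risky: if the key is absent, operator[] inserts into a map that the caller expected to remain unchanged. The find member function, illustrated next, avoids this problem.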
The code snippet in Figure 10-11 illustrates two techniques to access items in a map. First observe that at line 3, the left-hand side invokes operator[], thus inserting "Pat" and a double of value 0 into the map and returning a reference to that double. Then the assignment changes that double, inside the map, to 75000. Line 4 outputs 75000. Unfortunately, line 5 inserts "Jan" and a salary of 0.0 into the map, and then prints it. This may or may not be the proper thing to do, depending on the application. If it is important to distinguish between items that are in the map and items that are not, or if it is important not to insert into the map (because it is immutable), then the alternative approach shown at lines 7 to 12 can be used. There we see a call to find. If the key is not found, the returned iterator is the endmarker, which can be tested. If the key is found, we can access the second member of the pair referenced by the iterator, which is the value for the key. We could change itr->second if itr were declared as an iterator instead of a const_iterator.
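The find technique also works when the map is passed by constant reference, where operator[] is unavailable. A sketch, in which the function salaryOf and its defaultValue parameter are hypothetical:

```cpp
#include <map>
#include <string>
using namespace std;

// Looks up a salary without modifying the map: a missing key yields
// defaultValue instead of inserting a new entry.
double salaryOf( const map<string,double> & salaries,
                 const string & name, double defaultValue )
{
    map<string,double>::const_iterator itr = salaries.find( name );
    if( itr == salaries.end( ) )
        return defaultValue;
    return itr->second;   // second member of the pair is the value
}
```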
10.7.1 Multimaps
A multimap is a map in which duplicate keys are allowed. In Java, the effect of a multimap is
achieved by using a map whose values are Lists.
Otherwise, multimaps behave like maps but do not support operator[].
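Because there is no operator[], entries are added to a multimap with insert, and equal_range is the natural way to retrieve every value stored under one key. A sketch with a hypothetical lookupAll helper:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>
using namespace std;

typedef multimap<string,string>::const_iterator MmItr;

// Hypothetical helper: every value stored under key, in iteration order.
vector<string> lookupAll( const multimap<string,string> & mm, const string & key )
{
    vector<string> result;
    pair<MmItr,MmItr> range = mm.equal_range( key );
    for( MmItr itr = range.first; itr != range.second; ++itr )
        result.push_back( itr->second );
    return result;
}
```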
#include <iostream>
#include <fstream>
#include <sstream>
#include <map>
#include <string>
#include <vector>
#include <iomanip>
using namespace std;

typedef vector<int> LList;

// The elements of map<string,LList> have type pair<const string,LList>;
// matching that type exactly avoids copying each entry into a temporary.
ostream & operator<<( ostream & out,
                      const pair<const string,LList> & rhs )
{
    out << left << setw( 20 ) << rhs.first;
    print( rhs.second, out );   // Figure 10-1
    return out;
}

void printConcordance( istream & in, ostream & out )
{
    string oneLine;
    map<string,LList> wordMap;

    // Read the words; add them to wordMap
    for( int lineNum = 1; getline( in, oneLine ); lineNum++ )
    {
        istringstream st( oneLine );
        string word;

        while( st >> word )
            wordMap[ word ].push_back( lineNum );
    }

    map<string,LList>::iterator itr;
    for( itr = wordMap.begin( ); itr != wordMap.end( ); ++itr )
        out << *itr << endl;
}
First, we print the word, which is the first data member, and then we output the list of line numbers, which is the second data member.
Sometimes priority queues are set up to remove and access the smallest item instead of the
largest item. In such a case, the priority queue can be instantiated with an appropriate greater
function object to override the default ordering.
The priority queue template is instantiated with an item type, the container type (as in stack and queue), and the comparator, with defaults allowed for the last two parameters. In Figure 10-13, line 29 shows the default instantiation of priority_queue, which allows access to the largest item, while line 30 shows an instantiation that allows access to the smallest item.
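A sketch of both instantiations follows (the function extremes is hypothetical): the default keeps the largest item on top, while supplying vector<int> as the container and greater<int> as the comparator keeps the smallest item on top.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>
using namespace std;

// Hypothetical helper: returns ( largest, smallest ) of items, read from
// the tops of two priority queues. Assumes items is not empty.
pair<int,int> extremes( const vector<int> & items )
{
    priority_queue<int> maxPq;                              // largest on top
    priority_queue<int, vector<int>, greater<int> > minPq;  // smallest on top

    for( vector<int>::size_type i = 0; i < items.size( ); i++ )
    {
        maxPq.push( items[ i ] );
        minPq.push( items[ i ] );
    }
    return make_pair( maxPq.top( ), minPq.top( ) );
}
```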
10.10.1 Sorting
Sorting in C++ is accomplished by use of function template sort. The parameters to sort rep-
resent the start and endmarker of a (range in a) container, and an optional comparator:
void sort( Iterator begin, Iterator end );
void sort( Iterator begin, Iterator end, Comparator cmp );
The iterators must support random access. The sort algorithm does not guarantee that equal
items retain their original order. For that, we can use stable_sort instead of sort.
As an example, in
sort( v.begin( ), v.end( ) );
sort( v.begin( ), v.end( ), greater<int>( ) );
sort( v.begin( ), v.begin( ) + ( v.end( ) - v.begin( ) ) / 2 );
the first call sorts the entire container, v, in non-decreasing order. The second call sorts the entire container in non-increasing order. The third call sorts the first half of the container in non-decreasing order. Note that (v.begin()+v.end())/2 is not allowed, because two iterators cannot be added together; instead we can compute the distance between them (end minus begin), halve it, and add the result to the begin iterator.
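The second and third calls, together with stable_sort, can be packaged as small functions; the names sortFirstHalf, sortDescending, and sortByLength, and the comparator class ShorterString, are hypothetical:

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>
using namespace std;

// Sorts the first half of v (rounded down) in non-decreasing order.
void sortFirstHalf( vector<int> & v )
{
    vector<int>::iterator mid = v.begin( ) + ( v.end( ) - v.begin( ) ) / 2;
    sort( v.begin( ), mid );
}

// Sorts all of v in non-increasing order.
void sortDescending( vector<int> & v )
{
    sort( v.begin( ), v.end( ), greater<int>( ) );
}

// Hypothetical comparator: orders strings by length only.
class ShorterString
{
  public:
    bool operator() ( const string & lhs, const string & rhs ) const
      { return lhs.length( ) < rhs.length( ); }
};

// stable_sort keeps strings of equal length in their original order.
void sortByLength( vector<string> & words )
{
    stable_sort( words.begin( ), words.end( ), ShorterString( ) );
}
```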
The sorting algorithm is generally quicksort, which yields an O( N log N ) algorithm on average. C++98 guaranteed only O( N log N ) average-case performance; since C++11, the standard requires O( N log N ) comparisons in the worst case. In addition to sorting, there are also algorithms for selection, shuffling, partitioning, partial sorting, reversing, rotating, and merging.
10.10.2 Searching
Several generic searching algorithms are available for containers. The two most basic are:
Iterator find( Iterator begin, Iterator end, const Object & x );
Iterator find_if( Iterator begin, Iterator end, Predicate pred );
find returns an iterator representing the first occurrence of x in the range specified by begin
and end, or end if x is not found. find_if returns an iterator representing the first occur-
rence of an object for which the function object pred would return true, or end if no match is
found.
For instance, suppose we want to find the first occurrence of a string of length exactly 9 in
a vector<string>. First, we define a function object that expresses this condition:
class StringLengthComp
{
public:
bool operator() ( const string & s ) const
{ return s.length( ) == 9; }
};
However, for additional code reuse, a class template might be better:
template <int len>
class StringLength
{
public:
bool operator() ( const string & s ) const
{ return s.length( ) == len; }
};
Then, at the end of the following code fragment, in which v is of type vector<string> and itr is of type vector<string>::const_iterator, itr will either be v.end() or be located at a string of length 9:
itr = find_if( v.begin( ), v.end( ), StringLength<9>( ) );
Conceptually, find behaves like find_if with a predicate that tests for equality with x (for instance, an equal_to function object with its second parameter bound to x by a unary adapter).
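Putting the pieces together gives a complete sketch. The wrapper firstOfLengthNine, and its convention of returning an empty string when nothing matches, are hypothetical; the cast in StringLength only avoids a signed/unsigned comparison warning.

```cpp
#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// The class template from the text: true for strings of length len.
template <int len>
class StringLength
{
  public:
    bool operator() ( const string & s ) const
      { return s.length( ) == static_cast<string::size_type>( len ); }
};

// Returns the first string of length 9, or "" if there is none.
string firstOfLengthNine( const vector<string> & v )
{
    vector<string>::const_iterator itr =
        find_if( v.begin( ), v.end( ), StringLength<9>( ) );
    return itr == v.end( ) ? string( "" ) : *itr;
}
```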
A host of other generic algorithms are available. We list a few of the common ones.
binary_search is used to search a sorted range for an object. A comparator can be provided, or the default ordering can be used. equal_range, lower_bound, and upper_bound search sorted ranges and behave with the same semantics as the identically named member functions in set. min_element can be used to find the smallest item in a range, and can be invoked with or without a comparator. count returns the number of occurrences of an object in a range delimited by a pair of iterators. count_if returns the number of objects in a range delimited by a pair of iterators for which a predicate returns true.
adjacent_find returns an iterator referring to the first element such that a predicate returns true when applied to the element and its successor. The default predicate is equal_to, in which case adjacent_find finds the first occurrence of an element that equals the next element.
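Several of these algorithms are combined in the sketch below; the Summary structure, the IsNegative predicate, and the two functions are hypothetical:

```cpp
#include <algorithm>
#include <vector>
using namespace std;

// Hypothetical predicate for count_if: true for negative numbers.
class IsNegative
{
  public:
    bool operator() ( int x ) const
      { return x < 0; }
};

struct Summary
{
    int zeros;         // from count
    int negatives;     // from count_if
    int smallest;      // from min_element
    int firstRepeat;   // index found by adjacent_find, or -1
};

// Assumes v is not empty (min_element would return v.end( ) otherwise).
Summary summarize( const vector<int> & v )
{
    Summary s;
    s.zeros = static_cast<int>( count( v.begin( ), v.end( ), 0 ) );
    s.negatives = static_cast<int>( count_if( v.begin( ), v.end( ), IsNegative( ) ) );
    s.smallest = *min_element( v.begin( ), v.end( ) );

    vector<int>::const_iterator itr = adjacent_find( v.begin( ), v.end( ) );
    s.firstRepeat = ( itr == v.end( ) ) ? -1 : static_cast<int>( itr - v.begin( ) );
    return s;
}

// binary_search needs a sorted range, so v is taken by value and sorted.
bool sortedContains( vector<int> v, int x )
{
    sort( v.begin( ), v.end( ) );
    return binary_search( v.begin( ), v.end( ), x );
}
```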
find_first_of takes four iterators representing two sequences (the first sequence and the second sequence). It returns an iterator representing the first element of the first sequence that matches any of the elements in the second sequence. For instance,
vector<int> wins; // store winning numbers
wins.push_back( 37337 );
wins.push_back( 46521 );
wins.push_back( 53810 );
vector<int> myNumbers;
... // populate myNumbers
vector<int>::const_iterator itr;
itr = find_first_of( myNumbers.begin( ), myNumbers.end( ),
wins.begin( ), wins.end( ) );
searches the vector myNumbers for any of the numbers in wins and returns an iterator rep-
resenting the first occurrence of such a number in myNumbers.
A fifth parameter, a predicate, can be used to decide if an item in the second sequence is a
match for an item in the first sequence. Of course, the default is equal_to.
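The lottery fragment can be made self-contained as follows; the function firstWinner, and its use of -1 to signal that nothing matched, are hypothetical:

```cpp
#include <algorithm>
#include <vector>
using namespace std;

// Returns the first of myNumbers that is also a winning number, or -1 if
// none matches (assumes -1 is never a valid ticket number).
int firstWinner( const vector<int> & myNumbers, const vector<int> & wins )
{
    vector<int>::const_iterator itr =
        find_first_of( myNumbers.begin( ), myNumbers.end( ),
                       wins.begin( ), wins.end( ) );
    return itr == myNumbers.end( ) ? -1 : *itr;
}
```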
10.10.4 Copying
There are several algorithms that deal with copying. These include copy, copy_backward, remove, remove_copy, remove_if, remove_copy_if, replace, replace_copy, replace_if, replace_copy_if, unique, and unique_copy. Algorithms that have both copy and non-copy versions differ in whether they change the original sequence, or leave the original unchanged and produce a new sequence. A typical routine is copy:
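A sketch of copy and of one copy variant follows. It writes through back_inserter, one of the inserter adapters (declared in the standard header iterator), so the destination vector can start out empty; the function names are hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>
using namespace std;

// copy writes through an output iterator; back_inserter turns each
// assignment into a push_back, so dest may start out empty.
vector<int> copyAll( const vector<int> & src )
{
    vector<int> dest;
    copy( src.begin( ), src.end( ), back_inserter( dest ) );
    return dest;
}

// remove_copy leaves src unchanged and copies every item except value.
vector<int> copyWithout( const vector<int> & src, int value )
{
    vector<int> dest;
    remove_copy( src.begin( ), src.end( ), back_inserter( dest ), value );
    return dest;
}
```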
10.11 BitSets
As in Java, C++ has bitwise operators that can manipulate a set of bits stored in a primitive type, as well as a bitset class template (the counterpart of Java's BitSet class).
The class template, which is declared in the standard header bitset, is instantiated with the number of bits to be stored (this must be a compile-time constant), and indexing starts at 0. Bits can be accessed with test or, alternatively, with the array indexing operator[]. set and reset can be used to turn a particular bit on or off; with no parameters, these methods affect all bits. Alternatively, operator[] can be used on the left-hand side of an assignment. The standard bitwise operators are also overloaded, allowing bitwise operations on bitset types. The bitsets involved in those operations must have identical sizes.
The bitset has a look and feel that is similar to both vector and map, but it can be expected to be more efficient than vector, set, or map. However, a set or map could be more space-efficient when there are many bits but only a few are ever turned on.
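The operations just described can be sketched with a small hypothetical function bitsetDemo:

```cpp
#include <bitset>
using namespace std;

// Turns bits on and off with set, reset, and operator[], then combines
// two bitsets of the same size with the bitwise & operator.
bitset<8> bitsetDemo( )
{
    bitset<8> a;          // all 8 bits start off
    a.set( 1 );           // bit 1 on
    a.set( 3 );           // bit 3 on
    a[ 5 ] = 1;           // operator[] on the left-hand side: bit 5 on
    a.reset( 3 );         // bit 3 off again

    bitset<8> b;
    b.set( );             // no parameter: every bit on
    b.reset( 1 );         // bit 1 off

    return a & b;         // only bit 5 is on in both
}
```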
10.12 Key Points
• Standard STL containers include the sequence containers: vector, list, and deque,
and also set, multiset, map, multimap, and priority_queue. multisets
allow duplicates and multimaps allow duplicate keys.
• maps store keys and values. operator[] returns the value associated with a key, and if
the key is not present, it is inserted with a default value, that is then returned.
• Containers can be accessed by iterators, which are more powerful than their Java counter-
parts.
• Significant compile-time type checking is performed by the STL.
• Little runtime error checking is performed by the STL.
• There are several general types of iterators: forward iterators, bidirectional iterators, ran-
dom access iterators, and stream iterators are the most common. Additionally, there are
const_iterators, reverse_iterators, and const_reverse_iterators.
• Iterators use operator overloading extensively. The common operators are ++, *, =, ==, and !=. Bidirectional iterators allow --. Random access iterators allow +, -, +=, and -=.
• Stream iterators allow an input or output stream to be accessed through the iterator interface, so that generic algorithms can read from or write to streams.
• Each container has a begin and end member function that yields iterators that represent
the beginning of the container and the endmarker of the container. There are both accessor
and mutator versions of begin and end.
• pair is a class template that stores the first item and second item as public data. The
pair is used in the map, which is a set of pairs, and also in the return type of some set
member functions.
• If a standard container is storing a heterogeneous collection, it should store pointers to the
objects.
• Six comparison function objects are defined as class templates in the standard header functional. These are less, greater, equal_to, not_equal_to, greater_equal, and less_equal.
• The unary adapters allow the conversion of the standard binary predicates to unary predi-
cates, by supplying one of the parameters to the binary predicate.
• The inserter adapters allow copying into empty containers by converting the assignment
operator of an iterator into an insertion operation on the container.
• The standard library includes over 60 function templates for sorting, searching, copying,
and many other generic algorithms.
• C++ has a bitset class template. To use it, the number of bits must be known at compile
time. If this is not possible, alternatives such as vector<bool>, set<int> (contain-
ing only the true bits), or map<int,bool> can be used, but might not be as fast.
10.13 Exercises
1. How does the STL differ from the Java Collections API?
2. Describe the functionality of iterators in C++.
3. What are the different types of iterators?
4. What does end return?
5. What kind of error checks are performed by STL routines?
6. What is a const_iterator and how is it used?
7. What is a reverse iterator and how is it used?
8. What is a stream iterator and how is it used?
9. What is the difference between a set and a multiset?
10. Why is there no operator[] accessor for map?
11. What is a unary binder adapter?
12. What is an inserter adapter?
13. Describe the general categories of STL algorithms and give an example of an algorithm in each category.
14. In Exercise 6.19, make two modifications. First, in the Employee class, add
operator< that orders employees by name. Then change the implementation of Roster
to use a multiset of Employee * (ordered by name). Part of the multiset tem-
plate instantiation includes an appropriate function object. The multiset print routine
should output employees in sorted order (by name).
15. Implement a spelling checker. Prompt the user for the name of a file that stores a dictio-
nary of words. Then prompt for the name of a file that you want to spell-check. Any word
that is not in the dictionary is considered to be misspelled. Output, in sorted order, each
misspelled word and the line number(s) on which it occurs. If a word is misspelled more
than once, it is listed once, but with several line numbers. Of course you should verify that
files open correctly. Use the following rule to determine what a word is: The input is con-
sidered to be a sequence of tokens separated by whitespace. Any token that ends with a
single period, question mark, comma, semicolon, or colon should have the punctuation
removed. After doing this, any token that contains letters only is considered a word. Con-
vert this word to lower case.
16. Implement the sort template that takes a pair of iterators and a comparator, using any
simple sorting algorithm. Then implement the sort template that takes a pair of iterators.
In order to do this, and reuse the three-parameter sort, you will need to define a phantom
four-parameter sort template that takes an object of the type to be sorted as the fourth
parameter. The two-parameter sort will invoke the four-parameter sort, which in turn
will invoke a three-parameter sort, with less<Object>() as the third parameter.
CHAPTER 11
In a perfect world, all array and string manipulations in
C++ would be done with the vector class template and the string class that is part of the
standard library. But life is not perfect. Many programs written prior to the standardization of
vector and string make use of primitive arrays and strings, and some library routines inter-
face with primitive arrays and strings instead of the library classes. Parameters to main, for
instance, are a primitive array of primitive strings.
In this chapter, we discuss primitive arrays and strings in C++. We will see the relationship
between the primitive array and pointer variables, and how this relationship influences the
design of the STL. We will also discuss command-line arguments, and briefly mention multidi-
mensional arrays.
features of arrays and pointers and discuss why these restrictions come into play.
...
Figure 11-1 Memory model for arrays (assumes 4 byte int); declaration is int a[3]; int i;
Now that we have seen how arrays are manipulated in C++, we can see why some of the
limitations discussed earlier occur, and we can also see how arrays are passed as function param-
eters. First we have the problem of checking that the index is in range. Performing the bounds
check would require that we store the array size in an additional parameter. Certainly this is fea-
sible, but it does incur both time and space overhead. In a common application of arrays (short
strings), the overhead could be significant. As we have mentioned in Section 2.2.1 and illus-
trated in Section 8.1, the lack of range checking can cause serious problems such as off-by-one
errors in array indexing that can lead to bugs that are very difficult to spot. (If index range
checking is crucial, use the vector’s at member function).
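Here is a minimal sketch of the alternative mentioned above, assuming a vector<int>. The at member function checks its index and throws out_of_range, whereas operator[] (like a primitive array) performs no check. The helper name atThrows is hypothetical.

```cpp
#include <stdexcept>
#include <vector>
using namespace std;

// Returns true if v.at( index ) throws out_of_range, that is, if index
// is outside 0 .. v.size( ) - 1. operator[] would perform no such check.
bool atThrows( const vector<int> & v, vector<int>::size_type index )
{
    try
    {
        v.at( index );          // bounds-checked access; value unused
        return false;
    }
    catch( const out_of_range & )
    {
        return true;
    }
}
```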
The second limitation of the basic array (which is solved by vector) concerns array copying. Suppose
that a and b are arrays of the same type. In many languages, if the arrays are also the same size,
the statement a=b would perform an element-by-element copy of the array b into the array a. In
C++ this statement is illegal because a and b represent constant pointers to the start of their
respective arrays, specifically &a[0] and &b[0]. Then a=b is an attempt to change where a
points, rather than copying the contents of array b into array a. What makes the statement illegal,
rather than legal but wrong, is that a cannot be reassigned to point somewhere else because
it is essentially a constant object. The only way to copy one array into another is element by
element; the language provides no shorthand. A similar argument shows that the expression a==b
does not test whether each element of a matches the corresponding element of b. This expression
is legal, but it evaluates to true if and only if a and b represent the same memory
location (that is, they refer to the same array).
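Because neither = nor == does the element-wise work, the loops must be written out by hand. They can also be delegated to the standard algorithms copy and equal, which accept pointers as iterators. The function names and the explicit count parameter n are illustrative assumptions.

```cpp
#include <algorithm>
using namespace std;

// Copy the first n elements of source into target, one element at a
// time; this is what a=b would do if C++ allowed array assignment.
void copyArray( int target[ ], const int source[ ], int n )
{
    copy( source, source + n, target );
}

// True if the first n elements match. Unlike a==b, which compares
// the arrays' addresses, this compares their contents.
bool sameContents( const int a[ ], const int b[ ], int n )
{
    return equal( a, a + n, b );
}
```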
Finally, an array can be used as a parameter to a function, and the rules follow logically
from our understanding that an array name is little more than a pointer. Suppose we have a func-
tion functionCall that accepts one array of int as its parameter. The caller/callee views are
functionCall( actualArray ); // Function Call
void functionCall( int formalArray[ ] ) // Function Declaration
Note that in the function declaration, the brackets serve only as a type declaration, in the same
way that int does. Also note that the [] must follow the formal parameter, unlike Java, where it can
either follow or precede the formal parameter. In the function call, only the name of the array is
passed; there are no brackets. In accordance with the call-by-value conventions of C++, the
value of actualArray is copied into formalArray. Because actualArray represents
the memory location where the entire array actualArray is stored, formalArray[i]
accesses actualArray[i]. This means that the variables represented by the indexed array
are modifiable. Thus an array, when considered as an aggregate, is passed by reference. Further-
more, any size component in the formalArray declaration is ignored, and the size of the
actual array is unknown. If the size is needed, it must be passed as an additional parameter.
Passing the aggregate by reference means that functionCall can change elements
in the array. We can use the const qualifier to attempt to disallow this (but this technique
is not foolproof because of the ability to cast away const-ness):
void functionCall( const int formalArray[ ] );
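The following sketch shows both halves of that rule using two hypothetical functions. doubleAll modifies the caller's array through its parameter. sumOf takes a const array and so cannot modify it. In both cases the size must be passed as a separate parameter, because the array's size is not passed.

```cpp
// The parameter is really a pointer to the caller's first element,
// so assignments through it change the caller's array.
void doubleAll( int formalArray[ ], int n )
{
    for( int i = 0; i < n; i++ )
        formalArray[ i ] *= 2;
}

// With const, the elements can be read but not assigned;
// formalArray[ i ] = 0 here would be a compile-time error.
int sumOf( const int formalArray[ ], int n )
{
    int total = 0;
    for( int i = 0; i < n; i++ )
        total += formalArray[ i ];
    return total;
}
```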
The [] is absolutely necessary here to ensure that all of the objects in the allocated array
have their destructors called prior to reclaiming of the memory for array a2. Applying plain delete
to memory obtained with new[] is undefined behavior. In practice, it is possible that only a2[0]’s
destructor is called, that the remaining items in the array do not have their destructors called,
and that a2’s memory is not reclaimed, which is hardly what we intend.
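One way to observe the effect of delete [ ] is to count destructor calls. In this sketch, Tracker is a hypothetical class whose destructor increments a counter, and delete [ ] runs that destructor once for every element allocated by new [ ]. The plain delete version is deliberately not shown, since its behavior is undefined.

```cpp
// Hypothetical helper class: counts how many destructors have run.
class Tracker
{
  public:
    static int destroyed;
    ~Tracker( ) { destroyed++; }
};

int Tracker::destroyed = 0;

// Allocate n Trackers with new [ ], release them with delete [ ],
// and report how many destructors were called.
int destroyArrayOf( int n )
{
    Tracker::destroyed = 0;
    Tracker *a2 = new Tracker[ n ];
    delete [ ] a2;                  // destroys all n elements, then frees
    return Tracker::destroyed;
}
```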
1 void f( int i )
2 {
3 int a1[ 10 ];
4 int *a2 = new int [ 10 ];
5
6 ...
7 g( a1 );
8 g( a2 );
9
10 // On return, all memory associated with a1 is freed
11 // On return, only the pointer a2 is freed;
12 // 10 ints have leaked
13 // delete [ ] a2; // This would fix the leak
14 }
With new and delete we have to manage the memory ourselves rather than allow the compiler
to do it for us. Why, then, would we be interested in this? The answer is that by managing
memory ourselves, we can build expanding arrays. Suppose, for example, that in Figure 11-2 we
decide, after the declarations but before the calls to g at lines 7 and 8, that we really wanted 12
ints instead of 10. In the case of a1 we are stuck, and the call at line 7 cannot work. However,
with a2 we have an alternative, as illustrated by the following maneuver:
int *original = a2; // 1. Save pointer to the original
a2 = new int [ 12 ]; // 2. Have a2 point at more memory
for( int i = 0; i < 10; i++ ) // 3. Copy the old data over
a2[ i ] = original[ i ];
delete [ ] original; // 4. Recycle the original array
Figure 11-4 shows the changes that result. A moment’s thought will convince you that this is an
expensive operation, because we copy all of the elements from original to a2. If, for
instance, this array expansion is in response to reading input, it would be inefficient to re-expand
every time we read a few elements. Thus when array expansion is implemented, we always
make it some multiplicative constant times as large. For instance, we might expand to make it
twice as large. In this way, when we expand the array from N items to 2N items, the cost of the N
copies can be amortized over the next N items that can be inserted into the array without an
expansion.
Figure 11-4 Array expansion: (a) starting point: a2 points at 10 integers; (b) after step 1:
original points at the 10 integers; (c) after steps 2 and 3: a2 points at 12 integers,
the first 10 of which are copied from original; (d) after step 4: the 10 integers are freed
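The following sketch checks the amortization argument by counting element copies. It assumes items arrive one at a time and reuses the arraySize * 2 + 1 growth rule from Figure 11-5. For any N, the total number of copies stays below 2N, so each insertion costs a constant number of copies on average. The function name copiesForInserts is hypothetical.

```cpp
#include <cstddef>
using namespace std;

// Insert n items one at a time, expanding with the
// arraySize * 2 + 1 rule whenever the array is full,
// and return the total number of element copies performed.
long copiesForInserts( int n )
{
    int arraySize = 0, itemsRead = 0;
    long copies = 0;
    int *array = NULL;

    for( int k = 0; k < n; k++ )
    {
        if( itemsRead == arraySize )        // full: expand
        {
            int *original = array;
            array = new int[ arraySize * 2 + 1 ];
            for( int i = 0; i < arraySize; i++ )
            {
                array[ i ] = original[ i ];
                copies++;
            }
            delete [ ] original;            // safe when original is NULL
            arraySize = arraySize * 2 + 1;
        }
        array[ itemsRead++ ] = k;
    }
    delete [ ] array;
    return copies;
}
```

For example, ten insertions trigger expansions at sizes 0, 1, 3, and 7, for 0 + 1 + 3 + 7 = 11 copies.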
1 #include <iostream>
2 #include <cstdlib>
3 using namespace std;
4
5 // Read an unlimited number of ints with no attempts at error
6 // recovery; return a pointer to the data, and set itemsRead
7 int * getInts( int & itemsRead )
8 {
9 int arraySize = 0;
10 int inputVal;
11 int *array = NULL; // Initialize to NULL pointer
12
13 itemsRead = 0;
14 cout << "Enter any number of integers: ";
15 while( cin >> inputVal )
16 {
17 if( itemsRead == arraySize )
18 { // Array doubling code
19 int *original = array;
20 array = new int[ arraySize * 2 + 1 ];
21 for( int i = 0; i < arraySize; i++ )
22 array[ i ] = original[ i ];
23 delete [ ] original; // Safe if original is NULL
24 arraySize = arraySize * 2 + 1;
25 }
26 array[ itemsRead++ ] = inputVal;
27 }
28 return array;
29 }
30
31 int main( )
32 {
33 int *array;
34 int numItems;
35
36 array = getInts( numItems );
37 for( int i = 0; i < numItems; i++ )
38 cout << array[ i ] << endl;
39 delete [ ] array; // Release the memory allocated by getInts
40 return 0;
41 }
Figure 11-5 Code to read an unlimited number of ints and write them out
To make things more concrete, Figure 11-5 shows a program that reads an unlimited num-
ber of integers from the standard input and stores the result in a dynamically expanding array.
The function declaration for getInts tells us that it returns the address where the array will
reside, and it sets a reference parameter itemsRead to indicate how many items were actually
read.
At the start of getInts, itemsRead is set to 0, as is the initial arraySize. We
repeatedly read new items at line 15. If the array is full, as indicated by a successful test at line
17, then the array is expanded. Lines 19 to 23 perform the array doubling. At line 19 we save a
pointer to the currently allocated block of memory. We have to remember that the first time
through the loop, the pointer will be NULL. At line 20 we allocate a new block of memory,
roughly twice the size of the old. We add one so that the initial doubling converts a zero-sized
array to an array of size one. At line 24 we set the new array size. At line 26, the actual input
item is assigned to the array, and the number of items read is incremented. When the input fails
(for whatever reason), we merely return the pointer to the dynamically allocated memory. Note
carefully that
The main routine calls getInts, assigning the return value to a pointer.
As we can see, this is lots of work. That’s why modern C++ programmers use vector.
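For comparison, here is a minimal vector-based sketch of getInts. It assumes the numbers come from any istream (rather than always cin) so that it is easy to test. push_back performs the doubling internally, and the vector releases its own memory. The wrapper getIntsFrom is a hypothetical convenience.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Read ints until input fails; no size parameter and no delete [ ].
vector<int> getInts( istream & in )
{
    vector<int> items;
    int inputVal;
    while( in >> inputVal )
        items.push_back( inputVal );   // grows (and copies) as needed
    return items;
}

// Read the ints contained in a string, via an istringstream.
vector<int> getIntsFrom( const string & text )
{
    istringstream in( text );
    return getInts( in );
}
```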
C++, as described so far, provides library routines for strings but no language support. In
fact, the only language support is provided by a string constant. A string constant provides a
shorthand mechanism for specifying a sequence of characters. It automatically includes the null
terminator as an invisible last character. Any character (specified with an escape sequence if
necessary) may appear in the string constant. Thus "Nina" represents a five-character array.
Additionally, a string constant can be used as an initializer for a character array. Thus:
char name1[ ] = "Nina"; // name1 is an array of five char
char name2[ 9 ] = "Nina"; // name2 is an array of nine char
char name3[ 4 ] = "Nina"; // ILLEGAL in C++: no room for the null terminator
In the first case the size of the array allocated for name1 is determined implicitly, while in the
second case we have over-allocated (which is necessary if we intend later to copy a longer string
into name2). The third case is wrong because we have not allocated enough memory for the
null terminator, and a C++ compiler rejects it. Initialization by a string constant is a special
exception; we cannot say
char name4[ 8 ] = name1; // ILLEGAL!
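The sizes can be confirmed with sizeof, which measures the whole array, and strlen, which counts characters before the null terminator. This small sketch repeats the two legal declarations above. NAME1_SIZE, NAME2_SIZE, and lengthOf are illustrative names.

```cpp
#include <cstring>
using namespace std;

char name1[ ] = "Nina";     // size deduced: 4 letters + '\0' = 5 char
char name2[ 9 ] = "Nina";   // over-allocated; unused elements are '\0'

// sizeof measures the whole array in bytes (char is one byte).
const size_t NAME1_SIZE = sizeof( name1 );
const size_t NAME2_SIZE = sizeof( name2 );

// strlen counts characters up to, not including, the terminator.
size_t lengthOf( const char *s )
{
    return strlen( s );
}
```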
A string constant can be used anywhere a constant string can be used, but not where a modifiable
string is required. For instance, it may be used as the second parameter to strcpy but not as the first parameter. This
is because the declaration for strcpy does not disallow the possibility that the first parameter
might be altered (indeed, we know that it will). Because a string constant can be stored in read-
only memory, allowing it to be used as a target of strcpy could result in a hardware error. Note
carefully that we can always send a nonconstant string to a parameter that expects a constant
string. Thus we have
strcpy( name2, "Mark" ); // LEGAL
strcpy( "Mark", name2 ); // ILLEGAL!
strcpy( name2, name1 ); // LEGAL
The declarations for the string routines indicate that the parameters are pointers. This fol-
lows from the fact that the name of an array is a pointer. The second parameter to strcpy is a
constant string, meaning that any string can be passed and it is guaranteed to be unchanged. The
first parameter is a non-constant string, and might be changed. Consequently, a constant string
cannot be passed; this includes string constants.
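To see why the parameters are declared this way, consider a hypothetical strcpy-like routine, copyString. This is an illustration, not the library's implementation. The target is written to, so it must be a plain char *. The source is only read, so it is a const char *, and a string constant may be passed there. A call such as copyString( "Mark", name2 ) would therefore not compile.

```cpp
// Hypothetical strcpy-like routine. dest is modified, so it is char *;
// source is only read, so it is const char *.
char *copyString( char *dest, const char *source )
{
    int i = 0;
    while( ( dest[ i ] = source[ i ] ) != '\0' )   // copy up to and
        i++;                                       // including '\0'
    return dest;
}
```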
Beginners tend to take the equivalence of arrays and pointers one step too far. Recall that
the fundamental difference between an array and a pointer is that an array definition allocates
enough memory to store the array, while a pointer points to memory that is allocated elsewhere.
Because strings are arrays of characters, this distinction applies to strings. A common error is
declaring a pointer when an array is needed. As examples, consider the following declarations:
char name[ ] = "Nina";
char *name1 = "Nina";
char *name2;
The first declaration allocates five bytes for name, initializing it to a copy of the string constant
"Nina" (including the null terminator). The second declaration states merely that name1
points at the zeroth character of the string constant "Nina". In fact, the declaration is wrong
because we are mixing pointer types: the right side is a const char *, while the left side is
merely a char *. Older compilers may accept it, perhaps with a warning, but C++11 removed this
conversion from the language, so modern compilers reject it. The reason for the rule is that a subsequent
name1[ 3 ] = 'e';
is an attempt to alter the string constant. Since a string constant is supposed to be constant, this
action should not be allowed. The easiest way for the compiler to do this is to follow the conven-
tion that if a is a constant array, then a[i] is a constant also and cannot be assigned to. If the
statement
char *name1 = "Nina";
were allowed, this would be hard to enforce. By enforcing const-ness at each assignment, the
problem becomes manageable. It is legal to use
const char *name1 = "Nina";
but that is hardly the same as declaring an array to store a copy of the actual string; furthermore,
name1[3]='e' is easily seen by the compiler to be illegal in this case. A common example
where this would be used is
const char *message = "Welcome to FIU!";
Another common consequence of declaring a pointer instead of an array object is the fol-
lowing statement (in which we assume that name2 is declared as above):
strcpy( name2, name );
Here the programmer expects to copy name into name2 but is fooled because the declaration
for strcpy indicates that two pointers are to be passed. The call fails because name2 is just a
pointer rather than a pointer to sufficient memory to hold a copy of name. If name2 is a NULL
pointer, points at a string constant stored in read-only memory, or points at an illegal random
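location, strcpy will write through it, typically crashing the program or silently corrupting
memory. If name2 points at a separate modifiable array that is large enough to hold the copy, there
is no problem. (Executing name2=name first is not a fix, because strcpy's behavior is undefined
when the source and target overlap.)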
All these considerations tell us that in most cases the C++ string is a better option than primitive
strings, and that any new uses of primitive strings should be safely hidden inside a string.
A string can be constructed from a const char *, and a const char * can be
extracted from a string via the member function c_str. So when interacting with older code
that expects char * and produces char *, one strategy is to create a string as soon as possible,
do the string manipulations safely with the string class, extract a const char * by
using c_str and pass that to the older code. The return value from the older code can be imme-
diately converted into a string. For instance, suppose there is a routine
const char *getenv( const char *prop );
that expects a primitive string and returns a primitive string. We can pass it a string prop,
and assign the result to a string val, as
string val = getenv( prop.c_str( ) );
making use of the automatic implicit conversion from const char * to string.
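One caution about the getenv example: getenv returns a NULL pointer when the variable is not defined, and constructing a string from a NULL pointer is undefined behavior. A small wrapper can check for NULL first. The name getenvString and its default-value parameter are assumptions made for this sketch.

```cpp
#include <cstdlib>
#include <string>
using namespace std;

// Look up an environment variable and return its value as a string.
// If the variable is not set, getenv returns NULL, so return
// defaultVal instead of constructing a string from a NULL pointer.
string getenvString( const string & prop, const string & defaultVal = "" )
{
    const char *val = getenv( prop.c_str( ) );
    return val != NULL ? string( val ) : defaultVal;
}
```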
Figure 11-7 Pointer arithmetic: x=&a[3]; y=x+4;
Figure 11-8 Array initialization coded two ways: first by using indexing, second by using
pointer hopping
at an integer but somewhere in the middle and would be misaligned, generally leading to a hardware
fault. Since that interpretation would give erroneous results, C++ uses the following interpretation:
++ptr adds the size of the pointed-at object to the address stored in ptr.
This interpretation carries over to other pointer operations. The expression x=&a[3]
makes x point at a[3]. Parentheses are not needed, as mentioned earlier. The expression
y=x+4 makes y point at a[7]. We could thus use a pointer to traverse an array instead of using
the usual index iteration method. We will discuss this in Sections 11.3.3 and 11.3.4.
Although it makes sense to add or subtract an integer type from a pointer type, it does not
make sense to add two pointers. It does, however, make sense to subtract two pointers: y-x
evaluates to 4 in the example above (since subtraction is the inverse of addition). Thus pointers
can be subtracted but not added.
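These operations can be checked directly. The following sketch sets up the situation in Figure 11-7 with global variables; the names a, x, y, and elementsApart are illustrative. Adding 4 to x moves it four elements, and y - x yields an element count of type ptrdiff_t rather than a byte count.

```cpp
#include <cstddef>
using namespace std;

int a[ 10 ];
int *x = &a[ 3 ];
int *y = x + 4;                  // advances 4 elements: points at a[ 7 ]
ptrdiff_t elementsApart = y - x; // 4, not 4 * sizeof( int )
```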
Given two pointers x and y, x<y is true if the object x is pointing at is at a lower address
than the object y is pointing at. Assuming that neither is pointing at NULL, this expression is
almost always meaningless unless both are pointing at elements in the same array. In that case
x<y is true if x is pointing at a lower-indexed element than y because, as we have seen, the ele-
ments of an array are guaranteed to be stored in increasing and contiguous parts of memory. This
is the only legitimate use of the relational operator on pointers, and all other uses should be
avoided. To summarize, we have the following pointer operations:
• Pointers may be assigned, compared for equality (and inequality), and dereferenced in
C++, as well as almost all other languages. The operators are =, ==, !=, and *.
• We can apply the prefix or postfix increment operators to a pointer, can add an integer, and
can subtract either an integer or pointer. The operators are ++, --, +, -, +=, and -=.
• We can apply relational operators to pointers, but the result makes sense only if the pointers
point to parts of the same array. The operators are <, <=, >, and >=.
• We can test against NULL by applying the ! operator (because the NULL pointer is 0).
• We can subscript and delete pointers via [], delete, and delete[].
• We can apply trivial operators, such as & and sizeof, to find out information about the
pointer (not the object it is pointing at).
• We can apply some other operators, such as ->.
The moral of the story is that, in many cases, it is best to leave minute coding details to the
compiler and concentrate on the larger algorithmic issues and on writing the clearest code possi-
ble. Many systems have a profiler tool that will allow you to decide where a program is spend-
ing most of its running time. This will tell you where to apply algorithmic improvements, so it is
important to learn how to use the optimizer and profiler on your system.
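Figure 11-8's code is not reproduced here, but the two styles it contrasts can be sketched as follows, using hypothetical function names. Both functions produce the same result. As the text above suggests, a modern optimizing compiler usually makes hand-written pointer hopping unnecessary. The last function shows pointers serving as iterators for the STL sort algorithm.

```cpp
#include <algorithm>
using namespace std;

// Initialization using indexing.
void initByIndex( int a[ ], int n, int value )
{
    for( int i = 0; i < n; i++ )
        a[ i ] = value;
}

// The same initialization using pointer hopping; a + n points one
// past the last element and is used only as a stopping point.
void initByPointer( int a[ ], int n, int value )
{
    for( int *p = a; p != a + n; ++p )
        *p = value;
}

// Pointers as iterators: sort the range [a, a + n).
void sortArray( int a[ ], int n )
{
    sort( a, a + n );
}
```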
1 #include <iostream>
2 using namespace std;
3
4 int main( int argc, char *argv[ ], char *envp[ ] )
5 {
6 for( int i = 0; envp[ i ] != NULL; i++ )
7 cout << envp[ i ] << endl;
8
9 return 0;
10 }
Figure 11-9 The echo command (a little more verbose than normal).
If only one environment variable needs to be examined, the standard library provides
getenv (the standard header cstdlib needs to be included). Its signature is essentially the one
discussed in Section 11.2, except that the actual return type is char * rather than const char *:
char *getenv( const char *prop );
Figure 11-10 shows how we list all the environment variables, assuming that the third
parameter to main is being supported.
• Primitive arrays are represented by a pointer variable rather than a real array object.
• Heap-allocated arrays are created by invoking the array new operator, new[].
• Heap-allocated arrays are reclaimed by invoking the array delete operator, delete[].
Do not simply invoke delete. Applying it to an array allocated with new[] is undefined
behavior, and in practice it may destroy only the object in index 0.
• When a primitive array is passed as a parameter to a function, the array when viewed as an
aggregate is being passed by reference.
• Primitive strings are implemented as a null-terminated array of characters.
• String copies are difficult to do correctly because there is no runtime check that the target
string has enough memory to store the result (plus the null terminator).
• A common string error is to simply declare a char*, and assume that a string is created.
The char* has to point at memory that can store the characters.
• Avoid primitive string manipulations by using the library string class, the implicit con-
version from char* to string, and the c_str member function.
• ++, when applied to a pointer variable that is pointing at an array element, advances the
pointer variable to the next array element.
• Pointer hopping used to be widely used to increase program speed. Modern optimizing
compilers reduce the need for pointer hopping.
• Pointers can be used as iterators for the STL generic algorithms.
• Command-line arguments are available through the argv parameter that can be provided
to main. argv is an array of strings (char *argv[]), and the command itself is
argv[0]. The number of command line arguments is passed as the argc parameter to
main.
• Environment variables can often be accessed by a third parameter to main.
• Avoid multidimensional arrays, which are simply stored as a flat one-dimensional array by
the compiler. When a primitive multidimensional array is listed as a formal parameter, all
but the first dimension must be provided.
11.8 Exercises
Chapter 12
C-Style C++
Because C++ is based on C, and C++ was designed
to be compatible with most C programs, several C constructs that have much better C++ equiva-
lents are still part of C++. We have already seen, for example, that C-style strings are part of the
language and still are used as parameters to main. Some C constructs are required for third-
party libraries that were intended to be used by C programs.
In this chapter, we describe C constructs that are commonly seen in C++ programs, partic-
ularly in older C++ code, even though better alternatives exist. These constructs include prepro-
cessor macros, using pointers to simulate call-by-reference, and the C libraries that are part of
Standard C++. Although it is a relatively rare occurrence, occasionally program speed can be
improved by judiciously using C constructs in performance-critical sections of code.
1 #include <iostream>
2 using namespace std;
3
4 #define printDebug( expr ) cout << __FILE__ << " [" << \
5 __LINE__ << "] (" << #expr << "): " << ( expr ) << endl
6
7 int main( )
8 {
9 int x = 5, y = 7;
10
11 printDebug( x + y );
12
13 return 0;
14 }
Figure 12-1 Macro for debugging prints with file and line number included
z = absoluteValue( ++a );
which is expanded to
In C, the only typecast is the Java-style cast, except that it is unchecked. Thus, as we saw in
Section 6.7, the dynamic_cast should be used to cast down an inheritance hierarchy. Sometimes
we need to do totally bizarre casts. This is a very rare occurrence and usually means we
are either writing totally bizarre code, or perhaps device drivers or other low-level code.
In such a case, although the old style cast works, it is better to use
reinterpret_cast, since this cast stands out much more in the code. When you use a
reinterpret_cast, all bets are off, since you are, in effect, disregarding the typing information
of an object. All you can expect is that the result of a reinterpret_cast will have the
new type, with the same bits as the old. In the case of a pointer, it will be pointing at the same
object, but the object will be viewed as a different type that might not make any sense at all.
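A typical legitimate use is inspecting the raw bytes of an object, since reading any object through an unsigned char pointer is permitted. In this sketch, bytesOf is a hypothetical name. The order of the bytes depends on the machine's endianness, which is exactly the kind of detail that reinterpret_cast exposes.

```cpp
#include <vector>
using namespace std;

// Copy the bytes that make up x, in memory order, into a vector.
vector<unsigned char> bytesOf( const int & x )
{
    const unsigned char *p = reinterpret_cast<const unsigned char *>( &x );
    return vector<unsigned char>( p, p + sizeof( int ) );
}
```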
1 struct tm
2 {
3 int tm_sec; /* seconds after the minute (0- 61) */
4 int tm_min; /* minutes after the hour (0- 59) */
5 int tm_hour; /* hours after midnight (0- 23) */
6 int tm_mday; /* day of the month (1- 31) */
7 int tm_mon; /* months since January (0- 11) */
8 int tm_year; /* years since 1900 (0- ) */
9 int tm_wday; /* days since Sunday (0- 6) */
10 int tm_yday; /* days since January 1 (0-365) */
11 int tm_isdst; /* Daylight Saving Time flag */
12 };
13
14 typedef unsigned long time_t; /* illustrative; the real type is implementation-defined */
15
16 /* Some functions */
17 char *asctime(const struct tm *);
18 time_t mktime(struct tm *);
1 // Find Friday the 13th birthdays for person born Oct 13, 1937
2
3 #include <ctime>
4 #include <iostream>
5 using namespace std;
6
7 int main( )
8 {
9 const int FRIDAY = 6 - 1; // Sunday is 0, etc...
10 tm theTime = { 0 }; // Set all fields to 0
11
12 theTime.tm_mon = 10 - 1; // January is 0, etc...
13 theTime.tm_mday = 13; // 13th day of the month
14
15 for( int year = 1937; year < 2073; year++ )
16 {
17 theTime.tm_year = year - 1900; // 1900 is 0, etc...
18 if( mktime( &theTime ) == -1 )
19 {
20 cerr << "mktime failed in " << year << endl;
21 continue;
22 }
23 if( theTime.tm_wday == FRIDAY )
24 cout << asctime( &theTime );
25 }
26 return 0;
27 }
Figure 12-5 Program to find all Friday the 13th birthdays for a friend
or -1 if the tm struct is out-of-range. The tm struct is mutable because mktime also
attempts to fill in fields such as tm_wday and tm_yday. Note that in both cases, the tm type
includes the word struct. This is optional in C++, and so we have never done it, but it is
required in C. A person born on Oct. 13, 1937 will naturally have a few birthdays fall on Friday
the 13th. Figure 12-5 shows a simple program to calculate when this occurs.
The first parameter is the control string and is output, except that the additional parame-
ters are substituted as appropriate into the control string in places marked by a % conversion
sequence. For instance %d is used to print an integer in decimal, %o prints an integer in octal, %s
prints a (primitive) string, %f prints a float or double, and %% prints a %.
The return value of printf is the number of characters actually written, or -1 if there is
an error. But hardly anybody ever bothers to check the return code. As an example,
int x = 5;
double y = 3.14;
printf( "x is %d, y is %f\n", x, y );
returns 22 (characters, newline included) and prints the following, because by default %f
shows six digits after the decimal point:
x is 5, y is 3.140000
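You can check both the text and the returned count without writing to standard output by formatting into a buffer with snprintf, the bounded relative of sprintf that has been standard since C99 and C++11. In this sketch, formatXY is a hypothetical name, and the 64-character buffer is assumed to be large enough.

```cpp
#include <cstdio>
#include <string>
using namespace std;

// Format x and y exactly as the printf call above does; written
// receives snprintf's return value (the number of characters).
string formatXY( int x, double y, int & written )
{
    char buffer[ 64 ];
    written = snprintf( buffer, sizeof( buffer ), "x is %d, y is %f\n", x, y );
    return buffer;
}
```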
After the % and before the character that specifies the type, a host of options control all of the
same things that can be controlled by the ostream manipulators. For instance in Figure 9-5 we
had:
out << left << setw( 15 ) << name << " "
<< right << fixed
<< setprecision( 2 ) << setw( 12 ) << salary;
the equivalent call to printf, assuming that name is a primitive string (for a string, pass
name.c_str( ) instead), is
printf( "%-15s %12.2f", name, salary );
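The two versions can be compared directly by formatting into strings. This sketch uses an ostringstream in place of out and snprintf in place of printf. For ordinary values, the two functions should produce identical 28-character records (15 + 1 + 12). The function names are illustrative.

```cpp
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>
using namespace std;

// The manipulator version, writing to a string instead of out.
string withStreams( const string & name, double salary )
{
    ostringstream out;
    out << left << setw( 15 ) << name << " "
        << right << fixed
        << setprecision( 2 ) << setw( 12 ) << salary;
    return out.str( );
}

// The conversion-specifier version; %s needs a primitive string.
string withPrintf( const string & name, double salary )
{
    char buffer[ 128 ];
    snprintf( buffer, sizeof( buffer ), "%-15s %12.2f", name.c_str( ), salary );
    return buffer;
}
```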
The most important thing to know about printf is that it is not type-safe. If the addi-
tional parameters do not match the conversion specifiers in the control string, you get gibberish.
Because C is not object-oriented, only the primitive types have conversion specifiers;
even in C++, you cannot define new specifiers for user-defined class types, making printf
vastly inferior to the C++ iostream library.
The C function that reads formatted input is scanf. The basic form is
int scanf( const char *control, void *obj1, void *obj2, ... );
The first parameter is the control string, as before, except that doubles should use %lf
instead of %f. Also, field width specifiers become maximums, instead of minimums (this is use-
ful for strings, because you want to make sure you don’t read more characters than you have
room for). The return value is the number of conversion specifiers that are actually matched, so
if this is less than the number of conversion specifiers, something has gone wrong. As an exam-
ple,
int x;
double y;
char name[ 100 ];
scanf( "%d %lf %99s", &x, &y, name );
reads an int, double, and primitive string, putting them in the objects specified, and hope-
fully returns 3. Note that name is already an address, so we don’t need the &.
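Because scanf reads from the standard input, it is easier to experiment with the same control string through sscanf, which scans a primitive string instead (sscanf is discussed further below). In this sketch, parseRecord is a hypothetical name. Its return value shows how many conversions succeeded; EOF is returned if the input ends before the first conversion.

```cpp
#include <cstdio>
using namespace std;

// Parse an int, a double, and a word from line. %99s leaves room for
// the null terminator in a 100-char array. Returns sscanf's result.
int parseRecord( const char *line, int & x, double & y, char name[ 100 ] )
{
    return sscanf( line, "%d %lf %99s", &x, &y, name );
}
```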
Like printf, scanf is not typesafe, and errors are likely to result in an abnormal pro-
gram termination, because pointer variables are involved. The most common mistake with
scanf is to forget to pass an address, as in
scanf( "%d", x );
This attempts to put an integer in the memory location given by the integer stored in x. It is
unlikely that this is a valid location; it certainly is not x’s location.
scanf is a dangerous routine that has little use in a C++ program. If possible you should
replace existing calls to printf and scanf with the iostream equivalents. You should avoid
having a program use both libraries, since both libraries buffer, and I/O can be interwoven unex-
pectedly.
Use of printf and scanf, and all file routines requires the standard header cstdio
(or stdio.h).
Figure 12-6 shows a routine that copies from one file to another. We write it in C-style,
using primitive strings. Line 4 declares charsCounted, and line 5 declares ch, which will
represent a character read by getc. Note that ch must have type int. If it were char and char is
signed, a byte with value 0xFF would be indistinguishable from EOF, so the copy of a large binary
file would most likely be cut short (and if char is unsigned, EOF would never be detected at all).
At line 6, observe that sfp and dfp must both be declared as pointer variables; the first * applies
only to sfp. Line 8 is a simple alias test. The error messages at line 10 illustrate the use of the
stderr stream. Lines 13 and 18 open the files as binary files. If the second open fails, we must
remember to close the first file. The copying of files is performed at lines 25 to 32, and then the
files are closed at lines 34 and 35. Failing to close the output file could result in some buffered
data not being written out, especially if the program terminates abruptly. fflush can be used to
force buffered data to be written out, prior to closing the file.
1 // Copy files; return number of chars copied (needs <cstdio> and <cstring>)
2 int copy( const char *destFile, const char *sourceFile )
3 {
4 int charsCounted = 0;
5 int ch;
6 FILE *sfp, *dfp;
7
8 if( strcmp( sourceFile, destFile ) == 0 )
9 {
10 fprintf( stderr, "Cannot copy to self\n" );
11 return -1;
12 }
13 if( ( sfp = fopen( sourceFile, "rb" ) ) == NULL )
14 {
15 fprintf( stderr, "Bad input file %s\n", sourceFile );
16 return -1;
17 }
18 if( ( dfp = fopen( destFile, "wb" ) ) == NULL )
19 {
20 fprintf( stderr, "Bad output file %s\n", destFile );
21 fclose( sfp );
22 return -1;
23 }
24
25 while( ( ch = getc( sfp ) ) != EOF )
26 if( putc( ch, dfp ) == EOF )
27 {
28 fprintf( stderr, "Unexpected write error.\n" );
29 break;
30 }
31 else
32 charsCounted++;
33
34 fclose( sfp );
35 fclose( dfp );
36 return charsCounted;
37 }
Figure 12-6 Copy files using getc and putc; return number of characters copied
Two similar-looking functions are fgetc and fputc. Because getc and putc may be implemented as
preprocessor macros that evaluate their stream argument more than once, they can be dangerous to
use if the arguments involve side effects. In this extremely rare case, which almost certainly
indicates poor programming practice anyway, fgetc and fputc, which are guaranteed to be functions
and not macros, can be safely used.
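The danger is easiest to see with a function-like macro of our own. The absoluteValue macro below is a hypothetical definition, not necessarily the one used earlier in this chapter. Because its argument text is substituted three times, an argument with a side effect such as ++a is evaluated twice when the value is nonnegative. The equivalent function evaluates its argument exactly once.

```cpp
// Hypothetical macro: the argument text appears three times, so an
// argument with side effects may be evaluated more than once.
#define absoluteValue( x ) ( ( x ) >= 0 ? ( x ) : -( x ) )

// A real function evaluates its argument exactly once.
inline int absValueFunc( int x )
{
    return x >= 0 ? x : -x;
}
```

For example, absoluteValue( ++a ) expands to ( ( ++a ) >= 0 ? ( ++a ) : -( ++a ) ), so a starting at 5 ends up as 7.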
Another routine that is provided is ungetc, which allows the putting back of a character
onto the input stream:
int ungetc( int ch, FILE *stream );
To read and write lines at a time, we can use fgets and fputs:
char *fgets( char *str, int howMany, FILE *stream );
int fputs( const char *str, FILE *stream );
fputs outputs a string to an output stream. It does not supply a newline character unless
one is already present. fgets reads characters from an input stream until one of three events
occurs:
1. EOF is encountered.
2. A newline is encountered.
3. howMany-1 characters are seen, before event 1 or 2 occurs.
After the characters are read, a null terminator is appended. A newline is stored only if it
was encountered. str is returned on success; if no characters were read because of an EOF or
any other error, a NULL pointer is returned.
As an example, suppose we want to read a large file one line at a time, using normal C++:
void processFile( string fileName )
{
ifstream fin( fileName.c_str( ) );
string oneLine;
Figure 12-7 getlineFast: takes half the time of getline, but can fail
Figure 12-7 shows a routine called getlineFast that uses this getline routine, presuming
that no line is longer than MAX_LINE_LEN. At line 7, the call to gcount returns the
number of characters read by getline. Presumably, if this count is 0, getlineFast should
return false; otherwise we copy the primitive string into oneLine. If we use getlineFast
instead of getline, 7.5 seconds becomes 2.8 seconds! However, getlineFast only works
if lines have fewer than MAX_LINE_LEN characters. If not, very long lines will be silently split
into separate lines. Although there are some solutions available in the istream class, all of
them will take longer than 2.8 seconds. It turns out that this is a great place to use fgets.
Figure 12-9 Routine that prints last howMany characters from fileName
We can write a getline that takes a FILE*, and then have the user create a FILE*
instead of an istream. With reasonable care, we can localize these changes, so that the rest of
the program doesn’t have to change. Best of all, our routine will handle arbitrarily long lines and
be faster than getlineFast, taking about 2.4 seconds. The additional getline is shown in
Figure 12-8.
We begin by clearing out oneLine at line 6. Assigning "" to oneLine with = is noticeably
slower than calling erase, and since this is done for every line (2,000,000 times), the difference
is large enough to make erase worthwhile. In the main loop, the idea is to keep calling fgets,
concatenating the result to oneLine (also line 6) until we read characters that contain a
terminating newline character (that test is performed at line 13). Those characters are also
appended (at line 20, outside of the main loop), after the newline character is stripped out at
line 15.
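Figure 12-8 is likewise not reproduced here. The following is a minimal sketch of the same technique, assuming a FILE* already opened for reading: clear oneLine with erase, then append fgets chunks until one ends in a newline, which is stripped. It is structured slightly differently from the description above (the final chunk is appended inside the loop), and it is named readLine rather than getline, a hypothetical choice that avoids clashing with the POSIX function getline( char **, size_t *, FILE * ) that some C libraries declare in stdio.h. BUF_SIZE is also hypothetical; any positive size works, because long lines are assembled from several fgets calls.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical chunk size for each fgets call.
const int BUF_SIZE = 128;

// Reads one line of arbitrary length from fp into oneLine, without the newline.
// Returns false only when no characters at all could be read.
bool readLine( FILE *fp, std::string & oneLine )
{
    char buffer[ BUF_SIZE ];

    oneLine.erase( );                           // clear without assigning ""
    while( std::fgets( buffer, BUF_SIZE, fp ) != NULL )
    {
        std::size_t len = std::strlen( buffer );
        if( len > 0 && buffer[ len - 1 ] == '\n' )
        {
            buffer[ len - 1 ] = '\0';           // strip the newline
            oneLine += buffer;
            return true;
        }
        oneLine += buffer;                      // partial line: keep reading
    }
    return !oneLine.empty( );                   // last line had no newline
}
```

A typical loop is while( readLine( fp, oneLine ) ), with fp obtained from fopen and released with fclose when done.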
Other available routines include feof and the random-access routines fseek and ftell, which
mimic the istream routines discussed in Section 9.6. (Actually, the istream routines mimic these.)
For fseek, the constants that specify the beginning, the current position, and the end are
SEEK_SET, SEEK_CUR, and SEEK_END. The code in Figure 12-9 shows the direct correspondence
between the FILE* and istream routines by recoding Figure 9-7 line-for-line. (However, as we
mention in Section 12.5, several aspects of this code that are legal in C++, such as exceptions,
are not legal C.)
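The random-access routines can be illustrated with a short sketch in the spirit of Figure 12-9 (this is not that figure's code). The hypothetical function lastChars returns, rather than prints, the last howMany characters of a stream that is already open in binary mode: it seeks to the end, uses ftell to learn the size, then seeks back with SEEK_SET and reads to EOF. Note that the C standard does not require binary streams to support SEEK_END meaningfully, although common platforms do.

```cpp
#include <cstdio>
#include <string>

// Returns the last howMany characters of fp (all of it if the file is shorter).
std::string lastChars( FILE *fp, long howMany )
{
    std::string result;

    if( std::fseek( fp, 0L, SEEK_END ) != 0 )      // jump to the end of the file
        return result;
    long size = std::ftell( fp );                  // position at the end == file size
    if( size < 0 )
        return result;

    long start = howMany < size ? size - howMany : 0L;
    std::fseek( fp, start, SEEK_SET );             // offset from the beginning

    int ch;
    while( ( ch = std::fgetc( fp ) ) != EOF )
        result += static_cast<char>( ch );
    return result;
}
```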
Finally, we mention sprintf and sscanf, which print to and scan from a primitive string. Their
signatures are:
int sprintf( char *buffer, const char *control, val1, val2, ... );
int sscanf( const char *buffer, const char *control,
void *obj1, void *obj2, ... );
For instance, if we have
int x1 = 37, x2;
double y1 = 3.14, y2;
char oneLine[ 100 ];
sprintf( oneLine, "%d %f", x1, y1 );
oneLine will contain "37 3.140000", since %f prints six digits after the decimal point by default. A subsequent
sscanf( oneLine, "%d %lf", &x2, &y2 );
populates x2 and y2 with 37 and 3.14, respectively. Note that
sscanf( oneLine, "%d", &x2);
sscanf( oneLine, "%lf", &y2 );
sets x2 to 37 and y2 to 37.0, because sscanf does not maintain a notion of previous parsing of
the buffer string.
sprintf is dangerous because nothing checks that buffer is large enough to hold the result.
sscanf is dangerous too, especially when the targets are strings, since a %s conversion with no
field width can write past the end of the destination array. The functionality of these functions
is provided by ostringstream and istringstream, so there is little need to use them.
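For comparison, here is a sketch of the same formatting and parsing done with ostringstream and istringstream. The function names formatPair and parsePair are hypothetical. Because the stream manages its own storage, there is no buffer to overflow, and the types of x and y are checked at compile time rather than trusted to a control string.

```cpp
#include <sstream>
#include <string>

// Counterpart of sprintf( oneLine, "%d %f", x, y ), without a fixed-size buffer.
std::string formatPair( int x, double y )
{
    std::ostringstream out;
    out << x << " " << y;        // default formatting prints 3.14 as 3.14
    return out.str( );
}

// Counterpart of sscanf( oneLine, "%d %lf", &x, &y ).
// Returns false if either value could not be read.
bool parsePair( const std::string & text, int & x, double & y )
{
    std::istringstream in( text );
    in >> x >> y;
    return !in.fail( );
}
```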
Memory that was allocated by new should be released by delete, and not by free. Otherwise, havoc
results.
C-library routines that return heap-allocated memory get that memory from one of the alloc
family of functions (malloc, calloc, or realloc). Thus their memory is returned to the heap by
calling free. One such example is strdup.
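The following sketch contrasts the two allocation families; both function names are hypothetical. strdup is a POSIX function rather than part of standard C++, so it is an assumption that your C library provides it.

```cpp
#include <cstdlib>
#include <cstring>

// strdup obtains its result with malloc, so the caller must release it
// with free, never with delete.
char *copyWithStrdup( const char *s )
{
    return strdup( s );
}

// A copy made with new[] must be released with delete[], never with free.
char *copyWithNew( const char *s )
{
    char *result = new char[ std::strlen( s ) + 1 ];   // +1 for the null terminator
    std::strcpy( result, s );
    return result;
}
```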
12.4.5 system
The system function, in standard header cstdlib or stdlib.h, is used to invoke a com-
mand. It takes a primitive string representing the command; this string is passed to the operating
system’s command processor and is run. How this is done is highly system dependent, and obvi-
ously non-portable, since few commands are available on all systems. For instance,
system( "dir" );
invokes the dir command (if one exists).
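As a sketch, a wrapper might first ask whether a command processor exists at all, which system reports when passed NULL, and only then run the command. The name runCommand is hypothetical, and the meaning of system's return value is implementation-defined: on POSIX systems it is a wait status that can be decoded with WEXITSTATUS from sys/wait.h.

```cpp
#include <cstdlib>

// Returns -1 if no command processor is available; otherwise returns
// the (implementation-defined) value that system produces for cmd.
int runCommand( const char *cmd )
{
    if( std::system( NULL ) == 0 )   // nonzero means a command processor exists
        return -1;
    return std::system( cmd );
}
```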
12.4.6 qsort
qsort is a generic sorting algorithm with declaration
void qsort( void *base, size_t numItems, size_t itemSize,
int cmp( const void *, const void * ) );
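The parameters to qsort represent the array to be sorted, the number of items in it, the size of
each item in bytes, and the comparison function, which must return a negative value, zero, or a
positive value when its first argument is less than, equal to, or greater than its second,
respectively.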
Figure 12-10 shows the call to qsort to sort an array of integers, along with the definition of an
appropriate comparison function. The comparison function takes two generic pointers to the
items. At line 3, we convert lhs to a pointer to a const int, and then dereference it to get the
int that lhs is pointing at. In C++, we would sensibly use reinterpret_cast, rather than
 1 #include <iostream>
 2 #include <cstdarg>
 3 using namespace std;
 4
 5 void printStrings( const char *str1, ... )
 6 {
 7     const char *nextStr;
 8     va_list argp;
 9
10     cout << str1 << endl;
11     va_start( argp, str1 );
12     while( ( nextStr = va_arg( argp, const char * ) ) != NULL )
13         cout << nextStr << endl;
14
15     va_end( argp );
16 }
17
18 int main( )
19 {
20     printStrings( "This", "is", "a", "test", (const char *) NULL );
21     return 0;
22 }
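Figure 12-10 is not reproduced in this section. The following is a minimal sketch of sorting an array of int with qsort; the names compareInts and sortInts are hypothetical. It uses static_cast, the usual C++ cast from const void * to const int *, and it returns the result of two comparisons rather than a - b, because that subtraction can overflow for values of opposite sign.

```cpp
#include <cstddef>
#include <cstdlib>

// qsort passes pointers to the two items being compared as const void *.
int compareInts( const void *lhs, const void *rhs )
{
    int a = *static_cast<const int *>( lhs );
    int b = *static_cast<const int *>( rhs );
    return ( a > b ) - ( a < b );   // negative, zero, or positive
}

void sortInts( int arr[ ], std::size_t n )
{
    std::qsort( arr, n, sizeof( int ), compareInts );
}
```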
12.5 C Programming
If you program in C instead of C++, you need to be aware that some C++ features are missing in
C. The partial list, based on ANSI C (note that items 11 and 12 will behave the same in recently
adopted C99 as in C++) includes:
Basically, what’s left? Not much. It’s a long way down to C from the C++ or Java world.
• The standard C library is part of the C++ library. Header files such as stdio.h are avail-
able as both stdio.h and cstdio. The latter header file places all of the library in the
std namespace; the older header file leaves the library in the global namespace.
• The preprocessor can be used to implement macros; however, macros can be dangerous if
invoked with a parameter that contains side effects.
• C-style parameter passing is needed to interact with libraries that were written for C. To
simulate call-by-reference, or call-by-constant reference for large objects, we pass the
address of the object instead of the object. The formal parameter is declared as a pointer
(or a pointer to a constant).
• printf and scanf are not typesafe and their functionality is completely contained in
ostream and istream.
• Files can be accessed by using FILE* to represent both input and output streams.
Attempts to read from a file open for writing (or vice versa) result in a bad return code
for the read, rather than a compile-time error. Most of the C-style I/O routines are
cosmetically similar to their C++ counterparts.
• Sometimes I/O that uses char* instead of string can be more efficient. Occasionally,
using fgets can help implement the more efficient I/O.
• Never call free on an object created with new; never call delete on an object created
with malloc. Avoid using the alloc family because they allocate raw memory, without
calling constructors, and are not typesafe.
• strtol and strtod can be used to extract a long or double (or smaller type) from a
primitive string. Although istringstream can accomplish the same thing, it is reason-
able to expect that strtol and strtod can be more efficient for this task.
• system can be used to invoke a system command.
• C and C++ support variable-length argument lists, via the standard header cstdarg or
stdarg.h.
• Programming exclusively in C requires abandoning many of the comforts of Java and C++.
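As a sketch of the strtol and strtod point above: both functions report, through their second parameter, where parsing stopped, and that pointer equals the input when no number was found. The wrapper names parseLong and parseDouble are hypothetical, and range errors (which these functions report by setting errno to ERANGE) are not checked.

```cpp
#include <cstdlib>

// Extracts a base-10 long from the front of str (leading white space is skipped).
// Returns false if no digits were found; *rest points at the first unparsed char.
bool parseLong( const char *str, long & value, const char **rest )
{
    char *end;
    value = std::strtol( str, &end, 10 );
    *rest = end;
    return end != str;          // end == str means nothing was converted
}

// Same idea for a double, using strtod.
bool parseDouble( const char *str, double & value, const char **rest )
{
    char *end;
    value = std::strtod( str, &end );
    *rest = end;
    return end != str;
}
```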
12.7 Exercises
CHAPTER 13
A native method is a method that is implemented in another language, such as C or C++, and
called from Java code running in the Java Virtual Machine. The JDK provides a standard
programming interface called the Java Native Interface (JNI), which, in theory, allows the
Virtual Machine to invoke C and C++ code somewhat portably.
In this chapter, we describe the Java Native Interface. We begin by explaining why one
might want to implement some Java methods in an alternate language. Then we describe the
basic layout of the JNI, provide an example that involves invoking a simple native method, and
then write native methods that access Java objects. Finally, we briefly discuss the use of Java
features, such as exceptions and object monitors, in native code.
1. You already have a large body of tricky code written in another language and would
rather not rewrite it in Java. Instead, you would like to use the existing codebase.
2. You need to access system devices, or perform some platform-specific task that is beyond
the capability of Java. Many Java library routines eventually invoke private native methods
for just this purpose. For instance, the I/O, threading, and networking packages all contain
private native methods.
3. You think that Java might be too slow for your application, and that performance can be
enhanced by implementing time-critical code in C++.
Although Java used to be painfully slow, a modern Java implementation has performance
that is comparable to C++ for many applications. Using JNI to achieve performance improve-
ments is possible, but is no longer needed as much as it used to be.
Using JNI has significant downsides. First, you lose portability. A native implementation
must be supplied for each platform. Given the large Java library that already makes use of native
methods, if a new native method has identical C++ code that can be used on all platforms, then it
is likely that the code could have been implemented in Java in the first place. Second, you lose
safety. Native methods are not afforded the same protections as Java methods. Once you enter
C++ code, all bets are off, and any C++ bug, such as indexing an array out of bounds, using a
stale pointer, or trashing memory, can occur. Third, the implementation of the native method is
contained in a dynamic library (in a Windows environment, a .dll; in a Unix environment, a
.so file). Any Java program that uses native methods must load dynamic libraries, and this is an
operation that the Java Security Manager might object to, because of the safety concerns listed
above. Lack of safety means lack of security, so, for instance, by default user-defined native
methods generally cannot be invoked inside an applet. Fourth, the code is cumbersome: it often
compiles, yet fails to run because of silly typing errors, such as incorrect capitalization or
missing semicolons inside string constants.
Once we decide to use JNI, the basic procedure is relatively straightforward.
1. A Java class declares that some methods have non-Java implementations by marking the
methods as native.
2. A C++ function is written that implements the native method, using the JNI protocols.
3. The C++ function is compiled into a dynamic library.
4. The Java Virtual Machine loads the dynamic library, and then calls to the native method
are handled by invoking the implementation in the dynamic library.
The devil, of course, is in the details, which are numerous, since the JNI is expected to work not
only for C++, but also for C and other languages that are not object-oriented. For instance,
1. How are fields and methods of a Java object used, given that C has no classes?
2. How are parameters passed from Java to C/C++?
3. How is a value returned from C/C++?
4. What about function overloading, since C does not allow it?
5. How do we differentiate between static and non-static members?
6. What about strings and arrays?
7. How can the C/C++ code throw an exception, and what happens if it invokes a Java
method that throws an exception?
We will begin our discussion by first implementing a single native method. To avoid all of
the above complications, our method will be static, with no parameters, and no return type, and
simply print a string. Still, this is tricky, since it is the first use of various incantations that will
be part of all JNI implementations. Then we will access fields and invoke methods of an object,
discuss arrays and strings, have the native method return a value and throw an exception, and
then quickly examine some of the JNI support for object monitors.
 1 class HelloNative
 2 {
 3     native public static void hello( );
 4
 5     static
 6     {
 7         System.loadLibrary( "HelloNative" );
 8     }
 9 }
10
11 class HelloNativeTest
12 {
13     public static void main( String[ ] args )
14     {
15         HelloNative.hello( );
16     }
17 }
The C++ declaration lists two parameters. The first is a pointer to a JNIEnv object, and
will be used extensively to access fields and methods of objects. The second parameter is a
jclass object, representing information about the HelloNative class. Java programmers
who are familiar with the Reflection API will recognize this as being the equivalent of a Class
object. A jclass object allows us to obtain information and use fields and methods for any
Java class. This second parameter is passed for static methods only. For instance methods, the
second parameter is a jobject object, representing the moral equivalent of the this refer-
ence. Given a jobject, one can always obtain the jclass object representing the object’s
class type, and at that point, the object’s fields can be manipulated, and methods can be invoked.
But more on that in Section 13.4.
Since the native method takes no parameters, the C++ declaration lists no additional
parameters after the first two. We can implement the method trivially, as shown in Figure 13-3.¹
You should make sure to include the header file generated by javah, and to give names to the
formal parameters. It is standard to name the first parameter env; the second parameter is often
either cls or ths, depending on whether we are implementing a static or instance method. Do
not use class or this, since these are C++ reserved words (avoid them if you are using C,
too, in case you want to painlessly upgrade to C++ later on).
¹ Note that printing to standard output from a native method is a bad plan: output written by C++
and output written by Java's System.out.println are buffered separately, so the two can appear
intermixed or out of order.
 1 class Date
 2 {
 3     public Date( int m, int d, int y )
 4       { month = m; day = d; year = y; }
 5
 6     static
 7     {
 8         System.loadLibrary( "Date" );
 9     }
10
11     native public void printDate( );
12
13     public int getMonth( )
14       { return month; }
15     public int getDay( )
16       { return day; }
17     public int getYear( )
18       { return year; }
19
20     public String toString( )
21       { return month + "/" + day + "/" + year; }
22
23     private int month;
24     private int day;
25     private int year;
26 }
1 class TestDate
2 {
3     public static void main( String[ ] args )
4     {
5         Date d = new Date( 8, 23, 2003 );
6         d.printDate( );
7     }
8 }
The trickiest step in the process is obtaining the jfieldID. The name of the field is easy
enough, but the type is difficult, because it can be any arbitrary type, such as an array of some
class type. Once again, JNI specifies a coding mechanism for types, and also for method
signatures. Although the algorithm is relatively short, it can be tedious to do by hand.
Fortunately, Java provides a program to generate all the field types and method signatures for
you. To do so, invoke javap with the name of the class that you are interested in. Once again, the
class must already have been compiled. javap requires two options to get the complete listing,
so the command for our example is
javap -s -private Date
The output of javap is shown in Figure 13-7. As we can see, int is represented as I. On
the other hand, String is Ljava/lang/String;, and omitting the L or the ; will give
incomprehensible runtime errors. Thus it is best to use javap.
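The encoding that javap prints follows a few rules: each primitive has a one-letter code (Z boolean, B byte, C char, S short, I int, J long, F float, D double, V void), a class type is L followed by its fully qualified name with / separators and a terminating ;, each array dimension adds a leading [, and a method signature lists the parameter encodings in parentheses followed by the return type's encoding. The following is a minimal sketch of those rules, not a replacement for javap. The function names fieldDescriptor and methodDescriptor are hypothetical, the input is assumed to be a fully qualified Java type as written in source, and nested classes (whose binary names use $) are not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Encodes a Java type such as "int", "java.lang.String", or "double[]".
std::string fieldDescriptor( std::string type )
{
    std::string prefix;
    // Each trailing "[]" becomes a leading '['.
    while( type.size( ) >= 2 && type.compare( type.size( ) - 2, 2, "[]" ) == 0 )
    {
        prefix += '[';
        type.erase( type.size( ) - 2 );
    }

    if( type == "boolean" ) return prefix + "Z";
    if( type == "byte" )    return prefix + "B";
    if( type == "char" )    return prefix + "C";
    if( type == "short" )   return prefix + "S";
    if( type == "int" )     return prefix + "I";
    if( type == "long" )    return prefix + "J";
    if( type == "float" )   return prefix + "F";
    if( type == "double" )  return prefix + "D";
    if( type == "void" )    return prefix + "V";   // meaningful only as a return type

    // Class type: '.' becomes '/', wrapped in L and ;
    for( std::size_t i = 0; i < type.size( ); i++ )
        if( type[ i ] == '.' )
            type[ i ] = '/';
    return prefix + "L" + type + ";";
}

// Encodes a method signature: "(" + parameter encodings + ")" + return encoding.
std::string methodDescriptor( const std::vector<std::string> & params,
                              const std::string & returnType )
{
    std::string result = "(";
    for( std::size_t i = 0; i < params.size( ); i++ )
        result += fieldDescriptor( params[ i ] );
    return result + ")" + fieldDescriptor( returnType );
}
```

For example, methodDescriptor applied to a single double[] parameter and a double return type yields ([D)D, the signature javah records for NativeSumDemo.sum.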
 1 #include "Date.h"
 2 #include <iostream>
 3 using namespace std;
 4
 5 JNIEXPORT void JNICALL
 6 Java_Date_printDate( JNIEnv * env, jobject ths )
 7 {
 8     jclass cls = env->GetObjectClass( ths );
 9
10     jfieldID monthID = env->GetFieldID( cls, "month", "I" );
11     jfieldID dayID   = env->GetFieldID( cls, "day", "I" );
12     jfieldID yearID  = env->GetFieldID( cls, "year", "I" );
13
14     jint m = env->GetIntField( ths, monthID );
15     jint d = env->GetIntField( ths, dayID );
16     jint y = env->GetIntField( ths, yearID );
17
18     cout << m << "/" << d << "/" << y << endl;
19 }
 1 #include "Date.h"
 2 #include <iostream>
 3 using namespace std;
 4
 5 JNIEXPORT void JNICALL
 6 Java_Date_printDate( JNIEnv * env, jobject ths )
 7 {
 8     jclass cls = env->GetObjectClass( ths );
 9     jmethodID getMonthID, getDayID, getYearID;
10
11     getMonthID = env->GetMethodID( cls, "getMonth", "()I" );
12     getDayID   = env->GetMethodID( cls, "getDay", "()I" );
13     getYearID  = env->GetMethodID( cls, "getYear", "()I" );
14
15     jint m = env->CallIntMethod( ths, getMonthID );
16     jint d = env->CallIntMethod( ths, getDayID );
17     jint y = env->CallIntMethod( ths, getYearID );
18
19     cout << m << "/" << d << "/" << y << endl;
20 }
Figure 13-9 provides our second implementation of printDate and shows how instance
methods are invoked. As before, we get a jclass entity, and then at lines 11 to 13, we obtain
the jmethodIDs for each of the methods that we want to invoke (the jmethodID variable
declarations are all together on line 9 simply to avoid making lines 11 to 13 too long). Once we
have the jmethodIDs, we can use them to invoke the method, as shown at lines 15 to 17. As
with accessing instance fields, the first parameter is the object on which the method is to be
invoked. If this method took additional parameters, they would follow the jmethodIDs in the
parameter list.
Invocation of CallXXXMethod uses dynamic dispatch. An alternative is
XXX CallNonvirtualXXXMethod( jobject ths, jclass cls, jmethodID m, ... );
which invokes the implementation of the method found in class cls, without dynamic dispatch;
that is hardly ever something you would want to do.
13.5.1 Strings
To obtain a primitive const char * from a jstring, we invoke the environment function
GetStringUTFChars. The parameters to GetStringUTFChars are the jstring and
the address of a jboolean variable. If this address is not NULL, the jboolean variable will be
set to true if the char * is a copy of the original (which is to be expected, since Java stores
strings as 16-bit Unicode, while the char * holds an 8-bit UTF encoding), and false if somehow
a copy was not made. Since it rarely matters whether or not a copy was made, usually NULL is
passed as the second parameter.
When we are done with the const char * that GetStringUTFChars has produced by
allocating from the memory heap, we must release it by invoking ReleaseStringUTFChars.
Failure to do so leads to a memory leak.
 1 #include "Date.h"
 2 #include <iostream>
 3 using namespace std;
 4
 5 JNIEXPORT void JNICALL
 6 Java_Date_printDate( JNIEnv * env, jobject ths )
 7 {
 8     jclass cls = env->GetObjectClass( ths );
 9
10     jmethodID toStringID = env->GetMethodID( cls, "toString",
11                                              "()Ljava/lang/String;" );
12
13     jstring str = (jstring) env->CallObjectMethod( ths,
14                                                    toStringID );
15
16     const char *c_ret = env->GetStringUTFChars( str, NULL );
17     cout << "(calling toString) " << c_ret << endl;
18     env->ReleaseStringUTFChars( str, c_ret );
19 }
Figure 13-11 Using strings (primitive style) for static native method StringAdd.add
Figure 13-12 Using C++ string library for static native method StringAdd.add
dows platform, we get 008D9E3C; other nonsense is produced on Unix machines. In fact, we
are lucky not to crash the Virtual Machine. Instead, at line 16 we get a null-terminated primitive
string, print it, and then free the C-style string at line 18.
New jstring objects can be created (mostly for the purposes of returning one) by
invoking NewStringUTF. As an example, the code in Figure 13-11 shows a routine that
implements a string concatenation in native code. Once again, this is a silly example, but illus-
trates the syntax.
As we can see by lines 1 and 2, we are implementing method add in class StringAdd.
This is a static method, since the second parameter is a jclass. The two parameters are both
String, and the return type is String. At lines 4 and 5 we obtain the UTF string, and at line
6, we allocate an array that will store the result of the concatenation. The +1 is needed to provide
space for the null terminator. At lines 8 and 9, we compute the result of concatenation by first
copying a1 to c, and then appending b1 to c. Then we call NewStringUTF at line 10 to form
a jstring that can be returned at line 16. Prior to returning, we must clean up memory. Lines
12 and 13 show the calls to ReleaseStringUTFChars, and line 14 cleans up the call to
new[] with a matching delete[].
A cleaner alternative that avoids the calls to new[] and delete[] by using the C++
string library type is shown in Figure 13-12.
13.5.2 Arrays
The JNI defines eight array types for the primitives. A typical example is jintArray.
Additionally, jobjectArray is used to represent an array of Object. The environment function
GetArrayLength can be used to get the length of any array object passed as a parameter. To
access individual items in the array, we have to use one strategy for primitives, and another for
objects.
 1 class NativeSumDemo
 2 {
 3     native public static double sum( double [ ] arr );
 4
 5     static
 6     {
 7         System.loadLibrary( "Sum" );
 8     }
 9
10     public static void main( String [ ] args )
11     {
12         double [ ] arr = { 3.0, 6.5, 7.5, 9.5 };
13         System.out.println( sum( arr ) );
14     }
15 }
 1 #include "NativeSumDemo.h"
 2
 3 JNIEXPORT jdouble JNICALL Java_NativeSumDemo_sum
 4   ( JNIEnv *env, jclass cls, jdoubleArray arr )
 5 {
 6     jdouble sum = 0;
 7     jsize len = env->GetArrayLength( arr );
 8
 9     // Get the elements; don't care to know if copied or not
10     jdouble *a = env->GetDoubleArrayElements( arr, NULL );
11
12     for( jsize i = 0; i < len; i++ )
13         sum += a[ i ];
14
15     // Release elements; no need to flush back
16     env->ReleaseDoubleArrayElements( arr, a, JNI_ABORT );
17
18     return sum;
19 }
double) and initialized at line 6. At line 7, we obtain the length of the array by calling
GetArrayLength.
Line 10 gets a C-style array from the jdoubleArray. As was the case with strings, this
array could be a copy, or it could be the original, depending mostly on the implementation of the
Virtual Machine. With arrays, if double and jdouble have identical representations, and if
the Java array is stored contiguously (which is not guaranteed, since the garbage collector may
elect to move parts of the array to reduce fragmentation), then it is possible that the pointer vari-
able is actually pointing to the memory that stores the original double[] inside the virtual
machine. In such a case, no copy is made. However, once this pointer is handed out, the garbage
collector cannot safely move parts of the array without invalidating the pointer. Thus, if no
copy is made, the original array is pinned and cannot be relocated until the pointer is released.
Once we have the C-style array, we can compute its sum. Prior to returning, we must
release the array, as shown at line 16. When we were working with strings, releasing the string
returned the memory back to the system. With arrays, there are two independent issues.
First, if the array is a copy, any changes must be copied back to the original; otherwise they
are not reflected. Second, if the array is a copy, its memory must be reclaimed. As a result, the
last parameter to ReleaseXXXArrayElements can be 0, JNI_COMMIT, or JNI_ABORT. If the
parameter is 0, the contents are flushed back to the original, thereby reflecting all changes, and
then the memory is reclaimed if needed. JNI_COMMIT flushes the contents but does not reclaim
the memory, so a later call with 0 or JNI_ABORT is still required to free the copy. JNI_COMMIT
can be useful if changes need to be reflected immediately, but it is not needed if no copy was
made (hence the second parameter to GetXXXArrayElements can help decide whether this call
should be made). JNI_ABORT does not flush the contents, but reclaims the memory if needed.
JNI_ABORT is useful if no changes were made to the array, because it avoids the flushing that
would be done with a parameter of 0. In fact, this is exactly our situation, so we call
ReleaseXXXArrayElements with JNI_ABORT as the parameter.
 1 /* DO NOT EDIT THIS FILE - it is machine generated */
 2 #include <jni.h>
 3 /* Header for class NativeSumDemo */
 4
 5 #ifndef _Included_NativeSumDemo
 6 #define _Included_NativeSumDemo
 7 #ifdef __cplusplus
 8 extern "C" {
 9 #endif
10 /*
11  * Class:     NativeSumDemo
12  * Method:    sum
13  * Signature: ([D)D
14  */
15 JNIEXPORT jdouble JNICALL Java_NativeSumDemo_sum
16   (JNIEnv *, jclass, jdoubleArray);
17
18 #ifdef __cplusplus
19 }
20 #endif
21 #endif
Accessing elements in a jobjectArray is more difficult because we cannot obtain the
C-style equivalent. Instead, we must call
GetObjectArrayElement( array, idx );
SetObjectArrayElement( array, idx, val );
through the env pointer. Clearly this makes accessing arrays of objects fairly slow.
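New arrays can be created by using
NewObjectArray( jsize len, jclass cls, jobject initialElement );
NewXXXArray( jsize len );
through the env pointer; the return type is the appropriate jarray type. NewObjectArray sets
every element to initialElement, while the elements of a new primitive array start out as zero.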
 1 class NativeSumDemo
 2 {
 3     native public static double sum( double [] arr )
 4       throws Exception;
 5
 6     static
 7     {
 8         System.loadLibrary( "Sum" );
 9     }
10
11     public static void main( String[] args )
12     {
13         double [ ] arr1 = { 3.0, 6.5, 7.5, 9.5 };
14         double [ ] arr2 = { };
15
16         try
17         {
18             System.out.println( sum( arr1 ) );
19             System.out.println( sum( arr2 ) );
20         }
21         catch( Exception e )
22         {
23             System.out.println( "Caught the exception!" );
24             e.printStackTrace( );
25         }
26     }
27 }
Figure 13-16 Same main that illustrates exception being thrown by native call
If the native code does neither, and instead continues executing native code, even though
an exception is pending, then the behavior when other environment functions are called is unde-
fined and dangerous.
If the native method needs to throw an exception on its own, then it can do so using the
environment functions Throw or ThrowNew. The result of calling either function is that an
exception is now pending; however, as before, the pending exception does not terminate the
native method. Instead, a return statement should immediately follow, and certainly other envi-
ronment functions should not be called. So if it is important to release array elements, do so
before invoking Throw or ThrowNew.
ThrowNew is much easier to use than Throw. Throw requires an already constructed exception
object (a jthrowable), whereas ThrowNew needs only the exception's class, obtained from
FindClass with the complete class name, and the message string that is passed to its constructor.
We illustrate how a native method can throw an exception in Figure 13-16, which provides a
main, and Figure 13-17, which implements the native method itself.
In Figure 13-16, we see that method sum might throw an Exception; the exception is
thrown if the array has length 0. (This is terrible style, and a better exception should be used in
general; however, using Exception illustrates a point, as we will see when the native method is
implemented.) Note also that the library loaded at line 8 is Sum, rather than NativeSumDemo.
When you compile the example, make sure the dynamic library is built under the matching name
(Sum.dll on Windows, libSum.so on Unix). The rest of the code is standard fare, and we expect
the second call to sum to trigger the catch block.
 1 #include <iostream>
 2 using namespace std;
 3
 4 #include "NativeSumDemo.h"
 5
 6 JNIEXPORT jdouble JNICALL Java_NativeSumDemo_sum
 7   ( JNIEnv *env, jclass cls, jdoubleArray arr)
 8 {
 9     jdouble sum = 0;
10     jsize len = env->GetArrayLength( arr );
11
12     if( len == 0 )
13     {
14         env->ThrowNew( env->FindClass( "java/lang/Exception" ),
15                        "Empty array" );
16         cout << "Throwing an exception, but should exit" << endl;
17         return 0.0;
18     }
19
20     // Get the elements; don't care to know if copied or not
21     jdouble *a = env->GetDoubleArrayElements( arr, NULL );
22
23     for( jsize i = 0; i < len; i++ )
24         sum += a[ i ];
25
26     // Release elements; no need to flush back
27     env->ReleaseDoubleArrayElements( arr, a, JNI_ABORT );
28
29     return sum;
30 }
Figure 13-17 illustrates the implementation of the native method. First, note that the
throws list declared in Java is not part of the native function name at line 6. The additional code
at lines 12 to 18 shows the test for the case of a zero-length array. When this test succeeds, a new
exception is created, and marked as pending at lines 14 and 15 by the call to ThrowNew. Note
that the name of the exception must reflect the complete class name, including package name.
However, ThrowNew does not cause an immediate return, so the print statement at line 16 will
be executed.
The return statement at line 17 ends the native method. A value must be supplied because the
function is declared to return a jdouble, but the caller never sees it: as soon as the native method
returns, the Virtual Machine throws the pending Java exception.
This means that you cannot cache local references, nor pass them to other threads. If you
do so, and try to use them later on, you may find that the garbage collector has reclaimed the
objects.
A global reference is created from a local reference by calling NewGlobalRef. Unlike local
references, global references are valid across multiple threads and multiple native calls, until a
call to DeleteGlobalRef. If there is no call to DeleteGlobalRef, the global reference remains
valid for the entire lifetime of the Virtual Machine, so the object it refers to can never be garbage
collected. That could be a problem for general objects and arrays, but is probably reasonable
behavior for jclass references.
From this discussion, we also know that local references are valid for the duration of the
native method call. However, if the native method makes a time-consuming function call, it
might be worth releasing local references prior to making the function call. Alternatively, if the
native method call creates many local references to large objects, it might be prudent to release
some of the references when they are no longer needed. As an example, if we are iterating
through a jobjectArray, each call to GetObjectArrayElement creates a new local reference. It
may be worth reclaiming it as we advance to the next array element. This is done with
DeleteLocalRef. Figure 13-19 illustrates this with an example in a native routine that
counts the total string length in an array of strings. We use DeleteLocalRef after accessing
each string, prior to proceeding to the next string. Depending on the underlying implementation
of the JNI, the number of strings, and the size of the string objects, this could help performance,
or simply have no noticeable effect.
is equivalent to
synchronized( obj )
{
/* synchronized block */
}
Don’t forget to invoke MonitorExit. Although there are no environment functions to invoke
wait and notifyAll, these can be called by obtaining a jmethodID, and calling the appro-
priate function using the normal JNI mechanism.
13.10 Invocation API
The Invocation API allows the C++ programmer to create a Virtual Machine from inside a C++
program. Once the Virtual Machine is created, the normal mechanism can be used to invoke the
main method of any class.
The code to do so is boilerplate and is shown in Figure 13-20. Clearly it can be generalized
to allow any class, and to allow command-line arguments to main. The hard part is compiling
and linking this code. In short, in addition to providing options to specify the include
directories, you must make sure that the library jvm.lib (for Windows) or jvm (for Unix) is
linked in. For Unix this involves using two options: -L to specify the search path for libraries,
and -l to specify the library itself. Using the Visual Studio products, the complete library name
is included with the compilation command, and the PATH environment variable is set to include
the HotSpot directory C:\jdk\jre\bin\hotspot. The online code contains more specific
compilation instructions.
Figure 13-20 Creating a Java VM from C++ main; invokes Hello with no parameters
13.11 Key Points
• Java native methods are specified with the native reserved word.
• After the class is compiled, we can run javah to generate a C/C++ header file.
• After the implementation is written in the native language, it must be compiled into a
dynamic library.
• The library should be loaded by System.loadLibrary, typically invoked from the
static initializer of the native method’s class.
• The eight Java primitives have corresponding native types, such as jint.
• JNI_TRUE and JNI_FALSE are defined to represent true and false, respectively.
• jstring and jobject represent String and Object.
• jXXXArray is used to represent the eight primitive Java arrays.
• jobjectArray represents an array of Object.
• jclass represents a Java class type.
• jmethodID represents a method.
• jfieldID represents a field.
• The Java native method is implemented by a native function whose name incorporates the
package name, class name, and possibly signature of the Java native method.
• The first parameter to a native function is the environment pointer.
• The second parameter is a jclass, representing the class type for static methods, or a
jobject, representing this, for instance methods.
• The remaining parameters of the native function correspond, in order, to the parameters
declared in the Java native method.
• javap -s is used to obtain a list of encoded field type signatures and method signatures.
• An instance field of an object is accessed by getting a jfieldID from the class type,
field name, and (encoded) field type, and then invoking GetXXXField, with a jobject
and jfieldID. Static fields can be accessed by GetStaticFieldID and
GetStaticXXXField, with a jclass in place of a jobject. Fields can also be
changed with SetXXXField and SetStaticXXXField.
• An instance method of an object is invoked using dynamic dispatch by getting a
jmethodID from the class type, method name, and (encoded) method signature, and
then invoking CallXXXMethod, with a jobject, jmethodID, and parameters to the
method. Static methods can be invoked by GetStaticMethodID and
CallStaticXXXMethod, with a jclass in place of a jobject. Methods can also
be invoked without dynamic dispatch by using CallNonvirtualXXXMethod.
• A C-style string (const char *) can be extracted from a jstring by invoking the
environment function GetStringUTFChars. ReleaseStringUTFChars should
be called when the C-style string is no longer needed.
• A C-style array of a primitive type, such as jint *, can be extracted from a
jarray (such as jintArray) by invoking the environment function
13.12 Exercises
12. How are constructors for new Java objects invoked in C++ code?
13. Why can’t a jstring be safely type-cast to a const char *? What is the correct way
to obtain a const char * from a jstring? What memory issues must be dealt with?
14. How is a jstring created from a const char *?
15. Explain how Java arrays are accessed in C++ code.
16. What does it mean for an array to be pinned? How can you tell if an array is pinned?
17. What does the last parameter to ReleaseXXXArrayElements do?
18. How can a C++ native method signal an exception?
19. How can a C++ native method tell if invoking a Java method caused an exception to be
raised? Can a C++ native method handle an exception?
20. How does C native code differ from C++ native code?
21. What is a local reference, and how long are local references valid?
22. What is a global reference, and how long are global references valid?
23. How can a C++ native method obtain and release a monitor?
24. What is the invocation API?
25. Can a native method make changes to a final field? Find out by writing a program that
attempts to do so.
26. Can a native method invoke a private method of another class? Find out by writing a
program that attempts to do so.
27. Class CIO, defined below, is intended to allow the output of a single int, double, or
String, using C-style sprintf formatting. Each of the native methods returns a
String in which the format specifier (such as %d) in the control string is replaced by the
second parameter. Implement the native methods by calling the C library routine sprintf. In
the call to sprintf, provide a large buffer as the first parameter (and hope for the best),
and then pass control and var as the remaining parameters.
class CIO
{
public native static String sprintf( String control, String var );
public native static String sprintf( String control, int var );
public native static String sprintf( String control, double var );
}
28. Implement the standard matrix multiplication algorithm using a native method, throwing
an exception if the matrices have incompatible sizes. The signature of your method is:
native public static double [][] multiply( double [][] a,
double [][] b );
Bibliography
The thinking behind C++ is described in [13]. The two
1,000-page gorillas that describe C++ are [12] and [5] (with answers to the latter title provided
in [15]). The books [8] and [9] provide great tips for safe C++ programming. Advanced features
of the C++ language itself are discussed in [2], [4], [7], and [14]. A collection of answers to
frequently asked C++ questions is provided in [1]. The C++ I/O library is described in great detail
in [6]. Good references for the STL include [10] and [11], as well as the 1,000-page gorillas. The
classic reference for the C programming language is [3].
1. M. Cline, G. Lomow, and M. Girou, C++ FAQs, 2d ed., Addison-Wesley, Reading, Mass.,
1999.
2. J. O. Coplien, Advanced C++, Addison-Wesley, Reading, Mass., 1992.
3. B. W. Kernighan and D. M. Ritchie, The C Programming Language, 2d ed., Prentice-Hall,
Englewood Cliffs, N.J., 1997.
4. A. Koenig and B. Moo, Ruminations on C++, Addison-Wesley, Reading, Mass., 1997.
5. J. Lajoie and S. Lippman, C++ Primer, 3d ed., Addison-Wesley, Reading, Mass., 1998.
6. A. Langer and K. Kreft, Standard C++ IOStreams and Locales: Advanced Programmer's
Guide and Reference, Addison-Wesley, Reading, Mass., 2000.
7. S. Lippman, Essential C++, Addison-Wesley, Reading, Mass., 2000.
8. S. Meyers, Effective C++, 2d ed., Addison-Wesley, Reading, Mass., 1998.
9. S. Meyers, More Effective C++, Addison-Wesley, Reading, Mass., 1996.
10. S. Meyers, Effective STL, Addison-Wesley, Reading, Mass., 2001.
11. D. R. Musser and A. Saini, C++ Programming with the Standard Template Library, Addi-
son-Wesley, Reading, Mass., 1996.
Index