CPP Tutorial
C++ Tutorials
• Object-oriented Design
C++ Tutorials
C++ Language and Library
C++ Namespaces
    INTRODUCTION TO NAMESPACES - PART 1
    INTRODUCTION TO NAMESPACES - PART 2
    INTRODUCTION TO NAMESPACES - PART 3
    INTRODUCTION TO C++ NAMESPACES - PART 4
New Fundamental Type - bool
Stream I/O
    INTRODUCTION TO STREAM I/O PART 1 - OVERLOADING <<
    INTRODUCTION TO STREAM I/O PART 2 - FORMATTING AND MANIPULATORS
    INTRODUCTION TO STREAM I/O PART 3 - COPYING FILES
    INTRODUCTION TO STREAM I/O PART 4 - TIE()
    INTRODUCTION TO STREAM I/O PART 5 - STREAMBUF
    INTRODUCTION TO STREAM I/O PART 6 - SEEKING IN FILES
C++ Virtual Functions
Templates
    INTRODUCTION TO C++ TEMPLATES PART 1 - FUNCTION TEMPLATES
    NEW C++ FEATURE - MEMBER TEMPLATES
    INTRODUCTION TO C++ TEMPLATES PART 2 - CLASS TEMPLATES
    INTRODUCTION TO TEMPLATES PART 3 - TEMPLATE ARGUMENTS
    INTRODUCTION TO TEMPLATES PART 4 - SPECIALIZATIONS
    INTRODUCTION TO TEMPLATES PART 5 - FORCING INSTANTIATION
    INTRODUCTION TO TEMPLATES PART 6 - FRIENDS
Use of Static
    THE MEANING OF "STATIC"
    LOCAL STATICS AND CONSTRUCTORS/DESTRUCTORS
Mutable
Explicit
Standard Template Library
    INTRODUCTION TO STL PART 1 - GETTING STARTED
    INTRODUCTION TO STL PART 2 - VECTORS, LISTS, DEQUES
    INTRODUCTION TO STL PART 3 - SETS
    INTRODUCTION TO STL PART 4 - MAPS
    INTRODUCTION TO STL PART 5 - BIT SETS
    INTRODUCTION TO STL PART 6 - STACKS
    INTRODUCTION TO STL PART 7 - ITERATORS
    INTRODUCTION TO STL PART 8 - ADVANCE() AND DISTANCE()
    INTRODUCTION TO STL PART 9 - SORTING
    INTRODUCTION TO STL PART 10 - COPYING
    INTRODUCTION TO STL PART 11 - REPLACING
    INTRODUCTION TO STL PART 12 - FILLING
    INTRODUCTION TO STL PART 13 - ACCUMULATING
    INTRODUCTION TO STL PART 14 - OPERATING ON SETS
Exception Handling
    INTRODUCTION TO EXCEPTION HANDLING PART 1 - A SIMPLE EXAMPLE
    INTRODUCTION TO EXCEPTION HANDLING PART 2 - THROWING AN EXCEPTION
    INTRODUCTION TO EXCEPTION HANDLING PART 3 - STACK UNWINDING
    INTRODUCTION TO EXCEPTION HANDLING PART 4 - HANDLING AN EXCEPTION
    INTRODUCTION TO EXCEPTION HANDLING PART 5 - TERMINATE() AND UNEXPECTED()
Placement New/Delete
Pointers to Members and Functions
    POINTERS TO MEMBERS
    A NEW ANGLE ON FUNCTION POINTERS
C++ Namespaces
#include "vendor1.h"
#include "vendor2.h"
and then it turns out that the headers have this in them:
// vendor1.h
class String {
...
};
// vendor2.h
class String {
...
};
This usage will trigger a compiler error, because class String is defined twice. In other words,
each vendor has included a String class in the class library, leading to a compile-time clash.
Even if you could somehow get around this compile-time problem, there is the further
problem of link-time clashes, where two libraries contain some identically-named symbols.
The namespace feature gets around this difficulty by means of separate named namespaces:
// vendor1.h
namespace Vendor1 {
class String {
...
};
}
// vendor2.h
namespace Vendor2 {
class String {
...
};
}
There are no longer two classes named String, but instead there are now classes named
Vendor1::String and Vendor2::String. In future discussions we will see how namespaces can
be used in applications.
namespace Vendor2 {
class String { ... };
}
How would you actually use the String classes in these namespaces? There are a couple of
common ways of doing so. The first is simply to qualify the class name with the namespace
name:
Vendor1::String s1, s2, s3;
This usage declares three strings, each of type Vendor1::String.
Another approach is to use a using directive:
using namespace Vendor1;
Such a directive specifies that the names in the namespace can be used in the scope where the
using directive occurs. So, for example, one could say:
using namespace Vendor1;
Vendor2::String s2;
would still work.
You might have noticed that namespaces have some similarities of notation with nested
classes. But namespaces represent a more general way of grouping types and functions. For
example, if you have:
class A {
void f1();
void f2();
};
then f1() and f2() are member functions of class A, and they operate on objects of class A (via
the "this" pointer). In contrast, saying:
namespace A {
void f1();
void f2();
}
is a way of grouping functions f1() and f2(), but no objects or class types are involved.
namespace B {
void f1();
void f2();
}
The members of the namespace can be accessed by using qualified names, for example:
void g() {A::f1();}
or by saying:
using namespace A;
void g() {f1();}
Another interesting aspect of namespaces is that of the unnamed namespace:
namespace {
void f1();
int x;
}
This is equivalent to:
namespace unique_generated_name {
void f1();
int x;
}
using namespace unique_generated_name;
All unnamed namespaces in a single scope share the same unique name. All global unnamed
namespaces in a translation unit are part of the same namespace and are different from similar
unnamed namespaces in other translation units. So, for example:
namespace {
int x1;
namespace {
int y1;
}
}
namespace {
int x2;
namespace {
int y2;
}
}
x1 and x2 are in the same namespace, as are y1 and y2.
Why is this feature useful? It provides an alternative to the keyword "static" for controlling
global visibility. "static" has several meanings in C and C++ and can be confusing. If we have:
static int x;
static void f() {}
we can replace these lines with:
namespace {
int x;
void f() {}
}
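As a minimal sketch of this replacement, the following keeps a helper and its state local to one translation unit. The names counter, next_id, and make_two_ids are hypothetical and chosen only for illustration.

```cpp
// Sketch: names in an unnamed namespace are visible only within this
// translation unit, much like file-scope "static" declarations.
namespace {
    int counter = 0;        // hypothetical file-local state

    int next_id() {         // hypothetical file-local helper
        return ++counter;
    }
}

// Ordinary external function that uses the file-local names unqualified.
int make_two_ids() {
    next_id();
    return next_id();
}
```

Code in another translation unit can call make_two_ids(), but cannot name counter or next_id().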
namespace Vendor2 {
class String {
...
};
int x;
}
and how those names can be accessed via qualification:
Vendor2::String s;
or a using directive:
using namespace Vendor2;
String s;
Another way of accessing names is to employ a using declaration:
using Vendor2::String;
String s;
This can be a little confusing. A using directive:
using namespace X;
says that all the names in namespace X are available for use, but none of them are actually
declared or introduced. A using declaration, on the other hand, actually introduces a name into
the current scope. So saying:
using namespace Vendor2;
makes String and x available for use, but doesn't declare them. Saying:
using Vendor2::String;
actually introduces Vendor2::String into the current scope as a declaration. Saying:
using Vendor1::String;
using Vendor2::String;
will trigger a "duplicate declaration" compiler error.
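The three ways of reaching a namespace member can be compared side by side. This is only a sketch: the bodies of the two String classes and the version() functions are invented here for illustration and are not part of either vendor's header.

```cpp
#include <string>

namespace Vendor1 {
    struct String { std::string text; };
    inline int version() { return 1; }   // hypothetical member
}
namespace Vendor2 {
    struct String { std::string text; };
    inline int version() { return 2; }   // hypothetical member
}

// 1. Qualified names: always unambiguous.
int qualified() {
    Vendor1::String s1;
    Vendor2::String s2;
    (void)s1; (void)s2;
    return Vendor1::version() + Vendor2::version();
}

// 2. Using directive: all of Vendor1's names become usable in this scope.
int with_directive() {
    using namespace Vendor1;
    String s;              // Vendor1::String
    (void)s;
    return version();      // Vendor1::version
}

// 3. Using declaration: introduces exactly one name into this scope.
int with_declaration() {
    using Vendor2::version;
    return version();      // Vendor2::version
}
```

Keeping the directive and the declaration inside function bodies limits their effect to those functions, which is one way to avoid clashes between the two vendors.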
There are several other aspects of using declarations that are worth learning about; these can
be found in a good C++ reference book.
New Fundamental Type - bool
b = true;
if (b)
...
A bool value is either true or false. A bool value can be converted to an integer:
bool b;
int i;
b = false;
i = int(b);
in which case false turns into 0 and true into 1. This process goes under the C/C++ name of
"integral promotion".
A pointer, integer, or enumeration can be converted to a bool. A null pointer or zero value
becomes false, while any other value becomes true. Such conversion is required for
conditional statements:
char* p;
...
if (p)
...
In this example "p" is converted to bool and then the true/false value is checked to determine
whether to execute the conditional block of code.
Why is a bool type an advantage? You can get a variety of opinions on whether this is a step
forward. In C, common usage to mimic this type would be as follows:
typedef int Bool;
#define FALSE 0
#define TRUE 1
One problem with such an approach is that it's not at all type-safe. For example, a programmer
could say:
Bool b;
b = 37;
and the compiler wouldn't care. Another problem is displaying values of Boolean type:
printf("%s", b ? "true" : "false");
which is awkward. In C++ it is possible to set up a stream I/O output operator specifically for
a particular type, and thus output of bool values can be distinguished from plain integral types.
This is an example of function overloading (see next section). Without bool as a distinct type,
usage like:
void f(int i) {}
void f(Bool b) {}
would be invalid.
Finally, why was bool added to the language itself rather than supplied as a class type in a
standard library? This question is hard to answer, but one possible reason is that many C
implementations have supplied a Boolean pseudo-type using a typedef and #define scheme as
illustrated above, and these implementations rely on representing Booleans as integral types
rather than as class types.
(further comment)
In the last issue we talked about the new fundamental type "bool". Two additional comments
should be made about this feature. An example of how Boolean has been faked in C was
given:
typedef int Bool;
#define FALSE 0
#define TRUE 1
and then usage like this:
Bool b;
b = 37;
was presented, with a comment that a C compiler would not complain. A C++ compiler given
similar usage:
bool b;
b = 37;
will not complain either, but the two sequences are not the same. In the C case, a later
statement like:
if (b == TRUE)
...
will fail, because it reduces to:
if (37 == 1)
...
In the C++ case, the statement:
b = 37;
turns into:
b = true;
and a later test:
if (b == true)
...
will indeed succeed.
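The conversions described so far can be checked with a few small functions. This is a sketch; the function names are hypothetical.

```cpp
// Converting an int to bool: zero becomes false, anything else true.
bool from_int(int v) {
    bool b = v;
    return b;
}

// Converting a bool to int: false becomes 0, true becomes 1.
int to_int(bool b) {
    return int(b);
}

// A pointer used as a condition is converted to bool:
// a null pointer is false, any other pointer is true.
bool from_pointer(const char* p) {
    if (p)
        return true;
    return false;
}
```

So from_int(37) compares equal to true, and converting that result back with to_int() gives 1 rather than 37.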
The issue was also raised as to why bool was not implemented as a class type in some C++
standard library. Dag Bruck of the ANSI/ISO C++ committees sent an example of why this
will not work.
There is a rule in C++ that says that at most one user-defined conversion may be automatically
applied. A user-defined conversion is a constructor like:
class A {
public:
A(int);
};
to convert an int to an A, or a conversion function:
class A {
public:
operator int();
};
to convert an A to an int.
If bool is a class type, for example:
class bool {
public:
operator int();
};
then the call "f(3 < 4)" in this code:
class X {
public:
    X(int);
};
void f(X);
int main()
{
    f(3 < 4);
}
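would require two user-defined conversions: one to convert the bool class object resulting
from "3 < 4" to an int, and another to call the X(int) constructor on the resulting int.
Because at most one user-defined conversion may be applied automatically, the call would
not compile.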
Stream I/O
INTRODUCTION TO STREAM I/O PART 1 - OVERLOADING <<
In this issue we will begin discussing the stream I/O package that comes with C++. The first
four sections of this issue are related and present several aspects of stream I/O along with
some related topics.
If you've used C++ at all, you've probably seen a simple example of how to do output:
cout << "Hello, world" << "\n";
instead of:
printf("Hello, world\n");
cout is an output stream, kind of like stdout in C. The C example could be written as:
fprintf(stdout, "Hello, world\n");
which makes this correspondence a bit clearer.
Once you get beyond simple input/output usage, what is the stream I/O package good for? One
quite useful thing it can do is to allow the programmer to take control of I/O for particular
C++ types such as classes. This end is achieved by the use of operator overloading.
Suppose that we have a Date class:
class Date {
int month;
int day;
int year;
public:
Date(char*);
Date(int, int, int);
};
with an internal representation of a Date using three integers for month, day, and year, and a
couple of constructors to create a Date object. How would we output the value of a Date
object?
One way would be to devise a member function:
void out();
implemented as:
void Date::out()
{
printf("%d/%d/%d", month, day, year);
}
This function would operate on an object instance of a Date and would access the
month/day/year members and display them. This approach will certainly work and may be
suitable in some kinds of applications.
But this scheme doesn't integrate very well with stream I/O. For example, I cannot say:
Date d(9, 25, 1956);
means:
x.operator@(y);
that is, the left operand of the operator must be an instance of the class of which the
overloaded operator is a member.
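Because the left operand of << is the stream rather than a Date, the output operator for Date is usually written as a non-member function. The sketch below is one way to do this and is not necessarily the original listing: it uses the standard <ostream> and <sstream> headers instead of <iostream.h>, and the helper date_to_string is hypothetical.

```cpp
#include <ostream>
#include <sstream>
#include <string>

class Date {
    int month;
    int day;
    int year;
public:
    Date(int m, int d, int y) : month(m), day(d), year(y) {}

    // Non-member friend: the left operand is the stream, not a Date,
    // so this operator cannot be a member function of Date.
    friend std::ostream& operator<<(std::ostream& os, const Date& d) {
        return os << d.month << '/' << d.day << '/' << d.year;
    }
};

// Hypothetical helper that formats a Date through the overloaded operator.
std::string date_to_string(const Date& d) {
    std::ostringstream os;
    os << d;
    return os.str();
}
```

Returning the stream from operator<< is what allows a Date to appear in the middle of a chain such as cout << "Born " << d << "\n".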
cout << n;
is equivalent to:
printf("%d", n);
that is, no special formatting is done.
But what if you want to say:
printf("%08d", n);
displaying n in a field 8 wide with leading 0s? Such an operation would be performed by
saying:
#include <iostream.h>
#include <iomanip.h>
/* stuff */
A statement like:
cout.setf(ios::left);
calls the member function setf() inherited from the ios class, to set flags for the stream.
ios::left is an enumerator representing a particular flag value.
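Both kinds of formatting can be tried on an in-memory stream. The following sketch assumes the standard <iomanip> and <sstream> headers rather than the older <iomanip.h>; the function names are hypothetical.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Zero-padded field of width 8; for non-negative n this matches
// printf("%08d", n).
std::string zero_padded(int n) {
    std::ostringstream os;
    os << std::setw(8) << std::setfill('0') << n;
    return os.str();
}

// Left-justified field of width 6, using setf() to set the adjustment flag.
std::string left_justified(int n) {
    std::ostringstream os;
    os.setf(std::ios::left, std::ios::adjustfield);
    os << std::setw(6) << std::setfill('*') << n;
    return os.str();
}
```

Note that setw() applies only to the next item written, while the fill character and the flags set by setf() stay in effect until changed.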
How can you design your own manipulators? A simple example is as follows:
#include <iostream.h>
#include <iomanip.h>
int main()
{
cout << "xxx" << dash << "yyy" << endl;
return 0;
}
We define a manipulator called "dash" that inserts a dash into an output stream. This is
followed by the output of more text and then a builtin manipulator ("endl") is called. endl
inserts a newline character and flushes the output buffer. We will say more about endl later in
the newsletter.
Manipulators are in fact pointers to functions, and they are implemented via a couple of hooks
in iostream.h:
ostream& operator<<(ostream& (*)(ostream&));
ostream& operator<<(ios& (*)(ios&));
These operators are member functions of class ostream. They will accept either a pointer to
function that takes an ostream& or a pointer to function that takes an ios&. The former would
be used for actual output, the latter for setting ios flags as discussed above.
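A manipulator of the first kind is therefore just a function with the right signature. The sketch below shows one possible definition of dash, written against the standard <ostream> header; the exact text that the original dash inserted is an assumption, and demo_dash is a hypothetical helper.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// A manipulator is a function that takes and returns an ostream&.
// Assumption: this version inserts a single '-' character.
std::ostream& dash(std::ostream& os) {
    return os << '-';
}

// Hypothetical helper showing the manipulator in a chain of insertions.
std::string demo_dash() {
    std::ostringstream os;
    os << "xxx" << dash << "yyy" << '\n';   // operator<< calls dash(os)
    return os.str();
}
```

Writing os << dash passes the function's address to the ostream member operator<<, which then calls it with the stream as its argument.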
assert(argc == 3);
fclose(fpin);
fclose(fpout);
return 0;
}
EOF is a marker used to signify the end of file; its value typically is -1. In most commonly-
used operating systems there is no actual character in a file to signify end of file.
This approach works on text files. Unfortunately, however, for binary files, an attempt to copy
a 10406-byte file resulted in output of only 383 bytes. Why? Because EOF is itself a valid
character that can occur in a binary file. If set to -1, then this is equivalent to 255 or 0377 or
0xff, a perfectly legal byte in a file. So we would need to say:
#include <stdio.h>
#include <assert.h>
assert(argc == 3);
for (;;) {
c = getc(fpin);
if (feof(fpin))
break;
fputc(c, fpout);
}
fclose(fpin);
fclose(fpout);
return 0;
}
feof() is a macro that tells whether the previous operation, in this case getc(), hit end of file.
Note also that we open the files in binary mode.
How would we do the equivalent in C++? One way would be to say:
#include <fstream.h>
#include <assert.h>
char c;
while (ifs.get(c))
ofs.put(c);
return 0;
}
ifstream and ofstream are input and output file streams, taking a single char* argument and a
set of flags.
These classes are derived from ios, which has an operator conversion function (from a stream
object to void*). If a statement like:
assert(ifs && ofs);
is specified, then this conversion function is called. It returns 0 if there's something wrong
with the stream. In other words, an object like "ifs" is converted to a void* automatically, and
the value of the void* pointer tells the stream status (non-zero for a good state, zero for bad).
The actual copying is straightforward, using the get() member function. It accepts a reference
to a character, so there's no need to use the return value to pass back the character that was
read.
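The copy loop itself does not depend on files, so it can be sketched against any pair of streams. The example below uses string streams in place of ifstream and ofstream so that it is self-contained; copy_chars and copy_via_get_put are hypothetical names.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copy every character, including whitespace, from in to out.
void copy_chars(std::istream& in, std::ostream& out) {
    char c;
    while (in.get(c))   // the stream tests false at end of input or on error
        out.put(c);
}

// Hypothetical helper: run the loop over in-memory streams.
std::string copy_via_get_put(const std::string& data) {
    std::istringstream in(data);
    std::ostringstream out;
    copy_chars(in, out);
    return out.str();
}
```

With real files, the same copy_chars() function could be handed an ifstream and an ofstream opened in binary mode.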
#include <fstream.h>
#include <assert.h>
{
assert(argc == 3);
return 0;
}
with no loop involved. The expression:
ifs.rdbuf()
returns a filebuf*, a pointer to an object that actually represents the low-level buffering for the
file. filebuf is derived from a class streambuf, and ofstream is derived from ostream, and
ostream has an operator<< defined for streambufs. So the looping over the input file occurs
within operator<<. We are "outputting" a filebuf/streambuf.
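The same idea can be sketched with in-memory streams standing in for the two files. The function name copy_via_rdbuf is hypothetical, and the standard <sstream> header is assumed.

```cpp
#include <sstream>
#include <string>

// Copy a whole stream by inserting its buffer into the output stream.
std::string copy_via_rdbuf(const std::string& data) {
    std::istringstream in(data);
    std::ostringstream out;
    out << in.rdbuf();   // operator<<(streambuf*) drains the input buffer
    return out.str();
}
```

One detail worth knowing: if the source contains no characters at all, this form of operator<< sets failbit on the output stream, so an empty input should be treated as a special case when the stream state is checked afterwards.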
Finally, how about code for copying standard input to output:
#include <iostream.h>
int main()
{
char c;
return 0;
}
If you run this program on text input, you will notice that the output's pretty jumbled. This is
because by default whitespace is skipped on input. To fix this problem, you can say:
#include <iostream.h>
int main()
{
char c;
cin.unsetf(ios::skipws);
return 0;
}
to disable the skipws flag. This program does not, however, work with binary files. To make it
work gets into a tricky issue; the binary mode is specified when opening a file, and in this
example standard input and output are already open. This ties in with low-level buffering and
reading the first chunk of a file when it's opened. By contrast, skipping whitespace is a higher-
level operation in the stream I/O library.
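The effect of the skipws flag is easy to see on an in-memory stream. This sketch uses a string stream in place of cin and cout; the function name and its flag parameter are hypothetical.

```cpp
#include <sstream>
#include <string>

// Copy characters with >> and <<, optionally keeping whitespace.
std::string copy_with_extraction(const std::string& data, bool skip_whitespace) {
    std::istringstream in(data);
    std::ostringstream out;
    if (!skip_whitespace)
        in.unsetf(std::ios::skipws);   // keep blanks and newlines
    char c;
    while (in >> c)
        out << c;
    return out.str();
}
```

With the flag left on, the input "a b\nc" comes out as "abc"; with it cleared, the text is copied unchanged.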
(correction)
In issue #008 we talked about copying files and said this about one of the examples of copying
files using C:
This approach works on text files. Unfortunately, however, for binary
files, an attempt to copy a 10406-byte file resulted in output of only
383 bytes. Why? Because EOF is itself a valid character that can
occur in a binary file. If set to -1, then this is equivalent to 255
or 0377 or 0xff, a perfectly legal byte in a file.
This isn't quite the case. A common mistake when copying files in C is to use a char instead of
an int with getc() and putc(). If a char is used, then the explanation above is correct, because
with a binary file EOF interpreted as a character is one of the 256 valid bit patterns that a char
can hold.
But with an int this is not a problem. getc(), and its functional equivalent fgetc(), return an
unsigned char converted to an int. So the int can represent all character values 0-255, along
with the EOF marker (typically -1).
It turns out that the reason why the example failed was due to a ^Z in the file. ^Z used to be
used as an end-of-file marker for DOS files used on PCs.
Thanks to David Nelson for mentioning this.
int main()
{
char c;
cin.unsetf(ios::skipws);
return 0;
}
Jerry Schwarz suggested that it might be worth discussing the tie() function and its effect on
the performance of this code. Specifically, if we slightly change the above code to:
#include <iostream.h>
int main()
{
char c;
cin.tie(0);
cin.unsetf(ios::skipws);
return 0;
}
it runs about 8X faster with one popular C++ compiler, and about 18X with another.
The difference has to do with buffering and flushing of streams. When input is requested, for
example with:
cin >> c
there may be output pending in the buffer for the output stream. The input stream is therefore
tied to the output stream such that a request for input will cause pending output to be flushed.
Flushing output is expensive, typically triggering a flush() call and a write() system call (on
UNIX systems). Disabling the linkage between the input and output streams gets rid of this
overhead.
To further illustrate this point, consider another example:
#include <iostream.h>
int main()
{
char buf[100];
//cin.tie(0);
cin.unsetf(ios::skipws);
cout.unsetf(ios::unitbuf);
return 0;
}
It's common for output to be completely unbuffered (unit buffering) if going to a terminal
(screen or window). So setting cin.tie(0) will not necessarily change observable behavior,
because output will be flushed immediately in all cases.
To affect behavior in this example, one also needs to disable unit buffering for the stream,
achieved by saying:
cout.unsetf(ios::unitbuf);
Once this is done, cin.tie(0) will change behavior in a visible way. If the input stream is
untied, then the prompt in the example above will not come out before input is requested from
the user, leading to confusion.
Note also that current libraries vary in their behavior. The above example works for one
library that was tried, but for another, there appears to be no way to disable unit buffering
under any circumstances, when output is to a terminal. The draft ANSI/ISO C++ standard
calls for unit buffering to be set for error output ("cerr").
If tie() is called with no argument, it returns a pointer to the stream that is currently tied. For example:
cout << (void*)cin.tie() << "\n";
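Both forms of tie() can be combined to untie a stream temporarily and restore the link later. A minimal sketch, assuming the standard <iostream> header; untie_cin is a hypothetical helper.

```cpp
#include <iostream>

// Untie cin from its output stream and return the stream it was tied to,
// so that the caller can restore the link later with cin.tie(old).
std::ostream* untie_cin() {
    return std::cin.tie(nullptr);   // tie(new_stream) returns the old one
}
```

In a standard library, cin starts out tied to cout, so the first call returns the address of cout and a later cin.tie() with no argument returns a null pointer until the link is restored.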
int main()
{
int c;
return 0;
}
This scheme uses what are known as streambufs, underlying buffers used in the stream I/O
package. An expression:
cin.rdbuf()->sbumpc()
says "obtain the streambuf pointer for the standard input stream (cin) and grab the next
character from it and then advance the internal pointer within the buffer". Similarly,
cout.rdbuf()->sputc(c)
adds a character to the output buffer.
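Put together, the two calls give a copy loop that works directly on the buffers. The sketch below runs it over string streams instead of cin and cout so that it is self-contained; copy_streambuf and copy_via_streambuf are hypothetical names.

```cpp
#include <cstdio>      // EOF
#include <sstream>
#include <streambuf>
#include <string>

// Copy characters from one stream buffer to another.
void copy_streambuf(std::streambuf* src, std::streambuf* dst) {
    int c;   // int, not char, so that EOF is distinguishable from data
    while ((c = src->sbumpc()) != EOF)
        dst->sputc(static_cast<char>(c));
}

// Hypothetical helper: run the loop over in-memory streams.
std::string copy_via_streambuf(const std::string& data) {
    std::istringstream in(data);
    std::ostringstream out;
    copy_streambuf(in.rdbuf(), out.rdbuf());
    return out.str();
}
```

Because sbumpc() returns each character as a non-negative int, a byte with the value 0xff is copied like any other and is not mistaken for end of input.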
Doing I/O in this way is lower-level than some other approaches, but correspondingly faster. If
we summarize the four file-copying methods we've studied (see issues #008 and #009 for code
examples of them), from slowest to fastest, they might be as follows.
Copy a character at a time with >> and <<:
cin.tie(0);
cin.unsetf(ios::skipws);
while (ifs.get(c))
ofs.put(c);
Copy with streambufs (above):
while ((c = cin.rdbuf()->sbumpc()) != EOF)
cout.rdbuf()->sputc(c);
Copy with streambufs but explicit copying buried:
ifstream ifs(argv[1], ios::in | ios::binary);
ofstream ofs(argv[2], ios::out | ios::binary);
get/put 72
streambuf 62
streambuf hidden 43
Actual times will vary for a given library. Perhaps the most critical factor is whether functions
that are used in a given case are inlined or not. Note also that if you are copying binary files
you need to be careful with the way copying is done.
Why the time differences? All of these methods use streambufs in some form. But the slowest
method, using >> and <<, also does additional processing. For example, it calls internal
functions like ipfx() and opfx() to handle unit buffering, elision of whitespace on input, and so
on. get/put also call these functions.
The fastest two approaches do not worry about such processing, but simply allow one to
manipulate the underlying buffer directly. They offer fewer services but are correspondingly
faster.
int main()
{
ofstream ofs("xxx");
if (!ofs)
; // give error
return 0;
}
Here we have an output file stream attached to a file "xxx". We open this file and write a
single blank character at the beginning of it. In this particular application this character is a
status character of some sort that we will update from time to time.
After writing the status character, we write some characters to the file, at which point we wish
to update the status character. To do this, we save the current position of the file using tellp(),
seek to the beginning, write the character, and then seek back to where we were, at which
point we can write some more characters.
Note that "streampos" is a defined type of some kind rather than simply a fixed fundamental
type like "long". You should not assume particular types when working with file offsets and
positions, but instead save the value returned by tellp() and then use it later.
In a similar way, it's tricky to use absolute file offsets other than 0 when seeking in files. For
example, there are issues with binary files and with CR/LF translation. You may be assuming
that a newline takes two characters when it only takes one, or vice versa.
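The save, seek, write, and seek-back sequence can be sketched as follows. An in-memory output stream stands in for the file "xxx" so that the example is self-contained, and the characters written are invented for illustration; the same tellp() and seekp() calls apply to an ofstream.

```cpp
#include <sstream>
#include <string>

// Write a status character, some data, then go back and update the status.
std::string update_status_byte() {
    std::ostringstream os;
    os << ' ';                          // placeholder status character
    os << "abc";                        // some data
    std::streampos pos = os.tellp();    // remember the current position
    os.seekp(0);                        // back to the beginning
    os << 'X';                          // overwrite the status character
    os.seekp(pos);                      // return to the saved position
    os << "def";                        // continue writing
    return os.str();
}
```

The saved position is kept in a streampos and passed back to seekp() unchanged, which avoids any assumption about how offsets are represented.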
seekp() also has a two-parameter version:
ofs.seekp(pos, ios::beg); // from beginning
C++ Virtual Functions
How might this be done in C++? One way is to use virtual functions. A virtual function is a
function member of a class, declared using the "virtual" keyword. A pointer to a derived class
object may be assigned to a base class pointer, and a virtual function called through the
pointer. If the function is virtual and occurs both in the base class and in derived classes, then
the right function will be picked up based on what the base class pointer "really" points at.
For graphics, we can use a base class called Shape, with derived classes named Line, Circle,
and Text. Shape and each of the derived classes has a virtual function draw(). We create new
objects and point at them using Shape* pointers. But when we call a draw() function, as in:
Shape* p = new Line(0.1, 0.1, Co_blue, 0.4, 0.4);
p->draw();
the draw() function for a Line is called, not the draw() function for Shape. This style of
programming is very common and goes by names like "polymorphism" and "object-oriented
programming". To illustrate it further, here is an example of this type of programming for a
graphics application. Annotations in /* */ explain in some detail what is going on.
#include <string.h>
#include <assert.h>
#include <iostream.h>
/*
The type of X/Y points on the screen.
*/
/*
Colors.
*/
/*
These are protected so that they can be accessed
by derived classes. Private wouldn't allow this.
*/
public:
Shape(Coord x, Coord y, Color c) :
/*
Constructor to initialize data members common to
all shape types.
*/
/*
Destructor for Shape. It's a virtual function.
Destructors in derived classes are virtual also
because this one is declared so.
*/
/*
Similarly for the draw() function. It's a pure virtual and
is not called directly.
*/
};
/*
Line is derived from Shape, and picks up its
data members.
*/
/*
Additional data members needed only for Lines.
*/
public:
Line(Coord x, Coord y, Color c, Coord xd, Coord yd) :
Shape(x, y, c), // base initialization; the base is always constructed first
xdest(xd), ydest(yd) {}
/*
Construct a Line, calling the Shape constructor as well
to initialize data members of the base class.
*/
/*
Destructor.
*/
/*
Draw a line.
*/
};
/*
Radius of circle.
*/
public:
Circle(Coord x, Coord y, Color c, Coord r) :
Shape(x, y, c), // base initialization; the base is always constructed first
rad(r) {}
public:
Text(Coord x, Coord y, Color c, const char* s) :
Shape(x, y, c) // constructor with base initialization
{
str = new char[strlen(s) + 1];
assert(str);
strcpy(str, s);
/*
Copy out text string. Note that this would be done differently
if we were taking advantage of some newer C++ features like
exceptions and strings.
*/
}
~Text() {delete [] str; cout << "~Text\n";} // virtual dtor
/*
Destructor; delete text string.
*/
int main()
{
const int N = 5;
int i;
Shape* sptrs[N];
/*
Array (vector) of Shape* pointers. Pointers to objects of classes
derived from Shape can be assigned to Shape* pointers.
*/
/*
Create some shape objects.
*/
/*
Draw them using virtual functions to pick up the
right draw() function based on the actual object
type being pointed at.
*/
// cleanup
/*
Clean up the objects using virtual destructors.
*/
return 0;
}
When we run this program, the output is:
Line(0.1, 0.1, 2, 0.4, 0.5)
Line(0.3, 0.2, 0, 0.9, 0.75)
Circle(0.5, 0.5, 1, 0.3)
Text(0.7, 0.4, 2, Howdy!)
Circle(0.3, 0.3, 0, 0.1)
~Line
~Line
~Circle
~Text
~Circle
with enum color values represented by small integers.
A few additional comments. Virtual functions typically are implemented by placing a pointer
to a jump table in each object instance. This table pointer represents the "real" type of the
object, even though the object is being manipulated through a base class pointer.
Because virtual functions usually need to have their function address taken, to store in a table,
declaring them inline as the above example does is often a waste of time. They will be laid
down as static copies per object file. There are some advanced techniques for optimizing
virtual functions, but you can't count on these being available.
Note that we declared the Shape destructor virtual (there are no virtual constructors). If we had
not done this, then when we iterated over the vector of Shape* pointers, deleting each object in
turn, the destructors for the actual object types derived from Shape would not have been
called, and in the case above this would result in a memory leak in the Text class.
Shape is an example of an abstract class, whose purpose is to serve as a base for derived
classes that actually do the work. It is not possible to create an actual object instance of Shape,
because it contains at least one pure virtual function.
Templates
This template will also work on non-numeric types, so long as they have the ">" operator
defined. For example:
class A {
public:
int operator>(const A&); // use "bool" return type
// instead, if available
};
A a;
A b;
A c = max(a, b);
Templates are a powerful but complex feature, about which we will have more to say.
Languages like C or Java(tm), that do not have templates, typically use macros or rely on
using base class pointers and virtual functions to synthesize some of the properties of
templates.
Templates in C++ are a more ambitious attempt to support "generic programming" than some
previous efforts found in other programming languages. Support for generic programming in
C++ is considered by some to be as important a language goal for C++ as is support for
object-oriented programming (using base/derived classes and virtual functions; see newsletter
issue #008). An example of heavy template use can be found in STL, the Standard Template
Library.
int main()
{
Pair<short, float> x(37, 12.34);
Pair<long, long double> y(x);
return 0;
}
This is an adaptation of a class found in the Standard Template Library. Note that an object of
class Pair<long, long double> is constructed from an object of class Pair<short, float>. By
using a template constructor it is possible to construct a Pair from any other Pair, assuming
that conversions from T to A and from U to B are supported. Without the availability of template
constructors one could only declare constructors with fixed types like "Pair(int)" or else use
the template arguments to Pair itself, as in "Pair(A, B)".
In a similar way to function template use, it's possible to have usage like:
template <class T> struct A {
template <class U> struct B {/* stuff */};
};
A<double>::B<long> ab;
In this example, the type value of T within the nested template declaration would be "double",
while the value of U would be "long".
There are a few restrictions on member templates. A destructor for a class cannot be defined
as a function template, nor may a function template member of a class be virtual.
A<-37> a;
This feature is useful in the case where you want to pass a size into the template. For example,
a Vector template might accept a type argument that tells what type of elements will be
operated on, and a size argument giving the vector length:
template <class T, int N> class Vector {
// stuff
};
Vector<float, 100> v;
A template argument may have a default specified (this feature is not widely available as yet):
template <class T = int, int N = 100> class Vector {
// stuff
};
vec[pos] = val;
}
return vec[pos];
}
// driver program
int main()
{
Vector<double, 10> v;
int i = 0;
double d = 0.0;
return 0;
}
Actual values are stored in a private vector of type T and length N. In a real Vector class we
might overload operator[] to provide a natural sort of interface such as an actual vector has.
What would happen if we said something like:
Vector<char, -1000> v;
This is an example of code that is legal until the template is actually instantiated into a class.
Because a member like:
char vec[-1000];
is not valid (you can't have arrays of negative or zero size), this usage will be flagged as an
error when instantiation is done.
The process of instantiation itself is a bit tricky. If I have 10 translation units (source files),
and each uses an instantiated class:
Vector<unsigned long, 250>
where does the code for the instantiated class's member functions go? The template definition
itself resides most commonly in a header file, so that it can be accessed everywhere and
because template code has some different properties than other source code.
This is an extremely hard problem for a compiler to solve. One solution is to make all template
functions inline and duplicate the code for them per translation unit. This results in very fast
but potentially bulky code.
Another approach, which works if you have control over the object file format and the linker,
is to generate duplicate instantiations per object file and then use the linker to merge them.
Yet another approach is to create auxiliary files or directories ("repositories") that have a
memory of what has been instantiated in which object file, and use that state file in
conjunction with the compiler and linker to control the instantiation process.
There are also schemes for explicitly forcing instantiation to take place. We'll discuss these in
a future issue. The instantiation issue is usually hidden from a programmer, but sometimes
becomes visible, for example if the programmer notices that object file sizes seem bloated.
A<void> a;
The member x will be of type void* in this case. In the earlier example, using void as a type
argument would result in an instantiation error, because a data member of a class (or any
object for that matter) cannot be of type void.
Usage like:
A<int [37][47]> a1;
A<float, 100> a;
This is useful in specifying the size of an internal data structure, in this example a vector of
float[100]. The size could also be specified via a constructor, but in that case the size would
not be known at compile time and therefore dynamic storage would have to be used to allocate
the vector.
The address of an external object can be used:
template <char* cp> struct A { /* ... */ };
char c;
A<&c> a;
or you can use the address of a function:
template <void (*fp)(int)> struct A { /* ... */ };
void f(int) {}
A<f> a;
This latter case might be useful if you want to pass in a pointer to a function to be used
internally within the template, for example, to compare elements of a vector.
Some other kinds of constructs are not permitted as arguments:
- a constant expression of floating type
- local types
String<char> x;
The "template <>" notation is fairly new and may not yet be implemented in your local
compiler.
This sequence is a bit different from:
template <class T> class String {
// stuff
};
String<char> x;
In this second case, the default implementation of String is used, whereas in the specialization
case, the programmer overrides the default template and provides an implementation of
String<char>.
For a function template, a specialization would be defined as:
template <class T> void f(T) {}
int i = f(12.34);
};
};
A<int> a;
At instantiation time, the template formal parameter T is assigned the type value "int".
Instantiation is done based on need -- the generated class A<int> will not be instantiated
unless it has first been referenced or otherwise used.
The actual process of instantiation is done in various ways, for example during the link phase
of producing an executable program. But it is possible to explicitly force instantiation to occur
in a file. For example:
template <class T> class A {
T x;
void f();
};
template <class T> void A<T>::f() {}
void f()
{
A<double> a;
int i = a.x;
}
int main()
{
f();
return 0;
}
In this example, the function f() gains access to the private members of all instantiated classes
that come from the template A, such as A<double>, A<char**>, and so on.
In a similar way, a whole class can be granted access to private members of a template:
template <class T> class A {
int x;
friend class B;
};
class B {
public:
void f();
};
void B::f()
{
A<short> a;
int i = a.x;
}
int main()
{
B b;
b.f();
return 0;
}
Here, class B is a friend of template A, and so all of B's members can access the private
members of A<short>.
In an earlier issue, we talked about member templates. With this feature additional
combinations of friends and templates are possible.
Java and all Java-based marks are trademarks or registered trademarks of Sun Microsystems,
Inc. in the United States and other countries. Throughout this document, when the Java
trademark appears alone it is a reference to the Java programming language. When the JDK
trademark appears alone it is a reference to the Java Development Kit.
Use of Static
THE MEANING OF "STATIC"
Someone asked about the meaning of the term "static" in C++. This particular term is perhaps
the most overworked one in the language. It's both a descriptive word and a C++ keyword that
is used in various ways.
"static" as a descriptive term refers to the lifetime of C++ memory or storage locations. There
are several types of storage:
- static
- dynamic (heap)
- auto (stack)
A typical storage layout scheme will have the following arrangement, from lowest to highest
virtual memory address:
text (program code)
heap
stack
with the heap and stack growing toward each other. The C++ draft standard does not mandate
this arrangement, and this example is only an illustration of one way of doing it.
Static storage thus refers to memory locations that persist for the life of the program; global
variables are static. Stack storage comes and goes as functions are called ("stack frames"), and
heap storage is allocated and deallocated using operators new and delete. Note that usage like:
void f()
{
static int x = 37;
}
also refers to storage that persists throughout the program, even though x cannot be used
outside of f() to refer to that storage.
So we might say that "static" as a descriptive term is used to describe the lifetime of memory
locations. static can also be used to describe the visibility of objects. For example:
static int x = 37;
int A::x = 0;
void A::f() {}
A static data member like A::x is shared across all object instances of A. That is, if I define
two object instances:
A a1;
A a2;
then they have the same x but y is different between them. A static data member is useful to
share information between object instances. For example, in issue #010 we talked about using
a specialized allocator on a per-class basis to allocate memory for object instances, and a static
member "freelist" was used as part of the implementation of this scheme.
A static function member, such as A::f(), can be used to provide utility functions to a class.
For example, with a class representing calendar dates, a function that tells whether a given
year is a leap year might best be represented as a static function. The function is related to the
operation of the class but doesn't operate on particular object instances (actual calendar dates)
of the class. Such a function could be made global, but it's cleaner to have the function as part
of the Date package:
class Date {
static int is_leap(int year); // use bool if available
public:
// stuff
};
In this example, is_leap() is private to Date and can only be used within member functions of
Date, instead of by the whole program.
static meaning "local to a file" has been devalued somewhat by the introduction of C++
namespaces; the draft standard states that use of static is deprecated for objects in namespace
scope. For example, saying:
static int x;
static void f() {}
is equivalent to:
namespace {
int x;
void f() {}
}
That is, an unnamed namespace is used to wrap the static declarations. All unnamed
namespaces in a single source file (translation unit) are part of the same namespace and differ
from similar namespaces in other translation units.
void f()
{
static A a;
}
This object has a constructor that must be called at some point. But we can't call the
constructor each time that f() is called, because the object is static, that is, exists for the life of
the program, and should be constructed only once. The draft standard says that such an object
should be constructed once, the first time execution passes through its declaration.
This might be implemented internally by a compiler as:
void f()
{
static int __first = 1;
static A a;
if (__first) {
a.A::A(); // conceptual, not legal syntax
__first = 0;
}
// other processing
}
If f() is never called, then the object will not be constructed. If it is constructed, it must be
destructed when the program terminates.
Mutable
In C++ it's possible to have a class object instance that is constant and cannot be modified by
the program, once initially set up. For example:
class A {
public:
int x;
A();
};
const A a;
void g()
{
a.x = 37;
}
is illegal. In a similar way, invoking a non-const member function on a const object is also
illegal:
class A {
public:
int x;
A();
void f();
};
const A a;
void g()
{
a.f();
}
The reason for this latter prohibition is separate compilation. A::f() may be defined in
some other translation unit, and there's no way of knowing whether it modifies the object upon
which it operates.
It is possible to define const member functions:
void f() const;
that are allowed to operate on a const object instance. Such a function does not modify the
instance it operates on. The type of the "this" pointer for a class T is normally:
T *const this;
meaning that the pointer cannot be changed. Within a const member function, the type is:
const T *const this;
meaning that neither the pointer nor the pointed-at object instance can be modified.
Recently a new feature has been added to C++ to selectively allow individual class data
members to be modified even for a const object instance, and lessen the need for casting away
of const. For example:
class A {
public:
mutable int x;
A();
};
const A a;
void f()
{
a.x = 37;
}
This says that "x" can be modified even though it's a member of a const object instance.
How useful "mutable" turns out to be remains to be seen. One cited example for its use is
within classes whose object instances appear constant but actually do change their state
internally. For example:
class Box {
double xll, yll; // lower left X,Y
double xur, yur; // upper right X,Y
double a; // cached area
public:
double area() const
{
a = (xur - xll) * (yur - yll);
return a;
}
Box(double x1, double y1, double x2, double y2) :
xll(x1), yll(y1), xur(x2), yur(y2)
{
}
};
void f()
{
b.area();
}
which is illegal usage unless we instead say:
class Box {
double xll, yll; // lower left X,Y
double xur, yur; // upper right X,Y
mutable double a; // cached area
public:
double area() const
{
a = (xur - xll) * (yur - yll);
return a;
}
Box(double x1, double y1, double x2, double y2) :
xll(x1), yll(y1), xur(x2), yur(y2)
{
}
};
void f()
{
b.area();
}
Explicit
In C++ it is possible to declare constructors for a class, taking a single parameter, and use
those constructors for doing type conversion. For example:
class A {
public:
A(int);
};
void f(A) {}
void g()
{
A a1 = 37;
A a2 = A(47);
A a3(57);
a1 = 67;
f(77);
}
A declaration like:
A a1 = 37;
says to call the A(int) constructor to create an A object from the integer value. Such a
constructor is called a "converting constructor".
However, this type of implicit conversion can be confusing, and there is a way of disabling it,
using a new keyword "explicit" in the constructor declaration:
class A {
public:
explicit A(int);
};
void f(A) {}
void g()
{
A a1 = 37; // illegal
A a2 = A(47); // OK
A a3(57); // OK
a1 = 67; // illegal
f(77); // illegal
}
Using the explicit keyword, a constructor is declared to be
"nonconverting", and explicit constructor syntax is required:
class A {
public:
explicit A(int);
};
void f(A) {}
void g()
{
A a1 = A(37);
A a2 = A(47);
A a3(57);
a1 = A(67);
f(A(77));
}
Note that an expression such as:
A(47)
is closely related to function-style casts supported by C++. For example:
double d = 12.34;
int i = int(d);
int main()
{
vector<int> v;
random_shuffle(v.begin(), v.end());
return 0;
}
When run, this program produces output like:
6 11 9 23 18 12 17 24 20 15 4 22 10 5 1 19 13 3 14 16 0 8 21 2 7
There's quite a bit to say about this example. In the first place,
STL is divided into three logical parts:
- containers
- iterators
- algorithms
Containers are data structures such as vectors. They are implemented as templates, meaning
that a container can hold any type of data element. In the example above, we have
"vector<int>", or a vector of integers.
Iterators can be viewed as pointers to elements within a container.
Algorithms are functions (function templates actually) that operate on data in containers.
Algorithms have no special knowledge of the types of data on which they operate, meaning
that an algorithm is generic in its application.
We include header files for the STL features that we want to use. Note that the headers have
no ".h" on them. This is a new feature in which the .h for standard headers is dropped.
The next line of interest is:
using namespace std;
We discussed namespaces in earlier newsletter issues. This statement means that the names in
namespace "std" should be made available to the program. Standard libraries use std to avoid
the problem mentioned earlier where library elements (like functions or class names) conflict
with names found in other libraries.
The line:
vector<int> v;
declares a vector of integers, and then:
for (int i = 0; i < 25; i++)
v.push_back(i);
adds the numbers 0-24 to the vector, using the push_back() member function.
Actual shuffling is done with the line:
random_shuffle(v.begin(), v.end());
where v.begin() and v.end() are iterator arguments that delimit the extent of the list to be
shuffled.
Finally, we display the shuffled list of integers, using an overloaded operator[] on the vector:
int main()
{
vector<int> v;
random_shuffle(v.begin(), v.end());
return 0;
}
With lists, we can't use [] to index the list, nor is random_shuffle() supported for lists. So we
make do with:
#include <list>
#include <algorithm>
#include <iostream>
int main()
{
list<int> v;
//random_shuffle(v.begin(), v.end());
return 0;
}
where we add elements to the list, and then simply retrieve the element at the front of the list,
print it, and pop it off the list.
Finally, we present a hybrid using deques. random_shuffle() can be used with these, because
they have properties of vectors. But we can also use list operations like front() and
pop_front():
#include <algorithm>
#include <iostream>
#include <deque>
int main()
{
deque<int> v;
random_shuffle(v.begin(), v.end());
return 0;
}
Which of vectors, lists, and deques you should use depends on the application, of course. There
are several additional container types that we'll be looking at in future issues, including stacks
and queues. It's also possible to define your own container types.
The performance of operations on these structures is defined in the standard, and can be relied
upon when designing for portability.
int main()
{
typedef set<int, less<int> > SetInt;
//typedef multiset<int, less<int> > SetInt;
SetInt s;
return 0;
}
This example is for set, but the usage for multiset is almost identical. The first item to consider
is the line:
typedef set<int, less<int> > SetInt;
This establishes a type "SetInt", which is a set operating on ints, and which uses the template
"less<int>" defined in <functional> to order the keys of the set. In other words, set takes two
type arguments, the first of which is the underlying type of the set, and the second a template
class that defines how ordering is to be done in the set.
Next, we use insert() to insert keys in the set. Note that we attempt to insert some duplicate
keys, for example "4"; a set ignores the duplicates, while a multiset keeps them.
Then we establish an iterator pointing at the beginning of the set, and iterate over the elements,
outputting each in turn. The code for multiset is identical save for the typedef declaration.
The output for set is:
0 1 2 3 4 5 6 7 8 9 10 12 14 16 18
and for multiset:
0 0 1 2 2 3 4 4 5 6 6 7 8 8 9 10 12 14 16 18
STL also provides bitsets, which are packed arrays of binary values. These are not the same as
"vector<bool>", which is a vector of Booleans.
int main()
{
MAP counter;
char buf[256];
MAP::iterator it = counter.begin();
return 0;
}
This is a short but somewhat tricky example. We first set up a typedef for:
map<string, long, less<string> >
which is a map template with three template arguments. The first is the type of the key, in this
example a string. The second is the value associated with the key, in this case a long integer
used as a counter. Finally, because the keys of the map are maintained in sorted order, we
provide a template comparison function (see issue #016 for another example of this).
Another typedef we establish but do not use in this simple example is the VAL type, which is
a template of type "pair<string,long>". pair is used internally within STL, and in this case is
used to represent a map element key/value pair. So VAL represents an element in the map.
We then read lines of input and insert each word into the map. The statement:
counter[buf]++;
does several things. First of all, buf is a char*, not a string, and must be converted via a
constructor. What we've said is equivalent to:
counter[string(buf)]++;
operator[] is overloaded for maps, and in this case the key is used to look up the element, and
return a long&, that is, a reference to the underlying value. This value is then incremented (it
started at zero).
Finally, we iterate over the map entries, using an iterator. Note that with older library
implementations:
(*it).first
cannot be replaced by:
it->first
because those iterators overload "*" but not "->"; a standard-conforming library accepts both
forms. When * is applied to "it", it returns a pair<const string,long> object, that is, the
underlying type of elements in the map. We then reference "first" and "second", fields in pair,
to retrieve keys and values for output.
For input:
a
b
c
a
b
output is:
a2
b2
c1
There are some complex ideas here, but map is a very powerful feature worth mastering.
int main()
{
bitset<16> b1("1011011110001011");
bitset<16> b2;
b2 = ~b1;
return 0;
}
A declaration like:
bitset<16> b1("1011011110001011");
declares a set of 16 bits, and initializes the value of the set to the specified bits.
We then operate on the bit set, in this example performing a bitwise NOT operation, that is,
toggling all the bits. The result of this operation is stored in b2.
Finally, we iterate over b2 and display all the bits. b2.size() returns the number of bits in the
set, and the [] operator is overloaded to provide access to individual bits.
There are other operations possible on bit sets, for example the flip() function to toggle an
individual bit.
#include <stack>
#include <list>
int main()
{
stack<int, list<int> > stk;
while (!stk.empty()) {
cout << stk.top() << endl;
stk.pop();
}
return 0;
}
We declare the stack, specifying the underlying type (int), and the sort of list used to represent
the stack (list<int>).
We then use push() to push items on the stack, top() to retrieve the value of the top item on the
stack, and pop() to pop items off the stack. empty() is used to determine whether the stack is
empty or not.
We will move on to other aspects of STL in future issues. Data structures not yet discussed
include queues and priority_queues. A queue is something like a stack, except that it's first-in-first-out
instead of last-in-first-out.
int main()
{
int arr[N];
arr[50] = 37;
int main()
{
vector<int> iv(N);
iv[50] = 37;
subject. There's another way to write the example we presented before, using a couple of STL
iterator functions:
#include <algorithm>
#include <iterator>
#include <vector>
#include <iostream>
int main()
{
vector<int> iv(N);
iv[50] = 37;
iv[52] = 47;
advance(iter, 2);
cout << "value = " << *iter << "\n";
}
The function distance() computes the distance between two iterator values. In this example,
we know that we're starting at "iv.begin()", the beginning of the integer vector. And we've
found a match at "iter", and so we can use distance() to compute the distance between these,
and display this result. Note that more recently distance() has been changed to work more like
a regular function, with the beginning and ending arguments supplied and the difference
returned as the result of the function:
d = distance(iv.begin(), iter);
A similar issue comes up with advancing an iterator. For example, it's possible to use "++" for
this, but cumbersome when you wish to advance the iterator a large value. Instead of ++,
advance() can be used to advance the iterator a specified number of positions. In the example
above, we move the iterator forward 2 positions, and then display the value stored in the
vector at that location.
These functions provide an alternative way of manipulating iterators that depends less on
pointer arithmetic.
class String {
char* str;
public:
String()
{
str = 0;
}
String(char* s)
{
str = strdup(s);
assert(str);
}
int operator<(const String& s) const
{
return strcmp(str, s.str) < 0;
}
operator char*()
{
return str;
}
};
int main()
{
int i, j;
vector<String> v;
random_shuffle(v.begin(), v.end());
sort(v.begin(), v.end());
return 0;
}
This String class provides a thin layer over char* pointers. It is provided for illustrative
purposes rather than as a model of how to write a good String class.
We first build a vector of String objects by iterating over the char* list, calling a String
constructor for each entry in turn. Then we shuffle the list, display it, and then sort it by
calling sort() with a couple of iterator parameters v.begin() and v.end(). Output looks like:
phi delta beta theta omega alpha rho gamma epsilon
alpha beta delta epsilon gamma omega phi rho theta
There are a couple of things to note about this example. If we commented out the operator<
function, the example would still compile, and the < comparison would be done by converting
both Strings to char* using the conversion function we supplied. Comparing actual pointers,
that is, comparing addresses, is probably not going to work, except by chance in a case where
the list of char* is already in sorted order.
Also, sort() is not stable, which means that the relative order of duplicate items is not guaranteed to be preserved.
"stable_sort" can be used if this property is desired.
In the next few issues, we'll be looking at some of the other algorithms found in STL.
int main()
{
int i = 0;
return 0;
}
In the first case, we want to copy the contents of "a" to "b". We specify a couple of iterators
"a" and "a + 10" to describe the region to be copied, and another iterator "b" that describes the
beginning of the destination region. In the second example, we do a similar thing, except we
copy backwards starting with the ending iterator. copy_backward() is important when source
and destination overlap. In the third example, we copy a vector to itself, sort of a "rolling"
copy. The results of running this program are:
1 2 3 4 5 6 7 8 9 10 0 0 0
1 2 3 1 2 3 4 5 6 7 8 9 10
1 2 3 1 2 3 1 2 3 1 2 3 1
As with previous examples, we could replace primitive arrays with vector<int> types, and use
begin() and end() as higher-level iterator mechanisms.
Copying is a low-level, efficient operation. It does no checking while copying, so, for
example, if the destination array is too small, then copying will run off the end of the array.
int vec3[10];
int main()
{
int i = 0;
return 0;
}
In both cases we replace all values of "10" in the vectors with the value "20". replace_copy()
is like replace(), except that the replacing is not done in place, but instead is sent to a specified
location described by an iterator (in this case, "vec3").
The output of this program is:
1 2 20 5 9 20 3 2 7 20
1 2 10 5 9 10 3 2 7 10
1 2 20 5 9 20 3 2 7 20
A more general form of replacement uses replace_if(), along with a specified predicate
template instance:
#include <algorithm>
#include <iostream>
int main()
{
int i = 0;
return 0;
}
In this example, is_odd<T> is a class template that is used to determine whether a value of
type T is even or odd. The constructor call, is_odd<int>(), creates an object instance of the
template where T is "int". replace_if() calls operator() of the template object to evaluate
whether a given value should be replaced.
This program replaces odd values with the value 59. Output is:
59 2 10 59 59 10 59 2 59 10
There is also replace_copy_if(), which combines replace_copy() and replace_if() functions.
int vec1[10];
int vec2[10];
int main()
{
fill(vec1, vec1 + 10, -1);
for (int i = 0; i < 10; i++)
cout << vec1[i] << " ";
cout << endl;
fill_n(vec2, 5, -1);
for (int j = 0; j < 10; j++)
cout << vec2[j] << " ";
cout << endl;
return 0;
}
fill() fills according to the specified iterator range, while fill_n() fills a specified number of
locations based on a starting iterator and a count. The results of running this program are:
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 0 0 0 0 0
int main()
{
int sum = accumulate(vec, vec + 5, 0);
return 0;
}
In this example, we specify iterators for a vector of integers, along with an initial value (0) for
doing the summation.
By default, the "+" operator is applied to the values in turn. Other operators can be used, for
example "*" in the second example. In this case the starting value is 1 rather than 0.
int main()
{
vector<int>::iterator last =
set_union(set1, set1 + 3, set2, set2 + 3, first);
return 0;
}
In the example we set up two ordered sets of numbers, and then take their union. We specify
two pairs of iterators to delimit the input sets, along with an output iterator.
Algorithms are provided for taking union, intersection, difference, and for determining
whether one set of elements is a subset of another set.
Exception Handling
INTRODUCTION TO EXCEPTION HANDLING PART 1 - A SIMPLE EXAMPLE
In this and subsequent issues we will be discussing some aspects of C++ exception handling.
To start this discussion, let's consider a simple example. Suppose that you are writing a
program to manipulate calendar dates, and want to check whether a given year is in the 20th
century (ignoring the issue of whether the 21st century starts in 2000 or 2001!).
Using exceptions, one way to do this might be:
#include <iostream>
using namespace std;
class DateException {
const char* err; // points at a string literal; not owned
public:
DateException(const char* s) {err = s;}
void print() const {cerr << err << endl;}
};
{
if (date < 1900)
throw DateException("date < 1900");
if (date > 1999)
throw DateException("date > 1999");
// process date ...
}
int main()
{
try {
f();
}
catch (const DateException& de) {
de.print();
return 1;
}
return 0;
}
The basic idea here is that we have a try block:
try {
f();
}
Within this block, we execute some code, in this case a function call f(). Then we have a list of
one or more handlers:
catch (const DateException& de) {
de.print();
return 1;
}
If an abnormal condition arises in the code, we can throw an exception:
if (date < 1900)
throw DateException("date < 1900");
and have it caught by one of the handlers at an outer level, that is, execution will continue at
the point of the handler, with the execution stack unwound.
An exception may be a class object type such as DateException, or a fundamental C++ type
like an integer. Obviously, a class object type can store and convey more information about
the nature of the exception, as illustrated in this example. Saying:
throw -37;
will indeed throw an exception, which may be caught somewhere, but this idiom is not
particularly useful.
return 0;
}
where "..." will catch any exception type.
We will explore various details of exception handling in future issues, but one general
comment is in order. C++ exceptions are not the same as low-level hardware interrupts, nor
are they the same as UNIX signals such as SIGTERM. And there's no linkage between
exceptions such as divide by zero (which may be a low-level machine exception) and C++
exceptions.
void g()
{
try { // try block
f();
}
catch (int i) { // handler or catch clause
}
}
In this example the exception with value 37 is thrown, and control passes to the handler. A
throw transfers control to the nearest handler with the appropriate type. "Nearest" means in the
sense of stack frames and try blocks that have been dynamically entered.
Typically an exception that is thrown is of class type rather than a simple constant like "37".
Throwing a class object instance allows for more sophisticated usage such as conveying
additional information about the nature of an exception.
A class object instance that is thrown is treated similarly to a function argument or operand in
a return statement. A temporary copy of the instance may be made at the throw point, just as
temporaries are sometimes used with function argument passing. A copy constructor if any is
used to initialize the temporary, with the class's destructor used to destruct the temporary. The
temporary persists as long as there is a handler being executed for the given exception. As in
other parts of the C++ language, some compilers may be able in some cases to eliminate the
temporary.
An example:
#include <iostream.h>
class Exc {
const char* s;
public:
Exc(const char* e) {s = e; cerr << "ctor called\n";}
Exc(const Exc& e) {s = e.s; cerr << "copy ctor called\n";}
~Exc() {cerr << "dtor called\n";}
const char* geterr() const {return s;}
};
// other processing
}
int main()
{
try {
check_date(1879);
}
catch (const Exc& e) {
cerr << "exception was: " << e.geterr() << "\n";
}
return 0;
}
If you run this program, you can trace through the various stages of throwing the exception,
including the actual throw, making a temporary copy of the class instance, and the invocation
of the destructor on the temporary.
class A {
int x;
public:
A(int i) {x = i; cerr << "ctor " << x << endl;}
~A() {cerr << "dtor " << x << endl;}
};
void f()
{
A a1(1);
A a2(2);
}
int main()
{
try {
A a3(3);
f();
A a4(4);
}
catch (const char* s) {
cerr << "exception: " << s << endl;
}
return 0;
}
Output of this program is:
ctor 3
ctor 1
dtor 1
dtor 3
exception: this is a test
In this example, we enter the try block in main(), allocate a3, then call f(). f() allocates a1, then
throws an exception, which will transfer control to the catch clause in main().
In this example, the a1 and a3 objects have their destructors called. a2 and a4 do not, because
they were never allocated.
It's possible to have class objects containing other class objects, or arrays of class objects, with
partial construction taking place followed by an exception being thrown. In this case, only the
constructed subobjects will be destructed.
or:
catch (T& x) {
// stuff
}
or:
catch (const T& x) {
// stuff
}
will catch a thrown exception of type E, given that:
- T and E are the same type, or
class A {};
void f()
{
throw B();
}
int main()
{
try {
f();
}
catch (const A& x) {
cout << "exception caught" << endl;
}
return 0;
}
because A is a public base class of B. Handlers are tried in order of appearance. If, for
example, you place a handler for a derived class after a handler for a corresponding base class,
it will never be invoked. If we had a handler for B after A, in the example above, it would not
be called.
A handler like:
catch (...) {
// stuff
}
appearing as the last handler in a series, will match any exception type.
If no handler is found, the search for a matching handler continues in a dynamically
surrounding try block. If no handler is found at all, a special library function terminate() is
called, typically ending the program.
An exception is considered caught by a handler when the parameters to the handler have been
initialized, and considered finished when the handler exits.
In the next issue we'll talk a bit about exception specifications, which are used to specify what
exception types a function may throw.
int main()
{
try {
f();
}
catch (char* s) {
}
return 0;
}
What will happen? An exception of type "int" is thrown, but there is no handler for it. In this
case, a special function terminate() is called. terminate() is called whenever the exception
handling mechanism cannot find a handler for a thrown exception. terminate() is also called in
a couple of odd cases, for example when an exception occurs in the middle of throwing
another exception.
terminate() is a library function which by default aborts the program. You can override
terminate if you want:
#include <iostream.h>
#include <stdlib.h>
typedef void (*PFV)(); // pointer to a function taking no arguments and returning void
PFV set_terminate(PFV);
void t()
{
cerr << "terminate() called" << endl;
exit(1);
}
void f()
{
throw -37;
}
int main()
{
set_terminate(t);
try {
f();
}
catch (char* s) {
}
return 0;
}
Note that this area is in a state of flux as far as compiler adoption of new features is
concerned. For example, terminate() should really be "std::terminate()", and the declarations may
be found in a header file "<exception>". But not all compilers have this yet, and these examples
are written using an older, no-longer-standard convention.
In a similar way, a call to the unexpected() function can be triggered by saying:
#include <iostream.h>
#include <stdlib.h>
typedef void (*PFV)(); // pointer to a function taking no arguments and returning void
PFV set_unexpected(PFV);
void u()
{
cerr << "unexpected() called" << endl;
exit(1);
}
int main()
{
set_unexpected(u);
try {
f();
}
catch (int i) {
}
return 0;
}
unexpected() is called when a function with an exception specification throws an exception of
a type not listed in the exception specification for the function. In this example, f()'s exception
specification is:
throw(char*)
A function declaration without such a specification may throw any type of exception, and one
with:
throw()
is not allowed to throw exceptions at all. By default unexpected() calls terminate(), but in
certain cases where the user has defined their own version of unexpected(), execution can
continue.
There is also a brand-new library function:
bool uncaught_exception();
that is true from the time after completion of the evaluation of the object to be thrown until
completion of the initialization of the exception declaration in the matching handler. For
example, this would be true during stack unwinding (see newsletter #017). If this function
returns true, then you don't want to throw an exception, because doing so would cause
terminate() to be called.
Placement New/Delete
In C++, operators new/delete mostly replace the use of malloc() and free() in C. For example:
class A {
public:
A();
~A();
};
A* p = new A;
...
delete p;
allocates storage for an A object and arranges for its constructor to be called, later followed by
invocation of the destructor and freeing of the storage. You can use the standard new/delete
functions in the library, or define your own globally and/or on a per-class basis.
Alloc allocator;
...
...
A* p = new (allocator) A;
If you do this, then you need to define your own new function, like this:
void* operator new(size_t s, Alloc& a)
{
// stuff
}
The first parameter is always of type "size_t" (an unsigned integer type), and any additional
parameters are then listed. In this example, the "a" instance of Alloc might be examined to
determine what strategy to use to allocate space. A similar approach can be used for operator
new[] used for arrays.
This feature has been around for a while. A relatively new feature that goes along with it is
placement delete. If an exception is thrown during object initialization in a placement new
expression (for example, from the constructor of the class object being created), then a
matching placement delete call is made, with the same additional argument values that were
passed to placement new. In the example above, a matching function would be:
void operator delete(void* p, Alloc& a)
{
// stuff
}
With new, the first parameter is always "size_t", and with delete, always "void*". So
"matching" in this instance means all other parameters match. "a" would have the value as was
passed to new earlier.
Here's a simple example:
int flag = 0;
class A {
public:
A() {throw -37;}
};
int main()
{
try {
A* p = new (1234) A;
}
catch (int i) {
}
if (flag == 0)
return 1;
else
return 0;
}
Placement delete may not be in your local C++ compiler as yet. In compilers without this
feature, memory will leak. Note also that you can't call overloaded operator delete directly via
the operator syntax; you'd have to code it as a regular function call.
void f(int i)
{
printf("%d\n", i);
}
int main()
{
fp p = &f;
p(37);
}
and are employed in a variety of ways, for example to specify a comparison function to a
library function like qsort().
In C++, pointers can be similarly used, but there are a couple of quirks to consider. We will
discuss two of them in this section, and another one in the next section.
The first point to mention is that C++ has C-style functions in it, but also has other types of
functions, notably member functions. For example:
class A {
public:
void f(int);
};
In this example, A::f(int) is a member function. That is, it operates on object instances of class
A, and the function itself has a "this" pointer that points at the instance in question.
Because C++ is a strongly typed language, it is desirable that a pointer to a member function
be treated differently than a pointer to a C-style function, and that a pointer to a function
member of class A be distinguished from a pointer to a member of class B. To do this, we can
say:
#include <iostream.h>
class A {
public:
void f(int i) {cout << "value is: " << i << "\n";}
};
typedef void (A::*pmfA)(int); // pointer to a member function of A taking an int
pmfA x = &A::f;
int main()
{
A a;
A* p = &a;
(p->*x)(37);
}
Note the notation for actually calling the member function.
It is not possible to intermix such a type with other pointer types, so for example:
void f(int) {}
pmfA x = &f;
is invalid.
class A {
public:
static void g(int);
};
fp p = &A::g;
is treated like a C-style function. A static function has no "this" pointer and does not operate
on actual object instances.
Pointers to members are typically implemented just like C function pointers, but there is an
issue with their implementation in cases where inheritance is used. In such a case, you have to
worry about computing offsets of subobjects, and so on, when calling a member function, and
for this purpose a runtime structure similar to a virtual table used for virtual functions is used.
It's also possible to have pointers to data members of a class, with the pointer representing an
offset into a class instance. For example:
#include <iostream.h>
class A {
public:
int x;
};
int main()
{
A a;
A* p = &a;
a.x = 37;
used just like a function pointer in C. But according to the standard (section 7.5), such a
pointer in fact has a different type.
For example, consider:
extern "C" typedef void (*fp1)(int);
Type Identification
A relatively new feature in C++ is type identification, where it is possible to determine the
type of an object at run time. A simple example of this feature is:
#include <typeinfo.h>
#include <stdio.h>
class A {
public:
virtual void f(int) {}
};
class B : public A {
public:
virtual void f(int) {}
};
int main()
{
A a;
B b;
A* ap1 = &a;
A* ap2 = &b;
if (typeid(*ap1) == typeid(A))
printf("ap1 is A\n");
else
printf("ap1 is B\n");
if (typeid(*ap2) == typeid(A))
printf("ap2 is A\n");
else
printf("ap2 is B\n");
return 0;
}
which produces:
ap1 is A
ap2 is B
even though the nominal type of both *ap1 and *ap2 is A. In this example, *ap1 and *ap2
represent polymorphic types, that is, types that can refer to any class type in a hierarchy of
derivations. If we omitted the virtual functions in A and B, this program would give different
results, considering both *ap1 and *ap2 to be referencing A objects.
typeid() produces an object of the standard type "type_info", described in typeinfo.h (or just "typeinfo" in
newer systems). This type has operations for testing for equality, and also a member function
for returning the name of a type. For example, when this code is executed:
#include <typeinfo.h>
#include <stdio.h>
int main()
{
int i;
double x[57];
float f1 = 0.0;
const float f2 = 0.0;
printf("%s\n", typeid(i).name());
printf("%s\n", typeid(x).name());
if (typeid(f1) == typeid(f2))
printf("equal\n");
return 0;
}
the result is:
int
double [57]
equal
Note that the typeid() comparison ignores top-level "const". The form of the name returned by
name() is implementation-dependent.
This feature of C++ is quite important, because it represents a partial departure from early
binding, that is, fully resolving names at compile time. Sometimes it's necessary to be able to
manipulate type names in a running program. A more recent language like Java(tm) has many
more features of this type.
Dynamic Casts
In the last issue we discussed runtime type identification, a mechanism for obtaining the type
of an object during program execution. There is another aspect of this that we need to mention,
the dynamic_cast<> feature. If we have an expression:
dynamic_cast<T*>(p)
then this operator converts its operand p to the type T*, if *p really is a T or a class derived
from T; otherwise, the operator returns 0.
What does this mean in practice? Suppose that you have a pointer or reference to a base class,
and you want to know whether you "really" have a base class pointer, or instead a pointer to
some class object derived from the base class. In this case, you can say:
#include <typeinfo.h>
#include <iostream.h>
class A {
public:
virtual void f() {cout << "A::f\n";}
};
class B : public A {
public:
virtual void f() {cout << "B::f\n";}
};
void f(A* p)
{
B* bp = dynamic_cast<B*>(p);
if (bp)
bp->f();
}
int main()
{
A* ap = new A();
B* bp = new B();
f(ap);
f(bp);
return 0;
}
Here we have a program that creates A and B objects, and passes pointers to them to a
function f(). f() checks whether p is really a pointer to a B, and if so, calls B::f().
Note that we could use the technique shown in the last issue if all we want to do is check the
type. But there are advantages to combining the check and the cast. One is that a combined
operator makes it difficult to mismatch the test and the cast. Another advantage is that a static
cast, for example (B*)p in place of the dynamic_cast above, doesn't always give the
correct result. That is, it relies on static information and doesn't know whether a base class
pointer "really" points at a derived object instance.
void g()
{
f<double>(37);
}
It used to be that you'd have to use all the template parameter types (like T) in the declaration
of the template, but this is no longer required. In this example, T is declared via the <>
specification to be of type double, and the actual function parameter is of course an int.
One possible application for the feature is the ability to specify what a template's return type
should be:
template <class T, class U, class V> V max(T t, U u)
{
if (t > u)
return V(t);
else
return V(u);
}
void g()
{
int i = max<double,double,int>(12.34, 43.21);
}
independent of reliance on the template function arguments.
int main()
{
FILE* fp = fopen("test.txt", "r");
assert(fp != NULL);
return 0;
}
If the argument to assert() is false (zero), the program terminates by calling the library
function abort(), and gives a diagnostic as to the file and line of the error. In this example, an
error like:
Assertion failed: fp != NULL, file x2.c, line 8
Abnormal program termination
comes out. Note that we could shorten the test to:
assert(fp);
which is equivalent to testing fp != NULL.
assert() is useful for "should never happen" kinds of errors, or for quick prototyping. It's not
really suitable as a primary tool for giving end-user error messages.
Another error-reporting tool is <cerrno>. This has antecedents in the UNIX operating system,
where a system call would return -1 on failure, and set a global variable "errno" to a number
giving a more precise indication of what failed.
An example of using this technique is:
#include <stdio.h>
#include <errno.h>
#include <iostream.h>
#include <string.h>
int main()
{
errno = 0;
if (fp == NULL)
cout << strerror(errno) << endl;
return 0;
}
errno is reset, and then an fopen() call made, which will ultimately invoke a system call open()
to open a file. If fopen() returns NULL, errno can be queried to find out what exactly went
wrong. strerror() is used to retrieve the text of the various error message codes represented by
errno.
In this example, the output is:
No such file or directory
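This mechanism is useful in obtaining detail about errors, but you need to be careful to reset
errno each time. Also, on older systems errno is a single global variable and therefore not
thread-safe; the current C and C++ standards make it thread-local.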
int main()
{
char buf[25];
strcpy(buf, "testing");
printf("%s\n", buf);
}
This approach works pretty well and is efficient, but is quite low-level and prone to errors.
A newer facility is C++-style strings. A simple example, that reads from standard input and
writes each line to standard output, after reversing the characters in the line, looks like this:
#include <iostream>
#include <string>
int main()
{
string instr;
return 0;
}
Note in this example that >> and << are overloaded for I/O, that [] is used to index individual
characters, that += is used to concatenate an individual character to a string, and that there's no
need to worry about memory management.
The string class is based on a template "basic_string" that provides string operations, and is
defined as:
typedef basic_string<char> string;
But you don't need to worry about this unless you really want to make sophisticated use of
string facilities. Note also that string is defined in the "std" namespace, so it must either be
written as std::string or be brought into scope with a using declaration or using directive.
Strings have value semantics, meaning that a copy is done when one string is assigned to
another. So, for example, the output of this program:
#include <iostream>
#include <string>
int main()
{
string s1 = "abc";
string s2 = s1;
s1 = "def";
return 0;
}
is "abc" and not "def".
Another example, illustrating some additional features of string, is one that replaces "abc" in
input lines with "ABC", and writes the result to standard output:
#include <iostream>
#include <string>
#include <cstdio>
int main()
{
string str;
return 0;
}
find() attempts to find a substring in the string, and returns its index if found. "string::npos" is
a special value that indicates the search failed. Strings have the property:
length() < npos
If the search succeeds, we replace "abc" with "ABC". We then output the value using C-style
I/O, as an illustration of how a C++ string can be converted to a C one using c_str().
C++ strings offer a higher-level abstraction than C-style ones, and are preferred in most cases.
int main()
{
cout << numeric_limits<long>::digits << endl;
cout << numeric_limits<double>::max_exponent10 << endl;
return 0;
}
prints "31" and "308" when using Borland C++ 5.0 on a PC. 31 is the number of non-sign
digits in a long, and 308 the maximum base-10 exponent of a double. Properties that do not
make sense for a type (such as max_exponent10 for int) are given default values.
The set of properties that is available will vary based on the underlying type. For example,
floating-point types have information available on exponents, infinity, and so on.
Some of the common properties are illustrated by another example:
#include <iostream>
#include <limits>
int main()
{
cout << numeric_limits<short>::is_integer << endl;
cout << numeric_limits<short>::min() << endl;
cout << numeric_limits<short>::max() << endl;
return 0;
}
which checks whether the type (short) is an integral type, and obtains the minimum (-32768)
and maximum (32767) values for the type.
If you define your own custom numeric type, it's a good idea to specialize numeric_limits for
that type. For example, suppose that I have a type "LongLong" that is twice the length of a
long. I might say something like:
#include <iostream>
#include <limits>
template <> class numeric_limits<LongLong> {
public:
inline static LongLong min() throw() {/* ... */}
inline static LongLong max() throw() {/* ... */}
};
int main()
{
cout << numeric_limits<LongLong>::min() << endl;
cout << numeric_limits<LongLong>::max() << endl;
return 0;
}
and define the appropriate members suitable for this numeric type.
...
This says that two basic flavors of new() are supported (there are also other ones such as new[]
for use with arrays). The first throws a bad_alloc exception if it can't allocate memory, while
the second simply returns 0. The second flavor would be used like this:
int* ip = new (nothrow) int[10000]; // never throws an exception
if (ip == 0)
// allocation error
In other words, it's a syntactic variant of placement new() as described in issue #019.
This approach allows for error-handling strategies that do not use exceptions. Whether such
strategies are "good" depends a lot on the particular application in question.
The function abort() terminates a C++ program, without executing destructors for objects with
automatic or static storage duration, and without calling functions registered with atexit().
atexit() is used to register functions that are to be called when the program terminates.
The effect of exit() is to call the destructors for all static objects, in the reverse order of their
construction (automatic objects are not destructed). This last-in-first-out process also
incorporates functions registered with atexit(), such that a function registered with atexit()
after a static object is constructed, will be called before that static object is destructed.
After this process is complete, all open C streams are flushed and closed, files created with
tmpfile() are removed, and an exit status is passed back to the calling system.
Some aspects of all the above are a little tricky, but it's important to understand the complete
process of invocation and termination. Sometimes there are important parts of an application
(such as stream I/O) that depend on the underlying mechanics of program startup and
shutdown.
logic_error
length_error (invalid length)
domain_error (domain error)
out_of_range (argument out of range)
invalid_argument (invalid argument)
runtime_error
range_error (out of range in internal computation)
overflow_error (overflow error)
underflow_error (underflow error)
arranged in a corresponding class hierarchy.
An example of using these exceptions would be the following program:
#include <iostream>
#include <stdexcept>
{
if (x < 0)
throw invalid_argument("x < 0");
if (y < 0)
throw invalid_argument("y < 0");
}
int main()
{
try {
f(37, -59);
}
catch (const exception& e) {
cout << e.what() << endl;
}
return 0;
}
The thrown exception is caught in main(). If it had not been, the program would terminate.
You can of course create your own exception classes, derived or not from "exception". But it's
worth knowing about and using standard exceptions wherever possible.
This is a very simple idea, but a quite useful one. For example, consider the problem of
returning two values from a function, such as the minimum and maximum of a set of values:
#include <algorithm>
#include <utility>
#include <iostream>
int main()
{
int vec[] = {1, 19, 2, 14, -5, 59, 67, -37, 100, 47};
cout << p.first << " " << p.second << endl;
return 0;
}
minmax() takes a T* vector argument and a vector size, and returns the minimum and
maximum values of the vector, using the library functions min_element() and max_element().
The values are passed back in a pair<T,T> structure.
Pair is used in the standard library, for example to represent the (key,value) pair within the
map container.
std::complex<double>
std::complex<long double>
All the usual operations on complex types are provided by this template. For example, a
simple program that multiplies two complex numbers is:
#include <complex>
#include <iostream>
int main()
{
ComplexDouble a(1.0, 2.0);
ComplexDouble b(3.0, 5.0);
return 0;
}
which takes the product of (1.0,2.0) and (3.0,5.0), yielding (-7.0,11.0).
Complex does not do any special error checking for domain or range errors, beyond that
provided by underlying operations such as sqrt().
class A {
int x;
public:
A() {printf("A::A %lx\n", (unsigned long)this);}
~A() {printf("A::~A %lx\n", (unsigned long)this);}
};
return vp;
}
int main()
{
A* ap = new A[10];
delete [] ap;
return 0;
}
This example redefines operator new[]() and operator delete[](), and they are invoked when
the program is executed.
When operator new[]() is called, it is passed an argument indicating how many bytes are
required for the total array. In this example, approximately 40 bytes are needed for the 10
array slots (this will vary from system to system, with overhead for each chunk of space
allocated).
In the example above, the actual bytes are allocated via a call to operator new(), that is, the
non-array version is called to allocate the bytes. operator delete[]() works in a similar way.
Note that the C++ standard specifies that the size of the array is saved, so that when it is
deleted, the system will know how many slots to iterate across to call the destructors for
individual objects.
Typical output of the program is:
allocated size = 44
allocated pointer = 7b2514
A::A 7b2518
A::A 7b251c
A::A 7b2520
A::A 7b2524
A::A 7b2528
A::A 7b252c
A::A 7b2530
A::A 7b2534
A::A 7b2538
A::A 7b253c
A::~A 7b253c
A::~A 7b2538
A::~A 7b2534
A::~A 7b2530
A::~A 7b252c
A::~A 7b2528
A::~A 7b2524
A::~A 7b2520
A::~A 7b251c
A::~A 7b2518
returned pointer = 7b2514
Note that objects are constructed and then destructed in LIFO (last-in first-out) order. Also,
note that we used C-style I/O instead of stream I/O to print out information. Why is this? If
stream I/O is used here, the program will crash with a popular compiler, probably because at
the first call to operator new[](), the I/O system is not initialized as yet (the call to new in this
case is presumably to obtain a buffer to initialize the system). So you need to be very careful
in overloading the global versions of new and delete.
It's also possible to define operator new[]() and operator delete[]() on a per-class basis.
Considering this feature and the one described in the next section, there are six varieties each
of new and delete:
regular + throws exception
space
horizontal tab
vertical tab
form feed
newline
or 96 characters in all. These are the characters used to compose a C++ source program.
Some national character sets, such as the European ISO-646 one, use some of these character
positions for other letters. The ASCII characters so affected are:
[]{}|\
To get around this problem, C++ defines trigraph sequences that can be used to represent these
characters:
[ ??(
] ??)
{ ??<
} ??>
| ??!
\ ??/
# ??=
^ ??'
~ ??-
Trigraph sequences are mapped to the corresponding basic source character early in the
compilation process.
C++ also has the notion of "alternative tokens", that can be used to replace tokens with others.
The list of tokens and their alternatives is this:
{ <%
} %>
[ <:
] :>
# %:
## %:%:
&& and
| bitor
|| or
^ xor
~ compl
& bitand
&= and_eq
|= or_eq
^= xor_eq
! not
!= not_eq
Another idea is the "basic execution character set". This includes all of the basic source
character set, plus control characters for alert, backspace, carriage return, and null. The
"execution character set" is the basic execution character set plus additional implementation-
defined characters. The idea is that a source character set is used to define a C++ program
itself, while an execution character set is used when a C++ application is executing.
Given this notion, it's possible to manipulate additional characters in a running program, for
example characters from Cyrillic or Greek. Character constants can be expressed using any of:
\137 octal
\xabcd hexadecimal
L'\u2345'
The above features may not yet exist in your local C++ compiler. They are important to
consider when developing internationalized applications.
Allocators
In previous issues, we've looked at some of the standard containers (such as vector) found in
the C++ Standard Library. One of the interesting issues that comes up is how such containers
manage memory. It turns out that containers use what is called a standard allocator, defined in
<memory>.
To see a bit of how this works, we will devise a custom allocator:
#include <vector>
#include <cstddef>
#include <iostream>
void destroy(pointer p)
{
p->~T();
}
return p;
}
operator delete(p);
}
};
This allocator overrides the default allocator for vector, and is specified by saying:
vector<int, alloc<int> > v;
The allocator template establishes a series of standard internal types such as "pointer". The
real work gets done in allocate() and deallocate(), which are used to allocate N objects of type
T. The standard allocator uses operator new() and operator delete() to actually
allocate/deallocate memory.
Normally you don't need to worry too much about this area, but sometimes for reasons of
speed and space you may wish to construct your own allocator for use with standard
containers. For example, allocators can be written that are efficient for very small objects, or
that use shared memory, or that use memory from pre-allocated pools of objects.
C++ as a Better C
• Function Prototypes
• References
• Operator New/Delete
• Declaration Statements
• Function Overloading
• Operator Overloading
• Inline Functions
• Type Names
• External Linkage
• General Initializers
• Jumping Past Initialization
• Function Parameter Names
• Character Types and Arrays
• Function-style Casts
• Bit Field Types
• Anonymous Unions
• Empty Classes
• Hiding Names
Function Prototypes
People often ask about how to get started with C++ or move a project or development team to
the language. There are many answers to this question. One of the simplest and best is to begin
using C++ as a "better C". This term doesn't have a precise meaning but can be illustrated via
a series of examples. We will cover some of these examples in forthcoming issues of the
newsletter.
One simple but important area of difference between C and C++ deals with the area of
function definition and invocation. In older versions of C ("Classic C"), functions would be
defined in this way:
f(s)
char* s;
{
return 0;
}
The return type of this function is implicitly "int", and the function has no prototype. In ANSI
C and in C++, a similar definition would be:
int f(char* s)
{
return 0;
}
Why does this matter? Well, suppose that you call the function with this invocation:
f(s)
char* s;
{
return 0;
}
g()
{
f(23);
}
In Classic C, this would be a serious programming error, because a value of integer type (23)
is being passed to a function expecting a character pointer. However, the error would not be
flagged by the compiler, and the result would be a runtime failure such as a crash. By contrast,
in ANSI C and in C++ the compiler would flag such usage.
Very occasionally, you want to cheat, and actually pass a value like
23 as a character pointer. To do this, you can say:
f((char*)23);
Such usage is typically only seen in very low level systems programming.
Using function prototypes in C++ is a big step forward from Classic C; this approach will
eliminate a large class of errors in which the wrong number or types of arguments are passed
to a function.
References
In the last newsletter we discussed using function prototypes in C++ to eliminate a common
type of error encountered in C, that of calling a function with the wrong number or types of
arguments. Another C++ feature that can be used to reduce programming errors is known as
references.
int i;
int& ir = i;
ir is another name for i. To see how references are useful, and also how they're implemented,
consider writing a function that has two return values to pass back. In ANSI C, we might say:
void f(int a, int b, int* sum, int* prod)
{
*sum = a + b;
*prod = a * b;
}
void g()
{
int s;
int p;
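To see the two styles side by side, here is a minimal sketch of the same two-result function written once with pointers and once with references. The names f_ptr, f_ref, and check_both are assumptions made here to keep the versions apart. With references the caller no longer writes & at the call site, and the function body no longer dereferences.

```cpp
#include <cassert>

// Pointer version, as in ANSI C: the caller passes addresses explicitly.
void f_ptr(int a, int b, int* sum, int* prod)
{
    *sum = a + b;
    *prod = a * b;
}

// Reference version: sum and prod are other names for the caller's variables.
void f_ref(int a, int b, int& sum, int& prod)
{
    sum = a + b;
    prod = a * b;
}

// Hypothetical caller showing that both call styles give the same results.
void check_both(int a, int b)
{
    int s1, p1, s2, p2;
    f_ptr(a, b, &s1, &p1);
    f_ref(a, b, s2, p2);
    assert(s1 == s2);
    assert(p1 == p2);
}
```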
Operator New/Delete
In the first newsletter we talked about using C++ as a better C. This term doesn't have a
precise meaning, but one way of looking at it is to focus on the features C++ adds to C,
exclusive of the most obvious one, namely the class concept used in object-oriented
programming.
One of these features is operator new and operator delete. These are intended to replace
malloc() and free() in the C standard library. To give an example of how these are similar and
how they differ, suppose that we want to allocate a 100-long vector of integers for some
purpose. In C, we would say:
int* ip;
ip = (int*)malloc(sizeof(int) * 100);
...
free((void*)ip);
With new/delete in C++, we would have:
int* ip;
ip = new int[100];
...
delete ip;
The most obvious difference is that the C++ approach takes care of the low-level details
necessary to determine how many bytes to allocate. With the C++ new operator, you simply
describe the type of the desired storage, in this example "int[100]".
The C and C++ approaches have several similarities:
- neither malloc() nor new initializes the space to zeros
new_handler set_new_handler(new_handler);
void f()
{
printf("new handler invoked due to new failure\n");
exit(1);
}
main()
{
float* p;
set_new_handler(f);
for (;;)
p = new float[5000]; // something that will
// fail eventually
return 0;
}
A new handler is a way of establishing a hook from the C++ standard library to a user
program. set_new_handler() is a library function that records a pointer to another function that
is to be called in the event of a new failure.
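With a current compiler the same hook is reached through the standard header <new>. The sketch below assumes C++11 or later (std::get_new_handler was added then). The function names on_out_of_memory and install_handler are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <new>

// Called by operator new when an allocation cannot be satisfied.
void on_out_of_memory()
{
    std::fputs("new handler invoked due to new failure\n", stderr);
    std::abort();   // give up; a real handler might release a reserve instead
}

// Install the handler and return whichever handler was installed before.
std::new_handler install_handler()
{
    std::new_handler previous = std::set_new_handler(on_out_of_memory);
    assert(std::get_new_handler() == on_out_of_memory);   // C++11 query
    return previous;
}
```

Returning the previous handler lets a caller restore it later with another call to std::set_new_handler().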
It is possible to define your own new and delete functions. For example:
void* operator new(size_t s)
{
// allocate and align storage of size s
(clarification of above)
In the previous issue of the newsletter, there was an example:
int* ip;
ip = new int[100];
delete ip;
This code will work with many compilers, but it should instead read:
int* ip;
ip = new int[100];
delete [] ip;
This is an area of C++ that has changed several times in recent years. There are a number of
issues to note. The first is that new and delete in C++ have more than one function. The new
operator allocates storage, just like malloc() in C, but it is also responsible for calling the
constructor for any class object that is being allocated. For example, if we have a String class,
saying:
String* p = new String("xxx");
will allocate space for a String object, and then call the constructor to initialize the String
object to the value "xxx". In a similar way, the delete operator arranges for the destructor to be
called for an object, and then the space is deallocated in a manner similar to the C function
free().
If we have an array of class objects, as in:
String* p = new String[100];
then a constructor must be called for each array slot, since each is a class object. Typically this
processing is handled by a C++ internal library function that iterates over the array.
In a similar way, deallocation of an array of class objects can be done by saying:
delete [] p;
It used to be that you had to say:
delete [100] p;
but this feature is obsolete. The size of the array is recovered by the library function that
implements the delete operator for arrays. The pointer/size pair can be stored in an auxiliary
data structure or the size can be stored in the allocated block before the first actual byte of
data.
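The constructor and destructor calls made by new[] and delete [] can be observed directly. This is a minimal sketch using a hypothetical class Counted that keeps a count of live objects; it is not taken from any library.

```cpp
#include <cassert>

// Hypothetical class that counts how many of its objects are alive.
class Counted {
public:
    static int live;
    Counted() { ++live; }
    ~Counted() { --live; }
};

int Counted::live = 0;

// Allocate n objects with new[], note the count, then free with delete [].
// Returns how many objects were created by the array allocation.
int live_during_array(int n)
{
    assert(n >= 0);
    int before = Counted::live;
    Counted* p = new Counted[n];        // runs n constructors
    int during = Counted::live - before;
    delete [] p;                        // runs n destructors, then frees storage
    assert(Counted::live == before);    // every element was destroyed
    return during;
}
```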
What makes this a bit tricky is that all of this work of calling constructors and destructors
doesn't matter for fundamental data types like int:
int* ip;
ip = new int[100];
delete ip;
This code will work in many cases, because there are no destructors to call, and deleting a
block of storage works pretty much the same whether it's treated as an array of ints or a single
large chunk of bytes.
But more recently, the ANSI standardization committee has decided to break out the new and
delete operators for arrays as separate functions, so that a program can control the allocation of
arrays separately from other types. For example, you can say:
void* operator new(unsigned int) {/* ... */ return 0;}
void f()
{
int* ip;
Declaration Statements
In C, when you write a function, all the declarations of local variables must appear at the top
of the function or at the beginning of a block:
void f()
{
int x;
/* ... */
while (x) {
int y;
/* ... */
}
}
Each such variable has a lifetime that corresponds to the lifetime of the block it's declared in.
So in this example, x is accessible throughout the whole function, and y is accessible inside
the while loop.
In C++, declarations of this type are not required to appear only at the top of the function or
block. They can appear wherever C++ statements are allowed:
class A {
public:
A(double);
};
void f()
{
int x;
/* ... */
while (x) {
/* ... */
}
int y;
y = x + 5;
/* ... */
A aobj(12.34);
}
and so on. Such a construction is called a "declaration statement". The lifetime of a variable
declared in this way is from the point of declaration to the end of the block.
/* i no longer available */
In this example the scope of i is the for statement. The rule about the scope of such variables
has changed fairly recently as part of the ANSI standardization process, so your compiler may
have different behavior.
Why are declaration statements useful? One benefit is that introducing variables with shorter
lifetimes tends to reduce errors. You've probably encountered very large functions in C or C++
where a single variable declared at the top of the function is used and reused over and over for
different purposes. With the C++ feature described here, you can introduce variables only
when they're needed.
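A minimal sketch of the style, assuming a compiler that follows the standard scoping rule for variables declared in a for statement. The function name sum_to is hypothetical.

```cpp
#include <cassert>

// Sum the integers 1..n, declaring each variable at its first use.
int sum_to(int n)
{
    assert(n >= 0);
    int total = 0;                  // declared where it is first needed
    for (int i = 1; i <= n; i++)    // i exists only inside the for statement
        total += i;
    // i is no longer available here
    return total;
}
```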
Function Overloading
Suppose that you are writing some software to manipulate calendar dates, and you wish to
allow a user of the software to specify dates in one of two forms:
8, 4, 1964 (as a triple of numbers)
void f(long) {}
is fine.
A common place where function overloading is seen is in constructors for a class. For
example, we might have:
class Date {
...
public:
Date(int m, int d, int y);
Date(char*);
};
to represent a calendar date. Two constructors, representing the two ways of creating a date
object (from a triple of numbers and from a string) are specified.
class String {
...
public:
String();
String(char*);
String(char);
};
Here we have three constructors, the first to create a null String and the second to create a
String from a char*. The third constructor creates a String from an individual character, so that
for example 'x' turns into a String "x".
What happens if you declare a String object like this:
String s(37);
Clearly, the first String constructor won't be called, because it takes no arguments. And 37
isn't a valid char*, so the second constructor won't be used. That leaves String(char), but 37 is
an int and not a char. The third constructor will indeed be called, after 37 is demoted from an
int to a char.
In this case, the user "got away" with doing things this way, though it's not clear what was
intended. Usage like:
String s(12345);
is even more problematic, because 12345 cannot be converted to a char in any meaningful
way.
The process of determining which function should be called is known as "argument
matching", and it's one of the most difficult aspects of C++ to understand. Function
overloading is powerful, but it's smart to use it in a way that makes clear which function will
be called when.
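One way to find out which overload argument matching selects is to have each candidate report itself. The sketch below uses two hypothetical free functions named which, standing in for the String(char) and String(char*) constructors.

```cpp
#include <cassert>
#include <string>

// Two hypothetical overloads that report which one was selected.
std::string which(char)        { return "char"; }
std::string which(const char*) { return "c-string"; }

// 37 is an int: it converts to char but not to const char*,
// so argument matching picks which(char).
void check_which()
{
    assert(which(37) == "char");
    assert(which("abc") == "c-string");
    assert(which('x') == "char");
}
```

Note that a literal 0 would be ambiguous here, since it converts equally well to a char and to a null pointer.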
Operator Overloading
Suppose that you are using an enumeration and you wish to output its value:
enum E {e = 37};
cout << e;
37 will indeed be output, by virtue of the enumerator value being promoted to an int and then
output using the operator<<(int) function found in iostream.h.
But what if you're interested in actually seeing the enumerator values in symbolic form? One
approach to this would be as follows:
#include <iostream.h>
case e2:
s = "e2";
break;
case e3:
s = "e3";
break;
default:
s = "badvalue";
break;
}
return os << s;
}
main()
{
enum E x;
x = e3;
return 0;
}
In the last output statement, we created an invalid enumerator value and then output it.
Operator overloading in C++ is very powerful but can be abused. It's quite possible to create a
system of operators such that it is difficult to know what is going on with a particular piece of
code.
Some uses of overloaded operators, such as [] for array indexing with subscript checking, ->
for smart pointers, or + - * / for doing arithmetic on complex numbers, can make sense, while
other uses may not.
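For reference, here is a self-contained sketch of symbolic enumerator output using the standard headers <ostream> and <sstream> rather than iostream.h. The enumerator values and the helper to_text are assumptions made for illustration.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

enum E { e1 = 27, e2 = 37, e3 = 47 };   // values are illustrative

// Output an E symbolically; unknown values print as "badvalue".
std::ostream& operator<<(std::ostream& os, E e)
{
    const char* s;
    switch (e) {
    case e1: s = "e1"; break;
    case e2: s = "e2"; break;
    case e3: s = "e3"; break;
    default: s = "badvalue"; break;
    }
    return os << s;
}

// Hypothetical helper that formats an E into a string via the operator.
std::string to_text(E e)
{
    std::ostringstream out;
    out << e;
    return out.str();
}

void check_to_text()
{
    assert(to_text(e3) == "e3");
    assert(to_text((E)0) == "badvalue");   // 0 names no enumerator
}
```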
Inline Functions
Suppose that you wish to write a function in C to compute the maximum of two numbers. One
way would be to say:
int max(int a, int b)
{
return (a > b ? a : b);
}
But calling a frequently-used function can be a bit slow, and so you instead use a macro:
#define max(a, b) ((a) > (b) ? (a) : (b))
The extra parentheses are required to handle cases like:
max(a = b, c = d)
This approach can work pretty well. But it is error-prone due to the extra parentheses and also
because of side effects like:
max(a++, b++)
An alternative in C++ is to use inline functions:
inline int max(int a, int b)
{
return (a > b ? a : b);
}
Such a function is written just like a regular C or C++ function. But it IS a function and not
simply a macro; macros don't really obey the rules of C++ and therefore can introduce
problems. Note also that one could use C++ templates to write this function, with the
argument types generalized to any numerical type.
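A sketch of the template form follows. The name larger_of is hypothetical, chosen to avoid clashing with std::max or with a max() macro. The check function shows that, unlike the macro, each argument expression is evaluated exactly once.

```cpp
#include <cassert>

// Generic maximum for any type that supports operator>.
template <class T>
inline T larger_of(T a, T b)
{
    return (a > b ? a : b);
}

void check_larger_of()
{
    int a = 1;
    int b = 5;
    int m = larger_of(a++, b++);    // a and b are each incremented once
    assert(m == 5);
    assert(a == 2 && b == 6);
    assert(larger_of(2.5, 1.5) == 2.5);
}
```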
If an inline function is a member function of a C++ class, there are a couple of ways to write
it:
class A {
public:
void f() { /* stuff */ } // "inline" not needed
};
or:
class A {
public:
inline void f();
};
Inlining tends to blow up the size of code, because the function is expanded at each point of
call. The one exception to this rule would be a very small inline function, such as one used to
access a private data member:
class A {
int x;
public:
int getx() {return x;}
};
which is likely to be both faster and smaller than its non-inline counterpart.
A simple rule of thumb when doing development is not to use inline functions initially. After
development is mostly complete, you can profile the program to see where the bottlenecks are
and then change functions to inlines as appropriate.
Here's a complete program that uses inline functions as part of an implementation of bit maps.
Bit maps are useful in storing true/false values efficiently. Note that in a couple of places we
could use the new bool fundamental type in place of ints. Also note that this implementation
assumes that chars are 8 bits in width; there's no fundamental reason they have to be (in
Java(tm) the Unicode character set is used and chars are 16 bits).
This example runs about 50% faster with inlines enabled.
#include <assert.h>
#include <stdlib.h>
#include <string.h>
//#define inline
class Bitmap {
typedef unsigned long UL; // type of specified bit num
UL len; // number of bits
unsigned char* p; // pointer to the bits
UL size(); // figure out bitmap size
public:
Bitmap(UL); // constructor
~Bitmap(); // destructor
void set(UL); // set a bit
void clear(UL); // clear a bit
int test(UL); // test a bit
void clearall(); // clear all bits
};
// constructor
inline Bitmap::Bitmap(UL n)
{
// destructor
inline Bitmap::~Bitmap()
{
delete [] p;
}
// set a bit
inline void Bitmap::set(UL bn)
{
assert(bn < len);
p[bn / 8] |= (1 << (bn % 8));
}
// clear a bit
inline void Bitmap::clear(UL bn)
{
assert(bn < len);
p[bn / 8] &= ~(1 << (bn % 8));
}
#ifdef DRIVER
main()
{
const unsigned long N = 123456L;
int i;
long j;
int k;
int r;
bm.clearall();
for (j = 0; j < N; j++)
assert(!bm.test(j));
k = 1000;
while (k-- > 0) {
r = rand() & 0xffff;
bm.set(r);
assert(bm.test(r));
bm.clear(r);
assert(!bm.test(r));
}
}
return 0;
}
#endif
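For readers who want something to compile directly, here is a compact sketch of a bit map along the same lines, with the member functions defined inside the class so that they are implicitly inline. The bodies of the constructor, size(), test(), and clearall() are assumptions, as is the decision to forbid copying.

```cpp
#include <cassert>
#include <cstring>

class Bitmap {
    typedef unsigned long UL;
    UL len;                 // number of bits
    unsigned char* p;       // pointer to the bits
    UL size() const { return (len + 7) / 8; }   // bytes needed for len bits
    Bitmap(const Bitmap&);              // copying is not supported in this sketch
    Bitmap& operator=(const Bitmap&);
public:
    explicit Bitmap(UL n) : len(n), p(new unsigned char[(n + 7) / 8])
    {
        clearall();
    }
    ~Bitmap() { delete [] p; }
    void set(UL bn)
    {
        assert(bn < len);
        p[bn / 8] |= (unsigned char)(1u << (bn % 8));
    }
    void clear(UL bn)
    {
        assert(bn < len);
        p[bn / 8] &= (unsigned char)~(1u << (bn % 8));
    }
    int test(UL bn) const
    {
        assert(bn < len);
        return (p[bn / 8] >> (bn % 8)) & 1;
    }
    void clearall() { std::memset(p, 0, size()); }
};
```

Like the original, this assumes 8-bit chars.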
Java and all Java-based marks are trademarks or registered trademarks of Sun Microsystems,
Inc. in the United States and other countries. Throughout this document, when the Java
trademark appears alone it is a reference to the Java programming language. When the JDK
trademark appears alone it is a reference to the Java Development Kit.
Type Names
In C, a common style of usage is to say:
struct A {
int x;
};
typedef struct A A;
after which A can be used as a type name to declare objects:
void f()
{
A a;
}
In C++, classes, structs, unions, and enum names are automatically type names, so you can
say:
struct A {
int x;
};
void f()
{
A a;
}
or:
enum E {ee};
void f()
{
E e;
}
By using the typedef trick you can follow a style of programming in C somewhat like that
used in C++.
But there is a quirk or two when using C++. Consider usage like:
struct A {
int x;
};
int A;
void f()
{
A a;
}
This is illegal because the int declaration A hides the struct declaration. The struct A can still
be used, however, by specifying it via an "elaborated type specifier":
struct A
The same applies to other type names:
class A a;
union U u;
enum E e;
Taking advantage of this feature, that is, giving a class type and a variable or function the
same name, isn't very good usage. It's supported for compatibility reasons with old C code; C
puts structure tags (names) into a separate namespace, but C++ does not. Terms like "struct
compatibility hack" and "1.5 namespace rule" are sometimes used to describe this feature.
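A minimal sketch of the elaborated type specifier at work. The function use_both and the value 5 are made up for illustration.

```cpp
#include <cassert>

struct A {
    int x;
};

int A = 5;              // hides the struct name A in this scope

// "struct A" still names the type, while plain "A" now means the int.
int use_both()
{
    struct A a;         // elaborated type specifier
    a.x = 10;
    return a.x + A;     // A here is the int variable
}

void check_use_both()
{
    assert(use_both() == 15);
}
```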
External Linkage
One of the common issues that always comes up with programming languages is how to mix
code written in one language with code written in another.
For example, suppose that you're writing C++ code and wish to call C functions. A common
case of this would be to access C functions that manipulate C-style strings, for example
strcmp() or strlen(). So as a first try, we might say:
extern size_t strlen(const char*);
and then use the function. This will work, at least at compile time, but will probably give a
link error about an unresolved symbol.
The reason for the link error is that a typical C++ compiler will modify the name of a function
or object ("mangle" it), for example to include information about the types of the arguments.
As an example, a common scheme for mangling the function name strlen(const char*) would
result in:
strlen__FPCc
There are two purposes for this mangling. One is to support function overloading. For
example, the following two functions cannot both be called "f" in the object file symbol table:
int f(int);
int f(double);
But suppose that overloading was not an issue, and in one compilation unit we have:
extern void f(double);
and we use this function, and its name in the object file is just "f". And suppose that in another
compilation unit the definition is found, as:
void f(char*) {}
This will silently do the wrong thing -- a double will be passed to a function requiring a char*.
Mangling the names of functions eliminates this problem, because a linker error will instead
be triggered. This technique goes by the name "type safe linkage".
So to be able to call C functions, we need to disable name mangling. The way of doing this is
to say:
extern "C" size_t strlen(const char*);
or:
extern "C" {
size_t strlen(const char*);
int strcmp(const char*, const char*);
}
This usage is commonly seen in header files that are used both by C and C++ programs. The
extern "C" declarations are conditional based on whether C++ is being compiled instead of C.
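A minimal sketch of such a conditional header follows, using the __cplusplus macro that C++ compilers predefine. The function add_ints is hypothetical; its definition is included here only so that the fragment stands alone.

```cpp
#include <cassert>

/* Header portion: usable from both C and C++ translation units. */
#ifdef __cplusplus
extern "C" {
#endif

int add_ints(int a, int b);     /* hypothetical C-callable function */

#ifdef __cplusplus
}
#endif

/* Definition: it takes C linkage from the declaration above. */
int add_ints(int a, int b)
{
    return a + b;
}

void check_add_ints()
{
    assert(add_ints(2, 3) == 5);
}
```

A C compiler never sees the extern "C" lines, and a C++ compiler sees them wrapped around the same declarations.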
Because name mangling is disabled with a declaration of this type, usage like:
extern "C" {
int f(int);
int f(double);
}
is illegal (because both functions would have the name "f").
Note that extern "C" declarations do not specify the details of what must be done to allow C++
and C code to be mixed. Name mangling is commonly part of the problem to be solved, but
only part.
There are other issues with mixing languages that are beyond the scope of this presentation.
The whole area of calling conventions, such as the order of argument passing, is a tricky one.
For example, if every C++ compiler used the same mangling scheme for names, this would
not necessarily result in object code that could be mixed and matched.
General Initializers
In C, usage like:
int f() {return 37;}
int i = 47;
int j;
for global variables is legal. Typically, in an object file and an executable program these types
of declarations might be lumped into sections with names like "text", "data", and "bss",
meaning "program code", "data with an initializer", and "data with no initializer".
When a program is loaded by the operating system for execution, a common scheme will have
the text and data stored within the binary file on disk that represents the program, and the bss
section simply stored as an entry in a symbol table and created and zeroed dynamically when
the program is loaded.
There are variations on this scheme, such as shared libraries, that are not our concern here.
Rather, we want to discuss the workings of an extension that C++ makes to this scheme,
namely general initializers for globals. For example, I can say:
int f() {return 37;}
int i = 47;
int j = f() + i;
In some simple cases a clever compiler can compute the value that should go into j, but in
general such values are not computable at compile time. Note also that sequences like:
class A {
public:
A();
~A();
};
A a;
are legal, with the global "a" object constructed before the program "really" starts, and
destructed "after" the program terminates.
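Both kinds of startup work can be seen in a short sketch. The class Registry and the variable initial_count are hypothetical. Within a single translation unit the dynamic initializers run in the order in which the objects are defined.

```cpp
#include <cassert>

int initial_count = 0;          // constant initializer, as in C

// Hypothetical class whose constructor runs before main() starts.
class Registry {
public:
    int entries;
    Registry() : entries(initial_count + 1) {}
};

int f() { return 37; }
int i = 47;
int j = f() + i;                // general initializer, computed at run time
Registry registry;              // constructed during program startup

// By the time this is called from main(), both objects are ready.
void check_startup()
{
    assert(j == 84);
    assert(registry.entries == 1);
}
```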
Since values cannot be computed at compile time, they must be computed at run time. How is
this done? One way is to generate a dummy function per object file:
int f() {return 37;}
int i = 47;
int j; // = f() + i;
(objects that persist for the life of the program) are zeroed, then constant initializers are
applied (as in C), then dynamic general initializers are applied "before the first use of a
function or object defined in that translation unit".
Calling the function abort() defined in the standard library will terminate the program without
destructors for global static objects being called. Note that some libraries, for example stream
I/O, rely on destruction of global class objects as a hook for flushing I/O buffers. You should
not rely on any particular order of initialization of global objects, and using a startup()
function called from main(), just as in C, still can make sense as a program structuring
mechanism for initializing global objects.
Jumping Past Initialization
int main()
{
goto xxx;
{
int x = 0;
xxx:
printf("%d\n", x);
}
return 0;
}
With one compiler, compiling and executing this program as C code results in a value of 512
being printed, that is, garbage is output. Thus the restriction makes sense.
The use of goto statements is best avoided except in carefully structured situations such as
jumping to the end of a block. Jumping over initializations can also occur with switch/case
statements.
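For the switch/case situation, one common way to stay within the rule is to give the initialized variable its own block. This is a sketch with a hypothetical function classify.

```cpp
#include <cassert>

// The braces give x its own block, so no case label can jump past
// its initialization.
int classify(int code)
{
    int result = 0;
    switch (code) {
    case 1: {
        int x = 10;         // initialized; scope ends at the closing brace
        result = x + code;
        break;
    }
    case 2:
        result = 2;         // x is not in scope here, so nothing is bypassed
        break;
    default:
        result = -1;
        break;
    }
    return result;
}

void check_classify()
{
    assert(classify(1) == 11);
    assert(classify(2) == 2);
    assert(classify(9) == -1);
}
```

Without the inner braces, the jump to "case 2:" would bypass the initialization of x, and a C++ compiler would reject the program.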
Function Parameter Names
C++ has a feature that allows you to simply omit the parameter name:
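A minimal sketch of the feature, assuming a callback signature that is fixed by some interface. The names handler_t, count_event, and dispatch are all hypothetical.

```cpp
#include <cassert>

// A callback signature fixed by some interface (hypothetical here).
typedef int (*handler_t)(int event, void* context);

// This handler ignores its second argument, so the name is omitted and
// the compiler has no unused parameter to warn about.
int count_event(int event, void*)
{
    return event + 1;
}

// Invoke a handler through the common signature.
int dispatch(handler_t h, int event)
{
    assert(h != 0);
    return h(event, 0);
}
```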
int main()
{
printf("%d\n", (int)sizeof('x'));
return 0;
}
If this program is compiled as ANSI C, then the value printed will be sizeof(int), typically 2
on PCs and 4 on workstations. If the program is treated as C++, then the printed value will be
sizeof(char), defined by the draft ANSI/ISO standard to be 1. So the type of a char constant in
C is int, whereas the type in C++ is char. Note that it's possible to have sizeof(char) ==
sizeof(int) for a given machine architecture, though not very likely.
Another difference is illustrated by this example:
#include <stdio.h>
int main()
{
printf("%s\n", buf);
return 0;
}
This is legal C, but invalid C++. The string literal requires a trailing \0 terminator, and there is
not enough room in the character array for it. C accepts the initialization anyway and drops the
terminator, so you access the resulting array at your own risk. Without the terminating null
character, a function like printf() may not work correctly, and the program may not even terminate.
Function-style Casts
In C and C++ (and Java(tm)), you can cast one object type to another by usage like:
double d = 12.34;
int i = (int)d;
Casting in this way gets around type system checking. It may introduce problems such as loss
of precision, but is useful in some cases.
In C++ it's possible to employ a different style of casting using a functional notation:
double d = 12.34;
int i = int(d);
This example achieves the same end as the previous one.
The type of a cast using this notation is limited. For example, saying:
unsigned long*** p = unsigned long***(0);
is invalid, and would need to be replaced by:
typedef unsigned long*** T;
T p = T(0);
or by the old style:
unsigned long*** p = (unsigned long***)0;
Casting using functional notation is closely tied in with constructor calls. For example:
class A {
public:
A();
A(int);
};
void f()
{
A a;
a = A(37);
}
causes an A object local to f() to be created via the default constructor. Then this object is
assigned the result of constructing an A object with 37 as its argument. In this example there is
both a cast (of sorts) and a constructor call. If we want to split hairs, a perhaps more
appropriate technical name for this style of casting is "explicit type conversion".
It is also possible to have usage like:
void f()
{
int i;
i = int();
}
If this example used a class type with a default constructor, then the constructor would be
called both for the declaration and the assignment. But for a fundamental type, a call like int()
results in a zero value of the given type. In other words, i gets the value 0.
The reason for this feature is to support generality when templates are used. There may be a
template such as:
template <class T> class A {
void f()
{
T t = T();
}
};
and it's desirable that the template work with any sort of type argument.
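The same idea can be sketched as a free function template. The name default_value is hypothetical.

```cpp
#include <cassert>
#include <string>

// Returns T(): zero for fundamental types, and the default-constructed
// object for class types.
template <class T>
T default_value()
{
    return T();
}

void check_default_value()
{
    assert(default_value<int>() == 0);
    assert(default_value<double>() == 0.0);
    assert(default_value<std::string>().empty());
}
```

The template body is identical for int and for std::string, which is exactly the generality the T() notation is meant to provide.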
Note that there are also casts of the form "static_cast<T>" and so on, which we will discuss in
a future issue.
Bit Field Types
Here's a small difference between C and C++. In ANSI C, bit fields must be of type "int",
"signed int", or "unsigned int". In C++, they may be of any integral or enumeration type, for example:
enum E {e1, e2, e3};
class A {
public:
int x : 5;
unsigned char y : 8;
E z : 5;
};
This extension was added in order to allow bit field values to be passed to functions expecting
a particular type, for example:
void f(E e)
{
}
void g()
{
A a;
a.z = e3;
f(a.z);
}
Note that even with this relaxation of C rules, bit fields can be problematic to use. There are
no pointers or references to bit fields in C++, and the layout and size of fields is tricky and not
necessarily portable.
Anonymous Unions
Here's a simple one. In C++ this usage is legal:
struct A {
union {
int x;
double y;
char* z;
};
};
whereas in C you'd have to say:
struct A {
union {
int x;
double y;
char* z;
} u;
};
giving the union a name. With the C++ approach, you can treat the union members as though
they were members of the enclosing struct.
Of course, the members still belong to the union, meaning that they share memory space and
only one is active at a given time.
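A short sketch of member access through an anonymous union follows. The function store_and_read is hypothetical.

```cpp
#include <cassert>

struct A {
    union {             // anonymous: members are accessed as a.x, a.y, a.z
        int x;
        double y;
        char* z;
    };
};

// Store into one member and read the same member back.
int store_and_read(int value)
{
    assert(sizeof(A) >= sizeof(double));    // members share one block of storage
    A a;
    a.x = value;        // no intermediate name such as a.u.x is needed
    return a.x;
}
```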
Empty Classes
Here's a simple one. In C, an empty struct like:
struct A {};
is invalid, whereas in C++ usage like:
struct A {};
or:
class B {};
is perfectly legal. This type of construct is useful when developing a skeleton or placeholder
for a class.
An empty class has size greater than zero. Two class objects of empty classes will have
distinct addresses, as in:
class A {};
void f()
{
A* p1 = new A;
A* p2 = new A;
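Both properties can be checked in a few lines. This sketch uses a hypothetical class named Empty rather than the A of the example.

```cpp
#include <cassert>

class Empty {};

// Two distinct objects of an empty class never share an address.
bool distinct_addresses()
{
    Empty* p1 = new Empty;
    Empty* p2 = new Empty;
    bool differ = (p1 != p2);
    delete p1;
    delete p2;
    return differ;
}

void check_empty()
{
    assert(sizeof(Empty) >= 1);     // size is greater than zero
    assert(distinct_addresses());
}
```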
Hiding Names
int xxx[10];
int main()
{
struct xxx {
int a;
};
printf("%d\n", (int)sizeof(xxx));
return 0;
}
When compiled as C code, it will typically print a value like 20 or 40, whereas when treated as
C++, the output value will likely be 2 or 4.
Why is this? In C++, the introduction of the local struct declaration hides the global "xxx", and
the program is simply taking the size of a struct which has a single integer member in it. In C,
"sizeof(xxx)" refers to the global array, and a tag like "xxx" doesn't automatically refer to a
struct.
If we said "sizeof(struct xxx)" then we would be able to refer to the local struct declaration.
C++ Performance
But it's still the case that calling a function, even one implemented in assembly language, has
some overhead, which comes from saving registers, manipulating stack frames, actual transfer
of control, and so on. So it might be worth trying to exploit a common case -- the case where
you can determine the relationship of the strings by looking only at the first character.
So we might use an inline function in C++ to encapsulate this logic:
inline int local_strcmp(const char* s, const char* t)
{
return (*s != *t ? *s - *t : strcmp(s, t));
}
If the first characters of each string do not match, there's no need to go further by calling
strcmp(); we already know the answer.
Another way to implement the same idea is via a C macro:
#define local_strcmp(s, t) ((s)[0] != (t)[0] ? (s)[0] - (t)[0] : \
strcmp((s), (t)))
This approach has a couple of disadvantages, however. Macros are hard to get right because of
the need to parenthesize arguments so as to avoid subtly wrong semantics. Writing
local_strcmp() as a real function is more natural.
And macros are less likely to be understood by development tools such as browsers or
debuggers. Inline functions are also a source of problems for such tools, but they at least are
part of the C++ language proper, and many C++ compilers have a way of disabling inlining to
help address this problem.
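One detail worth a sketch: strcmp() orders characters as unsigned char, so a shortcut that subtracts plain char values can disagree with it for characters above 127 on platforms where char is signed. The variant below is an adaptation and not the code that was timed; it converts the first characters before comparing them.

```cpp
#include <cassert>
#include <cstring>

// Variant of local_strcmp that compares the first characters as
// unsigned char, matching how strcmp orders characters.
inline int local_strcmp(const char* s, const char* t)
{
    const unsigned char a = (unsigned char)*s;
    const unsigned char b = (unsigned char)*t;
    return (a != b ? a - b : std::strcmp(s, t));
}

// The shortcut must agree in sign with strcmp itself.
void check_local_strcmp()
{
    assert(local_strcmp("apple", "banana") < 0);
    assert(local_strcmp("pear", "peach") > 0);
    assert(local_strcmp("same", "same") == 0);
}
```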
How much speedup is this approach good for? In the word stemming program, for input of
about 65000 words, the times in seconds were:
strcmp() 9.7
The Copy class deals with the problem by internally allocating large blocks and then shaving
off small chunks for individual strings. It keeps track of all the large blocks allocated and
deallocates them when a given Copy object is no longer needed. To use this system, you
would allocate a Copy object for each major subsystem in your application that uses small
strings. For example, at one point in your application, you might need to read in a dictionary
from disk and use it for a while. You would allocate a Copy object and then use it to allocate
the strings for each word, then flush the strings all at once.
In the application that this class was devised for, implementing string copying in this way
saved 50K out of a total available memory pool of 500K. This is with Borland C++, which
rounds the number of requested bytes for a string to the next multiple of 16, or an average
wastage of 8 bytes. Since the Copy class uses 1024-byte chunks, on average 512 bytes will be
wasted for a given set of strings, so the breakeven point would be 512 / 8 = 64 or more strings.
There are many variations on this theme. For example, if you are certain that the strings will
never be freed, then you can simply grab a large amount of memory and shave chunks off of
it, without worrying about keeping track of the allocated memory. Or if you have many
objects of one class, such as tree nodes, you can overload operator new() for that class to do a
similar type of thing.
Note that this particular storage allocator is not general. The allocated storage is aligned on 1-
byte boundaries. This means that trying to allocate other than char* objects may result in
performance degradation or a memory fault (such as "bus error" on UNIX systems). And the
performance gains of course decline somewhat with large strings, while the wastage increases
from stranding parts of the 1024-byte allocated chunks.
This same approach could be used in C or assembly language, but C++ makes it easier and
encourages this particular style of programming.
An example of usage is included. A dictionary of 20065 words with total length 168K is read
in. Without use of the Copy class it requires 354K, a 111% overhead. With the Copy class it
takes 194K, an overhead of 15%. This is a difference of 160K, or 8 bytes per word. The results
will of course vary depending on a particular operating system and runtime library. And the
Copy version runs about 20% faster than the conventional version on a 486 PC.
The driver program that is included will work only with Borland C++, so you will need to
write some other code to emulate the logic.
#include <string.h>
#include <assert.h>
const int COPY_BUF = 1024; // size of buffer to get
const int COPY_VEC = 64; // starting size of vector
class Copy {
int ln; // number of buffers in use
int maxln; // max size of vector
char** vec; // storage vector
int freelen; // length free in current
char* freestr; // current free string
public:
Copy(); // constructor
~Copy(); // destructor
char* copy(char*); // copy a string
};
// constructor
Copy::Copy()
{
ln = 0;
maxln = 0;
vec = 0;
freelen = 0;
freestr = 0;
}
// destructor
Copy::~Copy()
{
int i;
// delete buffers
for (i = 0; i < ln; i++)
delete [] vec[i];
// delete the vector itself
if (vec)
delete [] vec;
}
// copy a string
char* Copy::copy(char* s)
{
int i;
char** newvec;
int len;
char* p;
len = strlen(s) + 1;
// reallocate vector
assert(newvec);
for (i = 0; i < ln; i++)
newvec[i] = vec[i];
if (vec)
delete [] vec;
vec = newvec;
}
freelen -= len;
p = freestr;
freestr += len;
strcpy(p, s);
return p;
}
#ifdef DRIVER
#include <stdio.h>
#include <alloc.h>
main() {
long cl;
const int MAXLINE = 256;
char buf[MAXLINE];
FILE* fp;
char* s;
#ifdef USE_COPY
Copy co;
#endif
cl = coreleft();
fp = fopen("c:/src/words", "r");
assert(fp);
while (fgets(buf, MAXLINE, fp) != NULL) {
#ifdef USE_COPY
s = co.copy(buf);
#else
s = new char[strlen(buf) + 1];
assert(s);
strcpy(s, buf);
#endif
}
fclose(fp);
printf("memory used = %ld\n", cl - coreleft());
return 0;
}
#endif
class B {
A a;
public:
B() {}
};
A::A() {x = 0; y = 0; z = 0;}
Class A has a constructor A::A(), used to initialize three of the class's data members. Class B
has a constructor declared inline (defined in the body of the class declaration). The constructor
is empty.
Suppose that we use a lot of B class objects in a program. Each object must be constructed, but
we know that the constructor function body is empty. So will there be a performance issue?
The answer is possibly "yes", because the constructor body really is NOT empty, but contains
a call to A::A() to construct the A object that is part of the B class. Direct constructor calls are
not used in C++, but conceptually we could think of B's constructor as containing this code:
B::B() {a.A::A();} // construct "a" object in B class
There's nothing sneaky about this way of doing things; it falls directly out of the language
definition. But in complex cases, such as ones involving multiple levels of inheritance, a
seemingly empty constructor or destructor can in fact contain a large amount of processing.
Declaration Statements
Suppose that you have a function to compute factorials (1 x 2 x ... x N):
double fact(int n)
{
double f = 1.0;
int i;
for (i = 2; i <= n; i++)
f *= (double)i;
return f;
}
and you need to use this factorial function to initialize a constant in another function, after
doing some preliminary checks on the function parameters to ensure that all are greater than
zero. In C you can approach this a couple of ways. In the first, you would say:
/* return -1 on error, else 0 */
int f(int a, int b)
{
const double f = fact(25);
if (a <= 0 || b <= 0)
return -1;
/* use f in calculations */
return 0;
}
This approach does an expensive computation each time, even under error conditions. A way
to avoid this would be to say:
/* return -1 on error, else 0 */
int f(int a, int b)
{
const double f = (a <= 0 || b <= 0 ? 0.0 : fact(25));
if (a <= 0 || b <= 0)
return -1;
/* use f in calculations */
return 0;
}
but the logic is a bit tortuous. In C++, using declaration statements (see above), this problem
can be avoided entirely, by saying:
/* return -1 on error, else 0 */
int f(int a, int b)
{
if (a <= 0 || b <= 0)
return -1;
const double f = fact(25);
/* use f in calculations */
return 0;
}
int main()
{
long cnt = 1000000L;
would be a better choice on performance grounds, that is, output a single character instead of a
C string containing a single character.
Using one popular C++ compiler (Borland C++ 4.52), and outputting 100K lines using these
three methods, the running times in seconds are:
"\n" 1.9
'\n' 1.3
endl 13.2
Outputting a single character is a little simpler than outputting a string of characters, so it's a
bit faster.
Why is endl much slower? It turns out that it has different semantics. Besides adding a
newline character like the other two forms do, it also flushes the output buffer. On a UNIX-
like system, this means that ultimately a write() system call is done for each line, an expensive
operation. Normally, output directed to a file is buffered in chunks of size 512 or 1024 or
similar.
The Borland compiler has a #define called _BIG_INLINE_ in iostream.h that was enabled to
do more inlining and achieve the times listed here.
Does this sort of consideration matter very much? Most of the time, no. If you're doing
interactive I/O, it is best to write in the style that is plainest to you and others. If, however,
you're writing millions of characters to files, then you ought to pay attention to an issue like
this.
Note also that there's no guarantee that performance characteristics of stream I/O operations
will be uniform across different compilers. It's probably true in most cases that outputting a
single character is cheaper than outputting a C string containing a single character, but it
doesn't have to be that way.
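The difference in semantics between '\n' and endl can be observed directly, without timing anything. The sketch below assumes a C++11 or later compiler; CountingBuf and the two helper functions are hypothetical names, not part of any library. The buffer counts how often the stream asks it to flush.

```cpp
#include <ostream>
#include <sstream>

// A string buffer that counts flush requests (calls to sync()).
class CountingBuf : public std::stringbuf {
public:
    int flushes = 0;
protected:
    int sync() override {
        ++flushes;
        return 0;   // report success
    }
};

// Write n lines ending in '\n'; return the number of flushes seen.
int flushes_with_newline(int n) {
    CountingBuf buf;
    std::ostream os(&buf);
    for (int i = 0; i < n; ++i)
        os << "line" << '\n';
    return buf.flushes;
}

// Write n lines ending in std::endl; return the number of flushes seen.
int flushes_with_endl(int n) {
    CountingBuf buf;
    std::ostream os(&buf);
    for (int i = 0; i < n; ++i)
        os << "line" << std::endl;
    return buf.flushes;
}
```

With '\n' the buffer is never asked to flush; with endl it is asked once per line. On a file stream, each of those requests can turn into a system call.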
Per-class New/Delete
Some types of applications tend to use many small blocks of space for allocating nodes for
particular types of data structures, small strings, and so on. In issue #002 we talked about a
technique for efficiently allocating many small strings.
Another way of tackling this problem is to overload the new/delete operators on a per-class
basis. That is, take over responsibility for allocating and deallocating the storage required by
class objects. Here is an example of what this would look like for a class A:
#include <stddef.h>
#include <stdlib.h>
class A {
int data;
A* next;
#ifdef USE_ND
#ifdef USE_ND
A* A::freelist = 0;
if (freelist) {
A* p = freelist;
freelist = freelist->next;
return p;
}
return malloc(sz);
}
p->next = freelist;
freelist = p;
}
#endif
A::A() {}
A::~A() {}
#ifdef DRIVER
const int N = 1000;
A* aptr[N];
int main()
{
int i;
int j;
return 0;
}
#endif
We've also included a driver program. For this example, which recycles the memory for object
instances, the per-class approach is about 4-5X faster than the standard approach.
When new() is called for an A type, the overloaded function checks the free list to see if any
old recycled instances are around, and if so one of them is used instead of calling malloc().
The free list is shared across all object instances (the freelist variable is static). delete() simply
returns a no-longer-needed instance to the free list.
This technique is useful only for dynamically-created objects. For static or local objects, the
storage has already been allocated (on the stack, for example).
We have again sidestepped the issue of whether a failure in new() should throw an exception
instead of returning an error value. This is an area in transition in the language.
There are other issues with writing your own storage allocator. For example, you have to make
sure that the memory for an object is aligned correctly. A double of 8-byte length may need to
be aligned, say, on a 4-byte boundary for performance reasons or to avoid addressing
exceptions ("bus error - core dumped" on a UNIX system). Other issues include fragmentation
and support for program threads.
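Because the listing above is incomplete, here is a self-contained sketch of the same free-list idea. It is not the original code: the class is called Node rather than A, FreeLink and second_allocation_reuses_block are hypothetical helpers, and the sketch assumes a C++11 compiler and throws std::bad_alloc on failure. It is not thread-safe and never returns memory to the system.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

struct Node {
    int data;
    Node* next;

    Node() : data(0), next(0) {}

    static void* operator new(std::size_t sz);
    static void operator delete(void* p, std::size_t sz);

private:
    // While a block sits on the free list, its storage holds one of these.
    struct FreeLink { FreeLink* next; };
    static FreeLink* freelist;   // shared by all Node objects
};

Node::FreeLink* Node::freelist = 0;

static_assert(sizeof(Node) >= sizeof(void*), "a block must be able to hold a link");

void* Node::operator new(std::size_t sz) {
    // Reuse a recycled block if one is available and the size matches.
    if (sz == sizeof(Node) && freelist) {
        FreeLink* p = freelist;
        freelist = p->next;
        return p;
    }
    void* p = std::malloc(sz);   // malloc storage is suitably aligned
    if (!p)
        throw std::bad_alloc();
    return p;
}

void Node::operator delete(void* p, std::size_t sz) {
    if (!p)
        return;
    if (sz != sizeof(Node)) {    // e.g. a derived class: not ours to recycle
        std::free(p);
        return;
    }
    freelist = ::new (p) FreeLink{freelist};   // push block onto the free list
}

// Allocate, free, and allocate again; report whether the block was recycled.
bool second_allocation_reuses_block() {
    Node* a = new Node;
    std::uintptr_t first = reinterpret_cast<std::uintptr_t>(a);
    delete a;
    Node* b = new Node;
    bool reused = (reinterpret_cast<std::uintptr_t>(b) == first);
    delete b;
    return reused;
}
```

The size check in both functions is a design choice: it keeps objects of a different size (for example, from a derived class that inherits these operators) off a list whose blocks are all sizeof(Node) bytes.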
Duplicate Inlines
Suppose that you have a bit of code such as:
inline long fact(long n)
{
if (n < 2)
return 1;
else
return n * fact(n - 1);
}
int main()
{
long x = fact(12); // 12! still fits in a 32-bit long; 23! would overflow
return 0;
}
to compute the factorial function via a recursive algorithm. Will fact() actually be expanded as
an inline? In many compilers, the answer is no. The "inline" keyword is simply a hint to the
compiler, which is free to ignore it.
So what happens if the inline function is not expanded as inline? The answer varies from
compiler to compiler. The traditional approach is to lay down a static copy of the function
body, one copy for each translation unit where the inline function is used, and with such
copies persisting throughout the linking phase and showing up in the executable image. Other
approaches lay down a provisional copy per translation unit, but with a smart linker to merge
the copies.
Extra copies of functions in the executable can be quite wasteful of space. How do you avoid
the problem? One way is to use inlines sparingly at first, and then selectively enable inlining
based on program profiling that you've done. Just because a function is small, with a high call
overhead at each invocation, doesn't necessarily mean that it should be inline. For example,
the function may be called only rarely, and inlining might not make any difference to the total
program execution time.
Another approach diagnoses the problem after the fact. For example, here's a simple script that
finds duplicate inlines on UNIX systems:
#!/bin/sh
nm "$@" |
egrep ' t ' |
awk '{print $3}' |
sort |
uniq -c |
sort -nr |
awk '$1 >= 2{print}' |
demangle
nm is a tool for dumping the symbol tables of objects or executables. A " t " indicates a static
text (function) symbol. A list of such symbols is formed, and those that occur two or more
times are displayed after demangling their C++ names ("demangle" has various names
on different systems).
This technique is simply illustrative and not guaranteed to work on every system.
Note also that some libraries, such as the Standard Template Library, rely heavily on inlining.
STL is distributed as a set of header files containing inline templates, with the idea being that
the inlines are expanded per translation unit.
Much of the time such an approach is perfectly acceptable, but it's worth at least knowing
what's going on behind the scenes with inlining, and what you can do about it if performance
is not acceptable.
...
}
to perform basic checks on the passed-in pointer. assert() is a function (actually a macro) that
checks whether its argument is true (non-zero), and aborts the program if not.
But C++ offers additional opportunities to the designer interested in producing quality code.
For example, consider a common problem in C, where vector bounds are not checked during a
dereference operation, and a bad location is accessed or written to.
In C++, you can partially solve this problem by defining a Vector class, with a vector
dereferencing class member defined for the Vector, and the vector size stored:
#include <stdio.h>
#include <assert.h>
class Vector {
int len; // number of elements
int* ptr; // pointer to elements
public:
Vector(int); // constructor
~Vector(); // destructor
int& operator[](int); // dereferencing
};
//constructor
Vector::Vector(int n)
{
assert(n >= 1);
{
delete [] ptr;
}
//dereferencing
int& Vector::operator[](int i) {
assert(i >= 1 && i <= len);
return 0;
}
In this example, we create a vector of 10 elements, and the vector is indexed 1..10. If the
vector is dereferenced illegally, as in:
v[0] = 0;
an assertion failure will be triggered.
One objection to this technique is that it can be slow. If every vector reference requires a
function call (to Vector::operator[]), then there may be a large performance hit. However,
performance concerns can be dealt with by making the dereferencing function inline.
Two other comments about the above example. We are assuming in these newsletters that if
operator new() fails, it returns a NULL pointer:
ptr = new int[n];
assert(ptr); // check for non-NULL pointer
The current draft ANSI standard says that when such a failure occurs, an exception is thrown
or else a new handler is invoked. Because many C++ implementations still use the old
approach of returning NULL, we will stick with it for now.
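Both styles of failure reporting can be written side by side in standard C++. This is a sketch only; alloc_or_null and alloc_succeeds are hypothetical helper names. The first asks for the old behavior explicitly with std::nothrow, and the second relies on the exception described in the draft standard.

```cpp
#include <cstddef>
#include <new>

// Old style: failure is reported by a null pointer.
int* alloc_or_null(std::size_t n) {
    return new (std::nothrow) int[n];
}

// Draft-standard style: failure is reported by throwing std::bad_alloc.
// Returns true if the allocation succeeded.
bool alloc_succeeds(std::size_t n) {
    try {
        int* p = new int[n];
        delete [] p;
        return true;
    } catch (const std::bad_alloc&) {
        return false;
    }
}
```

A caller of alloc_or_null() must test the result before using it and release it with delete []; a caller using plain new can let the exception propagate instead.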
The other comment concerns the use of references. In the code:
v[i] = i * i;
the actual code is equivalent to:
v.operator[](i) = i * i;
and could actually be written this way (see a C++ reference book on operator overloading for
details).
Vector::operator[] returns a reference, which can be used on the left-hand side of an
assignment expression. In C the equivalent code would be more awkward:
#include <stdio.h>
int x[10]; // use f() to index into x[10]
int* f(int i) {
return &x[i - 1];
}
int main() {
*f(5) = 37;
return 0;
}
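Pulling the pieces of the Vector discussion together, a complete version might look like the sketch below. It is an illustration rather than the original listing: the member functions are defined inline in the class, size() and sum_of_squares() are hypothetical additions, copying is disabled to keep ownership simple, and the sketch assumes that a failed new throws rather than returning NULL.

```cpp
#include <cassert>

// 1-based, bounds-checked vector of int.
class Vector {
    int len;    // number of elements
    int* ptr;   // pointer to elements
    Vector(const Vector&);              // copying not supported in this sketch
    Vector& operator=(const Vector&);
public:
    explicit Vector(int n) : len(n), ptr(0) {
        assert(n >= 1);
        ptr = new int[n];
    }
    ~Vector() { delete [] ptr; }
    int size() const { return len; }
    // Defined in the class body, so implicitly inline: the check costs a
    // comparison, and the compiler can usually avoid a function call.
    int& operator[](int i) {
        assert(i >= 1 && i <= len);
        return ptr[i - 1];
    }
};

// Fill a 10-element vector with squares and return 1 + 4 + ... + 100.
int sum_of_squares() {
    Vector v(10);
    for (int i = 1; i <= v.size(); i++)
        v[i] = i * i;          // operator[] returns a reference: assignable
    int sum = 0;
    for (int i = 1; i <= v.size(); i++)
        sum += v[i];
    return sum;
}
```

An access such as v[0] or v[11] would trip the assertion in operator[] instead of silently reading or writing outside the buffer.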
and a program using the Date struct can initialize a struct like so:
struct Date d;
d.month = 9;
d.day = 25;
d.year = 1956;
And you devise various functions, for example one to compute the number of days between
two dates:
long days_b_dates(struct Date* d1, struct Date* d2);
This approach can work pretty well.
But what happens if someone says:
struct Date d;
d.month = 9;
d.day = 31;
d.year = 1956;
and then calls a function like days_b_dates()? The date in this example is invalid, because
month 9 (September) has only 30 days. Once an invalid date is introduced, functions that use
the date will not work properly. In C, one way to deal with this problem would be to have a
function to do integrity checking on each Date pointer passed to a function like
days_b_dates().
In C++, a simpler and cleaner approach is to use a constructor to ensure the validity of an
object. A constructor is a function called whenever an object is created. So I could say:
#include <assert.h>
class Date {
int month;
int day;
int year;
static int isleap(int);
public:
Date(int, int, int);
};
month = m;
day = d;
year = y;
}
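Since only the tail of the constructor survives above, here is a self-contained sketch of a validating Date. The helpers days_in_month() and valid() and the three accessors are hypothetical names added for illustration; the original class may check validity differently.

```cpp
#include <cassert>

class Date {
    int month;
    int day;
    int year;
    static int isleap(int y) {
        return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    }
public:
    // Days in month m of year y, or 0 if m is not in 1..12.
    static int days_in_month(int m, int y) {
        static const int days[] = {31, 28, 31, 30, 31, 30,
                                   31, 31, 30, 31, 30, 31};
        if (m < 1 || m > 12)
            return 0;
        return days[m - 1] + (m == 2 && isleap(y));
    }
    // True if m/d/y names a real calendar date.
    static bool valid(int m, int d, int y) {
        return d >= 1 && d <= days_in_month(m, y);
    }
    // The constructor refuses to build an invalid Date.
    Date(int m, int d, int y) : month(m), day(d), year(y) {
        assert(valid(m, d, y));
    }
    int get_month() const { return month; }
    int get_day() const { return day; }
    int get_year() const { return year; }
};
```

With this class, Date d(9, 25, 1956) succeeds, while Date d(9, 31, 1956) triggers an assertion failure at the point of construction, so functions that receive a Date never see an invalid one.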
Stream I/O
Suppose that you wish to output three values and you use some C-style output to do so:
printf("%d %d %d\n", a, b);
What is wrong here? Well, the output specification calls for three integer values to be output,
but only two were specified. You can probably "get away" with this usage without your
program crashing, with the printf() routine picking up a garbage value from the stack. But
many cases of this usage will crash the program.
printf("%s\n", a);
where the argument is of the wrong type. Stream I/O is fundamentally safer than C-style I/O;
stream I/O is said to be "type safe".
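As a small illustration of the type-safe style, the sketch below formats three values of different types into a string. The function name format_three is hypothetical. There is no format string to get out of step with the arguments; the compiler selects the right operator<< from the type of each operand.

```cpp
#include <sstream>
#include <string>

// Format three values separated by spaces, followed by a newline.
std::string format_three(int a, double b, const char* c) {
    std::ostringstream out;
    out << a << ' ' << b << ' ' << c << '\n';
    return out.str();
}
```

Leaving out an operand simply produces shorter output, and passing an int where a string was intended prints the int; neither mistake reads garbage from the stack.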
Miscellaneous Topics
• Standard Template Library
• C++ and Java(tm)
• Book Review - The Mythical Man-Month
• Calendar Date Class
• Boyer-Moore-Horspool String Searching
• Book Review - Inner Loops
Java and all Java-based marks are trademarks or registered trademarks of Sun Microsystems,
Inc. in the United States and other countries. Throughout this document, when the Java
trademark appears alone it is a reference to the Java programming language. When the JDK
trademark appears alone it is a reference to the Java Development Kit.
Giving a detailed comparison of the languages is beyond the scope of the newsletter, but if
you wish to find out more, there are several places to look. Sun has a Web site:
http://java.sun.com
with useful information in it, and an anonymous FTP site as well:
java.sun.com
Another Web site with pointers to many Java resources is:
http://www.gamelan.com
I've also looked at the book "Java!" by Tim Ritchey, which appears to have a lot of useful
information that gives some context to the language and its use. There are many more Java
books in the works that will be appearing in the next few months.
product is a lot of additional work, on the order of 3X as Brooks describes it. Each of these
steps is independent, therefore Brooks talks about a 9X ratio of cost between a program and a
programming systems product.
Of course, 9X isn't a magic figure, but it captures the huge difference in cost between hacking
out a few thousand lines of code over the weekend and putting out a polished product to
customers.
The book has been updated with significant new material. Brooks discusses the promise and
practicality of object-oriented programming, software reuse, and so on.
Highly recommended.
#ifndef __DATE_H__
#define __DATE_H__
class Date {
Drep d; // actual date
static int init_flag; // init flag
static int isleap(int); // leap year?
static Drep cdays[MAX_YEAR-MIN_YEAR+1]; // cumul days per yr
static void init_date(); // initialize date
static Drep mdy_to_d(int, int, int); // m/d/y --> day
static void d_to_mdy(Drep,int&,int&,int&);// day --> m/d/y
public:
Date(Drep); // constructor from internal
Date(const Date&); // copy constructor
Date(int, int, int); // constructor from m/d/y
Date(const char*); // constructor from char*
operator Drep(); // conversion to Drep
#endif
and then the source itself, along with a driver program:
// Date class and driver program
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>
#include <assert.h>
#include "date.h"
init_flag = 1;
for (i = MIN_YEAR; i <= MAX_YEAR; i++) {
cumul += 365 + isleap(i);
cdays[i - MIN_YEAR] = cumul;
}
}
// a leap year?
int Date::isleap(int year)
{
if (year % 4)
return 0;
if (year % 100)
return 1;
if (year % 400)
return 0;
return 1;
}
if (!init_flag)
init_date();
return d;
}
if (!init_flag)
init_date();
break;
d -= t;
}
assert(i <= 12);
month = i;
day = d;
}
d = dt;
}
i = 0;
for (;;) {
if (i == 3)
break;
while (*s && (*s <= ' ' || ISDEL(*s)))
s++;
if (!*s)
break;
j = 0;
if (isdigit(*s)) {
while (isdigit(*s))
buf[i][j++] = *s++;
buf[i][j] = 0;
i++;
}
else if (isalpha(*s)) {
while (isalpha(*s))
buf[i][j++] = tolower(*s++);
buf[i][j] = 0;
i++;
}
else {
break;
}
}
assert(i == 3);
// month
i = 0;
if (isalpha(buf[1][0]))
i = 1;
if (isalpha(buf[i][0])) {
if (buf[i][3])
buf[i][3] = 0;
for (j = 0; j < 12; j++) {
if (!strcmp(buf[i], mon[j]))
break;
}
j++;
mo = j;
}
else {
mo = atoi(buf[i]);
}
// day
i = !i;
dy = atoi(buf[i]);
// year
yr = atoi(buf[2]);
if (yr < 100)
yr += 1900;
// copy constructor
Date::Date(const Date& x)
{
d = x.d;
}
// print a date
void Date::print(char* s)
{
int month;
int day;
int year;
char buf[25];
char* t;
// day of week
int Date::dow()
{
Drep dw = (d - 1) % 7 + DOW_MIN;
if (dw > 7)
dw -= 7;
return dw;
}
n = 0;
for (i = d; i <= dt.d; i++) {
Date x(i);
dw = x.dow();
if (dw == 1) // sunday
continue;
if (dw == 7) // saturday
continue;
x.get_mdy(mo, dy, yr);
if (mo == 5 && dy >= 25 && dw == 2) // memorial
continue;
if (mo == 9 && dy <= 7 && dw == 2) // labor
continue;
if (mo == 11 && dy >= 22 &&
dy <= 28 && dw == 5) // thanks
continue;
if (mo == 1 && dy == 1) // new years
continue;
if (mo == 12 && dy == 31 && dw == 6)
continue;
if (mo == 1 && dy == 2 && dw == 2)
continue;
if (mo == 7 && dy == 4) // 4th july
continue;
if (mo == 7 && dy == 3 && dw == 6)
continue;
if (mo == 7 && dy == 5 && dw == 2)
continue;
if (mo == 12 && dy == 25) // christmas
continue;
if (mo == 12 && dy == 24 && dw == 6)
continue;
if (mo == 12 && dy == 26 && dw == 2)
continue;
n++;
}
return n;
}
#ifdef DRIVER
int main()
{
char buf[25];
for (;;) {
printf("date 1: ");
if (!fgets(buf, sizeof buf, stdin))
break;
Date d1(buf);
printf("date 2: ");
if (!fgets(buf, sizeof buf, stdin))
break;
Date d2(buf);
printf("calendar days = %ld\n", d2 - d1);
printf("work days = %ld\n\n", d1.wdays(d2));
}
return 0;
}
#endif
1. This class represents calendar dates for the years 1875 to 2025. An actual date is stored as
an absolute day number with January 1, 1875 as the basis. There are other ways of storing
dates, for example by representing the month/day/year as integers.
2. The header file uses an include guard __DATE_H__ so that it can be included multiple
times without error. It's common in large programming projects to have headers included more
than once.
3. The Date class uses a set of private static utility functions, for example one that determines
if a given year is a leap year or not. These functions are private to the class but do not operate
on object instances of the class.
4. There is a set of constructors used to build Date objects. One of these is a copy constructor
and two others are used to create Date objects from a month/day/year set of numbers, or from
a string which has the date formatted in one of several forms:
September 25, 1956
9/25/56
9 25 56
This particular constructor will be confused by dates written in the
European format, for example:
25/9/56
5. There are member functions for determining what day of the week a given date is (Sunday -
Saturday), and for computing the number of days between two dates.
6. There is also a member function for computing the number of work days between two dates
(inclusive of beginning and end dates). This function is somewhat arbitrary and encodes rules
used in the United States, including boundary holidays (for example, if New Year's is on a
Sunday, Monday will be taken as a holiday).
7. The functions for turning month/day/year into an internal number, and vice versa, use a
precomputed vector that gives the cumulative days since 1875 for a given year. Given this
vector, the approach is straightforward and brute force.
8. The day of week calculation uses modulo arithmetic, based on a known day of week for
January 1, 1875.
9. There are various other ways of handling dates. For example, the UNIX system represents
time as the number of seconds since midnight UTC on January 1, 1970. For file timestamps
and so on, a date system with a granularity of a whole day would not work. As another
example, Britain and its colonies switched from the Julian to the Gregorian calendar in
September of 1752, and the above code would not work across this boundary, even if the
Drep representation would handle the
number of days involved.
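Notes 1, 7, and 8 can be sketched in a few lines. This is not the class's actual code: day_number() and day_of_week() are hypothetical free functions, and for brevity the sketch loops over the years on each call instead of using a precomputed cumulative vector. It relies on January 1, 1875 having been a Friday.

```cpp
const int MIN_YEAR = 1875;

inline bool is_leap(int y) {
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

// Absolute day number of m/d/y, with January 1, 1875 as day 1.
long day_number(int m, int d, int y) {
    static const int mdays[] = {31, 28, 31, 30, 31, 30,
                                31, 31, 30, 31, 30, 31};
    long n = 0;
    for (int yr = MIN_YEAR; yr < y; yr++)     // whole years before y
        n += 365 + is_leap(yr);
    for (int mo = 1; mo < m; mo++)            // whole months before m
        n += mdays[mo - 1] + (mo == 2 && is_leap(y));
    return n + d;
}

// Day of week for an absolute day number: 1 = Sunday ... 7 = Saturday.
// Day 1 (January 1, 1875) was a Friday, which is 6 in this numbering.
int day_of_week(long day) {
    return (int)((day - 1 + 5) % 7) + 1;
}
```

For example, day_of_week(day_number(9, 25, 1956)) yields 3, a Tuesday. Subtracting two day numbers gives the number of calendar days between two dates, which is the main attraction of this representation.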
relatively new high-performance algorithms for searching like the Boyer-Moore-Horspool one
(see the book "Information Retrieval" by William Frakes for a description of this algorithm).
This particular algorithm preprocesses the pattern to determine how far to skip ahead in the
search text if an initial match attempt fails. The results of the preprocessing are saved in a
vector that is used during the search process.
It is quite possible but inconvenient and inefficient to code this algorithm in C, especially if
the same pattern is to be applied to a large body of text. If coding in C, the preprocessing
would have to be done each time, or else saved in an auxiliary structure that is passed to the
search function.
But with C++, using a class abstraction, this algorithm can be implemented quite neatly:
#include <string.h>
#include <assert.h>
#include <stdio.h>
class Search {
static const int MAXCHAR = 256;
int d[MAXCHAR];
int m;
char* patt;
public:
Search(char*);
int find(char*);
};
Search::Search(char* p)
{
assert(p);
patt = p;
m = strlen(patt);
int k = 0;
int n = strlen(text);
if (m > n)
return -1;
int k = m - 1;
while (k < n) {
int j = m - 1;
int i = k;
while (j >= 0 && text[i] == patt[j]) {
j--;
i--;
}
if (j == -1)
return i + 1;
k += d[(unsigned char)text[k]];
}
return -1;
}
#ifdef DRIVER
int main(int argc, char* argv[])
{
assert(argc == 3);
fclose(fp);
return !nf;
}
#endif
We've added a short driver program, to produce a search program something like the UNIX
"fgrep" tool.
We construct a Search object based on a pattern, and then apply that pattern to successive lines
of text. Search::find() returns -1 if the pattern is not found, else the starting index >= 0 in the
text.
Whether this algorithm will be faster than that available on your local system depends on
several factors. A standard library function like strstr() may be coded in assembly language.
Also, there's another class of string matching algorithms based on regular expressions and
finite state machines, with different performance characteristics.
This simple program illustrates a way of wrapping the details of a particular algorithm into a
neat package, hidden from the user.
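Parts of the listing above are missing, including the loop that builds the skip table. The sketch below fills them in using the standard Horspool formulation, which may differ in detail from the original code. It also uses const char* parameters, indexes the table through unsigned char so that characters with negative values cannot index outside it, and treats an empty pattern as matching at position 0 (an assumption made here).

```cpp
#include <cassert>
#include <cstring>

class Search {
    static const int MAXCHAR = 256;
    int d[MAXCHAR];      // skip distance for each possible character
    int m;               // pattern length
    const char* patt;    // pattern (not copied; must outlive the object)
public:
    explicit Search(const char* p) : m(0), patt(p) {
        assert(p);
        m = (int)std::strlen(patt);
        for (int k = 0; k < MAXCHAR; k++)
            d[k] = m;                                // default: skip whole pattern
        for (int k = 0; k < m - 1; k++)
            d[(unsigned char)patt[k]] = m - 1 - k;   // rightmost occurrence wins
    }
    // Index of the first match in text, or -1 if there is none.
    int find(const char* text) const {
        int n = (int)std::strlen(text);
        if (m == 0)
            return 0;
        if (m > n)
            return -1;
        int k = m - 1;                   // text index aligned with pattern end
        while (k < n) {
            int j = m - 1;
            int i = k;
            while (j >= 0 && text[i] == patt[j]) {   // compare right to left
                j--;
                i--;
            }
            if (j == -1)
                return i + 1;
            k += d[(unsigned char)text[k]];
        }
        return -1;
    }
};
```

A caller constructs the object once, for example Search s("abc"), and then calls s.find(line) for each line of text; the preprocessing cost is paid only in the constructor.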
are available in your local C++ compiler. There is often a lag of a year or more between
feature standardization and that feature showing up in an actual compiler].
At the most recent ANSI/ISO C++ standards meeting in Stockholm in July, a major change
was made to the type of string literals. Previously, string literals were of type char[]; now they
are of type const char[].
This repairs a longstanding blemish in C++'s type system. However, it has the potential of
breaking a lot of existing code. To lessen the impact, a new standard conversion has been
added to the language, from string literal to char*. (The type of wide string literals has also
changed, and a similar standard conversion has been added for them).
The result is that some old code will continue to work, but some won't. For example:
char* p = "abc"; // used to compile; still does
void f(char*);
f("abc"); // used to compile, still does
try {
throw "abc";
}
catch (char*) {} // used to catch, now doesn't
The new standard conversion is immediately deprecated, meaning that it may be removed
from the next revision of the standard. If that happens, the first and second examples above
will become compilation errors.
One possibly confusing thing about this new standard conversion is that it operates upon a
subset of values of a type (literal constants), rather than on all values of a type (which is more
common). There is precedent, however, in existing standard conversions defined for the null
pointer constant.
If you want to write code that will work under both the old and new rules, you can use just the
new type in some contexts:
const char* p = "abc";
const char* q = expr ? "abc" : "de";
but in some contexts requiring exact type match both types must be specified:
try {
throw "abc";
}
catch (char*) { /* do something */ }
catch (const char*) { /* do the same thing */ }
Changing the type of string literals is a big change in the language, which also introduces a
significant new incompatibility with C. Whether the gain is worth the pain is a matter of
opinion, but the ANSI vote was 80% in favor and the ISO vote was unanimous. It is expected
that compiler vendors will provide a compatibility switch that gives string literals their old
type.
object files. For example, if code in more than one object file takes the address of the above
function f() and prints it out, the same address value should appear each time. Otherwise,
extern inline functions are like static inline functions: the function definition is compiled
multiple times, once for each source file that calls it.
Implementing this sharing requires additional linker support on some platforms, which may be
part of the reason why "extern inline" is not yet supported in some C++ compilers.
Additionally, this sharing does not always have to be done; if an inline function does not
contain static local variables or have its address taken, there is no way to tell whether the
definition is shared or not, and a compiler is free to not share it (at the cost of increasing
program size). And there is even some compiler sleight-of-hand that can avoid sharing when
these conditions are present.
But what happens if, as in the original example, an extern inline function that is not inlined has
different definitions in the different places it is used?
In this case, there is a violation of C++'s One Definition Rule. This means that the program's
behavior is considered undefined according to the language standard, but that neither the
compiler nor the linker is required to give a diagnostic message. In practice, this means that,
depending on how the implementation works, the compiler or linker may just silently pick one
of the definitions to be used everywhere. For the above example, both of the calls to f() might
return 1 or both might return 4.
But the 1994 change did not risk altering the meaning of any existing code, because an extern
specifier would have to be added to the source to trigger these new semantics.
However, at the most recent standards meeting in Stockholm in July, a further change was
made to make external linkage the default for non-member inline functions. (The immediate
motivation for this change was a need of the new template compilation model that was
adopted at the same meeting; but more generally it was felt that changing the default was an
idea whose time had come, and the change was approved unanimously in both ANSI and
ISO).
With this latest change, all non-member inline functions that do not explicitly specify "static"
will become external, and thus it is possible that existing code will now function differently.
To help cope with this, compilers may provide a compatibility option to give inline functions
their old linkage. It is also possible for users to force the old behavior by use of the
preprocessor
#define inline static inline
but this only works if there are no member functions declared with "inline", and as a
preprocessor-based solution is not recommended.
Note that change of behavior may occur even when there is a single source definition for a
function. For example, assume that the following function is defined in a header file
somewhere:
file3.h:
inline int g(int x, int y)
{
#ifndef NDEBUG
cerr << "I'm in g()" << endl;
#endif
if (x >= y)
return h(x, y);
else
return 2 * x - y;
}
Even though the source for the function is defined only once, the function can have different
semantics depending upon where it is compiled. For example, in one file NDEBUG might be
defined, but in another not. Or, the call to function h() might be overloaded and resolve
differently in one file from another, depending upon what other functions were visible in each
file. These cases are also violations of the One Definition Rule, and may lead to a change in
behavior of existing code.
Still another way that existing inline function code can have its behavior altered is if it uses
local static variables. Consider the following function e() defined in a header file:
inline int e() {
static int i = 0;
return ++i;
}
When the function previously had internal linkage, there was a separate "i" allocated within
each object file that had a call to e(). But now that the function gets external linkage, there is
only one copy of e(), and only one "i". This will cause calls to the function to return different
values than before.
The One Definition Rule is a weakness of C++ where software reliability is concerned;
languages with stronger module systems (such as Ada or Java(tm)) do not have these kinds of
problems. As a general guideline, global inline functions should operate upon their arguments,
and avoid static variables, interactions with the surrounding context, and the preprocessor.
used to represent header file and source file extensions, but they may be different on any given
system):
file1.h:
template <class T>
T max(T a, T b) {
return a > b ? a : b;
}
caller.C:
#include "file1.h"
void c(float x, float y) {
float z = max(x, y);
...
}
The template function definition is included in the header file that declares the function. This
is the simplest method, and up to now has been the only fully portable method; the original
Standard Template Library implementation used this technique almost exclusively.
However, there is a natural reluctance to have all implementation code in header files, and so
the next simplest arrangement is to move the template definitions to regular source files, and
have the header pull them in:
file2.h:
template <class T> T max(T a, T b);
#include "file2.C"
file2.C:
template <class T>
T max(T a, T b) {
return a > b ? a : b;
}
where caller.C is the same as before (except for including file2.h rather than file1.h). The use
of a regular source file for the template definition here is mostly an illusion, since file2.C is
never compiled by itself but rather as part of the compilation of caller.C. But it does at least
suggest a separation of interface and implementation.
A variation on this scheme that is used in some compilers permits you to leave out the explicit
#include in the header:
file2a.h:
template <class T> T max(T a, T b);
with file2.C and caller.C the same as before. Here, the compiler implicitly knows by some rule
where to find the corresponding .C file (usually it looks in the same directory as the .h, for a .C
file with the same base name), and pulls it into the translation unit being compiled. But again,
the .C file is not itself compiled.
All of these methods belong to the "inclusion" model of template compilation. It is the model
that almost all current C++ compilers provide. It is relatively simple to implement and simple
to understand, but while it has sufficed in practice, there are some serious flaws with it. Most
of these are due to the template definition code getting introduced into the instantiating
context, with unexpected name leakage as a result. Consider the following example:
file3.h:
template <class T> void f(T);
#include "file3.C"
file3.C:
void g(int);
template <class T> void f(T t) {
g(0);
}
caller3.C:
#include "file3.h"
void g(long);
void h() {
f(3.14);
g(1); // hijacked!
}
Clearly the writer of caller3.C expected the g(1) call to refer to the g(long) in the same source
file. But instead, the g(int) in file3.C is visible as well, and is a better match on overload
resolution. While use of namespaces can alleviate some of these problems, similar things can
happen due to macros:
caller3a.C:
#define g some_other_name
#include "file3.h"
void h() {
f(3.14);
}
This time, the call g(0) in file3.C, which is clearly intended to refer to the g(int) in that file,
gets altered by the macro defined in the context of caller3a.C.
None of these problems would occur if file3.C were separately compiled, because there would
be no possibility of its context and the caller's context becoming intermingled in unexpected
ways. While these kinds of context problems can also occur in inline functions (see C++
Newsletter #015), the potential for damage with templates is much greater, given the centrality
of templates to modern C++ libraries and applications.
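The overload hijacking described above can be reproduced in a single file. The following is a minimal sketch, not the original file3.C/caller3.C arrangement: both overloads of g are placed in one translation unit (which is what the inclusion model effectively does), and the string return values and the two call_with_... helper functions are assumptions added only so that the chosen overload can be observed.

```cpp
#include <string>

// Stands in for the g(int) that arrives with the included template definitions.
std::string g(int)  { return "g(int)"; }

// The caller's own overload, which the caller expects to be chosen.
std::string g(long) { return "g(long)"; }

// The literal 1 has type int, so g(int) is an exact match and wins
// over g(long), which would need an integral conversion.
std::string call_with_int_literal()  { return g(1); }

// Spelling the argument as a long selects the caller's overload.
std::string call_with_long_literal() { return g(1L); }
```

Calling call_with_int_literal() yields "g(int)", which is the hijack; call_with_long_literal() yields "g(long)".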
A separate compilation model for templates was envisioned as part of the C++ language
template design from the start, but was never specified in any detailed way. The first attempt
to (partially) implement it (Cfront 3.0) ran into difficulties, and subsequently compiler vendors
shied away from it. The first attempts to specify it in the draft ANSI/ISO standard were
criticized as poorly specified, hard to use, and hard to implement efficiently. A series of
contentious discussions and reversals ensued, but now, by way of invention and compromise, a
version of separate compilation that is hoped to be clear and reasonably efficient has been
specified. In addition, the de facto existing "inclusion" model is also permitted by the standard
(but the implicit inclusion method, illustrated by file2a.h above, will not be, unless by vendor
extension).
ic2.C:
#include "ic1.h"
class A { ... };
void m() {
A a;
f(a); // this starts the instantiations
}
This is a case of transitive instantiation, where m() instantiates f(A) which instantiates
g(Container<A>). Within g(), length(t) is a dependent name lookup, so it can find length either
in the definition context (ic1.C) or in the instantiation context (ic3.C). But it's in neither. It's in
ic2.C, which is considered "intermediate context". Thus this example would not compile as is,
and would have to be recoded to use the inclusion method (basically, drop the "export"'s and
include the .C's into the .h's).
It is an open question how common this kind of intermediate context problem will be. One
analysis found no cases of it in the template-intensive Standard Template Library, which may
be encouraging. As with many of the new inventions of the C++ standardization process, only
time will tell.
namespace N {
class A { ... };
A& operator+(const A&, const A&);
void f(A);
void g();
}
Now consider the following function that takes arguments of the class type:
void z(N::A x, N::A y)
{
x + y; // (1)
f(x); // (2)
g(); // (3)
}
Given the original rules for namespaces (just the three basic methods of namespace visibility),
all three of the statements in this function are compilation errors, because none of the
functions being called are visible.
However, the standards committee has changed the way functions are looked up. Now there is
a new language rule, which says that the namespaces of the classes of the arguments, and the
namespaces of the base classes of those classes, are included in the search for function
declarations, even when the contents of those namespaces are not otherwise visible.
So, when looking for an operator+() in (1) above, the arguments are x and y, the class of those
arguments is A, and A is declared in namespace N. Thus the compiler looks for an operator+()
in N, and finds one, and the call is legal. A similar process happens for the call to f() in (2).
However, the call to g() in (3) is still a compilation error, because there are no arguments to
direct the lookup. The call would have to be made using one of the basic methods:
N::g(); // explicit qualification
If the arguments to the function have different types, then all the associated namespaces are
searched. Arguments of built-in types such as int have no associated namespace, while
arguments of more complicated types such as pointers to functions bring in the namespaces of
the pointed-to function's parameters and return type.
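The lookup rule can be tried out with a compilable version of the example. This is a sketch under a few assumptions: class A is given an empty body, and the functions return strings (rather than A& and void, as in the declarations above) purely so that the function found by the lookup can be reported.

```cpp
#include <string>

namespace N {
    class A {};
    std::string operator+(const A&, const A&) { return "N::operator+"; }
    std::string f(A) { return "N::f"; }
    std::string g() { return "N::g"; }
}

std::string z(N::A x, N::A y) {
    std::string result = x + y;   // (1) found through the namespace of A
    result += ", " + f(x);        // (2) found the same way
    result += ", " + N::g();      // (3) no arguments, so qualification is needed
    return result;
}
```

With a compiler that implements the rule, z(N::A(), N::A()) returns "N::operator+, N::f, N::g". Replacing N::g() with an unqualified g() is still a compilation error.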
This new lookup rule was first added to solve some technical language definition problems
with operator functions. It was then added to solve some other problems with template
"dependent name" lookup (see C++ Newsletter #017) and template friends. At that point it
was felt that consistency demanded the new rule be extended to lookup of all functions in all
contexts, and this was done (albeit with some dissent within the committee) at the Stockholm
meeting in July.
Because of the staggered introduction of this rule, for a while you may encounter compilers
that implement it for operator functions but not for other functions, but eventually all
implementations will be in full conformance.
One important thing to note about this rule change is that it is a step toward making
namespaces a powerful scoping and packaging construct, rather than just a transparent vehicle
to avoid name collisions. The art of employing namespaces is still in its early stages, and first
reports have indicated that the basic methods of making names visible are sometimes too
verbose (explicit qualification), too broad (using directives), or too prone to error and
omission (using declarations). The new rule may help alleviate some of these problems.
class A {
public:
A() { ... }
A(const A&) { ... std::set_unexpected(u2); }
};
A a;
throw a; // which unexpected handler gets called?
}
int main()
{
std::set_unexpected(u1);
f();
return 0;
}
The copy constructor for A is called as part of the throw operation in f(), so by the time the
C++ implementation determines that an unexpected handler needs to be called, u2() is the
current handler. However, based on this recent change, it is the handler in effect at the time of
the throw - u1() - which gets called. On the other hand, if a direct call to terminate() or
unexpected() is made from the application, it is always the current handler which gets called.
Some would argue that this kind of rule just adds complexity without much benefit to already-
complex C++ implementations, but others feel that if an application is going to be dynamically
changing its terminate and unexpected handlers, retaining the correct association is important.
In the next issue we'll talk about another clarification of terminate() and unexpected(), this
time related to the uncaught_exception() library function introduced above.
...
http://www.maths.warwick.ac.uk/c++/pub/
The ANSI public review period ends on March 18, so if you're interested in submitting a
comment, better do it quickly!
One interesting case is this one, which was featured in the January
1997 issue of the magazine C++ Report:
try {
// exception prone code here, that may do a throw
}
catch (...) {
// common error code here
try {
throw; // re-throw to more specific handler
}
catch (ExceptA&) {
// handle ExceptA here
}
catch (ExceptB&) {
// handle ExceptB here
}
catch (...) {
// handle unknown exceptions here
}
throw;
}
The idea behind the code is to factor out common error handling logic into the first part of the
catch handler (so as not to replicate it), rethrow the exception to get error handling specific to
the exception in the individual inner handlers, and then finally to rethrow the exception again
to let functions further up the call chain do their handling.
The question is, does this code work as intended? The draft standard speaks of a throw
creating a temporary object that is then deleted when the corresponding handler exits. Does
this mean that when the inner handlers above exit, the rethrow will be of a nonexistent
temporary object? The standard isn't really clear on this, and some existing compilers have
been found to do the deletion at the inner handler, with the result that the program crashes.
The answer is that this code should indeed work as intended, and that the existing compilers
for which this does not work are wrong. (Fortunately SCO's new C++ compiler is one of the
ones that is getting it right!).
Furthermore, the committee stated that the value of the standard library function
uncaught_exception() (see C++ Newsletter #019) changes (from false to true) at both of the
rethrows, until such time as the rethrown exception is caught again.
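To see the idiom run end to end, the fragment can be wrapped in a function that throws a chosen exception and records which handlers execute. This is only a sketch: the trace() function, its which parameter, the log strings, and the empty ExceptA and ExceptB classes are all assumptions added for illustration.

```cpp
#include <string>

struct ExceptA {};
struct ExceptB {};

// Throws the requested exception and records which handlers ran.
std::string trace(char which) {
    std::string log;
    try {
        try {
            if (which == 'A') throw ExceptA();
            if (which == 'B') throw ExceptB();
            throw 42;                      // some unrelated exception type
        }
        catch (...) {
            log += "common;";              // shared error handling
            try {
                throw;                     // re-throw to a specific handler
            }
            catch (ExceptA&) { log += "A;"; }
            catch (ExceptB&) { log += "B;"; }
            catch (...)      { log += "unknown;"; }
            throw;                         // pass the same exception up the chain
        }
    }
    catch (ExceptA&) { log += "outer A"; }
    catch (ExceptB&) { log += "outer B"; }
    catch (...)      { log += "outer unknown"; }
    return log;
}
```

On a conforming compiler trace('A') returns "common;A;outer A". The second rethrow works because the exception object stays alive for as long as the outer catch (...) handler is still active.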
Another exception handling issue that was clarified is whether base class destructors are called
when a derived class destructor throws an exception:
class B {
public:
~B() { ... }
};
class D : public B {
public:
~D() { throw "error"; }
};
void f() {
try {
D d;
}
catch (...) { }
}
Does ~B() get called as well as ~D()? The answer is yes. This may seem almost obvious -- it is
part of the general principle of C++ that constructed subobjects always get destroyed if
something goes wrong with the enclosing object -- but in fact there was some debate on this
within the committee.
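The order of events can be observed with a small variation of the example. This is a sketch that assumes a C++11 or later compiler, where destructors are implicitly noexcept; a destructor that throws therefore has to be declared noexcept(false), which was not needed when this article was written. The events string and the run() function are additions for illustration.

```cpp
#include <string>

std::string events;   // records the order of destructor calls

class B {
public:
    ~B() { events += "~B;"; }
};

class D : public B {
public:
    // Without noexcept(false) a modern compiler would call terminate()
    // when the exception tries to leave the destructor.
    ~D() noexcept(false) { events += "~D;"; throw "error"; }
};

std::string run() {
    events.clear();
    try {
        D d;
    }
    catch (...) { events += "caught"; }
    return events;
}
```

run() returns "~D;~B;caught": the base class destructor runs even though the derived class destructor exited by throwing.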
Finally, one of the comments from the ANSI public review period concerned an area of
exception handling that needed no clarification but is often misunderstood:
try {
throw 0;
}
catch (void *) {
// does the exception get caught here?
}
The handler should not catch the exception, but apparently in some compilers it does. The
draft standard is clear that throw and catch types either have to match exactly, or be related by
inheritance, or be subject to a pointer-to-pointer standard conversion. Since 0 is not of a
pointer type, the last requirement isn't met, and no handler is found. Similarly note that the
whole range of other standard conversions do not apply, so that for example a handler of type
long does not catch an exception of type int.
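The matching rules can be checked with a pair of small functions. This is an illustrative sketch; the function names, the Base and Derived classes, and the returned strings are assumptions made for the example.

```cpp
#include <string>

std::string catch_zero() {
    try {
        throw 0;                       // the exception has type int
    }
    catch (void*) { return "void*"; }  // not selected: int is not a pointer
    catch (long)  { return "long"; }   // not selected: no int-to-long conversion
    catch (int)   { return "int"; }    // exact match
    catch (...)   { return "other"; }
}

struct Base {};
struct Derived : Base {};

std::string catch_derived() {
    try {
        throw Derived();
    }
    catch (Base&) { return "Base&"; }  // matches through inheritance
    catch (...)   { return "other"; }
}
```

On a conforming compiler catch_zero() returns "int" and catch_derived() returns "Base&".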
Return Void
Jonathan Schilling
One issue that the C++ standards committee has discussed several times is allowing the return
statement to return expressions of type void. An example would be this:
void m();
void n() {
return m();
}
Currently, this is not allowed; the proposal is to extend the language to allow a return
statement to have an expression of void type, within a function of void return type.
The motivation is to make it easier to write templates. Consider this example, which in a
different form appears in the new standard library:
template <class T>
T f(T (*pf)()) {
...
return pf();
int g();
void h();
...
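The extension was adopted, and the final standard allows a return statement with an expression of type void inside a function returning void. The following self-contained sketch shows the template case; the function bodies, the log_text variable, and the run() function are assumptions added for illustration and are not taken from the standard library.

```cpp
#include <string>

std::string log_text;                 // records what was called

int  g() { log_text += "g;"; return 7; }
void h() { log_text += "h;"; }

// Calls pf and forwards whatever it returns, including "nothing".
template <class T>
T f(T (*pf)()) {
    log_text += "f;";
    return pf();                      // for T = void this returns a void expression
}

int run() {
    log_text.clear();
    int value = f(g);                 // T deduced as int
    f(h);                             // T deduced as void
    return value;
}
```

run() returns 7 and leaves "f;g;f;h;" in log_text. Without the rule, f would need a separate overload or specialization for functions returning void.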
somebody discovers that the library depends upon a language feature that doesn't exist. Here's
the problem:
class A {
public:
A(); // has a default constructor
};
A a;
class B {
public:
B(int); // doesn't have a default constructor
};
B b(19);
void g() {
f(a, a); // 1) ok, both arguments present
f(a); // 2) ok, second argument defaults to A::A()
f(b, b); // 3) ???
f(b); // 4) error, 2nd arg defaults to absent B::B()
}
The issue is when and where dependent default arguments of template functions get
instantiated. Currently they get instantiated in the context of the declaration of the function,
which means that the line 3) call above is an error, because B::B() is looked up and found not
to exist. However, significant parts of the new standard library have been written under the
assumption that line 3) will compile, on the grounds that the default argument is not actually
needed, and that only line 4) should cause an error.
So what to do now?
The problem could be worked around in the library by adding overloaded function signatures
(as in the current draft's rewrite rule) or by using an additional intermediate template, but
neither solution is concise or graceful, either for the standard library or for user-written classes
in the years to come.
On the other hand modifying the language to not instantiate default arguments unless needed
involves the usual complexities of template instantiations and their context (see C++
Newsletter #016 and #017) and is a tricky change to make this late in the standards process.
This inconsistency between language and library arose by mutual confusion and happenstance,
which of course made the discussion about it at the last meeting especially overwrought, with
no consensus reached. The question will probably be decided at the standards meeting next
month, and we'll let you know what happens.
void z() {
f(b, b); // used to be error, under new rule is ok
f(b); // error, both old and new rules
}
Under the old rules, the instantiation of f(b, b) occurred in the context of the definition of the
template function, at which point the default argument value B::B() would be looked up, and
would cause an error because it did not exist.
Under the new rule, a distinction is made between default arguments that depend upon the
template formal parameters and those that don't. This distinction is already made in other
template contexts such as name lookup and separate compilation (see issue #017) so no new
specification wording is needed.
For non-dependent default arguments, name binding and error checking may be done at the
point of function definition (but need not be, in which case it is done at the point of
instantiation). However, for dependent default arguments, name binding and error checking
may only be done when the default argument is needed, that is, when the function is called
without the argument.
Thus, in the above example, the default argument is dependent (since it involves the template
formal parameter T), and so the instantiation of f(b, b) does not cause any error, since the
default argument is not needed. However, the instantiation of f(b) would cause an error, since
then the default argument is needed.
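The new rule can be checked with a compilable sketch. The declaration of f used here, with a second parameter defaulting to T(), is an assumption consistent with the comments in the example; the calls counter, the value member of B, and the run() function are additions for illustration.

```cpp
class A {
public:
    A() {}                 // has a default constructor
};

class B {
public:
    B(int v) : value(v) {} // no default constructor
    int value;
};

int calls = 0;             // counts calls to f, for illustration only

// Assumed declaration: the second parameter defaults to a
// value-initialized T, so the default argument depends on T.
template <class T>
void f(T t1, T t2 = T()) { ++calls; }

int run() {
    calls = 0;
    A a;
    B b(19);
    f(a, a);   // ok, both arguments present
    f(a);      // ok, default argument A() is instantiated and used
    f(b, b);   // ok: the default argument B() is never instantiated
    // f(b);   // error if uncommented: B has no default constructor
    return calls;
}
```

With a compiler that follows the new rule this compiles, and run() returns 3. Uncommenting the f(b) line produces the expected error.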
For an example of a function with a non-dependent default argument, consider this addition to
the example:
template <class T>
void g(T t1, char c2 = b);
void y() {
g(b, 'x'); // error, old and new rules
}
The default argument expression has nothing to do with the template formal parameter T, and
so the error in the default expression (an object of class B cannot be converted to char) must
be given, even though the default argument is not used. This error may be given either at the
point of the definition of g() or at the point of instantiation.
The discussion of this issue in the committee continued to be hotter than it needed to be, but in
the end there was a large majority supporting the language modification. This means that the
library can remain unchanged in its use of template default arguments.
Java and all Java-based marks are trademarks or registered trademarks of Sun Microsystems,
Inc. in the United States and other countries. Throughout this document, when the Java
trademark appears alone it is a reference to the Java programming language. When the JDK
trademark appears alone it is a reference to the Java Development Kit.
The draft ANSI/ISO C++ standard library incorporates by reference much of the C standard
library. This has always been the case, back to the earliest days of C++; what has changed
during the standardization process is the placement of C standard library names into
namespace std, the namespace that also holds the C++ standard library.
Thus, a proper C++ program now calls the C standard library like this:
#include <cstdio>
...
std::printf("hello, old library\n");
rather than in either of the ways it used to:
#include <stdio.h>
...
::printf("hello, old library\n"); // explicit scoping
// or
printf("hello, old library\n"); // lazy but more typical
The old forms are still accepted by virtue of a deprecated backward compatibility provision,
which states that <stdio.h> has the effect of pulling in <cstdio> and making its std:: names
visible as if there were using declarations for them.
But what occurs if you have declared your own name in the global namespace, that happens to
be the same as one of the names in the C standard library? That is, something like:
double printf = 3.1416;
Is this well-formed? Does it depend upon whether <cstdio> is present?
Or upon whether <stdio.h> is present?
At the most recent standards meeting in London in July, the committee decided to reserve all
C standard library names for the implementation, in both namespace std and in the global
namespace, regardless of whether any of the headers that define those names are present. So
for example the above declaration of printf would result in undefined behavior.
This decision was made primarily to make C++ compiler and library vendors' lives a little
easier, since for various reasons putting the C standard library into namespace std has proved
to be a major headache for them. (Indeed some vendors have tried for several meetings to get
the C standard library taken out of namespace std altogether, but this has been rejected by the
committee).
With a commonly known name such as printf, it's unlikely anyone will declare it themselves.
But there are many names in the C standard library, and it is possible to collide with some of
the more obscure ones. The best advice is this:
Stay out of the global namespace.
That is, all C++ application code should be put into an appropriate namespace (named or
unnamed; see C++ Newsletter issues #001 through #004); then you will never risk collision
with implementation-reserved names or names that come in through system headers (unless
they are macros, against which namespaces offer no protection).
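As a sketch of that advice, the troublesome declaration from above can be moved into a namespace, where it no longer collides with the reserved global name. The namespace name app and the functions doubled() and report() are hypothetical names chosen for this example.

```cpp
#include <cstdio>

namespace app {
    // Same spelling as a C library function, but declared inside a
    // namespace, so it does not collide with the reserved global name.
    double printf = 3.1416;

    double doubled() { return 2 * printf; }

    int report() {
        // The library function is still reachable through std::.
        return std::printf("value = %f\n", printf);
    }
}
```

Inside app the unqualified name printf refers to the variable, so the library function has to be called as std::printf. Reusing a library name this way is shown only to make the point; choosing a different name is kinder to readers.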
something.C:
#include "something.h"
Now suppose we do the same thing, but with a pointer to function as a parameter:
something.h:
extern "C" {
void g(int (*pf)());
}
something.C:
#include "something.h"
something.C:
void g(pf) { ... }
This works because the typedef preserves the "pointer to C" characteristic of the type when pf
appears again in the definition of g.
Either of these approaches will also work properly with compilers implementing the old
language rules.
http://www.research.att.com/~bs/iso_release.html
2. Other operations do not have that guarantee, but at least will not leak memory, fail to
destruct constructed objects, or behave in an undefined manner upon destruction of the
container.
The first level of guarantee can be thought of as "commit or rollback" semantics. For example,
if you insert an object into a list, you can know that if the insertion is successful the list will
now contain that object, or if it is unsuccessful the list will remain unchanged.
The second level of guarantee is weaker than that. It simply states that the contents of the
container are undefined, but the program will still behave reasonably otherwise. Thus, for
example, if an insertion into a vector is unsuccessful, you will still have a working,
destructible vector, but its contents are unknown -- it might be unchanged from before the
operation, or it might be empty, or anywhere in between.
The rationale for the difference between the two levels of guarantees is based on how the
different STL containers are implemented and the difficulty of supporting the guarantees; this
and other details of exception safety in containers will be discussed in more detail in the next
issue.
Here's an example of what that means, using lists and vectors (C++
Newsletter #015):
#include <iostream>
#include <list>
using namespace std; // list is used unqualified below
class A {
public:
A(int i) { n = i; }
#if CCTOR
A(const A& a) {
if (a.n < 6)
n = a.n;
else
throw "too large";
}
#endif
int get() const { return n; }
private:
int n;
};
typedef list<A>::iterator LI; // iterator type used in the loop below
int main() {
list<A> la;
la.push_back(A(0)); la.push_back(A(1));
try {
for (int i = 2; i < 10; i++) {
LI li = la.begin(); li++;
la.insert(li, A(i));
}
} catch (const char* s) { }
Now, let's take the above example and use a vector rather than a list. (This can be done by
simply editing the three text occurrences of "list" to "vector").
Without the copy constructor, we get the same output as for list:
0987654321
But with the copy constructor included via defining CCTOR, we get:
0543211
which looks like a "wrong" value (there's an extra 1). What happened?
Remember in the previous issue we talked about two levels of guarantees in terms of
exception safety in containers. The stronger level is the commit-or-rollback level, and is
required for most operations on lists, maps, and sets. The weaker level doesn't guarantee the
contents of the container, but does guarantee that the container will be well-formed (for
example, you can iterate through it and destruct it). This weaker level is all that is required of
most operations on vectors and deques.
The basic rationale for the difference is to permit efficient implementation. The first group of
containers are typically "node-based", meaning elements of a container are allocated in
separate nodes that are linked together, while the second group of containers are
"array-based", meaning elements of a container are allocated in contiguous storage. It's a lot easier to
provide commit-or-rollback semantics on node-based containers than array-based ones; hence
the two levels of guarantees.
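The commit-or-rollback level can be checked directly for a list. This is a minimal sketch: class A follows the example above, with the throwing copy constructor always compiled in rather than guarded by CCTOR, and the function list_insert_rolls_back() is a hypothetical helper added for illustration.

```cpp
#include <cstddef>
#include <list>

class A {
public:
    A(int i) : n(i) {}
    A(const A& a) : n(a.n) {
        if (a.n >= 6)
            throw "too large";        // copying large values fails
    }
    int get() const { return n; }
private:
    int n;
};

// Returns true if a failed insertion left the list exactly as it was.
bool list_insert_rolls_back() {
    std::list<A> la;
    la.push_back(A(0));
    la.push_back(A(1));
    std::size_t size_before = la.size();
    bool threw = false;
    try {
        std::list<A>::iterator li = la.begin();
        ++li;
        la.insert(li, A(9));          // the element copy throws
    } catch (const char*) {
        threw = true;
    }
    return threw && la.size() == size_before
                 && la.front().get() == 0 && la.back().get() == 1;
}
```

list_insert_rolls_back() returns true, since the failed insertion leaves the list unchanged. The same check is not offered here for vector, because in the general case its contents after a failed insertion are unspecified, as described above.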
So, while the above example for vectors is guaranteed to execute to completion, there's no way
of knowing what the output will be. (The "0543211" output comes from the Silicon Graphics
free STL, and looks to be the result of a partial resizing or copying operation; another STL
implementation might produce an entirely different result).
There are some special cases to the general description of the two levels of guarantees above.
For instance: multiple element insertion operations on maps and sets do not have the first level
guarantee. Insertions of PODs - plain old C-level structs - for vectors and deques do have it.
Stacks and queues have it as well. Thus for complete details you'll have to check the standard
or a reference book.
In conclusion, the basic benefit of all this is that if you have classes that use exception
handling, you can put them into standard library containers and get reasonable and useful
behavior in the event an exception is thrown.
auto_ptr
Jonathan Schilling
This newsletter has not yet mentioned the auto_ptr class found in the new standard library.
This template class provides a simple form of local, exception-safe, dynamic memory
allocation. Its design has undergone some changes during the standardization process, but is
now final.
To understand the purpose of auto_ptr, first consider this code:
class SomeClass { ... void foo(); ... };
void f() {
SomeClass* p = new SomeClass();
p->foo();
}
Calling function f() causes a memory leak, because the storage acquired by the new operator is
never released and the destructor for SomeClass (which may itself release other acquired
storage or resources within SomeClass) is never called.
Even if the coding of f() is changed to:
void f() {
SomeClass* p = new SomeClass();
p->foo();
delete p;
}
the function may still leak, because if an exception is thrown out of the call to foo(), the delete
statement will never execute.
There are several approaches to getting around this type of problem but the one that best
preserves the structure of the code is to use the standard library's auto_ptr class, like this:
#include <memory>
using namespace std;
void f() {
auto_ptr<SomeClass> p(new SomeClass());
p->foo();
}
auto_ptr is a template class that is instantiated with the class being pointed to as its argument
type. Objects of the auto_ptr class are initialized by a regular pointer to the class being pointed
to. Once created, auto_ptr objects are used just like a regular pointer: that is, the * and ->
operations are defined for auto_ptr objects, with practically no additional overhead over their
built-in versions.
But most importantly, it is not necessary to write a delete statement to correspond to the new
operator above; rather, as part of the destruction of the auto_ptr object at the end of its scope
(function f(), in this case), the memory acquired by the new will be deleted, and the
SomeClass destructor will be called. And since destructors are called for local objects when
exceptions are thrown and the stack is unwound (see C++ Newsletter #017), the same will
happen if an exception is thrown by the call to foo().
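The cleanup-on-exception behavior can be demonstrated with a short sketch. Note that auto_ptr was deprecated in C++11 and removed in C++17, so this sketch uses std::unique_ptr, its replacement, which gives the same guarantee for this usage. The destroyed counter, the throwing body of foo(), and the run() function are assumptions added for illustration.

```cpp
#include <memory>
#include <stdexcept>

int destroyed = 0;                       // counts SomeClass destructor calls

class SomeClass {
public:
    ~SomeClass() { ++destroyed; }
    void foo() { throw std::runtime_error("foo failed"); }
};

void f() {
    std::unique_ptr<SomeClass> p(new SomeClass());
    p->foo();                            // throws; p is destroyed during unwinding
}

// Returns how many SomeClass objects were destroyed by one call to f().
int run() {
    destroyed = 0;
    try {
        f();
    } catch (const std::runtime_error&) {
        // the object owned by p has already been deleted here
    }
    return destroyed;
}
```

run() returns 1: the object is deleted even though f() was left by an exception and contains no delete statement.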
In addition, the auto_ptr class has a member function get() which allows you to get at the
original regular pointer, and member functions release() and reset() which allow you to
explicitly deassociate or delete the original pointer. These are less often used, however.
The design changes in auto_ptr have come from deciding whether to allow auto_ptr objects to
be copy constructed or assigned. In particular, the former is necessary if auto_ptr objects are to
be passed to or returned from functions. From the first public Committee Draft to the second
Draft to the final Draft International Standard, this part of auto_ptr has changed, and so what
may be in the standard library implementation you are currently using may not yet correspond
to the final version of the library adopted by the committee at the Morristown meeting this
past November.
In the final version, copy construction and assignment of auto_ptr objects is allowed, but only
on non-const objects. Copying an auto_ptr transfers "ownership" of the storage pointed to by
the auto_ptr to the destination, and the source of the copy is modified so that its pointer is null.
Thus, in the following case:
void g(auto_ptr<SomeClass> p) {
p->foo(); // ok
}
void f() {
auto_ptr<SomeClass> p(new SomeClass());
g(p);
p->foo(); // runtime error
}
the call to foo() in g() will work properly, but the one in f() would cause runtime undefined
behavior (such as a core dump) because it would be a dereference of a null pointer. These
"destructive copy semantics" are of course different from normal pointer copying semantics
(where both the destination and source point to the same storage) and are the reason why only
non-const objects may be used.
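The same transfer of ownership can be seen with std::unique_ptr, which replaced auto_ptr in later standards. This is a sketch rather than an auto_ptr example: unique_ptr cannot be copied at all, so the transfer has to be requested with std::move, which makes the hazard visible at the call site. The body of foo() and the function source_is_null_after_transfer() are assumptions added for illustration.

```cpp
#include <memory>
#include <utility>

class SomeClass {
public:
    int foo() { return 42; }
};

int g(std::unique_ptr<SomeClass> p) {
    return p->foo();                     // ok, g now owns the object
}

// Returns true if the source pointer is null after ownership moves to g().
bool source_is_null_after_transfer() {
    std::unique_ptr<SomeClass> p(new SomeClass());
    int result = g(std::move(p));        // the transfer must be spelled out
    return result == 42 && p.get() == nullptr;
}
```

source_is_null_after_transfer() returns true. Calling p->foo() after the transfer would dereference a null pointer, just as in the auto_ptr case.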
Because of these transfer-of-ownership semantics, it is generally not possible to use auto_ptr
objects as elements of collections, such as the STL containers. This is because the algorithms
that operate on those containers may create temporary copies (for example, during a sort) that
unexpectedly gain ownership of the pointed-to object, causing very unpredictable results in the container.
You may have seen "smart pointer" classes elsewhere which present alternative interfaces,
semantics, and implementations. It's important to realize that the standard library auto_ptr
class is NOT a generalized, all-purpose smart pointer class, nor is it a substitute for garbage
collection. Rather, it is designed for one specific purpose, and the "auto" part of the auto_ptr
name should be considered suggestive: use auto_ptr for local (automatic) variables and
temporaries only.
a number of respects -- it can only terminate in certain ways, it usually can't call other standard
library functions, it can't refer to static objects unless they are of type volatile sig_atomic_t,
and so forth (see Section 7.7.1.1 of the ISO C standard for full details). Any violation of these
constraints results in undefined behavior.
This left open the question of what constraints a C++ signal handling function has. The
standards committee resolved this during the past year by adding language to the standard
defining the concept of a "plain old function" (POF), analogous to the existing concept of
plain old data (POD). A POF is a function that only uses the common subset of the C and C++
languages. A C++ signal handler only has defined behavior if it is a POF and if it would have
defined behavior under the C standard; in particular, any handler which is not a POF -- i.e.,
which uses any C++ features -- will have undefined behavior.
This means that even innocuous uses of C++ language features, such as "for (int i ...)" instead
of "for (i ...)", in a signal handler will result in undefined behavior, even when in
implementation terms it should make no difference.
That's the letter of the law. In practice, things are different.
First, the C standard is not really the arbiter of signal handlers. Typically operating systems
provide more full-featured, robust signal handling mechanisms than the bare-bones level of
portable support given by the C standard. Similarly, the constraints upon signal handlers in
operating systems may be different. For instance, the POSIX standard has a more specific and
generally less restrictive set of constraints upon signal handlers than the C standard. It is this
operating system definition or standard that then often becomes the important one for
programmers to be aware of.
Second, a lot of C++ features do not impact the runtime considerations of what would
interfere with signal handling. Examples would include interspersing of declarations with
statements, use of references, use of scoping notation (assuming the scope reference itself was
well-defined), and many others. Such innocuous usages, while strictly speaking undefined by
the C++ standard, are very likely to work without problems.
The C++ features that are not likely to work are those that involve complicated runtime
processing or state information. In particular, any use of C++ exception handling within a
signal handler is likely to lead to disaster (and indeed a footnote in the standard points this
out), unless the implementation documentation has specifically stated that it will work.
Similarly, runtime type information (RTTI) and static object declaration with dynamic
initialization are good candidates for malfunction.
Note that the C++ standard does not say that use of C++ features in signal handlers is
"implementation-defined", in which case implementations would have to document which
features will or will not work in a signal handler, but rather "undefined", which lets C++
vendors off the hook. It is up to the user of an implementation to figure out whether a
particular C++ feature can safely operate in a signal handler, hopefully with some general
guidance from the vendor. Of course, the safest route is to follow the letter of the law and
comply with the C standard restrictions by avoiding C++ features entirely.
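A handler that stays within the restrictions looks something like the following sketch. The names got_signal, on_signal, and run are hypothetical; the handler does nothing except set a flag of type volatile sig_atomic_t, and the signal is raised synchronously with raise() so that the effect can be observed.

```cpp
#include <csignal>

// A flag of the one static object type the C rules let a handler write.
volatile std::sig_atomic_t got_signal = 0;

// extern "C" keeps the handler within the C-compatible subset.
extern "C" void on_signal(int) {
    got_signal = 1;
}

// Installs the handler, raises SIGINT synchronously, and reports the flag.
int run() {
    got_signal = 0;
    std::signal(SIGINT, on_signal);
    std::raise(SIGINT);
    std::signal(SIGINT, SIG_DFL);        // restore the default disposition
    return got_signal;
}
```

run() returns 1. Any real work, such as logging or cleanup, is best done by the main program after it notices that the flag has been set.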
The vector container of the C++ standard library was introduced in C++
Newsletter issue #015. From its inception in the original Standard
Template Library specification, a declaration such as
int main() {
vector<int> v(100, 1); // expecting ctor1
return 0;
}
This program prints out "ctor 2", because that is the better match (the first constructor requires
a type conversion), but it is certainly not what the user intended! Other usages can lead to
compile-time ambiguities, usually with fairly incomprehensible error messages.
Now one way to work around the problem in this instance would be to simply insert a cast to
the actual type in the first constructor:
vector<int> v((vector<int>::size_type)100, 1);
With this change the above program prints out "ctor 1", since that is now the better match.
But this is non-intuitive, and furthermore the ANSI/ISO standards committee discovered that
this problem also occurs in other sequence containers in the standard library, such as list,
deque, and basic_string, and in other member functions, such as insert() and replace(),
whenever a template argument type convertible to int was involved. So a better resolution was
needed.
The solution adopted by the committee last year was simply to declare that the second
constructor shall have the effect of the first constructor, if the input iterator type (class II in the
above example) is an integral type! In other words, the library will "do what I mean", not "do
what I say".
How is this done in the library? One way is to specialize the member template for every
integral type. But the standard also mysteriously states that "Less cumbersome implementation
techniques also exist". This is referring to implementing a compile-time dispatching scheme
inside the library, whereby the implementation can tell whether the instantiating type of the
second constructor is integral or not. This involves clever uses of partial specialization and the
numeric_limits<T> traits class. Bjarne Stroustrup stated during a committee meeting that
people who look at the code for this technique react with "fascinated horror", but fortunately
the horror is for standard library vendors and not you!
Meanwhile, if you do not have access to a standard library implementation that conforms to
the final standard, you may have to use a work-around such as presented above.
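With a standard library that conforms to the final standard, the behavior can be confirmed with a few small functions. This is only a sketch; the function names are made up for the example.

```cpp
#include <cstddef>
#include <vector>

// Size of a vector built from two int arguments.
std::size_t size_from_ints() {
    std::vector<int> v(100, 1);          // meant as "100 elements, each 1"
    return v.size();
}

// The cast work-around, which names the size type explicitly.
std::size_t size_with_cast() {
    std::vector<int> v((std::vector<int>::size_type)100, 1);
    return v.size();
}

// A genuine iterator range still uses the iterator constructor.
std::size_t size_from_range() {
    int data[] = {1, 2, 3};
    std::vector<int> v(data, data + 3);
    return v.size();
}
```

On a conforming library size_from_ints() and size_with_cast() both return 100, and size_from_range() returns 3.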
Typename Changes
Jonathan Schilling
This newsletter has not previously mentioned the "typename" keyword. This language feature
was introduced several years ago during the standardization process. To understand its
purpose, consider the following code:
template<class T> class Y {
T::A a; // error
};
When the compiler sees this class template definition, it has no way of knowing what T::A
represents. In particular, it doesn't know whether T::A is a type or is something else. Usages
such as
T::A(bb);
might either be a function call of T::A passing global variable bb as an argument, or a
declaration of a variable bb of type T::A. (Yes, in C and C++ you can declare variables that
way; makes parsing lots of fun!)
Issue #017 of the Newsletter discussed the idea of dependent and non-dependent names within
templates. T::A is a dependent name (because part of the name is the template formal
parameter T), and the language rule became that a dependent name within a template is
assumed to NOT be a type unless the applicable name lookup finds a type (which it doesn't
here) or unless the new typename keyword is used. Neither of these happens in this case.
So, the above examples need to be modified to
template<class T> class Y {
typename T::A a; // ok
typename T::A(bb); // ok, bb is a data member
};
and all is well.
Since typename was introduced, its permitted use has been expanded a couple of times to
make template writing easier. A while ago, the language was changed to allow typename to
appear before any qualified name, even if the name isn't template dependent, as long as the
usage is within the scope of a template declaration or definition.
More recently, at the last committee meeting before the standard was made final, the language
syntax was revised to allow typename to be used within a return statement. This makes the
following usage possible:
class V {
public:
typedef int weight;
// ...
};
class W {
public:
typedef int weight;
// ...
};
void z() {
A<V> a;
a.f();
}
The first use of typename corresponds to what we've discussed above. But without also having
typename in the return statement, the compiler would have to assume that T::weight
(being a dependent name) was a function name rather than a type, and so would generate an
error.
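For readers who want to try the two uses of typename, the following is a hedged sketch of
what a class template such as A might look like. The body of A shown here is an assumption
made for illustration, not the original listing; V, W, and z() follow the names used in the
example.

```cpp
#include <cassert>

class V {
public:
    typedef int weight;
};

class W {
public:
    typedef int weight;
};

// Assumed shape of the class template A (hypothetical).
template <class T> class A {
public:
    // First use: the return type T::weight is a dependent name.
    typename T::weight f() {
        // Second use: builds a temporary of type T::weight in the return statement.
        return typename T::weight(10);
    }
};

void z() {
    A<V> a;
    assert(a.f() == 10);
}
```

The same template works unchanged with A<W>, since all it requires of T is a nested type
named weight.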
As a temporary source of confusion, some compilers have not yet implemented all of the
consequences of dependent/non-dependent lookups in templates, and so this example might
compile even without the typename. Also note that as a non-standard extension some
compilers will assume an implicit typename in the example at the beginning, causing it to
compile without a typename.
More importantly, the above example is the sort of architecture you see in STL or generic
programming, where several classes share the same characteristic (in this case, a type named
"weight"), and you can make use of that characteristic without knowing which of those classes
you're being instantiated with. It is in such circumstances that the typename keyword is most
likely to be necessary.
Object-oriented Design
INTRODUCTION TO OBJECT-ORIENTED DESIGN PART 1 - ABSTRACTION
Up until now we've largely avoided discussing object-oriented design (OOD). This is a topic
with a variety of methods put forward, and people tend to have strong views about it. But there
are some useful general principles that can be stated, and we will present some of them in a
series of articles.
The first point is perhaps the hardest one for newcomers to OOD to grasp. People will ask
"How can I decide what classes my program should have in it?" The fundamental rule is that a
class should represent some abstraction. For example, a Date class might represent calendar
dates, an Integer class might deal with integers, and a Matrix class would represent
mathematical matrices. So you need to ask "What kinds of entities does my application
manipulate?"
Some examples of potential classes in different application areas would include:
GUI/Graphics - Line, Circle, Window, TextArea, Button, Point
char dow;
short dow;
int dow;
Direct use of primitive types for representation has its drawbacks. For example, if I choose to
represent day of week as an integer, then what is meant by:
int dow;
...
dow = 19;
The domain of the type is violated. As another example, C/C++ pointers are notorious for
being misused and thereby introducing bugs into programs. A better choice in many cases is a
higher-level abstraction like a string class, found in the C++ and Java standard libraries.
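As a hedged illustration of the alternative, a small class can guard the domain of the type.
The class name DayOfWeek, the numbering 0 (Sunday) through 6 (Saturday), and the helper
is_rejected() are assumptions made for this sketch.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical class: days are numbered 0 (Sunday) through 6 (Saturday).
class DayOfWeek {
    int dow_;
public:
    explicit DayOfWeek(int d) : dow_(d) {
        if (d < 0 || d > 6)
            throw std::out_of_range("day of week must be in 0..6");
    }
    int value() const {
        assert(dow_ >= 0 && dow_ <= 6);   // invariant set up by the constructor
        return dow_;
    }
};

// Returns true if constructing DayOfWeek(d) is rejected.
inline bool is_rejected(int d) {
    try {
        DayOfWeek day(d);
        (void)day;
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With this class, an attempt to create day 19 fails at the point of construction instead of
quietly producing a meaningless value.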
On the other end of the scale, it's also possible to have a class try to do too much, or to cover
several disparate abstractions. For example, in statistics, it doesn't make sense to mix Mean
and Correlation. These statistical methods have little in common. If you have a class
"Statistics" with both of these in it, along with an add() member function to add new values,
the result will be a mishmash. For example, for Mean, you need a stream of single values,
whereas for Correlation, you need a sequence of (X,Y) pairs.
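A sketch of the alternative, with one class per abstraction, might look as follows. The
member names are hypothetical, and Correlation is assumed here to mean the Pearson
correlation coefficient. Note that the two add() functions have different signatures, which
is exactly why they do not belong in one class.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Running mean of a stream of single values.
class Mean {
    double sum_;
    std::size_t n_;
public:
    Mean() : sum_(0.0), n_(0) {}
    void add(double v) { sum_ += v; ++n_; }
    double value() const {
        assert(n_ > 0);                 // the mean of no values is undefined
        return sum_ / static_cast<double>(n_);
    }
};

// Pearson correlation of a sequence of (X,Y) pairs.
class Correlation {
    double sx_, sy_, sxx_, syy_, sxy_;
    std::size_t n_;
public:
    Correlation() : sx_(0), sy_(0), sxx_(0), syy_(0), sxy_(0), n_(0) {}
    void add(double x, double y) {
        sx_ += x;
        sy_ += y;
        sxx_ += x * x;
        syy_ += y * y;
        sxy_ += x * y;
        ++n_;
    }
    double value() const {
        double n = static_cast<double>(n_);
        double cov = n * sxy_ - sx_ * sy_;
        double vx = n * sxx_ - sx_ * sx_;
        double vy = n * syy_ - sy_ * sy_;
        assert(vx > 0.0 && vy > 0.0);   // needs variation in both X and Y
        return cov / std::sqrt(vx * vy);
    }
};
```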
We will have more to say about OOD principles. A good book illustrating several object-
oriented design principles is "Designing and Coding Reusable C++" by Martin Carroll and
Margaret Ellis, published by Addison-Wesley.
We've declared a class Point with a couple of private data members. There is a constructor to
create new object instances of Point, and a member function dist() to compute the distance
between this point and another one.
Suppose that we instead implemented this as C code. We might have:
struct Point {
float x;
float y;
};
float Point_dist(struct Point*, struct Point*);
and so on.
The C approach will certainly work, so why all the fuss about data abstraction and C++? There
are several reasons for the fuss. One is simply that data abstraction is a useful way of looking
at the organization of a software program. Rather than decomposing a program in terms of its
functional structure, we instead ask the question "What data types are we operating on, and
what sorts of operations do we wish to do on them?"
With data abstraction, there is a distinction made between the representation of a type, and
public operations on and behavior of that type. For example, I as a user of Point don't have to
know or care that internally, a point is represented by a couple of floating-point numbers.
Other choices might conceivably be doubles or longs or shorts. All I care about is the public
behavior of the type.
In a similar vein, data abstraction allows for the formal manipulation of types in a
mathematical sense. For example, suppose that we are dealing with screen points in the range
0-1000, typical of windowing systems today. And we are using the C approach, and say:
Point p;
p.x = 125;
p.y = -59;
What does this mean? The domain of the type has been violated, by introduction of an invalid
value for Y. This sort of invalid value can easily be screened out in a C++ constructor for
Point. Without maintaining integrity of a type, it's hard to reason about the behavior of the
type, for example, whether dist() really does compute the distance appropriately.
Also, if the representation of a type is hidden, it can be changed at a later time without
affecting the users of the type.
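To illustrate screening out invalid values in a constructor, here is a minimal sketch of a
Point class. The member names, the choice to throw an exception, and the exact range check
are assumptions made for this illustration; only the constructor and dist() come from the
discussion above.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

class Point {
    float x_;   // representation is private and could later change
    float y_;
public:
    // Rejects coordinates outside the assumed screen range 0-1000.
    Point(float x, float y) : x_(x), y_(y) {
        if (x < 0.0f || x > 1000.0f || y < 0.0f || y > 1000.0f)
            throw std::out_of_range("Point coordinates must be in 0..1000");
    }
    // Distance between this point and another one.
    float dist(const Point& other) const {
        float dx = x_ - other.x_;
        float dy = y_ - other.y_;
        return std::sqrt(dx * dx + dy * dy);
    }
};

// Small self-check: a 3-4-5 triangle, and the invalid point from the text.
inline void point_demo() {
    Point a(0.0f, 0.0f);
    Point b(3.0f, 4.0f);
    assert(std::fabs(a.dist(b) - 5.0f) < 1e-5f);

    bool rejected = false;
    try {
        Point p(125.0f, -59.0f);
        (void)p;
    } catch (const std::out_of_range&) {
        rejected = true;
    }
    assert(rejected);
}
```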
As another simple example of data abstraction, consider designing a String class. In C, strings
are implemented simply as character pointers, that is, of type "char*". Such pointers tend to be
error prone, and we might desire a higher-level alternative.
In terms of the actual string representation, we obviously have to store the string's characters,
and we also might want to store the string length separately from the actual characters.
Some of the operations on strings that we might want would include:
- creating a String from a char*
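The representation just described, together with the first operation in the list, can be
sketched as follows. This is a minimal illustration under stated assumptions rather than a
complete String class: the member names are hypothetical, the constructor assumes a non-null,
null-terminated argument, and copying and destruction are included only so that the sketch
manages its memory correctly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class String {
    char* chars_;       // owned, null-terminated copy of the characters
    std::size_t len_;   // length stored separately, excluding the terminator

    void copy_from(const char* s, std::size_t n) {
        chars_ = new char[n + 1];
        std::memcpy(chars_, s, n);
        chars_[n] = '\0';
        len_ = n;
    }

public:
    // Creating a String from a char*.
    String(const char* s) {
        assert(s != 0);
        copy_from(s, std::strlen(s));
    }
    String(const String& other) { copy_from(other.chars_, other.len_); }
    String& operator=(const String& other) {
        if (this != &other) {
            char* old = chars_;
            copy_from(other.chars_, other.len_);
            delete[] old;
        }
        return *this;
    }
    ~String() { delete[] chars_; }

    std::size_t length() const { return len_; }   // no need to rescan the characters
    const char* c_str() const { return chars_; }
};
```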
class A {
public:
virtual void f() {cout << "A::f" << endl;}
};
class B : public A {
public:
virtual void f() {cout << "B::f" << endl;}
};
int main()
{
B b;
A* ap = &b;
ap->f();
return 0;
}
which calls B::f(). That is, the base class pointer ap "really" points at a B object, and so B::f()
is called.
This feature requires some run-time assistance to determine which type of object is really
being manipulated, and which f() to call. One implementation approach uses a hidden pointer
in each object instance, that points at a table of function pointers (a virtual table or vtbl), and
dispatches accordingly.
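The hidden-pointer approach can be imitated by hand, which may help in picturing it. The
following is only a sketch of the idea, not what any particular compiler generates; the names
Object, VTable, vptr, make_A(), make_B(), and call_f() are all hypothetical.

```cpp
#include <cassert>
#include <string>

struct Object;                                  // forward declaration
typedef std::string (*FPtr)(const Object*);     // type of the "virtual" function f

struct VTable { FPtr f; };                      // one table per class
struct Object { const VTable* vptr; };          // the normally hidden pointer

std::string A_f(const Object*) { return "A::f"; }
std::string B_f(const Object*) { return "B::f"; }

const VTable A_vtbl = { A_f };
const VTable B_vtbl = { B_f };

// "Constructors": each sets the pointer to its own class's table.
Object make_A() { Object o = { &A_vtbl }; return o; }
Object make_B() { Object o = { &B_vtbl }; return o; }

// What a call such as ap->f() amounts to: look up the table, then call through it.
std::string call_f(const Object* p) { return p->vptr->f(p); }

inline void vtbl_demo() {
    Object b = make_B();
    const Object* ap = &b;
    assert(call_f(ap) == "B::f");   // dispatches on the object, not on the pointer type
}
```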
Without language support for polymorphism, one would have to say something like:
#include <iostream>
using namespace std;
class A {
public:
int type;
A() {type = 1;}
void f() {cout << "A::f" << endl;}
};
class B : public A {
public:
B() {type = 2;}
void f() {cout << "B::f" << endl;}
};
int main()
{
B b;
A* ap = &b;
if (ap->type == 1)
ap->f();
else
((B*)ap)->f();
return 0;
}
that is, use an explicit type field. This is cumbersome.
The use of base/derived classes (superclasses and subclasses) in combination with
polymorphic functions goes by the technical name of "object-oriented programming".
It's interesting to note that in Java, methods (functions) are by default polymorphic, and one
has to specifically disable this feature by use of the "final", "private", or "static" keywords. In
C++ the default goes the other way.
dt.m = 27;
What does this mean? Probably nothing good. So it would be better to rewrite this as:
class Date {
int m;
int d;
int y;
public:
Date(int, int, int);
};
with a public constructor that will properly initialize a Date object.
In C++, data members of a class may be private (the default), protected (available to derived
classes), or public (available to everyone).
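The three access levels can be seen side by side in a short sketch. The classes Counter and
DoubleCounter are hypothetical and exist only for this illustration; the same rules apply to
member functions as to data members.

```cpp
#include <cassert>

class Counter {
    int count_;                             // private: the default for a class
protected:
    void bump(int by) { count_ += by; }     // available to derived classes
public:
    Counter() : count_(0) {}
    int count() const { return count_; }    // available to everyone
};

class DoubleCounter : public Counter {
public:
    void tick() { bump(2); }                // fine: bump() is protected in the base
    // void reset() { count_ = 0; }         // would not compile: count_ is private
};

inline void access_demo() {
    DoubleCounter c;
    c.tick();
    c.tick();
    assert(c.count() == 4);
    // c.bump(1);                           // would not compile: bump() is protected
}
```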
A simple and useful technique for controlling access to the private state of an object is to
define some member functions for setting and getting values:
class A {
    int x;
public:
    void set_x(int i) {x = i;}
    int get_x() const {return x;}
};
These functions are inline and have little or no performance overhead.
In C++ there is another sort of hiding available, that offered by namespaces. Suppose that you
have a program with some global data in it:
int x[100];
and you use a C++ class library that also uses global data:
double x = 12.34;
These names will clash when you attempt to link the program. A simple solution is to use
namespaces:
namespace Company1 {
int x[100];
}
namespace Company2 {
double x = 12.34;
}
and refer to the values as "Company1::x" and "Company2::x". Note that the Java language has
no global variables, and similar usage to this example would involve static data defined in
classes.
Data hiding is a simple but extremely important concept. Without it, it is difficult to reason
about the behavior of an object, given that its state can be arbitrarily changed at any point.
This will work, until such time as the internal representation is changed to something else. At
that point, this usage will be invalidated, and will not compile or will introduce subtle
problems into a running program (what if I change the stack origin by 1?).
The point is simply that exposing the internal representation introduces a set of problems with
program reliability and maintainability.
It's useful to distinguish between developers, who may wish to extend a class, and end users.
For example, with the Date class, the representation (number of days since 1/1/1800) is non-
standard, and in a hard format to manipulate. So it makes sense to hide the representation
completely. On the other hand, for TreeNode, with binary trees as a well-understood entity,
giving a developer access to the representation may be a good idea.
There's quite a bit more to say about extensibility, which we will do in future issues.
class A {
public:
A() {cout << "A::A\n";}
~A() {cout << "A::~A\n";}
};
class B : public A {
public:
B() {cout << "B::B\n";}
~B() {cout << "B::~B\n";}
};
class C : public B {
public:
C() {cout << "C::C\n";}
~C() {cout << "C::~C\n";}
};
void f()
{
C c;
}
int main()
{
f();
return 0;
}
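in fact causes the constructors for A and B to be called as well, before the constructor for C,
and likewise causes the destructors for B and A to be called, in that order, after the
destructor for C.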
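The order of the calls can be checked with a small variation on the example, in which each
constructor and destructor records its name instead of printing it. The global vector calls
and the function order_demo() are hypothetical additions for this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> calls;   // hypothetical log of constructor/destructor calls

class A {
public:
    A() { calls.push_back("A::A"); }
    ~A() { calls.push_back("A::~A"); }
};

class B : public A {
public:
    B() { calls.push_back("B::B"); }
    ~B() { calls.push_back("B::~B"); }
};

class C : public B {
public:
    C() { calls.push_back("C::C"); }
    ~C() { calls.push_back("C::~C"); }
};

void f()
{
    C c;   // one declaration, six calls
}

inline void order_demo() {
    calls.clear();
    f();
    assert(calls.size() == 6);
    // Construction runs base-first, destruction in the reverse order.
    assert(calls[0] == "A::A" && calls[1] == "B::B" && calls[2] == "C::C");
    assert(calls[3] == "C::~C" && calls[4] == "B::~B" && calls[5] == "A::~A");
}
```

Each additional level of derivation adds another constructor and destructor call to every
object created, which is one cost to keep in mind when derivations grow deep.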
As a simple rule of thumb, I personally try to keep derivations to three levels or less. In other
words, a base class, and a couple of levels of derived classes from it.