C++ FAQ
A class defines a data type, much like a struct would be in C. In a computer science sense, a type
consists of both a set of states and a set of operations which transition between those states. Thus
int is a type because it has both a set of states and it has operations like i + j or i++, etc. In exactly
the same way, a class provides a set of (usually public) operations, and a set of (usually non-public)
data bits representing the abstract values that instances of the type can have.
You can imagine that int is a class that has member functions called operator++, etc. (int isn't
really a class, but the basic analogy is this: a class is a type, much like int is a type.)
Note: a C programmer can think of a class as a C struct whose members default to private. But if
that's all you think of a class, then you probably need to experience a personal paradigm shift.
After the declaration int i; we say that "i is an object of type int." In OO/C++, "object" usually
means "an instance of a class." Thus a class defines the behavior of possibly many objects
(instances).
When it provides a simplified view of a chunk of software, and it is expressed in the vocabulary of a
user (where a "chunk" is normally a class or a tight group of classes, and a "user" is another
developer rather than the ultimate customer).
• The "simplified view" means unnecessary details are intentionally hidden. This reduces the
user's defect-rate.
• The "vocabulary of users" means users don't need to learn a new set of words and concepts.
This reduces the user's learning curve.
The key money-saving insight is to separate the volatile part of some chunk of software from the
stable part. Encapsulation puts a firewall around the chunk, which prevents other chunks from
accessing the volatile parts; other chunks can only access the stable parts. This prevents the other
chunks from breaking if (when!) the volatile parts are changed. In the context of OO software, a
"chunk" is normally a class or a tight group of classes.
The "volatile parts" are the implementation details. If the chunk is a single class, the volatile part is
normally encapsulated using the private and/or protected keywords. If the chunk is a tight group of
classes, encapsulation can be used to deny access to entire classes in that group. Inheritance can
also be used as a form of encapsulation.
The "stable parts" are the interfaces. A good interface provides a simplified view in the vocabulary
of a user, and is designed from the outside-in (here a "user" means another developer, not the end-
user who buys the completed application). If the chunk is a single class, the interface is simply the
class's public member functions and friend functions. If the chunk is a tight group of classes, the
interface can include several of the classes in the chunk.
Designing a clean interface and separating that interface from its implementation merely allows
users to use the interface. But encapsulating (putting "in a capsule") the implementation forces
users to use the interface.
[7.5] How does C++ help with the tradeoff of safety vs. usability?
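In C, encapsulation was accomplished by making things static in a compilation unit or module. This
prevented another module from accessing the static stuff. (By the way, C++98 deprecated that use of
static at namespace scope in favor of unnamed namespaces; C++11 withdrew the deprecation, but
unnamed namespaces remain the usual C++ idiom.)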
Unfortunately this approach doesn't support multiple instances of the data, since there is no direct
support for making multiple instances of a module's static data. If multiple instances were needed
in C, programmers typically used a struct. But unfortunately C structs don't support encapsulation.
This exacerbates the tradeoff between safety (information hiding) and usability (multiple instances).
In C++, you can have both multiple instances and encapsulation via a class. The public part of a
class contains the class's interface, which normally consists of the class's public member functions
and its friend functions. The private and/or protected parts of a class contain the class's
implementation, which is typically where the data lives.
The end result is like an "encapsulated struct." This reduces the tradeoff between safety
(information hiding) and usability (multiple instances).
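A minimal sketch of the "encapsulated struct", using a hypothetical Counter class: the data is private (safety), yet every Counter object carries its own copy of that data (multiple instances), which neither a C struct nor file-scope static data gives you on its own.

```cpp
#include <cassert>

// Hypothetical example type: private data, public interface, many instances.
class Counter {
public:
  Counter() : count_(0) { }
  void increment() { ++count_; }
  int value() const { return count_; }
private:
  int count_;  // implementation detail; reachable only through the interface
};

void exampleUsage()
{
  Counter a, b;       // two independent instances
  a.increment();
  a.increment();
  b.increment();
  assert(a.value() == 2);
  assert(b.value() == 1);
  // a.count_ = 5;    // would not compile: count_ is private
}
```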
[7.6] How can I prevent other programmers from violating encapsulation by seeing the
private parts of my class?
It doesn't violate encapsulation for a programmer to see the private and/or protected parts of your
class, so long as they don't write code that somehow depends on what they saw. In other words,
encapsulation doesn't prevent people from knowing about the inside of a class; it prevents the code
they write from becoming dependent on the insides of the class. Your company doesn't have to pay
a "maintenance cost" to maintain the gray matter between your ears; but it does have to pay a
maintenance cost to maintain the code that comes out of your fingertips. What you know as a
person doesn't increase maintenance cost, provided the code you write depends on the interface
rather than the implementation.
Besides, this is rarely if ever a problem. I don't know any programmers who have intentionally tried
to access the private parts of a class. "My recommendation in such cases would be to change the
programmer, not the code" [James Kanze; used with permission].
No.
Encapsulation != security.
[7.8] What's the difference between the keywords struct and class?
The members and base classes of a struct are public by default, while in class, they default to
private. Note: you should make your base classes explicitly public, private, or protected, rather than
relying on the defaults.
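void swap(int& i, int& j)  // i and j are references to the caller's ints
{
int tmp = i;
i = j;
j = tmp;
}
int main()
{
int x, y;
// ...
swap(x,y);
}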
Here i and j are aliases for main's x and y respectively. In other words, i is x — not a pointer to x,
nor a copy of x, but x itself. Anything you do to i gets done to x, and vice versa.
OK. That's how you should think of references as a programmer. Now, at the risk of confusing you
by giving you a different perspective, here's how references are implemented. Underneath it all, a
reference i to object x is typically the machine address of the object x. But when the programmer
says i++, the compiler generates code that increments x. In particular, the address bits that the
compiler uses to find x are not changed. A C programmer will think of this as if you used the C style
pass-by-pointer, with the syntactic variant of (1) moving the & from the caller into the callee, and
(2) eliminating the *s. In other words, a C programmer will think of i as a macro for (*p), where p
is a pointer to x (i.e., the compiler automatically dereferences the underlying pointer; i++ is
changed to (*p)++; i = 7 is automatically changed to *p = 7).
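To make that C-programmer's view concrete, here is a sketch with two hypothetical functions, swapByPointer() and swapByReference(). The reference version reads as if every i were rewritten to (*p), and both have the same effect on the caller's variables; only the spelling at the call site and in the callee differs.

```cpp
#include <cassert>

// C style: the caller passes addresses (&x), the callee dereferences (*p).
void swapByPointer(int* p, int* q)
{
  int tmp = *p;
  *p = *q;
  *q = tmp;
}

// C++ style: the & moves into the parameter list and the *s disappear.
// i and j are the caller's objects themselves, not copies.
void swapByReference(int& i, int& j)
{
  int tmp = i;
  i = j;
  j = tmp;
}

void exampleUsage()
{
  int x = 1, y = 2;
  swapByPointer(&x, &y);   // explicit & at the call site
  assert(x == 2 && y == 1);
  swapByReference(x, y);   // no & at the call site
  assert(x == 1 && y == 2);
}
```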
Important note: Even though a reference is often implemented using an address in the underlying
assembly language, please do not think of a reference as a funny looking pointer to an object. A
reference is the object. It is not a pointer to the object, nor a copy of the object. It is the object.
You change the state of the referent (the referent is the object to which the reference refers).
Remember: the reference is the referent, so changing the reference changes the state of the
referent. In compiler writer lingo, a reference is an "lvalue" (something that can appear on the left
hand side of an assignment operator).
The function call can appear on the left hand side of an assignment operator.
This ability may seem strange at first. For example, no one thinks the expression f() = 7 makes
sense. Yet, if a is an object of class Array, most people think that a[i] = 7 makes sense even though
a[i] is really just a function call in disguise (it calls Array::operator[](int), which is the subscript
operator for class Array).
class Array {
public:
int size() const;
float& operator[] (int index);
// ...
};
int main()
{
Array a;
for (int i = 0; i < a.size(); ++i)
a[i] = 7; // This line invokes Array::operator[](int)
}
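The Array class above only declares its members. Here is one possible completion, assuming a fixed size of 10 floats held in a private member array (the capacity and the member name data_ are assumptions for illustration). Because operator[] returns float&, the expression a[i] names the element itself, so it can appear on the left-hand side of =.

```cpp
#include <cassert>

class Array {
public:
  int size() const { return capacity; }
  float& operator[] (int index);        // returns a reference, so a[i] is an lvalue
  float operator[] (int index) const;   // read-only version for const Arrays
private:
  static const int capacity = 10;       // assumed fixed size, for illustration only
  float data_[capacity] = {};           // hypothetical storage, zero-initialized
};

float& Array::operator[] (int index)
{
  assert(index >= 0 && index < capacity);  // catch bad subscripts in debug builds
  return data_[index];                      // the caller gets the element itself
}

float Array::operator[] (int index) const
{
  assert(index >= 0 && index < capacity);
  return data_[index];
}

void exampleUsage()
{
  Array a;
  for (int i = 0; i < a.size(); ++i)
    a[i] = 7;                 // assigns through the returned reference
  assert(a[3] == 7.0f);
}
```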
[Recently created thanks to Robert Cullen (in 4/01).]
It chains these method calls, which is why this is called method chaining.
The first thing that gets executed is object.method1(). This returns some object, which might be a
reference to object (i.e., method1() might end with return *this;), or it might be some other object.
Let's call the returned object objectB. Then objectB becomes the this object of method2().
The most common use of method chaining is in the iostream library. E.g., cout << x << y works
because cout << x is a function call that returns cout.
A less common, but still rather slick, use for method chaining is in the Named Parameter Idiom.
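As a sketch of the Named Parameter Idiom, assume a hypothetical OpenFile options class (every name here is illustrative). Each setter ends with return *this;, so each call in the chain operates on the same object and the optional settings read like named arguments.

```cpp
#include <cassert>
#include <string>

// Hypothetical option set; each setter returns *this so calls can be chained.
class OpenFile {
public:
  explicit OpenFile(const std::string& filename)
    : filename_(filename), readonly_(false), createIfMissing_(false), blockSize_(4096) { }

  OpenFile& readonly()            { readonly_ = true;        return *this; }
  OpenFile& createIfMissing()     { createIfMissing_ = true; return *this; }
  OpenFile& blockSize(unsigned n) { blockSize_ = n;          return *this; }

  const std::string& filename() const { return filename_; }
  bool isReadonly() const { return readonly_; }
  bool willCreate() const { return createIfMissing_; }
  unsigned getBlockSize() const { return blockSize_; }
private:
  std::string filename_;
  bool readonly_;
  bool createIfMissing_;
  unsigned blockSize_;
};

void exampleUsage()
{
  // Each call returns the same (temporary) OpenFile by reference;
  // f is then copy-initialized from it.
  OpenFile f = OpenFile("data.txt").readonly().blockSize(1024);
  assert(f.isReadonly());
  assert(!f.willCreate());
  assert(f.getBlockSize() == 1024);
}
```

The defaults live in one constructor, and callers mention only the options they care about, in any order.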
[8.5] How can you reseat a reference to make it refer to a different object?
No way.
Unlike a pointer, once a reference is bound to an object, it cannot be "reseated" to another object.
The reference itself isn't an object (it has no identity; taking the address of a reference gives you
the address of the referent; remember: the reference is its referent).
In that sense, a reference is similar to a const pointer such as int* const p (as opposed to a pointer
to const such as const int* p). In spite of the gross similarity, please don't confuse references with
pointers; they're not at all the same.
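A short sketch of what an attempted "reseat" actually does: assigning to a reference copies a value into the referent rather than rebinding the reference. The const pointer alongside shows the similarity mentioned above: the pointer itself can't change, but the pointed-to object can.

```cpp
#include <cassert>

void exampleUsage()
{
  int x = 1, y = 2;
  int& r = x;          // r is bound to x for the rest of its lifetime
  r = y;               // does NOT rebind r to y; it assigns y's value (2) to x
  assert(x == 2);
  assert(&r == &x);    // taking r's address yields x's address
  y = 3;
  assert(x == 2);      // x is unaffected: r never referred to y

  int* const p = &x;   // const pointer: like a reference, it cannot be reseated
  *p = 5;              // but the pointed-to object can be changed
  // p = &y;           // would not compile: p itself is const
  assert(x == 5 && r == 5);
}
```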
[8.6] When should I use references, and when should I use pointers?
Use references when you can, and pointers when you have to.
References are usually preferred over pointers whenever you don't need "reseating". This usually
means that references are most useful in a class's public interface. References typically appear on
the skin of an object, and pointers on the inside.
The exception to the above is where a function's parameter or return value needs a "sentinel"
reference. This is usually best done by returning/taking a pointer, and giving the NULL pointer this
special significance (references should always alias objects, not a dereferenced NULL pointer).
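A minimal sketch of a pointer used as a "sentinel" return value, assuming a hypothetical findById() that searches a std::vector of a simple Fred struct. A null pointer means "not found", which a reference could only express by aliasing a dereferenced null pointer. (nullptr is the C++11 spelling of the null pointer constant.)

```cpp
#include <cassert>
#include <vector>

struct Fred {   // hypothetical minimal Fred
  int id;
};

// Returns a pointer to the matching Fred, or nullptr as the "not found" sentinel.
// The returned pointer is invalidated if the vector later reallocates.
Fred* findById(std::vector<Fred>& freds, int id)
{
  for (Fred& f : freds)
    if (f.id == id)
      return &f;
  return nullptr;
}

void exampleUsage()
{
  std::vector<Fred> freds = { {1}, {2}, {3} };
  Fred* found = findById(freds, 2);
  assert(found != nullptr && found->id == 2);
  assert(findById(freds, 42) == nullptr);   // sentinel: nothing to refer to
}
```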
Note: Old line C programmers sometimes don't like references since they provide reference
semantics that aren't explicit in the caller's code. After some C++ experience, however, one quickly
realizes this is a form of information hiding, which is an asset rather than a liability. E.g.,
programmers should write code in the language of the problem rather than the language of the
machine.
The term handle is used to mean any technique that lets you get to another object — a generalized
pseudo-pointer. The term is (intentionally) ambiguous and vague.
Ambiguity is actually an asset in certain cases. For example, during early design you might not be
ready to commit to a specific representation for the handles. You might not be sure whether you'll
want simple pointers vs. references vs. pointers-to-pointers vs. pointers-to-references vs. integer
indices into an array vs. strings (or other key) that can be looked up in a hash-table (or other data
structure) vs. database keys vs. some other technique. If you merely know that you'll need some
sort of thingy that will uniquely identify and get to an object, you call the thingy a Handle.
So if your ultimate goal is to enable a glop of code to uniquely identify/look-up a specific object of
some class Fred, you need to pass a Fred handle into that glop of code. The handle might be a
string that can be used as a key in some well-known lookup table (e.g., a key in a
std::map<std::string,Fred> or a std::map<std::string,Fred*>), or it might be an integer that
would be an index into some well-known array (e.g., Fred* array = new Fred[maxNumFreds]), or it
might be a simple Fred*, or it might be something else.
Novices often think in terms of pointers, but in reality there are downside risks to using raw
pointers. E.g., what if the Fred object needs to move? How do we know when it's safe to delete the
Fred objects? What if the Fred object needs to (temporarily) get serialized on disk? etc., etc. Most of
the time we add more layers of indirection to manage situations like these. For example, the
handles might be Fred**, where the pointed-to Fred* pointers are guaranteed to never move but
when the Fred objects need to move, you just update the pointed-to Fred* pointers. Or you make
the handle an integer then have the Fred objects (or pointers to the Fred objects) looked up in a
table/array/whatever. Or whatever.
The point is that we use the word Handle when we don't yet know the details of what we're going to
do.
Another time we use the word Handle is when we want to be vague about what we've already done
(sometimes the term magic cookie is used for this as well, as in, "The software passes around a
magic cookie that is used to uniquely identify and locate the appropriate Fred object"). The reason
we (sometimes) want to be vague about what we've already done is to minimize the ripple effect
if/when the specific details/representation of the handle change. E.g., if/when someone changes the
handle from a string that is used in a lookup table to an integer that is looked up in an array, we
don't want to go and update a zillion lines of code.
To further ease maintenance if/when the details/representation of a handle changes (or to generally
make the code easier to read/write), we often encapsulate the handle in a class. This class often
overloads operators operator-> and operator* (since the handle acts like a pointer, it might as well
look like a pointer).
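Here is a sketch of such a handle class, assuming the representation is an integer index into a hypothetical global table named fredTable. Users only see FredHandle, so changing the representation later (to a string key, a Fred**, a database key) would touch this class rather than its callers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Fred {   // hypothetical minimal Fred
  std::string name;
  void rename(const std::string& n) { name = n; }
};

// Hypothetical lookup table; the handle stores an index into it.
std::vector<Fred> fredTable;

class FredHandle {
public:
  explicit FredHandle(std::size_t index) : index_(index) { }
  // Pointer-like access. The lookup happens on every use, so the Fred objects
  // may move (e.g., when fredTable grows) without invalidating the handle.
  Fred& operator*() const  { return fredTable.at(index_); }
  Fred* operator->() const { return &fredTable.at(index_); }
private:
  std::size_t index_;   // representation detail hidden from users
};

FredHandle createFred(const std::string& name)
{
  fredTable.push_back(Fred{name});
  return FredHandle(fredTable.size() - 1);
}

void exampleUsage()
{
  FredHandle h = createFred("Wilma");
  for (int i = 0; i < 100; ++i)
    createFred("filler");          // may reallocate fredTable; h stays valid
  assert(h->name == "Wilma");
  (*h).rename("Betty");
  assert(h->name == "Betty");
}
```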
An inline function is a function whose code gets inserted into the caller's code stream. Like a
#define macro, inline functions improve performance by avoiding the overhead of the call itself and
(especially!) by the compiler being able to optimize through the call ("procedural integration").
[9.2] How can inline functions help with the tradeoff of safety vs. speed?
In straight C, you can achieve "encapsulated structs" by putting a void* in a struct, in which case
the void* points to the real data that is unknown to users of the struct. Therefore users of the struct
don't know how to interpret the stuff pointed to by the void*, but the access functions cast the
void* to the appropriate hidden type. This gives a form of encapsulation.
Unfortunately it forfeits type safety, and also imposes a function call to access even trivial fields of
the struct (if you allowed direct access to the struct's fields, anyone and everyone would be able to
get direct access since they would of necessity know how to interpret the stuff pointed to by the
void*; this would make it difficult to change the underlying data structure).
Function call overhead is small, but can add up. C++ classes allow function calls to be expanded
inline. This lets you have the safety of encapsulation along with the speed of direct access.
Furthermore the parameter types of these inline functions are checked by the compiler, an
improvement over C's #define macros.
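A minimal sketch contrasting the two approaches, with a hypothetical temperature value. The C-style version hides its data behind a void* and must cast in every access function, with no type checking; the C++ class keeps the data private while the accessor, defined in the class body (and therefore implicitly inline), can be expanded by the compiler at the call site.

```cpp
#include <cassert>

// C-style "encapsulated struct": users only see an opaque void*.
struct OpaqueTemp {
  void* impl;   // really points to a double, but users can't know that
};

double opaqueGet(const OpaqueTemp* t)
{
  return *static_cast<const double*>(t->impl);  // cast back to the hidden type (unchecked)
}

// C++ class: private data plus a type-checked, inline accessor.
class Temperature {
public:
  explicit Temperature(double celsius) : celsius_(celsius) { }
  double celsius() const { return celsius_; }
private:
  double celsius_;
};

void exampleUsage()
{
  double hidden = 21.5;
  OpaqueTemp ot = { &hidden };
  assert(opaqueGet(&ot) == 21.5);

  Temperature t(21.5);
  assert(t.celsius() == 21.5);
}
```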
[9.3] Why should I use inline functions? Why not just use plain old #define macros?
Because #define macros are evil in 4 different ways.
Unlike #define macros, inline functions avoid infamous macro errors since inline functions always
evaluate every argument exactly once. In other words, invoking an inline function is semantically
just like invoking a regular function, only faster:
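#define unsafe(i) ( (i) >= 0 ? (i) : -(i) )          // macro: i may be evaluated twice
inline int safe(int i) { return i >= 0 ? i : -i; }  // evaluates its argument once
int f();
void userCode(int x)
{
int ans = unsafe(f());  // Danger! f() is called twice
ans = safe(x++);        // Correct! x is incremented exactly once
}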
Also unlike macros, argument types are checked, and necessary conversions are performed
correctly.
Macros are bad for your health; don't use them unless you have to.
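The type-checking point can be seen with a hypothetical HALF macro and an equivalent inline function half() that takes a double. The macro just pastes its argument into an integer division, while the function converts the argument to its parameter type before dividing.

```cpp
#include <cassert>

#define HALF(x) ((x) / 2)        // no types: HALF(3) expands to the int expression 3 / 2

inline double half(double x)     // an int argument such as 3 is converted to 3.0 first
{
  return x / 2;
}

void exampleUsage()
{
  assert(HALF(3) == 1);          // integer division truncates
  assert(half(3) == 1.5);        // argument converted as the parameter type requires
}
```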
[9.4] How do you tell the compiler to make a non-member function inline?
When you declare an inline function, it looks just like a normal function:
void f(int i, char c);
But when you define an inline function, you prepend the function's definition with the keyword
inline, and you put the definition into a header file:
inline
void f(int i, char c)
{
// ...
}
Note: It's imperative that the function's definition (the part between the {...}) be placed in a
header file, unless the function is used only in a single .cpp file. In particular, if you put the inline
function's definition into a .cpp file and you call it from some other .cpp file, you'll get an
"unresolved external" error from the linker.
[9.5] How do you tell the compiler to make a member function inline?
When you declare an inline member function, it looks just like a normal member function:
class Fred {
public:
void f(int i, char c);
};
But when you define an inline member function, you prepend the member function's definition with
the keyword inline, and you put the definition into a header file:
inline
void Fred::f(int i, char c)
{
// ...
}
It's usually imperative that the function's definition (the part between the {...}) be placed in a
header file. If you put the inline function's definition into a .cpp file, and if it is called from some
other .cpp file, you'll get an "unresolved external" error from the linker.
[9.6] Is there another way to tell the compiler to make a member function inline?
Yep: define the member function in the class body itself:
class Fred {
public:
void f(int i, char c)
{
// ...
}
};
Although this is easier on the person who writes the class, it's harder on all the readers since it
mixes "what" a class does with "how" it does it. Because of this mixture, we normally prefer to
define member functions outside the class body with the inline keyword. The insight that makes
sense of this: in a reuse-oriented world, there will usually be many people who use your class, but
there is only one person who builds it (yourself); therefore you should do things that favor the
many rather than the few.
Nope.
Beware that overuse of inline functions can cause code bloat, which can in turn have a negative
performance impact in paging environments.
The term code bloat simply means that the size of the code gets larger (bloated). In the context of
inline functions, the concern is that too many inline functions might increase the size of the
executable (i.e., cause code bloat), and that might cause the operating system to thrash, which
simply means it spends most of its time going out to disk to pull in the next chunk of code.
Of course it's also possible that inline functions will decrease the size of the executable. This may
seem backwards, but it's really true. In particular, the amount of code necessary to call a function is
sometimes greater than the amount of code to expand the function inline. This can happen with
very short functions, and it can also happen with long functions when the optimizer is able to
remove a lot of redundant code — that is, when the optimizer is able to make the long function
short.
So the message is this: there is no simple answer. You have to play with it to see what is best. Do
not settle for a simplistic answer like, "Never use inline functions" or "Always use inline functions" or
"Use inline functions if and only if the function is less than N lines of code." These one-size-fits-all
rules may be easy to use, but they will produce sub-optimal results.
Constructors are like "init functions". They turn a pile of arbitrary bits into a living object. Minimally
they initialize internally used fields. They may also allocate resources (memory, files, semaphores,
sockets, etc.).
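A small sketch of a constructor doing both jobs, assuming a hypothetical Buffer class: it initializes its bookkeeping field and acquires a resource (heap memory, held here in a std::vector so it is released automatically when the Buffer dies).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Buffer {
public:
  // Turns raw storage into a usable object: sets the fields and
  // acquires the memory the object needs.
  explicit Buffer(std::size_t capacity)
    : data_(capacity), used_(0) { }

  bool append(char c)
  {
    if (used_ == data_.size()) return false;   // full
    data_[used_++] = c;
    return true;
  }
  std::size_t used() const { return used_; }
  std::size_t capacity() const { return data_.size(); }
private:
  std::vector<char> data_;   // the acquired resource
  std::size_t used_;         // internal bookkeeping field
};

void exampleUsage()
{
  Buffer b(2);
  assert(b.capacity() == 2 && b.used() == 0);
  assert(b.append('a') && b.append('b'));
  assert(!b.append('c'));    // capacity reached
}
```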
A big difference!
Suppose that List is the name of some class. Then function f() declares a local List object called x:
void f()
{
List x; // Local object named x (of class List)
// ...
}
But function g() declares a function called x() that returns a List:
void g()
{
List x(); // Function named x (that returns a List)
// ...
}
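To see the difference mechanically, here is a sketch using a stand-in List (a hypothetical empty class) and std::is_function / std::is_same from the standard <type_traits> header. List x(); declares a function, while List y; and the C++11 brace form List z{}; both define objects.

```cpp
#include <type_traits>

class List { };   // stand-in for the List class in the text

void g()
{
  List x();       // declares a function named x that returns a List
  static_assert(std::is_function<decltype(x)>::value, "x is a function");

  List y;         // defines a local List object
  List z{};       // C++11: empty braces also define an object (value-initialized)
  static_assert(std::is_same<decltype(y), List>::value, "y is an object");
  static_assert(std::is_same<decltype(z), List>::value, "z is an object");
  (void)y;
  (void)z;
}
```

All the checks happen at compile time, so the block compiles only if the parse is as described.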
No way.
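Dragons be here: if you call another constructor from within a constructor's body, the compiler
initializes a temporary local object; it does not initialize this object. You can combine both
constructors by using a default parameter, or you can share their common code in a private init()
member function. (Since C++11, a delegating constructor can name another constructor of the same
class in its initialization list; that is a different mechanism from calling a constructor in the body.)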
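A sketch of both suggestions, using hypothetical Foo and Bar classes. The comment at the top shows the trap, where Foo(c, 0) in the body merely builds and discards a temporary; Foo merges two constructors with a default parameter, and Bar keeps two constructors that share a private init().

```cpp
#include <cassert>

// Trap (do not do this):
//   Foo::Foo(char c) { Foo(c, 0); }   // creates and destroys a temporary Foo;
//                                     // *this is left uninitialized

// Fix 1: one constructor with a default parameter replaces Foo(char) and Foo(char, int).
class Foo {
public:
  Foo(char c, int n = 0) : c_(c), n_(n) { }
  char c() const { return c_; }
  int n() const { return n_; }
private:
  char c_;
  int n_;
};

// Fix 2: keep separate constructors, but share their common code in a private init().
class Bar {
public:
  Bar(char c)        { init(c, 0); }
  Bar(char c, int n) { init(c, n); }
  char c() const { return c_; }
  int n() const { return n_; }
private:
  void init(char c, int n) { c_ = c; n_ = n; }
  char c_;
  int n_;
};

void exampleUsage()
{
  assert(Foo('a').n() == 0 && Foo('a', 5).n() == 5);
  assert(Bar('b').n() == 0 && Bar('b', 7).c() == 'b');
}
```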
No. A "default constructor" is a constructor that can be called with no arguments. Thus a
constructor that takes no arguments is certainly a default constructor:
class Fred {
public:
Fred(); // Default constructor: can be called with no args
// ...
};
However it is possible (and even likely) that a default constructor can take arguments, provided
they are given default values:
class Fred {
public:
Fred(int i=3, int j=5); // Default constructor: can be called with no args
// ...
};
[10.5] Which constructor gets called when I create an array of Fred objects?
Fred's default constructor (except as discussed below). There is no way to tell the compiler to
call a different constructor (except as discussed below). If
your class Fred doesn't have a default constructor, attempting to create an array of Fred objects is
trapped as an error at compile time.
class Fred {
public:
Fred(int i, int j);
// ... assume there is no default constructor in class Fred ...
};
int main()
{
Fred a[10]; // ERROR: Fred doesn't have a default constructor
Fred* p = new Fred[10]; // ERROR: Fred doesn't have a default constructor
}
However if you are constructing an object of the standard std::vector<Fred> rather than an array
of Fred (which you probably should be doing anyway since arrays are evil), you don't have to have
a default constructor in class Fred, since you can give the std::vector a Fred object to be used to
initialize the elements:
#include <vector>
int main()
{
std::vector<Fred> a(10, Fred(5,7));
// The 10 Fred objects in std::vector a will be initialized with Fred(5,7).
// ...
}
Even though you ought to use a std::vector rather than an array, there are times when an array
might be the right thing to do, and for those, there is the "explicit initialization of arrays" syntax.
Here's how it looks:
class Fred {
public:
Fred(int i, int j);
// ... assume there is no default constructor in class Fred ...
};
int main()
{
Fred a[10] = {
Fred(5,7), Fred(5,7), Fred(5,7), Fred(5,7), Fred(5,7),
Fred(5,7), Fred(5,7), Fred(5,7), Fred(5,7), Fred(5,7)
};
}
Of course you don't have to do Fred(5,7) for every entry — you can put in any numbers you want,
even parameters or other variables. The point is that this syntax is (a) doable but (b) not as nice as
the std::vector syntax. Remember this: arrays are evil — unless there is a compelling reason to use
an array, use a std::vector instead.
Initialization lists. In fact, constructors should initialize all member objects in the initialization list.
For example, this constructor initializes member object x_ using an initialization list:
Fred::Fred() : x_(whatever) { }. The most common benefit of doing this is improved performance.
For example, if the expression whatever is the same type as member variable x_, the result of the
whatever expression is constructed directly inside x_ — the compiler does not make a separate
copy of the object. Even if the types are not the same, the compiler is usually able to do a better
job with initialization lists than with assignments.
The other (inefficient) way to build constructors is via assignment, such as:
Fred::Fred() { x_ = whatever; }. In this case the expression whatever causes a separate,
temporary object to be created, and this temporary object is passed into the x_ object's assignment
operator. Then that temporary object is destructed at the ;. That's inefficient.
As if that wasn't bad enough, there's another source of inefficiency when using assignment in a
constructor: the member object will get fully constructed by its default constructor, and this might,
for example, allocate some default amount of memory or open some default file. All this work could
be for naught if the whatever expression and/or assignment operator causes the object to close that
file and/or release that memory (e.g., if the default constructor didn't allocate a large enough pool
of memory or if it opened the wrong file).
Conclusion: All other things being equal, your code will run faster if you use initialization lists rather
than assignment.
Note: There is no performance difference if the type of x_ is some built-in/intrinsic type, such as int
or char* or float. But even in these cases, my personal preference is to set those data members in
the initialization list rather than via assignment for consistency.
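The difference can be measured with a hypothetical Member type that counts its own constructor and assignment calls (the counters exist only for illustration). The initialization-list version copy-constructs x_ once; the assignment version default-constructs x_ and then assigns to it.

```cpp
#include <cassert>

// Hypothetical member type that records which special member functions run.
struct Member {
  static int defaultCtors, copyCtors, assignments;
  static void resetCounts() { defaultCtors = copyCtors = assignments = 0; }
  Member() { ++defaultCtors; }
  Member(const Member&) { ++copyCtors; }
  Member& operator=(const Member&) { ++assignments; return *this; }
};
int Member::defaultCtors = 0;
int Member::copyCtors = 0;
int Member::assignments = 0;

class WithInitList {
public:
  explicit WithInitList(const Member& whatever) : x_(whatever) { }   // built directly in x_
private:
  Member x_;
};

class WithAssignment {
public:
  explicit WithAssignment(const Member& whatever) { x_ = whatever; } // default ctor, then operator=
private:
  Member x_;
};

void exampleUsage()
{
  Member whatever;

  Member::resetCounts();
  WithInitList a(whatever);
  assert(Member::defaultCtors == 0 && Member::copyCtors == 1 && Member::assignments == 0);

  Member::resetCounts();
  WithAssignment b(whatever);
  assert(Member::defaultCtors == 1 && Member::copyCtors == 0 && Member::assignments == 1);
}
```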
Some people feel you should not use the this pointer in a constructor because the object is not fully
formed yet. However you can use this in the constructor (in the {body} and even in the
initialization list) if you are careful.
Here is something that always works: the {body} of a constructor (or a function called from the
constructor) can reliably access the data members declared in a base class and/or the data
members declared in the constructor's own class. This is because all those data members are
guaranteed to have been fully constructed by the time the constructor's {body} starts executing.
Here is something that never works: the {body} of a constructor (or a function called from the
constructor) cannot get down to a derived class by calling a virtual member function that is
overridden in the derived class. If your goal was to get to the overridden function in the derived
class, you won't get what you want. Note that you won't get to the override in the derived class
independent of how you call the virtual member function: explicitly using the this pointer (e.g.,
this->method()), implicitly using the this pointer (e.g., method()), or even calling some other function
that calls the virtual member function on your this object. The bottom line is this: even if the caller
is constructing an object of a derived class, during the constructor of the base class, your object is
not yet of that derived class. You have been warned.
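A sketch with hypothetical Base and Derived classes: Base's constructor calls the virtual function name(), and even though a Derived object is being built, the call resolves to Base::name() because the Derived part doesn't exist yet.

```cpp
#include <cassert>
#include <string>

class Base {
public:
  Base() { nameSeenInCtor_ = name(); }   // virtual call during construction
  virtual ~Base() { }
  virtual std::string name() const { return "Base"; }
  const std::string& nameSeenInCtor() const { return nameSeenInCtor_; }
private:
  std::string nameSeenInCtor_;
};

class Derived : public Base {
public:
  std::string name() const override { return "Derived"; }
};

void exampleUsage()
{
  Derived d;
  assert(d.nameSeenInCtor() == "Base");   // the override was not reached during Base()
  assert(d.name() == "Derived");          // after construction, the override is used
  const Base& b = d;
  assert(b.name() == "Derived");
}
```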
Here is something that sometimes works: if you pass any of the data members in this object to
another data member's initializer, you must make sure that the other data member has already
been initialized. The good news is that you can determine whether the other data member has (or
has not) been initialized using some straightforward language rules that are independent of the
particular compiler you're using. The bad news is that you have to know those language rules (e.g.,
base class sub-objects are initialized first (look up the order if you have multiple and/or virtual
inheritance!), then data members defined in the class are initialized in the order in which they
appear in the class declaration). If you don't know these rules, then don't pass any data member
from the this object (regardless of whether or not you explicitly use the this keyword) to any other
data member's initializer! And if you do know the rules, please be careful.
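A minimal sketch, with a hypothetical Range class, of a safe dependency between members: end_ is computed from begin_, which works only because begin_ is declared (and therefore initialized) first. The order of entries in the initialization list doesn't matter; the order of declaration in the class does.

```cpp
#include <cassert>

class Range {
public:
  Range(int begin, int length)
    : begin_(begin),
      end_(begin_ + length)   // OK: begin_ is declared above end_, so it is already initialized
  { }
  int begin() const { return begin_; }
  int end() const { return end_; }
private:
  int begin_;  // initialized first (declared first)
  int end_;    // initialized second; swapping these two declarations would make
               // end_'s initializer read an uninitialized begin_
};

void exampleUsage()
{
  Range r(10, 5);
  assert(r.begin() == 10 && r.end() == 15);
}
```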
A technique that provides more intuitive and/or safer construction operations for users of your
class.
The problem is that constructors always have the same name as the class. Therefore the only way
to differentiate between the various constructors of a class is by the parameter list. But if there are
lots of constructors, the differences between them become somewhat subtle and error prone.
With the Named Constructor Idiom, you declare all the class's constructors in the private or
protected sections, and you provide public static methods that return an object. These static
methods are the so-called "Named Constructors." In general there is one such static method for
each different way to construct an object.
For example, suppose we are building a Point class that represents a position on the X-Y plane.
Turns out there are two common ways to specify a 2-space coordinate: rectangular coordinates
(X+Y) and polar coordinates (Radius+Angle). (Don't worry if you can't remember these; the point isn't
the particulars of coordinate systems; the point is that there are several ways to create a Point
object.) Unfortunately the parameters for these two coordinate systems are the same: two floats.
This would create an ambiguity error in the overloaded constructors:
class Point {
public:
Point(float x, float y); // Rectangular coordinates
Point(float r, float a); // Polar coordinates (radius and angle)
// ERROR: Overload is Ambiguous: Point::Point(float,float)
};
int main()
{
Point p = Point(5.7, 1.2); // Ambiguous: Which coordinate system?
}
One way to solve this ambiguity is to use the Named Constructor Idiom:
class Point {
public:
static Point rectangular(float x, float y); // Rectangular coord's
static Point polar(float radius, float angle); // Polar coordinates
// These static methods are the so-called "named constructors"
// ...
private:
Point(float x, float y); // Rectangular coordinates
float x_, y_;
};
Now the users of Point have a clear and unambiguous syntax for creating Points in either coordinate
system:
int main()
{
Point p1 = Point::rectangular(5.7, 1.2); // Obviously rectangular
Point p2 = Point::polar(5.7, 1.2); // Obviously polar
}
Make sure your constructors are in the protected section if you expect Point to have derived classes.
The Named Constructor Idiom can also be used to make sure your objects are always created via
new.
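The Point class above declares rectangular() and polar() but doesn't define them. One possible set of definitions follows, assuming the angle is in radians and that the private constructor takes rectangular coordinates as shown; the accessors x() and y() are hypothetical additions used only by the checks.

```cpp
#include <cassert>
#include <cmath>

class Point {
public:
  static Point rectangular(float x, float y);      // Rectangular coord's
  static Point polar(float radius, float angle);   // Polar coordinates (angle in radians)
  float x() const { return x_; }                   // hypothetical accessors
  float y() const { return y_; }
private:
  Point(float x, float y) : x_(x), y_(y) { }       // Rectangular coordinates
  float x_, y_;
};

inline Point Point::rectangular(float x, float y)
{
  return Point(x, y);
}

inline Point Point::polar(float radius, float angle)
{
  // Convert to the rectangular representation used internally.
  return Point(radius * std::cos(angle), radius * std::sin(angle));
}

void exampleUsage()
{
  Point p1 = Point::rectangular(3, 4);
  assert(p1.x() == 3 && p1.y() == 4);
  Point p2 = Point::polar(2, 0);                   // angle 0: on the positive x axis
  assert(std::fabs(p2.x() - 2) < 1e-6f && std::fabs(p2.y()) < 1e-6f);
}
```

Only the static member functions can reach the private constructor, so every Point is created through a name that states its coordinate system.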
[10.9] Why can't I initialize my static member data in my constructor's initialization list?
Because you must explicitly define your class's static data members.
Fred.h:
class Fred {
public:
Fred();
// ...
private:
int i_;
static int j_;
};
Fred.cpp:
Fred::Fred()
: i_(10) // OK: you can (and should) initialize member data this way
, j_(42) // Error: you cannot initialize static member data like this
{
// ...
}
// You must define static data members this way:
int Fred::j_ = 42;
[10.10] Why are classes with static data members getting linker errors?
Because static data members must be explicitly defined in exactly one compilation unit. If you didn't
do this, you'll probably get an "undefined external" linker error. For example:
// Fred.h
class Fred {
public:
// ...
private:
static int j_; // Declares static data member Fred::j_
// ...
};
The linker will holler at you ("Fred::j_ is not defined") unless you define (as opposed to merely
declare) Fred::j_ in (exactly) one of your source files:
// Fred.cpp
#include "Fred.h"
int Fred::j_ = 42; // Defines Fred::j_ (any int-valued expression can be the initializer)
// Alternatively, if you wish to use the implicit 0 value for static ints:
// int Fred::j_;
The usual place to define static data members of class Fred is file Fred.cpp (or Fred.C or whatever
source file extension you use).
[10.11] What's the "static initialization order fiasco"?
The static initialization order fiasco is a very subtle and commonly misunderstood aspect of C++.
Unfortunately it's very hard to detect — the errors occur before main() begins.
In short, suppose you have two static objects x and y which exist in separate source files, say x.cpp
and y.cpp. Suppose further that the initialization for the y object (typically the y object's
constructor) calls some method on the x object.
The tragedy is that you have a 50%-50% chance of dying. If the compilation unit for x.cpp happens
to get initialized first, all is well. But if the compilation unit for y.cpp gets initialized first, then y's
initialization will get run before x's initialization, and you're toast. E.g., y's constructor could call a
method on the x object, yet the x object hasn't yet been constructed.
I hear they're hiring down at McDonald's. Enjoy your new job flipping burgers.
If you think it's "exciting" to play Russian Roulette with live rounds in half the chambers, you can
stop reading here. On the other hand if you like to improve your chances of survival by preventing
disasters in a systematic way, you probably want to read the next FAQ.
Note: The static initialization order fiasco can also, in some cases, apply to built-in/intrinsic types.
Use the "construct on first use" idiom, which simply means to wrap your static object inside a
function.
For example, suppose you have two classes, Fred and Barney. There is a global Fred object called x,
and a global Barney object called y. Barney's constructor invokes the goBowling() method on the x
object. The file x.cpp defines the x object:
// File x.cpp
#include "Fred.hpp"
Fred x;
The file y.cpp defines the y object:
// File y.cpp
#include "Barney.hpp"
Barney y;
For completeness the Barney constructor might look something like this:
// File Barney.cpp
#include "Barney.hpp"
Barney::Barney()
{
// ...
x.goBowling();
// ...
}
As described above, the disaster occurs if y is constructed before x, which happens 50% of the time
since they're in different source files and the language leaves their relative initialization order unspecified.
There are many solutions to this problem, but a very simple and completely portable solution is to
replace the global Fred object, x, with a global function, x(), that returns the Fred object by
reference.
// File x.cpp
#include "Fred.hpp"
Fred& x()
{
static Fred* ans = new Fred();
return *ans;
}
Since static local objects are constructed the first time control flows over their declaration (only),
the above new Fred() statement will only happen once: the first time x() is called. Every
subsequent call will return the same Fred object (the one pointed to by ans). Then all you do is
change your usages of x to x():
// File Barney.cpp
#include "Barney.hpp"
Barney::Barney()
{
// ...
x().goBowling();
// ...
}
This is called the Construct On First Use Idiom because it does just that: the global Fred object is
constructed on its first use.
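The single-file sketch below shows the mechanics of the idiom. Fred's games() counter, the body of goBowling() and the Barney class are hypothetical stand-ins; one file cannot reproduce the cross-file ordering problem itself, but it does show that a global Barney can use x() during static initialization and that every call to x() returns the same Fred.

```cpp
// Hypothetical Fred: counts bowling games so there is something to observe.
class Fred {
public:
    Fred() : games_(0) {}
    void goBowling() { ++games_; }
    int games() const { return games_; }
private:
    int games_;
};

// Construct On First Use: the Fred is created the first time x() is called,
// whichever static initializer (in whichever file) gets there first.
Fred& x()
{
    static Fred* ans = new Fred();   // runs once; intentionally never deleted
    return *ans;
}

// Hypothetical Barney whose constructor depends on the Fred object.
class Barney {
public:
    Barney() { x().goBowling(); }
};

// A global Barney: safe even if it is constructed before anyone else calls x().
Barney y;
```

Because the Fred lives behind a function call, "is x ready yet?" is never a question the caller has to answer.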
The downside of this approach is that the Fred object is never destructed. There is another
technique that answers this concern, but it needs to be used with care since it creates the possibility
of another (equally nasty) problem.
Note: The static initialization order fiasco can also, in some cases, apply to built-in/intrinsic types.
[10.13] Why doesn't the construct-on-first-use idiom use a static object instead of a
static pointer?
Short answer: it's possible to use a static object rather than a static pointer, but doing so opens up
another (equally subtle, equally nasty) problem.
Long answer: sometimes people worry about the fact that the previous solution "leaks." In many
cases, this is not a problem, but it is a problem in some cases. Note: even though the object
pointed to by ans in the previous FAQ is never deleted, the memory doesn't actually "leak" when
the program exits since the operating system automatically reclaims all the memory in a program's
heap when that program exits. In other words, the only time you'd need to worry about this is when
the destructor for the Fred object performs some important action (such as writing something to a
file) that must occur sometime while the program is exiting.
In those cases where the construct-on-first-use object (the Fred, in this case) needs to eventually
get destructed, you might consider changing function x() as follows:
// File x.cpp
#include "Fred.hpp"
Fred& x()
{
static Fred ans; // was static Fred* ans = new Fred();
return ans; // was return *ans;
}
However, there is (or rather, may be) a subtle problem with this change. To understand this
potential problem, let's remember why we're doing all this in the first place: we need to make
100% sure our static object (a) gets constructed prior to its first use and (b) doesn't get destructed
until after its last use. Obviously it would be a disaster if any static object got used either before
construction or after destruction. The message here is that you need to worry about two situations
(static initialization and static deinitialization), not just one.
By changing the declaration from static Fred* ans = new Fred(); to static Fred ans;, we still
correctly handle the initialization situation but we no longer handle the deinitialization situation. For
example, if there are 3 static objects, say a, b and c, that use ans during their destructors, the only
way to avoid a static deinitialization disaster is if ans is destructed after all three, which is, perhaps,
only a one in four chance of success.
In the common case the odds are actually much worse than that. In fact, the common case all but
guarantees you'll have a static deinitialization disaster. The reason is a combination of the following
three items: [1] The common case is when the objects that use ans during destruction also use ans
during construction. [2] One of those objects, say a, triggers the construction of ans, meaning the
constructor for a starts before the constructor for ans but finishes after it. [3] Static objects are
destructed in the reverse order in which their construction completed: the last one to finish
construction is the first one destructed. Putting those together means a will get destructed after ans,
which is a static deinitialization disaster: a's destructor will use ans after it's been destructed.
Bang, you're dead.
There is a third approach that handles both the static initialization and static deinitialization
situations, but it has other non-trivial costs. I'm too lazy (and busy!) to write any more FAQs today
so if you're interested in that third approach, you'll have to buy a book that describes that third
approach in detail. The C++ FAQs book is one of those books, and it also gives the cost/benefit
analysis to decide if/when that third approach should be used.
[10.14] How do I prevent the "static initialization order fiasco" for my static data
members?
Just use the same technique just described, but this time use a static member function rather than
a global function.
// File X.hpp
class X {
public:
// ...
private:
static Fred x_;
};
// File X.cpp
#include "X.hpp"
Fred X::x_;
Naturally, the Fred object will also be used in one or more of X's methods:
void X::someMethod()
{
x_.goBowling();
}
But now the "disaster scenario" is if someone somewhere somehow calls this method before the
Fred object gets constructed. For example, if someone else creates a static X object and invokes its
someMethod() method during static initialization, then you're at the mercy of the compiler as to
whether X::x_ is constructed before or after someMethod() is called. (Note that the
ANSI/ISO C++ committee is working on this problem, but compilers aren't yet generally available
that handle these changes; watch this space for an update in the future.)
In any event, it's always portable and safe to change the X::x_ static data member into a static
member function:
// File X.hpp
class X {
public:
// ...
private:
static Fred& x();
};
// File X.cpp
#include "X.hpp"
Fred& X::x()
{
static Fred* ans = new Fred();
return *ans;
}
void X::someMethod()
{
x().goBowling();
}
If you're super performance sensitive and you're concerned about the overhead of an extra function
call on each invocation of X::someMethod(), you can set up a static Fred& instead. As you recall,
static locals are only initialized once (the first time control flows over their declaration), so this will
call X::x() only once: the first time X::someMethod() is called:
void X::someMethod()
{
static Fred& x = X::x();
x.goBowling();
}
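A single-file sketch of the static-member version, including the cached static Fred& from the performance note. Fred's counter, gamesSoFar() and the globals earlyX and earlyCall are hypothetical additions; earlyCall exists only to force someMethod() to run during static initialization, which is the scenario the FAQ worries about.

```cpp
// Hypothetical Fred: counts bowling games so the sketch has something to check.
class Fred {
public:
    Fred() : games_(0) {}
    void goBowling() { ++games_; }
    int games() const { return games_; }
private:
    int games_;
};

class X {
public:
    void someMethod();
    static int gamesSoFar() { return x().games(); }   // hypothetical, for inspection
private:
    static Fred& x();   // replaces the static data member "static Fred x_;"
};

Fred& X::x()
{
    static Fred* ans = new Fred();   // constructed on first use, never deleted
    return *ans;
}

void X::someMethod()
{
    static Fred& x = X::x();   // X::x() is called only on the first pass
    x.goBowling();
}

// A static X that is used during static initialization: still safe,
// because X::x() builds the Fred on demand.
X earlyX;
int earlyCall = (earlyX.someMethod(), 0);
```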
Note: The static initialization order fiasco can also, in some cases, apply to built-in/intrinsic types.
[10.15] Do I need to worry about the "static initialization order fiasco" for variables of
built-in/intrinsic types?
Yes.
If you initialize your built-in/intrinsic type using a function call, the static initialization order fiasco is
able to kill you just as badly as with user-defined/class types. For example, the following code shows
the failure:
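#include <iostream>
int f(); // declared up front because x and y are defined before f and g
int g();
int x = f();
int y = g();
int f()
{
std::cout << "using 'y' (which is " << y << ")\n";
return 3*y + 7;
}
int g()
{
std::cout << "initializing 'y'\n";
return 5;
}
int main()
{
std::cout << "x is " << x << "\n";
}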
The output of this little program will show that it uses y before initializing it. The solution, as before,
is the Construct On First Use Idiom:
#include <iostream>
int f(); // declared up front because x() and y() call them
int g();
int& x()
{
static int ans = f();
return ans;
}
int& y()
{
static int ans = g();
return ans;
}
int f()
{
std::cout << "using 'y' (which is " << y() << ")\n";
return 3*y() + 7;
}
int g()
{
std::cout << "initializing 'y'\n";
return 5;
}
Of course you might be able to simplify this by moving the initialization code for x and y into their
respective functions:
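#include <iostream>
int& y(); // x() calls y(), so declare it first
int& x()
{
static int ans = (std::cout << "using 'y' (which is " << y() << ")\n",
3*y() + 7); // runs only on the first call to x()
return ans;
}
int& y()
{
static int ans = (std::cout << "initializing 'y'\n",
5); // runs only on the first call to y()
return ans;
}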
And, if you can get rid of the print statements you can further simplify these to something really
simple:
int& y(); // x() calls y(), so declare it first
int& x()
{
static int ans = 3*y() + 7;
return ans;
}
int& y()
{
static int ans = 5;
return ans;
}
Furthermore, since y is initialized using a constant expression, it no longer needs its wrapper
function — it can be a simple variable again.
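Here is a sketch of that final state, assuming nothing else modifies y before x() is first called. The global z is a hypothetical extra, added only to show that x() is safe to call from another static initializer once y is a plain, constant-initialized variable.

```cpp
// y's initializer is a constant expression, so y is constant-initialized:
// it already holds 5 before any dynamic initialization in any file runs.
int y = 5;

// x's initializer reads a non-const variable, so it is not a constant
// expression; x keeps its wrapper so no one can observe it uninitialized.
int& x()
{
    static int ans = 3*y + 7;
    return ans;
}

// Even a global that is dynamically initialized can safely call x().
int z = x();
```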
The fundamental problem solved by the Named Parameter Idiom is that C++ only supports
positional parameters. For example, a caller of a function isn't allowed to say, "Here's the value for
formal parameter xyz, and this other thing is the value for formal parameter pqr." All you can do in
C++ (and C and Java) is say, "Here's the first parameter, here's the second parameter, etc." The
alternative, called named parameters and implemented in the language Ada, is especially useful if a
function takes a large number of mostly default-able parameters.
Over the years people have cooked up lots of workarounds for the lack of named parameters in C
and C++. One of these involves burying the parameter values in a string parameter then parsing
this string at run-time. This is what's done in the second parameter of fopen(), for example.
Another workaround is to combine all the boolean parameters in a bit-map; the caller then ORs a
bunch of bit-shifted constants together to produce the actual parameter. This is what's done in the
second parameter of open(), for example. These approaches work, but the following technique
produces caller-code that's more obvious, easier to write, easier to read, and is generally more
elegant.
The idea, called the Named Parameter Idiom, is to change the function's parameters to methods of
a newly created class, where all these methods return *this by reference. Then you simply rename
the main function into a parameterless "do-it" method on that class.
The example will be for the "open a file" concept. Let's say that concept logically requires a
parameter for the file's name, and optionally allows parameters for whether the file should be
opened read-only vs. read-write vs. write-only, whether or not the file should be created if it
doesn't already exist, whether the writing location should be at the end ("append") or the beginning
("overwrite"), the block-size if the file is to be created, whether the I/O is buffered or non-buffered,
the buffer-size, whether it is to be shared vs. exclusive access, and probably a few others. If we
implemented this concept using a normal function with positional parameters, the caller code would
be very difficult to read: there'd be as many as 8 positional parameters, and the caller would
probably make a lot of mistakes. So instead we use the Named Parameter Idiom.
Before we go through the implementation, here's what the caller code might look like, assuming
you are willing to accept all the function's default parameters:
File f = OpenFile("foo.txt");
That's the easy case. Now here's what it might look like if you want to change a bunch of the
parameters.
File f = OpenFile("foo.txt").
readonly().
createIfNotExist().
appendWhenWriting().
blockSize(1024).
unbuffered().
exclusiveAccess();
Notice how the "parameters", if it's fair to call them that, can appear in any order (they're not
positional) and they all have names. So the programmer doesn't have to remember the order of the
parameters, and the names are (hopefully) obvious.
So here's how to implement it: first we create a new class (OpenFile) that houses all the parameter
values as private data members. Then all the methods (readonly(), blockSize(unsigned), etc.)
return *this (that is, they return a reference to the OpenFile object, allowing the method calls to be
chained). Finally we make the required parameter (the file's name, in this case) into a normal,
positional, parameter on OpenFile's constructor.
#include <string>
class File;
class OpenFile {
public:
OpenFile(const std::string& filename);
// sets all the default values for each data member
OpenFile& readonly(); // changes readonly_ to true
OpenFile& createIfNotExist();
OpenFile& blockSize(unsigned nbytes);
// ...
private:
friend class File;
bool readonly_; // defaults to false [for example]
// ...
unsigned blockSize_; // defaults to 4096 [for example]
// ...
};
The only other thing to do is make the constructor for class File take an OpenFile object:
class File {
public:
File(const OpenFile& params);
// vacuums the actual params out of the OpenFile object
// ...
};
Note that OpenFile declares File as its friend; that way OpenFile doesn't need a bunch of (otherwise
useless) public: get methods.
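A self-contained sketch of the idiom with only three of the options modeled (the file name, readonly() and blockSize()). The File class here is a hypothetical stand-in that copies the options out of the OpenFile object instead of opening anything, and its inspector functions filename(), readonly() and blockSize() exist only so the result can be checked.

```cpp
#include <string>

class File;   // forward declaration so OpenFile can befriend it

// Parameter object: each "named parameter" sets one option and returns *this.
class OpenFile {
public:
    OpenFile(const std::string& filename)
        : filename_(filename), readonly_(false), blockSize_(4096) {}   // defaults

    OpenFile& readonly()                 { readonly_ = true;   return *this; }
    OpenFile& blockSize(unsigned nbytes) { blockSize_ = nbytes; return *this; }

private:
    friend class File;   // File reads the options directly; no getters needed
    std::string filename_;
    bool readonly_;
    unsigned blockSize_;
};

// Hypothetical File: a real one would open the file here.
class File {
public:
    File(const OpenFile& params)   // non-explicit, so "File f = OpenFile(...)" works
        : filename_(params.filename_),
          readonly_(params.readonly_),
          blockSize_(params.blockSize_) {}

    const std::string& filename() const { return filename_; }
    bool readonly() const { return readonly_; }
    unsigned blockSize() const { return blockSize_; }

private:
    std::string filename_;
    bool readonly_;
    unsigned blockSize_;
};
```

With this in place, `File f = OpenFile("foo.txt").blockSize(1024).readonly();` compiles, and the two option calls could be written in either order.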
Since each member function in the chain returns a reference, there is no copying of objects and the
chain is highly efficient. Furthermore, if the various member functions are inline, the generated
object code will probably be on par with C-style code that sets various members of a struct. Of
course if the member functions are not inline, there may be a slight increase in code size and a
slight decrease in performance (but only if the construction occurs on the critical path of a CPU-
bound program; this is a can of worms I'll try to avoid opening; read the C++ FAQs book for a
rather thorough discussion of the issues), so it may, in this case, be a trade-off for making the code
more reliable.
[11.1] What's the deal with destructors?
Destructors are used to release any resources allocated by the object. E.g., class Lock might lock a
semaphore, and the destructor will release that semaphore. The most common example is when the
constructor uses new, and the destructor uses delete.
Destructors are a "prepare to die" member function. They are often abbreviated "dtor".
In the following example, b's destructor will be executed first, then a's destructor:
void userCode()
{
Fred a;
Fred b;
// ...
}
In the following example, the order for destructors will be a[9], a[8], ..., a[1], a[0]:
void userCode()
{
Fred a[10];
// ...
}
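A tracing sketch that makes both orders visible, assuming a hypothetical Fred that appends a one-character label to a global string when it is destroyed (letters for the named locals, digits for the array elements in construction order).

```cpp
#include <string>

std::string destroyed;   // each Fred's destructor appends its label here
int nextLabel = 0;       // next label for a default-constructed Fred

// Hypothetical tracing Fred, only used to make destruction order visible.
class Fred {
public:
    Fred() : label_(static_cast<char>('0' + nextLabel++)) {}
    explicit Fred(char label) : label_(label) {}
    ~Fred() { destroyed += label_; }
private:
    char label_;
};

void localsExample()
{
    Fred a('a');
    Fred b('b');
    // ...
}   // appends "ba": b is destroyed first, then a

void arrayExample()
{
    nextLabel = 0;
    Fred a[10];   // a[0] is labeled '0', ..., a[9] is labeled '9'
    // ...
}   // appends "9876543210"
```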
No.
You can have only one destructor for a class Fred. It's always called Fred::~Fred(). It never takes
any parameters, and it never returns anything.
You can't pass parameters to the destructor anyway, since you never explicitly call a destructor
(well, almost never).
No!
The destructor will get called again at the close } of the block in which the local was created. This is
a guarantee of the language; it happens automagically; there's no way to stop it from happening.
But you can get really bad results from calling a destructor on the same object a second time!
Bang! You're dead!
[11.6] What if I want a local to "die" before the close } of the scope in which it was
created? Can I call a destructor on a local if I really want to?
Suppose the (desirable) side effect of destructing a local File object is to close the File. Now suppose
you have an object f of a class File and you want File f to be closed before the end of its scope
(i.e., before the closing } of the block in which f was created):
void someCode()
{
File f;
// ... [Code that should execute when f is still open] ...
// <- We want the side effect of f's destructor here!
// ... [Code that should execute after f is closed] ...
}
There is a simple solution to this problem. But in the meantime, remember: Do not explicitly call
the destructor!
[11.7] OK, OK already; I won't explicitly call the destructor of a local; but how do I
handle the above situation?
Simply wrap the extent of the lifetime of the local in an artificial block {...}:
void someCode()
{
{
File f;
// ... [This code will execute when f is still open] ...
}
// ^ f's destructor will automagically be called here!
// ... [This code will execute after f is closed] ...
}
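A tracing sketch of the artificial block, with a hypothetical File whose constructor and destructor only append to a trace string instead of opening and closing anything.

```cpp
#include <string>

std::string trace;   // records events in the order they happen

// Hypothetical File whose destructor "closes" the file.
class File {
public:
    File()  { trace += "open;"; }
    ~File() { trace += "close;"; }
};

void someCode()
{
    {
        File f;
        trace += "use f;";   // runs while f is still open
    }                        // f's destructor runs here, at the artificial block's }
    trace += "after;";       // runs after f is closed
}
```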
Most of the time, you can limit the lifetime of a local by wrapping the local in an artificial block
({...}). But if for some reason you can't do that, add a member function that has a similar effect as
the destructor. But do not call the destructor itself!
For example, in the case of class File, you might add a close() method. Typically the destructor will
simply call this close() method. Note that the close() method will need to mark the File object so a
subsequent call won't re-close an already-closed File. E.g., it might set the fileHandle_ data
member to some nonsensical value such as -1, and it might check at the beginning to see if the
fileHandle_ is already equal to -1:
class File {
public:
void close();
~File();
// ...
private:
int fileHandle_; // fileHandle_ >= 0 if/only-if it's open
};
File::~File()
{
close();
}
void File::close()
{
if (fileHandle_ >= 0) {
// ... [Perform some operating-system call to close the file] ...
fileHandle_ = -1;
}
}
Note that the other File methods may also need to check if the fileHandle_ is -1 (i.e., check if the
File is closed).
Note also that any constructors that don't actually open a file should set fileHandle_ to -1.
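A sketch of the close() pattern with the operating-system call replaced by a hypothetical pretendOsClose() that only counts calls, so the "close at most once" behavior can be checked. Deleting the copy operations (C++11) is an addition of this sketch: without it, two File objects could share one handle and both try to close it.

```cpp
int osCloseCalls = 0;   // how many times the pretend OS call ran

// Hypothetical stand-in for the real operating-system close call.
void pretendOsClose(int /*handle*/) { ++osCloseCalls; }

class File {
public:
    File() : fileHandle_(-1) {}                          // nothing opened
    explicit File(int handle) : fileHandle_(handle) {}   // assume handle is open
    ~File() { close(); }

    void close()
    {
        if (fileHandle_ >= 0) {
            pretendOsClose(fileHandle_);
            fileHandle_ = -1;   // later calls (and the destructor) do nothing
        }
    }
    bool isOpen() const { return fileHandle_ >= 0; }

    // Copying would let two Files close the same handle, so forbid it.
    File(const File&) = delete;
    File& operator=(const File&) = delete;

private:
    int fileHandle_;   // fileHandle_ >= 0 if and only if the file is open
};
```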
[11.9] But can I explicitly call a destructor if I've allocated my object with new?
Probably not.
Unless you used placement new, you should simply delete the object rather than explicitly calling
the destructor. For example, suppose you allocated the object via a typical new expression:
Fred* p = new Fred();
Then the destructor Fred::~Fred() will automagically get called when you delete it via:
delete p; // Automagically calls p->~Fred()
You should not explicitly call the destructor, since doing so won't release the memory that was
allocated for the Fred object itself. Remember: delete p does two things: it calls the destructor and
it deallocates the memory.
There are many uses of placement new. The simplest use is to place an object at a particular
location in memory. This is done by supplying the place as a pointer parameter to the new part of a
new expression:
void someCode()
{
alignas(Fred) char memory[sizeof(Fred)]; // Line #1
void* place = memory; // Line #2
Fred* f = new(place) Fred(); // Line #3
// The pointers f and place will be equal
// ...
}
Line #1 creates an array of sizeof(Fred) bytes of memory, which is big enough to hold a Fred
object; the alignas(Fred) specifier (C++11) also makes it suitably aligned for one. Line #2 creates a
pointer place that points to the first byte of this memory (experienced C programmers will note that
this step was unnecessary; it's there only to make the code more obvious). Line #3 essentially just
calls the constructor Fred::Fred(). The this pointer in the Fred constructor will be equal to place.
The returned pointer f will therefore be equal to place.
ADVICE: Don't use this "placement new" syntax unless you have to. Use it only when you really
care that an object is placed at a particular location in memory. For example, when your hardware
has a memory-mapped I/O timer device, and you want to place a Clock object at that memory
location.
DANGER: You are taking sole responsibility that the pointer you pass to the "placement new"
operator points to a region of memory that is big enough and is properly aligned for the object type
that you're creating. Neither the compiler nor the run-time system makes any attempt to check
whether you did this right. If your Fred class needs to be aligned on a 4 byte boundary but you
supplied a location that isn't properly aligned, you can have a serious disaster on your hands (if you
don't know what "alignment" means, please don't use the placement new syntax). You have been
warned.
You are also solely responsible for destructing the placed object. This is done by explicitly calling the
destructor:
void someCode()
{
alignas(Fred) char memory[sizeof(Fred)]; // alignas (C++11): suitably aligned for a Fred
void* p = memory;
Fred* f = new(p) Fred();
// ...
f->~Fred(); // Explicitly call the destructor for the placed object
}
This is about the only time you ever explicitly call a destructor.
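A compilable sketch of the whole placement-new life cycle. The tracing Fred, its alive counter and placementDemo() are hypothetical; the buffer uses alignas(Fred) (C++11) so the alignment requirement from the DANGER note is met, and <new> is included because that is where the placement form of operator new is declared.

```cpp
#include <new>   // placement new

// Hypothetical Fred that remembers the address it was constructed at.
class Fred {
public:
    Fred() : self_(this) { ++alive; }
    ~Fred() { --alive; }
    const void* self() const { return self_; }
    static int alive;   // number of constructed, not-yet-destroyed Freds
private:
    const void* self_;
};
int Fred::alive = 0;

// Constructs a Fred inside a local buffer, then destroys it explicitly.
bool placementDemo()
{
    alignas(Fred) unsigned char memory[sizeof(Fred)];   // suitably aligned storage
    void* place = memory;
    Fred* f = new(place) Fred();   // runs Fred::Fred() with this == place
    bool samePlace = (f == place) && (f->self() == place);
    bool wasAlive = (Fred::alive == 1);
    f->~Fred();                    // explicit destructor call; never delete f
    return samePlace && wasAlive && Fred::alive == 0;
}
```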
[11.11] When I write a destructor, do I need to explicitly call the destructors for my
member objects?
No. You never need to explicitly call a destructor (except with placement new).
A class's destructor (whether or not you explicitly define one) automagically invokes the destructors
for member objects. They are destroyed in the reverse order in which they appear within the
declaration of the class.
class Member {
public:
~Member();
// ...
};
class Fred {
public:
~Fred();
// ...
private:
Member x_;
Member y_;
Member z_;
};
Fred::~Fred()
{
// Compiler automagically calls z_.~Member()
// Compiler automagically calls y_.~Member()
// Compiler automagically calls x_.~Member()
}
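A tracing sketch of the rule, assuming a hypothetical Member that records its name when destroyed. Fred's destructor body runs first (it appends "~"), then the compiler-generated calls destroy z_, y_ and x_, in reverse declaration order.

```cpp
#include <string>

std::string order;   // destructors append here

// Hypothetical Member that logs its one-letter name on destruction.
class Member {
public:
    explicit Member(char name) : name_(name) {}
    ~Member() { order += name_; }
private:
    char name_;
};

class Fred {
public:
    Fred() : x_('x'), y_('y'), z_('z') {}
    ~Fred() { order += '~'; }   // body first; then z_, y_, x_ automatically
private:
    Member x_;
    Member y_;
    Member z_;
};
```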
[11.12] When I write a derived class's destructor, do I need to explicitly call the
destructor for my base class?
No. You never need to explicitly call a destructor (except with placement new).
A derived class's destructor (whether or not you explicitly define one) automagically invokes the
destructors for base class subobjects. Base classes are destructed after member objects. In the
event of multiple inheritance, direct base classes are destructed in the reverse order of their
appearance in the inheritance list.
class Member {
public:
~Member();
// ...
};
class Base {
public:
virtual ~Base(); // A virtual destructor
// ...
};
class Derived : public Base {
public:
~Derived();
// ...
private:
Member x_;
};
Derived::~Derived()
{
// Compiler automagically calls x_.~Member()
// Compiler automagically calls Base::~Base()
}
Note: Order dependencies with virtual inheritance are trickier. If you are relying on order
dependencies in a virtual inheritance hierarchy, you'll need a lot more information than is in this
FAQ.
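A tracing sketch, assuming each destructor appends its name to a global string (an illustration-only device): deleting a Derived through a Base* runs the Derived body first, then the member x_, then the Base subobject, with no explicit destructor calls anywhere.

```cpp
#include <string>

std::string order;   // each destructor appends its name here

class Member {
public:
    ~Member() { order += "Member;"; }
};

class Base {
public:
    virtual ~Base() { order += "Base;"; }   // virtual: delete via Base* is safe
};

class Derived : public Base {
public:
    ~Derived() { order += "Derived;"; }   // no explicit calls to other destructors
private:
    Member x_;
};
```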
Self assignment is when someone assigns an object to itself. For example,
void userCode(Fred& x)
{
x = x; // Self-assignment
}
Obviously no one ever explicitly does a self assignment like the above, but since more than one
pointer or reference can point to the same object (aliasing), it is possible to have self assignment
without knowing it:
void userCode(Fred& x, Fred& y)
{
x = y; // Could be self-assignment if &x == &y
}
int main()
{
Fred z;
userCode(z, z);
}
If you don't worry about self assignment, you'll expose your users to some very subtle bugs that
have very subtle and often disastrous symptoms. For example, the following class will cause a
complete disaster in the case of self-assignment:
class Wilma { };
class Fred {
public:
Fred() : p_(new Wilma()) {}
Fred(const Fred& f) : p_(new Wilma(*f.p_)) { }
~Fred() { delete p_; }
Fred& operator= (const Fred& f)
{
// Bad code: Doesn't handle self-assignment!
delete p_; // Line #1
p_ = new Wilma(*f.p_); // Line #2
return *this;
}
private:
Wilma* p_;
};
If someone assigns a Fred object to itself, line #1 deletes both this->p_ and f.p_ since *this and f
are the same object. But line #2 uses *f.p_, which is no longer a valid object. This will likely cause
a major disaster.
The bottom line is that you the author of class Fred are responsible to make sure self-assignment
on a Fred object is innocuous. Do not assume that users won't ever do that to your objects. It is
your fault if your object crashes when it gets a self-assignment.
Aside: the above Fred::operator= (const Fred&) has a second problem: If an
exception is thrown while evaluating new Wilma(*f.p_) (e.g., an out-of-memory
exception or an exception in Wilma's copy constructor), this->p_ will be a dangling
pointer — it will point to memory that is no longer valid. This can be solved by
allocating the new objects before deleting the old objects.
You should worry about self assignment every time you create a class. This does not mean that you
need to add extra code to all your classes: as long as your objects gracefully handle self
assignment, it doesn't matter whether you had to add extra code or not.
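If you do need to add extra code to your assignment operator, here's a simple and effective
technique:
Fred& Fred::operator= (const Fred& f)
{
if (this == &f) return *this; // Gracefully handle self assignment
// Put the normal assignment duties here...
return *this;
}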
This explicit test isn't always necessary. For example, if you were to fix the assignment operator in
the previous FAQ to handle exceptions thrown by new and/or exceptions thrown by the copy
constructor of class Wilma, you might produce the following code. Note that this code has the
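(pleasant) side effect of automatically handling self assignment as well:
Fred& Fred::operator= (const Fred& f)
{
// This code gracefully (albeit implicitly) handles self assignment
Wilma* tmp = new Wilma(*f.p_); // It would be OK if an exception got thrown here
delete p_;
p_ = tmp;
return *this;
}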
In cases like the previous example (where self assignment is harmless but inefficient), some
programmers want to improve the efficiency of self assignment by adding an otherwise unnecessary
test, such as "if (this == &f) return *this;". It is generally the wrong tradeoff to make self
assignment more efficient by making the non-self assignment case less efficient. For example,
adding the above if test to the Fred assignment operator would make the non-self assignment case
slightly less efficient (an extra (and unnecessary) conditional branch). If self assignment actually
occurred once in a thousand times, the if would waste cycles 99.9% of the time.
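A compilable sketch of the allocate-before-delete assignment operator, with a hypothetical Wilma that holds an int and counts live instances so that a leak or a double delete would show up in the counts. Assigning a Fred to itself copies the Wilma into tmp before the old one is deleted, so nothing dangling is ever read, and no if test is needed.

```cpp
// Hypothetical Wilma that counts live instances so leaks show up.
class Wilma {
public:
    explicit Wilma(int v = 0) : value(v) { ++live; }
    Wilma(const Wilma& w) : value(w.value) { ++live; }
    ~Wilma() { --live; }
    int value;
    static int live;
};
int Wilma::live = 0;

class Fred {
public:
    explicit Fred(int v = 0) : p_(new Wilma(v)) {}
    Fred(const Fred& f) : p_(new Wilma(*f.p_)) {}
    ~Fred() { delete p_; }

    // Allocate first, then delete: if new or Wilma's copy constructor
    // throws, *this is untouched; self-assignment needs no special test.
    Fred& operator= (const Fred& f)
    {
        Wilma* tmp = new Wilma(*f.p_);
        delete p_;
        p_ = tmp;
        return *this;
    }

    int value() const { return p_->value; }
private:
    Wilma* p_;
};
```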
Operator overloading allows C/C++ operators to have user-defined meanings on user-defined types
(classes). Overloaded operators are syntactic sugar for function calls:
class Fred {
public:
// ...
};
#if 0
#else
#endif
By overloading standard operators on a class, you can exploit the intuition of the users of that class.
This lets users program in the language of the problem domain rather than in the language of the
machine.
The ultimate goal is to reduce both the learning curve and the defect rate.
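To make the "syntactic sugar" claim concrete, here is a minimal sketch. The Fred below is hypothetical (it just wraps an int), and withOperators() and withCalls() are illustrative names: the expression a*b + b*c + c*a means exactly the nested operator+ / operator* calls spelled out in the second function.

```cpp
// Hypothetical Fred holding a single number, only to show the mechanics.
class Fred {
public:
    explicit Fred(int v) : v_(v) {}
    int value() const { return v_; }
private:
    int v_;
};

Fred operator+ (const Fred& x, const Fred& y) { return Fred(x.value() + y.value()); }
Fred operator* (const Fred& x, const Fred& y) { return Fred(x.value() * y.value()); }

// What users write: reads like the problem domain.
Fred withOperators(const Fred& a, const Fred& b, const Fred& c)
{
    return a*b + b*c + c*a;
}

// The same computation as explicit function calls.
Fred withCalls(const Fred& a, const Fred& b, const Fred& c)
{
    return operator+(operator+(operator*(a, b), operator*(b, c)), operator*(c, a));
}
```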
[13.4] But operator overloading makes my class look ugly; isn't it supposed to make my
code clearer?
Operator overloading makes life easier for the users of a class, not for the developer of the class!
class Array {
public:
int& operator[] (unsigned i); // Some people don't like this syntax
// ...
};
inline
int& Array::operator[] (unsigned i) // Some people don't like this syntax
{
// ...
}
Some people don't like the keyword operator or the somewhat funny syntax that goes with it in the
body of the class itself. But the operator overloading syntax isn't supposed to make life easier for
the developer of a class. It's supposed to make life easier for the users of the class:
int main()
{
Array a;
a[3] = 4; // User code should be obvious and easy to understand...
}
Remember: in a reuse-oriented world, there will usually be many people who use your class, but
there is only one person who builds it (yourself); therefore you should do things that favor the
many rather than the few.
Most can be overloaded. The only C operators that can't be are . and ?: (and sizeof, which is
technically an operator). C++ adds a few of its own operators, most of which can be overloaded
except :: and .*.
Here's an example of the subscript operator (it returns a reference). First without operator
overloading:
class Array {
public:
int& elem(unsigned i) { if (i > 99) error(); return data[i]; }
private:
int data[100];
};
int main()
{
Array a;
a.elem(10) = 42;
a.elem(12) += a.elem(13);
}
Now the same logic, presented with operator overloading:
class Array {
public:
int& operator[] (unsigned i) { if (i > 99) error(); return data[i]; }
private:
int data[100];
};
int main()
{
Array a;
a[10] = 42;
a[12] += a[13];
}
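Putting both versions of Array into one compilable sketch: the FAQ's error() is not defined in the text, so this sketch assumes it should throw and uses std::out_of_range for that. The shared check() helper is likewise an addition. Both elem() and operator[] return int&, which is why assignments such as a[10] = 42 work.

```cpp
#include <stdexcept>

class Array {
public:
    Array() : data() {}   // value-initialize: every element starts at 0

    // Function-call style
    int& elem(unsigned i) { check(i); return data[i]; }

    // Operator style: same logic, nicer call syntax
    int& operator[] (unsigned i) { check(i); return data[i]; }

private:
    // Stand-in for the FAQ's error(): throw on a bad index.
    static void check(unsigned i)
    {
        if (i > 99) throw std::out_of_range("Array index out of range");
    }
    int data[100];
};
```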
[13.6] Can I overload operator== so it lets me compare two char[] using a string
comparison?
No: at least one operand of any overloaded operator must be of some user-defined type (most of
the time that means a class).
But even if C++ allowed you to do this, which it doesn't, you wouldn't want to do it anyway since
you really should be using a std::string-like class rather than an array of char in the first place since
arrays are evil.
Nope.
The names of, precedence of, associativity of, and arity of operators is fixed by the language. There
is no operator** in C++, so you cannot create one for a class type.
If you're in doubt, consider that x ** y is the same as x * (*y) (in other words, the compiler
assumes y is a pointer). Besides, operator overloading is just syntactic sugar for function calls.
Although this particular syntactic sugar can be very sweet, it doesn't add anything fundamental. I
suggest you overload pow(base,exponent) (a double precision version is in <cmath>).
By the way, operator^ can work for to-the-power-of, except it has the wrong precedence and
associativity.
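A small sketch of both suggestions, using a hypothetical Num type. It defines an ordinary pow() overload and, for contrast, an operator^ that means "to the power of". Because ^ has lower precedence than * (and groups left-to-right, whereas mathematical exponentiation usually groups right-to-left), an expression that reads naturally can parse into something else.

```cpp
// Hypothetical number type, used only to illustrate the precedence pitfall.
struct Num {
    long v;
};

// The suggested route: an ordinary, clearly named function.
// (Integer power by repeated multiplication; exponent assumed small.)
Num pow(Num base, unsigned exponent)
{
    Num result = {1};
    for (unsigned i = 0; i < exponent; ++i)
        result.v *= base.v;
    return result;
}

// The tempting route: reuse ^ to mean "to the power of".
Num operator^ (Num base, unsigned exponent)
{
    return pow(base, exponent);
}

// Reads like (2 to the 2nd) * 3 == 12, but ^ binds more loosely than *,
// so the compiler parses it as 2 ^ (2*3), which is 64.
long surprising() { Num two = {2}; return (two ^ 2 * 3).v; }

// With pow() the intended grouping is explicit: (2 to the 2nd) * 3 == 12.
long intended() { Num two = {2}; return pow(two, 2).v * 3; }
```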
When you have multiple subscripts, the cleanest way to do it is with operator() rather than with
operator[]. The reason is that operator[] always takes exactly one parameter, but operator() can
take any number of parameters (in the case of a rectangular matrix, two parameters are needed).
For example:
class Matrix {
public:
Matrix(unsigned rows, unsigned cols);
double& operator() (unsigned row, unsigned col);
double operator() (unsigned row, unsigned col) const;
// ...
~Matrix(); // Destructor
Matrix(const Matrix& m); // Copy constructor
Matrix& operator= (const Matrix& m); // Assignment operator
// ...
private:
unsigned rows_, cols_;
double* data_;
};
inline
Matrix::Matrix(unsigned rows, unsigned cols)
: rows_ (rows),
cols_ (cols),
data_ (new double[rows * cols])
{
if (rows == 0 || cols == 0)
throw BadIndex("Matrix constructor has 0 size");
}
inline
Matrix::~Matrix()
{
delete[] data_;
}
inline
double& Matrix::operator() (unsigned row, unsigned col)
{
if (row >= rows_ || col >= cols_)
throw BadIndex("Matrix subscript out of bounds");
return data_[cols_*row + col];
}
inline
double Matrix::operator() (unsigned row, unsigned col) const
{
if (row >= rows_ || col >= cols_)
throw BadIndex("const Matrix subscript out of bounds");
return data_[cols_*row + col];
}
Then you can access an element of Matrix m using m(i,j) rather than m[i][j]:
int main()
{
Matrix m(10,10);
m(5,8) = 106.15;
std::cout << m(5,8);
// ...
}
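A compilable version of the Matrix sketch, under two assumptions: BadIndex is defined here as a hypothetical exception derived from std::out_of_range, and the raw double* is swapped for std::vector<double>, so the copy constructor, assignment operator and destructor no longer need to be hand-written. A side benefit of the vector: if the zero-size check throws, the already-constructed data_ member is destroyed automatically, so nothing leaks.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception type standing in for the FAQ's BadIndex.
class BadIndex : public std::out_of_range {
public:
    explicit BadIndex(const std::string& what) : std::out_of_range(what) {}
};

class Matrix {
public:
    Matrix(unsigned rows, unsigned cols)
        : rows_(rows), cols_(cols), data_(rows * cols)
    {
        if (rows == 0 || cols == 0)
            throw BadIndex("Matrix constructor has 0 size");
    }

    double& operator() (unsigned row, unsigned col)
    {
        check(row, col);
        return data_[cols_*row + col];   // dense, row-major layout
    }

    double operator() (unsigned row, unsigned col) const
    {
        check(row, col);
        return data_[cols_*row + col];
    }

private:
    void check(unsigned row, unsigned col) const
    {
        if (row >= rows_ || col >= cols_)
            throw BadIndex("Matrix subscript out of bounds");
    }
    unsigned rows_, cols_;
    std::vector<double> data_;   // replaces double*, so copying just works
};
```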
Here's what this FAQ is really all about: Some people build a Matrix class that has an operator[]
that returns a reference to an Array object, and that Array object has an operator[] that returns an
element of the Matrix (e.g., a reference to a double). Thus they access elements of the matrix using
syntax like m[i][j] rather than syntax like m(i,j).
The array-of-array solution obviously works, but it is less flexible than the operator() approach.
Specifically, there are easy performance tuning tricks that can be done with the operator() approach
that are more difficult in the [][] approach, and therefore the [][] approach is more likely to lead to
bad performance, at least in some cases.
For example, the easiest way to implement the [][] approach is to use a physical layout of the
matrix as a dense matrix that is stored in row-major form (each row stored contiguously, which is
what lets m[i] hand back a whole row). In contrast, the operator() approach totally hides the physical layout of the matrix, and
that can lead to better performance in some cases.
Put it this way: the operator() approach is never worse than, and sometimes better than, the [][]
approach.
• The operator() approach is never worse because it is easy to implement the dense, row-
major physical layout using the operator() approach, so when that configuration happens to
be the optimal layout from a performance standpoint, the operator() approach is just as easy
as the [][] approach (perhaps the operator() approach is a tiny bit easier, but I won't quibble
over minor nits).
• The operator() approach is sometimes better because whenever the optimal layout for a
given application happens to be something other than dense, row-major, the implementation
is often significantly easier using the operator() approach compared to the [][] approach.
As an example of when a physical layout makes a significant difference, a recent project happened
to access the matrix elements in columns (that is, the algorithm accesses all the elements in one
column, then the elements in another, etc.), and if the physical layout is row-major, the accesses
can "stride the cache". For example, if the rows happen to be almost as big as the processor's cache
size, the machine can end up with a "cache miss" for almost every element access. In this particular
project, we got a 20% improvement in performance by changing the mapping from the logical
layout (row,column) to the physical layout (column,row).
Of course there are many examples of this sort of thing from numerical methods, and sparse
matrices are a whole other dimension on this issue. Since it is, in general, easier to implement a
sparse matrix or swap row/column ordering using the operator() approach, the operator() approach
loses nothing and may gain something — it has no down-side and a potential up-side.
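To illustrate the layout point, here is a hypothetical column-major variant. Callers still write m(row, col); only the private index() mapping changes, which is the kind of change the [][] approach makes harder because m[i] would have to hand back something other than a contiguous row. The raw() accessor exists only so the layout can be checked.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical matrix that stores elements column by column.
class ColumnMajorMatrix {
public:
    ColumnMajorMatrix(unsigned rows, unsigned cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double& operator() (unsigned row, unsigned col) { return data_[index(row, col)]; }
    double operator() (unsigned row, unsigned col) const { return data_[index(row, col)]; }

    // Exposed only so the physical layout can be inspected.
    const double* raw() const { return data_.data(); }

private:
    // The only place that knows the physical layout.
    unsigned index(unsigned row, unsigned col) const
    {
        if (row >= rows_ || col >= cols_)
            throw std::out_of_range("ColumnMajorMatrix subscript out of bounds");
        return rows_ * col + row;   // column-major: each column is contiguous
    }
    unsigned rows_, cols_;
    std::vector<double> data_;
};
```

Switching back to row-major would mean changing only the return expression in index(); no user code would notice.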
[13.10] Should I design my classes from the outside (interfaces first) or from the inside
(data first)?
A good interface provides a simplified view that is expressed in the vocabulary of a user. In the case
of OO software, the interface is normally the set of public methods of either a single class or a tight
group of classes.
First think about what the object logically represents, not how you intend to physically build it. For
example, suppose you have a Stack class that will be built by containing a LinkedList:
class Stack {
public:
// ...
private:
LinkedList list_;
};
Should the Stack have a get() method that returns the LinkedList? Or a set() method that takes a
LinkedList? Or a constructor that takes a LinkedList? Obviously the answer is No, since you should
design your interfaces from the outside-in. I.e., users of Stack objects don't care about LinkedLists;
they care about pushing and popping.
Now for another example that is a bit more subtle. Suppose class LinkedList is built using a linked
list of Node objects, where each Node object has a pointer to the next Node:
33 of 133
C++ FAQ
class LinkedList {
public:
// ...
private:
Node* first_;
};
Should the LinkedList class have a get() method that will let users access the first Node? Should the
Node object have a get() method that will let users follow that Node to the next Node in the chain?
In other words, what should a LinkedList look like from the outside? Is a LinkedList really a chain of
Node objects? Or is that just an implementation detail? And if it is just an implementation detail,
how will the LinkedList let users access each of the elements in the LinkedList one at a time?
One man's answer: A LinkedList is not a chain of Nodes. That may be how it is built, but that is not
what it is. What it is is a sequence of elements. Therefore the LinkedList abstraction should provide
a "LinkedListIterator" class as well, and that "LinkedListIterator" might have an operator++ to go to
the next element, and it might have a get()/set() pair to access its value stored in the Node (the
value in the Node element is solely the responsibility of the LinkedList user, which is why there is a
get()/set() pair that allows the user to freely manipulate that value).
Starting from the user's perspective, we might want our LinkedList class to support operations that
look similar to accessing an array using pointer arithmetic:
void userCode(LinkedList& a)
{
for (LinkedListIterator p = a.begin(); p != a.end(); ++p)
std::cout << *p << '\n';
}
To implement this interface, LinkedList will need a begin() method and an end() method. These
return a "LinkedListIterator" object. The "LinkedListIterator" will need a method to go forward,
++p; a method to access the current element, *p; and a comparison operator, p != a.end().
The code follows. The key insight is that the LinkedList class does not have any methods that let
users access the Nodes. Nodes are an implementation technique that is completely buried. The
LinkedList class could have its internals replaced with a doubly linked list, or even an array, and the
only differences would be in the performance of the prepend(elem) and append(elem)
methods.
class LinkedListIterator;
class LinkedList;
class Node {
// No public members; this is a "private class"
friend LinkedListIterator; // A friend class
friend LinkedList;
Node* next_;
int elem_;
};
class LinkedListIterator {
public:
bool operator== (LinkedListIterator i) const;
bool operator!= (LinkedListIterator i) const;
void operator++ (); // Go to the next element
int& operator* (); // Access the current element
private:
LinkedListIterator(Node* p);
Node* p_;
friend LinkedList; // so LinkedList can construct a LinkedListIterator
};
class LinkedList {
public:
void append(int elem); // Adds elem after the end
void prepend(int elem); // Adds elem before the beginning
// ...
LinkedListIterator begin();
LinkedListIterator end();
// ...
private:
Node* first_;
};
Here are the methods that are obviously inlinable (probably in the same header file):
inline LinkedListIterator::LinkedListIterator(Node* p)
: p_(p)
{}
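Here is one way the whole example might fit together in a single self-contained file. Several pieces are assumptions made for this sketch:

- the Node constructor;
- the destructor and the disabled copy operations;
- the bodies of append(), prepend(), begin(), and end().

Notice that nothing in the public interface of LinkedList mentions Node.

```cpp
class LinkedListIterator;
class LinkedList;

class Node {
    // No public members; this is a "private class"
    friend class LinkedListIterator;
    friend class LinkedList;
    Node(int elem, Node* next) : next_(next), elem_(elem) {}
    Node* next_;
    int elem_;
};

class LinkedListIterator {
public:
    bool operator== (LinkedListIterator i) const { return p_ == i.p_; }
    bool operator!= (LinkedListIterator i) const { return p_ != i.p_; }
    void operator++ () { p_ = p_->next_; }     // Go to the next element
    int& operator* () { return p_->elem_; }    // Access the current element
private:
    explicit LinkedListIterator(Node* p) : p_(p) {}
    Node* p_;
    friend class LinkedList;  // so LinkedList can construct a LinkedListIterator
};

class LinkedList {
public:
    LinkedList() : first_(nullptr) {}
    ~LinkedList()
    {
        while (first_ != nullptr) {            // free every Node
            Node* next = first_->next_;
            delete first_;
            first_ = next;
        }
    }
    LinkedList(const LinkedList&) = delete;             // copying is left out of this sketch
    LinkedList& operator= (const LinkedList&) = delete;

    void prepend(int elem) { first_ = new Node(elem, first_); }  // Adds elem before the beginning
    void append(int elem)                                          // Adds elem after the end
    {
        Node** pp = &first_;                   // walk to the null next_ pointer at the end
        while (*pp != nullptr)
            pp = &(*pp)->next_;
        *pp = new Node(elem, nullptr);
    }
    LinkedListIterator begin() { return LinkedListIterator(first_); }
    LinkedListIterator end() { return LinkedListIterator(nullptr); }
private:
    Node* first_;
};
```

With these definitions, the userCode() loop shown earlier compiles unchanged. Replacing the Node chain with another structure would only affect the private parts.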
Conclusion: The linked list had two different kinds of data. The values of the elements stored in the
linked list are the responsibility of the user of the linked list (and only the user; the linked list itself
makes no attempt to prohibit users from changing the third element to 5), and the linked list's
infrastructure data (next pointers, etc.), whose values are the responsibility of the linked list (and
only the linked list; e.g., the linked list does not let users change (or even look at!) the various next
pointers).
Thus the only get()/set() methods were to get and set the elements of the linked list, but not the
infrastructure of the linked list. Since the linked list hides the infrastructure pointers/etc., it is able
to make very strong promises regarding that infrastructure (e.g., if it was a doubly linked list, it
might guarantee that every forward pointer was matched by a backwards pointer from the next
Node).
So, we see here an example where the values of some of a class's data are the responsibility of
users (in which case the class needs to have get()/set() methods for that data), but the data that
the class wants to control does not necessarily have get()/set() methods.
Note: the purpose of this example is not to show you how to write a linked-list class. In fact you
should not "roll your own" linked-list class since you should use one of the "container classes"
provided with your compiler. Ideally you'll use one of the standard container classes such as the
std::list<T> template.
Friends can be either functions or other classes. A class grants access privileges to its friends.
Normally a developer has political and technical control over both the friend and member functions
of a class (else you may need to get permission from the owner of the other pieces when you want
to update your own class).
You often need to split a class in half when the two halves will have different numbers of instances
or different lifetimes. In these cases, the two halves usually need direct access to each other (the
two halves used to be in the same class, so you haven't increased the amount of code that needs
direct access to a data structure; you've simply reshuffled the code into two classes instead of one).
The safest way to implement this is to make the two halves friends of each other.
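Here is a sketch of such a split. It uses a hypothetical Playlist, which owns the data and usually has one long-lived instance. Its other half is a hypothetical Cursor, which has many short-lived instances per Playlist. Each half is a friend of the other, so neither needs public get()/set() functions for the other's private data:

```cpp
#include <cstddef>
#include <string>
#include <vector>

class Cursor;

// First half: owns the titles.
class Playlist {
public:
    void add(const std::string& title) { titles_.push_back(title); }
    std::size_t remaining(const Cursor& c) const;   // reads Cursor's private position
private:
    friend class Cursor;                            // Cursor reads titles_ directly
    std::vector<std::string> titles_;
};

// Second half: a position within one Playlist.
class Cursor {
public:
    explicit Cursor(const Playlist& list) : list_(&list), pos_(0) {}
    bool done() const { return pos_ >= list_->titles_.size(); }
    const std::string& current() const { return list_->titles_[pos_]; }
    void next() { ++pos_; }
private:
    friend class Playlist;                          // Playlist reads pos_ directly
    const Playlist* list_;
    std::size_t pos_;
};

inline std::size_t Playlist::remaining(const Cursor& c) const
{
    return c.pos_ < titles_.size() ? titles_.size() - c.pos_ : 0;
}
```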
If you use friends like just described, you'll keep private things private. People who don't
understand this often make naive efforts to avoid using friendship in situations like the above, and
often they actually destroy encapsulation. They either use public data (grotesque!), or they make
the data accessible between the halves via public get() and set() member functions. Having a public
get() and set() member function for a private datum is OK only when the private datum "makes
sense" from outside the class (from a user's perspective). In many cases, these get()/set() member
functions are almost as bad as public data: they hide (only) the name of the private datum, but
they don't hide the existence of the private datum.
Similarly, if you use friend functions as a syntactic variant of a class's public access functions, they
don't violate encapsulation any more than a member function violates encapsulation. In other
words, a class's friends don't violate the encapsulation barrier: along with the class's member
functions, they are the encapsulation barrier.
(Many people think of a friend function as something outside the class. Instead, try thinking of a
friend function as part of the class's public interface. A friend function in the class declaration
doesn't violate encapsulation any more than a public member function violates encapsulation: both
have exactly the same authority with respect to accessing the class's non-public parts.)
Member functions and friend functions are equally privileged (100% vested). The major difference
is that a friend function is called like f(x), while a member function is called like x.f(). Thus the
ability to choose between member functions (x.f()) and friend functions (f(x)) allows a designer to
select the syntax that is deemed most readable, which lowers maintenance costs.
The major disadvantage of friend functions is that they require an extra line of code when you want
dynamic binding. To get the effect of a virtual friend, the friend function should call a hidden
(usually protected) virtual member function. This is called the Virtual Friend Function Idiom. For
example:
class Base {
public:
friend void f(Base& b);
// ...
protected:
virtual void do_f();
// ...
};
void userCode(Base& b)
{
f(b);
}
The statement f(b) in userCode(Base&) will invoke b.do_f(), which is virtual. This means that
Derived::do_f() will get control if b is actually an object of class Derived. Note that Derived overrides
the behavior of the protected virtual member function do_f(); it does not have its own variation of
the friend function, f(Base&).
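Here is a self-contained sketch of the Virtual Friend Function Idiom. Three things are assumptions made for illustration:

- the body of f();
- the class Derived;
- the std::string return values, which exist only so the result can be checked (in the original, f() returns void).

```cpp
#include <string>

class Base {
public:
    virtual ~Base() {}
    friend std::string f(Base& b);     // the non-virtual friend that users call
    // ...
protected:
    virtual std::string do_f() { return "Base::do_f"; }   // the hidden virtual hook
};

// The friend does no real work: it forwards to the protected virtual member.
inline std::string f(Base& b)
{
    return b.do_f();
}

class Derived : public Base {
protected:
    std::string do_f() override { return "Derived::do_f"; }  // overrides do_f(), not f()
};
```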
[14.4] What does it mean that "friendship isn't inherited, transitive, or reciprocal"?
Just because I grant you friendship access to me doesn't automatically grant your kids access to
me, doesn't automatically grant your friends access to me, and doesn't automatically grant me
access to you.
• I don't necessarily trust the kids of my friends. The privileges of friendship aren't inherited.
Derived classes of a friend aren't necessarily friends. If class Fred declares that class Base is
a friend, classes derived from Base don't have any automatic special access rights to Fred
objects.
• I don't necessarily trust the friends of my friends. The privileges of friendship aren't
transitive. A friend of a friend isn't necessarily a friend. If class Fred declares class Wilma as
a friend, and class Wilma declares class Betty as a friend, class Betty doesn't necessarily
have any special access rights to Fred objects.
• You don't necessarily trust me simply because I declare you my friend. The privileges of
friendship aren't reciprocal. If class Fred declares that class Wilma is a friend, Wilma objects
have special access to Fred objects but Fred objects do not automatically have special access
to Wilma objects.
Use a member when you can, and a friend when you have to.
Sometimes friends are syntactically better (e.g., in class Fred, friend functions allow the Fred
parameter to be second, while members require it to be first). Another good use of friend functions
is the binary infix arithmetic operators. E.g., aComplex + aComplex should be defined as a friend
rather than a member if you want to allow aFloat + aComplex as well (member functions don't
allow promotion of the left-hand argument, since that would change the class of the object that is
the recipient of the member function invocation).
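Here is a minimal sketch. The Complex class is hypothetical (std::complex already provides these operators), and double stands in for "aFloat". Because operator+ is a non-member friend that takes two Complex parameters, a double on either side can be converted through the implicit Complex(double) constructor:

```cpp
class Complex {
public:
    Complex(double re = 0.0, double im = 0.0) : re_(re), im_(im) {}  // implicit from double
    double real() const { return re_; }
    double imag() const { return im_; }

    // Non-member friend: both operands are ordinary parameters, so both may be converted.
    friend Complex operator+ (const Complex& a, const Complex& b)
    {
        return Complex(a.re_ + b.re_, a.im_ + b.im_);
    }
private:
    double re_;
    double im_;
};
```

If operator+ were a member, c + 2.0 would still compile, but 2.0 + c would not.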
Increase type safety, reduce errors, improve performance, allow extensibility, and provide
inheritability.
printf() is arguably not broken, and scanf() is perhaps livable despite being error prone; however,
both are limited with respect to what C++ I/O can do. C++ I/O (using << and >>) is, relative to C
(using printf() and scanf()):
• Better type safety: With <iostream>, the type of object being I/O'd is known statically by
the compiler. In contrast, <cstdio> uses "%" fields to figure out the types dynamically.
• Less error prone: With <iostream>, there are no redundant "%" tokens that have to be
consistent with the actual objects being I/O'd. Removing redundancy removes a class of
errors.
• Extensible: The C++ <iostream> mechanism allows new user-defined types to be I/O'd
without breaking existing code. (Imagine the chaos if everyone were simultaneously adding
new incompatible "%" fields to printf() and scanf()!)
• Inheritable: The C++ <iostream> mechanism is built from real classes such as std::ostream
and std::istream. Unlike <cstdio>'s FILE*, these are real classes and hence inheritable. This
means you can have other user-defined things that look and act like streams, yet that do
whatever strange and wonderful things you want. You automatically get to use the zillions of
lines of I/O code written by users you don't even know, and they don't need to know about
your "extended stream" class.
[15.2] Why does my program go into an infinite loop when someone enters an invalid
input character?
For example, suppose you have the following code that reads integers from std::cin:
#include <iostream>
int main()
{
std::cout << "Enter numbers separated by whitespace (use -1 to quit): ";
int i = 0;
while (i != -1) {
std::cin >> i; // BAD FORM — See comments below
std::cout << "You entered " << i << '\n';
}
}
The problem with this code is that it lacks any checking to see if someone entered an invalid input
character. In particular, if someone enters something that doesn't look like an integer (such as an
'x'), the stream std::cin goes into a "failed state," and all subsequent input attempts return
immediately without doing anything. In other words, the program enters an infinite loop; if 42 was
the last number that was successfully read, the program will print the message You entered 42 over
and over.
An easy way to check for invalid input is to move the input request from the body of the while loop
into the control-expression of the while loop. E.g.,
#include <iostream>
int main()
{
std::cout << "Enter a number, or -1 to quit: ";
int i = 0;
while (std::cin >> i) { // GOOD FORM
if (i == -1) break;
std::cout << "You entered " << i << '\n';
}
}
This will cause the while loop to exit either when you hit end-of-file, or when you enter a bad
integer, or when you enter -1.
(Naturally you can eliminate the break by changing the while loop expression from
while (std::cin >> i) to while ((std::cin >> i) && (i != -1)), but that's not really the point of this
FAQ since this FAQ has to do with iostreams rather than generic structured programming
guidelines.)
[15.3] How does that funky while (std::cin >> foo) syntax work?
See the previous FAQ for an example of the "funky while (std::cin >> foo) syntax."
The expression (std::cin >> foo) calls the appropriate operator>> (for example, it calls the
operator>> that takes an std::istream on the left and, if foo is of type int, an int& on the right).
The std::istream operator>> functions return their left argument by convention, which in this case
means it will return std::cin. Next the compiler notices that the returned std::istream is in a
boolean context, so it converts that std::istream into a boolean.
To convert an std::istream into a boolean, the compiler calls a member function called
std::istream::operator void*(). This returns a void* pointer, which is in turn converted to a boolean
(NULL becomes false, any other pointer becomes true). So in this case the compiler generates a call
to std::cin.operator void*(), just as if you had cast it explicitly, as in (void*) std::cin. (Since C++11,
the streams provide an explicit operator bool() instead of operator void*(); in a while or if condition
the effect is the same.)
The operator void*() cast operator returns some non-NULL pointer if the stream is in a good state,
or NULL if it's in a failed state. For example, if you read one too many times (e.g., if you're already
at end-of-file), or if the actual info on the input stream isn't valid for the type of foo (e.g., if foo is
an int and the data is an 'x' character), the stream will go into a failed state and the cast operator
will return NULL.
The reason operator>> doesn't simply return a bool (or void*) indicating whether it succeeded or
failed is to support the "cascading" syntax:
std::cin >> foo >> bar;
In other words, if we replace operator>> with a normal function name such as readFrom(), this
becomes the expression:
readFrom( readFrom(std::cin, foo), bar);
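Here is a runnable sketch of that equivalence. It assumes a hypothetical readFrom() overload for int that simply forwards to operator>>. Because each call returns its left argument, the calls nest exactly as the operators cascade:

```cpp
#include <istream>
#include <sstream>

// Hypothetical "normal function name" version of operator>> for int.
// Like operator>>, it returns its left argument so calls can be nested.
std::istream& readFrom(std::istream& in, int& x)
{
    return in >> x;
}

// Equivalent to:  in >> a >> b
bool readPair(std::istream& in, int& a, int& b)
{
    return static_cast<bool>(readFrom(readFrom(in, a), b));
}
```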
[15.4] Why does my input seem to process past the end of file?
Because the eof state may not get set until after a read is attempted past the end of file. That is,
reading the last byte from a file might not set the eof state. E.g., suppose the input stream is
mapped to a keyboard — in that case it's not even theoretically possible for the C++ library to
predict whether or not the character that the user just typed will be the last character.
For example, the following code might have an off-by-one error with the count i:
int i = 0;
while (! std::cin.eof()) { // WRONG! (not reliable)
std::cin >> x;
++i;
// Work with x ...
}
Instead, make the read itself the loop condition:
int i = 0;
while (std::cin >> x) { // RIGHT! (reliable)
++i;
// Work with x ...
}
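Here is a small experiment that shows the off-by-one. It reads from a std::istringstream whose contents end in a newline. The two counting functions are hypothetical wrappers around the loops above:

```cpp
#include <istream>
#include <sstream>

// WRONG: eof() is tested before the read, so a failed final read is still counted.
// (On input such as "1 x" this loop never ends: failbit is set but eof is never reached.)
int countWithEofTest(std::istream& in)
{
    int i = 0;
    int x;
    while (!in.eof()) {
        in >> x;
        ++i;
    }
    return i;
}

// RIGHT: the body runs only after a successful read.
int countWithReadTest(std::istream& in)
{
    int i = 0;
    int x;
    while (in >> x)
        ++i;
    return i;
}
```

With the input "1 2 3\n", countWithEofTest() returns 4 and countWithReadTest() returns 3. In the first function, the trailing newline means eof is not set until the fourth read fails, and that failed read is still counted.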
[15.5] Why is my program ignoring my input request after the first iteration?
Because the numerical extractor leaves non-digits behind in the input buffer. For example, suppose your code looks like this:
char name[1000];
int age;
for (;;) {
std::cout << "Name: ";
std::cin >> name;
std::cout << "Age: ";
std::cin >> age;
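}
What you really want is to discard the rest of the line after reading the age:
for (;;) {
std::cout << "Name: ";
std::cin >> name;
std::cout << "Age: ";
std::cin >> age;
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); // from <limits>
}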
Of course you might want to change the for (;;) statement to while (std::cin), but don't confuse
that with skipping the non-numeric characters at the end of the loop via the line:
std::cin.ignore(...);.
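Here is the same idea applied to a string stream, which makes the effect easy to check. The function readRecords() and the record format "name age [junk]" are hypothetical. Passing std::numeric_limits<std::streamsize>::max() tells ignore() not to limit how many characters it skips:

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Reads "name age" records; anything after the age on the same line is discarded.
std::vector<std::pair<std::string, int>> readRecords(std::istream& in)
{
    std::vector<std::pair<std::string, int>> records;
    std::string name;
    int age = 0;
    while (in >> name >> age) {
        records.push_back(std::make_pair(name, age));
        // Skip leftover non-numeric characters (e.g. "years") through the '\n'.
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return records;
}
```

Without the ignore() call, the input "Fred 32 years\nWilma 30\n" would make "years" the next name. Reading "Wilma" as an age would then fail and stop the loop.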
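Use operator overloading to provide a friend left-shift operator, operator<<.
#include <iostream>
class Fred {
public:
Fred() : i_(0) { }
friend std::ostream& operator<< (std::ostream& o, const Fred& fred);
// ...
private:
int i_; // Just for illustration
};
std::ostream& operator<< (std::ostream& o, const Fred& fred)
{
return o << fred.i_;
}
int main()
{
Fred f;
std::cout << "My Fred object: " << f << "\n";
}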
We use a non-member function (a friend in this case) since the Fred object is the right-hand
operand of the << operator. If the Fred object were supposed to be on the left-hand side of the <<
(that is, myFred << std::cout rather than std::cout << myFred), we could have used a member
function named operator<<.
Note that operator<< returns the stream. This is so the output operations can be cascaded.
[15.7] But shouldn't I always use a printOn() method rather than a friend function?
No.
The usual reason people want to always use a printOn() method rather than a friend function is
because they wrongly believe that friends violate encapsulation and/or that friends are evil. These
beliefs are naive and wrong: when used properly, friends can actually enhance encapsulation.
This is not to say that the printOn() method approach is never useful. For example, it is useful when
providing printing for an entire hierarchy of classes. But if you use a printOn() method, it should
normally be protected, not public.
For completeness, here is "the printOn() method approach." The idea is to have a member function
(often called printOn()) that does the actual printing, then have operator<< call that printOn()
method. When it is done wrongly, the printOn() method is public so operator<< doesn't have to be
a friend; it can be a simple top-level function that is neither a friend nor a member of the class.
Here's some sample code:
#include <iostream>
class Fred {
public:
void printOn(std::ostream& o) const;
// ...
};
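// The actual printing is done inside the printOn() method [NOT recommended!]
void Fred::printOn(std::ostream& o) const
{
// ...
}
// operator<< calls printOn(), so it needs no friendship [NOT recommended!]
inline std::ostream& operator<< (std::ostream& o, const Fred& fred)
{
fred.printOn(o);
return o;
}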
People wrongly assume that this reduces maintenance cost "since it avoids having a friend
function." This is a wrong assumption because:
2. The member-called-by-top-level-function approach makes the class harder to use,
particularly by programmers who are not also class designers. The approach exposes
a public method that programmers are not supposed to call. When a programmer reads the
public methods of the class, they'll see two ways to do the same thing. The documentation
would need to say something like, "This does exactly the same as that, but don't use this;
instead use that." And the average programmer will say, "Huh? Why make the method public
if I'm not supposed to use it?" In reality the only reason the printOn() method is public is to
avoid granting friendship status to operator<<, and that is a notion that is somewhere
between subtle and incomprehensible to a programmer who simply wants to use the class.
Net: the member-called-by-top-level-function approach has a cost but no benefit. Therefore it is, in
general, a bad idea.
Note: if the printOn() method is protected or private, the second objection doesn't apply. There are
cases when that approach is reasonable, such as when providing printing for an entire hierarchy of
classes. Note also that when the printOn() method is non-public, operator<< needs to be a friend.
Use operator overloading to provide a friend right-shift operator, operator>>. This is similar to the
output operator, except the parameter doesn't have a const: "Fred&" rather than "const Fred&".
#include <iostream>
class Fred {
public:
Fred() : i_(0) { }
friend std::istream& operator>> (std::istream& i, Fred& fred);
// ...
private:
int i_; // Just for illustration
};
std::istream& operator>> (std::istream& i, Fred& fred)
{
return i >> fred.i_;
}
int main()
{
Fred f;
std::cout << "Enter a Fred object: ";
std::cin >> f;
// ...
}
Note that operator>> returns the stream. This is so the input operations can be cascaded and/or
used in a while loop or if statement.
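Here is a self-contained sketch that exercises both operators with string streams instead of the console. Two choices are specific to this sketch: the Fred(int) constructor, and defining the friends inside the class body.

```cpp
#include <istream>
#include <ostream>
#include <sstream>

class Fred {
public:
    explicit Fred(int i = 0) : i_(i) {}
    // Output: the Fred is the right-hand operand, so this is a non-member friend.
    friend std::ostream& operator<< (std::ostream& o, const Fred& fred) { return o << fred.i_; }
    // Input: same shape, but the Fred parameter is non-const.
    friend std::istream& operator>> (std::istream& i, Fred& fred) { return i >> fred.i_; }
private:
    int i_;  // Just for illustration
};
```

Because both operators return the stream, in >> a >> b and out << a << ' ' << b cascade as expected.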
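To provide printing for an entire hierarchy of classes, have a friend operator<< in the base class call a protected virtual printOn() member function:
class Base {
public:
friend std::ostream& operator<< (std::ostream& o, const Base& b);
// ...
protected:
virtual void printOn(std::ostream& o) const;
};
inline std::ostream& operator<< (std::ostream& o, const Base& b)
{
b.printOn(o);
return o;
}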
The end result is that operator<< acts as if it was dynamically bound, even though it's a friend
function. This is called the Virtual Friend Function Idiom.
Note that derived classes override printOn(std::ostream&) const. In particular, they do not provide
their own operator<<.
Naturally if Base is an ABC, Base::printOn(std::ostream&) const can be declared pure virtual using
the "= 0" syntax.
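Here is a self-contained sketch of the idiom with one derived class. Three parts are assumptions made for illustration:

- defining operator<< inside Base;
- the class Derived;
- the helper toString().

```cpp
#include <ostream>
#include <sstream>
#include <string>

class Base {
public:
    virtual ~Base() {}
    friend std::ostream& operator<< (std::ostream& o, const Base& b)
    {
        b.printOn(o);   // virtual call: the most-derived printOn() runs
        return o;
    }
protected:
    virtual void printOn(std::ostream& o) const { o << "Base"; }
};

class Derived : public Base {
protected:
    void printOn(std::ostream& o) const override { o << "Derived"; }  // no operator<< of its own
};

// Hypothetical helper, used only to capture what operator<< prints.
std::string toString(const Base& b)
{
    std::ostringstream o;
    o << b;
    return o.str();
}
```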
[15.10] How can I "reopen" std::cin and std::cout in binary mode under DOS and/or
OS/2?
For example, suppose you want to do binary I/O using std::cin and std::cout. Suppose further that
your operating system (such as DOS or OS/2) insists on translating "\r\n" into "\n" on input from
std::cin, and "\n" to "\r\n" on output to std::cout or std::cerr.
Unfortunately there is no standard way to cause std::cin, std::cout, and/or std::cerr to be opened
in binary mode. Closing the streams and attempting to reopen them in binary mode might have
unexpected or undesirable results.
On systems where it makes a difference, the implementation might provide a way to make them
binary streams, but you would have to check the manuals to find out.
You should use forward slashes in your filenames, even on an operating system that uses
backslashes such as DOS, Windows, OS/2, etc. For example:
#include <iostream>
#include <fstream>
int main()
{
#if 1
std::ifstream file("../test.dat"); // RIGHT!
#else
std::ifstream file("..\test.dat"); // WRONG!
#endif
// ...
}
Remember, the backslash ("\") is used in string literals to create special characters: "\n" is a
newline, "\b" is a backspace, "\t" is a tab, "\a" is an "alert", "\v" is a vertical tab, etc. Therefore
the file name "\version\next\alpha\beta\test.dat" is interpreted as a bunch of very funny
characters; use "/version/next/alpha/beta/test.dat" instead, even on systems that use a "\" as the
directory separator such as DOS, Windows, OS/2, etc. This is because the library routines on these
operating systems handle "/" and "\" interchangeably.
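One way to see the problem is to look at the characters the compiler actually stores. The array names below are hypothetical, and the sizes include the terminating '\0':

```cpp
// In "..\test.dat" the two characters \t form ONE escape sequence: a TAB.
const char wrongName[] = "..\test.dat";     // '.', '.', TAB, 'e', 's', 't', '.', 'd', 'a', 't'
const char rightName[] = "../test.dat";     // forward slash: no escape sequence involved
const char escapedName[] = "..\\test.dat";  // "\\" is how to write one literal backslash
```

Here sizeof(wrongName) is 11 (ten characters plus '\0'), while sizeof(rightName) is 12.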
There are two easy ways to do this: you can use the <cstdio> facilities or the <iostream> library.
In general, you should prefer the <iostream> library.
The <iostream> library allows you to convert pretty much anything to a std::string using the
following syntax (the example converts a double, but you could substitute pretty much anything
that prints using the << operator):
#include <iostream>
#include <sstream>
#include <string>
std::string convertToString(double x)
{
std::ostringstream o;
if (o << x)
return o.str();
// some sort of error handling goes here...
return "conversion error";
}
The std::ostringstream object o offers formatting facilities just like those for std::cout. You can use
manipulators and format flags to control the formatting of the result, just as you can for other
iostreams.
In this example, we insert x into o via the overloaded insertion operator, <<. This invokes the
iostream formatting facilities to convert x into a std::string. The if test makes sure the conversion
works correctly; it should always succeed for built-in/intrinsic types, but the if test is good style.
The expression o.str() returns the std::string that contains whatever has been inserted into stream
o, in this case the string value of x.
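As an example of those manipulators, here is a sketch that fixes the number of digits after the decimal point. The function name and its digits parameter are hypothetical:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Like convertToString(), but always prints exactly `digits` digits after the point.
std::string convertToFixedString(double x, int digits)
{
    std::ostringstream o;
    o << std::fixed << std::setprecision(digits);   // the same manipulators work on std::cout
    if (o << x)
        return o.str();
    // some sort of error handling goes here...
    return "conversion error";
}
```

For example, convertToFixedString(3.14159, 2) yields "3.14" and convertToFixedString(2.0, 3) yields "2.000".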
There are two easy ways to do this: you can use the <cstdio> facilities or the <iostream> library.
In general, you should prefer the <iostream> library.
The <iostream> library allows you to convert a std::string to pretty much anything using the
following syntax (the example converts a double, but you could substitute pretty much anything
that can be read using the >> operator):
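#include <iostream>
#include <sstream>
#include <string>
double convertFromString(const std::string& s)
{
std::istringstream i(s);
double x;
if (i >> x)
return x;
// some sort of error handling goes here...
return 0.0;
}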
The std::istringstream object i offers formatting facilities just like those for std::cin. You can use
manipulators and format flags to control the formatting of the result, just as you can for other
iostreams.
In this example, we initialize the std::istringstream i passing the std::string s (for example, s might
be the string "123.456"), then we extract from i into x via the overloaded extraction operator, >>. This
invokes the iostream formatting facilities to convert as much of the string as possible/appropriate
based on the type of x.
The if test makes sure the conversion works correctly. For example, if the string does not begin with
characters that are appropriate for the type of x (such as "abc" when x is a double), the if test will fail.
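The if test above accepts a string such as "12abc", because it reads the leading 12 and stops. If trailing characters should also count as an error, one option is a stricter variant. This variant is hypothetical and not part of the code above. It skips trailing whitespace with std::ws and then checks that the stream reached the end:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stricter conversion: succeeds only if the whole string (apart from
// surrounding whitespace) is a double. On failure, x is left unchanged.
bool convertWholeString(const std::string& s, double& x)
{
    std::istringstream i(s);
    double tmp;
    if (!(i >> tmp))
        return false;      // no valid number at the start
    i >> std::ws;          // skip trailing whitespace
    if (!i.eof())
        return false;      // leftover characters, e.g. "abc" in "12abc"
    x = tmp;
    return true;
}
```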
The pointed-to-data.
The keyword should really be delete_the_thing_pointed_to_by. The same abuse of English occurs
when freeing the memory pointed to by a pointer in C: free(p) really means
free_the_stuff_pointed_to_by(p).
[16.2] Can I free() pointers allocated with new? Can I delete pointers allocated with
malloc()?
No!
It is perfectly legal, moral, and wholesome to use malloc() and delete in the same program, or to
use new and free() in the same program. But it is illegal, immoral, and despicable to call free() with
a pointer allocated via new, or to call delete on a pointer allocated via malloc().
Beware! I occasionally get e-mail from people telling me that it works OK for them on machine X
and compiler Y. That does not make it right! Sometimes people say, "But I'm just working with an
array of char." Nonetheless do not mix malloc() and delete on the same pointer, or new and free()
on the same pointer! If you allocated via p = new char[n], you must use delete[] p; you must not
use free(p). Or if you allocated via p = malloc(n), you must use free(p); you must not use
delete[] p or delete p! Mixing these up could cause a catastrophic failure at runtime if the code was
ported to a new machine, a new compiler, or even a new version of the same compiler.
No!
When realloc() has to copy the allocation, it uses a bitwise copy operation, which will tear many C++
objects to shreds. C++ objects should be allowed to copy themselves. They use their own copy
constructor or assignment operator.
Besides all that, the heap that new uses may not be the same as the heap that malloc() and
realloc() use!
No! (But if you have an old compiler, you may have to force the new operator to throw an exception
if it runs out of memory.)
It turns out to be a real pain to always write explicit NULL tests after every new allocation; following
every new with an if (p == NULL) test and its own error handling is very tedious.
If your compiler doesn't support (or if you refuse to use) exceptions, the error handling inside each of
those tests might be even more tedious.
Take heart. In C++, if the runtime system cannot allocate sizeof(Fred) bytes of memory during
p = new Fred(), a std::bad_alloc exception will be thrown. Unlike malloc(), new never returns NULL!
However, if your compiler is old, it may not yet support this. Find out by checking your compiler's
documentation under "new". If you have an old compiler, you may have to force the compiler to
have this behavior.
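Here is a sketch of what allocation code can look like when you rely on the exception instead of NULL tests. The Fred struct with its value member and the helper tryMakeFreds() are hypothetical. The only error handling is a single catch of std::bad_alloc, placed where recovery actually makes sense:

```cpp
#include <cstddef>
#include <new>

struct Fred { int value = 42; };   // hypothetical stand-in for the FAQ's Fred

// No "if (p == NULL)" test: if new cannot get the memory it throws std::bad_alloc,
// so the pointer is never null when the next line runs.
bool tryMakeFreds(std::size_t n, int& sumOut)
{
    try {
        Fred* p = new Fred[n];
        int sum = 0;
        for (std::size_t k = 0; k < n; ++k)
            sum += p[k].value;
        delete[] p;
        sumOut = sum;
        return true;
    } catch (const std::bad_alloc&) {
        return false;   // allocation failed; nothing was allocated, so nothing to free
    }
}
```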
[16.6] How can I convince my (older) compiler to automatically check new to see if it
returns NULL?
If you have an old compiler that doesn't automagically perform the NULL test, you can force the
runtime system to do the test by installing a "new handler" function. Your "new handler" function
can do anything you want, such as throw an exception, delete some objects and return (in which
case operator new will retry the allocation), print a message and abort() the program, etc.
Here's a sample "new handler" that prints a message and throws an exception. The handler is
installed using std::set_new_handler():
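#include <new> // std::set_new_handler(), std::bad_alloc
#include <iostream> // std::cerr
// operator new expects a new handler to throw std::bad_alloc or a class derived from it
class alloc_error : public std::bad_alloc { };
void myNewHandler()
{
// This is your own handler. It can do anything you want.
std::cerr << "out of memory\n";
throw alloc_error();
}
int main()
{
std::set_new_handler(myNewHandler); // Install your "new handler"
// ...
}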
After the std::set_new_handler() line is executed, operator new will call your myNewHandler()
if/when it runs out of memory. This means that new will never return NULL.
Note: If your compiler doesn't support exception handling, you can, as a last resort, replace the
throw statement with code that prints a message and calls abort().
Note: If some global/static object's constructor uses new, it won't use the myNewHandler() function
since that constructor will get called before main() begins. Unfortunately there's no convenient way
to guarantee that the std::set_new_handler() will be called before the first use of new. For
example, even if you put the std::set_new_handler() call in the constructor of a global object, you
still don't know if the module ("compilation unit") that contains that global object will be elaborated
first or last or somewhere in between. Therefore you still don't have any guarantee that your call of
std::set_new_handler() will happen before any other global's constructor gets invoked.
No!
The C++ language guarantees that delete p will do nothing if p is equal to NULL. Since you might
get the test backwards, and since most testing methodologies force you to explicitly test every
branch point, you should not put in the redundant if test.
Wrong:
if (p != NULL)
delete p;
Right:
delete p;
[16.8] What are the two steps that happen when I say delete p?
delete p is a two-step process: it calls the destructor, then releases the memory. The code
generated for delete p looks something like this (assuming p is of type Fred*):
// Original code: delete p;
if (p != NULL) {
p->~Fred();
operator delete(p);
}
The statement p->~Fred() calls the destructor for the Fred object pointed to by p.
[16.9] In p = new Fred(), does the Fred memory "leak" if the Fred constructor throws an
exception?
No.
If an exception occurs during the Fred constructor of p = new Fred(), the C++ language guarantees
that the sizeof(Fred) bytes of memory that were allocated will automagically be released back to the
heap.
The statement marked "Placement new" calls the Fred constructor. The pointer p becomes the this
pointer inside the constructor, Fred::Fred().
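Here is a hand-written sketch of roughly the sequence the compiler generates for p = new Fred(). Real compilers do this internally, and the exact code varies. It assumes Fred has no class-specific operator new or operator delete. The bool parameter and newFredByHand() are hypothetical, added so that the failing case can be triggered:

```cpp
#include <new>
#include <stdexcept>

// Hypothetical Fred whose constructor throws on request.
struct Fred {
    explicit Fred(bool fail)
    {
        if (fail)
            throw std::runtime_error("Fred::Fred() failed");
    }
};

// Roughly what "Fred* p = new Fred(fail);" does.
Fred* newFredByHand(bool fail)
{
    Fred* p = static_cast<Fred*>(::operator new(sizeof(Fred)));  // step 1: raw memory
    try {
        new (p) Fred(fail);     // Placement new: p becomes "this" inside Fred::Fred()
    } catch (...) {
        ::operator delete(p);   // the constructor threw, so give the memory back
        throw;                  // and let the exception propagate to the caller
    }
    return p;
}
```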
Any time you allocate an array of objects via new (usually with the [n] in the new expression), you
must use [] in the delete statement. This syntax is necessary because there is no syntactic
difference between a pointer to a thing and a pointer to an array of things (something we inherited
from C).
[16.11] What if I forget the [] when deleteing array allocated via new T[n]?
It is the programmer's (not the compiler's) responsibility to get the connection between new T[n]
and delete[] p correct. If you get it wrong, neither a compile-time nor a run-time error message will
be generated by the compiler. Heap corruption is a likely result. Or worse. Your program will
probably die.
[16.12] Can I drop the [] when deleteing array of some built-in type (char, int, etc)?
No!
Sometimes programmers think that the [] in the delete[] p only exists so the compiler will call the
appropriate destructors for all elements in the array. Because of this reasoning, they assume that
an array of some built-in type such as char or int can be deleted without the []. E.g., they assume
the following is valid code:
void userCode(int n)
{
char* p = new char[n];
// ...
delete p; // <— ERROR! Should be delete[] p !
}
But the above code is wrong, and it can cause a disaster at runtime. In particular, the code that's
called for delete p is operator delete(void*), but the code that's called for delete[] p is
operator delete[](void*). The default behavior for the latter is to call the former, but users are
allowed to replace the latter with a different behavior (in which case they would normally also
replace the corresponding new code in operator new[](size_t)). If they replaced the delete[] code
so it wasn't compatible with the delete code, and you called the wrong one (i.e., if you said delete p
rather than delete[] p), you could end up with a disaster at runtime.
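To see that delete[] p and delete p reach different deallocation functions, here is a minimal sketch assuming a hypothetical class Tracked that supplies its own class-specific operator new[] and operator delete[] (class-specific versions are used instead of replacing the global ones, to keep the example contained). Only the matching delete[] reaches the customized array form.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class Tracked {
public:
  static int arrayDeletes;                 // Counts calls to Tracked::operator delete[]

  static void* operator new[](std::size_t size)
  {
    return ::operator new[](size);         // Forward to the global array form
  }
  static void operator delete[](void* p) noexcept
  {
    ++arrayDeletes;                        // The "different behavior" lives here
    ::operator delete[](p);
  }
  ~Tracked() { }                           // Non-trivial destructor
};
int Tracked::arrayDeletes = 0;

void correctUserCode(std::size_t n)
{
  Tracked* p = new Tracked[n];             // Calls Tracked::operator new[]
  delete[] p;                              // Calls Tracked::operator delete[]
  // Writing "delete p;" here instead would be undefined behavior and
  // would bypass Tracked::operator delete[] entirely.
}
```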
[16.13] After p = new Fred[n], how does the compiler know there are n objects to be
destructed during delete[] p?
Long answer: The run-time system stores the number of objects, n, somewhere where it can be
retrieved if you only know the pointer, p. There are two popular techniques that do this. Both these
techniques are in use by commercial-grade compilers, both have tradeoffs, and neither is perfect.
These techniques are:
• Over-allocate the array and put n just to the left of the first Fred object.
• Use an associative array with p as the key and n as the value.
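The first technique can be imitated by hand. In this sketch, newArray(), arrayLength() and deleteArray() are hypothetical helpers (real compilers do this internally, with their own layout). They over-allocate by one header slot, store n just to the left of the first element, and recover n from nothing but the pointer. Over-aligned element types are deliberately not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// One max_align_t-sized header keeps the first element aligned for any
// ordinary (non-over-aligned) type, and has room to hold n.
constexpr std::size_t kHeader = alignof(std::max_align_t);
static_assert(sizeof(std::size_t) <= kHeader, "header too small to hold n");

template<class T>
T* newArray(std::size_t n)                       // Stand-in for new T[n]
{
  static_assert(alignof(T) <= kHeader, "over-aligned types not handled");
  char* raw = static_cast<char*>(::operator new(kHeader + n * sizeof(T)));
  std::memcpy(raw, &n, sizeof n);                // Put n just to the left...
  T* first = reinterpret_cast<T*>(raw + kHeader);  // ...of the first element
  std::size_t i = 0;
  try {
    for (; i < n; ++i)
      new (first + i) T();                       // Construct each element
  } catch (...) {
    while (i > 0)
      first[--i].~T();                           // Destroy the ones already built
    ::operator delete(raw);
    throw;
  }
  return first;
}

template<class T>
std::size_t arrayLength(const T* first)          // Recover n from the pointer alone
{
  std::size_t n;
  std::memcpy(&n, reinterpret_cast<const char*>(first) - kHeader, sizeof n);
  return n;
}

template<class T>
void deleteArray(T* first)                       // Stand-in for delete[] first
{
  if (first == nullptr) return;                  // Like delete[], accept NULL
  std::size_t n = arrayLength(first);
  while (n > 0)
    first[--n].~T();                             // Destruct in reverse order
  ::operator delete(reinterpret_cast<char*>(first) - kHeader);
}

struct Counted {                                 // Counts live objects
  static int live;
  Counted() { ++live; }
  ~Counted() { --live; }
};
int Counted::live = 0;
```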
[16.14] Is it legal (and moral) for a member function to say delete this?
As long as you're careful, it's OK for an object to commit suicide (delete this).
1. You must be absolutely 100% positive sure that this object was allocated via new (not by
new[], nor by placement new, nor a local object on the stack, nor a global, nor a member of
another object; but by plain ordinary new).
2. You must be absolutely 100% positive sure that your member function will be the last
member function invoked on this object.
3. You must be absolutely 100% positive sure that the rest of your member function (after the
delete this line) doesn't touch any piece of this object (including calling any other member
functions or touching any data members).
4. You must be absolutely 100% positive sure that no one even touches the this pointer itself
after the delete this line. In other words, you must not examine it, compare it with another
pointer, compare it with NULL, print it, cast it, do anything with it.
Naturally the usual caveats apply in cases where your this pointer is a pointer to a base class when
you don't have a virtual destructor.
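Here is a minimal sketch of an object that commits suicide while respecting the four rules, assuming a hypothetical Job class. The private constructor plus a static create() (the Named Constructor Idiom) guarantees plain new (rule 1); the private destructor blocks stack, global and member objects; finish() returns immediately after delete this (rules 3 and 4). Rule 2 still depends on the caller not using the pointer afterward.

```cpp
#include <cassert>

class Job {
public:
  static int live;                          // Number of Job objects alive
  static Job* create() { return new Job(); }  // Plain new only (rule 1)

  // Must be the last member function called on this object (rule 2).
  void finish()
  {
    // ... do the final work while *this is still valid ...
    delete this;
    // Nothing after this line touches a member or the this pointer
    // (rules 3 and 4), so simply return.
  }

private:
  Job() { ++live; }                         // Not public: no locals or globals
  ~Job() { --live; }                        // Not public: users can't "delete p"
};
int Job::live = 0;
```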
There are many ways to do this, depending on how flexible you want the array sizing to be. On one
extreme, if you know all the dimensions at compile-time, you can allocate multidimensional arrays
statically (as in C):
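void manipulateArray()
{
  const unsigned nrows = 10;  // Num rows is a compile-time constant
  const unsigned ncols = 20;  // Num columns is a compile-time constant
  Fred matrix[nrows][ncols];
  for (unsigned i = 0; i < nrows; ++i)
    for (unsigned j = 0; j < ncols; ++j) {
      // Access the (i,j) element via matrix[i][j]
    }
  // No need to explicitly delete the array
}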
More commonly, the size of the matrix isn't known until run-time but you know that it will be
rectangular. In this case you need to use the heap ("freestore"), but at least you are able to
allocate all the elements in one freestore chunk.
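void manipulateArray(unsigned nrows, unsigned ncols)
{
  Fred* matrix = new Fred[nrows * ncols];
  // Since we used a simple pointer above, we need to be VERY
  // careful to avoid skipping over the delete code.
  // That's why we catch all exceptions:
  try {
    // Access the (i,j) element via matrix[i*ncols + j]
    // ...
  }
  catch (...) {
    // Make sure to do the delete when an exception is thrown:
    delete[] matrix;
    throw;    // Re-throw the current exception
  }
  // Make sure to do the delete at the end of the function too:
  delete[] matrix;
}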
Finally at the other extreme, you may not even be guaranteed that the matrix is rectangular. For
example, if each row could have a different length, you'll need to allocate each row individually. In
the following function, ncols[i] is the number of columns in row number i, where i varies between 0
and nrows-1 inclusive.
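void manipulateArray(unsigned nrows, unsigned ncols[])
{
  Fred** matrix = new Fred*[nrows];   // One pointer per row
  // Set every row pointer to NULL in case an exception happens below:
  for (unsigned i = 0; i < nrows; ++i) matrix[i] = NULL;
  try {
    for (unsigned i = 0; i < nrows; ++i)
      matrix[i] = new Fred[ ncols[i] ];
    // Access the (i,j) element via matrix[i][j] ...
  }
  catch (...) {
    // Make sure to do the delete when an exception is thrown:
    // Note that some of these matrix[...] pointers might be
    // NULL, but that's okay since it's legal to delete NULL.
    for (unsigned i = nrows; i > 0; --i)
      delete[] matrix[i-1];
    delete[] matrix;
    throw;    // Re-throw the current exception
  }
  // Make sure to do the delete at the end of the function too:
  for (unsigned i = nrows; i > 0; --i) delete[] matrix[i-1];
  delete[] matrix;
}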
Note the funny use of matrix[i-1] in the deletion process: counting i down from nrows to 1 and
indexing with i-1 avoids the wrap-around an unsigned loop variable would suffer if it were
decremented below zero.
Finally, note that pointers and arrays are evil. It is normally much better to encapsulate your
pointers in a class that has a safe and simple interface. The following FAQ shows how to do this.
[16.16] But the previous FAQ's code is SOOOO tricky and error prone! Isn't there a
simpler way?
Yep.
The reason the code in the previous FAQ was so tricky and error prone was that it used pointers,
and we know that pointers and arrays are evil. The solution is to encapsulate your pointers in a
class that has a safe and simple interface. For example, we can define a Matrix class that handles a
rectangular matrix so our user code will be vastly simplified when compared to the rectangular
matrix code from the previous FAQ:
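void manipulateArray(unsigned nrows, unsigned ncols)
{
  Matrix matrix(nrows, ncols);   // Construct a Matrix called matrix
  // Access the (i,j) element via matrix(i,j)
  // No delete needed: ~Matrix() cleans up, even if an exception is thrown
}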
The main thing to notice is the lack of clean-up code. For example, there aren't any delete
statements in the above code, yet there will be no memory leaks, assuming only that the Matrix
destructor does its job correctly.
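class Matrix {
public:
  Matrix(unsigned nrows, unsigned ncols);
  // Throws a BadSize object if either size is zero
  class BadSize { };
  ~Matrix();
  // Access methods to get the (i,j) element; these throw a
  // BoundsViolation object if i or j is too big:
  Fred&       operator() (unsigned i, unsigned j);
  const Fred& operator() (unsigned i, unsigned j) const;
  class BoundsViolation { };
private:
  Matrix(const Matrix&);              // Copying not implemented here,
  Matrix& operator= (const Matrix&);  // so it is disallowed
  Fred* data_;
  unsigned nrows_, ncols_;
};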
Matrix::Matrix(unsigned nrows, unsigned ncols)
  : data_  (NULL),
    nrows_ (nrows),
    ncols_ (ncols)
{
  if (nrows == 0 || ncols == 0)
    throw BadSize();    // Check before allocating, so nothing can leak
  data_ = new Fred[nrows * ncols];
}
Matrix::~Matrix()
{
delete[] data_;
}
Note that the above Matrix class accomplishes two things: it moves some tricky memory
management code from the user code (e.g., main()) to the class, and it reduces the overall bulk of the
program. The latter point is important. For example, assuming Matrix is even mildly reusable,
moving complexity from the users [plural] of Matrix into Matrix itself [singular] is equivalent to
moving complexity from the many to the few. Anyone who's seen Star Trek 2 knows that the good
of the many outweighs the good of the few... or the one.
[16.17] But the above Matrix class is specific to Fred! Isn't there a way to make it
generic?
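Yep; use templates. Here is a class template Matrix<T> that works for any element type T:
template<class T>
class Matrix {
public:
  Matrix(unsigned nrows, unsigned ncols);
  // Throws a BadSize object if either size is zero
  class BadSize { };
  ~Matrix();
  // Access methods to get the (i,j) element; these throw a
  // BoundsViolation object if i or j is too big:
  T&       operator() (unsigned i, unsigned j);
  const T& operator() (unsigned i, unsigned j) const;
  class BoundsViolation { };
private:
  Matrix(const Matrix&);              // Copying not implemented here,
  Matrix& operator= (const Matrix&);  // so it is disallowed
  T* data_;
  unsigned nrows_, ncols_;
};
template<class T>
inline T& Matrix<T>::operator() (unsigned row, unsigned col)
{
  if (row >= nrows_ || col >= ncols_) throw BoundsViolation();
  return data_[row*ncols_ + col];
}
template<class T>
inline const T& Matrix<T>::operator() (unsigned row, unsigned col) const
{
  if (row >= nrows_ || col >= ncols_) throw BoundsViolation();
  return data_[row*ncols_ + col];
}
template<class T>
inline Matrix<T>::Matrix(unsigned nrows, unsigned ncols)
  : data_  (NULL)
  , nrows_ (nrows)
  , ncols_ (ncols)
{
  if (nrows == 0 || ncols == 0)
    throw BadSize();    // Check before allocating, so nothing can leak
  data_ = new T[nrows * ncols];
}
template<class T>
inline Matrix<T>::~Matrix()
{
  delete[] data_;
}
Now it's easy to use Matrix<T> for things other than Fred. For example, the following uses a Matrix
of std::string (where std::string is the standard string class):
#include <string>
void manipulateArray(unsigned nrows, unsigned ncols)
{
  Matrix<std::string> matrix(nrows, ncols);  // Construct a Matrix<std::string>
  for (unsigned i = 0; i < nrows; ++i)
    for (unsigned j = 0; j < ncols; ++j)
      matrix(i,j) = "empty";                 // Access the (i,j) element
}
You can thus get an entire family of classes from a template. For example, Matrix<Fred>,
Matrix<std::string>, Matrix< Matrix<std::string> >, etc.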
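The following version of Matrix<T> uses a vector<vector<T> > instead of a raw T* (note the space
between the two > symbols, which pre-C++11 compilers require).
#include <vector>
template<class T>
class Matrix {
public:
  Matrix(unsigned nrows, unsigned ncols);
  // Throws a BadSize object if either size is zero
  class BadSize { };
  // No destructor, copy constructor, or assignment operator is needed,
  // since std::vector manages its own memory.
  // Access methods; these throw a BoundsViolation object if row or col is too big:
  T&       operator() (unsigned row, unsigned col);
  const T& operator() (unsigned row, unsigned col) const;
  class BoundsViolation { };
private:
  std::vector<std::vector<T> > data_;
};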
template<class T>
inline T& Matrix<T>::operator() (unsigned row, unsigned col)
{
  if (row >= data_.size() || col >= data_[row].size()) throw BoundsViolation();
return data_[row][col];
}
template<class T>
inline const T& Matrix<T>::operator() (unsigned row, unsigned col) const
{
  if (row >= data_.size() || col >= data_[row].size()) throw BoundsViolation();
return data_[row][col];
}
template<class T>
Matrix<T>::Matrix(unsigned nrows, unsigned ncols)
: data_ (nrows)
{
if (nrows == 0 || ncols == 0)
throw BadSize();
for (unsigned i = 0; i < nrows; ++i)
data_[i].resize(ncols);
}
[16.19] Does C++ have arrays whose length can be specified at run-time?
Yes, in the sense that the standard library has a std::vector template that provides this behavior.
No, in the sense that built-in array types need to have their length specified at compile time.
Yes, in the sense that even built-in array types can specify the first index bounds at run-time. E.g.,
comparing with the previous FAQ, if you only need the first array dimension to vary then you can
just ask new for an array of arrays, rather than an array of pointers to arrays:
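For example, assuming the number of columns is a compile-time constant (100 here) and Fred is a stand-in element class, the number of rows can still be chosen at run-time, with one allocation and one delete[]:

```cpp
#include <cassert>

class Fred {                                  // Hypothetical stand-in element type
public:
  Fred() : value_(0) { }
  int value_;
};

const unsigned ncols = 100;                   // Fixed at compile time

int manipulateArray(unsigned nrows)           // nrows chosen at run time (must be > 0 here)
{
  // A single allocation: nrows arrays of ncols Freds each.
  Fred (*matrix)[ncols] = new Fred[nrows][ncols];
  matrix[nrows - 1][ncols - 1].value_ = 7;    // Access the (i,j) element
  int result = matrix[nrows - 1][ncols - 1].value_;
  delete[] matrix;                            // One delete[] frees everything
  return result;
}
```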
You can't do this if you need anything other than the first dimension of the array to change at run-
time.
But please, don't use arrays unless you have to. Arrays are evil. Use some object of some class if
you can. Use arrays only when you have to.
[16.20] How can I force objects of my class to always be created via new rather than as
locals or global/static objects?
As usual with the Named Constructor Idiom, the constructors are all private or protected, and there
are one or more public static create() methods (the so-called "named constructors"), one per
constructor. In this case the create() methods allocate the objects via new. Since the constructors
themselves are not public, there is no other way to create objects of the class.
class Fred {
public:
// The create() methods are the "named constructors":
static Fred* create() { return new Fred(); }
static Fred* create(int i) { return new Fred(i); }
static Fred* create(const Fred& fred) { return new Fred(fred); }
// ...
private:
// The constructors themselves are private or protected:
Fred();
Fred(int i);
Fred(const Fred& fred);
// ...
};
int main()
{
Fred* p = Fred::create(5);
// ...
delete p;
}
Make sure your constructors are in the protected section if you expect Fred to have derived classes.
Note also that you can make another class Wilma a friend of Fred if you want to allow a Wilma to
have a member object of class Fred, but of course this is a softening of the original goal, namely to
force Fred objects to be allocated via new.
If all you want is the ability to pass around a bunch of pointers to the same object, with the feature
that the object will automagically get deleted when the last pointer to it disappears, you can use
something like the following "smart pointer" class:
// Fred.h
class FredPtr;
class Fred {
public:
Fred() : count_(0) /*...*/ { } // All ctors set count_ to 0 !
// ...
private:
friend class FredPtr; // A friend class
unsigned count_;
// count_ must be initialized to 0 by all constructors
// count_ is the number of FredPtr objects that point at this
};
class FredPtr {
public:
Fred* operator-> () { return p_; }
Fred& operator* () { return *p_; }
FredPtr(Fred* p) : p_(p) { ++p_->count_; } // p must not be NULL
~FredPtr() { if (--p_->count_ == 0) delete p_; }
FredPtr(const FredPtr& p) : p_(p.p_) { ++p_->count_; }
FredPtr& operator= (const FredPtr& p)
{ // DO NOT CHANGE THE ORDER OF THESE STATEMENTS!
// (This order properly handles self-assignment)
++p.p_->count_;
if (--p_->count_ == 0) delete p_;
p_ = p.p_;
return *this;
}
private:
Fred* p_; // p_ is never NULL
};
Note that you can soften the "never NULL" rule above with a little more checking in the constructor,
copy constructor, assignment operator, and destructor. If you do that, you might as well put a
p_ != NULL check into the "*" and "->" operators (at least as an assert()). I would recommend against
an operator Fred*() method, since that would let people accidentally get at the Fred*.
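A sketch of that softened version, assuming the same Fred/FredPtr names plus a default constructor that yields a NULL FredPtr (an addition not in the code above; Fred::live and answer() are hypothetical). Every place that touches count_ first checks for NULL, and operator-> and operator* assert instead of silently dereferencing NULL.

```cpp
#include <cassert>
#include <cstddef>

class FredPtr;

class Fred {
public:
  static int live;                     // Lets the sketch observe deletion
  Fred() : count_(0) { ++live; }       // All ctors set count_ to 0
  ~Fred() { --live; }
  int answer() const { return 42; }    // Hypothetical member function
private:
  friend class FredPtr;
  unsigned count_;                     // Number of FredPtrs pointing here
};
int Fred::live = 0;

class FredPtr {
public:
  FredPtr() : p_(NULL) { }                          // p_ may now be NULL
  FredPtr(Fred* p) : p_(p) { if (p_) ++p_->count_; }
  FredPtr(const FredPtr& p) : p_(p.p_) { if (p_) ++p_->count_; }
  ~FredPtr() { release(); }
  FredPtr& operator= (const FredPtr& p)
  {
    // Same order as before, so self-assignment is still handled:
    if (p.p_) ++p.p_->count_;
    release();
    p_ = p.p_;
    return *this;
  }
  Fred* operator-> () { assert(p_ != NULL); return p_; }
  Fred& operator* ()  { assert(p_ != NULL); return *p_; }
private:
  void release() { if (p_ && --p_->count_ == 0) delete p_; }
  Fred* p_;
};
```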
One of the implicit constraints on FredPtr is that it must only point to Fred objects which have been
allocated via new. If you want to be really safe, you can enforce this constraint by making all of
Fred's constructors private, and for each constructor have a public (static) create() method which
allocates the Fred object via new and returns a FredPtr (not a Fred*). That way the only way
anyone could create a Fred object would be to get a FredPtr ("Fred* p = new Fred()" would be
replaced by "FredPtr p = Fred::create()"). Thus no one could accidentally subvert the reference
counted mechanism.
For example, if Fred had a Fred::Fred() and a Fred::Fred(int i, int j), the changes to class Fred
would be:
class Fred {
public:
static FredPtr create(); // Defined below class FredPtr {...}
static FredPtr create(int i, int j); // Defined below class FredPtr {...}
// ...
private:
Fred();
Fred(int i, int j);
// ...
};
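The create() functions must be defined after class FredPtr is complete, because they return a FredPtr by value. A minimal sketch of that ordering, assuming a trimmed-down FredPtr like the one above (the members i_, j_ and sum() are hypothetical):

```cpp
#include <cassert>

class FredPtr;

class Fred {
public:
  static FredPtr create();                    // Defined below class FredPtr
  static FredPtr create(int i, int j);       // Defined below class FredPtr
  int sum() const { return i_ + j_; }
private:
  Fred() : count_(0), i_(0), j_(0) { }
  Fred(int i, int j) : count_(0), i_(i), j_(j) { }
  friend class FredPtr;
  unsigned count_;
  int i_, j_;
};

class FredPtr {
public:
  FredPtr(Fred* p) : p_(p) { ++p_->count_; }   // p must not be NULL
  FredPtr(const FredPtr& p) : p_(p.p_) { ++p_->count_; }
  ~FredPtr() { if (--p_->count_ == 0) delete p_; }
  FredPtr& operator= (const FredPtr& p)
  {
    ++p.p_->count_;                           // Handles self-assignment
    if (--p_->count_ == 0) delete p_;
    p_ = p.p_;
    return *this;
  }
  Fred* operator-> () { return p_; }
  Fred& operator* ()  { return *p_; }
private:
  Fred* p_;
};

// Now that FredPtr is complete, the named constructors can return one.
// As members of Fred, they may call Fred's private constructors.
inline FredPtr Fred::create()             { return new Fred(); }
inline FredPtr Fred::create(int i, int j) { return new Fred(i, j); }
```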
The end result is that you now have a way to use simple reference counting to provide "pointer
semantics" for a given object. Users of your Fred class explicitly use FredPtr objects, which act more
or less like Fred* pointers. The benefit is that users can make as many copies of their FredPtr
"smart pointer" objects as they like, and the pointed-to Fred object will automagically get deleted when the last
such FredPtr object vanishes.
If you'd rather give your users "reference semantics" rather than "pointer semantics," you can use
reference counting to provide "copy on write".
Reference counting can be done with either pointer semantics or reference semantics. The previous
FAQ shows how to do reference counting with pointer semantics. This FAQ shows how to do
reference counting with reference semantics.
The basic idea is to allow users to think they're copying your Fred objects, but in reality the
underlying implementation doesn't actually do any copying unless and until some user actually tries
to modify the underlying Fred object.
Class Fred::Data houses all the data that would normally go into the Fred class. Fred::Data also has
an extra data member, count_, to manage the reference counting. Class Fred ends up being a
"smart reference" that (internally) points to a Fred::Data.
class Fred {
public:
// ...
private:
class Data {
public:
Data();
Data(int i, int j);
Data(const Data& d);
unsigned count_;
// count_ is the number of Fred objects that point at this
// count_ must be initialized to 1 by all constructors
// (it starts as 1 since it is pointed to by the Fred object that created it)
};
Data* data_;
};
Fred::Fred(const Fred& f)
: data_(f.data_)
{
++ data_->count_;
}
Fred::~Fred()
{
if (--data_->count_ == 0) delete data_;
}
void Fred::sampleMutatorMethod()
{
// This method might need to change things in *data_
// Thus it first checks if this is the only pointer to *data_
if (data_->count_ > 1) {
Data* d = new Data(*data_); // Invoke Fred::Data's copy ctor
-- data_->count_;
data_ = d;
}
  assert(data_->count_ == 1);
  // Now the method proceeds to access "data_->..." as normal
}
If it is fairly common to call Fred's default constructor, you can avoid all those new calls by sharing
a common Fred::Data object for all Freds that are constructed via Fred::Fred(). To avoid static
initialization order problems, this shared Fred::Data object is created "on first use" inside a
function. Here are the changes that would be made to the above code (note that the shared
Fred::Data object's destructor is never invoked; if that is a problem, either hope you don't have any
static initialization order problems, or drop back to the approach described above):
class Fred {
public:
// ...
private:
// ...
static Data* defaultData();
};
Fred::Fred()
: data_(defaultData())
{
++ data_->count_;
}
Fred::Data* Fred::defaultData()
{
static Data* p = NULL;
if (p == NULL) {
p = new Data();
++ p->count_; // Make sure it never goes to zero
}
return p;
}
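Putting the pieces together, here is a compact, runnable sketch of copy-on-write with reference semantics, including the shared default Fred::Data. It assumes a single int payload with hypothetical value()/setValue()/sharesWith() members standing in for the sample inspector and mutator. As described above, the shared default Data is intentionally never deleted.

```cpp
#include <cassert>
#include <cstddef>

class Fred {
public:
  Fred() : data_(defaultData()) { ++data_->count_; }   // Shares the default Data
  explicit Fred(int v) : data_(new Data(v)) { }
  Fred(const Fred& f) : data_(f.data_) { ++data_->count_; }
  Fred& operator= (const Fred& f)
  {
    ++f.data_->count_;                 // Increment first: handles self-assignment
    if (--data_->count_ == 0) delete data_;
    data_ = f.data_;
    return *this;
  }
  ~Fred() { if (--data_->count_ == 0) delete data_; }

  int value() const { return data_->value_; }    // Inspector: never copies
  void setValue(int v)                           // Mutator: copies only if shared
  {
    if (data_->count_ > 1) {
      Data* d = new Data(*data_);
      --data_->count_;
      data_ = d;
    }
    assert(data_->count_ == 1);
    data_->value_ = v;
  }
  bool sharesWith(const Fred& f) const { return data_ == f.data_; }

private:
  struct Data {
    explicit Data(int v = 0) : count_(1), value_(v) { }
    Data(const Data& d) : count_(1), value_(d.value_) { }  // Don't copy count_
    unsigned count_;
    int value_;
  };
  static Data* defaultData()
  {
    static Data* p = NULL;
    if (p == NULL) {
      p = new Data();
      ++p->count_;                     // Make sure it never goes to zero
    }
    return p;
  }
  Data* data_;
};
```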
Note: You can also provide reference counting for a hierarchy of classes if your Fred class would
normally have been a base class.
The previous FAQ presented a reference counting scheme that provided users with reference
semantics, but did so for a single class rather than for a hierarchy of classes. This FAQ extends the
previous technique to allow for a hierarchy of classes. The basic difference is that Fred::Data is now
the root of a hierarchy of classes, which will probably cause it to have some virtual functions. Note that
class Fred itself will still not have any virtual functions.
The Virtual Constructor Idiom is used to make copies of the Fred::Data objects. To select which
derived class to create, the sample code below uses the Named Constructor Idiom, but other
techniques are possible (a switch statement in the constructor, etc). The sample code assumes two
derived classes: Der1 and Der2. Methods in the derived classes are unaware of the reference
counting.
class Fred {
public:
// ...
private:
class Data {
public:
Data() : count_(1) { }
Data(const Data& d) : count_(1) { } // Do NOT copy the 'count_' member!
Data& operator= (const Data&) { return *this; } // Do NOT copy the 'count_' member!
virtual ~Data() { assert(count_ == 0); } // A virtual destructor
virtual Data* clone() const = 0; // A virtual constructor
virtual void sampleInspectorMethod() const = 0; // A pure virtual function
virtual void sampleMutatorMethod() = 0;
private:
unsigned count_; // count_ doesn't need to be protected
friend class Fred; // Allow Fred to access count_
};
class Der1 : public Data {
public:
Der1(const std::string& s, int i);
virtual void sampleInspectorMethod() const;
virtual void sampleMutatorMethod();
virtual Data* clone() const;
// ...
};
Fred(Data* data);
// Creates a Fred smart-reference that owns *data
// It is private to force users to use a createXXX() method
// Requirement: data must not be NULL
Fred::Fred(const Fred& f)
: data_(f.data_)
{
++ data_->count_;
}
Fred::~Fred()
{
if (--data_->count_ == 0) delete data_;
}
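void Fred::sampleInspectorMethod() const
{
  // This method promises ("const") not to change anything in *data_
  // Therefore we simply "pass the method through" to *data_:
  data_->sampleInspectorMethod();
}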
void Fred::sampleMutatorMethod()
{
// This method might need to change things in *data_
// Thus it first checks if this is the only pointer to *data_
if (data_->count_ > 1) {
Data* d = data_->clone(); // The Virtual Constructor Idiom
-- data_->count_;
data_ = d;
}
  assert(data_->count_ == 1);
  // Now we "pass the method through" to *data_:
  data_->sampleMutatorMethod();
}
Naturally the constructors and sampleXXX methods for Fred::Der1 and Fred::Der2 will need to be
implemented in whatever way is appropriate.
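Here is a compact, runnable sketch of the hierarchy version, assuming hypothetical payloads (Der1 holds a std::string, Der2 an int) and hypothetical describe()/append() members standing in for sampleInspectorMethod() and sampleMutatorMethod(). The mutator uses clone(), the Virtual Constructor Idiom, so the copy keeps the right derived type; Fred itself has no virtual functions.

```cpp
#include <cassert>
#include <string>

class Fred {
public:
  // Named constructors select the derived class:
  static Fred createDer1(const std::string& s) { return Fred(new Der1(s)); }
  static Fred createDer2(int i)                { return Fred(new Der2(i)); }
  Fred(const Fred& f) : data_(f.data_) { ++data_->count_; }
  Fred& operator= (const Fred& f)
  {
    ++f.data_->count_;                 // Increment first: handles self-assignment
    if (--data_->count_ == 0) delete data_;
    data_ = f.data_;
    return *this;
  }
  ~Fred() { if (--data_->count_ == 0) delete data_; }

  std::string describe() const { return data_->describe(); }  // Pass-through
  void append(char c)                                          // Mutator
  {
    if (data_->count_ > 1) {
      Data* d = data_->clone();        // The Virtual Constructor Idiom
      --data_->count_;
      data_ = d;
    }
    data_->append(c);
  }
  bool sharesWith(const Fred& f) const { return data_ == f.data_; }

private:
  class Data {
  public:
    Data() : count_(1) { }
    Data(const Data&) : count_(1) { }  // Do NOT copy count_
    virtual ~Data() { }
    virtual Data* clone() const = 0;
    virtual std::string describe() const = 0;
    virtual void append(char c) = 0;
  private:
    unsigned count_;
    friend class Fred;                 // Allow Fred to access count_
  };
  class Der1 : public Data {
  public:
    explicit Der1(const std::string& s) : s_(s) { }
    Data* clone() const override { return new Der1(*this); }
    std::string describe() const override { return "Der1:" + s_; }
    void append(char c) override { s_ += c; }
  private:
    std::string s_;
  };
  class Der2 : public Data {
  public:
    explicit Der2(int i) : i_(i) { }
    Data* clone() const override { return new Der2(*this); }
    std::string describe() const override
    { return "Der2:" + std::string(static_cast<std::string::size_type>(i_), '*'); }
    void append(char) override { ++i_; }
  private:
    int i_;
  };
  explicit Fred(Data* data) : data_(data) { }   // Requirement: data != NULL
  Data* data_;
};
```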
[16.24] Can you absolutely prevent people from subverting the reference counting
mechanism, and if so, should you?
There are two basic approaches to subverting the reference counting mechanism:
1. The scheme could be subverted if someone got a Fred* (rather than being forced to use a
FredPtr). Someone could get a Fred* if class FredPtr has an operator*() that returns a
Fred&, e.g., FredPtr p = Fred::create(); Fred* p2 = &*p; Yes, it's bizarre and unexpected, but it
could happen. This hole could be closed in two ways: overload Fred::operator&() so it
returns a FredPtr, or change the return type of FredPtr::operator*() so it returns a FredRef
(FredRef would be a class that simulates a reference; it would need to have all the methods
that Fred has, and it would need to forward all those method calls to the underlying Fred
object; there might be a performance penalty for this second choice depending on how good
the compiler is at inlining methods). Another way to fix this is to eliminate
FredPtr::operator*() — and lose the corresponding ability to get and use a Fred&. But even
if you did all this, someone could still generate a Fred* by explicitly calling operator->():
FredPtr p = Fred::create(); Fred* p2 = p.operator->();.
2. The scheme could be subverted if someone had a leak and/or dangling pointer to a FredPtr.
Basically what we're saying here is that Fred is now safe, but we somehow want to prevent
people from doing stupid things with FredPtr objects. (And if we could solve that via
FredPtrPtr objects, we'd have the same problem again with them). One hole here is if
someone created a FredPtr using new, then allowed the FredPtr to leak (worst case this is a
leak, which is bad but is usually a little better than a dangling pointer). This hole could be
plugged by declaring FredPtr::operator new() as private, thus preventing someone from
saying new FredPtr(). Another hole here is if someone creates a local FredPtr object, then
takes the address of that FredPtr and passes around the FredPtr*. If that FredPtr* lived
longer than the FredPtr, you could have a dangling pointer — shudder. This hole could be
plugged by preventing people from taking the address of a FredPtr (by overloading
FredPtr::operator&() as private), with the corresponding loss of functionality. But even if you
did all that, they could still create a FredPtr& which is almost as dangerous as a FredPtr*,
simply by doing this: FredPtr p; ... FredPtr& q = p; (or by passing the FredPtr& to someone
else).
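The "new FredPtr() and then leak it" hole can be closed at compile time. A minimal sketch, assuming a simplified, hypothetical FredPtr whose int member stands in for the real contents; it uses C++11's = delete, which has the same effect as the private operator new described above. The CanNew detector is a hypothetical helper that checks whether new T() compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical, simplified handle; only the operator new part matters here.
class FredPtr {
public:
  FredPtr() : value_(0) { }
  int value_;
  // Plugs the "new FredPtr() then leak it" hole:
  static void* operator new(std::size_t) = delete;
  static void* operator new[](std::size_t) = delete;
};

// Detects whether "new T()" compiles for a given T.
template<class T, class = void>
struct CanNew : std::false_type { };
template<class T>
struct CanNew<T, decltype(void(new T()))> : std::true_type { };

static_assert(!CanNew<FredPtr>::value, "new FredPtr() should not compile");
static_assert(CanNew<int>::value, "sanity check");

int useLocal()
{
  FredPtr p;                         // Locals still work
  FredPtr q = p;                     // So do copies
  // FredPtr* leak = new FredPtr();  // Compile-time error
  return q.value_;
}
```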
And even if we closed all those holes, C++ has those wonderful pieces of syntax called pointer
casts. Using a pointer cast or two, a sufficiently motivated programmer can normally create a hole
that's big enough to drive a proverbial truck through. (By the way, pointer casts are evil.)
So the lessons here seem to be: (a) you can't prevent espionage no matter how hard you try, and
(b) you can easily prevent mistakes.
So I recommend settling for the "low hanging fruit": use the easy-to-build and easy-to-use
mechanisms that prevent mistakes, and don't bother trying to prevent espionage. You won't
succeed, and even if you do, it'll (probably) cost you more than it's worth.
So if we can't use the C++ language itself to prevent espionage, are there other ways to do it? Yes.
I personally use old fashioned code reviews for that. And since the espionage techniques usually
involve some bizarre syntax and/or use of pointer-casts and unions, you can use a tool to point out
most of the "hot spots."
Yes.
Compared with the "smart pointer" techniques (see [16.21]), the two kinds of garbage collector
techniques (see [16.26]) are:
• less portable
• usually more efficient (especially when the average object size is small or in multithreaded
environments)
• able to handle "cycles" in the data (reference counting techniques normally "leak" if the data
structures can form a cycle)
• sometimes leak other objects (since the garbage collectors are necessarily conservative,
they sometimes see a random bit pattern that appears to be a pointer into an allocation,
especially if the allocation is large; this can allow the allocation to leak)
• work better with existing libraries (since smart pointers need to be used explicitly, they may
be hard to integrate with existing libraries)
[16.26] What are the two kinds of garbage collectors for C++?
[Recently added a URL for Bartlett's collector thanks to Abhishek (in 4/01) and added a URL for
Attardi and Flagella's CMM thanks to Markus Laker (in 8/01).]
1. Conservative garbage collectors. These know little or nothing about the layout of the stack or
of C++ objects, and simply look for bit patterns that appear to be pointers. In practice they
seem to work with both C and C++ code, particularly when the average object size is small.
Here are some examples, in alphabetical order:
o Boehm-Demers-Weiser collector
o Geodesic Systems collector
2. Hybrid garbage collectors. These usually scan the stack conservatively, but require the
programmer to supply layout information for heap objects. This requires more work on the
programmer's part, but may result in improved performance. Here are some examples, in
alphabetical order:
o Attardi and Flagella's CMM
o Bartlett's mostly copying collector
Since garbage collectors for C++ are normally conservative, they can sometimes leak if a bit
pattern "looks like" it might be a pointer to an otherwise unused block. Also they sometimes get
confused when pointers to a block actually point outside the block's extent (which is illegal, but
some programmers simply must push the envelope; sigh) and (rarely) when a pointer is hidden by
a compiler optimization. In practice these problems are not usually serious; however, providing the
collector with hints about the layout of the objects can sometimes ameliorate these issues.
[16.27] Where can I get more info on garbage collectors for C++?
[17.1] What are some ways try / catch / throw can improve software quality?
The commonly used alternative to try / catch / throw is to return a return code (sometimes called
an error code) that the caller explicitly tests via some conditional statement such as if. For example,
printf(), scanf() and malloc() work this way: the caller is supposed to test the return value to see if
the function succeeded.
Although the return code technique is sometimes the most appropriate error handling technique,
there are some nasty side effects to adding unnecessary if statements:
• Degrade quality: It is well known that conditional statements are approximately ten times
more likely to contain errors than any other kind of statement. So all other things being
equal, if you can eliminate conditionals / conditional statements from your code, you will
likely have more robust code.
• Slow down time-to-market: Since conditional statements are branch points which are
related to the number of test cases that are needed for white-box testing, unnecessary
conditional statements increase the amount of time that needs to be devoted to testing.
Basically if you don't exercise every branch point, there will be instructions in your code that
will never have been executed under test conditions until they are seen by your
users/customers. That's bad.
• Increase development cost: Bug finding, bug fixing, and testing are all increased by
unnecessary control flow complexity.
So compared to error reporting via return-codes and if, using try / catch / throw is likely to result in
code that has fewer bugs, is less expensive to develop, and has faster time-to-market. Of course if
your organization doesn't have any experiential knowledge of try / catch / throw, you might want to
use it on a toy project first just to make sure you know what you're doing — you should always get
used to a weapon on the firing range before you bring it to the front lines of a shooting war.
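To make the comparison concrete, here is a sketch with two hypothetical versions of the same digit-summing code (all function names are illustrative). The error still has to be detected once, but in the exception version the caller needs no if to propagate it, whereas the return-code version must test and forward the failure at every level.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Return-code style: every caller must remember to test the result.
bool parseDigitRC(char c, int& out)
{
  if (c < '0' || c > '9') return false;   // Error reported via return value
  out = c - '0';
  return true;
}

bool sumDigitsRC(const std::string& s, int& total)
{
  total = 0;
  for (std::string::size_type i = 0; i < s.size(); ++i) {
    int d;
    if (!parseDigitRC(s[i], d)) return false;   // Propagate by hand
    total += d;
  }
  return true;
}

// Exception style: errors propagate automatically to whoever catches them.
int parseDigit(char c)
{
  if (c < '0' || c > '9') throw std::invalid_argument("not a digit");
  return c - '0';
}

int sumDigits(const std::string& s)
{
  int total = 0;
  for (std::string::size_type i = 0; i < s.size(); ++i)
    total += parseDigit(s[i]);                  // No if needed here
  return total;
}
```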
Throw an exception.
Constructors don't have a return type, so it's not possible to use return codes. The best way to
signal constructor failure is therefore to throw an exception.
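A minimal sketch, assuming a hypothetical Temperature class whose invariant is that the value is never below absolute zero. The constructor throws instead of returning a code, so a half-built object never escapes to the caller.

```cpp
#include <cassert>
#include <stdexcept>

class Temperature {
public:
  explicit Temperature(double kelvin) : kelvin_(kelvin)
  {
    if (kelvin < 0.0)
      throw std::domain_error("below absolute zero");  // Signal constructor failure
  }
  double kelvin() const { return kelvin_; }
private:
  double kelvin_;
};

bool tryMake(double k)
{
  try {
    Temperature t(k);       // If this throws, t never comes into existence
    return t.kelvin() == k;
  } catch (const std::domain_error&) {
    return false;
  }
}
```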
If you don't have or won't use exceptions, here's a work-around. If a constructor fails, the
constructor can put the object into a "zombie" state. Do this by setting an internal status bit so the
object acts sort of like it's dead even though it is technically still alive. Then add a query
("inspector") member function to check this "zombie" bit so users of your class can find out if their
object is truly alive, or if it's a zombie (i.e., a "living dead" object). Also you'll probably want to
have your other member functions check this zombie bit, and, if the object isn't really alive, do a
no-op (or perhaps something more obnoxious such as abort()). This is really ugly, but it's the best
you can do if you can't (or don't want to) use exceptions.
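For comparison, here is a sketch of the zombie work-around, assuming a hypothetical Connection class whose constructor "fails" when given an empty host name (isAlive(), send() and sent() are illustrative names). The failure is recorded in a status bit instead of being thrown, and send() quietly does a no-op on a zombie.

```cpp
#include <cassert>
#include <string>

class Connection {
public:
  explicit Connection(const std::string& host)
    : alive_(!host.empty()), sent_(0)      // Empty host: construction "fails"
  { }
  bool isAlive() const { return alive_; }  // Inspector for the zombie bit
  void send(int bytes)
  {
    if (!alive_) return;                   // Zombie: do a no-op
    sent_ += bytes;
  }
  int sent() const { return sent_; }
private:
  bool alive_;                             // The "zombie" status bit
  int sent_;
};
```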
Write a message to a log-file. Or call Aunt Tilda. But do not throw an exception!
The C++ rule is that you must never throw an exception from a destructor that is being called
during the "stack unwinding" process of another exception. For example, if someone says
throw Foo(), the stack will be unwound so all the stack frames between the throw Foo() and the
} catch (Foo e) { will get popped. This is called stack unwinding.
During stack unwinding, all the local objects in all those stack frames are destructed. If one of
those destructors throws an exception (say it throws a Bar object), the C++ runtime system is in a
no-win situation: should it ignore the Bar and end up in the } catch (Foo e) { where it was originally
headed? Should it ignore the Foo and look for a } catch (Bar e) { handler? There is no good answer:
either choice loses information.
So the C++ language guarantees that it will call terminate() at this point, and terminate() kills the
process. Bang you're dead.
The easy way to prevent this is never throw an exception from a destructor. But if you really want
to be clever, you can say never throw an exception from a destructor while processing another
exception. But in this second case, you're in a difficult situation: the destructor itself needs code to
handle both throwing an exception and doing "something else", and the caller has no guarantees as
to what might happen when the destructor detects an error (it might throw an exception, it might
do "something else"). So the whole solution is harder to write. So the easy thing to do is always do
"something else". That is, never throw an exception from a destructor.
Of course the word never should be "in quotes" since there is always some situation somewhere
where the rule won't hold. But certainly at least 99% of the time this is a good rule of thumb.
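When the clean-up itself can fail, the destructor can catch everything and do "something else". A sketch, assuming a hypothetical LogFile whose private close() may throw; a counter stands in for writing a message to a log-file. (Since C++11, destructors are implicitly noexcept, so an exception escaping one calls terminate() even when no other exception is active.)

```cpp
#include <cassert>
#include <stdexcept>

class LogFile {
public:
  static int closeFailures;          // Stand-in for "write a message to a log-file"
  explicit LogFile(bool closeWillFail) : fail_(closeWillFail) { }
  ~LogFile()
  {
    try {
      close();                       // Might throw...
    } catch (...) {
      ++closeFailures;               // ...but nothing escapes the destructor
    }
  }
private:
  void close()
  {
    if (fail_) throw std::runtime_error("close failed");
  }
  bool fail_;
};
int LogFile::closeFailures = 0;

// Safe even while another exception is unwinding the stack.
bool unwindSafely()
{
  try {
    LogFile f(true);
    throw std::logic_error("Foo");   // Stack unwinding runs ~LogFile()
  } catch (const std::logic_error&) {
    return true;                     // Reached: terminate() was not called
  }
  return false;
}
```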
Every data member inside your object should clean up its own mess.
If a constructor throws an exception, the object's destructor is not run. If your object has already
done something that needs to be undone (such as allocating some memory, opening a file, or
locking a semaphore), this "stuff that needs to be undone" must be remembered by a data member
inside the object.
For example, rather than allocating memory into a raw Fred* data member, put the allocated
memory into a "smart pointer" member object, and the destructor of this smart pointer will delete
the Fred object when the smart pointer dies. The standard class auto_ptr is an example of such a
"smart pointer" class. You can also write your own reference counting smart pointer. You can also
use smart pointers to "point" to disk records or objects on other machines.
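A sketch of the idea using std::unique_ptr (the C++11 successor to auto_ptr, which was deprecated in C++11 and removed in C++17), with a hypothetical Widget whose constructor throws after it has already allocated a Fred. Because the Fred is owned by a fully constructed data member, it is deleted even though ~Widget() never runs.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

class Fred {
public:
  static int live;                   // Number of Fred objects alive
  Fred() { ++live; }
  ~Fred() { --live; }
};
int Fred::live = 0;

class Widget {
public:
  explicit Widget(bool failLater)
    : fred_(new Fred())              // The "stuff to undo" is held by a member
  {
    if (failLater)
      throw std::runtime_error("Widget construction failed");
    // ~Widget() will not run after this throw, but fred_'s destructor
    // will, because fred_ was already fully constructed.
  }
private:
  std::unique_ptr<Fred> fred_;
};
```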
[17.5] How do I change the string-length of an array of char to prevent memory leaks
even if/when someone throws an exception?
If what you really want to do is work with strings, don't use an array of char in the first place, since
arrays are evil. Instead use an object of some string-like class.
For example, suppose you want to get a copy of a string, fiddle with the copy, then append another
string to the end of the fiddled copy. The array-of-char approach would look something like this:
} catch (...) {
delete[] copy; // Prevent memory leaks if we got an exception
throw; // Re-throw the current exception
}
Using char*s like this is tedious and error prone. Why not just use an object of some string class?
Your compiler probably supplies a string-like class, and it's probably just as fast and certainly it's a
lot simpler and safer than the char* code that you would have to write yourself. For example, if
you're using the std::string class from the standardization committee, your code might look
something like this:
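A sketch of that std::string version, assuming the same shape as the char* example (copy s1, fiddle with the copy, append s2); the function name userCode is illustrative, and the final return exists only so the result can be checked:

```cpp
#include <cassert>
#include <string>

std::string userCode(const std::string& s1, const std::string& s2)
{
  std::string copy = s1;   // Make a copy of s1
  // ... code that fiddles with copy ...
  copy += s2;              // Append s2 onto the end of copy
  return copy;             // No delete[] and no try block needed
}
```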
That's a total of two (2) lines of code within the body of the function, as compared with twelve (12)
lines of code in the previous example. Most of the savings came from memory management, but
some also came because we didn't have to explicitly call strxxx() routines. Here are some high
points:
• We do not need to explicitly write any code that reallocates memory when we grow the
string, since std::string handles memory management automatically.
• We do not need to delete[] anything at the end, since std::string handles memory
management automatically.
• We do not need a try block in this second example, since std::string handles memory
management automatically, even if someone somewhere throws an exception.
A good thing. It means using the keyword const to prevent const objects from getting mutated.
For example, if you wanted to create a function f() that accepted a std::string, plus you want to
promise callers not to change the caller's std::string that gets passed to f(), you can have f()
receive its std::string parameter...
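The three ways of receiving the parameter might look like this. The bodies are a sketch (returning the string's length is an arbitrary, illustrative choice), and the commented-out lines are the ones the compiler would reject:

```cpp
#include <cassert>
#include <string>

std::string::size_type f1(const std::string& s)    // Pass by reference-to-const
{
  // s += "x";        // Compile-time error: s is const
  return s.size();
}

std::string::size_type f2(const std::string* sptr) // Pass by pointer-to-const
{
  // *sptr += "x";    // Compile-time error: *sptr is const
  return sptr->size();
}

std::string::size_type f3(std::string s)           // Pass by value
{
  s += "x";           // OK: changes only f3's local copy
  return s.size();
}
```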
In the pass by reference-to-const and pass by pointer-to-const cases, any attempt to change the
caller's std::string within the f() functions would be flagged by the compiler as an error at
compile-time. This check is done entirely at compile-time: there is no run-time space or speed cost
for the const. In the pass by value case (f3()), the called function gets a copy of the caller's
std::string. This means that f3() can change its local copy, but the copy is destroyed when f3()
returns. In particular f3() cannot change the caller's std::string object.
As an opposite example, suppose you want to create a function g() that accepts a std::string, but you
want to let callers know that g() might change the caller's std::string object. In this case you can
have g() receive its std::string parameter...
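The two non-const forms might look like this (the appended characters are illustrative); both are free to modify the caller's object:

```cpp
#include <cassert>
#include <string>

void g1(std::string& s)     // Pass by reference-to-non-const
{
  s += "!";                 // Allowed: changes the caller's object
}

void g2(std::string* sptr)  // Pass by pointer-to-non-const
{
  *sptr += "?";             // Allowed: changes the caller's object
}
```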
The lack of const in these functions tells the compiler that they are allowed to (but are not required
to) change the caller's std::string object. Thus they can pass their std::string to any of the f()
functions, but only f3() (the one that receives its parameter "by value") can pass its std::string to
g1() or g2(). If f1() or f2() needs to call either g() function, a local copy of the std::string object
must be passed to the g() function; the parameter to f1() or f2() cannot be directly passed to either
g() function. E.g.,
void f1(const std::string& s)
{
// g1(s); // Compile-time error since s is const
std::string localCopy = s;
g1(localCopy); // OK since localCopy is not const
}
Naturally in the above case, any changes that g1() makes are made to the localCopy object that is
local to f1(). In particular, no changes will be made to the const parameter that was passed by
reference to f1().
Declaring the const-ness of a parameter is just another form of type safety. It is almost as if a const
std::string, for example, is a different class than an ordinary std::string, since the const variant is
missing the various mutative operations in the non-const variant (e.g., you can imagine that a
const std::string simply doesn't have an assignment operator).
If you find ordinary type safety helps you get systems correct (it does, especially in large systems),
you'll find const correctness helps also.
Back-patching const correctness results in a snowball effect: every const you add "over here"
requires four more to be added "over there."
It means p points to an object of class Fred, but p can't be used to change that Fred object
(naturally p could also be NULL).
For example, if class Fred has a const member function called inspect(), saying p->inspect() is OK.
But if class Fred has a non-const member function called mutate(), saying p->mutate() is an error
(the error is caught by the compiler; no run-time tests are done, which means const doesn't slow
your program down).
[18.5] What's the difference between "const Fred* p", "Fred* const p" and
"const Fred* const p"?
• const Fred* p means "p points to a Fred that is const" — that is, the Fred object can't be
changed via p.
• Fred* const p means "p is a const pointer to a Fred" — that is, you can change the Fred
object via p, but you can't change the pointer p itself.
• const Fred* const p means "p is a const pointer to a const Fred" — that is, you can't change
the pointer p itself, nor can you change the Fred object via p.
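The three declarations can be compared side by side in a short sketch. The struct Fred with a public value member is a stand-in assumed for illustration, and the commented-out lines are the ones a conforming compiler rejects.

```cpp
#include <cassert>

// Minimal stand-in for Fred (assumed for illustration).
struct Fred {
    int value = 0;
};

int demoPointerConstness()
{
    Fred a, b;

    const Fred* p1 = &a;        // pointer to const Fred
    p1 = &b;                    // OK: the pointer itself may be reseated
    // p1->value = 1;           // error: can't change the Fred via p1

    Fred* const p2 = &a;        // const pointer to a (non-const) Fred
    p2->value = 2;              // OK: the Fred may be changed via p2
    // p2 = &b;                 // error: p2 itself can't be reseated

    const Fred* const p3 = &a;  // const pointer to const Fred
    // p3->value = 3;           // error
    // p3 = &b;                 // error

    return p1->value + p3->value;  // b.value (0) + a.value (2)
}
```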
It means x aliases a Fred object, but x can't be used to change that Fred object.
For example, if class Fred has a const member function called inspect(), saying x.inspect() is OK.
But if class Fred has a non-const member function called mutate(), saying x.mutate() is an error
(the error is caught by the compiler; no run-time tests are done, which means const doesn't slow
your program down).
No, it is nonsense.
To find out what the above declaration means, you have to read it right-to-left. Thus
"Fred& const x" means "x is a const reference to a Fred". But that is redundant, since references
are always const. You can't reseat a reference. Never. With or without the const.
In other words, "Fred& const x" is functionally equivalent to "Fred& x". Since you're gaining nothing
by adding the const after the &, you shouldn't add it since it will confuse people. I.e., the const will
make some people think that the Fred is const, as if you had said "const Fred& x".
The problem with using "Fred const& x" (with the const before the &) is that it could easily be
mistyped as the nonsensical "Fred &const x" (with the const after the &).
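The claims about const and references can be checked at compile time with std::is_same and std::add_const from <type_traits>. In this sketch the empty Fred is a stand-in. std::add_const applied to a reference type leaves it unchanged, which matches the point that a reference is already unreseatable.

```cpp
#include <cassert>
#include <type_traits>

struct Fred { };  // stand-in class, assumed for illustration

// "Fred const&" and "const Fred&" are two spellings of the same type.
static_assert(std::is_same<Fred const&, const Fred&>::value, "same type");

// Adding const to a reference type has no effect: the result is still Fred&.
static_assert(std::is_same<std::add_const<Fred&>::type, Fred&>::value,
              "const on a reference is dropped");

// Writing "Fred& const x" directly is rejected by current compilers.
```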
A const member function is indicated by a const suffix just after the member function's parameter
list. Member functions with a const suffix are called "const member functions" or "inspectors."
Member functions without a const suffix are called "non-const member functions" or "mutators."
class Fred {
public:
void inspect() const; // This member promises NOT to change *this
void mutate(); // This member function might change *this
};
Calling mutate() on a const Fred (e.g., unchangeable.mutate(), where unchangeable is a const Fred&)
is an error caught at compile time. There is no run-time space or speed penalty for const.
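A sketch of such client code follows. It assumes a hypothetical userCode() that takes one Fred by reference and one by reference-to-const. The count_ member and the member-function bodies are invented so the effect of mutate() is observable.

```cpp
#include <cassert>

// Minimal Fred with the same two kinds of members as the class above.
class Fred {
public:
    int inspect() const { return count_; }  // promises not to change *this
    void mutate() { ++count_; }             // might change *this
private:
    int count_ = 0;
};

// Hypothetical client: one parameter may be changed, the other may not.
int userCode(Fred& changeable, const Fred& unchangeable)
{
    changeable.inspect();            // OK: doesn't change a changeable object
    changeable.mutate();             // OK: changes a changeable object
    // unchangeable.mutate();        // compile-time error: unchangeable is const
    return unchangeable.inspect();   // OK: doesn't change an unchangeable object
}
```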
The trailing const on inspect() member function means that the abstract (client-visible) state of the
object isn't going to change. This is slightly different from promising that the "raw bits" of the
object's struct aren't going to change. C++ compilers aren't allowed to take the "bitwise"
interpretation unless they can solve the aliasing problem, which normally can't be solved (i.e., a
non-const alias could exist which could modify the state of the object). Another (important) insight
from this aliasing issue: pointing at an object with a pointer-to-const doesn't guarantee that the
object won't change; it promises only that the object won't change via that pointer.
A small percentage of inspectors need to make innocuous changes to data members (e.g., a Set
object might want to cache its last lookup in hopes of improving the performance of its next
lookup). By saying the changes are "innocuous," I mean that the changes wouldn't be visible from
outside the object's interface (otherwise the member function would be a mutator rather than an
inspector).
When this happens, the data member which will be modified should be marked as mutable (put the
mutable keyword just before the data member's declaration; i.e., in the same place where you
could put const). This tells the compiler that the data member is allowed to change during a const
member function. If your compiler doesn't support the mutable keyword, you can cast away the
const'ness of this via the const_cast keyword (but see the NOTE below before doing this). E.g., in
Set::lookup() const, you might say,
Set* self = const_cast<Set*>(this);
After this line, self will have the same bits as this (i.e., self == this), but self is a Set* rather than
a const Set*. Therefore you can use self to modify the object pointed to by this.
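A minimal sketch of the Set example using mutable, as the text recommends, appears below. The class layout, the std::set storage, and the single-entry cache of the last successful lookup are all assumptions for illustration.

```cpp
#include <cassert>
#include <set>

// Hypothetical Set whose const lookup() caches the last key it found.
// The cache is invisible through the public interface, so lookup() stays
// an inspector; the cache members are mutable so a const member may update them.
class Set {
public:
    void insert(int key) { items_.insert(key); }

    bool lookup(int key) const
    {
        if (hasLast_ && lastKey_ == key)
            return true;                     // cache hit
        bool found = items_.count(key) != 0;
        if (found) {
            lastKey_ = key;                  // OK: mutable member
            hasLast_ = true;
        }
        return found;
    }

private:
    std::set<int> items_;
    mutable int lastKey_ = 0;
    mutable bool hasLast_ = false;
};
```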
NOTE: there is an extremely unlikely error that can occur with const_cast. It only happens when
three very rare things are combined at the same time: a data member that ought to be mutable
(such as is discussed above), a compiler that doesn't support the mutable keyword, and an object
that was originally defined to be const (as opposed to a normal, non-const object that is pointed to
by a pointer-to-const). Although this combination is so rare that it may never happen to you, if it
ever did happen the code may not work (the Standard says the behavior is undefined).
If you ever want to use const_cast, use mutable instead. In other words, if you ever need to change
a member of an object, and that object is pointed to by a pointer-to-const, the safest and simplest
thing to do is add mutable to the member's declaration. You can use const_cast if you are sure that
the actual object isn't const (e.g., if you are sure the object is declared something like this: Set s;),
but if the object itself might be const (e.g., if it might be declared like: const Set s;), use mutable
rather than const_cast.
Please don't write and tell me that version X of compiler Y on machine Z allows you to change a
non-mutable member of a const object. I don't care — it is illegal according to the language and
your code will probably fail on a different compiler or even a different version (an upgrade) of the
same compiler. Just say no. Use mutable instead.
Even if the language outlawed const_cast, the only way to avoid flushing the register cache across a
const member function call would be to solve the aliasing problem (i.e., to prove that there are no
non-const pointers that point to the object). This can happen only in rare cases (when the object is
constructed in the scope of the const member function invocation, and when all the non-const
member function invocations between the object's construction and the const member function
invocation are statically bound, and when every one of these invocations is also inlined, and when
the constructor itself is inlined, and when any member functions the constructor calls are inline).
[18.12] Why does the compiler allow me to change an int after I've pointed at it with a
const int*?
Because "const int* p" means "p promises not to change *p," not "*p promises not to change."
Causing a const int* to point to an int doesn't const-ify the int. The int can't be changed via the
const int*, but if someone else has an int* (note: no const) that points to ("aliases") the same int,
then that int* can be used to change the int. For example:
#include <cassert>
#include <iostream>
void f(const int* p1, int* p2)
{
int i = *p1; // Get the (original) value of *p1
*p2 = 7; // If p1 == p2, this will also change *p1
int j = *p1; // Get the (possibly new) value of *p1
if (i != j) {
std::cout << "*p1 changed, but it didn't change via pointer p1!\n";
assert(p1 == p2); // This is the only way *p1 could be different
}
}
int main()
{
int x = 0; // Initialized, so reading *p1 is well-defined
f(&x, &x); // This is perfectly legal (and even moral!)
}
Note that main() and f(const int*,int*) could be in different compilation units that are compiled on
different days of the week. In that case there is no way the compiler can possibly detect the aliasing
at compile time. Therefore there is no way we could make a language rule that prohibits this sort of
thing. In fact, we wouldn't even want to make such a rule, since in general it's considered a feature
that you can have many pointers pointing to the same thing. The fact that one of those pointers
promises not to change the underlying "thing" is just a promise made by the pointer; it's not a
promise made by the "thing".
"const Fred* p" means that the Fred can't be changed via pointer p, but there might be other ways
to get at the object without going through a const (such as an aliased non-const pointer such as a
Fred*). For example, if you have two pointers "const Fred* p" and "Fred* q" that point to the same
Fred object (aliasing), pointer q can be used to change the Fred object but pointer p cannot.
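class Fred {
public:
void inspect() const { } // A const member function
void mutate() { } // A non-const member function
};
int main()
{
Fred f;
const Fred* p = &f;
Fred* q = &f;
p->inspect(); // OK: no change to *p
// p->mutate(); // Error: can't change *p via p
q->inspect(); // OK: q is allowed to inspect the object
q->mutate(); // OK: q is allowed to change the object
}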
Yep.
Inheritance is what separates abstract data type (ADT) programming from OO programming.
As a specification device.
Human beings abstract things on two dimensions: part-of and kind-of. A Ford Taurus is-a-kind-of-a
Car, and a Ford Taurus has-a Engine, Tires, etc. The part-of hierarchy has been a part of software
since the ADT style became relevant; inheritance adds "the other" major dimension of
decomposition.
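Both dimensions can be shown in a few lines of C++. In this sketch, kind-of is expressed as public inheritance and part-of as a data member. The classes and the cylinder count are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical part: an Engine is part-of a Car.
class Engine {
public:
    int cylinders() const { return 6; }
};

class Car {
public:
    virtual ~Car() { }
    virtual std::string model() const { return "generic car"; }
    int cylinders() const { return engine_.cylinders(); }
private:
    Engine engine_;  // has-a (part-of): composition
};

// FordTaurus is-a-kind-of Car: public inheritance.
class FordTaurus : public Car {
public:
    std::string model() const override { return "Ford Taurus"; }
};
```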
(Note: this FAQ has to do with public inheritance; private and protected inheritance are different.)
Yes.
An object of a derived class is a kind of the base class. Therefore the conversion from a derived
class pointer to a base class pointer is perfectly safe, and happens all the time. For example, if I am
pointing at a car, I am in fact pointing at a vehicle, so converting a Car* to a Vehicle* is perfectly
safe and normal:
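A minimal sketch of that conversion follows. Vehicle and Car follow the text, while the wheels() member and the function names are assumptions. Note that no cast appears anywhere.

```cpp
#include <cassert>

class Vehicle {
public:
    virtual ~Vehicle() { }
    virtual int wheels() const { return 0; }
};

class Car : public Vehicle {
public:
    int wheels() const override { return 4; }
};

// Accepts any kind-of Vehicle (hypothetical helper).
int countWheels(const Vehicle* v) { return v->wheels(); }

int useCar()
{
    Car car;
    Car* c = &car;
    return countWheels(c);  // Car* converts implicitly to const Vehicle*
}
```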
(Note: this FAQ has to do with public inheritance; private and protected inheritance are different.)
• A member (either data member or member function) declared in a private section of a class
can only be accessed by member functions and friends of that class
• A member (either data member or member function) declared in a protected section of a
class can only be accessed by member functions and friends of that class, and by member
functions and friends of derived classes
• A member (either data member or member function) declared in a public section of a class
can be accessed by anyone
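The three rules can be illustrated with hypothetical Base and Derived classes. The commented-out lines are the accesses a compiler would reject.

```cpp
#include <cassert>

class Base {
public:
    int pub = 1;        // accessible to anyone
protected:
    int prot = 2;       // Base, its friends, derived classes and their friends
private:
    int priv = 3;       // Base and its friends only
    friend int peek(const Base& b);
};

int peek(const Base& b) { return b.priv; }  // OK: peek is a friend of Base

class Derived : public Base {
public:
    int sum() const
    {
        // return priv;     // error: private in Base
        return pub + prot;  // OK: public and protected members
    }
};

int outsider(const Derived& d)
{
    // return d.prot;  // error: outsider is neither a member nor a friend
    return d.pub;      // OK: public
}
```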
[19.6] Why can't my derived class access private things from my base class?
Derived classes do not get access to private members of a base class. This effectively "seals off" the
derived class from any changes made to the private members of the base class.
[19.7] How can I protect derived classes from breaking when I change the internal parts
of the base class?
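A class has two distinct interfaces for two distinct sets of clients: a public interface that serves
unrelated classes, and a protected interface that serves derived classes.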
Unless you expect all your derived classes to be built by your own team, you should declare your
base class's data members as private and use protected inline access functions by which derived
classes will access the private data in the base class. This way the private data declarations can
change, but the derived class's code won't break (unless you change the protected access
functions).
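A sketch of that advice follows, using a hypothetical Account class. The data member is private and stored in cents, while derived classes see only protected inline accessors expressed in dollars. The representation could change without touching SavingsAccount.

```cpp
#include <cassert>

class Account {
public:
    virtual ~Account() { }
protected:
    // Stable protected interface for derived classes.
    long balance() const { return cents_ / 100; }
    void setBalance(long dollars) { cents_ = dollars * 100; }
private:
    long cents_ = 0;  // private representation; free to change later
};

class SavingsAccount : public Account {
public:
    void deposit(long dollars) { setBalance(balance() + dollars); }
    long current() const { return balance(); }
};
```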
From an OO perspective, it is the single most important feature of C++: [6.8], [6.9].
A virtual function allows derived classes to replace the implementation provided by the base class.
The compiler makes sure the replacement is always called whenever the object in question is
actually of the derived class, even if the object is accessed by a base pointer rather than a derived
pointer. This allows algorithms in the base class to be replaced in the derived class, even if users
don't know about the derived class.
The derived class can either fully replace ("override") the base class member function, or the
derived class can partially replace ("augment") the base class member function. The latter is
accomplished by having the derived class member function call the base class member function, if
desired.
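Both styles are shown in this sketch, with hypothetical class names. Replacer overrides describe() completely, and Augmenter calls Base::describe() and adds to its result. The run() function is written only against Base and picks up either behavior.

```cpp
#include <cassert>
#include <string>

class Base {
public:
    virtual ~Base() { }
    virtual std::string describe() const { return "base"; }
};

// Override: fully replaces the base implementation.
class Replacer : public Base {
public:
    std::string describe() const override { return "replaced"; }
};

// Augment: calls the base implementation and adds to it.
class Augmenter : public Base {
public:
    std::string describe() const override { return Base::describe() + "+extra"; }
};

// Code written against Base, unaware of the derived classes.
std::string run(const Base& b) { return b.describe(); }
```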
[20.2] How can C++ achieve dynamic binding yet also static typing?
When you have a pointer to an object, the object may actually be of a class that is derived from the
class of the pointer (e.g., a Vehicle* that is actually pointing to a Car object; this is called
"polymorphism"). Thus there are two types: the (static) type of the pointer (Vehicle, in this case),
and the (dynamic) type of the pointed-to object (Car, in this case).
Static typing means that the legality of a member function invocation is checked at the earliest
possible moment: by the compiler at compile time. The compiler uses the static type of the pointer
to determine whether the member function invocation is legal. If the type of the pointer can handle
the member function, certainly the pointed-to object can handle it as well. E.g., if Vehicle has a
certain member function, certainly Car also has that member function since Car is a kind-of Vehicle.
Dynamic binding means that the address of the code in a member function invocation is determined
at the last possible moment: based on the dynamic type of the object at run time. It is called
"dynamic binding" because the binding to the code that actually gets called is accomplished
dynamically (at run time). Dynamic binding is a result of virtual functions.
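The Vehicle*/Car example can be reduced to a few lines. The compiler checks the call against the static type (Vehicle), and the code that runs is chosen by the dynamic type (Car). The name() and openGasCap() members here are hypothetical.

```cpp
#include <cassert>
#include <string>

class Vehicle {
public:
    virtual ~Vehicle() { }
    virtual std::string name() const { return "Vehicle"; }
};

class Car : public Vehicle {
public:
    std::string name() const override { return "Car"; }
    void openGasCap() { }  // Car-only member
};

std::string demo()
{
    Car car;
    Vehicle* v = &car;    // static type Vehicle*, dynamic type Car
    // v->openGasCap();   // compile-time error: Vehicle has no openGasCap()
    return v->name();     // legal because Vehicle declares name(); runs Car::name()
}
```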
[20.3] What's the difference between how virtual and non-virtual member functions are
called?
Non-virtual member functions are resolved statically. That is, the member function is selected
statically (at compile-time) based on the type of the pointer (or reference) to the object.
In contrast, virtual member functions are resolved dynamically (at run-time). That is, the member
function is selected dynamically (at run-time) based on the type of the object, not the type of the
pointer/reference to that object. This is called "dynamic binding." Most compilers use some variant
of the following technique: if the object has one or more virtual functions, the compiler puts a
hidden pointer in the object called a "virtual-pointer" or "v-pointer." This v-pointer points to a global
table called the "virtual-table" or "v-table."
The compiler creates a v-table for each class that has at least one virtual function. For example, if
class Circle has virtual functions for draw() and move() and resize(), there would be exactly one v-
table associated with class Circle, even if there were a gazillion Circle objects, and the v-pointer of
each of those Circle objects would point to the Circle v-table. The v-table itself has pointers to each
of the virtual functions in the class. For example, the Circle v-table would have three pointers: a
pointer to Circle::draw(), a pointer to Circle::move(), and a pointer to Circle::resize().
During a dispatch of a virtual function, the run-time system follows the object's v-pointer to the
class's v-table, then follows the appropriate slot in the v-table to the method code.
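To make these dispatch steps concrete, here is a hand-rolled simulation in plain C++ with hypothetical names. It mimics the v-pointer/v-table scheme described above, with one table per "class" shared by every object of that class. It is only an illustration, not what any particular compiler emits.

```cpp
#include <cassert>
#include <string>

struct Object;  // forward declaration

// Simulated v-table: one slot (function pointer) per virtual function.
struct VTable {
    std::string (*draw)(const Object&);
    double (*area)(const Object&);
};

// Simulated object: a hidden v-pointer plus ordinary data.
struct Object {
    const VTable* vptr;
    double side;
};

std::string squareDraw(const Object&) { return "square"; }
double squareArea(const Object& o) { return o.side * o.side; }

std::string triangleDraw(const Object&) { return "triangle"; }
double triangleArea(const Object& o) { return o.side * o.side / 2; }  // right isosceles

// One table per "class", shared by all of its objects.
const VTable squareVTable = { squareDraw, squareArea };
const VTable triangleVTable = { triangleDraw, triangleArea };

// A "virtual call": follow the v-pointer to the table, then the slot to the code.
std::string callDraw(const Object& o) { return o.vptr->draw(o); }
double callArea(const Object& o) { return o.vptr->area(o); }
```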
The space-cost overhead of the above technique is nominal: an extra pointer per object (but only
for objects that will need to do dynamic binding), plus an extra pointer per virtual method in each
class's v-table (shared by all objects of that class, not repeated per object). The time-cost overhead
is also fairly nominal: compared to a normal function call,
a virtual function call requires two extra fetches (one to get the value of the v-pointer, a second to
get the address of the method). None of this runtime activity happens with non-virtual functions,
since the compiler resolves non-virtual functions exclusively at compile-time based on the type of
the pointer.
Note: the above discussion is simplified considerably, since it doesn't account for extra structural
things like multiple inheritance, virtual inheritance, RTTI, etc., nor does it account for space/speed
issues such as page faults, calling a function via a pointer-to-function, etc. If you want to know
about those other things, please ask comp.lang.c++; PLEASE DO NOT SEND E-MAIL TO ME!
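The difference between the two kinds of calls can be observed directly. In this sketch, who() is non-virtual and vwho() is virtual, and both names are hypothetical. Through a Base&, the non-virtual call is resolved from the reference type, while the virtual call is resolved from the object's type.

```cpp
#include <cassert>
#include <string>

class Base {
public:
    virtual ~Base() { }
    std::string who() const { return "Base"; }           // non-virtual
    virtual std::string vwho() const { return "Base"; }  // virtual
};

class Derived : public Base {
public:
    std::string who() const { return "Derived"; }        // hides Base::who()
    std::string vwho() const override { return "Derived"; }
};

// Non-virtual: chosen at compile time from the reference type (Base).
std::string staticCall(const Base& b) { return b.who(); }

// Virtual: chosen at run time from the object's actual type.
std::string dynamicCall(const Base& b) { return b.vwho(); }
```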
Suppose there is a base class Vehicle with derived classes Car and Truck. The code traverses a list
of Vehicle objects and does different things depending on the type of Vehicle. For example it might
weigh the Truck objects (to make sure they're not carrying too heavy a load) but it might do
something different with a Car object, such as checking the registration.
Most people's first solution is to use an if statement. E.g., "if the object is
a Truck, do this, else if it is a Car, do that, else do a third thing":
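void myCode(VehicleList& list)
{
for (VehicleList::iterator p = list.begin(); p != list.end(); ++p) {
Vehicle& v = **p; // just for shorthand
// if (v is a Car) { car-specific code that does "foo-bar" on v }
// else if (v is a Truck) { truck-specific code that does "foo-bar" on v }
// else { semi-generic code that does "foo-bar" on something else }
}
}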
The problem with this is what I call "else-if-heimer's disease": eventually you'll forget to add an
else if when you add a new derived class, and you'll probably have a bug that won't be detected
until run-time, or worse, when the product is in the field.
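One way to write that if/else-if chain for real is with dynamic_cast, as in this sketch. It is an assumption about how the "v is a Car" test might be spelled, and the returned strings stand in for the type-specific code. Every new derived class means finding and extending every such chain.

```cpp
#include <cassert>
#include <string>

class Vehicle { public: virtual ~Vehicle() { } };
class Car : public Vehicle { };
class Truck : public Vehicle { };

// "Dynamic typing": ask each object what it is, then branch.
std::string fooBarByType(const Vehicle& v)
{
    if (dynamic_cast<const Car*>(&v))
        return "check registration";   // car-specific code
    else if (dynamic_cast<const Truck*>(&v))
        return "weigh load";           // truck-specific code
    else
        return "generic";              // semi-generic code
}
```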
The solution is to use dynamic binding rather than dynamic typing. Instead of having (what I call)
the live-code dead-data metaphor (where the code is alive and the car/truck objects are relatively
dead), we move the code into the data. This is a slight variation of Bertrand Meyer's Inversion
Principle.
It's surprisingly easy. You just give a name to the code within the {...} blocks of each if (in this
case it's the "foo-bar" operation), and you add that name as a virtual member function in the base
class, Vehicle.
class Vehicle {
public:
// performs the "foo-bar" operation
virtual void fooBar() = 0;
};
Then you remove the whole if...else if... block, and replace it with a simple call to this virtual
function:
void myCode(VehicleList& list)
{
for (VehicleList::iterator p = list.begin(); p != list.end(); ++p) {
Vehicle& v = **p; // just for shorthand
// perform the "foo-bar" operation.
v.fooBar();
}
}
Finally you simply move the code that used to be in the {...} block of each if into the fooBar()
member function of the appropriate derived class:
void Car::fooBar()
{
// car-specific code that does "foo-bar" on 'this'
... // this code was in {...} of if (v is a Car)
}
void Truck::fooBar()
{
// truck-specific code that does "foo-bar" on 'this'
... // this code was in {...} of if (v is a Truck)
}
If you actually have an else block in the original myCode() function (see above for the "semi-
generic code that does the 'foo-bar' operation on something other than a Car or Truck"), change
Vehicle's fooBar() from pure virtual to plain virtual and move the code into that member function:
class Vehicle {
public:
// performs the "foo-bar" operation
virtual void fooBar();
};
void Vehicle::fooBar()
{
// semi-generic code that does "foo-bar" on something else
... // this code was in {...} of the else case
}
In any case, the point is to avoid decision logic based on which kind of derived class you're
dealing with. I.e., you're trying to avoid "if the object is a car do xyz, else if it's a truck do pqr,"
etc.
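Putting the refactoring steps together gives a runnable sketch. Vehicle keeps a plain virtual fooBar() for the semi-generic case, and Car and Truck override it. In this sketch VehicleList is assumed to be a std::vector of std::unique_ptr<Vehicle>, and myCode() returns the results only so they can be checked.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

class Vehicle {
public:
    virtual ~Vehicle() { }
    // performs the "foo-bar" operation; default is the semi-generic case
    virtual std::string fooBar() const { return "generic"; }
};

class Car : public Vehicle {
public:
    std::string fooBar() const override { return "check registration"; }
};

class Truck : public Vehicle {
public:
    std::string fooBar() const override { return "weigh load"; }
};

// Hypothetical stand-in for the FAQ's VehicleList.
typedef std::vector<std::unique_ptr<Vehicle>> VehicleList;

// No if/else-if chain: each object runs its own fooBar().
std::vector<std::string> myCode(const VehicleList& list)
{
    std::vector<std::string> results;
    for (const auto& p : list)
        results.push_back(p->fooBar());
    return results;
}
```

Adding a new kind of Vehicle then means writing one new class, with no edits to myCode().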
When you may delete a derived object via a base pointer.
virtual functions bind to the code associated with the class of the object, rather than with the class
of the pointer/reference. When you say delete basePtr, and the base class has a virtual destructor,
the destructor that gets invoked is the one associated with the type of the object *basePtr, rather
than the one associated with the type of the pointer. This is generally A Good Thing.
If you had a hard time grokking the previous rule, try this (over)simplified one on for size: A class should
have a virtual destructor unless that class has no virtual functions. Rationale: if you have any
virtual functions at all, you're probably going to be doing "stuff" to derived objects via a base
pointer, and some of the "stuff" you may do may include invoking a destructor (normally done
implicitly via delete). Plus once you've put the first virtual function into a class, you've already paid
all the per-object space cost that you'll ever pay (one pointer per object; note that this is
theoretically compiler-specific; in practice everyone does it pretty much the same way), so making
the destructor virtual won't generally cost you anything extra.
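A minimal sketch of the rule follows, with hypothetical instrumentation. A counter records whether Derived's destructor ran when the object was deleted through a Base*. Because ~Base() is virtual, it does.

```cpp
#include <cassert>

// Counts how many Derived destructors have run (hypothetical instrumentation).
static int derivedDestroyed = 0;

class Base {
public:
    virtual ~Base() { }    // virtual: delete via Base* runs ~Derived() first
    virtual void f() { }
};

class Derived : public Base {
public:
    ~Derived() override { ++derivedDestroyed; }
};

void destroyViaBase()
{
    Base* basePtr = new Derived;
    delete basePtr;  // ~Derived() then ~Base(); without virtual ~Base() this is undefined behavior
}
```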
An idiom that allows you to do something that C++ doesn't directly support.
You can get the effect of a virtual constructor by a virtual clone() member function (for copy
constructing), or a virtual create() member function (for the default constructor).
class Shape {
public:
virtual ~Shape() { } // A virtual destructor
virtual void draw() = 0; // A pure virtual function
virtual void move() = 0;
// ...
virtual Shape* clone() const = 0; // Uses the copy constructor
virtual Shape* create() const = 0; // Uses the default constructor
};
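Here is a sketch of a derived Circle with these two member functions. The radius member and its default of 1 are assumptions added so the copy is observable. Shape is repeated, trimmed, to keep the block self-contained. Both overrides use Covariant Return Types.

```cpp
#include <cassert>

class Shape {
public:
    virtual ~Shape() { }
    virtual void draw() = 0;
    virtual Shape* clone() const = 0;   // uses the copy constructor
    virtual Shape* create() const = 0;  // uses the default constructor
};

class Circle : public Shape {
public:
    Circle() : radius_(1) { }
    explicit Circle(int r) : radius_(r) { }
    void draw() override { }
    Circle* clone() const override { return new Circle(*this); }  // covariant return
    Circle* create() const override { return new Circle(); }      // covariant return
    int radius() const { return radius_; }
private:
    int radius_;
};
```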
In the clone() member function, the new Circle(*this) code calls Circle's copy constructor to copy
the state of this into the newly created Circle object. In the create() member function, the
new Circle() code calls Circle's default constructor.
Users use these as if they were "virtual constructors":
void userCode(Shape& s)
{
Shape* s2 = s.clone();
Shape* s3 = s.create();
// ...
delete s2; // You probably need a virtual destructor here
delete s3;
}
This function will work correctly regardless of whether the Shape is a Circle, Square, or some other
kind-of Shape that doesn't even exist yet.
Note: The return type of Circle's clone() member function is intentionally different from the return
type of Shape's clone() member function. This is called Covariant Return Types, a feature that was
not originally part of the language. If your compiler complains at the declaration of
Circle* clone() const within class Circle (e.g., saying "The return type is different" or "The member
function's type differs from the base class virtual function by return type alone"), you have an old
compiler and you'll have to change the return type to Shape*.
Amazingly, Microsoft Visual C++ is one of those compilers that does not, as of version 6.0, handle
Covariant Return Types. This means:
• MS VC++ 6.0 will give you an error message on the overrides of clone() and create().
• Do not write me about this. The above code is correct with respect to the C++ Standard (see
section 10.3p5); the problem is with MS VC++ 6.0, not with the above code. Simply put, MS
VC++ 6.0 is broken with respect to its treatment of Covariant Return Types.
[21.1] Should I hide member functions that were public in my base class?
Attempting to hide (eliminate, revoke, privatize) inherited public member functions is an all-too-
common design error. It usually stems from muddy thinking.
(Note: this FAQ has to do with public inheritance; private and protected inheritance are different.)
[21.2] Derived* --> Base* works OK; why doesn't Derived** --> Base** work?
C++ allows a Derived* to be converted to a Base*, since a Derived object is a kind of a Base
object. However trying to convert a Derived** to a Base** is flagged as an error. Although this
error may not be obvious, it is nonetheless a good thing. For example, if you could convert a Car**
to a Vehicle**, and if you could similarly convert a NuclearSubmarine** to a Vehicle**, you could
assign those two pointers and end up making a Car* point at a NuclearSubmarine:
class Vehicle {
public:
virtual ~Vehicle() { }
virtual void startEngine() = 0;
};
class Car : public Vehicle {
public:
virtual void startEngine();
virtual void openGasCap();
};
class NuclearSubmarine : public Vehicle {
public:
virtual void startEngine();
virtual void fireNuclearMissle();
};
int main()
{
Car car;
Car* carPtr = &car;
Car** carPtrPtr = &carPtr;
Vehicle** vehiclePtrPtr = carPtrPtr; // This is an error in C++
NuclearSubmarine sub;
NuclearSubmarine* subPtr = &sub;
*vehiclePtrPtr = subPtr;
// This last line would have caused carPtr to point to sub !
carPtr->openGasCap(); // This might call fireNuclearMissle()!
}
In other words, if it were legal to convert a Derived** to a Base**, the Base** could be
dereferenced (yielding a Base*), and the Base* could be made to point to an object of a different
derived class, which could cause serious problems for national security. Who knows what would
happen if you invoked the openGasCap() member function on what you thought was a Car, but in
reality was a NuclearSubmarine! If you force the conversion with a cast and run the above code, on
most compilers it will call NuclearSubmarine::fireNuclearMissle()!
(Note: this FAQ has to do with public inheritance; private and protected inheritance are different.)
Nope.
I know it sounds strange, but it's true. You can think of this as a direct consequence of the previous
FAQ, or you can reason it this way: if the kind-of relationship were valid, then someone could point
a parking-lot-of-Vehicle pointer at a parking-lot-of-Car. But parking-lot-of-Vehicle has an
addNewVehicleToParkingLot(Vehicle&) member function which can add any Vehicle object to the
parking lot. This would allow you to park a NuclearSubmarine in a parking-lot-of-Car. Certainly it
would be surprising if someone removed what they thought was a Car from the parking-lot-of-Car,
only to find that it is actually a NuclearSubmarine.
Another way to say this truth: a container of Thing is not a kind-of container of Anything even if a
Thing is a kind-of an Anything. Swallow hard; it's true.
You don't have to like it. But you do have to accept it.
One last example which we use in our OO/C++ training courses: "A Bag-of-Apple is not a kind-of
Bag-of-Fruit." If a Bag-of-Apple could be passed as a Bag-of-Fruit, someone could put a Banana into
the Bag, even though it is supposed to only contain Apples!
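The Bag-of-Apple rule can be checked mechanically with std::is_convertible. In this sketch the bags are modeled as std::vector of pointers, and the class names and addFruit() are hypothetical. An Apple* converts to a Fruit*, but a bag-of-Apple does not convert to a bag-of-Fruit, so passing one to addFruit(), which would happily accept a Banana, is rejected at compile time.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

class Fruit { public: virtual ~Fruit() { } };
class Apple : public Fruit { };
class Banana : public Fruit { };

typedef std::vector<Fruit*> BagOfFruit;  // hypothetical names
typedef std::vector<Apple*> BagOfApple;

// An Apple is a kind-of Fruit...
static_assert(std::is_convertible<Apple*, Fruit*>::value, "kind-of");

// ...but a bag-of-Apple is not a kind-of bag-of-Fruit.
static_assert(!std::is_convertible<BagOfApple*, BagOfFruit*>::value, "not kind-of");
static_assert(!std::is_convertible<BagOfApple&, BagOfFruit&>::value, "not kind-of");

// Accepts any Fruit, including a Banana.
void addFruit(BagOfFruit& bag, Fruit* f) { bag.push_back(f); }
```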
(Note: this FAQ has to do with public inheritance; private and protected inheritance are different.)
Nope.
This is a corollary of the previous FAQ. Unfortunately this one can get you into a lot of hot water.
Consider this:
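class Base {
public:
virtual void f() { } // 1
};
class Derived : public Base {
int i_; // 2
};
void userCode(Base* arrayOfBase)
{
arrayOfBase[1].f(); // 3
}
int main()
{
Derived arrayOfDerived[10]; // 4
userCode(arrayOfDerived); // 5
}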
The compiler thinks this is perfectly type-safe. Line 5 converts a Derived* to a Base*. But in reality
it is horrendously evil: since Derived is larger than Base, the pointer arithmetic done on line 3 is
incorrect: the compiler uses sizeof(Base) when computing the address for arrayOfBase[1], yet the
array is an array of Derived, which means the address computed on line 3 (and the subsequent
invocation of member function f()) isn't even at the beginning of any object! It's smack in the
middle of a Derived object. Assuming your compiler uses the usual approach to virtual functions,
this will reinterpret the int i_ of the first Derived as if it pointed to a virtual table, it will follow that
"pointer" (which at this point means we're digging stuff out of a random memory location), and
grab one of the first few words of memory at that location and interpret them as if they were the
address of a C++ member function, then load that (random memory location) into the instruction
pointer and begin grabbing machine instructions from that memory location. The chances of this
crashing are very high.
The root problem is that C++ can't distinguish between a pointer-to-a-thing and a pointer-to-an-
array-of-things. Naturally C++ "inherited" this feature from C.
NOTE: If we had used an array-like class (e.g., std::vector<Derived> from the standard library)
instead of using a raw array, this problem would have been properly trapped as an error at compile
time rather than a run-time disaster.
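If the goal is to process many Derived objects through a Base interface, one common alternative is a std::vector of std::unique_ptr<Base>. This sketch assumes the objects can live on the heap, and the calls counter is hypothetical instrumentation. Every element is a pointer of the same size, so indexing is correct no matter how large each Derived object is.

```cpp
#include <cassert>
#include <memory>
#include <vector>

static int calls = 0;  // hypothetical counter so the calls are observable

class Base {
public:
    virtual ~Base() { }
    virtual void f() { }
};

class Derived : public Base {
public:
    void f() override { ++calls; }
};

// Elements are pointers, so element size never depends on sizeof(Derived).
void userCode(const std::vector<std::unique_ptr<Base>>& items)
{
    for (const auto& p : items)
        p->f();
}
```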
(Note: this FAQ has to do with public inheritance; private and protected inheritance are different.)
Seriously, arrays are very closely related to pointers, and pointers are notoriously difficult to deal
with. But if you have a complete grasp of why the above few FAQs were a problem from a design
perspective (e.g., if you really know why a container of Thing is not a kind-of container of
Anything), and if you think everyone else who will be maintaining your code also has a full grasp on
these OO design truths, then you should feel free to use arrays. But if you're like most people, you
should use a template container class such as std::vector<T> from the standard library rather than
raw arrays.
(Note: this FAQ has to do with public inheritance; private and protected inheritance are different.)
For example, suppose Ellipse has a setSize(x,y) member function, and suppose this member
function promises the Ellipse's width() will be x, and its height() will be y. In this case, Circle can't
be a kind-of Ellipse. Simply put, if Ellipse can do something Circle can't, then Circle can't be a kind
of Ellipse.
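This leaves two potential (valid) relationships between Circle and Ellipse: make Circle and Ellipse
completely unrelated classes, or derive both from a common base class representing ellipses that
can't necessarily perform an asymmetric setSize(x,y) operation.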
In the first case, Ellipse could be derived from class AsymmetricShape, and setSize(x,y) could be
introduced in AsymmetricShape. However Circle could be derived from SymmetricShape which has
a setSize(size) member function.
In the second case, class Oval could only have setSize(size), which sets both the width() and the
height() to size. Ellipse and Circle could both inherit from Oval. Ellipse, but not Circle, could add
the setSize(x,y) operation (but beware of the hiding rule if the same member function name
setSize() is used for both operations).
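A sketch of the second case follows, with hypothetical member names. Oval promises only symmetric resizing, and Ellipse adds setSize(x,y), using a using-declaration so the new overload does not hide Oval::setSize(size). Circle inherits only the symmetric operation.

```cpp
#include <cassert>

class Oval {
public:
    virtual ~Oval() { }
    void setSize(int size) { setDims(size, size); }
    int width() const { return width_; }
    int height() const { return height_; }
protected:
    void setDims(int w, int h) { width_ = w; height_ = h; }  // for derived classes
private:
    int width_ = 0;
    int height_ = 0;
};

class Ellipse : public Oval {
public:
    using Oval::setSize;  // without this, setSize(x,y) would hide setSize(size)
    void setSize(int x, int y) { setDims(x, y); }
};

class Circle : public Oval { };  // only the symmetric setSize(size)
```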
(Note: this FAQ has to do with public inheritance; private and protected inheritance are different.)
(Note: setSize(x,y) isn't sacred. Depending on your goals, it may be okay to prevent users from
changing the dimensions of an Ellipse, in which case it would be a valid design choice to not have a
setSize(x,y) method in Ellipse. However this series of FAQs discusses what to do when you want to
create a derived class of a pre-existing base class that has an "unacceptable" method in it. Of
course the ideal situation is to discover this problem when the base class doesn't yet exist. But life
isn't always ideal...)
[21.7] Are there other options to the "Circle is/isnot kind-of Ellipse" dilemma?
If you claim that all Ellipses can be squashed asymmetrically, and you claim that Circle is a kind-of
Ellipse, and you claim that Circle can't be squashed asymmetrically, clearly you've got to adjust
(revoke, actually) one of your claims. Thus you've either got to get rid of Ellipse::setSize(x,y), get
rid of the inheritance relationship between Circle and Ellipse, or admit that your Circles aren't
necessarily circular.
Here are the two most common traps new OO/C++ programmers regularly fall into. They attempt
to use coding hacks to cover up a broken design (e.g., they redefine Circle::setSize(x,y) to throw an
exception, call abort(), choose the average of the two parameters, or do nothing). Unfortunately
all these hacks will surprise users, since users are expecting width() == x and height() == y. The
one thing you must not do is surprise your users.
If it is important to you to retain the "Circle is a kind-of Ellipse" inheritance relationship, you can
weaken the promise made by Ellipse's setSize(x,y). E.g., you could change the promise to, "This
member function might set width() to x and/or it might set height() to y, or it might do nothing".
Unfortunately this dilutes the contract into dribble, since the user can't rely on any meaningful
behavior. The whole hierarchy therefore begins to be worthless (it's hard to convince someone to
use an object if you have to shrug your shoulders when asked what the object does for them).
(Note: this FAQ has to do with public inheritance; private and protected inheritance are different.)
(Note: setSize(x,y) isn't sacred. Depending on your goals, it may be okay to prevent users from
changing the dimensions of an Ellipse, in which case it would be a valid design choice to not have a
setSize(x,y) method in Ellipse. However this series of FAQs discusses what to do when you want to
create a derived class of a pre-existing base class that has an "unacceptable" method in it. Of
course the ideal situation is to discover this problem when the base class doesn't yet exist. But life
isn't always ideal...)
[21.8] But I have a Ph.D. in Mathematics, and I'm sure a Circle is a kind of an Ellipse!
Does this mean Marshall Cline is stupid? Or that C++ is stupid? Or that OO is
stupid?
Actually, it doesn't mean any of these things. The sad reality is that it means your intuition is
wrong.
Look, I have received and answered dozens of passionate e-mail messages about this subject. I
have taught it hundreds of times to thousands of software professionals all over the place. I know it
goes against your intuition. But trust me; your intuition is wrong.
The real problem is your intuitive notion of "kind of" doesn't match the OO notion of proper
inheritance (technically called "subtyping"). The bottom line is that the derived class objects must
be substitutable for the base class objects. In the case of Circle/Ellipse, the setSize(x,y) member
function violates this substitutability.
You have three choices: [1] remove the setSize(x,y) member function from Ellipse (thus breaking
existing code that calls the setSize(x,y) member function), [2] allow a Circle to have a different
height than width (an asymmetrical circle; hmmm), or [3] drop the inheritance relationship. Sorry,
but there simply are no other choices. Note that some people mention the option of deriving both
Circle and Ellipse from a third common base class, but that's just a variant of option [3] above.
Another way to say this is that you have to either make the base class weaker (in this case
braindamage Ellipse to the point that you can't set its width and height to different values), or make
the derived class stronger (in this case empower a Circle with the ability to be both symmetric and,
ahem, asymmetric). When neither of these is very satisfying (such as in the Circle/Ellipse case), one
normally simply removes the inheritance relationship. If the inheritance relationship simply has to
exist, you may need to remove the mutator member functions (setHeight(y), setWidth(x), and
setSize(x,y)) from the base class.
(Note: this FAQ has to do with public inheritance; private and protected inheritance are different.)
(Note: setSize(x,y) isn't sacred. Depending on your goals, it may be okay to prevent users from
changing the dimensions of an Ellipse, in which case it would be a valid design choice to not have a
setSize(x,y) method in Ellipse. However this series of FAQs discusses what to do when you want to
create a derived class of a pre-existing base class that has an "unacceptable" method in it. Of
course the ideal situation is to discover this problem when the base class doesn't yet exist. But life
isn't always ideal...)
If Circle is the base class and Ellipse is the derived class, then you run into a whole new set of
problems. For example, suppose Circle has a radius() method. Then Ellipse will also need to have a
radius() method, but that doesn't make much sense: what does it even mean for a (possibly
asymmetric) ellipse to have a radius?
If you get over that hurdle (e.g., by having Ellipse::radius() return the average of the major and
minor axes, or whatever), then there is a problem with the relationship between radius() and
area(). E.g., suppose Circle has an area() method that promises to return 3.14159[etc] times the
square of whatever radius() returns. Then either Ellipse::area() will not return the true area of the
ellipse, or you'll have to stand on your head to get radius() to return something that matches the
above formula.
Even if you get past that one (i.e., by having Ellipse::radius() return the square root of the ellipse's
area divided by pi), you'll get stuck by the circumference() method. E.g., suppose Circle has a
circumference() method that promises to return two times pi times whatever is returned by
radius(). Now you're stuck: there's no way to make all those constraints work out for Ellipse: the
Ellipse class will have to lie about its area, its circumference, or both.
Bottom line: you can make anything inherit from anything provided the methods in the derived
class abide by the promises made in the base class. But you ought not to use inheritance just
because you feel like it, or just because you want to get code reuse. You should use inheritance (a)
only if the derived class's methods can abide by all the promises made in the base class, and (b)
only if you don't think you'll confuse your users, and (c) only if there's something to be gained by
using the inheritance: some real, measurable improvement in time, money or risk.
[21.10] But my problem doesn't have anything to do with circles and ellipses, so what
good is that silly example to me?
Ahhh, there's the rub. You think the Circle/Ellipse example is just a silly example. But in reality,
your problem is isomorphic to that example.
I don't care what your inheritance problem is, but all (yes all) bad inheritances boil down to the
Circle-is-not-a-kind-of-Ellipse example.
Here's why: Bad inheritances always have a base class with an extra capability (often an extra
member function or two; sometimes an extra promise made by one or a combination of member
functions) that a derived class can't satisfy. You've either got to make the base class weaker, make
the derived class stronger, or eliminate the proposed inheritance relationship. I've seen lots and lots
and lots of these bad inheritance proposals, and believe me, they all boil down to the Circle/Ellipse
example.
Therefore, if you truly understand the Circle/Ellipse example, you'll be able to recognize bad
inheritance everywhere. If you don't understand what's going on with the Circle/Ellipse problem, the
chances are high that you'll make some very serious and very expensive inheritance mistakes.
(Note: this FAQ has to do with public inheritance; private and protected inheritance are different.)
Interfaces are a company's most valuable resources. Designing an interface takes longer than
whipping together a concrete class which fulfills that interface. Furthermore interfaces require the
time of more expensive people.
Since interfaces are so valuable, they should be protected from being tarnished by data structures
and other implementation artifacts. Thus you should separate interface from implementation.
Use an ABC.
[22.3] What is an ABC?
At the design level, an abstract base class (ABC) corresponds to an abstract concept. If you asked a
mechanic if he repaired vehicles, he'd probably wonder what kind-of vehicle you had in mind.
Chances are he doesn't repair space shuttles, ocean liners, bicycles, or nuclear submarines. The
problem is that the term "vehicle" is an abstract concept (e.g., you can't build a "vehicle" unless
you know what kind of vehicle to build). In C++, class Vehicle would be an ABC, with Bicycle,
SpaceShuttle, etc, being derived classes (an OceanLiner is-a-kind-of-a Vehicle). In real-world OO,
ABCs show up all over the place.
At the programming language level, an ABC is a class that has one or more pure virtual member
functions. You cannot make an object (instance) of an ABC.
A member function declaration that turns a normal class into an abstract class (i.e., an ABC). You
normally only implement it in a derived class.
Some member functions exist in concept; they don't have any reasonable definition. E.g., suppose I
asked you to draw a Shape at location (x,y) that has size 7. You'd ask me "what kind of shape
should I draw?" (circles, squares, hexagons, etc, are drawn differently). In C++, we must indicate
the existence of the draw() member function (so users can call it when they have a Shape* or a
Shape&), but we recognize it can (logically) be defined only in derived classes:
class Shape {
public:
virtual void draw() const = 0; // = 0 means it is "pure virtual"
// ...
};
This pure virtual function makes Shape an ABC. If you want, you can think of the "= 0;" syntax as if
the code were at the NULL pointer. Thus Shape promises a service to its users, yet Shape isn't able
to provide any code to fulfill that promise. This forces any actual object created from a [concrete]
class derived from Shape to have the indicated member function, even though the base class
doesn't have enough information to actually define it yet.
Note that it is possible to provide a definition for a pure virtual function, but this usually confuses
novices and is best avoided until later.
[22.5] How do you define a copy constructor or assignment operator for a class that contains a
pointer to an (abstract) base class?
If the class "owns" the object pointed to by the (abstract) base class pointer, use the Virtual
Constructor Idiom in the (abstract) base class. As usual with this idiom, we declare a pure virtual
clone() method in the base class:
class Shape {
public:
// ...
virtual Shape* clone() const = 0; // The Virtual (Copy) Constructor
// ...
};
class Circle : public Shape {
public:
// ...
virtual Shape* clone() const { return new Circle(*this); }
// ...
};
Now suppose that each Fred object "has-a" Shape object. Naturally the Fred object doesn't know
whether the Shape is a Circle or a Square or ... Fred's copy constructor and assignment operator will
invoke Shape's clone() method to copy the object:
class Fred {
public:
Fred(Shape* p) : p_(p) { assert(p != NULL); } // p must not be NULL
~Fred() { delete p_; }
Fred(const Fred& f) : p_(f.p_->clone()) { }
Fred& operator= (const Fred& f)
{
if (this != &f) { // Check for self-assignment
Shape* p2 = f.p_->clone(); // Create the new one FIRST...
delete p_; // ...THEN delete the old one
p_ = p2;
}
return *this;
}
// ...
private:
Shape* p_;
};
[23.1] Is it okay for a non-virtual function of the base class to call a virtual function?
Yes. It's sometimes (not always!) a great idea. For example, suppose all Shape objects have a
common algorithm for printing, but this algorithm depends on their area and they all have a
potentially different way to compute their area. In this case Shape's area() method would
necessarily have to be virtual (probably pure virtual) but Shape::print() could, if we were
guaranteed no derived class wanted a different algorithm for printing, be a non-virtual defined in
the base class Shape.
[23.2] That last FAQ confuses me. Is it a different strategy from the other ways to use
virtual functions? What's going on?
Yes, it is a different strategy. Yes, there really are two different basic ways to use virtual functions:
1. Suppose you have the situation described in the previous FAQ: you have a method whose
overall structure is the same for each derived class, but has little pieces that are different in
each derived class. So the algorithm is the same, but the primitives are different. In this
case you'd write the overall algorithm in the base class as a public method (that's sometimes
non-virtual), and you'd write the little pieces in the derived classes. The little pieces would
be declared in the base class (they're often protected, they're often pure virtual, and they're
certainly virtual), and they'd ultimately be defined in each derived class. The most critical
question in this situation is whether or not the public method containing the overall
algorithm should be virtual. The answer is to make it virtual if you think that some derived
class might need to override it.
2. Suppose you have the exact opposite situation from the previous FAQ, where you have a
method whose overall structure is different in each derived class, yet it has little pieces that
are the same in most (if not all) derived classes. In this case you'd put the overall algorithm
in a public virtual that's ultimately defined in the derived classes, and the little pieces of
common code can be written once (to avoid code duplication) and stashed somewhere
(anywhere!). A common place to stash the little pieces is in the protected part of the base
class, but that's not necessary and it might not even be best. Just find a place to stash them
and you'll be fine. Note that if you do stash them in the base class, you should normally
make them protected, since normally they do things that public users don't need/want to do.
Assuming they're protected, they probably shouldn't be virtual: if the derived class doesn't
like the behavior in one of them, it doesn't have to call that method.
For emphasis, the above list is a both/and situation, not an either/or situation. In other words, you
don't have to choose between these two strategies on any given class. It's perfectly normal to have
method f() correspond to strategy #1 while method g() corresponds to strategy #2. In other words,
it's perfectly normal to have both strategies working in the same class.
[23.3] When my base class's constructor calls a virtual function, why doesn't my derived
class's override of that virtual function get invoked?
During class Base's constructor, the object isn't yet a Derived, so if Base::Base() calls a virtual
function virt(), the Base::virt() will be invoked, even if Derived::virt() exists.
Similarly, during Base's destructor, the object is no longer a Derived, so when Base::~Base() calls
virt(), Base::virt() gets control, not the Derived::virt() override.
You'll quickly see the wisdom of this approach when you imagine the disaster if Derived::virt()
touched a member object from class Derived. In particular, if Base::Base() called the virtual
function virt(), this rule causes Base::virt() to be invoked. If it weren't for this rule, Derived::virt()
would get called before the Derived part of a Derived object is constructed, and Derived::virt()
could touch unconstructed member objects from the Derived part of a Derived object. That would
be a disaster.
[23.4] Should a derived class replace ("override") a non-virtual function from a base
class?
Experienced C++ programmers will sometimes redefine a non-virtual function for efficiency (e.g., if
the derived class implementation can make better use of the derived class's resources) or to get
around the hiding rule. However the client-visible effects must be identical, since non-virtual
functions are dispatched based on the static type of the pointer/reference rather than the dynamic
type of the pointed-to/referenced object.
Here's the mess you're in: if Base declares a member function f(int), and Derived declares a
member function f(float) (same name but different parameter types and/or constness), then the
Base f(int) is "hidden" rather than "overloaded" or "overridden" (even if the Base f(int) is virtual).
Here's how you get out of the mess: Derived must have a using declaration of the hidden member
function. For example,
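class Base {
public:
void f(int);
};
class Derived : public Base {
public:
using Base::f; // This un-hides Base::f(int)
void f(float);
};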
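If the using syntax isn't supported by your compiler, redefine the hidden Base member function(s),
even if they are non-virtual. Normally this re-definition merely calls the hidden Base member
function using the :: syntax. E.g.,
class Derived : public Base {
public:
void f(float);
void f(int i) { Base::f(i); } // The redefinition merely calls Base::f(int)
};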
[23.6] What does it mean that the "virtual table" is an unresolved external?
The compiler typically creates a magical data structure called the "virtual table" for classes that
have virtual functions (this is how it handles dynamic binding). Normally you don't have to know
about it at all. But if you forget to define a virtual function for class Fred, you will sometimes get
this linker error.
Here's the nitty gritty: Many compilers put this magical "virtual table" in the compilation unit that
defines the first non-inline virtual function in the class. Thus if the first non-inline virtual function in
Fred is wilma(), the compiler will put Fred's virtual table in the same compilation unit where it sees
Fred::wilma(). Unfortunately, if you accidentally forget to define Fred::wilma(), rather than getting
"Fred::wilma() is undefined," you may get "Fred's virtual table is undefined." Sad but true.
This is known as making the class "final" or "a leaf." There are two ways to do it: an easy technical
approach and an even easier non-technical approach.
• The (easy) technical approach is to make the class's constructors private and to use the
Named Constructor Idiom to create the objects. No one can create objects of a derived class
since the base class's constructor will be inaccessible. The "named constructors" themselves
could return by pointer if you want your objects allocated by new or they could return by
value if you want the objects created on the stack.
• The (even easier) non-technical approach is to put a big fat ugly comment next to the class
definition. The comment could say, for example, // We'll fire you if you inherit from this class
or even just /*final*/ class Whatever {...};. Some programmers balk at this because it is
enforced by people rather than by technology, but don't knock it on face value: it is quite
effective in practice.
This is known as making the method "final" or "a leaf." Here's an easy-to-use solution to this that
gives you 90+% of what you want: simply add a comment next to the method and rely on code
reviews or random maintenance activities to find violators. The comment could say, for example,
// We'll fire you if you override this method or perhaps more likely, /*final*/ void theMethod();.
The advantages to this technique are (a) it is extremely easy/fast/inexpensive to use, and (b) it is
quite effective in practice. In other words, you get 90+% of the benefit with almost no cost: lots
of bang per buck.
(I'm not aware of a "100% solution" to this problem so this may be the best you can get. If you
know of something better, please feel free to email me. But please do not email me objecting to this
solution because it's low-tech or because it doesn't "prevent" people from doing the wrong thing.
Who cares whether it's low-tech or high-tech as long as it's effective?!? And nothing in C++
"prevents" people from doing the wrong thing. Using pointer casts and pointer arithmetic, people
can do just about anything they want. C++ makes it easy to do the right thing, but it doesn't
prevent espionage. Besides, the original question (see above) asked for something so people won't
do the wrong thing, not so they can't do the wrong thing.)
In any case, this solution should give you most of the potential benefit at almost no cost.
E.g., the "Car has-a Engine" relationship can be expressed using simple composition:
class Engine {
public:
Engine(int numCylinders);
void start(); // Starts this Engine
};
class Car {
public:
Car() : e_(8) { } // Initializes this Car with 8 cylinders
void start() { e_.start(); } // Start this Car by starting its Engine
private:
Engine e_; // Car has-a Engine
};
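The "Car has-a Engine" relationship can also be expressed using private inheritance:
class Car : private Engine { // Car has-a Engine
public:
Car() : Engine(8) { } // Initializes this Car with 8 cylinders
using Engine::start; // Start this Car by starting its Engine
};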
The two variants are similar in several ways:
• In both cases there is exactly one Engine member object contained in every Car object
• In neither case can users (outsiders) convert a Car* to an Engine*
• In both cases the Car class has a start() method that calls the start() method on the
contained Engine object.
They also differ in several ways:
• The simple-composition variant is needed if you want to contain several Engines per Car
• The private-inheritance variant can introduce unnecessary multiple inheritance
• The private-inheritance variant allows members of Car to convert a Car* to an Engine*
• The private-inheritance variant allows access to the protected members of the base class
• The private-inheritance variant allows Car to override Engine's virtual functions
• The private-inheritance variant makes it slightly simpler (20 characters compared to 28
characters) to give Car a start() method that simply calls through to the Engine's start()
method
Note that private inheritance is usually used to gain access into the protected members of the base
class, but this is usually a short-term solution (translation: a band-aid).
Use composition when you can, private inheritance when you have to.
Normally you don't want to have access to the internals of too many other classes, and private
inheritance gives you some of this extra power (and responsibility). But private inheritance isn't
evil; it's just more expensive to maintain, since it increases the probability that someone will
change something that will break your code.
A legitimate, long-term use for private inheritance is when you want to build a class Fred that uses
code in a class Wilma, and the code from class Wilma needs to invoke member functions from your
new class, Fred. In this case, Fred calls non-virtuals in Wilma, and Wilma calls (usually pure
virtuals) in itself, which are overridden by Fred. This would be much harder to do with composition.
class Wilma {
protected:
void fredCallsWilma()
{
std::cout << "Wilma::fredCallsWilma()\n";
wilmaCallsFred();
}
virtual void wilmaCallsFred() = 0; // A pure virtual function
};
[24.4] Should I pointer-cast from a private derived class to its base class?
Generally, No.
From a member function or friend of a privately derived class, the relationship to the base class is
known, and the upward conversion from PrivatelyDer* to Base* (or PrivatelyDer& to Base&) is safe;
no cast is needed or recommended.
However users of PrivatelyDer should avoid this unsafe conversion, since it is based on a private
decision of PrivatelyDer, and is subject to change without notice.
Similarities: both allow overriding virtual functions in the private/protected base class, neither
claims the derived is a kind-of its base.
Dissimilarities: protected inheritance allows derived classes of derived classes to know about the
inheritance relationship. Thus your grand kids are effectively exposed to your implementation
details. This has both benefits (it allows derived classes of the protected derived class to exploit the
relationship to the protected base class) and costs (the protected derived class can't change the
relationship without potentially breaking further derived classes).
[24.6] What are the access rules with private and protected inheritance?
class B { /*...*/ };
class D_priv : private B { /*...*/ };
class D_prot : protected B { /*...*/ };
class D_publ : public B { /*...*/ };
class UserClass { B b; /*...*/ };
None of the derived classes can access anything that is private in B. In D_priv, the public and
protected parts of B are private. In D_prot, the public and protected parts of B are protected. In
D_publ, the public parts of B are public and the protected parts of B are protected (D_publ is-a-
kind-of-a B). class UserClass can access only the public parts of B, which "seals off" UserClass from
B.
To make a public member of B so it is public in D_priv or D_prot, state the name of the member
with a B:: prefix. E.g., to make member B::f(int,float) public in D_prot, you would say:
class D_prot : protected B {
public:
using B::f; // Note: not using B::f(int,float)
};
Thank you for reading this answer rather than just trying to set your own coding standards.
But beware that some people on comp.lang.c++ are very sensitive on this issue. Nearly every
software engineer has, at some point, been exploited by someone who used coding standards as a
"power play." Furthermore some attempts to set C++ coding standards have been made by those
who didn't know what they were talking about, so the standards end up being based on what was
the state-of-the-art when the standards setters were writing code. Such impositions generate an
attitude of mistrust for coding standards.
Obviously anyone who asks this question wants to be trained so they don't run off on their own
ignorance, but nonetheless posting a question such as this one to comp.lang.c++ tends to generate
more heat than light.
Coding standards do not make non-OO programmers into OO programmers; only training and
experience do that. If coding standards have merit, it is that they discourage the petty
fragmentation that occurs when large organizations coordinate the activities of diverse groups of
programmers.
But you really want more than a coding standard. The structure provided by coding standards gives
neophytes one less degree of freedom to worry about, which is good. However pragmatic guidelines
should go well beyond pretty-printing standards. Organizations need a consistent philosophy of
design and implementation. E.g., strong or weak typing? references or pointers in interfaces?
stream I/O or stdio? should C++ code call C code? vice versa? how should ABCs be used? should
inheritance be used as an implementation technique or as a specification technique? what testing
strategy should be employed? inspection strategy? should interfaces uniformly have a get() and/or
set() member function for each data member? should interfaces be designed from the outside-in or
the inside-out? should errors be handled by try/catch/throw or by return codes? etc.
What is needed is a "pseudo standard" for detailed design. I recommend a three-pronged approach
to achieving this standardization: training, mentoring, and libraries. Training provides "intense
instruction," mentoring allows OO to be caught rather than just taught, and high quality C++ class
libraries provide "long term instruction." There is a thriving commercial market for all three kinds of
"training." Advice by organizations who have been through the mill is consistent: Buy, Don't Build.
Buy libraries, buy training, buy tools, buy consulting. Companies who have attempted to become a
self-taught tool-shop as well as an application/system shop have found success difficult.
Few argue that coding standards are "ideal," or even "good," however they are necessary in the
kind of organizations/situations described above.
The following FAQs provide some basic guidance in conventions and styles.
[25.3] Should our organization determine coding standards from our C experience?
No!
No matter how vast your C experience, no matter how advanced your C expertise, being a good C
programmer does not make you a good C++ programmer. Converting from C to C++ is more than
just learning the syntax and semantics of the ++ part of C++. Organizations who want the promise
of OO, but who fail to put the "OO" into "OO programming", are fooling themselves; the balance
sheet will show their folly.
C++ coding standards should be tempered by C++ experts. Asking comp.lang.c++ is a start. Seek
out experts who can help guide you away from pitfalls. Get training. Buy libraries and see if "good"
libraries pass your coding standards. Do not set standards by yourself unless you have considerable
experience in C++. Having no standard is better than having a bad standard, since improper
"official" positions "harden" bad brain traces. There is a thriving market for both C++ training and
libraries from which to pull expertise.
One more thing: whenever something is in demand, the potential for charlatans increases. Look
before you leap. Also ask for student-reviews from past companies, since not even expertise makes
someone a good communicator. Finally, select a practitioner who can teach, not a full time teacher
who has a passing knowledge of the language/paradigm.
The headers in ISO Standard C++ don't have a .h suffix. This is something the standards
committee changed from former practice. The details are different between headers that existed in
C and those that are specific to C++.
The C++ standard library is guaranteed to have 18 standard headers from the C language. These
headers come in two standard flavors, <cxxx> and <xxx.h> (where xxx is the basename of the
header, such as stdio, stdlib, etc). These two flavors are identical except the <cxxx> versions
provide their declarations in the std namespace only, and the <xxx.h> versions make them
available both in std namespace and in the global namespace. The committee did it this way so that
existing C code could continue to be compiled in C++. However the <xxx.h> versions are
deprecated, meaning they are standard now but might not be part of the standard in future
revisions. (See clause D.5 of the ISO C++ standard.)
The C++ standard library is also guaranteed to have 32 additional standard headers that have no
direct counterparts in C, such as <iostream>, <string>, and <new>. You may see things like
#include <iostream.h> and so on in old code, and some compiler vendors offer .h versions for that
reason. But be careful: the .h versions, if available, may differ from the standard versions. And if
you compile some units of a program with, for example, <iostream> and others with <iostream.h>,
the program may not work.
For new projects, use only the <xxx> headers, not the <xxx.h> headers.
When modifying or extending existing code that uses the old header names, you should probably
follow the practice in that code unless there's some important reason to switch to the standard
headers (such as a facility available in standard <iostream> that was not available in the vendor's
<iostream.h>). If you need to standardize existing code, make sure to change all C++ headers in
all program units including external libraries that get linked in to the final executable.
All of this affects the standard headers only. You're free to name your own headers anything you
like; see [25.8].
[25.5] Is the ?: operator evil since it can be used to create unreadable code?
No, but as always, remember that readability is one of the most important things.
Some people feel the ?: ternary operator should be avoided because they find it confusing at times
compared to the good old if statement. In many cases ?: tends to make your code more difficult to
read (and therefore you should replace those usages of ?: with if statements), but there are times
when the ?: operator is clearer since it can emphasize what's really happening, rather than the fact
that there's an if in there somewhere.
Let's start with a really simple case. Suppose you need to print the result of a function call. In that
case you should put the real goal (printing) at the beginning of the line, and bury the function call
within the line since it's relatively incidental (this left-right thing is based on the intuitive notion that
most developers think the first thing on a line is the most important thing):
Now let's extend this idea to the ?: operator. Suppose your real goal is to print something, but you
need to do some incidental decision logic to figure out what should be printed. Since the printing is
the most important thing conceptually, we prefer to put it first on the line, and we prefer to bury
the incidental decision logic. In the example code below, variable n represents the number of
senders of a message; the message itself is being printed to std::cout:
All that being said, you can get pretty outrageous and unreadable code ("write only code") using
various combinations of ?:, &&, ||, etc. For example,
Personally I think the explicit if example is clearer since it emphasizes the major thing that's going
on (a decision based on the result of calling f()) rather than the minor thing (calling f()). In other
words, the use of if here is good for precisely the same reason that it was bad above: we want to
major on the majors and minor on the minors.
In any event, don't forget that readability is the goal (at least it's one of the goals). Your goal
should not be to avoid certain syntactic constructs such as ?: or && or || or if, or even goto. If
you sink to the level of a "Standards Bigot," you'll ultimately embarrass yourself since there are
always counterexamples to any syntax-based rule. If on the other hand you emphasize broad goals
and guidelines (e.g., "major on the majors," or "put the most important thing first on the line," or
even "make sure your code is obvious and readable"), you're usually much better off.
Code must be written to be read, not by the compiler, but by another human being.
An object is initialized (constructed) the moment it is declared. If you don't have enough
information to initialize an object until halfway down the function, you should create it halfway
down the function when it can be initialized correctly. Don't initialize it to an "empty" value at the
top then "assign" it later. The reason for this is runtime performance. Building an object correctly is
faster than building it incorrectly and remodeling it later. Simple examples show a factor of 350%
speed hit for simple classes like String. Your mileage may vary; surely the overall system
degradation will be less than 350%, but there will be degradation. Unnecessary degradation.
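Here is a sketch of the two styles using std::string. It assumes a hypothetical computeName() that supplies the information the object needs. The first version default-constructs the string and then assigns to it. The second constructs it once, with the right value, at the point where that value becomes available. This sketch makes no timing claims of its own.

```cpp
#include <string>

// Hypothetical helper that produces the information the object needs.
std::string computeName() { return "Fred"; }

// Not as good: build an "empty" string at the top, then remodel it
// later by assignment.
std::string declareAtTop()
{
    std::string name;
    // ... other work that must happen before the name is known ...
    name = computeName();
    return name;
}

// Preferred: declare the object where it can be initialized correctly.
std::string declareAtFirstUse()
{
    // ... other work that must happen before the name is known ...
    std::string name = computeName();
    return name;
}
```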
A common retort to the above is: "we'll provide set() member functions for every datum in our
objects so the cost of construction will be spread out." This is worse than the performance
overhead, since now you're introducing a maintenance nightmare. Providing a set() member
function for every datum is tantamount to public data: you've exposed your implementation
technique to the world. The only thing you've hidden is the physical names of your member objects,
but the fact that you're using a List and a String and a float, for example, is open for all to see.
Bottom line: Locals should be declared near their first use. Sorry that this isn't familiar to C experts,
but new doesn't necessarily mean bad.
If you already have a convention, use it. If not, consult your compiler to see what the compiler
expects. Typical answers are: .C, .cc, .cpp, or .cxx (naturally the .C extension assumes a case-
sensitive file system to distinguish .C from .c).
We've often used .cpp for our C++ source files, and we have also used .C. In the latter case,
when porting to case-insensitive file systems, we supply the compiler option that forces .c files to be
treated as C++ source files (-Tdp for IBM CSet++, -cpp for Zortech C++, -P for Borland C++, etc.).
None of these approaches have any striking technical superiority to the others; we generally use
whichever technique is preferred by our customer (again, these issues are dominated by business
considerations, not by technical considerations).
If you already have a convention, use it. If not, and if you don't need your editor to distinguish
between C and C++ files, simply use .h. Otherwise use whatever the editor wants, such as .H, .hh,
or .hpp.
We've tended to use either .hpp or .h for our C++ header files.
[25.9] Are there any lint-like guidelines for C++?
Yes, there are some practices which are generally considered dangerous. However none of these
are universally "bad," since situations arise when even the worst of these is needed:
• A class Fred's assignment operator should return *this as a Fred& (allows chaining of
assignments)
• A class with any virtual functions ought to have a virtual destructor
• A class with any of {destructor, assignment operator, copy constructor} generally needs all
three
• A class Fred's copy constructor and assignment operator should have const in the
parameter: respectively Fred::Fred(const Fred&) and Fred& Fred::operator= (const Fred&)
• When initializing an object's member objects in the constructor, always use initialization lists
rather than assignment. The performance difference for user-defined classes can be
substantial (3x!)
• Assignment operators should make sure that self assignment does nothing, otherwise you
may have a disaster. In some cases, this may require you to add an explicit test to your
assignment operators.
• In classes that define both += and +, a += b and a = a + b should generally do the same
thing; ditto for the other identities of built-in/intrinsic types (e.g., a += 1 and ++a; p[i] and
*(p+i); etc). This can be enforced by writing the binary operations using the op= forms.
E.g.,
Fred operator+ (const Fred& a, const Fred& b)
{
Fred ans = a;
ans += b;
return ans;
}
This way the "constructive" binary operators don't even need to be friends. But it is
sometimes possible to more efficiently implement common operations (e.g., if class Fred is
actually std::string, and += has to reallocate/copy string memory, it may be better to know
the eventual length from the beginning).
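Here is a sketch that puts several of these guidelines into one hypothetical class Fred. Its members (an int and a std::string) would be handled correctly by the compiler-generated copy constructor and assignment operator. Those two are written out here only to show the recommended signatures, the self-assignment test, and the return of *this.

```cpp
#include <string>

// Hypothetical Fred, used only to show the guidelines together.
class Fred {
public:
    explicit Fred(int value = 0, const std::string& label = "")
        : value_(value), label_(label) { }           // initialization list, not assignment

    Fred(const Fred& other)                           // const in the parameter
        : value_(other.value_), label_(other.label_) { }

    Fred& operator=(const Fred& other)                // returns *this as a Fred&
    {
        if (this != &other) {                         // self-assignment does nothing
            value_ = other.value_;
            label_ = other.label_;
        }
        return *this;
    }

    virtual ~Fred() { }                               // has virtual functions, so virtual dtor

    virtual int value() const { return value_; }

    Fred& operator+=(const Fred& other)
    {
        value_ += other.value_;
        return *this;
    }

private:
    int value_;
    std::string label_;
};

// operator+ is written in terms of +=, so a += b and a = a + b agree,
// and it does not need to be a friend.
Fred operator+(const Fred& a, const Fred& b)
{
    Fred ans = a;
    ans += b;
    return ans;
}
```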
[25.10] Why do people worry so much about pointer casts and/or reference casts?
Because they're evil! (Use them sparingly and with great care.)
For some reason, programmers are sloppy in their use of pointer casts. They cast this to that all
over the place, then they wonder why things don't quite work right. Here's the worst thing: when
the compiler gives them an error message, they add a cast to "shut the compiler up," then they
"test it" to see if it seems to work. If you have a lot of pointer casts or reference casts, read on.
The compiler will often be silent when you're doing pointer-casts and/or reference casts. Pointer-
casts (and reference-casts) tend to shut the compiler up. I think of them as a filter on error
messages: the compiler wants to complain because it sees you're doing something stupid, but it
also sees that it's not allowed to complain due to your pointer-cast, so it drops the error message
into the bit-bucket. It's like putting duct tape on the compiler's mouth: it's trying to tell you
something important, but you've intentionally shut it up.
A pointer-cast says to the compiler, "Stop thinking and start generating code; I'm smart, you're
dumb; I'm big, you're little; I know what I'm doing so just pretend this is assembly language and
generate the code." The compiler pretty much blindly generates code when you start casting: you
are taking control of (and responsibility for!) the outcome. The compiler and the language reduce (and
in some cases eliminate!) the guarantees you get as to what will happen. You're on your own.
By way of analogy, even if it's legal to juggle chainsaws, it's stupid. If something goes wrong, don't
bother complaining to the chainsaw manufacturer — you did something they didn't guarantee would
work. You're on your own.
(To be completely fair, the language does give you some guarantees when you cast, at least in a
limited subset of casts. For example, it's guaranteed to work as you'd expect if the cast happens to
be from an object-pointer (a pointer to a piece of data, as opposed to a pointer-to-function or
pointer-to-member) to type void* and back to the same type of object-pointer. But in a lot of cases
you're on your own.)
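Here is a sketch of that guaranteed case, assuming a hypothetical struct Point. Converting an object pointer to void* and back to the same type yields the original pointer. Casting the void* to any other type would put you back on your own.

```cpp
// Hypothetical data type, used only for illustration.
struct Point { int x; int y; };

// Object pointer -> void* -> the same object-pointer type: this round trip
// is guaranteed to give back the original pointer value.
Point* roundTrip(Point* p)
{
    void* erased = p;                      // implicit conversion to void*
    return static_cast<Point*>(erased);    // back to the *same* type
}
```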
[28.1] What is value and/or reference semantics, and which is best in C++?
With reference semantics, assignment is a pointer-copy (i.e., a reference). Value (or "copy")
semantics mean assignment copies the value, not just the pointer. C++ gives you the choice: use
the assignment operator to copy the value (copy/value semantics), or use a pointer-copy to copy a
pointer (reference semantics). C++ allows you to override the assignment operator to do anything
your heart desires; however, the default (and most common) choice is to copy the value.
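Here is a sketch of the difference, assuming a hypothetical struct Name. Assigning objects copies the value. Assigning pointers copies only the pointer, so both pointers then refer to the same object.

```cpp
#include <string>

// Hypothetical type used to contrast the two kinds of assignment.
struct Name { std::string text; };

// Value semantics: assignment copies the value, so a and b stay independent.
std::string valueSemantics()
{
    Name a{"Fred"};
    Name b{"Wilma"};
    b = a;                 // copies the value
    b.text = "Barney";     // changes only b
    return a.text;         // still "Fred"
}

// Reference semantics: assignment copies a pointer, so both pointers
// end up referring to the same object.
std::string referenceSemantics()
{
    Name fred{"Fred"};
    Name wilma{"Wilma"};
    Name* a = &fred;
    Name* b = &wilma;
    b = a;                 // copies the pointer, not the value
    b->text = "Barney";    // changes the one object a and b now share
    return a->text;        // now "Barney"
}
```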
Pros of reference semantics: flexibility and dynamic binding (you get dynamic binding in C++ only
when you pass by pointer or pass by reference, not when you pass by value).
Pros of value semantics: speed. "Speed" seems like an odd benefit for a feature that requires an
object (vs. a pointer) to be copied, but the fact of the matter is that one usually accesses an object
more than one copies the object, so the cost of the occasional copies is (usually) more than offset
by the benefit of having an actual object rather than a pointer to an object.
There are three cases when you have an actual object as opposed to a pointer to an object: local
objects, global/static objects, and fully contained member objects in a class. The most important of
these is the last ("composition").
More info about copy-vs-reference semantics is given in the next FAQs. Please read them all to get
a balanced perspective. The first few have intentionally been slanted toward value semantics, so if
you only read the first few of the following FAQs, you'll get a warped perspective.
Assignment has other issues (e.g., shallow vs. deep copy) which are not covered here.
virtual data allows a derived class to change the exact class of a base class's member object. virtual
data isn't strictly "supported" by C++, however it can be simulated in C++. It ain't pretty, but it
works.
To simulate virtual data in C++, the base class must have a pointer to the member object, and the
derived class must provide a new object to be pointed to by the base class's pointer. The base class
would also have one or more normal constructors that provide their own referent (again via new),
and the base class's destructor would delete the referent.
For example, class Stack might have an Array member object (using a pointer), and derived class
StretchableStack might override the base class member data from Array to StretchableArray. For
this to work, StretchableArray would have to inherit from Array, so Stack would have an Array*.
Stack's normal constructors would initialize this Array* with a new Array, but Stack would also have
a (possibly protected) constructor that would accept an Array* from a derived class.
StretchableStack's constructor would provide a new StretchableArray to this special constructor.
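Here is a minimal sketch of this arrangement. It assumes simplified Array and StretchableArray classes built on std::vector, since the real classes are not shown. Stack owns an Array* that its normal constructor fills with new Array. The protected constructor lets StretchableStack supply a new StretchableArray instead, and Stack's destructor deletes whichever one it received.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical classes: Array has a fixed size, StretchableArray grows on demand.
class Array {
public:
    explicit Array(std::size_t size = 4) : items_(size) { }
    virtual ~Array() { }
    virtual void set(std::size_t i, int value) { items_.at(i) = value; }  // throws if out of range
    int get(std::size_t i) const { return items_.at(i); }
protected:
    std::vector<int> items_;
};

class StretchableArray : public Array {
public:
    void set(std::size_t i, int value) override
    {
        if (i >= items_.size()) items_.resize(i + 1);   // stretch instead of failing
        items_[i] = value;
    }
};

class Stack {
public:
    Stack() : array_(new Array()), top_(0) { }     // normal ctor supplies its own referent
    virtual ~Stack() { delete array_; }             // base class deletes the referent
    Stack(const Stack&) = delete;                   // copying is omitted from this sketch
    Stack& operator=(const Stack&) = delete;

    void push(int value) { array_->set(top_, value); ++top_; }
    int pop()
    {
        if (top_ == 0) throw std::out_of_range("Stack::pop on empty stack");
        return array_->get(--top_);
    }
    std::size_t size() const { return top_; }

protected:
    explicit Stack(Array* array) : array_(array), top_(0) { }  // for derived classes

private:
    Array* array_;       // the simulated "virtual data"
    std::size_t top_;
};

class StretchableStack : public Stack {
public:
    StretchableStack() : Stack(new StretchableArray()) { }
};
```

Every push and pop on a plain Stack now goes through a pointer and a virtual call, even though a plain Stack never needs a StretchableArray.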
Pros:
Cons:
In other words, we succeeded at making our job easier as the implementer of StretchableStack, but
all our users pay for it. Unfortunately the extra overhead was imposed on both users of
StretchableStack and on users of Stack.
Please read the rest of this section. (You will not get a balanced perspective without the others.)
[28.3] What's the difference between virtual data and dynamic data?
The easiest way to see the distinction is by an analogy with virtual functions: A virtual member
function means the declaration (signature) must stay the same in derived classes, but the definition
(body) can be overridden. The overriddenness of an inherited member function is a static property
of the derived class; it doesn't change dynamically throughout the life of any particular object, nor
is it possible for distinct objects of the derived class to have distinct definitions of the member
function.
Now go back and re-read the previous paragraph, but make these substitutions:
Another way to look at this is to distinguish "per-object" member functions from "dynamic" member
functions. A "per-object" member function is a member function that is potentially different in any
given instance of an object, and could be implemented by burying a function pointer in the object;
this pointer could be const, since the pointer will never be changed throughout the object's life. A
"dynamic" member function is a member function that will change dynamically over time; this could
also be implemented by a function pointer, but the function pointer would not be const.
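Here is a sketch of the two kinds of member "function" using a plain function pointer. It assumes hypothetical behaviors doubler and negator. The per-object version stores a const pointer fixed at construction, while the dynamic version's pointer can be reassigned.

```cpp
// Hypothetical behaviors stored in objects through a function pointer.
typedef int (*Behavior)(int);

int doubler(int x) { return 2 * x; }
int negator(int x) { return -x; }

// "Per-object": chosen when the object is created and never changed
// afterwards, so the pointer itself is const.
class PerObject {
public:
    explicit PerObject(Behavior b) : behavior_(b) { }
    int apply(int x) const { return behavior_(x); }
private:
    Behavior const behavior_;
};

// "Dynamic": the pointer is not const, so the behavior can change over time.
class Dynamic {
public:
    explicit Dynamic(Behavior b) : behavior_(b) { }
    void change(Behavior b) { behavior_ = b; }
    int apply(int x) const { return behavior_(x); }
private:
    Behavior behavior_;
};
```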
Extending the analogy, this gives us three distinct concepts for data members:
• virtual data: the definition (class) of the member object is overridable in derived classes
provided its declaration ("type") remains the same, and this overriddenness is a static
property of the derived class
• per-object-data: any given object of a class can instantiate a different conformal (same
type) member object upon initialization (usually a "wrapper" object), and the exact class of
the member object is a static property of the object that wraps it
• dynamic-data: the member object's exact class can change dynamically over time
The reason they all look so much the same is that none of this is "supported" in C++. It's all merely
"allowed," and in this case, the mechanism for faking each of these is the same: a pointer to a
(probably abstract) base class. In a language that made these "first class" abstraction mechanisms,
the difference would be more striking, since they'd each have a different syntactic variant.
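Here is a sketch of per-object data versus dynamic data, assuming a hypothetical abstract base Shape. In both wrappers the member object is reached through a pointer to the abstract base. The only difference is whether that pointer can be reseated after construction.

```cpp
#include <memory>

// Hypothetical abstract base for the member object.
class Shape {
public:
    virtual ~Shape() { }
    virtual int sides() const = 0;
};
class Triangle : public Shape { public: int sides() const override { return 3; } };
class Square   : public Shape { public: int sides() const override { return 4; } };

// Per-object data: each wrapper picks the exact class when it is constructed;
// the const pointer means that choice never changes during the object's life.
class PerObjectWrapper {
public:
    explicit PerObjectWrapper(Shape* s) : shape_(s) { }
    int sides() const { return shape_->sides(); }
private:
    const std::unique_ptr<Shape> shape_;
};

// Dynamic data: the member object's exact class can change over time.
class DynamicWrapper {
public:
    explicit DynamicWrapper(Shape* s) : shape_(s) { }
    void replace(Shape* s) { shape_.reset(s); }
    int sides() const { return shape_->sides(); }
private:
    std::unique_ptr<Shape> shape_;
};
```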
[28.4] Should I normally use pointers to freestore allocated objects for my data
members, or should I use "composition"?
Composition.
Your member objects should normally be "contained" in the composite object (but not always;
"wrapper" objects are a good example of where you want a pointer/reference; also the N-to-1-uses-
a relationship needs something like a pointer/reference).
There are three reasons why fully contained member objects ("composition") have better
performance than pointers to freestore-allocated member objects:
• Extra layer of indirection every time you need to access the member object
• Extra freestore allocations (new in constructor, delete in destructor)
• Extra dynamic binding (reason given below)
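Here is a sketch contrasting the two layouts. It assumes hypothetical classes that each hold a std::string name. The by-pointer version shows the extra new, delete, and indirection. The extra dynamic binding only arises when the pointed-to type has virtual functions, which std::string does not.

```cpp
#include <string>

// Composition: the std::string lives inside the object itself.
class ComposedPerson {
public:
    explicit ComposedPerson(const std::string& n) : name_(n) { }
    const std::string& name() const { return name_; }   // direct access
private:
    std::string name_;
};

// By pointer: the std::string lives on the freestore, so construction
// needs a new, destruction needs a delete, and each access adds an indirection.
class PointerPerson {
public:
    explicit PointerPerson(const std::string& n) : name_(new std::string(n)) { }
    ~PointerPerson() { delete name_; }
    PointerPerson(const PointerPerson&) = delete;             // avoid a double delete
    PointerPerson& operator=(const PointerPerson&) = delete;
    const std::string& name() const { return *name_; }        // extra indirection
private:
    std::string* name_;
};
```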
[28.5] What are relative costs of the 3 performance hits associated with allocating
member objects from the freestore?
Thus fully-contained member objects allow significant optimizations that wouldn't be possible under
the "member objects-by-pointer" approach. This is the main reason that languages which enforce
reference-semantics have "inherent" performance challenges.
Note: Please read the next three FAQs to get a balanced perspective!
Occasionally...
When the object is referenced via a pointer or a reference, a call to a virtual function cannot be
inlined, since the call must be resolved dynamically. Reason: the compiler can't know which actual
code to call until run-time (i.e., dynamically), since the code may be from a derived class that was
created after the caller was compiled.
Therefore the only time an inline virtual call can be inlined is when the compiler knows the "exact
class" of the object which is the target of the virtual function call. This can happen only when the
compiler has an actual object rather than a pointer or reference to an object. I.e., either with a local
object, a global/static object, or a fully contained object inside a composite.
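Here is a sketch of the two situations, assuming a hypothetical Base/Derived pair. Whether the compiler actually inlines the call on the local object is up to the compiler. The point is that it has enough information to do so, which it generally lacks when all it has is a reference.

```cpp
// Hypothetical hierarchy used to show when a virtual call could be inlined.
class Base {
public:
    virtual ~Base() { }
    virtual int id() const { return 1; }   // defined in the class, so implicitly inline
};

class Derived : public Base {
public:
    int id() const override { return 2; }
};

// Through a reference: the exact class is not known here, so the call is
// dispatched at run time and generally cannot be inlined.
int viaReference(const Base& b) { return b.id(); }

// On an actual local object: the exact class is known to be Base, so the
// compiler may bind the call statically and inline it.
int viaLocalObject()
{
    Base b;
    return b.id();
}
```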
Note that the difference between inlining and non-inlining is normally much more significant than
the difference between a regular function call and a virtual function call. For example, the difference
between a regular function call and a virtual function call is often just two extra memory references,
but the difference between an inline function and a non-inline function can be as much as an order
of magnitude (for zillions of calls to insignificant member functions, loss of inlining virtual functions
can result in 25X speed degradation! [Doug Lea, "Customization in C++," proc Usenix C++ 1990]).
A practical consequence of this insight: don't get bogged down in the endless debates (or sales
tactics!) of compiler/language vendors who compare the cost of a virtual function call on their
language/compiler with the same on another language/compiler. Such comparisons are largely
meaningless when compared with the ability of the language/compiler to "inline expand" member
function calls. I.e., many language implementation vendors make a big stink about how good their
dispatch strategy is, but if these implementations don't inline member function calls, the overall
system performance will be poor, since it is inlining, not dispatching, that has the greatest
performance impact.
Note: Please read the next two FAQs to see the other side of this coin!
Wrong.
Reference semantics are A Good Thing. We can't live without pointers. We just don't want our software
to be One Gigantic Rat's Nest Of Pointers. In C++, you can pick and choose where you want
reference semantics (pointers/references) and where you'd like value semantics (where objects
physically contain other objects etc). In a large system, there should be a balance. However if you
implement absolutely everything as a pointer, you'll get enormous speed hits.
Objects near the problem skin are larger than higher level objects. The identity of these "problem
space" abstractions is usually more important than their "value." Thus reference semantics should
be used for problem-space objects.
Note that these problem space objects are normally at a higher level of abstraction than the
solution space objects, so the problem space objects normally have a relatively lower frequency of
interaction. Therefore C++ gives us an ideal situation: we choose reference semantics for objects
that need unique identity or that are too large to copy, and we can choose value semantics for the
others. Thus the highest frequency objects will end up with value semantics, since we install
flexibility where it doesn't hurt us (only), and we install performance where we need it most!
These are some of the many issues that come into play with real OO design. OO/C++ mastery takes
time and high quality training. If you want a powerful tool, you've got to invest.
[28.8] Does the poor performance of reference semantics mean I should pass-by-value?
Nope.
The previous FAQs were talking about member objects, not parameters. Generally, objects that are
part of an inheritance hierarchy should be passed by reference or by pointer, not by value, since
only then do you get the (desired) dynamic binding (pass-by-value doesn't mix with inheritance,
since larger derived class objects get "sliced" when passed by value as a base class object).
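Here is a sketch of slicing, assuming a hypothetical Animal/Dog hierarchy. Passing a Dog by value copies only its Animal part, so dynamic binding is lost. Passing by reference keeps it.

```cpp
#include <string>

// Hypothetical hierarchy used to show slicing.
class Animal {
public:
    virtual ~Animal() { }
    virtual std::string sound() const { return "..."; }
};

class Dog : public Animal {
public:
    std::string sound() const override { return "Woof"; }
};

// Pass by value: only the Animal part of a Dog is copied ("sliced"),
// so the call uses Animal::sound().
std::string byValue(Animal a) { return a.sound(); }

// Pass by reference: no copy, so dynamic binding reaches Dog::sound().
std::string byReference(const Animal& a) { return a.sound(); }
```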
Unless compelling reasons are given to the contrary, member objects should be by value and
parameters should be by reference. The discussion in the previous few FAQs indicates some of the
"compelling reasons" for when member objects should be by reference.
• You must use your C++ compiler when compiling main() (e.g., for static initialization)
• Your C++ compiler should direct the linking process (e.g., so it can get its special libraries)
• Your C and C++ compilers probably need to come from the same vendor and have compatible
versions (e.g., so they have the same calling conventions)
In addition, you'll need to read the rest of this section to find out how to make your C functions
callable by C++ and/or your C++ functions callable by C.
BTW there is another way to handle this whole thing: compile all your code (even your C-style
code) using a C++ compiler. That pretty much eliminates the need to mix C and C++, plus it will
cause you to be more careful in your C-style code, and you may (hopefully!) discover some bugs in it.
The down-side is that you'll need to update your C-style code in certain ways, basically because the
C++ compiler is more careful/picky than your C compiler. The point is that the effort required to
clean up your C-style code may be less than the effort required to mix C and C++, and as a bonus
you get cleaned up C-style code. Obviously you don't have much of a choice if you're not able to
alter your C-style code (e.g., if it's from a third-party).
To #include a standard header file (such as <cstdio>), you don't have to do anything unusual. E.g.,
#include <cstdio>  // Nothing unusual in #include line
int main()
{
  std::printf("Hello world\n");  // Nothing unusual in the call either
}
If you think the std:: part of the std::printf() call is unusual, then the best thing to do is "get over
it." In other words, it's the standard way to use names in the standard library, so you might as well
start getting used to it now.
However if you are compiling C code using your C++ compiler, you don't want to have to tweak all
these calls from printf() to std::printf(). Fortunately in this case the C code will use the old-style
header <stdio.h> rather than the new-style header <cstdio>, and the magic of namespaces will
take care of everything else:
#include <stdio.h>  /* Old-style header, as C code would use */
int main()
{
  printf("Hello world\n");  /* Nothing unusual in the call either */
}
Final comment: if you have C headers that are not part of the standard library, we have somewhat
different guidelines for you. There are two cases: either you can't change the header, or you can
change the header.
If you are including a C header file that isn't provided by the system, you may need to wrap the
#include line in an extern "C" { /*...*/ } construct. This tells the C++ compiler that the functions
declared in the header file are C functions.
extern "C" {
// Get declaration for f(int i, char c, float x)
#include "my-C-code.h"
}
int main()
{
f(7, 'x', 3.14); // Note: nothing unusual in the call
}
Note: Somewhat different guidelines apply for C headers provided by the system (such as
<cstdio>) and for C headers that you can change.
[29.4] How can I modify my own C header files so it's easier to #include them in C++
code?
If you are including a C header file that isn't provided by the system, and if you are able to change
the C header, you should strongly consider adding the extern "C" {...} logic inside the header to
make it easier for C++ users to #include it into their C++ code. Since a C compiler won't
understand the extern "C" construct, you must wrap the extern "C" { and } lines in an #ifdef so they
won't be seen by normal C compilers.
Step #1: Put the following lines at the very top of your C header file (note: the symbol __cplusplus
is #defined if/only-if the compiler is a C++ compiler):
#ifdef __cplusplus
extern "C" {
#endif
Step #2: Put the following lines at the very bottom of your C header file:
#ifdef __cplusplus
}
#endif
Now you can #include your C header without any extern "C" nonsense in your C++ code:
#include "my-C-code.h"  // Note: nothing unusual in #include line
int main()
{
  f(7, 'x', 3.14);  // Note: nothing unusual in the call
}
Note: Somewhat different guidelines apply for C headers provided by the system (such as
<cstdio>) and for C headers that you can't change.
Note: #define macros are evil in 4 different ways: evil#1, evil#2, evil#3, and evil#4. But they're
still useful sometimes. Just wash your hands after using them.
[29.5] How can I call a non-system C function f(int,char,float) from my C++ code?
If you have an individual C function that you want to call, and for some reason you don't have or
don't want to #include a C header file in which that function is declared, you can declare the
individual C function in your C++ code using the extern "C" syntax. Naturally you need to use the full
function prototype:
extern "C" {
void f(int i, char c, float x);
int g(char* s, const char* s2);
double sqrtOfSumOfSquares(double a, double b);
}
After this you simply call the function just as if it was a C++ function:
int main()
{
f(7, 'x', 3.14); // Note: nothing unusual in the call
}
[29.6] How can I create a C++ function f(int,char,float) that is callable by my C code?
The C++ compiler must be told, via the extern "C" construct, that f(int,char,float) is going to be
called from code compiled by a C compiler.
The extern "C" line tells the compiler that the external information sent to the linker should use C
calling conventions and name mangling (e.g., preceded by a single underscore). Since name
overloading isn't supported by C, you can't make several overloaded functions simultaneously
callable by a C program.
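Here is a sketch of the C++ side. It assumes a hypothetical body for f that just records its arguments. The extern "C" declaration gives f C linkage, and the later definition picks up that linkage from the earlier declaration.

```cpp
// C++ side: declaring f with extern "C" gives it C linkage, so its name is
// not mangled the C++ way and a call compiled by a C compiler links to it.
extern "C" void f(int i, char c, float x);

// Hypothetical state so the example can be checked.
static int   lastI = 0;
static char  lastC = 0;
static float lastX = 0.0f;

// The definition keeps the C linkage from the declaration above.
void f(int i, char c, float x)
{
    lastI = i;
    lastC = c;
    lastX = x;
}
```

On the C side, a plain prototype such as void f(int i, char c, float x); is enough to call it.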
[29.7] Why is the linker giving errors for C/C++ functions being called from C++/C
functions?
If you didn't get your extern "C" right, you'll sometimes get linker errors rather than compiler errors.
This is due to the fact that C++ compilers usually "mangle" function names (e.g., to support
function overloading) differently than C compilers.
Here's an example (for info on extern "C", see the previous two FAQs).
Fred.h:
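#ifndef FRED_H
#define FRED_H
#ifdef __cplusplus
  class Fred {
  public:
    Fred();
    void wilma(int);
  private:
    int a_;
  };
#else
  typedef
    struct Fred
      Fred;
#endif
#ifdef __cplusplus
extern "C" {
#endif
/* Functions shared by the C and C++ code (names are illustrative):
   c_function() is written in C; cplusplus_callback_function() is written in C++ */
void c_function(Fred* fred);
Fred* cplusplus_callback_function(Fred* fred);
#ifdef __cplusplus
}
#endif
#endif /*FRED_H*/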
Fred.cpp:
#include "Fred.h"
Fred::Fred() : a_(0) { }
void Fred::wilma(int a) { }
Fred* cplusplus_callback_function(Fred* fred)
{
  fred->wilma(123);
  return fred;
}
main.cpp:
#include "Fred.h"
int main()
{
Fred fred;
c_function(&fred);
return 0;
}
c-function.c:
/* This is C code */
#include "Fred.h"
void c_function(Fred* fred)
{
  cplusplus_callback_function(fred);
}
Passing pointers to C++ objects to/from C functions will fail if you pass and get back something
that isn't exactly the same pointer. For example, don't pass a base class pointer and receive back a
derived class pointer, since your C compiler won't understand the pointer conversions necessary to
handle multiple and/or virtual inheritance.
Sometimes.
(For basic info on passing C++ objects to/from C functions, read the previous FAQ).
You can safely access a C++ object's data from a C function if the C++ class:
If the C++ class has any base classes at all (or if any fully contained subobjects have base classes),
accessing the data will technically be non-portable, since class layout under inheritance isn't
imposed by the language. However in practice, all C++ compilers do it the same way: the base
class object appears first (in left-to-right order in the event of multiple inheritance), and member
objects follow.
Furthermore, if the class (or any base class) contains any virtual functions, almost all C++
compilers put a void* into the object either at the location of the first virtual function or at the very
beginning of the object. Again, this is not required by the language, but it is the way "everyone"
does it.
If the class has any virtual base classes, it is even more complicated and less portable. One
common implementation technique is for objects to contain an object of the virtual base class (V)
last (regardless of where V shows up as a virtual base class in the inheritance hierarchy). The rest
of the object's parts appear in the normal order. Every derived class that has V as a virtual base
class actually has a pointer to the V part of the final object.
[29.10] Why do I feel like I'm "further from the machine" in C++ as opposed to C?
As an OO programming language, C++ allows you to model the problem domain itself, which allows
you to program in the language of the problem domain rather than in the language of the solution
domain.
One of C's great strengths is the fact that it has "no hidden mechanism": what you see is what you
get. You can read a C program and "see" every clock cycle. This is not the case in C++; old line C
programmers (such as many of us once were) are often ambivalent (can you say, "hostile"?) about
this feature. However after they've made the transition to OO thinking, they often realize that
although C++ hides some mechanism from the programmer, it also provides a level of abstraction
and economy of expression which lowers maintenance costs without destroying run-time
performance.
Naturally you can write bad code in any language; C++ doesn't guarantee any particular level of
quality, reusability, abstraction, or any other measure of "goodness."
C++ doesn't try to make it impossible for bad programmers to write bad programs; it enables
reasonable developers to create superior software.
Yep.
The type of this function is different depending on whether it is an ordinary function or a non-static
member function of some class:
Note: if it's a static member function of class Fred, its type is the same as if it was an ordinary
function: "int (*)(char,float)".
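Here is a sketch of the three cases. It assumes f is declared as int f(char, float), both as an ordinary function and as a member of a hypothetical class Fred. The function bodies are placeholders.

```cpp
// Ordinary function with the signature discussed above (placeholder body).
int f(char a, float b) { return a + static_cast<int>(b); }

class Fred {
public:
    int f(char a, float b) { return a + static_cast<int>(b) + offset; }
    static int s(char a, float b) { return a + static_cast<int>(b); }
    int offset = 100;
};

// Ordinary function: type is int (*)(char, float)
int (*pf)(char, float) = &f;

// Non-static member function: type is int (Fred::*)(char, float)
int (Fred::*pmf)(char, float) = &Fred::f;

// Static member function: same type as an ordinary function
int (*ps)(char, float) = &Fred::s;
```

Calling through pmf requires an object, as in (fred.*pmf)(1, 2.0f).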
[30.2] How do I pass a pointer to member function to a signal handler, X event callback,
etc?
Don't.
Because a member function is meaningless without an object to invoke it on, you can't do this
directly (if the X Window System were rewritten in C++, it would probably pass references to
objects around, not just pointers to functions; naturally the objects would embody the required
function and probably a whole lot more).
As a patch for existing software, use a top-level (non-member) function as a wrapper which takes
an object obtained through some other technique (held in a global, perhaps). The top-level function
would apply the desired member function against the global object.
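#include <csignal>
class Fred {
public:
  void memberFunction() { /* ... */ }
  static void staticMemberFunction(int) { /* ... */ }  // A static member function can handle it
  // ...
};
// Wrapper: a top-level function that applies the member function to an
// object held in a global (the global's name is illustrative)
Fred* object_which_will_handle_signal = 0;
void Fred_memberFunction_wrapper(int)
{
  object_which_will_handle_signal->memberFunction();
}
int main()
{
  Fred fred;
  object_which_will_handle_signal = &fred;
  /* std::signal(SIGINT, Fred::memberFunction); */  // Can NOT do this
  std::signal(SIGINT, Fred_memberFunction_wrapper);  // OK
  std::signal(SIGINT, Fred::staticMemberFunction);   // Also OK
}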
Note: static member functions do not require an actual object to be invoked, so pointers-to-static-
member-functions are type compatible with regular pointers-to-functions.
[30.3] Why do I keep getting compile errors (type mismatch) when I try to use a member
function as an interrupt service routine?
This is a special case of the previous two questions, therefore read the previous two answers first.
Non-static member functions have a hidden parameter that corresponds to the this pointer. The this
pointer points to the instance data for the object. The interrupt hardware/firmware in the system is
not capable of providing the this pointer argument. You must use "normal" functions (non class
members) or static member functions as interrupt service routines.
One possible solution is to use a static member as the interrupt service routine and have that
function look somewhere to find the instance/member pair that should be called on interrupt. Thus
the effect is that a member function is invoked on an interrupt, but for technical reasons you need
to call an intermediate function first.
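Here is a sketch of that approach, assuming a hypothetical Device class. The static member function isr() has no hidden this parameter, so it converts to an ordinary function pointer. It looks up a stored object pointer and pointer-to-member-function and forwards the call. A real interrupt handler would also need whatever signature and synchronization the platform requires, which this sketch ignores.

```cpp
// Hypothetical device class; all names are assumptions for illustration.
class Device {
public:
    typedef void (Device::*Handler)();

    // Remember the instance/member pair to call when the interrupt fires.
    static void registerHandler(Device* obj, Handler h) { target_ = obj; handler_ = h; }

    // Usable where a plain function is required: no hidden this parameter.
    static void isr()
    {
        if (target_ && handler_)
            (target_->*handler_)();
    }

    void onInterrupt() { ++interrupts; }
    int interrupts = 0;

private:
    static Device* target_;
    static Handler handler_;
};

Device* Device::target_ = nullptr;
Device::Handler Device::handler_ = nullptr;
```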
Long answer: In C++, member functions have an implicit parameter which points to the object (the
this pointer inside the member function). Normal C functions can be thought of as having a different
calling convention from member functions, so the types of their pointers (pointer-to-member-function
vs. pointer-to-function) are different and incompatible. C++ introduces a new type of
pointer, called a pointer-to-member, which can be invoked only by providing an object.
[30.5] How can I avoid syntax errors when calling a member function using a pointer-to-
member-function?
Two things: (1) use a typedef, and (2) use a #define macro.
class Fred {
public:
int f(char x, float y);
int g(char x, float y);
int h(char x, float y);
int i(char x, float y);
// ...
};
Here's the way you create the #define macro (normally I dislike #define macros, but this is one of
those rare cases where they actually improve the readability and writability of your code):
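Here is a sketch of both features. It assumes the hypothetical names FredMemFn for the typedef and CALL_MEMBER_FN for the macro, with placeholder bodies for two of Fred's member functions.

```cpp
// Hypothetical bodies for two of Fred's member functions.
class Fred {
public:
    int f(char x, float y) { return x + static_cast<int>(y); }
    int g(char x, float y) { return x - static_cast<int>(y); }
    // ...
};

// (1) typedef: names the pointer-to-member-function type once.
typedef int (Fred::*FredMemFn)(char x, float y);

// (2) #define macro: hides the awkward .* syntax and its required parentheses.
#define CALL_MEMBER_FN(object, ptrToMember) ((object).*(ptrToMember))

// Calls whichever Fred member function p points to, on the given object.
int userCode(Fred& fred, FredMemFn p)
{
    return CALL_MEMBER_FN(fred, p)('x', 3.14f);
}
```

A call such as userCode(fred, &Fred::g) then reads like an ordinary function call.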
I strongly recommend these features. In the real world, member function invocations are a lot more
complex than the simple example just given, and the difference in readability and writability is
significant. comp.lang.c++ has had to endure hundreds and hundreds of postings from confused
programmers who couldn't quite get the syntax right. Almost all these errors would have vanished
had they used these features.
Note: #define macros are evil in 4 different ways: evil#1, evil#2, evil#3, and evil#4. But they're
still useful sometimes. But you should still feel a vague sense of shame after using them.
Use the usual typedef and #define macro and you're 90% done.
class Fred {
public:
int f(char x, float y);
int g(char x, float y);
int h(char x, float y);
int i(char x, float y);
// ...
};
That makes calling one of the member functions on object "fred" straightforward:
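Here is a sketch of the array and the call. It assumes the same hypothetical typedef and macro names (FredMemFn, CALL_MEMBER_FN) and gives each member function a placeholder body that reports which one ran.

```cpp
// Hypothetical bodies: each member function just reports which one ran.
class Fred {
public:
    int f(char, float) { return 1; }
    int g(char, float) { return 2; }
    int h(char, float) { return 3; }
    int i(char, float) { return 4; }
};

typedef int (Fred::*FredMemFn)(char x, float y);
#define CALL_MEMBER_FN(object, ptrToMember) ((object).*(ptrToMember))

// The array of pointer-to-member-functions (the name memFns is assumed).
FredMemFn memFns[] = { &Fred::f, &Fred::g, &Fred::h, &Fred::i };

// Calls the member function selected by memFnNum on object fred.
int callByIndex(Fred& fred, int memFnNum)
{
    return CALL_MEMBER_FN(fred, memFns[memFnNum])('x', 3.14f);
}
```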
Note: #define macros are evil in 4 different ways: evil#1, evil#2, evil#3, and evil#4. But they're
still useful sometimes. Feel ashamed, feel guilty, but when an evil construct like a macro improves
your software, use it.
[31.1] Why should I use container classes rather than simple arrays?
Let's assume the best-case scenario: you're an experienced C programmer, which almost by
definition means you're pretty good at working with arrays. You know you can handle the
complexity; you've done it for years. And you're smart — the smartest on the team — the smartest
in the whole company. But even given all that, please read this entire FAQ and think very carefully
about it before you go into "business as usual" mode.
Fundamentally it boils down to this simple fact: C++ is not C. That means (this might be painful for
you!!) you'll need to set aside some of your hard-earned wisdom from your vast experience in C.
The two languages simply are different. The "best" way to do something in C is not always the same
as the "best" way to do it in C++. If you really want to program in C, please do yourself a favor and
program in C. But if you want to be really good at C++, then learn the C++ ways of doing things.
You may be a C guru, but if you're just learning C++, you're just learning C++ — you're a newbie.
(Ouch; I know that had to hurt. Sorry.)
1. Container classes make programmers more productive. So if you insist on using arrays while
those around you are willing to use container classes, you'll probably be less productive than
they are (even if you're smarter and more experienced than they are!).
2. Container classes let programmers write more robust code. So if you insist on using arrays
while those around you are willing to use container classes, your code will probably have more
bugs than their code (even if you're smarter and more experienced).
3. And if you're so smart and so experienced that you can use arrays as quickly and as safely as
they can use container classes, someone else will probably end up maintaining your code
and they'll probably introduce bugs. Or worse, you'll be the only one who can maintain your
code so management will yank you from development and move you into a full-time
maintenance role — just what you always wanted!
Here are some specific problems with arrays:
1. Subscripts don't get checked to see if they are out of bounds. (Note that some container
classes, such as std::vector, have methods to access elements with or without bounds
checking on subscripts.)
2. Arrays often require you to allocate memory from the heap (see below for examples), in
which case you must manually make sure the allocation is eventually deleted (even when
someone throws an exception). When you use container classes, this memory management
is handled automatically, but when you use arrays, you have to manually write a bunch of
code (and unfortunately that code is often subtle and tricky) to deal with this. For example,
in addition to writing the code that destroys all the objects and deletes the memory, arrays
often also force you to write an extra try block with a catch clause that destroys all the
objects, deletes the memory, then re-throws the exception. This is a real pain in the neck. When
using container classes, things are much easier.
3. You can't insert an element into the middle of the array, or even add one at the end, unless
you allocate the array via the heap, and even then you must allocate a new array and copy
the elements.
4. Container classes give you the choice of passing them by reference or by value, but arrays
do not give you that choice: they are always passed by reference. If you want to simulate
pass-by-value with an array, you have to manually write code that explicitly copies the
array's elements (possibly allocating from the heap), along with code to clean up the copy
when you're done with it. All this is handled automatically for you if you use a container
class.
5. If your function has a non-static local array (i.e., an "auto" array), you cannot return that
array, whereas the same is not true for objects of container classes.
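To make the list concrete, here is a small sketch using std::vector<int>. It shows how a container sidesteps problems 1, 3, 4 and 5, and it needs no manual new[]/delete[] (problem 2). The function names are made up for illustration:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Problem 5: a local vector can be returned (it is moved or copied out).
// Problem 3: push_back() grows the vector; no manual new[] or copying.
std::vector<int> makeSquares(int n)
{
  std::vector<int> v;
  for (int i = 0; i < n; ++i)
    v.push_back(i * i);
  return v;
}

// Problem 4: passing by value gives the function its own copy.
int sumThenClear(std::vector<int> v)
{
  int sum = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    sum += v[i];
  v.clear();               // affects only the local copy
  return sum;
}

// Problem 1: at() checks the subscript and throws std::out_of_range.
bool isBadIndex(const std::vector<int>& v, std::size_t index)
{
  try {
    (void)v.at(index);
    return false;
  } catch (const std::out_of_range&) {
    return true;
  }
}
```

Inserting into the middle is a single call as well, for example v.insert(v.begin() + 1, 42). The vector shifts the later elements and reallocates if it needs more room.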
Here are a few guidelines:
1. Different C++ containers have different strengths and weaknesses, but for any given job
there's usually one of them that is better — clearer, safer, easier/cheaper to maintain, and
often more efficient — than an array. For instance,
o You might consider a std::map instead of manually writing code for a lookup table.
o A std::map might also be used for a sparse array or sparse matrix.
o A std::vector is the most array-like of the standard container classes, but it also
offers various extra features such as bounds checking via the at() member function,
insertions/removals of elements, automatic memory management even if someone
throws an exception, ability to be passed both by reference and by value, etc.
o A std::string is almost always better than an array of char (you can think of a
std::string as a "container class" for the sake of this discussion).
2. Container classes aren't best for everything, and sometimes you may need to use arrays.
But that should be very rare, and if/when it happens:
o Please design your container class's public interface in such a way that the code that
uses the container class is unaware of the fact that there is an array inside.
o The goal is to "bury" the array inside a container class. In other words, make sure
there is a very small number of lines of code that directly touch the array (just your
own methods of your container class) so everyone else (the users of your container
class) can write code that doesn't depend on there being an array inside your
container class.
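As an illustration of "burying" an array, here is a sketch of a hypothetical fixed-capacity stack. The raw array is private, so only the class's own member functions ever touch it. The class name IntStack and the capacity of 16 are arbitrary choices for the example:

```cpp
#include <cstddef>
#include <stdexcept>

class IntStack {
public:
  IntStack() : size_(0) { }

  void push(int x)
  {
    if (size_ == Capacity) throw std::length_error("IntStack is full");
    data_[size_++] = x;
  }

  int pop()
  {
    if (size_ == 0) throw std::underflow_error("IntStack is empty");
    return data_[--size_];
  }

  bool empty() const { return size_ == 0; }
  std::size_t size() const { return size_; }

private:
  static const std::size_t Capacity = 16;  // arbitrary, for illustration
  int data_[Capacity];                     // the "buried" array
  std::size_t size_;                       // number of elements in use
};
```

Users see only push(), pop(), empty() and size(). The array could later be replaced by a std::vector without changing any calling code.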
To net this out, arrays really are evil. You may not think so if you're new to C++. But after you
write a big pile of code that uses arrays (especially if you make your code leak-proof and exception-
safe), you'll learn — the hard way. Or you'll learn the easy way by believing those who've already
done things like that. The choice is yours.
#include <string>
#include <map>
#include <iostream>
int main()
{
// age is a map from string to int
std::map<std::string, int, std::less<std::string> > age;
std::cout << "Fred is " << age["Fred"] << " years old\n";
}
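As written, the program prints 0 for Fred's age. The map starts out empty, and std::map::operator[] inserts a value-initialized entry (0 for int) when the key is missing. (The std::less<std::string> argument is also the default comparator, so it may be omitted.) When a lookup should not create entries, find() is the usual alternative. Here is a small sketch; the helper name ageOrMinusOne is hypothetical:

```cpp
#include <map>
#include <string>

typedef std::map<std::string, int> AgeMap;

// Looks up name without modifying the map; returns -1 if it is absent.
int ageOrMinusOne(const AgeMap& age, const std::string& name)
{
  AgeMap::const_iterator it = age.find(name);
  return it == age.end() ? -1 : it->second;
}
```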
You can't, but you can fake it pretty well. In C/C++ all arrays are homogeneous (i.e., the elements
are all the same type). However, with an extra layer of indirection you can give the appearance of a
heterogeneous container (a heterogeneous container is a container where the contained objects are
of different types).
The first case occurs when all objects you want to store in a container are publicly derived from a
common base class. You can then declare/define your container to hold pointers to the base class.
You indirectly store a derived class object in a container by storing the object's address as an
element in the container. You can then access objects in the container indirectly through the
pointers (enjoying polymorphic behavior). If you need to know the exact type of the object in the
container you can use dynamic_cast<> or typeid(). You'll probably need the Virtual Constructor
Idiom to copy a container of disparate object types. The downside of this approach is that it makes
memory management a little more problematic (who "owns" the pointed-to objects? if you delete
these pointed-to objects when you destroy the container, how can you guarantee that no one else
has a copy of one of these pointers? if you don't delete these pointed-to objects when you destroy
the container, how can you be sure that someone else will eventually do the deleting?). It also
makes copying the container more complex (may actually break the container's copying functions
since you don't want to copy the pointers, at least not when the container "owns" the pointed-to
objects).
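Here is a sketch of the first case. It uses a hypothetical Animal hierarchy and C++11's std::unique_ptr, which postdates this FAQ, as one possible answer to "who owns the pointed-to objects?". A virtual clone() member (the Virtual Constructor Idiom) lets a copy duplicate the pointees instead of the pointers:

```cpp
#include <memory>
#include <string>
#include <vector>

class Animal {
public:
  virtual ~Animal() { }
  virtual std::string sound() const = 0;
  virtual Animal* clone() const = 0;       // Virtual Constructor Idiom
};

class Dog : public Animal {
public:
  std::string sound() const override { return "woof"; }
  Dog* clone() const override { return new Dog(*this); }
};

class Cat : public Animal {
public:
  std::string sound() const override { return "meow"; }
  Cat* clone() const override { return new Cat(*this); }
};

// The container owns its elements: destroying it deletes every Animal.
typedef std::vector<std::unique_ptr<Animal> > Zoo;

// Copying must clone each pointee rather than copy the pointers.
Zoo deepCopy(const Zoo& src)
{
  Zoo dst;
  for (const auto& p : src)
    dst.push_back(std::unique_ptr<Animal>(p->clone()));
  return dst;
}
```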
The second case occurs when the object types are disjoint — they do not share a common base
class. The approach here is to use a handle class. The container is a container of handle objects (by
value or by pointer, your choice; by value is easier). Each handle object knows how to "hold on to"
(i.e., maintain a pointer to) one of the objects you want to put in the container. You can use either
a single handle class with several different types of pointers as instance data, or a hierarchy of
handle classes that shadow the various types you wish to contain (this requires the container to
hold pointers to the handle base class). The downside of this approach is that it opens up the handle class(es)
to maintenance every time you change the set of types that can be contained. The benefit is that
you can use the handle class(es) to encapsulate most of the ugliness of memory management and
object lifetime. Thus using handle objects may be beneficial even in the first case.
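Here is a sketch of the second case using a single handle class. Apple and Car are hypothetical unrelated types, and each Handle owns a private copy of exactly one of them. Because copying a Handle copies the held object, a std::vector<Handle> can be used by value:

```cpp
#include <string>
#include <utility>
#include <vector>

struct Apple { int grams; };
struct Car   { std::string make; };

class Handle {
public:
  explicit Handle(const Apple& a) : apple_(new Apple(a)), car_(nullptr) { }
  explicit Handle(const Car& c)   : apple_(nullptr), car_(new Car(c)) { }

  // Copying a handle copies the held object, so each handle owns its own.
  Handle(const Handle& other)
    : apple_(other.apple_ ? new Apple(*other.apple_) : nullptr),
      car_(other.car_ ? new Car(*other.car_) : nullptr) { }

  Handle& operator=(Handle other)   // copy-and-swap
  {
    std::swap(apple_, other.apple_);
    std::swap(car_, other.car_);
    return *this;
  }

  ~Handle() { delete apple_; delete car_; }

  bool holdsApple() const { return apple_ != nullptr; }

  std::string describe() const
  {
    if (apple_) return "apple of " + std::to_string(apple_->grams) + "g";
    return "car made by " + car_->make;
  }

private:
  Apple* apple_;   // exactly one of these two pointers is non-null
  Car*   car_;
};
```

Adding a third type means editing Handle. That is the maintenance cost mentioned above, but all the memory management stays inside this one class.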
The most important thing to remember is this: don't roll your own from scratch unless there is a
compelling reason to do so. In other words, instead of creating your own list or hashtable, use one
of the standard class templates such as std::vector<T> or std::list<T> or whatever.
Assuming you have a compelling reason to build your own container, here's how to handle inserting
(or accessing, changing, etc.) the elements.
To make the discussion concrete, I'll discuss how to insert an element into a linked list. This
example is just complex enough that it generalizes pretty well to things like vectors, hash tables,
binary trees, etc.
A linked list makes it easy to insert an element before the first or after the last element of the list, but
limiting ourselves to these would produce a library that is too weak (a weak library is almost worse
than no library). This answer will be a lot to swallow for novice C++'ers, so I'll give a couple of
options. The first option is easiest; the second and third are better.
1. Empower the List with a "current location," and member functions such as advance(),
backup(), atEnd(), atBegin(), getCurrElem(), setCurrElem(Elem), insertElem(Elem), and
removeElem(). Although this works in small examples, the notion of a current position
makes it difficult to access elements at two or more positions within the list (e.g., "for all
pairs x,y do the following...").
2. Remove the above member functions from List itself, and move them to a separate class,
ListPosition. ListPosition would act as a "current position" within a list. This allows multiple
positions within the same list. ListPosition would be a friend of class List, so List can hide its
innards from the outside world (else the innards of List would have to be publicized via
public member functions in List). Note: ListPosition can use operator overloading for things
like advance() and backup(), since operator overloading is syntactic sugar for normal
member functions.
3. Consider the entire iteration as an atomic event, and create a class template that embodies
this event. This enhances performance by allowing the public access member functions
(which may be virtual functions) to be avoided during the access, and this access often
occurs within an inner loop. Unfortunately the class template will increase the size of your
object code, since templates gain speed by duplicating code. For more, see [Koenig,
"Templates as interfaces," JOOP, 4, 5 (Sept 91)], and [Stroustrup, "The C++ Programming
Language Third Edition," under "Comparator"].
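Option 2 can be sketched as follows for a singly linked list of ints (C++11). Only construction, traversal and insertion are shown. ListPosition is a friend of List, so List's node type and head pointer stay private. ListPosition holds a pointer to the link that points at the current element, which lets insertElem() work at any position, including the front and the end:

```cpp
#include <cstddef>

class ListPosition;

class List {
public:
  List() : head_(nullptr) { }
  ~List()
  {
    while (head_ != nullptr) {
      Node* next = head_->next;
      delete head_;
      head_ = next;
    }
  }
  List(const List&) = delete;             // copying is out of scope here
  List& operator=(const List&) = delete;

  std::size_t size() const
  {
    std::size_t n = 0;
    for (const Node* p = head_; p != nullptr; p = p->next)
      ++n;
    return n;
  }

private:
  struct Node { int value; Node* next; };
  Node* head_;
  friend class ListPosition;              // only ListPosition sees the innards
};

// A "current position" within a List; several may exist at once.
class ListPosition {
public:
  explicit ListPosition(List& list) : link_(&list.head_) { }

  bool atEnd() const { return *link_ == nullptr; }
  int getCurrElem() const { return (*link_)->value; }   // requires !atEnd()
  void setCurrElem(int x) { (*link_)->value = x; }      // requires !atEnd()
  void advance() { link_ = &(*link_)->next; }           // requires !atEnd()

  // Inserts x before the current element (or appends it if atEnd());
  // afterwards this position refers to the new element.
  void insertElem(int x) { *link_ = new List::Node{x, *link_}; }

private:
  List::Node** link_;   // the pointer that points at the current element
};
```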
A template is a cookie-cutter that specifies how to cut cookies that all look pretty much the same
(although the cookies can be made of various kinds of dough, they'll all have the same basic
shape). In the same way, a class template is a cookie cutter for a description of how to build a
family of classes that all look basically the same, and a function template describes how to build a
family of similar looking functions.
Class templates are often used to build type safe containers (although this only scratches the
surface for how they can be used).
Repeating the above over and over for Array of float, of char, of std::string, of Array-of-std::string,
etc, will become tedious.
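The class template that replaces all those near-identical classes is not shown in this copy of the text. Here is a minimal sketch of what such an Array<T> might look like. It is a fixed-length array that, like a built-in array, does no bounds checking but frees its own memory. Copying is disabled with C++11's = delete to keep the sketch short:

```cpp
#include <cstddef>

template<class T>
class Array {
public:
  explicit Array(std::size_t len = 10) : len_(len), data_(new T[len]()) { }
  ~Array() { delete[] data_; }

  Array(const Array&) = delete;            // copying left out of this sketch
  Array& operator=(const Array&) = delete;

  std::size_t len() const { return len_; }
  T&       operator[](std::size_t i)       { return data_[i]; }
  const T& operator[](std::size_t i) const { return data_[i]; }

private:
  std::size_t len_;
  T* data_;        // value-initialized elements (0 for int)
};
```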
Unlike template functions, template classes (instantiations of class templates) need to be explicit
about the parameters over which they are instantiating:
int main()
{
Array<int> ai;
Array<float> af;
Array<char*> ac;
Array<std::string> as;
Array< Array<int> > aai;
}
Note the space between the two >'s in the last example. Without this space, a pre-C++11 compiler would see
a >> (right-shift) token instead of two >'s; since C++11 the space is no longer required.
If we also had to swap floats, longs, Strings, Sets, and FileSystems, we'd get pretty tired of coding
lines that look almost identical except for the type. Mindless repetition is an ideal job for a
computer, hence a function template:
template<class T>
void swap(T& x, T& y)
{
T tmp = x;
x = y;
y = tmp;
}
Every time we use swap() with a given type, the compiler will go to the above definition
and will create yet another "template function" as an instantiation of the above. E.g.,
#include <string>
int main()
{
int i = 1, j = 2; swap(i,j); // Instantiates a swap for int
float a = 1.0f, b = 2.0f; swap(a,b); // Instantiates a swap for float
char c = 'c', d = 'd'; swap(c,d); // Instantiates a swap for char
std::string s = "s", t = "t"; swap(s,t); // For std::string, argument-dependent lookup
// may select std::swap's std::string overload instead of our template
}
[31.8] How do I explicitly select which version of a function template should get called?
When you call a function template, the compiler tries to deduce the template type. Most of the time
it can do that successfully, but every once in a while you may want to help the compiler deduce the
right type — either because it cannot deduce the type at all, or perhaps because it would deduce
the wrong type.
For example, you might be calling a function template that doesn't have any parameters of its
template argument types, or you might want to force the compiler to do certain promotions on the
arguments before selecting the correct function template. In these cases you'll need to explicitly tell
the compiler which instantiation of the function template should be called.
Here is a sample function template where the template parameter T does not appear in the
function's parameter list. In this case the compiler cannot deduce the template parameter types
when the function is called.
template<class T>
void f()
{
// ...
}
To call this function with T being an int or a std::string, you could say:
#include <string>
void sample()
{
f<int>(); // type T will be int in this call
f<std::string>(); // type T will be std::string in this call
}
Here is another function whose template parameters appear in the function's list of formal
parameters (that is, the compiler can deduce the template type from the actual arguments):
template<class T>
void g(T x)
{
// ...
}
Now if you want to force the actual arguments to be promoted before the compiler deduces the
template type, you can use the above technique. E.g., if you simply called g(42) you would get
g<int>(42), but if you wanted to pass 42 to g<long>(), you could say this: g<long>(42). (Of
course you could also promote the parameter explicitly, such as either g(long(42)) or even g(42L),
but that ruins the example.)
Similarly if you said g("xyz") you'd end up calling g<const char*>(const char*), since the string literal
decays to a pointer, but if you wanted to call the std::string version of g<>() you could say
g<std::string>("xyz"). (Again you could also promote the argument, such as g(std::string("xyz")),
but that's another story.)
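These deduction rules can be checked at compile time. The sketch below uses a variant of g that returns its argument, so decltype can reveal which T was chosen; the FAQ's g returns void. It also includes an f with no parameters, so T has to be given explicitly:

```cpp
#include <string>
#include <type_traits>

template<class T>
T g(T x) { return x; }

template<class T>
T f() { return T(); }

// Deduced from the actual argument:
static_assert(std::is_same<decltype(g(42)), int>::value, "");
static_assert(std::is_same<decltype(g("xyz")), const char*>::value, "");

// Explicitly specified; the argument is converted to T before the call:
static_assert(std::is_same<decltype(g<long>(42)), long>::value, "");
static_assert(std::is_same<decltype(g<std::string>("xyz")), std::string>::value, "");

// No function parameters, so T cannot be deduced at all:
static_assert(std::is_same<decltype(f<int>()), int>::value, "");
```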
A parameterized type is a type that is parameterized over another type or some value. List<int> is
a type (List) parameterized over another type (int).
Not to be confused with "generality" (which just means avoiding solutions which are overly specific),
"genericity" means class templates.
STL ("Standard Template Library") is a library that consists mainly of (very efficient) container
classes, along with some iterators and algorithms to work with the contents of these containers.
Technically speaking the term "STL" is no longer meaningful since the classes provided by the STL
have been fully integrated into the standard library, along with other standard classes like
std::ostream, etc. Nonetheless many people still refer to the STL as if it were a separate thing, so
you might as well get used to hearing that term.
Since the classes that were part of the STL have become part of the standard library, your compiler
should provide these classes. If your compiler doesn't include these standard classes, either get an
updated version of your compiler or download a copy of the STL classes.
STL hacks for GCC-2.6.3 are part of the GNU libg++ package 2.6.2.1 or later (and they may be in
an earlier version as well). Thanks to Mike Lindner.
Also you may as well get used to some people using "STL" to include the standard string header,
"<string>", and others objecting to that usage.
[32.3] How can I find a Fred object in an STL container of Fred* such as
std::vector<Fred*>?
STL functions such as std::find_if() help you find a T element in a container of T's. But if you have a
container of pointers such as std::vector<Fred*>, these functions will enable you to find an
element that matches a given Fred* pointer, but they don't let you find an element that matches a
given Fred object.
The solution is to pass std::find_if() a "match" function object as its predicate. The following class
template lets you compare the objects on the other end of the dereferenced pointers.
template<class T>
class DereferencedEqual {
public:
DereferencedEqual(const T* p) : p_(p) { }
bool operator() (const T* p2) const { return *p_ == *p2; }
private:
const T* p_;
};
Now you can use this template to find an appropriate Fred object:
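Here is a minimal sketch of that usage. DereferencedEqual is repeated so the example stands alone. Fred here is a hypothetical class with an operator==, and findFred is an illustrative helper:

```cpp
#include <algorithm>
#include <vector>

template<class T>
class DereferencedEqual {
public:
  DereferencedEqual(const T* p) : p_(p) { }
  bool operator() (const T* p2) const { return *p_ == *p2; }
private:
  const T* p_;
};

class Fred {
public:
  explicit Fred(int id) : id_(id) { }
  bool operator==(const Fred& other) const { return id_ == other.id_; }
private:
  int id_;
};

// Returns the first pointer whose pointee equals target, or nullptr.
Fred* findFred(const std::vector<Fred*>& v, const Fred& target)
{
  std::vector<Fred*>::const_iterator it =
      std::find_if(v.begin(), v.end(), DereferencedEqual<Fred>(&target));
  return it == v.end() ? nullptr : *it;
}
```

Note that the match is by value. A Fred that is equal to, but not the same object as, an element still finds that element's pointer.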
[32.5] How can you tell if you have a dynamically typed C++ class library?
• Hint #1: when everything is derived from a single root class, usually Object.
• Hint #2: when the container classes (List, Stack, Set, etc) are non-templates.
• Hint #3: when the container classes (List, Stack, Set, etc) insert/extract elements as
pointers to Object. This lets you put an Apple into such a container, but when you get it out,
the compiler knows only that it is derived from Object, so you have to use a pointer cast to
convert it back to an Apple*, and you'd better pray a lot that it really is an Apple, because your
blood is on your own head.
You can make the pointer cast "safe" by using dynamic_cast, but this dynamic testing is just that:
dynamic. This coding style is the essence of dynamic typing in C++. You call a function that says
"convert this Object into an Apple or give me NULL if it's not an Apple," and you've got dynamic
typing: you don't know what will happen until run-time.
When you use templates to implement your containers, the C++ compiler can statically validate
90+% of an application's typing information (the figure "90+%" is apocryphal; some claim they
always get 100%, while those who need persistence get something less than 100% static type checking).
The point is: C++ gets genericity from templates, not from inheritance.
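Here is a small sketch contrasting the two styles, using a hypothetical Object/Apple/Orange hierarchy. In the dynamically typed style, dynamic_cast answers "is this really an Apple?" only at run time. In the template style, the element type is part of the container's type, so an Orange* cannot be put into an AppleBasket at all:

```cpp
#include <vector>

class Object { public: virtual ~Object() { } };
class Apple  : public Object { };
class Orange : public Object { };

// Dynamic typing: returns a null pointer if p does not point to an Apple.
Apple* asApple(Object* p) { return dynamic_cast<Apple*>(p); }

// Static typing: basket.push_back(&someOrange) would not compile.
typedef std::vector<Apple*> AppleBasket;
```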
CString s = "Text";
CStatusBar* p =
(CStatusBar*)AfxGetApp()->m_pMainWnd->GetDescendantWindow(AFX_IDW_STATUS_BAR);
p->SetPaneText(1, s);
This works with MFC v.1.00 which hopefully means it will work with other versions as well.
[33.4] How can I decompile an executable program back into C++ source code?
Here are a few of the many reasons this is not even remotely feasible:
• What makes you think the program was written in C++ to begin with?
• Even if you are sure it was originally written (at least partially) in C++, which one of the
gazillion C++ compilers produced it?
• Even if you know the compiler, which particular version of the compiler was used?
• Even if you know the compiler's manufacturer and version number, what compile-time
options were used?
• Even if you know the compiler's manufacturer and version number and compile-time options,
what third party libraries were linked-in, and what was their version?
• Even if you know all that stuff, most executables have had their debugging information
stripped out, so the resulting decompiled code will be totally unreadable.
• Even if you know everything about the compiler, manufacturer, version number, compile-
time options, third party libraries, and debugging information, the cost of writing a
decompiler that works with even one particular compiler and has even a modest success rate
at generating code would be a monumental effort, on a par with writing the compiler
itself from scratch.
But the biggest question is not how you can decompile someone's code, but why do you want to do
this? If you're trying to reverse-engineer someone else's code, shame on you; go find honest work.
If you're trying to recover from losing your own source, the best suggestion I have is to make
better backups next time.
Recall that when you delete[] an array, the runtime system magically knows how many destructors
to run. This FAQ describes a technique used by some C++ compilers to do this (the other common
technique is to use an associative array).
If the compiler uses the "over-allocation" technique, the code for p = new Fred[n] looks something
like the following. Note that WORDSIZE is an imaginary machine-dependent constant that is at least
sizeof(size_t), possibly rounded up for any alignment constraints. On many machines, this constant
will have a value of 4 or 8. It is not a real C++ identifier that will be defined for your compiler.
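The generated code itself is not shown in this copy of the text. Here is a hand-written sketch of roughly what a compiler using over-allocation might do for p = new Fred[n] and delete[] p. The names newFredArray and deleteFredArray are hypothetical stand-ins for compiler-generated code. WORDSIZE is given a concrete value here only so the sketch compiles: room for a size_t, rounded up to Fred's alignment. Fred counts its live instances purely for illustration:

```cpp
#include <cstddef>
#include <cstring>
#include <new>

class Fred {
public:
  Fred() { ++alive; }
  ~Fred() { --alive; }
  static int alive;   // number of live Freds, for illustration only
};
int Fred::alive = 0;

constexpr std::size_t WORDSIZE =
    (sizeof(std::size_t) + alignof(Fred) - 1) / alignof(Fred) * alignof(Fred);

// Roughly what "Fred* p = new Fred[n];" might become.
Fred* newFredArray(std::size_t n)
{
  char* tmp = static_cast<char*>(::operator new[](WORDSIZE + n * sizeof(Fred)));
  std::memcpy(tmp, &n, sizeof n);            // stash n just before the array
  Fred* p = reinterpret_cast<Fred*>(tmp + WORDSIZE);
  std::size_t i = 0;
  try {
    for (; i < n; ++i)
      new (p + i) Fred();                    // placement new: construct in place
  } catch (...) {
    while (i-- != 0)
      (p + i)->~Fred();                      // destroy the ones already built
    ::operator delete[](tmp);
    throw;
  }
  return p;
}

// Roughly what "delete[] p;" might become.
void deleteFredArray(Fred* p)
{
  char* tmp = reinterpret_cast<char*>(p) - WORDSIZE;   // not the same as p
  std::size_t n;
  std::memcpy(&n, tmp, sizeof n);            // read back the element count
  while (n-- != 0)
    (p + n)->~Fred();                        // destroy in reverse order
  ::operator delete[](tmp);
}
```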
Note that the address passed to operator delete[] is not the same as p.
Compared to the associative array technique, this technique is faster, but more sensitive to the
problem of programmers saying delete p rather than delete[] p. For example, if you make a
programming error by saying delete p where you should have said delete[] p, the address that is
passed to operator delete(void*) is not the address of any valid heap allocation. This will probably
corrupt the heap. Bang! You're dead!
[33.7] How do compilers use an "associative array" to remember the number of elements
in an allocated array?
Recall that when you delete[] an array, the runtime system magically knows how many destructors
to run. This FAQ describes a technique used by some C++ compilers to do this (the other common
technique is to over-allocate).
If the compiler uses the associative array technique, the code for p = new Fred[n] looks something
like this (where arrayLengthAssociation is the imaginary name of a hidden, global associative array
that maps from void* to "size_t"):
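The generated code is not shown in this copy of the text. Here is a hand-written sketch of roughly what it might look like. A std::map plays the role of the hidden arrayLengthAssociation (Cfront used an AVL tree). newFredArray and deleteFredArray are hypothetical stand-ins for compiler-generated code, and Fred counts live instances for illustration:

```cpp
#include <cstddef>
#include <map>
#include <new>
#include <utility>

class Fred {
public:
  Fred() { ++alive; }
  ~Fred() { --alive; }
  static int alive;   // number of live Freds, for illustration only
};
int Fred::alive = 0;

// Stand-in for the hidden global associative array: address -> length.
std::map<void*, std::size_t> arrayLengthAssociation;

// Roughly what "Fred* p = new Fred[n];" might become.
Fred* newFredArray(std::size_t n)
{
  Fred* p = static_cast<Fred*>(::operator new[](n * sizeof(Fred)));
  std::size_t i = 0;
  try {
    for (; i < n; ++i)
      new (p + i) Fred();                    // placement new
  } catch (...) {
    while (i-- != 0)
      (p + i)->~Fred();
    ::operator delete[](p);
    throw;
  }
  arrayLengthAssociation.insert(std::make_pair(static_cast<void*>(p), n));
  return p;
}

// Roughly what "delete[] p;" might become.
void deleteFredArray(Fred* p)
{
  std::map<void*, std::size_t>::iterator it = arrayLengthAssociation.find(p);
  std::size_t n = it->second;                // assumes p came from newFredArray
  arrayLengthAssociation.erase(it);
  while (n-- != 0)
    (p + n)->~Fred();
  ::operator delete[](p);                    // same address as p this time
}
```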
Cfront uses this technique (it uses an AVL tree to implement the associative array).
Compared to the over-allocation technique, the associative array technique is slower, but less
sensitive to the problem of programmers saying delete p rather than delete[] p. For example, if you
make a programming error by saying delete p where you should have said delete[] p, only the first
Fred in the array gets destructed, but the heap may survive (unless you've replaced
operator delete[] with something that doesn't simply call operator delete, or unless the destructors
for the other Fred objects were necessary).
[33.8] If name mangling was standardized, could I link code compiled with compilers
from different compiler vendors?
In other words, some people would like to see name mangling standards incorporated into the
proposed C++ ANSI standards in an attempt to avoid having to purchase different versions of
class libraries for different compiler vendors. However name mangling differences are one of the
smallest differences between implementations, even on the same platform.
[33.9] GNU C++ (g++) produces big executables for tiny programs; Why?
libg++ (the library used by g++) was probably compiled with debug info (-g). On some machines,
recompiling libg++ without debugging can save lots of disk space (approximately 1 MB; the downside:
you'll be unable to trace into libg++ calls). Merely stripping the executable doesn't reclaim as
much as recompiling without -g and then stripping the resulting a.out files.
Use size a.out to see how big the program code and data segments really are, rather than
ls -s a.out, which includes the symbol table.
The primary yacc grammar you'll want is from Ed Willink. Ed believes his grammar is fully compliant
with the ISO/ANSI C++ standard; however, he doesn't warrant it: "the grammar has not," he says,
"been used in anger." You can get the grammar without action routines or the grammar with
dummy action routines. You can also get the corresponding lexer. For those who are interested in
how he achieves a context-free parser (by pushing all the ambiguities plus a small number of
repairs to be done later after parsing is complete), you might want to read chapter 4 of his thesis.
There is also a very old yacc grammar that doesn't support templates, exceptions, or namespaces;
plus it deviates from the core language in some subtle ways. You can get that grammar here or
here.
These are not versions of the language, but rather versions of Cfront, which was the original C++
translator implemented by AT&T. It has become generally accepted to use these version numbers
as if they were versions of the language itself.
[33.12] Is it possible to convert C++ to C?
Depends on what you mean. If you mean, Is it possible to convert C++ to readable and
maintainable C-code? then sorry, the answer is No — C++ features don't directly map to C, plus the
generated C code is not intended for humans to follow. If instead you mean, Are there compilers
which convert C++ to C for the purpose of compiling onto a platform that doesn't yet have a C++
compiler? then you're in luck — keep reading.
A compiler which compiles C++ to C does full syntax and semantic checking on the program, and
just happens to use C code as a way of generating object code. Such a compiler is not merely some
kind of fancy macro processor. (And please don't email me claiming these are preprocessors — they
are not — they are full compilers.) It is possible to implement all of the features of ISO Standard C++
by translation to C, and except for exception handling, it typically results in object code with
efficiency comparable to that of the code generated by a conventional C++ compiler.
Here are some products that perform compilation to C (note: if you know of any other products that
do this, please email me):
• Comeau Computing offers a compiler based on Edison Design Group's front end that outputs
C code.
• Cfront, the original implementation of C++, done by Bjarne Stroustrup and others at AT&T,
generates C code. However it has two problems: it's been difficult to obtain a license since
the mid 90s when it started going through a maze of ownership changes, and development
ceased at that same time, so it doesn't get bug fixes and doesn't support any of the
newer language features (e.g., exceptions, namespaces, RTTI, member templates).
• Contrary to popular myth, there is at present no version of g++ that translates C++ to C.
Such a thing seems to be doable, but no one has gone ahead and done it (yet).
Note that you typically need to specify the target platform's CPU, OS and C compiler so that the
generated C code will be specifically targeted for this platform. This means: (a) you probably can't
take the C code generated for platform X and compile it on platform Y; and (b) it'll be difficult to do
the translation yourself — it'll probably be a lot cheaper/safer with one of these tools.
One more time: do not email me saying these are just preprocessors — they are not — they are
compilers.
First, the best thing to do is get rid of the macro if at all possible. In fact, get rid of all macros:
they're evil in 4 different ways: evil#1, evil#2, evil#3, and evil#4, regardless of whether they
contain an if (but they're especially evil if they contain an if).
But if you can't (or don't want to) kill the macro that contains an if, here's how to make it less evil:
#define MYMACRO(a,b) \
if (xyzzy) asdf()
This will cause big problems if someone uses that macro in an if statement:
if (whatever)
MYMACRO(foo,bar);
else
baz;
The problem is that the else baz nests with the wrong if: the compiler sees this:
if (whatever)
if (xyzzy) asdf();
else baz;
The easy solution is to require {...} everywhere, but there's another solution that I prefer even if
there's a coding standard that requires {...} everywhere (just in case someone somewhere
forgets): add a balancing else to the macro definition:
#define MYMACRO(a,b) \
if (xyzzy) asdf(); \
else
Now the compiler will see a balanced set of ifs and elses:
if (whatever)
if (xyzzy)
asdf();
else
; // that's an empty statement
else
baz;
Like I said, I personally do the above even when the coding standard calls for {...} in all the ifs. Call
me paranoid, but I sleep better at night and my code has fewer bugs.
Note: you need to make sure to put a ; at the end of the macro usage (not at the end of the macro
definition!!). For example, the macro usage should look like this:
if (whatever)
MYMACRO(foo,bar); // right: there is a ; after MYMACRO(...)
else
baz;
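Here is a runnable sketch of the balancing-else macro. The names xyzzy, asdf() and baz() are hypothetical stand-ins for the FAQ's placeholders, and the call counters exist only to show which branch ran:

```cpp
static bool xyzzy = true;
static int asdfCalls = 0;
static int bazCalls = 0;
static void asdf() { ++asdfCalls; }
static void baz() { ++bazCalls; }

// The if-containing macro with a balancing else (a and b unused, as in the FAQ).
#define MYMACRO(a,b) \
  if (xyzzy) asdf(); \
  else

void run(bool whatever)
{
  if (whatever)
    MYMACRO(foo, bar);   // expands to: if (xyzzy) asdf(); else ;
  else
    baz();               // binds to the outer if, as intended
}
```

Without the trailing else in the macro, run(false) would silently do nothing. In that case the else baz() would attach to the macro's if (xyzzy) instead.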
Note: there is another #define macro (do {...} while (false)) that is fairly popular, but that has
some strange side-effects when used in C++.
[34.2] What should be done with macros that have multiple lines?
Answer: Choke, gag, cough. Macros are evil in 4 different ways: evil#1, evil#2, evil#3, and evil#4.
Kill them all!!
But if you can't (or don't want to) kill them, here's how to make them less evil:
#define MYMACRO(a,b) \
statement1; \
statement2; \
... \
statementN;
This can cause problems if someone uses the macro in a context that demands a single statement.
E.g.,
while (whatever)
MYMACRO(foo, bar);
The naive solution is to wrap the statements inside {...}, such as this:
#define MYMACRO(a,b) \
{\
statement1; \
statement2; \
... \
statementN; \
}
But this will cause compile-time errors with things like the following:
if (whatever)
MYMACRO(foo, bar);
else
baz;
That's because the macro expands to the following:
if (whatever)
{
statement1;
statement2;
...
statementN;
}; // ERROR: { } cannot have a ; before an else
else
baz;
The usual solution in C was to wrap the statements inside a do { <statements go here>
} while (false), since that will execute the <statements go here> part exactly once. E.g., the macro
might look like this:
#define MYMACRO(a, b) \
do { \
statement1; \
statement2; \
... \
statementN; \
} while (false)
Note that there is no ; at the end of the macro definition. The ; gets added by the user of the
macro, such as the following:
if (whatever)
MYMACRO(foo, bar); // The ; is added here
else
baz;
This will expand to the following (note that the ; added by the user goes after (and completes) the
"} while (false)" part):
if (whatever)
do {
statement1;
statement2;
...
statementN;
} while (false);
else
baz;
The only problem with this is that it looks like a loop, and some C++ compilers refuse to "inline
expand" any method that has a loop in it.
So in C++ the best solution is to wrap the statements in an if (true) { <statements go here> } else
construct (note that the else is dangling, just like the situation described in the previous FAQ):
#define MYMACRO(a, b) \
if (true) { \
statement1; \
statement2; \
... \
statementN; \
} else
Now the code will expand into this (note the balanced set of ifs and elses):
if (whatever)
if (true) {
statement1;
statement2;
...
statementN;
} else
; // that's a null statement
else
baz;
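Here is a runnable sketch of the if (true) { ... } else form. It checks both contexts discussed above: an if/else and a loop body that demands a single statement. The counters and baz() are hypothetical stand-ins for statement1 ... statementN:

```cpp
static int count1 = 0;
static int count2 = 0;
static int bazCalls = 0;
static void baz() { ++bazCalls; }

// The user's trailing ; becomes the (empty) else branch.
#define MYMACRO(a, b) \
  if (true) {         \
    count1 += (a);    \
    count2 += (b);    \
  } else

void run(bool whatever)
{
  if (whatever)
    MYMACRO(1, 2);     // both statements run, as a single unit
  else
    baz();             // still pairs with "if (whatever)"
}

void repeat(int n)
{
  for (int i = 0; i < n; ++i)
    MYMACRO(1, 10);    // the whole macro body is the loop body
}
```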
[34.3] What should be done with macros that need to paste two tokens together?
Groan. I really hate macros. Yes they're useful sometimes, and yes I use them. But I always wash
my hands afterwards. Twice. Macros are evil in 4 different ways: evil#1, evil#2, evil#3, and evil#4.
Okay, here we go again, desperately trying to make an inherently evil thing a little less evil.
First, the basic approach is to use the ISO/ANSI C and ISO/ANSI C++ "token pasting" feature: ##. On
the surface this would look like the following:
Suppose you have a macro called "MYMACRO", and suppose you're passing a token as the
parameter of that macro, and suppose you want to concatenate that token with the token "Tmp" to
create a variable name. For example, the use of MYMACRO(Foo) would create a variable named
"FooTmp" and the use of MYMACRO(Bar) would create a variable named "BarTmp". In this case the
naive approach would be to say this:
#define MYMACRO(a) \
/*...*/ a ## Tmp /*...*/
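However you need a double layer of indirection when you use ##. Basically you need to create a
special "token pasting" macro, such as name2(), whose only job is to pass its arguments on to a
second, hidden macro that does the actual ## pasting.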
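Here is a minimal sketch of the two-layer approach. It assumes the hidden inner macro is called name2_hidden. The name is illustrative, and only the two-layer shape matters. SUFFIX is a hypothetical object-like macro that shows what the outer layer buys you. A macro argument is fully macro-expanded before substitution unless it is an operand of # or ##. So name2 first expands SUFFIX (or __LINE__) to its value, and only then hands the result to name2_hidden to be pasted. Calling name2_hidden directly pastes the unexpanded name instead, just as a single-layer a ## b does. That is why a single layer concatenated with __LINE__ gives you the text __LINE__ rather than the line number.

```cpp
// Outer layer: a and b are not operands of ## here, so any macros in them
// (such as SUFFIX or __LINE__) are expanded before being passed on.
#define name2(a, b)        name2_hidden(a, b)
// Inner layer (hypothetical name): pastes its already expanded arguments.
#define name2_hidden(a, b) a ## b

// Hypothetical object-like macro, used only to show the difference.
#define SUFFIX Tmp

constexpr int name2(Foo, Tmp)           = 1; // declares FooTmp
constexpr int name2(Bar, SUFFIX)        = 2; // SUFFIX expands first: declares BarTmp
constexpr int name2_hidden(Baz, SUFFIX) = 3; // one layer only: declares BazSUFFIX
```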
Trust me on this: you really need to do this! (And please nobody write me saying it sometimes
works without the second layer of indirection. Try concatenating a symbol with __LINE__ and see
what happens then.) Then replace your use of a ## Tmp with name2(a,Tmp):
#define MYMACRO(a) \
/*...*/ name2(a,Tmp) /*...*/
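And if you have a three-way concatenation to do (e.g., to paste three tokens together), you'd
create a name3() macro the same way: an outer name3(a,b,c) that forwards its arguments to a hidden
inner macro that does a ## b ## c.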
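The same pattern works for three tokens. This sketch assumes a hypothetical hidden helper, name3_hidden, and a hypothetical macro, MIDDLE, which the outer layer expands before pasting:

```cpp
// Outer layer: expands any macros in a, b, and c first.
#define name3(a, b, c)        name3_hidden(a, b, c)
// Inner layer (hypothetical name): pastes the three expanded tokens.
#define name3_hidden(a, b, c) a ## b ## c

#define MIDDLE Tmp // hypothetical macro, expanded by the outer layer

constexpr int name3(Foo, MIDDLE, Count) = 3; // declares FooTmpCount
```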
[34.4] Why can't the compiler find my header file in #include "c:\test.hpp" ?
You should use forward slashes ("/") rather than backslashes ("\") in your #include filenames, even
on an operating system that uses backslashes such as DOS, Windows, OS/2, etc. For example:
#if 1
#include "/version/next/alpha/beta/test.hpp" // RIGHT!
#else
#include "\version\next\alpha\beta\test.hpp" // WRONG!
#endif
Note that you should use forward slashes ("/") on all your filenames, not just on your #include files.
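The rule matters even more for filenames inside ordinary string literals. In an #include line, the meaning of a backslash is up to the implementation. In a string literal, a backslash always starts an escape sequence, so "c:\test.hpp" does not even contain a backslash: \t is a tab. A small sketch (the array names are only for illustration):

```cpp
// In an ordinary string literal, '\' starts an escape sequence.
constexpr char broken[]   = "c:\test.hpp";  // "\t" is a TAB: c:<TAB>est.hpp
constexpr char escaped[]  = "c:\\test.hpp"; // "\\" is one backslash: c:\test.hpp
constexpr char portable[] = "c:/test.hpp";  // forward slash: no escaping needed
```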
[34.5] What are the C++ scoping rules for for loops?
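A variable declared in the initialization part of a for loop is scoped to that loop. The following
code used to be legal, but not any more, since i's scope is now inside the for loop only:
for (int i = 0; i < 10; ++i) {
if ( /* something weird */ )
break;
// ...
}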
if (i != 10) {
// We exited the loop early; handle this situation separately
// ...
}
If you're working with some old code that uses a for loop variable after the for loop, the compiler
will (hopefully!) give you a warning or an error message such as "Variable i is not in scope".
Unfortunately there are cases when old code will compile cleanly but will do something different:
the wrong thing. For example, if the old code has a global variable i, the above code if (i != 10)
silently changes in meaning from the for loop variable i under the old rule to the global variable i
under the current rule. This is not good. If you're concerned, you should check with your compiler
to see if it has some option that forces it to use the old rules with your old code.
Note: You should avoid having the same variable name in nested scopes, such as a global i and a
local i. In fact, you should avoid globals altogether whenever you can. If you followed these
coding standards in your old code, you won't be hurt by a lot of things, including the scoping rules
for for loop variables.
Note: If your new code might get compiled with an old compiler, you might want to put {...}
around the for loop to force even old compilers to scope the loop variable to the loop. And please
try to avoid the temptation to use macros for this. Remember: macros are evil in 4 different ways:
evil#1, evil#2, evil#3, and evil#4.
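Here is a sketch of both techniques in code that means the same thing under the old and the current rules. The functions firstNegative and sumOf are hypothetical examples. They are written as constexpr (C++14) only so they can be checked at compile time. When the index is needed after the loop, declare it before the loop. When it isn't, the extra {...} keeps it local even on an old compiler.

```cpp
// Returns the index of the first negative element of data[0..n), or n if
// there is none. i is needed after the loop, so it is declared before it.
constexpr int firstNegative(const int* data, int n)
{
    int i = 0;
    for (; i < n; ++i) {
        if (data[i] < 0)
            break; // we exited the loop early
    }
    return i;
}

// The loop variable is not needed afterwards; the extra braces confine it
// to the loop even under the old scoping rule.
constexpr int sumOf(const int* data, int n)
{
    int total = 0;
    {
        for (int i = 0; i < n; ++i)
            total += data[i];
    }
    return total;
}
```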
You can't overload a function on its return type alone. If you declare both char f() and float f(),
the compiler gives you an error message, since a call to plain f() would be ambiguous.
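Overloads must differ in their parameter lists; a different return type isn't enough. A tiny sketch, with f, g, and the returned values made up for illustration:

```cpp
// Legal: the parameter types differ, so a call can pick one.
constexpr char  f(int)    { return 'c'; }
constexpr float f(double) { return 1.5f; }

// Not legal: these differ only in return type, so the second declaration
// is rejected.
//   char  g();
//   float g();
```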
A persistent object can live after the program which created it has stopped. Persistent objects can
even outlive different versions of the creating program, can outlive the disk system, the operating
system, or even the hardware on which the OS was running when they were created.
The challenge with persistent objects is to effectively store their member function code out on
secondary storage along with their data bits (and the data bits and member function code of all
member objects, and of all their member objects and base classes, etc.). This is non-trivial when
you have to do it yourself. In C++, you have to do it yourself. C++/OO databases can help hide the
mechanism for all this.
[34.8] Why is floating point so inaccurate? Why doesn't this print 0.43?
#include <iostream>
int main()
{
float a = 1000.43;
float b = 1000.0;
std::cout << a - b << '\n';
}
Answer: Floating point is an approximation. The IEEE standard for 32-bit float supports 1 bit of sign,
8 bits of exponent, and 23 bits of mantissa. Since a normalized binary-point mantissa always has
the form 1.xxxxx... the leading 1 is dropped and you get effectively 24 bits of mantissa. The
number 1000.43 (and many, many others) is not exactly representable in float or double format.
1000.43 is actually represented as the following bit pattern (the "s" shows the position of the sign
bit, the "e"s show the positions of the exponent bits, and the "m"s show the positions of the
mantissa bits):
seeeeeeeemmmmmmmmmmmmmmmmmmmmmmm
01000100011110100001101110000101
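To check the bit pattern yourself, you can copy the bytes of a float into a 32-bit unsigned integer. This sketch assumes float is the IEEE 754 32-bit format, which is true on common platforms; a static_assert checks its size. The helper names floatBits, signOf, exponentOf, and mantissaOf are hypothetical. The stored mantissa 0x7A1B85, plus the implicit leading 1, scaled by the exponent 2^9, works out to exactly 1000 + 7045/16384 = 1000.42999267578125. So a - b is 0.42999267578125, which std::cout prints with its default 6 significant digits as 0.429993.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// This sketch assumes float is the 32-bit IEEE 754 format.
static_assert(sizeof(float) == sizeof(std::uint32_t), "float is not 32 bits here");

// Returns the raw bit pattern of f. Copying the bytes with std::memcpy is
// the portable way to do this (C++20 also offers std::bit_cast).
std::uint32_t floatBits(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// The three fields of the pattern: "s", the 8 "e" bits, and the 23 "m" bits.
std::uint32_t signOf(std::uint32_t bits)     { return bits >> 31; }
std::uint32_t exponentOf(std::uint32_t bits) { return (bits >> 23) & 0xFFu; } // stored with a bias of 127
std::uint32_t mantissaOf(std::uint32_t bits) { return bits & 0x7FFFFFu; }     // the leading 1 is not stored
```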
[34.9] How can I create two classes that both know about each other?
Sometimes you must create two classes that use each other. This is called a circular dependency.
For example:
class Fred {
public:
Barney* foo(); // Error: Unknown symbol 'Barney'
};
class Barney {
public:
Fred* bar();
};
The Fred class has a member function that returns a Barney*, and the Barney class has a member
function that returns a Fred*. You may inform the compiler about the existence of a class or
structure by using a "forward declaration":
class Barney;
This line must appear before the declaration of class Fred. It simply informs the compiler that the
name Barney is a class, and further it is a promise to the compiler that you will eventually supply a
complete definition of that class.
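Putting it together, here is a sketch of the corrected pair of classes. The member function bodies and the partner data members are hypothetical. They are added only so the example links and does something, and they are defined after both classes, where both are complete.

```cpp
#include <cassert>

class Barney; // forward declaration: Barney is a class, defined later

class Fred {
public:
    Barney* foo();             // Okay: a pointer to Barney needs only the name
    Barney* partner = nullptr; // hypothetical data member for the sketch
};

class Barney {
public:
    Fred* bar();
    Fred* partner = nullptr;   // hypothetical data member for the sketch
};

// Hypothetical bodies, written after both classes are complete.
Barney* Fred::foo() { return partner; }
Fred*   Barney::bar() { return partner; }
```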
[34.10] What special considerations are needed when forward declarations are used with
member objects?
The compiler will give you a compile-time error if the first class contains an object (as opposed to a
pointer to an object) of the second class. For example,
class Fred; // forward declaration
class Barney {
Fred x; // Error: The declaration of Fred is incomplete
};
class Fred {
Barney* y;
};
One way to solve this problem is to reverse the order of the classes so the "used" class is defined
before the class that uses it:
class Barney; // forward declaration
class Fred {
Barney* y; // Okay: the first can point to an object of the second
};
class Barney {
Fred x; // Okay: the second can have an object of the first
};
Note that it is never legal for each class to fully contain an object of the other class since that would
imply infinitely large objects. In other words, if an instance of Fred contains a Barney (as opposed
to a Barney*), and a Barney contains a Fred (as opposed to a Fred*), the compiler will give you an
error.
[34.11] What special considerations are needed when forward declarations are used with
inline functions?
The compiler will give you a compile-time error if the first class contains an inline function that
invokes a member function of the second class. For example,
class Fred; // forward declaration
class Barney {
public:
void method()
{
x->yabbaDabbaDo(); // Error: Fred used before it was defined
}
private:
Fred* x; // Okay: the first can point to an object of the second
};
class Fred {
public:
void yabbaDabbaDo();
private:
Barney* y;
};
One way to solve this problem is to move the offending member function into the Barney.cpp file as
a non-inline member function. Another way to solve this problem is to reverse the order of the classes
so the "used" class is defined before the class that uses it:
class Barney; // forward declaration
class Fred {
public:
void yabbaDabbaDo();
private:
Barney* y; // Okay: the first can point to an object of the second
};
class Barney {
public:
void method()
{
x->yabbaDabbaDo(); // Okay: Fred is fully defined at this point
}
private:
Fred* x;
};
Just remember this: whenever you use a forward declaration, you can use only the name of that
class; you may not do anything that requires knowledge of the forward-declared class's definition.
Specifically you may not access any members of the second class, or do anything that needs its
size, until its definition has been seen.
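The first fix described above moves method() out of the class body and makes it non-inline. It can be sketched in a single file: declare method() inside Barney, and define it only after Fred is complete. The counter in yabbaDabbaDo() and the public data members are hypothetical simplifications for the sketch.

```cpp
#include <cassert>

class Fred; // forward declaration

class Barney {
public:
    void method();     // declared only: the body needs the full Fred
    Fred* x = nullptr;
};

class Fred {
public:
    void yabbaDabbaDo() { ++calls; } // hypothetical body: counts calls
    int calls = 0;
    Barney* y = nullptr;
};

// Non-inline definition, placed after Fred is complete. In a multi-file
// project this is the part that would go in Barney.cpp.
void Barney::method()
{
    x->yabbaDabbaDo(); // Okay: Fred is fully defined at this point
}
```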
A forward-declared class can't be used as the element type of a std::vector<>, because the
std::vector<> template needs to know the sizeof() of its contained elements, plus the
std::vector<> probably accesses members of the contained elements (such as the copy
constructor, the destructor, etc.). For example,
class Barney {
std::vector<Fred> x; // Error: the declaration of Fred is incomplete
};
class Fred {
Barney* y;
};
One solution to this problem is to change Barney so it uses a std::vector<> of Fred pointers rather
than a std::vector<> of Fred objects:
#include <vector>
class Fred; // forward declaration
class Barney {
std::vector<Fred*> x; // Okay: Barney can use Fred pointers
};
class Fred {
Barney* y;
};
Another solution to this problem is to reverse the order of the classes so Fred is defined before
Barney:
#include <vector>
class Barney; // forward declaration
class Fred {
Barney* y; // Okay: the first can point to an object of the second
};
class Barney {
std::vector<Fred> x; // Okay: Fred is fully defined at this point
};
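Just remember this: whenever you use a class as a template parameter of a template like
std::vector<> that needs the class's size or members, the declaration of that class must be complete
and not simply forward declared. (C++17 relaxed this for std::vector<>, std::list<>, and
std::forward_list<>, which may be instantiated with an incomplete type as long as the type is
complete before any member of the container is used.)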
Floating point comparisons produce "surprise" results on some (but not all) compilers/machines. But
even if your particular compiler/machine doesn't surprise you with a given example (and if you write
me telling me whether it does, you'll show you've missed the whole point of this FAQ), floating point
will surprise you at some point. So read this FAQ and you'll know what to do.
The reason floating point will surprise you is that float and double values are normally represented
using a finite precision binary format. In other words, floating point numbers are not real numbers.
For example, in your machine's floating point format it might be impossible to exactly represent the
number 0.1. By way of analogy, it's impossible to exactly represent the number one third in decimal
format (unless you use an infinite number of digits).
To dig a little deeper, let's examine what the decimal number 0.625 means. This number has a 6 in
the "tenths" place, a 2 in the "hundredths" place, and a 5 in the "thousandths" place. In other words,
we have a digit for each power of 10. But in binary, we might, depending on the details of your
machine's floating point format, have a bit for each power of 2. So the fractional part might have a
"halves" place, a "quarters" place, an "eighths" place, "sixteenths" place, etc., and each of these
places has a bit.
Let's pretend your machine represents the fractional part of floating point numbers using the above
scheme (it's normally more complicated than that, but if you already know exactly how floating
point numbers are stored, chances are you don't need this FAQ to begin with, so look at this as a
good starting point). On that pretend machine, the bits of the fractional part of 0.625 would be 101:
1 in the "halves" place, 0 in the "quarters" place, and 1 in the "eighths" place. In other words,
0.625 is 1/2 + 1/8.
But on this pretend machine, 0.1 cannot be represented exactly since it cannot be formed as a sum
of (negative) powers of 2 — at least not without an infinite number of (negative) powers of 2. We
can get close, but we can't represent it exactly. In particular we'd have a 0 in the "halves" place, a
0 in the "quarters" place, a 0 in the "eighths" place, and finally a 1 in the "sixteenths" place, leaving
a remainder of 1/10 - 1/16 = 3/80. Figuring out the other bits is left as an exercise (hint: look for a
repeating bit-pattern).
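If you'd like to check your answer to the exercise, here is a sketch of the pretend machine's scheme. The helper fractionBits is hypothetical. It finds each bit by doubling the remaining fraction and checking whether the result reaches 1. It works on the numerator and denominator with exact integer arithmetic, so no floating point rounding sneaks in.

```cpp
// Returns the first `count` bits of the binary fraction num/den
// (0 <= num < den), packed into an integer with the "halves" bit as the
// most significant of the `count` bits. count must not exceed the number
// of bits in unsigned long.
constexpr unsigned long fractionBits(unsigned long num, unsigned long den, int count)
{
    unsigned long bits = 0;
    for (int i = 0; i < count; ++i) {
        num *= 2;          // move to the next place: halves, quarters, eighths, ...
        bits <<= 1;
        if (num >= den) {  // this place holds a 1
            bits |= 1;
            num -= den;    // keep only the remainder
        }
    }
    return bits;
}
```

For example, fractionBits(5, 8, 3) gives binary 101, matching 0.625 = 1/2 + 1/8. fractionBits(1, 10, 12) gives 000110011001: after the leading 0, the pattern 0011 keeps repeating.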
The message is that some floating point numbers cannot be represented exactly, so
comparisons don't always do what you'd like them to do. In other words, if the computer actually
multiplies 10.0 by 1.0/10.0, it might not get exactly 1.0 back.
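A quick way to see this in practice: on a typical IEEE 754 machine, adding 0.1 to itself ten times does not produce exactly 1.0. Each addition rounds, and 0.1 itself is already an approximation. In this sketch, sumOfTenths is a hypothetical name; it is constexpr only so the result can be checked at compile time.

```cpp
// Adds 0.1 to itself `count` times; every addition rounds to the nearest double.
constexpr double sumOfTenths(int count)
{
    double sum = 0.0;
    for (int i = 0; i < count; ++i)
        sum += 0.1;
    return sum;
}
```

On such a machine, sumOfTenths(10) comes out slightly less than 1.0, so a test like sumOfTenths(10) == 1.0 is false.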
That's the problem. Now here's the solution: be very careful when comparing floating point
numbers, whether for equality or with relational operators like <=. For example:
void foo(double x, double y)
{
if (x <= y) // Dubious!
bar();
}
Here's a (probably) smarter way to compare two floating point numbers (note that
std::numeric_limits<double>::epsilon() is a very small double-precision number: the difference
between 1.0 and the next larger representable double):
#include <limits>
void foo(double x, double y)
{
// similar to if (x <= y)
if (x - y < std::numeric_limits<double>::epsilon()) // Smarter!
bar();
}
Of course you can write some utility routines to encapsulate these funny pieces of code, such as:
void foo(double x, double y)
{
if (isLessOrEqual(x, y))
bar();
}
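The text leaves isLessOrEqual to you. Here is a minimal sketch, assuming it simply wraps the "Smarter!" test. Because epsilon() is the gap between 1.0 and the next double, that fixed tolerance only makes sense for values near 1. For much larger or smaller values, a tolerance scaled by the operands' magnitude is a common alternative. It is shown here as a second hypothetical helper, isLessOrEqualScaled, whose factor of 4 is an arbitrary choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Wraps the "Smarter!" test: true when x is below y, or above it by less
// than epsilon(). Suitable for values of roughly unit magnitude.
inline bool isLessOrEqual(double x, double y)
{
    return x - y < std::numeric_limits<double>::epsilon();
}

// A common alternative (not from the text above): scale the tolerance by
// the larger magnitude, so it still means something for big or tiny values.
inline bool isLessOrEqualScaled(double x, double y)
{
    const double tolerance = 4 * std::numeric_limits<double>::epsilon()
                           * std::max(std::abs(x), std::abs(y));
    return x - y <= tolerance;
}
```

Near 1.0 the two behave alike. Near 1e20, adjacent doubles are thousands apart, so only the scaled version treats them as "equal enough".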