An Extensible I/O Facility for C++
Bjarne Stroustrup
AT&T Bell Laboratories
600 Mountain Avenue
ABSTRACT
This paper describes the classes ostream and istream designed to replace the
printf/scanf family of input and output functions in C++. The operator << is
overloaded for ostreams to provide a single type-secure paradigm for output of
both built-in and user-defined types. This paradigm typically yields output
statements as short as or shorter than printf(). The operator >> handles input in a
similar fashion. Conditions like reaching the end of a file or the corruption of a
stream are handled by associating a state with each stream, rather than by
returning "illegal values" like EOF. The C++ facilities used to implement stream I/O
are briefly explained. These include classes providing data hiding, constructors
providing initialization, destructors providing cleanup (in particular, flushing of
buffers), operator overloading, and virtual functions providing uniform use of
buffers with different strategies for handling underflow and overflow.
Introduction
Except for minor details C++ is a superset of the C programming language. In addition to
the features of C, C++ provides Simula67-like classes, operator overloading, and a host of
minor improvements. The parts of the C++ language that are used to specify the stream I/O
facility are briefly explained. References 6-8 describe C++ in the detail necessary for writing
programs in it.
The printf() family of functions provides simple, flexible, and terse formatted output for C
programs (and therefore also for C++ programs). However, uses of printf() cannot in general
be type checked, and there is no convenient way for a user to deal with user-defined types in the
same way as built-in types. Consider:
    printf(stderr,"x = %s\n",x);
This is of course an error under all circumstances since only fprintf() and not printf() takes a
FILE* like stderr as an argument. This problem is trivially handled in C++ where <stdio.h>
declares
    int printf(char* ...);
thus catching that error at compile time. The ellipsis indicates that any number of arguments of
any type may follow the initial arguments. However, had x been an int (rather than the char*
expected by the %s in the format string) no error would have been detected until the program
started printing garbage. Furthermore, had x been a user-defined type like complex there would
have been no way of specifying the output format of x in the convenient way used for types
"known to printf()" (for example, %s and %d). The programmer would typically have defined a
separate function for printing complex numbers and then written something like this:
    fprintf(stderr,"x = ");
    put_complex(stderr,x);
    fprintf(stderr,"\n");
This is inelegant, and would be a major annoyance in C++ programs where a non-trivial
program typically uses several user-defined types for the manipulation of entities that are
interesting/critical to an application.
Type-security and uniform treatment can be achieved by using a single overloaded function
name for a set of output functions. For example:
    put(stderr,"x = ");
    put(stderr,x);
    put(stderr,"\n");
The type of the argument determines which "put function" will be invoked for each argument.
However, this is too verbose. The C++ solution using an output stream for which << has been
defined as a "put to" operator looks like this:
    cerr << "x = " << x << "\n";
where cerr is the standard error output stream (equivalent to the C stderr). So, if x is an int
with the value 123, this statement would print
    x = 123
on cerr.
The stream I/O facility is implemented exclusively using language features available to every
C++ programmer. Like C, C++ does not have any I/O facilities built into the language. The
stream I/O facility is provided in a library and contains no "extra-linguistic magic".
Output of built-in types
For output the class ostream is defined. The operator << ("put to") is defined to handle
output of the built-in types:
    class ostream {
        // ...
    public:
        ostream& operator<<(char*);    // write
        ostream& operator<<(long);     // beware: << 'a' writes 97
        ostream& operator<<(double);
        // ...
        ostream(streambuf* s);         // bind to stream buffer
        ostream(int fd);               // bind for file
        ostream(int size, char* p);    // bind to vector
        ~ostream();
    };
This class declaration defines the new type ostream. A class declaration is very much like a
C struct declaration, except that a C++ class can have function members. Furthermore, a
member of a class appearing before the public: label can only be used by the functions
mentioned in the class declaration and is inaccessible to all other functions in the program. The
internal representation of an ostream has been rendered inaccessible to a user in this way, and
since it is not particularly interesting it has been omitted to simplify the discussion. The comment
    // ...
is used to indicate this. In C++ // starts a comment that terminates at the end of line.
Traditional C /* */ comments can also be used.
The operations <<, put() and flush() may be applied to an ostream. For example:
    cerr << "i = ";
An operator<< function returns a reference to the ostream it was called for so that another
operator<< can be applied to it. For example:
    cerr << "x = " << x;
In particular, this implies that when several items are printed by a single output statement they
will be printed in the expected order: left to right. Since "x = " is a string the first operator<<
function will be chosen to write it, and similarly the appropriate operator<< function will be
chosen for x depending on x's type. Since x was an int it will be implicitly converted to a long
(as if in an assignment) and passed to the second operator<< function. Floating point numbers
are handled by the third operator<< function.
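The chaining and overload selection described above can be sketched with a toy class. This is a minimal illustration in present-day C++, not the library's implementation: the class name mini_ostream, its string-backed storage, the str() accessor, and the helper show_x() are assumptions made for the example. An explicit int overload is included because current overload resolution treats an int argument as ambiguous between the long and double versions.

```cpp
#include <string>

// Toy output stream that appends to a string (hypothetical, for illustration only).
class mini_ostream {
    std::string text_;
public:
    // Each operator<< returns *this so that another << can be applied to the result.
    mini_ostream& operator<<(const char* s) { text_ += s; return *this; }
    mini_ostream& operator<<(long v)        { text_ += std::to_string(v); return *this; }
    mini_ostream& operator<<(double v)      { text_ += std::to_string(v); return *this; }
    // Forward int to the long version explicitly.
    mini_ostream& operator<<(int v)         { return *this << static_cast<long>(v); }

    const std::string& str() const { return text_; }
};

// Builds "x = <value>\n" with one chained statement, applied left to right.
std::string show_x(int x) {
    mini_ostream s;
    s << "x = " << x << "\n";
    return s.str();
}
```

Writing a character constant such as 'a' to this toy class selects the integer version and appends 97, which is the pitfall the comment in the class declaration warns about.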
This facility for overloading function names and operators and then choosing the correct
version to use for a particular call based on the types of the arguments is general in C++ and has
nothing particular to do with I/O. Overloading enables the programmer to reduce the number of
function names needed in a program by allowing several functions performing similar operations
on objects of different types to share a name. In this particular case we avoid the printf(),
fprintf(), and sprintf() name proliferation. The implicit type conversions reduce the number
of functions needed. For example, there is a single function for handling the integral types:
char, short, int, and long. There is no facility for printing unsigned values in a different way
from signed values since the facility for resolving calls to overloaded functions in C++ cannot
distinguish signed and unsigned types. Separate functions can, when needed, be used to handle
such cases.
It was necessary to define an output operator to avoid the verbosity that would have resulted
from using an output function. But why <<? In C++, it is not possible to define a new lexical
token, so one could not simply invent a new operator†.
The assignment operator was a candidate for both input and output, but it binds the wrong
way. That is, cout=a=b would be interpreted as cout=(a=b), and most people seemed to prefer
the input operator to be different from the output operator.
† Part of the reason for this "restriction" is that most "obvious" choices of new operators would create ambiguities
and/or render legal C++ programs illegal. Consider, for example, these "possible operators": ->, **, <-, and //.
The operators < and > were tried, but the meanings "less than" and "greater than" were so
firmly implanted in people's minds that the new I/O statements were for all practical purposes
unreadable (this does not appear to be the case for << and >>). Apart from that, < is just
above , on most keyboards and people were writing expressions like this:
    cout < x , y , z;
writes the number 10 after x and not the expected newline. This and similar problems can be
alleviated by defining a few macros
    #define sp << " "
    #define ht << "\t"
    #define nl << "\n"
Using non-syntactic macros is considered bad style in some quarters, but I like these (despite
disliking macros in general).
Consider also these examples
    cout << x << " " << y << " " << z << "\n";
    cout << "x = " << x << ", y = " << y << "\n";
Most people find them hard to read because of the high number of quotes and because the
output operator is visually too imposing. The macros above plus a bit of indentation can help here.
    cout << x sp << y sp << z nl;
    cout << "x = " << x
         << ", y = " << y nl;
There are two standard output streams cout and cerr corresponding to stdout and stderr.
Naturally, the implementation involves a buffer that is occasionally flushed onto the associated
output device. Flushing can be done explicitly like this:
    cout.flush();
This is not required; the buffer is internal and hidden and will be flushed automatically when
appropriate.
Two related questions come to mind: How did a stream like cout get initialized, and how
does it get flushed when the program terminates? Consider this program
    #include <stream.h>
    main()
    {
        cout << "Hello, world";
    }
The include directive ensures that the declarations needed to use an ostream are available. When
class ostream was declared it was provided with constructors and a destructor. A constructor is a
function that must be called whenever an object of its type is created. It is distinguished by the
compiler by having the same name as its class. In particular, for an ostream the constructors
    ostream(streambuf* s);         // bind to stream buffer
    ostream(int fd);               // bind for file
    ostream(int size, char* p);    // bind to vector
were declared so one of them must be called when an ostream is created. Which one to call will,
as usual, be determined by the type of the arguments. The standard output streams are declared
like this (using file buffers as described below):
    char cout_buf[BUFSIZE];
    filebuf cout_file = filebuf(1,cout_buf,BUFSIZE);    // UNIX output stream 1
    ostream cout = ostream(&cout_file);
    char cerr_buf[1];
    filebuf cerr_file = filebuf(2,cerr_buf,0);          // UNIX output stream 2
                                                        // 0-length => unbuffered
    ostream cerr = ostream(&cerr_file);
This code appears in the source for the stream I/O part of the C++ standard library, not in the
<stream.h> header file. The C++ compiler/linker/loader is smart enough to figure out that the
ostream constructor needs to be called for cout and cerr before main() is executed. Every static
object of a class with a constructor in a program is handled in this way; this is not a "special
feature" for I/O.
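The rule that static objects with constructors are initialized before main() applies to any class, which a small sketch can show. The type logger, the object standard_log, and the function log_is_ready() are hypothetical names invented for this illustration; they are not part of the stream library.

```cpp
#include <string>

// Hypothetical stand-in for a stream-like class with a constructor.
struct logger {
    std::string target;
    bool ready;
    explicit logger(const char* name) : target(name), ready(true) {}
};

// A static object: its constructor has run by the time main() uses it.
logger standard_log("stderr");

// Reports whether the static object has been constructed as expected.
bool log_is_ready() {
    return standard_log.ready && standard_log.target == "stderr";
}
```

A call to log_is_ready() from main() finds the object already constructed, with no explicit initialization call anywhere in the program.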
A destructor is a function that must be executed when an object of its type is destroyed (for
example, when an object goes out of scope). The name of the destructor for a class is the
complement operator ~ followed by the name of the class, for example
    ~ostream();
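How a destructor can take care of flushing may be sketched as follows. This is an assumption-laden illustration in present-day C++ rather than the library's code: the class buffered_writer, its members, and the helper write_hi() are hypothetical, and a std::string stands in for the output device.

```cpp
#include <string>

// Hypothetical buffered writer: characters are held in a private buffer
// and reach the sink only when the buffer is flushed.
class buffered_writer {
    std::string& sink_;
    std::string pending_;
public:
    explicit buffered_writer(std::string& sink) : sink_(sink) {}
    ~buffered_writer() { flush(); }    // cleanup: flush on destruction

    void put(char c) { pending_ += c; }
    void flush() { sink_ += pending_; pending_.clear(); }
};

// Writes two characters and relies on the destructor to deliver them.
std::string write_hi() {
    std::string sink;
    {
        buffered_writer w(sink);
        w.put('h');
        w.put('i');
    }   // w goes out of scope here and its destructor flushes
    return sink;
}
```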
    class complex {
        double re, im;
    public:
        complex(double r = 0, double i = 0) { re=r; im=i; }
        friend double real(complex& a) { return a.re; }
        friend double imag(complex& a) { return a.im; }
        // ...
    };
Operator << can be defined for the new type complex like this
    ostream& operator<<(ostream& s, complex z)
    {
        return s << "(" << real(z) << "," << imag(z) << ")";
    }
and used exactly like a built-in type:
    complex x(1,2);
    // ...
    cout << "x = " << x << "\n";
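For readers who want to try this with a current compiler, the same idea can be written against the standard <ostream> header. The sketch below is an adaptation, not the code from this paper: the class is named cplx to avoid any clash with std::complex, and show() is a hypothetical helper that captures the output in a string.

```cpp
#include <ostream>
#include <sstream>
#include <string>

class cplx {
    double re, im;
public:
    cplx(double r = 0, double i = 0) : re(r), im(i) {}
    friend double real(const cplx& a) { return a.re; }
    friend double imag(const cplx& a) { return a.im; }
};

// Output operator for the user-defined type; class std::ostream is not modified.
std::ostream& operator<<(std::ostream& s, const cplx& z) {
    return s << "(" << real(z) << "," << imag(z) << ")";
}

// Formats "x = (re,im)\n" exactly as one would for a built-in type.
std::string show(const cplx& x) {
    std::ostringstream out;
    out << "x = " << x << "\n";
    return out.str();
}
```

The operator only uses the public interface of the stream and of cplx, which is the property the following paragraph relies on.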
Note that defining an output operation for a user-defined type does not require modification of
the declaration of class ostream, or access to the (hidden) data structure maintained by it. The
former is fortunate since the declaration of class ostream resides among the standard header files
to which the general user does not have write access. The latter is also important since it
provides a good protection against accidental corruption of that data structure. It also makes it
possible to change the implementation of an ostream without affecting user programs (see the
acknowledgements).
Formatted output
So far << has been used only for unformatted output, and that has indeed been its major use
in real programs. There are, however, a few formatting routines that create a string
representation of their argument for use as output. Their (optional) second argument specifies the number
of character positions to be used.
    char* oct(long, int =0);    // octal representation
    char* dec(long, int =0);    // decimal representation
    char* hex(long, int =0);    // hexadecimal representation
Truncation or padding will be done unless a zero-sized field is specified; then (exactly) as many
characters as needed are used. For example:
    cout << "oct(" << oct(x,6) << ") = hex(" << hex(x,4) << ")\n";
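A rough present-day counterpart of these field-width routines can be sketched with the standard manipulators std::oct, std::hex, and std::setw. The function names oct_field() and hex_field() are hypothetical, and the sketch only pads: unlike the routines described above it never truncates a value that is wider than the field.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Octal representation of x, right-adjusted in a field of the given width.
// A width of 0 means "as many characters as needed".
std::string oct_field(long x, int width = 0) {
    std::ostringstream out;
    out << std::oct << std::setw(width) << x;
    return out.str();
}

// Hexadecimal representation of x, padded in the same way.
std::string hex_field(long x, int width = 0) {
    std::ostringstream out;
    out << std::hex << std::setw(width) << x;
    return out.str();
}
```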
One can also use a printf style format string:
    char* form(char* ...);    // printf format
Using form() one gets exactly the facilities and problems well known from use of printf(); it is
actually sprintf() in disguise. Work is needed to get a satisfactory facility for providing
formatted output of user-defined types without the elaboration and type-insecurities associated with the
printf approach. In particular, it is probably necessary to find a standard way of providing the
output function for a user-defined type with information allowing it to determine space
limitations, expectations about padding, left or right adjustment, etc., as expressed by its caller. A
practical, but not ideal, approach is to provide functions for user-defined types that, like the
formatting functions above, produce a suitable string representation of the object for which they are
called. For example:
    class complex {
        float re,im;
    public:
        // ...
        char* string(char* format) { return form(format,re,im); }
    };
    // ...
    cout <<
Input can be done using an istream defined analogously to an ostream. Here is a small
complete program that reads in a number using the operator >> ("get from") on the standard input
stream cin. The number read is assumed to be a number of inches and the program prints the
equivalent number of centimeters:
    #include <stream.h>
    main()
    {
        int inch;
        cout << "inches=";
        cin >> inch;
        cout << inch << " in = " << inch*2.54 << " cm\n";
    }
This can be compiled and run given the input 10 like this:
    $ CC inch.c
    $ a.out
    inches=10
    10 in = 25.4 cm
    $
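The same program can be exercised without a terminal by passing the streams as arguments. The sketch below uses the present-day standard headers rather than <stream.h>; the functions convert() and run_convert() are hypothetical names introduced for the illustration.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Reads a number of inches from in and writes the equivalent in centimeters to out.
void convert(std::istream& in, std::ostream& out) {
    int inch = 0;
    out << "inches=";
    in >> inch;
    out << inch << " in = " << inch * 2.54 << " cm\n";
}

// Runs convert() on a string of input and returns everything that was written.
std::string run_convert(const std::string& input) {
    std::istringstream in(input);
    std::ostringstream out;
    convert(in, out);
    return out.str();
}
```

Because the input here comes from a string and is not echoed, the prompt and the result appear on one line of the returned text.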
Note that the address-of operator & does not appear anywhere in the example above. How
then did the number read get into the variable inch? The answer is that the input operations are
defined using reference arguments. A reference is an alternative name for an object. For
example, given an integer and a reference initialized with it
    int i;
    int& r = i;
The type int& is read "reference to int" and use of a reference is synonymous to use of the
name of the object it was initialized with. For example:
    i = 7;
    r = 7;
both assign 7 to i. By declaring an argument to be of type reference the classical "call by
reference" is obtained.
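A minimal sketch of both uses of a reference, as a second name for a variable and as a "call by reference" argument, might look like this. The function names store() and assign_through_reference() are hypothetical.

```cpp
// Stores value through a reference argument; the caller's variable is changed.
void store(int& target, int value) { target = value; }

// Assigns through a second name for i and returns the resulting value of i.
int assign_through_reference() {
    int i = 0;
    int& r = i;    // r is an alternative name for i
    r = 7;         // same effect as i = 7
    return i;
}
```

An input operation declared to take an int& can therefore deposit the value it reads directly in the caller's variable, with no & at the call site.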
Class istream is defined like this:
    class istream {
        // ...
    public:
        ostream* tie(ostream& s);
        // ...
    };
Naturally constructors and destructors are provided for the type istream as they were for the
type ostream.
One problem remains about why the example worked as intended: How did the output stream
cout figure out that it needed to write the prompt "inches=" before the input operation took
place? After all, had input and output been independent there would have been no reason to
flush cout's buffer until the end of the program, and the program did not contain an explicit
flush(). The stream I/O library uses the standard technique of "tying" an output stream to an
input stream. This means that cin knows about cout and executes a
    cout.flush()
before attempting to read characters from its device. The operation tie() can be used to tie any
output stream to any input stream. For example
    cin.tie(mystream);
would cause cin to flush the output stream mystream instead of cout.
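The effect of tying can be observed with the present-day standard library, where tie() takes a pointer rather than a reference. In the sketch below the class counting_buf and the function flushes_caused_by_read() are hypothetical; the count relies on the input operation flushing the tied stream eagerly, which the common implementations do, although the standard permits an implementation to defer that flush.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <streambuf>

// Stream buffer that counts how often it is asked to synchronize (flush).
class counting_buf : public std::stringbuf {
public:
    int syncs = 0;
protected:
    int sync() override { ++syncs; return 0; }
};

// Ties an output stream to an input stream, reads once, and reports how
// many flushes of the tied output stream were triggered by the read.
int flushes_caused_by_read() {
    counting_buf buf;
    std::ostream prompt(&buf);
    std::istringstream in("10");

    in.tie(&prompt);        // the input stream now "knows about" prompt
    prompt << "inches=";    // no explicit flush here
    int inch = 0;
    in >> inch;             // the tied stream is flushed before reading
    return buf.syncs;
}
```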
Consider the functions
    istream& operator>>(char*);    // string
    istream& operator>>(char&);    // single character
The first reads a whitespace terminated string into a vector of characters; the second reads a
single character into a char.
The functions reading ßoating point constants also accept plain integers.
Whitespace and raw input
The >> operator functions all skip whitespace characters. Whitespace is defined as the
standard C whitespace by a call to isspace() as defined in <ctype.h>. On an implementation using
the ASCII character set this defines whitespace to be the characters blank, tab, newline, vertical
tab, formfeed, and return.
Where it is not a good idea to simply treat any sequence of whitespace characters as a token
separator the functions
    istream& get(char& c);                          // single character
    istream& get(char* p, int n, int ='\n'&0377);   // string
can be used. They treat whitespace characters like other characters. The first reads a single
character into its argument, the second reads at most n characters into a character vector starting at
p. The optional third argument is used to specify a character that will not be read. By default, the
second get() function will read at most n characters, but not more than a line: '\n' will not be
read.
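The standard library of today still provides two get() functions of this shape, which makes the behavior easy to try. In the sketch below first_line() is a hypothetical helper; note that the present-day get(char*, n) stores at most n-1 characters followed by a terminating zero.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads one line with get() and then one raw character.
// Returns the line; the raw character is stored in next.
std::string first_line(std::istream& in, char& next) {
    char line[64];
    in.get(line, sizeof line);    // stops before '\n', which is left unread
    in.get(next);                 // raw read: whitespace is not skipped
    return line;
}
```

After the first call the newline is still in the stream, so the second call delivers it unchanged instead of skipping it.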
Stream states
Every stream has a "state" associated with it, and errors and non-standard conditions are
handled by setting and testing this state appropriately. The fundamental reason for choosing this
approach over the traditional C approach of returning an illegal value in case of trouble was the
desire to treat all types (including user-defined types) in the same way, and for many types there
is no possible "illegal value" that can be returned. For example, when reading an int, and
returning an int, every possible return value represents a legal value, so there is no way of
representing end-of-file.
An istream can be in one of the following states:
    enum stream_state { _good, _eof, _fail, _bad };
If the state is _good or _eof the previous input operation succeeded. If the state is _good the next
input operation might succeed, otherwise it will fail. If one tries to read into a variable v and the
operation fails the value of v should be unchanged (it is unchanged if v is of one of the types
"known to" class ostream). In other words, applying an input operation to a stream that is not
in the _good state is a null operation. The difference between the states _fail and _bad is subtle,
and only really interesting to implementors of input operations: In the state _fail it is assumed
that the stream is uncorrupted and that no characters have been "lost". In the state _bad all bets
are off.
    switch (cin.rdstate()) {
    case _good:
        // the last operation on cin succeeded
        break;
    case _eof:
        // at end of file
        break;
    case _fail:
        // BAD
        break;
    }
It might be worth noting that if someone invented a new state so that the test above only
handled 4 out of 5 cases the compiler would issue a warning.
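In the present-day standard library the state is a set of flags tested with good(), eof(), fail(), and bad() rather than a single enumeration value, but the four-way distinction can still be sketched. The functions describe() and state_after_read() below are hypothetical, and the mapping onto the four names is an assumption of this example.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Maps the state of a stream onto the four names used in the text.
const char* describe(const std::istream& in) {
    if (in.good()) return "good";    // last operation succeeded, more may follow
    if (in.bad())  return "bad";     // stream possibly corrupted
    if (in.fail()) return "fail";    // last operation failed, stream intact
    return "eof";                    // last operation succeeded, at end of input
}

// Reads one int from text and reports the resulting state.
std::string state_after_read(const std::string& text, int& value) {
    std::istringstream in(text);
    in >> value;
    return describe(in);
}
```

One difference from the description above is worth keeping in mind: since C++11 a failed extraction of a number stores zero in its argument rather than leaving it unchanged.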
For any variable z of a type for which the operators >> and << have been defined a copy loop
can be written like this:
    while (cin>>z) cout << z << "\n";
For example, if z is a character vector this loop will take standard input and put it one word
(that is, a sequence of non-whitespace characters) per line onto standard output.
When a stream (or a stream operation returning a reference to a stream as in the copy
example) is used as a condition, the state of the stream is tested and the test "succeeds", that is the
value of the condition is non-zero, (only) if the state is _good. To find out why a loop or test
failed one can examine the state.
To copy characters (including whitespace characters) the raw input function get() can be
used:
    char ch;
    while (cin.get(ch)) cout<<ch;
An input operation can be defined for a user-defined type exactly as an output operation was,
but for an input operation it is essential that the second argument is of reference type. For
example:
    istream& operator>>(istream& s, complex& a)
    /*
        input formats for a complex; "f" indicates a float:
            f
            ( f )
            ( f , f )
    */
    {
        double re = 0, im = 0;
        char c = 0;
        s>>c;
        if (c == '(') {
            s>>re>>c;
            if (c == ',') s>>im>>c;
            if (c != ')') s.clear(_bad);    // set the state
        }
        else {
            s.putback(c);
            s>>re;
        }
        if (s) a = complex(re,im);
        return s;
    }
Despite the scarcity of error handling code this will actually handle most kinds of errors well.
The local variable c was initialized to avoid having its value accidentally '(' after a failed
operation, and the final check of the stream state ensures that the value of the argument a is changed
only if everything went well.
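The input operator carries over to the present-day standard library almost unchanged. The sketch below is an adaptation under stated assumptions: the class is called cplx to avoid a clash with std::complex, its members real() and imag() are hypothetical accessors, and setstate(std::ios::failbit) replaces the clear(_bad) call of the original.

```cpp
#include <istream>
#include <sstream>

class cplx {
    double re_, im_;
public:
    cplx(double r = 0, double i = 0) : re_(r), im_(i) {}
    double real() const { return re_; }
    double imag() const { return im_; }
};

// Accepts the formats  f,  ( f )  and  ( f , f ).
std::istream& operator>>(std::istream& s, cplx& a) {
    double re = 0, im = 0;
    char c = 0;                 // initialized so that a failed read cannot leave '(' in c
    s >> c;
    if (c == '(') {
        s >> re >> c;
        if (c == ',') s >> im >> c;
        if (c != ')') s.setstate(std::ios::failbit);    // set the state
    } else {
        s.putback(c);
        s >> re;
    }
    if (s) a = cplx(re, im);    // change a only if everything went well
    return s;
}
```

Reading from the text "(1;2)" leaves the stream in a failed state and the target unchanged, which is the behavior the final state check is there to guarantee.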
More work is needed on the input operations. In particular it would be nice if one could
specify input in terms of a pattern (as in languages like Snobol or Icon) and then just test for
success and failure of the complete input operation. Such operations would naturally have to
provide some extra buffering so that they could "restore an input stream to its original state"
after a failed pattern-match operation.
String manipulation
Traditionally the functions sprintf() and sscanf() have been used to do I/O-like operations
on character strings. Using streams similar operations can be done by binding an istream or an
ostream to a character vector and then using the associated operators exactly as if the stream was
bound to a device. For example, if a vector buf contains a traditional zero-terminated string of
characters the copy loop presented above can be used to print the words from that vector:
    char buf[SOMESIZE];
    // fill buf
    istream ist(sizeof(buf),buf);    // make a stream for buf
    char b2[MAX];                    // larger than largest word
    while (ist>>b2) cout << b2 << "\n";
Another use of this would be to read a Þle into a vector of characters replacing every
sequence of whitespace characters with a single space:
    char buf[SOMESIZE];              // hopefully large enough
    ostream ost(sizeof(buf),buf);
    char b2[MAX];                    // larger than largest word
    while (cin>>b2) ost << b2 << " ";
There is no need to check for overflow of buf; its associated stream ost knows its size and will
go into _fail state when it is full.
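The present-day standard library has no ostream constructor taking a size and a character vector, but a stream bound to a fixed vector can be sketched with a small stream buffer. The class fixed_buf and the function squeeze() are hypothetical names; the sketch relies on the default overflow handling reporting failure once the vector is full, which puts the stream into a failed state.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Stream buffer that writes into a caller-supplied vector and never grows.
class fixed_buf : public std::streambuf {
public:
    fixed_buf(char* p, std::size_t n) { setp(p, p + n); }
    std::size_t used() const { return static_cast<std::size_t>(pptr() - pbase()); }
};

// Copies words from in into dest, separated by single spaces.
// Returns the number of characters stored; overflowed reports a full vector.
std::size_t squeeze(std::istream& in, char* dest, std::size_t size, bool& overflowed) {
    fixed_buf buf(dest, size);
    std::ostream ost(&buf);
    std::string word;
    while (ost && in >> word) ost << word << " ";
    overflowed = !ost;    // the stream fails by itself when dest is full
    return buf.used();
}
```

As in the loop above, the copying code contains no explicit bounds check; the stream state carries that information.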
Looking at the examples of output above one might conjecture that "an object should not be
printed by some general function, but rather print itself given an output stream and maybe also
some formatting information as arguments". In other words the output operator for a type X
should look something like this:
    class X {
        // ...
        print(ostream& s = cout, format_type& format = default_format);
    };
    X obj;
    // ...
    obj.print(cerr);
This does have some appeal, but could not be the basic model in C++ since built-in types like
int are not classes so that it is not possible to write
    123.print(cerr);    // illegal
Furthermore, this style could easily lead to the verbosity that the << operator style of output was
invented to avoid.
There are, however, cases where this "inversion" of the output paradigm becomes necessary†.
† The considerations and concepts involved in this example will be familiar to users of Simula67, Smalltalk, or
C++, but may appear rather strange to others. If so, please have a look at any of these languages: the issues are
fundamental.
Consider a class shape providing the general concept of a geometric shape:
    class shape {
        point center;    // every shape has a center
        // ...
    public:
        // ...
        virtual draw();
    };
A shape can be "drawn", that is, printed on a stream, but the function draw() is virtual.
That is, a separate draw() function is provided for each particular kind of shape derived from
class shape. For example:
    class circle : public shape {
        int radius;    // a circle has both a center and a radius
    public:
        // ...
        void draw();
        circle(point cen, int rad);
    };
That is, a circle has all the attributes of a shape, and can be manipulated as a shape, but it
also has some special properties that must be taken into account when it is manipulated. For
example:
    shape* p;
    circle c(point(0,0),10);
    p = &c;          // the compiler does not know that *p is a circle
    // ...
    p->draw();       // somewhere else
must draw p as a circle. In other words, the fact that the shape pointed to by p is a circle must
be deduced at run time from information stored in each shape.
Now consider providing this facility within the "ostream<<object" paradigm for output. Like
Smalltalk and Simula67, C++ only provides the run time type resolution necessary to determine
the actual type of an object at run time using the "object.operation(argument)" paradigm, so an
<< operation must be "inverted". This is how the inversion can be done:
    ostream& operator<<(ostream& s, shape* p) {
        return p->draw(s);
    }
Naturally, the reason for the inversion is to maintain a single (terse) paradigm for output
operations. Since there is no standard way of passing formatting information on to the virtual
output function (draw in this example), the solution is not perfect.
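A self-contained version of the inversion, written against the present-day standard library, might look as follows. It is a sketch under assumptions that differ slightly from the declarations above: draw() is given an ostream argument and result so that the operator can forward to it, the operator takes a reference rather than a pointer, the center is omitted, and render() is a hypothetical helper for capturing the output.

```cpp
#include <ostream>
#include <sstream>
#include <string>

class shape {
public:
    virtual ~shape() {}
    // Each kind of shape supplies its own way of printing itself.
    virtual std::ostream& draw(std::ostream& s) const = 0;
};

class circle : public shape {
    int radius_;
public:
    explicit circle(int rad) : radius_(rad) {}
    std::ostream& draw(std::ostream& s) const override {
        return s << "circle(r=" << radius_ << ")";
    }
};

// The "inverted" output operator: one function for every shape,
// with the actual work selected at run time through the virtual call.
std::ostream& operator<<(std::ostream& s, const shape& p) {
    return p.draw(s);
}

// Renders any shape to a string through the ordinary << notation.
std::string render(const shape& p) {
    std::ostringstream out;
    out << p;
    return out.str();
}
```

The caller keeps the terse << notation while the choice of draw() is made from the run time type of the object.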
Buffering
The I/O operations have been specified without any mention of device types, but not all
devices can be treated identically with respect to buffering strategies. In particular, an ostream
bound to a character string needs a different kind of buffer from an ostream bound to a file.
There is also a need for double buffering of streams connected to network facilities. These
problems are handled by providing different buffer types for different streams at the time of
initialization (note the three constructors for class ostream presented above). There is only one set of
operations on these buffer types, so the ostream functions do not contain code distinguishing
them. However, the functions handling buffer underflow and overflow are virtual. This is
sufficient to cope with the buffering strategies needed to date, and an excellent example of the use of
virtual functions to allow uniform treatment of logically equivalent facilities with different
implementations. The declaration of a stream buffer in <stream.h> looks like this:
// ...
};
Note that the pointers needed to maintain the buffer are specified here so that the common "per
character" operations can be defined (once only) as maximally efficient inline functions. Only
the overflow() and underflow() functions need to be implemented for each particular buffering
strategy. For example:
    struct filebuf : public streambuf {    // a stream buffer for files
        // ...
    };
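The same division of labor survives in the present-day std::streambuf, where a buffering strategy is supplied by overriding the virtual function overflow(). The sketch below is an illustration of that mechanism and not the filebuf of this paper: the class string_sink, its members, and the function write_through_sink() are hypothetical, and the buffer deliberately has no put area so that every character reaches overflow().

```cpp
#include <ostream>
#include <streambuf>
#include <string>

// A buffering strategy defined only by its overflow handling:
// each character that does not fit in the (empty) put area is appended to data.
class string_sink : public std::streambuf {
public:
    std::string data;
    int overflows = 0;
protected:
    int_type overflow(int_type ch) override {
        ++overflows;
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);    // nothing to store
        data += traits_type::to_char_type(ch);
        return ch;
    }
};

// Writes through an ordinary ostream that knows nothing about the strategy.
std::string write_through_sink() {
    string_sink sink;
    std::ostream out(&sink);
    out << "n = " << 42;
    return sink.data;
}
```

The ostream code is identical for every kind of buffer; only the virtual call distinguishes one strategy from another.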
Efficiency
One might expect that since this I/O facility is defined using generally available language
features, it is noticeably less efficient than a built-in facility. This does not appear to be the case.
Inline expanded functions are used for the basic operations (like "put a character into a buffer"),
so the basic overhead tends to be one function call per simple object (integer, string, etc.) written
(or read) plus one function call per buffer overflow. This does not appear to be fundamentally
different from other I/O facilities dealing with objects at this level.
Conclusion
... paradigm for input and output of both built-in ... avoiding the verbosity traditionally
associated with extensible type-secure I/O schemes ... (surprisingly) turned out to be ...
I/O facility handles a range of ... a bit more work, espe...
Acknowledgements
References