Draft ANSI C Rationale
1 INTRODUCTION
1.1 Purpose
1.2 Scope
1.3 References
1.4 Organization of the document
1.5 Base documents
1.6 Definitions of terms
1.7 Compliance
1.8 Future directions
2 ENVIRONMENT
2.1 Conceptual models
2.1.1 Translation environment
2.1.2 Execution environments
2.2 Environmental considerations
2.2.1 Character sets
2.2.2 Character display semantics
2.2.3 Signals and interrupts
2.2.4 Environmental limits
3 LANGUAGE
3.1 Lexical Elements
3.1.1 Keywords
3.1.2 Identifiers
3.1.3 Constants
3.1.4 String literals
3.1.5 Operators
3.1.6 Punctuators
3.1.7 Header names
3.1.8 Preprocessing numbers
3.1.9 Comments
3.2 Conversions
3.2.1 Arithmetic operands
4 LIBRARY
4.1 Introduction
4.1.1 Definitions of terms
4.1.2 Standard headers
4.1.3 Errors <errno.h>
4.1.4 Limits <float.h> and <limits.h>
4.1.5 Common definitions <stddef.h>
4.1.6 Use of library functions
4.2 Diagnostics <assert.h>
4.2.1 Program diagnostics
4.3 Character Handling <ctype.h>
4.3.1 Character testing functions
4.3.2 Character case mapping functions
4.4 Localization <locale.h>
4.4.1 Locale control
4.4.2 Numeric formatting convention inquiry
4.5 Mathematics <math.h>
4.5.1 Treatment of error conditions
4.5.2 Trigonometric functions
4.5.3 Hyperbolic functions
4.5.4 Exponential and logarithmic functions
4.5.5 Power functions
4.5.6 Nearest integer, absolute value, and remainder functions
4.6 Nonlocal jumps <setjmp.h>
4.6.1 Save calling environment
4.6.2 Restore calling environment
4.7 Signal Handling <signal.h>
4.7.1 Specify signal handling
4.7.2 Send signal
4.8 Variable Arguments <stdarg.h>
4.8.1 Variable argument list access macros
4.9 Input/Output <stdio.h>
4.9.1 Introduction
4.9.2 Streams
4.9.3 Files
4.9.4 Operations on files
4.9.5 File access functions
4.9.6 Formatted input/output functions
4.9.7 Character input/output functions
4.9.8 Direct input/output functions
4.9.9 File positioning functions
4.9.10 Error-handling functions
4.10 General Utilities <stdlib.h>
4.10.1 String conversion functions
4.10.2 Pseudo-random sequence generation functions
4.10.3 Memory management functions
4.10.4 Communication with the environment
4.10.5 Searching and sorting utilities
4.10.6 Integer arithmetic functions
4.10.7 Multibyte character functions
4.10.8 Multibyte string functions
4.11 STRING HANDLING <string.h>
4.11.1 String function conventions
4.11.2 Copying functions
4.11.3 Concatenation functions
4.11.4 Comparison functions
4.11.5 Search functions
4.11.6 Miscellaneous functions
4.12 DATE AND TIME <time.h>
4.12.1 Components of time
4.12.2 Time manipulation functions
4.12.3 Time conversion functions
4.13 Future library directions
4.13.1 Errors <errno.h>
4.13.2 Character handling <ctype.h>
4.13.3 Localization <locale.h>
4.13.4 Mathematics <math.h>
4.13.5 Signal handling <signal.h>
4.13.6 Input/output <stdio.h>
4.13.7 General utilities <stdlib.h>
4.13.8 String handling <string.h>
5 APPENDICES
INDEX
Section 1
INTRODUCTION
1.1 Purpose
The Committee’s overall goal was to develop a clear, consistent, and unambiguous
Standard for the C programming language which codifies the common, existing def-
inition of C and which promotes the portability of user programs across C language
environments.
The X3J11 charter clearly mandates the Committee to codify common existing
practice. The Committee has held fast to precedent wherever this was clear and
unambiguous. The vast majority of the language defined by the Standard is precisely
the same as is defined in Appendix A of The C Programming Language by Brian
Kernighan and Dennis Ritchie, and as is implemented in almost all C translators.
(This document is hereinafter referred to as K&R.)
K&R is not the only source of “existing practice.” Much work has been done over
the years to improve the C language by addressing its weaknesses. The Committee
has formalized enhancements of proven value which have become part of the various
dialects of C.
Existing practice, however, has not always been consistent. Various dialects
of C have approached problems in different and sometimes diametrically opposed
ways. This divergence has happened for several reasons. First, K&R, which has
served as the language specification for almost all C translators, is imprecise in some
areas (thereby allowing divergent interpretations), and it does not address some
issues (such as a complete specification of a library) important for code portability.
Second, as the language has matured over the years, various extensions have been
added in different dialects to address limitations and weaknesses of the language;
these extensions have not been consistent across dialects.
One of the Committee’s goals was to consider such areas of divergence and to
establish a set of clear, unambiguous rules consistent with the rest of the language.
This effort included the consideration of extensions made in various C dialects, the
specification of a complete set of required library functions, and the development of
a complete, correct syntax for C.
The work of the Committee was in large part a balancing act. The Committee
has tried to improve portability while retaining the definition of certain features of
C as machine-dependent. It attempted to incorporate valuable new ideas without
disrupting the basic structure and fabric of the language. It tried to develop a clear
and consistent language without invalidating existing programs. All of the goals were
important and each decision was weighed in the light of sometimes contradictory
requirements in an attempt to reach a workable compromise.
In specifying a standard language, the Committee used several guiding principles,
the most important of which are:
Existing code is important, existing implementations are not. A large body
of C code exists of considerable commercial value. Every attempt has been made
to ensure that the bulk of this code will be acceptable to any implementation con-
forming to the Standard. The Committee did not want to force most programmers
to modify their C programs just to have them accepted by a conforming translator.
On the other hand, no one implementation was held up as the exemplar by which
to define C: it is assumed that all existing implementations must change somewhat
to conform to the Standard.
C code can be portable. Although the C language was originally born with the
UNIX operating system on the DEC PDP-11, it has since been implemented on a
wide variety of computers and operating systems. It has also seen considerable use
in cross-compilation of code for embedded systems to be executed in a free-standing
environment. The Committee has attempted to specify the language and the library
to be as widely implementable as possible, while recognizing that a system must meet
certain minimum criteria to be considered a viable host or target for the language.
C code can be non-portable. Although it strove to give programmers the op-
portunity to write truly portable programs, the Committee did not want to force
1.2 Scope
This Rationale focuses primarily on additions, clarifications, and changes made to
the language as described in the Base Documents (see §1.5). It is not a rationale for
the C language as a whole: the Committee was charged with codifying an existing
language, not designing a new one. No attempt is made in this Rationale to defend
the pre-existing syntax of the language, such as the syntax of declarations or the
binding of operators.
The Standard is contrived as carefully as possible to permit a broad range of im-
plementations, from direct interpreters to highly optimizing compilers with separate
linkers, from ROM-based embedded microcomputers to multi-user multi-processing
host systems. A certain amount of specialized terminology has therefore been cho-
sen to minimize the bias toward compiler implementations shown in the Base Doc-
uments.
The Rationale discusses some language or library features which were not
adopted into the Standard. These are usually features which are popular in some C
implementations, so that a user of those implementations might question why they
do not appear in the Standard.
1.3 References
• A char (or signed char or unsigned char) occupies exactly one byte.
(Thus, for instance, on a machine with 36-bit words, a byte can be defined to consist
of 9, 12, 18, or 36 bits, these numbers being all the exact divisors of 36 which are not
less than 8.) These strictures codify the widespread presumption that any object
can be treated as an array of characters, the size of which is given by the sizeof
operator with that object’s type as its operand.
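The presumption that any object can be treated as an array of characters can be sketched as follows. This is an illustration only: the function name round_trip_ok and the choice of a double object are assumptions made for the example.

```c
#include <limits.h>
#include <string.h>

/* Copy an object out to a character array and back again.
   sizeof reports the size of the object in bytes, that is, in chars. */
int round_trip_ok(void)
{
    double original = 3.25;
    double restored;
    unsigned char raw[sizeof original];   /* one char per byte of the object */

    memcpy(raw, &original, sizeof original);
    memcpy(&restored, raw, sizeof restored);
    return restored == original;
}
```

On any conforming implementation sizeof(char) is 1 and CHAR_BIT (from <limits.h>) is at least 8, whatever the width of the machine word.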
These definitions do not preclude “holes” in struct objects. Such holes are in
fact often mandated by alignment and packing requirements. The holes simply do
not participate in representing the (composite) value of an object.
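A small sketch of such holes, assuming a hypothetical structure struct sample; whether any hole actually appears, and how large it is, depends on the alignment rules of the implementation.

```c
#include <stddef.h>

/* Hypothetical structure: a small member followed by a wider one. */
struct sample {
    char tag;
    long value;
};

/* Number of bytes in the structure that belong to no member.
   The result may be zero on an implementation with no alignment needs. */
size_t hole_bytes(void)
{
    return sizeof(struct sample) - (sizeof(char) + sizeof(long));
}
```

The macro offsetof(struct sample, value) from <stddef.h> reports where the second member begins, which shows where a hole, if any, has been placed.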
The definition of object does not employ the notion of type. Thus an object has
no type in and of itself. However, since an object may only be designated by an
lvalue (see §3.2.2.1), the phrase “the type of an object” is taken to mean, here and
in the Standard, “the type of the lvalue designating this object,” and “the value of
an object” means “the contents of the object interpreted as a value of the type of
the lvalue designating the object.”
The concept of multi-byte character has been added to C to support very large
character sets. See §2.2.1.2.
1.7 Compliance
The three-fold definition of compliance is used to broaden the population of con-
forming programs and distinguish between conforming programs using a single im-
plementation and portable conforming programs.
A strictly conforming program is another term for a maximally portable program.
The goal is to give the programmer a fighting chance to make powerful C programs
that are also highly portable, without demeaning perfectly useful C programs that
happen not to be portable. Thus the adverb strictly.
• A strictly conforming program can use only a restricted subset of the identifiers
that begin with underscore (§4.1.2). Identifiers and keywords are distinct
(§3.1.1). Otherwise, programmers can use whatever internal names they wish;
a conforming implementation is guaranteed not to use conflicting names of
the form reserved to the programmer. (Note, however, the class of identifiers
which are identified in §4.13 as possible future library names.)
• The external functions defined in, or called within, a portable program can be
named whatever the programmer wishes, as long as these names are distinct
from the external names defined by the Standard library (§4). External names
in a maximally portable program must be distinct within the first 6 characters
mapped into one case (§3.1.2).
Other proposals rejected more quickly were to provide a validation suite, and to
provide the source code for an acceptable library. Both were recognized to be major
undertakings, and both were seen to compromise the integrity of the Standard by
giving concrete examples that might bear more weight than the Standard itself. The
potential legal implications were also a concern.
Standardization of such tools as program consistency checkers and symbolic
debuggers lies outside the mandate of the Committee. However, the Committee
has taken pains to allow such programs to work with conforming programs and
implementations.
Section 2
ENVIRONMENT
upon C, the preprocessing commands accreted over time, with little central direction,
and with even less precision in their documentation. This evolution has resulted in
a variety of local features, each with its ardent adherents: the Base Document offers
little clear basis for choosing one over the other.
The consensus of the Committee is that preprocessing should be simple and
overt, that it should sacrifice power for clarity. For instance, the macro invocation
f(a, b) should assuredly have two actual arguments, even if b expands to c, d;
and the formal definition of f must call for exactly two arguments. Above all,
the preprocessing sub-language should be specified precisely enough to minimize or
eliminate dialect formation.
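The two-argument rule can be illustrated with a short sketch. The macro names PAIR and SECOND and the array expanded are hypothetical, chosen only for this example.

```c
#define PAIR 3, 4            /* hypothetical macro that expands to two items */
#define SECOND(x, y) y       /* the formal definition calls for two arguments */

/* SECOND(1, PAIR) is recognized as having two actual arguments, 1 and PAIR.
   Only afterwards is PAIR replaced, so the initializer becomes { 3, 4 }. */
static const int expanded[] = { SECOND(1, PAIR) };

int expanded_count(void)
{
    return (int)(sizeof expanded / sizeof expanded[0]);
}
```

Had the arguments been counted after expansion, the invocation would have carried three arguments and would not match the definition of SECOND.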
To clarify the nature of preprocessing, the translation from source text to tokens
is spelled out as a number of separate phases. The separate phases need not actually
be present in the translator, but the net effect must be as if they were. The phases
need not be performed in a separate preprocessor, although the definition certainly
permits this common practice. Since the preprocessor need not know anything
about the specific properties of the target, a machine-independent implementation
is permissible.
The Committee deemed that it was outside the scope of its mandate to require
the output of the preprocessing phases be available as a separate translator output
file.
The phases of translation are spelled out to resolve the numerous questions
raised about the precedence of different parses. Can a #define begin a comment?
(No.) Is backslash/new-line permitted within a trigraph? (No.) Must a comment
be contained within one #include file? (Yes.) And so on. The Rationale section
on preprocessing (§3.8) discusses the reasons for many of the particular decisions
which shaped the specification of the phases of translation.
A backslash immediately before a new-line has long been used to continue string
literals, as well as preprocessing command lines. In the interest of easing machine
generation of C, and of transporting code to machines with restrictive physical
line lengths, the Committee generalized this mechanism to permit any token to be
continued by interposing a backslash/new-line sequence.
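A brief sketch of the generalized mechanism follows; the function names are assumptions made for the example. The continuation line must begin in the first column, since any leading white space would become part of the source text.

```c
#include <string.h>

/* The identifier "counter" is split across two physical lines. */
int spliced_identifier(void)
{
    int coun\
ter = 40;
    return counter + 2;
}

/* The long-standing use: continuing a string literal. */
size_t spliced_string_length(void)
{
    return strlen("abc\
def");
}
```

Splitting an identifier this way is hardly good style in hand-written code; the facility is aimed at generated code and at narrow physical line limits.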
2.1.1.3 Diagnostics
By mandating some form of diagnostic message for any program containing a syntax
error or constraint violation, the Standard performs two important services. First, it
gives teeth to the concept of erroneous program, since a conforming implementation
must distinguish such a program from a valid one. Second, it severely constrains
the nature of extensions permissible to a conforming implementation.
The Standard says nothing about the nature of the diagnostic message, which
could simply be “syntax error”, with no hint of where the error occurs. (An
implementation must, of course, describe what translator output constitutes a di-
agnostic message, so that the user can recognize it as such.) The Committee ulti-
mately decided that any diagnostic activity beyond this level is an issue of quality of
implementation, and that market forces would encourage more useful diagnostics.
Nevertheless, the Committee felt that at least some significant class of errors must
be diagnosed, and the class specified should be recognizable by all translators.
The Standard does not forbid extensions, but such extensions must not inval-
idate strictly conforming programs. The translator must diagnose the use of such
extensions, or allow them to be disabled as discussed in (Rationale) §1.7. Other-
wise, extensions to a conforming C implementation lie in such realms as defining
semantics for syntax to which no semantics is ascribed by the Standard, or giving
meaning to undefined behavior.
The properties required of a hosted environment are spelled out in a fair amount of
detail in order to give programmers a reasonable chance of writing programs which
are portable among such environments.
The behavior of the arguments to main, and of the interaction of exit, main
and atexit (see §4.10.4.2) has been codified to curb some unwanted variety in the
representation of argv strings, and in the meaning of values returned by main.
The specification of argc and argv as arguments to main recognizes extensive
prior practice. argv[argc] is required to be a null pointer to provide a redundant
check for the end of the list, also on the basis of common practice.
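The redundant end-of-list check can be sketched as below. The helper name count_by_sentinel is hypothetical; it walks an argv-style vector to its terminating null pointer without consulting argc.

```c
#include <stddef.h>

/* Count the argument strings by walking to the terminating null pointer. */
int count_by_sentinel(char **argv)
{
    int n = 0;

    while (argv[n] != NULL)
        ++n;
    return n;
}
```

Called from main as count_by_sentinel(argv), the result should agree with argc.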
main is the only function that may portably be declared either with zero or two
arguments. (The number of arguments must ordinarily match exactly between invo-
cation and definition.) This special case simply recognizes the widespread practice
of leaving off the arguments to main when the program does not access the program
argument strings. While many implementations support more than two arguments
to main, such practice is neither blessed nor forbidden by the Standard; a program
that defines main with three arguments is not strictly conforming. (See Standard
Appendix F.5.1.)
Command line I/O redirection is not mandated by the Standard; this was deemed
to be a feature of the underlying operating system rather than the C language.
sum = 0;
for (i = 0; i < N; ++i)
    sum += a[i];
both sum and i might be profitably kept in registers during the execution of the
loop. Thus, the actual memory objects designated by sum and i would not change
state during the loop.
Such behavior is, of course, too loose for hardware-oriented applications such as
device drivers and memory-mapped I/O. The following loop looks almost identical
to the previous example, but the specification of volatile ensures that each assign-
ment to *ttyport takes place in the same sequence, and with the same values, as
the (hypothetical) abstract machine would have done.
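A loop of the kind described might look like the sketch below. The object fake_port is a hypothetical stand-in for a memory-mapped device register, and send_all is a name invented for the example; a real driver would obtain the port address from the hardware documentation.

```c
#include <stddef.h>

/* Hypothetical stand-in for a memory-mapped output register. */
static volatile short fake_port;

/* Because *ttyport is volatile, every store in the loop must be performed,
   in order, exactly as the abstract machine would perform it. */
void send_all(volatile short *ttyport, const short *a, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i)
        *ttyport = a[i];
}
```

Without the volatile qualifier a translator would be free to keep only the final store, since no intermediate value is ever read back by the program.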
evaluation of the subexpression mask1 & mask2 could be performed prior to the
loop in the real implementation, assuming that neither mask1 nor mask2 appear as
an operand of the address-of (&) operator anywhere in the function. In the abstract
machine, of course, this subexpression is re-evaluated at each loop iteration, but
the real implementation is not required to mimic this repetitiveness, because the
variables mask1 and mask2 are not volatile and the same results are obtained
either way.
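The equivalence can be sketched with two hand-written versions of such a loop. The function names are assumptions for illustration; the second function shows by hand the rearrangement a translator is permitted to make on its own.

```c
#include <stddef.h>

/* As the abstract machine sees it: mask1 & mask2 is evaluated each time. */
void apply_masks(int *a, size_t n, int mask1, int mask2)
{
    size_t i;

    for (i = 0; i < n; ++i)
        a[i] &= mask1 & mask2;
}

/* What a real implementation may do: evaluate the subexpression once. */
void apply_masks_hoisted(int *a, size_t n, int mask1, int mask2)
{
    int both = mask1 & mask2;
    size_t i;

    for (i = 0; i < n; ++i)
        a[i] &= both;
}
```

Both functions leave the array in the same state, which is all the Standard requires.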
The previous example shows that a subexpression can be pre-computed in the
real implementation. A question sometimes asked regarding optimization is, “Is
the rearrangement still conforming if the pre-computed expression might raise a
signal (such as division by zero)?” Fortunately for optimizers, the answer is “Yes,”
because any evaluation that raises a computational signal has fallen into an undefined
behavior (§3.3), for which any action is allowable.
Behavior is described in terms of an abstract machine to underscore, once again,
that the Standard mandates results as if certain mechanisms are used, without
requiring those actual mechanisms in the implementation. The Standard specifies
agreement points at which the value of an object or class of objects in an implemen-
tation must agree with the value ascribed by the abstract semantics.
Appendix B to the Standard lists the sequence points specified in the body of
the Standard.
The class of interactive devices is intended to include at least asynchronous ter-
minals, or paired display screens and keyboards. An implementation may extend the
definition to include other input and output devices, or even network inter-program
connections, provided they obey the Standard’s characterization of interactivity.
# [ ] { } \ | ~ ^
Given this repertoire, the Committee faced the problem of defining representations
for the absent characters. The obvious idea of defining two-character escape se-
quences fails because C uses all the characters which are in the ISO 646 repertoire:
no single escape character is available. The best that can be done is to use a trigraph
— an escape digraph followed by a distinguishing character.
?? was selected as the escape digraph because it is not used anywhere else
in C (except as noted below); it suggests that something unusual is going on. The
third character was chosen with an eye to graphical similarity to the character being
represented.
The sequence ?? cannot currently occur anywhere in a legal C program except
in strings, character constants, comments, or header names. The character escape
sequence '\?' (see §3.1.3.4) was introduced to allow two adjacent question-marks
in such contexts to be represented as ?\?, a form distinct from the escape digraph.
The Committee makes no claims that a program written using trigraphs looks
attractive. As a matter of style, it may be wise to surround trigraphs with white
space, so that they stand out better in program text. Some users may wish to define
preprocessing macros for some or all of the trigraph sequences.
QUIET CHANGE
Programs with character sequences such as ??! in string constants,
character constants, or header names will now produce different results.
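A sketch of how a string containing adjacent question marks can be written so that it is never taken for a trigraph. The function names are hypothetical. The second form relies on trigraph replacement taking place before adjacent string literals are concatenated.

```c
#include <string.h>

/* Two adjacent question marks written with the \? escape. */
const char *escaped_text(void)
{
    return "What?\?!";
}

/* The same characters built from two separate literals. */
const char *split_text(void)
{
    return "What?" "?!";
}
```

Both functions yield the same seven-character string, whether or not the translator performs trigraph replacement.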
• The null character ('\0') may not be used as part of a multibyte encoding,
except for the one-byte null character itself. This allows existing functions
which manipulate strings transparently to work with multibyte sequences.
• Shift encodings (which interpret byte sequences in part on the basis of some
state information) must start out in a known (default) shift state under certain
circumstances, such as the start of string literals.
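The first restriction can be illustrated with a byte-oriented string function applied to a multibyte string. The encoding used here, UTF-8, is an assumption made for the example and is not named in the text above; it is simply one encoding in which no multibyte character contains a null byte.

```c
#include <string.h>

/* "cafe" with an accented final letter, encoded (by assumption) as UTF-8:
   the accented letter occupies two bytes, neither of them a null byte. */
static const char word[] = "caf\xC3\xA9";

/* strlen stops only at the terminating null character, so it works
   unchanged; note that it counts bytes, not characters. */
size_t byte_length(void)
{
    return strlen(word);
}
```

Functions such as strcpy and strcmp pass the multibyte sequence through in the same transparent way.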
2.2.4.2.1 Sizes of integral types <limits.h> Such a large body of C code has
been developed for 8-bit byte machines that the integer sizes in such environments
Footnote 1: See X3J3 working document S8-112.
Section 3
LANGUAGE
While more formal methods of language definition were explored, the Committee
decided early on to employ the style of the Base Document: Backus-Naur Form for
the syntax and prose for the constraints and semantics. Anything more ambitious
was considered to be likely to delay the Standard, and to make it less accessible to
its audience.
3.1.1 Keywords
Several keywords have been added: const, enum, signed, void, and volatile.
As much as possible, however, new features have been added by overloading ex-
isting keywords, as, for example, long double instead of extended. It is recognized
that each added keyword will require some existing code that used it as an identi-
fier to be rewritten. No meaningful programs are known to be quietly changed by
adding the new keywords.
The keywords entry, fortran, and asm have not been included since they were
either never used, or are not portable. Uses of fortran and asm as keywords are
noted as common extensions.
3.1.2 Identifiers
While an implementation is not obliged to remember more than the first 31 charac-
ters of an identifier for the purpose of name matching, the programmer is effectively
prohibited from intentionally creating two different identifiers that are the same in
the first 31 characters. Implementations may therefore store the full identifier; they
are not obliged to truncate to 31.
The decision to extend significance to 31 characters for internal names was made
with little opposition, but the decision to retain the old six-character case-insensitive
restriction on significance of external names was most painful. While strong senti-
ment was expressed for making C “right” by requiring longer names everywhere, the
Committee recognized that the language must, for years to come, coexist with other
languages and with older assemblers and linkers. Rather than undermine support
for the Standard, the severe restrictions have been retained.
The Committee has decided to label as obsolescent the practice of providing
different identifier significance for internal and external identifiers, thereby signalling
its intent that some future version of the C Standard require 31-character case-
sensitive external name significance, and thereby encouraging new implementations
to support such significance.
Three solutions to the external identifier length/case problem were explored,
each with its own set of problems:
1. Make sure that external identifiers are unique within the first six characters,
and use only one case within the name. A unique six-character prefix could be
used, followed by an underscore, followed by a longer, more descriptive name:
2. Use the prefix method described above, and then use #define statements to
provide a longer, more descriptive name for the unique name, such as:
Note that overuse of this technique might result in exceeding the limit on the
number of allowed #define macros, or some other implementation limit.
3. Use longer and/or multi-case external names, and limit the portability of the
programs to systems that support the longer names.
4. Declare all exported items (or pointers thereto) in a single data structure
and export that structure. The technique can reduce the number of external
identifiers to one per translation unit; member names within the structure are
internal identifiers, hence can have full significance. The principal drawback
of this technique is that functions can only be exported by reference, not by
name; on many systems this entails a run-time overhead on each function call.
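Techniques 1, 2, and 4 above might be sketched as follows. Every identifier in this example (the gfx prefixes, the structure gfx_exports, and so on) is hypothetical, and the three techniques are shown in one file only for brevity.

```c
/* Technique 1: a unique six-character, single-case prefix, then an
   underscore and a longer descriptive name. */
int gfx001_rectangle_area(int w, int h) { return w * h; }
int gfx002_square_area(int side)        { return side * side; }

/* Technique 2: a short unique external name, with #define supplying
   the readable alias used in the program text. */
#define shapes_drawn_so_far gfx003
int gfx003 = 0;

/* Technique 4: export a single structure whose members refer to items
   that are otherwise internal to the translation unit. */
static int doubled_triangle_area(int base, int height)
{
    return base * height;
}

struct gfx_exports {
    int (*doubled_triangle_area)(int base, int height);
    int *shapes_drawn;
};

const struct gfx_exports gfxtab = { doubled_triangle_area, &gfx003 };
```

A caller would write gfxtab.doubled_triangle_area(3, 4); the call goes through a pointer, which is the run-time overhead mentioned in item 4.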
QUIET CHANGE
A program that depends upon internal identifiers matching only in the
first (say) eight characters may change to one with distinct objects for
each variant spelling of the identifier.
first(){
    extern d_struct func();
    /* ... */
}
second(){
    d_struct n = func();
}
While it was generally agreed that it is poor practice to take advantage of an external
declaration once it had gone out of scope, some argued that a translator had to
remember the declaration for checking anyway, so why not acknowledge this? The
compromise adopted was to decree essentially that block scope rules apply, but that
a conforming implementation need not diagnose a failure to redeclare an external
identifier that had gone out of scope (undefined behavior).
QUIET CHANGE
A program relying on file scope rules may be valid under block scope
rules but behave differently, for instance if d_struct were defined as
type float rather than struct data in the example above.
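Under block scope rules the portable course is to repeat the external declaration in every block that uses it, or to place it once at file scope. A minimal sketch, with hypothetical function names and a double result standing in for d_struct:

```c
double first_user(void)
{
    extern double shared_scale(void);   /* visible only within this block */

    return shared_scale();
}

double second_user(void)
{
    extern double shared_scale(void);   /* repeated, as block scope requires */

    return 2.0 * shared_scale();
}

/* The definition appears last, so neither caller can rely on an earlier
   file scope declaration. */
double shared_scale(void)
{
    return 2.5;
}
```

Omitting the declaration in second_user would leave the call without a visible declaration of shared_scale, which is the situation the compromise leaves undefined.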
#define status 23
void exit(int status);
#define status []
which is syntactically correct but semantically quite different from the intent.
To protect an implementation’s header prototypes from such misinterpretation,
the implementor must write them to avoid these surprises. Possible solutions include
not using identifiers in prototypes, or using names (such as __status or _Status) in
the reserved name space.
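The hazard and the first of these remedies can be sketched in user code. The functions first_code and twice are hypothetical stand-ins for header prototypes; an actual implementation header could equally use a reserved spelling for the parameter name.

```c
/* A user macro that happens to match a parameter name used in a header. */
#define status []

/* Hypothetical header line written with an ordinary parameter name.
   After macro replacement it declares: int first_code(int []);      */
int first_code(int status);

/* Hypothetical header line written without a parameter name:
   there is nothing for the macro to replace.                        */
int twice(int);

#undef status

/* Definitions matching what the translator actually saw. */
int first_code(int *codes) { return codes[0]; }
int twice(int code)        { return 2 * code; }
```

The first prototype has quietly become a declaration of a function taking a pointer to int, while the second is unaffected by the macro.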
The definition model to be used for objects with external linkage was a major
standardization issue. The basic problem was to decide which declarations of an
object define storage for the object, and which merely reference an existing object.
A related problem was whether multiple definitions of storage are allowed, or only
one is acceptable. Existing implementations of C exhibit at least four different
models, listed here in order of increasing restrictiveness:
Common: Every object declaration with external linkage (whether or not the keyword
extern appears in the declaration) creates a definition of storage. When
all of the modules are combined together, each definition with the same name
is located at the same address in memory. (The name is derived from common
storage in FORTRAN.) This model was the intent of the original designer of
C, Dennis Ritchie.
Relaxed Ref/Def The appearance of the keyword extern (whether it is used out-
side of the scope of a function or not) in a declaration indicates a pure reference
(ref), which does not define storage. Somewhere in all of the translation units,
at least one definition (def) of the object must exist. An external definition
is indicated by an object declaration in file scope containing no storage class
indication. A reference without a corresponding definition is an error. Some
implementations also will not generate a reference for items which are declared
with the extern keyword, but are never used within the code. The UNIX oper-
ating system C compiler and linker implement this model, which is recognized
as a common extension to the C language (F.4.11). UNIX C programs which
take advantage of this model are standard conforming in their environment,
but are not maximally portable.
Strict Ref/Def This is the same as the relaxed ref/def model, save that only one
definition is allowed. Again, some implementations may decide not to put out
references to items that are not used. This is the model specified in K&R and
in the Base Document.
Relaxed Ref/Def
    int i;                  int i;
    main() {                second() {
        i = 1;                  third(i);
        second();           }
    }
Strict Ref/Def
    int i;                  extern int i;
    main() {                second() {
        i = 1;                  third(i);
        second();           }
    }
Initializer
    int i = 0;              int i;
    main() {                second() {
        i = 1;                  third(i);
        second();           }
    }
3.1.2.5 Types
Several new types have been added:
void
void *
signed char
unsigned char
unsigned short
unsigned long
long double
void is used primarily as the typemark for a function which returns no result. It
may also be used, in any context where the value of an expression is to be discarded,
to indicate explicitly that a value is ignored by writing the cast (void). Finally, a
function prototype list that has no arguments is written as f(void), because f()
retains its old meaning that nothing is said about the arguments.
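The three uses of void described above can be sketched as follows. The function names are hypothetical and serve only as illustration.

```c
static int calls = 0;

/* Returns no result and, per the (void) parameter list, takes no arguments. */
void note_call(void)
{
    calls++;
}

/* Returns the number of calls recorded so far. */
int call_count(void)
{
    return calls;
}

int demo_void(void)
{
    note_call();
    /* The cast documents that the returned value is deliberately ignored. */
    (void) call_count();
    return call_count();
}
```

Writing note_call() with an empty parameter list instead would say nothing about the arguments, so a mismatched call could go undiagnosed.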
A “pointer to void,” void *, is a generic pointer, capable of pointing to any
(data) object without truncation. A pointer to void must have the same represen-
tation and alignment as a pointer to character; the intent of this rule is to allow
existing programs which call library functions (such as memcpy and free) to con-
tinue to work. A pointer to void may not be dereferenced, although such a pointer
may be converted to a normal pointer type which may be dereferenced. Pointers to
other types coerce silently to and from void * in assignments, function prototypes,
comparisons, and conditional expressions, whereas other pointer type clashes are
invalid. It is undefined what will happen if a pointer of some type is converted to
void *, and then the void * pointer is converted to a type with a stricter alignment
requirement.
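A minimal sketch of the silent conversions to and from void *, assuming a hypothetical wrapper copy_bytes around the library function memcpy:

```c
#include <string.h>

/* Hypothetical generic helper: accepts a pointer to any object type. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

int demo_void_pointer(void)
{
    int src[3] = { 1, 2, 3 };
    int dst[3] = { 0, 0, 0 };
    void *vp;
    int *ip;

    copy_bytes(dst, src, sizeof src);  /* int * converts silently to void * */

    vp = dst;   /* no cast needed */
    ip = vp;    /* converted back to the original type before dereferencing */
    return ip[0] + ip[1] + ip[2];
}
```

The pointer is converted back to int *, the type it started as, so no stricter alignment requirement is introduced.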
Three types of char are specified: signed, plain, and unsigned. A plain char
may be represented as either signed or unsigned, depending upon the implementa-
tion, as in prior practice. The type signed char was introduced to make available
a one-byte signed integer type on those systems which implement plain char as
unsigned. For reasons of symmetry, the keyword signed is allowed as part of the
type name of other integral types.
Two varieties of the integral types are specified: signed and unsigned. If neither
specifier is used, signed is assumed. In the Base Document the only unsigned type
is unsigned int.
The keyword unsigned is something of a misnomer, suggesting as it does arith-
metic that is non-negative but capable of overflow. The semantics of the C type
unsigned is that of modulus, or wrap-around, arithmetic, for which overflow has
no meaning. The result of an unsigned arithmetic operation is thus always defined,
whereas the result of a signed operation may (in principle) be undefined. In prac-
tice, on twos-complement machines, both types often give the same result for all
operators except division, modulus, right shift, and comparisons. Hence there has
been a lack of sensitivity in the C community to the differences between signed and
unsigned arithmetic (see §3.2.1.1).
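The following sketch illustrates wrap-around arithmetic and one operator (comparison) for which the signed and unsigned interpretations differ. The helper names are hypothetical.

```c
#include <limits.h>

/* Unsigned addition is modulus arithmetic: UINT_MAX + 1 is 0, not an overflow. */
unsigned int wrap_increment(unsigned int u)
{
    return u + 1u;
}

/* The same comparison written for signed and for unsigned operands. */
int signed_less(int a, int b)
{
    return a < b;
}

int unsigned_less(unsigned int a, unsigned int b)
{
    return a < b;
}
```

With these definitions, signed_less(-1, 1) is true, whereas unsigned_less receives -1 converted to UINT_MAX and reports false.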
3.1.3 Constants
In folding and converting constants, an implementation must use at least as much
precision as is provided by the target environment. However, it is not required to use
exactly the same precision as the target, since this would require a cross compiler
to simulate target arithmetic at translation time.
QUIET CHANGE
Unsuffixed integer constants may have different types. In K&R, unsuffixed
decimal constants greater than INT_MAX, and unsuffixed octal or
hexadecimal constants greater than UINT_MAX are of type long.
printf("\033[10;10h%d\n", somevalue);
write:
Notwithstanding the general rule that literal constants are non-negative, a character
constant containing one character is effectively preceded with a (char) cast
and hence may yield a negative value if plain char is represented the same as signed
char. This simply reflects widespread past practice and was deemed too dangerous
to change.
QUIET CHANGE
A constant of the form '\078' is valid, but now has different meaning.
It now denotes a character constant whose value is the (implementation-
defined) combination of the values of the two characters '\07' and '8'.
In some implementations the old meaning is the character whose code is
078 ≡ 0100 ≡ 64.
QUIET CHANGE
A constant of the form '\a' or '\x' now may have different meaning.
The old meaning, if any, was implementation dependent.
A long string can be continued across multiple lines by using the backslash-
newline line continuation, but this practice requires that the continuation of the
string start in the first position of the next line. To permit more flexible layout,
and to solve some preprocessing problems (see §3.8.3), the Committee introduced
string literal concatenation. Two string literals in a row are pasted together (with
no null character in the middle) to make one combined string literal. This addition
to the C language allows a programmer to extend a string literal beyond the end of
a physical line without having to use the backslash-newline mechanism and thereby
destroying the indentation scheme of the program. An explicit concatenation oper-
ator was not introduced because the concatenation is a lexical construct rather than
a run-time operation.
without concatenation:
with concatenation:
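A minimal sketch of the two layouts, using a hypothetical alphabet string; both declarations produce the same 26-character string.

```c
#include <string.h>

/* Line continuation: the rest of the string must start in column one. */
static const char alpha_old[] = "abcdefghijklm\
nopqrstuvwxyz";

/* Adjacent string literals are pasted together, so indentation is preserved. */
static const char alpha_new[] = "abcdefghijklm"
                                "nopqrstuvwxyz";

/* Returns 1 when both spellings yield the same 26-character string. */
int same_alphabet(void)
{
    return strcmp(alpha_old, alpha_new) == 0
        && strlen(alpha_new) == 26;
}
```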
QUIET CHANGE
A string of the form "\078" is valid, but now has different meaning. (See
§3.1.3.)
QUIET CHANGE
A string of the form "\a" or "\x" now has different meaning. (See
§3.1.3.)
QUIET CHANGE
It is neither required nor forbidden that identical string literals be rep-
resented by a single copy of the string in memory; a program depending
upon either scheme may behave differently.
3.1.5 Operators
Assignment operators of the form =+, described as old fashioned even in K&R, have
been dropped.
The form += is now defined to be a single token, not two, so no white space is
permitted within it; no compelling case could be made for permitting such white
space.
QUIET CHANGE
Expressions of the form x=-3 change meaning with the loss of the old-
style assignment operators.
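As an illustration with hypothetical function names: under the Standard, x=-3 is tokenized as x = -3, whereas a translator that honored the old-style =- operator would have subtracted 3 from x.

```c
/* Standard meaning: assigns the value -3. */
int assign_negative(int x)
{
    x=-3;
    return x;
}

/* What the old-style operator =- meant, spelled with the surviving -= form. */
int subtract_three(int x)
{
    x -= 3;
    return x;
}
```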
3.1.6 Punctuators
The punctuator ... (ellipsis) has been added to denote a variable number of trailing
arguments in a function prototype. (See §3.5.4.3.)
The constraint that certain punctuators must occur in pairs (and the similar con-
straint on certain operators in §3.1.5) only applies after preprocessing. Syntactic
constraints are checked during syntactic analysis, and this follows preprocessing.
3.1.9 Comments
The Committee considered proposals to allow comments to nest. The main argu-
ment for nesting comments is that it would allow programmers to “comment out”
code. The Committee rejected this proposal on the grounds that comments should
be used for adding documentation to a program, and that preferable mechanisms
already exist for source code exclusion. For example,
#if 0
/* this code is bracketed out because ... */
code_to_be_excluded();
#endif
Preprocessing directives such as this prevent the enclosed code from being scanned
by later translation phases. Bracketed material can include comments and other,
nested, regions of bracketed code.
if (0) {
    /* this code is bracketed out because ... */
    code_to_be_excluded();
}
3.2 Conversions
3.2.1 Arithmetic operands
3.2.1.1 Characters and integers
Since the publication of K&R, a serious divergence has occurred among implemen-
tations of C in the evolution of integral promotion rules. Implementations fall into
two major camps, which may be characterized as unsigned preserving and value
preserving. The difference between these approaches centers on the treatment of
unsigned char and unsigned short, when widened by the integral promotions,
but the decision has an impact on the typing of constants as well (see §3.1.3.2).
The unsigned preserving approach calls for promoting the two smaller unsigned
types to unsigned int. This is a simple rule, and yields a type which is independent
of execution environment.
The value preserving approach calls for promoting those types to signed int,
if that type can properly represent all the values of the original type, and otherwise
for promoting those types to unsigned int. Thus, if the execution environment
represents short as something smaller than int, unsigned short becomes int;
otherwise it becomes unsigned int.
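A sketch of a case where signedness of the promoted type is visible, assuming the value preserving rules and an implementation where int can represent every unsigned char value. The function names are hypothetical.

```c
#include <limits.h>

/* uc is promoted to int, so uc - 1 is the int value -1. */
int promoted_difference_is_negative(void)
{
    unsigned char uc = 0;
    return (uc - 1) < 0;
}

/* No promotion to a signed type happens here: ui - 1 wraps to UINT_MAX. */
int unsigned_int_difference_is_negative(void)
{
    unsigned int ui = 0;
    return (ui - 1) < 0;
}
```

Under the unsigned preserving rules the first function would also return 0, because uc would have been promoted to unsigned int.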
Both schemes give the same answer in the vast majority of cases, and both
give the same effective result in even more cases in implementations with twos-
complement arithmetic and quiet wraparound on signed overflow — that is, in most
current implementations. In such implementations, differences between the two only
appear when these two conditions are both true:
2. The result of the preceding expression is used in a context in which its signed-
ness is significant:
The Standard clarifies that the integral promotion rules also apply to bit-fields.
The Standard, unlike the Base Document, does not require rounding in the double
to float conversion. Some widely used IEEE floating point processor chips control
floating to integral conversion with the same mode bits as for double-precision to
single-precision conversion; since truncation-toward-zero is the appropriate setting
for C in the former case, it would be expensive to require such implementations to
round to float.
The rules in the Standard for these conversions are slight modifications of those
in the Base Document: the modifications accommodate the added types and the
value preserving rules (see §3.2.1.1). Explicit license has been added to perform
calculations in a “wider” type than absolutely necessary, since this can sometimes
produce smaller and faster code (not to mention the correct answer more often).
Calculations can also be performed in a “narrower” type, by the as if rule, so long
as the same end result is obtained. Explicit casting can always be used to obtain
exactly the intermediate types required.
The Committee relaxed the requirement that float operands be converted to
double. An implementation may still choose to convert.
QUIET CHANGE
Expressions with float operands may now be computed at lower preci-
sion. The Base Document specified that all floating point operations be
done in double.
A difference of opinion within the C community has centered around the meaning
of lvalue, one group considering an lvalue to be any kind of object locator, another
group holding that an lvalue is meaningful on the left side of an assigning operator.
The Committee has adopted the definition of lvalue as an object locator. The term
modifiable lvalue is used for the second of the above concepts.
The role of array objects has been a classic source of confusion in C, in large
part because of the numerous contexts in which an array reference is converted to
a pointer to its first element. While this conversion neatly handles the semantics
of subscripting, the fact that a[i] is itself a modifiable lvalue while a is not has
puzzled many students of the language. A more precise description has therefore
been incorporated in the Standard, in the hopes of combatting this confusion.
3.2.2.2 void
The description of operators and expressions is simplified by saying that void yields
a value, with the understanding that the value has no representation, hence requires
no storage.
3.2.2.3 Pointers
C has now been implemented on a wide range of architectures. While some of
these architectures feature uniform pointers which are the size of some integer type,
maximally portable code may not assume any necessary correspondence between
different pointer types and the integral types.
The use of void * (“pointer to void”) as a generic object pointer type is an
invention of the Committee. Adoption of this type was stimulated by the desire
to specify function prototype arguments that either quietly convert arbitrary point-
ers (as in fread) or complain if the argument type does not exactly match (as in
strcmp). Nothing is said about pointers to functions, which may be incommensurate
with object pointers and/or integers.
Since pointers and integers are now considered incommensurate, the only integer
that can be safely converted to a pointer is the constant 0. The result of converting
any other integer to a pointer is machine dependent.
Consequences of the treatment of pointer types in the Standard include:
Implicit in the Standard is the notion of invalid pointers. In discussing pointers, the
Standard typically refers to “a pointer to an object” or “a pointer to a function” or
“a null pointer.” A special case in address arithmetic allows for a pointer to just
past the end of an array. Any other pointer is invalid.
3.3 Expressions
Several closely-related topics are involved in the precise specification of expression
evaluation: precedence, associativity, grouping, sequence points, agreement points,
order of evaluation, and interleaving. The latter three terms are discussed in §2.1.2.3.
The rules of precedence are encoded into the syntactic rules for each operator.
For example, the syntax for additive-expression includes the rule
additive-expression + multiplicative-expression
which implies that a+b*c parses as a+(b*c). The rules of associativity are similarly
encoded into the syntactic rules. For example, the syntax for assignment-expression
includes the rule
(a+b)+c and a+(b+c) may well yield different results: suppose that b is greater
than 0, a equals -b, and c is positive but substantially smaller than b. (That is,
suppose c/b is less than DBL_EPSILON.) Then (a+b)+c is 0+c, or c, while a+(b+c)
equals a+b, or 0. That is to say, floating point addition (and multiplication) is not
associative.
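The example can be reproduced with the sketch below, assuming round-to-nearest arithmetic. The volatile temporaries are an addition meant to force each intermediate sum to be rounded to double, and the function names are hypothetical.

```c
#include <float.h>

/* Computes (a + b) + c, rounding the intermediate sum to double. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* Computes a + (b + c), rounding the intermediate sum to double. */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}
```

With a = -1.0, b = 1.0 and c = DBL_EPSILON / 4, sum_left returns c, while sum_right returns 0 because b + c rounds back to b.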
The Base Document’s rule imposes a high cost on translation of numerical code
to C. Much numerical code is written in FORTRAN, which does provide a no-
regrouping guarantee; indeed, this is the normal semantic interpretation in most
high-level languages other than C. The Base Document’s advice, “rewrite using
explicit temporaries,” is burdensome to those with tens or hundreds of thousands
of lines of code to convert, a conversion which in most other respects could be done
automatically.
Elimination of the regrouping rule does not in fact prohibit much regrouping
of integer expressions. The bitwise logical operators can be arbitrarily regrouped,
since any regrouping gives the same result as if the expression had not been re-
grouped. This is also true of integer addition and multiplication in implementations
with twos-complement arithmetic and silent wraparound on overflow. Indeed, in
any implementation, regroupings which do not introduce overflows behave as if no
regrouping had occurred. (Results may also differ in such an implementation if the
expression as written results in overflows: in such a case the behavior is undefined,
so any regrouping couldn’t be any worse.)
The types of lvalues that may be used to access an object have been restricted so
that an optimizer is not required to make worst-case aliasing assumptions.
In practice, aliasing arises with the use of pointers. A contrived example to
illustrate the issues is
int a;
void f(int * b)
{
    a = 1;
    *b = 2;
    g(a);
}
It is tempting to generate the call to g as if the source expression were g(1), but b
might point to a, so this optimization is not safe. On the other hand, consider
int a;
void f( double * b )
{
    a = 1;
    *b = 2.0;
    g(a);
}
• The lvalue types may differ in signedness. In the common range, a signed
integral type and its unsigned variant have the same representation; it was
felt that an appreciable body of existing code is not “strictly typed” in this
area.
• Character pointer types are often used in the bytewise manipulation of objects;
a byte stored through such a character pointer may well end up in an object
of any type.
• A qualified version of the object’s type, though formally a different type, pro-
vides the same interpretation of the value of the object.
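The character pointer case can be sketched as follows: a hypothetical helper, same_bytes, inspects objects of any type one byte at a time through unsigned char lvalues.

```c
#include <stddef.h>

/* Returns 1 if the n bytes at p and q are identical, 0 otherwise. */
int same_bytes(const void *p, const void *q, size_t n)
{
    const unsigned char *a = p;
    const unsigned char *b = q;
    size_t i;

    for (i = 0; i < n; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}

/* Applies the bytewise comparison to objects of type double. */
int demo_same_bytes(void)
{
    double x = 1.5, y = 1.5, z = 2.5;
    return same_bytes(&x, &y, sizeof x) && !same_bytes(&x, &z, sizeof x);
}
```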
*fip = a;
*ip = 2;
g(fip->i);
}
It is not safe to optimize the first call to g as g(2), or the second as g(1), since the
call to f could quite legitimately have been
struct fi x;
f( &x, &x.i );
that a void primary expression is no part of a further expression, except that a void
expression may be cast to void, may be the second or third operand of a conditional
operator, or may be an operand of a comma operator.
The first expression on each line was discussed in the previous paragraph. The
second is conventional usage. All subsequent expressions take advantage of the
implicit conversion of a function designator to a pointer value, in nearly all expression
contexts. The Committee saw no real harm in allowing these forms; outlawing forms
like (*f)(), while still permitting *a (for int a[]), simply seemed more trouble
than it was worth.
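A sketch of the call forms that result from converting a function designator to a pointer; twice and pf are hypothetical names. Every call below invokes the same function.

```c
int twice(int n)
{
    return 2 * n;
}

int demo_call_forms(void)
{
    int (*pf)(int) = twice;   /* designator converts to a pointer */
    int total = 0;

    total += twice(1);        /* conventional usage */
    total += (*twice)(1);
    total += (**twice)(1);
    total += (&twice)(1);
    total += pf(1);           /* call through a pointer, no explicit * */
    total += (*pf)(1);
    return total;
}
```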
The rule for implicit declaration of functions has been retained, but various past
ambiguities have been resolved by describing this usage in terms of a corresponding
explicit declaration.
For compatibility with past practice, all argument promotions occur as described
in the Base Document in the absence of a prototype declaration, including the (not
always desirable) promotion of float to double. A prototype gives the implementor
explicit license to pass a float as a float rather than a double, or a char as a
char rather than an int, or an argument in a special register, etc. If the definition
of a function in the presence of a prototype would cause the function to expect other
than the default promotion types, then clearly the calls to this function must be
made in the presence of a compatible prototype.
To clarify this and other relationships between function calls and function defi-
nitions, the Standard describes an equivalence between a function call or definition
which does occur in the presence of a prototype and one that does not.
Thus a prototyped function with no “narrow” types and no variable argument
list must be callable in the absence of a prototype, since the types actually passed in
a call are equivalent to the explicit function definition prototype. This constraint is
necessary to retain compatibility with past usage of library functions. (See §4.1.3.)
This provision constrains the latitude of an implementor because the parame-
ter passing conventions of prototype and non-prototype function calls must be the
same for functions accepting a fixed number of arguments. Implementations in en-
vironments where efficient function calling mechanisms are available must, in effect,
use the efficient calling sequence either in all “fixed argument list” calls or in none.
Since efficient calling sequences often do not allow for variable argument functions,
the fixed part of a variable argument list may be passed in a completely different
fashion than in a fixed argument list with the same number and type of arguments.
The existing practice of omitting trailing parameters in a call if it is known that
the parameters will not be used has consistently been discouraged. Since omission
of such parameters creates an inequivalence between the call and the declaration,
the behavior in such cases is undefined, and a maximally portable program will
avoid this usage. Hence an implementation is free to implement a function calling
mechanism for fixed argument lists which would (perhaps fatally) fail if the wrong
number or type of arguments were to be provided.
Strictly speaking then, calls to printf are obliged to be in the scope of a proto-
type (as by #include <stdio.h>), but implementations are not obliged to fail on
such a lapse. (The behavior is undefined.)
Since the language now permits structure parameters, structure assignment and
functions returning structures, the concept of a structure expression is now part of
the C language. A structure value can be produced by an assignment, by a function
call, by a comma operator expression or by a conditional operator expression:
s1 = (s2 = s3)
sf(x)
(x, s1)
x ? s1 : s2
In these cases, the result is not an lvalue; hence it cannot be assigned to nor can its
address be taken.
Similarly, x.y is an lvalue only if x is an lvalue. Thus none of the following valid
expressions are lvalues:
sf(3).a
(s1=s2).a
((i==6)?s1:s2).a
(x,s1).a
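These member selections remain usable as values even though they are not lvalues. The sketch below uses a hypothetical struct pair and a hypothetical function make_pair in place of sf.

```c
struct pair { int a; int b; };

struct pair make_pair(int a, int b)
{
    struct pair p;
    p.a = a;
    p.b = b;
    return p;
}

int demo_struct_values(void)
{
    struct pair s1 = { 1, 2 }, s2 = { 3, 4 }, s3;
    int x = 0;
    int sum = 0;

    sum += make_pair(5, 6).a;    /* member of a function result: 5 */
    sum += (s3 = s2).a;          /* member of an assignment result: 3 */
    sum += (x ? s1 : s2).b;      /* member of a conditional result: 4 */
    sum += (x, s1).b;            /* member of a comma expression result: 2 */
    return sum;
}
```

Writing make_pair(5, 6).a = 0 or taking &make_pair(5, 6).a would be rejected, since neither operand is an lvalue.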
N == sizeof(a)/sizeof(a[0])
Thus size_t is also a convenient type for array sizes, and is so used in several library
functions. (See §4.9.8.1, §4.9.8.2, §4.10.3.1, etc.)
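A small sketch of the element-count idiom, using a hypothetical macro ARRAY_LEN and a hypothetical table:

```c
#include <stddef.h>

/* Number of elements in an array; valid only for true arrays, not pointers. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

size_t table_length(void)
{
    static const int table[] = { 2, 3, 5, 7, 11 };
    return ARRAY_LEN(table);
}
```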
The Standard specifies that the argument to sizeof can be any value except a
bit field, a void expression, or a function designator. This generality allows for
interesting environmental enquiries; given the declarations
these expressions determine the size of the type used for ...
Nothing portable can be said about casting integers to pointers, or vice versa, since
the two are now incommensurate.
The definition of these conversions adopted in the Standard resembles that in
the Base Document, but with several significant differences. The Base Document
required that a pointer successfully converted to an integer must be guaranteed to
The type char must have the least strict alignment of any type, so char * has often
been used as a portable type for representing arbitrary object pointers. This usage
creates an unfortunate confusion between the ideas of arbitrary pointer and character
or string pointer. The new type void *, which has the same representation as char
*, is therefore preferable for arbitrary pointers.
The Standard (§3.2.1.4) requires that a cast of one floating point type to another
(e.g., double to float) results in an actual conversion.
that this type be signed, in order to obtain proper algebraic ordering when dealing
with pointers within the same array. However, the magnitude of a pointer difference
can be as large as the size of the largest object that can be declared. (And since that
is an unsigned type, the difference between two pointers may cause an overflow.)
The type of pointer minus pointer is defined to be int in K&R. The Stan-
dard defines the result of this operation to be a signed integer, the size of which
is implementation-defined. The type is published as ptrdiff_t, in the standard
header <stddef.h>. Old code recompiled by a conforming compiler may no longer
work if the implementation defines the result of such an operation to be a type other
than int and if the program depended on the result to be of type int. This behavior
was considered by the Committee to be correctable. Overflow was considered not
to break old code since it was undefined by K&R. Mismatch of types between ac-
tual and formal argument declarations is correctable by including a properly defined
function prototype in the scope of the function invocation.
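A sketch of code that holds a pointer difference in ptrdiff_t rather than assuming int; the function names are hypothetical.

```c
#include <stddef.h>

/* Signed distance, in elements, between two pointers into the same array. */
ptrdiff_t element_distance(const int *from, const int *to)
{
    return to - from;
}

ptrdiff_t demo_forward(void)
{
    int a[10];
    return element_distance(&a[2], &a[7]);
}

ptrdiff_t demo_backward(void)
{
    int a[10];
    return element_distance(&a[7], &a[2]);
}
```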
An important endorsement of widespread practice is the requirement that a
pointer can always be incremented to just past the end of an array, with no fear of
overflow or wraparound:
SOMETYPE array[SPAN];
/* ... */
for (p = &array[0]; p < &array[SPAN]; p++)
This stipulation merely requires that every object be followed by one byte whose
address is representable. That byte can be the first byte of the next object declared
for all but the last object located in a contiguous segment of memory. (In the exam-
ple, the address &array[SPAN] must address a byte following the highest element
of array.) Since the pointer expression p+1 need not (and should not) be derefer-
enced, it is unnecessary to leave room for a complete object of size sizeof(*p).
In the case of p-1, on the other hand, an entire object would have to be allocated
prior to the array of objects that p traverses, so decrement loops that run off the
bottom of an array may fail. This restriction allows segmented architectures, for
instance, to place objects at the start of a range of addressable memory.
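The loop idiom can be sketched as a complete function. The array contents and the name sum_span are hypothetical; the address &array[SPAN] is computed and compared but never dereferenced.

```c
#define SPAN 4

long sum_span(void)
{
    int array[SPAN] = { 1, 2, 3, 4 };
    int *p;
    long total = 0;

    /* p may legitimately reach one past the last element. */
    for (p = &array[0]; p < &array[SPAN]; p++)
        total += *p;
    return total;
}
```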
QUIET CHANGE
Shifting by a long count no longer coerces the shifted operand to long.
The Committee has affirmed the freedom in implementation granted by the Base
Document in not requiring the signed right shift operation to sign extend, since such
a requirement might slow down fast code and since the usefulness of sign extended
shifts is marginal. (Shifting a negative twos-complement integer arithmetically right
one place is not the same as dividing by two!)
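A sketch contrasting the two operations, with hypothetical function names. It assumes a C99 or later translator, where integer division truncates toward zero; the result of right-shifting a negative value remains implementation-defined.

```c
/* Division by two; for negative n this truncates toward zero in C99 and later. */
int halve_by_division(int n)
{
    return n / 2;
}

/* Right shift by one; the result for negative n is implementation-defined. */
int halve_by_shift(int n)
{
    return n >> 1;
}

/* Reports whether the two approaches agree for a given value. */
int shift_matches_division(int n)
{
    return halve_by_shift(n) == halve_by_division(n);
}
```

On a twos-complement machine with an arithmetic right shift, halve_by_shift(-1) yields -1 while halve_by_division(-1) yields 0; for non-negative values the two agree.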
The expression following #if (§3.8.1) must expand to integer constants, charac-
ter constants, the special operator defined, and operators with no side effects.
No environmental inquiries can be made, since all arithmetic is done as translate-
time (signed or unsigned) long integers, and casts are disallowed. The restriction to
translate-time arithmetic frees an implementation from having to perform execution-
environment arithmetic in the host environment. It does not preclude an imple-
mentation from doing so — the implementation may simply define “translate-time
arithmetic” to be that of the target.
Unsigned arithmetic is performed in these expressions (according to the default
widening rules) when unsigned operands are involved; this rule allows for
unsurprising arithmetic involving very large constants (i.e., those whose type is unsigned
QUIET CHANGE
A program that uses #if expressions to determine properties of the ex-
ecution environment may now get different answers.
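One way to keep such tests well defined is to compare against the macros of <limits.h>, which describe the execution environment, rather than relying on the translator's own arithmetic. This is a sketch; INT_HAS_32_BITS and int_has_32_bits are hypothetical names.

```c
#include <limits.h>

/* sizeof and casts are not available in #if, so test a <limits.h> macro. */
#if INT_MAX >= 2147483647
#define INT_HAS_32_BITS 1
#else
#define INT_HAS_32_BITS 0
#endif

int int_has_32_bits(void)
{
    return INT_HAS_32_BITS;
}
```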
3.5 Declarations
The Committee decided that empty declarations are invalid (except for a special case
with tags, see §3.5.2.3, and the case of enumerations such as enum {zero,one};,
see §3.5.2.2). While many seemingly silly constructs are tolerated in other parts
of the language in the interest of facilitating the machine generation of C, empty
declarations were considered sufficiently easy to avoid.
The practice of placing the storage class specifier other than first in a declaration
has been branded as obsolescent. (See §3.9.3.) The Committee feels it desirable to
rule out such constructs as
enum { aaa, aab,
/* etc */
zzy, zzz } typedef a2z;
in some future standard.
But if struct y is already defined in a containing block, the first field of struct x
will refer to the older declaration.
Thus special semantics has been given to the form:
struct y;
It now hides the outer declaration of y, and “opens” a new instance in the current
block.
QUIET CHANGE
The empty declaration struct x; is no longer innocuous.
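A sketch of the special form in use, with hypothetical member names: the declaration struct y; inside the block lets two mutually referential structures be declared even though a struct y already exists at file scope.

```c
struct y { int outer; };    /* file-scope declaration */

int demo_tag_scope(void)
{
    struct y;                                /* hides the file-scope struct y */
    struct x { struct y *py; int n; };       /* refers to the block-scope y */
    struct y { struct x *px; int inner; };   /* completes the block-scope y */

    struct x vx;
    struct y vy;

    vx.py = &vy;
    vx.n = 1;
    vy.px = &vx;
    vy.inner = 41;
    return vx.py->inner + vy.px->n;
}
```

Without the struct y; line, the member py would point to the file-scope type, which has no member named inner.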
const No writes through this lvalue. In the absence of this qualifier, writes may
occur through this lvalue.
volatile No cacheing through this lvalue: each operation in the abstract semantics
must be performed. (That is, no cacheing assumptions may be made, since
the location is not guaranteed to contain any previous value.) In the absence
of this qualifier, the contents of the designated location may be assumed to be
unchanged (except for possible aliasing.)
the Standard were chosen to assure that the default, unqualified, case was the most
common, and that it corresponded most clearly to traditional practice in the use of
lvalue expressions.
Four combinations of the two qualifiers are possible; each defines a useful set of lvalue
properties. The next several paragraphs describe typical uses of these qualifiers.
The translator may assume, for an unqualified lvalue, that it may read or write
the referenced object, that the value of this object cannot be changed except by
explicitly programmed actions in the current thread of control, but that other lvalue
expressions could reference the same object.
const is specified in such a way that an implementation is at liberty to put
const objects in read-only storage, and is encouraged to diagnose obvious attempts
to modify them, but is not required to track down all the subtle ways that such
checking can be subverted. If a function parameter is declared const, then the
referenced object is not changed (through that lvalue) in the body of the function
— the parameter is read-only.
A static volatile object is an appropriate model for a memory-mapped I/O
register. Implementors of C translators should take into account relevant hardware
details on the target systems when implementing accesses to volatile objects. For
instance, the hardware logic of a system may require that a two-byte memory-
mapped register not be accessed with byte operations; a compiler for such a system
would have to assure that no such instructions were generated, even if the source
code only accesses one byte of the register. Whether read-modify-write instructions
can be used on such device registers must also be considered. Whatever decisions are
adopted on such issues must be documented, as volatile access is implementation-
defined. A volatile object is an appropriate model for a variable shared among
multiple processes.
A static const volatile object appropriately models a memory-mapped input
port, such as a real-time clock. Similarly, a const volatile object models a variable
which can be altered by another process but not by this one.
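A sketch of the input-port model. Real hardware is not available in a portable example, so an ordinary variable, simulated_clock, stands in for the device, and simulated_tick stands in for the external agent that changes it; all names are hypothetical.

```c
/* Stand-in for a hardware counter; a real port would be a platform address. */
static unsigned int simulated_clock = 0;

/* Read-only, never-cached view of the port, as the program would see it. */
static const volatile unsigned int *const clock_port = &simulated_clock;

/* Stand-in for the hardware advancing the clock. */
void simulated_tick(void)
{
    simulated_clock++;
}

unsigned int elapsed_ticks(void)
{
    unsigned int start = *clock_port;   /* each read is a real access */

    simulated_tick();
    simulated_tick();
    return *clock_port - start;
}
```

An assignment through clock_port would be diagnosed because of const, while volatile keeps the translator from reusing the first value read.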
Although the type qualifiers are formally treated as defining new types, they actually
serve as modifiers of declarators. Thus the declarations
In these declarations the const property is associated with the declarator stype, so
x and y are both const objects.
The Committee considered making const and volatile storage classes, but this
would have ruled out any number of desirable constructs, such as const members
of structures and variable pointers to const types.
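Both constructs mentioned above can be sketched briefly. The structure and function names are hypothetical.

```c
#include <stddef.h>

struct limits {
    const int max;   /* const member: fixed once the object is initialized */
    int current;     /* ordinary member: may be updated */
};

/* s is a variable pointer to const char: s moves, the characters do not change. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

/* The pointed-to structure is read-only through lim. */
int remaining(const struct limits *lim)
{
    return lim->max - lim->current;
}

int demo_remaining(void)
{
    struct limits lim = { 10, 4 };

    lim.current++;          /* allowed; lim.max = 0 would be diagnosed */
    return remaining(&lim);
}
```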
3.5.4 Declarators
The function prototype syntax was adapted from C++. (See §3.3.2.2 and §3.5.4.3.)
Some current implementations have a limit of six type modifiers (function re-
turning, array of, pointer to), the limit used in Ritchie’s original compiler. This
limit has been raised to twelve since the original limit has proven insufficient in
some cases; in particular, it did not allow for FORTRAN-to-C translation, since
FORTRAN allows for seven subscripts. (Some users have reported using nine or ten
levels, particularly in machine-generated C code.)
void func2(int x)
{
    char * str1, * str2 ;
    /* ... */
    x = compare(str1, str2) ;
    /* ... */
}
The optimizer knows that the pointers passed to compare are not used to assign new
values to any objects that the pointers reference. Hence the optimizer can make less
conservative assumptions about the side effects of compare than would otherwise be
necessary.
The Standard requires that calls to functions taking a variable number of argu-
ments must occur in the presence of a prototype (using the trailing ellipsis notation
,...). An implementation may thus assume that all other functions are called with
a fixed argument list, and may therefore use possibly more efficient calling sequences.
Programs using old-style headers in which the number of arguments in the calls and
the definition differ may not work in implementations which take advantage of such
optimizations. This is not a Quiet Change, strictly speaking, since the program
does not conform to the Standard. A word of warning is in order, however, since
the style is not uncommon in extant code, and since a conforming translator is not
required to diagnose such mismatches when they occur in separate translation units.
Such trouble spots can be made manifest (assuming an implementation provides rea-
sonable diagnostics) by providing new-style function declarations in the translation
units with the non-matching calls. Programmers who currently rely on being able
to omit trailing arguments are advised to recode using the <stdarg.h> paradigm.
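A minimal sketch of the <stdarg.h> paradigm recommended above, assuming a hypothetical function sum_ints whose first argument gives the number of int arguments that follow. The prototype with the trailing ellipsis must be visible at every call.

```c
#include <stdarg.h>

/* Hypothetical example: adds up `count` int arguments. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);            /* count is the last named parameter */
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next argument as an int */
    va_end(ap);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) adds the three trailing arguments; a call with fewer trailing arguments than count announces has undefined behavior.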
Function prototypes may be used to define function types as well:
struct d_funct {
d_binop f1;
int (*f2)(double, double);
};
The structure d_funct has two fields, both of which hold pointers to functions taking
two double arguments; the function types differ in their return type.
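The definition of d_binop is not shown in the text above. The sketch below assumes it is a typedef for a pointer to a function taking two double arguments and returning double; the helper functions add and less are hypothetical.

```c
/* Assumed definition: the text above does not show it. */
typedef double (*d_binop)(double, double);

struct d_funct {
    d_binop f1;                     /* returns double */
    int     (*f2)(double, double);  /* returns int */
};

static double add(double a, double b)  { return a + b; }
static int    less(double a, double b) { return a < b; }

/* Returns 1 if both members can be called through the structure. */
int d_funct_demo(void)
{
    struct d_funct ops;

    ops.f1 = add;
    ops.f2 = less;
    return ops.f1(1.0, 2.0) == 3.0 && ops.f2(1.0, 2.0);
}
```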
3.5.7 Initialization
An implementation might conceivably have codes for floating zero and/or null
pointer other than all bits zero. In such a case, the implementation must fill out an
incomplete initializer with the various appropriate representations of zero; it may
not just fill the area with zero bytes.
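As an illustration, with a hypothetical structure type rec: members left out of an initializer compare equal to the appropriate zero for their type, whatever bit pattern the implementation uses to represent it.

```c
#include <stddef.h>

struct rec {            /* hypothetical type for illustration */
    int    count;
    double weight;
    char  *name;
};

/* Returns 1 if the members omitted from the initializer hold
   floating zero and a null pointer respectively. */
int rest_is_zero(void)
{
    struct rec r = { 7 };   /* only count is given explicitly */

    return r.count == 7 && r.weight == 0.0 && r.name == NULL;
}
```

Clearing such an object with memset would not carry the same guarantee on an implementation whose null pointer or floating zero is not all bits zero.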
The Committee considered proposals for permitting automatic aggregate initial-
izers to consist of a brace-enclosed series of arbitrary (execute-time) expressions,
instead of just those usable for a translate-time static initializer. However, cases
like this were troubling:
int x[2] = { f(x[1]), g(x[0]) };
Rather than determine a set of rules which would avoid pathological cases and yet
not seem too arbitrary, the Committee elected to permit only static initializers. Con-
sequently, an implementation may choose to build a hidden static aggregate, using
the same machinery as for other aggregate initializers, then copy that aggregate to
the automatic variable upon block entry.
A structure expression, such as a call to a function returning the appropriate
structure type, is permitted as an automatic structure initializer, since the usage
seems unproblematic.
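A short sketch of such an initializer, using a hypothetical structure type point and a hypothetical function make_point:

```c
struct point { int x, y; };     /* hypothetical type */

static struct point make_point(int x, int y)
{
    struct point p;

    p.x = x;
    p.y = y;
    return p;
}

int sum_coords(void)
{
    /* An automatic structure initialized by a structure expression. */
    struct point q = make_point(3, 4);

    return q.x + q.y;
}
```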
For programmer convenience, even though it is a minor irregularity in initializer
semantics, the trailing null character in a string literal need not initialize an array
element, as in:
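A minimal sketch of this rule, using a hypothetical array named mesg: the five characters fill the array exactly and no terminating null character is stored, so the array must not be passed to functions that expect a string.

```c
#include <string.h>

/* Exactly five elements; the literal's trailing '\0' is not stored. */
static char mesg[5] = "help!";

/* Returns 1 if all five characters were stored as written. */
int mesg_is_exact(void)
{
    return sizeof mesg == 5 && memcmp(mesg, "help!", 5) == 0;
}
```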
The Committee has adopted the rule (already used successfully in some implemen-
tations) that the first member of the union is the candidate for initialization. Other
notations for union initialization were considered, but none seemed of sufficient merit
to outweigh the lack of prior art.
This rule has a parallel with the initialization of structures. Members of struc-
tures are initialized in the sequence in which they are declared. The same can now
be said of unions, with the significant difference that only one union member (the
first) can be initialized.
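For instance, with a hypothetical union type number, the initializer applies to the first member:

```c
union number {          /* hypothetical type */
    int    i;           /* first member: the one an initializer sets */
    double d;
};

static union number n = { 42 };     /* initializes n.i */

int first_member(void)
{
    return n.i;
}
```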
3.6 Statements
3.6.1 Labeled statements
Since label definition and label reference are syntactically distinctive contexts, labels
are established as a separate name space.
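The following sketch (a hypothetical function, written with goto only to make the point) uses the name done both as a variable and as a label without conflict:

```c
/* Sums the integers 0..n. */
int sum_to(int n)
{
    int done = 0;       /* ordinary identifier */
    int i = 0;

loop:
    if (i > n)
        goto done;      /* refers to the label, not the variable */
    done += i;
    i++;
    goto loop;

done:                   /* label in its own name space */
    return done;
}
```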
• A great deal of code (or jump table space) might be generated for an innocent-
looking case range such as 0 .. 65535.
• The range 'A'..'Z' would specify all the integers between the character code
for A and that for Z. In some common character sets this range would include
non-alphabetic characters, and in others it might not include all the alphabetic
characters (especially in non-English character sets).
QUIET CHANGE
long expressions and constants in switch statements are no longer trun-
cated to int.
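A sketch of code affected by this change, using a hypothetical function classify. Where int is 16 bits wide, a translator that truncated the case constants to int would have treated 65536L as 0.

```c
/* Hypothetical classifier switching on a long value. */
int classify(long v)
{
    switch (v) {
    case 0L:     return 0;
    case 65536L: return 1;   /* distinct from 0 even where int is 16 bits */
    default:     return -1;
    }
}
```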
To avoid a nasty ambiguity, the Standard bans the use of typedef names as formal
parameters. For instance, in translating the text
int f(size_t, a_t, b_t, c_t, d_t, e_t, f_t, g_t,
h_t, i_t, j_t, k_t, l_t, m_t, n_t, o_t,
p_t, q_t, r_t, s_t)
the translator determines that the construct can only be a prototype declaration as
soon as it scans the first size_t and following comma. In the absence of this rule,
it might be necessary to see the token following the right parenthesis that closes the
parameter list, which would require a sizeable look-ahead, before deciding whether
the text under scrutiny is a prototype declaration or an old-style function header
definition.
Some current implementations rewrite the type of a (for instance) char parameter
as if it were declared int, since the argument is known to be passed as an int
(in the absence of prototypes). The Standard requires, however, that the received
argument be converted as if by assignment upon function entry. Type rewriting is
thus no longer permissible.
QUIET CHANGE
Functions that depend on char or short parameter types being widened
to int, or float to double, may behave differently.
Notes for implementors: the assignment conversion for argument passing often
requires no executable code. In most twos-complement machines, a short or char
is a contiguous subset of the bytes comprising the int actually passed (for even
the most unusual byte orderings), so that assignment conversion can be effected by
adjusting the address of the argument (if necessary).
For an argument declared float, however, an explicit conversion must usually
be performed from the double actually passed to the float desired. Not many
implementations can subset the bytes of a double to get a float. (Even those that
apparently permit simple truncation often get the wrong answer on certain negative
numbers.)
(spaces or tabs) between the # and the directive, since the white space introduces
no ambiguity, causes no particular processing problems, and allows maximum flex-
ibility in coding style. Note that similar considerations apply for comments, which
are reduced to white space early in the phases of translation (§2.1.1.2):
# ifndef xxx
# define xxx "abc"
# elif xxx > 0
/* ... */
# endif
an implementation is not required to diagnose an error for the #elif directive, even
though a syntax error would be detected if it were processed.
Various proposals were considered for permitting text other than comments at
the end of directives, particularly #endif and #else, presumably to label them for
easier matchup with their corresponding #if directives. The Committee rejected
all such proposals because of the difficulty of specifying exactly what would be
permitted, and how the translator would have to process it.
Various proposals were considered for permitting additional unary expressions
to be used for the purpose of testing for the system type, testing for the presence of
a file before #include, and other extensions to the preprocessing language. These
proposals were all rejected on the grounds of insufficient prior art and/or insufficient
utility.
• The double quotes do not delimit a string literal with all its defined escape
sequences. (In some systems, backslash is a legitimate character in a filename.)
The construct just looks like a string literal.
• The filename on the #include (and #line) directive, if it does not begin with
" or <, is macro expanded prior to execution of the directive. Allowing macros
in the include directive facilitates the parameterization of include file names,
an important issue in transportability.
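A minimal sketch of a parameterized include, assuming a hypothetical macro LIMITS_HEADER that names the header to be read. In practice such a macro would be set per target, for example on the translator's command line.

```c
/* Hypothetical macro naming the header to include. */
#define LIMITS_HEADER <limits.h>

/* The directive does not begin with " or <, so it is macro expanded. */
#include LIMITS_HEADER

int char_bits(void)
{
    return CHAR_BIT;    /* defined by the header selected above */
}
```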
The file search rules used for the filename in the #include directive were left as
implementation-defined. The Standard intends that the rules which are eventually
provided by the implementor correspond as closely as possible to the original K&R
rules. The primary reason that explicit rules were not included in the Standard
is the infeasibility of describing a portable file system structure. It was consid-
ered unacceptable to include UNIX-like directory rules due to significant differences
between this structure and other popular commercial file system structures.
Nested include files raise an issue of interpreting the file search rules. In UNIX
C an include statement found within an include file entails a search for the named
file relative to the file system directory that holds the outer #include. Other imple-
mentations, including the earlier UNIX C described in K&R, always search relative
to the same current directory. The Committee decided, in principle, in favor of the
K&R approach, but was unable to provide explicit search rules as explained above.
The Standard specifies a set of include file names which must map onto distinct host
file names. In the absence of such a requirement, it would be impossible to write
portable programs using include files.
Section §2.2.4.1 on translation limits contains the required number of nesting levels
for include files. The limits chosen were intended to reflect reasonable needs for
users constrained by reasonable system resources available to implementors.
By defining a failure to read an include file as a syntax error, the Standard requires
that the failure be diagnosed. More than one proposal was presented for some form
of conditional include, or a directive such as #ifincludable, but none were accepted
by the Committee due to lack of prior art.
(in header1.h)
#define NULL_DEV 0
(in header2.h)
#define NULL_DEV 0
The first case might be useful in moving extant code from a signed-char implementa-
tion to one in which char is unsigned. The second case might be useful in adapting
code which assumes that sizeof results in an int value. The redefinition of const
could be useful in retrofitting more modern C code to an older implementation.
As with any other powerful language feature, keyword redefinition is subject to
abuse. Users cannot expect any meaningful behavior to come about from source
files starting with
practice could be condoned. However, since the facility provided by this mechanism
seems to be widely used, the Committee introduced a more tractable mechanism of
comparable power.
The # operator has been introduced for stringizing. It may only be used in a
#define expansion. It causes the formal parameter name following to be replaced
by a string literal formed by stringizing the actual argument token sequence. In
conjunction with string literal concatenation (see §3.1.4), use of this operator permits
the construction of strings as effectively as by identifier replacement within a string.
An example in the Standard illustrates this feature.
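As a brief sketch (the macro name SHOW is hypothetical), stringizing combined with string literal concatenation builds a single string from fixed text and the spelling of the argument:

```c
/* Adjacent string literals are concatenated after expansion. */
#define SHOW(expr) "value of " #expr

const char *label(void)
{
    return SHOW(x + 1);     /* "value of " "x + 1" */
}
```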
One problem with defining the effect of stringizing is the treatment of white
space occurring in macro definitions. Where this could be discarded in the past, now
upwards of one logical line worth (over 500 characters) may have to be retained. As a
compromise between token-based and character-based preprocessing disciplines, the
Committee decided to permit white space to be retained as one bit of information:
none or one. Arbitrary white space is replaced in the string by one space character.
The remaining problem with stringizing was to associate a “spelling” with each
token. (The problem arises in token-based preprocessors, which might, for instance,
convert a numeric literal to a canonical or internal representation, losing information
about base, leading 0’s, etc.) In the interest of simplicity, the Committee decided
that each token should expand to just those characters used to specify it in the
original source text.
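Both decisions can be seen in a small sketch (the macro name STR and the functions are hypothetical): runs of white space between tokens collapse to one space, and numeric tokens keep their source spelling.

```c
#define STR(x) #x

/* Interior white space is reduced to a single space: "a + b". */
const char *spaced(void) { return STR(a   +   b); }

/* The base and leading zeros of a constant survive: "0x1F" and "007". */
const char *hex(void)    { return STR(0x1F); }
const char *octal(void)  { return STR(007); }
```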
QUIET CHANGE
A macro that relies on formal parameter substitution within a string
literal will produce different results.
Given these definitions, the expansion of a(b) is aaab, not aaa2 or aaan.)
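The definitions referred to are not reproduced in the text above. The sketch below assumes definitions of the following shape: a macro a that pastes aaa onto its argument, and a macro b defined as 2. An argument that is an operand of ## is not macro-expanded before pasting.

```c
#define a(n) aaa ## n   /* assumed definition */
#define b 2             /* assumed definition */

static int aaab = 7;    /* the identifier that a(b) expands to */

int pasted_value(void)
{
    return a(b);        /* expands to aaab, not aaa2 */
}

#undef a                /* keep the short names from leaking further */
#undef b
```

Had the argument been expanded first, the result would have been the undeclared identifier aaa2 and translation would fail.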
Aside from giving values to __LINE__ and __FILE__ (see §3.8.8), the effect of #line
is unspecified. A good implementation will presumably provide line and file infor-
mation in conjunction with most diagnostics.
The directive #error has been introduced to provide an explicit mechanism for
forcing translation to fail under certain conditions. (Formally the Standard only
requires, can only require, that a diagnostic be issued when the #error directive is
effected. It is the intent of the Committee, however, that translation cease imme-
diately upon encountering this directive, if this is feasible in the implementation;
further diagnostics on text beyond the directive are apt to be of little value.) Tra-
ditionally such failure has had to be forced by inserting text so ill-formed that the
translator gagged on it.
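A typical use, sketched here with an assumed requirement on CHAR_BIT, guards an assumption the code depends on:

```c
#include <limits.h>

#if CHAR_BIT < 8
#error "this code assumes at least 8 bits per char"
#endif

int bits_per_char(void)
{
    return CHAR_BIT;
}
```

Since the Standard already requires CHAR_BIT to be at least 8, the directive is never reached on a conforming implementation; the same pattern applies to limits that do vary, such as INT_MAX.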
The #pragma directive has been added as the universal method for extending the
space of directives.
The existing practice of using empty # lines for spacing is supported in the Standard.
The rule that these macros may not be redefined or undefined reduces the complex-
ity of the name space that the programmer and implementor must understand; it
recognizes that these macros have special built-in properties.
The macros __DATE__ and __TIME__ have been added to make available the time of
translation. A particular format for the expansion of these macros has been specified
to aid in parsing strings initialized by them.
The macros __LINE__ and __FILE__ have been added to give programmers access
to the source line number and file name.
The macro __STDC__ allows for conditional translation on whether the translator
claims to be standard-conforming or not. It is defined as having value 1; future ver-
sions of the Standard could define it as 2, 3, ..., to allow for conditional compilation
on which version of the Standard a translator conforms to. This macro should be
of use in the transition toward conformance to the Standard.
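A small sketch of these macros in use; the function names are hypothetical.

```c
/* Strings fixed at translation time, in the specified formats. */
const char *translation_date(void) { return __DATE__; }  /* "Mmm dd yyyy" */
const char *translation_time(void) { return __TIME__; }  /* "hh:mm:ss" */

/* The source line on which the macro appears. */
int current_line(void) { return __LINE__; }

/* Returns the value of __STDC__, or 0 if the translator makes no claim. */
int claims_conformance(void)
{
#ifdef __STDC__
    return __STDC__;
#else
    return 0;
#endif
}
```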
adopting this future direction, hopes to provide common ground for implementors
and users concerned with this problem, so that some future C Standard can adopt
this non-overlapping rule on the basis of widespread experience.
Section 4
LIBRARY
4.1 Introduction
The Base Document for this section of the Standard was the 1984 /usr/group Stan-
dard. The /usr/group document contains definitions of some facilities which were
specific to the UNIX Operating System and not relevant to other operating envi-
ronments, such as pipes, ioctls, file access permissions and process control facilities.
Those definitions were dropped from the Standard. Some other functions were ex-
cluded from the Standard because they were non-portable or were ill-defined.
Other facilities not in the library Base Document but present in many UNIX
implementations, such as the curses (terminal-independent screen handling) library
were considered to be more complex and less essential than the facilities of the Base
Document; these functions were not added to the Standard.
#ifndef __ERRNO_H
#define __ERRNO_H
/* body of <errno.h> */
/* ... */
#endif
#ifdef __EXTENSIONS__
typedef int file_no;
extern int read(file_no _N, void * _Buffer, int _Nbytes);
/*...*/
#endif
Also reserved for the implementor are all external identifiers beginning with
an underscore, and all other identifiers beginning with an underscore followed by a
capital letter or an underscore. This gives a space of names for writing the numerous
behind-the-scenes non-external macros and functions a library needs to do its job
properly.
With these exceptions, the Standard assures the programmer that all other iden-
tifiers are available, with no fear of unexpected collisions when moving programs
from one implementation to another.1 Note, in particular, that part of the name
space of internal identifiers beginning with underscore is available to the user —
translator implementors have not been the only ones to find use for “hidden” names.
C is such a portable language in many respects that this issue of “name space pollu-
tion” is currently one of the principal barriers to writing completely portable code.
Therefore the Standard assures that macro and typedef names are reserved only if
the associated header is explicitly included.
4.1.3 Errors
<errno.h>
<errno.h> is a header invented to encapsulate the error handling mechanism used
by many of the library routines in <math.h> and <stdlib.h>.2
The error reporting machinery centered about the setting of errno is generally
regarded with tolerance at best. It requires a “pathological coupling” between li-
brary functions and makes use of a static writable memory cell, which interferes
with the construction of shareable libraries. Nevertheless, the Committee preferred
to standardize this existing, however deficient, machinery rather than invent some-
thing more ambitious.
The definition of errno as an lvalue macro grants implementors the license to
expand it to something like *__errno_addr(), where the function returns a pointer
to the (current) modifiable copy of errno.
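The technique can be sketched in user code with hypothetical names (task_errno and task_errno_addr stand in for the implementation's reserved names). Here the function returns the address of a single static cell; an implementation could instead return a per-task location.

```c
/* Hypothetical storage; a real implementation might keep one per task. */
static int error_cell;

static int *task_errno_addr(void)
{
    return &error_cell;
}

/* The macro expands to an lvalue, so it can be assigned and read. */
#define task_errno (*task_errno_addr())

int set_and_get(int value)
{
    task_errno = value;
    return task_errno;
}
```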
4.1.4 Limits
<float.h> and <limits.h>
Both <float.h> and <limits.h> are inventions. Included in these headers are
various parameters of the execution environment which are potentially useful at
compile time, and which are difficult or impossible to determine by other means.
The availability of this information in headers provides a portable way of tun-
ing a program to different environments. Another possible method of determining
1
See §3.1.2.1 for a discussion of some of the precautions an implementor should take to keep
this promise. Note also that any implementation-defined member names in structures defined in
<time.h> and <locale.h> must begin with an underscore, rather than following the pattern of
other names in those structures.
2
In earlier drafts of the Standard, errno and related macros were defined in <stddef.h>. When
the Committee decided that the other definitions in this header were of such general utility that
they should be required even in freestanding environments, it created <errno.h>.
• to allow use of values which cannot readily (or, in some cases, cannot possibly)
be constructed as manifest constants, and
The offsetof macro has been added to provide a portable means of determining
the offset, in bytes, of a member within its structure. This capability is useful in
programs, such as are typical in data-base implementations, which declare a large
number of different data structures: it is desirable to provide “generic” routines that
work from descriptions of the structures, rather than from the structure declarations
themselves.3
3
Consider, for instance, a set of nodes (structures) which are to be dynamically allocated and
(size_t)&(((s_name*)0)->m_name)
or
(size_t)(char *)&(((s_name*)0)->m_name)
or, where X is some predeclared address (or 0) and A(Z) is defined as ((char*)&Z),
(size_t)( A( (s_name*)X->m_name ) - A( X ))
It was not feasible, however, to mandate any single one of these forms as a construct
guaranteed to be portable.
Other implementations may choose to expand this macro as a call to a built-in
function that interrogates the translator’s symbol table.
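A minimal sketch of the kind of generic routine that offsetof makes possible. The types node and field_desc and the functions are hypothetical; the routine works from a description of a member (its offset and size) rather than from the structure declaration.

```c
#include <stddef.h>
#include <string.h>

struct node {               /* hypothetical record type */
    int    id;
    double weight;
    char   tag[8];
};

struct field_desc {         /* hypothetical description of one member */
    size_t offset;
    size_t size;
};

static const struct field_desc weight_field = {
    offsetof(struct node, weight), sizeof(double)
};

/* Generic routine: copies any described member out of any record. */
void get_field(void *dst, const void *record, const struct field_desc *f)
{
    memcpy(dst, (const char *)record + f->offset, f->size);
}

double node_weight(const struct node *n)
{
    double w;

    get_field(&w, n, &weight_field);
    return w;
}
```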
4.2 Diagnostics
<assert.h>
4.2.1 Program diagnostics
4.2.1.1 The assert macro
Some implementations tolerate an arbitrary scalar expression as the argument to
assert, but the Committee decided to require correct operation only for int ex-
pressions. For the sake of implementors, no hard and fast format for the output
of a failing assertion is required; but the Standard mandates enough machinery to
replicate the form shown in the footnote.
It can be difficult or impossible to make assert a true function, so it is restricted
to macro form only.
To minimize the number of different methods for program termination, assert
is now defined in terms of the abort function.
Note that defining the macro NDEBUG to disable assertions may change the be-
havior of a program with no failing assertion if any argument expression to assert
has side-effects, because the expression is no longer evaluated.
It is possible to turn assertions off and on in different functions within a transla-
tion unit by defining (or undefining) NDEBUG and including <assert.h> again. The
implementation of this behavior in <assert.h> is simple: undefine any previous
definition of assert before providing the new one. Thus the header might look like
#undef assert
#ifdef NDEBUG
#define assert(ignore) ((void) 0)
#else
extern void __gripe(char *_Expr, char *_File, int _Line);
#define assert(expr) \
( (expr)? (void)0 : __gripe(#expr, __FILE__, __LINE__) )
#endif
Note that assert must expand to a void expression, so the more obvious if state-
ment does not suffice as a definition of assert. Note also the avoidance of names
in a header which would conflict with the user’s name space (see §3.1.2.1).
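The two points about NDEBUG can be sketched together (the function names are hypothetical): assertions are switched off for one function by including <assert.h> again, and the side effect in the argument expression disappears with them.

```c
#include <assert.h>

static int evaluations = 0;

static int bump(void) { return ++evaluations; }

/* Assertions enabled: the argument expression is evaluated. */
int checked(void)
{
    assert(bump() > 0);
    return evaluations;
}

#define NDEBUG
#include <assert.h>     /* assert now expands to ((void)0) */

/* Assertions disabled: bump() is never called here. */
int unchecked(void)
{
    assert(bump() > 0);
    return evaluations;
}

#undef NDEBUG
#include <assert.h>     /* assertions active again from here on */
```

Calling checked and then unchecked leaves the counter at 1, which is the behavioral difference the text warns about.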
Since these functions are often used primarily as macros, their domain is re-
stricted to the small positive integers representable in an unsigned char, plus the
value of EOF. EOF is traditionally −1, but may be any negative integer, and hence
distinguishable from any valid character code. These macros may thus be efficiently
implemented by using the argument as an index into a small array of attributes.
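Because of this restricted domain, a plain char that may be negative should be converted to unsigned char before being passed. A sketch, with a hypothetical function count_letters:

```c
#include <ctype.h>

/* Counts the alphabetic characters in a string. */
int count_letters(const char *s)
{
    int n = 0;

    for (; *s != '\0'; s++) {
        /* The cast keeps the argument within the permitted domain. */
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}
```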
The Standard (§4.13.1) warns that names beginning with is and to, when these
are followed by lower-case letters, are subject to future use in adding items to
<ctype.h>.
The Standard specifies that the set of letters, in the default locale, comprises the 26
upper-case and 26 lower-case letters of the Latin (English) alphabet. This set may
vary in a locale-specific fashion (that is, under control of the setlocale function,
§4.4) so long as
isspace is widely used within the library as the working definition of white space.
4.4 Localization
<locale.h>
C has become an international language. Users of the language outside the United
States have been forced to deal with the various Americanisms built into the stan-
dard library routines.
Areas affected by international considerations include:
Alphabet. The English language uses 26 letters derived from the Latin alphabet.
This set of letters suffices for English, Swahili, and Hawaiian; all other living
languages use either the Latin alphabet plus other characters, or other, non-
Latin alphabets or syllabaries.
In English, each letter has an upper-case and lower-case form. The German
“sharp S”, ß, occurs only in lower-case. European French usually omits dia-
criticals on upper-case letters. Some languages do not have the concept of two
cases.
Collation. In both EBCDIC and ASCII the code for ‘z’ is greater than the code
for ‘a’, and so on for other letters in the alphabet, so a “machine sort” gives
not unreasonable results for ordering strings. In contrast, most European
languages use a codeset resembling ASCII in which some of the codes used
in ASCII for punctuation characters are used for alphabetic characters. (See
§2.2.1.) The ordering of these codes is not alphabetic. In some languages
letters with diacritics sort as separate letters; in others they should be collated
just as the unmarked form. In Spanish, “ll” sorts as a single letter following
“l”; in German, “ß” sorts like “ss”.
Formatting of numbers and currency amounts. In the United States the pe-
riod is invariably used for the decimal point; this usage was built into the
definitions of such functions as printf and scanf. Prevalent practice in sev-
eral major European countries is to use a comma; a raised dot is employed
Date and time. The standard function asctime returns a string which includes
abbreviations for month and weekday names, and returns the various elements
in a format which might be considered unusual even in its country of origin.
Various common date formats include
The Committee has introduced mechanisms into the C library to allow these and
other issues to be treated in the appropriate locale-specific manner.
The localization features of the Standard are based on these principles:
object forms of their programs in different locales. Users do not want to use
different versions of a program just because they deal with several different
locales.
Function interface. Locale is changed by calling a function, thus allowing the im-
plementation to recognize the change, rather than by, say, changing a memory
location that contains the decimal point character.
Immediate effect. When a new locale is selected, affected functions reflect the
change immediately. (This is not meant to imply that, if a signal-handling
function were to change the selected locale and return to a library function,
the return value from that library function must be completely correct with
respect to the new locale.)
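The function interface can be sketched as follows (the function name decimal_point_in is hypothetical). The locale is selected with setlocale and the decimal point character is then obtained from localeconv rather than from a memory location the program writes directly.

```c
#include <locale.h>
#include <stddef.h>

/* Selects the named locale for numeric formatting and returns its
   decimal point string, or NULL if the locale is not available. */
const char *decimal_point_in(const char *locale_name)
{
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return NULL;
    return localeconv()->decimal_point;
}
```

In the "C" locale the result is "."; which other locale names are accepted is implementation-defined.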
4.5 Mathematics
<math.h>
For historical reasons, the math library is only defined for the floating type double.
All the names formed by appending f or l to a name in <math.h> are reserved to
allow for the definition of float and long double libraries.
The functions ecvt, fcvt, and gcvt have been dropped since their capability is
available through sprintf.
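For example, the effect of formatting a value to a given number of significant digits, roughly what gcvt provided, can be had with the %g conversion. The function name format_significant is hypothetical, and the caller is assumed to supply a buffer that is large enough.

```c
#include <stdio.h>

/* Writes value with ndigit significant digits into buf and
   returns the number of characters written. */
int format_significant(char *buf, double value, int ndigit)
{
    return sprintf(buf, "%.*g", ndigit, value);
}
```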
Traditionally, HUGE_VAL has been defined as a manifest constant that approximates
the largest representable double value. As an approximation to infinity it is
problematic. As a function return value indicating overflow, it can cause trouble if
first assigned to a float before testing, since a float may not necessarily hold all
values representable in a double.
After considering several alternatives, the Committee decided to generalize
HUGE_VAL to a positive double expression, so that it could be expressed as an external
identifier naming a location initialized precisely with the proper bit pattern. It can
even be a special encoding for machine infinity, on implementations that support
such codes. It need not be representable as a float, however.
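A sketch of testing for overflow against HUGE_VAL. To keep the example free of any particular math function, it assumes strtod as the source of the overflow; strtod is likewise specified to return plus or minus HUGE_VAL and set errno to ERANGE when the result is too large. The comparison is made on the double result, before any narrowing to float.

```c
#include <errno.h>
#include <math.h>
#include <stdlib.h>

/* Returns 1 if converting text overflows the range of double. */
int overflows(const char *text)
{
    double d;

    errno = 0;
    d = strtod(text, NULL);
    /* Test the double itself; a float might not hold HUGE_VAL. */
    return (d == HUGE_VAL || d == -HUGE_VAL) && errno == ERANGE;
}
```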
Similarly, domain errors in the past were typically indicated by a zero return,
which is not necessarily distinguishable from a valid result. The Committee agreed
to make the return value for domain errors implementation-defined, so that special
machine codes can be used to advantage. This makes possible an implementation
of the math library in accordance with the IEEE P854 proposal on floating point
representation and arithmetic.
The Committee considered the adoption of the matherr capability from UNIX
System V. In this feature of that system’s math library, any error (such as overflow
or underflow) results in a call from the library function to a user-defined exception
handler named matherr. The Committee rejected this approach for several reasons:
x - (2*pi) * (int)(x/(2*pi))
are ill-advised.
pointer to the argument is desired, not the value of the argument itself. Thus, a
scalar or struct type is unsuitable. Note that a one-element array of the appropriate
type is a valid definition.
setjmp is constrained to be a macro only: in some implementations the infor-
mation necessary to restore context is only available while executing the function
making the call to setjmp.
register (if their addresses are never taken), it is not obvious that an automatic
declaration will not be rolled back. Hence the vague wording. In fact, the only
reliable way to ensure that a local variable retain the value it had at the time of the
call to longjmp is to define it with the volatile attribute.
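A minimal sketch of this advice (the function names are hypothetical): the counter is modified between the call to setjmp and the call to longjmp, so it is declared volatile to guarantee that its latest value is seen after the jump.

```c
#include <setjmp.h>

static jmp_buf env;

static void fail(void)
{
    longjmp(env, 1);    /* return to the setjmp call with value 1 */
}

/* Retries until the third attempt, then returns the attempt count. */
int attempts_until_success(void)
{
    volatile int attempts = 0;  /* must survive the longjmp */

    if (setjmp(env) != 0) {
        /* Arrived here by longjmp; fall through and try again. */
    }
    attempts++;
    if (attempts < 3)
        fail();
    return attempts;
}
```

Without volatile, the value of attempts after the jump would be indeterminate.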
Some implementations leave a process in a special state while a signal is being
handled. An explicit reassurance must be given to the environment when the signal
handler is done. To keep this job manageable, the Committee agreed to restrict
longjmp to only one level of signal handling.
The longjmp function should not be called in an exit handler (i.e., a function
registered with the atexit function (see §4.10.4.2)), since it might jump to some
code which is no longer in scope.
When a signal occurs the normal flow of control of a program is interrupted. If a sig-
nal occurs that is being trapped by a signal handler, that handler is invoked. When
it is finished, execution continues at the point at which the signal occurred. This
arrangement could cause problems if the signal handler invokes a library function
that was being executed at the time of the signal. Since library functions are not
guaranteed to be re-entrant, they should not be called from a signal handler that
returns. (See §2.2.3.) A specific exception to this rule has been granted for calls
to signal from within the signal handler; otherwise, the handler could not reliably
reset the signal.
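A sketch of a handler written within these limits (the names are hypothetical): it calls signal to re-establish itself, sets a flag of type volatile sig_atomic_t, and calls no other library function. The signal is delivered with raise so that the example is self-contained.

```c
#include <signal.h>

static volatile sig_atomic_t got_signal = 0;

static void on_interrupt(int sig)
{
    signal(sig, on_interrupt);  /* the one library call permitted here */
    got_signal = 1;             /* record the event and return */
}

/* Returns 1 if two successive SIGINT deliveries both reached the handler. */
int handles_repeated_signal(void)
{
    int first, second;

    if (signal(SIGINT, on_interrupt) == SIG_ERR)
        return 0;
    raise(SIGINT);
    first = got_signal;
    got_signal = 0;
    raise(SIGINT);
    second = got_signal;
    signal(SIGINT, SIG_DFL);    /* restore default handling */
    return first && second;
}
```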
The specification that some signals may be effectively set to SIG_IGN instead of
SIG_DFL at program startup allows programs under UNIX systems to inherit this
effective setting from parent processes.
For performance reasons, UNIX does not reset SIGILL to default handling when
the handler is called (usually to emulate missing instructions). This treatment is
sanctioned by specifying that whether reset occurs for SIGILL is implementation-
defined.
4.9 Input/Output
<stdio.h>
Many implementations of the C runtime environment (most notably the UNIX oper-
ating system) provide, aside from the standard I/O library (fopen, fclose, fread,
fwrite, fseek), a set of unbuffered I/O services (open, close, read, write, lseek).
The Committee has decided not to standardize the latter set of functions.
A suggested semantics for these functions in the UNIX world may be found in
the emerging IEEE P1003 standard. The standard I/O library functions use a file
pointer for referring to the desired I/O stream. The unbuffered I/O services use a
file descriptor (a small integer) to refer to the desired I/O stream.
Due to weak implementations of the standard I/O library, many implementors
have assumed that the standard I/O library was used for small records and that the
unbuffered I/O library was used for large records. However, a good implementation
of the standard I/O library can match the performance of the unbuffered services
on large records. The user also has the capability of tuning the performance of the
standard I/O library (with setvbuf) to suit the application.
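A sketch of such tuning, with a hypothetical wrapper use_full_buffering. The buffer size is left to the caller, and the library is asked to allocate the buffer itself by passing a null pointer.

```c
#include <stdio.h>

/* Requests full buffering with a library-allocated buffer of size bytes.
   Intended to be called after the stream is opened and before any other
   operation on it. Returns 0 on success, nonzero on failure. */
int use_full_buffering(FILE *stream, size_t size)
{
    return setvbuf(stream, NULL, _IOFBF, size);
}
```

A program that transfers large records might request a buffer several times the record size; the best value depends on the host system.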
Some subtle differences between the two sets of services can make the implemen-
tation of the unbuffered I/O services difficult:
• The model of a file used in the unbuffered I/O services is an array of characters.
Many C environments do not support this file model.
• Difficulties arise when handling the new-line character. Many hosts use con-
ventions other than an in-stream new-line character to mark the end of a line.
The unbuffered I/O services assume that no translation occurs between the
program’s data and the file data when performing I/O, so either the new-line
character translation would be lost (which breaks programs) or the implemen-
tor must be aware of the new-line translation (which results in non-portable
programs).
In summary, the Committee chose not to standardize the unbuffered I/O services
because:
• The performance of the standard I/O services can be the same or better than
the unbuffered I/O services.
• The unbuffered I/O file model may not be appropriate for many C language
environments.
4.9.1 Introduction
The macros _IOFBF, _IOLBF, and _IONBF are enumerations of the third argument to
setvbuf, a function adopted from UNIX System V.
SEEK_CUR, SEEK_END, and SEEK_SET have been moved to <stdio.h> from a header
specified in the Base Document and not retained in the Standard.
FOPEN_MAX and TMP_MAX are added environmental limits of some interest to
programs that manipulate multiple temporary files.
FILENAME_MAX is provided so that buffers to hold file names can be conveniently
declared. If the target system supports arbitrarily long filenames, the implementor
should provide some reasonable value (80?, 255?, 509?) rather than something
unusable like USHRT_MAX.
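As an illustration of the intended use of FILENAME_MAX, the following sketch declares a file-name buffer and guards the copy into it. The function name store_file_name is hypothetical and not part of the Standard.

```c
#include <stdio.h>
#include <string.h>

/* Copy a candidate file name into a buffer of FILENAME_MAX characters.
   Returns 1 on success, 0 if the name plus its terminator does not fit.
   (store_file_name is an illustrative name, not a library function.) */
int store_file_name(char dest[FILENAME_MAX], const char *name)
{
    if (strlen(name) >= (size_t)FILENAME_MAX)
        return 0;
    strcpy(dest, name);
    return 1;
}
```

A caller would typically declare char name[FILENAME_MAX]; and pass it as the first argument.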
4.9.2 Streams
C inherited its notion of text streams from the UNIX environment in which it was
born. Having each line delimited by a single new-line character, regardless of the
characteristics of the actual terminal, supported a simple model of text as a sort of
arbitrary length scroll or “galley.” Having a channel that is “transparent” (no file
structure or reserved data encodings) eliminated the need for a distinction between
text and binary streams.
Many other environments have different properties, however. If a program writ-
ten in C is to produce a text file digestible by other programs, by text editors in
particular, it must conform to the text formatting conventions of that environment.
The I/O facilities defined by the Standard are both more complex and more
restrictive than the ancestral I/O facilities of UNIX. This is justified on pragmatic
grounds: most of the differences, restrictions and omissions exist to permit C I/O
implementations in environments which differ from the UNIX I/O model.
Troublesome aspects of the stream concept include:
The definition of lines. In the UNIX model, division of a file into lines is effected
by new-line characters. Different techniques are used by other systems —
lines may be separated by CR-LF (carriage return, line feed) or by unrecorded
areas on the recording medium, or each line may be prefixed by its length.
The Standard addresses this diversity by specifying that new-line be used as
a line separator at the program level, but then permitting an implementation
to transform the data read or written to conform to the conventions of the
environment.
Some environments represent text lines as blank-filled fixed-length records.
Thus the Standard specifies that it is implementation-defined whether trailing
blanks are removed from a line on input. (This specification also addresses
the problems of environments which represent text as variable-length records,
but do not allow a record length of 0: an empty line may be written as a
one-character record containing a blank, and the blank is stripped on input.)
Random access. The UNIX I/O model features random access to data in a file,
indexed by character number. On systems where a new-line character pro-
cessed by the program represents an unknown number of physically recorded
characters, this simple mechanism cannot be consistently supported for text
streams. The Standard abstracts the significant properties of random access
for text streams: the ability to determine the current file position and then
later reposition the file to the same location. ftell returns a file position
indicator, which has no necessary interpretation except that an fseek opera-
tion with that indicator value will position the file to the same place. Thus
an implementation may encode whatever file positioning information is most
appropriate for a text file, subject only to the constraint that the encoding
be representable as a long. Use of fgetpos and fsetpos removes even this
constraint.
Buffering. UNIX allows the program to control the extent and type of buffering
for various purposes. For example, a program can provide its own large I/O
buffer to improve efficiency, or can request unbuffered terminal I/O to process
each input character as it is entered. Other systems do not necessarily support
this generality. Some systems provide only line-at-a-time access to terminal
input; some systems support program-allocated buffers only by copying data
to and from system-allocated buffers for processing. Buffering is addressed
in the Standard by specifying UNIX-like setbuf and setvbuf functions, but
permitting great latitude in their implementation. A conforming library need
neither attempt the impossible nor respond to a program attempt to improve
efficiency by introducing additional overhead.
Thus, the Standard imposes a clear distinction between text streams, which must
be mapped to suit local custom, and binary streams, for which no mapping takes
place. Local custom on UNIX (and related) systems is of course to treat the two
sorts of streams identically, and nothing in the Standard requires any changes to
this practice.
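The random-access abstraction described above for text streams (record a position, return to it later) might be exercised as in the following sketch. The function name reread_line is hypothetical; the value returned by ftell is treated purely as an opaque token to hand back to fseek.

```c
#include <stdio.h>
#include <string.h>

/* Read the next line of fp twice by saving and restoring the file position.
   Returns 1 if both reads yield the same text, 0 otherwise or on error.
   (reread_line is an illustrative name, not a library function.) */
int reread_line(FILE *fp, char *first, char *second, int size)
{
    long mark = ftell(fp);               /* opaque position indicator */

    if (mark == -1L)
        return 0;
    if (fgets(first, size, fp) == NULL)
        return 0;
    if (fseek(fp, mark, SEEK_SET) != 0)  /* return to the saved place */
        return 0;
    if (fgets(second, size, fp) == NULL)
        return 0;
    return strcmp(first, second) == 0;
}
```

Where a position may not be representable as a long, the same pattern can presumably be written with fgetpos and fsetpos and an fpos_t object in place of mark.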
Even the specification of binary streams requires some changes to accommodate
a wide range of systems. Because many systems do not keep track of the length of a
file to the nearest byte, an arbitrary number of characters may appear on the end of
a binary stream directed to a file. The Standard cannot forbid this implementation,
but does require that this padding consist only of null characters. The alternative
would be to restrict C to producing binary files digestible only by other C programs;
this alternative runs counter to the spirit of C.
The characters required to be preserved in text stream I/O are those needed
for writing C programs; the intent is that the Standard should permit a C translator
to be written in a maximally portable fashion. Control characters such as backspace
are not required for this purpose, so their handling in text streams is not mandated.
It was agreed that some minimum maximum line length must be mandated; 254
was chosen.
4.9.3 Files
The as if principle is once again invoked to define the nature of input and output
in terms of just two functions, fgetc and fputc. The actual primitives in a given
system may be quite different.
The fflush function ensures that output has been forced out of internal I/O buffers
for a specified stream. Occasionally, however, it is necessary to ensure that all output
is forced out, and the programmer may not conveniently be able to specify all
the currently-open streams (perhaps because some streams are manipulated within
library packages).5 To provide an implementation-independent method of flushing
all output buffers, the Standard specifies that this is the result of calling fflush
with a NULL argument.
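A minimal sketch of this usage follows. The wrapper name flush_everything is hypothetical; it only shows that no list of open streams is needed.

```c
#include <stdio.h>

/* Flush every open output stream, including streams opened inside
   library packages that this code cannot name.
   Returns 0 on success, EOF if a write error occurred.
   (flush_everything is an illustrative name, not a library function.) */
int flush_everything(void)
{
    return fflush(NULL);
}
```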
The b type modifier has been added to deal with the text/binary dichotomy (see
§4.9.2). Because of the limited ability to seek within text files (see §4.9.9.1), an
implementation is at liberty to treat the old update + modes as if b were also
specified. Table 4.1 tabulates the capabilities and actions associated with the various
specified mode string arguments to fopen.
                                      r    w    a    r+   w+   a+
file must exist before open           √              √
old file contents discarded on open        √              √
stream can be read                    √              √    √    √
stream can be written                      √    √    √    √    √
stream can be written only at end               √              √
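To illustrate one column of the table, the sketch below opens a file in mode w+, which discards old contents and permits both writing and reading. The function name write_then_read and the file name used by a caller are hypothetical.

```c
#include <stdio.h>

/* Create (or truncate) path with mode "w+", write text, then read it back
   through the same stream. Returns the number of characters read, or -1.
   (write_then_read is an illustrative name, not a library function.) */
int write_then_read(const char *path, const char *text, char *out, int size)
{
    FILE *fp = fopen(path, "w+");
    int n = 0;
    int c;

    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    rewind(fp);  /* a positioning call is needed between writing and reading */
    while (n < size - 1 && (c = getc(fp)) != EOF)
        out[n++] = (char)c;
    out[n] = '\0';
    fclose(fp);
    return n;
}
```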
5 For instance, on a system (such as UNIX) which supports process forks, it is usually necessary
to flush all output buffers just prior to the fork.
Other specifications for files, such as record length and block size, are not specified
in the Standard, due to their widely varying characteristics in different operating
environments. Changes to file access modes and buffer sizes may be specified using
the setvbuf function. (See §4.9.5.6.) An implementation may choose to allow
additional file specifications as part of the mode string argument. For instance,
file1 = fopen(file1name,"wb,reclen=80");
setbuf is subsumed by setvbuf, but has been retained for compatibility with old
code.
setvbuf has been adopted from UNIX System V, both to control the nature of
stream buffering and to specify the size of I/O buffers. An implementation is not
required to make actual use of a buffer provided for a stream, so a program must
never expect the buffer’s contents to reflect I/O operations. Further, the Standard
does not require that the requested buffering be implemented; it merely mandates a
standard mechanism for requesting whatever buffering services might be provided.
Although three types of buffering are defined, an implementation may choose
to make one or more of them equivalent. For example, a library may choose to
implement line-buffering for binary files as equivalent to unbuffered I/O or may
choose to always implement full-buffering as equivalent to line-buffering.
The general principle is to provide portable code with a means of requesting the
most appropriate popular buffering style, but not to require an implementation to
support these styles.
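A request of this kind might look like the following sketch. The function name request_full_buffering is hypothetical; the point is that the program asks for a buffering style and must tolerate the library declining.

```c
#include <stdio.h>

/* Ask for full buffering with a library-allocated buffer of size bytes.
   Intended to be called before any other operation on the stream.
   Returns 1 if the request was accepted, 0 if the library declined it;
   either way the stream remains usable.
   (request_full_buffering is an illustrative name.) */
int request_full_buffering(FILE *fp, size_t size)
{
    return setvbuf(fp, NULL, _IOFBF, size) == 0;
}
```

Because the Standard does not require the request to be honored, portable code should treat a zero return here as a missed optimization rather than as an error.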
Use of the L modifier with floating conversions has been added to deal with formatted
output of the new type long double.
Note that the %X and %x formats expect a corresponding int argument; %lX or
%lx must be supplied with a long int argument.
The conversion specification %p has been added for pointer conversion, since
the size of a pointer is not necessarily the same as the size of an int. Because
an implementation may support more than one size of pointer, the corresponding
argument is expected to be a (void *) pointer.
The %n format has been added to permit ascertaining the number of characters
converted up to that point in the current invocation of the formatter.
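The argument-type rules for %lx and %p can be sketched as follows. Both function names are hypothetical; the text produced by %p is implementation-defined, so the second function checks only that the value survives a round trip through sscanf within the same program execution.

```c
#include <stdio.h>

/* %lx needs a long-sized argument; plain %x would expect an int-sized one.
   Returns the number of characters written to buf.
   (format_long_hex is an illustrative name.) */
int format_long_hex(char *buf, unsigned long value)
{
    return sprintf(buf, "%lx", value);
}

/* Print a pointer with %p (after conversion to void *) and read it back.
   Returns 1 if the round trip preserves the pointer value.
   (pointer_round_trip is an illustrative name.) */
int pointer_round_trip(void *p)
{
    char text[64];
    void *q = NULL;

    sprintf(text, "%p", p);
    if (sscanf(text, "%p", &q) != 1)
        return 0;
    return q == p;
}
```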
Some pre-Standard implementations switch formats for %g at an exponent of −3
instead of (the Standard’s) −4: existing code which requires the format switch at −3
will have to be changed.
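The switch point can be observed with a sketch such as the following, assuming a conforming library and the default precision of 6. The function name format_g is hypothetical.

```c
#include <stdio.h>

/* Format x with %g into buf; returns the number of characters written.
   With the Standard's rule, 0.0001 (exponent -4) stays in fixed notation
   as "0.0001", while 0.00001 (exponent -5) is printed as "1e-05".
   (format_g is an illustrative name.) */
int format_g(char *buf, double x)
{
    return sprintf(buf, "%g", x);
}
```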
Some existing implementations provide %D and %O as synonyms or replacements
for %ld and %lo. The Committee considered the latter notation preferable.
The Committee has reserved lower case conversion specifiers for future standard-
ization.
The use of leading zero in field widths to specify zero padding has been super-
seded by a precision field. The older mechanism has been retained.
Some implementations have provided the format %r as a means of indirectly
passing a variable-length argument list. The functions vfprintf, etc., are considered
to be a more controlled method of effecting this indirection, so %r was not adopted
in the Standard. (See §4.9.6.7.)
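The indirection that %r provided can be sketched with vfprintf as follows. The function name report_error and its "error: " prefix are hypothetical choices for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/* Write "error: " followed by a printf-style message to fp.
   Returns the total number of characters written, or a negative value
   on output error. (report_error is an illustrative name.) */
int report_error(FILE *fp, const char *format, ...)
{
    va_list args;
    int body;

    if (fputs("error: ", fp) == EOF)
        return -1;
    va_start(args, format);
    body = vfprintf(fp, format, args);  /* forwards the caller's arguments */
    va_end(args);
    return body < 0 ? body : 7 + body;  /* 7 is the length of "error: " */
}
```

A call such as report_error(stderr, "%s:%d", name, line) passes its variable arguments through unchanged.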
The printing formats for numbers are not entirely specified. The requirements
of the Standard are loose enough to allow implementations to handle such cases as
signed zero, not-a-number, and infinity in an appropriate fashion.
• As soon as one specified conversion fails, the whole function invocation fails.
Input pointer conversion with %p has been added, although it is obviously risky,
for symmetry with fprintf. The %i format has been added to permit the scanner
to determine the radix of the number in the input stream; the %n format has been
added to make available the number of characters scanned thus far in the current
invocation of the scanner.
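The %i and %n conversions might be used together as in this sketch. The function name scan_three is hypothetical.

```c
#include <stdio.h>

/* Parse three integers whose radix is inferred from their prefix (%i),
   and report through %n how many characters were consumed.
   Returns the number of items assigned; %n does not count as an item.
   (scan_three is an illustrative name.) */
int scan_three(const char *text, int v[3], int *consumed)
{
    *consumed = 0;
    return sscanf(text, "%i %i %i%n", &v[0], &v[1], &v[2], consumed);
}
```

For the input "0x1F 017 42" this yields the values 31, 15, and 42, with 11 characters consumed.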
White space is now defined by the isspace function. (See §4.3.1.9.)
An implementation must not use the ungetc function to perform the necessary
one-character pushback. In particular, since the unmatched text is left “unread,”
the file position indicator as reported by the ftell function must be the position
of the character remaining to be read. Furthermore, if the unread characters were
themselves pushed back via ungetc calls, the pushback in fscanf must not affect
the push-back stack in ungetc. A scanf call that matches N characters from a
stream must leave the stream in the same state as if N consecutive getc calls had
been issued.
pushback regardless of the state of the buffer; it felt that this degree of generality
makes clearer the ways in which the function may be used.
It is permissible to push back a different character than that which was read;
this accords with common existing practice. The last-in, first-out nature of ungetc
has been clarified.
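Pushing back a character other than the one read can be sketched as follows. The function name replace_next_char is hypothetical, and only a single character of pushback is assumed.

```c
#include <stdio.h>

/* Read one character from fp, push back replacement in its place, and
   return the character delivered by the next read (EOF on failure).
   (replace_next_char is an illustrative name.) */
int replace_next_char(FILE *fp, int replacement)
{
    int c = getc(fp);

    if (c == EOF)
        return EOF;
    if (ungetc(replacement, fp) == EOF)
        return EOF;
    return getc(fp);    /* yields the pushed-back character */
}
```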
• the stream is associated with a terminal, or some other file type for which the file
position indicator is meaningless; or
At various times, the Committee considered providing a form of perror that delivers
up an error string version of errno without performing any output. It ultimately
decided to provide this capability in a separate function, strerror. (See §4.11.6.1.)
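A message can thus be composed without any output being performed, as in this sketch. The function name describe_error is hypothetical, the caller is assumed to supply a sufficiently large buffer, and the text returned by strerror is implementation-defined.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Build "<context>: <error text>" in buf without writing to any stream.
   The caller must provide a buffer large enough for the result.
   Returns the length of the composed message.
   (describe_error is an illustrative name.) */
int describe_error(char *buf, const char *context, int errnum)
{
    return sprintf(buf, "%s: %s", context, strerror(errnum));
}
```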
atof, atoi, and atol are subsumed by strtod and strtol, but have been retained
because they are used extensively in existing code. They are less reliable, but may
be faster if the argument is known to be in a valid range.
See §4.10.1.1.
See §4.10.1.1.
strtod and strtol have been adopted (from UNIX System V) because they offer
more control over the conversion process, and because they are required not to
produce unexpected results on overflow during conversion.
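The additional control offered by strtol can be sketched as follows. The function name parse_long and its policy of rejecting trailing characters are hypothetical choices.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert decimal text to long. Returns 1 and stores the value on success;
   returns 0 if no digits were found, if characters follow the number, or
   if the value is out of range. (parse_long is an illustrative name.) */
int parse_long(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text)
        return 0;               /* no conversion was performed */
    if (*end != '\0')
        return 0;               /* trailing characters after the number */
    if (errno == ERANGE)
        return 0;               /* result was clamped to LONG_MIN/LONG_MAX */
    *out = value;
    return 1;
}
```

None of these three conditions can be detected through atol, which is why the older functions are described as less reliable.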
See §4.10.1.4.
/* initial allocation */
p = (OBJ *) calloc(0, sizeof(OBJ));
/* ... */
functions may therefore return a null pointer for an allocation request of zero bytes.
Note that this treatment does not preclude the paradigm outlined above.
QUIET CHANGE
A program which relies on size-0 allocation requests returning a non-null
pointer will behave differently.
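One way to avoid depending on the result of a zero-size request is sketched below. The OBJ type and the function name resize_list are hypothetical; an empty list is simply represented by a null pointer.

```c
#include <stdlib.h>

typedef struct { int id; } OBJ;   /* hypothetical element type */

/* Resize the list *pp to hold count objects. A count of zero frees the
   list and leaves *pp null, so nothing depends on what a zero-byte
   allocation request returns. Returns 1 on success, 0 on allocation
   failure (the list is then unchanged).
   (resize_list is an illustrative name.) */
int resize_list(OBJ **pp, size_t count)
{
    OBJ *q;

    if (count == 0) {
        free(*pp);
        *pp = NULL;
        return 1;
    }
    q = (OBJ *) realloc(*pp, count * sizeof(OBJ));  /* null *pp acts as malloc */
    if (q == NULL)
        return 0;
    *pp = q;
    return 1;
}
```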
Some implementations provide a function (often called alloca) which allocates the
requested object from automatic storage; the object is automatically freed when the
calling function exits. Such a function is not efficiently implementable in a variety
of environments, so it was not adopted in the Standard.
The system function allows a program to suspend its execution temporarily in order
to run another program to completion.
Information may be passed to the called program in three ways: through
command-line argument strings, through the environment, and (most portably)
through data files. Before calling the system function, the calling program should
close all such data files.
Information may be returned from the called program in two ways: through
the implementation-defined return value (in many implementations, the termina-
tion status code which is the argument to the exit function is returned by the
implementation to the caller as the value returned by the system function), and
(most portably) through data files.
If the environment is interactive, information may also be exchanged with users
of interactive devices.
Some implementations offer built-in programs called “commands” (for example,
“date”) which may provide useful information to an application program via the
system function. The Standard does not attempt to characterize such commands,
and their use is not portable.
On the other hand, the use of the system function is portable, provided the
implementation supports the capability. The Standard permits the application to
ascertain this by calling the system function with a null pointer argument. Whether
more levels of nesting are supported can also be ascertained this way; assuming more
than one such level is obviously dangerous.
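The availability check might be coded as in the following sketch. Both function names are hypothetical, and any command string passed by a caller is outside the scope of the Standard.

```c
#include <stdlib.h>

/* Returns 1 if a command processor is available, 0 otherwise.
   (command_processor_available is an illustrative name.) */
int command_processor_available(void)
{
    return system(NULL) != 0;
}

/* Run command only if a command processor exists. *ran is set to 1 when
   the command was handed to system, 0 otherwise. The return value is the
   implementation-defined result of system, or -1 when nothing was run.
   (run_if_possible is an illustrative name.) */
int run_if_possible(const char *command, int *ran)
{
    *ran = 0;
    if (!command_processor_available())
        return -1;
    *ran = 1;
    return system(command);
}
```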
See §4.11.1.
strcoll and strxfrm provide for locale-specific string sorting. strcoll is intended
for applications in which the number of comparisons is small; strxfrm is more
appropriate when items are to be compared a number of times — the cost of trans-
formation is then only paid once.
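The pay-once approach can be sketched as follows: each string is transformed a single time, and the resulting keys are compared with strcmp as often as needed. The function name make_sort_key is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated transformed copy of s, suitable for repeated
   comparison with strcmp, or a null pointer if allocation fails.
   The caller frees the result. (make_sort_key is an illustrative name.) */
char *make_sort_key(const char *s)
{
    size_t need = strxfrm(NULL, s, 0) + 1;  /* size query; nothing is stored */
    char *key = (char *) malloc(need);

    if (key != NULL)
        strxfrm(key, s, need);
    return key;
}
```

Comparing two such keys with strcmp gives a result of the same sign as strcoll applied to the original strings in the same locale.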
See §4.11.4.3.
See §4.11.1.
This function has been included to provide a convenient solution to many simple
problems of lexical analysis, such as scanning command line arguments.
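Assuming the function under discussion is strtok, a simple splitting task might be sketched as follows. The function name split_words and the choice of blank and tab as separators are hypothetical.

```c
#include <string.h>

/* Split line in place on blanks and tabs, storing up to max token
   pointers in words. The contents of line are modified.
   Returns the number of tokens stored.
   (split_words is an illustrative name.) */
int split_words(char *line, char *words[], int max)
{
    int n = 0;
    char *tok = strtok(line, " \t");

    while (tok != NULL && n < max) {
        words[n++] = tok;
        tok = strtok(NULL, " \t");  /* continue scanning the same string */
    }
    return n;
}
```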
capacity, and low timer overhead must be balanced carefully in the light of this
intended use.
#include <time.h>
struct tm when;
time_t now;
time_t deadline;
/* ... */
now = time(0);
when = *localtime(&now);
when.tm_hour += 1; /* result is in the range [1,24] */
deadline = mktime(&when);
The specification of mktime guarantees that the addition to the tm_hour field
produces the correct result even when the new value of tm_hour is 24, i.e., a value
outside the range ever returned by a library function in a struct tm object.
One of the reasons for adding this function is to replace the capability to do
such arithmetic which is lost when a programmer cannot depend on time_t being
an integral multiple of some known time unit.
Several readers of earlier versions of this Rationale have pointed out apparent
problems in this example if now is just before a transition into or out of daylight
savings time. However, when.tm_isdst indicates what sort of time was the basis of
the calculation. Implementors, take heed. If this field is set to −1 on input, one
truly ambiguous case involves the transition out of daylight savings time. As DST
is currently legislated in the USA, the hour 0100–0159 occurs twice, first as DST
and then as standard time. Hence an unlabeled 0130 on this date is problematic.
An implementation may choose to take this as DST or standard time, marking its
decision in the tm_isdst field. It may also legitimately take this as invalid input
(and return (time_t)(-1)).
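The normalization performed by mktime can be packaged as in the following sketch. The function name add_hours is hypothetical; the caller's tm_isdst setting is left untouched, so the considerations above still apply near a DST transition.

```c
#include <time.h>

/* Add hours to the broken-down local time *when and let mktime bring
   every field back into its normal range. Returns the resulting calendar
   time, or (time_t)(-1) if it cannot be represented.
   (add_hours is an illustrative name.) */
time_t add_hours(struct tm *when, int hours)
{
    when->tm_hour += hours;   /* may leave the range [0,23] temporarily */
    return mktime(when);      /* normalizes tm_hour, tm_mday, and so on */
}
```

Starting from 23:00 on a given day and adding one hour, the normalized structure reports hour 0 of the following day.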
Although the name of this function suggests a conflict with the principle of removing
ASCII dependencies from the Standard, the name has been retained due to prior art.
For the same reason of existing practice, a proposal to remove the newline character
from the string format was not adopted. Proposals to allow for the use of languages
other than English in naming weekdays and months met with objections on grounds
of prior art, and on grounds that a truly international version of this function was
difficult to specify: three-letter abbreviation of weekday and month names is not
universally conventional, for instance. The strftime function (§4.12.3.5) provides
appropriate facilities for locale-specific date and time strings.
This function has been retained, despite objections that GMT — that is, Coor-
dinated Universal Time (UTC) — is not available in some implementations, since
UTC is a useful and widespread standard representation of time. If UTC is not
available, a null pointer may be returned.
strftime provides a way of formatting the date and time in the appropriate locale-
specific fashion, using the %c, %x, and %X format specifiers. More generally, it allows
the programmer to tailor whatever date and time format is appropriate for a given
application. The facility is based on the UNIX system date command. See §4.4 for
further discussion of locale specification.
For the field controlled by %p, an implementation may wish to provide special
symbols to mark noon and midnight.
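Tailoring a format to an application might look like the following sketch, which uses a fixed numeric layout rather than the locale-specific %c, %x, or %X. The function name format_iso_date and the chosen layout are hypothetical.

```c
#include <stddef.h>
#include <time.h>

/* Format the date in *when as YYYY-MM-DD.
   Returns the number of characters stored (excluding the terminator),
   or 0 if the buffer is too small.
   (format_iso_date is an illustrative name.) */
size_t format_iso_date(char *buf, size_t size, const struct tm *when)
{
    return strftime(buf, size, "%Y-%m-%d", when);
}
```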
Section 5
APPENDICES