Rob Pike Notes on Programming in C
Notes on Programming in C
Rob Pike
1 of 8 10/24/23, 19:06
Rob Pike: Notes on Programming in C http://doc.cat-v.org/bell_labs/pikestyle
Introduction
Kernighan and Plauger's The Elements of Programming Style was an important
and rightly influential book. But sometimes I feel its concise rules were taken as a
cookbook approach to good style instead of the succinct expression of a philosophy
they were meant to be. If the book claims that variable names should be chosen
meaningfully, doesn't it then follow that variables whose names are small essays on
their use are even better? Isn't MaximumValueUntilOverflow a better name than maxval? I
don't think so.
Issues of typography
A program is a sort of publication. It's meant to be read by the programmer,
another programmer (perhaps yourself a few days, weeks or years later), and lastly a
machine. The machine doesn't care how pretty the program is - if the program
compiles, the machine's happy - but people do, and they should. Sometimes they care
too much: pretty printers mechanically produce pretty output that accentuates
irrelevant detail in the program, which is as sensible as putting all the prepositions in
English text in bold font. Although many people think programs should look like the
Algol-68 report (and some systems even require you to edit programs in that style), a
clear program is not made any clearer by such presentation, and a bad program is
only made laughable.
Variable names
Ah, variable names. Length is not a virtue in a name; clarity of expression is. A
global variable rarely used may deserve a long name, maxphysaddr say. An array index
used on every line of a loop needn't be named any more elaborately than i. Saying
index or elementnumber is more to type (or calls upon your text editor) and obscures the
details of the computation. When the variable names are huge, it's harder to see
what's going on. This is partly a typographic issue; consider
for(i=0; i<=100; i++)
    array[i]=0;
vs.
for(elementnumber=0; elementnumber<=100; elementnumber++)
    array[elementnumber]=0;
The problem gets worse fast with real examples. Indices are just notation, so treat
them as such.
Consider: When you have a pointer to an object, it is a name for exactly that
object and no other. That sounds trivial, but look at the following two expressions:
np
node[i]
The first points to a node, the second evaluates to (say) the same node. But the
second form is an expression; it is not so simple. To interpret it, we must know what
node is, what i is, and that i and node are related by the (probably unspecified) rules
of the surrounding program. Nothing about the expression in isolation can show that
i is a valid index of node, let alone the index of the element we want. If i and j and k
are all indices into the node array, it's very easy to slip up, and the compiler cannot
help. It's particularly easy to make mistakes when passing things to subroutines: a
pointer is a single thing; an array and an index must be believed to belong together in
the receiving subroutine.
vs.
lp->type.
or
(++lp)->type.
i advances but the rest of the expression must stay constant; with pointers, there's
only one thing to advance.
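
A minimal sketch of that difference, assuming a hypothetical Link structure with a type field and a hypothetical Parent that holds an array of them (both names are invented for this illustration). Both functions count the links of a given type. The indexed version carries parent, link and i through every expression; the pointer version advances only lp.

```c
#include <stddef.h>

/* Hypothetical types, invented for this illustration. */
typedef struct Link Link;
struct Link {
	int type;
};

typedef struct Parent Parent;
struct Parent {
	Link link[4];
	size_t nlink;	/* number of entries of link in use */
};

/* Indexed form: parent, link and i must all stay in agreement. */
size_t
countindexed(Parent *parent, int type)
{
	size_t i, n;

	n = 0;
	for(i = 0; i < parent->nlink; i++)
		if(parent->link[i].type == type)
			n++;
	return n;
}

/* Pointer form: lp names exactly one Link at a time. */
size_t
countpointer(Parent *parent, int type)
{
	Link *lp, *end;
	size_t n;

	n = 0;
	end = parent->link + parent->nlink;
	for(lp = parent->link; lp < end; lp++)
		if(lp->type == type)
			n++;
	return n;
}
```

The two functions return the same count; the second simply has fewer moving parts per line.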
is sufficiently evocative; if an array is being indexed the array will have some well-
chosen name and the expression will end up longer:
node[i].left.
Again, the extra characters become more irritating as the examples become larger.
As a rule, if you find code containing many similar, complex expressions that
evaluate to elements of a data structure, judicious use of pointers can clear things
up. Consider what
if(goleft)
    p->left=p->right->left;
else
    p->right=p->left->right;
would look like using a compound expression for p. Sometimes it's worth a temporary
variable (here p) or a macro to distill the calculation.
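
To make the comparison concrete, here is a sketch that assumes the node is reached through a hypothetical Forest structure holding an array of root pointers. Forest, relinklong and relinkshort are invented names; only the body of the if comes from the text above.

```c
typedef struct Node Node;
struct Node {
	Node *left;
	Node *right;
};

/* Hypothetical container, invented so that reaching a node
 * takes a compound expression. */
typedef struct Forest Forest;
struct Forest {
	Node *root[8];
};

/* Without a temporary: the compound expression is repeated four times. */
void
relinklong(Forest *f, int i, int goleft)
{
	if(goleft)
		f->root[i]->left = f->root[i]->right->left;
	else
		f->root[i]->right = f->root[i]->left->right;
}

/* With a temporary pointer p standing for f->root[i]. */
void
relinkshort(Forest *f, int i, int goleft)
{
	Node *p;

	p = f->root[i];
	if(goleft)
		p->left = p->right->left;
	else
		p->right = p->left->right;
}
```

Both routines do the same work; the second states once which node is meant and then gets on with it.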
Procedure names
Procedure names should reflect what they do; function names should reflect what
they return. Functions are used in expressions, often in things like if's, so they need
to read appropriately.
if(checksize(x))
is unhelpful because we can't deduce whether checksize returns true on error or non-
error; instead
if(validsize(x))
makes the point clear and makes a future mistake in using the routine less likely.
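
A small sketch of what such a function might look like. The limit Maxsize and the helper describe are assumptions made up for the example; the point is only that the name says what a nonzero return means.

```c
#include <stddef.h>

enum { Maxsize = 4096 };	/* hypothetical limit, chosen only for this sketch */

/* Named for what it returns: nonzero when n is an acceptable size. */
int
validsize(size_t n)
{
	return n > 0 && n <= Maxsize;
}

/* The call site then reads as a sentence. */
const char*
describe(size_t n)
{
	if(validsize(n))
		return "ok";
	return "bad size";
}
```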
Comments
A delicate matter, requiring taste and judgement. I tend to err on the side of
eliminating comments, for several reasons. First, if the code is clear, and uses good
type names and variable names, it should explain itself. Second, comments aren't
checked by the compiler, so there is no guarantee they're right, especially after the
code is modified. A misleading comment can be very confusing. Third, the issue of
typography: comments clutter code.
i=i+1;
Complexity
Most programs are too complicated - that is, more complex than they need to be
to solve their problems efficiently. Why? Mostly it's because of bad design, but I will
skip that issue here because it's a big one. But programs are often complicated at the
microscopic level, and that is something I can address here.
Rule 1. You can't tell where a program is going to spend its time. Bottlenecks
occur in surprising places, so don't try to second guess and put in a speed hack until
you've proven that's where the bottleneck is.
Rule 2. Measure. Don't tune for speed until you've measured, and even then
don't unless one part of the code overwhelms the rest.
Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy
algorithms have big constants. Until you know that n is frequently going to be big,
don't get fancy. (Even if n does get big, use Rule 2 first.) For example, binary trees
are always faster than splay trees for workaday problems.
Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder
to implement. Use simple algorithms as well as simple data structures.
The following data structures are a complete list for almost all practical
programs:
array
linked list
hash table
binary tree
Of course, you must also be prepared to collect these into compound data structures.
For instance, a symbol table might be implemented as a hash table containing linked
lists of arrays of characters.
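
One way such a symbol table might look, as a sketch rather than a definitive implementation. The names Sym, symtab, hash and lookup, the table size and the hash function are all assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>

enum { Nhash = 64 };	/* table size: an arbitrary choice for this sketch */

typedef struct Sym Sym;
struct Sym {
	char *name;	/* array of characters */
	int value;
	Sym *next;	/* linked list of symbols sharing a bucket */
};

static Sym *symtab[Nhash];	/* the hash table */

/* A simple multiplicative string hash; any reasonable hash would do. */
static unsigned
hash(const char *s)
{
	unsigned h;

	h = 0;
	for(; *s != '\0'; s++)
		h = h*31 + (unsigned char)*s;
	return h % Nhash;
}

/* Find name in the table; if absent and create is nonzero,
 * add it with value 0. Returns NULL if not found or out of memory. */
Sym*
lookup(const char *name, int create)
{
	Sym *sp;
	unsigned h;

	h = hash(name);
	for(sp = symtab[h]; sp != NULL; sp = sp->next)
		if(strcmp(sp->name, name) == 0)
			return sp;
	if(!create)
		return NULL;
	sp = malloc(sizeof *sp);
	if(sp == NULL)
		return NULL;
	sp->name = malloc(strlen(name) + 1);
	if(sp->name == NULL){
		free(sp);
		return NULL;
	}
	strcpy(sp->name, name);
	sp->value = 0;
	sp->next = symtab[h];
	symtab[h] = sp;
	return sp;
}
```

Nothing here is fancy: an array of list heads, a list per bucket, a string per entry.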
Rule 5. Data dominates. If you've chosen the right data structures and
organized things well, the algorithms will almost always be self-evident. Data
structures, not algorithms, are central to programming. (See The Mythical Man-
Month: Essays on Software Engineering by F. P. Brooks, page 102.)
Perhaps the most intriguing aspect of this kind of design is that the tables can
sometimes be generated by another program - a parser generator, in the classical
case. As a more earthy example, if an operating system is driven by a set of tables
that connect I/O requests to the appropriate device drivers, the system may be
One of the reasons data-driven programs are not common, at least among
beginners, is the tyranny of Pascal. Pascal, like its creator, believes firmly in the
separation of code and data. It therefore (at least in its original form) has no ability
to create initialized data. This flies in the face of the theories of Turing and von
Neumann, which define the basic principles of the stored-program computer. Code
and data are the same, or at least they can be. How else can you explain how a
compiler works? (Functional languages have a similar problem with I/O.)
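
In C, initialized data is easy to write, which is what makes the data-driven style practical. A minimal sketch, assuming a hypothetical keyword table for a small lexer (Keyword, the token codes and tokenof are invented names): the knowledge sits in the table, and the code only knows how to read it.

```c
#include <string.h>

enum { Tident, Tif, Telse, Twhile };	/* hypothetical token codes */

typedef struct Keyword Keyword;
struct Keyword {
	const char *name;
	int token;
};

/* The knowledge lives in initialized data;
 * adding a keyword means adding a row. */
static const Keyword keywords[] = {
	{ "if",    Tif },
	{ "else",  Telse },
	{ "while", Twhile },
};

/* The code only knows how to read the table.
 * Words not in the table are treated as identifiers. */
int
tokenof(const char *word)
{
	size_t i;

	for(i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
		if(strcmp(keywords[i].name, word) == 0)
			return keywords[i].token;
	return Tident;
}
```

A table like this could just as well be printed by another program, since it is only data.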
Function pointers
Another result of the tyranny of Pascal is that beginners don't use function
pointers. (You can't have function-valued variables in Pascal.) Using function pointers
to encode complexity has some interesting properties.
Some of the complexity is passed to the routine pointed to. The routine must
obey some standard protocol - it's one of a set of routines invoked identically - but
beyond that, what it does is its business alone. The complexity is distributed.
There is this idea of a protocol, in that all functions used similarly must behave
similarly. This makes documentation, testing and growth easy, and even makes it
possible to run the program distributed over a network - the protocol can be encoded
as remote procedure calls.
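
A sketch of the idea, assuming a hypothetical table of devices whose write routines all obey one protocol. The names Writefn, Dev, devtab, nullwrite, smallwrite and devwrite are invented for the example and do not describe any real system.

```c
#include <stddef.h>

/* The protocol: every routine takes a buffer and its length
 * and returns the number of bytes it accepted. */
typedef size_t (*Writefn)(const char *buf, size_t n);

typedef struct Dev Dev;
struct Dev {
	const char *name;
	Writefn write;
};

/* Accepts and discards everything. */
static size_t
nullwrite(const char *buf, size_t n)
{
	(void)buf;
	return n;
}

/* A device that takes at most 4 bytes per call. */
static size_t
smallwrite(const char *buf, size_t n)
{
	(void)buf;
	return n < 4 ? n : 4;
}

static const Dev devtab[] = {
	{ "null",  nullwrite },
	{ "small", smallwrite },
};

/* The caller invokes every device identically; what the routine
 * does beyond the protocol is its business alone.
 * dev is assumed to be a valid index into devtab. */
size_t
devwrite(int dev, const char *buf, size_t n)
{
	return devtab[dev].write(buf, n);
}
```

Adding a device means writing one routine that obeys the protocol and adding one row to the table; devwrite does not change.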
Include files
Simple rule: include files should never include include files. If instead they state
(in comments or implicitly) what files they need to have included first, the problem of
deciding which files to include is pushed to the user (programmer) but in a way that's
easy to handle and that, by construction, avoids multiple inclusions. Multiple
inclusions are a bane of systems programming. It's not rare to have files included
five or more times to compile a single C source file. The Unix /usr/include/sys stuff is
terrible this way.
There's a little dance involving #ifdef's that can prevent a file being read twice,
but it's usually done wrong in practice - the #ifdef's are in the file itself, not the file
that includes it. The result is often thousands of needless lines of code passing
through the lexical analyzer, which is (in good compilers) the most expensive phase.
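
A sketch of the dance done the other way round, with the test in the including file. The header point.h and the macro POINT_H are hypothetical; the header's contents are pasted inline here so the example compiles as a single file.

```c
/*
 * Contents of a hypothetical header, point.h, pasted inline.
 * The header includes nothing itself; it states what it needs
 * and announces that it has been read.
 */
/* point.h: needs no other header included first */
#define POINT_H
typedef struct Point Point;
struct Point {
	int x;
	int y;
};
/* end of point.h */

/*
 * In the including file, the test comes before the #include,
 * so a header that has already been read is never opened again.
 * Here POINT_H is already defined, so the directive is skipped.
 */
#ifndef POINT_H
#include "point.h"
#endif

/* Sum of the absolute values of the coordinates. */
int
manhattan(Point p)
{
	return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}
```

With the guard inside the header instead, the compiler would still have to open the file and scan to the matching #endif each time it is included.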