C Track: Compiling C Programs.: The Different Kinds of Files
It is important to understand that while some computer languages (e.g. Scheme or Basic) are normally
used with an interactive interpreter (where you type in commands that are immediately executed), C
doesn't work that way. C source code files are always compiled into binary code by a program called
a "compiler" and then executed. This is actually a multi-step process which we describe in some
detail here.
Compiling C programs involves four kinds of files:
1. Regular source code files. These files contain function definitions, and have names which end
in ".c" by convention.
2. Header files. These files contain function declarations (also known as function prototypes)
and various preprocessor statements (see below). They are used to allow source code files to
access externally-defined functions. Header files end in ".h" by convention.
3. Object files. These files are produced as the output of the compiler. They consist of function
definitions in binary form, but they are not executable by themselves. Object files end in ".o"
by convention, although on some operating systems (e.g. Windows, MS-DOS), they often end
in ".obj".
4. Binary executables. These are produced as the output of a program called a "linker". The
linker links together a number of object files to produce a binary file which can be directly
executed. Binary executables have no special suffix on Unix operating systems, although they
generally end in ".exe" on Windows.
There are other kinds of files as well, notably libraries (".a" files) and shared libraries (".so" files),
but you won't normally need to deal with them directly.
The preprocessor
Before the C compiler starts compiling a source code file, the file is processed by a preprocessor.
This is in reality a separate program (normally called "cpp", for "C preprocessor"), but it is invoked
automatically by the compiler before compilation proper begins. What the preprocessor does is
convert the source code file you write into another source code file (you can think of it as a
"modified" or "expanded" source code file). That modified file may exist as a real file in the file
system, or it may only be stored in memory for a short time before being sent to the compiler. Either
way, you don't have to worry about it, but you do have to know what the preprocessor commands do.
Preprocessor commands start with the pound sign ("#"). There are several preprocessor commands;
two of the most important are:
1. #define. This is mainly used to define constants. For instance,
#define BIGNUM 1000000
tells the preprocessor to replace every later occurrence of BIGNUM in the file with 1000000, so that
int a = BIGNUM;
becomes
int a = 1000000;
#define is used in this way so as to avoid having to explicitly write out some constant value
in many different places in a source code file. This is important in case you need to change the
constant value later on; it's much less bug-prone to change it once, in the #define, than to
have to change it in multiple places scattered all over the code.
2. #include. This is used to access functions defined outside of a source code file. For
instance:
#include <stdio.h>
causes the preprocessor to paste the contents of <stdio.h> into the source code file at the
location of the #include statement before it gets compiled. #include is almost always used
to include header files, which are files which mainly contain function declarations and
#define statements. In this case, we use #include in order to be able to use functions such as
printf and scanf, whose declarations are located in the file stdio.h. C compilers do not
allow you to use a function unless it has previously been declared or defined in that file;
#include statements are thus the way to re-use previously-written code in your C programs.
There are a number of other preprocessor commands as well, but we will deal with them as we need
them.
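The compiler
After the preprocessor has finished, the compiler proper turns the expanded source code into an
object file. The compiler is invoked as follows:
% gcc -c foo.c
where % is the Unix prompt. This tells the compiler to run the preprocessor on the file foo.c and then
compile it into the object code file foo.o. The -c option means to compile the source code file into
an object file but not to invoke the linker. If your entire program is in one source code file, you can
instead compile and link it in a single step:
% gcc foo.c -o foo
Here the -o option names the executable (foo); without it, the executable is called a.out.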
Note also that the name of the compiler we are using is gcc, which stands for "GNU C compiler" or
"GNU compiler collection" depending on who you listen to. Other C compilers exist; many of them
have the name cc, for "C compiler". On Linux systems cc is usually an alias for gcc.
The linker
Like the preprocessor, the linker is a separate program, called ld. Also like the preprocessor, the
linker is invoked automatically for you when you use the compiler. The normal way of using the
linker is as follows:
% gcc foo.o bar.o baz.o -o myprog
This line tells the compiler to link together three object files (foo.o, bar.o, and baz.o) into a binary
executable file named myprog. Now you have a file called myprog that you can run and which will
hopefully do something cool and/or useful.
This is all you need to know to begin compiling your own C programs. Generally, we also
recommend that you use the -Wall command-line option:
% gcc -Wall -c foo.c
The -Wall option causes the compiler to warn you about legal but dubious code constructs, and will
help you catch a lot of bugs very early. If you want to be even more picky (and who doesn't?), do this:
% gcc -Wall -Wstrict-prototypes -ansi -pedantic -c foo.c
The -Wstrict-prototypes option means that the compiler will warn you if you declare or define a
function without specifying its argument types. The -ansi and -pedantic options together make the
compiler hold your code to the ANSI C standard and warn about any non-portable construct (i.e.
constructs that may be legal in gcc but not in all standard C compilers; such features should
usually be avoided).