Systems Software
Jonathan Misurda
Computer Science Department
University of Pittsburgh
[email protected]
http://www.cs.pitt.edu/~jmisurda
Version 3, revision 0
This text is meant to accompany the course CS 0449 at the University of Pittsburgh. Any
other use, commercial or otherwise, is prohibited without permission of the author. All
rights reserved.
Contents i
List of Figures v
Preface ix
1 Pointers 1
1.1 Basic Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Fundamental Operations . . . . . . . . . . . . . . . . . . 2
1.2 Passing Pointers to Functions . . . . . . . . . . . . . . . . . . . . 4
1.3 Pointers, Arrays, and Strings . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 Pointer Arithmetic . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Terms and Definitions . . . . . . . . . . . . . . . . . . . . . . . . 7
4 Function Pointers 30
4.1 Function Pointer Declarations . . . . . . . . . . . . . . . . . . . . 30
4.2 Function Pointer Use . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2.1 Function Pointers as Parameters . . . . . . . . . . . . . . 32
4.2.2 Call/Jump Tables . . . . . . . . . . . . . . . . . . . . . . 34
4.3 Terms and Definitions . . . . . . . . . . . . . . . . . . . . . . . . 36
Index 122
Colophon 125
List of Figures
2.1 Variables in inner scopes can shadow variables of the same name
in enclosing scopes. . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 A pointer error caused by automatic destruction of variables. . . . 12
2.3 Unlike automatic variables, pointers to static locals can be used as
return values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 The strtok() function can tokenize a string based on a set of
delimiter characters. . . . . . . . . . . . . . . . . . . . . . . . . . 14
10.1 Two threads independently running the same code can lead to a
race condition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
10.2 Synchronizing the execution of two threads using a mutex. . . . . 90
10.3 The producer/consumer problem in pseudocode. . . . . . . . . . 92
At some point late last decade, the core curriculum of many major CS programs,
including the one here at the University of Pittsburgh, switched to teaching Java, C#,
or Python. While there is no mistaking the prevalence or importance of a modern,
Object-Oriented, garbage-collected programming language, there is also plenty of
room for an “old-fashioned” do-it-yourself language like C. It is still the language
of large programs and Operating Systems, and learning it opens the door to doing
work with real-life systems like GNU/Linux.
Armed with a C compiler, we can produce executable programs, but how were
they made? What does the file contain? What actually happens to make the program
run? To be able to answer such questions about a program and its interactions seems
to be fundamental to the issue of defining a system. Biologists have long known the
benefit of studying life in its natural environment, and Computer Scientists should
do no differently. If we follow such a model and study a program’s “life” as we run it
on the computer, we will begin to learn about each of the parts that work together.
Only then can we truly appreciate the abstractions that the Operating System, the
hardware, and high-level programming languages provide to us.
This material picks up where high-level programming languages stop, by looking
at interactions with memory, libraries, system calls, threading, and networking.
This course, however, avoids discussing the implementations of the abstractions the
system provides, leaving those topics for more advanced, specialized courses later
in the curriculum.
I started out writing this text not as a book, but as a study guide for the second
exam in CS 0449: Introduction to Systems Software. I did not have a specific textbook
except for the C programming portion of the course, and felt that the students
could use something to help them tie in the lectures, the course slides, and the
projects. So I sat down one Friday afternoon and, through the next four days, wrote
a 60-page “pamphlet.” I’ve since decided to stick with my effort for the next term,
as part of a four-book curriculum that includes two other freely available texts on
Linux Programming and Linux Device Drivers (links available in the Bibliography
Section).
Over time, I hope to continue to add new topics and refine the material presented
here. I appreciate any feedback that you, the reader, might have. Any accepted
significant improvement is usually worth some extra credit to my students. All
correspondence can be directed to the email address found on the front cover.
Acknowledgments
No book, not even one so quickly written, is produced in a vacuum. I would like
to thank all of my students for their patience while we all came to terms with my
vision for the course. I especially want to thank Nathaniel Buck, Stacey Crotti, Jeff
Kaminski, and Gerald Maloney for taking the time to provide detailed feedback. I’d
also like to thank my parents and friends for contributing in ways from just basic
support to the artistic. This text would not be the same without all of your help.
—Jonathan Misurda
May 1, 2008
1 | Pointers
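Suppose we want a function that exchanges the values of two variables. A first
attempt, sketched here, might be:

void swap(int a, int b) {
    int t = a; /* save one value */
    a = b;     /* these assignments change only the local copies */
    b = t;
}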
Then we can call it from our program with swap(a,b). However, if we initially
set x=3; y=5; and run the above swap function, the values of x and y remain
unchanged. This should be unsurprising to us because we know that when we pass
parameters to a function they are passed “by value” (also known as passing “by
copy”).
This means that a and b contain the same values as x and y at the moment
that the call to swap() occurs because there is an implicit assignment of the form
a=x; b=y; at the call site. From that point on in the function swap(), any changes
to a and b have no effect on the original x and y. Thus, our swap code works
fine inside the scope of swap() but once the scope is destroyed when the function
returns, the changes are not reflected in the calling function.
We then wonder if there is another way to write our swap() so that it succeeds.
Our first inclination might be to attempt to return multiple values. In C, like in Java,
this is not directly possible. A function may only return one thing. However, we
could wrap our values in a structure or array and return the one aggregate object,
but then we have to do the work of extracting the values from the object, which
is just as much work as doing the swap in the first place. We soon realize that we
cannot write a swap function that exchanges the values of its arguments in any
reasonable fashion. If we are programming in Java with the primitive data types,
this is where our attempts must stop. However, in C we have a way to rewrite the
function to allow it to actually work.
In fact, this should not be surprising because we are essentially asking a function
to return multiple pieces of data, and we have already seen one function that can
do that: scanf(). If we pass scanf() a format string like "%d %d" it will set two
argument variables to the integers that the user inputs. In essence, it is doing what
we just said could not be done: It is modifying the values of its parameters. How is
that accomplished? The answer is by using pointers.
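To declare an integer pointer, we write:

int *p;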
where the asterisk indicates that p is a pointer. We need to be careful when declaring
multiple variables on one line because the behavior with regard to pointers is surprising.
If we have the declaration:
[Figure 1.1: The variable x at address 1000 and the pointer p at address 1004,
shown both as a layout of memory and as a points-to diagram.]
int *p, q;
or even:
int* p, q;
we get an integer pointer named p and an integer named q. No matter where you
place the asterisk, it binds to the next variable. To avoid confusion, it is best to
declare every variable on its own separate line.
To set the value of a pointer, we need to be able to get an address from an
existing variable name. In C, we can use the address-of operator, which is the unary
ampersand (&) to take a variable name and return its address:
int x;
int *p;
p = &x;
This code listing declares an integer x that lives someplace in memory and an integer
pointer p that also lives somewhere in memory (since pointers are variables too).
The assignment sets the pointer p to “point to” the variable x by taking its address
and storing it in p.
Figure 1.1 shows two ways of picturing this relationship. On the left, we have a
possible layout of RAM, where x lives at address 1000 and p lives at address 1004.
After the assignment, p contains the value 1000, the address of x. On the right,
much more abstractly, the “points-to” relationship is shown.
Now that we have the pointer p we can use it as another name for x. But in
order to accomplish that, we need to be able to traverse the link of the points-to
relationship. When we follow the arrow and want to talk about the location a pointer
points-to rather than the pointer itself, we are doing a dereference operation. The
dereference operator in C is also the asterisk (*). Notice that although we used the
asterisk in the pointer definition to declare a variable as a pointer, the dereference
operator is different.
When we place the asterisk to the left of a pointer variable or expression that
yields a pointer, we chase the pointer link and are now referring to the location it
points to. That means that the following two statements are equivalent:
x = 4; *p = 4;
Note that it is usually a mistake to assign a pointer variable a value that is not
computed from taking the address of something or from a function that returns a
pointer. This general rule should remind us that p = 4; would not be appropriate
because we do not normally know in advance where memory for objects will be
reserved.
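To make the swap work, we rewrite swap() to accept pointers to the two variables
it should exchange:

void swap(int *a, int *b) {
    int t = *a; /* t gets the value a points to */
    *a = *b;    /* the location a points to gets the value b points to */
    *b = t;     /* the location b points to gets the saved value */
}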
We also need to change the way we invoke it. If we have our same variables, x and y,
we would call the function as swap(&x, &y). After swap() returns, we now find
that x is 5 and y is 3. In other words, the swap worked.
To better understand what happened here, we can trace the code constructing a
picture like before. Figure 1.2 shows the steps. When the swap() function is called,
there are four variables in memory: x and y which contain 3 and 5, respectively, and
a and b which get set to the addresses of x and y. Next, a temporary variable t is
created and initialized to the value of what a points to, i.e., the value of x. We then
[Figure 1.2: A step-by-step trace of swap(&x, &y): x and y begin as 3 and 5; t is
set to 3, the value a points to; x then becomes 5; finally y becomes 3.]
copy the value of what b points to (namely the value of y) into the location that a
points to. Finally, we copy into the location pointed to by b our temporary variable.
Our swap function is an example of one of the two ways that pointers as function
parameters are used in C. In the case of swap(), scanf(), and fread(), among
many others, the parameters that are passed as pointers are actually acting as addi-
tional return values from the functions.
The other reason (which is not mutually exclusive from the previous reason)
that pointers are used as parameters is for time and space efficiency in passing
large objects (such as arrays, structs, or arrays of structs). Since we have already
established that C is pass-by-value, if we pass a large object to a function, that object
would have to be duplicated and that might take a long time. Instead, we can pass a
pointer (which is really just integer-sized), thus taking no noticeable time to copy.
This is why fwrite() takes a pointer parameter even though it does not change the
object in memory.
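Consider the following loop, written here as a sketch of the classic idiom:

void copy(char *dst, char *src) {
    while((*dst++ = *src++))
        ; /* empty body: the condition itself copies and tests each character */
}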
This function is one we have already discussed in the course. It is strcpy(). The
way that it works is that the post-increment allows us to walk one character at a
time through both the source and destination arrays. The dereferences turn the
pointers into character values. The assignment does the copy of a letter, and yields
the right-hand side as the result of the expression. The right-hand side is then
evaluated in a boolean context to determine if the loop stops or continues. Since
every character is non-zero, the loop continues. The loop terminates when the
nul-terminator character is copied, because its value is zero and thus false. The loop
needs no body; the entirety of the work is done as side-effects of the loop condition.
Dereference To follow the link of a pointer to the location it points to. In C, the
dereference operator is *.
2 | Variables: Scope & Lifetime

[Figure 2.1: Static electricity is a built-up charge that is separated from an
opposite charge. When you touch a grounded object, the charges suddenly flow
to cancel out, giving you a shock.]
When the compiler can compute exactly how much space will be needed for variables,
the variables are static data. Dynamic data is that which is allocated while the program
runs, since its size may depend on input or random values.
At first glance, local variables would seem to be static data, since the compiler
can determine exactly how much space is necessary for them. However, while
the compiler may be able to compute the memory needs of a function, there is
not always a way to determine how many times that function may be called if, for
instance, it is recursive. Since each invocation of the function needs its own copy of
the local variables, allocation of their storage must be dealt with at run-time.
Static data can be allocated when the program begins and freed when the pro-
gram exits. Dynamic data, on the other hand, will need special facilities to handle
the fluctuations in the demand for memory while the program runs.
[Figure: Nested scopes — a for(i=0;i<10;i++) loop inside int main() inside the
file file.c.]
When a variable in an inner scope shares its name with a variable in an enclosing
scope, the outer variable is hidden, and the innermost variable is said to be
shadowing the outer one.
Listing 2.1 shows an example of a shadowed variable. The new block redeclares
the shadowed variable, and when this program is run, the value “6” is displayed on
the screen. In general, using shadowed variables is not good practice and can lead
to a great deal of confusion to someone reading the code.
#include <stdio.h>
int main() {
int shadowed;
shadowed = 4;
{
int shadowed;
shadowed = 6;
printf("%d\n", shadowed);
}
return 0;
}
Listing 2.1: Variables in inner scopes can shadow variables of the same
name in enclosing scopes.
Normally this distinction has no bearing on the correctness of a program since the
scope is narrower than the lifetime.
The only problem that could arise with automatic variables comes as an abuse of
pointers. Consider the code of Listing 2.2. Here function f() creates and returns a
pointer to an automatic variable, which function main() captures. However, when
main() goes to use the memory location referenced by the pointer p, that variable is
“dead” and can no longer safely be used. The gcc compiler is kind enough to issue a
warning if we do this. However, the code actually compiles and a program is produced,
proving again that C is not picky about what you do, no matter if you mean it or not.
int *f() {
    int x;
    x = 5;
    return &x; /* x dies when f() returns; the caller receives a dangling pointer */
}
int main() {
int *p;
p = f();
*p = 4;
return 0;
}
Listing 2.2: A pointer error caused by automatic destruction of variables.
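Listing 2.3 below shows the tail of the familiar sample implementation of
asctime(); its opening lines, which declare the lookup tables and, crucially, a
static result buffer, are along these lines:

char *asctime(const struct tm *timeptr) {
    static const char wday_name[7][3] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
    };
    static const char mon_name[12][3] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    static char result[26]; /* static local: survives after the return */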
    sprintf(result,
        "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
        wday_name[timeptr->tm_wday],
        mon_name[timeptr->tm_mon],
        timeptr->tm_mday, timeptr->tm_hour,
        timeptr->tm_min, timeptr->tm_sec,
        1900 + timeptr->tm_year);
    return result;
}
Listing 2.3: Unlike automatic variables, pointers to static locals can be used
as return values.
We saw in Listing 2.2 that returning a pointer to an automatic variable was
something the compiler warned was a bad idea. The other combination is a variable with a local
scope but a global lifetime. This combination would imply that the variable could
be used only within the block it was declared in but would retain its value between
function invocations. Such a variable type might be useful as a way to eliminate
the need for a global variable to fulfill this role. Anytime a global variable can be
eliminated is usually a good thing.
By declaring a local variable static, the variable now will keep its value between
function invocations, unlike an automatic variable. Interestingly, an implication of
this is that static local variables can safely have pointers to them as return values.
Listing 2.3 shows the man page for the asctime() function, which builds the string
in a static local variable. The advantage to this is that the function can handle the
allocation but does not require the caller to use free() as if malloc() had been
used.
#include <string.h>
#include <stdio.h>
int main() {
    char str[] = "The quick brown";
    char *tok;
    tok = strtok(str, " "); /* first call: pass the string to tokenize */
    while(tok != NULL) {
        printf("token: %s\n", tok);
        tok = strtok(NULL, " "); /* subsequent calls: pass NULL */
    }
    return 0;
}
Listing 2.4: The strtok() function can tokenize a string based on a set of
delimiter characters.
The quintessential example of static local variables is the standard library func-
tion strtok(), whose behavior is somewhat atypical. Listing 2.4 shows an example
of splitting a string based on the space character. The first time strtok() is called,
the string to tokenize and the list of delimiters are passed. However, the second
and subsequent times the function is invoked, the string parameter should be NULL.
Additionally, strtok() is destructive to the string that is being tokenized. In order
to understand this, imagine that the following string is passed to strtok() with
space as the delimiter:
t h e q u i c k b r o w n \0
On the first call to strtok(), the return value should point to a string that contains
the word “the.” On the second call, where NULL is passed, the return value should
point to “quick.” If we now examine the original string in a debugger, we would see
the following:
t h e \0 q u i c k \0 b r o w n \0
The delimiter characters have all been replaced by the null terminator! That
explains why we cannot pass the original string on the second call, since even
strtok() will stop processing when it encounters the null, thinking that the string
is over. So now we have “lost” the remainder of the string. The strtok() function,
however, remembers it for us in a static local variable. Passing NULL tells the function
to use the saved pointer from the last call and to pick up tokenizing the string from
the point it left off last.
In most cases, the compiler will decide when to put a value into a register and when
to keep it in memory. In certain cases, the memory location that data is stored
in is special. It could be a shared variable that multiple threads or processes (see
Chapter 9) are using. It could be a special memory location that the Operating
System maps to a piece of hardware. In these cases, the value of the variable in
the register may not match the value that is in memory. Declaring a variable as
volatile tells the C compiler to always reload the variable’s value from memory
before using it. In a way, this could be thought of as the opposite of the register
keyword.
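As a sketch of where this matters, consider a flag set asynchronously by a signal
handler (the handler and flag names here are illustrative):

#include <signal.h>
#include <stdio.h>

volatile sig_atomic_t done = 0; /* volatile: re-read from memory on every use */

void handler(int sig) {
    done = 1; /* set outside the normal flow of control */
}

int main() {
    signal(SIGINT, handler);
    while(!done)
        ; /* without volatile, done could be cached in a register forever */
    printf("interrupted\n");
    return 0;
}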
The volatile keyword is somewhat rare to see in normal desktop applications,
but it is useful in systems software when interacting with hardware devices. Though
you may not encounter it much, it is nonetheless important to remember for that
one time you might need it (or more if you develop low-level programs).
               Scope                              Lifetime
Automatic      The block it is defined in         The life of the function
Global         The entire file plus any files     The life of the program
               that import it using extern
Static Global  The entire file, but may not       The life of the program
               be imported into any other
               file using extern
Static Local   The block it is defined in         The life of the program
Lifetime The time from which a particular memory location is allocated until it is
deallocated.
Shadowing A variable in an inner scope with the same name as one in an enclosing
scope. The innermost declaration is the variable that is used; it shadows the
outermost declaration.
3 | Compiling & Linking: From Code to Executable

[Figure: The gcc pipeline — C source (.c) is preprocessed by cpp, compiled by
cc1 into object (.o) files, and linked by ld into an executable.]
If < and > are used, the preprocessor looks on the include path where the standard
header files for the system are found. A header file contains function and data type
definitions that are found outside of a particular file. If double quotes are used instead, the
local directory is searched for the named file to include.
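For example (myheader.h stands in for any header of our own):

#include <stdio.h>    /* searched for on the system include path */
#include "myheader.h" /* searched for in the local directory */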
The directive #define creates a macro. A macro is a simple or parameterized
symbol that is expanded to some other text. One common use of a macro is to
define a constant. For example, we might define the value of π in the following way:
#define PI 3.1415926535
We can actually parameterize our macros to allow for more generic substitutions.
For instance, if we frequently wanted to know which of two numbers is larger, we
could create a macro called MAX that does this for us:
#define MAX(a,b) (((a) > (b)) ? (a) : (b))
Notice that we do not need to put any type names in our definition. This is because
the preprocessor has no understanding of code or types; it is just following simple
substitution rules: Whatever is given as a and b will be inserted into the code. For
instance:
c = MAX(3,4);
will become:
c = (((3) > (4)) ? (3) : (4));
However,
c = MAX("bob","fred");
will become:
c = ((("bob") > ("fred")) ? ("bob") : ("fred"));
which is not legal C syntax. The preprocessor will do anything you tell it to, but
there is no guarantee that what it produces is appropriate C.
2 Visual Studio under Windows produces object files with the extension .obj
2. Libraries
The job of the linker is to assemble the code from these three places and create the
final program file.
The source code is, of course, the code the programmer has written. The li-
braries are collections of code that accomplish common tasks provided by a com-
piler writer, system designer, or other third party.3 C programs nearly always refer to
code provided by the C Standard Library. The C Standard Library contains helpful
functions to deal with input and output, files, strings, math, and other common
programming tasks. For instance, we have made much use of the function printf()
in our programs. To gain access to this function in our code, two independent steps
must be done.
The first step is to inform the compiler that there is a function named printf()
that takes a format string followed by a variable number of arguments. The compiler
needs to know this information to do type checking (to ensure the return value and
parameters match their specifications) and to generate appropriate code (which
will be elaborated upon in Chapter 6). This information is provided by a function
prototype declaration that resides in <stdio.h>.
The second step is to tell the linker how to find this code. The simplest way
to assemble all three sources of code into a program is to literally put it all into
one big file. This is referred to as static linking. It is not the only option, however.
Remember that static is a term that is often used to describe the time a program
is compiled. Its opposite is dynamic — while the program is running. It comes
then as no surprise that we also have the option to dynamically link libraries, so
that the code is not present in the executable program but is inserted later while the
program loads (link loading) or executes (dynamic loading).
Each of these techniques has the same goal: put the code necessary for our pro-
gram to run into RAM. Most common computer architectures follow the von Neu-
mann Architecture where both code and data must be loaded into a main memory
3 You can always write your own libraries as well!
[Figure 3.2 shows library archives such as /usr/lib/libc.a and /usr/lib/libm.a
being combined with the program's object (.o) files by the linker, ld, to produce
the executable.]
Figure 3.2: In static linking, the linker inserts the code from library archives
directly into the executable file.
before instructions can be fetched and executed. Static linking puts the code into
the executable so that when the Operating System loads the program, the code is
trivially there. Dynamic linking defers the loading of library code until runtime.
We will now discuss the issues and trade-offs for each of these three mechanisms.
Static Linking
Static linking occurs during compilation time. Figure 3.2 gives an overview of
the linker’s function during static linking. When the linker is invoked as part of
the compilation process, code is taken from the libraries and inserted into the
executable file directly. The code for the libraries in Unix/Linux comes from archive
(.a) files. The linker reads in these files and then copies the appropriate code into
the executable file, updating the targets of the appropriate call instructions.
The advantages of static linking are primarily simplicity and speed. Because all
targets of calls are known at compile time, efficient machine code can be generated
and executed. Additionally, understanding and debugging the resultant program
is straightforward. A statically-linked program contains all of the code that is
necessary for it to run. There are no dependencies on external files and installation
can be as simple as copying the executable to a new machine.
There are two major disadvantages to static linking, however. The first is an issue
of storage. If you examine a typical Unix/Linux machine, you will find hundreds
of programs that all make use of the printf() function. If every one of these
programs had the code for printf() statically linked into its executable, we would
have megabytes of disk space being used to store just the code for printf().
The second major disadvantage is exposed by examining such programs under
[Figure 3.3 shows the link loader, ld, resolving an executable's calls against
shared objects such as /usr/lib/libc.so and /usr/lib/libm.so as the process
image — text (code), global data, and heap — is assembled.]
Figure 3.3: Dynamic linking resolves calls to library functions at load time.
the light of Software Engineering, where modularity — the ability to break a program
up into simple, reusable pieces — is emphasized. Imagine that a bug in printf()
is subsequently discovered on our system that is entirely statically linked. To fix
our bug, we will have to find every program that uses printf() and then recompile
them to include the fixed library. For programs whose source code we do not have,
we would have to wait until the vendor releases an update, if ever.
Dynamic Linking
A better approach for code that will be shared between multiple programs is to use
dynamic linking. Figure 3.3 shows the process. Notice that the executable file has
already been produced, and that we are about to load and execute the program. In
dynamic linking, the linker is invoked twice: once at compile time and once every
time the program is loaded into memory. For this reason, the linker is sometimes
referred to as a link loader.
When the linker is invoked as part of the compiler (ld as a part of gcc, for
instance) the linker knows that the program will eventually be loaded, and any
library calls will be resolved. The linker then inserts some extra information into
the executable file to help the linker at load time. When the linker runs again, it
takes this information, makes sure the dynamically linked library is loaded into
memory, and updates all the appropriate addresses to point to the proper locations.
On Unix/Linux, dynamically linked libraries are called shared objects and
have .so as their extension. Windows calls them Dynamically Linked Libraries
(appropriately) with the extension .DLL.
Dynamic linking allows for a system to have a single copy of a library stored on
disk and for the library code to be copied into the process’s address space at load time.
4 The developer can still fix bugs in either the main application or the library without having to
distribute both files. However, this is somewhat minor compared to the advantage of having just one
copy of a library on a system.
[Figure 3.4 depicts DLL 1 and DLL 2 mapped into a process's address space
alongside the stack.]
Figure 3.4: Dynamically loaded libraries are loaded programmatically and on-
demand.
Dynamic Loading
Dynamic loading is a subset of dynamic linking where the linking occurs completely
on demand, usually by a program’s explicit request to the Operating System to load
a library. Figure 3.4 shows an example of a Windows program making a request to
load two DLLs programmatically. In Windows, a programmer can make a call to
LoadLibrary() to ask for a particular library to be loaded by passing the name as
a string parameter. Under Unix/Linux there is an analogous call, dlopen().
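A minimal sketch of dynamic loading under Linux follows; the library and symbol
names are illustrative, and on many systems the versioned name libm.so.6 is
required (link with -ldl on older glibc):

#include <dlfcn.h>
#include <stdio.h>

int main() {
    double (*cosine)(double);

    /* ask the Operating System to load the math library at runtime */
    void *lib = dlopen("libm.so", RTLD_LAZY);
    if(lib == NULL) {
        fprintf(stderr, "load failed: %s\n", dlerror());
        return 1; /* the program must handle a missing library gracefully */
    }

    /* look up the address of a symbol inside the loaded library */
    cosine = (double (*)(double)) dlsym(lib, "cos");
    if(cosine != NULL)
        printf("%f\n", cosine(0.0));

    dlclose(lib);
    return 0;
}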
One challenge for the Operating System in supporting dynamic loading is where
to place the newly loaded code in memory. To handle this and traditional load-time
linking, a portion of a process’s address space (see Chapter 5) will typically be
reserved for libraries.
Care has to be taken by the programmer to handle the error condition that arises
if the load fails. It is possible that a library has been deleted or not installed, and the
code must be robust to not simply crash. This is one significant issue with dynamic
loading, since with compile- or load-time linking the program will not compile
or run without all of its dependencies available. A user will not be happy if they
lose their work because in the middle of doing something the program terminates,
saying that a necessary library was not found.
5 Recent versions of Windows protect core DLLs from being overwritten by older versions, but this
does not solve 100% of all problem cases.
For this reason, core functionality of the program is probably best included
using static or dynamic linking. Dynamic loading is often used for loading plugins
such as support for additional audio or video formats in a media player or special
effect plugins for an image editor. If the plugins are not present, the user may be
inconvenienced, but they will likely still be able to use the program to do basic tasks.
a.out (Assembler OUTput) — The oldest Unix format, but did not have adequate
support for dynamic linking and thus is no longer commonly used.
COFF (Common Object File Format) — An older Unix format that is also no
longer used, but forms the basis for some other executable file formats used
today.
Mach-O — The Mac OS X format, based on the Mach research kernel developed at
CMU in the mid-1980s.
Though a.out is not used any longer,6 it serves as a good example of what types
of things an executable file might need to contain. Below are the seven sections
of an a.out executable:
1. exec header
2. text segment
3. data segment
4. text relocations
5. data relocations
6. symbol table
7. string table
The exec header contains the size information of each of the other sections as a
fixed-size chunk. If we were to define a structure in C to hold this header, it would
look like Listing 3.1.
struct exec {
    unsigned long a_midmag; //magic number
    unsigned long a_text;   //size of the text segment
    unsigned long a_data;   //size of the initialized data segment
    unsigned long a_bss;    //size of the zero-initialized (bss) data
    unsigned long a_syms;   //size of the symbol table
    unsigned long a_entry;  //entry point address
    unsigned long a_trsize; //size of the text relocations
    unsigned long a_drsize; //size of the data relocations
};
Listing 3.1: The a.out header section.
The magic number is an identifying sequence of bytes that indicates the filetype.
We saw something similar with ID3 tags, as they all began with the string “TAG”.
Word documents begin with “DOC”. The loader knows to interpret this file as an
a.out format executable by the value of this magic number.
The text segment contains the program’s instructions and the data segment
contains initialized static data such as the global variables. The header also contains
6 When using gcc without the -o option, you will notice it produces a file named a.out, but this file,
somewhat confusingly, is actually in ELF format.
the size of the BSS section which tells the loader to reserve a portion of the address
space for static data initialized to zero.7 Since this data is initialized to zero, it does
not take up space in the executable file, and thus only appears as a value in the
header. The two relocation sections allow for the linker to update where external or
relocatable symbols are defined (i.e., what addresses they live at).
The symbol table contains information about internal and external functions
and data, including their names. Since the linker may need to look things up in
this table, we want random access of the symbol table to be quick. The quickest
way to look something up is to do so by index, as with an array. For this to work,
however, we need each record to be a fixed size. Since strings can be variable length,
we want some way to have fixed-sized records that contain variable-sized data. To
accomplish this, we split the table into two parts. The symbol table with fixed-sized
entries, and a string table that contains only the strings. Each record in the symbol
table will contain a pointer to the appropriate string in the string table.
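A sketch of the idea in C (the field names are illustrative, not the actual a.out
record layout):

struct symbol {
    unsigned long name_offset; /* fixed-size field: offset of the name in the string table */
    unsigned long value;       /* e.g., the address the symbol refers to */
};

/* random access by index, then one hop into the string table */
const char *symbol_name(struct symbol *table, char *strings, int i) {
    return strings + table[i].name_offset;
}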
The string table will also contain any string literals that appear in the program’s
source code. This is another example of deduplication. In Listing 3.2, we see a
common beginner’s mistake in Java. The == operator tests for equality, but when
applied to objects, the equality it tests for is that the two objects live at the same
address. Obviously, we wanted to use the .equals method.
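A minimal version of the mistake looks like this (a sketch; the class name is
illustrative and the string literal is the one discussed below):

public class Compare {
    public static void main(String[] args) {
        String s = "String literal";
        if (s == "String literal") // compares references, not contents
            System.out.println("Equal");
        else
            System.out.println("Not equal");
    }
}

But if we compile and run this, what would we see? The output is: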
Equal
7 A reputable link on the meaning of BSS states that it is “Block Started by Symbol.” See
http://www.faqs.org/faqs/unix-faq/faq/part1/section-3.html.
Why did we get the right answer? Java class files are executable files too. And to
save both storage space and network bandwidth when transferred, Java deduplicates
string literals during compilation. When the class file is loaded into memory, the
string literal is only loaded once. Thus the references compare as equal since both
are pointing to the same object in memory. One small change and it will break. If
we change the initialization of s to:
String s = new String("String literal");
We are now constructing a new object in memory that will live in a different memory
location and our comparison will fail.
This is not specific to Java. In C, we have similar concerns. If we declare a string
variable as:
char *s = "String literal";
we get a variable s that points to the string literal. It is unsafe (and generally a
compiler warning) to modify the string literal by doing something like s[0] = 's';.
This is prevented in Java because String objects are immutable.
Executable File or simply “Executable” — A file containing the code and data
necessary to run a program.
Header File A file containing function and data type definitions that refer to code
outside of a given file.
Linker A tool for combining multiple sources of code and data into a single exe-
cutable or program.
Loader The portion of the Operating System that is responsible for transferring
code and data into memory.
Object File A file containing machine code, but calls to functions that are outside
of the particular source file it was generated from are unresolved.
Both code and data live in memory on a computer. We have seen how it is
possible to refer to a piece of data both by name and by its address. We called a
variable containing such an address a pointer. But in addition to being able to point
to data, we can also create pointers to functions. A function pointer is a pointer
that points to the address of loaded code. An example of function pointers is given
in Listing 4.1.
This example simply displays “3” upon the console. The interesting thing to note
is that there is no direct call to f() (there is no f(…) in the code), but there is a call to
g() which seems to have no body. However, we do use f() as the right-hand side of
an assignment to g. The odd declaration of g makes it look like a function prototype.
However, with experience, it becomes apparent that it is a function pointer because
of the fairly unique syntax of having the asterisk inside of the parentheses. The
function pointer g is now pointing to the location where the function f() lives in
memory. We can now call f() directly as we always could by saying f(3), or we
can dereference the pointer g.
Remember that with arrays, the name of an array is a pointer to the beginning
of that array. There is no need to use the dereference operator (*) because the
subscript notation ([ and ]) does the dereference automatically. The same is true for
functions. The name of the function is a pointer to that function. We do not need to
dereference it with a * because the ( and ) do it automatically. Thus as the argument
to printf(), g(3) dereferences the pointer g to obtain the actual function f() and
calls it with the parameter 3.
#include <stdio.h>
int f(int x) {
return x;
}
int main() {
int (*g)(int x);
g = f;
printf("%d\n",g(3));
return 0;
}
Listing 4.1: Function pointer example.
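The parentheses are what make such a declaration a pointer. Writing:

int (*f)();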
means we have a function pointer that can point to any function that has an empty
parameter list and returns an integer. The difference is in the parenthesization.
int *(*f)();
means we have a function pointer that can point to any function that has an empty
parameter list and returns a pointer to an integer. If we forget the parentheses:
int **f();

we have instead declared an ordinary function f that returns a pointer to a pointer
to an integer.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NUM_NAMES 5
#define MAX_LENGTH 10
int main() {
    char names[NUM_NAMES][MAX_LENGTH] = {
        "Mary","Bob", "Fred",
        "John", "Carl"};
    int i;
    /* sort the array of fixed-size strings, passing strcmp()
       directly as the comparator */
    qsort(names, NUM_NAMES, MAX_LENGTH, strcmp);
    for(i=0;i<NUM_NAMES; i++) {
        printf("%d: %s\t", i, names[i]);
    }
return 0;
}
Listing 4.3: qsort()ing strings with strcmp().
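For reference, qsort() is declared in <stdlib.h> as follows (the parameter names
vary between systems):

void qsort(void *base, size_t nmemb, size_t size,
           int (*comparator)(const void *, const void *));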
The final parameter looks complex but, based on our earlier discussion, it should
be evident that this is a function pointer (the * inside the parentheses gives it away).
The function we pass to qsort() should return an integer, and take two parameters
that will point to two items in our array which this function is supposed to compare.
The return value of the comparator needs to follow this rule:
    < 0   if the first parameter is less than the second
      0   if they are equal
    > 0   if the first parameter is greater than the second
This rule might remind us of the return values for strcmp(). In fact, strcmp()
makes an obvious choice for sorting an array of strings. The only requirement is
that all the strings need to be a fixed size for this to work. Since sorting requires
swapping elements, qsort() must be told as a parameter the size of the elements
in the array in order to rearrange the array elements correctly. Listing 4.3 gives an
example. When we run this code, we get the expected output indicating a successful
sort:
0: Bob  1: Carl  2: Fred  3: John  4: Mary
One interesting thing to note is that this code compiles with a warning:
(11) thot $ gcc qs.c
qs.c: In function `main':
qs.c:15: warning: passing arg 4 of `qsort' from incompatible pointer type
This message is telling us that there is something wrong about passing strcmp() to
qsort(). If we look at the formal declaration of strcmp():
int strcmp(const char *str1, const char *str2);
we see the parameters are declared as char * rather than the void * that the
function pointer was declared as in Listing 4.2. This is one warning that is all right
to ignore. In the old days of C, there was no special void * type, and a char * was
used to point to any type when necessary.
To do something more complicated like sorting an array of structures, we will
need to write our own comparator function. Listing 4.4 shows an example. The
program sorts the structures first by age and then by name if there is a “tie.” The
output is the following:
18: Bob 18: Fred 20: John 20: Mary 21: Carl
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NUM_STUDENTS 5
struct student {
int age;
char *name;
};
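/* a comparator that sorts by age and breaks ties by name,
   as the text describes */
int mycmp(const void *a, const void *b) {
    const struct student *x = a;
    const struct student *y = b;
    if(x->age != y->age)
        return x->age - y->age;      /* youngest first */
    return strcmp(x->name, y->name); /* tie: alphabetical by name */
}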
int main() {
struct student s[NUM_STUDENTS] = {
{20, "Mary"}, {18, "Bob"}, {18, "Fred"},
{20, "John"}, {21, "Carl"}};
int i;
qsort(s, NUM_STUDENTS ,
sizeof(struct student), mycmp);
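    /* print the sorted array and finish (reconstructed ending) */
    for(i = 0; i < NUM_STUDENTS; i++)
        printf("%d: %s\t", s[i].age, s[i].name);
    return 0;
}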
5 | Processes & Address Spaces

After the loader has done its job, the program is now occupying space in mem-
ory. The program in memory is referred to as a process. A process is the Operating
System’s way of managing a running program and keeping it isolated from other
processes on the system. Associated with each process is a range of legal addresses
that the program may reference. These addresses form what is known as an ad-
dress space. To make sure that a programmer does not interfere with any other
running process (either by accident or maliciously), the Operating System needs
to ensure that one program may not change the code or data of another without
explicit permission.
One way to solve this problem of protection is to have the computer enforce
strict limits on the range that an address in a process may take. At each instruction
that references a memory address, that address is checked against this legal range
to ensure that the instruction is only affecting the code or data in that process.
However, this incurs a performance penalty since the CPU has to do extra checking
every time there is a memory operation.
Modern Operating Systems take a different approach. Through a technique
referred to as Virtual Memory, a process is given the illusion that it is the only one
running on the computer. This means that its address space can be all of the memory
the process can reference in the native machine word size. On a 32-bit machine
with 32-bit pointers, a process can pretend to have all 2^32 bytes = 4 gigabytes to
itself. Of course, even the most expensive high-end computers do not have 4 GB of
physical RAM per process, so the Operating System needs to work some magic
to make this happen. It makes use of the hard disk, keeping unneeded portions of
your program there until they need to be reloaded into physical memory.
Figure 5.1 shows a diagram of what a typical process’s address space contains.
The loader has placed all of the code and global variables into the low addresses.
But this does not take up all of the space. We also need some dynamic storage
[Figure 5.1: The address space from low addresses up to 0x7fffffff — Text (Code),
then Data (Globals), then the heap, with the stack at the top.]
Figure 5.1: A process has code and data in its address space.
for function invocations and dynamically generated data. Functions will keep
their associated storage in a dynamically managed section called the stack. Other
dynamic data, which needs to have a lifetime beyond that of a single function call,
will be placed in a structure called a heap.1
Notice that the stack and the heap grow toward each other. In a fixed-sized
region, the best way to accommodate two variable sized regions is to have them
expand toward each other from the ends. This makes sense for managing program
memory as well, since programs that use a large amount of heap space likely will
not use much of the stack, and vice versa. Chapter 6 and Chapter 7 will discuss the
stack and the heap at more length.
5.1 Pages
Memory management by the Operating System is done at a chunk size known as a
page. A page’s size depends on the particulars of a system, but a common size is
4 kilobytes. The Operating System tracks which chunks you are using and which
you have not allocated. It is clear that smaller programs will have large
chunks of unallocated space in the region between the stack and the heap. These
unallocated pages do not need to take up physical memory.
1 Unfortunately, the term “heap” has two unrelated meanings in Computer Science. In this context,
we mean the portion of the address space managed for dynamic data. It can also refer to a particular
data structure that maintains sorted order between inserts and deletes, and is commonly used to
implement a priority queue.
Since the Operating System manages the pages of the system, it can do some
tricks to make dynamic libraries even more convenient. We already motivated
dynamic libraries by saying they can save disk storage space by keeping common
code in only one place. However, from our picture of process loading, it would
appear that every shared library is copied into every process’s address space, wasting
RAM by having multiple copies of code loaded into memory. Since physical memory
is an even more precious resource than disk storage, this seems to be less of a benefit
than we initially thought. The good news is that the Operating System can share the
pages in which the libraries are loaded with every process that needs to access them.
This way, only one copy of the library’s code will be loaded in physical memory, but
every program can use the code.
Address Space The region of memory associated with a process for its code and
data.
Page The unit of allocation and management at the Operating System level.
Virtual Memory The mechanism by which a process appears to have the memory
of the computer to itself.
6 | Stacks, Calling Conventions, &
Activation Records
The first type of dynamic data we will deal with is local variables. Local variables
are associated with function invocations, and since the compiler does not necessarily
know how many times a function might be called in a program, there is no way to
predict statically how much memory a program needs. As such, a portion of memory
needs to be dedicated to holding the local variables and other data associated with
function calls. Data in this region should ideally be created on a function’s call and
destroyed at a function’s return. If we look at the effect of having a function call
other functions, we see that the local variables of the most recent function call are
the ones that are most important. In other words, the local variables created last
are used, and are the first ones to be destroyed; those created earlier can be ignored.
This behavior is reminiscent of a stack.
To create a stack we need some dedicated storage space and a means to indicate
where the top of the stack lives. There is a large amount of unused memory in the
address space after loading the code and global data. That unused space can be used
for the stack. Since practically every program will be written with functions and
local variables, the architecture will usually have a register, called the stack pointer,
dedicated to storing the top of the stack.
With storage and a stack pointer, we can make great strides in managing the
dynamic memory needs of functions. When a function is compiled, the compiler
can figure out how many bytes are needed to store all of the local variables in that
function and then write an instruction that adjusts the stack pointer by that much
on every function call. When we want to deallocate that memory on function return,
we could adjust the stack pointer in the opposite direction.
Other than local variables, what information might we want to store on the
stack? Since the concept of a stack is so intrinsically linked with function calls, it
Figure 6.1: The shared boundary between two functions is a logical place to set
up parameters.
seems to make sense that the return address of the function should be stored on
the stack as well. If we examine the caller/callee relationship, we see that their data
will be in adjacent locations on the stack. Figure 6.1 shows the two stack entries
and the boundary between them. If the caller function needs to set up arguments
for the callee, this boundary seems a natural place to pass them.
On machines with many registers, some registers may be designated for tem-
porary values in the computation of complicated expressions. These temporary
registers may be free for any function to use, and thus, if a function wants a particu-
lar register to be saved across an intervening call to another function, the calling
function must save it on the stack. This is referred to as a caller-saved register. Other
registers may be counted upon to retain their values across intervening function
calls, and if a called function wants to use them, it is responsible for saving them on
the stack and restoring them before the function returns. These are callee-saved
registers. In general, the stack is used to save any architectural state that needs to be
preserved.
We now have the following pieces of data that need to be on the stack:
1. The local variables — including temporary storage, such as for saving registers
2. The return address
3. The parameters
All of these together will form a function’s activation record (sometimes called a
frame). Not every system will have all of these as part of an activation record. How
an activation record is laid out is dictated by the architecture's calling convention.
Two common conventions compare as follows:

                        MIPS                      x86
Arguments               First 4 in %a0–%a3,       Generally all on stack
                        remainder on stack
Return values           %v0–%v1                   %eax
Caller-saved registers  %t0–%t9                   %eax, %ecx, & %edx
Callee-saved registers  %s0–%s9                   Usually none
On x86, the call instruction pushes the return address onto the top of the stack and then
jumps to the subroutine. Below is a diagram of the current state of the stack (the
dots indicate extra space left unused due to alignment).
printArray(3, 4, "abc");
displays its parameters on the screen, one per line.
#include <stdio.h>
#include <stdarg.h>
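#include <stdlib.h>

/* One plausible definition of makearray(): collect int arguments,
   stopping once the -1 sentinel has been stored. */
int *makearray(int first, ...) {
    va_list ap;
    int *a = malloc(5 * sizeof(int)); /* enough for this example's call */
    int i = 0;
    int n = first;
    va_start(ap, first);
    while(1) {
        a[i++] = n;
        if(n == -1)
            break; /* the sentinel is stored and ends the list */
        n = va_arg(ap, int);
    }
    va_end(ap);
    return a;
}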
int main() {
int *p;
int i;
p = makearray(1,2,3,4,-1);
for(i=0;i<5;i++)
printf("%d\n", p[i]);
return 0;
}
Listing 6.1: A variadic function to turn its arguments into an array.
If the program had been a system service, running as a superuser on the machine,
the attacker could have injected code to make themselves a superuser too. This
type of attack is known as a buffer-overrun vulnerability.
Great care must always be taken with arrays on the stack. Overrunning the end
of the buffer means writing over activation records, and possibly subjecting the
system to an attack. Always check that the destination of a copy is large enough to
hold the source!
7 | Dynamic Memory Allocation Management

While the stack is useful for maintaining data necessary to support function calls,
programs also may want to perform dynamic data allocation. Dynamic allocation is
necessary for data that has an unknown size at compile-time, an unknown number
at compile-time, and/or whose lifetime must extend beyond that of the function
that creates it. The remaining portion of our address space is devoted to the storage
of this type of dynamic data, in a region called the heap.
As is often the case, there are many ways to track and manage the allocation of
memory. There are trade-offs between ease of allocation and deallocation, whether
it is done manually or it is automatic, and the speed and memory efficiency need
to be considered. Also as usual, the answer to which approach is best depends on
many factors.
This chapter starts by describing the major approaches to allocation and deal-
location. We first describe the two major ways to track memory allocation. The
first is a bitmap — an array of bits, one per allocated chunk of memory — that
indicates whether or not the corresponding chunk has been allocated. The second
management data structure is a linked list that stores contiguous regions of free
or allocated memory. A third technique, the Buddy Allocator, attempts to reduce
wasted space from many allocations. The chapter also describes an example imple-
mentation of malloc(), the C Standard Library mechanism for dynamic memory
allocation.
7.1 Allocation
The two operations we will be concerned with are allocation, the request for memory
of a particular size, and deallocation, returning allocated memory back to the system
for subsequent allocations to use. The use of the stack for function calls led us
to create activation records upon function invocation and to remove them from
the stack on function return. In essence, we were allocating and deallocating the
activation records at runtime — the very operations we are attempting to define for
the heap.
The question is then, why is the stack insufficient and what is different about
the heap? As the name implies, the stack is managed as a LIFO (last in, first out) with allocation
corresponding to a push operation and deallocation corresponding to pop. This
worked for function calls because the most recently called function, the one whose
activation record was allocated most recently and lives at the top of the stack, is
the one that returns first. Deallocations always occur in the opposite order from
the allocations. New allocations always occur at the top of the stack, and with the
stack growing from higher addresses to lower ones by convention, this means that
all space above the stack pointer is in-use. All space at lower addresses is free or not
part of the stack.
Thus, allocation is simply moving the top of the stack, and deallocation is moving
it back. But for objects whose lifetime is not limited to the activation of a particular
function, this order requirement is too restrictive. We would like to be able to
allocate objects A, B, and C, and then deallocate object B. This leaves an unallocated
region in the middle that we may wish to reuse to allocate object D.
In this section, we are considering this more general case of allocation: the
possibility that we have free space in between allocated spaces. We need to track
that space and to allocate from it. With that in mind, the simple dividing line
between free and used space that the stack pointer represented is insufficient and
we need to use a more flexible scheme.
[Figure 7.1 shows memory addresses 0 through 32 holding four allocated regions,
A through D, with a bitmap recording which chunks are in use.]
Figure 7.1: A bitmap can store whether a chunk of memory is allocated or free.
[Figure 7.2 shows the same regions A through D tracked instead by a linked list
whose nodes record each run's start, length, and whether it is allocated — e.g.,
A at 0 for 6, free at 6 for 4, B at 10 for 3, free at 13 for 4, C at 17 for 9,
D at 26 for 3, free at 29 for 3.]
Figure 7.2: A linked list can store allocated and unallocated regions.
Searching the bitmap for a run of free chunks large enough to satisfy a request
may be slow. Practically, it would involve a lot of bit shifting and masking. As
such, using bitmaps for any significant tracking of dynamic memory allocations is
unlikely, but bitmaps do often find a use in tracking disk space allocation.
While a chunk in the bitmap only required a single bit’s worth of storage, how large might a list
node be? We need to store the size, the start, and links for the linked list. For faster
deallocation support, we probably want this to be a doubly-linked list, requiring us
to have two node pointers. Assuming all of these fields are four bytes in size, we
would need 4 × 4 = 16 bytes or 16 × 8 = 128 bits. Thus, to reserve space for the
worst case scenario, we would need 128n bits where the bitmap only needed n bits.
The linked list is 128 times the size!
This is horrible and we may wonder how we began by trying to reduce the size
of a data structure but ended up making it 128 times worse. The answer is in the
worst case scenarios. They were the worst case because they eliminated the runs
that were the bases of our compression. When our assumptions are not valid, our
end result is likely to come out worse. The good news is that our assumptions are
valid in the typical case. Such degenerate linked lists are not likely to result from
the normal use of dynamic memory.
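In C, a node of the kind described above might look like the following sketch
(field names are illustrative); with four 4-byte fields on a 32-bit system, it
occupies the 16 bytes used in the arithmetic above:

#include <stddef.h>

struct free_node {
    size_t start;           /* where this run of memory begins */
    size_t size;            /* how many bytes the run covers */
    struct free_node *prev; /* doubly linked so freed neighbors coalesce quickly */
    struct free_node *next;
};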
While we have convinced ourselves the linked list is still a valid approach, we still
need a good solution for where to store the elements of the linked list. Reserving
the space in advance is not feasible. A better solution might be to think of the
memory-tracking data structure as a “tax” on the region of memory we are tracking.
For a bitmap, we pay a fixed-rate tax off the top — before we have even used the
region. For paying that tax, we never have to pay again. For the linked list however,
we could instead pay a tax on each allocation. Every time we get a request for
dynamic memory, we could allocate a bit extra to store the newly-required list node.
For instance, if we get a request for 100 bytes, we actually allocate 116 bytes and use
the additional space to store one of the nodes we described above.
First fit Find the first free block, starting from the beginning, that can accommo-
date the request.
Next fit Find the first free block that can accommodate the request, starting where
the last search left off, wrapping back to the beginning if necessary.
Best fit Find the free block that is closest in size to the request.
Worst fit Find the free block with the most left over after fulfilling the allocation
request.
Quick fit Keep several lists of free blocks of common sizes; allocate from the list
that most nearly matches the request.
First fit is the simplest of the algorithms, but suffers from unnecessary repeated
traversals of the linked list. Each time first fit runs, it starts at the beginning of the
list and must search over space that is unlikely to have any free spaces due to having
allocated from the beginning each prior time the function was called. To avoid this
cost, we could remember where the last allocation happened and start the search
from there. This modification of first fit is called next fit.
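As a sketch, first fit (and, with a saved starting point, next fit) is a single
walk down the free list, reusing the free_node sketch above:

struct free_node *first_fit(struct free_node *head, size_t request) {
    struct free_node *n;
    for(n = head; n != NULL; n = n->next)
        if(n->size >= request) /* take the first block that is big enough */
            return n;
    return NULL; /* no free block can satisfy the request */
}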
Both of these algorithms take the first free block they find, which may not be
ideal. This strategy may leave uselessly small blocks or prevent a later request from
being fulfilled because a large free block was split when a smaller free spot elsewhere
in the list might have been a better fit. This wasted space between allocations is
external fragmentation.
To avoid external fragmentation, we may wish to search for the best fit. The
best fit algorithm searches the entire list looking for the free space that is closest in
size to the request. This means that we will never stop a large future request from
being fulfilled because we took a large block and split it unnecessarily. However,
this algorithm can turn out to be poor in actual usage because we end up with many
uselessly small leftovers when the free space is just slightly larger than the request.
Whenever an exact fit is not found, the leftover is guaranteed to be as small as
possible, since our definition of “best” minimizes the difference between the free
space and the allocation. Additionally, best fit is slow because we must go through the
entire linked list, unless we are lucky enough to find a perfect fit.
To avoid having many small pieces remain, we could do the exact opposite
from best fit, and find the worst fit for a request. This should leave a free chunk
after allocation that remains usefully large. As with best fit, the entire list must
be searched to find the worst fit, resulting in poor runtime performance. Unlike
best fit, which could stop early upon finding a perfect fit, the worst fit cannot be
known without examining every free chunk. Despite our intuition, simulation of
this algorithm reveals that it is not very good in practice. An insight into why is that
after several allocations, all of the free chunks are around the same, small size. This
is bad for big requests and makes looking through the whole list useless as every
free chunk is about equal size.
An alternative to the search-based algorithms, quick fit acknowledges that most
allocations come clustered in certain sizes. To support these common sizes, quick fit
uses several lists of free spaces, with each list containing blocks of a predetermined
size. When an allocation request is made, quick fit looks at the list most appropriate
for the request. Performance is good because searching is eliminated: With a fixed
number of lists, determining the right list takes constant time. Leftover space
(internal fragmentation) can be bounded since an appropriately-sized piece of
memory is allocated. If the lists were selected to match the needs of the program
making the allocation, this would leave very little wasted space. Additionally, that
leftover space should not harm future large requests because the large requests
would be fulfilled from a different list. One issue with quick fit is the question of
whether or not to coalesce free nodes on deallocation or to simply return them to
their appropriate list. One solution is to provide a configurable parameter to the
allocator that says how many adjacent small free nodes are allowed to exist before
they are collapsed into one. This ensures large unallocated regions as well as enough
of the more common smaller regions.
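A sketch of the constant-time list selection (the particular size classes are illustrative):

#include <stddef.h>

static const size_t class_size[] = { 16, 32, 64, 128, 256 };
#define NCLASSES (sizeof(class_size) / sizeof(class_size[0]))

int pick_class(size_t request) {
    int i;
    for(i = 0; i < (int)NCLASSES; i++)
        if(request <= class_size[i])
            return i;     /* allocate from this size's free list */
    return -1;            /* too large: fall back to another strategy */
}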
The two likely “winners” of the allocation battle are next fit and quick fit. They
both avoid searching the entire list yet manage to fulfill requests and mostly avoid
fragmentation. The gnu glibc implementation of malloc() uses a hybrid approach
that combines a quick fit scheme with best fit. The writers claim that while it is not
the theoretically best performing malloc(), it is consistently good in practice.
7.2 Deallocation
The other important operation to consider is deallocation. This is where the true
distinction against stack allocation is drawn. Whenever we free space on the stack,
we reclaim only the most recently allocated data. The heap has no such organization,
and thus deallocations may occur regardless of the original allocation order. Since
the stack is completely full from bottom to top, the only bookkeeping necessary is
an architectural register to store the location of the top. The heap, on the other hand,
will inevitably have “holes” — free spaces from past deallocations — that will arise.
Keeping track of the locations of these holes motivates the use of a data structure
such as a bitmap or linked list. In this section, we look at the various approaches a
memory manager can take to deallocation.
Figure 7.3: The four scenarios of deallocating a region X: both neighbors allocated, only the left neighbor free, only the right neighbor free, and both neighbors free.
When we allocate some memory, our linked list changes. The free node is split into
two parts: the newly allocated part and the leftover free part. Eventually dealloca-
tions happen, and it is time to release a once-used region of memory. Figure 7.3
shows the four scenarios that we might find when doing a free operation. The top-
most shows a region being deallocated (indicated with an ‘X’) that has two allocated
neighboring regions; in this case, we simply mark the middle region as free. The
second and third cases show when the left or right neighbor is free. In this case, we
want to coalesce the free nodes into a single node so that we may later allocate this
as one large contiguous region. The final case shows both of our neighbors being
free, and thus we will need to coalesce them all.
To facilitate coalescing nodes, we may want to use a doubly linked list, which
has pointers to the next node as well as the previous node. Note that we do not want
to coalesce allocated nodes because we would like to be able to search the linked
list for a particular allocation by name (or address, if that is what we are storing as
the “name”).
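A sketch of such a free operation with coalescing, reusing the doubly-linked node sketched earlier and assuming the list is kept in address order:

void free_region(struct node *n) {
    n->free = 1;
    if(n->next != NULL && n->next->free) {      /* right neighbor is free */
        n->size += sizeof(struct node) + n->next->size;
        n->next = n->next->next;
        if(n->next != NULL)
            n->next->prev = n;
    }
    if(n->prev != NULL && n->prev->free) {      /* left neighbor is free */
        n->prev->size += sizeof(struct node) + n->size;
        n->prev->next = n->next;
        if(n->next != NULL)
            n->next->prev = n->prev;
    }
}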
For all of the flaws of using bitmaps for dynamic memory management, deallocation
of a bitmap-managed region is surprisingly simple: the appropriate bits switch
from 1 (allocated) to 0 (deallocated). The beauty of this approach is that free regions
are coalesced without any explicit effort. The newly freed region’s zeros naturally
“melt” into any neighboring free space.
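For instance, a sketch of clearing the bits that cover a freed region, assuming one bit per fixed-size chunk:

void bitmap_free(unsigned char *bitmap, int first_chunk, int nchunks) {
    int i;
    for(i = first_chunk; i < first_chunk + nchunks; i++)
        bitmap[i / 8] &= ~(1 << (i % 8));   /* 0 marks the chunk free */
}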
Reference Counting
We have already determined that a dynamically-allocated object is garbage and can
be collected when it has been leaked and there are no longer any valid references to
the memory. Possibly the simplest way to determine this is to count valid links to
an object and, when the count reaches zero, automatically free the memory. This
strategy is known as reference counting and can be implemented relatively easily
even in native code.
Each object needs a reference count variable associated with it. This variable is
incremented or decremented as the program runs. It will need to be updated
whenever a reference is created, copied, or destroyed.
Figure 7.4: Reference counting can lead to memory leaks. If the pointer ptr
goes out of scope, the circularly linked list should be collected. How-
ever, each object retains one pointer to the other, leading to neither
having the requisite zero reference count for deallocation.
When a reference goes out of scope, the reference count on the associated object
must be decremented. Copying references affects both sides of the assignment. The
left-hand side (often called an l-value) might have been referring to an object prior
to the assignment. This reference is now going to be lost from the overwrite, so the
original object’s reference count must be decremented. The right-hand side of the
assignment (predictably called the r-value) is now going to have one more reference
to it and the associated counter must be incremented accordingly.
When an object’s reference count reaches zero, the object is garbage and can be
collected. This might happen while the program is running (making it a concurrent
collector) or at periodic breaks in the program’s execution (a stop-the-world collec-
tor). The act of garbage collection can be as easy as freeing the object with whatever
heap management operation is available.
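A minimal sketch of the bookkeeping in C; obj_t and the helper names are hypothetical, not a standard API:

#include <stdlib.h>

typedef struct {
    int refs;            /* the reference count */
    /* ... the object's actual data ... */
} obj_t;

void retain(obj_t *o) {
    o->refs++;           /* a new reference to o now exists */
}

void release(obj_t *o) {
    if(--o->refs == 0)   /* no references remain: o is garbage */
        free(o);
}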
A problem that can arise in a reference counting garbage collector is, remarkably,
that it can leak memory. If a data structure has a cycle, such as in a circularly linked
list as shown in Figure 7.4, there can be no way to collect the data structure. With
a cycle, there is at least one reference to each object that remains even after all
references from the program code are gone. Since the reference count never reaches
zero, the objects are not freed. Possible solutions to this problem include detecting
that the objects are part of a cycle or by using one of the other garbage collection
algorithms.
In-place Collectors
Another approach to garbage collection is via an in-place collector. The process
consists of two phases: a mark phase and a sweep phase. During the mark phase,
all of the references found on the stack are followed to find the objects to which
they refer. Those objects may contain references themselves. As the algorithm
traverses this graph of references, it marks each object it encounters as reachable,
thus indicating it is not to be collected. When every reference that can be reached
has its associated object marked, the algorithm switches to the sweep phase.
In the sweep phase, all unmarked objects are freed from the heap. All that
remains are the reachable objects. When the deallocation is finished, all of the
marked objects are reset to unmarked so that the process may begin all over again
when the garbage collector is invoked the next time.
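A sketch of the mark phase as a depth-first traversal; the object layout here is an assumption:

struct gcobj {
    int marked;
    int nrefs;
    struct gcobj **refs;   /* outgoing references to other objects */
};

void mark(struct gcobj *o) {
    int i;
    if(o == NULL || o->marked)
        return;            /* already visited: this is what stops cycles */
    o->marked = 1;
    for(i = 0; i < o->nrefs; i++)
        mark(o->refs[i]);
}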
This mark and sweep approach is simple and relatively fast. It copes with cycles
naturally, because an object we have already seen is detected by its mark and is not
traversed again. It suffers from a significant problem, however. The
newly freed space might be between objects that are still alive and remain in the heap.
We now have holes that are small and scattered throughout memory, rather than a
big free contiguous chunk of memory from which to allocate new objects. While
there might be a significant fraction of space that is free, it might be fragmented to
the point of being unusable. This is, once again, external fragmentation, and was the
motivation behind coalescing the adjacent free nodes in a linked list management
scheme.
Copying Collectors
To fix the fragmentation issue of the in-place collector, a garbage collector could
compact the region by moving all of the objects closer together. This would
constitute a third, compaction phase, but a separate phase turns out to be
unnecessary: we can combine deallocation and compaction into a single pass through
the heap.
Copying garbage collectors such as the semispace collector typically divide the
heap into two halves and copy from the full half into the reserved, empty half.
Figure 7.5 shows an example. In Figure 7.5a, objects B and D have been designated
unreachable and should be freed. Rather than explicitly do this freeing and be left
with two small holes in memory, objects B and D are left untouched. Objects A
and C are referenced and thus alive. A copying collector will move these live objects
to the reserved half of the heap, placing them contiguously to avoid wasting space.
(a) Objects B and D are unreachable and the heap is nearly half full. (b) Objects A and C are moved to the reserved half, and the original half is marked as free.
Figure 7.5: Copying garbage collectors divide the heap in half and move the
in-use data to the reserved half, which has been left empty.
[Figure: The layout of a process’s address space: the globals sit at the bottom, the heap begins at _end and grows up to the break (brk), the stack grows down from $sp, and the unallocated space lies between brk and the stack.]
When malloc() gets a request for an allocation that cannot fit, it extends brk. The heap is
exhausted if the break gets too near the top of the stack. Likewise, the stack may be
exhausted (usually from deep recursion) if it gets too close to brk.
Typically malloc() uses a linked list allocation strategy to track free and allo-
cated space. One of the issues with linked lists is the question of where to store the
list. An implementation of malloc() might store the linked list inside the heap,
with each node near the allocated region. This allows calls to free() to easily access
the size field of the node in the list corresponding to the region to reclaim. The
drawback to this is that any over- or under-run of a heap-allocated buffer may
overwrite the list, resulting in a corrupted heap.
Not all implementations of malloc() adjust the value of brk. The gnu imple-
mentation in glibc uses mmap() for allocations beyond 128KB. The mmap() system
call requests pages directly from the Operating System. Some malloc()s use only
mmap() for allocation.
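A sketch of asking the Operating System for pages directly (the 1 MB size is arbitrary):

#include <stdio.h>
#include <sys/mman.h>

int main() {
    /* Ask the os directly for 1 MB of fresh, zeroed pages. */
    void *big = mmap(NULL, 1024 * 1024, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if(big == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* ... use the region ... */
    munmap(big, 1024 * 1024);   /* hand the pages back to the os */
    return 0;
}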
When a free region of size 2^k finds that its buddy is also free, the two can be
coalesced back into a node of size 2^(k+1). The free buddy will be removed from the
linked list of free spaces of size 2^k and the combined space will be inserted into the
free list of size 2^(k+1). Since this may result in both a region and its buddy being free
at size 2^(k+1), we can repeat this process with progressively larger regions until the
newly coalesced region’s buddy is not free or only the original entire free space is
left.
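One attraction of the buddy scheme is that locating a block’s buddy requires no search at all; as a sketch:

/* The buddy of a block at "offset" within the managed region is found
 * by flipping the bit corresponding to the block's size (size = 2^k). */
unsigned long buddy_of(unsigned long offset, unsigned long size) {
    return offset ^ size;
}

For example, the 8-byte block at offset 0 has its buddy at offset 8, and vice versa.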
External Fragmentation Free space that is too small to be useful, a result of deal-
location without compaction.
Heap A region of a process’s address space dedicated to dynamic data whose lifetime
extends beyond that of the function that creates it.
Internal Fragmentation Wasted space due to the minimum allocation unit being
too large.
8 | Operating System Interaction

The Operating System (os) is a special process on a computer responsible for two
major tasks: managing resources and abstracting details. An os manages the shared
resources on the computer. These resources include the cpu, ram, disk storage,
and other input and output devices. The os is also useful in abstracting the specific
details of the system away from application programmers. For instance, an os may
provide a uniform way to print a document to a printer regardless of its specific
make and model.
The core process of the os is called the kernel. The kernel runs at the highest
privilege level that the cpu allows and thus can perform any action. The kernel
is responsible for management and protection; it should be the most trusted com-
ponent on the system. The kernel runs in its own address space that is referred to
as kernel space. Programs that are not the kernel, referred to as user programs,
cannot access the memory of the kernel and run at a lower privilege level. This
portion of the computer is called user space.
Application programmers access the facilities that the Operating System pro-
vides via system calls. System calls are functions that the Operating System can do,
usually related to process management and i/o operations.
[Figure: The numbered steps of a printf() call. In user space, the user code pushes the arguments and calls printf(); the library code places the trap code in a register and traps to the kernel; in kernel space, the dispatch mechanism invokes the system call handler; control then returns through the library to the caller, which increments the stack pointer to clean up the arguments.]
Figure 8.1: A library call usually wraps one or more system calls.
A call to printf() in a user program is compiled into a call to the C Standard Library. In the library, the code to interpolate the
arguments is run, and the final output string is prepared. When it is finally time to
display the string, the library makes a system call to do the actual i/o operation. The
Unix/Linux utility strace provides a list of system calls made during the execution
of a program. In Figure 8.2 we see the system calls made by a “hello world” program
using printf().
The Unix and Linux Operating Systems provide a write() system call that
interacts with i/o devices. Passing the value 1 as the first parameter indicates that
the output should go to the stdout device. Section 8.1.2 will detail the i/o calls
provided by Unix and Linux systems.
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main() {
    int fd;
    char buffer[100];

    strcpy(buffer, "Hello, World!\n");
    write(1, buffer, strlen(buffer));   /* 1 is stdout */
    write(2, buffer, strlen(buffer));   /* 2 is stderr */

    fd = open("hello.txt", O_WRONLY | O_CREAT, 0644);  /* file name for illustration */
    write(fd, buffer, strlen(buffer));
    close(fd);

    return 0;
}
Listing 8.1: Using the Unix system calls to do a “Hello World” program.
Notice that stdout and stderr both display upon the screen by default. They are
separate streams, however, and may be redirected or piped independently of each
other.
Another thing to notice about Listing 8.1 is the second parameter to open().
Two macros are bitwise-ORed together. If we look for the definitions of these in the
header files, we see them as:
#define O_RDONLY 0
#define O_WRONLY 1
#define O_RDWR 2
#define O_CREAT 16
which, with the exception of O_RDONLY, are all powers of two. This technique is common when we want to send
several flags that affect the operation of a function. Each separate bit in an integer
can be seen as an independent boolean flag. Bitwise-ORing them together allows
the programmer to specify one or more flags simultaneously. In this example the
flags to open the file for writing and to create it if it does not already exist are set.
The implementation can check whether a particular flag is set by bitwise-ANDing
the parameter with the same constants. This technique is also very commonly seen
in the functions Microsoft Windows provides.
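For instance, an implementation might test the flags like this (a sketch; note that O_RDONLY, being zero, cannot be detected by ANDing):

#include <fcntl.h>
#include <stdio.h>

void show_flags(int flags) {
    if(flags & O_WRONLY) printf("open for writing\n");
    if(flags & O_CREAT)  printf("create if it does not exist\n");
}
/* show_flags(O_WRONLY | O_CREAT) prints both lines. */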
#include <stdio.h>
#include <unistd.h>

int main() {
    if(fork()==0) {
        printf("Hi from the child!\n");
    }
    else {
        printf("Hi from the parent\n");
    }
    return 0;
}

Listing 8.2: Creating a new process using fork().

When this program is run, both messages will have been displayed. On the test run, the following output was seen:

Hi from the child!
Hi from the parent
This indicates that the child process ran and completed before the parent process
resumed its execution.
#include <stdio.h>
#include <unistd.h>
int main() {
if(fork()==0) {
char *args[3] = {"ls", "-al", NULL};
execvp(args[0], args);
}
else {
int status;
wait(&status);
printf("Hi ␣ from ␣ the ␣ parent\n");
}
return 0;
}
Listing 8.3: Launching a child process using fork and execvp.
8.2 Signals
A signal is a message from the Operating System to a user space program. Signals
are generally used to indicate error conditions, in much the same way that Java
Exceptions function. A program can register a handler to “catch” a particular signal
and will be asynchronously notified of the signal without the need to poll. Polling is
simply the action of repeatedly querying (e.g., in a loop) whether something is true.
Figure 8.3 shows a list of the os signals on a modern Linux machine. You can
generate a complete list for your system by executing the command kill -l. Most
signals tend to fall into a few major categories. There are the error signals, which
indicate something has gone awry:
SIGBUS A bus error, usually caused by bad data alignment or a bad address.
The remaining signals send information about the state of the os, including
things like the terminal window has been resized or a process was paused.
#include <unistd.h>
#include <sys/types.h>
#include <signal.h>
int main() {
pid_t my_pid = getpid();
kill(my_pid , SIGSTOP);
return 0;
}
Listing 8.4: Signals can be sent programmatically via kill.
#include <unistd.h>
#include <signal.h>

volatile sig_atomic_t timer = 10;   /* seconds to count down; value assumed */

void catch_alarm(int signum) {
    timer--;          /* another second has elapsed */
    if(timer > 0)
        alarm(1);     /* re-arm for the next second */
}

int main() {
    signal(SIGALRM, catch_alarm);
    alarm(1);
    while(timer > 0) ;
    alarm(0);
    return 0;
}
Listing 8.5: SIGALRM can be used to notify a program after a given amount of
time has elapsed.
Inserting them would require rewriting the code. With the x86’s variable-length
instruction set architecture, a two-byte breakpoint might overwrite more than one
instruction. This would be problematic if a particular breakpoint was skipped over
and the target of a jump was the second byte of the breakpoint. To avoid this
problem, the breakpoint trap is given a special one-byte encoding: 0xCC, the int3
instruction. Remembering this encoding may come in handy if you ever are dealing
with low-level code and want to insert a breakpoint by hand.
Context The state of a process, necessary to restart it where it left off. Usually
includes the state of the registers and other hardware details.
Context Switch The act of saving the context of a running process and restoring
the context of a suspended process in order to change the currently running
program.
Interrupt A cpu instruction or signal (the voltage kind) issued by hardware that in-
terrupts the currently executing code and jumps to a handler routine installed
by the Operating System. On Intel x86 computers, there is no distinction in
name between an interrupt and a trap; both are referred to as interrupts.
Operating System A program that manages resources and abstracts details of hard-
ware away from application programmers.
Signal A message from the Operating System, delivered asynchronously, that usu-
ally indicates an error has occurred.
8.3 Terms and Definitions 77
System Call A function that the Operating System provides to user programs to
interact with the system and/or perform i/o operations.
Trap A software interrupt, usually used to signal the cpu to cross into kernel space
from user space. See also: interrupt
User Program Any application that is not part of the Operating System and runs
in User Space.
User Space The unprivileged portion of the computer in which user programs run.
9 | Multiprogramming & Threading
Figure 9.1: While the CPU sees just one stream of instructions, each process
believes that it has exclusive access to the CPU.
1 While this queue may be managed in fifo order, we will use ‘queue’ just to imply a set of waiting
objects.
Figure 9.2: The life cycle of a process, through the states Created, Ready, Running,
Blocked, and Exit. Dashed lines indicate an abnormal termination.
associated context. A thread’s context should be small, since the Operating System
will still manage the process as a whole. For instance, a list of open files is part of
a process’s context but not an individual thread’s. Figure 9.3 shows a list of what
might be part of a process’s context and what context needs to be stored per thread.
If we define a stream of instructions as a function, we can more easily see what state
we need to store. A function needs the machine registers and a stack of its own.
This will form a minimal thread context.
9.1 Threads
Every process has one thread by definition. An application must explicitly create
any additional threads it needs. Support for threading can come from two sources.
In user threading, a user-space library provides threading support with minimal,
if any, support from the Operating System. In kernel threading, the Operating
System has full support for threads and manages them in much the same way as it
manages processes.
Having solved that problem, let us tackle a second. Imagine that a thread is
executing some code and comes upon an i/o operation. The operation traps into
the kernel, and finding the data not yet ready, the kernel moves the process into
the blocked state. One of the original motivations was that during these times of
being blocked, we would like to run some other code to do some useful work, so we
might assume that another thread will get to run. But remember, the kernel knows
nothing of the threads and has put the entire process to sleep. There is no way any
other code in the process can run until the i/o request has finished.
Though we found a reasonable way around the yielding problem, this seems the
death knell for user threads. There is no way to avoid the inevitable block that will
happen to the process when the i/o operation cannot be completed. The only way
around it would be if there was some facility by which going into a blocked state
could be prevented. If an Operating System has a facility for non-blocking i/o calls,
the user threading library could use them and insert a yield to run a different thread
until the requested data was ready.
Unix/Linux systems have system calls named select() and poll() that tell
whether a given i/o operation would block, without performing the operation itself.
Because select() lets the library check readiness without blocking, the threading library could
provide its own version of the i/o calls. A thread would use the library’s routines,
and when in the library, the library could make a call to select() to see if the
operation would block. If it would, it can put that thread to sleep and allow another
thread to run. The next time we are in the library, via a yield(), a create(), or an
i/o call for another thread, we can check to see whether the original call is ready,
and if so, unblock the requesting thread.
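A sketch of such a readiness check; a zero timeout makes select() poll rather than wait:

#include <sys/select.h>

int would_block(int fd) {
    fd_set readfds;
    struct timeval tv = { 0, 0 };   /* zero timeout: poll, do not wait */
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    /* select() returns how many descriptors are ready; 0 means a read
     * on fd would block right now. */
    return select(fd + 1, &readfds, NULL, NULL, &tv) == 0;
}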
Since an Operating System only needs to provide non-blocking i/o to make user
threading work, it is arguable that user threads require no explicit threading support.
The select() call is useful for more than just threads, including checking to see
whether any network packets have arrived. While this minimal level of functionality
is required, we will see with kernel threads that the level of os support is far beyond
a single system call.
9.2 Terms and Definitions
Preemption Interrupting a process because it has had the cpu for some amount
of time in order to allow another process to run.
Scheduler The portion of the Operating System responsible for choosing which
process gets to run.
User Threading Threading done via a user space library that provides thread sup-
port with minimal, if any, support from the Operating System.
10 | Practical Threading, Synchronization, & Deadlocks
#include <stdio.h>
#include <pthread.h>

/* The code the new thread runs; this body is assumed for illustration. */
void *do_stuff(void *ptr) {
    printf("Hi from %d!\n", *(int *)ptr);
    return NULL;
}

int main() {
    pthread_t thread;
    int id, arg1, arg2;

    arg1 = 1;
    id = pthread_create(&thread, NULL,
                        do_stuff, (void *)&arg1);

    arg2 = 2;
    do_stuff((void *)&arg2);
    return 0;
}
Listing 10.1: Basic thread creation.
thread. This identifier is “opaque,” meaning that we do not know what type this
identifier is (integer, structure, etc.) and we should not depend on its being any
particular type or having any particular value. The second parameter controls how
the thread is initialized, and for most simple implementations it can be set to NULL
to take on the defaults.
The third and fourth parameters specify the stream of instructions to run in the
new thread. First comes a pointer to a function containing the code to run. This
function can, of course, call other functions, but it could be considered analogous
to a “main” function for that particular thread. The signature of the function must
be such that it takes and returns a void *. Because of the strict type checking done
on passing function pointers in terms of return values and parameters, this function
needs to be as generic as possible while still having a well-defined prototype. The
advantage in using a void * is that it can point to anything, even an aggregate data
type like an array or structure. In this way, no matter how many arguments are
actually needed, the function can receive them. The final parameter is the actual
parameters to pass to this function, which can be NULL if unnecessary.
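As a sketch of this idea, several values can travel through the single void * by packing them into a structure; the structure, run_work(), and its output are illustrative:

#include <stdio.h>
#include <pthread.h>

struct work {           /* bundle several "arguments" together */
    int id;
    const char *label;
};

void *run_work(void *ptr) {
    struct work *w = (struct work *)ptr;   /* recover the real type */
    printf("thread %d: %s\n", w->id, w->label);
    return NULL;
}

int main() {
    pthread_t t;
    struct work w = { 3, "several arguments in one structure" };
    pthread_create(&t, NULL, run_work, (void *)&w);
    pthread_join(t, NULL);
    return 0;
}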
int main() {
pthread_t thread;
int id, arg1, arg2;
arg1 = 1;
id = pthread_create(&thread , NULL,
do_stuff , (void *)&arg1);
pthread_yield();
arg2 = 2;
do_stuff((void *)&arg2);
return 0;
}
Listing 10.2: Inserting a yield to voluntarily give up execution.
int main() {
pthread_t thread;
int id, arg1, arg2;
arg1 = 1;
id = pthread_create(&thread , NULL,
do_stuff , (void *)&arg1);
arg2 = 2;
do_stuff((void *)&arg2);
pthread_join(thread , NULL);
return 0;
}
Listing 10.3: Waiting for the spawned thread to complete.
thread(s) might be blocked, or the scheduler might simply ignore the yield. A better
solution is to force the process to wait until the other thread completes.
Listing 10.3 illustrates the better way to ensure threads complete. Calling the
pthread_join() function blocks the thread that issued the call until the thread
specified in the parameter finishes. The second parameter to the pthread_join()
call is a void **. When a function needs to change a parameter, we pass a pointer
to it. Passing a pointer-to-a-pointer allows a function to alter a pointer parameter,
in this case, setting it to the return value of the thread the join is waiting on. We
can, of course, choose to ignore this parameter, in which case we can simply pass
NULL. Note, however, joining the threads still does not guarantee they will run in
any particular order before the call to pthread_join().
The moral of the threading story is that, unless explicitly managed, threads
run in no guaranteed order. This lesson becomes even more important when we
begin to access shared resources in multiple threads concurrently. When we need
to manipulate shared objects, we may need to ensure a particular order is preserved,
which leads to the next topic: Synchronization.
10.2 Synchronization
Imagine that there are two threads, Thread 0 and Thread 1, as in Figure 10.1. At
time 3, Thread 0 is preempted and Thread 1 begins to run, accessing the same
memory location X. Because Thread 0 did not get to write back its increment to
Time   Thread 0      Thread 1
1      read X
2      X = X + 1
3                    read X
4                    X = X + 1
5                    write X
6      write X
Figure 10.1: Two threads independently running the same code can lead to a
race condition.
memory, Thread 1 has read an older version. Whichever thread writes last is the
one that makes the update, and the other is lost. When the order of operations,
including any possible preemptions, results in different values, the program is said
to have a race condition.
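The racing code can be as innocuous as one C statement; the single line below compiles to a separate read, add, and write, and a preemption may fall between any of them:

int X = 0;              /* shared by both threads */

void *worker(void *arg) {
    X = X + 1;          /* compiles to: read X, add 1, write X back */
    return NULL;
}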
Determining whether code is susceptible to race conditions is an exercise in
Murphy’s Law.1 Race conditions occur when code that accesses a shared variable
may be interrupted before the change can be written back. The obvious solution to
preventing a race condition is to simply forbid the thread from being interrupted
during execution of this critical region of code. Allowing a user-space process to
control whether it can be preempted is a bad idea, however. If a user program were
allowed to do this, it could simply monopolize the cpu and not allow any other
programs to run. Whatever the solution, it will require the help of the Operating
System, as it is the only part of the system we can trust to make sure an action is not
interrupted.
A better solution is to allow a thread to designate a portion of its code as a
critical region and control whether other threads can enter the region. If a thread
has already entered a critical region of code, all other threads should be blocked
from entering. This lets other threads still run and do “non-critical” code; we have
not given up any parallelism. The marking of a critical region itself must not be
interruptible, a trait we refer to as being “atomic.” This atomicity and the ability
to make other threads block means that we need the Operating System or the
user-thread scheduler’s help.
Several different mechanisms for synchronization are in common use. We will
focus on three in this text, although a fourth, known as a Monitor, forms the basis
Time   Thread 0       Thread 1
1      lock mutex
2      read X
3      X = X + 1
4                     lock mutex (blocks)
5      write X
6      unlock mutex
7                     read X
8                     X = X + 1
9                     write X
10                    unlock mutex
for Java’s support for synchronization. The pthread library provides support for
Mutexes and Condition Variables. Semaphores can be used with the inclusion of a
separate header file.
The pthread library provides an abstraction layer for synchronization primitives.
Regardless of the facilities of the Operating System, with respect to its support
for threading or synchronization, mutexes and condition variables will always be
available.
10.2.1 Mutexes
The first synchronization primitive is a mutex. The term comes from the phrase
Mutual Exclusion. Mutual exclusion is exactly what we are looking for with respect
to critical regions. We want each thread’s entry into a particular critical region to
be exclusive from any other thread’s entry. A mutex behaves as a simple lock, and
thus we get the two operations lock() and unlock() to perform.
With a mutex and the lock and unlock operations, we can solve the problem of
Figure 10.1. Figure 10.2 assumes a mutex variable named mutex that is initially in
an unlocked state. Thread 0 comes along first, locks the mutex, and proceeds to do
its work up until time 4, when it is preempted and Thread 1 takes over. Thread 1
attempts to acquire the mutex lock but fails and is blocked. With no other threads
to run, Thread 0 resumes and finishes its work, unlocking the mutex. With the
mutex now unlocked, the next time that Thread 1 is scheduled to run it can, as it is
no longer in the blocked state.
The pthread library provides a simple and convenient way to use mutexes. A mutex is declared as a variable of type pthread_mutex_t and can be statically initialized with the special value PTHREAD_MUTEX_INITIALIZER, as the listing below shows.
#include <pthread.h>

int tail = 0;
int A[20];
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void append(int value) {        /* sketch: a guarded critical region */
    pthread_mutex_lock(&mutex);
    A[tail++] = value;
    pthread_mutex_unlock(&mutex);
}
Shared Variables:
#define N 10
int buffer[N];
int in = 0, out = 0, counter = 0;

Consumer:
while(1) {
    if(counter == 0)
        sleep();
    ... = buffer[out];
    out = (out+1) % N;
    counter--;
    if(counter == N-1)
        wakeup(producer);
}

Producer:
while(1) {
    if(counter == N)
        sleep();
    buffer[in] = ... ;
    in = (in+1) % N;
    counter++;
    if(counter == 1)
        wakeup(consumer);
}
buffer was full before we consumed an item, and if the buffer was full, the producer
is asleep, so wake it up.
However, there is a subtle problem here. Imagine that the consumer is running,
executes the if(counter==0) line, and finds the buffer empty. But right before the
sleep is executed, the thread is preempted and stops running. Now the producer
has a chance to run, and since the buffer is empty, successfully produces an item
into it. The producer notices that the count is now one, meaning the buffer was
empty just before this item was produced, and so it assumes that the consumer is
currently asleep. It sends a wakeup which, since the consumer has not yet actually
executed its sleep, has no effect. The producer may continue running and eventually
will fill up the buffer, at which point the producer itself will go to sleep. When the
consumer regains control of the cpu, it executes the sleep() since it has already
checked the condition and cannot tell that it has been preempted.2 Now both the
consumer and the producer threads are asleep and no useful work can be done.
This is called a deadlock and is the subject of Section 10.3.
There are two ways to prevent this problem. The first is to make sure that the
check and the sleep are not interrupted. The second is to remember that there was a
2 In fact, it does not need to have been preempted if these two threads were running on separate cores
or processors.
wakeup issued while the thread was not sleeping and to immediately wake up when
the next sleep is executed in that thread. The first way is implemented via condition
variables; the second is a semaphore.
A condition variable is a way of implementing sleep and wakeup operations in
the pthread library. A variable of type pthread_cond_t represents a “condition”
and acts somewhat like a phone number. If a thread wants to sleep, it can invoke
pthread_cond_wait() and go to sleep. The first parameter is a condition variable
that enables another thread to “phone it” and wake it up. A thread sleeping on a
particular condition variable is awoken by calling pthread_cond_signal() with
the particular condition variable that the sleeping thread is waiting on. Condition
variables can be initialized much the same way that mutexes were, by assigning a
special initializer value to them appropriately called PTHREAD_COND_INITIALIZER.
While this enables us to wait and signal (sleep and wake up), we still have
the issue of possibly being interrupted. This is where the second parameter of
pthread_cond_wait() comes into play. This parameter must be a mutex that
protects the condition from being interrupted before the wait can be called. As soon
as the thread sleeps, the mutex is unlocked, otherwise deadlock would occur. When
a thread wakes back up it waits until it can reacquire the mutex before continuing
on with the critical region. Listing 10.5 shows the producer function rewritten to
use condition variables.
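A sketch of a producer in this style, using the shared variables of the earlier figure; the names notfull, notempty, and produce_item() are illustrative:

#include <pthread.h>

#define N 10
int buffer[N];
int in = 0, counter = 0;

pthread_mutex_t mutex   = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t notfull  = PTHREAD_COND_INITIALIZER;
pthread_cond_t notempty = PTHREAD_COND_INITIALIZER;

int produce_item(void);   /* hypothetical: creates the next item */

void *producer(void *arg) {
    while(1) {
        pthread_mutex_lock(&mutex);
        while(counter == N)                  /* buffer full: sleep */
            pthread_cond_wait(&notfull, &mutex);
        buffer[in] = produce_item();
        in = (in + 1) % N;
        counter++;
        pthread_cond_signal(&notempty);      /* wake a sleeping consumer */
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}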
10.2.3 Semaphores
The semaphore.h header provides access to a third type of synchronization, a
semaphore. A semaphore can be thought of as a counter that keeps track of how
many more wakeups than sleeps there have been. In this way, if a thread attempts
to go to sleep with a wakeup already having been sent, the thread will not go to
sleep. Semaphores have two major operations, which fall under a variety of names.
In the semaphore.h header, the operations are called wait and post, but they can
also be known as lock and unlock, down and up, or even P and V. Whenever a wait
is performed on a semaphore, the corresponding counter is decremented. If there
are no saved wakeups, the thread blocks. If the counter is still positive or zero, the
thread can continue on. The post function is an increment to the counter and if
the counter remains negative, it means that there is at least one thread waiting that
should be woken up.
One way to conceptualize the counter is to consider it as maintaining a count
of how many resources there are currently available. In the Producer/Consumer
example, each array element is a resource. The producer needs free array elements,
and when it exhausts them it must wait for more free spaces to be produced by the
consumer. We can use a semaphore to count the free spaces. If the counter goes
negative, the magnitude of this negative count represents how many more copies of
the resource there would need to be to allow all of the threads that want a copy to
have one.
Semaphores and mutexes are very closely related. In fact, a mutex is simply a
semaphore that only counts up to one. Conceptually, a mutex is a semaphore that
represents the resource of the cpu. There can only be one thread in a critical region
that may be running, and all other threads must block until it is their turn.
Semaphores can be declared as a sem_t type. There is no way to have a fixed
initializer, however, because a semaphore can initially take on an integer value rather
than being locked or unlocked. A semaphore is initialized via the sem_init()
function, which takes three parameters: the semaphore variable, a flag indicating
whether the semaphore is shared across processes (0 when it is shared only among
the threads of one process, the common case on Linux), and the initial value for the
semaphore. Listing 10.6 shows the
producer/consumer problem solved by using semaphores. Notice that they can
even replace the mutex, although we could use a mutex if we wanted. The semempty
(initialized to N) and semfull (initialized to 0) semaphores count how many empty
and full slots there are in the buffer. When there are no more empty slots, the
producer should sleep, and when there are no more full slots, the consumer should
sleep.
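A sketch of the producer side in this style; produce_item() and the declarations are illustrative:

#include <pthread.h>
#include <semaphore.h>

#define N 10
int buffer[N];
int in = 0;

sem_t semempty, semfull, semmutex;   /* init to N, 0, and 1 with sem_init() */

int produce_item(void);              /* hypothetical: creates the next item */

void *producer(void *arg) {
    while(1) {
        sem_wait(&semempty);         /* wait for a free slot */
        sem_wait(&semmutex);         /* enter the critical region */
        buffer[in] = produce_item();
        in = (in + 1) % N;
        sem_post(&semmutex);
        sem_post(&semfull);          /* record one more full slot */
    }
    return NULL;
}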
10.3 Deadlocks
The formal definition of a deadlock is that four things must be true:
Mutual exclusion Only one thread may access the resource at a time.
Hold and wait When trying to acquire a new resource, the requesting thread does
not release the ones it already holds.
No preemption of the resource The resource cannot be forcibly released from the
holding thread.

Circular wait A cycle of threads exists in which each thread holds a resource that
the next thread in the cycle is waiting to acquire.
For our purposes, we will truly worry about the circular wait condition. This
means that we must be careful about how and when we acquire resources, including
mutexes and semaphores. If we do something as simple as mistakenly alter the
order of the semaphores from Listing 10.6 to be:
sem_wait(&semmutex);
sem_wait(&semempty);
our program will instantly deadlock. If there are no empty slots, the thread does
not release the semaphore used for mutual exclusion so that the other thread may
run and consume some items.
Ensuring that your code is deadlock-free can sometimes be a difficult task. A
simple rule of thumb can help you avoid most deadlocks and produce code that
spends as much time unblocked as possible:
Always place the mutex (or semaphore being used as a mutex) around the absolute
smallest amount of code possible.
This is not a perfect rule, and surely there is a counter-example to defeat it. No rules
will ever replace understanding the issues of synchronization and using them to
illuminate the potential problems of your own code.
10.4 Terms and Definitions
Critical Region A region of code that could result in a race condition if interrupted.
Deadlock A program that is waiting for events that will never occur.
Race Condition A region of code that results in different values depending on the
order in which threads are executed and preempted.
11 | Networks & Sockets

In this chapter we will examine the basics of having two or more computers talk
to each other over an electronic or radio-frequency connection. Having computers
connected to a network is almost taken for granted in this day and age, with the
quintessential network being the Internet. There are plenty of other networks, from
telephones (both cellular and land-line) to the local-area networks that share data
and applications in businesses and homes.
We will start with an introduction to networking basics from a programmer’s
perspective. We focus on the makeup and potential issues of network communi-
cation, and how they affect the performance and reliability of transmitting and
receiving data. We then move to the de facto standard for programming network
applications: Berkeley Sockets. Berkeley Sockets is an Application Programming
Interface (api). An api is an abstraction, furnished by an Operating System or
library, that exposes a set of functions and data structures to do some specific tasks.
11.1 Introduction
A network is a connection of two or more computers such that they can share
information. While networking is ubiquitous today, some details are important to
understand before network-aware applications can be adequately written.
Networks, like Operating Systems, can be broken up into several layers to ab-
stract specific details and to allow a network to be made up of heterogeneous
components. There is a formal seven-layer model for networking known as the
OSI model that serves as a set of logical divisions between the different components
into which a network can be subdivided. The Internet uses five of these layers and
results in the diagram shown in Figure 11.1. The bottom-most layer represents the
actual electronic (physical) connection. While wireless networks are common, most
networks will consist of a closed electrical circuit that sends signals and needs to
Application (DHCP, DNS, FTP, HTTP, IRC, POP3, TELNET …)
Transport (TCP, UDP, RTP …)
Internet (IP)
Data Link (ATM, Ethernet, FDDI, Frame Relay, PPP …)
Physical Layer (Ethernet physical layer, ISDN, Modems, SONET …)

Figure 11.1: The five layers of the Internet protocol stack.
resolve issues of message collision. On top of this layer comes an agreement on how
to send data by way of these electronic signals. The data is organized into discrete
chunks called packets. How these packets are organized needs to be standardized
for communication to be intelligible to the recipient. This standard agreement on
how to do something is known as a protocol. The protocol governing the Internet
is appropriately known as the Internet Protocol (ip).
The Internet Protocol defines a particular packet, as illustrated in Figure 11.2.
The first 160 bits (20 bytes) form a header that indicates details such as the desti-
nation and source of the packet. To identify specific computers in a network, each
computer is assigned a unique IP address, a 32- or 128-bit number. Because of
the way ip addresses are allocated, it often works out that many computers share a
single ip address, but the details of how this works are beyond the scope of this text.
Figure 11.3: Each layer adds its own header to store information.
The different sizes of ip addresses come about as a result of two different standards.
ipv4 is the current system using 32-bit addresses. Addresses are represented in
the familiar “dotted decimal” notation such as 127.0.0.1. As networked devices
keep growing, an effort to make sure that every device can have a unique address
spawned the ipv6 standard. With this standard’s 128-bit addresses, there is little
chance of running out anytime in the foreseeable future.
Packets sent via ip make no guarantees about arrival or receipt order. As a
theoretical concept, such a guarantee is impossible to make. Imagine that Alice
sends a message to Bob and wants to know that Bob receives it, so she asks Bob to
send a reply when he gets it. A week passes, and Alice hears nothing from Bob and
begins to wonder. However, she is met with an unanswerable question: Did Bob
not get her message, or did she not get Bob’s reply?
The good news in regard to this conundrum is that modern networks are usually
reliable enough that “dropped” packets are rare. A protocol that does nothing to
guarantee receipt is known as a Datagram protocol. The term Datagram comes
from a play on telegram, which also had no guarantee about receipt.
While mostly reliable communication might be adequate for some uses, the
majority of applications want reliable, order-preserving communication. Email,
for instance, would be useless if the message arrived garbled and with parts miss-
ing. Since we assume a mostly-reliable network, we can do better. A protocol
implemented on top of ip called Transmission Control Protocol (tcp) attempts
to account for the occasional lost or out-of-order packet. It does this through ac-
knowledgment messages and a sequence number attached to each packet to indicate
relative order. To do this, tcp needs to add this sequence number to the packet,
and so it adds its own header following the ip header. Figure 11.3 illustrates the
concatenation of headers done by each layer. (Note that the figure assumes the data
link layer is Ethernet.)
Some applications, like streaming audio or video, or data for video games, can
tolerate the occasional lost or out-of-order packet. Not worrying about receipt or or-
der allows for larger amounts of data to be sent at faster rates. In fact, no connection
is even necessarily made between the sender and the recipient. The most common
11.2 Berkeley Sockets
[Figure: The Berkeley Sockets call sequence: the server calls socket(), bind(), listen(), and accept(); the client calls socket() and connect(); both sides then exchange data with send() and recv(), and finish with close().]
The socket() call creates a file descriptor representing the connection to be used by the other
functions. Berkeley Sockets distinguish between a listening server and a connecting
client. Listing 11.1 gives the code for a server that simply replies with “Hello there!”
to any program that connects to that particular machine and port pair. A port is an
application-reserved connection point on a particular ip address.
Because a network communication failure is much more likely than a failure in
most other i/o operations, even a simple program ends up with many lines of
error-handling code. If something goes wrong, every function will return a negative number
and set errno, the global error code, to the appropriate value. Using perror()
converts errno into a (sometimes) useful error message and prints it to the screen.
For the send() and recv() functions, the returned value indicates how many bytes
were actually sent. If the data is too large, multiple calls may be needed to handle it.
We can connect to the server in Listing 11.1 by using telnet, which emulates a
terminal and connects to a specified address and port. If the server is running on
the local machine, the following output would be seen:
Berkeley Sockets also support connectionless protocols like udp. The sendto()
and recvfrom() calls take extra parameters that specify the address of the recipient
or the sender. There is no need to do anything other than set up a socket to use
them.
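A sketch of a connectionless send; the destination port and address are placeholders:

#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int udp_hello(void) {
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in to;
    int amt;

    memset(&to, 0, sizeof(to));
    to.sin_family = AF_INET;
    to.sin_port = htons(9999);                   /* placeholder port */
    to.sin_addr.s_addr = inet_addr("127.0.0.1");
    amt = sendto(s, "hi", 2, 0, (struct sockaddr *)&to, sizeof(to));
    close(s);
    return amt;
}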
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>

#define MYPORT 3490   /* port number chosen for illustration */

int main() {
    int sfd, connfd, amt = 0;
    struct sockaddr_in addr;
    char buf[1024];

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(MYPORT);
    addr.sin_addr.s_addr = INADDR_ANY;

    sfd = socket(AF_INET, SOCK_STREAM, 0);
    if(bind(sfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        perror("bind");
    listen(sfd, 10);
    connfd = accept(sfd, NULL, NULL);
    amt = send(connfd, "Hello there!\n", 13, 0);
    close(connfd);
    close(sfd);
    return 0;
}

Listing 11.1: A server that replies “Hello there!” to each connecting client.
A | The Intel x86 32-bit Architecture

Figure A.1 gives a list of the general purpose registers. The first six can be used
for most any purpose, although some instructions expect certain values to be in a
particular register. %esp and %ebp are used for managing the stack and activation
records. The program counter is %eip, which is read-only and can only be set via a
jump instruction. The results of comparisons for conditional branches are stored in
the register EFLAGS.
The registers %eax, %ebx, %ecx, and %edx each have subregister fields as shown
in Figure A.2. For example, the lower (least-significant) 16 bits of %eax is known as
%ax. %ax is further subdivided into two 8-bit registers, %ah (high) and %al (low).
There is no name for the upper 16-bits of the registers. Note that these subfields
are all part of the %eax register, and not separate registers, so if you load a 32-bit
quantity and then read %ax, you will read the lower 16-bits of the value in %eax.
The same applies to the other three registers.
Operations in x86 usually have two operands, which are often a source and a
destination. For arithmetic operations like add, these serve as the addends, and the
addend in the destination position stores the sum. In mathematical terms, for two
registers a and b, the result of the operation is a = a + b, overwriting one of the
original values.
%eax Accumulator
%ebx Base
%ecx Counter
%edx Data
%eax (all 32 bits) contains %ax as its low 16 bits, and %ax is in turn divided into %ah (its high byte) and %al (its low byte).
Figure A.2: %eax, %ebx, %ecx, and %edx have subregister fields.
sub Subtract
add Add
and Bitwise AND
b byte (8-bit)
w word (16-bit)
l long (32-bit)
q quad (64-bit)
After the opcode, the first operand is the source and the second operand is the
destination. Memory dereferences are denoted by ( ). Listing A.1 gives an example
of a “hello world” program as produced by gcc.
.file "asm.c"
.section .rodata.str1.1 ,"aMS",@progbits ,1
.LC0:
.string "hello world!"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp ;1111 1111 1111 0000
subl $16, %esp
movl $.LC0, (%esp)
call puts
movl $0, %eax
leave
ret
Listing A.1: Hello world in AT&T assembler syntax.
main:
push ebp
mov ebp, esp
sub esp, 8
and esp, -16 ;1111 1111 1111 0000
sub esp, 16
mov DWORD PTR [esp], .LC0
call puts
mov eax, 0
leave
ret
Listing A.2: Hello world in Intel assembler syntax.
BYTE 1 byte
WORD 2 bytes
DWORD 4 bytes (double word)
QWORD 8 bytes (quad word)
Intel syntax orders the operands completely in reverse from the at&t con-
vention. The first operand is the destination, the second operand is the source.
Dereferences are denoted by [ ]. Listing A.2 gives a sample of the same “hello
world” program as in Listing A.1 rewritten in Intel syntax.
A.4 Flags
Suppose we have a conditional statement in C such as if(x == 0) { ... } which
we could translate into x86 using the compare and jump-if-equals instructions as:
cmpl $0, %eax
je .next
; ...
.next:
One thing that is not immediately apparent in the code is how the branch “knows”
the result of the previous compare instruction. The answer is that the compare
instruction has a side-effect: It sets the %eflags register based on the result of the
comparison.
The %eflags register is a collection of single-bit boolean variables that represent
various pieces of state beyond the normal result. Many instructions modify %eflags
as a part of their operation. Some arithmetic instructions like addition set flags if
they overflow the bounds of the destination. The conditional jumps consume the
state of various flags as the condition on which to branch. In the above example,
the jump-equals instruction actually checks the value of the special zero flag (ZF)
that is part of %eflags. In fact, the je instruction is actually a pseudonym for the
jz instruction: jump if the zero flag is set.
The side-effect of an operation setting flags can lead to confusing code. Consider
the listing:
test %eax, %eax
je .next
; ...
.next:
This is functionally equivalent to the version that used cmpl above. The test
instruction computes the bitwise-AND of the two arguments, in this case both are
the register %eax. Since anything AND itself is going to be itself, this seems to be a
no-op. But test takes the result of that AND and sets the ZF based on it. To learn
about these side-effects, it is always handy to have the instruction set manual nearby.
A good compiler will probably generate the listing that uses test rather than
cmp, since the immediate 0 takes up extra bytes in the instruction encoding that
are not needed for test. The smaller code is generally preferred for performance
(caching) reasons.
Code running at a given privilege level can access anything at the levels below it.
This allows the os to modify anything and to be protected from malicious
or buggy user programs.
B | Debugging Under gdb
Often you will want to be able to examine a program while it executes, usually to
find a bug or learn more about what a program is doing. A debugger is a program
that gives its user control over the execution and data of another program while
running.
With the very first programs a new programmer writes come the first problems,
known as “bugs.” These are logical errors that the compiler cannot check. To
track down these bugs, a concept known as debugging, beginners often use print
statements. While such statements certainly work, this approach is often tedious
and time-consuming due to frequent recompiling and re-execution. Adding print
statements can also mask bugs. Since these are additional function calls, they add
legitimate stack frames, so an out-of-bounds array access that once caused a
segmentation violation may instead silently corrupt those frames. With multithreaded programs, print statements can
change timings and context switch points resulting in a different execution order
that may hide a concurrency issue or race condition.
Print statements generally come in two kinds: The “I’m here” variety, which
indicates a particular path of execution, and the “x is 5” variety that examine the
contents of variables. The first type is an attempt at understanding the decisions
that a program makes in its execution, i.e., what branches were taken. The path of
a program through its flowchart representation is known as its control flow. The
second type explores data values at certain points of execution.
We can do both of these things with a debugger, without the need to modify the
source code. We may, however, choose to modify the executable at compile-time
to provide the debugger with extra information about the program’s structure and
correspondence to the source. Remember that a compiled executable has only
memory locations and machine instructions. Gone are symbols like x or count for
variables; all that remains are registers and memory addresses. For a debugger to
be more helpful, we can choose to add extra information to the executable while
compiling, typically by passing the -g flag to gcc. A breakpoint can then be
specified in one of three ways:
Function Name Execution will stop before the specified function is executed.
Line Number Execution will stop before the specified line of code is executed.
Absolute Address Execution will stop before the instruction at that address is exe-
cuted.
The first two specifications require the executable to have additional informa-
tion that may not necessarily be there. The line number information requires the
executable to have been built with debugging information. The function names are
usually part of the symbol table even without a special compilation but the symbol
table may be “stripped” out after compilation. Specifying an absolute address always
works, but gives no high-level language support.
Running gdb is as simple as specifying an executable to debug on the command
line. If your program requires command line arguments, they can be specified by
using the --args command line option, or by issuing the set args command once
gdb has started up.
Once in gdb, the command to place a breakpoint is break, which can be abbre-
viated just by typing b. For example, a breakpoint could be placed at the main()
function by typing b main. When the program is run via run or r, the program
will immediately stop at the main() function. If there is debugging information, a
breakpoint can be set at a particular file and line number, separated by a colon: b
main.c:10 sets a breakpoint at line 10 in the main.c file.
It is also possible to put a breakpoint at an arbitrary instruction by specifying its
address in memory. The syntax is b *0x8048365, where the hexadecimal number
needs to be the start of an instruction. It is up to you to ensure that this address
is valid. If it is not aligned to the start of an instruction, the program might crash.
Section 8.2.2 gives some insight on how breakpoints are implemented and why
placing a breakpoint in the middle of a multibyte instruction would be catastrophic
to the program.
Once the program is stopped, you will probably want to examine some data, the
topic of the next section. One useful thing to check, however, is what the call stack
contains, i.e., what function calls led to the current place. This is called a backtrace
and can be seen via the backtrace (abbreviated back or bt) command.
When you are finished with your examination, you have either found your bug
and want to stop debugging, or you will need to continue on. Typing run or r
restarts the program from the beginning, and the quit command exits gdb. If
instead you want to continue from where the program stopped, issue the continue
command or its shortcut c. Continuing runs the program until it hits another
breakpoint or the program ends, whichever comes first.
You may also want to step through the execution of the program, as if there were
a breakpoint at each line. If the program was built with debugging information, you
can use the commands next and step. These two commands are identical except
for how they behave when they encounter a function call. The step command will
go to the next source line inside the called function. The next command will skip
over the function call and stop at the source code line immediately following the
call. In other words, it will not leave the current function. Both step and next can
be abbreviated with their first letter.
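As an illustration, suppose execution is stopped at a line that calls a hypothetical compute() function. Stepping enters it:

(gdb) s
compute (n=5) at main.c:4
4         return n * 2;

whereas next would instead have stopped at the line after the call, still in the caller:

(gdb) n
14        printf("result is %d\n", result);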
If the source code is not available and the program was not built with debugging
information, step and next cannot be used. However, there are parallel commands
for operating directly on the machine instructions. The stepi command goes to
the next machine instruction, even if that is inside a separate function, whereas
nexti skips to the next instruction following a call without leaving the current
function. The abbreviation for stepi is si and for nexti is ni.
Each of the above control-flow commands (i.e., continue, next, step, nexti,
and stepi) takes an optional numerical argument indicating a repeat count. The
particular operation is performed that many times before control is returned to
the debugger.
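For example, next 5 performs five next operations before stopping, and continue 3 resumes execution while ignoring the next two hits of the current breakpoint (the exact wording of gdb’s response may vary by version):

(gdb) next 5
(gdb) continue 3
Will ignore next 2 crossings of breakpoint.  Continuing.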
The technique of last resort is to interrupt the program with a SIGINT signal
(see Section 8.2) by pressing ctrl+c. The signal will be handled by gdb, which
will return control to the user. One word of caution, however: it may be somewhat
surprising where execution has stopped (it could be deep inside a chain of library
calls), so it may be helpful to use back to get a backtrace and possibly some
locations for regular breakpoints.
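Such an interruption might look like the following sketch, where the program was stopped deep inside a system call; the addresses and symbols are illustrative:

^C
Program received signal SIGINT, Interrupt.
0xb7eee424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7eee424 in __kernel_vsyscall ()
#1  0xb7e2f1d3 in read () from /lib/libc.so.6
#2  0x080483c1 in main () at main.c:12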
B.2 Examining Data

Once stopped, the primary command for inspecting data is print (abbreviated p),
which evaluates an expression and displays the result. If the program was built
with debugging symbols, you can write your expressions in terms of actual program
variables to see what they contain. If your program is without such symbols, you
can always look at register values. For example, on an x86 platform, you could
display the contents of the register %eax by prefixing its name with a dollar
sign ($), like so:
(gdb) print $eax
$1 = 10
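Because print accepts C expressions, casts and dereferences work as expected. Continuing the session above (the pointer variable ptr is hypothetical):

(gdb) print *ptr
$2 = 42
(gdb) print (char) $eax
$3 = 10 '\n'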
With the ability to use casts and dereferences, the print command is likely all
that you need. However, the relative frequency with which you will want to look
at the contents of some memory locations is high enough that there is a dedicated
examine command, x. With the x command, the specified argument must be an
address. By default, the contents are dumped as a hexadecimal number in the
machine’s native word size. It is possible to also specify a format, which acts as a
typecast for the data. The type is specified by a single letter code following a forward
slash. For instance:
(gdb) x 0x8048498
0x8048498 <_IO_stdin_used+4>: 0x6c6c6548
(gdb) x/d 0x8048498
0x8048498 <_IO_stdin_used+4>: 1819043144
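Two more interpretations of the same address, using /x and /s, round out the example. The four bytes 0x48 0x65 0x6c 0x6c are the ASCII codes for “Hell”, so the full string shown here is a plausible guess rather than guaranteed output:

(gdb) x/x 0x8048498
0x8048498 <_IO_stdin_used+4>: 0x6c6c6548
(gdb) x/s 0x8048498
0x8048498 <_IO_stdin_used+4>: "Hello, World!"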
Note that all four x commands operate upon the same address, but each applies a
different interpretation to the data. With /d, the number is printed in decimal
rather than the default hexadecimal (also obtainable explicitly via /x). Specifying
/s treats the address as the start of a C-style string and prints characters until
a null character is encountered.
The above example also illustrates a few of the output features of gdb. The
first column is the address that is being examined, but in between the < and >,
gdb attempts to map this address back to the nearest entry in the symbol table.
In some cases, this can be quite useful, since with debugging symbols, individual
variable names will be identified even when you know only an address (say from
the contents of a pointer). Without full debugging symbols, or with a stripped
executable, the reported symbol might not be correct, so common sense must always
be used when interpreting the output.
B.3 Examining Code

The x command can examine code as well as data. The /i format interprets memory
as machine instructions and disassembles them. Since the address of the next
instruction to execute is held in the instruction pointer register, a particularly
useful invocation on x86 is x/i $eip.
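In a session this might look like the following; the address, symbol, and instruction are illustrative:

(gdb) x/i $eip
0x8048365 <main+17>:   mov    $0x0,%eax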
B.4 Terms and Definitions

Backtrace A list of function calls that led to the current call; a stack dump.
Breakpoint A location in code where execution should stop or pause, usually used
to transfer control to a debugger.
Control Flow The path or paths possible through a region of code as a result of
decision (control) structures.
Debugger A program that controls and examines the execution and data of another
program.
Index

A
Activation record 41, 49
Address 2, 7
Address space 37, 39, 78
Alignment 42, 49
API 98, 104
AT&T Syntax 108

B
Back patching 28
Back-patching 23
Backtrace 115, 119
Berkeley Sockets 98, 101
Best Fit 56
Bitmap 50, 51, 65
Blocked 79, 83
Breakpoint 114, 119
BSS section 27
Buddy Allocator 50, 64
Buffer-overrun 49

C
C Standard Library 20
Call table 34, 36
Callee-saved 41, 49
Caller-saved 41, 49
Calling convention 42, 49
CISC 106
Coalesce 58, 65
Compiler 17, 19, 29
Condition variable 94
Context 68, 76, 79
  Switch 68, 76, 79
Control flow 113, 119
CPU bound 79, 83
Critical region 89, 97

D
Data segment 26
Datagram 100
Deadlock 85, 93, 97
Debugger 74–76, 113, 119
Deduplication 23, 29
Dereference 4, 7
DLL Hell 24
Domain Name Server 101
Dynamic 8, 16

E
Exec header 26
Executable file 25, 29

F
File descriptor 68
First Fit 56
Fragmentation
  External 56, 65
  Internal 52, 65
Frame 41, 49
  Pointer 42, 49
Function pointer 30, 36, 74

G
Garbage collection 59, 65
gcc 11, 17, 42, 108, 114
gdb 74, 108, 113–119

H
Header file 18, 29
Heap 38, 50, 65

I
I/O bound 80, 83
Intel syntax 108
Interrupt 68, 76, 79
  Vector 68, 76
IP Address 99, 104

J
Jump Table 34

K
Kernel 66, 76
  Mode 68
  Space 66, 76

L
Libraries 20
Library 29
Lifetime 8, 16
Link loader 22
Linked list 50
Linker 17, 20–25, 29
Linking
  Dynamic 20, 22–24
  Dynamic loading 24–25
  Static 20–22
Loader 29, 71

M
Macro 18, 29
Magic number 26
Memory leak 59, 65
Multiprogramming 79, 84
Mutex 90

N
Network 98, 104
Next Fit 56

O
Object file 19, 29
Operating System 66, 76
Optimized 19
Ordinal 34, 36

P
Packet 99, 104
Page 38, 39
Plugins 25
Pointer 2, 7
Pointer Arithmetic 6, 7
Poll 72, 76
Colophon

This text was typeset in LaTeX 2ε using lualatex under the MiKTeX 2.9 system on
Windows 7. LaTeX is a macro package based upon Donald Knuth’s TeX typesetting
language [1]. LaTeX was originally developed by Leslie Lamport in 1985 [2]. MiKTeX
is maintained and developed by Christian Schenk [3].

The typefaces are Minion Pro, Myriad Pro, and Consolas. Illustrations were
edited in Adobe® Illustrator® and the final pdf was touched-up in Adobe® Acrobat®.

[1] http://www-cs-faculty.stanford.edu/~knuth/
[2] http://www.latex-project.org/
[3] http://www.miktex.org/