Intro To C - Module 8
Weekly Reading: "Doug Lea's Memory Allocator" (slides 58-65). Rest of presentation optional.
Mutual Recursion
C allows recursive functions. What happens if we need mutual recursion across functions?
Below is a toy example:
#include "stdbool.h"
#include "stdio.h"
bool is_even(int n) {
if (n == 0) return true;
else return is_odd(n - 1);
}
bool is_odd(int n) {
if (n == 0) return false;
else return is_even(n - 1);
}
int main() {
if (is_even(22)) printf("22 is even.\n");
return 0;
}
It doesn't compile. When C was invented, source code was read in a single pass, so type
signatures had to be declared before use. The solution to this is to use a forward declaration.
The compiler doesn't need to know everything upfront, just the type signatures. If the line:

    bool is_odd(int n);

is inserted in the code above, anywhere above is_even, the code will compile and run properly.
Header Files
We can now explain why the standard library headers use the .h suffix instead of the .c suffix
you've used in the files you've created thus far.
When you #include <stdio.h>, instead of <stdio.c>, you are telling the compiler to include
a header file that declares the type signatures of the stdio functions. This gives the compiler
enough information for separate compilation of your source file—the implementation isn't
necessary until later, when object files are linked together to make an executable.
You will also see, on C and C++ projects, this pattern:
#ifndef MYLIB_H
#define MYLIB_H
/* ...declarations go here... */
#endif
The reason for this is to avoid redundant #includes—included files may also include files, filling
out the transitive closure, creating a risk of overlap. The #ifndef ("if not defined") guard is
used, as well as a #define of the same label, to ensure that each header file is included only
once.
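For example, a header for the is_even/is_odd pair above might look like this (the file name
parity.h is hypothetical):

#ifndef PARITY_H
#define PARITY_H

#include <stdbool.h>

/* Type signatures only; the definitions live in a .c file. */
bool is_even(int n);
bool is_odd(int n);

#endif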
The nuances of memory allocation are beyond the scope of this course, as there are dozens of
strategies and tradeoffs, and it is still a field of active research, but we'll summarize the basic
problem. Let us use, as a model for our heap, the address space [0...19].
The program requests a block (A) of size 5, so we give it [0...4]. Those cells are unavailable;
there are 15 left:
AAAAA...............
Next, it requests a block (B) of size 13, so we give it [5...17]; then it requests a block (C) of size 1,
so we give it [18].
AAAAABBBBBBBBBBBBBC.
Later, the program frees B, making those 13 cells available again:
AAAAA.............C.
Next, it requests block D of size 3, which we place at [5...7], and E of size one, at [8]:
AAAAADDDE.........C.
Then it frees D:
AAAAA...E.........C.
Now, it tries to allocate a block of size 10. The resources are there—13 of 20 memory cells are
unused—but the largest contiguous block has size 9. The heap is in a state of fragmentation.
This is what modern memory allocators do everything they can to avoid.
One contributing factor to this is that we have small and large blocks sharing the same space.
Long-lived small blocks sit on the heap, making it impossible to find space for big ones. If all
blocks allocated were of the same size—clearly, this is not a practical solution—there would be
no risk of fragmentation. Still, if we regularize the size of allocations—say, we make it a policy to
round up to the nearest power of 2, so that a request for 859 bytes receives a block of size
1024—we can make allocation more manageable, at the cost of some inefficiency. For this
reason, the block you are given by malloc may be larger than what you requested—of course,
you should never depend on this.
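As a sketch of that rounding policy (the function name round_up_pow2 is just for illustration),
a request can be rounded up to the next power of 2 like this:

#include <stddef.h>

/* Illustrative only: round a request up to the next power of 2.
   For example, 859 -> 1024, and 128 stays 128. */
static size_t round_up_pow2(size_t n) {
    size_t p = 1;
    while (p < n) p <<= 1;   /* double until p is at least n */
    return p;
}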
There are a number of intersecting resource issues here. Physical memory is expensive, shared
by the system, and managed by a finely-tuned machine called an operating system. What we're
discussing here, though, is fragmentation in the address space. On a modern, 64-bit machine,
the user address space is very large—2^47 bytes (128 TB) on most modern machines. A process
that uses physical memory inefficiently by overallocating harms the system, but not necessarily
itself. One that endures address space fragmentation, on the other hand, can suffer allocation
failures—fatal, for most programs—even when the physical resources are available. We want
our programs to be good neighbors and not waste memory, but a little bit of inefficiency might be
tolerable. Also, we want malloc to be fast—some programs call it millions of times per
second—so we cannot afford complex algorithms, and we prefer to make system calls—such as
sbrk, which is used to request more heap space—as rarely as possible. For example, it is best
for our program and the OS that it not make a system call every time it allocates 128 bytes.
The solution modern allocators use is to request memory from the operating system (a system
call) rarely and in large blocks, preallocating some of it into free lists that hold small blocks of
predetermined sizes—say: 8, 12, 16, 24, 32, 40, 48, 64, 80, 96, 128, and up. Doubly-linked lists
are a common data structure for these lists, but not the only option. The program is therefore
able to quickly find a block of approximately the size requested; when free is called, blocks are
returned to the free lists and can be handed out again. Some allocators also consolidate small
blocks into larger ones, or split large ones into smaller ones, as demands on the heap evolve.
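As a toy illustration of the size-class idea (not how any real allocator is organized internally),
a request can be mapped to the smallest size class that fits:

#include <stddef.h>

/* Toy size classes, loosely modeled on the list above. */
static const size_t size_classes[] = {8, 12, 16, 24, 32, 40, 48, 64, 80, 96, 128};
#define NUM_CLASSES (sizeof(size_classes) / sizeof(size_classes[0]))

/* Return the smallest class that fits a request, or 0 if the request is
   "large" and should be handled some other way. */
static size_t pick_size_class(size_t request) {
    for (size_t i = 0; i < NUM_CLASSES; i++) {
        if (request <= size_classes[i]) return size_classes[i];
    }
    return 0;  /* too big for the small-block free lists */
}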
No harm is done if the program suffers a few kilobytes of overhead for small blocks, but large
ones require more care. With hundreds of programs running at any time on a given system, we
can't afford to have each one speculatively allocate 800,000,000 bytes because it might need
space for a 10,000-by-10,000 matrix. So, for large (128 KB+) allocations, it's more common to
use mmap to create, in essence, an in-memory temporary file and, on a system with virtual
memory, use demand paging to ensure that physical memory pages are used only if they are
needed.
This only scratches the surface of memory allocation and the tradeoffs involved—for more
depth, take the operating systems classes.
Although malloc is not truly nondeterministic, it is usually considered to be so, in terms of
reliability and especially performance. System-wide events with no connection to your program
can cause allocation to fail (return NULL) or take longer than expected. This issue exists in
garbage-collected languages as well—the "embarrassing pause" can occur at times that are
very difficult to predict—but it is, in higher-level languages, mostly hidden from you. That said,
people writing high-performance code in any language prefer to avoid allocation except when
absolutely necessary.
We've seen in Module 7 how to create resizable arrays. Either a resizable or a fixed-size array
can be used to implement the stack data structure, as you've already done. What about a
queue? There are a variety of solutions, but one way to create a fixed-size FIFO queue is with a
ring buffer, so called because it uses modular arithmetic to "wrap around" as on a ring.
An implementation is below:
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    int* buf;        /* backing array of capacity ints */
    size_t front;    /* index of the next item to dequeue */
    size_t end;      /* index of the next unused slot */
    size_t capacity;
    bool is_full;    /* distinguishes full from empty when front == end */
} queue;

queue* queue_new(size_t s) {
    queue* q = malloc(sizeof(queue));
    int* buf = malloc(s * sizeof(int));
    if (!q || !buf) { // allocation failure
        exit(1);
    }
    q->front = 0;
    q->end = 0;
    q->capacity = s;
    q->buf = buf;
    q->is_full = false;
    return q;
}

void queue_delete(queue* q) {
    free(q->buf);
    free(q);
}

size_t queue_size(queue* q) {
    if (q->is_full)
        return q->capacity;
    else if (q->end < q->front)
        return q->capacity + q->end - q->front;
    else
        return q->end - q->front;
}
The queue keeps track of two indices: end and front, the first of which points to an unused
slot—where the next element will go—with the latter signifying the next item to be dequeued.
We use bool returns to signify the exceptional cases of either an empty (when pulling) or a full
queue (when adding) and we use modular arithmetic to ensure "wrap-around" behavior.
For example, if q->capacity were 8; q->front, 5; and q->end, 2, then the buffer's semantic
contents would be {q->buf[5], q->buf[6], q->buf[7], q->buf[0], q->buf[1]}.
Unfortunately, q->front and q->end don't always tell us everything we need to know. There are
two cases in which those two indices will be equal—when the queue is empty, and when it is
full. In order to differentiate the cases, we must add the q->is_full field, which adds overhead
but cannot be avoided if we want to use the queue's full capacity.
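For concreteness, add and pull operations along the lines described above might look like the
following sketch (the names queue_add and queue_pull, and the out-parameter for pulling, are
assumptions):

/* Sketch: returns false if the queue is full, true on success. */
bool queue_add(queue* q, int x) {
    if (q->is_full) return false;
    q->buf[q->end] = x;
    q->end = (q->end + 1) % q->capacity;       /* wrap around */
    if (q->end == q->front) q->is_full = true;
    return true;
}

/* Sketch: returns false if the queue is empty; otherwise stores the
   dequeued value in *out and returns true. */
bool queue_pull(queue* q, int* out) {
    if (q->front == q->end && !q->is_full) return false;  /* empty */
    *out = q->buf[q->front];
    q->front = (q->front + 1) % q->capacity;   /* wrap around */
    q->is_full = false;
    return true;
}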
I should also note that we might be tempted to coalesce the latter two branches in the
queue_size function, like so:

    return (q->end - q->front) % q->capacity;
Why can't we? The first issue is that these numbers, as the data structure is designed, are all
size_t, which is unsigned, so that if q->end - q->front were meant to go negative, you
would instead get a large positive number and incorrect results. It might seem like using a
signed type, such as int64_t, would fix this, but it causes a different problem because, in C
semantics, -1 % 6 is -1, not 5. In order to guarantee a positive modulus, we are required to
handle the cases separately.
The coalescence above would be safe, however, if the queue capacity were restricted to powers
of 2, and if the % operator were replaced by a bit mask, e.g., x & 0x1F instead of x % 32. Of
course, this would make the structure less flexible.
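For example, with a power-of-2 capacity the last two branches can be merged safely (a sketch
using the same queue struct as above):

/* Only valid when q->capacity is a power of 2. */
size_t queue_size_pow2(queue* q) {
    if (q->is_full) return q->capacity;
    return (q->end - q->front) & (q->capacity - 1);  /* bit mask instead of % */
}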
C does not have exceptions, and the standard library does whatever it can to keep your
program alive. Therefore, there is no FileNotFound exception; fopen will, instead, give you a
null pointer if it cannot open a file—this puts the onus on you to check the FILE* that is returned
to you and handle the situation. You'd probably like to know why you got that null pointer.
To do this, you check errno, a global int variable that contains either zero—nothing has gone
wrong yet—or a code for the last error registered. To use it, you must #include <errno.h> .
The perror function interprets these codes: given a string you supply, it prints that string
followed by a human-readable description of the last error, as in the example below.
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    const char* filename = "does_not_exist.txt";
    errno = 0;
    FILE* f = fopen(filename, "r");
    if (f != NULL) {
        // do stuff
        return 0;
    } else {
        printf("Error code is %d\n", errno);
        char error_buf[200];
        sprintf(error_buf, "(%s:%d) Could not open %s",
                __FILE__, __LINE__, filename);
        perror(error_buf);
        exit(1);
    }
}
There are a few things here that merit discussion. We set errno to 0 before calling fopen to
avoid getting some other error—in this particular case, this is unnecessary, since we're at the
beginning of the program, but it's often good practice, for the sake of being sure of why errno
was set.
Using sprintf, we build a string that also includes the __FILE__ and __LINE__ macros to
indicate (approximately) where in the source code the error was detected. This string is passed
into perror for the purpose of appending a useful error message inferred from the error code.
We could, in this case, return 1 since we are in main, but instead we use a call to exit, which
has the general benefit of exiting the program from anywhere.
$ ./error_no
Error code is 2
(error_no.c:15) Could not open does_not_exist.txt: No such file or directory
The goto statement is one you should almost never use, but it is provided and is sometimes
useful. If you place a label in a function, you can goto it from elsewhere in the function
body—you may jump backward or forward in the statement sequence.
Is this bad? Not necessarily, but it can be confusing. We've already seen the switch/case
statement, which requires the user to explicitly break unless they want fallthrough semantics.
Although "spaghetti code" these days refers more generally to inscrutable software, the epithet's
original meaning was goto-driven code in which control can (as it does in a human
organization) transfer from anywhere, to anywhere, without warning. This can make reasoning
about a program's execution nearly impossible.
Below is a goto-less program that sums the primes less than N for given N, using 20 as a
default value.
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
bool is_prime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; d++) {
        if ((n % d) == 0) return false;
    }
    return true;
}
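The rest of the program is a straightforward main; a sketch consistent with the description
above (the command-line handling is an assumption) is:

int main(int argc, char** argv) {
    int n = (argc > 1) ? atoi(argv[1]) : 20;  /* default N of 20 */
    int acc = 0;
    for (int i = 2; i < n; i++) {
        if (is_prime(i)) {
            acc += i;
        }
    }
    printf("Sum is %d\n", acc);
    return 0;
}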
This code is simple enough, and it works, but if you want to be clever—which, let's be clear, you
usually shouldn't—you can remove some of those if (...) { } blocks by replacing them with
branches.
bool is_prime(int n) {
    if (n < 2) goto not_prime;
    for (int d = 2; d * d <= n; d++) {
        if ((n % d) == 0) goto not_prime;
    }
    return true;
not_prime:
    return false;
}
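main can be rewritten in the same spirit; one possible shape (a sketch, not the original code) is:

int main(int argc, char** argv) {
    int acc = 0;                          /* declared up front; see below */
    int n = (argc > 1) ? atoi(argv[1]) : 20;
    for (int i = 2; i < n; i++) {
        if (!is_prime(i)) goto done_for;  /* skip the addition for non-primes */
        acc += i;
done_for: ;                               /* a label attached to a null statement */
    }
    printf("Sum is %d\n", acc);
    return 0;
}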
We have to move some variables around, because the compiler gets antsy about possibly
undeclared values when arbitrary control flow is used—for example, I had to move acc to an
earlier position in main in order to placate it.
Since labels must attach to statements, we use a "null statement" for the done_for label at the
end of the loop. But we can do "better" and remove the for-loops as well!
bool is_prime(int n) {
    int d = 2;
    if (n < 2) goto not_prime;
start_prime_for:
    if (d * d > n) goto is_prime;
    if ((n % d) == 0) goto not_prime;
    d++;
    goto start_prime_for;
is_prime:
    return true;
not_prime:
    return false;
}
There is still some room for "improvement." We are using is_prime, a function call. If we want
to make our code less clear, we can shove that machinery into the main function—in other
words, inline it by hand. We could put it at the beginning or the end, but for maximum lulz, I'm
going to jam it right in the middle for no particular reason. I also renamed variables and labels,
and threw in a magic constant, to make the source code even more unmaintainable.
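A sketch of what hand-inlining can look like, with arbitrary names and labels (not the original
ones), is below:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    int z = 0;                                 /* the accumulator, unhelpfully renamed */
    int q = (argc > 1) ? atoi(argv[1]) : 20;
    int i = 2;
top:
    if (i >= q) goto done;
    int d = 2;                                 /* the primality test, inlined by hand */
trial:
    if (d * d > i) goto add;                   /* no divisor found: i is prime */
    if ((i % d) == 0) goto next;               /* divisor found: i is composite */
    d++;
    goto trial;
add:
    z += i;
next:
    i++;
    goto top;
done:
    printf("Sum is %d\n", z);
    return 0;
}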
It works...
$ ./prime_sum_with_goto 50
Sum is 328
A more "white hat" use of goto is for cleanup, especially at the end of functions in exceptional
circumstances. Usually, when an allocation failure occurs, we want to crash the
program—preferably an exit, not a segmentation fault, the latter being indicative of undefined
behavior. If our plan is to crash, we don't have to worry about resource leaks, because the OS
will reclaim the resources once the program exits. However, if we're writing a long-running
program that must survive and handle allocation failures, including partial allocations of complex
data structures, we'd like to free everything we've allocated. One way to do this is as follows:
typedef struct {
    char* field1;
    char* field2;
} record;

record* record_new() {
    record* out = malloc(sizeof(record));
    if (!out) return NULL;
    char* f1 = malloc(16);
    if (!f1) goto cleanup1; else out->field1 = f1;
    char* f2 = malloc(16);
    if (!f2) goto cleanup2;
    else {
        out->field2 = f2;
        return out;
    }
cleanup2:
    free(f1);
cleanup1:
    free(out);
    return NULL;
}
This function, in event of a partial allocation, avoids leaking memory by freeing everything it
has malloc'd before returning NULL. If the third allocation (f2) fails, it uses goto to jump to
cleanup2—which will never be reached under normal execution, due to earlier return—and
frees f1 and then out before returning a null pointer.
If you include "setjmp.h" you have access to two functions—setjmp and longjmp—that allow
nonlocal goto: you can "unwind the stack" and jump back in time. To do this, you must create a
jmp_buf, a data structure—don't worry about the internals; it's mostly register contents—that
these functions use to track the environment. When you call setjmp(jb) on such an object, you
put this environment information into jb and the original call returns 0—an important detail. If
you call longjmp(jb, n) on that buffer, you will return to the setjmp call which will, in this later
execution, return n. Noting that setjmp uses a zero return value to indicate first execution and
nonzero values to signal re-entrance, you cannot call longjmp with n equal to 0; the function will
silently change it to 1.
#include "setjmp.h"
#include "stdio.h"
int main () {
jmp_buf env;
int x = setjmp(env);
printf("setjmp was %d\n", x);
if (x == 0) {
helper(1, env);
helper(37, env);
helper(61, env);
} else {
printf("Failed with exception %d\n", x);
}
return 0;
}
Take a moment to guess what this program will print out; feel free to write it down, to test your
intuition. The answer will be given below.
$ ./setjmp1
setjmp was 0
entering helper function (1)
exiting helper function (1)
entering helper function (37)
setjmp was 2
Failed with exception 2
The first call to setjmp returns 0, which means that our if (x == 0) block will execute;
helper(1, env) then goes off without any issues, printing on exit. Our call to helper(37,
env), because our unhelpful helper dislikes that number, results in a longjmp with return
code 2, sending us back to the setjmp statement in main—we have exited helper without
returning, and are back in the calling function. This time, x is set to 2, so we take the else
branch before exiting.
#include <setjmp.h>

int f(int x, jmp_buf env) {
    setjmp(env);        /* records an environment inside f's stack frame */
    return x * x;
}

int main() {
    jmp_buf env;
    f(17, env);
    longjmp(env, 34);   /* jumps back into f after it has returned: undefined behavior */
    return 0;
}
What's wrong with this? Remember that every function call creates a stack frame for its local
variables, including copied parameters and the return address, that will be deallocated on exit.
The setjmp call sets the jump buffer's instruction pointer to a place at which a parameter called
x is needed—in the first trip through f, this is no problem. When f exits, the stack frame in
which x existed is deallocated—in the second trip, the program relies on a value that no longer
exists. Undefined behavior results.
Just as you cannot return a pointer to a function's local variable, because the pointer will be
dead as soon as the function exits, you can never jump forward (in stack space) into a function
call that has already exited—only back.
You'll probably never use setjmp and longjmp—I discuss them for completeness, so that if you
encounter them, you will know what they are.
You've encountered some of these functions before, but it's worth revisiting the string functions,
because you'll be using them a lot. These operate on null-terminated character strings, and it is
not safe to use them on pointers if you do not know that there will be a null terminator (recall
gets—an adversary can supply an arbitrarily long string, triggering buffer overflow.)
strlen: returns the length of a string, not including the null terminator.
strlen("Hello World") —> 11 (but you need a char[12] to hold it, for the \0)
strcpy: copies the second string (source) into the first (destination). The destination buffer must
be large enough to hold the source, and they cannot overlap.
strcat: copies the second string (source) at the end of the first—appending, rather than
overwriting. The destination buffer must be large enough to hold the combined string, and they
cannot overlap.
strcmp: compares two strings, returning 0 if they are equal, a negative value if the first comes
earlier in lexicographic order, and a positive value if it comes later.
strchr: returns a pointer to the first instance of a character within a string, or NULL if there are
none.
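For example, combining strcpy and strcat (the buffer here is sized to hold the result):

char greeting[32];              /* plenty of room for "Hello, World" plus '\0' */
strcpy(greeting, "Hello, ");    /* greeting now holds "Hello, " */
strcat(greeting, "World");      /* greeting now holds "Hello, World" */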
Finally, there exist "counted" versions of three of the functions above—strncmp, strncpy, and
strncat—that should be used when you can't guarantee an upper bound on sizes: they take a
size argument and stop at that limit rather than overflowing a buffer. As an example of working
within a size bound, suppose we have a function, build_string, that concatenates a list of
strings into a fixed-size buffer:
char buf[128];
char* list_of_strings[] = {"cats ", "dogs ", "eels ", "foxes ",
"goats ", "horses ", "ibexes"};
int r = build_string(buf, 128, 7, list_of_strings);
If the concatenated string is too large for the buffer, the function returns false and the truncated
output will not be null terminated; if it returns true, successful concatenation—in O(N) time—has
been achieved.
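One way to write such a build_string (a sketch; the exact implementation isn't specified above)
is to track the number of bytes written and copy each piece with memcpy, which keeps the total
work O(N):

#include <stdbool.h>
#include <string.h>

/* Sketch: concatenate count strings into buf, which holds bufsize bytes.
   Returns true on success; on overflow, returns false and leaves a
   truncated, unterminated result in buf. */
bool build_string(char* buf, size_t bufsize, size_t count, char** strings) {
    size_t used = 0;                                  /* bytes written so far */
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(strings[i]);
        if (used + len >= bufsize) {                  /* no room for this piece plus '\0' */
            memcpy(buf + used, strings[i], bufsize - used);
            return false;
        }
        memcpy(buf + used, strings[i], len);
        used += len;
    }
    buf[used] = '\0';
    return true;
}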
When you're working with generic binary data, zero bytes aren't null terminators—they're just
ordinary bytes, legal values like any other, so you cannot use string functions. Instead of
strchr, use memchr; instead of strcmp, use memcmp. These functions do not treat zero bytes as
terminators.
Whenever you want to copy a block of data—an array, a struct—of known size, use memcpy.
Like strcpy, it takes the destination as its first argument; the third, a size_t, is the number of
bytes to be copied. For example, we have below a function that copies 100 records from array a
into b:
int main() {
    record* a = malloc(100 * sizeof(record));
    record* b = malloc(100 * sizeof(record));
    memcpy(b, a, 100 * sizeof(record));   /* destination first, then source, then byte count */
    return 0;
}
As with strcpy, the source and destination arrays are not allowed to overlap. But if they do
overlap, there is a function called memmove that, while not as fast, can safely be used in this
case.
Last, but not least, you will sometimes need to set every byte in a block of memory to a specific
value. For that purpose, there's memset, which takes a pointer to a block, a character value (an
int, but interpreted as an unsigned char, or byte), and the block's size. For example, if we
wanted to zero a's memory in the example above, we would use:

memset(a, 0, 100 * sizeof(record));
Module 8 Writeup
8.1. Write 100–200 words on your PSI interpreter to answer the following: Which features have
you successfully implemented? How do you know—or why do you believe—that your
implementation is correct? Are there any features you haven't been able to implement? Are
there bugs? If so, discuss them. How did you find them, what do you think is going wrong, and
how might you go about fixing them?
As long as you have a complete (not necessarily bug-free) implementation and thoughtful
explanations of what works and what doesn't, you'll get at least 90% credit for this assignment.