08 Structs
------------------------------------------
Arrays are but one data structure supported by the C language; they limit us to
contiguous elements of the same type, and only allow us to access those elements
by index (rather than by something more descriptive).
In this lecture, we’ll learn how to model other in-memory data structures in C.
Structs
-------
struct Point {
    int x;
    int y;
};
The above struct definition creates a new type named "struct Point", which we
can use to declare variables and access their fields like so:
struct Point pt;
pt.x = 2;
pt.y = 3;
Note that C syntax requires us to write "struct" before "Point" in the type name.
Structs are so-called because they describe the in-memory structure of a type.
This is what pt looks like in memory:
+---+---+---+---+---+---+---+---+
|    .x = 2     |    .y = 3     |
+---+---+---+---+---+---+---+---+
|------- struct Point pt -------|
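The struct definition of struct Point tells the compiler to find the .x field at
byte-offset 0, and the .y field at byte-offset 4 (because .y comes right after
.x, which occupies 4 bytes here). So writing:
&pt.y
evaluates to the address 4 bytes past the start of pt, i.e., the same address as
(int *)((char *)&pt + 4).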
Note that the struct layout (i.e., the byte offsets) may differ depending on
what hardware and platform you compile this program for. For example, the
layout may look different on a platform where ints are 8 bytes.
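We can check the layout our own compiler chose using sizeof and the standard
offsetof() macro from <stddef.h>. Below is a minimal sketch; print_point_layout()
and y_offset_matches() are hypothetical helper names, and the numbers printed
depend on your platform (with 4-byte ints you would typically see offsets 0 and 4
and a size of 8).

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>   /* printf */

struct Point {
    int x;
    int y;
};

/* Hypothetical helper: print where the compiler placed each field. */
void print_point_layout(void)
{
    printf("sizeof(struct Point) = %zu\n", sizeof(struct Point));
    printf("offset of .x = %zu\n", offsetof(struct Point, x));
    printf("offset of .y = %zu\n", offsetof(struct Point, y));
}

/* Hypothetical helper: returns 1 if &pt.y is exactly
 * offsetof(struct Point, y) bytes past the start of pt. */
int y_offset_matches(void)
{
    struct Point pt;  /* only addresses are taken; fields are never read */
    return (char *)&pt.y == (char *)&pt + offsetof(struct Point, y);
}
```

The first field of a struct always sits at offset 0; everything after it is up
to the compiler and platform.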
Structs can also have fields of different types. For example, the following
struct MyString allows us to quickly access the length of a string without
scanning it for a null terminator byte:
struct MyString {
    unsigned int len;
    char *str;
};
struct MyString s;
s.str = "abc";
s.len = strlen("abc");
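On a typical 64-bit platform (4-byte unsigned ints, 8-byte pointers), this is
what s looks like in memory:
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|   .len = 3    |   (padding)   |   .str = (points to "abc")    |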
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|--------------------- struct MyString -------------------------|
The struct occupies 16 bytes rather than 12 because C inserts padding between
struct fields to ensure that fields are correctly aligned (e.g., 8-byte pointers
should begin at addresses that are a multiple of 8). You do not need to know the
precise algorithm, but keep in mind that the size and layout of each struct are
determined and fixed at compile time.
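A minimal sketch of building a struct MyString and inspecting its padded layout.
mystring_make() and print_mystring_layout() are hypothetical helpers, not part of
any library; the values printed depend on the platform (16 and 8 on a typical
64-bit system).

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>   /* printf */
#include <string.h>  /* strlen, strcmp */

struct MyString {
    unsigned int len;
    char *str;
};

/* Hypothetical helper: wrap a null-terminated string, caching its length
 * so later code can read s.len instead of rescanning for the '\0'. */
struct MyString mystring_make(char *str)
{
    struct MyString s;
    s.str = str;
    s.len = (unsigned int)strlen(str);
    return s;
}

/* Hypothetical helper: both values are compile-time constants. */
void print_mystring_layout(void)
{
    printf("sizeof(struct MyString) = %zu\n", sizeof(struct MyString));
    printf("offset of .str = %zu\n", offsetof(struct MyString, str));
}
```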
Struct definitions may contain fields of any type, including pointers, arrays,
and other structs (as long as they don’t contain themselves):
struct TwoPoints {
    struct Point p1;
    struct Point p2;
};
struct Buffer {
    unsigned int len;
    char buf[1024];
};
struct Foo {
    struct TwoPoints pp;
    struct Foo f; // compiler error: a struct cannot contain itself
};
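Fields of nested structs are reached by chaining the . operator, and an array
field is stored inline inside the struct. A minimal sketch using the valid
definitions above; horizontal_distance() and buffer_size() are hypothetical
helper names.

```c
#include <stddef.h>  /* size_t */

struct Point {
    int x;
    int y;
};

struct TwoPoints {
    struct Point p1;
    struct Point p2;
};

struct Buffer {
    unsigned int len;
    char buf[1024];
};

/* Hypothetical helper: chain . to reach fields of the nested Points. */
int horizontal_distance(struct TwoPoints tp)
{
    int d = tp.p2.x - tp.p1.x;
    return d < 0 ? -d : d;
}

/* Hypothetical helper: the 1024-byte array lives inside the struct itself,
 * so sizeof(struct Buffer) is at least 1024 + sizeof(unsigned int). */
size_t buffer_size(void)
{
    return sizeof(struct Buffer);
}
```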
Pointers to structs
-------------------
Since structs may get very large in size (e.g., struct Buffer), it can be more
efficient to refer to them via pointers rather than copy them around by value:
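void inefficient_arg(struct Buffer buf) { ... }  // copies the whole struct (over 1 KB)
void efficient_arg(struct Buffer *buf) { ... }   // copies only a pointer
Given a pointer b to a struct Buffer, we could access its len field by writing:
(*b).len
but C provides the arrow operator as a more convenient shorthand:
b->len
This notation should be read as, "dereference b, and access the len field."
Keep in mind that this notation still performs a dereference; if b is a NULL
pointer, evaluating this expression is undefined behavior and will typically
segfault.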
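A minimal sketch of passing a struct Buffer by pointer and using ->.
buffer_append() is a hypothetical helper; we assume it should refuse (rather than
truncate) data that doesn't fit, and that the caller has initialized b->len.

```c
#include <stddef.h>  /* NULL, size_t */
#include <string.h>  /* strlen, memcpy, memcmp, memset */

struct Buffer {
    unsigned int len;
    char buf[1024];
};

/* Hypothetical helper: append s (without its '\0') to b.
 * Only the pointer b is copied into this call, not the 1 KB struct.
 * Returns 0 on success, -1 if b is NULL or s would not fit. */
int buffer_append(struct Buffer *b, const char *s)
{
    size_t n = strlen(s);
    if (b == NULL || n > sizeof(b->buf) - b->len)
        return -1;
    memcpy(b->buf + b->len, s, n);  /* b->buf is shorthand for (*b).buf */
    b->len += (unsigned int)n;
    return 0;
}
```

Because buffer_append() receives a pointer, its changes to b->len are visible to
the caller afterwards.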
Self-referential structs
------------------------
Structs cannot contain themselves, but they can contain pointers to themselves.
We can exploit this ability to implement data structures such as singly linked
lists:
struct IntListNode {
    int data;
    struct IntListNode *next;
};
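A minimal sketch of building and walking such a list on the heap. The names
int_list_push(), int_list_length(), and int_list_free() are hypothetical; we
assume the list ends at a NULL next pointer.

```c
#include <stddef.h>  /* NULL */
#include <stdlib.h>  /* malloc, free */

struct IntListNode {
    int data;
    struct IntListNode *next;
};

/* Hypothetical helper: push onto the front and return the new head.
 * Returns NULL if malloc() fails (the old list is left untouched). */
struct IntListNode *int_list_push(struct IntListNode *head, int data)
{
    struct IntListNode *node = malloc(sizeof(*node));
    if (node == NULL)
        return NULL;
    node->data = data;
    node->next = head;  /* the new node points at the old head */
    return node;
}

/* Hypothetical helper: follow next pointers until reaching NULL. */
int int_list_length(const struct IntListNode *head)
{
    int n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Hypothetical helper: free every node, saving next before each free(). */
void int_list_free(struct IntListNode *head)
{
    while (head != NULL) {
        struct IntListNode *next = head->next;
        free(head);
        head = next;
    }
}
```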
Similarly, we can implement doubly linked lists:
struct IntDListNode {
    int data;
    struct IntDListNode *next;
    struct IntDListNode *prev;
};
Or trees:
struct IntTreeNode {
    int data;
    struct IntTreeNode *left;
    struct IntTreeNode *right;
};
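Such a node can back a binary search tree. A minimal sketch, assuming the usual
BST ordering (smaller values go left, equal or larger values go right) and the
hypothetical helper names tree_insert(), tree_contains(), and tree_free().

```c
#include <stddef.h>  /* NULL */
#include <stdlib.h>  /* malloc, free */

struct IntTreeNode {
    int data;
    struct IntTreeNode *left;
    struct IntTreeNode *right;
};

/* Hypothetical helper: insert data and return the (possibly new) root of
 * this subtree; returns NULL for an empty subtree if malloc() fails. */
struct IntTreeNode *tree_insert(struct IntTreeNode *root, int data)
{
    if (root == NULL) {
        struct IntTreeNode *node = malloc(sizeof(*node));
        if (node != NULL) {
            node->data = data;
            node->left = node->right = NULL;
        }
        return node;
    }
    if (data < root->data)
        root->left = tree_insert(root->left, data);
    else
        root->right = tree_insert(root->right, data);
    return root;
}

/* Hypothetical helper: walk down one path, choosing left or right. */
int tree_contains(const struct IntTreeNode *root, int data)
{
    while (root != NULL && root->data != data)
        root = (data < root->data) ? root->left : root->right;
    return root != NULL;
}

/* Hypothetical helper: free children before their parent. */
void tree_free(struct IntTreeNode *root)
{
    if (root == NULL)
        return;
    tree_free(root->left);
    tree_free(root->right);
    free(root);
}
```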
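However, each of these node types can only hold ints. To store other types of
data, we would have to define a separate node struct for each one:
struct FloatListNode {
    float data;
    struct FloatListNode *next;
};
struct PointListNode {
    struct Point data;
    struct PointListNode *next;
};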
How do we write a generic (singly) linked list that works with any type?
In C, we can use void *:
struct Node {
    void *data; // points to the data held by this node
    struct Node *next;
};
Conventionally, the next pointer of the last node in a linked list is just a
NULL pointer. By the same convention, an empty list is represented by a NULL
pointer to its (nonexistent) first node.
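A minimal sketch of the generic struct Node list holding data of different types.
The node stores only the void * it is given, so the caller must keep the
pointed-to data alive and cast it back to the right type when reading it.
node_push(), list_length(), and list_free() are hypothetical helper names.

```c
#include <stddef.h>  /* NULL, size_t */
#include <stdlib.h>  /* malloc, free */

struct Node {
    void *data;  /* points to the data held by this node */
    struct Node *next;
};

/* Hypothetical helper: push onto the front; returns the new head,
 * or NULL if malloc() fails. */
struct Node *node_push(struct Node *head, void *data)
{
    struct Node *node = malloc(sizeof(*node));
    if (node == NULL)
        return NULL;
    node->data = data;
    node->next = head;
    return node;
}

/* Hypothetical helper: an empty list (NULL head) has length 0. */
size_t list_length(const struct Node *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Hypothetical helper: frees the nodes only; the data belongs to the caller. */
void list_free(struct Node *head)
{
    while (head != NULL) {
        struct Node *next = head->next;
        free(head);
        head = next;
    }
}
```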
Unions (optional)
-----------------
Unions are similar to structs, except all fields occupy the same memory
location. For example:
union LongDouble {
    unsigned long as_long;
    double as_double;
};
+---+---+---+---+---+---+---+---+
|     .as_long / .as_double     |
+---+---+---+---+---+---+---+---+
|------- union LongDouble ------|
On a typical 64-bit platform, where unsigned long and double are both 8 bytes,
the .as_long and .as_double fields share the same 8 bytes.
Unions aren’t used very often, but they are useful for reinterpreting the bit
pattern of one type as another, without error-prone pointer casting:
union LongDouble v;
v.as_double = 3.14;
printf("%lu\n", v.as_long);
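A minimal sketch of the same trick packaged as a function. It assumes IEEE 754
doubles, and it uses uint64_t from <stdint.h> instead of unsigned long so the
integer field is exactly 64 bits on every platform; union U64Double,
double_bits(), and print_double_bits() are hypothetical names.

```c
#include <inttypes.h>  /* PRIx64 */
#include <stdint.h>    /* uint64_t, UINT64_C */
#include <stdio.h>     /* printf */

/* Hypothetical union: both fields share the same 8 bytes. */
union U64Double {
    uint64_t as_u64;
    double as_double;
};

/* Hypothetical helper: write through one field, read through the other. */
uint64_t double_bits(double d)
{
    union U64Double v;
    v.as_double = d;
    return v.as_u64;
}

void print_double_bits(double d)
{
    printf("%g -> 0x%016" PRIx64 "\n", d, double_bits(d));
}
```

Under IEEE 754, 1.0 has a zero sign bit, a biased exponent of 0x3FF, and a zero
mantissa, so its bit pattern is 0x3FF0000000000000.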
Unions are also useful for representing data that may be of one type or another,
but never both at the same time. For instance, the following struct carries an
extra "tag" field to indicate whether the data field holds a long or a double:
struct LongOrDouble {
    union LongDouble data;
    int tag; // data.as_long when non-zero; data.as_double otherwise
};
struct LongOrDouble l;
l.tag = 1;
l.data.as_long = 1234;
struct LongOrDouble d;
d.tag = 0;
d.data.as_double = 3.14;
This coding pattern is known as a "tagged union."
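Code that consumes a tagged union should check the tag before reading a field.
A minimal sketch that converts either variant to a double; make_long(),
make_double(), and lod_to_double() are hypothetical helper names.

```c
union LongDouble {
    unsigned long as_long;
    double as_double;
};

struct LongOrDouble {
    union LongDouble data;
    int tag; /* data.as_long when non-zero; data.as_double otherwise */
};

/* Hypothetical constructors: set the tag together with the matching field. */
struct LongOrDouble make_long(unsigned long x)
{
    struct LongOrDouble v;
    v.tag = 1;
    v.data.as_long = x;
    return v;
}

struct LongOrDouble make_double(double x)
{
    struct LongOrDouble v;
    v.tag = 0;
    v.data.as_double = x;
    return v;
}

/* Hypothetical helper: read only the field the tag says is valid. */
double lod_to_double(struct LongOrDouble v)
{
    if (v.tag)
        return (double)v.data.as_long;
    return v.data.as_double;
}
```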
Unions can have fields of different sizes; the size of a union is at least the
size of its largest field, and it may be rounded up with padding to satisfy
alignment requirements.
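A quick way to see this on your own machine. union CharsOrInt is a hypothetical
example whose largest field is 5 bytes; on a typical platform with 4-byte-aligned
ints, its size is rounded up to 8 so that it stays aligned (for example, in
arrays). print_union_sizes() is likewise a hypothetical helper.

```c
#include <stdio.h>  /* printf */

union LongDouble {
    unsigned long as_long;
    double as_double;
};

/* Hypothetical union: largest field is 5 bytes, strictest alignment is int's. */
union CharsOrInt {
    char chars[5];
    int i;
};

void print_union_sizes(void)
{
    printf("sizeof(union LongDouble) = %zu\n", sizeof(union LongDouble));
    printf("sizeof(union CharsOrInt) = %zu\n", sizeof(union CharsOrInt));
}
```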
Function pointers
-----------------
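Just as ordinary pointers refer to variables in the stack, heap, or data section
of a program's memory, function pointers refer to functions, whose executable
machine code lives in the code section:
+--------------------+
|                    |
|       stack        |  local variables
|                    |
+--------------------+
          |
          v

          ^
          |
+--------------------+
|                    |
|        heap        |  malloc()ed variables
|                    |
+--------------------+
+--------------------+
|                    |
|        data        |  global variables
|                    |
+--------------------+
+--------------------+
|                    |
|        code        |  executable machine code
|                    |
+--------------------+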
For security reasons, the OS typically prevents us from writing to the code section,
and there’s usually nothing interesting for us to read from there either.
However, function pointers are useful because they allow us to pass functions to
other functions, and help us write generic code in C.
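A minimal sketch of passing one function to another. map_ints(), square(), and
negate() are hypothetical names used only for illustration; the parameter
declaration int (*f)(int) makes f a pointer to a function that takes an int and
returns an int.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical helper: apply f to every element of arr in place. */
void map_ints(int *arr, size_t n, int (*f)(int))
{
    for (size_t i = 0; i < n; i++)
        arr[i] = f(arr[i]);  /* calling f(x) is the same as (*f)(x) */
}

int square(int x) { return x * x; }
int negate(int x) { return -x; }
```

The same map_ints() works with any function of the right type, which is exactly
the kind of reuse that generic code needs.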
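The usual motivating example is qsort(); quicksort is tricky enough that we want
to implement it generically for all types of arrays, but we can’t implement it
without knowing how to compare elements of the array. So, qsort() asks the caller
to provide a pointer to the function used to compare two elements. Its prototype,
declared in <stdlib.h>, is:
void qsort(void *base, size_t nmemb, size_t size,
           int (*compar)(const void *, const void *));
The last parameter declaration,
int (*compar)(const void *, const void *)
says that compar is a pointer to a function that receives two arguments (whose
types are both const void *), and returns an int.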
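A comparison function must return a negative int, zero, or a positive int when
its first argument is less than, equal to, or greater than its second. For
example, we can write a compare_int() for arrays of ints and a compare_double()
for arrays of doubles, and give each one to qsort() to sort the corresponding
type of array. Note that compare_int() and compare_double() must have the exact
type that qsort() asks for; this is why both comparison functions take
const void * parameters, and cast them to const int * and const double *
internally.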
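A minimal sketch of compare_int(), compare_double(), and the matching qsort()
calls. The function bodies are our own: they compute (x > y) - (x < y) instead
of subtracting, which avoids int overflow and also works for doubles that differ
by less than 1. sort_ints() and sort_doubles() are hypothetical wrappers.

```c
#include <stddef.h>  /* size_t */
#include <stdlib.h>  /* qsort */

/* Returns negative, zero, or positive, as qsort() requires. */
int compare_int(const void *a, const void *b)
{
    int x = *(const int *)a;  /* cast back to the real element type */
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

int compare_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical wrappers: pass element count, element size, and comparator. */
void sort_ints(int *arr, size_t n)
{
    qsort(arr, n, sizeof(arr[0]), compare_int);
}

void sort_doubles(double *arr, size_t n)
{
    qsort(arr, n, sizeof(arr[0]), compare_double);
}
```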
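Function pointer syntax can be tricky to read and write! For example, the
following two declarations are very different:
int *f(int);    // f is a function that takes an int and returns an int *
int (*f)(int);  // f is a pointer to a function that takes an int and returns an int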
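A minimal sketch that puts both kinds of declaration to use: element_at() is a
function returning int *, while op is a pointer to a function from int to int.
All names here (table, element_at, twice, op) are hypothetical.

```c
/* Hypothetical data for element_at() to point into. */
static int table[4] = {10, 20, 30, 40};

/* Declared like int *f(int): a FUNCTION that returns a pointer to int. */
int *element_at(int i)
{
    return &table[i];
}

int twice(int x)
{
    return 2 * x;
}

/* Declared like int (*f)(int): a POINTER to a function taking and returning int.
 * The function name twice converts to a pointer to that function. */
int (*op)(int) = twice;
```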
Typedef (optional)
------------------
Having to write the struct or union keyword every time we declare a Point or
LongDouble can get cumbersome. You can use a typedef to get around this
requirement:
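typedef struct {
    int x;
    int y;
} Point;
typedef union {
    unsigned long as_long;
    double as_double;
} LongDouble;
typedef struct ListNode {    // a self-referential struct still needs a tag name,
    int data;
    struct ListNode *next;   // because the typedef name ListNode isn't defined yet
} ListNode;
Now we can declare and define variables without the struct or union keywords:
Point p;
LongDouble v;
ListNode n;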
Typedefs are also a handy way to make function pointer syntax more tolerable,
though it isn’t very obvious what type name we’re defining: the new name sits in
the middle of the declaration, where a variable name would normally go.
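A minimal sketch of such a typedef for qsort()-style comparison functions. The
type name Comparator and the wrapper sort_with() are hypothetical choices;
Comparator, the name being defined, sits exactly where a variable name would go
in an ordinary function pointer declaration. compare_int() is repeated here so
the example compiles on its own.

```c
#include <stddef.h>  /* size_t */
#include <stdlib.h>  /* qsort */

/* Comparator = "pointer to function taking two const void * and returning int". */
typedef int (*Comparator)(const void *, const void *);

int compare_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical wrapper: the parameter type now reads as a plain name. */
void sort_with(void *base, size_t n, size_t size, Comparator cmp)
{
    qsort(base, n, size, cmp);
}
```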
Typedefs are just type aliases and are purely stylistic. There is no agreed-upon
standard for when and how you should use a typedef. We don’t require you
to use them in this class, but you may be required to do so in other contexts;
when in doubt, consult the appropriate style guide.