Type-safe generic data structures in C

Ian Fisher, June 7, 2020


The rise of a new generation of low-level programming languages like Rust, Go and Zig
has caused C and its primitive type system to fall into some disrepute. Nonetheless,
with sufficient creativity it is possible to achieve surprisingly sophisticated results in C.
One such result is generic data structures. This post reviews two techniques for
implementing generic data structures in C: unsafely using raw memory and pointer
casts, and safely using code generation through macros.[1]

Warm-up: an int stack

The generic data structure we will be implementing is a stack. As a warm-up, we'll write
a regular, non-generic stack that only works for int values. Our minimalist stack will
support only two operations, push and pop—not the most useful data structure, but
enough to cover the fundamental challenges of data structure implementation.

Our stack's internal state consists of a length, a capacity, and a pointer to the stack's
heap-allocated data. The length is the number of elements in the stack while the
capacity is the maximum number of elements that the allocated memory could hold
(not the size of the allocated memory in bytes, which is capacity * sizeof(int)).

typedef struct {
  size_t len, capacity;
  int* data;
} IntStack;

Aside: You can find the full code for this post here.

We'll start with the push operation. First, the stack is resized if there is not enough
capacity for an additional element. Then, the stack's length is incremented and the value
is written to the end of the data array:[2]

void IntStack_push(IntStack* stck, int value) {
  if (!stck) {
    return;
  }

  if (stck->len + 1 > stck->capacity) {
    /* TODO: Handle arithmetic overflow. */
    size_t new_capacity = stck->capacity * 2;
    int* new_data = realloc(stck->data, new_capacity * sizeof(int));

    if (!new_data) {
      /* TODO: Handle memory error. */
      return;
    }

    stck->capacity = new_capacity;
    stck->data = new_data;
  }

  stck->len++;
  stck->data[stck->len - 1] = value;
}

Note that realloc may return a null pointer if it cannot re-allocate the memory, so it's
not safe to assign the return value to stck->data without checking if it is null first:
doing so would overwrite, and therefore leak, the only pointer to the original block,
which realloc leaves allocated on failure. Also note that the use of realloc assumes
that stck->data was previously allocated with malloc (or is null), an assumption that
will be upheld by the constructor that we'll write later.
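
The overflow TODO in IntStack_push could be addressed along the following lines. This is a minimal sketch, not the author's code: grow_capacity is a hypothetical helper, and treating a capacity of 0 as growing to 1 is an assumption made here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post): computes the doubled
 * capacity for a stack whose elements are elem_size bytes each.
 * Returns false, leaving *out untouched, if either the doubling or the
 * byte count later passed to realloc would overflow size_t. */
static bool grow_capacity(size_t capacity, size_t elem_size, size_t* out) {
  assert(out != NULL);
  if (elem_size == 0 || capacity > SIZE_MAX / 2) {
    return false;
  }
  /* Assumption for this sketch: an empty allocation grows to one element. */
  size_t new_capacity = capacity == 0 ? 1 : capacity * 2;
  if (new_capacity > SIZE_MAX / elem_size) {
    return false;
  }
  *out = new_capacity;
  return true;
}
```

IntStack_push could then call grow_capacity(stck->capacity, sizeof(int), &new_capacity) and return early on failure, before it ever reaches realloc.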

The pop operation decrements the length field and returns the former last element. The
return value is wrapped in an IntResult data structure, since in the case that stck is
null or empty, there is no popped value to return.[3]

IntResult IntStack_pop(IntStack* stck) {
  if (!stck || stck->len == 0) {
    return IntResult_error();
  }

  stck->len--;
  return IntResult_of(stck->data[stck->len]);
}

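IntResult and its constructors are defined as follows:

/* bool, true and false come from <stdbool.h>. */
typedef struct {
  bool error;
  int result;
} IntResult;

IntResult IntResult_of(int v) {
  IntResult r = { .error = false, .result = v };
  return r;
}

/* (void) declares that the function takes no arguments. */
IntResult IntResult_error(void) {
  /* The unnamed result field is zero-initialized. */
  IntResult r = { .error = true };
  return r;
}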
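
Because the error flag travels with the value, a caller can test it before touching result. The sketch below repeats the IntResult definitions so that it compiles on its own and adds IntResult_unwrap_or, a hypothetical convenience helper that is not part of the original post.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
  bool error;
  int result;
} IntResult;

static IntResult IntResult_of(int v) {
  IntResult r = { .error = false, .result = v };
  return r;
}

static IntResult IntResult_error(void) {
  IntResult r = { .error = true };
  return r;
}

/* Hypothetical helper: returns the wrapped value, or fallback if the
 * result signals an error. */
static int IntResult_unwrap_or(IntResult r, int fallback) {
  return r.error ? fallback : r.result;
}
```

Note that -1 is an ordinary value here rather than an error code: IntResult_unwrap_or(IntResult_of(-1), 0) yields -1, whereas IntResult_unwrap_or(IntResult_error(), 0) yields the fallback.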

We'll also provide a constructor for the convenient creation of IntStack objects. It
initializes the stack with a length of 0 and a small initial capacity of heap-allocated
memory.

IntStack IntStack_new(void) {
  size_t capacity = 8;
  int* data = malloc(capacity * sizeof(int));
  if (!data) {
    /* TODO: Handle memory error. */
  }
  IntStack stck = { .len = 0, .capacity = capacity, .data = data };
  return stck;
}

Since the constructor allocates memory from the heap, we need a corresponding
destructor to free it:

void IntStack_free(IntStack* stck) {
  if (stck) {
    free(stck->data);
  }
}

And with that, we have a minimal but complete stack type:

IntStack int_stack = IntStack_new();

IntStack_push(&int_stack, 1);
IntStack_push(&int_stack, 2);
IntResult r = IntStack_pop(&int_stack);
assert(!r.error);
assert(r.result == 2);
IntStack_free(&int_stack);

Unsafe generic stack

IntStack only works for int values. If you wanted a char stack, you would have to
write another stack implementation. And if you did so, you would find that the code for
CharStack is nearly identical to the code for IntStack, because the stack, like most
container data structures, doesn't manipulate its elements in any way; it just stores
them. The only thing it needs to know about them is how much memory each one of
them occupies. So rather than writing different stacks for every element type, let's try
writing a stack that works for any element type.

The critical insight is that, even though at compile time neither the type nor the size of
the elements is known, the elements of the stack can still be stored as unstructured
binary data as long as we keep track of how much memory each element occupies.

In C, binary data can be stored in an array of chars. To give an example, suppose we
define a Point type:

typedef struct {
  int x, y;
} Point;

We can store a Point object in a char array like this, using memcpy to copy the bytes
into the array:

// Initialize an array with more than enough capacity.
char data[100];

// Initialize a Point object.
Point p = { .x = 42, .y = 43 };

// Copy the Point object into the array.
memcpy(data, &p, sizeof(Point));

// Print the bytes of the array (%zu is the conversion for size_t).
for (size_t i = 0; i < sizeof(Point); i++) {
  printf("data[%zu] = %d\n", i, data[i]);
}

On my system, the for loop prints

data[0] = 42
data[1] = 0
data[2] = 0
data[3] = 0
data[4] = 43
data[5] = 0
data[6] = 0
data[7] = 0

But the individual elements of the array must be treated as opaque because the
memory layout of Point is up to the compiler.
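
Treating the bytes as opaque still lets us recover the object: copying them back out with memcpy yields an equal Point. A minimal sketch, in which roundtrip_point is a hypothetical helper introduced only for illustration:

```c
#include <assert.h>
#include <string.h>

typedef struct {
  int x, y;
} Point;

/* Hypothetical helper: stores p in a raw byte buffer and loads it back,
 * never inspecting the individual bytes. */
static Point roundtrip_point(Point p) {
  char data[sizeof(Point)];
  Point out;
  memcpy(data, &p, sizeof(Point));   /* object -> bytes */
  memcpy(&out, data, sizeof(Point)); /* bytes -> object */
  return out;
}
```

The round trip works whatever layout the compiler picks for Point, because both copies use the same sizeof(Point) bytes.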

This technique is the foundation of our generic stack type, UnsafeStack.[4] UnsafeStack
has a char* data field instead of int* data, and an additional objsize field to track
how many bytes each object occupies:

typedef struct {
  size_t len, capacity;
  size_t objsize;
  char* data;
} UnsafeStack;

The logic of resizing the array in UnsafeStack_push is similar to that of IntStack_push.
As we saw in the Point example, we have to use memcpy to copy the value to the end of
the stack, rather than assigning it directly to an index of the array, because the value
could be arbitrarily large. UnsafeStack_push accepts a value of type void* so that a
pointer to an object of any type can be passed to it.

void UnsafeStack_push(UnsafeStack* stck, void* value) {
  if (!stck) {
    return;
  }

  if (stck->len + 1 > stck->capacity) {
    size_t new_capacity = stck->capacity * 2;
    char* new_data = realloc(stck->data, new_capacity * stck->objsize);

    if (!new_data) {
      /* TODO: Handle memory error. */
      return;
    }

    stck->capacity = new_capacity;
    stck->data = new_data;
  }

  /* Copy objsize bytes into the first free slot. */
  memcpy(stck->data + (stck->len * stck->objsize), value, stck->objsize);
  stck->len++;
}
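
Similarly, UnsafeStack_pop returns a pointer of type void*, because the type of the
objects in the data structure isn't known at compile time. The pointer is an offset into
the array calculated as stck->len * stck->objsize, using the length after it has been
decremented. Because the pointer refers to the stack's own buffer, it stays valid only
until the next push, which may overwrite the slot or move the buffer with realloc:

void* UnsafeStack_pop(UnsafeStack* stck) {
  if (!stck || stck->len == 0) {
    return NULL;
  }

  stck->len--;
  return stck->data + (stck->len * stck->objsize);
}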

Errors can be signalled by returning a null pointer, so we don't need a Result object for
UnsafeStack.

UnsafeStack_new and UnsafeStack_free are very similar to before:

UnsafeStack UnsafeStack_new(size_t objsize) {
  size_t capacity = 8;
  char* data = malloc(capacity * objsize);
  if (!data) {
    /* TODO: Handle memory error. */
  }
  UnsafeStack stck = { .len = 0, .capacity = capacity, .objsize = objsize, .data = data };
  return stck;
}

void UnsafeStack_free(UnsafeStack* stck) {
  if (stck) {
    free(stck->data);
  }
}

UnsafeStack's API is a bit different from IntStack's. When pushing a value onto the
stack, we provide a pointer to it rather than the element itself, and when popping a
value, we cast the return value and then de-reference it.

UnsafeStack_push(&unsafe_stack, &i);
int v = *(int*)UnsafeStack_pop(&unsafe_stack);

Since UnsafeStack_push accepts a pointer argument, we cannot pass integer literals to
it.
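
One possible workaround, assuming a C99 or later compiler, is a compound literal: (int){1} creates an unnamed int object whose address can be passed. The sketch below repeats a condensed UnsafeStack so that it compiles on its own. Unlike the code above it aborts on allocation failure, and pop_int is a hypothetical helper that copies the popped bytes out of the buffer instead of dereferencing a cast pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Condensed copy of UnsafeStack so that the sketch is self-contained. */
typedef struct {
  size_t len, capacity, objsize;
  char* data;
} UnsafeStack;

static UnsafeStack UnsafeStack_new(size_t objsize) {
  size_t capacity = 8;
  char* data = malloc(capacity * objsize);
  if (!data) {
    abort(); /* Assumption: this sketch gives up on allocation failure. */
  }
  UnsafeStack stck = { .len = 0, .capacity = capacity, .objsize = objsize, .data = data };
  return stck;
}

static void UnsafeStack_push(UnsafeStack* stck, void* value) {
  assert(stck != NULL && value != NULL);
  if (stck->len + 1 > stck->capacity) {
    size_t new_capacity = stck->capacity * 2;
    char* new_data = realloc(stck->data, new_capacity * stck->objsize);
    if (!new_data) {
      abort();
    }
    stck->capacity = new_capacity;
    stck->data = new_data;
  }
  memcpy(stck->data + (stck->len * stck->objsize), value, stck->objsize);
  stck->len++;
}

static void* UnsafeStack_pop(UnsafeStack* stck) {
  if (!stck || stck->len == 0) {
    return NULL;
  }
  stck->len--;
  return stck->data + (stck->len * stck->objsize);
}

static void UnsafeStack_free(UnsafeStack* stck) {
  if (stck) {
    free(stck->data);
  }
}

/* Hypothetical helper: pops one int by value. Returns false, leaving
 * *out untouched, if the stack is null or empty. */
static bool pop_int(UnsafeStack* stck, int* out) {
  void* p = UnsafeStack_pop(stck);
  if (!p) {
    return false;
  }
  memcpy(out, p, sizeof *out);
  return true;
}

/* Pushes the literals 1 and 2 through compound literals and returns
 * the value popped from the top, or -1 if the pop fails. */
static int push_literals_and_pop(void) {
  UnsafeStack s = UnsafeStack_new(sizeof(int));
  UnsafeStack_push(&s, &(int){1});
  UnsafeStack_push(&s, &(int){2});
  int top = 0;
  bool ok = pop_int(&s, &top);
  UnsafeStack_free(&s);
  return ok ? top : -1;
}
```

Copying the value out immediately also sidesteps the lifetime problem of holding a pointer into the stack's buffer across a later push.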

Safe generic stack using macros

Unlike casts in Java and type assertions in Go, casts in C are entirely unsafe. The
compiler will not complain at compile time and the program will not throw an exception
at runtime if the cast is invalid. Rather, it will silently continue on to do strange and
terrible things, like accessing uninitialized memory or overwriting memory belonging to
other variables. Since C is a statically typed language, we would prefer to avoid these
possible errors.
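
To make the failure mode concrete, the sketch below mimics what happens when a float is pushed but the popped slot is read as an int. misread_float_as_int is a hypothetical helper; it uses memcpy instead of a pointer cast so that the demonstration itself has defined behavior, and it assumes that int has no padding bits, as on mainstream platforms.

```c
#include <assert.h>
#include <string.h>

/* Large enough to hold either a float or an int. */
#define SLOT_SIZE (sizeof(float) > sizeof(int) ? sizeof(float) : sizeof(int))

/* Hypothetical helper: stores the bytes of a float in a slot, then
 * reads the same slot back as if it held an int. */
static int misread_float_as_int(float f) {
  unsigned char slot[SLOT_SIZE] = {0};
  int v;
  memcpy(slot, &f, sizeof f); /* what pushing a float would store */
  memcpy(&v, slot, sizeof v); /* what popping it "as an int" would load */
  return v;
}
```

Nothing fails at compile time or at run time; the caller simply receives a number unrelated to the one that was stored. On a platform with IEEE 754 floats and a 32-bit int, misread_float_as_int(1.0f) returns 1065353216 (0x3F800000) rather than 1.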

Let's return to the fundamental problem. If we want type-safe stacks, then we have to
write a different, though virtually identical, implementation for each type. It's essentially
a problem of code duplication.

Fortunately, C has a mechanism for dealing with code duplication: macros. Rather than
writing out a full stack implementation for each type, we can use a macro to generate it
for us from a template. We will take our IntStack implementation, replace all uses of
int with a type parameter, and then wrap the whole implementation in a macro that is
parameterized on type. Whenever we want to use a stack for a new type, we'll invoke the
macro to generate the code for the implementation. As far as the C compiler is
concerned, it's as if we had written a separate implementation for each type by hand,
with the concrete types spelled out in the code, so the compiler can type-check it properly.

The macro code is a little hard to read, not least because each line must end with a
backslash to continue the macro onto the next line, but it is recognizably almost the
same code as for IntStack. The syntax typename##_new, typename##_free, etc., uses the
preprocessor's token-pasting operator ## to glue typename, a macro parameter, to tokens
like _new or _free, yielding identifiers like FloatStack_new or StringStack_free. The
macro parameter type is substituted wherever we had int in the original IntStack code.
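
Token pasting is easier to see in a tiny case than in a full stack. In the sketch below, DECL_TWICE is a hypothetical macro invented for illustration: it generates a function named after its first parameter and typed by its second.

```c
#include <assert.h>

/* Hypothetical macro: DECL_TWICE(Int, int) expands to a function
 * Int_twice that takes and returns an int. */
#define DECL_TWICE(name, type) \
  static type name##_twice(type v) { \
    return v + v; \
  }

DECL_TWICE(Int, int)       /* defines Int_twice(int) */
DECL_TWICE(Double, double) /* defines Double_twice(double) */
```

After the two invocations, Int_twice and Double_twice are ordinary, separately type-checked functions, exactly as if each had been written out by hand.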

#define DECL_STACK(typename, type) \
  typedef struct { \
    size_t len, capacity; \
    type* data; \
  } typename; \
  \
  typedef struct { \
    bool error; \
    type result; \
  } typename##Result; \
  \
  typename typename##_new(void) { \
    size_t capacity = 8; \
    type* data = malloc(capacity * sizeof(type)); \
    if (!data) {} \
    typename stck = { .len = 0, .capacity = capacity, .data = data }; \
    return stck; \
  } \
  \
  void typename##_free(typename* stck) { \
    if (stck) { \
      free(stck->data); \
    } \
  } \
  \
  size_t typename##_length(typename* stck) { \
    return stck ? stck->len : 0; \
  } \
  \
  void typename##_push(typename* stck, type value) { \
    if (!stck) { \
      return; \
    } \
    \
    if (stck->len + 1 > stck->capacity) { \
      size_t new_capacity = stck->capacity * 2; \
      type* new_data = realloc(stck->data, new_capacity * sizeof(type)); \
      \
      if (!new_data) { \
        return; \
      } \
      \
      stck->capacity = new_capacity; \
      stck->data = new_data; \
    } \
    \
    stck->len++; \
    stck->data[stck->len - 1] = value; \
  } \
  \
  typename##Result typename##_pop(typename* stck) { \
    if (!stck || stck->len == 0) { \
      typename##Result errorval = { .error = true }; \
      return errorval; \
    } \
    \
    type value = stck->data[stck->len - 1]; \
    stck->len--; \
    typename##Result r = { .error = false, .result = value }; \
    return r; \
  }

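We then invoke DECL_STACK to declare a new stack type at the top level of a program, or
in a header file (one included from a single source file, since the macro expands to full
function definitions that the linker would otherwise see more than once):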

DECL_STACK(SafeIntStack, int)

The resultant API achieves the safety and convenience of IntStack and the generality
of UnsafeStack :

SafeIntStack safe_int_stack = SafeIntStack_new();

SafeIntStack_push(&safe_int_stack, 1);
SafeIntStack_push(&safe_int_stack, 2);
SafeIntStackResult r = SafeIntStack_pop(&safe_int_stack);
assert(!r.error);
assert(r.result == 2);
SafeIntStack_free(&safe_int_stack);
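
The same approach works for element types other than int, and several instantiations can coexist in one program. The sketch below uses DECL_MINI_STACK, a hypothetical, trimmed-down variant of DECL_STACK (static functions, no null checks or length accessor, abort on allocation failure) so that the example stays short and compiles on its own; it is not the author's macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical trimmed-down variant of DECL_STACK, for illustration. */
#define DECL_MINI_STACK(typename, type) \
  typedef struct { \
    size_t len, capacity; \
    type* data; \
  } typename; \
  \
  typedef struct { \
    bool error; \
    type result; \
  } typename##Result; \
  \
  static typename typename##_new(void) { \
    size_t capacity = 8; \
    type* data = malloc(capacity * sizeof(type)); \
    if (!data) { \
      abort(); \
    } \
    typename stck = { .len = 0, .capacity = capacity, .data = data }; \
    return stck; \
  } \
  \
  static void typename##_free(typename* stck) { \
    if (stck) { \
      free(stck->data); \
    } \
  } \
  \
  static void typename##_push(typename* stck, type value) { \
    if (stck->len + 1 > stck->capacity) { \
      size_t new_capacity = stck->capacity * 2; \
      type* new_data = realloc(stck->data, new_capacity * sizeof(type)); \
      if (!new_data) { \
        abort(); \
      } \
      stck->capacity = new_capacity; \
      stck->data = new_data; \
    } \
    stck->data[stck->len++] = value; \
  } \
  \
  static typename##Result typename##_pop(typename* stck) { \
    typename##Result r = { .error = true }; \
    if (stck->len > 0) { \
      r.error = false; \
      r.result = stck->data[--stck->len]; \
    } \
    return r; \
  }

typedef struct {
  int x, y;
} Point;

/* Two independent, separately type-checked stack types. */
DECL_MINI_STACK(PointStack, Point)
DECL_MINI_STACK(DoubleStack, double)
```

Passing a double to PointStack_push, or a Point to DoubleStack_push, is rejected by the compiler, which is the check that UnsafeStack could not offer.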

Note that SafeIntStack is still only as safe as C's type system, which will, for example,
allow you to pass a string literal where an int is expected, often with only a compiler
warning. This is a fundamental limitation that cannot be worked around.

The safe stack data structure has no overhead above hand-written code, except that
each new declaration increases the program size by a constant amount (unlike the
unsafe stack, which uses the same code for all element types). Incidentally, this code
generation technique is essentially how templates are implemented behind the scenes
in C++.

That it is possible to write type-safe generic data structures in a language whose type
system does not natively support them speaks to C's flexibility.[5] But the techniques we
used (unchecked pointer casts and unhygienic lexical macros) are themselves quite
unsafe if used improperly. Flexibility, simplicity and safety: a language can attain at
most two out of three. Rust and Haskell choose flexibility and safety. Go chooses safety
and simplicity.[6] C chooses flexibility and simplicity. Each choice has its trade-offs. The
tendency for modern languages to prefer safety is a direct consequence of the
innumerable bugs found in C code that could have been prevented by stronger
compile-time guarantees. Nonetheless, as this post has shown, flexibility and simplicity
are a powerful combination.

1. I assume the reader is proficient in C. The classic text on C is The C Programming Language by Brian Kernighan
and Dennis Ritchie, universally known as K&R after the authors' initials. Look for the second edition. Though K&R
has aged remarkably well, best practices have been refined and some language features have changed since
1988. Modern C by Jens Gustedt is a great re-introduction to the modern language.

2. Here and throughout I've added TODOs to mark where a more robust implementation would need to handle an
edge case.

3. Traditionally, error handling in C is done either by returning a special error value, or "out-of-band" by setting the
global errno variable. I find the latter approach inelegant, and the former is impossible because any int value is
a possible legal return value of IntStack_pop.

4. GLib, the utility library for the GNOME desktop environment, uses this technique for its GArray generic array
type.

5. For more evidence, see Daniel Holden's Cello, a framework for high-level programming in C.

6. Debatable, perhaps. Go does allow some unsafe constructs like interface{}, but it largely abandons the reckless
permissiveness of C.
