Beej’s Guide to C Programming

Brian “Beej Jorgensen” Hall

v0.5.19, Copyright © December 23, 2020


Contents

Foreword 1
Audience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Platform and Compiler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Official Homepage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Email Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Note for Translators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Copyright and Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

Hello, World! 4
What to Expect from C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Hello, World! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Compilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Building with gcc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
C Versions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Variables and Statements 9


Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Variable Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Variable Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Boolean Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Operators and Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
The sizeof Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Ternary Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Pre-and-Post Increment-and-Decrement . . . . . . . . . . . . . . . . . . . . . . . . . . 13
The Comma Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Conditional Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Boolean Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Flow Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
The if statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
The while statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
The do-while statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
The for statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
The switch Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

Functions 21
Passing by Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Function Prototypes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

Pointers—Cower In Fear! 24
Memory and Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24


Pointer Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Dereferencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Passing Pointers as Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
The NULL Pointer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
A Note on Declaring Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

Arrays 30
Easy Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Getting the Length of an Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Array Initializers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Out of Bounds! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Multidimensional Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Arrays and Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Getting a Pointer to an Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Passing Single Dimensional Arrays to Functions . . . . . . . . . . . . . . . . . . . . . . 34
Changing Arrays in Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Passing Multidimensional Arrays to Functions . . . . . . . . . . . . . . . . . . . . . . . 36

Strings 37
Constant Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
String Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
String Variables as Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
String Initializers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Getting String Length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
String Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Copying a String . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

Structs 42
Declaring a Struct . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Struct Initializers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Passing Structs to Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
The Arrow Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Copying and Returning structs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

typedef: Making New Types 46


typedef in Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Scoping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
typedef in Practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
typedef and structs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
typedef and Other Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
typedef and Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
typedef and Capitalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

Pointers II: Arithmetic 50


Pointer Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Adding to Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Changing Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Subtracting Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Array/Pointer Equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Array/Pointer Equivalence in Function Calls . . . . . . . . . . . . . . . . . . . . . . . . 53
void Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

Manual Memory Allocation 58


Allocating and Deallocating, malloc() and free() . . . . . . . . . . . . . . . . . . . . . . . 58
Error Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

Allocating Space for an Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59


An Alternative: calloc() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Changing Allocated Size with realloc() . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
realloc() with NULL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Aligned Allocations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

Scope 64
Block Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Where To Define Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Variable Hiding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
File Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
for-loop Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
A Note on Function Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

Types II: Way More Types! 67


Signed and Unsigned Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Character Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
More Integer Types: short, long, long long . . . . . . . . . . . . . . . . . . . . . . . . . 69
More Float: double and long double . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
How Many Decimal Digits? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Converting to Decimal and Back . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Constant Numeric Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Hexadecimal and Octal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Integer Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Floating Point Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

Types III: Conversions 78


String Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Numeric Value to String . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
String to Numeric Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Numeric Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Boolean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Integer to Integer Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Integer and Floating Point Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Implicit Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
The Integer Promotions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
The Usual Arithmetic Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
void* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Explicit Conversions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Casting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

Types IV: Qualifiers and Specifiers 85


Type Qualifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
const . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
restrict . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
volatile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
_Atomic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Type Specifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
auto . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
static . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
extern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

Multifile Projects 92
Includes and Function Prototypes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

Dealing with Repeated Includes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94


static and extern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Compiling with Object Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

The Outside Environment 96


Command Line Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
The Last argv is NULL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
The Alternate: char **argv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Fun Facts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Exit Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Other Exit Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Environment Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Setting Environment Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

The C Preprocessor 104


#include . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Simple Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Conditional Compilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
If Defined, #ifdef and #endif . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
If Not Defined, #ifndef . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
#else . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
General Conditional: #if, #elif . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Losing a Macro: #undef . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Built-in Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Mandatory Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Optional Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Macros with Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Macros with One Argument . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Macros with More than One Argument . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Macros with Variable Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Stringification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Concatenation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Multiline Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
The #error Directive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
The #pragma Directive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Non-Standard Pragmas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Standard Pragmas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
_Pragma Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
The #line Directive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
The Null Directive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

structs II: More Fun with structs 117


Anonymous structs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Self-Referential structs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Flexible Array Members . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Padding Bytes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
offsetof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Bit-Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Non-Adjacent Bit-Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
Signed or Unsigned ints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
Unnamed Bit-Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Zero-Width Unnamed Bit-Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Unions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Pointers to unions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

Characters and Strings II 126


Escape Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Frequently-used Escapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Rarely-used Escapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Numeric Escapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

Enumerated Types: enum 130


Behavior of enum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Numbering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Trailing Commas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Style . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Your enum is a Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

Pointers III: Pointers to Pointers and More 134


Pointers to Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Pointer Pointers and const . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
Multibyte Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
The NULL Pointer and Zero . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Pointers as Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Pointer Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
Pointers to Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

Bitwise Operations 143


Bitwise AND, OR, XOR, and NOT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Bitwise Shift . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

Variadic Functions 145


Ellipses in Function Signatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Getting the Extra Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
va_list Functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

Locale and Internationalization 148


Setting the Localization, Quick and Dirty . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
Getting the Monetary Locale Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Monetary Digit Grouping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Separators and Sign Position . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Example Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Localization Specifics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

Standard I/O Library 153


fopen() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
freopen() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
fclose() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
printf(), fprintf() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
scanf(), fscanf() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
gets(), fgets() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
getc(), fgetc(), getchar() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
puts(), fputs() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
putc(), fputc(), putchar() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
fseek(), rewind() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
ftell() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
fgetpos(), fsetpos() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
ungetc() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
fread() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

fwrite() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
feof(), ferror(), . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
perror() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
remove() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
rename() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
tmpfile() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
tmpnam() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
setbuf(), setvbuf() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
fflush() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

String Manipulation 195


strlen() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
strcmp(), strncmp() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
strcat(), strncat() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
strchr(), strrchr() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
strcpy(), strncpy() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
strspn(), strcspn() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
strstr() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
strtok() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

Mathematics 208
sin(), sinf(), sinl() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
cos(), cosf(), cosl() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
tan(), tanf(), tanl() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
asin(), asinf(), asinl() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
acos(), acosf(), acosl() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
atan(), atanf(), atanl(), . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
sqrt() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
Foreword

No point in wasting words here, folks, let’s jump straight into the C code:
E((ck?main((z?(stat(M,&t)?P+=a+'{'?0:3:
execv(M,k),a=G,i=P,y=G&255,
sprintf(Q,y/'@'-3?A(*L(V(%d+%d)+%d,0)

And they lived happily ever after. The End.


What’s this? You say something’s still not clear about this whole C programming language thing?
Well, to be quite honest, I’m not even sure what the above code does. It’s a snippet from one of the entries in
the 2001 International Obfuscated C Code Contest[1], a wonderful competition wherein the entrants attempt
to write the most unreadable C code possible, with often surprising results.
The bad news is that if you’re a beginner in this whole thing, all C code you see probably looks obfuscated!
The good news is, it’s not going to be that way for long.
What we’ll try to do over the course of this guide is lead you from complete and utter sheer lost confusion
on to the sort of enlightened bliss that can only be obtained through pure C programming. Right on.

Audience
This guide assumes that you’ve already got some programming knowledge under your belt from another
language, such as Python[2], JavaScript[3], Java[4], Rust[5], Go[6], Swift[7], etc. (Objective-C[8] devs will have a
particularly easy time of it!)
We’re going to assume you know what variables are, what loops do, how functions work, and so on.
If that’s not you for whatever reason, the best I can hope to provide is some pastey entertainment for your
reading pleasure. The only thing I can reasonably promise is that this guide won’t end on a cliffhanger…or
will it?

Platform and Compiler


I’ll try to stick to Plain Ol’-Fashioned ISO-standard C[9]. Well, for the most part. Here and there I might go
crazy and start talking about POSIX[10] or something, but we’ll see.
[1] http://www.ioccc.org/
[2] https://en.wikipedia.org/wiki/Python_(programming_language)
[3] https://en.wikipedia.org/wiki/JavaScript
[4] https://en.wikipedia.org/wiki/Java_(programming_language)
[5] https://en.wikipedia.org/wiki/Rust_(programming_language)
[6] https://en.wikipedia.org/wiki/Go_(programming_language)
[7] https://en.wikipedia.org/wiki/Swift_(programming_language)
[8] https://en.wikipedia.org/wiki/Objective-C
[9] https://en.wikipedia.org/wiki/ANSI_C
[10] https://en.wikipedia.org/wiki/POSIX


Unix users (e.g. Linux, BSD, etc.): try running cc or gcc from the command line; you might already have a
compiler installed. If you don’t, search your distribution’s packages for gcc or clang.
Windows users should check out Visual Studio Community[11]. Or, if you’re looking for a more Unix-like
experience (recommended!), install WSL[12] and gcc.
Mac users will want to install Xcode[13], and in particular the command line tools.
There are a lot of compilers out there, and virtually all of them will work for this book. And for those not in
the know, a C++ compiler will compile most C code, so it’ll work for the purposes of this guide.

Official Homepage
The official location of this document is http://beej.us/guide/bgc/[14]. Maybe this’ll change in the future, but
it’s more likely that all the other guides are migrated off Chico State computers.

Email Policy
I’m generally available to help out with email questions so feel free to write in, but I can’t guarantee a
response. I lead a pretty busy life and there are times when I just can’t answer a question you have. When
that’s the case, I usually just delete the message. It’s nothing personal; I just won’t ever have the time to give
the detailed answer you require.
As a rule, the more complex the question, the less likely I am to respond. If you can narrow down your
question before mailing it and be sure to include any pertinent information (like platform, compiler, error
messages you’re getting, and anything else you think might help me troubleshoot), you’re much more likely
to get a response.
If you don’t get a response, hack on it some more, try to find the answer, and if it’s still elusive, then write
me again with the information you’ve found and hopefully it will be enough for me to help out.
Now that I’ve badgered you about how to write and not write me, I’d just like to let you know that I fully
appreciate all the praise the guide has received over the years. It’s a real morale boost, and it gladdens me to
hear that it is being used for good! :-) Thank you!

Mirroring
You are more than welcome to mirror this site, whether publicly or privately. If you publicly mirror the site
and want me to link to it from the main page, drop me a line at [email protected].

Note for Translators


If you want to translate the guide into another language, write me at [email protected] and I’ll link to your
translation from the main page. Feel free to add your name and contact info to the translation.
Please note the license restrictions in the Copyright and Distribution section, below.

Copyright and Distribution


Beej’s Guide to C Programming is Copyright © 2020 Brian “Beej Jorgensen” Hall.
[11] https://visualstudio.microsoft.com/vs/community/
[12] https://docs.microsoft.com/en-us/windows/wsl/install-win10
[13] https://developer.apple.com/xcode/
[14] http://beej.us/guide/bgc/

With specific exceptions for source code and translations, below, this work is licensed under the Creative
Commons Attribution-Noncommercial-No Derivative Works 3.0 License. To view a copy of this license,
visit http://creativecommons.org/licenses/by-nc-nd/3.0/ or send a letter to Creative Commons,
171 Second Street, Suite 300, San Francisco, California, 94105, USA.
One specific exception to the “No Derivative Works” portion of the license is as follows: this guide may
be freely translated into any language, provided the translation is accurate, and the guide is reprinted in its
entirety. The same license restrictions apply to the translation as to the original guide. The translation may
also include the name and contact information for the translator.
The C source code presented in this document is hereby granted to the public domain, and is completely free
of any license restriction.
Educators are freely encouraged to recommend or supply copies of this guide to their students.
Contact [email protected] for more information.
Hello, World!

What to Expect from C


“Where do these stairs go?” “They go up.”
—Ray Stantz and Peter Venkman, Ghostbusters
C is a low-level language.
It didn’t used to be. Back in the day when people carved punch cards out of granite, C was an incredible way
to be free of the drudgery of lower-level languages like assembly[15].
But now in these modern times, current-generation languages offer all kinds of features that didn’t exist in
1972 when C was invented. This means C is a pretty basic language with not a lot of features. It can do
anything, but it can make you work for it.
So why would we even use it today?
• As a learning tool: not only is C a venerable piece of computing history, but it is connected to the bare
metal[16] in a way that present-day languages are not. When you learn C, you learn about how software
interfaces with computer memory at a low level. There are no seatbelts. You’ll write software that
crashes, I assure you. And that’s all part of the fun!
• As a useful tool: C still is used for certain applications, such as building operating systems[17] or in
embedded systems[18]. (Though the Rust[19] programming language is eyeing both these fields!)
If you’re familiar with another language, a lot of things about C are easy. C inspired many other languages,
and you’ll see bits of it in Go, Rust, Swift, Python, JavaScript, Java, and all kinds of other languages. Those
parts will be familiar.
The one thing about C that hangs people up is pointers. Virtually everything else is familiar, but pointers are
the weird one. The concept behind pointers is likely one you already know, but C forces you to be explicit
about it, using operators you’ve likely never seen before.
It’s especially insidious because once you grok[20] pointers, they’re suddenly easy. But up until that moment,
they’re slippery eels.
Everything else in C is just memorizing another way (or sometimes the same way!) of doing something
you’ve done already. Pointers are the weird bit.
So get ready for a rollicking adventure as close to the core of the computer as you can get without assembly,
in the most influential computer language of all time[21]. Hang on!
[15] https://en.wikipedia.org/wiki/Assembly_language
[16] https://en.wikipedia.org/wiki/Bare_machine
[17] https://en.wikipedia.org/wiki/Operating_system
[18] https://en.wikipedia.org/wiki/Embedded_system
[19] https://en.wikipedia.org/wiki/Rust_(programming_language)
[20] https://en.wikipedia.org/wiki/Grok
[21] I know someone will fight me on that, but it’s gotta be at least in the top three, right?

Hello, World!
This is the canonical example of a C program. Everyone uses it.

/* Hello world program */

#include <stdio.h>

int main(void)
{
    printf("Hello, World!\n");  // Actually do the work here
}

We’re going to don our long-sleeved heavy-duty rubber gloves, grab a scalpel, and rip into this thing to see
what makes it tick. So, scrub up, because here we go. Cutting very gently…
Let’s get the easy thing out of the way: anything between the two-character sequences /* and */ is a comment and will be
completely ignored by the compiler. Same goes for anything on a line after a //. This allows you to leave
messages to yourself and others, so that when you come back and read your code in the distant future, you’ll
know what the heck it was you were trying to do. Believe me, you will forget; it happens.
Now, what is this #include? GROSS! Well, it tells the C Preprocessor to pull the contents of another file
and insert it into the code right there.
Wait—what’s a C Preprocessor? Good question. There are two stages (well, technically there are more than
two, but hey, let’s pretend there are two and have a good laugh) to compilation: the preprocessor and the
compiler. Anything that starts with a pound sign, or “octothorpe”, (#) is something the preprocessor operates
on before the compiler even gets started. Common preprocessor directives, as they’re called, are #include
and #define. More on that later.
Before we go on, why would I even begin to bother pointing out that a pound sign is called an octothorpe?
The answer is simple: I think the word octothorpe is so excellently funny, I have to gratuitously spread its
name around whenever I get the opportunity. Octothorpe. Octothorpe, octothorpe, octothorpe.
So anyway. After the C preprocessor has finished preprocessing everything, the results are ready for the
compiler to take them and produce assembly code[22], machine code[23], or whatever it’s about to do. Don’t
worry about the technical details of compilation for now; just know that your source runs through the
preprocessor, then the output of that runs through the compiler, then that produces an executable for you to run.
Octothorpe.
What about the rest of the line? What’s <stdio.h>? That is what is known as a header file. It’s the dot-h
at the end that gives it away. In fact it’s the “Standard I/O” (stdio) header file that you will grow to know
and love. It contains preprocessor directives and function prototypes (more on that later) for common input
and output needs. For our demo program, we’re outputting the string “Hello, World!”, so we in particular
need the function prototype for the printf() function from this header file. Basically, if we tried to use
printf() without #include <stdio.h>, the compiler would have complained to us about it.

How did I know I needed to #include <stdio.h> for printf()? Answer: it’s in the documentation. If
you’re on a Unix system, man printf and it’ll tell you right at the top of the man page what header files are
required. Or see the reference section in this book. :-)
Holy moly. That was all to cover the first line! But, let’s face it, it has been completely dissected. No mystery
shall remain!
So take a breather…look back over the sample code. Only a couple easy lines to go.
Welcome back from your break! I know you didn’t really take a break; I was just humoring you.
[22] https://en.wikipedia.org/wiki/Assembly_language
[23] https://en.wikipedia.org/wiki/Machine_code

The next line is main(). This is the definition of the function main(); everything between the squirrelly
braces ({ and }) is part of the function definition.
How do you call a different function, anyway? The answer lies in the printf() line, but we’ll get to that in
a minute.
Now, the main function is a special one in many ways, but one way stands above the rest: it is the function
that will be called automatically when your program starts executing. Nothing of yours gets called before
main(). In the case of our example, this works fine since all we want to do is print a line and exit.

Oh, that’s another thing: once the program executes past the end of main(), down there at the closing
squirrelly brace, the program will exit, and you’ll be back at your command prompt.
So now we know that the program has brought in a header file, stdio.h, and defined a main() function
that will execute when the program is started. What are the goodies in main()?
I am so happy you asked. Really! We only have the one goodie: a call to the function printf(). You can
tell this is a function call and not a function definition in a number of ways, but one indicator is the lack of
squirrelly braces after it. And you end the function call with a semicolon so the compiler knows it’s the end
of the expression. You’ll be putting semicolons after most everything, as you’ll see.
You’re passing one parameter to the function printf(): a string to be printed when you call it. Oh, yeah—
we’re calling a function! We rock! Wait, wait—don’t get cocky. What’s that crazy \n at the end of the string?
Well, most characters in the string look just like they are stored. But there are certain characters that you can’t
print on screen well that are embedded as two-character backslash codes. One of the most popular is \n (read
“backslash-N”) that corresponds to the newline character. This is the character that causes further printing
to continue on the next line instead of the current one. It’s like hitting return at the end of the line.
So copy that code into a file called hello.c and build it. On a Unix-like platform (e.g. Linux, BSD, Mac,
or WSL), you’ll build with a command like so:
gcc -o hello hello.c

(This means “compile hello.c, and output an executable called hello”.)


After that’s done, you should have a file called hello that you can run with this command:
./hello

(The leading ./ tells the shell to “run from the current directory”.)
And see what happens:
Hello, World!

It’s done and tested! Ship it!

Compilation
Let’s talk a bit more about how to build C programs, and what happens behind the scenes there.
Like other languages, C has source code. But, depending on what language you’re coming from, you might
never have had to compile your source code into an executable.
Compilation is the process of taking your C source code and turning it into a program that your operating
system can execute.
JavaScript and Python devs aren’t used to a separate compilation step at all–though behind the scenes it’s
happening! Python compiles your source code into something called bytecode that the Python virtual machine
can execute. Java devs are used to compilation, but that produces bytecode for the Java Virtual Machine.
When compiling C, machine code is generated. These are the 1s and 0s that can be executed directly by the
CPU.
Languages that typically aren’t compiled are called interpreted languages. But as we mentioned
with Java and Python, they also have a compilation step. And there’s no rule saying that C can’t
be interpreted. (There are C interpreters out there!) In short, it’s a bunch of gray areas.
Compilation in general is just taking source code and turning it into another, more easily-executed
form.
The C compiler is the program that does the compilation.
As we’ve already said, gcc is a compiler that’s installed on a lot of Unix-like operating systems[24]. And it’s
commonly run from the command line in a terminal, but not always. You can run it from your IDE, as well.
But we’ll do some command line examples here because there are too many IDEs to cover. Search the
Internet for your IDE and “how to compile C” for more information.
So how do we do command line builds?

Building with gcc


If you have a source file called hello.c in the current directory, you can build that into a program called
hello with this command typed in a terminal:

gcc -o hello hello.c

The -o means “output to this file”[25]. And there’s hello.c at the end, the name of the file we want to compile.
If your source is broken up into multiple files, you can compile them all together (almost as if they were one
file, but the rules are actually more complex than that) by putting all the .c files on the command line:
gcc -o awesomegame ui.c characters.c npc.c items.c

and they’ll all get built together into a big executable.


That’s enough to get started—later we’ll talk details about multiple source files, object files, and all kinds of
fun stuff.

C Versions
C has come a long way over the years, and it has had many named versions to describe which dialect of
the language you’re using.
These generally refer to the year of the specification.
The most famous are C89, C99, and C11. We’ll focus on the last of these in this book.
But here’s a more complete table:

Version            Description

K&R C              1978, the original. Named after Brian Kernighan and Dennis Ritchie. Ritchie
                   designed and coded the language, and Kernighan co-authored the book on it.
                   You rarely see original K&R code today. If you do, it’ll look odd, like Middle
                   English looks odd to modern English readers.

C89, ANSI C, C90   In 1989, the American National Standards Institute (ANSI) produced a C
                   language specification that set the tone for C that persists to this day. A year
                   later, the reins were handed to the International Organization for
                   Standardization (ISO) that produced the identical C90.

C95                A rarely-mentioned addition to C89 that included wide character support.

C99                The first big overhaul with lots of language additions. The thing most people
                   will remember is the addition of //-style comments. This is the most popular
                   version of C in use as of this writing.

C11                This major version update includes Unicode support and multi-threading. Be
                   advised that if you start using these language features, you might be
                   sacrificing portability with places that are stuck in C99 land. But, honestly,
                   1999 is getting to be a while back now.

C17, C18           Bugfix update to C11. C17 seems to be the official name, but the publication
                   was delayed until 2018. As far as I can tell, these two are interchangeable,
                   with C17 being preferred.

C2x                What’s coming next! Expected to eventually become C21.

[24] https://en.wikipedia.org/wiki/Unix
[25] If you don’t give it an output filename, it will export to a file called a.out by default; this filename has its roots deep in Unix history.

You can force GCC to use one of these standards with the -std= command line argument. If you want it to
be picky about the standard, add -pedantic.
For example:
gcc -std=c99 -pedantic foo.c

For this book, I compile programs for C18 with all warnings set:
gcc -Wall -Wextra -std=c18 -pedantic foo.c
Variables and Statements

“It takes all kinds to make a world, does it not, Padre?”


“So it does, my son, so it does.”
—Pirate Captain Thomas Bartholomew Red to the Padre, Pirates
There sure can be lotsa stuff in a C program.
Yup.
And for various reasons, it’ll be easier for all of us if we classify some of the types of things you can find in
a program, so we can be clear what we’re talking about.

Variables
It’s said that “variables hold values”. But another way to think about it is that a variable is a human-readable
name that refers to some data in memory.
We’re going to take a second here and take a peek down the rabbit hole that is pointers. Don’t worry about
it.
You can think of memory as a big array of bytes[26]. Data is stored in this “array”[27]. If a number is larger than
a single byte, it is stored in multiple bytes. Because memory is like an array, each byte of memory can be
referred to by its index. This index into memory is also called an address, or a location, or a pointer.
When you have a variable in C, the value of that variable is in memory somewhere, at some address. Of
course. After all, where else would it be? But it’s a pain to refer to a value by its numeric address, so we
make a name for it instead, and that’s what the variable is.
The reason I’m bringing all this up is twofold:
1. It’s going to make it easier to understand pointers later.
2. Also, it’s going to make it easier to understand pointers later.
So a variable is a name for some data that’s stored in memory at some address.

Variable Names
You can use any characters in the range 0-9, A-Z, a-z, and underscore for variable names, with the following
rules:
• You can’t start a variable with a digit 0-9.
• You can’t start a variable name with two underscores.
• You can’t start a variable name with an underscore followed by a capital A-Z.
[26] A “byte” is an 8-bit binary number. Think of it as an integer that can only hold the values from 0 to 255, inclusive.
[27] I’m seriously oversimplifying how modern memory works, here. But the mental model works, so please forgive me.

For Unicode, things get a little different, but the basic idea is that you can start or continue the variable name
with one of the characters listed in C11 §D.1, and you can continue but not start a variable name with any of
the characters listed in C11 §D.2.
Since those are just number ranges, I’m not going to reproduce them here. If you’re in an environment that
supports Unicode, just try it and see if it works.
Just don’t start a variable name with the “Combining Left Harpoon Above” character and you’ll be fine.

Variable Types
Depending on which languages you already have in your toolkit, you might or might not be familiar with the
idea of types. But C’s kinda picky about them, so we should do a refresher.
Some example types:

Type                 Example            C Type

Integer              3490               int
Floating point       3.14159            float
Character (single)   'c'                char
String               "Hello, world!"    char *[28]

C makes an effort to convert automatically between most numeric types when you ask it to. But other than
that, all conversions are manual, notably between string and numeric.
Almost all of the types in C are variants on these types.
Before you can use a variable, you have to declare that variable and tell C what type the variable holds. Once
declared, the type of variable cannot be changed later at runtime. What you set it to is what it is until it falls
out of scope and is reabsorbed into the universe.
Let’s take our previous “Hello, world” code and add a couple variables to it:
#include <stdio.h>

int main(void)
{
    int i;    /* holds signed integers, e.g. -3, -2, 0, 1, 10 */
    float f;  /* holds signed floating point numbers, e.g. -3.1416 */

    printf("Hello, World!\n");  /* ah, blessed familiarity */
}

There! We’ve declared a couple of variables. We haven’t used them yet, and they’re both uninitialized. One
holds an integer number, and the other holds a floating point number (a real number, basically, if you have a
math background).
Uninitialized variables have indeterminate value[29]. They have to be initialized or else you must assume they
contain some nonsense number.
This is one of the places C can “get you”. Much of the time, in my experience, the indeterminate
value is zero… but it can vary from run to run! Never assume the value will be zero, even if you
see it is. Always explicitly initialize variables to some value before you use them!
[28] Read this as “pointer to a char” or “char pointer”. “Char” for character. Though I can’t find a study, it seems anecdotally most people pronounce this as “char”, a minority say “car”, and a handful say “care”. We’ll talk more about pointers later.
[29] Colloquially, we say they have “random” values, but they aren’t truly (or even pseudo-truly) random numbers.

What’s this? You want to store some numbers in those variables? Insanity!
Let’s go ahead and do that:
#include <stdio.h>

int main(void)
{
    int i;

    i = 2;  // Assign the value 2 into the variable i

    printf("Hello, World!\n");
}

Killer. We’ve stored a value. Let’s print it.


We’re going to do that by passing two amazing parameters to the printf() function. The first argument is
a string that describes what to print and how to print it (called the format string), and the second is the value
to print, namely whatever is in the variable i.
printf() hunts through the format string for a variety of special sequences which start with a percent sign
(%) that tell it what to print. For example, if it finds a %d, it looks to the next parameter that was passed, and
prints it out as an integer. If it finds a %f, it prints the value out as a float. If it finds a %s, it prints a string.
As such, we can print out the value of various types like so:
#include <stdio.h>

int main(void)
{
    int i = 2;
    float f = 3.14;
    char *s = "Hello, world!";  // char * ("char pointer") is the string type

    printf("%s i = %d and f = %f!\n", s, i, f);
}

And the output will be:


Hello, world! i = 2 and f = 3.140000!

In this way, printf() might be similar to various types of format or parameterized strings in other languages
you’re familiar with.

Boolean Types
C has Boolean types, true or false?
1!
Historically, C didn’t have a Boolean type, and some might argue it still doesn’t.
In C, 0 means “false”, and non-zero means “true”.
So 1 is true. And 37 is true. And 0 is false.
You can just declare Boolean types as ints:
int x = 1;

if (x) {
    printf("x is true!\n");
}
If you #include <stdbool.h>, you also get access to some symbolic names that might make things look
more familiar, namely a bool type and true and false values:
#include <stdio.h>
#include <stdbool.h>

int main(void) {
    bool x = true;

    if (x) {
        printf("x is true!\n");
    }
}

But these are identical to using integer values for true and false. They’re just a facade to make things look
nice.

Operators and Expressions


C operators should be familiar to you from other languages. Let’s blast through some of them here.
(There are a bunch more details than this, but we’re going to do enough in this section to get started.)

The sizeof Operator


This operator tells you the size (in bytes) that a particular variable or data type uses in memory.
This can be different on different systems, except for char (which is always 1 byte).
And this might not seem very useful now, but we’ll be making reference to it here and there, so it’s worth
covering.
You can take the sizeof a variable or expression:
int a = 999;

printf("%zu\n", sizeof a);     // Prints 4 on many systems (the size of int varies)
printf("%zu\n", sizeof 3.14);  // Prints 8 on many systems (3.14 is a double)

or you can take the sizeof a type (note the parentheses are required around a type name, unlike an
expression):

printf("%zu\n", sizeof(int));   // Prints 4 on many systems
printf("%zu\n", sizeof(char));  // Prints 1 on all systems

We’ll make use of this later on.

Arithmetic
Hopefully these are familiar:
i = i + 3; // addition (+) and assignment (=) operators, add 3 to i
i = i - 8; // subtraction, subtract 8 from i
i = i * 9; // multiplication
i = i / 2; // division
i = i % 5; // modulo (division remainder)

There are shorthand variants for all of the above. Each of those lines could more tersely be written as:
i += 3; // Same as "i = i + 3", add 3 to i


i -= 8; // Same as "i = i - 8"
i *= 9; // Same as "i = i * 9"
i /= 2; // Same as "i = i / 2"
i %= 5; // Same as "i = i % 5"

There is no exponentiation. You’ll have to use one of the pow() function variants from math.h.
Let’s get into some of the weirder stuff you might not have in your other languages!

Ternary Operator
C also includes the ternary operator. This is an expression whose value depends on the result of a conditional
embedded in it.
// If x > 10, add 17 to y. Otherwise add 37 to y.

y += x > 10? 17: 37;

What a mess! You’ll get used to it the more you read it. To help out a bit, I’ll rewrite the above expression
using if statements:
// This expression:

y += x > 10? 17: 37;

// is equivalent to this non-expression:

if (x > 10)
    y += 17;
else
    y += 37;

Or, another example that prints if a number stored in x is odd or even:


printf("The number %d is %s.\n", x, x % 2 == 0 ? "even" : "odd");

The %s format specifier in printf() means print a string. If the expression x % 2 evaluates to 0, the value
of the entire ternary expression evaluates to the string "even". Otherwise it evaluates to the string "odd".
Pretty cool!
It’s important to note that the ternary operator isn’t flow control like the if statement is. It’s just an expression
that evaluates to a value.

Pre-and-Post Increment-and-Decrement
Now, let’s mess with another thing that you might not have seen.
These are the legendary post-increment and post-decrement operators:
i++; // Add one to i (post-increment)
i--; // Subtract one from i (post-decrement)

Very commonly, these are just used as shorter versions of:


i += 1; // Add one to i
i -= 1; // Subtract one from i

but they’re more subtly different than that, the clever scoundrels.
Let’s take a look at this variant, pre-increment and pre-decrement:
++i; // Add one to i (pre-increment)


--i; // Subtract one from i (pre-decrement)

With pre-increment and pre-decrement, the value of the variable is incremented or decremented before the
expression is evaluated. Then the expression is evaluated with the new value.
With post-increment and post-decrement, the value of the expression is first computed with the value as-is,
and then the value is incremented or decremented after the value of the expression has been determined.
You can actually embed them in expressions, like this:
i = 10;
j = 5 + i++; // Compute 5 + i, _then_ increment i

printf("%d, %d\n", i, j); // Prints 11, 15

Let’s compare this to the pre-increment operator:


i = 10;
j = 5 + ++i; // Increment i, _then_ compute 5 + i

printf("%d, %d\n", i, j); // Prints 11, 16

This technique is used frequently with array and pointer access and manipulation. It gives you a way to use
the value in a variable, and also increment or decrement that value before or after it is used.
But by far the most common place you’ll see this is in a for loop:
for (i = 0; i < 10; i++)
    printf("i is %d\n", i);

But more on that later.

The Comma Operator


This is an uncommonly-used way to separate expressions that will run left to right:
x = 10, y = 20; // First assign 10 to x, then 20 to y

Seems a bit silly, since you could just replace the comma with a semicolon, right?
x = 10; y = 20; // First assign 10 to x, then 20 to y

But that’s a little different. The latter is two separate statements, while the former is a single expression!
With the comma operator, the value of the comma expression is the value of the rightmost expression:
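x = (1, 2, 3);

printf("x is %d\n", x); // Prints 3, because 3 is rightmost in the comma list

Note the parentheses: assignment binds more tightly than the comma, so without them x = 1, 2, 3; would assign 1 to x and throw away the 2 and the 3.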

But even that’s pretty contrived. One common place the comma operator is used is in for loops to do multiple
things in each section of the statement:
for (i = 0, j = 10; i < 100; i++, j++)
    printf("%d, %d\n", i, j);

We’ll revisit that later.

Conditional Operators
For Boolean values, we have a raft of standard operators:
a == b; // True if a is equal to b
a != b; // True if a is not equal to b
a < b; // True if a is less than b
a > b; // True if a is greater than b
a <= b; // True if a is less than or equal to b
a >= b; // True if a is greater than or equal to b

Don’t mix up assignment = with comparison ==! Use two equals to compare, one to assign.
We can use the comparison expressions with if statements:
if (a <= 10)
    printf("Success!\n");

Boolean Operators
We can chain together or alter conditional expressions with Boolean operators for and, or, and not.

Operator   Boolean meaning

&&         and
||         or
!          not

An example of Boolean “and”:


// Do something if x less than 10 and y greater than 20:

if (x < 10 && y > 20)
    printf("Doing something!\n");

An example of Boolean “not”:


if (!(x < 12))
    printf("x is not less than 12\n");

! has higher precedence than the comparison operators such as <, so we have to use parentheses in that case. Without them, !x < 12 would be read as (!x) < 12.

Of course, that’s just the same as:


if (x >= 12)
    printf("x is not less than 12\n");

but I needed the example!

Flow Control
Booleans are all good, but of course we’re nowhere if we can’t control program flow. Let’s take a look at a
number of constructs: if, for, while, and do-while.
First, a general forward-looking note about statements and blocks of statements brought to you by your local
friendly C developer:
After something like an if or while statement, you can either put a single statement to be executed, or a
block of statements to all be executed in sequence.
Let’s start with a single statement:
if (x == 10) printf("x is 10");
This is also sometimes written on a separate line. (Whitespace is largely irrelevant in C—it’s not like Python.)
if (x == 10)
    printf("x is 10\n");

But what if you want multiple things to happen due to the conditional? You can use squirrelly braces to mark
a block or compound statement.
if (x == 10) {
    printf("x is 10\n");
    printf("And also this happens when x is 10\n");
}

It’s a really common style to always use squirrelly braces even if they aren’t necessary:
if (x == 10) {
    printf("x is 10\n");
}

Some devs feel the code is easier to read and avoids errors like this where things visually look like they’re
in the if block, but actually they aren’t.
// BAD ERROR EXAMPLE

if (x == 10)
    printf("x is 10\n");
    printf("And also this happens ALWAYS\n");  // Surprise!! Unconditional!

while and for and the other looping constructs work the same way as the examples above. If you want to
do multiple things in a loop or after an if, wrap them up in squirrelly braces.
In other words, the if is going to run the one thing after the if. And that one thing can be a single statement
or a block of statements.

The if statement
We’ve already been using if for multiple examples, since it’s likely you’ve seen it in a language before, but
here’s another:
int i = 10;

if (i > 10) {
    printf("Yes, i is greater than 10.\n");
    printf("And this will also print if i is greater than 10.\n");
}

if (i <= 10) printf("i is less than or equal to 10.\n");

In the example code, the first two messages print only if i is greater than 10; otherwise execution skips the
block and continues with the next statement. Since i is 10 here, only the last message prints. Notice the
squirrelly braces after the first if: when the condition is true, the if runs either the single statement right
after it or, if squirrelly braces follow, the whole block of code inside them. This sort of code block behavior
is common to all the flow control statements.

The while statement


while is your average run-of-the-mill looping construct. Do a thing while a condition expression is true.

Let’s do one!
// print the following output:
//
// i is now 0!
// i is now 1!
// [ more of the same between 2 and 7 ]
// i is now 8!
// i is now 9!

i = 0;

while (i < 10) {
    printf("i is now %d!\n", i);
    i++;
}

printf("All done!\n");

That gets you a basic loop. C also has a for loop which would have been cleaner for that example.
A not-uncommon use of while is for infinite loops where you repeat while true:
while (1) {
    printf("1 is always true, so this repeats forever.\n");
}

The do-while statement


So now that we’ve gotten the while statement under control, let’s take a look at its closely related cousin,
do-while.

They are basically the same, except if the loop condition is false on the first pass, do-while will execute
once, but while won’t execute at all. Let’s see by example:
/* using a while statement: */

i = 10;

// this is not executed because i is not less than 10:
while (i < 10) {
    printf("while: i is %d\n", i);
    i++;
}

/* using a do-while statement: */

i = 10;

// this is executed once, because the loop condition is not checked until
// after the body of the loop runs:
do {
    printf("do-while: i is %d\n", i);
    i++;
} while (i < 10);

printf("All done!\n");
Notice that in both cases, the loop condition is false right away. So in the while, the loop fails, and the
following block of code is never executed. With the do-while, however, the condition is checked after the
block of code executes, so it always executes at least once. In this case, it prints the message, increments i,
then fails the condition, and continues to the “All done!” output.
The moral of the story is this: if you want the loop to execute at least once, no matter what the loop condition,
use do-while.
All these examples might have been better done with a for loop. Let’s do something less deterministic—
repeat until a certain random number comes up!
#include <stdio.h>   // For printf
#include <stdlib.h>  // For rand

int main(void)
{
    int r;

    do {
        r = rand() % 100;  // Get a random number between 0 and 99
        printf("%d\n", r);
    } while (r != 37);     // Repeat until 37 comes up
}

The for statement


Welcome to one of the most popular loops in the world! The for loop!
This is a great loop if you know the number of times you want to loop in advance.
You could do the same thing using just a while loop, but the for loop can help keep the code cleaner.
Here are two pieces of equivalent code—note how the for loop is just a more compact representation:
// Print numbers between 0 and 9, inclusive...

// Using a while statement:

i = 0;
while (i < 10) {
    printf("i is %d\n", i);
    i++;
}

// Do the exact same thing with a for-loop:

for (i = 0; i < 10; i++) {
    printf("i is %d\n", i);
}

That’s right, folks—they do exactly the same thing. But you can see how the for statement is a little more
compact and easy on the eyes. (JavaScript users will fully appreciate its C origins at this point.)
It’s split into three parts, separated by semicolons. The first is the initialization, the second is the loop
condition, and the third is what should happen at the end of each pass through the block, just before the
loop condition is checked again. All three of these parts are optional.
for (initialize things; loop if this is true; do this after each loop)

Note that the loop will not execute even a single time if the loop condition starts off false.
for-loop fun fact!

You can use the comma operator to do multiple things in each clause of the for loop!
for (i = 0, j = 999; i < 10; i++, j--) {
    printf("%d, %d\n", i, j);
}

An empty for will run forever:

for(;;) {  // "forever"
    printf("I will print this again and again and again\n");
    printf("for all eternity until the cold-death of the universe.\n");
}

The switch Statement


Depending on what languages you’re coming from, you might or might not be familiar with switch, or C’s
version might even be more restrictive than you’re used to. This is a statement that allows you to take a
variety of actions depending on the value of an integer expression.
Basically, it evaluates an expression to an integer value, then jumps to the case that corresponds to that value.
Execution resumes from that point. If a break statement is encountered, then execution jumps out of the
switch.

Let’s do an example where the user enters a number of goats and we print out a gut-feel of how many goats
that is.
#include <stdio.h>

int main(void)
{
    int goat_count;

    printf("Enter a goat count: ");
    scanf("%d", &goat_count);  // Read an integer from the keyboard

    switch (goat_count) {
        case 0:
            printf("You have no goats.\n");
            break;

        case 1:
            printf("You have a singular goat.\n");
            break;

        case 2:
            printf("You have a brace of goats.\n");
            break;

        default:
            printf("You have a bona fide plethora of goats!\n");
            break;
    }
}

In that example, if the user enters, say, 2, the switch will jump to the case 2 and execute from there. When
(if) it hits a break, it jumps out of the switch.
Also, you might see that default label there at the bottom. This is what happens when no cases match.
Every case, including default, is optional. And they can occur in any order, but it’s really typical for
default, if any, to be listed last.

So the whole thing acts like an if-else cascade:


if (goat_count == 0)
    printf("You have no goats.\n");
else if (goat_count == 1)
    printf("You have a singular goat.\n");
else if (goat_count == 2)
    printf("You have a brace of goats.\n");
else
    printf("You have a bona fide plethora of goats!\n");

With some key differences:


• switch is often faster to jump to the correct code (though the spec makes no such guarantee).
• if-else can handle relational conditions like < and >=, as well as floating point and other types,
while switch cannot.
There’s one more neat thing about switch that you sometimes see that is quite interesting: fall through.
Remember how break causes us to jump out of the switch?
Well, what happens if we don’t break?
Turns out we just keep on going into the next case! Demo!
switch (x) {
    case 1:
        printf("1\n");
        // fall through!
    case 2:
        printf("2\n");
        break;
    case 3:
        printf("3\n");
        break;
}

If x == 1, this switch will first hit case 1, it’ll print the 1, but then it just continues on to the next line of
code… which prints 2!
And then, at last, we hit a break so we jump out of the switch.
If x == 2, then we just hit the case 2, print 2, and break as normal.
Not having a break is called fall through.
ProTip: ALWAYS put a comment in the code where you intend to fall through, like I did above. It will save
other programmers from wondering if you meant to do that.
In fact, this is one of the common places to introduce bugs in C programs: forgetting to put a break in your
case. You gotta do it if you don’t want to just roll into the next case30 .

30
This was considered such a hazard that the designers of the Go Programming Language made break the default; you have to explicitly
use Go’s fallthrough statement if you want to fall into the next case.
Functions

Very much like other languages you’re used to, C has the concept of functions.
Functions can accept a variety of arguments and return a value. One important thing, though: the arguments
and return value types are predeclared—because that’s how C likes it!
Let’s take a look at a function. This is a function that takes an int as an argument, and returns an int.
int plus_one(int n)  // The "definition"
{
    return n + 1;
}

The int before the plus_one indicates the return type.


The int n indicates that this function takes one int argument, stored in parameter n.
Continuing the program down into main(), we can see the call to the function, where we assign the return
value into local variable j:
int main(void)
{
    int i = 10, j;

    j = plus_one(i);  // The "call"

    printf("i + 1 is %d\n", j);
}

Before I forget, notice that I defined the function before I used it. If I hadn’t done that, the compiler
wouldn’t know about it yet when it compiles main() and it would have given an unknown
function call error. There is a more proper way to do the above code with function prototypes,
but we’ll talk about that later.
Also notice that main() is a function!
It returns an int.
But what’s this void thing? This is a keyword that’s used to indicate that the function accepts no arguments.
You can also return void to indicate that you don’t return a value:
#include <stdio.h>

// This function takes no parameters and returns no value:

void hello(void)
{
    printf("Hello, world!\n");
}

int main(void)
{
    hello();  // Prints "Hello, world!"
}

Passing by Value
When you pass a value to a function, a copy of that value gets made in this magical mystery world known as
the stack31 . (The stack is just a hunk of memory somewhere that the program allocates memory on. Some
of the stack is used to hold the copies of values that are passed to functions.)
For now, the important part is that a copy of the variable or value is being passed to the function. The practical
upshot of this is that since the function is operating on a copy of the value, you can’t affect the value back in
the calling function directly. Like if you wanted to increment a value by one, this would NOT work:
void increment(int a)
{
    a++;
}

int main(void)
{
    int i = 10;

    increment(i);
}

You might somewhat sensibly think that the value of i after the call would be 11, since that’s what the ++
does, right? This would be incorrect. What is really happening here?
Well, when you pass i to the increment() function, a copy gets made on the stack, right? It’s the copy that
increment() works on, not the original; the original i is unaffected. We even gave the copy a name: a,
right? It’s right there in the parameter list of the function definition. So we increment a, sure enough, but
what good does that do us out in main() ? None! Ha!
That’s why in the previous example with the plus_one() function, we returned the locally modified value
so that we could see it again in main().
Seems a little bit restrictive, huh? Like you can only get one piece of data back from a function, is what
you’re thinking. There is, however, another way to get data back; C folks call it passing by reference. But no
fancy-schmancy name will distract you from the fact that EVERYTHING you pass to a function WITHOUT
EXCEPTION is copied onto the stack and the function operates on that local copy, NO MATTER WHAT.
Remember that, even when we’re talking about this so-called passing by reference.
But that’s a story for another time.

Function Prototypes
So if you recall back in the ice age a few sections ago, I mentioned that you had to define the function before
you used it, otherwise the compiler wouldn’t know about it ahead of time, and would bomb out with an error.
31
Now, technically speaking, the C specification doesn’t say anything about a stack. It’s true. Your system might not use a stack
deep-down for function calls. But it either does or looks like it does, and every single C programmer on the planet will know what
you’re talking about when you talk about “the stack”. It would be just mean for me to keep you in the dark. Plus, the stack analogy is
excellent for describing how recursion works.

This isn’t quite strictly true. You can notify the compiler in advance that you’ll be using a function of a
certain type that has a certain parameter list and that way the function can be defined anywhere at all, as long
as the function prototype has been declared first.
Fortunately, the function prototype is really quite easy. It’s merely a copy of the first line of the function
definition with a semicolon tacked on the end for good measure. For example, this code calls a function that
is defined later, because a prototype has been declared first:
int foo(void);  // This is the prototype!

int main(void)
{
    int i;

    i = foo();
}

int foo(void)  // this is the definition, just like the prototype!
{
    return 3490;
}

You might notice something about the sample code we’ve been using…that is, we’ve been using the good old
printf() function without defining it or declaring a prototype! How do we get away with this lawlessness?
We don’t, actually. There is a prototype; it’s in that header file stdio.h that we included with #include,
remember? So we’re still legit, officer!
Pointers—Cower In Fear!

Pointers are one of the most feared things in the C language. In fact, they are the one thing that makes this
language challenging at all. But why?
Because they, quite honestly, can cause electric shocks to come up through the keyboard and physically weld
your arms permanently in place, cursing you to a life at the keyboard in this language from the 70s!
Well, not really. But they can cause huge headaches if you don’t know what you’re doing when you try to
mess with them.

Memory and Variables


Computer memory holds data of all kinds, right? It’ll hold floats, ints, or whatever you have. To make
memory easy to cope with, each byte of memory is identified by an integer. These integers increase sequentially
as you move up through memory. You can think of it as a bunch of numbered boxes, where each box
holds a byte32 of data. Or like a big array where each element holds a byte, if you come from a language
with arrays. The number that represents each box is called its address.
Now, not all data types use just a byte. For instance, an int is often four bytes, as is a float, but it really
depends on the system. You can use the sizeof operator to determine how many bytes of memory a certain
type uses.
// %zu is the format specifier for type size_t ("t" is for "type", but
// it's pronounced "size tee"), which is what is returned by sizeof.
// More on size_t later.

printf("an int uses %zu bytes of memory\n", sizeof(int));

// That prints "4" for me, but can vary by system.

When you have a data type that uses more than a byte of memory, the bytes that make up the data are
always adjacent to one another in memory. Sometimes they’re in order, and sometimes they’re not33 , but
that’s platform-dependent, and often taken care of for you without you needing to worry about pesky byte
orderings.
So anyway, if we can get on with it and get a drum roll and some foreboding music playing for the definition
of a pointer, a pointer is the address of some data in memory. Imagine the classical score from 2001: A
Space Odyssey at this point. Ba bum ba bum ba bum BAAAAH!
Ok, so maybe a bit overwrought here, yes? There’s not a lot of mystery about pointers. They are the address
of data. Just like an int can be 12, a pointer can be the address of data.
32
A byte is a number made up of no more than 8 binary digits, or bits for short. This means in decimal digits just like grandma used
to use, it can hold an unsigned number between 0 and 255, inclusive.
33
The order that bytes come in is referred to as the endianness of the number. Common ones are big endian and little endian. This
usually isn’t something you need to worry about.


This means that all these things mean the same thing:
• Index into memory (if you’re thinking of memory like a big array)
• Address
• Pointer
• Location
I’m going to use these interchangeably. And yes, I just threw location in there because you can never have
enough words that mean the same thing.
Often, we like to make a pointer to some data that we have stored in a variable, as opposed to any old random
data out in memory wherever. Having a pointer to a variable is often more useful.
So if we have an int, say, and we want a pointer to it, what we want is some way to get the address of that
int, right? After all, the pointer is just the address of the data. What operator do you suppose we’d use to
find the address of the int?
Well, by a shocking surprise that must come as something of a shock to you, gentle reader, we use the
address-of operator (which happens to be an ampersand: “&”) to find the address of the data. Ampersand.
So for a quick example, we’ll introduce a new format specifier for printf() so you can print a pointer. You
know already how %d prints a decimal integer, yes? Well, %p prints a pointer. Now, this pointer is going to
look like a garbage number (and it might be printed in hexadecimal34 instead of decimal), but it is merely
the index into memory the data is stored in. (Or the index into memory that the first byte of data is stored in,
if the data is multi-byte.) In virtually all circumstances, including this one, the actual value of the number
printed is unimportant to you, and I show it here only for demonstration of the address-of operator.
#include <stdio.h>

int main(void)
{
    int i = 10;

    // %p expects a void*, so cast the int* before passing it
    printf("The value of i is %d, and its address is %p\n", i, (void *)&i);
}

On my computer, this prints:


The value of i is 10, and its address is 0x7ffda2546fc4

If you’re curious, that hexadecimal number is 140,727,326,896,068 in base 10. That’s the index into memory
where the variable i’s data is stored. It’s the address of i. It’s the location of i. It’s a pointer to i.
It’s a pointer because it lets you know where i is in memory. Like a literal sign with an arrow on it pointing
at a thing, this number indicates to us where in memory we can find the value of i. It points to i.
Again, we don’t really care what the number is, generally. We just care that it’s a pointer to i.

Pointer Types
Well, this is all well and good. You can now successfully take the address of a variable and print it on the
screen. There’s a little something for the ol’ resume, right? Here’s where you grab me by the scruff of the
neck and ask politely what the frick pointers are good for.
Excellent question, and we’ll get to that right after these messages from our sponsor.
ACME ROBOTIC HOUSING UNIT CLEANING SERVICES. YOUR HOMESTEAD WILL BE DRAMATICALLY
IMPROVED OR YOU WILL BE TERMINATED. MESSAGE ENDS.
34
That is, base 16 with digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F.

Welcome back to another installment of Beej’s Guide to Whatever. When we met last we were talking about
how to make use of pointers. Well, what we’re going to do is store a pointer off in a variable so that we can
use it later. You can identify the pointer type because there’s an asterisk (*) before the variable name and
after its type:
int main(void)
{
    int i;   /* i's type is "int" */
    int *p;  /* p's type is "pointer to an int", or "int-pointer" */
}

Hey, so we have here a variable that is a pointer itself, and it can point to other ints. We know it points to
ints, since it’s of type int* (read “int-pointer”).

When you do an assignment into a pointer variable, the type of the right hand side of the assignment has to
be the same type as the pointer variable. Fortunately for us, when you take the address-of a variable, the
resultant type is a pointer to that variable type, so assignments like the following are perfect:
int i;
int *p; /* p is a pointer, but is uninitialized and points to garbage */

p = &i; /* p now "points to" i */

On the left of the assignment, we have a variable of type pointer-to-int (int*), and on the right side, we
have an expression of type address-of-int (since i is an int). But remember that “address” and “pointer” both
mean the same thing! The address of a thing is a pointer to that thing.
So effectively, both sides of the assignment are type pointer-to-int (which is the same as type “address-of-
int”, but no one says it that way).

Get it? I know it still doesn’t quite make much sense since you haven’t seen an actual use for the pointer
variable, but we’re taking small steps here so that no one gets lost. So now, let’s introduce you to the
anti-address-of operator. It’s kind of like what address-of would be like in Bizarro World.

Dereferencing
Like we’ve said, a pointer, also known as an address, is sometimes also called a reference. How in the name
of all that is holy can there be so many terms for exactly the same thing? I don’t know the answer to that one,
but these things are all equivalent, and can be used interchangeably.
The only reason I’m telling you this is so that the name of this operator will make any sense to you whatsoever.
When you have a pointer to a variable (roughly “a reference to a variable”), you can use the original variable
through the pointer by dereferencing the pointer. (You can think of this as “de-pointering” the pointer, but
no one ever says “de-pointering”.)
What do I mean by “get access to the original variable”? Well, if you have a variable called i, and you have
a pointer to i called p, you can use the dereferenced pointer p exactly as if it were the original variable i!
You almost have enough knowledge to handle an example. The last tidbit you need to know is actually this:
what is the dereference operator? It is the asterisk, again: *. Now, don’t get this confused with the asterisk
you used in the pointer declaration, earlier. They are the same character, but they have different meanings in
different contexts35 .
Here’s a full-blown example:
#include <stdio.h>

int main(void)
{
    int i;
    int *p;  // this is NOT a dereference--this is a type "int*"

    p = &i;  // p now points to i, p holds address of i

    i = 10;   // i is now 10
    *p = 20;  // i (yes i!) is now 20!!

    printf("i is %d\n", i);   // prints "20"
    printf("i is %d\n", *p);  // "20"! dereference-p is the same as i!
}

35
That’s not all! It’s used in /*comments*/ and multiplication!

Remember that p holds the address of i, as you can see where we did the assignment to p. What the dereference
operator does is tell the computer to use the variable the pointer points to instead of using the pointer
itself. In this way, we have turned *p into an alias of sorts for i.
Great, but why? Why do any of this?

Passing Pointers as Parameters


Right about now, you’re thinking that you have an awful lot of knowledge about pointers, but absolutely zero
application, right? I mean, what use is *p if you could just simply say i instead?
Well, my feathered friend, the real power of pointers comes into play when you start passing them to functions.
Why is this a big deal? You might recall from before that you could pass all kinds of parameters to functions
and they’d be dutifully copied onto the stack, and then you could manipulate local copies of those variables
from within the function, and then you could return a single value.
What if you wanted to bring back more than one single piece of data from the function? I mean, you can
only return one thing, right? What if I answered that question with another question, like this:
What happens when you pass a pointer as a parameter to a function? Does a copy of the pointer get put on
the stack? You bet your sweet peas it does. Remember how earlier I rambled on and on about how EVERY
SINGLE PARAMETER gets copied onto the stack and the function uses a copy of the parameter? Well, the
same is true here. The function will get a copy of the pointer.
But, and this is the clever part: we will have set up the pointer in advance to point at a variable…and then
the function can dereference its copy of the pointer to get back to the original variable! The function can’t
see the variable itself, but it can certainly dereference a pointer to that variable! Example!
#include <stdio.h>

void increment(int *p)  // note that it accepts a pointer to an int
{
    *p = *p + 1;  // add one to the thing p points to
}

int main(void)
{
    int i = 10;
    int *j = &i;  // note the address-of; turns it into a pointer

    printf("i is %d\n", i);        // prints "10"
    printf("i is also %d\n", *j);  // prints "10"

    increment(j);

    printf("i is %d\n", i);        // prints "11"!
}

Ok! There are a couple things to see here…not the least of which is that the increment() function takes
an int* as a parameter. We pass it an int* in the call by changing the int variable i to an int* using the
address-of operator. (Remember, a pointer is an address, so we make pointers out of variables by running
them through the address-of operator.)
The increment() function gets a copy of the pointer on the stack. Both the original pointer j (in main())
and the copy of that pointer p (in increment()) point to the same address, namely the one holding the
value i. So dereferencing either will allow you to modify the original variable i! The function can modify
a variable in another scope! Rock on!
Pointer enthusiasts will recall from early on in the guide, we used a function to read from the keyboard,
scanf()…and, although you might not have recognized it at the time, we used the address-of to pass a
pointer to a value to scanf(). We had to pass a pointer, see, because scanf() reads from the keyboard and
stores the result in a variable. The only way it can see that variable that is local to that calling function is if
we pass a pointer to that variable:
int i = 0;

scanf("%d", &i); /* pretend you typed "12" */


printf("i is %d\n", i); /* prints "i is 12" */

See, scanf() dereferences the pointer we pass it in order to modify the variable it points to. And now you
know why you have to put that pesky ampersand in there!

The NULL Pointer


Any pointer type can be set to a special value called NULL. This indicates that this pointer doesn’t point to
anything.
int *p;

p = NULL;

Since it doesn’t point to a value, dereferencing it is undefined behavior, and probably will result in a crash:
int *p = NULL;

*p = 12; // CRASH or SOMETHING PROBABLY BAD

Despite being called the billion dollar mistake by its creator36 , the NULL pointer is a good sentinel value37
and general indicator that a pointer hasn’t yet been initialized.
(Of course, the pointer points to garbage unless you explicitly assign it to point to an address or NULL.)

A Note on Declaring Pointers


The syntax for declaring a pointer can get a little weird. Let’s look at this example:
int a;
int b;

We can condense that into a single line, right?


36
https://en.wikipedia.org/wiki/Null_pointer#History
37
https://en.wikipedia.org/wiki/Sentinel_value

int a, b; // Same thing

So a and b are both ints. No problem.


But what about this?
int a;
int *p;

Can we make that into one line? We can. But where does the * go?
The rule is that the * goes in front of any variable that is a pointer type. That is, the * is not part of the int
in this example; it’s a part of variable p.
With that in mind, we can write this:
int a, *p; // Same thing

It’s important to note that this line does not declare two pointers:
int *p, q; // p is a pointer to an int; q is just an int.

So take a look at this and determine which variables are pointers and which are not:
int *a, b, c, *d, e, *f, g, h, *i;

I’ll drop the answer in a footnote38 .

38
The pointer type variables are a, d, f, and i, because those are the ones with * in front of them.
Arrays

Luckily, C has arrays. I mean, I know it’s considered a low-level language39 but it does at least have the
concept of arrays built-in. And since a great many languages drew inspiration from C’s syntax, you’re
probably already familiar with using [ and ] for declaring and using arrays in C.
But only barely! As we’ll find out later, arrays are just syntactic sugar in C—they’re actually all pointers and
stuff deep down. Freak out! But for now, let’s just use them as arrays. Phew.

Easy Example
Let’s just crank out an example:
#include <stdio.h>

int main(void)
{
    int i;
    float f[4];  // Declare an array of 4 floats

    f[0] = 3.14159;  // Indexing starts at 0, of course.
    f[1] = 1.41421;
    f[2] = 1.61803;
    f[3] = 2.71828;

    // Print them all out:

    for (i = 0; i < 4; i++) {
        printf("%f\n", f[i]);
    }
}

When you declare an array, you have to give it a size. And the size has to be fixed40 .
In the above example, we made an array of 4 floats. The value in the square brackets in the declaration lets
us know that.
Later on in subsequent lines, we access the values in the array, setting them or getting them, again with square
brackets.
Hopefully this looks familiar from languages you already know!
39
These days, anyway.
40
Again, not really, but variable-length arrays—of which I’m not really a fan—are a story for another time.


Getting the Length of an Array


You can’t. C doesn’t record this information. You have to manage it separately in another variable.
There is a trick to get the number of elements in an array in the scope in which an array is declared. But,
generally speaking, this won’t work the way you want if you pass the array into a function.

Array Initializers
You can initialize an array with constants ahead of time:
#include <stdio.h>

int main(void)
{
    int i;
    int a[5] = {22, 37, 3490, 18, 95};  // Initialize with these values

    for (i = 0; i < 5; i++) {
        printf("%d\n", a[i]);
    }
}

Catch: initializer values must be constant terms. Can’t throw variables in there. Sorry, Illinois!
You should never have more items in your initializer than there is room for in the array, or the compiler will
get cranky:
foo.c: In function ‘main’:
foo.c:6:39: warning: excess elements in array initializer
6 | int a[5] = {22, 37, 3490, 18, 95, 999};
| ^~~
foo.c:6:39: note: (near initialization for ‘a’)

But (fun fact!) you can have fewer items in your initializer than there is room for in the array. The remaining
elements in the array will be automatically initialized with zero.
int a[5] = {22, 37, 3490};

// is the same as:

int a[5] = {22, 37, 3490, 0, 0};

It’s a common shortcut to see this in an initializer when you want to set an entire array to zero:
int a[100] = {0};

Which means, “Make the first element zero, and then automatically make the rest zero, as well.”
Lastly, you can also have C compute the size of the array from the initializer, just by leaving the size off:
int a[3] = {22, 37, 3490};

// is the same as:

int a[] = {22, 37, 3490}; // Left the size off!



Out of Bounds!
C doesn’t stop you from accessing arrays out of bounds. It might not even warn you.
Let’s steal the example from above and keep printing off the end of the array. It only has 5 elements, but let’s
try to print 10 and see what happens:
#include <stdio.h>

int main(void)
{
    int i;
    int a[5] = {22, 37, 3490, 18, 95};

    for (i = 0; i < 10; i++) {  // BAD NEWS: printing too many elements!
        printf("%d\n", a[i]);
    }
}

Running it on my computer prints:


22
37
3490
18
95
32765
1847052032
1780534144
-56487472
21890

Yikes! What’s that? Well, turns out printing off the end of an array results in what C developers call undefined
behavior. We’ll talk more about this beast later, but for now it means, “You’ve done something bad, and
anything could happen during your program run.”
And by anything, I mean typically things like finding zeroes, finding garbage numbers, or crashing. But
really the C spec says in this circumstance the compiler is allowed to emit code that does anything41 .
Short version: don’t do anything that causes undefined behavior. Ever42 .

Multidimensional Arrays
You can add as many dimensions as you want to your arrays.
int a[10];
int b[2][7];
int c[4][5][6];

These are stored in memory in row-major order43 .


You can also use initializers on multidimensional arrays by nesting them:
41
In the good old MS-DOS days before memory protection was a thing, I was writing some particularly abusive C code that deliberately
engaged in all kinds of undefined behavior. But I knew what I was doing, and things were working pretty well. Until I made a
misstep that caused a lockup and, as I found upon reboot, nuked all my BIOS settings. That was fun. (Shout-out to @man for those fun
times.)
42
There are a lot of things that cause undefined behavior, not just out-of-bounds array accesses. This is what makes the C language
so exciting.
43
https://en.wikipedia.org/wiki/Row-_and_column-major_order

#include <stdio.h>

int main(void)
{
    int row, col;

    int a[2][5] = {      // Initialize a 2D array
        {0, 1, 2, 3, 4},
        {5, 6, 7, 8, 9}
    };

    for (row = 0; row < 2; row++) {
        for (col = 0; col < 5; col++) {
            printf("(%d,%d) = %d\n", row, col, a[row][col]);
        }
    }
}

For output of:


(0,0) = 0
(0,1) = 1
(0,2) = 2
(0,3) = 3
(0,4) = 4
(1,0) = 5
(1,1) = 6
(1,2) = 7
(1,3) = 8
(1,4) = 9

Arrays and Pointers


[Casually] So… I kinda might have mentioned up there that arrays were pointers, deep down? We should
take a shallow dive into that now so that things aren’t completely confusing. Later on, we’ll look at what the
real relationship between arrays and pointers is, but for now I just want to look at passing arrays to functions.

Getting a Pointer to an Array


I want to tell you a secret. Generally speaking, when a C programmer talks about a pointer to an array, they’re
talking about a pointer to the first element of the array44 .
So let’s get a pointer to the first element of an array.
#include <stdio.h>

int main(void)
{
    int a[5] = {11, 22, 33, 44, 55};
    int *p;

    p = &a[0];  // p points to the array
                // Well, to the first element, actually

    printf("%d\n", *p);  // Prints "11"
}

44
This is technically incorrect, as a pointer to an array and a pointer to the first element of an array have different types. But we can
burn that bridge when we get to it.

This is so common to do in C that the language allows us a shorthand:


p = &a[0]; // p points to the array

// is the same as:

p = a; // p points to the array, but much nicer-looking!

Just referring to the array name in isolation is the same as getting a pointer to the first element of the array!
We’re going to use this extensively in the upcoming examples.
But hold on a second: isn’t p an int*? And *p gives us 11, same as a[0]? Yessss. You’re starting to get a
glimpse of how arrays and pointers are related in C.

Passing Single Dimensional Arrays to Functions


Let’s do an example with a single dimensional array. I’m going to write a couple functions that we can pass
the array to that do different things.
Prepare for some mind-blowing function signatures!
#include <stdio.h>

// Passing as a pointer to the first element
void times2(int *a, int len)
{
    for (int i = 0; i < len; i++)
        printf("%d\n", a[i] * 2);
}

// Same thing, but using array notation
void times3(int a[], int len)
{
    for (int i = 0; i < len; i++)
        printf("%d\n", a[i] * 3);
}

// Same thing, but using array notation with size
void times4(int a[5], int len)
{
    for (int i = 0; i < len; i++)
        printf("%d\n", a[i] * 4);
}

int main(void)
{
    int x[5] = {11, 22, 33, 44, 55};

    times2(x, 5);
    times3(x, 5);
    times4(x, 5);
}
ARRAYS 35

All those methods of listing the array as a parameter in the function are identical.
void times2(int *a, int len)
void times3(int a[], int len)
void times4(int a[5], int len)

In C, the first is the most common, by far.


And, in fact, in the latter situation, the compiler doesn’t even care what number you pass in (other than it has
to be greater than zero45 ). It doesn’t enforce anything at all.
Now that I’ve said that, the size of the array in the function declaration actually does matter when you’re
passing multidimensional arrays into functions, but let’s come back to that.

Changing Arrays in Functions


We’ve said that arrays are just pointers in disguise. This means that if you pass an array to a function, you’re
actually passing a pointer to the first element in the array.
But if the function has a pointer to the data, it is able to manipulate that data! So changes that a function
makes to an array will be visible back out in the caller.
Here’s an example where we pass a pointer to an array into a function, the function manipulates the values
in that array, and those changes are visible out in the caller.
#include <stdio.h>

void double_array(int *a, int len)
{
    // Multiply each element by 2
    //
    // This doubles the values in x in main() since x and a both point
    // to the same array in memory!

    for (int i = 0; i < len; i++)
        a[i] *= 2;
}

int main(void)
{
    int x[5] = {1, 2, 3, 4, 5};

    double_array(x, 5);

    for (int i = 0; i < 5; i++)
        printf("%d\n", x[i]);  // 2, 4, 6, 8, 10!
}

Later when we talk about the equivalence between arrays and pointers, we’ll see how this makes a lot more
sense. For now, it’s enough to know that functions can make changes to arrays that are visible out in the
caller.
45
C11 §6.7.6.2¶1 requires it be greater than zero. But you might see code out there with arrays declared of zero length at the end of
structs and GCC is particularly lenient about it unless you compile with -pedantic. This zero-length array was a hackish mechanism
for making variable-length structures. Unfortunately, it’s technically undefined behavior to access such an array even though it basically
worked everywhere. C99 codified a well-defined replacement for it called flexible array members, which we’ll chat about later.

Passing Multidimensional Arrays to Functions


The story changes a little when we’re talking about multidimensional arrays. C needs to know all the
dimensions (except the first one) so it has enough information to know where in memory to look to find a
value.
Here’s an example where we’re explicit with all the dimensions:
#include <stdio.h>

void print_2D_array(int a[2][3])
{
    for (int row = 0; row < 2; row++) {
        for (int col = 0; col < 3; col++)
            printf("%d ", a[row][col]);
        printf("\n");
    }
}

int main(void)
{
    int x[2][3] = {
        {1, 2, 3},
        {4, 5, 6}
    };

    print_2D_array(x);
}

But in this case, these two46 are equivalent:


void print_2D_array(int a[2][3])
void print_2D_array(int a[][3])

The compiler really only needs the second dimension so it can figure out how far in memory to skip for each
increment of the first dimension.
Also, the compiler does minimal compile-time bounds checking (if you’re lucky), and C does zero runtime
checking of bounds. No seat belts! Don’t crash!

46
This is also equivalent: void print_2D_array(int (*a)[3]), but that’s more than I want to get into right now.
Strings

Finally! Strings! What could be simpler?


Well, turns out strings aren’t actually strings in C. That’s right! They’re pointers! Of course they are!
Much like arrays, strings in C barely exist.
But let’s check it out—it’s not really such a big deal.

Constant Strings
Before we start, let’s talk about constant strings in C. These are sequences of characters in double quotes (").
(Single quotes enclose characters, and are a different animal entirely.)
Examples:
"Hello, world!\n"
"This is a test."
"When asked if this string had quotes in it, she replied, \"It does.\""

The first one has a newline at the end—quite a common thing to see.
The last one has quotes embedded within it, but you see each is preceded by (we say “escaped by”) a backslash
(\) indicating that a literal quote belongs in the string at this point. This is how the C compiler can tell the
difference between printing a double quote and the double quote at the end of the string.

String Variables
Now that we know how to make a constant string, let’s assign it to a variable so we can do something with
it.
char *s = "Hello, world!";

Check out that type: pointer to a char47 . The string variable s is actually a pointer to the first character in
that string, namely the H.
And we can print it with the %s (for “string”) format specifier:
char *s = "Hello, world!";

printf("%s\n", s); // "Hello, world!"


47
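A string literal actually has array type (char[14] here) and merely behaves like a char* in this context. Modifying it is undefined behavior, so const char* is the safer declaration, but we haven’t talked about const yet.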


String Variables as Arrays


Another option is this, which for our purposes works much like the above char* usage:
char s[14] = "Hello, world!";

// or, if we were properly lazy:

char s[] = "Hello, world!";

This means you can use array notation to access characters in a string. Let’s do exactly that to print each
character in a string on its own line:
#include <stdio.h>

int main(void)
{
    char s[] = "Hello, world!";

    for (int i = 0; i < 13; i++)
        printf("%c\n", s[i]);
}

Note that we’re using the format specifier %c to print a single character.
Also, check this out. The program will still work fine if we change the definition of s to be a char* type:
#include <stdio.h>

int main(void)
{
    char *s = "Hello, world!";  // char* here

    for (int i = 0; i < 13; i++)
        printf("%c\n", s[i]);   // But still use arrays here...?
}

And we still can use array notation to get the job done when printing it out! This is surprising, but is still
only because we haven’t talked about array/pointer equivalence yet. But this is yet another hint that arrays
and pointers are the same thing, deep down.

String Initializers
We’ve already seen some examples with initializing string variables with constant strings:
char *s = "Hello, world!";
char t[] = "Hello, again!";

But these two are subtly different.


This one is a pointer to a constant string (i.e. a pointer to the first character in a constant string):
char *s = "Hello, world!";

If you try to mutate that string with this:


char *s = "Hello, world!";

s[0] = 'z'; // BAD NEWS: tried to mutate a constant string!



The behavior is undefined. Probably, depending on your system, a crash will result.
But declaring it as an array is different. This one is a non-constant, mutable copy of the constant string that
we can change at will:
char t[] = "Hello, again!"; // t is an array copy of the string
t[0] = 'z'; // No problem

printf("%s\n", t); // "zello, again!"

So remember: if you have a pointer to a constant string, don’t try to change it!

Getting String Length


You can’t, since C doesn’t track it for you. And when I say “can’t”, I actually mean “can”48 . There’s a
function in <string.h> called strlen() that can be used to compute the length of any string.
#include <stdio.h>
#include <string.h>

int main(void)
{
    char *s = "Hello, world!";

    printf("The string is %zu characters long.\n", strlen(s));
}

The strlen() function returns type size_t, which is an integer type so you can use it for integer math. We
print size_t with %zu.
The above program prints:
The string is 13 characters long.

Great! So it is possible to get the string length!


But… if C doesn’t track the length of the string anywhere, how does it know how long the string is?

String Termination
C does strings a little differently than many programming languages, and in fact differently than almost every
modern programming language.
When you’re making a new language, you have basically two options for storing a string in memory:
1. Store the bytes of the string along with a number indicating the length of the string.
2. Store the bytes of the string, and mark the end of the string with a special byte called the terminator.
If you want strings longer than 255 characters, option 1 requires at least two bytes to store the length. Whereas
option 2 only requires one byte to terminate the string. So a bit of savings there.
Of course, these days it seems ridiculous to worry about saving a byte (or 3, since lots of languages will happily
let you have strings that are 4 gigabytes in length). But back in the day, it was a bigger deal.
So C took approach #2. In C, a “string” is defined by two basic characteristics:
• A pointer to the first character in the string.
• A zero-valued byte (or NUL character49 ) somewhere in memory after the pointer that indicates the end
of the string.
48
Though it is true that C doesn’t track the length of strings.
A NUL character can be written in C code as \0, though you don’t often have to do this.
When you include a constant string in your code, the NUL character is automatically, implicitly included.
char *s = "Hello!"; // Actually "Hello!\0" behind the scenes

So with this in mind, let’s write our own strlen() function that counts characters in a string until it finds a
NUL.

The procedure is to look down the string for a single NUL character, counting as we go50 :
int my_strlen(char *s)
{
    int count = 0;

    while (s[count] != '\0')  // Single quotes for single char
        count++;

    return count;
}

And that’s basically how the built-in strlen() gets the job done.

Copying a String
You can’t copy a string through the assignment operator (=). All that does is make a copy of the pointer to
the first character… so you end up with two pointers to the same string:
#include <stdio.h>

int main(void)
{
    char s[] = "Hello, world!";
    char *t;

    // This makes a copy of the pointer, not a copy of the string!
    t = s;

    // We modify t
    t[0] = 'z';

    // But printing s shows the modification!
    // Because t and s point to the same string!

    printf("%s\n", s);  // "zello, world!"
}

If you want to make a copy of a string, you have to copy it a byte at a time—but this is made easier with the
strcpy() function51 .

Before you copy the string, make sure you have room to copy it into, i.e. the destination array that’s going
to hold the characters needs to be at least one byte longer than the string you’re copying, leaving room for the NUL terminator.
49
This is different than the NULL pointer, and I’ll abbreviate it NUL when talking about the character versus NULL for the pointer.
50
Later we’ll learn a neater way to do this with pointer arithmetic.
51
There’s a length-limited function called strncpy() that is often used instead, though it has pitfalls of its own; we’ll get to that later.

#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[] = "Hello, world!";
    char t[100];  // Each char is one byte, so plenty of room

    // This makes a copy of the string!
    strcpy(t, s);

    // We modify t
    t[0] = 'z';

    // And s remains unaffected because it's a different string
    printf("%s\n", s);  // "Hello, world!"

    // But t has been changed
    printf("%s\n", t);  // "zello, world!"
}

Notice with strcpy(), the destination pointer is the first argument, and the source pointer is the second. A
mnemonic I use to remember this is that it’s the order you would have put t and s if an assignment = worked
for strings.
Structs

In C, we have something called a struct, which is a user-definable type that holds multiple pieces of data,
potentially of different types.
It’s a convenient way to bundle multiple variables into a single one. This can be beneficial for passing
variables to functions (so you just have to pass one instead of many), and useful for organizing data and
making code more readable.
If you’ve come from another language, you might be familiar with the idea of classes and objects. These
don’t exist in C, natively52 . You can think of a struct as a class with only data members, and no methods.

Declaring a Struct
You can declare a struct in your code like so:
struct car {
char *name;
float price;
int speed;
};

This is often done at the global scope outside any functions so that the struct is globally available.
When you do this, you’re making a new type. The full type name is struct car. (Not just car—that won’t
work.)
There aren’t any variables of that type yet, but we can declare some:
struct car saturn;

And now we have an uninitialized variable saturn53 of type struct car.


We should initialize it! But how do we set the values of those individual fields?
Like in many other languages that stole it from C, we’re going to use the dot operator (.) to access the
individual fields.
saturn.name = "Saturn SL/2";
saturn.price = 15999.99;
saturn.speed = 175;

printf("Name: %s\n", saturn.name);


printf("Price (USD): %f\n", saturn.price);
printf("Top Speed (km): %d\n", saturn.speed);
52
Although in C individual items in memory like ints are referred to as “objects”, they’re not objects in an object-oriented program-
ming sense.
53
The Saturn was a popular brand of economy car in the United States until it was put out of business by the 2008 crash, sadly so to
us fans.


Struct Initializers
That example in the previous section was a little unwieldy. There must be a better way to initialize that
struct variable!

You can do it with an initializer by putting values in for the fields in the order they appear in the struct
when you define the variable. (This won’t work after the variable has been defined—it has to happen in the
definition).
struct car {
char *name;
float price;
int speed;
};

// Now with an initializer! Same field order as in the struct declaration:


struct car saturn = {"Saturn SL/2", 16000.99, 175};

printf("Name: %s\n", saturn.name);


printf("Price: %f\n", saturn.price);
printf("Top Speed: %d km\n", saturn.speed);

The fact that the fields in the initializer need to be in the same order is a little freaky. If someone changes the
order in struct car, it could break all the other code!
We can be more specific with our initializers:
struct car saturn = {.speed=172, .name="Saturn SL/2"};

Now it’s independent of the order in the struct declaration. Which is safer code, for sure.
Similar to array initializers, any missing field designators are initialized to zero (in this case, that would be
.price, which I’ve omitted).

Passing Structs to Functions


You can do a couple things to pass a struct to a function.
1. Pass the struct.
2. Pass a pointer to the struct.
Recall that when you pass something to a function, a copy of that thing gets made for the function to operate
on, whether it’s a copy of a pointer, an int, a struct, or anything.
There are basically two cases when you’d want to pass a pointer to the struct:
1. You need the function to be able to make changes to the struct that was passed in, and have those
changes show in the caller.
2. The struct is somewhat large and it’s more expensive to copy that onto the stack than it is to just
copy a pointer54 .
For those two reasons, it’s far more common to pass a pointer to a struct to a function.
Let’s try that, making a function that will allow you to set the .price field of the struct car:
struct car {
    char *name;
    float price;
    int speed;
};

int main(void)
{
    struct car saturn = {.speed=175, .name="Saturn SL/2"};

    // Pass a pointer to this struct car, along with a new,
    // more realistic, price:
    set_price(&saturn, 800.00);

    // ... code continues ...

54
A pointer is likely 8 bytes on a 64-bit system.

You should be able to come up with the function signature for set_price() just by looking at the types of
the arguments we have there.
saturn is a struct car, so &saturn must be the address of the struct car, AKA a pointer to a struct
car, namely a struct car*.

And 800.00 is a floating point value, which the function can accept as a float.


So the function declaration must look like this:
void set_price(struct car *c, float new_price)

We just need to write the body. One attempt might be:


void set_price(struct car *c, float new_price) {
    c.price = new_price;  // ERROR!!
}

That won’t work because the dot operator only works on structs… it doesn’t work on pointers to structs.
Ok, so we can dereference the pointer to get to the struct itself. Dereferencing a struct
car* results in the struct car that the pointer points to, which we should be able to use the dot operator
on:
void set_price(struct car *c, float new_price) {
    (*c).price = new_price;  // Works, but non-idiomatic :(
}

And that works! But it’s a little clunky to type all those parens and the asterisk. C has some syntactic sugar
called the arrow operator that helps with that.

The Arrow Operator


void set_price(struct car *c, float new_price) {
    // (*c).price = new_price;  // Works, but non-idiomatic :(
    //
    // The line above is 100% equivalent to the one below:

    c->price = new_price;  // That's the one!
}

The arrow operator helps refer to fields in pointers to structs.


So when accessing fields, when do we use dot and when do we use arrow?
• If you have a struct, use dot (.).
• If you have a pointer to a struct, use arrow (->).

Copying and Returning structs


Here’s an easy one for you!
Just assign from one to the other!
struct car a, b;

b = a; // Copy the struct

And returning a struct (as opposed to a pointer to one) from a function also makes a similar copy to the
receiving variable.
This is not a “deep copy”. All fields are copied as-is, including pointers to things.
typedef: Making New Types

Well, not so much making new types as getting new names for existing types. Sounds kinda pointless on the
surface, but we can really use this to make our code cleaner.

typedef in Theory
Basically, you take an existing type and you make an alias for it with typedef.
Like this:
typedef int antelope; // Make "antelope" an alias for "int"

antelope x = 10; // Type "antelope" is the same as type "int"

You can take any existing type and do it. You can even make a number of types with a comma list:
typedef int antelope, bagel, mushroom; // These are all "int"

That’s really useful, right? That you can type mushroom instead of int? You must be super excited about
this feature!
OK, Professor Sarcasm—we’ll get to some more common applications of this in a moment.

Scoping
typedef follows regular scoping rules.

For this reason, it’s quite common to find typedef at file scope (“global”) so that all functions can use the
new types at will.

typedef in Practice
So renaming int to something else isn’t that exciting. Let’s see where typedef commonly makes an
appearance.

typedef and structs


Sometimes a struct will be typedef’d to a new name so you don’t have to type the word struct over and
over.
struct animal {
char *name;
int leg_count, speed;
};


//      original name new name
//            |         |
//            v         v
//      |-----------| |----|
typedef struct animal animal;

struct animal y;  // This works
animal z;         // This also works because "animal" is an alias

Personally, I don’t care for this practice. I like the clarity the code has when you add the word struct to the
type; programmers know what they’re getting. But it’s really common so I’m including it here.
Now I want to run the exact same example in a way that you might commonly see. We’re going to put the
struct animal in the typedef. You can mash it all together like this:

// original name
// |
// v
// |-----------|
typedef struct animal {
char *name;
int leg_count, speed;
} animal; // <-- new name

struct animal y; // This works


animal z; // This also works because "animal" is an alias

That’s exactly the same as the previous example, just more concise.
But that’s not all! There’s another common shortcut that you might see in code using what are called
anonymous structures55 . It turns out you don’t actually need to name the structure in a variety of places, and with
typedef is one of them.

Let’s do the same example with an anonymous structure:


// anonymous struct!
// |
// v
// |----|
typedef struct {
char *name;
int leg_count, speed;
} animal; // <-- new name

//struct animal y; // ERROR: this no longer works


animal z; // This works because "animal" is an alias

As another example, we might find something like this:


typedef struct {
int x, y;
} point;

point p = {.x=20, .y=40};

printf("%d, %d\n", p.x, p.y); // 20, 40


55
We’ll talk more about these later.

typedef and Other Types


It’s not that using typedef with a simple type like int is completely useless… it helps you abstract the types
to make it easier to change them later.
For example, if you have float all over your code in 100 zillion places, it’s going to be painful to change
them all to double if you find you have to do that later for some reason.
But if you prepared a little with:
typedef float app_float;

// and

app_float f1, f2, f3;

Then if later you want to change to another type, like long double, you just need to change the typedef:
// voila!
// |---------|
typedef long double app_float;

// and

app_float f1, f2, f3; // Now these are all long doubles

typedef and Pointers


You can make a type that is a pointer.
typedef int *intptr;

int a = 10;
intptr x = &a; // "intptr" is type "int*"

I really don’t like this practice. It hides the fact that x is a pointer type because you don’t see a * in the
declaration.
IMHO, it’s better to explicitly show that you’re declaring a pointer type so that other devs can clearly see it
and don’t mistake x for having a non-pointer type.

typedef and Capitalization


I’ve seen all kinds of capitalization on typedef.
typedef struct {
    int x, y;
} my_point;  // lower snake case

typedef struct {
    int x, y;
} MyPoint;   // CamelCase

typedef struct {
    int x, y;
} Mypoint;   // Leading uppercase

typedef struct {
    int x, y;
} MY_POINT;  // UPPER SNAKE CASE

The C11 specification doesn’t dictate one way or another, and shows examples in all uppercase and all
lowercase.
K&R2 uses leading uppercase predominantly, but shows some examples in uppercase and snake case (with
_t).

If you have a style guide in use, stick with it. If you don’t, grab one and stick with it.
Pointers II: Arithmetic

Time to get more into it with a number of new pointer topics! If you’re not up to speed with pointers, check
out the first section in the guide on the matter.

Pointer Arithmetic
Turns out you can do math on pointers, notably addition and subtraction.
But what does it mean when you do that?
In short, if you have a pointer to a type, adding one to the pointer moves to the next item of that type directly
after it in memory.
It’s important to remember that as we move pointers around and look at different places in memory, we
need to make sure that we’re always pointing to a valid place in memory before we dereference. If we’re off
in the weeds and we try to see what’s there, the behavior is undefined and a crash is a common result.
This is a little chicken-and-eggy with Array/Pointer Equivalence, below, but we’re going to give it a shot,
anyway.

Adding to Pointers
First, let’s take an array of numbers.
int a[5] = {11, 22, 33, 44, 55};

Then let’s get a pointer to the first element in that array:


int a[5] = {11, 22, 33, 44, 55};

int *p = &a[0]; // Or "int *p = a;" works just as well

Then let’s print the value there by dereferencing the pointer:


printf("%d\n", *p); // Prints 11

Now let’s use pointer arithmetic to print the next element in the array, the one at index 1:
printf("%d\n", *(p + 1)); // Prints 22!!

What happened there? C knows that p is a pointer to an int. So it knows the sizeof an int56 and it knows
to skip that many bytes to get to the next int after the first one!
In fact, the prior example could be written these two equivalent ways:
printf("%d\n", *p); // Prints 11
printf("%d\n", *(p + 0)); // Prints 11
56
Recall that the sizeof operator tells you the size in bytes of an object in memory.


because adding 0 to a pointer results in the same pointer.


Let’s think of the upshot here. We can iterate over elements of an array this way instead of using array notation:
int a[5] = {11, 22, 33, 44, 55};

int *p = &a[0]; // Or "int *p = a;" works just as well

for (int i = 0; i < 5; i++) {


printf("%d\n", *(p + i)); // Same as p[i]!
}

And that works the same as if we used array notation! Oooo! Getting closer to that array/pointer equivalence
thing! More on this later in this chapter.
But what’s actually happening, here? How do it work?
Remember from early on that memory is like a big array, where a byte is stored at each array index.
And the array index into memory has a few names:
• Index into memory
• Location
• Address
• Pointer!
So a pointer is an index into memory, somewhere.
For a random example, say that a number 3490 was stored at address (“index”) 23,237,489,202. If we have
an int pointer to that 3490, the value of that pointer is 23,237,489,202… because the pointer is the memory
address. Different words for the same thing.
And now let’s say we have another number, 4096, stored right after the 3490 at address 23,237,489,210 (8
higher than the 3490 because each int in this example is 8 bytes long).
If we add 1 to that pointer, it actually jumps ahead sizeof(int) bytes to the next int. It knows to jump
that far ahead because it’s an int pointer. If it were a float pointer, it’d jump sizeof(float) bytes ahead
to get to the next float!
So you can look at the next int, by adding 1 to the pointer, the one after that by adding 2 to the pointer, and
so on.

Changing Pointers
We saw how we could add an integer to a pointer in the previous section. This time, let’s modify the pointer,
itself.
You can just add (or subtract) integer values directly to (or from) any pointer!
Let’s do that example again, except with a couple changes. First, I’m going to add a 999 to the end of our
numbers to act as a sentinel value. This will let us know where the end of the data is.
int a[] = {11, 22, 33, 44, 55, 999}; // Add 999 here as a sentinel

int *p = &a[0]; // p points to the 11

And we also have p pointing to the element at index 0 of a, namely 11, just like before.
Now let’s start incrementing p so that it points at subsequent elements of the array. We’ll do this until p
points to the 999; that is, we’ll do it until *p == 999:
while (*p != 999) {       // While the thing p points to isn't 999
    printf("%d\n", *p);   // Print it
    p++;                  // Move p to point to the next int!
}

Pretty crazy, right?


When we give it a run, first p points to 11. Then we increment p, and it points to 22, and then again, it points
to 33. And so on, until it points to 999 and we quit.

Subtracting Pointers
You can subtract a value from a pointer to get to an earlier address, as well, just like we were adding to them
before.
But we can also subtract two pointers to find the difference between them, e.g. we can calculate how many
ints there are between two int*s. The catch is that this only works within a single array57 —if the pointers
point to anything else, you get undefined behavior.
Remember how strings are char*s in C? Let’s see if we can use this to write another variant of strlen()
to compute the length of a string that utilizes pointer subtraction.
The idea is that if we have a pointer to the beginning of the string, we can find a pointer to the end of the
string by scanning ahead for the NUL character.
And if we have a pointer to the beginning of the string, and we computed the pointer to the end of the string,
we can just subtract the two pointers to come up with the length!
#include <stdio.h>

int my_strlen(char *s)
{
    // Start scanning from the beginning of the string
    char *p = s;

    // Scan until we find the NUL character
    while (*p != '\0')
        p++;

    // Return the difference in pointers
    return p - s;
}

int main(void)
{
    printf("%d\n", my_strlen("Hello, world!"));  // Prints "13"
}

Remember that you can only use pointer subtraction between two pointers that point to the same array!

Array/Pointer Equivalence
We’re finally ready to talk about this! We’ve seen plenty of examples of places where we’ve intermixed array
and pointer notation, but let’s give out the fundamental formula of array/pointer equivalence:
a[b] == *(a + b)
57
Or string, which is really an array of chars. Somewhat peculiarly, you can also have a pointer that references one past the end of
the array without a problem and still do math on it. You just can’t dereference it when it’s out there.

Study that! Those are equivalent and can be used interchangeably!


I’ve oversimplified a bit, because in my above example a and b can both be expressions, and we might want
a few more parentheses to force order of operations in case the expressions are complex.
The spec is specific, as always, declaring (in C11 §6.5.2.1¶2):
E1[E2] is identical to (*((E1)+(E2)))

but that’s a little harder to grok. Just make sure you include parentheses if the expressions are complicated
so all your math happens in the right order.
This means we can decide if we’re going to use array or pointer notation for any array or pointer (assuming
it points to an element of an array).
Let’s use an array and pointer with both array and pointer notation:
#include <stdio.h>

int main(void)
{
    int a[] = {11, 22, 33, 44, 55};

    int *p = a;  // p points to the first element of a, 11

    // Print all elements of the array a variety of ways:

    for (int i = 0; i < 5; i++)
        printf("%d\n", a[i]);      // Array notation with a

    for (int i = 0; i < 5; i++)
        printf("%d\n", p[i]);      // Array notation with p

    for (int i = 0; i < 5; i++)
        printf("%d\n", *(a + i));  // Pointer notation with a

    for (int i = 0; i < 5; i++)
        printf("%d\n", *(p + i));  // Pointer notation with p

    for (int i = 0; i < 5; i++)
        printf("%d\n", *(p++));    // Moving pointer p
        //printf("%d\n", *(a++));    // Moving array variable a--ERROR!
}

So you can see that in general, if you have an array variable, you can use pointer or array notation to access
elements. Same with a pointer variable.
The one big difference is that you can modify a pointer to point to a different address, but you can’t do that
with an array variable.

Array/Pointer Equivalence in Function Calls


This is where you’ll encounter this concept the most, for sure.
If you have a function that takes a pointer argument, e.g.:
int my_strlen(char *s)

this means you can pass either an array or a pointer to this function and have it work!

char s[] = "Antelopes";


char *t = "Wombats";

printf("%d\n", my_strlen(s)); // Works!


printf("%d\n", my_strlen(t)); // Works, too!

And it’s also why these two function signatures are equivalent:
int my_strlen(char *s) // Works!
int my_strlen(char s[]) // Works, too!

void Pointers
You’ve already seen the void keyword used with functions, but this is an entirely separate, unrelated animal.
Sometimes it’s useful to have a pointer to a thing that you don’t know the type of.
I know. Bear with me just a second.
Let’s look at an example, the built-in memcpy() function:
void *memcpy(void *s1, const void *s2, size_t n);

This function copies n bytes of memory starting from address s2 into the memory starting at address s1.
But look! s1 and s2 are void*s! Why? What does it mean? Let’s run more examples to see.
For instance, we could copy a string with memcpy() (though strcpy() is more appropriate for strings):
#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[] = "Goats!";
    char t[100];

    memcpy(t, s, 7);  // Copy 7 bytes--including the NUL terminator!

    printf("%s\n", t);  // "Goats!"
}

Or we can copy some ints:


#include <stdio.h>
#include <string.h>

int main(void)
{
    int a[] = {11, 22, 33};
    int b[3];

    memcpy(b, a, 3 * sizeof(int));  // Copy 3 ints of data

    printf("%d\n", b[1]);  // 22
}

That one’s a little wild—you see what we did there with memcpy()? We copied the data from a to b, but we
had to specify how many bytes to copy, and an int is more than one byte.

OK, then—how many bytes does an int take? Answer: depends on the system. But we can tell how many
bytes any type takes with the sizeof operator.
So there’s the answer: an int takes sizeof(int) bytes of memory to store.
And if we have 3 of them in our array, like we did in that example, the entire space used for the 3 ints must
be 3 * sizeof(int).
(In the string example, earlier, it would have been more technically accurate to copy 7 * sizeof(char)
bytes. But chars are always one byte large, by definition, so that just devolves into 7 * 1.)
We could even copy a float or a struct with memcpy()! (Though this is abusive—we should just use =
for that):
struct antelope my_antelope;
struct antelope my_clone_antelope;

// ...

memcpy(&my_clone_antelope, &my_antelope, sizeof my_antelope);

Look at how versatile memcpy() is! If you have a pointer to a source and a pointer to a destination, and you
have the number of bytes you want to copy, you can copy any type of data.
That’s the power of void*. You can write code that doesn’t care about the type and is able to do things with
it.
But with great power comes great responsibility. Maybe not that great in this case, but there are some limits.
1. You cannot do pointer arithmetic on a void*.
2. You cannot dereference a void*.
3. You cannot use the arrow operator on a void*, since it’s also a dereference.
4. You cannot use array notation on a void*, since it’s also a dereference, as well58 .
And if you think about it, these rules make sense. All those operations rely on knowing the sizeof the type
of data pointed to, and with void*, we don’t know the size of the data being pointed to—it could be anything!
But wait—if you can’t dereference a void* what good can it ever do you?
Like with memcpy(), it helps you write generic functions that can handle multiple types of data. But the
secret is that, deep down, you convert the void* to another type before you use it!
And conversion is easy: you can just assign into a variable of the desired type59 .
char a = 'X'; // A single char

void *p = &a; // p points to the 'X'


char *q = p; // q also points to the 'X'

printf("%c\n", *p); // ERROR--cannot dereference void*!


printf("%c\n", *q); // Prints "X"

Let’s write our own memcpy() to try this out. We can copy bytes (chars), and we know the number of bytes
because it’s passed in.
void *my_memcpy(void *dest, void *src, int byte_count)
{
    // Convert void*s to char*s
    char *s = src, *d = dest;

    // Now that we have char*s, we can dereference and copy them
    while (byte_count--) {
        *d++ = *s++;
    }

    // Most of these functions return the destination, just in case
    // that's useful to the caller.
    return dest;
}

58
Because remember that array notation is just a dereference and some pointer math, and you can’t dereference a void*!
59
You can also cast the void* to another type, but we haven’t gotten to casts yet.

Right there at the beginning, we copy the void*s into char*s so that we can use them as char*s. It’s as
easy as that.
Then some fun in a while loop, where we decrement byte_count until it becomes false (0). Remember
that with post-decrement, the value of the expression is computed (for while to use) and then the variable is
decremented.
And some fun in the copy, where we assign *d = *s to copy the byte, but we do it with post-increment so
that both d and s move to the next byte after the assignment is made.
Lastly, most memory and string functions return a copy of a pointer to the destination string just in case the
caller wants to use it.
Now that we’ve done that, I just want to quickly point out that we can use this technique to iterate over the
bytes of any object in C, floats, structs, or anything!
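
To make that byte-by-byte idea concrete, here is a small sketch that prints every byte of whatever object it is handed. The function name dump_bytes() is made up for this illustration, and it uses unsigned char for the bytes so that each one prints as a value from 00 to ff.

```c
#include <stdio.h>

// Hypothetical helper: print each byte of an object in hex.
// Returns the number of bytes printed.
size_t dump_bytes(void *obj, size_t size)
{
    // Convert the void* to a byte pointer so we can dereference it
    unsigned char *b = obj;
    size_t i;

    for (i = 0; i < size; i++)
        printf("%02x ", b[i]);

    printf("\n");

    return i;
}
```

For a float f, you would call dump_bytes(&f, sizeof f). The bytes you see depend on how your system represents floats and in what order it stores the bytes.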
Let’s run one more real-world example with the built-in qsort() routine that can sort anything thanks to the
magic of void*s.
(In the following example, you can ignore the word const, which we haven’t covered yet.)
#include <stdio.h>
#include <stdlib.h>

// The type of structure we're going to sort
struct animal {
    char *name;
    int leg_count;
};

// This is a comparison function called by qsort() to help it determine
// what exactly to sort by. We'll use it to sort an array of struct
// animals by leg_count.
int compar(const void *elem1, const void *elem2)
{
    // We know we're sorting struct animals, so let's make both
    // arguments pointers to struct animals
    const struct animal *animal1 = elem1;
    const struct animal *animal2 = elem2;

    // Return <0 =0 or >0 depending on whatever we want to sort by.

    // Let's sort ascending by leg_count, so we'll return the difference
    // in the leg_counts
    return animal1->leg_count - animal2->leg_count;
}

int main(void)
{
    // Let's build an array of 4 struct animals with different
    // characteristics. This array is out of order by leg_count, but
    // we'll sort it in a second.
    struct animal a[4] = {
        {.name="Dog", .leg_count=4},
        {.name="Monkey", .leg_count=2},
        {.name="Antelope", .leg_count=4},
        {.name="Snake", .leg_count=0}
    };

    // Call qsort() to sort the array. qsort() needs to be told exactly
    // what to sort this data by, and we'll do that inside the compar()
    // function.
    //
    // This call is saying: qsort array a, which has 4 elements, and
    // each element is sizeof(struct animal) bytes big, and this is the
    // function that will compare any two elements.
    qsort(a, 4, sizeof(struct animal), compar);

    // Print them all out
    for (int i = 0; i < 4; i++) {
        printf("%d: %s\n", a[i].leg_count, a[i].name);
    }
}

As long as you give qsort() a function that can compare two items that you have in your array to be sorted, it
can sort anything. And it does this without needing to have the types of the items hardcoded in there anywhere.
qsort() just rearranges blocks of bytes based on the results of the compar() function you passed in.
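
The same qsort() call can sort a completely different type just by swapping in a different comparison function. The sketch below sorts plain ints. The names compare_ints() and sort_ints() are made up for this example. It also compares with > and < instead of subtracting, which is an assumption-free way to avoid integer overflow when the two values are very far apart (subtraction is fine for small numbers like leg counts).

```c
#include <stdlib.h>

// Hypothetical comparison function for ints, same shape as compar()
int compare_ints(const void *elem1, const void *elem2)
{
    const int *a = elem1;
    const int *b = elem2;

    // Evaluates to 1, 0, or -1 without any risk of overflow
    return (*a > *b) - (*a < *b);
}

// Hypothetical wrapper: sort count ints in ascending order
void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof *values, compare_ints);
}
```

Nothing about qsort() itself changed here. Only the element size and the comparison function tell it what it is sorting.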
Manual Memory Allocation

This is one of the big areas where C likely diverges from languages you already know: manual memory
management.
Other languages use reference counting, garbage collection, or other means to determine when to allocate
new memory for some data, and when to deallocate it once no variables refer to it.
And that’s nice. It’s nice to be able to not worry about it, to just drop all the references to an item and trust
that at some point the memory associated with it will be freed.
But C’s not like that, entirely.
Of course, in C, some variables are automatically allocated and deallocated when they come into scope and
leave scope. We call these automatic variables. They’re your average run-of-the-mill block scope “local”
variables. No problem.
But what if you want something to persist longer than a particular block? This is where manual memory
management comes into play.
You can tell C explicitly to allocate for you a certain number of bytes that you can use as you please. And
these bytes will remain allocated until you explicitly free that memory60 .
It’s important to free the memory you’re done with! If you don’t, we call that a memory leak and your process
will continue to reserve that memory until it exits.
If you manually allocated it, you have to manually free it when you’re done with it.
So how do we do this? We’re going to learn a couple new functions, and make use of the sizeof operator
to help us learn how many bytes to allocate.
In common C parlance, devs say that automatic local variables are allocated “on the stack”, and manually-
allocated memory is “on the heap”. The spec doesn’t talk about either of those things, but all C devs will
know what you’re talking about if you bring them up.
All functions we’re going to learn in this chapter can be found in <stdlib.h>.

Allocating and Deallocating, malloc() and free()


The malloc() function accepts a number of bytes to allocate, and returns a void pointer to that block of
newly-allocated memory.
Since it’s a void*, you can assign it into whatever pointer type you want… normally this will correspond in
some way to the number of bytes you’re allocating.
So… how many bytes should I allocate? We can use sizeof to help with that. If we want to allocate enough
room for a single int, we can use sizeof(int) and pass that to malloc().
60
Or until the program exits, in which case all the memory allocated by it is freed. Asterisk: some systems allow you to allocate
memory that persists after a program exits, but it’s system dependent, out of scope for this guide, and you’ll certainly never do it on
accident.


After we’re done with some allocated memory, we can call free() to indicate we’re done with that memory
and it can be used for something else. As an argument, you pass the same pointer you got from malloc()
(or a copy of it). It’s undefined behavior to use a memory region after you free() it.
Let’s try. We’ll allocate enough memory for an int, and then store something there, and then print it.
// Allocate space for a single int (sizeof(int) bytes-worth):

int *p = malloc(sizeof(int));

*p = 12; // Store something there

printf("%d\n", *p); // Print it: 12

free(p); // All done with that memory

//*p = 3490; // ERROR: undefined behavior! Use after free()!

Now, in that contrived example, there’s really no benefit to it. We could have just used an automatic int
and it would have worked. But we’ll see how the ability to allocate memory this way has its advantages,
especially with more complex data structures.
One more thing you’ll commonly see takes advantage of the fact that sizeof can give you the size of the
result type of any expression, not just of a type name. So you could put a variable name in there, too, and
use that. Here’s an example of that, just like the previous one:
int *p = malloc(sizeof *p); // *p is an int, so same as sizeof(int)

Error Checking
All the allocation functions return a pointer to the newly-allocated stretch of memory, or NULL if the memory
cannot be allocated for some reason.
Some OSes like Linux can be configured in such a way that malloc() never returns NULL, even if you’re
out of memory. But despite this, you should always code it up with protections in mind.
int *x;

x = malloc(sizeof(int) * 10);

if (x == NULL) {
printf("Error allocating 10 ints\n");
// do something here to handle it
}

Here’s a common pattern that you’ll see, where we do the assignment and the condition on the same line:
int *x;

if ((x = malloc(sizeof(int) * 10)) == NULL) {
    printf("Error allocating 10 ints\n");
    // do something here to handle it
}

Allocating Space for an Array


We’ve seen how to allocate space for a single thing; now what about for a bunch of them in an array?

In C, an array is a bunch of the same thing back-to-back in a contiguous stretch of memory.


We can allocate a contiguous stretch of memory—we’ve seen how to do that. If we wanted 3490 bytes of
memory, we could just ask for it:
char *p = malloc(3490); // Voila

And—indeed!—that’s an array of 3490 chars (AKA a string!) since each char is 1 byte. In other words,
sizeof(char) is 1.

Note: there’s no initialization done on the newly-allocated memory; it’s full of garbage. Clear it with
memset() if you want to, or see calloc(), below.

But we can just multiply the size of the thing we want by the number of elements we want, and then access
them using either pointer or array notation. Example!
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    // Allocate space for 10 ints
    int *p = malloc(sizeof(int) * 10);

    // Assign them values 0-45:
    for (int i = 0; i < 10; i++)
        p[i] = i * 5;

    // Print all values 0, 5, 10, 15, ..., 40, 45
    for (int i = 0; i < 10; i++)
        printf("%d\n", p[i]);

    // Free the space
    free(p);
}

The key’s in that malloc() line. If we know each int takes sizeof(int) bytes to hold it, and we know
we want 10 of them, we can just allocate exactly that many bytes with:
sizeof(int) * 10

And this trick works for every type. Just pass it to sizeof and multiply by the number of elements you want
in the array.

An Alternative: calloc()
This is another allocation function that works similarly to malloc(), with two key differences:
• Instead of a single argument, you pass the number of elements you wish to allocate and the size of one
element. It’s like it’s made for allocating arrays.
• It clears the memory to zero.
You still use free() to deallocate memory obtained through calloc().
Here’s a comparison of calloc() and malloc().
// Allocate space for 10 ints with calloc(), initialized to 0:
int *p = calloc(10, sizeof(int));

// Allocate space for 10 ints with malloc(), initialized to 0:
int *q = malloc(sizeof(int) * 10);
memset(q, 0, sizeof(int) * 10); // set to 0

Again, the result is the same for both except malloc() doesn’t zero the memory by default.

Changing Allocated Size with realloc()


If you’ve already allocated 10 ints, but later you decide you need 20, what can you do?
One option is to allocate some new space, and then memcpy() the memory over… but it turns out that
sometimes you don’t need to move anything. And there’s one function that’s just smart enough to do the
right thing in all the right circumstances: realloc().
It takes a pointer to some previously-allocated memory (by malloc() or calloc()) and a new size for the
memory region to be.
It then grows or shrinks that memory, and returns a pointer to it. Sometimes it might return the same pointer
(if the data didn’t have to be copied elsewhere), or it might return a different one (if the data did have to be
copied).
Let’s allocate an array of 20 floats, and then change our mind and make it an array of 40.
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    // Allocate space for 20 floats
    float *p = malloc(sizeof *p * 20);

    // Assign them fractional values 0.0-1.0:
    for (int i = 0; i < 20; i++)
        p[i] = i / 20.0;

    // But wait! Let's actually make this an array of 40 elements
    p = realloc(p, sizeof *p * 40);

    // And assign the new elements values in the range 1.0-2.0
    for (int i = 20; i < 40; i++)
        p[i] = 1.0 + (i - 20) / 20.0;

    // Print all values 0.0-2.0 in the 40 elements:
    for (int i = 0; i < 40; i++)
        printf("%f\n", p[i]);

    // Free the space
    free(p);
}

Notice in there how we took the return value from realloc() and reassigned it into the same pointer variable
p that we passed in. That’s pretty common to do.
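
One caveat with assigning straight back into p: like the other allocation functions, realloc() returns NULL on failure, and in that case the original block is still allocated. Overwriting p with NULL would lose the only pointer to it. A more defensive version stores the result in a temporary first. The sketch below wraps that in a helper. The name resize_floats() is made up, and it takes a pointer to the caller's pointer, which is a slightly more advanced trick than the guide has needed so far.

```c
#include <stdlib.h>

// Hypothetical helper: resize *array to hold new_count floats.
// Returns 1 on success, 0 on failure. On failure, *array is left
// untouched and still needs to be free()d by the caller eventually.
int resize_floats(float **array, size_t new_count)
{
    float *tmp = realloc(*array, sizeof **array * new_count);

    if (tmp == NULL)
        return 0;  // Old memory is still valid; nothing was lost

    *array = tmp;  // Safe to replace the caller's pointer now
    return 1;
}
```

With a float *p in hand, the call would look like resize_floats(&p, 40), followed by a check of the return value.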

realloc() with NULL


Trivia time! These two lines are equivalent:

char *p = malloc(3490);
char *p = realloc(NULL, 3490);

That could be convenient if you have some kind of allocation loop and you don’t want to special-case the
first malloc().
int *p = NULL;
int length = 0;

while (!done) {
// Allocate 10 more ints:
length += 10;
p = realloc(p, sizeof *p * length);

// Do amazing things
// ...
}

In that example, we didn’t need an initial malloc() since p was NULL to start.
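
Here is a fuller sketch of that growth loop that can actually be compiled. It is an illustration under assumptions: the function name build_sequence() is made up, and the "amazing things" are just storing the numbers 0 through count-1. The array grows 10 ints at a time, starting from a NULL pointer.

```c
#include <stdlib.h>

// Hypothetical example: build an array holding 0..count-1, growing
// the storage 10 ints at a time. Returns NULL on allocation failure
// (or if count is 0, since nothing is ever allocated).
int *build_sequence(int count)
{
    int *p = NULL;   // NULL, so the first realloc() acts like malloc()
    int length = 0;  // Number of ints currently allocated

    for (int i = 0; i < count; i++) {
        if (i == length) {
            // Out of room: allocate 10 more ints
            length += 10;
            int *tmp = realloc(p, sizeof *p * length);

            if (tmp == NULL) {
                free(p);  // Give back what we had before bailing out
                return NULL;
            }

            p = tmp;
        }

        p[i] = i;
    }

    return p;  // Caller must free() this
}
```

The first pass through the loop and every later growth step go through the same realloc() line, which is the convenience the trivia above is pointing at.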

Aligned Allocations
You probably aren’t going to need to use this.
And I don’t want to get too far off in the weeds talking about it right now, but there’s this thing called memory
alignment, which has to do with the memory address (pointer value) being a multiple of a certain number.
For example, a system might require that 16-bit values begin on memory addresses that are multiples of 2.
Or that 64-bit values begin on memory addresses that are multiples of 2, 4, or 8, for example. It depends on
the CPU.
Some systems require this kind of alignment for fast memory access, or some even for memory access at all.
Now, if you use malloc(), calloc(), or realloc(), C will give you a chunk of memory that’s well-aligned
for any value at all, even structs. Works in all cases.
But there might be times that you know that some data can be aligned at a smaller boundary, or must be aligned
at a larger one for some reason. I imagine this is more common with embedded systems programming.
In those cases, you can specify an alignment with aligned_alloc().
The alignment is an integer power of two greater than zero, so 2, 4, 8, 16, etc. and you give that to
aligned_alloc() before the number of bytes you’re interested in.

The other restriction is that the number of bytes you allocate needs to be a multiple of the alignment. But
this might be changing. See C Defect Report 460 [61].
Let’s do an example, allocating on a 64-byte boundary:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    // Allocate 256 bytes aligned on a 64-byte boundary
    char *p = aligned_alloc(64, 256);  // 256 == 64 * 4

    // Copy a string in there and print it
    strcpy(p, "Hello, world!");
    printf("%s\n", p);

    // Free the space
    free(p);
}

61
http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm#dr_460

I want to throw a note here about realloc() and aligned_alloc(). realloc() doesn’t have any alignment
guarantees, so if you need to get some aligned reallocated space, you’ll have to do it the hard way with
memcpy().

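Here’s a non-standard aligned_realloc() function, if you need it. Unlike realloc(), it has to be told the
old size so that it never copies more bytes than the old block holds, and it frees the old block once the copy
is made:

void *aligned_realloc(void *ptr, size_t old_size, size_t alignment, size_t size)
{
    char *new_ptr = aligned_alloc(alignment, size);

    if (new_ptr == NULL)
        return NULL;  // Like realloc(), leave the old block alone on failure

    // Copy only as many bytes as both blocks can hold
    size_t copy_size = old_size < size? old_size: size;

    if (ptr != NULL)
        memcpy(new_ptr, ptr, copy_size);

    free(ptr);  // The old block is no longer needed

    return new_ptr;
}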

Note that it always copies data, taking time, while real realloc() will avoid that if it can. So this is hardly
efficient. Avoid needing to reallocate custom-aligned data.
Scope

Scope is all about what variables are visible in what contexts.

Block Scope
This is the scope of almost all the variables devs define. It includes what other languages might call “function
scope”, i.e. variables that are declared inside functions.
The basic rule is that if you’ve declared a variable in a block delimited by squirrelly braces, the scope of that
variable is that block.
If there’s a block inside a block, then variables declared in the inner block are local to that block, and cannot
be seen in the outer scope.
Once a variable’s scope ends, that variable can no longer be referenced, and you can consider its value to be
gone into the great bit bucket62 in the sky.
An example with nested scope:
int main(void)
{
    int a = 12;  // Local to outer block, but visible in inner block

    if (a == 12) {
        int b = 99;  // Local to inner block, not visible in outer block

        printf("%d %d\n", a, b);  // OK: "12 99"
    }

    printf("%d\n", a);  // OK, we're still in a's scope

    printf("%d\n", b);  // ILLEGAL, out of b's scope
}

Where To Define Variables


Another fun fact is that you can define variables anywhere in the block, within reason—they have the scope
of that block, but cannot be used before they are defined.
#include <stdio.h>

int main(void)
{
    int i = 0;

    printf("%d\n", i);  // OK: "0"

    //printf("%d\n", j);  // ILLEGAL--can't use j before it's defined

    int j = 5;

    printf("%d %d\n", i, j);  // OK: "0 5"
}

62
https://en.wikipedia.org/wiki/Bit_bucket

Historically, C required all the variables be defined before any code in the block, but this is no longer the
case as of the C99 standard.

Variable Hiding
If you have a variable named the same thing at an inner scope as one at an outer scope, the one at the inner
scope takes precedence as long as you’re running in the inner scope. That is, it hides the one at outer scope
for the duration of its lifetime.
 1 #include <stdio.h>
 2
 3 int main(void)
 4 {
 5     int i = 10;
 6
 7     {
 8         int i = 20;
 9
10         printf("%d\n", i);  // Inner scope i, 20 (outer i is hidden)
11     }
12
13     printf("%d\n", i);  // Outer scope i, 10
14 }

You might have noticed in that example that I just threw a block in there at line 7, not so much as a for or
if statement to kick it off! This is perfectly legal. Sometimes a dev will want to group a bunch of local
variables together for a quick computation and will do this, but it’s rare to see.

File Scope
If you define a variable outside of a block, that variable has file scope. It’s visible in all functions in the file
that come after it, and shared between them. (An exception is if a block defines a variable of the same name,
it would hide the one at file scope.)
This is closest to what you would consider to be “global” scope in another language.
For example:
#include <stdio.h>

int shared = 10;  // File scope! Visible to the whole file after this!

void func1(void)
{
    shared += 100;  // Now shared holds 110
}

void func2(void)
{
    printf("%d\n", shared);  // Prints "110"
}

int main(void)
{
    func1();
    func2();
}

Note that if shared were declared at the bottom of the file, it wouldn’t compile. It has to be declared before
any functions use it.

for-loop Scope
I really don’t know what to call this, as C11 §6.8.5.3¶1 doesn’t give it a proper name. We’ve done it already
a few times in this guide, as well. It’s when you declare a variable inside the first clause of a for-loop:
for (int i = 0; i < 10; i++)
printf("%d\n", i);

printf("%d\n", i); // ILLEGAL--i is only in scope for the for-loop

In that example, i’s lifetime begins the moment it is defined, and continues for the duration of the loop.
If the loop body is enclosed in a block, the variables defined in the for-loop are visible from that inner scope.
Unless, of course, that inner scope hides them. This crazy example prints 999 five times:
1 #include <stdio.h>
2

3 int main(void)
4 {
5 for (int i = 0; i < 5; i++) {
6 int i = 999; // Hides the i in the for-loop scope
7 printf("%d\n", i);
8 }
9 }

A Note on Function Scope


The C spec does refer to function scope, but it’s used exclusively with labels, something we haven’t discussed
yet. More on that another day.
Types II: Way More Types!

We’re used to char, int, and float types, but it’s now time to take that stuff to the next level and see what
else we have out there in the types department!

Signed and Unsigned Integers


So far we’ve used int as a signed type, that is, a value that can be either negative or positive. But C also has
specific unsigned integer types that can only hold non-negative numbers (zero and up).
These types are prefaced by the keyword unsigned.
int a; // signed
signed int a; // signed
signed a; // signed, "shorthand" for "int" or "signed int", rare
unsigned int b; // unsigned
unsigned c; // unsigned, shorthand for "unsigned int"

Why? Why would you decide you only wanted to hold positive numbers?
Answer: you can get larger numbers in an unsigned variable than you can in a signed one.
But why is that?
You can think of integers being represented by a certain number of bits63 . On my computer, an int is
represented by 32 bits.
And each permutation of bits that are either 1 or 0 represents a number. We can decide how to divvy up these
numbers.
With signed numbers, we use (roughly) half the permutations to represent negative numbers, and the other
half to represent zero and the positive numbers.
With unsigned, we use all the permutations to represent zero and the positive numbers.
On my computer with 32-bit ints using two’s complement64 to represent signed numbers, I have the
following limits on integer range:

Type            Minimum           Maximum
int             -2,147,483,648    2,147,483,647
unsigned int    0                 4,294,967,295

Notice that the largest positive unsigned int is approximately twice as large as the largest positive int.
So you can get some flexibility there.
63
“Bit” is short for binary digit. Binary is just another way of representing numbers. Instead of digits 0-9 like we’re used to, it’s
digits 0-1.
64
https://en.wikipedia.org/wiki/Two%27s_complement


Character Types
Remember char? The type we can use to hold a single character?
char c = 'B';

printf("%c\n", c); // "B"

I have a shocker for you: it’s actually an integer.


char c = 'B';

// Change this from %c to %d:


printf("%d\n", c); // 66 (!!)

Deep down, char is just a small int, namely an integer that uses just a single byte of space, limiting its range
to…
Here the C spec gets just a little funky. It assures us that a char is a single byte, i.e. sizeof(char) == 1.
But then in C11 §3.6¶3 it goes out of its way to say:
A byte is composed of a contiguous sequence of bits, the number of which is implementation-
defined.
Wait—what? Some of you might be used to the notion that a byte is 8 bits, right? I mean, that’s what it
is, right? And the answer is, “Almost certainly.”65 But C is an old language, and machines back in the day
had, shall we say, a more relaxed opinion over how many bits were in a byte. And through the years, C has
retained this flexibility.
But assuming your bytes in C are 8 bits, like they are for virtually all machines in the world that you’ll ever
see, the range of a char is…
—So before I can tell you, it turns out that chars might be signed or unsigned depending on your compiler.
Unless you explicitly specify.
In many cases, just having char is fine because you don’t care about the sign of the data. But if you need
signed or unsigned chars, you must be specific:
char a; // Could be signed or unsigned
signed char b; // Definitely signed
unsigned char c; // Definitely unsigned

OK, now, finally, we can figure out the range of numbers if we assume that a char is 8 bits and your system
uses the virtually universal two’s complement representation for signed and unsigned66 .
So, assuming those constraints, we can finally figure our ranges:

char type Minimum Maximum


signed char -128 127
unsigned char 0 255

And the ranges for char are implementation-defined.


Let me get this straight. char is actually a number, so can we do math on it?
Yup! Just remember to keep things in the range of a char!
65
The industry term for a sequence of exactly, indisputably 8 bits is an octet.
66
In general, if you have an $n$-bit two’s complement number, the signed range is $-2^{n-1}$ to $2^{n-1}-1$. And the
unsigned range is $0$ to $2^n-1$.

#include <stdio.h>

int main(void)
{
    char a = 10, b = 20;

    printf("%d\n", a + b);  // 30!
}

What about those constant characters in single quotes, like 'B'? How does that have a numeric value?
The spec is also hand-wavey here, since C isn’t designed to run on a single type of underlying system.
But let’s just assume for the moment that your character set is based on ASCII67 for at least the first 128
characters. In that case, the character constant will be converted to a char whose value is the same as the
ASCII value of the character.
That was a mouthful. Let’s just have an example:
#include <stdio.h>

int main(void)
{
    char a = 10;
    char b = 'B';  // ASCII value 66

    printf("%d\n", a + b);  // 76!
}

This depends on your execution environment and the character set used68 . One of the most popular character
sets today is Unicode69 (which is a superset of ASCII), so for your basic 0-9, A-Z, a-z and punctuation, you’ll
almost certainly get the ASCII values out of them.
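
One place this character math shows up all the time is turning a digit character into the number it represents. The sketch below does that by subtracting '0'. The function name digit_value() is made up for this example. Unlike the letter values, this one does not depend on ASCII: the C spec requires the characters '0' through '9' to have consecutive values in every character set.

```c
// Hypothetical helper: convert a digit character '0'..'9' to its
// numeric value 0..9, or return -1 if c isn't a digit.
int digit_value(char c)
{
    if (c < '0' || c > '9')
        return -1;

    // '7' - '0' is 7 because the digits are guaranteed consecutive
    return c - '0';
}
```

So digit_value('7') gives the int 7, while printing '7' with %d on an ASCII system would show 55.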

More Integer Types: short, long, long long


So far we’ve just generally been using two integer types:
• char
• int
and we recently learned about the unsigned variants of the integer types. And we learned that char was
secretly a small int in disguise. So we know the ints can come in multiple bit sizes.
But there are a couple more integer types we should look at, and the minimum minimum and maximum values
they can hold.
Yes, I said “minimum” twice. The spec says that these types will hold numbers of at least these sizes, so your
implementation might be different. The header file <limits.h> defines macros that hold the minimum and
maximum integer values; rely on that to be sure, and never hardcode or assume these values.
These additional types are short int, long int, and long long int. Commonly, when using these
types, C developers leave the int part off (e.g. long long), and the compiler is perfectly happy.
// These two lines are equivalent:
long long int x;
long long x;

// And so are these:
short int x;
short x;

67
https://en.wikipedia.org/wiki/ASCII
68
https://en.wikipedia.org/wiki/List_of_information_system_character_sets
69
https://en.wikipedia.org/wiki/Unicode

Let’s take a look at the integer data types and sizes in ascending order, grouped by signedness.

Type                 Minimum Bytes   Minimum Value          Maximum Value
char                 1               -127 or 0              127 or 255 [70]
signed char          1               -127                   127
short                2               -32767                 32767
int                  2               -32767                 32767
long                 4               -2147483647            2147483647
long long            8               -9223372036854775807   9223372036854775807
unsigned char        1               0                      255
unsigned short       2               0                      65535
unsigned int         2               0                      65535
unsigned long        4               0                      4294967295
unsigned long long   8               0                      18446744073709551615

There is no long long long type. You can’t just keep adding longs like that. Don’t be silly.
Two’s complement fans might have noticed something funny about those numbers. Why does,
for example, the signed char stop at -127 instead of -128? Remember: these are only the
minimums required by the spec. Some number representations (like sign and magnitude71 ) top
off at ±127.
Let’s run the same table on my 64-bit, two’s complement system and see what comes out:

Type                 My Bytes   Minimum Value          Maximum Value
char                 1          -128                   127 [72]
signed char          1          -128                   127
short                2          -32768                 32767
int                  4          -2147483648            2147483647
long                 8          -9223372036854775808   9223372036854775807
long long            8          -9223372036854775808   9223372036854775807
unsigned char        1          0                      255
unsigned short       2          0                      65535
unsigned int         4          0                      4294967295
unsigned long        8          0                      18446744073709551615
unsigned long long   8          0                      18446744073709551615

That’s a little more sensible, but we can see how my system has larger limits than the minimums in the
specification.
So what are the macros in <limits.h>?

Type                 Min Macro    Max Macro
char                 CHAR_MIN     CHAR_MAX
signed char          SCHAR_MIN    SCHAR_MAX
short                SHRT_MIN     SHRT_MAX
int                  INT_MIN      INT_MAX
long                 LONG_MIN     LONG_MAX
long long            LLONG_MIN    LLONG_MAX
unsigned char        0            UCHAR_MAX
unsigned short       0            USHRT_MAX
unsigned int         0            UINT_MAX
unsigned long        0            ULONG_MAX
unsigned long long   0            ULLONG_MAX

70
Depends on if a char defaults to signed char or unsigned char
71
https://en.wikipedia.org/wiki/Signed_number_representations#Signed_magnitude_representation
72
My char is signed.

Notice there’s a way hidden in there to determine if a system uses signed or unsigned chars. If CHAR_MAX
== UCHAR_MAX, it must be unsigned.

Also notice there’s no minimum macro for the unsigned variants—they’re just 0.
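
To see what your own system says, you can print a few of these macros. The sketch below is a minimal example and the function names are made up for illustration. It also uses the %ld and %u format specifiers for long and unsigned int values, which this guide may not have introduced yet.

```c
#include <limits.h>
#include <stdio.h>

// Hypothetical helper: returns 1 if plain char is unsigned on this
// system, 0 if it is signed, using the CHAR_MAX trick described above.
int char_is_unsigned(void)
{
    return CHAR_MAX == UCHAR_MAX;
}

// Hypothetical helper: print a handful of limits from <limits.h>
void print_int_limits(void)
{
    printf("char:  %d to %d\n", CHAR_MIN, CHAR_MAX);
    printf("int:   %d to %d\n", INT_MIN, INT_MAX);
    printf("long:  %ld to %ld\n", LONG_MIN, LONG_MAX);
    printf("unsigned int max: %u\n", UINT_MAX);
    printf("plain char is %s\n", char_is_unsigned()? "unsigned": "signed");
}
```

The numbers printed will vary from system to system, which is exactly why the macros are worth using instead of hardcoded values.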

More Float: double and long double


Let’s see what the spec has to say about floating point numbers in §5.2.4.2.2¶1-2:
The following parameters are used to define the model for each floating-point type:

Parameter   Definition
$s$         sign ($\pm 1$)
$b$         base or radix of exponent representation (an integer $> 1$)
$e$         exponent (an integer between a minimum $e_{min}$ and a maximum $e_{max}$)
$p$         precision (the number of base-$b$ digits in the significand)
$f_k$       nonnegative integers less than $b$ (the significand digits)

A floating-point number ($x$) is defined by the following model:

$$x = s b^e \sum_{k=1}^{p} f_k b^{-k}, \quad e_{min} \le e \le e_{max}$$

I hope that cleared it right up for you.


Okay, fine. Let’s step back a bit and see what’s practical.
Note: we refer to a bunch of macros in this section. They can be found in the header <float.h>.
Floating point numbers are encoded in a specific sequence of bits (IEEE-754 format [73] is tremendously popular)
in bytes.
Diving in a bit more, the number is basically represented as the significand (which is the number part—the
significant digits themselves, also sometimes referred to as the mantissa) and the exponent, which is what
power to raise the digits to. Recall that a negative exponent can make a number smaller.
Imagine we’re using 10 as a number to raise by an exponent. We could represent the following numbers by
using a significand of 12345, and exponents of −3, 4, and 0 to encode the following floating point values:
$12345 \times 10^{-3} = 12.345$
$12345 \times 10^{4} = 123450000$
$12345 \times 10^{0} = 12345$

73
https://en.wikipedia.org/wiki/IEEE_754


For all those numbers, the significand stays the same. The only difference is the exponent.
On your machine, the base for the exponent is probably 2, not 10, since computers like binary. You can
check it by printing the FLT_RADIX macro.
So we have a number that’s represented by a number of bytes, encoded in some way. Because there are a
limited number of bit patterns, a limited number of floating point numbers can be represented.
But more particularly, only a certain number of significant decimal digits can be represented accurately.
How can you get more? You can use larger data types!
And we have a couple of them. We know about float already, but for more precision we have double. And
for even more precision, we have long double (unrelated to long int except by name).
The spec doesn’t go into how many bytes of storage each type should take, but on my system, we can see the
relative size increases:

Type sizeof
float 4
double 8
long double 16

So each of the types (on my system) uses those additional bits for more precision.
But how much precision are we talking, here? How many decimal digits can be represented by these
types?
Well, C provides us with a bunch of macros in <float.h> to help us figure that out.
It gets a little wonky if you are using a base-2 (binary) system for storing the numbers (which is virtually
everyone on the planet, probably including you), but bear with me while we figure it out.

How Many Decimal Digits?


The million dollar question is, “How many significant decimal digits can I store in a given floating point type
before the floating point precision runs out?”
But it’s not quite so easy to answer. So we’ll do it in two ways.
The number of decimal digits you can store in a floating point type and surely get the same number back out
when you print it is given by these macros:

Type Decimal Digits You Can Store Minimum


float FLT_DIG 6
double DBL_DIG 10
long double LDBL_DIG 10

On my system, FLT_DIG is 6, so I can be sure that if I print out a 6 digit float, I’ll get the same thing back.
(It could be more—some numbers will come back correctly with more digits. But 6 is definitely coming
back.)
For example, printing out floats following this pattern of increasing digits, we apparently make it to 8 digits
before something goes wrong, but after that we’re back to 7 correct digits.
0.12345
0.123456
0.1234567
0.12345678
0.123456791 <-- Things start going wrong
0.1234567910

Let’s do another demo. In this code we’ll have two floats that both hold numbers that have FLT_DIG
significant decimal digits [74]. Then we add those together, for what should be 12 significant decimal digits.
But that’s more than we can store in a float and correctly recover as a string, so when we print it
out, things start going wrong after the 7th significant digit.
#include <stdio.h>
#include <float.h>

int main(void)
{
    // Both these numbers have 6 significant digits, so they can be
    // stored accurately in a float:

    float f = 3.14159f;
    float g = 0.00000265358f;

    printf("%.5f\n", f);   // 3.14159       -- correct!
    printf("%.11f\n", g);  // 0.00000265358 -- correct!

    // Now add them up

    f += g;                // 3.14159265358 is what f _should_ be

    printf("%.11f\n", f);  // 3.14159274101 -- wrong!
}

(The above code has an f after the numeric constants—this indicates that the constant is type float, as
opposed to the default of double. More on this later.)
Remember that FLT_DIG is the safe number of digits you can store in a float and retrieve correctly.
Sometimes you might get one or two more out of it. But sometimes you’ll only get FLT_DIG digits back.
The sure thing: if you store any number of digits up to and including FLT_DIG in a float, you’re sure to get
them back correctly.
So that’s the story. FLT_DIG. The End.
…Or is it?

Converting to Decimal and Back


But storing a base 10 number in a floating point number is only half the story.
What about when you print out a floating point number? How many digits can you print?
You might think it would be the same as the number you can store, but it’s not [75]!
Recall that you might have more decimal digits than FLT_DIG encoded correctly in the number, and to be
sure you’ve printed them all you may need to print more than FLT_DIG digits. Of course, if you store the
number 3.14f in a float, you can’t expect to print out more than 2 decimal places and get sensible results.
But FLT_DIG (if 6) says that you can’t store more digits than 3.14159f and be sure of getting them stored
successfully.
74
This program runs as its comments indicate on a system with FLT_DIG of 6 that uses IEEE-754 base-2 floating point numbers.
Otherwise, you might get different output.
75
Or at least, it’s probably not—if you store floating point numbers in base 2.
But what if you did some math on a floating point number? Can you get more precision?

Constant Numeric Types


When you write down a constant number, like 1234, it has a type. But what type is it? Let’s look at how
C decides what type the constant is, and how to force it to choose a specific type.

Hexadecimal and Octal


In addition to good ol’ decimal like Grandma used to bake, C also supports constants of different bases.
If you lead a number with 0x, it is read as a hex number:
int a = 0x1A2B; // Hexadecimal
int b = 0x1a2b; // Case doesn't matter for hex digits

printf("%x", a); // Print a hex number, "1a2b"

If you lead a number with a 0, it is read as an octal number:


int a = 012;

printf("%o\n", a); // Print an octal number, "12"

This is particularly problematic for beginner programmers who try to pad decimal numbers on the left with
0 to line things up nice and pretty, inadvertently changing the base of the number:

int x = 11111; // Decimal 11111


int y = 00111; // Decimal 73 (Octal 111)
int z = 01111; // Decimal 585 (Octal 1111)

A Note on Binary
An unofficial extension [76] in many C compilers allows you to represent a binary number with a 0b prefix:
int x = 0b101010; // Binary 101010

printf("%d\n", x); // Prints 42 decimal

There’s no printf() format specifier for printing a binary number. You have to do it a character at a time
with bitwise operators.

Integer Constants
You can force a constant integer to be a certain type by appending a suffix to it that indicates the type.
We’ll do some assignments to demo, but most often devs leave off the suffixes unless needed to be precise.
The compiler is pretty good at making sure the types are compatible.
int x = 1234;
long int x = 1234L;
long long int x = 1234LL;

unsigned int x = 1234U;
unsigned long int x = 1234UL;
unsigned long long int x = 1234ULL;
76
It’s really surprising to me that C doesn’t have this in the spec yet. In the C99 Rationale document, they write, “A proposal to add
binary constants was rejected due to lack of precedent and insufficient utility.” Which seems kind of silly in light of some of the other
features they kitchen-sinked in there! I’ll bet one of the next releases has it.
The suffix can be uppercase or lowercase. And the U and the L or LL can appear in either order.

Type Suffix
int None
long int L
long long int LL
unsigned int U
unsigned long int UL
unsigned long long int ULL

I mentioned in the table that “no suffix” means int… but it’s actually more complex than that.
So what happens when you have an unsuffixed number like:
int x = 1234;

What type is it?


What C will generally do is choose the smallest type from int up that can hold the value.
But specifically, that depends on the number’s base (decimal, hex, or octal), as well.
The spec has a great table indicating which type gets used for what unsuffixed value. In fact, I’m just going
to copy it wholesale right here.
C11 §6.4.4.1¶5 reads, “The type of an integer constant is the first of the corresponding list in
which its value can be represented.”
And then goes on to show this table:

Suffix                      Decimal Constant            Octal or Hexadecimal Constant

none                        int                         int
                            long int                    unsigned int
                            long long int               long int
                                                        unsigned long int
                                                        long long int
                                                        unsigned long long int

u or U                      unsigned int                unsigned int
                            unsigned long int           unsigned long int
                            unsigned long long int      unsigned long long int

l or L                      long int                    long int
                            long long int               unsigned long int
                                                        long long int
                                                        unsigned long long int

Both u or U and l or L      unsigned long int           unsigned long int
                            unsigned long long int      unsigned long long int

ll or LL                    long long int               long long int
                                                        unsigned long long int

Both u or U and ll or LL    unsigned long long int      unsigned long long int
What that’s saying is that, for example, if you specify a number like 123456789U, first C will see if it can be
unsigned int. If it doesn’t fit there, it’ll try unsigned long int. And then unsigned long long int.
It’ll use the smallest type that can hold the number.

Floating Point Constants


You’d think that a floating point constant like 1.23 would have a default type of float, right?
Surprise! Turns out unsuffixed floating point numbers are type double! Happy belated birthday!
You can force it to be of type float by appending an f (or F—it’s case-insensitive). You can force it to be
of type long double by appending l (or L).

Type Suffix
float F
double None
long double L

For example:
float x = 3.14f;
double x = 3.14;
long double x = 3.14L;

This whole time, though, we’ve just been doing this, right?
float x = 3.14;

Isn’t the left a float and the right a double? Yes! But C’s pretty good with automatic numeric conversions,
so it’s more common to have an unsuffixed floating point constant than not. More on that later.

Scientific Notation
Remember earlier when we talked about how a floating point number can be represented by a significand,
base, and exponent?
Well, there’s a common way of writing such a number, shown here followed by its more recognizable
equivalent, which is what you get when you actually run the math:
$1.2345 \times 10^{3} = 1234.5$
Writing numbers in the form $s \times b^e$ is called scientific notation [77]. In C, these are written using “E notation”,
so these are equivalent:

Scientific Notation                    E notation

$1.2345 \times 10^{-3} = 0.0012345$    1.2345e-3
$1.2345 \times 10^{4} = 12345$         1.2345e+4

You can print a number in this notation with %e:


printf("%e\n", 123456.0); // Prints 1.234560e+05

A couple little fun facts about scientific notation:


• You don’t have to write them with a single leading digit before the decimal point. Any number of
digits can go in front.
77
https://en.wikipedia.org/wiki/Scientific_notation
double x = 123.456e+3; // 123456

However, when you print it, it will change the exponent so there is only one digit in front of the decimal
point.
• The plus can be left off the exponent, as it’s the default, but this is uncommon in practice from what I’ve
seen.
1.2345e10 == 1.2345e+10

• You can apply the F or L suffixes to E-notation constants:


1.2345e10F
1.2345e10L

Hexadecimal Floating Point Constants


But wait, there’s more floating to be done!
Turns out there are hexadecimal floating point constants, as well!
These work similarly to decimal floating point numbers, but they begin with a 0x just like integer numbers.
The catch is that you must specify an exponent, and this exponent produces a power of 2. That is: $2^x$.
And then you use a p instead of an e when writing the number.
So 0xa.1p3 is $10.0625 \times 2^3 = 80.5$.
We can print hex scientific notation with %a:
double x = 0xa.1p3;

printf("%a\n", x); // 0x1.42p+6


printf("%f\n", x); // 80.500000
Types III: Conversions

In this chapter, we want to talk all about converting from one type to another. C has a variety of ways of
doing this, and some might be a little different than you’re used to in other languages.
Before we talk about how to make conversions happen, let’s talk about how they work when they do happen.

String Conversions
Unlike many languages, C doesn’t do string-to-number (and vice-versa) conversions in quite as streamlined
a manner as it does numeric conversions.
For these, we’ll have to call functions to do the dirty work.

Numeric Value to String


When we want to convert a number to a string, we can use either sprintf() (pronounced SPRINT-f) or
snprintf() (s-n-print-f) [78].

These basically work like printf(), except they output to a string instead, and you can print that string later,
or whatever.
For example, turning part of the value π into a string:
#include <stdio.h>

int main(void)
{
    char s[10];
    float f = 3.14159;

    // Convert "f" to string, storing in "s", writing at most 10 characters
    // including the NUL terminator

    snprintf(s, 10, "%f", f);

    printf("String value: %s\n", s);  // String value: 3.141590
}

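If we wanted to convert a double, %f works for that, too (a float argument gets promoted to double on the way in, and %lf is accepted as a synonym). For a long double, use %Lf.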


78
They’re the same except snprintf() allows you to specify a maximum number of bytes to output, preventing the overrunning of
the end of your string. So it’s safer.

String to Numeric Value


There are a couple families of functions to do this in C. We’ll call these the atoi (pronounced a-to-i) family
and the strtol (stir-to-long) family.
For basic conversion from a string to a number, try the atoi functions from <stdlib.h>. These have bad
error-handling characteristics (they can’t report failure, and the behavior is undefined if the result can’t be represented), so use them carefully.

Function Description
atoi String to int
atof String to double
atol String to long int
atoll String to long long int

Though the spec doesn’t cop to it, the a at the beginning of the function stands for ASCII [79], so really atoi()
is “ASCII-to-integer”, but saying so today is a bit ASCII-centric.
Here’s an example converting a string to a float:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *pi = "3.14159";
    float f;

    f = atof(pi);

    printf("%f\n", f);
}

But, like I said, we get no useful error information from weird things like this:
int x = atoi("what"); // "What" ain't no number I ever heard of

(When I run that, I get 0 back, which looks exactly like a successful conversion of "0". And if the value is too
big for the result type, the behavior is undefined, so you really shouldn’t count on the result in any way.)
For better error handling characteristics, let’s check out all those strtol functions, also in <stdlib.h>. Not
only that, but they convert to more types and more bases, too!

Function Description
strtol String to long int
strtoll String to long long int
strtoul String to unsigned long int
strtoull String to unsigned long long int
strtof String to float
strtod String to double
strtold String to long double

These functions all follow a similar pattern of use, and are a lot of people’s first experience with pointers to
pointers! But never fret—it’s easier than it looks.
79
https://en.wikipedia.org/wiki/ASCII
Let’s do an example where we convert a string to an unsigned long, discarding error information (i.e. in-
formation about bad characters in the input string):
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *s = "3490";

    // Convert string s, a number in base 10, to an unsigned long int.
    // NULL means we don't care to learn about any error information.

    unsigned long int x = strtoul(s, NULL, 10);

    printf("%lu\n", x);  // 3490
}

Notice a couple things there. Even though we didn’t deign to capture any information about error characters
in the string, strtoul() won’t give us undefined behavior on bad input; if it can’t convert anything, it will just return 0.
Also, we specified that this was a decimal (base 10) number.
Does this mean we can convert numbers of different bases? Sure! Let’s do binary!
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *s = "101010";  // What's the meaning of this number?

    // Convert string s, a number in base 2, to an unsigned long int.

    unsigned long int x = strtoul(s, NULL, 2);

    printf("%lu\n", x);  // 42
}

OK, that’s all fun and games, but what’s with that NULL in there? What’s that for?
That helps us figure out if an error occurred in the processing of the string. It’s a pointer to a pointer to a
char, which sounds scary, but isn’t once you wrap your head around it.

Let’s do an example where we feed in a deliberately bad number, and we’ll see how strtoul() lets us know
where the first invalid digit is.
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *s = "34x90";  // "x" is not a valid digit in base 10!
    char *badchar;

    // Convert string s, a number in base 10, to an unsigned long int.

    unsigned long int x = strtoul(s, &badchar, 10);

    // It tries to convert as much as possible, so gets this far:

    printf("%lu\n", x);  // 34

    // But we can see the offending bad character because badchar
    // points to it!

    printf("Invalid character: %c\n", *badchar);  // "x"
}

So there we have strtoul() modifying what badchar points to in order to show us where things went
wrong [80].
But what if nothing goes wrong? In that case, badchar will point to the NUL terminator at the end of the
string. So we can test for it:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *s = "3490";  // A completely valid base 10 number this time
    char *badchar;

    // Convert string s, a number in base 10, to an unsigned long int.

    unsigned long int x = strtoul(s, &badchar, 10);

    // Check if things went well

    if (*badchar == '\0') {
        printf("Success! %lu\n", x);
    } else {
        printf("Partial conversion: %lu\n", x);
        printf("Invalid character: %c\n", *badchar);
    }
}

So there you have it. The atoi()-style functions are good in a controlled pinch, but the strtol()-style
functions give you far more control over error handling and the base of the input.

Numeric Conversions
Boolean
If you convert a zero to bool, the result is 0. Otherwise it’s 1.

Integer to Integer Conversions


If an integer type is converted to unsigned and doesn’t fit in it, the unsigned result wraps around
odometer-style until it fits in the unsigned [81].
80
We have to pass a pointer to badchar into strtoul() or it won’t be able to modify it in any way we can see, analogous to why
you have to pass a pointer to an int to a function if you want that function to be able to change that value of that int.
81
In practice, what’s probably happening on your implementation is that the high-order bits are just being dropped from the result, so
a 16-bit number 0x1234 being converted to an 8-bit number ends up as 0x0034, or just 0x34.
If an integer type is converted to a signed number and doesn’t fit, the result is implementation-defined!
Something documented will happen, but you’ll have to look it up [82].

Integer and Floating Point Conversions


If a floating point type is converted to an integer type, the fractional part is discarded with prejudice [83].
But—and here’s the catch—if the number is too large to fit in the integer, you get undefined behavior. So
don’t do that.
Going from integer or floating point to floating point, C makes a best effort to find the closest floating point
number to the original value that it can.
Again, though, if the original value is out of range for the destination type, it’s undefined behavior.

Implicit Conversions
These are conversions the compiler does automatically for you when you mix and match types.

The Integer Promotions


In a number of places, if an int can be used to represent a value from char or short (signed or unsigned),
that value is promoted up to int. If it doesn’t fit in an int, it’s promoted to unsigned int.
This is how we can do something like this:
char x = 10, y = 20;
int i = x + y;

In that case, x and y get promoted to int by C before the math takes place.
The integer promotions take place during The Usual Arithmetic Conversions, with variadic functions [84], unary
+ and - operators, or when passing values to functions without prototypes [85].

The Usual Arithmetic Conversions


These are automatic conversions that C does around numeric operations that you ask for. (That’s actually
what they’re called, by the way, by C11 §6.3.1.8.) Note that for this section, we’re just talking about numeric
types—strings will come later.
These conversions answer questions about what happens when you mix types, like this:
int x = 3 + 1.2; // Mixing int and double
float y = 12.0f * 2; // Mixing float and int

Do they become ints? Do they become floats? How does it work?


Here are the steps, paraphrased for easy consumption.
1. If one thing in the expression is a floating type, convert the other things to that floating type.
2. Otherwise, if both types are integer types, perform the integer promotions on each, then make the
operand types as big as they need to be to hold the common largest value. Sometimes this involves
changing signed to unsigned.
82
Again, in practice, what will likely happen on your system is that the bit pattern for the original will be truncated and then just used
to represent the signed number, two’s complement. For example, my system takes an unsigned char of 192 and converts it to signed
char -64. In two’s complement, the bit pattern for both these numbers is binary 11000000.
83
Not really—it’s just discarded regularly.
84
Functions with a variable number of arguments.
85
This is rarely done because the compiler will complain and having a prototype is the Right Thing to do. I think this still works for
historic reasons, before prototypes were a thing.
If you want to know the gritty details, check out C11 §6.3.1.8. But you probably don’t.
Just generally remember that int types become float types if there’s a floating point type anywhere in there,
and the compiler makes an effort to make sure mixed integer types don’t overflow.

void*
The void* type is interesting because it can be converted from or to any pointer type.
int x = 10;

void *p = &x; // &x is type int*, but we store it in a void*

int *q = p; // p is void*, but we store it in an int*

Explicit Conversions
These are conversions from type to type that you have to ask for; the compiler won’t do it for you.
You can convert from one type to another by assigning one type to another with an =.
You can also convert explicitly with a cast.

Casting
You can explicitly change the type of an expression by putting a new type in parentheses in front of it. Some
C devs frown on the practice unless absolutely necessary, but it’s likely you’ll come across some C code with
these in it.
Let’s do an example where we want to convert an int into a long so that we can store it in a long.
Note: this example is contrived and the cast in this case is completely unnecessary because the x + 12
expression would automatically be changed to long int to match the wider type of y.
int x = 10;
long int y = (long int)x + 12;

In that example, even though x was type int before, the expression (long int)x has type long int. We
say, “We cast x to long int.”
More commonly, you might see a cast being used to convert a void* into a specific pointer type so it can be
dereferenced.
A callback from the built-in qsort() function might display this behavior since it has void*s passed into it:
int compar(const void *elem1, const void *elem2)
{
    return *((const int*)elem2) - *((const int*)elem1);
}

But you could also clearly write it with an assignment:


int compar(const void *elem1, const void *elem2)
{
    const int *e1 = elem1;
    const int *e2 = elem2;

    return *e2 - *e1;
}
One place you’ll see casts more commonly is to avoid a warning when printing pointer values with the
rarely-used %p which gets picky with anything other than a void*:
int x = 3490;
int *p = &x;

printf("%p\n", p);

generates this warning:


warning: format ‘%p’ expects argument of type ‘void *’, but argument
2 has type ‘int *’

You can fix it with a cast:


printf("%p\n", (void *)p);

Another place is with explicit pointer changes, if you don’t want to use an intervening void*, but these are
also pretty uncommon:
long x = 3490;
long *p = &x;
unsigned char *c = (unsigned char *)p;

Again, casting is rarely needed in practice. If you find yourself casting, there might be another way to do the
same thing, or maybe you’re casting unnecessarily.
Or maybe it is necessary. Personally, I try to avoid it, but am not afraid to use it if I have to.
Types IV: Qualifiers and Specifiers

Now that we have some more types under our belts, turns out we can give these types some additional
attributes that control their behavior. These are the type qualifiers and storage class specifiers.

Type Qualifiers
These are going to allow you to declare constant values, and also to give the compiler optimization hints that
it can use.

const
This is the most common type qualifier you’ll see. It means the variable is constant, and any attempt to
modify it will result in a very angry compiler.
const int x = 2;

x = 4; // COMPILER PUKING SOUNDS, can't assign to a constant

You can’t change a const value.


Often you see const in parameter lists for functions:
void foo(const int x)
{
    printf("%d\n", x + 30); // OK, doesn't modify "x"
}

const and Pointers

This one gets a little funky, because there are two usages that have two meanings when it comes to pointers.
For one thing, we can make it so you can’t change the thing the pointer points to. You do this by putting the
const up front with the type name (before the asterisk) in the type declaration.

int x[] = {10, 20};


const int *p = x;

p++; // We can modify p, no problem

*p = 30; // Compiler error! Can't change what it points to

Somewhat confusingly, these two things are equivalent:


const int *p; // Can't modify what p points to
int const *p; // Can't modify what p points to, just like the previous line

Great, so we can’t change the thing the pointer points to, but we can change the pointer itself. What if we
want the other way around? We want to be able to change what the pointer points to, but not the pointer
itself?
Just move the const after the asterisk in the declaration:
int *const p; // We can't modify "p" with pointer arithmetic

p++; // Compiler error!

But we can modify what it points to:


int x = 10;
int *const p = &x;

*p = 20; // Set "x" to 20, no problem

You can also make both things const:


const int *const p; // Can't modify p or *p!

Finally, if you have multiple levels of indirection, you should const the appropriate levels. Just because a
pointer is const, doesn’t mean the pointer it points to must also be. You can explicitly set them like in the
following examples:
char **p;
p++; // OK!
(*p)++; // OK!

char **const p;
p++; // Error!
(*p)++; // OK!

char *const *p;


p++; // OK!
(*p)++; // Error!

char *const *const p;


p++; // Error!
(*p)++; // Error!

const Correctness

One more thing I have to mention is that the compiler will warn on something like this:
const int x = 20;
int *p = &x;

saying something to the effect of:


initialization discards 'const' qualifier from pointer type target

What’s happening there?


Well, we need to look at the types on either side of the assignment:
const int x = 20;
int *p = &x;
// ^ ^
// | |
// int* const int*
The compiler is warning us that the value on the right side of the assignment is const, but the one on the
left is not. And the compiler is letting us know that it is discarding the “const-ness” of the expression on the
right.
That is, we can still try to do the following, but it’s just wrong. The compiler will warn, and it’s undefined
behavior:
const int x = 20;
int *p = &x;

*p = 40; // Undefined behavior--maybe it modifies "x", maybe not!

printf("%d\n", x); // 40, if you're lucky

restrict
TLDR: you never have to use this and you can ignore it every time you see it.
restrict is a hint to the compiler that a particular piece of memory will only be accessed by one pointer
and never another. If a developer declares a pointer to be restrict and then accesses the object it points to
in another way, the behavior is undefined.
Basically you’re telling C, “Hey—I guarantee that this one single pointer is the only way I access this memory,
and if I’m lying, you can pull undefined behavior on me.”
And C uses that information to perform certain optimizations.
For example, let’s write a function to swap two variables, and we’ll use the restrict keyword to assure C
that we’ll never pass in pointers to the same thing. And then let’s blow it and try passing in pointers to the
same thing.
void swap(int *restrict a, int *restrict b)
{
    int t;

    t = *a;
    *a = *b;
    *b = t;
}

int main(void)
{
    int x = 10, y = 20;

    swap(&x, &y);  // OK! "a" and "b", above, point to different things

    swap(&x, &x);  // Undefined behavior! "a" and "b" point to the same thing
}

If we were to take out the restrict keywords, above, that would allow both calls to work safely. But then
the compiler might not be able to optimize.
restrict has block scope, that is, the restriction only lasts for the scope in which it’s used. If it’s in a parameter list
for a function, it’s in the block scope of that function.
If the restricted pointer points to an array, the restriction covers the entire array.
If it’s outside any function in file scope, the restriction covers the entire program.
You’re likely to see this in library functions like printf():
int printf(const char * restrict format, ...);

Again, that’s just telling the compiler that inside the printf() function, there will be only one pointer that
refers to any part of that format string.

volatile
You’re unlikely to see or need this unless you’re dealing with hardware directly.
volatile tells the compiler that a value might change behind its back and should be looked up every time.

An example might be where the compiler is looking in memory at an address that continuously updates
behind the scenes, e.g. some kind of hardware timer.
If the compiler decides to optimize that and store the value in a register for a protracted time, the value in
memory will update and won’t be reflected in the register.
By declaring something volatile, you’re telling the compiler, “Hey, the thing this points at might change
at any time for reasons outside this program code.”
volatile int *p;

_Atomic
This is an optional C feature that we’ll talk about another time.

Storage-Class Specifiers
Storage-class specifiers are similar to type qualifiers. They give the compiler more information about the type of a
variable.

auto
You barely ever see this keyword, since auto is the default for block scope variables. It’s implied.
These are the same:
{
    int a;       // auto is the default...
    auto int a;  // So this is redundant
}

The auto keyword indicates that this object has automatic storage duration. That is, it exists in the scope in
which it is defined, and is automatically deallocated when the scope is exited.
One gotcha about automatic variables is that their value is indeterminate until you explicitly initialize them.
We say they’re full of “random” or “garbage” data, though neither of those really makes me happy. In any
case, you won’t know what’s in it unless you initialize it.
Always initialize all automatic variables before use!

static
This keyword has two meanings, depending on if the variable is file scope or block scope.
Let’s start with block scope.

static in Block Scope

In this case, we’re basically saying, “I just want a single instance of this variable to exist, shared between
calls.”
That is, its value will persist between calls.
static in block scope with an initializer will only be initialized one time on program startup, not each time
the function is called.
Let’s do an example:
#include <stdio.h>

void counter(void)
{
    static int count = 1;  // This is initialized one time

    printf("This has been called %d time(s)\n", count);

    count++;
}

int main(void)
{
    counter();  // "This has been called 1 time(s)"
    counter();  // "This has been called 2 time(s)"
    counter();  // "This has been called 3 time(s)"
    counter();  // "This has been called 4 time(s)"
}

See how the value of count persists between calls?


One thing of note is that static block scope variables are initialized to 0 by default.
static int foo; // Default starting value is `0`...
static int foo = 0; // So the `0` assignment is redundant

Finally, be advised that if you’re writing multithreaded programs, you have to be sure you don’t let multiple
threads trample the same variable.

static in File Scope

When you get out to file scope, outside any blocks, the meaning rather changes.
Variables at file scope already persist between function calls, so that behavior is already there.
Instead what static means in this context is that this variable isn’t visible outside of this particular source
file. Kinda like “global”, but only in this file.
More on that in the section about building with multiple source files.

extern
The extern type specifier gives us a way to refer to objects in other source files.
Let’s say, for example, the file bar.c had the following as its entirety:
// bar.c

int a = 37;

Just that. Declaring a new int a in file scope.


But what if we had another source file, foo.c, and we wanted to refer to the a that’s in bar.c?
It’s easy with the extern keyword:
// foo.c

#include <stdio.h>

extern int a;

int main(void)
{
    printf("%d\n", a);  // 37, from bar.c!

    a = 99;

    printf("%d\n", a);  // Same "a" from bar.c, but it's now 99
}

We could have also made the extern int a in block scope, and it still would have referred to the a in
bar.c:

// foo.c

#include <stdio.h>

int main(void)
{
    extern int a;

    printf("%d\n", a);  // 37, from bar.c!

    a = 99;

    printf("%d\n", a);  // Same "a" from bar.c, but it's now 99
}

Now, if a in bar.c had been marked static, this wouldn’t have worked. static variables at file scope are
not visible outside that file.
A final note about extern on functions. For functions, extern is the default, so it’s redundant. You can
declare a function static if you only want it visible in a single source file.

register
Barely anyone uses this anymore.
This is a keyword to hint to the compiler that this variable is frequently-used, and should be made as fast as
possible to access. The compiler is under no obligation to agree to it.
Now, modern C compiler optimizers are pretty effective at figuring this out themselves, so it’s rare to see
these days.
But if you must:
#include <stdio.h>

int main(void)
{
    register int a;  // Make "a" as fast to use as possible.

    for (a = 0; a < 10; a++)
        printf("%d\n", a);
}

It does come at a price, however. You can’t take the address of a register:
register int a;
int *p = &a; // COMPILER ERROR! Can't take address of a register

The same applies to any part of an array:


register int a[] = {11, 22, 33, 44, 55};
int *p = a;  // COMPILER ERROR! Can't take address of a[0]

Or dereferencing part of an array:

register int a[] = {11, 22, 33, 44, 55};

int x = *(a + 2);  // COMPILER ERROR! Address of a[0] taken

Interestingly, for the equivalent with array notation, gcc only warns:
register int a[] = {11, 22, 33, 44, 55};

int x = a[2];  // COMPILER WARNING!

with:
warning: ISO C forbids subscripting ‘register’ array

A bit of backstory, here: deep inside the CPU are little dedicated “variables” called registers86 . They are
super fast to access compared to RAM, so using them gets you a speed boost. But they’re not in RAM, so
they don’t have an associated memory address (which is why you can’t take the address-of or get a pointer
to them).
But, like I said, modern compilers are really good at producing optimal code, using registers whenever
possible regardless of whether or not you specified the register keyword. Not only that, but the spec
allows them to just treat it as if you’d typed auto, if they want.
In short, you probably don’t want to even bother with register, and just let the compiler do what it thinks
is best.

86
https://en.wikipedia.org/wiki/Processor_register

Multifile Projects

So far we’ve been looking at toy programs that for the most part fit in a single file. But complex C programs
are made up of many files that are all compiled and linked together into a single executable.
In this chapter we’ll check out some of the common patterns and practices for putting together larger projects.

Includes and Function Prototypes


A really common situation is that some of your functions are defined in one file, and you want to call them
from another.
This actually works out of the box with a warning… let’s first try it and then look at the right way to fix the
warning.
For these examples, we’ll put the filename as the first comment in the source.
To compile them, you’ll need to specify all the sources on the command line:
# output file source files
# v v
# |----| |---------|
gcc -o foo foo.c bar.c

In that example, foo.c and bar.c get built into the executable named foo.
So let’s take a look at the source file bar.c:
// File bar.c

int add(int x, int y) {
    return x + y;
}

And the file foo.c with main in it:


// File foo.c

#include <stdio.h>

int main(void)
{
    printf("%d\n", add(2, 3));  // 5!
}

See how from main() we call add()—but add() is in a completely different source file! It’s in bar.c,
while the call to it is in foo.c!
If we build this with:


gcc -o foo foo.c bar.c

we get this warning:


warning: implicit declaration of function ‘add’

But if we ignore that (which really we should never do—always get your code to build with zero warnings!)
and try to run it:
./foo
5

Indeed, we get the result of 2 + 3! Yay!


So… about that warning. Let’s fix it.
What implicit declaration means is that we’re using a function, namely add() in this case, without
letting C know anything about it ahead of time. C wants to know what it returns, what types it takes as
arguments, and things such as that.
We saw how to fix that earlier with a function prototype. Indeed, if we add one of those to foo.c before we
make the call, everything works well:
// File foo.c

#include <stdio.h>

int add(int, int);  // Add the prototype

int main(void)
{
    printf("%d\n", add(2, 3));  // 5!
}

No more warning!
But that’s a pain—needing to type in the prototype every time you want to use a function. I mean, we used
printf() right there and didn’t need to type in a prototype; what gives?

If you remember from way back with hello.c at the beginning of the book, we actually did include the
prototype for printf()! It’s in the file stdio.h! And we included that with #include!
Can we do the same with our add() function? Make a prototype for it and put it in a header file?
Sure!
Header files in C have a .h extension by default. And they often, but not always, have the same name as
their corresponding .c file. So let’s make a bar.h file for our bar.c file, and we’ll stick the prototype in it:
// File bar.h

int add(int, int);

And now let’s modify foo.c to include that file. Assuming it’s in the same directory, we include it inside
double quotes (as opposed to angle brackets):
// File foo.c

#include <stdio.h>

#include "bar.h"  // Include from current directory

int main(void)
{
    printf("%d\n", add(2, 3));  // 5!
}

Notice how we don’t have the prototype in foo.c anymore—we included it from bar.h. Now any file that
wants that add() functionality can just #include "bar.h" to get it, and you don’t need to worry about
typing in the function prototype.
As you might have guessed, #include literally includes the named file right there in your source code, just
as if you’d typed it in.
We’re almost there! There’s just one more piece of boilerplate we have to add.

Dealing with Repeated Includes


It’s not uncommon that a header file will itself #include other headers needed for the functionality of its
corresponding C files. I mean, why not?
And it could be that you have a header #included multiple times from different places. Maybe that’s no
problem, but maybe it would cause compiler errors. And we can’t control how many places #include it!
Even worse, we might get into a crazy situation where header a.h includes header b.h, and b.h includes
a.h! It’s an #include infinite cycle!

Trying to build such a thing gives an error:


error: #include nested depth 200 exceeds maximum of 200

What we need to do is make it so that if a file gets included once, subsequent #includes for that file are
ignored.
The stuff that we’re about to do is so common that you should just automatically do it every time you
make a header file!
And the common way to do this is with a preprocessor variable that we set the first time we #include the
file. And then for subsequent #includes, we first check to make sure that the variable isn’t defined.
For that variable name, it’s super common to take the name of the header file, like bar.h, make it uppercase,
and replace the period with an underscore: BAR_H.
So put a check at the very, very top of the file where you see if it’s already been included, and effectively
comment the whole thing out if it has.
(Don’t put a leading underscore (because a leading underscore followed by a capital letter is reserved) or a
double leading underscore (because that’s also reserved.))
#ifndef BAR_H   // If BAR_H isn't defined...
#define BAR_H   // Define it (with no particular value)

// File bar.h

int add(int, int);

#endif          // End of the #ifndef BAR_H

This will effectively cause the header file to be included only a single time, no matter how many places try
to #include it.

static and extern


When it comes to multifile projects, you can make sure file-scope variables and functions are not visible
from other source files with the static keyword.
And you can refer to objects in other files with extern.
For more info, check out the sections in the book on the static and extern type specifiers.

Compiling with Object Files


This isn’t part of the spec, but it’s 99.999% common in the C world.
You can compile C files into an intermediate representation called object files. These are compiled machine
code that hasn’t been put into an executable yet.
Object files in Windows have a .OBJ extension; in Unix-likes, they’re .o.
In gcc, we can build some like this, with the -c (compile only!) flag:
gcc -c foo.c # produces foo.o
gcc -c bar.c # produces bar.o

And then we can link those together into a single executable:


gcc -o foo foo.o bar.o

Voila, we’ve produced an executable foo from the two object files.
But you’re thinking, why bother? Can’t we just:
gcc -o foo foo.c bar.c

and kill two boids87 with one stone?


For little programs, that’s fine. I do it all the time.
But for larger programs, we can take advantage of the fact that compiling from source to object files is
relatively slow, and linking together a bunch of object files is relatively fast.
This really shows with the make utility that only rebuilds sources that are newer than their outputs.
Let’s say you had a thousand C files. You could compile them all to object files to start (slowly) and then
combine all those object files into an executable (fast).
Now say you modified just one of those C source files—here’s the magic: you only have to rebuild that one
object file for that source file! And then you rebuild the executable (fast). All the other C files don’t have to
be touched.
In other words, by only rebuilding the object files we need to, we cut down on compilation times radically.
(Unless of course you’re doing a “clean” build, in which case all the object files have to be created.)

87
https://en.wikipedia.org/wiki/Boids

The Outside Environment

When you run a program, it’s actually you talking to the shell, saying, “Hey, please run this thing.” And the
shell says, “Sure,” and then tells the operating system, “Hey, could you please make a new process and run
this thing?” And if all goes well, the OS complies and your program runs.
But there’s a whole world outside your program in the shell that can be interacted with from within C. We’ll
look at a few of those in this chapter.

Command Line Arguments


Many command line utilities accept command line arguments. For example, if we want to see all files that
end in .txt, we can type something like this on a Unix-like system:
ls *.txt

(or dir instead of ls on a Windows system).


In this case, the command is ls, but its arguments are all the files that end with .txt88 .
So how can we see what is passed into program from the command line?
Say we have a program called add that adds all numbers passed on the command line and prints the result:
./add 10 30 5
45

That’s gonna pay the bills for sure!


But seriously, this is a great tool for seeing how to get those arguments from the command line and break
them down.
First, let’s see how to get them at all. For this, we’re going to need a new main()!
Here’s a program that prints out all the command line arguments. For example, if we name the executable
foo, we can run it like this:

./foo i like turtles

and we’ll see this output:


arg 0: ./foo
arg 1: i
arg 2: like
arg 3: turtles
88
Historically, MS-DOS and Windows programs would do this differently than Unix. In Unix, the shell would expand the wildcard
into all matching files before your program saw it, whereas the Microsoft variants would pass the wildcard expression into the program
to deal with. In any case, there are arguments that get passed into the program.

It’s a little weird, because the zeroth argument is the name of the executable, itself. But that’s just something
to get used to. The arguments themselves follow directly.
Source:
#include <stdio.h>

int main(int argc, char *argv[])
{
    for (int i = 0; i < argc; i++) {
        printf("arg %d: %s\n", i, argv[i]);
    }
}

Whoa! What’s going on with the main() function signature? What’s argc and argv89 (pronounced arg-c
and arg-v)?
Let’s start with the easy one first: argc. This is the argument count, including the program name, itself. If
you think of all the arguments as an array of strings, which is exactly what they are, then you can think of
argc as the length of that array, which is exactly what it is.

And so what we’re doing in that loop is going through all the argvs and printing them out one at a time, so
for a given input:
./foo i like turtles

we get a corresponding output:


arg 0: ./foo
arg 1: i
arg 2: like
arg 3: turtles

With that in mind, we should be good to go with our adder program.


Our plan:
• Look at all the command line arguments (past argv[0], the program name)
• Convert them to integers
• Add them to a running total
• Print the result
Let’s get to it!
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int total = 0;

    for (int i = 1; i < argc; i++) {  // Start at 1 to skip the program name
        int value = atoi(argv[i]);  // Use strtol() for better error handling

        total += value;
    }

    printf("%d\n", total);
}

89
Since they’re just regular parameter names, you don’t actually have to call them argc and argv. But it’s so very idiomatic to use
those names, if you get creative, other C programmers will look at you with a suspicious eye, indeed!

Sample runs:
$ ./add
0
$ ./add 1
1
$ ./add 1 2
3
$ ./add 1 2 3
6
$ ./add 1 2 3 4
10

Of course, it might puke if you pass in a non-integer, but hardening against that is left as an exercise to the
reader.

The Last argv is NULL


One bit of fun trivia about argv is that after the last string is a pointer to NULL.
That is:
argv[argc] == NULL

is always true!
This might seem pointless, but it turns out to be useful in a couple places; we’ll take a look at one of those
right now.

The Alternate: char **argv


Remember that when you call a function, C doesn’t differentiate between array notation and pointer notation
in the function signature.
That is, these are the same:
void foo(char a[])
void foo(char *a)

Now, it’s been convenient to think of argv as an array of strings, i.e. an array of char*s, so this made sense:
int main(int argc, char *argv[])

but because of the equivalence, you could also write:


int main(int argc, char **argv)

Yeah, that’s a pointer to a pointer, all right! If it makes it easier, think of it as a pointer to a string. But really,
it’s a pointer to a value that points to a char.
Also recall that these are equivalent:
argv[i]
*(argv + i)

which means you can do pointer arithmetic on argv.


So an alternate way to consume the command line arguments might be to just walk along the argv array by
bumping up a pointer until we hit that NULL at the end.
Let’s modify our adder to do that:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int total = 0;

    // Cute trick to get the compiler to stop warning about the
    // unused variable argc:
    (void)argc;

    // Start at argv + 1 to skip the program name
    for (char **p = argv + 1; *p != NULL; p++) {
        int value = atoi(*p);  // Use strtol() for better error handling

        total += value;
    }

    printf("%d\n", total);
}

Personally, I use array notation to access argv, but have seen this style floating around, as well.

Fun Facts
Just a few more things about argc and argv.
• Some environments might not set argv[0] to the program name. If it’s not available, argv[0] will
be an empty string. I’ve never seen this happen.
• The spec is actually pretty liberal with what an implementation can do with argv and where those
values come from. But every system I’ve been on works the same way, as we’ve discussed in this
section.
• You can modify argc, argv, or any of the strings that argv points to. (Just don’t make those strings
longer than they already are!)
• On some Unix-like systems, modifying the string argv[0] results in the output of ps changing90 .
Normally, if you have a program called foo that you’ve run with ./foo, you might see this in the
output of ps:
4078 tty1 S 0:00 ./foo

But if you modify argv[0] like so, being careful that the new string "Hi!  " is the same length as
the old one "./foo":
strcpy(argv[0], "Hi!  ");

and then run ps while the program ./foo is still executing, we’ll see this instead:
4079 tty1 S 0:00 Hi!

This behavior is not in the spec and is highly system-dependent.

90
ps, Process Status, is a Unix command to see what processes are running at the moment.

Exit Status
Did you notice that the function signatures for main() have it returning type int? What’s that all about?
It has to do with a thing called the exit status, which is an integer that can be returned to the program that
launched yours to let it know how things went.


Now, there are a number of ways a program can exit in C, including returning from main(), or calling one
of the exit() variants.
All of these methods accept an int as an argument.
Side note: did you see that in basically all my examples, even though main() is supposed to return an int,
I don’t actually return anything? In any other function, this would be illegal, but there’s a special case in
C: if execution reaches the end of main() without finding a return, it automatically does a return 0.
But what does the 0 mean? What other numbers can we put there? And how are they used?
The spec is both clear and vague on the matter, as is common. Clear because it spells out what you can do,
but vague in that it doesn’t particularly limit it, either.
Nothing for it but to forge ahead and figure it out!
Let’s get Inception91 for a second: turns out that when you run your program, you’re running it from another
program.
Usually this other program is some kind of shell92 that doesn’t do much on its own except launch other
programs.
But this is a multi-phase process, especially visible in command-line shells:
1. The shell launches your program
2. The shell typically goes to sleep (for command-line shells)
3. Your program runs
4. Your program terminates
5. The shell wakes up and waits for another command
Now, there’s a little piece of communication that takes place between steps 4 and 5: the program can return
a status value that the shell can interrogate. Typically, this value is used to indicate the success or failure of
your program, and, if a failure, what type of failure.
This value is what we’ve been returning from main(). That’s the status.
Now, the C spec allows for two different status values, which have macro names defined in <stdlib.h>:

Status                 Description
EXIT_SUCCESS or 0      Program terminated successfully.
EXIT_FAILURE           Program terminated with an error.

Let’s write a short program that multiplies two numbers from the command line. We’ll require that you
specify exactly two values. If you don’t, we’ll print an error message, and exit with an error status.
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        printf("usage: mult x y\n");
        return EXIT_FAILURE;  // Indicate to shell that it didn't work
    }

    printf("%d\n", atoi(argv[1]) * atoi(argv[2]));

    return 0;  // same as EXIT_SUCCESS, everything was good.
}

91
https://en.wikipedia.org/wiki/Inception
92
https://en.wikipedia.org/wiki/Shell_(computing)

Now if we try to run this, we get the usage message until we specify exactly the right number of
command-line arguments:
$ ./mult
usage: mult x y

$ ./mult 3 4 5
usage: mult x y

$ ./mult 3 4
12

But that doesn’t really show the exit status that we returned, does it? We can get the shell to print it out,
though. Assuming you’re running Bash or another POSIX shell, you can use echo $? to see it93 .
Let’s try:
$ ./mult
usage: mult x y
$ echo $?
1

$ ./mult 3 4 5
usage: mult x y
$ echo $?
1

$ ./mult 3 4
12
$ echo $?
0

Interesting! We see that on my system, EXIT_FAILURE is 1. The spec doesn’t spell this out, so it could be
any number. But try it; it’s probably 1 on your system, too.

Other Exit Status Values


The status 0 most definitely means success, but what about all the other integers, even negative ones?
Here we’re going off the C spec and into Unix land. In general, while 0 means success, a positive non-zero
number means failure. So you can only have one type of success, and multiple types of failure. Bash says
the exit code should be between 0 and 255, though a number of codes are reserved.
In short, if you want to indicate different error exit statuses in a Unix environment, you can start with 1 and
work your way up.
On Linux, if you try any code outside the range 0-255, it will bitwise AND the code with 0xff, effectively
wrapping it into that range (so 256 comes out as 0 and 257 as 1).
You can script the shell to later use these status codes to make decisions about what to do next.
93
In Windows cmd.exe, type echo %errorlevel%. In PowerShell, type $LastExitCode.

Environment Variables
Before I get into this, I need to warn you that C doesn’t specify what an environment variable is. So I’m
going to describe the environment variable system that works on every major platform I’m aware of.
Basically, the environment is the program that’s going to run your program, e.g. the bash shell. And it might
have some bash variables defined. In case you didn’t know, the shell can make its own variables. Each shell
is different, but in bash you can just type set and it’ll show you all of them.
Here’s an excerpt from the 61 variables that are defined in my bash shell:
HISTFILE=/home/beej/.bash_history
HISTFILESIZE=500
HISTSIZE=500
HOME=/home/beej
HOSTNAME=FBILAPTOP
HOSTTYPE=x86_64
IFS=$' \t\n'

Notice they are in the form of key/value pairs. For example, one key is HOSTTYPE and its value is x86_64.
From a C perspective, all values are strings, even if they’re numbers94 .
So, anyway! Long story short, it’s possible to get these values from inside your C program.
Let’s write a program that uses the standard getenv() function to look up a value that you set in the shell.
getenv() will return a pointer to the value string, or else NULL if the environment variable doesn’t exist.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *val = getenv("FROTZ");  // Try to get the value

    // Check to make sure it exists
    if (val == NULL) {
        printf("Cannot find the FROTZ environment variable\n");
        return EXIT_FAILURE;
    }

    printf("Value: %s\n", val);
}

If I run this directly, I get this:


$ ./foo
Cannot find the FROTZ environment variable

which makes sense, since I haven’t set it yet.


In bash, I can set it to something with95 :
$ export FROTZ="C is awesome!"

Then if I run it, I get:


$ ./foo
Value: C is awesome!
94
If you need a numeric value, convert the string with something like atoi() or strtol().
95
In Windows CMD.EXE, use set FROTZ=value. In PowerShell, use $Env:FROTZ=value.

In this way, you can set up data in environment variables, and you can get it in your C code and modify your
behavior accordingly.

Setting Environment Variables


This isn’t standard, but a lot of systems provide ways to set environment variables.
If on a Unix-like, look up the documentation for putenv(), setenv(), and unsetenv(). On Windows, see
_putenv().
The C Preprocessor

Before your program gets compiled, it actually runs through a phase called preprocessing. It’s almost like
there’s a language on top of the C language that runs first. And it outputs the C code, which then gets
compiled.
We’ve already seen this to an extent with #include! That’s the C Preprocessor! Where it sees that directive,
it includes the named file right there, just as if you’d typed it in there. And then the compiler builds the whole
thing.
But it turns out it’s a lot more powerful than just being able to include things. You can define macros that
are substituted… and even macros that take arguments!

#include
Let’s start with the one we’ve already seen a bunch. This is, of course, a way to include other sources in your
source. Very commonly used with header files.
While the spec allows for all kinds of behavior with #include, we’re going to take a more pragmatic
approach and talk about the way it works on every system I’ve ever seen.
We can split header files into two categories: system and local. Things that are built-in, like stdio.h,
stdlib.h, math.h, and so on, you can include with angle brackets:

#include <stdio.h>
#include <stdlib.h>

The angle brackets tell C, “Hey, don’t look in the current directory for this header file—look in the system-
wide include directory instead.”
Which, of course, implies that there must be a way to include local files from the current directory. And there
is: with double quotes:
#include "myheader.h"

Or you can very probably look in relative directories using forward slashes and dots, like this:
#include "mydir/myheader.h"
#include "../someheader.py"

Don’t use a backslash (\) for your path separators in your #include! It’s undefined behavior! Use forward
slash (/) only, even on Windows.
In summary, use angle brackets (< and >) for the system includes, and use double quotes (") for your personal
includes.


Simple Macros
Now let’s check out another cool feature of the preprocessor: the ability to define constant values and
substitute them in place.
We do this with #define (often read “pound define”). Here’s an example:
1   #include <stdio.h>
2
3   #define HELLO "Hello, world"
4   #define PI 3.14159
5
6   int main(void)
7   {
8       printf("%s, %f\n", HELLO, PI);
9   }

On lines 3 and 4 we defined a couple macros. Wherever these appear elsewhere in the code (line 8), they’ll
be substituted with the defined values.
From the C compiler’s perspective, it’s exactly as if we’d written this, instead:
#include <stdio.h>

int main(void)
{
    printf("%s, %f\n", "Hello, world", 3.14159);
}

Note that the macros aren’t typed, per se. Really all that happens is they get replaced wholesale with whatever
they’re #defined as. If the resulting C code is invalid, the compiler will puke.
You can also define a macro with no value:
#define EXTRA_HAPPY

in that case, the macro exists and is defined, but is defined to be nothing. So anyplace it occurs in the text
will just be replaced with nothing. We’ll see a use for this later.
It’s conventional to write macro names in ALL_CAPS even though that’s not technically required.
Overall, this gives you a way to define constant values that are effectively global and can be used any place
that a constant can be used, e.g. in switch cases.
It can also be used to replace or modify keywords, a place a const won’t work at all, though this practice
should be used sparingly.

Conditional Compilation
It’s possible to get the preprocessor to decide whether or not to present certain blocks of code to the compiler,
or just remove them entirely before compilation.
We do that by basically wrapping up the code in conditional blocks, similar to if-else statements.

If Defined, #ifdef and #endif


First of all, let’s try to compile specific code depending on whether or not a macro is even defined.
1   #include <stdio.h>
2
3   #define EXTRA_HAPPY
4
5   int main(void)
6   {
7
8   #ifdef EXTRA_HAPPY
9       printf("I'm extra happy!\n");
10  #endif
11
12      printf("OK!\n");
13  }

In that example, we define EXTRA_HAPPY (to be nothing, but it is defined), then on line 8 we check to see
if it is defined with an #ifdef directive. If it is defined, the subsequent code will be included up until the
#endif.

So because it is defined, the code will be included for compilation and the output will be:
I'm extra happy!
OK!

If we were to comment out the #define, like so:


//#define EXTRA_HAPPY

then it wouldn’t be defined, and the code wouldn’t be included in compilation. And the output would just
be:
OK!

It’s important to remember that these decisions happen at compile time! The code actually gets compiled or
removed depending on the condition. This is in contrast to a standard if statement that gets evaluated while
the program is running.

If Not Defined, #ifndef


There’s also the negative sense of “if defined”: “if not defined”, or #ifndef. We could change the previous
example to output different things based on whether or not something was defined:
8 #ifdef EXTRA_HAPPY
9 printf("I'm extra happy!\n");
10 #endif
11

12 #ifndef EXTRA_HAPPY
13 printf("I'm just regular\n");
14 #endif

We’ll see a cleaner way to do that in the next section.


Tying it all back in to header files, we’ve seen how we can cause header files to only be included one time
by wrapping them in preprocessor directives like this:
#ifndef MYHEADER_H // First line of myheader.h
#define MYHEADER_H

int x = 12;

#endif // Last line of myheader.h



This demonstrates how a macro persists across files and multiple #includes. If it’s not yet defined, let’s
define it and compile the whole header file.
But the next time it’s included, we see that MYHEADER_H is defined, so we don’t send the header file to the
compiler—it gets effectively removed.

#else
But that’s not all we can do! There’s also an #else that we can throw in the mix.
Let’s mod the previous example:
8 #ifdef EXTRA_HAPPY
9 printf("I'm extra happy!\n");
10 #else
11 printf("I'm just regular\n");
12 #endif

Now if EXTRA_HAPPY is not defined, it’ll hit the #else clause and print:
I'm just regular

General Conditional: #if, #elif


This works very much like the #ifdef and #ifndef directives in that you can also have an #else and the
whole thing wraps up with #endif.
The only difference is that the constant expression after the #if must evaluate to true (non-zero) for the code
in the #if to be compiled. So instead of whether or not something is defined, we want an expression that
evaluates to true.
#include <stdio.h>

#define HAPPY_FACTOR 1

int main(void)
{

#if HAPPY_FACTOR == 0
    printf("I'm not happy!\n");
#elif HAPPY_FACTOR == 1
    printf("I'm just regular\n");
#else
    printf("I'm extra happy!\n");
#endif

    printf("OK!\n");
}

Again, for the unmatched #if clauses, the compiler won’t even see those lines. For the above code, after the
preprocessor gets finished with it, all the compiler sees is:
#include <stdio.h>

int main(void)
{

    printf("I'm just regular\n");

    printf("OK!\n");
}

One hackish thing this is used for is to comment out large numbers of lines quickly[96].
If you put an #if 0 (“if false”) at the front of the block to be commented out and an #endif at the end, you
can get this effect:
#if 0
printf("All this code"); /* is effectively */
printf("commented out"); // by the #if 0
#endif

You might have noticed that there are no #elifdef or #elifndef directives, at least not before C23. How can
we get the same effect with #if? That is, what if I wanted this:
#ifdef FOO
    x = 2;
#elifdef BAR // ERROR: Not supported before C23
    x = 3;
#endif

How could I do it?


Turns out there’s a preprocessor operator called defined that we can use with an #if directive.
These are equivalent:
#ifdef FOO
#if defined FOO
#if defined(FOO) // Parentheses optional

As are these:
#ifndef FOO
#if !defined FOO
#if !defined(FOO) // Parentheses optional

Notice how we can use the standard logical NOT operator (!) for “not defined”.
So now we’re back in #if land and we can use #elif with impunity!
This broken code:
#ifdef FOO
x = 2;
#elifdef BAR // ERROR: Not supported before C23
x = 3;
#endif

can be replaced with:


#if defined FOO
x = 2;
#elif defined BAR
x = 3;
#endif
[96] You can’t always just wrap the code in /* */ comments because those won’t nest.

Losing a Macro: #undef


If you’ve defined something but you don’t need it any longer, you can undefine it with #undef.
#include <stdio.h>

int main(void)
{
#define GOATS

#ifdef GOATS
    printf("Goats detected!\n"); // prints
#endif

#undef GOATS // Make GOATS no longer defined

#ifdef GOATS
    printf("Goats detected, again!\n"); // doesn't print
#endif
}

Built-in Macros
The standard defines a lot of built-in macros that you can test and use for conditional compilation. Let’s look
at those here.

Mandatory Macros
These are all defined:

Macro               Description
__DATE__            The date of compilation (that is, when you’re compiling this file) in Mmm dd yyyy format
__TIME__            The time of compilation in hh:mm:ss format
__FILE__            A string containing this file’s name
__LINE__            The line number of the file this macro appears on
__func__            The name of the function this appears in, as a string[97]
__STDC__            Defined with 1 if this is a standard C compiler
__STDC_HOSTED__     This will be 1 if the compiler is a hosted implementation[98], otherwise 0
__STDC_VERSION__    This version of C, a constant long int in the form yyyymmL, e.g. 201710L

[97] This isn’t really a macro; it’s technically an identifier. But it’s the only predefined identifier and it feels very macro-like, so I’m including it here. Like a rebel.
[98] A hosted implementation basically means you’re running the full C standard, probably on an operating system of some kind. Which you probably are. If you’re running on bare metal in some kind of embedded system, you’re probably on a freestanding implementation.

Let’s put these together.

1 #include <stdio.h>
2
3 int main(void)
4 {
5     printf("This function: %s\n", __func__);
6     printf("This file: %s\n", __FILE__);
7     printf("This line: %d\n", __LINE__);
8     printf("Compiled on: %s %s\n", __DATE__, __TIME__);
9     printf("C Version: %ld\n", __STDC_VERSION__);
10 }

The output on my system is:


This function: main
This file: foo.c
This line: 7
Compiled on: Nov 23 2020 17:16:27
C Version: 201710

__FILE__, __func__ and __LINE__ are particularly useful to report error conditions in messages to developers.
The assert() macro in <assert.h> uses these to call out where in the code the assertion failed.

Optional Macros
Your implementation might define these, as well. Or it might not.

Macro                       Description
__STDC_ISO_10646__          If defined, wchar_t holds Unicode values, otherwise something else
__STDC_MB_MIGHT_NEQ_WC__    A 1 indicates that the values in multibyte characters might not map equally to values in wide characters
__STDC_UTF_16__             A 1 indicates that the system uses UTF-16 encoding in type char16_t
__STDC_UTF_32__             A 1 indicates that the system uses UTF-32 encoding in type char32_t
__STDC_ANALYZABLE__         A 1 indicates the code is analyzable[99]
__STDC_IEC_559__            1 if IEEE-754 (aka IEC 60559) floating point is supported
__STDC_IEC_559_COMPLEX__    1 if IEC 60559 complex floating point is supported
__STDC_LIB_EXT1__           1 if this implementation supports a variety of “safe” alternate standard library functions (they have _s suffixes on the name)
__STDC_NO_ATOMICS__         1 if this implementation does not support _Atomic or <stdatomic.h>
__STDC_NO_COMPLEX__         1 if this implementation does not support complex types or <complex.h>
__STDC_NO_THREADS__         1 if this implementation does not support <threads.h>
__STDC_NO_VLA__             1 if this implementation does not support variable-length arrays

Macros with Arguments


Macros are more powerful than simple substitution, though. You can set them up to take arguments that are
substituted in, as well.
[99] OK, I know that was a cop-out answer. Basically there’s an optional extension compilers can implement wherein they agree to limit certain types of undefined behavior so that the C code is more amenable to static code analysis. It is unlikely you’ll need to use this.

A question often arises for when to use parameterized macros versus functions. Short answer: use functions.
But you’ll see lots of macros in the wild and in the standard library. People tend to use them for short, mathy
things, and also for features that might change from platform to platform. You can define different keywords
for one platform or another.

Macros with One Argument


Let’s start with a simple one that squares a number:
1 #include <stdio.h>
2

3 #define SQR(x) x * x // Not quite right, but bear with me


4

5 int main(void)
6 {
7 printf("%d\n", SQR(12)); // 144
8 }

What that’s saying is “everywhere you see SQR with some value, replace it with that value times itself”.
So line 7 will be changed to:
7 printf("%d\n", 12 * 12); // 144

which C comfortably converts to 144.


But we’ve made an elementary error in that macro, one that we need to avoid.
Let’s check it out. What if we wanted to compute SQR(3 + 4)? Well, 3 + 4 = 7, so we must want to
compute $7^2 = 49$. That’s it; 49, final answer.
Let’s drop it in our code and see that we get… 19?
7 printf("%d\n", SQR(3 + 4)); // 19!!??

What happened?
If we follow the macro expansion, we get
7 printf("%d\n", 3 + 4 * 3 + 4); // 19!

Oops! Since multiplication takes precedence, we do the 4 × 3 = 12 first, and get 3 + 12 + 4 = 19. Not
what we were after.
So we have to fix this to make it right.
This is so common that you should automatically do it every time you make a parameterized math
macro!
The fix is easy: just add some parentheses!
3 #define SQR(x) (x) * (x) // Better... but still not quite good enough!

And now our macro expands to:


7 printf("%d\n", (3 + 4) * (3 + 4)); // 49! Woo hoo!

But we actually still have the same problem, which might manifest if there’s an operator nearby with
precedence equal to or higher than multiply (*), such as division.
So the safe, proper way to put the macro together is to wrap the whole thing in additional parentheses, like
so:
3 #define SQR(x) ((x) * (x)) // Good!

Just make it a habit to do that when you make a math macro and you can’t go wrong.

Macros with More than One Argument


You can stack these things up as much as you want:
#define TRIANGLE_AREA(w, h) (0.5 * (w) * (h))

Let’s do some macros that solve for $x$ using the quadratic formula. Just in case you don’t have it on the top
of your head, it says for equations of the form:
$$ax^2 + bx + c = 0$$
you can solve for $x$ with the quadratic formula:
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
Which is crazy. Also notice the plus-or-minus (±) in there, indicating that there are actually two solutions.
So let’s make macros for both:
#define QUADP(a, b, c) ((-(b) + sqrt((b) * (b) - 4 * (a) * (c))) / (2 * (a)))
#define QUADM(a, b, c) ((-(b) - sqrt((b) * (b) - 4 * (a) * (c))) / (2 * (a)))

So that gets us some math. But let’s define one more that we can use as arguments to printf() to print both
answers.
//      macro              replacement
//      |-----------| |----------------------------|
#define QUAD(a, b, c) QUADP(a, b, c), QUADM(a, b, c)

That’s just a couple values separated by a comma—and we can use that as a “combined” argument of sorts
to printf() like this:
printf("x = %f or x = %f\n", QUAD(2, 10, 5));

Let’s put it together into some code:


#include <stdio.h>
#include <math.h> // For sqrt()

#define QUADP(a, b, c) ((-(b) + sqrt((b) * (b) - 4 * (a) * (c))) / (2 * (a)))
#define QUADM(a, b, c) ((-(b) - sqrt((b) * (b) - 4 * (a) * (c))) / (2 * (a)))
#define QUAD(a, b, c) QUADP(a, b, c), QUADM(a, b, c)

int main(void)
{
    printf("2*x^2 + 10*x + 5 = 0\n");
    printf("x = %f or x = %f\n", QUAD(2, 10, 5));
}

And this gives us the output:


2*x^2 + 10*x + 5 = 0
x = -0.563508 or x = -4.436492

Plugging in either of those values gives us roughly zero (a bit off because the numbers aren’t exact):
$$2 \times (-0.563508)^2 + 10 \times (-0.563508) + 5 \approx 0.000003$$

Macros with Variable Arguments


There’s also a way to have a variable number of arguments passed to a macro, using ellipses (...) after
the known, named arguments. When the macro is expanded, all of the extra arguments will be in a comma-
separated list in the __VA_ARGS__ macro, and can be replaced from there:
1 #include <stdio.h>
2

3 // Combine the first two arguments to a single number,


4 // then have a commalist of the rest of them:
5

6 #define X(a, b, ...) (10*(a) + 20*(b)), __VA_ARGS__


7

8 int main(void)
9 {
10 printf("%d %f %s %d\n", X(5, 4, 3.14, "Hi!", 12));
11 }

The substitution that takes place on line 10 would be:


10 printf("%d %f %s %d\n", (10*(5) + 20*(4)), 3.14, "Hi!", 12);

for output:
130 3.140000 Hi! 12

You can also “stringify” __VA_ARGS__ by putting a # in front of it:


#define X(...) #__VA_ARGS__

printf("%s\n", X(1, 2, 3)); // Prints "1, 2, 3"

Stringification
As mentioned just above, you can turn any argument into a string by preceding it with a # in the
replacement text.
For example, we could print anything as a string with this macro and printf():
#define STR(x) #x

printf("%s\n", STR(3.14159));

In that case, the substitution leads to:


printf("%s\n", "3.14159");

Let’s see if we can use this to greater effect so that we can pass any int variable name into a macro, and
have it print out its name and value.
1 #include <stdio.h>
2

3 #define PRINT_INT_VAL(x) printf("%s = %d\n", #x, x)


4

5 int main(void)
6 {
7 int a = 5;
8

9 PRINT_INT_VAL(a); // prints "a = 5"


10 }

On line 9, we get the following macro replacement:


9 printf("%s = %d\n", "a", a);

Concatenation
We can concatenate two arguments together with ##, as well. Fun times!
#define CAT(a, b) a ## b

printf("%f\n", CAT(3.14, 1592)); // 3.141592

Multiline Macros
It’s possible to continue a macro to multiple lines if you escape the newline with a backslash (\).
Let’s write a multiline macro that prints numbers from 0 up to, but not including, the product of the two arguments passed in.
#include <stdio.h>

#define PRINT_NUMS_TO_PRODUCT(a, b) { \
    int product = (a) * (b); \
    for (int i = 0; i < product; i++) { \
        printf("%d\n", i); \
    } \
}

int main(void)
{
    PRINT_NUMS_TO_PRODUCT(2, 4); // Outputs numbers from 0 to 7
}

A couple things to note there:


• Escapes at the end of every line except the last one to indicate that the macro continues.
• Though not strictly necessary, I wrapped the whole thing in curly braces. This did two things:
1. Made it look nice.
2. Made a new block scope for my product variable so it wouldn’t conflict with any other existing
variables at the outer block scope.

The #error Directive


This directive causes the compiler to error out as soon as it sees it.
Commonly, this is used inside a conditional to prevent compilation unless some prerequisites are met:
#ifndef __STDC_IEC_559__
#error I really need IEEE-754 floating point to compile. Sorry!
#endif

Some compilers have a non-standard complementary #warning directive that will output a warning but not
stop compilation, but this is not in the C11 spec.

The #pragma Directive


This is one funky directive, short for “pragmatic”. You can use it to do… well, anything your compiler
supports you doing with it.

Basically the only time you’re going to add this to your code is if some documentation tells you to do so.

Non-Standard Pragmas
Here’s one non-standard example of using #pragma to cause the compiler to execute a for loop in parallel
with multiple threads (if the compiler supports the OpenMP[100] extension):
#pragma omp parallel for
for (int i = 0; i < 10; i++) { ... }

There are all kinds of #pragma directives documented across all four corners of the globe.
All unrecognized #pragmas are ignored by the compiler.

Standard Pragmas
There are also a few standard ones, and these start with STDC, and follow the same form:
#pragma STDC pragma_name on-off

The on-off portion can be either ON, OFF, or DEFAULT.


And the pragma_name can be one of these:

Pragma Name         Description
FP_CONTRACT         Allow floating point expressions to be contracted into a single operation to avoid rounding errors that might occur from multiple operations.
FENV_ACCESS         Set to ON if you plan to access the floating point status flags. If OFF, the compiler might perform optimizations that cause the values in the flags to be inconsistent or invalid.
CX_LIMITED_RANGE    Set to ON to allow the compiler to skip overflow checks when performing complex arithmetic. Defaults to OFF.

For example:
#pragma STDC FP_CONTRACT OFF
#pragma STDC CX_LIMITED_RANGE ON

As for CX_LIMITED_RANGE, the spec points out:


The purpose of the pragma is to allow the implementation to use the formulas:
$$(x + iy) \times (u + iv) = (xu - yv) + i(yu + xv)$$
$$(x + iy) / (u + iv) = [(xu + yv) + i(yu - xv)] / (u^2 + v^2)$$
$$|x + iy| = \sqrt{x^2 + y^2}$$
where the programmer can determine they are safe.

_Pragma Operator
This is another way to declare a pragma that you could use in a macro.
[100] https://www.openmp.org/

These are equivalent:

#pragma "Unnecessary" quotes


_Pragma("\"Unnecessary\" quotes")

This can be used in a macro, if need be:


#define PRAGMA(x) _Pragma(#x)

The #line Directive


This allows you to override the values for __LINE__ and __FILE__. If you want.
I’ve never wanted to do this, but in K&R2, they write:
For the benefit of other preprocessors that generate C programs […]
So maybe there’s that.
To override the line number to, say, 300:
#line 300

and __LINE__ will keep counting up from there.


To override the line number and the filename:
#line 300 "newfilename"

The Null Directive


A # on a line by itself is ignored by the preprocessor. Now, to be entirely honest, I don’t know what the use
case is for this.
I’ve seen examples like this:
#ifdef FOO
#
#else
printf("Something");
#endif

which is just cosmetic; the line with the solitary # can be deleted with no ill effect.
Or maybe for cosmetic consistency, like this:
#
#ifdef FOO
x = 2;
#endif
#
#if BAR == 17
x = 12;
#endif
#

But, with respect to cosmetics, that’s just ugly.


Another post mentions elimination of comments—that in GCC, a comment after a # will not be seen by the
compiler. Which I don’t doubt, but the specification doesn’t seem to say this is standard behavior.
My searches for rationale aren’t bearing much fruit. So I’m going to just say this is some good ol’ fashioned
C esoterica.
structs II: More Fun with structs

Turns out there’s a lot more you can do with structs than we’ve talked about, but it’s just a big pile of
miscellaneous things. So we’ll throw them in this chapter.
If you’re good with struct basics, you can round out your knowledge here.

Anonymous structs
These are “the struct with no name”. We also mention these in the typedef section, but we’ll refresh here.
Here’s a regular struct:
struct animal {
char *name;
int leg_count, speed;
};

And here’s the anonymous equivalent:


struct { // <-- No name!
char *name;
int leg_count, speed;
};

Okaaaaay. So we have a struct, but it has no name, so we have no way of using it later? Seems pretty
pointless.
Admittedly, in that example, it is. But we can still make use of it a couple ways.
One is rare, but since the anonymous struct represents a type, we can just put some variable names after it
and use them.
struct { // <-- No name!
char *name;
int leg_count, speed;
} a, b, c; // 3 variables of this struct type

a.name = "antelope";
c.leg_count = 4; // for example

But that’s still not that useful.


Far more common is use of anonymous structs with a typedef so that we can use it later (e.g. to pass
variables to functions).
typedef struct { // <-- No name!
    char *name;
    int leg_count, speed;
} animal; // New type: animal

animal a, b, c;

a.name = "antelope";
c.leg_count = 4; // for example

Personally, I don’t use many anonymous structs. I think it’s more pleasant to see the entire struct animal
before the variable name in a declaration.
But that’s just, like, my opinion, man.

Self-Referential structs
For any graph-like data structure, it’s useful to be able to have pointers to the connected nodes/vertices. But
this means that in the definition of a node, you need to have a pointer to a node. It’s chicken and eggy!
But it turns out you can do this in C with no problem whatsoever.
For example, here’s a linked list node:
struct node {
int data;
struct node *next;
};

It’s important to note that next is a pointer. This is what allows the whole thing to even build. Even though
the compiler doesn’t know what the entire struct node looks like yet, it knows how big a pointer to one is, because all pointers to structs are the same size.
Here’s a cheesy linked list program to test it out:
#include <stdio.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *next;
};

int main(void)
{
    struct node *head;

    // Hackishly set up a linked list (11)->(22)->(33)
    head = malloc(sizeof(struct node));
    head->data = 11;
    head->next = malloc(sizeof(struct node));
    head->next->data = 22;
    head->next->next = malloc(sizeof(struct node));
    head->next->next->data = 33;
    head->next->next->next = NULL;

    // Traverse it
    for (struct node *cur = head; cur != NULL; cur = cur->next) {
        printf("%d\n", cur->data);
    }
}

Running that prints:


11
22
33

Flexible Array Members


Back in the good old days, when people carved C code out of wood, some folks thought it would be neat if
they could allocate structs that had variable length arrays at the end of them.
I want to be clear that the first part of the section is the old way of doing things, and we’re going to do things
the new way after that.
For example, maybe you could define a struct for holding strings and the length of that string. It would
have a length and an array to hold the data. Maybe something like this:
struct len_string {
int length;
char data[8];
};

But that has 8 hardcoded as the maximum length of a string, and that’s not much. What if we did something
clever and just malloc()d some extra space at the end after the struct, and then let the data overflow into
that space?
Let’s do that, and then allocate another 40 bytes on top of it:
struct len_string *s = malloc(sizeof *s + 40);

Because data is the last field of the struct, if we overflow that field, it runs out into space that we already
allocated! For this reason, this trick only works if the short array is the last field in the struct.
// Copy more than 8 bytes!

strcpy(s->data, "Hello, world!"); // Won't crash. Probably.

In fact, there was a common compiler workaround for doing this, where you’d allocate a zero length array at
the end:
struct len_string {
int length;
char data[0];
};

And then every extra byte you allocated was ready for use in that string.
Just as before, since data is the last field of the struct, anything we copy into it lands in the extra space
we allocated:
// Copy into the extra space after the struct

strcpy(s->data, "Hello, world!"); // Won't crash. Probably.

But, of course, actually accessing the data beyond the end of that array is undefined behavior! In these
modern times, we no longer deign to resort to such savagery.
Luckily for us, we can still get the same effect in modern C (flexible array members were added in C99), but now it’s legal.
Let’s just change our above definition to have no size for the array[101]:
struct len_string {
    int length;
    char data[];
};
[101] Technically we say that it has an incomplete type.

Again, this only works if the flexible array member is the last field in the struct.
And then we can allocate all the space we want for those strings by malloc()ing larger than the struct
len_string, as we do in this example that makes a new struct len_string from a C string:

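#include <stdlib.h> // For malloc()
#include <string.h> // For strlen(), memcpy()

struct len_string *len_string_from_c_string(char *s)
{
    int len = strlen(s);

    // Allocate "len" more bytes than we'd normally need
    struct len_string *ls = malloc(sizeof *ls + len);

    if (ls == NULL) // Out of memory
        return NULL;

    ls->length = len;

    // Copy the string into those extra bytes (without the NUL terminator)
    memcpy(ls->data, s, len);

    return ls;
}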

Padding Bytes
Beware that C is allowed to add padding bytes within or after a struct as it sees fit. You can’t trust that the
fields will be directly adjacent in memory[102].
Let’s take a look at this program. We output two numbers. One is the sum of the sizeofs of the individual
field types. The other is the sizeof the entire struct.
One would expect them to be the same. The size of the total is the size of the sum of its parts, right?
#include <stdio.h>

struct foo {
    int a;
    char b;
    int c;
    char d;
};

int main(void)
{
    printf("%zu\n", sizeof(int) + sizeof(char) + sizeof(int) + sizeof(char));
    printf("%zu\n", sizeof(struct foo));
}

But on my system, this outputs:


10
16
[102] Though some compilers have options to force this to occur; search for __attribute__((packed)) to see how to do this with GCC.

They’re not the same! The compiler has added 6 bytes of padding to help it be more performant. Maybe you
got different output with your compiler, but unless you’re forcing it, you can’t be sure there’s no padding.

offsetof
In the previous section, we saw that the compiler could inject padding bytes at will inside a structure.
What if we needed to know where those were? We can measure it with offsetof, defined in <stddef.h>.
Let’s modify the code from above to print the offsets of the individual fields in the struct:
#include <stdio.h>
#include <stddef.h>

struct foo {
    int a;
    char b;
    int c;
    char d;
};

int main(void)
{
    printf("%zu\n", offsetof(struct foo, a));
    printf("%zu\n", offsetof(struct foo, b));
    printf("%zu\n", offsetof(struct foo, c));
    printf("%zu\n", offsetof(struct foo, d));
}

For me, this outputs:


0
4
8
12

indicating that we’re using 4 bytes for each of the fields. It’s a little weird, because char is only 1 byte, right?
The compiler is putting 3 padding bytes after each char so that every int starts on a 4-byte boundary and the
struct’s total size is a multiple of 4. Presumably this will run faster on my CPU.

Bit-Fields
In my experience, these are rarely used, but you might see them out there from time to time, especially in
lower-level applications that pack bits together into larger spaces.
Let’s take a look at some code to demonstrate a use case:
#include <stdio.h>

struct foo {
    unsigned int a;
    unsigned int b;
    unsigned int c;
    unsigned int d;
};

int main(void)
{
    printf("%zu\n", sizeof(struct foo));
}

For me, this prints 16. Which makes sense, since unsigneds are 4 bytes on my system.
But what if we knew that all the values that were going to be stored in a and b could be stored in 5 bits, and
the values in c and d could be stored in 3 bits? That’s only a total of 16 bits. Why have 128 bits reserved for
them if we’re only going to use 16?
Well, we can tell C to pretty-please try to pack these values in. We can specify the maximum number of bits
that values can take (from 1 up to the size of the containing type).
We do this by putting a colon after the field name, followed by the field width in bits.
struct foo {
    unsigned int a:5;
    unsigned int b:5;
    unsigned int c:3;
    unsigned int d:3;
};

Now when I ask C how big my struct foo is, it tells me 4! It was 16 bytes, but now it’s only 4. It has
“packed” those 4 values down into 4 bytes, which is a four-fold memory savings.
The tradeoff is, of course, that the 5-bit fields can only hold values from 0-31 and the 3-bit fields can only
hold values from 0-7. But life’s all about compromise, after all.

Non-Adjacent Bit-Fields
A gotcha: C will only combine adjacent bit-fields. If they’re interrupted by non-bit-fields, you get no
savings:
struct foo { // sizeof(struct foo) == 16 (for me)
unsigned int a:1; // since a is not adjacent to c.
unsigned int b;
unsigned int c:1;
unsigned int d;
};

A quick rearrangement yields some space savings from 16 bytes down to 12 bytes (on my system):
struct foo { // sizeof(struct foo) == 12 (for me)
unsigned int a:1;
unsigned int c:1;
unsigned int b;
unsigned int d;
};

Put all your bit-fields together to get the compiler to combine them.

Signed or Unsigned ints


If you just declare a bit-field to be int, different compilers may treat it as either signed or unsigned. Just like
the situation with char.
Be specific about the signedness when using bit-fields.

Unnamed Bit-Fields
In some specific circumstances, you might need to reserve some bits for hardware reasons, but not need to
use them in code.
For example, let’s say you have a byte where the top 2 bits have a meaning, the bottom 1 bit has a meaning,
but the middle 5 bits do not get used by you[103].
We could do something like this:
struct foo {
unsigned char a:2;
unsigned char dummy:5;
unsigned char b:1;
};

And that works—in our code we use a and b, but never dummy. It’s just there to eat up 5 bits to make sure a
and b are in the “required” (by this contrived example) positions within the byte.
C allows us a way to clean this up: unnamed bit-fields. You can just leave the name (dummy) out in this case,
and C is perfectly happy for the same effect:
struct foo {
unsigned char a:2;
unsigned char :5; // <-- unnamed bit-field!
unsigned char b:1;
};

Zero-Width Unnamed Bit-Fields


Some more esoterica out here… Let’s say you were packing bits into an unsigned int, and you needed
some adjacent bit-fields to pack into the next unsigned int.
That is, if you do this:
struct foo {
unsigned int a:1;
unsigned int b:2;
unsigned int c:3;
unsigned int d:4;
};

the compiler packs all those into a single unsigned int. But what if you needed a and b in one int, and c
and d in a different one?
There’s a solution for that: put an unnamed bit-field of width 0 where you want the compiler to start anew
with packing bits in a different int:
struct foo {
unsigned int a:1;
unsigned int b:2;
unsigned int :0; // <--Zero-width unnamed bit-field
unsigned int c:3;
unsigned int d:4;
};

It’s analogous to an explicit page break in a word processor. You’re telling the compiler, “Stop packing bits
in this unsigned, and start packing them in the next one.”
[103] Assuming 8-bit chars, i.e. CHAR_BIT == 8.

Unions
These are basically just like structs, except the fields overlap in memory. The union will be only large
enough for the largest field, and you can only use one field at a time.
It’s a way to reuse the same memory space for different types of data.
You declare them just like structs, except it’s union. Take a look at this:
union foo {
int a, b, c, d, e, f;
float g, h;
char i, j, k, l;
};

Now, that’s a lot of fields. If this were a struct, my system would tell me it took 36 bytes to hold it all.
But it’s a union, so all those fields overlap in the same stretch of memory. The biggest one is int (or float),
taking up 4 bytes on my system. And, indeed, if I ask for the sizeof the union foo, it tells me 4!
The tradeoff is that you can only portably use one of those fields at a time. If you try to read from a field that
was not the last one written to, the behavior is unspecified.
Let’s take that crazy union and first store an int in it, then a float. Then we’ll print out the int again to
see what’s in there—even though, since it wasn’t the last value we stored, the result is unspecified.
#include <stdio.h>

union foo {
    int a, b, c, d, e, f;
    float g, h;
    char i, j, k, l;
};

int main(void)
{
    union foo x;

    x.a = 12;
    printf("%d\n", x.a); // OK--x.a was the last thing we stored into

    x.g = 3.141592;
    printf("%f\n", x.g); // OK--x.g was the last thing we stored into

    printf("%d\n", x.a); // Unspecified behavior!
}

On my machine, this prints:


12
3.141592
1078530008

Deep down, the decimal value 1078530008 is probably the same pattern of bits as 3.141592, but
the spec makes no guarantees about this.

Pointers to unions
If you have a pointer to a union, you can cast that pointer to any of the types of the fields in that union and
get the values out that way.

In this example, we see that the union has ints and floats in it. And we get pointers to the union, but
we cast them to int* and float* types (the cast silences compiler warnings). And then if we dereference
those, we see that they have the values we stored directly in the union.
#include <stdio.h>

union foo {
    int a, b, c, d, e, f;
    float g, h;
    char i, j, k, l;
};

int main(void)
{
    union foo x;

    int *foo_int_p = (int *)&x;
    float *foo_float_p = (float *)&x;

    x.a = 12;
    printf("%d\n", x.a); // 12
    printf("%d\n", *foo_int_p); // 12, again

    x.g = 3.141592;
    printf("%f\n", x.g); // 3.141592
    printf("%f\n", *foo_float_p); // 3.141592, again
}

The reverse is also true. If we have a pointer to a type inside the union, we can cast that to a pointer to the
union and access its members.

union foo x;
int *foo_int_p = (int *)&x; // Pointer to int field
union foo *p = (union foo *)foo_int_p; // Back to pointer to union

p->a = 12; // This line the same as...


x.a = 12; // this one.

All this just lets you know that, under the hood, all these values in a union start at the same place in memory,
and that’s the same as where the entire union is.
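To see that for yourself, you can compare the addresses directly. This is just a sketch with a cut-down union and a made-up function name:

```c
union foo {
    int a;
    float g;
    char i;
};

// Returns 1 if every member starts at the same address as the union itself
int members_share_address(void)
{
    union foo x;

    return (void *)&x == (void *)&x.a
        && (void *)&x == (void *)&x.g
        && (void *)&x == (void *)&x.i;
}
```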
Characters and Strings II

We’ve talked about how char types are actually just small integer types… and the same goes for a character
in single quotes.
But a string in double quotes is best treated as type const char *. (Strictly speaking, it’s an array of char
that you’re not allowed to modify.)
Turns out there are a few more types of strings and characters, and it leads down one of the most infamous
rabbit holes in the language: the whole multibyte/wide/Unicode/localization thingy.
We’re going to peer into that rabbit hole, but not go in. …Yet!

Escape Sequences
We’re used to strings and characters with regular letters, punctuation, and numbers:
char *s = "Hello!";
char t = 'c';

But what if we want some special characters in there that we can’t type on the keyboard because they don’t
exist (e.g. “€”), or even if we want a character that’s a single quote? We clearly can’t do this:
char t = ''';

To do these things, we use something called escape sequences. These are the backslash character (\) followed
by another character. The two (or more) characters together have special meaning.
For our single quote character example, we can put an escape (that is, \) in front of the central single quote
to solve it:
char t = '\'';

Now C knows that \' means just a regular quote we want to print, not the end of the character sequence.
You can say either “backslash” or “escape” in this context (“escape that quote”) and C devs will know what
you’re talking about. Also, “escape” in this context is different than your Esc key or the ASCII ESC code.

Frequently-used Escapes
In my humble opinion, these escape characters make up 99.2%[104] of all escapes.

Code Description
\n Newline character—when printing, continue subsequent output on the next line
\' Single quote—used for a single quote character constant
\" Double quote—used for a double quote in a string literal
\\ Backslash—used for a literal \ in a string or character
[104] I just made up that number, but it’s probably not far off.


Here are some examples of the escapes and what they output when printed.
printf("Use \\n for newline\n"); // Use \n for newline
printf("Say \"hello\"!\n"); // Say "hello"!
printf("%c\n", '\''); // '

Rarely-used Escapes
But there are more escapes! You just don’t see these as often.

Code Description
\a Alert. This makes the terminal make a sound or flash, or both!
\b Backspace. Moves the cursor back a character. Doesn’t delete the character.
\f Formfeed. This moves to the next “page”, but that doesn’t have much modern meaning.
On my system, this behaves like \v.
\r Return. Move to the beginning of the same line.
\t Horizontal tab. Moves to the next horizontal tab stop. On my machine, this lines up on
columns that are multiples of 8, but YMMV.
\v Vertical tab. Moves to the next vertical tab stop. On my machine, this moves to the same
column on the next line.
\? Literal question mark. Sometimes you need this to avoid trigraphs, as shown below.

Single Line Status Updates


A use case for \b or \r is to show status updates that appear on the same line on the screen and don’t cause
the display to scroll. Here’s an example that does a countdown from 10. (Note this makes use of the non-
standard POSIX function sleep() from <unistd.h>—if you’re not on a Unix-like, search for your platform
and sleep for the equivalent.)
 1  #include <stdio.h>
 2  #include <unistd.h>  // Non-standard Unix-likes only for sleep()
 3
 4  int main(void)
 5  {
 6      for (int i = 10; i >= 0; i--) {
 7          printf("\rT minus %d second%s... \b", i, i != 1? "s": "");
 8
 9          fflush(stdout);  // Force output to update
10
11          sleep(1);  // Delay 1 second
12      }
13
14      printf("\rLiftoff!             \n");
15  }

Quite a few things are happening on line 7. First of all, we lead with a \r to get us to the beginning of the
current line, then we overwrite whatever’s there with the current countdown. (There’s a ternary operator in
there to make sure we print 1 second instead of 1 seconds.)
Also, there’s a space after the ... That’s so that we properly overwrite the last . when i drops from 10 to
9 and we get a column narrower. Try it without the space to see what I mean.

And we wrap it up with a \b to back up over that space so the cursor sits at the exact end of the line in an
aesthetically-pleasing way.

Note that line 14 also has a lot of spaces at the end to overwrite the characters that were already there from
the countdown.
Finally, we have a weird fflush(stdout) in there, whatever that means. Short answer is that stdout is
usually line buffered when it’s connected to a terminal, meaning nothing actually gets displayed until a
newline character is encountered. Since we don’t have a newline (we just have \r), without this line, the
program would just sit there until Liftoff! and then print everything all in one instant. fflush() overrides
this behavior and forces output to happen right now.
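If calling fflush() after every printf() gets tedious, another option (not the one the countdown uses) is to turn off buffering on stdout altogether with the standard setvbuf() function. A minimal sketch, with a made-up wrapper name:

```c
#include <stdio.h>

// Turn off buffering on stdout. Returns 0 on success.
// This should be called before the first output on the stream.
int make_stdout_unbuffered(void)
{
    return setvbuf(stdout, NULL, _IONBF, 0);
}
```

The tradeoff is that unbuffered output can be slower when you print a lot, so a targeted fflush() is often the gentler choice.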

The Question Mark Escape


Why bother with this? After all, this works just fine:
printf("Doesn't it?\n");

And it works fine with the escape, too:


printf("Doesn't it\?\n"); // Note \?

So what’s the point??!


Let’s get more emphatic with another question mark and an exclamation point:
printf("Doesn't it??!\n");

When I compile this, I get this warning:


foo.c: In function ‘main’:
foo.c:5:23: warning: trigraph ??! converted to | [-Wtrigraphs]
5 | printf("Doesn't it??!\n");
|

And running it gives this unlikely result:


Doesn't it|

So trigraphs? What the heck is this??!


I’m sure we’ll revisit this dusty corner of the language later, but the short of it is the compiler looks for certain
triplets of characters starting with ?? and it substitutes other characters in their place. So if you’re on some
ancient terminal without a pipe symbol (|) on the keyboard, you can type ??! instead.
You can fix this by escaping the second question mark, like so:
printf("Doesn't it?\?!\n");

And then it compiles and works as-expected.


These days, of course, no one ever uses trigraphs. But that whole ??! does sometimes appear if you decide
to use it in a string for emphasis.
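Another way around the problem, besides the escape, is to split the string so the two question marks aren’t next to each other in the source. Trigraphs get replaced very early in compilation, long before neighboring string literals get glued together. Here’s a sketch (the function name is just for illustration):

```c
// Adjacent string literals are joined by the compiler, but only after
// trigraph replacement has already happened, so no ??! is ever seen.
const char *emphatic(void)
{
    return "Doesn't it?" "?!\n";
}
```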

Numeric Escapes
In addition, there are ways to specify numeric constants or other character values inside strings or character
constants.
If you know an octal or hexadecimal representation of a byte, you can include that in a string or character
constant.
The following table has example numbers, but any hex or octal numbers may be used. Pad with leading zeros
if necessary to reach the proper digit count.

Code         Description
\123         Embed the byte with octal value 123, 1 to 3 digits.
\x4D         Embed the byte with hex value 4D. The escape keeps reading hex digits for as long as it
             finds them.
\u2620       Embed the Unicode character at code point with hex value 2620, 4 digits.
\U0001243F   Embed the Unicode character at code point with hex value 1243F, 8 digits.

Here’s an example of the less-commonly used octal notation to represent the letter B in between A and C.
Normally this would be used for some kind of special unprintable character, but we have other ways to do
that, below, and this is just an octal demo:
printf("A\102C\n"); // 102 is `B` in ASCII/UTF-8

Note there’s no leading zero on the octal number when you include it this way. An octal escape can be one to
three digits long, but it’s safest to pad it to three with leading zeros so a digit that follows isn’t
swallowed by mistake.
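The digit counts matter because the compiler is greedy: as far as I know, an octal escape stops after at most three octal digits, while a hex escape keeps going for as long as it sees hex digits. Here’s a small sketch (function names made up) showing both cases:

```c
#include <string.h>

// "\1234" is the octal escape \123 (the letter S in ASCII) followed by
// the ordinary character '4', so the string holds two characters.
size_t octal_demo_length(void)
{
    return strlen("\1234");
}

// A hex escape keeps consuming hex digits, so "\x4DAD" would be read as
// one value that's too big for a char. Splitting the literal ends the
// escape after 4D.
size_t hex_demo_length(void)
{
    return strlen("\x4D" "AD");  // 'M', 'A', 'D' in ASCII
}
```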
But far more common is to use hex constants these days. Here’s a demo that you shouldn’t use, but it demos
embedding the UTF-8 bytes 0xE2, 0x80, and 0xA2 in a string, which corresponds to the Unicode “bullet”
character (•).
printf("\xE2\x80\xA2 Bullet 1\n");
printf("\xE2\x80\xA2 Bullet 2\n");
printf("\xE2\x80\xA2 Bullet 3\n");

Produces the following output if you’re on a UTF-8 console (or probably garbage if you’re not):
• Bullet 1
• Bullet 2
• Bullet 3

But that’s a crummy way to do Unicode. You can use the escapes \u (16-bit) or \U (32-bit) to just refer to
Unicode by code point number. The bullet is 2022 (hex) in Unicode, so you can do this and get more portable
results:
printf("\u2022 Bullet 1\n");
printf("\u2022 Bullet 2\n");
printf("\u2022 Bullet 3\n");

Be sure to pad \u with enough leading zeros to get to four characters, and \U with enough to get to eight.
For example, that bullet could be done with \U and four leading zeros:
printf("\U00002022 Bullet 1\n");

But who has time to be that verbose?


Enumerated Types: enum

C offers us another way to have constant integer values by name: enum.


For example:
enum {
ONE=1,
TWO=2
};

printf("%d %d", ONE, TWO); // 1 2

In some ways, it can be better—or different—than using a #define. Key differences:


• enums can only be integer types.
• #define can define anything at all.
• enums are often shown by their symbolic identifier name in a debugger.
• #defined numbers just show as raw numbers which are harder to know the meaning of while debug-
ging.
Since they’re integer types, they can be used any place integers can be used, including in array dimensions
and case statements.
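Here’s a quick sketch of both of those uses. The RESOURCE_COUNT entry and the function names are made up for this example; the count trick relies on the automatic numbering covered just below.

```c
enum resource {
    SHEEP,
    WHEAT,
    WOOD,
    RESOURCE_COUNT  // Hypothetical extra entry: ends up equal to 3
};

// Enum constant used as an array dimension
static const int trade_value[RESOURCE_COUNT] = {1, 2, 4};

// Look up a value in the table. Assumes r is a valid resource.
int resource_value(enum resource r)
{
    return trade_value[r];
}

// Enum constants used as case labels
const char *resource_name(enum resource r)
{
    switch (r) {
        case SHEEP: return "sheep";
        case WHEAT: return "wheat";
        case WOOD:  return "wood";
        default:    return "unknown";
    }
}
```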
Let’s tear into this more.

Behavior of enum
Numbering
enums are automatically numbered unless you override them.

They start at 0, and autoincrement up from there, by default:


enum {
SHEEP, // Value is 0
WHEAT, // Value is 1
WOOD, // Value is 2
BRICK, // Value is 3
ORE // Value is 4
};

printf("%d %d\n", SHEEP, BRICK); // 0 3

You can force particular integer values, as we saw earlier:


enum {
X=2,


Y=18,
Z=-2
};

Duplicates are not a problem:


enum {
X=2,
Y=2,
Z=2
};

If values are omitted, numbering continues counting in the positive direction from whichever value was last
specified. For example:
enum {
    A,    // 0, default starting value
    B,    // 1
    C=4,  // 4, manually set
    D,    // 5
    E,    // 6
    F=3,  // 3, manually set
    G,    // 4
    H     // 5
};

Trailing Commas
This is perfectly fine, if that’s your style:
enum {
X=2,
Y=18,
Z=-2, // <-- Trailing comma
};

Trailing commas have gotten more popular in languages of recent decades, so you might be pleased to see them here.

Scope
enums scope as you’d expect. If at file scope, the whole file can see it. If in a block, it’s local to that block.

It’s really common for enums to be defined in header files so they can be #included at file scope.

Style
As you’ve noticed, it’s common to declare the enum symbols in uppercase (with underscores).
This isn’t a requirement, but is a very, very common idiom.

Your enum is a Type


This is an important thing to know about enum: they’re a type, analogous to how a struct is a type.
You can give them a tag name so you can refer to the type later and declare variables of that type.
Now, since enums are integer types, why not just use int?

In C, the best reason for this is code clarity–it’s a nice, typed way to describe your thinking in code. C (unlike
C++) doesn’t actually enforce any values being in range for a particular enum.
Let’s do an example where we declare a variable r of type enum resource that can hold those values:
// Named enum, type is "enum resource"

enum resource {
SHEEP,
WHEAT,
WOOD,
BRICK,
ORE
};

// Declare a variable "r" of type "enum resource"

enum resource r = BRICK;

if (r == BRICK) {
printf("I'll trade you a brick for two sheep.\n");
}
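As mentioned, C won’t stop you from storing a number in an enum variable that doesn’t match any of its names. This sketch (function name made up) shows that it compiles and just holds the number:

```c
enum resource {
    SHEEP,
    WHEAT,
    WOOD,
    BRICK,
    ORE
};

// Stores a value that matches none of the enumerators. C accepts this
// without complaint; the variable simply holds the number.
int store_out_of_range(void)
{
    enum resource r = 99;  // Not SHEEP through ORE, but still compiles

    return (int)r;
}
```

So the type is there to document what you mean, not to police it.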

You can also typedef these, of course, though I personally don’t like to.
typedef enum {
SHEEP,
WHEAT,
WOOD,
BRICK,
ORE
} RESOURCE;

RESOURCE r = BRICK;

Another shortcut that’s legal but rare is to declare variables when you declare the enum:
// Declare an enum and some initialized variables of that type:

enum {
SHEEP,
WHEAT,
WOOD,
BRICK,
ORE
} r = BRICK, s = WOOD;

You can also give the enum a name so you can use it later, which is probably what you want to do in most
cases:
// Declare an enum and some initialized variables of that type:

enum resource { // <-- type is now "enum resource"


SHEEP,
WHEAT,
WOOD,
BRICK,
ORE
} r = BRICK, s = WOOD;

In short, enums are a great way to write nice, scoped, typed, clean code.
Pointers III: Pointers to Pointers and More

Here’s where we cover some intermediate and advanced pointer usage. If you don’t have pointers down well,
review the previous chapters on pointers and pointer arithmetic before starting on this stuff.

Pointers to Pointers
If you can have a pointer to a variable, and a variable can be a pointer, can you have a pointer to a variable
that is itself a pointer?
Yes! This is a pointer to a pointer, and it’s held in a variable of type pointer-pointer.
Before we tear into that, I want to try for a gut feel for how pointers to pointers work.
Remember that a pointer is just a number. It’s a number that represents an index in computer memory,
typically one that holds a value we’re interested in for some reason.
That pointer, which is a number, has to be stored somewhere. And that place is memory, just like everything
else.[105]
But because it’s stored in memory, it must have an index it’s stored at, right? The pointer must have an index
in memory where it is stored. And that index is a number. It’s the address of the pointer. It’s a pointer to the
pointer.
Let’s start with a regular pointer to an int, back from the earlier chapters:
1  #include <stdio.h>
2
3  int main(void)
4  {
5      int x = 3490;  // Type: int
6      int *p = &x;   // Type: pointer to an int
7
8      printf("%d\n", *p);  // 3490
9  }

Straightforward enough, right? We have two types represented: int and int*, and we set up p to point to x.
Then we can dereference p on line 8 and print out the value 3490.
But, like we said, we can have a pointer to any variable… so does that mean we can have a pointer to p?
In other words, what type is this expression?
[105] There’s some devil in the details with values that are stored in registers only, but we can safely ignore that for our purposes here.
Also the C spec makes no stance on these “register” things beyond the register keyword, the description for which doesn’t mention
registers.


int x = 3490; // Type: int


int *p = &x; // Type: pointer to an int

&p // <-- What type is the address of p? AKA a pointer to p?

If x is an int, then &x is a pointer to an int that we’ve stored in p which is type int*. Follow? (Repeat this
paragraph until you do!)
And therefore &p is a pointer to an int*, AKA a “pointer to a pointer to an int”. AKA “int-pointer-pointer”.
Got it? (Repeat the previous paragraph until you do!)
We write this type with two asterisks: int **. Let’s see it in action.
 1  #include <stdio.h>
 2
 3  int main(void)
 4  {
 5      int x = 3490;  // Type: int
 6      int *p = &x;   // Type: pointer to an int
 7      int **q = &p;  // Type: pointer to pointer to int
 8
 9      printf("%d %d\n", *p, **q);  // 3490 3490
10  }

Let’s make up some pretend addresses for the above values as examples and see what these three variables
might look like in memory. The address values below are just made up by me for example purposes:

Variable Stored at Address Value Stored There


x 28350 3490—the value from the code
p 29122 28350—the address of x!
q 30840 29122—the address of p!

Indeed, let’s try it for real on my computer[106] and print out the pointer values with %p and I’ll do the same
table again with actual references (printed in hex).

Variable Stored at Address Value Stored There


x 0x7ffd96a07b94 3490—the value from the code
p 0x7ffd96a07b98 0x7ffd96a07b94—the address of x!
q 0x7ffd96a07ba0 0x7ffd96a07b98—the address of p!

You can see those addresses are the same except the last byte, so just focus on those.
On my system, ints are 4 bytes, which is why we’re seeing the address go up by 4 from x to p[107] and then
go up by 8 from p to q. On my system, all pointers are 8 bytes.
Does it matter if it’s an int* or an int**? Is one more bytes than the other? Nope! Remember that all
pointers are addresses, that is indexes into memory. And on my machine you can represent an index with 8
bytes… doesn’t matter what’s stored at that index.
Now check out what we did there on line 9 of the previous example: we double dereferenced q to get back
to our 3490.
This is the important bit about pointers and pointers to pointers:
[106] You’re very likely to get different numbers on yours.
[107] There is absolutely nothing in the spec that says this will always work this way, but it happens to work this way on my system.

• You can get a pointer to anything with & (including to a pointer!)


• You can get the thing a pointer points to with * (including a pointer!)
So you can think of & as being used to make pointers, and * being the inverse—it goes the opposite direction
of &—to get to the thing pointed to.
In terms of type, each time you &, that adds another pointer level to the type.

If you have Then you run The result type is


int x &x int *
int *x &x int **
int **x &x int ***
int ***x &x int ****

And each time you use dereference (*), it does the opposite:

If you have Then you run The result type is


int ****x *x int ***
int ***x *x int **
int **x *x int *
int *x *x int

Note that you can use multiple *s in a row to quickly dereference, just like we saw in the example code with
**q, above. Each one strips away one level of indirection.

If you have Then you run The result type is


int ****x ***x int *
int ***x **x int *
int **x **x int

In general, &*E == E.[108] The dereference “undoes” the address-of.


But & doesn’t work the same way—you can only do those one at a time, and have to store the result in an
intermediate variable:
int x = 3490; // Type: int
int *p = &x; // Type: int *, pointer to an int
int **q = &p; // Type: int **, pointer to pointer to int
int ***r = &q; // Type: int ***, pointer to pointer to pointer to int
int ****s = &r; // Type: int ****, you get the idea
int *****t = &s; // Type: int *****
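Where does a pointer to a pointer earn its keep? One common case is a function that needs to change a pointer belonging to its caller. This is only a sketch, and both function names are made up:

```c
#include <stddef.h>

// Make *pp point at whichever of a or b holds the larger value.
// pp is a pointer to the caller's pointer, so the caller's pointer changes.
void point_at_larger(int **pp, int *a, int *b)
{
    *pp = (*a > *b) ? a : b;
}

// Small usage wrapper: returns the larger of the two values
int larger_of(int x, int y)
{
    int *p = NULL;

    point_at_larger(&p, &x, &y);  // Pass the address of p, an int **

    return *p;
}
```

If point_at_larger() took a plain int * instead, it would only change its own local copy of the pointer, and p would still be NULL afterward.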

Pointer Pointers and const


If you recall, declaring a pointer like this:
int *const p;

means that you can’t modify p. Trying to p++ would give you a compile-time error.
But how does that work with int ** or int ***? Where does the const go, and what does it mean?
[108] Even if E is NULL, it turns out, weirdly.

Let’s start with the simple bit. The const right next to the variable name refers to that variable. So if you
want an int*** that you can’t change, you can do this:
int ***const p;

p++; // Not allowed

But here’s where things get a little weird.


What if we had this situation:
int main(void)
{
    int x = 3490;
    int *const p = &x;
    int **q = &p;
}

When I build that, I get a warning:


warning: initialization discards ‘const’ qualifier from pointer target type
7 | int **q = &p;
| ^

What’s going on? The compiler is telling us that this initialization quietly throws away a const.


That is, we’re saying that q is type int **, and if you dereference that, the rightmost * in the type goes away.
So after the dereference, we have type int *.
And we’re assigning &p into it which is a pointer to an int *const, or, in other words, int *const *.
But q is int **! A type with different constness on the first *! So we get a warning that the const in p’s
int *const * is being ignored and thrown away.

We can fix that by making sure q’s type is at least as const as p.


int x = 3490;
int *const p = &x;
int *const *q = &p;

And now we’re happy.


We could make q even more const. As it is, above, we’re saying, “q isn’t itself const, but the thing it points
to is const.” But we could make them both const:
int x = 3490;
int *const p = &x;
int *const *const q = &p; // More const!

And that works, too. Now we can’t modify q, or the pointer q points to.
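To make it concrete which part is locked down, here’s a sketch using the int *const * type (the function names are made up for this example):

```c
// q points to a const pointer to a (non-const) int.
// We can change the int at the end of the chain, but not the pointer *q.
void set_through(int *const *q, int value)
{
    **q = value;   // OK: the int itself isn't const
    // *q = NULL;  // Error if uncommented: *q is an int *const
}

int const_demo(void)
{
    int x = 3490;
    int *const p = &x;

    set_through(&p, 12);  // &p is an int *const *, so no warning

    return x;  // x was changed through q
}
```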

Multibyte Values
We kinda hinted at this in a variety of places earlier, but clearly not every value can be stored in a single byte
of memory. Things take up multiple bytes of memory (assuming they’re not chars). You can tell how many
bytes by using sizeof. And you can tell which address in memory is the first byte of the object by using the
standard & operator, which always returns the address of the first byte.
And here’s another fun fact! If you iterate over the bytes of any object, you get its object representation. Two
values with the same object representation in memory compare equal (floating point NaNs being the one exception).
If you want to iterate over the object representation, you should do it with pointers to unsigned char.

Let’s make our own version of memcpy() that does exactly this:
#include <stddef.h>  // For size_t

void *my_memcpy(void *dest, const void *src, size_t n)
{
    // Make local variables for src and dest, but of type unsigned char

    const unsigned char *s = src;
    unsigned char *d = dest;

    while (n-- > 0)   // For the given number of bytes
        *d++ = *s++;  // Copy source byte to dest byte

    // Most copy functions return a pointer to the dest as a convenience
    // to the caller

    return dest;
}

(There are some good examples of post-increment and post-decrement in there for you to study, as well.)
It’s important to note that the version, above, is probably less efficient than the one that comes with your
system.
But you can pass pointers to anything into it, and it’ll copy those objects. Could be int*, struct animal*,
or anything.
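The same unsigned char trick works for comparing two objects byte by byte. Here’s a sketch of that; the name same_bytes is made up, and in real code the standard memcmp() already does this job.

```c
#include <stddef.h>

// Compare two objects byte by byte. Returns 1 if all n bytes match.
int same_bytes(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a;
    const unsigned char *q = b;

    for (size_t i = 0; i < n; i++) {
        if (p[i] != q[i])
            return 0;  // Found a byte that differs
    }

    return 1;
}
```

Be careful using this on structs: padding bytes can differ even when every field matches.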
Let’s do another example that prints out the object representation bytes of a struct so we can see if there’s
any padding in there and what values it has.[109]
#include <stdio.h>

struct foo {
    char a;
    int b;
};

int main(void)
{
    struct foo x = {0x12, 0x12345678};
    unsigned char *p = (unsigned char *)&x;

    for (size_t i = 0; i < sizeof x; i++) {
        printf("%02X\n", p[i]);
    }
}

What we have there is a struct foo that’s built in such a way that should encourage a compiler to inject
padding bytes (though it doesn’t have to). And then we get an unsigned char * to the first byte of the
struct foo variable x.

From there, all we need to know is the sizeof x and we can loop through that many bytes, printing out the
values (in hex for ease).
Running this gives a bunch of numbers as output. I’ve annotated it below to identify where the values were
stored:
12 | x.a == 0x12

AB |
BF | padding bytes with "random" value
26 |

78 |
56 | x.b == 0x12345678
34 |
12 |

[109] Your C compiler is not required to add padding bytes, and the values of any padding bytes that are added are indeterminate.

On all systems, sizeof(char) is 1, and we see that first byte at the top of the output holding the value 0x12
that we stored there.
Then we have some padding bytes—for me, these varied from run to run.
Finally, on my system, sizeof(int) is 4, and we can see those 4 bytes at the end. Notice how they’re the
same bytes as are in the hex value 0x12345678, but strangely in reverse order.[110]
So that’s a little peek under the hood at the bytes of a more complex entity in memory.
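To check which byte order your own machine uses, you can peek at the first byte of a known value. A small sketch (the function name is mine):

```c
// Returns 1 if the least-significant byte of an unsigned int is stored
// at the lowest address (little endian), 0 otherwise.
int is_little_endian(void)
{
    unsigned int n = 1;
    const unsigned char *p = (const unsigned char *)&n;

    return p[0] == 1;
}
```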

The NULL Pointer and Zero


These things can be used interchangeably:
• NULL
• 0
• '\0'
• (void *)0

Personally, I always use NULL when I mean NULL, but you might see some other variants from time to time.
Though '\0' (a byte with all bits set to zero) will also compare equal, it’s weird to compare it to a pointer; you
should compare NULL against the pointer. (Of course, lots of times in string processing, you’re comparing
the thing the pointer points to to '\0', and that’s right.)
0 is called the null pointer constant, and, when compared to or assigned into another pointer, it is converted
to a null pointer of the same type.

Pointers as Integers
You can cast pointers to integers and vice-versa (since a pointer is just an index into memory), but you proba-
bly only ever need to do this if you’re doing some low-level hardware stuff. The results of such machinations
are implementation-defined, so they aren’t portable. And weird things could happen.
C does make one guarantee, though: you can convert a pointer to a uintptr_t type and you’ll be able to
convert it back to a pointer without losing any data.
uintptr_t is defined in <stdint.h>.[111]

Additionally, if you feel like being signed, you can use intptr_t to the same effect.
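Here’s a sketch of that round trip. The function name is made up, and the code goes through void * because that’s the conversion the guarantee is written for.

```c
#include <stdint.h>

// Convert a pointer to an integer and back, and report whether we got
// the same pointer again.
int survives_round_trip(int *p)
{
    uintptr_t n = (uintptr_t)(void *)p;  // Pointer -> integer
    int *back = (void *)n;               // Integer -> pointer

    return back == p;
}
```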
[110] This will vary depending on the architecture, but my system is little endian, which means the least-significant byte of the number
is stored first. Big endian systems will have the 12 first and the 78 last. But the spec doesn’t dictate anything about this representation.
[111] It’s an optional feature, so it might not be there, but it probably is.

Pointer Differences
As you know from the section on pointer arithmetic, you can subtract one pointer from another[112] to get the
difference between them in count of array elements.
Now the type of that difference is something that’s up to the implementation, so it could vary from system to
system.
To be more portable, you can store the result in a variable of type ptrdiff_t defined in <stddef.h>.
int cats[100];

int *f = cats + 20;


int *g = cats + 60;

ptrdiff_t d = g - f; // difference is 40

And you can print it by prefixing the integer format specifier with t:
printf("%td\n", d); // Print decimal: 40
printf("%tX\n", d); // Print hex: 28

Pointers to Functions
Functions are just collections of machine instructions in memory, so there’s no reason we can’t get a pointer
to the first instruction of the function.
And then call it.
This can be useful for passing a pointer to a function into another function as an argument. Then the second
one could call whatever was passed in.
The tricky part with these, though, is that C needs to know the type of the variable that is the pointer to the
function.
And it would really like to know all the details.
Like “this is a pointer to a function that takes two int arguments and returns void”.
How do you write all that down so you can declare a variable?
Well, it turns out it looks very much like a function prototype, except with some extra parentheses:
// Declare p to be a pointer to a function.
// This function returns a float, and takes two ints as arguments.

float (*p)(int, int);

Also notice that you don’t have to give the parameters names. But you can if you want; they’re just ignored.
// Declare p to be a pointer to a function.
// This function returns a float, and takes two ints as arguments.

float (*p)(int a, int b);

So now that we know how to declare a variable, how do we know what to assign into it? How do we get the
address of a function?
Turns out there’s a shortcut just like with getting a pointer to an array: you can just refer to the bare function
name without parens. (You can put an & in front of this if you like, but it’s unnecessary and not idiomatic.)
[112] Assuming they point to the same array object.

Once you have a pointer to a function, you can call it just by adding parens and an argument list.
Let’s do a simple example where I effectively make an alias for a function by setting a pointer to it. Then
we’ll call it.
This code prints out 3490:
#include <stdio.h>

void print_int(int n)
{
    printf("%d\n", n);
}

int main(void)
{
    // Assign p to point to print_int:

    void (*p)(int) = print_int;

    p(3490);  // Call print_int via the pointer
}

Notice how the type of p represents the return value and parameter types of print_int. It has to, or else C
will complain about incompatible pointer types.
One more example here shows how we might pass a pointer to a function as an argument to another function.
We’ll write a function that takes a couple integer arguments, plus a pointer to a function that operates on
those two arguments. Then it prints the result.
 1  #include <stdio.h>
 2
 3  int add(int a, int b)
 4  {
 5      return a + b;
 6  }
 7
 8  int mult(int a, int b)
 9  {
10      return a * b;
11  }
12
13  void print_math(int (*op)(int, int), int x, int y)
14  {
15      int result = op(x, y);
16
17      printf("%d\n", result);
18  }
19
20  int main(void)
21  {
22      print_math(add, 5, 7);   // 12
23      print_math(mult, 5, 7);  // 35
24  }

Take a moment to digest that. The idea here is that we’re going to pass a pointer to a function to
print_math(), and it’s going to call that function to do some math.

This way we can change the behavior of print_math() by passing another function into it. You can see we
do that on lines 22-23 when we pass in pointers to functions add and mult, respectively.
Now, on line 13, I think we can all agree the function signature of print_math() is a sight to behold. And, if
you can believe it, this one is actually pretty straight-forward compared to some things you can construct.[113]
But let’s digest it. Turns out there are only three parameters, but they’re a little hard to see:
// op x y
// |-----------------| |---| |---|
void print_math(int (*op)(int, int), int x, int y)

The first, op, is a pointer to a function that takes two ints as arguments and returns an int. This matches
the signatures for both add() and mult().
The second and third, x and y, are just standard int parameters.
Slowly and deliberately let your eyes play over the signature while you identify the working parts. One thing
that always stands out for me is the sequence (*op)(, the parens and the asterisk. That’s the giveaway it’s a
pointer to a function.
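If that signature is too much to look at, a typedef can give the pointer type a name. This is a sketch of one way to do it; binary_op and do_math are names I made up, and do_math() returns the result instead of printing it.

```c
// binary_op is a hypothetical name for "pointer to a function that
// takes two ints and returns an int".
typedef int (*binary_op)(int, int);

int add(int a, int b)  { return a + b; }
int mult(int a, int b) { return a * b; }

// Same idea as print_math(), but with the typedef doing the heavy lifting
int do_math(binary_op op, int x, int y)
{
    return op(x, y);
}
```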
Finally, jump back to the Pointers II chapter for a pointer-to-function example using the built-in qsort().

[113] The Go Programming Language drew its type declaration syntax inspiration from the opposite of what C does.
Bitwise Operations

These numeric operations effectively allow you to manipulate individual bits in variables, fitting since C is
such a low-level language.[114]
If you’re not familiar with bitwise operations, Wikipedia has a good bitwise article.[115]

Bitwise AND, OR, XOR, and NOT


For each of these, the usual arithmetic conversions take place on the operands (which in this case must be an
integer type), and then the appropriate bitwise operation is performed.

Operation Operator Example


AND & a = b & c
OR | a = b | c
XOR ^ a = b ^ c
NOT ~ a = ~c

Note how they’re similar to the Boolean operators && and ||.
These have assignment shorthand variants similar to += and -=:

Operator Example Longhand equivalent


&= a &= c a = a & c
|= a |= c a = a | c
^= a ^= c a = a ^ c
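A classic use for these operators is packing several on/off flags into one integer. Here’s a minimal sketch; the flag names and helper functions are invented for the example.

```c
// Hypothetical flag values, one bit each
#define FLAG_BOLD      0x1u
#define FLAG_ITALIC    0x2u
#define FLAG_UNDERLINE 0x4u

// OR turns a bit on
unsigned set_flag(unsigned flags, unsigned f)    { return flags | f; }

// AND with the inverted flag turns a bit off
unsigned clear_flag(unsigned flags, unsigned f)  { return flags & ~f; }

// XOR flips a bit
unsigned toggle_flag(unsigned flags, unsigned f) { return flags ^ f; }

// AND tests whether a bit is on
int has_flag(unsigned flags, unsigned f)         { return (flags & f) != 0; }
```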

Bitwise Shift
For these, the integer promotions are performed on each operand (which must be an integer type) and then a
bitwise shift is executed. The type of the result is the type of the promoted left operand.
New bits are filled with zeros, with a possible exception noted in the implementation-defined behavior, below.

Operation Operator Example


Shift left << a = b << c
Shift right >> a = b >> c
[114] Not that other languages don’t do this; they do. It is interesting how many modern languages use the same operators for bitwise
that C does.
[115] https://en.wikipedia.org/wiki/Bitwise_operation


There’s also similar shorthand for shifting:

Operator Example Longhand equivalent


>>= a >>= c a = a >> c
<<= a <<= c a = a << c

Watch for undefined behavior: no negative shift counts, and no shift counts greater than or equal to the
width in bits of the promoted left operand. Left-shifting a negative number is undefined, too.
Also watch for implementation-defined behavior: if you right-shift a negative number, the results are
implementation-defined. (It’s perfectly fine to right-shift a signed int, just make sure it isn’t negative.)
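One way to stay out of trouble is to shift unsigned values and check the shift count first. Here’s a sketch that pulls one byte out of a number; the function name is made up, and the 0xFF mask assumes 8-bit bytes.

```c
#include <limits.h>

// Extract byte number n (0 is the least significant) from v.
// The operand is unsigned, and n is checked so the shift count stays
// below the width of the type.
unsigned get_byte(unsigned v, unsigned n)
{
    if (n >= sizeof v)  // Out of range: there's no such byte
        return 0;

    return (v >> (n * CHAR_BIT)) & 0xFFu;
}
```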
Variadic Functions

Variadic is a fancy word for functions that take arbitrary numbers of arguments.
A regular function takes a specific number of arguments, for example:
int add(int x, int y)
{
return x + y;
}

You can only call that with exactly two arguments which correspond to parameters x and y.
add(2, 3);
add(5, 12);

But if you try it with more or fewer, the compiler won’t let you:
add(2, 3, 4); // ERROR
add(5); // ERROR

Variadic functions get around this limitation to a certain extent.


We’ve already seen a famous example in printf()! You can pass all kinds of things to it.
printf("Hello, world!\n");
printf("The number is %d\n", 2);
printf("The number is %d and pi is %f\n", 2, 3.14159);

It seems to not care how many arguments you give it!


Well, that’s not entirely true. Zero arguments will give you an error:
printf(); // ERROR

This leads us to one of the limitations of variadic functions in C: they must have at least one argument.
But aside from that, they’re pretty flexible, even allowing arguments to have different types, just like printf()
does.
Let’s see how they work!

Ellipses in Function Signatures


So how does it work, syntactically?
What you do is put all the arguments that must be passed first (and remember there has to be at least one) and
after that, you put .... Just like this:
void func(int a, ...) // Literally 3 dots here

Here’s some code to demo that:


#include <stdio.h>

void func(int a, ...)
{
    printf("a is %d\n", a);  // Prints "a is 2"
}

int main(void)
{
    func(2, 3, 4, 5, 6);
}

So, great, we can get that first argument that’s in variable a, but what about the rest of the arguments? How
do you get to them?
Here’s where the fun begins!

Getting the Extra Arguments


You’re going to want to include <stdarg.h> to make any of this work.
First things first, we’re going to use a special variable of type va_list (variable argument list) to keep track
of which argument we’re accessing at any given time.
The idea is that we first start processing arguments with a call to va_start(), process each argument in turn
with va_arg(), and then, when done, wrap it up with va_end().
When you call va_start(), you need to pass in the last named parameter (the one just before the ...) so
it knows where to start looking for the additional arguments.
And when you call va_arg() to get the next argument, you have to tell it the type of argument to get next.
Here’s a demo that adds together an arbitrary number of integers. The first argument is the number of integers
to add together. We’ll make use of that to figure out how many times we have to call va_arg().
#include <stdio.h>
#include <stdarg.h>

int add(int count, ...)
{
    int total = 0;
    va_list va;

    va_start(va, count);  // Start with arguments after "count"

    for (int i = 0; i < count; i++) {
        int n = va_arg(va, int);  // Get the next int

        total += n;
    }

    va_end(va);  // All done

    return total;
}

int main(void)
{
    printf("%d\n", add(4, 6, 2, -4, 17));  // 6 + 2 - 4 + 17 = 21
    printf("%d\n", add(2, 22, 44));        // 22 + 44 = 66
}

When printf() is called, it uses the number of %ds (or whatever) in the format string to know how many
more arguments there are!
If the syntax of va_arg() is looking strange to you (because of that loose type name floating around in there),
you’re not alone. These are implemented with preprocessor macros in order to get all the proper magic in
there.

va_list Functionality
What is that va_list variable we’re using up there? It’s an opaque variable116 that holds information about
which argument we’re going to get next with va_arg(). You see how we just call va_arg() over and over?
The va_list variable is a placeholder that’s keeping track of progress so far.
But we have to initialize that variable to some sensible value. That’s where va_start() comes into play.
When we called va_start(va, count), above, we were saying, “Initialize the va variable to point to the
variable argument immediately after count.”
And that’s why we need to have at least one named variable in our argument list117 .
Once you have that pointer to the initial parameter, you can easily get subsequent argument values by calling
va_arg() repeatedly. When you do, you have to pass in your va_list variable (so it can keep on keeping
track of where you are), as well as the type of argument you’re about to copy off.
It’s up to you as a programmer to figure out which type you’re going to pass to va_arg(). In the above
example, we just did ints. But in the case of printf(), it uses the format specifier to determine which type
to pull off next.
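Here’s a rough sketch of that format-driven idea, much simpler than the real printf(). The function describe() and its one-letter type codes ('d', 'f', 's') are invented for this example. It appends each value to a caller-supplied buffer with snprintf(). Note that a float passed through ... is promoted to double, so double is the type to hand to va_arg().

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: each character in "types" names the type of the
   next argument: 'd' for int, 'f' for double, 's' for char*.
   The values are written into buf, each followed by a space. */
char *describe(char *buf, size_t size, const char *types, ...)
{
    va_list va;

    if (size == 0)
        return buf;
    buf[0] = '\0';

    va_start(va, types);

    for (const char *t = types; *t != '\0'; t++) {
        size_t used = strlen(buf);   /* Characters written so far */
        size_t left = size - used;   /* Room left, including the '\0' */

        if (*t == 'd')
            snprintf(buf + used, left, "%d ", va_arg(va, int));
        else if (*t == 'f')
            snprintf(buf + used, left, "%.2f ", va_arg(va, double));
        else if (*t == 's')
            snprintf(buf + used, left, "%s ", va_arg(va, char *));
        else
            break;  /* Unknown code: no way to know the type, so stop */
    }

    va_end(va);

    return buf;
}
```

A call like describe(buf, sizeof buf, "dfs", 42, 3.14159, "hi") leaves "42 3.14 hi " in buf. If the codes and the actual arguments disagree, the behavior is undefined, which is the same trap you can fall into with printf().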
And when you’re done, call va_end() to wrap it up. You must (the spec says) call this on a particular
va_list variable before you decide to call either va_start() or va_copy() on it again. I know we haven’t
talked about va_copy() yet.
So the standard progression is:
• va_start() to initialize your va_list variable
• Repeatedly va_arg() to get the values
• va_end() to deinitialize your va_list variable
I also mentioned va_copy() up there; it makes a copy of your va_list variable in the exact same state.
That is, if you haven’t started with va_arg() with the source variable, the new one won’t be started, either.
If you’ve consumed 5 variables with va_arg() so far, the copy will also reflect that.
va_copy() can be useful if you need to scan ahead through the arguments but need to also remember your
current place.
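As a sketch of that scan-ahead use, here’s a made-up function, count_above_mean(), that needs two passes over the same arguments: one to compute the mean and one to count the values above it. The copy does the scanning ahead while the original va_list stays parked at the first argument. Each va_list gets its own va_end().

```c
#include <stdarg.h>

/* Hypothetical example: how many of the "count" ints are above the mean? */
int count_above_mean(int count, ...)
{
    va_list va, ahead;
    double sum = 0;
    int above = 0;

    if (count <= 0)
        return 0;

    va_start(va, count);
    va_copy(ahead, va);  /* Copy is in the same state: nothing consumed yet */

    /* Pass 1: scan ahead through the copy to get the sum */
    for (int i = 0; i < count; i++)
        sum += va_arg(ahead, int);

    va_end(ahead);

    double mean = sum / count;

    /* Pass 2: va still points at the first variable argument */
    for (int i = 0; i < count; i++)
        if (va_arg(va, int) > mean)
            above++;

    va_end(va);

    return above;
}
```

For example, count_above_mean(4, 1, 2, 3, 10) has a mean of 4, so only the 10 counts and the result is 1.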

116
That is, us lowly developers aren’t supposed to know what’s in there or what it means. The spec doesn’t dictate what it is in detail.
117
Honestly, it would be possible to remove that limitation from the language, but the idea is that the macros va_start(), va_arg(),
and va_end() should be able to be written in C. And to make that happen, we need some way to initialize a pointer to the location of
the first parameter. And to do that, we need the name of the first parameter. It would require a language extension to make this possible,
and so far the committee hasn’t found a rationale for doing so.
Locale and Internationalization

Localization is the process of making your app ready to work well in different locales (or countries).
As you might know, not everyone uses the same character for decimal points or for thousands separators…
or for currency.
These locales have names, and you can select one to use. For example, a US locale might write a number
like:
100,000.00
Whereas in Brazil, the same might be written with the commas and decimal points swapped:
100.000,00
Locales make it easier to write your code so it ports to other countries with ease!
Well, sort of. Turns out C only has one built-in locale, and it’s limited. The spec really leaves a lot of
ambiguity here; it’s hard to be completely portable.
But we’ll do our best!

Setting the Localization, Quick and Dirty


For these calls, include <locale.h>.
There is basically one thing you can portably do here in terms of declaring a specific locale. This is likely
what you want to do if you’re going to do anything with locales:
setlocale(LC_ALL, ""); // Use this environment's locale for everything

You’ll want to call that so that the program gets initialized with your current locale.
Getting into more details, there is one more thing you can do and stay portable:
setlocale(LC_ALL, "C"); // Use the default C locale

but that’s called by default every time your program starts, so there’s not much need to do it yourself.
In that second string, you can specify any locale supported by your system. This is completely system-
dependent, so it will vary. On my system, I can specify this:
setlocale(LC_ALL, "en_US.UTF-8"); // Non-portable!

And that’ll work. But it’s only portable to systems which have that exact same name for that exact same
locale, and you can’t guarantee it.
By passing in an empty string ("") for the second argument, you’re telling C, “Hey, figure out what the
current locale on this system is so I don’t have to tell you.”
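Since a locale name might not exist on a given system, it’s worth checking what setlocale() hands back: it returns NULL if the request can’t be honored, and passing NULL as the second argument just asks for the current locale name without changing anything. Here’s a small sketch of that. The helper names select_locale() and init_locale() are invented for this example.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try to select a locale for all categories.
   Returns 1 on success, 0 if the system can't honor the request. */
int select_locale(const char *name)
{
    return setlocale(LC_ALL, name) != NULL;
}

/* Prefer the environment's locale, falling back to the always-available
   "C" locale. Returns the name of the locale now in effect. */
const char *init_locale(void)
{
    if (!select_locale(""))
        select_locale("C");

    return setlocale(LC_ALL, NULL);  /* NULL means "just tell me" */
}
```

The string that comes back from the query is system-dependent, so treat it as something to print or log rather than something to parse.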


Getting the Monetary Locale Settings


Because moving green pieces of paper around promises to be the key to happiness118 , let’s talk about monetary
locale. When you’re writing portable code, you have to know what to type for cash, right? Whether that’s
“$”, “€”, “¥”, or “£”.
How can you write that code without going insane? Luckily, once you call setlocale(LC_ALL, ""), you
can just look these up with a call to localeconv():
struct lconv *x = localeconv();

This function returns a pointer to a statically-allocated struct lconv that has all that juicy information
you’re looking for.
Here are the fields of struct lconv and their meanings. In the field names, p_ means “positive”, n_ means
“negative”, and int_ means “international”. Though a lot of these are type char or char*, most (or the
strings they point to) are actually treated as integers119 .
Before we go further, know that CHAR_MAX (from <limits.h>) is the maximum value that can be held in a
char. And that many of the following char values use that to indicate the value isn’t available in the given
locale.

Field                       Description
char *mon_decimal_point     Decimal point character for money, e.g. ".".
char *mon_thousands_sep     Thousands separator character for money, e.g. ",".
char *mon_grouping          Grouping description for money (see below).
char *positive_sign         Positive sign for money, e.g. "+" or "".
char *negative_sign         Negative sign for money, e.g. "-".
char *currency_symbol       Currency symbol, e.g. "$".
char frac_digits            When printing monetary amounts, how many digits to print past the
                            decimal point, e.g. 2.
char p_cs_precedes          1 if the currency_symbol comes before the value for a non-negative
                            monetary amount, 0 if after.
char n_cs_precedes          1 if the currency_symbol comes before the value for a negative
                            monetary amount, 0 if after.
char p_sep_by_space         Determines the separation of the currency symbol from the value for
                            non-negative amounts (see below).
char n_sep_by_space         Determines the separation of the currency symbol from the value for
                            negative amounts (see below).
char p_sign_posn            Determines the positive_sign position for non-negative values.
char n_sign_posn            Determines the negative_sign position for negative values.
char *int_curr_symbol       International currency symbol, e.g. "USD ".
char int_frac_digits        International value for frac_digits.
char int_p_cs_precedes      International value for p_cs_precedes.
char int_n_cs_precedes      International value for n_cs_precedes.
char int_p_sep_by_space     International value for p_sep_by_space.
char int_n_sep_by_space     International value for n_sep_by_space.
char int_p_sign_posn        International value for p_sign_posn.
char int_n_sign_posn        International value for n_sign_posn.

118
“This planet has—or rather had—a problem, which was this: most of the people living on it were unhappy for pretty much of the
time. Many solutions were suggested for this problem, but most of these were largely concerned with the movement of small green
pieces of paper, which was odd because on the whole it wasn’t the small green pieces of paper that were unhappy.” —The Hitchhiker’s
Guide to the Galaxy, Douglas Adams
119
Remember that char is just a byte-sized integer.

Monetary Digit Grouping


OK, this is a trippy one. mon_grouping is a char*, so you might be thinking it’s a string. But in this case,
no, it’s really an array of chars. It should always end either with a 0 or CHAR_MAX.
These values describe how to group sets of numbers in currency to the left of the decimal (the whole number
part).
For example, we might have:
  2   1   0
 --- --- ---
$100,000,000.00

These are groups of three. Group 0 (just left of the decimal) has 3 digits. Group 1 (next group to the left) has
3 digits, and the last one also has 3.
So we could describe these groups, from the right (the decimal) to the left with a bunch of integer values
representing the group sizes:
3 3 3

And that would work for values up to $100,000,000.


But what if we had more? We could keep adding 3s…
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3

but that’s crazy. Luckily, we can specify 0 to indicate that the previous group size repeats:
3 0

Which means to repeat every 3. That would handle $100, $1,000, $10,000, $10,000,000, $100,000,000,000,
and so on.
You can go legitimately crazy with these to indicate some weird groupings.
For example:
4 3 2 1 0

would indicate:
$1,0,0,0,0,00,000,0000.00

One more value that can occur is CHAR_MAX. This indicates that no more grouping should occur, and can
appear anywhere in the array, including the first value.
3 2 CHAR_MAX

would indicate:
100000000,00,000.00

for example.
And simply having CHAR_MAX in the first array position would tell you there was to be no grouping at all.
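To see those rules in action, here’s a sketch of a function that applies a mon_grouping-style array to a string of digits. The name group_digits() and its parameters are invented for this illustration; it’s not part of the standard library. It walks the digits from right to left, building the result backward, and then reverses it. It treats a leading 0 the same as “no grouping”, which is an assumption on my part for the case where there is no previous group size to repeat.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical helper: copy "digits" into "out", inserting "sep"
   according to "grouping" (same layout as mon_grouping).
   Returns 0 on success, -1 if "out" is too small. */
int group_digits(const char *digits, const char *grouping, char sep,
                 char *out, size_t outsize)
{
    const char *g = grouping;
    size_t pos = 0;      /* Next write position (result is built reversed) */
    int in_group = 0;    /* Digits placed in the current group so far */

    /* A size of 0 from here on means "no more grouping" */
    int size = (*g == CHAR_MAX) ? 0 : *g;

    if (outsize == 0)
        return -1;

    for (size_t i = strlen(digits); i > 0; i--) {
        if (size > 0 && in_group == size) {
            /* Current group is full: emit a separator */
            if (pos + 1 >= outsize)
                return -1;
            out[pos++] = sep;
            in_group = 0;

            /* Pick the size of the next group */
            if (g[1] == CHAR_MAX)
                size = 0;         /* No more grouping */
            else if (g[1] != 0)
                size = *++g;      /* Move to the next listed size */
            /* g[1] == 0 means keep repeating the current size */
        }

        if (pos + 1 >= outsize)
            return -1;
        out[pos++] = digits[i - 1];
        in_group++;
    }

    out[pos] = '\0';

    /* Reverse the string in place */
    for (size_t a = 0, b = pos; a + 1 < b; a++, b--) {
        char tmp = out[a];
        out[a] = out[b - 1];
        out[b - 1] = tmp;
    }

    return 0;
}
```

Feeding it the digits 10000000000000 with the grouping 4 3 2 1 0 and a comma separator produces 1,0,0,0,0,00,000,0000, and the grouping 3 2 CHAR_MAX produces 100000000,00,000, matching the examples above.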

Separators and Sign Position


All the sep_by_space variants deal with spacing around the currency sign. Valid values are:

Value   Description
0       No space between currency symbol and value.
1       Separate the currency symbol (and sign, if any) from the value with a space.
2       Separate the sign symbol from the currency symbol (if adjacent) with a space,
        otherwise separate the sign symbol from the value with a space.

The sign_posn variants are determined by the following values:

Value Description
0 Put parens around the value and the currency symbol.
1 Put the sign string in front of the currency symbol and value.
2 Put the sign string after the currency symbol and value.
3 Put the sign string directly in front of the currency symbol.
4 Put the sign string directly behind the currency symbol.
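Here’s a sketch of how the sign_posn values and the cs_precedes flag might combine when laying out an amount. The function place_sign() is hypothetical, and to keep it short it ignores the sep_by_space settings entirely, so no spaces are ever inserted. The pieces (value, currency symbol, sign string) are passed in as plain strings.

```c
#include <stdio.h>

/* Hypothetical helper: arrange value, currency symbol, and sign string.
   cs_precedes is 1 if the symbol goes before the value, 0 if after.
   Returns what snprintf() returns, or -1 for an unknown sign_posn. */
int place_sign(char *out, size_t size, const char *value,
               const char *symbol, const char *sign,
               int cs_precedes, int sign_posn)
{
    /* The symbol and value in the order they'll appear */
    const char *first  = cs_precedes ? symbol : value;
    const char *second = cs_precedes ? value : symbol;

    switch (sign_posn) {
    case 0:  /* Parens around the value and the currency symbol */
        return snprintf(out, size, "(%s%s)", first, second);

    case 1:  /* Sign in front of the currency symbol and value */
        return snprintf(out, size, "%s%s%s", sign, first, second);

    case 2:  /* Sign after the currency symbol and value */
        return snprintf(out, size, "%s%s%s", first, second, sign);

    case 3:  /* Sign directly in front of the currency symbol */
        return cs_precedes
            ? snprintf(out, size, "%s%s%s", sign, symbol, value)
            : snprintf(out, size, "%s%s%s", value, sign, symbol);

    case 4:  /* Sign directly behind the currency symbol */
        return cs_precedes
            ? snprintf(out, size, "%s%s%s", symbol, sign, value)
            : snprintf(out, size, "%s%s%s", value, symbol, sign);
    }

    return -1;
}
```

With value "1.25", symbol "$", sign "-", and the symbol in front, position 1 yields -$1.25, position 0 yields ($1.25), and position 4 yields $-1.25.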

Example Values
When I get the values on my system, this is what I see (grouping string displayed as individual byte values):
mon_decimal_point = "."
mon_thousands_sep = ","
mon_grouping = 3 3 0
positive_sign = ""
negative_sign = "-"
currency_symbol = "$"
frac_digits = 2
p_cs_precedes = 1
n_cs_precedes = 1
p_sep_by_space = 0
n_sep_by_space = 0
p_sign_posn = 1
n_sign_posn = 1
int_curr_symbol = "USD "
int_frac_digits = 2
int_p_cs_precedes = 1
int_n_cs_precedes = 1
int_p_sep_by_space = 1
int_n_sep_by_space = 1
int_p_sign_posn = 1
int_n_sign_posn = 1
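If you want to see the values on your own system, something along these lines should do it. This is a sketch that prints only a handful of the fields; the helper names print_small() and dump_monetary() are made up. Call setlocale(LC_ALL, "") first, or you’ll get the values for the default "C" locale, where most of these are empty or CHAR_MAX.

```c
#include <limits.h>
#include <locale.h>
#include <stdio.h>

/* Print one of the small char-sized fields, spelling out CHAR_MAX */
static void print_small(const char *name, char value)
{
    if (value == CHAR_MAX)
        printf("%-18s = CHAR_MAX (not available)\n", name);
    else
        printf("%-18s = %d\n", name, value);
}

/* Hypothetical helper: print some of the monetary fields for whatever
   locale is currently selected. */
void dump_monetary(void)
{
    const struct lconv *lc = localeconv();

    printf("%-18s = \"%s\"\n", "mon_decimal_point", lc->mon_decimal_point);
    printf("%-18s = \"%s\"\n", "mon_thousands_sep", lc->mon_thousands_sep);
    printf("%-18s = \"%s\"\n", "currency_symbol", lc->currency_symbol);

    /* mon_grouping is an array of small integers, not text, so print
       each byte as a number, up to and including the terminator. */
    printf("%-18s =", "mon_grouping");
    for (const char *g = lc->mon_grouping; ; g++) {
        if (*g == CHAR_MAX) {
            printf(" CHAR_MAX");
            break;
        }
        printf(" %d", *g);
        if (*g == 0)
            break;
    }
    printf("\n");

    print_small("frac_digits", lc->frac_digits);
    print_small("p_cs_precedes", lc->p_cs_precedes);
    print_small("n_sign_posn", lc->n_sign_posn);
}
```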

Localization Specifics
Notice how we passed the macro LC_ALL to setlocale() earlier… this hints that there might be some
variant that allows you to be more precise about which parts of the locale you’re setting.
Let’s take a look at the values you can see for these:

Macro         Description
LC_ALL        Set all of the following to the given locale.
LC_COLLATE    Controls the behavior of the strcoll() and strxfrm() functions.
LC_CTYPE      Controls the behavior of the character-handling functions120 .
LC_MONETARY   Controls the values returned by localeconv().
LC_NUMERIC    Controls the decimal point for the printf() family of functions.
LC_TIME       Controls time formatting of the strftime() and wcsftime() time and date
              printing functions.

It’s pretty common to see LC_ALL being set, but, hey, at least you have options.
Also I should point out that LC_CTYPE is one of the biggies because it ties into wide characters, a significant
can of worms that we’ll talk about later.
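One place the finer-grained categories can come in handy: taking everything from the environment except number formatting. The sketch below does that by setting LC_ALL first and then pinning LC_NUMERIC back to "C", so the printf() family keeps using a period for the decimal point no matter what the user’s locale says. The function names are invented for the example, and whether you’d actually want this depends on who (or what) is going to read your output.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical setup: adopt the environment's locale, then pin
   LC_NUMERIC back to "C" so printf() keeps using '.' */
void init_locale_keep_c_numbers(void)
{
    setlocale(LC_ALL, "");       /* Everything from the environment... */
    setlocale(LC_NUMERIC, "C");  /* ...except number formatting */
}

/* Format one half with a single digit past the decimal point */
int format_half(char *buf, size_t size)
{
    return snprintf(buf, size, "%.1f", 0.5);
}
```

After init_locale_keep_c_numbers(), format_half() should produce 0.5 even on a system whose locale would normally write it as 0,5.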

120
Except for isdigit() and isxdigit().
Standard I/O Library

The most basic of all libraries in the whole of the standard C library is the standard I/O library. It’s used for
reading from and writing to files. I can see you’re very excited about this.
So I’ll continue. It’s also used for reading and writing to the console, as we’ve already often seen with the
printf() function.

(A little secret here—many many things in various operating systems are secretly files deep down, and the
console is no exception. “Everything in Unix is a file!” :-))
You’ll probably want some prototypes of the functions you can use, right? To get your grubby little mittens
on those, you’ll want to include stdio.h.
Anyway, so we can do all kinds of cool stuff in terms of file I/O. LIE DETECTED. Ok, ok. We can do all
kinds of stuff in terms of file I/O. Basically, the strategy is this:
1. Use fopen() to get a pointer to a file structure of type FILE*. This pointer is what you’ll be passing
to many of the other file I/O calls.
2. Use some of the other file calls, like fscanf(), fgets(), fprintf(), etc., using the FILE* returned
from fopen().
3. When done, call fclose() with the FILE*. This lets the operating system know that you’re truly
done with the file, no take-backs.
What’s in the FILE*? Well, as you might guess, it points to a struct that contains all kinds of information
about the current read and write position in the file, how the file was opened, and other stuff like that. But,
honestly, who cares. No one, that’s who. The FILE structure is opaque to you as a programmer; that is, you
don’t need to know what’s in it, and you don’t even want to know what’s in it. You just pass it to the other
standard I/O functions and they know what to do.
This is actually pretty important: try to not muck around in the FILE structure. It’s not even the same from
system to system, and you’ll end up writing some really non-portable code.
One more thing to mention about the standard I/O library: a lot of the functions that operate on files use an
“f” prefix on the function name. The same function that is operating on the console will leave the “f” off.
For instance, if you want to print to the console, you use printf(), but if you want to print to a file, use
fprintf(), see?
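The open/use/close strategy described above, along with the “f” versions of the calls, can be sketched like this. The function round_trip() is a made-up helper: it writes one line to a file with fprintf(), closes it, then opens it again and reads the line back with fgets().

```c
#include <stdio.h>

/* Hypothetical helper: write "text" as one line to "path", then read
   that line back into "line". Returns 0 on success, -1 on failure. */
int round_trip(const char *path, const char *text, char *line, int size)
{
    FILE *fp = fopen(path, "w");   /* Step 1: get a FILE* */

    if (fp == NULL)
        return -1;

    fprintf(fp, "%s\n", text);     /* Step 2: use it */

    if (fclose(fp) == EOF)         /* Step 3: close it */
        return -1;

    /* Same three steps again, this time for reading */
    fp = fopen(path, "r");

    if (fp == NULL)
        return -1;

    char *got = fgets(line, size, fp);

    fclose(fp);

    return got != NULL ? 0 : -1;
}
```

Note that fgets() keeps the newline, so after writing "hello" the buffer holds "hello\n".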

Wait a moment! If writing to the console is, deep down, just like writing to a file, since everything in Unix is
a file, why are there two functions? Answer: it’s more convenient. But, more importantly, is there a FILE*
associated with the console that you can use? Answer: YES!
There are, in fact, three (count ’em!) special FILE*s you have at your disposal merely for just including
stdio.h. There is one for input, and two for output.

That hardly seems fair—why does output get two files, and input only get one?
That’s jumping the gun a bit—let’s just look at them:


stdin

Input from the console.


stdout

Output to the console.


stderr

Output to the console on the error file stream.


So standard input (stdin) is by default just what you type at the keyboard. You can use that in fscanf() if
you want, just like this:
/* this line: */
scanf("%d", &x);

/* is just like this line: */


fscanf(stdin, "%d", &x);

And stdout works the same way:


printf("Hello, world!\n");
fprintf(stdout, "Hello, world!\n"); /* same as previous line! */

So what is this stderr thing? What happens when you output to that? Well, generally it goes to the console
just like stdout, but people use it for error messages, specifically. Why? On many systems you can redirect
the output from the program into a file from the command line…and sometimes you’re interested in getting
just the error output. So if the program is good and writes all its errors to stderr, a user can redirect just
stderr into a file, and just see that. It’s just a nice thing you, as a programmer, can do.
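A minimal sketch of that habit: normal results go to stdout, complaints go to stderr. The function report_value() is invented for this example.

```c
#include <stdio.h>

/* Hypothetical helper: print a non-negative value to stdout, or an
   error message to stderr. Returns 0 if the value was OK, -1 if not. */
int report_value(double x)
{
    if (x < 0) {
        fprintf(stderr, "error: %f is negative\n", x);
        return -1;
    }

    fprintf(stdout, "value: %f\n", x);

    return 0;
}
```

On a typical Unix-like shell, running a program that uses this with something like 2> errors.txt on the command line would send only the error lines to the file, while the normal output still shows up on the console. The exact redirection syntax depends on your shell.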

fopen()
Opens a file for reading or writing

Synopsis
#include <stdio.h>

FILE *fopen(const char *path, const char *mode);

Description
The fopen() function opens a file for reading or writing.
Parameter path can be a relative or fully-qualified path and file name to the file in question.
Parameter mode tells fopen() how to open the file (reading, writing, or both), and whether or not it’s a binary
file. Possible modes are:
r

Open the file for reading (read-only).


w

Open the file for writing (write-only). The file is created if it doesn’t exist; if it does exist, its previous contents are discarded.
r+

Open the file for reading and writing. The file has to already exist.
w+

Open the file for writing and reading. The file is created if it doesn’t already exist; if it does exist, its previous contents are discarded.
a

Open the file for append. This is just like opening a file for writing, but it positions the file pointer at the end
of the file, so the next write appends to the end. The file is created if it doesn’t exist.
a+

Open the file for reading and appending. The file is created if it doesn’t exist.
Any of the modes can have the letter “b” appended to the end, as in “wb” (“write binary”), to signify that the
file in question is a binary file. (“Binary” in this case generally means that the file contains non-alphanumeric
characters that look like garbage to human eyes.) Many systems (like Unix) don’t differentiate between binary
and non-binary files, so the “b” is extraneous. But if your data is binary, it doesn’t hurt to throw the “b” in
there, and it might help someone who is trying to port your code to another system.
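As a quick illustration of the append mode, here’s a sketch with two invented helpers: append_line() opens a file with "a" and adds a line to the end, and count_lines() opens it with "r" and counts the newlines. Calling append_line() twice on the same path should leave two lines in the file, since "a" never throws away what’s already there.

```c
#include <stdio.h>

/* Hypothetical helper: add one line to the end of a file, creating the
   file if needed. Returns 0 on success, -1 on failure. */
int append_line(const char *path, const char *text)
{
    FILE *fp = fopen(path, "a");

    if (fp == NULL)
        return -1;

    int ok = fprintf(fp, "%s\n", text) >= 0;

    if (fclose(fp) == EOF)
        ok = 0;

    return ok ? 0 : -1;
}

/* Count the lines in a file, or return -1 if it can't be opened */
long count_lines(const char *path)
{
    FILE *fp = fopen(path, "r");
    long lines = 0;
    int c;

    if (fp == NULL)
        return -1;

    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            lines++;

    fclose(fp);

    return lines;
}
```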

Return Value
fopen() returns a FILE* that can be used in subsequent file-related calls.

If something goes wrong (e.g. you tried to open a file for read that didn’t exist), fopen() will return NULL.

Example
#include <stdio.h>
#include <stdlib.h>  // for exit()

int main(void)
{
    FILE *fp;

    if ((fp = fopen("datafile.dat", "r")) == NULL) {
        printf("Couldn't open datafile.dat for reading\n");
        exit(1);
    }

    // fp is now open and can be read from

    fclose(fp);
}

See Also
fclose(), freopen()

freopen()
Reopen an existing FILE*, associating it with a new path

Synopsis
#include <stdio.h>

FILE *freopen(const char *filename, const char *mode, FILE *stream);

Description
Let’s say you have an existing FILE* stream that’s already open, but you want it to suddenly use a different
file than the one it’s using. You can use freopen() to “re-open” the stream with a new file.
Why on Earth would you ever want to do that? Well, the most common reason would be if you had a program
that normally would read from stdin, but instead you wanted it to read from a file. Instead of changing all
your scanf()s to fscanf()s, you could simply reopen stdin on the file you wanted to read from.
Another usage that is allowed on some systems is that you can pass NULL for filename, and specify a new
mode for stream. So you could change a file from “r+” (read and write) to just “r” (read), for instance. It’s
implementation dependent which modes can be changed.
When you call freopen(), the old stream is closed. Otherwise, the function behaves just like the standard
fopen().

Return Value
freopen() returns stream if all goes well.

If something goes wrong (e.g. you tried to open a file for read that didn’t exist), freopen() will return NULL.

Example
#include <stdio.h>

int main(void)
{
    int i, i2;

    scanf("%d", &i);  // read i from stdin

    // now change stdin to refer to a file instead of the keyboard
    freopen("someints.txt", "r", stdin);

    scanf("%d", &i2);  // now this reads from the file "someints.txt"

    printf("Hello, world!\n");  // print to the screen

    // change stdout to go to a file instead of the terminal:
    freopen("output.txt", "w", stdout);

    printf("This goes to the file \"output.txt\"\n");

    // this is allowed on some systems--you can change the mode of a file:
    freopen(NULL, "wb", stdout);  // change to "wb" instead of "w"
}

See Also
fclose(), fopen()

fclose()
The opposite of fopen()–closes a file when you’re done with it so that it frees system resources.

Synopsis
#include <stdio.h>

int fclose(FILE *stream);

Description
When you open a file, the system sets aside some resources to maintain information about that open file.
Usually it can only open so many files at once. In any case, the Right Thing to do is to close your files when
you’re done using them so that the system resources are freed.
Also, you might not find that all the information that you’ve written to the file has actually been written to
disk until the file is closed. (You can force this with a call to fflush().)
When your program exits normally, it closes all open files for you. Lots of times, though, you’ll have a
long-running program, and it’d be better to close the files before then. In any case, not closing a file you’ve
opened makes you look bad. So, remember to fclose() your file when you’re done with it!

Return Value
On success, 0 is returned. Typically no one checks for this. On error EOF is returned. Typically no one checks
for this, either.
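If you’d like to be one of the rare people who does check, here’s a sketch. Because output is usually buffered, a write problem (a full disk, say) might not show up until the data gets flushed at fclose(), so the helper below treats a failed close as a failed save. The name save_text() is made up for the example.

```c
#include <stdio.h>

/* Hypothetical helper: write "text" to "path".
   Returns 0 only if the open, the write, and the close all succeeded. */
int save_text(const char *path, const char *text)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;

    int failed = (fputs(text, fp) == EOF);

    /* Buffered data may only reach the disk here, so check this too */
    if (fclose(fp) == EOF)
        failed = 1;

    return failed ? -1 : 0;
}
```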

Example
FILE *fp;

fp = fopen("spoonDB.dat", "r");  // (you should error-check this)
sort_spoon_database(fp);
fclose(fp);  // pretty simple, huh.

See Also
fopen()

printf(), fprintf()
Print a formatted string to the console or to a file.

Synopsis
#include <stdio.h>

int printf(const char *format, ...);


int fprintf(FILE *stream, const char *format, ...);

Description
These functions print formatted strings to a file (that is, a FILE* you likely got from fopen()), or to the
console (which is usually itself just a special file, right?)
The printf() function is legendary as being one of the most flexible outputting systems ever devised. It
can also get a bit freaky here or there, most notably in the format string. We’ll take it a step at a time here.
The easiest way to look at the format string is that it will print everything in the string as-is, unless a character
has a percent sign (%) in front of it. That’s when the magic happens: the next argument in the printf()
argument list is printed in the way described by the percent code.
Here are the most common percent codes:
%d

Print the next argument as a signed decimal number, like 3490. The argument printed this way should be an
int.

%f

Print the next argument as a signed floating point number, like 3.14159. The argument printed this way
should be a double (a float argument is automatically promoted to double when passed).
%c

Print the next argument as a character, like 'B'. The argument printed this way should be a char.
%s

Print the next argument as a string, like "Did you remember your mittens?". The argument printed this
way should be a char* or char[].
%%

No arguments are converted, and a plain old run-of-the-mill percent sign is printed. This is how you print a
‘%’ using printf().
So those are the basics. I’ll give you some more of the percent codes in a bit, but let’s get some more breadth
before then. There’s actually a lot more that you can specify in there after the percent sign.
For one thing, you can put a field width in there—this is a number that tells printf() how many spaces to
put on one side or the other of the value you’re printing. That helps you line things up in nice columns. If
the number is negative, the result becomes left-justified instead of right-justified. Example:
printf("%10d", x); /* prints X on the right side of the 10-space field */
printf("%-10d", x); /* prints X on the left side of the 10-space field */

If you don’t know the field width in advance, you can use a little kung-foo to get it from the argument list
just before the argument itself. Do this by placing your seat and tray tables in the fully upright position. The
seatbelt is fastened by placing the—cough. I seem to have been doing way too much flying lately. Ignoring

that useless fact completely, you can specify a dynamic field width by putting a * in for the width. If you are
not willing or able to perform this task, please notify a flight attendant and we will reseat you.
int width = 12;
int value = 3490;

printf("%*d\n", width, value);

You can also put a “0” in front of the number if you want it to be padded with zeros:
int x = 17;
printf("%05d", x); /* "00017" */

When it comes to floating point, you can also specify how many decimal places to print by making a field
width of the form “x.y” where x is the field width (you can leave this off if you want it to be just wide
enough) and y is the number of digits past the decimal point to print:
float f = 3.1415926535;

printf("%.2f", f);   /* "3.14" */
printf("%7.3f", f);  /* "  3.142" <-- 7 characters across */

Ok, those above are definitely the most common uses of printf(), but there are still more modifiers you
can put in after the percent and before the field width:
0

This was already mentioned above. It pads the spaces before a number with zeros, e.g. "%05d".
-

This was also already mentioned above. It causes the value to be left-justified in the field, e.g. "%-5d".
' ' (space)

This prints a blank space before a positive number, so that it will line up in a column along with negative
numbers (which have a negative sign in front of them). "% d".
+

Always puts a + sign in front of a number that you print so that it will line up in a column along with negative
numbers (which have a negative sign in front of them). "%+d".
#

This causes the output to be printed in a different form than normal. The results vary based on the specifier
used, but generally, hexadecimal output ("%x") gets a "0x" prepended to the output, and octal output ("%o")
gets a "0" prepended to it. These are, if you’ll notice, how such numbers are represented in C source.
Additionally, floating point numbers, when printed with this # modifier, will print a trailing decimal point
even if the number has no fractional part. Example: "%#x".
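To compare the modifiers side by side, here’s a sketch that runs one value through several of them. It uses snprintf(), which formats into a buffer instead of printing, purely so the result is easy to inspect; the format string rules are the same as for printf(). The function name show_flags() is made up, and the %x and %o conversions get an unsigned argument because that’s what they expect.

```c
#include <stdio.h>

/* Hypothetical helper: format "v" with a variety of modifiers.
   Returns what snprintf() returns (the length of the full result). */
int show_flags(char *buf, size_t size, int v)
{
    return snprintf(buf, size,
                    "[%5d] [%-5d] [%05d] [%+d] [% d] [%#x] [%#o]",
                    v, v, v, v, v, (unsigned)v, (unsigned)v);
}
```

For v equal to 42, the buffer ends up holding:

[   42] [42   ] [00042] [+42] [ 42] [0x2a] [052]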
Now, I know earlier I promised the rest of the format specifiers…so ok, here they are:
%i

Just like %d, above.


%o

Prints the integer number out in octal format. Octal is a base-eight number representation scheme invented
on the planet Krylon where all the inhabitants have only eight fingers.
%u

Just like %d, but works on unsigned ints, instead of ints.



%x or %X

Prints the unsigned int argument in hexadecimal (base-16) format. This is for people with 16 fingers, or
people who are simply addicted to hex, like you should be. Just try it! "%x" prints the hex digits in lowercase,
while "%X" prints them in uppercase.
%F

Just like “%f”, except any string-based results (which can happen for numbers like infinity) are printed in
uppercase.
%e or %E

Prints the float argument in exponential (scientific) notation. This is your classic form similar to “three
times 10 to the 8th power”, except printed in text form: “3e8”. (You see, the “e” is read “times 10 to the”.)
If you use the "%E" specifier, then the exponent “e” is written in uppercase, a la “3E8”.
%g or %G

Another way of printing doubles. In this case the precision you specify tells it how many significant figures
to print.
%p

Prints a pointer type out in hex representation. In other words, the address that the pointer is pointing to is
printed. (Not the value in the address, but the address number itself.)
%n

This specifier is cool and different, and rarely needed. It doesn’t actually print anything, but stores the number
of characters printed so far in the next pointer argument in the list.
int numChars;
float a = 3.14159;
int b = 3490;

printf("%f %d%n\n", a, b, &numChars);


printf("The above line contains %d characters.\n", numChars);

The above example will print out the values of a and b, and then store the number of characters printed so
far into the variable numChars. The next call to printf() prints out that result.
So let’s recap what we have here. We have a format string in the form:
"%[modifier][fieldwidth][.precision][lengthmodifier][formatspecifier]"

Modifier is like the "-" for left justification, the field width is how wide a space to print the result in, the
precision is, for floats, how many decimal places to print, and the format specifier is like %d.
That wraps it up, except what’s this “lengthmodifier” I put up there?! Yes, just when you thought things were
under control, I had to add something else on there. Basically, it’s to tell printf() in more detail what size
the arguments are. For instance, char, short, int, and long are all integer types, but they all use a different
number of bytes of memory, so you can’t use plain old “%d” for all of them, right? How can printf() tell
the difference?
The answer is that you tell it explicitly using another optional letter (the length modifier, this) before the type
specifier. If you omit it, then the basic types are assumed (like %d is for int, and %f is for float).
Here are the length modifiers:
h

Integer referred to is a short integer, e.g. “%hd” is a short and “%hu” is an unsigned short.
l (“ell”)

Integer referred to is a long integer, e.g. “%ld” is a long and “%lu” is an unsigned long.
hh

Integer referred to is a char integer, e.g. “%hhd” is a char and “%hhu” is an unsigned char.
ll (“ell ell”)

Integer referred to is a long long integer, e.g. “%lld” is a long long and “%llu” is an unsigned long
long.

I know it’s hard to believe, but there might be still more format and length specifiers on your system. Check
your manual for more information.
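Here’s a short sketch that pairs each of those length modifiers with a matching variable. As before, it formats into a buffer with snprintf() so the result can be checked, and the function name show_sizes() is made up. The value given to the long long is deliberately too big for a 32-bit int.

```c
#include <stdio.h>

/* Hypothetical helper: format four integers of different sizes, each
   with the length modifier that matches its type. */
int show_sizes(char *buf, size_t size)
{
    unsigned char tiny  = 200;
    short         small = -3;
    long          big   = 100000L;
    long long     huge  = 5000000000LL;

    return snprintf(buf, size, "%hhu %hd %ld %lld", tiny, small, big, huge);
}
```

The buffer ends up holding:

200 -3 100000 5000000000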

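Return Value
Both functions return the number of characters printed, or a negative value if an output error occurs.
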
Example
int a = 100;
float b = 2.717;
char *c = "beej!";
char d = 'X';
int e = 5;

printf("%d", a);  /* "100"      */
printf("%f", b);  /* "2.717000" */
printf("%s", c);  /* "beej!"    */
printf("%c", d);  /* "X"        */
printf("110%%");  /* "110%"     */

printf("%10d\n", a);    /* "       100" */
printf("%-10d\n", a);   /* "100       " */
printf("%*d\n", e, a);  /* "  100"      */
printf("%.2f\n", b);    /* "2.72"       */

printf("%hhd\n", d);  /* "88" <-- ASCII code for 'X' */

printf("%5d %5.2f %c\n", a, b, d);  /* "  100  2.72 X" */

See Also
sprintf(), vprintf(), vfprintf(), vsprintf()

scanf(), fscanf()
Read formatted string, character, or numeric data from the console or from a file.

Synopsis
#include <stdio.h>

int scanf(const char *format, ...);


int fscanf(FILE *stream, const char *format, ...);

Description
The scanf() family of functions reads data from the console or from a FILE stream, parses it, and stores
the results away in variables you provide in the argument list.
The format string is very similar to that in printf() in that you can tell it to read a "%d", for instance for an
int. But it also has additional capabilities, most notably that it can eat up other characters in the input that
you specify in the format string.
But let’s start simple, and look at the most basic usage first before plunging into the depths of the function.
We’ll start by reading an int from the keyboard:
int a;

scanf("%d", &a);

scanf() obviously needs a pointer to the variable if it is going to change the variable itself, so we use the
address-of operator to get the pointer.
In this case, scanf() walks down the format string, finds a “%d”, and then knows it needs to read an integer
and store it in the next variable in the argument list, a.
Here are some of the other percent-codes you can put in the format string:
%d

Reads an integer to be stored in an int. This integer can be signed.


%f (%e, %E, and %g are equivalent)

Reads a floating point number, to be stored in a float.


%s

Reads a string. This will stop on the first whitespace character reached, or at the specified field width
(e.g. “%10s”), whichever comes first.
And here are some more codes, except these don’t tend to be used as often. You, of course, may use them as
often as you wish!
%u

Reads an unsigned integer to be stored in an unsigned int.


%x (%X is equivalent)

Reads an unsigned hexadecimal integer to be stored in an unsigned int.


%o

Reads an unsigned octal integer to be stored in an unsigned int.


%i

Like %d, except you can preface the input with “0x” if it’s a hex number, or “0” if it’s an octal number.
%c

Reads in a character to be stored in a char. If you specify a field width (e.g. “%12c”), it will read that many
characters, so make sure you have an array that large to hold them. No NUL terminator is added.
%p

Reads in a pointer to be stored in a void*. The format of this pointer should be the same as that which is
outputted with printf() and the “%p” format specifier.
%n

Reads nothing, but will store the number of characters processed so far into the next int parameter in the
argument list.
%%

Matches a literal percent sign. No conversion of parameters is done. This is simply how you get a standalone
percent sign in your string without scanf() trying to do something with it.
%[

This is about the weirdest format specifier there is. It allows you to specify a set of characters to be stored
away (likely in an array of chars). Conversion stops when a character that is not in the set is encountered.
For example, %[0-9] means “match all numbers zero through nine.” And %[AD-G34] means “match A, D
through G, 3, or 4”.
Now, to convolute matters, you can tell scanf() to match characters that are not in the set by putting a caret
(^) directly after the %[ and following it with the set, like this: %[^A-C], which means “match all characters
that are not A through C.”
To match a close square bracket, make it the first character in the set, like this: %[]A-C] or %[^]A-C]. (I
added the “A-C” just so it was clear that the “]” was first in the set.)
To match a hyphen, make it the last character in the set: %[A-C-].
So if we wanted to match all characters except “%”, “^”, “]”, “B”, “C”, “D”, “E”, and “-”, we could use this
format string: %[^]%^B-E-].
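Here is a small sketch of scansets at work. It uses sscanf(), the string-reading sibling of scanf(), so you can try it without typing anything. The helper name split_pair() and the 32-byte buffer sizes are assumptions made up for this illustration.

```c
#include <stdio.h>

// Split "key=value" into its two halves using scansets (hypothetical helper).
// key and value must each have room for at least 32 bytes.
// Returns the number of fields stored: 2 on success.
int split_pair(const char *line, char *key, char *value)
{
    // %31[^=]  : up to 31 characters that are not '='
    // =        : eat the literal '='
    // %31[^\n] : up to 31 characters that are not a newline
    return sscanf(line, "%31[^=]=%31[^\n]", key, value);
}
```

The field widths are there so a long input can’t run off the end of the arrays, and unlike “%s”, the second scanset happily reads spaces.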
So those are the basics! Phew! There’s a lot of stuff to know, but, like I said, a few of these format specifiers
are common, and the others are pretty rare.
Got it? Now we can go onto the next—no wait! There’s more! Yes, still more to know about scanf(). Does
it never end? Try to imagine how I feel writing about it!
So you know that “%d” stores into an int. But how do you store into a long, short, or double?
Well, like in printf(), you can add a modifier before the type specifier to tell scanf() that you have a
longer or shorter type. The following is a table of the possible modifiers:
h

The value to be parsed is a short int or short unsigned. Example: %hd or %hu.
l

The value to be parsed is a long int or long unsigned, or double (for %f conversions). Example: %ld,
%lu, or %lf.
L

The value to be parsed is a long double for float types. Example: %Lf. Some systems also accept L with
integer conversions (%Ld, %Lu) to mean long long, but the standard modifier for long long is ll, as in
%lld or %llu.
*

Tells scanf() to do the conversion specified, but not store it anywhere. It simply discards the data as it reads
it. This is what you use if you want scanf() to eat some data but you don’t want to store it anywhere; you
don’t give scanf() an argument for this conversion. Example: %*d.

Return Value
scanf() returns the number of items assigned into variables. Since assignment into variables stops when
given invalid input for a certain format specifier, this can tell you if you’ve input all your data correctly.
Also, scanf() returns EOF on end-of-file.
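Checking that return value is how you find out whether the input was any good. A minimal sketch, again using sscanf() so it can run on a plain string; the helper name parse_size() is hypothetical.

```c
#include <stdio.h>

// Parse "WIDTHxHEIGHT" (e.g. "640x480") from a string (hypothetical helper).
// Returns 1 if both numbers were converted, 0 otherwise.
int parse_size(const char *text, int *width, int *height)
{
    int converted = sscanf(text, "%dx%d", width, height);

    // converted is 2 only if both %d conversions were stored; it is
    // 1 or 0 on bad input, and EOF if the input ran out before the
    // first conversion
    return converted == 2;
}
```

Comparing against the number of conversions you asked for catches bad input and EOF in one test.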

Example
int a;
long int b;
unsigned int c;
float d;
double e;
long double f;
char s[100];

scanf("%d", &a); // store an int
scanf(" %d", &a); // eat any whitespace, then store an int
scanf("%s", s); // store a string
scanf("%Lf", &f); // store a long double

// store an unsigned, read all whitespace, then store a long int:
scanf("%u %ld", &c, &b);

// store an int, read whitespace, read "blendo", read whitespace,
// and store a float:
scanf("%d blendo %f", &a, &d);

// read all whitespace, then store all characters up to a newline
scanf(" %[^\n]", s);

// store a float, read (and ignore) an int, then store a double:
scanf("%f %*d %lf", &d, &e);

// store 10 characters:
scanf("%10c", s);

See Also
sscanf(), vscanf(), vsscanf(), vfscanf()

gets(), fgets()
Read a string from console or file

Synopsis
#include <stdio.h>

char *fgets(char *s, int size, FILE *stream);


char *gets(char *s);

Description
These are functions that will retrieve a newline-terminated string from the console or a file. In other normal
words, it reads a line of text. The behavior is slightly different, and, as such, so is the usage. For instance,
here is the usage of gets():
Don’t use gets().
Admittedly, rationale would be useful, yes? For one thing, gets() doesn’t allow you to specify the length
of the buffer to store the string in. This would allow people to keep entering data past the end of your buffer,
and believe me, this would be Bad News.
I was going to add another reason, but that’s basically the primary and only reason not to use gets(). As
you might suspect, fgets() allows you to specify a maximum string length.
One difference here between the two functions: gets() will devour and throw away the newline at the end
of the line, while fgets() will store it at the end of your string (space permitting).
Here’s an example of using fgets() from the console, making it behave more like gets():
char s[100];
gets(s); // don't use this--read a line (from stdin)
fgets(s, sizeof(s), stdin); // read a line from stdin

In this case, the sizeof() operator gives us the total size of the array in bytes, and since a char is a byte, it
conveniently gives us the number of chars the array can hold.
Of course, like I keep saying, the string returned from fgets() probably has a newline at the end that you
might not want. You can write a short function to chop the newline off, like so:
#include <string.h>

char *remove_newline(char *s)
{
    size_t len = strlen(s);

    if (len > 0 && s[len-1] == '\n') // if there's a newline
        s[len-1] = '\0'; // truncate the string

    return s;
}

So, in summary, use fgets() to read a line of text from the keyboard or a file, and don’t use gets().

Return Value
Both gets() and fgets() return a pointer to the string passed.
On error or end-of-file, the functions return NULL.

Example
char s[100];
FILE *fp;

gets(s); // read from standard input (don't use this--use fgets()!)

fgets(s, sizeof(s), stdin); // read at most 99 characters from standard input

fp = fopen("datafile.dat", "r"); // (you should error-check this)
fgets(s, 100, fp); // read at most 99 characters from the file datafile.dat
fclose(fp);

fgets(s, 20, stdin); // read a maximum of 19 characters from stdin

See Also
getc(), fgetc(), getchar(), puts(), fputs(), ungetc()

getc(), fgetc(), getchar()


Get a single character from the console or from a file.

Synopsis
#include <stdio.h>

int getc(FILE *stream);


int fgetc(FILE *stream);
int getchar(void);

Description
All of these functions, in one way or another, read a single character from the console or from a FILE. The
differences are fairly minor, and here are the descriptions:
getc() returns a character from the specified FILE. From a usage standpoint, it’s equivalent to the same
fgetc() call, and fgetc() is a little more common to see. Only the implementation of the two functions
differs.
fgetc() returns a character from the specified FILE. From a usage standpoint, it’s equivalent to the same
getc() call, except that fgetc() is a little more common to see. Only the implementation of the two
functions differs.
Yes, I cheated and used cut-n-paste to do that last paragraph.
getchar() returns a character from stdin. In fact, it’s the same as calling getc(stdin).

Return Value
All three functions return the unsigned char that they read, except it’s cast to an int.
If end-of-file or an error is encountered, all three functions return EOF.
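That int return type matters: if you store the result in a char, a perfectly legal byte such as 0xFF can end up looking just like EOF. Here is a minimal sketch of the usual loop; the helper name count_bytes() is made up for this illustration.

```c
#include <stdio.h>

// Count the bytes left in a stream (hypothetical helper).
long count_bytes(FILE *fp)
{
    int c;           // int, not char, so EOF can be told apart from real data
    long count = 0;

    while ((c = getc(fp)) != EOF)
        count++;

    return count;
}
```

Handing it a stream positioned at the start of a four-byte file gives back 4, even if one of those bytes is 0xFF.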

Example
// read all characters from a file, outputting only the letter 'b's
// it finds in the file

#include <stdio.h>

int main(void)
{
    FILE *fp;
    int c;

    fp = fopen("datafile.txt", "r"); // error check this!

    // this while-statement assigns into c, and then checks against EOF:

    while((c = fgetc(fp)) != EOF) {
        if (c == 'b') {
            putchar(c);
        }
    }

    fclose(fp);
}

See Also

puts(), fputs()
Write a string to the console or to a file.

Synopsis
#include <stdio.h>

int puts(const char *s);


int fputs(const char *s, FILE *stream);

Description
Both these functions output a NUL-terminated string. puts() outputs to the console and adds a newline at the
end, while fputs() allows you to specify the file for output and adds nothing.

Return Value
Both functions return non-negative on success, or EOF on error.

Example
// read strings from the console and save them in a file

#include <stdio.h>

int main(void)
{
    FILE *fp;
    char s[100];

    fp = fopen("datafile.txt", "w"); // error check this!

    while(fgets(s, sizeof(s), stdin) != NULL) { // read a string
        fputs(s, fp); // write it to the file we opened
    }

    fclose(fp);
}

See Also

putc(), fputc(), putchar()


Write a single character to the console or to a file.

Synopsis
#include <stdio.h>

int putc(int c, FILE *stream);


int fputc(int c, FILE *stream);
int putchar(int c);

Description
All three functions output a single character, either to the console or to a FILE.
putc() takes a character argument, and outputs it to the specified FILE. fputc() does exactly the same
thing, and differs from putc() in implementation only. Most people use fputc().
putchar() writes the character to the console, and is the same as calling putc(c, stdout).

Return Value
All three functions return the character written on success, or EOF on error.

Example
// print the alphabet

#include <stdio.h>

int main(void)
{
    char i;

    for(i = 'A'; i <= 'Z'; i++)
        putchar(i);

    putchar('\n'); // put a newline at the end to make it pretty
}

See Also

fseek(), rewind()
Position the file pointer in anticipation of the next read or write.

Synopsis
#include <stdio.h>

int fseek(FILE *stream, long offset, int whence);


void rewind(FILE *stream);

Description
When doing reads and writes to a file, the OS keeps track of where you are in the file using a counter
generically known as the file pointer. You can reposition the file pointer to a different point in the file using
the fseek() call. Think of it as a way to randomly access your file.
The first argument is the file in question, obviously. The offset argument is the position that you want to seek
to, and whence is what that offset is relative to.
Of course, you probably like to think of the offset as being from the beginning of the file. I mean, “Seek to
position 3490, that should be 3490 bytes from the beginning of the file.” Well, it can be, but it doesn’t have
to be. Imagine the power you’re wielding here. Try to command your enthusiasm.
You can set the value of whence to one of three things:
SEEK_SET

offset is relative to the beginning of the file. This is probably what you had in mind anyway, and is the
most commonly used value for whence.
SEEK_CUR

offset is relative to the current file pointer position. So, in effect, you can say, “Move to my current position
plus 30 bytes,” or, “move to my current position minus 20 bytes.”
SEEK_END

offset is relative to the end of the file. Just like SEEK_SET except from the other end of the file. Be sure to
use negative values for offset if you want to back up from the end of the file, instead of going past the end
into oblivion.
Speaking of seeking off the end of the file, can you do it? Sure thing. In fact, you can seek way off the end
and then write a character; the file will be expanded to a size big enough to hold a bunch of zeros way out to
that character.
Now that the complicated function is out of the way, what’s this rewind() that I briefly mentioned? It
repositions the file pointer at the beginning of the file:
fseek(fp, 0, SEEK_SET); // same as rewind()
rewind(fp); // same as fseek(fp, 0, SEEK_SET)
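One common use for SEEK_END is finding out how big a file is: seek to the end and ask ftell() where you are. A minimal sketch follows; the helper name file_size() is hypothetical, and it assumes a stream opened in binary mode on a system where seeking to the end of such a stream is supported (the standard does not strictly promise that everywhere).

```c
#include <stdio.h>

// Return the size of an open file in bytes, or -1 on error
// (hypothetical helper). The original position is restored.
long file_size(FILE *fp)
{
    long start, size;

    start = ftell(fp);                    // remember where we are
    if (start < 0)
        return -1;

    if (fseek(fp, 0, SEEK_END) != 0)      // jump to the end
        return -1;

    size = ftell(fp);                     // offset of the end == size

    if (fseek(fp, start, SEEK_SET) != 0)  // go back where we started
        return -1;

    return size;
}
```

Since the position is put back afterward, you can call this in the middle of reading a file without losing your place.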

Return Value
For fseek(), on success zero is returned; a nonzero value (typically -1) is returned on failure.
The call to rewind() never fails.

Example
fseek(fp, 100, SEEK_SET); // seek to the 100th byte of the file
fseek(fp, -30, SEEK_CUR); // seek backward 30 bytes from the current pos
fseek(fp, -10, SEEK_END); // seek to the 10th byte before the end of file

fseek(fp, 0, SEEK_SET); // seek to the beginning of the file
rewind(fp); // seek to the beginning of the file

See Also
ftell(), fgetpos(), fsetpos()

ftell()
Tells you where a particular file is about to read from or write to.

Synopsis
#include <stdio.h>

long ftell(FILE *stream);

Description
This function is the opposite of fseek(). It tells you where in the file the next file operation will occur
relative to the beginning of the file.
It’s useful if you want to remember where you are in the file, fseek() somewhere else, and then come back
later. You can take the return value from ftell() and feed it back into fseek() (with whence parameter
set to SEEK_SET) when you want to return to your previous position.

Return Value
Returns the current offset in the file, or -1 on error.

Example
long pos;

// store the current position in variable "pos":
pos = ftell(fp);

// seek ahead 10 bytes:
fseek(fp, 10, SEEK_CUR);

// do some mysterious writes to the file
do_mysterious_writes_to_file(fp);

// and return to the starting position, stored in "pos":
fseek(fp, pos, SEEK_SET);

See Also
fseek(), rewind(), fgetpos(), fsetpos()

fgetpos(), fsetpos()
Get the current position in a file, or set the current position in a file. Just like ftell() and fseek() for most
systems.

Synopsis
#include <stdio.h>

int fgetpos(FILE *stream, fpos_t *pos);


int fsetpos(FILE *stream, const fpos_t *pos);

Description
These functions are just like ftell() and fseek(), except instead of counting in bytes, they use an opaque
data structure to hold positional information about the file. (Opaque, in this case, means you’re not supposed
to know what the data type is made up of.)
On virtually every system (and certainly every system that I know of), people don’t use these functions, using
ftell() and fseek() instead. These functions exist just in case your system can’t remember file positions
as a simple byte offset.
Since the pos variable is opaque, you have to assign to it using the fgetpos() call itself. Then you save the
value for later and use it to reset the position using fsetpos().

Return Value
Both functions return zero on success, and nonzero (typically -1) on error.

Example
char s[100];
fpos_t pos;

fgets(s, sizeof(s), fp); // read a line from the file

fgetpos(fp, &pos); // save the position

fgets(s, sizeof(s), fp); // read another line from the file

fsetpos(fp, &pos); // now restore the position to where we saved

See Also
fseek(), ftell(), rewind()

ungetc()
Pushes a character back into the input stream.

Synopsis
#include <stdio.h>

int ungetc(int c, FILE *stream);

Description
You know how getc() reads the next character from a file stream? Well, this is the opposite of that—it
pushes a character back into the file stream so that it will show up again on the very next read from the
stream, as if you’d never gotten it from getc() in the first place.
Why, in the name of all that is holy, would you want to do that? Perhaps you have a stream of data that
you’re reading a character at a time, and you won’t know to stop reading until you get a certain character,
but you want to be able to read that character again later. You can read the character, see that it’s what you’re
supposed to stop on, and then ungetc() it so it’ll show up on the next read.
Yeah, that doesn’t happen very often, but there we are.
Here’s the catch: the standard only guarantees that you’ll be able to push back one character. Some
implementations might allow you to push back more, but there’s really no way to tell and still be portable.
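The classic use of that single guaranteed pushback is a “peek” at the next character. A minimal sketch, with a made-up helper name peekc():

```c
#include <stdio.h>

// Look at the next character without consuming it (hypothetical helper).
// Returns the character, or EOF if there is nothing left to read.
int peekc(FILE *fp)
{
    int c = fgetc(fp);

    if (c != EOF)
        ungetc(c, fp);   // push it back so the next read sees it again

    return c;
}
```

Because it only ever pushes back the one character it just read, this stays within what the standard promises.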

Return Value
On success, ungetc() returns the character you passed to it. On failure, it returns EOF.

Example
// read a piece of punctuation, then everything after it up to the next
// piece of punctuation. return the punctuation, and store the rest
// in a string
//
// sample input: !foo#bar*baz
// output: return value: '!', s is "foo"
//         return value: '#', s is "bar"
//         return value: '*', s is "baz"
//

#include <stdio.h>
#include <ctype.h>

int read_punctstring(FILE *fp, char *s)
{
    int origpunct, c; // ints, so EOF can be told apart from real characters

    origpunct = fgetc(fp);

    if (origpunct == EOF) // return EOF on end-of-file
        return EOF;

    while(c = fgetc(fp), c != EOF && !ispunct(c)) {
        *s++ = c; // save it in the string
    }
    *s = '\0'; // nul-terminate the string!

    // if we read punctuation last, ungetc it so we can fgetc it next
    // time:
    if (c != EOF && ispunct(c)) {
        ungetc(c, fp);
    }

    return origpunct;
}

See Also
fgetc()

fread()
Read binary data from a file.

Synopsis
#include <stdio.h>

size_t fread(void *p, size_t size, size_t nmemb, FILE *stream);

Description
You might remember that you can call fopen() with the “b” flag in the open mode string to open the file in
“binary” mode. Files opened in non-binary (text) mode can be read using standard character-oriented
calls like fgetc() or fgets(). Files opened in binary mode are typically read using the fread() function.
All this function does is say, “Hey, read this many things where each thing is a certain number of bytes, and
store the whole mess of them in memory starting at this pointer.”
This can be very useful, believe me, when you want to do something like store 20 ints in a file.
But wait—can’t you use fprintf() with the “%d” format specifier to save the ints to a text file and store
them that way? Yes, sure. That has the advantage that a human can open the file and read the numbers. It
has the disadvantage that it’s slower to convert the numbers from ints to text and that the numbers are likely
to take more space in the file. (Remember, an int is likely 4 bytes, but the string “12345678” is 8 bytes.)
So storing the binary data can certainly be more compact and faster to read.
(As for the prototype, what is this size_t you see floating around? It’s short for “size type” which is a data
type defined to hold the size of something. Great—would I stop beating around the bush already and give
you the straight story?! Ok, size_t is an unsigned integer type, probably an unsigned long or something like it.)

Return Value
This function returns the number of items successfully read. If all requested items are read, the return value
will be equal to that of the parameter nmemb. If EOF occurs first, the return value will be less than nmemb
(possibly zero).
To make you confused, it will also return a short count if there’s an error. You can use the functions feof() or
ferror() to tell which one really happened.
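Telling “ran out of file” apart from “something broke” might look like the sketch below. The helper name read_ints() and its hit_error out-parameter are assumptions made up for this illustration.

```c
#include <stdio.h>

// Read up to max ints from fp into out (hypothetical helper).
// Returns the number of ints actually read; sets *hit_error to 1 if the
// read came up short because of an error rather than end-of-file.
size_t read_ints(FILE *fp, int *out, size_t max, int *hit_error)
{
    size_t got = fread(out, sizeof(int), max, fp);

    *hit_error = 0;
    if (got < max && ferror(fp))
        *hit_error = 1;

    return got;
}
```

If a file holds only three ints and you ask for five, this returns 3 with *hit_error left at 0, and feof() on the stream will be true afterward.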

Example
// read 10 numbers from a file and store them in an array

#include <stdio.h>

int main(void)
{
    int i;
    int n[10];
    FILE *fp;

    fp = fopen("binarynumbers.dat", "rb");
    fread(n, sizeof(int), 10, fp); // read 10 ints
    fclose(fp);

    // print them out:
    for(i = 0; i < 10; i++)
        printf("n[%d] == %d\n", i, n[i]);
}

See Also
fopen(), fwrite(), feof(), ferror()

fwrite()
Write binary data to a file.

Synopsis
#include <stdio.h>

size_t fwrite(const void *p, size_t size, size_t nmemb, FILE *stream);

Description
This is the counterpart to the fread() function. It writes blocks of binary data to disk. For a description of
what this means, see the entry for fread().

Return Value
fwrite() returns the number of items successfully written, which should hopefully be nmemb that you passed
in. It’ll return a smaller number (possibly zero) on error.

Example
// save 10 random numbers to a file

#include <stdio.h>
#include <stdlib.h> // for rand()

int main(void)
{
    int i;
    int r[10];
    FILE *fp;

    // populate the array with random numbers:
    for(i = 0; i < 10; i++) {
        r[i] = rand();
    }

    // save the random numbers (10 ints) to the file
    fp = fopen("binaryfile.dat", "wb");
    fwrite(r, sizeof(int), 10, fp); // write 10 ints
    fclose(fp);
}

See Also
fopen(), fread()

feof(), ferror(), clearerr()

Determine if a file has reached end-of-file or if an error has occurred.

Synopsis
#include <stdio.h>

int feof(FILE *stream);


int ferror(FILE *stream);
void clearerr(FILE *stream);

Description
Each FILE* that you use to read and write data from and to a file contains flags that the system sets when
certain events occur. If you get an error, it sets the error flag; if you reach the end of the file during a read, it
sets the EOF flag. Pretty simple really.
The functions feof() and ferror() give you a simple way to test these flags: they’ll return non-zero (true)
if they’re set.
Once the flags are set for a particular stream, they stay that way until you call clearerr() to clear them.
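To see the flags stick and then get cleared, here is a minimal sketch. The helper name eof_flag_after_clear() is made up for this illustration.

```c
#include <stdio.h>

// Read to the end of fp, then clear the flags (hypothetical helper).
// Returns the value of feof() after the call to clearerr().
int eof_flag_after_clear(FILE *fp)
{
    while (fgetc(fp) != EOF)
        ;                 // discard everything up to end-of-file

    // feof(fp) is now non-zero, and would stay that way...

    clearerr(fp);         // ...until we reset both flags

    return feof(fp);      // 0: the EOF flag is gone
}
```

Clearing the flag doesn’t make more data appear, of course: the next read at the end of the file just sets it again.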

Return Value
feof() and ferror() return non-zero (true) if the file has reached EOF or there has been an error,
respectively.

Example
// read binary data, checking for eof or error

#include <stdio.h>

int main(void)
{
    int a;
    FILE *fp;

    fp = fopen("binaryints.dat", "rb");

    // read single ints at a time, stopping on EOF or error:

    while(fread(&a, sizeof(int), 1, fp), !feof(fp) && !ferror(fp)) {
        printf("I read %d\n", a);
    }

    if (feof(fp))
        printf("End of file was reached.\n");

    if (ferror(fp))
        printf("An error occurred.\n");

    fclose(fp);
}

See Also
fopen(), fread()

perror()
Print the last error message to stderr

Synopsis
#include <stdio.h>
#include <errno.h> // only if you want to directly use the "errno" var

void perror(const char *s);

Description
Many functions, when they encounter an error condition for whatever reason, will set a global variable called
errno for you. errno is just an integer representing a unique error.

But to you, the user, some number isn’t generally very useful. For this reason, you can call perror() after
an error occurs to print what error has actually happened in a nice human-readable string.
And to help you along, you can pass a parameter, s, that will be prepended to the error string for you.
One more clever trick you can do is check the value of the errno (you have to include errno.h to see it)
for specific errors and have your code do different things. Perhaps you want to ignore certain errors but not
others, for instance.
The catch is that different systems define different values for errno, so it’s not very portable. The standard
only defines a few math-related values, and not others. You’ll have to check your local man-pages for what
works on your system.
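If you want the same sort of message perror() prints but need it in a string (for a log file, say), the standard strerror() function from string.h turns an errno value into its text. A rough sketch follows; the helper name format_error() is hypothetical, and the exact wording of the message still depends on your system.

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>

// Build a perror()-style message into a buffer instead of printing it
// (hypothetical helper). Returns what snprintf() returns.
int format_error(char *buf, size_t size, const char *prefix, int errnum)
{
    // strerror() maps an errno value to its human-readable string
    return snprintf(buf, size, "%s: %s", prefix, strerror(errnum));
}
```

Right after a failed call you would pass errno itself, something like format_error(buf, sizeof buf, "fseek", errno), before any other library call gets a chance to change it.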

Return Value
Returns nothing at all! Sorry!

Example
fseek() returns -1 on error, and sets errno, so let’s use it. Seeking on stdin makes no sense, so it should
generate an error:
#include <stdio.h>
#include <errno.h> // must include this to see "errno" in this example

int main(void)
{
    if (fseek(stdin, 10L, SEEK_SET) < 0)
        perror("fseek");

    fclose(stdin); // stop using this stream

    if (fseek(stdin, 20L, SEEK_CUR) < 0) {

        // specifically check errno to see what kind of
        // error happened...this works on Linux, but your
        // mileage may vary on other systems!

        if (errno == EBADF) {
            perror("fseek again, EBADF");
        } else {
            perror("fseek again");
        }
    }
}

And the output is:


fseek: Illegal seek
fseek again, EBADF: Bad file descriptor

See Also
feof(), ferror(), clearerr()

remove()
Delete a file

Synopsis
#include <stdio.h>

int remove(const char *filename);

Description
Removes the specified file from the filesystem. It just deletes it. Nothing magical. Simply call this function
and sacrifice a small chicken and the requested file will be deleted.

Return Value
Returns zero on success, and nonzero (typically -1) on error. Many systems also set errno.
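Checking that return value might look like the sketch below, which makes a scratch file and then deletes it. The helper name create_then_remove() is made up for this illustration, and whatever filename you pass in is your own choice.

```c
#include <stdio.h>

// Create an empty file and delete it again (hypothetical helper).
// Returns 0 if both steps worked, -1 otherwise.
int create_then_remove(const char *filename)
{
    FILE *fp = fopen(filename, "w");

    if (fp == NULL) {
        perror("fopen");
        return -1;
    }
    fclose(fp);   // close it first; removing an open file isn't portable

    if (remove(filename) != 0) {   // nonzero means the delete failed
        perror("remove");
        return -1;
    }

    return 0;
}
```

Calling remove() a second time on the same name fails, since the file is already gone.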

Example
char *filename = "/home/beej/evidence.txt";

remove(filename);
remove("/disks/d/Windows/system.ini");

See Also
rename()

rename()
Renames a file and optionally moves it to a new location

Synopsis
#include <stdio.h>

int rename(const char *old, const char *new);

Description
Renames the file old to name new. Use this function if you’re tired of the old name of the file, and you are
ready for a change. Sometimes simply renaming your files makes them feel new again, and could save you
money over just getting all new files!
One other cool thing you can do with this function is actually move a file from one directory to another by
specifying a different path for the new name.

Return Value
Returns zero on success, and nonzero (typically -1) on error. Many systems also set errno.

Example
rename("foo", "bar"); // changes the name of the file "foo" to "bar"

// the following moves the file "evidence.txt" from "/tmp" to
// "/home/beej", and also renames it to "nothing.txt":
rename("/tmp/evidence.txt", "/home/beej/nothing.txt");

See Also
remove()

tmpfile()
Create a temporary file

Synopsis
#include <stdio.h>

FILE *tmpfile(void);

Description
This is a nifty little function that will create and open a temporary file for you, and will return a FILE* to it
that you can use. The file is opened with mode “w+b”, so it’s suitable for reading, writing, and binary data.
By using a little magic, the temp file is automatically deleted when it is fclose()’d or when your program
exits. (Specifically, on Unix-like systems, tmpfile() typically unlinks the file right after it opens it. If you
don’t know what that means, it won’t affect your tmpfile() skill, but hey, be curious! It’s for your own good!)

Return Value
This function returns an open FILE* on success, or NULL on failure.

Example
#include <stdio.h>

int main(void)
{
    FILE *temp;
    char s[128];

    temp = tmpfile();

    fprintf(temp, "What is the frequency, Alexander?\n");

    rewind(temp); // back to the beginning

    fscanf(temp, "%s", s); // read the first word ("What") back out

    fclose(temp); // close (and magically delete)
}

See Also
fopen(), fclose(), tmpnam()

tmpnam()
Generate a unique name for a temporary file

Synopsis
#include <stdio.h>

char *tmpnam(char *s);

Description
This function takes a good hard look at the existing files on your system, and comes up with a unique name
for a new file that is suitable for temporary file usage.
Let’s say you have a program that needs to store off some data for a short time so you create a temporary file
for the data, to be deleted when the program is done running. Now imagine that you called this file foo.txt.
This is all well and good, except what if a user already has a file called foo.txt in the directory that you
ran your program from? You’d overwrite their file, and they’d be unhappy and stalk you forever. And you
wouldn’t want that, now would you?
Ok, so you get wise, and you decide to put the file in /tmp so that it won’t overwrite any important content.
But wait! What if some other user is running your program at the same time and they both want to use that
filename? Or what if some other program has already created that file?
See, all of these scary problems can be avoided if you just use tmpnam() to get a safe, ready-to-use
filename.
So how do you use it? There are two amazing ways. One, you can declare an array (or malloc() it—
whatever) that is big enough to hold the temporary file name. How big is that? Fortunately there has been a
macro defined for you, L_tmpnam, which is how big the array must be.
And the second way: just pass NULL for the filename. tmpnam() will store the temporary name in a static
array and return a pointer to that. Subsequent calls with a NULL argument will overwrite the static array, so
be sure you’re done using it before you call tmpnam() again.
Again, this function just makes a file name for you. It’s up to you to later fopen() the file and use it.
One more note: some compilers warn against using tmpnam() since some systems have better functions (like
the Unix function mkstemp().) You might want to check your local documentation to see if there’s a better
option. Linux documentation goes so far as to say, “Never use this function. Use mkstemp() instead.”
I, however, am going to be a jerk and not talk about mkstemp() because it’s not in the standard I’m writing
about. Nyaah.

Return Value
Returns a pointer to the temporary file name. This is either a pointer to the string you passed in, or a pointer
to internal static storage if you passed in NULL. On error (like it can’t find any temporary name that is unique),
tmpnam() returns NULL.

Example
char filename[L_tmpnam];
char *another_filename;

if (tmpnam(filename) != NULL)
    printf("We got a temp file named: \"%s\"\n", filename);
else
    printf("Something went wrong, and we got nothing!\n");

another_filename = tmpnam(NULL);
printf("We got another temp file named: \"%s\"\n", another_filename);
printf("And we didn't error check it because we're too lazy!\n");

On my Linux system, this generates the following output:


We got a temp file named: "/tmp/filew9PMuZ"
We got another temp file named: "/tmp/fileOwrgPO"
And we didn't error check it because we're too lazy!

See Also
fopen(), tmpfile()

setbuf(), setvbuf()
Configure buffering for standard I/O operations

Synopsis
#include <stdio.h>

void setbuf(FILE *stream, char *buf);


int setvbuf(FILE *stream, char *buf, int mode, size_t size);

Description
Now brace yourself because this might come as a bit of a surprise to you: when you printf() or fprintf()
or use any I/O functions like that, it does not normally work immediately. For the sake of efficiency, and to
irritate you, the I/O on a FILE* stream is buffered away safely until certain conditions are met, and only then
is the actual I/O performed. The functions setbuf() and setvbuf() allow you to change those conditions
and the buffering behavior.
So what are the different buffering behaviors? The biggest is called “full buffering”, wherein all I/O is stored
in a big buffer until it is full, and then it is dumped out to disk (or whatever the file is). The next biggest is
called “line buffering”; with line buffering, I/O is stored up a line at a time (until a newline ('\n') character
is encountered) and then that line is processed. Finally, we have “unbuffered”, which means I/O is processed
immediately with every standard I/O call.
You might have wondered why you could call putchar() time and time again and not see any
output until you called putchar('\n'). That’s right: stdout is typically line buffered when it’s hooked up to a terminal!
Since setbuf() is just a simplified version of setvbuf(), we’ll talk about setvbuf() first.
The stream is the FILE* you wish to modify. The standard says you must make your call to setvbuf()
before any I/O operation is performed on the stream, or else by then it might be too late.
The next argument, buf, allows you to supply your own buffer space (from malloc() or just a char array)
to use for buffering. If you do supply one, it has to stay valid until the stream is closed. If you don’t
care to do this, just set buf to NULL.
Now we get to the real meat of the function: mode allows you to choose what kind of buffering you want to
use on this stream. Set it to one of the following:
_IOFBF

stream will be fully buffered.

_IOLBF

stream will be line buffered.

_IONBF

stream will be unbuffered.

Finally, the size argument is the size of the array you passed in for buf…unless you passed NULL for buf,
in which case it will resize the existing buffer to the size you specify.
Now what about this lesser function setbuf()? It’s just like calling setvbuf() with some specific
parameters, except setbuf() doesn’t return a value. The following example shows the equivalency:
// these are the same:
setbuf(stream, buf);
setvbuf(stream, buf, _IOFBF, BUFSIZ); // fully buffered

// and these are the same:
setbuf(stream, NULL);
setvbuf(stream, NULL, _IONBF, BUFSIZ); // unbuffered

Return Value
setvbuf() returns zero on success, and nonzero on failure. setbuf() has no return value.
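Since setvbuf() can fail, it’s worth actually looking at that return value. Here’s a minimal sketch of doing so. The helper name open_line_buffered() is made up for illustration (it’s not a library function); it opens a file for writing and switches the stream to line buffering before any other I/O happens on it:

```c
#include <stdio.h>

// Hypothetical helper: open `path` for writing and make the stream
// line buffered, using the caller's array `buf` of `size` bytes.
// Returns the stream, or NULL if fopen() or setvbuf() failed.
FILE *open_line_buffered(const char *path, char *buf, size_t size)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return NULL;

    // this must come before any other I/O on the stream
    if (setvbuf(fp, buf, _IOLBF, size) != 0) {
        fclose(fp);
        return NULL;
    }

    return fp;
}
```

Whatever array you pass in for buf has to outlive the stream, so don’t hand it a local array from a function that returns before the fclose().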

Example
FILE *fp;
char lineBuf[1024];

fp = fopen("somefile.txt", "r");
setvbuf(fp, lineBuf, _IOLBF, 1024); // set to line buffering
// ...
fclose(fp);

fp = fopen("another.dat", "rb");
setbuf(fp, NULL); // set to unbuffered
// ...
fclose(fp);

See Also
fflush()

fflush()
Process all buffered I/O for a stream right now

Synopsis
#include <stdio.h>

int fflush(FILE *stream);

Description
When you do standard I/O, as mentioned in the section on the setvbuf() function, it is usually stored in a
buffer until a line has been entered or the buffer is full or the file is closed. Sometimes, though, you really
want the output to happen right this second, and not wait around in the buffer. You can force this to happen
by calling fflush().
The advantage to buffering is that the OS doesn’t need to hit the disk every time you call fprintf(). The
disadvantage is that if you look at the file on the disk after the fprintf() call, it might not have actually
been written to yet. (“I called fputs(), but the file is still zero bytes long! Why?!”) In virtually all
circumstances, the advantages of buffering outweigh the disadvantages; for those other circumstances, however,
use fflush().
Note that fflush() is only designed to work on output streams according to the spec. What will happen if
you try it on an input stream? Use your spooky voice: who knooooows!

Return Value
On success, fflush() returns zero. If there’s an error, it returns EOF and sets the error condition for the
stream (see ferror().)
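To make that concrete, here’s a minimal sketch of a write that gets pushed out right away and reports whether the flush worked. The helper name write_now() is an assumption for illustration, not part of the library:

```c
#include <stdio.h>

// Hypothetical helper: write `text` to `fp` and push it out of the
// stdio buffer immediately. Returns 0 on success, -1 on failure.
int write_now(FILE *fp, const char *text)
{
    if (fputs(text, fp) == EOF)
        return -1;

    // without this, the text might sit in the buffer for a while
    if (fflush(fp) == EOF)
        return -1;

    return 0;
}
```

After write_now(fp, "abc") returns 0, the data has left the stdio buffer, so another program (or a second stream on the same file) should be able to see it even though fp is still open.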

Example
In this example, we’re going to use the carriage return, which is '\r'. This is like newline ('\n'), except
that it doesn’t move to the next line. It just returns to the front of the current line.
What we’re going to do is a little text-based status bar like so many command line programs implement. It’ll
do a countdown from 10 to 0 printing over itself on the same line.
What is the catch and what does this have to do with fflush()? The catch is that the terminal is most likely
“line buffered” (see the section on setvbuf() for more info), meaning that it won’t actually display anything
until it prints a newline. But we’re not printing newlines; we’re just printing carriage returns, so we need a
way to force the output to occur even though we’re on the same line. Yes, it’s fflush()!
#include <stdio.h>
#include <unistd.h> // for prototype for sleep()

int main(void)
{
    int count;

    for(count = 10; count >= 0; count--) {
        printf("\rSeconds until launch: "); // lead with a CR
        if (count > 0)
            printf("%2d", count);
        else
            printf("blastoff!\n");

        // force output now!!
        fflush(stdout);

        // the sleep() function is non-standard, but virtually every
        // system implements it--it simply delays for the specified
        // number of seconds:
        sleep(1);
    }
}

See Also
setbuf(), setvbuf()
String Manipulation

As has been mentioned earlier in the guide, a string in C is a sequence of bytes in memory, terminated by a
NUL character (‘\0’). The NUL at the end is important, since it lets all these string functions (and printf()
and puts() and everything else that deals with a string) know where the end of the string actually is.
Fortunately, when you operate on a string using one of these many functions available to you, they add the
NUL terminator on for you, so you actually rarely have to keep track of it yourself. (Sometimes you do,
especially if you’re building a string from scratch a character at a time or something.)
In this section you’ll find functions for pulling substrings out of strings, concatenating strings together, getting
the length of a string, and so forth and so on.


strlen()
Returns the length of a string.

Synopsis
#include <string.h>

size_t strlen(const char *s);

Description
This function returns the length of the passed null-terminated string (not counting the NUL character at
the end). It does this by walking down the string and counting the bytes until the NUL character, so it’s a
little time consuming. If you have to get the length of the same string repeatedly, save it off in a variable
somewhere.
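Here’s a minimal sketch of that advice. The function count_spaces() is a made-up example; the point is just that strlen() gets called once, before the loop, rather than in the loop condition:

```c
#include <string.h>

// Hypothetical helper: count the spaces in `s`. The length is
// computed once, before the loop, instead of on every iteration.
size_t count_spaces(const char *s)
{
    size_t len = strlen(s);   // walk the string one time only
    size_t count = 0;

    for (size_t i = 0; i < len; i++) {
        if (s[i] == ' ')
            count++;
    }

    return count;
}
```

Writing i < strlen(s) as the loop condition would work, too, but it could end up rewalking the whole string on every pass, depending on how clever your compiler is feeling.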

Return Value
Returns the number of characters in the string.

Example
char *s = "Hello, world!"; // 13 characters

// prints "The string is 13 characters long."
// (%zu is the format specifier for a size_t):
printf("The string is %zu characters long.\n", strlen(s));

See Also

strcmp(), strncmp()
Compare two strings and return a difference.

Synopsis
#include <string.h>

int strcmp(const char *s1, const char *s2);


int strncmp(const char *s1, const char *s2, size_t n);

Description
Both these functions compare two strings. strcmp() compares the entire string down to the end, while
strncmp() only compares the first n characters of the strings.

It’s a little funky what they return. Basically it’s a difference of the strings, so if the strings are the same,
it’ll return zero (since the difference is zero). It’ll return non-zero if the strings differ; basically it will find
the first mismatched character and return less-than zero if that character in s1 is less than the corresponding
character in s2. It’ll return greater-than zero if that character in s1 is greater than that in s2.
For the most part, people just check to see if the return value is zero or not, because, more often than not,
people are only curious if strings are the same.
strcmp() can also do the comparing when you qsort() an array of char*s, as long as you wrap it in a
function with the signature that qsort() expects.
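One wrinkle when sorting an array of char*s: qsort() hands its comparison function pointers to the array elements, not the elements themselves, so strcmp() needs a small wrapper that dereferences them first. A minimal sketch, where compare_strings() and sort_strings() are names invented for this example:

```c
#include <stdlib.h>
#include <string.h>

// qsort() passes pointers to the array elements. The elements here
// are char*s, so each argument is really a pointer to a char*.
int compare_strings(const void *a, const void *b)
{
    const char *s1 = *(const char * const *)a;
    const char *s2 = *(const char * const *)b;

    return strcmp(s1, s2);
}

// Hypothetical helper: sort an array of `count` strings in place.
void sort_strings(const char **strings, size_t count)
{
    qsort(strings, count, sizeof strings[0], compare_strings);
}
```

Only the pointers in the array get shuffled around; the strings themselves stay where they are.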

Return Value
Returns zero if the strings are the same, less-than zero if the first different character in s1 is less than that in
s2, or greater-than zero if the first different character in s1 is greater than that in s2.

Example
char *s1 = "Muffin";
char *s2 = "Muffin Sandwich";
char *s3 = "Muffin";

strcmp("Biscuits", "Kittens"); // returns < 0 since 'B' < 'K'
strcmp("Kittens", "Biscuits"); // returns > 0 since 'K' > 'B'

if (strcmp(s1, s2) == 0)
    printf("This won't get printed because the strings differ");

if (strcmp(s1, s3) == 0)
    printf("This will print because s1 and s3 are the same");

// this is a little weird...but if the strings are the same, it'll
// return zero, which can also be thought of as "false". Not-false
// is "true", so (!strcmp()) will be true if the strings are the
// same. yes, it's odd, but you see this all the time in the wild
// so you might as well get used to it:

if (!strcmp(s1, s3))
    printf("The strings are the same!");

if (!strncmp(s1, s2, 6))
    printf("The first 6 characters of s1 and s2 are the same");

See Also
memcmp(), qsort()

strcat(), strncat()
Concatenate two strings into a single string.

Synopsis
#include <string.h>

char *strcat(char *dest, const char *src);

char *strncat(char *dest, const char *src, size_t n);

Description
“Concatenate”, for those not in the know, means to “stick together”. These functions take two strings, and
stick them together, storing the result in the first string.
These functions don’t take the size of the first string into account when they do the concatenation. What this
means in practical terms is that you can try to stick a 2 megabyte string into a 10 byte space. This will lead
to unintended consequences, unless you intended to lead to unintended consequences, in which case it will
lead to intended unintended consequences.
Technical banter aside, your boss and/or professor will be irate.
If you want to make sure you don’t overrun the first string, be sure to check the lengths of the strings first
and use some highly technical subtraction to make sure things fit.
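Here’s roughly what that highly technical subtraction might look like. This is only a sketch: append_if_fits() is a made-up helper, and it assumes you know the total size of the destination array:

```c
#include <string.h>

// Hypothetical helper: append `src` to the string in `dest`, where
// `dest_size` is the total size of the dest array in bytes.
// Returns 0 on success, or -1 (leaving dest alone) if it won't fit.
int append_if_fits(char *dest, size_t dest_size, const char *src)
{
    size_t used = strlen(dest);
    size_t needed = strlen(src);

    // room left = total size - what's used - 1 for the terminator
    if (used >= dest_size || needed > dest_size - used - 1)
        return -1;

    strcat(dest, src);
    return 0;
}
```

With char dest[20] = "Hello", calling append_if_fits(dest, sizeof dest, ", World!") succeeds, while trying to tack on another 7 characters after that fails, since only 6 bytes plus the terminator remain.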
You can concatenate just the first n characters of the second string by using strncat() and
specifying the maximum number of characters to copy. (strncat() always adds a NUL terminator after
whatever it copies, so leave room for that, too.)

Return Value
Both functions return a pointer to the destination string, like most of the string-oriented functions.

Example
char dest[20] = "Hello";
char *src = ", World!";
char numbers[] = "12345678";

printf("dest before strcat: \"%s\"\n", dest); // "Hello"

strcat(dest, src);
printf("dest after strcat: \"%s\"\n", dest); // "Hello, World!"

strncat(dest, numbers, 3); // strcat first 3 chars of numbers
printf("dest after strncat: \"%s\"\n", dest); // "Hello, World!123"

Notice I mixed and matched pointer and array notation there with src and numbers; this is just fine with
string functions.

See Also
strlen()

strchr(), strrchr()
Find a character in a string.

Synopsis
#include <string.h>

char *strchr(const char *str, int c);

char *strrchr(const char *str, int c);

Description
The functions strchr() and strrchr() find the first or last occurrence of a character in a string, respectively.
(The extra “r” in strrchr() stands for “reverse”: it looks starting at the end of the string and working backward.)
Each function returns a pointer to the char in question, or NULL if the character isn’t found in the string.
Quite straightforward.
One thing you can do if you want to find the next occurrence of the character after finding the first is call
strchr() again with the previous return value plus one. (Remember pointer arithmetic?) Don’t accidentally go
off the end of the string! Note that the same trick doesn’t carry over to strrchr() by subtracting one: strrchr()
always searches all the way to the end of whatever string you hand it, so it would just find the same match again.

Return Value
Returns a pointer to the occurrence of the character in the string, or NULL if the character is not found.
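Because NULL is a possible return value, check for it before doing any pointer arithmetic on the result. As a minimal sketch, here’s a made-up helper, file_part(), that uses strrchr() to pick the last component off a slash-separated path:

```c
#include <string.h>

// Hypothetical helper: return the part of `path` after the last '/',
// or all of `path` if it contains no '/'.
const char *file_part(const char *path)
{
    const char *slash = strrchr(path, '/');

    if (slash == NULL)
        return path;      // no '/' found

    return slash + 1;     // step past the '/' itself
}
```

If the path ends in a slash, slash + 1 points at the NUL terminator, so you get back an empty string rather than a trip off the end.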

Example
// "Hello, world!"
//       ^  ^
//       A  B

char *str = "Hello, world!";
char *p;

p = strchr(str, ','); // p now points at position A
p = strrchr(str, 'o'); // p now points at position B

// repeatedly find all occurrences of the letter 'B'

char *str = "A BIG BROWN BAT BIT BEEJ";
char *p;

for(p = strchr(str, 'B'); p != NULL; p = strchr(p + 1, 'B')) {
    printf("Found a 'B' here: %s\n", p);
}

// output is:
//
// Found a 'B' here: BIG BROWN BAT BIT BEEJ
// Found a 'B' here: BROWN BAT BIT BEEJ
// Found a 'B' here: BAT BIT BEEJ
// Found a 'B' here: BIT BEEJ
// Found a 'B' here: BEEJ

See Also

strcpy(), strncpy()
Copy a string

Synopsis
#include <string.h>

char *strcpy(char *dest, const char *src);

char *strncpy(char *dest, const char *src, size_t n);

Description
These functions copy a string from one address to another, stopping at the NUL terminator on the src string.
strncpy() is just like strcpy(), except that at most n characters are copied. Beware that if you
hit the limit, n, before you get a NUL terminator on the src string, your dest string won’t be NUL-terminated.
Beware! BEWARE!
(If the src string has fewer than n characters, strncpy() copies the whole thing and then pads the rest of
dest with NUL characters until n characters have been written in total.)
You can terminate the string yourself by sticking the '\0' in there:
char s[10];
char *foo = "My hovercraft is full of eels."; // more than 10 chars

strncpy(s, foo, 9); // only copy 9 chars into positions 0-8
s[9] = '\0';        // position 9 gets the terminator

Return Value
Both functions return dest for your convenience, at no extra charge.
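If you find yourself doing the copy-then-terminate dance a lot, it wraps up neatly into a function. A minimal sketch, assuming you know the size of the destination array; copy_truncated() is a name invented for this example:

```c
#include <string.h>

// Hypothetical helper: copy as much of `src` as fits into `dest`
// (an array of `dest_size` bytes) and always NUL-terminate it.
// Returns dest, in the spirit of strncpy().
char *copy_truncated(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0)
        return dest;              // no room for even a terminator

    strncpy(dest, src, dest_size - 1);
    dest[dest_size - 1] = '\0';   // terminate no matter what

    return dest;
}
```

With char s[10], copying in "My hovercraft is full of eels." this way leaves s holding "My hoverc": nine characters and the terminator.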

Example
char *src = "hockey hockey hockey hockey hockey hockey hockey hockey";
char dest[20];

size_t len;

strcpy(dest, "I like "); // dest is now "I like "

len = strlen(dest);

// tricky, but let's use some pointer arithmetic and math to append
// as much of src as possible onto the end of dest, -1 on the length to
// leave room for the terminator:
strncpy(dest+len, src, sizeof(dest)-len-1);

// remember that sizeof() returns the size of the array in bytes
// and a char is a byte:
dest[sizeof(dest)-1] = '\0'; // terminate

// dest is now:         v null terminator
//   I like hockey hocke
//   01234567890123456789012345

See Also
memcpy(), strcat(), strncat()

strspn(), strcspn()
Return the length of the initial part of a string made up entirely of characters from a set, or entirely of characters not from a set.

Synopsis
#include <string.h>

size_t strspn(const char *str, const char *accept);

size_t strcspn(const char *str, const char *reject);

Description
strspn() will tell you the length of a string consisting entirely of the set of characters in accept. That is,
it starts walking down str until it finds a character that is not in the set (that is, a character that is not to be
accepted), and returns the length of the string so far.
strcspn() works much the same way, except that it walks down str until it finds a character in the reject
set (that is, a character that is to be rejected.) It then returns the length of the string so far.
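Two places these functions tend to earn their keep are skipping leading whitespace and lopping the newline off a line of input. Here’s a minimal sketch of each; the helper names skip_blanks() and chop_newline() are made up for illustration:

```c
#include <string.h>

// Hypothetical helper: return a pointer to the first character of
// `s` that isn't a space or tab.
const char *skip_blanks(const char *s)
{
    return s + strspn(s, " \t");
}

// Hypothetical helper: cut `s` off at its first newline, if any.
void chop_newline(char *s)
{
    // strcspn() returns the full length if there's no '\n', so in
    // that case this just rewrites the existing terminator
    s[strcspn(s, "\n")] = '\0';
}
```

Both lean on the fact that the return value is a length, which doubles as an index (or pointer offset) into the string.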

Return Value
The length of the initial part of str consisting only of characters in accept (for strspn()), or the length of the
initial part consisting only of characters not in reject (for strcspn()).

Example
char str1[] = "a banana";
char str2[] = "the bolivian navy on maneuvers in the south pacific";
size_t n;

// how many letters in str1 until we reach something that's not a vowel?
n = strspn(str1, "aeiou"); // n == 1, just "a"

// how many letters in str1 until we reach something that's not a, b,
// or space?
n = strspn(str1, "ab "); // n == 4, "a ba"

// how many letters in str2 before we get a "y"?
n = strcspn(str2, "y"); // n == 16, "the bolivian nav"

See Also
strchr(), strrchr()

strstr()
Find a string in another string.

Synopsis
#include <string.h>

char *strstr(const char *str, const char *substr);

Description
Let’s say you have a big long string, and you want to find a word, or whatever substring strikes your fancy,
inside the first string. Then strstr() is for you! It’ll return a pointer to the substr within the str!

Return Value
You get back a pointer to the occurrence of the substr inside the str, or NULL if the substring can’t be found.
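Since the return value points into str, you can resume the search from just past a match to find the next one. A minimal sketch that counts matches this way; count_occurrences() is a made-up helper, and it assumes you want non-overlapping matches:

```c
#include <string.h>

// Hypothetical helper: count the non-overlapping occurrences of
// `substr` in `str`.
size_t count_occurrences(const char *str, const char *substr)
{
    size_t sublen = strlen(substr);
    size_t count = 0;
    const char *p;

    if (sublen == 0)
        return 0;   // an empty substring would match forever

    // after each match, resume searching just past the end of it
    for (p = strstr(str, substr); p != NULL; p = strstr(p + sublen, substr))
        count++;

    return count;
}
```

The check for an empty substr matters: strstr() treats an empty string as matching right at the start, so without it the loop would never move forward.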

Example
char *str = "The quick brown fox jumped over the lazy dogs.";
char *p;

p = strstr(str, "lazy");
printf("%s\n", p); // "lazy dogs."

// p is NULL after this, since the string "wombat" isn't in str:
p = strstr(str, "wombat");

See Also
strchr(), strrchr(), strspn(), strcspn()

strtok()
Tokenize a string.

Synopsis
#include <string.h>

char *strtok(char *str, const char *delim);

Description
If you have a string that has a bunch of separators in it, and you want to break that string up into individual
pieces, this function can do it for you.
The usage is a little bit weird, but at least whenever you see the function in the wild, it’s consistently weird.
Basically, the first time you call it, you pass the string that you want to break up, str, as the first argument.
For each subsequent call to get more tokens out of the string, you pass NULL. This is a little weird, but
strtok() remembers the string you originally passed in, and continues to strip tokens off for you.

Note that it does this by actually putting a NUL terminator after the token, and then returning a pointer to
the start of the token. So the original string you pass in is destroyed, as it were. If you need to preserve the
string, be sure to pass a copy of it to strtok() so the original isn’t destroyed.
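Here’s a minimal sketch of the pass-a-copy approach. The helper count_tokens() is made up for this example; it duplicates the string with malloc(), lets strtok() chew on the duplicate, and leaves the original alone:

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: count the tokens in `text` without modifying
// it. Returns the count, or -1 if memory couldn't be allocated.
int count_tokens(const char *text, const char *delim)
{
    char *copy = malloc(strlen(text) + 1);   // +1 for the terminator
    char *token;
    int count = 0;

    if (copy == NULL)
        return -1;

    strcpy(copy, text);

    // strtok() chews up the copy; the original is never touched
    for (token = strtok(copy, delim); token != NULL; token = strtok(NULL, delim))
        count++;

    free(copy);
    return count;
}
```

A handy side effect is that this works on string literals and const strings, which strtok() must never be pointed at directly since it writes into the string it’s given.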

Return Value
A pointer to the next token. If you’re out of tokens, NULL is returned.

Example
// break up the string into a series of space or
// punctuation-separated words. This has to be a char array, not a
// pointer to a string literal, since strtok() modifies the string.
char str[] = "Where is my bacon, dude?";
char *token;

// Note that the following if-do-while construct is very very
// very very very common to see when using strtok().

// grab the first token (making sure there is a first token!)
if ((token = strtok(str, ".,?! ")) != NULL) {
    do {
        printf("Word: \"%s\"\n", token);

        // now, the while continuation condition grabs the
        // next token (by passing NULL as the first param)
        // and continues if the token's not NULL:
    } while ((token = strtok(NULL, ".,?! ")) != NULL);
}

// output is:
//
// Word: "Where"
// Word: "is"
// Word: "my"
// Word: "bacon"
// Word: "dude"
//

See Also
strchr(), strrchr(), strspn(), strcspn()
Mathematics

It’s your favorite subject: Mathematics! Hello, I’m Doctor Math, and I’ll be making math FUN and EASY!
[vomiting sounds]
Ok, I know math isn’t the grandest thing for some of you out there, but these are merely functions that quickly
and easily do math you either know, want, or just don’t care about. That pretty much covers it.
For you trig fans out there, we’ve got all manner of things, including sine, cosine, tangent, and, conversely,
arc sine, arc cosine, and arc tangent. That’s very exciting.
And for normal people, there is a slurry of your run-of-the-mill functions that will serve your general purpose
mathematical needs, including absolute value, hypotenuse length, square root, cube root, and power.
In short, you’re a fricking MATHEMATICAL DEITY!
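Oh wait, before then, I should tell you that the trig functions have three variants with different suffixes. The
“f” suffix (e.g. sinf()) returns a float, while the “l” suffix (e.g. sinl()) returns a massive and nicely
accurate long double. Normal sin() just returns a double. The suffixed variants weren’t in the original
ANSI C (C89); they arrived with C99, so any modern compiler should support them.
Also, there are several handy values that are defined in the math.h header file on many systems. Be aware
that these M_ macros are a common extension (POSIX specifies them) rather than part of standard C, so a
compiler in a strict standards mode might not provide them.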

Constant    C Macro Equivalent

𝑒           M_E
log2 𝑒      M_LOG2E
log10 𝑒     M_LOG10E
log𝑒 2      M_LN2
log𝑒 10     M_LN10
𝜋           M_PI
𝜋/2         M_PI_2
𝜋/4         M_PI_4
1/𝜋         M_1_PI
2/𝜋         M_2_PI
2/√𝜋        M_2_SQRTPI
√2          M_SQRT2
1/√2        M_SQRT1_2


sin(), sinf(), sinl()


Calculate the sine of a number.

Synopsis
#include <math.h>

double sin(double x);


float sinf(float x);
long double sinl(long double x);

Description
Calculates the sine of the value x, where x is in radians.
For those of you who don’t remember, radians are another way of measuring an angle, just like degrees. To
convert from degrees to radians or the other way around, use the following code:
degrees = radians * 180.0f / M_PI;
radians = degrees * M_PI / 180;
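If you’d rather not retype the conversion every time, it wraps up naturally into a pair of functions. A minimal sketch: the function names are made up, and the fallback #define is there on the assumption that your math.h might not provide M_PI (it isn’t part of standard C):

```c
#include <math.h>

// M_PI is a common extension rather than part of standard C, so
// fall back to our own definition if <math.h> didn't provide it
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// Hypothetical helpers wrapping the two conversions
double degrees_to_radians(double degrees)
{
    return degrees * M_PI / 180.0;
}

double radians_to_degrees(double radians)
{
    return radians * 180.0 / M_PI;
}
```

So something like sin(degrees_to_radians(90.0)) lets you think in degrees while still feeding the trig functions the radians they expect.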

Return Value
Returns the sine of x. The variants return different types.

Example
double sinx;
long double ldsinx;

sinx = sin(3490.0); // round and round we go!
ldsinx = sinl((long double)3.490);

See Also
cos(), tan(), asin()

cos(), cosf(), cosl()


Calculate the cosine of a number.

Synopsis
#include <math.h>

double cos(double x);
float cosf(float x);
long double cosl(long double x);

Description
Calculates the cosine of the value x, where x is in radians.
For those of you who don’t remember, radians are another way of measuring an angle, just like degrees. To
convert from degrees to radians or the other way around, use the following code:
degrees = radians * 180.0f / M_PI;
radians = degrees * M_PI / 180;

Return Value
Returns the cosine of x. The variants return different types.

Example
double cosx;
long double ldcosx;

cosx = cos(3490.0); // round and round we go!
ldcosx = cosl((long double)3.490);

See Also
sin(), tan(), acos()

tan(), tanf(), tanl()


Calculate the tangent of a number.

Synopsis
#include <math.h>

double tan(double x);
float tanf(float x);
long double tanl(long double x);

Description
Calculates the tangent of the value x, where x is in radians.
For those of you who don’t remember, radians are another way of measuring an angle, just like degrees. To
convert from degrees to radians or the other way around, use the following code:
degrees = radians * 180.0f / M_PI;
radians = degrees * M_PI / 180;

Return Value
Returns the tangent of x. The variants return different types.

Example
double tanx;
long double ldtanx;

tanx = tan(3490.0); // round and round we go!
ldtanx = tanl((long double)3.490);

See Also
sin(), cos(), atan(), atan2()

asin(), asinf(), asinl()


Calculate the arc sine of a number.

Synopsis
#include <math.h>

double asin(double x);


float asinf(float x);
long double asinl(long double x);

Description
Calculates the arc sine of x (that is, the angle whose sine is x) and returns it in radians. The argument x
must be in the range -1.0 to 1.0.
For those of you who don’t remember, radians are another way of measuring an angle, just like degrees. To
convert from degrees to radians or the other way around, use the following code:
degrees = radians * 180.0f / M_PI;
radians = degrees * M_PI / 180;

Return Value
Returns the arc sine of x, unless x is out of range. In that case, errno will be set to EDOM and the return
value will be NaN. The variants return different types.

Example
double asinx;
long double ldasinx;

asinx = asin(0.2);
ldasinx = asinl((long double)0.3);

See Also
acos(), atan(), atan2(), sin()

acos(), acosf(), acosl()


Calculate the arc cosine of a number.

Synopsis
#include <math.h>

double acos(double x);


float acosf(float x);
long double acosl(long double x);

Description
Calculates the arc cosine of x (that is, the angle whose cosine is x) and returns it in radians. The argument x
must be in the range -1.0 to 1.0.
For those of you who don’t remember, radians are another way of measuring an angle, just like degrees. To
convert from degrees to radians or the other way around, use the following code:
degrees = radians * 180.0f / M_PI;
radians = degrees * M_PI / 180;

Return Value
Returns the arc cosine of x, unless x is out of range. In that case, errno will be set to EDOM and the return
value will be NaN. The variants return different types.

Example
double acosx;
long double ldacosx;

acosx = acos(0.2);
ldacosx = acosl((long double)0.3);

See Also
asin(), atan(), atan2(), cos()

atan(), atanf(), atanl(), atan2(), atan2f(), atan2l()

Calculate the arc tangent of a number.

Synopsis
#include <math.h>

double atan(double x);


float atanf(float x);
long double atanl(long double x);

double atan2(double y, double x);


float atan2f(float y, float x);
long double atan2l(long double y, long double x);

Description
Calculates the arc tangent of x (that is, the angle whose tangent is x) and returns it in radians.
The atan2() variants are pretty much the same as using atan() with y/x as the argument…except that
atan2() will use the signs of both values to determine the correct quadrant of the result.

For those of you who don’t remember, radians are another way of measuring an angle, just like degrees. To
convert from degrees to radians or the other way around, use the following code:
degrees = radians * 180.0f / M_PI;
radians = degrees * M_PI / 180;

Return Value
The atan() functions return the arc tangent of x, which will be between PI/2 and -PI/2. The atan2()
functions return an angle between PI and -PI.

Example
double atanx;
long double ldatanx;

atanx = atan(0.2);
ldatanx = atanl((long double)0.3);

// atan2() takes two arguments, y and then x:
atanx = atan2(0.2, 1.0);
ldatanx = atan2l((long double)0.3, (long double)1.0);

See Also
tan(), asin(), atan()

sqrt()
Calculate the square root of a number

Synopsis
#include <math.h>

double sqrt(double x);


float sqrtf(float x);
long double sqrtl(long double x);

Description
Computes the square root of a number. To those of you who don’t know what a square root is, I’m not going
to explain. Suffice it to say, the square root of a number delivers a value that when squared (multiplied by
itself) results in the original number.
Ok, fine—I did explain it after all, but only because I wanted to show off. It’s not like I’m giving you
examples or anything, such as the square root of nine is three, because when you multiply three by three you
get nine, or anything like that. No examples. I hate examples!
And I suppose you wanted some actual practical information here as well. You can see the usual trio of
functions here. They all compute square root, but they take different types as arguments. Pretty straightforward,
really.

Return Value
Returns (and I know this must be something of a surprise to you) the square root of x. If you try to be smart
and pass a negative number in for x, the global variable errno will be set to EDOM (which stands for DOMain
Error, not some kind of cheese.)

Example
// example usage of sqrt()

float something = 10;

double x1 = 8.2, y1 = -5.4;
double x2 = 3.8, y2 = 34.9;
double dx, dy;

printf("square root of 10 is %.2f\n", sqrtf(something));

dx = x2 - x1;
dy = y2 - y1;
printf("distance between points (x1, y1) and (x2, y2): %.2f\n",
       sqrt(dx*dx + dy*dy));

And the output is:


square root of 10 is 3.16
distance between points (x1, y1) and (x2, y2): 40.54

See Also
hypot()
