C Programming Notes
C Notes for Professionals
300+ pages
of professional hints and tricks
GoalKicker.com
Free Programming Books
Disclaimer
This is an unofficial free book created for educational purposes and is not affiliated with official C group(s) or company(s). All trademarks and registered trademarks are the property of their respective owners.
Contents
About ................................................................................................................................................................................... 1
Chapter 1: Getting started with C Language .................................................................................................. 2
Section 1.1: Hello World ................................................................................................................................................. 2
Section 1.2: Original "Hello, World!" in K&R C .............................................................................................................. 4
Chapter 2: Comments ................................................................................................................................................. 6
Section 2.1: Commenting using the preprocessor ...................................................................................................... 6
Section 2.2: /* */ delimited comments ........................................................................................................................ 6
Section 2.3: // delimited comments ............................................................................................................................ 7
Section 2.4: Possible pitfall due to trigraphs .............................................................................................................. 7
Chapter 3: Data Types ............................................................................................................................................... 9
Section 3.1: Interpreting Declarations .......................................................................................................................... 9
Section 3.2: Fixed Width Integer Types (since C99) ................................................................................................. 11
Section 3.3: Integer types and constants .................................................................................................................. 11
Section 3.4: Floating Point Constants ........................................................................................................................ 12
Section 3.5: String Literals .......................................................................................................................................... 13
Chapter 4: Operators ............................................................................................................................................... 14
Section 4.1: Relational Operators ............................................................................................................................... 14
Section 4.2: Conditional Operator/Ternary Operator ............................................................................................. 15
Section 4.3: Bitwise Operators ................................................................................................................................... 16
Section 4.4: Short circuit behavior of logical operators .......................................................................................... 18
Section 4.5: Comma Operator ................................................................................................................................... 19
Section 4.6: Arithmetic Operators .............................................................................................................................. 19
Section 4.7: Access Operators ................................................................................................................................... 22
Section 4.8: sizeof Operator ....................................................................................................................................... 24
Section 4.9: Cast Operator ......................................................................................................................................... 24
Section 4.10: Function Call Operator ......................................................................................................................... 24
Section 4.11: Increment / Decrement ......................................................................................................................... 25
Section 4.12: Assignment Operators .......................................................................................................................... 25
Section 4.13: Logical Operators .................................................................................................................................. 26
Section 4.14: Pointer Arithmetic .................................................................................................................................. 27
Section 4.15: _Alignof .................................................................................................................................................. 28
Chapter 5: Boolean .................................................................................................................................................... 30
Section 5.1: Using stdbool.h ........................................................................................................................................ 30
Section 5.2: Using #define .......................................................................................................................................... 30
Section 5.3: Using the Intrinsic (built-in) Type _Bool ............................................................................................... 31
Section 5.4: Integers and pointers in Boolean expressions .................................................................................... 31
Section 5.5: Defining a bool type using typedef ...................................................................................................... 32
Chapter 6: Strings ....................................................................................................................................................... 33
Section 6.1: Tokenisation: strtok(), strtok_r() and strtok_s() .................................................................................. 33
Section 6.2: String literals ........................................................................................................................................... 35
Section 6.3: Calculate the Length: strlen() ................................................................................................................ 36
Section 6.4: Basic introduction to strings .................................................................................................................. 37
Section 6.5: Copying strings ....................................................................................................................................... 37
Section 6.6: Iterating Over the Characters in a String ............................................................................................. 40
Section 6.7: Creating Arrays of Strings ..................................................................................................................... 41
Section 6.8: Convert Strings to Number: atoi(), atof() (dangerous, don't use them) ........................................... 41
Section 6.9: string formatted data read/write ......................................................................................................... 42
Section 6.10: Find first/last occurrence of a specific character: strchr(), strrchr() ............................................... 43
Section 6.11: Copy and Concatenation: strcpy(), strcat() ........................................................................................ 44
Section 6.12: Comparsion: strcmp(), strncmp(), strcasecmp(), strncasecmp() .................................................... 45
Section 6.13: Safely convert Strings to Number: strtoX functions .......................................................................... 47
Section 6.14: strspn and strcspn ................................................................................................................................. 48
Chapter 7: Literals for numbers, characters and strings ...................................................................... 50
Section 7.1: Floating point literals ............................................................................................................................... 50
Section 7.2: String literals ........................................................................................................................................... 50
Section 7.3: Character literals .................................................................................................................................... 50
Section 7.4: Integer literals ......................................................................................................................................... 51
Chapter 8: Compound Literals ............................................................................................................................. 53
Section 8.1: Definition/Initialisation of Compound Literals ...................................................................................... 53
Chapter 9: Bit-fields .................................................................................................................................................. 55
Section 9.1: Bit-fields .................................................................................................................................................... 55
Section 9.2: Using bit-fields as small integers .......................................................................................................... 56
Section 9.3: Bit-field alignment .................................................................................................................................. 56
Section 9.4: Don'ts for bit-fields ................................................................................................................................. 57
Section 9.5: When are bit-fields useful? .................................................................................................................... 58
Chapter 10: Arrays ...................................................................................................................................................... 60
Section 10.1: Declaring and initializing an array ....................................................................................................... 60
Section 10.2: Iterating through an array efficiently and row-major order ............................................................ 61
Section 10.3: Array length ........................................................................................................................................... 62
Section 10.4: Passing multidimensional arrays to a function ................................................................................. 63
Section 10.5: Multi-dimensional arrays ...................................................................................................................... 64
Section 10.6: Define array and access array element ............................................................................................. 67
Section 10.7: Clearing array contents (zeroing) ....................................................................................................... 67
Section 10.8: Setting values in arrays ........................................................................................................................ 68
Section 10.9: Allocate and zero-initialize an array with user defined size ............................................................. 68
Section 10.10: Iterating through an array using pointers ........................................................................................ 69
Chapter 11: Linked lists ............................................................................................................................................. 71
Section 11.1: A doubly linked list .................................................................................................................................. 71
Section 11.2: Reversing a linked list ............................................................................................................................ 73
Section 11.3: Inserting a node at the nth position ..................................................................................................... 75
Section 11.4: Inserting a node at the beginning of a singly linked list .................................................................... 76
Chapter 12: Enumerations ...................................................................................................................................... 79
Section 12.1: Simple Enumeration ............................................................................................................................... 79
Section 12.2: enumeration constant without typename .......................................................................................... 80
Section 12.3: Enumeration with duplicate value ....................................................................................................... 80
Section 12.4: Typedef enum ....................................................................................................................................... 81
Chapter 13: Structs ..................................................................................................................................................... 83
Section 13.1: Flexible Array Members ......................................................................................................................... 83
Section 13.2: Typedef Structs ..................................................................................................................................... 85
Section 13.3: Pointers to structs .................................................................................................................................. 86
Section 13.4: Passing structs to functions ................................................................................................................. 88
Section 13.5: Object-based programming using structs ......................................................................................... 89
Section 13.6: Simple data structures .......................................................................................................................... 91
Chapter 14: Standard Math ................................................................................................................................... 93
Section 14.1: Power functions - pow(), powf(), powl() .............................................................................................. 93
Section 14.2: Double precision floating-point remainder: fmod() .......................................................................... 94
Section 14.3: Single precision and long double precision floating-point remainder: fmodf(), fmodl() ............... 94
Chapter 15: Iteration Statements/Loops: for, while, do-while ............................................................ 96
Section 15.1: For loop ................................................................................................................................................... 96
Section 15.2: Loop Unrolling and Duff's Device ........................................................................................................ 96
Section 15.3: While loop ............................................................................................................................................... 97
Section 15.4: Do-While loop ........................................................................................................................................ 97
Section 15.5: Structure and flow of control in a for loop ......................................................................................... 98
Section 15.6: Infinite Loops .......................................................................................................................................... 99
Chapter 16: Selection Statements .................................................................................................................... 100
Section 16.1: if () Statements ..................................................................................................................................... 100
Section 16.2: Nested if()...else VS if()..else Ladder .................................................................................................. 100
Section 16.3: switch () Statements ........................................................................................................................... 102
Section 16.4: if () ... else statements and syntax ..................................................................................................... 104
Section 16.5: if()...else Ladder Chaining two or more if () ... else statements ....................................................... 104
Chapter 17: Initialization ........................................................................................................................................ 105
Section 17.1: Initialization of Variables in C .............................................................................................................. 105
Section 17.2: Using designated initializers ............................................................................................................... 106
Section 17.3: Initializing structures and arrays of structures ................................................................................ 108
Chapter 18: Declaration vs Definition ............................................................................................................ 110
Section 18.1: Understanding Declaration and Definition ....................................................................................... 110
Chapter 19: Command-line arguments ......................................................................................................... 111
Section 19.1: Print the arguments to a program and convert to integer values ................................................. 111
Section 19.2: Printing the command line arguments ............................................................................................. 111
Section 19.3: Using GNU getopt tools ...................................................................................................................... 112
Chapter 20: Files and I/O streams .................................................................................................................. 115
Section 20.1: Open and write to file ......................................................................................................................... 115
Section 20.2: Run process ........................................................................................................................................ 116
Section 20.3: fprintf ................................................................................................................................................... 116
Section 20.4: Get lines from a file using getline() .................................................................................................. 116
Section 20.5: fscanf() ................................................................................................................................................ 120
Section 20.6: Read lines from a file ......................................................................................................................... 121
Section 20.7: Open and write to a binary file ......................................................................................................... 122
Chapter 21: Formatted Input/Output ............................................................................................................. 124
Section 21.1: Conversion Specifiers for printing ...................................................................................................... 124
Section 21.2: The printf() Function ........................................................................................................................... 125
Section 21.3: Printing format flags ........................................................................................................................... 125
Section 21.4: Printing the Value of a Pointer to an Object .................................................................................... 126
Section 21.5: Printing the Difference of the Values of two Pointers to an Object ............................................... 127
Section 21.6: Length modifiers ................................................................................................................................. 128
Chapter 22: Pointers ................................................................................................................................................ 129
Section 22.1: Introduction ......................................................................................................................................... 129
Section 22.2: Common errors .................................................................................................................................. 131
Section 22.3: Dereferencing a Pointer .................................................................................................................... 134
Section 22.4: Dereferencing a Pointer to a struct .................................................................................................. 134
Section 22.5: Const Pointers ..................................................................................................................................... 135
Section 22.6: Function pointers ................................................................................................................................ 138
Section 22.7: Polymorphic behaviour with void pointers ...................................................................................... 139
Section 22.8: Address-of Operator ( & ) ................................................................................................................. 140
Section 22.9: Initializing Pointers ............................................................................................................................. 140
Section 22.10: Pointer to Pointer .............................................................................................................................. 141
Section 22.11: void* pointers as arguments and return values to standard functions ....................................... 141
Section 22.12: Same Asterisk, Different Meanings ................................................................................................. 142
Chapter 23: Sequence points .............................................................................................................................. 144
Section 23.1: Unsequenced expressions .................................................................................................................. 144
Section 23.2: Sequenced expressions ..................................................................................................................... 144
Section 23.3: Indeterminately sequenced expressions ......................................................................................... 145
Chapter 24: Function Pointers ........................................................................................................................... 146
Section 24.1: Introduction .......................................................................................................................................... 146
Section 24.2: Returning Function Pointers from a Function ................................................................................. 146
Section 24.3: Best Practices ..................................................................................................................................... 147
Section 24.4: Assigning a Function Pointer ............................................................................................................. 149
Section 24.5: Mnemonic for writing function pointers ........................................................................................... 149
Section 24.6: Basics ................................................................................................................................................... 150
Chapter 25: Function Parameters .................................................................................................................... 152
Section 25.1: Parameters are passed by value ...................................................................................................... 152
Section 25.2: Passing in Arrays to Functions .......................................................................................................... 152
Section 25.3: Order of function parameter execution ........................................................................................... 153
Section 25.4: Using pointer parameters to return multiple values ...................................................................... 153
Section 25.5: Example of function returning struct containing values with error codes ................................... 154
Chapter 26: Pass 2D-arrays to functions ..................................................................................................... 156
Section 26.1: Pass a 2D-array to a function ........................................................................................................... 156
Section 26.2: Using flat arrays as 2D arrays .......................................................................................................... 162
Chapter 27: Error handling .................................................................................................................................. 163
Section 27.1: errno ..................................................................................................................................................... 163
Section 27.2: strerror ................................................................................................................................................. 163
Section 27.3: perror ................................................................................................................................................... 163
Chapter 28: Undefined behavior ...................................................................................................................... 165
Section 28.1: Dereferencing a pointer to variable beyond its lifetime ................................................................. 165
Section 28.2: Copying overlapping memory .......................................................................................................... 165
Section 28.3: Signed integer overflow ..................................................................................................................... 166
Section 28.4: Use of an uninitialized variable ......................................................................................................... 167
Section 28.5: Data race ............................................................................................................................................ 168
Section 28.6: Read value of pointer that was freed .............................................................................................. 169
Section 28.7: Using incorrect format specifier in printf ......................................................................................... 170
Section 28.8: Modify string literal ............................................................................................................................ 170
Section 28.9: Passing a null pointer to printf %s conversion ................................................................................ 170
Section 28.10: Modifying any object more than once between two sequence points ....................................... 171
Section 28.11: Freeing memory twice ...................................................................................................................... 172
Section 28.12: Bit shifting using negative counts or beyond the width of the type ............................................ 172
Section 28.13: Returning from a function that's declared with `_Noreturn` or `noreturn` function specifier ........ 173
Section 28.14: Accessing memory beyond allocated chunk ................................................................................. 174
Section 28.15: Modifying a const variable using a pointer .................................................................................... 174
Section 28.16: Reading an uninitialized object that is not backed by memory .................................................. 175
Section 28.17: Addition or subtraction of pointer not properly bounded ............................................................ 175
Section 28.18: Dereferencing a null pointer ............................................................................................................ 175
Section 28.19: Using fflush on an input stream ...................................................................................................... 176
Section 28.20: Inconsistent linkage of identifiers ................................................................................................... 176
Section 28.21: Missing return statement in value returning function ................................................................... 177
Section 28.22: Division by zero ................................................................................................................................ 177
Section 28.23: Conversion between pointer types produces incorrectly aligned result .................................... 178
Section 28.24: Modifying the string returned by getenv, strerror, and setlocale functions .............................. 179
Chapter 29: Random Number Generation ................................................................................................... 180
Section 29.1: Basic Random Number Generation .................................................................................................. 180
Section 29.2: Permuted Congruential Generator ................................................................................................... 180
Section 29.3: Xorshift Generation ............................................................................................................................ 181
Section 29.4: Restrict generation to a given range ............................................................................................... 182
Chapter 30: Preprocessor and Macros .......................................................................................................... 183
Section 30.1: Header Include Guards ....................................................................................................................... 183
Section 30.2: #if 0 to block out code sections ........................................................................................................ 186
Section 30.3: Function-like macros .......................................................................................................................... 187
Section 30.4: Source file inclusion ............................................................................................................................ 188
Section 30.5: Conditional inclusion and conditional function signature modification ....................................... 188
Section 30.6: __cplusplus for using C externals in C++ code compiled with C++ - name mangling ............... 190
Section 30.7: Token pasting ..................................................................................................................................... 191
Section 30.8: Predefined Macros ............................................................................................................................. 192
Section 30.9: Variadic arguments macro ............................................................................................................... 193
Section 30.10: Macro Replacement ......................................................................................................................... 194
Section 30.11: Error directive ..................................................................................................................................... 195
Section 30.12: FOREACH implementation ............................................................................................................... 196
Chapter 31: Signal handling ................................................................................................................................. 199
Section 31.1: Signal Handling with “signal()” ............................................................................................................ 199
Chapter 32: Variable arguments ...................................................................................................................... 201
Section 32.1: Using an explicit count argument to determine the length of the va_list .................................... 201
Section 32.2: Using terminator values to determine the end of va_list .............................................................. 202
Section 32.3: Implementing functions with a `printf()`-like interface ................................................................... 202
Section 32.4: Using a format string ......................................................................................................................... 205
Chapter 33: Assertion ............................................................................................................................................. 207
Section 33.1: Simple Assertion .................................................................................................................................. 207
Section 33.2: Static Assertion ................................................................................................................................... 207
Section 33.3: Assert Error Messages ....................................................................................................................... 208
Section 33.4: Assertion of Unreachable Code ........................................................................................................ 209
Section 33.5: Precondition and Postcondition ........................................................................................................ 209
Chapter 34: Generic selection ............................................................................................................................ 211
Section 34.1: Check whether a variable is of a certain qualified type .................................................................. 211
Section 34.2: Generic selection based on multiple arguments ............................................................................. 211
Section 34.3: Type-generic printing macro ............................................................................................................ 213
Chapter 35: X-macros ............................................................................................................................................ 214
Section 35.1: Trivial use of X-macros for printfs ..................................................................................................... 214
Section 35.2: Extension: Give the X macro as an argument ................................................................................. 214
Section 35.3: Enum Value and Identifier ................................................................................................................. 215
Section 35.4: Code generation ................................................................................................................................. 215
Chapter 36: Aliasing and eective type ....................................................................................................... 217
Section 36.1: Eective type ....................................................................................................................................... 217
Section 36.2: restrict qualification ............................................................................................................................ 217
Section 36.3: Changing bytes ................................................................................................................................... 218
Section 36.4: Character types cannot be accessed through non-character types ........................................... 219
Section 36.5: Violating the strict aliasing rules ....................................................................................................... 220
Chapter 37: Compilation ....................................................................................................................................... 221
Section 37.1: The Compiler ........................................................................................................................................ 221
Section 37.2: File Types ............................................................................................................................................ 222
Section 37.3: The Linker ............................................................................................................................................ 222
Section 37.4: The Preprocessor ............................................................................................................................... 224
Section 37.5: The Translation Phases ...................................................................................................................... 225
Chapter 38: Inline assembly ................................................................................................................................ 227
Section 38.1: gcc Inline assembly in macros ........................................................................................................... 227
Section 38.2: gcc Basic asm support ...................................................................................................................... 227
Section 38.3: gcc Extended asm support ................................................................................................................ 228
Chapter 39: Identifier Scope ............................................................................................................................... 229
Section 39.1: Function Prototype Scope .................................................................................................................. 229
Section 39.2: Block Scope ......................................................................................................................................... 230
Section 39.3: File Scope ............................................................................................................................................ 230
Section 39.4: Function scope .................................................................................................................................... 231
Chapter 40: Implicit and Explicit Conversions ........................................................................................... 232
Section 40.1: Integer Conversions in Function Calls ............................................................................................... 232
Section 40.2: Pointer Conversions in Function Calls .............................................................................................. 233
Chapter 41: Type Qualifiers ................................................................................................................................ 235
Section 41.1: Volatile variables .................................................................................................................................. 235
Section 41.2: Unmodifiable (const) variables ......................................................................................................... 236
Chapter 42: Typedef .............................................................................................................................................. 237
Section 42.1: Typedef for Structures and Unions ................................................................................................... 237
Section 42.2: Typedef for Function Pointers .......................................................................................................... 238
Section 42.3: Simple Uses of Typedef ..................................................................................................................... 239
Chapter 43: Storage Classes .............................................................................................................................. 241
Section 43.1: auto ....................................................................................................................................................... 241
Section 43.2: register ................................................................................................................................................ 241
Section 43.3: static ..................................................................................................................................................... 242
Section 43.4: typedef ................................................................................................................................................ 243
Section 43.5: extern ................................................................................................................................................... 243
Section 43.6: _Thread_local .................................................................................................................................... 244
Chapter 44: Declarations .................................................................................................................................... 246
Section 44.1: Calling a function from another C file ............................................................................................... 246
Section 44.2: Using a Global Variable ..................................................................................................................... 247
Section 44.3: Introduction ......................................................................................................................................... 247
Section 44.4: Typedef ............................................................................................................................................... 250
Section 44.5: Using Global Constants ..................................................................................................................... 250
Section 44.6: Using the right-left or spiral rule to decipher C declaration .......................................................... 252
Chapter 45: Structure Padding and Packing ............................................................................................. 256
Section 45.1: Packing structures .............................................................................................................................. 256
Section 45.2: Structure padding .............................................................................................................................. 257
Chapter 46: Memory management ................................................................................................................ 258
Section 46.1: Allocating Memory .............................................................................................................................. 258
Section 46.2: Freeing Memory ................................................................................................................................. 259
Section 46.3: Reallocating Memory ......................................................................................................................... 261
Section 46.4: realloc(ptr, 0) is not equivalent to free(ptr) ..................................................................................... 262
Section 46.5: Multidimensional arrays of variable size ......................................................................................... 262
Section 46.6: alloca: allocate memory on stack .................................................................................................... 263
Section 46.7: User-defined memory management ............................................................................................... 264
Chapter 47: Implementation-defined behaviour ..................................................................................... 266
Section 47.1: Right shift of a negative integer ........................................................................................................ 266
Section 47.2: Assigning an out-of-range value to an integer ............................................................................... 266
Section 47.3: Allocating zero bytes .......................................................................................................................... 266
Section 47.4: Representation of signed integers ................................................................................................... 266
Chapter 48: Atomics ............................................................................................................................................... 267
Section 48.1: atomics and operators ....................................................................................................................... 267
Chapter 49: Jump Statements .......................................................................................................................... 268
Section 49.1: Using return ......................................................................................................................................... 268
Section 49.2: Using goto to jump out of nested loops .......................................................................................... 268
Section 49.3: Using break and continue ................................................................................................................. 269
Chapter 50: Create and include header files ............................................................................................. 271
Section 50.1: Introduction ......................................................................................................................................... 271
Section 50.2: Self-containment ................................................................................................................................ 271
Section 50.3: Minimality ............................................................................................................................................ 273
Section 50.4: Notation and Miscellany .................................................................................................................... 273
Section 50.5: Idempotence ....................................................................................................................................... 275
Section 50.6: Include What You Use (IWYU) ........................................................................................................... 275
Chapter 51: <ctype.h> — character classification & conversion ....................................................... 277
Section 51.1: Introduction .......................................................................................................................................... 277
Section 51.2: Classifying characters read from a stream ..................................................................................... 278
Section 51.3: Classifying characters from a string ................................................................................................. 279
Chapter 52: Side Eects ....................................................................................................................................... 280
Section 52.1: Pre/Post Increment/Decrement operators ..................................................................................... 280
Chapter 53: Multi-Character Character Sequence .................................................................................. 282
Section 53.1: Trigraphs .............................................................................................................................................. 282
Section 53.2: Digraphs .............................................................................................................................................. 282
Chapter 54: Constraints ........................................................................................................................................ 284
Section 54.1: Duplicate variable names in the same scope .................................................................................. 284
Section 54.2: Unary arithmetic operators .............................................................................................................. 284
Chapter 55: Inlining ................................................................................................................................................. 285
Section 55.1: Inlining functions used in more than one source file ....................................................................... 285
Chapter 56: Unions ................................................................................................................................................... 287
Section 56.1: Using unions to reinterpret values .................................................................................................... 287
Section 56.2: Writing to one union member and reading from another ............................................................. 287
Section 56.3: Dierence between struct and union ............................................................................................... 288
Chapter 57: Threads (native) ............................................................................................................................. 289
Section 57.1: Inititialization by one thread ............................................................................................................... 289
Section 57.2: Start several threads .......................................................................................................................... 289
Chapter 58: Multithreading ................................................................................................................................. 291
Section 58.1: C11 Threads simple example .............................................................................................................. 291
Chapter 59: Interprocess Communication (IPC) ........................................................................................ 292
Section 59.1: Semaphores ......................................................................................................................................... 292
Chapter 60: Testing frameworks ..................................................................................................................... 297
Section 60.1: Unity Test Framework ........................................................................................................................ 297
Section 60.2: CMocka ................................................................................................................................................ 297
Section 60.3: CppUTest ............................................................................................................................................. 298
Chapter 61: Valgrind ................................................................................................................................................ 300
Section 61.1: Bytes lost -- Forgetting to free ........................................................................................................... 300
Section 61.2: Most common errors encountered while using Valgrind ................................................................ 300
Section 61.3: Running Valgrind ................................................................................................................................. 301
Section 61.4: Adding flags ......................................................................................................................................... 301
Chapter 62: Common C programming idioms and developer practices ..................................... 302
Section 62.1: Comparing literal and variable .......................................................................................................... 302
Section 62.2: Do not leave the parameter list of a function blank — use void ................................................... 302
Chapter 63: Common pitfalls .............................................................................................................................. 305
Section 63.1: Mixing signed and unsigned integers in arithmetic operations ...................................................... 305
Section 63.2: Macros are simple string replacements .......................................................................................... 305
Section 63.3: Forgetting to copy the return value of realloc into a temporary .................................................. 307
Section 63.4: Forgetting to allocate one extra byte for \0 ................................................................................... 308
Section 63.5: Misunderstanding array decay ......................................................................................................... 308
Section 63.6: Forgetting to free memory (memory leaks) ................................................................................... 310
Section 63.7: Copying too much .............................................................................................................................. 311
Section 63.8: Mistakenly writing = instead of == when comparing ....................................................................... 312
Section 63.9: Newline character is not consumed in typical scanf() call ............................................................ 313
Section 63.10: Adding a semicolon to a #define .................................................................................................... 314
Section 63.11: Incautious use of semicolons ............................................................................................................ 314
Section 63.12: Undefined reference errors when linking ....................................................................................... 315
Section 63.13: Checking logical expression against 'true' ...................................................................................... 317
Section 63.14: Doing extra scaling in pointer arithmetic ....................................................................................... 318
Section 63.15: Multi-line comments cannot be nested ........................................................................................... 319
Section 63.16: Ignoring return values of library functions ..................................................................................... 321
Section 63.17: Comparing floating point numbers ................................................................................................. 321
Section 63.18: Floating point literals are of type double by default ..................................................................... 323
Section 63.19: Using character constants instead of string literals, and vice versa ........................................... 323
Section 63.20: Recursive function — missing out the base condition .................................................................. 324
Section 63.21: Overstepping array boundaries ...................................................................................................... 325
Section 63.22: Passing unadjacent arrays to functions expecting "real" multidimensional arrays ................. 326
Credits ............................................................................................................................................................................ 328
You may also like ...................................................................................................................................................... 333
About
Please feel free to share this PDF with anyone for free.
The latest version of this book can be downloaded from:
https://goalkicker.com/CBook
This is an unofficial free book created for educational purposes and is not
affiliated with official C group(s) or company(s) nor Stack Overflow. All
trademarks and registered trademarks are the property of their respective
company owners.
hello.c
#include <stdio.h>

int main(void)
{
    puts("Hello, World");
    return 0;
}
#include <stdio.h>
This line tells the compiler to include the contents of the standard library header file stdio.h in the program.
Headers are usually files containing function declarations, macros and data types, and a header must be included
before anything it declares is used. This line includes stdio.h so that the program can call the function puts().
int main(void)
This line starts the definition of a function. It states the name of the function (main), the type and number of
arguments it expects (void, meaning none), and the type of value that this function returns (int). Program
execution starts in the main() function.
{
…
}
The curly braces are used in pairs to indicate where a block of code begins and ends. They can be used in a lot of
ways, but in this case they indicate where the function begins and ends.
puts("Hello, World");
This line calls the puts() function to output text to standard output (the screen, by default), followed by a newline.
"Hello, World" is the string that will be written to the screen. In C, every string literal value must be inside the
double quotes "…".
return 0;
When we defined main(), we declared it as a function returning an int, meaning it needs to return an integer. In
this example, we are returning the integer value 0, which is used to indicate that the program exited successfully.
After the return 0; statement, the execution process will terminate.
Simple text editors include vim or gedit on Linux, or Notepad on Windows. Cross-platform editors also include
Visual Studio Code or Sublime Text.
The editor must create plain text files, not RTF or any other format.
To run the program, this source file (hello.c) first needs to be compiled into an executable file (e.g. hello on a
Unix/Linux system or hello.exe on Windows). This is done using a compiler for the C language.
GCC (GNU Compiler Collection) is a widely used C compiler. To use it, open a terminal, use the command line to
navigate to the source file's location and then run:
gcc hello.c -o hello
If no errors are found in the source code (hello.c), the compiler will create a binary file, the name of which is
given by the argument to the -o command line option (hello). This is the final executable file.
We can also use the warning options -Wall -Wextra -Werror, which help to identify problems that can cause the
program to fail or produce unexpected results. They are not necessary for this simple program, but this is the way
to add them:
gcc -Wall -Wextra -Werror -o hello hello.c
By design, the clang command line options are similar to those of GCC.
With the Microsoft C compiler, the program can be compiled from a command prompt with:
cl hello.c
Once compiled, the binary file may then be executed by typing ./hello in the terminal. Upon execution, the
compiled program will print Hello, World, followed by a newline, to the command prompt.
Version = K&R
#include <stdio.h>

main()
{
    printf("hello, world\n");
}
Note that the C programming language had not yet been standardized when the first edition of the K&R book
(The C Programming Language, 1978) was written, and that this program will probably not compile on most modern
compilers unless they are instructed to accept C90 code.
This very first example in the K&R book is now considered poor quality, in part because it lacks an explicit return
type for main() and in part because it lacks a return statement. The 2nd edition of the book was written for the old
C89 standard. In C89, the type of main would default to int, but the K&R example does not return a defined value
to the environment. In C99 and later standards, the return type is required, but it is safe to leave out the return
statement of main (and only main), because of a special case introduced with C99 5.1.2.2.3 — it is equivalent to
returning 0, which indicates success.
The recommended and most portable form of main for hosted systems is int main(void) when the program does
not use any command line arguments, or int main(int argc, char **argv) when the program does use the
command line arguments.
A return from the initial call to the main function is equivalent to calling the exit function with the value
returned by the main function as its argument. If the main function executes a return that specifies no
value, the termination status returned to the host environment is undefined.
If a return statement without an expression is executed, and the value of the function call is used by the
caller, the behavior is undefined.
If the return type of the main function is a type compatible with int, a return from the initial call to the
main function is equivalent to calling the exit function with the value returned by the main function as its
argument; reaching the } that terminates the main function returns a value of 0. If the return type is not
compatible with int, the termination status returned to the host environment is unspecified.
return 0;
}
#endif /* 0 */
/* this is a comment */
The comment above is a single line comment. Comments of this /* type can span multiple lines, like so:
/* this is a
multi-line
comment */
Though it is not strictly necessary, a common style convention with multi-line comments is to put leading spaces
and asterisks on the lines subsequent to the first, and the /* and */ on new lines, such that they all line up:
/*
 * this is a
 * multi-line
 * comment
 */
The extra asterisks do not have any functional effect on the comment as none of them have a related forward
slash.
These /* type of comments can be used on their own line, at the end of a code line, or even within lines of code:
Comments cannot be nested. This is because any subsequent /* will be ignored (as part of the comment) and the
first */ reached will be treated as ending the comment. The comment in the following example will not work:
/* outer comment, means this is ignored => /* attempted inner comment */ <= ends the comment, not
this one => */
To comment out blocks of code that contain comments of this type, which would otherwise be nested, see the
Commenting using the preprocessor example.
C99 introduced the use of C++-style single-line comments. This type of comment starts with two forward slashes
and runs to the end of a line:
// this is a comment
This type of comment does not allow multi-line comments, though it is possible to make a comment block by
adding several single line comments one after the other:
This type of comment may be used on its own line or at the end of a code line. However, because they run to the
end of the line, they may not be used within a code line.
While writing // delimited comments, it is possible to make a typographical error that affects their expected
operation. If one types:
The / at the end was a typo, but the sequence ??/ now gets interpreted as \, because ??/ forms a trigraph.
The ??/ trigraph is a longhand notation for \, which is the line continuation symbol. This means that the
compiler treats the next line as a continuation of the current line, that is, as a continuation of the comment, which may
not be what is intended.
The following set of operators with identical precedence and associativity are reused in declarators, namely:
The above three operators have the following precedence and associativity:
When interpreting declarations, one has to start from the identifier outwards and apply the adjacent operators in
the correct order as per the above table. Each application of an operator can be substituted with the following
English words:
Expression          Interpretation
thing[X]            an array of size X of...
thing(t1, t2, t3)   a function taking t1, t2, t3 and returning...
*thing              a pointer to...
It follows that the beginning of the English interpretation will always start with the identifier and will end with the
type that stands on the left-hand side of the declaration.
Examples
char *names[20];
[] takes precedence over *, so the interpretation is: names is an array of size 20 of a pointer to char.
char (*place)[10];
In case of using parentheses to override the precedence, the * is applied first: place is a pointer to an array of size
10 of char.
int fn(long, short);
There is no precedence to worry about here: fn is a function taking long, short and returning int.
int *fn(void);
The () is applied first: fn is a function taking void and returning a pointer to int.
int (*fp)(void);
Overriding the precedence of (): fp is a pointer to a function taking void and returning int.
int arr[5][8];
Multidimensional arrays are not an exception to the rule; the [] operators are applied in left-to-right order
according to the associativity in the table: arr is an array of size 5 of an array of size 8 of int.
int **ptr;
The two dereference operators have equal precedence, so the associativity takes effect. The operators are applied
in right-to-left order: ptr is a pointer to a pointer to an int.
Multiple Declarations
The comma can be used as a separator (not acting like the comma operator) in order to delimit multiple
declarators within a single declaration. The following declaration declares five identifiers:
int fn(void), *ptr, (*fp)(int), arr[10][20], num;
Alternative Interpretation
Because declarations mirror use, a declaration can also be interpreted in terms of the operators that could be
applied over the object and the final resulting type of that expression. The type that stands on the left-hand side is
the final result that is yielded after applying all operators.
/*
* Subscripting "arr" and dereferencing it yields a "char" result.
* Particularly: *arr[5] is of type "char".
*/
char *arr[20];
/*
* Calling "fn" yields an "int" result.
* Particularly: fn('b') is of type "int".
*/
int fn(char);
/*
* Dereferencing "fp" and then calling it yields an "int" result.
* Particularly: (*fp)() is of type "int".
*/
int (*fp)(void);
/*
* Subscripting "strings" twice and dereferencing it yields a "char" result.
* Particularly: *strings[5][15] is of type "char"
*/
char *strings[10][20];
The header <stdint.h> provides several fixed-width integer type definitions. These types are optional and only
provided if the platform has an integer type of the corresponding width, and if the corresponding signed type has a
two's complement representation of negative values.
See the remarks section for usage hints of fixed width types.
signed char c = 127; /* required to be 1 byte, see remarks for further information. */
signed short int si = 32767; /* required to be at least 16 bits. */
signed int i = 32767; /* required to be at least 16 bits */
signed long int li = 2147483647; /* required to be at least 32 bits. */
Version ≥ C99
signed long long int lli = 2147483647; /* required to be at least 64 bits */
For all types but char the signed version is assumed if the signed or unsigned part is omitted. The type char
constitutes a third character type, different from signed char and unsigned char and the signedness (or not)
depends on the platform.
Integer constants (often called literals in other languages) can be written in different bases and with different widths,
depending on their prefix or suffix.
Unsuffixed decimal constants are always signed. Hexadecimal constants start with 0x or 0X and octal constants start just with
a 0. The latter two are signed or unsigned depending on whether the value fits into the signed type or not.
Without a suffix the constant has the first type that fits its value; that is, a decimal constant that is larger than
INT_MAX has type long if it fits, or long long otherwise.
The header file <limits.h> describes the limits of integers as follows. Their implementation-defined values shall be
equal or greater in magnitude (absolute value) to those shown below, with the same sign.
If the value of an object of type char sign-extends when used in an expression, the value of CHAR_MIN shall be the
same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that of SCHAR_MAX . If the value of an
object of type char does not sign-extend when used in an expression, the value of CHAR_MIN shall be 0 and the
value of CHAR_MAX shall be the same as that of UCHAR_MAX.
Version ≥ C99
The C99 standard added a new header, <stdint.h>, which contains definitions for fixed width integers. See the
fixed width integer example for a more in-depth explanation.
Floating point arithmetic is implementation defined. However, most modern platforms (arm, x86, x86_64, MIPS) use
IEEE 754 floating point operations.
C also has three optional complex floating point types that are derived from the above.
String literals are not modifiable (and in fact may be placed in read-only memory such as .rodata). Attempting to
alter their values results in undefined behaviour.
char* s = "foobar";
s[0] = 'F'; /* undefined behaviour */
Multiple adjacent string literals are concatenated at compile time, which means you can write constructs like these.
C has many powerful operators. Many C operators are binary operators, which means they have two operands. For
example, in a / b, / is a binary operator that accepts two operands (a, b). There are some unary operators which
take one operand (for example: ~, ++), and only one ternary operator ? :.
Equals "=="
1 == 0; /* evaluates to 0. */
1 == 1; /* evaluates to 1. */
int x = 5;
int y = 5;
int *xptr = &x, *yptr = &y;
xptr == yptr; /* evaluates to 0, the operands hold different location addresses. */
*xptr == *yptr; /* evaluates to 1, the operands point at locations that hold the same value. */
Attention: This operator should not be confused with the assignment operator (=)!
Not equals "!="
1 != 0; /* evaluates to 1. */
1 != 1; /* evaluates to 0. */
int x = 5;
int y = 5;
int *xptr = &x, *yptr = &y;
xptr != yptr; /* evaluates to 1, the operands hold different location addresses. */
*xptr != *yptr; /* evaluates to 0, the operands point at locations that hold the same value. */
This operator effectively returns the opposite result to that of the equals (==) operator.
Not "!"
!someVal
Greater than ">"
Checks whether the left hand operand has a greater value than the right hand operand.
5 > 4 /* evaluates to 1. */
4 > 5 /* evaluates to 0. */
4 > 4 /* evaluates to 0. */
Less than "<"
Checks whether the left hand operand has a smaller value than the right hand operand.
5 < 4 /* evaluates to 0. */
4 < 5 /* evaluates to 1. */
4 < 4 /* evaluates to 0. */
Greater than or equal ">="
Checks whether the left hand operand has a greater or equal value to the right operand.
5 >= 4 /* evaluates to 1. */
4 >= 5 /* evaluates to 0. */
4 >= 4 /* evaluates to 1. */
Less than or equal "<="
Checks whether the left hand operand has a smaller or equal value to the right operand.
5 <= 4 /* evaluates to 0. */
4 <= 5 /* evaluates to 1. */
4 <= 4 /* evaluates to 1. */
a = b ? c : d;
is equivalent to:
if (b)
a = c;
else
a = d;
In pseudo-code: condition ? value_if_true : value_if_false. Each value can be the result of
an evaluated expression.
int x = 5;
int y = 42;
printf("%i, %i\n", 1 ? x : y, 0 ? x : y); /* Outputs "5, 42" */
The following example writes even integers to one file and odd integers to another file:
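#include <stdio.h>
int main(void)
{
    FILE *even = fopen("even.txt", "w"); /* file names are illustrative */
    FILE *odds = fopen("odds.txt", "w");
    int n = 10;
    int k;
    if (even == NULL || odds == NULL)
        return 1;
    for (k = 1; k <= n; k++)
        fprintf(k % 2 == 0 ? even : odds, "%d\n", k); /* the conditional operator picks the file */
    fclose(even);
    fclose(odds);
    return 0;
}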
The conditional operator associates from right to left. Consider the following:
Symbol   Operator
&        bitwise AND
|        bitwise inclusive OR
^        bitwise exclusive OR (XOR)
~        bitwise NOT (one's complement)
<<       left shift
>>       right shift
#include <stdio.h>
int main(void)
{
    unsigned int a = 29; /* 29 = 0001 1101 */
    unsigned int b = 48; /* 48 = 0011 0000 */
    unsigned int c;
    c = a | b; /* 61 = 0011 1101 */
    printf("%u | %u = %u\n", a, b, c);
    c = a ^ b; /* 45 = 0010 1101 */
    printf("%u ^ %u = %u\n", a, b, c);
    return 0;
}
Bitwise operations with signed types should be avoided because the sign bit of such a bit representation has a
particular meaning. Particular restrictions apply to the shift operators:
Left shifting a 1 bit into the sign bit is erroneous and leads to undefined behavior.
Right shifting a negative value (with sign bit 1) is implementation defined and therefore not portable.
If the value of the right operand of a shift operator is negative or is greater than or equal to the width of the
promoted left operand, the behavior is undefined.
Masking:
Masking refers to the process of extracting the desired bits from (or transforming the desired bits in) a variable by
using logical bitwise operations. The operand (a constant or variable) that is used to perform masking is called a
mask.
The following function uses a mask to display the bit pattern of a variable:
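#include <limits.h>
#include <stdio.h>
void bit_pattern(int u)
{
    unsigned mask = 1u << (CHAR_BIT * sizeof(int) - 1); /* start at the leftmost bit */
    for (; mask != 0; mask >>= 1)
        putchar((u & mask) ? '1' : '0'); /* test one bit per iteration */
    putchar('\n');
}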
Example:
#include <stdio.h>
int main(void) {
int a = 20;
int b = -5;
return 0;
}
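#include <stdio.h>
int print(int i) {
    printf("print function %d\n", i);
    return i;
}
int main(void) {
    int a = 20;
    /* a != 20 is false, so print(a) is never called */
    if (a != 20 && print(a)) {
        printf("I won't be printed!\n");
    }
    /* a == 20 is true, so print(a) is called */
    if (a == 20 && print(a)) {
        printf("I will be printed!\n");
    }
    return 0;
}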
$ ./a.out
print function 20
I will be printed!
Short-circuiting is important when you want to avoid evaluating terms that are (computationally) costly. Moreover,
it can heavily affect the flow of your program, as in this case: Why does this program print "forked!" 4 times?
Note that the comma used in function calls to separate arguments is NOT the comma operator; it is a
separator, which is different from the comma operator. Hence, it doesn't have the properties of the comma operator.
The above printf() call contains both the comma operator and the separator.
The comma operator is often used in the initialization section as well as in the updating section of a for loop. For
example:
for (k = 1; k < 10; printf("%d\n", k), k += 2); /* outputs the odd numbers below 10 */
Return a value that is the result of applying the left hand operand to the right hand operand, using the associated
mathematical operation. Normal mathematical rules of commutation apply (i.e. addition and multiplication are
commutative, subtraction, division and modulus are not).
Addition Operator
The addition operator (+) is used to add two operands together. Example:
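#include <stdio.h>
int main(void)
{
    int a = 5;
    int b = 7;
    int c = a + b; /* c now holds the value 12 */
    printf("%d + %d = %d\n", a, b, c); /* Will output "5 + 7 = 12" */
    return 0;
}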
Subtraction Operator
The subtraction operator (-) is used to subtract the second operand from the first. Example:
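#include <stdio.h>
int main(void)
{
    int a = 10;
    int b = 7;
    int c = a - b; /* c now holds the value 3 */
    printf("%d - %d = %d\n", a, b, c); /* Will output "10 - 7 = 3" */
    return 0;
}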
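Multiplication Operator
The multiplication operator (*) is used to multiply the two operands. Example:
#include <stdio.h>
int main(void)
{
    int a = 5;
    int b = 7;
    int c = a * b; /* c now holds the value 35 */
    printf("%d * %d = %d\n", a, b, c); /* Will output "5 * 7 = 35" */
    return 0;
}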
Division Operator
The division operator (/) divides the first operand by the second. If both operands of the division are integers, it will
return an integer value and discard the remainder (use the modulo operator % for calculating and acquiring the
remainder).
If one of the operands is a floating point value, the result is an approximation of the fraction.
Example:
#include <stdio.h>
int main(void)
{
    printf("%d %f\n", 7 / 2, 7.0 / 2); /* Will output "3 3.500000" */
    return 0;
}
Modulo Operator
The modulo operator (%) receives integer operands only, and is used to calculate the remainder after the first
operand is divided by the second. Example:
#include <stdio.h>
int main(void)
{
    printf("%d\n", 7 % 2); /* Will output "1" */
    return 0;
}
The increment (a++) and decrement (a--) operators are different in that they change the value of the variable you apply them to without an assignment
operator. You can use increment and decrement operators either before or after the variable. The placement of the
operator determines whether the variable is updated before or after its value is used in the enclosing
expression. Example:
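#include <stdio.h>
int main(void)
{
    int a = 1;
    int b = 4;
    int c = 1, d = 4;
    a++;
    printf("a = %d\n", a); /* Will output "a = 2" */
    b--;
    printf("b = %d\n", b); /* Will output "b = 3" */
    if (++c > 1) { /* c is incremented before its value is compared */
        printf("This will print\n");
    }
    if (d-- < 4) { /* d is decremented after its value is compared */
        printf("This will never print\n");
    }
    return 0;
}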
As the example for c and d shows, both operators have two forms, as prefix notation and postfix notation. Both
have the same effect in incrementing (++) or decrementing (--) the variable, but differ by the value they return:
prefix operations do the operation first and then return the value, whereas postfix operations first determine the
value that is to be returned, and then do the operation.
Because of this potentially counter-intuitive behaviour, the use of increment/decrement operators inside
expressions is controversial.
Member of object
Evaluates into the lvalue denoting the object that is a member of the accessed object.
struct MyStruct
{
int x;
int y;
};
Syntactic sugar for dereferencing followed by member access. Effectively, an expression of the form x->y is
shorthand for (*x).y — but the arrow operator is much clearer, especially if the structure pointers are nested.
struct MyStruct
{
int x;
int y;
p->x = 42;
p->y = 123;
Address-of
The unary & operator is the address of operator. It evaluates the given expression, where the resulting object must
be an lvalue. Then, it evaluates into an object whose type is a pointer to the resulting object's type, and contains the
address of the resulting object.
int x = 3;
int *p = &x;
printf("%p = %p\n", (void *)&x, (void *)p); /* Outputs "A = A", for some implementation-defined A. */
Dereference
The unary * operator dereferences a pointer. It evaluates into the lvalue resulting from dereferencing the pointer
that results from evaluating the given expression.
int x = 42;
int *p = &x;
printf("x = %d, *p = %d\n", x, *p); /* Outputs "x = 42, *p = 42". */
*p = 123;
printf("x = %d, *p = %d\n", x, *p); /* Outputs "x = 123, *p = 123". */
Indexing
Indexing is syntactic sugar for pointer addition followed by dereferencing. Effectively, an expression of the form
a[i] is equivalent to *(a + i) — but the explicit subscript notation is preferred.
int arr[] = { 1, 2, 3, 4, 5 };
printf("arr[2] = %i\n", arr[2]); /* Outputs "arr[2] = 3". */
Interchangeability of indexing
Adding a pointer to an integer is a commutative operation (i.e. the order of the operands does not change the
result) so pointer + integer == integer + pointer.
Usage of an expression 3[arr] instead of arr[3] is generally not recommended, as it affects code readability. It
tends to be popular in obfuscated programming contests.
Evaluates into the size in bytes, of type size_t, of objects of the given type. Requires parentheses around the type.
printf("%zu\n", sizeof(int)); /* Valid, outputs the size of an int object, which is platform-dependent. */
printf("%zu\n", sizeof int); /* Invalid, types as arguments need to be surrounded by parentheses! */
Evaluates into the size in bytes, of type size_t, of objects of the type of the given expression. The expression itself
is not evaluated. Parentheses are not required; however, because the given expression must be unary, it's
considered best practice to always use them.
char ch = 'a';
printf("%zu\n", sizeof(ch)); /* Valid, will output the size of a char object, which is always 1 for all platforms. */
printf("%zu\n", sizeof ch); /* Valid, will output the size of a char object, which is always 1 for all platforms. */
int x = 3;
int y = 4;
printf("%f\n", (double)x / y); /* Outputs "0.750000". */
Here the value of x is converted to a double, the division promotes the value of y to double, too, and the result of
the division, a double, is passed to printf for printing.
int a = 1;
int b = 1;
int tmp = 0;
Note that arithmetic operations do not introduce sequence points, so certain expressions with ++ or -- operators
may introduce undefined behaviour.
a += b /* equal to: a = a + b */
a -= b /* equal to: a = a - b */
a *= b /* equal to: a = a * b */
a /= b /* equal to: a = a / b */
a %= b /* equal to: a = a % b */
a &= b /* equal to: a = a & b */
a |= b /* equal to: a = a | b */
a ^= b /* equal to: a = a ^ b */
a <<= b /* equal to: a = a << b */
a >>= b /* equal to: a = a >> b */
One important feature of these compound assignments is that the expression on the left hand side (a) is only
evaluated once. E.g. if p is a pointer,
*p += 27;
dereferences p only once, whereas the following does so twice:
*p = *p + 27;
It should also be noted that the result of an assignment such as a = b is what is known as an rvalue. Thus, the
assignment actually has a value which can then be assigned to another variable. This allows the chaining of
assignments to set multiple variables in a single statement.
This rvalue can be used in the controlling expressions of if statements (or loops or switch statements) that guard
some code on the result of another expression or function call. For example:
char *buffer;
if ((buffer = malloc(1024)) != NULL)
{
/* do something with buffer */
free(buffer);
}
else
{
/* report allocation failure */
}
Because of this, care must be taken to avoid a common typo which can lead to mysterious bugs.
int a = 2;
/* ... */
if (a = 1)
/* Delete all files on my hard drive */
This will have disastrous results, as a = 1 always evaluates to 1 and thus the controlling expression of the if
statement will always be true. The author almost certainly meant to use
the equality operator (==) as shown below:
int a = 2;
/* ... */
if (a == 1)
/* Delete all files on my hard drive */
Operator Associativity
int a, b = 1, c = 2;
a = b = c;
This assigns c to b; the value of that assignment is then assigned to a. This happens because all assignment operators
have right associativity, which means the rightmost assignment in the expression is performed first, and evaluation
proceeds from right to left.
Logical AND
Performs a logical boolean AND-ing of the two operands, returning 1 if both of the operands are non-zero and 0
otherwise. The result of the logical AND operator is of type int.
0 && 0 /* Returns 0. */
0 && 1 /* Returns 0. */
2 && 0 /* Returns 0. */
2 && 3 /* Returns 1. */
Logical OR
Performs a logical boolean OR-ing of the two operands returning 1 if any of the operands are non-zero. The logical
OR operator is of type int.
0 || 0 /* Returns 0. */
0 || 1 /* Returns 1. */
2 || 0 /* Returns 1. */
2 || 3 /* Returns 1. */
Logical NOT
Performs a logical negation. The result of the logical NOT operator is of type int. The NOT operator yields 0 if its
operand compares unequal to 0, and 1 otherwise.
!1 /* Returns 0. */
!5 /* Returns 0. */
!0 /* Returns 1. */
Short-Circuit Evaluation
There are some crucial properties common to both && and ||:
the left-hand operand (LHS) is fully evaluated before the right-hand operand (RHS) is evaluated at all,
there is a sequence point between the evaluation of the left-hand operand and the right-hand operand,
and, most importantly, the right-hand operand is not evaluated at all if the result of the left-hand operand
determines the overall result.
if the LHS evaluates to 'true' (non-zero), the RHS of || will not be evaluated (because the result of 'true OR
anything' is 'true'),
if the LHS evaluates to 'false' (zero), the RHS of && will not be evaluated (because the result of 'false AND
anything' is 'false').
If a negative value is passed to the function, the value >= 0 term evaluates to false and the value < NUM_NAMES
term is not evaluated.
Pointer addition
Given a pointer and an integer N, evaluates into a pointer to the Nth element of the pointed-to type that directly
succeeds the pointed-to object in memory.
It does not matter whether the pointer is the left or the right operand. This means that things such as 3 +
arr are valid. If arr[k] is the (k+1)th member of an array, then arr+k is a pointer to arr[k]. In other words, arr or
arr+0 is a pointer to arr[0], arr+1 is a pointer to arr[1], and so on. In general, *(arr+k) is the same as arr[k].
#include <stdio.h>
static const size_t N = 5;
int main(void)
{
size_t k = 0;
int arr[] = {1, 2, 3, 4, 5};
for(k = 0; k < N; k++)
{
printf("\n\t%d", *(arr + k));
}
return 0;
}
By defining a pointer to the array, the above program is equivalent to the following:
#include <stdio.h>
static const size_t N = 5;
int main(void)
{
size_t k = 0;
int arr[] = {1, 2, 3, 4, 5};
int *ptr = arr; /* or int *ptr = &arr[0]; */
for(k = 0; k < N; k++)
{
printf("\n\t%d", ptr[k]);
/* or printf("\n\t%d", *(ptr + k)); */
/* or printf("\n\t%d", *ptr++); */
}
return 0;
}
See that the members of the array arr are accessed using the operators + and ++. The other operators that can be
used with the pointer ptr are - and --.
Pointer subtraction
Given two pointers to the same type, evaluates into an object of type ptrdiff_t that holds the scalar value that
must be added to the second pointer in order to obtain the value of the first pointer.
The type name may not be an incomplete type nor a function type. If an array is used as the type, the type of the
array element is used.
This operator is often accessed through the convenience macro alignof from <stdalign.h>.
#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>
int main(void)
{
printf("Alignment of char = %zu\n", alignof(char));
printf("Alignment of max_align_t = %zu\n", alignof(max_align_t));
printf("alignof(float[10]) = %zu\n", alignof(float[10]));
printf("alignof(struct{char c; int n;}) = %zu\n",
alignof(struct {char c; int n;}));
}
Possible Output:
Alignment of char = 1
Alignment of max_align_t = 16
alignof(float[10]) = 4
alignof(struct{char c; int n;}) = 4
http://en.cppreference.com/w/c/language/_Alignof
Using the system header file stdbool.h allows you to use bool as a Boolean data type. true evaluates to 1 and
false evaluates to 0.
#include <stdio.h>
#include <stdbool.h>
int main(void) {
bool x = true; /* equivalent to bool x = 1; */
bool y = false; /* equivalent to bool y = 0; */
if (x) /* Functionally equivalent to if (x != 0) or if (x != false) */
{
puts("This will print!");
}
if (!y) /* Functionally equivalent to if (y == 0) or if (y == false) */
{
puts("This will also print!");
}
}
bool is just a nice spelling for the data type _Bool. It has special rules when numbers or pointers are converted to it.
#include <stdio.h>
#define bool int
#define true 1
#define false 0
int main(void) {
bool x = true; /* Equivalent to int x = 1; */
bool y = false; /* Equivalent to int y = 0; */
if (x) /* Functionally equivalent to if (x != 0) or if (x != false) */
{
puts("This will print!");
}
if (!y) /* Functionally equivalent to if (y == 0) or if (y == false) */
{
puts("This will also print!");
}
}
Don't introduce this in new code since the definition of these macros might clash with modern uses of
<stdbool.h>.
Added in the C standard version C99, _Bool is also a native C data type. It is capable of holding the values 0 (for
false) and 1 (for true).
#include <stdio.h>
int main(void) {
_Bool x = 1;
_Bool y = 0;
if(x) /* Equivalent to if (x == 1) */
{
puts("This will print!");
}
if (!y) /* Equivalent to if (y == 0) */
{
puts("This will also print!");
}
}
_Bool is an integer type but has special rules for conversions from other types. The result is analogous to the usage
of other types in if expressions. In the following
_Bool z = X;
If X has an arithmetic type (is any kind of number), z becomes 0 if X == 0. Otherwise z becomes 1.
If X has a pointer type, z becomes 0 if X is a null pointer and 1 otherwise.
To use nicer spellings bool, false and true you need to use <stdbool.h>.
The expression argc % 4 is evaluated and leads to one of the values 0, 1, 2 or 3. The first, 0 is the only value that is
"false" and brings execution into the else part. All other values are "true" and go into the if part.
Here the pointer A is evaluated and if it is a null pointer, an error is detected and the program exits.
Many people prefer to write something as A == NULL, instead, but if you have such pointer comparisons as part of
To check this, you would have to scan complicated code in the expression and be sure about operator precedence.
is relatively easy to capture: if the pointer is valid we check if the first character is non-zero and then check if it is a
letter.
This allows compilers for historic versions of C to function, but remains forward compatible if the code is compiled
with a modern C compiler.
For more information on typedef, see Typedef, for more on enum see Enumerations
This means that a C-string with a content of "abc" will have four characters 'a', 'b', 'c' and '\0'.
#include <stdio.h>
#include <string.h>
int main(void)
{
int toknum = 0;
char src[] = "Hello,, world!";
const char delimiters[] = ", !";
char *token = strtok(src, delimiters);
while (token != NULL)
{
printf("%d: [%s]\n", ++toknum, token);
token = strtok(NULL, delimiters);
}
/* src is now "Hello\0, world\0\0" */
}
Output:
1: [Hello]
2: [world]
The string of delimiters may contain one or more delimiters and different delimiter strings may be used with each
call to strtok.
Calls to strtok to continue tokenizing the same source string should not pass the source string again, but instead
pass NULL as the first argument. If the same source string is passed then the first token will instead be re-tokenized.
That is, given the same delimiters, strtok would simply return the first token again.
Note that as strtok does not allocate new memory for the tokens, it modifies the source string. That is, in the above
example, the string src will be manipulated to produce the tokens that are referenced by the pointer returned by
the calls to strtok. This means that the source string cannot be const (so it can't be a string literal). It also means
that the identity of the delimiting byte is lost (i.e. in the example the "," and "!" are effectively deleted from the
source string and you cannot tell which delimiter character matched).
Note also that multiple consecutive delimiters in the source string are treated as one; in the example, the second
comma is ignored.
strtok is neither thread safe nor re-entrant because it uses a static buffer while parsing. This means that if a
function calls strtok, no function that it calls while it is using strtok can also use strtok, and it cannot be called by
any function that is itself using strtok.
char src[] = "1.2,3.5,4.2";
char *first = strtok(src, ",");
do
{
char *part;
/* Nested calls to strtok do not work as desired */
printf("[%s]\n", first);
part = strtok(first, ".");
while (part != NULL)
{
printf(" [%s]\n", part);
part = strtok(NULL, ".");
}
} while ((first = strtok(NULL, ",")) != NULL);
Output:
[1.2]
[1]
[2]
The expected operation is that the outer do while loop should create three tokens consisting of each decimal
number string ("1.2", "3.5", "4.2"), for each of which the strtok calls for the inner loop should split it into
separate digit strings ("1", "2", "3", "5", "4", "2").
However, because strtok is not re-entrant, this does not occur. Instead the first strtok correctly creates the "1.2\0"
token, and the inner loop correctly creates the tokens "1" and "2". But then the strtok in the outer loop is at the
end of the string used by the inner loop, and returns NULL immediately. The second and third substrings of the src
array are not analyzed at all.
The C standard library does not contain a thread-safe or re-entrant version, but some other libraries do, such as POSIX's
strtok_r. Note that on MSVC the strtok equivalent, strtok_s, is thread-safe.
Version ≥ C11
C11 has an optional part, Annex K, that offers a thread-safe and re-entrant version named strtok_s. You can test
for the feature with __STDC_LIB_EXT1__. This optional part is not widely supported.
The strtok_s function differs from the POSIX strtok_r function by guarding against storing outside of the string
being tokenized, and by checking runtime constraints. On correctly written programs, though, the strtok_s and
strtok_r behave the same.
Using strtok_s with the example now yields the correct response, like so:
#ifndef __STDC_LIB_EXT1__
# error "we need strtok_s from Annex K"
#endif
do
{
char *part;
char *posn;
printf("[%s]\n", first);
part = strtok_s(first, ".", &posn);
while (part != NULL)
{
printf(" [%s]\n", part);
part = strtok_s(NULL, ".", &posn);
}
}
while ((first = strtok_s(NULL, ",", &next)) != NULL);
[1.2]
[1]
[2]
[3.5]
[3]
[5]
[4.2]
[4]
[2]
For historical reasons, the elements of the array corresponding to a string literal are not formally const.
Nevertheless, any attempt to modify them has undefined behavior. Typically, a program that attempts to modify
the array corresponding to a string literal will crash or otherwise malfunction.
Where a pointer points to a string literal -- or where it sometimes may do -- it is advisable to declare that pointer's
referent const to avoid engaging such undefined behavior accidentally.
On the other hand, a pointer to or into the underlying array of a string literal is not itself inherently special; its value
can freely be modified to point to something else:
Furthermore, although initializers for char arrays can have the same form as string literals, use of such an initializer
does not confer the characteristics of a string literal on the initialized array. The initializer simply designates the
length and initial contents of the array. In particular, the elements are modifiable if not explicitly declared const:
return EXIT_SUCCESS;
}
This program computes the length of its second input argument and stores the result in len. It then prints that
length to the terminal. For example, when run with the parameters program_name "Hello, world!", the program
will output The length of the second argument is 13. because the string Hello, world! is 13 characters long.
strlen counts all the bytes from the beginning of the string up to, but not including, the terminating NUL
character, '\0'. As such, it can only be used when the string is guaranteed to be NUL-terminated.
Also keep in mind that if the string contains any non-ASCII Unicode characters in a multibyte encoding such as UTF-8,
strlen will not tell you how many characters are in the string (since some characters may be multiple bytes long).
In such cases, you need to count the characters (i.e., code points) yourself. Consider the output of the following example:
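#include <stdio.h>
#include <string.h>
int main(void)
{
    char asciiString[50] = "Hello world!";
    char utf8String[50] = "Γειά σου Κόσμε!"; /* "Hello World!" in Greek */
    printf("asciiString has %zu bytes\n", strlen(asciiString));
    printf("utf8String has %zu bytes\n", strlen(utf8String));
    return 0;
}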
We can create strings using string literals, which are sequences of characters surrounded by double quotation
marks; for example, take the string literal "hello world". String literals are automatically null-terminated.
We can create strings using several methods. For instance, we can declare a char * and initialize it to point to the
first character of a string:
When initializing a char * to a string constant as above, the string itself is usually allocated in read-only data;
string is a pointer to the first element of the array, which is the character 'h'.
Since the string literal is allocated in read-only memory, it is non-modifiable (see note 1). Any attempt to modify it
will lead to undefined behaviour, so it's better to add const to get a compile-time error, like this:
To create a modifiable string, you can declare a character array and initialize its contents using a string literal, like
so:
char modifiable_string[] = {'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0'};
Since the second version uses a brace-enclosed initializer, the string is not automatically null-terminated unless a
'\0' character is included explicitly in the character array, usually as its last element.
1 Non-modifiable implies that the characters in the string literal can't be modified, but remember that the pointer
string can be modified (can point somewhere else or can be incremented or decremented).
2 Both strings have a similar effect in the sense that the characters of neither string can be modified. Note that
string is a pointer to char and a modifiable lvalue, so it can be incremented or made to point to some other
location, whereas the array string_arr is a non-modifiable lvalue and cannot be assigned to.
#include <stdio.h>
int main(void) {
int a = 10, b;
char c[] = "abc", *d;
b = a; /* Integer is copied */
a = 20; /* Modifying a leaves b unchanged - b is a 'deep copy' of a */
printf("%d %d\n", a, b); /* "20 10" will be printed */
d = c;
/* Only copies the address of the string -
there is still only one string stored in memory */
c[1] = 'x';
/* Modifies the original string - d[1] = 'x' will do exactly the same thing */
return 0;
}
The above example compiled because we used char *d rather than char d[3]. Using the latter would cause a
compiler error. You cannot assign to arrays in C.
#include <stdio.h>
int main(void) {
char a[] = "abc";
char b[8];
b = a; /* compile error */
printf("%s\n", b);
return 0;
}
To actually copy strings, the strcpy() function is available in <string.h>. Enough space must be allocated for the
destination before copying.
#include <stdio.h>
#include <string.h>
int main(void) {
char a[] = "abc";
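char b[8];
strcpy(b, a); /* b has room for "abc" plus the terminating '\0' */
printf("%s\n", b); /* "abc" will be printed */
return 0;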
}
Version ≥ C99
To avoid buffer overrun, snprintf() may be used. It is not the best solution performance-wise since it has to parse
the template string, but it is the only buffer limit-safe function for copying strings readily available in the
standard library that can be used without any extra steps.
#include <stdio.h>
#include <string.h>
int main(void) {
char a[] = "012345678901234567890";
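char b[8];
#if 0
strcpy(b, a); /* causes buffer overrun (undefined behavior), so do not execute this here! */
#endif
snprintf(b, sizeof(b), "%s", a); /* truncates instead of overrunning b */
printf("%s\n", b); /* "0123456" will be printed */
return 0;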
}
strncat()
A second option, with better performance, is to use strncat() (a buffer overflow checking version of strcat()) - it
takes a third argument that tells it the maximum number of bytes to copy:
char dest[32];
dest[0] = '\0';
strncat(dest, source, sizeof(dest) - 1);
/* copies up to the first (sizeof(dest) - 1) elements of source into dest,
then puts a \0 on the end of dest */
Note that this formulation uses sizeof(dest) - 1; this is crucial because strncat() always adds a null byte (good),
but doesn't count that byte in the size it is given (a cause of confusion and buffer overwrites).
Also note that the alternative — concatenating after a non-empty string — is even more fraught. Consider:
Note, though, that the size specified as the length was not the size of the destination array, but the amount of space
left in it, not counting the terminal null byte. This can cause big overwriting problems. It is also a bit wasteful; to
specify the length argument correctly, you know the length of the data in the destination, so you could instead
specify the address of the null byte at the end of the existing content, saving strncat() from rescanning it:
This produces the same output as before, but strncat() doesn't have to scan over the existing content of dst
before it starts copying.
strncpy()
The last option is the strncpy() function. Although you might think it should come first, it is a rather deceptive
function that has two main gotchas:
1. If copying via strncpy() hits the buffer limit, a terminating null-character won't be written.
2. strncpy() always completely fills the destination, with null bytes if necessary.
(This quirky behavior is historical; it was initially intended for handling UNIX file names.)
Even then, if you have a big buffer, strncpy() becomes very inefficient because of the additional null padding.
char * string = "hello world"; /* This is 11 chars long, excluding the 0-terminator. */
size_t i = 0;
for (; i < 11; i++) {
printf("%c\n", string[i]); /* Print each character of the string. */
}
Alternatively, we can use the standard function strlen() to get the length of a string if we don't know what the
string is:
Finally, we can take advantage of the fact that strings in C are guaranteed to be null-terminated (which we already
did when passing it to strlen() in the previous example ;-)). We can iterate over the array regardless of its size and
stop iterating once we reach a null-character:
size_t i = 0;
while (string[i] != '\0') { /* Stop looping when we reach the null-character. */
printf("%c\n", string[i]); /* Print each character of the string. */
i++;
}
char * string_array[] = {
"foo",
"bar",
"baz"
};
Remember: when we assign string literals to char *, the strings themselves are allocated in read-only memory.
However, the array string_array is allocated in read/write memory. This means that we can modify the pointers in
the array, but we cannot modify the strings they point to.
In C, the parameter to main argv (the array of command-line arguments passed when the program was run) is an
array of char *: char * argv[].
We can also create arrays of character arrays. Since strings are arrays of characters, an array of strings is simply an
array whose elements are arrays of characters:
char modifiable_string_array_literals[][4] = {
"foo",
"bar",
"baz"
};
char modifiable_string_array[][4] = {
{'f', 'o', 'o', '\0'},
{'b', 'a', 'r', '\0'},
{'b', 'a', 'z', '\0'}
};
Note that we specify 4 as the size of the second dimension of the array; each of the strings in our array is actually 4
bytes since we must include the null-terminating character.
#include <stdio.h>
#include <stdlib.h>
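int main(int argc, char **argv)
{
int val;
if (argc < 2) return EXIT_FAILURE; /* no argument to convert */
val = atoi(argv[1]);
printf("String value = %s, Int value = %d\n", argv[1], val);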
return 0;
}
When the string to be converted is a valid decimal integer that is in range, the function works:
$ ./atoi 100
String value = 100, Int value = 100
$ ./atoi 200
String value = 200, Int value = 200
For strings that start with a number, followed by something else, only the initial number is parsed:
$ ./atoi 0x200
0
$ ./atoi 0123x300
123
$ ./atoi hello
Formatting the hard disk...
Because of the ambiguities above and this undefined behavior, the atoi family of functions should never be used.
Version ≥ C99
#include <stdio.h>
int main ()
{
char buffer [50];
double PI = 3.1415926;
sprintf (buffer, "PI = %.7f", PI);
printf ("%s\n",buffer);
return 0;
}
#include <stdio.h>
int main ()
{
char sentence []="date : 06-06-2012";
char str [50];
int year;
int month;
int day;
sscanf (sentence,"%49s : %2d-%2d-%4d", str, &day, &month, &year); /* %49s bounds the write into str[50] */
printf ("%s -> %02d-%02d-%4d\n",str, day, month, year);
return 0;
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
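int main(int argc, char **argv)
{
char toSearchFor = 'A';
/* Exit if the argument to search is missing. */
if (argc != 2)
{
printf("Argument missing.\n");
return EXIT_FAILURE;
}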
{
char *firstOcc = strchr(argv[1], toSearchFor);
if (firstOcc != NULL)
{
printf("First position of %c in %s is %td.\n",
toSearchFor, argv[1], firstOcc-argv[1]); /* A pointer difference's result
is a signed integer and uses the length modifier 't'. */
}
else
{
printf("%c is not in %s.\n", toSearchFor, argv[1]);
}
}
{
char *lastOcc = strrchr(argv[1], toSearchFor);
if (lastOcc != NULL)
{
printf("Last position of %c in %s is %td.\n",
toSearchFor, argv[1], lastOcc-argv[1]);
}
}
return EXIT_SUCCESS;
}
$ ./pos AAAAAAA
First position of A in AAAAAAA is 0.
Last position of A in AAAAAAA is 6.
$ ./pos BAbbbbbAccccAAAAzzz
First position of A in BAbbbbbAccccAAAAzzz is 1.
Last position of A in BAbbbbbAccccAAAAzzz is 15.
$ ./pos qwerty
A is not in qwerty.
One common use for strrchr is to extract a file name from a path. For example to extract myfile.txt from
C:\Users\eak\myfile.txt:
return NULL;
}
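#include <stdio.h>
#include <string.h>
int main(void)
{
/* Always ensure that your string is large enough to contain the characters
* and a terminating NUL character ('\0')!
*/
char mystring[10];
strcpy(mystring, "foo"); /* copy "foo", including its NUL terminator */
printf("%s\n", mystring);
strcat(mystring, "bar"); /* append "bar"; 7 of the 10 bytes are now used */
printf("%s\n", mystring);
strcpy(mystring, "bar"); /* overwrite the former contents */
printf("%s\n", mystring);
return 0;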
}
Outputs:
foo
foobar
bar
The strcmp function lexicographically compares two null-terminated character arrays. It returns a negative value
if the first argument appears before the second in lexicographical order, zero if they compare equal, or a positive
value if the first argument appears after the second in lexicographical order.
#include <stdio.h>
#include <string.h>
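/* Print how lhs orders relative to rhs according to strcmp(). */
void compare(const char *lhs, const char *rhs)
{
int result = strcmp(lhs, rhs); /* compute the comparison once */
if (result < 0) printf("%s comes before %s\n", lhs, rhs);
else if (result == 0) printf("%s equals %s\n", lhs, rhs);
else printf("%s comes after %s\n", lhs, rhs);
}
int main(void)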
{
compare("BBB", "BBB");
compare("BBB", "CCCCC");
compare("BBB", "AAAAAA");
return 0;
}
Outputs:
BBB equals BBB
BBB comes before CCCCC
BBB comes after AAAAAA
Like strcmp, the strcasecmp function compares its arguments lexicographically, but only after translating each
character to its lowercase equivalent (strcasecmp is a POSIX function declared in <strings.h>, not part of standard C):
int main(void)
{
compare("BBB", "bBB");
compare("BBB", "ccCCC");
compare("BBB", "aaaaaa");
return 0;
}
Outputs:
BBB equals bBB
BBB comes before ccCCC
BBB comes after aaaaaa
#include <stdio.h>
#include <string.h>
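/* Compare at most n characters of lhs and rhs using strncmp(). */
void compare(const char *lhs, const char *rhs, size_t n)
{
int result = strncmp(lhs, rhs, n);
if (result < 0) printf("%s comes before %s\n", lhs, rhs);
else if (result == 0) printf("%s equals %s\n", lhs, rhs);
else printf("%s comes after %s\n", lhs, rhs);
}
int main(void)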
{
compare("BBB", "Bb", 1);
compare("BBB", "Bb", 2);
compare("BBB", "Bb", 3);
return 0;
}
Outputs:
BBB equals Bb
BBB comes before Bb
BBB comes before Bb
The C library has a set of safe conversion functions that interpret a string as a number. Their names are of
the form strtoX, where X is one of l, ul, d, etc., and determines the target type of the conversion. (strtol, strtoul
and strtod date back to C89; C99 added strtoll, strtoull, strtof and strtold.)
/* At this point we know that everything went fine so ret may be used */
If the string in fact contains no number at all, this usage of strtod returns 0.0.
If this is not satisfactory, the additional parameter endptr can be used. It is a pointer to a pointer that the
function sets to point just past the end of the detected number in the string. If it is given as 0, as above, or NULL,
it is simply ignored.
The endptr parameter indicates whether there has been a successful conversion and, if so, where the number
ended:
char *check = 0;
double ret = strtod(argv[1], &check); /* attempt conversion */
/* At this point we know that everything went fine so ret may be used */
The integer conversion functions (strtol, strtoul and their long long variants) have a third parameter nbase that
holds the number base in which the number is written.
long a = strtol("101", 0, 2 ); /* a = 5L */
long b = strtol("101", 0, 8 ); /* b = 65L */
long c = strtol("101", 0, 10); /* c = 101L */
long d = strtol("101", 0, 16); /* d = 257L */
long e = strtol("101", 0, 0 ); /* e = 101L */
The special value 0 for nbase means the string is interpreted in the same way as number literals are interpreted in a
C program: a prefix of 0x corresponds to a hexadecimal representation, otherwise a leading 0 is octal and all other
numbers are seen as decimal.
Thus the most practical way to interpret a command-line argument as a number would be
...
return EXIT_SUCCESS;
}
This means that the program can be called with a parameter in octal, decimal or hexadecimal.
/*
Provided a string of "tokens" delimited by "separators", print the tokens along
with the token separators that get skipped.
*/
#include <stdio.h>
#include <string.h>
int main(void)
{
const char sepchars[] = ",.;!?";
char foo[] = ";ball call,.fall gall hall!?.,";
char *s;
int n;
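for (s = foo; *s != '\0'; /* empty */) {
/* Count and report the leading separator characters, then skip them. */
n = (int)strspn(s, sepchars);
if (n > 0)
printf("skipping separators: << %.*s >> (length=%d)\n", n, s, n);
s += n;
/* Count and report the token (non-separator) characters, then skip them. */
n = (int)strcspn(s, sepchars);
if (n > 0)
printf("token found: << %.*s >> (length=%d)\n", n, s, n);
s += n;
}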
return 0;
}
Analogous functions using wide-character strings are wcsspn and wcscspn; they're used the same way.
In order to use these suffixes, the literal must be a floating point literal. For example, 3f is an error, since 3 is an
integer literal, while 3.f or 3.0f are correct. For long double, the recommendation is to always use capital L for the
sake of readability.
The L prefix makes the literal a wide character array, with elements of type wchar_t. For example, L"abcd".
For the latter two, it can be queried with feature test macros if the encoding is effectively the corresponding UTF
encoding.
The L prefix before a character literal makes it a wide character of type wchar_t. Likewise since C11 u and U prefixes
make it wide characters of type char16_t and char32_t, respectively.
When intending to represent certain special characters, such as a character that is non-printing, escape sequences
are used. Escape sequences use a sequence of characters that are translated into another character. All escape
sequences consist of two or more characters, the first of which is a backslash \. The characters immediately
following the backslash determine what character literal the sequence is interpreted as.
A universal character name is a Unicode code point. A universal character name may map to more than one
character. The digits n are interpreted as hexadecimal digits. Depending on the UTF encoding in use, a universal
character name sequence may result in a code point that consists of multiple characters, instead of a single normal
char character.
When using the line feed escape sequence in text mode I/O, it is converted to the OS-specific newline byte or byte
sequence.
The question mark escape sequence is used to avoid trigraphs. For example, ??/ is compiled as the trigraph
representing a backslash character '\', but using ?\?/ would result in the string "??/".
There may be one, two or three octal numerals n in the octal value escape sequence.
Note that this notation doesn't include any sign, so integer literals are never negative. Something like -1 is treated
as an expression that has one integer literal (1) negated by the unary - operator.
The type of a decimal integer literal is the first data type that can fit the value from int and long. Since C99, long
long is also supported for very large literals.
The type of an octal or hexadecimal integer literal is the first data type that can fit the value from int, unsigned,
long, and unsigned long. Since C99, long long and unsigned long long are also supported for very large literals.
Suffix              Explanation
L, l                long int
LL, ll (since C99)  long long int
U, u                unsigned
The U and L/LL suffixes can be combined in any order and case. It is an error to duplicate suffixes (e.g. provide two
U suffixes) even if they have different cases.
p is initialized to the address of the first element of an unnamed array of two ints.
The compound literal is an lvalue. The storage duration of the unnamed object is either static (if the literal appears
at file scope) or automatic (if the literal appears at block scope), and in the latter case the object's lifetime ends
when control leaves the enclosing block.
void f(void)
{
int *p;
/*...*/
p = (int [2]){ *p };
/*...*/
}
p is assigned the address of the first element of an array of two ints, the first having the value previously
pointed to by p and the second, zero.[...]
struct point {
unsigned x;
unsigned y;
};
A fictive function drawline receives two arguments of type struct point. The first has coordinate values x == 1
and y == 1, whereas the second has x == 3 and y == 4.
In this case the size of the array is not specified, so it is determined by the length of the initializer.
Compound literal having length of initializer less than array size specified
int *p = (int [10]){1, 2, 3};
Note that a compound literal is an lvalue and therefore its elements are modifiable. A read-only compound
literal can be specified using the const qualifier, as in (const int[]){1,2}.
Inside a function, a compound literal, as for any initialization since C99, can have arbitrary expressions.
void foo(void)
{
int *p;
int i = 2, j = 5;
/*...*/
p = (int [2]){ i+j, i*j };
/*...*/
}
Most variables in C have a size that is an integral number of bytes. Bit-fields are parts of a structure that don't
necessarily occupy an integral number of bytes; they can occupy any number of bits up to the width of their
declared type. Multiple bit-fields can be packed into a single storage unit. They are a part of standard C, but many
aspects are implementation-defined, which makes them one of the least portable parts of C.
struct encoderPosition {
unsigned int encoderCounts : 23;
unsigned int encoderTurns : 4;
unsigned int _reserved : 5;
};
In this example we consider an encoder with 23 bits of single precision and 4 bits to describe multi-turn. Bit-fields
are often used when interfacing with hardware that outputs data associated with a specific number of bits. Another
example could be communication with an FPGA, where the FPGA writes data into your memory in 32 bit sections
allowing for hardware reads:
struct FPGAInfo {
union {
struct {
unsigned int bulb1On : 1;
unsigned int bulb2On : 1;
unsigned int bulb1Off : 1;
unsigned int bulb2Off : 1;
unsigned int jetOn : 1;
} bits; /* named member, so that fInfo.bits.bulb1On is valid */
unsigned int data;
};
};
For this example we have shown a commonly used construct to be able to access the data in its individual bits, or to
write the data packet as a whole (emulating what the FPGA might do). We could then access the bits like this:
struct FPGAInfo fInfo; /* the struct keyword is required in C; there is no typedef */
fInfo.data = 0xFF34F;
if (fInfo.bits.bulb1On) {
printf("Bulb 1 is on\n");
}
This is valid, but as per the C99 standard 6.7.2.1, item 10:
The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is
implementation-defined.
typedef union {
struct {
#if defined(WIN32) || defined(LITTLE_ENDIAN)
uint8_t commFailure :1;
uint8_t hardwareFailure :1;
uint8_t _reserved :6;
#else
uint8_t _reserved :6;
uint8_t hardwareFailure :1;
uint8_t commFailure :1;
#endif
} bits; /* named member rather than a bare struct tag */
uint8_t data;
} hardwareStatus;
int main(void)
{
/* define a small bit-field that can hold values from 0 .. 7 */
struct
{
unsigned int uint3: 3;
} small;
return 0;
}
struct C
{
short s; /* 2 bytes */
char c; /* 1 byte */
int bit1 : 1; /* 1 bit */
int nib : 4; /* 4 bits padded up to boundary of 8 bits. Thus 3 bits are padded */
int sept : 7; /* 7 Bits septet, padded up to boundary of 32 bits. */
};
An unnamed bit-field may be of any size, but they can't be initialized or referenced.
A zero-width bit-field cannot be given a name and aligns the next field to the boundary defined by the datatype of
the bit-field. This is achieved by padding bits between the bit-fields.
struct A
{
unsigned char c1 : 3;
unsigned char c2 : 4;
unsigned char c3 : 1;
};
In structure B, the first unnamed bit-field skips 2 bits; the zero-width bit-field after c2 causes c3 to start from the
char boundary (so 3 bits are skipped between c2 and c3). There are 3 padding bits after c4. Thus the size of the
structure is 2 bytes.
struct B
{
unsigned char c1 : 1;
unsigned char : 2; /* Skips 2 bits in the layout */
unsigned char c2 : 2;
unsigned char : 0; /* Causes padding up to next container boundary */
unsigned char c3 : 4;
unsigned char c4 : 1;
};
int SomeFunction(void)
{
// Somewhere in the code
A a = { … };
printf("Address of a.c2 is %p\n", &a.c2); /* incorrect, see point 2 */
printf("Size of a.c2 is %zu\n", sizeof(a.c2)); /* incorrect, see point 4 */
}
e.g. consider the following variables having the ranges as given below.
a --> range 0 - 3
b --> range 0 - 1
c --> range 0 - 7
d --> range 0 - 1
e --> range 0 - 1
If we declare these variables separately, then each has to be at least an 8-bit integer and the total space required
will be 5 bytes. Moreover the variables will not use the entire range of an 8 bit unsigned integer (0-255). Here we
can use bit-fields.
typedef struct {
unsigned int a:2;
unsigned int b:1;
unsigned int c:3;
unsigned int d:1;
unsigned int e:1;
} bit_a;
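The bit-fields in the structure are accessed the same way as any other structure member. The programmer needs
to take care that the values written are in range: an out-of-range value stored in an unsigned bit-field is reduced
modulo 2^width, and for a signed bit-field the result is implementation-defined.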
int main(void)
{
bit_a bita_var;
bita_var.a = 2; // to write into element a
printf ("%d",bita_var.a); // to read from element a.
return 0;
}
Often the programmer wants to zero the set of bit-fields. This can be done element by element, but there is a second
method. Simply create a union of the structure above with an unsigned type that is greater than, or equal to, the
size of the structure. Then the entire set of bit-fields may be zeroed by zeroing this unsigned integer.
typedef union {
struct {
unsigned int a:2;
unsigned int b:1;
unsigned int c:3;
unsigned int d:1;
unsigned int e:1;
};
uint8_t data;
} union_bit;
Usage is as follows
int main(void)
{
union_bit un_bit;
un_bit.data = 0x00; // clear the whole bit-field
un_bit.a = 2; // write into element a
In conclusion, bit-fields are commonly used in memory constrained situations where you have a lot of variables
which can take on limited ranges.
C supports dynamically allocated arrays whose size is determined at run time. C99 and later supports variable
length arrays or VLAs.
type arrName[size];
where type could be any built-in type or user-defined types such as structures, arrName is a user-defined identifier,
and size is an integer constant.
Declaring an array (an array of 10 int variables in this case) is done like this:
int array[10];
it now holds indeterminate values. To ensure it holds zero values while declaring, you can do this:
Arrays can also have initializers. This example declares an array of 10 ints, where the first 3 ints will contain the
values 1, 2, 3 and all other values will be zero:
In the above method of initialization, the first value in the list will be assigned to the first member of the array, the
second value will be assigned to the second member of the array and so on. If the list size is smaller than the array
size, then as in the above example, the remaining members of the array will be initialized to zeros. With designated
list initialization (ISO C99), explicit initialization of the array members is possible. For example,
In most cases, the compiler can deduce the length of the array for you; this can be achieved by leaving the square
brackets empty:
Variable Length Arrays (VLA for short) were added in C99, and made optional in C11. They behave like normal
arrays, with one important difference: the length doesn't have to be known at compile time. VLAs have automatic
storage duration; only pointers to VLAs can have static storage duration.
Important:
VLA's are potentially dangerous. If the array vla in the example above requires more space on the stack than
available, the stack will overflow. Usage of VLA's is therefore often discouraged in style guides and by books and
exercises.
size_t i, j;
for (i = 0; i < ARRLEN; ++i)
{
for(j = 0; j < ARRLEN; ++j)
{
array[j][i] = 0;
}
}
size_t i, j;
for (i = 0; i < ARRLEN; ++i)
{
for(j = 0; j < ARRLEN; ++j)
{
array[i][j] = 0;
}
}
In the same vein, this is why when dealing with an array with one dimension and multiple indexes (let's say 2
dimensions here for simplicity with indexes i and j) it is important to iterate through the array like this:
#define DIM_X 10
#define DIM_Y 20
int array[DIM_X*DIM_Y];
size_t i, j;
for (i = 0; i < DIM_X; ++i)
{
for(j = 0; j < DIM_Y; ++j)
array[i*DIM_Y+j] = 0;
}
Or in 3 dimensions with indexes i, j and k:
#define DIM_X 10
#define DIM_Y 20
#define DIM_Z 30
int array[DIM_X*DIM_Y*DIM_Z];
size_t i, j, k;
for (i = 0; i < DIM_X; ++i)
{
for(j = 0; j < DIM_Y; ++j)
{
for (k = 0; k < DIM_Z; ++k)
{
array[i*DIM_Y*DIM_Z+j*DIM_Z+k] = 0;
}
}
}
Or in a more generic way, when we have an array with N1 x N2 x ... x Nd elements, d dimensions and indices noted
as n1,n2,...,nd the offset is calculated like this
int array[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
However, in most contexts where an array appears in an expression, it is automatically converted to ("decays to") a
pointer to its first element. The case where an array is the operand of the sizeof operator is one of a small number
of exceptions. The resulting pointer is not itself an array, and it does not carry any information about the length of
the array from which it was derived. Therefore, if that length is needed in conjunction with the pointer, such as
when the pointer is passed to a function, then it must be conveyed separately.
For example, suppose we want to write a function to return the last element of an array of int. Continuing from the
Note in particular that although the declaration of parameter input resembles that of an array, it in fact declares
input as a pointer (to int). It is exactly equivalent to declaring input as int *input. The same would be true even
if a dimension were given. This is possible because arrays cannot ever be actual arguments to functions (they decay
to pointers when they appear in function call expressions), and it can be viewed as mnemonic.
It is a very common error to attempt to determine array size from a pointer, which cannot work. DO NOT DO THIS:
return input[length - 1]; /* Oops -- not the droid we are looking for */
}
In fact, that particular error is so common that some compilers recognize it and warn about it. clang, for instance,
will emit the following warning:
warning: sizeof on array function parameter will return size of 'int *' instead of 'int []' [-Wsizeof-array-argument]
int length = sizeof(input) / sizeof(input[0]);
^
note: declared here
int BAD_get_last(int input[])
^
#include <assert.h>
#include <stdlib.h>
/* An array of pointers may be passed to this, since it'll decay into a pointer
to pointer, but an array of arrays may not. */
void h(int **x) {
assert(sizeof(*x) == sizeof(int*));
}
int main(void) {
int foo[2][4];
f(foo);
g(foo);
h(bar);
type name[size1][size2]...[sizeN];
For example, the following declaration creates a three dimensional (5 x 10 x 4) integer array:
int arr[5][10][4];
Two-dimensional Arrays
The simplest form of multidimensional array is the two-dimensional array. A two-dimensional array is, in essence, a
list of one-dimensional arrays. To declare a two-dimensional integer array of dimensions m x n, we can write as
follows:
Where type can be any valid C data type (int, float, etc.) and arrayName can be any valid C identifier. A two-
dimensional array can be visualized as a table with m rows and n columns. Note: The order does matter in C. The
array int a[4][3] is not the same as the array int a[3][4]. The number of rows comes first as C is a row-major
language.
A two-dimensional array a, which contains three rows and four columns can be shown as follows:
Thus, every element in the array a is identified by an element name of the form a[i][j], where a is the name of the
array, i represents which row, and j represents which column. Recall that rows and columns are zero indexed. This
is very similar to mathematical notation for subscripting 2-D matrices.
Multidimensional arrays may be initialized by specifying bracketed values for each row. The following defines an
array with 3 rows where each row has 4 columns.
int a[3][4] = {
{0, 1, 2, 3} , /* initializers for row indexed by 0 */
{4, 5, 6, 7} , /* initializers for row indexed by 1 */
{8, 9, 10, 11} /* initializers for row indexed by 2 */
};
The nested braces, which indicate the intended row, are optional. The following initialization is equivalent to the
previous example:
While the method of creating arrays with nested braces is optional, it is strongly encouraged as it is more readable
and clearer.
An element in a two-dimensional array is accessed by using the subscripts, i.e., the row index and column index of
the array. For example:
The above statement will take the 4th element from the 3rd row of the array. Let us check the following program
where we have used a nested loop to handle a two-dimensional array:
#include <stdio.h>
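int main () {
int i, j, a[5][2] = { {0,0}, {1,2}, {2,4}, {3,6}, {4,8} }; /* 5 rows, 2 columns */
for ( i = 0; i < 5; i++ )
for ( j = 0; j < 2; j++ )
printf("a[%d][%d]: %d\n", i, j, a[i][j]); /* print each element */
return 0;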
}
When the above code is compiled and executed, it produces the following result:
a[0][0]: 0
a[0][1]: 0
a[1][0]: 1
a[1][1]: 2
a[2][0]: 2
a[2][1]: 4
a[3][0]: 3
a[3][1]: 6
a[4][0]: 4
a[4][1]: 8
Three-Dimensional array:
A 3D array is essentially an array of arrays of arrays: it's an array or collection of 2D arrays, and a 2D array is an
array of 1D arrays.
Initializing a 3D Array:
double cprogram[3][2][4]={
{{-0.1, 0.22, 0.3, 4.3}, {2.3, 4.7, -0.9, 2}},
{{0.9, 3.6, 4.5, 4}, {1.2, 2.4, 0.22, -1}},
{{8.2, 3.12, 34.2, 0.1}, {2.1, 3.2, 4.3, -2.0}}
};
We can have arrays with any number of dimensions, although it is likely that most of the arrays that are created will
be of one or two dimensions.
return 0;
}
int main(void)
{
int array[ARRLEN]; /* Allocated but not initialised, as not defined static or global. */
size_t i;
for(i = 0; i < ARRLEN; ++i)
{
array[i] = 0;
}
return EXIT_SUCCESS;
}
A common shortcut to the above loop is to use memset() from <string.h>. Passing array as shown below makes
it decay to a pointer to its 1st element.
memset(array, 0, ARRLEN * sizeof (int)); /* Use size explicitly provided type (int here). */
or
As in this example array is an array and not just a pointer to an array's 1st element (see Array length on why this is
important) a third option to 0-out the array is possible:
int val;
int array[10];
As a side effect of the operands to the + operator being exchangeable (--> commutative law) the following is
equivalent:
*(array + 4) = 5;
*(4 + array) = 5;
array[4] = 5;
4[array] = 5; /* Weird but valid C ... */
val = array[4];
val = 4[array]; /* Weird but valid C ... */
C doesn't perform any bounds checks; accessing contents outside of the declared array is undefined behavior:
int val;
int array[10];
array[4] = 5; /* ok */
val = array[4]; /* ok */
array[19] = 20; /* undefined behavior */
val = array[15]; /* undefined behavior */
return EXIT_SUCCESS;
}
This program tries to read an unsigned integer value n from standard input and then allocates a block of memory
for an array of n elements of type int by calling the calloc() function, which initializes the memory to all zeros.
return 0;
}
Here, in the initialization of p in the first for loop condition, the array a decays to a pointer to its first element, as it
would in almost all places where such an array variable is used.
Then, the ++p performs pointer arithmetic on the pointer p and walks one by one through the elements of the
#include <stdio.h>
#include <stdlib.h>
/* This data is not always stored in a structure, but it is sometimes for ease of use */
struct Node {
/* Sometimes a key is also stored and used in the functions */
int data;
struct Node* next;
struct Node* previous;
};
int main(void) {
/* Sometimes in a doubly linked list the last node is also stored */
struct Node *head = NULL;
printf("Insert a node at the beginning, and then print the list backwards\n");
insert_at_beginning(&head, 10);
print_list_backwards(head);
printf("Insert a node at the end, and then print the list forwards.\n");
insert_at_end(&head, 15);
print_list(head);
free_list(head);
return 0;
}
while (i != NULL) {
printf("Value: %d\n", i->data);
i = i->previous;
}
}
if (NULL == pheadNode)
{
return;
}
/*
This is done similarly to how we insert a node at the beginning of a singly linked
list, instead we set the previous member of the structure as well
*/
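currentNode = malloc(sizeof *currentNode);
if (currentNode == NULL)
{
return; /* allocation failed; leave the list unchanged */
}
currentNode->previous = NULL;
currentNode->data = value;
currentNode->next = *pheadNode;
if (*pheadNode != NULL) /* the list may be empty */
{
(*pheadNode)->previous = currentNode;
}
*pheadNode = currentNode;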
}
if (NULL == pheadNode)
{
return;
}
/*
This can, again be done easily by being able to have the previous element. It
would also be even more useful to have a pointer to the last node, which is commonly
used.
*/
currentNode->data = value;
if (*pheadNode == NULL) {
*pheadNode = currentNode;
return;
}
i->next = currentNode;
currentNode->previous = i;
}
Note that sometimes, storing a pointer to the last node is useful (it is more efficient to simply be able to jump
straight to the end of the list than to need to iterate through to the end):
Sometimes, a key is also used to identify elements. It is simply a member of the Node structure:
struct Node {
int data;
int key;
struct Node* next;
struct Node* previous;
};
The key is then used when any tasks are performed on a specific element, like deleting elements.
#include <stdio.h>
#include <stdlib.h>
#define NUM_ITEMS 10
struct Node {
int data;
struct Node *next;
};
int main(void) {
int i;
struct Node *head = NULL;
currentNode->data = nodeValue;
if(position == 1) {
currentNode->next = *headNode;
*headNode = currentNode;
return;
}
currentNode->next = nodeBeforePosition->next;
nodeBeforePosition->next = currentNode;
}
/* Iterator will be NULL by the end, so the last node will be stored in
previousNode. We will set the last node to be the headNode */
*headNode = previousNode;
We start previousNode out as NULL because, on the first iteration of the loop, there is no node before the head
node. The first node will become the last node in the list, so its next member should naturally be NULL.
Basically, the concept of reversing the linked list here is that we actually reverse the links themselves. Each node's
next member will become the node before it, like so:
Finally, the head should point to the 5th node instead, and each node should point to the node before it.
Node 1 should point to NULL since there was nothing before it. Node 2 should point to node 1, node 3 should point
to node 2, et cetera.
However, there is one small problem with this method. If we break the link to the next node and change it to the
previous node, we will not be able to traverse to the next node in the list since the link to it is gone.
The solution to this problem is to simply store the next element in a variable (nextNode) before changing the link.
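Putting these steps together, a minimal sketch of the reversal might look like this. The variable names follow the text; the function name reverse_list is an assumption.

```c
#include <stddef.h>

struct Node {
    int data;
    struct Node *next;
};

void reverse_list(struct Node **headNode)
{
    struct Node *iterator;
    struct Node *previousNode = NULL;
    struct Node *nextNode;

    if (headNode == NULL)
        return;

    iterator = *headNode;
    while (iterator != NULL) {
        nextNode = iterator->next;      /* save the link before overwriting it */
        iterator->next = previousNode;  /* point this node at the one before it */
        previousNode = iterator;
        iterator = nextNode;
    }
    /* iterator is NULL here, so previousNode is the old last node */
    *headNode = previousNode;
}
```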
#include <stdio.h>
#include <stdlib.h>
struct Node {
int data;
struct Node* next;
};
print_list(head);
return 0;
/* Assign data */
currentNode->data = value;
/* Holds a pointer to the 'next' field that we have to link to the new node.
By initializing it to &head we handle the case of insertion at the beginning. */
struct Node **nextForPosition = &head;
/* Iterate to get the 'next' field we are looking for.
Note: Insert at the end if position is larger than current number of elements. */
for (i = 0; i < position && *nextForPosition != NULL; i++) {
/* nextForPosition is pointing to the 'next' field of the node.
So *nextForPosition is a pointer to the next node.
Update it with a pointer to the 'next' field of the next node. */
nextForPosition = &(*nextForPosition)->next;
}
/* Here, we are taking the link to the next node (the one our newly inserted node should
point to) by dereferencing nextForPosition, which points to the 'next' field of the node
that is in the position we want to insert our node at.
We assign this link to our next value. */
currentNode->next = *nextForPosition;
/* Now, we want to correct the link of the node before the position of our
new node: it will be changed to be a pointer to our new node. */
*nextForPosition = currentNode;
return head;
}
/* This program will demonstrate inserting a node at the beginning of a linked list */
#include <stdio.h>
#include <stdlib.h>
struct Node {
int data;
struct Node* next;
};
return 0;
}
*head = currentNode;
}
In order to understand how we add nodes at the beginning, let's take a look at possible scenarios:
1. The list is empty, so we need to add a new node. In which case, our memory looks like this where HEAD is a
pointer to the first node:
The line currentNode->next = *headNode; will assign the value of currentNode->next to be NULL since headNode
originally starts out at a value of NULL.
Now, we want to set our head node pointer to point to our current node.
----- -------------
|HEAD | --> |CURRENTNODE| --> NULL /* The head node points to the current node */
2. The list is already populated; we need to add a new node to the beginning. For the sake of simplicity, let's
start out with 1 node:
----- -----------
HEAD --> FIRST NODE --> NULL
----- -----------
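Both scenarios are handled by the same two assignments. A minimal sketch follows; the function name insert_node and its int return value are assumptions made for illustration.

```c
#include <stdlib.h>

struct Node {
    int data;
    struct Node *next;
};

/* Returns 0 on success, -1 if headNode is NULL or allocation fails */
int insert_node(struct Node **headNode, int nodeValue)
{
    struct Node *currentNode;

    if (headNode == NULL)
        return -1;
    currentNode = malloc(sizeof *currentNode);
    if (currentNode == NULL)
        return -1;

    currentNode->data = nodeValue;
    currentNode->next = *headNode;  /* NULL for an empty list, the old first node otherwise */
    *headNode = currentNode;        /* HEAD now points to the new node */
    return 0;
}
```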
If you use enum instead of int or string/ char*, you increase compile-time checking and avoid errors from passing
in invalid constants, and you document which values are legal to use.
Example 1
enum color{ RED, GREEN, BLUE };
case GREEN:
color_name = "GREEN";
break;
case BLUE:
color_name = "BLUE";
break;
}
printf("%s\n", color_name);
}
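int main(void){
int input;
enum color chosenColor;
printf("Enter a number between 0 and 2: ");
/* Read into an int: an enum object is not guaranteed to have the same representation as int */
if (scanf("%d", &input) != 1 || input < RED || input > BLUE)
return 1;
chosenColor = (enum color)input;
printColor(chosenColor);
return 0;
}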
Version ≥ C99
Example 2
(This example uses designated initializers which are standardized since C99.)
This enables us to define compile time constants of type int that can as in this example be used as array length.
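A small sketch of an enumeration constant used as an array length follows. The names buffersize, buffer and buffer_length are illustrative assumptions.

```c
#include <stddef.h>

/* An unnamed enumeration used purely to introduce an int constant */
enum { buffersize = 256 };

/* A file-scope array needs a constant expression as its length;
   an enumeration constant qualifies, a const int variable does not */
static unsigned char buffer[buffersize] = { 0 };

size_t buffer_length(void)
{
    return sizeof buffer / sizeof buffer[0];
}
```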
enum Dupes
{
Base, /* Takes 0 */
One, /* Takes Base + 1 */
Two, /* Takes One + 1 */
Negative = -1,
AnotherZero /* Takes Negative + 1 == 0, sigh */
};
int main(void)
{
printf("Base = %d\n", Base);
printf("One = %d\n", One);
printf("Two = %d\n", Two);
printf("Negative = %d\n", Negative);
printf("AnotherZero = %d\n", AnotherZero);
return EXIT_SUCCESS;
}
Base = 0
One = 1
Two = 2
Negative = -1
AnotherZero = 0
enum color
{
RED,
GREEN,
BLUE
};
This enumeration must then always be used with the keyword and the tag like this:
If we use typedef directly when declaring the enum, we can omit the tag name and then use the type without the
enum keyword:
typedef enum
{
RED,
GREEN,
BLUE
} color;
But in this latter case we cannot use it as enum color, because we didn't use the tag name in the definition. One
common convention is to use both, such that the same name can be used with or without enum keyword. This has
the particular advantage of being compatible with C++
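A minimal sketch of that convention follows. The helper functions color_name and next_color are hypothetical and only there to show both spellings in use.

```c
/* The tag and the typedef name live in different name spaces,
   so both can be called color */
typedef enum color
{
    RED,
    GREEN,
    BLUE
} color;

/* Spelled with the enum keyword and the tag */
const char *color_name(enum color c)
{
    switch (c) {
    case RED:   return "RED";
    case GREEN: return "GREEN";
    case BLUE:  return "BLUE";
    }
    return "unknown";
}

/* Spelled with the typedef name only; it is the same type */
color next_color(color c)
{
    return (c == BLUE) ? RED : (color)(c + 1);
}
```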
Function:
void printColor()
{
if (chosenColor == RED)
{
printf("RED\n");
}
else if (chosenColor == GREEN)
{
printf("GREEN\n");
A structure with at least one member may additionally contain a single array member of unspecified length at the
end of the structure. This is called a flexible array member:
struct ex1
{
size_t foo;
int flex[];
};
struct ex2_header
{
int foo;
char bar;
};
struct ex2
{
struct ex2_header hdr;
int flex[];
};
A flexible array member is treated as having no size when calculating the size of a structure, though padding
between that member and the previous member of the structure may still exist:
/* Also prints "8,8" on my machine, so there is no padding in the ex2 structure itself. */
printf("%zu,%zu\n", sizeof(struct ex2_header), sizeof(struct ex2));
The flexible array member is considered to have an incomplete array type, so its size cannot be calculated using
sizeof.
You can declare and initialize an object with a structure type containing a flexible array member, but you must not
attempt to initialize the flexible array member since it is treated as if it does not exist. It is forbidden to try to do
this, and compile errors will result.
Similarly, you should not attempt to assign a value to any element of a flexible array member when declaring a
structure in this way since there may not be enough padding at the end of the structure to allow for any objects
required by the flexible array member. The compiler will not necessarily prevent you from doing this, however, so
this can lead to undefined behavior.
You may instead choose to use malloc, calloc, or realloc to allocate the structure with extra storage and later
free it, which allows you to use the flexible array member as you wish:
/* valid: allocate an object of structure type `ex1` along with an array of 2 ints */
struct ex1 *pe1 = malloc(sizeof(*pe1) + 2 * sizeof(pe1->flex[0]));
/* valid: allocate an object of structure type ex2 along with an array of 4 ints */
struct ex2 *pe2 = malloc(sizeof(struct ex2) + sizeof(int[4]));
/* valid: allocate 5 structure type ex3 objects along with an array of 3 ints per object */
struct ex3 *pe3 = malloc(5 * (sizeof(*pe3) + sizeof(int[3])));
pe1->flex[0] = 3; /* valid */
pe3[0].flex[0] = pe1->flex[0]; /* pe3[0] is a structure, not a pointer, so use '.' */
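The allocation pattern can be wrapped in a small helper, sketched here under the assumption that foo is used to record the element count. The function name ex1_create is hypothetical.

```c
#include <stdlib.h>

struct ex1
{
    size_t foo;
    int flex[];
};

/* Hypothetical helper: allocates an ex1 with room for count ints in flex */
struct ex1 *ex1_create(size_t count)
{
    struct ex1 *p = malloc(sizeof *p + count * sizeof p->flex[0]);
    size_t i;

    if (p == NULL)
        return NULL;
    p->foo = count;              /* remember how many elements follow */
    for (i = 0; i < count; i++)
        p->flex[i] = 0;
    return p;
}
```

The whole object, including the flexible array, is released with a single free(p).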
Version < C99
The 'struct hack'
Flexible array members did not exist prior to C99 and are treated as errors. A common workaround is to declare an
array of length 1, a technique called the 'struct hack':
struct ex1
{
size_t foo;
int flex[1];
};
This will affect the size of the structure, however, unlike a true flexible array member:
To use the flex member as a flexible array member, you'd allocate it with malloc as shown above, except that
sizeof(*pe1) (or the equivalent sizeof(struct ex1)) would be replaced with offsetof(struct ex1, flex) or the
longer, type-agnostic expression sizeof(*pe1)-sizeof(pe1->flex). Alternatively, you might subtract 1 from the
desired length of the "flexible" array since it's already included in the structure size, assuming the desired length is
Compatibility
If compatibility with compilers that do not support flexible array members is desired, you may use a macro defined
like FLEXMEMB_SIZE below:
struct ex1
{
size_t foo;
int flex[FLEXMEMB_SIZE];
};
When allocating objects, you should use the offsetof(struct ex1, flex) form to refer to the structure size
(excluding the flexible array member) since it is the only expression that will remain consistent between compilers
that support flexible array members and compilers that do not:
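One possible way to define such a macro and allocate with offsetof is sketched below. The version test and the helper name ex1_alloc are assumptions, not necessarily the definitions the text has in mind.

```c
#include <stddef.h>   /* offsetof */
#include <stdlib.h>

/* Empty when flexible array members are available (C99 and later),
   1 otherwise so that the 'struct hack' is used instead */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define FLEXMEMB_SIZE /* nothing */
#else
#define FLEXMEMB_SIZE 1
#endif

struct ex1
{
    size_t foo;
    int flex[FLEXMEMB_SIZE];
};

/* offsetof gives the size of the fixed part under either definition.
   count is assumed to be at least 1. */
struct ex1 *ex1_alloc(size_t count)
{
    struct ex1 *p = malloc(offsetof(struct ex1, flex) + count * sizeof p->flex[0]);

    if (p != NULL)
        p->foo = count;
    return p;
}
```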
The alternative is to use the preprocessor to conditionally subtract 1 from the specified length. Due to the increased
potential for inconsistency and general human error in this form, I moved the logic into a separate function:
typedef struct
{
int x, y;
} Point;
as opposed to:
struct Point
{
int x, y;
};
Point point;
instead of:
struct Point
{
int x, y;
};
to have advantage of both possible definitions of point. Such a declaration is most convenient if you learned C++
first, where you may omit the struct keyword if the name is not ambiguous.
typedef names for structs could conflict with other identifiers in other parts of the program. Some consider
this a disadvantage, but for most people having a struct and another identifier with the same name is quite
disturbing. A notorious example is POSIX's stat, where a function named stat takes a pointer to struct stat as
one of its arguments.
typedef'd structs without a tag name always impose that the whole struct declaration is visible to code that uses
it. The entire struct declaration must then be placed in a header file.
Consider:
#include "bar.h"
struct foo
{
bar *aBar;
};
So with a typedefd struct that has no tag name, the bar.h file always has to include the whole definition of bar. If
we use
See Typedef
/* structs */
struct stack
{
struct node *top;
int size;
};
struct node
{
int data;
struct node *next;
};
/* function declarations */
int push(int, struct stack*);
int pop(struct stack*);
void destroy(struct stack*);
int main(void)
{
int result = EXIT_SUCCESS;
size_t i;
/* initialize stack */
stack->top = NULL;
stack->size = 0;
/* push 10 ints */
{
int data = 0;
for(i = 0; i < 10; i++)
{
printf("Pushing: %d\n", data);
if (-1 == push(data, stack))
{
perror("push() failed");
result = EXIT_FAILURE;
break;
}
++data;
}
}
if (EXIT_SUCCESS == result)
{
/* pop 5 ints */
for(i = 0; i < 5; i++)
{
printf("Popped: %i\n", pop(stack));
}
}
/* destroy stack */
destroy(stack);
return result;
}
return result;
}
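The three functions declared above could be implemented roughly as follows. This is a sketch under stated assumptions: pop returns -1 on an empty stack, and destroy frees only the nodes, not the stack object itself.

```c
#include <stdlib.h>

struct node
{
    int data;
    struct node *next;
};

struct stack
{
    struct node *top;
    int size;
};

/* Returns 0 on success, -1 if memory could not be allocated */
int push(int data, struct stack *stack)
{
    struct node *n = malloc(sizeof *n);

    if (n == NULL)
        return -1;
    n->data = data;
    n->next = stack->top;
    stack->top = n;
    stack->size++;
    return 0;
}

/* Assumption: popping an empty stack returns -1, which cannot be
   told apart from a stored -1 */
int pop(struct stack *stack)
{
    struct node *n = stack->top;
    int data;

    if (n == NULL)
        return -1;
    data = n->data;
    stack->top = n->next;
    stack->size--;
    free(n);
    return data;
}

/* Frees the remaining nodes; the stack object is left to the caller */
void destroy(struct stack *stack)
{
    while (stack->top != NULL)
        pop(stack);
}
```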
/* coordinates.h */
/* Constructor */
coordinate *coordinate_create(void);
/* Destructor */
void coordinate_destroy(coordinate *this);
/* coordinates.c */
#include "coordinates.h"
#include <stdio.h>
#include <stdlib.h>
/* Constructor */
coordinate *coordinate_create(void)
{
coordinate *c = malloc(sizeof(*c));
if (c != 0)
{
c->setx = &coordinate_setx;
c->sety = &coordinate_sety;
c->print = &coordinate_print;
c->x = 0;
c->y = 0;
}
return c;
}
/* Destructor */
void coordinate_destroy(coordinate *this)
{
if (this != NULL)
{
free(this);
}
}
/* Methods */
static void coordinate_setx(coordinate *this, int x)
{
if (this != NULL)
{
this->x = x;
}
}
/* main.c */
#include "coordinates.h"
#include <stddef.h>
int main(void)
{
/* Create and initialize pointers to coordinate objects */
coordinate *c1 = coordinate_create();
coordinate *c2 = coordinate_create();
/* Now we can use our objects using our methods and passing the object as parameter */
c1->setx(c1, 1);
c1->sety(c1, 2);
c2->setx(c2, 3);
c2->sety(c2, 4);
c1->print(c1);
c2->print(c2);
/* After using our objects we destroy them using our "destructor" function */
coordinate_destroy(c1);
c1 = NULL;
coordinate_destroy(c2);
c2 = NULL;
return 0;
}
struct point
{
int x;
int y;
};
#include <stdio.h>
#include <math.h>
#include <errno.h>
#include <fenv.h>
int main()
{
double pwr, sum=0;
int i, n;
printf("\n1+4(3+3^2+3^3+3^4+...+3^N)=?\nEnter N:");
scanf("%d",&n);
if (n<=0) {
printf("Invalid power N=%d", n);
return -1;
}
return 0;
}
Example Output:
1+4(3+3^2+3^3+3^4+...+3^N)=?
Enter N:10
N= 0 S= 1
N= 1 S= 13
N= 2 S= 49
N= 3 S= 157
N= 4 S= 481
N= 5 S= 1453
N= 6 S= 4369
N= 7 S= 13117
N= 8 S= 39361
N= 9 S= 118093
N= 10 S= 354289
int main(void)
{
double x = 10.0;
double y = 5.1;
return 0;
}
Output:
4.90000
Important: Use this function with care, as it can return unexpected values because most decimal fractions cannot
be represented exactly as floating point values.
#include <math.h>
#include <stdio.h>
int main(void)
{
printf("%f\n", fmod(1, 0.1));
printf("%19.17f\n", fmod(1, 0.1));
return 0;
}
Output:
0.100000
0.09999999999999995
These functions return the floating-point remainder of the division x/y. The returned value has the same sign as
x.
Single Precision:
Output:
4.90000
int main(void)
{
long double x = 10.0;
long double y = 5.1;
Output:
4.90000
Version ≥ C99
#include <stddef.h> // for size_t
In this way the scanf() function call is executed n times (10 times in our example), but is written only once.
Here, the variable i is the loop index, and it is best declared as presented. The type size_t (size type) should be
used for everything that counts or loops through data objects.
This way of declaring variables inside the for is only available for compilers that have been updated to the C99
standard. If for some reason you are still stuck with an older compiler you can declare the loop index before the
for loop:
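A minimal sketch of that older style, using a hypothetical helper fill_sequence instead of scanf() so that it does not depend on input:

```c
#include <stddef.h> /* for size_t */

/* Fills the first n elements of array with 0, 1, 2, ... */
void fill_sequence(int *array, size_t n)
{
    size_t i; /* declared before the loop, as C89/C90 requires */

    for (i = 0; i < n; i++)
        array[i] = (int)i;
}
```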
do_B();
while (condition) {
do_A();
do_B();
}
To avoid potential cut/paste problems with repeating B twice in the code, Duff's Device could be applied to start the
loop from the middle of the while body, using a switch statement and fall through behavior.
Duff's Device was actually invented to implement loop unrolling. Imagine applying a mask to a block of memory,
where n is a signed integral type with a positive value.
do {
*ptr++ ^= mask;
} while (--n > 0);
do {
*ptr++ ^= mask;
*ptr++ ^= mask;
*ptr++ ^= mask;
*ptr++ ^= mask;
} while ((n -= 4) > 0);
But, with Duff's Device, the code can follow this unrolling idiom that jumps into the right place in the middle of the
loop if n is not divisible by 4.
switch (n % 4) do {
case 0: *ptr++ ^= mask; /* FALL THROUGH */
case 3: *ptr++ ^= mask; /* FALL THROUGH */
case 2: *ptr++ ^= mask; /* FALL THROUGH */
case 1: *ptr++ ^= mask; /* FALL THROUGH */
} while ((n -= 4) > 0);
This kind of manual unrolling is rarely required with modern compilers, since the compiler's optimization engine
can unroll loops on the programmer's behalf.
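To experiment with the idiom, the fragment above can be wrapped in a function and compared against the plain loop. The function names and the choice of unsigned char for the data are assumptions made for this sketch.

```c
/* XORs n bytes starting at ptr with mask; n must be positive */
void apply_mask_duff(unsigned char *ptr, long n, unsigned char mask)
{
    switch (n % 4) do {
    case 0: *ptr++ ^= mask; /* FALL THROUGH */
    case 3: *ptr++ ^= mask; /* FALL THROUGH */
    case 2: *ptr++ ^= mask; /* FALL THROUGH */
    case 1: *ptr++ ^= mask;
    } while ((n -= 4) > 0);
}

/* Plain loop with the same effect, for comparison */
void apply_mask_simple(unsigned char *ptr, long n, unsigned char mask)
{
    do {
        *ptr++ ^= mask;
    } while (--n > 0);
}
```

With n equal to 7, the switch jumps to case 3, performs three operations, and the do-while then runs one full pass of four.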
int num = 1;
while (num != 0)
{
scanf("%d", &num);
}
For example this do-while loop will get numbers from user, until the sum of these values is greater than or equal to
50:
do
{
scanf("%d", &num);
sum += num;
} while (sum < 50);
In a for loop, the loop header has three expressions, all optional.
The first expression, declaration-or-expression, initializes the loop. It is executed exactly once at the
beginning of the loop.
Version ≥ C99
It can be either a declaration and initialization of a loop variable, or a general expression. If it is a declaration, the
scope of the declared variable is restricted by the for statement.
Historical versions of C only allowed an expression, here, and the declaration of a loop variable had to be placed
before the for.
The second expression, expression2, is the test condition. It is first evaluated after the initialization. If the
condition is true, control enters the body of the loop. If not, control passes to the statement following the
loop. Subsequently, this condition is checked after each execution of the body and the update statement. When
true, control moves back to the beginning of the body of the loop. The condition is usually intended to be a
check on the number of times the body of the loop executes. This is the primary way of exiting a loop, the
other way being jump statements.
The third expression, expression3, is the update statement. It is executed after each execution of the body of
the loop. It is often used to increment a variable keeping count of the number of times the loop body has
executed; this variable is commonly called the loop counter or iterator.
Example:
Version ≥ C99
for(int i = 0; i < 10 ; i++)
{
printf("%d", i);
}
In the above example, first i = 0 is executed, initializing i. Then, the condition i < 10 is checked, which evaluates
to be true. The control enters the body of the loop and the value of i is printed. Then, the control shifts to i++,
updating the value of i from 0 to 1. Then, the condition is again checked, and the process continues. This goes on
till the value of i becomes 10. Then, the condition i < 10 evaluates false, after which the control moves out of the
loop.
Example:
Version ≥ C99
for (int i = 0; i >= 0; )
{
/* body of the loop where i is not changed*/
}
In the above example, the variable i, the iterator, is initialized to 0. The test condition is initially true. However, i is
not modified anywhere in the body and the update expression is empty. Hence, i will remain 0, and the test
condition will never evaluate to false, leading to an infinite loop.
Assuming that there are no jump statements, another way an infinite loop might be formed is by explicitly keeping
the condition true:
while (true)
{
/* body of the loop */
}
In a for loop, the condition expression is optional. If it is omitted, the condition is treated as always true,
leading to an infinite loop.
for (;;)
{
/* body of the loop */
}
However, in certain cases the condition is kept true intentionally, and the loop is exited with a jump statement
such as break.
while (true)
{
/* statements */
if (condition)
{
/* more statements */
break;
}
}
if(cond)
{
statement(s); /*to be executed, on condition being true*/
}
For example,
if (a > 1) {
puts("a is larger than 1");
}
Where a > 1 is a condition that has to evaluate to true in order to execute the statements inside the if block. In
this example "a is larger than 1" is only printed if a > 1 is true.
if selection statements can omit the wrapping braces { and } if there is only one statement within the block. The
above example can be rewritten to
if (a > 1)
puts("a is larger than 1");
However, to execute multiple statements within the block, the braces have to be used.
The condition for if can include multiple expressions. if will only perform the action if the end result of the
expression is true.
For example
will only execute the printf and a++ if both a and b are greater than 1.
An if()...else ladder:
#include <stdio.h>
Is, in the general case, considered to be better than the equivalent nested if()...else:
#include <stdio.h>
int a = 1;
switch (a) {
case 1:
puts("a is 1");
break;
case 2:
puts("a is 2");
break;
default:
puts("a is neither 1 nor 2");
break;
}
int a = 1;
if (a == 1) {
puts("a is 1");
} else if (a == 2) {
puts("a is 2");
} else {
puts("a is neither 1 nor 2");
}
If the value of a is 1 when the switch statement is used, a is 1 will be printed. If the value of a is 2 then, a is 2 will
be printed. Otherwise, a is neither 1 nor 2 will be printed.
case n: marks where execution jumps to when the value passed to the switch statement is n.
n must be an integer constant expression, and the same n can appear at most once in one switch statement.
default: marks where execution jumps to when the value does not match any case n:. It is good practice
to include a default case in every switch statement to catch unexpected behavior.
Note: If you accidentally forget to add a break after the end of a case, the compiler will assume that you intend to
"fall through" and all the subsequent case statements, if any, will be executed (unless a break statement is found in
any of the subsequent cases), regardless of whether the subsequent case statement(s) match or not. This particular
property is used to implement Duff's Device. This behavior is often considered a flaw in the C language
specification.
int a = 1;
switch (a) {
case 1:
case 2:
Note that the default case is not necessary, especially when the set of values you get in the switch is finite and
known at compile time.
most compilers will report a warning if you don't handle a value (this would not be reported if a default case
were present)
for the same reason, if you add a new value to the enum, you will be notified of all the places where you forgot
to handle the new value (with a default case, you would need to manually explore your code searching for
such cases)
The reader does not need to figure out "what is hidden by the default:", whether there are other enum values or
whether it is a protection for "just in case". And if there are other enum values, did the coder intentionally use
the default case for them, or is there a bug that was introduced when the value was added?
handling each enum value makes the code self explanatory as you can't hide behind a wild card, you must
explicitly handle each of them.
Thus you may add an extra check before your switch to detect it, if you really need it.
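A minimal sketch of a switch over an enum with no default case, preceded by such an extra check, is shown here. The enum and the function name color_to_string are assumptions made for illustration.

```c
#include <stddef.h>

enum color { RED, GREEN, BLUE };

/* Returns NULL for a value outside the enumeration */
const char *color_to_string(enum color c)
{
    /* Extra check before the switch, since there is no default case */
    if (c != RED && c != GREEN && c != BLUE)
        return NULL;

    switch (c) {
    case RED:   return "RED";
    case GREEN: return "GREEN";
    case BLUE:  return "BLUE";
    }
    return NULL; /* not reached; avoids a missing-return warning */
}
```

If a new enumerator is later added to color, compilers that warn about unhandled enumeration values (for example GCC and Clang with -Wswitch, which -Wall enables) will point at this switch.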
Example:
if (a > 1)
puts("a is larger than 1");
else
puts("a is not larger than 1");
Just like the if statement, when the block within if or else consists of only one statement, the braces can
be omitted (but doing so is not recommended as it can easily introduce problems involuntarily). However, if there
is more than one statement within the if or else block, the braces have to be used on that particular block.
if (a > 1)
{
puts("a is larger than 1");
a--;
}
else
{
puts("a is not larger than 1");
a++;
}
Example:
if (a >= 1)
{
printf("a is greater than or equals 1.\n");
}
else if (a == 0) //we already know that a is smaller than 1
{
printf("a equals 0.\n");
}
else /* a is smaller than 1 and not equals 0, hence: */
{
printf("a is negative.\n");
}
Scalar variables may be initialized when they are defined by following the name with an equals sign and an
expression:
int x = 1;
char squota = '\'';
long day = 1000L * 60L * 60L * 24L; /* milliseconds/day */
For external and static variables, the initializer must be a constant expression (see note 2 below); the
initialization is done once, conceptually before the program begins execution.
For automatic and register variables, the initializer is not restricted to being a constant: it may be any expression
involving previously defined values, even function calls.
instead of
low = 0;
high = n - 1;
In effect, initialization of automatic variables is just shorthand for assignment statements. Which form to prefer is
largely a matter of taste. We generally use explicit assignments, because initializers in declarations are harder to
see and further away from the point of use. On the other hand, variables should only be declared when they're
about to be used whenever possible.
Initializing an array:
An array may be initialized by following its declaration with a list of initializers enclosed in braces and separated by
commas.
For example, to initialize an array days_of_month with the number of days in each month:
int days_of_month[] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
When the size of the array is omitted, the compiler will compute the length by counting the initializers, of which
there are 12 in this case.
It is an error to have too many initializers. There is no standard way to specify repetition of an initializer — but GCC
has an extension to do so.
In C89/C90 or earlier versions of C, there was no way to initialize an element in the middle of an array without
supplying all the preceding values as well.
Version ≥ C99
With C99 and above, designated initializers allow you to initialize arbitrary elements of an array, leaving any
uninitialized values as zeros.
Character arrays are a special case of initialization; a string may be used instead of the braces and commas
notation:
In this case, the array size is six (five characters plus the terminating '\0').
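A short sketch of both notations follows; the names chr_array, chr_array_braces and chr_array_size are assumptions.

```c
#include <stddef.h>

/* Initialized from a string literal: five characters plus '\0' */
char chr_array[] = "hello";

/* The same array written with braces and commas */
char chr_array_braces[] = { 'h', 'e', 'l', 'l', 'o', '\0' };

size_t chr_array_size(void)
{
    return sizeof chr_array;
}
```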
2 Note that a constant expression is defined as something that can be evaluated at compile-time. So, int
global_var = f(); is invalid. Another common misconception is thinking of a const qualified variable as a constant
expression. In C, const means "read-only", not "compile time constant". So, global definitions like const int SIZE =
10; int global_arr[SIZE]; and const int SIZE = 10; int global_var = SIZE; are not legal in C.
C99 introduced the concept of designated initializers. These allow you to specify which elements of an array,
structure or union are to be initialized by the values following.
int array[] = { [4] = 29, [5] = 31, [17] = 101, [18] = 103, [19] = 107, [20] = 109 };
The term in square brackets, which can be any constant integer expression, specifies which element of the array is
to be initialized by the value of the term after the = sign. Unspecified elements are default initialized, which means
they are set to zero. The example shows the designated initializers in order; they do not have to be in order. The
example shows gaps; those are legitimate. The example doesn't show two different initializations for the same
element; that too is allowed (ISO/IEC 9899:2011, §6.7.9 Initialization, ¶19 The initialization shall occur in initializer list
order, each initializer provided for a particular subobject overriding any previously listed initializer for the same
subobject).
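These rules can be checked with a small sketch. The second array, overridden, and the helper array_length are illustrative assumptions.

```c
#include <stddef.h>

/* The largest index is 20, so the array has 21 elements */
int array[] = { [4] = 29, [5] = 31, [17] = 101, [18] = 103, [19] = 107, [20] = 109 };

/* Out of order, with element 1 initialized twice: the later value (7) wins */
int overridden[] = { [2] = 5, [1] = 3, [1] = 7 };

size_t array_length(void)
{
    return sizeof array / sizeof array[0];
}
```

Some compilers can warn about the overridden initializer when extra warnings are enabled, but the code is valid.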
You can specify which elements of a structure are initialized by using the .element notation:
struct Date
{
int year;
int month;
int day;
};
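A minimal sketch of the .element notation applied to this structure follows; the variable names are assumptions.

```c
struct Date
{
    int year;
    int month;
    int day;
};

/* Members may be named in any order */
struct Date us_independence_day = { .day = 4, .month = 7, .year = 1776 };

/* Members that are not named are set to zero */
struct Date year_only = { .year = 2000 };
```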
You can specify which element of a union is initialized with a designated initializer.
Version = C89
Prior to the C standard, there was no way to initialize a union. The C89/C90 standard allows you to initialize the first
member of a union — so the choice of which member is listed first matters.
struct discriminated_union
{
enum { DU_INT, DU_DOUBLE } discriminant;
union
{
int du_int;
double du_double;
} du;
};
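With C99 designated initializers, either member of the union can be chosen. A sketch follows; the variables du1 and du2 and the helper du_as_double are assumptions.

```c
struct discriminated_union
{
    enum { DU_INT, DU_DOUBLE } discriminant;
    union
    {
        int du_int;
        double du_double;
    } du;
};

struct discriminated_union du1 = { .discriminant = DU_INT, .du = { .du_int = 1 } };
struct discriminated_union du2 = { .discriminant = DU_DOUBLE, .du = { .du_double = 3.14159 } };

/* Reads whichever member the discriminant says is active */
double du_as_double(const struct discriminated_union *p)
{
    return (p->discriminant == DU_INT) ? (double)p->du.du_int : p->du.du_double;
}
```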
Note that C11 allows you to use anonymous union members inside a structure, so that you don't need the du name
in the previous example:
struct discriminated_union
{
enum { DU_INT, DU_DOUBLE } discriminant;
union
{
int du_int;
double du_double;
};
};
struct date_range
{
Date dr_from;
Date dr_to;
char dr_what[80];
};
GCC provides an extension that allows you to specify a range of elements in an array that should be given the same
initializer:
The triple dots need to be separate from the numbers lest one of the dots be interpreted as part of a floating point
number (maximal munch rule).
struct Date
{
int year;
int month;
int day;
};
Note that the array initialization could be written without the interior braces, and in times past (before 1990, say)
often would have been written without them:
Although this works, it is not good modern style — you should not attempt to use this notation in new code and
should fix the compiler warnings it usually yields.
A definition actually instantiates/implements this identifier. It's what the linker needs in order to link references to
those entities. These are definitions corresponding to the above declarations:
int bar;
int g(int lhs, int rhs) {return lhs*rhs;}
double f(int i, double d) {return i+d;}
double h1(int a, int b) {return -1.5;}
double h2() {} /* prototype is implied in definition, same as double h2(void) */
However, it must be defined exactly once. If you forget to define something that's been declared and referenced
somewhere, then the linker doesn't know what to link references to and complains about missing symbols. If you
define something more than once, then the linker doesn't know which of the definitions to link references to and
complains about duplicated symbols.
Exception:
This exception can be explained using concepts of "Strong symbols vs Weak symbols" (from a linker's perspective) .
Please look here ( Slide 22 ) for more explanation.
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <limits.h>
errno = 0;
char *p;
long argument_numValue = strtol(argv[i], &p, 10);
if (p == argv[i]) {
fprintf(stderr, "Argument %d is not a number.\n", i);
}
else if ((argument_numValue == LONG_MIN || argument_numValue == LONG_MAX) && errno ==
ERANGE) {
fprintf(stderr, "Argument %d is out of range.\n", i);
}
else {
printf("Argument %d is a number, and the value is: %ld\n",
i, argument_numValue);
}
}
return 0;
}
References:
Notes
With glibc in a Linux or Unix environment you can use the getopt tools to easily define, validate, and parse
command-line options from the rest of your arguments.
These tools expect your options to be formatted according to the GNU Coding Standards, which is an extension of
what POSIX specifies for the format of command-line options.
The example below demonstrates handling command-line options with the GNU getopt tools.
#include <stdio.h>
#include <getopt.h>
#include <string.h>
if (opt == -1) {
/* a return value of -1 indicates that there are no more options */
break;
}
switch (opt) {
case 'h':
/* the help_flag and value are specified in the longopts table,
* which means that when the --help option is specified (in its long
* form), the help_flag variable will be automatically set.
* however, the parser for short-form options does not support the
* automatic setting of flags, so we still need this code to set the
* help_flag manually when the -h option is specified.
*/
help_flag = 1;
break;
case 'f':
/* optarg is a global variable in getopt.h. it contains the argument
* for this option. it is null if there was no argument.
*/
printf ("outarg: '%s'\n", optarg);
strncpy (filename, optarg ? optarg : "out.txt", sizeof (filename));
/* strncpy does not fully guarantee null-termination */
filename[sizeof (filename) - 1] = '\0';
break;
case 'm':
/* since the argument for this option is required, getopt guarantees
* that optarg is non-null.
*/
strncpy (message, optarg, sizeof (message));
message[sizeof (message) - 1] = '\0';
if (help_flag) {
usage (stdout, argv[0]);
return 0;
}
if (filename[0]) {
fp = fopen (filename, "w");
} else {
fp = stdout;
}
if (!fp) {
fprintf(stderr, "Failed to open file.\n");
return 1;
}
fclose (fp);
return 0;
}
It supports three command-line options (--help, --file, and --msg). All have a "short form" as well (-h, -f, and -m).
The "file" and "msg" options both accept arguments. If you specify the "msg" option, its argument is required.
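The option handling described above can be condensed into a single function, sketched here. The struct options container and the function name parse_options are assumptions, and unlike the example program this sketch sets help_flag in the switch for both the short and the long form rather than through the longopts table.

```c
#include <getopt.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the parsed settings */
struct options {
    int help_flag;
    char filename[256];
    char message[256];
};

/* Returns 0 on success, -1 on an unrecognized option or missing argument */
int parse_options(int argc, char *argv[], struct options *out)
{
    static const struct option longopts[] = {
        { "help", no_argument,       NULL, 'h' },
        { "file", optional_argument, NULL, 'f' },
        { "msg",  required_argument, NULL, 'm' },
        { 0, 0, 0, 0 }
    };
    int opt;

    memset(out, 0, sizeof *out);
    optind = 1; /* restart scanning in case the parser was used before */

    /* "f::" marks an optional argument (a GNU extension), "m:" a required one */
    while ((opt = getopt_long(argc, argv, "hf::m:", longopts, NULL)) != -1) {
        switch (opt) {
        case 'h':
            out->help_flag = 1;
            break;
        case 'f':
            snprintf(out->filename, sizeof out->filename, "%s",
                     optarg ? optarg : "out.txt");
            break;
        case 'm':
            snprintf(out->message, sizeof out->message, "%s", optarg);
            break;
        default: /* '?': getopt_long has already printed a diagnostic */
            return -1;
        }
    }
    return 0;
}
```

Because the argument of the "file" option is optional, it has to be attached to the option itself, as in -fname or --file=name; a separate word after -f is not taken as its argument.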
/* Writes text to file. Unlike puts(), fputs() does not add a new-line. */
if (fputs("Output in file.\n", file) == EOF)
{
perror(path);
e = EXIT_FAILURE;
}
/* Close file */
if (fclose(file))
{
perror(path);
return EXIT_FAILURE;
}
return e;
}
This program opens the file with name given in the argument to main, defaulting to output.txt if no argument is
given. If a file with the same name already exists, its contents are discarded and the file is treated as a new empty
file. If the file does not already exist, the fopen() call creates it.
If the fopen() call fails for some reason, it returns a NULL value and sets the global errno variable value. This means
that the program can test the returned value after the fopen() call and use perror() if fopen() fails.
If the fopen() call succeeds, it returns a valid FILE pointer. This pointer can then be used to reference this file until
fclose() is called on it.
The fputs() function writes the given text to the opened file, replacing any previous contents of the file. Similarly to
fopen(), the fputs() function also sets the errno value if it fails, though in this case the function returns EOF to
The fclose() function flushes any buffers, closes the file and releases the resources associated with the FILE *
stream. The return value indicates completion just as fputs() does (though it returns 0 if successful), again also
setting the errno value in the case of a failure.
print_all(stream);
pclose(stream);
return 0;
}
This program runs a process (netstat) via popen() and reads all the standard output from the process and echoes
that to standard output.
Note: popen() does not exist in the standard C library; it is part of POSIX.
A side note: Some systems (infamously, Windows) do not use what most programmers would call "normal" line
endings. While UNIX-like systems use \n to terminate lines, Windows uses a pair of characters: \r (carriage return)
and \n (line feed). This sequence is commonly called CRLF. However, when using C you rarely need to worry
about these highly platform-dependent details. For streams opened in text mode, the C library is required to
translate every \n to and from the platform's line ending. So on Windows a \n written to a text stream is stored
as \r\n, while on UNIX it is kept as-is.
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
/* Open the file for reading */
char *line_buf = NULL;
size_t line_buf_size = 0;
int line_count = 0;
ssize_t line_size;
FILE *fp = fopen(FILENAME, "r");
if (!fp)
{
fprintf(stderr, "Error opening file '%s'\n", FILENAME);
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
This is a file
which has
multiple lines
with various indentation,
blank lines
a really long line to show that getline() will reallocate the line buffer if the length of a line
is too long to fit in the buffer it has been given,
and punctuation at the end of the lines.
In the example, getline() is initially called with no buffer allocated. During this first call, getline() allocates a
buffer, reads the first line and places the line's contents in the new buffer. On subsequent calls, getline() updates
the same buffer and only reallocates the buffer when it is no longer large enough to fit the whole line. The
temporary buffer is then freed when we are done with the file.
Another option is getdelim(). This is the same as getline() except you specify the line ending character. This is
only necessary if the last character of the line for your file type is not '\n'. getline() works even with Windows text
files because with the multibyte line ending ("\r\n"), '\n' is still the last character on the line.
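The buffer handling described above can be sketched as follows. This assumes a POSIX system that provides getline(); the helper name longest_line is hypothetical and only serves the illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* ask <stdio.h> to declare getline() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>           /* ssize_t */

/* Hypothetical helper: returns the length of the longest line in fp,
   counting the trailing '\n' when present. */
size_t longest_line(FILE *fp)
{
    char *line_buf = NULL;       /* getline() allocates this on the first call */
    size_t line_buf_size = 0;
    size_t longest = 0;
    ssize_t line_size;

    /* getline() returns -1 at end of file or on error. */
    while ((line_size = getline(&line_buf, &line_buf_size, fp)) >= 0)
    {
        if ((size_t) line_size > longest)
            longest = (size_t) line_size;
    }
    free(line_buf);              /* the caller owns the buffer */
    return longest;
}
```

The same buffer is reused for every line and is only freed once, after the loop.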
/* Only include our version of getline() if the POSIX version isn't available. */
/* Step through the file, pulling characters until either a newline or EOF. */
{
int c;
while (EOF != (c = getc(fin)))
{
/* Note we read a character. */
num_read++;
#endif
file.txt:
This is just
a test file
to be used by fscanf()
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
FILE *fp;
printAllWords(fp);
fclose(fp);
return EXIT_SUCCESS;
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LINE_LENGTH 80
if (argc < 2) /* argv[1] is used below, so one argument is required */
return EXIT_FAILURE;
path = argv[1];
/* Open file */
FILE *file = fopen(path, "r");
if (!file)
{
perror(path);
return EXIT_FAILURE;
}
/* Close file */
if (fclose(file))
{
perror(path);
return EXIT_FAILURE;
Calling the program with an argument that is a path to a file containing the following text:
This is a file
which has
multiple lines
with various indentation,
blank lines
a really long line to show that the line will be counted as two lines if the length of a line is
too long to fit in the buffer it has been given,
and punctuation at the end of the lines.
This very simple example allows a fixed maximum line length, such that longer lines will effectively be counted as
two lines. The fgets() function requires that the calling code provide the memory to be used as the destination for
the line that is read.
POSIX makes the getline() function available which instead internally allocates memory to enlarge the buffer as
necessary for a line of any length (as long as there is sufficient memory).
int main(void)
{
result = EXIT_SUCCESS;
fclose(fp);
}
return result;
}
This program creates and writes text in the binary form through the fwrite function to the file output.bin.
If a file with the same name already exists, its contents are discarded and the file is treated as a new empty file.
A binary stream is an ordered sequence of characters that can transparently record internal data. In this mode,
bytes are written between the program and the file without any interpretation.
To write integers portably, it must be known whether the file format expects them in big or little-endian format, and
the size (usually 16, 32 or 64 bits). Bit shifting and masking may then be used to write out the bytes in the correct
order. Integers in C are not guaranteed to have two's complement representation (though almost all
implementations do). Fortunately, a conversion to unsigned is guaranteed to produce the two's complement bit pattern. The code for
writing a signed integer to a binary file is therefore a little surprising.
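A minimal sketch of such a function, assuming the file format expects a 16-bit little-endian field. The name fput16le and the return convention (0 on success, EOF on failure) are choices made for this illustration.

```c
#include <stdio.h>

/* Writes the low 16 bits of x to fp in little-endian byte order.
   Returns 0 on success and EOF on failure. */
int fput16le(int x, FILE *fp)
{
    unsigned int rep = (unsigned int) x;              /* well-defined modulo conversion */
    int e1 = fputc((int) (rep & 0xFFu), fp);          /* low byte first */
    int e2 = fputc((int) ((rep >> 8) & 0xFFu), fp);   /* then the high byte */

    return (e1 == EOF || e2 == EOF) ? EOF : 0;
}
```

The surprising part is the first line: the shifting and masking is done on the unsigned copy, never on the signed value itself.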
The other functions follow the same pattern with minor modifications for size and byte order.
Note that length modifiers can be applied to %n (e.g. %hhn indicates that a following n conversion specifier applies to a
pointer to a signed char argument, according to the ISO/IEC 9899:2011 §7.21.6.1 ¶7).
Note that the floating point conversions apply to types float and double because of default promotion rules —
§6.5.2.2 Function calls, ¶7 The ellipsis notation in a function prototype declarator causes argument type conversion to
stop after the last declared parameter. The default argument promotions are performed on trailing arguments.) Thus,
functions such as printf() are only ever passed double values, even if the variable referenced is of type float.
With the g and G formats, the choice between e and f (or E and F) notation is documented in the C standard and in
the POSIX specification for printf():
The double argument representing a floating-point number shall be converted in the style f or e (or in the
style F or E in the case of a G conversion specifier), depending on the value converted and the precision.
Let P equal the precision if non-zero, 6 if the precision is omitted, or 1 if the precision is zero. Then, if a
conversion with style E would have an exponent of X:
If P > X >= -4, the conversion shall be with style f (or F) and precision P - (X+1).
Otherwise, the conversion shall be with style e (or E) and precision P - 1.
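The rule quoted above can be checked with a few values. This is a small sketch; the helper name g_prints_as is hypothetical. The precision is omitted, so P is 6, and trailing zeros are removed because the # flag is not used.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if printing value with "%g" (precision omitted, so P = 6)
   yields exactly the expected text, and 0 otherwise. */
int g_prints_as(double value, const char *expected)
{
    char buf[64];

    snprintf(buf, sizeof buf, "%g", value);
    return strcmp(buf, expected) == 0;
}
```

For example, 0.0001 has X = -4 and is printed in style f as 0.0001, while 0.00001 has X = -5 and is printed in style e as 1e-05. Likewise 123456.0 (X = 5) prints as 123456, but 1234567.0 (X = 6, no longer less than P) prints as 1.23457e+06.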
printf("Hello world!");
// Hello world!
Normal, unformatted character arrays can be printed by themselves by placing them directly in between the
parentheses.
int x = 3;
char y = 'Z';
char* z = "Example";
printf("Int: %d, Char: %c, String: %s", x, y, z);
// Int: 3, Char: Z, String: Example
Alternatively, integers, floating-point numbers, characters, and more can be printed using the escape character %,
followed by a character or sequence of characters denoting the format, known as the format specifier.
All additional arguments to the function printf() are separated by commas, and these arguments should be in the
same order as the format specifiers. Additional arguments are ignored, while incorrectly typed arguments or a lack
of arguments will cause errors or undefined behavior. Each argument can be either a literal value or a variable.
After successful execution, the number of characters printed is returned with type int. Otherwise, a failure returns
a negative value.
These flags are also supported by Microsoft with the same meanings.
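#include <stdlib.h> /* for EXIT_SUCCESS */
#include <stdio.h> /* for printf() */
int main(void)
{
int i;
int * p = &i;
/* %p expects a void *, so the pointer is converted explicitly */
printf("The address of i is %p.\n", (void *) p);
return EXIT_SUCCESS;
}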
Version ≥ C99
Using <inttypes.h> and uintptr_t
Another way to print pointers in C99 or later uses the uintptr_t type and the macros from <inttypes.h>:
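#include <inttypes.h> /* for uintptr_t and PRIXPTR */
#include <stdio.h> /* for printf() */
int main(void)
{
int i;
int *p = &i;
printf("The address of i is 0x%" PRIXPTR ".\n", (uintptr_t) (void *) p);
return 0;
}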
In theory, there might not be an integer type that can hold any pointer converted to an integer (so the type
uintptr_t might not exist). In practice, it does exist. Pointers to functions need not be convertible to the uintptr_t
type — though again they most often are convertible.
If the uintptr_t type exists, so does the intptr_t type. It is not clear why you'd ever want to treat addresses as
signed integers, though.
Prior to C89 during K&R-C times there was no type void* (nor header <stdlib.h>, nor prototypes, and hence no
int main(void) notation), so the pointer was cast to long unsigned int and printed using the lx length
modifier/conversion specifier.
The example below is for informational purposes only. Nowadays this is invalid code, which may well
provoke the infamous Undefined Behaviour.
int main()
{
int i;
int *p = &i;
return 0;
}
To make sure there is a type wide enough to hold such a "pointer difference", <stddef.h> defines the type
ptrdiff_t. To print a ptrdiff_t, use the t length modifier (available since C99).
Version ≥ C99
#include <stdlib.h> /* for EXIT_SUCCESS */
#include <stdio.h> /* for printf() */
#include <stddef.h> /* for ptrdiff_t */
int main(void)
{
int a[2];
int * p1 = &a[0], * p2 = &a[1];
ptrdiff_t pd = p2 - p1;
printf("p1 = %p\n", (void *) p1);
printf("p2 = %p\n", (void *) p2);
printf("p2 - p1 = %td\n", pd);
return EXIT_SUCCESS;
}
p1 = 0x7fff6679f430
p2 = 0x7fff6679f434
p2 - p1 = 1
*1: If the two pointers to be subtracted do not point to the same object, the behaviour is undefined.
If a length modifier appears with any conversion specifier other than as specified above, the behavior is undefined.
Microsoft specifies some different length modifiers, and explicitly does not support hh, j, z, or t.
Note that the C, S, and Z conversion specifiers and the I, I32, I64, and w length modifiers are Microsoft extensions.
Treating l as a modifier for long double rather than double is different from the standard, though you'll be
hard-pressed to spot the difference unless long double has a different representation from double.
int *pointer; /* inside a function, pointer is uninitialized and doesn't point to any valid object
yet */
To declare two pointer variables of the same type, in the same declaration, use the asterisk symbol before each
identifier. For example,
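A small sketch of such a declaration, next to the common mistake of writing the asterisk only once. The function and variable names are hypothetical.

```c
/* Hypothetical demo: returns the sum of the values read below. */
int two_pointers_demo(void)
{
    int x = 1, y = 2;
    int *px = &x, *py = &y;   /* asterisk repeated: both are pointers to int */
    int *p = &x, n = 5;       /* only p is a pointer; n is a plain int */

    return *px + *py + *p + n;   /* 1 + 2 + 1 + 5 */
}
```

The asterisk belongs to each declarator, not to the type name, which is why it has to be repeated.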
The address-of or reference operator denoted by an ampersand (&) gives the address of a given variable which can
be placed in a pointer of appropriate type.
int value = 1;
pointer = &value;
The indirection or dereference operator denoted by an asterisk (*) gets the contents of an object pointed to by a
pointer.
If the pointer points to a structure or union type then you can dereference it and access its members directly using
the -> operator:
SomeStruct *s = &someObject;
s->someMember = 5; /* Equivalent to (*s).someMember = 5 */
In C, a pointer is a distinct value type which can be reassigned and otherwise is treated as a variable in its own right.
For example the following example prints the value of the pointer (variable) itself.
Because a pointer is a mutable variable, it is possible for it to not point to a valid object, either by being set to null
pointer = 0; /* or alternatively */
pointer = NULL;
or simply by containing an arbitrary bit pattern that isn't a valid address. The latter is a very bad situation, because
such a pointer cannot be detected before it is dereferenced; the only available test is for the case where a pointer is null:
if (!pointer) exit(EXIT_FAILURE);
The value returned by the dereference operator is a mutable alias to the original variable, so it can be changed,
modifying the original variable.
*pointer += 1;
printf("Value of pointed to variable after change: %d\n", *pointer);
/* Value of pointed to variable after change: 2 */
Pointers are also re-assignable. This means that a pointer pointing to an object can later be used to point to
another object of the same type.
Like any other variable, pointers have a specific type. You can't assign the address of a short int to a pointer to a
long int, for instance. Such behavior is referred to as type punning and is forbidden in C, though there are a few
exceptions.
Although a pointer must be of a specific type, the memory allocated for each type of pointer is equal to the memory
used by the environment to store addresses, rather than the size of the type that is pointed to.
#include <stdio.h>
int main(void) {
printf("Size of int pointer: %zu\n", sizeof (int*)); /* size 4 bytes */
printf("Size of int variable: %zu\n", sizeof (int)); /* size 4 bytes */
printf("Size of char pointer: %zu\n", sizeof (char*)); /* size 4 bytes */
printf("Size of char variable: %zu\n", sizeof (char)); /* size 1 bytes */
printf("Size of short pointer: %zu\n", sizeof (short*)); /* size 4 bytes */
printf("Size of short variable: %zu\n", sizeof (short)); /* size 2 bytes */
return 0;
}
(NB: if you are using Microsoft Visual Studio, which does not support the C99 or C11 standards, you must use %Iu instead
of %zu in the above sample; see footnote 1.)
Note that the numbers above can vary from environment to environment, but most environments show equal sizes
for the different types of object pointer (the C standard itself does not guarantee this).
Pointers and arrays are intimately connected in C. Arrays in C are always held in contiguous locations in memory.
Pointer arithmetic is always scaled by the size of the item pointed to. So if we have an array of three doubles, and a
pointer to the base, *ptr refers to the first double, *(ptr + 1) to the second, *(ptr + 2) to the third. A more
convenient notation is to use array notation [].
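The equivalence between the two notations can be sketched as follows. The helper name sum_doubles is hypothetical; it receives a pointer to the base of an array and walks it with scaled pointer arithmetic.

```c
#include <stddef.h>

/* Sums n doubles; *(ptr + i) and ptr[i] denote the same element. */
double sum_doubles(const double *ptr, size_t n)
{
    double total = 0.0;

    for (size_t i = 0; i < n; i++)
        total += *(ptr + i);      /* equivalent to: total += ptr[i]; */
    return total;
}
```

Adding i to ptr advances the address by i * sizeof(double) bytes, not by i bytes.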
So in expressions like these, ptr and the array name are interchangeable (although sizeof and the unary & operator
treat them differently). This rule also means that an array decays to a pointer when passed to a subroutine.
A pointer may point to any element in an array, or to the position just beyond the last element. It is however an
error to set a pointer to any other value, including the element before the array. (The reason is that on segmented
architectures the address before the first element may cross a segment boundary; the compiler ensures that this
does not happen for the last element plus one.)
Footnote 1: Microsoft format information can be found via printf() and format specification syntax.
Memory allocation is not guaranteed to succeed, and may instead return a NULL pointer. Using the returned value,
without checking if the allocation is successful, invokes undefined behavior. This usually leads to a crash, but there
is no guarantee that a crash will happen so relying on that can also lead to problems.
Safe way:
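A minimal sketch of the safe pattern: the result of malloc() is tested before the memory is touched. The helper name make_filled is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates an array of count ints, each set to fill.
   Returns NULL if the allocation fails, so the caller must check as well. */
int *make_filled(size_t count, int fill)
{
    int *values = malloc(count * sizeof *values);

    if (values == NULL)
        return NULL;              /* never touch the memory on failure */
    for (size_t i = 0; i < count; i++)
        values[i] = fill;
    return values;
}
```

The caller repeats the same check on the returned pointer and eventually passes it to free().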
For a given compiler/machine configuration, types have a known size; however, there isn't any standard which
defines that the size of a given type (other than char) will be the same for all compiler/machine configurations. If
the code uses 4 instead of sizeof(int) for memory allocation, it may work on the original machine, but the code
isn't necessarily portable to other machines or compilers. Fixed sizes for types should be replaced by sizeof expressions.
Non-portable allocation:
Portable allocation:
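The two styles can be contrasted in a short sketch. The non-portable form is shown only in a comment; the helper name alloc_ints is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates room for count ints. */
int *alloc_ints(size_t count)
{
    /* Non-portable: assumes that an int always occupies 4 bytes.
       int *p = malloc(count * 4); */

    /* Portable: the element size is taken from the pointed-to type. */
    int *p = malloc(count * sizeof *p);

    return p;                     /* may be NULL; the caller must check */
}
```

Writing sizeof *p rather than sizeof(int) also keeps the call correct if the type of p is changed later.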
Memory leaks
Failure to de-allocate memory using free leads to a buildup of non-reusable memory, which is no longer used by
the program; this is called a memory leak. Memory leaks waste memory resources and can lead to allocation
failures.
Logical errors
Failure to adhere to this pattern, such as using memory after a call to free (dangling pointer) or before a call to
malloc (wild pointer), calling free twice ("double free"), etc., usually causes a segmentation fault and results in a
crash of the program.
These errors can be transient and hard to debug – for example, freed memory is usually not immediately reclaimed
by the OS, and thus dangling pointers may persist for a while and appear to work.
On systems where it works, Valgrind is an invaluable tool for identifying what memory is leaked and where it was
originally allocated.
Creating a pointer does not extend the life of the variable being pointed to. For example:
int* myFunction()
{
int x = 10;
return &x;
}
Here, x has automatic storage duration (commonly known as stack allocation). Because it is allocated on the stack, its
lifetime is only as long as myFunction is executing; after myFunction has exited, the variable x is destroyed. This
function gets the address of x (using &x), and returns it to the caller, leaving the caller with a pointer to a non-
existent variable. Attempting to access this variable will then invoke undefined behavior.
To resolve this, either malloc the storage for the variable to be returned, and return a pointer to the newly created
storage, or require that a valid pointer is passed in to the function instead of returning one, for example:
#include <stdlib.h>
#include <stdio.h>
int *solution1(void)
{
int *x = malloc(sizeof *x);
if (x == NULL)
{
/* Something went wrong */
return NULL;
}
*x = 10;
return x;
}
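void solution2(int *x)
{
/* The caller must pass a valid, non-null pointer. */
*x = 10;
}
int main(void)
{
{
/* Use solution1() */
int *foo = solution1();
if (foo == NULL)
{
return 1; /* Allocation failed */
}
printf("The value set by solution1() is %d\n", *foo);
free(foo); /* Tidy up */
}
{
/* Use solution2() */
int bar;
solution2(&bar);
printf("The value set by solution2() is %d\n", bar);
}
return 0;
}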
The postfix increment and decrement operators bind more tightly than the dereference operator. Therefore, the
expression *p++ will increment the pointer p itself and return what was pointed to by p before incrementing, without changing it.
This rule also applies to post decrementing: *p-- will decrement the pointer p itself, not what is pointed by p.
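A short sketch of this behaviour, using a hypothetical two-element array:

```c
/* Hypothetical demo: returns the sum of the two values read below. */
int post_increment_demo(void)
{
    int values[] = { 10, 20 };
    int *p = values;
    int first = *p++;    /* reads values[0], then moves p to values[1] */

    /* values[0] is still 10: only the pointer was incremented */
    return first + *p;   /* 10 + 20 */
}
```

To increment the pointed-to value instead of the pointer, parentheses are needed: (*p)++.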
To dereference a_pointer and change the value of a, we use the following operation
*a_pointer = 2;
However, one would be mistaken to dereference a NULL or otherwise invalid pointer. This
p1 = (int *) 0xbad;
p2 = NULL;
*p1 = 42;
*p2 = *p1 + 1;
is usually undefined behavior. p1 may not be dereferenced because it points to an address 0xbad which may not be
a valid address. Who knows what's there? It might be operating system memory, or another program's memory.
The only time code like this is used, is in embedded development, which stores particular information at hard-
coded addresses. p2 cannot be dereferenced because it is NULL, which is invalid.
struct MY_STRUCT
{
int my_int;
float my_float;
};
We can define MY_STRUCT to omit the struct keyword so we don't have to type struct MY_STRUCT each time we use
it. This, however, is optional.
MY_STRUCT *instance;
If this statement appears at file scope, instance will be initialized with a null pointer when the program starts. If
this statement appears inside a function, its value is indeterminate. The variable must be initialized to point to a valid
MY_STRUCT variable, or to dynamically allocated space, before it can be dereferenced. For example:
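Both ways of obtaining a valid pointer can be sketched as follows. The typedef line is one possible way to drop the struct keyword; the function names and the values stored are hypothetical.

```c
#include <stdlib.h>

struct MY_STRUCT
{
    int my_int;
    float my_float;
};
typedef struct MY_STRUCT MY_STRUCT;   /* allows MY_STRUCT without "struct" */

/* Points instance at an existing variable. */
int use_existing(void)
{
    MY_STRUCT info = { 1, 3.5f };
    MY_STRUCT *instance = &info;

    return instance->my_int;
}

/* Points instance at dynamically allocated space. Returns -1 on failure. */
int use_allocated(void)
{
    MY_STRUCT *instance = malloc(sizeof *instance);

    if (instance == NULL)
        return -1;
    instance->my_int = 2;
    instance->my_float = 0.5f;
    int result = instance->my_int;
    free(instance);
    return result;
}
```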
When the pointer is valid, we can dereference it to access its members using one of two different notations:
int a = (*instance).my_int;
float b = instance->my_float;
While both these methods work, it is better practice to use the arrow -> operator rather than the combination of
parentheses, the dereference * operator and the dot . operator because it is easier to read and understand,
especially with nested uses.
In this case, copy contains a copy of the contents of instance. Changing my_int of copy will not change it in
instance.
In this case, ref is a reference to instance. Changing my_int using the reference will change it in instance.
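The difference between the two cases can be sketched in one function. This assumes, as in the declaration earlier, that instance is a pointer to a valid structure; the variable info and the values used are hypothetical.

```c
struct MY_STRUCT
{
    int my_int;
    float my_float;
};

/* Hypothetical demo: returns info.my_int + copy.my_int after both writes. */
int copy_vs_ref(void)
{
    struct MY_STRUCT info = { 1, 2.0f };
    struct MY_STRUCT *instance = &info;

    struct MY_STRUCT copy = *instance;   /* independent copy of the contents */
    struct MY_STRUCT *ref = instance;    /* second pointer to the same object */

    copy.my_int = 10;   /* info.my_int is still 1 */
    ref->my_int = 20;   /* info.my_int is now 20 */

    return info.my_int + copy.my_int;    /* 20 + 10 */
}
```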
It is common practice to use pointers to structs as parameters in functions, rather than the structs themselves.
Using the structs as function parameters could cause the stack to overflow if the struct is large. Using a pointer to a
struct only uses enough stack space for the pointer, but can cause side effects if the function changes the struct
which is passed into the function.
Pointer to an int
The pointer can point to different integers and the ints can be changed through the pointer. This sample of
code assigns p to point to int b, then changes b's value to 100.
int b;
int* p;
p = &b; /* OK */
*p = 100; /* OK */
The pointer can point to different integers but the int's value can't be changed through the pointer.
int b;
const int* p;
p = &b; /* OK */
*p = 100; /* Compiler Error */
The pointer can only point to one int but the int's value can be changed through the pointer.
int a, b;
int* const p = &b; /* OK as initialisation, no assignment */
*p = 100; /* OK */
p = &a; /* Compiler Error */
The pointer can only point to one int and the int can not be changed through the pointer.
int a, b;
const int* const p = &b; /* OK as initialisation, no assignment */
p = &a; /* Compiler Error */
*p = 100; /* Compiler Error */
Pointer to Pointer
This code assigns the address of p1 to the double pointer p (which then points to int* p1, which in turn points
to an int).
void f1(void)
{
int a, b;
int *p1;
int **p;
p1 = &b; /* OK */
p = &p1; /* OK */
*p = &a; /* OK */
**p = 100; /* OK */
}
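void f2(void)
{
int b;
const int *p1;
const int **p;
p = &p1; /* OK */
*p = &b; /* OK */
**p = 100; /* error: assignment of read-only location ‘**p’ */
}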
void f3(void)
{
int b;
int *p1;
int * const *p;
p = &p1; /* OK */
*p = &b; /* error: assignment of read-only location ‘*p’ */
**p = 100; /* OK */
}
void f4(void)
{
int b;
int *p1;
int ** const p = &p1; /* OK as initialisation, not assignment */
p = &p1; /* error: assignment of read-only variable ‘p’ */
*p = &b; /* OK */
**p = 100; /* OK */
}
void f5(void)
{
int b;
const int *p1;
const int * const *p;
p = &p1; /* OK */
*p = &b; /* error: assignment of read-only location ‘*p’ */
**p = 100; /* error: assignment of read-only location ‘**p’ */
}
void f6(void)
{
int b;
const int *p1;
const int ** const p = &p1; /* OK as initialisation, not assignment */
p = &p1; /* error: assignment of read-only variable ‘p’ */
*p = &b; /* OK */
**p = 100; /* error: assignment of read-only location ‘**p’ */
}
void f7(void)
{
my_pointer = &my_function;
...
printf("%d\n", result);
}
Although this syntax seems more natural and coherent with basic types, assigning and dereferencing function
pointers doesn't require the usage of the & and * operators. So the following snippet is equally valid:
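Both spellings can be compared in a short sketch. The body of my_function and the helper call_both_ways are assumptions made for this illustration.

```c
/* Hypothetical function used as the pointer target. */
int my_function(int a, int b)
{
    return 2 * a + 3 * b;
}

/* Calls my_function through a pointer, once with & and * and once without. */
int call_both_ways(void)
{
    int (*my_pointer)(int, int);

    my_pointer = &my_function;                   /* explicit address-of */
    int explicit_result = (*my_pointer)(4, 2);   /* explicit dereference */

    my_pointer = my_function;                    /* the name converts to a pointer */
    int implicit_result = my_pointer(4, 2);      /* call through the pointer directly */

    return explicit_result == implicit_result ? implicit_result : -1;
}
```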
Another readability trick is that the C standard allows one to simplify a function pointer in arguments like above
(but not in variable declaration) to something that looks like a function prototype; thus the following can be
equivalently used for function definitions and declarations:
See also
Function Pointers
void qsort (
void *base, /* Array to be sorted */
size_t num, /* Number of elements in array */
size_t size, /* Size in bytes of each element */
int (*compar)(const void *, const void *)); /* Comparison function for two elements */
The array to be sorted is passed as a void pointer, so an array of any type of element can be operated on. The next
two arguments tell qsort() how many elements it should expect in the array, and how large, in bytes, each element
is.
The last argument is a function pointer to a comparison function which itself takes two void pointers. By making the
caller provide this function, qsort() can effectively sort elements of any type.
Here's an example of such a comparison function, for comparing floats. Note that any comparison function passed
to qsort() needs to have this type signature. The way it is made polymorphic is by casting the void pointer
arguments to pointers of the type of element we wish to compare.
Now, the usage of the polymorphic function qsort on an array "array" with length "len" is very simple:
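A minimal sketch of such a comparison function and of the call, assuming ascending order is wanted. The names compare_floats and sort_floats are hypothetical.

```c
#include <stdlib.h>

/* Comparison function for floats, with the signature qsort() requires. */
int compare_floats(const void *a, const void *b)
{
    float fa = *(const float *) a;
    float fb = *(const float *) b;

    return (fa > fb) - (fa < fb);   /* negative, zero or positive */
}

/* Sorts array (of len elements) into ascending order. */
void sort_floats(float *array, size_t len)
{
    qsort(array, len, sizeof array[0], compare_floats);
}
```

Returning (fa > fb) - (fa < fb) avoids subtracting the two values directly, which would lose information when the difference is converted to int.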
Suppose that
int i = 1;
int *p = NULL;
Then the statement p = &i; copies the address of the variable i to the pointer p.
#include <stddef.h>
int main()
{
int *p1 = NULL;
char *p2 = NULL;
float *p3 = NULL;
...
}
In most operating systems, inadvertently using a pointer that has been initialized to NULL will often result in the
program crashing immediately, making it easy to identify the cause of the problem. Using an uninitialized pointer
can often cause hard-to-diagnose bugs.
Caution:
The result of dereferencing a NULL pointer is undefined, so it will not necessarily cause a crash even if that is the
natural behaviour of the operating system the program is running on. Compiler optimizations may mask the crash,
cause the crash to occur before or after the point in the source code at which the null pointer dereference
occurred, or cause parts of the code that contains the null pointer dereference to be unexpectedly removed from
the program. Debug builds will not usually exhibit these behaviours, but this is not guaranteed by the language
standard. Other unexpected and/or undesirable behaviour is also allowed.
Because NULL never points to a variable, to allocated memory, or to a function, it is safe to use as a guard value.
Usually NULL is defined as (void *)0. But this does not imply that the assigned memory address is 0x0. For more
clarification refer to C-faq for NULL pointers
Note that you can also initialize pointers to contain values other than NULL.
int i1;
int main()
{
int *p1 = &i1;
const char *p2 = "A constant string to point to";
float *p3 = malloc(10 * sizeof(float));
}
#include <stdio.h>
#include <stdlib.h>
int main(void) {
int A = 42;
int* pA = &A;
int** ppA = &pA;
int*** pppA = &ppA;
return EXIT_SUCCESS;
}
#include <stdio.h>
#include <stdlib.h>
int main(void) {
int A = 42;
int* pA = &A;
int** ppA = &&A; /* Compilation error here! */
int*** pppA = &&&A; /* Compilation error here! */
...
void* is a catch all type for pointers to object types. An example of this in use is with the malloc function, which is
declared as
void* malloc(size_t);
It is generally considered good practice to not explicitly cast the values into and out of void pointers. In specific case
of malloc() this is because with an explicit cast, the compiler may otherwise assume, but not warn about, an
incorrect return type for malloc(), if you forget to include stdlib.h. It is also a case of using the correct behavior of
void pointers to better conform to the DRY (don't repeat yourself) principle; compare the above to the following,
wherein the following code contains several needless additional places where a typo could cause issues:
void* memcpy(void *restrict target, void const *restrict source, size_t size);
have their arguments specified as void * because the address of any object, regardless of the type, can be passed
in. Here also, a call should not use a cast
The most confusing thing surrounding pointer syntax in C and C++ is that there are actually two different meanings
that apply when the pointer symbol, the asterisk (*), is used with a variable.
Example
int i = 5;
/* 'p' is a pointer to an integer, initialized as NULL */
int *p = NULL;
/* '&i' evaluates into address of 'i', which then assigned to 'p' */
p = &i;
/* 'p' is now holding the address of 'i' */
When you're not declaring (or multiplying), * is used to dereference a pointer variable:
*p = 123;
/* 'p' was pointing to 'i', so this changes value of 'i' to 123 */
When you want an existing pointer variable to hold address of other variable, you don't use *, but do it like this:
p = &another_variable;
A common confusion among C-programming newbies arises when they declare and initialize a pointer variable at
the same time.
Since int i = 5; and int i; i = 5; give the same result, some of them might think that int *p = &i; and int *p;
*p = &i; give the same result too. They do not: int *p; *p = &i; attempts to dereference an uninitialized
pointer, which results in UB. Never use * when you're neither declaring nor dereferencing a pointer.
Conclusion
The asterisk (*) has two distinct meanings within C in relation to pointers, depending on where it's used. When used
within a variable declaration, the value on the right hand side of the equals side should be a pointer value to an
address in memory. When used with an already declared variable, the asterisk will dereference the pointer value,
following it to the pointed-to place in memory, and allowing the value stored there to be assigned or retrieved.
Takeaway
It is important to mind your P's and Q's, so to speak, when dealing with pointers. Be mindful of when you're using
the asterisk, and what it means when you use it there. Overlooking this tiny detail could result in buggy and/or
undefined behavior that you really don't want to have to deal with.
a + b;
a - b;
a * b;
a / b;
a % b;
a & b;
a | b;
In the above examples, the expression a may be evaluated before or after the expression b, or the two evaluations
may even be intermixed if they correspond to several instructions.
f(a, b);
Here not only a and b are unsequenced (i.e. the , operator in a function call does not produce a sequence point) but
also f, the expression that determines the function that is to be called.
Side effects may be applied immediately after evaluation or deferred until a later point.
Expressions like
or
x++ & x;
f(x++, x);
x++ * x;
a[i++] = i;
a && b
a || b
In all cases, the expression a is fully evaluated and all side effects are applied before either b or c are evaluated. In
the fourth case, only one of b or c will be evaluated. In the last case, b is fully evaluated and all side effects are
applied before c is evaluated.
In all cases, the evaluation of expression a is sequenced before the evaluations of b or c (alternately, the evaluations
of b and c are sequenced after the evaluation of a).
unsigned counter = 0;
unsigned account(void) {
return counter++;
}
int main(void) {
printf("the order is %u %u\n", account(), account());
}
This implicit twofold modification of counter during the evaluation of the printf arguments is valid; we just don't
know which of the calls comes first. As the order is unspecified, it may vary and cannot be depended on. So the
printout could be:
the order is 0 1
or
the order is 1 0
has undefined behavior because there is no sequence point between the two modifications of counter.
Usage
So the above code will graph whatever function you passed into it - as long as that function meets certain criteria:
namely, that you pass a double in and get a double out. There are many functions like that - sin(), cos(), tan(),
exp() etc. - but there are many that aren't, such as graph() itself!
Syntax
So how do you specify which functions you can pass into graph() and which ones you can't? The conventional way
is by using a syntax that may not be easy to read or understand:
The problem above is that there are two things trying to be defined at the same time: the structure of the function,
and the fact that it's a pointer. So, split the two definitions! By using typedef, a better syntax (easier to read &
understand) can be achieved.
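One way to split the two definitions is sketched below. The names math_fn, math_fn_ptr, apply_twice and half are hypothetical, and apply_twice merely stands in for a function such as graph() that receives a function pointer.

```c
typedef double math_fn(double);   /* the structure of the function */
typedef math_fn *math_fn_ptr;     /* the fact that it's a pointer */

/* Without the typedefs, the parameter would be written: double (*fn)(double) */
double apply_twice(math_fn_ptr fn, double x)
{
    return fn(fn(x));
}

/* A function that matches the required structure: double in, double out. */
double half(double x)
{
    return x / 2.0;
}
```

Any function that takes a double and returns a double can now be passed, for example apply_twice(half, 8.0).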
enum Op
{
int main(void)
{
int a, b, c;
int (*fp)(int,int);
fp = getmath(ADD);
a = 1, b = 2;
c = (*fp)(a, b);
printf("%d + %d = %d\n", a, b, c);
return 0;
}
It might be handy to use a typedef instead of declaring the function pointer each time by hand.
Example:
Posit that we have a function, sort, that expects a function pointer to a function compare such that:
compare - A compare function for two elements which is to be supplied to a sort function.
Without a typedef we would pass a function pointer as an argument to a function in the following manner:
Function pointers are the only place where you should include the pointer property of the type, e.g. do not try to
define types like typedef struct something_struct *something_type. This applies even for a structure with
members which are not supposed to be accessed directly by API callers, for example the stdio.h FILE type (which, as
you will now notice, is not a pointer).
A function pointer should almost always take a user-supplied void * as a context pointer.
Example
/* function minimiser, details unimportant */
double findminimum( double (*fptr)(double x, double y, void *ctx), void *ctx)
{
...
/* repeatedly make calls like this */
temp = (*fptr)(testx, testy, ctx);
}
void caller()
Using the context pointer means that the extra parameters do not need to be hard-coded into the function pointed
to, or require the use of globals.
The library function qsort() does not follow this rule, and one can often get away without context for trivial
comparison functions. But for anything more complicated, the context pointer becomes essential.
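The idea can be sketched with a callback that needs three coefficients the calling routine knows nothing about. All names here (objective_fn, struct quadratic, eval_quadratic, smaller_of) are hypothetical, and smaller_of is only a stand-in for a real library routine.

```c
/* Hypothetical callback type: the last argument is the caller's context. */
typedef double (*objective_fn)(double x, void *ctx);

/* Extra parameters that the callback needs but the library cannot know. */
struct quadratic
{
    double a, b, c;
};

/* Evaluates a*x*x + b*x + c with coefficients passed through ctx. */
double eval_quadratic(double x, void *ctx)
{
    const struct quadratic *q = ctx;

    return q->a * x * x + q->b * x + q->c;
}

/* Stand-in for a library routine: returns whichever of lo and hi
   gives the smaller function value, forwarding ctx untouched. */
double smaller_of(objective_fn fn, void *ctx, double lo, double hi)
{
    return fn(lo, ctx) <= fn(hi, ctx) ? lo : hi;
}
```

The routine never inspects ctx; it only hands it back to the callback, so different callers can supply different data without globals.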
See also
Functions pointers
int main(void)
{
int num = 0; /* declare number to increment */
int (*fp)(int); /* declare a function pointer */
Declaring the pointer takes the return value of the function, the name of the function, and the type of
arguments/parameters it receives.
void Print(void){
printf("look ma' - no hands, only pointers!\n");
}
As seen in more advanced examples in this document, declaring a pointer to a function could get messy if the
function is passed more than a few parameters. If you have a few pointers to functions that have identical
"structure" (same type of return value, and same type of parameters) it's best to use the typedef keyword to save
you some typing, and to make the code clearer:
int main()
You can also create an Array of function-pointers. If all the pointers are of the same "structure":
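A small sketch of such an array, assuming three functions that all take two ints and return an int. The names binary_op, operations and apply are hypothetical.

```c
#include <stddef.h>

typedef int (*binary_op)(int, int);   /* one type for every entry */

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }
static int mul(int a, int b) { return a * b; }

/* All three functions share one "structure", so one array can hold them. */
static const binary_op operations[] = { add, sub, mul };

/* Calls the function stored at index which; returns 0 for a bad index. */
int apply(size_t which, int a, int b)
{
    size_t count = sizeof operations / sizeof operations[0];

    if (which >= count)
        return 0;
    return operations[which](a, b);
}
```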
It is also possible to define an array of function pointers of different types, though that would require a cast
whenever you want to call a specific function.
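As an illustration of the typedef and the array of function pointers described above, here is a short sketch. The names binary_op, add, sub, mul, ops and apply are hypothetical and chosen only for this example.

```c
#include <stddef.h>

/* One name for every pointer to a function of the form int f(int, int). */
typedef int (*binary_op)(int, int);

int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }
int mul(int a, int b) { return a * b; }

/* Array of function pointers that all share the same type. */
static const binary_op ops[] = { add, sub, mul };

/* Calls the operation stored at index (0 = add, 1 = sub, 2 = mul).
   The caller must keep index below the number of entries. */
int apply(size_t index, int a, int b)
{
    return ops[index](a, b);
}
```

Without the typedef, the array declaration would read int (*ops[])(int, int), which is harder to scan.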
#include <stdio.h>
void modify(int v) {
printf("modify 1: %d\n", v); /* 0 is printed */
v = 42;
printf("modify 2: %d\n", v); /* 42 is printed */
}
int main(void) {
int v = 0;
printf("main 1: %d\n", v); /* 0 is printed */
modify(v);
printf("main 2: %d\n", v); /* 0 is printed, not 42 */
return 0;
}
You can use pointers to let callee functions modify caller functions' local variables. Note that this is not pass by
reference but the pointer values pointing at the local variables are passed.
#include <stdio.h>
void modify(int* v) {
printf("modify 1: %d\n", *v); /* 0 is printed */
*v = 42;
printf("modify 2: %d\n", *v); /* 42 is printed */
}
int main(void) {
int v = 0;
printf("main 1: %d\n", v); /* 0 is printed */
modify(&v);
printf("main 2: %d\n", v); /* 42 is printed */
return 0;
}
However, returning the address of a local variable to the caller results in undefined behaviour. See Dereferencing a
pointer to variable beyond its lifetime.
/* Type "void" and VLAs ("int friend_indexes[static size]") require C99 at least.
In C11 VLAs are optional. */
void getListOfFriends(size_t size, int friend_indexes[static size]) {
Here the static inside the [] of the function parameter requests that the argument array have at least as
many elements as are specified (i.e. size elements). To be able to use that feature, we have to ensure that the size
parameter comes before the array parameter in the list.
int main(void) {
size_t size_of_list = LIST_SIZE;
int friends_indexes[size_of_list];
return 0;
}
See also
#include <stdio.h>
int main(void)
{
int a = 1;
function(a++, ++a);
return 0;
}
#include <stdio.h>
int main(void)
{
int a = 0;
double b = 0.0;
return 0;
}
However you can also use a struct as a return value which allows you to return both an error status along with
other values as well. For instance.
typedef struct {
int iStat; /* Return status */
int iValue; /* Return value */
} RetValue;
return iRetStatus;
}
if (iRet.iStat == 1) {
/* do things with iRet.iValue, the returned value */
}
return 0;
}
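A compact sketch of the status-plus-value pattern, reusing the RetValue layout from above. The function find_max is hypothetical, and the convention that iStat == 1 means success is taken from the check in the fragment above.

```c
#include <stddef.h>

typedef struct {
    int iStat;   /* Return status: 1 = success, 0 = failure */
    int iValue;  /* Return value, meaningful only when iStat == 1 */
} RetValue;

/* Hypothetical helper: largest element of an array, failing on empty input. */
RetValue find_max(const int *data, size_t n)
{
    RetValue ret = { 0, 0 };
    if (data == NULL || n == 0) {
        return ret;               /* status stays 0: nothing to report */
    }
    ret.iValue = data[0];
    for (size_t i = 1; i < n; i++) {
        if (data[i] > ret.iValue) {
            ret.iValue = data[i];
        }
    }
    ret.iStat = 1;
    return ret;
}
```

Returning a small struct by value keeps the status and the result together and avoids output parameters.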
#include <stdio.h>
#include <stdlib.h>
#define ROWS 3
#define COLS 2
int main()
{
int array_2D[ROWS][COLS] = { {1, 2}, {3, 4}, {5, 6} };
int n = ROWS;
int m = COLS;
fun1(array_2D, n, m);
return EXIT_SUCCESS;
}
But the compiler, here GCC version 4.9.4, does not accept it.
The reasons for this are twofold: the main problem is that arrays are not pointers, and the second inconvenience is
the so-called pointer decay. Passing an array to a function makes the array decay to a pointer to its first
element; a 2D array decays to a pointer to its first row, because C stores arrays in row-major order.
#include <stdio.h>
#include <stdlib.h>
#define ROWS 3
#define COLS 2
fun1(array_2D, n, m);
return EXIT_SUCCESS;
}
#include <stdio.h>
#include <stdlib.h>
#define ROWS 3
#define COLS 2
int main()
{
int array_2D[ROWS][COLS] = { {1, 2}, {3, 4}, {5, 6} };
int rows = ROWS;
fun1(array_2D, rows);
return EXIT_SUCCESS;
}
n = rows;
/* Works, because that information is passed (as "COLS").
It is also redundant because that value is known at compile time (in "COLS"). */
m = (int) (sizeof(a[0])/sizeof(a[0][0]));
/* Does not work here because the "decay" in "pointer decay" is meant
literally--information is lost. */
printf("FUN1: %zu\n",sizeof(a)/sizeof(a[0]));
The number of columns is predefined and hence fixed at compile time. However, the predecessor of the current C
standard (ISO/IEC 9899:1999; the current one is ISO/IEC 9899:2011) introduced variable-length arrays (VLAs), and
although the current standard made them optional, almost all modern C compilers support them.
#include <stdio.h>
#include <stdlib.h>
if(argc != 3){
fprintf(stderr,"Usage: %s rows cols\n",argv[0]);
exit(EXIT_FAILURE);
}
rows = atoi(argv[1]);
cols = atoi(argv[2]);
int array_2D[rows][cols];
exit(EXIT_SUCCESS);
}
n = rows;
/* Does not work anymore, no sizes are specified anymore
m = (int) (sizeof(a[0])/sizeof(a[0][0])); */
m = cols;
It becomes a bit clearer if we intentionally make an error in the call of the function by changing the declaration to
void fun1(int **a, int rows, int cols). That causes the compiler to complain in a different, but equally
nebulous way
We can react in several ways; one of them is to ignore all of it and do some illegible pointer juggling:
#include <stdio.h>
#include <stdlib.h>
if(argc != 3){
fprintf(stderr,"Usage: %s rows cols\n",argv[0]);
exit(EXIT_FAILURE);
}
rows = atoi(argv[1]);
cols = atoi(argv[2]);
int array_2D[rows][cols];
printf("Make array with %d rows and %d columns\n", rows, cols);
for (i = 0; i < rows; i++) {
for (j = 0; j < cols; j++) {
array_2D[i][j] = i * cols + j;
printf("array[%d][%d]=%d\n", i, j, array_2D[i][j]);
}
}
exit(EXIT_SUCCESS);
}
Or we do it right and pass the needed information to fun1. To do so we need to rearrange the arguments to fun1:
the size of the column must come before the declaration of the array. To keep it more readable, the variable holding
the number of rows has changed its place, too, and now comes first.
#include <stdio.h>
#include <stdlib.h>
if(argc != 3){
fprintf(stderr,"Usage: %s rows cols\n",argv[0]);
exit(EXIT_FAILURE);
}
rows = atoi(argv[1]);
cols = atoi(argv[2]);
int array_2D[rows][cols];
printf("Make array with %d rows and %d columns\n", rows, cols);
for (i = 0; i < rows; i++) {
for (j = 0; j < cols; j++) {
array_2D[i][j] = i * cols + j;
printf("array[%d][%d]=%d\n", i, j, array_2D[i][j]);
}
}
exit(EXIT_SUCCESS);
}
n = rows;
m = cols;
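The dimensions-first parameter order can be sketched as follows, assuming a compiler with VLA support (optional in C11). The function name sum_matrix is hypothetical and not part of the example above.

```c
/* Sums a rows x cols matrix passed as a variable-length array parameter.
   rows and cols must appear before a so that they are in scope when the
   type of a is declared. */
long sum_matrix(int rows, int cols, int a[rows][cols])
{
    long total = 0;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            total += a[i][j];   /* normal 2D indexing works */
        }
    }
    return total;
}
```

A caller holding int m[3][2] would simply write sum_matrix(3, 2, m); no casts or manual index arithmetic are needed.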
This looks awkward to some people, who hold the opinion that the order of variables should not matter. That is not
much of a problem, just declare a pointer and let it point to the array.
#include <stdio.h>
#include <stdlib.h>
if(argc != 3){
fprintf(stderr,"Usage: %s rows cols\n",argv[0]);
exit(EXIT_FAILURE);
}
rows = atoi(argv[1]);
cols = atoi(argv[2]);
int array_2D[rows][cols];
printf("Make array with %d rows and %d columns\n", rows, cols);
for (i = 0; i < rows; i++) {
for (j = 0; j < cols; j++) {
array_2D[i][j] = i * cols + j;
printf("array[%d][%d]=%d\n", i, j, array_2D[i][j]);
}
}
// a "rows" number of pointers to "int". Again a VLA
int *a[rows];
// initialize them to point to the individual rows
for (i = 0; i < rows; i++) {
a[i] = array_2D[i];
}
exit(EXIT_SUCCESS);
}
n = rows;
m = cols;
/* pass it to a subroutine */
manipulate_matrix(matrix, width, height);
Value Meaning
EDOM Domain error
ERANGE Range error
EILSEQ Illegal multi-byte character sequence
if (last_error) {
fprintf(stderr, "fopen: Could not open %s for writing: %s",
argv[1], strerror(last_error));
fputs("Cross fingers and continue", stderr);
}
return EXIT_SUCCESS;
}
Code that invokes UB may work as intended on a specific system with a specific compiler, but will likely not work on
another system, or with a different compiler, compiler version or compiler settings.
return 0;
}
Some compilers helpfully point this out. For example, clang warns with:
warning: address of stack memory associated with local variable 'baz' returned
[-Wreturn-stack-address]
for the above code. But compilers may not be able to help in complex code.
(1) Returning reference to variable declared static is defined behaviour, as the variable is not destroyed after
leaving current scope.
(2) According to ISO/IEC 9899:2011 6.2.4 §2, "The value of a pointer becomes indeterminate when the object it
points to reaches the end of its lifetime."
(3) Dereferencing the pointer returned by the function foo is undefined behaviour as the memory it references
holds an indeterminate value.
... attempts to copy 10 bytes where the source and destination memory areas overlap by three bytes. To visualize:
          overlapping area
                |
               _ _
              |   |
              v   v
T h i s   i s   a n   e x a m p l e \0
^             ^
|             |
|             destination
|
source
Among the standard library functions with a limitation of this kind are memcpy(), strcpy(), strcat(), sprintf(),
and sscanf(). The standard says of these and several other functions:
If copying takes place between objects that overlap, the behavior is undefined.
The memmove() function is the principal exception to this rule. Its definition specifies that the function behaves as if
the source data were first copied into a temporary buffer and then written to the destination address. There is no
exception for overlapping source and destination regions, nor any need for one, so memmove() has well-defined
behavior in such cases.
The distinction reflects an efficiency vs. generality tradeoff. Copying such as these functions perform usually occurs
between disjoint regions of memory, and often it is possible to know at development time whether a particular
instance of memory copying will be in that category. Assuming non-overlap affords comparatively more efficient
implementations that do not reliably produce correct results when the assumption does not hold. Most C library
functions are allowed the more efficient implementations, and memmove() fills in the gaps, serving the cases where
the source and destination may or do overlap. To produce the correct effect in all cases, however, it must perform
additional tests and / or employ a comparatively less efficient implementation.
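As a small sketch of the safe alternative, the helper below (hypothetical name shift_right) moves a block of bytes to a higher address inside the same buffer. The regions overlap whenever shift is smaller than len, so it uses memmove() rather than memcpy().

```c
#include <string.h>

/* Shifts the first len bytes of buf to the right by shift positions.
   Source and destination overlap when shift < len, which memmove()
   handles correctly. The caller must guarantee that buf has room for
   len + shift bytes. */
void shift_right(char *buf, size_t len, size_t shift)
{
    memmove(buf + shift, buf, len);
}
```

With a buffer holding "This is an example", calling shift_right(buf, 10, 7) copies the ten bytes "This is an" to offset 7 with well-defined behavior, which is the overlapping copy pictured above.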
Most instances of this type of undefined behavior are more difficult to recognize or predict. Overflow can in
principle arise from any addition, subtraction, or multiplication operation on signed integers (subject to the usual
arithmetic conversions) where there are not effective bounds on or a relationship between the operands to prevent
it. For example, this function:
int square(int x) {
return x * x; /* overflows for some values of x */
}
is reasonable, and it does the right thing for small enough argument values, but its behavior is undefined for larger
argument values. You cannot judge from the function alone whether programs that call it exhibit undefined
behavior as a result. It depends on what arguments they pass to it.
On the other hand, consider this trivial example of overflow-safe signed integer arithmetic:
int zero(int x) {
return x - x; /* Cannot overflow */
}
The relationship between the operands of the subtraction operator ensures that the subtraction never overflows.
Or consider this somewhat more practical example:
As long as the counters do not overflow individually, the operands of the final subtraction will both be
non-negative. All differences between any two such values are representable as int.
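When no such relationship between the operands is known, a common defensive approach is to test them against the limits of the type before operating. The sketch below is not from the original text; checked_add is a hypothetical name.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *result and returns true, or returns false if the sum
   would not be representable as int. The checks themselves cannot
   overflow, because INT_MAX - b is only evaluated for b > 0 and
   INT_MIN - b only for b < 0. */
bool checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *result = a + b;
    return true;
}
```

The important point is that the test happens before the addition; checking the result afterwards would already have invoked undefined behavior.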
The variable a is an int with automatic storage duration. The example code above is trying to print the value of an
uninitialized variable (a was never initialized). Automatic variables which are not initialized have indeterminate
values; accessing these can lead to undefined behavior.
Note: Variables with static or thread local storage, including global variables without the static keyword, are
initialized to either zero, or their initialized value. Hence the following is legal.
static int b;
printf("%d", b);
A very common mistake is to not initialize the variables that serve as counters to 0. You add values to them, but
because their initial value is indeterminate, the result is undefined behavior.
Example:
#include <stdio.h>
int main(void) {
int i, counter;
for(i = 0; i < 10; ++i)
counter += i;
printf("%d\n", counter);
return 0;
}
Output:
The above rules are applicable for pointers as well. For example, the following results in undefined behavior
int main(void)
{
int *p;
p++; // Trying to increment an uninitialized pointer.
}
Note that the above code on its own might not cause an error or segmentation fault, but trying to dereference this
pointer later would cause the undefined behavior.
C11 introduced support for multiple threads of execution, which affords the possibility of data races. A program
contains a data race if an object in it is accessed (1) by two different threads, where at least one of the accesses is
non-atomic, at least one modifies the object, and program semantics fail to ensure that the two accesses cannot
overlap temporally (2). Note well that actual concurrency of the accesses involved is not a condition for a data race;
data races cover a broader class of issues arising from (allowed) inconsistencies in different threads' views of
memory.
#include <threads.h>
int a = 0;
return 0;
}
int b = a;
thrd_join( id , NULL );
}
The main thread calls thrd_create to start a new thread running function Function. The second thread modifies a,
and the main thread reads a. Neither of those accesses is atomic, and the two threads do nothing either individually
or jointly to ensure that they do not overlap, so there is a data race.
Among the ways this program could avoid the data race are
the main thread could perform its read of a before starting the other thread;
the main thread could perform its read of a after ensuring via thrd_join that the other has terminated;
the threads could synchronize their accesses via a mutex, each one locking that mutex before accessing a
and unlocking it afterward.
As the mutex option demonstrates, avoiding a data race does not require ensuring a specific order of operations,
such as the child thread modifying a before the main thread reads it; it is sufficient (for avoiding a data race) to
ensure that for a given execution, one access will happen before the other.
2 (Quoted from ISO/IEC 9899:201x, section 5.1.2.4 "Multi-threaded executions and data races")
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least
one of which is not atomic, and neither happens before the other. Any such data race results in undefined
behavior.
char *p = malloc(5);
free(p);
if (p == NULL) /* NOTE: even without dereferencing, this may have UB */
{
[…] The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the
end of its lifetime.
long z = 'B';
printf("%c\n", z);
printf("%f\n",0);
The above line of code has undefined behavior: %f expects a double, but 0 is of type int.
Note that your compiler usually can help you avoid cases like these, if you turn on the proper flags during compiling
(-Wformat in clang and gcc). From the last example:
warning: format specifies type 'double' but the argument has type
'int' [-Wformat]
printf("%f\n",0);
~~ ^
%d
However, modifying a mutable array of char directly, or through a pointer is naturally not undefined behavior, even
if its initializer is a literal string. The following is fine:
a[0] = 'H';
p[7] = 'W';
That's because the string literal is effectively copied to the array each time the array is initialized (once for variables
with static duration, each time the array is created for variables with automatic or thread duration — variables with
allocated duration aren't initialized), and it is fine to modify array contents.
However, the undefined behavior does not always mean that the program crashes — some systems take steps to
avoid the crash that normally happens when a null pointer is dereferenced. For example Glibc is known to print
(null)
for the code above. However, add (just) a newline to the format string and you will get a crash:
char *foo = 0;
printf("%s\n", foo); /* undefined behavior */
In this case, it happens because GCC has an optimization that turns printf("%s\n", argument); into a call to puts
with puts(argument), and puts in Glibc does not handle null pointers. All this behavior is standard conforming.
Note that a null pointer is different from an empty string. So, the following is valid and has no undefined
behaviour. It will just print a newline:
Code like this often leads to speculations about the "resulting value" of i. Rather than specifying an outcome,
however, the C standards specify that evaluating such an expression produces undefined behavior. Prior to C2011,
the standard formalized these rules in terms of so-called sequence points:
Between the previous and next sequence point a scalar object shall have its stored value modified at
most once by the evaluation of an expression. Furthermore, the prior value shall be read only to
determine the value to be stored.
That scheme proved to be a little too coarse, resulting in some expressions exhibiting undefined behavior with
respect to C99 that plausibly should not do. C2011 retains sequence points, but introduces a more nuanced
approach to this area based on sequencing and a relationship it calls "sequenced before":
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same
scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
If there are multiple allowable orderings of the subexpressions of an expression, the behavior is
undefined if such an unsequenced side effect occurs in any of the orderings.
The full details of the "sequenced before" relation are too long to describe here, but they supplement sequence
points rather than supplanting them, so they have the effect of defining behavior for some evaluations whose
int i = 42;
i = (i++, i+42); /* The comma-operator creates a sequence point */
int i = 42;
printf("%d %d\n", i++, i++); /* commas as separator of function arguments are not comma-operators */
As with any form of undefined behavior, observing the actual behavior of evaluating expressions that violate the
sequencing rules is not informative, except in a retrospective sense. The language standard provides no basis for
expecting such observations to be predictive even of the future behavior of the same program.
int * x = malloc(sizeof(int));
*x = 9;
free(x);
free(x);
Otherwise, if the argument does not match a pointer earlier returned by the calloc, malloc, or realloc
function, or if the space has been deallocated by a call to free or realloc, the behavior is undefined.
If a left shift is performed on a positive value and the mathematical result is not representable in the type,
the behavior is undefined (1):
/* Assuming an int is 32-bits wide, the value '5 * 2^72' doesn't fit
* in an int. So, this is undefined. */
Note that right shift on a negative value (e.g. -5 >> 3) is not undefined but implementation-defined.
If the value of the right operand is negative or is greater than or equal to the width of the promoted left
operand, the behavior is undefined.
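One way to stay clear of the shift-count rule is to validate the count first and to shift an unsigned value. The sketch below uses the hypothetical name checked_shl and assumes that unsigned int has no padding bits, so that its width equals sizeof(unsigned int) * CHAR_BIT.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores value << count in *result and returns true when the shift count
   is valid; returns false when count is greater than or equal to the
   width of unsigned int (assumed here to have no padding bits). */
bool checked_shl(unsigned int value, unsigned int count, unsigned int *result)
{
    if (count >= sizeof(unsigned int) * CHAR_BIT) {
        return false;             /* shifting by the width or more is undefined */
    }
    *result = value << count;     /* unsigned: high bits are simply discarded */
    return true;
}
```

Using an unsigned operand also sidesteps the separate rule about results that are not representable in a signed type.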
The function specifier _Noreturn was introduced in C11. The header <stdnoreturn.h> provides a macro noreturn
which expands to _Noreturn. So using _Noreturn or noreturn from <stdnoreturn.h> is fine and equivalent.
A function that's declared with _Noreturn (or noreturn) is not allowed to return to its caller. If such a function does
return to its caller, the behavior is undefined.
In the following example, func() is declared with noreturn specifier but it returns to its caller.
#include <stdio.h>
#include <stdlib.h>
#include <stdnoreturn.h>
noreturn void func(void)
{
printf("In func()...\n");
} /* Undefined behavior as func() returns */
int main(void)
{
func();
return 0;
}
$ gcc test.c
test.c: In function ‘func’:
test.c:9:1: warning: ‘noreturn’ function does return
}
^
$ clang test.c
test.c:9:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
#include <stdio.h>
#include <stdlib.h>
#include <stdnoreturn.h>
int main(void)
{
my_exit();
return 0;
}
int array[3];
int *beyond_array = array + 3;
*beyond_array = 0; /* Accesses memory that has not been allocated. */
The third line accesses the 4th element in an array that is only 3 elements long, leading to undefined behavior.
Similarly, the behavior of the second line in the following code fragment is also not well defined:
int array[3];
array[3] = 0;
Note that pointing past the last element of an array is not undefined behavior (beyond_array = array + 3 is well
defined here), but dereferencing it is (*beyond_array is undefined behavior). This rule also holds for dynamically
allocated memory (such as buffers created through malloc).
foo_ptr = (int *)&foo_readonly; /* (1) This casts away the const qualifier */
*foo_ptr = 20; /* This is undefined behavior */
return 0;
}
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue
with non-const-qualified type, the behavior is undefined. [...]
(1) In GCC this can throw the following warning: warning: assignment discards ‘const’ qualifier from pointer
target type [-Wdiscarded-qualifiers]
uninitialized
defined with automatic storage duration
its address is never taken
1 (Quoted from: ISO/IEC 9899:201X 6.3.2.1 Lvalues, arrays, and function designators 2)
If the lvalue designates an object of automatic storage duration that could have been declared with the register
storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no
assignment to it has been performed prior to use), the behavior is undefined.
According to C11, if addition or subtraction of a pointer into, or just beyond, an array object and an integer type
produces a result that does not point into, or just beyond, the same array object, the behavior is undefined (6.5.6).
Additionally it is naturally undefined behavior to dereference a pointer that points to just beyond the array:
A NULL pointer is guaranteed by the C standard to compare unequal to any pointer to a valid object, and
dereferencing it invokes undefined behavior.
#include <stdio.h>
int main()
{
int i;
char input[4096];
scanf("%i", &i);
fflush(stdin); // <-- undefined behavior
fgets(input, sizeof input, stdin); /* gets() was removed in C11 */
return 0;
}
There is no standard way to discard unread characters from an input stream. On the other hand, some
implementations use fflush to clear the stdin buffer. Microsoft defines the behavior of fflush on an input stream: if
the stream is open for input, fflush clears the contents of the buffer. According to POSIX.1-2008, the behavior of
fflush is undefined unless the input file is seekable.
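A portable way to drop the rest of an input line is to read and ignore characters up to the newline. This is a sketch under that approach; discard_line is a hypothetical helper, not a standard function.

```c
#include <stdio.h>

/* Reads and discards characters up to and including the next newline,
   or until end-of-file or a read error. Returns the number of
   characters discarded. */
long discard_line(FILE *stream)
{
    long discarded = 0;
    int c;                        /* int, so that EOF can be represented */
    while ((c = getc(stream)) != EOF) {
        discarded++;
        if (c == '\n') {
            break;
        }
    }
    return discarded;
}
```

Calling discard_line(stdin) after a scanf() conversion removes the leftover text of that line, including the newline, using only standard I/O calls.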
If, within a translation unit, the same identifier appears with both internal and external linkage, the
behavior is undefined.
Note that if a prior declaration of an identifier is visible, then it will have the prior declaration's linkage.
C11, §6.2.2, 4 allows it:
For an identifier declared with the storage-class specifier extern in a scope in which a prior declaration of
that identifier is visible, if the prior declaration specifies internal or external linkage, the linkage of the
identifier at the later declaration is the same as the linkage specified at the prior declaration. If no prior
declaration is visible, or if the prior declaration specifies no linkage, then the identifier has external
linkage.
int main(void) {
/* Trying to use the (not) returned value causes UB */
int value = foo();
return 0;
}
When a function is declared to return a value then it has to do so on every possible code path through it. Undefined
behavior occurs as soon as the caller (which is expecting a return value) tries to use the return value (1).
Note that the undefined behaviour happens only if the caller attempts to use/access the value from the function.
For example,
int foo(void) {
/* do stuff */
/* no return here */
}
int main(void) {
/* The value (not) returned from foo() is unused. So, this program
* doesn't cause *undefined behaviour*. */
foo();
return 0;
}
Version ≥ C99
The main() function is an exception to this rule in that it is possible for it to be terminated without a return
statement because an assumed return value of 0 will automatically be used in this case (2).
If the } that terminates a function is reached, and the value of the function call is used by the caller, the
behavior is undefined.
or
double x = 0.0;
double y = 5.0 / x; /* floating point division */
or
int x = 0;
int y = 5 % x; /* modulo operation */
For the second line in each example, where the value of the second operand (x) is zero, the behaviour is undefined.
Note that most implementations of floating point math will follow a standard (e.g. IEEE 754), in which case
operations like divide-by-zero will have consistent results (e.g., INFINITY) even though the C standard says the
operation is undefined.
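For integer operands, the usual remedy is to test the divisor before dividing. The sketch below uses the hypothetical name checked_div. It also rejects INT_MIN / -1, a second case (not discussed above) where the quotient is not representable and the behavior is likewise undefined.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores num / den in *quot and returns true, or returns false when the
   division would be undefined: a zero divisor, or INT_MIN / -1 whose
   mathematical result does not fit in an int. */
bool checked_div(int num, int den, int *quot)
{
    if (den == 0) {
        return false;
    }
    if (num == INT_MIN && den == -1) {
        return false;
    }
    *quot = num / den;
    return true;
}
```

The same two checks apply to the % operator.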
The undefined behavior happens as the pointer is converted. According to C11, if a conversion between two pointer
types produces a result that is incorrectly aligned (6.3.2.3), the behavior is undefined. Here a uint32_t could require
an alignment of 2 or 4, for example.
calloc on the other hand is required to return a pointer that is suitably aligned for any object type; thus
memory_block is properly aligned to contain a uint32_t in its initial part. Then, on a system where uint32_t has a
required alignment of 2 or 4, memory_block + 1 will be an odd address and thus not properly aligned.
Note that, according to the C standard, the cast operation itself is already undefined. This is imposed because on
platforms where addresses are segmented, the byte address memory_block + 1 may not even have a proper
representation as an integer pointer.
Casting char * to pointers to other types without any concern to alignment requirements is sometimes incorrectly
used for decoding packed structures such as file headers or network packets.
You can avoid the undefined behavior arising from misaligned pointer conversion by using memcpy:
Here no pointer conversion to uint32_t* takes place and the bytes are copied one by one.
This copy operation for our example only leads to valid value of mvalue because:
We used calloc, so the bytes are properly initialized. In our case all bytes have value 0, but any other proper
initialization would do.
uint32_t is an exact width type and has no padding bits
Any arbitrary bit pattern is a valid representation for any unsigned type.
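A minimal sketch of the memcpy approach described above. The helper name load_u32 is hypothetical; the variable name mvalue follows the text.

```c
#include <stdint.h>
#include <string.h>

/* Reads a uint32_t stored at an arbitrary, possibly misaligned, byte
   address by copying its bytes into a properly aligned object. No
   pointer to uint32_t is ever formed from the byte address. */
uint32_t load_u32(const unsigned char *bytes)
{
    uint32_t mvalue;
    memcpy(&mvalue, bytes, sizeof mvalue);
    return mvalue;
}
```

The value is read in the host's byte order. Decoding a file header or network packet with a fixed byte order would need an extra conversion step.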
The getenv function returns a pointer to a string associated with the matched list member. The string
pointed to shall not be modified by the program, but may be overwritten by a subsequent call to the
getenv function.
The strerror function returns a pointer to the string, the contents of which are locale-specific. The array
pointed to shall not be modified by the program, but may be overwritten by a subsequent call to the
strerror function.
The pointer to string returned by the setlocale function is such that a subsequent call with that string
value and its associated category will restore that part of the program’s locale. The string pointed to shall
not be modified by the program, but may be overwritten by a subsequent call to the setlocale function.
Similarly the localeconv() function returns a pointer to struct lconv which shall not be modified.
The localeconv function returns a pointer to the filled-in object. The structure pointed to by the return
value shall not be modified by the program, but may be overwritten by a subsequent call to the
localeconv function.
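If a program needs to change such a string or keep it across later library calls, it can work on a private copy instead. The sketch below does this for getenv(); the helper name getenv_copy is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc()ed private copy of an environment variable's value,
   or NULL if the variable is unset or memory is exhausted. The caller
   owns the copy, may modify it freely, and must free() it. */
char *getenv_copy(const char *name)
{
    const char *value = getenv(name);
    if (value == NULL) {
        return NULL;
    }
    size_t len = strlen(value) + 1;   /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, value, len);
    }
    return copy;
}
```

The same copy-first approach works for the strings returned by strerror() and setlocale().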
srand(unsigned int) is used to seed the pseudo-random number generator. Each time rand() is seeded with the same
seed, it must produce the same sequence of values. It should only be seeded once before calling rand(). It should
not be repeatedly seeded, or reseeded every time you wish to generate a new batch of pseudo-random numbers.
Standard practice is to use the result of time(NULL) as a seed. If you need a deterministic sequence, you can seed
the generator with the same value on each program start. This is generally not required for release code, but is
useful in debug runs to make bugs reproducible.
It is advised to always seed the generator; if not seeded, it behaves as if it was seeded with srand(1).
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void) {
int i;
srand(time(NULL));
i = rand();
Possible output:
Notes:
The C Standard does not guarantee the quality of the random sequence produced. In the past, some
implementations of rand() had serious issues in distribution and randomness of the generated numbers. The
usage of rand() is not recommended for serious random number generation needs, like cryptography.
Why would you want such a thing? Maybe you don't trust your platform's builtin random number generator, or
maybe you want a reproducible source of randomness independent of any particular library implementation.
This code is PCG32 from pcg-random.org, a modern, fast, general-purpose RNG with excellent statistical properties.
It's not cryptographically secure, so don't use it for cryptography.
#include <stdint.h>
#include <stdio.h>
int main(void) {
pcg32_random_t rng; /* RNG state */
int i;
return 0;
}
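For reference, the generator type and functions that such a program relies on can be sketched as follows. This is written from recollection of the minimal PCG32 reference code published at pcg-random.org (Apache License 2.0), so it should be checked against that site before being relied upon.

```c
#include <stdint.h>

/* RNG state: state is the internal 64-bit counter, inc selects the stream. */
typedef struct { uint64_t state; uint64_t inc; } pcg32_random_t;

/* Returns the next 32-bit value and advances the generator. */
uint32_t pcg32_random_r(pcg32_random_t *rng)
{
    uint64_t oldstate = rng->state;
    /* Advance the internal state (linear congruential step). */
    rng->state = oldstate * 6364136223846793005ULL + (rng->inc | 1);
    /* Output function: xorshift followed by a random rotation. */
    uint32_t xorshifted = (uint32_t)(((oldstate >> 18u) ^ oldstate) >> 27u);
    uint32_t rot = (uint32_t)(oldstate >> 59u);
    return (xorshifted >> rot) | (xorshifted << ((-rot) & 31));
}

/* Seeds the generator: initstate picks the starting point, initseq the stream. */
void pcg32_srandom_r(pcg32_random_t *rng, uint64_t initstate, uint64_t initseq)
{
    rng->state = 0U;
    rng->inc = (initseq << 1u) | 1u;
    pcg32_random_r(rng);
    rng->state += initstate;
    pcg32_random_r(rng);
}
```

Two generators seeded with the same pair of values produce the same sequence on every platform, which is the reproducibility mentioned above.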
Example implementation
#include <stdint.h>
/* These state variables must be initialised so that they are not all zero. */
uint32_t w, x, y, z;
uint32_t xorshift128(void)
{
uint32_t t = x;
t ^= t << 11U;
t ^= t >> 8U;
x = y; y = z; z = w;
w ^= w >> 19U;
w ^= t;
return w;
}
The macro
i = (int)(uniform() * N)
Unfortunately there is a technical flaw, in that RAND_MAX is permitted to be larger than a variable of type double
can accurately represent. This means that RAND_MAX + 1.0 evaluates to RAND_MAX and the function occasionally
returns unity. This is unlikely however.
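A sketch of the scaling idiom discussed above. The macro definition is inferred from the text's mention of RAND_MAX + 1.0, and random_index is a hypothetical helper added for illustration.

```c
#include <stdlib.h>

/* Uniform double in [0, 1), provided RAND_MAX + 1.0 is exactly
   representable as a double (see the caveat above). */
#define uniform() (rand() / (RAND_MAX + 1.0))

/* Random index in the range 0 .. n-1. The distribution is only
   approximately even when n does not divide RAND_MAX + 1. */
int random_index(int n)
{
    return (int)(uniform() * n);
}
```

Dividing by RAND_MAX + 1.0 rather than RAND_MAX is what keeps the result strictly below 1, so the computed index never equals n.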
my-header-file.h
#ifndef MY_HEADER_FILE_H
#define MY_HEADER_FILE_H
#endif
This ensures that when you #include "my-header-file.h" in multiple places, you don't get duplicate declarations
of functions, variables, etc. Imagine the following hierarchy of files:
header-1.h
typedef struct {
…
} MyStruct;
header-2.h
#include "header-1.h"
main.c
#include "header-1.h"
#include "header-2.h"
int main() {
// do something
}
This code has a serious problem: the detailed contents of MyStruct are defined twice, which is not allowed. This
would result in a compilation error that can be difficult to track down, since one header file includes another. If you
instead did it with header guards:
header-1.h
#ifndef HEADER_1_H
#define HEADER_1_H
typedef struct {
…
} MyStruct;
#endif
header-2.h
#ifndef HEADER_2_H
#define HEADER_2_H
#include "header-1.h"
#endif
main.c
#include "header-1.h"
#include "header-2.h"
int main() {
// do something
}
#ifndef HEADER_1_H
#define HEADER_1_H
typedef struct {
…
} MyStruct;
#endif
#ifndef HEADER_2_H
#define HEADER_2_H
#ifndef HEADER_1_H // Safe, since HEADER_1_H was already defined above.
#define HEADER_1_H
typedef struct {
…
} MyStruct;
#endif
#endif
int main() {
// do something
}
When the compiler reaches the second inclusion of header-1.h, HEADER_1_H was already defined by the previous
inclusion. Ergo, it boils down to the following:
#define HEADER_1_H
typedef struct {
…
} MyStruct;
#define HEADER_2_H
int main() {
// do something
}
Note: There are multiple different conventions for naming the header guards. Some people like to name it
HEADER_2_H_, some include the project name like MY_PROJECT_HEADER_2_H. The important thing is to ensure that the
convention you follow makes it so that each file in your project has a unique header guard.
If the structure details were not included in the header, the type declared would be incomplete or an opaque type.
Such types can be useful, hiding implementation details from users of the functions. For many purposes, the FILE
type in the standard C library can be regarded as an opaque type (though it usually isn't opaque so that macro
implementations of the standard I/O functions can make use of the internals of the structure). In that case, the
header-1.h could contain:
#ifndef HEADER_1_H
#define HEADER_1_H
typedef struct MyStruct MyStruct;
#endif
Note that the structure must have a tag name (here MyStruct — that's in the tags namespace, separate from the
ordinary identifiers namespace of the typedef name MyStruct), and that the { … } is omitted. This says "there is a
structure type struct MyStruct and there is an alias for it MyStruct".
In the implementation file, the details of the structure can be defined to make the type complete:
struct MyStruct {
…
};
If you are using C11, you could repeat the typedef struct MyStruct MyStruct; declaration without causing a
compilation error, but earlier versions of C would complain. Consequently, it is still best to use the include guard
idiom.
Many compilers support the #pragma once directive, which has the same results:
my-header-file.h
#pragma once
However, #pragma once is not part of the C standard, so the code is less portable if you use it.
A few headers do not use the include guard idiom. One specific example is the standard <assert.h> header. It may
be included multiple times in a single translation unit, and the effect of doing so depends on whether the macro
NDEBUG is defined each time the header is included. You may occasionally have an analogous requirement; such
cases will be few and far between. Ordinarily, your headers should be protected by the include guard idiom.
However, if the source code you have surrounded with a block comment has block style comments in the source,
the ending */ of the existing block comments can cause your new block comment to be invalid and cause
compilation problems.
/* Return 5 */
return i;
}
*/
In the previous example, the last two lines of the function and the last '*/' are seen by the compiler, so it would
compile with errors. A safer method is to use an #if 0 directive around the code you want to block out.
#if 0
/* #if 0 evaluates to false, so everything between here and the #endif is
* removed by the preprocessor. */
int myUnusedFunction(void)
{
A benefit of this approach is that when you want to go back and find the code, it's much easier to search for "#if 0"
than to search through all your comments.
Another very important benefit is that you can nest commenting out code with #if 0. This cannot be done with
comments.
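The nesting behaviour can be seen in a small sketch. The function name answer is hypothetical and only serves to make the effect observable.

```c
int answer(void)
{
#if 0
    /* A block comment in here would end an enclosing block comment early. */
#if 0
    return -1;   /* inner disabled region; this #if pairs with the next #endif */
#endif
    return 0;    /* still disabled by the outer #if 0 */
#endif
    return 42;   /* the only statement the compiler sees */
}
```

Each #endif closes the nearest open #if, so the inner region can be added or removed without disturbing the outer one.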
An alternative to using #if 0 is to use a name that will not be #defined but is more descriptive of why the code is
being blocked out. For instance if there is a function that seems to be useless dead code you might use #if
defined(POSSIBLE_DEAD_CODE) or #if defined(FUTURE_CODE_REL_020201) for code needed once other
functionality is in place or something similar. Then when going back through to remove or enable that source, those
sections of source are easy to find.
#ifdef DEBUG
# define LOGFILENAME "/tmp/logfile.log"
# define LOG(str) do { \
FILE *fp = fopen(LOGFILENAME, "a"); \
if (fp) { \
fprintf(fp, "%s:%d %s\n", __FILE__, __LINE__, \
/* don't print null pointer */ \
str ?str :"<null>"); \
fclose(fp); \
} \
else { \
perror("Opening '" LOGFILENAME "' failed"); \
} \
} while (0)
#else
/* Make it a NOOP if DEBUG is not defined. */
# define LOG(LINE) (void)0
#endif
#include <stdio.h>
Here in both cases (with DEBUG or not) the call behaves the same way as a function with void return type. This
ensures that the if/else conditionals are interpreted as expected.
In the DEBUG case this is implemented through a do { ... } while(0) construct. In the other case, (void)0 is an
expression with no side effects that is simply ignored.
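To illustrate why the do { ... } while (0) wrapper matters, here is a minimal sketch with a multi-statement macro used in an unbraced if/else. The names COUNT_AND_PRINT, log_count and classify are hypothetical and are not part of the LOG example.

```c
#include <stdio.h>

static int log_count = 0;

/* Two statements wrapped so that the macro call plus ';' forms one statement. */
#define COUNT_AND_PRINT(msg) do { \
        ++log_count;              \
        puts(msg);                \
    } while (0)

int classify(int n)
{
    if (n < 0)
        COUNT_AND_PRINT("negative");
    else                          /* still pairs with the if above */
        COUNT_AND_PRINT("non-negative");
    return log_count;
}
```

With a plain { ... } block instead of the do/while wrapper, the semicolon after the first macro call would terminate the if statement and the else would become a syntax error.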
If you use GCC, you can also implement a function-like macro that returns a result by using a non-standard GNU
extension called statement expressions. For example:
#include <stdio.h>
#define POW(X, Y) \
({ \
int i, r = 1; \
for (i = 0; i < (Y); ++i) \
r *= (X); \
r; /* the value of the last expression is the result */ \
})
int main(void)
{
int result;
#include <stdio.h>
#include "myheader.h"
#include replaces the statement with the contents of the file referred to. Angle brackets (<>) refer to header files
installed on the system, while quotation marks ("") are for user-supplied files.
Macros themselves can expand other macros once, as this example illustrates:
#if VERSION == 1
#define INCFILE "vers1.h"
#elif VERSION == 2
#define INCFILE "vers2.h"
/* and so on */
#else
#define INCFILE "versN.h"
#endif
/* ... */
#include INCFILE
The #if directive behaves similarly to the C if statement, but its condition may only contain integer constant
expressions, and no casts. It supports one additional unary operator, defined( identifier ), which returns 1 if the
identifier is defined, and 0 otherwise.
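A short sketch of #if combined with defined() follows. The macro names FEATURE_LEVEL and FEATURE_NAME and the function feature_name are hypothetical.

```c
#define FEATURE_LEVEL 2

/* defined() guards the comparison; both parts are integer constant expressions. */
#if defined(FEATURE_LEVEL) && FEATURE_LEVEL >= 2
#define FEATURE_NAME "extended"
#elif defined(FEATURE_LEVEL)
#define FEATURE_NAME "basic"
#else
#define FEATURE_NAME "none"
#endif

const char *feature_name(void)
{
    return FEATURE_NAME;
}
```

Only one of the three #define lines survives preprocessing, so the compiler proper never sees the other branches.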
In most cases a release build of an application is expected to have as little overhead as possible. However during
testing of an interim build, additional logs and information about problems found can be helpful.
For example, assume there is a function SHORT SerOpPluAllRead(PLUIF *pPif, USHORT usLockHnd) that should
generate a log about its use when doing a test build. However, this function is used in multiple places, and the log
should record where the function is being called from.
So using conditional compilation you can have something like the following in the include file declaring the function.
This replaces the standard version of the function with a debug version of the function. The preprocessor is used to
replace calls to the function SerOpPluAllRead() with calls to the function SerOpPluAllRead_Debug() with two
additional arguments, the name of the file and the line number of where the function is used.
Conditional compilation is used to choose whether to override the standard function with a debug version or not.
#if 0
// function declaration and prototype for our debug version of the function.
SHORT SerOpPluAllRead_Debug(PLUIF *pPif, USHORT usLockHnd, char *aszFilePath, int nLineNo);
// macro definition to replace function call using old name with debug function with
// additional arguments.
#define SerOpPluAllRead(pPif,usLock) SerOpPluAllRead_Debug(pPif,usLock,__FILE__,__LINE__)
#else
// standard function declaration that is normally used with builds.
SHORT SerOpPluAllRead(PLUIF *pPif, USHORT usLockHnd);
#endif
There is one important consideration: any file using this function must include the header file where this approach is
used in order for the preprocessor to modify the function. Otherwise you will see a linker error.
The definition of the function would look something like the following. What this source does is to request that the
preprocessor rename the function SerOpPluAllRead() to be SerOpPluAllRead_Debug() and to modify the
argument list to include two additional arguments, a pointer to the name of the file where the function was called
and the line number in the file at which the function is used.
#if defined(SerOpPluAllRead)
// forward declare the replacement function which we will call once we create our log.
SHORT SerOpPluAllRead_Special(PLUIF *pPif, USHORT usLockHnd);
// only print the last 30 characters of the file name to shorten the logs.
iLen = strlen (aszFilePath);
if (iLen > 30) {
iLen = iLen - 30;
}
else {
iLen = 0;
}
// now that we have issued the log, continue with standard processing.
return SerOpPluAllRead_Special(pPif, usLockHnd);
}
// our special replacement function name for when we are generating logs.
SHORT SerOpPluAllRead_Special(PLUIF *pPif, USHORT usLockHnd)
#else
// standard, normal function name (signature) that is replaced with our debug version.
SHORT SerOpPluAllRead(PLUIF *pPif, USHORT usLockHnd)
#endif
{
if (STUB_SELF == SstReadAsMaster()) {
return OpPluAllRead(pPif, usLockHnd);
}
return OP_NOT_MASTER;
}
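For example, a function or other external is defined in a C source file but is used in a C++ source file. Since C++ uses
name mangling (or name decoration) in order to generate unique function names based on function argument
types, a plain declaration of the C function in a C++ source file refers to a mangled name and causes link errors.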
Since C compilers do not do name mangling but C++ compilers do for all external labels (function names or variable
names) generated by the C++ compiler, a predefined preprocessor macro, __cplusplus, was introduced to allow for
compiler detection.
In order to work around this problem of incompatible compiler output for external names between C and C++, the
macro __cplusplus is defined in the C++ Preprocessor and is not defined in the C Preprocessor. This macro name
can be used with the conditional preprocessor #ifdef directive or #if with the defined() operator to tell whether
a source code or include file is being compiled as C++ or C.
#ifdef __cplusplus
printf("C++\n");
#else
printf("C\n");
#endif
#if defined(__cplusplus)
printf("C++\n");
#else
printf("C\n");
#endif
To use a function from a C source file compiled with the C compiler in a C++ source file, you can check for the
__cplusplus macro so that extern "C" { /* ... */ }; is used to declare C externals when the header file is included
in a C++ source file. When the header is compiled with a C compiler, the extern "C" { /* ... */ }; is not used. This
conditional compilation is needed because extern "C" { /* ... */ }; is valid in C++ but not in C.
#ifdef __cplusplus
// if we are being compiled with a C++ compiler then declare the
// following functions as C functions to prevent name mangling.
extern "C" {
#endif
#ifdef __cplusplus
// if this is a C++ compiler, we need to close off the extern declaration.
};
#endif
#ifdef UNICODE
#define TEXT(x) L##x
Whenever a user writes TEXT("hello, world"), and UNICODE is defined, the C preprocessor concatenates L and
the macro argument. L concatenated with "hello, world" gives L"hello, world".
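The same token-pasting step can be sketched in isolation. The macro name WIDE and the function wide_length are hypothetical; unlike TEXT above, WIDE always adds the L prefix.

```c
#include <wchar.h>

/* ## pastes the L prefix onto the string literal, producing a wide literal. */
#define WIDE(x) L##x

size_t wide_length(void)
{
    return wcslen(WIDE("hello, world"));   /* expands to wcslen(L"hello, world") */
}
```

The comma inside the string literal does not split the macro argument, because the preprocessor treats a string literal as a single token.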
__FILE__, which gives the file name of the current source file (a string literal),
__LINE__ for the current line number (an integer constant),
__DATE__ for the compilation date (a string literal),
__TIME__ for the compilation time (a string literal).
There's also a related predefined identifier, __func__ (ISO/IEC 9899:2011 §6.4.2.2), which is not a macro:
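The identifier __func__ shall be implicitly declared by the translator as if, immediately following the
opening brace of each function definition, the declaration:
static const char __func__[] = "function-name";
appeared, where function-name is the name of the lexically-enclosing function.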
__FILE__, __LINE__ and __func__ are especially useful for debugging purposes. For example:
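A minimal sketch of such a debugging aid is shown here. The macro name TRACE_LOCATION and the function describe_location are hypothetical; the message is written into a caller-supplied buffer so that it can be inspected or logged.

```c
#include <stdio.h>

/* Expands at the point of use, so the file, line and function are the caller's. */
#define TRACE_LOCATION(buf, size) \
    snprintf((buf), (size), "%s:%d in %s", __FILE__, __LINE__, __func__)

int describe_location(char *buf, size_t size)
{
    return TRACE_LOCATION(buf, size);
}
```

Because the macro is expanded inside describe_location, __func__ there names describe_location rather than the macro.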
Pre-C99 compilers may or may not support __func__, or may provide a differently named macro that acts the same
way. For example, gcc used __FUNCTION__ in C89 mode.
__STDC_VERSION__ The version of the C Standard implemented. This is a constant integer using the format
yyyymmL (the value 201112L for C11, the value 199901L for C99; it wasn't defined for C89/C90)
__STDC_HOSTED__ 1 if it's a hosted implementation, else 0.
__STDC__ If 1, the implementation conforms to the C Standard.
__STDC_ISO_10646__ An integer constant of the form yyyymmL (for example, 199712L). If this
symbol is defined, then every character in the Unicode required set, when stored in an object of
type wchar_t, has the same value as the short identifier of that character. The Unicode required set
consists of all the characters that are defined by ISO/IEC 10646, along with all amendments and
technical corrigenda, as of the specified year and month. If some other encoding is used, the macro
shall not be defined and the actual encoding used is implementation-defined.
__STDC_MB_MIGHT_NEQ_WC__ The integer constant 1, intended to indicate that, in the encoding for
wchar_t, a member of the basic character set need not have a code value equal to its value when
used as the lone character in an integer character constant.
__STDC_UTF_16__ The integer constant 1, intended to indicate that values of type char16_t are
UTF−16 encoded. If some other encoding is used, the macro shall not be defined and the actual
encoding used is implementation-defined.
__STDC_UTF_32__ The integer constant 1, intended to indicate that values of type char32_t are
UTF−32 encoded. If some other encoding is used, the macro shall not be defined and the actual
encoding used is implementation-defined.
Let's say you want to create some print-macro for debugging your code, let's take this macro as an example:
The function somefunc() returns -1 if it failed and 0 if it succeeded, and it is called from plenty of different places
within the code:
if(retVal == -1)
{
debug_printf("somefunc() has failed");
}
if(retVal == -1)
{
debug_printf("somefunc() has failed");
}
What happens if the implementation of somefunc() changes, and it now returns different values matching different
possible error types? You still want to use the debug macro and print the error value.
To solve this problem the __VA_ARGS__ macro was introduced. It allows a function-like macro to accept a variable
number of arguments:
Example:
Usage:
This macro allows you to pass multiple parameters and print them, but now it forbids you from sending any
parameters at all.
debug_print("Hey");
This would raise a syntax error, as the macro expects at least one more argument and the pre-processor does not
remove the dangling comma left in the expansion of debug_print(). Also, debug_print("Hey",); would raise a syntax
error, as you can't leave the argument passed to the macro empty.
To solve this, GCC and compatible compilers support ##__VA_ARGS__ as an extension: if no variable arguments are
passed, the pre-processor deletes the comma that precedes it.
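The sketch below assumes a compiler that implements the GNU ##__VA_ARGS__ extension (GCC and Clang do); it is not standard C. The macro FORMAT_DEBUG and both functions are hypothetical, and the output goes to a buffer instead of the screen so that the result can be checked.

```c
#include <stdio.h>

/* GNU extension: "##" removes the preceding comma when no variadic
 * arguments are supplied. */
#define FORMAT_DEBUG(buf, size, fmt, ...) \
    snprintf((buf), (size), "debug: " fmt, ##__VA_ARGS__)

/* No variadic arguments: the trailing comma disappears. */
int format_plain(char *buf, size_t size)
{
    return FORMAT_DEBUG(buf, size, "Hey");
}

/* One variadic argument: the comma is kept. */
int format_value(char *buf, size_t size, int value)
{
    return FORMAT_DEBUG(buf, size, "value=%d", value);
}
```

Both calls expand to a valid snprintf() call; with plain __VA_ARGS__ the first one would end in a stray comma.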
Example:
Usage:
double b = 34;
int c = 23;
The replacement is done before any other interpretation of the program text. In the first call to TIMES10 the name A
from the definition is replaced by b and the so expanded text is then put in place of the call. Note that this
definition of TIMES10 is not equivalent to
because this could evaluate the replacement of A, twice, which can have unwanted side effects.
The following defines a function-like macro whose value is the maximum of its arguments. It has the advantages of
working for any compatible types of the arguments and of generating in-line code without the overhead of a function
call. It has the disadvantages of evaluating one or the other of its arguments a second time (including side
effects) and of generating more code than a function if invoked several times.
Because of this, such macros that evaluate their arguments multiple times are usually avoided in production code.
Since C11 there is the _Generic feature, which makes it possible to avoid such multiple evaluations.
The abundant parentheses in the macro expansions (right hand side of the definition) ensure that the arguments
and the resulting expression are bound properly and fit well into the context in which the macro is called.
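The double evaluation can be made visible with a small sketch. MAX is written here in the usual fully parenthesized form, which is assumed to match the macro discussed above; the function max_with_side_effect is hypothetical.

```c
/* Every argument and the whole expansion are parenthesized. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* The first argument has a side effect, so it is applied twice
 * whenever it is the larger value. */
int max_with_side_effect(int *counter, int other)
{
    return MAX((*counter)++, other);
}
```

Starting with *counter equal to 5 and other equal to 3, the comparison increments the counter once and the selected operand increments it again, so the call returns 6 and leaves the counter at 7.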
#define DEBUG
#ifdef DEBUG
#error "Debug Builds Not Supported"
#endif
int main(void) {
return 0;
}
Possible output:
#include <stdio.h>
#include <stdlib.h>
struct LinkedListNode
{
int data;
struct LinkedListNode *next;
};
/* Usage */
int main(void)
{
struct LinkedListNode *list, **plist = &list, *node;
int i;
You can make a standard interface for such data-structures and write a generic implementation of FOREACH as:
#include <stdio.h>
#include <stdlib.h>
/* must implement */
void *first(void *coll)
{
return ((Collection*)coll)->collectionHead;
}
/* must implement */
void *last(void *coll)
{
return NULL;
}
/* must implement */
void *next(void *coll, CollectionItem *curr)
{
return curr->next;
}
Collection *new_Collection()
{
Collection *nc = malloc(sizeof(Collection));
nc->first = first;
nc->last = last;
nc->next = next;
return nc;
}
/* generic implementation */
#define FOREACH(node, collection) \
for (node = (collection)->first(collection); \
node != (collection)->last(collection); \
node = (collection)->next(collection, node))
int main(void)
{
Collection *coll = new_Collection();
CollectionItem *node;
int i;
To use this generic implementation just implement these functions for your data structure.
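For comparison, a non-generic iteration macro for a plain singly linked list can be sketched as follows. The names Node, LIST_FOREACH and list_sum are hypothetical and independent of the Collection interface above.

```c
#include <stddef.h>

struct Node {
    int data;
    struct Node *next;
};

/* Expands to the header of a for loop; the loop body follows the macro call. */
#define LIST_FOREACH(node, head) \
    for ((node) = (head); (node) != NULL; (node) = (node)->next)

int list_sum(struct Node *head)
{
    struct Node *n;
    int total = 0;

    LIST_FOREACH(n, head)
        total += n->data;
    return total;
}
```

The macro only supplies the loop header, so it can be followed by a single statement or a braced block, just like an ordinary for.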
The signal() function is part of the ISO C standard and can be used to assign a function to handle a specific signal
default:
/* Reset the signal to the default handler,
so we will not be called again if things go
wrong on return. */
signal(sig, SIG_DFL);
/* let everybody know that we are finished */
finished = sig;
return;
}
}
int main(void)
{
/* Then: */
if (finished) {
fprintf(stderr, "we have been terminated by signal %d\n", (int)finished);
return EXIT_FAILURE;
}
Using signal() imposes important limitations on what you are allowed to do inside the signal handlers; see the
remarks for further information.
POSIX recommends the usage of sigaction() instead of signal(), because the behavior of signal() is underspecified
and varies significantly between implementations. POSIX also defines many more signals than the ISO C standard,
including SIGUSR1 and SIGUSR2, which can be used freely by the programmer for any purpose.
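A minimal sketch using only ISO C facilities is given below. The handler does nothing except store the signal number in a volatile sig_atomic_t object; the names got_signal, on_signal and raise_and_observe are hypothetical.

```c
#include <signal.h>

static volatile sig_atomic_t got_signal = 0;

/* Keep the handler minimal: only record which signal arrived. */
static void on_signal(int sig)
{
    got_signal = sig;
}

int raise_and_observe(void)
{
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT);              /* the handler runs before raise() returns */
    signal(SIGINT, SIG_DFL);    /* restore the default disposition */
    return (int)got_signal;
}
```

Using raise() makes the example deterministic; in a real program the signal would typically arrive asynchronously and the main loop would poll got_signal.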
Variable arguments are used by functions in the printf family (printf, fprintf, etc) and others to allow a function
to be called with a different number of arguments each time, hence the name varargs.
To implement functions using the variable arguments feature, use #include <stdarg.h>.
To call functions which take a variable number of arguments, ensure there is a full prototype with the trailing
ellipsis in scope: void err_exit(const char *format, ...); for example.
The simplest technique is to pass an explicit count of the other arguments (which are normally all the same type).
This is demonstrated in the variadic function in the code below which calculates the sum of a series of integers,
where there may be any number of integers but that count is specified as an argument prior to the variable
argument list.
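#include <stdio.h>
#include <stdarg.h>
/* The first argument is the number of int arguments that follow. */
int sum(int n, ...)
{
int sum = 0;
va_list it; /* holds information about the variadic argument list */
va_start(it, n); /* start variadic argument processing */
while (n-- > 0)
sum += va_arg(it, int); /* fetch and add the next variadic argument */
va_end(it); /* end variadic argument processing */
return sum;
}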
int main(void)
{
printf("%d\n", sum(5, 1, 2, 3, 4, 5)); /* prints 15 */
printf("%d\n", sum(10, 5, 9, 2, 5, 111, 6666, 42, 1, 43, -6218)); /* prints 666 */
return 0;
}
/* First argument specifies the number of parameters; the remainder are also int */
extern int sum(int n, ...);
Sometimes it's more robust to add an explicit terminator, exemplified by the POSIX execlp() function. Here's
another function to calculate the sum of a series of double numbers:
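#include <stdarg.h>
#include <stdio.h>
#include <math.h>
/* Sums the arguments up to, but not including, the terminating NAN */
double sum(double x, ...)
{
double sum = 0;
va_list va;
va_start(va, x);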
for (; !isnan(x); x = va_arg(va, double)) {
sum += x;
}
va_end(va);
return sum;
}
errmsg.h
#include <stdarg.h>
#include <stdnoreturn.h> // C11
#endif
This is a bare-bones example; such packages can be much more elaborate. Normally, programmers will use either
errmsg() or warnmsg(), which themselves use verrmsg() internally. If someone comes up with a need to do more,
though, then the exposed verrmsg() function will be useful. You could avoid exposing it until you have a need for it
(YAGNI — you aren't gonna need it), but the need will arise eventually (you are gonna need it — YAGNI).
errmsg.c
This code only needs to forward the variadic arguments to the vfprintf() function for outputting to standard
error. It also reports the system error message corresponding to the system error number (errno) passed to the
functions.
#include "errmsg.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void
verrmsg(int errnum, const char *fmt, va_list ap)
{
if (fmt)
vfprintf(stderr, fmt, ap);
if (errnum != 0)
fprintf(stderr, ": %s", strerror(errnum));
putc('\n', stderr);
}
void
errmsg(int exitcode, int errnum, const char *fmt, ...)
{
va_list ap;
va_start(ap, fmt);
verrmsg(errnum, fmt, ap);
va_end(ap);
exit(exitcode);
}
void
warnmsg(int errnum, const char *fmt, ...)
{
va_list ap;
va_start(ap, fmt);
verrmsg(errnum, fmt, ap);
va_end(ap);
}
Using errmsg.h
#include "errmsg.h"
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
If either the open() or read() system call fails, the error is written to standard error and the program exits with
exit code 1. If the close() system call fails, the error is merely printed as a warning message, and the program
continues.
If you are using GCC (the GNU C Compiler, which is part of the GNU Compiler Collection), or using Clang, then you
can have the compiler check that the arguments you pass to the error message functions match what printf()
expects. Since not all compilers support the extension, it needs to be compiled conditionally, which is a little bit
fiddly. However, the protection it gives is worth the effort.
First, we need to know how to detect that the compiler is GCC or Clang emulating GCC. The answer is that GCC
defines __GNUC__ to indicate that.
See common function attributes for information about the attributes — specifically the format attribute.
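As a minimal sketch of the conditional attribute, the listing below applies it to a small variadic wrapper around vsnprintf(). The function name format_into is hypothetical; the PRINTFLIKE macro follows the pattern described in this section.

```c
#include <stdarg.h>
#include <stdio.h>

#if defined(__GNUC__)
#define PRINTFLIKE(n, m) __attribute__((format(printf, n, m)))
#else
#define PRINTFLIKE(n, m) /* no format checking available */
#endif

/* Argument 3 is the format string; the values to check start at argument 4. */
int format_into(char *buf, size_t size, const char *fmt, ...) PRINTFLIKE(3, 4);

int format_into(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);   /* forward the argument list */
    va_end(ap);
    return n;
}
```

With the attribute in place, GCC and Clang can warn about a call such as format_into(buf, sizeof buf, "%d", "text") when format warnings are enabled (for example through -Wall); other compilers simply see an ordinary declaration.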
Rewritten errmsg.h
#ifndef ERRMSG_H_INCLUDED
#define ERRMSG_H_INCLUDED
#include <stdarg.h>
#include <stdnoreturn.h> // C11
#if !defined(PRINTFLIKE)
#if defined(__GNUC__)
#define PRINTFLIKE(n,m) __attribute__((format(printf,n,m)))
#else
#define PRINTFLIKE(n,m) /* If only */
#endif
The example below shows a function that wraps the standard printf() function, only allowing for the use of
variadic arguments of the type char, int and double (in decimal floating point format). Here, as with printf(), the
first argument to the wrapping function is the format string. As the format string is parsed, the function is able to
determine whether another variadic argument is expected and what its type should be.
#include <stdio.h>
#include <stdarg.h>
if (*format == '%')
{
++format;
switch(*format)
{
case 'c' :
case 'f' :
f = printf("%f", va_arg(ap, double)); /* print next variadic argument */
break;
default :
f = -1; /* invalid format specifier */
break;
}
}
else
{
f = printf("%c", *format); /* print any other characters */
}
return printed;
}
An assertion is a predicate that the presented condition must be true at the moment the assertion is encountered
by the software. Most common are simple assertions, which are validated at execution time. However, static
assertions are checked at compile time.
#include <stdio.h>
/* Uncomment to disable `assert()` */
/* #define NDEBUG */
#include <assert.h>
int main(void)
{
int x = -1;
assert(x >= 0);
x = -1
It's good practice to define NDEBUG globally, so that you can easily compile your code with all assertions either on or
off. An easy way to do this is define NDEBUG as an option to the compiler, or define it in a shared configuration
header (e.g. config.h).
Static assertions are used to check if a condition is true when the code is compiled. If it isn't, the compiler is
required to issue an error message and stop the compiling process.
#include <assert.h>
enum {N = 5};
_Static_assert(N == 5, "N does not equal 5");
static_assert(N > 10, "N is not greater than 10"); /* compiler error */
Version = C99
Prior to C11, there was no direct support for static assertions. However, in C99, static assertions could be emulated
with macros that would trigger a compilation failure if the compile time condition was false. Unlike _Static_assert,
the second parameter needs to be a proper token name so that a variable name can be created with it. If the
assertion fails, the variable name is seen in the compiler error, since that variable was used in a syntactically
incorrect array declaration.
enum { N = 5 };
STATIC_ASSERT(N == 5, N_must_equal_5);
STATIC_ASSERT(N > 5, N_must_be_greater_than_5); /* compile error */
Before C99, you could not declare variables at arbitrary locations in a block, so you would have to be extremely
cautious about using this macro, ensuring that it only appears where a variable declaration would be valid.
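One possible way to write such a macro is sketched below; it is an assumption about the general technique, not necessarily the exact definition used for the example above. The helper macros STATIC_ASSERT_JOIN and STATIC_ASSERT_JOIN2, the constant BUFFER_SIZE and the function buffer_size are hypothetical.

```c
/* Two-level joining so that __LINE__ is expanded before pasting. */
#define STATIC_ASSERT_JOIN2(a, b) a##_line_##b
#define STATIC_ASSERT_JOIN(a, b) STATIC_ASSERT_JOIN2(a, b)

/* Declares an array of size 1 when cond is true and of size -1 (an error)
 * when it is false; the name shows up in the compiler diagnostic. */
#define STATIC_ASSERT(cond, name) \
    extern char STATIC_ASSERT_JOIN(name, __LINE__)[(cond) ? 1 : -1]

enum { BUFFER_SIZE = 16 };

STATIC_ASSERT(BUFFER_SIZE >= 8, buffer_size_at_least_8);

int buffer_size(void)
{
    return BUFFER_SIZE;
}
```

Since the array is only declared with extern and never defined or used, a passing assertion adds nothing to the program.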
However, you can use logical AND (&&) to give an error message as well
Now, if the assertion fails, an error message will read something like this
This works because a string literal always evaluates to non-zero (true). Adding && 1 to a Boolean
expression has no effect. Thus, adding && "error message" has no effect either, except that the compiler will
display the entire expression that failed.
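A minimal sketch of the idiom follows; the function name checked_half is hypothetical.

```c
#include <assert.h>

int checked_half(int x)
{
    /* The string literal is always true, so only x >= 0 decides the outcome;
     * on failure the message appears as part of the printed expression. */
    assert(x >= 0 && "x must be non-negative");
    return x / 2;
}
```

A call with a negative argument aborts the program (unless NDEBUG is defined), and the diagnostic then includes the text of the whole expression, message included.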
switch (color) {
case COLOR_RED:
case COLOR_GREEN:
case COLOR_BLUE:
break;
default:
assert(0);
}
Whenever the argument of the assert() macro evaluates false, the macro will write diagnostic information to the
standard error stream and then abort the program. This information includes the file and line number of the
assert() statement and can be very helpful in debugging. Asserts can be disabled by defining the macro NDEBUG.
Another way to terminate a program when an error occurs is with the standard library functions exit, quick_exit
or abort. exit and quick_exit take an argument that can be passed back to your environment. abort() (and thus
assert) can be a really severe termination of your program, and certain cleanups that would otherwise be
performed at the end of the execution may not be performed.
The primary advantage of assert() is that it automatically prints debugging information. Calling abort() has the
advantage that it cannot be disabled like an assert, but it may not cause any debugging information to be displayed.
In some situations, using both constructs together may be beneficial:
When asserts are enabled, the assert() call will print debug information and terminate the program. Execution
never reaches the abort() call. When asserts are disabled, the assert() call does nothing and abort() is called.
This ensures that the program always terminates for this error condition; enabling and disabling asserts only affects
whether or not debug output is printed.
You should never leave such an assert in production code, because the debug information is not helpful for end
users and because abort is generally much too severe a termination: it prevents cleanup handlers that are installed
for exit or quick_exit from running.
#include <stdio.h>
/* Uncomment to disable `assert()` */
/* #define NDEBUG */
#include <assert.h>
/* Precondition: */
/* NULL is an invalid vector */
assert (a != NULL);
/* Number of dimensions can not be negative.*/
assert (count >= 0);
/* Calculation */
for (i = 0; i < count; ++i)
{
result = result + (a[i] * a[i]);
}
/* Postcondition: */
/* Resulting length can not be negative. */
assert (result >= 0);
return result;
}
#define COUNT 3
int main(void)
{
const int i = 1;
int j = 1;
double k = 1.0;
printf("i is %s\n", is_const_int(i));
printf("j is %s\n", is_const_int(j));
printf("k is %s\n", is_const_int(k));
}
Output:
i is a const int
j is a non-const int
k is of other type
i is a non-const int
j is a non-const int
k is of other type
This is because all type qualifiers are dropped for the evaluation of the controlling expression of a _Generic
primary expression.
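The dropping of qualifiers can be observed with a small sketch. It assumes a compiler that applies lvalue conversion to the controlling expression, as current GCC and Clang releases do; the macro TYPE_NAME and the three functions are hypothetical.

```c
#define TYPE_NAME(x) _Generic((x), \
        int: "int",                \
        double: "double",          \
        default: "other")

const char *type_of_const_int(void)
{
    const int i = 1;
    return TYPE_NAME(i);       /* const is dropped, so the int branch matches */
}

const char *type_of_double(void)
{
    double k = 1.0;
    return TYPE_NAME(k);
}

const char *type_of_string(void)
{
    return TYPE_NAME("text");  /* the array decays to char *, so default matches */
}
```

There is no way to select on const int directly with this form; a const int: association in the list would never be chosen for an expression that is an lvalue of that type.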
Here, the controlling expression (X)+(Y) is only inspected according to its type and not evaluated. The usual
conversions for arithmetic operands are performed to determine the selected type.
For more complex situations, a selection can be made based on more than one argument to the operator, by nesting
them together.
This example selects between four externally implemented functions, that take combinations of two int and/or
string arguments, and return their sum.
#define AddStr(y) \
_Generic((y), int: AddStrInt, \
char*: AddStrStr, \
const char*: AddStrStr )
#define AddInt(y) \
_Generic((y), int: AddIntInt, \
char*: AddIntStr, \
const char*: AddIntStr )
#define Add(x, y) \
_Generic((x) , int: AddInt(y) , \
char*: AddStr(y) , \
const char*: AddStr(y)) \
((x), (y))
int c = 1;
const char d[] = "0";
result = Add( d , ++c );
}
Even though it appears as if argument y is evaluated more than once, it isn't. Both arguments are evaluated only
once, at the end of macro Add: ( x , y ), just like in an ordinary function call.
int main(void) {
print(42);
print(3.14);
print("hello, world");
}
Output:
int: 42
double: 3.14
unknown argument
Note that if the type is neither int nor double, a warning would be generated. To eliminate the warning, you can
add that type to the print(X) macro.
/* define X to use */
#define X(val) printf("X(%d) made this print\n", val);
X_123
#undef X
/* good practice to undef X to facilitate reuse later on */
This example will result in the preprocessor generating the following code:
As always with X macros, the master macro represents a list of items whose significance is specific to that macro. In
this variation, such a macro might be defined like so:
One might then generate code to print the item names like so:
In contrast to standard X macros, where the "X" name is a built-in characteristic of the master macro, with this style
Next you can use the enumerated value in your code and easily print its identifier using :
printf("%s\n", enum2string(MyEnum_item2));
Here we use X-macros to declare an enum containing 4 commands and a map of their names as strings
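A minimal sketch of this technique is shown below. The command names, the COMMAND_LIST macro and the command_name function are hypothetical stand-ins for the four commands mentioned above.

```c
/* The master list: each entry is passed to whatever macro X is supplied. */
#define COMMAND_LIST(X) \
    X(CMD_START)        \
    X(CMD_STOP)         \
    X(CMD_PAUSE)        \
    X(CMD_RESUME)

#define AS_ENUMERATOR(name) name,
#define AS_STRING(name) #name,

/* Expands to: enum Command { CMD_START, CMD_STOP, CMD_PAUSE, CMD_RESUME, CMD_COUNT }; */
enum Command { COMMAND_LIST(AS_ENUMERATOR) CMD_COUNT };

/* Expands to an array of the same names as string literals, in the same order. */
static const char *const command_names[] = { COMMAND_LIST(AS_STRING) };

const char *command_name(enum Command cmd)
{
    return ((unsigned)cmd < CMD_COUNT) ? command_names[cmd] : "unknown";
}
```

Adding a fifth command means adding one line to COMMAND_LIST; the enum and the name table stay in step automatically.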
Similarly, we can generate a jump table to call functions by the enum value.
This requires all functions to have the same signature. If they take no arguments and return an int, we would put
this in a header with the enum definition:
All of the following can be in different compilation units assuming the part above is included as a header:
An example of this technique being used in real code is for GPU command dispatching in Chromium.
// a normal variable, effective type uint32_t, and this type never changes
uint32_t a = 0.0;
Observe that for the latter, it was not necessary that we even have an uint32_t* pointer to that object. The fact
that we have copied another uint32_t object is sufficient.
float fval = 4;
float eval = 77;
fun(&eval, &fval);
is 4 equal to 4?
is printed. If we pass the same pointer, the program will still do the right thing and print
is 4 equal to 22?
This can turn out to be inefficient, if we know by some outside information that e and f will never point to the same
data object. We can reflect that knowledge by adding restrict qualifiers to the pointer parameters:
Then the compiler may always suppose that e and f point to different objects.
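A sketch of a restrict-qualified function in the spirit of the example is given below. The function name store_and_reload is hypothetical, and it returns the difference of the two loads instead of printing them.

```c
/* The caller promises that e and f never point to the same object. */
float store_and_reload(float *restrict e, float *restrict f)
{
    float a = *f;
    *e = 22;
    float b = *f;   /* the compiler may reuse a here instead of reloading *f */
    return b - a;
}
```

Calling this function with two pointers to the same object would break the promise made by restrict and the behavior would be undefined, so it must only be called with distinct objects.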
#include <inttypes.h>
#include <stdio.h>
int main(void) {
uint32_t a = 57;
// conversion from incompatible types needs a cast !
unsigned char* ap = (unsigned char*)&a;
for (size_t i = 0; i < sizeof a; ++i) {
/* set each byte of a to 42 */
ap[i] = 42;
}
printf("a now has value %" PRIu32 "\n", a);
}
The access is made to the individual bytes seen with type unsigned char so each modification is well
defined.
The two views to the object, through a and through *ap, alias, but since ap is a pointer to a character type, the
strict aliasing rule does not apply. Thus the compiler has to assume that the value of a may have been
changed in the for loop. The modified value of a must be constructed from the bytes that have been
changed.
The type of a, uint32_t has no padding bits. All its bits of the representation count for the value, here
This is undefined because it violates the "effective type" rule: no data object that has an effective type may be
accessed through another type that is not a character type. Since the other type here is int, this is not allowed.
Even if the alignment and pointer sizes were known to fit, that would not exempt the code from this rule; the
behavior would still be undefined.
This means in particular that there is no way in standard C to reserve a buffer object of character type that can be
used through pointers with different types, as you would use a buffer that was received by malloc or similar
function.
A correct way to achieve the same goal as in the above example would be to use a union.
static bufType a = { .c = { 0 } };
int* b = a.i;
*b = 2;
_Thread_local bufType a = { .c = { 0 } };
int* b = a.i;
*b = 3;
}
Here, the union ensures that the compiler knows from the start that the buffer could be accessed through different
u and f have different base types, and thus the compiler can assume that they point to different objects. There is no
possibility that *f could have changed between the two initializations of a and b, and so the compiler may optimize
the code to something equivalent to
That is, the second load operation of *f can be optimized out completely.
float fval = 4;
uint32_t uval = 77;
fun(&uval, &fval);
4 should equal 4
is printed. But if we cheat and pass the same pointer, after converting it,
float fval = 4;
uint32_t* up = (uint32_t*)&fval;
fun(up, &fval);
we violate the strict aliasing rule. Then the behavior becomes undefined. The output could be as above, if the
compiler had optimized the second access, or something completely different, and so your program ends up in a
completely unreliable state.
In general, the exact way to invoke a C compiler depends very much on the system that you are using. Here
we are using the GCC compiler, though it should be noted that many more compilers exist:
% gcc -Wall -c foo.c
% is the OS' command prompt. This tells the compiler to run the pre-processor on the file foo.c and then compile it
into the object code file foo.o. The -c option means to compile the source code file into an object file but not to
invoke the linker. This option -c is available on POSIX systems, such as Linux or macOS; other systems may use
different syntax.
If your entire program is in one source code file, you can instead do this:
% gcc -Wall foo.c -o foo
This tells the compiler to run the pre-processor on foo.c, compile it and then link it to create an executable called
foo. The -o option states that the next word on the line is the name of the binary executable file (program). If you
don't specify -o (that is, if you just type gcc foo.c), the executable will be named a.out for historical reasons.
In general the compiler takes four steps when converting a .c file into an executable:
1. pre-processing - textually expands #include directives and #define macros in your .c file
2. compilation - converts the program into assembly (you can stop the compiler at this step by adding the -S
option)
3. assembly - converts the assembly into machine code
4. linkage - links the object code to external libraries to create an executable
Note also that the name of the compiler we are using is GCC, which stands for both "GNU C compiler" and "GNU
compiler collection", depending on context. Other C compilers exist. For Unix-like operating systems, many of them
have the name cc, for "C compiler", which is often a symbolic link to some other compiler. On Linux systems, cc is
often an alias for GCC. On macOS (formerly OS X), it points to clang.
The POSIX standard currently mandates c99 as the name of a C compiler; it supports the C99 standard by
default. Earlier versions of POSIX mandated c89 as the compiler. POSIX also mandates that this compiler
understands the options -c and -o that we used above.
Note: The -Wall option present in both gcc examples tells the compiler to print warnings about questionable
constructions, which is strongly recommended. It is also a good idea to add other warning options, e.g. -Wextra.
1. Source files: These files contain function definitions, and have names which end in .c by convention. Note:
.cc and .cpp are C++ files; not C files.
e.g., foo.c
2. Header files: These files contain function prototypes and various pre-processor statements (see below).
They are used to allow source code files to access externally-defined functions. Header files end in .h by
convention.
e.g., foo.h
3. Object files: These files are produced as the output of the compiler. They consist of function definitions in
binary form, but they are not executable by themselves. Object files end in .o by convention, although on
some operating systems (e.g. Windows, MS-DOS), they often end in .obj.
e.g., foo.o foo.obj
4. Binary executables: These are produced as the output of a program called a "linker". The linker links
together a number of object files to produce a binary file which can be directly executed. Binary executables
have no special suffix on Unix operating systems, although they generally end in .exe on Windows.
e.g., foo foo.exe
5. Libraries: A library is a compiled binary but is not in itself an executable (i.e., there is no main() function
in a library). A library contains functions that may be used by more than one program. A library should ship
with header files which contain prototypes for all functions in the library; these header files should be
referenced (e.g., #include <library.h>) in any source file that uses the library. The linker then needs to be
referred to the library so the program can be successfully built. There are two types of libraries: static and
dynamic.
Static library: A static library (.a files for POSIX systems and .lib files for Windows; not to be
confused with DLL import library files, which also use the .lib extension) is statically built into the
program. Static libraries have the advantage that the program knows exactly which version of a library
is used. On the other hand, the sizes of executables are bigger as all used library functions are
included.
e.g., libfoo.a foo.lib
Dynamic library: A dynamic library (.so files for most POSIX systems, .dylib for macOS and .dll files
for Windows) is dynamically linked at runtime by the program. These are also sometimes referred to
as shared libraries because one library image can be shared by many programs. Dynamic libraries
have the advantage of taking up less disk space if more than one application is using the library. Also,
they allow library updates (bug fixes) without having to rebuild executables.
e.g., foo.so foo.dylib foo.dll
During the link process, the linker will pick up all the object modules specified on the command line, add some
system-specific startup code in front and try to resolve all external references in the object module with external
definitions in other object files (object files can be specified directly on the command line or may implicitly be added
through libraries). It will then assign load addresses for the object files, that is, it specifies where the code and data
This includes both the object files that the compiler created from your source code files as well as object files that
have been pre-compiled for you and collected into library files. These files have names which end in .a or .so, and
you normally don't need to know about them, as the linker knows where most of them are located and will link
them in automatically as needed.
Like the pre-processor, the linker is a separate program, often called ld (but Linux uses collect2, for example).
Also like the pre-processor, the linker is invoked automatically for you when you use the compiler. Thus, the normal
way of using the linker is as follows:
% gcc foo.o bar.o baz.o -o myprog
This line tells the compiler to link together three object files (foo.o, bar.o, and baz.o) into a binary executable file
named myprog. Now you have a file called myprog that you can run and which will hopefully do something cool
and/or useful.
It is possible to invoke the linker directly, but this is seldom advisable, and is typically very platform-specific. That is,
options that work on Linux won't necessarily work on Solaris, AIX, macOS, Windows, and similarly for any other
platform. If you work with GCC, you can use gcc -v to see what is executed on your behalf.
The linker also takes some arguments to modify its behavior. The following command would tell gcc to link foo.o
and bar.o, but also include the ncurses library.
(although libncurses.so could be libncurses.a, which is just an archive created with ar). Note that you should list
the libraries (either by pathname or via -lname options) after the object files. With static libraries, the order that
they are specified matters; often, with shared libraries, the order doesn't matter.
Note that on many systems, if you are using mathematical functions (from <math.h>), you need to specify -lm to
load the mathematics library — but Mac OS X and macOS Sierra do not require this. There are other libraries that
are separate libraries on Linux and other Unix systems, but not on macOS — POSIX threads, and POSIX realtime,
and networking libraries are examples. Consequently, the linking process varies between platforms.
This is all you need to know to begin compiling your own C programs. Generally, we also recommend that you use
the -Wall command-line option:
If you want the compiler to throw more warnings at you (including variables that are declared but not used,
forgetting to return a value etc.), you can use this set of options, as -Wall, despite the name, doesn't turn all of the
possible warnings on:
Note that clang has an option -Weverything which really does turn on all warnings in clang.
Preprocessor commands start with the pound sign ("#"). There are several preprocessor commands; two of the
most important are:
1. Defines:
becomes
int a = 1000000;
#define is used in this way so as to avoid having to explicitly write out some constant value in many different
places in a source code file. This is important in case you need to change the constant value later on; it's
much less bug-prone to change it once, in the #define, than to have to change it in multiple places scattered
all over the code.
Because #define just does advanced search and replace, you can also declare macros. For instance:
becomes:
// in the function:
a = x;
do {
a = a ? 1 : 0;
} while(0);
Also note here that the preprocessor would also replace comments with blanks, as explained below.
2. Includes:
#include is used to access function definitions defined outside of a source code file. For instance:
#include <stdio.h>
causes the preprocessor to paste the contents of <stdio.h> into the source code file at the location of the
#include statement before it gets compiled. #include is almost always used to include header files, which
are files which mainly contain function declarations and #define statements. In this case, we use #include in
order to be able to use functions such as printf and scanf, whose declarations are located in the file
stdio.h. C compilers do not allow you to use a function unless it has previously been declared or defined in
that file; #include statements are thus the way to re-use previously-written code in your C programs.
3. Logic operations:
variable = another_variable + 1;
if A or B were defined somewhere in the project before. If this is not the case, of course the preprocessor will
do this:
variable = another_variable * 2;
This is often used for code that runs on different systems or compiles on different compilers. Since there are
global defines that are compiler- or system-specific, you can test on those defines and let the compiler
use only the code that is guaranteed to compile on that platform.
4. Comments
The preprocessor replaces each comment in the source file with a single space. Comments are indicated by // up
to the end of the line, or a combination of opening /* and closing */ comment brackets.
An implementation of a C compiler may combine several steps together, but the resulting image must still behave
as if the above steps had occurred separately in the order listed above.
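The preprocessor features described above can be sketched in one small file. The names BIGNUM, NORMALIZE, FEATURE_A, FEATURE_B and the three helper functions are chosen for illustration only and are assumptions, not the book's original listing.

```c
/* Object-like macro: a named constant that is replaced textually. */
#define BIGNUM 1000000

/* Function-like macro wrapped in do { ... } while (0) so that it
 * behaves like a single statement wherever it is used. */
#define NORMALIZE(v) do { (v) = (v) ? 1 : 0; } while (0)

static long bignum_value(void)
{
    long a = BIGNUM;   /* after preprocessing: long a = 1000000; */
    return a;
}

static int normalize(int x)
{
    int a = x;
    NORMALIZE(a);      /* expands to: do { (a) = (a) ? 1 : 0; } while (0); */
    return a;
}

/* Conditional compilation: only one branch reaches the compiler. */
static int compute(int another_variable)
{
    int variable;
#if defined(FEATURE_A) || defined(FEATURE_B)
    variable = another_variable + 1;
#else
    variable = another_variable * 2;
#endif
    return variable;
}
```

When neither FEATURE_A nor FEATURE_B is defined, compute(5) takes the #else branch. With GCC you can inspect the expanded text by stopping after preprocessing with the -E option.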
#define mov(x,y) \
{ \
__asm__ ("l.cmov %0,%1,%2" : "=r" (x) : "r" (y), "r" (0x0000000F)); \
}
///Using
mov(state[0][1], sbox[si][sj]);
Using inline assembly instructions embedded in C code can improve the run time of a program. This is very helpful
in time-critical situations like cryptographic algorithms such as AES. For example, for a simple shift operation that is
needed in the AES algorithm, we can replace the C shift operator >> with a direct Rotate Right assembly instruction.
We can replace three shift + assign and one assign C expression with only one assembly Rotate Right operation.
__asm__ ("l.ror %0,%1,%2" : "=r" (* (unsigned int *) subkey) : "r" (w), "r" (0x10));
where AssemblerInstructions is the direct assembly code for the given processor. The volatile keyword is optional
and has no effect as gcc does not optimize code within a basic asm statement. AssemblerInstructions can contain
multiple assembly instructions. A basic asm statement is used if you have an asm routine that must exist outside of
a C function. The following example is from the GCC manual:
where AssemblerTemplate is the template for the assembler instruction, OutputOperands are any C variables that
can be modified by the assembly code, InputOperands are any C variables used as input parameters, Clobbers are
a list of registers that are modified by the assembly code, and GotoLabels are any goto statement labels that may
be used in the assembly code.
The extended format is used within C functions and is the more typical usage of inline assembly. Below is an
example from the Linux kernel for byte swapping 16-bit and 32-bit numbers for an ARM processor:
#endif
Each asm section uses the variable x as its input and output parameter. The C function then returns the
manipulated result.
With the extended asm format, gcc may optimize the assembly instructions in an asm block following the same
rules it uses for optimizing C code. If you want your asm section to remain untouched, use the volatile keyword
for the asm section.
/* The parameter name, apple, has function prototype scope. These names
are not significant outside the prototype itself. This is demonstrated
below. */
int main(void)
{
int orange = 5;
orange = test_function(orange);
printf("%d\r\n", orange); //orange = 6
return 0;
}
Note that you get puzzling error messages if you introduce a type name in a prototype:
struct whatever
{
int a;
// ...
};
No different entities with the same identifier can have the same scope, but scopes may overlap. In case of
overlapping scopes the only visible one is the one declared in the innermost scope.
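A small sketch of overlapping scopes, using a hypothetical function shadow_demo: the inner declaration of foo hides the outer one only until the end of the inner block.

```c
/* Hypothetical demo: two distinct objects named foo in nested scopes. */
static int shadow_demo(int *inner_seen)
{
    int foo = 3;             /* scope: the whole function body */
    {
        int foo = 5;         /* scope: this inner block; hides the outer foo */
        *inner_seen = foo;   /* refers to the inner foo */
    }
    return foo;              /* the outer foo is visible again and unchanged */
}
```

Compilers such as GCC and clang can report this kind of hiding with the -Wshadow option, since it is often unintentional.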
#include <stdio.h>
int main(void)
{
int foo = 3; // foo has scope main function block
printf("%d\n", foo); // 3
test(5);
printf("%d\n", foo); // 3
return 0;
} // end of scope for main:foo
void test_function(void)
{
foo += 2;
}
int main(void)
{
foo = 1;
test_function();
printf("%d\r\n", foo); //foo = 3;
return 0;
}
#include <stdio.h>
INSIDE may seem defined inside the if block, as is the case for i, whose scope is the block, but it is not. It is visible
in the whole function, as the instruction goto INSIDE; illustrates. Thus there can't be two labels with the same
identifier in a single function.
A possible usage is the following pattern to realize correct complex cleanups of allocated resources:
#include <stdlib.h>
#include <stdio.h>
void a_function(void) {
double* a = malloc(sizeof(double[34]));
if (!a) {
fprintf(stderr,"can't allocate\n");
return; /* No point in freeing a if it is null */
}
FILE* b = fopen("some_file","r");
if (!b) {
fprintf(stderr,"can't open\n");
goto CLEANUP1; /* Free a; no point in closing b */
}
/* do something reasonable */
if (error) {
fprintf(stderr,"something's wrong\n");
goto CLEANUP2; /* Free a and close b to prevent leaks */
}
/* do yet something else */
CLEANUP2:
fclose(b);
CLEANUP1:
free(a);
}
Labels such as CLEANUP1 and CLEANUP2 are special identifiers that behave differently from all other identifiers. They
are visible from everywhere inside the function, even in places that are executed before the labeled statement, or
even in places that could never be reached if none of the goto is executed. Labels are often written in lower-case
rather than upper-case.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one
more than the maximum value that can be represented in the new type until the value is in the range of
the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is
implementation-defined or an implementation-defined signal is raised.
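The unsigned rule above amounts to arithmetic modulo 2^N, where N is the width of the new type. The sketch below uses a hypothetical helper to_u8 and the exact-width type uint8_t (N = 8), so values wrap modulo 256.

```c
#include <stdint.h>

/* Hypothetical helper: converts an int to an 8-bit unsigned value.
 * The result is well defined for every input: value modulo 256. */
static uint8_t to_u8(int value)
{
    return (uint8_t)value;
}
```

For example, -1 becomes 255 (one multiple of 256 is added) and 300 becomes 44 (one multiple of 256 is subtracted). No such guarantee exists when the target type is signed.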
Usually you should not truncate a wide signed type to a narrower signed type, because obviously the values can't fit
and there is no clear meaning that this should have. The C standard cited above defines these cases to be
"implementation-defined", that is, they are not portable.
#include <stdio.h>
#include <stdint.h>
int main(void) {
return 0;
}
#include <stdio.h>
struct struct_b {
int a;
int b;
} data_b;
/*
* Explicit ptr conversion for other types
*
* Note that here although they have identical definitions,
* the types are not compatible, and that this call is
* erroneous and leads to undefined behavior on execution.
*/
func_struct_b((struct struct_b*)&data_a);
/* My output shows: */
/* func_charp Address of ptr is 0x601030 */
/* func_voidp Address of ptr is 0x601030 */
/* func_struct_b Address of ptr is 0x601030 */
return 0;
}
The compiler will not optimize anything that has to do with the volatile variable.
int main(void)
{
...
while (!quit) {
// Do something that does not modify the quit variable
}
...
}
void interrupt_handler(void)
{
quit = true;
}
The compiler is allowed to notice the while loop does not modify the quit variable and convert the loop to an
endless while (true) loop. Even if the quit variable is set in the signal handler for SIGINT and SIGTERM, the
compiler does not know that.
Declaring quit as volatile will tell the compiler to not optimize the loop and the problem will be solved.
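A minimal sketch of that fix, assuming a flag named quit as in the text and two hypothetical functions, on_signal and run_until_quit. The flag has type volatile sig_atomic_t, which is the type the C standard designates for objects written from a signal handler. To keep the sketch deterministic, the loop raises SIGINT itself instead of waiting for the user.

```c
#include <signal.h>

/* volatile: every test of the flag must really read the object. */
static volatile sig_atomic_t quit = 0;

/* Hypothetical handler: only sets the flag. */
static void on_signal(int signum)
{
    (void)signum;
    quit = 1;
}

/* Hypothetical loop: returns the number of iterations that ran
 * before the flag was observed. */
static int run_until_quit(void)
{
    int iterations = 0;

    signal(SIGINT, on_signal);
    while (!quit) {
        ++iterations;
        if (iterations == 3)
            raise(SIGINT);   /* stands in for the user pressing Ctrl-C */
    }
    return iterations;
}
```

Because raise does not return until the handler has finished, the loop condition sees the updated flag on the very next test.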
The same problem happens when accessing hardware, as we see in this example:
The optimizer may read the variable's value only once, assuming there is no need to reread it since the value will
always be the same. So we end up with an infinite loop. To force the compiler to do what we want, we modify the
declaration to:
The const qualification only means that we don't have the right to change the data. It doesn't mean that the value
cannot change behind our back.
During the execution of the other calls *a might have changed, and so this function may return either false or
true.
Warning
const int a = 0;
int *a_ptr = (int*)&a; /* This conversion must be explicitly done with a cast */
*a_ptr += 10; /* This has undefined behavior */
But doing so is an error that leads to undefined behavior. The difficulty here is that this may behave as expected in
simple examples such as this one, but then go wrong when the code grows.
In the C standard, typedef is classified as a 'storage class' for convenience; it occurs syntactically where storage
classes such as static or extern could appear.
Person person;
Compared to the traditional way of declaring structs, programmers wouldn't need to have struct every time they
declare an instance of that struct.
Note that the name Person (as opposed to struct Person) is not defined until the final semicolon. Thus for linked
lists and tree structures which need to contain a pointer to the same structure type, you must use either:
or:
struct Person {
char name[32];
int age;
Person *next;
};
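One way to make the last form compile is to introduce the typedef name before the structure body, as sketched below. The alternative is to write struct Person *next; inside the structure. The helper list_length is hypothetical and only shows the pointer member in use.

```c
#include <stddef.h>

/* Forward typedef: Person now names the (still incomplete) struct type. */
typedef struct Person Person;

struct Person {
    char    name[32];
    int     age;
    Person *next;   /* allowed: the typedef name is already declared */
};

/* Hypothetical helper: counts the nodes of a NULL-terminated list. */
static size_t list_length(const Person *head)
{
    size_t count = 0;
    for (const Person *p = head; p != NULL; p = p->next)
        ++count;
    return count;
}
```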
union Float
{
float f;
char b[sizeof(float)];
};
A structure similar to this can be used to analyze the bytes that make up a float value.
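As a rough sketch of such an analysis, the hypothetical helpers below store a float in the union and read it back byte by byte. The byte array is declared as unsigned char here (a small change from the declaration above) so that each byte prints as a value between 0 and 255. The byte order and the bit patterns depend on the platform.

```c
#include <stddef.h>
#include <stdio.h>

union Float {
    float         f;
    unsigned char b[sizeof(float)];
};

/* Hypothetical helper: returns one byte of the float's representation. */
static unsigned char float_byte(float value, size_t index)
{
    union Float u;
    u.f = value;
    return u.b[index];
}

/* Hypothetical helper: prints all bytes in memory order, in hex. */
static void print_float_bytes(float value)
{
    for (size_t i = 0; i < sizeof(float); ++i)
        printf("%02x ", (unsigned)float_byte(value, i));
    printf("\n");
}
```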
#include <stdio.h>
void print_to_n(int n)
{
for (int i = 1; i <= n; ++i)
printf("%d\n", i);
}
void print_n(int n)
{
printf("%d\n", n);
}
Now we can use a typedef to create a named function pointer type called printer_t:
typedef void (*printer_t)(int);
This creates a type, named printer_t for a pointer to a function that takes a single int argument and returns
nothing, which matches the signature of the functions we have above. To use it we create a variable of the created
type and assign it a pointer to one of the functions in question:
printer_t p = &print_to_n;
void (*p)(int) = &print_to_n; // This would be required without the type
Thus the typedef allows a simpler syntax when dealing with function pointers. This becomes more apparent when
function pointers are used in more complex situations, such as arguments to functions.
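The following sketch puts these pieces together. The type printer_t and the two printing functions follow the text above. The functions run_printer and choose_printer are hypothetical additions that show the typedef used as a parameter type and as a return type.

```c
#include <stdio.h>

typedef void (*printer_t)(int);

static void print_to_n(int n)
{
    for (int i = 1; i <= n; ++i)
        printf("%d\n", i);
}

static void print_n(int n)
{
    printf("%d\n", n);
}

/* Without the typedef: static void run_printer(void (*p)(int), int n) */
static void run_printer(printer_t p, int n)
{
    p(n);   /* calling through the pointer; (*p)(n) is equivalent */
}

/* Without the typedef: static void (*choose_printer(int count_up))(int) */
static printer_t choose_printer(int count_up)
{
    return count_up ? print_to_n : print_n;
}
```

A call such as run_printer(choose_printer(1), 3) then prints the numbers 1 to 3 on separate lines.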
If you are using a function that takes a function pointer as a parameter without a function pointer type defined the
function definition would be,
Likewise functions can return function pointers and again, the use of a typedef can make the syntax simpler when
doing so.
That's a function that takes two arguments — an int and a pointer to a function which takes an int as an argument
and returns nothing — and which returns a pointer to function like its second argument.
On the whole, this is easier to understand (even though the C standard did not elect to define a type to do the job).
The signal function takes two arguments, an int and a SigCatcher, and it returns a SigCatcher — where a
SigCatcher is a pointer to a function that takes an int argument and returns nothing.
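A brief sketch of that reading, using the SigCatcher name from the text. The functions catcher and install are hypothetical. install has the same shape as the standard signal function and simply forwards to it.

```c
#include <signal.h>

/* A SigCatcher is a pointer to a function taking int and returning nothing. */
typedef void (*SigCatcher)(int);

/* For comparison, the standard declaration without a typedef reads:
 *     void (*signal(int sig, void (*func)(int)))(int);
 */

static volatile sig_atomic_t caught = 0;

/* Hypothetical handler: records which signal arrived. */
static void catcher(int signum)
{
    caught = signum;
}

/* Hypothetical wrapper: takes an int and a SigCatcher, returns a SigCatcher
 * (the previously installed handler, or SIG_ERR on failure). */
static SigCatcher install(int signum, SigCatcher handler)
{
    return signal(signum, handler);
}
```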
Although using typedef names for pointer to function types makes life easier, it can also lead to confusion for
others who will maintain your code later on, so use with caution and proper documentation. See also Function
Pointers.
Instead of:
/* write once */
typedef long long ll;
typedef struct mystructure mystruct;
This reduces the amount of typing needed if the type is used many times in the program.
Improving portability
The attributes of data types vary across different architectures. For example, an int may be a 2-byte type in one
implementation and a 4-byte type in another. Suppose a program needs to use a 4-byte type to run correctly.
In one implementation, let the size of int be 2 bytes and that of long be 4 bytes. In another, let the size of int be 4
bytes and that of long be 8 bytes. If the program is written using the second implementation,
Then, only the typedef statement would need to be changed each time, instead of examining the whole program.
Version ≥ C99
The <stdint.h> header and the related <inttypes.h> header define standard type names (using typedef) for
integers of various sizes, and these names are often the best choice in modern code that needs fixed size integers.
For example, uint8_t is an unsigned 8-bit integer type; int64_t is a signed 64-bit integer type. The type uintptr_t
is an unsigned integer type big enough to hold any pointer to object. The exact-width types and uintptr_t are
theoretically optional, but it is rare for them not to be available. There are variants like uint_least16_t (the smallest
unsigned integer type with at least 16 bits) and int_fast32_t (the fastest signed integer type with at least 32 bits). Also,
intmax_t and uintmax_t are the largest integer types supported by the implementation. The least-width and fast types
for widths 8, 16, 32 and 64, as well as intmax_t and uintmax_t, are mandatory.
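A short sketch of how a project might combine both ideas: a single typedef built on a fixed-width type. The name sample_t and the two helpers are hypothetical. If the required width ever changes, only the typedef line and the matching format macro need to be edited.

```c
#include <inttypes.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical application type: always exactly 32 bits wide. */
typedef int32_t sample_t;

/* Hypothetical helper: width of sample_t in bits. */
static int sample_bits(void)
{
    return (int)(sizeof(sample_t) * CHAR_BIT);
}

/* <inttypes.h> supplies the printf format macro matching int32_t. */
static void print_sample(sample_t s)
{
    printf("%" PRId32 "\n", s);
}
```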
If a set of data has a particular purpose, one can use typedef to give it a meaningful name. Moreover, if the
property of the data changes such that the base type must change, only the typedef statement would have to be
changed, instead of examining the whole program.
Since all objects, not living in global scope or being declared static, have automatic storage duration by default
when defined, this keyword is mostly of historical interest and should not be used:
int foo(void)
{
/* An integer with automatic storage duration. */
auto int i = 3;
/* Same */
int j = 5;
return 0;
} /* The values of i and j are no longer able to be used. */
The only property that is definitively different for all objects that are declared with register is that they cannot
have their address computed. Thereby register can be a good tool to ensure certain optimizations:
is an object that can never alias because no code can pass its address to another function where it might be
changed unexpectedly.
cannot decay into a pointer to its first element (i.e. array turning into &array[0]). This means that the elements of
such an array cannot be accessed and the array itself cannot be passed to a function.
In fact, the only legal usage of an array declared with a register storage class is the sizeof operator; any other
operator would require the address of the first element of the array. For that reason, arrays generally should not be
declared with the register keyword since it makes them useless for anything other than size computation of the
entire array, which can be done just as easily without the register keyword.
The register storage class is more appropriate for variables that are defined inside a block and are accessed with
high frequency. For example,
/* Same; static is attached to the function type of f, not the return type int. */
static int f(int n);
2. To save data for use with the next call of a function (scope=block):
void foo()
{
static int a = 0; /* has static storage duration and its lifetime is the
* entire execution of the program; initialized to 0 on
* first function call */
int b = 0; /* b has block scope and has automatic storage duration and
* only "exists" within function */
a += 10;
b += 10;
printf("a = %d, b = %d\n", a, b);
}
int main(void)
{
int i;
for (i = 0; i < 5; i++)
{
foo();
}
return 0;
}
Static variables retain their value even when called from multiple different threads.
3. Used in function parameters to denote an array is expected to have a constant minimum number of
elements and a non-null parameter:
The required number of items (or even a non-null pointer) is not necessarily checked by the compiler, and
compilers are not required to notify you in any way if you don't have enough elements. If a programmer
passes fewer than 512 elements or a null pointer, undefined behavior is the result. Since it is impossible to
enforce this, extra care must be used when passing a value for that parameter to such a function.
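A minimal sketch of this syntax, with a hypothetical function sum_first_four. A bound of 4 is used instead of 512 to keep the example short.

```c
#include <stddef.h>

/* The caller promises a non-null pointer to at least 4 ints.
 * The compiler may rely on that promise but need not check it. */
static int sum_first_four(const int values[static 4])
{
    int total = 0;
    for (size_t i = 0; i < 4; ++i)
        total += values[i];
    return total;
}
```

Passing an array of five elements is fine because the bound is only a minimum. Passing NULL or a shorter array is undefined behavior, although some compilers warn when they can detect it at the call site.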
/* NodeRef is a type used for pointers to a structure type with the tag "node" */
typedef struct node *NodeRef;
/* SigHandler is the function pointer type that gets passed to the signal function. */
typedef void (*SigHandler)(int);
While not technically a storage class, a compiler will treat it as one since none of the other storage classes are
allowed if the typedef keyword is used.
The typedefs are important and should not be substituted with #define macros.
However,
/* file2.c */
#include <stdio.h>
int main(void)
{
/* `extern` keyword refers to external definition of `foo`. */
extern int foo;
printf("%d\n", foo);
return 0;
}
Version ≥ C99
Things get slightly more interesting with the introduction of the inline keyword in C99:
/* Should usually be placed in a header file such that all users see the definition */
/* Hints to the compiler that the function `bar` might be inlined */
/* and suppresses the generation of an external symbol, unless stated otherwise. */
inline void bar(int drink)
{
printf("You ordered drink no.%d\n", drink);
}
This was a new storage specifier introduced in C11 along with multi-threading. This isn't available in earlier C
standards.
Denotes thread storage duration. A variable declared with _Thread_local storage specifier denotes that the object is
local to that thread and its lifetime is the entire execution of the thread in which it's created. It can also appear along
with static or extern.
#include <threads.h>
#include <stdio.h>
#define SIZE 5
return 0;
}
int main(void)
/* create 5 threads. */
for(int i = 0; i < SIZE; i++) {
thrd_create(&id[i], thread_func, &arr[i]);
}
/**
* This is a function declaration.
* It tells the compiler that the function exists somewhere.
*/
void foo(int id, char *name);
#endif /* FOO_DOT_H */
foo.c
#include "foo.h" /* Always include the header file that declares something
* in the C file that defines it. This makes sure that the
* declaration and definition are always in-sync. Put this
* header first in foo.c to ensure the header is self-contained.
*/
#include <stdio.h>
/**
* This is the function definition.
* It is the actual body of the function which was declared elsewhere.
*/
void foo(int id, char *name)
{
fprintf(stderr, "foo(%d, \"%s\");\n", id, name);
/* This will print how foo was called to stderr - standard error.
* e.g., foo(42, "Hi!") will print `foo(42, "Hi!")`
*/
}
main.c
#include "foo.h"
int main(void)
{
foo(42, "bar");
return 0;
}
First, we compile both foo.c and main.c to object files. Here we use the gcc compiler; your compiler may have a
different name and need other options.
global.h
/**
* This tells the compiler that g_myglobal exists somewhere.
* Without "extern", this would create a new variable named
* g_myglobal in _every file_ that included it. Don't miss this!
*/
extern int g_myglobal; /* _Declare_ g_myglobal, that is promise it will be _defined_ by
* some module. */
#endif /* GLOBAL_DOT_H */
global.c
#include "global.h" /* Always include the header file that declares something
* in the C file that defines it. This makes sure that the
* declaration and definition are always in-sync.
*/
main.c
#include "global.h"
int main(void)
{
g_myglobal = 42;
return 0;
}
See also How do I use extern to share variables between source files?
The above declaration declares a single identifier named a which refers to an object of type int.
Basically, it works like this: first you write a type, then you write one or more expressions
separated by commas (,) (which will not be evaluated at this point, and which should otherwise be referred
to as declarators in this context). In writing such expressions, you are allowed to apply only the indirection (*),
function call (( )) or subscript (or array indexing - [ ]) operators onto some identifier (you can also not use any
operators at all). The identifier used is not required to be visible in the current scope. Some examples:
Note that none of the above identifiers were visible prior to this declaration and so the expressions used would not be
valid before it.
After each such expression, the identifier used in it is introduced into the current scope. (If the identifier has
linkage assigned to it, it may also be re-declared with the same type of linkage so that both identifiers refer to the
same object or function.)
Additionally, the equal operator sign (=) may be used for initialization. If an unevaluated expression (declarator) is
followed by = inside the declaration - we say that the identifier being introduced is also being initialized. After the =
sign we can put once again some expression, but this time it'll be evaluated and its value will be used as initial for
the object declared.
Examples:
Later in your code, you are allowed to write the exact same expression from the declaration part of the newly
introduced identifier, giving you an object of the type specified at the beginning of the declaration, assuming that
you've assigned valid values to all accessed objects in the way. Examples:
void f()
{
int b2; /* you should be able to write later in your code b2
which will directly refer to the integer object
that b2 identifies */
b2 = 2; /* assign a value to b2 */
int *b3; /* you should be able to write later in your code *b3 */
int **b4; /* you should be able to write later in your code **b4 */
b4 = &b3;
void (*p)(); /* you should be able to write later in your code (*p)() */
The declaration of b3 specifies that you can potentially use the value of b3 as a means to access some integer object.
Of course, in order to apply indirection (*) to b3, you should also have a proper value stored in it (see pointers for
more info). You should also first store some value into an object before trying to retrieve it (you can see more about
this problem here). We've done all of this in the above examples.
This one tells the compiler that you'll attempt to call a3. In this case a3 refers to function instead of an object. One
difference between object and function is that functions will always have some sort of linkage. Examples:
void f1()
{
{
int f2(); /* 1 refers to some function f2 */
}
{
int f2(); /* refers to the exact same function f2 as (1) */
}
}
In the above example, the 2 declarations refer to the same function f2, whilst if they were declaring objects then in
this context (having 2 different block scopes), they would be 2 distinct objects.
int (*a3)(); /* you should be able to apply indirection to `a3` and then call it */
Now it may seem to be getting complicated, but if you know operator precedence you'll have no problem reading
the above declaration. The parentheses are needed because the * operator has lower precedence than the ( ) one.
In the case of using the subscript operator, the resulting expression wouldn't be actually valid after the declaration
because the index used in it (the value inside [ and ]) will always be 1 above the maximum allowed value for this
object/function.
a4[5] results in undefined behavior (UB). More information about arrays can be found here.
Unfortunately for us, although syntactically possible, the declaration of a5 is forbidden by the current standard.
(You can technically put the typedef after the type too, like this: int typedef (*(*t0)())[5]; but this is discouraged.)
The above declaration declares an identifier for a typedef name. You can use it like this afterwards:
t0 pf;
int (*(*pf)())[5];
As you can see, the typedef name "saves" the declaration as a type to use later for other declarations. This way you
can save some keystrokes. Also, since a declaration using typedef is still a declaration, you are not limited to the
above example:
t0 (*pf1);
int (*(**pf1)())[5];
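To see what such a type is good for, here is a small sketch. The names table, get_table, third_element and first_via_pointer are hypothetical, and the parameter list is written as (void) instead of the empty () used above, so that the prototype is explicit.

```c
/* t0: pointer to function (taking no arguments)
 *     returning pointer to array of 5 int. */
typedef int (*(*t0)(void))[5];

static int table[5] = { 10, 20, 30, 40, 50 };

/* A function whose address can be stored in a t0. */
static int (*get_table(void))[5]
{
    return &table;
}

/* Calls through a t0 and indexes the array that the result points to. */
static int third_element(t0 pf)
{
    int (*row)[5] = pf();
    return (*row)[2];
}

/* The same with one more level of indirection, as in "t0 (*pf1);". */
static int first_via_pointer(t0 *pf1)
{
    return (*(**pf1)())[0];
}
```

Note how the expression in first_via_pointer mirrors the shape of the declaration int (*(**pf1)())[5]; which is exactly the "declaration follows use" idea described earlier.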
Declare those in a separate header which gets included by any file ("Translation Unit") which wants to make use of
them. It's handy to use the same header to declare a related enumeration to identify all string-resources:
resources.h:
#ifndef RESOURCES_H
#define RESOURCES_H
typedef enum { /* Define a type describing the possible valid resource IDs. */
RESOURCE_UNDEFINED = -1, /* To be used to initialise any EnumResourceID typed variable to be
extern const char * const resources[RESOURCE_MAX]; /* Declare, promise to anybody who includes
this, that at linkage-time this symbol will be around.
The 1st const guarantees the strings will not change,
the 2nd const guarantees the string-table entries
will never suddenly point somewhere else as set during
initialisation. */
#endif
To actually define the resources, create a related .c file, that is, another translation unit holding the actual
instances of what has been declared in the related header (.h) file:
resources.c:
#include "resources.h" /* To make sure clashes between declaration and definition are
recognised by the compiler include the declaring header into
the implementing, defining translation unit (.c file).
main.c:
#include <stdlib.h> /* for EXIT_SUCCESS */
#include "resources.h"
int main(void)
{
EnumResourceID resource_id = RESOURCE_UNDEFINED;
return EXIT_SUCCESS;
}
Compile the three files above using GCC, and link them to become the program file main, for example using this:
(use -Wall -Wextra -pedantic -Wconversion to make the compiler really picky, so you don't miss anything
before posting the code to SO, that is to say the world, or even deploying it into production)
$ ./main
And get:
STEP 1
Find the identifier. This is your starting point. Then say to yourself, "identifier is." You've started your declaration.
STEP 2
Look at the symbols on the right of the identifier. If, say, you find () there, then you know that this is the
declaration for a function. So you would then have "identifier is function returning". Or if you found a [] there, you
would say "identifier is array of". Continue right until you run out of symbols OR hit a right parenthesis ). (If you hit a
left parenthesis (, that's the beginning of a () symbol, even if there is stuff in between the parentheses. More on
that below.)
STEP 3
Look at the symbols to the left of the identifier. If it is not one of our symbols above (say, something like "int"), just
say it. Otherwise, translate it into English using that table above. Keep going left until you run out of symbols OR hit
a left parenthesis (.
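int *p[];
First, find the identifier:
int *p[];
     ^
"p is"
Move right until out of symbols or a right parenthesis is hit:
int *p[];
      ^^
"p is array of"
Can't move right anymore (out of symbols), so move left and find:
int *p[];
    ^
"p is array of pointer to"
Keep going left and find:
int *p[];
^^^
"p is array of pointer to int"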
Another example:
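int *(*func())();
Find the identifier.
int *(*func())();
       ^^^^
"func is"
Move right.
int *(*func())();
           ^^
"func is function returning"
Can't move right anymore because of the right parenthesis, so move left.
int *(*func())();
      ^
"func is function returning pointer to"
Can't move left anymore because of the left parenthesis, so keep going right.
int *(*func())();
              ^^
"func is function returning pointer to function returning"
Can't move right anymore because we're out of symbols, so go left.
int *(*func())();
    ^
"func is function returning pointer to function returning pointer to"
And finally, keep going left, because there's nothing left on the right.
int *(*func())();
^^^
"func is function returning pointer to function returning pointer to int"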
As you can see, this rule can be quite useful. You can also use it to sanity check yourself while you are creating
declarations, and to give you a hint about where to put the next symbol and whether parentheses are required.
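As a sanity check of the reading "function returning pointer to function returning pointer to int", here is a minimal sketch that defines and calls such a function. The names demo_value, get_value and read_through_func are hypothetical, and the parameter lists are written as (void):

```c
static int demo_value = 7;

/* get_value: function returning pointer to int. */
static int *get_value(void)
{
    return &demo_value;
}

/* func: function returning pointer to function returning pointer to int. */
int *(*func(void))(void)
{
    return get_value;
}

int read_through_func(void)
{
    int *(*pf)(void) = func();  /* pf is pointer to function returning pointer to int */
    return *pf();               /* call it, then dereference the returned pointer */
}
```

The expression *func()() applies the operators in exactly the order the rule reads them: call, call, dereference.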
Some declarations look much more complicated than they are due to array sizes and argument lists in prototype
form. If you see [3], that's read as "array (size 3) of...". If you see (char *,int), that's read as "function expecting
(char *,int) and returning...".
For example, the declaration
int (*(*fun_one)(char *,double))[9][20];
reads: "fun_one is pointer to function expecting (char *,double) and returning pointer to array (size 9) of array (size 20) of int."
As you can see, it's not as complicated if you get rid of the array sizes and argument lists:
int (*(*fun_one)())[][];
You can decipher it that way, and then put in the array sizes and argument lists later.
It is quite possible to make illegal declarations using this rule, so some knowledge of what's legal in C is necessary.
For instance, if the above had been:
int *((*fun_one)())[][];
In all the above cases, you would need a set of parentheses to bind a * symbol on the left between these () and []
right-side symbols in order for the declaration to be legal.
Legal
int i; an int
int *p; an int pointer (ptr to an int)
int a[]; an array of ints
int f(); a function returning an int
int **pp; a pointer to an int pointer (ptr to a ptr to an int)
int (*pa)[]; a pointer to an array of ints
int (*pf)(); a pointer to a function returning an int
int *ap[]; an array of int pointers (array of ptrs to ints)
int aa[][]; an array of arrays of ints
int *fp(); a function returning an int pointer
int ***ppp; a pointer to a pointer to an int pointer
int (**ppa)[]; a pointer to a pointer to an array of ints
int (**ppf)(); a pointer to a pointer to a function returning an int
int *(*pap)[]; a pointer to an array of int pointers
int (*paa)[][]; a pointer to an array of arrays of ints
int *(*pfp)(); a pointer to a function returning an int pointer
int **app[]; an array of pointers to int pointers
int (*apa[])[]; an array of pointers to arrays of ints
int (*apf[])(); an array of pointers to functions returning an int
int *aap[][]; an array of arrays of int pointers
int aaa[][][]; an array of arrays of arrays of int
int **fpp(); a function returning a pointer to an int pointer
int (*fpa())[]; a function returning a pointer to an array of ints
int (*fpf())(); a function returning a pointer to a function returning an int
Illegal
Source: http://ieng9.ucsd.edu/~cs30x/rt_lt.rule.html
Depending on the CPU architecture and the compiler, a structure may occupy more space in memory than the sum
of the sizes of its component members. The compiler can add padding between members or at the end of the
structure, but not at the beginning.
struct foo {
char *p; /* 8 bytes */
char c; /* 1 byte */
long x; /* 8 bytes */
};
The structure will be automatically padded to have 8-byte alignment and will look like this:
struct foo {
    char *p;     /* 8 bytes */
    char c;      /* 1 byte */
    char pad[7]; /* 7 bytes of padding added by the compiler */
    long x;      /* 8 bytes */
};
So sizeof(struct foo) will give us 24 instead of 17. This happens because, on a typical 64-bit platform, long x must
start on an 8-byte boundary and memory is read and written in 8-byte words. When char c is stored, it consumes only
the first byte of its word; the seven bytes that follow remain empty and are not accessible for any read or write
operation. These unused bytes are the structure padding.
Structure packing
But if you add the attribute packed, the compiler will not add padding:
To save space.
It must be taken in consideration that some processors such as the ARM Cortex-M0 do not allow unaligned memory
access; in such cases, structure packing can lead to undefined behaviour and can crash the CPU.
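A minimal sketch comparing a padded and a packed layout. It assumes a compiler that supports the GCC/Clang extension __attribute__((packed)); the struct and function names are hypothetical:

```c
#include <stddef.h>

/* Natural layout: the compiler may insert padding. */
struct foo_padded {
    char *p;
    char c;
    long x;
};

/* Packed layout (GCC/Clang extension): no padding between or after members. */
struct foo_packed {
    char *p;
    char c;
    long x;
} __attribute__((packed));

size_t padded_size(void) { return sizeof(struct foo_padded); }
size_t packed_size(void) { return sizeof(struct foo_packed); }
```

With the packed variant the size is simply the sum of the member sizes, at the cost of a possibly misaligned x member.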
struct test_32 {
int a; // 4 byte
short b; // 2 byte
int c; // 4 byte
} str_32;
We might expect this struct to occupy only 10 bytes of memory, but by printing sizeof(str_32) we see it uses 12
bytes.
This happened because the compiler aligns variables for fast access. A common pattern is that when the base type
occupies N bytes (where N is a power of 2 such as 1, 2, 4, 8, 16 — and seldom any bigger), the variable should be
aligned on an N-byte boundary (a multiple of N bytes).
For the structure shown with sizeof(int) == 4 and sizeof(short) == 2, a common layout is:
Thus struct test_32 occupies 12 bytes of memory. In this example, there is no trailing padding.
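One way to inspect such a layout is the offsetof macro from <stddef.h>. The sketch below repeats the structure so that it is self-contained; print_layout and padding_after_b are hypothetical helper names, and the actual numbers depend on the platform:

```c
#include <stddef.h>
#include <stdio.h>

struct test_32 {
    int a;    /* typically 4 bytes */
    short b;  /* typically 2 bytes */
    int c;    /* typically 4 bytes */
};

/* Number of padding bytes the compiler placed between b and c. */
size_t padding_after_b(void)
{
    return offsetof(struct test_32, c)
         - (offsetof(struct test_32, b) + sizeof(short));
}

/* Print where each member starts and how large the whole struct is. */
void print_layout(void)
{
    printf("a at %zu, b at %zu, c at %zu, total size %zu, padding after b %zu\n",
           offsetof(struct test_32, a),
           offsetof(struct test_32, b),
           offsetof(struct test_32, c),
           sizeof(struct test_32),
           padding_after_b());
}
```

On a platform with sizeof(int) == 4 and sizeof(short) == 2, one would expect a at offset 0, b at offset 4, two bytes of padding, and c at offset 8.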
The compiler will ensure that any struct test_32 variables are stored starting on a 4-byte boundary, so that the
members within the structure will be properly aligned for fast access. Memory allocation functions such as
malloc(), calloc() and realloc() are required to ensure that the pointer returned is sufficiently well aligned for
use with any data type, so dynamically allocated structures will be properly aligned too.
You can end up with odd situations such as on a 64-bit Intel x86_64 processor (e.g. Intel Core i7 — a Mac running
macOS Sierra or Mac OS X), where when compiling in 32-bit mode, the compilers place double aligned on a 4-byte
boundary; but, on the same hardware, when compiling in 64-bit mode, the compilers place double aligned on an 8-
byte boundary.
For managing dynamically allocated memory, the standard C library provides the functions malloc(), calloc(),
realloc() and free(). In C11 and later, there is also aligned_alloc(). Some systems also provide alloca().
The C dynamic memory allocation functions are defined in the <stdlib.h> header. If one wishes to allocate
memory space for an object dynamically, the following code can be used:
This computes the number of bytes that ten ints occupy in memory, then requests that many bytes from malloc
and assigns the result (i.e., the starting address of the memory chunk that was just created using malloc) to a
pointer named p.
It is good practice to use sizeof to compute the amount of memory to request since the result of sizeof is
implementation defined (except for character types, which are char, signed char and unsigned char, for which
sizeof is defined to always give 1).
Because malloc might not be able to service the request, it might return a null pointer. It is important to
check for this to prevent later attempts to dereference the null pointer.
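A minimal sketch of the allocation and the null check described above, wrapped in a hypothetical helper named make_int_array. The overflow guard is an addition that the text does not discuss:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for `count` ints, or return NULL. */
int *make_int_array(size_t count)
{
    int *p;

    if (count > SIZE_MAX / sizeof *p)  /* count * sizeof *p would overflow */
        return NULL;

    p = malloc(count * sizeof *p);     /* sizeof *p is the size of one int */
    if (p == NULL)
        perror("malloc() failed");     /* report, and let the caller handle NULL */
    return p;
}
```

The caller is expected to check the result for NULL before using it and to pass it to free() when it is no longer needed.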
Memory dynamically allocated using malloc() may be resized using realloc() or, when no longer needed,
released using free().
Alternatively, declaring int array[10]; would allocate the same amount of memory. However, if it is declared
inside a function without the keyword static, it will only be usable within the function it is declared in and the
functions it calls (because the array will be allocated on the stack and the space will be released for reuse when the
function returns). Alternatively, if it is defined with static inside a function, or if it is defined outside any function,
then its lifetime is the lifetime of the program. Pointers can also be returned from a function, however a function in
C can not return an array.
Zeroed Memory
The memory returned by malloc may not be initialized to a reasonable value, and care should be taken to zero the
memory with memset or to immediately copy a suitable value into it. Alternatively, calloc returns a block of the
desired size in which all bits are initialized to 0.
A note on calloc: Most (commonly used) implementations will optimise calloc() for performance, so it will be
faster than calling malloc(), then memset(), even though the net effect is identical.
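A minimal sketch of both routes to zeroed memory; the helper names zeroed_with_calloc and zeroed_with_memset are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Route 1: calloc hands back memory with all bits already set to zero. */
int *zeroed_with_calloc(size_t count)
{
    return calloc(count, sizeof(int));
}

/* Route 2: malloc followed by an explicit memset. */
int *zeroed_with_memset(size_t count)
{
    int *p = malloc(count * sizeof *p);
    if (p != NULL)
        memset(p, 0, count * sizeof *p);
    return p;
}
```

Both helpers return NULL on failure, so the caller still has to check the result.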
Aligned Memory
Version ≥ C11
C11 introduced a new function aligned_alloc() which allocates space with the given alignment. It can be used if
the memory to be allocated is needed to be aligned at certain boundaries which can't be satisfied by malloc() or
calloc(). malloc() and calloc() functions allocate memory that's suitably aligned for any object type (i.e. the
alignment is alignof(max_align_t)). But with aligned_alloc() greater alignments can be requested.
The C11 standard imposes two restrictions: 1) the size (second argument) requested must be an integral multiple of
the alignment (first argument) and 2) the value of alignment should be a valid alignment supported by the
implementation. Failure to meet either of them results in undefined behavior.
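A minimal sketch that respects the first restriction by rounding the size up to a multiple of the alignment. The helper names are hypothetical, and the sketch assumes that alignment is a power of two supported by the implementation and that the rounding does not overflow:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: request at least `size` bytes aligned to `alignment`. */
void *alloc_aligned_block(size_t alignment, size_t size)
{
    /* Round size up to the next multiple of alignment. */
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
}

/* Returns non-zero if ptr lies on an `alignment`-byte boundary. */
int is_aligned(const void *ptr, size_t alignment)
{
    return (uintptr_t)ptr % alignment == 0;
}
```

Memory obtained this way is released with free(), like any other allocation.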
The memory pointed to by p is reclaimed (either by the libc implementation or by the underlying OS) after the call
to free(), so accessing that freed memory block via p will lead to undefined behavior. Pointers that reference
memory elements that have been freed are commonly called dangling pointers, and present a security risk.
Furthermore, the C standard states that even accessing the value of a dangling pointer has undefined behavior.
Please note that you can only call free() on pointers that have directly been returned from the malloc(),
calloc(), realloc() and aligned_alloc() functions, or where documentation tells you the memory has been
allocated that way (functions like strdup() are notable examples). Freeing any other non-null pointer
is forbidden. Such an error will usually not be diagnosed by your compiler, but it leaves the program execution in an
undefined state.
There are two common strategies to prevent such instances of undefined behavior.
The first and preferable is simple - have p itself cease to exist when it is no longer needed, for example:
if (something_is_needed())
{
free(p);
}
By calling free() directly before the end of the containing block (i.e. the }), p itself ceases to exist. The compiler will
give a compilation error on any attempt to use p after that.
A second approach is to also invalidate the pointer itself after releasing the memory to which it points:
free(p);
p = NULL; // you may also use 0 instead of NULL
On many platforms, an attempt to dereference a null pointer will cause an instant crash: Segmentation fault.
Here, we get at least a stack trace pointing to the variable that was used after being freed.
Without setting the pointer to NULL we have a dangling pointer. The program will very likely still crash, but later,
because the memory to which the pointer points will silently be corrupted. Such bugs are difficult to trace
because they can result in a call stack that is completely unrelated to the initial problem.
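The second approach can be wrapped in a small helper so that the two steps are never separated. This is only a sketch; free_and_clear is a hypothetical name, and it is written for int pointers to keep the types simple:

```c
#include <stdlib.h>

/* Hypothetical helper: release the memory *pp points to and null the pointer. */
void free_and_clear(int **pp)
{
    if (pp != NULL) {
        free(*pp);   /* free(NULL) is a no-op, so calling this twice is harmless */
        *pp = NULL;  /* the caller's pointer can no longer dangle */
    }
}
```

A call such as free_and_clear(&p); leaves p equal to NULL, so a later accidental dereference fails early instead of silently corrupting memory.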
It is safe to free a null pointer. The C Standard specifies that free(NULL) has no effect:
The free function causes the space pointed to by ptr to be deallocated, that is, made available for
further allocation. If ptr is a null pointer, no action occurs. Otherwise, if the argument does not
Sometimes the first approach cannot be used (e.g. memory is allocated in one function, and deallocated
much later in a completely different function)
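#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int *p = malloc(10 * sizeof *p);
    if (NULL == p)
    {
        perror("malloc() failed");
        return EXIT_FAILURE;
    }
    p[0] = 42;
    p[9] = 15;
    /* Grow the array, storing the result in a temporary pointer in case
     * realloc() fails. The new element count (20) is only illustrative. */
    {
        int *temporary = realloc(p, 20 * sizeof *temporary);
        if (NULL == temporary)
        {
            perror("realloc() failed");
            free(p); /* The original block is still allocated; clean up. */
            return EXIT_FAILURE;
        }
        p = temporary;
    }
    /* From here on, array can be used with the new size it was
     * realloc'ed to, until it is free'd. */
    free(p);
    return EXIT_SUCCESS;
}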
The reallocated object may or may not have the same address as the original object pointed to by p. Therefore it is
important to capture the return value from realloc, which contains the new address if the call is successful.
Make sure you assign the return value of realloc to a temporary instead of the original p. realloc will return null in
case of any failure, which would overwrite the pointer. This would lose your data and create a memory leak.
If the size of the space requested is zero, the behavior of realloc is implementation-defined. This is similar for all
memory allocation functions that receive a size parameter of value 0. Such functions may in fact return a non-null
pointer, but that must never be dereferenced.
This means realloc(ptr,0) may not really free/deallocate the memory, and thus it should never be used as a
replacement for free.
Since C99, C has variable length arrays (VLAs), which model arrays with bounds that are only known at initialization
time. While you have to be careful not to allocate VLAs that are too large (they might smash your stack), using
pointers to VLAs and using them in sizeof expressions is fine.
Here matrix is a pointer to elements of type double[m], and the sizeof expression with double[n][m] ensures that
it contains space for n such elements.
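A minimal sketch of this technique. The function names sum_all and filled_matrix_sum are hypothetical, and the sketch assumes an implementation with VLA support and non-zero n and m:

```c
#include <stddef.h>
#include <stdlib.h>

/* The array parameter uses the earlier parameters n and m as its bounds. */
double sum_all(size_t n, size_t m, double a[n][m])
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < m; ++j)
            total += a[i][j];
    return total;
}

/* Allocates an n x m matrix on the heap, fills it, and sums it.
 * Returns -1.0 if the allocation fails. */
double filled_matrix_sum(size_t n, size_t m, double fill)
{
    /* matrix points to elements of type double[m]; no VLA lives on the stack. */
    double (*matrix)[m] = malloc(sizeof(double[n][m]));
    if (matrix == NULL)
        return -1.0;

    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < m; ++j)
            matrix[i][j] = fill;

    double total = sum_all(n, m, matrix);
    free(matrix);
    return total;
}
```

Because the storage comes from malloc, the usual two-dimensional indexing matrix[i][j] is available without risking a stack overflow for large sizes.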
The presence of VLA in the language also affects the possible declarations of arrays and pointers in function
headers. Now, a general integer expression is permitted inside the [] of array parameters. For both functions the
expressions in [] use parameters that have been declared earlier in the parameter list. For sumAll these are the
lengths that the user code expects for the matrix. As for all array function parameters in C, the outermost dimension
is rewritten to a pointer type, so this is equivalent to the declaration
That is, n is not really part of the function interface, but the information can be useful for documentation and it
could also be used by bounds checking compilers to warn about out-of-bounds access.
Likewise, for main, the expression argc+1 is the minimal length that the C standard prescribes for the argv argument.
Note that officially VLA support is optional in C11, but we know of no compiler that implements C11 and that
doesn't have them. You could test with the macro __STDC_NO_VLA__ if you must.
Manual page
#include <alloca.h>
// glibc version of stdlib.h include alloca.h by default
Allocates memory on the stack frame of the caller. The space referenced by the returned pointer is automatically
freed when the caller function finishes.
While this function is convenient for automatic memory management, be aware that requesting a large allocation
could cause a stack overflow, and that you cannot use free with memory allocated with alloca (which could cause
further problems with stack overflow).
For these reasons it is not recommended to use alloca inside a loop or a recursive function.
And because the memory is freed upon function return, you cannot return the pointer as a function result (the
behavior would be undefined).
Summary
Recommendation
Version ≥ C99
Modern alternative.
This works where alloca() does, and works in places where alloca() doesn't (inside loops, for example). It does
assume either a C99 implementation or a C11 implementation that does not define __STDC_NO_VLA__.
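A minimal sketch of a VLA used as scratch space where alloca() might otherwise have been considered. The function is_palindrome is a hypothetical example, and the input is assumed to be short enough for the copy to fit on the stack:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if text reads the same forwards and backwards, 0 otherwise. */
int is_palindrome(const char *text)
{
    size_t len = strlen(text);
    char reversed[len + 1];          /* VLA: size is only known at run time */

    for (size_t i = 0; i < len; ++i)
        reversed[i] = text[len - 1 - i];
    reversed[len] = '\0';

    return strcmp(text, reversed) == 0;
}   /* reversed is released automatically here */
```

As with alloca(), the storage disappears when the enclosing block ends, so a pointer to it must not be returned.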
To implement a simple scheme, a control block is stored in the region of memory immediately before the pointer to
be returned from the call. This means that free() may be implemented by subtracting from the returned pointer
and reading off the control information, which is typically the block size plus some information that allows it to be
put back in the free list - a linked list of unallocated blocks.
When the user requests an allocation, the free list is searched until a block of identical or larger size to the amount
requested is found, then if necessary it is split. This can lead to memory fragmentation if the user is continually
making many allocations and frees of unpredictable size and at unpredictable intervals (not all real programs
behave like that; the simple scheme is often adequate for small programs).
Many programs require large numbers of allocations of small objects of the same size. This is very easy to
implement. Simply use a block with a next pointer. So if a block of 32 bytes is required:
union block
void *block_alloc()
{
void *answer = head;
if (answer)
head = head->next;
return answer;
}
This scheme is extremely fast and efficient, and can be made generic with a certain loss of clarity.
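A complete sketch of such a fixed-size block allocator, under the assumption of a small static arena. POOL_BLOCKS, arena, pool_init, pool_alloc and pool_free are hypothetical names chosen for this illustration:

```c
#include <stddef.h>

#define POOL_BLOCKS 4  /* illustrative arena size */

union block {
    union block *next;          /* used while the block is on the free list */
    unsigned char payload[32];  /* used while the block is handed out */
};

static union block arena[POOL_BLOCKS];
static union block *head;       /* first free block, or NULL if none */

/* Chain all blocks of the arena into the free list. */
void pool_init(void)
{
    for (size_t i = 0; i + 1 < POOL_BLOCKS; ++i)
        arena[i].next = &arena[i + 1];
    arena[POOL_BLOCKS - 1].next = NULL;
    head = &arena[0];
}

/* Pop a block off the free list; returns NULL when the pool is exhausted. */
void *pool_alloc(void)
{
    union block *answer = head;
    if (answer != NULL)
        head = answer->next;
    return answer;
}

/* Push a block back onto the front of the free list. */
void pool_free(void *ptr)
{
    union block *blk = ptr;
    if (blk == NULL)
        return;
    blk->next = head;
    head = blk;
}
```

Both allocation and release are constant-time pointer swaps, which is where the speed of this scheme comes from.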
switch (SIGN_REP(long)) {
case sign_magnitude: { /* do something */ break; }
case ones_compl: { /* do otherwise */ break; }
case twos_compl: { /* do yet else */ break; }
case 0: { _Static_assert(SIGN_REP(long), "bogus sign representation"); }
}
The same pattern applies to the representation of narrower types, but they cannot be tested by this technique
because the operands of & are subject to "the usual arithmetic conversions" before the result is computed.
int myThread(void* a) {
++active; // increment active race free
// do something
--active; // decrement active race free
return 0;
}
All lvalue operations (operations that modify the object) that are allowed for the base type are allowed and will not
lead to race conditions between different threads that access them.
Operations on atomic objects are generally orders of magnitude slower than normal arithmetic operations.
This also includes simple load or store operations. So you should only use them for critical tasks.
Usual arithmetic operations and assignment such as a = a+1; are in fact three operations on a: first a load,
then addition and finally a store. This is not race free. Only the operation a += 1; and a++; are.
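A minimal sketch of the race-free forms using <stdatomic.h>, assuming a C11 implementation that provides atomics. The function names are hypothetical, and no threads are created here; the sketch only shows which operations are single atomic read-modify-write steps:

```c
#include <stdatomic.h>

static atomic_int active = 0;

/* One atomic read-modify-write; returns the value after the increment. */
int enter_section(void)
{
    return atomic_fetch_add(&active, 1) + 1;
}

/* One atomic read-modify-write; returns the value after the decrement. */
int leave_section(void)
{
    return atomic_fetch_sub(&active, 1) - 1;
}

/* A single atomic load. */
int current_active(void)
{
    return atomic_load(&active);
}
```

Writing active = active + 1; instead would perform a separate atomic load and atomic store, so another thread could modify active in between.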
/* Do stuff. */
return EXIT_SUCCESS;
}
Additional notes:
1. For a function having a return type as void (not including void * or related types), the return statement
should not have any associated expression; i.e, the only allowed return statement would be return;.
2. For a function having a non-void return type, the return statement shall not appear without an expression.
3. For main() (and only for main()), an explicit return statement is not required (in C99 or later). If the execution
reaches the terminating }, an implicit value of 0 is returned. Some people think omitting this return is bad
practice; others actively suggest leaving it out.
Returning nothing
size_t i,j;
for (i = 0; i < myValue && !breakout_condition; ++i) {
But the C language offers the goto clause, which can be useful in this case. By using it with a label declared after the
loops, we can easily break out of the loops.
size_t i,j;
for (i = 0; i < myValue; ++i) {
for (j = 0; j < mySecondValue; ++j) {
...
if(breakout_condition)
goto final;
}
}
final:
However, often when this need comes up, a return could be used to better effect instead. This construct is also
considered "unstructured" in structured programming theory.
/* normal processing */
free(ptr);
return SUCCESS;
out_of_memory:
free(ptr); /* harmless, and necessary if we have further errors */
return FAILURE;
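A self-contained sketch of the same error-handling pattern, using a single cleanup label. The function sum_of_squares and the DEMO_SUCCESS and DEMO_FAILURE constants are hypothetical names for this illustration:

```c
#include <stdlib.h>

enum { DEMO_SUCCESS = 0, DEMO_FAILURE = -1 };

/* Computes 1*1 + 2*2 + ... + n*n using two temporary buffers. */
int sum_of_squares(size_t n, long *result)
{
    int status = DEMO_FAILURE;
    long total = 0;
    long *values = NULL;
    long *squares = NULL;

    if (n == 0 || result == NULL)
        return DEMO_FAILURE;

    values = malloc(n * sizeof *values);
    if (values == NULL)
        goto cleanup;

    squares = malloc(n * sizeof *squares);
    if (squares == NULL)
        goto cleanup;

    /* normal processing */
    for (size_t i = 0; i < n; ++i) {
        values[i] = (long)(i + 1);
        squares[i] = values[i] * values[i];
        total += squares[i];
    }
    *result = total;
    status = DEMO_SUCCESS;

cleanup:
    free(squares);  /* free(NULL) is harmless if an allocation failed */
    free(values);
    return status;
}
```

Initializing every pointer to NULL up front is what makes the shared cleanup code safe on every path.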
Use of goto keeps error flow separate from normal program control flow. It is however also considered
"unstructured" in the technical sense.
int main(void)
{
int sum = 0;
printf("Enter digits to be summed up or 0 to exit:\n");
do
{
int c = getchar();
if (EOF == c)
break;
}
if ('\n' != c)
{
flush_input_stream(stdin);
}
if (!isdigit(c))
{
printf("%c is not a digit! Start over!\n", c);
continue;
}
if ('0' == c)
{
printf("Exit requested.\n");
break;
}
sum += c - '0';
return EXIT_SUCCESS;
}
if (0 != i)
{
fprintf(stderr, "Flushed %zu characters from input.\n", i);
}
}
Headers declare types, functions, macros etc that are needed by the consumers of a set of facilities. All the code
that uses any of those facilities includes the header. All the code that defines those facilities includes the header.
This allows the compiler to check that the uses and definitions match.
Idempotence
If a header file is included multiple times in a translation unit (TU), it should not break builds.
Self-containment
If you need the facilities declared in a header file, you should not have to include any other headers explicitly.
Minimality
You should not be able to remove any information from a header without causing builds to fail.
Include what you use (IWYU)
Of more concern to C++ than C, but nevertheless important in C too. If the code in a TU (call it code.c)
directly uses the features declared by a header (call it "headerA.h"), then code.c should #include
"headerA.h" directly, even if the TU includes another header (call it "headerB.h") that happens, at the
moment, to include "headerA.h".
Occasionally, there might be good enough reasons to break one or more of these guidelines, but you should both
be aware that you are breaking the rule and be aware of the consequences of doing so before you break it.
Historical rules
Header files should not be nested. The prologue for a header file should, therefore, describe what other
headers need to be #included for the header to be functional. In extreme cases, where a large number
of header files are to be included in several different source files, it is acceptable to put all common
#includes in one include file.
Modern rules
However, since then, opinion has tended in the opposite direction. If a source file needs to use the facilities
declared by a header header.h, the programmer should be able to write:
#include "header.h"
and (subject only to having the correct search paths set on the command line), any necessary pre-requisite headers
will be included by header.h without needing any further headers added to the source file.
This provides better modularity for the source code. It also protects the source from the "guess why this header
was added" conundrum that arises after the code has been modified and hacked for a decade or two.
The NASA Goddard Space Flight Center (GSFC) coding standards for C is one of the more modern standards — but
is now a little hard to track down. It states that headers should be self-contained. It also provides a simple way to
ensure that headers are self-contained: the implementation file for the header should include the header as the
first header. If it is not self-contained, that code will not compile.
This standard requires a unit’s header to contain #include statements for all other headers required by
the unit header. Placing #include for the unit header first in the unit body allows the compiler to verify
that the header contains all required #include statements.
An alternate design, not permitted by this standard, allows no #include statements in headers; all
#includes are done in the body files. Unit header files then must contain #ifdef statements that check
that the required headers are included in the proper order.
One advantage of the alternate design is that the #include list in the body file is exactly the dependency
list needed in a makefile, and this list is checked by the compiler. With the standard design, a tool must be
used to generate the dependency list. However, all of the branch recommended development
environments provide such a tool.
A major disadvantage of the alternate design is that if a unit’s required header list changes, each file that
uses that unit must be edited to update the #include statement list. Also, the required header list for a
Another disadvantage of the alternate design is that compiler library header files, and other third party
files, must be modified to add the required #ifdef statements.
If a header header.h needs a new nested header extra.h, you do not have to check every source file that
uses header.h to see whether you need to add extra.h.
If a header header.h no longer needs to include a specific header notneeded.h, you do not have to check
every source file that uses header.h to see whether you can safely remove notneeded.h (but see Include
what you use).
You do not have to establish the correct sequence for including the pre-requisite headers (which requires a
topological sort to do the job properly).
Checking self-containment
See Linking against a static library for a script chkhdr that can be used to test idempotence and self-containment of
a header file.
For example, a project header should not include <stdio.h> unless one of the function interfaces uses the type
FILE * (or one of the other types defined solely in <stdio.h>). If an interface uses size_t, the smallest header that
suffices is <stddef.h>. Obviously, if another header that defines size_t is included, there is no need to include
<stddef.h> too.
If the headers are minimal, then it keeps the compilation time to a minimum too.
It is possible to devise headers whose sole purpose is to include a lot of other headers. These seldom turn out to be
a good idea in the long run because few source files will actually need all the facilities described by all the headers.
For example, a <standard-c.h> could be devised that includes all the standard C headers — with care since some
headers are not always present. However, very few programs actually use the facilities of <locale.h> or
<tgmath.h>.
So, the double quoted form may look in more places than the angle-bracketed form. The standard specifies by
example that the standard headers should be included in angle-brackets, even though the compilation works if you
use double quotes instead. Similarly, standards such as POSIX use the angle-bracketed format — and you should
too. Reserve double-quoted headers for headers defined by the project. For externally-defined headers (including
headers from other projects your project relies on), the angle-bracket notation is most appropriate.
Note that there should be a space between #include and the header, even though the compilers will accept no
space there. Spaces are cheap.
#include <openssl/ssl.h>
#include <sys/stat.h>
#include <linux/kernel.h>
You should consider whether to use that namespace control in your project (it is quite probably a good idea). You
should steer clear of the names used by existing projects (in particular, both sys and linux would be bad choices).
If you use this, your code should be careful and consistent in the use of the notation.
Header files should seldom if ever define variables. Although you will keep global variables to a minimum, if you
need a global variable, you will declare it in a header, and define it in one suitable source file, and that source file
will include the header to cross-check the declaration and definition, and all source files that use the variable will
use the header to declare it.
Corollary: you will not declare global variables in a source file — a source file will only contain definitions.
Header files should seldom declare static functions, with the notable exception of static inline functions which
will be defined in headers if the function is needed in more than one source file.
Cross-references
There are two ways to achieve idempotence: header guards and the #pragma once directive.
Header guards
Header guards are simple and reliable and conform to the C standard. The first non-comment lines in a header file
should be of the form:
#ifndef UNIQUE_ID_FOR_HEADER
#define UNIQUE_ID_FOR_HEADER
The last non-comment line should be #endif, optionally with a comment after it:
#endif /* UNIQUE_ID_FOR_HEADER */
All the operational code, including other #include directives, should be between these lines.
Each name must be unique. Often, a name scheme such as HEADER_H_INCLUDED is used. Some older code uses a
symbol defined as the header guard (e.g. #ifndef BUFSIZ in <stdio.h>), but it is not as reliable as a unique name.
One option would be to use a generated MD5 (or other) hash for the header guard name. You should avoid
emulating the schemes used by system headers which frequently use names reserved to the implementation —
names starting with an underscore followed by either another underscore or an upper-case letter.
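The effect of a header guard can be sketched in a single file by pasting the contents of a hypothetical header demo_point.h twice, which is what two #include directives for the same header would amount to. All names here are illustrative:

```c
/* ---- first inclusion of the hypothetical demo_point.h ---- */
#ifndef DEMO_POINT_H_INCLUDED
#define DEMO_POINT_H_INCLUDED

typedef struct { int x; int y; } demo_point;

static inline int demo_point_sum(demo_point p)
{
    return p.x + p.y;
}

#endif /* DEMO_POINT_H_INCLUDED */

/* ---- second inclusion: skipped entirely because the guard is defined ---- */
#ifndef DEMO_POINT_H_INCLUDED
#define DEMO_POINT_H_INCLUDED

typedef struct { int x; int y; } demo_point;

static inline int demo_point_sum(demo_point p)
{
    return p.x + p.y;
}

#endif /* DEMO_POINT_H_INCLUDED */
```

Without the guard, the second copy would redefine both the type and the function, and the translation unit would fail to compile.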
Alternatively, some compilers support the #pragma once directive which has the same effect as the three lines
shown for header guards.
#pragma once
The compilers which support #pragma once include MS Visual Studio and GCC and Clang. However, if portability is a
concern, it is better to use header guards, or use both. Modern compilers (those supporting C89 or later) are
required to ignore, without comment, pragmas that they do not recognize ('Any such pragma that is not recognized
by the implementation is ignored') but old versions of GCC were not so indulgent.
Suppose a source file source.c includes a header arbitrary.h which in turn coincidentally includes freeloader.h,
but the source file also explicitly and independently uses the facilities from freeloader.h. All is well to start with.
Then one day arbitrary.h is changed so its clients no longer need the facilities of freeloader.h. Suddenly,
source.c stops compiling — because it didn't meet the IWYU criteria. Because the code in source.c explicitly used
the facilities of freeloader.h, it should have included what it uses — there should have been an explicit #include
"freeloader.h" in the source too. (Idempotency would have ensured there wasn't a problem.)
The IWYU philosophy maximizes the probability that code continues to compile even with reasonable changes
made to interfaces. Clearly, if your code calls a function that is subsequently removed from the published interface,
This is a particular problem in C++ because standard headers are allowed to include each other. Source file
file.cpp could include one header header1.h that on one platform includes another header header2.h. file.cpp
might turn out to use the facilities of header2.h as well. This wouldn't be a problem initially - the code would
compile because header1.h includes header2.h. On another platform, or an upgrade of the current platform,
header1.h could be revised so it no longer includes header2.h, and then file.cpp would stop compiling as a result.
IWYU would spot the problem and recommend that header2.h be included directly in file.cpp. This would ensure
it continues to compile. Analogous considerations apply to C code too.
All of these functions take one parameter, an int that must be either EOF or representable as an unsigned char.
The names of the classifying functions are prefixed with 'is'. Each returns an integer non-zero value (TRUE) if the
character passed to it satisfies the related condition. If the condition is not satisfied then the function returns a zero
value (FALSE).
int a;
int c = 'A';
a = isalpha(c); /* Checks if c is alphabetic (A-Z, a-z), returns non-zero here. */
a = isalnum(c); /* Checks if c is alphanumeric (A-Z, a-z, 0-9), returns non-zero here. */
a = iscntrl(c); /* Checks if c is a control character (0x00-0x1F, 0x7F), returns zero here. */
a = isdigit(c); /* Checks if c is a digit (0-9), returns zero here. */
a = isgraph(c); /* Checks if c has a graphical representation (any printing character except space),
returns non-zero here. */
a = islower(c); /* Checks if c is a lower-case letter (a-z), returns zero here. */
a = isprint(c); /* Checks if c is any printable character (including space), returns non-zero here.
*/
a = isupper(c); /* Checks if c is a upper-case letter (a-z), returns zero here. */
a = ispunct(c); /* Checks if c is a punctuation character, returns zero here. */
a = isspace(c); /* Checks if c is a white-space character, returns zero here. */
a = isupper(c); /* Checks if c is an upper-case letter (A-Z), returns non-zero here. */
a = isxdigit(c); /* Checks if c is a hexadecimal digit (A-F, a-f, 0-9), returns non-zero here. */
Version ≥ C99
a = isblank(c); /* Checks if c is a blank character (space or tab), returns non-zero here. */
There are two conversion functions. These are named using the prefix 'to'. These functions take the same argument
as those above. However the return value is not a simple zero or non-zero but the passed argument changed in
some manner.
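int a;
int c = 'A';
a = tolower(c); /* Converts c to lower case if possible, returns 'a' here. */
a = toupper(c); /* Converts c to upper case if possible; c is already upper case, so 'A' is returned. */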
ASCII values    Characters                                          Functions that return non-zero
0x00 .. 0x08    NUL, (other control codes)                          iscntrl
0x09            tab ('\t')                                          iscntrl isblank isspace
0x0A .. 0x0D    (white-space control codes: '\f','\v','\n','\r')    iscntrl isspace
0x0E .. 0x1F    (other control codes)                               iscntrl
0x20            space (' ')                                         isblank isspace isprint
0x21 .. 0x2F    !"#$%&'()*+,-./                                     ispunct isgraph isprint
0x30 .. 0x39    0123456789                                          isdigit isxdigit isalnum isgraph isprint
0x3A .. 0x40    :;<=>?@                                             ispunct isgraph isprint
0x41 .. 0x46    ABCDEF                                              isupper isalpha isxdigit isalnum isgraph isprint
0x47 .. 0x5A    GHIJKLMNOPQRSTUVWXYZ                                isupper isalpha isalnum isgraph isprint
0x5B .. 0x60    [\]^_`                                              ispunct isgraph isprint
0x61 .. 0x66    abcdef                                              islower isalpha isxdigit isalnum isgraph isprint
0x67 .. 0x7A    ghijklmnopqrstuvwxyz                                islower isalpha isalnum isgraph isprint
0x7B .. 0x7E    {|}~                                                ispunct isgraph isprint
0x7F            (DEL)                                               iscntrl
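typedef struct {
    size_t space;
    size_t alnum;
    size_t punct;
} chartypes;
chartypes classify(FILE *f) { /* needs <ctype.h> and <stdio.h> */
    chartypes types = { 0, 0, 0 };
    int ch; /* int rather than char, so that EOF stays distinguishable */
    while ((ch = fgetc(f)) != EOF) {
        types.space += !!isspace(ch);
        types.alnum += !!isalnum(ch);
        types.punct += !!ispunct(ch);
    }
    return types;
}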
The classify function reads characters from a stream and counts the number of spaces, alphanumeric and
punctuation characters. It avoids several pitfalls.
When reading a character from a stream, the result is saved as an int, since otherwise there would be an
ambiguity between reading EOF (the end-of-file marker) and a character that has the same bit pattern.
The character classification functions (e.g. isspace) expect their argument to be either representable as an
unsigned char, or the value of the EOF macro. Since this is exactly what fgetc returns, there is no need for
conversion here.
The return value of the character classification functions only distinguishes between zero (meaning false)
and nonzero (meaning true). For counting the number of occurrences, this value needs to be converted to a
1 or 0, which is done by the double negation, !!.
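typedef struct {
    size_t space;
    size_t alnum;
    size_t punct;
} chartypes;
chartypes classify(const char *s) { /* needs <ctype.h> and <stddef.h> */
    chartypes types = { 0, 0, 0 };
    const char *p;
    for (p = s; *p != '\0'; p++) {
        types.space += !!isspace((unsigned char)*p);
        types.alnum += !!isalnum((unsigned char)*p);
        types.punct += !!ispunct((unsigned char)*p);
    }
    return types;
}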
The classify function examines all characters from a string and counts the number of spaces, alphanumeric and
punctuation characters. It avoids several pitfalls.
The character classification functions (e.g. isspace) expect their argument to be either representable as an
unsigned char, or the value of the EOF macro.
The expression *p is of type char and must therefore be converted to match the above wording.
The char type is defined to be equivalent to either signed char or unsigned char.
When char is equivalent to unsigned char, there is no problem, since every possible value of the char type is
representable as unsigned char.
When char is equivalent to signed char, it must be converted to unsigned char before being passed to the
character classification functions. And although the value of the character may change because of this
conversion, this is exactly what these functions expect.
The return value of the character classification functions only distinguishes between zero (meaning false)
and nonzero (meaning true). For counting the number of occurrences, this value needs to be converted to a
1 or 0, which is done by the double negation, !!.
int n, x = 5;
n = ++x; /* x is incremented by 1(x=6), and result is assigned to n(6) */
/* this is a short form for two statements: */
/* x = x + 1; */
/* n = x ; */
When used in the postfix form, the operand's current value is used in the expression and then the value of the
operand is incremented by 1. Consider the following example:
int n, x = 5;
n = x++; /* value of x(5) is assigned first to n(5), and then x is incremented by 1; x(6) */
/* this is a short form for two statements: */
/* n = x; */
/* x = x + 1; */
int main()
{
int a, b, x = 42;
a = ++x; /* a and x are 43 */
b = x++; /* b is 43, x is 44 */
a = x--; /* a is 44, x is 43 */
b = --x; /* b and x are 42 */
return 0;
}
From the above it is clear that post operators return the current value of a variable and then modify it, but pre
operators modify the variable and then return the modified value.
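In all versions of C, the order in which the operands of an expression are evaluated, and the point at which the
side effects of pre and post operators take place, is not defined. Modifying a variable and using it again in the same
expression without an intervening sequence point is undefined behavior, hence the following code can produce
unexpected results: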
int main()
{
    int a, x = 42;
    a = x++ + x; /* wrong */
    a = x + x; /* right */
    ++x;
    int ar[10];
    x = 0;
    ar[x] = x++; /* wrong */
    ar[x] = x; /* right */
    ++x;
    return 0;
}
Note that it is also good practice to use pre over post operators when used alone in a statement. Look at the above
code for this.
Note also, that when a function is called, all side effects on arguments must take place before the function runs.
int foo(int x)
{
return x;
}
int main()
{
int a = 42;
int b = foo(a++); /* b is 42: foo receives the old value of a, although a is already 43 when foo's body runs */
return 0;
}
To solve this problem, the C standard suggested the use of combinations of three characters to produce a single
character called a trigraph. A trigraph is a sequence of three characters, the first two of which are question marks.
The following is a simple example that uses trigraph sequences instead of #, { and }:
??=include <stdio.h>
int main()
??<
printf("Hello World!\n");
??>
This will be changed by the C preprocessor by replacing the trigraphs with their single-character equivalents as if
the code had been written:
#include <stdio.h>
int main()
{
printf("Hello World!\n");
}
Trigraph Equivalent
??= #
??/ \
??' ^
??( [
??) ]
??! |
??< {
??> }
??- ~
Note that trigraphs are problematic because, for example, ??/ is a backslash and can affect the meaning of
continuation lines in comments. Trigraphs also have to be recognized inside strings and character literals
(e.g. '??/??/' is a single character, a backslash).
In 1994 more readable alternatives to five of the trigraphs were supplied. These use only two characters and are
known as digraphs. Unlike trigraphs, digraphs are tokens. If a digraph occurs in another token (e.g. string literals or
character constants), it is not treated as a digraph and remains as it is.
The following shows the difference before and after processing the digraph sequences.
#include <stdio.h>
int main()
<%
printf("Hello %> World!\n"); /* Note that the string contains a digraph */
%>
#include <stdio.h>
int main()
{
printf("Hello %> World!\n"); /* Note the unchanged digraph within the string. */
}
Digraph Equivalent
<: [
:> ]
<% {
%> }
%: #
This code breaches the constraint and must produce a diagnostic message at compile time. This is far more useful
than undefined behavior, because the developer is informed of the issue before the program is run and
potentially does anything at all.
Constraints thus tend to be errors that are easily detectable at compile time, such as this one. Issues that result in
undefined behavior but would be difficult or impossible to detect at compile time are therefore not constraints.
1) exact wording:
Version = C99
If an identifier has no linkage, there shall be no more than one declaration of the identifier (in a declarator or type
specifier) with the same scope and in the same name space, except for tags as specified in 6.7.2.3.
#include <stdbool.h>

struct foo
{
    bool bar;
};
void baz(void)
{
    struct foo testStruct;
    -testStruct; /* This breaks the constraint so must produce a diagnostic */
}
In this example we use four functions (plus main()) in three source files. Two of those (plusfive() and timestwo())
each get called by the other two located in "source1.c" and "source2.c". The main() is included so we have a
working example.
main.c:
#include <stdio.h>
#include <stdlib.h>
#include "headerfile.h"
int main(void) {
int start = 3;
int intermediate = complicated1(start);
printf("First result is %d\n", intermediate);
intermediate = complicated2(start);
printf("Second result is %d\n", intermediate);
return 0;
}
source1.c:
#include <stdio.h>
#include <stdlib.h>
#include "headerfile.h"
source2.c:
#include <stdio.h>
#include <stdlib.h>
#include "headerfile.h"
headerfile.h:
#ifndef HEADERFILE_H
#define HEADERFILE_H
#endif
Functions timestwo and plusfive get called by both complicated1 and complicated2, which are in different
"translation units", or source files. In order to use them in this way, we have to define them in the header.
We use the -O2 optimization option because some compilers don't inline without optimization turned on.
The effect of the inline keyword is that the function symbol in question is not emitted into the object file.
Otherwise an error would occur in the last line, where we are linking the object files to form the final executable.
Without inline, the same symbol would be defined in both .o files, and a "multiply defined symbol"
error would occur.
In situations where the symbol is actually needed, this has the disadvantage that the symbol is not produced at all.
There are two possibilities to deal with that. The first is to add an extra extern declaration of the inlined functions
in exactly one of the .c files. So add the following to source1.c:
The other possibility is to define the function with static inline instead of inline. This method has the drawback
that eventually a copy of the function in question may be produced in every object file that is produced with this
header.
It is important to note, however, that the result of doing this is not portable. C99 and later specify that the stored
bytes are reinterpreted as the type of the member being read, so the value obtained depends on the
implementation's object representations and may even be a trap representation; C89 left the result
implementation-defined. Check your compiler documentation if you plan to do this.
One real-life example of this technique is the "Fast Inverse Square Root" algorithm, which relies on implementation
details of IEEE 754 floating point numbers to compute an inverse square root more quickly than ordinary floating
point operations would. The algorithm can be implemented either through pointer casting (which is very dangerous
and breaks the strict aliasing rule) or through a union (which is permitted, but still depends on the representation
of the two member types):
union floatToInt
{
int32_t intMember;
float floatMember; /* Float must be 32 bits IEEE 754 for this to work */
};
This technique was widely used in computer graphics and games in the past due to its greater speed compared to
using floating point operations, and is very much a compromise, losing some accuracy and being very non portable
in exchange for speed.
The simple example below demonstrates a union with two members, both of the same type. It shows that writing
to member m_1 results in the written value being read from member m_2 and writing to member m_2 results in the
written value being read from member m_1.
#include <stdio.h>
Result
u.m_2: 1
u.m_1: 2
#include <stdio.h>
#include <string.h>
union My_Union
{
int variable_1;
int variable_2;
};
struct My_Struct
{
int variable_1;
int variable_2;
};
#include <threads.h>
#include <stdio.h>  /* perror */
#include <stdlib.h>
void destroyBig(void) {
free((void*)Big);
}
void initBig(void) {
// assign to temporary with no const qualification
double* b = malloc(largeNum);
if (!b) {
perror("allocation failed for Big");
exit(EXIT_FAILURE);
}
// now initialize and store Big
initializeBigWithSophisticatedValues(largeNum, b);
Big = b;
// ensure that the space is freed on exit or quick_exit
atexit(destroyBig);
at_quick_exit(destroyBig);
}
The once_flag is used to coordinate different threads that might want to initialize the same data Big. The call to
call_once guarantees that initBig is called exactly once, and that its effects are visible to every thread whose call to
call_once has returned.
Besides allocation, a typical thing to do in such a once-called function is the dynamic initialization of thread control
data structures such as mtx_t or cnd_t that can't be initialized statically, using mtx_init or cnd_init, respectively.
struct my_thread_data {
double factor;
};
int my_thread_func(void* a) {
struct my_thread_data* d = a;
// do something with d
printf("we found %g\n", d->factor);
// return a success or error code
return d->factor > 1.0;
}
return 0;
}
thrd_join(thread, &result);