Gerdelan Anton - Professional Programming Tools For C and C++ (2020)
https://archive.org/details/professionalprogO000gerd
Anton Gerdelan
Illustrated by Katja Zibrek
Professional Programming Tools for C and C++
ISBN: 978-1-5272-5848-8
First printing
About the cover: The cover image was the result of a short discussion between
the authors. “Most programming books have a picture of mountains or animals.
We should just do a cartoon of ourselves, or a startled beaver.”
Contents
Preface
About this Book
Assumed Knowledge
Code Examples
Reading Recommendations
4 Interactive Debuggers
Why Use Interactive Debuggers
Some Interactive Debuggers
How to use a Debugger
Quick Command-Line Backtrace
Using Core Dumps and JIT Debugging
Tips and Common Problems
5 Performance Profilers
Why Use Profilers
How to Analyse a Program
Manual Instrumentation and Sampling
Gprof
The Flat Profile
The Call Graph
How to use the Profile to Optimise
Other Profilers
Very Sleepy (Windows)
Perf (Linux)
Remotery
Microsoft Visual Studio (Windows)
Instruments (macOS)
Tips and Common Problems
6 Build Systems
Why Use a Build System
Catches with Build Systems
Choosing a Build System
Linking a C Program with Multiple Source Files
Linking a Library with a C Program
Make and Makefiles
Efficient Builds
Meta-Build Systems
Tips and Common Problems
10 Asm Inspection
Why Inspecting Asm is Useful
Assembly Concepts Overview
Compiler Assembly Output
Compiler Explorer
How to Use Compiler Explorer
Recommended Reading
Tips and Common Problems
Acknowledgements
This book presents a quick start to a full range of tools you can use for
programming and shipping quality software written in the C or C++ programming
languages. Each chapter addresses an important program development task,
and introduces tools for completing the task on all the major desktop operating
systems. We try to minimise the discussion, and get you started right away with
practical instructions, adding hints and tips for common issues at the end of each
chapter.
C and C++ programming languages are very powerful, but have some memory
and security vulnerabilities that require special attention. This book will get the
reader up to speed with standard programming tools used with many languages,
but also with specialist tools like fuzzers and static code analysers that are
particularly useful for finding flaws in C and C++ programs.
Rest assured that you may use many of the same techniques from this book in
other programming languages, including web technology, with some slightly
different tools and functions. In any case it is beneficial for all programmers to
spend some time with C, reasoning about memory allocation and memory access
patterns. These are important skills with respect to performance for modern
computers in any programming language, but can be abstracted further from the
sight of the programmer in other languages.
The last content chapter, “Argh! My New Job is on Linux! - Unix Tools”, is not
specific to C or C++ but is an increasingly common scenario for coders in a
modern office. C is the language of Unix, and Unix-derived systems have always
supported C and C++ development. C++ is very popular in the 3D graphics
world, where coders have mostly been based on Windows. These days we’re
seeing a lot of graphics on embedded devices and AR/VR headsets, which often
run on a little Linux device. We also see teams commonly using a variety of
virtualised or Docker-container Linux systems to distribute code or run servers. It
felt right to include a chapter on Unix tools here, with a focus on monitoring and
getting the most out of your C programs.
We challenged ourselves to keep this book short, so had to leave a lot of useful
topics out. In particular, it would be really valuable for those entering industry
roles to follow a "day in the life" view of how a programmer in industry works in a
team, their collaboration tools and processes, version control tools such as Git,
reporting tools, and how we are expected to work with designers, QA, and
product teams in a modern office. Look out for future volumes!
The basics of programming with C and C++ are not covered in this book. You
should already be able to write at least a few simple C programs with loops,
arrays, printf() and text file output, and know the basics of dynamic memory
allocation. You should know that
int* my_pointer = malloc( 16 * sizeof( int ) );
allocates heap memory large enough to store 16 integers, and stores the address
of the first byte of that in my_pointer.
The compiler is the most useful C programming tool of course, but I expect you
already know how to compile and link a simple program with your chosen
compiler, e.g.
gcc main.c
Even if you're using a different compiler with different settings and flags, it's a
helpful competency to also know the basics of GCC or Clang. GCC and Clang
are used everywhere in the industry and almost everywhere to compile free and
open source software; which is in turn used by non-free industry software
extensively. You will certainly need to compile some of it in your career.
If you're not comfortable with these topics yet then I suggest the following might
be a useful start:
The good news is that C is quite a small language that doesn't take long to learn.
If you're coming from another programming language this should be quite
do-able.
Code Examples
Rest assured that the tools described in this book apply equally well to C and
C++. To keep example snippets fairly generic to versions of both languages the
code examples in this book are written in C99, without relying on any features
that diverge from C++. This should compile as C or C++ on most compilers
without any special compiler flags, and without any major code changes.
If you need to use C89 you probably already know how to change the single line
comments and declare-anywhere variables to suit your compiler. C++ users may
wish to explicitly cast the type of the pointer returned by malloc(), or may prefer
to use new instead.
int* my_pointer = (int*)malloc( 16 * sizeof( int ) );
Avoid copy-pasting code from book examples when you're learning - you are
much better served rewriting it all yourself in your own style. To see how each
snippet may fit into a small, complete, program, there is a repository of code
examples at https://github.com/capnramses/pro_programming_tools_c_cpp.
Reading Recommendations
If you like short, useful, programming books similar to this, but with a creative
task:
• Ray Tracing in One Weekend (Ray Tracing Minibooks Book 1), by Peter
Shirley. Digital. 2016.
• Make Your Own Neural Network, by Tariq Rashid. CreateSpace, 2016.
ISBN-13: 978-1530826605
If you’re interested in the Deep Learning boom Make Your Own Neural Network
is a pretty easy practical how-to guide to writing a handwriting recognition
program. It’s Python-based, but I was able to pretty easily write a C version
based on the Python examples, which is on my GitHub
https://github.com/capnramses/neural_net_handwriting.
For C Programming:
These books are quite old now. I mostly use C99, which has a few convenience
features over the venerable C89 (AKA C90). C11 is also used, but not quite as
widely supported across compilers. I suggest supplementing the above books
with some quick reading over the new features in C99 and C11. Thankfully C is
pretty small so that’s only a few minutes of reading!
1 Coding Assertively with
Assertions
Assertions are a basic run-time test for program state correctness, and can save
you a lot of time. If given an expression that resolves to false, an assertion prints
the file name and line number of the failing call, and raises SIGABRT. This
deliberately crashes the program in C, but it can also be intercepted by a
debugger to help analyse the cause of failure. If the expression resolves to true
then it does nothing.
• Test your logical assumptions when writing code, and catch incorrect or
unexpected state.
• In-code assertions are very time-effective and flexible to write during
coding.
• If the program crashes it forces you to get the problem fixed right away.
Fixing bugs before writing more code is always a good habit for quality of
work.
• Continuing a program with out-of-control state can lead to very complex
bugs later in execution, that take longer to diagnose.
• They are especially helpful when learning, for testing your assumptions
about how things actually work, getting each concept right before moving
on.
• Can be removed from release builds. In C or C++ define NDEBUG to
remove all the assertions from a build.
• Can catch unexpected changes elsewhere in the code that break your
code at a later time.
• When used aggressively, can be used as an alternative technique to
formal test programs and test-driven development (TDD). This can be very
handy for tests in code that is iterated on and changes frequently, where
updating formal test frameworks would slow you down.
How and When to Use Assertions
1. Include <assert.h>
2. Call assert( your_boolean_variable_or_expression_here );
3. Run your program, and check the printed output for abort reports.
4. Use a debugger to trace the stack of function calls and parameter values
that lead up to the assertion failing.
A good strategy is to test the inputs of each function for validity, e.g. NULL
pointers, or check if values used to index arrays will be within bounds. As a rule
of thumb, any code where you think...
“Argh! If this variable is ever negative / NULL / too large / false, the code will
break!”
...then it is a good place to guard with an assertion. The assertion might pass for
now, but if you leave it in, the assertion can catch a breaking change to the code
at a later date.
Assertions are not appropriate when your program should gracefully handle an
issue and continue running. This usually includes attempting to load a file, or
occasions where reporting an error to the user is more appropriate.
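#include <string.h>
#include <stdio.h>
#include <assert.h>

// Assumed implementation: shortens str by writing a terminator at new_length.
void my_str_truncate( char* str, int max_length, int new_length ) {
  assert( str );
  assert( new_length >= 0 && new_length < max_length );
  str[new_length] = '\0';
}

int main() {
  char name[10];
  strncpy( name, "anton", 10 );
  int new_length = 3; // try changing to a number >= 10
  my_str_truncate( name, 10, new_length );
  // should print "ant"
  printf( "truncated: %s\n", name );
  return 0;
}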
Unit Testing and Assertions
You may also use assertions to unit test for correct outputs of your functions in a
test program. A unit test typically checks a set of inputs for a unit of code versus
known correct outputs. Our “unit” is a function here.
void run_tests() {
  int a_inputs[3] = { 0, 100, -100 };
  int b_inputs[3] = { 0, -200, 200 };
  int expected_outputs[3] = { 0, -100, 100 };
  for ( int i = 0; i < 3; i++ ) {
    int result = my_adder( a_inputs[i], b_inputs[i] );
    assert( expected_outputs[i] == result );
  }
}
To have a unit testing assertion that collects statistics and write reports out on
tests passed without stopping the program, you could write your own macro, or
you can use a unit testing framework:
Test frameworks require additional boiler-plate code, and so require a bit more
time investment. Usually similar tests need to be grouped into suites, using some
macros in your test program. These tools can be integrated as part of an
automated build process for release builds.
Tips and Common Problems
For best effect assertions should be used in combination with other testing and
debugging tools. When combined with an interactive debugger, a back-trace of
functions called, and logging and printing of errors and warnings, a programmer
can very rapidly diagnose, follow, and reason about problems in even very
complex code with many people contributing to it.
• For small projects assertions may be all you need, but for production code,
assertions should not be your only code tests. It is common to see studios
pair heavy use of in-code assertions with an automated “smoke & build”
test, or automated fuzz testing.
• When an assertion fails it also prints the expression that failed. You can
exploit this to add a string of text to the output.
• You can define NDEBUG when compiling to remove all the assert calls from
a build, e.g.
gcc -DNDEBUG main.c
• Don't call any functions, or put any code that changes program state,
inside the assertion: e.g. assert( render() ). Release builds often
define NDEBUG, which will remove the assertion macros, including any code
called inside assert() parentheses!
• Code inside an assert expression can also cause a Heisenbug - a bug that
goes away when you run a debugger to find it - when NDEBUG is not set.
• If you want a special assert() that also works reliably in release builds
you can create your own macro that tests an input and calls abort().
I have even heard of studios that leave assertions in shipped code, to force
bugs to be found and fixed immediately, and not allow an app to get into an
unstable state.
• A custom assert() can be used to programmatically trigger debugger
breakpoints to help you analyse the failing condition in a debugger. The
method for doing this varies per debugger and platform.
• A custom assert() macro can also print or log more detailed information,
such as a backtrace of function calls to help you diagnose and reproduce
a reported crash very quickly.
• Many IDEs (Integrated Development Environments) will recognise the
printed output of assert() and let you click on it to jump to the file and
line.
• Most fuzz testing tools will count assertions as crashes to fix. In this case
you need to gracefully handle any invalid inputs without crashing.
• CMake builds can quietly enable NDEBUG on release builds without you
asking for it. If your assertions aren't working, maybe that's why.
• Some languages have a "static" version of assertions for compile-time
checks. C11 and C++11 have static_assert(). These assertions are
useful for ensuring the size of different types or the values of enumerated
types.
2 Writing Out an Image File
We typically work with images held in blocks of bytes in main memory. Think of
an array. Each pixel, if a coloured image, is typically represented by 3 bytes - a
first byte representing the red colour contribution, a second byte representing the
green colour contribution, and a third byte representing the blue colour
contribution. This is called RGB (red-green-blue) format, and we can say that it
has 3 colour channels. If each colour channel has 1 byte then there are 256
possible shades (values 0 to 255) of each, giving a combined 24-bits per pixel
colour description. There are many other colour arrangements. Greyscale may
have 1 channel and 1 byte. There may be an additional alpha channel,
representing opacity, or some other special property of the image. BGR images
have the same 3 channels but in reverse order.
Figure 2.1. The 6 RGB pixels in this image require 18 bytes of memory - 3 bytes
per pixel.
Given the number of channels n_channels, and the width and height of the
image we want to work with, we can allocate memory for it. To store 1 byte
in C you can use unsigned char, or uint8_t if you #include <stdint.h>.
You can alternatively use calloc(), which also initialises all the memory
allocated to zero. That’s handy for images because it sets your initial colour to
black.
Images start counting their pixels usually in either the top-left or bottom-left
corner, and if you follow the memory order, you will write the 3 colour bytes for
the first pixel, then the 3 colour bytes for the pixel to the right, and so on, until the
end of the row. Then starting again at the left-hand side of the next row. This is
essentially one long 1D array.
To retrieve the index of a particular pixel in our image memory, given by an (x,y)
coordinate in the image, we can get an index into the memory with:
int pixel_idx = n_channels * ( y * width + x );
For an RGB image this gives us the index of that pixel's red channel. To get the
green channel byte's index add +1, and +2 for blue.
I've used hexadecimal numbers here for convenience, but you can also use the
decimal numbers. Just remember that each channel is only one byte, so only
decimal values 0-255 (00-FF in hexadecimal) are valid. You may have just had an
"Aha!" moment thinking about how HTML colours are expressed?
With this knowledge we can draw into the image we keep in memory, at any pixel
we wish. You might find it convenient to write a function similar to:
Output a File
• We can dump the image memory out "as is" as a raw image file. But
viewing software doesn't know the dimensions or channel count.
• We can use a simple, well-known file format to add this information ahead
of the data. PPM (“Portable Pixel Map”) is one of the simplest that is
supported by viewers.
Let's use PPM. It has a typical structure of a header containing the file type and
dimensions and then a body containing the image data. The header is in ASCII
(American Standard Code for Information Interchange) text. That means when
we write P6 at the top it's 2 bytes. The ASCII code for 'P', and then the ASCII
code for '6'. Not the value 6! This is a convention with binary files, 2 or 3 ASCII
code bytes at the start to indicate what type of file it is.
Figure 2.2. The PPM P6 format has an ASCII header (strings) and a binary body
(our image data array).
binary, we just need to plop that directly into a file for PPM.
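#include <stdio.h>
#include <stdlib.h>

int main() {
  int width = 256, height = 128, n_channels = 3;
  unsigned char* image_ptr = calloc( width * height * n_channels, 1 );
  if ( !image_ptr ) { return 1; }
  FILE* f_ptr = fopen( "out.ppm", "wb" ); // assumed file name
  if ( f_ptr ) {
    fprintf( f_ptr, "P6\n%i %i\n255\n", width, height ); // ASCII header
    fwrite( image_ptr, 1, width * height * n_channels, f_ptr ); // binary body
    fclose( f_ptr );
  }
  free( image_ptr );
  return 0;
}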
Drawing Lines, Shapes, Gradients, and GIFs
Figure 2.3. We can add a few functions to create quick visualisations or chart
data outputs.
To draw gradients or other interesting patterns you can use linear interpolation
(lerp) or sine wave functions, using the coordinates of each pixel as inputs. To
generate a rainbow of colours I used a lerp() function where the red value of
each pixel was 0xFF at x == 0, down to 0 at x > width / 2. Green ramped up
to the middle of the image, then down again, and blue was zero if x < width /
2, and after that ramped up to 0xFF at x == width.
If you have a series of images, such as the steps in a sorting algorithm, and you
number the filenames, then you can combine them into an animated GIF file or
video using GIMP or ffmpeg.
Other Image Formats
Tips and Common Problems
• If you can't open a PPM file with your default system image viewer, open it
with GIMP or IrfanView.
• "But how do I debug a binary file? We should have just used the ASCII
version!". And now I reveal the genius of my plan! This was a lead-in to
talking about another great debugging (and hacking) tool in the following
chapter - hex editors!
• Test your image code on tiny 2x2 or so images first - it's much easier to
reason about, and debug, small problem domains. Scale up later.
• Many image viewers will upscale or zoom small images for viewing with
added anti-aliasing effects, which can make your pixel-exact images look
blurry. If in doubt, open in an image editor such as GIMP.
• When finding the index of a pixel in memory, it's easy to forget to multiply
by the number of channels, and modify the wrong byte.
• It's also easy to forget the number of channels when modifying and make a
similar mistake, or extend outside of your memory bounds. I use the
n_channels as a variable, rather than hard-coded numbers, wherever I can
to mitigate this mistake.
• When working with memory and pointers you may get some interesting
run-time errors and memory segmentation violations - the program can
throw SIGSEGV and crash. It's a good idea when testing to deliberately
make mistakes with memory bounds to see what happens (and find out
what will work without warning you), even when it's incorrect and unstable.
• Always initialise all your variables to something. I used calloc() instead
of malloc(), to explicitly zero-out my allocated image memory. If you
forget to do this it may just so happen to be zero when testing and work on
your machine, but will break as soon as your friend tries running it. In our
case this might mean that unexpected colours will be written into the image
where we didn't explicitly set any pixel colours.
• It's easy to mix up ASCII character codes representing numbers, and the
numerical values those characters display. You'll run into this when storing
the width and height values in your file. These are stored in ASCII in PPM,
even in the binary version of the format, which is unusual. 42 in ASCII is
represented by two bytes - the byte for the character 4, and the byte for the
character 2. The value 42 itself, however, only requires 1 byte to store in a
binary file.
• If you write out a numbered sequence of images, at each iteration of your
simulation, for example, then you can easily compose them into an
animated GIF (Graphics Interchange Format) or movie file. This is a great
way to visualise your program, debug it, and demonstrate it.
3 Binary Files and Hex Editors
If you’ve just written a PPM image out in binary format, then a hex editor is your
text editor equivalent for inspecting the content of the binary file. You can find
problems writing files, or inspect a new type of file to figure out its structure for
reading it.
As an example, storing the integer 1048576 in ASCII requires 8 bytes - one byte
for the ASCII code of each character, plus one byte at the end for the space to
separate it. Storing the number in binary, however, can be done with the 4-byte
value of the original integer.
Representing floating point values in ASCII files is generally not a good idea
because it also introduces issues with preserving precision.
Why use a Hex Editor
Hex editors are particularly useful when writing a program to read a binary file
into the program’s memory. You can inspect the byte values in the file, and
compare that to your assumptions about the data layout.
Figure 3.1. Our binary PPM file opened in a hex editor. Byte values are shown in
the middle columns. Row addresses are shown on the left. Any bytes that could
be ASCII chars are displayed in the right-hand column.
Hex editors can also be used to hack a compiled program. Code is also data. If
you have hard-coded strings in your compiled code they will show up in a hex
editor, and can be easily modified to achieve all sorts of interesting effects.
Most hex editors will display each byte in the opened file as a hexadecimal code -
00 to FF. These are usually separated by spaces and grouped in columns and
lines to make it easier to read.
We usually have a left-hand column showing the offset or address of the byte at
the start of each line. This is usually also given in hex, and basically takes the
role of a scrollbar to show you where you are viewing in the file.
Any byte values that correspond to printable ASCII values (hex values 20 for
space to 7E for ‘~’) are usually shown in a column to help you find the location of
strings in the code. Other values are shown as dots. Note that this doesn’t mean
the bytes are actually ASCII characters - just that the values are in that range.
You can also search for ASCII strings or byte sequences in the file.
In the above image (Figure 3.1) you can see that the ASCII header bytes at the
start of the binary P6 PPM file are displayed as characters:
P6.256 128.255.
Note that any line feed (0A) characters are shown as dots, but the space (20) is
represented as a space. We recall that this gives us the type of PPM - P6, the
width of the image - 256 pixels, and the height - 128 pixels, and maximum colour
channel value of 255, or FF in hex.
The rest of the file - the binary body - gives sequences of 3 bytes for RGB pixel
colours. Our first colour sequence is FF 00 FF - full red, no green, and full blue (a
bright purple pixel). If we open that image in an image editor it can be confirmed.
Figure 3.2. Opening the PPM image in an image editor such as GIMP will show
that our assumption about purple pixels was correct.
This kind of process gives us a good basis for confirming that the file has been
written correctly and has a structure that agrees with our assumptions. For
reading and writing images and the like, it’s often a good idea to create a test file
with one red pixel at the top-left. This can tell you if you are reading the data from
the correct location, or are somehow offset and there is more or less header than
expected. You can do similar things for audio files.
You can also directly modify byte values with a hex editor. You can change the
first body byte from FF to 00 to get a blue pixel in the top-left. This can be helpful
for debugging, and sometimes is the easiest way to quickly patch small issues in
software when the source code is not available.
In a Gamasutra article More dirty coding tricks from game developers by Brandon
Sheffield (July 24, 2015), Ken Demarest recounts:
Back on the first Wing Commander we were getting an exception from our
EMM386 memory manager when we exited the game. We'd clear the screen
and a single line would print out, something like EMM386 Memory manager
error. Blah blah blah.
We had to ship ASAP, so | hex edited the error in the memory manager itself to
read:
Thank you for playing Wing Commander.
How to Work with Binary Files
• Usually 2 or 3 ASCII bytes at the start of the file to indicate what type of
file it is. This is sometimes referred to as a file’s “magic number”. Binary
PPM has P and 6.
• A well specified (in a document or website) header with fixed sizes for
each variable.
• The amount or count of data to read from the body.
• The size of memory to allocate is given in the header.
If this is all true then we have a very easy process for reading the file:
Parsing a Binary PPM File
If we want to create a function that can read our binary PPM file it’s not
particularly difficult. Unfortunately binary PPM has an ASCII header, which
means it has a varying number of bytes to represent numbers, and might have
additional carriage return bytes at line endings. It’s then easier to use typical
ASCII file or string parsing functions to read this part of the file.
Avoid using %s to read strings as they have a risk of buffer overruns. | read
exactly two %c characters. Avoiding buffer overruns is one reason why binary
headers with fixed-sized fields are preferable.
Next we can determine the pixel data size in the body, allocate memory for it, and
read it in.
The PPM file and the pixel data is now in our program’s memory and we can use
it.
Reading Entire Binary Files
In some binary file formats the specification will give fixed byte sizes for each
header variable. In that case it’s convenient, and quicker, to read the entire file
into memory first, then retrieve the header and the body from that memory.
Validation is left off the following snippet to keep it short.
struct file_record {
void* data;
size_t sz;
};
Structs and Memory Alignment
The process of converting flat data, such as the record we have loaded from a
file, into data structures in our program is called deserialisation. When writing a
file we do the opposite - we convert our data structures to a 1D “array” of data
(put it in series). This is called serialisation.
If we have a binary file loaded into memory, and the header is well specified, we
can create a struct to mirror the header format. Then we can point a struct pointer
at the binary data to conveniently pull out variables. BMP image files do have a
fixed-sized header.
If you look up the BMP file format specification (Wikipedia has a good article) you
can see we would write a bmp_file_header struct with exactly 14 bytes of
variables. Offsets and sizes of variables are given in hex first, which gives you a
hint of which tool we might use when working with this.
This looks pretty similar to the PPM header, except it has exact sizes for
unsigned integers, and we also have some unusual preprocessor instructions.
Compilers can add blank memory padding between variables in structs to align
the contents in memory. The compiler may add bytes such that struct variables
align to 4 or 8 bytes of memory, for example. The amount of padding introduced
depends on the compiler, the struct layout, and the compiler flags.
Any padding would break our struct pointer idea - the variables would be cast in
the wrong places because the file’s memory layout doesn’t have this padding.
You can add a compiler flag to set struct packing, or you can add a hint in
the code. The #pragma pack( push, 1 ) and #pragma pack( pop ) tell the
compiler “Don’t add any extra padding to structs, unions, or classes inside
here!” These preprocessor instructions should work on recent versions of all
compilers the same way.
With padding disabled we can also fwrite() a struct to a binary file. If it doesn’t
quite line up correctly we would bring up our hex editor and see where our struct
variable sizes disagree with the values in a correctly laid out example file. This is
a pretty typical method for creating file readers and writers, and a common use of
a hex editor.
Endianness
Tips and Common Problems
e Parsing a file part-by-part with many disk access functions can be quite
slow, but sometimes it is still the most convenient option.
e You can read the entire file to memory with one fread(), and parse from
memory instead, which can be very convenient for some binary file
formats, and is usually faster.
e t's also possible to map a file’s disk memory into the program’s memory
space. This requires operating system-specific calls such as mmap() on
Unix systems. This can be faster than standard file reading operations.
e Many commonly-used file formats are very poorly designed, and have
ambiguous specifications, specifications that don’t match common use,
many diverging versions of the format to handle, excessive file size bloat,
or are otherwise inefficient to parse.
e Reading and writing structs to and from files is unreliable unless you
disable struct padding (byte alignment) when compiling.
e Using disables struct padding but breaks the
alignment of your variables on memory. This may adversely affect the
performance of your program on some CPU architectures or cause
unexpected behaviour. If you intend your code to run on more than typical
desktop processors, then this may not be a suitable option for you.
e If is unsuitable, the of fsetof() macro can be used to
retrieve the byte offset of a named variable within a struct.
e To get integers with exact bit sizes like uint8_t include the stdint.h
header. This is commonly used in professional software to ensure that the
size of the integer or data type used by the computer system matches the
size used in the file. It’s a good idea to replace all your integer data types
with explicit sizes where they are going to be serialised or deserialised to
and from files.
e Each file access function returns a value that should be validated - the
number of variables scanned, the number of bytes read, etc. Your program
40
should be able to gracefully handle a corrupted file, or a file that was
moved or deleted whilst being accessed.
File reading and writing functions, especially offset into allocated memory,
are a major reliability and security vulnerability of C and C++. You should
consider using a fuzzer (see Chapter 9 - Fuzz Testing with AFL) to find the
weak points in your file parsing routines.
To test your image file functions you could modify the pixel memory and
write out a new file. Does it open correctly in a range of image editors? Do
the image file sizes match? Can you also read images created by other
software?
You can use the Unix cmp command-line utility to check if two binary files
are the same. This is useful for testing that read/write function pairs are
consistent.
4 Interactive Debuggers
Figure 4.1. A typical interactive debugger lets you pause your program at
breakpoints, step through code one line at a time, and use a watch list to see
your variables change value.
It's a good exercise to be able to walk through a segment of your code on paper
to make sure you understand how the values in variables change after each
instruction. This is what an interactive debugger does, except it works on your
program while it is running, and gives you the exact values stored in variables at
each step.
• Step through someone else’s spaghetti code to help learn how it works.
• Step through your own spaghetti code to help learn how it works.
• Walk through critical sections of code to inspect for inefficiencies or risks.
Can you do this by sprinkling printf()? Yes, it can be quicker sometimes, and it
turns out some very experienced programmers don’t like interactive debuggers,
although printf debugging can get very tedious. If you haven't learned how to
use an interactive debugger yet - do so - more tools means more options for you,
and they can be extra helpful for people learning. It’s still useful to be able to get
a backtrace of your program after a crash, and we will look at how your
debugger can do this too.
• GDB is the GNU project debugger (pairs with GCC) - very commonly
used on GNU/Linux.
• LLDB is the LLVM project debugger (pairs with the Clang compiler) - the
default on Apple.
• The Microsoft Visual Studio IDE has a very powerful integrated debugger.
LLDB has an interface almost exactly the same as GDB, just as Clang mirrors
GCC’s interface.
If you enter gdb or gcc in a macOS terminal, after you install Xcode, it will
actually run lldb and clang, respectively.
Figure 4.2. Microsoft Visual Studio Code using its C/C++ plug-in to act as a front
end to the GDB debugger on a Linux system. It supports several debuggers and
platforms.
GDB and LLDB have interactive command-line interfaces. You can do everything
in the terminal if you like, sometimes that’s convenient, but usually it’s much
easier to use a graphical front-end to the debuggers. Microsoft's open-source
Visual Studio Code editor has a decent GDB/LLDB front-end, and it is my main
debugging tool on all platforms these days. Xcode, other IDEs, and some
stand-alone tools also give you a debugger front-end, although many are not very
fast or reliable.
How to use a Debugger
Figure 4.3. Click in the margin to set breakpoints at lines in your code. Use
debugger controls to start debugging. Execution pauses on breakpoints.
3. Most IDEs will have a big green arrow button, a button with an icon of a
bug, or a Debug menu that gives you the option to run your program in
debugging mode. Visual Studio Code will ask you to tell it the path to your
debugger and your program in a config file first. Hit the button! Your
program should run as normal, and pause at the line of your breakpoint,
without executing the instruction.
4. When paused
a. You can usually hover your mouse over variable names to see
what value they currently hold.
b. You usually get a panel of in-scope local variables and their
current values.
c. You can see the stack of functions currently open in a stack trace
window.
d. You can right-click a variable name in your code, and add it to a
watch list, to track the value it holds.
e. You can step over to execute the current instruction, and pause
on the next instruction.
f. You can step in to have the debugger step to the first instruction
inside a function call.
g. You can step out to continue execution and pause on the
instruction after the current function call.
h. You can unpause and continue execution until the next
breakpoint is hit.
5. When not paused you should have a button to pause execution. This can
be handy to somewhat randomly find functions that consume a lot of
processing time, and start debugging them.
6. If the program crashes the stack trace gives you the backtrace - a trail of
fingerprints (function calls) you need to reconstruct the crime scene, and
find out what went wrong. If it’s not clear, you can start adding
breakpoints and assert() statements to catch your problem on a
subsequent run.
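As a small illustration of step 6, an assert() placed just before a suspect dereference stops a debug build at the first point where the assumption is broken, rather than some time later. The function name read_value() is hypothetical, and this is only a sketch of the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. In a debug build the assert halts the program, with
   the file name and line number, if a caller passes a NULL pointer. */
int read_value(const int *ptr) {
  assert(ptr != NULL && "read_value() was given a NULL pointer");
  return *ptr;
}
```

When run inside a debugger, a failed assert pauses execution with the full stack trace available. Note that assert() is compiled out when NDEBUG is defined, which is common in release builds, so it should not be the only check on data that comes from outside the program.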
Quick Command-Line Backtrace
If your program crashes when you're not using an IDE, you can run e.g. GDB on
the command line, and deliberately try to crash it again.
gdb ./my_program
(gdb) run
Program received signal SIGSEGV, Segmentation fault.
function_that_crashed () at .\main.c:6
6 int value = *ptr; // should crash here.
(gdb) bt
#0 function_that_crashed () at .\main.c:6
#1 0x0000000000401589 in some_intermediate_function () at .\main.c:11
#2 0x00000000004015a2 in main () at .\main.c:15
(gdb) q
If your program takes command line arguments, put them after ‘run’ in the GDB
terminal.
The backtrace printed by the bt command gives the path taken through the
program, with function names, file names, and line numbers, that lead up to the
crash. This is often all you need to find a fail point in the code.
You can also print the source code near the crash with list.
And print the values of any in-scope variable with p my_variable_name.
Within GDB, type help for more commands.
Outside of GDB type man gdb to access the manual.
Using Core Dumps and JIT Debugging
The down-side to the above method of getting a backtrace is that you need to
successfully crash the program again.
With Visual Studio you can enable JIT (Just In Time) Debugging in the project
properties settings. When the program crashes it will pop up a dialogue asking if
you'd like to jump to a debugger. This takes you to the line that crashed, where
you'll have a stack trace and can inspect the variables at the time of the crash.
That should give you enough detail to find most crashing bugs.
On Linux and macOS you can enable core dumps in the terminal. When a
program crashes it will then write a file containing all the program data at the time
of the crash. By default the size limit for crash dumps is set to 0. To find the
current limit enter ulimit -c. Set this to unlimited to enable core dumps.
ulimit -c unlimited
These can be very large files. You'll need to clean them up or disable core dump
writing when you're no longer testing programs.
$ ./my_program
Segmentation fault: 11 (core dumped)
On macOS these files are written into /cores/, and you can start a GDB or
LLDB session with your core dump. My core dump was called
/cores/core.4287. Once in the debugger you can get a backtrace. It will also
tell you the signal fired when the program crashed.
(lldb) target create "./a.out" --core "/cores/core.4287"
Core file '/cores/core.4287' (x86_64) was loaded.
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x000000010d33ef3f a.out`function_that_crashed at main.c:43:14
    frame #1: 0x000000010d33ef69 a.out`some_intermediate_function at main.c:48:2
    frame #2: 0x000000010d33ef84 a.out`main at main.c:52:2
    frame #3: 0x00007fff7ad253d5 libdyld.dylib`start + 1
    frame #4: 0x00007fff7ad253d5 libdyld.dylib`start + 1
Figure 4.4. Typing bt in my LLDB session gave me the backtrace. I get line
numbers where it crashed too, since my program was built with debug symbols.
We can use the debugger to get more information from the core near the crash.
We can also look at the source code of the crashing function, and get the values
of variables to find the cause of the crash. The list command prints source code.
With GDB we can use print ptr to inspect the value of ptr, which should
reveal the address 0x00 - or NULL.
(lldb) list function_that_crashed
File: /Users/anton/projects/howto_pvt/book_testing/04_deliberate_crash/main.c
#include <stdio.h>
function_that_crashed() {
value = *ptr; // should crash here. dereferencing a null poi
some_intermediate_function() {
function_that_crashed();
}
(lldb) frame variable ptr
(int *) ptr = 0x0000000000000000
Figure 4.5. By listing the code of the crashing function we see a pointer on the
crashing line. Getting the value of that variable we see 0x00, which is a NULL
pointer. Dereferencing a NULL pointer caused our crash.
Tips and Common Problems
• Remember to build a debug version of your program, and run that same
debug version in the debugger, or the tools won't give you much
information. If you see lots of question marks where function names
should be, or the debugger doesn’t stop at a breakpoint - this is probably
why.
• Most debuggers will let you set a conditional breakpoint, that only stops
the debugger if some condition relating to the variables in your code is
satisfied, like number_of_iterations > 10992, to find problems that
happen later in your program. In Visual Studio Code right-click instead of
left-clicking when placing a breakpoint in the margin.
• Some debuggers will also let you set a data breakpoint, that stops the
debugger if a variable from your watch list changes value. The CLion IDE
calls these watchpoints.
• Some debuggers will let you set a logpoint. This is a breakpoint that
prints text instead of halting execution. It can be used to inject a printf in a
program that you are debugging but cannot stop to modify.
• It can be misleading to inspect the value of a variable when it hasn't been
initialised yet.
• If you’ve added an array to the watch list, but it’s only added the first
variable, you should be able to change that to a drop down list of
elements. Each IDE has a slightly different syntax for setting that in the
watch list.
• The Visual Studio debugger lets you “go to the disassembly” during a
debugging session, and step through asm (assembly code)
corresponding to your higher-level code instructions.
• Visual Studio, and some other IDEs, maintain different project settings for
the release and debug builds - you may need to go and replicate some of
your project settings, libraries used, and include paths in settings panels
to get the debug build to compile.
• If you want to debug into libraries you are using, some libraries will supply
debug builds of the libraries. You don’t normally need these to debug your
program.
• If you can’t step in to a function, it’s probably because it comes from a file,
or a library, that was not built with debugging symbols.
• Most watch lists will let you switch display of values to hexadecimal,
which can be useful for some types of data like colour codes.
• You can use most watch lists to convert hex values to decimal for you -
add a hex value, e.g. 0xFF, rather than a variable name, to the watch list.
You can do the same trick to get the ASCII value of a character, by adding a
character literal to the watch list.
• Having a friend or colleague sit with you when debugging a tricky problem
can often be really helpful to spot something you missed, especially if
they ask lots of pesky questions, and you can tell them how you think it
should work.
You can also do this by yourself out loud - it’s called rubber ducking
(explaining your code to a rubber duck on your desk).
• “The debugger skipped over my breakpoint!” This can happen if the
compiler optimised out a chunk of your code that it determined did
nothing, you’re not running a debug build, or the code you’re editing is not
building to the same .exe file that you’re debugging!
• Visual Studio has a full memory inspection panel where you can look up
the address of any variables during debugging and find what values are
set there in hex, and the byte values in memory of anything nearby.
• Some debuggers also provide diagnostic tools that can tell you at any
point the CPU and memory used by the program.
• Debuggers will allow you to attach to an already running program, instead
of launching it from the debugger. If you know the process ID of the
program (see Chapter 13 for how to list process IDs), with GDB you can
attach using
gdb -p 1234
5 Performance Profilers
Some profilers will let you set the sampling frequency. Note that this means a
profile is usually not an exact analysis. Instrumenting your code will also change
the performance of your program, which adds a significant bias to analysis.
Profilers typically record events in a trace over a run, or successive runs of your
program.
• A flat profile - The count, or frequency, each function was called during
the trace, and the average or total time spent in each function.
• A call graph - the paths through functions where most program time is
spent.
You may add timers to your code, before and after a code block of interest. I do
this during graphics and games development for major parts of the update loop,
and display the times on-screen. This lets you see the relative cost of different
parts as your scene changes. I don't do this every iteration, but take a sample on
a fixed interval. You can also log this to a file to build a profile.
The basic timers in C are not very precise, and you need to use an operating
system-specific function to get a reliable high-precision timer.
nanosecond timer. Monotonic clocks are not affected by things like date
or time changes on the system, so are reliable timers.
• On macOS mach_absolute_time() can be used with
mach_timebase_info().
• In C++11 or newer there is a cross-platform std::chrono API which
provides high-resolution timers. The highest resolution clock is accessed
with std::chrono::high_resolution_clock::now(). You can check if
the high resolution clock is also monotonic with its is_steady member
constant. The API has functions for comparing times and returning a result
in seconds or milliseconds.
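A minimal sketch of manual instrumentation in C follows. It assumes a POSIX system that provides clock_gettime() with CLOCK_MONOTONIC; the function names, and the busy_work() loop standing in for a code block of interest, are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* ask for the POSIX clock_gettime() declaration */
#endif
#include <time.h>

/* Seconds since an arbitrary start point, read from the monotonic clock.
   Returns a negative value if the clock could not be read. */
double monotonic_seconds(void) {
  struct timespec ts;
  if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) { return -1.0; }
  return (double)ts.tv_sec + (double)ts.tv_nsec / 1.0e9;
}

/* Hypothetical code block of interest: a pointless loop to occupy time. */
unsigned long busy_work(unsigned long iterations) {
  volatile unsigned long sum = 0; /* volatile so the loop is not optimised away */
  for (unsigned long i = 0; i < iterations; i++) { sum += i; }
  return sum;
}

/* Reads the timer before and after the block and returns elapsed seconds. */
double time_busy_work(unsigned long iterations) {
  double start = monotonic_seconds();
  busy_work(iterations);
  return monotonic_seconds() - start;
}
```

In an update loop you might call something like time_busy_work() only on every hundredth frame, and print or log the result, so that the timing itself adds little overhead.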
Similarly, you can occasionally pause your program during debugging to see
where it stops. The most frequently used function bottlenecks may be revealed.
Gprof
Gprof is one of the oldest and most widely used profilers. It is a hybrid
instrumenting/sampling profiler, and pairs with GCC and Clang. If you are using
C or C++, start with Gprof as a first profiler, because it’s quite good, and also
introduces you to some of the issues with profiling.
1. Ask GCC or Clang to instrument your code by providing the -pg flag to
the compiler.
2. Run your instrumented program, which writes a profile log when it exits.
./my_program
3. Run gprof on the log to produce results tables.
In the example output below, I have a profiled program that contains functions
that call other functions, with some pointless loops to occupy time. The functions
are called parent(), child(), grandchild(), and great_grandchild().
The flat profile lists all the functions called in your program by the time spent in
each, individually and cumulatively.
Figure 5.1. We can see that great_grandchild() is by far our most expensive
function with 91% of program time spent in it, and it was only called once.
The Call Graph
The Call Graph helps spot expensive chains or graphs of functions calling
functions, that together add up to a high cost.
[Figure: annotated Gprof call graph output. The annotations mark our function,
the indented functions it calls, the most expensive chain of functions
(including time in the functions it calls), and the most expensive function on
its own.]
Other profilers have a similar process and have standardised on the same sort of
profile output as Gprof. Gprof’s instrumented builds slow down your program
considerably, which can limit its usefulness for some cases.
How to use the Profile to Optimise
We can try to optimise the performance bottlenecks that show up at the top of
the flat profile and call graph.
Inlined code is duplicated in your compiled program, rather than being reused in
a function call. This avoids function call overhead, and can be suitable for very
small functions that are used everywhere. C++ and C99 have the inline
keyword for functions. You can use a macro for one-liners.
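For example, a very small function can be written either way. The names square_i and SQUARE below are hypothetical, and this is only a sketch. Note that inline is a request, so the compiler may still decide for itself.

```c
/* static inline function: type-checked, and its argument is evaluated once. */
static inline int square_i(int x) { return x * x; }

/* Macro one-liner: parenthesise the argument and the whole expression.
   The argument appears twice, so SQUARE(i++) would increment i twice. */
#define SQUARE(x) ((x) * (x))
```

Both square_i(1 + 2) and SQUARE(1 + 2) give 9. Without the inner parentheses the macro would expand to 1 + 2 * 1 + 2, which is 5.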
Some functions do a lot of heavy lifting and need to be time-expensive. Our goal
here is to find code that can be simplified, and check for unexpected costs.
In expensive functions you can think about algorithmic complexity - look for
nested loops.
Figure 5.3. In a monster update function every monster looks at every other
monster. Can we simplify this by only looking at a subset of other monsters?
Perhaps a game's monsters (Figure 5.3) use an update loop over a long list of
n monsters. Monsters are moved in the update function, but shouldn't collide.
Does each monster also look at every other monster? That suggests an O(n²)
complexity algorithm. Can it be reduced to O(n)? Perhaps by checking a much
smaller subset of nearby monsters instead of the entire list of monsters each
time.
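One possible way to pick that subset, shown here only as a sketch, is a uniform grid: sort the monsters into cells, then compare each monster only against the monsters in its own and neighbouring cells. Everything here is an assumption made for illustration: the monster_t type, the grid size, the cell size, and the function names. The grid version only approaches O(n) when monsters are spread out across the cells.

```c
#include <stdlib.h>

#define GRID_DIM 16    /* hypothetical grid of GRID_DIM x GRID_DIM cells */
#define CELL_SIZE 4.0f /* cell width. The check radius must not exceed it */

typedef struct { float x, y; } monster_t; /* hypothetical monster position */

static int is_close(const monster_t *a, const monster_t *b, float radius) {
  float dx = a->x - b->x, dy = a->y - b->y;
  return dx * dx + dy * dy < radius * radius;
}

/* O(n^2): every monster is compared with every other monster. */
int count_close_pairs_brute(const monster_t *m, int n, float radius) {
  int pairs = 0;
  for (int i = 0; i < n; i++) {
    for (int j = i + 1; j < n; j++) { pairs += is_close(&m[i], &m[j], radius); }
  }
  return pairs;
}

/* Maps a coordinate to a grid row or column, clamped to the grid edges. */
static int cell_coord(float v) {
  int c = (int)(v / CELL_SIZE);
  return c < 0 ? 0 : (c >= GRID_DIM ? GRID_DIM - 1 : c);
}

static int cell_index(const monster_t *m) {
  return cell_coord(m->y) * GRID_DIM + cell_coord(m->x);
}

/* Compares each monster only with monsters in its own cell and the 8 cells
   around it. Returns -1 if memory allocation fails. */
int count_close_pairs_grid(const monster_t *m, int n, float radius) {
  if (radius > CELL_SIZE) { return count_close_pairs_brute(m, n, radius); }
  int n_cells = GRID_DIM * GRID_DIM, pairs = 0;
  int *start = calloc((size_t)n_cells + 1, sizeof(int)); /* first slot per cell */
  int *cursor = malloc((size_t)n_cells * sizeof(int));   /* next free slot */
  int *order = malloc((size_t)(n > 0 ? n : 1) * sizeof(int)); /* indices by cell */
  if (!start || !cursor || !order) {
    free(start); free(cursor); free(order);
    return -1;
  }
  /* Counting sort of the monster indices by cell. */
  for (int i = 0; i < n; i++) { start[cell_index(&m[i]) + 1]++; }
  for (int c = 0; c < n_cells; c++) { start[c + 1] += start[c]; cursor[c] = start[c]; }
  for (int i = 0; i < n; i++) { order[cursor[cell_index(&m[i])]++] = i; }

  for (int i = 0; i < n; i++) {
    int cx = cell_coord(m[i].x), cy = cell_coord(m[i].y);
    for (int ny = cy - 1; ny <= cy + 1; ny++) {
      for (int nx = cx - 1; nx <= cx + 1; nx++) {
        if (nx < 0 || ny < 0 || nx >= GRID_DIM || ny >= GRID_DIM) { continue; }
        int cell = ny * GRID_DIM + nx;
        for (int k = start[cell]; k < start[cell + 1]; k++) {
          int j = order[k];
          if (j > i) { pairs += is_close(&m[i], &m[j], radius); } /* count once */
        }
      }
    }
  }
  free(start); free(cursor); free(order);
  return pairs;
}
```

Both functions should return the same count for the same input, which makes the brute-force version a handy test for the faster one. Profile before and after: for a short list of monsters the simple nested loop may well be faster.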
You may also spot expensive code paths in the call graph that have deep call
chains of functions that are not expensive individually. Are there too many very
small functions? Would these be easier to reason about in one, longer function?
Perhaps a small generic function is used everywhere, but using smaller, more
specialised functions can improve overall performance and clarity. Can a
recursive function be replaced with a loop?
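As a small sketch of that last question, here is a recursive function and an equivalent loop. The node_t list type and both function names are hypothetical.

```c
/* Hypothetical singly linked list node. */
typedef struct node {
  int value;
  struct node *next;
} node_t;

/* Recursive version: one function call, and one stack frame, per node. */
int list_sum_recursive(const node_t *n) {
  return n ? n->value + list_sum_recursive(n->next) : 0;
}

/* Loop version: the same result with no call overhead and constant stack use. */
int list_sum_loop(const node_t *n) {
  int sum = 0;
  for (; n; n = n->next) { sum += n->value; }
  return sum;
}
```

An optimising compiler may transform some recursive functions into loops by itself, so it is worth profiling both versions before committing to a rewrite.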
Sometimes expensive code paths are through stacks of libraries, and often you
can make the biggest performance improvements to your program by replacing
libraries with small hand-written utility functions.
Sometimes the answer is “No, don’t optimise further!”. Over-optimised code
means
Optimise-or-not decisions
Other Profilers
Very Sleepy lists running processes, and the big advantage is that we can easily
start recording profiling data from a program that is already running. | was able to
attach it to a game project. We can also interactively pause or stop recording a
trace, so that we only record a period of interest to us.
Figure 5.5. After I stopped recording the profile in the previous image Very
Sleepy presents the profile results. It shows the most expensive function in my
trace - apg_pixfont_str_into_image(), and gives a view into the function’s
source code.
If Very Sleepy can find the debug symbols then it presents call frequency, time,
and parent and child calls for each function. We also get the functions and files of
expensive functions in our code as well as a source code view. The ease of use
makes Very Sleepy a very handy tool for Windows programming, without needing
to prepare a special build or instrument your code. I have Very Sleepy pinned to
the Windows taskbar for convenience.
Perf (Linux)
Perf usually requires super user permissions to run. When recording, -a profiles
all CPUs, -g records a call graph that you can interactively browse, and -F 99
samples at 99 Hz.
[Figure: perf report output for the dwarf3d program, listing the percentage of
samples spent in each function and in the functions it calls.]
The report is an interactive multi-view program that can browse the function call
frequencies, and also annotate source code with time percentages for each
function, disassembled to asm.
Perf can also be used to time a process with sudo perf stat, and be run as an
interactive performance monitor, listing the most CPU-expensive functions of
your code currently running, with perf top.
Remotery
• Runs with pretty much every operating system, including mobile devices.
• Easy to build into your project and run.
• Interactive timeline view.
• Can also profile GPU events with several graphics APIs.
• Can debug remote or embedded devices over a network.
Microsoft Visual Studio (Windows)
Recent releases of Visual Studio IDE have a great integrated code profiler tool
set under the Analyze menu. It has all the same profile data as Gprof, but also
provides an interactive report with a Hot Path slice of the call graph to help
highlight bottlenecks. It also has an interactive timeline to filter profile results.
This can be used to help drill down into performance spikes that may cause a
bad user experience.
Figure 5.8. The Visual Studio profiler can interactively filter its profile by a
selection of the trace timeline.
In addition to the integrated profiler, the Intel VTune profiler has been a leading
commercial profiler, and is now available as a free plug-in for Visual Studio.
https://software.intel.com/en-us/vtune
Instruments (macOS)
Figure 5.9. Instruments has recorded all the processes on my system, and I’m
examining the Call Graph tree for the functions in my program (a.out).
If you CTRL+click on the Xcode app icon you can launch Instruments
independently.
• The Time Profiler option gives you a profiler, with a scoped timeline
similar to Visual Studio.
• You can have the profiler record all processes currently running on the
system, including your program.
• You can explore the Call Graph tree to get details of the cost of functions
within your program for a slice of the recorded timeline.
Tips and Common Problems
• If you need finer-grained statistics about the frequency that each code
instruction is called, you can use the GNU gcov tool together with gprof.
Gcov is usually used to produce test coverage reports - you would run it
with your test program to find code not called by the test suites. It outputs
a copy of your source code with an exact count of calls annotated beside
each instruction. https://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html
• The performance of debug and release builds, or at different levels of
compiler optimisation, can vary hugely. Know which version you are
profiling!
• Some instrumenting compilers require special compiler flags in order to
work.
• Blocking I/O operations like print-outs may be excluded from profiler
timing. Testing a profiler on a loop of print statements may produce
unintuitive results!
• Disk accesses can be very time consuming. Can these be reduced,
working from main memory instead?
• Agner Fog has a large collection of optimisation articles and resources.
https://www.agner.org/optimize/
6 Build Systems
Building a program or library from several parts can be a little tricky, and some
projects use a build system to help organise the different parts. Build systems
compile and link a program or library from a list of files and dependencies.
You will come across many build systems in your career, and need to be familiar
with a few. The most portable and longest-living projects tend to be the ones that
require no build system at all.
Choosing a Build System
A bad reason for choosing a build system is to avoid learning how to build
projects from the command line. The command line should always be your first
build system to consider. If your project can easily be compiled from the
command line then anyone can easily drag your source files into their favourite
build system too, which means your code is easy to distribute.
The following sections review building programs from the command line with
GCC, then we will move those examples into a Makefile. Other compilers use
almost the same instructions, with slightly different flags and options.
To create a program from more than one source file you simply add those into
the compilation command.
• One file, and only one file, must contain a function called main().
• Symbols (variable and function names) are considered external by default
during linking, which means they can be shared between files.
• But source files are compiled without knowing about any other files.
• To call a function in main.c that is defined in second.c you need to add a
declaration of that function at the top of main.c. The compiler then trusts
that the linker will hook the call up to the definition later.
• The linker will fail if function names and global variables have the same
name in different files: “multiple definition of ...”. To keep a symbol
internal or private to a translation unit (compiled source code file) so this
doesn't happen, put the static keyword in front of it. This is a good
default to add for all functions and global variables that you don’t intend to
share between files.
• Header files are usually used to share declarations of functions between
several files.
• If a header file is included more than once the declarations can appear to
be multiply defined, which will stop the build. Add a header guard to the
top of the header to stop this happening.
The #pragma once preprocessor directive should work on all recent compilers now.
Otherwise the older approach is to open a header guard at the top of the file,
using a unique name for the file.
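The points in this section can be sketched with two hypothetical files, second.h and second.c, shown here in one listing. The file names, function names, and the SECOND_H_ guard name are all made up for illustration. The header uses the older style of guard; a single #pragma once line at the top of the header is the shorter alternative on compilers that support it.

```c
/* ---- second.h (hypothetical file) ---- */
#ifndef SECOND_H_ /* older style header guard, using a unique name */
#define SECOND_H_

/* Declarations only. Any file that includes this header can call these. */
int add_numbers(int a, int b);
int add_numbers_call_count(void);

#endif /* SECOND_H_ */

/* ---- second.c (hypothetical file) ---- */
/* In a real project this file would start with: #include "second.h" */

static int g_call_count = 0; /* static: private to this translation unit */

static void count_call(void) { g_call_count++; } /* also private */

/* External by default, so main.c can call these after including second.h. */
int add_numbers(int a, int b) {
  count_call();
  return a + b;
}

int add_numbers_call_count(void) { return g_call_count; }
```

Another source file could define its own static g_call_count or count_call() without a “multiple definition” error from the linker, because neither symbol is visible outside second.c.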
Linking a Library with a C Program
When linking with Clang and GCC, the name you give for a dynamic library usually
does not include its filename extension. If the library filename starts with
“lib”, such as libm.so, that is also left off, so libm.so is linked with -lm.
If your library is in a different folder you will need to supply the path to it. With
Clang and GCC that is the -L flag.
Library interface functions are usually declared in a header file. You will need to
supply the paths to any header files in different locations with -I.
Make and Makefiles
Make is one of the most common, easily recognisable, and oldest build systems
for C projects, and one of the simplest, but it has some gotchas.
I have a program that has 2 source files, and links in libm. On the command line I
compile like this
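[Figure: an annotated Makefile. The annotations point out the Makefile filename,
a variable, the command that the variable substitutes into, and the rule that
runs when you type make clean.]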
• If you have the make program installed just type make and it will run the
first rule in the file.
• You can also type make, followed by a rule name to run a specific rule.
• Most makefiles will have an ‘all’, ‘clean’, and ‘install’ rule, but you
can use any rule names.
• The biggest gotcha with Makefiles is that commands (under rules), like
gcc main.c... must be indented with a tab, and make will give an
error if they are indented with anything else.
• If you type make, space then hit Tab some terminals will autocomplete and
list the available rules.
• Any commands that run on your command line can be used, but
commands like rm will only run on operating systems that have that
command.
• You can set easy-to-change variables like compilers to use, or directories
to look for headers in. These will substitute as text elsewhere in your
Makefile.
• Rules can call other rules.
Efficient Builds
To have the Makefile speed up builds you can add a dependency rule so that
unchanged files are not recompiled. This is not necessary for small projects, and
I don’t always do this for larger projects because I like to maintain full builds that
take less than 5 seconds, which is great for iteration time.
CC=gcc
INCLUDES=two.h

all: main.o two.o
	$(CC) -o my_program main.o two.o -lm

.PHONY : clean
clean:
	rm -f *.o my_program

%.o: %.c $(INCLUDES)
	$(CC) -c -o $@ $<
• all: main.o two.o - Indicate that these two files must be built first
before executing the command.
• $(CC) -o my_program main.o two.o -lm - We use the two .o object
files rather than .c source files. This allows us to split the build into parts.
• .PHONY : clean - Prevents make from being confused with a file of the
same name, in this case ‘clean’.
• rm -f *.o my_program - Also removes the object files.
• %.o: - Give make a rule for creating .o files.
• %.o: %.c $(INCLUDES) - Rebuild the code if the .c or any header in the
list changed.
• $(CC) -c -o $@ $< - Command to output a .o from the .c of the same
name.
Meta-Build Systems
In the Unix world, you’ll come across various tools that are used to create
Makefiles configured to a user’s system. The Autotools (GNU Automake with
GNU Autoconf) are commonly used. To build those projects you run a
configuration script, then run make.
./configure
make
make install
CMake, pointing it at the main folder that contains a CMakeLists.txt file. E.g.
On a Linux machine, from the directory containing CMakeLists.txt I would type:
mkdir build
cd build
cmake ..
cmake --build .
Where the cmake --build command will run the appropriate build command,
depending on the generator it used on your platform - make, msbuild,
xcodebuild, etc.
CMake requires a CMakeLists.txt file in the project, where you create a list of
source files, dependencies, and other build settings. Larger projects get very
complex, with nested CMakeLists.txt files within subdirectories too. It can be
quite tricky to keep track of options and settings between these files.
There is also a GUI (graphical user interface) for building with CMake. Some
IDEs, such as CLion, and newer versions of Visual Studio, can integrate CMake
into the IDE as a project build system.
It is worth your time to familiarise yourself using at least the CMake GUI to build a
third-party library for your project to link against. For example, find an interesting
library on GitHub to use in your project. If you open CMake GUI and point it at the
library's folder containing CMakeLists.txt it will ask you what build system you
want as output - on Windows that may be a solution for a particular version of
Visual Studio, on Unix systems it’s probably a Makefile. You can then load the
solution and build it from within Visual Studio, or call make, from wherever you
specify the output directory.
When you are distributing a reasonably complex library project with source code
it is worth considering CMake or another meta-build system to give users more
flexibility for building it. The function syntax and general advice varies between
versions of CMake, so you must first decide which version of CMake to use. You
can start with a simple example from https://cmake.org/examples and then follow
the extensive developer documentation from Kitware’s CMake Wiki
https://gitlab.kitware.com/cmake/community/-/wikis/home.
Tips and Common Problems
• Don't get carried away with complicated build systems! It’s easy to spend
far too much time maintaining these things, and that isn’t productive
coding.
• It's worth learning how to use and write simple Makefiles, and build from
CMake files, because they are so commonly used.
• Using meta-build systems as your main build system will slow down your
first-time builds on a machine considerably because they add additional
build steps, such as searching for required libraries.
• Using a generated Makefile on subsequent builds will typically still build
more slowly than your hand-written Makefile or build script.
• Generated build files can be very difficult to understand and hand-tweak.
• There are a huge variety of build system tools available for C and C++.
There is a tool similar to Make, called Ninja, that is intended to work with
meta-build systems. The Qt project has its own build systems: qmake and
Qbs.
• If you use Visual Studio you can run all its build tools from the command
line. They will be on the path if you run the Developer Command Prompt.
You may then compile from the command line using cl.exe, in a very
similar way to GCC and Clang. See Microsoft’s online MSVC
documentation for command line flags. This can be simpler than setting
up project builds in the IDE menus.
cl main.c
Try to build new software doing full rebuilds of all your project files on
every compile - keeping builds to a few seconds will train you to recognise
problematic code. "I'm not slacking off, my code is compiling!" is probably
a good indication that your build or implementation complexity is far too
high, which is impacting your productivity (and the environment and the
power bill).
If your code builds very quickly it is also possible to split a large chunk out
into a dynamic/shared library and have your changes recompile and live
reload any changes to the code, as you type it, while the main program is
still running. This is called hot reloading code. You can approach the
iteration speed of a scripting language with this approach. Functions for
reloading shared object libraries differ between operating systems.
More complex build systems allow you to support multiple build tools and
IDE solutions but tend to slow down your build time and iteration time,
and require users to install and learn the build tools.
Your project’s build-for-release should only require one command or
button click. Multi-step builds invite errors, major goofs, and stress in
production.
Try to keep your build complexity as low as possible - remove
unnecessary libraries, files, and headers when a simple function can
replace them. Faster build times speed up your iteration time and reduce
your development opportunity cost considerably.
Stick with the simplest-level of tech until you need more. C projects
typically build very quickly. Linking libraries adds time. C++ projects
typically take longer to build. Templated code or libraries can add
considerable build time.
If your builds are very slow, you can use a tool called ccache
https://ccache.dev/ to speed up recompiles. My team on one job managed
to reduce a 20 minute build down to a few minutes on one large legacy
project using this.
Build systems can be very time-consuming and frustrating to maintain
across large multi-team projects using different versions of build tools.
7 End Code Style Arguments with
clang-format
Programmers can have the strongest of opinions about the least important
things. If you really like writing in your own style you can, and have
clang-format convert your code to the project's style later, perhaps
automatically called when you submit your work.
clang-format -i -style=LLVM *.c
The above command will format all your .c files with the LLVM style.
Clang-format has a few built-in styles to choose from. You can also customise
your own style based on one of those by creating a .clang-format file.
To start creating a config file for your project, based on the WebKit style:
clang-format -style=WebKit -dump-config > .clang-format
You can edit the file in a text editor.
Put your .clang-format file in your project’s root directory and it should
be automatically found by Visual Studio, Visual Studio Code, or CLion.
You can use keyboard shortcuts to apply formatting to the whole file, or a
selection.
Other IDEs can have custom keyboard shortcuts set up to call
clang-format from within the IDE for the current source file.
8 Remove Lint with Static
Analysis of Code
Static analysis means analysing your code without running it. A static analysis
tool, or linter, is usually just a compiler that is re-written to provide more
information - at the expense of longer error-check time. Lint was a program on
Unix in 1978, named after a lint (clothing fluff) remover. It was based on a C
compiler.
• Run as part of your build script with a cost in build time, or just
  occasionally.
• Look up the types of errors they can catch in the manual/website.
• Can hook up to a Continuous Integration system as part of code and
  compile checks.
• Some IDEs have one built in.
A good approach is to use more than one linter on your code to get as
many potential issues flagged as possible.
Scan-Build
Scan-build sneakily replaces your compiler with its own stand-in Clang compiler
that looks for various problems in your code.
#include <stdio.h>

int main() {
  int array[3] = { 10, 11, 22 };
  printf( "array[4] = %i\n", array[4] );
  return 0;
}
1 warning generated.
scan-build: 1 bug found.
scan-build: Run 'scan-view /tmp/scan-build-2019-09-12-154652-2274-1' to examine bug reports.
[Figure: scan-view bug summary for main.c. Warning at line 5, column 2: 2nd function call argument is an uninitialized value.]
Figure 8.1. Scan-view can output a report for interactive web-browser viewing.
For Xcode users, you don’t need to use the command line - the Clang Static
Analyzer is integrated directly into the IDE, and has very rich inline breadcrumb
trail visualisation of problems in the code. See https://clang-analyzer.llvm.org/.
Cppcheck
$ cppcheck main.c
Checking main.c
[main.c:5]: (error) Array 'array[3]' accessed at index 4, which
is out of bounds.
$
[Figure: the Cppcheck GUI listing the same error for main.c at line 5: Array 'array[3]' accessed at index 4, which is out of bounds.]
Figure 8.2. Cppcheck also provides an interactive GUI program to help highlight
problems in your code.
Clang-Tidy
It produces a lot of output, so it can be helpful to redirect that to a file.
We can give it our header folders so that it can find files included in our
source code.
It's possible to feed a single file to clang-tidy, or give it a full path to
search recursively.
• Use of bug-prone code constructs.
• Clang-tidy often finds problems that the other linters do not.
9 Fuzz Testing with
AFL
Figure 9.1. American Fuzzy Lop fuzzing my PPM reader/writer code based on a
set of sample input images to read. It builds a folder of crash reproduction cases.
Fuzz Testing and Why You Should Use It
We use fuzz testing to catch any unhandled or unexpected edge cases. Doing
this with real testers is laborious and error-prone. Some fuzzers use a genetic
algorithm to create generations of valid-looking inputs that explore deeper into
the paths of your program. This also gives fuzz testing a value for
security-hardening your code against deliberately malicious inputs.
1. Build the functions you want to test into a little program, where the inputs
can be represented by a file.
const char* file_out = "out.ppm";
if ( argc > 1 ) { file_in = argv[1]; }
if ( argc > 2 ) { file_out = argv[2]; }
/* ... */
free( my_image.ptr );
return 0;
2. Make a directory of sample input files for the fuzzer to start from.
fuzzer_inputs/8x8invader0.ppm
fuzzer_inputs/8x8invader1.ppm
fuzzer_inputs/8x8invader2.ppm
fuzzer_inputs/16x16letter_a.ppm
3. Install the American Fuzzy Lop (AFL) fuzzer. Website:
http://lcamtuf.coredump.cx/afl/ or on Google's GitHub
https://github.com/google/AFL.
On Ubuntu you can install from the repositories:
4. Use afl-gcc or afl-clang to build a clean compile of your program (e.g. make
clean && make all) in place of gcc or clang. It probably helps to give
your fuzzer build a different file name to the regular build:
The fuzzer will look for sample input files in the directory we made, and make a
new directory called fuzzer_outputs/ where it will put any input files it
generates that cause your program to crash. The -- separator separates the
fuzzer and its arguments from your program and its arguments.
If you normally run your program with the file name as an argument (we do in our
example code) as e.g. ./test_ppm the_test_file.ppm then afl-fuzz will
feed it the correct file name where the @@ appears in the command above. Any
additional command-line arguments you need to send to your program go at the
end.
AFL will probably warn about a few system settings. If you don't want to address
them as per its instructions then you can suppress them. Put the following all on one line.
AFL_EXIT_WHEN_DONE=1 AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=1 AFL_SKIP_CPUFREQ=1 afl-fuzz -i fuzzer_inputs/ -o fuzzer_outputs/ -- ./test_ppm_fuzz @@
6. Either wait for the entire fuzzing process to finish ( it can take an extremely
long time ), or after a couple of hours, have a look in the
fuzzer_outputs/crashes/ directory for anything it puts in there while it's
running. You can also halt the fuzzer with CTRL+C. Then you can run your
regular program with these as input files to reproduce the crashes that
were found.
./test_ppm fuzzer_outputs/crashes/id:00000_very_long_filename
If you run this in an interactive debugger, or e.g. get a backtrace after a crash
from GDB, then you can find where your program is crashing pretty quickly. Try
to fix all the problems highlighted by the crash reproductions. Some of these will
be false positives or duplicates of the same issue, although AFL does a good job
of minimising those. Can you guess what's next?
7. Run the fuzzing process again. Repeat the process until no further bugs
are found.
For an example to fuzz that definitely has a problem, you can add a deliberate
crash to your PPM image reader code. The samples still have to load, but
random variations will have invalid data. You can put an assert() if the “P6”
number at the start of the file wasn’t there. The samples will have a valid P6, but
random inputs may not. The idea here is to find places like this that crash, and fix
your code so that it gracefully exits or continues when malformed inputs are
found.
char type[2];
int n = fscanf( fptr, "%c%c\n", &type[0], &type[1] );
if ( n != 2 || type[0] != 'P' || type[1] != '6' ) {
  assert( false );
}
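Once the fuzzer has found the deliberate crash, the graceful alternative might look something like the sketch below. The function name read_ppm_magic is hypothetical and is not taken from the PPM reader used in this chapter. It reports a malformed header through its return value, and keeps assert() only for programmer errors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

// Hypothetical helper (not from the chapter's PPM reader): returns true only
// if the stream starts with the binary PPM magic number "P6".
bool read_ppm_magic( FILE* fptr ) {
  assert( fptr != NULL ); // a programmer error, not a malformed-input error
  char type[2];
  int n = fscanf( fptr, "%c%c", &type[0], &type[1] );
  if ( n != 2 || type[0] != 'P' || type[1] != '6' ) {
    return false; // malformed input: the caller cleans up and exits
  }
  return true;
}
```

A caller would check the return value, free anything it has allocated, and return an error code, so that the fuzzer no longer records a crash for this input.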
In the fuzzer_outputs/ directory, AFL creates a lot of content.
queue/ - AFL builds a set of test inputs here from your samples.
crashes/ - inputs that caused unique crashes are stored here.
hangs/ - inputs that caused the test program to time out are stored here.
AFL++, a community fork of AFL, is hosted at
https://github.com/vanhauser-thc/AFLplusplus. At the time of writing it also
includes a script for getting past CPU governor and kernel core pattern
issues you may see reported by AFL under
AFLplusplus/afl-system-config.
The docs/ folder of AFL contains tips for fuzzing more quickly, enabling
multi-core fuzzing, and descriptions of various errors, reported in the UI in
red text.
10 Asm Inspection
Assembly code (asm) is the low-level language in your compiler chain, one level
before the assembler converts it into the CPU's native machine code. You
usually never see it when you compile your code. It's worth looking at because C
and C++ are high-level languages that can give an imprecise picture of how
your code translates to machine instructions. Each C statement usually
translates to a few asm instructions. One asm instruction usually translates to
one machine code instruction. Assembly for most modern desktop CPUs is 64-bit
x86-64, often referred to as x64.
Ever run a debug build of your program and it's very sluggish compared to a
release build? This usually indicates the compiler has to do a lot of additional
optimisation work on your code to get it to run properly, or that it has lots of extra
debugging checks built in. This is a really common complaint of C++
programmers, and you see people not testing with debug builds as a
consequence, which means they lose useful information. It’s usually hard to see
the difference between builds when we are looking at our C or C++ code. We can
get some insight by comparing the assembled code (asm) of the debug and
optimised builds.
Do you need to learn how to code in asm directly? No, not for most jobs any
more, but it's still insightful to get a feel for how code relates to instructions in
asm - much closer to the compiled machine code that will actually be processed.
Making a habit of inspecting and comparing code, with just a small amount of
looking up what the different asm instructions do, will give you a perspective on
code complexity that most programmers don't have.
• Gain a clearer intuition of what compilers do when they optimise, and
  where you need to take care.
The asm for an instruction to add two integers may look like
Where
In asm for x86-64 CPU architectures, there are a set of general purpose 64-bit
registers that can be used by name. It’s also possible to use half of a register by
using its equivalent 32-bit name.
x86-64 General Purpose Registers
64-bit name  rax  rbx  rcx  rdx  rsi  rdi  rbp  rsp  r8   r9   ...  r15
32-bit name  eax  ebx  ecx  edx  esi  edi  ebp  esp  r8d  r9d  ...  r15d
There are also a set of floating point registers xmm0-xmm15. We can see that eax
and edx registers are 32-bits - the size of an int, and we will see these in the
asm of programs where we add two integers.
You can ask your compiler to output the asm from a C file. In GCC you supply
the -S flag.
gcc -S my_file.c
This will generate a file with the .s extension, which you can open in a text
editor. You will see the asm under labels for each of your functions. You will also
see any constants such as strings.
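A minimal file to try the -S flag on might look like the sketch below. The function name add_two is a placeholder. The comment only describes roughly what to look for, because the exact instructions and registers depend on your compiler, its version, and the flags used.

```c
// my_file.c - a tiny function to inspect with: gcc -S my_file.c
// In the generated my_file.s, look for the add_two label. Somewhere under it
// an unoptimised x86-64 build will typically contain an add instruction
// working on 32-bit registers such as eax and edx.
int add_two( int a, int b ) {
  return a + b;
}
```

Because there is no main() in this file, it can be compiled to asm with -S, but it needs to be linked with other code before it can run.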
Most debuggers also let you step through assembly code, which can be quite a
useful live view of asm. These tools are sometimes useful, but a bit
overwhelming if you haven't learned the basic concepts of asm yet.
Compiler Explorer is a much more convenient tool for asm inspection, and for
learning about asm.
Compiler Explorer
Figure 10.1. The Compiler Explorer website. I wrote a simple C function in the left
panel. Line 2 has some simple maths, coloured yellow. The corresponding asm
output for it is also shown in yellow in the right-hand panel.
How to Use Compiler Explorer
I wrote a very simple function into Compiler Explorer's C source area. You can try
writing this, or a similar function, and inspect the asm generated.
• Note that the generated asm does not show our variable names, and we
  need to hunt through, following the register names to trace what is
  happening to our variables.
• Opcodes have names push mov add pop ret. If you hover the mouse
  over the opcodes in Compiler Explorer you can get a description of what
  they do, and what the arguments to each mean. This is a great way to
  learn asm! You can probably guess what add and ret do!
• Enter optimisation level -O3 in the compiler options input field. What
  happens to the asm?
• Try switching to different compilers and versions in the dropdown menu.
  Is the output the same at different optimisation levels? You could try one
  of your own functions here.
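If you would like something slightly larger to experiment with, a small loop is a good candidate. The function below is only a suggested sketch, and the name sum_below is a placeholder. Compare the asm with no optimisation flag against -O3. Some compilers may unroll the loop or replace it altogether, but what you see depends on the compiler and version you select.

```c
// A small function to paste into Compiler Explorer's source panel.
// Adds up the integers 0, 1, ..., n-1. Unsigned arithmetic is used so that
// wrap-around on very large n is well defined.
unsigned int sum_below( unsigned int n ) {
  unsigned int total = 0;
  for ( unsigned int i = 0; i < n; i++ ) {
    total += i;
  }
  return total;
}
```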
With Compiler Explorer we have a sandbox code palette to try some more simple
functions. A little bit of tinkering in Compiler Explorer can help you learn some
asm, and how it relates to your original code. You can take this knowledge to
improve your debugging - now stepping into the asm is a little more useful!
Recommended Reading
Tips and Common Problems
I suggest using Compiler Explorer to test one small function or data
structure in isolation. I wouldn’t try pasting in a large graphical program
and expect it to work.
It used to be common to mix asm into C code to hand-optimise sections.
You can still do this in GCC and Clang, but Microsoft's compiler does not
support inline asm in 64-bit builds. There is a newer equivalent called
intrinsics, which is supported on the most popular 64-bit compilers, and
also provides some highly optimised, small, function-like operations.
Comparing the complexity of functions and data types from the standard
libraries is where asm inspection is most useful as an additional
complexity measurement of code.
Fewer asm instructions does not necessarily result in faster programs.
The highest compiler optimisation level can produce much longer, faster
asm code.
Asm inspection gets really interesting when comparing the complexity of
abstract data types in C++, which has much more complex container
types than C. Try substituting a std::vector with an array, and
comparing the asm. Then try again at a higher compiler optimisation level.
How much work is moved from run-time complexity to compile-time
complexity when we increase compiler optimisations?
11 Memory Debuggers, Sanitisers,
and Cache Simulators
In C and C++ it's easy to make mistakes misusing memory. Some of these
issues can be found with static analysis tools and fuzzing, but other issues
depend on the path taken by the program during run-time. Typical problems that
occur at run time relate to dynamic memory allocation, deallocation, and access.
Common memory issues include:
• Memory leaks - allocating memory without freeing it. This can compound
  if done in a main program loop.
• Double free of memory.
• Accessing memory after freeing it.
• Accessing memory outside of an allocated block.
• Using uninitialised pointers.
• Relying on undefined behaviour in the language, that compilers may
  change to perform optimisations.
Consider the following buggy C code, which compiles and runs without error:
#include <stdlib.h>

void my_func() {
  int* x = malloc( 16 * sizeof( int ) );
  x[16] = 0;
}

int main() {
  my_func();
  return 0;
}
Other languages use more sophisticated memory models with garbage
collection and bounds checking, to mitigate most of these problems, at the
cost of performance and compilation time. It’s possible to use or write more
sophisticated C and C++ data structures to get some of these features, if we
want them.
We can use a variety of stand-alone and built-in tools to check our program’s
memory usage for invalid operations and leaks.
[Figure: the tools covered in this chapter: Valgrind Memcheck, Valgrind CacheGrind, ASan, and UBSan.]
Valgrind Memcheck
If we compile our leaky program sample, above, with debugging symbols, we can
run it with Valgrind’s leak checker.
gcc -g -o my_program leaky.c
valgrind --leak-check=yes ./my_program
• If your program has command-line arguments, you can add them after the
  binary’s name: ./my_program arg1 arg2
• If you don’t see file and line numbers in your output from Valgrind,
  compile your program with debug symbols, and try again.
• Valgrind will report an invalid write of a 4-byte integer on line 5, where
  x[16] = 0;. The memory allocated was only 16 integers long - indices
  0-15 are valid.
• Valgrind will report a leak of 64 bytes (the 16 4-byte integers we
  allocated).
• Your code may be fine, but Valgrind can also find leaks and other
  memory issues in libraries, drivers, and other code that you have relied
  on. This may be from your misuse of the library’s API - read the
  instructions again!
• Technically, we don’t need to free the allocated memory in our example
  program, since it is released at program exit.
• If we were to call our allocating function within a loop, or in response to a
  user input, it might create an unmanageable leak.
==30065== Memcheck, a memory error detector
==30065== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==30065== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==30065== Command: ./my_program
==30065==
==30065== Invalid write of size 4
==30065==    at 0x108668: my_func (main.c:17)
==30065==    by 0x10867E: main (main.c:22)
==30065==  Address 0x522d080 is 0 bytes after a block of size 64 alloc'd
==30065==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30065==    by 0x10865B: my_func (main.c:16)
==30065==    by 0x10867E: main (main.c:22)
==30065==
==30065==
==30065== HEAP SUMMARY:
==30065==     in use at exit: 64 bytes in 1 blocks
==30065==   total heap usage: 1 allocs, 0 frees, 64 bytes allocated
==30065==
==30065== 64 bytes in 1 blocks are definitely lost in loss record 1 of 1
==30065==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30065==    by 0x10865B: my_func (main.c:16)
==30065==    by 0x10867E: main (main.c:22)
==30065==
==30065== LEAK SUMMARY:
==30065==    definitely lost: 64 bytes in 1 blocks
==30065==    indirectly lost: 0 bytes in 0 blocks
==30065==      possibly lost: 0 bytes in 0 blocks
==30065==    still reachable: 0 bytes in 0 blocks
==30065==         suppressed: 0 bytes in 0 blocks
==30065==
==30065== For counts of detected and suppressed errors, rerun with: -v
==30065== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Valgrind is very powerful, but is primarily a Linux tool, and makes your program
run very slowly, so doesn't suit every program. On one job, the embedded device
we were working on had a process watchdog that would kill Valgrind before it
finished because it used too many system resources.
Be warned - Valgrind's Memcheck cannot detect the equivalent bounds errors
in arrays that are allocated on the stack or in static memory.
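For comparison with the buggy my_func() earlier in this chapter, a corrected variant might look like the sketch below. The name my_func_fixed and the return codes are my own additions for illustration. It writes only to a valid index, checks the allocation, and frees the block, so a leak checker should have nothing to report for it.

```c
#include <stdlib.h>

// A corrected variant of my_func(): it stays inside the allocated block and
// frees it before returning. Returns 0 on success, or -1 if malloc() failed.
int my_func_fixed( void ) {
  const size_t count = 16;
  int* x = malloc( count * sizeof( int ) );
  if ( x == NULL ) { return -1; }
  x[count - 1] = 0; // index 15 is the last valid element of 16
  free( x );
  return 0;
}
```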
ASan - Address Sanitizer
ASan is Address Sanitizer, a Clang tool that you can add to your program’s build
to help you debug memory issues in your running program. ASan can be used in
command-line builds with Clang, is integrated with other sanitiser tools in Xcode
through checkboxes in the Scheme > Diagnostics panel, and at the time of writing
ASan is also being integrated into new versions of Visual Studio. It covers much
of the same memory error checking surface as Valgrind, but it’s faster and has
some limited support outside of Linux. It’s partially built into your program
(instrumented), but also uses an external ASan library that requires Clang for
linking.
Running the program will give us the same errors caught as Valgrind, and with
more detailed information.
Figure 11.2. ASan’s output about the invalid memory access even visualises the
nearby valid and invalid memory locations.
You can compile your program with optimisation flags at -O1 or higher to get
better performance with ASan. ASan will deliberately crash your program after
detecting the first error. For development builds it can be handy to always build
with ASan, as it’s not too slow to work with, and can catch errors before they get
built on top of.
To get file, function name, and line information added to the output, you can run
your program with llvm-symbolizer (if you have it installed).
ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer ./my_program
• If you do not see line numbers beside error reports, check that you have
  compiled your program with the debug flag -g.
• If llvm-symbolizer is not on your system check that you have installed the
  main (meta) llvm package in addition to a specific, numbered, version of
  llvm.
UBSan - Undefined Behaviour Sanitizer
UBSan is Clang’s sanitiser for undefined behaviour. It’s a potential weak point in
your code if your program works, but relies on the way the compiler currently
implements behaviour that is undefined by the language.
int main() {
  int a = -1;
  int b = a << 1;
  printf( "b = %i\n", b );
UBSan performs extra compile-time checks, but can also use a UBSan library to
provide additional checks at run time.
The compiler sometimes produces a warning for things that are undefined
behaviour, but it doesn’t catch both of our examples. The run-time UBSan does
report both issues.
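Both kinds of report, a left shift of a negative value and a shift exponent that is too large, can be avoided by checking the operands before shifting. The helper below is a sketch of one way to do that. The name checked_shl is hypothetical, and rejecting the shift is only one possible policy; using unsigned types is another.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: performs value << count only when the C language
// defines the result. Returns false, leaving *result untouched, otherwise.
bool checked_shl( int value, int count, int* result ) {
  assert( result != NULL );
  const int bits = (int)( sizeof( int ) * CHAR_BIT );
  if ( value < 0 ) { return false; }                   // negative left operand
  if ( count < 0 || count >= bits ) { return false; }  // exponent out of range
  if ( value > ( INT_MAX >> count ) ) { return false; } // result would overflow
  *result = value << count;
  return true;
}
```

With a 32-bit int, checked_shl( -1, 1, &b ) and checked_shl( 1, 32, &c ) both return false instead of relying on whatever the compiler happens to do.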
main.c:5:8: runtime error: left shift of negative value -1
b= -2
main.c:8:12: runtime error: shift exponent 32 is too large for
32-bit type ‘int'
c = 6959232
Valgrind CacheGrind
The CPU's cache is a set of extremely fast, but expensive, memory buffers that
sit close to the CPU - in between registers and RAM. Modern machines have
3-4 cache levels, from L1 next to the CPU to L3 or L4. The final level before
RAM is called LL (last level). Accessing data from the cache is about an order
of magnitude faster than accessing data from RAM, and faster again than
accessing from disk, or over a network.
• When your program accesses data, it will check the first level of fast
  cache memory for it. If it finds the data we call this a cache hit.
• If it does not find the data, it proceeds to the next cache level, and so on,
  and finally to main memory (RAM). We call this a cache miss.
• When data is fetched from an address in RAM, a small chunk of data
  adjacent in memory (a cache line's size - usually 32, 64, or 128 bytes), is
  preemptively moved with it to the cache.
• A cache miss at L1 costs about 10 CPU cycles.
• A cache miss at LL costs up to 200 CPU cycles.
• Therefore the idea for optimising performance is to minimise cache
  misses by having as much data as possible already in the cache.
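A common illustration of the points above is to walk the same 2D data in two different orders. The two functions below are a sketch with names of my own choosing. They return the same total, but the first visits memory sequentially, and the second jumps a whole row's worth of bytes between accesses. Whether the difference is measurable depends on how large the grid is relative to your cache sizes.

```c
#include <stddef.h>

// Visits elements in the order they are laid out in memory, so every value
// in a fetched cache line is used before the next line is needed.
long sum_row_major( const int* grid, size_t rows, size_t cols ) {
  long total = 0;
  for ( size_t r = 0; r < rows; r++ ) {
    for ( size_t c = 0; c < cols; c++ ) { total += grid[r * cols + c]; }
  }
  return total;
}

// Same result, but consecutive accesses are cols * sizeof( int ) bytes
// apart, so a large grid may touch a different cache line on every access.
long sum_column_major( const int* grid, size_t rows, size_t cols ) {
  long total = 0;
  for ( size_t c = 0; c < cols; c++ ) {
    for ( size_t r = 0; r < rows; r++ ) { total += grid[r * cols + c]; }
  }
  return total;
}
```

Calling each function on a grid of a few thousand rows and columns, in two separate runs under the cache simulator, is one way to compare their miss counts.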
We can also reduce the number of branching conditions in our code, and change
our switch and if-statement clauses to improve the branch prediction
performance results. A branch misprediction costs 10-30 cycles.
Compile your program with debug symbols, and compiler optimisation turned to
your release settings, to give a realistic simulation.
==1774== Mispredicts: 4,785 ( 4,642 cond + 143 ind)
==1774== Mispred rate: LUG CP PLA Ba + “18.24 )
Some compilers do a better job of optimising your code against branch prediction
and cache misses. The Intel C compiler historically has done the best.
If your compiler does not help then you may need to modify your code. Improving
the cache hit ratio and branch prediction are generally best done by following
up-to-date guides produced by the CPU manufacturer, which typically list things
like:
After making changes run the program with a timer to see if it’s faster, and run
the cache simulation again.
Tips and Common Problems
Try to write your programs such that all errors reported by Valgrind are
fixed right away, before adding new features. This will make a significant
reliability improvement to your software, and reduce the risk of building
onto buggy software, which can be expensive to modify later.
Valgrind may find errors that never show up during debugging. This
doesn’t mean they won’t show up on an end-user’s machine!
Valgrind builds require 10-30x the normal processor resources. If your
program is running too slowly, or Valgrind is not suitable for testing on
e.g. an embedded device, then try ASan instead.
It’s possible to add memory leak checks to your code by writing a wrapper
around memory allocation and freeing calls. An example of this is
stb_leakcheck https://github.com/nothings/stb/.
It's also a valid strategy to allocate all of the memory required by your
program at the start, in a large block. It can then all be released at once,
or will be automatically when the program exits, without needing to worry
about leaks.
I always make it extremely clear if a C function I have written will allocate
memory, and if there is a matching function, at the same level, that the
user of my code can call to free it.
A good C API will allow a programmer to provide their own malloc() and
free() functions to it, so that the library can use the same memory
allocation strategy as the rest of the program.
Remember to disable compiler sanitiser flags before shipping, or your
program will depend on the libraries for those tools.
Address Sanitizer’s memory leak checker is only enabled by default on
Linux. To enable it on macOS you need to set an environment variable
before running your program: ASAN_OPTIONS=detect_leaks=1.
A good example for testing CacheGrind on, where you can see a
significant cache miss difference between sequential and random
memory access alternatives can be found at
https://github.com/perfanov/CacheMissExample.
KCacheGrind is a very useful front-end for CacheGrind that can drill down
into more detail to find hotspots in your program for cache misses. It also
visualises profiler outputs and call graphs
http://kcachegrind.sourceforge.net/html/Home.html
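Two of the tips above, pairing every allocating function with a matching free function and letting the caller supply malloc() and free() replacements, can be sketched together. All of the names below (my_allocator_t, my_image_t, my_image_alloc, my_image_free) are hypothetical and only illustrate the shape such an API could take.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical library interface: the caller may supply allocation functions.
typedef struct my_allocator_t {
  void* ( *alloc_fn )( size_t n_bytes );
  void ( *free_fn )( void* ptr );
} my_allocator_t;

typedef struct my_image_t {
  unsigned char* ptr;
  size_t n_bytes;
  my_allocator_t allocator; // remembered so the matching free is used later
} my_image_t;

// ALLOCATES memory. Pair every successful call with my_image_free().
// Passing NULL for allocator falls back to malloc() and free().
// Returns 0 on success, or -1 if the allocation failed.
int my_image_alloc( my_image_t* image, size_t n_bytes, const my_allocator_t* allocator ) {
  assert( image != NULL );
  image->allocator.alloc_fn = allocator ? allocator->alloc_fn : malloc;
  image->allocator.free_fn  = allocator ? allocator->free_fn : free;
  image->ptr     = image->allocator.alloc_fn( n_bytes );
  image->n_bytes = image->ptr ? n_bytes : 0;
  return image->ptr ? 0 : -1;
}

// Frees memory allocated by my_image_alloc(). Safe to call twice.
void my_image_free( my_image_t* image ) {
  assert( image != NULL );
  if ( image->ptr ) { image->allocator.free_fn( image->ptr ); }
  image->ptr     = NULL;
  image->n_bytes = 0;
}
```

Storing the allocator inside the struct is a design choice made here so that the free call always matches the allocation call, even if the program uses several allocators.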
12 It worked on my computer!
Shipping the Program
Distributing your compiled C or C++ program such that it actually runs on another
computer has a few challenges.
You can avoid many of these issues if you ship the source code, and have the
user compile it, but that has issues to consider too.
1. You must provide clear, concise build and run instructions.
2. The user must have a compatible compiler, and all the external
dependencies required by your source code.
3. You need to provide the licence and copyright limitations of your own
work, in addition to meeting licence and copyright obligations of any
dependencies.
4. You either need to bundle any external dependencies, or distribute your
program through a package manager system that can pull them in from a
repository.
5. Your build system and number of installation and build steps should be
simple enough for your target audience to use.
Shared (dynamically linked) libraries are probably the biggest issue for C and
C++ code distribution. Your ideal tactic for distribution and future-proofing your
program is to reduce the number of libraries that your program uses. A small
dose of not invented here syndrome can help with that.
also need to be accounted for. On Windows you can distribute the
compiler's libraries with your program, bake them into your binary
statically, or have your users install the matching redistributable
package for your compiler version. On Linux, unless you create some
installation instructions or script for fetching libraries, you will need to
bundle them with your program, ideally using an old version for maximum
compatibility across distributions, current and future.
For C shared libraries you can have your program open them at run time,
using e.g. dlopen() or LoadLibrary() instead of linking to them. You
can handle libraries linked at run time more flexibly, but it adds boilerplate
to your code.
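On POSIX systems the boilerplate for opening a library at run time might look like the sketch below. The helper name load_symbol and its parameters are hypothetical. dlopen(), dlsym(), dlclose(), and dlerror() are the standard calls from dlfcn.h. On Windows the equivalent calls are LoadLibrary(), GetProcAddress(), and FreeLibrary().

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper: opens a shared library at run time and looks up one
// symbol by name. Returns NULL if the library or the symbol is missing.
// On success the library handle is stored in *out_handle, and the caller
// should pass it to dlclose() when finished with the symbol.
void* load_symbol( const char* library_path, const char* symbol_name, void** out_handle ) {
  assert( library_path != NULL && symbol_name != NULL && out_handle != NULL );
  *out_handle = dlopen( library_path, RTLD_NOW );
  if ( *out_handle == NULL ) {
    const char* err = dlerror();
    fprintf( stderr, "dlopen failed: %s\n", err ? err : "unknown error" );
    return NULL;
  }
  void* symbol = dlsym( *out_handle, symbol_name );
  if ( symbol == NULL ) {
    const char* err = dlerror();
    fprintf( stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL" );
    dlclose( *out_handle );
    *out_handle = NULL;
  }
  return symbol;
}
```

Because a missing library is reported at run time rather than at program launch, the program can fall back to another code path or print a helpful message. On older versions of glibc you may also need to link with -ldl.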
Dependency Walker and
dumpbin.exe (Windows)
[Figure: Dependency Walker's module list, reporting "Error opening file. The system cannot find the file specified (2)" for several API-MS-WIN-CORE-*.DLL modules, followed by errors about missing dependencies, unresolved imports, and modules with different CPU types.]
Figure 12.1. Dependency Walker gives you an interactive tree view of the
external library dependencies for any program, and where they are being found
on the system.
If you hit the ‘View full paths’ button you should also get a list of where your
program is currently finding each library.
• You can ignore system files such as kernel32.dll, user32.dll,
  gdi32.dll, and shell32.dll - these are core to Windows.
• You can ignore any graphics drivers, and OpenGL32.dll, which will install
  with a user’s video driver updates.
• msvcrt.dll is the C Run Time. For Visual Studio you will see a runtime DLL
  with a version number in its filename, and we need to think about bundling
  that somehow.
If you are using Visual Studio there is also a program called dumpbin that you
can call from within Visual Studio’s Developer Command Prompt tool. It will list a
program or library’s dependencies when you use the /DEPENDENTS flag.
Your program might run using the msvcrt.dll that comes with a user's Windows
install. If you have compiled with MinGW GCC this is the one being used, and
you don’t need to do anything. Microsoft, however, recommends not relying on
this, as there is no guarantee of stability between versions of the library.
Figure 12.2. Visual Studio will statically compile the CRT when the Runtime Library
option is /MT.
For Visual Studio builds, where you will see a version number in the CRT
filename, you can:
• Have the user install the Visual C++ redistributable package that matches
  your compiler. You can find these on the Microsoft website.
• Statically compile the runtime into your app. This is a good idea! Find the
  Runtime Library option in your project's Property Pages under C/C++, and
  switch it from /MD to /MT.
• Drop the required DLL files into the same folder as your .exe, and bundle
  that together for distribution. See
lf you run Dependency Walker or dumpbin again on your binary after static
compilation of dependencies you should see a much smaller list.
[Figure 12.3: dumpbin output in a command prompt window, listing KERNEL32.dll as a dependency, followed by the section size summary.]
ldd (Linux)
To get a list of all of the dependencies of your binary, and their dependencies,
you can run ldd. It’s more helpful to split the output by direct dependencies, and
dependencies of dependencies. You can also find this information, and more
details, with readelf or objdump.
ldd -v ./my_program
readelf -d ./my_program
objdump -x ./my_program
You will find the direct dependencies of your program under the first sections of
each command’s output.
Figure 12.4. ldd reports that my program directly depends on a local library (that
it can't find) called libsecond.so.
To have your binary look for shared object libraries in a local subdirectory, rather
than a system directory, you can:
1. Set the LD_RUN_PATH environment variable before linking your program.
export LD_RUN_PATH=\$ORIGIN/my_libs/
gcc -o my_program main.c -lmylib -Lmy_libs/
The \$ORIGIN part means “relative to the location of the binary”. Otherwise it will
look for my_libs/ relative to your current working directory, which isn’t as useful.
2. Or create a run script that users will run instead of your binary. This will
set LD_LIBRARY_PATH.
#!/bin/bash
export LD_LIBRARY_PATH=\$ORIGIN/my_libs/
./my_program
3. Or set the -rpath flag to the linker, ld, with the relative path to your
libraries. If you are not invoking the linker directly, you can send linker
commands through GCC with -Wl.
The relative path to your library comes after -rpath, following the comma.
Figure 12.5. After building in the library path ldd now shows the program finds
libsecond.so in a local directory.
4. Have your program open the library in code using dlopen(), then look for
functions and hook them up to function pointers using dlsym().
With one of these methods you should then be able to create an archive of your
binary, your assets folder, and your lib/ folder, and distribute it.
otool and install_name_tool (macOS)
Similar to ldd, on macOS we can use otool to find the shared objects a binary
will try to link to:
otool -L ./my_program
macOS shared object libraries have the .dylib extension, but you will also see
framework files such as Cocoa.framework. These are a bundle comprising
dylibs and header files for development. You don’t need to redistribute system
frameworks.
If you want to bundle a library with your program then you may need to change
where your program looks for the library. It might run now, but if you send it to a
friend it might not run on their computer, even if you copy the library file with it.
They may also want to run your program from another working directory. The
best solution is to make sure your program looks for the library in a path relative
to itself. To change the path to a library for a binary we can use
install_name_tool.
The -change command modifies our binary so that it looks for the library in the
directory we prefer.
1. The first argument is the application’s previous path to the library. Get this
from otool -L.
2. The second argument is the new path. The @executable_path variable
makes this relative to the directory of the binary, rather than relative to the
user’s current working directory.
3. The third argument is the filename of the binary to modify.
If you move a library into a local folder you can use otool -D ./mylib.dylib to
check its internal ID. This should also be a path relative to the executable that
links against it. You can change the ID with install_name_tool -id.
1. The first argument is the path to the library, relative to the program.
2. The second argument is the library file to modify.
These two commands should allow you to bundle your library in a directory with
the binary and run your program from another working directory. See the man
page for install_name_tool for more options.
App Bundles
If you create your own bundle, you will need to use install_name_tool, as in
the previous section, because it will always be launched from an external working
directory. You can write a script to create the bundle, copy files into the directory
structure, and invoke install_name_tool on the program binary and any
libraries as part of your build process.
App Containers and Package Managers
Taking on any additional software layers has a complexity and reliability cost too
that must be accounted for. It’s always worth reducing the complexity of your
software and build. Ideally you can distribute it without needing additional
packaging software.
Tips and Common Problems
• C++ libraries have various language features that cause the ABI to be
less stable between library versions, and incompatible between
compilers. If you need to use or make a library, a preference for C over
C++ libraries can avoid a lot of these frustrating problems that can stop
your program running. You can write a C interface for a C++ library, which
may help.
• Make sure you test your program on machines that never had any
developer tools installed - your program may be dynamically linking to
system-installed libraries that other users won't have. A fresh system
install per release test, e.g. on a new virtual machine, can spot missing
dependencies.
• On Visual Studio your default folder structure and binary output location
probably don’t match how you want your final program's folder structure
to look. Asset locations may be in a different relative directory. Under
project settings you can configure the project’s binary output locations to
be in a sensible place so that the relative path to your assets makes more
sense.
• Consider giving your debug and release binaries different file names so
you don’t mix them up during testing or bundling.
• Remove unstable compiler flags that may have crept in, like -ffast-math
or -Ofast, and enable disabled warnings.
• Compile on different compilers to collect any missed warnings.
• Compile your release build with optimisation flags to improve performance
of C++ templates etc.
• Run static analysis tools, sanitisers, and memory leak checkers.
• Before distributing, leave your program running overnight, or for a few
days, to pop out any bugs with memory leaks, floating point error
accumulations, or overflowing timers.
• Have someone that wasn't involved in the development process go
through the install and testing process - it’s usually much more fiddly and
error-prone than developers are aware of.
• Try testing your bundled application on as many real machines and
hardware configurations as possible - virtual machines will hide real
hardware issues. A common hardware support issue these days is
high-DPI monitors. Does your app run properly at 1440p and above on
different monitors?
• Invest time to create a one-command (or one click) build, test & bundle
script that does your entire app compile, version number stamp, any
smoke tests, and bundles for a release. A multi-step or fiddly process will
cause avoidable problems and delays.
• Have a QA (quality assurance) testing process and sign-off step after
every release is built, and before it gets published. If a team is putting
everything together, the rush or the clash of different completed systems,
tested against earlier versions of the codebase, can create lots of issues
that need to be caught.
• Create a Release Notes document with new features and known issues,
which can help manage expectations of users and reduce duplicate bug
reports.
• On GNU/Linux you can get away with asking the user to have a few basic
shared libraries, and some Linux users prefer this as part of the
philosophy over statically compiled libraries. They’d like it even more if
you distributed your program as source code, which is sometimes viable
even for commercial software.
• Creating a distribution-specific package, like a .deb for Ubuntu and
Debian with a tool like dpkg-buildpackage, is the most ‘native’ sort of
installer, but sometimes just distributing a tar.gz archive that extracts
and runs from a user’s local folder is fine.
• Have a system to capture user feedback and bug reports.
• Allow users to send you a log or a dump if the program crashes.
• Include user system specs in the reports you get to identify troublesome
system configurations.
13 Argh! My New Job is on Linux!
Unix Tools
Unix-derived operating systems were built with a philosophy of many
mini-programs interacting with each other on the command line. You can get a lot
of productive common tasks done by using little programs, rather than library
functions in your own program. If I'm working on these systems I often have a
terminal window open for quick tasks. Unix-derived systems include GNU/Linux,
Solaris, Minix, and BSD-derivatives, such as FreeBSD and Apple’s macOS.
"Argh! I'm on Linux in my new job, how do I work productively without a degree in
Linux?"
I'm assuming, reading this, you're already familiar with the absolute basics ls,
cd, rm, and mkdir, and how to use your distribution's package manager to
install software from repositories. If you're coming from Windows and need a
basic text editor on the command line, try nano - it has the key commands
on-screen.
Windows Subsystem for Linux
Figure 13.1. Ubuntu running on Windows Subsystem for Linux. It's in a window
on a Windows 10 desktop.
With your own Linux-in-a-window you can access the folders on your Windows
computer under /mnt/c/. You can install tools and software through the
distribution's package managers.
There are some small niggles between operating systems when processing files
saved with MS-DOS carriage return \r at the end of lines, but you can change
that by opening your file in nano my_file.sh and switching line-endings mode
on save (CTRL+O then ALT+D).
RTFM with man
Unix systems use a system called man pages to store user manuals for
programs, programming library functions, system calls, and other tools. If a
command-line tool, such as GCC, has a manual installed, you can read it with:
man gcc
This should give you a summary of the different command line flags or options,
examples, and any caveats or warnings. Man pages are grouped into numbered
sections.
To quickly look up the documentation for a library function we can specify section
3, then the function name to look up its manual.
man 3 fprintf
NAME
printf, fprintf, dprintf, sprintf, snprintf, vprintf, vfprintf,
vdprintf, vsprintf, vsnprintf - formatted output conversion
SYNOPSIS
#include <stdio.h>
#include <stdarg.h>
I use this all the time when working on Linux or Mac - it's usually quicker than
Googling. And usually far more correct than what you're likely to find on top-voted
Stack Overflow responses to questions!
• Man pages should always be your first reference. Google for additional
information - usually this is a caveat not considered in the man pages, like
"This function is not portable to Windows, use XYZ instead."
View Directory Structure with tree
The ls command is similar to dir on Windows. For a tree view of directories and
their contents install and run tree.
├── 00_intro.md
├── 01_assertions.md
├── 02_image_write
│   ├── a.out
│   ├── main.c
│   └── out.ppm
├── 02_image_write_chart
│   ├── a.out
│   ├── image.c
│   ├── image.h
│   ├── main.c
│   ├── out2.ppm
│   └── out.ppm
├── 02_image_write_code
│   ├── a.out
│   └── out.ppm
├── 02_image_write.md
├── 03_binary_files_hexedit.md
├── 04_interactive_debuggers.md
├── 05_profilers.md
├── 05_remotery
│   └── lib
Figure 13.3. A complete tree picture of project directory structure and contents.
Find in Files with grep
You can very quickly search through the contents of files for a particular keyword,
sentence, or fragment, using grep. See man grep for the full list of options. I use this
when I want to quickly find any and every place in any file that calls a particular
function, or uses a particular variable.
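A minimal sketch of that kind of search is shown here. The project files and the function name write_image are invented for the illustration; the -r (recursive) and -n (show line numbers) options are standard grep flags.

```shell
# Create a throwaway project with two source files (illustrative names).
proj=$(mktemp -d)
cat > "$proj/main.c" <<'EOF'
#include "image.h"
int main(void) {
  return write_image("out.ppm");
}
EOF
cat > "$proj/image.c" <<'EOF'
#include "image.h"
int write_image(const char* filename) {
  return filename ? 0 : 1;
}
EOF

# -r searches directories recursively, -n prints the line number of each match.
# sort only makes the order of the results predictable.
matches=$(cd "$proj" && grep -rn "write_image" . | sort)
echo "$matches"
```

Each result line has the form file:line:text, so you can jump straight to the definition in image.c and the call in main.c.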
Search and Replace with sed
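A minimal sketch of a search and replace with sed, using a made-up two-line file as input. The s/OLD/NEW/g expression is standard sed syntax; the words colour and color are just example text.

```shell
# A small input file to operate on (contents are illustrative).
src=$(mktemp)
printf 'int colour = 1;\nprint(colour);\n' > "$src"

# s/OLD/NEW/g replaces every occurrence of OLD on each line with NEW.
# Without -i, sed leaves the input file alone and writes the result to stdout.
replaced=$(sed 's/colour/color/g' "$src")
echo "$replaced"
```

To change the file itself, GNU sed accepts -i for in-place editing, while the BSD sed on macOS expects an argument after it (sed -i '' ...). Redirecting the output to a new file avoids that difference.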
Count Lines of Code with wc and cloc
The wc (word count) command counts newlines, words, and bytes in a file, where
the first value gives us a rough line count.
wc main.cpp
31 147 914 main.cpp
Language       files  blank  comment  code
C                  2     22       14   158
C/C++ Header       1     12       12    15
SUM                3     34       26   173
Figure 13.4. cloc counts lines of code, excluding whitespace, and knows many
languages.
For a more accurate lines-of-code count, including comments, you can install a
program called cloc https://github.com/AlDanial/cloc. This is available in Linux
and HomeBrew repositories (for macOS), and can also be installed natively on
Windows.
Redirect and Pipe stdout, stderr, or stdin
You will often see command-line scripts that chain the inputs and outputs of
several programs together using pipe |, get-input-from-file <,
redirect-output-to-file >, and append-to-file >>.
The most common use is to write the stdout of a program to a text file.
You can also write only the error output to a file by specifying file stream 2 for
stderr.
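The redirections described above can be sketched as follows. The shell function my_program is a stand-in for a real program, and the output file names are arbitrary.

```shell
out_dir=$(mktemp -d)

# Stand-in for a real program: one line on stdout, one line on stderr.
my_program() {
  echo "normal output"
  echo "error output" >&2
}

my_program > "$out_dir/output.txt" 2> "$out_dir/errors.txt"  # separate files
my_program >> "$out_dir/output.txt" 2> /dev/null             # append stdout, discard stderr
my_program > "$out_dir/all.txt" 2>&1                         # both streams in one file

cat "$out_dir/output.txt"
```

The order matters in the last line: 2>&1 means "send stderr wherever stdout currently points", so it has to come after the > redirection.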
Pipe | connects the stdout of a program to the stdin of another program. You
can, for example, search for a word in the output of a program by piping it to grep.
• You can chain several programs’ inputs and outputs together in this way,
in one command.
• These mostly work the same on Windows too, although in PowerShell
you cannot use the < operator.
• To reduce output of a command (or program) to just the top or bottom few
lines, you can pipe the result to head or tail, respectively, and then
again redirect that to a file if you wish.
• tail can also be handy for dealing with big system log files, where you
only care about the last hour or so.
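As a small sketch of chaining with pipes, the commands below use seq as a stand-in for a program that prints a lot of output, one number per line. The variable names and the temporary file are only for the illustration.

```shell
# Keep only the first three lines, or only the last two.
first=$(seq 1 100 | head -n 3)
last=$(seq 1 100 | tail -n 2)

# Chain three programs: generate, filter for lines containing a 7,
# keep the first two matches, and redirect the result to a file.
top_file=$(mktemp)
seq 1 100 | grep 7 | head -n 2 > "$top_file"
sevens=$(cat "$top_file")

echo "$first"
echo "$last"
echo "$sevens"
```

The same pattern works with a real log file in place of seq, for example piping it through grep and then tail to see only the most recent matching lines.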
Concatenate Files with cat
The cat command, given one file as an argument, will print its contents to
stdout. Given more than one file, it concatenates their contents
one-after-the-other.
This creates a file combining the contents of the first two files. You could also
append to a file by using the >> operator instead of >.
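A minimal sketch of both cases, with made-up file names a.txt, b.txt, and combined.txt in a scratch directory:

```shell
cat_dir=$(mktemp -d)
printf 'first\n' > "$cat_dir/a.txt"
printf 'second\n' > "$cat_dir/b.txt"

# Concatenate two files into a new one (> overwrites combined.txt).
cat "$cat_dir/a.txt" "$cat_dir/b.txt" > "$cat_dir/combined.txt"

# Append another copy of b.txt to the end (>> keeps the existing contents).
cat "$cat_dir/b.txt" >> "$cat_dir/combined.txt"

# With a single argument, cat just prints the file.
cat "$cat_dir/combined.txt"
```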
By default ps only lists your own user-owned processes. The most commonly
used invocation of ps is ps aux, which shows (a)ll processes, and their (u)ser or
owner, and (x) processes not attached to a user or terminal.
Monitor Resource Usage with top and htop
top is a process monitor. You can watch your program's resource consumption,
or find out if your program is running slowly because something else is using all
the resources. The hungriest processes are at the top of the list, where you can
also see each process identifier (PID), with percentages of CPU and memory
consumed.
• Processes with lower “nice” numbers have higher priority.
• You can kill a process with k.
Figure 13.6. The htop process monitor provides visualisations and different view
filters.
A more sophisticated version called htop can be installed. It's a bit easier to
read, and has visualisations of a sort, to give a clearer picture of overall system
CPU core utilisation and memory consumption.
Force Quit a Process with kill, killall, and xkill
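A minimal sketch of force quitting by process ID, using a background sleep as a stand-in for a stuck program. In practice you would read the PID from ps or top rather than from $!.

```shell
sleep 60 &            # stand-in for a stuck program, running in the background
pid=$!                # $! is the PID of the most recent background job
kill "$pid"           # default signal is SIGTERM: a request to exit cleanly
status=0
wait "$pid" || status=$?   # reap the process and record how it ended
echo "exit status: $status"
```

An exit status above 128 means the process was ended by a signal (128 plus the signal number, so 143 for SIGTERM). If a program ignores SIGTERM, kill -9 with the same PID sends SIGKILL, which cannot be caught. On Linux, killall takes a program name instead of a PID, and xkill lets you click on the window of the program to end on an X11 desktop.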
To time a program or command you can use the time command, or the more
detailed /usr/bin/time.
time ./my_program
/usr/bin/time ./my_program
The /usr/bin/time program can give additional statistics in verbose mode -v.
$ time ./my_program
real 0m0.005s
user 0m0.005s
sys 0m0.000s
$ /usr/bin/time -v ./my_program
Command being timed: "./my_program"
User time (seconds): 0.01
System time (seconds): 0.00
Percent of CPU this job got: 92%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.01
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1728
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 111
Voluntary context switches: 1
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 400
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Figure 13.7. Two different timers collecting user, system, and real run time for a
program called my_program.
It is also possible to collect the running time and other statistics using the perf
profiler on Linux, for example with perf stat ./my_program.
Closing Remarks
We hope this book introduced you to some useful tools for improving your work,
and that you enjoyed the presentation. We intend to expand the availability of this
work in other formats and to other distributors. If you have any preferences in this
regard please let us know.
We would love to produce further resources, and do lots more illustrations and
cartoons to help with other subjects. Look out for future volumes! If you think this
style would work well with another topic please also send a request.
Acknowledgements
Special thanks to Clare Conran and the excellent people at the ADAPT Centre
for hosting at Trinity College Dublin and encouraging the start of this work.
We would like to thank the following reviewers for their invaluable insights and
corrections:
Thanks to Daniel Gibson for comments and suggestions on assertions that are
incorporated here. Thanks to Don Williamson for advice about, and images of,
Remotery. Thanks again to Saija Sorsa for training me in the art of fuzzing
software. Thanks to Rowan Hughes for advice. Thanks to Rich Geldreich for
encouraging the use of fuzzers, which set this work in a good direction. Thanks to
Claire Cunningham for design advice.
About the Authors
Anton Gerdelan
Katja Zibrek
Printed in Poland
by Amazon Fulfillment
Poland Sp. z o.o., Wrocław
If you answer yes to any of the following then this book is definitely for you!
• Are you looking to pick up a few new tools? Never used a fuzzer? You should!
• Are you a graduate student wanting some practical knowledge with techniques
and tools that are used in the industry?
• Are you a professor and want to show some neat tricks and tools to your
class?
• Are you an engineering manager, and want to improve the quality and
performance of your team's code by improving their knowledge of memory
sanitation, performance profiling, and CPU cache efficiency?
• Are you starting a new job and you have to use Linux seriously for the first
time?