C/C++ Language Notes
Dennis Yurichev
<[email protected]>
© 2013, Dennis Yurichev.
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported
License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-nc-nd/3.0/.
Text version (May 15, 2014).
There is probably a newer version of this text, and also a Russian-language version, available at
http://yurichev.com/C-book.html
You may also follow me on Twitter (@yurichev) or subscribe to the mailing list to get information about
updates of this text.
Contents
Preface
    0.1 Target audience
    0.2 About author
    0.3 Thanks
1 Operating System
2 C
    2.1 Memory in C
        2.1.1 Local stack
        2.1.2 alloca()
        2.1.3 Allocating memory in heap
        2.1.4 Local stack or heap?
    2.2 Strings in C
        2.2.1 String length storage
        2.2.2 String returning
        2.2.3 1: Constant string returning
        2.2.4 2: Via global array of characters
        2.2.5 Standard string C functions
        2.2.6 Unicode
        2.2.7 Lists of strings
    2.3 Your own data structures in C
        2.3.1 Lists in C
        2.3.2 Binary trees in C
        2.3.3 One more thing
    2.4 Object-oriented programming in C
        2.4.1 Structures initialization
        2.4.2 Structures deinitialization
        2.4.3 Structures copying
        2.4.4 Encapsulation
    2.5 C standard library
        2.5.1 assert
        2.5.2 UNIX time
        2.5.3 memcpy()
        2.5.4 bzero() and memset()
        2.5.5 printf()
        2.5.6 atexit()
        2.5.7 bsearch(), lfind()
        2.5.8 setjmp(), longjmp()
        2.5.9 stdarg.h
        2.5.10 srand() and rand()
    2.6 C99 C standard
3 C++
    3.1 Name mangling
    3.2 C++ declarations
        3.2.1 C++11: auto
    3.3 C++ language elements
        3.3.1 references
    3.4 Input/output
    3.5 Templates
    3.6 Standard Template Library
    3.7 Criticism
4 Other
    4.1 Error codes returning
        4.1.1 Negative error codes
    4.2 Global variables
    4.3 Bit fields
    4.4 Interesting open-source projects worth for learning
        4.4.1 C
        4.4.2 C++
5 GNU tools
    5.1 gcov
6 Testing
Afterword
    6.1 Questions?
Acronyms used
Bibliography
Glossary
Index
Preface
Today, in 2013, if one wants to write a program that is 1) as fast as possible, or 2) as compact as possible for embedded systems
or low-cost microcontrollers, the choice is very limited: C, C++ or assembly language. And it seems that in the near future there
will be no alternatives to these old but popular programming languages.
Pure C should still be considered: a huge number of large programs are still developed in it, e.g. the Linux kernel, the Windows
NT line of OS kernels, Oracle RDBMS, etc.
0.1 Target audience
This collection of notes is intended neither for beginners nor for experts; it is rather for those who want to refresh their C/C++
knowledge.
0.2 About author
Dennis Yurichev is an experienced programmer and reverse engineer. He is also available as a freelance teacher of assembly language,
reverse engineering and C/C++. He can teach remotely via e-mail, Skype or any other messenger, or in person in Kiev, Ukraine. His
CV is available here.
0.3 Thanks
Andrey herm1t Baranovich, Slava Avid Kazakov, Tuta Muniz, Shell Rocket.
Chapter 1
Header files
Header files are an interface description and, in a sense, documentation. It is very convenient to work with the code in one
editor window while keeping the header files in another window for reference. It is therefore advisable to make header files
read like reference documentation.
1.2
1.2.1 C/C++ declarations
As is well known, header files (headers) usually contain function declarations, i.e., function names, argument names
and types and the return type, but no function bodies. This gives the compiler the information about what it is working
with, without delving into the intricacies of the function implementations.
The same can be done for types. Instead of using #include to pull a file with a class definition into
another header file, one can just declare that the type exists.
For example, suppose you work with complex numbers and have a structure like this somewhere:
struct complex
{
    double real;
    double imag;
};
And let's say it is defined in the file my_complex.h.
Of course, one should include the file if one intends to work with variables of the complex type and with specific structure
fields. But if you only declare functions that take pointers to the structure in another header file, you do not need to include
my_complex.h there; the compiler just needs to know that complex is a structure:
struct complex;
void sum(struct complex *x, struct complex *y, struct complex *out);
void pow(struct complex *x, struct complex *y, struct complex *out);
This may speed up compilation and also resolve circular dependencies, when two modules use each other's functions and
types.
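The following is a minimal sketch of the idea above, squeezed into one translation unit for illustration. The name complex_sum() is hypothetical (it is chosen to avoid clashing with pow() from math.h); the comments mark which part would live in which header.

```c
/* What another header needs: only the fact that struct complex exists
   (an incomplete type is enough for pointers to it). */
struct complex;
void complex_sum(const struct complex *x, const struct complex *y,
                 struct complex *out);

/* What my_complex.h would provide: the full definition. */
struct complex
{
    double real;
    double imag;
};

/* The implementation touches the fields, so it needs the full definition. */
void complex_sum(const struct complex *x, const struct complex *y,
                 struct complex *out)
{
    out->real = x->real + y->real;
    out->imag = x->imag + y->imag;
}
```

The prototype compiles before the structure is defined because the compiler only has to know the size of a pointer, not the size of the structure.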
Frequent caveats
In order to declare two pointers to char, one may write by inertia:
char* p1, p2;
This is not correct, because the compiler treats this declaration as:
char *p1, p2;
... and declares a pointer to char and a plain char.
This one is correct:
char *p1, *p2;
const
Declaring variables, function arguments and C++ class methods as const is advisable because:
Self-documentation: it is easy to see that the element is read-only.
Protection from errors: in the case of a global const variable, the process will typically crash on an attempt to write to it, due to
memory protection. The compiler also reports an error on an attempt to modify a const argument inside a function.
Optimization: knowing that the element is always read-only, the compiler may generate faster code for it.
It is highly advisable to declare all function arguments which you do not plan to modify as const. For example,
strcmp() changes nothing in its input arguments, so both are usually declared as const. strcat()
changes nothing in its second argument but modifies the first, so only the second argument is usually declared
as const.
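As an illustration of a read-only argument, here is a small sketch; count_char() is a hypothetical helper, not a standard function.

```c
#include <stddef.h>

/* Hypothetical helper: counts occurrences of c in s.
   The string is only read, so the pointer is declared const. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    /* *s = 'x';  <- would not compile: s points to const char */
    return n;
}
```

The prototype alone tells the caller that the string will not be changed, and string literals can be passed safely.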
C++: In C++, it is highly advisable to declare class methods which do not change anything in the object as const.
const methods of a class are also called accessors, while non-const methods are called manipulators [11].
long double: float is a 32-bit number in IEEE 754 format and double is a 64-bit one, but the x86 FPU in fact operates on
80-bit numbers. There is another type for those: long double. It is supported as an 80-bit type in GCC1, whereas MSVC2 treats long double as a 64-bit double.
int: int usually occupies the same number of bits as the general-purpose CPU registers. However, on x86-64, for better
backward compatibility, int is still 32 bits wide.
short, long and long long
At least in MSVC and GCC, short is 16-bit and long long is 64-bit; long is 32-bit in MSVC and on 32-bit GCC targets, but 64-bit in GCC on 64-bit Linux.
In order to avoid confusion, stdint.h (since C99) provides new types: int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t,
uint64_t.
bool: bool is present in C++, and also in C starting from C99(2.5.10) (stdbool.h).
Both in MSVC and GCC, bool occupies 1 byte.
In the Windows API there is the BOOL type, which is a synonym for int.
Signed or unsigned? Signed types (int, char) are used much more often than unsigned ones (unsigned int, unsigned char) 3 .
However, for the sake of self-documenting code, if you declare a variable which will never be assigned a negative
value, including array indices, an unsigned type is perhaps better. For example, unsigned types are often used in LLVM in
places where int might be used.
If you work with bytes, e.g. with bytes in memory, then unsigned char is perhaps better.
Aside from that, this may help protect against errors related to integer overflow [1].
As a simple example:
#define MAX_BUFFER_SIZE 1024
void f(int size)
{
    if (size>MAX_BUFFER_SIZE)
        die ("Too large!");
    void *p=malloc (size);
    ...
};
If size is, for example, -1, then malloc() will be called with the argument 0xffffffff (that is, 4294967295). Of course,
we could add a second sanitizing check, if (size<0), but such a check would look absurd here.
So an unsigned type should be used here, maybe even size_t. size_t is an unsigned type big enough to store the size of
any memory block. It is an unsigned 32-bit integer on 32-bit architectures and an unsigned 64-bit integer on 64-bit ones.
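A sketch of the same check with size_t follows. The function name alloc_checked() is hypothetical, and it returns NULL instead of calling die(), so that it is self-contained.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_BUFFER_SIZE 1024

/* Hypothetical variant of f(): one comparison is enough, because a
   "negative" size converted to size_t becomes a huge positive number. */
void *alloc_checked(size_t size)
{
    if (size > MAX_BUFFER_SIZE)
        return NULL;
    return malloc(size);
}
```

With an unsigned argument, a value like -1 passed by mistake is rejected by the very same upper-bound check, so no separate check for negative values is needed.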
char or uint8_t instead of int? One may think that if a value will always be within the 0..100 range, then it is not necessary to
allocate the whole 32-bit int, and smaller types like char or unsigned char may be enough. Besides, they would require less memory.
This is not so. Because of alignment on a 4-byte boundary (or on an 8-byte boundary on 64-bit architectures), a variable declared
as char often requires as much space as an int.
Of course, the compiler may allocate only 1 byte for the char, but then the CPU4 will spend more time accessing bytes which are
not aligned on a boundary.
Processing individual bytes may be more expensive and slower than processing 32-bit or 64-bit values, because CPU registers usually have the same width as the CPU word size. Moreover, RISC5 CPUs (e.g. ARM) may not be able to work with
individual bytes internally at all, because they have only 32-bit registers.
So if you are choosing a type for a local variable, int/unsigned int may be better.
On the other hand, which types are better suited for structures? This is a question of balancing speed against
compactness. On the one hand, one may use char for small variables, flags, bitfields, enums, etc., but one should not forget
that access to these variables may be slower. On the other hand, if int is used for every variable, working with the structure
will be faster, but it will require more space in memory.
For example:
1 GNU Compiler Collection
2 Microsoft Visual C++
3 The signedness of char is not fixed in the standard, but in GCC and MSVC on x86 it is a signed type by default. It can be changed in GCC with the
-funsigned-char option and in MSVC with the /J option.
4 Central processing unit
5 Reduced instruction set computing
struct
{
    char some_flags; // 1 byte
    void* ptr; // 4/8 bytes, offset: +1
} s;
If this is compiled with structure packing on a 1-byte boundary, access to some_flags in memory may be even faster
than to ptr, because the first field is aligned on a 4-byte boundary, while the second is not.
If this is compiled with default structure packing, then 4 bytes will be allocated for the first field and the offset of the second
field will be +4 (8 bytes and +8 on 64-bit architectures).
Summarizing: if compactness and memory footprint are important, then char, uint16_t, etc. may be used.
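The layout actually chosen by a compiler can be inspected with offsetof() and sizeof. The sketch below assumes default packing; the structure tag and function names are hypothetical.

```c
#include <stddef.h>

struct flags_and_ptr
{
    char some_flags;
    void *ptr;
};

/* Offset of ptr chosen by the compiler under its default packing. */
size_t ptr_offset(void)
{
    return offsetof(struct flags_and_ptr, ptr);
}

/* Bytes occupied by the structure beyond its two fields, i.e. padding. */
size_t padding_bytes(void)
{
    return sizeof(struct flags_and_ptr) - sizeof(char) - sizeof(void *);
}
```

On typical 32-bit targets ptr_offset() is expected to be 4 and on typical 64-bit targets 8, but the exact numbers depend on the compiler and its packing settings.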
x86-64 or AMD64: On the new 64-bit x86 CPUs, int/unsigned int is still 32-bit, perhaps for compatibility. So if one needs
64-bit variables, one may use uint64_t or int64_t.
But pointers, of course, are 64 bits wide.
typedef
typedef introduces a synonym for a data type. It is often used for structures, in order not to write struct each time before
the name, e.g.:
typedef struct _node
{
    // the name node is not yet known at this point, so the struct tag is used
    struct _node *prev;
    struct _node *next;
    void *data;
} node;
A lot of such examples may be found in the header files of the Windows SDK (Windows API).
Nevertheless, typedef can be used not only for structures but also for ordinary integral types:
typedef int age;
int compute_mean (age wife, age husband);
typedef int coord;
void draw (coord X, coord Y, coord Z);
typedef uint32_t address;
void write_memory (address a, size_t size, byte *buf);
As we can see, typedef here helps to document the code: it is now easier to read.
For example, time_t (the time in the UNIX time format, e.g. what time() returns) is in fact a 32-bit number,
but the type is usually defined in the time.h file as:
typedef long __time32_t;
The #define preprocessor directive may be used here as well (many do so), but it is worse in terms of error handling during
compilation.
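One concrete way in which #define is weaker than typedef shows up with pointer types. The sketch below uses hypothetical names (pchar_t, PCHAR_M) and compares what the second variable in a declaration turns out to be.

```c
#include <stddef.h>

typedef char *pchar_t;      /* hypothetical names */
#define PCHAR_M char *

/* With the typedef, both a and b are pointers to char. */
size_t second_size_typedef(void)
{
    pchar_t a, b;
    (void)sizeof(a);
    return sizeof(b);
}

/* The macro expands to "char * a, b;", so b is a plain char. */
size_t second_size_macro(void)
{
    PCHAR_M a, b;
    (void)sizeof(a);
    return sizeof(b);
}
```

The macro is a plain text substitution, so the compiler never sees it as a type; the typedef is a real type name and behaves accordingly.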
typedef criticism: Nevertheless, such a well-known and experienced programmer as Linus Torvalds is against typedef usage:
[17].
1.2.2 Definitions
String declarations
Character sequences used in strings
\0    0x00    zero byte
\a    0x07    bell
\t    0x09    tabulation
\n    0x0A    line feed (LF)
\r    0x0D    carriage return (CR)
1.3 Language elements
1.3.1 Comments
It is sometimes useful to insert them right into a function call, in order to have a visual note about the meaning of an argument:
f (val1, /* a very special flag! */ false, /* another special flag here */ true);
A whole code block can be commented out with the help of #if 8 :
    ta = aemif_calc_rate(t->ta, clkrate, TA_MAX);
    rhold = aemif_calc_rate(t->rhold, clkrate, RHOLD_MAX);
#if 0
    rstrobe = aemif_calc_rate(t->rstrobe, clkrate, RSTROBE_MAX);
    rsetup = aemif_calc_rate(t->rsetup, clkrate, RSETUP_MAX);
    whold = aemif_calc_rate(t->whold, clkrate, WHOLD_MAX);
#endif
    wstrobe = aemif_calc_rate(t->wstrobe, clkrate, WSTROBE_MAX);
    wsetup = aemif_calc_rate(t->wsetup, clkrate, WSETUP_MAX);
This might be more convenient than the usual way, because the text editor or IDE9 will not break the indentation
while auto-indenting in this case.
7 https://en.wikipedia.org/wiki/Here_document
8 preprocessor directive
9 Integrated development environment
1.3.2 goto
Usage of goto10 is considered bad taste and harmful [4] [3]; nevertheless, its use in reasonable doses [9] may be very
helpful.
One frequent example is returning from a function:
void f(...)
{
    byte* buf1=malloc(...);
    byte* buf2=malloc(...);
    ...
    if (something_goes_wrong_1)
        goto cleanup_and_exit;
    ...
    if (something_goes_wrong_2)
        goto cleanup_and_exit;
    ...
cleanup_and_exit:
    free(buf1);
    free(buf2);
    return;
};
A more complex example:
void f(...)
{
    byte* buf1=malloc(...);
    byte* buf2=malloc(...);
    FILE* f=fopen(...);
    if (f==NULL)
        goto cleanup_and_exit;
    ...
    if (something_goes_wrong_1)
        goto close_file_cleanup_and_exit;
    ...
    if (something_goes_wrong_2)
        goto close_file_cleanup_and_exit;
    ...
close_file_cleanup_and_exit:
    fclose(f);
cleanup_and_exit:
    free(buf1);
    free(buf2);
    return;
};
10 statement
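Here is a compilable sketch of the same cleanup pattern. The function process() and its fail_step parameter are hypothetical; the parameter only simulates an error at a given step.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical worker: returns 0 on success, -1 on failure.
   fail_step simulates an error at step 1 or 2 (0 means no error). */
int process(int fail_step)
{
    int rc = -1;
    char *buf1 = malloc(64);
    char *buf2 = malloc(64);

    if (buf1 == NULL || buf2 == NULL)
        goto cleanup_and_exit;
    if (fail_step == 1)
        goto cleanup_and_exit;
    memset(buf1, 0, 64);
    if (fail_step == 2)
        goto cleanup_and_exit;
    memset(buf2, 0, 64);
    rc = 0;

cleanup_and_exit:
    free(buf1);   /* free(NULL) is a no-op, so this is safe on every path */
    free(buf2);
    return rc;
}
```

Every exit path goes through the same label, so the buffers are released exactly once regardless of where the error happened.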
1.3.3 for
The for() statement, as we know, has 3 expressions: the 1st is computed before all iterations begin, the 2nd before each
iteration and the 3rd after each iteration.
And of course, something different from the usual counter may be written there.
Caveat #1
If one writes this:
#include <stdio.h>
#include <string.h>
void count_spaces(char *s)
{
    int spaces=0;
    for (int i=0; i<strlen(s); i++)
    {
        if (s[i]==' ')
            spaces++;
    };
    printf ("spaces=%d\n", spaces);
};
int main()
{
    count_spaces("The quick brown fox jumps over the lazy dog");
    return 0;
};
... perhaps this is a mistake: strlen(s) will be called before each iteration; that is the code MSVC 2010 generated.
However, GCC 4.8.1 calls strlen(s) only once, at the beginning of the loop.
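To avoid depending on the optimizer, the length can be computed once before the loop. This is a sketch of such a variant; unlike the original, it returns the count instead of printing it.

```c
#include <stddef.h>
#include <string.h>

/* Returns the number of spaces in s; strlen() is called exactly once. */
size_t count_spaces(const char *s)
{
    size_t spaces = 0;
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++)
        if (s[i] == ' ')
            spaces++;
    return spaces;
}
```

For the sentence used above, the function finds 8 spaces.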
Comma
The comma operator [6, 6.5.17] is not a widely understood C feature; however, it is very useful in for() statements.
For example, it is useful to have two counters or iterators simultaneously. Let the counter just count from 0, adding 1 at
each iteration, while the iterator points to the list element:
#include <iostream>
#include <list>
int main()
{
    std::list<int> l;
    l.push_back(123);
    l.push_back(456);
    l.push_back(789);
    l.push_back(1);
    int i;
    std::list<int>::iterator it;
    for (i=0, it=l.begin(); it!=l.end(); i++, it++)
        std::cout << i << ": " << *it << std::endl;
};
0: 123
1: 456
2: 789
3: 1
However, it is not possible to declare variables of different types in the for() clause:
for (int i=0, std::list<int>::iterator it=l.begin(); it!=l.end(); i++, it++)
Nevertheless, variables of the same type can be defined:
for (int i=0, j=10; i<20; i++, j++)
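The same idea in plain C: a sketch of a hypothetical reverse_in_place() function, which moves two indices towards each other. Note that the comma in "i = 0, j = len - 1" is part of a declaration, while "i++, j--" is the comma operator proper.

```c
#include <stddef.h>
#include <string.h>

/* Reverses s in place, moving two indices towards each other. */
void reverse_in_place(char *s)
{
    size_t len = strlen(s);
    if (len < 2)
        return;
    for (size_t i = 0, j = len - 1; i < j; i++, j--)
    {
        char tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
    }
}
```

The early return for short strings keeps len - 1 from wrapping around when the string is empty.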
continue
continue is an unconditional goto to the end of the loop body.
This may be very useful, for example, in code like this:
for (...)
{
    if (is_element_satisfied_criteria_1(...)==true)
    {
        // do something needed by is_element_satisfied_criteria_2()
        if (is_element_satisfied_criteria_2(...)==true)
        {
            do_something_1();
            do_something_2();
            do_something_3();
        };
    };
};
... all of this can be replaced by the neater:
for (...)
{
    if (is_element_satisfied_criteria_1(...)==false)
        continue;
    // do something needed by is_element_satisfied_criteria_2()
    if (is_element_satisfied_criteria_2(...)==false)
        continue;
    do_something_1();
    do_something_2();
    do_something_3();
};
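A small compilable instance of the flattened form, with two hypothetical criteria (the element is positive, the element is even):

```c
#include <stddef.h>

/* Sums the elements that are both positive and even, skipping the rest. */
long sum_positive_even(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
    {
        if (a[i] <= 0)
            continue;           /* criteria 1 failed */
        if (a[i] % 2 != 0)
            continue;           /* criteria 2 failed */
        sum += a[i];
    }
    return sum;
}
```

Each rejected element leaves the loop body early, so the main action stays at a single indentation level.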
1.3.4 if
...
printf ("val=%s\n", val ? "true" : "false");
1.3.5 switch
switch(...)
{
case 0:
case 1:
case 2:
case 3:
    fn1();
    break;
case 4:
case 5:
case 6:
case 7:
    fn2();
    break;
};
And this non-standard GCC extension 11 may make things somewhat simpler:
switch(...)
{
case 0 ... 3:
    fn1();
    break;
case 4 ... 7:
    fn2();
    break;
};
So if you plan to use only the GCC compiler, it is possible to do so.
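For code that must stay portable, the standard form with grouped case labels does the same job. A sketch with a hypothetical classify() function, which returns a number instead of calling fn1() and fn2():

```c
/* Returns 1 for 0..3, 2 for 4..7 and 0 for anything else. */
int classify(int x)
{
    switch (x)
    {
    case 0: case 1: case 2: case 3:
        return 1;
    case 4: case 5: case 6: case 7:
        return 2;
    default:
        return 0;
    }
}
```

Several labels may share one line, which keeps the standard version almost as compact as the extension.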
Variable declarations inside switch()
It is not possible to declare them directly after a case label, but it is possible to open a new block and to declare them there (in C++ or starting from C99):
switch(...)
{
case 0:
    {
        ...
    };
    break;
case 1:
case 2:
    ...
};
11 http://gcc.gnu.org/onlinedocs/gcc/Case-Ranges.html
1.3.6 sizeof
Usually, sizeof() is applied to integral types or to structures, but it is possible to apply it to arrays as well:
char buf[1024];
snprintf(buf, sizeof(buf), "...");
Otherwise, if the array length (1024) is specified in both places (in the buf declaration and as the second argument of snprintf()),
then the value has to be changed in both places each time, and it is easy to forget about this.
If one needs wide strings, then sizeof() can be applied to wchar_t (which is, in turn, a 16-bit data type like short, at least on Windows):
wchar_t buf[1024];
swprintf(buf, sizeof(buf)/sizeof(wchar_t), L"...");
sizeof() returns the size in bytes, so here it will be 1024 * 2, i.e., 2048. But we can divide this value by the size of one array
element (wchar_t, 2 bytes) in order to get the number of elements in the array (1024).
sizeof() can be applied to an array of structures:
struct phonebook_entry
{
    char *name;
    char *surname;
    char *tel;
};
struct phonebook_entry phonebook[]=
{
    { "Kirk", "Hammett", "555-1234" },
    { "Lars", "Ulrich", "555-5678" },
    { "James", "Hetfield", "555-1122" },
    { "Robert", "Trujillo", "555-7788" }
};
void dump (struct phonebook_entry* input)
{
    for (int i=0; i<sizeof(phonebook)/sizeof(struct phonebook_entry); i++)
        printf ("%s %s - %s\n", input[i].name, input[i].surname, input[i].tel);
};
sizeof(phonebook) is the size of the whole array of structures in bytes. sizeof(struct phonebook_entry) is the size
of one structure in bytes. By division we get the number of structures in the array.
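The division is often wrapped into a macro. The sketch below uses a hypothetical COUNT_OF() macro and also shows why the element count has to be passed separately once the array is handed to a function through a pointer.

```c
#include <stddef.h>

/* Hypothetical helper macro: number of elements in a real array. */
#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

static const int primes[] = { 2, 3, 5, 7, 11 };

size_t primes_count(void)
{
    return COUNT_OF(primes);      /* primes is an array here */
}

/* Inside a function an "array" parameter is just a pointer, so
   sizeof would give the pointer size; the count is passed explicitly. */
int sum(const int *a, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += a[i];
    return total;
}
```

This is also the reason why dump() above takes the size from the global phonebook array and not from its input argument.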
1.3.7 Pointers
As Donald Knuth once said in an interview [10], the way C handles pointers was a brilliant innovation at the time.
So let us fix the terminology. A pointer is just an address of some element in memory. The reason pointers are so popular is
that it is much easier to pass the address of an object into a function than to pass the whole object, which would be absurd.
Besides, the called function, e.g. one processing a data array, will just change something in it instead of returning a new one, which
would be absurd too.
Let's take a simple example. The standard C function strtok() just splits a string into substrings using the specified character as
a delimiter. For example, we may specify the string "The quick brown fox jumps over the lazy dog" and set the space
as the delimiter.
#include <string.h>
#include <stdio.h>
int main()
{
    char str[] = "The quick brown fox jumps over the lazy dog"; // correct
    //char *str= "The quick brown fox jumps over the lazy dog"; // incorrect
    char *sep = " ";
    /* get the first token */
    char *token = strtok(str, sep);
    /* walk through other tokens */
    while( token != NULL )
    {
        printf( "%s\n", token );
        token = strtok(NULL, sep);
    }
};
What we got on output:
The
quick
brown
fox
jumps
over
the
lazy
dog
What is going on here is that strtok() just searches for the next space (or any other delimiter from the set) in the input string,
writes 0 to it (this is the string terminator by C convention) and returns a pointer to the word which precedes it.
As a shortcoming, it can be said that strtok() garbles the input string, writing zeros in place of the delimiters.
What is worth noting: no strings or substrings are copied in memory. The input string is still in its own place.
Only a pointer to the string (i.e., its address) is passed to strtok().
After writing the 0, the function returns the address of each consecutive word.
The address of the word is then passed to printf(), where it is dumped to the console.
N.B. An incorrect declaration of str is also present in the source code (commented out).
It is incorrect in the sense that a C string literal should be treated as const char*, since it is usually located in the constant data segment, which is write-protected.
If one does so, strtok() will not be able to modify the input string by writing zeros there, and the process will crash.
So, in our example, the string is allocated as an array of char instead of an array of const char.
Generalizing, we may say that all standard C string functions work with strings using only their addresses.
For example, the string comparison function strcmp() takes the addresses of two strings and compares them character by character. It would be absurd to copy these strings to some other place so that strcmp() could process them.
The difficulty in understanding C pointers lies in the fact that a pointer is only a part of an object. The pointer to the string is not
the string itself. The string should be placed somewhere in memory, memory should be allocated for it beforehand, etc.
In higher-level PLs12 an object and a pointer to it may be represented as a single whole, and that makes understanding
simpler.
This does not mean, however, that strings and other objects are copied wastefully in these PLs: pointers are used there
internally just as in C, but these mechanisms are hidden from the programmer.
Passing a value to a function is also called call by value or pass by value, while passing a pointer to an object is called
call by reference or pass by reference.
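The difference between the two ways of passing can be sketched with two hypothetical functions:

```c
/* Receives a copy: the caller's variable is not affected. */
void increment_copy(int x)
{
    x++;
    (void)x;
}

/* Receives an address: the caller's variable is modified through it. */
void increment_in_place(int *x)
{
    (*x)++;
}
```

After increment_copy(v) the variable v keeps its value, while increment_in_place(&v) increases it by one.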
Syntactic sugar for array[index]
For the sake of simplification, it could be said that C has no arrays at all; it has only syntactic sugar for expressions like
array[index].
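The C standard defines a[i] as *(a + i), which can be checked directly. A sketch with a hypothetical same_element() function:

```c
/* All three expressions read the same element. */
int same_element(const int *array, int index)
{
    int a = array[index];
    int b = *(array + index);
    int c = index[array];     /* legal, because a[i] is defined as *(a + i) */
    return a == b && b == c;
}
```

The third form is never used in practice, but the fact that it compiles shows that indexing is just pointer addition followed by a dereference.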
12 Programming Language
1.3.8 Operators
==
Somewhat unpleasant mistakes may appear if the condition if(a==3) becomes if(a=3) as a result of a typo. The expression
a=3 evaluates to 3, and 3 is not 0, so the if() condition will always trigger.
It was fashionable in the past to protect against such mistakes by writing if(3==a); thus, we get if(3=a) in case of a typo,
and the compiler reports an error instantly.
Nevertheless, modern compilers usually warn about if(a=3), so swapping the elements in conditions is probably not necessary these days.
1.3.9 Arrays
1.3.10 struct
In C99(2.5.10) it is possible to initialize specific structure fields. Fields which are not set will be filled with zeroes. A lot of such examples
can be found in the Linux kernel.
struct color
{
    int R;
    int G;
    int B;
};
struct color blue={ .B=255 };
Moreover, it is possible to create a structure right in the function arguments, e.g.:
#include <stdio.h>
struct color
{
    int R;
    int G;
    int B;
};
struct color blue={ .B=255 };
void print_color_info (struct color *c)
{
    printf ("%d %d %d\n", c->R, c->G, c->B);
};
int main()
{
    print_color_info(&blue);
    print_color_info(&(struct color){ .G=255 });
};
17 works like lfind(), but if the element is absent there, it also inserts it
18 http://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
A structure can also be returned from a function in the same way:
struct pair
{
    int a;
    int b;
};
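A possible way to return a structure built in place is a C99 compound literal in the return statement. The function make_pair() below is a hypothetical illustration, not necessarily the example the text originally continued with.

```c
struct pair
{
    int a;
    int b;
};

/* Hypothetical helper: builds the result with a C99 compound literal. */
struct pair make_pair(int a, int b)
{
    return (struct pair){ .a = a, .b = b };
}
```

The structure is returned by value, so the caller receives its own copy and no memory has to be allocated or freed.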
1.3.11 union
union is often used when some place in a structure needs to store one of various data types by choice. For example:
union
{
    int i; // 4 bytes
    float f; // 4 bytes
    double d; // 8 bytes
} u;
Such a union allows storing one of these variables by choice. It requires the same amount of space as its largest
element (double): 8 bytes.
union is also often used as a way to access some data type as another data type.
For example, as we know, each XMM register in SSE may be represented as 16 bytes, 8 16-bit words, 4 32-bit words, 2
64-bit words, 4 float variables or 2 double variables. This is how it can be declared:
union XMM_register
{
    double d[2];
    float f[4];
    uint8_t b[16];
    uint16_t w[8];
    uint32_t i[4];
    uint64_t q[2];
};
union XMM_register reg;
reg.d[0]=123.4567;
reg.d[1]=89.12345;
// here we can use reg.b[...]
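A smaller sketch of the same trick: reading the bit pattern of a float through another union member. It assumes that float is a 32-bit IEEE 754 number, which holds on mainstream platforms; the names float_bits and bits_of() are hypothetical. Reading a member other than the one last written is accepted in C, but not in C++.

```c
#include <stdint.h>

union float_bits
{
    float f;
    uint32_t u;
};

/* Reads the bit pattern of a float through the other union member. */
uint32_t bits_of(float f)
{
    union float_bits fb;
    fb.f = f;
    return fb.u;
}
```

Under the stated assumption, bits_of(1.0f) gives 0x3F800000: sign 0, biased exponent 127, zero mantissa.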
It is also handy to use it with a structure whose fields have bit granularity, such as the x86 CPU flags:
typedef struct _s_EFLAGS
{
    unsigned CF : 1;
    unsigned reserved1 : 1;
    unsigned PF : 1;
    unsigned reserved2 : 1;
    unsigned AF : 1;
    unsigned reserved3 : 1;
    unsigned ZF : 1;
tagged union
It is union plus flag (tag), defines type of union. For example, if we need a variable which can be a number, a float point
number, a text string (as in dynamically typed PL 20 ), then we may declare such structure:
enum var_type
{
INT,
DOUBLE,
STRING
};
struct
{
enum var_type tag; // 4 bytes
union
{
int i; // 4 bytes
double d; // 8 bytes
char *string; // 4 bytes (on 32-bit architecture)
} u;
19 example is taken from:
http://stackoverflow.com/questions/1001307/detecting-endianness-programmatically-in-a-c-program
} variable;
The whole size of the structure is 8 + 4 = 12 bytes (alignment padding added by the compiler may increase it). It is much more compact than allocating separate fields for a variable of each type.
Beginning with C11 [7], the union name (u) may be omitted; this is called an anonymous union:
struct
{
enum var_type tag; // 4 bytes
union
{
int i; // 4 bytes
double d; // 8 bytes
char *string; // 4 bytes (on 32-bit architecture)
};
} variable;
... and its fields are accessed as variable.i, variable.d, etc.
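As an illustration of how the tag is meant to be consulted before touching the union, here is a minimal sketch. The struct name variable and the helper variable_to_text() are hypothetical, introduced only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum var_type { INT, DOUBLE, STRING };

struct variable
{
    enum var_type tag;      /* tells which union member is valid */
    union
    {
        int i;
        double d;
        const char *string;
    } u;
};

/* Hypothetical helper: render the value into buf according to the tag.
   Returns what snprintf() returns, or -1 for an unknown tag. */
int variable_to_text(const struct variable *v, char *buf, size_t size)
{
    assert(v != NULL && buf != NULL);
    switch (v->tag)
    {
    case INT:    return snprintf(buf, size, "%d", v->u.i);
    case DOUBLE: return snprintf(buf, size, "%.2f", v->u.d);
    case STRING: return snprintf(buf, size, "%s", v->u.string);
    }
    return -1;
}
```

The point of the design is that every reader of the union goes through the tag first; reading a member other than the one last written is exactly the mistake the tag is there to prevent.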
tagged pointers
Let's get back to the example of a variable which can be an integer, a floating point number or a text string. The largest type is double (8 bytes), which means that if a lot of such blocks are stored in memory back to back, a pointer to each block will always be aligned on an 8-byte boundary. Moreover, glibc malloc() always allocates memory blocks aligned on (at least) an 8-byte boundary. Hence, a pointer to such a block will always have zeroes in its 3 lowest bits, and these bits may be used for something else.
One option is to store the type of the union there. We have only 3 variable types, so we need only 2 bits for storing a number in the 0..2 range.
This is what is called a tagged pointer. It is a way to make the memory footprint smaller and to get rid of the type-defining enum in the structure.
This approach is very popular in LISP interpreters and compilers: LISP atoms are similar variable-describing structures, and there may be a lot of them in memory, so using the lowest bits of pointers reduces the memory footprint.
Likewise, any other information may be stored in the tagged pointer bits.
On the negative side, one should always keep in mind that this is not a usual pointer but one carrying extra information, which must be masked off before dereferencing. Debuggers will not be able to work with such pointers correctly.
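The bit manipulation involved can be sketched as follows. All function names are hypothetical, and the sketch assumes that converting a pointer to uintptr_t and back preserves it and that the pointer is 8-byte aligned, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum var_type { INT, DOUBLE, STRING };      /* 0..2 fits into 2 bits */

#define TAG_MASK ((uintptr_t)7)             /* the 3 lowest bits */

/* Store the tag in the low bits of an 8-byte aligned pointer. */
void *tag_pointer(void *p, enum var_type tag)
{
    assert(((uintptr_t)p & TAG_MASK) == 0); /* pointer must be aligned */
    return (void *)((uintptr_t)p | (uintptr_t)tag);
}

/* Extract the tag from a tagged pointer. */
enum var_type pointer_tag(void *tagged)
{
    return (enum var_type)((uintptr_t)tagged & TAG_MASK);
}

/* Recover the real address; this must be done before dereferencing. */
void *pointer_address(void *tagged)
{
    return (void *)((uintptr_t)tagged & ~TAG_MASK);
}
```

A tagged value must never be passed to free() or dereferenced directly; it always goes through pointer_address() first.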
1.4 Preprocessor
The preprocessor handles directives starting with #: #define, #include, #if, etc.
Listing 1.1: or
#if defined(LINUX) || defined(ANDROID)
1.5
1.5.1
__FILE__, __LINE__, __FUNCTION__: current file name, current line number and current function name, respectively.
In order to get values of __FILE__ and __FUNCTION__ in UTF-16, the following hack may be used:
#define CONCAT(x, y) x##y
#define WIDEN(x) CONCAT(L,x)
wprintf (L"%s\n", WIDEN(__FUNCTION__));
1.5.2 Empty macro
_DEBUG is a well-known macro without any value. Usually only its presence or absence is checked. Here is another example of useful empty macros.
In the Windows API header files we can find this:
typedef NTSTATUS
(NTAPI *TDI_REGISTER_CALLBACK)(
IN PUNICODE_STRING DeviceName,
OUT HANDLE *TdiHandle);
...
typedef NDIS_STATUS
(NTAPI *CM_CLOSE_CALL_HANDLER)(
IN NDIS_HANDLE CallMgrVcContext,
IN NDIS_HANDLE CallMgrPartyContext OPTIONAL,
IN PVOID CloseData OPTIONAL,
IN UINT Size OPTIONAL);
IN, OUT and OPTIONAL are empty macros here. They carry no information for the compiler at all; they are intended for documentation purposes, to mark function arguments.
1.5.3 Frequent mistakes
#1
For example, you may want to define a macro for squaring a number:
#define square(x) x*x
It is a mistake because the expression square(a+b) will unfold into a+b*a+b, and that is probably not what you wanted. So, all variables in the macro definition, and also the macro body itself, should be parenthesized:
#define square(x) ((x)*(x))
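The difference between the two forms can be checked with a short sketch; the macro names SQUARE_BAD and SQUARE_GOOD and the function square_demo() are made up for this illustration.

```c
#include <assert.h>

#define SQUARE_BAD(x)  x*x          /* nothing is parenthesized */
#define SQUARE_GOOD(x) ((x)*(x))    /* arguments and whole body parenthesized */

/* Self-check showing what each macro expands to for the argument 2+3. */
void square_demo(void)
{
    assert(SQUARE_BAD(2+3) == 11);  /* expands to 2+3*2+3 */
    assert(SQUARE_GOOD(2+3) == 25); /* expands to ((2+3)*(2+3)) */
}
```

Note that even the parenthesized form still evaluates its argument twice, so something like SQUARE_GOOD(i++) remains a mistake.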
21 LP mean Long Pointer, i.e., pointer require 64 bits for storage
#2
If you define a constant somewhere:
#define N 1234
... and then redefine it somewhere else, the compiler may stay silent, and that may lead to a hard-to-find bug.
That is why it is advisable to define constants as global variables with the const modifier.
1.6 Compiler warnings
Is it worth turning on -Wall in GCC or /Wall in MSVC, in other words, asking for all possible warnings? Yes, it is worth doing, in order to catch small errors quickly. It is even possible to turn on -Werror in GCC or /WX in MSVC; then all warnings will be treated as errors.
1.6.1 Example #1
#include <stdio.h>
int f1(int a, int b, int c)
{
printf ("(in %s) %d\n", __FUNCTION__, a*b+c);
// return a*b+c; // OOPS, accidentally I forgot to add this
};
int main()
{
printf ("(in %s) %d\n", __FUNCTION__, f1(123,456,789));
};
The author forgot to add return to the f1() function. Nevertheless, GCC 4.8.1 compiles this silently.
This is because neither the C standard ([6, 6.9.1/12]) nor the C++ standard ([8, 6.6.3/2]) requires the compiler to reject a function that does not return a value when it should; the behaviour is merely undefined if the caller uses the result.
After running we will see this:
(in f1) 56877
(in main) 14
Where did the number 14 come from? It is what the printf() called from f1() returned (the number of characters it printed). Function results of integral types are left in the EAX/RAX register. main() takes whatever value is in EAX/RAX and passes it to the second printf() 22 .
When compiled with the -Wall option, GCC will report:
1.c: In function f1:
1.c:7:1: warning: control reaches end of non-void function [-Wreturn-type]
};
^
1.c: In function main:
1.c:12:1: warning: control reaches end of non-void function [-Wreturn-type]
};
^
22 About how results are returned via registers, you may read more here [19]
1.6.2 Example #2
The C99 standard introduced the new bool type; according to the standard, it should be big enough to store at least one bit. It is a byte in GCC.
If GCC cannot find a declaration of a function we are going to use, it assumes its return type is int by default and warns about it.
Now let's consider that we have two files:
Listing 1.2: file1.c
bool f1()
{
...
return cond ? true : false;
};
Listing 1.3: file2.c
...
if (f1())
do_something();
...
GCC has no information about f1() while compiling file2.c, so it assumes its return type is int. While compiling file1.c, GCC knows that f1() should return bool, for which a byte is enough. Values of integral types are returned via the EAX or RAX register on x86 processors 23 , so GCC generates code which sets only the low byte of the register (AL) to 1 or 0 and does not touch the rest of the register, where random noise left over from other code may remain. So, the generated code of f1() may return false by writing 0 into the lowest byte of EAX/RAX while the other bits contain noise. From the point of view of file2.c, where the return type of f1() is considered to be int, the returned value may look like 0x??????00, where ? are random bits. So even when f1() returns false, the if() condition may be triggered almost always.
The author of these notes once spent several hours hunting for such a bug, and had to dive into the debugger and assembly listings.
A variant of this bug:
Listing 1.4: file1.c
uint64_t f1()
{
return some_large_number;
};
Listing 1.5: file2.c
...
uint64_t tmp=f1();
...
23 read more here [19, 1.6] on how integral type variables are returned from functions
CHAPTER 1. COMMON FOR C AND C++
If the compiler treats the return type of f1() as int, the 64-bit value will be truncated to 32 bits (because, supposedly for better compatibility, int is still a 32-bit type in 64-bit environments).
1.7 Threads
The C++11 standard added a new thread_local modifier, which states that each thread has its own copy of the variable. The variable can be initialized, and it is located in the TLS24 :
Listing 1.6: C++11
#include <iostream>
#include <thread>
thread_local int tmp=3;
int main()
{
std::cout << tmp << std::endl;
};
25
In the resulting executable file, the tmp variable will be stored in the TLS.
It is useful for storing global variables like errno, which cannot be one single variable for all threads.
1.8 main() function
Commonly used declaration (envp is a widespread extension rather than a part of the C standard):
int main(int argc, char* argv[], char* envp[])
argc will be 1 if no arguments are present, 2 if there is one argument, 3 if two, etc.
argv[0]: name of the currently running program.
argv[1]: first argument.
argv[2]: second argument.
etc.
argv can be enumerated in a loop. For example, a program may take a list of files on the command line (as the UNIX cat utility does). Options are prefixed with a dash in order to distinguish them from file names.
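A minimal sketch of such a loop is given below. The helper name scan_args() is hypothetical; it is meant to be called from main() with the argc and argv that main() received, and it treats a lone "-" as a file name, as many UNIX utilities do.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts dashed options in *options,
   prints every file name and returns the number of file names. */
int scan_args(int argc, char *argv[], int *options)
{
    int files = 0;
    assert(options != NULL);
    *options = 0;
    for (int i = 1; i < argc; i++)      /* argv[0] is the program name */
    {
        if (argv[i][0] == '-' && argv[i][1] != '\0')
            (*options)++;               /* something like -n or --number */
        else
        {
            printf("file: %s\n", argv[i]);
            files++;
        }
    }
    return files;
}
```

Real programs would normally use getopt() or a similar library for anything beyond the simplest cases.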
Both envp[] and argc/argv[] can be omitted in the main() function argument list, and that is correct. Read more on why it is correct here: [19, 1.2.1].
The return statement can be omitted in main() as of C99 (see also 1.6.1); main() will then return 0 26 .
In the CRT27 , the return value of main() is eventually passed to the exit() function, or to ExitProcess() in win32. It is usually an error code which may be checked in command shells, etc. 0 usually means success, but of course, it is up to the author to define (or redefine) their own return codes.
1.9
stdout is what is written to the console by printf(), or by cout in C++. stdout is a buffered output, so a user, usually unaware of this, sees output in portions. Sometimes a program outputs something using printf() or cout and then crashes. If something got into the buffer but the buffer was not flushed to the console in time, the user will not see anything. This is sometimes inconvenient. Thus, for more important information, including debugging output, it is more convenient to use stderr or cerr.
stderr is not buffered, so anything written to this stream with fprintf(stderr,...) or cerr appears in the console instantly.
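When a message has to go to stdout anyway, the buffer can be flushed explicitly right after printing. The following sketch wraps this in a hypothetical helper named say_now(); fprintf() and fflush() are the standard functions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: print a line and push it out of the buffer at once.
   Returns the number of characters written, or -1 on error. */
int say_now(FILE *out, const char *msg)
{
    assert(out != NULL && msg != NULL);
    int n = fprintf(out, "%s\n", msg);
    if (n < 0 || fflush(out) != 0)
        return -1;
    return n;
}
```

An alternative under the same assumptions is to switch buffering off once at program start with setvbuf(stdout, NULL, _IONBF, 0), at the price of slower output.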
24 Thread Local Storage
25 Compiled in GCC 4.8.1, but not in MSVC 2012
26 this rule exception is present only for main()
27 C runtime library
1.10 Outdated features
1.10.1 register
This keyword was used in the past to mark variables which the compiler should (if possible) keep in CPU registers for faster access.
void f()
{
int a, b;
register int x, y;
...
}
Modern compilers are advanced enough to make such decisions on their own, so this keyword is outdated. However, it
might be useful while reading ancient source code for quickly spotting busiest variables.
CHAPTER 2. C
Chapter 2
C
2.1 Memory in C
There are probably two most common memory types available to a programmer in C:
* Memory space in the local stack. It holds local variables and memory allocated with the help of alloca(). It is usually very fast to allocate.
* The heap. It is what is allocated with the help of malloc().
2.1.1 Local stack
If you declare something like char a[1024], no memory allocation happens; the stack pointer is just moved back by 1024 bytes [19, 1.2.3]. This is a very fast operation.
One does not need to free that memory: it happens automatically at the end of the function, when the stack pointer is restored.
On the flip side, one needs to know exactly how much space to allocate, and the block cannot be shrunk, expanded, or freed and reallocated again.
Local variables are allocated in the local stack by simply shifting the stack pointer back [19, 1.2.1]. Nothing else happens during that: new variables will contain whatever values were at that place in the stack, most likely what was left there by previously executed functions.
2.1.2 alloca()
The alloca() function likewise allocates a memory block in the local stack by shifting the stack pointer [19, 1.2.4]. The memory block is freed automatically when the function finishes.
In the C99 standard (2.5.10), it is not necessary to use alloca(); one can write just:
void f(size_t s, ...)
{
char a[s];
};
This is called a variable length array.
Internally, however, it works just like alloca().
Criticism: Linus Torvalds against usage of alloca() [18].
2.1.3 Allocating memory in heap
The heap is an area of memory given by the OS to a process, which the process may divide up at its sole discretion. After the process terminates (including by a crash), the heap is discarded automatically, and the OS does not need to free all allocated blocks one by one.
There are standard C functions to work with the heap: malloc(), calloc(), realloc(), free(), and new/delete in C++.
Apparently, the heap manager must use a lot of interconnected structures in order to keep information about allocated blocks. That is why there is a quite tangible overhead. You can allocate a memory block of 8 bytes, but at least 8 more bytes1 will be used for keeping information about the allocated block 2 . In a 64-bit OS, pointers require twice as much space, so information about each block will require at least 16 bytes. In the light of this, in order to use memory as effectively as possible, the blocks should be as large as possible, or the organization of data must be different.
1 MSVC, 32-bit Windows, almost the same in Linux
2 It is also called metadata, i.e., data about data
Using the heap requires discipline from the programmer; it is easy to make a lot of mistakes without it. Probably because of this, it is widely considered that PLs with RAII3 like C++ or PLs with a garbage collector (Python, Ruby) are easier.
One of the common mistakes: memory leaks
Memory was allocated, but we forgot to free it via free(). This problem is easily solved by thunk functions on top of malloc()/free(). Let such a thunk keep records about allocated blocks, and also where and when (and for what) each block was allocated.
I did this in my octothorpe library 4 . The DMALLOC macro calls the dmalloc() function, passing it the file name, the name of the calling function, the line number and a comment (block name). At the end of the program, we call dump_unfreed_blocks() and it dumps the list of blocks we forgot to free:
seq_n:2, size: 124, filename: dmalloc_test.c:31, func: main, struct: block124
seq_n:3, size: 12, filename: dmalloc_test.c:33, func: main, struct: block12
seq_n:4, size: 555, filename: dmalloc_test.c:35, func: main, struct: block555
Each block also has a number. This is helpful because one can set a breakpoint on a block number and the debugger will trigger at the moment the block is being allocated, so you can see where and under what conditions it occurs.
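The general idea of such a thunk can be sketched in a few dozen lines. This is not the octothorpe code: every name below (tracked_malloc(), tracked_free(), dump_unfreed(), TRACKED_MALLOC) is hypothetical, and the sketch keeps its records in a simple linked list without sequence numbers or thread safety.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* One record per live block (hypothetical layout). */
struct block_info
{
    void *ptr;
    size_t size;
    const char *file;
    int line;
    const char *comment;
    struct block_info *next;
};

static struct block_info *blocks = NULL;

void *tracked_malloc(size_t size, const char *file, int line, const char *comment)
{
    struct block_info *b = malloc(sizeof(*b));
    void *p = malloc(size);
    if (b == NULL || p == NULL) { free(b); free(p); return NULL; }
    b->ptr = p; b->size = size; b->file = file; b->line = line;
    b->comment = comment; b->next = blocks;
    blocks = b;                         /* remember the block */
    return p;
}

void tracked_free(void *p)
{
    for (struct block_info **pp = &blocks; *pp != NULL; pp = &(*pp)->next)
        if ((*pp)->ptr == p)
        {
            struct block_info *b = *pp;
            *pp = b->next;              /* forget the block */
            free(b);
            free(p);
            return;
        }
    assert(!"tracked_free: unknown pointer");
}

/* Prints the blocks not freed yet and returns how many there are. */
int dump_unfreed(void)
{
    int n = 0;
    for (struct block_info *b = blocks; b != NULL; b = b->next, n++)
        printf("size: %zu, %s:%d, comment: %s\n",
               b->size, b->file, b->line, b->comment);
    return n;
}

#define TRACKED_MALLOC(size, comment) \
    tracked_malloc((size), __FILE__, __LINE__, (comment))
```

Calling dump_unfreed() just before the program exits then lists every block that was allocated through TRACKED_MALLOC and never passed to tracked_free().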
It is boring to write a comment for each allocated block, but very useful: it is then easy to see what memory was allocated for. I first saw this idea in the Oracle RDBMS. Aside from that, it also keeps statistics on block types and how much memory was allocated for each, and it is easy to view them:
SQL> select * from v$sgastat;
POOL         NAME                            BYTES     CON_ID
------------ -------------------------- ---------- ----------
shared pool  AQ Slave list                    1224          1
shared pool  KQR L PO                       653312          2
shared pool  KQR X SO                       635808          2
shared pool  RULEC                           20688          1
shared pool  KQR M SO                         7168          2
shared pool  work area table entry           12240          2
shared pool  kglsim object batch              3864          2
large pool   PX msg pool                    860160          1
large pool   free memory                  30523392          0
large pool   SWRF Metric CHBs              1802240          2
large pool   SWRF Metric Eidbuf             368640          2
The same thing is present in the Windows kernel, where it is called tagging.
When one allocates memory in the kernel or in a driver, a 32-bit tag may be set (usually a four-letter abbreviation indicating the Windows subsystem). Then it is possible to see statistics in a debugger on how much memory is allocated and for what:
kd> !poolused 4
Sorting by Paged Pool Consumed
Pool Used:
          NonPaged            Paged
Tag    Allocs     Used    Allocs     Used
CM25        0        0       935  4124672  Binary: nt!cm
Gh05        0        0       268  3291016
MmSt        0        0      2119  2936752
CM35        0        0        91  2150400  Binary: nt!cm
vmfb        0        0        13  2148752
Ntff        5     1040      1287  1070784
ArbA        0        0       108   442368
NtfF        0        0       457   431408
CM16        0        0        62   331776  Internal Configuration manager allocations , Binary: nt!cm
IoNm        0        0      2022   267288
Ttfd        0        0       159   253976
Ifs         0        0         4   249968
CM29        0        0        26   212992  Binary: nt!cm
Of course, one may argue that the heap manager will need much more space for information about allocated blocks, including their names or tags, and that it slows the program down much more. That is for sure. So we may use it only in debug builds, and in release builds DMALLOC() will be a simple empty thunk for malloc(). In Windows it is also turned off by default and must be turned on with the help of the GFlags utility 5 . Aside from that, something similar is present in MSVC 6 .
One of the common mistakes: heap corruption
It is easy to allocate memory for 4 bytes but write a fifth one there by accident. Most likely, it will not show up instantly, but in fact it is a very dangerous time bomb, dangerous because it is a hard-to-find bug. The byte right after the block you allocated is most likely not used at all, but a heap manager structure may begin there, keeping information about some other allocated block, or maybe even about this one. If some of these structures are corrupted or overwritten (accidentally or intentionally), subsequent malloc() or free() calls will not work properly. Sometimes this manifests itself in errors like this (in Windows):
HEAP[Application.exe]: HEAP: Free Heap block 211a10 modified at 211af8 after it was freed
Such errors are used by exploit authors: if you know how to alter heap manager structures in the way you need, you may achieve some specific program behaviour (this is called heap overflow 7 ).
A widely used protection from such errors is to write guard values (e.g., 32-bit ones) on both sides of the block. For example, I did it in DMALLOC. At each free() call, the integrity of both guards is checked (these may be fixed values like 0x12345678), and if something or someone has written over them, that fact can be reported instantly.
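The guard technique can be sketched as follows. This is an illustration under assumptions, not the DMALLOC implementation: the names guarded_malloc(), guards_intact() and guarded_free() are hypothetical, and the 16-byte header is chosen only so that the user data stays aligned the way malloc() output usually is.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD  0x12345678u
#define HEADER 16   /* room for the size and the front guard */

/* Layout: [size ... front guard][user data][rear guard] */
void *guarded_malloc(size_t size)
{
    const uint32_t g = GUARD;
    unsigned char *raw = malloc(HEADER + size + sizeof(g));
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof(size));                 /* remember the size */
    memcpy(raw + HEADER - sizeof(g), &g, sizeof(g));  /* guard before data */
    memcpy(raw + HEADER + size, &g, sizeof(g));       /* guard after data */
    return raw + HEADER;
}

/* Returns 1 if both guards are intact, 0 if something overwrote them. */
int guards_intact(const void *p)
{
    const unsigned char *raw = (const unsigned char *)p - HEADER;
    size_t size;
    uint32_t before, after;
    memcpy(&size, raw, sizeof(size));
    memcpy(&before, raw + HEADER - sizeof(before), sizeof(before));
    memcpy(&after, raw + HEADER + size, sizeof(after));
    return before == GUARD && after == GUARD;
}

void guarded_free(void *p)
{
    assert(guards_intact(p));   /* report heap corruption at once */
    free((unsigned char *)p - HEADER);
}
```

A block obtained from guarded_malloc() must be released with guarded_free(), never with plain free(), because the pointer handed to the user is not the one malloc() returned.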
One of the common mistakes: not checking malloc() result
If malloc() finishes successfully, it returns a pointer to the newly allocated block, which can then be used; in case of memory shortage it returns NULL. Of course, in our time of cheap memory, this is a rare problem; nevertheless, if one uses a lot of memory, one should take it into account. It is not handy to check the returned pointer after each malloc() call, so a popular technique is to write one's own thunk functions named xmalloc() and xrealloc(), which call malloc()/realloc(), check the returned result and exit in case of error.
It is interesting to note how xmalloc() behaves in git:
void *xmalloc(size_t size)
{
void *ret;
memory_limit_check(size);
ret = malloc(size);
if (!ret && !size)
ret = malloc(1);
if (!ret) {
try_to_free_routine(size);
ret = malloc(size);
if (!ret && !size)
ret = malloc(1);
if (!ret)
die("Out of memory, malloc failed (tried to allocate %lu bytes)",
(unsigned long)size);
}
#ifdef XMALLOC_POISON
memset(ret, 0xA5, size);
#endif
return ret;
}
5 http://msdn.microsoft.com/en-us/library/windows/hardware/ff549557(v=vs.85).aspx
6 read more about the _CrtSetDbgFlag and _CrtDumpMemoryLeaks functions
7 https://en.wikipedia.org/wiki/Heap_overflow
If malloc() is not successful, it tries to free some already allocated (but not really needed) blocks with the help of try_to_free_routine(), and then calls malloc() again.
Aside from that, if the XMALLOC_POISON macro is defined, all bytes in the allocated block are filled with 0xA5. This may help to see, visually, when you use some value from the block before its initialization.
The value 0xA5A5A5A5 is easily spotted in the debugger, or in some place in a dump where it is printed in hexadecimal form. There is a constant for the same purpose in MSVC: 0xbaadf00d.
Even more than that: after a call to free(), the freed block may be filled with some other constant, in order to spot visually if someone is attempting to use data from the block after it has been freed.
Some constants from Microsoft:
* 0xABABABAB : Used by Microsoft's HeapAlloc() to mark "no man's land" guard bytes after
allocated heap memory
* 0xABADCAFE : A startup to this value to initialize all free memory to catch errant pointers
* 0xBAADF00D : Used by Microsoft's LocalAlloc(LMEM_FIXED) to mark uninitialised allocated heap
memory
* 0xCCCCCCCC : Used by Microsoft's C++ debugging runtime library to mark uninitialised stack
memory
* 0xCDCDCDCD : Used by Microsoft's C++ debugging runtime library to mark uninitialised heap
memory
* 0xFDFDFDFD : Used by Microsoft's C++ debugging heap to mark "no man's land" guard bytes before
and after allocated heap memory
* 0xFEEEFEEE : Used by Microsoft's HeapFree() to mark freed heap memory
9
2.1.4 Local stack or heap?
break;
bool emulate_success=try_to_emulate(&dDA);
if (emulate_success==false)
break;
};
There is no cost for allocating disassembler structures at all. Otherwise, we would need to call malloc()/free() at each loop iteration, each of which would also work with heap data structures, etc.
As we know, x86 instructions may have up to 3 operands, so in my structure, aside from the instruction code, there is also information about 3 operands. Of course, I could do it like this:
struct disassembled_instruction
{
int instruction_code;
struct operand *op1;
struct operand *op2;
struct operand *op3;
};
... and let the pointer be NULL in case some operand is absent. Nevertheless, these are still heap memory allocations.
So I did it like this:
struct disassembled_instruction
{
int instruction_code;
int operands_total;
struct operand op[3];
};
Such a structure requires more memory. Aside from that, 3-operand instructions are rare in x86 code, but the third operand is always stored here. However, there are no extra manipulations with memory.
But if one would like to save space by not storing the third operand, it may be left out: it is easy to calculate the structure size without one operand, sizeof(struct disassembled_instruction) - sizeof(struct operand), and copy only that many bytes to the place where the structure must be stored. No one prohibits using (and storing) only a part of a structure rather than the whole. Besides, the functions which work with the structure may not touch the third operand at all, and that will work correctly.
Even more than that: I intentionally made my disassembler in such a way that it can take an uninitialized structure and works even if some information is still left there from previous calls.
Maybe it is overkill, but you get the idea.
Thus, if you allocate small structures of known size and speed is crucial, you may consider allocating them in the local stack.
2.2 Strings in C
The reason why the C string format is as it is (zero-terminated) is apparently historical. In [15] we can read:
A minor difference was that the unit of I/O was the word, not the byte, because the PDP-7 was a word-addressed machine. In practice this meant merely that all programs dealing with character streams ignored null characters, because null was used to pad a file to an even number of characters.
There are no features in C for handling strings like those present in higher level PLs, such as concatenation.
People often complain about awkward string concatenation (i.e., gluing together). sprintf() is also irritating, because it is hard to predict how much space it will need.
Copying strings with strcpy() is not easy either: one needs to think about how many bytes must be allocated for the buffer. Aside from that, awkward C strings are the source of a huge number of vulnerabilities related to buffer overflow [19, 1.14.2].
In the first place, we should ask ourselves which string operations we need. Concatenation (gluing) is needed for 1) outputting messages to a log; 2) constructing strings and then passing (or writing) them somewhere.
For 1) it is possible to use streams without constructing a string, just outputting it in portions, e.g.:
2.2.1 String length storage
The string length was always stored in Pascal PL implementations. Regardless of the outcome of holy wars between the devotees of both PLs, almost all string libraries keep the current string length, just because the conveniences outweigh the need to recalculate the length value after each modification.
For example, strlen() 22 is not needed at all: the string length is always known. String concatenation is also much faster, because we do not need to calculate the length of the first string. A string comparison function may just compare the string lengths first and, if they are not equal, return false without starting to compare the characters in the strings.
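These advantages can be seen in a small sketch of a counted string. The type counted_string and the cs_* functions are hypothetical names for this illustration and do not correspond to any particular library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: the length is stored next to the characters. */
struct counted_string
{
    size_t len;
    char *data;     /* still zero-terminated, so it can be passed to C functions */
};

bool cs_init(struct counted_string *s, const char *text)
{
    assert(s != NULL && text != NULL);
    s->len = strlen(text);              /* the length is computed only once */
    s->data = malloc(s->len + 1);
    if (s->data == NULL)
        return false;
    memcpy(s->data, text, s->len + 1);
    return true;
}

bool cs_append(struct counted_string *s, const struct counted_string *tail)
{
    char *p = realloc(s->data, s->len + tail->len + 1);
    if (p == NULL)
        return false;
    /* the end of s is known already, no scanning for the terminating zero */
    memcpy(p + s->len, tail->data, tail->len + 1);
    s->data = p;
    s->len += tail->len;
    return true;
}

bool cs_equal(const struct counted_string *a, const struct counted_string *b)
{
    if (a->len != b->len)
        return false;                   /* lengths differ: no need to look at characters */
    return memcmp(a->data, b->data, a->len) == 0;
}
```

The price is that every function modifying the string has to keep len up to date, which is the recalculation cost mentioned above.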
In the network libraries of Oracle RDBMS, strings are often passed to various string functions together with their length as a separate argument 23 . It is not very aesthetic and looks redundant, but it is very useful. For example, we have a function which needs to know which string was passed to it:
14 https://developer.gnome.org/glib/
15 https://github.com/GNOME/glib/blob/master/glib/gstring.h
16 https://github.com/GNOME/glib/blob/master/glib/gstring.c
17 https://github.com/git/git/blob/master/strbuf.h
18 https://github.com/git/git/blob/master/strbuf.c
19 http://docs.oracle.com/javase/7/docs/api/java/nio/Buffer.html
20 http://docs.oracle.com/javase/7/docs/api/java/nio/ByteBuffer.html
21 http://docs.oracle.com/javase/7/docs/api/java/nio/CharBuffer.html
22 string length calculation
23 http://blog.yurichev.com/node/64
2.2.2 String returning
2.2.3 1: Constant string returning
2.2.4 2: Via global array of characters
This is how asctime() does it. Keep in mind that the returned string should be used before the next call to asctime().
For example, this is correct:
printf("date1: %s\n", asctime(&date1));
printf("date2: %s\n", asctime(&date2));
This is not:
char *s1=asctime(&date1);
char *s2=asctime(&date2);
printf("date1: %s\n", s1);
printf("date2: %s\n", s2);
... because the s1 and s2 pointers will point to the same place, and the printf() output will be the same for both.
In hex.c of git24 we may find this:
char *sha1_to_hex(const unsigned char *sha1)
{
static int bufno;
static char hexbuffer[4][50];
static const char hex[] = "0123456789abcdef";
char *buffer = hexbuffer[3 & ++bufno], *buf = buffer;
int i;
for (i = 0; i < 20; i++) {
unsigned int val = *sha1++;
*buf++ = hex[val >> 4];
*buf++ = hex[val & 0xf];
}
*buf = '\0';
return buffer;
}
In fact, the string is returned via a global variable; the static declaration makes it visible only from this function. Here is a shortcoming: after a call to sha1_to_hex() you cannot call it again for a second string before you have used the first one somehow, because it will be overwritten. Apparently, in order to solve this problem, there are 4 buffers here, and the string is returned in the next one each time. It is also worth noticing that it is possible to do such things only if you are sure of what you are doing: the code is on the dirty hack level. If you call this function 5 times and then need to use the first string, this may lead to a hard-to-find bug.
You may also notice that bufno is not explicitly initialized (being static, it starts at zero); since only its 2 lowest bits are used, it is not important at all which value it holds at the program start.
2.2.5 Standard string C functions
Some functions like getcwd() not only fill the buffer but also return a pointer to it. This is done for situations where it is more compact to write something like:
char buf[256];
do_something (getcwd (buf, sizeof(buf)));
... instead of:
char buf[256];
getcwd (buf, sizeof(buf));
do_something (buf);
atoi(), atof(), strtod(), strtof()
atoi()/atof() cannot signal an error, while strtod()/strtof(), which do the same thing, can.
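How strtod() reports errors can be sketched with a small wrapper. The name parse_double() is hypothetical; strtod(), errno and ERANGE are standard. The wrapper rejects input with no digits, trailing garbage, or a value out of range.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse the whole string as a double.
   Returns true and stores the value in *out on success. */
bool parse_double(const char *s, double *out)
{
    char *end;
    assert(s != NULL && out != NULL);
    errno = 0;
    double v = strtod(s, &end);
    if (end == s)           /* nothing was consumed: not a number */
        return false;
    if (*end != '\0')       /* trailing garbage after the number */
        return false;
    if (errno == ERANGE)    /* overflow or underflow */
        return false;
    *out = v;
    return true;
}
```

With atof() the strings "abc" and "0" both yield 0.0, so the caller has no way to tell an error from a legitimate zero.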
scanf(), fscanf(), sscanf()
A well-known holy war is whether text files are better than binary files or the other way around. Binary files are easier and faster to process; however, text files are easier to view and edit in any text editor; besides, UNIX has a lot of utilities for text and string processing. But text files must be parsed.
The scanf() function [6, 7.19.6/2] does not, of course, support regular expressions; however, some simple sequences can be parsed by it.
Example #1
MemTotal:
MemFree:
Buffers:
Cached:
SwapCached:
...
kB
kB
kB
kB
kB
Let's say we need to get the first and the third numbers, ignoring the second one and the rest. This is how it can be done:
void read_proc_meminfo()
{
FILE *f=fopen("/proc/meminfo", "r");
assert(f);
int result1, result2; // %d expects a pointer to int
if (fscanf (f, "MemTotal:\t%d kB\n"
"MemFree:\t%*d kB\n"
"Buffers:\t%d kB\n",
&result1, &result2)==2)
printf ("results: %d %d\n", result1, result2);
fclose(f);
};
The format string is written on three lines, but in fact it is a single string: (1.2.2). N.B. The newline is written as \n.
The * modifier in the scanf() format string indicates that the number will be read but not stored. Thus, the field is ignored. scanf() functions return not the number of fields read (which would be 3 here), but the number of fields stored (2 here).
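The same behaviour can be tried without /proc/meminfo by parsing a string with sscanf(). In this sketch the helper name parse_meminfo() is hypothetical, and any numbers fed to it are made-up sample values rather than real system data.

```c
#include <assert.h>
#include <stdio.h>

/* Parses three "Name: number kB" lines from text, storing only the first
   and the third numbers. Returns the sscanf() result, i.e. the number of
   fields stored. */
int parse_meminfo(const char *text, int *total, int *buffers)
{
    assert(text != NULL && total != NULL && buffers != NULL);
    return sscanf(text,
                  "MemTotal: %d kB "    /* stored in *total */
                  "MemFree: %*d kB "    /* read, but not stored */
                  "Buffers: %d kB",     /* stored in *buffers */
                  total, buffers);
}
```

A space in a scanf() format string matches any amount of whitespace, including newlines, which is why the three lines can be described this way.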
Example #2 There is a text file containing a key-value pair on each line:
some_param1=some_value
some_param2=Lazy fox etc etc.
param3=Lorem Ipsum etc etc.
space here=value containing space
too long param, we should fail here=value
We should just read two fields:
int main(int argc, char *argv[])
{
assert(argc==2);
assert(argv[1]);
FILE *f=fopen (argv[1], "r");
assert(f);
int line=1;
do
{
char param[17]; // up to 16 characters plus terminating zero
char value[61]; // up to 60 characters plus terminating zero
if (fscanf (f, "%16[^=]=%60[^\n]\n", param, value)==2)
{
printf ("param=%s\n", param);
printf ("value=%s\n", value);
}
else
{
printf ("error at line %d\n", line);
return 0;
};
line++;
} while (!feof(f) && !ferror(f));
};
%16[^=] looks somewhat like a regular expression. It means: read up to 16 characters, none of which is the equals (=) sign. Then we tell scanf() that this sign (=) must follow. Then we let it read up to 60 characters other than a newline. Finally, we read the newline character at the end.
This works, and the field lengths are limited to 16 and 60 characters. That is why an error predictably occurs on the fifth line, because its parameter (first field) is longer.
Thus it is possible to parse simple file formats, even CSV25 .
However, it should be noted that scanf() functions are not able to read an empty string where the %s modifier is specified. Thus it is not possible to parse a key-value file with absent keys or values.
Caveat #1 scanf() treats the %d modifier in the format string as a 32-bit int on both x86 and x64 CPUs.
It is a common mistake to write:
char a[10];
scanf ("%d %d %d %d", &a[0], &a[1], &a[2], &a[3]);
Characters (or bytes) are placed adjacent to each other. When scanf() processes the first value, it will treat it as a 32-bit int and overwrite the 3 bytes located next to it. And so on.
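One way to avoid the overwrite, assuming a C99 library, is the hh length modifier, which tells scanf() that the argument points to a char-sized object. The helper name read_four_bytes() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Reads four small numbers into four adjacent bytes.
   Returns the sscanf() result (number of fields stored). */
int read_four_bytes(const char *text, signed char a[4])
{
    assert(text != NULL && a != NULL);
    /* %hhd: each argument is a pointer to signed char, not to int */
    return sscanf(text, "%hhd %hhd %hhd %hhd", &a[0], &a[1], &a[2], &a[3]);
}
```

Each conversion now writes exactly one byte, so neighbouring array elements are left alone.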
strspn(), strcspn()
strspn() is often used to make sure that a string contains only characters from a list we define:
if (strspn(s, "1234567890") == strlen(s)) ... OK
...
if (strspn(IPv4, "1234567890.") == strlen(IPv4)) ... OK
...
if (strspn(IPv6, "0123456789AaBbCcDdEeFf:.") == strlen(IPv6)) ... OK
Or to skip the beginning of a string:
const char *whitespaces = " \n\r\t";
*buf += strspn(*buf, whitespaces); // skip whitespaces at start
strcspn() is the inverse function; it can be used for skipping all characters at the beginning of a string which are not in the given set:
s += strcspn(s, whitespaces); // first, skip anything till whitespaces
s += strspn(s, whitespaces); // then skip whitespaces
// here s is pointing to the part of string after whitespaces
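Both functions combine naturally into a simple word scanner. The sketch below uses two hypothetical helpers, count_words() and is_all_digits(), built only on the standard strspn(), strcspn() and strlen().

```c
#include <assert.h>
#include <string.h>

/* Counts whitespace-separated words in s. */
size_t count_words(const char *s)
{
    const char *whitespaces = " \n\r\t";
    size_t words = 0;
    assert(s != NULL);
    s += strspn(s, whitespaces);        /* skip leading whitespaces */
    while (*s != '\0')
    {
        words++;
        s += strcspn(s, whitespaces);   /* skip the word itself */
        s += strspn(s, whitespaces);    /* skip whitespaces after it */
    }
    return words;
}

/* Returns 1 if s is non-empty and consists of decimal digits only. */
int is_all_digits(const char *s)
{
    assert(s != NULL);
    return *s != '\0' && strspn(s, "1234567890") == strlen(s);
}
```

Unlike strtok(), this approach does not modify the string being scanned.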
2.2.6 Unicode
2.2.7 Lists of strings
The simplest list of strings is just a set of zero-terminated strings, itself ending with an extra zero. For example, in the Windows API, in the Common Dialogs library, this is how 28 a list of available file extensions for a dialog box is passed:
// Initialize OPENFILENAME
ZeroMemory(&ofn, sizeof(ofn));
...
ofn.lpstrFilter = "All\0*.*\0Text\0*.TXT\0";
...
// Display the Open dialog box.
if (GetOpenFileName(&ofn)==TRUE)
...
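Walking such a list takes only a short loop: each string is followed by the next one, and an empty string marks the end. The helper name walk_string_list() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Prints every string of a zero-terminated list of zero-terminated strings
   and returns how many strings it contains. */
size_t walk_string_list(const char *list)
{
    size_t n = 0;
    assert(list != NULL);
    /* strlen(s) + 1 jumps over the current string and its terminating zero */
    for (const char *s = list; *s != '\0'; s += strlen(s) + 1)
    {
        printf("%s\n", s);
        n++;
    }
    return n;
}
```

A consequence of this format is that the list cannot contain an empty string as an element, since that is what terminates it.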
2.3
2.3.1 Lists in C
A list is a linked set of elements. In a singly-linked list, each element has a pointer to the next one. In a doubly-linked list, each element has pointers to both the previous element and the next one.
In comparison with arrays, one significant advantage is the ease of adding a new element at an arbitrary place. The disadvantages: the data structures supporting the list add some memory overhead, and it is not possible to index a list like an array.
Singly-linked list
The simplest to implement. In the structure intended for linking into a list, it is enough just to add a link to the next element somewhere; this field is usually called next:
27 Simultenous builds with Unicode and without were popular in the time of popularity of both Windows NT/2000/XP and Windows 95/98/ME lines. Unicode support in the second was not very good
28 http://msdn.microsoft.com/en-us/library/windows/desktop/ms646829(v=vs.85).aspx
struct some_object
{
...
...
struct some_object* next;
};
NULL in next means that the element is the last one in the list.
Enumerating the elements of this list is straightforward:
for (struct some_object *i=list; i!=NULL; i=i->next)
...
To add a new element to the end of the list, one first needs to find the last element:
struct some_object *last_element=list;
while (last_element->next!=NULL)
    last_element=last_element->next;
... and then, after creating new structure, store the pointer to it in next:
struct some_object *new_object=calloc(1, sizeof(struct some_object));
// populate new_object with data
last_element->next=new_object;
calloc() differs from malloc() in that all allocated space is cleared; consequently, there will be
NULL in the next field 29 .
Searching for an element is just enumerating all elements of the list and comparing each with the sought-for
one until it is found.
Element deletion: find the previous element and the next one, set the next pointer of the previous element to the next one,
then free the memory block allocated for the current element.
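The operations described above can be put together in a minimal sketch. The struct node type and the list_append(), list_find() and list_remove() names are hypothetical, chosen only for this illustration; the list is addressed through a pointer to its first element.

```c
#include <stdlib.h>

struct node {           /* hypothetical element type */
    int value;
    struct node *next;  /* NULL marks the last element */
};

/* Appends a new element holding value; returns it, or NULL if allocation fails. */
struct node *list_append(struct node **head, int value)
{
    struct node *n = calloc(1, sizeof(struct node)); /* next is already NULL */
    if (n == NULL)
        return NULL;
    n->value = value;
    while (*head != NULL)       /* walk to the final next pointer */
        head = &(*head)->next;
    *head = n;
    return n;
}

/* Returns the first element holding value, or NULL if there is none. */
struct node *list_find(struct node *head, int value)
{
    for (struct node *i = head; i != NULL; i = i->next)
        if (i->value == value)
            return i;
    return NULL;
}

/* Unlinks and frees the first element holding value; returns 1 if one was removed. */
int list_remove(struct node **head, int value)
{
    for (struct node **link = head; *link != NULL; link = &(*link)->next) {
        if ((*link)->value == value) {
            struct node *victim = *link;
            *link = victim->next;   /* the previous element now points to the next one */
            free(victim);
            return 1;
        }
    }
    return 0;
}
```

Working with a pointer to the next field, rather than with the element itself, means that deleting the first element needs no special case.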
The very first list element is also called the list head. The structure of the very first element can be declared as a local or global
variable, but then it will be harder to delete the first element. On the other hand, it is possible to declare a pointer to the first list
element; then it is easy to assign another element (which becomes the first one) to this pointer.
Doubly-linked list
Almost the same, but, aside from the pointer to the next element, a pointer to the previous element is also stored. If the element
is the first one, the pointer to the previous element may be NULL, or it may point to the element itself (whatever you like).
When working with a doubly-linked list, it is easier to find previous elements, e.g., in the case of element deletion. It is also easier
to enumerate elements backwards from the end of the list. But the memory overhead is slightly larger.
Often, a doubly-linked list is also circular, i.e., the first and the last elements point to each other. For example, that is
how it is done in std::list in C++ STL [19, 2.4.2]. This simplifies finding the last element (one does not need to enumerate
all elements).
Windows API
Here, and also in a lot of places in the Windows kernel, two primitive data structures are used:
typedef struct _LIST_ENTRY {
struct _LIST_ENTRY *Flink;
struct _LIST_ENTRY *Blink;
} LIST_ENTRY, *PLIST_ENTRY, *RESTRICTED_POINTER PRLIST_ENTRY;
typedef struct _SINGLE_LIST_ENTRY {
struct _SINGLE_LIST_ENTRY *Next;
} SINGLE_LIST_ENTRY, *PSINGLE_LIST_ENTRY;
These structures are not intended for independent use; rather, they are intended for embedding into other structures. For example, suppose we need to unite color-describing structures into a list:
29 For more about structure initialization, see (2.4.1).
struct color
{
int R;
int G;
int B;
LIST_ENTRY list;
};
Now we have pointers to both the next and the previous elements. The Windows API has a small set of functions working with these
structures 30 .
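How one gets from the embedded link back to the enclosing structure may be sketched in portable C. Everything here is an assumption made for illustration: struct list_link is a stand-in for LIST_ENTRY, and CONTAINER_OF, list_init(), list_add_tail() and sum_red() are hypothetical names, not the Windows API.

```c
#include <stddef.h>

struct list_link {                 /* hypothetical stand-in for LIST_ENTRY */
    struct list_link *next, *prev;
};

struct color {
    int R, G, B;
    struct list_link list;         /* embedded link, as in the text */
};

/* Recovers the enclosing structure from a pointer to its embedded member. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* An empty circular list is a head that points to itself. */
void list_init(struct list_link *head)
{
    head->next = head->prev = head;
}

/* Links item in just before head, i.e., at the end of the list. */
void list_add_tail(struct list_link *head, struct list_link *item)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Sums the R components of all colors linked into the list. */
int sum_red(struct list_link *head)
{
    int sum = 0;
    for (struct list_link *i = head->next; i != head; i = i->next)
        sum += CONTAINER_OF(i, struct color, list)->R;
    return sum;
}
```

The list code itself knows nothing about struct color; only the caller, who knows in which field the link is embedded, converts a link back to the whole structure.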
Linux
Doubly-linked list routines in the Linux kernel are declared in the file /include/linux/list.h 31 .
They are heavily used there: in kernel version 3.12 there are at least 2900 references to struct list_head.
Glib
One might ask: is it not possible to declare a separate structure for the list element instead of embedding it into one's own structures?
Yes, for example, that is how it is done in glist.h 32 in Glib:
struct _GList
{
gpointer data;
GList *next;
GList *prev;
};
data may point to any object you like, including any existing structure you do not want to change; this is also
called an opaque pointer. Of course, aesthetically it is better. But one should remember that there will be two allocated memory blocks for each element of the list, plus the memory overhead of supporting allocated blocks in the heap (2.1.3).
Thus, this approach is acceptable if memory footprint is not important.
2.3.2
Binary trees in C
Binary trees are one of the most important data structures in computer science. Most often they are used for storing key-value
pairs. This is what is implemented in std::map in C++ STL33 .
Simply speaking, in comparison with lists, trees offer much faster lookup. On the other hand, element insertion may
be slower.
There are no C standard functions working with trees, but some are present in POSIX34 (tsearch(), twalk(), tfind(),
tdelete()) 35 .
This family of functions is actively used in Bash 4.2, BIND 9.9.1 and GCC; it can be seen there how it can be used.
Glib also has tree functions, declared in gtree.h 36 .
A set (std::set in C++ STL) can be implemented as a binary tree as well: one just stores only the key and no
value.
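As an illustration of the POSIX functions mentioned above, here is a small sketch of a set of strings built on tsearch() and tfind(). It assumes a POSIX system providing search.h; the set_add() and set_contains() wrappers are hypothetical names introduced only for this example.

```c
#include <search.h>
#include <string.h>

static int compare_strings(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/* Inserts key into the tree unless it is already there.
   Returns 1 if inserted, 0 if already present, -1 on allocation failure.
   The tree stores the pointer itself, not a copy of the string. */
int set_add(void **root, const char *key)
{
    if (tfind(key, root, compare_strings) != NULL)
        return 0;
    return tsearch(key, root, compare_strings) != NULL ? 1 : -1;
}

/* Returns non-zero if key is present in the tree. */
int set_contains(void *const *root, const char *key)
{
    return tfind(key, root, compare_strings) != NULL;
}
```

An empty tree is just a void* root variable set to NULL; its address is passed to both functions.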
2.3.3
Data structures related to collections may also contain pointers to the functions working with elements, like comparison
functions, copying, etc.
For example in GTree in Glib:
30 http://msdn.microsoft.com/en-us/library/windows/hardware/ff563802(v=vs.85).aspx
31 http://lxr.free-electrons.com/source/include/linux/list.h
32 https://github.com/GNOME/glib/blob/master/glib/glist.h
https://developer.gnome.org/glib/2.37/glib-Doubly-Linked-Lists.html
33 Standard Template Library
34 Portable Operating System Interface
35 http://pubs.opengroup.org/onlinepubs/009696799/functions/tsearch.html
36 https://github.com/GNOME/glib/blob/master/glib/gtree.h
Listing 2.1: gtree.c
struct _GTree
{
    GTreeNode        *root;
    GCompareDataFunc  key_compare;
    GDestroyNotify    key_destroy_func;
    GDestroyNotify    value_destroy_func;
    gpointer          key_compare_data;
    guint             nnodes;
    gint              ref_count;
};
By setting the function for key comparison and also the deallocator functions for keys and values (in g_tree_new_full()), we let the tree functions
in Glib compare keys and free the contents of a tree on their own.
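The same idea can be shown in a much smaller form. The sketch below is not Glib code: struct collection and all the collection_*() functions are hypothetical and exist only to show a container that is told, via function pointers, how to compare and how to free its elements.

```c
#include <stdlib.h>
#include <string.h>

typedef int  (*compare_fn)(const void *a, const void *b);
typedef void (*destroy_fn)(void *item);

struct collection {            /* hypothetical container */
    void     **items;
    size_t     count;
    compare_fn compare;        /* returns 0 when two items are equal */
    destroy_fn destroy;        /* frees one item; may be NULL */
};

/* A ready-made comparison function for collections of C strings. */
int compare_cstrings(const void *a, const void *b)
{
    return strcmp(a, b);
}

void collection_init(struct collection *c, compare_fn cmp, destroy_fn destroy)
{
    c->items = NULL;
    c->count = 0;
    c->compare = cmp;
    c->destroy = destroy;
}

/* Returns 0 on success, -1 if memory cannot be allocated. */
int collection_add(struct collection *c, void *item)
{
    void **grown = realloc(c->items, (c->count + 1) * sizeof(void *));
    if (grown == NULL)
        return -1;
    c->items = grown;
    c->items[c->count++] = item;
    return 0;
}

/* Linear search using the comparison function supplied at init time. */
void *collection_find(const struct collection *c, const void *key)
{
    for (size_t i = 0; i < c->count; i++)
        if (c->compare(c->items[i], key) == 0)
            return c->items[i];
    return NULL;
}

/* Frees every item with the supplied destructor, then the array itself. */
void collection_free(struct collection *c)
{
    for (size_t i = 0; i < c->count; i++)
        if (c->destroy != NULL)
            c->destroy(c->items[i]);
    free(c->items);
    c->items = NULL;
    c->count = 0;
}
```

For a collection of heap-allocated strings one would call collection_init(&c, compare_cstrings, free); the container code itself never needs to know what the items are.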
2.4
Object-oriented programming in C
Of course, there is no OOP support in C (it is present in C++); nevertheless, it is possible to program in OOP style in pure C.
OOP, in short, is a separation into objects and methods. In C, structures can easily represent objects, and ordinary
functions can serve as methods.
2.4.1
Structures initialization
C++ has class constructors. If one needs to initialize a structure in some special way, one would write a special function for it in
C as well. But if it is a simple structure, it is possible to initialize it with calloc() 37 or bzero()(2.5.3).
All int variables are set to zero. A zero value of bool in C99(2.5.10) and C++ is false, same as BOOL in the Windows API. All
pointers are set to NULL. And even floating point 0.0 in IEEE 754 format is zero bits in all positions.
If a structure has pointers to other structures, NULL can mean the absence of an object.
Among other things, uninitialized global variables are also zeroed [6, 6.7.8.10].
Initialization is an important thing. It is very hard to catch a bug related to accesses to uninitialized variables. The compiler will
not necessarily warn when a structure field is used without initialization.
2.4.2
Structures deinitialization
If a structure has pointers to other structures, they must be freed as well. In the simple case, it is just a call to free(). By the way,
that is why NULL is a valid argument for free(): it allows writing free(s->field) instead of if (s->field) free(s->field),
which is shorter.
2.4.3
Structures copying
If the structure is simple, it is possible to copy it with a call to memcpy()(2.5.2). If structures having pointers to
other structures are copied in this manner, it is called a shallow copy38 . In contrast, a deep copy copies the structure together with
all structures connected to it (which is slower).
That is why it may be more convenient to store a string in a structure as a fixed-size array of characters. For example,
there are a lot of such cases in the Windows API. Such a structure is easier to copy, and it requires less memory overhead in the heap. On
the other hand, we have to accept that the string length is fixed.
Aside from that, a structure can be copied simply as s1=s2: the generated code will copy each structure field. Perhaps it
is easier to read than a call to memcpy() at the same place.
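The difference between the two kinds of copying can be sketched as follows; struct person and person_deep_copy() are hypothetical names used only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

struct person {            /* hypothetical structure owning a heap string */
    int   age;
    char *name;
};

/* Deep copy: duplicates the string as well, so both copies
   can be modified and freed independently.
   Returns 0 on success, -1 if allocation fails. */
int person_deep_copy(struct person *dst, const struct person *src)
{
    *dst = *src;                                /* shallow part: age and the pointer */
    dst->name = malloc(strlen(src->name) + 1);  /* own block for the string */
    if (dst->name == NULL)
        return -1;
    strcpy(dst->name, src->name);
    return 0;
}
```

A plain assignment such as copy = original (or the equivalent memcpy() call) gives a shallow copy: both name fields then point to the same block, and freeing it through one structure leaves a dangling pointer in the other.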
2.4.4
Encapsulation
C++ offers encapsulation (information hiding). For example, you cannot write a program which modifies a protected class field from outside the class:
this is a compile-stage protection [19, 1.7.3].
There is no such thing in C, so it requires more discipline.
However, it is possible to protect a structure from prying eyes. For example, Glib has a library intended for work with
trees. In the header file gtree.h39 there is no declaration of the structure (it is present only in gtree.c 40 ), there are only
37 It is the same as malloc() followed by filling the allocated memory with zeroes
38 https://en.wikipedia.org/wiki/Object_copy
39 https://github.com/GNOME/glib/blob/master/glib/gtree.h
40 https://github.com/GNOME/glib/blob/master/glib/gtree.c
2.5
C standard library
2.5.1
assert
This macro is commonly used for validating 41 input values. For example, if you have a function working with data, you
would probably want to add this code at its beginning: assert (month>=1 && month<=12).
Here is what one should remember: the standard assert() macro does anything only in debug builds. In a release build, where
NDEBUG is defined, all assert statements disappear together with their expressions. That is why it is not correct to write assert(f=malloc(...)). However, you
may want to write something like assert(object->get_something()==123).
Error messages can also be embedded in assert statements: you will see the message if the expression is not true. For example,
in the LLVM42 source code we may find this:
assert(Index < Length && "Invalid index!");
...
assert(i + Count <= M && "Invalid source range");
...
assert(j + Count <= N && "Invalid dest range");
A text string has the type const char* and is never NULL. Thus appending it is the same as adding ... && true, which can be done to any expression without
changing its meaning.
The assert() macro can also be used for documenting purposes.
For example:
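The same trick works in plain C. Below is a small sketch; days_in_month() is a hypothetical function made up for this example.

```c
#include <assert.h>

/* Returns the number of days in a month of a non-leap year. */
int days_in_month(int month)
{
    static const int days[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    /* The string literal is a non-NULL pointer, hence always true:
       it does not change the condition, but it is shown in the
       failure report together with the rest of the expression. */
    assert(month >= 1 && month <= 12 && "month must be in 1..12");
    return days[month - 1];
}
```

When compiled with NDEBUG defined, the check disappears entirely, so the function must not rely on it for handling bad input in release builds.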
Listing 2.2: GNU Chess
int my_random_int(int n) {
int r;
ASSERT(n>0);
r = int(floor(my_random_double()*double(n)));
ASSERT(r>=0&&r<n);
return r;
}
By reading the code we can quickly see the legal values of the n and r variables.
Asserts are also called active comments [11].
41 invariant and sanitization terms are also used
42 http://llvm.org/
2.5.2
UNIX time
In UNIX environments, the UNIX time representation is very popular. It is just a 32-bit number counting the number of seconds that have passed
since January 1, 1970.
On the positive side: 1) a 32-bit number is easy to store; 2) the difference between two dates is calculated easily; 3) it is not possible to encode
an incorrect date and time, like January 32, February 29 of a non-leap year, or 25 hours 62 minutes.
On the negative side: 1) it is not possible to encode a date before year 1970 (unless the value is treated as signed, and even then only back to 1901).
If UNIX time is used nowadays, it is worth remembering that it has an expiration date in year 2038: that is
when a signed 32-bit number will overflow, i.e., when 2^31 seconds will have passed since the start of 1970. So, a 64-bit value should be used instead,
i.e., time64.
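The expiration date is easy to check with the standard library. This is a minimal sketch; last_32bit_second() is a hypothetical helper name, and the code assumes that time_t counts seconds since 1970, as it does on UNIX-like systems and in MSVC.

```c
#include <stdint.h>
#include <time.h>

/* Fills *out with the UTC calendar time of the last second that
   fits into a signed 32-bit counter. Returns 0 on success. */
int last_32bit_second(struct tm *out)
{
    time_t t = (time_t)INT32_MAX;      /* 2^31 - 1 seconds since 1970-01-01 */
    struct tm *utc = gmtime(&t);
    if (utc == NULL)
        return -1;
    *out = *utc;
    return 0;
}
```

The resulting structure describes January 19, 2038, 03:14:07 UTC; one second later a signed 32-bit counter wraps around to a negative value.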
2.5.3
memcpy()
It is hard to memorize the order of arguments of the functions memcpy(), strcpy() at first. It can be easier if one visualizes
an = (equals) sign between the arguments: the destination comes first, the source second.
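A short sketch of this mnemonic follows; struct point, copy_point() and copy_name() are hypothetical names used only here.

```c
#include <string.h>

struct point { int x, y; };   /* hypothetical type for the illustration */

void copy_point(struct point *dst, const struct point *src)
{
    /* read it as "dst = src": destination on the left, source on the right */
    memcpy(dst, src, sizeof(*dst));
}

void copy_name(char *dst, size_t dst_size, const char *src)
{
    /* same order for strings; this variant truncates instead of overflowing dst */
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';   /* strncpy() does not always terminate the string */
}
```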
2.5.4
bzero() is a function which just fills a memory block with zeroes. memset() is often used in its place. But memset()
has an unpleasant feature: it is easy to swap the second and the third arguments, and the compiler will be silent, because the
filling byte is specified as int.
Aside from that, the bzero() function name is easier to read.
On the other hand, it is absent from the C standard; however, it is present in POSIX (marked as legacy in POSIX.1-2001 and removed in POSIX.1-2008).
In the Windows API there is ZeroMemory() 43 for the same purpose.
2.5.5
printf()
It is often irritating when it would be logical to pass to printf(), let's say, a structure describing a complex number, or a color encoded
as 3 int numbers, as a single entity.
In C++ this problem is usually solved by defining operator<< for ostream for one's own type, or by defining a method
named ToString() (3.4).
In printk() (a printf-like function in the Linux kernel) there are additional modifiers 44 , like %pM (MAC address), %pI4 (IPv4 address), %pUb (UUID45 /GUID46 ).
In the GNU Multiple Precision Arithmetic Library there is a gmp_printf() 47 function with non-standard modifiers for outputting BigInt numbers.
In the Plan9 OS, and in the Go compiler source code, we may find the fmtinstall() function for defining a new printf-string modifier,
e.g.:
Listing 2.3: go\src\cmd\5c\list.c
void
listinit(void)
{
    fmtinstall('A', Aconv);
    fmtinstall('P', Pconv);
    fmtinstall('S', Sconv);
    fmtinstall('N', Nconv);
    fmtinstall('B', Bconv);
    fmtinstall('D', Dconv);
    fmtinstall('R', Rconv);
}
...
int
Pconv(Fmt *fp)
{
char str[STRINGSZ], sc[20];
Prog *p;
int a, s;
p = va_arg(fp->args, Prog*);
a = p->as;
s = p->scond;
strcpy(sc, extra[s & C_SCOND]);
if(s & C_SBIT)
strcat(sc, ".S");
if(s & C_PBIT)
strcat(sc, ".P");
if(s & C_WBIT)
strcat(sc, ".W");
if(s & C_UBIT)
/* ambiguous with FBIT */
strcat(sc, ".U");
if(a == AMOVM) {
    if(p->from.type == D_CONST)
        sprint(str, "\t%A%s\t%R,%D", a, sc, &p->from, &p->to);
    else
    if(p->to.type == D_CONST)
        sprint(str, "\t%A%s\t%D,%R", a, sc, &p->from, &p->to);
    else
        sprint(str, "\t%A%s\t%D,%D", a, sc, &p->from, &p->to);
44 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/printk-formats.txt
45 Universally unique identifier
46 Globally Unique Identifier
47 http://gmplib.org/manual/Formatted-Output-Strings.html
(http://plan9.bell-labs.com/sources/plan9/sys/src/cmd/5c/list.c)
Pconv() will be called when the %P modifier is met in the format string. It then copies the created string using fmtstrcpy().
By the way, that function also uses other defined modifiers like %A, %D, etc.
glibc has a non-standard extension 48 allowing us to define our own modifiers, but it is deprecated.
Let's try to define our own modifiers for outputting a MAC address and also for outputting a byte in binary form:
#include <stdio.h>
#include <stdint.h>
#include <printf.h>
static int printf_arginfo_M(const struct printf_info *info, size_t n, int *argtypes)
{
if (n > 0)
argtypes[0] = PA_POINTER;
return 1;
}
static int printf_output_M(FILE *stream, const struct printf_info *info, const void *const *args)
{
const unsigned char *mac;
int len;
mac = *(unsigned char **)(args[0]);
len = fprintf(stream, "%02x:%02x:%02x:%02x:%02x:%02x",
mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
return len;
}
static int printf_arginfo_B(const struct printf_info *info, size_t n, int *argtypes)
{
if (n > 0)
argtypes[0] = PA_POINTER;
return 1;
}
static int printf_output_B(FILE *stream, const struct printf_info *info, const void *const *args)
{
48 http://www.gnu.org/software/libc/manual/html_node/Customizing-Printf.html
(declared at /usr/include/printf.h
(declared at /usr/include/printf.h
format [-Wformat]
format [-Wformat]
GCC is able to check that the modifiers in the printf-string match the arguments of printf(); however, modifiers unfamiliar to
it are present here, so it warns us about them.
Nevertheless, our program works:
$ ./a.out
00:11:22:33:44:55
10101011
2.5.6
atexit()
With the help of atexit() it is possible to register a function that is called automatically at each normal exit from your program. By the
way, C++ programs use atexit() for registering destructors of global objects.
Let's see:
#include <string>
std::string s="test";
int main()
{
};
In the assembly listing we will find constructor of the global object:
Listing 2.4: MSVC 2010
??__Es@@YAXXZ PROC
; Line 3
push ebp
mov ebp, esp
49 The base of the example was taken from: http://codingrelic.geekhold.com/2008/12/printf-acular.html
2.5.7
bsearch(), lfind()
2.5.8
setjmp(), longjmp()
2.5.9
stdarg.h
There are functions intended for handling a variable number of arguments. At least the functions of the printf() and scanf() families are
such.
A variable of the type va_list may be traversed only once; if one needs to go through the arguments more than once, it should be copied:
va_list v1, v2;
va_start(v1, fmt);
va_copy(v2, v1);
// use v1
// use v2
va_end(v2);
va_end(v1);
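A typical case where two passes are needed is formatting into a buffer of exactly the right size. The sketch below is one possible way to do it; format_alloc() is a hypothetical name, and the code relies on the C99 behaviour of vsnprintf(), which returns the required length when given a zero-sized buffer.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats into a freshly allocated string; the caller frees it.
   Returns NULL on error. */
char *format_alloc(const char *fmt, ...)
{
    va_list v1, v2;
    va_start(v1, fmt);
    va_copy(v2, v1);                           /* v1 will be consumed by the first pass */

    int len = vsnprintf(NULL, 0, fmt, v1);     /* pass 1: only measure */
    va_end(v1);

    char *buf = NULL;
    if (len >= 0 && (buf = malloc((size_t)len + 1)) != NULL)
        vsnprintf(buf, (size_t)len + 1, fmt, v2);  /* pass 2: actually format */
    va_end(v2);
    return buf;
}
```

Without va_copy() the second vsnprintf() call would receive an already used va_list, which is undefined behaviour.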
2.5.10
The PRNG53 from the standard library is of very poor quality; besides, it is only guaranteed to generate numbers in the 0..32767 range (that is exactly RAND_MAX in MSVC, for example). Avoid
it.
2.6
C99 C standard
Chapter 3
C++
3.1
Name mangling
In C, on some platforms (e.g., 32-bit x86 Windows), an underscore symbol is prepended to each function name, so a function may in fact have the following
name in the object file: _function.
C++ has function overloading, so several functions may share one name but have different argument types. On the other hand, the OS
is not aware of C++ and works with plain function names (or symbols). As a consequence, there is a need to encode the function name, the argument types, the return value type, and possibly a class name into one string.
For example, take the box class constructor defined as:
box::box(int color, int width, int height, int depth)
... in the MSVC conventions it will have the following name: ??0box@@QAE@HHHH@Z. For example, the four consecutive H characters stand for four consecutive arguments of type int.
This is what is called name mangling.
That is why header files may contain extern "C":
#ifdef __cplusplus
extern "C" {
#endif
void foo(int a, int b);
#ifdef __cplusplus
}
#endif
This means that foo() is written in C, compiled as a C function, and will have the following name in object files: _foo.
If that header is included in a C++ project, the compiler will treat the function's internal name as _foo. Without this directive, the
compiler would look for the function named ?foo@@YAXHH@Z.
Therefore, this directive is needed for linking C libraries to C++ projects.
And the #ifdef makes this directive visible only to C++ compilers.
More about name mangling: [19, 1.17].
3.2
C++ declarations
3.2.1
C++11: auto
When using STL, it is sometimes very tedious to spell out the iterator type, like:
for (std::list<int>::iterator it=list.begin(); it!=list.end(); it++)
3.3
3.3.1
references
They are the same thing as pointers (1.3.7), but safer, because it is harder to make a mistake while working with them [8, 8.3.2].
For example, a reference must always point to an object of the corresponding type and cannot be NULL [2, 8.6]. Moreover,
a reference cannot be changed: it is not possible to make it point to another object (reseat it) [2, 8.5].
In [19, 1.7.1] it was demonstrated that at the x86 code level they are the same thing.
Just like pointers, references can be returned by functions, e.g.:
#include <iostream>
int& use_count()
{
    static int uc=1000; // starting value
    return uc;
}
int main()
{
    std::cout << ++use_count() << std::endl;
    std::cout << ++use_count() << std::endl;
    std::cout << ++use_count() << std::endl;
    std::cout << ++use_count() << std::endl;
}
3.4
Input/output
It is often necessary to output whole data structures into an ostream, while it is not handy to output each field separately. Sometimes
it can be solved by adding a ToString() method to the class. Another solution is to make a free function for output, like:
ostream& operator<< (ostream &out, const Object &in)
{
out << "Object. size=" << in.size << " value=" << in.value << " ";
return out;
};
Now it is possible to send objects right into ostream:
Object o1, o2;
...
cout << "o1=" << o1 << " o2=" << o2 << endl;
To give the function access to all class fields, it can be declared as friend. However, there is a point of view against
making such functions friends, for encapsulation reasons [12, Item 23 Prefer non-member non-friend functions to member functions].
3.5
Templates
Templates are usually necessary in order to make a class universal for several data types. For example, std::string is in
fact std::basic_string<char>, and std::wstring is std::basic_string<wchar_t>.
It is often done for data types like float/double/complex and even int. A mathematical algorithm can be defined only
once, but compiled in several versions, one for each of these data types.
The simplest examples are the max, min, swap functions, which work for any type whose variables can be compared and assigned. Later you may want to write your own BigInt implementation, and if a comparison operator (operator<) is
present, then the max/min written earlier will work for the new class as well.
That is why lists and other containers in the STL are templates: it can be said that they add to your class the ability
to be united into a list or another collection.
3.6
3.7
Criticism
http://yosefk.com/c++fqa/
Linus Torvalds: http://harmful.cat-v.org/software/c++/linus; http://yarchive.net/comp/linux/c++.html
Chapter 4
Other
What is stored in the binary (.o, .obj, .exe, .dll, .so) files?
Usually it is only data (global variables) and function bodies (including class methods).
There is no type information (classes, structures, typedefs(1.2.1)) there. Knowing this may help in understanding how it all works
internally.
See also about name mangling: (3.1).
The absence of type information is one of the serious problems of decompilation.
Read more about how everything is compiled into machine code: [19].
4.1
The simplest way to indicate success to the caller is to return a boolean value: false in case of error, and true in case of
success. A lot of such functions are present in the Windows API. And if one needs to return more information, the error code
may be left in the TIB1 , from where it is possible to get it using GetLastError(). Or, in UNIX environments, the error code may be left
in the global variable errno.
4.1.1
Another interesting approach is to pass more information in the return value. For example, in the manuals of IBM DB2 9.1,
we may spot this:
Regardless of whether the application program provides an SQLCA or a stand-alone variable, SQLCODE
is set by DB2 after each SQL statement is executed. DB2 conforms to the ISO/ANSI SQL standard as follows:
If SQLCODE = 0, execution was successful.
If SQLCODE > 0, execution was successful with a warning.
If SQLCODE < 0, execution was not successful.
SQLCODE = 100, "no data" was found. For example, a FETCH statement returned no data because the cursor
was positioned after the last row of the result table.
if (!csize)
return 0;
vaddr = ioremap(pfn << PAGE_SHIFT, PAGE_SIZE);
if (!vaddr)
return -ENOMEM;
if (userbuf) {
if (copy_to_user(buf, vaddr + offset, csize)) {
iounmap(vaddr);
return -EFAULT;
}
} else {
memcpy(buf, vaddr + offset, csize);
}
iounmap(vaddr);
return csize;
}
Note that the function may return both the number of bytes and an error code. The ssize_t type is a signed size_t, i.e.,
it is able to store negative values. ENOMEM and EFAULT are standard error codes from errno.h.
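The same convention can be used in user-space code. Here is a minimal sketch; copy_from() is a hypothetical function written only to illustrate the "count or negative error code" return value, and it assumes a POSIX system where ssize_t is declared in sys/types.h.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

/* Copies at most dst_size bytes from src, starting at offset.
   Returns the number of bytes copied, or a negative errno value. */
ssize_t copy_from(char *dst, size_t dst_size,
                  const char *src, size_t src_size, size_t offset)
{
    if (dst == NULL || src == NULL)
        return -EFAULT;
    if (offset > src_size)
        return -EINVAL;
    size_t n = src_size - offset;   /* bytes available after offset */
    if (n > dst_size)
        n = dst_size;
    memcpy(dst, src + offset, n);
    return (ssize_t)n;
}
```

The caller checks the sign first: a negative result is an error code with the sign flipped, anything else is a byte count (possibly zero).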
4.2
Global variables
OOP5 hype and other such things tell us that global variables are a bad thing; nevertheless, sometimes they are worth using
(keeping thread-awareness in mind), e.g., for returning large amounts of information from functions.
Thus, several C standard library functions return an error code via the global variable errno, which is not really a global anymore
in our time, but is stored in the TLS.
In the Windows API the error code can be determined by calling GetLastError(), which just takes a value from the TIB.
In the OpenWatcom compiler everything is stored in global variables, so the very main function looks like:
Listing 4.2: bld\cg\c\generate.c
extern void
Generate( bool routine_done )
/*******************************************/
/* The big one - heres where most of code generation happens.
* Follow this routine to see the transformation of code unfold.
*/
{
if( BGInInline() ) return;
HaveLiveInfo = FALSE;
HaveDominatorInfo = FALSE;
#if ( _TARGET & ( _TARG_370 | _TARG_RISC ) ) == 0
/* if we want to go fast, generate statement at a time */
if( _IsModel( NO_OPTIMIZATION ) ) {
if( !BlockByBlock ) {
InitStackMap();
5 Object-Oriented Programming
4.3
Bit fields
_A_NORMAL 0x00
_A_RDONLY 0x01
_A_HIDDEN 0x02
_A_SYSTEM 0x04
Of course, it would not be very compact to pass each attribute as a separate bool variable.
Conversely, it is possible to use bit fields for passing flags into a function. For example, CreateFile() 6 from the Windows
API.
To avoid typos and mess when specifying flags, they can be defined as:
#define FLAG1 (1<<0)
#define FLAG2 (1<<1)
#define FLAG3 (1<<2)
#define FLAG4 (1<<3)
#define FLAG5 (1<<4)
On the other hand, one needs to keep in mind that isolating each bit in a value of type int is usually more costly
for the CPU than processing a bool stored in a 32-bit register. So if speed is more crucial for you than memory footprint, you
may try to use bool.
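Working with flags packed into one value may be sketched like this. The OPT_* names and the three helper functions are hypothetical, introduced only for this example.

```c
/* Hypothetical option flags, one bit each. */
#define OPT_VERBOSE (1u<<0)
#define OPT_DRY_RUN (1u<<1)
#define OPT_FORCE   (1u<<2)

/* Returns flags with the given bit(s) switched on. */
unsigned set_flag(unsigned flags, unsigned flag)
{
    return flags | flag;
}

/* Returns flags with the given bit(s) switched off. */
unsigned clear_flag(unsigned flags, unsigned flag)
{
    return flags & ~flag;
}

/* Returns 1 if at least one of the given bits is set, 0 otherwise. */
int has_flag(unsigned flags, unsigned flag)
{
    return (flags & flag) != 0;
}
```

Several flags are combined with the | operator at the call site, e.g. OPT_VERBOSE|OPT_FORCE, so a single unsigned argument replaces a whole row of bool parameters.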
4.4
4.4.1
Go Compiler http://golang.org/doc/install/source
Git https://github.com/git/git
4.4.2
C++
LLVM http://llvm.org/releases/download.html
Google Chrome http://www.chromium.org/developers/how-tos/get-the-code
6 http://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx
Chapter 5
GNU tools
5.1
gcov
Compile it as follows (-g means adding debug information to the resulting executable file, -O0 means absence of code optimization,
the rest are gcov parameters):
0:Source:gcov_test.c
0:Graph:gcov_test.gcno
0:Data:gcov_test.gcda
1 It is very important because the generated CPU instructions should be grouped and match the C/C++ code lines. Optimization may distort this relation, and
gcov (and also gdb) will not be able to show source code lines correctly.
(gcov listing: the source lines were lost; only the execution-count column survives, including the entries 1, #####, 101, 20100 and 20000)
Lines marked with ##### were not executed. This can be particularly useful for writing tests.
Chapter 6
Testing
Testing is crucial. The simplest possible test is a program calling your functions and comparing their results with the correct ones:
void should_be_true(bool a)
{
if (a==false)
die ("one of tests failed\n");
};
int main()
{
should_be_true(f1(...)==correct_value1);
should_be_true(f2(...)==correct_value2);
should_be_true(f3(...)==correct_value3);
};
Tests should work automatically (without human intervention) and be run as frequently as possible, ideally after
each code change.
For writing tests, gcov (5) or any other coverage tool is very useful. A good test should check the correctness not only of all functions, but
also of all parts of each function.
Other articles and advice on testing: [13].
Afterword
6.1
Questions?
Acronyms used
STL Standard Template Library
TLS Thread Local Storage
TIB Thread Information Block
RAII Resource Acquisition Is Initialization
OS Operating System
PL Programming Language
OOP Object-Oriented Programming
PRNG Pseudorandom number generator
MSVC Microsoft Visual C++
GCC GNU Compiler Collection
POSIX Portable Operating System Interface
CPU Central processing unit
IDE Integrated development environment
RISC Reduced instruction set computing
IOCCC The International Obfuscated C Code Contest
GUID Globally Unique Identifier
UUID Universally unique identifier
CRT C runtime library
CSV Comma-separated values
Bibliography
[1] blexim. Basic integer overflows. Phrack, 2002. Also available as http://yurichev.com/mirrors/phrack/p60-0x0a.
txt.
[2] Marshall Cline. C++ faq. Also available as http://www.parashift.com/c++-faq-lite/index.html.
[3] E. Dijkstra. Classics in software engineering. chapter Go to statement considered harmful, pages 2733. Yourdon Press,
Upper Saddle River, NJ, USA, 1979.
[4] Edsger W. Dijkstra. Letters to the editor: go to statement considered harmful. Commun. ACM, 11(3):147148, March 1968.
[5] Agner Fog. Optimizing software in C++. 2013. http://agner.org/optimize/optimizing_cpp.pdf.
[6] ISO. ISO/IEC 9899:TC3 (C C99 standard). 2007. Also available as http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf.
[7] ISO. ISO/IEC 9899:2011 (C C11 standard). 2011. Also available as http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1539.pdf.
[8] ISO. ISO/IEC 14882:2011 (C++ 11 standard). 2013. Also available as http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf.
[9] Donald E. Knuth. Structured programming with go to statements. ACM Comput. Surv., 6(4):261-301, December 1974. Also available as http://yurichev.com/mirrors/KnuthStructuredProgrammingGoTo.pdf.
[10] Donald E. Knuth. Computer literacy bookshops interview, 1993. Also available as http://yurichev.com/mirrors/C/knuth-interview1993.txt.
[11] John Lakos. Large-Scale C++ Software Design. 1996. http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620.
[12] Scott Meyers. Effective C++: 55 Specific Ways to Improve Your Programs and Designs (3rd Edition). 2005. http://www.amazon.com/Effective-Specific-Improve-Programs-Designs/dp/0321334876.
[13] Nicholas Nethercote. Good coding practices. Also available as http://njn.valgrind.org/good-code.html.
[14] Dennis Ritchie. about short-circuit operators. http://c-faq.com/misc/xor.dmr.html, 1995. [Online; accessed 2013].
[15] Dennis M. Ritchie. The evolution of the unix time-sharing system. 1979.
[16] Linus Torvalds. CodingStyle.
Glossary
Integral type A data type that can be converted to a number type, such as int, short, char. 4, 10, 20, 21
Iterator A pointer to the current element of a list or any other collection, used to enumerate its elements. 1, 7, 8, 13, 48
Free function A function that is not a method of any class. 49
BigInt The usual name for libraries for multiple-precision arithmetic, e.g. http://gmplib.org/. 41, 50
glibc The GNU C Library, the standard C library on most Linux systems. 18, 42
Index
Comma, 7
Preprocessor, 18
IN, 19
NDEBUG, 39
OPTIONAL, 19
OUT, 19
UNICODE, 35
alloca(), 24, 46
ARM, 3
asctime(), 31
assert(), 39
atexit(), 43
atof(), 33
atoi(), 33
BIND, 45
bool, 14, 55
bsearch(), 15, 44
bzero(), 38, 40
C++
bool, 3, 38
cerr, 22
cout, 22
delete, 24
new, 24
operator, 41, 49
ostream, 29
references, 49
STL, 48, 50
map, 37
set, 37
string, 50
C++03, 9
C++11, 22, 48
C99, 1, 3, 9, 14, 15, 20, 22, 47
bool, 3, 21, 38
call by reference, 11
call by value, 11
calloc(), 24, 38
char, 3, 40
const, 2
Deep copy, 38
double, 3
errno, 22, 51
exit(), 22
findfirst(), 54
float, 3
FORTRAN, 12
free(), 24, 38
fwprintf(), 35
getcwd(), 32
git, 26, 29
Glib, 29
GList, 37
GString, 29
GTree, 37, 38
GNU
gcov, 56
gdb, 56
Go, 41
goto, 6, 46
IBM DB2, 51
IEEE 754, 3, 38
if(), 9
int, 3
Integer overflow, 3
iswalpha(), 35
Java, 29
lfind(), 15, 44
Linux, 7, 17, 37
printk(), 41
LISP, 18
LLVM, 3, 39
long, 3
long double, 3
long long, 3
longjmp(), 46
Magic numbers, 27
malloc(), 3, 24
memchr(), 14, 32
memcpy(), 38, 40
memmem(), 32
memset(), 40
OpenWatcom, 45, 52
Oracle RDBMS, 25, 29, 46
Pascal, 29
Plan9, 41
POSIX
tdelete(), 37
tfind(), 37
tsearch(), 37
twalk(), 37
printf(), 20, 40
xmalloc(), 26
xrealloc(), 26
qsort(), 45
RAII, 25
rand(), 47
realloc(), 24
RISC, 3
scanf(), 33
setjmp(), 46
Shallow copy, 38
short, 3
sizeof(), 10
snprintf(), 10
sprintf(), 28, 29
srand(), 47
SSE, 16
stdarg.h, 46
stderr, 22
stdint.h, 3
stdlib.h, 27
stdout, 22
strcat(), 2, 28
strchr(), 32
strcmp(), 2, 11, 14
strcpy(), 28
strcspn(), 34
stricmp(), 45
strlen(), 29
strpbrk(), 34
strspn(), 34
strstr(), 32
strtod(), 33
strtof(), 33
strtok(), 10, 34
switch(), 9
tchar.h, 35
ToString(), 41, 49
UNIX, 40
bash, 14
cat, 22
UTF-16, 19, 35
UTF-8, 35
va_list, 47
Valgrind, 27
Variable length array, 24
wchar_t, 10, 35
wcscmp(), 35
wcslen(), 35
Windows API, 36, 38, 51
BOOL, 3, 38
CreateFile(), 55
ExitProcess(), 22
GetLastError(), 52
ZeroMemory(), 40
x86-64, 3, 4